Re: [Scikit-learn-general] [GSoC] Metric Learning

Andreas Mueller Mon, 23 Mar 2015 15:45:05 -0700

Hi Artem.
I thought that was you, but I wasn't sure.

Great, I linked to your draft from the wiki overview page; otherwise it is hard to find.

I haven't looked at it in detail yet, though.

1.1: No, generalizing K-Means is out of scope. Hierarchical should work with arbitrary metrics.

1.2: Matrix-like Y should actually be fine with cross-validation. I think it would be nice if we could get some benefit by having a classification-like y, but I'm not opposed to also allowing matrix Y.
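To make the two forms of y in 1.2 concrete, here is a minimal sketch of how a classification-like y could be expanded into a matrix-like Y of pairwise constraints. The function name and the +1 / -1 encoding are assumptions for illustration only; nothing in this thread fixes the actual format.

```python
import numpy as np


def labels_to_pair_matrix(y):
    """Hypothetical helper: expand class labels into pairwise constraints.

    Y[i, j] is +1 when samples i and j share a label ("similar")
    and -1 otherwise ("dissimilar"). The encoding is an assumption.
    """
    y = np.asarray(y)
    same = y[:, None] == y[None, :]  # boolean (n, n) matrix
    return np.where(same, 1, -1)


# Three samples: the first two share a class, the third does not.
Y = labels_to_pair_matrix([0, 0, 1])
print(Y)
```

A matrix Y can also encode constraints that no labelling can express (for example, only a handful of known pairs), which is one reason to allow both forms.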

2. I'd have to look into it. I don't understand why KPCA wouldn't work. It should work for all metrics, right? Having something produce a similarity matrix is not ideal, but I think it could be made to work. I'd still call it ``transform`` probably, though. It would be a bit confusing because it uses the squared transform, but it would make it possible to build pipelines with clustering algorithms.
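As a rough illustration of a ``transform`` that returns a pairwise matrix instead of transformed features, consider the sketch below. The class name is hypothetical and is not part of scikit-learn; it takes an already-learned matrix A as given (no learning happens in ``fit``) and returns squared distances rather than similarities, both of which are assumptions made to keep the example short.

```python
import numpy as np


class PairwiseMahalanobis:
    """Hypothetical sketch, not a scikit-learn class.

    ``transform`` returns squared Mahalanobis distances between the
    given samples and the samples seen in ``fit``.
    """

    def __init__(self, A):
        self.A = np.asarray(A, dtype=float)  # assumed to be PSD

    def fit(self, X, y=None):
        # A real metric learner would estimate A here; this sketch
        # only remembers the training samples.
        self.X_fit_ = np.asarray(X, dtype=float)
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        # diff[i, j] = X[i] - X_fit_[j], shape (n, n_fit, n_features)
        diff = X[:, None, :] - self.X_fit_[None, :, :]
        # (x - y)^T A (x - y) for every pair (i, j)
        return np.einsum("ijk,kl,ijl->ij", diff, self.A, diff)


X = np.array([[0.0, 0.0], [3.0, 4.0]])
D = PairwiseMahalanobis(np.eye(2)).fit(X).transform(X)
print(D)
```

The output of such a step could then be handed to a clustering step that accepts a precomputed matrix.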


Best,
Andy


On 03/23/2015 06:31 PM, Artem wrote:

Hi Andreas

My GitHub name is Barmaley-exe. I put a draft <https://github.com/scikit-learn/scikit-learn/wiki/%5BWIP%5D-GSoC-2015-Proposal:-Metric-Learning-module> of my proposal on the wiki, but there are still several unanswered questions:


 1. One of the applications of metric learning I envision is
    "somewhat-supervised" clustering, where the user can seed in some
    knowledge and then use the resulting metric in clustering. To get
    it working, the following is needed:
     1. DistanceMetric-aware clustering. It turns out there are already
        methods that can do clustering on a similarity matrix, but
        should I generalize KMeans / Hierarchical clustering?
     2. The general training scheme would require a matrix-like y (like
        the one proposed by Joel). What is the consensus on that?
 2. Though 2 of the 3 methods planned for implementation are
    kernelizable by KPCA, the last one (ITML) is not. So if I
    implement it (ITML with a kernel trick), it'd be impossible to
    transform the data space. Thus, it won't work as a Transformer.
    This problem can be fixed by making it not a Transformer, but an
    Estimator that would predict a similarity matrix. What do you think?

On Tue, Mar 24, 2015 at 1:09 AM, Andreas Mueller <[email protected]> wrote:


    Hi Artem.
    I think the overall feedback on your proposal was positive.
    Did you get the chance to write it up yet?
    Please submit your proposal on melange
    https://www.google-melange.com (deadline is this Friday)
    and mention / link it in our wiki:
    https://github.com/scikit-learn/scikit-learn/wiki/Google-summer-of-code-%28GSOC%29-2015

    Btw, what is your github name?

    Andy

    On 03/18/2015 08:39 AM, Artem wrote:

    Hello everyone

    Recently I mentioned metric learning as one of the possible projects
    for this year's GSoC, and would like to hear your comments.

    Metric learning, as follows from the name, is about learning
    distance functions. Usually the metric that is learned is a
    Mahalanobis metric, thus the problem reduces to finding a PSD
    matrix A that minimizes some functional.

    Metric learning is usually done in a supervised way, that is, a
    user tells which points should be closer and which should be more
    distant. This can be expressed either in the form of "similar" /
    "dissimilar", or "A is closer to B than to C".

    Since metric learning is (mostly) about a PSD matrix A, one can
    do Cholesky decomposition on it to obtain a matrix G to transform
    the data. It could lead to something like guided clustering,
    where we first transform the data space according to our prior
    knowledge of similarity.
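The Cholesky step above can be checked numerically. The sketch below uses the squared Mahalanobis distance d_A(x, y)^2 = (x - y)^T A (x - y) and verifies that it equals the squared Euclidean distance after multiplying the data by the Cholesky factor G. The matrix A here is randomly generated rather than learned, and the variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a learned matrix: B^T B plus a small ridge is
# symmetric positive definite, so the Cholesky factor exists.
B = rng.normal(size=(3, 3))
A = B.T @ B + 1e-3 * np.eye(3)

# NumPy returns the lower-triangular factor G with A = G @ G.T.
G = np.linalg.cholesky(A)

X = rng.normal(size=(5, 3))
X_new = X @ G  # each row x is mapped to G.T @ x


def mahalanobis_sq(x, y, A):
    """Squared Mahalanobis distance (x - y)^T A (x - y)."""
    d = x - y
    return float(d @ A @ d)


d_A = mahalanobis_sq(X[0], X[1], A)
d_euclid = float(np.sum((X_new[0] - X_new[1]) ** 2))
print(d_A, d_euclid)  # the two values agree up to rounding
```

Note that np.linalg.cholesky requires a strictly positive definite matrix; for an A that is only positive semidefinite, a factor can be obtained from an eigendecomposition instead.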

    Metric learning seems to be quite an active field of research
    ([1 <http://www.icml2010.org/tutorials.html>],
    [2 <http://www.ariel.ac.il/sites/ofirpele/DFML_ECCV2010_tutorial/>],
    [3 <http://nips.cc/Conferences/2011/Program/event.php?ID=2543>]).
    There are 2 somewhat up-to-date surveys:
    [1 <http://web.cse.ohio-state.edu/%7Ekulis/pubs/ftml_metric_learning.pdf>]
    and [2 <http://arxiv.org/abs/1306.6709>].

    Top 3 seemingly most cited methods (according to Google Scholar) are

      * MMC by Xing et al.
        <http://papers.nips.cc/paper/2164-distance-metric-learning-with-application-to-clustering-with-side-information.pdf>
        This is a pioneering work and, according to survey #2:

            The algorithm used to solve (1) is a simple projected
            gradient approach requiring the full eigenvalue
            decomposition of M at each iteration. This is typically
            intractable for medium and high-dimensional problems

      * Large Margin Nearest Neighbor by Weinberger et al.
        <http://papers.nips.cc/paper/2795-distance-metric-learning-for-large-margin-nearest-neighbor-classification.pdf>
        Survey #2 acknowledges this method as "one of the most
        widely-used Mahalanobis distance learning methods":

            LMNN generally performs very well in practice, although
            it is sometimes prone to overfitting due to the absence
            of regularization, especially in high dimension

      * Information-theoretic metric learning by Davis et al.
        <http://dl.acm.org/citation.cfm?id=1273523> This one features
        a special kind of regularizer called logDet.
      * There are many other methods. If you guys know that other
        methods rock, let me know.


    So the project I'm proposing is about implementing the 2nd or 3rd
    algorithm (or both?) along with a relevant transformer.


    
    ------------------------------------------------------------------------------
    Dive into the World of Parallel Programming The Go Parallel Website,
    sponsored by Intel and developed in partnership with Slashdot Media, is
    your hub for all things parallel software development, from weekly
    thought leadership blogs to news, videos, case studies, tutorials and
    more. Take a look and join the conversation now.
    http://goparallel.sourceforge.net/


    _______________________________________________
    Scikit-learn-general mailing list
    [email protected]
    https://lists.sourceforge.net/lists/listinfo/scikit-learn-general



    




