Hey.
I am not very familiar with the metric-learning literature, but I think one thing we need to think about first
is what the interface would be.
We really want something that works in a .fit().predict() or .fit().transform() way. I guess "transform" could return the distances to the training data (is that what one would want?)
But what would the labels for "fit" look like?
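For concreteness, here is a minimal sketch of what such a transformer interface could look like. Everything here is hypothetical: the class name, the `pairs`/`pair_labels` signature for fit, and the identity-matrix placeholder are illustrative assumptions, not an existing scikit-learn API.

```python
import numpy as np


class MahalanobisTransformer:
    """Hypothetical sketch: learns a PSD matrix A and exposes the
    induced linear map G (with A = G.T @ G) via fit/transform."""

    def fit(self, X, pairs=None, pair_labels=None):
        # A real implementation would optimize A from the supervision
        # in `pairs`/`pair_labels`; as a placeholder we use the
        # identity, which leaves the Euclidean metric unchanged.
        n_features = X.shape[1]
        self.A_ = np.eye(n_features)
        # Cholesky gives A = L @ L.T; the transform matrix is G = L.T.
        self.G_ = np.linalg.cholesky(self.A_).T
        return self

    def transform(self, X):
        # Map points into the space where Euclidean distance equals
        # the learned Mahalanobis distance.
        return X @ self.G_.T


X = np.random.rand(5, 3)
Xt = MahalanobisTransformer().fit(X).transform(X)
print(Xt.shape)  # (5, 3)
```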

Cheers,
Andy

On 03/18/2015 08:39 AM, Artem wrote:
Hello everyone

Recently I mentioned metric learning as one of the possible projects for this year's GSoC, and would like to hear your comments.

Metric learning, as the name suggests, is about learning distance functions. Usually the learned metric is a Mahalanobis metric, so the problem reduces to finding a positive semidefinite (PSD) matrix A that minimizes some objective.

Metric learning is usually done in a supervised way: the user indicates which points should be closer together and which should be farther apart. The supervision can take the form of "similar" / "dissimilar" pairs, or relative constraints such as "A is closer to B than to C".
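To make the two supervision formats concrete, here is one way they might be encoded as arrays. This encoding is purely illustrative, not an established convention:

```python
import numpy as np

# Toy data: six points in 2-D.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0],
              [5.1, 5.0], [9.0, 0.0], [0.0, 9.0]])

# Pairwise form: index pairs plus a label,
# +1 for "similar", -1 for "dissimilar".
pairs = np.array([[0, 1], [2, 3], [0, 2]])
pair_labels = np.array([1, 1, -1])

# Relative (triplet) form: each row (a, b, c) encodes
# "X[a] should be closer to X[b] than to X[c]".
triplets = np.array([[0, 1, 2], [2, 3, 0]])

# Sanity check: for this toy data the first triplet constraint
# already holds under the plain Euclidean metric.
a, b, c = triplets[0]
print(np.linalg.norm(X[a] - X[b]) < np.linalg.norm(X[a] - X[c]))  # True
```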

Since metric learning is (mostly) about a PSD matrix A, one can take its Cholesky decomposition to obtain a matrix G with which to transform the data. This could enable something like guided clustering, where we first transform the data space according to our prior knowledge of similarity.
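The Cholesky step can be sketched in a few lines: given a learned PSD matrix A, factor it as A = L Lᵀ and take G = Lᵀ, so that the Mahalanobis distance under A equals the Euclidean distance after mapping x → Gx. The matrix A below is a random PSD stand-in for a learned one.

```python
import numpy as np

rng = np.random.RandomState(0)
M = rng.rand(3, 3)
A = M.T @ M + 1e-6 * np.eye(3)  # random PSD stand-in for a learned A

L = np.linalg.cholesky(A)  # A = L @ L.T
G = L.T                    # transform matrix: x -> G @ x

x, y = rng.rand(3), rng.rand(3)
# (x-y)^T A (x-y) = (x-y)^T L L^T (x-y) = ||G (x-y)||^2,
# so the two distances below coincide.
d_mahalanobis = np.sqrt((x - y) @ A @ (x - y))
d_euclidean = np.linalg.norm(G @ x - G @ y)
print(np.allclose(d_mahalanobis, d_euclidean))  # True
```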

Metric learning seems to be quite an active field of research ([1 <http://www.icml2010.org/tutorials.html>], [2 <http://www.ariel.ac.il/sites/ofirpele/DFML_ECCV2010_tutorial/>], [3 <http://nips.cc/Conferences/2011/Program/event.php?ID=2543>]). There are two somewhat up-to-date surveys: [1 <http://web.cse.ohio-state.edu/%7Ekulis/pubs/ftml_metric_learning.pdf>] and [2 <http://arxiv.org/abs/1306.6709>].

The three seemingly most-cited methods (according to Google Scholar) are:

  * MMC by Xing et al.
    <http://papers.nips.cc/paper/2164-distance-metric-learning-with-application-to-clustering-with-side-information.pdf>
    This is a pioneering work and, according to survey #2:

        The algorithm used to solve (1) is a simple projected gradient
        approach requiring the full eigenvalue decomposition of M at
        each iteration. This is typically intractable for medium
        and high-dimensional problems.
  * Large Margin Nearest Neighbor (LMNN) by Weinberger et al.
    <http://papers.nips.cc/paper/2795-distance-metric-learning-for-large-margin-nearest-neighbor-classification.pdf>
    Survey #2 acknowledges this method as "one of the most
    widely-used Mahalanobis distance learning methods":

        LMNN generally performs very well in practice, although it is
        sometimes prone to overfitting due to the absence of
        regularization, especially in high dimension.

  * Information-Theoretic Metric Learning (ITML) by Davis et al.
    <http://dl.acm.org/citation.cfm?id=1273523>
    This one features a special kind of regularizer called LogDet.
  * There are many other methods. If you guys know that other
    methods rock, let me know.


So the project I'm proposing is to implement the 2nd or 3rd of these algorithms (or both?) along with a relevant transformer.


------------------------------------------------------------------------------
Dive into the World of Parallel Programming The Go Parallel Website, sponsored
by Intel and developed in partnership with Slashdot Media, is your hub for all
things parallel software development, from weekly thought leadership blogs to
news, videos, case studies, tutorials and more. Take a look and join the
conversation now. http://goparallel.sourceforge.net/


_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
