Do you have an idea of what y would look like? Also +1 on what you said (but you knew that ;)
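For concreteness, here is a toy sketch of the .fit().transform() shape being discussed (plain NumPy, all names hypothetical). It uses the classical inverse-covariance Mahalanobis metric as a stand-in for a learned one, since the actual objective (LMNN, ITML, ...) would be the point of the project; the part that carries over is that a PSD matrix A yields a linear transform via its Cholesky factor:

```python
import numpy as np


class MahalanobisTransformer:
    """Hypothetical interface sketch: obtain a PSD matrix A, then
    transform X so that Euclidean distance in the new space equals
    the Mahalanobis distance under A.

    This toy version ignores y and just uses the inverse covariance
    (the classical Mahalanobis metric) as a stand-in for a real
    metric-learning objective such as LMNN or ITML, which would use
    pair or triplet constraints to fit A.
    """

    def fit(self, X, y=None):
        cov = np.cov(X, rowvar=False)
        # Small ridge for numerical stability of the inverse.
        self.A_ = np.linalg.inv(cov + 1e-8 * np.eye(X.shape[1]))
        # Cholesky factor G with A = G @ G.T, so that
        # (x - y) A (x - y)^T == ||(x - y) @ G||^2.
        self.G_ = np.linalg.cholesky(self.A_)
        return self

    def transform(self, X):
        return X @ self.G_
```

The open question from the thread remains what y would encode for a real fit (pairs of similar/dissimilar points, or triplets "A closer to B than to C").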
On 03/18/2015 11:27 AM, Gael Varoquaux wrote:
> Simple, efficient and robust metric learning that learns on a supervised
> set and can do a transform that applies the metric? Do you think that
> would be useful? It seems to me that it would.
>
> If people agree that it would be useful with such a very simple API, I
> would be in favor of a GSoC proposal on this. As I don't think that we
> have mentors who are experts in the algorithms involved, the student
> would need to show in his proposal that he has a good understanding of
> the algorithms and use cases.
>
> Importantly, when introducing a new type of algorithm to scikit-learn,
> simpler is always better: the API, the examples, and the use cases must
> be tuned on simple algorithms.
>
> Cheers,
> Gaël
>
> On Wed, Mar 18, 2015 at 10:53:22AM -0400, Andreas Mueller wrote:
>> Hey.
>> I am not very familiar with the literature on metric learning, but I
>> think one thing we need to think about first is what the interface
>> would be. We really want something that works in a .fit().predict() or
>> .fit().transform() way.
>> I guess you could do "transform" to get the distances to the training
>> data (is that what one would want?)
>> But how would the labels for "fit" look?
>> Cheers,
>> Andy
>>
>> On 03/18/2015 08:39 AM, Artem wrote:
>> Hello everyone,
>> Recently I mentioned metric learning as one of the possible projects
>> for this year's GSoC, and would like to hear your comments.
>> Metric learning, as the name suggests, is about learning distance
>> functions. Usually the metric learned is a Mahalanobis metric, so the
>> problem reduces to finding a PSD matrix A that minimizes some
>> functional.
>> Metric learning is usually done in a supervised way: the user tells
>> which points should be closer together and which should be more
>> distant. This can be expressed either in the form "similar" /
>> "dissimilar", or as "A is closer to B than to C".
>> Since metric learning is (mostly) about a PSD matrix A, one can apply
>> a Cholesky decomposition to it to obtain a matrix G with which to
>> transform the data. This could lead to something like guided
>> clustering, where we first transform the data space according to our
>> prior knowledge of similarity.
>> Metric learning seems to be quite an active field of research ([1],
>> [2], [3]). There are two somewhat up-to-date surveys: [1] and [2].
>> The top 3 seemingly most-cited methods (according to Google Scholar)
>> are:
>> - MMC by Xing et al. This is a pioneering work and, according to
>>   survey #2: "The algorithm used to solve (1) is a simple projected
>>   gradient approach requiring the full eigenvalue decomposition of M
>>   at each iteration. This is typically intractable for medium and
>>   high-dimensional problems."
>> - Large Margin Nearest Neighbor by Weinberger et al. Survey 2
>>   acknowledges this method as "one of the most widely-used Mahalanobis
>>   distance learning methods": "LMNN generally performs very well in
>>   practice, although it is sometimes prone to overfitting due to the
>>   absence of regularization, especially in high dimension."
>> - Information-theoretic metric learning by Davis et al. This one
>>   features a special kind of regularizer called logDet.
>> - There are many other methods. If you guys know that other methods
>>   rock, let me know.
>> So the project I'm proposing is about implementing the 2nd or 3rd (or
>> both?) algorithms along with a relevant transformer.

------------------------------------------------------------------------------
Dive into the World of Parallel Programming The Go Parallel Website, sponsored by Intel and developed in partnership with Slashdot Media, is your hub for all things parallel software development, from weekly thought leadership blogs to news, videos, case studies, tutorials and more. Take a look and join the conversation now.
http://goparallel.sourceforge.net/
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
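The MMC quote above mentions that the projected-gradient solver needs a full eigenvalue decomposition at every iteration; the expensive piece is the projection onto the PSD cone. A minimal sketch of just that step (plain NumPy, function name hypothetical):

```python
import numpy as np


def project_psd(M):
    """Project a symmetric matrix onto the PSD cone by clipping
    negative eigenvalues to zero. This is the per-iteration step
    that makes MMC-style projected gradient costly in high
    dimension, since it requires a full eigendecomposition."""
    M = (M + M.T) / 2  # symmetrize against numerical noise
    w, V = np.linalg.eigh(M)
    return (V * np.clip(w, 0.0, None)) @ V.T
```

The O(d^3) cost of `eigh` per iteration is exactly why the survey calls the approach intractable for medium- and high-dimensional problems, and why LMNN/ITML-style formulations tend to be preferred in practice.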