Yeah, the API is the most important question for the implementation.
These learners are not classifiers (though there are metric-adapting
algorithms like Neighbourhood Components Analysis
<http://en.wikipedia.org/wiki/Neighbourhood_components_analysis>), so they
don't fit into the usual estimator-like fit + predict scheme.
Another thing to take into consideration is that we may want to use the
learned metric, say, in KNN, so it would be helpful to have a way to get a
DistanceMetric corresponding to the learned metric.
With this in mind, a Transformer instance with a y-aware fit and an
attribute like `metric_` should work.
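For concreteness, here is a rough sketch of what such a transformer could
look like (the class name is made up, and the "learned" matrix below is
just a placeholder identity, not a real algorithm):

    import numpy as np
    from sklearn.base import BaseEstimator, TransformerMixin
    from sklearn.neighbors import DistanceMetric

    class MahalanobisTransformer(BaseEstimator, TransformerMixin):
        """Hypothetical metric-learning transformer (sketch only)."""

        def fit(self, X, y):
            # A real algorithm (LMNN, ITML, ...) would optimize a PSD
            # matrix A here; the identity is only a placeholder.
            self.metric_matrix_ = np.eye(X.shape[1])
            # From the Cholesky factorization A = L L^T, the matrix
            # G = L^T maps the data into a space where plain Euclidean
            # distance coincides with the learned metric.
            self.components_ = np.linalg.cholesky(self.metric_matrix_).T
            return self

        def transform(self, X):
            return np.dot(X, self.components_.T)

        @property
        def metric_(self):
            # Expose the learned metric as a DistanceMetric, so it can
            # be plugged into nearest-neighbour estimators.
            return DistanceMetric.get_metric('mahalanobis',
                                             VI=self.metric_matrix_)

The metric_ property could then be handed to, say, KNeighborsClassifier
through its metric parameter instead of transforming X explicitly.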
As to what y should look like, it depends on what we'd like the algorithm
to do. We could go with the usual y vector consisting of class labels.
Actually, LMNN works this way: its optimization objective depends only on
the equality of labels. For ITML (and many others) we need sets of
(S)imilar and (D)issimilar pairs, which can also be inferred from labels.
This is a bit less generic, since it implies that similarity is
transitive, and that's not true in the general case. For the general case
we'd need a way to feed in actual pairs. This could be done by giving fit
two optional arguments (similar and dissimilar) that default to None and
are inferred from y when absent.
So, the interface would be the usual fit(X, y) if we only want to
facilitate non-linear methods (like KNN):

    space_warper = ITMLTransformer(...)
    new_X = space_warper.fit(X, y).transform(X)

Or, more sophisticated:

    new_X = space_warper.fit(X, similar=S, dissimilar=D).transform(X)

Not all methods support the latter scheme, so the former would be the
default, whereas (S, D)-aware methods would infer those sets from the
labels y.
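Inferring those pair sets from labels is straightforward. A minimal
sketch (the helper name is made up; a real implementation would probably
subsample, since the number of pairs grows quadratically with n_samples):

    from itertools import combinations

    def pairs_from_labels(y):
        # Equal labels give (S)imilar pairs, unequal labels give
        # (D)issimilar ones.
        similar, dissimilar = [], []
        for i, j in combinations(range(len(y)), 2):
            (similar if y[i] == y[j] else dissimilar).append((i, j))
        return similar, dissimilar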
P.S. I didn't consider the case where prior knowledge is given in the form
"X is closer to A than to B", but it can be treated the same way: the set
of relations could be inferred from labels as R = {(X, A, B) : y(A) = y(X),
y(X) != y(B)}.
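In code that inference could look like this (again just an illustrative,
deliberately naive helper):

    from itertools import product

    def triplets_from_labels(y):
        # R = {(x, a, b) : y[x] == y[a] and y[x] != y[b]}
        relations = []
        for x, a, b in product(range(len(y)), repeat=3):
            if x != a and y[x] == y[a] and y[x] != y[b]:
                relations.append((x, a, b))
        return relations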
On Wed, Mar 18, 2015 at 5:53 PM, Andreas Mueller <t3k...@gmail.com> wrote:
> Hey.
> I am not very familiar with the literature on metric learning, but I think
> one thing that we need to think about before
> is what the interface would be.
> We really want something that works in a .fit().predict() or
> .fit().transform() way.
> I guess you could do "transform" to get the distances to the training data
> (is that what one would want?)
> But what would the labels for the "fit" look like?
>
> Cheers,
> Andy
>
>
> On 03/18/2015 08:39 AM, Artem wrote:
>
> Hello everyone
>
> Recently I mentioned metric learning as one of the possible projects for
> this year's GSoC, and I would like to hear your comments.
>
> Metric learning, as follows from the name, is about learning distance
> functions. Usually the metric that is learned is a Mahalanobis metric
> d_A(x, y) = sqrt((x - y)^T A (x - y)), thus the problem reduces to
> finding a PSD matrix A that minimizes some functional.
>
> Metric learning is usually done in a supervised way, that is, a user
> tells which points should be closer and which should be more distant. This
> can be expressed either in the form of "similar" / "dissimilar" pairs, or
> as "A is closer to B than to C".
>
> Since metric learning is (mostly) about a PSD matrix A, one can take a
> Cholesky decomposition of it to obtain a matrix G that transforms the
> data. This could lead to something like guided clustering, where we first
> transform the data space according to our prior knowledge of similarity.
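> For illustration, a rough numpy sketch of that trick (the random matrix
> below just stands in for a learned A):
>
>     import numpy as np
>
>     rng = np.random.RandomState(0)
>     B = rng.randn(3, 3)
>     A = B.T.dot(B) + 1e-6 * np.eye(3)  # stand-in for a learned matrix
>
>     # Cholesky gives A = L L^T, so with G = L^T we have
>     # (x - y)^T A (x - y) == ||G x - G y||^2.
>     # (Strictly, Cholesky needs A positive definite; the semi-definite
>     # case would need an eigendecomposition instead.)
>     G = np.linalg.cholesky(A).T
>
>     x, y = rng.randn(3), rng.randn(3)
>     d_learned = np.sqrt((x - y).dot(A).dot(x - y))
>     d_euclidean = np.linalg.norm(G.dot(x) - G.dot(y))
>     assert np.isclose(d_learned, d_euclidean)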
>
> Metric learning seems to be quite an active field of research ([1
> <http://www.icml2010.org/tutorials.html>], [2
> <http://www.ariel.ac.il/sites/ofirpele/DFML_ECCV2010_tutorial/>], [3
> <http://nips.cc/Conferences/2011/Program/event.php?ID=2543>]). There are
> two somewhat up-to-date surveys: [1
> <http://web.cse.ohio-state.edu/%7Ekulis/pubs/ftml_metric_learning.pdf>]
> and [2 <http://arxiv.org/abs/1306.6709>].
>
> Top 3 seemingly most cited methods (according to Google Scholar) are:
>
> - MMC by Xing et al.
>   <http://papers.nips.cc/paper/2164-distance-metric-learning-with-application-to-clustering-with-side-information.pdf>
>   This is a pioneering work and, according to the survey #2:
>> The algorithm used to solve (1) is a simple projected gradient approach
>> requiring the full eigenvalue decomposition of M at each iteration.
>> This is typically intractable for medium and high-dimensional problems.
>
> - Large Margin Nearest Neighbor (LMNN) by Weinberger et al.
>   <http://papers.nips.cc/paper/2795-distance-metric-learning-for-large-margin-nearest-neighbor-classification.pdf>
>   The survey #2 acknowledges this method as "one of the most widely-used
>   Mahalanobis distance learning methods":
>
>> LMNN generally performs very well in practice, although it is sometimes
>> prone to overfitting due to the absence of regularization, especially in
>> high dimension.
>
>
> - Information-theoretic metric learning (ITML) by Davis et al.
>   <http://dl.acm.org/citation.cfm?id=1273523>
>   This one features a special kind of regularizer called LogDet.
> - There are many other methods. If you guys know that other methods
>   rock, let me know.
>
>
> So the project I'm proposing is about implementing the 2nd or the 3rd (or
> both?) of these algorithms, along with a relevant transformer.