Hi Vlad
1. Metric learning usually uses supervision in one of two forms: either two
sets of pairs, similar (distance at most some predefined threshold u) and
dissimilar (distance at least l), or a set of triplets (x, y, z) such that
d(x, y) < d(x, z). Though I think it's possible to generalize the former to
a case where we control the thresholds u and l for each pair, I'm not sure
it would be useful.
The drawback of a classification-like y is that it induces transitivity on
the notion of similarity, which may not be a good idea.
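To make the two supervision forms concrete, here is a minimal sketch (the data, thresholds, and constraint indices are made up for illustration) of how ITML-style pair constraints with thresholds u, l and triplet constraints could be checked under a Mahalanobis metric M:

```python
import numpy as np

def mahalanobis(x, y, M):
    """Squared Mahalanobis distance d_M(x, y) = (x - y)^T M (x - y)."""
    d = x - y
    return float(d @ M @ d)

# Toy data and an identity metric (reduces to squared Euclidean distance).
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
M = np.eye(3)

# Pairwise supervision: similar pairs should satisfy d <= u,
# dissimilar pairs d >= l.
u, l = 2.0, 8.0
similar = [(0, 1), (2, 3)]
dissimilar = [(0, 4), (1, 5)]
pair_violations = (
    [p for p in similar if mahalanobis(X[p[0]], X[p[1]], M) > u]
    + [p for p in dissimilar if mahalanobis(X[p[0]], X[p[1]], M) < l]
)

# Triplet supervision: (x, y, z) such that d(x, y) < d(x, z).
triplets = [(0, 1, 4), (2, 3, 5)]
triplet_violations = [
    t for t in triplets
    if mahalanobis(X[t[0]], X[t[1]], M) >= mahalanobis(X[t[0]], X[t[2]], M)
]
```

A learning algorithm would then adjust M to shrink these violation sets; the sketch only shows what the constraints express.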
2. I mentioned KNN because it was the first distance-based algorithm I
thought of. Also, the existing literature mostly deals with applications to
classification. One way to approach regression is to use kernel regression
(also known as the Nadaraya-Watson method) with an RBF-like kernel where
the Euclidean distance is replaced by the Mahalanobis distance.
I think one can indeed bin the target ys, learn a metric on top of these
bins, and then use any distance-based regression algorithm.
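A short sketch of what that estimator would look like, assuming a learned PSD matrix M (here M = I for illustration, which reduces to ordinary RBF kernel regression):

```python
import numpy as np

def nadaraya_watson(X_train, y_train, x_query, M, gamma=1.0):
    """Nadaraya-Watson estimate with an RBF kernel in which the
    Euclidean distance is replaced by the Mahalanobis distance
    induced by a PSD matrix M."""
    diffs = X_train - x_query                        # (n, d) differences
    d2 = np.einsum('ij,jk,ik->i', diffs, M, diffs)   # squared Mahalanobis distances
    w = np.exp(-gamma * d2)                          # RBF weights
    return float(w @ y_train / w.sum())              # weighted average of targets

X = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.0, 1.0, 2.0])
y_hat = nadaraya_watson(X, y, np.array([1.0]), np.eye(1))  # 1.0 by symmetry
```

Plugging in a metric learned from binned targets (as above) is then just a matter of passing that M instead of the identity.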
3. Each algorithm (NCA, LMNN, ITML) will have a separate pull request and
will be reviewed separately. I expect to finish the first PR (NCA) before
submitting the last one (ITML). By the end of the 10th week I might still
not have the second review completed, but that's okay; there are 2+ more
weeks to get it done.
On Wed, Mar 25, 2015 at 4:04 AM, Vlad Niculae <zephy...@gmail.com> wrote:
> Hi Artem, hi everybody,
>
> There were two API issues and I think both need thought. The first is the
> matrix-like Y which at the moment overlaps semantically with multilabel and
> multioutput-multiclass (though I think it could be seen as a form of
> multi-target regression…)
>
> The second is the `estimator.metric` which would be a new convention. The
> problem here is proxying fit/predict/{set|get}_params calls to the parent,
> as Joel noted.
>
> IMHO the first is slightly less scary than the second, but I’m not sure
> where we should draw the line.
>
> A few thoughts and questions about your proposal, on top of the excellent
> comments the others gave so far:
>
> The matrix-like Y links to a question I had: you say it only has -1, 1s
> and 0s. But don’t metric learning methods support more fine-grained
> (continuous) values there? Otherwise the expressiveness gain over just
> having a classification y is not that big, is it?
>
> Overall the proposal would benefit by including a bit more detail on the
> metric learning methods and the relationship/differences/tradeoffs between
> them.
>
> Would metric learning be useful for regression in any way? My question was
> triggered by your saying that it could be used in the KNN classifier, which
> made me wonder why not in the regressor. E.g. one could bin the `y`.
>
> Nitpicks:
>
> * what does SWE stand for?
> * missing articles: equivalent to (linear) -> equivalent to a (linear), as
> if trained kernelized -> as if we trained a kernelized, Core contribution->
> The core contribution, expect integration phase -> expect the integration
> phase.
> * I think ITML skips from review #1 to review #3.
>
> Hope this helps,
>
> Yours,
> Vlad
>
> > On 24 Mar 2015, at 20:25, Artem <barmaley....@gmail.com> wrote:
> >
> > You mean matrix-like y?
> >
> > Gael said
> > > FWIW It'll require some changes to cross-validation routines.
> > I'd rather we try not to add new needs and usecases to these before we
> release 1.0. We are already having a hard time covering in a homogeneous
> way all the possible options.
> >
> > Then Andreas
> > 1.2: matrix-like Y should actually be fine with cross-validation. I
> think it would be nice if we could get some benefit by having a
> classification-like y, but I'm not opposed to also allowing matrix Y.
> >
> > So if we don't want to alter API, I suppose this feature should be
> postponed until 1.0?
> >
> >
> > On Wed, Mar 25, 2015 at 1:44 AM, Olivier Grisel <
> olivier.gri...@ensta.org> wrote:
> > I also share Gael's concerns with respect to extending our API in yet
> > another direction at a time where we are trying to focus on ironing
> > out consistency issues...
> >
> > --
> > Olivier
> >
> >
> ------------------------------------------------------------------------------
> > Dive into the World of Parallel Programming The Go Parallel Website,
> sponsored
> > by Intel and developed in partnership with Slashdot Media, is your hub
> for all
> > things parallel software development, from weekly thought leadership
> blogs to
> > news, videos, case studies, tutorials and more. Take a look and join the
> > conversation now. http://goparallel.sourceforge.net/
> > _______________________________________________
> > Scikit-learn-general mailing list
> > Scikit-learn-general@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
> >
> >
>
>
>
>