I mean that if we were solving classification, we would have a vector y
telling us which class each example belongs to. So if we pass that
classification ground-truth vector y to the metric learner's fit, it can
form S and D internally by treating observations from the same class as
similar and observations from different classes as dissimilar.
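For example (a rough sketch only, not an existing scikit-learn API; the
pairs_from_labels name is purely illustrative), inferring S and D from a
class-label vector y could look like this:

    # Illustrative only: build similar (S) and dissimilar (D) index pairs
    # from a classification-style label vector y.
    import itertools
    import numpy as np

    def pairs_from_labels(y):
        """Return (S, D): index pairs with equal / unequal labels."""
        y = np.asarray(y)
        S, D = [], []
        for i, j in itertools.combinations(range(len(y)), 2):
            (S if y[i] == y[j] else D).append((i, j))
        return S, D

    y = np.array([0, 0, 1, 1])
    S, D = pairs_from_labels(y)
    # S == [(0, 1), (2, 3)]
    # D == [(0, 2), (0, 3), (1, 2), (1, 3)]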

> Only being able to "transform" to a distance to the training set is a bit
> limiting

Sorry, I don't understand what you mean by this. Can you elaborate?

The metric does not memorize training samples; it finds a (linear, unless
kernelized) transformation that makes similar examples cluster together.
Moreover, since the metric is completely determined by a PSD matrix, we can
compute that matrix's square root and use it to transform new data without
any supervision.
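
To make that last point concrete, here is a minimal numpy sketch (the matrix
M below is made up for illustration, not the output of any particular
algorithm): since M is PSD, it factors as M = L.T @ L, and mapping x to L @ x
means plain Euclidean distances on the transformed data reproduce the learned
metric.

    # Illustrative sketch: applying a learned Mahalanobis metric via a
    # matrix "square root" of its PSD matrix M.
    import numpy as np

    M = np.array([[2.0, 0.5],
                  [0.5, 1.0]])    # stand-in for a learned PSD matrix
    w, V = np.linalg.eigh(M)      # M = V @ diag(w) @ V.T
    L = np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T   # so M == L.T @ L

    x = np.array([1.0, 2.0])
    z = np.array([0.0, -1.0])
    d_metric = np.sqrt((x - z) @ M @ (x - z))  # distance under the metric
    d_euclid = np.linalg.norm(L @ x - L @ z)   # Euclidean after transform
    assert np.isclose(d_metric, d_euclid)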



On Wed, Mar 18, 2015 at 9:36 PM, Andreas Mueller <t3k...@gmail.com> wrote:

>  The issue is that having anything other than fit(X, y) would break
> cross_val_score, GridSearchCV and Pipeline.
> I agree that more control is good, but having functions that don't work
> well with the rest of the package is not great.
>
> Only being able to "transform" to a distance to the training set is a bit
> limiting, but I don't see a different way to do
> it within the current API.
> Can you explain this statement a bit more: "We can go with usual y vector
> consisting of feature labels"?
>
> Thanks,
> Andy
>
>
> On 03/18/2015 12:55 PM, Artem wrote:
>
>  Well, we could go with fit(X, y), but since the algorithms use S and D, it
> would be better to give the user a way to specify them directly if (s)he
> wants to. Either way, LMNN works with raw labels, so it doesn't require any
> changes to the existing API.
>
> On Wed, Mar 18, 2015 at 7:26 PM, Gael Varoquaux <
> gael.varoqu...@normalesup.org> wrote:
>
>> On Wed, Mar 18, 2015 at 07:21:18PM +0300, Artem wrote:
>> > As to what y should look like, it depends on what we'd like the
>> > algorithm to do. We can go with usual y vector consisting of feature
>> > labels. Actually, LMNN is done this way; the optimization objective
>> > depends on the equality of labels only. For ITML (and many others) we
>> > need sets of (S)imilar and (D)issimilar pairs, which can also be
>> > inferred from labels.
>>
>> > This is a bit less generic, since it would imply that similarity is
>> > transitive, and that's not true in the general case. For the general
>> > case we'd need a way to feed in actual pairs. This could be done by
>> > giving fit two optional arguments (similar and dissimilar) defaulting
>> > to None, in which case they are inferred from y.
>>
>> For now, I don't think that we want to add new variants of the
>> scikit-learn API.
>>
>> G