I don't know a lot about metric learning either, but from your initial
statement it sounded like fit(X, D), where D holds the target/known distance
between each pair of points in X, might be appropriate. I have no idea
whether this is how it is formulated in the literature (your mention of
asymmetric metrics suggests it might be), but it seems an intuitive
representation of the problem. Your suggestion of "similar" and "dissimilar"
groups could be represented by D being a symmetric matrix with some
distances 1 (dissimilar) and others 0 (similar), but you imply that some or
the majority of cells would be unknown (in which case a sparse D, with all
non-explicit values interpreted as unknown, may be appropriate).
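
To make the sparse D idea concrete, here is a rough sketch of what I have in
mind. The constraint triples and all variable names are made up for
illustration, and this is not an existing scikit-learn interface. One thing
to watch: "similar" is encoded as 0, so those entries have to be stored as
explicit zeros, and anything that prunes zeros from D would lose them.

```python
import numpy as np
from scipy import sparse

# Hypothetical pairwise constraints on 4 samples:
# (i, j, target distance), where 0 = similar and 1 = dissimilar.
rows = np.array([0, 0, 2])
cols = np.array([1, 3, 3])
vals = np.array([0.0, 1.0, 1.0])
n_samples = 4

# Mirror every pair so that D is symmetric. COO keeps the values exactly as
# given, including the explicit zeros that mark "similar" pairs.
D = sparse.coo_matrix(
    (np.concatenate([vals, vals]),
     (np.concatenate([rows, cols]), np.concatenate([cols, rows]))),
    shape=(n_samples, n_samples),
)

# Only stored entries are "known"; every other cell is treated as unknown.
# Keep i < j so each unordered pair is listed once.
similar = [(int(i), int(j))
           for i, j, d in zip(D.row, D.col, D.data) if d == 0 and i < j]
dissimilar = [(int(i), int(j))
              for i, j, d in zip(D.row, D.col, D.data) if d == 1 and i < j]
```

A fit(X, D) along these lines could then read the S and D pair sets straight
off the stored entries.
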
I would have thought that, in the case of Mahalanobis distances, transform
would map X into a space in which the learned distance is simply the
Euclidean distance.
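
As a quick sanity check of that intuition, here is a small sketch. The
matrix M below is a random PSD stand-in, not the output of any actual metric
learning method, and the transform function is hypothetical. It uses the
symmetric square root of M and checks that the Euclidean distance after the
transform matches the Mahalanobis distance before it.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(5, 3)

# Stand-in for a learned PSD matrix M (assumption: a real method would fit it).
A = rng.randn(3, 3)
M = np.dot(A.T, A) + 1e-3 * np.eye(3)

# Symmetric square root of M via its eigendecomposition.
w, V = np.linalg.eigh(M)
M_sqrt = np.dot(V * np.sqrt(np.clip(w, 0, None)), V.T)


def transform(X):
    """Map samples so that Euclidean distance equals the distance under M."""
    return np.dot(X, M_sqrt)


# Mahalanobis distance between the first two samples in the original space.
diff = X[0] - X[1]
d_mahalanobis = np.sqrt(np.dot(diff, np.dot(M, diff)))

# Plain Euclidean distance between the same samples after the transform.
Z = transform(X)
d_euclidean = np.linalg.norm(Z[0] - Z[1])
```

The two distances agree up to floating point error, and transform needs no
labels, so it applies to new data as well.
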
On 19 March 2015 at 08:47, Andreas Mueller <t3k...@gmail.com> wrote:
> In summary, I think this does look like a good basis for a proposal :)
>
>
>
> On 03/18/2015 05:14 PM, Artem wrote:
>
>
>> Do you think this interface would be useful enough?
>
> One of the mentioned methods (LMNN) actually uses prior knowledge in
> exactly the same way, by comparing labels for equality. Though, it was
> designed to facilitate KNN.
>
> The authors of the other one (ITML) explicitly mention in the paper that
> one can construct those sets S and D from labels.
>
>> Do you think it would make sense to use such a transformer in a pipeline
>> with a KNN classifier?
>> I feel that training both on the same labels might be a bit of an issue
>> with overfitting.
>
> Pipelining looks like a good way to combine these methods, but overfitting
> could be a problem, indeed.
> Not sure how severe it can be.
>
> On Wed, Mar 18, 2015 at 10:07 PM, Andreas Mueller <t3k...@gmail.com>
> wrote:
>
>>
>> On 03/18/2015 02:53 PM, Artem wrote:
>>
>> I mean that if we were solving classification, we would have y that
>> tells us which class each example belongs to. So if we pass this
>> classification's ground truth vector y to metric learning's fit, we can
>> form S and D inside by saying that observations from the same class should
>> be similar.
>>
>> Ah, I got it now.
>>
>>
>>
>>> Only being able to "transform" to a distance to the training set is a
>>> bit limiting
>>
>> Sorry, I don't understand what you mean by this. Can you elaborate?
>>
>>
>> The metric does not memorize training samples; it finds a (linear
>> unless kernelized) transformation that makes similar examples cluster
>> together. Moreover, since the metric is completely determined by a PSD
>> matrix, we can compute its square root and use it to transform new data
>> without any supervision.
>>
>> Ah, I think I misunderstood your proposal for the transformer interface.
>> Never mind.
>>
>>
>> Do you think this interface would be useful enough? I can think of a
>> couple of applications.
>> It would definitely fit well into the current scikit-learn framework.
>>
>> Do you think it would make sense to use such a transformer in a pipeline
>> with a KNN classifier?
>> I feel that training both on the same labels might be a bit of an issue
>> with overfitting.
>>
>>
>>
>>
>> ------------------------------------------------------------------------------
>> Dive into the World of Parallel Programming The Go Parallel Website,
>> sponsored
>> by Intel and developed in partnership with Slashdot Media, is your hub
>> for all
>> things parallel software development, from weekly thought leadership
>> blogs to
>> news, videos, case studies, tutorials and more. Take a look and join the
>> conversation now. http://goparallel.sourceforge.net/
>> _______________________________________________
>> Scikit-learn-general mailing list
>> Scikit-learn-general@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>
>>
>
>