Yes, your suggestion is viable, but I have not seen any algorithm in
sklearn that uses y like that in its fit method.
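
For what it's worth, here is a minimal sketch of how a label vector y could
be turned into the kind of pairwise target matrix D you describe (0 for
similar, 1 for dissimilar), and into the pair sets S and D that the
metric-learning papers use. The helper name pairwise_targets_from_labels
and the variable names are made up for illustration; nothing here is an
existing sklearn function.

```python
import numpy as np


def pairwise_targets_from_labels(y):
    """Hypothetical helper: 0 where labels match (similar), 1 otherwise."""
    y = np.asarray(y)
    return (y[:, None] != y[None, :]).astype(float)


y = np.array([0, 0, 1, 2, 1])
D = pairwise_targets_from_labels(y)

# Enumerate each unordered pair once (i < j) and split into the two sets.
rows, cols = np.triu_indices(len(y), k=1)
similar_pairs = [(int(i), int(j)) for i, j in zip(rows, cols) if D[i, j] == 0]
dissimilar_pairs = [(int(i), int(j)) for i, j in zip(rows, cols) if D[i, j] == 1]
```

With real data one would probably subsample the pairs, since their number
grows quadratically with the number of samples.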

> I would have thought in the case of Mahalanobis distances that transform
> would transform each feature such that the resulting feature space was
> Euclidean.

Exactly. Thus, methods that use the usual L2 distance (like KMeans) will
effectively be using those custom metrics.
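
To make this concrete, here is a small numerical sketch. It assumes the
learned metric is given by some PSD matrix M (here just a random one, not
the output of any real metric-learning algorithm) and checks that the
Mahalanobis distance under M equals the plain Euclidean distance after
transforming the points with a square root of M.

```python
import numpy as np

rng = np.random.default_rng(0)

# An arbitrary PSD matrix standing in for a learned metric (assumption).
A = rng.normal(size=(3, 3))
M = A.T @ A

# Symmetric square root via eigendecomposition, so that L @ L == M.
eigvals, eigvecs = np.linalg.eigh(M)
L = eigvecs @ np.diag(np.sqrt(np.clip(eigvals, 0.0, None))) @ eigvecs.T

x = rng.normal(size=3)
z = rng.normal(size=3)

# Mahalanobis distance under M.
diff = x - z
d_mahalanobis = np.sqrt(diff @ M @ diff)

# Euclidean distance after transforming both points with L.
d_euclidean = np.linalg.norm(L @ x - L @ z)
```

The two distances agree up to floating-point error, so a transform that
maps a data matrix X to X @ L would let any L2-based estimator use the
metric without knowing about it.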

Also, one can do the kernel trick to get a metric for a non-linear
transformation.
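
As a sketch of the basic identity behind this (not a kernelized metric
learner): the distance in the feature space induced by a kernel k can be
computed from kernel evaluations alone, since
||phi(a) - phi(b)||^2 = k(a, a) - 2 k(a, b) + k(b, b). The function names
and the gamma value below are assumptions chosen for illustration.

```python
import numpy as np


def linear_kernel(a, b):
    return float(a @ b)


def rbf_kernel(a, b, gamma=0.5):
    return float(np.exp(-gamma * np.sum((a - b) ** 2)))


def kernel_distance(a, b, kernel):
    """Distance between phi(a) and phi(b) using only kernel evaluations."""
    sq = kernel(a, a) - 2.0 * kernel(a, b) + kernel(b, b)
    return float(np.sqrt(max(sq, 0.0)))  # guard against tiny negative values


x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])

d_linear = kernel_distance(x, z, linear_kernel)  # same as Euclidean distance
d_rbf = kernel_distance(x, z, rbf_kernel)        # distance in RBF feature space
```

With the linear kernel this reduces to the ordinary Euclidean distance;
with a non-linear kernel it is a distance in the implicit feature space.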

On Thu, Mar 19, 2015 at 5:35 AM, Joel Nothman <joel.noth...@gmail.com>
wrote:

> I don't know a lot about metric learning either, but it sounded like from
> your initial statement that fit(X, D) where D is the target/known distance
> between each point in X might be appropriate. I have no idea if this is how
> it is formulated in the literature (your mention of asymmetric metrics
> means it might be), but it seems an intuitive representation of the problem.
>
> Your suggestion of "similar" and "dissimilar" groups could be represented
> by D being a symmetric matrix with some distances 1 (dissimilar) and others
> 0 (similar), but you imply that some or the majority of cells would be
> unknown (in which case a sparse D interpreting all non-explicit values as
> unknown may be appropriate).
>
> I would have thought in the case of Mahalanobis distances that transform
> would transform each feature such that the resulting feature space was
> Euclidean.
>
> On 19 March 2015 at 08:47, Andreas Mueller <t3k...@gmail.com> wrote:
>
>>  In summary, I think this does look like a good basis for a proposal :)
>>
>>
>>
>> On 03/18/2015 05:14 PM, Artem wrote:
>>
>>
>>> Do you think this interface would be useful enough?
>>
>> One of the mentioned methods (LMNN) actually uses prior knowledge in
>> exactly the same way, by comparing labels for equality, although it was
>> designed to facilitate KNN.
>>
>> The authors of the other one (ITML) explicitly mention in the paper that
>> one can construct those sets S and D from labels.
>>
>>> Do you think it would make sense to use such a transformer in a pipeline
>>> with a KNN classifier?
>>> I feel that training both on the same labels might be a bit of an issue
>>> with overfitting
>>
>> Pipelining looks like a good way to combine these methods, but
>> overfitting could indeed be a problem.
>> I am not sure how severe it can be.
>>
>> On Wed, Mar 18, 2015 at 10:07 PM, Andreas Mueller <t3k...@gmail.com>
>> wrote:
>>
>>>
>>> On 03/18/2015 02:53 PM, Artem wrote:
>>>
>>>  I mean that if we were solving classification, we would have y that
>>> tells us which class each example belongs to. So if we pass this
>>> classification's ground truth vector y to metric learning's fit, we can
>>> form S and D inside by saying that observations from the same class should
>>> be similar.
>>>
>>>  Ah, I got it now.
>>>
>>>
>>>
>>>> Only being able to "transform" to a distance to the training set is a
>>>> bit limiting
>>>
>>> Sorry, I don't understand what you mean by this. Can you elaborate?
>>>
>>>  The metric does not memorize training samples; it finds a (linear
>>> unless kernelized) transformation that makes similar examples cluster
>>> together. Moreover, since the metric is completely determined by a PSD
>>> matrix, we can compute its square root and use it to transform new data
>>> without any supervision.
>>>
>>>  Ah, I think I misunderstood your proposal for the transformer
>>> interface. Never mind.
>>>
>>>
>>> Do you think this interface would be useful enough? I can think of a
>>> couple of applications.
>>> It would definitely fit well into the current scikit-learn framework.
>>>
>>> Do you think it would make sense to use such a transformer in a pipeline
>>> with a KNN classifier?
>>> I feel that training both on the same labels might be a bit of an issue
>>> with overfitting.
>>>
>>>
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> Dive into the World of Parallel Programming The Go Parallel Website,
>>> sponsored
>>> by Intel and developed in partnership with Slashdot Media, is your hub
>>> for all
>>> things parallel software development, from weekly thought leadership
>>> blogs to
>>> news, videos, case studies, tutorials and more. Take a look and join the
>>> conversation now. http://goparallel.sourceforge.net/
>>> _______________________________________________
>>> Scikit-learn-general mailing list
>>> Scikit-learn-general@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>
>>>
>>
>>
>
>