In summary, I think this does look like a good basis for a proposal :)
On 03/18/2015 05:14 PM, Artem wrote:
Do you think this interface would be useful enough?
One of the mentioned methods (LMNN) actually uses prior knowledge in
exactly the same way, by comparing labels for equality, though it was
designed to facilitate KNN.
The authors of the other one (ITML) explicitly mention in the paper that
one can construct those sets S and D from labels.
Do you think it would make sense to use such a transformer in a
pipeline with a KNN classifier?
I feel that training both on the same labels might be a bit of an
issue with overfitting.
Pipelining looks like a good way to combine these methods, but
overfitting could be a problem, indeed.
Not sure how severe it can be.
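One way to get a feel for how severe it is would be to cross-validate the whole pipeline, so that both the metric and the KNN step are fit on the training folds only. Below is a minimal sketch, not an implementation of LMNN or ITML: `WithinClassWhitener` is a hypothetical stand-in transformer that uses the inverse of the pooled within-class covariance as its metric, and the iris data and `n_neighbors=5` are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline


class WithinClassWhitener(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in for a metric learner (NOT LMNN or ITML)."""

    def __init__(self, reg=1e-6):
        self.reg = reg

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        # Pooled within-class scatter: the labels are the only supervision.
        centered = np.vstack(
            [X[y == c] - X[y == c].mean(axis=0) for c in np.unique(y)]
        )
        cov = centered.T @ centered / len(X) + self.reg * np.eye(X.shape[1])
        # Metric M = cov^{-1}; keep its symmetric square root for transform().
        w, V = np.linalg.eigh(cov)
        self.L_ = (V / np.sqrt(w)) @ V.T
        return self

    def transform(self, X):
        # No labels needed here: new data is simply mapped linearly.
        return np.asarray(X, dtype=float) @ self.L_.T


X, y = load_iris(return_X_y=True)
pipe = Pipeline([
    ("metric", WithinClassWhitener()),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
])
# Each fold refits both steps on the training part only, so the held-out
# score reflects any overfitting caused by reusing the same labels.
scores = cross_val_score(pipe, X, y, cv=5)
print(scores.mean())
```

Comparing these held-out scores with the accuracy on the training data itself would give a rough idea of the gap.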
On Wed, Mar 18, 2015 at 10:07 PM, Andreas Mueller <t3k...@gmail.com> wrote:
On 03/18/2015 02:53 PM, Artem wrote:
I mean that if we were solving a classification problem, we would have
a y that tells us which class each example belongs to. So if we pass
this ground truth vector y to metric learning's fit, we can form S and
D inside by saying that observations from the same class should be
similar (S) and observations from different classes dissimilar (D).
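To make this concrete, here is a minimal sketch of how S and D could be formed from y. The helper name `pairs_from_labels` is made up for illustration, and enumerating all pairs is quadratic in the number of samples, so a real implementation would presumably subsample.

```python
from itertools import combinations

import numpy as np


def pairs_from_labels(y):
    """Return (S, D): index pairs with equal / different labels."""
    y = np.asarray(y)
    S, D = [], []
    for i, j in combinations(range(len(y)), 2):
        # Same class -> similar pair, different class -> dissimilar pair.
        (S if y[i] == y[j] else D).append((i, j))
    return S, D


y = [0, 0, 1, 1]
S, D = pairs_from_labels(y)
print(S)  # pairs that should end up close together
print(D)  # pairs that should end up far apart
```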
Ah, I got it now.
Only being able to "transform" to a distance to the training
set is a bit limiting
Sorry, I don't understand what you mean by this. Can you elaborate?
The metric does not memorize training samples; it finds a (linear
unless kernelized) transformation that makes similar examples
cluster together. Moreover, since the metric is completely
determined by a PSD matrix, we can compute its square root and
use it to transform new data without any supervision.
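As a rough illustration of the square-root point, the sketch below uses a random PSD matrix M as a placeholder for a learned metric (nothing is learned here). It checks that the squared distance under M equals the squared Euclidean distance after mapping the points with the symmetric square root of M.

```python
import numpy as np


def psd_sqrt(M):
    """Symmetric square root of a PSD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    w = np.clip(w, 0.0, None)  # guard against tiny negative eigenvalues
    return (V * np.sqrt(w)) @ V.T


rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
M = A.T @ A                        # placeholder PSD "metric"
L = psd_sqrt(M)                    # L is symmetric and L @ L equals M

X_new = rng.normal(size=(5, 3))    # unseen, unlabeled data
X_t = X_new @ L.T                  # transformed without any supervision

diff = X_new[0] - X_new[1]
d_metric = diff @ M @ diff                    # (x - x')^T M (x - x')
d_euclid = np.sum((X_t[0] - X_t[1]) ** 2)     # ||Lx - Lx'||^2
print(np.allclose(d_metric, d_euclid))
```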
Ah, I think I misunderstood your proposal for the transformer
interface. Never mind.
Do you think this interface would be useful enough? I can think of
a couple of applications.
It would definitely fit well into the current scikit-learn framework.
Do you think it would make sense to use such a transformer in a
pipeline with a KNN classifier?
I feel that training both on the same labels might be a bit of an
issue with overfitting.
------------------------------------------------------------------------------
Dive into the World of Parallel Programming The Go Parallel
Website, sponsored
by Intel and developed in partnership with Slashdot Media, is your
hub for all
things parallel software development, from weekly thought
leadership blogs to
news, videos, case studies, tutorials and more. Take a look and
join the
conversation now. http://goparallel.sourceforge.net/
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
<mailto:Scikit-learn-general@lists.sourceforge.net>
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general