>
> Are there any objections to Joel's variant of y? It serves my needs, but
> is quite different from what one usually finds in scikit-learn.
FWIW, it'll require some changes to the cross-validation routines.
On 22 March 2015 at 11:54, Artem <barmaley....@gmail.com> wrote:
> Are there any objections to Joel's variant of y? It serves my needs, but
> is quite different from what one usually finds in scikit-learn.
>
> ------
>
> Another point I want to bring up is metric-aware KMeans. Currently it
> works with Euclidean distance only, which is not a problem for a
> Mahalanobis distance, but as (and if) we move towards kernel metrics, it
> becomes impossible to transform the data in a way that the Euclidean
> distance between the transformed points accurately reflects the distance
> between the points in a space with the learned metric.
>
> I think it'd be nice to have "non-linear" metrics, too. One of the possible
> approaches (widely recognized among researchers on metric learning) is to
> use KernelPCA before learning the metric. This would work really well with
> sklearn's Pipelines.
> But not all methods are justified for use with Kernel PCA. Namely,
> ITML uses a special kind of regularization that breaks all theoretical
> guarantees.
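
A minimal sketch of the "KernelPCA before metric learning" idea as a Pipeline. Assumptions: it uses NeighborhoodComponentsAnalysis from current scikit-learn releases as a stand-in metric learner (it was not merged when this thread was written), and the dataset, kernel, gamma and n_components values are arbitrary illustrative choices.

```python
from sklearn.datasets import make_moons
from sklearn.decomposition import KernelPCA
from sklearn.neighbors import KNeighborsClassifier, NeighborhoodComponentsAnalysis
from sklearn.pipeline import Pipeline

# Toy data that is not linearly separable (illustrative choice).
X, y = make_moons(n_samples=200, noise=0.1, random_state=0)

pipe = Pipeline([
    # Non-linear feature map; kernel and gamma are assumed values.
    ("kpca", KernelPCA(n_components=10, kernel="rbf", gamma=2.0)),
    # Linear metric learner applied in the kernel-induced feature space.
    ("nca", NeighborhoodComponentsAnalysis(random_state=0)),
    # Consumer of the learned representation.
    ("knn", KNeighborsClassifier(n_neighbors=3)),
])
pipe.fit(X, y)
train_accuracy = pipe.score(X, y)
```

Whether a given metric learner keeps its guarantees after the KernelPCA step is a separate question, as noted above for ITML.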
>
> Also, it's a bit weird that something called metric learning actually
> performs a space transformation. Maybe we should also add something like
> factories of metrics, whose sole result is a DistanceMetric (in particular
> for those kernel metrics)?
>
> On Fri, Mar 20, 2015 at 10:01 AM, Gael Varoquaux <
> gael.varoqu...@normalesup.org> wrote:
>
>> On Fri, Mar 20, 2015 at 11:50:37AM +1100, Zay Maung Maung Aye wrote:
>> > Neighborhood Component Analysis is more cited than ITML.
>>
>> There is already a pull request on neighborhood component analysis
>> https://github.com/scikit-learn/scikit-learn/issues/3213
>>
>> A first step of the GSoC could be to complete it.
>>
>> Gaël
>>
>> > On Wed, Mar 18, 2015 at 11:39 PM, Artem <barmaley....@gmail.com> wrote:
>>
>> > Hello everyone
>>
>> > Recently I mentioned metric learning as one of the possible projects
>> > for this year's GSoC, and would like to hear your comments.
>>
>> > Metric learning, as follows from the name, is about learning distance
>> > functions. Usually the metric that is learned is a Mahalanobis metric,
>> > thus the problem reduces to finding a PSD matrix A that minimizes some
>> > functional.
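
For reference, the standard form of the Mahalanobis distance parameterized by a PSD matrix A is written out below. The loss symbol \mathcal{L} and the constraint set \mathcal{C} are placeholder names for "some functional" and the supervision; they are not taken from the thread.

```latex
d_A(x, y) = \sqrt{(x - y)^\top A \,(x - y)}, \qquad A \succeq 0,
\qquad\text{with}\qquad
A^\star = \arg\min_{A \succeq 0} \; \mathcal{L}(A;\, \mathcal{C}).
```

Setting A to the identity recovers the ordinary Euclidean distance.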
>>
>> > Metric learning is usually done in a supervised way, that is, a user
>> > tells which points should be closer and which should be more distant.
>> > It can be expressed either in the form of "similar" / "dissimilar", or
>> > "A is closer to B than to C".
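
A rough sketch of the two kinds of supervision mentioned above, represented as index pairs and index triplets. Deriving them from class labels is only one possible source of constraints, and the helper names pairs_from_labels and triplets_from_labels are made up for illustration.

```python
from itertools import combinations


def pairs_from_labels(labels):
    """Split all index pairs into "similar" (same label) and "dissimilar"."""
    similar, dissimilar = [], []
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            similar.append((i, j))
        else:
            dissimilar.append((i, j))
    return similar, dissimilar


def triplets_from_labels(labels):
    """Build (a, b, c) triplets meaning "a is closer to b than to c"."""
    triplets = []
    for a, label_a in enumerate(labels):
        for b, label_b in enumerate(labels):
            if b == a or label_b != label_a:
                continue  # b must be a different point with the same label
            for c, label_c in enumerate(labels):
                if label_c != label_a:  # c must carry a different label
                    triplets.append((a, b, c))
    return triplets


labels = [0, 0, 1]
similar, dissimilar = pairs_from_labels(labels)
triplets = triplets_from_labels(labels)
```

Both helpers enumerate every possible constraint, which grows quickly with the number of samples; a real implementation would likely subsample.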
>>
>> > Since metric learning is (mostly) about a PSD matrix A, one can do a
>> > Cholesky decomposition on it to obtain a matrix G to transform the
>> > data. It could lead to something like guided clustering, where we first
>> > transform the data space according to our prior knowledge of similarity.
>>
>> > Metric learning seems to be quite an active field of research ([1],
>> > [2], [3]). There are 2 somewhat up-to-date surveys: [1] and [2].
>>
>> > Top 3 seemingly most cited methods (according to Google Scholar) are
>>
>> > □ MMC by Xing et al. This is a pioneering work and, according to
>> >   survey #2:
>>
>> >       The algorithm used to solve (1) is a simple projected gradient
>> >       approach requiring the full eigenvalue decomposition of M at
>> >       each iteration. This is typically intractable for medium and
>> >       high-dimensional problems
>>
>> > □ Large Margin Nearest Neighbor by Weinberger et al. Survey #2
>> >   acknowledges this method as "one of the most widely-used Mahalanobis
>> >   distance learning methods":
>>
>> >       LMNN generally performs very well in practice, although it is
>> >       sometimes prone to overfitting due to the absence of
>> >       regularization, especially in high dimension
>>
>> > □ Information-theoretic metric learning by Davis et al. This one
>> >   features a special kind of regularizer called logDet.
>> > □ There are many other methods. If you guys know that other methods
>> >   rock, let me know.
>>
>> > So the project I'm proposing is about implementing the 2nd or 3rd (or
>> > both?) algorithm along with a relevant transformer.
>>
>> >
>>
>> ------------------------------------------------------------------------------
>> > Dive into the World of Parallel Programming The Go Parallel Website,
>> > sponsored
>> > by Intel and developed in partnership with Slashdot Media, is your
>> hub for
>> > all
>> > things parallel software development, from weekly thought
>> leadership blogs
>> > to
>> > news, videos, case studies, tutorials and more. Take a look and
>> join the
>> > conversation now. http://goparallel.sourceforge.net/
>> > _______________________________________________
>> > Scikit-learn-general mailing list
>> > Scikit-learn-general@lists.sourceforge.net
>> > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>> --
>> Gael Varoquaux
>> Researcher, INRIA Parietal
>> Laboratoire de Neuro-Imagerie Assistee par Ordinateur
>> NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France
>> Phone: ++ 33-1-69-08-79-68
>> http://gael-varoquaux.info
>> http://twitter.com/GaelVaroquaux
>>
>>
>>
>
>
>
>
>