Thanks for the update.

> So, what's the consensus on benchmarks? I can share IPython notebooks
> via gist, for example.

My (weak) preference would be to have a script within the sklearn repo,
just to keep stuff in one place for easy future reference.

Michael
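For reference, a repo benchmark script along those lines can stay very small. The sketch below just times the O(n^2) softmax computation that dominates each NCA objective evaluation; the file name bench_nca.py, the dimensions, and the sizes are made up for illustration:

    # Hypothetical benchmarks/bench_nca.py: time the O(n^2) softmax that
    # dominates each NCA objective and gradient evaluation.
    from time import time

    import numpy as np

    rng = np.random.RandomState(0)
    for n_samples in [250, 500, 1000]:
        Z = rng.randn(n_samples, 10)          # stand-in for embedded points
        tic = time()
        dist = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
        np.fill_diagonal(dist, np.inf)        # exclude self-matches
        P = np.exp(-dist)
        P /= P.sum(axis=1, keepdims=True)     # softmax over neighbors
        print("n_samples=%4d: %.3f s" % (n_samples, time() - tic))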
On Fri, May 29, 2015 at 6:24 PM, Artem <barmaley....@gmail.com> wrote:

> So, I created a WIP PR dedicated to NCA:
> https://github.com/scikit-learn/scikit-learn/pull/4789
>
> As suggested by Michael, I refactored "the meat" into a function. I also
> rewrote it as a first-order oracle, so I can (and do) use scipy's
> optimizers. I've seen scipy.optimize.minimize (apparently with BFGS)
> sometimes stop at a weird point (a local minimum / saddle point?),
> whereas gradient descent seems to always converge. I haven't tested
> either of them extensively, though.
>
> I also fully vectorized the function and gradient calculations; no loops
> are involved.
>
> So, what's the consensus on benchmarks? I can share IPython notebooks
> via gist, for example.
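For concreteness, here is a minimal sketch of the kind of first-order oracle Artem describes: one fully vectorized function returning the (negated) NCA objective and its gradient, which scipy.optimize.minimize can consume with jac=True. The gradient follows the formula from the original NCA paper (Goldberger et al., 2004); all names and details are illustrative assumptions, not the actual PR code:

    import numpy as np
    from scipy.optimize import minimize

    def nca_oracle(L_flat, X, y, n_components):
        # Negated NCA objective and gradient, fully vectorized.
        n_samples, n_features = X.shape
        L = L_flat.reshape(n_components, n_features)
        Z = X @ L.T                                  # embedded points, (n, k)

        diff = Z[:, None, :] - Z[None, :, :]         # (n, n, k) pairwise diffs
        dist = (diff ** 2).sum(axis=-1)              # squared distances
        np.fill_diagonal(dist, np.inf)               # a point never picks itself

        logits = -dist
        logits -= logits.max(axis=1, keepdims=True)  # stabilize the softmax
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)            # p_ij, rows sum to one

        same = y[:, None] == y[None, :]              # same-class indicator
        p_i = (P * same).sum(axis=1)                 # P(i classified correctly)
        f = p_i.sum()                                # objective to maximize

        # Gradient (Goldberger et al., eq. 5), with w_ij = p_i * p_ij minus
        # p_ij for same-class j, vectorized via the identity
        # sum_ij w_ij (x_i - x_j)(x_i - x_j)^T = X^T (diag(r + c) - W - W^T) X
        # where r and c are the row and column sums of W.
        W = p_i[:, None] * P - P * same
        s = W.sum(axis=1) + W.sum(axis=0)
        M = (X * s[:, None]).T @ X - X.T @ (W + W.T) @ X
        grad = 2 * L @ M

        return -f, -grad.ravel()                     # minimize the negative

    # Usage sketch: learn a 2D metric on iris.
    from sklearn.datasets import load_iris
    iris = load_iris()
    X, y = iris.data, iris.target
    rng = np.random.RandomState(0)
    L0 = rng.randn(2, X.shape[1]).ravel() * 0.1
    res = minimize(nca_oracle, L0, args=(X, y, 2), jac=True, method="L-BFGS-B")
    L_learned = res.x.reshape(2, X.shape[1])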
> On Fri, May 29, 2015 at 10:51 AM, Michael Eickenberg
> <michael.eickenb...@gmail.com> wrote:
>
>> Hi Aurélien,
>>
>> thanks for these very good pointers!
>> (Now we also know who else to bug periodically for opinions ;))
>>
>> Michael
>>
>> On Fri, May 29, 2015 at 12:05 AM, Aurélien Bellet
>> <aurelien.bel...@telecom-paristech.fr> wrote:
>>
>>> Hi everyone,
>>>
>>> A few additional things to consider for scaling up NCA to large
>>> datasets:
>>>
>>> - Take a look at the t-SNE implementations (a technique for
>>> visualization / dimensionality reduction that is very similar to NCA);
>>> I think they have a few speed-up tricks you could potentially reuse:
>>> http://lvdmaaten.github.io/tsne/
>>>
>>> - Like you said, SGD can help reduce the computational cost. You could
>>> also consider recent improvements on SGD, such as SAG/SAGA, SVRG, etc.
>>>
>>> - Similarly to what was suggested in previous replies, a general idea
>>> is to only consider a neighborhood around each point (either fixed in
>>> advance, or updated every now and then during optimization). Since the
>>> probabilities decrease very fast with distance, farther points can be
>>> safely ignored in the computation. This is explored, for instance, in:
>>> http://dl.acm.org/citation.cfm?id=2142432
>>>
>>> - Another related idea is to construct class representatives (for
>>> instance using k-means) and to model the distribution only with
>>> respect to these points instead of the entire dataset. This is
>>> especially useful if some classes are very large. An extreme version
>>> of this reframes NCA as a Nearest Class Mean classifier, where each
>>> class is modeled only by its center:
>>> https://hal.archives-ouvertes.fr/file/index/docid/722313/filename/mensink12eccv.final.pdf
>>>
>>> Hope this helps.
>>>
>>> Aurelien
>>>
>>> On 5/28/15 11:20 PM, Andreas Mueller wrote:
>>> >
>>> > On 05/28/2015 05:11 PM, Michael Eickenberg wrote:
>>> >>
>>> >> Code-wise, I would attack the problem as a function first. Write a
>>> >> function that takes X and y (plus maybe some options) and gives
>>> >> back L. You can put a skeleton of a sklearn estimator around it by
>>> >> calling this function from fit.
>>> >> Please keep your code either in a sklearn WIP PR or a public gist,
>>> >> so it can be reviewed. Writing benchmarks can be framed as writing
>>> >> examples, i.e. plot_* functions (maybe Andy or Olivier have a
>>> >> comment on how benchmarks have been handled in the past?).
>>> >>
>>> > There is a "benchmark" folder, but it is in horrible shape.
>>> > Basically there are three ways to do it: examples (with or without a
>>> > plot, depending on the runtime), a script in the benchmark folder,
>>> > or a gist. Often we just use a gist and the PR author posts the
>>> > output. Not that great for reproducibility, though.
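To make Aurélien's neighborhood suggestion concrete: restrict the softmax to each point's k nearest neighbors, so the negligible far-away terms are dropped entirely. A rough sketch follows; fixing k once in the input space and using sklearn's NearestNeighbors are illustrative choices, not necessarily what the cited paper does:

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def truncated_softmax_probs(Z, neighbors):
        # NCA's p_ij restricted to precomputed neighbor sets: O(n * k) time
        # and memory instead of O(n^2); far points get probability exactly 0.
        d = ((Z[:, None, :] - Z[neighbors]) ** 2).sum(axis=-1)  # (n, k)
        E = np.exp(-(d - d.min(axis=1, keepdims=True)))         # stabilized
        return E / E.sum(axis=1, keepdims=True)                 # rows sum to 1

    rng = np.random.RandomState(0)
    X = rng.randn(500, 10)

    # Fix the neighborhoods once in the input space; they could instead be
    # refreshed every few iterations as the learned metric changes.
    k = 50
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    neighbors = nn.kneighbors(X, return_distance=False)[:, 1:]  # drop self
    P = truncated_softmax_probs(X, neighbors)                   # here Z = X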
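Finally, tying this back to Michael's earlier advice (a standalone function that takes X and y and gives back L, with a skeleton of a sklearn estimator around it calling that function from fit), the wrapper might look roughly like the following. The class name, parameters, and attributes are invented for illustration, and it assumes the nca_oracle sketch above is in scope:

    import numpy as np
    from scipy.optimize import minimize
    from sklearn.base import BaseEstimator, TransformerMixin

    class NCA(BaseEstimator, TransformerMixin):
        # Skeleton only: all the actual work lives in the standalone
        # nca_oracle function, which fit hands to a scipy optimizer.

        def __init__(self, n_components=2, random_state=0):
            self.n_components = n_components
            self.random_state = random_state

        def fit(self, X, y):
            X = np.asarray(X, dtype=float)
            y = np.asarray(y)
            rng = np.random.RandomState(self.random_state)
            L0 = rng.randn(self.n_components, X.shape[1]).ravel() * 0.1
            res = minimize(nca_oracle, L0, args=(X, y, self.n_components),
                           jac=True, method="L-BFGS-B")
            self.components_ = res.x.reshape(self.n_components, X.shape[1])
            return self

        def transform(self, X):
            return np.asarray(X, dtype=float) @ self.components_.T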