I added a simple benchmark
<https://github.com/Barmaley-exe/scikit-learn/blob/metric-learning/benchmarks/bench_nca.py>
that
compares NCA-assisted 1-NN with the default one (Euclidean distance) on
the Wine dataset (one of the datasets reported in the NCA paper). See my
output here: <https://gist.github.com/Barmaley-exe/a713f23f74eb53a2f2bd>.

It also compares the semivectorized and fully vectorized implementations:
surprisingly, the semivectorized one is about 2 times faster. I think this
might be a reason to throw the fully vectorized implementation
(nca_vectorized_oracle) away.
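To make the comparison concrete, the two strategies differ roughly like this (an illustrative sketch of the two oracle styles, not the actual code from the branch, and only the objective value, without the gradient):

```python
import numpy as np

def nca_objective_semivectorized(L, X, y):
    """Loop over samples i, vectorize over candidate neighbors j."""
    LX = X @ L.T
    f = 0.0
    for i in range(len(X)):
        d = LX - LX[i]                       # differences to sample i, shape (n, r)
        dist = np.einsum('ij,ij->i', d, d)   # squared distances ||L x_i - L x_j||^2
        dist[i] = np.inf                     # exclude j == i from the softmax
        p = np.exp(-dist)
        p /= p.sum()
        f += p[y == y[i]].sum()              # prob. that i is classified correctly
    return f

def nca_objective_vectorized(L, X, y):
    """Same objective, but building the full (n, n, r) difference tensor at once."""
    LX = X @ L.T
    diff = LX[:, None, :] - LX[None, :, :]   # O(n^2 r) memory -- the likely culprit
    dist = np.einsum('ijk,ijk->ij', diff, diff)
    np.fill_diagonal(dist, np.inf)
    p = np.exp(-dist)
    p /= p.sum(axis=1, keepdims=True)
    return (p * (y[:, None] == y[None, :])).sum()
```

The fully vectorized version avoids the Python loop but allocates an (n, n, r) tensor, so the extra memory traffic can easily eat the speedup, which would be consistent with the benchmark numbers.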

On Sat, May 30, 2015 at 12:33 AM, Michael Eickenberg <
michael.eickenb...@gmail.com> wrote:

> Thanks for the update.
>
> > So, what's the consensus on benchmarks? I can share ipython notebooks
> via gist, for example.
>
> My (weak) preference would be to have a script within the sklearn repo,
> just to keep stuff in one place for easy future reference.
>
> Michael
>
>
> On Fri, May 29, 2015 at 6:24 PM, Artem <barmaley....@gmail.com> wrote:
>
>> So, I created a WIP PR dedicated to NCA:
>> https://github.com/scikit-learn/scikit-learn/pull/4789
>>
>> As suggested by Michael, I refactored "the meat" into a function. I also
>> rewrote it as a first-order oracle, so I can (and do) use scipy's
>> optimizers. I've seen scipy.optimize.minimize (apparently with BFGS)
>> sometimes stop at a weird point (a local minimum or a saddle point?),
>> whereas gradient descent seems to always converge. That said, I haven't
>> tested either of them extensively.
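For reference, the first-order-oracle convention with scipy looks roughly like this (a toy least-squares objective stands in for the real NCA oracle, which lives in the PR; note scipy minimizes, so NCA's objective would be negated):

```python
import numpy as np
from scipy.optimize import minimize

def oracle(L_flat, X, y, r):
    """First-order oracle: return (objective, flat gradient) in one call.
    A toy least-squares objective stands in for the NCA one here."""
    L = L_flat.reshape(r, X.shape[1])
    R = L @ X.T - y                  # residuals, shape (r, n); y broadcasts
    f = 0.5 * (R ** 2).sum()
    grad = R @ X                     # d f / d L, shape (r, d)
    return f, grad.ravel()

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
y = X[:, 0]                          # target exactly recoverable by a linear map
res = minimize(oracle, x0=np.zeros(4), args=(X, y, 1),
               jac=True,             # jac=True: fun returns (f, grad)
               method='L-BFGS-B')
```

Working on the flattened matrix and reshaping inside the oracle is what lets the same function plug into any of scipy's gradient-based methods.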
>>
>> I also fully vectorized the function and gradient calculations; no loops
>> involved.
>>
>> So, what's the consensus on benchmarks? I can share ipython notebooks via
>> gist, for example.
>>
>> On Fri, May 29, 2015 at 10:51 AM, Michael Eickenberg <
>> michael.eickenb...@gmail.com> wrote:
>>
>>> Hi Aurélien,
>>>
>>> thanks for these very good pointers!
>>> (Now we also know who else to bug periodically for opinions ;))
>>>
>>> Michael
>>>
>>>
>>> On Fri, May 29, 2015 at 12:05 AM, Aurélien Bellet <
>>> aurelien.bel...@telecom-paristech.fr> wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> A few additional things to consider for scaling-up NCA to large
>>>> datasets:
>>>>
>>>> - Take a look at the t-SNE (technique for visualization/dim reduction
>>>> very similar to NCA) implementations, I think they have a few speed-up
>>>> tricks that you could potentially re-use:
>>>> http://lvdmaaten.github.io/tsne/
>>>>
>>>> - Like you said, SGD can help reduce the computational cost - you could
>>>> also consider recent improvements of SGD, such as SAG/SAGA, SVRG, etc.
>>>>
>>>> - Similarly to what was suggested in previous replies, a general idea
>>>> is to only consider a neighborhood around each point (either fixed in
>>>> advance, or updated every now and then during the course of
>>>> optimization): since the probabilities decrease very fast with
>>>> distance, farther points can be safely ignored in the computation.
>>>> This is explored, for instance, in:
>>>> http://dl.acm.org/citation.cfm?id=2142432
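A minimal sketch of that truncation idea (illustrative only: the full distance matrix is still formed here, while a real implementation would use a tree query or fixed, periodically refreshed neighbor lists):

```python
import numpy as np

def nca_objective_truncated(LX, y, k):
    """NCA objective with each point's softmax restricted to its k nearest
    neighbors in the projected space LX = X @ L.T."""
    d2 = ((LX[:, None, :] - LX[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)                # self is never a neighbor
    nn = np.argsort(d2, axis=1)[:, :k]          # k nearest neighbors per point
    f = 0.0
    for i in range(len(LX)):
        p = np.exp(-d2[i, nn[i]])
        p /= p.sum()                            # renormalize over the neighborhood
        f += p[y[nn[i]] == y[i]].sum()
    return f
```

With k = n - 1 this reduces to the exact objective; with small k the per-point cost drops from O(n) to O(k) once the neighbor lists are available.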
>>>>
>>>> - Another related idea is to construct class representatives (for
>>>> instance using k-means), and to model the distribution only wrt these
>>>> points instead of the entire dataset. This is especially useful if some
>>>> classes are very large. An extreme version of this is to reframe NCA for
>>>> a Nearest Class Mean classifier, where each class is only modeled by its
>>>> center:
>>>>
>>>> https://hal.archives-ouvertes.fr/file/index/docid/722313/filename/mensink12eccv.final.pdf
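A rough sketch of the extreme one-center-per-class form of that idea (a hypothetical helper, not code from the linked paper):

```python
import numpy as np

def nca_ncm_objective(L, X, y):
    """NCA-style objective against class centers only: each point's softmax
    runs over one representative per class (here the class mean), so the
    per-point cost is O(n_classes) instead of O(n_samples)."""
    classes = np.unique(y)
    mu = np.stack([X[y == c].mean(axis=0) for c in classes])
    LX, Lmu = X @ L.T, mu @ L.T
    d2 = ((LX[:, None, :] - Lmu[None, :, :]) ** 2).sum(-1)  # (n, n_classes)
    p = np.exp(-d2)
    p /= p.sum(axis=1, keepdims=True)
    own = np.searchsorted(classes, y)       # column of each point's own class
    return p[np.arange(len(y)), own].sum()
```

The k-means variant would use several centers per class and sum the probabilities over all same-class centers instead of just the mean.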
>>>>
>>>> Hope this helps.
>>>>
>>>> Aurelien
>>>>
>>>> On 5/28/15 11:20 PM, Andreas Mueller wrote:
>>>> >
>>>> >
>>>> > On 05/28/2015 05:11 PM, Michael Eickenberg wrote:
>>>> >>
>>>> >> Code-wise, I would attack the problem as a function first. Write a
>>>> >> function that takes X and y (plus maybe some options) and gives back
>>>> >> L. You can put a skeleton of a sklearn estimator around it by calling
>>>> >> this function from fit.
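That skeleton could look roughly like this (learn_nca_matrix is a hypothetical stand-in for the refactored NCA function; an actual PR version would subclass sklearn.base.BaseEstimator and TransformerMixin):

```python
import numpy as np

def learn_nca_matrix(X, y, n_components=None):
    """Hypothetical stand-in for 'the meat': take X and y, give back L.
    (Here it just returns a whitening transform; the real function would
    run the NCA optimization.)"""
    n_components = n_components or X.shape[1]
    Xc = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return (Vt[:n_components] / s[:n_components, None]) * np.sqrt(len(X))

class NCA:
    """Estimator skeleton that just calls the function from fit."""
    def __init__(self, n_components=None):
        self.n_components = n_components
    def fit(self, X, y):
        self.components_ = learn_nca_matrix(X, y, self.n_components)
        return self
    def transform(self, X):
        return X @ self.components_.T
```

Keeping the optimization in a plain function makes it testable and benchmarkable on its own, with the estimator as a thin wrapper.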
>>>> >> Please keep your code either in a sklearn WIP PR or a public gist, so
>>>> >> it can be reviewed. Writing benchmarks can be framed as writing
>>>> >> examples, i.e. plot_* functions (maybe Andy or Olivier have a comment
>>>> >> on how benchmarks have been handled in the past?).
>>>> >>
>>>> > There is a "benchmark" folder, which is in a horrible shape.
>>>> > Basically there are three ways to do it: examples (with or without
>>>> > plot, depending on the runtime), a script in the benchmark folder,
>>>> > or a gist. Often we just use a gist and the PR person posts the
>>>> > output. Not that great for reproducibility, though.
>>>> >
>>>> >
>>>> ------------------------------------------------------------------------------
>>>> > _______________________________________________
>>>> > Scikit-learn-general mailing list
>>>> > Scikit-learn-general@lists.sourceforge.net
>>>> > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>> >
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>
>
>
>
