I added a simple benchmark <https://github.com/Barmaley-exe/scikit-learn/blob/metric-learning/benchmarks/bench_nca.py> that compares NCA-assisted 1NN with the default one (Euclidean distance) on the Wine dataset (one of the datasets reported in the NCA paper). See my output here <https://gist.github.com/Barmaley-exe/a713f23f74eb53a2f2bd>.
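For illustration, a benchmark of this shape could be sketched as below. This is not the actual bench_nca.py: the data is a synthetic stand-in for Wine, and the "learned" metric is hand-picked rather than fitted by NCA, just to show the role the learned transform L plays in 1NN.

```python
import numpy as np

def knn_accuracy(X_train, y_train, X_test, y_test, L=None):
    """1NN accuracy, optionally after projecting with a linear map L."""
    if L is not None:
        X_train, X_test = X_train @ L.T, X_test @ L.T
    # pairwise squared Euclidean distances, test x train
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    pred = y_train[d2.argmin(axis=1)]
    return (pred == y_test).mean()

# Synthetic stand-in for the Wine data: two classes separated only along
# the first feature, drowned in high-variance noise dimensions.
rng = np.random.RandomState(0)
n = 200
y = rng.randint(0, 2, n)
X = rng.randn(n, 5) * 10.0          # noisy, irrelevant features
X[:, 0] = y + 0.1 * rng.randn(n)    # the single informative feature
X_tr, X_te = X[:100], X[100:]
y_tr, y_te = y[:100], y[100:]

# Plain Euclidean 1NN vs. 1NN under a metric that keeps only the
# informative direction -- the role NCA's learned L would play.
acc_plain = knn_accuracy(X_tr, y_tr, X_te, y_te)
L_good = np.array([[1.0, 0.0, 0.0, 0.0, 0.0]])
acc_metric = knn_accuracy(X_tr, y_tr, X_te, y_te, L=L_good)
```

On data like this the metric-assisted 1NN wins by a wide margin, which is the effect the Wine benchmark is measuring.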
It also compares the semivectorized and vectorized implementations: surprisingly, the semivectorized one is about 2 times faster. I think this might be a reason to throw the fully vectorized implementation (nca_vectorized_oracle) away.

On Sat, May 30, 2015 at 12:33 AM, Michael Eickenberg
<michael.eickenb...@gmail.com> wrote:

> Thanks for the update.
>
> > So, what's the consensus on benchmarks? I can share ipython notebooks
> > via gist, for example.
>
> My (weak) preference would be to have a script within the sklearn repo,
> just to keep stuff in one place for easy future reference.
>
> Michael
>
> On Fri, May 29, 2015 at 6:24 PM, Artem <barmaley....@gmail.com> wrote:
>
>> So, I created a WIP PR dedicated to NCA:
>> https://github.com/scikit-learn/scikit-learn/pull/4789
>>
>> As suggested by Michael, I refactored "the meat" into a function. I also
>> rewrote it as a first-order oracle, so I can (and I do) use scipy's
>> optimizers. I've seen scipy.optimize.minimize (apparently, with BFGS)
>> sometimes stop at some weird point (a local minimum / saddle point?),
>> whereas gradient descent seems to always converge. Though, I didn't test
>> either of them extensively.
>>
>> I also fully vectorized the function and gradient calculations, no loops
>> involved.
>>
>> So, what's the consensus on benchmarks? I can share ipython notebooks
>> via gist, for example.
>>
>> On Fri, May 29, 2015 at 10:51 AM, Michael Eickenberg
>> <michael.eickenb...@gmail.com> wrote:
>>
>>> Hi Aurélien,
>>>
>>> thanks for these very good pointers!
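The first-order oracle Artem describes could look roughly like the sketch below (names and details are mine, not the PR's). It is the "semivectorized" flavor: one Python loop over samples i, vectorized over the other index, returning (-f, -grad) so that a generic minimizer maximizes the NCA objective f = sum_i sum_{j in class(i)} p_ij.

```python
import numpy as np

def nca_oracle(A_flat, X, y, dim):
    """Semivectorized NCA first-order oracle.  Maximizes the expected
    number of correctly classified points f = sum_i sum_{j~i} p_ij,
    where p_ij is a softmax over squared distances in the projected
    space; returns (-f, -grad) for use with a minimizer."""
    n, d = X.shape
    A = A_flat.reshape(dim, d)
    Xa = X @ A.T                                # projected data, (n, dim)
    f = 0.0
    grad = np.zeros((dim, d))
    for i in range(n):
        dist = ((Xa[i] - Xa) ** 2).sum(axis=1)  # squared distances in A-space
        dist[i] = np.inf                        # exclude self: p_ii = 0
        p = np.exp(-(dist - dist.min()))        # softmax, shifted for stability
        p /= p.sum()
        same = (y == y[i])
        p_i = p[same].sum()                     # prob. of classifying i correctly
        f += p_i
        # df/dA contribution: 2 A * sum_k w_k (x_i - x_k)(x_i - x_k)^T
        # with w_k = p_ik * (p_i - [y_k == y_i])
        w = p * (p_i - same)
        Xdiff = X[i] - X
        grad += 2.0 * A @ (Xdiff.T * w) @ Xdiff
    return -f, -grad.ravel()
```

A signature like this plugs directly into scipy.optimize.minimize(nca_oracle, A0.ravel(), args=(X, y, dim), jac=True), or into a hand-rolled gradient-descent loop, which is the alternative discussed in the thread.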
>>> (Now we also know who else to bug periodically for opinions ;))
>>>
>>> Michael
>>>
>>> On Fri, May 29, 2015 at 12:05 AM, Aurélien Bellet
>>> <aurelien.bel...@telecom-paristech.fr> wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> A few additional things to consider for scaling up NCA to large
>>>> datasets:
>>>>
>>>> - Take a look at the t-SNE implementations (a technique for
>>>> visualization/dim reduction very similar to NCA); I think they have a
>>>> few speed-up tricks that you could potentially reuse:
>>>> http://lvdmaaten.github.io/tsne/
>>>>
>>>> - Like you said, SGD can help reduce the computational cost - you could
>>>> also consider recent improvements of SGD, such as SAG/SAGA, SVRG, etc.
>>>>
>>>> - Similarly to what was suggested in previous replies, a general idea is
>>>> to only consider a neighborhood around each point (either fixed in
>>>> advance, or updated every now and then during the course of
>>>> optimization): since the probabilities decrease very fast with the
>>>> distance, farther points can be safely ignored in the computation.
>>>> This is explored, for instance, in:
>>>> http://dl.acm.org/citation.cfm?id=2142432
>>>>
>>>> - Another related idea is to construct class representatives (for
>>>> instance using k-means), and to model the distribution only w.r.t. these
>>>> points instead of the entire dataset. This is especially useful if some
>>>> classes are very large. An extreme version of this is to reframe NCA for
>>>> a Nearest Class Mean classifier, where each class is modeled only by its
>>>> center:
>>>> https://hal.archives-ouvertes.fr/file/index/docid/722313/filename/mensink12eccv.final.pdf
>>>>
>>>> Hope this helps.
>>>>
>>>> Aurelien
>>>>
>>>> On 5/28/15 11:20 PM, Andreas Mueller wrote:
>>>> >
>>>> > On 05/28/2015 05:11 PM, Michael Eickenberg wrote:
>>>> >>
>>>> >> Code-wise, I would attack the problem as a function first.
>>>> >> Write a function that takes X and y (plus maybe some options) and
>>>> >> gives back L. You can put a skeleton of a sklearn estimator around it
>>>> >> by calling this function from fit.
>>>> >> Please keep your code either in a sklearn WIP PR or a public gist, so
>>>> >> it can be reviewed. Writing benchmarks can be framed as writing
>>>> >> examples, i.e. plot_* functions (maybe Andy or Olivier have a comment
>>>> >> on how benchmarks have been handled in the past?).
>>>> >>
>>>> > There is a "benchmark" folder, which is in a horrible shape.
>>>> > Basically there are three ways to do it: examples (with or without plot
>>>> > depending on the runtime), a script in the benchmark folder, or a gist.
>>>> > Often we just use a gist and the PR person posts the output. Not that
>>>> > great for reproducibility, though.
>>>>
>>>> ------------------------------------------------------------------------------
>>>> _______________________________________________
>>>> Scikit-learn-general mailing list
>>>> Scikit-learn-general@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
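Aurélien's neighborhood-truncation idea could be sketched as below (everything here is my illustration, not code from the PR): the p_ij are computed only over each point's m nearest neighbors and treated as exactly zero elsewhere, cutting the per-point softmax cost from O(n) to O(m) once the neighbors are known.

```python
import numpy as np

def truncated_softmax_probs(Xa, m):
    """For each projected point, NCA-style probabilities restricted to its
    m nearest neighbors; all other p_ij are exactly zero."""
    n = Xa.shape[0]
    # Full pairwise matrix for clarity only; a real implementation would
    # query a neighbors structure (e.g. a k-d tree) instead.
    d2 = ((Xa[:, None, :] - Xa[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)                 # p_ii = 0
    P = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[:m]             # m nearest neighbors of i
        e = np.exp(-(d2[i, nbrs] - d2[i, nbrs].min()))  # stabilized softmax
        P[i, nbrs] = e / e.sum()
    return P
```

Because the discarded p_ij are the smallest ones, the truncated objective and gradient stay close to the exact ones while the neighbor lists stay valid, which is why the neighborhood can be refreshed only every now and then during optimization.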
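The refactoring Michael suggests (a free function that takes X and y and gives back L, with an estimator skeleton around it whose fit just calls that function) could be skeletonized as follows. All names are illustrative, and the optimization itself is deliberately stubbed out; the actual NCA gradient ascent would go inside learn_nca_transform.

```python
import numpy as np

def learn_nca_transform(X, y, n_components):
    """'The meat': given X and y, return the linear map L.
    Placeholder only -- a real version would run (stochastic) gradient
    ascent on the NCA objective here.  Returns an identity-like
    initialization so the skeleton stays runnable."""
    n_features = X.shape[1]
    return np.eye(n_components, n_features)

class NCA:
    """Minimal estimator skeleton: fit() validates inputs and delegates to
    the free function above; transform() applies the learned map."""
    def __init__(self, n_components=2):
        self.n_components = n_components

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.components_ = learn_nca_transform(X, y, self.n_components)
        return self

    def transform(self, X):
        return np.asarray(X, dtype=float) @ self.components_.T

# usage
rng = np.random.RandomState(0)
X = rng.randn(30, 5)
y = rng.randint(0, 3, 30)
Z = NCA(n_components=2).fit(X, y).transform(X)
```

Keeping the optimization in a plain function makes it easy to benchmark the semivectorized and vectorized oracles independently of the estimator wrapper.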