Re: [Scikit-learn-general] [GSoC2015 metric learning]
On Sun, May 31, 2015 at 7:25 PM, Artem <barmaley@gmail.com> wrote:

> I added a simple benchmark https://github.com/Barmaley-exe/scikit-learn/blob/metric-learning/benchmarks/bench_nca.py that compares NCA-assisted 1NN with the default one (with Euclidean distance) on the Wine dataset (it was one of the datasets reported in the NCA paper). See my output here: https://gist.github.com/Barmaley-exe/a713f23f74eb53a2f2bd. It also compares the semivectorized and vectorized implementations: surprisingly, the semivectorized one is about 2 times faster. I think this might be a reason to throw the fully vectorized implementation (nca_vectorized_oracle) away.

This is very interesting. I like the way you built the semi-vectorized version, and this seems to show that you are still benefiting from the remaining vectorizations you have. However, I am surprised to see such a difference between the two methods. Do you have any idea why this could be the case? Where do you think the vectorized version loses all this time?

On Sat, May 30, 2015 at 12:33 AM, Michael Eickenberg <michael.eickenb...@gmail.com> wrote:

> Thanks for the update.
>
> > So, what's the consensus on benchmarks? I can share ipython notebooks via gist, for example.
>
> My (weak) preference would be to have a script within the sklearn repo, just to keep stuff in one place for easy future reference.
>
> Michael

On Fri, May 29, 2015 at 6:24 PM, Artem <barmaley@gmail.com> wrote:

> So, I created a WIP PR dedicated to NCA: https://github.com/scikit-learn/scikit-learn/pull/4789
>
> As suggested by Michael, I refactored the meat into a function. I also rewrote it as a first-order oracle, so I can (and I do) use scipy's optimizers. I've seen scipy.optimize.minimize (apparently, with BFGS) sometimes stop at some weird point (a local minimum / saddle point?), whereas gradient descent seems to always converge, though I didn't test either of them extensively. I also fully vectorized the function and gradient calculations, no loops involved.
>
> So, what's the consensus on benchmarks? I can share ipython notebooks via gist, for example.

On Fri, May 29, 2015 at 10:51 AM, Michael Eickenberg <michael.eickenb...@gmail.com> wrote:

> Hi Aurélien, thanks for these very good pointers! (Now we also know who else to bug periodically for opinions ;))
>
> Michael

On Fri, May 29, 2015 at 12:05 AM, Aurélien Bellet <aurelien.bel...@telecom-paristech.fr> wrote:

> Hi everyone,
>
> A few additional things to consider for scaling up NCA to large datasets:
>
> - Take a look at the t-SNE implementations (a visualization / dimensionality-reduction technique very similar to NCA); I think they have a few speed-up tricks that you could potentially reuse: http://lvdmaaten.github.io/tsne/
> - Like you said, SGD can help reduce the computational cost. You could also consider recent improvements on SGD, such as SAG/SAGA, SVRG, etc.
> - Similarly to what was suggested in previous replies, a general idea is to only consider a neighborhood around each point (either fixed in advance, or updated every now and then during the course of optimization): the probabilities decrease very fast with distance, so farther points can be safely ignored in the computation. This is explored, for instance, in http://dl.acm.org/citation.cfm?id=2142432
> - Another related idea is to construct class representatives (for instance using k-means) and to model the distribution only w.r.t. these points instead of the entire dataset. This is especially useful if some classes are very large. An extreme version of this is to reframe NCA for a Nearest Class Mean classifier, where each class is modeled only by its center: https://hal.archives-ouvertes.fr/file/index/docid/722313/filename/mensink12eccv.final.pdf
>
> Hope this helps.
>
> Aurelien

On 5/28/15 11:20 PM, Andreas Mueller wrote:

> On 05/28/2015 05:11 PM, Michael Eickenberg wrote:
>
> > Code-wise, I would attack the problem as a function first. Write a function that takes X and y (plus maybe some options) and gives back L. You can put a skeleton of a sklearn estimator around it by calling this function from fit. Please keep your code either in a sklearn WIP PR or a public gist, so it can be reviewed. Writing benchmarks can be framed as writing examples, i.e. plot_* functions (maybe Andy or Olivier have a comment on how benchmarks have been handled in the past?).
>
> There is a benchmark folder, which is in horrible shape. Basically there are three ways to do it: examples (with or without a plot, depending on the runtime), a script in the benchmark folder, or a gist. Often we just use a gist and the PR author posts the output. Not that great for reproducibility, though.
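The Nearest Class Mean idea in Aurélien's last point is easy to state in code: each class is summarized by a single centroid, so prediction cost scales with the number of classes rather than the number of training points. Below is a minimal NumPy sketch of the base classifier only; it is illustrative, not code from the linked paper, which additionally learns the metric in which the centroids are compared.

    import numpy as np

    class NearestClassMean:
        """Each class is modeled only by its center (the mean of its samples)."""

        def fit(self, X, y):
            y = np.asarray(y)
            self.classes_ = np.unique(y)
            # one centroid per class
            self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
            return self

        def predict(self, X):
            # squared Euclidean distance of every sample to every class mean,
            # shape (n_samples, n_classes)
            d2 = ((X[:, np.newaxis, :] - self.means_[np.newaxis, :, :]) ** 2).sum(axis=-1)
            return self.classes_[d2.argmin(axis=1)]

scikit-learn already ships this baseline as sklearn.neighbors.NearestCentroid; the NCA-for-NCM variant from the paper would amount to learning the transform L and comparing samples to centroids in the transformed space.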
Re: [Scikit-learn-general] [GSoC2015 metric learning]
I added a simple benchmark https://github.com/Barmaley-exe/scikit-learn/blob/metric-learning/benchmarks/bench_nca.py that compares NCA-assisted 1NN with the default one (with Euclidean distance) on the Wine dataset (it was one of the datasets reported in the NCA paper). See my output here: https://gist.github.com/Barmaley-exe/a713f23f74eb53a2f2bd. It also compares the semivectorized and vectorized implementations: surprisingly, the semivectorized one is about 2 times faster. I think this might be a reason to throw the fully vectorized implementation (nca_vectorized_oracle) away.

On Sat, May 30, 2015 at 12:33 AM, Michael Eickenberg <michael.eickenb...@gmail.com> wrote: [...]
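For readers without the linked script handy, here is a rough sketch of what such a benchmark measures, written against present-day scikit-learn (NeighborhoodComponentsAnalysis was only added in scikit-learn 0.21, years after this thread, so this is an illustration rather than the linked bench_nca.py):

    from sklearn.datasets import load_wine
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier, NeighborhoodComponentsAnalysis
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = load_wine(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # baseline: 1NN with plain Euclidean distance
    knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=1))
    print("1NN (Euclidean):", knn.fit(X_train, y_train).score(X_test, y_test))

    # NCA-assisted: learn a linear transform, then run 1NN in the new space
    nca_knn = make_pipeline(StandardScaler(),
                            NeighborhoodComponentsAnalysis(random_state=0),
                            KNeighborsClassifier(n_neighbors=1))
    print("1NN (NCA):", nca_knn.fit(X_train, y_train).score(X_test, y_test))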
Re: [Scikit-learn-general] [GSoC2015 metric learning]
Apparently, the problem was with multiplying big 4D tensors:

    grad = np.sum(proba[:, :, np.newaxis, np.newaxis] * outer, axis=1).sum(axis=0)

I don't know the details, but the line profiler said that most of the time was spent here (and on the other, similar line); probably it was due to copies. I recalled your words on np.einsum, replaced the sums with it, and got quite a speedup (x4)! New output log: https://gist.github.com/Barmaley-exe/d5ca99d7b94fb665ad03. Now the fully vectorized version takes ~2s per iteration and is faster than the semivectorized one.

P.S. I made the same optimization in the semivectorized version, but its speedup isn't as significant.

On Sun, May 31, 2015 at 9:29 PM, Michael Eickenberg <michael.eickenb...@gmail.com> wrote:

> [...] However, I am surprised to see such a difference between the two methods. Do you have any idea why this could be the case? Where do you think the vectorized version loses all this time?
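For concreteness, here is a small self-contained check of the einsum replacement described above. The shapes and names are assumptions based on the thread (proba as an (n, n) matrix of pair probabilities, outer as the (n, n, d, d) stack of pairwise outer products); the actual PR code may differ:

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 50, 10
    proba = rng.random((n, n))          # assumed: pairwise probabilities
    outer = rng.random((n, n, d, d))    # assumed: pairwise outer products

    # Original line: broadcasting materializes a full (n, n, d, d) temporary
    # array before the two sums, i.e. a large allocation and copy per call.
    grad_slow = np.sum(proba[:, :, np.newaxis, np.newaxis] * outer, axis=1).sum(axis=0)

    # einsum expresses the same contraction over both sample axes in one
    # pass, without the big temporary.
    grad_fast = np.einsum('ij,ijkl->kl', proba, outer)

    assert np.allclose(grad_slow, grad_fast)

For this contraction pattern, einsum cuts both memory traffic and allocations, which is consistent with the reported ~4x speedup.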
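Tying this back to Michael's earlier advice in the thread (write a function that takes X and y and gives back L, then wrap a sklearn estimator skeleton around it that calls the function from fit), the intended structure looks roughly like the sketch below. The names nca_fit and NCA are placeholders, not the identifiers from the actual PR:

    import numpy as np
    from sklearn.base import BaseEstimator, TransformerMixin

    def nca_fit(X, y, n_components=None, max_iter=100):
        """Learn and return the linear transform L (optimization body omitted)."""
        n_features = X.shape[1]
        if n_components is None:
            n_components = n_features
        L = np.eye(n_components, n_features)  # identity initialization
        # ... first-order optimization of the NCA objective would go here,
        # e.g. via scipy.optimize.minimize with jac=True ...
        return L

    class NCA(BaseEstimator, TransformerMixin):
        def __init__(self, n_components=None, max_iter=100):
            self.n_components = n_components
            self.max_iter = max_iter

        def fit(self, X, y):
            # all the real work lives in the standalone function
            self.components_ = nca_fit(X, y, self.n_components, self.max_iter)
            return self

        def transform(self, X):
            return X @ self.components_.T

Keeping the optimization in a plain function makes it easy to benchmark and swap oracles, while the thin estimator wrapper provides the standard fit/transform API.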
[Scikit-learn-general] my silence
Just a quick note that I've been silent lately because I've been Busy With Life, but also because GitHub was sending notifications to an email address hosted at my previous employer, which was deactivated a fortnight ago. If there were issues that sought my particular attention, please let me know.