I recently read about the approximation and I think it would be a great
addition.
Do you think it makes sense to include it via an ``algorithm`` parameter
to ``TSNE``?
I totally agree with what Kyle said about demonstrating speedups and
approximation accuracy (if possible).
I haven't used Cython.parallel before. Does it degrade gracefully if
OpenMP is not available?
If so, we should really be using it much more.
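For context, my understanding is that when the extension is compiled without
OpenMP flags, ``prange`` just compiles down to an ordinary sequential loop,
so it should degrade gracefully. A minimal sketch of the pattern (the kernel
here is a made-up example, not anything from the branch):

```cython
# Hypothetical row-sum kernel using cython.parallel.
# If the module is built without -fopenmp, prange executes
# as a plain sequential range loop -- same results, no threads.
from cython.parallel import prange

def row_sums(double[:, :] X, double[:] out):
    cdef Py_ssize_t i, j
    for i in prange(X.shape[0], nogil=True):
        out[i] = 0.0
        for j in range(X.shape[1]):
            out[i] += X[i, j]
```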
On 12/24/2014 02:28 PM, Kyle Kastner wrote:
Sounds like an excellent improvement for usability!
Benchmarking the time spent and showing that the speedup is noticeable
will be crucial. Also important is showing how far the approximation
deviates from base t-SNE - though there comes a point where you can't
really compare, because vanilla t-SNE just never finishes! If it is a huge
improvement for a small approximation cost, that is exactly the kind of
engineering tradeoff that has been made in the past for other things like
randomized SVD, etc.
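A benchmark along those lines could be as simple as timing both solvers on a
growing number of samples. The two step functions below are toy stand-ins
with made-up names (an O(n^2) pairwise pass vs. a cheap sort-based pass); in
a real comparison they would be the exact and Barnes-Hut gradient updates:

```python
import time

# Toy stand-ins for the two gradient implementations (hypothetical
# names); a real benchmark would call the exact and Barnes-Hut
# t-SNE gradient steps instead.
def exact_step(points):
    # O(n^2): touches every pairwise interaction
    return sum(abs(a - b) for a in points for b in points)

def approx_step(points):
    # cheap O(n log n) stand-in: a single sort-based pass
    return sum(sorted(points))

def bench(fn, points, repeats=3):
    # best-of-N wall-clock timing
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(points)
        best = min(best, time.perf_counter() - t0)
    return best

points = list(range(500))
print("exact:  %.4fs" % bench(exact_step, points))
print("approx: %.4fs" % bench(approx_step, points))
```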
Making the approximation an option of the current solver seems OK to me;
that way people can still choose the exact version when exact methods
are required.
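Concretely, the dispatch could look something like the sketch below. All the
names here are hypothetical placeholders (the real parameter name, allowed
values, and gradient functions would be up to the PR and its reviewers):

```python
# Hypothetical dispatch for an ``algorithm`` parameter; the two
# gradient functions are placeholders standing in for the real
# exact and Barnes-Hut solvers.
def _gradient_exact(X):
    return "exact gradient for %d samples" % len(X)

def _gradient_barnes_hut(X):
    return "barnes-hut gradient for %d samples" % len(X)

def tsne_gradient(X, algorithm="exact"):
    solvers = {
        "exact": _gradient_exact,
        "barnes_hut": _gradient_barnes_hut,
    }
    if algorithm not in solvers:
        raise ValueError("algorithm must be one of %r, got %r"
                         % (sorted(solvers), algorithm))
    return solvers[algorithm](X)
```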
I would start the PR fairly early, once you have a working version
that others can try along with a minimal test suite. There will
probably be lots of tweaks for consistency, etc. but that is easy to
do once the core is there and the tests cover it. I know lots of great
people helped me with this in the past.
Looking forward to checking it out!
On Wed, Dec 24, 2014 at 2:13 PM, Christopher Moody
<chrisemo...@gmail.com> wrote:
Hi folks,
Nick Travers and I have been working steadily
<https://github.com/cemoody/scikit-learn/tree/cemoody/bhtsne/sklearn/manifold>
on a Barnes-Hut approximation of the t-SNE algorithm currently
implemented as a manifold learning technique. This version makes
the gradient calculation much faster, changing the computational
time from O(N^2) to O(N log N). This effectively extends t-SNE
from being usable on thousands of examples to millions of examples
while only losing a little bit of precision. t-SNE was first
published in 2008 <http://lvdmaaten.github.io/tsne/> and the
Barnes-Hut approximation was introduced
<http://arxiv.org/abs/1301.3342> only two years ago, but it was
accepted for publication only this year [pdf
<http://lvdmaaten.github.io/publications/papers/JMLR_2014.pdf>],
making it relatively new.
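For anyone unfamiliar with the trick, here is a toy 1-D illustration of the
Barnes-Hut idea (not the real t-SNE gradient, and the function names are
made up): the exact force on a point sums over every other point, while the
approximation replaces a far-away group with a single interaction at its
centre of mass, which is what turns the O(N^2) sum into O(N log N) once the
groups come from a tree:

```python
# Toy 1-D Barnes-Hut illustration with a 1/r-style interaction.
def exact_force(x, others):
    # O(n): one term per other point (O(n^2) over all points)
    return sum(1.0 / (x - y) for y in others if y != x)

def bh_force(x, near, far):
    # far group approximated by one pseudo-point at its centre
    # of mass, weighted by the group size
    com = sum(far) / len(far)
    return sum(1.0 / (x - y) for y in near) + len(far) / (x - com)

near = [1.0, 2.0]
far = [100.0, 101.0, 102.0]
print(exact_force(0.0, near + far))   # exact sum over all 5 points
print(bh_force(0.0, near, far))       # near points exact, far group lumped
```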
Ahead of submitting a PR, I wanted to start the process for
thinking about its inclusion (or exclusion!). It's a performance
improvement to an established and widely-used algorithm but it's
also quite new.
- What do you all think?
- Is this too new to be incorporated, or does its usefulness merit
inclusion?
- We've read the Contributing
<http://scikit-learn.org/stable/developers/> docs -- what else can
Nick & I do to make the PR as easy as we can for reviewers? And
looking forward, what can we do to minimize code rot and ease the
maintenance burden?
Many thanks & looking forward to being part of scikit-learn community!
chris & nick
------------------------------------------------------------------------------
Dive into the World of Parallel Programming! The Go Parallel Website,
sponsored by Intel and developed in partnership with Slashdot Media, is your
hub for all things parallel software development, from weekly thought
leadership blogs to news, videos, case studies, tutorials and more. Take a
look and join the conversation now. http://goparallel.sourceforge.net
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general