I recently read about the approximation and I think it would be a great addition. Do you think it makes sense to include it via an ``algorithm`` parameter to ``TSNE``? I totally agree with what Kyle said about demonstrating speedups and approximation accuracy (if possible).

I haven't used Cython.parallel before. Does it degrade gracefully if OpenMP is not available?
If so, we should really be using it much more.



On 12/24/2014 02:28 PM, Kyle Kastner wrote:
Sounds like an excellent improvement for usability!

If you could benchmark the time spent and show that it is a noticeable improvement, that will be crucial. Also important is showing how far the approximation deviates from exact t-SNE - though there comes a point where you can't really compare, because vanilla t-SNE simply never finishes! If it is a huge speedup for a small approximation cost, that is exactly the kind of engineering tradeoff that has been made in the past for other things like randomized SVD, etc.
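To make the scaling argument concrete, here is a toy operation-count sketch (not a real benchmark - constants, the theta accuracy knob, and implementation details all matter) contrasting the O(N^2) pairwise gradient with an ~O(N log N) tree-based proxy. The function names are illustrative, not from the PR:

```python
import math

def exact_interactions(n):
    """Pairwise terms the exact t-SNE gradient must touch."""
    return n * (n - 1) // 2

def bh_interactions(n):
    """Rough O(N log N) proxy for tree-cell visits in Barnes-Hut."""
    return n * max(1, math.ceil(math.log2(n)))

# Illustrative only: real speedups depend on theta, dimensionality,
# and constant factors, but the asymptotic gap is the point.
for n in (1_000, 10_000, 100_000, 1_000_000):
    ratio = exact_interactions(n) / bh_interactions(n)
    print(f"N={n:>9,}  exact/approx interaction ratio ~ {ratio:,.0f}x")
```

At a million points the exact pairwise count is several orders of magnitude larger, which is why "vanilla t-SNE just never finishes" at that scale.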

Making approximation an option of the current solver seems OK to me, that way people can always choose which version to use if exact methods are required.

I would start the PR fairly early, once you have a working version that others can try along with a minimal test suite. There will probably be lots of tweaks for consistency, etc. but that is easy to do once the core is there and the tests cover it. I know lots of great people helped me with this in the past.

Looking forward to checking it out!

On Wed, Dec 24, 2014 at 2:13 PM, Christopher Moody <chrisemo...@gmail.com <mailto:chrisemo...@gmail.com>> wrote:

    Hi folks,
    Nick Travers and I have been working steadily
    <https://github.com/cemoody/scikit-learn/tree/cemoody/bhtsne/sklearn/manifold>
    on a Barnes-Hut approximation of the t-SNE algorithm currently
    implemented as a manifold learning technique. This version makes
    the gradient calculation much faster, changing the computational
    time from O(N^2) to O(N log N). This effectively extends t-SNE
    from being usable on thousands of examples to millions of examples
    while only losing a little bit of precision. t-SNE was first
    published in 2008 <http://lvdmaaten.github.io/tsne/>, and the
    Barnes-Hut approximation was introduced
    <http://arxiv.org/abs/1301.3342> only two years ago and published
    in JMLR just this year [pdf
    <http://lvdmaaten.github.io/publications/papers/JMLR_2014.pdf>],
    making it relatively new.
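    The core trick can be sketched in a few dozen lines of pure Python
    (the actual PR is Cython): points go into a quadtree, and any cell
    far enough from a query point (roughly, size/distance < theta) is
    summarized by its centre of mass. The class and function names
    below are illustrative, not the PR's API, and only the unnormalized
    repulsive term of the gradient is shown:

```python
import random

class Cell:
    """Square quadtree cell tracking a running centre of mass."""
    def __init__(self, x, y, size):
        self.x, self.y, self.size = x, y, size  # lower-left corner, side
        self.n = 0                              # points in this subtree
        self.cx = self.cy = 0.0                 # centre of mass
        self.children = None
        self.point = None                       # payload when n == 1

    def insert(self, px, py):
        # Update the running centre of mass, then push the point down.
        self.cx = (self.cx * self.n + px) / (self.n + 1)
        self.cy = (self.cy * self.n + py) / (self.n + 1)
        self.n += 1
        if self.n == 1:
            self.point = (px, py)
            return
        if self.children is None:
            h = self.size / 2
            self.children = [Cell(self.x + dx * h, self.y + dy * h, h)
                             for dx in (0, 1) for dy in (0, 1)]
            self._child_for(*self.point).insert(*self.point)
            self.point = None
        self._child_for(px, py).insert(px, py)

    def _child_for(self, px, py):
        h = self.size / 2
        return self.children[(2 if px >= self.x + h else 0)
                             + (1 if py >= self.y + h else 0)]

def repulsive_force(cell, px, py, theta=0.5):
    """Approximate the (unnormalized) t-SNE repulsion at (px, py).

    Cells satisfying size < theta * distance are treated as a single
    point of mass n at their centre of mass; theta=0 recovers the
    exact per-point sum.
    """
    if cell.n == 0:
        return 0.0, 0.0
    dx, dy = px - cell.cx, py - cell.cy
    d2 = dx * dx + dy * dy
    if cell.n == 1 and d2 == 0.0:
        return 0.0, 0.0                     # the query point itself
    if cell.children is None or cell.size * cell.size < theta * theta * d2:
        q = 1.0 / (1.0 + d2)                # Student-t kernel, as in t-SNE
        return cell.n * q * q * dx, cell.n * q * q * dy
    fx = fy = 0.0
    for child in cell.children:
        cfx, cfy = repulsive_force(child, px, py, theta)
        fx, fy = fx + cfx, fy + cfy
    return fx, fy

# Tiny demo on random points in the unit square (distinct w.p. 1).
random.seed(0)
points = [(random.random(), random.random()) for _ in range(500)]
root = Cell(0.0, 0.0, 1.0)
for p in points:
    root.insert(*p)
fx, fy = repulsive_force(root, *points[0], theta=0.5)
print("approximate repulsion on point 0:", fx, fy)
```

    Each query then visits O(log N) cells instead of all N points,
    which is where the O(N^2) -> O(N log N) drop comes from; theta
    controls the speed/precision tradeoff.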

    Ahead of submitting a PR, I wanted to start the process for
    thinking about its inclusion (or exclusion!). It's a performance
    improvement to an established and widely-used algorithm but it's
    also quite new.

    - What do you all think?
    - Is this too new to be incorporated, or does its usefulness merit
    inclusion?
    - We've read the Contributing
    <http://scikit-learn.org/stable/developers/> docs -- what else can
    Nick & I do to make the PR as easy as we can for reviewers? And
    looking forward, what can we do to minimize code rot and ease the
    maintenance burden?

    Many thanks & looking forward to being part of the scikit-learn community!
    chris & nick

    
------------------------------------------------------------------------------
    Dive into the World of Parallel Programming! The Go Parallel Website,
    sponsored by Intel and developed in partnership with Slashdot
    Media, is your
    hub for all things parallel software development, from weekly thought
    leadership blogs to news, videos, case studies, tutorials and
    more. Take a
    look and join the conversation now. http://goparallel.sourceforge.net
    _______________________________________________
    Scikit-learn-general mailing list
    Scikit-learn-general@lists.sourceforge.net
    <mailto:Scikit-learn-general@lists.sourceforge.net>
    https://lists.sourceforge.net/lists/listinfo/scikit-learn-general




