Thanks, everyone, for your suggestions! Great to hear that it would be a
good inclusion, and happy to see so much excitement around it! Nick and I
will likely submit a PR once we've shown that it's performant and we have
a script others can run to get started.
Also, we've experimented with prange loops in the BH t-SNE code, but they
actually turn out to be slower than the single-threaded versions: prange
wasn't the quick path to parallelism I was hoping for.
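
For reference, the kind of change we tried looks roughly like this (a toy
example, not the actual BH t-SNE kernel; it needs to be compiled by Cython
with the OpenMP flags discussed downthread):

    import numpy as np
    from cython.parallel import prange

    def row_sums(double[:, :] X):
        # Each outer iteration writes only to its own out[i], so the loop
        # is safe to parallelize; prange degrades to a serial loop when
        # compiled without OpenMP.
        cdef Py_ssize_t i, j
        cdef double[:] out = np.zeros(X.shape[0])
        for i in prange(X.shape[0], nogil=True):
            for j in range(X.shape[1]):
                out[i] += X[i, j]
        return np.asarray(out)

In the BH t-SNE gradient loops, this kind of change consistently came out
slower than the serial version for us.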
chris
On Thu, Dec 25, 2014 at 9:57 PM, Mathieu Blondel <math...@mblondel.org>
wrote:
>
>
> On Thu, Dec 25, 2014 at 4:59 AM, Andy <t3k...@gmail.com> wrote:
>
>> I recently read about the approximation and I think it would be a great
>> addition.
>> Do you think it makes sense to include it via an ``algorithm`` parameter
>> to t-SNE?
>> I totally agree with what Kyle said about demonstrating speedups and
>> approximation accuracy (if possible).
>>
>> I haven't used Cython.parallel before. Does it degrade gracefully if
>> OpenMP is not available?
>> If so, we should really be using it much more.
>>
>
> Programs that use prange / parallel compile fine on all compilers as long
> as the OpenMP compiler and linker flags are not specified in setup.py.
> If they are specified, then compilers without OpenMP support will raise an
> error. This is problematic as the default compiler on OS X (clang) doesn't
> support OpenMP. We can either try to detect OpenMP support (
> http://stackoverflow.com/questions/16549893/programatically-testing-for-openmp-support-from-a-python-setup-script)
> or add a build option to setup.py (
> http://stackoverflow.com/questions/2709278/setup-py-adding-options-aka-setup-py-enable-feature
> ).
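>
> For instance, a minimal sketch of the detection approach, along the lines
> of the first link (the helper name is mine):
>
>     import os
>     import shutil
>     import subprocess
>     import tempfile
>
>     def have_openmp(cc="cc"):
>         """Return True if `cc` compiles and links a trivial OpenMP program."""
>         code = ("#include <omp.h>\n"
>                 "int main(void) "
>                 "{ return omp_get_max_threads() > 0 ? 0 : 1; }\n")
>         tmpdir = tempfile.mkdtemp()
>         try:
>             src = os.path.join(tmpdir, "check_openmp.c")
>             with open(src, "w") as f:
>                 f.write(code)
>             cmd = [cc, "-fopenmp", src, "-o", os.path.join(tmpdir, "a.out")]
>             return subprocess.call(cmd) == 0
>         except OSError:  # the compiler executable itself is missing
>             return False
>         finally:
>             shutil.rmtree(tmpdir)
>
>     # In setup.py, pass the flags only when they are supported:
>     extra_args = ["-fopenmp"] if have_openmp() else []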
>
> I'm currently experimenting with OpenMP myself, for parallel gradient
> computation in lightning, but this is a work in progress.
>
> Mathieu
>
>
>>
>>
>>
>> On 12/24/2014 02:28 PM, Kyle Kastner wrote:
>>
>> Sounds like an excellent improvement for usability!
>>
>> Benchmarking the time spent, and showing that the speedup is noticeable,
>> will be crucial. Also important is showing how much accuracy the
>> approximation loses compared to base t-SNE - though there comes a point
>> where you can't really compare, because vanilla t-SNE just never
>> finishes! If it is a huge improvement for a small approximation cost,
>> that is exactly the kind of engineering tradeoff that has been made in
>> the past for other things like randomized SVD.
>>
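>> A rough sketch of the kind of benchmark I mean (the ``algorithm`` values
>> here are hypothetical, pending the API discussion above):
>>
>>     import time
>>     import numpy as np
>>     from sklearn.manifold import TSNE
>>
>>     X = np.random.RandomState(0).randn(5000, 50)
>>     for algo in ("exact", "barnes_hut"):  # hypothetical parameter values
>>         t0 = time.time()
>>         Y = TSNE(n_components=2, algorithm=algo,
>>                  random_state=0).fit_transform(X)
>>         print("%s: %.1f s" % (algo, time.time() - t0))
>>
>> Repeating that over growing N (and checking that the embeddings agree
>> qualitatively) would make the O(N log N) claim concrete.
>>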
>> Making approximation an option of the current solver seems OK to me,
>> that way people can always choose which version to use if exact methods are
>> required.
>>
>> I would start the PR fairly early, once you have a working version that
>> others can try along with a minimal test suite. There will probably be lots
>> of tweaks for consistency, etc. but that is easy to do once the core is
>> there and the tests cover it. I know lots of great people helped me with
>> this in the past.
>>
>> Looking forward to checking it out!
>>
>> On Wed, Dec 24, 2014 at 2:13 PM, Christopher Moody <chrisemo...@gmail.com
>> > wrote:
>>
>>> Hi folks,
>>> Nick Travers and I have been working steadily
>>> <https://github.com/cemoody/scikit-learn/tree/cemoody/bhtsne/sklearn/manifold>
>>> on a Barnes-Hut approximation of the t-SNE algorithm currently implemented
>>> as a manifold learning technique. This version makes the gradient
>>> calculation much faster, reducing its computational cost from O(N^2) to
>>> O(N log N). This effectively extends t-SNE from being usable on thousands
>>> of examples to millions of examples while losing only a little precision.
>>> t-SNE was first published in 2008 <http://lvdmaaten.github.io/tsne/>, and
>>> the Barnes-Hut approximation was introduced
>>> <http://arxiv.org/abs/1301.3342> only two years ago and accepted for
>>> publication only this year [pdf
>>> <http://lvdmaaten.github.io/publications/papers/JMLR_2014.pdf>], making
>>> it relatively new.
>>>
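>>> To give a flavor of the approximation, here is a schematic Python sketch
>>> of the Barnes-Hut criterion (not our actual implementation, and the
>>> gradient's global normalization term is omitted):
>>>
>>>     import numpy as np
>>>
>>>     def cell_contribution(y_i, center, n_cell, cell_width, theta=0.5):
>>>         # If a cell of embedded points is small relative to its
>>>         # distance from y_i, summarize all n_cell points by the
>>>         # cell's center of mass: one interaction instead of n_cell.
>>>         diff = y_i - center
>>>         dist2 = np.dot(diff, diff)
>>>         if cell_width ** 2 < theta ** 2 * dist2:
>>>             q = 1.0 / (1.0 + dist2)       # Student-t kernel of t-SNE
>>>             return n_cell * q * q * diff  # summarized repulsive term
>>>         return None  # cell too close/large: descend into its children
>>>
>>> Applied recursively over a quadtree of the embedding, this test is what
>>> turns the O(N^2) pairwise sum into O(N log N).
>>>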
>>> Ahead of submitting a PR, I wanted to start the discussion about its
>>> inclusion (or exclusion!). It's a performance improvement to an
>>> established and widely-used algorithm, but it's also quite new.
>>>
>>> - What do you all think?
>>> - Is this too new to be incorporated, or does its usefulness merit
>>> inclusion?
>>> - We've read the Contributing
>>> <http://scikit-learn.org/stable/developers/> docs -- what else can Nick
>>> & I do to make the PR as easy as possible for reviewers? And looking
>>> forward, what can we do to minimize code rot and ease the maintenance
>>> burden?
>>>
>>> Many thanks & looking forward to being part of the scikit-learn
>>> community!
>>> chris & nick