Olivier Grisel wrote:

<snip>
> +1 for the dense case
>
> But ball tree does not work for high dim sparse data.
>   
I'm working on that - I hope to have a pull request within the next few 
weeks.
> We would also need some truncated kernels (e.g. cosine similarity for
> positive data, or RBF in the general case), probably implemented in
> cython for the high dim sparse case, where the dense output of shape
> (n_samples, n_neighbors) is preallocated in advance (and assumed to
> fit in memory, while a dense array of shape (n_samples, n_samples) or
> (n_samples, n_features) would not).
>
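Something along these lines might work for the exact truncated case - a
rough, untested sketch in plain numpy/scipy (brute force, block-wise,
assuming CSR input), not the cython version you describe; the function
and parameter names are just placeholders:

    # Sketch: exact k-nearest neighbors by cosine similarity on sparse data.
    # Only the dense (n_samples, n_neighbors) outputs are preallocated; the
    # full (n_samples, n_samples) similarity matrix is never built, only one
    # (block_size, n_samples) slab at a time. A cython version would fuse
    # the truncation step into the dot product loop.
    import numpy as np
    import scipy.sparse as sp
    from sklearn.preprocessing import normalize

    def cosine_knn(X, n_neighbors=10, block_size=500):
        X = normalize(sp.csr_matrix(X), norm='l2')  # cosine == dot on L2-normalized rows
        n_samples = X.shape[0]
        indices = np.empty((n_samples, n_neighbors), dtype=np.intp)
        similarities = np.empty((n_samples, n_neighbors))
        for start in range(0, n_samples, block_size):
            stop = min(start + block_size, n_samples)
            sims = (X[start:stop] * X.T).toarray()   # dense (block, n_samples) slab
            top = np.argsort(sims, axis=1)[:, ::-1][:, :n_neighbors]
            indices[start:stop] = top
            similarities[start:stop] = sims[np.arange(stop - start)[:, None], top]
        return indices, similarities
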
> That would be very useful to make SpectralClustering work on text
> data. That should also help with the "over-convergence" issues I
> observe on the power iteration clustering branch when n_samples is too
> big.
>
> Using LSH (or some variant of random projection) might indeed be
> interesting to quickly build the approximate nearest neighbors graph
> of high dim sparse data (but I think a cython version of the exact
> truncated case would still be useful, at least as a control reference
> for the approximate case).
>
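For the approximate side, here is a quick, untested sketch of the random
projection route (a dense Gaussian matrix, so only reasonable for moderate
n_features; approx_knn and its arguments are placeholders, not an existing
API):

    # Sketch: approximate neighbors by projecting the sparse data to a low
    # dimensional dense space, where BallTree (or brute force) works again.
    # The exact truncated kernel above would serve as the control reference.
    import numpy as np
    from sklearn.neighbors import BallTree

    def approx_knn(X_sparse, n_components=64, n_neighbors=10, random_state=0):
        rng = np.random.RandomState(random_state)
        n_features = X_sparse.shape[1]
        # dense Gaussian projection matrix, materialized here for simplicity;
        # the hashed variant below avoids storing it for very large n_features
        R = rng.normal(size=(n_features, n_components)) / np.sqrt(n_components)
        X_low = np.asarray(X_sparse.dot(R))          # (n_samples, n_components)
        tree = BallTree(X_low)
        distances, indices = tree.query(X_low, k=n_neighbors)
        return distances, indices
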
> BTW, I am making some progress on the Random Projection branch: I have
> started integrating murmurhash to simulate a random projection by a
> sparse matrix that is never materialized in memory. The example looks
> good too. It still needs some work on the hashing part and on the
> narrative doc.
>
>   
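Just to illustrate the "never materialized" idea, a toy, untested sketch
(hashlib as a stand-in for murmurhash, Achlioptas-style +1/0/-1 entries;
all names are placeholders, this is obviously not the code from your
branch):

    # Sketch: sparse random projection where the projection matrix is never
    # stored. Each (feature, component) entry is recomputed on the fly from
    # a hash of its coordinates: +1 with prob 1/6, -1 with prob 1/6, 0 with
    # prob 2/3 (Achlioptas-style), scaled by sqrt(3 / n_components).
    import hashlib
    import numpy as np

    def hashed_projection(X_csr, n_components=64):
        n_samples = X_csr.shape[0]
        out = np.zeros((n_samples, n_components))
        for i in range(n_samples):
            start, stop = X_csr.indptr[i], X_csr.indptr[i + 1]
            for j, v in zip(X_csr.indices[start:stop], X_csr.data[start:stop]):
                for c in range(n_components):
                    h = int(hashlib.md5(b"%d-%d" % (j, c)).hexdigest(), 16)
                    r = h % 6
                    if r == 0:
                        out[i, c] += v
                    elif r == 1:
                        out[i, c] -= v
                    # else: this entry of the virtual matrix is zero
        return out * np.sqrt(3.0 / n_components)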
