2012/1/23 Alexandre Gramfort <[email protected]>:
> I am not sure it is what you want but you could use:
>
> K = radius_neighbors_graph(X, radius, mode='distance')
> K.data **= 2
> K.data *= -gamma
> np.exp(K.data, out=K.data)
>
> no?

+1 for the dense case
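
For reference, a self-contained version of the quoted snippet could look like the sketch below. The toy data and the radius and gamma values are made up for illustration; only the four lines at the end come from the suggestion above.

```python
import numpy as np
from sklearn.neighbors import radius_neighbors_graph

rng = np.random.RandomState(0)
X = rng.rand(50, 3)          # toy dense, low dimensional data (assumption)
radius, gamma = 0.3, 10.0    # illustrative values only

# Sparse matrix storing the Euclidean distance to each neighbor within `radius`
K = radius_neighbors_graph(X, radius, mode='distance')

# Turn the stored distances into RBF affinities in place: exp(-gamma * d ** 2)
K.data **= 2
K.data *= -gamma
np.exp(K.data, out=K.data)
```

Only the stored entries are transformed, so pairs farther apart than the radius keep an implicit affinity of zero: this is already a truncated RBF kernel.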

But ball tree does not work for high dim sparse data.

We would also need some truncated kernels (e.g. cosine similarity for
positive data or RBF in the general case), probably implemented in
Cython, for the high dim sparse case. The dense output of shape
(n_samples, n_neighbors) would be preallocated and assumed to fit in
memory, whereas a dense array of shape (n_samples, n_samples) or
(n_samples, n_features) would not.
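
To make the idea concrete, here is a rough pure Python sketch of a truncated cosine kernel, not the Cython implementation I have in mind. The function name `truncated_cosine_kernel` and the `chunk_size` parameter are hypothetical. It preallocates the two dense (n_samples, n_neighbors) outputs and only ever materializes one (chunk_size, n_samples) block of similarities at a time.

```python
import numpy as np
import scipy.sparse as sp


def truncated_cosine_kernel(X, n_neighbors, chunk_size=256):
    """Keep the n_neighbors largest cosine similarities of each row.

    Hypothetical helper: returns two dense (n_samples, n_neighbors) arrays,
    the neighbor indices and the matching similarities (self included).
    Assumes n_neighbors <= n_samples.
    """
    X = sp.csr_matrix(X, dtype=np.float64)
    n_samples = X.shape[0]

    # L2-normalize the rows so that a dot product is a cosine similarity
    norms = np.sqrt(np.asarray(X.multiply(X).sum(axis=1)).ravel())
    norms[norms == 0] = 1.0
    X = sp.csr_matrix(sp.diags(1.0 / norms).dot(X))

    # Preallocated dense outputs, assumed to fit in memory
    indices = np.empty((n_samples, n_neighbors), dtype=np.intp)
    values = np.empty((n_samples, n_neighbors), dtype=np.float64)

    for start in range(0, n_samples, chunk_size):
        stop = min(start + chunk_size, n_samples)
        # Dense (chunk, n_samples) block: the only large temporary
        S = X[start:stop].dot(X.T).toarray()
        # Unordered top-k per row, then sort those k by decreasing similarity
        top = np.argpartition(-S, n_neighbors - 1, axis=1)[:, :n_neighbors]
        rows = np.arange(stop - start)[:, None]
        order = np.argsort(-S[rows, top], axis=1)
        top = top[rows, order]
        indices[start:stop] = top
        values[start:stop] = S[rows, top]
    return indices, values


# Toy high dim sparse positive data (made up for the example)
X_demo = sp.random(20, 1000, density=0.05, format="csr", random_state=0)
nn_indices, nn_values = truncated_cosine_kernel(X_demo, n_neighbors=5,
                                                chunk_size=8)
```

A real implementation would avoid the dense temporary block altogether, which is why I think Cython is the right tool here.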

That would be very useful to make SpectralClustering work on text
data. That should also help with the "over-convergence" issues I
observe on the power iteration clustering branch when n_samples is too
big.

Using LSH (or some variant of random projection) might indeed be
interesting to quickly build the approximate nearest neighbors graph of
high dim sparse data (but I think a Cython version for the exact
truncated case would still be useful, at least as a control reference
for the approximate case).
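
As an illustration of the LSH idea, one common variant hashes each sample by the signs of a few random projections, so that samples with a small angle between them tend to land in the same bucket. The sketch below is only that: the function name, the `n_bits` parameter and the dense Gaussian hyperplanes are my assumptions, not an existing scikit-learn API.

```python
from collections import defaultdict

import numpy as np
import scipy.sparse as sp


def sign_random_projection_buckets(X, n_bits=8, seed=0):
    """Group the rows of X by the sign pattern of n_bits random projections."""
    rng = np.random.RandomState(seed)
    # Dense Gaussian hyperplanes of shape (n_features, n_bits): fine for a
    # sketch, too large to materialize for very high dimensional data
    R = rng.randn(X.shape[1], n_bits)
    bits = np.asarray(X.dot(R)) > 0      # (n_samples, n_bits) boolean codes
    # Pack each boolean code into a single integer bucket key
    keys = bits.dot(1 << np.arange(n_bits))
    buckets = defaultdict(list)
    for i, key in enumerate(keys):
        buckets[int(key)].append(i)
    return buckets


# Toy data: the second half repeats the first half, rescaled, so row i and
# row i + 10 point in the same direction and must share a bucket
A = sp.random(10, 500, density=0.05, format="csr", random_state=0)
X_lsh = sp.vstack([A, 2 * A]).tocsr()
lsh_buckets = sign_random_projection_buckets(X_lsh, n_bits=8)
```

Candidate neighbors are then only searched within a bucket (or across a few hash tables built with different seeds), and the exact truncated kernel can be used to check how many true neighbors are missed.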

BTW, I am making some progress on the Random Projection branch: I have
started integrating murmurhash to simulate random projection by a
sparse matrix that is never materialized in memory. The example looks
good too. It still needs some work on the hashing part and on the
narrative doc.
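
To give an idea of the principle (this is not the code from the branch), each entry of the projection matrix can be computed on the fly as a pure function of its (feature, component) coordinates through a hash, so the matrix itself is never stored. The sketch below assumes the `murmurhash3_32` helper from `sklearn.utils` and an Achlioptas-style sparse distribution (+1 and -1 with probability 1/6 each, 0 otherwise); the function name and the seeding scheme are made up.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.utils import murmurhash3_32


def hashed_random_projection(X, n_components, seed=0):
    """Project sparse X without materializing the random matrix.

    Hypothetical helper: entry R[j, c] is derived from a hash of the
    feature index j, using a different hash seed for each component c.
    """
    X = sp.csr_matrix(X, dtype=np.float64)
    n_samples = X.shape[0]
    # Row index and column index of every stored nonzero of X
    rows = np.repeat(np.arange(n_samples), np.diff(X.indptr))
    cols = X.indices.astype(np.int32)
    out = np.zeros((n_samples, n_components))
    for c in range(n_components):
        # Hash bucket in {0, ..., 5} for each nonzero's feature index
        h = murmurhash3_32(cols, seed=seed + c, positive=True) % 6
        # Bucket 0 -> +1, bucket 1 -> -1, other buckets -> 0
        r = np.where(h == 0, 1.0, np.where(h == 1, -1.0, 0.0))
        # Sum data * R[j, c] over the nonzeros of each row
        out[:, c] = np.bincount(rows, weights=X.data * r,
                                minlength=n_samples)
    # Scaling that goes with the +1 / -1 / 0 distribution above
    return np.sqrt(3.0 / n_components) * out


X_rp = sp.random(30, 5000, density=0.01, format="csr", random_state=0)
P = hashed_random_projection(X_rp, n_components=64)
```

Because the entries depend only on the hash, projecting new data later gives consistent results without keeping any state besides the seed.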

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
