On 26.10.2012 15:35, Olivier Grisel wrote:
> BTW, in the meantime you could encode your cooccurrences as text
> identifiers and use either Lucene/Solr in Java (via the sunburnt Python
> client) or Whoosh [1] in Python as a way to do efficient sparse lookups
> in such a sparse matrix, so that you can quickly compute the non-zero
> cosine similarities between all pairs. Solr also has MoreLikeThis
> queries that can be used to truncate the search to the most similar
> samples in the set, in case you have some very frequent non-zero
> features that would mostly break the sparsity of the cosine similarity
> matrix. As Trey Grainger says in his talk "Building a real time,
> solr-powered recommendation engine": "A Lucene index is a
> multi-dimensional sparse matrix… with very fast and powerful lookup
> capabilities."
>
> [1] http://packages.python.org/Whoosh/quickstart.html
> [2] http://www.slideshare.net/treygrainger/building-a-real-time-solrpowered-recommendation-engine

Thanks, this looks promising. What exactly do you mean by encoding
cooccurrences as text identifiers? How would I handle my sparse vectors
then?

I know the MoreLikeThis functionality, but does it compute exact cosine
similarity? The thing is that I need this relatedness measure for my
studies.
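
For concreteness, the measure in question is plain cosine similarity
between sparse vectors. Below is a minimal sketch in pure Python of how
an inverted index (feature -> samples) restricts the computation to
pairs that share at least one feature. The toy data, the feature names
and the function name cosine_pairs are made up for illustration; this
is not how Lucene, Solr or Whoosh implement their scoring.

```python
import math
from collections import defaultdict

# Hypothetical toy data: each sample is a sparse vector {feature: weight}.
vectors = {
    "a": {"f1": 1.0, "f3": 2.0},
    "b": {"f2": 3.0},
    "c": {"f1": 2.0, "f3": 4.0},
}


def cosine_pairs(vectors):
    """Return {(i, j): cosine} for pairs i < j sharing at least one feature.

    Assumes every stored weight is non-zero, so no vector has zero norm.
    """
    norms = {k: math.sqrt(sum(w * w for w in v.values()))
             for k, v in vectors.items()}

    # Inverted index: feature -> list of (sample, weight) postings.
    index = defaultdict(list)
    for k, v in vectors.items():
        for f, w in v.items():
            index[f].append((k, w))

    # Accumulate dot products only over samples found in the same posting list.
    dots = defaultdict(float)
    for postings in index.values():
        for a in range(len(postings)):
            for b in range(a + 1, len(postings)):
                (i, wi), (j, wj) = postings[a], postings[b]
                key = (i, j) if i < j else (j, i)
                dots[key] += wi * wj

    return {pair: d / (norms[pair[0]] * norms[pair[1]])
            for pair, d in dots.items()}


sims = cosine_pairs(vectors)
```

Here "a" and "c" point in the same direction, so their similarity is 1
(up to rounding), while "b" shares no feature with the others and
therefore never appears in the result.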

Philipp


_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
