2012/10/26 Philipp Singer <kill...@gmail.com>:
> On 26.10.2012 15:35, Olivier Grisel wrote:
>> BTW, in the meantime you could encode your co-occurrences as text
>> identifiers and use either Lucene/Solr (Java) through the sunburnt Python
>> client, or Whoosh [1] in Python, to do efficient sparse lookups in such a
>> sparse matrix so that you can quickly compute the non-zero cosine
>> similarities between all pairs. Solr also has MoreLikeThis queries that
>> can be used to truncate the search to the top most similar samples in
>> the set, in case you have some very frequent non-zero features that
>> would otherwise break the sparsity of the cosine similarity matrix. As
>> Trey Grainger says in his talk "Building a real time, solr-powered
>> recommendation engine" [2]: "A Lucene index is a multi-dimensional
>> sparse matrix… with very fast and powerful lookup capabilities."
>>
>> [1] http://packages.python.org/Whoosh/quickstart.html
>> [2] http://www.slideshare.net/treygrainger/building-a-real-time-solrpowered-recommendation-engine
>
> Thanks, this looks promising. What exactly do you mean by encoding
> co-occurrences as text identifiers? How would I handle my sparse vectors
> then?
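The inverted-index idea behind the Lucene/Solr suggestion can be sketched in plain Python, without Solr or Whoosh. This is a minimal illustration, not the API of either library: it assumes each sparse vector is a dict mapping a feature index to a weight, and the function name `sparse_cosine_all_pairs` is hypothetical. Only pairs that share at least one feature are ever visited, so zero similarities are never computed or stored.

```python
import math
from collections import defaultdict


def sparse_cosine_all_pairs(vectors):
    """Return {(i, j): cosine} for every pair i < j with a non-zero similarity.

    `vectors` is a list of dicts mapping feature index -> weight (assumed format).
    """
    # Inverted index: feature index -> list of (document id, weight).
    # Document ids are appended in increasing order, so i < j below.
    postings = defaultdict(list)
    for doc_id, vec in enumerate(vectors):
        for feat, weight in vec.items():
            if weight != 0:
                postings[feat].append((doc_id, weight))

    # Accumulate dot products only for pairs that co-occur in a posting list.
    dots = defaultdict(float)
    for plist in postings.values():
        for a in range(len(plist)):
            i, wi = plist[a]
            for b in range(a + 1, len(plist)):
                j, wj = plist[b]
                dots[(i, j)] += wi * wj

    # Normalize by the full L2 norms to turn dot products into cosines.
    norms = [math.sqrt(sum(w * w for w in vec.values())) for vec in vectors]
    return {
        pair: dot / (norms[pair[0]] * norms[pair[1]])
        for pair, dot in dots.items()
        if dot != 0
    }


docs = [{0: 1.0, 2: 1.0}, {0: 1.0}, {5: 3.0}]
sims = sparse_cosine_all_pairs(docs)
# Only the (0, 1) pair is stored: document 2 shares no feature with the others.
print(sims)
```

The cost of the inner double loop grows quadratically with the length of each posting list, which is why a feature present in almost every document destroys the benefit: it makes nearly every pair non-zero.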
It's just that the Solr API takes text documents as input rather than
precomputed (integer feature index, float feature value) pairs:
unfortunately, you cannot bypass Solr's text feature extraction layer
(the analyzers).

> I know the MoreLikeThis functionality, but does it compute the exact
> cosine similarity? The thing is that I need this relatedness measure for
> my studies.

No, it's a truncated approximation (a lower bound), but it keeps many
zeros in your similarity matrix in case you have terms that occur in
every single document.

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
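Both points of the reply can be illustrated with a small sketch, under stated assumptions. First, since Solr only accepts text, an integer count vector can be turned into a token string; the `f<index>` token scheme and the `to_token_string` helper are hypothetical, and any identifier that the analyzer leaves intact would do. Second, the "truncated approximation (a lower bound)" idea: `truncated_cosine_all_pairs` (also hypothetical) skips features whose document frequency exceeds `max_df` when accumulating dot products but keeps the full norms, so with non-negative weights every returned value is at most the exact cosine and many pairs stay at zero. This swaps in a simple document-frequency cut-off for illustration; it does not reproduce how MoreLikeThis actually selects its query terms.

```python
import math
from collections import defaultdict


def to_token_string(counts, prefix="f"):
    """Encode {feature index: integer count} as whitespace-separated tokens.

    Hypothetical scheme: feature 3 with count 2 becomes "f3 f3".
    """
    return " ".join(
        f"{prefix}{feat}"
        for feat, count in sorted(counts.items())
        for _ in range(count)
    )


def truncated_cosine_all_pairs(vectors, max_df):
    """Cosine similarities ignoring features present in more than max_df documents.

    Norms still use every feature, so for non-negative weights each value
    is a lower bound of the exact cosine similarity.
    """
    postings = defaultdict(list)
    for doc_id, vec in enumerate(vectors):
        for feat, weight in vec.items():
            if weight != 0:
                postings[feat].append((doc_id, weight))

    dots = defaultdict(float)
    for plist in postings.values():
        if len(plist) > max_df:
            continue  # too frequent: it would make almost every pair non-zero
        for a in range(len(plist)):
            i, wi = plist[a]
            for b in range(a + 1, len(plist)):
                j, wj = plist[b]
                dots[(i, j)] += wi * wj

    norms = [math.sqrt(sum(w * w for w in vec.values())) for vec in vectors]
    return {p: d / (norms[p[0]] * norms[p[1]]) for p, d in dots.items() if d != 0}


# Feature 0 occurs in every document; feature 1 only in documents 0 and 1.
docs = [{0: 1, 1: 2}, {0: 1, 1: 1}, {0: 1, 3: 1}]
print(to_token_string(docs[0]))  # f0 f1 f1
exact = truncated_cosine_all_pairs(docs, max_df=len(docs))
truncated = truncated_cosine_all_pairs(docs, max_df=2)
print(len(exact), len(truncated))  # 3 1
```

Here the exact matrix has all three pairs non-zero because of feature 0, while the truncated one keeps only the (0, 1) pair, with a smaller value than its exact cosine.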