Hi,

I want to compute the pairwise cosine similarity between items in a very high-dimensional vector space.
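For reference, here is a minimal sketch of the baseline computation, assuming the items are the rows of a SciPy CSR matrix. The function name `sparse_cosine_similarity` and the toy matrix are purely illustrative, not taken from any real data set or library API.

```python
import numpy as np
import scipy.sparse as sp


def sparse_cosine_similarity(X):
    """Pairwise cosine similarity between the rows of a sparse matrix.

    Hypothetical helper: rows are L2-normalised, then multiplied with
    their transpose, so the result stays sparse.
    """
    X = sp.csr_matrix(X, dtype=np.float64)
    # Euclidean norm of every row
    norms = np.sqrt(np.asarray(X.multiply(X).sum(axis=1)).ravel())
    norms[norms == 0] = 1.0  # all-zero rows stay zero instead of dividing by 0
    Xn = (sp.diags(1.0 / norms) @ X).tocsr()
    return Xn @ Xn.T


# Toy input (made up): 4 items, 4 features, uneven numbers of nonzeros.
X = sp.csr_matrix(np.array([[1, 0, 0, 0],
                            [2, 0, 0, 0],
                            [1, 1, 1, 1],
                            [0, 1, 1, 1]], dtype=float))
S = sparse_cosine_similarity(X)

# Number of nonzero features per item, i.e. the quantity whose
# distribution is skewed in the real data.
nnz_per_item = np.diff(X.indptr)
```

The similarity matrix is kept sparse because a dense items-by-items array quickly becomes impractical as the number of items grows.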
My input matrix is very sparse, but the number of nonzero elements per item follows a very skewed distribution (roughly a power law: a few items have many features, while most have only a few). Intuitively, comparing items with very different numbers of features does not seem desirable. The only idea I have come up with to mitigate this is to partition the input matrix into "bands" of items with similar numbers of features, but it is not obvious how to choose those bands given how skewed the distribution is.

I'd greatly appreciate any ideas or suggestions about this problem.

Thanks,
Christian

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general