[Scikit-learn-general] Similarity in a vector space model with skewed feature distribution

Christian Jauvin Tue, 22 Apr 2014 19:01:27 -0700

Hi,

I want to compute the pairwise cosine similarity of items in a very
high-dimensional vector space.
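
A minimal sketch of this kind of computation, assuming the items are the
rows of a scipy.sparse CSR matrix and that scikit-learn's
cosine_similarity is used. The toy matrix X is made up for illustration;
it is not data from this post.

```python
import numpy as np
from scipy import sparse
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy data: 4 items (rows), 6 features (columns).
# Row 2 has many features, the other rows have only one or two.
X = sparse.csr_matrix(np.array([
    [1, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 1],
], dtype=float))

# dense_output=False keeps the similarity matrix sparse, which matters
# when the number of items is large.
S = cosine_similarity(X, dense_output=False)

# S[i, j] is the cosine of the angle between item i and item j.
print(S.toarray().round(3))
```

In this toy case the one-feature item 0 and the six-feature item 2 share
a single feature, so their similarity is 1/sqrt(6), lower than the
1/sqrt(2) between items 0 and 1.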


My input matrix is very sparse, but the number of nonzero elements per
item follows a very skewed distribution (roughly a power law: a few
items have many features, while most items have very few).

Intuitively, comparing items with very different numbers of features
doesn't seem very desirable, but the only idea I have had so far to
mitigate this problem is to partition my input matrix into "bands" of
items having similar numbers of features, which is not obvious to do,
given the very skewed distribution.
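
One possible way to form such bands, sketched here under a loud
assumption: band edges at powers of two, so each band covers a constant
ratio of feature counts rather than a constant width. This choice is
only a guess at what might suit a power-law-like distribution, and the
helper name band_items_by_nnz is hypothetical, not part of any library.

```python
import numpy as np
from scipy import sparse

def band_items_by_nnz(X):
    """Group row indices of X by the order of magnitude of their nonzero count.

    Band k holds the rows whose count n satisfies 2**k <= n < 2**(k + 1).
    Rows with no stored entries land in band -1.
    """
    X = sparse.csr_matrix(X)
    # Stored entries per row (explicitly stored zeros would be counted too).
    nnz = X.getnnz(axis=1)
    # bit_length() - 1 equals floor(log2(n)) for n >= 1, and gives -1 for n == 0.
    band_ids = np.array([int(n).bit_length() - 1 for n in nnz])
    return {int(k): np.flatnonzero(band_ids == k) for k in np.unique(band_ids)}

# Hypothetical toy data: rows with 1, 2, 6 and 1 nonzero features.
X = sparse.csr_matrix(np.array([
    [1, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 1],
], dtype=float))

bands = band_items_by_nnz(X)
for k, rows in bands.items():
    print(k, rows.tolist())
```

Similarities could then be computed within one band at a time, for
example on X[bands[k]]. Items that fall just on either side of a band
edge would never be compared, which is a real limitation of any hard
partition like this one.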

I'd greatly appreciate any ideas or suggestions about this problem.

Thanks,

Christian
