Is your problem figuring out a good similarity measure, or dealing with
large quantities of sparse data in a memory-efficient way? If it is the
latter, you can look into feature hashing:
http://en.wikipedia.org/wiki/Feature_hashing
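
To make the idea concrete, here is a rough pure-Python sketch of the
hashing trick followed by a cosine similarity on the hashed vectors. The
names hash_features, cosine and N_BUCKETS, the choice of MD5 and the
bucket count are all my own assumptions for illustration, not anything
from your setup. In practice scikit-learn's
sklearn.feature_extraction.FeatureHasher does the hashing for you and
returns a scipy.sparse matrix.

```python
import hashlib
import math
from collections import defaultdict

N_BUCKETS = 2 ** 20  # assumed size of the hashed feature space


def hash_features(features, n_buckets=N_BUCKETS):
    """Map {feature_name: value} to a sparse {bucket_index: value} dict."""
    hashed = defaultdict(float)
    for name, value in features.items():
        digest = hashlib.md5(name.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        index = (h >> 1) % n_buckets
        # Signed hashing: one hash bit picks the sign, so colliding
        # features tend to cancel rather than always add up.
        sign = 1.0 if h & 1 else -1.0
        hashed[index] += sign * value
    return dict(hashed)


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    dot = sum(val * v.get(idx, 0.0) for idx, val in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for empty vectors
    return dot / (norm_u * norm_v)


# Hypothetical items: feature name -> weight.
item_a = hash_features({"python": 1.0, "sparse": 2.0})
item_b = hash_features({"python": 2.0, "sparse": 4.0})
similarity = cosine(item_a, item_b)
```

Memory use is bounded by the number of buckets rather than by the size
of the original vocabulary, at the cost of occasional collisions. Note
that this only helps with memory; it does not change how cosine
similarity treats items with very different numbers of features.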
regards
shankar.
On Wed, Apr 23, 2014 at 9:59 AM, Christian Jauvin <cjau...@gmail.com> wrote:
> Hi,
>
> I want to compute the pairwise cosine similarity of items in a vector
> space of very high dimensionality.
>
> My input matrix is very sparse, but the number of nonzero elements per
> item follows a very skewed distribution (i.e. power law-ish, with very
> few items having lots of features, and vice versa).
>
> Intuitively, comparing items with very different numbers of features
> doesn't seem very desirable, but the only idea I have for mitigating
> this problem is to partition my input matrix into "bands of items
> having similar #s of features", which is not obvious to do, given the
> very skewed distribution.
>
> I'd greatly appreciate any idea or suggestion about this problem.
>
> Thanks,
>
> Christian
>
>
> ------------------------------------------------------------------------------
> Start Your Social Network Today - Download eXo Platform
> Build your Enterprise Intranet with eXo Platform Software
> Java Based Open Source Intranet - Social, Extensible, Cloud Ready
> Get Started Now And Turn Your Intranet Into A Collaboration Platform
> http://p.sf.net/sfu/ExoPlatform
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>