If it's very sparse, you can try
https://issues.apache.org/jira/browse/MAHOUT-703

Instead of minimizing reconstruction error, it tries to ensure that the words
in your document rank higher than other words that are not present in it.
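Here is a minimal plain-Python sketch of that idea: one stochastic gradient step on a pairwise margin ranking (hinge) loss. It is a generic illustration of the technique, not the MAHOUT-703 code. The function names, the dot-product scoring, and the `margin`/`lr` defaults are all assumptions. A document embedding is scored against word embeddings, and whenever a sampled absent word scores within `margin` of a sampled present word, the embeddings are nudged to widen the gap.

```python
import random


def dot(u, v):
    """Dot product of two equal-length lists of floats."""
    return sum(a * b for a, b in zip(u, v))


def pairwise_rank_step(doc_vec, word_vecs, present, margin=1.0, lr=0.1, rng=random):
    """One SGD step on a pairwise margin ranking (hinge) loss.

    Hypothetical illustration, not the MAHOUT-703 implementation.
    doc_vec   -- document embedding (list of floats), updated in place
    word_vecs -- dict mapping word -> embedding; its entries are replaced
    present   -- set of words occurring in the document; at least one word
                 in word_vecs must be absent from it
    Returns the hinge loss for the sampled pair before the update.
    """
    # Sample one word that is in the document and one that is not.
    pos = rng.choice(sorted(present))
    neg = rng.choice([w for w in sorted(word_vecs) if w not in present])

    # Score each word by its dot product with the document embedding.
    s_pos = dot(doc_vec, word_vecs[pos])
    s_neg = dot(doc_vec, word_vecs[neg])

    # Loss is zero once the present word outscores the absent one by `margin`.
    loss = max(0.0, margin - s_pos + s_neg)
    if loss > 0.0:
        d_old = list(doc_vec)
        p_old = word_vecs[pos]
        n_old = word_vecs[neg]
        # Gradients of (margin - d.p + d.n): d -> (n - p), p -> -d, n -> +d.
        for i in range(len(doc_vec)):
            doc_vec[i] -= lr * (n_old[i] - p_old[i])
        word_vecs[pos] = [p + lr * d for p, d in zip(p_old, d_old)]
        word_vecs[neg] = [n - lr * d for n, d in zip(n_old, d_old)]
    return loss
```

Unlike a truncated SVD, which fits every cell of the matrix including the zeros, this loss only asks present words to outrank absent ones. That may be one reason this kind of objective suits very sparse text data.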

Some example results from this approach:

https://docs.google.com/present/edit?id=0AQC247eq7Jp5ZGZ6NXpyOWhfMjlmM2pzdjRkZw&authkey=CNj2h98P&hl=en_US


On Fri, Jun 3, 2011 at 4:48 PM, Eshwaran Vijaya Kumar wrote:

> Hello all,
>  We are trying to build a clustering system which will have an SVD
> component. I believe Mahout has two SVD solvers: DistributedLanczosSolver
> and SSVD. Could someone give me some tips on which would be a better choice
> of a solver given that the size of the data will be roughly 100 million rows
> with each row having roughly 50K dimensions (100 million x 50,000). We will
> be working with text data so the resultant matrix should be relatively
> sparse to begin with.
>
> Thanks
> Eshwaran




-- 
Yee Yang Li Hector
http://hectorgon.blogspot.com/ (tech + travel)
http://hectorgon.com (book reviews)
