What you probably need to worry about is not the number of
dimensions but the average number of non-zero elements per
row (density). How dense is the data?
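
If it helps, here is a rough Python sketch of how that number could be
estimated from a sample of rows. This is not Mahout code: the function
names and the dict-per-row layout ({column index: value}) are my own
assumptions for illustration only.

```python
def avg_nnz_per_row(rows):
    """Average number of non-zero elements per row.

    rows: iterable of sparse rows, each a dict {column_index: value}
    (a hypothetical layout chosen only for this sketch).
    """
    total_nnz = 0
    n_rows = 0
    for row in rows:
        # Count only entries that are actually non-zero.
        total_nnz += sum(1 for v in row.values() if v != 0)
        n_rows += 1
    return total_nnz / n_rows if n_rows else 0.0


def density(rows, n_cols):
    """Fraction of cells that are non-zero, given the column count."""
    return avg_nnz_per_row(rows) / n_cols


# Tiny made-up sample: 3 rows over 10 columns.
sample = [
    {0: 1.0, 7: 2.0},
    {3: 1.0},
    {1: 1.0, 2: 1.0, 9: 4.0},
]

print(avg_nnz_per_row(sample))     # 2.0
print(density(sample, n_cols=10))  # 0.2
```

Running something like this over a few thousand rows sampled from the
real matrix should be enough to get a usable estimate.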



On Fri, Jun 3, 2011 at 4:48 PM, Eshwaran Vijaya Kumar wrote:
> Hello all,
>  We are trying to build a clustering system which will have an SVD component. 
> I believe Mahout has two SVD solvers: DistributedLanczosSolver and SSVD. 
> Could someone give me some tips on which would be a better choice of a solver 
> given that the size of the data will be roughly 100 million rows with each 
> row having roughly 50 K dimensions (100 million X 50000 ). We will be working 
> with text data so the resultant matrix should be relatively sparse to begin 
> with.
>
> Thanks
> Eshwaran
