What you probably need to worry about is not the number of dimensions but the average number of non-zero elements per row (density). How dense is the data?
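
If it helps, here is a rough sketch of how one might estimate that figure on a sample of the data. It assumes each row is available as a mapping from column index to value; that layout, the function names, and the toy rows are all made up for illustration and have nothing to do with Mahout's own vector classes.

```python
def avg_nonzeros_per_row(rows):
    """Average count of non-zero entries per row.

    `rows` is an iterable of dicts {column_index: value}; this sparse
    layout is an assumption made for the sketch.
    """
    total_nnz = 0
    n_rows = 0
    for row in rows:
        # Count only entries whose stored value is actually non-zero.
        total_nnz += sum(1 for value in row.values() if value != 0)
        n_rows += 1
    return total_nnz / n_rows if n_rows else 0.0


def density(rows, n_cols):
    """Fraction of matrix cells that are non-zero."""
    return avg_nonzeros_per_row(rows) / n_cols


# Toy sample: 3 rows over 10 columns (hypothetical data).
sample_rows = [
    {0: 1.0, 7: 2.0},
    {3: 1.0},
    {1: 1.0, 2: 1.0, 9: 4.0},
]
```

Running this on a random sample of rows, rather than all of them, should be enough to get a usable estimate.
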

On Fri, Jun 3, 2011 at 4:48 PM, Eshwaran Vijaya Kumar <[email protected]> wrote:
> Hello all,
> We are trying to build a clustering system which will have an SVD component.
> I believe Mahout has two SVD solvers: DistributedLanczosSolver and SSVD.
> Could someone give me some tips on which would be a better choice of a solver
> given that the size of the data will be roughly 100 million rows with each
> row having roughly 50 K dimensions (100 million X 50000 ). We will be working
> with text data so the resultant matrix should be relatively sparse to begin
> with.
>
> Thanks
> Eshwaran
