Hi,

The requested rank (k) is 1000 and p is 1. The input size is 1.2 GB.

Thanks

On Mon, Jun 10, 2013 at 9:28 PM, Dmitriy Lyubimov <[email protected]> wrote:

> What is the requested rank? This guy will not scale w.r.t. rank, only w.r.t.
> input size. Realistically you don't need k > 100, p > 15.
>
> What is the input size (A in GB)?
>
>
> On Mon, Jun 10, 2013 at 5:31 AM, Yahia Zakaria <[email protected]> wrote:
>
> > Hi All,
> >
> > I am running Mahout SSVD (trunk version) with the pca option on the Bag of
> > Words dataset (http://archive.ics.uci.edu/ml/datasets/Bag+of+Words). This
> > dataset has 8,000,000 instances (rows) and 100,000 attributes (columns).
> > Mahout SSVD is too slow; it may take days to finish the first phase of SSVD
> > (the Q-Job). I am running the code on a cluster of 16 machines, each with
> > 8 cores and 32 GB of memory. Moreover, the CPU and memory of the workers are
> > not utilized at all. When running Mahout SSVD on a smaller dataset (12,500
> > rows and 5,000 columns), it runs quickly: the job finished in 2 minutes. Do
> > you have any idea why Mahout SSVD is so slow on high-dimensional data? And
> > to what extent can SSVD work efficiently (with respect to the number of rows
> > and columns of the input matrix)?
> >
> > Thanks
> > Yehia
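A rough back-of-the-envelope sketch of why the requested rank matters more than the input size here. Stochastic SVD first forms Y = A * Omega, which has as many rows as A but only k + p columns, and Y is dense even when A is sparse. Assuming Y is held as dense 8-byte doubles (Mahout's actual block layout may differ) and that orthonormalizing it costs on the order of m * (k + p)^2 flops, as for a QR of a tall-skinny matrix, the hypothetical helpers below compare the requested k = 1000, p = 1 with the suggested k = 100, p = 15.

```python
# Back-of-the-envelope cost of the SSVD projection Y = A * Omega.
# Assumptions (not Mahout's internals): Y is stored as dense 8-byte doubles,
# and orthonormalizing Y costs on the order of m * (k + p)^2 flops.

def projection_gb(m: int, k: int, p: int) -> float:
    """Dense size in GB (1e9 bytes) of Y, which has m rows and k + p columns."""
    return m * (k + p) * 8 / 1e9


def qr_flops(m: int, k: int, p: int) -> int:
    """Order-of-magnitude flop count for a QR of an m x (k + p) matrix."""
    return m * (k + p) ** 2


m = 8_000_000  # number of rows in the input, as stated in the thread
requested = (1000, 1)  # k, p from the reply
suggested = (100, 15)  # upper bounds suggested in the thread

for k, p in (requested, suggested):
    print(f"k={k}, p={p}: Y ~ {projection_gb(m, k, p):.1f} GB, "
          f"QR ~ {qr_flops(m, k, p):.2e} flops")

# How much more orthonormalization work the requested rank implies.
ratio = qr_flops(m, *requested) / qr_flops(m, *suggested)
print(f"QR work ratio: {ratio:.0f}x")
```

Under these assumptions, with k = 1000 the dense projection alone is about 64 GB (roughly 50 times the 1.2 GB input) versus about 7.4 GB for k = 100, p = 15, and the orthonormalization work grows with the square of k + p (about 76 times more here), which fits the advice to keep k <= 100 and p <= 15.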
