Did you tune the number of reducers? I successfully applied SSVD to a dataset with 3B nonzeros on 6 machines in a few hours.

On 10.06.2013 14:32, "Yahia Zakaria" <[email protected]> wrote:
> Hi All,
>
> I am running Mahout SSVD (trunk version) with the PCA option on the Bag of
> Words dataset (http://archive.ics.uci.edu/ml/datasets/Bag+of+Words). This
> dataset has 8,000,000 instances (rows) and 100,000 attributes (columns).
> Mahout SSVD is very slow: the first phase of SSVD (the Q-Job) alone may take
> days to finish. I am running the code on a cluster of 16 machines, each with
> 8 cores and 32 GB of memory. Moreover, the CPU and memory of the workers are
> not utilized at all. On a smaller dataset (12,500 rows and 5,000 columns),
> by contrast, it runs very fast: the job finished in 2 minutes. Do you have
> any idea why Mahout SSVD is so slow on high-dimensional data? And up to what
> input matrix size (number of rows and columns) can SSVD be expected to work
> efficiently?
>
> Thanks,
> Yehia
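
For concreteness, here is a rough sketch of what tuning the reducer count could look like for the cluster described in the question. It is an illustration under stated assumptions, not a tested recipe: the "one reduce task per core" heuristic, the rank value, and the input/output paths are all hypothetical, and the option names (`--rank`, `--pca`, `--reduceTasks`) are given as I recall them from the Mahout SSVD command line, so please check them against `mahout ssvd --help` for your version.

```python
from typing import List

# Cluster figures taken from the question: 16 machines, 8 cores each.
MACHINES = 16
CORES_PER_MACHINE = 8


def suggested_reduce_tasks(machines: int, cores_per_machine: int) -> int:
    """Assumed heuristic (not from Mahout docs): one reduce task per core."""
    return machines * cores_per_machine


def ssvd_command(input_path: str, output_path: str,
                 rank: int, reduce_tasks: int) -> List[str]:
    """Assemble an SSVD invocation as an argument list.

    Option names are assumptions recalled from the Mahout SSVD CLI;
    verify them with `mahout ssvd --help` before running anything.
    """
    return [
        "mahout", "ssvd",
        "--input", input_path,
        "--output", output_path,
        "--rank", str(rank),
        "--pca", "true",
        "--reduceTasks", str(reduce_tasks),
    ]


if __name__ == "__main__":
    tasks = suggested_reduce_tasks(MACHINES, CORES_PER_MACHINE)
    # Hypothetical paths and rank, for illustration only.
    cmd = ssvd_command("/data/bow/matrix", "/data/bow/ssvd-out",
                       rank=50, reduce_tasks=tasks)
    print(" ".join(cmd))
```

If the job was left at a very small reducer count, that alone could explain workers sitting idle, so comparing the configured value against the number of available cores seems like a reasonable first check.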
