Mahout SSVD is too slow for highly dimensional data

Yahia Zakaria Mon, 10 Jun 2013 05:32:43 -0700

Hi All

I am running Mahout SSVD (trunk version) with the PCA option on the Bag of
Words dataset (http://archive.ics.uci.edu/ml/datasets/Bag+of+Words). This
dataset has 8000000 instances (rows) and 100000 attributes (columns). Mahout
SSVD is very slow on it: the first phase of SSVD (Q-Job) may take days to
finish. I am running the code on a cluster of 16 machines, each with 8 cores
and 32 GB of memory, yet the CPU and memory of the workers are not utilized
at all. By contrast, on a smaller dataset (12500 rows and 5000 columns)
Mahout SSVD runs very fast; the job finished in 2 minutes. Do you have any
idea why Mahout SSVD is so slow for high-dimensional data? And up to what
input matrix size (in number of rows and columns) can SSVD work
efficiently?


Thanks
Yehia
