Hi all,

I am running Mahout SSVD (trunk version) with the PCA option on the Bag of Words dataset (http://archive.ics.uci.edu/ml/datasets/Bag+of+Words). The dataset has 8,000,000 instances (rows) and 100,000 attributes (columns).

Mahout SSVD is very slow on this input: the first phase of SSVD (the Q-Job) alone may take days to finish. I am running the code on a cluster of 16 machines, each with 8 cores and 32 GB of memory, yet the CPU and memory of the workers are not being utilized at all. By contrast, on a smaller dataset (12,500 rows and 5,000 columns) Mahout SSVD runs very fast, and the job finished in 2 minutes.

Do you have any idea why Mahout SSVD is so slow on high-dimensional data? And up to what input size (in terms of the number of rows and columns of the input matrix) can SSVD be expected to work efficiently?
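
To put the two runs side by side, here is a rough Python sketch of the size gap between them. It only counts cells as if both matrices were dense. The bag-of-words data is sparse, so this is an upper bound on the difference in input size, not a measurement of the work the Q-Job actually does. The variable names are just for illustration.

```python
# Shapes taken from the two runs described in this post.
large_rows, large_cols = 8_000_000, 100_000   # Bag of Words input
small_rows, small_cols = 12_500, 5_000        # smaller test input

# Dense cell counts (an upper bound, since the real input is sparse).
large_cells = large_rows * large_cols
small_cells = small_rows * small_cols

# How much bigger the large input is along each axis and overall.
row_ratio = large_rows / small_rows
col_ratio = large_cols / small_cols
cell_ratio = large_cells / small_cells

print(f"rows:  {row_ratio:.0f}x")
print(f"cols:  {col_ratio:.0f}x")
print(f"cells: {cell_ratio:.0f}x")
```

So the large input has 640 times the rows and 20 times the columns of the small one, or 12,800 times the cells if both were dense.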
Thanks,
Yehia
