Hi,
I'm also new to Mahout, but I hope I can help:

If you are running Mahout with Hadoop on its default settings on a single machine without a cluster, I advise you to set it up as a single-node cluster and tune Hadoop. That can give an enormous performance improvement.

With the default Hadoop settings my job (SSVD on a small data set) took 32 minutes. After switching to a single-node cluster and a little tuning of the io.sort.mb parameter, I got it down to 9-10 minutes. With the default settings the job makes lots of spills to disk, and they're really slow. I'm still tuning my job, but it's a huge improvement already.
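To show why io.sort.mb matters for spills, here is a rough back-of-the-envelope sketch in Python. It assumes the Hadoop 1.x defaults as I understand them (io.sort.mb = 100, io.sort.spill.percent = 0.80) and a made-up 500 MB of output per map task. The function name is my own, and the estimate ignores the part of the buffer reserved for record metadata, so real spill counts can be higher.

```python
import math


def estimated_spills(map_output_mb, io_sort_mb=100, spill_percent=0.80):
    """Rough number of spill files written by one map task.

    Simplified model: the sort buffer is flushed to disk each time it
    reaches spill_percent of io_sort_mb. Record metadata overhead is
    ignored, so treat the result as a lower bound.
    """
    usable_mb = io_sort_mb * spill_percent
    return max(1, math.ceil(map_output_mb / usable_mb))


if __name__ == "__main__":
    # 500 MB of map output is a hypothetical figure, not a measurement.
    for sort_mb in (100, 200, 400):
        spills = estimated_spills(500, io_sort_mb=sort_mb)
        print(f"io.sort.mb={sort_mb}: about {spills} spill(s)")
```

Fewer spills means fewer slow disk writes and less merging at the end of the map task. Keep in mind that the sort buffer lives inside the task's JVM heap, so io.sort.mb has to fit within it.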

To get those improvements I followed:
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
(without setting up HDFS)

and
http://www.slideshare.net/cloudera/mr-perf

Hope that helps.

On 24.05.2013 11:59, parnab kumar wrote:
Hi,
I am using Mahout 0.6 with its default settings, i.e. I am not using any
Hadoop cluster. I am running it in developer mode. To test LDA I used
around 40k files, which I converted to sequence-file format vectors. I
tried to test with 20 iterations. It is taking more than 3 hours to
complete the 20 iterations. I do not understand why it is taking so much
time. Is it natural for it to take so much time without using a cluster?
I am using a CPU with 2 processors and 2 GB of RAM.

Thanks,
Parnab

