Re: running lda in mahout

Jakub Pawłowski Fri, 24 May 2013 03:13:53 -0700

Hi,
I'm also new to Mahout, but I hope I can help:

If you are running Mahout with Hadoop on default settings on one machine with no cluster, I advise you to change your setup to a single-node cluster and tune Hadoop. That can bring enormous performance improvements.

Because of the default Hadoop settings, my job (SSVD, on small data) took 32 minutes; after switching to a cluster setup and a little tuning of the io.sort.mb parameter, I managed to get it down to 9-10 minutes. With default settings it makes lots of spills to disk, and they're really slow. I'm still tuning my job, but it's a huge improvement already.
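
To illustrate why io.sort.mb matters, here is a rough back-of-the-envelope sketch (not Hadoop code) that estimates how many spill files a single map task writes. It assumes the Hadoop 1.x defaults as I remember them (io.sort.mb = 100, io.sort.spill.percent = 0.80); the 500 MB map output and the function name estimate_spills are made up for the example. It ignores the buffer space reserved for record metadata, so real spill counts can be higher.

```python
import math


def estimate_spills(map_output_mb, io_sort_mb=100, spill_percent=0.80):
    """Roughly estimate the number of spill files written by one map task.

    Simplified model: the sort buffer is flushed to disk each time it
    reaches io_sort_mb * spill_percent megabytes of map output.
    """
    if map_output_mb <= 0:
        return 0
    usable_mb = io_sort_mb * spill_percent
    # Any non-empty output is flushed at least once when the task ends.
    return max(1, math.ceil(map_output_mb / usable_mb))


if __name__ == "__main__":
    # Hypothetical map task producing 500 MB of intermediate output.
    for sort_mb in (100, 200, 400):
        spills = estimate_spills(500, io_sort_mb=sort_mb)
        print("io.sort.mb=%d -> about %d spill(s)" % (sort_mb, spills))
```

Fewer spills mean fewer on-disk merge passes, which is likely where most of the time saving came from. Note that io.sort.mb has to fit inside the map task's JVM heap, so it cannot be raised arbitrarily.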


To get those improvements I followed:
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
(without setting up HDFS)

and
http://www.slideshare.net/cloudera/mr-perf

Hope that helps.

On 24.05.2013 11:59, parnab kumar wrote:

Hi,
I am using Mahout 0.6 in its default settings, i.e. I am not
using any Hadoop cluster. I am running it in developer mode. To test LDA
I used around 40k files, which I converted to sequence file format vectors.
I tried to test with 20 iterations. It is taking more than 3 hrs to
complete the 20 iterations. I do not understand why it is taking so much
time. Is it natural for it to take so much time without using a cluster?
I am using a CPU with 2 processors and 2 GB of RAM.

Thanks,
Parnab
