Hi,

I'm having trouble getting the 'mahout cvb' command to use all of the available CPU power. CPU usage stays at only about 35%, with I/O wait at ~0%. I have a single machine with 8 cores and 28 GB of memory, running Mahout 0.7-cdh-4.1.2 with Hadoop 2.0.0-cdh4.1.2 in pseudo-distributed mode. How can I make use of all the CPU power for a single 'mahout cvb' run?
I use the following parameters to run mahout cvb:

  mahout cvb \
    -Ddfs.namenode.handler.count=32 \
    -Dmapred.job.tracker.handler.count=32 \
    -Dio.sort.factor=30 \
    -Dio.sort.mb=500 \
    -Dio.file.buffer.size=65536 \
    -Dmapred.child.java.opts=-Xmx2g \
    -Dmapred.map.child.java.opts=-Xmx2g \
    -Dmapred.reduce.child.java.opts=-Xmx2g \
    -Dmapred.job.reuse.jvm.num.tasks=-1 \
    -Dmapred.map.tasks=7 \
    -Dmapred.reduce.tasks=7 \
    -Dmapred.max.split.size=3145728 \
    -Dmapred.min.split.size=3145728 \
    -Dmapred.tasktracker.map.tasks.maximum=7 \
    -Dmapred.tasktracker.reduce.tasks.maximum=7 \
    -Dmapred.tasktracker.tasks.maximum=7 \
    --input ~/mahout-files/mydatavectors_int \
    --output ~/mahout-files/topics \
    --num_terms 10078 \
    --num_topics 50 \
    --doc_topic_output ~/mahout-files/doc-topics \
    --maxIter 50 \
    --num_update_threads 8 \
    --num_train_threads 8 \
    -block 1 \
    --test_set_fraction 0.1 \
    --convergenceDelta 0.0000001 \
    --tempDir ~/mahout-files/cvb-temp

The Linux top command shows:

  Cpu(s): 33.9%us,  1.1%sy,  0.0%ni, 65.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
  Mem:  28479224k total, 16398624k used, 12080600k free,   899576k buffers
  Swap: 28942332k total,        0k used, 28942332k free,  5733368k cached

    PID USER    PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  19765 mapred  20   0 2811m 650m  16m S  129  2.3   3:59.06 java
  19721 mapred  20   0 2812m 650m  16m S  125  2.3   3:53.70 java

So only two task JVMs are running, and together they keep just about 2.5 of the 8 cores busy.

Regards,
Markus
