Hi,

I am having trouble utilizing all of the available CPU power with the
'mahout cvb' command.
CPU usage is only about 35% and I/O wait is ~0%.
I have 8 cores and 28 GB memory in a single computer that is running Mahout
0.7-cdh-4.1.2 with Hadoop 2.0.0-cdh4.1.2 in pseudo-distributed mode.
How can I take advantage of all the CPU power for a single 'mahout cvb'
task?


I use the following parameters to run mahout cvb:

mahout cvb
-Ddfs.namenode.handler.count=32
-Dmapred.job.tracker.handler.count=32
-Dio.sort.factor=30
-Dio.sort.mb=500
-Dio.file.buffer.size=65536
-Dmapred.child.java.opts=-Xmx2g
-Dmapred.map.child.java.opts=-Xmx2g
-Dmapred.reduce.child.java.opts=-Xmx2g
-Dmapred.job.reuse.jvm.num.tasks=-1
-Dmapred.map.tasks=7
-Dmapred.reduce.tasks=7
-Dmapred.max.split.size=3145728
-Dmapred.min.split.size=3145728
-Dmapred.tasktracker.map.tasks.maximum=7
-Dmapred.tasktracker.reduce.tasks.maximum=7
-Dmapred.tasktracker.tasks.maximum=7
  --input ~/mahout-files/mydatavectors_int
  --output ~/mahout-files/topics
  --num_terms 10078
  --num_topics 50
  --doc_topic_output ~/mahout-files/doc-topics
  --maxIter 50
  --num_update_threads 8
  --num_train_threads 8
  -block 1
  --test_set_fraction 0.1
  --convergenceDelta 0.0000001
  --tempDir ~/mahout-files/cvb-temp


The Linux top command reports:

Cpu(s): 33.9%us,  1.1%sy,  0.0%ni, 65.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  28479224k total, 16398624k used, 12080600k free,   899576k buffers
Swap: 28942332k total,        0k used, 28942332k free,  5733368k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
19765 mapred    20   0 2811m 650m  16m S  129  2.3   3:59.06 java
19721 mapred    20   0 2812m 650m  16m S  125  2.3   3:53.70 java

So only about 2.5 of the 8 cores are in use.
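
The arithmetic behind that estimate can be sketched roughly as follows. The two %CPU figures and the core count are copied from the top output and hardware description above; the variable names are only illustrative, and the sketch assumes top is reporting 100% per fully busy core.

```python
# %CPU of the two java task processes, copied from the top output above.
# Assumption: top reports 100% for one fully busy core.
task_cpu_percent = [129.0, 125.0]
total_cores = 8

# Number of cores' worth of CPU the two tasks consume together.
busy_cores = sum(task_cpu_percent) / 100.0

# Share of the whole machine that this represents.
machine_share = busy_cores / total_cores

print(f"busy cores: {busy_cores:.2f} of {total_cores}")
print(f"machine share: {machine_share:.1%}")
```

This is roughly in line with the ~35% (us + sy) shown in the Cpu(s) summary line.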


Regards, Markus
