Thanks! That did the trick. I divided the input sequence file into 8 equally sized files and now CPU usage is near 100%.
Regards,
Markus

2012/12/3 Andy Schlaikjer <[email protected]>:

> Hi Markus,
>
> First I'd check to make sure your input term vectors are evenly
> partitioned into more than two part files. You can force a certain
> map-side parallelism by creating a specific number of part files here.
> No matter how you configure map slots, you'll need the input organized
> in such a way that all the slots receive tasks to process.
>
> Andy
> @sagemintblue
>
> On Dec 3, 2012, at 12:44 AM, Markus Paaso <[email protected]> wrote:
>
>> The log shows that there are 2 map tasks and 10 reduce tasks.
>> How can there be 10 reduce tasks when I set the parameter
>> '-Dmapred.tasktracker.reduce.tasks.maximum=7'?
>> I would like to increase the number of concurrent map tasks. Any
>> parameter suggestions for that?
>>
>> It seems that the configuration parameter
>> 'mapred.tasktracker.map.tasks.maximum' doesn't increase the number of
>> concurrently running map tasks...
>>
>> Some log rows from mahout cvb:
>>
>> 12/12/03 10:30:23 INFO mapred.JobClient: Job complete: job_201212011004_0432
>> 12/12/03 10:30:23 INFO mapred.JobClient: Counters: 32
>> 12/12/03 10:30:23 INFO mapred.JobClient:   File System Counters
>> 12/12/03 10:30:23 INFO mapred.JobClient:     FILE: Number of bytes read=8076460
>> 12/12/03 10:30:23 INFO mapred.JobClient:     FILE: Number of bytes written=18396152
>> 12/12/03 10:30:23 INFO mapred.JobClient:     FILE: Number of read operations=0
>> 12/12/03 10:30:23 INFO mapred.JobClient:     FILE: Number of large read operations=0
>> 12/12/03 10:30:23 INFO mapred.JobClient:     FILE: Number of write operations=0
>> 12/12/03 10:30:23 INFO mapred.JobClient:     HDFS: Number of bytes read=14054985
>> 12/12/03 10:30:23 INFO mapred.JobClient:     HDFS: Number of bytes written=4040120
>> 12/12/03 10:30:23 INFO mapred.JobClient:     HDFS: Number of read operations=166
>> 12/12/03 10:30:23 INFO mapred.JobClient:     HDFS: Number of large read operations=0
>> 12/12/03 10:30:23 INFO mapred.JobClient:     HDFS: Number of write operations=91
>> 12/12/03 10:30:23 INFO mapred.JobClient:   Job Counters
>> 12/12/03 10:30:23 INFO mapred.JobClient:     Launched map tasks=2
>> 12/12/03 10:30:23 INFO mapred.JobClient:     Launched reduce tasks=10
>> 12/12/03 10:30:23 INFO mapred.JobClient:     Data-local map tasks=2
>> 12/12/03 10:30:23 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=456617
>> 12/12/03 10:30:23 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=108715
>> 12/12/03 10:30:23 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
>> 12/12/03 10:30:23 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
>> 12/12/03 10:30:23 INFO mapred.JobClient:   Map-Reduce Framework
>> 12/12/03 10:30:23 INFO mapred.JobClient:     Map input records=77332
>> 12/12/03 10:30:23 INFO mapred.JobClient:     Map output records=100
>> 12/12/03 10:30:23 INFO mapred.JobClient:     Map output bytes=8075900
>> 12/12/03 10:30:23 INFO mapred.JobClient:     Input split bytes=288
>> 12/12/03 10:30:23 INFO mapred.JobClient:     Combine input records=100
>> 12/12/03 10:30:23 INFO mapred.JobClient:     Combine output records=100
>> 12/12/03 10:30:23 INFO mapred.JobClient:     Reduce input groups=50
>> 12/12/03 10:30:23 INFO mapred.JobClient:     Reduce shuffle bytes=8076520
>> 12/12/03 10:30:23 INFO mapred.JobClient:     Reduce input records=100
>> 12/12/03 10:30:23 INFO mapred.JobClient:     Reduce output records=50
>> 12/12/03 10:30:23 INFO mapred.JobClient:     Spilled Records=200
>> 12/12/03 10:30:23 INFO mapred.JobClient:     CPU time spent (ms)=570850
>> 12/12/03 10:30:23 INFO mapred.JobClient:     Physical memory (bytes) snapshot=3334303744
>> 12/12/03 10:30:23 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=35329503232
>> 12/12/03 10:30:23 INFO mapred.JobClient:     Total committed heap usage (bytes)=6070009856
>>
>> Cheers,
>> Markus
>>
>> 2012/12/3 Markus Paaso <[email protected]>:
>>
>>> Hi,
>>>
>>> I have some problems utilizing all the available CPU power with the
>>> 'mahout cvb' command.
>>> The CPU usage is just about 35% and IO wait is ~0%.
>>> I have 8 cores and 28 GB of memory in a single computer that is
>>> running Mahout 0.7-cdh-4.1.2 with Hadoop 2.0.0-cdh4.1.2 in
>>> pseudo-distributed mode.
>>> How can I take advantage of all the CPU power for a single
>>> 'mahout cvb' task?
>>>
>>> I use the following parameters to run mahout cvb:
>>>
>>> mahout cvb
>>>   -Ddfs.namenode.handler.count=32
>>>   -Dmapred.job.tracker.handler.count=32
>>>   -Dio.sort.factor=30
>>>   -Dio.sort.mb=500
>>>   -Dio.file.buffer.size=65536
>>>   -Dmapred.child.java.opts=-Xmx2g
>>>   -Dmapred.map.child.java.opts=-Xmx2g
>>>   -Dmapred.reduce.child.java.opts=-Xmx2g
>>>   -Dmapred.job.reuse.jvm.num.tasks=-1
>>>   -Dmapred.map.tasks=7
>>>   -Dmapred.reduce.tasks=7
>>>   -Dmapred.max.split.size=3145728
>>>   -Dmapred.min.split.size=3145728
>>>   -Dmapred.tasktracker.map.tasks.maximum=7
>>>   -Dmapred.tasktracker.reduce.tasks.maximum=7
>>>   -Dmapred.tasktracker.tasks.maximum=7
>>>   --input ~/mahout-files/mydatavectors_int
>>>   --output ~/mahout-files/topics
>>>   --num_terms 10078
>>>   --num_topics 50
>>>   --doc_topic_output ~/mahout-files/doc-topics
>>>   --maxIter 50
>>>   --num_update_threads 8
>>>   --num_train_threads 8
>>>   -block 1
>>>   --test_set_fraction 0.1
>>>   --convergenceDelta 0.0000001
>>>   --tempDir ~/mahout-files/cvb-temp
>>>
>>> The Linux top command says:
>>>
>>> Cpu(s): 33.9%us, 1.1%sy, 0.0%ni, 65.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
>>> Mem:  28479224k total, 16398624k used, 12080600k free,  899576k buffers
>>> Swap: 28942332k total,        0k used, 28942332k free, 5733368k cached
>>>
>>>   PID USER   PR NI  VIRT  RES SHR S %CPU %MEM   TIME+ COMMAND
>>> 19765 mapred 20  0 2811m 650m 16m S  129  2.3 3:59.06 java
>>> 19721 mapred 20  0 2812m 650m 16m S  125  2.3 3:53.70 java
>>>
>>> So just 2.5 / 8 cores are fully in use.
>>>
>>> Regards,
>>> Markus
>>
>> --
>> Markus Paaso
>> Developer, Sagire Software Oy
>> http://sagire.fi/
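The fix Markus describes at the top of the thread boils down to round-robin repartitioning: spread the input records over as many part files as there are map slots, so every slot has a split to work on. A plain-Python sketch of the idea (illustrative only; the real input is an HDFS SequenceFile, which would need SequenceFile.Reader/Writer or an identity MapReduce job with the desired number of reducers rather than plain text I/O):

```python
def split_round_robin(records, n_parts):
    """Distribute records across n_parts nearly equal buckets."""
    parts = [[] for _ in range(n_parts)]
    for i, record in enumerate(records):
        parts[i % n_parts].append(record)
    return parts

# e.g. the 77332 input records from the counter above, over 8 part files
parts = split_round_robin(range(77332), 8)
print([len(p) for p in parts])
# [9667, 9667, 9667, 9667, 9666, 9666, 9666, 9666]
```

Round-robin keeps the parts within one record of each other in size, which matters here because the slowest part file gates the whole map phase.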
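For background on the map-count arithmetic Andy alludes to: in classic Hadoop MapReduce, FileInputFormat computes the split size as max(minSize, min(maxSize, blockSize)) and carves each part file independently, so every part file contributes at least one map task and a small file contributes exactly one, regardless of the tasktracker slot maximums. A sketch of that formula using the flags from the thread (the 7 MiB file size below is hypothetical, chosen only to illustrate the arithmetic; real split counts also depend on record boundaries and a slack factor):

```python
import math

def split_size(min_size, max_size, block_size):
    # mirrors FileInputFormat.computeSplitSize() in classic Hadoop MapReduce
    return max(min_size, min(max_size, block_size))

def splits_per_file(file_len, min_size, max_size, block_size):
    # each file is carved independently; every non-empty file yields >= 1 split
    return max(1, math.ceil(file_len / split_size(min_size, max_size, block_size)))

MIB = 1024 * 1024
# thread's flags: mapred.min/max.split.size = 3145728 (3 MiB); default 64 MiB block
print(split_size(3 * MIB, 3 * MIB, 64 * MIB))               # 3145728
print(splits_per_file(7 * MIB, 3 * MIB, 3 * MIB, 64 * MIB))  # 3
```

This is why the part-file count, not the slot maximums, ended up being the lever that raised CPU usage.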
