Thanks! That did the trick.

I divided the input sequence file into 8 equally sized files and now CPU
usage is near 100%.
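For reference, the even partitioning described above can be sketched generically. This is a plain-Python illustration of round-robin splitting (not Hadoop's SequenceFile API — for real sequence files you would read and rewrite records with Hadoop's SequenceFile.Reader/Writer); the record source and part count here are hypothetical:

```python
def partition_round_robin(records, num_parts):
    """Distribute records across num_parts buckets so that no two
    buckets differ in size by more than one record. Each bucket
    would then become one part file, i.e. one map task's input."""
    parts = [[] for _ in range(num_parts)]
    for i, rec in enumerate(records):
        parts[i % num_parts].append(rec)
    return parts

# Example: the job log below reports Map input records=77332;
# splitting those into 8 parts (one per core) yields near-equal work.
parts = partition_round_robin(range(77332), 8)
sizes = [len(p) for p in parts]
print(sizes)  # four parts of 9667 records and four of 9666
```

The point is only that map-side parallelism is bounded by the number of input part files, so the split must be done before the job runs.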


Regards, Markus



2012/12/3 Andy Schlaikjer <[email protected]>

> Hi Markus,
>
> First, I'd check that your input term vectors are evenly
> partitioned into more than two part files. You can force a given
> degree of map-side parallelism by writing a specific number of part
> files. No matter how you configure map slots, the input must be
> organized so that every slot receives a task to process.
>
> Andy
> @sagemintblue
>
>
> On Dec 3, 2012, at 12:44 AM, Markus Paaso <[email protected]> wrote:
>
> > The log shows that there are 2 map tasks and 10 reduce tasks.
> > How can there be 10 reduce tasks when I set the parameter
> > '-Dmapred.tasktracker.reduce.tasks.maximum=7'?
> > I would like to increase the number of concurrent map tasks. Any
> > parameter suggestions for that?
> >
> > It seems that the configuration parameter
> > 'mapred.tasktracker.map.tasks.maximum' doesn't increase the number of
> > concurrently running map tasks...
> >
> >
> > Some log rows from mahout cvb:
> >
> > 12/12/03 10:30:23 INFO mapred.JobClient: Job complete: job_201212011004_0432
> > 12/12/03 10:30:23 INFO mapred.JobClient: Counters: 32
> > 12/12/03 10:30:23 INFO mapred.JobClient:   File System Counters
> > 12/12/03 10:30:23 INFO mapred.JobClient:     FILE: Number of bytes read=8076460
> > 12/12/03 10:30:23 INFO mapred.JobClient:     FILE: Number of bytes written=18396152
> > 12/12/03 10:30:23 INFO mapred.JobClient:     FILE: Number of read operations=0
> > 12/12/03 10:30:23 INFO mapred.JobClient:     FILE: Number of large read operations=0
> > 12/12/03 10:30:23 INFO mapred.JobClient:     FILE: Number of write operations=0
> > 12/12/03 10:30:23 INFO mapred.JobClient:     HDFS: Number of bytes read=14054985
> > 12/12/03 10:30:23 INFO mapred.JobClient:     HDFS: Number of bytes written=4040120
> > 12/12/03 10:30:23 INFO mapred.JobClient:     HDFS: Number of read operations=166
> > 12/12/03 10:30:23 INFO mapred.JobClient:     HDFS: Number of large read operations=0
> > 12/12/03 10:30:23 INFO mapred.JobClient:     HDFS: Number of write operations=91
> > 12/12/03 10:30:23 INFO mapred.JobClient:   Job Counters
> > 12/12/03 10:30:23 INFO mapred.JobClient:     Launched map tasks=2
> > 12/12/03 10:30:23 INFO mapred.JobClient:     Launched reduce tasks=10
> > 12/12/03 10:30:23 INFO mapred.JobClient:     Data-local map tasks=2
> > 12/12/03 10:30:23 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=456617
> > 12/12/03 10:30:23 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=108715
> > 12/12/03 10:30:23 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
> > 12/12/03 10:30:23 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
> > 12/12/03 10:30:23 INFO mapred.JobClient:   Map-Reduce Framework
> > 12/12/03 10:30:23 INFO mapred.JobClient:     Map input records=77332
> > 12/12/03 10:30:23 INFO mapred.JobClient:     Map output records=100
> > 12/12/03 10:30:23 INFO mapred.JobClient:     Map output bytes=8075900
> > 12/12/03 10:30:23 INFO mapred.JobClient:     Input split bytes=288
> > 12/12/03 10:30:23 INFO mapred.JobClient:     Combine input records=100
> > 12/12/03 10:30:23 INFO mapred.JobClient:     Combine output records=100
> > 12/12/03 10:30:23 INFO mapred.JobClient:     Reduce input groups=50
> > 12/12/03 10:30:23 INFO mapred.JobClient:     Reduce shuffle bytes=8076520
> > 12/12/03 10:30:23 INFO mapred.JobClient:     Reduce input records=100
> > 12/12/03 10:30:23 INFO mapred.JobClient:     Reduce output records=50
> > 12/12/03 10:30:23 INFO mapred.JobClient:     Spilled Records=200
> > 12/12/03 10:30:23 INFO mapred.JobClient:     CPU time spent (ms)=570850
> > 12/12/03 10:30:23 INFO mapred.JobClient:     Physical memory (bytes) snapshot=3334303744
> > 12/12/03 10:30:23 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=35329503232
> > 12/12/03 10:30:23 INFO mapred.JobClient:     Total committed heap usage (bytes)=6070009856
> >
> >
> > Cheers, Markus
> >
> >
> > 2012/12/3 Markus Paaso <[email protected]>
> >
> >> Hi,
> >>
> >> I am having trouble utilizing all available CPU power for the
> >> 'mahout cvb' command.
> >> CPU usage is only about 35% and IO wait is ~0%.
> >> I have 8 cores and 28 GB of memory in a single computer running
> >> Mahout 0.7-cdh-4.1.2 with Hadoop 2.0.0-cdh4.1.2 in pseudo-distributed
> >> mode.
> >> How can I take advantage of all the CPU power for a single 'mahout
> >> cvb' task?
> >>
> >>
> >> I use the following parameters to run mahout cvb:
> >>
> >> mahout cvb
> >> -Ddfs.namenode.handler.count=32
> >> -Dmapred.job.tracker.handler.count=32
> >> -Dio.sort.factor=30
> >> -Dio.sort.mb=500
> >> -Dio.file.buffer.size=65536
> >> -Dmapred.child.java.opts=-Xmx2g
> >> -Dmapred.map.child.java.opts=-Xmx2g
> >> -Dmapred.reduce.child.java.opts=-Xmx2g
> >> -Dmapred.job.reuse.jvm.num.tasks=-1
> >> -Dmapred.map.tasks=7
> >> -Dmapred.reduce.tasks=7
> >> -Dmapred.max.split.size=3145728
> >> -Dmapred.min.split.size=3145728
> >> -Dmapred.tasktracker.map.tasks.maximum=7
> >> -Dmapred.tasktracker.reduce.tasks.maximum=7
> >> -Dmapred.tasktracker.tasks.maximum=7
> >>  --input ~/mahout-files/mydatavectors_int
> >>  --output ~/mahout-files/topics
> >>  --num_terms 10078
> >>  --num_topics 50
> >>  --doc_topic_output ~/mahout-files/doc-topics
> >>  --maxIter 50
> >>  --num_update_threads 8
> >>  --num_train_threads 8
> >>  -block 1
> >>  --test_set_fraction 0.1
> >>  --convergenceDelta 0.0000001
> >>  --tempDir ~/mahout-files/cvb-temp
> >>
> >>
> >> Linux top command says:
> >>
> >> Cpu(s): 33.9%us,  1.1%sy,  0.0%ni, 65.0%id,  0.0%wa,  0.0%hi,  0.0%si,
> >> 0.0%st
> >> Mem:  28479224k total, 16398624k used, 12080600k free,   899576k buffers
> >> Swap: 28942332k total,        0k used, 28942332k free,  5733368k cached
> >>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> >> 19765 mapred    20   0 2811m 650m  16m S  129  2.3   3:59.06 java
> >> 19721 mapred    20   0 2812m 650m  16m S  125  2.3   3:53.70 java
> >>
> >> So just 2.5 / 8 cores are fully in use.
> >>
> >>
> >> Regards, Markus
> >
> >
> >
> > --
> > Markus Paaso
> > Developer, Sagire Software Oy
> > http://sagire.fi/
>