Hi Bryan,

J-D replied in another thread. The issue was caused by a misconfiguration on the mapred side: I was hitting only the local job tracker, which is why only 2 tasks were running at a time.
I re-configured the cluster and it's now working very well. The next step is to build my own MapReduce job for testing... Thanks,

JM

2012/10/11, Bryan Beaudreault <[email protected]>:
> JM,
>
> Are you trying to use HTableInputFormat to scan HBase from MapReduce? If
> so, there is a map task per region, so you should have 25 map tasks. If only
> 2 are running at once, that's a problem with your Hadoop setup. Is your job
> running in a pool with only 2 slots available?
>
> If not HTableInputFormat, Jon is right. If your input is a splittable
> format, like SequenceFileInputFormat, you can further split it using the
> setting mapred.max.split.size. I believe the default is 100 MB or
> something. This can be set on a per-job basis using the job conf.
>
> - Bryan
>
> On Thu, Oct 11, 2012 at 4:50 PM, Jonathan Bishop <[email protected]> wrote:
>
> > JM,
> >
> > The number of map tasks will be limited by the number of input splits
> > available. Assuming you are reading files, that is.
> >
> > Also, you need to reboot your cluster for those settings to take effect.
> >
> > Hope this helps,
> >
> > Jon Bishop
> >
> > On Thu, Oct 11, 2012 at 1:44 PM, Jean-Marc Spaggiari <[email protected]> wrote:
> >
> > > But this is the limit per tasktracker, right?
> > >
> > > And I have 6 nodes, so 6 tasktrackers, which means it should go up to 12
> > > tasks?
> > >
> > > Take a look at 2.7 here: http://wiki.apache.org/hadoop/FAQ
> > >
> > > I just tried with the setting below (changing 2 to 6) but I'm getting
> > > the same result.
> > >
> > > JM
> > >
> > > 2012/10/11 Kevin O'dell <[email protected]>:
> > > > J-M,
> > > >
> > > > It should be in mapred-site.xml; the values are
> > > > mapred.tasktracker.map.tasks.maximum and
> > > > mapred.tasktracker.reduce.tasks.maximum. This is the default in CDH4:
> > > >
> > > > <property>
> > > >   <name>mapreduce.tasktracker.map.tasks.maximum</name>
> > > >   <value>2</value>
> > > >   <description>The maximum number of map tasks that will be run
> > > >   simultaneously by a task tracker.
> > > >   </description>
> > > > </property>
> > > >
> > > > <property>
> > > >   <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
> > > >   <value>2</value>
> > > >   <description>The maximum number of reduce tasks that will be run
> > > >   simultaneously by a task tracker.
> > > >   </description>
> > > > </property>
> > > >
> > > > This would explain why they are going 2 by 2. Does this help?
> > > >
> > > > On Thu, Oct 11, 2012 at 4:25 PM, Jean-Marc Spaggiari <[email protected]> wrote:
> > > >
> > > > > I don't know; I did not touch that. Where can I find this information?
> > > > >
> > > > > 2012/10/11 Kevin O'dell <[email protected]>:
> > > > > > What are your max tasks set to?
> > > > > >
> > > > > > On Thu, Oct 11, 2012 at 3:59 PM, Jean-Marc Spaggiari <[email protected]> wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > Is there a way to force the number of map tasks in a MR job?
> > > > > > >
> > > > > > > I have a 25-region table split over 6 nodes, but the MR job is
> > > > > > > running the tasks only 2 by 2.
> > > > > > >
> > > > > > > Is there a way to force it to run one task on each regionserver
> > > > > > > serving at least one region? Why is the MR framework waiting for
> > > > > > > 2 tasks to complete before sending the other tasks?
> > > > > > >
> > > > > > > I'm starting the MR job with a caching of 100.
> > > > > > >
> > > > > > > I tried mapred.map.tasks and speculative=false with no success.
> > > > > > >
> > > > > > > Any idea how I can increase this number of tasks?
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > JM
> > > > > >
> > > > > > --
> > > > > > Kevin O'Dell
> > > > > > Customer Operations Engineer, Cloudera
> > > >
> > > > --
> > > > Kevin O'Dell
> > > > Customer Operations Engineer, Cloudera
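[Editor's note] Bryan's point about mapred.max.split.size can be made concrete with a little arithmetic. For splittable file input, Hadoop's classic FileInputFormat computes the effective split size as max(minSize, min(maxSize, blockSize)), and the number of map tasks for a file is roughly the file size divided by that split size (ceiling). The sketch below is stdlib-only Java that mirrors that rule; the class and method names are illustrative, not Hadoop APIs.

```java
public class SplitMath {
    // Effective split size per Hadoop's classic FileInputFormat rule:
    // max(minSize, min(maxSize, blockSize)).
    static long splitSize(long minSize, long maxSize, long blockSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Approximate number of map tasks for one splittable file
    // (ceiling division of file size by split size).
    static long numSplits(long fileSize, long splitSize) {
        return (fileSize + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024;   // 64 MB HDFS block
        long fileSize  = 1024L * 1024 * 1024; // 1 GB input file

        // Defaults: split size equals block size -> 16 map tasks.
        System.out.println(numSplits(fileSize, splitSize(1, Long.MAX_VALUE, blockSize)));

        // Lowering mapred.max.split.size to 32 MB doubles the task count -> 32.
        long maxSplit = 32L * 1024 * 1024;
        System.out.println(numSplits(fileSize, splitSize(1, maxSplit, blockSize)));
    }
}
```

Note this only controls how many map tasks exist; how many run *concurrently* is still capped by the per-tasktracker slot settings quoted above (2 maps per node by default), which is exactly the 2-by-2 behaviour JM was seeing.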
