JM,

Are you trying to use HTableInputFormat to scan HBase from MapReduce? If so, there is one map task per region, so with 25 regions you should have 25 map tasks. If only 2 are running at once, that's a problem with your Hadoop setup. Is your job running in a pool with only 2 slots available?
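As a rough sanity check on those numbers, here is a small sketch of the slot arithmetic. It assumes one map task per region, that every tasktracker has the same number of map slots, and that no other jobs or scheduler pool limits are in play. The function name `map_waves` is made up for illustration and is not part of Hadoop.

```python
import math

def map_waves(num_map_tasks, num_tasktrackers, map_slots_per_tracker):
    """Return (maps running at once, number of waves) for one job.

    Simplified model: ignores data locality, other jobs, and
    scheduler pool limits.
    """
    total_slots = num_tasktrackers * map_slots_per_tracker
    if total_slots <= 0:
        raise ValueError("need at least one map slot")
    concurrent = min(num_map_tasks, total_slots)
    waves = math.ceil(num_map_tasks / total_slots)
    return concurrent, waves

# Numbers from this thread: 25 regions, 6 tasktrackers, 2 map slots each.
print(map_waves(25, 6, 2))   # (12, 3)

# A single pool limited to 2 slots would look like what JM is seeing.
print(map_waves(25, 1, 2))   # (2, 13)
```

So with 2 slots on each of 6 tasktrackers, about 12 maps should run together. Seeing only 2 suggests the job is only being offered 2 slots in total.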
If not HTableInputFormat, Jon is right. If your input is a splittable format, like SequenceFileInputFormat, you can split it further using the setting mapred.max.split.size. I believe the default is 100 MB or something. This can be set on a per-job basis using the job conf.

- Bryan

On Thu, Oct 11, 2012 at 4:50 PM, Jonathan Bishop <[email protected]> wrote:

> JM,
>
> The number of map tasks will be limited by the number of input splits
> available. Assuming you are reading files, that is.
>
> Also, you need to reboot your cluster for those setting to take effect.
>
> Hope this helps,
>
> Jon Bishop
>
> On Thu, Oct 11, 2012 at 1:44 PM, Jean-Marc Spaggiari <
> [email protected]> wrote:
>
> > But this is the limit per tasktracker, right?
> >
> > And I have 6 nodes, so 6 tasktrackers, which mean it should go up to 12
> > tasks?
> >
> > Take a look at 2.7 here: http://wiki.apache.org/hadoop/FAQ
> >
> > I just tried with the setting below (changing 2 by 6) but I'm getting
> > the same result.
> >
> > JM
> >
> > 2012/10/11 Kevin O'dell <[email protected]>:
> > > J-M,
> > >
> > > It should be in the mapred-site.xml the values
> > > are mapred.tasktracker.map.tasks.maximum and
> > > mapred.tasktracker.reduce.tasks.maximum. This is the default in CDH4
> > >
> > > <property>
> > > <name>mapreduce.tasktracker.map.tasks.maximum</name>
> > > <value>2</value>
> > > <description>The maximum number of map tasks that will be run
> > > simultaneously by a task tracker.
> > > </description>
> > > </property>
> > >
> > > <property>
> > > <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
> > > <value>2</value>
> > > <description>The maximum number of reduce tasks that will be run
> > > simultaneously by a task tracker.
> > > </description>
> > > </property>
> > >
> > > This would explain why they are going 2 by 2. Does this help?
> > >
> > > On Thu, Oct 11, 2012 at 4:25 PM, Jean-Marc Spaggiari <
> > > [email protected]> wrote:
> > >
> > >> I don't know. I did not touched that. Where can I found this information?
> > >>
> > >> 2012/10/11 Kevin O'dell <[email protected]>:
> > >> > What are you max tasks set to?
> > >> >
> > >> > On Thu, Oct 11, 2012 at 3:59 PM, Jean-Marc Spaggiari <
> > >> > [email protected]> wrote:
> > >> >
> > >> >> Hi,
> > >> >>
> > >> >> Is there a way to force the number of map tasks in a MR?
> > >> >>
> > >> >> I have a 25 regions table splitted over 6 nodes. But the MR is running
> > >> >> the tasks only 2 by 2.
> > >> >>
> > >> >> Is there a way to force it to run one task on each regionserver
> > >> >> serving at least one region? Why is the MR waiting for 2 taskes to
> > >> >> complete before sending to the other tasks?
> > >> >>
> > >> >> I'm starting the MR with a caching of 100.
> > >> >>
> > >> >> I tried mapred.map.tasks and speculative=false with no success.
> > >> >>
> > >> >> Any idea how I can increase it this number of tasks?
> > >> >>
> > >> >> Thanks,
> > >> >>
> > >> >> JM
> > >> >>
> > >> >
> > >> > --
> > >> > Kevin O'Dell
> > >> > Customer Operations Engineer, Cloudera
> > >>
> > >
> > > --
> > > Kevin O'Dell
> > > Customer Operations Engineer, Cloudera
> > >
