Re: Force the number of map tasks in MR?

Bryan Beaudreault Thu, 11 Oct 2012 18:11:18 -0700

JM,

Are you trying to use TableInputFormat to scan HBase from MapReduce?  If
so, there is one map task per region, so you should get 25 map tasks.  If
only 2 are running at once, that's a problem with your Hadoop setup.  Is your
job running in a pool with only 2 slots available?
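
To make the numbers concrete, here is a rough sketch of the slot arithmetic.
It assumes one map task per region, 6 tasktrackers and 2 map slots each (the
figures from this thread); the function names and the pool_limit parameter
are made up for illustration and are not part of any Hadoop API.

```python
import math


def concurrent_maps(tasktrackers, slots_per_tracker, pool_limit=None):
    """Upper bound on map tasks that can run at the same time."""
    cluster_slots = tasktrackers * slots_per_tracker
    if pool_limit is None:
        return cluster_slots
    # A scheduler pool (hypothetical limit) can cap the job below cluster capacity.
    return min(cluster_slots, pool_limit)


def waves(map_tasks, concurrent):
    """Number of rounds needed to run every map task."""
    return math.ceil(map_tasks / concurrent)


regions = 25  # one map task per region

# Whole cluster available: 6 tasktrackers * 2 slots each.
print(concurrent_maps(6, 2), waves(regions, concurrent_maps(6, 2)))

# Job confined to a pool with only 2 slots.
print(concurrent_maps(6, 2, pool_limit=2), waves(regions, 2))
```

With the whole cluster available that is 12 maps at a time and 3 rounds; with
a 2-slot pool it is 2 at a time and 13 rounds, which would match the "2 by 2"
behaviour you describe.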


If not TableInputFormat, Jon is right.  If your input is a splittable
format, like SequenceFileInputFormat, you can split it further using the
setting mapred.max.split.size.  I believe the split size otherwise defaults
to the HDFS block size (64 MB on a stock Hadoop 1.x install).  This can be
set on a per-job basis using the job conf.
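
As a sketch of how that setting changes the number of map tasks: as far as I
understand it, the new-API FileInputFormat picks a split size of
max(minSize, min(maxSize, blockSize)).  The code below only reproduces that
arithmetic; the function names are invented for illustration, and the split
count is an approximation that ignores how Hadoop treats a small final
remainder.

```python
import math
import sys

MB = 1024 * 1024


def split_size(block_size, min_size=1, max_size=sys.maxsize):
    """Split size in bytes: max(min_size, min(max_size, block_size))."""
    return max(min_size, min(max_size, block_size))


def count_splits(file_size, size):
    """Approximate number of input splits (and so map tasks) for one file."""
    return math.ceil(file_size / size)


file_size = 1024 * MB   # a 1 GB input file (example value)
block_size = 64 * MB    # assumed HDFS block size

# No max split size set: splits follow the block size.
default_size = split_size(block_size)
print(default_size // MB, count_splits(file_size, default_size))

# mapred.max.split.size set to 16 MB for this job.
capped_size = split_size(block_size, max_size=16 * MB)
print(capped_size // MB, count_splits(file_size, capped_size))
```

Under those assumptions the 1 GB file goes from 16 splits to 64, so lowering
the max split size is the knob for getting more map tasks out of file input.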

- Bryan

On Thu, Oct 11, 2012 at 4:50 PM, Jonathan Bishop <[email protected]> wrote:

> JM,
>
> The number of map tasks will be limited by the number of input splits
> available. Assuming you are reading files, that is.
>
> Also, you need to restart your cluster (at least the TaskTrackers) for
> those settings to take effect.
>
> Hope this helps,
>
> Jon Bishop
>
> On Thu, Oct 11, 2012 at 1:44 PM, Jean-Marc Spaggiari <
> [email protected]> wrote:
>
> > But this is the limit per tasktracker, right?
> >
> > And I have 6 nodes, so 6 tasktrackers, which means it should go up to 12
> > tasks?
> >
> > Take a look at 2.7 here: http://wiki.apache.org/hadoop/FAQ
> >
> > I just tried with the setting below (changing 2 to 6) but I'm getting
> > the same result.
> >
> > JM
> >
> > 2012/10/11 Kevin O'dell <[email protected]>:
> > > J-M,
> > >
> > >   It should be in mapred-site.xml; the values
> > > are mapred.tasktracker.map.tasks.maximum and
> > > mapred.tasktracker.reduce.tasks.maximum.  This is the default in CDH4:
> > >
> > > <property>
> > >   <name>mapreduce.tasktracker.map.tasks.maximum</name>
> > >   <value>2</value>
> > >   <description>The maximum number of map tasks that will be run
> > >   simultaneously by a task tracker.
> > >   </description>
> > > </property>
> > >
> > > <property>
> > >   <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
> > >   <value>2</value>
> > >   <description>The maximum number of reduce tasks that will be run
> > >   simultaneously by a task tracker.
> > >   </description>
> > > </property>
> > >
> > > This would explain why they are going 2 by 2.  Does this help?
> > >
> > > On Thu, Oct 11, 2012 at 4:25 PM, Jean-Marc Spaggiari <
> > > [email protected]> wrote:
> > >
> > >> I don't know. I did not touch that. Where can I find this
> > >> information?
> > >>
> > >> 2012/10/11 Kevin O'dell <[email protected]>:
> > >> > What are your max tasks set to?
> > >> >
> > >> > On Thu, Oct 11, 2012 at 3:59 PM, Jean-Marc Spaggiari <
> > >> > [email protected]> wrote:
> > >> >
> > >> >> Hi,
> > >> >>
> > >> >> Is there a way to force the number of map tasks in a MR?
> > >> >>
> > >> >> I have a table with 25 regions split over 6 nodes. But the MR is
> > >> >> running the tasks only 2 by 2.
> > >> >>
> > >> >> Is there a way to force it to run one task on each regionserver
> > >> >> serving at least one region? Why is the MR waiting for 2 tasks to
> > >> >> complete before starting the other tasks?
> > >> >>
> > >> >> I'm starting the MR with a caching of 100.
> > >> >>
> > >> >> I tried mapred.map.tasks and speculative=false with no success.
> > >> >>
> > >> >> Any idea how I can increase this number of tasks?
> > >> >>
> > >> >> Thanks,
> > >> >>
> > >> >> JM
> > >> >>
> > >> >
> > >> >
> > >> >
> > >> > --
> > >> > Kevin O'Dell
> > >> > Customer Operations Engineer, Cloudera
> > >>
> > >
> > >
> > >
> > > --
> > > Kevin O'Dell
> > > Customer Operations Engineer, Cloudera
> >
>
