Hi Bryan,

J-D replied in another thread. The issue was caused by a misconfiguration on the mapred side: I was hitting only the local job tracker, which is why only 2 tasks were running at a time.
I re-configured the cluster and it's now working very well. The next step is to build my own MapReduce job for testing... Thanks,

JM

2012/10/11, Bryan Beaudreault <[email protected]>:
> JM,
>
> Are you trying to use HTableInputFormat to scan HBase from MapReduce? If
> so, there is a map task per region, so you should have 25 map tasks. If only
> 2 are running at once, that's a problem with your Hadoop setup. Is your job
> running in a pool with only 2 slots available?
>
> If not HTableInputFormat, Jon is right. If your input is a splittable
> format, like SequenceFileInputFormat, you can further split it using the
> setting mapred.max.split.size. I believe the default is 100 MB or
> something. This can be set on a per-job basis using the job conf.
>
> - Bryan
>
> On Thu, Oct 11, 2012 at 4:50 PM, Jonathan Bishop <[email protected]> wrote:
>
> > JM,
> >
> > The number of map tasks will be limited by the number of input splits
> > available. Assuming you are reading files, that is.
> >
> > Also, you need to reboot your cluster for those settings to take effect.
> >
> > Hope this helps,
> >
> > Jon Bishop
> >
> > On Thu, Oct 11, 2012 at 1:44 PM, Jean-Marc Spaggiari <[email protected]> wrote:
> >
> > > But this is the limit per tasktracker, right?
> > >
> > > And I have 6 nodes, so 6 tasktrackers, which means it should go up to 12
> > > tasks?
> > >
> > > Take a look at 2.7 here: http://wiki.apache.org/hadoop/FAQ
> > >
> > > I just tried with the setting below (changing 2 to 6) but I'm getting
> > > the same result.
> > >
> > > JM
> > >
> > > 2012/10/11 Kevin O'dell <[email protected]>:
> > > > J-M,
> > > >
> > > > It should be in mapred-site.xml; the values are
> > > > mapred.tasktracker.map.tasks.maximum and
> > > > mapred.tasktracker.reduce.tasks.maximum. This is the default in CDH4:
> > > >
> > > > <property>
> > > >   <name>mapreduce.tasktracker.map.tasks.maximum</name>
> > > >   <value>2</value>
> > > >   <description>The maximum number of map tasks that will be run
> > > >   simultaneously by a task tracker.
> > > >   </description>
> > > > </property>
> > > >
> > > > <property>
> > > >   <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
> > > >   <value>2</value>
> > > >   <description>The maximum number of reduce tasks that will be run
> > > >   simultaneously by a task tracker.
> > > >   </description>
> > > > </property>
> > > >
> > > > This would explain why they are going 2 by 2. Does this help?
> > > >
> > > > On Thu, Oct 11, 2012 at 4:25 PM, Jean-Marc Spaggiari <[email protected]> wrote:
> > > >
> > > > > I don't know; I did not touch that. Where can I find this information?
> > > > >
> > > > > 2012/10/11 Kevin O'dell <[email protected]>:
> > > > > > What are your max tasks set to?
> > > > > >
> > > > > > On Thu, Oct 11, 2012 at 3:59 PM, Jean-Marc Spaggiari <[email protected]> wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > Is there a way to force the number of map tasks in a MR job?
> > > > > > >
> > > > > > > I have a 25-region table split over 6 nodes, but the MR job is
> > > > > > > running the tasks only 2 by 2.
> > > > > > >
> > > > > > > Is there a way to force it to run one task on each regionserver
> > > > > > > serving at least one region? Why is the MR framework waiting for
> > > > > > > 2 tasks to complete before sending the other tasks?
> > > > > > >
> > > > > > > I'm starting the MR job with a caching of 100.
> > > > > > >
> > > > > > > I tried mapred.map.tasks and speculative=false with no success.
> > > > > > >
> > > > > > > Any idea how I can increase this number of tasks?
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > JM
> > > > > >
> > > > > > --
> > > > > > Kevin O'Dell
> > > > > > Customer Operations Engineer, Cloudera
> > > >
> > > > --
> > > > Kevin O'Dell
> > > > Customer Operations Engineer, Cloudera
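[Editor's note] Bryan's point about mapred.max.split.size can be made concrete with a little arithmetic. For splittable file input, Hadoop's classic FileInputFormat computes the effective split size as max(minSize, min(maxSize, blockSize)), and the number of map tasks for a file is roughly the file size divided by that split size (ceiling). The sketch below is stdlib-only Java that mirrors that rule; the class and method names are illustrative, not Hadoop APIs.

```java
public class SplitMath {
    // Effective split size per Hadoop's classic FileInputFormat rule:
    // max(minSize, min(maxSize, blockSize)).
    static long splitSize(long minSize, long maxSize, long blockSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Approximate number of map tasks for one splittable file
    // (ceiling division of file size by split size).
    static long numSplits(long fileSize, long splitSize) {
        return (fileSize + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024;   // 64 MB HDFS block
        long fileSize  = 1024L * 1024 * 1024; // 1 GB input file

        // Defaults: split size equals block size -> 16 map tasks.
        System.out.println(numSplits(fileSize, splitSize(1, Long.MAX_VALUE, blockSize)));

        // Lowering mapred.max.split.size to 32 MB doubles the task count -> 32.
        long maxSplit = 32L * 1024 * 1024;
        System.out.println(numSplits(fileSize, splitSize(1, maxSplit, blockSize)));
    }
}
```

Note this only controls how many map tasks exist; how many run *concurrently* is still capped by the per-tasktracker slot settings quoted above (2 maps per node by default), which is exactly the 2-by-2 behaviour JM was seeing.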
