Re: replicated join and map tasks

shan s Tue, 01 May 2012 22:59:03 -0700

By other steps, I mainly mean other default joins in the script.

The point is that when I use a 'replicated' join, only 2 map tasks are
scheduled. When I use the default join, 100+ map tasks are scheduled.
How is this decided?
How can I increase the number of map tasks scheduled for replicated joins?


On Mon, Apr 30, 2012 at 11:59 PM, Prashant Kommireddi <[email protected]>
wrote:
>
> 2 map tasks for join vs 100+ in other steps, what are "other" steps here?
>
> For your 2nd question, I think you are asking about the Map and Reduce Task
> Capacity shown on the JobTracker page? That is governed by
> configuration properties set before Hadoop is started on the cluster.
>
>
>
>
> On Mon, Apr 30, 2012 at 7:54 AM, shan s <[email protected]> wrote:
>
> > Sorry for the previous incomplete message.
> > Here is the take 2:
> >
> > When I use a replicated join, only 2 map tasks get scheduled (compared to
> > 100+ tasks for the other steps).
> > What is the idea behind this? What setting do I use to override this
> > behaviour?
> >
> >
> > Also, a basic question.
> > Does Hadoop decide the map task capacity, or does it simply follow the
> > configuration?
> >
> > Map Task Capacity  Reduce Task Capacity  Avg. Tasks/Node  Blacklisted Nodes  Excluded Nodes
> > 64                 20                    1.00
> >
> > Thanks, Prashant.
> >
