That doesn't seem right. Try running `EXPLAIN` on your script. Could you please post the Pig script here?
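For reference, a minimal sketch of what that might look like (relation names, schemas, and paths here are made up for illustration; the `USING 'replicated'` clause is what requests the map-side fragment-replicate join):

```pig
-- Hypothetical inputs: 'big' is the large relation; 'small' must be
-- small enough to fit in memory on each map task.
big   = LOAD 'input/big'   AS (id:int, val:chararray);
small = LOAD 'input/small' AS (id:int, name:chararray);

-- Fragment-replicate join: the small relation is replicated to every
-- mapper, so the join runs map-side with no reduce phase.
joined = JOIN big BY id, small BY id USING 'replicated';

-- EXPLAIN prints the logical, physical, and MapReduce plans,
-- showing how Pig will break the script into MR jobs.
EXPLAIN joined;
```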
On Wed, May 2, 2012 at 11:28 AM, shan s <[email protected]> wrote:
> By other steps, I mainly mean other default joins in the script.
>
> The point is that when I use 'Replicated' join, 2 map tasks are
> scheduled. When I use "default" join, 100+ map jobs are scheduled.
> How do we explain this decision process?
> How can I increase actual no. of maps scheduled for Replicated joins?
>
> On Mon, Apr 30, 2012 at 11:59 PM, Prashant Kommireddi
> <prashant1784@gmail> wrote:
> >
> > 2 map tasks for join vs 100+ in other steps, what are "other" steps here?
> >
> > Your 2nd question, I think you are asking about Map and Reduce Task
> > capacity mentioned on the JobTracker page? That is governed by
> > configuration properties set before Hadoop is started on the cluster.
> >
> > On Mon, Apr 30, 2012 at 7:54 AM, shan s <[email protected]> wrote:
> > >
> > > Sorry for the previous incomplete message.
> > > Here is take 2:
> > >
> > > When I use a Replicated Join, only 2 map tasks get scheduled
> > > (compared to 100+ tasks for the other steps).
> > > What is the idea behind this? What setting do I use to override
> > > this behaviour?
> > >
> > > Also, a basic question:
> > > Does Hadoop decide the map task capacity, or does it simply follow
> > > the configuration?
> > >
> > > Map Task Capacity:    64
> > > Reduce Task Capacity: 20
> > > Avg. Tasks/Node:      1.00
> > > Blacklisted Nodes:
> > > Excluded Nodes:
> > >
> > > Thanks, Prashant.
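On the task-capacity question quoted above: on Hadoop 1.x the per-node slot counts shown on the JobTracker page are fixed by TaskTracker configuration rather than decided dynamically by Hadoop. A sketch of the relevant settings (values are illustrative, not recommendations):

```xml
<!-- mapred-site.xml on each TaskTracker node; a TaskTracker restart
     is required for changes to take effect. -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value>   <!-- map slots per node (illustrative value) -->
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>   <!-- reduce slots per node (illustrative value) -->
</property>
```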
