That doesn't seem right. Try running `EXPLAIN` on your script. Could you please post the Pig script here?
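For reference, a minimal sketch of what that might look like (relation names, schemas, and paths here are made up for illustration; the `USING 'replicated'` clause is what requests the map-side fragment-replicate join):

```pig
-- Hypothetical inputs: 'big' is the large relation; 'small' must be
-- small enough to fit in memory on each map task.
big   = LOAD 'input/big'   AS (id:int, val:chararray);
small = LOAD 'input/small' AS (id:int, name:chararray);

-- Fragment-replicate join: the small relation is replicated to every
-- mapper, so the join runs map-side with no reduce phase.
joined = JOIN big BY id, small BY id USING 'replicated';

-- EXPLAIN prints the logical, physical, and MapReduce plans,
-- showing how Pig will break the script into MR jobs.
EXPLAIN joined;
```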
On Wed, May 2, 2012 at 11:28 AM, shan s <[email protected]> wrote:
> By other steps, I mainly mean other default joins in the script.
>
> The point is that when I use 'Replicated' join, 2 map tasks are
> scheduled. When I use "default" join, 100+ map jobs are scheduled.
> How do we explain this decision process?
> How can I increase actual no. of maps scheduled for Replicated joins?
>
> On Mon, Apr 30, 2012 at 11:59 PM, Prashant Kommireddi
> <prashant1784@gmail> wrote:
> >
> > 2 map tasks for join vs 100+ in other steps, what are "other" steps here?
> >
> > Your 2nd question, I think you are asking about Map and Reduce Task
> > capacity mentioned on the JobTracker page? That is governed by
> > configuration properties set before Hadoop is started on the cluster.
> >
> > On Mon, Apr 30, 2012 at 7:54 AM, shan s <[email protected]> wrote:
> > >
> > > Sorry for the previous incomplete message.
> > > Here is take 2:
> > >
> > > When I use a Replicated Join, only 2 map tasks get scheduled
> > > (compared to 100+ tasks for the other steps).
> > > What is the idea behind this? What setting do I use to override
> > > this behaviour?
> > >
> > > Also, a basic question:
> > > Does Hadoop decide the map task capacity, or does it simply follow
> > > the configuration?
> > >
> > > Map Task Capacity:    64
> > > Reduce Task Capacity: 20
> > > Avg. Tasks/Node:      1.00
> > > Blacklisted Nodes:
> > > Excluded Nodes:
> > >
> > > Thanks, Prashant.
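On the task-capacity question quoted above: on Hadoop 1.x the per-node slot counts shown on the JobTracker page are fixed by TaskTracker configuration rather than decided dynamically by Hadoop. A sketch of the relevant settings (values are illustrative, not recommendations):

```xml
<!-- mapred-site.xml on each TaskTracker node; a TaskTracker restart
     is required for changes to take effect. -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value>   <!-- map slots per node (illustrative value) -->
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>   <!-- reduce slots per node (illustrative value) -->
</property>
```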
