By "other steps", I mainly mean the other default joins in the script. The point is that when I use a 'replicated' join, only 2 map tasks are scheduled, whereas when I use the default join, 100+ map tasks are scheduled. How do we explain this decision process? And how can I increase the actual number of map tasks scheduled for replicated joins?

On Mon, Apr 30, 2012 at 11:59 PM, Prashant Kommireddi <[email protected]> wrote:

> 2 map tasks for join vs 100+ in other steps, what are "other" steps here?
>
> Your 2nd question, I think you are asking about Map and Reduce Task
> capacity mentioned on the JobTracker page? That is governed based on
> configuration properties set before hadoop is started on cluster.
>
> On Mon, Apr 30, 2012 at 7:54 AM, shan s <[email protected]> wrote:
>
> > Sorry for the previous incomplete message.
> > Here is the take 2:
> >
> > When I use a Replicated Join only 2 map tasks get scheduled (compared to
> > 100+ tasks for the other steps)
> > What is the idea behind this? What setting do I use to override this
> > behaviour?
> >
> > Also, a basic question.
> > Does hadoop decide the map task capacity or it simply follows the
> > configuration?
> >
> > Map Task Capacity Reduce Task Capacity Avg. Tasks/Node Blacklisted Nodes
> > Excluded Nodes
> > 64 20 1.00
> >
> > Thanks, Prashant.
