Sorry for the previous incomplete message. Here is take 2:

When I use a replicated join, only 2 map tasks get scheduled (compared to 100+ tasks for the other steps). What is the idea behind this? What setting do I use to override this behaviour?

Also, a basic question: does Hadoop decide the map task capacity itself, or does it simply follow the configuration?

Map Task Capacity   Reduce Task Capacity   Avg. Tasks/Node   Blacklisted Nodes   Excluded Nodes
64                  20                     1.00

Thanks,
Prashant.
