Hi, I just realized that one of my large-scale Pig jobs, which has 100K map tasks, actually has only one reduce task. Reading the documentation, I see that the number of reduce tasks is set by the PARALLEL clause, whose default value is 1. I have a few questions about this:
# Why is the default number of reduce tasks 1?
# (Related to the first question) Why aren't reduce tasks parallelized automatically in Pig?
# How do I choose a good number of reduce tasks for my Pig jobs?

Thanks in advance,
Pankaj
