On Sun, Jul 3, 2011 at 10:00 PM, Prabhu Dhakshina Murthy <prabh...@yahoo-inc.com> wrote:
> I have a few questions on running Pig scripts / map-reduce jobs.
>
> 1. I know that Pig creates *logical, physical and then execution plans*
> before it really starts executing the map/reduce job; I am able to look
> at the logical/physical plans using the command *explain <alias_name>*;
> but how do I view the execution plan (which I suppose lists the different
> map/reduce tasks planned)? In the course of Pig execution, I see that
> many jobs (map/reduce pairs) are created. I want to understand what each
> of these jobs solves.

"explain alias" will show the logical plan, physical plan, and MR plan.
Check carefully.

> 2. Is there any definitive guide which I can use to understand the plans
> created? Because what is spat out is difficult to understand.

Check
http://ofps.oreilly.com/titles/9781449302641/developing_and_testing.html#dev_tools

> 3. I am able to change the number of map jobs by changing the number of
> input file blocks. Do I have control over the number of reduce jobs as
> well? How do I set the number of reducers?

Check http://pig.apache.org/docs/r0.8.1/cookbook.html#Use+the+Parallel+Features

> 4. What is the default heap memory size in mapper/reducer nodes? Which
> job parameters reflect these? Will I be able to change the heap memory
> with the -Xmx1024m option? My jobs used to fail when I set the heap
> memory this way - maybe there are some restrictions on what values can
> be supplied?

It is controlled by "mapred.child.java.opts".

Daniel

> Thanks much!
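
To make the first answer concrete: a minimal sketch of using EXPLAIN on an alias (the file name and schema here are made up for illustration). Running EXPLAIN on the final alias prints all three plans, including the MapReduce plan with its map/reduce job boundaries:

```pig
-- hypothetical tab-delimited input: name, count
A = LOAD 'input.txt' AS (name:chararray, cnt:int);
B = GROUP A BY name;
C = FOREACH B GENERATE group, SUM(A.cnt);
-- prints the logical, physical, and MapReduce plans for C
EXPLAIN C;
```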
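
For question 3, the parallel features linked above boil down to the PARALLEL clause (per operator) or the default_parallel property (script-wide). A sketch, with the reducer count of 10 chosen arbitrarily:

```pig
-- script-wide default number of reducers
SET default_parallel 10;

-- or per-operator, overriding the default for this GROUP
B = GROUP A BY name PARALLEL 10;
```

Note that PARALLEL only affects operators that trigger a reduce phase (GROUP, JOIN, ORDER, DISTINCT, etc.); the number of maps is still driven by the input splits.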
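
For question 4, the property Daniel names can be set from within a Pig script so it is passed to the launched MR jobs. A sketch, with the 1024m value as an example only (your cluster may cap what mapred.child.java.opts is allowed to request):

```pig
-- raise the child JVM heap for this script's map/reduce tasks
SET mapred.child.java.opts '-Xmx1024m';
```

Note the JVM flag takes no space ("-Xmx1024m", not "-Xmx 1024m"); a malformed value here is one common reason such jobs fail to launch.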