I have a few questions on running Pig scripts / map-reduce jobs.

1. I know that Pig creates *logical, physical and then execution plans*
before it actually starts executing the map/reduce jobs. I am able to look
at the logical/physical plans using the command *explain <alias_name>*,
but how do I view the execution plan (which I suppose lists the different
map/reduce jobs planned)? In the course of Pig execution I see that
many jobs (map/reduce pairs) are created, and I want to understand what
each of these jobs does.
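For reference, here is roughly how I inspect the plans today (a minimal
sketch; the file name, schema and aliases are placeholders):

    grunt> A = LOAD 'input.txt' AS (f1:int, f2:chararray);
    grunt> B = GROUP A BY f2;
    grunt> C = FOREACH B GENERATE group, COUNT(A);
    grunt> explain C;

I do not see how to map this output onto the individual map/reduce jobs
that later show up while the script runs.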

2. Is there a definitive guide I can use to understand the plans that
are created? What *explain* spits out is difficult to understand.

3. I am able to change the number of map tasks by changing the number of
input file blocks. Do I have control over the number of reduce tasks as
well? How do I set the number of reducers?
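Is something along these lines the intended mechanism, if I understand
the docs correctly (aliases and the value 10 are placeholders)?

    -- script-wide default for all reduce stages:
    SET default_parallel 10;
    -- or per operator, with the PARALLEL clause:
    B = GROUP A BY f2 PARALLEL 10;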

4. What is the default heap size on the mapper/reducer nodes, and which
job parameters control it? Will I be able to change the heap size with
the -Xmx1024m option? My jobs used to fail when I set the heap memory
this way; maybe there are restrictions on what values can be supplied?
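For reference, this is the kind of setting I attempted, assuming the
relevant property is *mapred.child.java.opts* (the Hadoop 1.x name;
please correct me if a different parameter applies):

    -- inside the Pig script:
    SET mapred.child.java.opts '-Xmx1024m';

or, equivalently, on the command line:

    pig -Dmapred.child.java.opts=-Xmx1024m myscript.pig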

Thanks much!
