[ 
https://issues.apache.org/jira/browse/PIG-162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12594957#action_12594957
 ] 

Shravan Matthur Narayanamurthy commented on PIG-162:
----------------------------------------------------

Thanks for reviewing the patch, Arun. My responses are inline.

1. Hadoop doesn't require Writables from 0.17.0 onwards: HADOOP-1986, so you 
could use that as an advantage.
[shrav] This is great. However, the types branch is still on hadoop-15. 
We plan to merge the changes from the main branch later, and I think this is a 
good candidate to take up then. I thought about it for a while. Currently, 
we do not have an umbrella class for our types other than WritableComparable, 
and I could not come up with a neat solution for this. It is certainly a good 
point, and we need to spend more time on it.

2. I agree with Alan about map-only jobs, just use something similar to PIG-196 
(my unbiased opinion smile).
[shrav] I think doing something like PIG-196 would incur a branch in every 
call to the map function to check whether it is a map-only job. This additional 
complexity is due to the introduction of types. In map-only jobs, we don't 
care about extracting the key & indexed tuple. In a map-reduce job, we have to 
do the extraction. This is the branching I wanted to avoid. I guess I gave a 
naive solution by duplicating code: one copy for map-only & the other for 
map-reduce. I guess a better solution, as Alan suggested, would be to have the 
map-only & map-reduce Map classes extend a common base class with an abstract 
collectKeyAndTuple function, which each subclass implements accordingly.
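
A minimal sketch of that subclassing idea, to show how the per-call branch 
goes away. All names here except collectKeyAndTuple (MapBase, MapOnlyMapper, 
MapReduceMapper, Emitted) are hypothetical and are not Pig classes, and it 
assumes purely for illustration that field 0 of a tuple is the key:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for one map output record; not a Pig class.
final class Emitted {
    final Object key;
    final Object value;
    Emitted(Object key, Object value) { this.key = key; this.value = value; }
}

// Shared map logic lives here; only the collect step differs per job type.
abstract class MapBase {
    protected final List<Emitted> out = new ArrayList<>();

    // Called once per input tuple. There is no "is this map-only?" check:
    // the choice was made once, when the subclass was selected for the job.
    public final void map(Object[] tuple) {
        // ... common per-tuple processing would go here ...
        collectKeyAndTuple(tuple);
    }

    protected abstract void collectKeyAndTuple(Object[] tuple);

    public List<Emitted> output() { return out; }
}

// Map-only job: emit the tuple as is, with no key extraction.
final class MapOnlyMapper extends MapBase {
    @Override
    protected void collectKeyAndTuple(Object[] tuple) {
        out.add(new Emitted(null, tuple));
    }
}

// Map-reduce job: split the tuple into key and remaining fields.
// Assumption for illustration only: field 0 is the key.
final class MapReduceMapper extends MapBase {
    @Override
    protected void collectKeyAndTuple(Object[] tuple) {
        Object key = tuple[0];
        Object[] rest = Arrays.copyOfRange(tuple, 1, tuple.length);
        out.add(new Emitted(key, rest));
    }
}
```

The job setup code would pick MapOnlyMapper or MapReduceMapper once per job, 
so the map function itself stays branch-free.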

3. RunnableReporter is a thread which blindly does 'reporting'. This makes it 
very hard to debug when applications go haywire. By this, you are going to miss 
a very important safety net provided by Hadoop Map-Reduce, i.e. the ability to kill 
tasks which aren't 'progressing'. Please do not do this! Ideally you should be 
using the reporter in the map/reduce functions to report progress when tuples 
are being consumed.
[shrav] You are right. I will change that. Since this is a major change, I will 
do it once this patch and Shubham's patch are in. I will write a proposal on the 
changes and submit it.
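
A rough sketch of what reporting progress as tuples are consumed could look 
like, as opposed to a background thread that reports blindly. ProgressReporter 
and TupleConsumer are hypothetical names used only for illustration. 
ProgressReporter stands in for whatever reporter object the framework hands to 
the map/reduce function:

```java
import java.util.Iterator;

// Hypothetical stand-in for the framework-provided reporter.
interface ProgressReporter {
    void progress();
}

final class TupleConsumer {
    // Reports progress once per consumed tuple. If processing hangs, the
    // reports stop too, so the framework can still detect a stuck task.
    static int consumeAll(Iterator<String> tuples, ProgressReporter reporter) {
        int consumed = 0;
        while (tuples.hasNext()) {
            tuples.next();        // real per-tuple work would happen here
            reporter.progress();  // progress is tied to actual work done
            consumed++;
        }
        return consumed;
    }
}
```

Because the report is issued from the same loop that does the work, a task 
that stops making progress also stops reporting, which keeps the kill-on-no-
progress safety net intact.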

4. The 'Slicer' notion is missing from PigInputFormat/PigSplit... are you 
planning to integrate it later?
[shrav] Yeah, we have left it for the later merging phase.

5. It's great that you are using Hadoop's jobcontrol, please let us know if 
anything was amiss here: HADOOP-2484.
[shrav] It works well. Probably some more documentation would be helpful.

Regarding Pig's notion of "Properties", are you referring to the backend and 
datastorage? If so, I think we need to take this up during the merge of changes 
from the main branch.

Regarding creating a separate JobControlCompiler: I did that because I wanted 
to leave some room for the optimizer to act. Once the MROperPlan is built, 
it can be optimized, and then JobControlCompiler can work on the optimized plan 
to generate the Job Control.

> Rework mapreduce submission and monitoring
> ------------------------------------------
>
>                 Key: PIG-162
>                 URL: https://issues.apache.org/jira/browse/PIG-162
>             Project: Pig
>          Issue Type: Sub-task
>         Environment: This bug tracks work to rework the submission and 
> monitoring interface to map reduce as described in 
> http://wiki.apache.org/pig/PigTypesFunctionalSpec
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>         Attachments: mapreduceJumbo.patch, split.png, 
> TEST-org.apache.pig.test.TestMRCompiler.txt, 
> TEST-org.apache.pig.test.TestUnion.txt
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
