[
https://issues.apache.org/jira/browse/PIG-162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12594832#action_12594832
]
Arun C Murthy commented on PIG-162:
-----------------------------------
Shravan, I'm not super-familiar with the new pipeline so pardon my temerity
while I put forth some thoughts on this patch... I'll confine it to Pig's usage
of Map-Reduce:
1. Hadoop doesn't require Writables from 0.17.0 onwards (HADOOP-1986), so you
could use that to your advantage.
2. I agree with Alan about map-only jobs; just use something similar to PIG-196
(my unbiased opinion *smile*).
3. RunnableReporter is a thread which blindly does 'reporting'. This makes it
_very_ hard to debug when applications go haywire, and it throws away a very
important safety net provided by Hadoop Map-Reduce, i.e. the ability to kill
tasks which aren't 'progressing'. *Please do not do this!* Ideally you should
be using the *reporter* inside the map/reduce functions to report progress as
tuples are consumed (see the sketch after this list).
4. The 'Slicer' notion is missing from PigInputFormat/PigSplit... are you
planning to integrate it later?
5. It's great that you are using Hadoop's jobcontrol; please let us know if
anything was amiss here: HADOOP-2484.
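To make point 3 concrete, here is a rough sketch on the old mapred API of what
I mean by reporting from inside the map function itself; the class name and the
pipeline step are made up for illustration, they are not from the patch:
{noformat}
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Hypothetical map class, just to show where the reporter calls belong.
public class PigMapSketch extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

    public void map(LongWritable key, Text value,
            OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        // ... push the record through the physical pipeline here ...

        // Report progress as each tuple is consumed; if the pipeline truly
        // hangs, no progress() call happens and the framework can kill the
        // task after mapred.task.timeout -- something a background reporter
        // thread would defeat.
        reporter.progress();
        reporter.setStatus("processed record at offset " + key.get());

        output.collect(value, value);
    }
}
{noformat}
This way a genuinely stuck task still times out, instead of being kept alive by
a thread that reports regardless of whether any work is happening.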
----
Unrelated to this patch: I've felt the pain of jumping between Pig's notion of
"Properties" and Hadoop's Configuration/JobConf and, worse, keeping them
in sync. This has led to some obscure bugs like PIG-230. Can you guys consider
using Configuration/JobConf uniformly in both the logical and physical layers?
IMHO it would be a huge maintenance win... thoughts? Alan?
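To illustrate the kind of copying I mean, a rough sketch (names here are
invented, not from the current code) of the one-way sync we end up doing today:
{noformat}
import java.util.Map;
import java.util.Properties;

import org.apache.hadoop.mapred.JobConf;

public class ConfSyncSketch {
    // One-way copy of Pig-side Properties into a JobConf just before submission.
    public static JobConf toJobConf(Properties props) {
        JobConf conf = new JobConf();
        for (Map.Entry<Object, Object> e : props.entrySet()) {
            conf.set((String) e.getKey(), (String) e.getValue());
        }
        return conf;
    }
    // Anything set on the JobConf after this point never makes it back to the
    // Properties object, which is exactly how bugs like PIG-230 creep in.
}
{noformat}
Using Configuration/JobConf end-to-end would make this copy, and the drift it
invites, go away.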
Similarly, I don't see the value in JobControlCompiler translating between
MROperPlan and JobControl, e.g.:
{noformat}
+    public JobControl compile(MROperPlan plan, String grpName,
+            Configuration conf, PigContext pigContext) throws JobCreationException {
+        this.plan = plan;
+        this.conf = conf;
+        this.pigContext = pigContext;
+        JobControl jobCtrl = new JobControl(grpName);
+
+        List<MapReduceOper> leaevs = new ArrayList<MapReduceOper>();
+        leaevs = plan.getLeaves();
+
+        for (MapReduceOper mro : leaevs) {
+            jobCtrl.addJob(compile(mro, jobCtrl));
+        }
+        return jobCtrl;
+    }
{noformat}
The notion of both an MRCompiler and a JobControlCompiler makes me a tad
uneasy; is it overkill? Can we have one compiler and one visitor?
Should we use org.apache.hadoop.mapred.jobcontrol.Job more extensively?
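For instance, a rough sketch (confs and job names are placeholders) of wiring
the dependency DAG directly with jobcontrol.Job and JobControl:
{noformat}
import java.io.IOException;

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.jobcontrol.Job;
import org.apache.hadoop.mapred.jobcontrol.JobControl;

public class JobControlSketch {
    public static void main(String[] args) throws IOException {
        JobConf loadConf = new JobConf();   // conf for the first MR stage
        JobConf groupConf = new JobConf();  // conf for the dependent stage

        Job load = new Job(loadConf);
        Job group = new Job(groupConf);
        group.addDependingJob(load);        // 'group' waits for 'load'

        JobControl ctrl = new JobControl("pig-plan");
        ctrl.addJob(load);
        ctrl.addJob(group);

        // JobControl is a Runnable; it submits jobs as their dependencies
        // complete, so the driver just polls for completion.
        Thread driver = new Thread(ctrl);
        driver.start();
        while (!ctrl.allFinished()) {
            try {
                Thread.sleep(1000);
            } catch (InterruptedException e) {
                // ignore and keep polling
            }
        }
        ctrl.stop();
    }
}
{noformat}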
I realise I'm air-brushing here, but still...
> Rework mapreduce submission and monitoring
> ------------------------------------------
>
> Key: PIG-162
> URL: https://issues.apache.org/jira/browse/PIG-162
> Project: Pig
> Issue Type: Sub-task
> Environment: This bug tracks work to rework the submission and
> monitoring interface to map reduce as described in
> http://wiki.apache.org/pig/PigTypesFunctionalSpec
> Reporter: Alan Gates
> Assignee: Alan Gates
> Attachments: mapreduceJumbo.patch, split.png,
> TEST-org.apache.pig.test.TestMRCompiler.txt,
> TEST-org.apache.pig.test.TestUnion.txt
>
>
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.