[
https://issues.apache.org/jira/browse/PIG-162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12594832#action_12594832
]
Arun C Murthy commented on PIG-162:
-----------------------------------
Shravan, I'm not super-familiar with the new pipeline so pardon my temerity
while I put forth some thoughts on this patch... I'll confine it to Pig's usage
of Map-Reduce:
1. Hadoop doesn't require Writables from 0.17.0 onwards (HADOOP-1986), so you
could use that to your advantage.
2. I agree with Alan about map-only jobs; just use something similar to PIG-196
(my unbiased opinion *smile*).
3. RunnableReporter is a thread which blindly does 'reporting'. This makes it
_very_ hard to debug when applications go haywire, and it throws away a very
important safety net provided by Hadoop Map-Reduce, i.e. the ability to kill
tasks which aren't 'progressing'. *Please do not do this!* Ideally you should
be using the *reporter* inside the map/reduce functions to report progress as
tuples are consumed (see the sketch after this list).
4. The 'Slicer' notion is missing from PigInputFormat/PigSplit... are you
planning to integrate it later?
5. It's great that you are using Hadoop's jobcontrol; please let us know if
anything was amiss here: HADOOP-2484.
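To make point 3 concrete, here is a rough sketch on the old mapred API of what
I mean by reporting from inside the map function itself; the class name and the
pipeline step are made up for illustration, they are not from the patch:
{noformat}
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Hypothetical map class, just to show where the reporter calls belong.
public class PigMapSketch extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

    public void map(LongWritable key, Text value,
            OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        // ... push the record through the physical pipeline here ...

        // Report progress as each tuple is consumed; if the pipeline truly
        // hangs, no progress() call happens and the framework can kill the
        // task after mapred.task.timeout -- something a background reporter
        // thread would defeat.
        reporter.progress();
        reporter.setStatus("processed record at offset " + key.get());

        output.collect(value, value);
    }
}
{noformat}
This way a genuinely stuck task still times out, instead of being kept alive by
a thread that reports regardless of whether any work is happening.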
----
Unrelated to this patch: I've felt the pain of jumping between Pig's notion of
"Properties" and Hadoop's Configuration/JobConf and, worse, keeping them
in sync. This has led to some obscure bugs like PIG-230. Can you guys consider
using Configuration/JobConf uniformly in both the logical and physical layers?
IMHO it would be a huge maintenance win... thoughts? Alan?
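To illustrate the kind of copying I mean, a rough sketch (names here are
invented, not from the current code) of the one-way sync we end up doing today:
{noformat}
import java.util.Map;
import java.util.Properties;

import org.apache.hadoop.mapred.JobConf;

public class ConfSyncSketch {
    // One-way copy of Pig-side Properties into a JobConf just before submission.
    public static JobConf toJobConf(Properties props) {
        JobConf conf = new JobConf();
        for (Map.Entry<Object, Object> e : props.entrySet()) {
            conf.set((String) e.getKey(), (String) e.getValue());
        }
        return conf;
    }
    // Anything set on the JobConf after this point never makes it back to the
    // Properties object, which is exactly how bugs like PIG-230 creep in.
}
{noformat}
Using Configuration/JobConf end-to-end would make this copy, and the drift it
invites, go away.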
Similarly, I don't see the value in JobControlCompiler translating between
MROperPlan and JobControl, e.g.:
{noformat}
+    public JobControl compile(MROperPlan plan, String grpName,
+            Configuration conf, PigContext pigContext) throws JobCreationException {
+        this.plan = plan;
+        this.conf = conf;
+        this.pigContext = pigContext;
+        JobControl jobCtrl = new JobControl(grpName);
+
+        List<MapReduceOper> leaevs = new ArrayList<MapReduceOper>();
+        leaevs = plan.getLeaves();
+
+        for (MapReduceOper mro : leaevs) {
+            jobCtrl.addJob(compile(mro, jobCtrl));
+        }
+        return jobCtrl;
+    }
{noformat}
The notion of both an MRCompiler and a JobControlCompiler makes me a tad
uneasy; is it overkill? Can we have one compiler and one visitor?
Should we use org.apache.hadoop.mapred.jobcontrol.Job more extensively?
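For instance, a rough sketch (confs and job names are placeholders) of wiring
the dependency DAG directly with jobcontrol.Job and JobControl:
{noformat}
import java.io.IOException;

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.jobcontrol.Job;
import org.apache.hadoop.mapred.jobcontrol.JobControl;

public class JobControlSketch {
    public static void main(String[] args) throws IOException {
        JobConf loadConf = new JobConf();   // conf for the first MR stage
        JobConf groupConf = new JobConf();  // conf for the dependent stage

        Job load = new Job(loadConf);
        Job group = new Job(groupConf);
        group.addDependingJob(load);        // 'group' waits for 'load'

        JobControl ctrl = new JobControl("pig-plan");
        ctrl.addJob(load);
        ctrl.addJob(group);

        // JobControl is a Runnable; it submits jobs as their dependencies
        // complete, so the driver just polls for completion.
        Thread driver = new Thread(ctrl);
        driver.start();
        while (!ctrl.allFinished()) {
            try {
                Thread.sleep(1000);
            } catch (InterruptedException e) {
                // ignore and keep polling
            }
        }
        ctrl.stop();
    }
}
{noformat}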
I realise I'm air-brushing here, but still...
> Rework mapreduce submission and monitoring
> ------------------------------------------
>
> Key: PIG-162
> URL: https://issues.apache.org/jira/browse/PIG-162
> Project: Pig
> Issue Type: Sub-task
> Environment: This bug tracks work to rework the submission and
> monitoring interface to map reduce as described in
> http://wiki.apache.org/pig/PigTypesFunctionalSpec
> Reporter: Alan Gates
> Assignee: Alan Gates
> Attachments: mapreduceJumbo.patch, split.png,
> TEST-org.apache.pig.test.TestMRCompiler.txt,
> TEST-org.apache.pig.test.TestUnion.txt
>
>
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.