[jira] Commented: (PIG-162) Rework mapreduce submission and monitoring

Shravan Matthur Narayanamurthy (JIRA) Mon, 12 May 2008 11:57:30 -0700

    [ 
https://issues.apache.org/jira/browse/PIG-162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596163#action_12596163
 ]


Shravan Matthur Narayanamurthy commented on PIG-162:
----------------------------------------------------

I am sorry, Alan, I missed this. I forgot that I had submitted the patch under 
PIG-162 and was only checking PIG-157 and PIG-161.

1) I'm still unclear on why we now need a separate map class for map only jobs. 
I see that for map only jobs you are omitting the key and just collecting the 
tuple. This makes sense. But if we need to do this how is it that Arun's patch 
196 works? (That's really a question for Arun more than for you.) You say it 
has something to do with types. Based on a brief glance at the current code, it 
looks like we're using the tuple for both key and value in the current code and 
the first field of the tuple in the new code. Is that what causes the issue? If 
so, what are the advantages of using the first field in the tuple instead of 
the whole tuple as the key for sort and shuffle?
[shrav] The processing differs between the two cases. In the map-only case, 
the tuple returned is the one we need to store. In the map-reduce case, there 
is a local rearrange at the end of the map plan, which emits a tuple of the 
form <key, IndexedTuple>. So for the map-reduce case we need to extract the 
key, convert it to a Hadoop type, and then use the IndexedTuple as the value. 
I could not think of a way of combining the two without an if statement in 
the map call. All this circus was to avoid that if statement.
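
To illustrate the point about avoiding the if statement, here is a minimal 
sketch of the two-class approach. It is not the code from the patch: the 
Collector interface, the class names, and the stubbed runPlan method are all 
hypothetical, and plain Java lists stand in for Pig tuples and Hadoop types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for Hadoop's output collector; not the real API.
interface Collector {
    void collect(Object key, Object value);
}

// Shared map logic: run the (stubbed) map plan, then let the subclass emit.
abstract class BaseMapSketch {
    // Stub for executing the map plan; here the input passes through unchanged.
    protected List<Object> runPlan(List<Object> input) {
        return input;
    }

    public final void map(List<Object> input, Collector out) {
        emit(runPlan(input), out);
    }

    // Each job type decides how the plan's result is emitted.
    protected abstract void emit(List<Object> result, Collector out);
}

// Map-only job: the resulting tuple is what gets stored, so the key is omitted.
class MapOnlySketch extends BaseMapSketch {
    @Override
    protected void emit(List<Object> result, Collector out) {
        out.collect(null, result);
    }
}

// Map-reduce job: the plan ends in a local rearrange yielding <key, IndexedTuple>.
class MapReduceSketch extends BaseMapSketch {
    @Override
    protected void emit(List<Object> result, Collector out) {
        Object key = result.get(0);          // real code would convert this to a Hadoop type
        Object indexedTuple = result.get(1); // used as the value
        out.collect(key, indexedTuple);
    }
}

class MapperSketchDemo {
    // Runs one mapper over one tuple and returns the emitted pairs as "key=value" strings.
    static List<String> run(BaseMapSketch mapper, List<Object> tuple) {
        List<String> emitted = new ArrayList<>();
        mapper.map(tuple, (k, v) -> emitted.add(k + "=" + v));
        return emitted;
    }

    public static void main(String[] args) {
        List<Object> rearranged = Arrays.asList("k1", Arrays.asList(0, "a", "b"));
        System.out.println(run(new MapReduceSketch(), rearranged));
        System.out.println(run(new MapOnlySketch(), Arrays.asList("a", "b")));
    }
}
```

The choice between the two behaviours is made once, when the job picks its 
map class, rather than on every call to map.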

2) Please make your junit tests conform to the older style of junit. Some users 
are still on the older version, and we haven't explicitly required 4.x type 
junit. This will mean changing all of the method names in TestMRCompiler that 
start with test but aren't tests.
[shrav] I can change TestMRCompiler, but what about the other test cases, Alan?

3) As a general note, we're all in agreement that having the separate reporter 
thread is an issue, but you'll submit a fix for that in a subsequent patch.
[shrav] Yes

4) I'm willing to submit this patch as is, except that the TestMRCompiler test 
fails in the unit tests. I'll attach the output of that test run separately. 
From what I could tell the failure was a real issue, the plan being generated 
did not appear to match the expected plan.
[shrav] Alan, I was looking into it. There is actually no problem in the plan 
generated. However, I have assumed that the order remains constant since we 
use lists everywhere. I think this assumption is not valid. The test works 
fine on my machine, which made me assume that the Set iterators do not 
iterate in some arbitrary order. Without this assumption, it's really hard to 
verify the plan. I will have to think about this. Probably some way of 
sorting and storing.
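
One way to read "sorting and storing" is sketched below, under stated 
assumptions: a plan is reduced to a collection of operator description 
strings, and both the expected and the generated plan are sorted into a 
canonical form before they are compared. The class and method names and the 
operator strings are hypothetical, and the sketch ignores the edges between 
operators, which a real check would have to encode in each description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;

class PlanCompareSketch {
    // Canonical form: operator descriptions sorted, one per line.
    static String canonical(Collection<String> operators) {
        List<String> sorted = new ArrayList<>(operators);
        Collections.sort(sorted);
        return String.join("\n", sorted);
    }

    // Two plans match if their canonical forms are equal, whatever the iteration order.
    static boolean samePlan(Collection<String> expected, Collection<String> actual) {
        return canonical(expected).equals(canonical(actual));
    }

    public static void main(String[] args) {
        // A HashSet makes no guarantee about iteration order, which is the
        // situation described above.
        Collection<String> generated =
                new HashSet<>(Arrays.asList("store", "load", "filter"));
        Collection<String> expected = Arrays.asList("load", "filter", "store");
        System.out.println(samePlan(expected, generated));
    }
}
```

The expected plan could then be stored in the same sorted form, so the test 
no longer depends on how a particular JVM happens to iterate a Set.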

> Rework mapreduce submission and monitoring
> ------------------------------------------
>
>                 Key: PIG-162
>                 URL: https://issues.apache.org/jira/browse/PIG-162
>             Project: Pig
>          Issue Type: Sub-task
>         Environment: This bug tracks work to rework the submission and 
> monitoring interface to map reduce as described in  
> http://wiki.apache.org/pig/PigTypesFunctionalSpec
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>         Attachments: mapreduceJumbo.patch, mapreduceJumboWithComInc.patch, 
> split.png, TEST-org.apache.pig.test.TestMRCompiler.txt, 
> TEST-org.apache.pig.test.TestMRCompiler.txt, 
> TEST-org.apache.pig.test.TestUnion.txt
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
