[
https://issues.apache.org/jira/browse/PIG-162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596163#action_12596163
]
Shravan Matthur Narayanamurthy commented on PIG-162:
----------------------------------------------------
I am sorry, Alan. I missed this. I forgot that I had submitted the patch under
PIG-162 and was only checking PIG-157 and PIG-161.
1) I'm still unclear on why we now need a separate map class for map-only jobs.
I see that for map-only jobs you are omitting the key and just collecting the
tuple. This makes sense. But if we need to do this how is it that Arun's patch
196 works? (That's really a question for Arun more than for you.) You say it
has something to do with types. Based on a brief glance at the current code, it
looks like we're using the tuple for both key and value in the current code and
the first field of the tuple in the new code. Is that what causes the issue? If
so, what are the advantages of using the first field in the tuple instead of
the whole tuple as the key for sort and shuffle?
[shrav] The processing is different in the two cases. In the map-only case,
the tuple returned by the map plan is the one we need to store. In the map-reduce
case, there will be a local rearrange at the end of the map plan, and it emits a
tuple of the form <key, IndexedTuple>. So, for the map-reduce case, we need to
extract the key, convert it to a Hadoop type, and then use the IndexedTuple as the
value. I could not think of a way of combining these two without an if statement
in the map call. All of this was done to avoid that if statement.
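The trade-off described above (two map classes instead of an if statement in
map()) can be sketched as a shared base class that runs the map plan plus an
abstract emit step that each subclass implements. This is a minimal sketch under
stated assumptions, not Pig's code: Collector is a hypothetical stand-in for
Hadoop's OutputCollector, runPlan is a pass-through stub for executing the map
plan, and toHadoopType is a hypothetical placeholder for the key conversion.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MapperSketch {
    // Hypothetical stand-in for Hadoop's OutputCollector: records (key, value) pairs.
    static class Collector {
        final List<Object[]> pairs = new ArrayList<>();

        void collect(Object key, Object value) {
            pairs.add(new Object[] {key, value});
        }
    }

    // Shared base: runs the map plan, then delegates emission to a subclass,
    // so map() itself never has to branch on the job type.
    abstract static class BaseMap {
        // Stub for running the map plan; here it simply passes the input through.
        List<Object> runPlan(List<Object> input) {
            return input;
        }

        final void map(List<Object> input, Collector out) {
            emit(runPlan(input), out);
        }

        abstract void emit(List<Object> result, Collector out);
    }

    // Map-only job: the plan's output is the tuple to store, so no key is emitted.
    static class MapOnlyMap extends BaseMap {
        @Override
        void emit(List<Object> result, Collector out) {
            out.collect(null, result);
        }
    }

    // Map-reduce job: the plan ends in a local rearrange whose output is modeled
    // here as [key, indexedTuple]; the key is converted and the indexed tuple
    // becomes the value.
    static class MapReduceMap extends BaseMap {
        @Override
        void emit(List<Object> result, Collector out) {
            out.collect(toHadoopType(result.get(0)), result.get(1));
        }

        // Hypothetical conversion; a real version would wrap the key in a Writable.
        Object toHadoopType(Object key) {
            return String.valueOf(key);
        }
    }

    public static void main(String[] args) {
        Collector mapOnly = new Collector();
        new MapOnlyMap().map(Arrays.<Object>asList("alice", 42), mapOnly);

        Collector mapReduce = new Collector();
        new MapReduceMap().map(
                Arrays.<Object>asList(7, Arrays.<Object>asList(0, "alice")), mapReduce);

        System.out.println(mapOnly.pairs.get(0)[0] + " " + mapOnly.pairs.get(0)[1]);
        // null [alice, 42]
        System.out.println(mapReduce.pairs.get(0)[0] + " " + mapReduce.pairs.get(0)[1]);
        // 7 [0, alice]
    }
}
```

The cost of this design is an extra class per job type; the benefit is that the
per-record map() path contains no job-type check.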
2) Please make your JUnit tests conform to the older style of JUnit. Some users
are still on the older version, and we haven't explicitly required JUnit 4.x.
This will mean renaming all of the methods in TestMRCompiler that start with
"test" but aren't tests.
[shrav] I can change TestMRCompiler, but what about the other test cases, Alan?
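In the older JUnit 3.x style, a test class extends junit.framework.TestCase and
every public, no-argument, void method whose name starts with "test" is run as
a test, which is why helpers named test* must be renamed. The sketch below
approximates that discovery rule with plain reflection, so it runs without JUnit
on the classpath; TestMRCompilerSketch and its methods are hypothetical stand-ins,
not the real TestMRCompiler.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class Junit3NamingSketch {
    // Hypothetical test class; in real JUnit 3 code it would extend junit.framework.TestCase.
    public static class TestMRCompilerSketch {
        public void testSimpleCompile() { }

        // Helper renamed so it no longer starts with "test"; not discovered.
        public void generatePlanHelper() { }

        // A helper named like this would be run as a test under JUnit 3.x.
        public void testHelperThatIsNotATest() { }
    }

    // Approximates JUnit 3.x discovery: public, no-arg, void, name starts with "test".
    static List<String> discoverTests(Class<?> cls) {
        List<String> names = new ArrayList<>();
        for (Method m : cls.getDeclaredMethods()) {
            if (Modifier.isPublic(m.getModifiers())
                    && m.getParameterTypes().length == 0
                    && m.getReturnType() == void.class
                    && m.getName().startsWith("test")) {
                names.add(m.getName());
            }
        }
        // getDeclaredMethods() returns methods in no particular order, so sort.
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        System.out.println(discoverTests(TestMRCompilerSketch.class));
        // [testHelperThatIsNotATest, testSimpleCompile]
    }
}
```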
3) As a general note, we're all in agreement that having the separate reporter
thread is an issue, but you'll submit a fix for that in a subsequent patch.
[shrav] Yes
4) I'm willing to submit this patch as is, except that the TestMRCompiler test
fails in the unit tests. I'll attach the output of that test run separately.
From what I could tell, the failure was a real issue: the plan being generated
did not appear to match the expected plan.
[shrav] Alan, I was looking into it. Actually, there is no problem in the plan
generated. However, I had assumed that the order remains constant since we use
lists everywhere. I think this assumption is not valid. The test works fine on my
machine, which led me to assume that Set iterators do not iterate in an arbitrary
order, but the Set contract makes no ordering guarantee. Without this assumption,
it's really hard to verify the plan. I will have to think about this; probably
some way of sorting and storing.
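One way to do the "sorting and storing" mentioned above is to reduce each plan
to a canonical, sorted form before comparing it with the expected plan, so that
Set iteration order no longer affects the result. This is a minimal sketch under
assumptions: the plan is modeled as a hypothetical map from operator name to
successor names, and operator names are assumed to be unique; a real Pig plan
would need a stable, unique label for each operator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PlanCompareSketch {
    // Flattens the plan into "from->to" edge strings and sorts them, giving a
    // form that does not depend on Map or Set iteration order.
    static List<String> canonicalEdges(Map<String, Set<String>> plan) {
        List<String> edges = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : plan.entrySet()) {
            for (String succ : e.getValue()) {
                edges.add(e.getKey() + "->" + succ);
            }
        }
        Collections.sort(edges);
        return edges;
    }

    static boolean samePlan(Map<String, Set<String>> a, Map<String, Set<String>> b) {
        return canonicalEdges(a).equals(canonicalEdges(b));
    }

    public static void main(String[] args) {
        Map<String, Set<String>> expected = new HashMap<>();
        expected.put("Load", new HashSet<>(Arrays.asList("Filter", "Split")));
        expected.put("Filter", new HashSet<>(Collections.singletonList("Store")));

        // The same plan, built in a different insertion order.
        Map<String, Set<String>> actual = new LinkedHashMap<>();
        actual.put("Filter", new LinkedHashSet<>(Collections.singletonList("Store")));
        actual.put("Load", new LinkedHashSet<>(Arrays.asList("Split", "Filter")));

        System.out.println(samePlan(expected, actual)); // true
        System.out.println(canonicalEdges(actual));
        // [Filter->Store, Load->Filter, Load->Split]
    }
}
```

Comparing sorted edge lists ignores operators that have no edges at all, so a
fuller check would also compare the sorted set of operator names.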
> Rework mapreduce submission and monitoring
> ------------------------------------------
>
> Key: PIG-162
> URL: https://issues.apache.org/jira/browse/PIG-162
> Project: Pig
> Issue Type: Sub-task
> Environment: This bug tracks work to rework the submission and
> monitoring interface to map reduce as described in
> http://wiki.apache.org/pig/PigTypesFunctionalSpec
> Reporter: Alan Gates
> Assignee: Alan Gates
> Attachments: mapreduceJumbo.patch, mapreduceJumboWithComInc.patch,
> split.png, TEST-org.apache.pig.test.TestMRCompiler.txt,
> TEST-org.apache.pig.test.TestMRCompiler.txt,
> TEST-org.apache.pig.test.TestUnion.txt
>
>
--
This message is automatically generated by JIRA.