[ https://issues.apache.org/jira/browse/PIG-1116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-1116:
--------------------------------

    Attachment: PIG-1116.patch

Attached a patch that fixes this issue in MRCompiler: the extra job is now 
removed from the MROperPlan.

> Remove redundant map-reduce job for merge join
> ----------------------------------------------
>
>                 Key: PIG-1116
>                 URL: https://issues.apache.org/jira/browse/PIG-1116
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Daniel Dai
>             Fix For: 0.6.0
>
>         Attachments: PIG-1116.patch
>
>
> In merge join, when we convert the right-hand side file into a side file, we 
> do not remove it from the map-reduce plan; we only disconnect it from the 
> plan. When the query runs, the redundant load reads the data but does 
> nothing with it. This operation should be removed entirely.
> E.g.:
> a = load '/user/pig/tests/data/zebra/singlefile/studentsortedtab10k' using org.apache.hadoop.zebra.pig.TableLoader('', 'sorted') as (name, age, gpa);
> b = load '/user/pig/tests/data/zebra/singlefile/votersortedtab10k' using org.apache.hadoop.zebra.pig.TableLoader('', 'sorted') as (name, age, registration, contributions);
> c = join a by name, b by name using "merge";
> explain c;
> {code}
> #--------------------------------------------------
> # Map Reduce Plan                                  
> #--------------------------------------------------
> MapReduce node 1-21
> Map Plan
> Load(hdfs://wilbur20.labs.corp.sp1.yahoo.com:9020/user/pig/tests/data/zebra/singlefile/votersortedtab10k:org.apache.hadoop.zebra.pig.TableLoader('','sorted')) - 1-13--------
> Global sort: false
> ----------------
> MapReduce node 1-20
> Map Plan
> Store(fakefile:org.apache.pig.builtin.PigStorage) - 1-19
> |
> |---MergeJoin[tuple] - 1-16
>     |
>     |---Load(hdfs://wilbur20.labs.corp.sp1.yahoo.com:9020/user/pig/tests/data/zebra/singlefile/studentsortedtab10k:org.apache.hadoop.zebra.pig.TableLoader('','sorted')) - 1-12--------
> Global sort: false
> ----------------
> {code}
> MapReduce node 1-21, which only loads the right-hand input (votersortedtab10k), should be removed.
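The difference the report draws between disconnecting an operator and removing it can be illustrated with a minimal Java sketch. It assumes a hypothetical `PlanSketch` class, a toy DAG of named nodes; it is not Pig's actual `MROperPlan` API, and the node names 1-20 and 1-21 are simply borrowed from the explain output above. Disconnecting drops only the edge, so the orphaned node still counts as a job; removing it drops the node and all of its edges.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, minimal stand-in for a plan of map-reduce operators.
// This is NOT Pig's MROperPlan API; all names here are illustrative only.
public class PlanSketch {
    // node name -> names of its successor nodes (insertion order preserved)
    private final Map<String, Set<String>> succ = new LinkedHashMap<>();

    public void add(String node) {
        succ.putIfAbsent(node, new LinkedHashSet<>());
    }

    public void connect(String from, String to) {
        add(from);
        add(to);
        succ.get(from).add(to);
    }

    // Drops only the edge: both nodes stay in the plan and would still run.
    public void disconnect(String from, String to) {
        Set<String> s = succ.get(from);
        if (s != null) {
            s.remove(to);
        }
    }

    // Drops the node itself plus every edge pointing at it.
    public void remove(String node) {
        succ.remove(node);
        for (Set<String> s : succ.values()) {
            s.remove(node);
        }
    }

    // In this toy model, every node left in the plan becomes a job.
    public Set<String> jobs() {
        return Collections.unmodifiableSet(succ.keySet());
    }

    public static void main(String[] args) {
        PlanSketch plan = new PlanSketch();
        plan.connect("1-21", "1-20");     // right-hand load feeding the join job
        plan.disconnect("1-21", "1-20");  // side file set up, node left behind
        System.out.println(plan.jobs());  // prints [1-21, 1-20]
        plan.remove("1-21");              // redundant job dropped entirely
        System.out.println(plan.jobs());  // prints [1-20]
    }
}
```

In this toy model, only `remove` shrinks the set of jobs, which mirrors why disconnecting 1-21 alone still leaves it to be scheduled.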

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
