Remove redundant map-reduce job for merge join

                 Key: PIG-1116
             Project: Pig
          Issue Type: Bug
            Reporter: Daniel Dai

In merge join, when we convert the right-hand side input into a side file, we
do not remove its load from the map-reduce plan; we only disconnect it from the
plan. When we run the query, the redundant load still reads the data but does
nothing with it. This operation should be removed entirely.
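
To illustrate the difference between disconnecting and removing, here is a
minimal sketch in Java. ToyPlan and its methods are hypothetical names made up
for this illustration; it is not Pig's actual plan API, and the edge direction
between the two nodes is an assumption. It only shows why a node that is
disconnected but still present in the plan would still be launched.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy plan for illustration only; NOT Pig's real plan classes.
public class ToyPlan {
    // Maps each node to its successors, in insertion order.
    private final Map<String, Set<String>> edges = new LinkedHashMap<>();

    public void add(String node) {
        edges.putIfAbsent(node, new LinkedHashSet<>());
    }

    public void connect(String from, String to) {
        add(from);
        add(to);
        edges.get(from).add(to);
    }

    // Drops only the edge; both nodes stay in the plan.
    public void disconnect(String from, String to) {
        Set<String> successors = edges.get(from);
        if (successors != null) {
            successors.remove(to);
        }
    }

    // Drops the node itself and every edge that points to it.
    public void remove(String node) {
        edges.remove(node);
        for (Set<String> successors : edges.values()) {
            successors.remove(node);
        }
    }

    // Assumption: every node still present in the plan is launched as a job.
    public List<String> jobsToRun() {
        return new ArrayList<>(edges.keySet());
    }

    public static void main(String[] args) {
        ToyPlan plan = new ToyPlan();
        plan.connect("1-21", "1-20");   // assumed edge direction

        plan.disconnect("1-21", "1-20");
        System.out.println(plan.jobsToRun()); // [1-21, 1-20]

        plan.remove("1-21");
        System.out.println(plan.jobsToRun()); // [1-20]
    }
}
```

After disconnect() alone, node 1-21 is still listed as a job to run, which
mirrors the redundant load described above; only remove() takes it out.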

a = load '/user/pig/tests/data/zebra/singlefile/studentsortedtab10k'
    using org.apache.hadoop.zebra.pig.TableLoader('', 'sorted')
    as (name, age, gpa);
b = load '/user/pig/tests/data/zebra/singlefile/votersortedtab10k'
    using org.apache.hadoop.zebra.pig.TableLoader('', 'sorted')
    as (name, age, registration, contributions);
c = join a by name, b by name using "merge";
explain c;

# Map Reduce Plan
MapReduce node 1-21
Map Plan
 - 1-13--------
Global sort: false

MapReduce node 1-20
Map Plan
Store(fakefile:org.apache.pig.builtin.PigStorage) - 1-19
|---MergeJoin[tuple] - 1-16
 - 1-12--------
Global sort: false

MapReduce node 1-21 is redundant and should be removed from the plan.
