[ https://issues.apache.org/jira/browse/PIG-1116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785134#action_12785134 ]
Olga Natkovich commented on PIG-1116: ------------------------------------- +1 > Remove redundant map-reduce job for merge join > ---------------------------------------------- > > Key: PIG-1116 > URL: https://issues.apache.org/jira/browse/PIG-1116 > Project: Pig > Issue Type: Bug > Reporter: Daniel Dai > Assignee: Pradeep Kamath > Fix For: 0.6.0 > > Attachments: PIG-1116.patch > > > In merge join, when we convert right hand side file into a side file, we > didn't remove it from the map-reduce plan, we only disconnect it from the > plan. When we run the query, the redundant load will load the data but doing > nothing. This operation should be removed entirely. > Eg: > a = load '/user/pig/tests/data/zebra/singlefile/studentsortedtab10k' using > org.apache.hadoop.zebra.pig.TableLoader('', 'sorted') as (name, age, gpa); > b = load '/user/pig/tests/data/zebra/singlefile/votersortedtab10k' using > org.apache.hadoop.zebra.pig.TableLoader('', 'sorted') as (name, age, > registration, contributions); > c = join a by name, b by name using "merge"; > explain c; > {code} > #-------------------------------------------------- > # Map Reduce Plan > #-------------------------------------------------- > MapReduce node 1-21 > Map Plan > Load(hdfs://wilbur20.labs.corp.sp1.yahoo.com:9020/user/pig/tests/data/zebra/singlefile/votersortedtab10k:org.apache.hadoop.zebra.pig.TableLoader('','sorted')) > - 1-13-------- > Global sort: false > ---------------- > MapReduce node 1-20 > Map Plan > Store(fakefile:org.apache.pig.builtin.PigStorage) - 1-19 > | > |---MergeJoin[tuple] - 1-16 > | > > |---Load(hdfs://wilbur20.labs.corp.sp1.yahoo.com:9020/user/pig/tests/data/zebra/singlefile/studentsortedtab10k:org.apache.hadoop.zebra.pig.TableLoader('','sorted')) > - 1-12-------- > Global sort: false > ---------------- > {code} > 1-21 should be removed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.