[ https://issues.apache.org/jira/browse/PIG-5445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Koji Noguchi updated PIG-5445:
------------------------------
    Attachment: pig-5445-v01.patch

I have no understanding of how cogroup and MergeJoinIndexer are implemented, but checking MergeJoinIndexer.java
{code:java}
public MergeJoinIndexer(String funcSpec, String innerPlan, String serializedPhyPlan,
        String udfCntxtSignature, String scope, String ignoreNulls) throws ExecException{

    loader = ...
    precedingPhyPlan = (PhysicalPlan)ObjectSerializer.deserialize(serializedPhyPlan);
    if(precedingPhyPlan != null){
        if(precedingPhyPlan.getLeaves().size() != 1 || precedingPhyPlan.getRoots().size() != 1){
            int errCode = 2168;
            String errMsg = "Expected physical plan with exactly one root and one leaf.";
            throw new ExecException(errMsg,errCode,PigException.BUG);
        }
        this.rightPipelineLeaf = precedingPhyPlan.getLeaves().get(0);
        this.rightPipelineRoot = precedingPhyPlan.getRoots().get(0);
        this.rightPipelineRoot.setInputs(null);  // ********* inputs are always reset to null here
    }
}
{code}
MergeJoinIndexer always overwrites "inputs" with null after deserializing the plan. This means "inputs" can be skipped at serialization time.

Attaching the patch (pig-5445-v01.patch), which does that. The size of TEZC-MergeCogroup-1.gld was reduced by 5 with this patch, since it no longer serializes PigContext and POLoad for MergeJoinIndexer.

> TestTezCompiler.testMergeCogroup fails whenever config is updated
> -----------------------------------------------------------------
>
>                 Key: PIG-5445
>                 URL: https://issues.apache.org/jira/browse/PIG-5445
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.19.0
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Minor
>         Attachments: pig-5445-v01.patch
>
>
> TestTezCompiler.testMergeCogroup started failing after upgrading Tez (and
> config that comes with it).
> {noformat}
> testMergeCogroupFailure
> expected:
> <|---a: Load(file:///tmp/input1:org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MergeJoinIndexer('org.apache.pig.test.TestMapSideCogroup$DummyCollectableLoader','.../doPMfwFKyneZ','eNq9[fWtsHFeWXvEhWm9Ls...XOuwcT+fzW1+yM]=','a_1-0','scope','...>
>
> but was:
> <|---a: Load(file:///tmp/input1:org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MergeJoinIndexer('org.apache.pig.test.TestMapSideCogroup$DummyCollectableLoader','.../doPMfwFKyneZ','eNq9[V01sG0UUnmycxHWSN...DyC6P4Drk9M9w=]=','a_1-0','scope','...>
> at org.apache.pig.tez.TestTezCompiler.run(TestTezCompiler.java:1472)
> at org.apache.pig.tez.TestTezCompiler.testMergeCogroup(TestTezCompiler.java:292)
> {noformat}
> (edited the diff above a bit to make it easier to identify where the
> difference was)
> Basically 3rd argument to MergeJoinIndexer differed.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
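
For illustration only, here is a minimal sketch of the idea in the comment above: if a field is always reset to null after deserialization, leaving it out of the serialized form loses nothing and makes the serialized bytes independent of whatever that field pointed to (such as a loader carrying config). This is not the content of pig-5445-v01.patch, and the mechanism the patch actually uses is not shown in this message. DemoOperator and SkipInputsDemo are hypothetical stand-ins, not Pig classes, and marking the field transient is just one possible way to skip it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a physical operator (NOT Pig's PhysicalOperator).
class DemoOperator implements Serializable {
    private static final long serialVersionUID = 1L;

    final String name;
    // Assumption for this sketch: the consumer always resets inputs to null,
    // so the field is marked transient and never written to the stream.
    transient List<DemoOperator> inputs;

    DemoOperator(String name, List<DemoOperator> inputs) {
        this.name = name;
        this.inputs = inputs;
    }
}

class SkipInputsDemo {
    // Serialize with plain Java serialization.
    static byte[] serialize(Serializable obj) {
        try (ByteArrayOutputStream bos = new ByteArrayOutputStream();
             ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(obj);
            oos.flush();
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Object deserialize(byte[] bytes) {
        try (ObjectInputStream ois =
                 new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Two roots that differ only in their inputs (think: loaders built
        // from different configs).
        DemoOperator root1 = new DemoOperator("root",
                Arrays.asList(new DemoOperator("load-with-old-config", null)));
        DemoOperator root2 = new DemoOperator("root",
                Arrays.asList(new DemoOperator("load-with-new-config", null)));

        byte[] b1 = serialize(root1);
        byte[] b2 = serialize(root2);

        // The inputs are skipped, so the serialized forms match.
        System.out.println(Arrays.equals(b1, b2));

        // After deserialization the transient field is null, which is what
        // the consumer would have set it to anyway.
        DemoOperator copy = (DemoOperator) deserialize(b1);
        System.out.println(copy.inputs == null);
    }
}
```

With a field skipped like this, a golden-file comparison of the serialized plan would no longer change just because something reachable only through the skipped field changed, which would be consistent with the smaller .gld file reported above.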