[ 
https://issues.apache.org/jira/browse/PIG-5445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-5445:
------------------------------
    Attachment: pig-5445-v01.patch

I do not understand how cogroup and MergeJoinIndexer are implemented in detail, 
but checking MergeJoinIndexer.java:
{code:java}
    public MergeJoinIndexer(String funcSpec, String innerPlan, String serializedPhyPlan,
            String udfCntxtSignature, String scope, String ignoreNulls) throws ExecException{

        loader = 
...
            precedingPhyPlan = (PhysicalPlan)ObjectSerializer.deserialize(serializedPhyPlan);
            if(precedingPhyPlan != null){
                    if(precedingPhyPlan.getLeaves().size() != 1 || precedingPhyPlan.getRoots().size() != 1){
                        int errCode = 2168;
                        String errMsg = "Expected physical plan with exactly one root and one leaf.";
                        throw new ExecException(errMsg,errCode,PigException.BUG);
                    }
                this.rightPipelineLeaf = precedingPhyPlan.getLeaves().get(0);
                this.rightPipelineRoot = precedingPhyPlan.getRoots().get(0);
                this.rightPipelineRoot.setInputs(null); // ********* inputs overwritten here
            }
        } {code}
MergeJoinIndexer always overwrites "inputs" with null. This means "inputs" 
can be skipped at serialization time. Attaching a patch (pig-5445-v01.patch) 
that does that. The size of TEZC-MergeCogroup-1.gld was reduced by 5 with this 
patch, since it no longer serializes PigContext and POLoad for 
MergeJoinIndexer.
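
To illustrate the reasoning (this is not the patch itself), here is a minimal, 
self-contained sketch using plain Java serialization. Everything in it is 
hypothetical: Op stands in for a physical operator, its conf map stands in for 
the configuration reachable through the loader, and encode/decodeLikeIndexer 
are made-up helpers, not Pig APIs. It shows that when the reader always resets 
"inputs" to null, dropping "inputs" before writing loses nothing, and the 
encoded string stops depending on the config.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import java.util.TreeMap;

public class SkipInputsSketch {

    // Hypothetical stand-in for a physical operator; not a Pig class.
    public static class Op implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final TreeMap<String, String> conf; // stands in for config pulled in via the loader
        List<Op> inputs;

        Op(String name, TreeMap<String, String> conf) {
            this.name = name;
            this.conf = conf;
        }
    }

    // Serialize the root operator to a Base64 string, optionally without its inputs.
    public static String encode(Op root, boolean skipInputs) throws IOException {
        List<Op> saved = root.inputs;
        if (skipInputs) {
            root.inputs = null; // drop the upstream operators only for the write
        }
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(root);
            }
            return Base64.getEncoder().encodeToString(bytes.toByteArray());
        } finally {
            root.inputs = saved; // restore the in-memory plan
        }
    }

    // Mimics the constructor quoted above: deserialize, then always null the inputs.
    public static Op decodeLikeIndexer(String encoded) throws IOException, ClassNotFoundException {
        byte[] raw = Base64.getDecoder().decode(encoded);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            Op root = (Op) in.readObject();
            root.inputs = null;
            return root;
        }
    }

    // Build a two-operator plan whose upstream operator carries one config value.
    public static Op plan(String confValue) {
        TreeMap<String, String> conf = new TreeMap<>();
        conf.put("some.setting", confValue);
        Op load = new Op("load", conf);
        Op root = new Op("root", new TreeMap<>());
        root.inputs = new ArrayList<>();
        root.inputs.add(load);
        return root;
    }

    public static void main(String[] args) throws Exception {
        Op before = plan("old");
        Op after = plan("new");
        // With inputs serialized, a config change alters the encoded string.
        System.out.println(encode(before, false).equals(encode(after, false)));
        // With inputs skipped, the encoded string is stable and shorter.
        System.out.println(encode(before, true).equals(encode(after, true)));
        System.out.println(encode(before, true).length() < encode(before, false).length());
    }
}
```

In this sketch the field is nulled temporarily and restored afterwards, so the 
in-memory plan is unchanged; whether the actual patch takes the same approach 
is not something this example claims.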

> TestTezCompiler.testMergeCogroup fails whenever config is updated
> -----------------------------------------------------------------
>
>                 Key: PIG-5445
>                 URL: https://issues.apache.org/jira/browse/PIG-5445
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.19.0
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Minor
>         Attachments: pig-5445-v01.patch
>
>
> TestTezCompiler.testMergeCogroup started failing after upgrading Tez (and 
> the config that comes with it).
> {noformat}
> testMergeCogroupFailure
> expected:
> <|---a: Load(file:///tmp/input1:org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MergeJoinIndexer('org.apache.pig.test.TestMapSideCogroup$DummyCollectableLoader','.../doPMfwFKyneZ','eNq9[fWtsHFeWXvEhWm9Ls...XOuwcT+fzW1+yM]=','a_1-0','scope','...>
>  
> but was:
> <|---a: Load(file:///tmp/input1:org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MergeJoinIndexer('org.apache.pig.test.TestMapSideCogroup$DummyCollectableLoader','.../doPMfwFKyneZ','eNq9[V01sG0UUnmycxHWSN...DyC6P4Drk9M9w=]=','a_1-0','scope','...>
> at org.apache.pig.tez.TestTezCompiler.run(TestTezCompiler.java:1472)
> at org.apache.pig.tez.TestTezCompiler.testMergeCogroup(TestTezCompiler.java:292) 
> {noformat}
> (I edited the diff above a bit to make it easier to see where the 
> difference was.)
> Basically, the 3rd argument to MergeJoinIndexer differed. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
