[ https://issues.apache.org/jira/browse/PIG-846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-846:
-------------------------------

    Status: Patch Available  (was: Open)

The root cause is that MultiQueryOptimizer, as part of merging MROpers, updates 
the keyInfo HashMap present in POPackage to reflect the new "index" (originally 
the index is 0, since multiquery optimization only happens with single-input 
MROpers). Multiquery optimization uses indices in the range 0x80 to 0x8f, formed 
by ORing the index with a multiquery bitmask (0x80). These indices are assigned 
by flattening out all POLocalRearranges across all merged map plans (including 
those nested in any POSplit operator in a map plan). The corresponding 
POPackages then need their keyInfo HashMaps updated to reflect these new 
indices. MultiQueryOptimizer does this update for POPackage operators that 
occur in the top-level POMultiQueryPackage. However, if MROpers are merged 
recursively, the list of packages maintained by the top-level 
POMultiQueryPackage can itself contain POMultiQueryPackage operators. The 
packages inside those nested operators were not getting updated.
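
To make the index scheme concrete, here is a minimal Java sketch of the 
flattening step. It is not the Pig implementation: the Op class, its fields, 
and assignIndices are hypothetical stand-ins (a leaf models a 
POLocalRearrange, a node with children models a POSplit), and only the 0x80 
bitmask comes from the description above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MultiQueryIndexSketch {
    // Bitmask named in the description; masked indices fall in 0x80..0x8f.
    static final int MULTI_QUERY_BITMASK = 0x80;

    // Hypothetical stand-in for a map-plan operator: a leaf models a
    // POLocalRearrange, a node with children models a POSplit.
    static class Op {
        final String name;
        final List<Op> nested;
        int index = 0; // single-input MROpers start at index 0

        Op(String name, Op... nested) {
            this.name = name;
            this.nested = new ArrayList<>(Arrays.asList(nested));
        }
    }

    // Walks the operators depth-first and gives each leaf its flattened
    // position ORed with the bitmask. Returns the next free position.
    static int assignIndices(List<Op> ops, int next) {
        for (Op op : ops) {
            if (op.nested.isEmpty()) {
                op.index = MULTI_QUERY_BITMASK | next;
                next++;
            } else {
                next = assignIndices(op.nested, next);
            }
        }
        return next;
    }

    public static void main(String[] args) {
        Op c = new Op("C"), e = new Op("E"), g = new Op("G"), i = new Op("I");
        List<Op> plan = Arrays.asList(new Op("split1", c, e), new Op("split2", g, i));
        assignIndices(plan, 0);
        for (Op op : Arrays.asList(c, e, g, i)) {
            System.out.println(op.name + " -> 0x" + Integer.toHexString(op.index));
        }
    }
}
```

In this sketch the four leaves receive 0x80 through 0x83 regardless of which 
split they sit under, which is why every matching package, at any nesting 
depth, has to be told its new index.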

The fix recursively updates the packages in any POMultiQueryPackage present 
in the top-level POMultiQueryPackage.
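
The shape of such a recursive update can be sketched in Java as follows. 
This is an illustration under stated assumptions, not the code in 
PIG-846.patch: Pkg, leaf, multi, and rekey are hypothetical names, with a 
leaf modelling a POPackage and its keyInfo map, and a node with nested 
packages modelling a POMultiQueryPackage.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RecursiveKeyInfoSketch {
    static final int MULTI_QUERY_BITMASK = 0x80;

    // Hypothetical stand-in: a leaf models a POPackage with a keyInfo map,
    // a node with nested packages models a POMultiQueryPackage.
    static class Pkg {
        final Map<Integer, String> keyInfo = new HashMap<>();
        final List<Pkg> nested = new ArrayList<>();

        static Pkg leaf(String info) {
            Pkg p = new Pkg();
            p.keyInfo.put(0, info); // the original index is 0
            return p;
        }

        static Pkg multi(Pkg... inner) {
            Pkg p = new Pkg();
            p.nested.addAll(Arrays.asList(inner));
            return p;
        }
    }

    // Re-keys every leaf package under pkg, descending into nested
    // multi-query packages. Returns the next free flattened position.
    static int rekey(Pkg pkg, int next) {
        if (!pkg.nested.isEmpty()) {
            for (Pkg inner : pkg.nested) {
                next = rekey(inner, next);
            }
            return next;
        }
        String info = pkg.keyInfo.remove(0);
        pkg.keyInfo.put(MULTI_QUERY_BITMASK | next, info);
        return next + 1;
    }
}
```

Calling rekey(top, 0) on a package tree such as multi(leaf, multi(leaf, leaf)) 
re-keys all three leaves. An update that stops at the first level would leave 
the two inner leaves keyed by 0, so a lookup by the masked index would return 
null.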

> MultiQuery optimization in some cases has an issue when there is a split in 
> the map plan 
> -----------------------------------------------------------------------------------------
>
>                 Key: PIG-846
>                 URL: https://issues.apache.org/jira/browse/PIG-846
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.2.1
>            Reporter: Pradeep Kamath
>            Assignee: Pradeep Kamath
>             Fix For: 0.3.0
>
>         Attachments: PIG-846.patch
>
>
> The following script produces the error that follows:
> {noformat}
> A = LOAD 'input.txt' as (f0, f1, f2, f3, f4, f5, f6, f7, f8); 
> B = FOREACH A GENERATE f0, f1, f2, f3, f4;
> B1 = foreach B generate f0, f1, f2;
> C = GROUP B1 BY (f1, f2);
> STORE C into 'foo1';
> B2 = FOREACH B GENERATE f0, f3, f4;
> E = GROUP B2 BY (f3, f4);
> STORE E into 'foo2';
> F = FOREACH A GENERATE f0, f5, f6, f7, f8;
> F1 = FOREACH F GENERATE f0, f5,f6;
> G = GROUP F1 BY (f5, f6);
> STORE G into 'foo3';
> F2  = FOREACH F GENERATE f0, f7, f8;
> I = GROUP F2 BY (f7, f8);
> STORE I into 'foo4';
> {noformat}
> Exception encountered during execution:
> {noformat}
> java.lang.NullPointerException
>       at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPackage.getValueTuple(POPackage.java:262)
>       at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPackage.getNext(POPackage.java:209)
>       at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMultiQueryPackage.getNext(POMultiQueryPackage.java:186)
>       at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMultiQueryPackage.getNext(POMultiQueryPackage.java:186)
>       at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.processOnePackageOutput(PigMapReduce.java:277)
>       at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:268)
>       at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:142)
>       at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:318)
>       at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
> {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
