Ashutosh Chauhan commented on PIG-858:

While POFRJoin is being compiled in MRCompiler, it needs to identify, for each
of its predecessors in the physical plan, which compiled MROper that
predecessor is part of. Currently, that MROper is assumed to be one of the
compiledInputs (an array of the MROpers that are immediate predecessors of the
current MROper in the MROper DAG). This is usually true, but it may not be when
one physical operator results in two or more MROpers, as is the case here. When
there is an order-by before FRJoin, one of the inputs of POFRJoin will be
POSort, but the POSort operator will be in the first of the two MROpers
generated and thus will not be found in compiledInputs (which contains the
second MROper). So the current way of identifying the MROper that corresponds
to a physical operator is not reliable.

This bug also affects the implementation of merge-sort join
(https://issues.apache.org/jira/browse/PIG-845), since POMergeJoin needs to
know which MROper corresponds to its left input and which one corresponds to
its right. It can do so by looking into compiledInputs as long as there is no
order-by (or similar PO that results in multiple MROpers) among its
predecessors. Doing an order-by before a merge join is, however, a natural use
case there.
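A simplified, hypothetical model of the failure may help. PhysOp, MROp and CompiledInputsLookup below are illustrative stand-ins, not Pig's PhysicalOperator, MapReduceOper or MRCompiler code. The lookup searches only compiledInputs, so a POSort that lives in the first order-by MROper is not found, the search returns -1, and using that index unchecked throws an ArrayIndexOutOfBoundsException. That is consistent with the "-1" in the stack trace of the issue, though the real visitFRJoin logic may differ in detail.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for a physical operator such as POSort or POLoad. */
final class PhysOp {
    final String name;
    PhysOp(String name) { this.name = name; }
}

/** Hypothetical stand-in for an MROper: it only records which physical operators its plan holds. */
final class MROp {
    final String name;
    final Set<PhysOp> plan = new HashSet<>();
    MROp(String name, PhysOp... ops) {
        this.name = name;
        for (PhysOp op : ops) plan.add(op);
    }
}

final class CompiledInputsLookup {
    /** Mirrors the current assumption: the predecessor must be inside one of compiledInputs. */
    static int indexOf(MROp[] compiledInputs, PhysOp pred) {
        for (int i = 0; i < compiledInputs.length; i++) {
            if (compiledInputs[i].plan.contains(pred)) return i;
        }
        return -1; // not found
    }

    public static void main(String[] args) {
        PhysOp load = new PhysOp("POLoad");
        PhysOp sort = new PhysOp("POSort");
        // Order-by yields two MROpers; POSort sits in the first one...
        MROp first = new MROp("orderBy-1", sort);
        // ...while compiledInputs only holds the second one.
        MROp second = new MROp("orderBy-2");
        MROp[] compiledInputs = { new MROp("load", load), second };

        System.out.println("POSort in first order-by MROper: " + first.plan.contains(sort)); // true
        int idx = indexOf(compiledInputs, sort);
        System.out.println("index of POSort in compiledInputs: " + idx); // -1
        try {
            MROp chosen = compiledInputs[idx]; // unchecked use of the -1 result
            System.out.println("resolved to " + chosen.name);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("caught " + e.getClass().getSimpleName());
        }
    }
}
```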

The proposal is to introduce a new private member variable in MRCompiler
(similar to logToPhyMap) through which the leaf MROper for a given physical
operator can be identified. Thoughts?
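One way the proposed member could look, as a minimal sketch: the field name phyToLeafMROper and the record/leafFor helpers are hypothetical, and PhysicalOp/MROperator are simplified stand-ins rather than Pig's real classes. After an operator is compiled, the compiler would record the last (leaf) MROper generated for it, so the FRJoin visitor, or the compilation of POMergeJoin, could look up each input directly instead of searching compiledInputs.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a physical operator; not Pig's PhysicalOperator. */
final class PhysicalOp {
    final String name;
    PhysicalOp(String name) { this.name = name; }
}

/** Hypothetical stand-in for a compiled MR job; not Pig's MapReduceOper. */
final class MROperator {
    final String name;
    MROperator(String name) { this.name = name; }
}

/** Sketch of the proposed bookkeeping: physical operator -> leaf MROper that produces its output. */
final class LeafMROperTracker {
    // Hypothetical field, analogous in spirit to logToPhyMap.
    private final Map<PhysicalOp, MROperator> phyToLeafMROper = new HashMap<>();

    /** Called once an operator is compiled; 'leaf' is the last MROper generated for it. */
    void record(PhysicalOp op, MROperator leaf) {
        phyToLeafMROper.put(op, leaf);
    }

    /** Used by a join visitor to find which MROper feeds a given input. */
    MROperator leafFor(PhysicalOp op) {
        MROperator m = phyToLeafMROper.get(op);
        if (m == null) {
            throw new IllegalStateException("operator not compiled yet: " + op.name);
        }
        return m;
    }

    public static void main(String[] args) {
        LeafMROperTracker tracker = new LeafMROperTracker();
        PhysicalOp load = new PhysicalOp("POLoad");
        PhysicalOp sort = new PhysicalOp("POSort");

        tracker.record(load, new MROperator("load"));
        // Order-by produces two MROpers; only the second (leaf) one is recorded for POSort.
        tracker.record(sort, new MROperator("orderBy-2"));

        // A replicated or merge join can now resolve each of its inputs directly.
        System.out.println("POLoad -> " + tracker.leafFor(load).name); // POLoad -> load
        System.out.println("POSort -> " + tracker.leafFor(sort).name); // POSort -> orderBy-2
    }
}
```

The map relies on default object identity for its keys, and a missing entry fails loudly with an IllegalStateException instead of surfacing later as a -1 index.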

> Order By followed by "replicated" join fails while compiling MR-plan from physical plan
> ---------------------------------------------------------------------------------------
>                 Key: PIG-858
>                 URL: https://issues.apache.org/jira/browse/PIG-858
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.3.0
>            Reporter: Ashutosh Chauhan
>             Fix For: 0.4.0
> Consider the query:
> {code}
> A = load 'a';
> B = order A by $0;
> C = join A by $0, B by $0;
> explain C;
> {code}
> works. But if a replicated join is used instead:
> {code}
> A = load 'a';
> B = order A by $0;
> C = join A by $0, B by $0 using "replicated";
> explain C;
> {code}
> this fails with ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2034: Error compiling operator POFRJoin
> Relevant stack trace:
> {code}
> Caused by: java.lang.RuntimeException: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompilerException: ERROR 2034: Error compiling operator POFRJoin
>         at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.explain(HExecutionEngine.java:306)
>         at org.apache.pig.PigServer.explain(PigServer.java:574)
>         ... 8 more
> Caused by: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompilerException: ERROR 2034: Error compiling operator POFRJoin
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.visitFRJoin(MRCompiler.java:942)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFRJoin.visit(POFRJoin.java:173)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:342)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:327)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:233)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.compile(MapReduceLauncher.java:301)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.explain(MapReduceLauncher.java:278)
>         at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.explain(HExecutionEngine.java:303)
>         ... 9 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.visitFRJoin(MRCompiler.java:901)
>         ... 16 more
> {code}
