[ 
https://issues.apache.org/jira/browse/PIG-874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-874.
--------------------------------

    Resolution: Fixed

This is getting addressed with the optimizer re-work

> Problems in pushing down foreach with flatten
> ---------------------------------------------
>
>                 Key: PIG-874
>                 URL: https://issues.apache.org/jira/browse/PIG-874
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.4.0
>            Reporter: Santhosh Srinivasan
>
> If the graph contains more than one foreach connected to an operator, pushing 
> down foreach with flatten is not possible with the current optimizer pattern 
> matching algorithm and current implementation of rewire. The following 
> mechanism of pushing foreach with flatten does not work.
> 1. Search for foreach (with flatten) connected to an operator
> 2. If checks pass then unflatten the flattened column in the foreach
> 3. Create a new foreach that flattens the mapped column (the original column 
> number could have changed) and insert the new foreach after the old foreach's 
> successor.
> An example to illustrate the problem:
> {code}
> A = load 'myfile' as (name, age, gpa:(letter_grade, point_score));
> B = foreach A generate $0, $1, flatten($2);
> C = load 'anotherfile' as (name, age, preference:(course_name, instructor));
> D = foreach C generate $0, $1, flatten($2);
> E = join B by $0, D by $0 using "replicated";
> F = limit E 10;
> {code}
> In the code snipped (see above), the optimizer will find two matches, B->E 
> and D->E. For the first pattern match (B->E), $2 will be unflattened and a 
> new foreach will be introduced after the join.
> {code}
> A = load 'myfile' as (name, age, gpa:(letter_grade, point_score));
> B = foreach A generate $0, $1, $2;
> C = load 'anotherfile' as (name, age, preference:(course_name, instructor));
> D = foreach C generate $0, $1, flatten($2);
> E = join B by $0, D by $0 using "replicated";
> E1 = foreach E generate $0, $1, flatten($2), $3, $4, $5, $6;
> F = limit E1 10;
> {code}
> For the second match (D->E), the same transformation is applied. However, 
> this transformation will not work for the following reason. The new foreach 
> is now inserted between the E and E1. When E1 is rewired, rewire is unable to 
> map $6 in E1 as it never exists in E. In order to fix such situations, the 
> pattern matching should return a global match instead of a local match.
> Reference: PIG-873

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to