Problems in pushing down foreach with flatten
---------------------------------------------

                 Key: PIG-874
                 URL: https://issues.apache.org/jira/browse/PIG-874
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.3.1
            Reporter: Santhosh Srinivasan
             Fix For: 0.4.0


If the graph contains more than one foreach connected to an operator, pushing 
down foreach with flatten is not possible with the current optimizer pattern 
matching algorithm and the current implementation of rewire. The following 
mechanism for pushing down a foreach with flatten does not work:

1. Search for a foreach (with flatten) connected to an operator
2. If the checks pass, unflatten the flattened column in the foreach
3. Create a new foreach that flattens the mapped column (the original column 
number could have changed) and insert the new foreach after the old foreach's 
successor.
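
The column mapping in step 3 can be sketched in a few lines of Python. This is 
a toy model, not Pig's optimizer code: the function names (map_column, 
push_down) are hypothetical, the successor is assumed to concatenate the 
columns of its inputs in order (as a join does), and the checks from step 2 
are assumed to have passed.

```python
# Toy model only; names are hypothetical and not part of Pig.

def map_column(input_widths, input_index, column):
    """Position of `column` (from input number `input_index`) in the output
    of a successor that concatenates its inputs' columns in order."""
    return sum(input_widths[:input_index]) + column

def push_down(flattened, input_widths, input_index):
    """Steps 2 and 3 for one match: the old foreach stops flattening, and a
    new foreach after the successor flattens the mapped columns instead."""
    old_foreach_flatten = set()  # step 2: unflatten in the old foreach
    new_foreach_flatten = {
        map_column(input_widths, input_index, c) for c in flattened
    }  # step 3: same columns, renumbered for the successor's output
    return old_foreach_flatten, new_foreach_flatten

# First input has 3 columns and is input 0: column 2 stays column 2.
first = push_down({2}, [3, 4], 0)
# Second input sits after a 3-column input: column 2 becomes column 5.
second = push_down({2}, [3, 3], 1)
```

The second call shows why the original column number can change: a column of 
the second input is shifted by the width of the first input.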

An example to illustrate the problem:

{code}
A = load 'myfile' as (name, age, gpa:(letter_grade, point_score));
B = foreach A generate $0, $1, flatten($2);
C = load 'anotherfile' as (name, age, preference:(course_name, instructor));
D = foreach C generate $0, $1, flatten($2);
E = join B by $0, D by $0 using "replicated";
F = limit E 10;
{code}

In the code snippet above, the optimizer will find two matches, B->E and 
D->E. For the first pattern match (B->E), $2 will be unflattened in B and a 
new foreach (E1) will be introduced after the join.

{code}
A = load 'myfile' as (name, age, gpa:(letter_grade, point_score));
B = foreach A generate $0, $1, $2;
C = load 'anotherfile' as (name, age, preference:(course_name, instructor));
D = foreach C generate $0, $1, flatten($2);
E = join B by $0, D by $0 using "replicated";
E1 = foreach E generate $0, $1, flatten($2), $3, $4, $5, $6;
F = limit E1 10;
{code}

For the second match (D->E), the same transformation is applied. However, this 
transformation fails for the following reason. The new foreach is inserted 
between E and E1. When E1 is rewired, rewire is unable to map $6 in E1: once D 
no longer flattens $2, E has only six columns ($0 to $5), so $6 no longer 
exists in E. To fix such situations, the pattern matching should return a 
global match instead of a local match.
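
The column arithmetic behind the failure can be checked with a small Python 
model. This is only an illustration under stated assumptions, not Pig's rewire 
implementation: a schema is modelled as a list of field widths (1 for a scalar, 
n for a tuple with n fields), and the helper names (foreach, join, unresolved) 
are hypothetical.

```python
# Toy model only; it mirrors the example scripts, not Pig's internals.

def foreach(schema, flatten=()):
    """Output schema of a foreach that projects every column and flattens
    the column positions listed in `flatten`."""
    out = []
    for i, width in enumerate(schema):
        if i in flatten:
            out.extend([1] * width)  # a flattened tuple becomes `width` columns
        else:
            out.append(width)
    return out

def join(left, right):
    """A join's output is the left columns followed by the right columns."""
    return left + right

def unresolved(referenced, schema):
    """Referenced column positions that do not exist in `schema`."""
    return [c for c in referenced if c >= len(schema)]

A = [1, 1, 2]  # name, age, gpa:(letter_grade, point_score)
C = [1, 1, 2]  # name, age, preference:(course_name, instructor)
e1_refs = [0, 1, 2, 3, 4, 5, 6]  # columns E1 reads: $0 .. $6

# After the first match (B->E): B no longer flattens, D still does.
B = foreach(A)
D = foreach(C, flatten={2})
E = join(B, D)  # 7 columns, so $6 exists
after_first = unresolved(e1_refs, E)

# After the second match (D->E): D no longer flattens either.
D = foreach(C)
E = join(B, D)  # 6 columns, $0 .. $5
after_second = unresolved(e1_refs, E)

# The foreach inserted between E and E1 flattens the mapped column $5.
inserted = foreach(E, flatten={5})
```

In this model E1's references all resolve against E after the first match, but 
$6 is left unresolved against E after the second one. The inserted foreach 
again has seven columns, which suggests the reference would only resolve if 
both matches were taken into account together.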

Reference: PIG-873

