Daniel Dai commented on PIG-1022:

Actually, we cannot push the filter even before f2. Because we do not keep 
track of the source of data inside a tuple, gid should be treated as a 
generated field of f2. However, the projection map of f2 gives us the wrong 
result: it reports gid as a directly mapped field of group (which is a tuple 
(name, gid)), and this triggers all the subsequent problems. The fix is to 
modify the projection map generation logic for mapped fields. 
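
To illustrate the reasoning above, here is a minimal sketch of how a 
projection map could gate the filter push. It is a simplified model and not 
Pig's actual optimizer code: the ProjectionMap class, its mapped/generated 
fields, and can_push_filter_above are all hypothetical names invented for 
this example.

```python
from dataclasses import dataclass, field


@dataclass
class ProjectionMap:
    """Hypothetical, simplified stand-in for an operator's projection map."""
    # Output column -> input column, for columns passed through unchanged.
    mapped: dict = field(default_factory=dict)
    # Output columns that the operator computes or extracts itself.
    generated: set = field(default_factory=set)


def can_push_filter_above(filter_columns, projection_map):
    """Assumed rule: a filter may move above an operator only if every
    column it reads is a direct mapping of one of the operator's inputs."""
    return all(col in projection_map.mapped for col in filter_columns)


# f2 = foreach g generate group.name as name, group.gid as gid;

# Reported behaviour: gid is recorded as a direct mapping of 'group'.
buggy_f2 = ProjectionMap(mapped={"name": "group", "gid": "group"})

# Proposed behaviour: fields taken from inside a tuple count as generated.
fixed_f2 = ProjectionMap(generated={"name", "gid"})

print(can_push_filter_above(["gid"], buggy_f2))  # True: filter gets pushed
print(can_push_filter_above(["gid"], fixed_f2))  # False: filter stays put
```

With the second map the filter on gid stays below f2, so the later pushes 
past the group and the first foreach are never attempted.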

Santhosh, do you have any comment?

> optimizer pushes filter before the foreach that generates column used by 
> filter
> -------------------------------------------------------------------------------
>                 Key: PIG-1022
>                 URL: https://issues.apache.org/jira/browse/PIG-1022
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>            Reporter: Thejas M Nair
>            Assignee: Daniel Dai
> grunt> l = load 'students.txt' using PigStorage() as (name:chararray, 
> gender:chararray, age:chararray, score:chararray);
> grunt> f = foreach l generate name, gender, age,score, '200'  as 
> gid:chararray;
> grunt> g = group f by (name, gid);
> grunt> f2 = foreach g generate group.name as name: chararray, group.gid as 
> gid: chararray;
> grunt> filt = filter f2 by gid == '200';
> grunt> explain filt;
> In the generated plan, filt is pushed up after the load and before the first 
> foreach, even though the filter is on gid, which is generated in the first 
> foreach.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
