[ 
https://issues.apache.org/jira/browse/PIG-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12701003#action_12701003
 ] 

Viraj Bhat commented on PIG-772:
--------------------------------

The script below works around the problem, but the question is whether it 
performs better than the nested Pig script in the original description. 
Allowing the FILTER statement to have the semantics given in the original 
description of this Jira could bring significant performance advantages, 
because it would avoid multiple redundant passes through the data.

{code}
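A = LOAD 'half.txt' AS (key:CHARARRAY, val:INT);
B = GROUP A BY key;
-- Compute each group's average and attach it to every flattened value.
C = foreach B { N = AVG(A.val); generate group, flatten(A.val), N as N;};
-- Keep only the values at or above their group's average.
D = filter C by val >= N;
E = foreach D generate group, val;
-- Regroup to rebuild one bag per key.
F = group E by group;
G = foreach F generate group, E;
dump G;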
{code}

Input: half.txt
===================
A       1
A       2
A       3
B       1
B       3
====================
Result:
====================
(A,{(A,2),(A,3)})
(B,{(B,3)})
====================
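
For reference, the per-group semantics that both scripts aim for can be 
sketched outside Pig. The following is a minimal Python illustration, not Pig 
code and not a description of how Pig executes either script; the function 
name is hypothetical and the rows are the contents of half.txt above. It 
groups once, then filters each group against its own average.

```python
from collections import defaultdict

# Rows of half.txt as (key, val) pairs.
rows = [("A", 1), ("A", 2), ("A", 3), ("B", 1), ("B", 3)]


def filter_at_or_above_group_average(rows):
    """Return, per key, the (key, val) rows whose val >= that key's average."""
    # Group the values by key (the role of GROUP A BY key).
    groups = defaultdict(list)
    for key, val in rows:
        groups[key].append(val)

    # Compute each group's average and filter within that same group.
    result = {}
    for key, vals in groups.items():
        avg = sum(vals) / len(vals)
        result[key] = [(key, v) for v in vals if v >= avg]
    return result


result = filter_at_or_above_group_average(rows)
for key in sorted(result):
    print(key, result[key])
```

Both groups have an average of 2, so the sketch keeps (A,2), (A,3) and (B,3), 
which matches the result shown above.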

> Semantics of Filter statement inside ForEach should support filtering on 
> aliases used in the Group statement preceding it
> --------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-772
>                 URL: https://issues.apache.org/jira/browse/PIG-772
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.3.0
>            Reporter: Viraj Bhat
>            Priority: Minor
>             Fix For: 0.3.0
>
>
> I have a Pig script which tries to display, for each group, the values that 
> are greater than or equal to the average value in the group.
> {code}
> A = LOAD 'half.txt' AS (key:CHARARRAY, val:INT);
> B = GROUP A BY key;
> C = FOREACH B {
>        N = AVG(A.val);
>        HALF = FILTER A by val >= N;
>     GENERATE
>        FLATTEN(GROUP),
>        HALF;
> };
> dump C;
> {code}
> Presently the semantics of the FILTER statement inside the FOREACH do not 
> support this type of operation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
