[ 
https://issues.apache.org/jira/browse/PIG-710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath resolved PIG-710.
--------------------------------

       Resolution: Duplicate
    Fix Version/s: 0.3.0
         Assignee: Pradeep Kamath

This issue has the same root cause as PIG-514, so I am marking it as a 
duplicate. The fix will be tracked in PIG-514.

> Filtering bag in nested foreach does not produce expected results
> -----------------------------------------------------------------
>
>                 Key: PIG-710
>                 URL: https://issues.apache.org/jira/browse/PIG-710
>             Project: Pig
>          Issue Type: Bug
>            Reporter: David Ciemiewicz
>            Assignee: Pradeep Kamath
>             Fix For: 0.3.0
>
>
> I have an idiom that I used in older versions of Pig (prior to the types 
> branch): group records into a bag, then filter the output on whether any 
> element of the bag matches a particular string.
> This relies on a FILTER statement within a FOREACH ... { ... GENERATE ... } 
> statement.
> ORDER ... BY in the same FOREACH ... { ... GENERATE ... } construct does not 
> seem to have a problem, so the issue appears to be isolated to FILTER.
> {code}
> A = load 'filterbug.data' using PigStorage() as ( id, str );
> B = group A by ( id );
> describe B;
> dump B;
> D = foreach B generate
>         group,
>         COUNT(A),
>         A.str;
> describe D;
> dump D;
> C = foreach B {
>         D = order A by str;
>         matchedcount = COUNT(D);
>         generate
>                 group,
>                 matchedcount as matchedcount,
>                 D.str;
>         };
> describe C;
> dump C;
> Cfiltered = foreach B {
>         D = filter A by (
>                 str matches 'hello'
>                 );
>         matchedcount = COUNT(D);
>         generate
>                 group,
>                 matchedcount as matchedcount,
>                 A.str;
>         };
> describe Cfiltered;
> dump Cfiltered;
> {code}
> Here's the output:
> {code}
> -bash-3.00$ pig -exectype local -latest filterbug.pig
> USING: /grid/0/gs/pig/current
> B: {group: bytearray,A: {id: bytearray,str: bytearray}}
> 2009-03-10 03:14:14,838 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
> 2009-03-10 03:14:14,839 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
> (a,{(a,hello),(a,goodbye)})
> (b,{(b,goodbye)})
> (c,{(c,hello),(c,hello),(c,hello)})
> (d,{(d,what)})
> D: {group: bytearray,long,str: {str: bytearray}}
> 2009-03-10 03:14:14,920 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
> 2009-03-10 03:14:14,920 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
> (a,2L,{(hello),(goodbye)})
> (b,1L,{(goodbye)})
> (c,3L,{(hello),(hello),(hello)})
> (d,1L,{(what)})
> C: {group: bytearray,matchedcount: long,str: {str: bytearray}}
> 2009-03-10 03:14:14,985 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
> 2009-03-10 03:14:14,985 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
> (a,2L,{(goodbye),(hello)})
> (b,1L,{(goodbye)})
> (c,3L,{(hello),(hello),(hello)})
> (d,1L,{(what)})
> 2009-03-10 03:14:15,018 [main] WARN  org.apache.pig.PigServer - Encountered Warning IMPLICIT_CAST_TO_CHARARRAY 1 time(s).
> Cfiltered: {group: bytearray,matchedcount: long,str: {str: bytearray}}
> 2009-03-10 03:14:15,044 [main] WARN  org.apache.pig.PigServer - Encountered Warning IMPLICIT_CAST_TO_CHARARRAY 1 time(s).
> 2009-03-10 03:14:15,057 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
> 2009-03-10 03:14:15,057 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
> (a,1L,{(hello),(goodbye)})
> {code}
> The output I expect for Cfiltered is:
> (a,1L,{(hello),(goodbye)})
> (b,0L,{(goodbye)})
> (c,3L,{(hello),(hello),(hello)})
> (d,0L,{(what)})
> The data file is:
> {code}
> a       hello
> a       goodbye
> b       goodbye
> c       hello
> c       hello
> c       hello
> d       what
> {code}
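
For cross-checking, the semantics the reporter expects from Cfiltered can be modeled outside Pig. The following is a minimal Python sketch, not Pig code and not the fix tracked in PIG-514. The names `ROWS` and `cfiltered` are hypothetical. It assumes that `matches` requires the whole string to match the pattern, and that COUNT over an empty filtered bag yields 0, as the expected output in the report implies.

```python
import re
from collections import OrderedDict

# Rows of filterbug.data as (id, str) pairs, copied from the report.
ROWS = [
    ("a", "hello"),
    ("a", "goodbye"),
    ("b", "goodbye"),
    ("c", "hello"),
    ("c", "hello"),
    ("c", "hello"),
    ("d", "what"),
]


def cfiltered(rows, pattern="hello"):
    """Model the expected Cfiltered relation: one tuple per group.

    Each result tuple is (group, matchedcount, all strings in the group).
    Assumption: the pattern must match the entire string.
    """
    # Equivalent of "B = group A by ( id )", keeping first-seen group order.
    groups = OrderedDict()
    for key, value in rows:
        groups.setdefault(key, []).append(value)

    regex = re.compile(pattern)
    result = []
    for key, bag in groups.items():
        # Equivalent of the nested "D = filter A by ( str matches ... )".
        matched = [s for s in bag if regex.fullmatch(s)]
        # A group whose filtered bag is empty still yields a row, with count 0.
        result.append((key, len(matched), bag))
    return result


if __name__ == "__main__":
    for row in cfiltered(ROWS):
        print(row)
```

Under these assumptions the model yields one row for each of the four groups, with counts 1, 0, 3 and 0, which agrees with the expected output quoted above rather than with the single row that Pig actually produced.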

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
