[ https://issues.apache.org/jira/browse/PIG-710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Pradeep Kamath resolved PIG-710.
--------------------------------

    Resolution: Duplicate
 Fix Version/s: 0.3.0
      Assignee: Pradeep Kamath

This issue has the same root cause as PIG-514 - hence marking this duplicate - the fix for this issue will also be tracked in PIG-514

> Filtering bag in nested foreach does not produce expected results
> -----------------------------------------------------------------
>
>                 Key: PIG-710
>                 URL: https://issues.apache.org/jira/browse/PIG-710
>             Project: Pig
>          Issue Type: Bug
>            Reporter: David Ciemiewicz
>            Assignee: Pradeep Kamath
>             Fix For: 0.3.0
>
>
> I have an idiom I used to use in older versions of pig (prior to types
> branch) which would group into a collection and then filter the output if any
> of the collection contained a particular string.
> This relies on FILTER statements within a FOREACH ... { ... GENERATE ... }
> statement.
> ORDER ... BY in the FOREACH ... { ... GENERATE ... } statement does not seem
> to have a problem so it seems to be something isolated to the FILTER.
> {code}
> A = load 'filterbug.data' using PigStorage() as ( id, str );
> B = group A by ( id );
> describe B;
> dump B;
> D = foreach B generate
>         group,
>         COUNT(A),
>         A.str;
> describe D;
> dump D;
> C = foreach B {
>         D = order A by str;
>         matchedcount = COUNT(D);
>         generate
>                 group,
>                 matchedcount as matchedcount,
>                 D.str;
>         };
> describe C;
> dump C;
> Cfiltered = foreach B {
>         D = filter A by (
>                 str matches 'hello'
>                 );
>         matchedcount = COUNT(D);
>         generate
>                 group,
>                 matchedcount as matchedcount,
>                 A.str;
>         };
> describe Cfiltered;
> dump Cfiltered;
> {code}
> Here's the output:
> {code}
> -bash-3.00$ pig -exectype local -latest filterbug.pig
> USING: /grid/0/gs/pig/current
> B: {group: bytearray,A: {id: bytearray,str: bytearray}}
> 2009-03-10 03:14:14,838 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
> 2009-03-10 03:14:14,839 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
> (a,{(a,hello),(a,goodbye)})
> (b,{(b,goodbye)})
> (c,{(c,hello),(c,hello),(c,hello)})
> (d,{(d,what)})
> D: {group: bytearray,long,str: {str: bytearray}}
> 2009-03-10 03:14:14,920 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
> 2009-03-10 03:14:14,920 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
> (a,2L,{(hello),(goodbye)})
> (b,1L,{(goodbye)})
> (c,3L,{(hello),(hello),(hello)})
> (d,1L,{(what)})
> C: {group: bytearray,matchedcount: long,str: {str: bytearray}}
> 2009-03-10 03:14:14,985 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
> 2009-03-10 03:14:14,985 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
> (a,2L,{(goodbye),(hello)})
> (b,1L,{(goodbye)})
> (c,3L,{(hello),(hello),(hello)})
> (d,1L,{(what)})
> 2009-03-10 03:14:15,018 [main] WARN org.apache.pig.PigServer - Encountered Warning IMPLICIT_CAST_TO_CHARARRAY 1 time(s).
> Cfiltered: {group: bytearray,matchedcount: long,str: {str: bytearray}}
> 2009-03-10 03:14:15,044 [main] WARN org.apache.pig.PigServer - Encountered Warning IMPLICIT_CAST_TO_CHARARRAY 1 time(s).
> 2009-03-10 03:14:15,057 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
> 2009-03-10 03:14:15,057 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
> (a,1L,{(hello),(goodbye)})
> {code}
> What I expect for the output of Cfiltered is actually:
> (a,1L,{(hello),(goodbye)})
> (b,0L,{(goodbye)})
> (c,3L,{(hello),(hello),(hello)})
> (d,0L,{(what)})
> The data file is:
> {code}
> a	hello
> a	goodbye
> b	goodbye
> c	hello
> c	hello
> c	hello
> d	what
> {code}
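
For reference, the semantics the reporter expects from Cfiltered can be modelled outside Pig. The following is only a rough Python sketch, not Pig's implementation: the name cfiltered and the in-memory ROWS list are made up for illustration, and it assumes that "matches" must match the whole string (as Java's String.matches does). Every group is emitted, including groups whose filtered bag is empty.

```python
import re
from collections import defaultdict

# Rows of filterbug.data as (id, str) pairs (hypothetical in-memory copy).
ROWS = [
    ("a", "hello"),
    ("a", "goodbye"),
    ("b", "goodbye"),
    ("c", "hello"),
    ("c", "hello"),
    ("c", "hello"),
    ("d", "what"),
]


def cfiltered(rows, pattern="hello"):
    """Group rows by id, count the values that fully match pattern,
    and keep the unfiltered values of each group."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)

    result = []
    for key in sorted(groups):
        bag = groups[key]
        # Assumption: whole-string regex match, like Pig's 'matches'.
        matched = [s for s in bag if re.fullmatch(pattern, s)]
        # A group with no matches is still emitted, with a count of 0.
        result.append((key, len(matched), bag))
    return result


for row in cfiltered(ROWS):
    print(row)
```

In this model, groups b and d come out with a count of 0 rather than being dropped, which is the behaviour described as expected in the report.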