[ https://issues.apache.org/jira/browse/PIG-514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689772#action_12689772 ]
Viraj Bhat commented on PIG-514:
--------------------------------

Another test case: consider the following input file:

1 1 3
1 2 3
2 1 3
2 1 3

The Pig program is:

{code}
test = load 'test.txt' as (col1: int, col2: int, col3: int);
test2 = group test by col1;
test3 = foreach test2 {
    filter_one = filter test by (col2==1);
    filter_notone = filter test by (col2!=1);
    generate group as col1, COUNT(filter_one) as cnt_one, COUNT(filter_notone) as cnt_notone;
};
{code}

The output consists of a single line:

(1,1L,1L)

But I would expect:

(1,1L,1L)
(2,2L,0L)

> COUNT returns no results as a result of two filter statements in FOREACH
> ------------------------------------------------------------------------
>
>                 Key: PIG-514
>                 URL: https://issues.apache.org/jira/browse/PIG-514
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 1.0.0
>            Reporter: Viraj Bhat
>         Attachments: mystudentfile.txt
>
>
> The following sample code, which uses a FOREACH to count the filtered
> student records based on record_type == 1 and scores, and also on
> record_type == 0, does not seem to return any results.
> {code}
> mydata = LOAD 'mystudentfile.txt' AS (record_type,name,age,scores,gpa);
> --keep only what we need
> mydata_filtered = FOREACH mydata GENERATE record_type, name, age, scores ;
> --group
> mydata_grouped = GROUP mydata_filtered BY (record_type,age);
> myfinaldata = FOREACH mydata_grouped {
>     myfilter1 = FILTER mydata_filtered BY record_type == 1 AND age == scores;
>     myfilter2 = FILTER mydata_filtered BY record_type == 0;
>     GENERATE FLATTEN(group),
>     -- Only this count causes the problem ??
>     COUNT(myfilter1) as col2,
>     SUM(myfilter2.scores) as col3,
>     COUNT(myfilter2) as col4;
> };
> --this set of statements confirms that the count on the filters returns 1
> --mycountdata = FOREACH mydata_grouped
> --{
> --    myfilter1 = FILTER mydata_filtered BY record_type == 1 AND age == scores;
> --    GENERATE
> --    COUNT(myfilter1) as colcount;
> --};
> --dump mycountdata;
> dump myfinaldata;
> {code}
> But if you comment out {code} COUNT(myfilter1) as col2, {code} it seems
> to work, with the following results:
> (0,22,45.0,2L)
> (0,24,133.0,6L)
> (0,25,22.0,1L)
> I have also tried to verify whether this is an issue with {code}
> COUNT(myfilter1) as col2, {code} returning zero. It does not seem to be the
> case.
> If {code} dump mycountdata; {code} is uncommented, it returns:
> (1L)
> (1L)
> I am attaching the tab-separated 'mystudentfile.txt' file used in this Pig
> script. Is this an issue with 2 filters in the FOREACH followed by a COUNT on
> these filters?
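
For reference, the expected result of the four-row test case in the comment can be checked by hand outside of Pig. The sketch below is plain Python, not Pig, and makes no claim about how Pig implements nested FOREACH; the function name count_by_group is hypothetical, and the rows are simply the test.txt values from the comment typed in as tuples. It groups by col1, then counts the rows with col2 == 1 and col2 != 1 in each group.

```python
from collections import defaultdict

# The four rows of test.txt from the comment, as (col1, col2, col3).
rows = [(1, 1, 3), (1, 2, 3), (2, 1, 3), (2, 1, 3)]


def count_by_group(rows):
    """Group rows by col1 and count col2 == 1 versus col2 != 1 per group.

    Hypothetical helper: it mirrors what the reporter expects from the
    script (group, two filters, two counts), not Pig's implementation.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)

    result = []
    for key in sorted(groups):
        bag = groups[key]
        cnt_one = sum(1 for r in bag if r[1] == 1)      # like filter_one
        cnt_notone = sum(1 for r in bag if r[1] != 1)   # like filter_notone
        result.append((key, cnt_one, cnt_notone))
    return result


print(count_by_group(rows))  # prints [(1, 1, 1), (2, 2, 0)]
```

Both groups appear in the result, including the group for col1 == 2 whose second filter matches no rows. That is the (2,2L,0L) line the comment reports as missing from the Pig output.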