[
https://issues.apache.org/jira/browse/PIG-514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699801#action_12699801
]
Pradeep Kamath commented on PIG-514:
------------------------------------
I am currently implementing the above proposal, since I have not seen any
objections. After making the core changes, I validated that they fix the issue
reported here as well as PIG-739 and PIG-710. A few more changes are needed to
complete the patch; I will supply it once they are done.
> COUNT returns no results as a result of two filter statements in FOREACH
> ------------------------------------------------------------------------
>
> Key: PIG-514
> URL: https://issues.apache.org/jira/browse/PIG-514
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.2.0
> Reporter: Viraj Bhat
> Assignee: Pradeep Kamath
> Attachments: mystudentfile.txt
>
>
> The following sample code uses a nested FOREACH to count the filtered student
> records, first where record_type == 1 and age == scores, and then where
> record_type == 0. It does not seem to return any results.
> {code}
> mydata = LOAD 'mystudentfile.txt' AS (record_type,name,age,scores,gpa);
> --keep only what we need
> mydata_filtered = FOREACH mydata GENERATE record_type, name, age,
> scores ;
> --group
> mydata_grouped = GROUP mydata_filtered BY (record_type,age);
> myfinaldata = FOREACH mydata_grouped {
>     myfilter1 = FILTER mydata_filtered BY record_type == 1 AND age == scores;
>     myfilter2 = FILTER mydata_filtered BY record_type == 0;
>     GENERATE FLATTEN(group),
>         -- Only this count causes the problem ??
>         COUNT(myfilter1) as col2,
>         SUM(myfilter2.scores) as col3,
>         COUNT(myfilter2) as col4;
> };
> --this set of statements confirms that the count on the filter returns 1
> --mycountdata = FOREACH mydata_grouped
> --{
> -- myfilter1 = FILTER mydata_filtered BY record_type == 1 AND age == scores;
> -- GENERATE
> -- COUNT(myfilter1) as colcount;
> --};
> --dump mycountdata;
> dump myfinaldata;
> {code}
> But if you comment out the {code} COUNT(myfilter1) as col2, {code} line, it
> seems to work, with the following results:
> (0,22,45.0,2L)
> (0,24,133.0,6L)
> (0,25,22.0,1L)
> I have also tried to verify whether the issue is the {code}
> COUNT(myfilter1) as col2, {code} expression returning zero. That does not
> seem to be the case.
> If {code} dump mycountdata; {code} is uncommented it returns:
> (1L)
> (1L)
> I am attaching the tab-separated 'mystudentfile.txt' file used in this Pig
> script. Is this an issue with two filters in the FOREACH followed by a COUNT
> on these filters?
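
For reference, the behaviour the script seems to expect can be modelled outside
Pig. The sketch below is plain Python, not Pig, and rests on assumptions that
are not confirmed in this thread: the rows are made up (they are not the
contents of mystudentfile.txt), COUNT over an empty filtered bag is taken to be
0, and SUM over an empty bag is modelled as None. The helper name
expected_myfinaldata is hypothetical.

```python
from collections import defaultdict

# Hypothetical rows (record_type, name, age, scores). These are NOT the
# contents of the attached mystudentfile.txt.
rows = [
    (0, "alice", 22, 20.0),
    (0, "bob", 22, 25.0),
    (1, "carol", 30, 30.0),
    (1, "dave", 30, 12.0),
]


def expected_myfinaldata(rows):
    """Model the nested FOREACH: group, filter twice, then aggregate."""
    # GROUP mydata_filtered BY (record_type, age)
    groups = defaultdict(list)
    for row in rows:
        record_type, _name, age, _scores = row
        groups[(record_type, age)].append(row)

    result = []
    for (record_type, age), bag in sorted(groups.items()):
        # FILTER ... BY record_type == 1 AND age == scores
        myfilter1 = [r for r in bag if r[0] == 1 and r[2] == r[3]]
        # FILTER ... BY record_type == 0
        myfilter2 = [r for r in bag if r[0] == 0]

        col2 = len(myfilter1)  # assumption: COUNT of an empty bag is 0
        # assumption: SUM of an empty bag is modelled as None
        col3 = sum(r[3] for r in myfilter2) if myfilter2 else None
        col4 = len(myfilter2)
        result.append((record_type, age, col2, col3, col4))
    return result


if __name__ == "__main__":
    for output_row in expected_myfinaldata(rows):
        print(output_row)
```

Under these assumptions every group yields one output row, even when one of the
two filters matches nothing, which is what the script above seems to expect
from myfinaldata.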