[ https://issues.apache.org/jira/browse/PIG-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834633#action_12834633 ]
Hadoop QA commented on PIG-1169:
--------------------------------

+1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12436055/PIG-1169.patch
  against trunk revision 909921.

    +1 @author. The patch does not contain any @author tags.

    +1 tests included. The patch appears to include 3 new or modified tests.

    +1 javadoc. The javadoc tool did not generate any warning messages.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs. The patch does not introduce any new Findbugs warnings.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    +1 core tests. The patch passed core unit tests.

    +1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/206/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/206/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/206/console

This message is automatically generated.

> Top-N queries produce incorrect results when a store statement is added
> between order by and limit statement
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-1169
>                 URL: https://issues.apache.org/jira/browse/PIG-1169
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.7.0
>            Reporter: Richard Ding
>            Assignee: Richard Ding
>             Fix For: 0.7.0
>
>         Attachments: PIG-1169.patch
>
>
> ??We tried to get top N results after a groupby and sort, and got different
> results with or without storing the full sorted results. Here is a skeleton
> of our pig script.??
> {code}
> raw_data = Load '<input_files>' AS (f1, f2, ..., fn);
> grouped = group raw_data by (f1, f2);
> data = foreach grouped generate FLATTEN(group), SUM(raw_data.fk) as value;
> ordered = order data by value DESC parallel 10;
> topn = limit ordered 10;
> store ordered into 'outputdir/full';
> store topn into 'outputdir/topn';
> {code}
> ??With the statement 'store ordered ...', top N results are incorrect, but
> without the statement, results are correct. Has anyone seen this before? I
> know a similar bug has been fixed in the multi-query release. We are on pig
> .4 and hadoop .20.1.??
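
For reference, the result the script above is expected to produce can be sketched outside Pig. The following is a minimal Python sketch, not Pig's implementation and not part of the patch: the function name `top_n_after_group_sum`, the three-field row layout `(f1, f2, fk)`, and the tie-breaking rule are all assumptions made for illustration. It shows the invariant the reporter relies on: the top-N output should be the first N rows of the full ordered output, whether or not the full output is also stored.

```python
from collections import defaultdict


def top_n_after_group_sum(rows, n):
    """Return (ordered, topn) for rows of hypothetical shape (f1, f2, fk).

    Mirrors the script's steps: group by (f1, f2), SUM(fk) as value,
    order by value DESC, then limit n.
    """
    totals = defaultdict(int)
    for f1, f2, fk in rows:
        totals[(f1, f2)] += fk  # SUM(raw_data.fk) per (f1, f2) group

    # Order by value descending. Ties are broken by the group key here
    # only to make this sketch deterministic; that rule is an assumption.
    ordered = sorted(
        ((f1, f2, value) for (f1, f2), value in totals.items()),
        key=lambda row: (-row[2], row[0], row[1]),
    )

    # limit n: the top-N rows are a prefix of the full ordering.
    topn = ordered[:n]
    return ordered, topn


if __name__ == "__main__":
    sample = [("a", "x", 5), ("a", "x", 7), ("b", "y", 20), ("c", "z", 1)]
    full, top = top_n_after_group_sum(sample, 2)
    print(full)
    print(top)
```

Comparing the contents of 'outputdir/topn' against the first 10 rows of 'outputdir/full' is one way to check a run for the behavior described in this issue.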