[jira] Commented: (PIG-1169) Top-N queries produce incorrect results when a store statement is added between order by and limit statement

Hadoop QA (JIRA) Tue, 16 Feb 2010 20:18:49 -0800

    [ 
https://issues.apache.org/jira/browse/PIG-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834633#action_12834633
 ]


Hadoop QA commented on PIG-1169:
--------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12436055/PIG-1169.patch
  against trunk revision 909921.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/206/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/206/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/206/console

This message is automatically generated.

> Top-N queries produce incorrect results when a store statement is added 
> between order by and limit statement
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-1169
>                 URL: https://issues.apache.org/jira/browse/PIG-1169
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.7.0
>            Reporter: Richard Ding
>            Assignee: Richard Ding
>             Fix For: 0.7.0
>
>         Attachments: PIG-1169.patch
>
>
> ??We tried to get top N results after a groupby and sort, and got different 
> results with or without storing the full sorted results. Here is a skeleton 
> of our pig script.??
> {code}
> raw_data = Load '<input_files>' AS (f1, f2, ..., fn);
> grouped = group raw_data by (f1, f2);
> data = foreach grouped generate FLATTEN(group), SUM(raw_data.fk) as value;
> ordered = order data by value DESC parallel 10;
> topn = limit ordered 10;
> store ordered into 'outputdir/full';
> store topn into 'outputdir/topn';
> {code}
> ??With the statement 'store ordered ...', top N results are incorrect, but 
> without the statement, results are correct. Has anyone seen this before? I 
> know a similar bug has been fixed in the multi-query release. We are on pig 
> .4 and hadoop .20.1.??
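
For reference, the result the reporter expects can be modelled outside Pig. The following is only a sketch in plain Python, not Pig code and not part of the patch: the function name `top_n_after_order` and the three-field row layout `(f1, f2, fk)` are assumptions made for illustration. It shows the property at stake in this issue: `topn` should be the first N rows of `ordered`, whether or not `ordered` is also stored.

```python
from collections import defaultdict

def top_n_after_order(rows, n=10):
    """Hypothetical model of the script: group by (f1, f2), sum fk,
    order by the sum descending, then keep the first n rows."""
    sums = defaultdict(int)
    for f1, f2, fk in rows:
        sums[(f1, f2)] += fk          # group raw_data by (f1, f2); SUM(raw_data.fk)

    # Corresponds to: ordered = order data by value DESC
    ordered = sorted(
        ((f1, f2, value) for (f1, f2), value in sums.items()),
        key=lambda row: row[2],
        reverse=True,
    )

    # Corresponds to: topn = limit ordered 10
    topn = ordered[:n]
    return ordered, topn


rows = [("a", "x", 1), ("a", "x", 2), ("b", "y", 5), ("c", "z", 4)]
ordered, topn = top_n_after_order(rows, n=2)
# Storing 'ordered' must not change 'topn': it is always a prefix of 'ordered'.
assert topn == ordered[:2]
```

When sums tie, the relative order of the tied rows is not pinned down by the sort key, so a comparison against real Pig output should compare the values rather than the exact rows.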

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
