[
https://issues.apache.org/jira/browse/PIG-1038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776423#action_12776423
]
Hadoop QA commented on PIG-1038:
--------------------------------
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12424580/PIG-1038-3.patch
against trunk revision 834285.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 7 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
-1 javac. The applied patch generated 209 javac compiler warnings (more
than the trunk's current 199 warnings).
-1 findbugs. The patch appears to introduce 1 new Findbugs warnings.
-1 release audit. The applied patch generated 320 release audit warnings
(more than the trunk's current 317 warnings).
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results:
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/148/testReport/
Release audit warnings:
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/148/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings:
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/148/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output:
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/148/console
This message is automatically generated.
> Optimize nested distinct/sort to use secondary key
> --------------------------------------------------
>
> Key: PIG-1038
> URL: https://issues.apache.org/jira/browse/PIG-1038
> Project: Pig
> Issue Type: Improvement
> Components: impl
> Affects Versions: 0.4.0
> Reporter: Olga Natkovich
> Assignee: Daniel Dai
> Fix For: 0.6.0
>
> Attachments: PIG-1038-1.patch, PIG-1038-2.patch, PIG-1038-3.patch
>
>
> If a nested foreach plan contains a sort or distinct, Hadoop's secondary
> sort can be used instead of SortedDataBag and DistinctDataBag to optimize
> the query.
> Eg1:
> A = load 'mydata';
> B = group A by $0;
> C = foreach B {
>     D = order A by $1;
>     generate group, D;
> }
> store C into 'myresult';
> We can specify a secondary sort on A.$1, and drop "order A by $1".
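A rough sketch of the idea behind Eg1, written as a plain-Python model rather than Pig or Hadoop code. It assumes the shuffle sorts records on a composite key (`$0`, then `$1`) while grouping only on `$0`, so each group's values arrive already ordered and no per-group sort is needed. The function name `group_with_secondary_sort` and the sample records are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def group_with_secondary_sort(records):
    """Yield (group, rows) where rows are already ordered by field 1."""
    # One sort on the composite key (primary = field 0, secondary = field 1)
    # stands in for the sort that the shuffle phase performs anyway.
    shuffled = sorted(records, key=itemgetter(0, 1))
    # Group on the primary key only; rows inside each group keep the
    # secondary order, so no "order A by $1" step is required afterwards.
    for key, rows in groupby(shuffled, key=itemgetter(0)):
        yield key, list(rows)


records = [("a", 3), ("b", 1), ("a", 1), ("b", 2), ("a", 2)]
result = list(group_with_secondary_sort(records))
```

Here `result` plays the role of `C` in Eg1: one entry per group, with the bag contents sorted on the second field as a side effect of the single composite-key sort.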
> Eg2:
> A = load 'mydata';
> B = group A by $0;
> C = foreach B {
>     D = A.$1;
>     E = distinct D;
>     generate group, E;
> }
> store C into 'myresult';
> We can specify a secondary sort key on A.$1, and simplify "D = A.$1;
> E = distinct D" to a special version of distinct that skips the sorting,
> because its input already arrives ordered on that key.
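To illustrate Eg2, here is a minimal Python model (not Pig or Hadoop code) of a distinct that does no sorting of its own. It assumes the values of each group arrive ordered on the secondary key, so duplicates are adjacent and can be dropped by comparing each value with the previous one. The names `distinct_sorted` and `group_distinct` are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def distinct_sorted(values):
    """Drop duplicates from an already sorted stream in a single pass."""
    previous = object()  # sentinel that never equals a real value
    for value in values:
        # In sorted input, equal values are adjacent, so comparing with
        # the previous element is enough; no bag-level sort is needed.
        if value != previous:
            yield value
            previous = value


def group_distinct(records):
    """Yield (group, distinct field-1 values) using one composite-key sort."""
    # The sorted() call models the shuffle's secondary sort on (field 0, field 1).
    shuffled = sorted(records, key=itemgetter(0, 1))
    for key, rows in groupby(shuffled, key=itemgetter(0)):
        yield key, list(distinct_sorted(row[1] for row in rows))


records = [("a", 2), ("a", 1), ("a", 2), ("b", 5), ("b", 5)]
result = list(group_distinct(records))
```

The streaming pass keeps only one previous value in memory per group, which is the point of replacing a sorting distinct with this simpler variant.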