[jira] Commented: (PIG-979) Acummulator Interface for UDFs
[ https://issues.apache.org/jira/browse/PIG-979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774859#action_12774859 ]

Daniel Dai commented on PIG-979:

This patch depends on PIG-1038 and cannot be applied on its own. That is why the tests failed.

> Acummulator Interface for UDFs
> ------------------------------
>
>                 Key: PIG-979
>                 URL: https://issues.apache.org/jira/browse/PIG-979
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Alan Gates
>            Assignee: Ying He
>         Attachments: PIG-979.patch
>
> Add an accumulator interface for UDFs that would allow them to take a set
> number of records at a time instead of the entire bag.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
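
The idea in the PIG-979 description (feeding a UDF a fixed number of records at a time rather than the whole bag) can be illustrated with a small sketch. This is not the interface from PIG-979.patch; the class `SumAccumulator`, its methods `accumulate`, `get_value` and `cleanup`, and the helpers `batches` and `run` are all hypothetical names chosen for illustration, and Python is used only to keep the sketch short.

```python
class SumAccumulator:
    """Hypothetical accumulator-style UDF: keeps a running sum."""

    def __init__(self):
        self.total = 0

    def accumulate(self, batch):
        # Called once per batch, so only one batch is held in memory.
        self.total += sum(batch)

    def get_value(self):
        # Called after the last batch of a group has been seen.
        return self.total

    def cleanup(self):
        # Reset state before the next group is processed.
        self.total = 0


def batches(values, size):
    """Yield lists of at most `size` items from any iterable."""
    batch = []
    for value in values:
        batch.append(value)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def run(udf, values, batch_size):
    """Hypothetical driver: stream one group's values through the UDF."""
    for batch in batches(values, batch_size):
        udf.accumulate(batch)
    result = udf.get_value()
    udf.cleanup()
    return result
```

For example, `run(SumAccumulator(), iter(range(10)), 3)` feeds the values in batches of at most three and never materializes the full list inside the UDF.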
[jira] Updated: (PIG-1038) Optimize nested distinct/sort to use secondary key
[ https://issues.apache.org/jira/browse/PIG-1038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1038:

    Status: Patch Available  (was: Open)

> Optimize nested distinct/sort to use secondary key
> --------------------------------------------------
>
>                 Key: PIG-1038
>                 URL: https://issues.apache.org/jira/browse/PIG-1038
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>    Affects Versions: 0.4.0
>            Reporter: Olga Natkovich
>            Assignee: Daniel Dai
>             Fix For: 0.6.0
>         Attachments: PIG-1038-1.patch, PIG-1038-2.patch
>
> If nested foreach plan contains sort/distinct, it is possible to use hadoop
> secondary sort instead of SortedDataBag and DistinctDataBag to optimize the
> query.
>
> Eg1:
> A = load 'mydata';
> B = group A by $0;
> C = foreach B {
>     D = order A by $1;
>     generate group, D;
> }
> store C into 'myresult';
>
> We can specify a secondary sort on A.$1, and drop "order A by $1".
>
> Eg2:
> A = load 'mydata';
> B = group A by $0;
> C = foreach B {
>     D = A.$1;
>     E = distinct D;
>     generate group, E;
> }
> store C into 'myresult';
>
> We can specify a secondary sort key on A.$1, and simplify "D=A.$1; E=distinct
> D" to a special version of distinct, which does not do the sorting.
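
The optimization described in PIG-1038 can be sketched outside of Pig as follows. This is a toy simulation, not the code in the attached patches: `shuffle_with_secondary_sort` stands in for Hadoop's shuffle sorting on a composite key of ($0, $1), and the function names `nested_order` and `nested_distinct` are hypothetical. Records are assumed to be two-field tuples, and the distinct result is shown as plain values rather than a bag of tuples.

```python
from itertools import groupby
from operator import itemgetter


def shuffle_with_secondary_sort(records):
    """Simulate a shuffle that sorts on (group key $0, secondary key $1)."""
    return sorted(records, key=itemgetter(0, 1))


def nested_order(records):
    """Eg1: each group's rows arrive ordered by $1, so no per-group sort runs."""
    shuffled = shuffle_with_secondary_sort(records)
    return [(key, list(rows)) for key, rows in groupby(shuffled, key=itemgetter(0))]


def nested_distinct(records):
    """Eg2: on sorted input, distinct only has to skip adjacent repeats."""
    shuffled = shuffle_with_secondary_sort(records)
    result = []
    for key, rows in groupby(shuffled, key=itemgetter(0)):
        distinct = []
        for row in rows:
            # Equal values are adjacent, so comparing with the last kept
            # value is enough; no sorting or hashing is done per group.
            if not distinct or distinct[-1] != row[1]:
                distinct.append(row[1])
        result.append((key, distinct))
    return result
```

The point of the sketch is that the per-group work in both cases becomes a single streaming pass, which is what makes a sorting bag unnecessary once the shuffle delivers values in secondary-key order.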
[jira] Updated: (PIG-1038) Optimize nested distinct/sort to use secondary key
[ https://issues.apache.org/jira/browse/PIG-1038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1038: Status: Open (was: Patch Available)
[jira] Updated: (PIG-1038) Optimize nested distinct/sort to use secondary key
[ https://issues.apache.org/jira/browse/PIG-1038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1038: Attachment: PIG-1038-2.patch Reattach patch to address javac and findbugs warnings.
[jira] Commented: (PIG-979) Acummulator Interface for UDFs
[ https://issues.apache.org/jira/browse/PIG-979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774815#action_12774815 ]

Hadoop QA commented on PIG-979:

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12424249/PIG-979.patch
  against trunk revision 833549.

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 20 new or modified tests.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    -1 findbugs. The patch appears to cause Findbugs to fail.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    -1 core tests. The patch failed core unit tests.
    +1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/144/testReport/
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/144/console

This message is automatically generated.
[jira] Updated: (PIG-979) Acummulator Interface for UDFs
[ https://issues.apache.org/jira/browse/PIG-979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-979: Status: Patch Available (was: Open)
[jira] Commented: (PIG-1038) Optimize nested distinct/sort to use secondary key
[ https://issues.apache.org/jira/browse/PIG-1038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774753#action_12774753 ]

Hadoop QA commented on PIG-1038:

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12424289/PIG-1038-1.patch
  against trunk revision 833549.

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 6 new or modified tests.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    -1 javac. The applied patch generated 207 javac compiler warnings (more than the trunk's current 199 warnings).
    -1 findbugs. The patch appears to introduce 3 new Findbugs warnings.
    -1 release audit. The applied patch generated 319 release audit warnings (more than the trunk's current 317 warnings).
    +1 core tests. The patch passed core unit tests.
    +1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/143/testReport/
Release audit warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/143/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/143/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/143/console

This message is automatically generated.