date:20091108

[jira] Commented: (PIG-979) Acummulator Interface for UDFs

2009-11-08 Thread Daniel Dai (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774859#action_12774859
 ] 

Daniel Dai commented on PIG-979:


This patch depends on PIG-1038 and cannot be applied on its own. That is 
why the tests failed.

> Acummulator Interface for UDFs
> --
>
> Key: PIG-979
> URL: https://issues.apache.org/jira/browse/PIG-979
> Project: Pig
> Issue Type: New Feature
> Reporter: Alan Gates
> Assignee: Ying He
> Attachments: PIG-979.patch
>
>
> Add an accumulator interface for UDFs that would allow them to take a set 
> number of records at a time instead of the entire bag.
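To illustrate the idea in the description above, here is a minimal sketch in Python. It is not Pig's API and not the contents of PIG-979.patch: the class CountAccumulator, its methods accumulate/get_value/cleanup, and the helpers batches and run_accumulator are all hypothetical names chosen for illustration. The only point is that the UDF is handed a bounded batch of records at a time and keeps a running result, rather than receiving the entire bag.

```python
from itertools import islice
from typing import Iterable, Iterator, List


class CountAccumulator:
    """Hypothetical accumulator-style UDF: counts records batch by batch."""

    def __init__(self) -> None:
        self.count = 0

    def accumulate(self, batch: List[tuple]) -> None:
        # Called once per batch; only the running total is kept in memory.
        self.count += len(batch)

    def get_value(self) -> int:
        # Called after the last batch of a group has been passed in.
        return self.count

    def cleanup(self) -> None:
        # Reset state so the same instance can serve the next group.
        self.count = 0


def batches(records: Iterable, size: int) -> Iterator[list]:
    """Yield lists of at most `size` records (hypothetical helper)."""
    it = iter(records)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def run_accumulator(udf: CountAccumulator, records: Iterable, size: int) -> int:
    """Feed one group's records to the UDF in batches and return its result."""
    for batch in batches(records, size):
        udf.accumulate(batch)
    result = udf.get_value()
    udf.cleanup()
    return result
```

A caller would use it as run_accumulator(CountAccumulator(), records, 3); the batch size of 3 is arbitrary here.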

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (PIG-1038) Optimize nested distinct/sort to use secondary key

2009-11-08 Thread Daniel Dai (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1038:


Status: Patch Available  (was: Open)

> Optimize nested distinct/sort to use secondary key
> --
>
> Key: PIG-1038
> URL: https://issues.apache.org/jira/browse/PIG-1038
> Project: Pig
> Issue Type: Improvement
> Components: impl
> Affects Versions: 0.4.0
> Reporter: Olga Natkovich
> Assignee: Daniel Dai
> Fix For: 0.6.0
>
> Attachments: PIG-1038-1.patch, PIG-1038-2.patch
>
>
> If a nested foreach plan contains sort/distinct, it is possible to use Hadoop 
> secondary sort instead of SortedDataBag and DistinctDataBag to optimize the 
> query. 
> Eg1:
> A = load 'mydata';
> B = group A by $0;
> C = foreach B {
>     D = order A by $1;
>     generate group, D;
> }
> store C into 'myresult';
> We can specify a secondary sort on A.$1, and drop "order A by $1".
> Eg2:
> A = load 'mydata';
> B = group A by $0;
> C = foreach B {
>     D = A.$1;
>     E = distinct D;
>     generate group, E;
> }
> store C into 'myresult';
> We can specify a secondary sort key on A.$1, and simplify "D=A.$1; E=distinct 
> D" to a special version of distinct, which does not do the sorting.
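The effect described above can be sketched outside Pig in a few lines of Python. This is only a simulation under stated assumptions, not the code in the attached patches: shuffle_with_secondary_sort stands in for a shuffle that sorts on the pair (group key, secondary key), and distinct_presorted stands in for the "special version of distinct" that relies on its input already being sorted. Both function names and the sample data are hypothetical.

```python
from itertools import groupby
from operator import itemgetter
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, int]  # ($0, $1) in the Pig examples


def shuffle_with_secondary_sort(
    records: Iterable[Record],
) -> Iterator[Tuple[str, List[int]]]:
    """Simulate a shuffle that sorts on (group key, secondary key)."""
    ordered = sorted(records, key=itemgetter(0, 1))
    for key, group in groupby(ordered, key=itemgetter(0)):
        # Values in each group arrive already ordered by $1, so a nested
        # "order A by $1" would have nothing left to do.
        yield key, [value for _, value in group]


def distinct_presorted(values: Iterable[int]) -> List[int]:
    """Distinct over sorted input: compare each value with the previous one."""
    result: List[int] = []
    for value in values:
        if not result or result[-1] != value:
            result.append(value)
    return result


data = [("a", 3), ("b", 2), ("a", 1), ("a", 3), ("b", 2)]
grouped = dict(shuffle_with_secondary_sort(data))
```

Because each group's values are already in order, the distinct step needs neither its own sort nor a set of all values seen so far.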

[jira] Updated: (PIG-1038) Optimize nested distinct/sort to use secondary key

2009-11-08 Thread Daniel Dai (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1038:


Status: Open  (was: Patch Available)

[jira] Updated: (PIG-1038) Optimize nested distinct/sort to use secondary key

2009-11-08 Thread Daniel Dai (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1038:


Attachment: PIG-1038-2.patch

Reattach patch to address javac and findbugs warnings.

[jira] Commented: (PIG-979) Acummulator Interface for UDFs

2009-11-08 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774815#action_12774815
 ] 

Hadoop QA commented on PIG-979:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12424249/PIG-979.patch
  against trunk revision 833549.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 20 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to cause Findbugs to fail.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/144/testReport/
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/144/console

This message is automatically generated.

[jira] Updated: (PIG-979) Acummulator Interface for UDFs

2009-11-08 Thread Olga Natkovich (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-979:
---

Status: Patch Available  (was: Open)

[jira] Commented: (PIG-1038) Optimize nested distinct/sort to use secondary key

2009-11-08 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774753#action_12774753
 ] 

Hadoop QA commented on PIG-1038:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12424289/PIG-1038-1.patch
  against trunk revision 833549.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The applied patch generated 207 javac compiler warnings (more 
than the trunk's current 199 warnings).

-1 findbugs.  The patch appears to introduce 3 new Findbugs warnings.

-1 release audit.  The applied patch generated 319 release audit warnings 
(more than the trunk's current 317 warnings).

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/143/testReport/
Release audit warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/143/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/143/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/143/console

This message is automatically generated.

> Optimize nested distinct/sort to use secondary key
> --
>
> Key: PIG-1038
> URL: https://issues.apache.org/jira/browse/PIG-1038
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.4.0
>Reporter: Olga Natkovich
>Assignee: Daniel Dai
> Fix For: 0.6.0
>
> Attachments: PIG-1038-1.patch
>
>
> If a nested foreach plan contains sort/distinct, it is possible to use Hadoop 
> secondary sort instead of SortedDataBag and DistinctDataBag to optimize the 
> query. 
> Eg1:
> A = load 'mydata';
> B = group A by $0;
> C = foreach B {
>     D = order A by $1;
>     generate group, D;
> }
> store C into 'myresult';
> We can specify a secondary sort on A.$1, and drop "order A by $1".
> Eg2:
> A = load 'mydata';
> B = group A by $0;
> C = foreach B {
>     D = A.$1;
>     E = distinct D;
>     generate group, E;
> }
> store C into 'myresult';
> We can specify a secondary sort key on A.$1, and simplify "D=A.$1; E=distinct 
> D" to a special version of distinct, which does not do the sorting.
