[ 
https://issues.apache.org/jira/browse/PIG-979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12760389#action_12760389
 ] 

Alan Gates commented on PIG-979:
--------------------------------

Jeff, thanks for the paper.  I looked it over and I'm not certain it directly 
applies.  They measure both the aggregation time (sort or hash) and how the 
data is passed to the user-defined aggregate (iterate or accumulate).  Since 
we run on Hadoop, the aggregation is already done for us, so it's just a 
question of the fastest way to make the data available to the UDF.  As I said 
above, we want to test the performance of this and prove its worth before we 
add it.

As a general complaint, they used a fairly old revision of the Pig code in 
their paper, even though the paper appears to have been published in the last 
few months.

> Accumulator Interface for UDFs
> ------------------------------
>
>                 Key: PIG-979
>                 URL: https://issues.apache.org/jira/browse/PIG-979
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Alan Gates
>            Assignee: Ying He
>
> Add an accumulator interface for UDFs that would allow them to take a set 
> number of records at a time instead of the entire bag.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
