[ https://issues.apache.org/jira/browse/PIG-979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12760396#action_12760396 ]

Alan Gates commented on PIG-979:
--------------------------------

Ciemo,

In your comment above, you indicate you'd like functions like cumulative sum to 
be able to emit a value each time a record is added.  But how does that work 
with something like:

{code}
A = load 'bla';
B = group A by $0;
C = foreach B {
       D = order A by $1;
       generate CUMULATIVE_SUM(D.$2), SUM(D.$2);
}
{code}

SUM can't output a value until it has seen every record, but CUMULATIVE_SUM
will have an output on every record.  The way Pig's data model handles this is
with bags: CUMULATIVE_SUM would return one bag of running sums per group.  The
other possibility I can see is that Pig handles this as having an implicit
flatten, so for a group whose $2 values are 1, 2, 3, 4, the output from above
would look like:

1   10
3   10
6   10
10  10
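To make the two output shapes concrete, here is a small Java sketch (plain Java, not Pig; the class and method names are hypothetical) that computes, for one already-ordered group, the running sums a CUMULATIVE_SUM would emit and the single total SUM emits. It then prints the group once in the bag shape (one row per group, with a Java list standing in for the bag) and once in the implicit-flatten shape (one row per record), assuming the group's $2 values are 1, 2, 3, 4.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the two output shapes for a single group.
class CumulativeVsSum {
    // Running sum: one output per input record, like the proposed CUMULATIVE_SUM.
    static List<Long> cumulativeSum(List<Long> values) {
        List<Long> out = new ArrayList<>();
        long running = 0;
        for (long v : values) {
            running += v;
            out.add(running);
        }
        return out;
    }

    // Total: a single output that needs every record first, like SUM.
    static long sum(List<Long> values) {
        long total = 0;
        for (long v : values) {
            total += v;
        }
        return total;
    }

    // Implicit-flatten shape: pair every running sum with the group total.
    static List<long[]> flatten(List<Long> values) {
        long total = sum(values);
        List<long[]> rows = new ArrayList<>();
        for (long c : cumulativeSum(values)) {
            rows.add(new long[] {c, total});
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Long> group = List.of(1L, 2L, 3L, 4L); // $2 values of one group, already ordered
        // Bag shape: one row for the whole group.
        System.out.println(cumulativeSum(group) + "\t" + sum(group));
        // Flattened shape: one row per record.
        for (long[] row : flatten(group)) {
            System.out.println(row[0] + "\t" + row[1]);
        }
    }
}
```

Note that `flatten` has to call `sum` over the whole group before it can emit its first row, which is the same tension the comment raises: the per-record output is available early, but the SUM column is not.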

Are you proposing that we create a way to stream the output of these types of
functions to STORE (or DUMP) so that the bag never needs to be materialized?  Or
do you want a UDF type that takes a bag and produces multiple outputs along with
an implicit flatten?  Or are you suggesting a change in the data model?
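As a rough illustration of the first two options, a multi-output function could hand back a lazy iterator instead of a finished bag, so that a consumer such as STORE writes each row as soon as it is produced. The Java sketch below assumes that shape; `StreamingCumulativeSum` is a hypothetical name, not a Pig class, and the `while` loop merely stands in for STORE/DUMP.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch: a multi-output function exposed as a lazy iterator.
// Each running sum is computed only when the consumer asks for it, so no
// output bag is ever built.
class StreamingCumulativeSum implements Iterator<Long> {
    private final Iterator<Long> input;
    private long running = 0;

    StreamingCumulativeSum(Iterator<Long> input) {
        this.input = input;
    }

    @Override
    public boolean hasNext() {
        return input.hasNext();
    }

    @Override
    public Long next() {
        if (!input.hasNext()) {
            throw new NoSuchElementException();
        }
        running += input.next(); // consume one record, emit one output
        return running;
    }

    public static void main(String[] args) {
        Iterator<Long> out = new StreamingCumulativeSum(List.of(1L, 2L, 3L, 4L).iterator());
        // Stand-in for STORE/DUMP: write each row as soon as it is produced.
        while (out.hasNext()) {
            System.out.println(out.next());
        }
    }
}
```

This only works cleanly for per-record outputs; pairing each row with SUM's value, as in the example above, would still require either buffering the group or a second pass, because that value is not known until the last record.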

> Accumulator Interface for UDFs
> ------------------------------
>
>                 Key: PIG-979
>                 URL: https://issues.apache.org/jira/browse/PIG-979
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Alan Gates
>            Assignee: Ying He
>
> Add an accumulator interface for UDFs that would allow them to take a set 
> number of records at a time instead of the entire bag.
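A minimal sketch of what such an interface might look like, assuming a contract with three calls: one per batch of records, one to fetch the result after the last batch, and one to reset state between groups. The names `BatchAccumulator`, `accumulate`, `getValue`, `cleanup`, and `AccumulatorDriver` are assumptions made for illustration, and records are simplified to `Long` values rather than Pig tuples. SUM fits this model well because it only needs a running total, so the full bag never has to be held in memory.

```java
import java.util.List;

// Hypothetical accumulator contract: the UDF sees a group in batches of
// records instead of receiving the whole bag at once.
interface BatchAccumulator<T> {
    void accumulate(List<Long> batch); // called once per batch, in order
    T getValue();                      // called after the last batch of a group
    void cleanup();                    // resets state before the next group
}

// SUM written against the hypothetical interface: only a running total is kept.
class SumAccumulator implements BatchAccumulator<Long> {
    private long total = 0;

    @Override
    public void accumulate(List<Long> batch) {
        for (long v : batch) {
            total += v;
        }
    }

    @Override
    public Long getValue() {
        return total;
    }

    @Override
    public void cleanup() {
        total = 0;
    }
}

class AccumulatorDriver {
    // Feeds one group to the accumulator batchSize records at a time.
    static <T> T run(BatchAccumulator<T> acc, List<Long> group, int batchSize) {
        acc.cleanup();
        for (int start = 0; start < group.size(); start += batchSize) {
            int end = Math.min(start + batchSize, group.size());
            acc.accumulate(group.subList(start, end));
        }
        T result = acc.getValue();
        acc.cleanup();
        return result;
    }

    public static void main(String[] args) {
        List<Long> group = List.of(1L, 2L, 3L, 4L, 5L);
        System.out.println(run(new SumAccumulator(), group, 2)); // 15
    }
}
```

The batch size only changes how many records are in memory at once, not the result: feeding the same group in batches of 1, 2, or 10 yields the same total.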

-- 
This message is automatically generated by JIRA.