[ https://issues.apache.org/jira/browse/PIG-979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12760396#action_12760396 ]
Alan Gates commented on PIG-979:
--------------------------------

Ciemo,

In your comment above, you indicate you'd like functions like cumulative sum to be able to emit a value each time a record is added. But how does that work with something like:

{code}
A = load 'bla';
B = group A by $0;
C = foreach B {
    D = order A by $1;
    generate CUMULATIVE_SUM(D.$2), SUM(D.$2);
}
{code}

SUM can't output a value until it's seen everything, but CUMULATIVE_SUM will have an output on every record. The way Pig's data model handles this is with bags. The other possibility I can see is that Pig handles this as having an implicit flatten, so output from above would look like:

1 10
3 10
6 10
10 10

Are you proposing that we create a way to streamline output of these types of functions to STORE (or DUMP) so that the bag never need be materialized? Or do you want a UDF type that takes a bag and produces multiple outputs along with an implicit flatten? Or are you suggesting a change in the data model?

> Acummulator Interface for UDFs
> ------------------------------
>
> Key: PIG-979
> URL: https://issues.apache.org/jira/browse/PIG-979
> Project: Pig
> Issue Type: New Feature
> Reporter: Alan Gates
> Assignee: Ying He
>
> Add an accumulator interface for UDFs that would allow them to take a set
> number of records at a time instead of the entire bag.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
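
To make the two readings in the comment concrete, here is a minimal Python sketch of what each would produce for a single group. It is not Pig code and not a proposed API: the function names (sum_udf, cumulative_sum_udf, as_bag, with_implicit_flatten) are hypothetical, and the input values 1, 2, 3, 4 are an assumption chosen only because they reproduce the sample output in the comment.

```python
from itertools import accumulate

def sum_udf(bag):
    # Needs the whole bag before it can return a single value.
    return sum(bag)

def cumulative_sum_udf(bag):
    # Has one output per input record: the running totals.
    return list(accumulate(bag))

def as_bag(values):
    # Reading 1: the per-record outputs are collected into a bag,
    # so each group yields one row holding (bag, scalar).
    return (cumulative_sum_udf(values), sum_udf(values))

def with_implicit_flatten(values):
    # Reading 2: the per-record outputs are flattened implicitly,
    # so the scalar SUM is repeated on every output row.
    total = sum_udf(values)
    return [(running, total) for running in cumulative_sum_udf(values)]

# Hypothetical D.$2 values for one group, already ordered by $1.
values = [1, 2, 3, 4]

print(as_bag(values))
for row in with_implicit_flatten(values):
    print(*row)
```

Under the bag reading the group produces one row containing all four running totals; under the implicit-flatten reading it produces four rows, which is the shape of the sample output in the comment. Neither version avoids materializing the input, which is the separate question the accumulator interface is meant to address.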