[ https://issues.apache.org/jira/browse/PIG-979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12760396#action_12760396 ]
Alan Gates commented on PIG-979:
--------------------------------
Ciemo,
In your comment above, you indicate you'd like functions like cumulative sum to
be able to emit a value each time a record is added. But how does that work
with something like:
{code}
A = load 'bla';
B = group A by $0;
C = foreach B {
    D = order A by $1;
    generate CUMULATIVE_SUM(D.$2), SUM(D.$2);
};
{code}
SUM can't output a value until it has seen every record, but CUMULATIVE_SUM will
have an output for every record. The way Pig's data model handles this is with
bags. The other possibility I can see is that Pig handles this with an
implicit flatten, so output from above would look like:
1 10
3 10
6 10
10 10
Are you proposing that we create a way to streamline output of these types of
functions to STORE (or DUMP) so that the bag never need be materialized? Or do
you want a UDF type that takes a bag and produces multiple outputs along with
an implicit flatten? Or are you suggesting a change in the data model?
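To make the implicit-flatten reading concrete, here is a rough sketch in plain Python rather than Pig. The function names and the input values 1, 2, 3, 4 are my assumptions (the values are inferred from the running totals in the sample output above); none of this is an existing Pig API.

```python
from itertools import accumulate


def cumulative_sum(values):
    """Yield one running total per input record, as each record arrives."""
    return accumulate(values)


def implicit_flatten(values):
    """Pair every per-record output with the single whole-bag SUM.

    Hypothetical helper: SUM needs the whole bag before it can answer,
    so the input is materialized here even though the running total
    could be streamed.
    """
    values = list(values)
    total = sum(values)
    return [(running, total) for running in cumulative_sum(values)]


# Assumed D.$2 values for one group, already ordered by $1.
rows = implicit_flatten([1, 2, 3, 4])
for running, total in rows:
    print(running, total)
```

Note that the bag still has to be held for SUM in this reading, which is part of why I am asking which of the three options you have in mind.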
> Accumulator Interface for UDFs
> ------------------------------
>
> Key: PIG-979
> URL: https://issues.apache.org/jira/browse/PIG-979
> Project: Pig
> Issue Type: New Feature
> Reporter: Alan Gates
> Assignee: Ying He
>
> Add an accumulator interface for UDFs that would allow them to take a set
> number of records at a time instead of the entire bag.