[ 
https://issues.apache.org/jira/browse/PIG-844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-844.
--------------------------------


The Accumulator interface took care of this.

> PERFORMANCE: streaming data to the UDFs in foreach
> --------------------------------------------------
>
>                 Key: PIG-844
>                 URL: https://issues.apache.org/jira/browse/PIG-844
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Olga Natkovich
>
> Currently, Pig places the data passed to UDFs into a bag. This can cause the 
> process to use more memory than it actually needs; in many cases it would be 
> better to push the data to the UDFs one tuple at a time.
> For the case where the combiner is invoked, this might not be that important; 
> however, for non-algebraic UDFs, as well as other cases where the combiner 
> can't be used, this can provide a significant memory improvement.
> Another possible use case is where the data is already grouped going into Pig 
> and we don't need to group it again.
> How this will affect the UDF interface needs to be discussed further.
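
To illustrate the difference the description is pointing at, here is a minimal
sketch in plain Java. It does not use Pig's classes: every name in it
(StreamingSketch, BagUdf, StreamingUdf, StreamingCount, runWithBag,
runStreaming) is hypothetical and chosen only for illustration. The bag-style
path copies the whole group into a list before calling the function, while the
streaming path hands over one value at a time, so the function only keeps the
state it needs.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical names throughout; this is NOT Pig's actual UDF API.
final class StreamingSketch {

    /** Bag-style function: the whole group is materialized before the call. */
    interface BagUdf<T, R> {
        R exec(List<T> bag);
    }

    /** Streaming-style function: values are pushed one at a time. */
    interface StreamingUdf<T, R> {
        void accumulate(T value);
        R getValue();
    }

    /** Counts values without holding on to any of them. */
    static final class StreamingCount implements StreamingUdf<String, Long> {
        private long count = 0;

        @Override
        public void accumulate(String value) {
            count++;
        }

        @Override
        public Long getValue() {
            return count;
        }
    }

    /** Copies every value of the group into memory, then calls the function once. */
    static <T, R> R runWithBag(Iterator<T> group, BagUdf<T, R> udf) {
        List<T> bag = new ArrayList<>();
        while (group.hasNext()) {
            bag.add(group.next());
        }
        return udf.exec(bag);
    }

    /** Pushes values to the function one by one; nothing is buffered here. */
    static <T, R> R runStreaming(Iterator<T> group, StreamingUdf<T, R> udf) {
        while (group.hasNext()) {
            udf.accumulate(group.next());
        }
        return udf.getValue();
    }

    public static void main(String[] args) {
        List<String> group = List.of("a", "b", "c");
        BagUdf<String, Long> bagCount = bag -> (long) bag.size();
        long viaBag = runWithBag(group.iterator(), bagCount);
        long viaStream = runStreaming(group.iterator(), new StreamingCount());
        System.out.println(viaBag + " " + viaStream); // prints "3 3"
    }
}
```

Both paths return the same count. The difference is that runWithBag holds the
whole group in memory at once, whereas runStreaming only holds the counter.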

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
