[
https://issues.apache.org/jira/browse/PIG-979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12775184#action_12775184
]
Ying He commented on PIG-979:
-----------------------------
Alan, thanks for the feedback.
1. A test case already covers mixing an accumulator UDF with a regular UDF; it
is in testAccumBasic().
2. The optimizer can't be applied when inner is set on POPackage. If inner is
set for an input, POPackage checks whether the bag for that input is NULL, and
if it is, POPackage returns NULL. That check can only be done after all the
tuples have been retrieved and put into a bag.
3 & 4. Will fix those.
5. Needs performance testing.
6. The reducer gets results from POPackage and passes them to the root, which
is POForEach, to process. From POForEach's perspective, it gets a tuple with
bags in it from POPackage. POForEach then retrieves tuples off the iterator and
passes them to the UDFs in multiple cycles. Because only POPackage knows how to
read tuples out of the iterator and put them in the proper bags,
AccumulativeTupleBuffer and AccumulativeBag are created to communicate between
POPackage and POForEach. Every time POForEach calls getNextBatch() on
AccumulativeTupleBuffer, it in effect calls an inner class of POPackage to
retrieve tuples out of the iterator.
POPackage cannot be the one to block the reading of tuples, because it is only
called once from the reducer. I also thought of changing the reducer to call
POPackage multiple times, once per batch of data, but then it becomes tricky to
maintain the correct state of the operators, and all operators in the reducer
plan would have to support partial data, which is not necessary.
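To make the pull model in point 6 concrete, here is a minimal, self-contained
sketch. It is not the code in the patch: BatchBuffer, BatchUdf, CountUdf,
SumUdf and drain() are hypothetical names that stand in for
AccumulativeTupleBuffer, the accumulator UDFs and the POForEach loop, and plain
Integers stand in for tuples. It only shows the control flow, in which the
consumer asks for the next batch and the buffer reads just that many records
off the underlying iterator.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class AccumulatorSketch {

    /** Hypothetical accumulator-style UDF; not Pig's actual interface. */
    interface BatchUdf<T, R> {
        void accumulate(List<T> batch); // called once per batch
        R getValue();                   // called after the last batch
    }

    /** Hypothetical stand-in for the buffer shared by producer and consumer. */
    static final class BatchBuffer<T> {
        private final Iterator<T> source;
        private final int batchSize;

        BatchBuffer(Iterator<T> source, int batchSize) {
            if (batchSize <= 0) {
                throw new IllegalArgumentException("batchSize must be positive");
            }
            this.source = source;
            this.batchSize = batchSize;
        }

        boolean hasNextBatch() {
            return source.hasNext();
        }

        /** Reads at most batchSize records; nothing is read until this is called. */
        List<T> getNextBatch() {
            List<T> batch = new ArrayList<>(batchSize);
            while (batch.size() < batchSize && source.hasNext()) {
                batch.add(source.next());
            }
            return batch;
        }
    }

    static final class CountUdf implements BatchUdf<Integer, Long> {
        private long count;
        public void accumulate(List<Integer> batch) { count += batch.size(); }
        public Long getValue() { return count; }
    }

    static final class SumUdf implements BatchUdf<Integer, Long> {
        private long sum;
        public void accumulate(List<Integer> batch) {
            for (int v : batch) {
                sum += v;
            }
        }
        public Long getValue() { return sum; }
    }

    /** Consumer side: pulls batches and feeds each one to every UDF. */
    static <T> int drain(BatchBuffer<T> buffer, List<BatchUdf<T, ?>> udfs) {
        int cycles = 0;
        while (buffer.hasNextBatch()) {
            List<T> batch = buffer.getNextBatch();
            for (BatchUdf<T, ?> udf : udfs) {
                udf.accumulate(batch);
            }
            cycles++;
        }
        return cycles;
    }

    public static void main(String[] args) {
        List<Integer> values = List.of(1, 2, 3, 4, 5, 6, 7);
        BatchBuffer<Integer> buffer = new BatchBuffer<>(values.iterator(), 3);
        CountUdf count = new CountUdf();
        SumUdf sum = new SumUdf();
        List<BatchUdf<Integer, ?>> udfs = new ArrayList<>();
        udfs.add(count);
        udfs.add(sum);
        int cycles = drain(buffer, udfs);
        // Seven values in batches of three: prints "3 7 28"
        System.out.println(cycles + " " + count.getValue() + " " + sum.getValue());
    }
}
```

In this sketch the producer side is invoked only once, to hand over the buffer,
and all further reading happens inside getNextBatch(). That mirrors why the
consumer, rather than POPackage, has to drive the batching.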
> Acummulator Interface for UDFs
> ------------------------------
>
> Key: PIG-979
> URL: https://issues.apache.org/jira/browse/PIG-979
> Project: Pig
> Issue Type: New Feature
> Reporter: Alan Gates
> Assignee: Ying He
> Attachments: PIG-979.patch
>
>
> Add an accumulator interface for UDFs that would allow them to take a set
> number of records at a time instead of the entire bag.