[ https://issues.apache.org/jira/browse/PIG-979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12775184#action_12775184 ]
Ying He commented on PIG-979:
-----------------------------

Alan, thanks for the feedback.

1. A test case that mixes an accumulator UDF with a regular UDF already exists; it is in testAccumBasic().

2. The optimizer can't be applied when inner is set on POPackage. If inner is set for an input, POPackage checks whether the bag for that input is NULL and, if it is, returns NULL. That check can only be made after all the tuples have been retrieved and put into a bag.

3 & 4. Will fix these.

5. Needs performance testing.

6. The reducer gets results from POPackage and passes them to the root operator, which is POForEach, for processing. From POForEach's perspective, it receives a tuple with bags in it from POPackage. POForEach then retrieves tuples off the iterator and passes them to the UDFs over multiple cycles. Because only POPackage knows how to read tuples out of the iterator and put them into the proper bags, AccumulativeTupleBuffer and AccumulativeBag were created to communicate between POPackage and POForEach. Every time POForEach calls getNextBatch() on AccumulativeTupleBuffer, it in effect calls an inner class of POPackage to retrieve tuples from the iterator. POPackage cannot be the one to block the reading of tuples, because the reducer calls it only once. I also considered changing the reducer to call POPackage multiple times, once per batch of data, but then it becomes tricky to maintain the correct state of the operators, and every operator in the reducer plan would have to support partial data, which is not necessary.

> Accumulator Interface for UDFs
> ------------------------------
>
>                 Key: PIG-979
>                 URL: https://issues.apache.org/jira/browse/PIG-979
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Alan Gates
>            Assignee: Ying He
>         Attachments: PIG-979.patch
>
>
> Add an accumulator interface for UDFs that would allow them to take a set
> number of records at a time instead of the entire bag.
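
For readers following point 6 of the comment, the pull-based hand-off it describes can be sketched roughly as below. This is an illustration only, not code from PIG-979.patch: BatchBuffer, IteratorBatchBuffer, SumAccumulator and consume() are hypothetical stand-ins for the roles the comment assigns to AccumulativeTupleBuffer, the POPackage inner class, an accumulator UDF and POForEach, and plain Long values stand in for tuples. The only name taken from the comment is getNextBatch().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class AccumulatorSketch {

    /** Hypothetical stand-in for the buffer shared by the producer and the consumer. */
    interface BatchBuffer {
        boolean hasNextBatch();
        List<Long> getNextBatch();
    }

    /** Producer side: the only class that knows how to read from the underlying iterator. */
    static class IteratorBatchBuffer implements BatchBuffer {
        private final Iterator<Long> source;
        private final int batchSize;
        private int batchesRead = 0;

        IteratorBatchBuffer(Iterator<Long> source, int batchSize) {
            if (batchSize <= 0) {
                throw new IllegalArgumentException("batchSize must be positive");
            }
            this.source = source;
            this.batchSize = batchSize;
        }

        @Override
        public boolean hasNextBatch() {
            return source.hasNext();
        }

        @Override
        public List<Long> getNextBatch() {
            // Read at most batchSize values; the rest stay in the iterator.
            List<Long> batch = new ArrayList<>(batchSize);
            while (batch.size() < batchSize && source.hasNext()) {
                batch.add(source.next());
            }
            batchesRead++;
            return batch;
        }

        int batchesRead() {
            return batchesRead;
        }
    }

    /** Hypothetical accumulating function: sees one batch at a time, keeps a running total. */
    static class SumAccumulator {
        private long total = 0;

        void accumulate(List<Long> batch) {
            for (long value : batch) {
                total += value;
            }
        }

        long getValue() {
            return total;
        }
    }

    /** Consumer side: drives the loop, pulling batches until the source is exhausted. */
    static long consume(BatchBuffer buffer) {
        SumAccumulator sum = new SumAccumulator();
        while (buffer.hasNextBatch()) {
            sum.accumulate(buffer.getNextBatch());
        }
        return sum.getValue();
    }

    public static void main(String[] args) {
        BatchBuffer buffer =
                new IteratorBatchBuffer(Arrays.asList(1L, 2L, 3L, 4L, 5L).iterator(), 2);
        System.out.println(consume(buffer)); // prints 15
    }
}
```

The point of the shape is that the producer is constructed once and the consumer decides when the next read happens, so the full input is never held in memory at once and no other operator has to handle partial data.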