Alan Gates commented on PIG-979:

A test should be added that checks that mixing accumulator UDFs with 
non-accumulator UDFs works properly.

Why is the optimization not applied in the case that inner is set on POPackage? 
It seems the accumulator interface should still work in this case.

Some comments on what AccumulatorOptimizer.check() is and what it allows would 
be helpful.

The code contains tabs in some spots instead of 4 spaces.

The cases in which the accumulator interface can be used have been greatly 
extended by adding support for unary and binary operators.  But this comes 
at a cost.  Every binary and unary comparison now has to make the accumChild 
call.  99% of the time this will be false.  It's not clear to me how often 
users will do things like:

    foreach C generate accumfunc1(A) + accumfunc2(A)

or

    foreach C generate (accumfunc1(A) > 100 ? 0 : 1)

which are the only cases I can see where this additional functionality is 
useful, since we don't currently allow these functions in filters.  It's possible that 
JIT along with branch prediction will remove this extra cost, since the branch 
will always be one way or another for a given query.  But I'd like to see this 
tested.  It would be interesting to compare a query with heavy use of binary 
operators (but no accumulator UDFs) with and without this change.

I don't understand why you need the new interface AccumulativeTupleBuffer and 
class AccumulativeBag.  Why can't the block of tuples read off of the iterator 
just be put in a regular bag and then passed to the UDFs?
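
To make the question above concrete, here is a rough sketch of the simpler 
alternative: read a fixed-size block off the iterator into an ordinary 
collection and hand that to the UDF.  BatchFeeder, feedInBlocks, and the use of 
java.util.List in place of a Pig bag are all hypothetical stand-ins, not code 
from the patch.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not part of the patch: feeds an iterator to a
// consumer in fixed-size blocks, using a plain list as the "regular bag".
class BatchFeeder {
    // Returns the number of blocks handed to the consumer.
    static <T> int feedInBlocks(Iterator<T> tuples, int blockSize,
                                Consumer<List<T>> udf) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        int blocks = 0;
        List<T> block = new ArrayList<>(blockSize);
        while (tuples.hasNext()) {
            block.add(tuples.next());
            if (block.size() == blockSize) {
                udf.accept(block);                  // pass a full block
                blocks++;
                block = new ArrayList<>(blockSize); // start a fresh block
            }
        }
        if (!block.isEmpty()) {                     // pass the final partial block
            udf.accept(block);
            blocks++;
        }
        return blocks;
    }
}
```

If something in the patch rules this approach out (memory accounting, lazy 
reads, etc.), a comment explaining that would answer the question.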

In all the sum implementations of accumulate you calculate the sum of the block 
of tuples twice.  It should be done once and cached.
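
As a rough illustration of what I mean, the block sum can be computed once, 
held in a local, and then both null-checked and added.  SumSketch, sumBlock, 
and the List<Long> standing in for a bag are hypothetical; blockSumCalls exists 
only to make the single computation per block visible.

```java
import java.util.List;

// Hypothetical stand-in for a SUM-style accumulator; not the patch's class.
class SumSketch {
    private long intermediateSum = 0L;
    private boolean sawValue = false;
    int blockSumCalls = 0; // illustration only: counts calls to sumBlock

    // Called once per block: the block is summed a single time and the
    // cached result is used for both the null check and the addition.
    void accumulate(List<Long> block) {
        Long blockSum = sumBlock(block);
        if (blockSum != null) {
            intermediateSum += blockSum;
            sawValue = true;
        }
    }

    // Returns null when the block holds no non-null values.
    private Long sumBlock(List<Long> block) {
        blockSumCalls++;
        long total = 0L;
        boolean any = false;
        for (Long v : block) {
            if (v != null) {
                total += v;
                any = true;
            }
        }
        return any ? Long.valueOf(total) : null;
    }

    // Null if no value was ever seen, otherwise the running sum.
    Long getValue() {
        return sawValue ? Long.valueOf(intermediateSum) : null;
    }
}
```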

In COUNT.accumulate, rather than making intermediateCount a Long and then 
forcing the creation of a new Long each time you add one, you should instead 
keep it as a long and depend on boxing to convert it to Long when you return it 
in getValue.  The same applies to COUNT_STAR.accumulate.
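
A minimal sketch of the shape I have in mind follows.  CountSketch is a 
hypothetical stand-in rather than the real COUNT class, a List stands in for 
the bag, and counting only non-null elements is a simplification I am 
assuming for the example.

```java
import java.util.List;

// Hypothetical stand-in for a COUNT-style accumulator; not Pig's class.
class CountSketch {
    // Primitive field: incrementing it allocates no objects.
    private long intermediateCount = 0L;

    // Called once per block of tuples; each non-null element counts as one.
    void accumulate(List<?> block) {
        for (Object t : block) {
            if (t != null) {
                intermediateCount++;
            }
        }
    }

    // Autoboxing converts long to Long only here, when the value is returned.
    Long getValue() {
        return intermediateCount;
    }

    // Reset state before the next key.
    void cleanup() {
        intermediateCount = 0L;
    }
}
```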

> Accumulator Interface for UDFs
> ------------------------------
>                 Key: PIG-979
>                 URL: https://issues.apache.org/jira/browse/PIG-979
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Alan Gates
>            Assignee: Ying He
>         Attachments: PIG-979.patch
> Add an accumulator interface for UDFs that would allow them to take a set 
> number of records at a time instead of the entire bag.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
