[ https://issues.apache.org/jira/browse/PIG-580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12660953#action_12660953 ]

Pradeep Kamath commented on PIG-580:
------------------------------------

A different AlgebraicChecker instance is used for each ForEach inner plan, so
the check above only guards against more than one distinct aggregate in the
same inner plan. In the script above, the two distinct aggregates sit in two
different inner plans of the ForEach: the AlgebraicChecker instance handling
COUNT(Ab) would mark it as "combinable", and so would the (separate)
AlgebraicChecker instance handling COUNT(Bb). So the script would use the
combiner.
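
A toy Python model of this per-plan check follows. It is only an illustration
under stated assumptions: `AlgebraicCheckerModel`, `foreach_uses_combiner`, the
operator names, and the set of algebraic functions are hypothetical and do not
mirror Pig's actual AlgebraicChecker code. It shows why a fresh checker per
inner plan lets two distinct aggregates through when each lives in its own
plan, while two distincts inside one plan are rejected.

```python
# Hypothetical model only: operator names and the ALGEBRAIC set are illustrative,
# not taken from Pig's AlgebraicChecker.
ALGEBRAIC = {"COUNT", "SUM", "MIN", "MAX", "AVG"}


class AlgebraicCheckerModel:
    """Checks a single ForEach inner plan, given as a list of operator names."""

    def __init__(self, inner_plan):
        self.inner_plan = inner_plan
        self.distinct_count = 0  # state lives on the instance, i.e. per inner plan

    def is_combinable(self):
        for op in self.inner_plan:
            if op == "distinct":
                self.distinct_count += 1
                if self.distinct_count > 1:
                    return False  # guard: two distincts in the SAME inner plan
            elif op == "project":
                continue
            elif op not in ALGEBRAIC:
                return False  # a non-algebraic UDF disables the combiner
        return True


def foreach_uses_combiner(inner_plans):
    # A new checker instance for every inner plan; every plan must pass.
    return all(AlgebraicCheckerModel(plan).is_combinable() for plan in inner_plans)


# Two distinct aggregates, each in its own inner plan: combinable.
print(foreach_uses_combiner([["project", "distinct", "COUNT"],
                             ["project", "distinct", "COUNT"]]))  # True
# Two distincts inside one inner plan trip the guard.
print(foreach_uses_combiner([["project", "distinct", "distinct", "COUNT"]]))  # False
```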

> PERFORMANCE: Combiner should also be used when there are distinct aggregates 
> in a foreach following a group provided there are no non-algebraics in the 
> foreach 
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-580
>                 URL: https://issues.apache.org/jira/browse/PIG-580
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: types_branch
>            Reporter: Pradeep Kamath
>            Assignee: Pradeep Kamath
>             Fix For: types_branch
>
>         Attachments: PIG-580-v2.patch, PIG-580.patch
>
>
> Currently Pig uses the combiner only when there is a foreach following a
> group and each element in the foreach generate is one of the following:
> 1) simple project of the "group" column
> 2) Algebraic UDF
> The above conditions exclude use of the combiner for distinct aggregates,
> even though the distinct operation itself is combinable (irrespective of
> whether it feeds an algebraic or a non-algebraic UDF). So the following
> foreach should also be combinable:
> {code}
> ..
> b = group a by $0;
> c = foreach b { x = distinct a; generate group, COUNT(x), SUM(x.$1); }
> {code}
> The combiner optimizer should apply the distinct in the combiner, and the
> final combine output should feed the COUNT() and SUM() in the reduce.
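
The dataflow the description asks for can be sketched in Python. This is a
simplified model, not Pig's combiner implementation: `map_side_combine` and
`reduce_side` are hypothetical names, each record is a tuple standing in for a
row of `a` grouped on `$0`, and `x.$1` becomes `t[1]`. Each map task's combiner
applies the distinct locally; the reducer unions the partial sets, which
removes duplicates spanning map tasks, and only then computes COUNT and SUM.

```python
from collections import defaultdict


def map_side_combine(records):
    """Hypothetical combiner for one map task: per group key ($0), keep distinct tuples."""
    partial = defaultdict(set)
    for rec in records:
        partial[rec[0]].add(rec)  # distinct applied early, shrinking map output
    return partial


def reduce_side(partials):
    """Hypothetical reducer: merge the partial distinct sets, then aggregate."""
    merged = defaultdict(set)
    for partial in partials:
        for key, tuples in partial.items():
            merged[key] |= tuples  # set union = distinct over all combiner outputs
    # COUNT(x) and SUM(x.$1) only ever see the fully de-duplicated bag x
    return {key: (len(x), sum(t[1] for t in x)) for key, x in sorted(merged.items())}


task1 = [("k1", 5), ("k1", 5), ("k2", 3)]
task2 = [("k1", 5), ("k1", 7)]
print(reduce_side([map_side_combine(task1), map_side_combine(task2)]))
# {'k1': (2, 12), 'k2': (1, 3)}
```

COUNT and SUM are deliberately not applied in the combiner here: a partial
count over locally distinct tuples would double-count a tuple that appears in
two map tasks (("k1", 5) above). Distinct, by contrast, can be re-applied
safely because set union is associative and idempotent.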

-- 
This message is automatically generated by JIRA.