Combiner should also be used when there are distinct aggregates in a foreach 
following a group provided there are no non-algebraics in the foreach 

                 Key: PIG-580
             Project: Pig
          Issue Type: Improvement
    Affects Versions: types_branch
            Reporter: Pradeep Kamath
            Assignee: Pradeep Kamath
             Fix For: types_branch

Currently Pig uses the combiner only when a foreach follows a group and 
every element in the foreach generate is one of the following:
1) a simple projection of the "group" column
2) an algebraic UDF
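
To illustrate why condition 2 matters, here is a minimal Python sketch of an algebraic aggregate, assuming a COUNT that is split into initial, intermediate and final stages. The function names and sample data are hypothetical and are not Pig's API; the sketch only shows that partial results can be merged in any grouping, which is what makes the combiner safe to use.

```python
# Hypothetical three-stage COUNT; names are illustrative, not Pig's API.
def count_initial(values):
    # Map side: count the values seen in one chunk of input.
    return len(values)

def count_intermediate(partials):
    # Combiner: merge partial counts; may run zero or more times.
    return sum(partials)

def count_final(partials):
    # Reduce side: merge whatever partial counts arrive.
    return sum(partials)

# Three made-up map-side chunks for a single group key.
map_outputs = [["a", "b"], ["c"], ["d", "e", "f"]]
partials = [count_initial(chunk) for chunk in map_outputs]

# Combine the first two partials together and the third on its own.
combined = [count_intermediate(partials[:2]), count_intermediate(partials[2:])]
total = count_final(combined)
```

The total is the same however the partial counts are grouped in the combine step.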

The above conditions exclude use of the combiner for distinct aggregates, even 
though the distinct operation itself is combinable (irrespective of whether it 
feeds an algebraic or non-algebraic UDF). So the following foreach should also 
use the combiner:
b = group a by $0;
c = foreach b { x = distinct a; generate group, COUNT(x), SUM(x.$1); };

The combiner optimizer should cause the distinct to be combined and the final 
combine output should feed the COUNT() and SUM() in the reduce.
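
The intended data flow can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names (combine_distinct, reduce_group) and the sample tuples are hypothetical, and the sketch does not reflect how the Pig optimizer is actually implemented. It assumes each record is a (key, tuple) pair where the tuple's $0 is the group key and $1 is a number.

```python
from collections import defaultdict

def combine_distinct(records):
    """Combiner step: drop duplicate tuples per group key on the map side."""
    seen = defaultdict(set)
    for key, tup in records:
        seen[key].add(tup)
    return [(key, tup) for key, tups in seen.items() for tup in tups]

def reduce_group(key, tuples):
    """Reduce step: final distinct, then COUNT(x) and SUM(x.$1)."""
    # Duplicates can still arrive from different map tasks, so the
    # reducer applies distinct once more before aggregating.
    distinct = set(tuples)
    return key, len(distinct), sum(t[1] for t in distinct)

# Made-up output of two map tasks: (group key, full tuple).
map1 = [("k1", ("k1", 5)), ("k1", ("k1", 5)), ("k2", ("k2", 1))]
map2 = [("k1", ("k1", 5)), ("k1", ("k1", 7)), ("k2", ("k2", 1))]

# Shuffle: each map task's combined output is routed by key.
shuffled = defaultdict(list)
for part in (map1, map2):
    for key, tup in combine_distinct(part):
        shuffled[key].append(tup)

# key -> (COUNT of distinct tuples, SUM of $1 over distinct tuples)
results = {key: reduce_group(key, tups)[1:] for key, tups in shuffled.items()}
```

In this sketch the combiner only reduces the number of records shuffled; the COUNT and SUM are still computed in the reduce, over the fully deduplicated bag.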
