Combiner should also be used when there are distinct aggregates in a foreach
following a group provided there are no non-algebraics in the foreach
---------------------------------------------------------------------------------------------------------------------------------------------------
Key: PIG-580
URL: https://issues.apache.org/jira/browse/PIG-580
Project: Pig
Issue Type: Improvement
Affects Versions: types_branch
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
Fix For: types_branch
Currently Pig uses the combiner only when a foreach follows a group and every
element in the foreach's generate is one of the following:
1) a simple projection of the "group" column
2) an algebraic UDF
These conditions exclude use of the combiner for distinct aggregates, even
though the distinct operation itself is combinable (irrespective of whether it
feeds an algebraic or non-algebraic UDF). So the following foreach should also
be combinable:
{code}
..
b = group a by $0;
c = foreach b { x = distinct a; generate group, COUNT(x), SUM(x.$1); };
{code}
The combiner optimizer should cause the distinct to be combined and the final
combine output should feed the COUNT() and SUM() in the reduce.
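
The intended data flow can be illustrated with a small simulation. This is only
a sketch in Python, not Pig's combiner optimizer: the function names
(combine_distinct, reduce_distinct, run) and the modelling of map splits as
lists of tuples are assumptions made for illustration. It shows why distinct is
safe to combine: dropping duplicates per split and again at the reduce yields
the same distinct bag as doing it once at the reduce.

```python
from collections import defaultdict


def combine_distinct(split):
    """Hypothetical combiner step: group tuples by $0 and drop duplicates
    within a single map split."""
    partial = defaultdict(set)
    for row in split:
        partial[row[0]].add(row)  # key is field $0, value is the whole tuple
    return partial


def reduce_distinct(partials):
    """Hypothetical reduce step: merge the partial distinct sets, then apply
    COUNT(x) and SUM(x.$1) to the final distinct bag of each group."""
    merged = defaultdict(set)
    for partial in partials:
        for key, rows in partial.items():
            merged[key] |= rows  # set union keeps the result distinct
    return {
        key: (len(rows), sum(row[1] for row in rows))
        for key, rows in merged.items()
    }


def run(splits):
    """Run the combiner on every split, then a single reduce."""
    return reduce_distinct(combine_distinct(split) for split in splits)


if __name__ == "__main__":
    splits = [
        [("k1", 1), ("k1", 1), ("k2", 5)],
        [("k1", 1), ("k1", 2), ("k2", 5)],
    ]
    for key, (count, total) in sorted(run(splits).items()):
        print(key, count, total)
```

With the combiner in place, each split ships at most one copy of every distinct
tuple per group to the reduce, which is where the saving would come from.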