[ https://issues.apache.org/jira/browse/PIG-580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-580:
-------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Patch committed; thanks, Pradeep.

> PERFORMANCE: Combiner should also be used when there are distinct aggregates
> in a foreach following a group provided there are no non-algebraics in the
> foreach
> ----------------------------------------------------------------------------
>
>                 Key: PIG-580
>                 URL: https://issues.apache.org/jira/browse/PIG-580
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: types_branch
>            Reporter: Pradeep Kamath
>            Assignee: Pradeep Kamath
>             Fix For: types_branch
>
>         Attachments: PIG-580-v2.patch, PIG-580.patch
>
>
> Currently Pig uses the combiner only when a foreach follows a group and
> every element of the foreach's generate is one of the following:
> 1) a simple projection of the "group" column
> 2) an Algebraic UDF
> These conditions exclude use of the combiner for distinct aggregates, even
> though the distinct operation itself is combinable (irrespective of whether
> it feeds into an algebraic or non-algebraic UDF). So the following foreach
> should also be combinable:
> {code}
> ..
> b = group a by $0;
> c = foreach b { x = distinct a; generate group, COUNT(x), SUM(x.$1); };
> {code}
> The combiner optimizer should cause the distinct to be combined, and the
> final combine output should feed the COUNT() and SUM() in the reduce.
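
The claim that distinct is combinable can be checked with a small simulation. The following is a minimal Python sketch, not Pig's implementation: the function names (`combine_distinct`, `reduce_distinct_aggregates`, `baseline`) and the sample data are hypothetical, and each input split stands in for the output of one map task. The combiner removes duplicates per group key within a split, and the reduce side merges the partial sets before applying COUNT and SUM, mirroring the foreach in the description.

```python
from collections import defaultdict


def combine_distinct(split):
    """Map-side combine step (hypothetical): per group key ($0),
    keep each distinct tuple of this split only once."""
    partial = defaultdict(set)
    for record in split:
        partial[record[0]].add(record)
    return partial


def reduce_distinct_aggregates(partials):
    """Reduce step (hypothetical): merge the partial distinct sets, then
    compute COUNT(x) and SUM(x.$1) over the final distinct bag per key."""
    merged = defaultdict(set)
    for partial in partials:
        for key, tuples in partial.items():
            merged[key] |= tuples  # set union is associative and commutative
    return {
        key: (len(tuples), sum(t[1] for t in tuples))
        for key, tuples in merged.items()
    }


def baseline(splits):
    """No combiner: ship every record to the reducer and do distinct there."""
    groups = defaultdict(list)
    for split in splits:
        for record in split:
            groups[record[0]].append(record)
    result = {}
    for key, records in groups.items():
        distinct = set(records)
        result[key] = (len(distinct), sum(t[1] for t in distinct))
    return result


# Two splits of relation `a`, with tuples of the form ($0, $1).
splits = [
    [("a", 1), ("a", 1), ("b", 2)],
    [("a", 1), ("a", 3), ("b", 2), ("b", 5)],
]

with_combiner = reduce_distinct_aggregates(
    [combine_distinct(split) for split in splits]
)
```

Because set union gives the same result however the input is partitioned, `with_combiner` matches `baseline(splits)`, while duplicates within a split are dropped before they reach the reducer. COUNT and SUM are applied only once, on the reduce side, to the fully merged distinct set.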