[ https://issues.apache.org/jira/browse/PIG-580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12660944#action_12660944 ]
alangates edited comment on PIG-580 at 1/5/09 1:51 PM:
--------------------------------------------------------

In CombinerOptimizer.visitDistinct you have:

{code}
+        if(sawDistinctAgg) {
+            // we want to combine only in the case where there is only
+            // one PODistinct which is the only input to an agg
+            // we apparently have seen a PODistinct before, so lets not
+            // combine.
+            sawNonAlgebraic = true;
+        }
{code}

but I can envision a case where you want to count multiple distinct things:

{code}
A = load ...
B = group A by $0;
C = foreach B {
    Aa = A.$1;
    Ab = distinct Aa;
    Ba = A.$2;
    Bb = distinct Ba;
    generate group, COUNT(Ab), COUNT(Bb);
}
{code}

Is there a reason not to use the combiner with multiple distincts?
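To make the multiple-distinct case concrete, here is a minimal Java sketch of what combining two independent distincts could look like on plain sets. It is not Pig code: the class MultiDistinctSketch and its methods are hypothetical names, each row is reduced to the two projected fields ($1 and $2), and the sketch says nothing about whether Pig's physical plan can carry two distinct bags through the combiner, which is the open question above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not part of Pig or of the PIG-580 patch.
class MultiDistinctSketch {

    // Combine step: for one map task's rows of a single group, keep a
    // separate distinct set per projected column (r[0] is $1, r[1] is $2).
    static List<Set<String>> combine(List<String[]> rows) {
        Set<String> col1 = new HashSet<>();
        Set<String> col2 = new HashSet<>();
        for (String[] r : rows) {
            col1.add(r[0]);
            col2.add(r[1]);
        }
        return Arrays.asList(col1, col2);
    }

    // Reduce step: union the partial sets column by column, then count.
    // Returns { COUNT(distinct $1), COUNT(distinct $2) }.
    static long[] reduce(List<List<Set<String>>> partials) {
        Set<String> col1 = new HashSet<>();
        Set<String> col2 = new HashSet<>();
        for (List<Set<String>> p : partials) {
            col1.addAll(p.get(0));
            col2.addAll(p.get(1));
        }
        return new long[] { col1.size(), col2.size() };
    }

    public static void main(String[] args) {
        List<String[]> map1 = new ArrayList<>();
        map1.add(new String[] { "a", "x" });
        map1.add(new String[] { "a", "y" });
        List<String[]> map2 = new ArrayList<>();
        map2.add(new String[] { "b", "x" });
        map2.add(new String[] { "c", "x" });

        long[] counts = reduce(Arrays.asList(combine(map1), combine(map2)));
        System.out.println(counts[0] + " " + counts[1]); // prints "3 2"
    }
}
```

The two sets never interact, so at this level each distinct can be combined on its own; any restriction would have to come from how the combine plan is built, not from the distinct operation.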
> PERFORMANCE: Combiner should also be used when there are distinct aggregates
> in a foreach following a group provided there are no non-algebraics in the
> foreach
> ----------------------------------------------------------------------------
>
>                 Key: PIG-580
>                 URL: https://issues.apache.org/jira/browse/PIG-580
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: types_branch
>            Reporter: Pradeep Kamath
>            Assignee: Pradeep Kamath
>             Fix For: types_branch
>
>         Attachments: PIG-580-v2.patch, PIG-580.patch
>
>
> Currently Pig uses the combiner only when a foreach follows a group and
> every element in the foreach's generate is one of the following:
> 1) simple project of the "group" column
> 2) Algebraic UDF
> The above conditions exclude use of the combiner for distinct aggregates,
> even though the distinct operation itself is combinable (irrespective of
> whether it feeds an algebraic or non-algebraic UDF). So the following
> foreach should also be combinable:
> {code}
> ..
> b = group a by $0;
> c = foreach b { x = distinct a; generate group, COUNT(x), SUM(x.$1); }
> {code}
> The combiner optimizer should cause the distinct to be combined, and the
> final combine output should feed the COUNT() and SUM() in the reduce.
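The claim that distinct is combinable can be illustrated with a small Java sketch. It is a simplification under stated assumptions, not Pig's implementation: DistinctCombinerSketch and its methods are hypothetical names, and each tuple of a group is reduced to a single long value so that COUNT and SUM can run over the same distinct set.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; this is not Pig's CombinerOptimizer code.
class DistinctCombinerSketch {

    // Combine step: collapse one map task's values for a group into a
    // distinct set, so fewer records are shipped to the reducer.
    static Set<Long> combine(List<Long> mapOutput) {
        return new LinkedHashSet<>(mapOutput);
    }

    // Reduce step: merge the partial distinct sets. The union removes
    // duplicates that occur across map tasks, giving the global distinct.
    static Set<Long> reduce(List<Set<Long>> partials) {
        Set<Long> merged = new LinkedHashSet<>();
        for (Set<Long> p : partials) {
            merged.addAll(p);
        }
        return merged;
    }

    // The aggregates run in the reduce, on the final distinct output.
    static long count(Set<Long> xs) {
        return xs.size();
    }

    static long sum(Set<Long> xs) {
        long total = 0;
        for (long x : xs) {
            total += x;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Long> map1 = Arrays.asList(1L, 2L, 2L, 3L);
        List<Long> map2 = Arrays.asList(3L, 3L, 4L);

        Set<Long> x = reduce(Arrays.asList(combine(map1), combine(map2)));
        System.out.println(count(x) + " " + sum(x)); // prints "4 10"
    }
}
```

Because the union of per-task distinct sets equals the distinct of all values, the aggregates see the same input whether or not the combine step ran; only the volume of data sent to the reduce changes.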