Thomas Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/10394 )
Change subject: IMPALA-110 (part 2): Refactor PartitionedAggregationNode ...................................................................... Patch Set 5: (9 comments) http://gerrit.cloudera.org:8080/#/c/10394/4/be/src/common/global-flags.cc File be/src/common/global-flags.cc: http://gerrit.cloudera.org:8080/#/c/10394/4/be/src/common/global-flags.cc@123 PS4, Line 123: // Stress options that are only enabled in debug builds for testing. > What's the motivation for keeping the old implementation? Are we just conce responded to Dan's equivalent comment http://gerrit.cloudera.org:8080/#/c/10394/4/be/src/exec/aggregator.h File be/src/exec/aggregator.h: http://gerrit.cloudera.org:8080/#/c/10394/4/be/src/exec/aggregator.h@95 PS4, Line 95: /// Account for peak memory used by this aggregator. > Is the idea here to account for memory used by each aggregator separately? Yes, seems like it would be nice for supportability, and I assume that it doesn't add too much overhead, even in the case where there are a lot of Aggregators? http://gerrit.cloudera.org:8080/#/c/10394/4/be/src/exec/exec-node.cc File be/src/exec/exec-node.cc: http://gerrit.cloudera.org:8080/#/c/10394/4/be/src/exec/exec-node.cc@310 PS4, Line 310: *node = pool->Add(new AggregationNode(pool, tnode, descs)); : } > Oops, ignore the previous comment. I wrote it without re-reading this code, I added this because its generally nice to have safety valves in case of regressions (mostly I'm thinking about bugs, we've taken a lot of care to prevent perf regressions) and this was easy to do. Of course, with this patch the code is essentially identical so bugs are unlikely, but the final patch will be making some real functional changes to the logic and there's a potential for issues, though even there I think that the risk is low. Agreed that having extra stuff to maintain is unfortunate, so I'll just go ahead and remove this http://gerrit.cloudera.org:8080/#/c/10394/4/be/src/exec/grouping-aggregator.h File be/src/exec/grouping-aggregator.h: http://gerrit.cloudera.org:8080/#/c/10394/4/be/src/exec/grouping-aggregator.h@130 PS4, Line 130: /// streamed through and returned in 'out_batch'. AddBatch() and AddBatchStreaming() > is it a rule that AddBatchStreaming() and AddBatch() shouldn't both be used I think it would be fine, but since there's no reason to and its not tested, I warned against it in the comment. http://gerrit.cloudera.org:8080/#/c/10394/4/be/src/exec/grouping-aggregator.cc File be/src/exec/grouping-aggregator.cc: http://gerrit.cloudera.org:8080/#/c/10394/4/be/src/exec/grouping-aggregator.cc@91 PS4, Line 91: is_streaming_preagg_(tnode.agg_node.use_streaming_preaggregation), > should this be implied by the exec node type, now that we have a distinct e Yes but a GroupingAggregator can be used with either the StreamingAggNode or the regular AggNode and the GroupingAggregator needs to know which one it is in various places, eg. to know if we should codegen AddBatch or AddBatchStreaming http://gerrit.cloudera.org:8080/#/c/10394/4/be/src/exec/non-grouping-aggregator.h File be/src/exec/non-grouping-aggregator.h: http://gerrit.cloudera.org:8080/#/c/10394/4/be/src/exec/non-grouping-aggregator.h@103 PS4, Line 103: _' will be updated with the codegened : /// function. : /// Assumes AGGREGATED_ROWS = false. > is that still accurate? Done http://gerrit.cloudera.org:8080/#/c/10394/4/be/src/exec/non-grouping-aggregator.h@107 PS4, Line 107: mCodeGen* codegen, > also the naming seems inconsistent with ProcessBatchNoGrouping(). The NoGro Done http://gerrit.cloudera.org:8080/#/c/10394/4/be/src/exec/non-grouping-aggregator.cc File be/src/exec/non-grouping-aggregator.cc: http://gerrit.cloudera.org:8080/#/c/10394/4/be/src/exec/non-grouping-aggregator.cc@123 PS4, Line 123: AddBatchImpl(batch)); > given that this is no 1:1 with AddBatch(), maybe just call it AddBatchWork( How about AddBatchImpl? http://gerrit.cloudera.org:8080/#/c/10394/4/be/src/exec/streaming-aggregation-node.h File be/src/exec/streaming-aggregation-node.h: http://gerrit.cloudera.org:8080/#/c/10394/4/be/src/exec/streaming-aggregation-node.h@35 PS4, Line 35: f there is not enough memory available > I think it also stops aggregating if it's not seeing a good reduction? if s Done -- To view, visit http://gerrit.cloudera.org:8080/10394 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9e7bb583f54aa4add3738bde7f57cf3511ac567e Gerrit-Change-Number: 10394 Gerrit-PatchSet: 5 Gerrit-Owner: Thomas Marshall <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Thomas Marshall <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Reviewer: Vuk Ercegovac <[email protected]> Gerrit-Comment-Date: Wed, 27 Jun 2018 00:23:25 +0000 Gerrit-HasComments: Yes
