Thomas Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10394 )

Change subject: IMPALA-110 (part 2): Refactor PartitionedAggregationNode
......................................................................


Patch Set 5:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/10394/4/be/src/common/global-flags.cc
File be/src/common/global-flags.cc:

http://gerrit.cloudera.org:8080/#/c/10394/4/be/src/common/global-flags.cc@123
PS4, Line 123: // Stress options that are only enabled in debug builds for 
testing.
> What's the motivation for keeping the old implementation? Are we just conce
responded to Dan's equivalent comment


http://gerrit.cloudera.org:8080/#/c/10394/4/be/src/exec/aggregator.h
File be/src/exec/aggregator.h:

http://gerrit.cloudera.org:8080/#/c/10394/4/be/src/exec/aggregator.h@95
PS4, Line 95:   /// Account for peak memory used by this aggregator.
> Is the idea here to account for memory used by each aggregator separately?
Yes, seems like it would be nice for supportability, and I assume that it 
doesn't add too much overhead, even in the case where there are a lot of 
Aggregators?


http://gerrit.cloudera.org:8080/#/c/10394/4/be/src/exec/exec-node.cc
File be/src/exec/exec-node.cc:

http://gerrit.cloudera.org:8080/#/c/10394/4/be/src/exec/exec-node.cc@310
PS4, Line 310:        *node = pool->Add(new AggregationNode(pool, tnode, 
descs));
             :       }
> Oops, ignore the previous comment. I wrote it without re-reading this code,
I added this because its generally nice to have safety valves in case of 
regressions (mostly I'm thinking about bugs, we've taken a lot of care to 
prevent perf regressions) and this was easy to do.

Of course, with this patch the code is essentially identical so bugs are 
unlikely, but the final patch will be making some real functional changes to 
the logic and there's a potential for issues, though even there I think that 
the risk is low.

Agreed that having extra stuff to maintain is unfortunate, so I'll just go 
ahead and remove this


http://gerrit.cloudera.org:8080/#/c/10394/4/be/src/exec/grouping-aggregator.h
File be/src/exec/grouping-aggregator.h:

http://gerrit.cloudera.org:8080/#/c/10394/4/be/src/exec/grouping-aggregator.h@130
PS4, Line 130:   /// streamed through and returned in 'out_batch'. AddBatch() 
and AddBatchStreaming()
> is it a rule that AddBatchStreaming() and AddBatch() shouldn't both be used
I think it would be fine, but since there's no reason to and its not tested, I 
warned against it in the comment.


http://gerrit.cloudera.org:8080/#/c/10394/4/be/src/exec/grouping-aggregator.cc
File be/src/exec/grouping-aggregator.cc:

http://gerrit.cloudera.org:8080/#/c/10394/4/be/src/exec/grouping-aggregator.cc@91
PS4, Line 91: is_streaming_preagg_(tnode.agg_node.use_streaming_preaggregation),
> should this be implied by the exec node type, now that we have a distinct e
Yes but a GroupingAggregator can be used with either the StreamingAggNode or 
the regular AggNode and the GroupingAggregator needs to know which one it is in 
various places, eg. to know if we should codegen AddBatch or AddBatchStreaming


http://gerrit.cloudera.org:8080/#/c/10394/4/be/src/exec/non-grouping-aggregator.h
File be/src/exec/non-grouping-aggregator.h:

http://gerrit.cloudera.org:8080/#/c/10394/4/be/src/exec/non-grouping-aggregator.h@103
PS4, Line 103: _' will be updated with the codegened
             :   /// function.
             :   /// Assumes AGGREGATED_ROWS = false.
> is that still accurate?
Done


http://gerrit.cloudera.org:8080/#/c/10394/4/be/src/exec/non-grouping-aggregator.h@107
PS4, Line 107: mCodeGen* codegen,
> also the naming seems inconsistent with ProcessBatchNoGrouping(). The NoGro
Done


http://gerrit.cloudera.org:8080/#/c/10394/4/be/src/exec/non-grouping-aggregator.cc
File be/src/exec/non-grouping-aggregator.cc:

http://gerrit.cloudera.org:8080/#/c/10394/4/be/src/exec/non-grouping-aggregator.cc@123
PS4, Line 123: AddBatchImpl(batch));
> given that this is no 1:1 with AddBatch(), maybe just call it AddBatchWork(
How about AddBatchImpl?


http://gerrit.cloudera.org:8080/#/c/10394/4/be/src/exec/streaming-aggregation-node.h
File be/src/exec/streaming-aggregation-node.h:

http://gerrit.cloudera.org:8080/#/c/10394/4/be/src/exec/streaming-aggregation-node.h@35
PS4, Line 35: f there is not enough memory available
> I think it also stops aggregating if it's not seeing a good reduction? if s
Done



--
To view, visit http://gerrit.cloudera.org:8080/10394
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9e7bb583f54aa4add3738bde7f57cf3511ac567e
Gerrit-Change-Number: 10394
Gerrit-PatchSet: 5
Gerrit-Owner: Thomas Marshall <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Thomas Marshall <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Vuk Ercegovac <[email protected]>
Gerrit-Comment-Date: Wed, 27 Jun 2018 00:23:25 +0000
Gerrit-HasComments: Yes

Reply via email to