Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates
......................................................................


Patch Set 6:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/6963/6/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

Line 314:         bufferSize = Math.min(bufferSize, 
Math.max(minSpillableBufferBytes_,
> Estimates are still conservative but more accurate than before - great! The
The effective MT_DOP would depend somewhat on how the scan ranges were divided 
between hosts too, right? E.g. if we had 4 scan ranges across 100 hosts, and 
mt_dop=4, could the planner assume that the scan ranges end up on different 
hosts?


http://gerrit.cloudera.org:8080/#/c/6963/6/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
File fe/src/main/java/org/apache/impala/planner/HashJoinNode.java:

Line 186:         || numNodes_ == 0) {
> What about numNodes_ == -1?
I'm not sure what this was about - I just kept this from the existing code. We 
don't actually use num_nodes_ in the calculation so maybe it's defunct and we 
should instead use num_instances_.


Line 190:       perInstanceDataBytes = (long) Math.ceil(getChild(1).cardinality_
> For broadcast, it seems we should count the full HT size only once per node
This might need some more thought about how this should work with parallel 
plans and separate join builds. It seems like once that's separated out more, 
we would compute the resource requirement for the single instance of the join 
build sink, rather than divide the requirement between the different fragment 
instances.


Line 193:         perInstanceDataBytes /= 
fragment_.getNumInstances(queryOptions.getMt_dop());
> getNumInstances() could return -1
Done


http://gerrit.cloudera.org:8080/#/c/6963/6/fe/src/main/java/org/apache/impala/util/BitUtil.java
File fe/src/main/java/org/apache/impala/util/BitUtil.java:

Line 1: package org.apache.impala.util;
> Apache header
Done


http://gerrit.cloudera.org:8080/#/c/6963/6/fe/src/test/java/org/apache/impala/planner/PlannerTest.java
File fe/src/test/java/org/apache/impala/planner/PlannerTest.java:

Line 409:     // vary the spillable buffer size..
> trailing "."
Done


Line 415:     PlanNode.minSpillableBufferBytes_ = 64 * 1024;
> Would it make sense to move this into RuntimeEnv. Setting a static variable
I didn't realise there was already a thing that existed for this purpose. Moved 
it there. Also changed it so that the default buffer size was derived from 
--read_size rather than hardcoded.


-- 
To view, visit http://gerrit.cloudera.org:8080/6963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes

Reply via email to