Tim Armstrong has posted comments on this change. Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates ......................................................................
Patch Set 6: (7 comments) http://gerrit.cloudera.org:8080/#/c/6963/6/fe/src/main/java/org/apache/impala/planner/AggregationNode.java File fe/src/main/java/org/apache/impala/planner/AggregationNode.java: Line 314: bufferSize = Math.min(bufferSize, Math.max(minSpillableBufferBytes_, > Estimates are still conservative but more accurate than before - great! The The effective MT_DOP would depend somewhat on how the scan ranges were divided between hosts too, right? E.g. if we had 4 scan ranges across 100 hosts, and mt_dop=4, could the planner assume that the scan ranges end up on different hosts? http://gerrit.cloudera.org:8080/#/c/6963/6/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java File fe/src/main/java/org/apache/impala/planner/HashJoinNode.java: Line 186: || numNodes_ == 0) { > What about numNodes_ == -1? I'm not sure what this was about - I just kept this from the existing code. We don't actually use num_nodes_ in the calculation so maybe it's defunct and we should instead use num_instances_. Line 190: perInstanceDataBytes = (long) Math.ceil(getChild(1).cardinality_ > For broadcast, it seems we should count the full HT size only once per node This might need some more thought about how this should work with parallel plans and separate join builds. It seems like once that's separated out more, we would compute the resource requirement for the single instance of the join build sink, rather than divide the requirement between the different fragment instances. Line 193: perInstanceDataBytes /= fragment_.getNumInstances(queryOptions.getMt_dop()); > getNumInstances() could return -1 Done http://gerrit.cloudera.org:8080/#/c/6963/6/fe/src/main/java/org/apache/impala/util/BitUtil.java File fe/src/main/java/org/apache/impala/util/BitUtil.java: Line 1: package org.apache.impala.util; > Apache header Done http://gerrit.cloudera.org:8080/#/c/6963/6/fe/src/test/java/org/apache/impala/planner/PlannerTest.java File fe/src/test/java/org/apache/impala/planner/PlannerTest.java: Line 409: // vary the spillable buffer size.. > trailing "." Done Line 415: PlanNode.minSpillableBufferBytes_ = 64 * 1024; > Would it make sense to move this into RuntimeEnv. Setting a static variable I didn't realise there was already a thing that existed for this purpose. Moved it there. Also changed it so that the default buffer size was derived from --read_size rather than hardcoded. -- To view, visit http://gerrit.cloudera.org:8080/6963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
