Alex Behm has posted comments on this change. Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates ......................................................................
Patch Set 8: (6 comments) http://gerrit.cloudera.org:8080/#/c/6963/6/fe/src/main/java/org/apache/impala/planner/AggregationNode.java File fe/src/main/java/org/apache/impala/planner/AggregationNode.java: Line 314: // default buffer size, e.g. with small dimension tables. > The effective MT_DOP would depend somewhat on how the scan ranges were divi That's correct. Ultimately the planner can only guess because we do the scheduling in the BE. I think we should be conservative and not underestimate, but at the same time the number of fragment instances can definitely not be higher than the total number of scan ranges. In your example, I don't think it's safe to assume that the 4 ranges would be spread across 4 hosts (but that's probably what the planner will think in terms of #hosts). Also I don't think the number of hosts and scan ranges will vary dramatically in practice, because the planner estimates the number of hosts based on the scan-range replicas. So with a replication factor of R and N scan ranges you cannot have more than R*N hosts. Ultimately the number of hosts is not very reliable either. I think addressing this comment may be a little involved, so how about filing a JIRA and deferring for now? http://gerrit.cloudera.org:8080/#/c/6963/6/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java File fe/src/main/java/org/apache/impala/planner/HashJoinNode.java: Line 186: if (getChild(1).getCardinality() == -1 || getChild(1).getAvgRowSize() == -1 > I'm not sure what this was about - I just kept this from the existing code. Agree Line 190: } else { > This might need some more thought about how this should work with parallel Add a Preconditions check that asserts MT_DOP=0 with a comment that the estimation is not quite right with parallel plans? http://gerrit.cloudera.org:8080/#/c/6963/8/fe/src/main/java/org/apache/impala/planner/PlanNode.java File fe/src/main/java/org/apache/impala/planner/PlanNode.java: Line 629: protected final static long getDefaultSpillableBufferBytes() { Why this indirection? http://gerrit.cloudera.org:8080/#/c/6963/8/testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test File testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test: Line 2: select * use straight_join for these tests Line 761: select string_col, count(*) Add a similar test with no stats for the join case (or integrate into this one) -- To view, visit http://gerrit.cloudera.org:8080/6963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-HasComments: Yes