Alex Behm has posted comments on this change.

Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates
......................................................................


Patch Set 8:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/6963/6/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

Line 314:         // default buffer size, e.g. with small dimension tables.
> The effective MT_DOP would depend somewhat on how the scan ranges were divi
That's correct. Ultimately the planner can only guess because we do the 
scheduling in the BE. I think we should be conservative and not underestimate, 
but at the same time the number of fragment instances can definitely not be 
higher than the total number of scan ranges.

In your example, I don't think it's safe to assume that the 4 ranges would be 
spread across 4 hosts (but that's probably what the planner will think in terms 
of #hosts). Also I don't think the number of hosts and scan ranges will vary 
dramatically in practice, because the planner estimates the number of hosts 
based on the scan-range replicas. So with a replication factor of R and N scan 
ranges you cannot have more than R*N hosts. Ultimately the number of hosts is 
not very reliable either.

I think addressing this comment may be a little involved, so how about filing a 
JIRA and deferring for now?


http://gerrit.cloudera.org:8080/#/c/6963/6/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
File fe/src/main/java/org/apache/impala/planner/HashJoinNode.java:

Line 186:     if (getChild(1).getCardinality() == -1 || 
getChild(1).getAvgRowSize() == -1
> I'm not sure what this was about - I just kept this from the existing code.
Agree


Line 190:     } else {
> This might need some more thought about how this should work with parallel 
Add a Preconditions check that asserts MT_DOP=0 with a comment that the 
estimation is not quite right with parallel plans?


http://gerrit.cloudera.org:8080/#/c/6963/8/fe/src/main/java/org/apache/impala/planner/PlanNode.java
File fe/src/main/java/org/apache/impala/planner/PlanNode.java:

Line 629:   protected final static long getDefaultSpillableBufferBytes() {
Why this indirection?


http://gerrit.cloudera.org:8080/#/c/6963/8/testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test:

Line 2: select *
use straight_join for these tests


Line 761: select string_col, count(*)
Add a similar test with no stats for the join case (or integrate into this one)


-- 
To view, visit http://gerrit.cloudera.org:8080/6963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to