Tim Armstrong has posted comments on this change. Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates ......................................................................
Patch Set 8: (6 comments) http://gerrit.cloudera.org:8080/#/c/6963/6/fe/src/main/java/org/apache/impala/planner/AggregationNode.java File fe/src/main/java/org/apache/impala/planner/AggregationNode.java: Line 314: // default buffer size, e.g. with small dimension tables. > That's correct. Ultimately the planner can only guess because we do the sch Filed IMPALA-5565. I'm still a little hazy on the exact scope of what needs to be fixed so it might be helpful to add any ideas you had in there. http://gerrit.cloudera.org:8080/#/c/6963/6/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java File fe/src/main/java/org/apache/impala/planner/HashJoinNode.java: Line 186: if (getChild(1).getCardinality() == -1 || getChild(1).getAvgRowSize() == -1 > Agree Done. Made it check for <= 0 since there are a couple of bugs in the numNodes calculation that I fixed in https://gerrit.cloudera.org/#/c/7223/ where it could return 0. Line 190: } else { > Add a Preconditions check that asserts MT_DOP=0 with a comment that the est We still need to generate plans in planner tests, so I made the DCHECK conditional on whether it's a test environment. http://gerrit.cloudera.org:8080/#/c/6963/8/fe/src/main/java/org/apache/impala/planner/PlanNode.java File fe/src/main/java/org/apache/impala/planner/PlanNode.java: Line 629: protected final static long getDefaultSpillableBufferBytes() { > Why this indirection? To make the intent clear at the callsites. The HDFS read size and the spillable buffer size will be decoupled in IMPALA-3200. It will be easier to fix that up if we make the distinction in the planner already. http://gerrit.cloudera.org:8080/#/c/6963/8/testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test File testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test: Line 2: select * > use straight_join for these tests Done Line 761: select string_col, count(*) > Add a similar test with no stats for the join case (or integrate into this I have a test above " Join with no stats for right input - should use default buffers." Or did you have an additional test in mind? -- To view, visit http://gerrit.cloudera.org:8080/6963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
