Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3748: estimate minimum buffers in planner
......................................................................
Patch Set 8:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/5847/8/fe/src/main/java/org/apache/impala/planner/PipelinedPlanNodeSet.java
File fe/src/main/java/org/apache/impala/planner/PipelinedPlanNodeSet.java:

Line 48: public class ResourceRequirement {
> these aren't requirements if they include the existing "estimates". Resourc
Done

Line 115: node.computeCosts(queryOptions);
> they're not costs
Renamed to computeResourceConsumption(). I don't feel strongly about the name - please let me know if you have a better idea.

Line 156: hdfsScanMemEstimate + hbaseScanMemEstimate + nonScanMemEstimate + dataSinkMemEstimate);
> long line
Done

http://gerrit.cloudera.org:8080/#/c/5847/8/fe/src/main/java/org/apache/impala/planner/PlanFragment.java
File fe/src/main/java/org/apache/impala/planner/PlanFragment.java:

Line 209: public int getNumInstancesPerHost(TQueryOptions queryOptions) {
> passing in the dop directly is preferable, because it's easier to see what'
Done

Line 232: public long getNumDistinctValues(TQueryOptions queryOptions, List<Expr> exprs) {
> rename to include "per instance" in the name to clarify. you might also wan
Done

http://gerrit.cloudera.org:8080/#/c/5847/8/fe/src/main/java/org/apache/impala/planner/PlanNode.java
File fe/src/main/java/org/apache/impala/planner/PlanNode.java:

Line 120: // estimated per-instance memory usage for this plan node in bytes;
> the "estimates" don't really have defined semantics (max? desirable?). we m
I added a comment to point to IMPALA-5013, which tracks that problem. Some of the estimates (HDFS scanner, join, agg) are reasonable and we might want to use them as a basis for estimating "ideal" memory once this evolves further. Reworded to avoid "usage".

Line 124: // Minimum number of buffers required to execute spilling operator.
> why restrict this to spilling? doesn't every operator have a defined requir
That's true. Currently the requirement is 0 for non-spilling operators, but that will change as IMPALA-4834 progresses.
Updated the comment accordingly to be clearer.

Line 281: TQueryOptions queryOptions, TExplainLevel detailLevel) {
> what's the role of the query options?
It's needed to get mt_dop. This particular interface is at a high level of abstraction, so passing in mt_dop as an int seems a little too specific.

Line 317: PrintUtils.printInstances(" ", fragment_.getNumInstances(queryOptions)));
> why do we want this?
I think it makes it much easier to interpret the plans with mt_dop > 1. Otherwise it's not immediately obvious what parallelism each node executes with, and in particular how the per-instance estimates relate to the total per-host number. You suggested in a different comment that we should move it to the plan fragment, so I did that.

Line 322: PrintUtils.printPerInstanceMinBuffers(" ", perInstanceMinBufferBytes_));
> why make this conditional?
The output would be confusing if it were user-visible, because the backend exec nodes often execute with less memory than the minimum buffers (because of the small-buffers optimisation, etc.). It'll be more actionable once we have true reservations.

http://gerrit.cloudera.org:8080/#/c/5847/8/testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
File testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test:

Line 10: | hosts=1 instances=1 est-per-instance-mem=0B
> how come the number of hosts changed?
Because it's a single-node plan - I adjusted the approach so that the number of hosts more accurately reflects how the planner expects the plan to execute. It's now based on PlanFragment.getNumNodes() (the number of instances of the fragment expected) vs. PlanNode.getNumNodes() (an estimate that feeds into the decision of how to partition a fragment). I think this will be even clearer if I move hosts/instances to the fragment level. The previous approach led to various anomalies, e.g. hosts could be different for PlanNodes in the same fragment.
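As a side note, the hosts/instances/per-instance-memory relationship discussed above can be sketched roughly as follows. This is only an illustrative approximation, not Impala's actual planner code: the method names are borrowed loosely from the review, and the mt_dop semantics shown (0 meaning one instance per host) are an assumption.

```java
// Illustrative sketch only -- not Impala's actual implementation.
public class InstanceCountSketch {
  // Degree of parallelism per host, derived from the mt_dop query option.
  // Assumes mt_dop == 0 means "one instance per host".
  static int getNumInstancesPerHost(int mtDop) {
    return mtDop == 0 ? 1 : mtDop;
  }

  // Total fragment instances across the cluster.
  static int getNumInstances(int numHosts, int mtDop) {
    return numHosts * getNumInstancesPerHost(mtDop);
  }

  // A per-host estimate is the per-instance estimate scaled by the
  // number of instances expected to run on each host.
  static long perHostMemEstimate(long perInstanceBytes, int mtDop) {
    return perInstanceBytes * getNumInstancesPerHost(mtDop);
  }

  public static void main(String[] args) {
    System.out.println(getNumInstances(3, 4));          // total instances
    System.out.println(perHostMemEstimate(1 << 20, 4)); // bytes per host
  }
}
```

Under this model, displaying both numbers in the explain output lets a reader reconcile the per-instance estimates with the per-host totals when mt_dop > 1.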
http://gerrit.cloudera.org:8080/#/c/5847/5/testdata/workloads/functional-planner/queries/PlannerTest/kudu-selectivity.test
File testdata/workloads/functional-planner/queries/PlannerTest/kudu-selectivity.test:

Line 15: hosts=1 instances=1 est-per-instance-mem=0B
> regarding the number of fragment instances: that's really a per-fragment (r
I ended up making this change at explain levels 2 and 3. I've updated some of the affected tests so you can see how the new output looks. If we agree, I can go through and fix the remaining tests (I don't have a working Kudu instance locally, so it's a bit labour-intensive to update some of the tests).

http://gerrit.cloudera.org:8080/#/c/5847/8/testdata/workloads/functional-query/queries/QueryTest/explain-level0.test
File testdata/workloads/functional-query/queries/QueryTest/explain-level0.test:

Line 3
> why change this?
These tests were disabled for years. With all these complex queries, I think they would require frequent updating for unrelated changes (e.g. planner changes). The new tests are simpler but should provide reasonable coverage of the output format of "explain".

Line 8: 'Per-Host Resource Estimates: Memory=388.41MB'
> how come no resource requirements?
Display of them is currently disabled (aside from PlannerTests). These tests reflect what we expect users to see when they run an explain.

http://gerrit.cloudera.org:8080/#/c/5847/8/testdata/workloads/functional-query/queries/QueryTest/explain-level3.test
File testdata/workloads/functional-query/queries/QueryTest/explain-level3.test:

Line 8: 'Per-Host Resource Estimates: Memory=388.41MB'
> how come this is missing the requirements?
They're deliberately disabled for now.

--
To view, visit http://gerrit.cloudera.org:8080/5847
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1e358182bcf2bc5fe5c73883eb97878735b12d37
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Matthew Jacobs <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes
