Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3748: estimate minimum buffers in planner
......................................................................
Patch Set 8:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/5847/8/fe/src/main/java/org/apache/impala/planner/PipelinedPlanNodeSet.java
File fe/src/main/java/org/apache/impala/planner/PipelinedPlanNodeSet.java:

Line 48: public class ResourceRequirement {
> these aren't requirements if they include the existing "estimates". Resourc
Done

Line 115: node.computeCosts(queryOptions);
> they're not costs
Renamed to computeResourceConsumption(). I don't feel strongly about the name - please let me know if you have a better idea.

Line 156: hdfsScanMemEstimate + hbaseScanMemEstimate + nonScanMemEstimate + dataSinkMemEstimate);
> long line
Done

http://gerrit.cloudera.org:8080/#/c/5847/8/fe/src/main/java/org/apache/impala/planner/PlanFragment.java
File fe/src/main/java/org/apache/impala/planner/PlanFragment.java:

Line 209: public int getNumInstancesPerHost(TQueryOptions queryOptions) {
> passing in the dop directly is preferable, because it's easier to see what'
Done

Line 232: public long getNumDistinctValues(TQueryOptions queryOptions, List<Expr> exprs) {
> rename to include "per instance" in the name to clarify. you might also wan
Done

http://gerrit.cloudera.org:8080/#/c/5847/8/fe/src/main/java/org/apache/impala/planner/PlanNode.java
File fe/src/main/java/org/apache/impala/planner/PlanNode.java:

Line 120: // estimated per-instance memory usage for this plan node in bytes;
> the "estimates" don't really have defined semantics (max? desirable?). we m
I added a comment to point to IMPALA-5013, which tracks that problem. Some of the estimates (HDFS scanner, join, agg) are reasonable and we might want to use them as a basis for estimating "ideal" memory once this evolves further. Reworded to avoid "usage".

Line 124: // Minimum number of buffers required to execute spilling operator.
> why restrict this to spilling? doesn't every operator have a defined requir
That's true. Currently the requirement is 0 for non-spilling operators, but that will change as IMPALA-4834 progresses.
Updated the comment accordingly to be clearer.

Line 281: TQueryOptions queryOptions, TExplainLevel detailLevel) {
> what's the role of the query options?
It's needed to get mt_dop. This particular interface is at a high level of abstraction, so passing in mt_dop as an int seems a little too specific.

Line 317: PrintUtils.printInstances(" ", fragment_.getNumInstances(queryOptions)));
> why do we want this?
I think it makes it much easier to interpret the plans with mt_dop > 1. Otherwise it's not immediately obvious what parallelism each node executes with, and in particular how the per-instance estimates relate to the total per-host number. You suggested in a different comment that we should move it to the plan fragment, so I did that.

Line 322: PrintUtils.printPerInstanceMinBuffers(" ", perInstanceMinBufferBytes_));
> why make this conditional?
The output would be confusing if it were user-visible, because the backend exec nodes often execute with less memory than the minimum buffers (because of the small-buffers optimisation, etc.). It'll be more actionable once we have true reservations.

http://gerrit.cloudera.org:8080/#/c/5847/8/testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
File testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test:

Line 10: | hosts=1 instances=1 est-per-instance-mem=0B
> how come the number of hosts changed?
Because it's a single-node plan - I adjusted the approach so that the number of hosts more accurately reflects how the planner expects the plan to execute. It's now based on PlanFragment.getNumNodes() (the number of instances of the fragment expected) vs. PlanNode.getNumNodes() (an estimate that feeds into the decision of how to partition a fragment). I think this will be even clearer if I move hosts/instances to the fragment level. The previous approach led to various anomalies, e.g. hosts could be different for PlanNodes in the same fragment.
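As a side note, the hosts/instances/per-instance-memory relationship discussed above can be sketched roughly as follows. This is only an illustrative approximation, not Impala's actual planner code: the method names are borrowed loosely from the review, and the mt_dop semantics shown (0 meaning one instance per host) are an assumption.

```java
// Illustrative sketch only -- not Impala's actual implementation.
public class InstanceCountSketch {
  // Degree of parallelism per host, derived from the mt_dop query option.
  // Assumes mt_dop == 0 means "one instance per host".
  static int getNumInstancesPerHost(int mtDop) {
    return mtDop == 0 ? 1 : mtDop;
  }

  // Total fragment instances across the cluster.
  static int getNumInstances(int numHosts, int mtDop) {
    return numHosts * getNumInstancesPerHost(mtDop);
  }

  // A per-host estimate is the per-instance estimate scaled by the
  // number of instances expected to run on each host.
  static long perHostMemEstimate(long perInstanceBytes, int mtDop) {
    return perInstanceBytes * getNumInstancesPerHost(mtDop);
  }

  public static void main(String[] args) {
    System.out.println(getNumInstances(3, 4));          // total instances
    System.out.println(perHostMemEstimate(1 << 20, 4)); // bytes per host
  }
}
```

Under this model, displaying both numbers in the explain output lets a reader reconcile the per-instance estimates with the per-host totals when mt_dop > 1.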
http://gerrit.cloudera.org:8080/#/c/5847/5/testdata/workloads/functional-planner/queries/PlannerTest/kudu-selectivity.test
File testdata/workloads/functional-planner/queries/PlannerTest/kudu-selectivity.test:

Line 15: hosts=1 instances=1 est-per-instance-mem=0B
> regarding the number of fragment instances: that's really a per-fragment (r
I ended up making this change at explain levels 2 and 3. I've updated some of the affected tests so you can see how the new output looks. If we agree, I can go through and fix the remaining tests (I don't have a working Kudu instance locally, so it's a bit labour-intensive to update some of the tests).

http://gerrit.cloudera.org:8080/#/c/5847/8/testdata/workloads/functional-query/queries/QueryTest/explain-level0.test
File testdata/workloads/functional-query/queries/QueryTest/explain-level0.test:

Line 3
> why change this?
These tests were disabled for years. With all these complex queries, I think they would require frequent updating for unrelated changes (e.g. planner changes). The new tests are simpler but should provide reasonable coverage of the output format of "explain".

Line 8: 'Per-Host Resource Estimates: Memory=388.41MB'
> how come no resource requirements?
Display of them is currently disabled (aside from PlannerTests). These tests reflect what we expect users to see when they run an explain.

http://gerrit.cloudera.org:8080/#/c/5847/8/testdata/workloads/functional-query/queries/QueryTest/explain-level3.test
File testdata/workloads/functional-query/queries/QueryTest/explain-level3.test:

Line 8: 'Per-Host Resource Estimates: Memory=388.41MB'
> how come this is missing the requirements?
They're deliberately disabled for now.

--
To view, visit http://gerrit.cloudera.org:8080/5847
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1e358182bcf2bc5fe5c73883eb97878735b12d37
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Matthew Jacobs <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes
