Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22926 )

Change subject: IMPALA-14574: Lower memory estimate by analyzing 
PipelineMembership
......................................................................


Patch Set 14:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/22926/14//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/22926/14//COMMIT_MSG@92
PS14, Line 92:   scale 10. No major regression is observed.
Were there any improvements to note? I guess perf tests run one at a time, so 
we wouldn't see any spilling to reduce or higher query concurrency. Do we have 
concurrency stress tests we could try out?


http://gerrit.cloudera.org:8080/#/c/22926/14/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/22926/14/be/src/service/query-options.cc@1437
PS14, Line 1437:         RETURN_IF_ERROR(GetThriftEnum(value, "Memory 
estimation mode",
nit: it might be better to stick with "Memory estimate mode" here. Consistency 
can make it easier to find things even if it's grammatically a little odd.


http://gerrit.cloudera.org:8080/#/c/22926/14/fe/src/main/java/org/apache/impala/planner/PipelineBasedMemEstimator.java
File fe/src/main/java/org/apache/impala/planner/PipelineBasedMemEstimator.java:

http://gerrit.cloudera.org:8080/#/c/22926/14/fe/src/main/java/org/apache/impala/planner/PipelineBasedMemEstimator.java@54
PS14, Line 54:     long minMemPerInstance = 
fragment.getPerIntstanceMinMemOverestimation();
typo: getPerInstanceMinMemOverestimation?


http://gerrit.cloudera.org:8080/#/c/22926/14/fe/src/main/java/org/apache/impala/planner/PipelineBasedMemEstimator.java@60
PS14, Line 60:    * allFragments must be arranged in reversed pre-order 
traversal (i.e., bottom-up).
Why does the order of allFragments matter?


http://gerrit.cloudera.org:8080/#/c/22926/14/fe/src/main/java/org/apache/impala/planner/PipelineBasedMemEstimator.java@104
PS14, Line 104:       PlanFragment rootFragment, TQueryOptions queryOptions) {
I'd add trace logging before the loop that indicates something about the plan 
being estimated. Maybe just identify the rootFragment.


http://gerrit.cloudera.org:8080/#/c/22926/14/fe/src/main/java/org/apache/impala/planner/PipelineBasedMemEstimator.java@167
PS14, Line 167:             // pm is an OPEN pipeline.
Are GETNEXT and OPEN really the only two options here? What guarantees that?


http://gerrit.cloudera.org:8080/#/c/22926/14/fe/src/main/java/org/apache/impala/planner/PipelineBasedMemEstimator.java@219
PS14, Line 219:           for (PipelineMembership openPipe : openPipelines) {
This is pretty deeply nested. Consider breaking some of this function into 
separate functions.


http://gerrit.cloudera.org:8080/#/c/22926/14/fe/src/main/java/org/apache/impala/planner/PlanFragment.java
File fe/src/main/java/org/apache/impala/planner/PlanFragment.java:

http://gerrit.cloudera.org:8080/#/c/22926/14/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@470
PS14, Line 470:     // There are several PlanNodes that do not reserve initial 
memory (0), but set
We could also fix this by ensuring every node sets an initial memory 
reservation. I guess your approach is less disruptive to existing planning.


http://gerrit.cloudera.org:8080/#/c/22926/14/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

http://gerrit.cloudera.org:8080/#/c/22926/14/fe/src/main/java/org/apache/impala/planner/Planner.java@714
PS14, Line 714:       LOG.info("maxPerHostEstimateBytes={}, 
memEstimateBytes={}, "
What does this look like in-context? It might be nice to provide a 
human-readable descriptor about why it's being included like: "Using 
MAX_PIPELINE estimate: ".


http://gerrit.cloudera.org:8080/#/c/22926/14/fe/src/main/java/org/apache/impala/planner/Planner.java@1097
PS14, Line 1097:   private static boolean hasIncompleteStats(PlanFragment 
rootFragment) {
We provide a warning about incomplete stats elsewhere. I'm surprised this 
helper needed to be created.


http://gerrit.cloudera.org:8080/#/c/22926/14/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q23a.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q23a.test:

http://gerrit.cloudera.org:8080/#/c/22926/14/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q23a.test@513
PS14, Line 513: Per-Host Resource Estimates: Memory=41.59GB
This is a substantial change. Nice sign that it could be beneficial in some 
cases.

Have you tried running it to see if actual memory max memory use stays within 
the estimate?



--
To view, visit http://gerrit.cloudera.org:8080/22926
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iba4f40993d8360bfdc1eb592a626e285dfed1627
Gerrit-Change-Number: 22926
Gerrit-PatchSet: 14
Gerrit-Owner: Riza Suminto <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Comment-Date: Tue, 06 Jan 2026 00:37:29 +0000
Gerrit-HasComments: Yes

Reply via email to