Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/22926 )
Change subject: IMPALA-14574: Lower memory estimate by analyzing PipelineMembership
......................................................................


Patch Set 14:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/22926/14//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/22926/14//COMMIT_MSG@92
PS14, Line 92: scale 10. No major regression is observed.
Were there any improvements to note? I guess perf tests run one query at a time, so we wouldn't see any reduction in spilling or any benefit from higher query concurrency. Do we have concurrency stress tests we could try out?

http://gerrit.cloudera.org:8080/#/c/22926/14/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/22926/14/be/src/service/query-options.cc@1437
PS14, Line 1437: RETURN_IF_ERROR(GetThriftEnum(value, "Memory estimation mode",
nit: it might be better to stick with "Memory estimate mode" here. Consistency can make it easier to find things, even if it's grammatically a little odd.

http://gerrit.cloudera.org:8080/#/c/22926/14/fe/src/main/java/org/apache/impala/planner/PipelineBasedMemEstimator.java
File fe/src/main/java/org/apache/impala/planner/PipelineBasedMemEstimator.java:

http://gerrit.cloudera.org:8080/#/c/22926/14/fe/src/main/java/org/apache/impala/planner/PipelineBasedMemEstimator.java@54
PS14, Line 54: long minMemPerInstance = fragment.getPerIntstanceMinMemOverestimation();
typo: getPerInstanceMinMemOverestimation?

http://gerrit.cloudera.org:8080/#/c/22926/14/fe/src/main/java/org/apache/impala/planner/PipelineBasedMemEstimator.java@60
PS14, Line 60: * allFragments must be arranged in reversed pre-order traversal (i.e., bottom-up).
Why does the order of allFragments matter?

http://gerrit.cloudera.org:8080/#/c/22926/14/fe/src/main/java/org/apache/impala/planner/PipelineBasedMemEstimator.java@104
PS14, Line 104: PlanFragment rootFragment, TQueryOptions queryOptions) {
I'd add trace logging before the loop that indicates something about the plan being estimated. Maybe just identify the rootFragment.

http://gerrit.cloudera.org:8080/#/c/22926/14/fe/src/main/java/org/apache/impala/planner/PipelineBasedMemEstimator.java@167
PS14, Line 167: // pm is an OPEN pipeline.
Are GETNEXT and OPEN really the only two options here? What guarantees that?

http://gerrit.cloudera.org:8080/#/c/22926/14/fe/src/main/java/org/apache/impala/planner/PipelineBasedMemEstimator.java@219
PS14, Line 219: for (PipelineMembership openPipe : openPipelines) {
This is pretty deeply nested. Consider breaking some of this function into separate functions.

http://gerrit.cloudera.org:8080/#/c/22926/14/fe/src/main/java/org/apache/impala/planner/PlanFragment.java
File fe/src/main/java/org/apache/impala/planner/PlanFragment.java:

http://gerrit.cloudera.org:8080/#/c/22926/14/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@470
PS14, Line 470: // There are several PlanNodes that do not reserve initial memory (0), but set
We could also fix this by ensuring every node sets an initial memory reservation. I guess your approach is less disruptive to existing planning.

http://gerrit.cloudera.org:8080/#/c/22926/14/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

http://gerrit.cloudera.org:8080/#/c/22926/14/fe/src/main/java/org/apache/impala/planner/Planner.java@714
PS14, Line 714: LOG.info("maxPerHostEstimateBytes={}, memEstimateBytes={}, "
What does this look like in context? It might be nice to provide a human-readable descriptor about why it's being included, like: "Using MAX_PIPELINE estimate: ".
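
Roughly what I have in mind, as a sketch only (the variable names and the rest of the message are placeholders, not the actual fields in Planner.java; I'm assuming the existing slf4j-style logger):

  import org.slf4j.Logger;
  import org.slf4j.LoggerFactory;

  public class MemEstimateLogSketch {
    private static final Logger LOG = LoggerFactory.getLogger(MemEstimateLogSketch.class);

    // Placeholder method; in the real code this would be the existing LOG.info call.
    static void logChosenEstimate(long maxPerHostEstimateBytes, long memEstimateBytes) {
      // A leading descriptor makes the line self-explanatory when it is mixed in
      // with the rest of the planner's INFO output.
      LOG.info("Using MAX_PIPELINE estimate: maxPerHostEstimateBytes={}, memEstimateBytes={}",
          maxPerHostEstimateBytes, memEstimateBytes);
    }
  }
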
http://gerrit.cloudera.org:8080/#/c/22926/14/fe/src/main/java/org/apache/impala/planner/Planner.java@1097
PS14, Line 1097: private static boolean hasIncompleteStats(PlanFragment rootFragment) {
We provide a warning about incomplete stats elsewhere. I'm surprised this helper needed to be created.

http://gerrit.cloudera.org:8080/#/c/22926/14/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q23a.test
File testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q23a.test:

http://gerrit.cloudera.org:8080/#/c/22926/14/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q23a.test@513
PS14, Line 513: Per-Host Resource Estimates: Memory=41.59GB
This is a substantial change. Nice sign that it could be beneficial in some cases. Have you tried running it to see whether actual peak memory use stays within the estimate?


--
To view, visit http://gerrit.cloudera.org:8080/22926
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iba4f40993d8360bfdc1eb592a626e285dfed1627
Gerrit-Change-Number: 22926
Gerrit-PatchSet: 14
Gerrit-Owner: Riza Suminto <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Comment-Date: Tue, 06 Jan 2026 00:37:29 +0000
Gerrit-HasComments: Yes
