[Impala-ASF-CR] IMPALA-12029: Add TScanRangeSpec.estimate_output_bytes_per_instance

2023-03-28 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Add 
TScanRangeSpec.estimate_output_bytes_per_instance
..


Patch Set 1:

Let's try to solve this case more directly by not bounding the threads to the
current executor group size, except for the last group being considered.


--
To view, visit http://gerrit.cloudera.org:8080/19656
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 28 Mar 2023 21:01:49 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12029: Add TScanRangeSpec.estimate_output_bytes_per_instance

2023-03-28 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Add 
TScanRangeSpec.estimate_output_bytes_per_instance
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19656/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19656/1//COMMIT_MSG@23
PS1, Line 23: is
is not



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 28 Mar 2023 18:24:40 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12029: Add TScanRangeSpec.estimate_output_bytes_per_instance

2023-03-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19656 )

Change subject: IMPALA-12029: Add 
TScanRangeSpec.estimate_output_bytes_per_instance
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/12702/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 28 Mar 2023 18:18:59 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12029: Add TScanRangeSpec.estimate_output_bytes_per_instance

2023-03-28 Thread Riza Suminto (Code Review)
Riza Suminto has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/19656


Change subject: IMPALA-12029: Add 
TScanRangeSpec.estimate_output_bytes_per_instance
..

IMPALA-12029: Add TScanRangeSpec.estimate_output_bytes_per_instance

In a multiple executor group set setup, the Frontend tries to match a
query with the smallest executor group set that can fit the memory and
CPU requirements of the compiled query. For some kinds of queries, the
compiled plan fits in any executor group set but does not necessarily
deliver the best performance there. One example is Impala's COMPUTE
STATS query: it does a full table scan and aggregates the stats, and it
has a fairly simple plan shape, but it can benefit from higher scan
parallelism.
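The group-set selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Frontend's code: `ExecutorGroupSet`, `pick_smallest_fit`, and the two capacity fields are hypothetical names, real admission checks are reduced to two scalar limits, and "smallest" is assumed to mean ordering by memory first, then CPU. It shows why a light query such as COMPUTE STATS lands on the smallest set: its requirements fit there, so nothing in this check pushes it toward more parallelism.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ExecutorGroupSet:
    # Hypothetical model: two scalar limits stand in for real admission checks.
    name: str
    max_mem_bytes: int
    max_cpu_cores: int


def pick_smallest_fit(group_sets: List[ExecutorGroupSet],
                      mem_required: int,
                      cpu_required: int) -> Optional[ExecutorGroupSet]:
    """Return the smallest group set that fits the query, or None."""
    # Assumption: "smallest" = ordered by memory, then by CPU cores.
    ordered = sorted(group_sets, key=lambda g: (g.max_mem_bytes, g.max_cpu_cores))
    for gs in ordered:
        # The first (smallest) set that covers both requirements wins.
        if mem_required <= gs.max_mem_bytes and cpu_required <= gs.max_cpu_cores:
            return gs
    return None  # no group set can fit the query


MB = 1024 * 1024
GROUP_SETS = [
    ExecutorGroupSet("large", 640 * MB, 64),
    ExecutorGroupSet("small", 64 * MB, 8),
]

# A COMPUTE STATS-like query with modest needs is placed on "small".
chosen = pick_smallest_fit(GROUP_SETS, 40 * MB, 4)
```

Note that nothing in this check looks at scan volume, which is the gap the rest of the patch description addresses.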

This patch adds a new field, estimate_output_bytes_per_instance, in
TScanRangeSpec as feedback from the Planner to the Frontend about scan
output volume. The value of estimate_output_bytes_per_instance is
derived from the scan cardinality times the average output row size. It
is only set if the scan cardinality can be precisely derived from stats
or estimated from table size (not -1) and the current scan instance
count is already maximal. The TABLE_NUM_ROWS hint can be used as a
workaround to supply cardinality if the Planner cannot estimate it by
itself. estimate_output_bytes_per_instance is not set for
DataSourceScanNode, nor for optimized scans such as a partition key
scan or a count(*) query.
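The derivation above can be sketched roughly like this. The function name and parameters are hypothetical. The text gives the formula only as cardinality times average output row size; dividing by the instance count to get a per-instance figure is an assumption suggested by the field name. The instance-count condition from the text is not modelled here.

```python
from typing import Optional


def estimate_output_bytes_per_instance(cardinality: int,
                                       avg_row_size: float,
                                       num_instances: int,
                                       is_data_source_scan: bool = False,
                                       is_optimized_scan: bool = False) -> Optional[int]:
    """Return estimated scan output bytes per instance, or None if unset."""
    # DataSourceScanNode and optimized scans (partition key scan,
    # count(*)) never get the field.
    if is_data_source_scan or is_optimized_scan:
        return None
    # -1 means cardinality could not be derived from stats, estimated from
    # table size, or supplied through the TABLE_NUM_ROWS hint.
    if cardinality < 0:
        return None
    # Total output volume = cardinality * average output row size.
    total_bytes = cardinality * avg_row_size
    # Assumption: the volume is split evenly across scan instances
    # (guarding against a zero instance count).
    return int(total_bytes // max(num_instances, 1))


# 1M rows of ~100 bytes over 4 instances -> about 25 MB per instance.
per_instance = estimate_output_bytes_per_instance(1_000_000, 100.0, 4)
```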

This patch also adds a new query option, MAX_OUTPUT_BYTES_PER_SCAN_NODE.
Using these two pieces of information, the Frontend can judge whether to
assign the compiled plan to the current executor group set anyway or to
step up to the next larger executor group set. If any TScanRangeSpec has
an estimate_output_bytes_per_instance greater than
MAX_OUTPUT_BYTES_PER_SCAN_NODE, the Frontend steps up to the next
larger executor group set to increase parallelism.
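The stepping decision can be sketched as below. `should_step_up` and its parameters are hypothetical. The sketch assumes the comparison direction implied by the goal of increasing parallelism (a step-up is triggered when some scan's per-instance estimate exceeds MAX_OUTPUT_BYTES_PER_SCAN_NODE), and it assumes no step-up is attempted from the largest group set.

```python
from typing import Iterable, Optional


def should_step_up(per_instance_estimates: Iterable[Optional[int]],
                   max_output_bytes_per_scan_node: int,
                   is_largest_group_set: bool) -> bool:
    """Decide whether to retry planning on the next larger group set."""
    # There is nowhere larger to go from the largest group set.
    if is_largest_group_set:
        return False
    # Assumption: a scan whose per-instance output exceeds the threshold
    # would benefit from more parallelism. Unset estimates (None) are
    # ignored, matching scans for which the field is never set.
    return any(est is not None and est > max_output_bytes_per_scan_node
               for est in per_instance_estimates)


# One scan emits ~25 MB per instance against a hypothetical 10 MB limit.
step_up = should_step_up([25_000_000, None], 10_000_000, is_largest_group_set=False)
```

Using `any` means a single heavy scan is enough to trigger the step-up, mirroring the "if any TScanRangeSpec" wording.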

Testing:
- Passed core tests.
- Added test cases in query-options-test.cc.
- Added a test case in test_query_cpu_count_divisor_default.
- Raised impala.admission-control.max-query-mem-limit.root.small from
  64MB to 70MB in llama-site-3-groups.xml so that the new test case can
  fit in the root.small pool.

Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
---
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Planner.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/resources/llama-site-3-groups.xml
M tests/custom_cluster/test_executor_groups.py
12 files changed, 150 insertions(+), 16 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/19656/1
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I7a2276fbd344d00caa67103026661a3644b9a1f9
Gerrit-Change-Number: 19656
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto