Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19807 )

Change subject: IMPALA-12091: Control scan parallelism by its processing cost
......................................................................


Patch Set 3:

(5 comments)

Thank you for the feedback!
Replying some below:

http://gerrit.cloudera.org:8080/#/c/19807/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19807/3//COMMIT_MSG@17
PS3, Line 17: 200 scan range
> just curious any reason to choose this number? Do we need to tune this numb
I just ballpark this according to tpcds_parquet.store_sales (1824 files) that 
we use in PlannerTest.testProcessingCost.
We definitely want to refine this further next or make it tunable with backend 
flag.


http://gerrit.cloudera.org:8080/#/c/19807/3/be/src/exec/scan-node.cc
File be/src/exec/scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/19807/3/be/src/exec/scan-node.cc@298
PS3, Line 298:     max_row_batches = max(2, max_row_batches / 
parent->plan_node().effective_dop());
> This is too disconnected from the max_row_batches computation above, which
The max_row_batch value that has 
FLAGS_max_queued_row_batches_per_scanner_thread as component is the heuristic 
used in not MT_DOP mode (MT_DOP=0).

The change is for when scan node is in MT_DOP mode, which roughly translate to 
dividing max_row_batches from previous formula equally among instances in the 
same host node.


http://gerrit.cloudera.org:8080/#/c/19807/3/be/src/service/query-options.h
File be/src/service/query-options.h:

http://gerrit.cloudera.org:8080/#/c/19807/3/be/src/service/query-options.h@293
PS3, Line 293:   QUERY_OPT_FN(processing_cost_max_threads, 
PROCESSING_COST_MAX_THREADS,                 \
> Rename max_fragment_threads_per_executor or something more tangible to the
This is intended to match PROCESSING_COST_MIN_THREADS that already exist, and 
the context is per-fragment_id instead of per-query.

max_fragment_threads_per_executor sounds like per-query rather than 
per-fragment_id.


http://gerrit.cloudera.org:8080/#/c/19807/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/19807/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@658
PS3, Line 658: =
> Is 0 valid value?
Returned value will be Math.max against getMinParallelismPerNode(). But I'll 
change this method to return 1 just to be safe.


http://gerrit.cloudera.org:8080/#/c/19807/3/fe/src/test/resources/llama-site-3-groups.xml
File fe/src/test/resources/llama-site-3-groups.xml:

http://gerrit.cloudera.org:8080/#/c/19807/3/fe/src/test/resources/llama-site-3-groups.xml@41
PS3, Line 41:     <value>94371840</value>
> Why is this change necessary?
Some new test added in tests/custom_cluster/test_executor_groups.py hit this 
memory limit. Raising this a bit so they can fit to small pool both in term of 
cpu and memory.



--
To view, visit http://gerrit.cloudera.org:8080/19807
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If948e45455275d9a61a6cd5d6a30a8b98a7c729a
Gerrit-Change-Number: 19807
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Wenzhe Zhou <[email protected]>
Gerrit-Comment-Date: Wed, 03 May 2023 20:55:39 +0000
Gerrit-HasComments: Yes

Reply via email to