Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/19807 )
Change subject: IMPALA-12091: Control scan parallelism by its processing cost ...................................................................... Patch Set 3: (5 comments) Thank you for the feedback! Replying some below: http://gerrit.cloudera.org:8080/#/c/19807/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19807/3//COMMIT_MSG@17 PS3, Line 17: 200 scan range > just curious any reason to choose this number? Do we need to tune this numb I just ballpark this according to tpcds_parquet.store_sales (1824 files) that we use in PlannerTest.testProcessingCost. We definitely want to refine this further next or make it tunable with backend flag. http://gerrit.cloudera.org:8080/#/c/19807/3/be/src/exec/scan-node.cc File be/src/exec/scan-node.cc: http://gerrit.cloudera.org:8080/#/c/19807/3/be/src/exec/scan-node.cc@298 PS3, Line 298: max_row_batches = max(2, max_row_batches / parent->plan_node().effective_dop()); > This is too disconnected from the max_row_batches computation above, which The max_row_batch value that has FLAGS_max_queued_row_batches_per_scanner_thread as component is the heuristic used in not MT_DOP mode (MT_DOP=0). The change is for when scan node is in MT_DOP mode, which roughly translate to dividing max_row_batches from previous formula equally among instances in the same host node. http://gerrit.cloudera.org:8080/#/c/19807/3/be/src/service/query-options.h File be/src/service/query-options.h: http://gerrit.cloudera.org:8080/#/c/19807/3/be/src/service/query-options.h@293 PS3, Line 293: QUERY_OPT_FN(processing_cost_max_threads, PROCESSING_COST_MAX_THREADS, \ > Rename max_fragment_threads_per_executor or something more tangible to the This is intended to match PROCESSING_COST_MIN_THREADS that already exist, and the context is per-fragment_id instead of per-query. max_fragment_threads_per_executor sounds like per-query rather than per-fragment_id. http://gerrit.cloudera.org:8080/#/c/19807/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/19807/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@658 PS3, Line 658: = > Is 0 valid value? Returned value will be Math.max against getMinParallelismPerNode(). But I'll change this method to return 1 just to be safe. http://gerrit.cloudera.org:8080/#/c/19807/3/fe/src/test/resources/llama-site-3-groups.xml File fe/src/test/resources/llama-site-3-groups.xml: http://gerrit.cloudera.org:8080/#/c/19807/3/fe/src/test/resources/llama-site-3-groups.xml@41 PS3, Line 41: <value>94371840</value> > Why is this change necessary? Some new test added in tests/custom_cluster/test_executor_groups.py hit this memory limit. Raising this a bit so they can fit to small pool both in term of cpu and memory. -- To view, visit http://gerrit.cloudera.org:8080/19807 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If948e45455275d9a61a6cd5d6a30a8b98a7c729a Gerrit-Change-Number: 19807 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Kurt Deschler <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Reviewer: Wenzhe Zhou <[email protected]> Gerrit-Comment-Date: Wed, 03 May 2023 20:55:39 +0000 Gerrit-HasComments: Yes
