cloud-fan commented on code in PR #47533:
URL: https://github.com/apache/spark/pull/47533#discussion_r1700339034
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/StaticSQLConf.scala:
##########
@@ -170,6 +170,16 @@ object StaticSQLConf {
.intConf
.createWithDefault(1000)
+ val SHUFFLE_EXCHANGE_MAX_THREAD_THRESHOLD =
+ buildStaticConf("spark.sql.shuffleExchange.maxThreadThreshold")
+ .internal()
+ .doc("The maximum degree of parallelism for doing preparation of shuffle exchange, " +
+ "which includes subquery execution, file listing, etc.")
+ .version("4.0.0")
+ .intConf
+ .checkValue(thres => thres > 0 && thres <= 1024, "The threshold must be in (0,1024].")
+ .createWithDefault(1024)
Review Comment:
The shuffle async job just waits for other work (subquery expression execution)
to finish, which is very lightweight. The broadcast async job executes a query
and collects the result on the driver, which is very heavy. That's why we can
give much larger parallelism to the shuffle async jobs. In our benchmark we
found this number is reasonably good for TPC.
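To illustrate the point above, here is a minimal, self-contained sketch (not Spark's actual implementation) of why a large thread cap is safe for tasks that mostly wait: a fixed-size pool bounded by a threshold like the one this config introduces, running many tasks that each block briefly rather than doing heavy CPU work. The object name, the pool size, and the tasks are all hypothetical.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object ShufflePrepPoolSketch {
  // Hypothetical stand-in for spark.sql.shuffleExchange.maxThreadThreshold;
  // the real value is read from StaticSQLConf and capped at 1024.
  val maxThreadThreshold: Int = 1024
  require(maxThreadThreshold > 0 && maxThreadThreshold <= 1024)

  def main(args: Array[String]): Unit = {
    // Shuffle-preparation tasks mostly wait (e.g. for subquery results),
    // so a large pool of mostly-idle threads is cheap.
    val pool = Executors.newFixedThreadPool(math.min(8, maxThreadThreshold))
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)

    // Each task simulates "waiting for other work to finish", then returns.
    val futures = (1 to 8).map { i =>
      Future { Thread.sleep(10); i * 2 }
    }
    val results = futures.map(f => Await.result(f, 10.seconds))
    println(results.sum)
    pool.shutdown()
  }
}
```

Since each task blocks rather than burns CPU, the pool size governs how many waits overlap, not how much compute is consumed; that is the rationale for giving shuffle preparation a much larger cap than broadcast jobs, which collect full query results on the driver.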
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]