Victsm commented on a change in pull request #30312:
URL: https://github.com/apache/spark/pull/30312#discussion_r537832601



##########
File path: core/src/main/scala/org/apache/spark/internal/config/package.scala
##########
@@ -1992,4 +1992,32 @@ package object config {
       .version("3.1.0")
       .doubleConf
       .createWithDefault(5)
+
+  private[spark] val SHUFFLE_NUM_PUSH_THREADS =
+    ConfigBuilder("spark.shuffle.push.numPushThreads")
+      .doc("Specify the number of threads in the block pusher pool. These threads assist " +
+        "in creating connections and pushing blocks to remote shuffle services when push based " +
+        "shuffle is enabled. By default, the threadpool size is equal to the number of cores.")
+      .version("3.1.0")
+      .intConf
+      .createOptional
+
+  private[spark] val SHUFFLE_MAX_BLOCK_SIZE_TO_PUSH =
+    ConfigBuilder("spark.shuffle.push.maxBlockSizeToPush")
+      .doc("The max size of an individual block to push to the remote shuffle services when push " +
+        "based shuffle is enabled. Blocks larger than this threshold are not pushed.")
+      .version("3.1.0")
+      .bytesConf(ByteUnit.KiB)
+      .createWithDefaultString("800k")

Review comment:
   @Ngone51 agreed, we cannot perfectly align AQE and push-based shuffle right now, because the two operate at different levels.
   The current implementation is an opportunistic approach. The hope is that large blocks tend to come from skewed partitions; if the shuffle feeds a join and AQE is enabled, AQE could detect the corresponding partition as skewed.
   If that happens, AQE consumes the skewed partition differently and does not use the merged shuffle partitions produced by push-based shuffle, so pushing these blocks could be wasteful.
   Given AQE's current limitations (skew handling applies only to joins, and it is still largely built on the original shuffle mechanism), integrating push-based shuffle with AQE is not straightforward.
   This could be an area for further improvement down the road.
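
   To make the threshold behavior concrete, here is a minimal, self-contained sketch of the size check described in the `spark.shuffle.push.maxBlockSizeToPush` doc string: blocks larger than the threshold are not pushed. It is an illustration only, not the code in this PR. The names `PushFilterSketch`, `Block`, and `selectBlocksToPush` are hypothetical, and the default is assumed to be 800 KiB, matching the "800k" default string above.

```scala
// Hypothetical sketch of the size filter; not the implementation in this PR.
object PushFilterSketch {
  // Assumed default: "800k" interpreted as 800 KiB, expressed in bytes.
  val DefaultMaxBlockSizeToPush: Long = 800L * 1024

  // Illustrative stand-in for a shuffle block: an id and its size in bytes.
  final case class Block(id: String, sizeBytes: Long)

  // Split blocks into (blocks to push, blocks to skip).
  // A block is skipped only when it is strictly larger than the threshold.
  def selectBlocksToPush(
      blocks: Seq[Block],
      maxBlockSizeToPush: Long = DefaultMaxBlockSizeToPush): (Seq[Block], Seq[Block]) =
    blocks.partition(_.sizeBytes <= maxBlockSizeToPush)

  def main(args: Array[String]): Unit = {
    val blocks = Seq(
      Block("small", 64L * 1024),         // well under the threshold
      Block("atLimit", 800L * 1024),      // exactly at the threshold, still pushed
      Block("large", 4L * 1024 * 1024)    // over the threshold, not pushed
    )
    val (toPush, toSkip) = selectBlocksToPush(blocks)
    println(s"push: ${toPush.map(_.id).mkString(", ")}")
    println(s"skip: ${toSkip.map(_.id).mkString(", ")}")
  }
}
```

   In this sketch the large block is simply left to the original shuffle fetch path, which is the same path AQE would use if it later treats that partition as skewed.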



