Victsm commented on a change in pull request #30312:
URL: https://github.com/apache/spark/pull/30312#discussion_r537832601
##########
File path: core/src/main/scala/org/apache/spark/internal/config/package.scala
##########
@@ -1992,4 +1992,32 @@ package object config {
.version("3.1.0")
.doubleConf
.createWithDefault(5)
+
+ private[spark] val SHUFFLE_NUM_PUSH_THREADS =
+ ConfigBuilder("spark.shuffle.push.numPushThreads")
+ .doc("Specify the number of threads in the block pusher pool. These threads assist " +
+ "in creating connections and pushing blocks to remote shuffle services when push based " +
+ "shuffle is enabled. By default, the threadpool size is equal to the number of cores.")
+ .version("3.1.0")
+ .intConf
+ .createOptional
+
+ private[spark] val SHUFFLE_MAX_BLOCK_SIZE_TO_PUSH =
+ ConfigBuilder("spark.shuffle.push.maxBlockSizeToPush")
+ .doc("The max size of an individual block to push to the remote shuffle services when push " +
+ "based shuffle is enabled. Blocks larger than this threshold are not pushed.")
+ .version("3.1.0")
+ .bytesConf(ByteUnit.KiB)
+ .createWithDefaultString("800k")
Review comment:
@Ngone51 agree that we cannot have perfect alignment between AQE and
push-based shuffle right now, since the two operate at different levels.
The current implementation is an opportunistic approach: it assumes that
large blocks are likely to come from skewed partitions. If the shuffle feeds
a join and AQE is enabled, AQE could then detect the corresponding partition
as skewed.
If this happens, AQE would consume the skewed partition differently and would
not use the shuffle partitions merged by push-based shuffle, so pushing these
blocks could be wasteful.
Given the current limitations of AQE, i.e. skew handling applies only to joins
and is still largely based on the original shuffle mechanism, it is not
straightforward to align push-based shuffle with AQE.
This could also be an area for further improvement down the road.
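To make the threshold behaviour concrete, here is a minimal sketch of the size
check described by the `spark.shuffle.push.maxBlockSizeToPush` doc string. It
is not the code in this PR: `PushThresholdSketch`, `Block`,
`selectBlocksToPush`, and `pushThreadCount` are hypothetical names, and the
sketch assumes that a block exactly at the threshold is still pushed.

```scala
// Hypothetical sketch only; names and structure are illustrative, not Spark's implementation.
object PushThresholdSketch {
  // Stand-in for a shuffle block: the reduce partition it targets and its size in bytes.
  final case class Block(reduceId: Int, sizeBytes: Long)

  // Mirrors the proposed default of "800k" (KiB) for spark.shuffle.push.maxBlockSizeToPush.
  val DefaultMaxBlockSizeToPush: Long = 800L * 1024

  // Returns (blocks to push, blocks left for the regular shuffle fetch path).
  // Blocks larger than the threshold are not pushed.
  def selectBlocksToPush(
      blocks: Seq[Block],
      maxBlockSizeToPush: Long = DefaultMaxBlockSizeToPush): (Seq[Block], Seq[Block]) =
    blocks.partition(_.sizeBytes <= maxBlockSizeToPush)

  // spark.shuffle.push.numPushThreads is optional; when it is unset,
  // the pool size falls back to the number of cores.
  def pushThreadCount(configured: Option[Int], cores: Int): Int =
    configured.getOrElse(cores)
}
```

With this kind of check, a large block from a potentially skewed partition
simply stays on the original shuffle path, which is the path AQE skew handling
would read from anyway.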