Ngone51 commented on a change in pull request #30312:
URL: https://github.com/apache/spark/pull/30312#discussion_r534913268
##########
File path: core/src/main/scala/org/apache/spark/internal/config/package.scala
##########
@@ -1992,4 +1992,32 @@ package object config {
.version("3.1.0")
.doubleConf
.createWithDefault(5)
+
+ private[spark] val SHUFFLE_NUM_PUSH_THREADS =
+ ConfigBuilder("spark.shuffle.push.numPushThreads")
+ .doc("Specify the number of threads in the block pusher pool. These threads assist " +
+ "in creating connections and pushing blocks to remote shuffle services when push based " +
+ "shuffle is enabled. By default, the threadpool size is equal to the number of cores.")
+ .version("3.1.0")
+ .intConf
+ .createOptional
+
+ private[spark] val SHUFFLE_MAX_BLOCK_SIZE_TO_PUSH =
+ ConfigBuilder("spark.shuffle.push.maxBlockSizeToPush")
+ .doc("The max size of an individual block to push to the remote shuffle services when push " +
+ "based shuffle is enabled. Blocks larger than this threshold are not pushed.")
+ .version("3.1.0")
+ .bytesConf(ByteUnit.KiB)
+ .createWithDefaultString("800k")
Review comment:
@Victsm
For 1, I think 1 MiB is better (that's what Spark usually does) if you don't
see much performance difference between 800 KiB and 1 MiB. Also, it would be good
to add more explanation to the conf doc, saying something like: the default value
is chosen to avoid potential disk throughput issues, and a value that is too small
could lead to severe disk issues.
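To make the trade-off in point 1 concrete, here is a minimal Scala sketch that assumes only the rule stated in the conf doc: blocks larger than the threshold are not pushed. `PushThresholdSketch`, `Split`, and `split` are hypothetical names, not Spark APIs, and the sketch works in plain bytes, whereas the conf in the diff is read with `bytesConf(ByteUnit.KiB)`.

```scala
// Hypothetical sketch: how a maxBlockSizeToPush threshold splits one map task's
// shuffle blocks. Names are illustrative, not Spark's API; sizes are in bytes.
object PushThresholdSketch {
  val KiB: Long = 1024L
  val MiB: Long = 1024L * KiB

  // pushed: blocks eligible for pushing to remote shuffle services;
  // notPushed: blocks over the threshold (assumed to be fetched the regular way).
  final case class Split(pushed: Seq[Long], notPushed: Seq[Long])

  // Per the conf doc, blocks larger than the threshold are not pushed,
  // so a block exactly at the threshold is still pushed.
  def split(blockSizes: Seq[Long], maxBlockSizeToPush: Long): Split = {
    val (pushed, notPushed) = blockSizes.partition(_ <= maxBlockSizeToPush)
    Split(pushed, notPushed)
  }
}
```

For example, with blocks of 100 KiB, 900 KiB, 1000 KiB and 2 MiB, an 800 KiB threshold pushes only the 100 KiB block, while a 1 MiB threshold leaves only the 2 MiB block unpushed.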
For 2, AFAIK, Spark doesn't know the details of AQE at the shuffle level
yet, so we don't even know whether the SQL query contains a join. How, then,
could ShuffleBlockPusher decide whether skewed partitions need to be calculated?
Besides, if we do calculate skewed partitions there, we should ensure the result
is the same as (or no more than) the one calculated at the AQE level, right? Can
we guarantee that? (Maybe it's easy if both places use the same algorithm.)
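The "same algorithm" idea can be sketched as one shared rule that both sides call. This is a hypothetical sketch (`SkewRule` and its methods are not Spark APIs), modeled loosely on the check AQE's skew-join handling applies: a partition counts as skewed when it is larger than both `factor` times the median size and an absolute byte threshold. It also hints that sharing the rule may not be enough on its own, because a single map task only sees its own block sizes, while AQE works from sizes aggregated over all map outputs.

```scala
// Hypothetical sketch: one skew rule shared by the push side and the AQE side.
// Modeled loosely on AQE's skew check (size > factor * median && size > threshold);
// not Spark's actual API.
object SkewRule {
  // Median partition size, clamped to at least 1 byte.
  def median(sizes: Seq[Long]): Long = {
    require(sizes.nonEmpty, "need at least one partition")
    val sorted = sizes.sorted
    val n = sorted.length
    val mid =
      if (n % 2 == 0) (sorted(n / 2 - 1) + sorted(n / 2)) / 2
      else sorted(n / 2)
    math.max(mid, 1L)
  }

  // Indices of partitions that the shared rule considers skewed.
  def skewedPartitions(sizes: Seq[Long], factor: Double, thresholdBytes: Long): Set[Int] = {
    val med = median(sizes)
    sizes.zipWithIndex.collect {
      case (size, idx) if size > med * factor && size > thresholdBytes => idx
    }.toSet
  }

  // Sum per-partition sizes across map tasks: the global view AQE works from,
  // which a single map task does not have.
  def aggregate(perMap: Seq[Seq[Long]]): Seq[Long] =
    perMap.transpose.map(_.sum)
}
```

For example, with `factor = 5.0` and a 100-byte threshold, a map task with partition sizes (10, 10, 10, 1000) flags partition 3 locally. After aggregating it with a second task of sizes (1000, 1000, 1000, 10), every partition totals 1010 bytes and nothing is skewed, so the locally computed set would not be a subset of the global one.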
It might become possible once AQE can pass more info through the task for
ShuffleBlockPusher to leverage. But for now, I feel it's quite hard to
integrate with AQE.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.