Tagar commented on a change in pull request #32766:
URL: https://github.com/apache/spark/pull/32766#discussion_r645111860
##########
File path: core/src/main/scala/org/apache/spark/internal/config/package.scala
##########
@@ -2023,6 +2023,22 @@ package object config {
.stringConf
.createWithDefaultString("PWR")
+ private[spark] val EXECUTOR_DECOMMISSION_BATCH_INTERVAL =
+ ConfigBuilder("spark.executor.decommission.batchInterval")
+ .doc("Executors are decommissioned in batches to avoid overloading network bandwidth in" +
+ " migrating rdd and shuffle data. This config sets the interval between batches.")
+ .version("3.2.0")
+ .timeConf(TimeUnit.MILLISECONDS)
+ .createWithDefault(30000)
Review comment:
Azure gives only a 30s notice before a spot instance is removed.
With these defaults, only 3 nodes will get a chance to evacuate their
shuffle/cache data (9 nodes on AWS, with the 90s notification).
Any chance of making this default (much) smaller?
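
To make the arithmetic concrete, here is a rough back-of-the-envelope sketch, not Spark code. It assumes batches start at t = 0 and then once per interval, and that each batch holds 3 executors; the batch size is not visible in this hunk and is only inferred from the "3 nodes" figure, so treat it as an assumption. The object and method names are hypothetical.

```scala
// Rough model of how many executors get to *start* migrating before the
// cloud provider reclaims the node. Not part of Spark; names are hypothetical.
object DecommissionBatchEstimate {

  /** Executors whose batch starts strictly before the notice window closes. */
  def executorsStarted(noticeMs: Long, batchIntervalMs: Long, batchSize: Int): Long = {
    require(noticeMs >= 0 && batchIntervalMs > 0 && batchSize > 0)
    // Batches start at t = 0, interval, 2 * interval, ...
    // Count the start times that are < noticeMs (ceiling division).
    val batches = (noticeMs + batchIntervalMs - 1) / batchIntervalMs
    batches * batchSize
  }

  def main(args: Array[String]): Unit = {
    val assumedBatchSize = 3 // assumption: inferred from the comment, not from the diff
    println(executorsStarted(30000L, 30000L, assumedBatchSize)) // 30s notice, 30s interval
    println(executorsStarted(90000L, 30000L, assumedBatchSize)) // 90s notice, 30s interval
    println(executorsStarted(30000L, 5000L, assumedBatchSize))  // 30s notice, 5s interval
  }
}
```

Under these assumptions the 30s default yields 3 executors for a 30s notice and 9 for a 90s notice, while a 5s interval would let 18 executors start within 30s. This is an upper bound: executors in the last batch have very little time left to actually move data.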