skonto commented on a change in pull request #24817: [WIP][SPARK-27963][core] Allow dynamic allocation without a shuffle service.
URL: https://github.com/apache/spark/pull/24817#discussion_r291847257
##########
File path: core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala
##########
@@ -64,6 +69,26 @@ private[spark] class ExecutorMonitor(
   private val nextTimeout = new AtomicLong(Long.MaxValue)
   private var timedOutExecs = Seq.empty[String]
+  // Active job tracking.
+  //
+  // The following state is used when an external shuffle service is not in use, and allows Spark
+  // to scale down based on whether the shuffle data stored in executors is in use.
+  //
+  // The algorithm works as following: when jobs start, some state is kept that tracks which stages
+  // are part of that job, and which shuffle ID is attached to those stages. As tasks finish, the
+  // executor tracking code is updated to include the list of shuffles for which it's storing
+  // shuffle data.
+  //
+  // If executors hold shuffle data that is related to an active job, then the executor is
+  // considered to be in "shuffle busy" state; meaning that the executor is not allowed to be
+  // removed. If the executor has shuffle data but it doesn't relate to any active job, then it
+  // may be removed when idle, following the same timeout configuration used for cache blocks.

Review comment:
As a side note: there is a trade-off here. You don't want to keep executors around wasting resources when they are idle, but you also don't want to restart executors that hold shuffle data the job is still using. Getting this timeout right is use-case dependent, and it adds another headache to the configuration options people need to think about. Configuration is the number one problem I have seen people face with Spark (including dynamic allocation), and it is never obvious how they should configure things. Of course, for this addition it is reasonable to make it configurable. The good thing is that you may have a chance to keep latency stable after an idle period, since the executors you care about are at least still around.
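To make the trade-off concrete, here is a minimal sketch of the bookkeeping the diff comment describes. It is not the code from this PR: the class and method names (`ShuffleTrackingSketch`, `onJobStart`, `onShuffleTaskEnd`, `isShuffleBusy`) are hypothetical, and the model ignores stages, timeouts, and thread safety. It only shows how an executor moves in and out of the "shuffle busy" state as jobs start and end.

```scala
import scala.collection.mutable

// Hypothetical, simplified model of the tracking described in the diff comment.
// None of these names are taken from ExecutorMonitor.scala.
final class ShuffleTrackingSketch {
  // jobId -> shuffle IDs attached to the stages of that job
  private val activeJobShuffles = mutable.Map.empty[Int, Set[Int]]
  // executorId -> shuffle IDs for which the executor is storing shuffle data
  private val executorShuffles = mutable.Map.empty[String, Set[Int]]

  // Record which shuffles belong to a job when it starts.
  def onJobStart(jobId: Int, shuffleIds: Set[Int]): Unit =
    activeJobShuffles(jobId) = shuffleIds

  // Forget the job when it ends; its shuffles no longer pin executors.
  def onJobEnd(jobId: Int): Unit =
    activeJobShuffles.remove(jobId)

  // As tasks finish, remember that the executor now holds data for this shuffle.
  def onShuffleTaskEnd(executorId: String, shuffleId: Int): Unit =
    executorShuffles(executorId) =
      executorShuffles.getOrElse(executorId, Set.empty[Int]) + shuffleId

  // An executor is "shuffle busy" if any shuffle it holds belongs to an active job.
  def isShuffleBusy(executorId: String): Boolean = {
    val inUse = activeJobShuffles.values.flatten.toSet
    executorShuffles.getOrElse(executorId, Set.empty[Int]).exists(inUse.contains)
  }
}
```

In this sketch, an executor that wrote data for shuffle 10 stays busy while a job using shuffle 10 is active. Once that job ends the executor is no longer busy, and, per the diff comment, it would then be subject to the same idle timeout used for cache blocks (which I understand to be `spark.dynamicAllocation.cachedExecutorIdleTimeout`). That timeout is the setting the review comment is concerned about.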
