skonto commented on a change in pull request #24817: [WIP][SPARK-27963][core] Allow dynamic allocation without a shuffle service.
URL: https://github.com/apache/spark/pull/24817#discussion_r291847257
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala
 ##########
 @@ -64,6 +69,26 @@ private[spark] class ExecutorMonitor(
   private val nextTimeout = new AtomicLong(Long.MaxValue)
   private var timedOutExecs = Seq.empty[String]
 
+  // Active job tracking.
+  //
+  // The following state is used when an external shuffle service is not in use, and allows Spark
+  // to scale down based on whether the shuffle data stored in executors is in use.
+  //
+  // The algorithm works as following: when jobs start, some state is kept that tracks which stages
+  // are part of that job, and which shuffle ID is attached to those stages. As tasks finish, the
+  // executor tracking code is updated to include the list of shuffles for which it's storing
+  // shuffle data.
+  //
+  // If executors hold shuffle data that is related to an active job, then the executor is
+  // considered to be in "shuffle busy" state; meaning that the executor is not allowed to be
+  // removed. If the executor has shuffle data but it doesn't relate to any active job, then it
+  // may be removed when idle, following the same timeout configuration used for cache blocks.
 
 Review comment:
   As a side note: there is a trade-off here. You don't want to keep executors
around wasting resources when idle, but you also don't want to restart
executors that hold shuffle data still being used by the job. Getting this
timeout right is use-case dependent, and it adds another headache to the
configuration options people need to think about. Configuration is the number
one problem I have seen people face with Spark (including dynamic allocation),
and it is never obvious how they should configure things. Of course, for this
addition it is reasonable to make it configurable.
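
   To make the trade-off concrete, here is a minimal sketch of the bookkeeping
the quoted comment describes. It is not the code from the PR: the class
ShuffleTracker, its methods, and the single idleTimeoutMs parameter are all
hypothetical names chosen for illustration, and it assumes one timeout stands
in for the cache-block timeout mentioned in the diff.

```scala
import scala.collection.mutable

// Hypothetical, simplified model of "shuffle busy" tracking.
// None of these names are taken from ExecutorMonitor.scala.
final class ShuffleTracker(idleTimeoutMs: Long) {
  // Job ID -> shuffle IDs attached to the stages of that job.
  private val activeJobs = mutable.Map.empty[Int, Set[Int]]
  // Executor ID -> shuffle IDs for which the executor stores data.
  private val executorShuffles = mutable.Map.empty[String, mutable.Set[Int]]

  // Record which shuffles a job depends on when the job starts.
  def jobStarted(jobId: Int, shuffleIds: Set[Int]): Unit =
    activeJobs(jobId) = shuffleIds

  // Forget the job; its shuffles no longer pin any executor.
  def jobEnded(jobId: Int): Unit =
    activeJobs.remove(jobId)

  // As tasks finish, note that the executor now holds data for the shuffle.
  def taskFinished(executorId: String, shuffleId: Int): Unit =
    executorShuffles.getOrElseUpdate(executorId, mutable.Set.empty[Int]) += shuffleId

  // "Shuffle busy": the executor stores data for a shuffle of an active job.
  def isShuffleBusy(executorId: String): Boolean = {
    val stored = executorShuffles.getOrElse(executorId, mutable.Set.empty[Int])
    activeJobs.values.exists(ids => ids.exists(stored.contains))
  }

  // An executor may go only if it is not shuffle busy and has been idle
  // for at least the configured timeout.
  def canRemove(executorId: String, idleSinceMs: Long, nowMs: Long): Boolean =
    !isShuffleBusy(executorId) && nowMs - idleSinceMs >= idleTimeoutMs
}
```

   In this sketch a small idleTimeoutMs frees resources quickly but risks
recomputing shuffle output if a later job reuses it, while a large value keeps
idle executors alive longer. That is the tuning burden described above.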

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
