vanzin commented on a change in pull request #24817: [WIP][SPARK-27963][core] Allow dynamic allocation without a shuffle service.
URL: https://github.com/apache/spark/pull/24817#discussion_r296927131
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala
 ##########
 @@ -64,6 +69,26 @@ private[spark] class ExecutorMonitor(
   private val nextTimeout = new AtomicLong(Long.MaxValue)
   private var timedOutExecs = Seq.empty[String]
 
+  // Active job tracking.
+  //
+  // The following state is used when an external shuffle service is not in use, and allows Spark
+  // to scale down based on whether the shuffle data stored in executors is in use.
+  //
+  // The algorithm works as following: when jobs start, some state is kept that tracks which stages
+  // are part of that job, and which shuffle ID is attached to those stages. As tasks finish, the
+  // executor tracking code is updated to include the list of shuffles for which it's storing
+  // shuffle data.
+  //
+  // If executors hold shuffle data that is related to an active job, then the executor is
+  // considered to be in "shuffle busy" state; meaning that the executor is not allowed to be
+  // removed. If the executor has shuffle data but it doesn't relate to any active job, then it
+  // may be removed when idle, following the same timeout configuration used for cache blocks.
 
 Review comment:
   > Getting this timeout right becomes use case dependent but also adds another headache to the configuration options people need to think of
   
   Actually, the heuristic is written so that for typical applications this timeout shouldn't matter much. The context cleaner triggers a GC periodically, which cleans up shuffles related to orphaned RDDs. A normal application run would also cause these orphaned RDDs to be collected. So, for all practical purposes, in a normal application the executor would become idle soon after jobs are run (doubly so for SQL jobs, which don't seem to reuse the underlying RDDs).
   
   Shells are a little trickier because of the way the shell itself holds on to RDD references. So if you're playing with RDDs directly in a shell, this may not perform as well, and you'd be relying on the timeout. But, as above, for SQL jobs you'd still end up with idle executors more often than not.
   
   On a side note, the code comment above is inaccurate about the timeout, since I ended up adding a separate timeout for the shuffle.
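
To make the heuristic discussed here concrete, below is a minimal, self-contained sketch of the "shuffle busy" idea with a separate shuffle timeout. It is not the code from this PR: the class `ShuffleTracker`, its methods, and the parameters `idleTimeoutMs` and `shuffleTimeoutMs` are all hypothetical names chosen for illustration, and the real `ExecutorMonitor` is driven by listener events rather than direct calls.

```scala
import scala.collection.mutable

// Hypothetical, simplified model of the heuristic; NOT the ExecutorMonitor implementation.
final class ShuffleTracker(idleTimeoutMs: Long, shuffleTimeoutMs: Long) {
  // Active job ID -> shuffle IDs attached to that job's stages.
  private val activeJobs = mutable.Map.empty[Int, Set[Int]]
  // Executor ID -> shuffle IDs for which the executor stores shuffle data.
  private val shufflesOnExecutor = mutable.Map.empty[String, mutable.Set[Int]]

  def jobStarted(jobId: Int, shuffleIds: Set[Int]): Unit = {
    activeJobs.update(jobId, shuffleIds)
  }

  def jobEnded(jobId: Int): Unit = {
    activeJobs.remove(jobId)
  }

  // Record that a finished task left data for `shuffleId` on `execId`.
  def taskWroteShuffle(execId: String, shuffleId: Int): Unit = {
    shufflesOnExecutor.getOrElseUpdate(execId, mutable.Set.empty[Int]) += shuffleId
  }

  // Assumed to be called when a shuffle is cleaned up, e.g. after its RDD is collected.
  def shuffleCleaned(shuffleId: Int): Unit = {
    shufflesOnExecutor.values.foreach(_ -= shuffleId)
  }

  // "Shuffle busy": the executor holds data for a shuffle used by an active job.
  def isShuffleBusy(execId: String): Boolean = {
    val active = activeJobs.values.flatten.toSet
    shufflesOnExecutor.get(execId).exists(_.exists(active.contains))
  }

  // Busy executors are never removable; executors holding only inactive shuffle
  // data wait for the shuffle timeout; all others use the plain idle timeout.
  def canRemove(execId: String, idleMs: Long): Boolean = {
    if (isShuffleBusy(execId)) {
      false
    } else if (shufflesOnExecutor.get(execId).exists(_.nonEmpty)) {
      idleMs >= shuffleTimeoutMs
    } else {
      idleMs >= idleTimeoutMs
    }
  }
}
```

In this model, once the shuffle is cleaned up (the GC-driven path described above), the executor falls back to the ordinary idle timeout, which is why the shuffle timeout should rarely be the deciding factor for typical applications.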

