rdblue commented on a change in pull request #23401: [SPARK-26513][Core]: Trigger GC on executor node idle
URL: https://github.com/apache/spark/pull/23401#discussion_r245070406
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/executor/Executor.scala
 ##########
 @@ -628,6 +630,22 @@ private[spark] class Executor(
           }
       } finally {
         runningTasks.remove(taskId)
+        if(idleGCEnabled) {
+          if (runningTasks.isEmpty) {
 
 Review comment:
   I think the concern is that the executor doesn't know when it will receive another task. Consider an executor with 1 core and high memory usage due to broadcast tables: idle detection would trigger a GC between every task, which could delay the entire job.
   
   I think it would be better to coordinate with the driver, which knows whether there are more tasks in a stage (or whether multiple stages are running concurrently). A stage boundary seems like the right time to optimistically trigger a GC. Of course, that only works if the executor is going to keep running.
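
   To make the suggestion concrete, a driver-side stage-boundary hook could look roughly like the sketch below. This is only an illustration, not the PR's implementation: `SparkListener` and `onStageCompleted` are real Spark driver-side APIs, but a listener runs on the driver JVM, so actually collecting on executors would require an additional executor-side RPC (assumed here, not a real Spark API).

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerStageCompleted}

// Illustrative sketch only: a listener that fires when a stage completes.
// A real implementation would need to (a) check with the scheduler that no
// other stages are still running, and (b) send an RPC to each executor
// asking it to run System.gc(), since this callback executes on the driver.
class StageBoundaryGcListener extends SparkListener {
  override def onStageCompleted(
      stageCompleted: SparkListenerStageCompleted): Unit = {
    // Hypothetical condition and RPC, both assumed for illustration:
    // if (noOtherRunningStages()) {
    //   executorEndpoints.foreach(_.send(TriggerIdleGc))
    // }
  }
}
```

   Registered via `sparkContext.addSparkListener(...)`, such a hook would let the driver decide when an optimistic GC is cheap, instead of each executor guessing based on its local `runningTasks` queue.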

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
