Ngone51 commented on a change in pull request #29452:
URL: https://github.com/apache/spark/pull/29452#discussion_r471997617



##########
File path: 
core/src/main/scala/org/apache/spark/scheduler/ExecutorDecommissionInfo.scala
##########
@@ -18,11 +18,21 @@
 package org.apache.spark.scheduler
 
 /**
- * Provides more detail when an executor is being decommissioned.
+ * Message providing more detail when an executor is being decommissioned.
  * @param message Human readable reason for why the decommissioning is happening.
  * @param isHostDecommissioned Whether the host (aka the `node` or `worker` in other places) is
  *                             being decommissioned too. Used to infer if the shuffle data might
  *                             be lost even if the external shuffle service is enabled.
  */
 private[spark]
 case class ExecutorDecommissionInfo(message: String, isHostDecommissioned: Boolean)
+
+/**
+ * State related to decommissioning that is kept by the TaskSchedulerImpl. This state is derived
+ * from the info message above but it is kept distinct to allow the state to evolve independently
+ * from the message.
+ * from the message.
+ */
+case class ExecutorDecommissionState(message: String,

Review comment:
    > So I felt that adding the timestamp field to a "message" is not a good idea because it bloats it unnecessarily with information it does not need. It also raises the question: what does that timestamp even mean? The timestamp of the executor (which can originate this message) or the timestamp of the driver?
   
   
    The timestamp will be used to calculate `canExceedDeadline` in `TaskSetManager` later, right? And the timestamp means the approximate time at which the executor starts decommissioning.
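    To make that concrete, here is a minimal sketch (not the PR code; the field name `startTimeMs` and the helper below are my own assumptions) of how a driver-side timestamp kept in `ExecutorDecommissionState`, rather than in the message, could feed a deadline check in `TaskSetManager`:

```scala
package org.apache.spark.scheduler

// Hypothetical sketch only: the scheduler-side state carries the timestamp,
// while the incoming ExecutorDecommissionInfo message stays timestamp-free.
private[spark] case class ExecutorDecommissionState(
    message: String,
    // Approximate time (on the driver clock) at which the executor started decommissioning.
    startTimeMs: Long,
    isHostDecommissioned: Boolean)

private[spark] object DecommissionDeadline {
  // True while the configured kill interval since decommission start has not elapsed,
  // i.e. tasks on this executor may still be expected to finish before it goes away.
  def withinDeadline(state: ExecutorDecommissionState, killIntervalMs: Long): Boolean =
    System.currentTimeMillis() - state.startTimeMs < killIntervalMs
}
```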
   
   
    > It's not a problem to couple these -- it's just a preference to not couple them so that we can change them as time goes by (evolve them). Having a copy is redundant but it is also flexible.
   
    I just cannot imagine what kind of evolution would make things hard to handle after we couple them. Actually, what worries me more is that `ExecutorDecommissionInfo` might not satisfy the requirements of other cluster managers (if we support them in the future). That could be a reason not to couple `ExecutorDecommissionState` with `ExecutorDecommissionInfo`. This also reminds me again that we'd better have an interface for `ExecutorDecommissionInfo`, like `ExecutorLossReason`, so we have a way to work with other cluster managers rather than Standalone only.
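    For illustration, a rough sketch of what such an interface could look like (entirely hypothetical; none of the subclass names below come from the PR), mirroring the way `ExecutorLossReason` lets each cluster manager report its own details:

```scala
package org.apache.spark.scheduler

// Hypothetical sketch: an ExecutorDecommissionInfo hierarchy modelled after
// ExecutorLossReason, so each cluster manager can attach its own fields.
private[spark] sealed trait ExecutorDecommissionInfo {
  def message: String
}

// Standalone: the whole worker (host) may be decommissioned too, which matters
// for deciding whether shuffle data behind the external shuffle service is lost.
private[spark] case class StandaloneDecommissionInfo(
    message: String,
    isHostDecommissioned: Boolean) extends ExecutorDecommissionInfo

// Another cluster manager (e.g. Kubernetes or YARN) could carry different details,
// such as the node being drained, without bloating the Standalone message.
private[spark] case class NodeDrainDecommissionInfo(
    message: String,
    nodeName: String) extends ExecutorDecommissionInfo
```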




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


