Github user guoxiaolongzte commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20259#discussion_r161936082
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala ---
    @@ -179,6 +181,7 @@ private[deploy] class Master(
         }
         persistenceEngine = persistenceEngine_
         leaderElectionAgent = leaderElectionAgent_
    +    startupTime = System.currentTimeMillis()
    --- End diff --
    
    When the Spark master process becomes a zombie, a background shell script 
automatically restarts it to ensure high availability, but some applications 
may fail during the restart.
    
    If I look at the startup time metric today and the startup time is ten days 
or a month ago, I would consider the system relatively stable, with no recent 
restarts.
    
    If instead the startup time is one day or one hour ago, I would assume the 
system is unstable and that a restart occurred recently, which developers need 
to troubleshoot and analyze.
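    
    To make that reading of the metric concrete, here is a minimal sketch of 
the check I have in mind. The object `StartupTimeCheck`, its `classify` method, 
and the 24-hour window are hypothetical names and values chosen for 
illustration; none of them are part of Spark or of this PR.

```scala
// Hypothetical helper, not part of Spark: classifies a master by how long ago it started.
object StartupTimeCheck {
  sealed trait Stability
  case object Stable extends Stability
  case object RecentlyRestarted extends Stability

  // Assumed threshold: a restart within the last 24 hours is worth investigating.
  val RecentRestartWindowMs: Long = 24L * 60 * 60 * 1000

  // startupTimeMs is the value recorded by System.currentTimeMillis() at master startup.
  def classify(startupTimeMs: Long, nowMs: Long): Stability = {
    val uptimeMs = nowMs - startupTimeMs
    if (uptimeMs <= RecentRestartWindowMs) RecentlyRestarted else Stable
  }
}
```

    A monitoring job could call `StartupTimeCheck.classify(startupTime, 
System.currentTimeMillis())` and alert only on `RecentlyRestarted`. The window 
would need tuning per deployment.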


---

