Github user guoxiaolongzte commented on a diff in the pull request:
https://github.com/apache/spark/pull/20259#discussion_r161936082
--- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala
---
@@ -179,6 +181,7 @@ private[deploy] class Master(
}
persistenceEngine = persistenceEngine_
leaderElectionAgent = leaderElectionAgent_
+ startupTime = System.currentTimeMillis()
--- End diff --
The Spark master process can become a zombie. We run a background shell
script that automatically restarts the Spark master process to ensure high
availability, but some applications may fail during the restart.
If I look at the startup time metric today and it shows that the master
started ten days or a month ago, I would consider the system relatively
stable, with no recent restarts.
If the metric shows that the master started a day or an hour ago, I would
assume that the system is unstable and that a restart occurred recently, which
developers need to troubleshoot and analyze.
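To make the reasoning above concrete, here is a minimal sketch of how a monitoring check could interpret the metric. It is not part of this PR: the object name `StartupTimeCheck`, the helper names, and the one-day threshold are all hypothetical, and it assumes only that `startupTime` is an epoch-millisecond value like the one set from `System.currentTimeMillis()` in the diff.

```scala
// Hypothetical helper, not part of the PR: interprets a startupTime value
// (epoch milliseconds) the way the comment above describes.
object StartupTimeCheck {
  // Assumed threshold: a master that started less than one day ago is
  // treated as "recently restarted". Tune this for your own environment.
  val RecentRestartThresholdMs: Long = 24L * 60 * 60 * 1000

  // Uptime in milliseconds, clamped at zero in case of clock skew.
  def uptimeMs(startupTime: Long, now: Long): Long =
    math.max(0L, now - startupTime)

  // True when the master started within the threshold window before `now`.
  def recentlyRestarted(
      startupTime: Long,
      now: Long,
      thresholdMs: Long = RecentRestartThresholdMs): Boolean =
    uptimeMs(startupTime, now) < thresholdMs
}
```

With this sketch, a master that started an hour ago would be flagged for investigation, while one that started ten days ago would not.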
---