holdenk commented on a change in pull request #32436:
URL: https://github.com/apache/spark/pull/32436#discussion_r626096418



##########
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsLifecycleManager.scala
##########
@@ -217,14 +217,42 @@ private[spark] class ExecutorPodsLifecycleManager(
     ExecutorExited(exitCode, exitCausedByApp, exitMessage)
   }
 
+  // A utility function to try to help people figure out what's gone wrong faster.
+  private def describeExitCode(code: Int): String = {
+    val humanStr = code match {
+      case 0 => "(success)"
+      case 1 => "(generic, look at logs to clarify)"
+      case 42 => "(douglas adams)"
+      // Spark specific
+      case 10 => "(Uncaught exception)"
+      case 50 => "(Uncaught exception)"
+      case 52 => "(JVM OOM)"
+      case 53 => "(DiskStore failed to create temp dir)"
+      // K8s & JVM specific exit codes
+      case 126 => "(not executable - possibly perm or arch)"
+      case 137 => "(SIGKILL, possible container OOM)"
+      case 139 => "(SIGSEGV: that's unexpected)"
+      case 255 => "(exit-1, your guess is as good as mine)"

Review comment:
       So I think it's going to be inconsistent depending on how exactly it shows up (e.g. does the JVM hit an uncaught exception while trying to write a file, or do we exceed the resource quota), so for now I unfortunately don't have a clear exit code to map it to. I could try adding a base handler that makes uncaught IO errors exit with a specific code, but I'd rather do that in a separate PR.
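
       For context, a rough sketch of what such a base handler could look like (not part of this PR; the object names and the exit code 54 below are made up for illustration):

import java.io.IOException

// Hypothetical exit code for uncaught IO errors; not something Spark defines today.
object IoExitCodes {
  val UNCAUGHT_IO_ERROR = 54
}

object IoAwareExitHandler {
  // Install a JVM-wide default handler so an uncaught IOException produces a
  // stable, recognizable exit code instead of falling into the generic paths.
  def install(): Unit = {
    Thread.setDefaultUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler {
      override def uncaughtException(t: Thread, e: Throwable): Unit = e match {
        case _: IOException => System.exit(IoExitCodes.UNCAUGHT_IO_ERROR)
        case _ => System.exit(1)
      }
    })
  }
}

       If something along those lines were installed early in executor startup, describeExitCode could then map the new code to a human-readable hint like the other cases above.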



