arham0254A commented on PR #56736: URL: https://github.com/apache/spark/pull/56736#issuecomment-4801968776
@pan3793 That would definitely be the ideal behavior for downstream debugging. However, the issue originally reported is that external orchestrators (such as Airflow or bash scripts) suffer silent pipeline failures because `spark-submit` returns `0` (success) even though the remote driver has crashed. This PR is aimed at providing an immediate fix to stop those false positives.

Currently, the `reportDriverStatus` method relies entirely on the `DriverStatusResponse` RPC message, which carries the `DriverState` enum (for example FINISHED, FAILED, ERROR, KILLED) and an `Option[Exception]`, but not the actual integer exit code of the remote JVM process. That value is never passed back from the Master to the Client in the payload. To forward the real exit code, we would need to significantly expand the scope of this PR by modifying the internal RPC protocol across the Worker, Master, and Client to capture, store, and transmit that integer.

Given that architectural constraint, would it be acceptable to stick with a generic non-zero code (`-1`) in this PR to immediately resolve the critical silent-failure bug for orchestrators, and to open a follow-up ticket to enhance the RPC protocol later?
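To illustrate the orchestrator's side of the problem described in the comment, here is a minimal sketch in Python (not Spark's Scala, and not the code in this PR). The names `exit_code_for`, `run_stand_in_client`, and `orchestrator_step_succeeded` are hypothetical, and the child process is only a stand-in for `spark-submit`. It assumes the proposed behavior: exit `0` only when the driver state is FINISHED, and a generic `-1` otherwise.

```python
import subprocess
import sys

# Driver states named in the comment; treated here as the terminal states.
TERMINAL_STATES = {"FINISHED", "FAILED", "ERROR", "KILLED"}


def exit_code_for(state: str) -> int:
    """Hypothetical mapping: 0 only for FINISHED, generic -1 for any failure."""
    if state not in TERMINAL_STATES:
        raise ValueError(f"not a terminal driver state: {state}")
    return 0 if state == "FINISHED" else -1


def run_stand_in_client(state: str) -> int:
    """Spawn a child that exits the way the patched client would.

    This is a stand-in for spark-submit, not a real invocation of it.
    """
    code = exit_code_for(state)
    child = subprocess.run([sys.executable, "-c", f"import sys; sys.exit({code})"])
    return child.returncode


def orchestrator_step_succeeded(state: str) -> bool:
    """An orchestrator typically treats only exit status 0 as success."""
    return run_stand_in_client(state) == 0


if __name__ == "__main__":
    for state in sorted(TERMINAL_STATES):
        print(state, "->", "success" if orchestrator_step_succeeded(state) else "failure")
```

With this mapping, an orchestrator that only checks for a zero exit status would mark FAILED, ERROR, and KILLED runs as failed instead of silently passing them. Note that on POSIX systems the parent sees an exit status of `-1` as `255`, so callers should test for "non-zero" rather than for `-1` specifically.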
