dongjoon-hyun opened a new pull request, #650: URL: https://github.com/apache/spark-kubernetes-operator/pull/650
### What changes were proposed in this pull request?

This PR replaces `-XX:+ExitOnOutOfMemoryError` with `-XX:+CrashOnOutOfMemoryError -XX:ErrorFile=/dev/stderr` in the operator container's default JVM arguments.

### Why are the changes needed?

`-XX:+ExitOnOutOfMemoryError` simply calls `exit(1)` on OOM, leaving no diagnostic information behind. `-XX:+CrashOnOutOfMemoryError` instead aborts the JVM (via `SIGABRT`) and produces a HotSpot fatal error report (stack traces, memory map, GC state, OOM cause), which makes post-mortem analysis of operator OOMs much easier.

By default the report is written to `hs_err_pid<pid>.log` on disk, which would be lost when the pod is restarted and would consume ephemeral storage. Setting `-XX:ErrorFile=/dev/stderr` redirects the report to the container's stderr so it is captured by `kubectl logs` along with the existing log4j2 output (the console appender already targets `SYSTEM_ERR`).

### Does this PR introduce _any_ user-facing change?

Yes. The default behavior of the operator container changes as follows:

| Aspect | Before | After |
| --- | --- | --- |
| JVM OOM flag | `-XX:+ExitOnOutOfMemoryError` | `-XX:+CrashOnOutOfMemoryError -XX:ErrorFile=/dev/stderr` |
| Termination mechanism | `exit(1)` | `abort()` via `SIGABRT` |
| Container exit code | `1` | `134` (`128 + SIGABRT(6)`) |
| Fatal error report | None | Written to stderr (visible in `kubectl logs`) |
| On-disk `hs_err_pid<pid>.log` | N/A | Not created (redirected to `/dev/stderr`) |

Users who override `jvmArgs` are unaffected.

### How was this patch tested?

- `helm lint build-tools/helm/spark-kubernetes-operator` passes.
- `helm template build-tools/helm/spark-kubernetes-operator` shows the rendered `OPERATOR_JAVA_OPTS` env var contains `-XX:+CrashOnOutOfMemoryError -XX:ErrorFile=/dev/stderr` and no longer contains `-XX:+ExitOnOutOfMemoryError`.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.7
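
For readers checking the "Container exit code" row of the PR description (`134` = `128 + SIGABRT(6)`), the arithmetic follows the common convention that a process killed by signal N is reported with exit code `128 + N`. The following is a minimal sketch of that decoding, not part of the PR; the helper name `decode_exit_code` is hypothetical, and signal names are resolved through Python's standard `signal` module, so they reflect the platform the script runs on.

```python
import signal


def decode_exit_code(code: int) -> str:
    """Describe an exit code using the 128 + signal-number convention.

    Hypothetical helper for illustration only; it is not part of the
    operator or of the PR.
    """
    if code > 128:
        signum = code - 128
        try:
            # Look up the platform's name for this signal number.
            name = signal.Signals(signum).name
        except ValueError:
            # Unknown signal number on this platform.
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited with status {code}"


if __name__ == "__main__":
    # Before the change: the JVM calls exit(1), as described in the PR.
    print(decode_exit_code(1))
    # After the change: the JVM aborts; on Linux SIGABRT is signal 6.
    print(decode_exit_code(128 + int(signal.SIGABRT)))
```

On a Linux host the second call receives `134`, matching the value in the table. A monitoring rule that previously matched exit code `1` for operator OOMs would therefore need to match `134` once the new default is in effect.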
