yadavay-amzn opened a new pull request, #55720: URL: https://github.com/apache/spark/pull/55720
### What changes were proposed in this pull request? Fixes SPARK-56448 (https://issues.apache.org/jira/browse/SPARK-56448). The Spark Connect REPL uses Ammonite. Ammonite's default `Storage.Folder` persists compiled predef classes under `~/.ammonite/<version>/cache`. On a subsequent REPL start from the same working directory, the cached `CodePredef` class is reloaded but its reference to the per-session `ArgsPredef` helper is stale, producing a `NullPointerException` during predef initialization. This PR switches the Connect REPL's compile cache to `Storage.InMemory` so every session starts fresh and no stale cache is carried across restarts. The `ammonite.Main` construction is extracted into a package-private helper (`ConnectRepl.newAmmoniteMain`) to keep the regression test localised to unit-level. ### Why are the changes needed? The stale-cache failure is a user-visible crash on every second invocation of `bin/spark-shell --remote sc://...` from the same working directory. Repro steps are on the JIRA. ### Does this PR introduce _any_ user-facing change? There is one minor observable tradeoff: because the compile cache is now in-memory rather than persisted, each REPL start recompiles the predef instead of reading the cached classfiles. This adds ~a few hundred milliseconds to subsequent REPL startups but eliminates the NPE. We believe this is the correct tradeoff — a small startup cost is preferable to a hard failure. ### How was this patch tested? Added `ConnectReplSuite` with a regression test that asserts the `ammonite.Main` returned by `ConnectRepl.newAmmoniteMain` exposes no on-disk cache directory (`storageBackend.dirOpt.isEmpty`) and is an instance of `Storage.InMemory`. The `dirOpt` check is an observable behavioural assertion: if the `Storage.InMemory()` wiring is reverted, `ammonite.Main` falls back to the default `Storage.Folder(Defaults.ammoniteHome)` and the assertion fails with a clear message rather than silently compiling. I verified this locally by temporarily reverting only the `Storage.InMemory()` line and re-running the test; it fails with `Some(/home/<user>/.ammonite) was not empty`. Restoring the fix makes the test pass. Targeted test run: ``` ./build/sbt "connect-client-jvm/Test/testOnly org.apache.spark.sql.application.ConnectReplSuite" ... [info] - SPARK-56448: Connect REPL uses an in-memory Ammonite storage backend (107 milliseconds) [info] Tests: succeeded 1, failed 0 ``` Also ran `./dev/scalastyle` locally — passes. ### Was this patch authored or co-authored using generative AI tooling? Generated-by: Claude Opus 4.7 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
