yadavay-amzn opened a new pull request, #55720:
URL: https://github.com/apache/spark/pull/55720

   ### What changes were proposed in this pull request?
   
   Fixes SPARK-56448 (https://issues.apache.org/jira/browse/SPARK-56448).
   
   The Spark Connect REPL uses Ammonite. Ammonite's default `Storage.Folder`
   persists compiled predef classes under `~/.ammonite/<version>/cache`. On a
   subsequent REPL start from the same working directory, the cached
   `CodePredef` class is reloaded but its reference to the per-session
   `ArgsPredef` helper is stale, producing a `NullPointerException` during
   predef initialization.
   
   This PR switches the Connect REPL's compile cache to `Storage.InMemory`
   so every session starts fresh and no stale cache is carried across
   restarts. The `ammonite.Main` construction is extracted into a
   package-private helper (`ConnectRepl.newAmmoniteMain`) to keep the
   regression test localised to unit-level.
   
   ### Why are the changes needed?
   
   The stale-cache failure is a user-visible crash on every second
   invocation of `bin/spark-shell --remote sc://...` from the same working
   directory. Repro steps are on the JIRA.
   
   ### Does this PR introduce _any_ user-facing change?
   
   There is one minor observable tradeoff: because the compile cache is
   now in-memory rather than persisted, each REPL start recompiles the
   predef instead of reading the cached classfiles. This adds ~a few
   hundred milliseconds to subsequent REPL startups but eliminates the
   NPE. We believe this is the correct tradeoff — a small startup cost
   is preferable to a hard failure.
   
   ### How was this patch tested?
   
   Added `ConnectReplSuite` with a regression test that asserts the
   `ammonite.Main` returned by `ConnectRepl.newAmmoniteMain` exposes no
   on-disk cache directory (`storageBackend.dirOpt.isEmpty`) and is an
   instance of `Storage.InMemory`.
   
   The `dirOpt` check is an observable behavioural assertion: if the
   `Storage.InMemory()` wiring is reverted, `ammonite.Main` falls back to
   the default `Storage.Folder(Defaults.ammoniteHome)` and the assertion
   fails with a clear message rather than silently compiling. I verified
   this locally by temporarily reverting only the `Storage.InMemory()`
   line and re-running the test; it fails with
   `Some(/home/<user>/.ammonite) was not empty`. Restoring the fix makes
   the test pass.
   
   Targeted test run:
   ```
   ./build/sbt "connect-client-jvm/Test/testOnly 
org.apache.spark.sql.application.ConnectReplSuite"
   ...
   [info] - SPARK-56448: Connect REPL uses an in-memory Ammonite storage 
backend (107 milliseconds)
   [info] Tests: succeeded 1, failed 0
   ```
   
   Also ran `./dev/scalastyle` locally — passes.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: Claude Opus 4.7
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to