vsevolodstep-db opened a new pull request, #42995: URL: https://github.com/apache/spark/pull/42995
### What changes were proposed in this pull request?

This PR enhances the existing `ClosureCleaner` implementation to support cleaning closures defined in Ammonite. Please refer to [this gist](https://gist.github.com/vsevolodstep-db/b8e4d676745d6e2d047ecac291e5254c) for more context on how Ammonite code wrapping works and which problems this PR addresses.

As `ClosureCleaner` needs to be available in Spark Connect, I also moved the implementation to the `common-utils` module. This adds a new dependency, `xbean-asm9-shaded`, which is fairly small.

The existing implementation of `ClosureCleaner` also checks whether the closure is serializable. This check is specific to `spark-core`, so to preserve the existing behaviour without changing other code, I moved it to `SparkClosureCleaner`, which is now used in `core`.

The important changes affect `ClosureCleaner` only. They should not affect the existing code path for normal Scala closures or closures defined in a native Scala REPL; they cover only closures defined in Ammonite.

This PR also modifies Spark Connect's `UserDefinedFunction` to actually use `ClosureCleaner`.

### Why are the changes needed?

To properly support closures defined in Ammonite, reduce UDF payload size, and avoid possible `NonSerializable` exceptions. The cases covered include:

- a lambda capturing its outer command object, leading to a circular dependency
- a lambda capturing other command objects transitively, exploding the payload size

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing tests, plus new tests in `ReplE2ESuite` covering various scenarios that use Spark Connect with the Ammonite REPL to make sure closures are actually cleaned.

### Was this patch authored or co-authored using generative AI tooling?

No
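
For readers unfamiliar with the capture problem described under "Why are the changes needed?", here is a minimal sketch in plain Scala. It is not Ammonite-generated code and not the `ClosureCleaner` implementation; `CaptureSketch`, `CommandWrapper`, and `serializedSize` are hypothetical names used only for illustration. It assumes Scala 2.12 or later, where function literals are serializable and a lambda that reads a field of its enclosing instance captures that instance.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

object CaptureSketch {
  // Hypothetical stand-in for a REPL command object that holds unrelated, large state.
  class CommandWrapper extends Serializable {
    val unrelated: Array[Byte] = new Array[Byte](1 << 20) // 1 MiB the lambda never uses
    val offset: Int = 42

    // Reads the field `offset`, so the lambda captures `this` (the whole wrapper).
    def capturing: Int => Int = (i: Int) => i + offset

    // Copies the field into a local first, so only an Int is captured.
    def detached: Int => Int = {
      val localOffset = offset
      (i: Int) => i + localOffset
    }
  }

  // Size in bytes of the Java-serialized form of `obj`.
  def serializedSize(obj: AnyRef): Int = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(obj) finally out.close()
    buffer.size()
  }

  def main(args: Array[String]): Unit = {
    val cmd = new CommandWrapper
    println(s"capturing: ${serializedSize(cmd.capturing)} bytes")
    println(s"detached:  ${serializedSize(cmd.detached)} bytes")
  }
}
```

Under those assumptions, the `capturing` lambda should serialize to more than 1 MiB because the wrapper and its `unrelated` array travel with it, while `detached` stays small. A closure cleaner aims to get a similar reduction automatically, without the user having to copy fields into locals by hand.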
