sstanoje opened a new pull request, #39136: URL: https://github.com/apache/spark/pull/39136
In the GraalVM team, we have been investigating the possibility of generating native images of Spark applications by compiling them ahead of time with GraalVM Native Image (https://www.graalvm.org/latest/reference-manual/native-image/). Compatibility with Native Image would give users another deployment option, with the benefits of instant startup, a smaller memory footprint, and a small package size (no JVM is needed at runtime).

However, Native Image relies on `java.lang.ClassLoader` for class loading, which means that all classes loaded by one loader must have unique names. Native Image handles dynamically generated classes in a special way and uses a single loader to load them all, so their names must be unique and stable across runs.

On the Spark side, `org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator` generates classes that are all named `GeneratedClass` but have different bytecode. A simple way to make Spark Native Image friendly is to give those classes unique and stable names. One suggestion is to append the value of `org.apache.spark.sql.catalyst.expressions.codegen.CodeAndComment#hashCode` to the name of each generated class.

Moreover, generated classes in Spark can contain lambdas in their bodies, which can still lead to unstable bytecode after the change above, because the generated names of lambdas contain their memory address. Stripping this memory address guarantees stable bytecode in this case as well.

With these two small changes, all Spark-generated classes become Native Image friendly. A minimal reproducer that fails on both Java and Native Image is attached. With this patch, we can successfully run all 8 Spark benchmarks from the popular Renaissance benchmark suite (https://renaissance.dev/) as native images.

[reproducer.zip](https://github.com/apache/spark/files/10266721/reproducer.zip)
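To make the two proposed changes concrete, here is a minimal standalone sketch in Java. It is not the code from the patch: the class `StableNames` and both helper methods are hypothetical, `String.hashCode` of the generated source stands in for `CodeAndComment#hashCode`, and the lambda name format matched by the regular expression (`...$$Lambda$14/0x0000000800c03000`) is an assumption that varies between JDK versions.

```java
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only; not part of Spark or GraalVM.
final class StableNames {

    // Assumed shape of a lambda class name: "$$Lambda", an optional "$<counter>",
    // then "/" and an address-like hex (or decimal) suffix.
    private static final Pattern LAMBDA_SUFFIX =
            Pattern.compile("(\\$\\$Lambda(?:\\$\\d+)?)/(?:0x)?[0-9a-fA-F]+");

    // Derive a class name from the generated source text. String.hashCode is
    // specified by the Java API, so the same source yields the same name on every run.
    static String stableClassName(String baseName, String generatedSource) {
        return baseName + "_" + Integer.toHexString(generatedSource.hashCode());
    }

    // Drop the run-dependent suffix from a lambda class name, keeping the stable prefix.
    static String stripLambdaAddress(String className) {
        return LAMBDA_SUFFIX.matcher(className).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(stableClassName("GeneratedClass", "a"));
        System.out.println(stripLambdaAddress("GeneratedClass$$Lambda$14/0x0000000800c03000"));
    }
}
```

Two different source bodies can still collide on a 32-bit hash, so a real implementation may want a wider digest or a collision check. The sketch only shows the naming idea.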
