sstanoje opened a new pull request, #39136:
URL: https://github.com/apache/spark/pull/39136

   In the GraalVM team, we have been investigating the possibility of 
generating native images of Spark applications by compiling them ahead of time 
with GraalVM Native Image 
(https://www.graalvm.org/latest/reference-manual/native-image/). Being 
compatible with Native Image offers another deployment option to users who can 
benefit from instant startup, a smaller memory footprint, and a small packaging 
size (no JVM needed at runtime). However, Native Image relies on 
`java.lang.ClassLoader` for class loading, which means that all classes loaded 
by a single loader must have unique names. Native Image handles dynamically 
generated classes in a special way and loads them all with one loader, so 
their names must be unique and stable across runs.
   On the Spark side, 
`org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator` generates 
classes that are all named `GeneratedClass` but have different bytecode. A 
simple way to make Spark Native Image friendly is to give those classes unique 
and stable names, for example by appending 
`org.apache.spark.sql.catalyst.expressions.codegen.CodeAndComment#hashCode` to 
the name of each generated class.
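   As a rough illustration of the naming idea above, the following Java sketch 
derives a class name from a hash of the generated source text. The names 
`StableNames` and `stableClassName` are hypothetical and not part of Spark, and 
`String#hashCode` is used here only as a stand-in for `CodeAndComment#hashCode`. 
It is a sketch under those assumptions, not the code in this patch.

```java
// Hypothetical helper, not part of Spark: illustrates deriving a stable name.
final class StableNames {
    private StableNames() {}

    // Builds a class name whose suffix depends only on the generated source text.
    static String stableClassName(String baseName, String sourceBody) {
        // String.hashCode is specified by the Java API, so the same text
        // yields the same value on every run and on every JVM.
        int hash = sourceBody.hashCode();
        // Unsigned hex avoids a '-' sign, which is not legal in a Java identifier.
        return baseName + "_" + Integer.toHexString(hash);
    }
}
```

For example, `StableNames.stableClassName("GeneratedClass", "a")` returns 
`GeneratedClass_61`, because `"a".hashCode()` is 97 (0x61). A 32-bit hash can 
collide, so two different bodies may still map to the same name; a real 
implementation would need to decide how to handle that case.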
   Moreover, generated classes in Spark can contain lambdas in their bodies. 
Lambdas include a memory address in their generated names, so the bytecode can 
remain unstable even after the change above. Stripping this memory address 
makes the bytecode stable in this case as well. With these two small changes, 
all Spark-generated classes become Native Image friendly. A minimal reproducer 
that fails on both Java and Native Image is attached.
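   The address stripping described above can be sketched in Java as follows. 
This assumes lambda class names of the form 
`Outer$$Lambda$14/0x0000000800c03000`, as seen on some recent HotSpot 
versions; that format is not specified and may differ between JVMs. The names 
`LambdaNames` and `stripAddress` are hypothetical, and the patch may locate and 
strip the address differently.

```java
// Hypothetical helper, not part of Spark or GraalVM.
final class LambdaNames {
    private LambdaNames() {}

    // Removes a trailing "/0x<hex digits>" suffix, if present, and leaves
    // every other name unchanged.
    static String stripAddress(String className) {
        return className.replaceFirst("/0x[0-9a-fA-F]+$", "");
    }
}
```

With this sketch, `LambdaNames.stripAddress("Outer$$Lambda$14/0x0000000800c03000")` 
returns `Outer$$Lambda$14`, while a name without such a suffix, for example 
`GeneratedClass`, is returned as is.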
   With this patch, we can successfully run all 8 Spark benchmarks from the 
popular Renaissance benchmark suite as native images: https://renaissance.dev/.
   
[reproducer.zip](https://github.com/apache/spark/files/10266721/reproducer.zip)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

