JoshRosen opened a new pull request #24905: [SPARK-28102] Add configuration for selecting LZ4 implementation (safe, unsafe, JNI)
URL: https://github.com/apache/spark/pull/24905

## What changes were proposed in this pull request?

This PR adds a new `spark.io.compression.lz4.factory` configuration for selecting the LZ4 implementation (safe, unsafe, JNI). This allows advanced users either to explicitly opt out of JNI code or to explicitly _require_ JNI code (hard-failing if the JNI libraries cannot be loaded or initialized).

Spark currently uses the default `LZ4BlockInputStream` / `LZ4BlockOutputStream` constructors, which use `LZ4Factory.fastestInstance()`. This factory attempts to load and initialize the JNI library and falls back to a Java implementation in case of errors (a missing native library or exceptions during initialization).

I deploy Spark in an environment where the JNI libraries don't work properly, so I'd like to explicitly disable the use of JNI to avoid performance problems in the existing fallback logic. With the current code, exceptions are repeatedly thrown from a `LZ4JNI` static initializer, which leads to significant lock contention because the stack traces are filled in while the lock is held.

In this PR, I introduce a single configuration to select both the `LZ4Factory` and `XXHashFactory` implementations. The default behavior is the same as before: use `fastestInstance`.

## How was this patch tested?

New unit tests covering all values of the new flag.
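
To illustrate the fallback cost described above, here is a minimal, standalone Java sketch of the general JVM behavior. It does not use lz4-java or Spark code: `BrokenNative` and `pickImplementation` are hypothetical stand-ins for a JNI-backed class and for a "try native, else fall back" lookup that does not remember an earlier failure. It only shows that every attempt raises and catches a new error; it does not reproduce the lock contention itself.

```java
import java.util.ArrayList;
import java.util.List;

public class StaticInitFailureDemo {

    // Hypothetical stand-in for a JNI-backed class whose static initializer fails.
    static class BrokenNative {
        // Not a compile-time constant, so reading it triggers class initialization.
        static final int HANDLE = load();

        private static int load() {
            throw new IllegalStateException("native library unavailable");
        }
    }

    // Hypothetical lookup: try the native class, fall back to "java" on failure.
    // Nothing is cached, so a failing environment pays for a new error on every call.
    static String pickImplementation(List<String> failures) {
        try {
            return "native:" + BrokenNative.HANDLE;
        } catch (LinkageError e) {
            // The first failure is an ExceptionInInitializerError; later attempts
            // get a NoClassDefFoundError because the class stays in an error state.
            failures.add(e.getClass().getSimpleName());
            return "java";
        }
    }

    public static void main(String[] args) {
        List<String> failures = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            System.out.println(pickImplementation(failures));
        }
        System.out.println(failures);
    }
}
```

Each call ends up on the Java fallback, but only after constructing another error object. Selecting a pure-Java implementation explicitly, as the new configuration allows, avoids attempting the native path at all.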
