JoshRosen edited a comment on issue #24905: [SPARK-28102] Avoid performance problems when lz4-java JNI libraries fail to initialize
URL: https://github.com/apache/spark/pull/24905#issuecomment-503624028

Here's an example microbenchmark illustrating the performance problems in the old code in case JNI initialization failed:

```scala
val numThreads = 10 // e.g. number of task threads
val numCallsPerThread = 5000 // e.g. number of reduce partitions
val threads = (1 to numThreads).map { _ =>
  new Thread {
    override def run(): Unit = {
      (1 to numCallsPerThread).foreach { _ =>
        shaded.spark.net.jpountz.lz4.LZ4Factory.fastestInstance
      }
    }
  }
}
val start = System.currentTimeMillis()
threads.foreach(_.start())
threads.foreach(_.join())
val end = System.currentTimeMillis
println(end - start)
```

If I use `fastestJavaInstance` then this runs in ~15ms, but it takes ~950ms with `fastestInstance` if the JNI library fails to initialize. If we cache the result of the `fastestInstance` call then performance is identical.
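
For illustration, here is a minimal sketch of the caching idea mentioned above. It is not necessarily how the PR implements it. To stay runnable without lz4-java on the classpath, it uses a hypothetical `slowLookup()` as a stand-in for `LZ4Factory.fastestInstance` on the failing-JNI path; `CachedFactoryDemo`, `lookups`, `cachedFactory`, and `hammer` are likewise made-up names. A `lazy val` is initialized at most once, so the expensive lookup runs a single time no matter how many threads ask for it.

```scala
import java.util.concurrent.atomic.AtomicInteger

object CachedFactoryDemo {
  // Counts how many times the expensive lookup actually runs.
  val lookups = new AtomicInteger(0)

  // Hypothetical stand-in for an expensive call such as
  // LZ4Factory.fastestInstance when native initialization fails each time.
  def slowLookup(): String = {
    lookups.incrementAndGet()
    Thread.sleep(1) // simulate the cost of a failed native-library load
    "java-fallback"
  }

  // Initialized on first access only; lazy val initialization is thread-safe.
  lazy val cachedFactory: String = slowLookup()

  // Same shape as the microbenchmark: many threads, many calls per thread.
  // Returns the number of times slowLookup() was invoked.
  def hammer(numThreads: Int, numCallsPerThread: Int): Int = {
    val threads = (1 to numThreads).map { _ =>
      new Thread {
        override def run(): Unit =
          (1 to numCallsPerThread).foreach(_ => cachedFactory)
      }
    }
    threads.foreach(_.start())
    threads.foreach(_.join())
    lookups.get()
  }
}
```

Calling `CachedFactoryDemo.hammer(10, 5000)` performs 50,000 reads of `cachedFactory` but only one `slowLookup()` call, which is the effect the cached `fastestInstance` result would have in the benchmark.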
