I found a huge performance regression (my application runs at about 1/20 of its original speed) after Spark git commit 0441515f221146756800dc583b225bdec8a6c075.
Applying the following patch fixes my issue:

diff --git a/core/src/main/scala/org/apache/spark/executor/Executor.scala b/core/src/main/scala/org/apache/spark/executor/Executor.scala
index 214a8c8..ebec21d 100644
--- a/core/src/main/scala/org/apache/spark/executor/Executor.scala
+++ b/core/src/main/scala/org/apache/spark/executor/Executor.scala
@@ -145,7 +145,7 @@ private[spark] class Executor(
       }
     }

-    override def run() {
+    override def run() : Unit = SparkHadoopUtil.get.runAsSparkUser { () =>
       val startTime = System.currentTimeMillis()
       SparkEnv.set(env)
       Thread.currentThread.setContextClassLoader(replClassLoader)

With this patch, runAsSparkUser calls UserGroupInformation.doAs() to execute the task and my application runs fine; without it, performance is very poor. The application hotspot is JNIHandleBlock::alloc_handle (JVM code) with a very high CPI (cycles per instruction; < 1 is healthy, here it is > 10). My application passes large arrays (> 80K elements) to native C code through JNI.

Why does UserGroupInformation.doAs() have such a large performance impact in this situation?

Thanks,
Zhonghui
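P.S. For anyone not familiar with the wrapper in question: Hadoop's UserGroupInformation.doAs() ultimately delegates to JAAS's Subject.doAs, so the task body runs inside the security context of a Subject. Below is a minimal standalone sketch of that wrapping pattern using only the JDK's Subject.doAs (the class and method names here are my own for illustration, not Spark's or Hadoop's):

```java
import javax.security.auth.Subject;
import java.security.PrivilegedAction;

public class DoAsSketch {
    // Runs the given work inside the security context of a (here empty)
    // Subject, mirroring the kind of wrapping that
    // UserGroupInformation.doAs performs around an executor task.
    static int runWrapped(PrivilegedAction<Integer> work) {
        Subject subject = new Subject();
        return Subject.doAs(subject, work);
    }

    public static void main(String[] args) {
        // The task itself is unchanged; only the surrounding
        // security/access-control context differs.
        int result = runWrapped(() -> 40 + 2);
        System.out.println(result); // prints 42
    }
}
```

The point of the sketch is that the task code is identical either way; only the thread's access-control context differs, which is why it is surprising that the JNI-heavy workload behaves so differently with and without the wrapper.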