I found a huge performance regression (my application runs at about 1/20 of its original speed) after Spark git commit 0441515f221146756800dc583b225bdec8a6c075.
Applying the following patch fixes my issue:

diff --git a/core/src/main/scala/org/apache/spark/executor/Executor.scala b/core/src/main/scala/org/apache/spark/executor/Executor.scala
index 214a8c8..ebec21d 100644
--- a/core/src/main/scala/org/apache/spark/executor/Executor.scala
+++ b/core/src/main/scala/org/apache/spark/executor/Executor.scala
@@ -145,7 +145,7 @@ private[spark] class Executor(
       }
     }

-    override def run() {
+    override def run() : Unit = SparkHadoopUtil.get.runAsSparkUser { () =>
       val startTime = System.currentTimeMillis()
       SparkEnv.set(env)
       Thread.currentThread.setContextClassLoader(replClassLoader)

With this patch, runAsSparkUser calls UserGroupInformation.doAs() to execute the task and my application runs fine; without it, performance is very poor. The application hotspot is JNIHandleBlock::alloc_handle (JVM code) with a very high CPI (cycles per instruction; < 1 is healthy, here it is > 10). My application passes large arrays (> 80K elements) to native C code through JNI.

Why does UserGroupInformation.doAs() have such a large performance impact in this situation?

Thanks,
Zhonghui
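P.S. For anyone not familiar with the wrapper in question: Hadoop's UserGroupInformation.doAs() ultimately delegates to JAAS's Subject.doAs, so the task body runs inside the security context of a Subject. Below is a minimal standalone sketch of that wrapping pattern using only the JDK's Subject.doAs (the class and method names here are my own for illustration, not Spark's or Hadoop's):

```java
import javax.security.auth.Subject;
import java.security.PrivilegedAction;

public class DoAsSketch {
    // Runs the given work inside the security context of a (here empty)
    // Subject, mirroring the kind of wrapping that
    // UserGroupInformation.doAs performs around an executor task.
    static int runWrapped(PrivilegedAction<Integer> work) {
        Subject subject = new Subject();
        return Subject.doAs(subject, work);
    }

    public static void main(String[] args) {
        // The task itself is unchanged; only the surrounding
        // security/access-control context differs.
        int result = runWrapped(() -> 40 + 2);
        System.out.println(result); // prints 42
    }
}
```

The point of the sketch is that the task code is identical either way; only the thread's access-control context differs, which is why it is surprising that the JNI-heavy workload behaves so differently with and without the wrapper.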