[ https://issues.apache.org/jira/browse/SPARK-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kevin (Sangwoo) Kim resolved SPARK-1963.
----------------------------------------
    Resolution: Invalid

> Job aborted with NullPointerException from DAGScheduler.scala:1020
> ------------------------------------------------------------------
>
>                 Key: SPARK-1963
>                 URL: https://issues.apache.org/jira/browse/SPARK-1963
>             Project: Spark
>          Issue Type: Bug
>            Reporter: Kevin (Sangwoo) Kim
>
> Hi, I'm testing Spark 0.9.1 on an EC2 r3.8xlarge (32 cores, 240 GiB RAM).
> While counting active users from 70 GB of data, the Spark job aborted with an NPE
> from the DAGScheduler.
> I estimate the active-user count to be around 1-2M.
> Here's what I did:
> {code}
> val logs = sc.textFile("file:///spark/data/*")
> val activeUser = logs.map{x => val a = LogObjectExtractor.getAnonymousAction(x); a.getUserId}.distinct
> activeUser.count
> {code}
> and here's the log:
> {code}
> 14/05/29 05:26:46 INFO scheduler.TaskSetManager: Serialized task 1.0:2235 as 1883 bytes in 1 ms
> 14/05/29 05:26:46 INFO scheduler.TaskSetManager: Finished TID 2207 in 17541 ms on ip-10-169-5-198.ap-northeast-1.compute.internal (progress: 2204/2267)
> 14/05/29 05:26:46 INFO scheduler.DAGScheduler: Completed ShuffleMapTask(1, 2207)
> 14/05/29 05:26:46 INFO scheduler.TaskSetManager: Starting task 1.0:2236 as TID 2236 on executor 0: ip-10-169-5-198.ap-northeast-1.compute.internal (PROCESS_LOCAL)
> 14/05/29 05:26:46 INFO scheduler.TaskSetManager: Serialized task 1.0:2236 as 1883 bytes in 1 ms
> 14/05/29 05:26:46 WARN scheduler.TaskSetManager: Lost TID 2230 (task 1.0:2230)
> 14/05/29 05:26:46 WARN scheduler.TaskSetManager: Loss was due to java.lang.NullPointerException
> java.lang.NullPointerException
>         at $line16.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:17)
>         at $line16.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:17)
>         at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>         at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>         at org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:58)
>         at org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:97)
>         at org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:96)
>         at org.apache.spark.rdd.RDD$$anonfun$3.apply(RDD.scala:477)
>         at org.apache.spark.rdd.RDD$$anonfun$3.apply(RDD.scala:477)
>         at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:34)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:161)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:102)
>         at org.apache.spark.scheduler.Task.run(Task.scala:53)
>         at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:211)
>         at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:42)
>         at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:41)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>         at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:41)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:176)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:744)
> 14/05/29 05:26:46 INFO scheduler.TaskSetManager: Starting task 1.0:2230 as TID 2237 on executor 0: ip-10-169-5-198.ap-northeast-1.compute.internal (PROCESS_LOCAL)
> 14/05/29 05:26:46 INFO scheduler.TaskSetManager: Serialized task 1.0:2230 as 1883 bytes in 0 ms
> 14/05/29 05:26:46 WARN scheduler.TaskSetManager: Lost TID 2231 (task 1.0:2231)
> 14/05/29 05:26:46 INFO scheduler.TaskSetManager: Loss was due to java.lang.NullPointerException [duplicate 1]
> 14/05/29 05:26:46 INFO scheduler.TaskSetManager: Starting task 1.0:2231 as TID 2238 on executor 0: ip-10-169-5-198.ap-northeast-1.compute.internal (PROCESS_LOCAL)
> {code}
> ...
> {code}
> 14/05/29 05:26:46 INFO scheduler.TaskSetManager: Loss was due to java.lang.NullPointerException [duplicate 27]
> 14/05/29 05:26:46 INFO scheduler.TaskSetManager: Loss was due to java.lang.NullPointerException [duplicate 28]
> 14/05/29 05:26:46 INFO scheduler.TaskSetManager: Finished TID 2201 in 17959 ms on ip-10-169-5-198.ap-northeast-1.compute.internal (progress: 2210/2267)
> 14/05/29 05:26:46 INFO scheduler.TaskSetManager: Finished TID 2209 in 16588 ms on ip-10-169-5-198.ap-northeast-1.compute.internal (progress: 2211/2267)
> org.apache.spark.SparkException: Job aborted: Task 1.0:2230 failed 4 times (most recent failure: Exception failure: java.lang.NullPointerException)
> {code}
> Thanks!

--
This message was sent by Atlassian JIRA
(v6.2#6252)
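For context on the Invalid resolution: the top stack frames (`$anonfun$1.apply(<console>:17)`) point at the user-supplied closure from the shell, not at Spark internals, which suggests `LogObjectExtractor.getAnonymousAction` (or `getUserId`) returned null for some records. A minimal sketch of the null-guard pattern follows; `Action` and the stand-in `getAnonymousAction` here are hypothetical placeholders for the reporter's extractor, and the example uses a plain `Seq` so it runs without Spark (on the reported RDD pipeline the same shape would be `logs.flatMap{...}.distinct` instead of `map`).

```scala
// Hypothetical stand-ins for the reporter's log objects: the real
// extractor presumably returns null for malformed lines.
case class Action(userId: String)

def getAnonymousAction(line: String): Action =
  if (line.startsWith("uid=")) Action(line.drop(4)) else null

val logs = Seq("uid=alice", "garbage line", "uid=bob", "uid=alice")

// Unsafe version (mirrors the report) -- NPEs on "garbage line":
//   logs.map(x => getAnonymousAction(x).userId).distinct

// Null-safe version: wrap each nullable result in Option so that
// flatMap silently drops the null cases.
val activeUsers = logs
  .flatMap(x => Option(getAnonymousAction(x)))  // drop lines with no action
  .flatMap(a => Option(a.userId))               // userId may also be null
  .distinct
// activeUsers == Seq("alice", "bob")
```

An alternative is to filter out the offending input lines up front, but wrapping at the extraction site keeps the guard next to the nullable call.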