Przemyslaw Pastuszka created SPARK-3637:
-------------------------------------------
             Summary: NPE in ShuffleMapTask
                 Key: SPARK-3637
                 URL: https://issues.apache.org/jira/browse/SPARK-3637
             Project: Spark
          Issue Type: Bug
    Affects Versions: 1.1.0
            Reporter: Przemyslaw Pastuszka

When trying to execute spark.jobserver.WordCountExample using spark-jobserver (https://github.com/ooyala/spark-jobserver), we observed that it often fails with a NullPointerException in ShuffleMapTask.scala. Here are the full details:

{code}
Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 6, hadoop-simple-768-worker-with-zookeeper-0): java.lang.NullPointerException:
    java.nio.ByteBuffer.wrap(ByteBuffer.java:392)
    org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:61)
    org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    org.apache.spark.scheduler.Task.run(Task.scala:54)
    org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    java.lang.Thread.run(Thread.java:745)

Driver stacktrace (errorClass: org.apache.spark.SparkException):
    org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1153)
    org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1142)
    org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1141)
    scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1141)
    org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:682)
    org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:682)
    scala.Option.foreach(Option.scala:236)
    org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:682)
    org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1359)
    akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
    akka.actor.ActorCell.invoke(ActorCell.scala:456)
    akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
    akka.dispatch.Mailbox.run(Mailbox.scala:219)
    akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
    scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
{code}

I am aware that this failure may be due to the job being ill-defined by spark-jobserver (I don't know whether that's the case), but if so, it should be handled more gracefully on the Spark side. It is also important that this issue doesn't happen every time, which may indicate some kind of race condition in the code.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
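For context on the top frame: java.nio.ByteBuffer.wrap throws a NullPointerException when handed a null array, so the trace suggests that whatever byte array ShuffleMapTask passes to wrap at ShuffleMapTask.scala:61 (the deserialized task binary) was null on the failing attempts. A minimal sketch demonstrating just that JDK behavior; the taskBytes name is purely illustrative and not taken from Spark's source:

{code}
import java.nio.ByteBuffer;

public class WrapNullDemo {
    // Illustrative stand-in for a task's serialized bytes; the hypothesis
    // is that the equivalent array was null in the failing runs.
    static byte[] taskBytes = null;

    // Returns true if ByteBuffer.wrap throws NPE for the given array.
    static boolean wrapThrowsNpe(byte[] bytes) {
        try {
            ByteBuffer.wrap(bytes); // NPE here when bytes == null
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(wrapThrowsNpe(taskBytes));         // true: null input
        System.out.println(wrapThrowsNpe(new byte[] {1, 2})); // false: valid bytes
    }
}
{code}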