Przemyslaw Pastuszka created SPARK-3637:
-------------------------------------------

             Summary: NPE in ShuffleMapTask
                 Key: SPARK-3637
                 URL: https://issues.apache.org/jira/browse/SPARK-3637
             Project: Spark
          Issue Type: Bug
    Affects Versions: 1.1.0
            Reporter: Przemyslaw Pastuszka


When trying to execute spark.jobserver.WordCountExample using spark-jobserver 
(https://github.com/ooyala/spark-jobserver), we observed that it often fails 
with a NullPointerException in ShuffleMapTask.scala. Here are the full details:
{code}
Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 6, hadoop-simple-768-worker-with-zookeeper-0): java.lang.NullPointerException:
        java.nio.ByteBuffer.wrap(ByteBuffer.java:392)
        org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:61)
        org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        org.apache.spark.scheduler.Task.run(Task.scala:54)
        org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
        java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
    "errorClass": "org.apache.spark.SparkException",
    "stack": [
        "org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1153)",
        "org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1142)",
        "org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1141)",
        "scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)",
        "scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)",
        "org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1141)",
        "org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:682)",
        "org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:682)",
        "scala.Option.foreach(Option.scala:236)",
        "org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:682)",
        "org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1359)",
        "akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)",
        "akka.actor.ActorCell.invoke(ActorCell.scala:456)",
        "akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)",
        "akka.dispatch.Mailbox.run(Mailbox.scala:219)",
        "akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)",
        "scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)",
        "scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)",
        "scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)",
        "scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)"
    ]
{code}
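
The executor-side frames show the NullPointerException coming from the ByteBuffer.wrap call inside ShuffleMapTask.runTask, which suggests the byte array handed to it was null when the task ran. A minimal, self-contained sketch of that failure mode (the variable name taskBinaryValue is only an illustration, not a claim about Spark's internal field names):
{code}
import java.nio.ByteBuffer

object NpeRepro {
  def main(args: Array[String]): Unit = {
    // Assumption for illustration: the task's serialized binary resolved to null.
    val taskBinaryValue: Array[Byte] = null

    // ByteBuffer.wrap throws java.lang.NullPointerException when given a null
    // array; this is the same exception and call site (ByteBuffer.wrap) that
    // the stack trace reports at ShuffleMapTask.scala:61.
    ByteBuffer.wrap(taskBinaryValue)
  }
}
{code}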

I am aware that this failure may be due to the job being ill-defined by 
spark-jobserver (I don't know whether that's the case), but if so, it should be 
handled more gracefully on the Spark side.
It is also worth noting that this issue does not happen every time, which may 
indicate some kind of race condition in the code.
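
If the root cause is indeed a null task binary on the executor, one way to fail more gracefully would be an explicit check that turns the bare NPE into a descriptive error. This is purely a hypothetical sketch, not the actual Spark code or a proposed patch:
{code}
import java.nio.ByteBuffer

// Hypothetical helper, not part of Spark: validate the task binary before
// wrapping it, so a null value produces a clear error message instead of a
// NullPointerException deep inside ByteBuffer.wrap.
object TaskBinaryCheck {
  def wrapTaskBinary(taskBinaryValue: Array[Byte]): ByteBuffer = {
    require(taskBinaryValue != null,
      "Task binary is null; the job may be ill-defined or its broadcast data was lost")
    ByteBuffer.wrap(taskBinaryValue)
  }
}
{code}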


