[jira] [Commented] (SPARK-9503) Mesos dispatcher NullPointerException (MesosClusterScheduler)
[ https://issues.apache.org/jira/browse/SPARK-9503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14737453#comment-14737453 ]

Timothy Chen commented on SPARK-9503:
-------------------------------------

Sorry, this is indeed a bug, and a fix is already in 1.5. Please try out the just-released 1.5; it shouldn't happen there.

> Mesos dispatcher NullPointerException (MesosClusterScheduler)
> -------------------------------------------------------------
>
>                 Key: SPARK-9503
>                 URL: https://issues.apache.org/jira/browse/SPARK-9503
>             Project: Spark
>          Issue Type: Bug
>          Components: Mesos
>    Affects Versions: 1.4.1
>         Environment: branch-1.4 #8dfdca46dd2f527bf653ea96777b23652bc4eb83
>            Reporter: Sebastian YEPES FERNANDEZ
>              Labels: mesosphere
>
> Hello,
> I have just started using start-mesos-dispatcher and have been noticing
> some random crashes caused by NPEs.
> Judging by the exception, it looks like in certain situations
> "queuedDrivers" is empty, which causes the NPE at "submission.cores":
> https://github.com/apache/spark/blob/branch-1.4/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L512-L516
> {code:title=log|borderStyle=solid}
> 15/07/30 23:56:44 INFO MesosRestServer: Started REST server for submitting applications on port 7077
> Exception in thread "Thread-1647" java.lang.NullPointerException
>     at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler$$anonfun$scheduleTasks$1.apply(MesosClusterScheduler.scala:437)
>     at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler$$anonfun$scheduleTasks$1.apply(MesosClusterScheduler.scala:436)
>     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>     at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.scheduleTasks(MesosClusterScheduler.scala:436)
>     at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.resourceOffers(MesosClusterScheduler.scala:512)
> I0731 00:53:52.969518 7014 sched.cpp:1625] Asked to abort the driver
> I0731 00:53:52.969895 7014 sched.cpp:861] Aborting framework '20150730-234528-4261456064-5050-61754-'
> 15/07/31 00:53:52 INFO MesosClusterScheduler: driver.run() returned with code DRIVER_ABORTED
> {code}
> A side effect of this NPE is that after the crash the dispatcher will not
> start because it is already registered (see SPARK-7831):
> {code:title=log|borderStyle=solid}
> 15/07/31 09:55:47 INFO MesosClusterUI: Started MesosClusterUI at http://192.168.0.254:8081
> I0731 09:55:47.715039 8162 sched.cpp:157] Version: 0.23.0
> I0731 09:55:47.717013 8163 sched.cpp:254] New master detected at master@192.168.0.254:5050
> I0731 09:55:47.717381 8163 sched.cpp:264] No credentials provided. Attempting to register without authentication
> I0731 09:55:47.718246 8177 sched.cpp:819] Got error 'Completed framework attempted to re-register'
> I0731 09:55:47.718268 8177 sched.cpp:1625] Asked to abort the driver
> 15/07/31 09:55:47 ERROR MesosClusterScheduler: Error received: Completed framework attempted to re-register
> I0731 09:55:47.719091 8177 sched.cpp:861] Aborting framework '20150730-234528-4261456064-5050-61754-0038'
> 15/07/31 09:55:47 INFO MesosClusterScheduler: driver.run() returned with code DRIVER_ABORTED
> 15/07/31 09:55:47 INFO Utils: Shutdown hook called
> {code}
> I can get around this by removing the ZooKeeper data:
> {code:title=zkCli.sh|borderStyle=solid}
> rmr /spark_mesos_dispatcher
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9503) Mesos dispatcher NullPointerException (MesosClusterScheduler)
[ https://issues.apache.org/jira/browse/SPARK-9503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735750#comment-14735750 ]

Sal Uryasev commented on SPARK-9503:
------------------------------------

Someone on my team is hitting the same bug. Something suspicious in the code may be the cause: removeFromQueuedDrivers is called while looping through queuedDrivers, and it calls "queuedDrivers.remove(index)".
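
The suspicion raised in the comment on SPARK-9503 (removing from queuedDrivers while looping over it) points at a general hazard: mutating a Scala ArrayBuffer during a traversal is unsafe, and the exact failure depends on the Scala version. A minimal sketch of one way to avoid it is shown here. This is not the fix that went into Spark 1.5; Submission, QueueDrainSketch, and schedule are hypothetical names used only for illustration.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a queued driver submission; not Spark's actual class.
final case class Submission(id: String, cores: Double)

object QueueDrainSketch {
  // Launch every queued submission that fits in the remaining offered cores.
  // The queue is mutated only after the traversal has finished.
  def schedule(queued: ArrayBuffer[Submission], offeredCores: Double): Seq[Submission] = {
    var remaining = offeredCores
    val launched = ArrayBuffer.empty[Submission]
    // Iterate over an immutable snapshot so removals cannot disturb the loop.
    for (submission <- queued.toList) {
      if (submission.cores <= remaining) {
        remaining -= submission.cores
        launched += submission
      }
    }
    // Remove the launched entries in one step, outside the loop.
    queued --= launched
    launched.toList
  }

  def main(args: Array[String]): Unit = {
    val queue = ArrayBuffer(Submission("a", 2.0), Submission("b", 4.0), Submission("c", 1.0))
    val launched = schedule(queue, offeredCores = 3.0)
    println(launched.map(_.id)) // ids that fit into the offer
    println(queue.map(_.id))    // ids still waiting in the queue
  }
}
```

Collecting the launched entries first and removing them afterwards keeps the traversal and the mutation separate, at the cost of one extra copy of the queue per scheduling pass.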