[jira] [Commented] (SPARK-9503) Mesos dispatcher NullPointerException (MesosClusterScheduler)

2015-09-09 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14737453#comment-14737453
 ] 

Timothy Chen commented on SPARK-9503:
-

Sorry, this is indeed a bug, and a fix is already in 1.5.
Please try the just-released 1.5; the NPE should no longer occur there.

> Mesos dispatcher NullPointerException (MesosClusterScheduler)
> -
>
> Key: SPARK-9503
> URL: https://issues.apache.org/jira/browse/SPARK-9503
> Project: Spark
> Issue Type: Bug
> Components: Mesos
> Affects Versions: 1.4.1
> Environment: branch-1.4 #8dfdca46dd2f527bf653ea96777b23652bc4eb83
> Reporter: Sebastian YEPES FERNANDEZ
> Labels: mesosphere
>
> Hello,
> I have just started using start-mesos-dispatcher and have been noticing 
> random crashes caused by NPEs.
> Looking at the exception, it seems that in certain situations 
> "queuedDrivers" is empty, which causes the NPE at "submission.cores":
> https://github.com/apache/spark/blob/branch-1.4/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L512-L516
> {code:title=log|borderStyle=solid}
> 15/07/30 23:56:44 INFO MesosRestServer: Started REST server for submitting applications on port 7077
> Exception in thread "Thread-1647" java.lang.NullPointerException
> at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler$$anonfun$scheduleTasks$1.apply(MesosClusterScheduler.scala:437)
> at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler$$anonfun$scheduleTasks$1.apply(MesosClusterScheduler.scala:436)
> at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.scheduleTasks(MesosClusterScheduler.scala:436)
> at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.resourceOffers(MesosClusterScheduler.scala:512)
> I0731 00:53:52.969518  7014 sched.cpp:1625] Asked to abort the driver
> I0731 00:53:52.969895  7014 sched.cpp:861] Aborting framework '20150730-234528-4261456064-5050-61754-'
> 15/07/31 00:53:52 INFO MesosClusterScheduler: driver.run() returned with code DRIVER_ABORTED
> {code}
> A side effect of this NPE is that after the crash the dispatcher will not 
> start because it is already registered (see SPARK-7831):
> {code:title=log|borderStyle=solid}
> 15/07/31 09:55:47 INFO MesosClusterUI: Started MesosClusterUI at http://192.168.0.254:8081
> I0731 09:55:47.715039  8162 sched.cpp:157] Version: 0.23.0
> I0731 09:55:47.717013  8163 sched.cpp:254] New master detected at master@192.168.0.254:5050
> I0731 09:55:47.717381  8163 sched.cpp:264] No credentials provided. Attempting to register without authentication
> I0731 09:55:47.718246  8177 sched.cpp:819] Got error 'Completed framework attempted to re-register'
> I0731 09:55:47.718268  8177 sched.cpp:1625] Asked to abort the driver
> 15/07/31 09:55:47 ERROR MesosClusterScheduler: Error received: Completed framework attempted to re-register
> I0731 09:55:47.719091  8177 sched.cpp:861] Aborting framework '20150730-234528-4261456064-5050-61754-0038'
> 15/07/31 09:55:47 INFO MesosClusterScheduler: driver.run() returned with code DRIVER_ABORTED
> 15/07/31 09:55:47 INFO Utils: Shutdown hook called
> {code}
> I can work around this by removing the dispatcher's ZooKeeper data:
> {code:title=zkCli.sh|borderStyle=solid}
> rmr /spark_mesos_dispatcher
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9503) Mesos dispatcher NullPointerException (MesosClusterScheduler)

2015-09-08 Thread Sal Uryasev (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735750#comment-14735750
 ] 

Sal Uryasev commented on SPARK-9503:


Someone on my team is hitting the same bug.

Something suspicious in the code may be the cause: removeFromQueuedDrivers 
is called while looping through queuedDrivers, and it calls 
"queuedDrivers.remove(index)", so the buffer is modified during iteration.
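
To illustrate the pattern I mean, here is a minimal sketch of one way to avoid 
mutating a buffer while it is being traversed. It is not the Spark code and 
not the fix that went into 1.5; Submission, QueueRemovalSketch and 
scheduleFitting are hypothetical names made up for this example. The idea is 
to iterate over an immutable snapshot of the queue and remove entries from the 
real buffer as they are launched. Removing from the same ArrayBuffer that 
foreach is walking can shift or clear slots the traversal has not reached yet, 
which would be consistent with a null "submission".

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a queued driver submission; not Spark's class.
final case class Submission(id: String, cores: Double)

object QueueRemovalSketch {
  /** Launches every submission that fits into the available cores, removes
    * the launched ones from `queue`, and returns their ids in launch order. */
  def scheduleFitting(queue: ArrayBuffer[Submission],
                      availableCores: Double): Seq[String] = {
    var remaining = availableCores
    val launched = ArrayBuffer.empty[String]
    // Traverse an immutable snapshot so that removals from `queue`
    // cannot disturb the iteration.
    for (submission <- queue.toList) {
      if (submission.cores <= remaining) {
        remaining -= submission.cores
        launched += submission.id
        // Look the entry up again by id instead of trusting a stale index.
        val index = queue.indexWhere(_.id == submission.id)
        if (index >= 0) queue.remove(index)
      }
    }
    launched.toList
  }
}
```

With a queue of Submission("a", 2.0), Submission("b", 8.0) and 
Submission("c", 1.0) and 4.0 available cores, this launches "a" and "c" and 
leaves only "b" queued. Collecting the entries to remove and deleting them 
after the loop would work as well; the snapshot just keeps the sketch short.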
