[jira] [Created] (SPARK-25960) Support subpath mounting with Kubernetes
Timothy Chen created SPARK-25960: Summary: Support subpath mounting with Kubernetes Key: SPARK-25960 URL: https://issues.apache.org/jira/browse/SPARK-25960 Project: Spark Issue Type: New Feature Components: Kubernetes Affects Versions: 2.5.0 Reporter: Timothy Chen Currently we support mounting volumes into the executor and driver pods, but there is no option to provide a subpath to be mounted from the volume. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
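For illustration, a configuration sketch in the style of Spark's existing Kubernetes volume options; the `mount.subPath` key is the proposed addition and is hypothetical here, as are the cluster URL, claim name and paths:

```shell
# Sketch: mount only the "logs" subdirectory of a PVC into the driver.
# The ...mount.subPath key is this issue's proposal, not an existing option.
spark-submit \
  --master k8s://https://example-cluster:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.data.mount.path=/data \
  --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.data.mount.subPath=logs \
  --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.data.options.claimName=my-pvc \
  local:///opt/spark/examples/jars/spark-examples.jar
```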
[jira] [Created] (SPARK-25148) Executors launched with Spark on K8s client mode should prefix name with spark.app.name
Timothy Chen created SPARK-25148: Summary: Executors launched with Spark on K8s client mode should prefix name with spark.app.name Key: SPARK-25148 URL: https://issues.apache.org/jira/browse/SPARK-25148 Project: Spark Issue Type: Improvement Components: Kubernetes Affects Versions: 2.4.0 Reporter: Timothy Chen With the recently added client mode for Spark on K8s, executors launched by default are all named "spark-exec-#", which means that when multiple jobs are launched in the same cluster they often have to retry to find unused pod names. It is also hard to correlate which executors were launched for which Spark app. The workaround is to manually set the executor prefix configuration for each job launched. Ideally the experience should match cluster mode, where each executor name is prefixed with spark.app.name by default.
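A sketch of the workaround mentioned above, assuming an executor pod-name prefix setting is available in this era of Spark on K8s (the option name and cluster URL are shown as assumptions):

```shell
# Workaround sketch: manually prefix executor pod names per job so pods
# are distinguishable and don't collide on the default "spark-exec-#" names.
spark-submit \
  --master k8s://https://example-cluster:6443 \
  --deploy-mode client \
  --conf spark.app.name=my-etl-job \
  --conf spark.kubernetes.executor.podNamePrefix=my-etl-job \
  local:///opt/spark/examples/jars/spark-examples.jar
```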
[jira] [Resolved] (SPARK-23953) Add get_json_scalar function
[ https://issues.apache.org/jira/browse/SPARK-23953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Chen resolved SPARK-23953. -- Resolution: Invalid > Add get_json_scalar function > > > Key: SPARK-23953 > URL: https://issues.apache.org/jira/browse/SPARK-23953 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 2.3.0 >Reporter: Timothy Chen >Priority: Major > > Besides get_json_object, which returns its result as a JSON string, we > should add a function "get_json_scalar" that returns a scalar type when > the path maps to a scalar (boolean, number, string or null). It returns null > when the path points to an object or array structure. > This is also in Presto (https://prestodb.io/docs/current/functions/json.html).
[jira] [Created] (SPARK-23953) Add get_json_scalar function
Timothy Chen created SPARK-23953: Summary: Add get_json_scalar function Key: SPARK-23953 URL: https://issues.apache.org/jira/browse/SPARK-23953 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 2.3.0 Reporter: Timothy Chen Besides get_json_object, which returns its result as a JSON string, we should add a function "get_json_scalar" that returns a scalar type when the path maps to a scalar (boolean, number, string or null). It returns null when the path points to an object or array structure. This is also in Presto (https://prestodb.io/docs/current/functions/json.html).
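The proposed semantics can be sketched outside Spark; this is a minimal Python illustration of the behavior described above (not Spark's implementation), supporting only simple dotted paths such as `$.a.b`:

```python
import json

def get_json_scalar(json_str, path):
    """Sketch of the proposed semantics: return the value at a dotted
    path if it is a scalar (boolean, number, string or null), and None
    when the path points to an object or array structure.
    Only simple dotted paths like "$.store.name" are supported here."""
    try:
        node = json.loads(json_str)
    except (ValueError, TypeError):
        return None
    for key in path.lstrip("$.").split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    # Scalars pass through; objects/arrays yield null per the proposal.
    if isinstance(node, (dict, list)):
        return None
    return node
```

For example, `get_json_scalar('{"a": {"b": 1}}', "$.a.b")` yields the scalar `1`, while `"$.a"` points at an object and yields `None`, matching Presto's `json_extract_scalar` behavior the issue references.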
[jira] [Created] (SPARK-19320) Allow guaranteed amount of GPU to be used when launching jobs
Timothy Chen created SPARK-19320: Summary: Allow guaranteed amount of GPU to be used when launching jobs Key: SPARK-19320 URL: https://issues.apache.org/jira/browse/SPARK-19320 Project: Spark Issue Type: Improvement Components: Mesos Reporter: Timothy Chen Currently the only GPU configuration for Mesos sets the maximum number of GPUs a job will take from an offer, but it doesn't guarantee exactly how many the job gets. We should add a configuration that sets a guaranteed amount.
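A configuration sketch of the proposal; `spark.mesos.gpus.max` is the existing cap the issue refers to, while a guaranteed-minimum option did not exist at the time, so the name shown for it is hypothetical:

```shell
# Existing: cap on GPUs taken from any single offer.
# Hypothetical: a guaranteed amount, per this issue's proposal.
spark-submit \
  --master mesos://mesos-master:5050 \
  --conf spark.mesos.gpus.max=4 \
  --conf spark.mesos.gpus.min=2 \
  local:///opt/spark/examples/jars/spark-examples.jar
```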
[jira] [Created] (SPARK-14645) non local Python resource doesn't work with Mesos cluster mode
Timothy Chen created SPARK-14645: Summary: non local Python resource doesn't work with Mesos cluster mode Key: SPARK-14645 URL: https://issues.apache.org/jira/browse/SPARK-14645 Project: Spark Issue Type: Bug Reporter: Timothy Chen Currently SparkSubmit explicitly disallows non-local Python resources for cluster mode with Mesos, even though this is actually supported.
[jira] [Created] (SPARK-14082) Add support for GPU resource when running on Mesos
Timothy Chen created SPARK-14082: Summary: Add support for GPU resource when running on Mesos Key: SPARK-14082 URL: https://issues.apache.org/jira/browse/SPARK-14082 Project: Spark Issue Type: Improvement Components: Mesos Reporter: Timothy Chen As Mesos is integrating GPUs as a first-class resource, Spark can benefit by allowing jobs to be launched with GPUs and by using the GPU information provided by Mesos to discover and run them.
[jira] [Created] (SPARK-13414) Add support for launching multiple Mesos dispatchers
Timothy Chen created SPARK-13414: Summary: Add support for launching multiple Mesos dispatchers Key: SPARK-13414 URL: https://issues.apache.org/jira/browse/SPARK-13414 Project: Spark Issue Type: Improvement Reporter: Timothy Chen Currently the sbin/[start|stop]-mesos-dispatcher scripts assume there is only one Mesos dispatcher launched, but users who run multi-tenant dispatchers might want to launch several. The ability to launch multiple dispatchers also helps local development.
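A sketch of what launching two dispatchers side by side could look like, assuming the scripts gain per-instance pid handling as this issue asks; the flags mirror MesosClusterDispatcher's command-line arguments and the ZooKeeper URL is a placeholder:

```shell
# Sketch: two dispatchers on distinct ports. Today the stop script only
# tracks a single instance's pid, which is what this issue proposes to fix.
./sbin/start-mesos-dispatcher.sh --master mesos://zk://zk1:2181/mesos \
  --port 7077 --webui-port 8081 --name dispatcher-a
./sbin/start-mesos-dispatcher.sh --master mesos://zk://zk1:2181/mesos \
  --port 7078 --webui-port 8082 --name dispatcher-b
```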
[jira] [Created] (SPARK-13387) Add support for SPARK_DAEMON_JAVA_OPTS with MesosClusterDispatcher.
Timothy Chen created SPARK-13387: Summary: Add support for SPARK_DAEMON_JAVA_OPTS with MesosClusterDispatcher. Key: SPARK-13387 URL: https://issues.apache.org/jira/browse/SPARK-13387 Project: Spark Issue Type: Improvement Reporter: Timothy Chen As SPARK_JAVA_OPTS is being deprecated, MesosClusterDispatcher should also support SPARK_DAEMON_JAVA_OPTS for setting Java properties.
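With the proposed support in place, dispatcher JVM properties would be set the same way as for other Spark daemons; the recovery settings below are purely an example and the ZooKeeper URL is a placeholder:

```shell
# Java system properties picked up by the dispatcher at startup via
# SPARK_DAEMON_JAVA_OPTS instead of the deprecated SPARK_JAVA_OPTS.
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=zk1:2181"
./sbin/start-mesos-dispatcher.sh --master mesos://zk://zk1:2181/mesos
```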
[jira] [Created] (SPARK-12892) Support plugging in Spark scheduler
Timothy Chen created SPARK-12892: Summary: Support plugging in Spark scheduler Key: SPARK-12892 URL: https://issues.apache.org/jira/browse/SPARK-12892 Project: Spark Issue Type: Improvement Reporter: Timothy Chen Currently the only supported cluster schedulers are standalone, Mesos, YARN and SIMR. If users want to build a new one, it must be merged back into the main tree, which might not be desirable for Spark and is hard to iterate on. Instead, we should make a plugin architecture possible, so that a new scheduler can be plugged in via configuration and runtime loading.
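The configuration-plus-runtime-loading idea can be illustrated in a few lines of Python; this is an illustration of the pattern only, not Spark code, and the config key is made up:

```python
import importlib

def load_scheduler(conf):
    """Illustrative sketch of configuration-driven runtime loading:
    the class named in the config is imported and instantiated,
    instead of being hard-wired into the core as schedulers are today."""
    class_path = conf["spark.scheduler.backend.class"]  # hypothetical key
    module_name, _, class_name = class_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)()

# Example: plug in Python's stdlib sched.scheduler purely by name.
backend = load_scheduler({"spark.scheduler.backend.class": "sched.scheduler"})
```

Spark's actual design would also need a stable interface for the loaded class to implement; the sketch shows only the lookup-and-instantiate half of the idea.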
[jira] [Created] (SPARK-12465) Remove spark.deploy.mesos.zookeeper.dir and use spark.deploy.zookeeper.dir
Timothy Chen created SPARK-12465: Summary: Remove spark.deploy.mesos.zookeeper.dir and use spark.deploy.zookeeper.dir Key: SPARK-12465 URL: https://issues.apache.org/jira/browse/SPARK-12465 Project: Spark Issue Type: Task Components: Mesos Reporter: Timothy Chen Remove spark.deploy.mesos.zookeeper.dir and use existing configuration spark.deploy.zookeeper.dir for Mesos cluster mode.
[jira] [Created] (SPARK-12464) Remove spark.deploy.mesos.zookeeper.url and use spark.deploy.zookeeper.url
Timothy Chen created SPARK-12464: Summary: Remove spark.deploy.mesos.zookeeper.url and use spark.deploy.zookeeper.url Key: SPARK-12464 URL: https://issues.apache.org/jira/browse/SPARK-12464 Project: Spark Issue Type: Task Components: Mesos Reporter: Timothy Chen Remove spark.deploy.mesos.zookeeper.url and use existing configuration spark.deploy.zookeeper.url for Mesos cluster mode.
[jira] [Created] (SPARK-12463) Remove spark.deploy.mesos.recoveryMode and use spark.deploy.recoveryMode
Timothy Chen created SPARK-12463: Summary: Remove spark.deploy.mesos.recoveryMode and use spark.deploy.recoveryMode Key: SPARK-12463 URL: https://issues.apache.org/jira/browse/SPARK-12463 Project: Spark Issue Type: Task Reporter: Timothy Chen Remove spark.deploy.mesos.recoveryMode and use spark.deploy.recoveryMode configuration for cluster mode.
[jira] [Created] (SPARK-12351) Add documentation of submitting Mesos jobs with cluster
Timothy Chen created SPARK-12351: Summary: Add documentation of submitting Mesos jobs with cluster Key: SPARK-12351 URL: https://issues.apache.org/jira/browse/SPARK-12351 Project: Spark Issue Type: Documentation Reporter: Timothy Chen
[jira] [Updated] (SPARK-12351) Add documentation of submitting Mesos jobs with cluster mode
[ https://issues.apache.org/jira/browse/SPARK-12351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Chen updated SPARK-12351: - Description: Add more documentation around how to launch spark drivers with Mesos cluster mode Summary: Add documentation of submitting Mesos jobs with cluster mode (was: Add documentation of submitting Mesos jobs with cluster) > Add documentation of submitting Mesos jobs with cluster mode > > > Key: SPARK-12351 > URL: https://issues.apache.org/jira/browse/SPARK-12351 > Project: Spark > Issue Type: Documentation >Reporter: Timothy Chen > > Add more documentation around how to launch spark drivers with Mesos cluster > mode
[jira] [Created] (SPARK-10749) Support multiple roles with Spark Mesos dispatcher
Timothy Chen created SPARK-10749: Summary: Support multiple roles with Spark Mesos dispatcher Key: SPARK-10749 URL: https://issues.apache.org/jira/browse/SPARK-10749 Project: Spark Issue Type: Improvement Components: Mesos Reporter: Timothy Chen Although you can currently set the framework role of the Mesos dispatcher, it doesn't correctly use the offers given to it. It should inherit how the coarse/fine-grained schedulers work and use offers from multiple roles.
[jira] [Created] (SPARK-10748) Log error instead of crashing Spark Mesos dispatcher when a job is misconfigured
Timothy Chen created SPARK-10748: Summary: Log error instead of crashing Spark Mesos dispatcher when a job is misconfigured Key: SPARK-10748 URL: https://issues.apache.org/jira/browse/SPARK-10748 Project: Spark Issue Type: Bug Components: Mesos Reporter: Timothy Chen Currently when the dispatcher is submitting a new driver, it simply throws a SparkException when necessary configuration is not set. We should log the error and keep the dispatcher running instead of crashing.
[jira] [Commented] (SPARK-9503) Mesos dispatcher NullPointerException (MesosClusterScheduler)
[ https://issues.apache.org/jira/browse/SPARK-9503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14737453#comment-14737453 ] Timothy Chen commented on SPARK-9503: - Sorry, this is indeed a bug and a fix is already in 1.5. Please try out the just-released 1.5 and it shouldn't happen. > Mesos dispatcher NullPointerException (MesosClusterScheduler) > - > > Key: SPARK-9503 > URL: https://issues.apache.org/jira/browse/SPARK-9503 > Project: Spark > Issue Type: Bug > Components: Mesos >Affects Versions: 1.4.1 > Environment: branch-1.4 #8dfdca46dd2f527bf653ea96777b23652bc4eb83 >Reporter: Sebastian YEPES FERNANDEZ > Labels: mesosphere > > Hello, > I have just started using start-mesos-dispatcher and have been noticing > some random NPE crashes. > By looking at the exception, it looks like in certain situations > "queuedDrivers" is empty and "submission.cores" causes the NPE > https://github.com/apache/spark/blob/branch-1.4/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L512-L516 > {code:title=log|borderStyle=solid} > 15/07/30 23:56:44 INFO MesosRestServer: Started REST server for submitting > applications on port 7077 > Exception in thread "Thread-1647" java.lang.NullPointerException > at > org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler$$anonfun$scheduleTasks$1.apply(MesosClusterScheduler.scala:437) > at > org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler$$anonfun$scheduleTasks$1.apply(MesosClusterScheduler.scala:436) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) > at > org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.scheduleTasks(MesosClusterScheduler.scala:436) > at > org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.resourceOffers(MesosClusterScheduler.scala:512) > I0731 00:53:52.969518 7014 sched.cpp:1625] Asked to abort the driver
> I0731 00:53:52.969895 7014 sched.cpp:861] Aborting framework > '20150730-234528-4261456064-5050-61754-' > 15/07/31 00:53:52 INFO MesosClusterScheduler: driver.run() returned with code > DRIVER_ABORTED > {code} > A side effect of this NPE is that after the crash the dispatcher will not > start because it's already registered #SPARK-7831 > {code:title=log|borderStyle=solid} > 15/07/31 09:55:47 INFO MesosClusterUI: Started MesosClusterUI at > http://192.168.0.254:8081 > I0731 09:55:47.715039 8162 sched.cpp:157] Version: 0.23.0 > I0731 09:55:47.717013 8163 sched.cpp:254] New master detected at > master@192.168.0.254:5050 > I0731 09:55:47.717381 8163 sched.cpp:264] No credentials provided. > Attempting to register without authentication > I0731 09:55:47.718246 8177 sched.cpp:819] Got error 'Completed framework > attempted to re-register' > I0731 09:55:47.718268 8177 sched.cpp:1625] Asked to abort the driver > 15/07/31 09:55:47 ERROR MesosClusterScheduler: Error received: Completed > framework attempted to re-register > I0731 09:55:47.719091 8177 sched.cpp:861] Aborting framework > '20150730-234528-4261456064-5050-61754-0038' > 15/07/31 09:55:47 INFO MesosClusterScheduler: driver.run() returned with code > DRIVER_ABORTED > 15/07/31 09:55:47 INFO Utils: Shutdown hook called > {code} > I can get around this by removing the zk data: > {code:title=zkCli.sh|borderStyle=solid} > rmr /spark_mesos_dispatcher > {code}
[jira] [Created] (SPARK-10313) Support HA stateful driver on Mesos cluster mode
Timothy Chen created SPARK-10313: Summary: Support HA stateful driver on Mesos cluster mode Key: SPARK-10313 URL: https://issues.apache.org/jira/browse/SPARK-10313 Project: Spark Issue Type: Improvement Components: Mesos Reporter: Timothy Chen Spark driver state becomes important to recover after a failure, especially in a Spark Streaming context. We can allow the Spark cluster-mode framework to support a stateful supervised driver mode that launches Spark drivers in persistent volumes with Mesos and, on failure, relaunches the driver with the same volume mounted.
[jira] [Created] (SPARK-10160) Support Spark shell over Mesos Cluster Mode
Timothy Chen created SPARK-10160: Summary: Support Spark shell over Mesos Cluster Mode Key: SPARK-10160 URL: https://issues.apache.org/jira/browse/SPARK-10160 Project: Spark Issue Type: Improvement Components: Mesos, Spark Shell Reporter: Timothy Chen It's not possible to run spark-shell with cluster mode since the shell running in the cluster is not able to interact with the client. We can build a proxy that transfers the user's input and the shell's output, and that allows the user to connect and reconnect.
[jira] [Updated] (SPARK-10161) Support Pyspark shell over Mesos Cluster Mode
[ https://issues.apache.org/jira/browse/SPARK-10161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Chen updated SPARK-10161: - Component/s: (was: Spark Shell) PySpark Support Pyspark shell over Mesos Cluster Mode - Key: SPARK-10161 URL: https://issues.apache.org/jira/browse/SPARK-10161 Project: Spark Issue Type: Improvement Components: Mesos, PySpark Reporter: Timothy Chen It's not possible to run the PySpark shell with cluster mode since the shell running in the cluster is not able to interact with the client. We can build a proxy that transfers the user's input and the shell's output, and that allows the user to connect and reconnect.
[jira] [Created] (SPARK-10161) Support Pyspark shell over Mesos Cluster Mode
Timothy Chen created SPARK-10161: Summary: Support Pyspark shell over Mesos Cluster Mode Key: SPARK-10161 URL: https://issues.apache.org/jira/browse/SPARK-10161 Project: Spark Issue Type: Improvement Components: Mesos, Spark Shell Reporter: Timothy Chen It's not possible to run spark-shell with cluster mode since the shell running in the cluster is not able to interact with the client. We can build a proxy that transfers the user's input and the shell's output, and that allows the user to connect and reconnect.
[jira] [Updated] (SPARK-10161) Support Pyspark shell over Mesos Cluster Mode
[ https://issues.apache.org/jira/browse/SPARK-10161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Chen updated SPARK-10161: - Description: It's not possible to run the PySpark shell with cluster mode since the shell running in the cluster is not able to interact with the client. We can build a proxy that transfers the user's input and the shell's output, and that allows the user to connect and reconnect. was: It's not possible to run spark-shell with cluster mode since the shell running in the cluster is not able to interact with the client. We can build a proxy that transfers the user's input and the shell's output, and that allows the user to connect and reconnect. Support Pyspark shell over Mesos Cluster Mode - Key: SPARK-10161 URL: https://issues.apache.org/jira/browse/SPARK-10161 Project: Spark Issue Type: Improvement Components: Mesos, PySpark Reporter: Timothy Chen It's not possible to run the PySpark shell with cluster mode since the shell running in the cluster is not able to interact with the client. We can build a proxy that transfers the user's input and the shell's output, and that allows the user to connect and reconnect.
[jira] [Created] (SPARK-10124) Mesos cluster mode causes exception when multiple spark apps are being scheduled
Timothy Chen created SPARK-10124: Summary: Mesos cluster mode causes exception when multiple spark apps are being scheduled Key: SPARK-10124 URL: https://issues.apache.org/jira/browse/SPARK-10124 Project: Spark Issue Type: Bug Reporter: Timothy Chen Currently spark applications can be queued to the Mesos cluster dispatcher, but when multiple jobs are in the queue we don't handle removing jobs from the buffer correctly while iterating, which causes a null pointer exception.
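The underlying bug class, mutating a collection while iterating over it, and the usual fix can be shown in a short Python sketch (illustrative only, not the Scala code in MesosClusterScheduler):

```python
def drain_matching(queue, predicate):
    """Safely remove matching jobs from a queue: iterate over a snapshot
    rather than the live list, so removal cannot invalidate the iteration
    (the bug class behind this issue)."""
    removed = []
    for job in list(queue):          # snapshot; mutating `queue` is now safe
        if predicate(job):
            queue.remove(job)
            removed.append(job)
    return removed

queued_drivers = ["job-1", "job-2", "job-3"]
launched = drain_matching(queued_drivers, lambda j: j != "job-2")
# `launched` holds the removed jobs; `queued_drivers` keeps the rest.
```

Iterating directly over `queued_drivers` while calling `remove` would skip elements or, in languages with fail-fast iterators, fail outright, which is the class of error this ticket reports.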
[jira] [Updated] (SPARK-10124) Mesos cluster mode causes exception when multiple spark apps are being scheduled
[ https://issues.apache.org/jira/browse/SPARK-10124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Chen updated SPARK-10124: - Target Version/s: 1.5.1 Mesos cluster mode causes exception when multiple spark apps are being scheduled Key: SPARK-10124 URL: https://issues.apache.org/jira/browse/SPARK-10124 Project: Spark Issue Type: Bug Reporter: Timothy Chen Currently spark applications can be queued to the Mesos cluster dispatcher, but when multiple jobs are in the queue we don't handle removing jobs from the buffer correctly while iterating, which causes a null pointer exception.
[jira] [Updated] (SPARK-10124) Mesos cluster mode causes exception when multiple spark apps are being scheduled
[ https://issues.apache.org/jira/browse/SPARK-10124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Chen updated SPARK-10124: - Target Version/s: 1.5.0 (was: 1.5.1) Mesos cluster mode causes exception when multiple spark apps are being scheduled Key: SPARK-10124 URL: https://issues.apache.org/jira/browse/SPARK-10124 Project: Spark Issue Type: Bug Components: Mesos Reporter: Timothy Chen Currently spark applications can be queued to the Mesos cluster dispatcher, but when multiple jobs are in the queue we don't handle removing jobs from the buffer correctly while iterating, which causes a null pointer exception.
[jira] [Updated] (SPARK-10124) Mesos cluster mode causes exception when multiple spark apps are being scheduled
[ https://issues.apache.org/jira/browse/SPARK-10124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Chen updated SPARK-10124: - Component/s: Mesos Mesos cluster mode causes exception when multiple spark apps are being scheduled Key: SPARK-10124 URL: https://issues.apache.org/jira/browse/SPARK-10124 Project: Spark Issue Type: Bug Components: Mesos Reporter: Timothy Chen Currently spark applications can be queued to the Mesos cluster dispatcher, but when multiple jobs are in the queue we don't handle removing jobs from the buffer correctly while iterating, which causes a null pointer exception.
[jira] [Created] (SPARK-9873) Cap the amount of executors launched in Mesos fine grain mode
Timothy Chen created SPARK-9873: --- Summary: Cap the amount of executors launched in Mesos fine grain mode Key: SPARK-9873 URL: https://issues.apache.org/jira/browse/SPARK-9873 Project: Spark Issue Type: Improvement Components: Mesos Reporter: Timothy Chen Currently in fine-grained mode, as long as there are resources available that match the scheduler's requirements, Spark will try to use the resources offered by Mesos, which can lead to excessive resource usage and leave other frameworks unable to get their fair share. We should add an option to cap the number of executors launched, so that the combination of spark.task.cpus and spark.mesos.executor.max bounds the total number of cores it will grab.
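Reading the issue's "combination" of the two settings as a product, the proposed bound is simple arithmetic; the helper below is hypothetical and only makes the cap explicit (`spark.mesos.executor.max` is the proposed option, not an existing one):

```python
def max_total_cores(task_cpus, executor_max):
    """Upper bound on cores grabbed under the proposal: the per-task
    cpu setting (spark.task.cpus) times the cap on executors launched
    (the proposed spark.mesos.executor.max)."""
    return task_cpus * executor_max

# e.g. spark.task.cpus=2 capped at 8 executors bounds the grab at 16 cores
cap = max_total_cores(2, 8)
```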
[jira] [Created] (SPARK-9708) Spark should create local temporary directories in Mesos sandbox when launched with Mesos
Timothy Chen created SPARK-9708: --- Summary: Spark should create local temporary directories in Mesos sandbox when launched with Mesos Key: SPARK-9708 URL: https://issues.apache.org/jira/browse/SPARK-9708 Project: Spark Issue Type: Bug Components: Mesos Reporter: Timothy Chen Currently Spark creates temporary directories with Utils.getConfiguredLocalDirs: it writes to YARN directories if YARN is detected, and otherwise writes to a temporary directory on the host. However, Mesos creates a directory per task, and ideally Spark should create its local temporary directories there, since they can then be cleaned up when the task is gone instead of being left on the host until reboot.
[jira] [Resolved] (SPARK-7876) Make Spark UI content paths non absolute paths
[ https://issues.apache.org/jira/browse/SPARK-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Chen resolved SPARK-7876. - Resolution: Fixed Make Spark UI content paths non absolute paths -- Key: SPARK-7876 URL: https://issues.apache.org/jira/browse/SPARK-7876 Project: Spark Issue Type: Improvement Components: Web UI Reporter: Timothy Chen Currently all the SparkUI href and img/css paths in the rendered HTML are absolute paths. This is problematic if you deploy Spark in the cloud and put a proxy in front of the cluster, which is common since most environments don't want to allocate public IPs for every node in the cluster, and with cluster mode Spark drivers can potentially launch anywhere. By making the paths relative, all the paths can go through the proxy based on the original request URL.
[jira] [Closed] (SPARK-7962) Mesos cluster mode is broken
[ https://issues.apache.org/jira/browse/SPARK-7962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Chen closed SPARK-7962. --- Mesos cluster mode is broken Key: SPARK-7962 URL: https://issues.apache.org/jira/browse/SPARK-7962 Project: Spark Issue Type: Bug Components: Mesos, Spark Submit Affects Versions: 1.4.0 Reporter: Timothy Chen Assignee: Timothy Chen Priority: Critical Fix For: 1.4.0 Rest submission client prepends an extra spark:// for non-standalone master URLs
[jira] [Reopened] (SPARK-7876) Make Spark UI content paths non absolute paths
[ https://issues.apache.org/jira/browse/SPARK-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Chen reopened SPARK-7876: - Make Spark UI content paths non absolute paths -- Key: SPARK-7876 URL: https://issues.apache.org/jira/browse/SPARK-7876 Project: Spark Issue Type: Improvement Components: Web UI Reporter: Timothy Chen Currently all the SparkUI href and img/css paths in the rendered HTML are absolute paths. This is problematic if you deploy Spark in the cloud and put a proxy in front of the cluster, which is common since most environments don't want to allocate public IPs for every node in the cluster, and with cluster mode Spark drivers can potentially launch anywhere. By making the paths relative, all the paths can go through the proxy based on the original request URL.
[jira] [Created] (SPARK-9669) Support PySpark with Mesos Cluster mode
Timothy Chen created SPARK-9669: --- Summary: Support PySpark with Mesos Cluster mode Key: SPARK-9669 URL: https://issues.apache.org/jira/browse/SPARK-9669 Project: Spark Issue Type: Improvement Components: Mesos, PySpark Reporter: Timothy Chen PySpark with cluster mode on Mesos is not yet supported. We need to enable it and make sure it's able to launch PySpark jobs.
[jira] [Created] (SPARK-9575) Add documentation around Mesos shuffle service and dynamic allocation
Timothy Chen created SPARK-9575: --- Summary: Add documentation around Mesos shuffle service and dynamic allocation Key: SPARK-9575 URL: https://issues.apache.org/jira/browse/SPARK-9575 Project: Spark Issue Type: Documentation Components: Mesos Reporter: Timothy Chen
[jira] [Created] (SPARK-8873) Support cleaning up shuffle files for drivers launched with Mesos
Timothy Chen created SPARK-8873: --- Summary: Support cleaning up shuffle files for drivers launched with Mesos Key: SPARK-8873 URL: https://issues.apache.org/jira/browse/SPARK-8873 Project: Spark Issue Type: Improvement Reporter: Timothy Chen With dynamic allocation enabled on Mesos, drivers can exit leaving shuffle data cached in the external shuffle service. However, there is no reliable way for the shuffle service to clean up the shuffle data when the driver exits, since the driver may crash before it notifies the shuffle service, in which case the shuffle data is cached forever. We need to implement a reliable way to detect driver termination and clean up shuffle data accordingly.
[jira] [Created] (SPARK-8798) Allow additional uris to be fetched with mesos
Timothy Chen created SPARK-8798: --- Summary: Allow additional uris to be fetched with mesos Key: SPARK-8798 URL: https://issues.apache.org/jira/browse/SPARK-8798 Project: Spark Issue Type: Bug Components: Mesos Reporter: Timothy Chen Fix For: 1.5.0
[jira] [Created] (SPARK-8083) Fix return to drivers link in Mesos driver page
Timothy Chen created SPARK-8083: --- Summary: Fix return to drivers link in Mesos driver page Key: SPARK-8083 URL: https://issues.apache.org/jira/browse/SPARK-8083 Project: Spark Issue Type: Bug Reporter: Timothy Chen The current path is set to / but this doesn't work behind a proxy. We need to prepend the proxy base URI if it's set.
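As a rough sketch of the fix described in SPARK-8083 (the helper name and proxy base value below are hypothetical, not Spark's actual code), prepending a configured proxy base to a UI link path might look like:

```python
def with_proxy_base(path: str, proxy_base: str = "") -> str:
    """Prepend the proxy base URI to a UI path when one is configured."""
    if proxy_base:
        return proxy_base.rstrip("/") + path
    return path

print(with_proxy_base("/"))                     # no proxy configured: "/"
print(with_proxy_base("/", "/proxy/driver-1"))  # behind a proxy: "/proxy/driver-1/"
```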
[jira] [Updated] (SPARK-7962) Rest submission client prepends extra spark:// for non standalone master urls
[ https://issues.apache.org/jira/browse/SPARK-7962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Chen updated SPARK-7962: Summary: Rest submission client prepends extra spark:// for non standalone master urls (was: est submission client prepends extra spark:// for ) Rest submission client prepends extra spark:// for non standalone master urls - Key: SPARK-7962 URL: https://issues.apache.org/jira/browse/SPARK-7962 Project: Spark Issue Type: Bug Components: Spark Submit Reporter: Timothy Chen
[jira] [Created] (SPARK-7962) est submission client prepends extra spark:// for
Timothy Chen created SPARK-7962: --- Summary: est submission client prepends extra spark:// for Key: SPARK-7962 URL: https://issues.apache.org/jira/browse/SPARK-7962 Project: Spark Issue Type: Bug Reporter: Timothy Chen
[jira] [Updated] (SPARK-7962) Rest submission client prepends extra spark:// for non standalone master urls
[ https://issues.apache.org/jira/browse/SPARK-7962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Chen updated SPARK-7962: Component/s: Spark Submit Rest submission client prepends extra spark:// for non standalone master urls - Key: SPARK-7962 URL: https://issues.apache.org/jira/browse/SPARK-7962 Project: Spark Issue Type: Bug Components: Spark Submit Reporter: Timothy Chen
[jira] [Created] (SPARK-7876) Make Spark UI content paths non absolute paths
Timothy Chen created SPARK-7876: --- Summary: Make Spark UI content paths non absolute paths Key: SPARK-7876 URL: https://issues.apache.org/jira/browse/SPARK-7876 Project: Spark Issue Type: Improvement Components: Web UI Reporter: Timothy Chen Currently all the SparkUI href and img/css paths in the rendered HTML are absolute paths. This is problematic if you deploy Spark in the cloud and put a proxy in front of the cluster, which is common since most environments don't want to allocate public IPs for every node in the cluster, and Spark drivers can potentially launch anywhere in cluster mode. By making the paths relative, all the paths can go through the proxy based on the original request URL.
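The proxy problem in SPARK-7876 can be illustrated with Python's urllib (the proxy URL below is hypothetical): an absolute href escapes the proxy's path prefix, while a relative href stays under it.

```python
from urllib.parse import urljoin

# A Spark UI page as seen through a reverse proxy (hypothetical URL).
page = "https://proxy.example.com/spark-ui/jobs/"

# An absolute path resolves against the host root, escaping the proxy prefix.
print(urljoin(page, "/static/spark-logo.png"))
# -> https://proxy.example.com/static/spark-logo.png

# A relative path resolves against the original request URL and stays proxied.
print(urljoin(page, "static/spark-logo.png"))
# -> https://proxy.example.com/spark-ui/jobs/static/spark-logo.png
```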
[jira] [Created] (SPARK-7877) Support non-persistent cluster mode
Timothy Chen created SPARK-7877: --- Summary: Support non-persistent cluster mode Key: SPARK-7877 URL: https://issues.apache.org/jira/browse/SPARK-7877 Project: Spark Issue Type: Improvement Components: Mesos Reporter: Timothy Chen Currently the Mesos cluster mode framework is not removed by default even when it is shut down, since it is assumed to be a long-running framework that can re-register and reattach to all of its running tasks. However, there are cases where users want the framework to be more ephemeral, so that when the framework dies all of its tasks stop and Mesos keeps no framework state at all. Besides keeping the state in memory, we also need to set the framework failover timeout to a small value, which should be configurable.
[jira] [Commented] (SPARK-7831) Mesos dispatcher doesn't deregister as a framework from Mesos when stopped
[ https://issues.apache.org/jira/browse/SPARK-7831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556382#comment-14556382 ] Timothy Chen commented on SPARK-7831: - Hi Luc, actually that's the default behavior that I put into the dispatcher framework, since I expect the dispatcher to be a long-running framework: even when it goes away it's expected to be resumed, with all its tasks still running. To really shut down the framework, a user can ask Mesos to terminate it via the shutdown REST API call. I think what we should do here is add a configuration flag to toggle this behavior, and the default should be the current behavior. What do you think? Mesos dispatcher doesn't deregister as a framework from Mesos when stopped -- Key: SPARK-7831 URL: https://issues.apache.org/jira/browse/SPARK-7831 Project: Spark Issue Type: Bug Components: Mesos Affects Versions: 1.4.0 Environment: Spark 1.4.0-rc1, Mesos 0.2.2 (compiled from source) Reporter: Luc Bourlier To run Spark on Mesos in cluster mode, a Spark Mesos dispatcher has to be running. It is launched using {{sbin/start-mesos-dispatcher.sh}}. The Mesos dispatcher registers as a framework in the Mesos cluster. After using {{sbin/stop-mesos-dispatcher.sh}} to stop the dispatcher, the application is correctly terminated locally, but the framework is still listed as {{active}} in the Mesos dashboard. I would expect the framework to be de-registered when the dispatcher is stopped.
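For reference, the shutdown REST API call mentioned in the comment above can be issued against the Mesos master's teardown endpoint. A minimal sketch with Python's urllib (the master address and framework id are hypothetical; on much older Mesos versions the endpoint name may differ):

```python
from urllib.parse import urlencode
from urllib.request import Request

master = "http://mesos-master.example.com:5050"  # hypothetical master address
framework_id = "20150526-000000-0-0001"          # hypothetical framework id

# POST frameworkId=<id> to /master/teardown to remove the framework
# and kill all of its tasks. (Only constructed here, not sent.)
req = Request(master + "/master/teardown",
              data=urlencode({"frameworkId": framework_id}).encode())
print(req.get_method())  # "POST" -- urllib infers it from the request body
```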
[jira] [Created] (SPARK-7373) Support launching Spark drivers in Docker images with Mesos cluster mode
Timothy Chen created SPARK-7373: --- Summary: Support launching Spark drivers in Docker images with Mesos cluster mode Key: SPARK-7373 URL: https://issues.apache.org/jira/browse/SPARK-7373 Project: Spark Issue Type: Improvement Reporter: Timothy Chen Support launching Spark drivers in Docker images with Mesos cluster mode
[jira] [Created] (SPARK-7216) Show driver details in Mesos cluster UI
Timothy Chen created SPARK-7216: --- Summary: Show driver details in Mesos cluster UI Key: SPARK-7216 URL: https://issues.apache.org/jira/browse/SPARK-7216 Project: Spark Issue Type: Improvement Components: Mesos Reporter: Timothy Chen Show driver details in Mesos cluster UI
[jira] [Commented] (SPARK-6284) Support framework authentication and role in Mesos framework
[ https://issues.apache.org/jira/browse/SPARK-6284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357180#comment-14357180 ] Timothy Chen commented on SPARK-6284: - https://github.com/apache/spark/pull/4960 Support framework authentication and role in Mesos framework Key: SPARK-6284 URL: https://issues.apache.org/jira/browse/SPARK-6284 Project: Spark Issue Type: Improvement Components: Mesos Reporter: Timothy Chen Support framework authentication and role in both coarse-grained and fine-grained mode.
[jira] [Created] (SPARK-6284) Support framework authentication and role in Mesos framework
Timothy Chen created SPARK-6284: --- Summary: Support framework authentication and role in Mesos framework Key: SPARK-6284 URL: https://issues.apache.org/jira/browse/SPARK-6284 Project: Spark Issue Type: Improvement Components: Mesos Reporter: Timothy Chen Support framework authentication and role in both coarse-grained and fine-grained mode.
[jira] [Created] (SPARK-6081) DriverRunner doesn't support pulling HTTP/HTTPS URIs
Timothy Chen created SPARK-6081: --- Summary: DriverRunner doesn't support pulling HTTP/HTTPS URIs Key: SPARK-6081 URL: https://issues.apache.org/jira/browse/SPARK-6081 Project: Spark Issue Type: Improvement Reporter: Timothy Chen According to the docs, standalone cluster mode supports specifying http|https jar URLs, but in practice the DriverRunner is unable to pull HTTP URIs because it fetches them with Hadoop FS get.
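A minimal sketch of the fix direction for SPARK-6081 (hypothetical helper, not Spark's actual DriverRunner code): dispatch on the URI scheme so http/https jars go through a plain HTTP download instead of Hadoop FS get.

```python
from urllib.parse import urlparse

def choose_fetcher(uri: str) -> str:
    """Pick a fetch mechanism from the URI scheme (illustrative only)."""
    if urlparse(uri).scheme in ("http", "https"):
        return "http-get"       # plain HTTP download
    return "hadoop-fs-get"      # hdfs://, file://, etc. via Hadoop FS

print(choose_fetcher("https://repo.example.com/app.jar"))  # http-get
print(choose_fetcher("hdfs://namenode/jars/app.jar"))      # hadoop-fs-get
```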
[jira] [Closed] (SPARK-2628) Mesos backend throwing unable to find LoginModule
[ https://issues.apache.org/jira/browse/SPARK-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Chen closed SPARK-2628. --- Resolution: Won't Fix Mesos backend throwing unable to find LoginModule -- Key: SPARK-2628 URL: https://issues.apache.org/jira/browse/SPARK-2628 Project: Spark Issue Type: Bug Components: Mesos Reporter: Timothy Chen Assignee: Tim Chen http://mail-archives.apache.org/mod_mbox/incubator-spark-user/201406.mbox/%3c1401892590126-6927.p...@n3.nabble.com%3E 14/07/22 19:57:59 INFO HttpServer: Starting HTTP Server 14/07/22 19:57:59 ERROR Executor: Uncaught exception in thread Thread[Executor task launch worker-1,5,main] java.lang.Error: java.io.IOException: failure to login at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1116) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:636) Caused by: java.io.IOException: failure to login at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:490) at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:452) at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:40) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:176) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) ... 
2 more Caused by: javax.security.auth.login.LoginException: unable to find LoginModule class: org/apache/hadoop/security/UserGroupInformation$HadoopLoginModule at javax.security.auth.login.LoginContext.invoke(LoginContext.java:823) at javax.security.auth.login.LoginContext.access$000(LoginContext.java:203) at javax.security.auth.login.LoginContext$5.run(LoginContext.java:721) at javax.security.auth.login.LoginContext$5.run(LoginContext.java:719) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.login.LoginContext.invokeCreatorPriv(LoginContext.java:718) at javax.security.auth.login.LoginContext.login(LoginContext.java:590) at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:471) ... 6 more 14/07/22 19:57:59 ERROR Executor: Uncaught exception in thread Thread[Executor task launch worker-0,5,main] java.lang.Error: java.io.IOException: failure to login at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1116) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:636) Caused by: java.io.IOException: failure to login at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:490) at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:452) at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:40) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:176) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) ... 
2 more Caused by: javax.security.auth.login.LoginException: unable to find LoginModule class: org/apache/hadoop/security/UserGroupInformation$HadoopLoginModule at javax.security.auth.login.LoginContext.invoke(LoginContext.java:823) at javax.security.auth.login.LoginContext.access$000(LoginContext.java:203) at javax.security.auth.login.LoginContext$5.run(LoginContext.java:721) at javax.security.auth.login.LoginContext$5.run(LoginContext.java:719) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.login.LoginContext.invokeCreatorPriv(LoginContext.java:718) at javax.security.auth.login.LoginContext.login(LoginContext.java:590) at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:471) ... 6 more -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2628) Mesos backend throwing unable to find LoginModule
[ https://issues.apache.org/jira/browse/SPARK-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327978#comment-14327978 ] Timothy Chen commented on SPARK-2628: - This seems to have been fixed after 1.0.4, somewhere in 1.1. Users on versions older than 1.1 can still run into this. Will close this as won't fix. Mesos backend throwing unable to find LoginModule -- Key: SPARK-2628 URL: https://issues.apache.org/jira/browse/SPARK-2628 Project: Spark Issue Type: Bug Components: Mesos Reporter: Timothy Chen Assignee: Tim Chen http://mail-archives.apache.org/mod_mbox/incubator-spark-user/201406.mbox/%3c1401892590126-6927.p...@n3.nabble.com%3E (full stack trace quoted in the issue above)
[jira] [Commented] (SPARK-5338) Support cluster mode with Mesos
[ https://issues.apache.org/jira/browse/SPARK-5338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14294264#comment-14294264 ] Timothy Chen commented on SPARK-5338: - Started a doc to begin the design of this, will be adding more details along the way: https://docs.google.com/document/d/1BswXeFLRY8ofIfyWgjalexsM7XSVs_ddcXGK1HOb7QQ/edit?usp=sharing Support cluster mode with Mesos --- Key: SPARK-5338 URL: https://issues.apache.org/jira/browse/SPARK-5338 Project: Spark Issue Type: Improvement Components: Mesos Reporter: Timothy Chen Currently, when using Spark with Mesos, the only supported deployment is client mode. It would also be useful to have a cluster mode deployment that can be shared and long running.
[jira] [Updated] (SPARK-5338) Support cluster mode with Mesos
[ https://issues.apache.org/jira/browse/SPARK-5338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Chen updated SPARK-5338: Component/s: Mesos Support cluster mode with Mesos --- Key: SPARK-5338 URL: https://issues.apache.org/jira/browse/SPARK-5338 Project: Spark Issue Type: Improvement Components: Mesos Reporter: Timothy Chen Currently, when using Spark with Mesos, the only supported deployment is client mode. It would also be useful to have a cluster mode deployment that can be shared and long running.
[jira] [Created] (SPARK-5338) Support cluster mode with Mesos
Timothy Chen created SPARK-5338: --- Summary: Support cluster mode with Mesos Key: SPARK-5338 URL: https://issues.apache.org/jira/browse/SPARK-5338 Project: Spark Issue Type: Improvement Reporter: Timothy Chen Currently, when using Spark with Mesos, the only supported deployment is client mode. It would also be useful to have a cluster mode deployment that can be shared and long running.
[jira] [Commented] (SPARK-5095) Support launching multiple mesos executors in coarse grained mesos mode
[ https://issues.apache.org/jira/browse/SPARK-5095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276094#comment-14276094 ] Timothy Chen commented on SPARK-5095: - [~joshdevins][~maasg] I have a PR out now, I wonder if you guys can try it? https://github.com/apache/spark/pull/4027 Support launching multiple mesos executors in coarse grained mesos mode --- Key: SPARK-5095 URL: https://issues.apache.org/jira/browse/SPARK-5095 Project: Spark Issue Type: Improvement Components: Mesos Reporter: Timothy Chen Currently in coarse-grained Mesos mode, it's expected that we only launch one Mesos executor that launches one JVM process to run multiple Spark executors. However, this becomes a problem when the launched JVM process is larger than an ideal size (30GB is the recommended value from Databricks), which causes the GC problems reported on the mailing list. We should support launching multiple executors when large enough resources are available for Spark to use and these resources are still under the configured limit. This is also applicable when users want to specify the number of executors to be launched on each node.
[jira] [Commented] (SPARK-5095) Support launching multiple mesos executors in coarse grained mesos mode
[ https://issues.apache.org/jira/browse/SPARK-5095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14273244#comment-14273244 ] Timothy Chen commented on SPARK-5095: - [~joshdevins] [~gmaas] indeed, capping the cores is actually meant to fix 4940, and we can use that to address the number of executors. I'm trying not to have just a set of configurations that can achieve both; otherwise it becomes a lot harder to maintain. I'm working on the patch now and I'll add you both on github for review. Support launching multiple mesos executors in coarse grained mesos mode --- Key: SPARK-5095 URL: https://issues.apache.org/jira/browse/SPARK-5095 Project: Spark Issue Type: Improvement Components: Mesos Reporter: Timothy Chen Currently in coarse-grained Mesos mode, it's expected that we only launch one Mesos executor that launches one JVM process to run multiple Spark executors. However, this becomes a problem when the launched JVM process is larger than an ideal size (30GB is the recommended value from Databricks), which causes the GC problems reported on the mailing list. We should support launching multiple executors when large enough resources are available for Spark to use and these resources are still under the configured limit. This is also applicable when users want to specify the number of executors to be launched on each node.
[jira] [Commented] (SPARK-3619) Upgrade to Mesos 0.21 to work around MESOS-1688
[ https://issues.apache.org/jira/browse/SPARK-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14266459#comment-14266459 ] Timothy Chen commented on SPARK-3619: - [~jongyoul] please go ahead! Upgrade to Mesos 0.21 to work around MESOS-1688 --- Key: SPARK-3619 URL: https://issues.apache.org/jira/browse/SPARK-3619 Project: Spark Issue Type: Improvement Components: Mesos Reporter: Matei Zaharia Assignee: Timothy Chen The Mesos 0.21 release has a fix for https://issues.apache.org/jira/browse/MESOS-1688, which affects Spark jobs.
[jira] [Commented] (SPARK-5095) Support launching multiple mesos executors in coarse grained mesos mode
[ https://issues.apache.org/jira/browse/SPARK-5095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14265796#comment-14265796 ] Timothy Chen commented on SPARK-5095: - I think instead of configuring the number of executors to launch per slave, it's more ideal to configure the amount of cpu/mem per executor. My current thought for the implementation is to introduce two more configs: spark.mesos.coarse.executors.max -- the maximum number of executors launched per slave (applies to coarse-grained mode); spark.mesos.coarse.cores.max -- the maximum number of cpus to use per executor. Memory is already configurable through spark.executor.memory. With these, you can choose to launch two executors by specifying a maximum of two executors and also capping the max cpus at half the amount. These configurations can also fix SPARK-4940. Support launching multiple mesos executors in coarse grained mesos mode --- Key: SPARK-5095 URL: https://issues.apache.org/jira/browse/SPARK-5095 Project: Spark Issue Type: Improvement Components: Mesos Reporter: Timothy Chen Currently in coarse-grained Mesos mode, it's expected that we only launch one Mesos executor that launches one JVM process to run multiple Spark executors. However, this becomes a problem when the launched JVM process is larger than an ideal size (30GB is the recommended value from Databricks), which causes the GC problems reported on the mailing list. We should support launching multiple executors when large enough resources are available for Spark to use and these resources are still under the configured limit. This is also applicable when users want to specify the number of executors to be launched on each node.
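The interaction of the two configs proposed in the comment above can be sketched as simple packing arithmetic (the config names come from the comment; the logic is illustrative, not the actual scheduler):

```python
def executors_for_slave(offer_cpus: int, cores_max: int, executors_max: int) -> int:
    """How many coarse-grained executors fit on one slave's resource offer,
    given a per-executor core cap and a per-slave executor cap (sketch)."""
    return min(offer_cpus // cores_max, executors_max)

# A 16-core slave, capped at 8 cores per executor and 2 executors per
# slave, gets two 8-core JVMs instead of one 16-core JVM.
print(executors_for_slave(16, 8, 2))  # 2
```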
[jira] [Commented] (SPARK-4940) Support more evenly distributing cores for Mesos mode
[ https://issues.apache.org/jira/browse/SPARK-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14265802#comment-14265802 ] Timothy Chen commented on SPARK-4940: - So I assume you're specifying coarse-grained mode, right? And how are streaming consumers launched? I know that on the scheduler side it is launching spark executors/drivers, and we simply launch one spark executor per slave that runs multiple spark tasks. My assumption was that it was the number of resources allocated that is disproportionate across each slave's executor. Support more evenly distributing cores for Mesos mode - Key: SPARK-4940 URL: https://issues.apache.org/jira/browse/SPARK-4940 Project: Spark Issue Type: Improvement Components: Mesos Reporter: Timothy Chen Currently in coarse-grained mode the Spark scheduler simply takes all the resources it can on each node, which can cause uneven distribution based on the resources available on each slave.
[jira] [Updated] (SPARK-4940) Support more evenly distributing cores for Mesos mode
[ https://issues.apache.org/jira/browse/SPARK-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Chen updated SPARK-4940: Summary: Support more evenly distributing cores for Mesos mode (was: Document or Support more evenly distributing cores for Mesos mode) Support more evenly distributing cores for Mesos mode - Key: SPARK-4940 URL: https://issues.apache.org/jira/browse/SPARK-4940 Project: Spark Issue Type: Improvement Components: Mesos Reporter: Timothy Chen Currently in coarse-grained mode the Spark scheduler simply takes all the resources it can on each node, which can cause uneven distribution based on the resources available on each slave.
[jira] [Updated] (SPARK-5095) Support launching multiple mesos executors in coarse grained mesos mode
[ https://issues.apache.org/jira/browse/SPARK-5095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Chen updated SPARK-5095: Description: Currently in coarse-grained Mesos mode, it's expected that we only launch one Mesos executor that launches one JVM process to run multiple Spark executors. However, this becomes a problem when the launched JVM process is larger than an ideal size (30GB is the recommended value from Databricks), which causes the GC problems reported on the mailing list. We should support launching multiple executors when large enough resources are available for Spark to use and these resources are still under the configured limit. This is also applicable when users want to specify the number of executors to be launched on each node was: Currently in coarse-grained Mesos mode, it's expected that we only launch one Mesos executor that launches one JVM process to run multiple Spark executors. However, this becomes a problem when the launched JVM process is larger than an ideal size (30GB is the recommended value from Databricks), which causes the GC problems reported on the mailing list. We should support launching multiple executors when large enough resources are available for Spark to use and these resources are still under the configured limit. Support launching multiple mesos executors in coarse grained mesos mode --- Key: SPARK-5095 URL: https://issues.apache.org/jira/browse/SPARK-5095 Project: Spark Issue Type: Improvement Components: Mesos Reporter: Timothy Chen Currently in coarse-grained Mesos mode, it's expected that we only launch one Mesos executor that launches one JVM process to run multiple Spark executors. However, this becomes a problem when the launched JVM process is larger than an ideal size (30GB is the recommended value from Databricks), which causes the GC problems reported on the mailing list.
We should support launching multiple executors when large enough resources are available for Spark to use and these resources are still under the configured limit. This is also applicable when users want to specify the number of executors to be launched on each node.
[jira] [Created] (SPARK-5095) Support launching multiple mesos executors in coarse grained mesos mode
Timothy Chen created SPARK-5095: --- Summary: Support launching multiple mesos executors in coarse grained mesos mode Key: SPARK-5095 URL: https://issues.apache.org/jira/browse/SPARK-5095 Project: Spark Issue Type: Improvement Reporter: Timothy Chen Currently in coarse-grained Mesos mode, it's expected that we launch only one Mesos executor, which launches one JVM process that in turn launches multiple Spark executors. However, this becomes a problem when the JVM process is larger than an ideal size (30 GB is the value recommended by Databricks), which causes the GC problems reported on the mailing list. We should support launching multiple executors when large enough resources are available for Spark to use and these resources are still under the configured limit.
[jira] [Updated] (SPARK-5095) Support launching multiple mesos executors in coarse grained mesos mode
[ https://issues.apache.org/jira/browse/SPARK-5095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Chen updated SPARK-5095: Component/s: Mesos Support launching multiple mesos executors in coarse grained mesos mode --- Key: SPARK-5095 URL: https://issues.apache.org/jira/browse/SPARK-5095 Project: Spark Issue Type: Improvement Components: Mesos Reporter: Timothy Chen Currently in coarse-grained Mesos mode, it's expected that we launch only one Mesos executor, which launches one JVM process that in turn launches multiple Spark executors. However, this becomes a problem when the JVM process is larger than an ideal size (30 GB is the value recommended by Databricks), which causes the GC problems reported on the mailing list. We should support launching multiple executors when large enough resources are available for Spark to use and these resources are still under the configured limit.
[jira] [Commented] (SPARK-4940) Document or Support more evenly distributing cores for Mesos mode
[ https://issues.apache.org/jira/browse/SPARK-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258570#comment-14258570 ] Timothy Chen commented on SPARK-4940: - Potentially I think there are two ways to help distribute the allocation: 1) Use static reservation on the Mesos side to reserve resources just for Spark, which guarantees that the coarse-grained mode will use them. The downside is that when Spark doesn't need these resources, they can't be shared with other frameworks. 2) On the Spark scheduler side we can perhaps have a minimum and maximum CPU allocation count, so that besides requiring just 1 CPU we also ask for the required cores to be within a range, making the allocation much more even. Document or Support more evenly distributing cores for Mesos mode - Key: SPARK-4940 URL: https://issues.apache.org/jira/browse/SPARK-4940 Project: Spark Issue Type: Improvement Components: Mesos Reporter: Timothy Chen Currently in coarse-grained mode the Spark scheduler simply takes all the resources it can on each node, which can cause uneven distribution depending on the resources available on each slave.
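The second option in the comment above (a minimum and maximum core count per offer) can be sketched as follows. The function and numbers are hypothetical, purely to illustrate why a range spreads a core budget across more slaves than the take-everything behavior:

```python
# Hypothetical sketch: accept between min_cores and max_cores from each
# Mesos offer instead of grabbing everything, so a fixed core budget is
# spread across more slaves. Not Spark's actual scheduler code.

def cores_to_accept(offer_cores, min_cores, max_cores, remaining_budget):
    """Cores to take from one offer, or 0 if the minimum can't be met."""
    take = min(offer_cores, max_cores, remaining_budget)
    return take if take >= min_cores else 0

# With a 2-6 core range and a 10-core budget, three 8-core offers yield
# 6, 4, and 0 cores; the greedy approach would instead take 8 and 2 from
# the first two slaves and leave the third unused.
budget, taken = 10, []
for offer in [8, 8, 8]:
    got = cores_to_accept(offer, 2, 6, budget)
    budget -= got
    taken.append(got)
print(taken)  # -> [6, 4, 0]
```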
[jira] [Commented] (SPARK-4940) Document or Support more evenly distributing cores for Mesos mode
[ https://issues.apache.org/jira/browse/SPARK-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258571#comment-14258571 ] Timothy Chen commented on SPARK-4940: - [~gmaas] Document or Support more evenly distributing cores for Mesos mode - Key: SPARK-4940 URL: https://issues.apache.org/jira/browse/SPARK-4940 Project: Spark Issue Type: Improvement Components: Mesos Reporter: Timothy Chen Currently in coarse-grained mode the Spark scheduler simply takes all the resources it can on each node, which can cause uneven distribution depending on the resources available on each slave.
[jira] [Created] (SPARK-4940) Document or Support more evenly distributing cores for Mesos mode
Timothy Chen created SPARK-4940: --- Summary: Document or Support more evenly distributing cores for Mesos mode Key: SPARK-4940 URL: https://issues.apache.org/jira/browse/SPARK-4940 Project: Spark Issue Type: Improvement Components: Mesos Reporter: Timothy Chen Currently in coarse-grained mode the Spark scheduler simply takes all the resources it can on each node, which can cause uneven distribution depending on the resources available on each slave.
[jira] [Updated] (SPARK-4286) Support External Shuffle Service with Mesos integration
[ https://issues.apache.org/jira/browse/SPARK-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Chen updated SPARK-4286: Description: With the new external shuffle service added, we need to also make the Mesos integration able to launch the shuffle service and support auto-scaling executors. The Mesos executor will launch the external shuffle service and leave it running, while Spark executors remain free to scale. was:With the new external shuffle service added, we need to also make the Mesos integration able to launch the shuffle service and support the auto scaling executors. Support External Shuffle Service with Mesos integration --- Key: SPARK-4286 URL: https://issues.apache.org/jira/browse/SPARK-4286 Project: Spark Issue Type: Task Components: Mesos Reporter: Timothy Chen With the new external shuffle service added, we need to also make the Mesos integration able to launch the shuffle service and support auto-scaling executors. The Mesos executor will launch the external shuffle service and leave it running, while Spark executors remain free to scale.
[jira] [Created] (SPARK-4286) Support External Shuffle Service with Mesos integration
Timothy Chen created SPARK-4286: --- Summary: Support External Shuffle Service with Mesos integration Key: SPARK-4286 URL: https://issues.apache.org/jira/browse/SPARK-4286 Project: Spark Issue Type: Task Reporter: Timothy Chen With the new external shuffle service added, we need to also make the Mesos integration able to launch the shuffle service and support auto-scaling executors.
[jira] [Updated] (SPARK-4286) Support External Shuffle Service with Mesos integration
[ https://issues.apache.org/jira/browse/SPARK-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Chen updated SPARK-4286: Component/s: Mesos Support External Shuffle Service with Mesos integration --- Key: SPARK-4286 URL: https://issues.apache.org/jira/browse/SPARK-4286 Project: Spark Issue Type: Task Components: Mesos Reporter: Timothy Chen With the new external shuffle service added, we need to also make the Mesos integration able to launch the shuffle service and support auto-scaling executors.
[jira] [Commented] (SPARK-4286) Support External Shuffle Service with Mesos integration
[ https://issues.apache.org/jira/browse/SPARK-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201312#comment-14201312 ] Timothy Chen commented on SPARK-4286: - Please assign to me, thanks. Support External Shuffle Service with Mesos integration --- Key: SPARK-4286 URL: https://issues.apache.org/jira/browse/SPARK-4286 Project: Spark Issue Type: Task Components: Mesos Reporter: Timothy Chen With the new external shuffle service added, we need to also make the Mesos integration able to launch the shuffle service and support auto-scaling executors.
[jira] [Closed] (SPARK-2616) Update Mesos to 0.19.1
[ https://issues.apache.org/jira/browse/SPARK-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Chen closed SPARK-2616. --- Resolution: Fixed SPARK-3619 is going to update to 0.21 instead. Update Mesos to 0.19.1 -- Key: SPARK-2616 URL: https://issues.apache.org/jira/browse/SPARK-2616 Project: Spark Issue Type: Improvement Components: Mesos Reporter: Timothy Chen Let's update Mesos to 0.19.1 and verify that it works.
[jira] [Commented] (SPARK-3619) Upgrade to Mesos 0.21 to work around MESOS-1688
[ https://issues.apache.org/jira/browse/SPARK-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160436#comment-14160436 ] Timothy Chen commented on SPARK-3619: - I can do this; please assign it to me. Upgrade to Mesos 0.21 to work around MESOS-1688 --- Key: SPARK-3619 URL: https://issues.apache.org/jira/browse/SPARK-3619 Project: Spark Issue Type: Improvement Components: Mesos Reporter: Matei Zaharia When Mesos 0.21 comes out, it will have a fix for https://issues.apache.org/jira/browse/MESOS-1688, which affects Spark jobs.
[jira] [Commented] (SPARK-3619) Upgrade to Mesos 0.21 to work around MESOS-1688
[ https://issues.apache.org/jira/browse/SPARK-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160437#comment-14160437 ] Timothy Chen commented on SPARK-3619: - [~matei] Upgrade to Mesos 0.21 to work around MESOS-1688 --- Key: SPARK-3619 URL: https://issues.apache.org/jira/browse/SPARK-3619 Project: Spark Issue Type: Improvement Components: Mesos Reporter: Matei Zaharia When Mesos 0.21 comes out, it will have a fix for https://issues.apache.org/jira/browse/MESOS-1688, which affects Spark jobs.
[jira] [Created] (SPARK-3817) BlockManagerMasterActor: Got two different block manager registrations with Mesos
Timothy Chen created SPARK-3817: --- Summary: BlockManagerMasterActor: Got two different block manager registrations with Mesos Key: SPARK-3817 URL: https://issues.apache.org/jira/browse/SPARK-3817 Project: Spark Issue Type: Bug Components: Mesos Reporter: Timothy Chen 14/10/06 09:34:40 ERROR BlockManagerMasterActor: Got two different block manager registrations on 20140711-081617-711206558-5050-2543-5 Here is the log from the mesos-slave where this container was running. http://pastebin.com/Q1Cuzm6Q
[jira] [Commented] (SPARK-2691) Allow Spark on Mesos to be launched with Docker
[ https://issues.apache.org/jira/browse/SPARK-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14149833#comment-14149833 ] Timothy Chen commented on SPARK-2691: - [~tstclair] sounds great! The integration should be straightforward: specify a DockerInfo in the TaskInfo. The only interesting questions arise around the options and the Docker image itself. Would you like to start investigating and making the change? Allow Spark on Mesos to be launched with Docker --- Key: SPARK-2691 URL: https://issues.apache.org/jira/browse/SPARK-2691 Project: Spark Issue Type: Improvement Components: Mesos Reporter: Timothy Chen Assignee: Timothy Chen Labels: mesos Currently, to launch Spark with Mesos, one must upload a tarball and specify the executor URI to be passed in, which is downloaded on each slave (or even on each execution, depending on whether coarse mode is used). We want to make Spark able to support launching executors via a Docker image, utilizing the recent Docker and Mesos integration work. With that integration, Spark can simply specify a Docker image and the options needed, and it should continue to work as-is.
[jira] [Commented] (SPARK-2022) Spark 1.0.0 is failing if mesos.coarse set to true
[ https://issues.apache.org/jira/browse/SPARK-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117581#comment-14117581 ] Timothy Chen commented on SPARK-2022: - This should be resolved now; [~pwend...@gmail.com] please help close this. Spark 1.0.0 is failing if mesos.coarse set to true -- Key: SPARK-2022 URL: https://issues.apache.org/jira/browse/SPARK-2022 Project: Spark Issue Type: Bug Components: Mesos Affects Versions: 1.0.0 Reporter: Marek Wiewiorka Assignee: Tim Chen Priority: Critical more stderr --- WARNING: Logging before InitGoogleLogging() is written to STDERR I0603 16:07:53.721132 61192 exec.cpp:131] Version: 0.18.2 I0603 16:07:53.725230 61200 exec.cpp:205] Executor registered on slave 201405220917-134217738-5050-27119-0 Exception in thread main java.lang.NumberFormatException: For input string: sparkseq003.cloudapp.net at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) at java.lang.Integer.parseInt(Integer.java:492) at java.lang.Integer.parseInt(Integer.java:527) at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:229) at scala.collection.immutable.StringOps.toInt(StringOps.scala:31) at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:135) at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala) more stdout --- Registered executor on sparkseq003.cloudapp.net Starting task 5 Forked command at 61202 sh -c '/home/mesos/spark-1.0.0/bin/spark-class org.apache.spark.executor.CoarseGrainedExecutorBackend -Dspark.mesos.coarse=true akka.tcp://sp...@sparkseq001.cloudapp.net:40312/user/CoarseGrainedScheduler 201405220917-134217738-5050-27119-0 sparkseq003.cloudapp.net 4' Command exited with status 1 (pid: 61202)
[jira] [Commented] (SPARK-2921) Mesos doesn't handle spark.executor.extraJavaOptions correctly (among other things)
[ https://issues.apache.org/jira/browse/SPARK-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101383#comment-14101383 ] Timothy Chen commented on SPARK-2921: - This should all be addressed by the PR from the linked issue. Mesos doesn't handle spark.executor.extraJavaOptions correctly (among other things) --- Key: SPARK-2921 URL: https://issues.apache.org/jira/browse/SPARK-2921 Project: Spark Issue Type: Bug Components: Mesos Affects Versions: 1.0.2 Reporter: Andrew Or Priority: Critical Fix For: 1.1.0 The code path to handle this exists only for the coarse-grained mode, and even in this mode the Java options aren't passed to the executors properly. We currently pass the entire value of spark.executor.extraJavaOptions to the executors as a single string without splitting it. We need to use Utils.splitCommandString as in standalone mode. I have not confirmed this, but I would assume spark.executor.extraClassPath and spark.executor.extraLibraryPath are also not propagated correctly in either mode.
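The splitting problem described above is easy to demonstrate. Spark's Utils.splitCommandString does this tokenization in Scala; the sketch below uses Python's standard-library `shlex.split` purely as an analogue to show why quote-aware splitting matters:

```python
import shlex

# An option string as a user might set spark.executor.extraJavaOptions.
opts = "-XX:+UseG1GC -Dspark.local.dir='/tmp/spark scratch'"

# Passed through unsplit, the JVM would see one bogus "argument":
print([opts])

# Quote-aware splitting yields the argv the JVM actually needs; note the
# quoted path survives as a single token with its space intact.
print(shlex.split(opts))  # -> ['-XX:+UseG1GC', '-Dspark.local.dir=/tmp/spark scratch']
```

A naive `opts.split(" ")` would also be wrong here: it would break the quoted path into two tokens, which is exactly why a command-string splitter rather than whitespace splitting is needed.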
[jira] [Commented] (SPARK-2022) Spark 1.0.0 is failing if mesos.coarse set to true
[ https://issues.apache.org/jira/browse/SPARK-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077109#comment-14077109 ] Timothy Chen commented on SPARK-2022: - Github PR: https://github.com/apache/spark/pull/1622 Spark 1.0.0 is failing if mesos.coarse set to true -- Key: SPARK-2022 URL: https://issues.apache.org/jira/browse/SPARK-2022 Project: Spark Issue Type: Bug Components: Mesos Affects Versions: 1.0.0 Reporter: Marek Wiewiorka Assignee: Tim Chen Priority: Critical more stderr --- WARNING: Logging before InitGoogleLogging() is written to STDERR I0603 16:07:53.721132 61192 exec.cpp:131] Version: 0.18.2 I0603 16:07:53.725230 61200 exec.cpp:205] Executor registered on slave 201405220917-134217738-5050-27119-0 Exception in thread main java.lang.NumberFormatException: For input string: sparkseq003.cloudapp.net at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) at java.lang.Integer.parseInt(Integer.java:492) at java.lang.Integer.parseInt(Integer.java:527) at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:229) at scala.collection.immutable.StringOps.toInt(StringOps.scala:31) at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:135) at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala) more stdout --- Registered executor on sparkseq003.cloudapp.net Starting task 5 Forked command at 61202 sh -c '/home/mesos/spark-1.0.0/bin/spark-class org.apache.spark.executor.CoarseGrainedExecutorBackend -Dspark.mesos.coarse=true akka.tcp://sp...@sparkseq001.cloudapp.net:40312/user/CoarseGrainedScheduler 201405220917-134217738-5050-27119-0 sparkseq003.cloudapp.net 4' Command exited with status 1 (pid: 61202)
[jira] [Created] (SPARK-2691) Allow Spark on Mesos to be launched with Docker
Timothy Chen created SPARK-2691: --- Summary: Allow Spark on Mesos to be launched with Docker Key: SPARK-2691 URL: https://issues.apache.org/jira/browse/SPARK-2691 Project: Spark Issue Type: Improvement Components: Mesos Reporter: Timothy Chen Currently, to launch Spark with Mesos, one must upload a tarball and specify the executor URI to be passed in, which is downloaded on each slave (or even on each execution, depending on whether coarse mode is used). We want to make Spark able to support launching executors via a Docker image, utilizing the recent Docker and Mesos integration work. With that integration, Spark can simply specify a Docker image and the options needed, and it should continue to work as-is.
[jira] [Created] (SPARK-2616) Update Mesos to 0.19.1
Timothy Chen created SPARK-2616: --- Summary: Update Mesos to 0.19.1 Key: SPARK-2616 URL: https://issues.apache.org/jira/browse/SPARK-2616 Project: Spark Issue Type: Improvement Components: Mesos Reporter: Timothy Chen Let's update Mesos to 0.19.1 and verify that it works.
[jira] [Created] (SPARK-2628) Mesos backend throwing unable to find LoginModule
Timothy Chen created SPARK-2628: --- Summary: Mesos backend throwing unable to find LoginModule Key: SPARK-2628 URL: https://issues.apache.org/jira/browse/SPARK-2628 Project: Spark Issue Type: Bug Reporter: Timothy Chen http://mail-archives.apache.org/mod_mbox/incubator-spark-user/201406.mbox/%3c1401892590126-6927.p...@n3.nabble.com%3E 14/07/22 19:57:59 INFO HttpServer: Starting HTTP Server 14/07/22 19:57:59 ERROR Executor: Uncaught exception in thread Thread[Executor task launch worker-1,5,main] java.lang.Error: java.io.IOException: failure to login at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1116) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:636) Caused by: java.io.IOException: failure to login at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:490) at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:452) at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:40) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:176) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) ... 
2 more Caused by: javax.security.auth.login.LoginException: unable to find LoginModule class: org/apache/hadoop/security/UserGroupInformation$HadoopLoginModule at javax.security.auth.login.LoginContext.invoke(LoginContext.java:823) at javax.security.auth.login.LoginContext.access$000(LoginContext.java:203) at javax.security.auth.login.LoginContext$5.run(LoginContext.java:721) at javax.security.auth.login.LoginContext$5.run(LoginContext.java:719) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.login.LoginContext.invokeCreatorPriv(LoginContext.java:718) at javax.security.auth.login.LoginContext.login(LoginContext.java:590) at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:471) ... 6 more 14/07/22 19:57:59 ERROR Executor: Uncaught exception in thread Thread[Executor task launch worker-0,5,main] java.lang.Error: java.io.IOException: failure to login at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1116) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:636) Caused by: java.io.IOException: failure to login at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:490) at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:452) at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:40) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:176) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) ... 
2 more Caused by: javax.security.auth.login.LoginException: unable to find LoginModule class: org/apache/hadoop/security/UserGroupInformation$HadoopLoginModule at javax.security.auth.login.LoginContext.invoke(LoginContext.java:823) at javax.security.auth.login.LoginContext.access$000(LoginContext.java:203) at javax.security.auth.login.LoginContext$5.run(LoginContext.java:721) at javax.security.auth.login.LoginContext$5.run(LoginContext.java:719) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.login.LoginContext.invokeCreatorPriv(LoginContext.java:718) at javax.security.auth.login.LoginContext.login(LoginContext.java:590) at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:471) ... 6 more
[jira] [Commented] (SPARK-2628) Mesos backend throwing unable to find LoginModule
[ https://issues.apache.org/jira/browse/SPARK-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070826#comment-14070826 ] Timothy Chen commented on SPARK-2628: - [~pwendell] please assign to me, thanks! Mesos backend throwing unable to find LoginModule -- Key: SPARK-2628 URL: https://issues.apache.org/jira/browse/SPARK-2628 Project: Spark Issue Type: Bug Components: Mesos Reporter: Timothy Chen http://mail-archives.apache.org/mod_mbox/incubator-spark-user/201406.mbox/%3c1401892590126-6927.p...@n3.nabble.com%3E 14/07/22 19:57:59 INFO HttpServer: Starting HTTP Server 14/07/22 19:57:59 ERROR Executor: Uncaught exception in thread Thread[Executor task launch worker-1,5,main] java.lang.Error: java.io.IOException: failure to login at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1116) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:636) Caused by: java.io.IOException: failure to login at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:490) at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:452) at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:40) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:176) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) ... 
2 more Caused by: javax.security.auth.login.LoginException: unable to find LoginModule class: org/apache/hadoop/security/UserGroupInformation$HadoopLoginModule at javax.security.auth.login.LoginContext.invoke(LoginContext.java:823) at javax.security.auth.login.LoginContext.access$000(LoginContext.java:203) at javax.security.auth.login.LoginContext$5.run(LoginContext.java:721) at javax.security.auth.login.LoginContext$5.run(LoginContext.java:719) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.login.LoginContext.invokeCreatorPriv(LoginContext.java:718) at javax.security.auth.login.LoginContext.login(LoginContext.java:590) at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:471) ... 6 more 14/07/22 19:57:59 ERROR Executor: Uncaught exception in thread Thread[Executor task launch worker-0,5,main] java.lang.Error: java.io.IOException: failure to login at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1116) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:636) Caused by: java.io.IOException: failure to login at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:490) at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:452) at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:40) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:176) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) ... 
2 more Caused by: javax.security.auth.login.LoginException: unable to find LoginModule class: org/apache/hadoop/security/UserGroupInformation$HadoopLoginModule at javax.security.auth.login.LoginContext.invoke(LoginContext.java:823) at javax.security.auth.login.LoginContext.access$000(LoginContext.java:203) at javax.security.auth.login.LoginContext$5.run(LoginContext.java:721) at javax.security.auth.login.LoginContext$5.run(LoginContext.java:719) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.login.LoginContext.invokeCreatorPriv(LoginContext.java:718) at javax.security.auth.login.LoginContext.login(LoginContext.java:590) at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:471) ... 6 more
[jira] [Updated] (SPARK-2628) Mesos backend throwing unable to find LoginModule
[ https://issues.apache.org/jira/browse/SPARK-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Chen updated SPARK-2628: Component/s: Mesos Mesos backend throwing unable to find LoginModule -- Key: SPARK-2628 URL: https://issues.apache.org/jira/browse/SPARK-2628 Project: Spark Issue Type: Bug Components: Mesos Reporter: Timothy Chen http://mail-archives.apache.org/mod_mbox/incubator-spark-user/201406.mbox/%3c1401892590126-6927.p...@n3.nabble.com%3E 14/07/22 19:57:59 INFO HttpServer: Starting HTTP Server 14/07/22 19:57:59 ERROR Executor: Uncaught exception in thread Thread[Executor task launch worker-1,5,main] java.lang.Error: java.io.IOException: failure to login at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1116) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:636) Caused by: java.io.IOException: failure to login at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:490) at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:452) at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:40) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:176) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) ... 
2 more Caused by: javax.security.auth.login.LoginException: unable to find LoginModule class: org/apache/hadoop/security/UserGroupInformation$HadoopLoginModule at javax.security.auth.login.LoginContext.invoke(LoginContext.java:823) at javax.security.auth.login.LoginContext.access$000(LoginContext.java:203) at javax.security.auth.login.LoginContext$5.run(LoginContext.java:721) at javax.security.auth.login.LoginContext$5.run(LoginContext.java:719) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.login.LoginContext.invokeCreatorPriv(LoginContext.java:718) at javax.security.auth.login.LoginContext.login(LoginContext.java:590) at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:471) ... 6 more 14/07/22 19:57:59 ERROR Executor: Uncaught exception in thread Thread[Executor task launch worker-0,5,main] java.lang.Error: java.io.IOException: failure to login at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1116) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:636) Caused by: java.io.IOException: failure to login at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:490) at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:452) at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:40) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:176) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) ... 
2 more Caused by: javax.security.auth.login.LoginException: unable to find LoginModule class: org/apache/hadoop/security/UserGroupInformation$HadoopLoginModule at javax.security.auth.login.LoginContext.invoke(LoginContext.java:823) at javax.security.auth.login.LoginContext.access$000(LoginContext.java:203) at javax.security.auth.login.LoginContext$5.run(LoginContext.java:721) at javax.security.auth.login.LoginContext$5.run(LoginContext.java:719) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.login.LoginContext.invokeCreatorPriv(LoginContext.java:718) at javax.security.auth.login.LoginContext.login(LoginContext.java:590) at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:471) ... 6 more
[jira] [Commented] (SPARK-2022) Spark 1.0.0 is failing if mesos.coarse set to true
[ https://issues.apache.org/jira/browse/SPARK-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069508#comment-14069508 ] Timothy Chen commented on SPARK-2022: - This seems to be a duplicate of SPARK-2020. Spark 1.0.0 is failing if mesos.coarse set to true -- Key: SPARK-2022 URL: https://issues.apache.org/jira/browse/SPARK-2022 Project: Spark Issue Type: Bug Components: Mesos Affects Versions: 1.0.0 Reporter: Marek Wiewiorka Priority: Critical more stderr --- WARNING: Logging before InitGoogleLogging() is written to STDERR I0603 16:07:53.721132 61192 exec.cpp:131] Version: 0.18.2 I0603 16:07:53.725230 61200 exec.cpp:205] Executor registered on slave 201405220917-134217738-5050-27119-0 Exception in thread main java.lang.NumberFormatException: For input string: sparkseq003.cloudapp.net at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) at java.lang.Integer.parseInt(Integer.java:492) at java.lang.Integer.parseInt(Integer.java:527) at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:229) at scala.collection.immutable.StringOps.toInt(StringOps.scala:31) at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:135) at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala) more stdout --- Registered executor on sparkseq003.cloudapp.net Starting task 5 Forked command at 61202 sh -c '/home/mesos/spark-1.0.0/bin/spark-class org.apache.spark.executor.CoarseGrainedExecutorBackend -Dspark.mesos.coarse=true akka.tcp://sp...@sparkseq001.cloudapp.net:40312/user/CoarseGrainedScheduler 201405220917-134217738-5050-27119-0 sparkseq003.cloudapp.net 4' Command exited with status 1 (pid: 61202)
[jira] [Commented] (SPARK-2269) Clean up and add unit tests for resourceOffers in MesosSchedulerBackend
[ https://issues.apache.org/jira/browse/SPARK-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14066879#comment-14066879 ] Timothy Chen commented on SPARK-2269:
-
Created a PR for this: https://github.com/apache/spark/pull/1487

Clean up and add unit tests for resourceOffers in MesosSchedulerBackend
---
Key: SPARK-2269
URL: https://issues.apache.org/jira/browse/SPARK-2269
Project: Spark
Issue Type: Bug
Components: Mesos
Reporter: Patrick Wendell
Assignee: Tim Chen

This function could be simplified a bit. We could rewrite it without offerableIndices and without creating the mesosTasks array as large as the offer list. There is a lot of logic around making sure we get the correct index into mesosTasks and offers; really we should just build mesosTasks directly from the offers we get back. To associate the tasks we are launching with the offers, we can just create a HashMap from the slaveId to the original offer. The basic logic of the function is: take the Mesos offers, convert them to Spark offers, then convert the results back. One reason it might be designed as it is now is to deal with the case where Mesos gives multiple offers for a single slave. I checked directly with the Mesos team and they said this won't ever happen; you'll get at most one offer per Mesos slave within a set of offers.
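The simplification Patrick describes can be sketched as follows. Since Mesos hands out at most one offer per slave in a batch, offers can be keyed by slaveId and the task list built directly, with no parallel index arrays. Offer and Task below are hypothetical stand-ins, not the real Mesos protobuf types, and the one-core-per-task policy is just for illustration:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class OfferMapping {
    // Stand-in types for the Mesos offer and launched task.
    record Offer(String slaveId, int cpus) {}
    record Task(String slaveId, int cpusUsed) {}

    static List<Task> resourceOffers(List<Offer> offers) {
        // Map each slaveId back to its original offer; valid because Mesos
        // sends at most one offer per slave in a batch.
        Map<String, Offer> offersBySlave = new HashMap<>();
        for (Offer o : offers) offersBySlave.put(o.slaveId(), o);

        // Build the task list directly from the offers instead of
        // maintaining offerableIndices into a parallel array.
        List<Task> tasks = new ArrayList<>();
        for (Offer o : offers) {
            if (o.cpus() >= 1) {
                // The original offer stays recoverable via offersBySlave
                // when replying to Mesos with the launch decision.
                tasks.add(new Task(o.slaveId(), 1));
            }
        }
        return tasks;
    }

    public static void main(String[] args) {
        List<Task> tasks = resourceOffers(
            List.of(new Offer("s1", 4), new Offer("s2", 0)));
        // Only the slave with free cpus gets a task.
        System.out.println(tasks.size());
    }
}
```

The slaveId-keyed map replaces all the index bookkeeping: association between a launched task and its originating offer becomes a single lookup.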
[jira] [Commented] (SPARK-872) Should revive offer after tasks finish in Mesos fine-grained mode
[ https://issues.apache.org/jira/browse/SPARK-872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065703#comment-14065703 ] Timothy Chen commented on SPARK-872:
-
I'm not quite following your statement that the Mesos master will call resourceOffer until 4 cores are free. Can you elaborate on what that means?

Should revive offer after tasks finish in Mesos fine-grained mode
--
Key: SPARK-872
URL: https://issues.apache.org/jira/browse/SPARK-872
Project: Spark
Issue Type: Improvement
Components: Mesos
Affects Versions: 0.8.0
Reporter: xiajunluan

When running Spark on the latest Mesos release, I noticed that Spark on Mesos in fine-grained mode does not schedule Spark tasks effectively. For example, if a slave has 4 CPU cores, the Mesos master will not call Spark's resourceOffer function until all 4 CPU cores are free. In my view, as in standalone scheduler mode, when one task finishes and one CPU core becomes free, the Mesos master should call Spark's resourceOffer to allocate that resource to tasks.
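The behavior the reporter is asking for can be modeled as: a freed core should immediately trigger another offer, rather than waiting for all cores on the slave to drain. The sketch below is a toy simulation under that assumption, not the Mesos or Spark API (in real Mesos the equivalent hook would be reviving offers from the scheduler side when a task finishes):

```java
public class ReviveOnFinish {
    int freeCores;
    int launched = 0;

    ReviveOnFinish(int cores) { this.freeCores = cores; }

    // Called when an offer arrives; launches as many single-core tasks as fit.
    void resourceOffer() {
        while (freeCores > 0) { freeCores--; launched++; }
    }

    // Called when a task finishes: the freed core immediately triggers
    // another offer instead of waiting until all cores are free again.
    void taskFinished() {
        freeCores++;
        resourceOffer(); // "revive" right away
    }

    public static void main(String[] args) {
        ReviveOnFinish s = new ReviveOnFinish(4);
        s.resourceOffer();   // 4 single-core tasks launched, 0 cores free
        s.taskFinished();    // 1 core freed and immediately reused
        System.out.println(s.launched); // prints 5
    }
}
```

Without the revive-on-finish call, the fifth task would sit idle until the slave's entire core count was free, which is exactly the under-utilization described in the report.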
[jira] [Commented] (SPARK-1702) Mesos executor won't start because of a ClassNotFoundException
[ https://issues.apache.org/jira/browse/SPARK-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065706#comment-14065706 ] Timothy Chen commented on SPARK-1702:
-
The PR is merged and closed already; is this still an issue?

Mesos executor won't start because of a ClassNotFoundException
--
Key: SPARK-1702
URL: https://issues.apache.org/jira/browse/SPARK-1702
Project: Spark
Issue Type: Bug
Components: Mesos
Affects Versions: 1.0.0
Reporter: Bouke van der Bijl
Labels: executors, mesos, spark

Some discussion here: http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-ClassNotFoundException-spark-on-mesos-td3510.html
Fix here (which is probably not the right fix): https://github.com/apache/spark/pull/620
This was broken in v0.9.0, was fixed in v0.9.1, and is now broken again. Error in the Mesos executor stderr:

WARNING: Logging before InitGoogleLogging() is written to STDERR
I0502 17:31:42.672224 14688 exec.cpp:131] Version: 0.18.0
I0502 17:31:42.674959 14707 exec.cpp:205] Executor registered on slave 20140501-182306-16842879-5050-10155-0
14/05/02 17:31:42 INFO MesosExecutorBackend: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
14/05/02 17:31:42 INFO MesosExecutorBackend: Registered with Mesos as executor ID 20140501-182306-16842879-5050-10155-0
14/05/02 17:31:43 INFO SecurityManager: Changing view acls to: vagrant
14/05/02 17:31:43 INFO SecurityManager: SecurityManager, is authentication enabled: false are ui acls enabled: false users with view permissions: Set(vagrant)
14/05/02 17:31:43 INFO Slf4jLogger: Slf4jLogger started
14/05/02 17:31:43 INFO Remoting: Starting remoting
14/05/02 17:31:43 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://spark@localhost:50843]
14/05/02 17:31:43 INFO Remoting: Remoting now listens on addresses: [akka.tcp://spark@localhost:50843]
java.lang.ClassNotFoundException: org/apache/spark/serializer/JavaSerializer
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:270)
    at org.apache.spark.SparkEnv$.instantiateClass$1(SparkEnv.scala:165)
    at org.apache.spark.SparkEnv$.create(SparkEnv.scala:176)
    at org.apache.spark.executor.Executor.init(Executor.scala:106)
    at org.apache.spark.executor.MesosExecutorBackend.registered(MesosExecutorBackend.scala:56)
Exception in thread Thread-0
I0502 17:31:43.710039 14707 exec.cpp:412] Deactivating the executor libprocess
The problem is that it can't find the class.
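One detail worth noting in the trace: the class name in the exception is slash-separated ("org/apache/spark/serializer/JavaSerializer"), but Class.forName expects a dot-separated binary name, so a slash-form name can never resolve regardless of the classpath. This suggests the configured class name was converted to a resource-style path somewhere before the lookup, though that is an inference from the trace, not something the report confirms. A minimal demonstration using a standard class:

```java
public class ClassNameFormat {
    public static void main(String[] args) throws Exception {
        // Class.forName takes a binary name with dots and succeeds.
        System.out.println(Class.forName("java.lang.String").getName());

        // A slash-separated resource path is not a valid binary name,
        // so the lookup always fails -- same shape as the trace above.
        try {
            Class.forName("java/lang/String");
        } catch (ClassNotFoundException e) {
            System.out.println("not found: " + e.getMessage());
        }
    }
}
```

So even with spark-core on the executor's classpath, a name in slash form would reproduce exactly this ClassNotFoundException.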
[jira] [Commented] (SPARK-1764) EOF reached before Python server acknowledged
[ https://issues.apache.org/jira/browse/SPARK-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065709#comment-14065709 ] Timothy Chen commented on SPARK-1764:
-
I'm not sure how this is related to Mesos; is this reproducible using YARN or standalone?

EOF reached before Python server acknowledged
-
Key: SPARK-1764
URL: https://issues.apache.org/jira/browse/SPARK-1764
Project: Spark
Issue Type: Bug
Components: Mesos, PySpark
Affects Versions: 1.0.0
Reporter: Bouke van der Bijl
Priority: Blocker
Labels: mesos, pyspark

I'm getting "EOF reached before Python server acknowledged" while using PySpark on Mesos. The error manifests itself in multiple ways. One is:

14/05/08 18:10:40 ERROR DAGSchedulerActorSupervisor: eventProcesserActor failed due to the error EOF reached before Python server acknowledged; shutting down SparkContext

And the other has a full stacktrace:

14/05/08 18:03:06 ERROR OneForOneStrategy: EOF reached before Python server acknowledged
org.apache.spark.SparkException: EOF reached before Python server acknowledged
    at org.apache.spark.api.python.PythonAccumulatorParam.addInPlace(PythonRDD.scala:416)
    at org.apache.spark.api.python.PythonAccumulatorParam.addInPlace(PythonRDD.scala:387)
    at org.apache.spark.Accumulable.$plus$plus$eq(Accumulators.scala:71)
    at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:279)
    at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:277)
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
    at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
    at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
    at org.apache.spark.Accumulators$.add(Accumulators.scala:277)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:818)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1204)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
    at akka.actor.ActorCell.invoke(ActorCell.scala:456)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
    at akka.dispatch.Mailbox.run(Mailbox.scala:219)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

This error causes the SparkContext to shut down. I have not been able to reliably reproduce this bug; it seems to happen randomly, but if you run enough tasks on a SparkContext it'll happen eventually.
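The failure mode in the trace is a handshake problem: the JVM side writes accumulator updates to a socket and then blocks waiting for an acknowledgement byte from the Python side; if the peer closes the connection before writing it, read() returns -1 (EOF) and the caller raises the error. The sketch below is a toy reproduction of that pattern over localhost sockets; the one-byte handshake is a hypothetical stand-in for PySpark's actual accumulator wire format:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.ServerSocket;
import java.net.Socket;

public class AckEofDemo {
    // Returns true only if the peer sent at least one acknowledgement byte;
    // read() returning -1 means the stream hit EOF before any ack arrived.
    static boolean ackReceived(InputStream in) throws IOException {
        return in.read() != -1;
    }

    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(0)) {
            // A "server" that dies before acknowledging: it accepts the
            // connection and closes it without writing anything.
            Thread peer = new Thread(() -> {
                try { server.accept().close(); } catch (Exception ignored) {}
            });
            peer.start();

            try (Socket client = new Socket("127.0.0.1", server.getLocalPort())) {
                if (!ackReceived(client.getInputStream())) {
                    System.out.println("EOF reached before server acknowledged");
                }
            }
            peer.join();
        }
    }
}
```

Any crash of the peer process between accepting the connection and writing the ack byte produces this EOF, which would be consistent with the bug appearing randomly under load.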