[jira] [Updated] (SPARK-12413) Mesos ZK persistence throws a
[ https://issues.apache.org/jira/browse/SPARK-12413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Gummelt updated SPARK-12413:
    Summary: Mesos ZK persistence throws a   (was: Mesos ZK persistence is broken)

> Mesos ZK persistence throws a
> -----------------------------
>
>                 Key: SPARK-12413
>                 URL: https://issues.apache.org/jira/browse/SPARK-12413
>             Project: Spark
>          Issue Type: Bug
>          Components: Mesos
>    Affects Versions: 1.6.0
>            Reporter: Michael Gummelt
>
> See: https://github.com/apache/spark/pull/10359#discussion_r47929981

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-12413) Mesos ZK persistence is broken
Michael Gummelt created SPARK-12413:
---------------------------------------

             Summary: Mesos ZK persistence is broken
                 Key: SPARK-12413
                 URL: https://issues.apache.org/jira/browse/SPARK-12413
             Project: Spark
          Issue Type: Bug
          Components: Mesos
    Affects Versions: 1.6.0
            Reporter: Michael Gummelt

See: https://github.com/apache/spark/pull/10359#discussion_r47929981
[jira] [Updated] (SPARK-12413) Mesos ZK persistence throws a NotSerializableException
[ https://issues.apache.org/jira/browse/SPARK-12413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Gummelt updated SPARK-12413:
    Summary: Mesos ZK persistence throws a NotSerializableException   (was: Mesos ZK persistence throws a )
[jira] [Updated] (SPARK-12413) Mesos ZK persistence throws a NotSerializableException
[ https://issues.apache.org/jira/browse/SPARK-12413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Gummelt updated SPARK-12413:
    Description:
        This breaks ZK persistence due to https://issues.scala-lang.org/browse/SI-6654

        This line throws a NotSerializable exception:
        https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster
[jira] [Updated] (SPARK-12413) Mesos ZK persistence throws a NotSerializableException
[ https://issues.apache.org/jira/browse/SPARK-12413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Gummelt updated SPARK-12413:
    Description:
        https://github.com/apache/spark/pull/10359 breaks ZK persistence due to https://issues.scala-lang.org/browse/SI-6654

        This line throws a NotSerializable exception:
        https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster
[jira] [Commented] (SPARK-12413) Mesos ZK persistence throws a NotSerializableException
[ https://issues.apache.org/jira/browse/SPARK-12413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063005#comment-15063005 ]

Michael Gummelt commented on SPARK-12413:
-----------------------------------------

Updated.  Thanks

> Mesos ZK persistence throws a NotSerializableException
> ------------------------------------------------------
>
>                 Key: SPARK-12413
>                 URL: https://issues.apache.org/jira/browse/SPARK-12413
>             Project: Spark
>          Issue Type: Bug
>          Components: Mesos
>    Affects Versions: 1.6.0
>            Reporter: Michael Gummelt
>
> https://github.com/apache/spark/pull/10359 breaks ZK persistence due to
> https://issues.scala-lang.org/browse/SI-6654
>
> This line throws a NotSerializable exception:
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster
>
> The MesosClusterDispatcher attempts to serialize MesosDriverDescription
> objects to ZK, but https://github.com/apache/spark/pull/10359 makes it so the
> {{command}} property is unserializable
>
> Offer id: 72f4d1ce-67f7-41b0-95a3-aa6fb208df32-O189, cpu: 3.0, mem: 12995.0
> 15/12/17 21:52:44 DEBUG ClientCnxn: Got ping response for sessionid: 0x151b1d1567e0002 after 0ms
> 15/12/17 21:52:44 DEBUG nio: created SCEP@2e746d70{l(/10.0.6.166:41456)<->r(/10.0.0.240:17386),s=0,open=true,ishut=false,oshut=false,rb=false,wb=false,w=true,i=0}-{AsyncHttpConnection@5dbcebe3,g=HttpGenerator{s=0,h=-1,b=-1,c=-1},p=HttpParser{s=-14,l=0,c=0},r=0}
> 15/12/17 21:52:44 DEBUG HttpParser: filled 1591/1591
> 15/12/17 21:52:44 DEBUG Server: REQUEST /v1/submissions/create on AsyncHttpConnection@5dbcebe3,g=HttpGenerator{s=0,h=-1,b=-1,c=-1},p=HttpParser{s=2,l=2,c=1174},r=1
> 15/12/17 21:52:44 DEBUG ContextHandler: scope null||/v1/submissions/create @ o.s.j.s.ServletContextHandler{/,null}
> 15/12/17 21:52:44 DEBUG ContextHandler: context=||/v1/submissions/create @ o.s.j.s.ServletContextHandler{/,null}
> 15/12/17 21:52:44 DEBUG ServletHandler: servlet |/v1/submissions/create|null -> org.apache.spark.deploy.rest.mesos.MesosSubmitRequestServlet-368e091
> 15/12/17 21:52:44 DEBUG ServletHandler: chain=null
> 15/12/17 21:52:44 WARN ServletHandler: /v1/submissions/create
> java.io.NotSerializableException: scala.collection.immutable.MapLike$$anon$1
>     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
>     at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
>     at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
>     at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
>     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
>     at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
>     at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
>     at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
>     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
>     at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
>     at org.apache.spark.util.Utils$.serialize(Utils.scala:83)
>     at org.apache.spark.scheduler.cluster.mesos.ZookeeperMesosClusterPersistenceEngine.persist(MesosClusterPersistenceEngine.scala:110)
>     at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.submitDriver(MesosClusterScheduler.scala:166)
>     at org.apache.spark.deploy.rest.mesos.MesosSubmitRequestServlet.handleSubmit(MesosRestServer.scala:132)
>     at org.apache.spark.deploy.rest.SubmitRequestServlet.doPost(RestSubmissionServer.scala:258)
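The {{scala.collection.immutable.MapLike$$anon$1}} in the trace above is the lazy wrapper that Scala's {{Map.mapValues}} returns, which does not implement {{Serializable}} (the SI-6654 bug). As a rough illustration of the same failure mode, here is a minimal, hypothetical Java sketch (class names are illustrative, not Spark's): a {{Serializable}} holder, standing in for {{MesosDriverDescription}}, fails to serialize when the runtime class of its map field is not itself {{Serializable}}, even though the field's declared type is just {{Map}}.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class NotSerializableDemo {
    // Stand-in for MesosDriverDescription: Serializable itself, but the runtime
    // class of its `command` map decides whether serialization succeeds.
    static class DriverDescription implements Serializable {
        final Map<String, String> command;
        DriverDescription(Map<String, String> command) { this.command = command; }
    }

    // A Map wrapper that, like scala.collection.immutable.MapLike$$anon$1,
    // does NOT implement Serializable.
    static class UnserializableView extends AbstractMap<String, String> {
        private final Map<String, String> underlying;
        UnserializableView(Map<String, String> underlying) { this.underlying = underlying; }
        @Override public Set<Map.Entry<String, String>> entrySet() { return underlying.entrySet(); }
    }

    // Attempt Java serialization; report whether it succeeded.
    static boolean serializes(Object o) {
        try (ObjectOutputStream oos = new ObjectOutputStream(new ByteArrayOutputStream())) {
            oos.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> env = new HashMap<>();
        env.put("SPARK_ENV_LOADED", "1");
        System.out.println(serializes(new DriverDescription(env)));                         // true
        System.out.println(serializes(new DriverDescription(new UnserializableView(env)))); // false
    }
}
```

In Scala the usual workaround is to force a strict map before serializing, e.g. {{m.map { case (k, v) => k -> f(v) }}} instead of {{m.mapValues(f)}}.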
[jira] [Updated] (SPARK-12413) Mesos ZK persistence throws a NotSerializableException
[ https://issues.apache.org/jira/browse/SPARK-12413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Gummelt updated SPARK-12413:
    Description:
        https://github.com/apache/spark/pull/10359 breaks ZK persistence due to https://issues.scala-lang.org/browse/SI-6654

        This line throws a NotSerializable exception:
        https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster

        The MesosClusterDispatcher attempts to serialize MesosDriverDescription objects to ZK, but https://github.com/apache/spark/pull/10359 makes it so the {{command}} property is unserializable
[jira] [Updated] (SPARK-12413) Mesos ZK persistence throws a NotSerializableException
[ https://issues.apache.org/jira/browse/SPARK-12413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Gummelt updated SPARK-12413:
    Description:
        This breaks ZK persistence due to https://issues.scala-lang.org/browse/SI-6654

        This line throws a NotSerializable exception:
        https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L166

    (was: See: https://github.com/apache/spark/pull/10359#discussion_r47929981)
[jira] [Commented] (SPARK-16194) No way to dynamically set env vars on driver in cluster mode
[ https://issues.apache.org/jira/browse/SPARK-16194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15348650#comment-15348650 ]

Michael Gummelt commented on SPARK-16194:
-----------------------------------------

Ah, yea, that's what I need.  I'd like to make this standard.
[jira] [Commented] (SPARK-16194) No way to dynamically set env vars on driver in cluster mode
[ https://issues.apache.org/jira/browse/SPARK-16194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15348645#comment-15348645 ]

Michael Gummelt commented on SPARK-16194:
-----------------------------------------

> Env variables are pretty much from outside Spark right?

They're my own env vars, yea.  The motivating case is setting "SSL_ENABLED" on the driver to enable mesos SSL support.

> Generally, these are being removed and deprecated anyway.

You mean the Spark env vars like SPARK_SUBMIT_OPTS?  That's good to hear, but that's not what I'm talking about.

> Any chance of just using a sys property or command line alternative?

libmesos ultimately needs SSL_ENABLED, so every spark job I submit would have to convert from the sys property to the env var, which is infeasible.  I realize this may be a corner case, but it would bring us to consistency with spark.executorEnv.[ENV]
[jira] [Created] (SPARK-16194) No way to dynamically set env vars on driver in cluster mode
Michael Gummelt created SPARK-16194: --- Summary: No way to dynamically set env vars on driver in cluster mode Key: SPARK-16194 URL: https://issues.apache.org/jira/browse/SPARK-16194 Project: Spark Issue Type: Bug Affects Versions: 2.0.0 Reporter: Michael Gummelt I often need to dynamically configure a driver when submitting in cluster mode, but there's currently no way of setting env vars. {{spark-env.sh}} lets me set env vars, but I have to statically build that into my spark distribution. I need a solution for specifying them in {{spark-submit}}. Much like {{spark.executorEnv.[ENV]}}, but for drivers. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
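A sketch of what the requested feature might look like at the {{spark-submit}} command line, by analogy with {{spark.executorEnv.[ENV]}}. The property name {{spark.mesos.driverEnv.*}} below is an assumption for illustration, not an existing Spark setting at the time of this report; the master URL and jar are taken from examples elsewhere in this thread.

```shell
# Hypothetical sketch: set an env var on the driver at submit time, the way
# spark.executorEnv.[ENV] already does for executors. The property name
# spark.mesos.driverEnv.* is an assumption, not a released Spark setting.
./bin/spark-submit \
  --deploy-mode cluster \
  --master mesos://10.0.78.140:7077 \
  --conf spark.mesos.driverEnv.SSL_ENABLED=true \
  --class org.apache.spark.examples.SparkPi \
  http://downloads.mesosphere.com.s3.amazonaws.com/assets/spark/spark-examples_2.10-1.5.0.jar
```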
[jira] [Created] (SPARK-13258) --conf variables not honored in Mesos cluster mode
Michael Gummelt created SPARK-13258: --- Summary: --conf variables not honored in Mesos cluster mode Key: SPARK-13258 URL: https://issues.apache.org/jira/browse/SPARK-13258 Project: Spark Issue Type: Bug Components: Mesos Affects Versions: 1.6.0 Reporter: Michael Gummelt Spark properties set via the deprecated {{SPARK_JAVA_OPTS}} are passed along to the driver, but those set via the preferred {{--conf}} are not. This results in the URI being fetched in the executor: {{SPARK_JAVA_OPTS="-Dspark.mesos.uris=https://raw.githubusercontent.com/mesosphere/spark/master/README.md -Dspark.mesos.executor.docker.image=mesosphere/spark:1.6.0" ./bin/spark-submit --deploy-mode cluster --master mesos://10.0.78.140:7077 --class org.apache.spark.examples.SparkPi http://downloads.mesosphere.com.s3.amazonaws.com/assets/spark/spark-examples_2.10-1.5.0.jar}} This does not: {{SPARK_JAVA_OPTS="-Dspark.mesos.executor.docker.image=mesosphere/spark:1.6.0" ./bin/spark-submit --deploy-mode cluster --master mesos://10.0.78.140:7077 --conf spark.mesos.uris=https://raw.githubusercontent.com/mesosphere/spark/master/README.md --class org.apache.spark.examples.SparkPi http://downloads.mesosphere.com.s3.amazonaws.com/assets/spark/spark-examples_2.10-1.5.0.jar}} https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L369 In the above line of code, you can see that SPARK_JAVA_OPTS is passed along to the driver, so those properties take effect. https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L373 Whereas in this line of code, you see that {{--conf}} variables are set on {{SPARK_EXECUTOR_OPTS}}, which AFAICT has absolutely no effect because this env var is being set on the driver, not the executor. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-13258) --conf properties not honored in Mesos cluster mode
[ https://issues.apache.org/jira/browse/SPARK-13258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Gummelt updated SPARK-13258: Summary: --conf properties not honored in Mesos cluster mode (was: --conf variables not honored in Mesos cluster mode) > --conf properties not honored in Mesos cluster mode > --- > > Key: SPARK-13258 > URL: https://issues.apache.org/jira/browse/SPARK-13258 > Project: Spark > Issue Type: Bug > Components: Mesos >Affects Versions: 1.6.0 >Reporter: Michael Gummelt > > Spark properties set via the deprecated {{SPARK_JAVA_OPTS}} are passed along > to the driver, but those set via the preferred {{--conf}} are not. > This results in the URI being fetched in the executor: > {{SPARK_JAVA_OPTS="-Dspark.mesos.uris=https://raw.githubusercontent.com/mesosphere/spark/master/README.md > -Dspark.mesos.executor.docker.image=mesosphere/spark:1.6.0" > ./bin/spark-submit --deploy-mode cluster --master mesos://10.0.78.140:7077 > --class org.apache.spark.examples.SparkPi > http://downloads.mesosphere.com.s3.amazonaws.com/assets/spark/spark-examples_2.10-1.5.0.jar}} > This does not: > {{SPARK_JAVA_OPTS="-Dspark.mesos.executor.docker.image=mesosphere/spark:1.6.0" > ./bin/spark-submit --deploy-mode cluster --master mesos://10.0.78.140:7077 > --conf > spark.mesos.uris=https://raw.githubusercontent.com/mesosphere/spark/master/README.md > --class org.apache.spark.examples.SparkPi > http://downloads.mesosphere.com.s3.amazonaws.com/assets/spark/spark-examples_2.10-1.5.0.jar}} > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L369 > In the above line of code, you can see that SPARK_JAVA_OPTS is passed along > to the driver, so those properties take effect. 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L373 > Whereas in this line of code, you see that {{--conf}} variables are set on > {{SPARK_EXECUTOR_OPTS}}, which AFAICT has absolutely no effect because this > env var is being set on the driver, not the executor. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-13258) --conf properties not honored in Mesos cluster mode
[ https://issues.apache.org/jira/browse/SPARK-13258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Gummelt updated SPARK-13258: Description: Spark properties set on {{spark-submit}} via the deprecated {{SPARK_JAVA_OPTS}} are passed along to the driver, but those set via the preferred {{--conf}} are not. For example, this results in the URI being fetched in the executor: {{SPARK_JAVA_OPTS="-Dspark.mesos.uris=https://raw.githubusercontent.com/mesosphere/spark/master/README.md -Dspark.mesos.executor.docker.image=mesosphere/spark:1.6.0" ./bin/spark-submit --deploy-mode cluster --master mesos://10.0.78.140:7077 --class org.apache.spark.examples.SparkPi http://downloads.mesosphere.com.s3.amazonaws.com/assets/spark/spark-examples_2.10-1.5.0.jar}} This does not: {{SPARK_JAVA_OPTS="-Dspark.mesos.executor.docker.image=mesosphere/spark:1.6.0" ./bin/spark-submit --deploy-mode cluster --master mesos://10.0.78.140:7077 --conf spark.mesos.uris=https://raw.githubusercontent.com/mesosphere/spark/master/README.md --class org.apache.spark.examples.SparkPi http://downloads.mesosphere.com.s3.amazonaws.com/assets/spark/spark-examples_2.10-1.5.0.jar}} https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L369 In the above line of code, you can see that SPARK_JAVA_OPTS is passed along to the driver, so those properties take effect. https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L373 Whereas in this line of code, you see that {{--conf}} variables are set on {{SPARK_EXECUTOR_OPTS}}, which AFAICT has absolutely no effect because this env var is being set on the driver, not the executor. was: Spark properties set via the deprecated {{SPARK_JAVA_OPTS}} are passed along to the driver, but those set via the preferred {{--conf}} are not. 
This results in the URI being fetched in the executor: {{SPARK_JAVA_OPTS="-Dspark.mesos.uris=https://raw.githubusercontent.com/mesosphere/spark/master/README.md -Dspark.mesos.executor.docker.image=mesosphere/spark:1.6.0" ./bin/spark-submit --deploy-mode cluster --master mesos://10.0.78.140:7077 --class org.apache.spark.examples.SparkPi http://downloads.mesosphere.com.s3.amazonaws.com/assets/spark/spark-examples_2.10-1.5.0.jar}} This does not: {{SPARK_JAVA_OPTS="-Dspark.mesos.executor.docker.image=mesosphere/spark:1.6.0" ./bin/spark-submit --deploy-mode cluster --master mesos://10.0.78.140:7077 --conf spark.mesos.uris=https://raw.githubusercontent.com/mesosphere/spark/master/README.md --class org.apache.spark.examples.SparkPi http://downloads.mesosphere.com.s3.amazonaws.com/assets/spark/spark-examples_2.10-1.5.0.jar}} https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L369 In the above line of code, you can see that SPARK_JAVA_OPTS is passed along to the driver, so those properties take effect. https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L373 Whereas in this line of code, you see that {{--conf}} variables are set on {{SPARK_EXECUTOR_OPTS}}, which AFAICT has absolutely no effect because this env var is being set on the driver, not the executor. > --conf properties not honored in Mesos cluster mode > --- > > Key: SPARK-13258 > URL: https://issues.apache.org/jira/browse/SPARK-13258 > Project: Spark > Issue Type: Bug > Components: Mesos >Affects Versions: 1.6.0 >Reporter: Michael Gummelt > > Spark properties set on {{spark-submit}} via the deprecated > {{SPARK_JAVA_OPTS}} are passed along to the driver, but those set via the > preferred {{--conf}} are not. 
> For example, this results in the URI being fetched in the executor: > {{SPARK_JAVA_OPTS="-Dspark.mesos.uris=https://raw.githubusercontent.com/mesosphere/spark/master/README.md > -Dspark.mesos.executor.docker.image=mesosphere/spark:1.6.0" > ./bin/spark-submit --deploy-mode cluster --master mesos://10.0.78.140:7077 > --class org.apache.spark.examples.SparkPi > http://downloads.mesosphere.com.s3.amazonaws.com/assets/spark/spark-examples_2.10-1.5.0.jar}} > This does not: > {{SPARK_JAVA_OPTS="-Dspark.mesos.executor.docker.image=mesosphere/spark:1.6.0" > ./bin/spark-submit --deploy-mode cluster --master mesos://10.0.78.140:7077 > --conf > spark.mesos.uris=https://raw.githubusercontent.com/mesosphere/spark/master/README.md > --class org.apache.spark.examples.SparkPi > http://downloads.mesosphere.com.s3.amazonaws.com/assets/spark/spark-examples_2.10-1.5.0.jar}} >
[jira] [Created] (SPARK-13259) SPARK_HOME should not be used as the CWD in docker executors
Michael Gummelt created SPARK-13259: --- Summary: SPARK_HOME should not be used as the CWD in docker executors Key: SPARK-13259 URL: https://issues.apache.org/jira/browse/SPARK-13259 Project: Spark Issue Type: Bug Components: Mesos Affects Versions: 1.6.0 Reporter: Michael Gummelt Priority: Minor I have a docker image that explicitly sets WORKDIR. However, I also have to set spark.mesos.executor.home when submitting in client mode, otherwise the cwd is set to the driver's SPARK_HOME. The driver's SPARK_HOME should never be used in docker executors, since the container has its own file system and that path may not exist inside it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
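The workaround described above can be sketched as a client-mode submission that sets {{spark.mesos.executor.home}} explicitly so the driver's SPARK_HOME never leaks into the container. The image name, install path, and jar below are illustrative assumptions, not values from this report.

```shell
# Workaround sketch: point the executor home at a path that exists inside
# the docker image, instead of inheriting the driver's SPARK_HOME.
# mesosphere/spark:1.6.0, /opt/spark, and the jar name are illustrative.
./bin/spark-submit \
  --master mesos://10.0.78.140:7077 \
  --conf spark.mesos.executor.docker.image=mesosphere/spark:1.6.0 \
  --conf spark.mesos.executor.home=/opt/spark \
  --class org.apache.spark.examples.SparkPi \
  spark-examples.jar
```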
[jira] [Updated] (SPARK-13439) Document that spark.mesos.uris is comma-separated
[ https://issues.apache.org/jira/browse/SPARK-13439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Gummelt updated SPARK-13439: Description: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerUtils.scala#L346 > Document that spark.mesos.uris is comma-separated > - > > Key: SPARK-13439 > URL: https://issues.apache.org/jira/browse/SPARK-13439 > Project: Spark > Issue Type: Documentation > Components: Mesos >Reporter: Michael Gummelt > > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerUtils.scala#L346 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13439) Document that spark.mesos.uris is comma-separated
Michael Gummelt created SPARK-13439: --- Summary: Document that spark.mesos.uris is comma-separated Key: SPARK-13439 URL: https://issues.apache.org/jira/browse/SPARK-13439 Project: Spark Issue Type: Documentation Components: Mesos Reporter: Michael Gummelt -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
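Per the linked line of MesosSchedulerUtils, the property value is split on commas, so the documentation being requested would describe usage like the following. The URIs and jar name are illustrative placeholders.

```shell
# spark.mesos.uris takes a comma-separated list of URIs; each is fetched
# into the Mesos sandbox before the task starts. URIs here are placeholders.
./bin/spark-submit \
  --master mesos://10.0.78.140:7077 \
  --conf spark.mesos.uris=https://example.com/conf.tgz,https://example.com/data.csv \
  --class org.apache.spark.examples.SparkPi \
  spark-examples.jar
```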
[jira] [Updated] (SPARK-14180) Deadlock in CoarseGrainedExecutorBackend Shutdown
[ https://issues.apache.org/jira/browse/SPARK-14180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Gummelt updated SPARK-14180: Description: I'm fairly certain that https://github.com/apache/spark/pull/11031 introduced a deadlock in executor shutdown. The result is executor shutdown hangs indefinitely. In Mesos at least, this lasts until {{spark.mesos.coarse.shutdownTimeout}} (default 10s), at which point the driver stops, which force kills the executors. The deadlock is as follows: - CoarseGrainedExecutorBackend receives a Shutdown message, which now blocks on rpcEnv.awaitTermination() https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkEnv.scala#L95 - rpcEnv.awaitTermination() blocks on dispatcher.awaitTermination(), which blocks until all dispatcher threads (MessageLoop threads) terminate - However, the initial Shutdown message handling is itself handled by a Dispatcher MessageLoop thread. This mutual dependence results in a deadlock. https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rpc/netty/Dispatcher.scala#L216 was: I'm fairly certain that https://github.com/apache/spark/pull/11031 introduced a deadlock in executor shutdown. The result is executor shutdown hangs indefinitely. In Mesos at least, this lasts until {{spark.mesos.coarse.shutdownTimeout}} (default 10s), at which point the driver stops, which force kills the executors. The deadlock is as follows: - CoarseGrainedExecutorBackend receives a Shutdown message, which now blocks on rpcEnv.awaitTermination() https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkEnv.scala#L95 - rpcEnv.awaitTermination() blocks on dispatcher.awaitTermination(), which blocks until all dispatcher threads (MessageLoop threads) terminate - However, the initial Shutdown message handling is itself handled by a Dispatcher MessageLoop thread. This mutual dependence results in a deadlock. 
> Deadlock in CoarseGrainedExecutorBackend Shutdown > - > > Key: SPARK-14180 > URL: https://issues.apache.org/jira/browse/SPARK-14180 > Project: Spark > Issue Type: Bug >Affects Versions: 2.0.0 > Environment: master branch. commit > d6dc12ef0146ae409834c78737c116050961f350 >Reporter: Michael Gummelt >Priority: Blocker > > I'm fairly certain that https://github.com/apache/spark/pull/11031 introduced > a deadlock in executor shutdown. The result is executor shutdown hangs > indefinitely. In Mesos at least, this lasts until > {{spark.mesos.coarse.shutdownTimeout}} (default 10s), at which point the > driver stops, which force kills the executors. > The deadlock is as follows: > - CoarseGrainedExecutorBackend receives a Shutdown message, which now blocks > on rpcEnv.awaitTermination() > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkEnv.scala#L95 > - rpcEnv.awaitTermination() blocks on dispatcher.awaitTermination(), which > blocks until all dispatcher threads (MessageLoop threads) terminate > - However, the initial Shutdown message handling is itself handled by a > Dispatcher MessageLoop thread. This mutual dependence results in a deadlock. > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rpc/netty/Dispatcher.scala#L216 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-14180) Deadlock in CoarseGrainedExecutorBackend Shutdown
Michael Gummelt created SPARK-14180: --- Summary: Deadlock in CoarseGrainedExecutorBackend Shutdown Key: SPARK-14180 URL: https://issues.apache.org/jira/browse/SPARK-14180 Project: Spark Issue Type: Bug Affects Versions: 2.0.0 Environment: master branch. commit d6dc12ef0146ae409834c78737c116050961f350 Reporter: Michael Gummelt Priority: Blocker I'm fairly certain that https://github.com/apache/spark/pull/11031 introduced a deadlock in executor shutdown. The result is executor shutdown hangs indefinitely. In Mesos at least, this lasts until {{spark.mesos.coarse.shutdownTimeout}} (default 10s), at which point the driver stops, which force kills the executors. The deadlock is as follows: - CoarseGrainedExecutorBackend receives a Shutdown message, which now blocks on rpcEnv.awaitTermination() https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkEnv.scala#L95 - rpcEnv.awaitTermination() blocks on dispatcher.awaitTermination(), which blocks until all dispatcher threads (MessageLoop threads) terminate - However, the initial Shutdown message handling is itself handled by a Dispatcher MessageLoop thread. This mutual dependence results in a deadlock. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14180) Deadlock in CoarseGrainedExecutorBackend Shutdown
[ https://issues.apache.org/jira/browse/SPARK-14180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15213184#comment-15213184 ] Michael Gummelt commented on SPARK-14180: - cc [~zsxwing] > Deadlock in CoarseGrainedExecutorBackend Shutdown > - > > Key: SPARK-14180 > URL: https://issues.apache.org/jira/browse/SPARK-14180 > Project: Spark > Issue Type: Bug >Affects Versions: 2.0.0 > Environment: master branch. commit > d6dc12ef0146ae409834c78737c116050961f350 >Reporter: Michael Gummelt >Priority: Blocker > > I'm fairly certain that https://github.com/apache/spark/pull/11031 introduced > a deadlock in executor shutdown. The result is executor shutdown hangs > indefinitely. In Mesos at least, this lasts until > {{spark.mesos.coarse.shutdownTimeout}} (default 10s), at which point the > driver stops, which force kills the executors. > The deadlock is as follows: > - CoarseGrainedExecutorBackend receives a Shutdown message, which now blocks > on rpcEnv.awaitTermination() > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkEnv.scala#L95 > - rpcEnv.awaitTermination() blocks on dispatcher.awaitTermination(), which > blocks until all dispatcher threads (MessageLoop threads) terminate > - However, the initial Shutdown message handling is itself handled by a > Dispatcher MessageLoop thread. This mutual dependence results in a deadlock. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13258) --conf properties not honored in Mesos cluster mode
[ https://issues.apache.org/jira/browse/SPARK-13258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224690#comment-15224690 ] Michael Gummelt commented on SPARK-13258: - [~jayv] Does your PR fix this problem? > --conf properties not honored in Mesos cluster mode > --- > > Key: SPARK-13258 > URL: https://issues.apache.org/jira/browse/SPARK-13258 > Project: Spark > Issue Type: Bug > Components: Mesos >Affects Versions: 1.6.0 >Reporter: Michael Gummelt > > Spark properties set on {{spark-submit}} via the deprecated > {{SPARK_JAVA_OPTS}} are passed along to the driver, but those set via the > preferred {{--conf}} are not. > For example, this results in the URI being fetched in the executor: > {{SPARK_JAVA_OPTS="-Dspark.mesos.uris=https://raw.githubusercontent.com/mesosphere/spark/master/README.md > -Dspark.mesos.executor.docker.image=mesosphere/spark:1.6.0" > ./bin/spark-submit --deploy-mode cluster --master mesos://10.0.78.140:7077 > --class org.apache.spark.examples.SparkPi > http://downloads.mesosphere.com.s3.amazonaws.com/assets/spark/spark-examples_2.10-1.5.0.jar}} > This does not: > {{SPARK_JAVA_OPTS="-Dspark.mesos.executor.docker.image=mesosphere/spark:1.6.0" > ./bin/spark-submit --deploy-mode cluster --master mesos://10.0.78.140:7077 > --conf > spark.mesos.uris=https://raw.githubusercontent.com/mesosphere/spark/master/README.md > --class org.apache.spark.examples.SparkPi > http://downloads.mesosphere.com.s3.amazonaws.com/assets/spark/spark-examples_2.10-1.5.0.jar}} > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L369 > In the above line of code, you can see that SPARK_JAVA_OPTS is passed along > to the driver, so those properties take effect. 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L373 > Whereas in this line of code, you see that {{--conf}} variables are set on > {{SPARK_EXECUTOR_OPTS}}, which AFAICT has absolutely no effect because this > env var is being set on the driver, not the executor. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14180) Deadlock in CoarseGrainedExecutorBackend Shutdown
[ https://issues.apache.org/jira/browse/SPARK-14180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Gummelt updated SPARK-14180: Affects Version/s: (was: 2.0.0) > Deadlock in CoarseGrainedExecutorBackend Shutdown > - > > Key: SPARK-14180 > URL: https://issues.apache.org/jira/browse/SPARK-14180 > Project: Spark > Issue Type: Bug > Environment: master branch. commit > d6dc12ef0146ae409834c78737c116050961f350 >Reporter: Michael Gummelt >Priority: Blocker > Fix For: 2.0.0 > > > I'm fairly certain that https://github.com/apache/spark/pull/11031 introduced > a deadlock in executor shutdown. The result is executor shutdown hangs > indefinitely. In Mesos at least, this lasts until > {{spark.mesos.coarse.shutdownTimeout}} (default 10s), at which point the > driver stops, which force kills the executors. > The deadlock is as follows: > - CoarseGrainedExecutorBackend receives a Shutdown message, which now blocks > on rpcEnv.awaitTermination() > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkEnv.scala#L95 > - rpcEnv.awaitTermination() blocks on dispatcher.awaitTermination(), which > blocks until all dispatcher threads (MessageLoop threads) terminate > - However, the initial Shutdown message handling is itself handled by a > Dispatcher MessageLoop thread. This mutual dependence results in a deadlock. > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rpc/netty/Dispatcher.scala#L216 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14180) Deadlock in CoarseGrainedExecutorBackend Shutdown
[ https://issues.apache.org/jira/browse/SPARK-14180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Gummelt updated SPARK-14180: Fix Version/s: 2.0.0 > Deadlock in CoarseGrainedExecutorBackend Shutdown > - > > Key: SPARK-14180 > URL: https://issues.apache.org/jira/browse/SPARK-14180 > Project: Spark > Issue Type: Bug > Environment: master branch. commit > d6dc12ef0146ae409834c78737c116050961f350 >Reporter: Michael Gummelt >Priority: Blocker > Fix For: 2.0.0 > > > I'm fairly certain that https://github.com/apache/spark/pull/11031 introduced > a deadlock in executor shutdown. The result is executor shutdown hangs > indefinitely. In Mesos at least, this lasts until > {{spark.mesos.coarse.shutdownTimeout}} (default 10s), at which point the > driver stops, which force kills the executors. > The deadlock is as follows: > - CoarseGrainedExecutorBackend receives a Shutdown message, which now blocks > on rpcEnv.awaitTermination() > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkEnv.scala#L95 > - rpcEnv.awaitTermination() blocks on dispatcher.awaitTermination(), which > blocks until all dispatcher threads (MessageLoop threads) terminate > - However, the initial Shutdown message handling is itself handled by a > Dispatcher MessageLoop thread. This mutual dependence results in a deadlock. > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rpc/netty/Dispatcher.scala#L216 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-14822) Add lazy executor startup to Mesos Scheduler
Michael Gummelt created SPARK-14822: --- Summary: Add lazy executor startup to Mesos Scheduler Key: SPARK-14822 URL: https://issues.apache.org/jira/browse/SPARK-14822 Project: Spark Issue Type: Task Components: Mesos Reporter: Michael Gummelt As we deprecate fine-grained mode, we need to make sure we have alternative solutions for its benefits. Its two benefits are: 0. lazy executor startup In fine-grained mode, executors are brought up only as tasks are scheduled. This means that a user doesn't have to set {{spark.cores.max}} to ensure that the app doesn't consume all resources in the cluster. 1. relinquishing cores As Spark tasks terminate, the mesos task it was bound to terminates as well, thus relinquishing the cores assigned to it. I'd like to add {{0.}} to coarse-grained mode, possibly enabled with a configuration param. If https://issues.apache.org/jira/browse/MESOS-1279 ever happens, we can add {{1.}} as well. cc [~tnachen] [~dragos] [~skonto] [~andrewor14] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
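Until something like benefit {{0.}} lands in coarse-grained mode, the usual way to keep an app from consuming the whole cluster is to cap it by hand with {{spark.cores.max}}, as noted above. A minimal sketch (cap value and jar name are illustrative):

```shell
# Without lazy executor startup, spark.cores.max must be set manually so a
# coarse-grained app doesn't accept every offered core. 4 is illustrative.
./bin/spark-submit \
  --master mesos://10.0.78.140:7077 \
  --conf spark.cores.max=4 \
  --class org.apache.spark.examples.SparkPi \
  spark-examples.jar
```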
[jira] [Commented] (SPARK-14977) Fine grained mode in Mesos is not fair
[ https://issues.apache.org/jira/browse/SPARK-14977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15263393#comment-15263393 ] Michael Gummelt commented on SPARK-14977: - I assume your first two jobs are long running? Mesos doesn't offer resources to the third app, because there are no more resources to offer. They've already been offered to the first two apps. We're looking into support for revocable resources to solve this problem. You can also partition your cluster via roles if you'd like certain jobs to have guaranteed resources. > Fine grained mode in Mesos is not fair > -- > > Key: SPARK-14977 > URL: https://issues.apache.org/jira/browse/SPARK-14977 > Project: Spark > Issue Type: Bug > Components: Mesos >Affects Versions: 2.1.0 > Environment: Spark commit db75ccb, Debian jessie, Mesos fine grained >Reporter: Luca Bruno > > I've setup a mesos cluster and I'm running spark in fine grained mode. > Spark defaults to 2 executor cores and 2gb of ram. > The total mesos cluster has 8 cores and 8gb of ram. > When I submit two spark jobs simultaneously, spark will always accept full > resources, leading the two frameworks to use 4gb of ram each instead of 2gb. > If I submit another spark job, it will not get offered resources from mesos, > at least using the default HierarchicalDRF allocator module. > Mesos will keep offering 4gb of ram to earlier spark jobs, and spark keeps > accepting full resources for every new task. > Hence new spark jobs have no chance of getting a share. > Is this something to be solved with a custom mesos allocator? Or spark should > be more fair instead? Or maybe provide a configuration option to always > accept with the minimum resources? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10643) Support HDFS application download in client mode spark submit
[ https://issues.apache.org/jira/browse/SPARK-10643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15279318#comment-15279318 ] Michael Gummelt commented on SPARK-10643: - +1 to fix this. > Support HDFS application download in client mode spark submit > - > > Key: SPARK-10643 > URL: https://issues.apache.org/jira/browse/SPARK-10643 > Project: Spark > Issue Type: New Feature > Components: Spark Submit >Reporter: Alan Braithwaite >Priority: Minor > > When using mesos with docker and marathon, it would be nice to be able to > make spark-submit deployable on marathon and have that download a jar from > HDFS instead of having to package the jar with the docker. > {code} > $ docker run -it docker.example.com/spark:latest > /usr/local/spark/bin/spark-submit --class > com.example.spark.streaming.EventHandler hdfs://hdfs/tmp/application.jar > Warning: Skip remote jar hdfs://hdfs/tmp/application.jar. > java.lang.ClassNotFoundException: com.example.spark.streaming.EventHandler > at java.net.URLClassLoader.findClass(URLClassLoader.java:381) > at java.lang.ClassLoader.loadClass(ClassLoader.java:424) > at java.lang.ClassLoader.loadClass(ClassLoader.java:357) > at java.lang.Class.forName0(Native Method) > at java.lang.Class.forName(Class.java:348) > at org.apache.spark.util.Utils$.classForName(Utils.scala:173) > at > org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:639) > at > org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180) > at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > {code} > Although I'm aware that we can run in cluster mode with mesos, we've already > built some nice tools surrounding marathon for logging and monitoring. 
> Code in question: > https://github.com/apache/spark/blob/132718ad7f387e1002b708b19e471d9cd907e105/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L723-L736 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15271) Allow force pulling executor docker images
[ https://issues.apache.org/jira/browse/SPARK-15271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15281838#comment-15281838 ] Michael Gummelt commented on SPARK-15271: - Much needed, thanks. > Allow force pulling executor docker images > -- > > Key: SPARK-15271 > URL: https://issues.apache.org/jira/browse/SPARK-15271 > Project: Spark > Issue Type: New Feature > Components: Mesos >Affects Versions: 1.6.1 >Reporter: Philipp Hoffmann > > Mesos agents by default will not pull docker images which are cached locally > already. > Because of this, in order to run a mutable tag (like {{...:latest}}) from the > current version on the docker repository you have to explicitly tell the > Mesos agent to pull the image (force pull). Otherwise the Mesos agent will > run an old (cached version). > The feature for force pulling the image was introduced in Mesos 0.22: > https://github.com/apache/mesos/commit/8682569df528717ff5efb64da26b1b49c39c4efd > This ticket is about making use of this feature in Spark in order to force > Mesos agents to pull the executors docker image. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
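If the feature lands as a boolean property alongside the existing {{spark.mesos.executor.docker.*}} settings, usage might look like the sketch below. The property name {{forcePullImage}} and the image name are assumptions about how the feature could be exposed, not settings that exist at the time of this comment.

```shell
# Hypothetical sketch: force the Mesos agent to re-pull a mutable tag
# (e.g. :latest) on every launch instead of running a stale cached image.
# The forcePullImage property name is an assumption, not a released setting.
./bin/spark-submit \
  --master mesos://10.0.78.140:7077 \
  --conf spark.mesos.executor.docker.image=myrepo/spark:latest \
  --conf spark.mesos.executor.docker.forcePullImage=true \
  --class org.apache.spark.examples.SparkPi \
  spark-examples.jar
```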
[jira] [Commented] (SPARK-14977) Fine grained mode in Mesos is not fair
[ https://issues.apache.org/jira/browse/SPARK-14977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15273202#comment-15273202 ] Michael Gummelt commented on SPARK-14977: - [~lethalman]: Fine-grained mode only releases cores, not memory. It's impossible for us to shrink the memory allocation without OOM-ing the executor, because the JVM doesn't relinquish memory back to the OS. You can use dynamic allocation to terminate entire executors as they become idle. Also, FYI, fine-grained mode will soon be deprecated in favor of dynamic allocation. > Fine grained mode in Mesos is not fair > -- > > Key: SPARK-14977 > URL: https://issues.apache.org/jira/browse/SPARK-14977 > Project: Spark > Issue Type: Bug > Components: Mesos >Affects Versions: 2.1.0 > Environment: Spark commit db75ccb, Debian jessie, Mesos fine grained >Reporter: Luca Bruno > > I've setup a mesos cluster and I'm running spark in fine grained mode. > Spark defaults to 2 executor cores and 2gb of ram. > The total mesos cluster has 8 cores and 8gb of ram. > When I submit two spark jobs simultaneously, spark will always accept full > resources, leading the two frameworks to use 4gb of ram each instead of 2gb. > If I submit another spark job, it will not get offered resources from mesos, > at least using the default HierarchicalDRF allocator module. > Mesos will keep offering 4gb of ram to earlier spark jobs, and spark keeps > accepting full resources for every new task. > Hence new spark jobs have no chance of getting a share. > Is this something to be solved with a custom mesos allocator? Or spark should > be more fair instead? Or maybe provide a configuration option to always > accept with the minimum resources? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
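The dynamic-allocation route suggested in the comment above is enabled with a pair of standard Spark properties; it also requires the external shuffle service to be running on each agent. The idle timeout and jar name below are illustrative values, not recommendations from this thread.

```shell
# Sketch: dynamic allocation tears down idle executors, releasing both
# their cores and their memory back to Mesos. Requires the external
# shuffle service on each agent. The 60s timeout is illustrative.
./bin/spark-submit \
  --master mesos://10.0.78.140:7077 \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.executorIdleTimeout=60s \
  --class org.apache.spark.examples.SparkPi \
  spark-examples.jar
```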
[jira] [Closed] (SPARK-14977) Fine grained mode in Mesos is not fair
[ https://issues.apache.org/jira/browse/SPARK-14977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Gummelt closed SPARK-14977. --- Resolution: Not A Problem
[jira] [Commented] (SPARK-15142) Spark Mesos dispatcher becomes unusable when the Mesos master restarts
[ https://issues.apache.org/jira/browse/SPARK-15142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15274445#comment-15274445 ] Michael Gummelt commented on SPARK-15142: - I can't understand this sentence. Can you reword this? > Spark Mesos dispatcher becomes unusable when the Mesos master restarts > -- > > Key: SPARK-15142 > URL: https://issues.apache.org/jira/browse/SPARK-15142 > Project: Spark > Issue Type: Bug > Components: Deploy, Mesos >Reporter: Devaraj K >Priority: Minor > > While Spark Mesos dispatcher running if the Mesos master gets restarted then > Spark Mesos dispatcher will keep running and queues up all the submitted > applications and will not launch them.
[jira] [Commented] (SPARK-15142) Spark Mesos dispatcher becomes unusable when the Mesos master restarts
[ https://issues.apache.org/jira/browse/SPARK-15142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15274447#comment-15274447 ] Michael Gummelt commented on SPARK-15142: - Can you include the dispatcher logs? Does restarting the dispatcher fix the problem?
[jira] [Commented] (SPARK-15155) Optionally ignore default role resources
[ https://issues.apache.org/jira/browse/SPARK-15155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15274455#comment-15274455 ] Michael Gummelt commented on SPARK-15155: - Why do you want to avoid launching on the default role? The default role represents resources available to all frameworks. If you don't want certain frameworks to launch tasks on default role resources, you should reserve those resources on a different role. > Optionally ignore default role resources > > > Key: SPARK-15155 > URL: https://issues.apache.org/jira/browse/SPARK-15155 > Project: Spark > Issue Type: Improvement > Components: Mesos >Affects Versions: 1.5.0, 1.6.0 >Reporter: Chris Heller > > SPARK-6284 added support for Mesos roles, but the framework will still accept > resources from both the reserved role specified in {{spark.mesos.role}} and > the default role {{*}}. > I'd like to propose the addition of a new boolean property: > {{spark.mesos.ignoreDefaultRoleResources}}. When this property is set Spark > will only accept resources from the role passed in the {{spark.mesos.role}} > property. If {{spark.mesos.role}} has not been set, > {{spark.mesos.ignoreDefaultRoleResources}} has no effect.
[jira] [Commented] (SPARK-15155) Optionally ignore default role resources
[ https://issues.apache.org/jira/browse/SPARK-15155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15274491#comment-15274491 ] Michael Gummelt commented on SPARK-15155: - Yes, I understand the effect, but not the motivation. Why don't you want to launch Spark tasks on the default role?
[jira] [Commented] (SPARK-15155) Optionally ignore default role resources
[ https://issues.apache.org/jira/browse/SPARK-15155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15274560#comment-15274560 ] Michael Gummelt commented on SPARK-15155: - Why not create a separate role for your ad-hoc work? We'll eventually solve this more efficiently with support for revocable resources: http://mesos.apache.org/documentation/latest/oversubscription/
[jira] [Commented] (SPARK-15155) Optionally ignore default role resources
[ https://issues.apache.org/jira/browse/SPARK-15155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15274682#comment-15274682 ] Michael Gummelt commented on SPARK-15155: - Just have a single role for your batch jobs if you want them to have guaranteed resources. Or just ensure that the streaming jobs have spark.cores.max set appropriately, and launch everything in the default role. If this doesn't work for some reason, and you still have issues, please frame the problem as "If I do X, then I will run into problem Y", because I'm having trouble understanding your problem.
[jira] [Commented] (SPARK-15155) Optionally ignore default role resources
[ https://issues.apache.org/jira/browse/SPARK-15155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15274505#comment-15274505 ] Michael Gummelt commented on SPARK-15155: - I'm still missing the "why". What is the downside of having a job launch tasks on the default role?
[jira] [Commented] (SPARK-15155) Optionally ignore default role resources
[ https://issues.apache.org/jira/browse/SPARK-15155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15274576#comment-15274576 ] Michael Gummelt commented on SPARK-15155: - > they will again take resources from the default role Your stated problem was that your ad-hoc jobs were starved. This solves that problem. So now I don't understand the problem. Your long running apps have taken all the default resources, but if you have resources reserved for your ad-hoc jobs, they will never be starved.
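A sketch of the reservation-based setup suggested in this thread (the role name {{adhoc}} and the core cap are illustrative placeholders, not values from the ticket): reserve resources under a dedicated role on the Mesos side, then point the ad-hoc jobs at that role and bound the long-running jobs:

{code}
# ad-hoc jobs: consume the reserved role (plus any unreserved resources)
spark.mesos.role  adhoc

# long-running jobs: stay in the default role, but bounded
spark.cores.max   4
{code}

The reservation itself is made through Mesos (e.g. static reservations via the agent's --resources flag, or the reservation API), not through Spark configuration.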
[jira] [Created] (SPARK-16742) Kerberos support for Spark on Mesos
Michael Gummelt created SPARK-16742: --- Summary: Kerberos support for Spark on Mesos Key: SPARK-16742 URL: https://issues.apache.org/jira/browse/SPARK-16742 Project: Spark Issue Type: New Feature Components: Mesos Reporter: Michael Gummelt We at Mesosphere have written Kerberos support for Spark on Mesos. We'll be contributing it to Apache Spark soon.
[jira] [Commented] (SPARK-16522) [MESOS] Spark application throws exception on exit
[ https://issues.apache.org/jira/browse/SPARK-16522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15377353#comment-15377353 ] Michael Gummelt commented on SPARK-16522: - [~srowen] I'm going to look into this now and resolve it today. Can you hold off on the next 2.0 RC until this is resolved? It's likely a major bug. > [MESOS] Spark application throws exception on exit > -- > > Key: SPARK-16522 > URL: https://issues.apache.org/jira/browse/SPARK-16522 > Project: Spark > Issue Type: Bug > Components: Mesos >Affects Versions: 2.0.0 >Reporter: Sun Rui > > Spark applications running on Mesos throw exception upon exit as follows: > {panel} > 16/07/13 15:20:46 WARN NettyRpcEndpointRef: Error sending message [message = > RemoveExecutor(1,Executor finished with state FINISHED)] in 3 attempts > org.apache.spark.SparkException: Exception thrown in awaitResult > at > org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75) > at > scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) > at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167) > at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83) > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102) > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78) > at > org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:412) > at > org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.executorTerminated(MesosCoarseGrainedSchedulerBackend.scala:555) > at > 
org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.statusUpdate(MesosCoarseGrainedSchedulerBackend.scala:495) > Caused by: org.apache.spark.SparkException: Could not find > CoarseGrainedScheduler. > at > org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:152) > at > org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:127) > at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:225) > at > org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:508) > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101) > ... 4 more > Exception in thread "Thread-47" org.apache.spark.SparkException: Error > notifying standalone scheduler's driver endpoint > at > org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:415) > at > org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.executorTerminated(MesosCoarseGrainedSchedulerBackend.scala:555) > at > org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.statusUpdate(MesosCoarseGrainedSchedulerBackend.scala:495) > Caused by: org.apache.spark.SparkException: Error sending message [message = > RemoveExecutor(1,Executor finished with state FINISHED)] > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:119) > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78) > at > org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:412) > ... 
2 more > Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult > at > org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75) > at > scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) > at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167) > at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83) > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102) > ... 4 more > Caused by: org.apache.spark.SparkException: Could not find > CoarseGrainedScheduler. > at > org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:152) > at > org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:127) > at
[jira] [Commented] (SPARK-16522) [MESOS] Spark application throws exception on exit
[ https://issues.apache.org/jira/browse/SPARK-16522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15378178#comment-15378178 ] Michael Gummelt commented on SPARK-16522: - I don't think so. Please give me a couple hours to investigate further, though.
[jira] [Updated] (SPARK-16687) build/mvn fails when fetching mvn
[ https://issues.apache.org/jira/browse/SPARK-16687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Gummelt updated SPARK-16687: Description: mvn 3.3.3 no longer exists in the apache.org mirror used by `build/mvn` {code} ./build/mvn --force exec: curl --progress-bar -L https://www.apache.org/dyn/closer.lua?action=download&filename=/maven/maven-3/3.3.3/binaries/apache-maven-3.3.3-bin.tar.gz 100.0% gzip: stdin: not in gzip format tar: Child returned status 1 tar: Error is not recoverable: exiting now Using `mvn` from path: /home/mgummelt/code/spark/build/apache-maven-3.3.3/bin/mvn ./build/mvn: line 152: /home/mgummelt/code/spark/build/apache-maven-3.3.3/bin/mvn: No such file or directory {code} After changing MVN_VERSION from "3.3.3" to "3.3.9": {code} ./build/mvn --force exec: curl --progress-bar -L https://www.apache.org/dyn/closer.lua?action=download&filename=/maven/maven-3/3.3.9/binaries/apache-maven-3.3.9-bin.tar.gz 100.0% Using `mvn` from path: /home/mgummelt/code/spark/build/apache-maven-3.3.9/bin/mvn OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0 {code}
[jira] [Created] (SPARK-16687) build/mvn fails when fetching mvn
Michael Gummelt created SPARK-16687: --- Summary: build/mvn fails when fetching mvn Key: SPARK-16687 URL: https://issues.apache.org/jira/browse/SPARK-16687 Project: Spark Issue Type: Bug Components: Build Affects Versions: 1.6.2 Reporter: Michael Gummelt mvn 3.3.3 no longer exists in the apache.org mirror used by `build/mvn` {code} mgummelt@mg-mesos:~/code/spark$ ./build/mvn --force exec: curl --progress-bar -L https://www.apache.org/dyn/closer.lua?action=download&filename=/maven/maven-3/3.3.3/binaries/apache-maven-3.3.3-bin.tar.gz 100.0% gzip: stdin: not in gzip format tar: Child returned status 1 tar: Error is not recoverable: exiting now Using `mvn` from path: /home/mgummelt/code/spark/build/apache-maven-3.3.3/bin/mvn ./build/mvn: line 152: /home/mgummelt/code/spark/build/apache-maven-3.3.3/bin/mvn: No such file or directory {code} After changing MVN_VERSION from "3.3.3" to "3.3.9": {code} ./build/mvn --force exec: curl --progress-bar -L https://www.apache.org/dyn/closer.lua?action=download&filename=/maven/maven-3/3.3.9/binaries/apache-maven-3.3.9-bin.tar.gz 100.0% Using `mvn` from path: /home/mgummelt/code/spark/build/apache-maven-3.3.9/bin/mvn OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0 {code}
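The fix described above is a one-line version bump in build/mvn. A minimal sketch of the edit, shown against a stub file since the real script path depends on your checkout:

```shell
# Stand-in for build/mvn; the real edit targets that script in a Spark checkout.
printf 'MVN_VERSION="3.3.3"\n' > /tmp/mvn-stub

# Bump the pinned Maven version to one the Apache mirror still hosts.
sed -i 's/MVN_VERSION="3.3.3"/MVN_VERSION="3.3.9"/' /tmp/mvn-stub

grep MVN_VERSION /tmp/mvn-stub   # prints: MVN_VERSION="3.3.9"
```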
[jira] [Commented] (SPARK-16687) build/mvn fails when fetching mvn
[ https://issues.apache.org/jira/browse/SPARK-16687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15390014#comment-15390014 ] Michael Gummelt commented on SPARK-16687: - thanks!
[jira] [Commented] (SPARK-16450) Build fails for Mesos 0.28.x
[ https://issues.apache.org/jira/browse/SPARK-16450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15396236#comment-15396236 ] Michael Gummelt commented on SPARK-16450: - I'll update soon. Though this pending PR updates to 0.28: https://github.com/apache/spark/pull/14275 > Build fails for Mesos 0.28.x > - > > Key: SPARK-16450 > URL: https://issues.apache.org/jira/browse/SPARK-16450 > Project: Spark > Issue Type: Bug > Components: Mesos >Affects Versions: 2.0.0 > Environment: Mesos 0.28.0 >Reporter: Niels Becker > > Build fails: > [error] > /usr/local/spark/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerUtils.scala:82: > type mismatch; > [error] found : org.apache.mesos.protobuf.ByteString > [error] required: String > [error] credBuilder.setSecret(ByteString.copyFromUtf8(secret)) > Build cmd: > dev/make-distribution.sh --tgz -Phadoop-2.7 -Dhadoop.version=2.7.0 -Phive > -DskipTests -Dmesos.version=0.28.0 -Djava.version=1.8 > Spark Version: 2.0.0-rc2 > Java: OpenJDK Runtime Environment (build 1.8.0_91-8u91-b14-1~bpo8+1-b14 > Scala Version: 2.11.8 > Same error for mesos.version=0.28.2
[jira] [Closed] (SPARK-16783) make-distri
[ https://issues.apache.org/jira/browse/SPARK-16783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Gummelt closed SPARK-16783. --- Resolution: Not A Problem > make-distri > --- > > Key: SPARK-16783 > URL: https://issues.apache.org/jira/browse/SPARK-16783 > Project: Spark > Issue Type: Bug >Reporter: Michael Gummelt >
[jira] [Created] (SPARK-16784) Configurable log4j settings
Michael Gummelt created SPARK-16784: --- Summary: Configurable log4j settings Key: SPARK-16784 URL: https://issues.apache.org/jira/browse/SPARK-16784 Project: Spark Issue Type: Improvement Affects Versions: 2.0.0 Reporter: Michael Gummelt I often want to change the logging configuration on a single spark job. This is easy in client mode. I just modify log4j.properties. It's difficult in cluster mode, because I need to modify the log4j.properties in the distribution in which the driver runs. I'd like a way of setting this dynamically, such as a java system property. Some brief searching showed that log4j doesn't seem to accept such a property, but I'd like to open up this idea for further comment. Maybe we can find a solution.
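For reference, log4j 1.x does read the {{log4j.configuration}} system property (a URL pointing at a properties file), so one partial workaround is to pass it via the extra-Java-options settings. The caveat, and the gap this ticket describes, is that in cluster mode the file must already exist at that path on whichever machine the driver lands on. The paths below are placeholders:

{code}
spark.driver.extraJavaOptions    -Dlog4j.configuration=file:/path/to/custom-log4j.properties
spark.executor.extraJavaOptions  -Dlog4j.configuration=file:/path/to/custom-log4j.properties
{code}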
[jira] [Created] (SPARK-16783) make-distri
Michael Gummelt created SPARK-16783: --- Summary: make-distri Key: SPARK-16783 URL: https://issues.apache.org/jira/browse/SPARK-16783 Project: Spark Issue Type: Bug Reporter: Michael Gummelt
[jira] [Created] (SPARK-16808) History Server main page does not honor APPLICATION_WEB_PROXY_BASE
Michael Gummelt created SPARK-16808: --- Summary: History Server main page does not honor APPLICATION_WEB_PROXY_BASE Key: SPARK-16808 URL: https://issues.apache.org/jira/browse/SPARK-16808 Project: Spark Issue Type: Bug Affects Versions: 2.0.0 Reporter: Michael Gummelt The root of the history server is rendered dynamically with javascript, and this doesn't honor APPLICATION_WEB_PROXY_BASE: https://github.com/apache/spark/blob/master/core/src/main/resources/org/apache/spark/ui/static/historypage-template.html#L67 Other links in the history server do honor it: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ui/UIUtils.scala#L146 This means the links on the history server root page are broken when deployed behind a proxy. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
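The server-side links honor the proxy base because they resolve a root prefix before constructing each URL, while the dynamically rendered root page skips that step. A minimal sketch of the prefixing idea, in Java for illustration — the method names and URL shape here are hypothetical, not Spark's actual code:

```java
public class UiRootSketch {
    /** Resolve the proxy base; null or empty means no proxy in front. */
    public static String uiRoot(String proxyBase) {
        if (proxyBase == null || proxyBase.isEmpty()) {
            return "";
        }
        // Drop trailing slashes so concatenation below yields a single "/".
        return proxyBase.replaceAll("/+$", "");
    }

    /** Build a history-server application link honoring the proxy base. */
    public static String historyLink(String proxyBase, String appId) {
        return uiRoot(proxyBase) + "/history/" + appId;
    }
}
```

The bug is essentially that the client-side template builds links without the `uiRoot`-style prefix, so everything 404s behind a reverse proxy.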
[jira] [Created] (SPARK-16809) Link Mesos Dispatcher and History Server
Michael Gummelt created SPARK-16809: --- Summary: Link Mesos Dispatcher and History Server Key: SPARK-16809 URL: https://issues.apache.org/jira/browse/SPARK-16809 Project: Spark Issue Type: New Feature Components: Mesos Reporter: Michael Gummelt This somewhat duplicates SPARK-13401, but the PR for that JIRA seems to implement only sandbox linking, not history server linking, which is the sole scope of this JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-16784) Configurable log4j settings
[ https://issues.apache.org/jira/browse/SPARK-16784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15417857#comment-15417857 ] Michael Gummelt edited comment on SPARK-16784 at 8/11/16 8:11 PM: -- {{log4j.debug=true}} only results in log4j printing its internal debugging messages (e.g. config file location, appenders, etc.). It doesn't turn on debug logging for the application. was (Author: mgummelt): {{log4j.debug=true}} only results in log4j printing its debugging messages. It doesn't turn on debug logging for the application. > Configurable log4j settings > --- > > Key: SPARK-16784 > URL: https://issues.apache.org/jira/browse/SPARK-16784 > Project: Spark > Issue Type: Improvement >Affects Versions: 2.0.0 >Reporter: Michael Gummelt > > I often want to change the logging configuration on a single spark job. This > is easy in client mode. I just modify log4j.properties. It's difficult in > cluster mode, because I need to modify the log4j.properties in the > distribution in which the driver runs. I'd like a way of setting this > dynamically, such as a java system property. Some brief searching showed > that log4j doesn't seem to accept such a property, but I'd like to open up > this idea for further comment. Maybe we can find a solution. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-16784) Configurable log4j settings
[ https://issues.apache.org/jira/browse/SPARK-16784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Gummelt reopened SPARK-16784: - `log4j.debug=true` only results in log4j printing its debugging messages. It doesn't turn on debug logging for the application. > Configurable log4j settings > --- > > Key: SPARK-16784 > URL: https://issues.apache.org/jira/browse/SPARK-16784 > Project: Spark > Issue Type: Improvement >Affects Versions: 2.0.0 >Reporter: Michael Gummelt > > I often want to change the logging configuration on a single spark job. This > is easy in client mode. I just modify log4j.properties. It's difficult in > cluster mode, because I need to modify the log4j.properties in the > distribution in which the driver runs. I'd like a way of setting this > dynamically, such as a java system property. Some brief searching showed > that log4j doesn't seem to accept such a property, but I'd like to open up > this idea for further comment. Maybe we can find a solution. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16784) Configurable log4j settings
[ https://issues.apache.org/jira/browse/SPARK-16784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15417856#comment-15417856 ] Michael Gummelt commented on SPARK-16784: - `log4j.debug=true` only results in log4j printing its debugging messages. It doesn't turn on debug logging for the application. > Configurable log4j settings > --- > > Key: SPARK-16784 > URL: https://issues.apache.org/jira/browse/SPARK-16784 > Project: Spark > Issue Type: Improvement >Affects Versions: 2.0.0 >Reporter: Michael Gummelt > > I often want to change the logging configuration on a single spark job. This > is easy in client mode. I just modify log4j.properties. It's difficult in > cluster mode, because I need to modify the log4j.properties in the > distribution in which the driver runs. I'd like a way of setting this > dynamically, such as a java system property. Some brief searching showed > that log4j doesn't seem to accept such a property, but I'd like to open up > this idea for further comment. Maybe we can find a solution. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-16784) Configurable log4j settings
[ https://issues.apache.org/jira/browse/SPARK-16784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15417857#comment-15417857 ] Michael Gummelt edited comment on SPARK-16784 at 8/11/16 8:10 PM: -- {{log4j.debug=true}} only results in log4j printing its debugging messages. It doesn't turn on debug logging for the application. was (Author: mgummelt): `log4j.debug=true` only results in log4j printing its debugging messages. It doesn't turn on debug logging for the application. > Configurable log4j settings > --- > > Key: SPARK-16784 > URL: https://issues.apache.org/jira/browse/SPARK-16784 > Project: Spark > Issue Type: Improvement >Affects Versions: 2.0.0 >Reporter: Michael Gummelt > > I often want to change the logging configuration on a single spark job. This > is easy in client mode. I just modify log4j.properties. It's difficult in > cluster mode, because I need to modify the log4j.properties in the > distribution in which the driver runs. I'd like a way of setting this > dynamically, such as a java system property. Some brief searching showed > that log4j doesn't seem to accept such a property, but I'd like to open up > this idea for further comment. Maybe we can find a solution. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-16784) Configurable log4j settings
[ https://issues.apache.org/jira/browse/SPARK-16784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Gummelt updated SPARK-16784: Comment: was deleted (was: `log4j.debug=true` only results in log4j printing its debugging messages. It doesn't turn on debug logging for the application.) > Configurable log4j settings > --- > > Key: SPARK-16784 > URL: https://issues.apache.org/jira/browse/SPARK-16784 > Project: Spark > Issue Type: Improvement >Affects Versions: 2.0.0 >Reporter: Michael Gummelt > > I often want to change the logging configuration on a single spark job. This > is easy in client mode. I just modify log4j.properties. It's difficult in > cluster mode, because I need to modify the log4j.properties in the > distribution in which the driver runs. I'd like a way of setting this > dynamically, such as a java system property. Some brief searching showed > that log4j doesn't seem to accept such a property, but I'd like to open up > this idea for further comment. Maybe we can find a solution. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-16881) Migrate Mesos configs to use ConfigEntry
Michael Gummelt created SPARK-16881: --- Summary: Migrate Mesos configs to use ConfigEntry Key: SPARK-16881 URL: https://issues.apache.org/jira/browse/SPARK-16881 Project: Spark Issue Type: Task Components: Mesos Affects Versions: 2.0.0 Reporter: Michael Gummelt Priority: Minor https://github.com/apache/spark/pull/14414#discussion_r73032190 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-16881) Migrate Mesos configs to use ConfigEntry
[ https://issues.apache.org/jira/browse/SPARK-16881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Gummelt updated SPARK-16881: Description: https://github.com/apache/spark/pull/14414#discussion_r73032190 We'd like to migrate Mesos' use of config vars to the new ConfigEntry class so we can a) define all our configs in one place like YARN does, and b) make use of features like default handling and generics was:https://github.com/apache/spark/pull/14414#discussion_r73032190 > Migrate Mesos configs to use ConfigEntry > > > Key: SPARK-16881 > URL: https://issues.apache.org/jira/browse/SPARK-16881 > Project: Spark > Issue Type: Task > Components: Mesos >Affects Versions: 2.0.0 >Reporter: Michael Gummelt >Priority: Minor > > https://github.com/apache/spark/pull/14414#discussion_r73032190 > We'd like to migrate Mesos' use of config vars to the new ConfigEntry class > so we can a) define all our configs in one place like YARN does, and b) make > use of features like default handling and generics -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-17002) Document that spark.ssl.protocol. is required for SSL
Michael Gummelt created SPARK-17002: --- Summary: Document that spark.ssl.protocol. is required for SSL Key: SPARK-17002 URL: https://issues.apache.org/jira/browse/SPARK-17002 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 2.0.0, 1.6.2 Reporter: Michael Gummelt cc [~jlewandowski] I was trying to start the Spark master. When setting {{spark.ssl.enabled=true}}, but failing to set {{spark.ssl.protocol}}, I get this none-too-helpful error message: {code} 16/08/10 15:17:50 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(mgummelt); users with modify permissions: Set(mgummelt) 16/08/10 15:17:50 WARN SecurityManager: Using 'accept-all' trust manager for SSL connections. Exception in thread "main" java.security.KeyManagementException: Default SSLContext is initialized automatically at sun.security.ssl.SSLContextImpl$DefaultSSLContext.engineInit(SSLContextImpl.java:749) at javax.net.ssl.SSLContext.init(SSLContext.java:282) at org.apache.spark.SecurityManager.(SecurityManager.scala:284) at org.apache.spark.deploy.master.Master$.startRpcEnvAndEndpoint(Master.scala:1121) at org.apache.spark.deploy.master.Master$.main(Master.scala:1106) at org.apache.spark.deploy.master.Master.main(Master.scala) {code} We should document that {{spark.ssl.protocol}} is required, and throw a more helpful error message when it isn't set. In fact, we should remove the `getOrElse` here: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SecurityManager.scala#L285, since the following line fails when the protocol is set to "Default" -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
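The failure is reproducible outside Spark: JSSE returns the pre-initialized default context for the protocol name "Default", and calling {{init()}} on it always throws the {{KeyManagementException}} seen in the stack trace above, while any explicit protocol name yields a context that accepts {{init()}}. A minimal standalone sketch:

```java
import javax.net.ssl.SSLContext;
import java.security.KeyManagementException;

public class SslDefaultDemo {
    /** Returns true if an SSLContext for this protocol name accepts init(). */
    public static boolean canInit(String protocol) {
        try {
            SSLContext ctx = SSLContext.getInstance(protocol);
            ctx.init(null, null, null);
            return true;
        } catch (KeyManagementException e) {
            // "Default SSLContext is initialized automatically"
            return false;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Default: " + canInit("Default"));   // false
        System.out.println("TLSv1.2: " + canInit("TLSv1.2"));   // true
    }
}
```

This is why falling back to "Default" via the `getOrElse` cited above can never succeed on the subsequent `init` call; failing fast with a clear message when {{spark.ssl.protocol}} is unset would be the friendlier behavior.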
[jira] [Commented] (SPARK-16522) [MESOS] Spark application throws exception on exit
[ https://issues.apache.org/jira/browse/SPARK-16522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15376125#comment-15376125 ] Michael Gummelt commented on SPARK-16522: - I've seen some stack traces recently that might have been this. I'm trying to repro now. Will get back to you. Which commit/tag are you running? > [MESOS] Spark application throws exception on exit > -- > > Key: SPARK-16522 > URL: https://issues.apache.org/jira/browse/SPARK-16522 > Project: Spark > Issue Type: Bug > Components: Mesos >Affects Versions: 2.0.0 >Reporter: Sun Rui > > Spark applications running on Mesos throw exception upon exit as follows: > {panel} > 16/07/13 15:20:46 WARN NettyRpcEndpointRef: Error sending message [message = > RemoveExecutor(1,Executor finished with state FINISHED)] in 3 attempts > org.apache.spark.SparkException: Exception thrown in awaitResult > at > org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75) > at > scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) > at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167) > at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83) > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102) > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78) > at > org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:412) > at > org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.executorTerminated(MesosCoarseGrainedSchedulerBackend.scala:555) > at > 
org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.statusUpdate(MesosCoarseGrainedSchedulerBackend.scala:495) > Caused by: org.apache.spark.SparkException: Could not find > CoarseGrainedScheduler. > at > org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:152) > at > org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:127) > at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:225) > at > org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:508) > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101) > ... 4 more > Exception in thread "Thread-47" org.apache.spark.SparkException: Error > notifying standalone scheduler's driver endpoint > at > org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:415) > at > org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.executorTerminated(MesosCoarseGrainedSchedulerBackend.scala:555) > at > org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.statusUpdate(MesosCoarseGrainedSchedulerBackend.scala:495) > Caused by: org.apache.spark.SparkException: Error sending message [message = > RemoveExecutor(1,Executor finished with state FINISHED)] > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:119) > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78) > at > org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:412) > ... 
2 more > Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult > at > org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75) > at > scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) > at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167) > at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83) > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102) > ... 4 more > Caused by: org.apache.spark.SparkException: Could not find > CoarseGrainedScheduler. > at > org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:152) > at > org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:127) > at
[jira] [Commented] (SPARK-13258) --conf properties not honored in Mesos cluster mode
[ https://issues.apache.org/jira/browse/SPARK-13258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375797#comment-15375797 ] Michael Gummelt commented on SPARK-13258: - Nope, this is still a bug. > --conf properties not honored in Mesos cluster mode > --- > > Key: SPARK-13258 > URL: https://issues.apache.org/jira/browse/SPARK-13258 > Project: Spark > Issue Type: Bug > Components: Mesos >Affects Versions: 1.6.0 >Reporter: Michael Gummelt > > Spark properties set on {{spark-submit}} via the deprecated > {{SPARK_JAVA_OPTS}} are passed along to the driver, but those set via the > preferred {{--conf}} are not. > For example, this results in the URI being fetched in the executor: > {{SPARK_JAVA_OPTS="-Dspark.mesos.uris=https://raw.githubusercontent.com/mesosphere/spark/master/README.md > -Dspark.mesos.executor.docker.image=mesosphere/spark:1.6.0" > ./bin/spark-submit --deploy-mode cluster --master mesos://10.0.78.140:7077 > --class org.apache.spark.examples.SparkPi > http://downloads.mesosphere.com.s3.amazonaws.com/assets/spark/spark-examples_2.10-1.5.0.jar}} > This does not: > {{SPARK_JAVA_OPTS="-Dspark.mesos.executor.docker.image=mesosphere/spark:1.6.0" > ./bin/spark-submit --deploy-mode cluster --master mesos://10.0.78.140:7077 > --conf > spark.mesos.uris=https://raw.githubusercontent.com/mesosphere/spark/master/README.md > --class org.apache.spark.examples.SparkPi > http://downloads.mesosphere.com.s3.amazonaws.com/assets/spark/spark-examples_2.10-1.5.0.jar}} > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L369 > In the above line of code, you can see that SPARK_JAVA_OPTS is passed along > to the driver, so those properties take effect. 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L373 > Whereas in this line of code, you see that {{--conf}} variables are set on > {{SPARK_EXECUTOR_OPTS}}, which AFAICT has absolutely no effect because this > env var is being set on the driver, not the executor. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11183) enable support for mesos 0.24+
[ https://issues.apache.org/jira/browse/SPARK-11183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384520#comment-15384520 ] Michael Gummelt commented on SPARK-11183: - Hey All. I just saw this JIRA. Sorry for the delay. Once Mesos 1.0 is released (maybe this week), I'll update Spark to use the 1.0 bindings. The problem described in this JIRA isn't actually a bindings issue. It's a libmesos issue. If you update your libmesos to be a later version, it should go away. The bindings aren't technically guaranteed to be compatible with an arbitrary libmesos version, but I've never seen an issue. The long-term solution is to move the Spark scheduler over to the new Mesos HTTP API, so we no longer have to deal with libmesos. > enable support for mesos 0.24+ > -- > > Key: SPARK-11183 > URL: https://issues.apache.org/jira/browse/SPARK-11183 > Project: Spark > Issue Type: Bug > Components: Deploy, Mesos >Reporter: Ioannis Polyzos > > In Mesos 0.24, the Mesos leader info in ZK changed to JSON; this results in > Spark failing to run on 0.24+. > References : > https://issues.apache.org/jira/browse/MESOS-2340 > > http://mail-archives.apache.org/mod_mbox/mesos-commits/201506.mbox/%3ced4698dc56444bcdac3bdf19134db...@git.apache.org%3E > https://github.com/mesos/elasticsearch/issues/338 > https://github.com/spark-jobserver/spark-jobserver/issues/267 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-16627) --jars doesn't work in Mesos mode
Michael Gummelt created SPARK-16627: --- Summary: --jars doesn't work in Mesos mode Key: SPARK-16627 URL: https://issues.apache.org/jira/browse/SPARK-16627 Project: Spark Issue Type: Bug Components: Mesos Reporter: Michael Gummelt Definitely doesn't work in cluster mode. Might not work in client mode either. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-16637) Support Mesos Unified Containerizer
Michael Gummelt created SPARK-16637: --- Summary: Support Mesos Unified Containerizer Key: SPARK-16637 URL: https://issues.apache.org/jira/browse/SPARK-16637 Project: Spark Issue Type: Task Components: Mesos Affects Versions: 2.0.0 Reporter: Michael Gummelt Mesos is moving toward a single, unified containerizer that will run both Docker and non-Docker containers. We should add support. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16522) [MESOS] Spark application throws exception on exit
[ https://issues.apache.org/jira/browse/SPARK-16522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15380130#comment-15380130 ] Michael Gummelt commented on SPARK-16522: - This shouldn't affect functionality > [MESOS] Spark application throws exception on exit > -- > > Key: SPARK-16522 > URL: https://issues.apache.org/jira/browse/SPARK-16522 > Project: Spark > Issue Type: Bug > Components: Mesos >Affects Versions: 2.0.0 >Reporter: Sun Rui > > Spark applications running on Mesos throw exception upon exit as follows: > {noformat} > 16/07/13 15:20:46 WARN NettyRpcEndpointRef: Error sending message [message = > RemoveExecutor(1,Executor finished with state FINISHED)] in 3 attempts > org.apache.spark.SparkException: Exception thrown in awaitResult > at > org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75) > at > scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) > at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167) > at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83) > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102) > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78) > at > org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:412) > at > org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.executorTerminated(MesosCoarseGrainedSchedulerBackend.scala:555) > at > org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.statusUpdate(MesosCoarseGrainedSchedulerBackend.scala:495) > Caused by: org.apache.spark.SparkException: 
Could not find > CoarseGrainedScheduler. > at > org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:152) > at > org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:127) > at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:225) > at > org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:508) > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101) > ... 4 more > Exception in thread "Thread-47" org.apache.spark.SparkException: Error > notifying standalone scheduler's driver endpoint > at > org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:415) > at > org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.executorTerminated(MesosCoarseGrainedSchedulerBackend.scala:555) > at > org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.statusUpdate(MesosCoarseGrainedSchedulerBackend.scala:495) > Caused by: org.apache.spark.SparkException: Error sending message [message = > RemoveExecutor(1,Executor finished with state FINISHED)] > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:119) > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78) > at > org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:412) > ... 
2 more > Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult > at > org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75) > at > scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) > at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167) > at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83) > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102) > ... 4 more > Caused by: org.apache.spark.SparkException: Could not find > CoarseGrainedScheduler. > at > org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:152) > at > org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:127) > at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:225) > at >
[jira] [Commented] (SPARK-16450) Build fails for Mesos 0.28.x
[ https://issues.apache.org/jira/browse/SPARK-16450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371562#comment-15371562 ] Michael Gummelt commented on SPARK-16450: - Once Mesos 1.0 is released, I'll submit a PR to upgrade. The long-term solution is to use the HTTP API, so we no longer have to deal with libmesos, but that's a large change. > Build fails for Mesos 0.28.x > - > > Key: SPARK-16450 > URL: https://issues.apache.org/jira/browse/SPARK-16450 > Project: Spark > Issue Type: Bug > Components: Mesos >Affects Versions: 2.0.0 > Environment: Mesos 0.28.0 >Reporter: Niels Becker > > Build fails: > [error] > /usr/local/spark/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerUtils.scala:82: > type mismatch; > [error] found : org.apache.mesos.protobuf.ByteString > [error] required: String > [error] credBuilder.setSecret(ByteString.copyFromUtf8(secret)) > Build cmd: > dev/make-distribution.sh --tgz -Phadoop-2.7 -Dhadoop.version=2.7.0 -Phive > -DskipTests -Dmesos.version=0.28.0 -Djava.version=1.8 > Spark Version: 2.0.0-rc2 > Java: OpenJDK Runtime Environment (build 1.8.0_91-8u91-b14-1~bpo8+1-b14 > Scala Version: 2.11.8 > Same error for mesos.version=0.28.2 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16379) Spark on mesos is broken due to race condition in Logging
[ https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363013#comment-15363013 ] Michael Gummelt commented on SPARK-16379: - I traced back the addition of the `synchronized` block, and it seems Matei added it a long time ago. I can't prove that the method is thread-safe, so I'd rather not remove the synchronization block. So we can either: 1) Remove the log statements (I'd like to keep them) 2) Revert the `lazy` commit 3) Introduce an explicit lock, and synchronize on that rather than `this` 2) is the "correct" thing to do, since it's the author's responsibility to not break existing code, but I'm OK with 3) as well. [~srowen] what do you think? > Spark on mesos is broken due to race condition in Logging > - > > Key: SPARK-16379 > URL: https://issues.apache.org/jira/browse/SPARK-16379 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.0 >Reporter: Stavros Kontopoulos >Priority: Blocker > Attachments: out.txt > > > This commit introduced a transient lazy log val: > https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec > This has caused problems in the past: > https://github.com/apache/spark/pull/1004 > One commit before that everything works fine. > I spotted that when my CI started to fail: > https://ci.typesafe.com/job/mit-docker-test-ref/191/ > You can easily verify it by installing mesos on your machine and try to > connect with spark shell from bin dir: > ./spark-shell --master mesos://zk://localhost:2181/mesos --conf > spark.executor.url=$(pwd)/../spark-2.0.0-SNAPSHOT-bin-test.tgz > It gets stuck at the point where it tries to create the SparkContext. 
> Logging gets stuck here: > I0705 12:10:10.076617 9303 group.cpp:700] Trying to get > '/mesos/json.info_000152' in ZooKeeper > I0705 12:10:10.076920 9304 detector.cpp:479] A new leading master > (UPID=master@127.0.1.1:5050) is detected > I0705 12:10:10.076956 9303 sched.cpp:326] New master detected at > master@127.0.1.1:5050 > I0705 12:10:10.077057 9303 sched.cpp:336] No credentials provided. > Attempting to register without authentication > I0705 12:10:10.090709 9301 sched.cpp:703] Framework registered with > 13553f8b-f42c-4f20-88cd-16f1cc153ede-0001 > I verified it also by changing @transient lazy val log to def and it works as > expected. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
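Option 3 from the comment above — synchronizing lazy initialization on a dedicated lock rather than on {{this}} — can be sketched as follows. Java is used here for illustration and the names are not Spark's; the point is the double-checked pattern with a private lock, so a caller that already holds the instance monitor cannot deadlock against the initialization path:

```java
public class LoggingHolder {
    // Dedicated lock: callers that already hold this object's monitor
    // (e.g. inside a synchronized method) can no longer deadlock with
    // the lazy initialization below, unlike synchronized(this).
    private final Object logLock = new Object();
    private volatile String log;  // stand-in for a lazily created logger

    public String log() {
        String result = log;
        if (result == null) {            // first check, lock-free fast path
            synchronized (logLock) {     // explicit lock, not `this`
                if (log == null) {       // second check under the lock
                    log = "logger-initialized";
                }
                result = log;
            }
        }
        return result;
    }
}
```

The `volatile` field is what makes the unlocked first read safe; this is effectively what a Scala `lazy val` generates, except that the generated code locks the enclosing instance, which is the leak the comment calls out.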
[jira] [Comment Edited] (SPARK-16379) Spark on mesos is broken due to race condition in Logging
[ https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363289#comment-15363289 ] Michael Gummelt edited comment on SPARK-16379 at 7/5/16 9:34 PM: - I say we add a new lock to synchronize on and be done with it. The root of the issue is that deadlock detection is hard. The author of the breaking change added a critical region, and to do so safely, you have to ensure that all calling code paths haven't acquired the same lock, which is difficult (undecidable). The only process change I can imagine to fix the higher level issue is running some sort of deadlock detection tool in the Spark tests. I agree we shouldn't get rid of `lazy val` completely, but it is unfortunate that you can't use them in a `synchronized` block. It's a leaky abstraction. Seems to be addressed here: http://docs.scala-lang.org/sips/pending/improved-lazy-val-initialization.html | One other thing i hope it holds is no new commit should break the project even if it fixes something or reveals another issue etc. Well I do agree with Sean that it's on us to fix bugs revealed by external changes. was (Author: mgummelt): I say we add a new lock to synchronize on and be done with it. The root of the issue is that deadlock detection is hard. The author of the breaking change added a critical region, and to do so safely, you have to ensure that all calling code paths haven't acquired the same lock, which is difficult (undecidable). The only process change I can imagine to fix the higher level issue is running some sort of deadlock detection tool in the Spark tests. I agree we shouldn't get rid of `lazy val` completely, but it is unfortunate that you can't use them in a `synchronized` block. It's a leaky abstraction. 
Seems to be addressed here: http://docs.scala-lang.org/sips/pending/improved-lazy-val-initialization.html > One other thing i hope it holds is no new commit should break the project > even if it fixes something or reveals another issue etc. Well I do agree with Sean that it's on us to fix bugs revealed by external changes. > Spark on mesos is broken due to race condition in Logging > - > > Key: SPARK-16379 > URL: https://issues.apache.org/jira/browse/SPARK-16379 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.0 >Reporter: Stavros Kontopoulos >Priority: Blocker > Attachments: out.txt > > > This commit introduced a transient lazy log val: > https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec > This has caused problems in the past: > https://github.com/apache/spark/pull/1004 > One commit before that everything works fine. > I spotted that when my CI started to fail: > https://ci.typesafe.com/job/mit-docker-test-ref/191/ > You can easily verify it by installing mesos on your machine and try to > connect with spark shell from bin dir: > ./spark-shell --master mesos://zk://localhost:2181/mesos --conf > spark.executor.url=$(pwd)/../spark-2.0.0-SNAPSHOT-bin-test.tgz > It gets stuck at the point where it tries to create the SparkContext. > Logging gets stuck here: > I0705 12:10:10.076617 9303 group.cpp:700] Trying to get > '/mesos/json.info_000152' in ZooKeeper > I0705 12:10:10.076920 9304 detector.cpp:479] A new leading master > (UPID=master@127.0.1.1:5050) is detected > I0705 12:10:10.076956 9303 sched.cpp:326] New master detected at > master@127.0.1.1:5050 > I0705 12:10:10.077057 9303 sched.cpp:336] No credentials provided. > Attempting to register without authentication > I0705 12:10:10.090709 9301 sched.cpp:703] Framework registered with > 13553f8b-f42c-4f20-88cd-16f1cc153ede-0001 > I verified it also by changing @transient lazy val log to def and it works as > expected. 
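The "add a new lock to synchronize on" proposal above can be sketched roughly as below. All names here are hypothetical (Spark's real `Logging` trait holds an `org.slf4j.Logger`, for which a `String` stands in): the point is that initializing the field under a dedicated lock object, rather than via `lazy val`, means logger setup never contends for the enclosing instance's monitor, which callers may already hold.

```scala
// Rough sketch (hypothetical names) of double-checked initialization under
// a dedicated lock, instead of a `lazy val` whose initializer would take
// the enclosing instance's monitor.
trait SafeLogging {
  // @volatile makes the double-checked pattern safe under the JMM.
  @transient @volatile private var log_ : String = null
  // Dedicated lock object: never `this`, whose monitor callers may hold.
  private val logLock = new Object

  protected def log: String = {
    if (log_ == null) {
      logLock.synchronized {
        if (log_ == null) log_ = "logger:" + getClass.getName
      }
    }
    log_
  }
}
```

The design choice is simply that the lock protects only logger initialization, so no calling code path can already own it.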
[jira] [Commented] (SPARK-11857) Remove Mesos fine-grained mode subject to discussions
[ https://issues.apache.org/jira/browse/SPARK-11857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363367#comment-15363367 ] Michael Gummelt commented on SPARK-11857: - I'll give [~amcelwee] a couple days to respond. [~dragos] [~skonto] [~tnachen] speak now or forever hold your peace. > Remove Mesos fine-grained mode subject to discussions > - > > Key: SPARK-11857 > URL: https://issues.apache.org/jira/browse/SPARK-11857 > Project: Spark > Issue Type: Sub-task > Components: Mesos >Reporter: Reynold Xin >Assignee: Reynold Xin > > See discussions in > http://apache-spark-developers-list.1001551.n3.nabble.com/Removing-the-Mesos-fine-grained-mode-td15277.html > and > http://apache-spark-developers-list.1001551.n3.nabble.com/Please-reply-if-you-use-Mesos-fine-grained-mode-td14930.html
[jira] [Commented] (SPARK-16379) Spark on mesos is broken due to race condition in Logging
[ https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363256#comment-15363256 ] Michael Gummelt commented on SPARK-16379: - > The previous code also involved acquiring a lock Link? I don't see this. Or do you just mean the null check? https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec#diff-bfd5810d8aa78ad90150e806d830bb78L45
[jira] [Commented] (SPARK-11857) Remove Mesos fine-grained mode subject to discussions
[ https://issues.apache.org/jira/browse/SPARK-11857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363345#comment-15363345 ] Michael Gummelt commented on SPARK-11857: - For completeness, here's my theoretical analysis to augment our empirical observation that users don't mind fine-grained mode being removed. Fine-grained mode provides two benefits:
1) Slow-start: executors are brought up lazily
2) Relinquishing cores: cores are relinquished back to Mesos as Spark tasks terminate
Fine-grained mode does *not* provide the following benefits, though some think it does:
a) Relinquishing memory: the JVM doesn't relinquish memory, so it would be unsafe for us to resize the cgroup
b) Relinquishing executors
As for alternatives to the benefits, 1) is provided by dynamic allocation, though we need a better recommended setup for this as I document here: http://apache-spark-developers-list.1001551.n3.nabble.com/HDFS-as-Shuffle-Service-td17340.html There is no alternative to 2), but we've generally found that the executor-level granularity of dynamic allocation is sufficient for most.
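The dynamic-allocation alternative to benefit 1) mentioned above can be enabled with a few configuration keys. This is a config fragment (it requires a Spark installation and a running external shuffle service, and is not runnable standalone); the keys are standard Spark configuration properties:

```scala
// Config fragment: dynamic allocation as the replacement for fine-grained
// mode's lazy executor start-up. Executors are requested as tasks queue up
// and released again when idle.
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  // Dynamic allocation needs an external shuffle service so executors can
  // be removed without losing their shuffle files.
  .set("spark.shuffle.service.enabled", "true")
  .set("spark.dynamicAllocation.minExecutors", "0")
```

Note that this only approximates benefit 2): resources are returned at executor granularity rather than per Spark task.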
[jira] [Commented] (SPARK-11857) Remove Mesos fine-grained mode subject to discussions
[ https://issues.apache.org/jira/browse/SPARK-11857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363410#comment-15363410 ] Michael Gummelt commented on SPARK-11857: - Thanks!
[jira] [Commented] (SPARK-16379) Spark on mesos is broken due to race condition in Logging
[ https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363289#comment-15363289 ] Michael Gummelt commented on SPARK-16379: - I say we add a new lock to synchronize on and be done with it. The root of the issue is that deadlock detection is hard. The author of the breaking change added a critical region, and to do so safely, you have to ensure that all calling code paths haven't acquired the same lock, which is difficult (undecidable). The only process change I can imagine to fix the higher level issue is running some sort of deadlock detection tool in the Spark tests.
[jira] [Commented] (SPARK-16379) Spark on mesos is broken due to race condition in Logging
[ https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363290#comment-15363290 ] Michael Gummelt commented on SPARK-16379: - Hmmm, since that's a different lock, I don't see the possibility for deadlock in the previous code, but I'm content to relinquish the point. Concurrency is hard :)
[jira] [Commented] (SPARK-11857) Remove Mesos fine-grained mode subject to discussions
[ https://issues.apache.org/jira/browse/SPARK-11857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363311#comment-15363311 ] Michael Gummelt commented on SPARK-11857: - I endorse the deprecation. Fine-grained mode would be more useful if the JVM could shrink in memory as well as cores, but alas... We at Mesosphere haven't heard any objections from users regarding the loss of fine-grained. [~andrewor14] Please cc me if you need Mesos input. Tim is still active, I believe, but no longer at Mesosphere. I work (mostly) full-time on Spark on Mesos.
[jira] [Reopened] (SPARK-11857) Remove Mesos fine-grained mode subject to discussions
[ https://issues.apache.org/jira/browse/SPARK-11857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Gummelt reopened SPARK-11857: -
[jira] [Issue Comment Deleted] (SPARK-16379) Spark on mesos is broken due to race condition in Logging
[ https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Gummelt updated SPARK-16379: Comment: was deleted (was: > The previous code also involved acquiring a lock Link? I don't see this. Or do you just mean the null check? https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec#diff-bfd5810d8aa78ad90150e806d830bb78L45 )
[jira] [Comment Edited] (SPARK-16379) Spark on mesos is broken due to race condition in Logging
[ https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363258#comment-15363258 ] Michael Gummelt edited comment on SPARK-16379 at 7/5/16 9:15 PM: - > it's entirely possible that code has a bug that's only revealed when some > other legitimate change happens Of course, but I still don't see the bug that existed previously. Perhaps `synchronized` was unnecessary, but I still see no race condition nor deadlock in the previous code. Maybe following up on this will help: > The previous code also involved acquiring a lock Link? I don't see this. Or do you just mean the null check? https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec#diff-bfd5810d8aa78ad90150e806d830bb78L45 was (Author: mgummelt): > The previous code also involved acquiring a lock Link? I don't see this. Or do you just mean the null check? https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec#diff-bfd5810d8aa78ad90150e806d830bb78L45 > Spark on mesos is broken due to race condition in Logging > - > > Key: SPARK-16379 > URL: https://issues.apache.org/jira/browse/SPARK-16379 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.0 >Reporter: Stavros Kontopoulos >Priority: Blocker > Attachments: out.txt > > > This commit introduced a transient lazy log val: > https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec > This has caused problems in the past: > https://github.com/apache/spark/pull/1004 > One commit before that everything works fine. 
> I spotted that when my CI started to fail: > https://ci.typesafe.com/job/mit-docker-test-ref/191/ > You can easily verify it by installing mesos on your machine and try to > connect with spark shell from bin dir: > ./spark-shell --master mesos://zk://localhost:2181/mesos --conf > spark.executor.url=$(pwd)/../spark-2.0.0-SNAPSHOT-bin-test.tgz > It gets stuck at the point where it tries to create the SparkContext. > Logging gets stuck here: > I0705 12:10:10.076617 9303 group.cpp:700] Trying to get > '/mesos/json.info_000152' in ZooKeeper > I0705 12:10:10.076920 9304 detector.cpp:479] A new leading master > (UPID=master@127.0.1.1:5050) is detected > I0705 12:10:10.076956 9303 sched.cpp:326] New master detected at > master@127.0.1.1:5050 > I0705 12:10:10.077057 9303 sched.cpp:336] No credentials provided. > Attempting to register without authentication > I0705 12:10:10.090709 9301 sched.cpp:703] Framework registered with > 13553f8b-f42c-4f20-88cd-16f1cc153ede-0001 > I verified it also by changing @transient lazy val log to def and it works as > expected.
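The `@transient lazy val log` vs `def` distinction under discussion can be illustrated with a minimal sketch (this is not Spark's actual Logging trait; names are illustrative). A Scala `lazy val` is initialized at most once, under a lock the compiler takes on the enclosing instance, which is why a thread triggering that initialization while other locks are held can participate in a deadlock; a `def` performs no one-time initialization and takes no lock:

```scala
object LazyValVsDef {
  var lazyInitCount = 0
  var defCallCount = 0

  trait LoggingLazy {
    @transient lazy val log: String = {   // body runs once; result is cached under a lock
      LazyValVsDef.lazyInitCount += 1
      "logger-" + getClass.getName
    }
  }

  trait LoggingDef {
    def log: String = {                   // body runs on every call; no initialization lock
      LazyValVsDef.defCallCount += 1
      "logger-" + getClass.getName
    }
  }

  def main(args: Array[String]): Unit = {
    val a = new LoggingLazy {}
    a.log; a.log
    assert(lazyInitCount == 1)            // lazy val initialized exactly once

    val b = new LoggingDef {}
    b.log; b.log
    assert(defCallCount == 2)             // def re-evaluated on each call
    println("ok")
  }
}
```

The trade-off mentioned in the comment follows from this: the `def` form avoids the initialization lock (and the NotSerializableException risk of caching a logger in a field), at the cost of re-evaluating the logger lookup on every call.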
[jira] [Commented] (SPARK-11857) Remove Mesos fine-grained mode subject to discussions
[ https://issues.apache.org/jira/browse/SPARK-11857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363321#comment-15363321 ] Michael Gummelt commented on SPARK-11857: - [~amcelwee] Do you have any more input on this issue? We're moving forward with deprecating fine-grained mode, but we're willing to solve your issue first. > Remove Mesos fine-grained mode subject to discussions > - > > Key: SPARK-11857 > URL: https://issues.apache.org/jira/browse/SPARK-11857 > Project: Spark > Issue Type: Sub-task > Components: Mesos >Reporter: Reynold Xin >Assignee: Reynold Xin > > See discussions in > http://apache-spark-developers-list.1001551.n3.nabble.com/Removing-the-Mesos-fine-grained-mode-td15277.html > and > http://apache-spark-developers-list.1001551.n3.nabble.com/Please-reply-if-you-use-Mesos-fine-grained-mode-td14930.html
[jira] [Created] (SPARK-16923) Mesos cluster scheduler duplicates config vars by setting them in the environment and as --conf
Michael Gummelt created SPARK-16923: --- Summary: Mesos cluster scheduler duplicates config vars by setting them in the environment and as --conf Key: SPARK-16923 URL: https://issues.apache.org/jira/browse/SPARK-16923 Project: Spark Issue Type: Task Components: Mesos Affects Versions: 2.0.0 Reporter: Michael Gummelt I don't think this introduces any bugs, but we should fix it nonetheless.
[jira] [Commented] (SPARK-12909) Spark on Mesos accessing Secured HDFS w/Kerberos
[ https://issues.apache.org/jira/browse/SPARK-12909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15412624#comment-15412624 ] Michael Gummelt commented on SPARK-12909: - DC/OS Spark has this functionality, and we'll be upstreaming it to Apache Spark soon. > Spark on Mesos accessing Secured HDFS w/Kerberos > > > Key: SPARK-12909 > URL: https://issues.apache.org/jira/browse/SPARK-12909 > Project: Spark > Issue Type: New Feature > Components: Mesos >Reporter: Greg Senia > > Ability for Spark on Mesos to use a Kerberized HDFS FileSystem for data. It > seems like this is not possible based on email chains and forum articles. If > this is true, how hard would it be to get this implemented? I'm willing to > try to help. > https://community.hortonworks.com/questions/5415/spark-on-yarn-vs-mesos.html > https://www.mail-archive.com/user@spark.apache.org/msg31326.html
[jira] [Commented] (SPARK-11638) Run Spark on Mesos with bridge networking
[ https://issues.apache.org/jira/browse/SPARK-11638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15412461#comment-15412461 ] Michael Gummelt commented on SPARK-11638: - [~radekg] > The only advantage we had was using the same configuration inside of the > docker container. You mean you want to run the spark driver in a docker container? Which configuration did you have to change? I can look more into this, but I need a clear "It's easier/better to do X in bridge mode than in host mode". > So with the HTTP API, Spark would still require the heavy libmesos in order > to work with Mesos? No. The HTTP API will remove the libmesos dependency, which is nice. It's not an urgent priority though. > Run Spark on Mesos with bridge networking > - > > Key: SPARK-11638 > URL: https://issues.apache.org/jira/browse/SPARK-11638 > Project: Spark > Issue Type: Improvement > Components: Mesos, Spark Core >Affects Versions: 1.4.0, 1.4.1, 1.5.0, 1.5.1, 1.5.2, 1.6.0 >Reporter: Radoslaw Gruchalski > Attachments: 1.4.0.patch, 1.4.1.patch, 1.5.0.patch, 1.5.1.patch, > 1.5.2.patch, 1.6.0.patch, 2.3.11.patch, 2.3.4.patch > > > h4. Summary > Provides {{spark.driver.advertisedPort}}, > {{spark.fileserver.advertisedPort}}, {{spark.broadcast.advertisedPort}} and > {{spark.replClassServer.advertisedPort}} settings to enable running Spark in > Mesos on Docker with Bridge networking. Provides patches for Akka Remote to > enable Spark driver advertisement using alternative host and port. > With these settings, it is possible to run Spark Master in a Docker container > and have the executors running on Mesos talk back correctly to such Master. > The problem is discussed on the Mesos mailing list here: > https://mail-archives.apache.org/mod_mbox/mesos-user/201510.mbox/%3CCACTd3c9vjAMXk=bfotj5ljzfrh5u7ix-ghppfqknvg9mkkc...@mail.gmail.com%3E > h4. 
Running Spark on Mesos - LIBPROCESS_ADVERTISE_IP opens the door > In order for the framework to receive orders in the bridged container, Mesos > in the container has to register for offers using the IP address of the > Agent. Offers are sent by Mesos Master to the Docker container running on a > different host, an Agent. Normally, prior to Mesos 0.24.0, {{libprocess}} > would advertise itself using the IP address of the container, something like > {{172.x.x.x}}. Obviously, Mesos Master can't reach that address, it's a > different host, it's a different machine. Mesos 0.24.0 introduced two new > properties for {{libprocess}} - {{LIBPROCESS_ADVERTISE_IP}} and > {{LIBPROCESS_ADVERTISE_PORT}}. This allows the container to use the Agent's > address to register for offers. This was provided mainly for running Mesos in > Docker on Mesos. > h4. Spark - how does the above relate and what is being addressed here? > Similar to Mesos, out of the box, Spark does not allow to advertise its > services on ports different than bind ports. Consider following scenario: > Spark is running inside a Docker container on Mesos, it's a bridge networking > mode. Assuming a port {{}} for the {{spark.driver.port}}, {{6677}} for > the {{spark.fileserver.port}}, {{6688}} for the {{spark.broadcast.port}} and > {{23456}} for the {{spark.replClassServer.port}}. If such task is posted to > Marathon, Mesos will give 4 ports in range {{31000-32000}} mapping to the > container ports. Starting the executors from such container results in > executors not being able to communicate back to the Spark Master. > This happens because of 2 things: > Spark driver is effectively an {{akka-remote}} system with {{akka.tcp}} > transport. {{akka-remote}} prior to version {{2.4}} can't advertise a port > different to what it bound to. The settings discussed are here: > https://github.com/akka/akka/blob/f8c1671903923837f22d0726a955e0893add5e9f/akka-remote/src/main/resources/reference.conf#L345-L376. 
> These do not exist in Akka {{2.3.x}}. Spark driver will always advertise > port {{}} as this is the one {{akka-remote}} is bound to. > Any URIs the executors contact the Spark Master on, are prepared by Spark > Master and handed over to executors. These always contain the port number > used by the Master to find the service on. The services are: > - {{spark.broadcast.port}} > - {{spark.fileserver.port}} > - {{spark.replClassServer.port}} > all above ports are by default {{0}} (random assignment) but can be specified > using Spark configuration ( {{-Dspark...port}} ). However, they are limited > in the same way as the {{spark.driver.port}}; in the above example, an > executor should not contact the file server on port {{6677}} but rather on > the respective 31xxx assigned by Mesos. > Spark currently does not allow any of that. > h4. Taking on the problem, step 1: Spark Driver > As mentioned above, Spark
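Put together, the advertised-port scheme described above would look roughly like this in a driver's configuration. This is a sketch only: the {{advertisedPort}} settings are the ones proposed in this ticket, not released Spark configuration, and all port numbers are illustrative stand-ins for whatever Marathon/Mesos actually assigns:

```properties
# Ports the driver binds to inside the bridged container (illustrative)
spark.driver.port                     6666
spark.fileserver.port                 6677
spark.broadcast.port                  6688
spark.replClassServer.port            23456
# Host ports Mesos mapped to them; executors would connect to these (proposed settings)
spark.driver.advertisedPort           31000
spark.fileserver.advertisedPort       31001
spark.broadcast.advertisedPort        31002
spark.replClassServer.advertisedPort  31003
```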
[jira] [Created] (SPARK-16927) Mesos Cluster Dispatcher default properties
Michael Gummelt created SPARK-16927: --- Summary: Mesos Cluster Dispatcher default properties Key: SPARK-16927 URL: https://issues.apache.org/jira/browse/SPARK-16927 Project: Spark Issue Type: New Feature Components: Mesos Affects Versions: 2.0.0 Reporter: Michael Gummelt Add the capability to set default driver properties for all jobs submitted through the dispatcher. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11638) Run Spark on Mesos with bridge networking
[ https://issues.apache.org/jira/browse/SPARK-11638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411277#comment-15411277 ] Michael Gummelt commented on SPARK-11638: - This JIRA is complex and a lot of it is out of date. Can someone briefly explain to me what the problem is? Why do you want bridge networking? > Run Spark on Mesos with bridge networking > - > > Key: SPARK-11638 > URL: https://issues.apache.org/jira/browse/SPARK-11638 > Project: Spark > Issue Type: Improvement > Components: Mesos, Spark Core >Affects Versions: 1.4.0, 1.4.1, 1.5.0, 1.5.1, 1.5.2, 1.6.0 >Reporter: Radoslaw Gruchalski > Attachments: 1.4.0.patch, 1.4.1.patch, 1.5.0.patch, 1.5.1.patch, > 1.5.2.patch, 1.6.0.patch, 2.3.11.patch, 2.3.4.patch > > > h4. Summary > Provides {{spark.driver.advertisedPort}}, > {{spark.fileserver.advertisedPort}}, {{spark.broadcast.advertisedPort}} and > {{spark.replClassServer.advertisedPort}} settings to enable running Spark in > Mesos on Docker with Bridge networking. Provides patches for Akka Remote to > enable Spark driver advertisement using alternative host and port. > With these settings, it is possible to run Spark Master in a Docker container > and have the executors running on Mesos talk back correctly to such Master. > The problem is discussed on the Mesos mailing list here: > https://mail-archives.apache.org/mod_mbox/mesos-user/201510.mbox/%3CCACTd3c9vjAMXk=bfotj5ljzfrh5u7ix-ghppfqknvg9mkkc...@mail.gmail.com%3E > h4. Running Spark on Mesos - LIBPROCESS_ADVERTISE_IP opens the door > In order for the framework to receive orders in the bridged container, Mesos > in the container has to register for offers using the IP address of the > Agent. Offers are sent by Mesos Master to the Docker container running on a > different host, an Agent. Normally, prior to Mesos 0.24.0, {{libprocess}} > would advertise itself using the IP address of the container, something like > {{172.x.x.x}}. 
Obviously, Mesos Master can't reach that address, it's a > different host, it's a different machine. Mesos 0.24.0 introduced two new > properties for {{libprocess}} - {{LIBPROCESS_ADVERTISE_IP}} and > {{LIBPROCESS_ADVERTISE_PORT}}. This allows the container to use the Agent's > address to register for offers. This was provided mainly for running Mesos in > Docker on Mesos. > h4. Spark - how does the above relate and what is being addressed here? > Similar to Mesos, out of the box, Spark does not allow to advertise its > services on ports different than bind ports. Consider following scenario: > Spark is running inside a Docker container on Mesos, it's a bridge networking > mode. Assuming a port {{}} for the {{spark.driver.port}}, {{6677}} for > the {{spark.fileserver.port}}, {{6688}} for the {{spark.broadcast.port}} and > {{23456}} for the {{spark.replClassServer.port}}. If such task is posted to > Marathon, Mesos will give 4 ports in range {{31000-32000}} mapping to the > container ports. Starting the executors from such container results in > executors not being able to communicate back to the Spark Master. > This happens because of 2 things: > Spark driver is effectively an {{akka-remote}} system with {{akka.tcp}} > transport. {{akka-remote}} prior to version {{2.4}} can't advertise a port > different to what it bound to. The settings discussed are here: > https://github.com/akka/akka/blob/f8c1671903923837f22d0726a955e0893add5e9f/akka-remote/src/main/resources/reference.conf#L345-L376. > These do not exist in Akka {{2.3.x}}. Spark driver will always advertise > port {{}} as this is the one {{akka-remote}} is bound to. > Any URIs the executors contact the Spark Master on, are prepared by Spark > Master and handed over to executors. These always contain the port number > used by the Master to find the service on. 
The services are: > - {{spark.broadcast.port}} > - {{spark.fileserver.port}} > - {{spark.replClassServer.port}} > all above ports are by default {{0}} (random assignment) but can be specified > using Spark configuration ( {{-Dspark...port}} ). However, they are limited > in the same way as the {{spark.driver.port}}; in the above example, an > executor should not contact the file server on port {{6677}} but rather on > the respective 31xxx assigned by Mesos. > Spark currently does not allow any of that. > h4. Taking on the problem, step 1: Spark Driver > As mentioned above, Spark Driver is based on {{akka-remote}}. In order to > take on the problem, the {{akka.remote.net.tcp.bind-hostname}} and > {{akka.remote.net.tcp.bind-port}} settings are a must. Spark does not compile > with Akka 2.4.x yet. > What we want is the back port of mentioned {{akka-remote}} settings to > {{2.3.x}} versions. These patches are attached to this ticket - > {{2.3.4.patch}} and
[jira] [Commented] (SPARK-16944) [MESOS] Improve data locality when launching new executors when dynamic allocation is enabled
[ https://issues.apache.org/jira/browse/SPARK-16944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411309#comment-15411309 ] Michael Gummelt commented on SPARK-16944: - Since Mesos is offer based, it's up to the Spark scheduler itself to choose which offers have the best locality. In YARN, I think they tell the resource manager about preferences. > [MESOS] Improve data locality when launching new executors when dynamic > allocation is enabled > - > > Key: SPARK-16944 > URL: https://issues.apache.org/jira/browse/SPARK-16944 > Project: Spark > Issue Type: New Feature > Components: Mesos >Affects Versions: 2.0.0 >Reporter: Sun Rui > > Currently Spark on Yarn supports better data locality by considering the > preferred locations of the pending tasks when dynamic allocation is enabled. > Refer to https://issues.apache.org/jira/browse/SPARK-4352. It would be better > that Mesos can also support this feature. > I guess that some logic existing in Yarn could be reused by Mesos.
[jira] [Commented] (SPARK-16944) [MESOS] Improve data locality when launching new executors when dynamic allocation is enabled
[ https://issues.apache.org/jira/browse/SPARK-16944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411307#comment-15411307 ] Michael Gummelt commented on SPARK-16944: - I think we can improve both with and without dynamic allocation. In both modes, Mesos is only looking at locality after it's already placed the executors.
[jira] [Commented] (SPARK-16944) [MESOS] Improve data locality when launching new executors when dynamic allocation is enabled
[ https://issues.apache.org/jira/browse/SPARK-16944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411315#comment-15411315 ] Michael Gummelt commented on SPARK-16944: - Yea, we typically call it "delay scheduling". It was first written about by the Spark/Mesos researchers: http://elmeleegy.com/khaled/papers/delay_scheduling.pdf Spark already has `spark.locality.wait`, but that's how long the task scheduler will wait for an executor with the preferred locality to come up. We need a similar concept for waiting for offers to come in, so we can place the executor correctly in the first place.
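The delay-scheduling idea described above can be sketched as an offer-selection rule (names and API here are illustrative, not Spark's actual scheduler code): prefer an offer on a node holding the data, and only fall back to an arbitrary offer once a wait budget analogous to `spark.locality.wait` has elapsed:

```scala
object DelaySchedulingDemo {
  case class Offer(host: String)

  // Pick an offer for a new executor: data-local if available, anything after
  // the locality wait has expired, otherwise keep waiting for better offers.
  def chooseOffer(offers: Seq[Offer], preferredHosts: Set[String],
                  waitedMs: Long, localityWaitMs: Long): Option[Offer] =
    offers.find(o => preferredHosts.contains(o.host)) match {
      case Some(local) => Some(local)                 // data-local offer: take it
      case None if waitedMs >= localityWaitMs =>
        offers.headOption                             // waited long enough: take any offer
      case None => None                               // decline for now, wait for locality
    }

  def main(args: Array[String]): Unit = {
    val offers = Seq(Offer("a"), Offer("b"))
    assert(chooseOffer(offers, Set("b"), 0L, 3000L).contains(Offer("b")))
    assert(chooseOffer(offers, Set("z"), 0L, 3000L).isEmpty)
    assert(chooseOffer(offers, Set("z"), 5000L, 3000L).contains(Offer("a")))
    println("ok")
  }
}
```

The key difference from the existing `spark.locality.wait` behavior is when the rule applies: here it governs which Mesos offer gets the executor in the first place, rather than which already-running executor gets a task.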
[jira] [Commented] (SPARK-12909) Spark on Mesos accessing Secured HDFS w/Kerberos
[ https://issues.apache.org/jira/browse/SPARK-12909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15413999#comment-15413999 ] Michael Gummelt commented on SPARK-12909: - I agree. I just spoke with Reynold about this. I'll create the module before the next big feature.
[jira] [Updated] (SPARK-16522) [MESOS] Spark application throws exception on exit
[ https://issues.apache.org/jira/browse/SPARK-16522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Gummelt updated SPARK-16522: Fix Version/s: (was: 2.1.0) 2.0.1 > [MESOS] Spark application throws exception on exit > -- > > Key: SPARK-16522 > URL: https://issues.apache.org/jira/browse/SPARK-16522 > Project: Spark > Issue Type: Bug > Components: Mesos >Affects Versions: 2.0.0 >Reporter: Sun Rui >Assignee: Sun Rui > Fix For: 2.0.1 > > > Spark applications running on Mesos throw exception upon exit as follows: > {noformat} > 16/07/13 15:20:46 WARN NettyRpcEndpointRef: Error sending message [message = > RemoveExecutor(1,Executor finished with state FINISHED)] in 3 attempts > org.apache.spark.SparkException: Exception thrown in awaitResult > at > org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75) > at > scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) > at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167) > at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83) > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102) > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78) > at > org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:412) > at > org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.executorTerminated(MesosCoarseGrainedSchedulerBackend.scala:555) > at > org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.statusUpdate(MesosCoarseGrainedSchedulerBackend.scala:495) > Caused by: org.apache.spark.SparkException: 
Could not find > CoarseGrainedScheduler. > at > org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:152) > at > org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:127) > at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:225) > at > org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:508) > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101) > ... 4 more > Exception in thread "Thread-47" org.apache.spark.SparkException: Error > notifying standalone scheduler's driver endpoint > at > org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:415) > at > org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.executorTerminated(MesosCoarseGrainedSchedulerBackend.scala:555) > at > org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.statusUpdate(MesosCoarseGrainedSchedulerBackend.scala:495) > Caused by: org.apache.spark.SparkException: Error sending message [message = > RemoveExecutor(1,Executor finished with state FINISHED)] > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:119) > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78) > at > org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:412) > ... 
2 more > Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult > at > org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75) > at > scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) > at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167) > at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83) > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102) > ... 4 more > Caused by: org.apache.spark.SparkException: Could not find > CoarseGrainedScheduler. > at > org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:152) > at > org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:127) > at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:225) >
[jira] [Reopened] (SPARK-16522) [MESOS] Spark application throws exception on exit
[ https://issues.apache.org/jira/browse/SPARK-16522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Gummelt reopened SPARK-16522: - Reopening so we can track this until it's merged into the 2.0 branch. Also changed the fix version to 2.0.1.
[jira] [Commented] (SPARK-16967) Collect Mesos support code into a module/profile
[ https://issues.apache.org/jira/browse/SPARK-16967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414004#comment-15414004 ] Michael Gummelt commented on SPARK-16967: - Will do > Collect Mesos support code into a module/profile > > > Key: SPARK-16967 > URL: https://issues.apache.org/jira/browse/SPARK-16967 > Project: Spark > Issue Type: Task > Components: Mesos, Spark Core >Affects Versions: 2.0.0 >Reporter: Sean Owen >Priority: Critical > > CC [~mgummelt] [~tnachen] [~skonto] > I think this is fairly easy and would be beneficial as more work goes into > Mesos. It should separate into a module like YARN does, just on principle > really, but because it also means anyone that doesn't need Mesos support can > build without it. > I'm entirely willing to take a shot at this.
[jira] [Commented] (SPARK-11714) Make Spark on Mesos honor port restrictions
[ https://issues.apache.org/jira/browse/SPARK-11714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15359481#comment-15359481 ] Michael Gummelt commented on SPARK-11714: - [~drcrallen] Your solution isn't exactly clear to me, but it sounds like you're trying to solve the problem of specifying arbitrary ports (such as JMX) for the executor to reserve, rather than just the ones that Spark knows about (executor, blockmanager, shuffle service). I think a clean way to do this would be introducing {{spark.mesos.executor.ports}}. So then you could specify: -Dspark.mesos.executor.ports=5000 -Dcom.sun.management.jmxremote.port=5000 or something similar > Make Spark on Mesos honor port restrictions > --- > > Key: SPARK-11714 > URL: https://issues.apache.org/jira/browse/SPARK-11714 > Project: Spark > Issue Type: Improvement > Components: Mesos >Reporter: Charles Allen > > Currently the MesosSchedulerBackend does not make any effort to honor "ports" > as a resource offer in Mesos. This ask is to have the ports which the > executor binds to honor the limits of the "ports" resource of an offer.
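Honoring the "ports" resource boils down to a range check: Mesos offers ports as inclusive ranges (e.g. 31000-32000), and every port the executor wants to bind, including extra ones a setting like the proposed {{spark.mesos.executor.ports}} would list, must fall inside some offered range. A sketch of that check (names are illustrative, not Spark's scheduler API):

```scala
object PortResourceCheck {
  // Inclusive begin/end pair, as in a Mesos Value.Range for the "ports" resource.
  type PortRange = (Long, Long)

  // True iff every requested port lies within some range of the offer,
  // i.e. the offer can satisfy the executor's port requirements.
  def portsSatisfied(requested: Seq[Long], offered: Seq[PortRange]): Boolean =
    requested.forall(p => offered.exists { case (begin, end) => p >= begin && p <= end })

  def main(args: Array[String]): Unit = {
    val offer = Seq((31000L, 32000L))
    assert(portsSatisfied(Seq(31500L), offer))   // within the offered range: usable
    assert(!portsSatisfied(Seq(5000L), offer))   // e.g. a fixed JMX port not offered: decline
    println("ok")
  }
}
```

A scheduler using such a check would decline offers whose port ranges cannot cover the configured ports, rather than binding to unoffered ports as the ticket describes.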
[jira] [Created] (SPARK-17067) Revocable resource support
Michael Gummelt created SPARK-17067: --- Summary: Revocable resource support Key: SPARK-17067 URL: https://issues.apache.org/jira/browse/SPARK-17067 Project: Spark Issue Type: Improvement Components: Mesos Reporter: Michael Gummelt
[jira] [Commented] (SPARK-19479) Spark Mesos artifact split causes spark-core dependency to not pull in mesos impl
[ https://issues.apache.org/jira/browse/SPARK-19479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15855091#comment-15855091 ]

Michael Gummelt commented on SPARK-19479:
-----------------------------------------

Yea, sorry for the inconvenience, but I announced this on the dev list. Search for "Mesos is now a maven module". If I were you, I'd create an email filter for "Mesos" on the user/dev lists; that's what I do.

> Spark Mesos artifact split causes spark-core dependency to not pull in mesos
> impl
> ----------------------------------------------------------------------------
>
> Key: SPARK-19479
> URL: https://issues.apache.org/jira/browse/SPARK-19479
> Project: Spark
> Issue Type: Bug
> Components: Mesos, Spark Core
> Affects Versions: 2.1.0
> Reporter: Charles Allen
>
> https://github.com/apache/spark/pull/14637 (
> https://issues.apache.org/jira/browse/SPARK-16967 ) forked the Mesos impl
> off into its own artifact, but the release notes do not call this out. This
> broke our deployments because we depend on packaging with spark-core, which
> no longer has any Mesos awareness.
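For projects hit by this, the fix is to depend on the split-out artifact explicitly alongside spark-core. A sketch of the extra Maven dependency, assuming the Spark 2.1.0 / Scala 2.11 coordinates (`spark-mesos_2.11`); adjust the Scala suffix and version to match your build:

```xml
<!-- spark-core no longer pulls in the Mesos scheduler backend;
     add the split-out module explicitly. -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-mesos_2.11</artifactId>
  <version>2.1.0</version>
</dependency>
```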
[jira] (SPARK-16742) Kerberos support for Spark on Mesos
Michael Gummelt commented on SPARK-16742:
-----------------------------------------

As an update, we (Mesosphere) are working with Stratio on a joint solution. Stratio will submit a WIP PR soon, and we'll have a design discussion in this JIRA issue.
[jira] (SPARK-16742) Kerberos support for Spark on Mesos
Michael Gummelt commented on SPARK-16742:
-----------------------------------------

Thomas Graves: Yea, I'm pretty sure we're going to change that to use delegation tokens like the existing solutions.
[jira] (SPARK-16784) Configurable log4j settings
Michael Gummelt updated SPARK-16784 (Configurable log4j settings):
-------------------------------------------------------------------

Change By: Michael Gummelt
Affects Version/s: 2.1.0