[jira] [Updated] (SPARK-12413) Mesos ZK persistence throws a

2015-12-17 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt updated SPARK-12413:

Summary: Mesos ZK persistence throws a   (was: Mesos ZK persistence is 
broken)

> Mesos ZK persistence throws a 
> --
>
> Key: SPARK-12413
> URL: https://issues.apache.org/jira/browse/SPARK-12413
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Affects Versions: 1.6.0
>Reporter: Michael Gummelt
>
> See: https://github.com/apache/spark/pull/10359#discussion_r47929981






[jira] [Created] (SPARK-12413) Mesos ZK persistence is broken

2015-12-17 Thread Michael Gummelt (JIRA)
Michael Gummelt created SPARK-12413:
---

 Summary: Mesos ZK persistence is broken
 Key: SPARK-12413
 URL: https://issues.apache.org/jira/browse/SPARK-12413
 Project: Spark
  Issue Type: Bug
  Components: Mesos
Affects Versions: 1.6.0
Reporter: Michael Gummelt


See: https://github.com/apache/spark/pull/10359#discussion_r47929981






[jira] [Updated] (SPARK-12413) Mesos ZK persistence throws a NotSerializableException

2015-12-17 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt updated SPARK-12413:

Summary: Mesos ZK persistence throws a NotSerializableException  (was: 
Mesos ZK persistence throws a )

> Mesos ZK persistence throws a NotSerializableException
> --
>
> Key: SPARK-12413
> URL: https://issues.apache.org/jira/browse/SPARK-12413
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Affects Versions: 1.6.0
>Reporter: Michael Gummelt
>
> See: https://github.com/apache/spark/pull/10359#discussion_r47929981






[jira] [Updated] (SPARK-12413) Mesos ZK persistence throws a NotSerializableException

2015-12-17 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt updated SPARK-12413:

Description: 
This breaks ZK persistence due to https://issues.scala-lang.org/browse/SI-6654

This line throws a NotSerializable exception: 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster

Offer id: 72f4d1ce-67f7-41b0-95a3-aa6fb208df32-O189, cpu: 3.0, mem: 12995.0
15/12/17 21:52:44 DEBUG ClientCnxn: Got ping response for sessionid: 
0x151b1d1567e0002 after 0ms
15/12/17 21:52:44 DEBUG nio: created 
SCEP@2e746d70{l(/10.0.6.166:41456)<->r(/10.0.0.240:17386),s=0,open=true,ishut=false,oshut=false,rb=false,wb=false,w=true,i=0}-{AsyncHttpConnection@5dbcebe3,g=HttpGenerator{s=0,h=-1,b=-1,c=-1},p=HttpParser{s=-14,l=0,c=0},r=0}
15/12/17 21:52:44 DEBUG HttpParser: filled 1591/1591
15/12/17 21:52:44 DEBUG Server: REQUEST /v1/submissions/create on 
AsyncHttpConnection@5dbcebe3,g=HttpGenerator{s=0,h=-1,b=-1,c=-1},p=HttpParser{s=2,l=2,c=1174},r=1
15/12/17 21:52:44 DEBUG ContextHandler: scope null||/v1/submissions/create @ 
o.s.j.s.ServletContextHandler{/,null}
15/12/17 21:52:44 DEBUG ContextHandler: context=||/v1/submissions/create @ 
o.s.j.s.ServletContextHandler{/,null}
15/12/17 21:52:44 DEBUG ServletHandler: servlet |/v1/submissions/create|null -> 
org.apache.spark.deploy.rest.mesos.MesosSubmitRequestServlet-368e091
15/12/17 21:52:44 DEBUG ServletHandler: chain=null
15/12/17 21:52:44 WARN ServletHandler: /v1/submissions/create
java.io.NotSerializableException: scala.collection.immutable.MapLike$$anon$1
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
at org.apache.spark.util.Utils$.serialize(Utils.scala:83)
at org.apache.spark.scheduler.cluster.mesos.ZookeeperMesosClusterPersistenceEngine.persist(MesosClusterPersistenceEngine.scala:110)
at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.submitDriver(MesosClusterScheduler.scala:166)
at org.apache.spark.deploy.rest.mesos.MesosSubmitRequestServlet.handleSubmit(MesosRestServer.scala:132)
at org.apache.spark.deploy.rest.SubmitRequestServlet.doPost(RestSubmissionServer.scala:258)

  was:
This breaks ZK persistence due to https://issues.scala-lang.org/browse/SI-6654

This line throws a NotSerializable exception: 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster

{{
Offer id: 72f4d1ce-67f7-41b0-95a3-aa6fb208df32-O189, cpu: 3.0, mem: 12995.0
15/12/17 21:52:44 DEBUG ClientCnxn: Got ping response for sessionid: 
0x151b1d1567e0002 after 0ms
15/12/17 21:52:44 DEBUG nio: created 
SCEP@2e746d70{l(/10.0.6.166:41456)<->r(/10.0.0.240:17386),s=0,open=true,ishut=false,oshut=false,rb=false,wb=false,w=true,i=0}-{AsyncHttpConnection@5dbcebe3,g=HttpGenerator{s=0,h=-1,b=-1,c=-1},p=HttpParser{s=-14,l=0,c=0},r=0}
15/12/17 21:52:44 DEBUG HttpParser: filled 1591/1591
15/12/17 21:52:44 DEBUG Server: REQUEST /v1/submissions/create on 
AsyncHttpConnection@5dbcebe3,g=HttpGenerator{s=0,h=-1,b=-1,c=-1},p=HttpParser{s=2,l=2,c=1174},r=1
15/12/17 21:52:44 DEBUG ContextHandler: scope null||/v1/submissions/create @ 
o.s.j.s.ServletContextHandler{/,null}
15/12/17 21:52:44 DEBUG ContextHandler: context=||/v1/submissions/create @ 
o.s.j.s.ServletContextHandler{/,null}
15/12/17 21:52:44 DEBUG ServletHandler: servlet |/v1/submissions/create|null -> 
org.apache.spark.deploy.rest.mesos.MesosSubmitRequestServlet-368e091
15/12/17 21:52:44 DEBUG ServletHandler: chain=null
15/12/17 21:52:44 WARN ServletHandler: /v1/submissions/create
java.io.NotSerializableException: scala.collection.immutable.MapLike$$anon$1
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at 

[jira] [Updated] (SPARK-12413) Mesos ZK persistence throws a NotSerializableException

2015-12-17 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt updated SPARK-12413:

Description: 
https://github.com/apache/spark/pull/10359 breaks ZK persistence due to 
https://issues.scala-lang.org/browse/SI-6654

This line throws a NotSerializable exception: 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster

Offer id: 72f4d1ce-67f7-41b0-95a3-aa6fb208df32-O189, cpu: 3.0, mem: 12995.0
15/12/17 21:52:44 DEBUG ClientCnxn: Got ping response for sessionid: 
0x151b1d1567e0002 after 0ms
15/12/17 21:52:44 DEBUG nio: created 
SCEP@2e746d70{l(/10.0.6.166:41456)<->r(/10.0.0.240:17386),s=0,open=true,ishut=false,oshut=false,rb=false,wb=false,w=true,i=0}-{AsyncHttpConnection@5dbcebe3,g=HttpGenerator{s=0,h=-1,b=-1,c=-1},p=HttpParser{s=-14,l=0,c=0},r=0}
15/12/17 21:52:44 DEBUG HttpParser: filled 1591/1591
15/12/17 21:52:44 DEBUG Server: REQUEST /v1/submissions/create on 
AsyncHttpConnection@5dbcebe3,g=HttpGenerator{s=0,h=-1,b=-1,c=-1},p=HttpParser{s=2,l=2,c=1174},r=1
15/12/17 21:52:44 DEBUG ContextHandler: scope null||/v1/submissions/create @ 
o.s.j.s.ServletContextHandler{/,null}
15/12/17 21:52:44 DEBUG ContextHandler: context=||/v1/submissions/create @ 
o.s.j.s.ServletContextHandler{/,null}
15/12/17 21:52:44 DEBUG ServletHandler: servlet |/v1/submissions/create|null -> 
org.apache.spark.deploy.rest.mesos.MesosSubmitRequestServlet-368e091
15/12/17 21:52:44 DEBUG ServletHandler: chain=null
15/12/17 21:52:44 WARN ServletHandler: /v1/submissions/create
java.io.NotSerializableException: scala.collection.immutable.MapLike$$anon$1
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
at org.apache.spark.util.Utils$.serialize(Utils.scala:83)
at org.apache.spark.scheduler.cluster.mesos.ZookeeperMesosClusterPersistenceEngine.persist(MesosClusterPersistenceEngine.scala:110)
at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.submitDriver(MesosClusterScheduler.scala:166)
at org.apache.spark.deploy.rest.mesos.MesosSubmitRequestServlet.handleSubmit(MesosRestServer.scala:132)
at org.apache.spark.deploy.rest.SubmitRequestServlet.doPost(RestSubmissionServer.scala:258)

  was:
This breaks ZK persistence due to https://issues.scala-lang.org/browse/SI-6654

This line throws a NotSerializable exception: 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster

Offer id: 72f4d1ce-67f7-41b0-95a3-aa6fb208df32-O189, cpu: 3.0, mem: 12995.0
15/12/17 21:52:44 DEBUG ClientCnxn: Got ping response for sessionid: 
0x151b1d1567e0002 after 0ms
15/12/17 21:52:44 DEBUG nio: created 
SCEP@2e746d70{l(/10.0.6.166:41456)<->r(/10.0.0.240:17386),s=0,open=true,ishut=false,oshut=false,rb=false,wb=false,w=true,i=0}-{AsyncHttpConnection@5dbcebe3,g=HttpGenerator{s=0,h=-1,b=-1,c=-1},p=HttpParser{s=-14,l=0,c=0},r=0}
15/12/17 21:52:44 DEBUG HttpParser: filled 1591/1591
15/12/17 21:52:44 DEBUG Server: REQUEST /v1/submissions/create on 
AsyncHttpConnection@5dbcebe3,g=HttpGenerator{s=0,h=-1,b=-1,c=-1},p=HttpParser{s=2,l=2,c=1174},r=1
15/12/17 21:52:44 DEBUG ContextHandler: scope null||/v1/submissions/create @ 
o.s.j.s.ServletContextHandler{/,null}
15/12/17 21:52:44 DEBUG ContextHandler: context=||/v1/submissions/create @ 
o.s.j.s.ServletContextHandler{/,null}
15/12/17 21:52:44 DEBUG ServletHandler: servlet |/v1/submissions/create|null -> 
org.apache.spark.deploy.rest.mesos.MesosSubmitRequestServlet-368e091
15/12/17 21:52:44 DEBUG ServletHandler: chain=null
15/12/17 21:52:44 WARN ServletHandler: /v1/submissions/create
java.io.NotSerializableException: scala.collection.immutable.MapLike$$anon$1
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at 

[jira] [Updated] (SPARK-12413) Mesos ZK persistence throws a NotSerializableException

2015-12-17 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt updated SPARK-12413:

Description: 
This breaks ZK persistence due to https://issues.scala-lang.org/browse/SI-6654

This line throws a NotSerializable exception: 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster

{{
Offer id: 72f4d1ce-67f7-41b0-95a3-aa6fb208df32-O189, cpu: 3.0, mem: 12995.0
15/12/17 21:52:44 DEBUG ClientCnxn: Got ping response for sessionid: 
0x151b1d1567e0002 after 0ms
15/12/17 21:52:44 DEBUG nio: created 
SCEP@2e746d70{l(/10.0.6.166:41456)<->r(/10.0.0.240:17386),s=0,open=true,ishut=false,oshut=false,rb=false,wb=false,w=true,i=0}-{AsyncHttpConnection@5dbcebe3,g=HttpGenerator{s=0,h=-1,b=-1,c=-1},p=HttpParser{s=-14,l=0,c=0},r=0}
15/12/17 21:52:44 DEBUG HttpParser: filled 1591/1591
15/12/17 21:52:44 DEBUG Server: REQUEST /v1/submissions/create on 
AsyncHttpConnection@5dbcebe3,g=HttpGenerator{s=0,h=-1,b=-1,c=-1},p=HttpParser{s=2,l=2,c=1174},r=1
15/12/17 21:52:44 DEBUG ContextHandler: scope null||/v1/submissions/create @ 
o.s.j.s.ServletContextHandler{/,null}
15/12/17 21:52:44 DEBUG ContextHandler: context=||/v1/submissions/create @ 
o.s.j.s.ServletContextHandler{/,null}
15/12/17 21:52:44 DEBUG ServletHandler: servlet |/v1/submissions/create|null -> 
org.apache.spark.deploy.rest.mesos.MesosSubmitRequestServlet-368e091
15/12/17 21:52:44 DEBUG ServletHandler: chain=null
15/12/17 21:52:44 WARN ServletHandler: /v1/submissions/create
java.io.NotSerializableException: scala.collection.immutable.MapLike$$anon$1
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
at org.apache.spark.util.Utils$.serialize(Utils.scala:83)
at org.apache.spark.scheduler.cluster.mesos.ZookeeperMesosClusterPersistenceEngine.persist(MesosClusterPersistenceEngine.scala:110)
at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.submitDriver(MesosClusterScheduler.scala:166)
at org.apache.spark.deploy.rest.mesos.MesosSubmitRequestServlet.handleSubmit(MesosRestServer.scala:132)
at org.apache.spark.deploy.rest.SubmitRequestServlet.doPost(RestSubmissionServer.scala:258)
}}

  was:
This breaks ZK persistence due to https://issues.scala-lang.org/browse/SI-6654

This line throws a NotSerializable exception: 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L166

```
Offer id: 72f4d1ce-67f7-41b0-95a3-aa6fb208df32-O189, cpu: 3.0, mem: 12995.0
15/12/17 21:52:44 DEBUG ClientCnxn: Got ping response for sessionid: 
0x151b1d1567e0002 after 0ms
15/12/17 21:52:44 DEBUG nio: created 
SCEP@2e746d70{l(/10.0.6.166:41456)<->r(/10.0.0.240:17386),s=0,open=true,ishut=false,oshut=false,rb=false,wb=false,w=true,i=0}-{AsyncHttpConnection@5dbcebe3,g=HttpGenerator{s=0,h=-1,b=-1,c=-1},p=HttpParser{s=-14,l=0,c=0},r=0}
15/12/17 21:52:44 DEBUG HttpParser: filled 1591/1591
15/12/17 21:52:44 DEBUG Server: REQUEST /v1/submissions/create on 
AsyncHttpConnection@5dbcebe3,g=HttpGenerator{s=0,h=-1,b=-1,c=-1},p=HttpParser{s=2,l=2,c=1174},r=1
15/12/17 21:52:44 DEBUG ContextHandler: scope null||/v1/submissions/create @ 
o.s.j.s.ServletContextHandler{/,null}
15/12/17 21:52:44 DEBUG ContextHandler: context=||/v1/submissions/create @ 
o.s.j.s.ServletContextHandler{/,null}
15/12/17 21:52:44 DEBUG ServletHandler: servlet |/v1/submissions/create|null -> 
org.apache.spark.deploy.rest.mesos.MesosSubmitRequestServlet-368e091
15/12/17 21:52:44 DEBUG ServletHandler: chain=null
15/12/17 21:52:44 WARN ServletHandler: /v1/submissions/create
java.io.NotSerializableException: scala.collection.immutable.MapLike$$anon$1
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
 

[jira] [Commented] (SPARK-12413) Mesos ZK persistence throws a NotSerializableException

2015-12-17 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15063005#comment-15063005
 ] 

Michael Gummelt commented on SPARK-12413:
-

Updated.  Thanks

> Mesos ZK persistence throws a NotSerializableException
> --
>
> Key: SPARK-12413
> URL: https://issues.apache.org/jira/browse/SPARK-12413
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Affects Versions: 1.6.0
>Reporter: Michael Gummelt
>
> https://github.com/apache/spark/pull/10359 breaks ZK persistence due to 
> https://issues.scala-lang.org/browse/SI-6654
> This line throws a NotSerializable exception: 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster
> The MesosClusterDispatcher attempts to serialize MesosDriverDescription objects to ZK, but https://github.com/apache/spark/pull/10359 makes the {{command}} property unserializable.
> Offer id: 72f4d1ce-67f7-41b0-95a3-aa6fb208df32-O189, cpu: 3.0, mem: 12995.0
> 15/12/17 21:52:44 DEBUG ClientCnxn: Got ping response for sessionid: 
> 0x151b1d1567e0002 after 0ms
> 15/12/17 21:52:44 DEBUG nio: created 
> SCEP@2e746d70{l(/10.0.6.166:41456)<->r(/10.0.0.240:17386),s=0,open=true,ishut=false,oshut=false,rb=false,wb=false,w=true,i=0}-{AsyncHttpConnection@5dbcebe3,g=HttpGenerator{s=0,h=-1,b=-1,c=-1},p=HttpParser{s=-14,l=0,c=0},r=0}
> 15/12/17 21:52:44 DEBUG HttpParser: filled 1591/1591
> 15/12/17 21:52:44 DEBUG Server: REQUEST /v1/submissions/create on 
> AsyncHttpConnection@5dbcebe3,g=HttpGenerator{s=0,h=-1,b=-1,c=-1},p=HttpParser{s=2,l=2,c=1174},r=1
> 15/12/17 21:52:44 DEBUG ContextHandler: scope null||/v1/submissions/create @ 
> o.s.j.s.ServletContextHandler{/,null}
> 15/12/17 21:52:44 DEBUG ContextHandler: context=||/v1/submissions/create @ 
> o.s.j.s.ServletContextHandler{/,null}
> 15/12/17 21:52:44 DEBUG ServletHandler: servlet |/v1/submissions/create|null 
> -> org.apache.spark.deploy.rest.mesos.MesosSubmitRequestServlet-368e091
> 15/12/17 21:52:44 DEBUG ServletHandler: chain=null
> 15/12/17 21:52:44 WARN ServletHandler: /v1/submissions/create
> java.io.NotSerializableException: scala.collection.immutable.MapLike$$anon$1
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
>   at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
>   at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
>   at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
>   at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
>   at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
>   at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
>   at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
>   at org.apache.spark.util.Utils$.serialize(Utils.scala:83)
>   at org.apache.spark.scheduler.cluster.mesos.ZookeeperMesosClusterPersistenceEngine.persist(MesosClusterPersistenceEngine.scala:110)
>   at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.submitDriver(MesosClusterScheduler.scala:166)
>   at org.apache.spark.deploy.rest.mesos.MesosSubmitRequestServlet.handleSubmit(MesosRestServer.scala:132)
>   at org.apache.spark.deploy.rest.SubmitRequestServlet.doPost(RestSubmissionServer.scala:258)






[jira] [Updated] (SPARK-12413) Mesos ZK persistence throws a NotSerializableException

2015-12-17 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt updated SPARK-12413:

Description: 
https://github.com/apache/spark/pull/10359 breaks ZK persistence due to 
https://issues.scala-lang.org/browse/SI-6654

This line throws a NotSerializable exception: 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster

The MesosClusterDispatcher attempts to serialize MesosDriverDescription objects to ZK, but https://github.com/apache/spark/pull/10359 makes the {{command}} property unserializable.

Offer id: 72f4d1ce-67f7-41b0-95a3-aa6fb208df32-O189, cpu: 3.0, mem: 12995.0
15/12/17 21:52:44 DEBUG ClientCnxn: Got ping response for sessionid: 
0x151b1d1567e0002 after 0ms
15/12/17 21:52:44 DEBUG nio: created 
SCEP@2e746d70{l(/10.0.6.166:41456)<->r(/10.0.0.240:17386),s=0,open=true,ishut=false,oshut=false,rb=false,wb=false,w=true,i=0}-{AsyncHttpConnection@5dbcebe3,g=HttpGenerator{s=0,h=-1,b=-1,c=-1},p=HttpParser{s=-14,l=0,c=0},r=0}
15/12/17 21:52:44 DEBUG HttpParser: filled 1591/1591
15/12/17 21:52:44 DEBUG Server: REQUEST /v1/submissions/create on 
AsyncHttpConnection@5dbcebe3,g=HttpGenerator{s=0,h=-1,b=-1,c=-1},p=HttpParser{s=2,l=2,c=1174},r=1
15/12/17 21:52:44 DEBUG ContextHandler: scope null||/v1/submissions/create @ 
o.s.j.s.ServletContextHandler{/,null}
15/12/17 21:52:44 DEBUG ContextHandler: context=||/v1/submissions/create @ 
o.s.j.s.ServletContextHandler{/,null}
15/12/17 21:52:44 DEBUG ServletHandler: servlet |/v1/submissions/create|null -> 
org.apache.spark.deploy.rest.mesos.MesosSubmitRequestServlet-368e091
15/12/17 21:52:44 DEBUG ServletHandler: chain=null
15/12/17 21:52:44 WARN ServletHandler: /v1/submissions/create
java.io.NotSerializableException: scala.collection.immutable.MapLike$$anon$1
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
at org.apache.spark.util.Utils$.serialize(Utils.scala:83)
at 
org.apache.spark.scheduler.cluster.mesos.ZookeeperMesosClusterPersistenceEngine.persist(MesosClusterPersistenceEngine.scala:110)
at 
org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.submitDriver(MesosClusterScheduler.scala:166)
at 
org.apache.spark.deploy.rest.mesos.MesosSubmitRequestServlet.handleSubmit(MesosRestServer.scala:132)
at 
org.apache.spark.deploy.rest.SubmitRequestServlet.doPost(RestSubmissionServer.scala:258)
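
For reference, the {{scala.collection.immutable.MapLike$$anon$1}} in the trace is the lazy view returned by Scala's {{mapValues}}, which is what SI-6654 is about.  A minimal standalone sketch of the behavior (illustrative only, not Spark code; Scala 2.10/2.11, with a made-up map):

```
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Minimal reproduction of SI-6654 (illustrative, not Spark code).
object MapValuesSerialization {
  private def serialize(o: AnyRef): Unit =
    new ObjectOutputStream(new ByteArrayOutputStream).writeObject(o)

  def main(args: Array[String]): Unit = {
    val env = Map("SPARK_HOME" -> " /opt/spark ")

    // mapValues returns a lazy view (scala.collection.immutable.MapLike$$anon$1),
    // which is not Serializable, so serializing it throws.
    try serialize(env.mapValues(_.trim))
    catch { case e: java.io.NotSerializableException => println(e) }

    // Forcing the view into a concrete Map restores serializability.
    serialize(env.mapValues(_.trim).map(identity))
  }
}
```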

  was:
https://github.com/apache/spark/pull/10359 breaks ZK persistence due to 
https://issues.scala-lang.org/browse/SI-6654

This line throws a NotSerializable exception: 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster

Offer id: 72f4d1ce-67f7-41b0-95a3-aa6fb208df32-O189, cpu: 3.0, mem: 12995.0
15/12/17 21:52:44 DEBUG ClientCnxn: Got ping response for sessionid: 
0x151b1d1567e0002 after 0ms
15/12/17 21:52:44 DEBUG nio: created 
SCEP@2e746d70{l(/10.0.6.166:41456)<->r(/10.0.0.240:17386),s=0,open=true,ishut=false,oshut=false,rb=false,wb=false,w=true,i=0}-{AsyncHttpConnection@5dbcebe3,g=HttpGenerator{s=0,h=-1,b=-1,c=-1},p=HttpParser{s=-14,l=0,c=0},r=0}
15/12/17 21:52:44 DEBUG HttpParser: filled 1591/1591
15/12/17 21:52:44 DEBUG Server: REQUEST /v1/submissions/create on 
AsyncHttpConnection@5dbcebe3,g=HttpGenerator{s=0,h=-1,b=-1,c=-1},p=HttpParser{s=2,l=2,c=1174},r=1
15/12/17 21:52:44 DEBUG ContextHandler: scope null||/v1/submissions/create @ 
o.s.j.s.ServletContextHandler{/,null}
15/12/17 21:52:44 DEBUG ContextHandler: context=||/v1/submissions/create @ 
o.s.j.s.ServletContextHandler{/,null}
15/12/17 21:52:44 DEBUG ServletHandler: servlet |/v1/submissions/create|null -> 
org.apache.spark.deploy.rest.mesos.MesosSubmitRequestServlet-368e091
15/12/17 21:52:44 DEBUG ServletHandler: chain=null
15/12/17 21:52:44 WARN ServletHandler: /v1/submissions/create
java.io.NotSerializableException: scala.collection.immutable.MapLike$$anon$1
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at 

[jira] [Updated] (SPARK-12413) Mesos ZK persistence throws a NotSerializableException

2015-12-17 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt updated SPARK-12413:

Description: 
This breaks ZK persistence due to https://issues.scala-lang.org/browse/SI-6654

This line throws a NotSerializable exception: 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L166

```
Offer id: 72f4d1ce-67f7-41b0-95a3-aa6fb208df32-O189, cpu: 3.0, mem: 12995.0
15/12/17 21:52:44 DEBUG ClientCnxn: Got ping response for sessionid: 
0x151b1d1567e0002 after 0ms
15/12/17 21:52:44 DEBUG nio: created 
SCEP@2e746d70{l(/10.0.6.166:41456)<->r(/10.0.0.240:17386),s=0,open=true,ishut=false,oshut=false,rb=false,wb=false,w=true,i=0}-{AsyncHttpConnection@5dbcebe3,g=HttpGenerator{s=0,h=-1,b=-1,c=-1},p=HttpParser{s=-14,l=0,c=0},r=0}
15/12/17 21:52:44 DEBUG HttpParser: filled 1591/1591
15/12/17 21:52:44 DEBUG Server: REQUEST /v1/submissions/create on 
AsyncHttpConnection@5dbcebe3,g=HttpGenerator{s=0,h=-1,b=-1,c=-1},p=HttpParser{s=2,l=2,c=1174},r=1
15/12/17 21:52:44 DEBUG ContextHandler: scope null||/v1/submissions/create @ 
o.s.j.s.ServletContextHandler{/,null}
15/12/17 21:52:44 DEBUG ContextHandler: context=||/v1/submissions/create @ 
o.s.j.s.ServletContextHandler{/,null}
15/12/17 21:52:44 DEBUG ServletHandler: servlet |/v1/submissions/create|null -> 
org.apache.spark.deploy.rest.mesos.MesosSubmitRequestServlet-368e091
15/12/17 21:52:44 DEBUG ServletHandler: chain=null
15/12/17 21:52:44 WARN ServletHandler: /v1/submissions/create
java.io.NotSerializableException: scala.collection.immutable.MapLike$$anon$1
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
at org.apache.spark.util.Utils$.serialize(Utils.scala:83)
at org.apache.spark.scheduler.cluster.mesos.ZookeeperMesosClusterPersistenceEngine.persist(MesosClusterPersistenceEngine.scala:110)
at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.submitDriver(MesosClusterScheduler.scala:166)
at org.apache.spark.deploy.rest.mesos.MesosSubmitRequestServlet.handleSubmit(MesosRestServer.scala:132)
at org.apache.spark.deploy.rest.SubmitRequestServlet.doPost(RestSubmissionServer.scala:258)
```

  was:See: https://github.com/apache/spark/pull/10359#discussion_r47929981


> Mesos ZK persistence throws a NotSerializableException
> --
>
> Key: SPARK-12413
> URL: https://issues.apache.org/jira/browse/SPARK-12413
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Affects Versions: 1.6.0
>Reporter: Michael Gummelt
>
> This breaks ZK persistence due to https://issues.scala-lang.org/browse/SI-6654
> This line throws a NotSerializable exception: 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L166
> ```
> Offer id: 72f4d1ce-67f7-41b0-95a3-aa6fb208df32-O189, cpu: 3.0, mem: 12995.0
> 15/12/17 21:52:44 DEBUG ClientCnxn: Got ping response for sessionid: 
> 0x151b1d1567e0002 after 0ms
> 15/12/17 21:52:44 DEBUG nio: created 
> SCEP@2e746d70{l(/10.0.6.166:41456)<->r(/10.0.0.240:17386),s=0,open=true,ishut=false,oshut=false,rb=false,wb=false,w=true,i=0}-{AsyncHttpConnection@5dbcebe3,g=HttpGenerator{s=0,h=-1,b=-1,c=-1},p=HttpParser{s=-14,l=0,c=0},r=0}
> 15/12/17 21:52:44 DEBUG HttpParser: filled 1591/1591
> 15/12/17 21:52:44 DEBUG Server: REQUEST /v1/submissions/create on 
> AsyncHttpConnection@5dbcebe3,g=HttpGenerator{s=0,h=-1,b=-1,c=-1},p=HttpParser{s=2,l=2,c=1174},r=1
> 15/12/17 21:52:44 DEBUG ContextHandler: scope null||/v1/submissions/create @ 
> o.s.j.s.ServletContextHandler{/,null}
> 15/12/17 21:52:44 DEBUG ContextHandler: context=||/v1/submissions/create @ 
> o.s.j.s.ServletContextHandler{/,null}
> 15/12/17 21:52:44 DEBUG ServletHandler: servlet |/v1/submissions/create|null 
> -> org.apache.spark.deploy.rest.mesos.MesosSubmitRequestServlet-368e091
> 15/12/17 21:52:44 DEBUG ServletHandler: chain=null
> 15/12/17 21:52:44 WARN ServletHandler: 

[jira] [Commented] (SPARK-16194) No way to dynamically set env vars on driver in cluster mode

2016-06-24 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15348650#comment-15348650
 ] 

Michael Gummelt commented on SPARK-16194:
-

Ah, yeah, that's what I need.  I'd like to make this standard.

> No way to dynamically set env vars on driver in cluster mode
> 
>
> Key: SPARK-16194
> URL: https://issues.apache.org/jira/browse/SPARK-16194
> Project: Spark
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Michael Gummelt
>Priority: Minor
>
> I often need to dynamically configure a driver when submitting in cluster 
> mode, but there's currently no way of setting env vars.  {{spark-env.sh}} 
> lets me set env vars, but I have to statically build that into my spark 
> distribution.  I need a solution for specifying them in {{spark-submit}}.  
> Much like {{spark.executorEnv.[ENV]}}, but for drivers.






[jira] [Commented] (SPARK-16194) No way to dynamically set env vars on driver in cluster mode

2016-06-24 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15348645#comment-15348645
 ] 

Michael Gummelt commented on SPARK-16194:
-

> Env variables are pretty much from outside Spark right?

They're my own env vars, yeah.  The motivating case is setting "SSL_ENABLED" on the driver to enable Mesos SSL support.

> Generally, these are being removed and deprecated anyway.

You mean the Spark env vars like SPARK_SUBMIT_OPTS?  That's good to hear, but 
that's not what I'm talking about.

> Any chance of just using a sys property or command line alternative?

libmesos ultimately needs SSL_ENABLED, so every Spark job I submit would have to convert from the sys property to the env var, which is infeasible.

I realize this may be a corner case, but it would bring us into line with spark.executorEnv.[ENV].

> No way to dynamically set env vars on driver in cluster mode
> 
>
> Key: SPARK-16194
> URL: https://issues.apache.org/jira/browse/SPARK-16194
> Project: Spark
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Michael Gummelt
>Priority: Minor
>
> I often need to dynamically configure a driver when submitting in cluster 
> mode, but there's currently no way of setting env vars.  {{spark-env.sh}} 
> lets me set env vars, but I have to statically build that into my spark 
> distribution.  I need a solution for specifying them in {{spark-submit}}.  
> Much like {{spark.executorEnv.[ENV]}}, but for drivers.






[jira] [Created] (SPARK-16194) No way to dynamically set env vars on driver in cluster mode

2016-06-24 Thread Michael Gummelt (JIRA)
Michael Gummelt created SPARK-16194:
---

 Summary: No way to dynamically set env vars on driver in cluster 
mode
 Key: SPARK-16194
 URL: https://issues.apache.org/jira/browse/SPARK-16194
 Project: Spark
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Michael Gummelt


I often need to dynamically configure a driver when submitting in cluster mode, 
but there's currently no way of setting env vars.  {{spark-env.sh}} lets me set 
env vars, but I have to statically build that into my spark distribution.  I 
need a solution for specifying them in {{spark-submit}}.  Much like 
{{spark.executorEnv.[ENV]}}, but for drivers.
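
As a rough illustration of what a driver-side analogue could look like (a hedged sketch only, not an existing Spark API; the {{spark.mesos.driverEnv.}} prefix, object name, and method here are assumptions), the Mesos cluster scheduler could copy prefixed properties into the driver's Mesos command environment:

```
import org.apache.mesos.Protos.{CommandInfo, Environment}

// Hedged sketch, not Spark's actual implementation: forward properties with
// an assumed "spark.mesos.driverEnv." prefix into the driver's Mesos command
// environment, mirroring what spark.executorEnv.[ENV] does for executors.
object DriverEnvSketch {
  val Prefix = "spark.mesos.driverEnv."

  def buildDriverCommand(sparkProps: Map[String, String]): CommandInfo = {
    val envBuilder = Environment.newBuilder()
    sparkProps.foreach {
      case (key, value) if key.startsWith(Prefix) =>
        envBuilder.addVariables(
          Environment.Variable.newBuilder()
            .setName(key.stripPrefix(Prefix))
            .setValue(value))
      case _ => // not a driver env var; handled elsewhere
    }
    CommandInfo.newBuilder()
      .setValue("./bin/spark-submit ...") // actual launch command elided
      .setEnvironment(envBuilder)
      .build()
  }
}
```

With something like this, {{SSL_ENABLED}} could be set per submission, e.g. {{--conf spark.mesos.driverEnv.SSL_ENABLED=true}} (again, assuming such a property were added).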






[jira] [Created] (SPARK-13258) --conf variables not honored in Mesos cluster mode

2016-02-09 Thread Michael Gummelt (JIRA)
Michael Gummelt created SPARK-13258:
---

 Summary: --conf variables not honored in Mesos cluster mode
 Key: SPARK-13258
 URL: https://issues.apache.org/jira/browse/SPARK-13258
 Project: Spark
  Issue Type: Bug
  Components: Mesos
Affects Versions: 1.6.0
Reporter: Michael Gummelt


Spark properties set via the deprecated {{SPARK_JAVA_OPTS}} are passed along to 
the driver, but those set via the preferred {{--conf}} are not.

This results in the URI being fetched in the executor:

{{SPARK_JAVA_OPTS="-Dspark.mesos.uris=https://raw.githubusercontent.com/mesosphere/spark/master/README.md
 -Dspark.mesos.executor.docker.image=mesosphere/spark:1.6.0" ./bin/spark-submit 
--deploy-mode cluster --master mesos://10.0.78.140:7077  --class 
org.apache.spark.examples.SparkPi 
http://downloads.mesosphere.com.s3.amazonaws.com/assets/spark/spark-examples_2.10-1.5.0.jar}}

This does not:

{{SPARK_JAVA_OPTS="-Dspark.mesos.executor.docker.image=mesosphere/spark:1.6.0" 
./bin/spark-submit --deploy-mode cluster --master mesos://10.0.78.140:7077 
--conf 
spark.mesos.uris=https://raw.githubusercontent.com/mesosphere/spark/master/README.md
 --class org.apache.spark.examples.SparkPi 
http://downloads.mesosphere.com.s3.amazonaws.com/assets/spark/spark-examples_2.10-1.5.0.jar}}

https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L369
In the above line of code, you can see that SPARK_JAVA_OPTS is passed along to 
the driver, so those properties take effect.

https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L373
Whereas in this line of code, you see that {{--conf}} variables are set on 
{{SPARK_EXECUTOR_OPTS}}, which AFAICT has absolutely no effect because this env 
var is being set on the driver, not the executor.
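
For what it's worth, a tiny sketch of the direction a fix could take (an assumption, not the actual patch): render the submitted Spark properties as {{-D}} system properties on the driver command itself, rather than stashing them in {{SPARK_EXECUTOR_OPTS}}:

```
// Hedged sketch, not the actual patch: turn submitted Spark properties into
// -Dkey=value JVM options for the *driver* command in cluster mode.
// Values containing spaces would additionally need shell quoting.
def driverJavaOpts(submittedProps: Map[String, String]): String =
  submittedProps
    .map { case (key, value) => s"-D$key=$value" }
    .mkString(" ")

// Example:
//   driverJavaOpts(Map("spark.mesos.uris" -> "https://raw.githubusercontent.com/mesosphere/spark/master/README.md"))
//   == "-Dspark.mesos.uris=https://raw.githubusercontent.com/mesosphere/spark/master/README.md"
```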








[jira] [Updated] (SPARK-13258) --conf properties not honored in Mesos cluster mode

2016-02-09 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt updated SPARK-13258:

Summary: --conf properties not honored in Mesos cluster mode  (was: --conf 
variables not honored in Mesos cluster mode)

> --conf properties not honored in Mesos cluster mode
> ---
>
> Key: SPARK-13258
> URL: https://issues.apache.org/jira/browse/SPARK-13258
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Affects Versions: 1.6.0
>Reporter: Michael Gummelt
>
> Spark properties set via the deprecated {{SPARK_JAVA_OPTS}} are passed along 
> to the driver, but those set via the preferred {{--conf}} are not.
> This results in the URI being fetched in the executor:
> {{SPARK_JAVA_OPTS="-Dspark.mesos.uris=https://raw.githubusercontent.com/mesosphere/spark/master/README.md
>  -Dspark.mesos.executor.docker.image=mesosphere/spark:1.6.0" 
> ./bin/spark-submit --deploy-mode cluster --master mesos://10.0.78.140:7077  
> --class org.apache.spark.examples.SparkPi 
> http://downloads.mesosphere.com.s3.amazonaws.com/assets/spark/spark-examples_2.10-1.5.0.jar}}
> This does not:
> {{SPARK_JAVA_OPTS="-Dspark.mesos.executor.docker.image=mesosphere/spark:1.6.0"
>  ./bin/spark-submit --deploy-mode cluster --master mesos://10.0.78.140:7077 
> --conf 
> spark.mesos.uris=https://raw.githubusercontent.com/mesosphere/spark/master/README.md
>  --class org.apache.spark.examples.SparkPi 
> http://downloads.mesosphere.com.s3.amazonaws.com/assets/spark/spark-examples_2.10-1.5.0.jar}}
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L369
> In the above line of code, you can see that SPARK_JAVA_OPTS is passed along 
> to the driver, so those properties take effect.
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L373
> Whereas in this line of code, you see that {{--conf}} variables are set on 
> {{SPARK_EXECUTOR_OPTS}}, which AFAICT has absolutely no effect because this 
> env var is being set on the driver, not the executor.






[jira] [Updated] (SPARK-13258) --conf properties not honored in Mesos cluster mode

2016-02-09 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt updated SPARK-13258:

Description: 
Spark properties set on {{spark-submit}} via the deprecated {{SPARK_JAVA_OPTS}} 
are passed along to the driver, but those set via the preferred {{--conf}} are 
not.

For example, this results in the URI being fetched in the executor:

{{SPARK_JAVA_OPTS="-Dspark.mesos.uris=https://raw.githubusercontent.com/mesosphere/spark/master/README.md
 -Dspark.mesos.executor.docker.image=mesosphere/spark:1.6.0" ./bin/spark-submit 
--deploy-mode cluster --master mesos://10.0.78.140:7077  --class 
org.apache.spark.examples.SparkPi 
http://downloads.mesosphere.com.s3.amazonaws.com/assets/spark/spark-examples_2.10-1.5.0.jar}}

This does not:

{{SPARK_JAVA_OPTS="-Dspark.mesos.executor.docker.image=mesosphere/spark:1.6.0" 
./bin/spark-submit --deploy-mode cluster --master mesos://10.0.78.140:7077 
--conf 
spark.mesos.uris=https://raw.githubusercontent.com/mesosphere/spark/master/README.md
 --class org.apache.spark.examples.SparkPi 
http://downloads.mesosphere.com.s3.amazonaws.com/assets/spark/spark-examples_2.10-1.5.0.jar}}

https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L369
In the above line of code, you can see that SPARK_JAVA_OPTS is passed along to 
the driver, so those properties take effect.

https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L373
Whereas in this line of code, you see that {{--conf}} variables are set on 
{{SPARK_EXECUTOR_OPTS}}, which AFAICT has absolutely no effect because this env 
var is being set on the driver, not the executor.



  was:
Spark properties set via the deprecated {{SPARK_JAVA_OPTS}} are passed along to 
the driver, but those set via the preferred {{--conf}} are not.

This results in the URI being fetched in the executor:

{{SPARK_JAVA_OPTS="-Dspark.mesos.uris=https://raw.githubusercontent.com/mesosphere/spark/master/README.md
 -Dspark.mesos.executor.docker.image=mesosphere/spark:1.6.0" ./bin/spark-submit 
--deploy-mode cluster --master mesos://10.0.78.140:7077  --class 
org.apache.spark.examples.SparkPi 
http://downloads.mesosphere.com.s3.amazonaws.com/assets/spark/spark-examples_2.10-1.5.0.jar}}

This does not:

{{SPARK_JAVA_OPTS="-Dspark.mesos.executor.docker.image=mesosphere/spark:1.6.0" 
./bin/spark-submit --deploy-mode cluster --master mesos://10.0.78.140:7077 
--conf 
spark.mesos.uris=https://raw.githubusercontent.com/mesosphere/spark/master/README.md
 --class org.apache.spark.examples.SparkPi 
http://downloads.mesosphere.com.s3.amazonaws.com/assets/spark/spark-examples_2.10-1.5.0.jar}}

https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L369
In the above line of code, you can see that SPARK_JAVA_OPTS is passed along to 
the driver, so those properties take effect.

https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L373
Whereas in this line of code, you see that {{--conf}} variables are set on 
{{SPARK_EXECUTOR_OPTS}}, which AFAICT has absolutely no effect because this env 
var is being set on the driver, not the executor.




> --conf properties not honored in Mesos cluster mode
> ---
>
> Key: SPARK-13258
> URL: https://issues.apache.org/jira/browse/SPARK-13258
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Affects Versions: 1.6.0
>Reporter: Michael Gummelt
>
> Spark properties set on {{spark-submit}} via the deprecated 
> {{SPARK_JAVA_OPTS}} are passed along to the driver, but those set via the 
> preferred {{--conf}} are not.
> For example, this results in the URI being fetched in the executor:
> {{SPARK_JAVA_OPTS="-Dspark.mesos.uris=https://raw.githubusercontent.com/mesosphere/spark/master/README.md
>  -Dspark.mesos.executor.docker.image=mesosphere/spark:1.6.0" 
> ./bin/spark-submit --deploy-mode cluster --master mesos://10.0.78.140:7077  
> --class org.apache.spark.examples.SparkPi 
> http://downloads.mesosphere.com.s3.amazonaws.com/assets/spark/spark-examples_2.10-1.5.0.jar}}
> This does not:
> {{SPARK_JAVA_OPTS="-Dspark.mesos.executor.docker.image=mesosphere/spark:1.6.0"
>  ./bin/spark-submit --deploy-mode cluster --master mesos://10.0.78.140:7077 
> --conf 
> spark.mesos.uris=https://raw.githubusercontent.com/mesosphere/spark/master/README.md
>  --class org.apache.spark.examples.SparkPi 
> http://downloads.mesosphere.com.s3.amazonaws.com/assets/spark/spark-examples_2.10-1.5.0.jar}}
> 

[jira] [Created] (SPARK-13259) SPARK_HOME should not be used as the CWD in docker executors

2016-02-09 Thread Michael Gummelt (JIRA)
Michael Gummelt created SPARK-13259:
---

 Summary: SPARK_HOME should not be used as the CWD in docker 
executors
 Key: SPARK-13259
 URL: https://issues.apache.org/jira/browse/SPARK-13259
 Project: Spark
  Issue Type: Bug
  Components: Mesos
Affects Versions: 1.6.0
Reporter: Michael Gummelt
Priority: Minor


I have a Docker image that explicitly sets WORKDIR.  However, I also have to set {{spark.mesos.executor.home}} when submitting in client mode; otherwise the cwd is set to the SPARK_HOME of the driver.  SPARK_HOME should never be used in Docker executors, since the executor runs on a different file system.






[jira] [Updated] (SPARK-13439) Document that spark.mesos.uris is comma-separated

2016-02-22 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt updated SPARK-13439:

Description: 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerUtils.scala#L346

> Document that spark.mesos.uris is comma-separated
> -
>
> Key: SPARK-13439
> URL: https://issues.apache.org/jira/browse/SPARK-13439
> Project: Spark
>  Issue Type: Documentation
>  Components: Mesos
>Reporter: Michael Gummelt
>
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerUtils.scala#L346






[jira] [Created] (SPARK-13439) Document that spark.mesos.uris is comma-separated

2016-02-22 Thread Michael Gummelt (JIRA)
Michael Gummelt created SPARK-13439:
---

 Summary: Document that spark.mesos.uris is comma-separated
 Key: SPARK-13439
 URL: https://issues.apache.org/jira/browse/SPARK-13439
 Project: Spark
  Issue Type: Documentation
  Components: Mesos
Reporter: Michael Gummelt









[jira] [Updated] (SPARK-14180) Deadlock in CoarseGrainedExecutorBackend Shutdown

2016-03-26 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt updated SPARK-14180:

Description: 
I'm fairly certain that https://github.com/apache/spark/pull/11031 introduced a 
deadlock in executor shutdown.  The result is executor shutdown hangs 
indefinitely.  In Mesos at least, this lasts until 
{{spark.mesos.coarse.shutdownTimeout}} (default 10s), at which point the driver 
stops, which force kills the executors.

The deadlock is as follows:
- CoarseGrainedExecutorBackend receives a Shutdown message, which now blocks on 
rpcEnv.awaitTermination() 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkEnv.scala#L95
- rpcEnv.awaitTermination() blocks on dispatcher.awaitTermination(), which 
blocks until all dispatcher threads (MessageLoop threads) terminate
- However, the initial Shutdown message handling is itself handled by a 
Dispatcher MessageLoop thread.  This mutual dependence results in a deadlock. 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rpc/netty/Dispatcher.scala#L216
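
The pattern is easy to reproduce outside Spark.  A self-contained sketch with {{java.util.concurrent}} (illustrative names only, not Spark's RpcEnv/Dispatcher classes):

```
import java.util.concurrent.{Executors, TimeUnit}

// Illustrative reproduction of the deadlock described above: a task running
// on a single-thread pool blocks on that same pool's termination, which can
// never complete because the waiting task is itself the pool's only thread.
object SelfAwaitDeadlock {
  def main(args: Array[String]): Unit = {
    val messageLoop = Executors.newSingleThreadExecutor()
    messageLoop.submit(new Runnable {
      def run(): Unit = {
        messageLoop.shutdown()
        // Analogous to the Shutdown handler calling rpcEnv.awaitTermination()
        // from a dispatcher MessageLoop thread; without the timeout this
        // would block forever.
        val terminated = messageLoop.awaitTermination(10, TimeUnit.SECONDS)
        println(s"terminated cleanly: $terminated") // prints false after ~10s
      }
    })
  }
}
```

The usual remedy is to perform the blocking shutdown from a thread the dispatcher does not own, so awaitTermination() is never waiting on the very thread that called it.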

  was:
I'm fairly certain that https://github.com/apache/spark/pull/11031 introduced a 
deadlock in executor shutdown.  The result is executor shutdown hangs 
indefinitely.  In Mesos at least, this lasts until 
{{spark.mesos.coarse.shutdownTimeout}} (default 10s), at which point the driver 
stops, which force kills the executors.

The deadlock is as follows:
- CoarseGrainedExecutorBackend receives a Shutdown message, which now blocks on 
rpcEnv.awaitTermination() 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkEnv.scala#L95
- rpcEnv.awaitTermination() blocks on dispatcher.awaitTermination(), which 
blocks until all dispatcher threads (MessageLoop threads) terminate
- However, the initial Shutdown message handling is itself handled by a 
Dispatcher MessageLoop thread.  This mutual dependence results in a deadlock.


> Deadlock in CoarseGrainedExecutorBackend Shutdown
> -
>
> Key: SPARK-14180
> URL: https://issues.apache.org/jira/browse/SPARK-14180
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0
> Environment: master branch.  commit 
> d6dc12ef0146ae409834c78737c116050961f350
>Reporter: Michael Gummelt
>Priority: Blocker
>
> I'm fairly certain that https://github.com/apache/spark/pull/11031 introduced 
> a deadlock in executor shutdown.  The result is executor shutdown hangs 
> indefinitely.  In Mesos at least, this lasts until 
> {{spark.mesos.coarse.shutdownTimeout}} (default 10s), at which point the 
> driver stops, which force kills the executors.
> The deadlock is as follows:
> - CoarseGrainedExecutorBackend receives a Shutdown message, which now blocks 
> on rpcEnv.awaitTermination() 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkEnv.scala#L95
> - rpcEnv.awaitTermination() blocks on dispatcher.awaitTermination(), which 
> blocks until all dispatcher threads (MessageLoop threads) terminate
> - However, the initial Shutdown message handling is itself handled by a 
> Dispatcher MessageLoop thread.  This mutual dependence results in a deadlock. 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rpc/netty/Dispatcher.scala#L216






[jira] [Created] (SPARK-14180) Deadlock in CoarseGrainedExecutorBackend Shutdown

2016-03-26 Thread Michael Gummelt (JIRA)
Michael Gummelt created SPARK-14180:
---

 Summary: Deadlock in CoarseGrainedExecutorBackend Shutdown
 Key: SPARK-14180
 URL: https://issues.apache.org/jira/browse/SPARK-14180
 Project: Spark
  Issue Type: Bug
Affects Versions: 2.0.0
 Environment: master branch.  commit 
d6dc12ef0146ae409834c78737c116050961f350
Reporter: Michael Gummelt
Priority: Blocker


I'm fairly certain that https://github.com/apache/spark/pull/11031 introduced a 
deadlock in executor shutdown.  The result is executor shutdown hangs 
indefinitely.  In Mesos at least, this lasts until 
{{spark.mesos.coarse.shutdownTimeout}} (default 10s), at which point the driver 
stops, which force kills the executors.

The deadlock is as follows:
- CoarseGrainedExecutorBackend receives a Shutdown message, which now blocks on 
rpcEnv.awaitTermination() 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkEnv.scala#L95
- rpcEnv.awaitTermination() blocks on dispatcher.awaitTermination(), which 
blocks until all dispatcher threads (MessageLoop threads) terminate
- However, the initial Shutdown message handling is itself handled by a 
Dispatcher MessageLoop thread.  This mutual dependence results in a deadlock.






[jira] [Commented] (SPARK-14180) Deadlock in CoarseGrainedExecutorBackend Shutdown

2016-03-26 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15213184#comment-15213184
 ] 

Michael Gummelt commented on SPARK-14180:
-

cc [~zsxwing]

> Deadlock in CoarseGrainedExecutorBackend Shutdown
> -
>
> Key: SPARK-14180
> URL: https://issues.apache.org/jira/browse/SPARK-14180
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0
> Environment: master branch.  commit 
> d6dc12ef0146ae409834c78737c116050961f350
>Reporter: Michael Gummelt
>Priority: Blocker
>
> I'm fairly certain that https://github.com/apache/spark/pull/11031 introduced 
> a deadlock in executor shutdown.  The result is executor shutdown hangs 
> indefinitely.  In Mesos at least, this lasts until 
> {{spark.mesos.coarse.shutdownTimeout}} (default 10s), at which point the 
> driver stops, which force kills the executors.
> The deadlock is as follows:
> - CoarseGrainedExecutorBackend receives a Shutdown message, which now blocks 
> on rpcEnv.awaitTermination() 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkEnv.scala#L95
> - rpcEnv.awaitTermination() blocks on dispatcher.awaitTermination(), which 
> blocks until all dispatcher threads (MessageLoop threads) terminate
> - However, the initial Shutdown message handling is itself handled by a 
> Dispatcher MessageLoop thread.  This mutual dependence results in a deadlock.






[jira] [Commented] (SPARK-13258) --conf properties not honored in Mesos cluster mode

2016-04-04 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224690#comment-15224690
 ] 

Michael Gummelt commented on SPARK-13258:
-

[~jayv] Does your PR fix this problem?

> --conf properties not honored in Mesos cluster mode
> ---
>
> Key: SPARK-13258
> URL: https://issues.apache.org/jira/browse/SPARK-13258
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Affects Versions: 1.6.0
>Reporter: Michael Gummelt
>
> Spark properties set on {{spark-submit}} via the deprecated 
> {{SPARK_JAVA_OPTS}} are passed along to the driver, but those set via the 
> preferred {{--conf}} are not.
> For example, this results in the URI being fetched in the executor:
> {{SPARK_JAVA_OPTS="-Dspark.mesos.uris=https://raw.githubusercontent.com/mesosphere/spark/master/README.md
>  -Dspark.mesos.executor.docker.image=mesosphere/spark:1.6.0" 
> ./bin/spark-submit --deploy-mode cluster --master mesos://10.0.78.140:7077  
> --class org.apache.spark.examples.SparkPi 
> http://downloads.mesosphere.com.s3.amazonaws.com/assets/spark/spark-examples_2.10-1.5.0.jar}}
> This does not:
> {{SPARK_JAVA_OPTS="-Dspark.mesos.executor.docker.image=mesosphere/spark:1.6.0"
>  ./bin/spark-submit --deploy-mode cluster --master mesos://10.0.78.140:7077 
> --conf 
> spark.mesos.uris=https://raw.githubusercontent.com/mesosphere/spark/master/README.md
>  --class org.apache.spark.examples.SparkPi 
> http://downloads.mesosphere.com.s3.amazonaws.com/assets/spark/spark-examples_2.10-1.5.0.jar}}
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L369
> In the above line of code, you can see that SPARK_JAVA_OPTS is passed along 
> to the driver, so those properties take effect.
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L373
> Whereas in this line of code, you see that {{--conf}} variables are set on 
> {{SPARK_EXECUTOR_OPTS}}, which AFAICT has absolutely no effect because this 
> env var is being set on the driver, not the executor.






[jira] [Updated] (SPARK-14180) Deadlock in CoarseGrainedExecutorBackend Shutdown

2016-03-28 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt updated SPARK-14180:

Affects Version/s: (was: 2.0.0)

> Deadlock in CoarseGrainedExecutorBackend Shutdown
> -
>
> Key: SPARK-14180
> URL: https://issues.apache.org/jira/browse/SPARK-14180
> Project: Spark
>  Issue Type: Bug
> Environment: master branch.  commit 
> d6dc12ef0146ae409834c78737c116050961f350
>Reporter: Michael Gummelt
>Priority: Blocker
> Fix For: 2.0.0
>
>
> I'm fairly certain that https://github.com/apache/spark/pull/11031 introduced 
> a deadlock in executor shutdown.  The result is that executor shutdown hangs 
> indefinitely.  In Mesos at least, this lasts until 
> {{spark.mesos.coarse.shutdownTimeout}} (default 10s), at which point the 
> driver stops, which force-kills the executors.
> The deadlock is as follows:
> - CoarseGrainedExecutorBackend receives a Shutdown message, which now blocks 
> on rpcEnv.awaitTermination() 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkEnv.scala#L95
> - rpcEnv.awaitTermination() blocks on dispatcher.awaitTermination(), which 
> blocks until all dispatcher threads (MessageLoop threads) terminate
> - However, the initial Shutdown message handling is itself handled by a 
> Dispatcher MessageLoop thread.  This mutual dependence results in a deadlock. 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rpc/netty/Dispatcher.scala#L216
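
For readers unfamiliar with this failure mode, here is a self-contained 
illustration of the pattern, using plain java.util.concurrent types rather 
than Spark's rpc classes (a hypothetical stand-in, not Spark code):

{code}
import java.util.concurrent.{Executors, TimeUnit}

// A task running on a pool shuts the pool down and then waits for its
// termination.  awaitTermination cannot return until every pool thread
// finishes, but the waiting thread is itself one of those threads.
object SelfDeadlock {
  def main(args: Array[String]): Unit = {
    val pool = Executors.newSingleThreadExecutor()
    pool.submit(new Runnable {
      override def run(): Unit = {
        pool.shutdown()
        // Deadlock: this thread waits for its own termination.
        pool.awaitTermination(1, TimeUnit.DAYS)
        println("unreachable before the timeout")
      }
    })
  }
}
{code}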






[jira] [Updated] (SPARK-14180) Deadlock in CoarseGrainedExecutorBackend Shutdown

2016-03-28 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt updated SPARK-14180:

Fix Version/s: 2.0.0

> Deadlock in CoarseGrainedExecutorBackend Shutdown
> -
>
> Key: SPARK-14180
> URL: https://issues.apache.org/jira/browse/SPARK-14180
> Project: Spark
>  Issue Type: Bug
> Environment: master branch.  commit 
> d6dc12ef0146ae409834c78737c116050961f350
>Reporter: Michael Gummelt
>Priority: Blocker
> Fix For: 2.0.0
>
>
> I'm fairly certain that https://github.com/apache/spark/pull/11031 introduced 
> a deadlock in executor shutdown.  The result is that executor shutdown hangs 
> indefinitely.  In Mesos at least, this lasts until 
> {{spark.mesos.coarse.shutdownTimeout}} (default 10s), at which point the 
> driver stops, which force-kills the executors.
> The deadlock is as follows:
> - CoarseGrainedExecutorBackend receives a Shutdown message, which now blocks 
> on rpcEnv.awaitTermination() 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkEnv.scala#L95
> - rpcEnv.awaitTermination() blocks on dispatcher.awaitTermination(), which 
> blocks until all dispatcher threads (MessageLoop threads) terminate
> - However, the initial Shutdown message handling is itself handled by a 
> Dispatcher MessageLoop thread.  This mutual dependence results in a deadlock. 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rpc/netty/Dispatcher.scala#L216






[jira] [Created] (SPARK-14822) Add lazy executor startup to Mesos Scheduler

2016-04-21 Thread Michael Gummelt (JIRA)
Michael Gummelt created SPARK-14822:
---

 Summary: Add lazy executor startup to Mesos Scheduler
 Key: SPARK-14822
 URL: https://issues.apache.org/jira/browse/SPARK-14822
 Project: Spark
  Issue Type: Task
  Components: Mesos
Reporter: Michael Gummelt


As we deprecate fine-grained mode, we need to make sure we have alternative 
solutions for its benefits.

Its two benefits are:

0. lazy executor startup
  In fine-grained mode, executors are brought up only as tasks are scheduled.  
This means that a user doesn't have to set {{spark.cores.max}} to ensure that 
the app doesn't consume all resources in the cluster.  (A minimal sketch of 
this follows below.)

1. relinquishing cores
  As a Spark task terminates, the Mesos task it is bound to terminates as 
well, thus relinquishing the cores assigned to it.

I'd like to add {{0.}} to coarse-grained mode, possibly enabled with a 
configuration param.  If https://issues.apache.org/jira/browse/MESOS-1279 ever 
happens, we can add {{1.}} as well.

cc [~tnachen] [~dragos] [~skonto] [~andrewor14]
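
A minimal sketch of benefit 0. (lazy executor startup), using hypothetical 
names (Offer, pendingTasks, and the launch/decline actions are illustrative 
stand-ins, not the MesosCoarseGrainedSchedulerBackend API):

{code}
case class Offer(id: String, cores: Int)

object LazyStartupSketch {
  var pendingTasks = 3  // tasks waiting to be scheduled
  var heldCores = 0     // cores already held by launched executors

  def resourceOffers(offers: Seq[Offer]): Unit =
    offers.foreach { offer =>
      if (pendingTasks > heldCores) {
        heldCores += offer.cores  // take the offer only when there is demand
        println(s"launch executor on ${offer.id} (${offer.cores} cores)")
      } else {
        println(s"decline ${offer.id}")  // leave resources for other frameworks
      }
    }

  def main(args: Array[String]): Unit =
    resourceOffers(Seq(Offer("o1", 2), Offer("o2", 2), Offer("o3", 2)))
}
{code}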







[jira] [Commented] (SPARK-14977) Fine grained mode in Mesos is not fair

2016-04-28 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15263393#comment-15263393
 ] 

Michael Gummelt commented on SPARK-14977:
-

I assume your first two jobs are long running?  Mesos doesn't offer resources 
to the third app, because there are no more resources to offer.  They've 
already been offered to the first two apps.  We're looking into support for 
revocable resources to solve this problem.  You can also partition your cluster 
via roles if you'd like certain jobs to have guaranteed resources.
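
For reference, both mechanisms mentioned above are plain Spark properties.  A 
minimal conf sketch (the role name "batch" is just an example):

{code}
import org.apache.spark.SparkConf

// Cap the app's total cores and accept only offers reserved for a given role,
// so a single app can no longer absorb the whole cluster.
val conf = new SparkConf()
  .setAppName("capped-job")
  .set("spark.cores.max", "2")       // upper bound on cores across the cluster
  .set("spark.mesos.role", "batch")  // accept offers reserved for this role
{code}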

> Fine grained mode in Mesos is not fair
> --
>
> Key: SPARK-14977
> URL: https://issues.apache.org/jira/browse/SPARK-14977
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Affects Versions: 2.1.0
> Environment: Spark commit db75ccb, Debian jessie, Mesos fine grained
>Reporter: Luca Bruno
>
> I've set up a Mesos cluster and I'm running Spark in fine-grained mode.
> Spark defaults to 2 executor cores and 2gb of ram.
> The total mesos cluster has 8 cores and 8gb of ram.
> When I submit two spark jobs simultaneously, spark will always accept full 
> resources, leading the two frameworks to use 4gb of ram each instead of 2gb.
> If I submit another spark job, it will not get offered resources from mesos, 
> at least using the default HierarchicalDRF allocator module.
> Mesos will keep offering 4gb of ram to earlier spark jobs, and spark keeps 
> accepting full resources for every new task.
> Hence new spark jobs have no chance of getting a share.
> Is this something to be solved with a custom Mesos allocator? Or should Spark 
> be more fair instead? Or maybe provide a configuration option to always 
> accept with the minimum resources?






[jira] [Commented] (SPARK-10643) Support HDFS application download in client mode spark submit

2016-05-10 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15279318#comment-15279318
 ] 

Michael Gummelt commented on SPARK-10643:
-

+1 to fix this.  

> Support HDFS application download in client mode spark submit
> -
>
> Key: SPARK-10643
> URL: https://issues.apache.org/jira/browse/SPARK-10643
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Submit
>Reporter: Alan Braithwaite
>Priority: Minor
>
> When using Mesos with Docker and Marathon, it would be nice to be able to 
> make spark-submit deployable on Marathon and have it download a jar from 
> HDFS instead of having to package the jar into the Docker image.
> {code}
> $ docker run -it docker.example.com/spark:latest 
> /usr/local/spark/bin/spark-submit  --class 
> com.example.spark.streaming.EventHandler hdfs://hdfs/tmp/application.jar 
> Warning: Skip remote jar hdfs://hdfs/tmp/application.jar.
> java.lang.ClassNotFoundException: com.example.spark.streaming.EventHandler
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:348)
> at org.apache.spark.util.Utils$.classForName(Utils.scala:173)
> at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:639)
> at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> {code}
> Although I'm aware that we can run in cluster mode with mesos, we've already 
> built some nice tools surrounding marathon for logging and monitoring.
> Code in question:
> https://github.com/apache/spark/blob/132718ad7f387e1002b708b19e471d9cd907e105/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L723-L736






[jira] [Commented] (SPARK-15271) Allow force pulling executor docker images

2016-05-12 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15281838#comment-15281838
 ] 

Michael Gummelt commented on SPARK-15271:
-

Much needed, thanks.

> Allow force pulling executor docker images
> --
>
> Key: SPARK-15271
> URL: https://issues.apache.org/jira/browse/SPARK-15271
> Project: Spark
>  Issue Type: New Feature
>  Components: Mesos
>Affects Versions: 1.6.1
>Reporter: Philipp Hoffmann
>
> Mesos agents by default will not pull docker images that are already cached 
> locally.
> Because of this, in order to run the current version of a mutable tag (like 
> {{...:latest}}) from the docker repository, you have to explicitly tell the 
> Mesos agent to pull the image (force pull). Otherwise the Mesos agent will 
> run an old (cached) version.
> The feature for force pulling the image was introduced in Mesos 0.22:
> https://github.com/apache/mesos/commit/8682569df528717ff5efb64da26b1b49c39c4efd
> This ticket is about making use of this feature in Spark in order to force 
> Mesos agents to pull the executor's docker image.
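
A conf-level sketch of the proposal (the property name below is what this 
ticket is asking for; treat it as illustrative, not as an existing Spark 
config):

{code}
import org.apache.spark.SparkConf

// Hypothetical knob: always re-pull the executor image so mutable tags such
// as ...:latest pick up the current version instead of a stale cached one.
val conf = new SparkConf()
  .set("spark.mesos.executor.docker.image", "mesosphere/spark:latest")
  .set("spark.mesos.executor.docker.forcePullImage", "true")
{code}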






[jira] [Commented] (SPARK-14977) Fine grained mode in Mesos is not fair

2016-05-05 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15273202#comment-15273202
 ] 

Michael Gummelt commented on SPARK-14977:
-

[~lethalman]: Fine-grained mode only releases cores, not memory.  It's 
impossible for us to shrink the memory allocation without OOM-ing the executor, 
because the JVM doesn't relinquish memory back to the OS.

You can use dynamic allocation to terminate entire executors as they become 
idle.

Also, FYI, fine-grained mode will soon be deprecated in favor of dynamic 
allocation.
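
A minimal sketch of the dynamic-allocation settings mentioned above (the 
timeout value is an example; the external shuffle service must be running on 
each agent for executors to be removed safely):

{code}
import org.apache.spark.SparkConf

// Release executors that have been idle for a while instead of holding their
// resources for the lifetime of the app.
val conf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true")
  .set("spark.dynamicAllocation.executorIdleTimeout", "60s")
{code}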

> Fine grained mode in Mesos is not fair
> --
>
> Key: SPARK-14977
> URL: https://issues.apache.org/jira/browse/SPARK-14977
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Affects Versions: 2.1.0
> Environment: Spark commit db75ccb, Debian jessie, Mesos fine grained
>Reporter: Luca Bruno
>
> I've set up a Mesos cluster and I'm running Spark in fine-grained mode.
> Spark defaults to 2 executor cores and 2gb of ram.
> The total mesos cluster has 8 cores and 8gb of ram.
> When I submit two spark jobs simultaneously, spark will always accept full 
> resources, leading the two frameworks to use 4gb of ram each instead of 2gb.
> If I submit another spark job, it will not get offered resources from mesos, 
> at least using the default HierarchicalDRF allocator module.
> Mesos will keep offering 4gb of ram to earlier spark jobs, and spark keeps 
> accepting full resources for every new task.
> Hence new spark jobs have no chance of getting a share.
> Is this something to be solved with a custom Mesos allocator? Or should Spark 
> be more fair instead? Or maybe provide a configuration option to always 
> accept with the minimum resources?






[jira] [Closed] (SPARK-14977) Fine grained mode in Mesos is not fair

2016-05-05 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt closed SPARK-14977.
---
Resolution: Not A Problem

> Fine grained mode in Mesos is not fair
> --
>
> Key: SPARK-14977
> URL: https://issues.apache.org/jira/browse/SPARK-14977
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Affects Versions: 2.1.0
> Environment: Spark commit db75ccb, Debian jessie, Mesos fine grained
>Reporter: Luca Bruno
>
> I've set up a Mesos cluster and I'm running Spark in fine-grained mode.
> Spark defaults to 2 executor cores and 2gb of ram.
> The total mesos cluster has 8 cores and 8gb of ram.
> When I submit two spark jobs simultaneously, spark will always accept full 
> resources, leading the two frameworks to use 4gb of ram each instead of 2gb.
> If I submit another spark job, it will not get offered resources from mesos, 
> at least using the default HierarchicalDRF allocator module.
> Mesos will keep offering 4gb of ram to earlier spark jobs, and spark keeps 
> accepting full resources for every new task.
> Hence new spark jobs have no chance of getting a share.
> Is this something to be solved with a custom Mesos allocator? Or should Spark 
> be more fair instead? Or maybe provide a configuration option to always 
> accept with the minimum resources?






[jira] [Commented] (SPARK-15142) Spark Mesos dispatcher becomes unusable when the Mesos master restarts

2016-05-06 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15274445#comment-15274445
 ] 

Michael Gummelt commented on SPARK-15142:
-

I can't understand this sentence.  Can you reword this?

> Spark Mesos dispatcher becomes unusable when the Mesos master restarts
> --
>
> Key: SPARK-15142
> URL: https://issues.apache.org/jira/browse/SPARK-15142
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, Mesos
>Reporter: Devaraj K
>Priority: Minor
>
> While the Spark Mesos dispatcher is running, if the Mesos master gets 
> restarted, then the dispatcher will keep running but will queue up all the 
> submitted applications without launching them.






[jira] [Commented] (SPARK-15142) Spark Mesos dispatcher becomes unusable when the Mesos master restarts

2016-05-06 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15274447#comment-15274447
 ] 

Michael Gummelt commented on SPARK-15142:
-

Can you include the dispatcher logs?
Does restarting the dispatcher fix the problem?

> Spark Mesos dispatcher becomes unusable when the Mesos master restarts
> --
>
> Key: SPARK-15142
> URL: https://issues.apache.org/jira/browse/SPARK-15142
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, Mesos
>Reporter: Devaraj K
>Priority: Minor
>
> While the Spark Mesos dispatcher is running, if the Mesos master gets 
> restarted, then the dispatcher will keep running but will queue up all the 
> submitted applications without launching them.






[jira] [Commented] (SPARK-15155) Optionally ignore default role resources

2016-05-06 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15274455#comment-15274455
 ] 

Michael Gummelt commented on SPARK-15155:
-

Why do you want to avoid launching on the default role?  The default role 
represents resources available to all frameworks.  If you don't want certain 
frameworks to launch tasks on default role resources, you should reserve those 
resources on a different role.

> Optionally ignore default role resources
> 
>
> Key: SPARK-15155
> URL: https://issues.apache.org/jira/browse/SPARK-15155
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Affects Versions: 1.5.0, 1.6.0
>Reporter: Chris Heller
>
> SPARK-6284 added support for Mesos roles, but the framework will still accept 
> resources from both the reserved role specified in {{spark.mesos.role}} and 
> the default role {{*}}.
> I'd like to propose the addition of a new boolean property: 
> {{spark.mesos.ignoreDefaultRoleResources}}. When this property is set, Spark 
> will only accept resources from the role passed in the {{spark.mesos.role}} 
> property. If {{spark.mesos.role}} has not been set, 
> {{spark.mesos.ignoreDefaultRoleResources}} has no effect.
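
A runnable sketch of the proposed filtering logic (the property is this 
ticket's proposal, not an existing Spark config, and the offer-handling shape 
is illustrative):

{code}
object RoleFilterSketch {
  def acceptOffer(offerRole: String, conf: Map[String, String]): Boolean = {
    val reserved = conf.get("spark.mesos.role")
    val ignoreDefault =
      conf.get("spark.mesos.ignoreDefaultRoleResources").contains("true")
    if (ignoreDefault && reserved.isDefined)
      offerRole == reserved.get  // accept offers from the reserved role only
    else
      true                       // current behavior: accept "*" as well
  }

  def main(args: Array[String]): Unit = {
    val conf = Map(
      "spark.mesos.role" -> "batch",
      "spark.mesos.ignoreDefaultRoleResources" -> "true")
    println(acceptOffer("*", conf))      // false: default-role offer ignored
    println(acceptOffer("batch", conf))  // true
  }
}
{code}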






[jira] [Commented] (SPARK-15155) Optionally ignore default role resources

2016-05-06 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15274491#comment-15274491
 ] 

Michael Gummelt commented on SPARK-15155:
-

Yes, I understand the effect, but not the motivation.  Why don't you want to 
launch Spark tasks on the default role?

> Optionally ignore default role resources
> 
>
> Key: SPARK-15155
> URL: https://issues.apache.org/jira/browse/SPARK-15155
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Affects Versions: 1.5.0, 1.6.0
>Reporter: Chris Heller
>
> SPARK-6284 added support for Mesos roles, but the framework will still accept 
> resources from both the reserved role specified in {{spark.mesos.role}} and 
> the default role {{*}}.
> I'd like to propose the addition of a new boolean property: 
> {{spark.mesos.ignoreDefaultRoleResources}}. When this property is set, Spark 
> will only accept resources from the role passed in the {{spark.mesos.role}} 
> property. If {{spark.mesos.role}} has not been set, 
> {{spark.mesos.ignoreDefaultRoleResources}} has no effect.






[jira] [Commented] (SPARK-15155) Optionally ignore default role resources

2016-05-06 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15274560#comment-15274560
 ] 

Michael Gummelt commented on SPARK-15155:
-

Why not create a separate role for your ad-hoc work?

We'll eventually solve this more efficiently with support for revocable 
resources: http://mesos.apache.org/documentation/latest/oversubscription/


> Optionally ignore default role resources
> 
>
> Key: SPARK-15155
> URL: https://issues.apache.org/jira/browse/SPARK-15155
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Affects Versions: 1.5.0, 1.6.0
>Reporter: Chris Heller
>
> SPARK-6284 added support for Mesos roles, but the framework will still accept 
> resources from both the reserved role specified in {{spark.mesos.role}} and 
> the default role {{*}}.
> I'd like to propose the addition of a new boolean property: 
> {{spark.mesos.ignoreDefaultRoleResources}}. When this property is set, Spark 
> will only accept resources from the role passed in the {{spark.mesos.role}} 
> property. If {{spark.mesos.role}} has not been set, 
> {{spark.mesos.ignoreDefaultRoleResources}} has no effect.






[jira] [Commented] (SPARK-15155) Optionally ignore default role resources

2016-05-06 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15274682#comment-15274682
 ] 

Michael Gummelt commented on SPARK-15155:
-

Just have a single role for your batch jobs if you want them to have guaranteed 
resources.  Or just ensure that the streaming jobs have spark.cores.max set 
appropriately, and launch everything in the default role.  If this doesn't work 
for some reason, and you still have issues, please frame the problem as "If I 
do X, then I will run into problem Y", because I'm having trouble understanding 
your problem.

> Optionally ignore default role resources
> 
>
> Key: SPARK-15155
> URL: https://issues.apache.org/jira/browse/SPARK-15155
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Affects Versions: 1.5.0, 1.6.0
>Reporter: Chris Heller
>
> SPARK-6284 added support for Mesos roles, but the framework will still accept 
> resources from both the reserved role specified in {{spark.mesos.role}} and 
> the default role {{*}}.
> I'd like to propose the addition of a new boolean property: 
> {{spark.mesos.ignoreDefaultRoleResources}}. When this property is set, Spark 
> will only accept resources from the role passed in the {{spark.mesos.role}} 
> property. If {{spark.mesos.role}} has not been set, 
> {{spark.mesos.ignoreDefaultRoleResources}} has no effect.






[jira] [Commented] (SPARK-15155) Optionally ignore default role resources

2016-05-06 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15274505#comment-15274505
 ] 

Michael Gummelt commented on SPARK-15155:
-

I'm still missing the "why".  What is the downside of having a job launch tasks 
on the default role?

> Optionally ignore default role resources
> 
>
> Key: SPARK-15155
> URL: https://issues.apache.org/jira/browse/SPARK-15155
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Affects Versions: 1.5.0, 1.6.0
>Reporter: Chris Heller
>
> SPARK-6284 added support for Mesos roles, but the framework will still accept 
> resources from both the reserved role specified in {{spark.mesos.role}} and 
> the default role {{*}}.
> I'd like to propose the addition of a new boolean property: 
> {{spark.mesos.ignoreDefaultRoleResources}}. When this property is set, Spark 
> will only accept resources from the role passed in the {{spark.mesos.role}} 
> property. If {{spark.mesos.role}} has not been set, 
> {{spark.mesos.ignoreDefaultRoleResources}} has no effect.






[jira] [Commented] (SPARK-15155) Optionally ignore default role resources

2016-05-06 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15274576#comment-15274576
 ] 

Michael Gummelt commented on SPARK-15155:
-

> they will again take resources from the default role

Your stated problem was that your ad-hoc jobs were starved.  This solves that 
problem.  So now I don't understand the problem.  Your long running apps have 
taken all the default resources, but if you have resources reserved for your 
ad-hoc jobs, they will never be starved.

> Optionally ignore default role resources
> 
>
> Key: SPARK-15155
> URL: https://issues.apache.org/jira/browse/SPARK-15155
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Affects Versions: 1.5.0, 1.6.0
>Reporter: Chris Heller
>
> SPARK-6284 added support for Mesos roles, but the framework will still accept 
> resources from both the reserved role specified in {{spark.mesos.role}} and 
> the default role {{*}}.
> I'd like to propose the addition of a new boolean property: 
> {{spark.mesos.ignoreDefaultRoleResources}}. When this property is set, Spark 
> will only accept resources from the role passed in the {{spark.mesos.role}} 
> property. If {{spark.mesos.role}} has not been set, 
> {{spark.mesos.ignoreDefaultRoleResources}} has no effect.






[jira] [Created] (SPARK-16742) Kerberos support for Spark on Mesos

2016-07-26 Thread Michael Gummelt (JIRA)
Michael Gummelt created SPARK-16742:
---

 Summary: Kerberos support for Spark on Mesos
 Key: SPARK-16742
 URL: https://issues.apache.org/jira/browse/SPARK-16742
 Project: Spark
  Issue Type: New Feature
  Components: Mesos
Reporter: Michael Gummelt


We at Mesosphere have written Kerberos support for Spark on Mesos.  We'll be 
contributing it to Apache Spark soon.






[jira] [Commented] (SPARK-16522) [MESOS] Spark application throws exception on exit

2016-07-14 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15377353#comment-15377353
 ] 

Michael Gummelt commented on SPARK-16522:
-

[~srowen] I'm going to look into this now and resolve it today.  Can you hold 
off on the next 2.0 RC until this is resolved?  It's likely a major bug.

> [MESOS] Spark application throws exception on exit
> --
>
> Key: SPARK-16522
> URL: https://issues.apache.org/jira/browse/SPARK-16522
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Affects Versions: 2.0.0
>Reporter: Sun Rui
>
> Spark applications running on Mesos throw an exception upon exit, as follows:
> {panel}
> 16/07/13 15:20:46 WARN NettyRpcEndpointRef: Error sending message [message = 
> RemoveExecutor(1,Executor finished with state FINISHED)] in 3 attempts
> org.apache.spark.SparkException: Exception thrown in awaitResult
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
>   at 
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
>   at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
>   at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
>   at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
>   at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
>   at 
> org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:412)
>   at 
> org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.executorTerminated(MesosCoarseGrainedSchedulerBackend.scala:555)
>   at 
> org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.statusUpdate(MesosCoarseGrainedSchedulerBackend.scala:495)
> Caused by: org.apache.spark.SparkException: Could not find 
> CoarseGrainedScheduler.
>   at 
> org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:152)
>   at 
> org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:127)
>   at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:225)
>   at 
> org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:508)
>   at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101)
>   ... 4 more
> Exception in thread "Thread-47" org.apache.spark.SparkException: Error 
> notifying standalone scheduler's driver endpoint
>   at 
> org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:415)
>   at 
> org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.executorTerminated(MesosCoarseGrainedSchedulerBackend.scala:555)
>   at 
> org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.statusUpdate(MesosCoarseGrainedSchedulerBackend.scala:495)
> Caused by: org.apache.spark.SparkException: Error sending message [message = 
> RemoveExecutor(1,Executor finished with state FINISHED)]
>   at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:119)
>   at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
>   at 
> org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:412)
>   ... 2 more
> Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
>   at 
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
>   at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
>   at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
>   at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
>   ... 4 more
> Caused by: org.apache.spark.SparkException: Could not find 
> CoarseGrainedScheduler.
>   at 
> org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:152)
>   at 
> org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:127)
>   at 

[jira] [Commented] (SPARK-16522) [MESOS] Spark application throws exception on exit

2016-07-14 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15378178#comment-15378178
 ] 

Michael Gummelt commented on SPARK-16522:
-

I don't think so.  Please give me a couple hours to investigate further, though.

> [MESOS] Spark application throws exception on exit
> --
>
> Key: SPARK-16522
> URL: https://issues.apache.org/jira/browse/SPARK-16522
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Affects Versions: 2.0.0
>Reporter: Sun Rui
>
> Spark applications running on Mesos throw an exception upon exit, as follows:
> {noformat}
> 16/07/13 15:20:46 WARN NettyRpcEndpointRef: Error sending message [message = 
> RemoveExecutor(1,Executor finished with state FINISHED)] in 3 attempts
> org.apache.spark.SparkException: Exception thrown in awaitResult
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
>   at 
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
>   at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
>   at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
>   at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
>   at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
>   at 
> org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:412)
>   at 
> org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.executorTerminated(MesosCoarseGrainedSchedulerBackend.scala:555)
>   at 
> org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.statusUpdate(MesosCoarseGrainedSchedulerBackend.scala:495)
> Caused by: org.apache.spark.SparkException: Could not find 
> CoarseGrainedScheduler.
>   at 
> org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:152)
>   at 
> org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:127)
>   at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:225)
>   at 
> org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:508)
>   at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101)
>   ... 4 more
> Exception in thread "Thread-47" org.apache.spark.SparkException: Error 
> notifying standalone scheduler's driver endpoint
>   at 
> org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:415)
>   at 
> org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.executorTerminated(MesosCoarseGrainedSchedulerBackend.scala:555)
>   at 
> org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.statusUpdate(MesosCoarseGrainedSchedulerBackend.scala:495)
> Caused by: org.apache.spark.SparkException: Error sending message [message = 
> RemoveExecutor(1,Executor finished with state FINISHED)]
>   at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:119)
>   at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
>   at 
> org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:412)
>   ... 2 more
> Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
>   at 
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
>   at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
>   at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
>   at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
>   ... 4 more
> Caused by: org.apache.spark.SparkException: Could not find 
> CoarseGrainedScheduler.
>   at 
> org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:152)
>   at 
> org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:127)
>   at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:225)
>  

[jira] [Updated] (SPARK-16687) build/mvn fails when fetching mvn

2016-07-22 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt updated SPARK-16687:

Description: 
mvn 3.3.3 no longer exists in the apache.org mirror used by `build/mvn`

{code}
./build/mvn --force
exec: curl --progress-bar -L 
https://www.apache.org/dyn/closer.lua?action=download&filename=/maven/maven-3/3.3.3/binaries/apache-maven-3.3.3-bin.tar.gz
 100.0%

gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
Using `mvn` from path: 
/home/mgummelt/code/spark/build/apache-maven-3.3.3/bin/mvn
./build/mvn: line 152: 
/home/mgummelt/code/spark/build/apache-maven-3.3.3/bin/mvn: No such file or 
directory
{code}

After changing MVN_VERSION from "3.3.3" to "3.3.9":

{code}
./build/mvn --force
exec: curl --progress-bar -L 
https://www.apache.org/dyn/closer.lua?action=download&filename=/maven/maven-3/3.3.9/binaries/apache-maven-3.3.9-bin.tar.gz
 100.0%
Using `mvn` from path: 
/home/mgummelt/code/spark/build/apache-maven-3.3.9/bin/mvn
OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was 
removed in 8.0
{code}

  was:
mvn 3.3.3 no longer exists in the apache.org mirror used by `build/mvn`

{code}
Cmgummelt@mg-mesos:~/code/spark$ ./build/mvn --force
exec: curl --progress-bar -L 
https://www.apache.org/dyn/closer.lua?action=download&filename=/maven/maven-3/3.3.3/binaries/apache-maven-3.3.3-bin.tar.gz
 100.0%

gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
Using `mvn` from path: 
/home/mgummelt/code/spark/build/apache-maven-3.3.3/bin/mvn
./build/mvn: line 152: 
/home/mgummelt/code/spark/build/apache-maven-3.3.3/bin/mvn: No such file or 
directory
{code}

After changing MVN_VERSION from "3.3.3" to "3.3.9":

{code}
./build/mvn --force
exec: curl --progress-bar -L 
https://www.apache.org/dyn/closer.lua?action=download&filename=/maven/maven-3/3.3.9/binaries/apache-maven-3.3.9-bin.tar.gz
 100.0%
Using `mvn` from path: 
/home/mgummelt/code/spark/build/apache-maven-3.3.9/bin/mvn
OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was 
removed in 8.0
{code}


> build/mvn fails when fetching mvn
> -
>
> Key: SPARK-16687
> URL: https://issues.apache.org/jira/browse/SPARK-16687
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.6.2
>Reporter: Michael Gummelt
>
> mvn 3.3.3 no longer exists in the apache.org mirror used by `build/mvn`
> {code}
> ./build/mvn --force
> exec: curl --progress-bar -L 
> https://www.apache.org/dyn/closer.lua?action=download&filename=/maven/maven-3/3.3.3/binaries/apache-maven-3.3.3-bin.tar.gz
>  
> 100.0%
> gzip: stdin: not in gzip format
> tar: Child returned status 1
> tar: Error is not recoverable: exiting now
> Using `mvn` from path: 
> /home/mgummelt/code/spark/build/apache-maven-3.3.3/bin/mvn
> ./build/mvn: line 152: 
> /home/mgummelt/code/spark/build/apache-maven-3.3.3/bin/mvn: No such file or 
> directory
> {code}
> After changing MVN_VERSION from "3.3.3" to "3.3.9":
> {code}
> ./build/mvn --force
> exec: curl --progress-bar -L 
> https://www.apache.org/dyn/closer.lua?action=download&filename=/maven/maven-3/3.3.9/binaries/apache-maven-3.3.9-bin.tar.gz
>  
> 100.0%
> Using `mvn` from path: 
> /home/mgummelt/code/spark/build/apache-maven-3.3.9/bin/mvn
> OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support 
> was removed in 8.0
> {code}






[jira] [Created] (SPARK-16687) build/mvn fails when fetching mvn

2016-07-22 Thread Michael Gummelt (JIRA)
Michael Gummelt created SPARK-16687:
---

 Summary: build/mvn fails when fetching mvn
 Key: SPARK-16687
 URL: https://issues.apache.org/jira/browse/SPARK-16687
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 1.6.2
Reporter: Michael Gummelt


mvn 3.3.3 no longer exists in the apache.org mirror used by `build/mvn`

{code}
Cmgummelt@mg-mesos:~/code/spark$ ./build/mvn --force
exec: curl --progress-bar -L 
https://www.apache.org/dyn/closer.lua?action=download&filename=/maven/maven-3/3.3.3/binaries/apache-maven-3.3.3-bin.tar.gz
 100.0%

gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
Using `mvn` from path: 
/home/mgummelt/code/spark/build/apache-maven-3.3.3/bin/mvn
./build/mvn: line 152: 
/home/mgummelt/code/spark/build/apache-maven-3.3.3/bin/mvn: No such file or 
directory
{code}

After changing MVN_VERSION from "3.3.3" to "3.3.9":

{code}
./build/mvn --force
exec: curl --progress-bar -L 
https://www.apache.org/dyn/closer.lua?action=download&filename=/maven/maven-3/3.3.9/binaries/apache-maven-3.3.9-bin.tar.gz
 100.0%
Using `mvn` from path: 
/home/mgummelt/code/spark/build/apache-maven-3.3.9/bin/mvn
OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was 
removed in 8.0
{code}






[jira] [Commented] (SPARK-16687) build/mvn fails when fetching mvn

2016-07-22 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15390014#comment-15390014
 ] 

Michael Gummelt commented on SPARK-16687:
-

thanks!

> build/mvn fails when fetching mvn
> -
>
> Key: SPARK-16687
> URL: https://issues.apache.org/jira/browse/SPARK-16687
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.6.2
>Reporter: Michael Gummelt
>
> mvn 3.3.3 no longer exists in the apache.org mirror used by `build/mvn`
> {code}
> ./build/mvn --force
> exec: curl --progress-bar -L 
> https://www.apache.org/dyn/closer.lua?action=download&filename=/maven/maven-3/3.3.3/binaries/apache-maven-3.3.3-bin.tar.gz
>  
> 100.0%
> gzip: stdin: not in gzip format
> tar: Child returned status 1
> tar: Error is not recoverable: exiting now
> Using `mvn` from path: 
> /home/mgummelt/code/spark/build/apache-maven-3.3.3/bin/mvn
> ./build/mvn: line 152: 
> /home/mgummelt/code/spark/build/apache-maven-3.3.3/bin/mvn: No such file or 
> directory
> {code}
> After changing MVN_VERSION from "3.3.3" to "3.3.9":
> {code}
> ./build/mvn --force
> exec: curl --progress-bar -L 
> https://www.apache.org/dyn/closer.lua?action=download&filename=/maven/maven-3/3.3.9/binaries/apache-maven-3.3.9-bin.tar.gz
>  
> 100.0%
> Using `mvn` from path: 
> /home/mgummelt/code/spark/build/apache-maven-3.3.9/bin/mvn
> OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support 
> was removed in 8.0
> {code}






[jira] [Commented] (SPARK-16450) Build fails for Mesos 0.28.x

2016-07-27 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15396236#comment-15396236
 ] 

Michael Gummelt commented on SPARK-16450:
-

I'll update soon, though this pending PR already updates to 0.28: 
https://github.com/apache/spark/pull/14275

> Build fails for Mesos 0.28.x
> -
>
> Key: SPARK-16450
> URL: https://issues.apache.org/jira/browse/SPARK-16450
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Affects Versions: 2.0.0
> Environment: Mesos 0.28.0
>Reporter: Niels Becker
>
> Build fails:
> [error] 
> /usr/local/spark/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerUtils.scala:82:
>  type mismatch;
> [error]  found   : org.apache.mesos.protobuf.ByteString
> [error]  required: String
> [error]   credBuilder.setSecret(ByteString.copyFromUtf8(secret))
> Build cmd:
> dev/make-distribution.sh --tgz -Phadoop-2.7 -Dhadoop.version=2.7.0 -Phive 
> -DskipTests -Dmesos.version=0.28.0 -Djava.version=1.8
> Spark Version: 2.0.0-rc2
> Java: OpenJDK Runtime Environment (build 1.8.0_91-8u91-b14-1~bpo8+1-b14
> Scala Version: 2.11.8
> Same error for mesos.version=0.28.2






[jira] [Closed] (SPARK-16783) make-distri

2016-07-28 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt closed SPARK-16783.
---
Resolution: Not A Problem

> make-distri
> ---
>
> Key: SPARK-16783
> URL: https://issues.apache.org/jira/browse/SPARK-16783
> Project: Spark
>  Issue Type: Bug
>Reporter: Michael Gummelt
>







[jira] [Created] (SPARK-16784) Configurable log4j settings

2016-07-28 Thread Michael Gummelt (JIRA)
Michael Gummelt created SPARK-16784:
---

 Summary: Configurable log4j settings
 Key: SPARK-16784
 URL: https://issues.apache.org/jira/browse/SPARK-16784
 Project: Spark
  Issue Type: Improvement
Affects Versions: 2.0.0
Reporter: Michael Gummelt


I often want to change the logging configuration on a single spark job.  This 
is easy in client mode.  I just modify log4j.properties.  It's difficult in 
cluster mode, because I need to modify the log4j.properties in the distribution 
in which the driver runs.  I'd like a way of setting this dynamically, such as 
a java system property.  Some brief searching showed that log4j doesn't seem to 
accept such a property, but I'd like to open up this idea for further comment.  
Maybe we can find a solution.
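
One known workaround (not a fix for this ticket): log4j 1.x reads the 
{{log4j.configuration}} system property pointing at an alternate config file, 
which can be passed through the driver's extra Java options.  A sketch, 
assuming the file exists on the host where the driver runs:

{code}
import org.apache.spark.SparkConf

// Point log4j at an alternate properties file via a JVM system property on
// the driver.  The path below is an example.
val conf = new SparkConf()
  .set("spark.driver.extraJavaOptions",
    "-Dlog4j.configuration=file:/etc/spark/log4j-debug.properties")
{code}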






[jira] [Created] (SPARK-16783) make-distri

2016-07-28 Thread Michael Gummelt (JIRA)
Michael Gummelt created SPARK-16783:
---

 Summary: make-distri
 Key: SPARK-16783
 URL: https://issues.apache.org/jira/browse/SPARK-16783
 Project: Spark
  Issue Type: Bug
Reporter: Michael Gummelt









[jira] [Created] (SPARK-16808) History Server main page does not honor APPLICATION_WEB_PROXY_BASE

2016-07-29 Thread Michael Gummelt (JIRA)
Michael Gummelt created SPARK-16808:
---

 Summary: History Server main page does not honor 
APPLICATION_WEB_PROXY_BASE
 Key: SPARK-16808
 URL: https://issues.apache.org/jira/browse/SPARK-16808
 Project: Spark
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Michael Gummelt


The root of the history server is rendered dynamically with javascript, and 
this doesn't honor APPLICATION_WEB_PROXY_BASE: 
https://github.com/apache/spark/blob/master/core/src/main/resources/org/apache/spark/ui/static/historypage-template.html#L67

Other links in the history server do honor it: 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ui/UIUtils.scala#L146

This means the links on the history server root page are broken when deployed 
behind a proxy.






[jira] [Created] (SPARK-16809) Link Mesos Dispatcher and History Server

2016-07-29 Thread Michael Gummelt (JIRA)
Michael Gummelt created SPARK-16809:
---

 Summary: Link Mesos Dispatcher and History Server
 Key: SPARK-16809
 URL: https://issues.apache.org/jira/browse/SPARK-16809
 Project: Spark
  Issue Type: New Feature
  Components: Mesos
Reporter: Michael Gummelt


> This is somewhat of a duplicate of SPARK-13401, but the PR for that JIRA seems 
> to only implement sandbox linking, not history server linking, which is the 
> sole scope of this JIRA.






[jira] [Comment Edited] (SPARK-16784) Configurable log4j settings

2016-08-11 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15417857#comment-15417857
 ] 

Michael Gummelt edited comment on SPARK-16784 at 8/11/16 8:11 PM:
--

{{log4j.debug=true}} only results in log4j printing its internal debugging 
messages (e.g. config file location, appenders, etc.).  It doesn't turn on 
debug logging for the application.


was (Author: mgummelt):
{{log4j.debug=true}} only results in log4j printing its debugging messages.  It 
doesn't turn on debug logging for the application.

> Configurable log4j settings
> ---
>
> Key: SPARK-16784
> URL: https://issues.apache.org/jira/browse/SPARK-16784
> Project: Spark
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Michael Gummelt
>
> I often want to change the logging configuration on a single spark job.  This 
> is easy in client mode.  I just modify log4j.properties.  It's difficult in 
> cluster mode, because I need to modify the log4j.properties in the 
> distribution in which the driver runs.  I'd like a way of setting this 
> dynamically, such as a java system property.  Some brief searching showed 
> that log4j doesn't seem to accept such a property, but I'd like to open up 
> this idea for further comment.  Maybe we can find a solution.






[jira] [Reopened] (SPARK-16784) Configurable log4j settings

2016-08-11 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt reopened SPARK-16784:
-

`log4j.debug=true` only results in log4j printing its debugging messages.  It 
doesn't turn on debug logging for the application.

> Configurable log4j settings
> ---
>
> Key: SPARK-16784
> URL: https://issues.apache.org/jira/browse/SPARK-16784
> Project: Spark
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Michael Gummelt
>
> I often want to change the logging configuration on a single spark job.  This 
> is easy in client mode.  I just modify log4j.properties.  It's difficult in 
> cluster mode, because I need to modify the log4j.properties in the 
> distribution in which the driver runs.  I'd like a way of setting this 
> dynamically, such as a java system property.  Some brief searching showed 
> that log4j doesn't seem to accept such a property, but I'd like to open up 
> this idea for further comment.  Maybe we can find a solution.






[jira] [Commented] (SPARK-16784) Configurable log4j settings

2016-08-11 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15417856#comment-15417856
 ] 

Michael Gummelt commented on SPARK-16784:
-

`log4j.debug=true` only results in log4j printing its debugging messages.  It 
doesn't turn on debug logging for the application.

> Configurable log4j settings
> ---
>
> Key: SPARK-16784
> URL: https://issues.apache.org/jira/browse/SPARK-16784
> Project: Spark
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Michael Gummelt
>
> I often want to change the logging configuration on a single spark job.  This 
> is easy in client mode.  I just modify log4j.properties.  It's difficult in 
> cluster mode, because I need to modify the log4j.properties in the 
> distribution in which the driver runs.  I'd like a way of setting this 
> dynamically, such as a java system property.  Some brief searching showed 
> that log4j doesn't seem to accept such a property, but I'd like to open up 
> this idea for further comment.  Maybe we can find a solution.






[jira] [Comment Edited] (SPARK-16784) Configurable log4j settings

2016-08-11 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15417857#comment-15417857
 ] 

Michael Gummelt edited comment on SPARK-16784 at 8/11/16 8:10 PM:
--

{{log4j.debug=true}} only results in log4j printing its debugging messages.  It 
doesn't turn on debug logging for the application.


was (Author: mgummelt):
`log4j.debug=true` only results in log4j printing its debugging messages.  It 
doesn't turn on debug logging for the application.

> Configurable log4j settings
> ---
>
> Key: SPARK-16784
> URL: https://issues.apache.org/jira/browse/SPARK-16784
> Project: Spark
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Michael Gummelt
>
> I often want to change the logging configuration on a single spark job.  This 
> is easy in client mode.  I just modify log4j.properties.  It's difficult in 
> cluster mode, because I need to modify the log4j.properties in the 
> distribution in which the driver runs.  I'd like a way of setting this 
> dynamically, such as a java system property.  Some brief searching showed 
> that log4j doesn't seem to accept such a property, but I'd like to open up 
> this idea for further comment.  Maybe we can find a solution.






[jira] [Issue Comment Deleted] (SPARK-16784) Configurable log4j settings

2016-08-11 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt updated SPARK-16784:

Comment: was deleted

(was: `log4j.debug=true` only results in log4j printing its debugging messages. 
 It doesn't turn on debug logging for the application.)

> Configurable log4j settings
> ---
>
> Key: SPARK-16784
> URL: https://issues.apache.org/jira/browse/SPARK-16784
> Project: Spark
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Michael Gummelt
>
> I often want to change the logging configuration on a single spark job.  This 
> is easy in client mode.  I just modify log4j.properties.  It's difficult in 
> cluster mode, because I need to modify the log4j.properties in the 
> distribution in which the driver runs.  I'd like a way of setting this 
> dynamically, such as a java system property.  Some brief searching showed 
> that log4j doesn't seem to accept such a property, but I'd like to open up 
> this idea for further comment.  Maybe we can find a solution.






[jira] [Created] (SPARK-16881) Migrate Mesos configs to use ConfigEntry

2016-08-03 Thread Michael Gummelt (JIRA)
Michael Gummelt created SPARK-16881:
---

 Summary: Migrate Mesos configs to use ConfigEntry
 Key: SPARK-16881
 URL: https://issues.apache.org/jira/browse/SPARK-16881
 Project: Spark
  Issue Type: Task
  Components: Mesos
Affects Versions: 2.0.0
Reporter: Michael Gummelt
Priority: Minor


https://github.com/apache/spark/pull/14414#discussion_r73032190






[jira] [Updated] (SPARK-16881) Migrate Mesos configs to use ConfigEntry

2016-08-03 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt updated SPARK-16881:

Description: 
https://github.com/apache/spark/pull/14414#discussion_r73032190

We'd like to migrate Mesos' use of config vars to the new ConfigEntry class so 
we can a) define all our configs in one place, like YARN does, and b) take 
advantage of features like default handling and generics.

  was:https://github.com/apache/spark/pull/14414#discussion_r73032190


> Migrate Mesos configs to use ConfigEntry
> 
>
> Key: SPARK-16881
> URL: https://issues.apache.org/jira/browse/SPARK-16881
> Project: Spark
>  Issue Type: Task
>  Components: Mesos
>Affects Versions: 2.0.0
>Reporter: Michael Gummelt
>Priority: Minor
>
> https://github.com/apache/spark/pull/14414#discussion_r73032190
> We'd like to migrate Mesos' use of config vars to the new ConfigEntry class 
> so we can a) define all our configs in one place, like YARN does, and b) take 
> advantage of features like default handling and generics.
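
For illustration, here is one Mesos property declared through the internal 
builder, mirroring how the YARN configs are centralized (a sketch only; the 
exact builder calls are an assumption against Spark's private config API):

{code}
package org.apache.spark.scheduler.cluster.mesos

import java.util.concurrent.TimeUnit
import org.apache.spark.internal.config.ConfigBuilder

private[spark] object MesosConfigSketch {
  // Example entry: typed, documented, with a default, all in one place.
  val SHUTDOWN_TIMEOUT = ConfigBuilder("spark.mesos.coarse.shutdownTimeout")
    .doc("How long the driver waits for executors to exit before stopping.")
    .timeConf(TimeUnit.MILLISECONDS)
    .createWithDefaultString("10s")
}
{code}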






[jira] [Created] (SPARK-17002) Document that spark.ssl.protocol is required for SSL

2016-08-10 Thread Michael Gummelt (JIRA)
Michael Gummelt created SPARK-17002:
---

 Summary: Document that spark.ssl.protocol. is required for SSL
 Key: SPARK-17002
 URL: https://issues.apache.org/jira/browse/SPARK-17002
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 2.0.0, 1.6.2
Reporter: Michael Gummelt


cc [~jlewandowski]

I was trying to start the Spark master.  When I set 
{{spark.ssl.enabled=true}} but fail to set {{spark.ssl.protocol}}, I get 
this none-too-helpful error message:

{code}
16/08/10 15:17:50 INFO SecurityManager: SecurityManager: authentication 
disabled; ui acls disabled; users with view permissions: Set(mgummelt); users 
with modify permissions: Set(mgummelt)
16/08/10 15:17:50 WARN SecurityManager: Using 'accept-all' trust manager for 
SSL connections.
Exception in thread "main" java.security.KeyManagementException: Default 
SSLContext is initialized automatically
at 
sun.security.ssl.SSLContextImpl$DefaultSSLContext.engineInit(SSLContextImpl.java:749)
at javax.net.ssl.SSLContext.init(SSLContext.java:282)
at org.apache.spark.SecurityManager.(SecurityManager.scala:284)
at 
org.apache.spark.deploy.master.Master$.startRpcEnvAndEndpoint(Master.scala:1121)
at org.apache.spark.deploy.master.Master$.main(Master.scala:1106)
at org.apache.spark.deploy.master.Master.main(Master.scala)
{code}

We should document that {{spark.ssl.protocol}} is required, and throw a more 
helpful error message when it isn't set.  In fact, we should remove the 
`getOrElse` here: 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SecurityManager.scala#L285,
 since the following line fails when the protocol is set to "Default".
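
A sketch of the fail-fast behavior proposed above (not the actual patch; 
{{sys.props}} stands in for wherever {{SSLOptions}} reads the protocol from):

{code}
import javax.net.ssl.SSLContext

// Sketch: require the protocol explicitly instead of falling back to
// "Default", for which SSLContext.init later throws
// KeyManagementException ("Default SSLContext is initialized
// automatically").
val protocol = sys.props.get("spark.ssl.protocol").getOrElse(
  throw new IllegalArgumentException(
    "spark.ssl.protocol is required when spark.ssl.enabled is true"))
val sslContext = SSLContext.getInstance(protocol)
{code}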



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16522) [MESOS] Spark application throws exception on exit

2016-07-13 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15376125#comment-15376125
 ] 

Michael Gummelt commented on SPARK-16522:
-

I've seen some stack traces recently that might have been this.  I'm trying to 
repro now.  Will get back to you.  Which commit/tag are you running?

> [MESOS] Spark application throws exception on exit
> --
>
> Key: SPARK-16522
> URL: https://issues.apache.org/jira/browse/SPARK-16522
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Affects Versions: 2.0.0
>Reporter: Sun Rui
>
> Spark applications running on Mesos throw exception upon exit as follows:
> {panel}
> 16/07/13 15:20:46 WARN NettyRpcEndpointRef: Error sending message [message = 
> RemoveExecutor(1,Executor finished with state FINISHED)] in 3 attempts
> org.apache.spark.SparkException: Exception thrown in awaitResult
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
>   at 
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
>   at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
>   at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
>   at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
>   at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
>   at 
> org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:412)
>   at 
> org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.executorTerminated(MesosCoarseGrainedSchedulerBackend.scala:555)
>   at 
> org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.statusUpdate(MesosCoarseGrainedSchedulerBackend.scala:495)
> Caused by: org.apache.spark.SparkException: Could not find 
> CoarseGrainedScheduler.
>   at 
> org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:152)
>   at 
> org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:127)
>   at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:225)
>   at 
> org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:508)
>   at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101)
>   ... 4 more
> Exception in thread "Thread-47" org.apache.spark.SparkException: Error 
> notifying standalone scheduler's driver endpoint
>   at 
> org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:415)
>   at 
> org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.executorTerminated(MesosCoarseGrainedSchedulerBackend.scala:555)
>   at 
> org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.statusUpdate(MesosCoarseGrainedSchedulerBackend.scala:495)
> Caused by: org.apache.spark.SparkException: Error sending message [message = 
> RemoveExecutor(1,Executor finished with state FINISHED)]
>   at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:119)
>   at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
>   at 
> org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:412)
>   ... 2 more
> Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
>   at 
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
>   at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
>   at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
>   at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
>   ... 4 more
> Caused by: org.apache.spark.SparkException: Could not find 
> CoarseGrainedScheduler.
>   at 
> org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:152)
>   at 
> org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:127)
>   at 

[jira] [Commented] (SPARK-13258) --conf properties not honored in Mesos cluster mode

2016-07-13 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375797#comment-15375797
 ] 

Michael Gummelt commented on SPARK-13258:
-

Nope, this is still a bug.

> --conf properties not honored in Mesos cluster mode
> ---
>
> Key: SPARK-13258
> URL: https://issues.apache.org/jira/browse/SPARK-13258
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Affects Versions: 1.6.0
>Reporter: Michael Gummelt
>
> Spark properties set on {{spark-submit}} via the deprecated 
> {{SPARK_JAVA_OPTS}} are passed along to the driver, but those set via the 
> preferred {{--conf}} are not.
> For example, this results in the URI being fetched in the executor:
> {{SPARK_JAVA_OPTS="-Dspark.mesos.uris=https://raw.githubusercontent.com/mesosphere/spark/master/README.md
>  -Dspark.mesos.executor.docker.image=mesosphere/spark:1.6.0" 
> ./bin/spark-submit --deploy-mode cluster --master mesos://10.0.78.140:7077  
> --class org.apache.spark.examples.SparkPi 
> http://downloads.mesosphere.com.s3.amazonaws.com/assets/spark/spark-examples_2.10-1.5.0.jar}}
> This does not:
> {{SPARK_JAVA_OPTS="-Dspark.mesos.executor.docker.image=mesosphere/spark:1.6.0"
>  ./bin/spark-submit --deploy-mode cluster --master mesos://10.0.78.140:7077 
> --conf 
> spark.mesos.uris=https://raw.githubusercontent.com/mesosphere/spark/master/README.md
>  --class org.apache.spark.examples.SparkPi 
> http://downloads.mesosphere.com.s3.amazonaws.com/assets/spark/spark-examples_2.10-1.5.0.jar}}
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L369
> In the above line of code, you can see that SPARK_JAVA_OPTS is passed along 
> to the driver, so those properties take effect.
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L373
> Whereas in this line of code, you see that {{--conf}} variables are set on 
> {{SPARK_EXECUTOR_OPTS}}, which AFAICT has absolutely no effect because this 
> env var is being set on the driver, not the executor.
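
A minimal sketch of the direction a fix could take (hypothetical names, not the 
actual patch): have the dispatcher forward each submitted {{--conf}} property into 
the driver's launch command as a {{-D}} option, the same way {{SPARK_JAVA_OPTS}} 
values are forwarded today:

{code}
// Hypothetical sketch: `schedulerProperties` stands in for the map of
// --conf properties attached to a submission; the real
// MesosClusterScheduler code differs.
val schedulerProperties: Map[String, String] =
  Map("spark.mesos.uris" -> "https://example.com/README.md")

// Render each property as a -D option and append it to the driver
// command, so it takes effect in the driver instead of being lost in
// SPARK_EXECUTOR_OPTS.
val driverJavaOpts = schedulerProperties
  .map { case (key, value) => s"-D$key=$value" }
  .mkString(" ")
{code}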



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11183) enable support for mesos 0.24+

2016-07-19 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384520#comment-15384520
 ] 

Michael Gummelt commented on SPARK-11183:
-

Hey All.  I just saw this JIRA.  Sorry for the delay.  Once Mesos 1.0 is 
released (maybe this week), I'll update Spark to use the 1.0 bindings.

The problem described in this JIRA isn't actually a bindings issue.  It's a 
libmesos issue.  If you update your libmesos to a later version, it should 
go away.  The bindings aren't technically guaranteed to be compatible with an 
arbitrary libmesos version, but I've never seen an issue.

The long term solution is to move the Spark scheduler over to the new Mesos 
HTTP API, so we no longer have to deal with libmesos.



> enable support for mesos 0.24+
> --
>
> Key: SPARK-11183
> URL: https://issues.apache.org/jira/browse/SPARK-11183
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, Mesos
>Reporter: Ioannis Polyzos
>
> In Mesos 0.24, the Mesos leader info in ZK changed to JSON; this results in 
> Spark failing to run on 0.24+.
> References:
>   https://issues.apache.org/jira/browse/MESOS-2340 
>   
> http://mail-archives.apache.org/mod_mbox/mesos-commits/201506.mbox/%3ced4698dc56444bcdac3bdf19134db...@git.apache.org%3E
>   https://github.com/mesos/elasticsearch/issues/338
>   https://github.com/spark-jobserver/spark-jobserver/issues/267



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16627) --jars doesn't work in Mesos mode

2016-07-19 Thread Michael Gummelt (JIRA)
Michael Gummelt created SPARK-16627:
---

 Summary: --jars doesn't work in Mesos mode
 Key: SPARK-16627
 URL: https://issues.apache.org/jira/browse/SPARK-16627
 Project: Spark
  Issue Type: Bug
  Components: Mesos
Reporter: Michael Gummelt


Definitely doesn't work in cluster mode.  Might not work in client mode either.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16637) Support Mesos Unified Containerizer

2016-07-19 Thread Michael Gummelt (JIRA)
Michael Gummelt created SPARK-16637:
---

 Summary: Support Mesos Unified Containerizer
 Key: SPARK-16637
 URL: https://issues.apache.org/jira/browse/SPARK-16637
 Project: Spark
  Issue Type: Task
  Components: Mesos
Affects Versions: 2.0.0
Reporter: Michael Gummelt


Mesos is moving toward a single, unified containerizer that will run both 
Docker and non-Docker containers.  We should add support.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16522) [MESOS] Spark application throws exception on exit

2016-07-15 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15380130#comment-15380130
 ] 

Michael Gummelt commented on SPARK-16522:
-

This shouldn't affect functionality.

> [MESOS] Spark application throws exception on exit
> --
>
> Key: SPARK-16522
> URL: https://issues.apache.org/jira/browse/SPARK-16522
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Affects Versions: 2.0.0
>Reporter: Sun Rui
>
> Spark applications running on Mesos throw exception upon exit as follows:
> {noformat}
> 16/07/13 15:20:46 WARN NettyRpcEndpointRef: Error sending message [message = 
> RemoveExecutor(1,Executor finished with state FINISHED)] in 3 attempts
> org.apache.spark.SparkException: Exception thrown in awaitResult
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
>   at 
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
>   at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
>   at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
>   at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
>   at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
>   at 
> org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:412)
>   at 
> org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.executorTerminated(MesosCoarseGrainedSchedulerBackend.scala:555)
>   at 
> org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.statusUpdate(MesosCoarseGrainedSchedulerBackend.scala:495)
> Caused by: org.apache.spark.SparkException: Could not find 
> CoarseGrainedScheduler.
>   at 
> org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:152)
>   at 
> org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:127)
>   at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:225)
>   at 
> org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:508)
>   at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101)
>   ... 4 more
> Exception in thread "Thread-47" org.apache.spark.SparkException: Error 
> notifying standalone scheduler's driver endpoint
>   at 
> org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:415)
>   at 
> org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.executorTerminated(MesosCoarseGrainedSchedulerBackend.scala:555)
>   at 
> org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.statusUpdate(MesosCoarseGrainedSchedulerBackend.scala:495)
> Caused by: org.apache.spark.SparkException: Error sending message [message = 
> RemoveExecutor(1,Executor finished with state FINISHED)]
>   at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:119)
>   at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
>   at 
> org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:412)
>   ... 2 more
> Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
>   at 
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
>   at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
>   at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
>   at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
>   ... 4 more
> Caused by: org.apache.spark.SparkException: Could not find 
> CoarseGrainedScheduler.
>   at 
> org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:152)
>   at 
> org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:127)
>   at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:225)
>   at 
> 

[jira] [Commented] (SPARK-16450) Build fails for Mesos 0.28.x

2016-07-11 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371562#comment-15371562
 ] 

Michael Gummelt commented on SPARK-16450:
-

Once Mesos 1.0 is released, I'll submit a PR to upgrade.

Long term solution is to use the HTTP API, so we no longer have to deal with 
libmesos, but that's a large change.

> Build fails for Mesos 0.28.x
> -
>
> Key: SPARK-16450
> URL: https://issues.apache.org/jira/browse/SPARK-16450
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Affects Versions: 2.0.0
> Environment: Mesos 0.28.0
>Reporter: Niels Becker
>
> Build fails:
> [error] 
> /usr/local/spark/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerUtils.scala:82:
>  type mismatch;
> [error]  found   : org.apache.mesos.protobuf.ByteString
> [error]  required: String
> [error]   credBuilder.setSecret(ByteString.copyFromUtf8(secret))
> Build cmd:
> dev/make-distribution.sh --tgz -Phadoop-2.7 -Dhadoop.version=2.7.0 -Phive 
> -DskipTests -Dmesos.version=0.28.0 -Djava.version=1.8
> Spark Version: 2.0.0-rc2
> Java: OpenJDK Runtime Environment (build 1.8.0_91-8u91-b14-1~bpo8+1-b14
> Scala Version: 2.11.8
> Same error for mesos.version=0.28.2
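
The type mismatch suggests the {{Credential.secret}} proto field changed from 
{{bytes}} to {{string}} between the Mesos version Spark builds against and 0.28.x, 
so the generated builder now expects a plain {{String}}. A sketch of the 
version-sensitive call (an assumption based on the error above, not a verified fix):

{code}
import org.apache.mesos.Protos.Credential

// Sketch: with protos where `secret` is a string field, pass a String
// directly; ByteString.copyFromUtf8 is only needed where it is `bytes`.
val cred = Credential.newBuilder()
  .setPrincipal("spark")
  .setSecret("my-secret")
  .build()
{code}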



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16379) Spark on mesos is broken due to race condition in Logging

2016-07-05 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363013#comment-15363013
 ] 

Michael Gummelt commented on SPARK-16379:
-

I traced back the addition of the `synchronized` block, and it seems Matei 
added it a long time ago.  I can't prove that the method is thread-safe, so I'd 
rather not remove the synchronization block.  So we can either:

1) Remove the log statements (I'd like to keep them)
2) Revert the `lazy` commit
3) Introduce an explicit lock, and synchronize on that rather than `this`

2) is the "correct" thing to do, since it's the author's responsibility to not 
break existing code, but I'm OK with 3) as well.  [~srowen] what do you think?
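
For concreteness, a rough sketch of option 3 (illustrative, not the actual patch):

{code}
import org.slf4j.{Logger, LoggerFactory}

// Sketch: guard the critical region with a dedicated lock object rather
// than `this`. A `lazy val` initializer also acquires the monitor on
// `this` (Scala 2.11), so holding `this` across code that forces a lazy
// val is what can deadlock.
object InitLock

class Component {
  @transient lazy val log: Logger = LoggerFactory.getLogger(getClass)

  def initializeLogIfNecessary(): Unit = InitLock.synchronized {
    // one-time setup; forcing `log` here no longer contends with a
    // thread that holds the monitor on `this`
    log.debug("initialized")
  }
}
{code}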

> Spark on mesos is broken due to race condition in Logging
> -
>
> Key: SPARK-16379
> URL: https://issues.apache.org/jira/browse/SPARK-16379
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Stavros Kontopoulos
>Priority: Blocker
> Attachments: out.txt
>
>
> This commit introduced a transient lazy log val: 
> https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec
> This has caused problems in the past:
> https://github.com/apache/spark/pull/1004
> One commit before that everything works fine.
> I spotted that when my CI started to fail:
> https://ci.typesafe.com/job/mit-docker-test-ref/191/
> You can easily verify it by installing mesos on your machine and try to 
> connect with spark shell from bin dir:
> ./spark-shell --master mesos://zk://localhost:2181/mesos --conf 
> spark.executor.url=$(pwd)/../spark-2.0.0-SNAPSHOT-bin-test.tgz
> It gets stuck at the point where it tries to create the SparkContext.
> Logging gets stuck here:
> I0705 12:10:10.076617  9303 group.cpp:700] Trying to get 
> '/mesos/json.info_000152' in ZooKeeper
> I0705 12:10:10.076920  9304 detector.cpp:479] A new leading master 
> (UPID=master@127.0.1.1:5050) is detected
> I0705 12:10:10.076956  9303 sched.cpp:326] New master detected at 
> master@127.0.1.1:5050
> I0705 12:10:10.077057  9303 sched.cpp:336] No credentials provided. 
> Attempting to register without authentication
> I0705 12:10:10.090709  9301 sched.cpp:703] Framework registered with 
> 13553f8b-f42c-4f20-88cd-16f1cc153ede-0001
> I verified it also by changing @transient lazy val log to def and it works as 
> expected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-16379) Spark on mesos is broken due to race condition in Logging

2016-07-05 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363289#comment-15363289
 ] 

Michael Gummelt edited comment on SPARK-16379 at 7/5/16 9:34 PM:
-

I say we add a new lock to synchronize on and be done with it.

The root of the issue is that deadlock detection is hard.  The author of the 
breaking change added a critical region, and to do so safely, you have to 
ensure that all calling code paths haven't acquired the same lock, which is 
difficult (undecidable).

The only process change I can imagine to fix the higher level issue is running 
some sort of deadlock detection tool in the Spark tests.  I agree we shouldn't 
get rid of `lazy val` completely, but it is unfortunate that you can't use them 
in a `synchronized` block.  It's a leaky abstraction.  Seems to be addressed 
here: 
http://docs.scala-lang.org/sips/pending/improved-lazy-val-initialization.html

| One other thing i hope it holds is no new commit should break the project 
even if it fixes something or reveals another issue etc.

Well I do agree with Sean that it's on us to fix bugs revealed by external 
changes.


was (Author: mgummelt):
I say we add a new lock to synchronize on and be done with it.

The root of the issue is that deadlock detection is hard.  The author of the 
breaking change added a critical region, and to do so safely, you have to 
ensure that all calling code paths haven't acquired the same lock, which is 
difficult (undecidable).

The only process change I can imagine to fix the higher level issue is running 
some sort of deadlock detection tool in the Spark tests.  I agree we shouldn't 
get rid of `lazy val` completely, but it is unfortunate that you can't use them 
in a `synchronized` block.  It's a leaky abstraction.  Seems to be addressed 
here: 
http://docs.scala-lang.org/sips/pending/improved-lazy-val-initialization.html

> One other thing i hope it holds is no new commit should break the project 
> even if it fixes something or reveals another issue etc.

Well I do agree with Sean that it's on us to fix bugs revealed by external 
changes.

> Spark on mesos is broken due to race condition in Logging
> -
>
> Key: SPARK-16379
> URL: https://issues.apache.org/jira/browse/SPARK-16379
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Stavros Kontopoulos
>Priority: Blocker
> Attachments: out.txt
>
>
> This commit introduced a transient lazy log val: 
> https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec
> This has caused problems in the past:
> https://github.com/apache/spark/pull/1004
> One commit before that everything works fine.
> I spotted that when my CI started to fail:
> https://ci.typesafe.com/job/mit-docker-test-ref/191/
> You can easily verify it by installing mesos on your machine and try to 
> connect with spark shell from bin dir:
> ./spark-shell --master mesos://zk://localhost:2181/mesos --conf 
> spark.executor.url=$(pwd)/../spark-2.0.0-SNAPSHOT-bin-test.tgz
> It gets stuck at the point where it tries to create the SparkContext.
> Logging gets stuck here:
> I0705 12:10:10.076617  9303 group.cpp:700] Trying to get 
> '/mesos/json.info_000152' in ZooKeeper
> I0705 12:10:10.076920  9304 detector.cpp:479] A new leading master 
> (UPID=master@127.0.1.1:5050) is detected
> I0705 12:10:10.076956  9303 sched.cpp:326] New master detected at 
> master@127.0.1.1:5050
> I0705 12:10:10.077057  9303 sched.cpp:336] No credentials provided. 
> Attempting to register without authentication
> I0705 12:10:10.090709  9301 sched.cpp:703] Framework registered with 
> 13553f8b-f42c-4f20-88cd-16f1cc153ede-0001
> I verified it also by changing @transient lazy val log to def and it works as 
> expected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-16379) Spark on mesos is broken due to race condition in Logging

2016-07-05 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363289#comment-15363289
 ] 

Michael Gummelt edited comment on SPARK-16379 at 7/5/16 9:33 PM:
-

I say we add a new lock to synchronize on and be done with it.

The root of the issue is that deadlock detection is hard.  The author of the 
breaking change added a critical region, and to do so safely, you have to 
ensure that all calling code paths haven't acquired the same lock, which is 
difficult (undecidable).

The only process change I can imagine to fix the higher level issue is running 
some sort of deadlock detection tool in the Spark tests.  I agree we shouldn't 
get rid of `lazy val` completely, but it is unfortunate that you can't use them 
in a `synchronized` block.  It's a leaky abstraction.  Seems to be addressed 
here: 
http://docs.scala-lang.org/sips/pending/improved-lazy-val-initialization.html

> One other thing i hope it holds is no new commit should break the project 
> even if it fixes something or reveals another issue etc.

Well I do agree with Sean that it's on us to fix bugs revealed by external 
changes.


was (Author: mgummelt):
I say we add a new lock to synchronize on and be done with it.

The root of the issue is that deadlock detection is hard.  The author of the 
breaking change added a critical region, and to do so safely, you have to 
ensure that all calling code paths haven't acquired the same lock, which is 
difficult (undecidable).

The only process change I can imagine to fix the higher level issue is running 
some sort of deadlock detection tool in the Spark tests.  I agree we shouldn't 
get rid of `lazy val` completely, but it is unfortunate that you can't use them 
in a `synchronized` block.  It's a leaky abstraction.  Seems to be addressed 
here: 
http://docs.scala-lang.org/sips/pending/improved-lazy-val-initialization.html

> Spark on mesos is broken due to race condition in Logging
> -
>
> Key: SPARK-16379
> URL: https://issues.apache.org/jira/browse/SPARK-16379
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Stavros Kontopoulos
>Priority: Blocker
> Attachments: out.txt
>
>
> This commit introduced a transient lazy log val: 
> https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec
> This has caused problems in the past:
> https://github.com/apache/spark/pull/1004
> One commit before that everything works fine.
> I spotted that when my CI started to fail:
> https://ci.typesafe.com/job/mit-docker-test-ref/191/
> You can easily verify it by installing mesos on your machine and try to 
> connect with spark shell from bin dir:
> ./spark-shell --master mesos://zk://localhost:2181/mesos --conf 
> spark.executor.url=$(pwd)/../spark-2.0.0-SNAPSHOT-bin-test.tgz
> It gets stuck at the point where it tries to create the SparkContext.
> Logging gets stuck here:
> I0705 12:10:10.076617  9303 group.cpp:700] Trying to get 
> '/mesos/json.info_000152' in ZooKeeper
> I0705 12:10:10.076920  9304 detector.cpp:479] A new leading master 
> (UPID=master@127.0.1.1:5050) is detected
> I0705 12:10:10.076956  9303 sched.cpp:326] New master detected at 
> master@127.0.1.1:5050
> I0705 12:10:10.077057  9303 sched.cpp:336] No credentials provided. 
> Attempting to register without authentication
> I0705 12:10:10.090709  9301 sched.cpp:703] Framework registered with 
> 13553f8b-f42c-4f20-88cd-16f1cc153ede-0001
> I verified it also by changing @transient lazy val log to def and it works as 
> expected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-16379) Spark on mesos is broken due to race condition in Logging

2016-07-05 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363289#comment-15363289
 ] 

Michael Gummelt edited comment on SPARK-16379 at 7/5/16 9:35 PM:
-

I say we add a new lock to synchronize on and be done with it.

The root of the issue is that deadlock detection is hard.  The author of the 
breaking change added a critical region, and to do so safely, you have to 
ensure that all calling code paths haven't acquired the same lock, which is 
difficult (undecidable).

The only process change I can imagine to fix the higher level issue is running 
some sort of deadlock detection tool in the Spark tests.  I agree we shouldn't 
get rid of `lazy val` completely, but it is unfortunate that you can't use them 
in a `synchronized` block.  It's a leaky abstraction.  Seems to be addressed 
here: 
http://docs.scala-lang.org/sips/pending/improved-lazy-val-initialization.html

{quote}
One other thing i hope it holds is no new commit should break the project even 
if it fixes something or reveals another issue etc.
{quote}

Well I do agree with Sean that it's on us to fix bugs revealed by external 
changes.


was (Author: mgummelt):
I say we add a new lock to synchronize on and be done with it.

The root of the issue is that deadlock detection is hard.  The author of the 
breaking change added a critical region, and to do so safely, you have to 
ensure that all calling code paths haven't acquired the same lock, which is 
difficult (undecidable).

The only process change I can imagine to fix the higher level issue is running 
some sort of deadlock detection tool in the Spark tests.  I agree we shouldn't 
get rid of `lazy val` completely, but it is unfortunate that you can't use them 
in a `synchronized` block.  It's a leaky abstraction.  Seems to be addressed 
here: 
http://docs.scala-lang.org/sips/pending/improved-lazy-val-initialization.html

| One other thing i hope it holds is no new commit should break the project 
even if it fixes something or reveals another issue etc.

Well I do agree with Sean that it's on us to fix bugs revealed by external 
changes.

> Spark on mesos is broken due to race condition in Logging
> -
>
> Key: SPARK-16379
> URL: https://issues.apache.org/jira/browse/SPARK-16379
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Stavros Kontopoulos
>Priority: Blocker
> Attachments: out.txt
>
>
> This commit introduced a transient lazy log val: 
> https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec
> This has caused problems in the past:
> https://github.com/apache/spark/pull/1004
> One commit before that everything works fine.
> I spotted that when my CI started to fail:
> https://ci.typesafe.com/job/mit-docker-test-ref/191/
> You can easily verify it by installing mesos on your machine and try to 
> connect with spark shell from bin dir:
> ./spark-shell --master mesos://zk://localhost:2181/mesos --conf 
> spark.executor.url=$(pwd)/../spark-2.0.0-SNAPSHOT-bin-test.tgz
> It gets stuck at the point where it tries to create the SparkContext.
> Logging gets stuck here:
> I0705 12:10:10.076617  9303 group.cpp:700] Trying to get 
> '/mesos/json.info_000152' in ZooKeeper
> I0705 12:10:10.076920  9304 detector.cpp:479] A new leading master 
> (UPID=master@127.0.1.1:5050) is detected
> I0705 12:10:10.076956  9303 sched.cpp:326] New master detected at 
> master@127.0.1.1:5050
> I0705 12:10:10.077057  9303 sched.cpp:336] No credentials provided. 
> Attempting to register without authentication
> I0705 12:10:10.090709  9301 sched.cpp:703] Framework registered with 
> 13553f8b-f42c-4f20-88cd-16f1cc153ede-0001
> I verified it also by changing @transient lazy val log to def and it works as 
> expected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11857) Remove Mesos fine-grained mode subject to discussions

2016-07-05 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363367#comment-15363367
 ] 

Michael Gummelt commented on SPARK-11857:
-

I'll give [~amcelwee] a couple days to respond.

[~dragos] [~skonto] [~tnachen] speak now or forever hold your peace.

> Remove Mesos fine-grained mode subject to discussions
> -
>
> Key: SPARK-11857
> URL: https://issues.apache.org/jira/browse/SPARK-11857
> Project: Spark
>  Issue Type: Sub-task
>  Components: Mesos
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>
> See discussions in
> http://apache-spark-developers-list.1001551.n3.nabble.com/Removing-the-Mesos-fine-grained-mode-td15277.html
> and
> http://apache-spark-developers-list.1001551.n3.nabble.com/Please-reply-if-you-use-Mesos-fine-grained-mode-td14930.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16379) Spark on mesos is broken due to race condition in Logging

2016-07-05 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363256#comment-15363256
 ] 

Michael Gummelt commented on SPARK-16379:
-

> The previous code also involved acquiring a lock

Link?  I don't see this.  Or do you just mean the null check? 
https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec#diff-bfd5810d8aa78ad90150e806d830bb78L45



> Spark on mesos is broken due to race condition in Logging
> -
>
> Key: SPARK-16379
> URL: https://issues.apache.org/jira/browse/SPARK-16379
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Stavros Kontopoulos
>Priority: Blocker
> Attachments: out.txt
>
>
> This commit introduced a transient lazy log val: 
> https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec
> This has caused problems in the past:
> https://github.com/apache/spark/pull/1004
> One commit before that everything works fine.
> I spotted that when my CI started to fail:
> https://ci.typesafe.com/job/mit-docker-test-ref/191/
> You can easily verify it by installing mesos on your machine and try to 
> connect with spark shell from bin dir:
> ./spark-shell --master mesos://zk://localhost:2181/mesos --conf 
> spark.executor.url=$(pwd)/../spark-2.0.0-SNAPSHOT-bin-test.tgz
> It gets stuck at the point where it tries to create the SparkContext.
> Logging gets stuck here:
> I0705 12:10:10.076617  9303 group.cpp:700] Trying to get 
> '/mesos/json.info_000152' in ZooKeeper
> I0705 12:10:10.076920  9304 detector.cpp:479] A new leading master 
> (UPID=master@127.0.1.1:5050) is detected
> I0705 12:10:10.076956  9303 sched.cpp:326] New master detected at 
> master@127.0.1.1:5050
> I0705 12:10:10.077057  9303 sched.cpp:336] No credentials provided. 
> Attempting to register without authentication
> I0705 12:10:10.090709  9301 sched.cpp:703] Framework registered with 
> 13553f8b-f42c-4f20-88cd-16f1cc153ede-0001
> I verified it also by changing @transient lazy val log to def and it works as 
> expected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16379) Spark on mesos is broken due to race condition in Logging

2016-07-05 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363258#comment-15363258
 ] 

Michael Gummelt commented on SPARK-16379:
-

> The previous code also involved acquiring a lock

Link? I don't see this. Or do you just mean the null check? 
https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec#diff-bfd5810d8aa78ad90150e806d830bb78L45


> Spark on mesos is broken due to race condition in Logging
> -
>
> Key: SPARK-16379
> URL: https://issues.apache.org/jira/browse/SPARK-16379
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Stavros Kontopoulos
>Priority: Blocker
> Attachments: out.txt
>
>
> This commit introduced a transient lazy log val: 
> https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec
> This has caused problems in the past:
> https://github.com/apache/spark/pull/1004
> One commit before that everything works fine.
> I spotted that when my CI started to fail:
> https://ci.typesafe.com/job/mit-docker-test-ref/191/
> You can easily verify it by installing mesos on your machine and try to 
> connect with spark shell from bin dir:
> ./spark-shell --master mesos://zk://localhost:2181/mesos --conf 
> spark.executor.url=$(pwd)/../spark-2.0.0-SNAPSHOT-bin-test.tgz
> It gets stuck at the point where it tries to create the SparkContext.
> Logging gets stuck here:
> I0705 12:10:10.076617  9303 group.cpp:700] Trying to get 
> '/mesos/json.info_000152' in ZooKeeper
> I0705 12:10:10.076920  9304 detector.cpp:479] A new leading master 
> (UPID=master@127.0.1.1:5050) is detected
> I0705 12:10:10.076956  9303 sched.cpp:326] New master detected at 
> master@127.0.1.1:5050
> I0705 12:10:10.077057  9303 sched.cpp:336] No credentials provided. 
> Attempting to register without authentication
> I0705 12:10:10.090709  9301 sched.cpp:703] Framework registered with 
> 13553f8b-f42c-4f20-88cd-16f1cc153ede-0001
> I verified it also by changing @transient lazy val log to def and it works as 
> expected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11857) Remove Mesos fine-grained mode subject to discussions

2016-07-05 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363345#comment-15363345
 ] 

Michael Gummelt commented on SPARK-11857:
-

For completeness, here's my theoretical analysis to augment our empirical 
observation that users don't mind fine-grained mode being removed.

Fine-grained mode provides two benefits:
1) Slow-start
  Executors are brought up lazily

2) Relinquishing cores
  Cores are relinquished back to Mesos as Spark tasks terminate

Fine-grained mode does *not* provide the following benefits, though some think 
it does:
a) Relinquishing memory
  The JVM doesn't relinquish memory, so it would be unsafe for us to resize the 
cgroup

b) Relinquishing executors

As for alternatives to the benefits, 1) is provided by dynamic allocation, 
though we need a better recommended setup for this as I document here: 
http://apache-spark-developers-list.1001551.n3.nabble.com/HDFS-as-Shuffle-Service-td17340.html
There is no alternative to 2), but we've generally found that the 
executor-level granularity of dynamic allocation is sufficient for most. 
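
For benefit 1), a sketch of the standard dynamic-allocation settings that recover 
slow-start behavior (values are illustrative; the external shuffle service setup is 
the part that still needs a better recommended story, per the link above):

{code}
// Sketch: dynamic allocation brings executors up and down with load,
// replacing fine-grained mode's lazy executor start.
val conf = new org.apache.spark.SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true")   // required by dynamic allocation
  .set("spark.dynamicAllocation.minExecutors", "0")
{code}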

> Remove Mesos fine-grained mode subject to discussions
> -
>
> Key: SPARK-11857
> URL: https://issues.apache.org/jira/browse/SPARK-11857
> Project: Spark
>  Issue Type: Sub-task
>  Components: Mesos
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>
> See discussions in
> http://apache-spark-developers-list.1001551.n3.nabble.com/Removing-the-Mesos-fine-grained-mode-td15277.html
> and
> http://apache-spark-developers-list.1001551.n3.nabble.com/Please-reply-if-you-use-Mesos-fine-grained-mode-td14930.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11857) Remove Mesos fine-grained mode subject to discussions

2016-07-05 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363410#comment-15363410
 ] 

Michael Gummelt commented on SPARK-11857:
-

Thanks!

> Remove Mesos fine-grained mode subject to discussions
> -
>
> Key: SPARK-11857
> URL: https://issues.apache.org/jira/browse/SPARK-11857
> Project: Spark
>  Issue Type: Sub-task
>  Components: Mesos
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>
> See discussions in
> http://apache-spark-developers-list.1001551.n3.nabble.com/Removing-the-Mesos-fine-grained-mode-td15277.html
> and
> http://apache-spark-developers-list.1001551.n3.nabble.com/Please-reply-if-you-use-Mesos-fine-grained-mode-td14930.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16379) Spark on mesos is broken due to race condition in Logging

2016-07-05 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363289#comment-15363289
 ] 

Michael Gummelt commented on SPARK-16379:
-

I say we add a new lock to synchronize on and be done with it.

The root of the issue is that deadlock detection is hard.  The author of the 
breaking change added a critical region, and to do so safely, you have to 
ensure that all calling code paths haven't acquired the same lock, which is 
difficult (undecidable).

The only process change I can imagine to fix the higher level issue is running 
some sort of deadlock detection tool in the Spark tests.

> Spark on mesos is broken due to race condition in Logging
> -
>
> Key: SPARK-16379
> URL: https://issues.apache.org/jira/browse/SPARK-16379
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Stavros Kontopoulos
>Priority: Blocker
> Attachments: out.txt
>
>
> This commit introduced a transient lazy log val: 
> https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec
> This has caused problems in the past:
> https://github.com/apache/spark/pull/1004
> One commit before that everything works fine.
> I spotted that when my CI started to fail:
> https://ci.typesafe.com/job/mit-docker-test-ref/191/
> You can easily verify it by installing mesos on your machine and try to 
> connect with spark shell from bin dir:
> ./spark-shell --master mesos://zk://localhost:2181/mesos --conf 
> spark.executor.url=$(pwd)/../spark-2.0.0-SNAPSHOT-bin-test.tgz
> It gets stuck at the point where it tries to create the SparkContext.
> Logging gets stuck here:
> I0705 12:10:10.076617  9303 group.cpp:700] Trying to get 
> '/mesos/json.info_000152' in ZooKeeper
> I0705 12:10:10.076920  9304 detector.cpp:479] A new leading master 
> (UPID=master@127.0.1.1:5050) is detected
> I0705 12:10:10.076956  9303 sched.cpp:326] New master detected at 
> master@127.0.1.1:5050
> I0705 12:10:10.077057  9303 sched.cpp:336] No credentials provided. 
> Attempting to register without authentication
> I0705 12:10:10.090709  9301 sched.cpp:703] Framework registered with 
> 13553f8b-f42c-4f20-88cd-16f1cc153ede-0001
> I verified it also by changing @transient lazy val log to def and it works as 
> expected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16379) Spark on mesos is broken due to race condition in Logging

2016-07-05 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363290#comment-15363290
 ] 

Michael Gummelt commented on SPARK-16379:
-

Hmmm, since that's a different lock, I don't see the possibility for deadlock 
in the previous code, but I'm content to relinquish the point.  Concurrency is 
hard :)

> Spark on mesos is broken due to race condition in Logging
> -
>
> Key: SPARK-16379
> URL: https://issues.apache.org/jira/browse/SPARK-16379
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Stavros Kontopoulos
>Priority: Blocker
> Attachments: out.txt
>
>
> This commit introduced a transient lazy log val: 
> https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec
> This has caused problems in the past:
> https://github.com/apache/spark/pull/1004
> One commit before that everything works fine.
> I spotted that when my CI started to fail:
> https://ci.typesafe.com/job/mit-docker-test-ref/191/
> You can easily verify it by installing mesos on your machine and try to 
> connect with spark shell from bin dir:
> ./spark-shell --master mesos://zk://localhost:2181/mesos --conf 
> spark.executor.url=$(pwd)/../spark-2.0.0-SNAPSHOT-bin-test.tgz
> It gets stuck at the point where it tries to create the SparkContext.
> Logging gets stuck here:
> I0705 12:10:10.076617  9303 group.cpp:700] Trying to get 
> '/mesos/json.info_000152' in ZooKeeper
> I0705 12:10:10.076920  9304 detector.cpp:479] A new leading master 
> (UPID=master@127.0.1.1:5050) is detected
> I0705 12:10:10.076956  9303 sched.cpp:326] New master detected at 
> master@127.0.1.1:5050
> I0705 12:10:10.077057  9303 sched.cpp:336] No credentials provided. 
> Attempting to register without authentication
> I0705 12:10:10.090709  9301 sched.cpp:703] Framework registered with 
> 13553f8b-f42c-4f20-88cd-16f1cc153ede-0001
> I verified it also by changing @transient lazy val log to def and it works as 
> expected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11857) Remove Mesos fine-grained mode subject to discussions

2016-07-05 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363311#comment-15363311
 ] 

Michael Gummelt commented on SPARK-11857:
-

I endorse the deprecation.  Fine-grained mode would be more useful if the JVM 
could shrink in memory as well as cores, but alas...

We at Mesosphere haven't heard any objections from users regarding the loss of 
fine-grained.

[~andrewor14] Please cc me if you need Mesos input.  Tim is still active, I 
believe, but no longer at Mesosphere.  I work (mostly) full-time on Spark on 
Mesos.

> Remove Mesos fine-grained mode subject to discussions
> -
>
> Key: SPARK-11857
> URL: https://issues.apache.org/jira/browse/SPARK-11857
> Project: Spark
>  Issue Type: Sub-task
>  Components: Mesos
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>
> See discussions in
> http://apache-spark-developers-list.1001551.n3.nabble.com/Removing-the-Mesos-fine-grained-mode-td15277.html
> and
> http://apache-spark-developers-list.1001551.n3.nabble.com/Please-reply-if-you-use-Mesos-fine-grained-mode-td14930.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-11857) Remove Mesos fine-grained mode subject to discussions

2016-07-05 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt reopened SPARK-11857:
-

> Remove Mesos fine-grained mode subject to discussions
> -
>
> Key: SPARK-11857
> URL: https://issues.apache.org/jira/browse/SPARK-11857
> Project: Spark
>  Issue Type: Sub-task
>  Components: Mesos
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>
> See discussions in
> http://apache-spark-developers-list.1001551.n3.nabble.com/Removing-the-Mesos-fine-grained-mode-td15277.html
> and
> http://apache-spark-developers-list.1001551.n3.nabble.com/Please-reply-if-you-use-Mesos-fine-grained-mode-td14930.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-16379) Spark on mesos is broken due to race condition in Logging

2016-07-05 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363289#comment-15363289
 ] 

Michael Gummelt edited comment on SPARK-16379 at 7/5/16 9:31 PM:
-

I say we add a new lock to synchronize on and be done with it.

The root of the issue is that deadlock detection is hard.  The author of the 
breaking change added a critical region, and to do so safely, you have to 
ensure that all calling code paths haven't acquired the same lock, which is 
difficult (undecidable).

The only process change I can imagine to fix the higher level issue is running 
some sort of deadlock detection tool in the Spark tests.  I agree we shouldn't 
get rid of `lazy val` completely, but it is unfortunate that you can't use them 
in a `synchronized` block.  It's a leaky abstraction.  Seems to be addressed 
here: 
http://docs.scala-lang.org/sips/pending/improved-lazy-val-initialization.html


was (Author: mgummelt):
I say we add a new lock to synchronize on and be done with it.

The root of the issue is that deadlock detection is hard.  The author of the 
breaking change added a critical region, and to do so safely, you have to 
ensure that all calling code paths haven't acquired the same lock, which is 
difficult (undecidable).

The only process change I can imagine to fix the higher level issue is running 
some sort of deadlock detection tool in the Spark tests.

> Spark on mesos is broken due to race condition in Logging
> -
>
> Key: SPARK-16379
> URL: https://issues.apache.org/jira/browse/SPARK-16379
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Stavros Kontopoulos
>Priority: Blocker
> Attachments: out.txt
>
>
> This commit introduced a transient lazy log val: 
> https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec
> This has caused problems in the past:
> https://github.com/apache/spark/pull/1004
> One commit before that everything works fine.
> I spotted that when my CI started to fail:
> https://ci.typesafe.com/job/mit-docker-test-ref/191/
> You can easily verify it by installing mesos on your machine and try to 
> connect with spark shell from bin dir:
> ./spark-shell --master mesos://zk://localhost:2181/mesos --conf 
> spark.executor.url=$(pwd)/../spark-2.0.0-SNAPSHOT-bin-test.tgz
> It gets stuck at the point where it tries to create the SparkContext.
> Logging gets stuck here:
> I0705 12:10:10.076617  9303 group.cpp:700] Trying to get 
> '/mesos/json.info_000152' in ZooKeeper
> I0705 12:10:10.076920  9304 detector.cpp:479] A new leading master 
> (UPID=master@127.0.1.1:5050) is detected
> I0705 12:10:10.076956  9303 sched.cpp:326] New master detected at 
> master@127.0.1.1:5050
> I0705 12:10:10.077057  9303 sched.cpp:336] No credentials provided. 
> Attempting to register without authentication
> I0705 12:10:10.090709  9301 sched.cpp:703] Framework registered with 
> 13553f8b-f42c-4f20-88cd-16f1cc153ede-0001
> I verified it also by changing @transient lazy val log to def and it works as 
> expected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-16379) Spark on mesos is broken due to race condition in Logging

2016-07-05 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt updated SPARK-16379:

Comment: was deleted

(was: > The previous code also involved acquiring a lock

Link?  I don't see this.  Or do you just mean the null check? 
https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec#diff-bfd5810d8aa78ad90150e806d830bb78L45

)

> Spark on mesos is broken due to race condition in Logging
> -
>
> Key: SPARK-16379
> URL: https://issues.apache.org/jira/browse/SPARK-16379
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Stavros Kontopoulos
>Priority: Blocker
> Attachments: out.txt
>
>
> This commit introduced a transient lazy log val: 
> https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec
> This has caused problems in the past:
> https://github.com/apache/spark/pull/1004
> One commit before that everything works fine.
> I spotted that when my CI started to fail:
> https://ci.typesafe.com/job/mit-docker-test-ref/191/
> You can easily verify it by installing mesos on your machine and try to 
> connect with spark shell from bin dir:
> ./spark-shell --master mesos://zk://localhost:2181/mesos --conf 
> spark.executor.url=$(pwd)/../spark-2.0.0-SNAPSHOT-bin-test.tgz
> It gets stuck at the point where it tries to create the SparkContext.
> Logging gets stuck here:
> I0705 12:10:10.076617  9303 group.cpp:700] Trying to get 
> '/mesos/json.info_000152' in ZooKeeper
> I0705 12:10:10.076920  9304 detector.cpp:479] A new leading master 
> (UPID=master@127.0.1.1:5050) is detected
> I0705 12:10:10.076956  9303 sched.cpp:326] New master detected at 
> master@127.0.1.1:5050
> I0705 12:10:10.077057  9303 sched.cpp:336] No credentials provided. 
> Attempting to register without authentication
> I0705 12:10:10.090709  9301 sched.cpp:703] Framework registered with 
> 13553f8b-f42c-4f20-88cd-16f1cc153ede-0001
> I verified it also by changing @transient lazy val log to def and it works as 
> expected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-16379) Spark on mesos is broken due to race condition in Logging

2016-07-05 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363258#comment-15363258
 ] 

Michael Gummelt edited comment on SPARK-16379 at 7/5/16 9:15 PM:
-

> it's entirely possible that code has a bug that's only revealed when some 
> other legitimate change happens

Of course, but I still don't see the bug that existed previously.  Perhaps 
`synchronized` was unnecessary, but I still see no race condition or deadlock 
in the previous code.  Maybe following up on this will help:

> The previous code also involved acquiring a lock

Link? I don't see this. Or do you just mean the null check? 
https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec#diff-bfd5810d8aa78ad90150e806d830bb78L45



was (Author: mgummelt):
> The previous code also involved acquiring a lock

Link? I don't see this. Or do you just mean the null check? 
https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec#diff-bfd5810d8aa78ad90150e806d830bb78L45


> Spark on mesos is broken due to race condition in Logging
> -
>
> Key: SPARK-16379
> URL: https://issues.apache.org/jira/browse/SPARK-16379
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Stavros Kontopoulos
>Priority: Blocker
> Attachments: out.txt
>
>
> This commit introduced a transient lazy log val: 
> https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec
> This has caused problems in the past:
> https://github.com/apache/spark/pull/1004
> One commit before that, everything works fine.
> I spotted that when my CI started to fail:
> https://ci.typesafe.com/job/mit-docker-test-ref/191/
> You can easily verify it by installing Mesos on your machine and trying to 
> connect with spark-shell from the bin dir:
> ./spark-shell --master mesos://zk://localhost:2181/mesos --conf 
> spark.executor.url=$(pwd)/../spark-2.0.0-SNAPSHOT-bin-test.tgz
> It gets stuck at the point where it tries to create the SparkContext.
> Logging gets stuck here:
> I0705 12:10:10.076617  9303 group.cpp:700] Trying to get 
> '/mesos/json.info_000152' in ZooKeeper
> I0705 12:10:10.076920  9304 detector.cpp:479] A new leading master 
> (UPID=master@127.0.1.1:5050) is detected
> I0705 12:10:10.076956  9303 sched.cpp:326] New master detected at 
> master@127.0.1.1:5050
> I0705 12:10:10.077057  9303 sched.cpp:336] No credentials provided. 
> Attempting to register without authentication
> I0705 12:10:10.090709  9301 sched.cpp:703] Framework registered with 
> 13553f8b-f42c-4f20-88cd-16f1cc153ede-0001
> I also verified it by changing @transient lazy val log to def, and it works 
> as expected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11857) Remove Mesos fine-grained mode subject to discussions

2016-07-05 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363321#comment-15363321
 ] 

Michael Gummelt commented on SPARK-11857:
-

[~amcelwee] Do you have any more input on this issue?  We're moving forward 
with deprecating fine-grained mode, but we're willing to solve your issue first.

> Remove Mesos fine-grained mode subject to discussions
> -
>
> Key: SPARK-11857
> URL: https://issues.apache.org/jira/browse/SPARK-11857
> Project: Spark
>  Issue Type: Sub-task
>  Components: Mesos
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>
> See discussions in
> http://apache-spark-developers-list.1001551.n3.nabble.com/Removing-the-Mesos-fine-grained-mode-td15277.html
> and
> http://apache-spark-developers-list.1001551.n3.nabble.com/Please-reply-if-you-use-Mesos-fine-grained-mode-td14930.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16923) Mesos cluster scheduler duplicates config vars by setting them in the environment and as --conf

2016-08-05 Thread Michael Gummelt (JIRA)
Michael Gummelt created SPARK-16923:
---

 Summary: Mesos cluster scheduler duplicates config vars by setting 
them in the environment and as --conf
 Key: SPARK-16923
 URL: https://issues.apache.org/jira/browse/SPARK-16923
 Project: Spark
  Issue Type: Task
  Components: Mesos
Affects Versions: 2.0.0
Reporter: Michael Gummelt


I don't think this introduces any bugs, but we should fix it nonetheless.
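
As a hedged sketch of the duplication (illustrative names, not the
dispatcher's actual code): each submitted property currently reaches the
driver twice, once through the environment and once as a --conf flag.

{noformat}
val submittedProps = Map("spark.executor.memory" -> "4g")

// Once via the environment...
val env: Map[String, String] =
  submittedProps.map { case (k, v) => k.toUpperCase.replace('.', '_') -> v }
// ...and once more as command-line --conf arguments.
val confArgs: Seq[String] =
  submittedProps.toSeq.flatMap { case (k, v) => Seq("--conf", s"$k=$v") }

// Emitting only the --conf form would avoid carrying each value twice.
{noformat}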



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12909) Spark on Mesos accessing Secured HDFS w/Kerberos

2016-08-08 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15412624#comment-15412624
 ] 

Michael Gummelt commented on SPARK-12909:
-

DC/OS Spark has this functionality, and we'll be upstreaming it to Apache Spark 
soon.

> Spark on Mesos accessing Secured HDFS w/Kerberos
> 
>
> Key: SPARK-12909
> URL: https://issues.apache.org/jira/browse/SPARK-12909
> Project: Spark
>  Issue Type: New Feature
>  Components: Mesos
>Reporter: Greg Senia
>
> Ability for Spark on Mesos to use a Kerberized HDFS FileSystem for data. It 
> seems like this is not possible, based on email chains and forum articles. If 
> that is true, how hard would it be to get this implemented? I'm willing to 
> try to help.
> https://community.hortonworks.com/questions/5415/spark-on-yarn-vs-mesos.html
> https://www.mail-archive.com/user@spark.apache.org/msg31326.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11638) Run Spark on Mesos with bridge networking

2016-08-08 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15412461#comment-15412461
 ] 

Michael Gummelt commented on SPARK-11638:
-

[~radekg]

> The only advantage we had was using the same configuration inside of the 
> docker container.

You mean you want to run the spark driver in a docker container?  Which 
configuration did you have to change?  I can look more into this, but I need a 
clear "It's easier/better to do X in bridge mode than in host mode".

> So with the HTTP API, Spark would still require the heavy libmesos in order 
> to work with Mesos?

No.  The HTTP API will remove the libmesos dependency, which is nice.  It's not 
an urgent priority though. 

> Run Spark on Mesos with bridge networking
> -
>
> Key: SPARK-11638
> URL: https://issues.apache.org/jira/browse/SPARK-11638
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos, Spark Core
>Affects Versions: 1.4.0, 1.4.1, 1.5.0, 1.5.1, 1.5.2, 1.6.0
>Reporter: Radoslaw Gruchalski
> Attachments: 1.4.0.patch, 1.4.1.patch, 1.5.0.patch, 1.5.1.patch, 
> 1.5.2.patch, 1.6.0.patch, 2.3.11.patch, 2.3.4.patch
>
>
> h4. Summary
> Provides {{spark.driver.advertisedPort}}, 
> {{spark.fileserver.advertisedPort}}, {{spark.broadcast.advertisedPort}} and 
> {{spark.replClassServer.advertisedPort}} settings to enable running Spark in 
> Mesos on Docker with Bridge networking. Provides patches for Akka Remote to 
> enable Spark driver advertisement using alternative host and port.
> With these settings, it is possible to run Spark Master in a Docker container 
> and have the executors running on Mesos talk back correctly to such Master.
> The problem is discussed on the Mesos mailing list here: 
> https://mail-archives.apache.org/mod_mbox/mesos-user/201510.mbox/%3CCACTd3c9vjAMXk=bfotj5ljzfrh5u7ix-ghppfqknvg9mkkc...@mail.gmail.com%3E
> h4. Running Spark on Mesos - LIBPROCESS_ADVERTISE_IP opens the door
> In order for the framework to receive orders in the bridged container, Mesos 
> in the container has to register for offers using the IP address of the 
> Agent. Offers are sent by Mesos Master to the Docker container running on a 
> different host, an Agent. Normally, prior to Mesos 0.24.0, {{libprocess}} 
> would advertise itself using the IP address of the container, something like 
> {{172.x.x.x}}. Obviously, Mesos Master can't reach that address, it's a 
> different host, it's a different machine. Mesos 0.24.0 introduced two new 
> properties for {{libprocess}} - {{LIBPROCESS_ADVERTISE_IP}} and 
> {{LIBPROCESS_ADVERTISE_PORT}}. This allows the container to use the Agent's 
> address to register for offers. This was provided mainly for running Mesos in 
> Docker on Mesos.
> h4. Spark - how does the above relate and what is being addressed here?
> Similar to Mesos, out of the box, Spark does not allow advertising its 
> services on ports different from its bind ports. Consider the following scenario:
> Spark is running inside a Docker container on Mesos, in bridge networking 
> mode. Assuming a port {{}} for the {{spark.driver.port}}, {{6677}} for 
> the {{spark.fileserver.port}}, {{6688}} for the {{spark.broadcast.port}} and 
> {{23456}} for the {{spark.replClassServer.port}}. If such a task is posted to 
> Marathon, Mesos will give 4 ports in range {{31000-32000}} mapping to the 
> container ports. Starting the executors from such container results in 
> executors not being able to communicate back to the Spark Master.
> This happens because of 2 things:
> Spark driver is effectively an {{akka-remote}} system with {{akka.tcp}} 
> transport. {{akka-remote}} prior to version {{2.4}} can't advertise a port 
> different from the one it is bound to. The settings discussed are here: 
> https://github.com/akka/akka/blob/f8c1671903923837f22d0726a955e0893add5e9f/akka-remote/src/main/resources/reference.conf#L345-L376.
>  These do not exist in Akka {{2.3.x}}. Spark driver will always advertise 
> port {{}} as this is the one {{akka-remote}} is bound to.
> Any URIs the executors contact the Spark Master on, are prepared by Spark 
> Master and handed over to executors. These always contain the port number 
> used by the Master to find the service on. The services are:
> - {{spark.broadcast.port}}
> - {{spark.fileserver.port}}
> - {{spark.replClassServer.port}}
> all above ports are by default {{0}} (random assignment) but can be specified 
> using Spark configuration ( {{-Dspark...port}} ). However, they are limited 
> in the same way as the {{spark.driver.port}}; in the above example, an 
> executor should not contact the file server on port {{6677}} but rather on 
> the respective 31xxx assigned by Mesos.
> Spark currently does not allow any of that.
> h4. Taking on the problem, step 1: Spark Driver
> As mentioned above, Spark 
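
A hypothetical sketch of the advertised-port resolution the description
proposes, assuming a fallback to the bind port (property names follow the
proposal above; this is not Spark's actual code):

{noformat}
def advertisedPort(conf: Map[String, String], service: String): Int = {
  val bindPort = conf.getOrElse(s"spark.$service.port", "0").toInt
  conf.get(s"spark.$service.advertisedPort").map(_.toInt).getOrElse(bindPort)
}

// In bridge mode, Marathon might map container port 6677 to host port
// 31005; executors must be handed 31005, not 6677:
advertisedPort(Map("spark.fileserver.port" -> "6677",
                   "spark.fileserver.advertisedPort" -> "31005"),
               "fileserver")  // == 31005
{noformat}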

[jira] [Created] (SPARK-16927) Mesos Cluster Dispatcher default properties

2016-08-05 Thread Michael Gummelt (JIRA)
Michael Gummelt created SPARK-16927:
---

 Summary: Mesos Cluster Dispatcher default properties
 Key: SPARK-16927
 URL: https://issues.apache.org/jira/browse/SPARK-16927
 Project: Spark
  Issue Type: New Feature
  Components: Mesos
Affects Versions: 2.0.0
Reporter: Michael Gummelt


Add the capability to set default driver properties for all jobs submitted 
through the dispatcher.
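
A minimal sketch of the requested behavior, assuming dispatcher-wide
defaults that each job's own properties override (illustrative names, not
the dispatcher's actual code):

{noformat}
val dispatcherDefaults = Map(
  "spark.executor.memory" -> "2g",
  "spark.eventLog.enabled" -> "true")

// Job-supplied properties win over dispatcher defaults.
def effectiveProps(jobProps: Map[String, String]): Map[String, String] =
  dispatcherDefaults ++ jobProps

// effectiveProps(Map("spark.executor.memory" -> "8g"))
//   == Map("spark.executor.memory" -> "8g", "spark.eventLog.enabled" -> "true")
{noformat}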



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11638) Run Spark on Mesos with bridge networking

2016-08-07 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411277#comment-15411277
 ] 

Michael Gummelt commented on SPARK-11638:
-

This JIRA is complex and a lot of it is out of date.  Can someone briefly 
explain to me what the problem is?  Why do you want bridge networking?



> Run Spark on Mesos with bridge networking
> -
>
> Key: SPARK-11638
> URL: https://issues.apache.org/jira/browse/SPARK-11638
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos, Spark Core
>Affects Versions: 1.4.0, 1.4.1, 1.5.0, 1.5.1, 1.5.2, 1.6.0
>Reporter: Radoslaw Gruchalski
> Attachments: 1.4.0.patch, 1.4.1.patch, 1.5.0.patch, 1.5.1.patch, 
> 1.5.2.patch, 1.6.0.patch, 2.3.11.patch, 2.3.4.patch
>
>
> h4. Summary
> Provides {{spark.driver.advertisedPort}}, 
> {{spark.fileserver.advertisedPort}}, {{spark.broadcast.advertisedPort}} and 
> {{spark.replClassServer.advertisedPort}} settings to enable running Spark in 
> Mesos on Docker with Bridge networking. Provides patches for Akka Remote to 
> enable Spark driver advertisement using alternative host and port.
> With these settings, it is possible to run Spark Master in a Docker container 
> and have the executors running on Mesos talk back correctly to such Master.
> The problem is discussed on the Mesos mailing list here: 
> https://mail-archives.apache.org/mod_mbox/mesos-user/201510.mbox/%3CCACTd3c9vjAMXk=bfotj5ljzfrh5u7ix-ghppfqknvg9mkkc...@mail.gmail.com%3E
> h4. Running Spark on Mesos - LIBPROCESS_ADVERTISE_IP opens the door
> In order for the framework to receive orders in the bridged container, Mesos 
> in the container has to register for offers using the IP address of the 
> Agent. Offers are sent by Mesos Master to the Docker container running on a 
> different host, an Agent. Normally, prior to Mesos 0.24.0, {{libprocess}} 
> would advertise itself using the IP address of the container, something like 
> {{172.x.x.x}}. Obviously, Mesos Master can't reach that address, it's a 
> different host, it's a different machine. Mesos 0.24.0 introduced two new 
> properties for {{libprocess}} - {{LIBPROCESS_ADVERTISE_IP}} and 
> {{LIBPROCESS_ADVERTISE_PORT}}. This allows the container to use the Agent's 
> address to register for offers. This was provided mainly for running Mesos in 
> Docker on Mesos.
> h4. Spark - how does the above relate and what is being addressed here?
> Similar to Mesos, out of the box, Spark does not allow advertising its 
> services on ports different from its bind ports. Consider the following scenario:
> Spark is running inside a Docker container on Mesos, in bridge networking 
> mode. Assuming a port {{}} for the {{spark.driver.port}}, {{6677}} for 
> the {{spark.fileserver.port}}, {{6688}} for the {{spark.broadcast.port}} and 
> {{23456}} for the {{spark.replClassServer.port}}. If such a task is posted to 
> Marathon, Mesos will give 4 ports in range {{31000-32000}} mapping to the 
> container ports. Starting the executors from such container results in 
> executors not being able to communicate back to the Spark Master.
> This happens because of 2 things:
> Spark driver is effectively an {{akka-remote}} system with {{akka.tcp}} 
> transport. {{akka-remote}} prior to version {{2.4}} can't advertise a port 
> different from the one it is bound to. The settings discussed are here: 
> https://github.com/akka/akka/blob/f8c1671903923837f22d0726a955e0893add5e9f/akka-remote/src/main/resources/reference.conf#L345-L376.
>  These do not exist in Akka {{2.3.x}}. Spark driver will always advertise 
> port {{}} as this is the one {{akka-remote}} is bound to.
> Any URIs the executors contact the Spark Master on, are prepared by Spark 
> Master and handed over to executors. These always contain the port number 
> used by the Master to find the service on. The services are:
> - {{spark.broadcast.port}}
> - {{spark.fileserver.port}}
> - {{spark.replClassServer.port}}
> all above ports are by default {{0}} (random assignment) but can be specified 
> using Spark configuration ( {{-Dspark...port}} ). However, they are limited 
> in the same way as the {{spark.driver.port}}; in the above example, an 
> executor should not contact the file server on port {{6677}} but rather on 
> the respective 31xxx assigned by Mesos.
> Spark currently does not allow any of that.
> h4. Taking on the problem, step 1: Spark Driver
> As mentioned above, Spark Driver is based on {{akka-remote}}. In order to 
> take on the problem, the {{akka.remote.net.tcp.bind-hostname}} and 
> {{akka.remote.net.tcp.bind-port}} settings are a must. Spark does not compile 
> with Akka 2.4.x yet.
> What we want is the back port of mentioned {{akka-remote}} settings to 
> {{2.3.x}} versions. These patches are attached to this ticket - 
> {{2.3.4.patch}} and 

[jira] [Commented] (SPARK-16944) [MESOS] Improve data locality when launching new executors when dynamic allocation is enabled

2016-08-07 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411309#comment-15411309
 ] 

Michael Gummelt commented on SPARK-16944:
-

Since Mesos is offer-based, it's up to the Spark scheduler itself to choose 
which offers have the best locality.  In YARN, I think they tell the resource 
manager about preferences.


> [MESOS] Improve data locality when launching new executors when dynamic 
> allocation is enabled
> -
>
> Key: SPARK-16944
> URL: https://issues.apache.org/jira/browse/SPARK-16944
> Project: Spark
>  Issue Type: New Feature
>  Components: Mesos
>Affects Versions: 2.0.0
>Reporter: Sun Rui
>
> Currently Spark on Yarn supports better data locality by considering the 
> preferred locations of the pending tasks when dynamic allocation is enabled. 
> Refer to https://issues.apache.org/jira/browse/SPARK-4352. It would be better 
> that Mesos can also support this feature.
> I guess that some logic existing in Yarn could be reused by Mesos.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16944) [MESOS] Improve data locality when launching new executors when dynamic allocation is enabled

2016-08-07 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411307#comment-15411307
 ] 

Michael Gummelt commented on SPARK-16944:
-

I think we can improve both with and without dynamic allocation.  In both 
modes, Mesos is only looking at locality after it's already placed the 
executors. 

> [MESOS] Improve data locality when launching new executors when dynamic 
> allocation is enabled
> -
>
> Key: SPARK-16944
> URL: https://issues.apache.org/jira/browse/SPARK-16944
> Project: Spark
>  Issue Type: New Feature
>  Components: Mesos
>Affects Versions: 2.0.0
>Reporter: Sun Rui
>
> Currently Spark on Yarn supports better data locality by considering the 
> preferred locations of the pending tasks when dynamic allocation is enabled. 
> Refer to https://issues.apache.org/jira/browse/SPARK-4352. It would be better 
> that Mesos can also support this feature.
> I guess that some logic existing in Yarn could be reused by Mesos.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16944) [MESOS] Improve data locality when launching new executors when dynamic allocation is enabled

2016-08-07 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411315#comment-15411315
 ] 

Michael Gummelt commented on SPARK-16944:
-

Yea, we typically call it "delay scheduling".  It was first written about by 
the Spark/Mesos researchers:  
http://elmeleegy.com/khaled/papers/delay_scheduling.pdf

Spark already has `spark.locality.wait`, but that's how long the task scheduler 
will wait for an executor with the preferred locality to come up.  We need 
a similar concept for waiting for offers to come in, so we can place the 
executor correctly in the first place.
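
A rough sketch of what offer-side delay scheduling could look like: prefer
an offer on a preferred host, and only fall back to an arbitrary offer once
a locality-wait deadline has passed (hypothetical names, not Spark's actual
scheduler code):

{noformat}
case class Offer(host: String, cpus: Double, mem: Double)

def pickOffer(offers: Seq[Offer],
              preferredHosts: Set[String],
              waitedMillis: Long,
              localityWaitMillis: Long): Option[Offer] = {
  val preferred = offers.find(o => preferredHosts.contains(o.host))
  preferred.orElse {
    // Only give up on locality once we've waited long enough.
    if (waitedMillis >= localityWaitMillis) offers.headOption else None
  }
}
{noformat}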

> [MESOS] Improve data locality when launching new executors when dynamic 
> allocation is enabled
> -
>
> Key: SPARK-16944
> URL: https://issues.apache.org/jira/browse/SPARK-16944
> Project: Spark
>  Issue Type: New Feature
>  Components: Mesos
>Affects Versions: 2.0.0
>Reporter: Sun Rui
>
> Currently Spark on Yarn supports better data locality by considering the 
> preferred locations of the pending tasks when dynamic allocation is enabled. 
> Refer to https://issues.apache.org/jira/browse/SPARK-4352. It would be better 
> that Mesos can also support this feature.
> I guess that some logic existing in Yarn could be reused by Mesos.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12909) Spark on Mesos accessing Secured HDFS w/Kerberos

2016-08-09 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15413999#comment-15413999
 ] 

Michael Gummelt commented on SPARK-12909:
-

I agree.  I just spoke with Reynold about this.  I'll create the module before 
the next big feature.

> Spark on Mesos accessing Secured HDFS w/Kerberos
> 
>
> Key: SPARK-12909
> URL: https://issues.apache.org/jira/browse/SPARK-12909
> Project: Spark
>  Issue Type: New Feature
>  Components: Mesos
>Reporter: Greg Senia
>
> Ability for Spark on Mesos to use a Kerberized HDFS FileSystem for data. It 
> seems like this is not possible, based on email chains and forum articles. If 
> that is true, how hard would it be to get this implemented? I'm willing to 
> try to help.
> https://community.hortonworks.com/questions/5415/spark-on-yarn-vs-mesos.html
> https://www.mail-archive.com/user@spark.apache.org/msg31326.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16522) [MESOS] Spark application throws exception on exit

2016-08-09 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt updated SPARK-16522:

Fix Version/s: (was: 2.1.0)
   2.0.1

> [MESOS] Spark application throws exception on exit
> --
>
> Key: SPARK-16522
> URL: https://issues.apache.org/jira/browse/SPARK-16522
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Affects Versions: 2.0.0
>Reporter: Sun Rui
>Assignee: Sun Rui
> Fix For: 2.0.1
>
>
> Spark applications running on Mesos throw exception upon exit as follows:
> {noformat}
> 16/07/13 15:20:46 WARN NettyRpcEndpointRef: Error sending message [message = 
> RemoveExecutor(1,Executor finished with state FINISHED)] in 3 attempts
> org.apache.spark.SparkException: Exception thrown in awaitResult
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
>   at 
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
>   at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
>   at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
>   at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
>   at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
>   at 
> org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:412)
>   at 
> org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.executorTerminated(MesosCoarseGrainedSchedulerBackend.scala:555)
>   at 
> org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.statusUpdate(MesosCoarseGrainedSchedulerBackend.scala:495)
> Caused by: org.apache.spark.SparkException: Could not find 
> CoarseGrainedScheduler.
>   at 
> org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:152)
>   at 
> org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:127)
>   at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:225)
>   at 
> org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:508)
>   at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101)
>   ... 4 more
> Exception in thread "Thread-47" org.apache.spark.SparkException: Error 
> notifying standalone scheduler's driver endpoint
>   at 
> org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:415)
>   at 
> org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.executorTerminated(MesosCoarseGrainedSchedulerBackend.scala:555)
>   at 
> org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.statusUpdate(MesosCoarseGrainedSchedulerBackend.scala:495)
> Caused by: org.apache.spark.SparkException: Error sending message [message = 
> RemoveExecutor(1,Executor finished with state FINISHED)]
>   at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:119)
>   at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
>   at 
> org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:412)
>   ... 2 more
> Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
>   at 
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
>   at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
>   at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
>   at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
>   ... 4 more
> Caused by: org.apache.spark.SparkException: Could not find 
> CoarseGrainedScheduler.
>   at 
> org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:152)
>   at 
> org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:127)
>   at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:225)
> 

[jira] [Reopened] (SPARK-16522) [MESOS] Spark application throws exception on exit

2016-08-09 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt reopened SPARK-16522:
-

Reopening so we can track this until it's merged into the 2.0 branch.

Also changed the fix version to 2.0.1.


> [MESOS] Spark application throws exception on exit
> --
>
> Key: SPARK-16522
> URL: https://issues.apache.org/jira/browse/SPARK-16522
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Affects Versions: 2.0.0
>Reporter: Sun Rui
>Assignee: Sun Rui
> Fix For: 2.0.1
>
>
> Spark applications running on Mesos throw exception upon exit as follows:
> {noformat}
> 16/07/13 15:20:46 WARN NettyRpcEndpointRef: Error sending message [message = 
> RemoveExecutor(1,Executor finished with state FINISHED)] in 3 attempts
> org.apache.spark.SparkException: Exception thrown in awaitResult
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
>   at 
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
>   at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
>   at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
>   at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
>   at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
>   at 
> org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:412)
>   at 
> org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.executorTerminated(MesosCoarseGrainedSchedulerBackend.scala:555)
>   at 
> org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.statusUpdate(MesosCoarseGrainedSchedulerBackend.scala:495)
> Caused by: org.apache.spark.SparkException: Could not find 
> CoarseGrainedScheduler.
>   at 
> org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:152)
>   at 
> org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:127)
>   at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:225)
>   at 
> org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:508)
>   at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101)
>   ... 4 more
> Exception in thread "Thread-47" org.apache.spark.SparkException: Error 
> notifying standalone scheduler's driver endpoint
>   at 
> org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:415)
>   at 
> org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.executorTerminated(MesosCoarseGrainedSchedulerBackend.scala:555)
>   at 
> org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.statusUpdate(MesosCoarseGrainedSchedulerBackend.scala:495)
> Caused by: org.apache.spark.SparkException: Error sending message [message = 
> RemoveExecutor(1,Executor finished with state FINISHED)]
>   at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:119)
>   at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
>   at 
> org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:412)
>   ... 2 more
> Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
>   at 
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
>   at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
>   at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
>   at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
>   ... 4 more
> Caused by: org.apache.spark.SparkException: Could not find 
> CoarseGrainedScheduler.
>   at 
> org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:152)
>   at 
> org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:127)
>   at 

[jira] [Commented] (SPARK-16967) Collect Mesos support code into a module/profile

2016-08-09 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414004#comment-15414004
 ] 

Michael Gummelt commented on SPARK-16967:
-

Will do

> Collect Mesos support code into a module/profile
> 
>
> Key: SPARK-16967
> URL: https://issues.apache.org/jira/browse/SPARK-16967
> Project: Spark
>  Issue Type: Task
>  Components: Mesos, Spark Core
>Affects Versions: 2.0.0
>Reporter: Sean Owen
>Priority: Critical
>
> CC [~mgummelt] [~tnachen] [~skonto] 
> I think this is fairly easy and would be beneficial as more work goes into 
> Mesos. It should be separated into a module like YARN is, partly on principle, 
> but also because it means anyone who doesn't need Mesos support can build 
> without it.
> I'm entirely willing to take a shot at this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11714) Make Spark on Mesos honor port restrictions

2016-07-01 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15359481#comment-15359481
 ] 

Michael Gummelt commented on SPARK-11714:
-

[~drcrallen] Your solution isn't exactly clear to me, but it sounds like you're 
trying to solve the problem of specifying arbitrary ports (such as JMX) for the 
executor to reserve, rather than just the ones that Spark knows about 
(executor, blockmanager, shuffle service).

I think a clean way to do this would be introducing 
{{spark.mesos.executor.ports}}.  So then you could specify:

-Dspark.mesos.executor.ports=5000 -Dcom.sun.management.jmxremote.port=5000

or something similar
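
A minimal sketch of the check this would imply: every explicitly requested
executor port has to fall inside the "ports" ranges of the Mesos offer
(illustrative names, not Spark's actual scheduler code):

{noformat}
case class PortRange(begin: Long, end: Long)

def offerSatisfiesPorts(offeredRanges: Seq[PortRange],
                        requestedPorts: Seq[Long]): Boolean =
  requestedPorts.forall(p =>
    offeredRanges.exists(r => p >= r.begin && p <= r.end))

// A typical Mesos offer only carries 31000-32000, so a requested JMX
// port of 5000 would be rejected:
offerSatisfiesPorts(Seq(PortRange(31000L, 32000L)), Seq(5000L))  // false
{noformat}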

> Make Spark on Mesos honor port restrictions
> ---
>
> Key: SPARK-11714
> URL: https://issues.apache.org/jira/browse/SPARK-11714
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Reporter: Charles Allen
>
> Currently the MesosSchedulerBackend does not make any effort to honor "ports" 
> as a resource offer in Mesos. This ask is to have the ports which the 
> executor binds to honor the limits of the "ports" resource of an offer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-17067) Revocable resource support

2016-08-15 Thread Michael Gummelt (JIRA)
Michael Gummelt created SPARK-17067:
---

 Summary: Revocable resource support
 Key: SPARK-17067
 URL: https://issues.apache.org/jira/browse/SPARK-17067
 Project: Spark
  Issue Type: Improvement
  Components: Mesos
Reporter: Michael Gummelt






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19479) Spark Mesos artifact split causes spark-core dependency to not pull in mesos impl

2017-02-06 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15855091#comment-15855091
 ] 

Michael Gummelt commented on SPARK-19479:
-

Yea, sorry for the inconvenience, but I announced this on the dev list.  Search 
for "Mesos is now a maven module".  If I were you, I would create an email 
filter for "Mesos" on the user/dev lists.  This is what I do.

> Spark Mesos artifact split causes spark-core dependency to not pull in mesos 
> impl
> -
>
> Key: SPARK-19479
> URL: https://issues.apache.org/jira/browse/SPARK-19479
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos, Spark Core
>Affects Versions: 2.1.0
>Reporter: Charles Allen
>
> https://github.com/apache/spark/pull/14637 ( 
> https://issues.apache.org/jira/browse/SPARK-16967 ) forked off the mesos impl 
> into its own artifact, but the release notes do not call this out. This broke 
> our deployments because we depend on packaging with spark-core, which no 
> longer had any mesos awareness. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] (SPARK-16742) Kerberos support for Spark on Mesos

2017-01-30 Thread Michael Gummelt (JIRA)
Michael Gummelt commented on SPARK-16742:
-

Re: Kerberos support for Spark on Mesos

As an update, we (Mesosphere) are working with Stratio on a joint solution. 
Stratio will submit a WIP PR soon, and we'll have a design discussion in this 
JIRA issue.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346-sha1:dbc023d)



[jira] (SPARK-16742) Kerberos support for Spark on Mesos

2017-01-30 Thread Michael Gummelt (JIRA)
Michael Gummelt commented on SPARK-16742:
-

Re: Kerberos support for Spark on Mesos

Thomas Graves: Yea, I'm pretty sure we're going to change that to use 
delegation tokens like the existing solutions.
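
A hedged sketch of that delegation-token approach (principal, keytab path,
and renewer are placeholders; this is not Spark's actual code): log in from
a keytab, then ask the Kerberized filesystem for tokens that can be shipped
to executors.

{noformat}
import java.security.PrivilegedExceptionAction
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.security.{Credentials, UserGroupInformation}

object FetchTokens {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    val ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(
      "spark/host@EXAMPLE.COM", "/etc/security/spark.keytab")
    val creds = new Credentials()
    ugi.doAs(new PrivilegedExceptionAction[Unit] {
      override def run(): Unit = {
        // Fetch HDFS delegation tokens into `creds`.
        FileSystem.get(conf).addDelegationTokens("spark", creds)
      }
    })
    // `creds` would then be serialized and distributed to executors.
  }
}
{noformat}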


--
This message was sent by Atlassian JIRA
(v6.3.15#6346-sha1:dbc023d)



[jira] (SPARK-16784) Configurable log4j settings

2017-01-29 Thread Michael Gummelt (JIRA)
Michael Gummelt updated SPARK-16784:

Spark / SPARK-16784: Configurable log4j settings

Change By: Michael Gummelt
Affects Version/s: 2.1.0



--
This message was sent by Atlassian JIRA
(v6.3.15#6346-sha1:dbc023d)


