[jira] [Resolved] (SPARK-11097) Add connection established callback to lower level RPC layer so we don't need to check for new connections in NettyRpcHandler.receive
[ https://issues.apache.org/jira/browse/SPARK-11097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-11097.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 2.0.0

> Add connection established callback to lower level RPC layer so we don't need
> to check for new connections in NettyRpcHandler.receive
> -----------------------------------------------------------------------------
>
>          Key: SPARK-11097
>          URL: https://issues.apache.org/jira/browse/SPARK-11097
>      Project: Spark
>   Issue Type: Sub-task
>   Components: Spark Core
>     Reporter: Reynold Xin
>     Assignee: Shixiong Zhu
>      Fix For: 2.0.0
>
> I think we can remove the check for new connections in
> NettyRpcHandler.receive if we just add a channel registered callback to the
> lower level network module.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
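The idea in SPARK-11097 can be sketched generically: the transport layer fires a connection-established callback once per channel, so the higher-level handler no longer has to detect "is this a new connection?" inside receive(). The class and method names below are illustrative, not Spark's actual network module API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical lower-level RPC handler: names are illustrative, not Spark's API.
abstract class RpcHandler {
    // Invoked exactly once when a new channel is registered, so receive()
    // no longer needs to check whether the connection is new.
    void channelActive(String remoteAddress) {}
    abstract void receive(String remoteAddress, byte[] message);
}

class LoggingRpcHandler extends RpcHandler {
    final List<String> events = new ArrayList<>();
    @Override void channelActive(String remoteAddress) {
        events.add("connected:" + remoteAddress);
    }
    @Override void receive(String remoteAddress, byte[] message) {
        events.add("received:" + remoteAddress + ":" + message.length);
    }
}

public class CallbackDemo {
    public static void main(String[] args) {
        LoggingRpcHandler handler = new LoggingRpcHandler();
        // The transport layer fires the callback on channel registration...
        handler.channelActive("10.0.0.1:5000");
        // ...and subsequent messages go straight to receive().
        handler.receive("10.0.0.1:5000", new byte[]{1, 2, 3});
        System.out.println(handler.events);
    }
}
```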
[jira] [Updated] (SPARK-11097) Add connection established callback to lower level RPC layer so we don't need to check for new connections in NettyRpcHandler.receive
[ https://issues.apache.org/jira/browse/SPARK-11097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-11097:
------------------------------
    Assignee: Shixiong Zhu  (was: Apache Spark)

> Add connection established callback to lower level RPC layer so we don't need
> to check for new connections in NettyRpcHandler.receive
> -----------------------------------------------------------------------------
>
>          Key: SPARK-11097
>          URL: https://issues.apache.org/jira/browse/SPARK-11097
>      Project: Spark
>   Issue Type: Sub-task
>   Components: Spark Core
>     Reporter: Reynold Xin
>     Assignee: Shixiong Zhu
>
> I think we can remove the check for new connections in
> NettyRpcHandler.receive if we just add a channel registered callback to the
> lower level network module.
[jira] [Resolved] (SPARK-12411) Reconsider executor heartbeats rpc timeout
[ https://issues.apache.org/jira/browse/SPARK-12411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-12411.
-------------------------------
          Resolution: Fixed
            Assignee: Nong Li
       Fix Version/s: 2.0.0
    Target Version/s: 2.0.0

> Reconsider executor heartbeats rpc timeout
> ------------------------------------------
>
>          Key: SPARK-12411
>          URL: https://issues.apache.org/jira/browse/SPARK-12411
>      Project: Spark
>   Issue Type: Improvement
>   Components: Spark Core
>     Reporter: Nong Li
>     Assignee: Nong Li
>      Fix For: 2.0.0
>
> Currently, the timeout for deciding that an executor has failed is the same
> as the sender's rpc timeout ("spark.network.timeout"), which defaults to
> 120s. This means that if there is a network issue, the executor only gets one
> try to heartbeat, which probably makes failure detection flaky.
> The executor has a config controlling how often to heartbeat
> (spark.executor.heartbeatInterval), which defaults to 10s. This combination
> of configs doesn't seem to make sense. The heartbeat rpc timeout should
> probably be less than or equal to the heartbeatInterval.
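A rough back-of-the-envelope illustration of the mismatch described above (the helper below is hypothetical; only the config names and default values come from the issue): with the defaults, a single heartbeat RPC may block for the full 120s network timeout, so the executor effectively gets one attempt before failure detection fires, whereas an RPC timeout at or below the 10s interval would allow roughly 12 attempts.

```java
// Hypothetical helper illustrating the config interplay in SPARK-12411.
// "spark.network.timeout" (default 120s) bounds failure detection; the
// executor heartbeats every "spark.executor.heartbeatInterval" (default 10s).
public class HeartbeatBudget {
    // Number of distinct heartbeat attempts that fit inside the failure
    // window when each RPC may block for up to rpcTimeoutSec before the
    // executor can try again.
    static int attempts(int networkTimeoutSec, int heartbeatIntervalSec, int rpcTimeoutSec) {
        int perAttempt = Math.max(heartbeatIntervalSec, rpcTimeoutSec);
        return Math.max(1, networkTimeoutSec / perAttempt);
    }

    public static void main(String[] args) {
        // Pre-fix defaults: rpc timeout == network timeout -> one flaky try.
        System.out.println(attempts(120, 10, 120)); // 1
        // Proposed: rpc timeout <= heartbeatInterval -> ~12 tries.
        System.out.println(attempts(120, 10, 10));  // 12
    }
}
```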
[jira] [Updated] (SPARK-12413) Mesos ZK persistence throws a NotSerializableException
[ https://issues.apache.org/jira/browse/SPARK-12413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-12413:
------------------------------
    Fix Version/s: (was: 2.0.0)

> Mesos ZK persistence throws a NotSerializableException
> ------------------------------------------------------
>
>          Key: SPARK-12413
>          URL: https://issues.apache.org/jira/browse/SPARK-12413
>      Project: Spark
>   Issue Type: Bug
>   Components: Mesos
> Affects Versions: 1.6.0
>     Reporter: Michael Gummelt
>     Assignee: Michael Gummelt
>      Fix For: 1.6.0
>
> https://github.com/apache/spark/pull/10359 breaks ZK persistence due to
> https://issues.scala-lang.org/browse/SI-6654
>
> This line throws a NotSerializableException:
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster
>
> The MesosClusterDispatcher attempts to serialize MesosDriverDescription
> objects to ZK, but https://github.com/apache/spark/pull/10359 makes the
> {{command}} property unserializable.
>
> Offer id: 72f4d1ce-67f7-41b0-95a3-aa6fb208df32-O189, cpu: 3.0, mem: 12995.0
> 15/12/17 21:52:44 DEBUG ClientCnxn: Got ping response for sessionid: 0x151b1d1567e0002 after 0ms
> 15/12/17 21:52:44 DEBUG nio: created SCEP@2e746d70{l(/10.0.6.166:41456)<->r(/10.0.0.240:17386),s=0,open=true,ishut=false,oshut=false,rb=false,wb=false,w=true,i=0}-{AsyncHttpConnection@5dbcebe3,g=HttpGenerator{s=0,h=-1,b=-1,c=-1},p=HttpParser{s=-14,l=0,c=0},r=0}
> 15/12/17 21:52:44 DEBUG HttpParser: filled 1591/1591
> 15/12/17 21:52:44 DEBUG Server: REQUEST /v1/submissions/create on AsyncHttpConnection@5dbcebe3,g=HttpGenerator{s=0,h=-1,b=-1,c=-1},p=HttpParser{s=2,l=2,c=1174},r=1
> 15/12/17 21:52:44 DEBUG ContextHandler: scope null||/v1/submissions/create @ o.s.j.s.ServletContextHandler{/,null}
> 15/12/17 21:52:44 DEBUG ContextHandler: context=||/v1/submissions/create @ o.s.j.s.ServletContextHandler{/,null}
> 15/12/17 21:52:44 DEBUG ServletHandler: servlet |/v1/submissions/create|null -> org.apache.spark.deploy.rest.mesos.MesosSubmitRequestServlet-368e091
> 15/12/17 21:52:44 DEBUG ServletHandler: chain=null
> 15/12/17 21:52:44 WARN ServletHandler: /v1/submissions/create
> java.io.NotSerializableException: scala.collection.immutable.MapLike$$anon$1
>     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
>     at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
>     at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
>     at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
>     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
>     at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
>     at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
>     at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
>     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
>     at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
>     at org.apache.spark.util.Utils$.serialize(Utils.scala:83)
>     at org.apache.spark.scheduler.cluster.mesos.ZookeeperMesosClusterPersistenceEngine.persist(MesosClusterPersistenceEngine.scala:110)
>     at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.submitDriver(MesosClusterScheduler.scala:166)
>     at org.apache.spark.deploy.rest.mesos.MesosSubmitRequestServlet.handleSubmit(MesosRestServer.scala:132)
>     at org.apache.spark.deploy.rest.SubmitRequestServlet.doPost(RestSubmissionServer.scala:258)
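The failure mode in the trace above is generic Java serialization behavior: ObjectOutputStream walks the whole object graph and throws NotSerializableException the moment it reaches a field whose class does not implement Serializable, which is what happens when a serializable description object ends up holding one of Scala's non-serializable map views (the MapLike$$anon$1 from SI-6654). A minimal stand-alone sketch (class names are illustrative stand-ins, not Spark's):

```java
import java.io.ByteArrayOutputStream;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationDemo {
    // Stands in for the non-serializable MapLike$$anon$1 view.
    static class UnserializableView { }

    // Stands in for MesosDriverDescription: Serializable itself, but
    // carrying a field whose runtime class is not.
    static class DriverDescription implements Serializable {
        final Object command = new UnserializableView();
    }

    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);  // walks the object graph field by field
            return true;
        } catch (NotSerializableException e) {
            return false;        // a non-Serializable field was reached
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canSerialize("plain string"));          // true
        System.out.println(canSerialize(new DriverDescription())); // false
    }
}
```

The usual fixes on the Scala side are to force the view into a concrete map (e.g. `mapValues(f).map(identity)`) before it lands in a serialized object.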
[jira] [Commented] (SPARK-12345) Mesos cluster mode is broken
[ https://issues.apache.org/jira/browse/SPARK-12345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15065047#comment-15065047 ]

Andrew Or commented on SPARK-12345:
-----------------------------------

For those who are following: there are 4 patches related to this issue, merged in this order:

(1) https://github.com/apache/spark/pull/10332 - doesn't actually work
(2) https://github.com/apache/spark/pull/10359 - fixes #10332 to make it actually work
(3) https://github.com/apache/spark/pull/10366 - fixes #10359, which broke HA (SPARK-12413)
(4) https://github.com/apache/spark/pull/10329 - an alternative, more correct fix

Patches (1), (2), and (3) are merged ONLY into branch-1.6. Patch (4) is merged ONLY into master. We used a different fix for branch-1.6 because this was an RC blocker and we wanted to minimize the scope of the changes there. However, patch (4) is the better fix, so it lives in master for the longer term.

> Mesos cluster mode is broken
> ----------------------------
>
>          Key: SPARK-12345
>          URL: https://issues.apache.org/jira/browse/SPARK-12345
>      Project: Spark
>   Issue Type: Bug
>   Components: Mesos
> Affects Versions: 1.6.0
>     Reporter: Andrew Or
>     Assignee: Timothy Chen
>     Priority: Critical
>      Fix For: 1.6.0
>
> The same setup worked in 1.5.2 but is now failing for 1.6.0-RC2.
> The driver is confused about where SPARK_HOME is. It resolves
> `mesos.executor.uri` or `spark.mesos.executor.home` relative to the
> filesystem where the driver runs, which is wrong.
> {code}
> I1215 15:00:39.411212 28032 exec.cpp:134] Version: 0.25.0
> I1215 15:00:39.413512 28037 exec.cpp:208] Executor registered on slave 130bdc39-44e7-4256-8c22-602040d337f1-S1
> bin/spark-submit: line 27: /Users/dragos/workspace/Spark/dev/rc-tests/spark-1.6.0-bin-hadoop2.6/bin/spark-class: No such file or directory
> {code}
[jira] [Updated] (SPARK-12365) Use ShutdownHookManager where Runtime.getRuntime.addShutdownHook() is called
[ https://issues.apache.org/jira/browse/SPARK-12365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-12365:
------------------------------
    Fix Version/s: (was: 1.6.1)

> Use ShutdownHookManager where Runtime.getRuntime.addShutdownHook() is called
> ----------------------------------------------------------------------------
>
>          Key: SPARK-12365
>          URL: https://issues.apache.org/jira/browse/SPARK-12365
>      Project: Spark
>   Issue Type: Bug
>   Components: Spark Core
>     Reporter: Ted Yu
>     Assignee: Ted Yu
>     Priority: Minor
>      Fix For: 2.0.0
>
> SPARK-9886 fixed the call to Runtime.getRuntime.addShutdownHook() in
> ExternalBlockStore.scala
> This issue intends to address remaining usages of
> Runtime.getRuntime.addShutdownHook()
[jira] [Updated] (SPARK-12365) Use ShutdownHookManager where Runtime.getRuntime.addShutdownHook() is called
[ https://issues.apache.org/jira/browse/SPARK-12365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-12365:
------------------------------
    Target Version/s: 2.0.0  (was: 1.6.1, 2.0.0)

> Use ShutdownHookManager where Runtime.getRuntime.addShutdownHook() is called
> ----------------------------------------------------------------------------
>
>          Key: SPARK-12365
>          URL: https://issues.apache.org/jira/browse/SPARK-12365
>      Project: Spark
>   Issue Type: Bug
>   Components: Spark Core
>     Reporter: Ted Yu
>     Assignee: Ted Yu
>     Priority: Minor
>      Fix For: 2.0.0
>
> SPARK-9886 fixed the call to Runtime.getRuntime.addShutdownHook() in
> ExternalBlockStore.scala
> This issue intends to address remaining usages of
> Runtime.getRuntime.addShutdownHook()
[jira] [Updated] (SPARK-12390) Clean up unused serializer parameter in BlockManager
[ https://issues.apache.org/jira/browse/SPARK-12390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-12390:
------------------------------
    Fix Version/s: 1.6.1

> Clean up unused serializer parameter in BlockManager
> ----------------------------------------------------
>
>          Key: SPARK-12390
>          URL: https://issues.apache.org/jira/browse/SPARK-12390
>      Project: Spark
>   Issue Type: Bug
>   Components: Block Manager, Spark Core
> Affects Versions: 1.6.1, 2.0.0
>     Reporter: Andrew Or
>     Assignee: Apache Spark
>      Fix For: 1.6.1, 2.0.0
>
> This parameter is never used:
> https://github.com/apache/spark/blob/ce5fd4008e890ef8ebc2d3cb703a666783ad6c02/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1204
> and there are 4 more places like that.
[jira] [Resolved] (SPARK-12390) Clean up unused serializer parameter in BlockManager
[ https://issues.apache.org/jira/browse/SPARK-12390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-12390.
-------------------------------
    Resolution: Fixed

> Clean up unused serializer parameter in BlockManager
> ----------------------------------------------------
>
>          Key: SPARK-12390
>          URL: https://issues.apache.org/jira/browse/SPARK-12390
>      Project: Spark
>   Issue Type: Bug
>   Components: Block Manager, Spark Core
> Affects Versions: 1.6.1, 2.0.0
>     Reporter: Andrew Or
>     Assignee: Apache Spark
>      Fix For: 1.6.1, 2.0.0
>
> This parameter is never used:
> https://github.com/apache/spark/blob/ce5fd4008e890ef8ebc2d3cb703a666783ad6c02/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1204
> and there are 4 more places like that.
[jira] [Assigned] (SPARK-12390) Clean up unused serializer parameter in BlockManager
[ https://issues.apache.org/jira/browse/SPARK-12390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or reassigned SPARK-12390:
---------------------------------
    Assignee: Andrew Or  (was: Apache Spark)

> Clean up unused serializer parameter in BlockManager
> ----------------------------------------------------
>
>          Key: SPARK-12390
>          URL: https://issues.apache.org/jira/browse/SPARK-12390
>      Project: Spark
>   Issue Type: Bug
>   Components: Block Manager, Spark Core
> Affects Versions: 1.6.1, 2.0.0
>     Reporter: Andrew Or
>     Assignee: Andrew Or
>      Fix For: 1.6.1, 2.0.0
>
> This parameter is never used:
> https://github.com/apache/spark/blob/ce5fd4008e890ef8ebc2d3cb703a666783ad6c02/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1204
> and there are 4 more places like that.
[jira] [Created] (SPARK-12415) Do not use closure serializer to serialize task result
Andrew Or created SPARK-12415:
------------------------------

             Summary: Do not use closure serializer to serialize task result
                 Key: SPARK-12415
                 URL: https://issues.apache.org/jira/browse/SPARK-12415
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 1.0.0
            Reporter: Andrew Or

As the name suggests, the closure serializer is for closures. We should be able to use the generic serializer for task results. If we do this, we need to register `org.apache.spark.scheduler.TaskResult` when using Kryo.
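For context, registering that class with Kryo would go through Spark's standard configuration. A sketch of what a `spark-defaults.conf` fragment might look like if this change lands (the property names are real Spark settings; whether `TaskResult` is registrable by this exact name is an assumption, since it is an internal class):

```
# spark-defaults.conf (sketch, assuming TaskResult becomes Kryo-serialized)
spark.serializer              org.apache.spark.serializer.KryoSerializer
spark.kryo.classesToRegister  org.apache.spark.scheduler.TaskResult
```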
[jira] [Created] (SPARK-12414) Remove closure serializer
Andrew Or created SPARK-12414:
------------------------------

             Summary: Remove closure serializer
                 Key: SPARK-12414
                 URL: https://issues.apache.org/jira/browse/SPARK-12414
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 1.0.0
            Reporter: Andrew Or

There is a config `spark.closure.serializer` that accepts exactly one value: the Java serializer. This is because there are currently bugs in the Kryo serializer that make it not a viable candidate. This was uncovered by an unsuccessful attempt to make it work: SPARK-7708.

My high-level point is that the Java serializer has worked well for at least 6 Spark versions now, and it is an incredibly complicated task to get other serializers (not just Kryo) to work with Spark's closures. IMO the effort is not worth it and we should just remove this documentation and all the code associated with it.
[jira] [Updated] (SPARK-12345) Mesos cluster mode is broken when SPARK_HOME is set
[ https://issues.apache.org/jira/browse/SPARK-12345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-12345:
------------------------------
    Summary: Mesos cluster mode is broken when SPARK_HOME is set  (was: Mesos cluster mode is broken)

> Mesos cluster mode is broken when SPARK_HOME is set
> ---------------------------------------------------
>
>          Key: SPARK-12345
>          URL: https://issues.apache.org/jira/browse/SPARK-12345
>      Project: Spark
>   Issue Type: Bug
>   Components: Mesos
> Affects Versions: 1.6.0
>     Reporter: Andrew Or
>     Assignee: Apache Spark
>     Priority: Critical
>
> The same setup worked in 1.5.2 but is now failing for 1.6.0-RC2.
> The driver is confused about where SPARK_HOME is. It resolves
> `mesos.executor.uri` or `spark.mesos.executor.home` relative to the
> filesystem where the driver runs, which is wrong.
> {code}
> I1215 15:00:39.411212 28032 exec.cpp:134] Version: 0.25.0
> I1215 15:00:39.413512 28037 exec.cpp:208] Executor registered on slave 130bdc39-44e7-4256-8c22-602040d337f1-S1
> bin/spark-submit: line 27: /Users/dragos/workspace/Spark/dev/rc-tests/spark-1.6.0-bin-hadoop2.6/bin/spark-class: No such file or directory
> {code}
[jira] [Updated] (SPARK-12345) Mesos cluster mode is broken
[ https://issues.apache.org/jira/browse/SPARK-12345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-12345:
------------------------------
    Summary: Mesos cluster mode is broken  (was: Mesos cluster mode is broken when SPARK_HOME is set)

> Mesos cluster mode is broken
> ----------------------------
>
>          Key: SPARK-12345
>          URL: https://issues.apache.org/jira/browse/SPARK-12345
>      Project: Spark
>   Issue Type: Bug
>   Components: Mesos
> Affects Versions: 1.6.0
>     Reporter: Andrew Or
>     Assignee: Apache Spark
>     Priority: Critical
>
> The same setup worked in 1.5.2 but is now failing for 1.6.0-RC2.
> The driver is confused about where SPARK_HOME is. It resolves
> `mesos.executor.uri` or `spark.mesos.executor.home` relative to the
> filesystem where the driver runs, which is wrong.
> {code}
> I1215 15:00:39.411212 28032 exec.cpp:134] Version: 0.25.0
> I1215 15:00:39.413512 28037 exec.cpp:208] Executor registered on slave 130bdc39-44e7-4256-8c22-602040d337f1-S1
> bin/spark-submit: line 27: /Users/dragos/workspace/Spark/dev/rc-tests/spark-1.6.0-bin-hadoop2.6/bin/spark-class: No such file or directory
> {code}
[jira] [Resolved] (SPARK-12345) Mesos cluster mode is broken
[ https://issues.apache.org/jira/browse/SPARK-12345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-12345.
-------------------------------
    Resolution: Fixed
      Assignee: Luc Bourlier  (was: Apache Spark)

> Mesos cluster mode is broken
> ----------------------------
>
>          Key: SPARK-12345
>          URL: https://issues.apache.org/jira/browse/SPARK-12345
>      Project: Spark
>   Issue Type: Bug
>   Components: Mesos
> Affects Versions: 1.6.0
>     Reporter: Andrew Or
>     Assignee: Luc Bourlier
>     Priority: Critical
>      Fix For: 1.6.0
>
> The same setup worked in 1.5.2 but is now failing for 1.6.0-RC2.
> The driver is confused about where SPARK_HOME is. It resolves
> `mesos.executor.uri` or `spark.mesos.executor.home` relative to the
> filesystem where the driver runs, which is wrong.
> {code}
> I1215 15:00:39.411212 28032 exec.cpp:134] Version: 0.25.0
> I1215 15:00:39.413512 28037 exec.cpp:208] Executor registered on slave 130bdc39-44e7-4256-8c22-602040d337f1-S1
> bin/spark-submit: line 27: /Users/dragos/workspace/Spark/dev/rc-tests/spark-1.6.0-bin-hadoop2.6/bin/spark-class: No such file or directory
> {code}
[jira] [Updated] (SPARK-12345) Mesos cluster mode is broken
[ https://issues.apache.org/jira/browse/SPARK-12345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-12345:
------------------------------
    Fix Version/s: 1.6.0

> Mesos cluster mode is broken
> ----------------------------
>
>          Key: SPARK-12345
>          URL: https://issues.apache.org/jira/browse/SPARK-12345
>      Project: Spark
>   Issue Type: Bug
>   Components: Mesos
> Affects Versions: 1.6.0
>     Reporter: Andrew Or
>     Assignee: Apache Spark
>     Priority: Critical
>      Fix For: 1.6.0
>
> The same setup worked in 1.5.2 but is now failing for 1.6.0-RC2.
> The driver is confused about where SPARK_HOME is. It resolves
> `mesos.executor.uri` or `spark.mesos.executor.home` relative to the
> filesystem where the driver runs, which is wrong.
> {code}
> I1215 15:00:39.411212 28032 exec.cpp:134] Version: 0.25.0
> I1215 15:00:39.413512 28037 exec.cpp:208] Executor registered on slave 130bdc39-44e7-4256-8c22-602040d337f1-S1
> bin/spark-submit: line 27: /Users/dragos/workspace/Spark/dev/rc-tests/spark-1.6.0-bin-hadoop2.6/bin/spark-class: No such file or directory
> {code}
[jira] [Updated] (SPARK-12345) Mesos cluster mode is broken
[ https://issues.apache.org/jira/browse/SPARK-12345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-12345:
------------------------------
    Target Version/s: 1.6.0  (was: 1.6.1)

> Mesos cluster mode is broken
> ----------------------------
>
>          Key: SPARK-12345
>          URL: https://issues.apache.org/jira/browse/SPARK-12345
>      Project: Spark
>   Issue Type: Bug
>   Components: Mesos
> Affects Versions: 1.6.0
>     Reporter: Andrew Or
>     Assignee: Apache Spark
>     Priority: Critical
>      Fix For: 1.6.0
>
> The same setup worked in 1.5.2 but is now failing for 1.6.0-RC2.
> The driver is confused about where SPARK_HOME is. It resolves
> `mesos.executor.uri` or `spark.mesos.executor.home` relative to the
> filesystem where the driver runs, which is wrong.
> {code}
> I1215 15:00:39.411212 28032 exec.cpp:134] Version: 0.25.0
> I1215 15:00:39.413512 28037 exec.cpp:208] Executor registered on slave 130bdc39-44e7-4256-8c22-602040d337f1-S1
> bin/spark-submit: line 27: /Users/dragos/workspace/Spark/dev/rc-tests/spark-1.6.0-bin-hadoop2.6/bin/spark-class: No such file or directory
> {code}
[jira] [Resolved] (SPARK-12365) Use ShutdownHookManager where Runtime.getRuntime.addShutdownHook() is called
[ https://issues.apache.org/jira/browse/SPARK-12365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-12365.
-------------------------------
          Resolution: Fixed
            Assignee: Ted Yu  (was: Apache Spark)
       Fix Version/s: 1.6.1, 2.0.0
    Target Version/s: 1.6.1, 2.0.0

> Use ShutdownHookManager where Runtime.getRuntime.addShutdownHook() is called
> ----------------------------------------------------------------------------
>
>          Key: SPARK-12365
>          URL: https://issues.apache.org/jira/browse/SPARK-12365
>      Project: Spark
>   Issue Type: Bug
>   Components: Spark Core
>     Reporter: Ted Yu
>     Assignee: Ted Yu
>     Priority: Minor
>      Fix For: 1.6.1, 2.0.0
>
> SPARK-9886 fixed the call to Runtime.getRuntime.addShutdownHook() in
> ExternalBlockStore.scala
> This issue intends to address remaining usages of
> Runtime.getRuntime.addShutdownHook()
[jira] [Resolved] (SPARK-12186) stage web URI will redirect to the wrong location if it is the first URI from the application to be requested from the history server
[ https://issues.apache.org/jira/browse/SPARK-12186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-12186.
-------------------------------
          Resolution: Fixed
       Fix Version/s: 1.6.0
    Target Version/s: 1.6.0

> stage web URI will redirect to the wrong location if it is the first URI from
> the application to be requested from the history server
> -----------------------------------------------------------------------------
>
>          Key: SPARK-12186
>          URL: https://issues.apache.org/jira/browse/SPARK-12186
>      Project: Spark
>   Issue Type: Bug
>   Components: Web UI
> Affects Versions: 1.5.1
>     Reporter: Rohit Agarwal
>     Assignee: Rohit Agarwal
>     Priority: Minor
>      Fix For: 1.6.0
>
> In the history server, when we open an application link for the first time, it
> loads the app, registers the app UI, and sends a redirect to the URI that
> was requested.
> The code to send the redirect is:
> {{res.sendRedirect(res.encodeRedirectURL(req.getRequestURI()))}}
> However, {{req.getRequestURI()}} is not the complete URI that was requested:
> it doesn't contain the query string.
> Stage URIs are of the following form:
> http://localhost:18080/history/application_1449188824095_0001/stages/stage/?id=0=0
> When such a URI is *the first URI from the application to be requested*, it
> redirects to a URI like:
> http://localhost:18080/history/application_1449188824095_0001/stages/stage/
> which errors with
> {code}
> HTTP ERROR 400
> Problem accessing /history/application_1449188824095_0001/stages/stage/.
> Reason:
>     requirement failed: Missing id parameter
> Powered by Jetty://
> {code}
> This is not a frequent occurrence: you usually navigate to the stage URI
> after visiting some other URI belonging to the application, in which case
> this does not happen. Only when the stage URI is the first URI from the
> application to be requested from the history server do you see this issue.
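The underlying fix is small: rebuild the full request URL, query string included, before redirecting. Outside the servlet API the joining logic looks like the sketch below (the helper name is illustrative; `getRequestURI()` and `getQueryString()` are the real `HttpServletRequest` accessors whose split this mirrors):

```java
public class RedirectUrl {
    // Re-attach the query string that HttpServletRequest#getRequestURI()
    // does not include, so the redirect target keeps its parameters.
    static String withQueryString(String requestUri, String queryString) {
        if (queryString == null || queryString.isEmpty()) {
            return requestUri;
        }
        return requestUri + "?" + queryString;
    }

    public static void main(String[] args) {
        // Without the fix the redirect target loses "?id=0", triggering
        // the "Missing id parameter" error shown above.
        System.out.println(withQueryString(
            "/history/application_1449188824095_0001/stages/stage/", "id=0"));
    }
}
```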
[jira] [Updated] (SPARK-12186) stage web URI will redirect to the wrong location if it is the first URI from the application to be requested from the history server
[ https://issues.apache.org/jira/browse/SPARK-12186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-12186:
------------------------------
    Fix Version/s: 1.6.1, 2.0.0  (was: 1.6.0)

> stage web URI will redirect to the wrong location if it is the first URI from
> the application to be requested from the history server
> -----------------------------------------------------------------------------
>
>          Key: SPARK-12186
>          URL: https://issues.apache.org/jira/browse/SPARK-12186
>      Project: Spark
>   Issue Type: Bug
>   Components: Web UI
> Affects Versions: 1.5.1
>     Reporter: Rohit Agarwal
>     Assignee: Rohit Agarwal
>     Priority: Minor
>      Fix For: 1.6.1, 2.0.0
>
> In the history server, when we open an application link for the first time, it
> loads the app, registers the app UI, and sends a redirect to the URI that
> was requested.
> The code to send the redirect is:
> {{res.sendRedirect(res.encodeRedirectURL(req.getRequestURI()))}}
> However, {{req.getRequestURI()}} is not the complete URI that was requested:
> it doesn't contain the query string.
> Stage URIs are of the following form:
> http://localhost:18080/history/application_1449188824095_0001/stages/stage/?id=0=0
> When such a URI is *the first URI from the application to be requested*, it
> redirects to a URI like:
> http://localhost:18080/history/application_1449188824095_0001/stages/stage/
> which errors with
> {code}
> HTTP ERROR 400
> Problem accessing /history/application_1449188824095_0001/stages/stage/.
> Reason:
>     requirement failed: Missing id parameter
> Powered by Jetty://
> {code}
> This is not a frequent occurrence: you usually navigate to the stage URI
> after visiting some other URI belonging to the application, in which case
> this does not happen. Only when the stage URI is the first URI from the
> application to be requested from the history server do you see this issue.
[jira] [Updated] (SPARK-12186) stage web URI will redirect to the wrong location if it is the first URI from the application to be requested from the history server
[ https://issues.apache.org/jira/browse/SPARK-12186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-12186:
------------------------------
    Target Version/s: 1.6.1, 2.0.0  (was: 1.6.0)

> stage web URI will redirect to the wrong location if it is the first URI from
> the application to be requested from the history server
> -----------------------------------------------------------------------------
>
>          Key: SPARK-12186
>          URL: https://issues.apache.org/jira/browse/SPARK-12186
>      Project: Spark
>   Issue Type: Bug
>   Components: Web UI
> Affects Versions: 1.5.1
>     Reporter: Rohit Agarwal
>     Assignee: Rohit Agarwal
>     Priority: Minor
>      Fix For: 1.6.1, 2.0.0
>
> In the history server, when we open an application link for the first time, it
> loads the app, registers the app UI, and sends a redirect to the URI that
> was requested.
> The code to send the redirect is:
> {{res.sendRedirect(res.encodeRedirectURL(req.getRequestURI()))}}
> However, {{req.getRequestURI()}} is not the complete URI that was requested:
> it doesn't contain the query string.
> Stage URIs are of the following form:
> http://localhost:18080/history/application_1449188824095_0001/stages/stage/?id=0=0
> When such a URI is *the first URI from the application to be requested*, it
> redirects to a URI like:
> http://localhost:18080/history/application_1449188824095_0001/stages/stage/
> which errors with
> {code}
> HTTP ERROR 400
> Problem accessing /history/application_1449188824095_0001/stages/stage/.
> Reason:
>     requirement failed: Missing id parameter
> Powered by Jetty://
> {code}
> This is not a frequent occurrence: you usually navigate to the stage URI
> after visiting some other URI belonging to the application, in which case
> this does not happen. Only when the stage URI is the first URI from the
> application to be requested from the history server do you see this issue.
[jira] [Updated] (SPARK-12186) stage web URI will redirect to the wrong location if it is the first URI from the application to be requested from the history server
[ https://issues.apache.org/jira/browse/SPARK-12186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-12186:
------------------------------
    Assignee: Rohit Agarwal

> stage web URI will redirect to the wrong location if it is the first URI from
> the application to be requested from the history server
> -----------------------------------------------------------------------------
>
>          Key: SPARK-12186
>          URL: https://issues.apache.org/jira/browse/SPARK-12186
>      Project: Spark
>   Issue Type: Bug
>   Components: Web UI
> Affects Versions: 1.5.1
>     Reporter: Rohit Agarwal
>     Assignee: Rohit Agarwal
>     Priority: Minor
>
> In the history server, when we open an application link for the first time, it
> loads the app, registers the app UI, and sends a redirect to the URI that
> was requested.
> The code to send the redirect is:
> {{res.sendRedirect(res.encodeRedirectURL(req.getRequestURI()))}}
> However, {{req.getRequestURI()}} is not the complete URI that was requested:
> it doesn't contain the query string.
> Stage URIs are of the following form:
> http://localhost:18080/history/application_1449188824095_0001/stages/stage/?id=0=0
> When such a URI is *the first URI from the application to be requested*, it
> redirects to a URI like:
> http://localhost:18080/history/application_1449188824095_0001/stages/stage/
> which errors with
> {code}
> HTTP ERROR 400
> Problem accessing /history/application_1449188824095_0001/stages/stage/.
> Reason:
>     requirement failed: Missing id parameter
> Powered by Jetty://
> {code}
> This is not a frequent occurrence: you usually navigate to the stage URI
> after visiting some other URI belonging to the application, in which case
> this does not happen. Only when the stage URI is the first URI from the
> application to be requested from the history server do you see this issue.
[jira] [Resolved] (SPARK-10248) DAGSchedulerSuite should check there were no errors in EventProcessLoop
[ https://issues.apache.org/jira/browse/SPARK-10248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-10248. --- Resolution: Fixed Assignee: Imran Rashid Fix Version/s: 2.0.0 1.6.1 Target Version/s: 1.6.1, 2.0.0 > DAGSchedulerSuite should check there were no errors in EventProcessLoop > --- > > Key: SPARK-10248 > URL: https://issues.apache.org/jira/browse/SPARK-10248 > Project: Spark > Issue Type: Test > Components: Spark Core >Affects Versions: 1.5.0 >Reporter: Imran Rashid >Assignee: Imran Rashid > Fix For: 1.6.1, 2.0.0 > > > If an exception is thrown inside {{DAGSchedulerEventProcessLoop}}, it is just > logged, so it's hard to directly check in tests. (In fact, the scheduler > isn't even stopped, because the tests don't use the {{DAGScheduler}} that is known > by the {{SparkContext}}). > We should update the test framework so we can check if there is an error in > the event loop. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-12345) Mesos cluster mode is broken
[ https://issues.apache.org/jira/browse/SPARK-12345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-12345: -- Assignee: Timothy Chen (was: Luc Bourlier) > Mesos cluster mode is broken > > > Key: SPARK-12345 > URL: https://issues.apache.org/jira/browse/SPARK-12345 > Project: Spark > Issue Type: Bug > Components: Mesos >Affects Versions: 1.6.0 >Reporter: Andrew Or >Assignee: Timothy Chen >Priority: Critical > Fix For: 1.6.0 > > > The same setup worked in 1.5.2 but is now failing for 1.6.0-RC2. > The driver is confused about where SPARK_HOME is. It resolves > `mesos.executor.uri` or `spark.mesos.executor.home` relative to the > filesystem where the driver runs, which is wrong. > {code} > I1215 15:00:39.411212 28032 exec.cpp:134] Version: 0.25.0 > I1215 15:00:39.413512 28037 exec.cpp:208] Executor registered on slave > 130bdc39-44e7-4256-8c22-602040d337f1-S1 > bin/spark-submit: line 27: > /Users/dragos/workspace/Spark/dev/rc-tests/spark-1.6.0-bin-hadoop2.6/bin/spark-class: > No such file or directory > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-12386) Setting "spark.executor.port" leads to NPE in SparkEnv
[ https://issues.apache.org/jira/browse/SPARK-12386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-12386. --- Resolution: Fixed Assignee: Marcelo Vanzin (was: Apache Spark) Fix Version/s: 2.0.0 1.6.1 Target Version/s: 1.6.1, 2.0.0 > Setting "spark.executor.port" leads to NPE in SparkEnv > -- > > Key: SPARK-12386 > URL: https://issues.apache.org/jira/browse/SPARK-12386 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.0 >Reporter: Marcelo Vanzin >Assignee: Marcelo Vanzin >Priority: Critical > Fix For: 1.6.1, 2.0.0 > > > From the list: > {quote} > when we set spark.executor.port in 1.6, we get thrown a NPE in > SparkEnv$.create(SparkEnv.scala:259). > {quote} > Fix is simple; probably should make it to 1.6.0 since it will affect anyone > using that config option, but I'll leave that to the release manager's > discretion. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-12390) Clean up unused serializer parameter in BlockManager
[ https://issues.apache.org/jira/browse/SPARK-12390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-12390: -- Fix Version/s: 2.0.0 > Clean up unused serializer parameter in BlockManager > > > Key: SPARK-12390 > URL: https://issues.apache.org/jira/browse/SPARK-12390 > Project: Spark > Issue Type: Bug > Components: Block Manager, Spark Core >Affects Versions: 1.6.1, 2.0.0 >Reporter: Andrew Or >Assignee: Apache Spark > Fix For: 2.0.0 > > > This parameter is never used: > https://github.com/apache/spark/blob/ce5fd4008e890ef8ebc2d3cb703a666783ad6c02/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1204 > and there are 4 more places like that. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-12390) Clean up unused serializer parameter in BlockManager
Andrew Or created SPARK-12390: - Summary: Clean up unused serializer parameter in BlockManager Key: SPARK-12390 URL: https://issues.apache.org/jira/browse/SPARK-12390 Project: Spark Issue Type: Bug Components: Block Manager, Spark Core Affects Versions: 1.6.1, 2.0.0 Reporter: Andrew Or Assignee: Andrew Or This parameter is never used: https://github.com/apache/spark/blob/ce5fd4008e890ef8ebc2d3cb703a666783ad6c02/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1204 and there are 4 more places like that. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9042) Spark SQL incompatibility if security is enforced on the Hive warehouse
[ https://issues.apache.org/jira/browse/SPARK-9042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060208#comment-15060208 ] Andrew Ray commented on SPARK-9042: --- Sean, I think there are a couple issues going on here. In my experience with the Sentry HDFS plugin, you can read tables just fine from Spark (which was the stated issue here). However, there are other similar issues that are real: you can't create or modify any tables. There are two issues there. First is HDFS permissions: the Sentry HDFS plugin only gives you read access. Second is Hive metastore permissions: even if you create the table in some other HDFS location that you have write access to, you will still fail because you can't make modifications to the Hive metastore, which has a whitelist of users that is by default set to just hive and impala. > Spark SQL incompatibility if security is enforced on the Hive warehouse > --- > > Key: SPARK-9042 > URL: https://issues.apache.org/jira/browse/SPARK-9042 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.2.0 >Reporter: Nitin Kak > > Hive queries executed from Spark using HiveContext use CLI to create the > query plan and then access the Hive table directories (under > /user/hive/warehouse/) directly. This gives AccessControlException if Apache > Sentry is installed: > org.apache.hadoop.security.AccessControlException: Permission denied: > user=kakn, access=READ_EXECUTE, > inode="/user/hive/warehouse/mastering.db/sample_table":hive:hive:drwxrwx--t > With Apache Sentry, only the "hive" user (created only for Sentry) has the > permissions to access the Hive warehouse directory. After Sentry > installation, all the queries are directed to HiveServer2, which changes > the invoking user to "hive" and then accesses the Hive warehouse > directory. However, HiveContext does not execute the query through > HiveServer2, which leads to the issue. Here is an example of executing a > Hive query through HiveContext.
> val hqlContext = new HiveContext(sc) // Create context to run Hive queries > val pairRDD = hqlContext.sql(hql) // where hql is the string with hive query -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-12345) Mesos cluster mode is broken
Andrew Or created SPARK-12345: - Summary: Mesos cluster mode is broken Key: SPARK-12345 URL: https://issues.apache.org/jira/browse/SPARK-12345 Project: Spark Issue Type: Bug Components: Mesos Affects Versions: 1.6.0 Reporter: Andrew Or Assignee: Iulian Dragos Priority: Critical The same setup worked in 1.5.2 but is now failing for 1.6.0-RC2. (Iulian: please edit this to provide more detail) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10250) Scala PairRDDFunctions.groupByKey() should be fault-tolerant of single large groups
[ https://issues.apache.org/jira/browse/SPARK-10250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-10250: -- Summary: Scala PairRDDFunctions.groupByKey() should be fault-tolerant of single large groups (was: Scala PairRDDFuncitons.groupByKey() should be fault-tolerant of single large groups) > Scala PairRDDFunctions.groupByKey() should be fault-tolerant of single large > groups > --- > > Key: SPARK-10250 > URL: https://issues.apache.org/jira/browse/SPARK-10250 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.4.1 >Reporter: Matt Cheah >Priority: Minor > > PairRDDFunctions.groupByKey() is less robust than Python's equivalent, as > PySpark's groupByKey can spill single large groups to disk. We should bring > the Scala implementation up to parity. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-12345) Mesos cluster mode is broken
[ https://issues.apache.org/jira/browse/SPARK-12345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-12345: -- Target Version/s: 1.6.1 (was: 1.6.0) > Mesos cluster mode is broken > > > Key: SPARK-12345 > URL: https://issues.apache.org/jira/browse/SPARK-12345 > Project: Spark > Issue Type: Bug > Components: Mesos >Affects Versions: 1.6.0 >Reporter: Andrew Or >Assignee: Apache Spark >Priority: Critical > > The same setup worked in 1.5.2 but is now failing for 1.6.0-RC2. > The driver is confused about where SPARK_HOME is. It resolves > `mesos.executor.uri` or `spark.mesos.executor.home` relative to the > filesystem where the driver runs, which is wrong. > {code} > I1215 15:00:39.411212 28032 exec.cpp:134] Version: 0.25.0 > I1215 15:00:39.413512 28037 exec.cpp:208] Executor registered on slave > 130bdc39-44e7-4256-8c22-602040d337f1-S1 > bin/spark-submit: line 27: > /Users/dragos/workspace/Spark/dev/rc-tests/spark-1.6.0-bin-hadoop2.6/bin/spark-class: > No such file or directory > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-12130) Replace shuffleManagerClass with shortShuffleMgrNames in ExternalShuffleBlockResolver
[ https://issues.apache.org/jira/browse/SPARK-12130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-12130. --- Resolution: Fixed Assignee: Lianhui Wang Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > Replace shuffleManagerClass with shortShuffleMgrNames in > ExternalShuffleBlockResolver > - > > Key: SPARK-12130 > URL: https://issues.apache.org/jira/browse/SPARK-12130 > Project: Spark > Issue Type: Bug > Components: Shuffle, YARN >Reporter: Lianhui Wang >Assignee: Lianhui Wang > Fix For: 2.0.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-12130) Replace shuffleManagerClass with shortShuffleMgrNames in ExternalShuffleBlockResolver
[ https://issues.apache.org/jira/browse/SPARK-12130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-12130: -- Issue Type: Improvement (was: Bug) > Replace shuffleManagerClass with shortShuffleMgrNames in > ExternalShuffleBlockResolver > - > > Key: SPARK-12130 > URL: https://issues.apache.org/jira/browse/SPARK-12130 > Project: Spark > Issue Type: Improvement > Components: Shuffle, YARN >Reporter: Lianhui Wang >Assignee: Lianhui Wang > Fix For: 2.0.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-9886) Validate usages of Runtime.getRuntime.addShutdownHook
[ https://issues.apache.org/jira/browse/SPARK-9886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-9886. -- Resolution: Fixed Assignee: (was: Apache Spark) Fix Version/s: 2.0.0 1.6.1 Target Version/s: 1.6.1, 2.0.0 > Validate usages of Runtime.getRuntime.addShutdownHook > - > > Key: SPARK-9886 > URL: https://issues.apache.org/jira/browse/SPARK-9886 > Project: Spark > Issue Type: Bug > Components: Spark Core >Reporter: Michel Lemay >Priority: Minor > Fix For: 1.6.1, 2.0.0 > > > While refactoring calls to Utils.addShutdownHook to spark ShutdownHookManager > in PR #8109, I've seen instances of calls to > Runtime.getRuntime.addShutdownHook: > org\apache\spark\deploy\ExternalShuffleService.scala:126 > org\apache\spark\deploy\mesos\MesosClusterDispatcher.scala:113 > org\apache\spark\storage\ExternalBlockStore.scala:181 > Comment from @vanzin: > "From a quick look, it seems that at least the one in ExternalBlockStore > should be changed; the other two seem to be separate processes (i.e. they are > not part of a Spark application) so that's questionable. But I'd say leave it > for a different change (maybe file a separate bug so it doesn't fall through > the cracks)." -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10477) using DSL in ColumnPruningSuite to improve readablity
[ https://issues.apache.org/jira/browse/SPARK-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-10477: -- Component/s: Tests > using DSL in ColumnPruningSuite to improve readablity > - > > Key: SPARK-10477 > URL: https://issues.apache.org/jira/browse/SPARK-10477 > Project: Spark > Issue Type: Improvement > Components: SQL, Tests >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Trivial > Fix For: 1.6.1, 2.0.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-4117) Spark on Yarn handle AM being told command from RM
[ https://issues.apache.org/jira/browse/SPARK-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-4117. -- Resolution: Fixed Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > Spark on Yarn handle AM being told command from RM > -- > > Key: SPARK-4117 > URL: https://issues.apache.org/jira/browse/SPARK-4117 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 1.2.0 >Reporter: Thomas Graves > Fix For: 2.0.0 > > > In the allocateResponse from the RM it can send commands that the AM should > follow. for instance AM_RESYNC and AM_SHUTDOWN. We should add support for > those. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-4514) SparkContext localProperties does not inherit property updates across thread reuse
[ https://issues.apache.org/jira/browse/SPARK-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-4514. -- Resolution: Fixed Fix Version/s: 2.0.0 Target Version/s: 1.2.1, 1.1.2 (was: 1.1.2, 1.2.1) > SparkContext localProperties does not inherit property updates across thread > reuse > -- > > Key: SPARK-4514 > URL: https://issues.apache.org/jira/browse/SPARK-4514 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.1.0, 1.1.1, 1.2.0 >Reporter: Erik Erlandson >Assignee: Richard W. Eggert II >Priority: Critical > Fix For: 2.0.0 > > > The current job group id of a Spark context is stored in the > {{localProperties}} member value. This data structure is designed to be > thread local, and its settings are not preserved when {{ComplexFutureAction}} > instantiates a new {{Future}}. > One consequence of this is that {{takeAsync()}} does not behave in the same > way as other async actions, e.g. {{countAsync()}}. For example, this test > (if copied into StatusTrackerSuite.scala), will fail, because > {{"my-job-group2"}} is not propagated to the Future which actually > instantiates the job: > {code:java} > test("getJobIdsForGroup() with takeAsync()") { > sc = new SparkContext("local", "test", new SparkConf(false)) > sc.setJobGroup("my-job-group2", "description") > sc.statusTracker.getJobIdsForGroup("my-job-group2") should be (Seq.empty) > val firstJobFuture = sc.parallelize(1 to 1000, 1).takeAsync(1) > val firstJobId = eventually(timeout(10 seconds)) { > firstJobFuture.jobIds.head > } > eventually(timeout(10 seconds)) { > sc.statusTracker.getJobIdsForGroup("my-job-group2") should be > (Seq(firstJobId)) > } > } > {code} > It also impacts current PR for SPARK-1021, which involves additional uses of > {{ComplexFutureAction}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
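The propagation failure in SPARK-4514 can be reproduced outside Spark. A minimal JVM sketch (not Spark code; the names are illustrative): {{InheritableThreadLocal}} copies values only at thread *creation*, so a pool thread created before {{set()}} never sees the update, which is the same reason a job group set in the submitting thread does not reach a reused thread running the {{Future}}.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustration only: thread-local state set AFTER a worker thread exists
// is never inherited by that worker, even with InheritableThreadLocal.
public class ThreadLocalDemo {
    public static final InheritableThreadLocal<String> GROUP = new InheritableThreadLocal<>();

    // Read the thread-local value from whichever pooled thread runs the task.
    public static String runOnPool(ExecutorService pool) throws Exception {
        return pool.submit(() -> GROUP.get()).get();
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        runOnPool(pool);              // forces the single worker thread to be created now
        GROUP.set("my-job-group2");   // set *after* the worker exists
        System.out.println(runOnPool(pool)); // prints "null": the update was not inherited
        pool.shutdown();
    }
}
```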
[jira] [Updated] (SPARK-9026) SimpleFutureAction.onComplete should not tie up a separate thread for each callback
[ https://issues.apache.org/jira/browse/SPARK-9026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-9026: - Assignee: Richard W. Eggert II (was: Josh Rosen) > SimpleFutureAction.onComplete should not tie up a separate thread for each > callback > --- > > Key: SPARK-9026 > URL: https://issues.apache.org/jira/browse/SPARK-9026 > Project: Spark > Issue Type: Bug > Components: Spark Core >Reporter: Josh Rosen >Assignee: Richard W. Eggert II > Fix For: 2.0.0 > > > As [~zsxwing] points out at > https://github.com/apache/spark/pull/7276#issuecomment-121097747, > SimpleFutureAction currently blocks a separate execution context thread for > each callback registered via onComplete: > {code} > override def onComplete[U](func: (Try[T]) => U)(implicit executor: > ExecutionContext) { > executor.execute(new Runnable { > override def run() { > func(awaitResult()) > } > }) > } > {code} > We should fix this so that callbacks do not steal threads. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-4514) SparkContext localProperties does not inherit property updates across thread reuse
[ https://issues.apache.org/jira/browse/SPARK-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-4514: - Assignee: Richard W. Eggert II (was: Josh Rosen) > SparkContext localProperties does not inherit property updates across thread > reuse > -- > > Key: SPARK-4514 > URL: https://issues.apache.org/jira/browse/SPARK-4514 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.1.0, 1.1.1, 1.2.0 >Reporter: Erik Erlandson >Assignee: Richard W. Eggert II >Priority: Critical > Fix For: 2.0.0 > > > The current job group id of a Spark context is stored in the > {{localProperties}} member value. This data structure is designed to be > thread local, and its settings are not preserved when {{ComplexFutureAction}} > instantiates a new {{Future}}. > One consequence of this is that {{takeAsync()}} does not behave in the same > way as other async actions, e.g. {{countAsync()}}. For example, this test > (if copied into StatusTrackerSuite.scala), will fail, because > {{"my-job-group2"}} is not propagated to the Future which actually > instantiates the job: > {code:java} > test("getJobIdsForGroup() with takeAsync()") { > sc = new SparkContext("local", "test", new SparkConf(false)) > sc.setJobGroup("my-job-group2", "description") > sc.statusTracker.getJobIdsForGroup("my-job-group2") should be (Seq.empty) > val firstJobFuture = sc.parallelize(1 to 1000, 1).takeAsync(1) > val firstJobId = eventually(timeout(10 seconds)) { > firstJobFuture.jobIds.head > } > eventually(timeout(10 seconds)) { > sc.statusTracker.getJobIdsForGroup("my-job-group2") should be > (Seq(firstJobId)) > } > } > {code} > It also impacts current PR for SPARK-1021, which involves additional uses of > {{ComplexFutureAction}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-4117) Spark on Yarn handle AM being told command from RM
[ https://issues.apache.org/jira/browse/SPARK-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-4117: - Assignee: Devaraj K > Spark on Yarn handle AM being told command from RM > -- > > Key: SPARK-4117 > URL: https://issues.apache.org/jira/browse/SPARK-4117 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 1.2.0 >Reporter: Thomas Graves >Assignee: Devaraj K > Fix For: 2.0.0 > > > In the allocateResponse from the RM it can send commands that the AM should > follow. for instance AM_RESYNC and AM_SHUTDOWN. We should add support for > those. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-12105) Add a DataFrame.show() with argument for output PrintStream
[ https://issues.apache.org/jira/browse/SPARK-12105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-12105. --- Resolution: Fixed Assignee: Jean-Baptiste Onofré Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > Add a DataFrame.show() with argument for output PrintStream > --- > > Key: SPARK-12105 > URL: https://issues.apache.org/jira/browse/SPARK-12105 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.5.2 >Reporter: Dean Wampler >Assignee: Jean-Baptiste Onofré >Priority: Minor > Fix For: 2.0.0 > > > It would be nice to send the output of DataFrame.show(...) to a different > output stream than stdout, including just capturing the string itself. This > is useful, e.g., for testing. Actually, it would be sufficient and perhaps > better to just make DataFrame.showString a public method, -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-12351) Add documentation of submitting Mesos jobs with cluster mode
[ https://issues.apache.org/jira/browse/SPARK-12351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-12351. --- Resolution: Fixed Assignee: Timothy Chen Fix Version/s: 2.0.0 1.6.1 Target Version/s: 1.6.1, 2.0.0 > Add documentation of submitting Mesos jobs with cluster mode > > > Key: SPARK-12351 > URL: https://issues.apache.org/jira/browse/SPARK-12351 > Project: Spark > Issue Type: Documentation >Reporter: Timothy Chen >Assignee: Timothy Chen > Fix For: 1.6.1, 2.0.0 > > > Add more documentation around how to launch spark drivers with Mesos cluster > mode -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-9026) SimpleFutureAction.onComplete should not tie up a separate thread for each callback
[ https://issues.apache.org/jira/browse/SPARK-9026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-9026. -- Resolution: Fixed Fix Version/s: 2.0.0 > SimpleFutureAction.onComplete should not tie up a separate thread for each > callback > --- > > Key: SPARK-9026 > URL: https://issues.apache.org/jira/browse/SPARK-9026 > Project: Spark > Issue Type: Bug > Components: Spark Core >Reporter: Josh Rosen >Assignee: Josh Rosen > Fix For: 2.0.0 > > > As [~zsxwing] points out at > https://github.com/apache/spark/pull/7276#issuecomment-121097747, > SimpleFutureAction currently blocks a separate execution context thread for > each callback registered via onComplete: > {code} > override def onComplete[U](func: (Try[T]) => U)(implicit executor: > ExecutionContext) { > executor.execute(new Runnable { > override def run() { > func(awaitResult()) > } > }) > } > {code} > We should fix this so that callbacks do not steal threads. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
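The pattern that avoids parking one thread per callback is completion-driven registration. A rough contrast sketch using {{java.util.concurrent.CompletableFuture}} as a stand-in for {{SimpleFutureAction}} (an analogy, not Spark's actual fix): {{whenComplete}} holds no thread while waiting; the callback runs only once the result is available.

```java
import java.util.concurrent.CompletableFuture;

// Instead of executor.execute(() -> func(awaitResult())), which blocks a
// pooled thread in awaitResult() until the job finishes, register the
// callback on the future itself. Nothing waits; the lambda fires on completion.
public class NonBlockingCallback {
    public static CompletableFuture<String> onComplete(
            CompletableFuture<String> result, StringBuilder sink) {
        return result.whenComplete((value, error) ->
                sink.append(error == null ? "ok:" + value : "err"));
    }
}
```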
[jira] [Resolved] (SPARK-10477) using DSL in ColumnPruningSuite to improve readablity
[ https://issues.apache.org/jira/browse/SPARK-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-10477. --- Resolution: Fixed Assignee: Wenchen Fan Fix Version/s: 2.0.0 1.6.1 Target Version/s: 1.6.1, 2.0.0 > using DSL in ColumnPruningSuite to improve readablity > - > > Key: SPARK-10477 > URL: https://issues.apache.org/jira/browse/SPARK-10477 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Trivial > Fix For: 1.6.1, 2.0.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-12062) Master rebuilding historical SparkUI should be asynchronous
[ https://issues.apache.org/jira/browse/SPARK-12062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-12062. --- Resolution: Fixed Fix Version/s: 2.0.0 1.6.1 > Master rebuilding historical SparkUI should be asynchronous > --- > > Key: SPARK-12062 > URL: https://issues.apache.org/jira/browse/SPARK-12062 > Project: Spark > Issue Type: Bug > Components: Deploy >Affects Versions: 1.0.0 >Reporter: Andrew Or >Assignee: Bryan Cutler > Fix For: 1.6.1, 2.0.0 > > > When a long-running application finishes, it takes a while (sometimes > minutes) to rebuild the SparkUI. However, in Master.scala this is currently > done within the RPC event loop, which runs only in 1 thread. Thus, in the > mean time no other applications can register with this master. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-9516) Improve Thread Dump page
[ https://issues.apache.org/jira/browse/SPARK-9516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-9516. -- Resolution: Fixed Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > Improve Thread Dump page > > > Key: SPARK-9516 > URL: https://issues.apache.org/jira/browse/SPARK-9516 > Project: Spark > Issue Type: New Feature > Components: Web UI >Reporter: Nan Zhu >Assignee: Nan Zhu >Priority: Minor > Fix For: 2.0.0 > > > Originally proposed by [~irashid] in > https://github.com/apache/spark/pull/7808#issuecomment-126788335: > we can enhance the current thread dump page with at least the following two > new features: > 1) sort threads by thread status, > 2) a filter to grep the threads -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
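The two features proposed for SPARK-9516 reduce to a sort and a filter over the dump's rows. A toy sketch over an in-memory model (the {{ThreadEntry}}/{{view}} names are hypothetical, not the Web UI's classes):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model of a thread dump page: each row has a name and a Thread.State.
// view() applies the grep filter, then orders rows by thread status.
public class ThreadDumpView {
    public static class ThreadEntry {
        public final String name;
        public final Thread.State state;
        public ThreadEntry(String name, Thread.State state) {
            this.name = name;
            this.state = state;
        }
    }

    public static List<ThreadEntry> view(List<ThreadEntry> threads, String query) {
        List<ThreadEntry> out = new ArrayList<>();
        for (ThreadEntry t : threads) {
            if (t.name.contains(query)) {   // feature 2: grep the threads
                out.add(t);
            }
        }
        // feature 1: sort by thread status (enum declaration order)
        out.sort(Comparator.comparing((ThreadEntry t) -> t.state));
        return out;
    }
}
```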
[jira] [Resolved] (SPARK-10123) Cannot set "--deploy-mode" in default configuration
[ https://issues.apache.org/jira/browse/SPARK-10123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-10123. --- Resolution: Fixed Assignee: Saisai Shao Fix Version/s: 2.0.0 > Cannot set "--deploy-mode" in default configuration > --- > > Key: SPARK-10123 > URL: https://issues.apache.org/jira/browse/SPARK-10123 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Reporter: Marcelo Vanzin >Assignee: Saisai Shao >Priority: Minor > Fix For: 2.0.0 > > > There's no configuration option that is the equivalent of "--deploy-mode". So > it's not possible, for example, to have applications be submitted in > standalone cluster mode by default - you have to always use the command line > argument for that. > YARN is special because it has the (somewhat deprecated) "yarn-cluster" > master, but it would be nice to be consistent and have a proper config option > for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-9886) Validate usages of Runtime.getRuntime.addShutdownHook
[ https://issues.apache.org/jira/browse/SPARK-9886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-9886: - Assignee: Naveen Kumar Minchu > Validate usages of Runtime.getRuntime.addShutdownHook > - > > Key: SPARK-9886 > URL: https://issues.apache.org/jira/browse/SPARK-9886 > Project: Spark > Issue Type: Bug > Components: Spark Core >Reporter: Michel Lemay >Assignee: Naveen Kumar Minchu >Priority: Minor > Fix For: 1.6.1, 2.0.0 > > > While refactoring calls to Utils.addShutdownHook to spark ShutdownHookManager > in PR #8109, I've seen instances of calls to > Runtime.getRuntime.addShutdownHook: > org\apache\spark\deploy\ExternalShuffleService.scala:126 > org\apache\spark\deploy\mesos\MesosClusterDispatcher.scala:113 > org\apache\spark\storage\ExternalBlockStore.scala:181 > Comment from @vanzin: > "From a quick look, it seems that at least the one in ExternalBlockStore > should be changed; the other two seem to be separate processes (i.e. they are > not part of a Spark application) so that's questionable. But I'd say leave it > for a different change (maybe file a separate bug so it doesn't fall through > the cracks)." -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-12062) Master rebuilding historical SparkUI should be asynchronous
[ https://issues.apache.org/jira/browse/SPARK-12062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or reopened SPARK-12062: --- > Master rebuilding historical SparkUI should be asynchronous > --- > > Key: SPARK-12062 > URL: https://issues.apache.org/jira/browse/SPARK-12062 > Project: Spark > Issue Type: Bug > Components: Deploy >Affects Versions: 1.0.0 >Reporter: Andrew Or >Assignee: Bryan Cutler > > When a long-running application finishes, it takes a while (sometimes > minutes) to rebuild the SparkUI. However, in Master.scala this is currently > done within the RPC event loop, which runs only in 1 thread. Thus, in the > mean time no other applications can register with this master. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-12062) Master rebuilding historical SparkUI should be asynchronous
[ https://issues.apache.org/jira/browse/SPARK-12062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-12062: -- Target Version/s: 1.6.1, 2.0.0 > Master rebuilding historical SparkUI should be asynchronous > --- > > Key: SPARK-12062 > URL: https://issues.apache.org/jira/browse/SPARK-12062 > Project: Spark > Issue Type: Bug > Components: Deploy >Affects Versions: 1.0.0 >Reporter: Andrew Or >Assignee: Bryan Cutler > > When a long-running application finishes, it takes a while (sometimes > minutes) to rebuild the SparkUI. However, in Master.scala this is currently > done within the RPC event loop, which runs only in 1 thread. Thus, in the > mean time no other applications can register with this master. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-12275) No plan for BroadcastHint in some condition
[ https://issues.apache.org/jira/browse/SPARK-12275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-12275. --- Resolution: Fixed Fix Version/s: 1.5.3 Target Version/s: 1.5.3, 1.6.1, 2.0.0 (was: 1.5.3) > No plan for BroadcastHint in some condition > --- > > Key: SPARK-12275 > URL: https://issues.apache.org/jira/browse/SPARK-12275 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 >Reporter: yucai >Assignee: yucai > Labels: backport-needed > Fix For: 1.5.3, 1.6.1, 2.0.0 > > > *Summary* > No plan for BroadcastHint is generated in some condition. > *Test Case* > {code} > val df1 = Seq((1, "1"), (2, "2")).toDF("key", "value") > val parquetTempFile = > "%s/SPARK-_%d.parquet".format(System.getProperty("java.io.tmpdir"), > scala.util.Random.nextInt) > df1.write.parquet(parquetTempFile) > val pf1 = sqlContext.read.parquet(parquetTempFile) > #1. df1.join(broadcast(pf1)).count() > #2. broadcast(pf1).count() > {code} > *Result* > It will trigger assertion in QueryPlanner.scala, like below: > {code} > scala> df1.join(broadcast(pf1)).count() > java.lang.AssertionError: assertion failed: No plan for BroadcastHint > +- Relation[key#6,value#7] > ParquetRelation[hdfs://10.1.0.20:8020/tmp/SPARK-_1817830406.parquet] > at scala.Predef$.assert(Predef.scala:179) > at > org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59) > at > org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54) > at > org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:336) > at > org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) > at > org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) > at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) > at > org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59) > at > 
org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54) > {code}
[jira] [Commented] (SPARK-12062) Master rebuilding historical SparkUI should be asynchronous
[ https://issues.apache.org/jira/browse/SPARK-12062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15054829#comment-15054829 ] Andrew Or commented on SPARK-12062: --- I see; if you already have a patch, then this is worth fixing. However, I don't think we should introduce yet another configuration. It's best if the serving is asynchronous before we remove it completely.
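The asynchronous serving discussed in this thread boils down to keeping the Master's single-threaded RPC event loop free of long-running work. The sketch below is a hedged illustration of that pattern using plain `java.util.concurrent`, not Spark's actual Master code; the class and pool names are invented for the example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class AsyncRebuild {
    public static void main(String[] args) throws Exception {
        // The RPC event loop is single-threaded: a UI rebuild done inline
        // would block every later message, including app registrations.
        ExecutorService eventLoop = Executors.newSingleThreadExecutor();
        // A dedicated worker for slow rebuilds keeps the loop responsive.
        ExecutorService rebuildPool = Executors.newFixedThreadPool(1);
        CountDownLatch registered = new CountDownLatch(1);

        eventLoop.submit(() -> rebuildPool.submit(() -> {
            // Stand-in for a minutes-long SparkUI rebuild.
            try { Thread.sleep(500); } catch (InterruptedException ignored) { }
            System.out.println("historical UI rebuilt");
        }));
        // Stand-in for a RegisterApplication message arriving afterwards.
        eventLoop.submit(registered::countDown);

        // The registration is handled while the rebuild is still in flight.
        boolean handled = registered.await(1, TimeUnit.SECONDS);
        System.out.println("registered during rebuild: " + handled);
        eventLoop.shutdown();
        rebuildPool.shutdown();
        rebuildPool.awaitTermination(2, TimeUnit.SECONDS);
    }
}
```

With the rebuild done inline instead, the second message would not be processed until the sleep finished, which is the starvation this issue describes.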
[jira] [Commented] (SPARK-6270) Standalone Master hangs when streaming job completes and event logging is enabled
[ https://issues.apache.org/jira/browse/SPARK-6270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15053915#comment-15053915 ] Andrew Or commented on SPARK-6270: -- +1 to removing history serving functionality from standalone Master > Standalone Master hangs when streaming job completes and event logging is > enabled > - > > Key: SPARK-6270 > URL: https://issues.apache.org/jira/browse/SPARK-6270 > Project: Spark > Issue Type: Bug > Components: Deploy, Streaming >Affects Versions: 1.2.0, 1.2.1, 1.3.0, 1.5.1 >Reporter: Tathagata Das >Priority: Critical > > If the event logging is enabled, the Spark Standalone Master tries to > recreate the web UI of a completed Spark application from its event logs. > However if this event log is huge (e.g. for a Spark Streaming application), > then the master hangs in its attempt to read and recreate the web ui. This > hang causes the whole standalone cluster to be unusable. > Workaround is to disable the event logging.
[jira] [Commented] (SPARK-6270) Standalone Master hangs when streaming job completes and event logging is enabled
[ https://issues.apache.org/jira/browse/SPARK-6270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15053929#comment-15053929 ] Andrew Or commented on SPARK-6270: -- [~shivaram] That may be difficult to do because different applications can specify different log directories, whereas now the history server reads all logs in the same one. Also it would add complexity to the Master process because now it binds to two ports instead of one. I think we should try to keep it lightweight in the future and simply rip out this functionality. With Spark 2.0 I believe we're allowed to do that. :)
[jira] [Commented] (SPARK-12062) Master rebuilding historical SparkUI should be asynchronous
[ https://issues.apache.org/jira/browse/SPARK-12062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15053923#comment-15053923 ] Andrew Or commented on SPARK-12062: --- Actually, I'm closing this as a Won't Fix since SPARK-12299 supersedes this.
[jira] [Created] (SPARK-12299) Remove history serving functionality from standalone Master
Andrew Or created SPARK-12299: - Summary: Remove history serving functionality from standalone Master Key: SPARK-12299 URL: https://issues.apache.org/jira/browse/SPARK-12299 Project: Spark Issue Type: Improvement Components: Deploy Affects Versions: 1.0.0 Reporter: Andrew Or The standalone Master currently continues to serve the historical UIs of applications that have completed and enabled event logging. This poses problems, however, if the event log is very large, e.g. SPARK-6270. The Master might OOM or hang while it rebuilds the UI, rejecting applications in the meantime. Personally, I have had to modify the code to disable this myself, because I wanted to use event logging in standalone mode for applications that produce a lot of logging. Removing this from the Master would simplify the process significantly. This issue supersedes SPARK-12062.
[jira] [Resolved] (SPARK-12062) Master rebuilding historical SparkUI should be asynchronous
[ https://issues.apache.org/jira/browse/SPARK-12062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-12062. --- Resolution: Won't Fix
[jira] [Updated] (SPARK-4036) Add Conditional Random Fields (CRF) algorithm to Spark MLlib
[ https://issues.apache.org/jira/browse/SPARK-4036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Philpot updated SPARK-4036: -- Attachment: sample-output sample-input features.hair-eye dig-hair-eye-train.model I attach a feature file, model file, input data, and output data from CRF++. There may be odd UTF-8 issues; our real pipeline addresses them in ways not fully represented in this data sample, in case you notice anything related to that. [David, this person is working on CRF natively (Scala) for Spark. FYI] Thank you, Andrew -- Andrew Philpot andrew.phil...@gmail.com > Add Conditional Random Fields (CRF) algorithm to Spark MLlib > > > Key: SPARK-4036 > URL: https://issues.apache.org/jira/browse/SPARK-4036 > Project: Spark > Issue Type: New Feature > Components: MLlib >Reporter: Guoqiang Li >Assignee: Kai Sasaki > Attachments: CRF_design.1.pdf, dig-hair-eye-train.model, > features.hair-eye, sample-input, sample-output > > > Conditional random fields (CRFs) are a class of statistical modelling methods > often applied in pattern recognition and machine learning, where they are > used for structured prediction. > The paper: > http://www.seas.upenn.edu/~strctlrn/bib/PDF/crf.pdf
[jira] [Commented] (SPARK-6270) Standalone Master hangs when streaming job completes and event logging is enabled
[ https://issues.apache.org/jira/browse/SPARK-6270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15053395#comment-15053395 ] Andrew Or commented on SPARK-6270: -- Yeah, some regression tests would be helpful, though in my case I needed to run something complicated enough to reproduce the issue. I think a real long-term fix would be to periodically checkpoint the UI state and truncate the event log.
[jira] [Commented] (SPARK-6270) Standalone Master hangs when streaming job completes and event logging is enabled
[ https://issues.apache.org/jira/browse/SPARK-6270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15053399#comment-15053399 ] Andrew Or commented on SPARK-6270: -- Also, we should consider switching event log compression on by default. These files are highly compressible and in my case I've noticed ratios of 10X (which saved around 10G of disk space).
[jira] [Commented] (SPARK-6270) Standalone Master hangs when streaming job completes and event logging is enabled
[ https://issues.apache.org/jira/browse/SPARK-6270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15053405#comment-15053405 ] Andrew Or commented on SPARK-6270: -- You can set `spark.eventLog.compress` to true
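For reference, the setting mentioned above can be placed in spark-defaults.conf alongside the other event log properties; the log directory below is a placeholder, not a value from this thread:

```
spark.eventLog.enabled   true
spark.eventLog.dir       hdfs:///spark-events
spark.eventLog.compress  true
```

The same property can also be passed per application, e.g. `spark-submit --conf spark.eventLog.compress=true ...`.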
[jira] [Commented] (SPARK-6270) Standalone Master hangs when streaming job completes and event logging is enabled
[ https://issues.apache.org/jira/browse/SPARK-6270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15053924#comment-15053924 ] Andrew Or commented on SPARK-6270: -- I have filed a JIRA for it: https://issues.apache.org/jira/browse/SPARK-12299
[jira] [Resolved] (SPARK-12108) Event logs are much bigger in 1.6 than in 1.5
[ https://issues.apache.org/jira/browse/SPARK-12108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-12108. --- Resolution: Fixed Fix Version/s: 1.6.0 > Event logs are much bigger in 1.6 than in 1.5 > - > > Key: SPARK-12108 > URL: https://issues.apache.org/jira/browse/SPARK-12108 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.0 >Reporter: Andrew Or >Assignee: Andrew Or > Fix For: 1.6.0 > > > From running page rank, the event log in 1.5 is 1.3GB uncompressed, but in > 1.6 it's 6GB! > From a preliminary bisect, this commit is suspect: > https://github.com/apache/spark/commit/42d933fbba0584b39bd8218eafc44fb03aeb157d
[jira] [Resolved] (SPARK-12251) Document Spark 1.6's off-heap memory configurations and add config validation
[ https://issues.apache.org/jira/browse/SPARK-12251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-12251. --- Resolution: Fixed > Document Spark 1.6's off-heap memory configurations and add config validation > - > > Key: SPARK-12251 > URL: https://issues.apache.org/jira/browse/SPARK-12251 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Reporter: Josh Rosen >Assignee: Josh Rosen > Fix For: 1.6.0 > > > We need to document the new off-heap memory limit configurations which were > added in Spark 1.6, add simple configuration validation (for instance, you > shouldn't be able to enable off-heap execution when the off-heap memory limit > is zero), and alias the old and confusing `spark.unsafe.offHeap` > configuration to something that lives in the `spark.memory` namespace.
[jira] [Updated] (SPARK-12251) Document Spark 1.6's off-heap memory configurations and add config validation
[ https://issues.apache.org/jira/browse/SPARK-12251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-12251: -- Fix Version/s: 1.6.0
[jira] [Assigned] (SPARK-12155) Execution OOM after a relative large dataset cached in the cluster.
[ https://issues.apache.org/jira/browse/SPARK-12155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or reassigned SPARK-12155: - Assignee: Andrew Or (was: Josh Rosen) > Execution OOM after a relative large dataset cached in the cluster. > --- > > Key: SPARK-12155 > URL: https://issues.apache.org/jira/browse/SPARK-12155 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Reporter: Yin Huai >Assignee: Andrew Or >Priority: Blocker > Fix For: 1.6.0 > > > I have a cluster with relative 80GB of mem. Then, I cached a 43GB dataframe. > When I start to consume the query. I got the following exception (I added > more logs to the code). > {code} > 15/12/05 00:33:43 INFO UnifiedMemoryManager: Creating UnifedMemoryManager for > 4 cores with 16929521664 maxMemory, 8464760832 storageRegionSize. > 15/12/05 01:20:50 INFO MemoryStore: Ensuring 1048576 bytes of free space for > block rdd_94_37(free: 3253659951, max: 16798973952) > 15/12/05 01:20:50 INFO MemoryStore: Ensuring 5142008 bytes of free space for > block rdd_94_37(free: 3252611375, max: 16798973952) > 15/12/05 01:20:50 INFO Executor: Finished task 36.0 in stage 4.0 (TID 109). > 3028 bytes result sent to driver > 15/12/05 01:20:50 INFO MemoryStore: Ensuring 98948238 bytes of free space for > block rdd_94_37(free: 3314840375, max: 16866344960) > 15/12/05 01:20:50 INFO MemoryStore: Ensuring 98675713 bytes of free space for > block rdd_94_37(free: 3215892137, max: 16866344960) > 15/12/05 01:20:50 INFO MemoryStore: Ensuring 197347565 bytes of free space > for block rdd_94_37(free: 3117216424, max: 16866344960) > 15/12/05 01:20:50 INFO MemoryStore: Ensuring 295995553 bytes of free space > for block rdd_94_37(free: 2919868859, max: 16866344960) > 15/12/05 01:20:51 INFO MemoryStore: Ensuring 394728479 bytes of free space > for block rdd_94_37(free: 2687050010, max: 16929521664) > 15/12/05 01:20:51 INFO Executor: Finished task 32.0 in stage 4.0 (TID 106). 
> 3028 bytes result sent to driver > 15/12/05 01:20:51 INFO MemoryStore: Ensuring 591258816 bytes of free space > for block rdd_94_37(free: 2292321531, max: 16929521664) > 15/12/05 01:20:51 INFO MemoryStore: Ensuring 901645182 bytes of free space > for block rdd_94_37(free: 1701062715, max: 16929521664) > 15/12/05 01:20:52 INFO MemoryStore: Ensuring 1302179076 bytes of free space > for block rdd_94_37(free: 799417533, max: 16929521664) > 15/12/05 01:20:52 INFO MemoryStore: Will not store rdd_94_37 as it would > require dropping another block from the same RDD > 15/12/05 01:20:52 WARN MemoryStore: Not enough space to cache rdd_94_37 in > memory! (computed 2.4 GB so far) > 15/12/05 01:20:52 INFO MemoryStore: Memory use = 12.6 GB (blocks) + 2.4 GB > (scratch space shared across 13 tasks(s)) = 15.0 GB. Storage limit = 15.8 GB. > 15/12/05 01:20:52 INFO BlockManager: Found block rdd_94_37 locally > 15/12/05 01:20:52 INFO UnifiedMemoryManager: Try to acquire 262144 bytes > memory. But, on-heap execution memory poll only has 0 bytes free memory. > 15/12/05 01:20:52 INFO UnifiedMemoryManager: memoryReclaimableFromStorage > 8464760832, storageMemoryPool.poolSize 16929521664, storageRegionSize > 8464760832. > 15/12/05 01:20:52 INFO UnifiedMemoryManager: Try to reclaim memory space from > storage memory pool. > 15/12/05 01:20:52 INFO StorageMemoryPool: Claiming 262144 bytes free memory > space from StorageMemoryPool. > 15/12/05 01:20:52 INFO UnifiedMemoryManager: Reclaimed 262144 bytes of memory > from storage memory pool.Adding them back to onHeapExecutionMemoryPool. > 15/12/05 01:20:52 INFO UnifiedMemoryManager: Try to acquire 67108864 bytes > memory. But, on-heap execution memory poll only has 0 bytes free memory. > 15/12/05 01:20:52 INFO UnifiedMemoryManager: memoryReclaimableFromStorage > 8464498688, storageMemoryPool.poolSize 16929259520, storageRegionSize > 8464760832. > 15/12/05 01:20:52 INFO UnifiedMemoryManager: Try to reclaim memory space from > storage memory pool. 
> 15/12/05 01:20:52 INFO StorageMemoryPool: Claiming 67108864 bytes free memory > space from StorageMemoryPool. > 15/12/05 01:20:52 INFO UnifiedMemoryManager: Reclaimed 67108864 bytes of > memory from storage memory pool.Adding them back to onHeapExecutionMemoryPool. > 15/12/05 01:20:54 INFO Executor: Finished task 37.0 in stage 4.0 (TID 110). > 3077 bytes result sent to driver > 15/12/05 01:20:56 INFO CoarseGrainedExecutorBackend: Got assigned task 120 > 15/12/05 01:20:56 INFO Executor: Running task 1.0 in stage 5.0 (TID 120) > 15/12/05 01:20:56 INFO CoarseGrainedExecutorBackend: Got assigned task 124 > 15/12/05 01:20:56 INFO CoarseGrainedExecutorBackend: Got assigned task 128 > 15/12/05 01:20:56 INFO CoarseGrainedExecutorBackend: Got assigned task 132 > 15/12/05 01:20:56
[jira] [Resolved] (SPARK-12155) Execution OOM after a relative large dataset cached in the cluster.
[ https://issues.apache.org/jira/browse/SPARK-12155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-12155. --- Resolution: Fixed Fix Version/s: 1.6.0
[jira] [Resolved] (SPARK-12253) UnifiedMemoryManager race condition: storage can starve new tasks
[ https://issues.apache.org/jira/browse/SPARK-12253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-12253. --- Resolution: Fixed Fix Version/s: 1.6.0 > UnifiedMemoryManager race condition: storage can starve new tasks > - > > Key: SPARK-12253 > URL: https://issues.apache.org/jira/browse/SPARK-12253 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.0 >Reporter: Andrew Or >Assignee: Andrew Or >Priority: Blocker > Fix For: 1.6.0 > > > The following race condition is possible with the existing code in unified > memory management: > (1) Existing tasks collectively occupy all execution memory > (2) New task comes in and blocks while existing tasks spill > (3) After tasks finish spilling, another task jumps in and puts in a large > block, stealing the freed memory > (4) New task still cannot acquire memory and goes back to sleep
[jira] [Resolved] (SPARK-12189) UnifiedMemoryManager double counts storage memory freed
[ https://issues.apache.org/jira/browse/SPARK-12189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-12189. --- Resolution: Fixed Fix Version/s: 1.6.0 > UnifiedMemoryManager double counts storage memory freed > --- > > Key: SPARK-12189 > URL: https://issues.apache.org/jira/browse/SPARK-12189 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.0 >Reporter: Andrew Or >Assignee: Josh Rosen >Priority: Blocker > Fix For: 1.6.0 > > > When execution evicts storage, we decrement the storage memory in two places: > (1) > https://github.com/apache/spark/blob/3e7e05f5ee763925ed60410d7de04cf36b723de1/core/src/main/scala/org/apache/spark/memory/StorageMemoryPool.scala#L131 > (2) > https://github.com/apache/spark/blob/3e7e05f5ee763925ed60410d7de04cf36b723de1/core/src/main/scala/org/apache/spark/memory/StorageMemoryPool.scala#L133 > (1) calls MemoryStore#ensureFreeSpace, which internally calls > MemoryManager#releaseStorageMemory for each block it drops. This call lowers > the storage memory used by the block size. > A seemingly simple fix is just to remove the line in (2). However, this bug > is actually masked by SPARK-12165, so this one must be fixed after that one. > Josh actually has an outstanding patch to fix both: > https://github.com/apache/spark/pull/10170
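The double-counting pattern described in SPARK-12189 can be shown with a toy accounting example. This is a hedged sketch, not Spark's StorageMemoryPool: the class `DoubleCount` and method `evictToFree` are invented names that mimic the shape of the bug (the eviction helper decrements the counter once per dropped block, and the call site decrements again).

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class DoubleCount {
    static long storageUsed = 0;
    static Map<String, Long> blocks = new HashMap<>();

    static void put(String id, long size) {
        blocks.put(id, size);
        storageUsed += size;
    }

    // Analogue of MemoryStore#ensureFreeSpace: already lowers the counter
    // once for every block it drops (decrement #1).
    static long evictToFree(long needed) {
        long freed = 0;
        Iterator<Map.Entry<String, Long>> it = blocks.entrySet().iterator();
        while (it.hasNext() && freed < needed) {
            Map.Entry<String, Long> e = it.next();
            freed += e.getValue();
            storageUsed -= e.getValue(); // decrement #1, inside the helper
            it.remove();
        }
        return freed;
    }

    public static void main(String[] args) {
        put("rdd_94_37", 100);
        long freed = evictToFree(100);
        storageUsed -= freed; // decrement #2 at the call site: the bug
        System.out.println("storageUsed: " + storageUsed); // -100, not 0
    }
}
```

Dropping the call-site decrement (the "line in (2)" above) restores the invariant that each freed block lowers the counter exactly once.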
[jira] [Created] (SPARK-12253) UnifiedMemoryManager race condition: storage can starve new tasks
Andrew Or created SPARK-12253:
------------------------------
Summary: UnifiedMemoryManager race condition: storage can starve new tasks
Key: SPARK-12253
URL: https://issues.apache.org/jira/browse/SPARK-12253
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 1.6.0
Reporter: Andrew Or
Assignee: Andrew Or
Priority: Blocker

The following race condition is possible with the existing code in unified memory management:
(1) Existing tasks collectively occupy all execution memory
(2) New task comes in and blocks while existing tasks spill
(3) After tasks finish spilling, another task jumps in and puts in a large block, stealing the freed memory
(4) New task still cannot acquire memory and goes back to sleep
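The four-step race above can be made concrete with a small, self-contained model. This is an illustrative sketch only — a toy accounting model with invented names, not Spark's actual UnifiedMemoryManager code:

```python
# Toy model (not Spark's implementation): execution and storage share one pool.
class UnifiedPool:
    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.execution_used = 0
        self.storage_used = 0

    def free(self):
        return self.max_bytes - self.execution_used - self.storage_used

    def try_acquire_execution(self, n):
        if self.free() >= n:
            self.execution_used += n
            return True
        return False  # caller would block and retry, as in step (2)

pool = UnifiedPool(max_bytes=100)

# (1) Existing tasks collectively occupy all execution memory.
assert pool.try_acquire_execution(100)

# (2) A new task asks for 10 bytes and must wait.
assert not pool.try_acquire_execution(10)

# (3) Existing tasks spill, freeing 20 bytes...
pool.execution_used -= 20
# ...but before the waiter wakes up, storage puts in a block and
# steals the freed memory.
pool.storage_used += 20

# (4) The waiter retries and is starved again.
assert not pool.try_acquire_execution(10)
```

Nothing reserves the freed bytes for the waiting task between steps (3) and (4), which is the gap the fix has to close.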
[jira] [Resolved] (SPARK-12165) Execution memory requests may fail to evict storage blocks if storage memory usage is below max memory
[ https://issues.apache.org/jira/browse/SPARK-12165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-12165. --- Resolution: Fixed Fix Version/s: 1.6.0 > Execution memory requests may fail to evict storage blocks if storage memory > usage is below max memory > -- > > Key: SPARK-12165 > URL: https://issues.apache.org/jira/browse/SPARK-12165 > Project: Spark > Issue Type: Bug > Components: Spark Core >Reporter: Josh Rosen >Assignee: Josh Rosen >Priority: Blocker > Fix For: 1.6.0 > > > Consider a scenario where storage memory usage has grown past the size of the > unevictable storage region ({{spark.memory.storageFraction}} * maxMemory) and > a task needs to acquire more execution memory by reclaiming evictable storage > memory. If the storage memory usage is less than maxMemory, then there's a > possibility that no storage blocks will be evicted. This is caused by how > {{MemoryStore.ensureFreeSpace()}} is called inside of > {{StorageMemoryPool.shrinkPoolToReclaimSpace()}}. > Here's a failing regression test which demonstrates this bug: > https://github.com/apache/spark/commit/b519fe628a9a2b8238dfedbfd9b74bdd2ddc0de4?diff=unified#diff-b3a7cd2e011e048908d70f743c0ed7cfR155 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
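The SPARK-12165 failure mode above can be sketched in a few lines. This is a hedged simplification with invented names — not the real MemoryStore.ensureFreeSpace — showing why gating eviction on free space relative to the overall max means a shrink request can evict nothing:

```python
MAX_MEMORY = 100  # total unified memory in this toy model

def ensure_free_space(storage_used, blocks, needed):
    """Evict blocks only while (MAX_MEMORY - storage_used) < needed --
    i.e. 'free' is measured against the overall max, not the pool."""
    evicted = 0
    while MAX_MEMORY - (storage_used - evicted) < needed and blocks:
        evicted += blocks.pop()
    return evicted

# Storage uses 60 of 100 bytes; execution wants to reclaim 30 by eviction.
blocks = [20, 20, 20]
reclaimed = ensure_free_space(60, blocks, needed=30)

# 40 bytes already look "free" relative to the max, so no block is
# evicted and execution's reclaim request frees no storage at all.
assert reclaimed == 0
assert len(blocks) == 3
```

The shrink path actually needs to evict relative to the storage pool's own size, which is what the linked regression test exercises.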
[jira] [Created] (SPARK-12205) Pivot fails Analysis when aggregate is UnresolvedFunction
Andrew Ray created SPARK-12205:
-------------------------------
Summary: Pivot fails Analysis when aggregate is UnresolvedFunction
Key: SPARK-12205
URL: https://issues.apache.org/jira/browse/SPARK-12205
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.6.0
Reporter: Andrew Ray
[jira] [Commented] (SPARK-10872) Derby error (XSDB6) when creating new HiveContext after restarting SparkContext
[ https://issues.apache.org/jira/browse/SPARK-10872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15047280#comment-15047280 ] Andrew King commented on SPARK-10872: - I am running into the same issue. I am calling HiveContext through IPython. If I try to run my code in the same instance of IPython more than once, I get this error. A HiveContext.close() would solve this issue for me (I use sc.close() to get around a similar problem with SparkContext). Some way to kill derby / hive through python would be great. > Derby error (XSDB6) when creating new HiveContext after restarting > SparkContext > --- > > Key: SPARK-10872 > URL: https://issues.apache.org/jira/browse/SPARK-10872 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 1.4.0, 1.4.1, 1.5.0 >Reporter: Dmytro Bielievtsov > > Starting from spark 1.4.0 (works well on 1.3.1), the following code fails > with "XSDB6: Another instance of Derby may have already booted the database > ~/metastore_db": > {code:python} > from pyspark import SparkContext, HiveContext > sc = SparkContext("local[*]", "app1") > sql = HiveContext(sc) > sql.createDataFrame([[1]]).collect() > sc.stop() > sc = SparkContext("local[*]", "app2") > sql = HiveContext(sc) > sql.createDataFrame([[1]]).collect() # Py4J error > {code} > This is related to [#SPARK-9539], and I intend to restart spark context > several times for isolated jobs to prevent cache cluttering and GC errors. > Here's a larger part of the full error trace: > {noformat} > Failed to start database 'metastore_db' with class loader > org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@13015ec0, see > the next exception for details. > org.datanucleus.exceptions.NucleusDataStoreException: Failed to start > database 'metastore_db' with class loader > org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@13015ec0, see > the next exception for details. 
> at > org.datanucleus.store.rdbms.ConnectionFactoryImpl$ManagedConnectionImpl.getConnection(ConnectionFactoryImpl.java:516) > at > org.datanucleus.store.rdbms.RDBMSStoreManager.(RDBMSStoreManager.java:298) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:526) > at > org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631) > at > org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:301) > at > org.datanucleus.NucleusContext.createStoreManagerForProperties(NucleusContext.java:1187) > at org.datanucleus.NucleusContext.initialise(NucleusContext.java:356) > at > org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:775) > at > org.datanucleus.api.jdo.JDOPersistenceManagerFactory.createPersistenceManagerFactory(JDOPersistenceManagerFactory.java:333) > at > org.datanucleus.api.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:202) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at javax.jdo.JDOHelper$16.run(JDOHelper.java:1965) > at java.security.AccessController.doPrivileged(Native Method) > at javax.jdo.JDOHelper.invoke(JDOHelper.java:1960) > at > javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1166) > at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808) > at 
javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701) > at > org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:365) > at > org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:394) > at > org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:291) > at > org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:258) > at > org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73) > at > org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) > at > org.apache.hadoop.hive.metastore.RawStoreProxy.(RawStoreProxy.java:57) > at >
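A workaround pattern for the embedded-Derby restriction (only one live connection to ./metastore_db per JVM) is to create the contexts once and reuse them rather than recreating them after each stop. The sketch below is an assumption — `get_hive_context` is a hypothetical helper, not a pyspark API — and the factory functions are injected so the pattern runs without Spark installed:

```python
# Hypothetical helper: cache one (SparkContext, HiveContext) pair per process
# so the Derby-backed metastore is only ever booted once in this JVM.
_cached = {}

def get_hive_context(make_sc, make_hive):
    """Return the shared (SparkContext, HiveContext) pair, creating it once.

    make_sc:   zero-arg factory for a SparkContext
    make_hive: one-arg factory taking the SparkContext, returning a HiveContext
    """
    if "ctx" not in _cached:
        sc = make_sc()
        _cached["ctx"] = (sc, make_hive(sc))
    return _cached["ctx"]
```

With this pattern, repeated runs in the same IPython session reuse the first metastore connection instead of triggering XSDB6.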
[jira] [Created] (SPARK-12211) Incorrect version number in graphx doc for migration from 1.1
Andrew Ray created SPARK-12211:
-------------------------------
Summary: Incorrect version number in graphx doc for migration from 1.1
Key: SPARK-12211
URL: https://issues.apache.org/jira/browse/SPARK-12211
Project: Spark
Issue Type: Documentation
Components: Documentation, GraphX
Affects Versions: 1.5.2, 1.5.1, 1.5.0, 1.4.1, 1.4.0, 1.3.1, 1.3.0, 1.2.2, 1.2.1, 1.2.0, 1.6.0
Reporter: Andrew Ray
Priority: Minor

The "Migrating from Spark 1.1" section added to the GraphX doc in 1.2.0 (see https://spark.apache.org/docs/1.2.0/graphx-programming-guide.html#migrating-from-spark-11) uses {{site.SPARK_VERSION}} as the version where the changes were introduced; it should be just 1.2.
[jira] [Created] (SPARK-12189) UnifiedMemoryManager double counts storage memory freed
Andrew Or created SPARK-12189:
------------------------------
Summary: UnifiedMemoryManager double counts storage memory freed
Key: SPARK-12189
URL: https://issues.apache.org/jira/browse/SPARK-12189
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 1.6.0
Reporter: Andrew Or
Assignee: Josh Rosen
Priority: Blocker

When execution evicts storage, we decrement the storage memory in two places:
(1) https://github.com/apache/spark/blob/3e7e05f5ee763925ed60410d7de04cf36b723de1/core/src/main/scala/org/apache/spark/memory/StorageMemoryPool.scala#L131
(2) https://github.com/apache/spark/blob/3e7e05f5ee763925ed60410d7de04cf36b723de1/core/src/main/scala/org/apache/spark/memory/StorageMemoryPool.scala#L133
(1) calls MemoryStore#ensureFreeSpace, which internally calls MemoryManager#releaseStorageMemory for each block it drops. This call lowers the storage memory used by the block size.
A seemingly simple fix is just to remove the line in (2). However, this bug is actually masked by SPARK-12165, so this one must be fixed after that one. Josh actually has an outstanding patch to fix both: https://github.com/apache/spark/pull/10170
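The double count is easy to see in a simplified model. This is a hedged sketch with invented names, not the real StorageMemoryPool: eviction decrements usage once per dropped block, and the buggy caller then decrements the same total a second time.

```python
# Toy storage pool: eviction releases each dropped block's memory itself.
class StoragePool:
    def __init__(self):
        self.memory_used = 0
        self.blocks = {}

    def put(self, block_id, size):
        self.blocks[block_id] = size
        self.memory_used += size

    def release(self, size):              # releaseStorageMemory analogue
        self.memory_used -= size

    def ensure_free_space(self, needed):  # ensureFreeSpace analogue
        freed = 0
        while freed < needed and self.blocks:
            _, size = self.blocks.popitem()
            self.release(size)            # decrement #1, per evicted block
            freed += size
        return freed

def shrink_pool_buggy(pool, needed):
    freed = pool.ensure_free_space(needed)
    pool.release(freed)                   # decrement #2 -- the double count

pool = StoragePool()
pool.put("rdd_0_0", 40)
shrink_pool_buggy(pool, 40)
assert pool.memory_used == -40  # should be 0; usage went negative

def shrink_pool_fixed(pool, needed):
    pool.ensure_free_space(needed)        # eviction already decremented

pool2 = StoragePool()
pool2.put("rdd_0_1", 40)
shrink_pool_fixed(pool2, 40)
assert pool2.memory_used == 0
```

The fixed variant corresponds to "remove the line in (2)" from the description.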
[jira] [Created] (SPARK-12184) Make python api doc for pivot consistent with scala doc
Andrew Ray created SPARK-12184:
-------------------------------
Summary: Make python api doc for pivot consistent with scala doc
Key: SPARK-12184
URL: https://issues.apache.org/jira/browse/SPARK-12184
Project: Spark
Issue Type: Documentation
Components: PySpark
Affects Versions: 1.6.0
Reporter: Andrew Ray
Priority: Trivial

In SPARK-11946 the API for pivot was changed a bit and got updated doc; the doc changes were not made for the python api, though.
[jira] [Commented] (SPARK-4036) Add Conditional Random Fields (CRF) algorithm to Spark MLlib
[ https://issues.apache.org/jira/browse/SPARK-4036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15042462#comment-15042462 ] Andrew Philpot commented on SPARK-4036: --- Hi, what is the maturity of this code? Are you interested in a tester? I have existing models and features for CRF++. Would love to simply migrate them to a native spark implementation. > Add Conditional Random Fields (CRF) algorithm to Spark MLlib > > > Key: SPARK-4036 > URL: https://issues.apache.org/jira/browse/SPARK-4036 > Project: Spark > Issue Type: New Feature > Components: MLlib >Reporter: Guoqiang Li >Assignee: Kai Sasaki > Attachments: CRF_design.1.pdf > > > Conditional random fields (CRFs) are a class of statistical modelling method > often applied in pattern recognition and machine learning, where they are > used for structured prediction. > The paper: > http://www.seas.upenn.edu/~strctlrn/bib/PDF/crf.pdf -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11081) Make spark-core pull in Jersey and javax.ws.rs dependencies separately for easier overriding
[ https://issues.apache.org/jira/browse/SPARK-11081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15042048#comment-15042048 ] Andrew Ash commented on SPARK-11081: Agreed, Spark 2.0 is the main opportunity to bump as many dependency versions as possible. Some of these constraints are imposed on Spark by its transitive dependencies, not the direct ones though. But I totally support doing a dependency inspection during the 2.0 phase. > Make spark-core pull in Jersey and javax.ws.rs dependencies separately for > easier overriding > > > Key: SPARK-11081 > URL: https://issues.apache.org/jira/browse/SPARK-11081 > Project: Spark > Issue Type: Improvement > Components: Build, Spark Core >Reporter: Mingyu Kim > > As seen from this thread > (https://mail-archives.apache.org/mod_mbox/spark-user/201510.mbox/%3CCALte62yD8H3=2KVMiFs7NZjn929oJ133JkPLrNEj=vrx-d2...@mail.gmail.com%3E), > Spark is incompatible with Jersey 2 especially when Spark is embedded in an > application running with Jersey. > There was an in-depth discussion on options for shading and making it easier > for users to be able to use Jersey 2 with Spark applications: > https://github.com/apache/spark/pull/9615 > To recap the discussion, Jersey 1 has two issues: > 1. It has classes listed in META-INF/services/ files that would be loaded > even if Jersey 2 was being loaded on the classpath in a higher precedence. > This means that Jersey 2 would attempt to use Jersey 1 implementations in > some places regardless of user attempts to override the dependency with > things like userClassPathFirst. > 2. Jersey 1 packages javax.ws.rs classes inside itself, making it hard to > exclude just javax.ws.rs APIs and replace them with ones that Jersey 2 is > compatible with. > Also discussed was the fact that plain old shading doesn't work here, since > you would need to shade lines in META-INF/services as well, not just classes. > Not to mention that shading JAX-RS annotations is tricky as well. 
> To recap the discussion as what needs to happen Spark-side, we need to: > 1. Create a "org.spark-project.jersey" artifact (loosely speaking) which is > the Jersey 1 jar minus all the javax.ws.rs stuff (no need to actually > shade/namespace the classes that way, just the artifact name) > 2. Put all the javax.ws.rs stuff extracted from step 1 into its own artifact, > say "org.spark-project.javax.ws.rs". (META-INF/services/javax.ws.rs* files > live in this artifact as well) > 3. Spark-core's pom depends on org.spark-project artifacts from step 1 and 2 > 4. Spark assembly excludes META-INF/services/javax.ws.rs.* - it turns out > these files aren't actually necessary for Jersey 1 to function properly in > general (we need to test this more however) > Now a user that wants to depend on Jersey 2, and is depending on Spark maven > artifacts, would do the following in their application > 1. Provide my own dependency on Jersey 2 and its transitive javax.ws.rs > dependencies > 2. In my application's dependencies, exclude org.spark-project.javax.ws.rs > from spark-core. We keep org.spark-project.jersey because spark-core needs > it, but it will use the javax.ws.rs classes that my application is providing. > 3. Set spark.executor.userClassPathFirst=true and ship Jersey 2 and new > javax.ws.rs jars to the executors -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-12133) Support dynamic allocation in Spark Streaming
Andrew Or created SPARK-12133:
------------------------------
Summary: Support dynamic allocation in Spark Streaming
Key: SPARK-12133
URL: https://issues.apache.org/jira/browse/SPARK-12133
Project: Spark
Issue Type: Bug
Components: Spark Core, Streaming
Reporter: Andrew Or

Dynamic allocation is a feature that allows your cluster resources to scale up and down based on the workload. Currently it doesn't work well with Spark streaming for several reasons:
(1) Your executors may never be idle, since they run something every N seconds
(2) You should always have at least one receiver running
(3) The existing heuristics don't take into account the length of the batch queue
...
The goal of this JIRA is to provide better support for using dynamic allocation in streaming. The PR will revert the changes added in SPARK-10955, which warns against using dynamic allocation with streaming.
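For context, core (non-streaming) dynamic allocation is driven by a handful of spark.dynamicAllocation.* properties. The settings below are shown as a plain dict rather than a SparkConf so the snippet runs anywhere; the values are illustrative, and streaming-specific tuning is exactly what this JIRA proposes, so it is not shown:

```python
# Illustrative core dynamic-allocation settings (values are examples only).
dyn_alloc_conf = {
    "spark.dynamicAllocation.enabled": "true",
    # External shuffle service is required so executors can be removed
    # without losing their shuffle files.
    "spark.shuffle.service.enabled": "true",
    "spark.dynamicAllocation.minExecutors": "1",
    "spark.dynamicAllocation.maxExecutors": "10",
    # Reason (1) above: an executor that runs work every N seconds may
    # never stay idle this long, so it is never released in streaming jobs.
    "spark.dynamicAllocation.executorIdleTimeout": "60s",
}
```

In a real job these pairs would be passed to SparkConf.set or spark-submit --conf.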
[jira] [Resolved] (SPARK-12059) Standalone Master assertion error
[ https://issues.apache.org/jira/browse/SPARK-12059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-12059. --- > Standalone Master assertion error > - > > Key: SPARK-12059 > URL: https://issues.apache.org/jira/browse/SPARK-12059 > Project: Spark > Issue Type: Bug > Components: Deploy >Affects Versions: 1.6.0 >Reporter: Andrew Or >Assignee: Saisai Shao >Priority: Critical > Fix For: 1.6.0 > > > {code} > 15/11/30 09:55:04 ERROR Inbox: Ignoring error > java.lang.AssertionError: assertion failed: executor 4 state transfer from > RUNNING to RUNNING is illegal > at scala.Predef$.assert(Predef.scala:179) > at > org.apache.spark.deploy.master.Master$$anonfun$receive$1.applyOrElse(Master.scala:260) > at > org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116) > at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204) > at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100) > at > org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-12110) spark-1.5.1-bin-hadoop2.6; pyspark.ml.feature Exception: ("You must build Spark with Hive
[ https://issues.apache.org/jira/browse/SPARK-12110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Davidson updated SPARK-12110: Attachment: launchCluster.sh.out launchCluster.sh launchCluster.sh is a wrapper around spark-ec2 script launchCluster.sh is the output from when I ran this script on nov 5th 2015 > spark-1.5.1-bin-hadoop2.6; pyspark.ml.feature Exception: ("You must build > Spark with Hive > > > Key: SPARK-12110 > URL: https://issues.apache.org/jira/browse/SPARK-12110 > Project: Spark > Issue Type: Bug > Components: EC2 >Affects Versions: 1.5.1 > Environment: cluster created using > spark-1.5.1-bin-hadoop2.6/ec2/spark-ec2 >Reporter: Andrew Davidson > Attachments: launchCluster.sh, launchCluster.sh.out > > > I am using spark-1.5.1-bin-hadoop2.6. I used > spark-1.5.1-bin-hadoop2.6/ec2/spark-ec2 to create a cluster and configured > spark-env to use python3. I can not run the tokenizer sample code. Is there a > work around? > Kind regards > Andy > {code} > /root/spark/python/pyspark/sql/context.py in _ssql_ctx(self) > 658 raise Exception("You must build Spark with Hive. " > 659 "Export 'SPARK_HIVE=true' and run " > --> 660 "build/sbt assembly", e) > 661 > 662 def _get_hive_ctx(self): > Exception: ("You must build Spark with Hive. 
Export 'SPARK_HIVE=true' and run > build/sbt assembly", Py4JJavaError('An error occurred while calling > None.org.apache.spark.sql.hive.HiveContext.\n', JavaObject id=o38)) > http://spark.apache.org/docs/latest/ml-features.html#tokenizer > from pyspark.ml.feature import Tokenizer, RegexTokenizer > sentenceDataFrame = sqlContext.createDataFrame([ > (0, "Hi I heard about Spark"), > (1, "I wish Java could use case classes"), > (2, "Logistic,regression,models,are,neat") > ], ["label", "sentence"]) > tokenizer = Tokenizer(inputCol="sentence", outputCol="words") > wordsDataFrame = tokenizer.transform(sentenceDataFrame) > for words_label in wordsDataFrame.select("words", "label").take(3): > print(words_label) > --- > Py4JJavaError Traceback (most recent call last) > /root/spark/python/pyspark/sql/context.py in _ssql_ctx(self) > 654 if not hasattr(self, '_scala_HiveContext'): > --> 655 self._scala_HiveContext = self._get_hive_ctx() > 656 return self._scala_HiveContext > /root/spark/python/pyspark/sql/context.py in _get_hive_ctx(self) > 662 def _get_hive_ctx(self): > --> 663 return self._jvm.HiveContext(self._jsc.sc()) > 664 > /root/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py in > __call__(self, *args) > 700 return_value = get_return_value(answer, self._gateway_client, > None, > --> 701 self._fqn) > 702 > /root/spark/python/pyspark/sql/utils.py in deco(*a, **kw) > 35 try: > ---> 36 return f(*a, **kw) > 37 except py4j.protocol.Py4JJavaError as e: > /root/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py in > get_return_value(answer, gateway_client, target_id, name) > 299 'An error occurred while calling {0}{1}{2}.\n'. > --> 300 format(target_id, '.', name), value) > 301 else: > Py4JJavaError: An error occurred while calling > None.org.apache.spark.sql.hive.HiveContext. 
> : java.lang.RuntimeException: java.io.IOException: Filesystem closed > at > org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522) > at > org.apache.spark.sql.hive.client.ClientWrapper.(ClientWrapper.scala:171) > at > org.apache.spark.sql.hive.HiveContext.executionHive$lzycompute(HiveContext.scala:162) > at > org.apache.spark.sql.hive.HiveContext.executionHive(HiveContext.scala:160) > at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:167) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:422) > at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234) > at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379) > at py4j.Gateway.invoke(Gateway.java:214) > at > py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79) > at
[jira] [Commented] (SPARK-12110) spark-1.5.1-bin-hadoop2.6; pyspark.ml.feature Exception: ("You must build Spark with Hive
[ https://issues.apache.org/jira/browse/SPARK-12110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038574#comment-15038574 ]

Andrew Davidson commented on SPARK-12110:
-----------------------------------------
Hi Davies, attached is a script I wrote to launch the cluster, along with the output it produced when I ran it on nov 5th, 2015. I also included a file, launchingSparkCluster.md, with directions for how to configure the cluster to use java 8, python3, ...

On a related note, I would like to update my cluster to 1.5.2. I have not been able to find any directions for how to do this.

Kind regards, Andy
[jira] [Updated] (SPARK-12133) Support dynamic allocation in Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-12133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-12133:
------------------------------
Target Version/s: 2.0.0
[jira] [Commented] (SPARK-12111) need upgrade instruction
[ https://issues.apache.org/jira/browse/SPARK-12111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038563#comment-15038563 ]

Andrew Davidson commented on SPARK-12111:
-----------------------------------------
Hi Sean, I am unable to find instructions for upgrading existing installations. Can you point me at the documentation for upgrading? I built the cluster using the spark-ec2 script.

Kind Regards, Andy

> need upgrade instruction
> ------------------------
>
> Key: SPARK-12111
> URL: https://issues.apache.org/jira/browse/SPARK-12111
> Project: Spark
> Issue Type: Documentation
> Components: EC2
> Affects Versions: 1.5.1
> Reporter: Andrew Davidson
> Labels: build, documentation
>
> I have looked all over the spark website and googled. I have not found instructions for how to upgrade spark in general, let alone a cluster created by using the spark-ec2 script. Thanks.
[jira] [Updated] (SPARK-12110) spark-1.5.1-bin-hadoop2.6; pyspark.ml.feature Exception: ("You must build Spark with Hive
[ https://issues.apache.org/jira/browse/SPARK-12110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Davidson updated SPARK-12110:
------------------------------------
Attachment: launchingSparkCluster.md
[jira] [Updated] (SPARK-12133) Support dynamic allocation in Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-12133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-12133: -- Attachment: dynamic-allocation-streaming-design.pdf > Support dynamic allocation in Spark Streaming > - > > Key: SPARK-12133 > URL: https://issues.apache.org/jira/browse/SPARK-12133 > Project: Spark > Issue Type: Bug > Components: Spark Core, Streaming >Reporter: Andrew Or >Assignee: Tathagata Das > Attachments: dynamic-allocation-streaming-design.pdf > > > Dynamic allocation is a feature that allows your cluster resources to scale > up and down based on the workload. Currently it doesn't work well with Spark > streaming because of several reasons: > (1) Your executors may never be idle since they run something every N seconds > (2) You should have at least one receiver running always > (3) The existing heuristics don't take into account length of batch queue > ... > The goal of this JIRA is to provide better support for using dynamic > allocation in streaming. A design doc will be posted shortly. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-12133) Support dynamic allocation in Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-12133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-12133: -- Description: Dynamic allocation is a feature that allows your cluster resources to scale up and down based on the workload. Currently it doesn't work well with Spark streaming because of several reasons: (1) Your executors may never be idle since they run something every N seconds (2) You should have at least one receiver running always (3) The existing heuristics don't take into account length of batch queue ... The goal of this JIRA is to provide better support for using dynamic allocation in streaming. A design doc will be posted shortly. was: Dynamic allocation is a feature that allows your cluster resources to scale up and down based on the workload. Currently it doesn't work well with Spark streaming because of several reasons: (1) Your executors may never be idle since they run something every N seconds (2) You should have at least one receiver running always (3) The existing heuristics don't take into account length of batch queue ... The goal of this JIRA is to provide better support for using dynamic allocation in streaming. > Support dynamic allocation in Spark Streaming > - > > Key: SPARK-12133 > URL: https://issues.apache.org/jira/browse/SPARK-12133 > Project: Spark > Issue Type: Bug > Components: Spark Core, Streaming >Reporter: Andrew Or > > Dynamic allocation is a feature that allows your cluster resources to scale > up and down based on the workload. Currently it doesn't work well with Spark > streaming because of several reasons: > (1) Your executors may never be idle since they run something every N seconds > (2) You should have at least one receiver running always > (3) The existing heuristics don't take into account length of batch queue > ... > The goal of this JIRA is to provide better support for using dynamic > allocation in streaming. A design doc will be posted shortly. 
[jira] [Updated] (SPARK-12133) Support dynamic allocation in Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-12133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-12133: -- Assignee: Tathagata Das > Support dynamic allocation in Spark Streaming > - > > Key: SPARK-12133 > URL: https://issues.apache.org/jira/browse/SPARK-12133 > Project: Spark > Issue Type: Bug > Components: Spark Core, Streaming >Reporter: Andrew Or >Assignee: Tathagata Das > > Dynamic allocation is a feature that allows your cluster resources to scale > up and down based on the workload. Currently it doesn't work well with Spark > streaming because of several reasons: > (1) Your executors may never be idle since they run something every N seconds > (2) You should have at least one receiver running always > (3) The existing heuristics don't take into account length of batch queue > ... > The goal of this JIRA is to provide better support for using dynamic > allocation in streaming. A design doc will be posted shortly.
[jira] [Updated] (SPARK-12133) Support dynamic allocation in Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-12133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-12133: -- Description: Dynamic allocation is a feature that allows your cluster resources to scale up and down based on the workload. Currently it doesn't work well with Spark streaming because of several reasons: (1) Your executors may never be idle since they run something every N seconds (2) You should have at least one receiver running always (3) The existing heuristics don't take into account length of batch queue ... The goal of this JIRA is to provide better support for using dynamic allocation in streaming. was: Dynamic allocation is a feature that allows your cluster resources to scale up and down based on the workload. Currently it doesn't work well with Spark streaming because of several reasons: (1) Your executors may never be idle since they run something every N seconds (2) You should have at least one receiver running always (3) The existing heuristics don't take into account length of batch queue ... The goal of this JIRA is to provide better support for using dynamic allocation in streaming. The PR will revert the changes added in SPARK-10955, which warns against using dynamic allocation with streaming. > Support dynamic allocation in Spark Streaming > - > > Key: SPARK-12133 > URL: https://issues.apache.org/jira/browse/SPARK-12133 > Project: Spark > Issue Type: Bug > Components: Spark Core, Streaming >Reporter: Andrew Or > > Dynamic allocation is a feature that allows your cluster resources to scale > up and down based on the workload. Currently it doesn't work well with Spark > streaming because of several reasons: > (1) Your executors may never be idle since they run something every N seconds > (2) You should have at least one receiver running always > (3) The existing heuristics don't take into account length of batch queue > ... 
> The goal of this JIRA is to provide better support for using dynamic > allocation in streaming.
[jira] [Created] (SPARK-12100) bug in spark/python/pyspark/rdd.py portable_hash()
Andrew Davidson created SPARK-12100: --- Summary: bug in spark/python/pyspark/rdd.py portable_hash() Key: SPARK-12100 URL: https://issues.apache.org/jira/browse/SPARK-12100 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 1.5.1 Reporter: Andrew Davidson Priority: Minor

I am using spark-1.5.1-bin-hadoop2.6. I used spark-1.5.1-bin-hadoop2.6/ec2/spark-ec2 to create a cluster and configured spark-env to use python3. I get an exception 'Randomness of hash of string should be disabled via PYTHONHASHSEED'. Is there any reason rdd.py should not just set PYTHONHASHSEED? Should I file a bug?

Kind regards, Andy

Details: http://spark.apache.org/docs/latest/api/python/pyspark.html?highlight=subtract#pyspark.RDD.subtract

The example from the documentation does not work out of the box:

subtract(other, numPartitions=None)
Return each value in self that is not contained in other.
>>> x = sc.parallelize([("a", 1), ("b", 4), ("b", 5), ("a", 3)])
>>> y = sc.parallelize([("a", 3), ("c", None)])
>>> sorted(x.subtract(y).collect())
[('a', 1), ('b', 4), ('b', 5)]

It raises here:

if sys.version >= '3.3' and 'PYTHONHASHSEED' not in os.environ:
    raise Exception("Randomness of hash of string should be disabled via PYTHONHASHSEED")

The following commands work around the problem:

sudo printf "\n# set PYTHONHASHSEED so python3 will not raise 'Randomness of hash of string should be disabled via PYTHONHASHSEED'\nexport PYTHONHASHSEED=123\n" >> /root/spark/conf/spark-env.sh
sudo pssh -i -h /root/spark-ec2/slaves cp /root/spark/conf/spark-env.sh /root/spark/conf/spark-env.sh-`date "+%Y-%m-%d:%H:%M"`
for i in `cat slaves`; do scp spark-env.sh root@$i:/root/spark/conf/spark-env.sh; done

This is how I am starting Spark:

export PYSPARK_PYTHON=python3.4
export PYSPARK_DRIVER_PYTHON=python3.4
export IPYTHON_OPTS="notebook --no-browser --port=7000 --log-level=WARN"
$SPARK_ROOT/bin/pyspark \
  --master $MASTER_URL \
  --total-executor-cores $numCores \
  --driver-memory 2G \
  --executor-memory 2G \
  $extraPkgs \
  $*

See the email thread "possible bug spark/python/pyspark/rdd.py portable_hash()" on user@spark for more info.
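The check exists because Python 3.3+ randomizes string hashing per interpreter process, so two executors could route the same key to different partitions unless every interpreter is started with the same seed. A minimal sketch of the effect (the helper name is ours, not PySpark's), hashing the same string in fresh child interpreters:

```python
import os
import subprocess
import sys

# Python 3.3+ randomizes str hashing per interpreter process, so two Spark
# executors can disagree on hash("some key") and send identical keys to
# different partitions -- hence PySpark's PYTHONHASHSEED check.
def hash_in_fresh_interpreter(seed):
    env = dict(os.environ)
    env["PYTHONHASHSEED"] = seed  # pin the seed for the child interpreter
    result = subprocess.run(
        [sys.executable, "-c", "print(hash('spark'))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())

# With a fixed seed, every fresh interpreter computes the same hash, which is
# exactly what the spark-env.sh workaround above enforces cluster-wide.
assert hash_in_fresh_interpreter("123") == hash_in_fresh_interpreter("123")
```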
[jira] [Created] (SPARK-12110) spark-1.5.1-bin-hadoop2.6; pyspark.ml.feature Exception: ("You must build Spark with Hive
Andrew Davidson created SPARK-12110: --- Summary: spark-1.5.1-bin-hadoop2.6; pyspark.ml.feature Exception: ("You must build Spark with Hive Key: SPARK-12110 URL: https://issues.apache.org/jira/browse/SPARK-12110 Project: Spark Issue Type: Bug Components: ML, PySpark, SQL Affects Versions: 1.5.1 Environment: cluster created using spark-1.5.1-bin-hadoop2.6/ec2/spark-ec2 Reporter: Andrew Davidson I am using spark-1.5.1-bin-hadoop2.6. I used spark-1.5.1-bin-hadoop2.6/ec2/spark-ec2 to create a cluster and configured spark-env to use python3. I can not run the tokenizer sample code. Is there a work around? Kind regards Andy /root/spark/python/pyspark/sql/context.py in _ssql_ctx(self) 658 raise Exception("You must build Spark with Hive. " 659 "Export 'SPARK_HIVE=true' and run " --> 660 "build/sbt assembly", e) 661 662 def _get_hive_ctx(self): Exception: ("You must build Spark with Hive. Export 'SPARK_HIVE=true' and run build/sbt assembly", Py4JJavaError('An error occurred while calling None.org.apache.spark.sql.hive.HiveContext.\n', JavaObject id=o38)) http://spark.apache.org/docs/latest/ml-features.html#tokenizer from pyspark.ml.feature import Tokenizer, RegexTokenizer sentenceDataFrame = sqlContext.createDataFrame([ (0, "Hi I heard about Spark"), (1, "I wish Java could use case classes"), (2, "Logistic,regression,models,are,neat") ], ["label", "sentence"]) tokenizer = Tokenizer(inputCol="sentence", outputCol="words") wordsDataFrame = tokenizer.transform(sentenceDataFrame) for words_label in wordsDataFrame.select("words", "label").take(3): print(words_label) --- Py4JJavaError Traceback (most recent call last) /root/spark/python/pyspark/sql/context.py in _ssql_ctx(self) 654 if not hasattr(self, '_scala_HiveContext'): --> 655 self._scala_HiveContext = self._get_hive_ctx() 656 return self._scala_HiveContext /root/spark/python/pyspark/sql/context.py in _get_hive_ctx(self) 662 def _get_hive_ctx(self): --> 663 return self._jvm.HiveContext(self._jsc.sc()) 664 
/root/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py in __call__(self, *args) 700 return_value = get_return_value(answer, self._gateway_client, None, --> 701 self._fqn) 702 /root/spark/python/pyspark/sql/utils.py in deco(*a, **kw) 35 try: ---> 36 return f(*a, **kw) 37 except py4j.protocol.Py4JJavaError as e: /root/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name) 299 'An error occurred while calling {0}{1}{2}.\n'. --> 300 format(target_id, '.', name), value) 301 else: Py4JJavaError: An error occurred while calling None.org.apache.spark.sql.hive.HiveContext. : java.lang.RuntimeException: java.io.IOException: Filesystem closed at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522) at org.apache.spark.sql.hive.client.ClientWrapper.(ClientWrapper.scala:171) at org.apache.spark.sql.hive.HiveContext.executionHive$lzycompute(HiveContext.scala:162) at org.apache.spark.sql.hive.HiveContext.executionHive(HiveContext.scala:160) at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:167) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:422) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379) at py4j.Gateway.invoke(Gateway.java:214) at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79) at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68) at py4j.GatewayConnection.run(GatewayConnection.java:207) at java.lang.Thread.run(Thread.java:745) Caused by: java.io.IOException: Filesystem closed at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:323) at 
org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1057) at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:554) at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:599) at
[jira] [Commented] (SPARK-12111) need upgrade instruction
[ https://issues.apache.org/jira/browse/SPARK-12111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036888#comment-15036888 ] Andrew Davidson commented on SPARK-12111: - This is where someone who knows the details of how Spark gets built and installed needs to provide some directions. > need upgrade instruction > > > Key: SPARK-12111 > URL: https://issues.apache.org/jira/browse/SPARK-12111 > Project: Spark > Issue Type: Documentation > Components: EC2 >Affects Versions: 1.5.1 >Reporter: Andrew Davidson > Labels: build, documentation > > I have looked all over the spark website and googled. I have not found > instructions for how to upgrade spark in general let alone a cluster created > by using spark-ec2 script > thanks.
[jira] [Commented] (SPARK-12111) need upgrade instruction
[ https://issues.apache.org/jira/browse/SPARK-12111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036887#comment-15036887 ] Andrew Davidson commented on SPARK-12111: - Hi Sean I understand I will need to stop my cluster to change to a different version. I am looking for directions for how to "change to a different version". E.g. on my local Mac I have several different versions of Spark downloaded. I have an env var SPARK_ROOT=pathToVersion I want to use. To use something like pyspark I would run $ $SPARK_ROOT/bin/pyspark. I am looking for directions for how to do something similar in a cluster env. I think the rough steps would be: 1) stop the cluster 2) download the binary. Is the binary the same on all the machines (i.e. masters and slaves)? 3) I am not sure what to do about the config/* > need upgrade instruction > > > Key: SPARK-12111 > URL: https://issues.apache.org/jira/browse/SPARK-12111 > Project: Spark > Issue Type: Documentation > Components: EC2 >Affects Versions: 1.5.1 >Reporter: Andrew Davidson > Labels: build, documentation > > I have looked all over the spark website and googled. I have not found > instructions for how to upgrade spark in general let alone a cluster created > by using spark-ec2 script > thanks.
[jira] [Reopened] (SPARK-12111) need upgrade instruction
[ https://issues.apache.org/jira/browse/SPARK-12111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Davidson reopened SPARK-12111: - Hi Sean It must be possible for customers to upgrade installations. Given Spark is written in Java, it is probably a matter of replacing jar files and maybe making a few changes to config files. Whoever is responsible for build/release of Spark can probably write down the instructions. It's not reasonable to say destroy your old cluster and re-install it. In my experience Spark does not work out of the box. You have to do a lot of work to configure it properly. I have a lot of data on HDFS; I can not simply move it. Sincerely yours, Andy > need upgrade instruction > > > Key: SPARK-12111 > URL: https://issues.apache.org/jira/browse/SPARK-12111 > Project: Spark > Issue Type: Documentation > Components: EC2 >Affects Versions: 1.5.1 >Reporter: Andrew Davidson > Labels: build, documentation > > I have looked all over the spark website and googled. I have not found > instructions for how to upgrade spark in general let alone a cluster created > by using spark-ec2 script > thanks.
[jira] [Created] (SPARK-12108) Event logs are much bigger in 1.6 than in 1.5
Andrew Or created SPARK-12108: - Summary: Event logs are much bigger in 1.6 than in 1.5 Key: SPARK-12108 URL: https://issues.apache.org/jira/browse/SPARK-12108 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.6.0 Reporter: Andrew Or Assignee: Andrew Or From running PageRank, the event log in 1.5 is 1.3GB uncompressed, but in 1.6 it's 6GB! From a preliminary bisect, this commit is suspect: https://github.com/apache/spark/commit/42d933fbba0584b39bd8218eafc44fb03aeb157d
[jira] [Created] (SPARK-12111) need upgrade instruction
Andrew Davidson created SPARK-12111: --- Summary: need upgrade instruction Key: SPARK-12111 URL: https://issues.apache.org/jira/browse/SPARK-12111 Project: Spark Issue Type: Documentation Components: EC2 Affects Versions: 1.5.1 Reporter: Andrew Davidson I have looked all over the Spark website and googled. I have not found instructions for how to upgrade Spark in general, let alone a cluster created by using the spark-ec2 script. Thanks.
[jira] [Commented] (SPARK-12110) spark-1.5.1-bin-hadoop2.6; pyspark.ml.feature Exception: ("You must build Spark with Hive
[ https://issues.apache.org/jira/browse/SPARK-12110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037017#comment-15037017 ] Andrew Davidson commented on SPARK-12110: - Hi Patrick Here is how I start my notebook on my cluster. $ cat ../bin/startIPythonNotebook.sh export SPARK_ROOT=/root/spark export MASTER_URL=spark://ec2-54-215-207-132.us-west-1.compute.amazonaws.com:7077 export PYSPARK_PYTHON=python3.4 export PYSPARK_DRIVER_PYTHON=python3.4 export IPYTHON_OPTS="notebook --no-browser --port=7000 --log-level=WARN" extraPkgs='--packages com.databricks:spark-csv_2.11:1.3.0' numCores=3 # one for driver 2 for workers $SPARK_ROOT/bin/pyspark \ --master $MASTER_URL \ --total-executor-cores $numCores \ --driver-memory 2G \ --executor-memory 2G \ $extraPkgs \ $* > spark-1.5.1-bin-hadoop2.6; pyspark.ml.feature Exception: ("You must build > Spark with Hive > > > Key: SPARK-12110 > URL: https://issues.apache.org/jira/browse/SPARK-12110 > Project: Spark > Issue Type: Bug > Components: EC2 >Affects Versions: 1.5.1 > Environment: cluster created using > spark-1.5.1-bin-hadoop2.6/ec2/spark-ec2 >Reporter: Andrew Davidson > > I am using spark-1.5.1-bin-hadoop2.6. I used > spark-1.5.1-bin-hadoop2.6/ec2/spark-ec2 to create a cluster and configured > spark-env to use python3. I can not run the tokenizer sample code. Is there a > work around? > Kind regards > Andy > {code} > /root/spark/python/pyspark/sql/context.py in _ssql_ctx(self) > 658 raise Exception("You must build Spark with Hive. " > 659 "Export 'SPARK_HIVE=true' and run " > --> 660 "build/sbt assembly", e) > 661 > 662 def _get_hive_ctx(self): > Exception: ("You must build Spark with Hive. 
Export 'SPARK_HIVE=true' and run > build/sbt assembly", Py4JJavaError('An error occurred while calling > None.org.apache.spark.sql.hive.HiveContext.\n', JavaObject id=o38)) > http://spark.apache.org/docs/latest/ml-features.html#tokenizer > from pyspark.ml.feature import Tokenizer, RegexTokenizer > sentenceDataFrame = sqlContext.createDataFrame([ > (0, "Hi I heard about Spark"), > (1, "I wish Java could use case classes"), > (2, "Logistic,regression,models,are,neat") > ], ["label", "sentence"]) > tokenizer = Tokenizer(inputCol="sentence", outputCol="words") > wordsDataFrame = tokenizer.transform(sentenceDataFrame) > for words_label in wordsDataFrame.select("words", "label").take(3): > print(words_label) > --- > Py4JJavaError Traceback (most recent call last) > /root/spark/python/pyspark/sql/context.py in _ssql_ctx(self) > 654 if not hasattr(self, '_scala_HiveContext'): > --> 655 self._scala_HiveContext = self._get_hive_ctx() > 656 return self._scala_HiveContext > /root/spark/python/pyspark/sql/context.py in _get_hive_ctx(self) > 662 def _get_hive_ctx(self): > --> 663 return self._jvm.HiveContext(self._jsc.sc()) > 664 > /root/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py in > __call__(self, *args) > 700 return_value = get_return_value(answer, self._gateway_client, > None, > --> 701 self._fqn) > 702 > /root/spark/python/pyspark/sql/utils.py in deco(*a, **kw) > 35 try: > ---> 36 return f(*a, **kw) > 37 except py4j.protocol.Py4JJavaError as e: > /root/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py in > get_return_value(answer, gateway_client, target_id, name) > 299 'An error occurred while calling {0}{1}{2}.\n'. > --> 300 format(target_id, '.', name), value) > 301 else: > Py4JJavaError: An error occurred while calling > None.org.apache.spark.sql.hive.HiveContext. 
> : java.lang.RuntimeException: java.io.IOException: Filesystem closed > at > org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522) > at > org.apache.spark.sql.hive.client.ClientWrapper.(ClientWrapper.scala:171) > at > org.apache.spark.sql.hive.HiveContext.executionHive$lzycompute(HiveContext.scala:162) > at > org.apache.spark.sql.hive.HiveContext.executionHive(HiveContext.scala:160) > at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:167) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at
[jira] [Commented] (SPARK-12110) spark-1.5.1-bin-hadoop2.6; pyspark.ml.feature Exception: ("You must build Spark with Hive
[ https://issues.apache.org/jira/browse/SPARK-12110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037022#comment-15037022 ] Andrew Davidson commented on SPARK-12110: - Hi Patrick when I run the same example code on my local MacBook Pro, it runs fine. I am a newbie. Is the spark-ec2 script deprecated? I noticed on my cluster [ec2-user@ip-172-31-29-60 notebooks]$ cat /root/spark/RELEASE Spark 1.5.1 built for Hadoop 1.2.1 Build flags: -Psparkr -Phadoop-1 -Phive -Phive-thriftserver -DzincPort=3030 [ec2-user@ip-172-31-29-60 notebooks]$ on my local mac $ cat ./spark-1.5.1-bin-hadoop2.6/RELEASE Spark 1.5.1 built for Hadoop 2.6.0 Build flags: -Psparkr -Phadoop-2.6 -Phive -Phive-thriftserver -Pyarn -DzincPort=3034 $ It looks like the ./spark-1.5.1-bin-hadoop2.6/ec2/spark-ec2 may have installed the wrong version of Spark? > spark-1.5.1-bin-hadoop2.6; pyspark.ml.feature Exception: ("You must build > Spark with Hive > > > Key: SPARK-12110 > URL: https://issues.apache.org/jira/browse/SPARK-12110 > Project: Spark > Issue Type: Bug > Components: EC2 >Affects Versions: 1.5.1 > Environment: cluster created using > spark-1.5.1-bin-hadoop2.6/ec2/spark-ec2 >Reporter: Andrew Davidson > > I am using spark-1.5.1-bin-hadoop2.6. I used > spark-1.5.1-bin-hadoop2.6/ec2/spark-ec2 to create a cluster and configured > spark-env to use python3. I can not run the tokenizer sample code. Is there a > work around? > Kind regards > Andy > {code} > /root/spark/python/pyspark/sql/context.py in _ssql_ctx(self) > 658 raise Exception("You must build Spark with Hive. " > 659 "Export 'SPARK_HIVE=true' and run " > --> 660 "build/sbt assembly", e) > 661 > 662 def _get_hive_ctx(self): > Exception: ("You must build Spark with Hive. 
Export 'SPARK_HIVE=true' and run > build/sbt assembly", Py4JJavaError('An error occurred while calling > None.org.apache.spark.sql.hive.HiveContext.\n', JavaObject id=o38)) > http://spark.apache.org/docs/latest/ml-features.html#tokenizer > from pyspark.ml.feature import Tokenizer, RegexTokenizer > sentenceDataFrame = sqlContext.createDataFrame([ > (0, "Hi I heard about Spark"), > (1, "I wish Java could use case classes"), > (2, "Logistic,regression,models,are,neat") > ], ["label", "sentence"]) > tokenizer = Tokenizer(inputCol="sentence", outputCol="words") > wordsDataFrame = tokenizer.transform(sentenceDataFrame) > for words_label in wordsDataFrame.select("words", "label").take(3): > print(words_label) > --- > Py4JJavaError Traceback (most recent call last) > /root/spark/python/pyspark/sql/context.py in _ssql_ctx(self) > 654 if not hasattr(self, '_scala_HiveContext'): > --> 655 self._scala_HiveContext = self._get_hive_ctx() > 656 return self._scala_HiveContext > /root/spark/python/pyspark/sql/context.py in _get_hive_ctx(self) > 662 def _get_hive_ctx(self): > --> 663 return self._jvm.HiveContext(self._jsc.sc()) > 664 > /root/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py in > __call__(self, *args) > 700 return_value = get_return_value(answer, self._gateway_client, > None, > --> 701 self._fqn) > 702 > /root/spark/python/pyspark/sql/utils.py in deco(*a, **kw) > 35 try: > ---> 36 return f(*a, **kw) > 37 except py4j.protocol.Py4JJavaError as e: > /root/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py in > get_return_value(answer, gateway_client, target_id, name) > 299 'An error occurred while calling {0}{1}{2}.\n'. > --> 300 format(target_id, '.', name), value) > 301 else: > Py4JJavaError: An error occurred while calling > None.org.apache.spark.sql.hive.HiveContext. 
> : java.lang.RuntimeException: java.io.IOException: Filesystem closed > at > org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522) > at > org.apache.spark.sql.hive.client.ClientWrapper.(ClientWrapper.scala:171) > at > org.apache.spark.sql.hive.HiveContext.executionHive$lzycompute(HiveContext.scala:162) > at > org.apache.spark.sql.hive.HiveContext.executionHive(HiveContext.scala:160) > at org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:167) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at >
[jira] [Created] (SPARK-12081) Make unified memory management work with small heaps
Andrew Or created SPARK-12081: - Summary: Make unified memory management work with small heaps Key: SPARK-12081 URL: https://issues.apache.org/jira/browse/SPARK-12081 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.6.0 Reporter: Andrew Or Assignee: Andrew Or Priority: Critical By default, Spark drivers and executors are 1GB. With the recent unified memory mode, only 250MB is set aside for non-storage non-execution purposes (spark.memory.fraction is 75%). However, especially in local mode, the driver needs at least ~300MB. Some local jobs started to OOM because of this. Two mutually exclusive proposals: (1) First, cut out 300 MB, then take 75% of what remains (2) Use min(75% of JVM heap size, JVM heap size - 300MB)
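The two proposals above can be compared with a small worked sketch. The function names and the example heap size are ours; the 300 MB floor and the 75% spark.memory.fraction come from the issue text:

```python
# Sketch of the two mutually exclusive proposals in SPARK-12081.
# RESERVED_MB and MEMORY_FRACTION mirror the numbers quoted in the issue;
# the function names are ours, not Spark's.
RESERVED_MB = 300
MEMORY_FRACTION = 0.75

def usable_memory_proposal_1(heap_mb):
    # (1) First cut out 300 MB, then take 75% of what remains.
    return (heap_mb - RESERVED_MB) * MEMORY_FRACTION

def usable_memory_proposal_2(heap_mb):
    # (2) min(75% of JVM heap size, JVM heap size - 300 MB).
    return min(heap_mb * MEMORY_FRACTION, heap_mb - RESERVED_MB)

# With the default 1 GB heap, both proposals leave at least ~300 MB for
# non-storage, non-execution purposes, avoiding the OOMs described above:
print(usable_memory_proposal_1(1024))  # 543.0
print(usable_memory_proposal_2(1024))  # 724
```

Proposal (1) is the more conservative of the two: for the same heap it always hands less memory to storage and execution, since the 75% fraction is applied after the 300 MB reservation rather than to the whole heap.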
[jira] [Commented] (SPARK-12059) Standalone Master assertion error
[ https://issues.apache.org/jira/browse/SPARK-12059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15035060#comment-15035060 ] Andrew Or commented on SPARK-12059: --- I think this only happens under error conditions, but it's still bad to see an assertion failure. I'm not sure exactly how it happened but I believe it's a side effect of some other error. > Standalone Master assertion error > - > > Key: SPARK-12059 > URL: https://issues.apache.org/jira/browse/SPARK-12059 > Project: Spark > Issue Type: Bug > Components: Deploy >Affects Versions: 1.6.0 >Reporter: Andrew Or >Assignee: Saisai Shao >Priority: Critical > > {code} > 15/11/30 09:55:04 ERROR Inbox: Ignoring error > java.lang.AssertionError: assertion failed: executor 4 state transfer from > RUNNING to RUNNING is illegal > at scala.Predef$.assert(Predef.scala:179) > at > org.apache.spark.deploy.master.Master$$anonfun$receive$1.applyOrElse(Master.scala:260) > at > org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116) > at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204) > at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100) > at > org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215) > {code}
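The assertion in the quoted log fires because the Master rejects a "transfer" into the state the executor already occupies. A hypothetical Python sketch of that kind of guard (the state names mirror Spark's ExecutorState, but the transition table here is invented for illustration, not copied from Spark):

```python
# Hypothetical sketch of a Master-side state-transition guard.
# An executor may not "transfer" into the state it is already in, which is
# exactly the RUNNING -> RUNNING case reported in the log above.
LEGAL_NEXT = {
    "LAUNCHING": {"RUNNING", "FAILED", "KILLED", "LOST", "EXITED"},
    "RUNNING": {"FAILED", "KILLED", "LOST", "EXITED"},
}

def check_transition(old_state, new_state):
    # Raises the same style of error seen in the report.
    assert new_state in LEGAL_NEXT.get(old_state, set()), (
        "executor state transfer from %s to %s is illegal" % (old_state, new_state))

check_transition("LAUNCHING", "RUNNING")  # legal, no error
try:
    check_transition("RUNNING", "RUNNING")  # reproduces the reported failure
except AssertionError as e:
    print(e)
```

A duplicate RUNNING notification (e.g. a re-delivered RPC message after a reconnect) would trip such a guard, which is consistent with the comment that the assertion is a side effect of some other error.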