[jira] [Comment Edited] (SPARK-15544) Bouncing Zookeeper node causes Active spark master to exit

2018-04-30 Thread Ashwin Agate (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1645#comment-1645
 ] 

Ashwin Agate edited comment on SPARK-15544 at 4/30/18 7:05 PM:
---

 
{code:java}
 Can we please increase the priority of this bug since it exists in latest 
Spark 2.3.0 too?  We have observed this during upgrade scenario (with Spark 
1.6.3), where we have to shutdown zookeeper, which has the adverse side-effect 
of spark master shutting down on other nodes which is not very ideal.

 Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: 
spark-box1:40588 got disassociated, removing it.
Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: 
spark-box1:7078 got disassociated, removing it.
Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: Removing 
worker worker-20180427105900-spark-box1-7078 on 19
Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: Telling 
app of lost executor: 2
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO ClientCnxn: 
Unable to read additional data from server sessionid 0x1630
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO ClientCnxn: 
Unable to read additional data from server sessionid 0x1630
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO 
ConnectionStateManager: State change: SUSPENDED
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO 
ZooKeeperLeaderElectionAgent: We have lost leadership
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 ERROR Master: 
Leadership has been revoked – master shutting down.{code}


was (Author: agateaaa):
Can we please increase the priority of this bug since it exists in latest Spark 
2.3.0 too?  We have observed this during upgrade scenario (with Spark 1.6.3), 
where we have to shutdown zookeeper, which has the adverse side-effect of spark 
master shutting down on other nodes which is not very ideal.

 Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: 
spark-box1:40588 got disassociated, removing it.
Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: 
spark-box1:7078 got disassociated, removing it.
Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: Removing 
worker worker-20180427105900-spark-box1-7078 on 19
Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: Telling 
app of lost executor: 2
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO ClientCnxn: 
Unable to read additional data from server sessionid 0x1630
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO ClientCnxn: 
Unable to read additional data from server sessionid 0x1630
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO 
ConnectionStateManager: State change: SUSPENDED
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO 
ZooKeeperLeaderElectionAgent: We have lost leadership
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 ERROR Master: 
Leadership has been revoked -- master shutting down.
{code:java}
 {code}

> Bouncing Zookeeper node causes Active spark master to exit
> --
>
> Key: SPARK-15544
> URL: https://issues.apache.org/jira/browse/SPARK-15544
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.1
> Environment: Ubuntu 14.04.  Zookeeper 3.4.6 with 3-node quorum
>Reporter: Steven Lowenthal
>Priority: Major
>
> Shutting Down a single zookeeper node caused spark master to exit.  The 
> master should have connected to a second zookeeper node. 
> {code:title=log output}
> 16/05/25 18:21:28 INFO master.Master: Launching executor 
> app-20160525182128-0006/1 on worker worker-20160524013212-10.16.28.76-59138
> 16/05/25 18:21:28 INFO master.Master: Launching executor 
> app-20160525182128-0006/2 on worker worker-20160524013204-10.16.21.217-47129
> 16/05/26 00:16:01 INFO zookeeper.ClientCnxn: Unable to read additional data 
> from server sessionid 0x154dfc0426b0054, likely server has closed socket, 
> closing socket connection and attempting reconnect
> 16/05/26 00:16:01 INFO zookeeper.ClientCnxn: Unable to read additional data 
> from server sessionid 0x254c701f28d0053, likely server has closed socket, 
> closing socket connection and attempting reconnect
> 16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED
> 16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED
> 16/05/26 00:16:01 INFO master.ZooKeeperLeaderElectionAgent: We have lost 
> leadership
> 16/05/26 00:16:01 ERROR master.Master: Leadership has been revoked -- master 
> shutting down. }}
> {code}
> spark-env.sh: 
> {code:title=spark-env.sh}
> export SPARK_LOCAL_DIRS=/ephemeral/spark/local
> export 

[jira] [Comment Edited] (SPARK-15544) Bouncing Zookeeper node causes Active spark master to exit

2018-04-30 Thread Ashwin Agate (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1645#comment-1645
 ] 

Ashwin Agate edited comment on SPARK-15544 at 4/30/18 7:05 PM:
---

  Can we please increase the priority of this bug since it exists in latest 
Spark 2.3.0 too?  We have observed this during upgrade scenario (with Spark 
1.6.3), where we have to shutdown zookeeper, which has the adverse side-effect 
of spark master shutting down on other nodes which is not very ideal.
{code:java}
 Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: 
spark-box1:40588 got disassociated, removing it.
Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: 
spark-box1:7078 got disassociated, removing it.
Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: Removing 
worker worker-20180427105900-spark-box1-7078 on 19
Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: Telling 
app of lost executor: 2
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO ClientCnxn: 
Unable to read additional data from server sessionid 0x1630
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO ClientCnxn: 
Unable to read additional data from server sessionid 0x1630
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO 
ConnectionStateManager: State change: SUSPENDED
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO 
ZooKeeperLeaderElectionAgent: We have lost leadership
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 ERROR Master: 
Leadership has been revoked – master shutting down.{code}


was (Author: agateaaa):
 
{code:java}
 Can we please increase the priority of this bug since it exists in latest 
Spark 2.3.0 too?  We have observed this during upgrade scenario (with Spark 
1.6.3), where we have to shutdown zookeeper, which has the adverse side-effect 
of spark master shutting down on other nodes which is not very ideal.

 Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: 
spark-box1:40588 got disassociated, removing it.
Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: 
spark-box1:7078 got disassociated, removing it.
Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: Removing 
worker worker-20180427105900-spark-box1-7078 on 19
Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: Telling 
app of lost executor: 2
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO ClientCnxn: 
Unable to read additional data from server sessionid 0x1630
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO ClientCnxn: 
Unable to read additional data from server sessionid 0x1630
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO 
ConnectionStateManager: State change: SUSPENDED
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO 
ZooKeeperLeaderElectionAgent: We have lost leadership
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 ERROR Master: 
Leadership has been revoked – master shutting down.{code}

> Bouncing Zookeeper node causes Active spark master to exit
> --
>
> Key: SPARK-15544
> URL: https://issues.apache.org/jira/browse/SPARK-15544
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.1
> Environment: Ubuntu 14.04.  Zookeeper 3.4.6 with 3-node quorum
>Reporter: Steven Lowenthal
>Priority: Major
>
> Shutting Down a single zookeeper node caused spark master to exit.  The 
> master should have connected to a second zookeeper node. 
> {code:title=log output}
> 16/05/25 18:21:28 INFO master.Master: Launching executor 
> app-20160525182128-0006/1 on worker worker-20160524013212-10.16.28.76-59138
> 16/05/25 18:21:28 INFO master.Master: Launching executor 
> app-20160525182128-0006/2 on worker worker-20160524013204-10.16.21.217-47129
> 16/05/26 00:16:01 INFO zookeeper.ClientCnxn: Unable to read additional data 
> from server sessionid 0x154dfc0426b0054, likely server has closed socket, 
> closing socket connection and attempting reconnect
> 16/05/26 00:16:01 INFO zookeeper.ClientCnxn: Unable to read additional data 
> from server sessionid 0x254c701f28d0053, likely server has closed socket, 
> closing socket connection and attempting reconnect
> 16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED
> 16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED
> 16/05/26 00:16:01 INFO master.ZooKeeperLeaderElectionAgent: We have lost 
> leadership
> 16/05/26 00:16:01 ERROR master.Master: Leadership has been revoked -- master 
> shutting down. }}
> {code}
> spark-env.sh: 
> {code:title=spark-env.sh}
> export SPARK_LOCAL_DIRS=/ephemeral/spark/local
> export 

[jira] [Comment Edited] (SPARK-15544) Bouncing Zookeeper node causes Active spark master to exit

2018-04-30 Thread Ashwin Agate (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1645#comment-1645
 ] 

Ashwin Agate edited comment on SPARK-15544 at 4/30/18 7:05 PM:
---

Can we please increase the priority of this bug since it exists in latest Spark 
2.3.0 too?  We have observed this during upgrade scenario (with Spark 1.6.3), 
where we have to shutdown zookeeper, which has the adverse side-effect of spark 
master shutting down on other nodes which is not very ideal.

 Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: 
spark-box1:40588 got disassociated, removing it.
Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: 
spark-box1:7078 got disassociated, removing it.
Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: Removing 
worker worker-20180427105900-spark-box1-7078 on 19
Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: Telling 
app of lost executor: 2
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO ClientCnxn: 
Unable to read additional data from server sessionid 0x1630
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO ClientCnxn: 
Unable to read additional data from server sessionid 0x1630
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO 
ConnectionStateManager: State change: SUSPENDED
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO 
ZooKeeperLeaderElectionAgent: We have lost leadership
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 ERROR Master: 
Leadership has been revoked -- master shutting down.
{code:java}
 {code}


was (Author: agateaaa):
Can we please increase the priority of this bug since it exists in latest Spark 
2.3.0 too?  We have observed this during upgrade scenario (with Spark 1.6.3), 
where we have to shutdown zookeeper, which has the adverse side-effect of spark 
master shutting down on other nodes which is not very ideal.

 

> Bouncing Zookeeper node causes Active spark master to exit
> --
>
> Key: SPARK-15544
> URL: https://issues.apache.org/jira/browse/SPARK-15544
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.1
> Environment: Ubuntu 14.04.  Zookeeper 3.4.6 with 3-node quorum
>Reporter: Steven Lowenthal
>Priority: Major
>
> Shutting Down a single zookeeper node caused spark master to exit.  The 
> master should have connected to a second zookeeper node. 
> {code:title=log output}
> 16/05/25 18:21:28 INFO master.Master: Launching executor 
> app-20160525182128-0006/1 on worker worker-20160524013212-10.16.28.76-59138
> 16/05/25 18:21:28 INFO master.Master: Launching executor 
> app-20160525182128-0006/2 on worker worker-20160524013204-10.16.21.217-47129
> 16/05/26 00:16:01 INFO zookeeper.ClientCnxn: Unable to read additional data 
> from server sessionid 0x154dfc0426b0054, likely server has closed socket, 
> closing socket connection and attempting reconnect
> 16/05/26 00:16:01 INFO zookeeper.ClientCnxn: Unable to read additional data 
> from server sessionid 0x254c701f28d0053, likely server has closed socket, 
> closing socket connection and attempting reconnect
> 16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED
> 16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED
> 16/05/26 00:16:01 INFO master.ZooKeeperLeaderElectionAgent: We have lost 
> leadership
> 16/05/26 00:16:01 ERROR master.Master: Leadership has been revoked -- master 
> shutting down. }}
> {code}
> spark-env.sh: 
> {code:title=spark-env.sh}
> export SPARK_LOCAL_DIRS=/ephemeral/spark/local
> export SPARK_WORKER_DIR=/ephemeral/spark/work
> export SPARK_LOG_DIR=/var/log/spark
> export HADOOP_CONF_DIR=/home/ubuntu/hadoop-2.6.3/etc/hadoop
> export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER 
> -Dspark.deploy.zookeeper.url=gn5456-zookeeper-01:2181,gn5456-zookeeper-02:2181,gn5456-zookeeper-03:2181"
> export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true"
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-15544) Bouncing Zookeeper node causes Active spark master to exit

2018-04-17 Thread zuotingbing (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16440535#comment-16440535
 ] 

zuotingbing edited comment on SPARK-15544 at 4/17/18 8:02 AM:
--

cc [~vanzin]  [~rxin] [~yhuai]


was (Author: zuo.tingbing9):
cc [~vanzin]  [~rxin] Xiao Li

> Bouncing Zookeeper node causes Active spark master to exit
> --
>
> Key: SPARK-15544
> URL: https://issues.apache.org/jira/browse/SPARK-15544
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.1
> Environment: Ubuntu 14.04.  Zookeeper 3.4.6 with 3-node quorum
>Reporter: Steven Lowenthal
>Priority: Major
>
> Shutting Down a single zookeeper node caused spark master to exit.  The 
> master should have connected to a second zookeeper node. 
> {code:title=log output}
> 16/05/25 18:21:28 INFO master.Master: Launching executor 
> app-20160525182128-0006/1 on worker worker-20160524013212-10.16.28.76-59138
> 16/05/25 18:21:28 INFO master.Master: Launching executor 
> app-20160525182128-0006/2 on worker worker-20160524013204-10.16.21.217-47129
> 16/05/26 00:16:01 INFO zookeeper.ClientCnxn: Unable to read additional data 
> from server sessionid 0x154dfc0426b0054, likely server has closed socket, 
> closing socket connection and attempting reconnect
> 16/05/26 00:16:01 INFO zookeeper.ClientCnxn: Unable to read additional data 
> from server sessionid 0x254c701f28d0053, likely server has closed socket, 
> closing socket connection and attempting reconnect
> 16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED
> 16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED
> 16/05/26 00:16:01 INFO master.ZooKeeperLeaderElectionAgent: We have lost 
> leadership
> 16/05/26 00:16:01 ERROR master.Master: Leadership has been revoked -- master 
> shutting down. }}
> {code}
> spark-env.sh: 
> {code:title=spark-env.sh}
> export SPARK_LOCAL_DIRS=/ephemeral/spark/local
> export SPARK_WORKER_DIR=/ephemeral/spark/work
> export SPARK_LOG_DIR=/var/log/spark
> export HADOOP_CONF_DIR=/home/ubuntu/hadoop-2.6.3/etc/hadoop
> export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER 
> -Dspark.deploy.zookeeper.url=gn5456-zookeeper-01:2181,gn5456-zookeeper-02:2181,gn5456-zookeeper-03:2181"
> export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true"
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-15544) Bouncing Zookeeper node causes Active spark master to exit

2018-04-17 Thread zuotingbing (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16440535#comment-16440535
 ] 

zuotingbing edited comment on SPARK-15544 at 4/17/18 8:01 AM:
--

cc [~vanzin]  [~rxin] Xiao Li


was (Author: zuo.tingbing9):
cc [~vanzin]  [~rxin] [gatorsmile|https://github.com/gatorsmile]

> Bouncing Zookeeper node causes Active spark master to exit
> --
>
> Key: SPARK-15544
> URL: https://issues.apache.org/jira/browse/SPARK-15544
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.1
> Environment: Ubuntu 14.04.  Zookeeper 3.4.6 with 3-node quorum
>Reporter: Steven Lowenthal
>Priority: Major
>
> Shutting Down a single zookeeper node caused spark master to exit.  The 
> master should have connected to a second zookeeper node. 
> {code:title=log output}
> 16/05/25 18:21:28 INFO master.Master: Launching executor 
> app-20160525182128-0006/1 on worker worker-20160524013212-10.16.28.76-59138
> 16/05/25 18:21:28 INFO master.Master: Launching executor 
> app-20160525182128-0006/2 on worker worker-20160524013204-10.16.21.217-47129
> 16/05/26 00:16:01 INFO zookeeper.ClientCnxn: Unable to read additional data 
> from server sessionid 0x154dfc0426b0054, likely server has closed socket, 
> closing socket connection and attempting reconnect
> 16/05/26 00:16:01 INFO zookeeper.ClientCnxn: Unable to read additional data 
> from server sessionid 0x254c701f28d0053, likely server has closed socket, 
> closing socket connection and attempting reconnect
> 16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED
> 16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED
> 16/05/26 00:16:01 INFO master.ZooKeeperLeaderElectionAgent: We have lost 
> leadership
> 16/05/26 00:16:01 ERROR master.Master: Leadership has been revoked -- master 
> shutting down. }}
> {code}
> spark-env.sh: 
> {code:title=spark-env.sh}
> export SPARK_LOCAL_DIRS=/ephemeral/spark/local
> export SPARK_WORKER_DIR=/ephemeral/spark/work
> export SPARK_LOG_DIR=/var/log/spark
> export HADOOP_CONF_DIR=/home/ubuntu/hadoop-2.6.3/etc/hadoop
> export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER 
> -Dspark.deploy.zookeeper.url=gn5456-zookeeper-01:2181,gn5456-zookeeper-02:2181,gn5456-zookeeper-03:2181"
> export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true"
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-15544) Bouncing Zookeeper node causes Active spark master to exit

2018-04-17 Thread zuotingbing (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16440535#comment-16440535
 ] 

zuotingbing edited comment on SPARK-15544 at 4/17/18 7:53 AM:
--

cc [~vanzin]  [~rxin] [gatorsmile|https://github.com/gatorsmile]


was (Author: zuo.tingbing9):
cc [~vanzin]  [gatorsmile|https://github.com/gatorsmile]

> Bouncing Zookeeper node causes Active spark master to exit
> --
>
> Key: SPARK-15544
> URL: https://issues.apache.org/jira/browse/SPARK-15544
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.1
> Environment: Ubuntu 14.04.  Zookeeper 3.4.6 with 3-node quorum
>Reporter: Steven Lowenthal
>Priority: Major
>
> Shutting Down a single zookeeper node caused spark master to exit.  The 
> master should have connected to a second zookeeper node. 
> {code:title=log output}
> 16/05/25 18:21:28 INFO master.Master: Launching executor 
> app-20160525182128-0006/1 on worker worker-20160524013212-10.16.28.76-59138
> 16/05/25 18:21:28 INFO master.Master: Launching executor 
> app-20160525182128-0006/2 on worker worker-20160524013204-10.16.21.217-47129
> 16/05/26 00:16:01 INFO zookeeper.ClientCnxn: Unable to read additional data 
> from server sessionid 0x154dfc0426b0054, likely server has closed socket, 
> closing socket connection and attempting reconnect
> 16/05/26 00:16:01 INFO zookeeper.ClientCnxn: Unable to read additional data 
> from server sessionid 0x254c701f28d0053, likely server has closed socket, 
> closing socket connection and attempting reconnect
> 16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED
> 16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED
> 16/05/26 00:16:01 INFO master.ZooKeeperLeaderElectionAgent: We have lost 
> leadership
> 16/05/26 00:16:01 ERROR master.Master: Leadership has been revoked -- master 
> shutting down. }}
> {code}
> spark-env.sh: 
> {code:title=spark-env.sh}
> export SPARK_LOCAL_DIRS=/ephemeral/spark/local
> export SPARK_WORKER_DIR=/ephemeral/spark/work
> export SPARK_LOG_DIR=/var/log/spark
> export HADOOP_CONF_DIR=/home/ubuntu/hadoop-2.6.3/etc/hadoop
> export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER 
> -Dspark.deploy.zookeeper.url=gn5456-zookeeper-01:2181,gn5456-zookeeper-02:2181,gn5456-zookeeper-03:2181"
> export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true"
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-15544) Bouncing Zookeeper node causes Active spark master to exit

2018-04-17 Thread zuotingbing (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16440535#comment-16440535
 ] 

zuotingbing edited comment on SPARK-15544 at 4/17/18 7:51 AM:
--

cc [~vanzin]  [gatorsmile|https://github.com/gatorsmile]


was (Author: zuo.tingbing9):
cc [~vanzin]  

> Bouncing Zookeeper node causes Active spark master to exit
> --
>
> Key: SPARK-15544
> URL: https://issues.apache.org/jira/browse/SPARK-15544
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.1
> Environment: Ubuntu 14.04.  Zookeeper 3.4.6 with 3-node quorum
>Reporter: Steven Lowenthal
>Priority: Major
>
> Shutting Down a single zookeeper node caused spark master to exit.  The 
> master should have connected to a second zookeeper node. 
> {code:title=log output}
> 16/05/25 18:21:28 INFO master.Master: Launching executor 
> app-20160525182128-0006/1 on worker worker-20160524013212-10.16.28.76-59138
> 16/05/25 18:21:28 INFO master.Master: Launching executor 
> app-20160525182128-0006/2 on worker worker-20160524013204-10.16.21.217-47129
> 16/05/26 00:16:01 INFO zookeeper.ClientCnxn: Unable to read additional data 
> from server sessionid 0x154dfc0426b0054, likely server has closed socket, 
> closing socket connection and attempting reconnect
> 16/05/26 00:16:01 INFO zookeeper.ClientCnxn: Unable to read additional data 
> from server sessionid 0x254c701f28d0053, likely server has closed socket, 
> closing socket connection and attempting reconnect
> 16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED
> 16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED
> 16/05/26 00:16:01 INFO master.ZooKeeperLeaderElectionAgent: We have lost 
> leadership
> 16/05/26 00:16:01 ERROR master.Master: Leadership has been revoked -- master 
> shutting down. }}
> {code}
> spark-env.sh: 
> {code:title=spark-env.sh}
> export SPARK_LOCAL_DIRS=/ephemeral/spark/local
> export SPARK_WORKER_DIR=/ephemeral/spark/work
> export SPARK_LOG_DIR=/var/log/spark
> export HADOOP_CONF_DIR=/home/ubuntu/hadoop-2.6.3/etc/hadoop
> export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER 
> -Dspark.deploy.zookeeper.url=gn5456-zookeeper-01:2181,gn5456-zookeeper-02:2181,gn5456-zookeeper-03:2181"
> export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true"
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-15544) Bouncing Zookeeper node causes Active spark master to exit

2018-04-17 Thread zuotingbing (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16440535#comment-16440535
 ] 

zuotingbing edited comment on SPARK-15544 at 4/17/18 7:49 AM:
--

cc [~vanzin]  gatorsmile


was (Author: zuo.tingbing9):
cc [~vanzin] 

cc +gatorsmile+

> Bouncing Zookeeper node causes Active spark master to exit
> --
>
> Key: SPARK-15544
> URL: https://issues.apache.org/jira/browse/SPARK-15544
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.1
> Environment: Ubuntu 14.04.  Zookeeper 3.4.6 with 3-node quorum
>Reporter: Steven Lowenthal
>Priority: Major
>
> Shutting Down a single zookeeper node caused spark master to exit.  The 
> master should have connected to a second zookeeper node. 
> {code:title=log output}
> 16/05/25 18:21:28 INFO master.Master: Launching executor 
> app-20160525182128-0006/1 on worker worker-20160524013212-10.16.28.76-59138
> 16/05/25 18:21:28 INFO master.Master: Launching executor 
> app-20160525182128-0006/2 on worker worker-20160524013204-10.16.21.217-47129
> 16/05/26 00:16:01 INFO zookeeper.ClientCnxn: Unable to read additional data 
> from server sessionid 0x154dfc0426b0054, likely server has closed socket, 
> closing socket connection and attempting reconnect
> 16/05/26 00:16:01 INFO zookeeper.ClientCnxn: Unable to read additional data 
> from server sessionid 0x254c701f28d0053, likely server has closed socket, 
> closing socket connection and attempting reconnect
> 16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED
> 16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED
> 16/05/26 00:16:01 INFO master.ZooKeeperLeaderElectionAgent: We have lost 
> leadership
> 16/05/26 00:16:01 ERROR master.Master: Leadership has been revoked -- master 
> shutting down. }}
> {code}
> spark-env.sh: 
> {code:title=spark-env.sh}
> export SPARK_LOCAL_DIRS=/ephemeral/spark/local
> export SPARK_WORKER_DIR=/ephemeral/spark/work
> export SPARK_LOG_DIR=/var/log/spark
> export HADOOP_CONF_DIR=/home/ubuntu/hadoop-2.6.3/etc/hadoop
> export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER 
> -Dspark.deploy.zookeeper.url=gn5456-zookeeper-01:2181,gn5456-zookeeper-02:2181,gn5456-zookeeper-03:2181"
> export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true"
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-15544) Bouncing Zookeeper node causes Active spark master to exit

2018-04-17 Thread zuotingbing (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16440535#comment-16440535
 ] 

zuotingbing edited comment on SPARK-15544 at 4/17/18 7:49 AM:
--

cc [~vanzin]  


was (Author: zuo.tingbing9):
cc [~vanzin]  gatorsmile

> Bouncing Zookeeper node causes Active spark master to exit
> --
>
> Key: SPARK-15544
> URL: https://issues.apache.org/jira/browse/SPARK-15544
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.1
> Environment: Ubuntu 14.04.  Zookeeper 3.4.6 with 3-node quorum
>Reporter: Steven Lowenthal
>Priority: Major
>
> Shutting Down a single zookeeper node caused spark master to exit.  The 
> master should have connected to a second zookeeper node. 
> {code:title=log output}
> 16/05/25 18:21:28 INFO master.Master: Launching executor 
> app-20160525182128-0006/1 on worker worker-20160524013212-10.16.28.76-59138
> 16/05/25 18:21:28 INFO master.Master: Launching executor 
> app-20160525182128-0006/2 on worker worker-20160524013204-10.16.21.217-47129
> 16/05/26 00:16:01 INFO zookeeper.ClientCnxn: Unable to read additional data 
> from server sessionid 0x154dfc0426b0054, likely server has closed socket, 
> closing socket connection and attempting reconnect
> 16/05/26 00:16:01 INFO zookeeper.ClientCnxn: Unable to read additional data 
> from server sessionid 0x254c701f28d0053, likely server has closed socket, 
> closing socket connection and attempting reconnect
> 16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED
> 16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED
> 16/05/26 00:16:01 INFO master.ZooKeeperLeaderElectionAgent: We have lost 
> leadership
> 16/05/26 00:16:01 ERROR master.Master: Leadership has been revoked -- master 
> shutting down. }}
> {code}
> spark-env.sh: 
> {code:title=spark-env.sh}
> export SPARK_LOCAL_DIRS=/ephemeral/spark/local
> export SPARK_WORKER_DIR=/ephemeral/spark/work
> export SPARK_LOG_DIR=/var/log/spark
> export HADOOP_CONF_DIR=/home/ubuntu/hadoop-2.6.3/etc/hadoop
> export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER 
> -Dspark.deploy.zookeeper.url=gn5456-zookeeper-01:2181,gn5456-zookeeper-02:2181,gn5456-zookeeper-03:2181"
> export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true"
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-15544) Bouncing Zookeeper node causes Active spark master to exit

2018-04-17 Thread zuotingbing (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16440535#comment-16440535
 ] 

zuotingbing edited comment on SPARK-15544 at 4/17/18 7:48 AM:
--

cc [~vanzin] 

cc @gatorsmile


was (Author: zuo.tingbing9):
cc [~vanzin] gatorsmile

> Bouncing Zookeeper node causes Active spark master to exit
> --
>
> Key: SPARK-15544
> URL: https://issues.apache.org/jira/browse/SPARK-15544
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.1
> Environment: Ubuntu 14.04.  Zookeeper 3.4.6 with 3-node quorum
>Reporter: Steven Lowenthal
>Priority: Major
>
> Shutting Down a single zookeeper node caused spark master to exit.  The 
> master should have connected to a second zookeeper node. 
> {code:title=log output}
> 16/05/25 18:21:28 INFO master.Master: Launching executor 
> app-20160525182128-0006/1 on worker worker-20160524013212-10.16.28.76-59138
> 16/05/25 18:21:28 INFO master.Master: Launching executor 
> app-20160525182128-0006/2 on worker worker-20160524013204-10.16.21.217-47129
> 16/05/26 00:16:01 INFO zookeeper.ClientCnxn: Unable to read additional data 
> from server sessionid 0x154dfc0426b0054, likely server has closed socket, 
> closing socket connection and attempting reconnect
> 16/05/26 00:16:01 INFO zookeeper.ClientCnxn: Unable to read additional data 
> from server sessionid 0x254c701f28d0053, likely server has closed socket, 
> closing socket connection and attempting reconnect
> 16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED
> 16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED
> 16/05/26 00:16:01 INFO master.ZooKeeperLeaderElectionAgent: We have lost 
> leadership
> 16/05/26 00:16:01 ERROR master.Master: Leadership has been revoked -- master 
> shutting down. }}
> {code}
> spark-env.sh: 
> {code:title=spark-env.sh}
> export SPARK_LOCAL_DIRS=/ephemeral/spark/local
> export SPARK_WORKER_DIR=/ephemeral/spark/work
> export SPARK_LOG_DIR=/var/log/spark
> export HADOOP_CONF_DIR=/home/ubuntu/hadoop-2.6.3/etc/hadoop
> export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER 
> -Dspark.deploy.zookeeper.url=gn5456-zookeeper-01:2181,gn5456-zookeeper-02:2181,gn5456-zookeeper-03:2181"
> export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true"
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-15544) Bouncing Zookeeper node causes Active spark master to exit

2018-04-17 Thread zuotingbing (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16440535#comment-16440535
 ] 

zuotingbing edited comment on SPARK-15544 at 4/17/18 7:48 AM:
--

cc [~vanzin] 

cc +gatorsmile+


was (Author: zuo.tingbing9):
cc [~vanzin] 

cc @gatorsmile

> Bouncing Zookeeper node causes Active spark master to exit
> --
>
> Key: SPARK-15544
> URL: https://issues.apache.org/jira/browse/SPARK-15544
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.1
> Environment: Ubuntu 14.04.  Zookeeper 3.4.6 with 3-node quorum
>Reporter: Steven Lowenthal
>Priority: Major
>
> Shutting Down a single zookeeper node caused spark master to exit.  The 
> master should have connected to a second zookeeper node. 
> {code:title=log output}
> 16/05/25 18:21:28 INFO master.Master: Launching executor 
> app-20160525182128-0006/1 on worker worker-20160524013212-10.16.28.76-59138
> 16/05/25 18:21:28 INFO master.Master: Launching executor 
> app-20160525182128-0006/2 on worker worker-20160524013204-10.16.21.217-47129
> 16/05/26 00:16:01 INFO zookeeper.ClientCnxn: Unable to read additional data 
> from server sessionid 0x154dfc0426b0054, likely server has closed socket, 
> closing socket connection and attempting reconnect
> 16/05/26 00:16:01 INFO zookeeper.ClientCnxn: Unable to read additional data 
> from server sessionid 0x254c701f28d0053, likely server has closed socket, 
> closing socket connection and attempting reconnect
> 16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED
> 16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED
> 16/05/26 00:16:01 INFO master.ZooKeeperLeaderElectionAgent: We have lost 
> leadership
> 16/05/26 00:16:01 ERROR master.Master: Leadership has been revoked -- master 
> shutting down. }}
> {code}
> spark-env.sh: 
> {code:title=spark-env.sh}
> export SPARK_LOCAL_DIRS=/ephemeral/spark/local
> export SPARK_WORKER_DIR=/ephemeral/spark/work
> export SPARK_LOG_DIR=/var/log/spark
> export HADOOP_CONF_DIR=/home/ubuntu/hadoop-2.6.3/etc/hadoop
> export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER 
> -Dspark.deploy.zookeeper.url=gn5456-zookeeper-01:2181,gn5456-zookeeper-02:2181,gn5456-zookeeper-03:2181"
> export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true"
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-15544) Bouncing Zookeeper node causes Active spark master to exit

2018-04-17 Thread zuotingbing (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16440535#comment-16440535
 ] 

zuotingbing edited comment on SPARK-15544 at 4/17/18 7:47 AM:
--

cc [~vanzin] gatorsmile


was (Author: zuo.tingbing9):
cc [~vanzin] 

cc [gatorsmile|https://github.com/gatorsmile]

> Bouncing Zookeeper node causes Active spark master to exit
> --
>
> Key: SPARK-15544
> URL: https://issues.apache.org/jira/browse/SPARK-15544
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.1
> Environment: Ubuntu 14.04.  Zookeeper 3.4.6 with 3-node quorum
>Reporter: Steven Lowenthal
>Priority: Major
>
> Shutting Down a single zookeeper node caused spark master to exit.  The 
> master should have connected to a second zookeeper node. 
> {code:title=log output}
> 16/05/25 18:21:28 INFO master.Master: Launching executor 
> app-20160525182128-0006/1 on worker worker-20160524013212-10.16.28.76-59138
> 16/05/25 18:21:28 INFO master.Master: Launching executor 
> app-20160525182128-0006/2 on worker worker-20160524013204-10.16.21.217-47129
> 16/05/26 00:16:01 INFO zookeeper.ClientCnxn: Unable to read additional data 
> from server sessionid 0x154dfc0426b0054, likely server has closed socket, 
> closing socket connection and attempting reconnect
> 16/05/26 00:16:01 INFO zookeeper.ClientCnxn: Unable to read additional data 
> from server sessionid 0x254c701f28d0053, likely server has closed socket, 
> closing socket connection and attempting reconnect
> 16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED
> 16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED
> 16/05/26 00:16:01 INFO master.ZooKeeperLeaderElectionAgent: We have lost 
> leadership
> 16/05/26 00:16:01 ERROR master.Master: Leadership has been revoked -- master 
> shutting down. }}
> {code}
> spark-env.sh: 
> {code:title=spark-env.sh}
> export SPARK_LOCAL_DIRS=/ephemeral/spark/local
> export SPARK_WORKER_DIR=/ephemeral/spark/work
> export SPARK_LOG_DIR=/var/log/spark
> export HADOOP_CONF_DIR=/home/ubuntu/hadoop-2.6.3/etc/hadoop
> export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER 
> -Dspark.deploy.zookeeper.url=gn5456-zookeeper-01:2181,gn5456-zookeeper-02:2181,gn5456-zookeeper-03:2181"
> export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true"
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-15544) Bouncing Zookeeper node causes Active spark master to exit

2018-04-17 Thread zuotingbing (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16440535#comment-16440535
 ] 

zuotingbing edited comment on SPARK-15544 at 4/17/18 7:45 AM:
--

cc [~vanzin] 

cc [gatorsmile|https://github.com/gatorsmile]


was (Author: zuo.tingbing9):
cc [~vanzin]  [gatorsmile|https://github.com/gatorsmile]

> Bouncing Zookeeper node causes Active spark master to exit
> --
>
> Key: SPARK-15544
> URL: https://issues.apache.org/jira/browse/SPARK-15544
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.1
> Environment: Ubuntu 14.04.  Zookeeper 3.4.6 with 3-node quorum
>Reporter: Steven Lowenthal
>Priority: Major
>
> Shutting Down a single zookeeper node caused spark master to exit.  The 
> master should have connected to a second zookeeper node. 
> {code:title=log output}
> 16/05/25 18:21:28 INFO master.Master: Launching executor 
> app-20160525182128-0006/1 on worker worker-20160524013212-10.16.28.76-59138
> 16/05/25 18:21:28 INFO master.Master: Launching executor 
> app-20160525182128-0006/2 on worker worker-20160524013204-10.16.21.217-47129
> 16/05/26 00:16:01 INFO zookeeper.ClientCnxn: Unable to read additional data 
> from server sessionid 0x154dfc0426b0054, likely server has closed socket, 
> closing socket connection and attempting reconnect
> 16/05/26 00:16:01 INFO zookeeper.ClientCnxn: Unable to read additional data 
> from server sessionid 0x254c701f28d0053, likely server has closed socket, 
> closing socket connection and attempting reconnect
> 16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED
> 16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED
> 16/05/26 00:16:01 INFO master.ZooKeeperLeaderElectionAgent: We have lost 
> leadership
> 16/05/26 00:16:01 ERROR master.Master: Leadership has been revoked -- master 
> shutting down. }}
> {code}
> spark-env.sh: 
> {code:title=spark-env.sh}
> export SPARK_LOCAL_DIRS=/ephemeral/spark/local
> export SPARK_WORKER_DIR=/ephemeral/spark/work
> export SPARK_LOG_DIR=/var/log/spark
> export HADOOP_CONF_DIR=/home/ubuntu/hadoop-2.6.3/etc/hadoop
> export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER 
> -Dspark.deploy.zookeeper.url=gn5456-zookeeper-01:2181,gn5456-zookeeper-02:2181,gn5456-zookeeper-03:2181"
> export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true"
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-15544) Bouncing Zookeeper node causes Active spark master to exit

2018-04-17 Thread zuotingbing (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16440535#comment-16440535
 ] 

zuotingbing edited comment on SPARK-15544 at 4/17/18 7:44 AM:
--

cc [~vanzin]  [gatorsmile|https://github.com/gatorsmile]


was (Author: zuo.tingbing9):
cc [~vanzin]  *[gatorsmile|https://github.com/gatorsmile]*

> Bouncing Zookeeper node causes Active spark master to exit
> --
>
> Key: SPARK-15544
> URL: https://issues.apache.org/jira/browse/SPARK-15544
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.1
> Environment: Ubuntu 14.04.  Zookeeper 3.4.6 with 3-node quorum
>Reporter: Steven Lowenthal
>Priority: Major
>
> Shutting Down a single zookeeper node caused spark master to exit.  The 
> master should have connected to a second zookeeper node. 
> {code:title=log output}
> 16/05/25 18:21:28 INFO master.Master: Launching executor 
> app-20160525182128-0006/1 on worker worker-20160524013212-10.16.28.76-59138
> 16/05/25 18:21:28 INFO master.Master: Launching executor 
> app-20160525182128-0006/2 on worker worker-20160524013204-10.16.21.217-47129
> 16/05/26 00:16:01 INFO zookeeper.ClientCnxn: Unable to read additional data 
> from server sessionid 0x154dfc0426b0054, likely server has closed socket, 
> closing socket connection and attempting reconnect
> 16/05/26 00:16:01 INFO zookeeper.ClientCnxn: Unable to read additional data 
> from server sessionid 0x254c701f28d0053, likely server has closed socket, 
> closing socket connection and attempting reconnect
> 16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED
> 16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED
> 16/05/26 00:16:01 INFO master.ZooKeeperLeaderElectionAgent: We have lost 
> leadership
> 16/05/26 00:16:01 ERROR master.Master: Leadership has been revoked -- master 
> shutting down. }}
> {code}
> spark-env.sh: 
> {code:title=spark-env.sh}
> export SPARK_LOCAL_DIRS=/ephemeral/spark/local
> export SPARK_WORKER_DIR=/ephemeral/spark/work
> export SPARK_LOG_DIR=/var/log/spark
> export HADOOP_CONF_DIR=/home/ubuntu/hadoop-2.6.3/etc/hadoop
> export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER 
> -Dspark.deploy.zookeeper.url=gn5456-zookeeper-01:2181,gn5456-zookeeper-02:2181,gn5456-zookeeper-03:2181"
> export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true"
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-15544) Bouncing Zookeeper node causes Active spark master to exit

2018-04-17 Thread zuotingbing (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16440535#comment-16440535
 ] 

zuotingbing edited comment on SPARK-15544 at 4/17/18 7:43 AM:
--

cc [~vanzin]  *[gatorsmile|https://github.com/gatorsmile]*


was (Author: zuo.tingbing9):
cc [~vanzin]  @*[gatorsmile|https://github.com/gatorsmile]*

> Bouncing Zookeeper node causes Active spark master to exit
> --
>
> Key: SPARK-15544
> URL: https://issues.apache.org/jira/browse/SPARK-15544
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.1
> Environment: Ubuntu 14.04.  Zookeeper 3.4.6 with 3-node quorum
>Reporter: Steven Lowenthal
>Priority: Major
>
> Shutting Down a single zookeeper node caused spark master to exit.  The 
> master should have connected to a second zookeeper node. 
> {code:title=log output}
> 16/05/25 18:21:28 INFO master.Master: Launching executor 
> app-20160525182128-0006/1 on worker worker-20160524013212-10.16.28.76-59138
> 16/05/25 18:21:28 INFO master.Master: Launching executor 
> app-20160525182128-0006/2 on worker worker-20160524013204-10.16.21.217-47129
> 16/05/26 00:16:01 INFO zookeeper.ClientCnxn: Unable to read additional data 
> from server sessionid 0x154dfc0426b0054, likely server has closed socket, 
> closing socket connection and attempting reconnect
> 16/05/26 00:16:01 INFO zookeeper.ClientCnxn: Unable to read additional data 
> from server sessionid 0x254c701f28d0053, likely server has closed socket, 
> closing socket connection and attempting reconnect
> 16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED
> 16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED
> 16/05/26 00:16:01 INFO master.ZooKeeperLeaderElectionAgent: We have lost 
> leadership
> 16/05/26 00:16:01 ERROR master.Master: Leadership has been revoked -- master 
> shutting down. }}
> {code}
> spark-env.sh: 
> {code:title=spark-env.sh}
> export SPARK_LOCAL_DIRS=/ephemeral/spark/local
> export SPARK_WORKER_DIR=/ephemeral/spark/work
> export SPARK_LOG_DIR=/var/log/spark
> export HADOOP_CONF_DIR=/home/ubuntu/hadoop-2.6.3/etc/hadoop
> export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER 
> -Dspark.deploy.zookeeper.url=gn5456-zookeeper-01:2181,gn5456-zookeeper-02:2181,gn5456-zookeeper-03:2181"
> export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true"
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-15544) Bouncing Zookeeper node causes Active spark master to exit

2018-04-17 Thread zuotingbing (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16440535#comment-16440535
 ] 

zuotingbing edited comment on SPARK-15544 at 4/17/18 7:43 AM:
--

cc [~vanzin]  @*[gatorsmile|https://github.com/gatorsmile]*


was (Author: zuo.tingbing9):
cc [~vanzin]

> Bouncing Zookeeper node causes Active spark master to exit
> --
>
> Key: SPARK-15544
> URL: https://issues.apache.org/jira/browse/SPARK-15544
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.1
> Environment: Ubuntu 14.04.  Zookeeper 3.4.6 with 3-node quorum
>Reporter: Steven Lowenthal
>Priority: Major
>
> Shutting Down a single zookeeper node caused spark master to exit.  The 
> master should have connected to a second zookeeper node. 
> {code:title=log output}
> 16/05/25 18:21:28 INFO master.Master: Launching executor 
> app-20160525182128-0006/1 on worker worker-20160524013212-10.16.28.76-59138
> 16/05/25 18:21:28 INFO master.Master: Launching executor 
> app-20160525182128-0006/2 on worker worker-20160524013204-10.16.21.217-47129
> 16/05/26 00:16:01 INFO zookeeper.ClientCnxn: Unable to read additional data 
> from server sessionid 0x154dfc0426b0054, likely server has closed socket, 
> closing socket connection and attempting reconnect
> 16/05/26 00:16:01 INFO zookeeper.ClientCnxn: Unable to read additional data 
> from server sessionid 0x254c701f28d0053, likely server has closed socket, 
> closing socket connection and attempting reconnect
> 16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED
> 16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED
> 16/05/26 00:16:01 INFO master.ZooKeeperLeaderElectionAgent: We have lost 
> leadership
> 16/05/26 00:16:01 ERROR master.Master: Leadership has been revoked -- master 
> shutting down. }}
> {code}
> spark-env.sh: 
> {code:title=spark-env.sh}
> export SPARK_LOCAL_DIRS=/ephemeral/spark/local
> export SPARK_WORKER_DIR=/ephemeral/spark/work
> export SPARK_LOG_DIR=/var/log/spark
> export HADOOP_CONF_DIR=/home/ubuntu/hadoop-2.6.3/etc/hadoop
> export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER 
> -Dspark.deploy.zookeeper.url=gn5456-zookeeper-01:2181,gn5456-zookeeper-02:2181,gn5456-zookeeper-03:2181"
> export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true"
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-15544) Bouncing Zookeeper node causes Active spark master to exit

2016-07-22 Thread Avik Sil (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15389189#comment-15389189
 ] 

Avik Sil edited comment on SPARK-15544 at 7/22/16 9:11 AM:
---

I am also seeing the same issue with spark 1.3.0, ubuntu 14.04, zookeeper 3.4.5

We have a 3 node cluster with spark and zookeeper. We also have a automatic 
restarter service which checks for the status of spark master every 5 min and 
restarts it if it is not running. So when the master shuts down after its 
leadership is revoked, the restarter service starts spark master within 5 min. 
But in *few cases* we don't see any ALIVE spark master in any of the three 
nodes - we don't see any "We have gained leadership" message in any of the 3 
nodes.

So in a sense the workaround suggested in the above comment does not work.

>From spark-defaults.conf:

spark.deploy.recoveryMode ZOOKEEPER
spark.deploy.zookeeper.url 
192.168.42.2:28000,192.168.42.3:28000,192.168.42.4:28000
spark.deploy.recoveryDirectory 
/var/run/sparkmaster/df71911f-a28d-409d-977f-ea2e596ec578/recovery
spark.akka.logLifecycleEvents true



was (Author: avik...@gmail.com):
I am also seeing the same issue with spark 1.3.0, ubuntu 14.04, zookeeper 3.4.5

We have a 3 node cluster with spark and zookeeper. We also have a automatic 
restarter service which checks for the status of spark master every 5 min and 
restarts it if it is not running. So when the master shuts down after its 
leadership is revoked, the restarter service starts spark master within 5 min. 
But in *few cases* we don't see any ALIVE spark master in any of the three 
nodes - we don't see any "We have gained leadership" message in any of the 3 
nodes.

>From spark-defaults.conf:

spark.deploy.recoveryMode ZOOKEEPER
spark.deploy.zookeeper.url 
192.168.42.2:28000,192.168.42.3:28000,192.168.42.4:28000
spark.deploy.recoveryDirectory 
/var/run/sparkmaster/df71911f-a28d-409d-977f-ea2e596ec578/recovery
spark.akka.logLifecycleEvents true


> Bouncing Zookeeper node causes Active spark master to exit
> --
>
> Key: SPARK-15544
> URL: https://issues.apache.org/jira/browse/SPARK-15544
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.1
> Environment: Ubuntu 14.04.  Zookeeper 3.4.6 with 3-node quorum
>Reporter: Steven Lowenthal
>
> Shutting Down a single zookeeper node caused spark master to exit.  The 
> master should have connected to a second zookeeper node. 
> {code:title=log output}
> 16/05/25 18:21:28 INFO master.Master: Launching executor 
> app-20160525182128-0006/1 on worker worker-20160524013212-10.16.28.76-59138
> 16/05/25 18:21:28 INFO master.Master: Launching executor 
> app-20160525182128-0006/2 on worker worker-20160524013204-10.16.21.217-47129
> 16/05/26 00:16:01 INFO zookeeper.ClientCnxn: Unable to read additional data 
> from server sessionid 0x154dfc0426b0054, likely server has closed socket, 
> closing socket connection and attempting reconnect
> 16/05/26 00:16:01 INFO zookeeper.ClientCnxn: Unable to read additional data 
> from server sessionid 0x254c701f28d0053, likely server has closed socket, 
> closing socket connection and attempting reconnect
> 16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED
> 16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED
> 16/05/26 00:16:01 INFO master.ZooKeeperLeaderElectionAgent: We have lost 
> leadership
> 16/05/26 00:16:01 ERROR master.Master: Leadership has been revoked -- master 
> shutting down. }}
> {code}
> spark-env.sh: 
> {code:title=spark-env.sh}
> export SPARK_LOCAL_DIRS=/ephemeral/spark/local
> export SPARK_WORKER_DIR=/ephemeral/spark/work
> export SPARK_LOG_DIR=/var/log/spark
> export HADOOP_CONF_DIR=/home/ubuntu/hadoop-2.6.3/etc/hadoop
> export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER 
> -Dspark.deploy.zookeeper.url=gn5456-zookeeper-01:2181,gn5456-zookeeper-02:2181,gn5456-zookeeper-03:2181"
> export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true"
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org