[jira] [Comment Edited] (SPARK-15544) Bouncing Zookeeper node causes Active spark master to exit
[ https://issues.apache.org/jira/browse/SPARK-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1645#comment-1645 ]

Ashwin Agate edited comment on SPARK-15544 at 4/30/18 7:05 PM:
---
{code:java}
Can we please increase the priority of this bug since it exists in latest Spark 2.3.0 too? We have observed this during upgrade scenario (with Spark 1.6.3), where we have to shutdown zookeeper, which has the adverse side-effect of spark master shutting down on other nodes which is not very ideal.

Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: spark-box1:40588 got disassociated, removing it.
Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: spark-box1:7078 got disassociated, removing it.
Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: Removing worker worker-20180427105900-spark-box1-7078 on 19
Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: Telling app of lost executor: 2
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO ClientCnxn: Unable to read additional data from server sessionid 0x1630
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO ClientCnxn: Unable to read additional data from server sessionid 0x1630
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO ConnectionStateManager: State change: SUSPENDED
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO ZooKeeperLeaderElectionAgent: We have lost leadership
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 ERROR Master: Leadership has been revoked – master shutting down.{code}

was (Author: agateaaa):

Can we please increase the priority of this bug since it exists in latest Spark 2.3.0 too? We have observed this during upgrade scenario (with Spark 1.6.3), where we have to shutdown zookeeper, which has the adverse side-effect of spark master shutting down on other nodes which is not very ideal.

Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: spark-box1:40588 got disassociated, removing it.
Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: spark-box1:7078 got disassociated, removing it.
Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: Removing worker worker-20180427105900-spark-box1-7078 on 19
Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: Telling app of lost executor: 2
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO ClientCnxn: Unable to read additional data from server sessionid 0x1630
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO ClientCnxn: Unable to read additional data from server sessionid 0x1630
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO ConnectionStateManager: State change: SUSPENDED
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO ZooKeeperLeaderElectionAgent: We have lost leadership
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 ERROR Master: Leadership has been revoked -- master shutting down.

{code:java}
{code}

> Bouncing Zookeeper node causes Active spark master to exit
> --
>
> Key: SPARK-15544
> URL: https://issues.apache.org/jira/browse/SPARK-15544
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.6.1
> Environment: Ubuntu 14.04. Zookeeper 3.4.6 with 3-node quorum
> Reporter: Steven Lowenthal
> Priority: Major
>
> Shutting Down a single zookeeper node caused spark master to exit. The
> master should have connected to a second zookeeper node.
> {code:title=log output}
> 16/05/25 18:21:28 INFO master.Master: Launching executor
> app-20160525182128-0006/1 on worker worker-20160524013212-10.16.28.76-59138
> 16/05/25 18:21:28 INFO master.Master: Launching executor
> app-20160525182128-0006/2 on worker worker-20160524013204-10.16.21.217-47129
> 16/05/26 00:16:01 INFO zookeeper.ClientCnxn: Unable to read additional data
> from server sessionid 0x154dfc0426b0054, likely server has closed socket,
> closing socket connection and attempting reconnect
> 16/05/26 00:16:01 INFO zookeeper.ClientCnxn: Unable to read additional data
> from server sessionid 0x254c701f28d0053, likely server has closed socket,
> closing socket connection and attempting reconnect
> 16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED
> 16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED
> 16/05/26 00:16:01 INFO master.ZooKeeperLeaderElectionAgent: We have lost
> leadership
> 16/05/26 00:16:01 ERROR master.Master: Leadership has been revoked -- master
> shutting down. }}
> {code}
> spark-env.sh:
> {code:title=spark-env.sh}
> export SPARK_LOCAL_DIRS=/ephemeral/spark/local
> export
[jira] [Comment Edited] (SPARK-15544) Bouncing Zookeeper node causes Active spark master to exit
[ https://issues.apache.org/jira/browse/SPARK-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1645#comment-1645 ]

Ashwin Agate edited comment on SPARK-15544 at 4/30/18 7:05 PM:
---
Can we please increase the priority of this bug since it exists in latest Spark 2.3.0 too? We have observed this during upgrade scenario (with Spark 1.6.3), where we have to shutdown zookeeper, which has the adverse side-effect of spark master shutting down on other nodes which is not very ideal.

{code:java}
Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: spark-box1:40588 got disassociated, removing it.
Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: spark-box1:7078 got disassociated, removing it.
Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: Removing worker worker-20180427105900-spark-box1-7078 on 19
Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: Telling app of lost executor: 2
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO ClientCnxn: Unable to read additional data from server sessionid 0x1630
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO ClientCnxn: Unable to read additional data from server sessionid 0x1630
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO ConnectionStateManager: State change: SUSPENDED
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO ZooKeeperLeaderElectionAgent: We have lost leadership
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 ERROR Master: Leadership has been revoked – master shutting down.{code}

was (Author: agateaaa):

{code:java}
Can we please increase the priority of this bug since it exists in latest Spark 2.3.0 too? We have observed this during upgrade scenario (with Spark 1.6.3), where we have to shutdown zookeeper, which has the adverse side-effect of spark master shutting down on other nodes which is not very ideal.

Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: spark-box1:40588 got disassociated, removing it.
Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: spark-box1:7078 got disassociated, removing it.
Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: Removing worker worker-20180427105900-spark-box1-7078 on 19
Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: Telling app of lost executor: 2
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO ClientCnxn: Unable to read additional data from server sessionid 0x1630
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO ClientCnxn: Unable to read additional data from server sessionid 0x1630
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO ConnectionStateManager: State change: SUSPENDED
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO ZooKeeperLeaderElectionAgent: We have lost leadership
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 ERROR Master: Leadership has been revoked – master shutting down.{code}
[jira] [Comment Edited] (SPARK-15544) Bouncing Zookeeper node causes Active spark master to exit
[ https://issues.apache.org/jira/browse/SPARK-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1645#comment-1645 ]

Ashwin Agate edited comment on SPARK-15544 at 4/30/18 7:05 PM:
---
Can we please increase the priority of this bug since it exists in latest Spark 2.3.0 too? We have observed this during upgrade scenario (with Spark 1.6.3), where we have to shutdown zookeeper, which has the adverse side-effect of spark master shutting down on other nodes which is not very ideal.

Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: spark-box1:40588 got disassociated, removing it.
Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: spark-box1:7078 got disassociated, removing it.
Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: Removing worker worker-20180427105900-spark-box1-7078 on 19
Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: Telling app of lost executor: 2
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO ClientCnxn: Unable to read additional data from server sessionid 0x1630
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO ClientCnxn: Unable to read additional data from server sessionid 0x1630
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO ConnectionStateManager: State change: SUSPENDED
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO ZooKeeperLeaderElectionAgent: We have lost leadership
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 ERROR Master: Leadership has been revoked -- master shutting down.

{code:java}
{code}

was (Author: agateaaa):

Can we please increase the priority of this bug since it exists in latest Spark 2.3.0 too? We have observed this during upgrade scenario (with Spark 1.6.3), where we have to shutdown zookeeper, which has the adverse side-effect of spark master shutting down on other nodes which is not very ideal.

> Bouncing Zookeeper node causes Active spark master to exit
> --
>
> Key: SPARK-15544
> URL: https://issues.apache.org/jira/browse/SPARK-15544
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.6.1
> Environment: Ubuntu 14.04. Zookeeper 3.4.6 with 3-node quorum
> Reporter: Steven Lowenthal
> Priority: Major
>
> Shutting Down a single zookeeper node caused spark master to exit. The
> master should have connected to a second zookeeper node.
> {code:title=log output}
> 16/05/25 18:21:28 INFO master.Master: Launching executor
> app-20160525182128-0006/1 on worker worker-20160524013212-10.16.28.76-59138
> 16/05/25 18:21:28 INFO master.Master: Launching executor
> app-20160525182128-0006/2 on worker worker-20160524013204-10.16.21.217-47129
> 16/05/26 00:16:01 INFO zookeeper.ClientCnxn: Unable to read additional data
> from server sessionid 0x154dfc0426b0054, likely server has closed socket,
> closing socket connection and attempting reconnect
> 16/05/26 00:16:01 INFO zookeeper.ClientCnxn: Unable to read additional data
> from server sessionid 0x254c701f28d0053, likely server has closed socket,
> closing socket connection and attempting reconnect
> 16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED
> 16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED
> 16/05/26 00:16:01 INFO master.ZooKeeperLeaderElectionAgent: We have lost
> leadership
> 16/05/26 00:16:01 ERROR master.Master: Leadership has been revoked -- master
> shutting down. }}
> {code}
> spark-env.sh:
> {code:title=spark-env.sh}
> export SPARK_LOCAL_DIRS=/ephemeral/spark/local
> export SPARK_WORKER_DIR=/ephemeral/spark/work
> export SPARK_LOG_DIR=/var/log/spark
> export HADOOP_CONF_DIR=/home/ubuntu/hadoop-2.6.3/etc/hadoop
> export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER
> -Dspark.deploy.zookeeper.url=gn5456-zookeeper-01:2181,gn5456-zookeeper-02:2181,gn5456-zookeeper-03:2181"
> export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true"
> {code}

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
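For anyone trying to reproduce the report, it may help to confirm how many ZooKeeper servers the master was actually given. The following is a minimal Python sketch, not part of Spark: the helper name `zookeeper_hosts` is hypothetical, and it only assumes the `-Dspark.deploy.zookeeper.url=host:port,...` form shown in the reporter's spark-env.sh. With three entries configured, losing one server should still leave two candidates for the client to reconnect to, which is the behaviour the reporter expected.

```python
import re
from typing import List


def zookeeper_hosts(daemon_java_opts: str) -> List[str]:
    """Return the host:port entries of -Dspark.deploy.zookeeper.url, or [] if unset.

    Hypothetical helper for illustration only; it does not contact ZooKeeper.
    """
    match = re.search(r"-Dspark\.deploy\.zookeeper\.url=(\S+)", daemon_java_opts)
    if match is None:
        return []
    # The value is a comma-separated connection string; drop empty pieces.
    return [entry for entry in match.group(1).split(",") if entry]


# Value of SPARK_DAEMON_JAVA_OPTS copied from the spark-env.sh in the report.
opts = (
    "-Dspark.deploy.recoveryMode=ZOOKEEPER "
    "-Dspark.deploy.zookeeper.url="
    "gn5456-zookeeper-01:2181,gn5456-zookeeper-02:2181,gn5456-zookeeper-03:2181"
)

hosts = zookeeper_hosts(opts)
print(len(hosts), "ZooKeeper servers configured:", hosts)
```

This only inspects the configuration string; whether the master survives the loss of one server depends on how it reacts to the SUSPENDED state seen in the logs above.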
[jira] [Comment Edited] (SPARK-15544) Bouncing Zookeeper node causes Active spark master to exit
[ https://issues.apache.org/jira/browse/SPARK-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16440535#comment-16440535 ]

zuotingbing edited comment on SPARK-15544 at 4/17/18 8:02 AM:
--
cc [~vanzin] [~rxin] [~yhuai]

was (Author: zuo.tingbing9):

cc [~vanzin] [~rxin] Xiao Li
[jira] [Comment Edited] (SPARK-15544) Bouncing Zookeeper node causes Active spark master to exit
[ https://issues.apache.org/jira/browse/SPARK-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16440535#comment-16440535 ]

zuotingbing edited comment on SPARK-15544 at 4/17/18 8:01 AM:
--
cc [~vanzin] [~rxin] Xiao Li

was (Author: zuo.tingbing9):

cc [~vanzin] [~rxin] [gatorsmile|https://github.com/gatorsmile]
[jira] [Comment Edited] (SPARK-15544) Bouncing Zookeeper node causes Active spark master to exit
[ https://issues.apache.org/jira/browse/SPARK-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16440535#comment-16440535 ]

zuotingbing edited comment on SPARK-15544 at 4/17/18 7:53 AM:
--
cc [~vanzin] [~rxin] [gatorsmile|https://github.com/gatorsmile]

was (Author: zuo.tingbing9):

cc [~vanzin] [gatorsmile|https://github.com/gatorsmile]
[jira] [Comment Edited] (SPARK-15544) Bouncing Zookeeper node causes Active spark master to exit
[ https://issues.apache.org/jira/browse/SPARK-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16440535#comment-16440535 ]

zuotingbing edited comment on SPARK-15544 at 4/17/18 7:51 AM:
--
cc [~vanzin] [gatorsmile|https://github.com/gatorsmile]

was (Author: zuo.tingbing9):

cc [~vanzin]
[jira] [Comment Edited] (SPARK-15544) Bouncing Zookeeper node causes Active spark master to exit
[ https://issues.apache.org/jira/browse/SPARK-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16440535#comment-16440535 ]

zuotingbing edited comment on SPARK-15544 at 4/17/18 7:49 AM:
--
cc [~vanzin] gatorsmile

was (Author: zuo.tingbing9):

cc [~vanzin] cc +gatorsmile+
[jira] [Comment Edited] (SPARK-15544) Bouncing Zookeeper node causes Active spark master to exit
[ https://issues.apache.org/jira/browse/SPARK-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16440535#comment-16440535 ]

zuotingbing edited comment on SPARK-15544 at 4/17/18 7:49 AM:
--
cc [~vanzin]

was (Author: zuo.tingbing9):

cc [~vanzin] gatorsmile
[jira] [Comment Edited] (SPARK-15544) Bouncing Zookeeper node causes Active spark master to exit
[ https://issues.apache.org/jira/browse/SPARK-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16440535#comment-16440535 ]

zuotingbing edited comment on SPARK-15544 at 4/17/18 7:48 AM:
--
cc [~vanzin] cc @gatorsmile

was (Author: zuo.tingbing9):

cc [~vanzin] gatorsmile
[jira] [Comment Edited] (SPARK-15544) Bouncing Zookeeper node causes Active spark master to exit
[ https://issues.apache.org/jira/browse/SPARK-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16440535#comment-16440535 ]

zuotingbing edited comment on SPARK-15544 at 4/17/18 7:48 AM:
--
cc [~vanzin] cc +gatorsmile+

was (Author: zuo.tingbing9):
cc [~vanzin] cc @gatorsmile
[jira] [Comment Edited] (SPARK-15544) Bouncing Zookeeper node causes Active spark master to exit
[ https://issues.apache.org/jira/browse/SPARK-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16440535#comment-16440535 ]

zuotingbing edited comment on SPARK-15544 at 4/17/18 7:47 AM:
--
cc [~vanzin] gatorsmile

was (Author: zuo.tingbing9):
cc [~vanzin] cc [gatorsmile|https://github.com/gatorsmile]
[jira] [Comment Edited] (SPARK-15544) Bouncing Zookeeper node causes Active spark master to exit
[ https://issues.apache.org/jira/browse/SPARK-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16440535#comment-16440535 ]

zuotingbing edited comment on SPARK-15544 at 4/17/18 7:45 AM:
--
cc [~vanzin] cc [gatorsmile|https://github.com/gatorsmile]

was (Author: zuo.tingbing9):
cc [~vanzin] [gatorsmile|https://github.com/gatorsmile]
[jira] [Comment Edited] (SPARK-15544) Bouncing Zookeeper node causes Active spark master to exit
[ https://issues.apache.org/jira/browse/SPARK-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16440535#comment-16440535 ]

zuotingbing edited comment on SPARK-15544 at 4/17/18 7:44 AM:
--
cc [~vanzin] [gatorsmile|https://github.com/gatorsmile]

was (Author: zuo.tingbing9):
cc [~vanzin] *[gatorsmile|https://github.com/gatorsmile]*
[jira] [Comment Edited] (SPARK-15544) Bouncing Zookeeper node causes Active spark master to exit
[ https://issues.apache.org/jira/browse/SPARK-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16440535#comment-16440535 ]

zuotingbing edited comment on SPARK-15544 at 4/17/18 7:43 AM:
--
cc [~vanzin] *[gatorsmile|https://github.com/gatorsmile]*

was (Author: zuo.tingbing9):
cc [~vanzin] @*[gatorsmile|https://github.com/gatorsmile]*
[jira] [Comment Edited] (SPARK-15544) Bouncing Zookeeper node causes Active spark master to exit
[ https://issues.apache.org/jira/browse/SPARK-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16440535#comment-16440535 ]

zuotingbing edited comment on SPARK-15544 at 4/17/18 7:43 AM:
--
cc [~vanzin] @*[gatorsmile|https://github.com/gatorsmile]*

was (Author: zuo.tingbing9):
cc [~vanzin]
[jira] [Comment Edited] (SPARK-15544) Bouncing Zookeeper node causes Active spark master to exit
[ https://issues.apache.org/jira/browse/SPARK-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15389189#comment-15389189 ]

Avik Sil edited comment on SPARK-15544 at 7/22/16 9:11 AM:
---
I am also seeing the same issue with Spark 1.3.0, Ubuntu 14.04, ZooKeeper 3.4.5.

We have a 3-node cluster with Spark and ZooKeeper. We also have an automatic restarter service which checks the status of the Spark master every 5 min and restarts it if it is not running. So when the master shuts down after its leadership is revoked, the restarter service starts the Spark master within 5 min. But in *a few cases* we don't see any ALIVE Spark master on any of the three nodes: we don't see any "We have gained leadership" message on any of the 3 nodes. So in a sense the workaround suggested in the above comment does not work.

From spark-defaults.conf:
spark.deploy.recoveryMode ZOOKEEPER
spark.deploy.zookeeper.url 192.168.42.2:28000,192.168.42.3:28000,192.168.42.4:28000
spark.deploy.recoveryDirectory /var/run/sparkmaster/df71911f-a28d-409d-977f-ea2e596ec578/recovery
spark.akka.logLifecycleEvents true

was (Author: avik...@gmail.com):
I am also seeing the same issue with Spark 1.3.0, Ubuntu 14.04, ZooKeeper 3.4.5.

We have a 3-node cluster with Spark and ZooKeeper. We also have an automatic restarter service which checks the status of the Spark master every 5 min and restarts it if it is not running. So when the master shuts down after its leadership is revoked, the restarter service starts the Spark master within 5 min. But in *a few cases* we don't see any ALIVE Spark master on any of the three nodes: we don't see any "We have gained leadership" message on any of the 3 nodes.

From spark-defaults.conf:
spark.deploy.recoveryMode ZOOKEEPER
spark.deploy.zookeeper.url 192.168.42.2:28000,192.168.42.3:28000,192.168.42.4:28000
spark.deploy.recoveryDirectory /var/run/sparkmaster/df71911f-a28d-409d-977f-ea2e596ec578/recovery
spark.akka.logLifecycleEvents true
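The `spark.deploy.zookeeper.url` value quoted from spark-defaults.conf above lists three ZooKeeper servers. A ZooKeeper ensemble stays available while a majority of its voting members is up, so three servers should tolerate the loss of one. The following Python sketch parses the quoted properties and computes that margin. The helper names (`parse_properties`, `zookeeper_servers`, `tolerated_failures`) are hypothetical, the parser only handles the whitespace-separated `key value` form shown in the comment, and the sketch assumes every listed server is a voting member of a single ensemble.

```python
# Properties as quoted in the comment, one per line.
CONF = """\
spark.deploy.recoveryMode ZOOKEEPER
spark.deploy.zookeeper.url 192.168.42.2:28000,192.168.42.3:28000,192.168.42.4:28000
spark.deploy.recoveryDirectory /var/run/sparkmaster/df71911f-a28d-409d-977f-ea2e596ec578/recovery
spark.akka.logLifecycleEvents true
"""

DEFAULT_ZK_PORT = 2181  # ZooKeeper's default client port, used when none is given


def parse_properties(text):
    """Parse whitespace-separated 'key value' lines, skipping blanks and comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            props[parts[0]] = parts[1].strip()
    return props


def zookeeper_servers(props):
    """Return the (host, port) pairs listed in spark.deploy.zookeeper.url."""
    servers = []
    for entry in props.get("spark.deploy.zookeeper.url", "").split(","):
        entry = entry.strip()
        if not entry:
            continue
        if ":" in entry:
            host, port = entry.rsplit(":", 1)
        else:
            host, port = entry, str(DEFAULT_ZK_PORT)
        servers.append((host, int(port)))
    return servers


def tolerated_failures(ensemble_size):
    """Servers that can be down while a strict majority is still up."""
    return max(0, (ensemble_size - 1) // 2)


if __name__ == "__main__":
    servers = zookeeper_servers(parse_properties(CONF))
    print(f"{len(servers)} servers, tolerates {tolerated_failures(len(servers))} failure(s)")
```

With the three servers from the comment the margin is one, which is consistent with the reporter's expectation that bouncing a single ZooKeeper node should not take down the active master.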