[jira] [Commented] (SPARK-38079) Not waiting for configmap before starting driver
    [ https://issues.apache.org/jira/browse/SPARK-38079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17726030#comment-17726030 ]

zuotingbing commented on SPARK-38079:
-------------------------------------

We face the same problem. Is anybody following up on this issue?

pod events:
Warning  FailedMount  13s  kubelet  MountVolume.SetUp failed for volume "hadoop-properties" : configmap "spark-pi-6-a63eb888484c3bf1-hadoop-config" not found
Warning  FailedMount  13s  kubelet  MountVolume.SetUp failed for volume "spark-conf-volume-driver" : configmap "spark-drv-70e59d88484c3e59-conf-map" not found

> Not waiting for configmap before starting driver
> -------------------------------------------------
>
>                 Key: SPARK-38079
>                 URL: https://issues.apache.org/jira/browse/SPARK-38079
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes
>    Affects Versions: 3.2.0, 3.2.1
>            Reporter: Ben
>            Priority: Major
>
> *The problem*
> When you spark-submit to kubernetes in cluster-mode:
> # Kubernetes creates the driver
> # Kubernetes creates a configmap that the driver depends on
> This is a race condition. If the configmap is not created quickly enough, then the driver will fail to start up properly.
> See [this stackoverflow post|https://stackoverflow.com/a/58508313] for an alternate description of this problem.
>
> *To Reproduce*
> # Download Spark 3.2.0 or 3.2.1 from [https://spark.apache.org/downloads.html]
> # Create an image with
> {code:java}
> bin/docker-image-tool.sh{code}
> # Spark submit one of the examples to some kubernetes instance
> # Observe the race condition
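The fix amounts to ordering the two creations: do not start the driver pod until its configmap is visible. Below is a minimal, self-contained sketch of that guard; {{configMapExists}} is a hypothetical placeholder (a real implementation would query the Kubernetes API server through the client Spark uses), so only the wait-before-create ordering is the point.

{code:java}
import scala.concurrent.duration._

// Hypothetical lookup: returns true once the driver's configmap is visible.
// A real implementation would query the Kubernetes API server.
def configMapExists(name: String): Boolean = ???

// Poll until the configmap exists (or the deadline passes) *before*
// creating the driver pod, instead of racing the two creations.
def waitForConfigMap(name: String, timeout: FiniteDuration = 60.seconds): Boolean = {
  val deadline = timeout.fromNow
  while (deadline.hasTimeLeft()) {
    if (configMapExists(name)) return true
    Thread.sleep(500)
  }
  false
}
{code}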
[jira] [Updated] (SPARK-28014) All waiting apps will be changed to the wrong state of Running after master changed
    [ https://issues.apache.org/jira/browse/SPARK-28014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zuotingbing updated SPARK-28014:
--------------------------------
    Summary: All waiting apps will be changed to the wrong state of Running after master changed  (was: All waiting apps will be changed to the wrong state of Running)

> All waiting apps will be changed to the wrong state of Running after master changed
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-28014
>                 URL: https://issues.apache.org/jira/browse/SPARK-28014
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.3
>            Reporter: zuotingbing
>            Priority: Major
>         Attachments: image-2019-06-12-15-36-14-211.png, image-2019-06-12-15-38-45-367.png
>
>
> These waiting apps, which were granted 0 cores, will be changed to the Running state after the master changes, which is a little weird.
>
> before master changed:
> !image-2019-06-12-15-36-14-211.png!
>
> after master changed from zdh112 to zdh113:
> !image-2019-06-12-15-38-45-367.png!
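For context: during standalone-master recovery, recovered WAITING apps are flipped to RUNNING wholesale, which is why an app that was granted 0 cores shows up as Running after failover. Below is a simplified, self-contained model of the guard this report suggests; {{AppInfo}} and its {{coresGranted}} field are stand-ins for Spark's internal {{ApplicationInfo}}, not the actual patch.

{code:java}
// Simplified stand-ins for org.apache.spark.deploy.master internals.
object ApplicationState extends Enumeration {
  val WAITING, RUNNING = Value
}
final case class AppInfo(id: String, var state: ApplicationState.Value, coresGranted: Int)

// Sketch of the guard: on recovery, only flip an app to RUNNING if it
// actually holds granted cores; an app with 0 granted cores stays WAITING.
def completeRecovery(apps: Seq[AppInfo]): Unit =
  apps.filter(_.state == ApplicationState.WAITING).foreach { app =>
    if (app.coresGranted > 0) app.state = ApplicationState.RUNNING
  }
{code}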
[jira] [Updated] (SPARK-28014) All waiting apps will be changed to the wrong state of Running
    [ https://issues.apache.org/jira/browse/SPARK-28014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zuotingbing updated SPARK-28014:
--------------------------------
    Description: 
These waiting apps, which were granted 0 cores, will be changed to the Running state after the master changes, which is a little weird.

before master changed:
!image-2019-06-12-15-36-14-211.png!

after master changed from zdh112 to zdh113:
!image-2019-06-12-15-38-45-367.png!

  was: These waiting apps, which were granted 0 cores, will be changed to the Running state, which is a little weird.

> All waiting apps will be changed to the wrong state of Running
> ----------------------------------------------------------------
>
>                 Key: SPARK-28014
>                 URL: https://issues.apache.org/jira/browse/SPARK-28014
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.3
>            Reporter: zuotingbing
>            Priority: Major
>         Attachments: image-2019-06-12-15-36-14-211.png, image-2019-06-12-15-38-45-367.png
>
>
> These waiting apps, which were granted 0 cores, will be changed to the Running state after the master changes, which is a little weird.
>
> before master changed:
> !image-2019-06-12-15-36-14-211.png!
>
> after master changed from zdh112 to zdh113:
> !image-2019-06-12-15-38-45-367.png!
[jira] [Updated] (SPARK-28014) All waiting apps will be changed to the wrong state of Running
    [ https://issues.apache.org/jira/browse/SPARK-28014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zuotingbing updated SPARK-28014:
--------------------------------
    Attachment: image-2019-06-12-15-38-45-367.png

> All waiting apps will be changed to the wrong state of Running
> ----------------------------------------------------------------
>
>                 Key: SPARK-28014
>                 URL: https://issues.apache.org/jira/browse/SPARK-28014
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.3
>            Reporter: zuotingbing
>            Priority: Major
>         Attachments: image-2019-06-12-15-36-14-211.png, image-2019-06-12-15-38-45-367.png
>
>
> These waiting apps, which were granted 0 cores, will be changed to the Running state, which is a little weird.
[jira] [Updated] (SPARK-28014) All waiting apps will be changed to the wrong state of Running
    [ https://issues.apache.org/jira/browse/SPARK-28014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zuotingbing updated SPARK-28014:
--------------------------------
    Attachment: image-2019-06-12-15-36-14-211.png

> All waiting apps will be changed to the wrong state of Running
> ----------------------------------------------------------------
>
>                 Key: SPARK-28014
>                 URL: https://issues.apache.org/jira/browse/SPARK-28014
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.3
>            Reporter: zuotingbing
>            Priority: Major
>         Attachments: image-2019-06-12-15-36-14-211.png
>
>
> These waiting apps, which were granted 0 cores, will be changed to the Running state, which is a little weird.
[jira] [Updated] (SPARK-28014) All waiting apps will be changed to the wrong state of Running
    [ https://issues.apache.org/jira/browse/SPARK-28014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zuotingbing updated SPARK-28014:
--------------------------------
    Priority: Major  (was: Minor)

> All waiting apps will be changed to the wrong state of Running
> ----------------------------------------------------------------
>
>                 Key: SPARK-28014
>                 URL: https://issues.apache.org/jira/browse/SPARK-28014
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.3
>            Reporter: zuotingbing
>            Priority: Major
>
> These waiting apps, which were granted 0 cores, will be changed to the Running state, which is a little weird.
[jira] [Updated] (SPARK-28014) All waiting apps will be changed to the wrong state of Running
    [ https://issues.apache.org/jira/browse/SPARK-28014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zuotingbing updated SPARK-28014:
--------------------------------
    Description: These waiting apps, which were granted 0 cores, will be changed to the Running state, which is a little weird.

> All waiting apps will be changed to the wrong state of Running
> ----------------------------------------------------------------
>
>                 Key: SPARK-28014
>                 URL: https://issues.apache.org/jira/browse/SPARK-28014
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.3
>            Reporter: zuotingbing
>            Priority: Minor
>
> These waiting apps, which were granted 0 cores, will be changed to the Running state, which is a little weird.
[jira] [Created] (SPARK-28014) All waiting apps will be changed to the wrong state of Running
zuotingbing created SPARK-28014:
-----------------------------------

             Summary: All waiting apps will be changed to the wrong state of Running
                 Key: SPARK-28014
                 URL: https://issues.apache.org/jira/browse/SPARK-28014
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.4.3
            Reporter: zuotingbing
[jira] [Comment Edited] (SPARK-23191) Workers registration failes in case of network drop
    [ https://issues.apache.org/jira/browse/SPARK-23191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16835256#comment-16835256 ]

zuotingbing edited comment on SPARK-23191 at 5/15/19 3:07 AM:
--------------------------------------------------------------

See the detailed logs below; the master changed from vmax18 to vmax17.

On master vmax18, the worker was removed because no heartbeat was received, but a heartbeat soon arrived and the worker was asked to re-register with master vmax18 (this triggers tryRegisterAllMasters(), which includes master vmax17). At the same time, the worker had already been registered with master vmax17 when vmax17 gained leadership. So worker registration failed: Duplicate worker ID.

spark-mr-master-vmax18.log:
{code:java}
2019-03-15 20:22:09,441 INFO ZooKeeperLeaderElectionAgent: We have lost leadership
2019-03-15 20:22:14,544 WARN Master: Removing worker-20190218183101-vmax18-33129 because we got no heartbeat in 60 seconds
2019-03-15 20:22:14,544 INFO Master: Removing worker worker-20190218183101-vmax18-33129 on vmax18:33129
2019-03-15 20:22:14,864 WARN Master: Got heartbeat from unregistered worker worker-20190218183101-vmax18-33129. Asking it to re-register.
2019-03-15 20:22:14,975 ERROR Master: Leadership has been revoked -- master shutting down.
{code}
spark-mr-master-vmax17.log:
{code:java}
2019-03-15 20:22:14,870 INFO Master: Registering worker vmax18:33129 with 21 cores, 125.0 GB RAM
2019-03-15 20:22:15,261 INFO Master: vmax18:33129 got disassociated, removing it.
2019-03-15 20:22:15,263 INFO Master: Removing worker worker-20190218183101-vmax18-33129 on vmax18:33129
2019-03-15 20:22:15,311 ERROR Inbox: Ignoring error
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /spark/master_status/worker_worker-20190218183101-vmax18-33129
{code}
spark-mr-worker-vmax18.log:
{code:java}
2019-03-15 20:22:10,474 INFO Worker: Master has changed, new master is at spark://vmax17:7077
2019-03-15 20:22:14,862 INFO Worker: Master with url spark://vmax18:7077 requested this worker to reconnect.
2019-03-15 20:22:14,865 INFO Worker: Not spawning another attempt to register with the master, since there is an attempt scheduled already.
2019-03-15 20:22:14,879 ERROR Worker: Worker registration failed: Duplicate worker ID
2019-03-15 20:22:14,895 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,896 INFO ShutdownHookManager: Shutdown hook called{code}

PS, this results in another issue: the leader stays in the COMPLETING_RECOVERY state forever. worker-vmax18 shut down because of the duplicate worker ID and cleared the worker's node on the persistence engine (we use ZooKeeper). Then the new leader (master-vmax17) found the worker dead, tried to remove it, and tried to clear its node on ZooKeeper, but the node had already been removed during worker-vmax18's shutdown, so {color:#ff}*an exception was thrown in completeRecovery(). The leader then stays in the COMPLETING_RECOVERY state forever.*{color}

was (Author: zuo.tingbing9):
See the detailed logs below; the master changed from vmax18 to vmax17.

On master vmax18, the worker was removed because no heartbeat was received, but a heartbeat soon arrived and the worker was asked to re-register with master vmax18. At the same time, the worker had already been registered with master vmax17 when vmax17 gained leadership. So worker registration failed: Duplicate worker ID.

spark-mr-master-vmax18.log:
{code:java}
2019-03-15 20:22:09,441 INFO ZooKeeperLeaderElectionAgent: We have lost leadership
2019-03-15 20:22:14,544 WARN Master: Removing worker-20190218183101-vmax18-33129 because we got no heartbeat in 60 seconds
2019-03-15 20:22:14,544 INFO Master: Removing worker worker-20190218183101-vmax18-33129 on vmax18:33129
2019-03-15 20:22:14,864 WARN Master: Got heartbeat from unregistered worker worker-20190218183101-vmax18-33129. Asking it to re-register.
2019-03-15 20:22:14,975 ERROR Master: Leadership has been revoked -- master shutting down.
{code}
spark-mr-master-vmax17.log:
{code:java}
2019-03-15 20:22:14,870 INFO Master: Registering worker vmax18:33129 with 21 cores, 125.0 GB RAM
2019-03-15 20:22:15,261 INFO Master: vmax18:33129 got disassociated, removing it.
2019-03-15 20:22:15,263 INFO Master: Removing worker worker-20190218183101-vmax18-33129 on vmax18:33129
2019-03-15 20:22:15,311 ERROR Inbox: Ignoring error
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /spark/master_status/worker_worker-20190218183101-vmax18-33129
{code}
spark-mr-worker-vmax18.log:
{code:java}
2019-03-15 20:22:10,474 INFO Worker: Master has changed, new master is at spark://vmax17:7077
2019-03-15 20:22:14,862 INFO Worker: Master with url spark://vmax18:7077 requested this worker to reconnect.
2019-03-15 20:22:14,865 INFO Worker: Not spawning another attempt to register with the master, since there is an attempt scheduled already.
2019-03-15 20:22:14,879 ERROR Worker: Worker registration failed: Duplicate worker ID
2019-03-15 20:22:14,895 INFO
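The failure above boils down to a registration race: after failover the new master already holds this worker's ID (recovered via ZooKeeper, or accepted while taking over), so the worker's own scheduled registration attempt is rejected. A toy, self-contained model of the master-side check that produces the "Duplicate worker ID" answer (not Spark's actual code, just the shape of the check):

{code:java}
import scala.collection.mutable

// Toy model: worker ID -> RPC endpoint.
val idToWorker = mutable.Map.empty[String, String]

def registerWorker(id: String, endpoint: String): Either[String, Unit] =
  if (idToWorker.contains(id)) Left("Duplicate worker ID")
  else { idToWorker += (id -> endpoint); Right(()) }

registerWorker("worker-20190218183101-vmax18-33129", "vmax18:33129") // Right(())
registerWorker("worker-20190218183101-vmax18-33129", "vmax18:33129") // Left("Duplicate worker ID")
{code}

In the scenario logged above, the first successful registration is the one triggered while vmax17 was taking over leadership, and the rejected one is the worker's retry prompted by the dying vmax18.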
[jira] [Comment Edited] (SPARK-23191) Workers registration failes in case of network drop
    [ https://issues.apache.org/jira/browse/SPARK-23191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16835256#comment-16835256 ]

zuotingbing edited comment on SPARK-23191 at 5/8/19 2:21 AM:
-------------------------------------------------------------

See the detailed logs below; the master changed from vmax18 to vmax17.

On master vmax18, the worker was removed because no heartbeat was received, but a heartbeat soon arrived and the worker was asked to re-register with master vmax18. At the same time, the worker had already been registered with master vmax17 when vmax17 gained leadership. So worker registration failed: Duplicate worker ID.

spark-mr-master-vmax18.log:
{code:java}
2019-03-15 20:22:09,441 INFO ZooKeeperLeaderElectionAgent: We have lost leadership
2019-03-15 20:22:14,544 WARN Master: Removing worker-20190218183101-vmax18-33129 because we got no heartbeat in 60 seconds
2019-03-15 20:22:14,544 INFO Master: Removing worker worker-20190218183101-vmax18-33129 on vmax18:33129
2019-03-15 20:22:14,864 WARN Master: Got heartbeat from unregistered worker worker-20190218183101-vmax18-33129. Asking it to re-register.
2019-03-15 20:22:14,975 ERROR Master: Leadership has been revoked -- master shutting down.
{code}
spark-mr-master-vmax17.log:
{code:java}
2019-03-15 20:22:14,870 INFO Master: Registering worker vmax18:33129 with 21 cores, 125.0 GB RAM
2019-03-15 20:22:15,261 INFO Master: vmax18:33129 got disassociated, removing it.
2019-03-15 20:22:15,263 INFO Master: Removing worker worker-20190218183101-vmax18-33129 on vmax18:33129
2019-03-15 20:22:15,311 ERROR Inbox: Ignoring error
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /spark/master_status/worker_worker-20190218183101-vmax18-33129
{code}
spark-mr-worker-vmax18.log:
{code:java}
2019-03-15 20:22:10,474 INFO Worker: Master has changed, new master is at spark://vmax17:7077
2019-03-15 20:22:14,862 INFO Worker: Master with url spark://vmax18:7077 requested this worker to reconnect.
2019-03-15 20:22:14,865 INFO Worker: Not spawning another attempt to register with the master, since there is an attempt scheduled already.
2019-03-15 20:22:14,879 ERROR Worker: Worker registration failed: Duplicate worker ID
2019-03-15 20:22:14,895 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,896 INFO ShutdownHookManager: Shutdown hook called{code}

PS, this results in another issue: the leader stays in the COMPLETING_RECOVERY state forever. worker-vmax18 shut down because of the duplicate worker ID and cleared the worker's node on the persistence engine (we use ZooKeeper). Then the new leader (master-vmax17) found the worker dead, tried to remove it, and tried to clear its node on ZooKeeper, but the node had already been removed during worker-vmax18's shutdown, so {color:#ff}*an exception was thrown in completeRecovery(). The leader then stays in the COMPLETING_RECOVERY state forever.*{color}

was (Author: zuo.tingbing9):
See the detailed logs below; the master changed from vmax18 to vmax17.

On master vmax18, the worker was removed because no heartbeat was received, but a heartbeat soon arrived and the worker was asked to re-register with master vmax18. At the same time, the worker had already been registered with master vmax17 when vmax17 gained leadership. So worker registration failed: Duplicate worker ID.

spark-mr-master-vmax18.log:
{code:java}
2019-03-15 20:22:09,441 INFO ZooKeeperLeaderElectionAgent: We have lost leadership
2019-03-15 20:22:14,544 WARN Master: Removing worker-20190218183101-vmax18-33129 because we got no heartbeat in 60 seconds
2019-03-15 20:22:14,544 INFO Master: Removing worker worker-20190218183101-vmax18-33129 on vmax18:33129
2019-03-15 20:22:14,864 WARN Master: Got heartbeat from unregistered worker worker-20190218183101-vmax18-33129. Asking it to re-register.
2019-03-15 20:22:14,975 ERROR Master: Leadership has been revoked -- master shutting down.
{code}
spark-mr-master-vmax17.log:
{code:java}
2019-03-15 20:22:14,870 INFO Master: Registering worker vmax18:33129 with 21 cores, 125.0 GB RAM
2019-03-15 20:22:15,261 INFO Master: vmax18:33129 got disassociated, removing it.
2019-03-15 20:22:15,263 INFO Master: Removing worker worker-20190218183101-vmax18-33129 on vmax18:33129
2019-03-15 20:22:15,311 ERROR Inbox: Ignoring error
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /spark/master_status/worker_worker-20190218183101-vmax18-33129
{code}
spark-mr-worker-vmax18.log:
{code:java}
2019-03-15 20:22:10,474 INFO Worker: Master has changed, new master is at spark://vmax17:7077
2019-03-15 20:22:14,862 INFO Worker: Master with url spark://vmax18:7077 requested this worker to reconnect.
2019-03-15 20:22:14,865 INFO Worker: Not spawning another attempt to register with the master, since there is an attempt scheduled already.
2019-03-15 20:22:14,879 ERROR Worker: Worker registration failed: Duplicate worker ID
2019-03-15 20:22:14,895 INFO ExecutorRunner: Killing process!
2019-03-15
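On the stuck COMPLETING_RECOVERY state described in the PS: the NoNodeException escapes when the new leader tries to delete a worker znode that the worker's own shutdown already removed. A hedged sketch of the defensive cleanup being argued for follows; the {{PersistenceEngine}} trait here is a hypothetical stand-in for Spark's ZooKeeper-backed engine, not its real interface.

{code:java}
import scala.util.control.NonFatal

// Hypothetical stand-in for the ZooKeeper-backed persistence engine.
trait PersistenceEngine { def removeWorker(id: String): Unit }

// Sketch: if the worker's znode is already gone (e.g. NoNodeException),
// swallow the error and keep recovering, so the master is not left in
// COMPLETING_RECOVERY forever.
def removeDeadWorker(engine: PersistenceEngine, workerId: String): Unit =
  try engine.removeWorker(workerId)
  catch {
    case NonFatal(e) =>
      println(s"Ignoring failure to remove $workerId during recovery: ${e.getMessage}")
  }
{code}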
[jira] [Commented] (SPARK-23191) Workers registration failes in case of network drop
    [ https://issues.apache.org/jira/browse/SPARK-23191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16835256#comment-16835256 ]

zuotingbing commented on SPARK-23191:
-------------------------------------

See the detailed logs below; the master changed from vmax18 to vmax17.

On master vmax18, the worker was removed because no heartbeat was received, but a heartbeat soon arrived and the worker was asked to re-register with master vmax18. At the same time, the worker had already been registered with master vmax17 when vmax17 gained leadership. So worker registration failed: Duplicate worker ID.

spark-mr-master-vmax18.log:
{code:java}
2019-03-15 20:22:09,441 INFO ZooKeeperLeaderElectionAgent: We have lost leadership
2019-03-15 20:22:14,544 WARN Master: Removing worker-20190218183101-vmax18-33129 because we got no heartbeat in 60 seconds
2019-03-15 20:22:14,544 INFO Master: Removing worker worker-20190218183101-vmax18-33129 on vmax18:33129
2019-03-15 20:22:14,864 WARN Master: Got heartbeat from unregistered worker worker-20190218183101-vmax18-33129. Asking it to re-register.
2019-03-15 20:22:14,975 ERROR Master: Leadership has been revoked -- master shutting down.
{code}
spark-mr-master-vmax17.log:
{code:java}
2019-03-15 20:22:14,870 INFO Master: Registering worker vmax18:33129 with 21 cores, 125.0 GB RAM
2019-03-15 20:22:15,261 INFO Master: vmax18:33129 got disassociated, removing it.
2019-03-15 20:22:15,263 INFO Master: Removing worker worker-20190218183101-vmax18-33129 on vmax18:33129
2019-03-15 20:22:15,311 ERROR Inbox: Ignoring error
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /spark/master_status/worker_worker-20190218183101-vmax18-33129
{code}
spark-mr-worker-vmax18.log:
{code:java}
2019-03-15 20:22:10,474 INFO Worker: Master has changed, new master is at spark://vmax17:7077
2019-03-15 20:22:14,862 INFO Worker: Master with url spark://vmax18:7077 requested this worker to reconnect.
2019-03-15 20:22:14,865 INFO Worker: Not spawning another attempt to register with the master, since there is an attempt scheduled already.
2019-03-15 20:22:14,879 ERROR Worker: Worker registration failed: Duplicate worker ID
2019-03-15 20:22:14,895 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,896 INFO ShutdownHookManager: Shutdown hook called{code}

PS, this results in another issue: the leader stays in the COMPLETING_RECOVERY state forever. worker-vmax18 shut down because of the duplicate worker ID and cleared the worker's node on the persistence engine (we use ZooKeeper). Then the new leader (master-vmax17) found the worker dead, tried to remove it, and tried to clear its node on ZooKeeper, but the node had already been removed during worker-vmax18's shutdown, so {color:#FF}*an exception was thrown in completeRecovery(). The leader then stays in the COMPLETING_RECOVERY state forever.*{color}

> Workers registration failes in case of network drop
> ----------------------------------------------------
>
>                 Key: SPARK-23191
>                 URL: https://issues.apache.org/jira/browse/SPARK-23191
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.6.3, 2.2.1, 2.3.0
>         Environment: OS:- Centos 6.9(64 bit)
>
>            Reporter: Neeraj Gupta
>            Priority: Critical
>
> We have a 3 node cluster. We were facing issues of multiple drivers running in some scenarios in production.
> On further investigation we were able to reproduce the scenario in both 1.6.3 and 2.2.1 with the following steps:
> # Setup a 3 node cluster. Start master and slaves.
> # On any node where the worker process is running, block the connections on port 7077 using iptables.
> {code:java}
> iptables -A OUTPUT -p tcp --dport 7077 -j DROP
> {code}
> # After about 10-15 secs we get the error on the node that it is unable to connect to the master.
> {code:java}
> 2018-01-23 12:08:51,639 [rpc-client-1-1] WARN  org.apache.spark.network.server.TransportChannelHandler - Exception in connection from
> java.io.IOException: Connection timed out
>         at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>         at sun.nio.ch.IOUtil.read(IOUtil.java:192)
>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
>         at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:221)
>         at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:899)
>         at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:275)
>         at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
>         at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:643)
>         at
[jira] [Comment Edited] (SPARK-23191) Workers registration failes in case of network drop
    [ https://issues.apache.org/jira/browse/SPARK-23191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829051#comment-16829051 ]

zuotingbing edited comment on SPARK-23191 at 4/29/19 9:28 AM:
--------------------------------------------------------------

We faced the same issue in standalone HA mode. Could you please take a look at this issue?
{code:java}
2019-03-15 20:22:10,474 INFO Worker: Master has changed, new master is at spark://vmax17:7077
2019-03-15 20:22:14,862 INFO Worker: Master with url spark://vmax18:7077 requested this worker to reconnect.
2019-03-15 20:22:14,863 INFO Worker: Connecting to master vmax18:7077...
2019-03-15 20:22:14,863 INFO Worker: Connecting to master vmax17:7077...
2019-03-15 20:22:14,865 INFO Worker: Master with url spark://vmax18:7077 requested this worker to reconnect.
2019-03-15 20:22:14,865 INFO Worker: Not spawning another attempt to register with the master, since there is an attempt scheduled already.
2019-03-15 20:22:14,868 INFO Worker: Master with url spark://vmax18:7077 requested this worker to reconnect.
2019-03-15 20:22:14,868 INFO Worker: Not spawning another attempt to register with the master, since there is an attempt scheduled already.
2019-03-15 20:22:14,871 INFO Worker: Master with url spark://vmax18:7077 requested this worker to reconnect.
2019-03-15 20:22:14,871 INFO Worker: Not spawning another attempt to register with the master, since there is an attempt scheduled already.
2019-03-15 20:22:14,879 ERROR Worker: Worker registration failed: Duplicate worker ID
2019-03-15 20:22:14,891 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,891 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,893 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,893 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,893 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,894 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,894 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,894 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,894 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,894 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,894 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,894 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,894 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,895 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,895 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,895 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,895 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,895 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,896 INFO ShutdownHookManager: Shutdown hook called
2019-03-15 20:22:14,898 INFO ShutdownHookManager: Deleting directory /data4/zdh/spark/tmp/spark-c578bf32-6a5e-44a5-843b-c796f44648ee
2019-03-15 20:22:14,908 INFO ShutdownHookManager: Deleting directory /data3/zdh/spark/tmp/spark-7e57e77d-cbb7-47d3-a6dd-737b57788533
2019-03-15 20:22:14,920 INFO ShutdownHookManager: Deleting directory /data2/zdh/spark/tmp/spark-0beebf20-abbd-4d99-a401-3ef0e88e0b05{code}
[~andrewor14] [~cloud_fan] [~vanzin]

was (Author: zuo.tingbing9):
We faced the same issue in standalone HA mode. Could you please take a look at this issue?

[~andrewor14] [~cloud_fan] [~vanzin]

> Workers registration failes in case of network drop
> ----------------------------------------------------
>
>                 Key: SPARK-23191
>                 URL: https://issues.apache.org/jira/browse/SPARK-23191
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.6.3, 2.2.1, 2.3.0
>         Environment: OS:- Centos 6.9(64 bit)
>
>            Reporter: Neeraj Gupta
>            Priority: Critical
>
> We have a 3 node cluster. We were facing issues of multiple drivers running in some scenarios in production.
> On further investigation we were able to reproduce the scenario in both 1.6.3 and 2.2.1 with the following steps:
> # Setup a 3 node cluster. Start master and slaves.
> # On any node where the worker process is running, block the connections on port 7077 using iptables.
> {code:java}
> iptables -A OUTPUT -p tcp --dport 7077 -j DROP
> {code}
> # After about 10-15 secs we get the error on the node that it is unable to connect to the master.
> {code:java}
> 2018-01-23 12:08:51,639 [rpc-client-1-1] WARN  org.apache.spark.network.server.TransportChannelHandler - Exception in connection from
> java.io.IOException: Connection timed out
>         at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>         at sun.nio.ch.IOUtil.read(IOUtil.java:192)
>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
>         at
[jira] [Commented] (SPARK-23191) Workers registration failes in case of network drop
    [ https://issues.apache.org/jira/browse/SPARK-23191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829051#comment-16829051 ]

zuotingbing commented on SPARK-23191:
-------------------------------------

We faced the same issue in standalone HA mode. Could you please take a look at this issue?

[~andrewor14] [~cloud_fan] [~vanzin]

> Workers registration failes in case of network drop
> ----------------------------------------------------
>
>                 Key: SPARK-23191
>                 URL: https://issues.apache.org/jira/browse/SPARK-23191
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.6.3, 2.2.1, 2.3.0
>         Environment: OS:- Centos 6.9(64 bit)
>
>            Reporter: Neeraj Gupta
>            Priority: Critical
>
> We have a 3 node cluster. We were facing issues of multiple drivers running in some scenarios in production.
> On further investigation we were able to reproduce the scenario in both 1.6.3 and 2.2.1 with the following steps:
> # Setup a 3 node cluster. Start master and slaves.
> # On any node where the worker process is running, block the connections on port 7077 using iptables.
> {code:java}
> iptables -A OUTPUT -p tcp --dport 7077 -j DROP
> {code}
> # After about 10-15 secs we get the error on the node that it is unable to connect to the master.
> {code:java}
> 2018-01-23 12:08:51,639 [rpc-client-1-1] WARN  org.apache.spark.network.server.TransportChannelHandler - Exception in connection from
> java.io.IOException: Connection timed out
>         at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>         at sun.nio.ch.IOUtil.read(IOUtil.java:192)
>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
>         at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:221)
>         at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:899)
>         at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:275)
>         at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
>         at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:643)
>         at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:566)
>         at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:480)
>         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:442)
>         at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
>         at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
>         at java.lang.Thread.run(Thread.java:745)
> 2018-01-23 12:08:51,647 [dispatcher-event-loop-0] ERROR org.apache.spark.deploy.worker.Worker - Connection to master failed! Waiting for master to reconnect...
> 2018-01-23 12:08:51,647 [dispatcher-event-loop-0] ERROR org.apache.spark.deploy.worker.Worker - Connection to master failed! Waiting for master to reconnect...
> {code}
> # Once we get this exception we re-enable the connections to port 7077 using
> {code:java}
> iptables -D OUTPUT -p tcp --dport 7077 -j DROP
> {code}
> # Worker tries to register again with the master but is unable to do so. It gives the following error
> {code:java}
> 2018-01-23 12:08:58,657 [worker-register-master-threadpool-2] WARN  org.apache.spark.deploy.worker.Worker - Failed to connect to master :7077
> org.apache.spark.SparkException: Exception thrown in awaitResult:
>         at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
>         at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
>         at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:100)
>         at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:108)
>         at org.apache.spark.deploy.worker.Worker$$anonfun$org$apache$spark$deploy$worker$Worker$$tryRegisterAllMasters$1$$anon$1.run(Worker.scala:241)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Failed to connect to :7077
>         at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:232)
>         at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:182)
>         at
[jira] [Commented] (SPARK-16190) Worker registration failed: Duplicate worker ID
    [ https://issues.apache.org/jira/browse/SPARK-16190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16796772#comment-16796772 ]

zuotingbing commented on SPARK-16190:
-------------------------------------

I faced the same issue. Worker log as follows:
{code:java}
// code placeholder
{code}
2019-03-15 20:22:10,474 INFO Worker: Master has changed, new master is at spark://vmax17:7077
2019-03-15 20:22:14,862 INFO Worker: Master with url spark://vmax18:7077 requested this worker to reconnect.
2019-03-15 20:22:14,863 INFO Worker: Connecting to master vmax18:7077...
2019-03-15 20:22:14,863 INFO Worker: Connecting to master vmax17:7077...
2019-03-15 20:22:14,865 INFO Worker: Master with url spark://vmax18:7077 requested this worker to reconnect.
2019-03-15 20:22:14,865 INFO Worker: Not spawning another attempt to register with the master, since there is an attempt scheduled already.
2019-03-15 20:22:14,868 INFO Worker: Master with url spark://vmax18:7077 requested this worker to reconnect.
2019-03-15 20:22:14,868 INFO Worker: Not spawning another attempt to register with the master, since there is an attempt scheduled already.
2019-03-15 20:22:14,871 INFO Worker: Master with url spark://vmax18:7077 requested this worker to reconnect.
2019-03-15 20:22:14,871 INFO Worker: Not spawning another attempt to register with the master, since there is an attempt scheduled already.
2019-03-15 20:22:14,879 ERROR Worker: Worker registration failed: Duplicate worker ID
2019-03-15 20:22:14,891 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,891 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,893 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,893 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,893 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,894 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,894 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,894 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,894 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,894 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,894 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,894 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,894 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,895 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,895 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,895 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,895 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,895 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,896 INFO ShutdownHookManager: Shutdown hook called
2019-03-15 20:22:14,898 INFO ShutdownHookManager: Deleting directory /data4/zdh/spark/tmp/spark-c578bf32-6a5e-44a5-843b-c796f44648ee
2019-03-15 20:22:14,908 INFO ShutdownHookManager: Deleting directory /data3/zdh/spark/tmp/spark-7e57e77d-cbb7-47d3-a6dd-737b57788533
2019-03-15 20:22:14,920 INFO ShutdownHookManager: Deleting directory /data2/zdh/spark/tmp/spark-0beebf20-abbd-4d99-a401-3ef0e88e0b05

> Worker registration failed: Duplicate worker ID
> ------------------------------------------------
>
>                 Key: SPARK-16190
>                 URL: https://issues.apache.org/jira/browse/SPARK-16190
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler
>    Affects Versions: 1.6.1
>            Reporter: Thomas Huang
>            Priority: Minor
>         Attachments: spark-mqq-org.apache.spark.deploy.worker.Worker-1-slave19.out, spark-mqq-org.apache.spark.deploy.worker.Worker-1-slave2.out, spark-mqq-org.apache.spark.deploy.worker.Worker-1-slave7.out, spark-mqq-org.apache.spark.deploy.worker.Worker-1-slave8.out
>
>
> Several workers crashed simultaneously due to this error:
> Worker registration failed: Duplicate worker ID
> This is the worker log on one of those crashed workers:
> 16/06/24 16:28:53 INFO ExecutorRunner: Killing process!
> 16/06/24 16:28:53 INFO ExecutorRunner: Runner thread for executor app-20160624003013-0442/26 interrupted
> 16/06/24 16:28:53 INFO ExecutorRunner: Killing process!
> 16/06/24 16:29:03 WARN ExecutorRunner: Failed to terminate process: java.lang.UNIXProcess@31340137. This process will likely be orphaned.
> 16/06/24 16:29:03 WARN ExecutorRunner: Failed to terminate process: java.lang.UNIXProcess@4d3bdb1d. This process will likely be orphaned.
> 16/06/24 16:29:03 INFO Worker: Executor app-20160624003013-0442/8 finished with state KILLED
> 16/06/24 16:29:03 INFO Worker: Executor app-20160624003013-0442/26 finished with state KILLED
> 16/06/24 16:29:03 INFO Worker: Cleaning up local directories for application app-20160624003013-0442
> 16/06/24 16:31:18 INFO ExternalShuffleBlockResolver: Application app-20160624003013-0442 removed, cleanupLocalDirs = true
> 16/06/24 16:31:18 INFO Worker: Asked to launch executor
[jira] [Comment Edited] (SPARK-16190) Worker registration failed: Duplicate worker ID
    [ https://issues.apache.org/jira/browse/SPARK-16190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16796772#comment-16796772 ]

zuotingbing edited comment on SPARK-16190 at 3/20/19 3:57 AM:
--------------------------------------------------------------

I faced the same issue. Worker log as follows:
{code:java}
2019-03-15 20:22:10,474 INFO Worker: Master has changed, new master is at spark://vmax17:7077
2019-03-15 20:22:14,862 INFO Worker: Master with url spark://vmax18:7077 requested this worker to reconnect.
2019-03-15 20:22:14,863 INFO Worker: Connecting to master vmax18:7077...
2019-03-15 20:22:14,863 INFO Worker: Connecting to master vmax17:7077...
2019-03-15 20:22:14,865 INFO Worker: Master with url spark://vmax18:7077 requested this worker to reconnect.
2019-03-15 20:22:14,865 INFO Worker: Not spawning another attempt to register with the master, since there is an attempt scheduled already.
2019-03-15 20:22:14,868 INFO Worker: Master with url spark://vmax18:7077 requested this worker to reconnect.
2019-03-15 20:22:14,868 INFO Worker: Not spawning another attempt to register with the master, since there is an attempt scheduled already.
2019-03-15 20:22:14,871 INFO Worker: Master with url spark://vmax18:7077 requested this worker to reconnect.
2019-03-15 20:22:14,871 INFO Worker: Not spawning another attempt to register with the master, since there is an attempt scheduled already.
2019-03-15 20:22:14,879 ERROR Worker: Worker registration failed: Duplicate worker ID
2019-03-15 20:22:14,891 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,891 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,893 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,893 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,893 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,894 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,894 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,894 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,894 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,894 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,894 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,894 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,894 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,895 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,895 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,895 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,895 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,895 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,896 INFO ShutdownHookManager: Shutdown hook called
2019-03-15 20:22:14,898 INFO ShutdownHookManager: Deleting directory /data4/zdh/spark/tmp/spark-c578bf32-6a5e-44a5-843b-c796f44648ee
2019-03-15 20:22:14,908 INFO ShutdownHookManager: Deleting directory /data3/zdh/spark/tmp/spark-7e57e77d-cbb7-47d3-a6dd-737b57788533
2019-03-15 20:22:14,920 INFO ShutdownHookManager: Deleting directory /data2/zdh/spark/tmp/spark-0beebf20-abbd-4d99-a401-3ef0e88e0b05{code}

was (Author: zuo.tingbing9):
I faced the same issue. Worker log as follows:
{code:java}
// code placeholder
{code}
2019-03-15 20:22:10,474 INFO Worker: Master has changed, new master is at spark://vmax17:7077
2019-03-15 20:22:14,862 INFO Worker: Master with url spark://vmax18:7077 requested this worker to reconnect.
2019-03-15 20:22:14,863 INFO Worker: Connecting to master vmax18:7077...
2019-03-15 20:22:14,863 INFO Worker: Connecting to master vmax17:7077...
2019-03-15 20:22:14,865 INFO Worker: Master with url spark://vmax18:7077 requested this worker to reconnect.
2019-03-15 20:22:14,865 INFO Worker: Not spawning another attempt to register with the master, since there is an attempt scheduled already.
2019-03-15 20:22:14,868 INFO Worker: Master with url spark://vmax18:7077 requested this worker to reconnect.
2019-03-15 20:22:14,868 INFO Worker: Not spawning another attempt to register with the master, since there is an attempt scheduled already.
2019-03-15 20:22:14,871 INFO Worker: Master with url spark://vmax18:7077 requested this worker to reconnect.
2019-03-15 20:22:14,871 INFO Worker: Not spawning another attempt to register with the master, since there is an attempt scheduled already.
2019-03-15 20:22:14,879 ERROR Worker: Worker registration failed: Duplicate worker ID
2019-03-15 20:22:14,891 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,891 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,893 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,893 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,893 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,894 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,894 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,894 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,894
[jira] [Comment Edited] (SPARK-16190) Worker registration failed: Duplicate worker ID
    [ https://issues.apache.org/jira/browse/SPARK-16190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16796772#comment-16796772 ]

zuotingbing edited comment on SPARK-16190 at 3/20/19 4:02 AM:
--------------------------------------------------------------

I faced the same issue in standalone HA mode. Worker log as follows:
{code:java}
2019-03-15 20:22:10,474 INFO Worker: Master has changed, new master is at spark://vmax17:7077
2019-03-15 20:22:14,862 INFO Worker: Master with url spark://vmax18:7077 requested this worker to reconnect.
2019-03-15 20:22:14,863 INFO Worker: Connecting to master vmax18:7077...
2019-03-15 20:22:14,863 INFO Worker: Connecting to master vmax17:7077...
2019-03-15 20:22:14,865 INFO Worker: Master with url spark://vmax18:7077 requested this worker to reconnect.
2019-03-15 20:22:14,865 INFO Worker: Not spawning another attempt to register with the master, since there is an attempt scheduled already.
2019-03-15 20:22:14,868 INFO Worker: Master with url spark://vmax18:7077 requested this worker to reconnect.
2019-03-15 20:22:14,868 INFO Worker: Not spawning another attempt to register with the master, since there is an attempt scheduled already.
2019-03-15 20:22:14,871 INFO Worker: Master with url spark://vmax18:7077 requested this worker to reconnect.
2019-03-15 20:22:14,871 INFO Worker: Not spawning another attempt to register with the master, since there is an attempt scheduled already.
2019-03-15 20:22:14,879 ERROR Worker: Worker registration failed: Duplicate worker ID
2019-03-15 20:22:14,891 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,891 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,893 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,893 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,893 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,894 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,894 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,894 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,894 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,894 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,894 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,894 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,894 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,895 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,895 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,895 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,895 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,895 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,896 INFO ShutdownHookManager: Shutdown hook called
2019-03-15 20:22:14,898 INFO ShutdownHookManager: Deleting directory /data4/zdh/spark/tmp/spark-c578bf32-6a5e-44a5-843b-c796f44648ee
2019-03-15 20:22:14,908 INFO ShutdownHookManager: Deleting directory /data3/zdh/spark/tmp/spark-7e57e77d-cbb7-47d3-a6dd-737b57788533
2019-03-15 20:22:14,920 INFO ShutdownHookManager: Deleting directory /data2/zdh/spark/tmp/spark-0beebf20-abbd-4d99-a401-3ef0e88e0b05{code}

was (Author: zuo.tingbing9):
I faced the same issue. Worker log as follows:
{code:java}
2019-03-15 20:22:10,474 INFO Worker: Master has changed, new master is at spark://vmax17:7077
2019-03-15 20:22:14,862 INFO Worker: Master with url spark://vmax18:7077 requested this worker to reconnect.
2019-03-15 20:22:14,863 INFO Worker: Connecting to master vmax18:7077...
2019-03-15 20:22:14,863 INFO Worker: Connecting to master vmax17:7077...
2019-03-15 20:22:14,865 INFO Worker: Master with url spark://vmax18:7077 requested this worker to reconnect.
2019-03-15 20:22:14,865 INFO Worker: Not spawning another attempt to register with the master, since there is an attempt scheduled already.
2019-03-15 20:22:14,868 INFO Worker: Master with url spark://vmax18:7077 requested this worker to reconnect.
2019-03-15 20:22:14,868 INFO Worker: Not spawning another attempt to register with the master, since there is an attempt scheduled already.
2019-03-15 20:22:14,871 INFO Worker: Master with url spark://vmax18:7077 requested this worker to reconnect.
2019-03-15 20:22:14,871 INFO Worker: Not spawning another attempt to register with the master, since there is an attempt scheduled already.
2019-03-15 20:22:14,879 ERROR Worker: Worker registration failed: Duplicate worker ID
2019-03-15 20:22:14,891 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,891 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,893 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,893 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,893 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,894 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,894 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,894 INFO ExecutorRunner: Killing process!
2019-03-15
[jira] [Comment Edited] (SPARK-16190) Worker registration failed: Duplicate worker ID
    [ https://issues.apache.org/jira/browse/SPARK-16190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16796772#comment-16796772 ]

zuotingbing edited comment on SPARK-16190 at 3/20/19 3:59 AM:
--------------------------------------------------------------

I faced the same issue. Worker log as follows:
{code:java}
2019-03-15 20:22:10,474 INFO Worker: Master has changed, new master is at spark://vmax17:7077
2019-03-15 20:22:14,862 INFO Worker: Master with url spark://vmax18:7077 requested this worker to reconnect.
2019-03-15 20:22:14,863 INFO Worker: Connecting to master vmax18:7077...
2019-03-15 20:22:14,863 INFO Worker: Connecting to master vmax17:7077...
2019-03-15 20:22:14,865 INFO Worker: Master with url spark://vmax18:7077 requested this worker to reconnect.
2019-03-15 20:22:14,865 INFO Worker: Not spawning another attempt to register with the master, since there is an attempt scheduled already.
2019-03-15 20:22:14,868 INFO Worker: Master with url spark://vmax18:7077 requested this worker to reconnect.
2019-03-15 20:22:14,868 INFO Worker: Not spawning another attempt to register with the master, since there is an attempt scheduled already.
2019-03-15 20:22:14,871 INFO Worker: Master with url spark://vmax18:7077 requested this worker to reconnect.
2019-03-15 20:22:14,871 INFO Worker: Not spawning another attempt to register with the master, since there is an attempt scheduled already.
2019-03-15 20:22:14,879 ERROR Worker: Worker registration failed: Duplicate worker ID
2019-03-15 20:22:14,891 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,891 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,893 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,893 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,893 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,894 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,894 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,894 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,894 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,894 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,894 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,894 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,894 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,895 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,895 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,895 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,895 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,895 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,896 INFO ShutdownHookManager: Shutdown hook called
2019-03-15 20:22:14,898 INFO ShutdownHookManager: Deleting directory /data4/zdh/spark/tmp/spark-c578bf32-6a5e-44a5-843b-c796f44648ee
2019-03-15 20:22:14,908 INFO ShutdownHookManager: Deleting directory /data3/zdh/spark/tmp/spark-7e57e77d-cbb7-47d3-a6dd-737b57788533
2019-03-15 20:22:14,920 INFO ShutdownHookManager: Deleting directory /data2/zdh/spark/tmp/spark-0beebf20-abbd-4d99-a401-3ef0e88e0b05{code}

was (Author: zuo.tingbing9):
I faced the same issue. Worker log as follows:
{code:java}
2019-03-15 20:22:10,474 INFO Worker: Master has changed, new master is at spark://vmax17:7077
2019-03-15 20:22:14,862 INFO Worker: Master with url spark://vmax18:7077 requested this worker to reconnect.
2019-03-15 20:22:14,863 INFO Worker: Connecting to master vmax18:7077...
2019-03-15 20:22:14,863 INFO Worker: Connecting to master vmax17:7077...
2019-03-15 20:22:14,865 INFO Worker: Master with url spark://vmax18:7077 requested this worker to reconnect.
2019-03-15 20:22:14,865 INFO Worker: Not spawning another attempt to register with the master, since there is an attempt scheduled already.
2019-03-15 20:22:14,868 INFO Worker: Master with url spark://vmax18:7077 requested this worker to reconnect.
2019-03-15 20:22:14,868 INFO Worker: Not spawning another attempt to register with the master, since there is an attempt scheduled already.
2019-03-15 20:22:14,871 INFO Worker: Master with url spark://vmax18:7077 requested this worker to reconnect.
2019-03-15 20:22:14,871 INFO Worker: Not spawning another attempt to register with the master, since there is an attempt scheduled already.
2019-03-15 20:22:14,879 ERROR Worker: Worker registration failed: Duplicate worker ID
2019-03-15 20:22:14,891 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,891 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,893 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,893 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,893 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,894 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,894 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,894 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,894 INFO ExecutorRunner:
[jira] [Comment Edited] (SPARK-16190) Worker registration failed: Duplicate worker ID
    [ https://issues.apache.org/jira/browse/SPARK-16190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16796772#comment-16796772 ]

zuotingbing edited comment on SPARK-16190 at 3/20/19 4:00 AM:
--------------------------------------------------------------

I faced the same issue. Worker log as follows:
{code:java}
2019-03-15 20:22:10,474 INFO Worker: Master has changed, new master is at spark://vmax17:7077
2019-03-15 20:22:14,862 INFO Worker: Master with url spark://vmax18:7077 requested this worker to reconnect.
2019-03-15 20:22:14,863 INFO Worker: Connecting to master vmax18:7077...
2019-03-15 20:22:14,863 INFO Worker: Connecting to master vmax17:7077...
2019-03-15 20:22:14,865 INFO Worker: Master with url spark://vmax18:7077 requested this worker to reconnect.
2019-03-15 20:22:14,865 INFO Worker: Not spawning another attempt to register with the master, since there is an attempt scheduled already.
2019-03-15 20:22:14,868 INFO Worker: Master with url spark://vmax18:7077 requested this worker to reconnect.
2019-03-15 20:22:14,868 INFO Worker: Not spawning another attempt to register with the master, since there is an attempt scheduled already.
2019-03-15 20:22:14,871 INFO Worker: Master with url spark://vmax18:7077 requested this worker to reconnect.
2019-03-15 20:22:14,871 INFO Worker: Not spawning another attempt to register with the master, since there is an attempt scheduled already.
2019-03-15 20:22:14,879 ERROR Worker: Worker registration failed: Duplicate worker ID
2019-03-15 20:22:14,891 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,891 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,893 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,893 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,893 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,894 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,894 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,894 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,894 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,894 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,894 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,894 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,894 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,895 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,895 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,895 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,895 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,895 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,896 INFO ShutdownHookManager: Shutdown hook called
2019-03-15 20:22:14,898 INFO ShutdownHookManager: Deleting directory /data4/zdh/spark/tmp/spark-c578bf32-6a5e-44a5-843b-c796f44648ee
2019-03-15 20:22:14,908 INFO ShutdownHookManager: Deleting directory /data3/zdh/spark/tmp/spark-7e57e77d-cbb7-47d3-a6dd-737b57788533
2019-03-15 20:22:14,920 INFO ShutdownHookManager: Deleting directory /data2/zdh/spark/tmp/spark-0beebf20-abbd-4d99-a401-3ef0e88e0b05{code}

was (Author: zuo.tingbing9):
I faced the same issue. Worker log as follows:
{code:java}
2019-03-15 20:22:10,474 INFO Worker: Master has changed, new master is at spark://vmax17:7077
2019-03-15 20:22:14,862 INFO Worker: Master with url spark://vmax18:7077 requested this worker to reconnect.
2019-03-15 20:22:14,863 INFO Worker: Connecting to master vmax18:7077...
2019-03-15 20:22:14,863 INFO Worker: Connecting to master vmax17:7077...
2019-03-15 20:22:14,865 INFO Worker: Master with url spark://vmax18:7077 requested this worker to reconnect.
2019-03-15 20:22:14,865 INFO Worker: Not spawning another attempt to register with the master, since there is an attempt scheduled already.
2019-03-15 20:22:14,868 INFO Worker: Master with url spark://vmax18:7077 requested this worker to reconnect.
2019-03-15 20:22:14,868 INFO Worker: Not spawning another attempt to register with the master, since there is an attempt scheduled already.
2019-03-15 20:22:14,871 INFO Worker: Master with url spark://vmax18:7077 requested this worker to reconnect.
2019-03-15 20:22:14,871 INFO Worker: Not spawning another attempt to register with the master, since there is an attempt scheduled already.
2019-03-15 20:22:14,879 ERROR Worker: Worker registration failed: Duplicate worker ID
2019-03-15 20:22:14,891 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,891 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,893 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,893 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,893 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,894 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,894 INFO ExecutorRunner: Killing process!
2019-03-15 20:22:14,894 INFO
[jira] [Comment Edited] (SPARK-27010) find out the actual port number when hive.server2.thrift.port=0
[ https://issues.apache.org/jira/browse/SPARK-27010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16781160#comment-16781160 ] zuotingbing edited comment on SPARK-27010 at 3/1/19 1:27 AM: - Currently, if we set *SPARK_MASTER_PORT=0*, we can easily find out the actual port number in the log, which helps us get the correct *spark.master* address. !2019-03-01_092511.png! was (Author: zuo.tingbing9): Currently, if we set *SPARK_MASTER_PORT=0*, we can easily find out the actual port number in the log, which helps us get the correct *spark.master* address. !2019-03-01_090847.png! > find out the actual port number when hive.server2.thrift.port=0 > --- > > Key: SPARK-27010 > URL: https://issues.apache.org/jira/browse/SPARK-27010 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: zuotingbing >Priority: Minor > Attachments: 2019-02-28_170844.png, 2019-02-28_170904.png, > 2019-02-28_170942.png, 2019-03-01_092511.png > > > Currently, if we set *hive.server2.thrift.port=0*, it is hard to find out the > actual port number to use when connecting with beeline. > before: > !2019-02-28_170942.png! > after: > !2019-02-28_170904.png! > using beeline to connect succeeds: > !2019-02-28_170844.png! > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
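For context, port 0 asks the OS to pick any free port, so the chosen port can only be discovered from the bound socket afterwards, which is why logging it (as the master already does for SPARK_MASTER_PORT=0) is the natural fix; a minimal plain-JVM sketch, independent of Spark's internals:
{code:java}
import java.net.ServerSocket

// Binding to port 0 delegates the choice to the OS; getLocalPort then
// reveals the port clients (e.g. beeline) actually need to connect to.
val server = new ServerSocket(0)
println(s"requested port 0, actually listening on ${server.getLocalPort}")
server.close()
{code}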
[jira] [Updated] (SPARK-27010) find out the actual port number when hive.server2.thrift.port=0
[ https://issues.apache.org/jira/browse/SPARK-27010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zuotingbing updated SPARK-27010: Attachment: 2019-03-01_092511.png > find out the actual port number when hive.server2.thrift.port=0 > --- > > Key: SPARK-27010 > URL: https://issues.apache.org/jira/browse/SPARK-27010 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: zuotingbing >Priority: Minor > Attachments: 2019-02-28_170844.png, 2019-02-28_170904.png, > 2019-02-28_170942.png, 2019-03-01_092511.png > > > Currently, if we set *hive.server2.thrift.port=0*, it is hard to find out the > actual port number to use when connecting with beeline. > before: > !2019-02-28_170942.png! > after: > !2019-02-28_170904.png! > using beeline to connect succeeds: > !2019-02-28_170844.png! > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27010) find out the actual port number when hive.server2.thrift.port=0
[ https://issues.apache.org/jira/browse/SPARK-27010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zuotingbing updated SPARK-27010: Attachment: (was: 2019-03-01_090847.png) > find out the actual port number when hive.server2.thrift.port=0 > --- > > Key: SPARK-27010 > URL: https://issues.apache.org/jira/browse/SPARK-27010 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: zuotingbing >Priority: Minor > Attachments: 2019-02-28_170844.png, 2019-02-28_170904.png, > 2019-02-28_170942.png > > > Currently, if we set *hive.server2.thrift.port=0*, it is hard to find out the > actual port number to use when connecting with beeline. > before: > !2019-02-28_170942.png! > after: > !2019-02-28_170904.png! > using beeline to connect succeeds: > !2019-02-28_170844.png! > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27010) find out the actual port number when hive.server2.thrift.port=0
[ https://issues.apache.org/jira/browse/SPARK-27010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16781160#comment-16781160 ] zuotingbing commented on SPARK-27010: - Currently, if we set *SPARK_MASTER_PORT=0*, we can easily find out the actual port number in the log, which helps us get the correct *spark.master* address. !2019-03-01_090847.png! > find out the actual port number when hive.server2.thrift.port=0 > --- > > Key: SPARK-27010 > URL: https://issues.apache.org/jira/browse/SPARK-27010 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: zuotingbing >Priority: Minor > Attachments: 2019-02-28_170844.png, 2019-02-28_170904.png, > 2019-02-28_170942.png, 2019-03-01_090847.png > > > Currently, if we set *hive.server2.thrift.port=0*, it is hard to find out the > actual port number to use when connecting with beeline. > before: > !2019-02-28_170942.png! > after: > !2019-02-28_170904.png! > using beeline to connect succeeds: > !2019-02-28_170844.png! > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27010) find out the actual port number when hive.server2.thrift.port=0
[ https://issues.apache.org/jira/browse/SPARK-27010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zuotingbing updated SPARK-27010: Attachment: 2019-03-01_090847.png > find out the actual port number when hive.server2.thrift.port=0 > --- > > Key: SPARK-27010 > URL: https://issues.apache.org/jira/browse/SPARK-27010 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: zuotingbing >Priority: Minor > Attachments: 2019-02-28_170844.png, 2019-02-28_170904.png, > 2019-02-28_170942.png, 2019-03-01_090847.png > > > Currently, if we set *hive.server2.thrift.port=0*, it is hard to find out the > actual port number to use when connecting with beeline. > before: > !2019-02-28_170942.png! > after: > !2019-02-28_170904.png! > using beeline to connect succeeds: > !2019-02-28_170844.png! > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27010) find out the actual port number when hive.server2.thrift.port=0
[ https://issues.apache.org/jira/browse/SPARK-27010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zuotingbing updated SPARK-27010: Description: Currently, if we set *hive.server2.thrift.port=0*, it is hard to find out the actual port number to use when connecting with beeline. before: !2019-02-28_170942.png! after: !2019-02-28_170904.png! using beeline to connect succeeds: !2019-02-28_170844.png! was: Currently, if we set *hive.server2.thrift.port=0*, it hard to find out the actual port number which one we should use beeline to connect to. before: !2019-02-28_170942.png! after: !2019-02-28_170904.png! use beeline to connect success: !2019-02-28_170844.png! > find out the actual port number when hive.server2.thrift.port=0 > --- > > Key: SPARK-27010 > URL: https://issues.apache.org/jira/browse/SPARK-27010 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: zuotingbing >Priority: Minor > Attachments: 2019-02-28_170844.png, 2019-02-28_170904.png, > 2019-02-28_170942.png > > > Currently, if we set *hive.server2.thrift.port=0*, it is hard to find out the > actual port number to use when connecting with beeline. > before: > !2019-02-28_170942.png! > after: > !2019-02-28_170904.png! > using beeline to connect succeeds: > !2019-02-28_170844.png! > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-27010) display the actual port number when hive.server2.thrift.port=0
zuotingbing created SPARK-27010: --- Summary: display the actual port number when hive.server2.thrift.port=0 Key: SPARK-27010 URL: https://issues.apache.org/jira/browse/SPARK-27010 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 2.4.0 Reporter: zuotingbing Currently, if we set *hive.server2.thrift.port=0*, it is hard to find out the actual port number to use when connecting with beeline. before: !image-2019-02-28-17-00-21-251.png! after: !image-2019-02-28-17-03-45-779.png! using beeline to connect succeeds: -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27010) find out the actual port number when hive.server2.thrift.port=0
[ https://issues.apache.org/jira/browse/SPARK-27010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zuotingbing updated SPARK-27010: Description: Currently, if we set *hive.server2.thrift.port=0*, it is hard to find out the actual port number to use when connecting with beeline. before: after: using beeline to connect succeeds: was: Currently, if we set *hive.server2.thrift.port=0*, it is hard to find out the actual port number to use when connecting with beeline. before: !image-2019-02-28-17-00-21-251.png! after: !image-2019-02-28-17-03-45-779.png! using beeline to connect succeeds: > find out the actual port number when hive.server2.thrift.port=0 > --- > > Key: SPARK-27010 > URL: https://issues.apache.org/jira/browse/SPARK-27010 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: zuotingbing >Priority: Minor > Attachments: 2019-02-28_170844.png, 2019-02-28_170904.png, > 2019-02-28_170942.png > > > Currently, if we set *hive.server2.thrift.port=0*, it is hard to find out the > actual port number to use when connecting with beeline. > before: > > after: > > using beeline to connect succeeds: > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27010) find out the actual port number when hive.server2.thrift.port=0
[ https://issues.apache.org/jira/browse/SPARK-27010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zuotingbing updated SPARK-27010: Description: Currently, if we set *hive.server2.thrift.port=0*, it is hard to find out the actual port number to use when connecting with beeline. before: !2019-02-28_170942.png! after: !2019-02-28_170904.png! using beeline to connect succeeds: !2019-02-28_170844.png! was: Currently, if we set *hive.server2.thrift.port=0*, it hard to find out the actual port number which one we should use beeline to connect. before: !2019-02-28_170942.png! after: !2019-02-28_170904.png! use beeline to connect success: !2019-02-28_170844.png! > find out the actual port number when hive.server2.thrift.port=0 > --- > > Key: SPARK-27010 > URL: https://issues.apache.org/jira/browse/SPARK-27010 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: zuotingbing >Priority: Minor > Attachments: 2019-02-28_170844.png, 2019-02-28_170904.png, > 2019-02-28_170942.png > > > Currently, if we set *hive.server2.thrift.port=0*, it is hard to find out the > actual port number to use when connecting with beeline. > before: > !2019-02-28_170942.png! > after: > !2019-02-28_170904.png! > using beeline to connect succeeds: > !2019-02-28_170844.png! > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27010) find out the actual port number when hive.server2.thrift.port=0
[ https://issues.apache.org/jira/browse/SPARK-27010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zuotingbing updated SPARK-27010: Description: Currently, if we set *hive.server2.thrift.port=0*, it is hard to find out the actual port number to use when connecting with beeline. before: !2019-02-28_170942.png! after: !2019-02-28_170904.png! using beeline to connect succeeds: !2019-02-28_170844.png! was: Currently, if we set *hive.server2.thrift.port=0*, it hard to find out the actual port number which one we use beeline to connect. before: after: use beeline to connect success: > find out the actual port number when hive.server2.thrift.port=0 > --- > > Key: SPARK-27010 > URL: https://issues.apache.org/jira/browse/SPARK-27010 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: zuotingbing >Priority: Minor > Attachments: 2019-02-28_170844.png, 2019-02-28_170904.png, > 2019-02-28_170942.png > > > Currently, if we set *hive.server2.thrift.port=0*, it is hard to find out the > actual port number to use when connecting with beeline. > before: > !2019-02-28_170942.png! > after: > !2019-02-28_170904.png! > using beeline to connect succeeds: > !2019-02-28_170844.png! > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27010) find out the actual port number when hive.server2.thrift.port=0
[ https://issues.apache.org/jira/browse/SPARK-27010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zuotingbing updated SPARK-27010: Attachment: 2019-02-28_170844.png > find out the actual port number when hive.server2.thrift.port=0 > --- > > Key: SPARK-27010 > URL: https://issues.apache.org/jira/browse/SPARK-27010 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: zuotingbing >Priority: Minor > Attachments: 2019-02-28_170844.png, 2019-02-28_170904.png, > 2019-02-28_170942.png > > > Currently, if we set *hive.server2.thrift.port=0*, it is hard to find out the > actual port number to use when connecting with beeline. > before: > !image-2019-02-28-17-00-21-251.png! > after: > !image-2019-02-28-17-03-45-779.png! > using beeline to connect succeeds: > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27010) find out the actual port number when hive.server2.thrift.port=0
[ https://issues.apache.org/jira/browse/SPARK-27010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zuotingbing updated SPARK-27010: Attachment: 2019-02-28_170904.png > find out the actual port number when hive.server2.thrift.port=0 > --- > > Key: SPARK-27010 > URL: https://issues.apache.org/jira/browse/SPARK-27010 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: zuotingbing >Priority: Minor > Attachments: 2019-02-28_170844.png, 2019-02-28_170904.png, > 2019-02-28_170942.png > > > Currently, if we set *hive.server2.thrift.port=0*, it is hard to find out the > actual port number to use when connecting with beeline. > before: > !image-2019-02-28-17-00-21-251.png! > after: > !image-2019-02-28-17-03-45-779.png! > using beeline to connect succeeds: > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27010) find out the actual port number when hive.server2.thrift.port=0
[ https://issues.apache.org/jira/browse/SPARK-27010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zuotingbing updated SPARK-27010: Attachment: 2019-02-28_170942.png > find out the actual port number when hive.server2.thrift.port=0 > --- > > Key: SPARK-27010 > URL: https://issues.apache.org/jira/browse/SPARK-27010 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: zuotingbing >Priority: Minor > Attachments: 2019-02-28_170844.png, 2019-02-28_170904.png, > 2019-02-28_170942.png > > > Currently, if we set *hive.server2.thrift.port=0*, it is hard to find out the > actual port number to use when connecting with beeline. > before: > !image-2019-02-28-17-00-21-251.png! > after: > !image-2019-02-28-17-03-45-779.png! > using beeline to connect succeeds: > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27010) find out the actual port number when hive.server2.thrift.port=0
[ https://issues.apache.org/jira/browse/SPARK-27010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zuotingbing updated SPARK-27010: Summary: find out the actual port number when hive.server2.thrift.port=0 (was: display the actual port number when hive.server2.thrift.port=0) > find out the actual port number when hive.server2.thrift.port=0 > --- > > Key: SPARK-27010 > URL: https://issues.apache.org/jira/browse/SPARK-27010 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: zuotingbing >Priority: Minor > > Currently, if we set *hive.server2.thrift.port=0*, it is hard to find out the > actual port number to use when connecting with beeline. > before: > !image-2019-02-28-17-00-21-251.png! > after: > !image-2019-02-28-17-03-45-779.png! > using beeline to connect succeeds: > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25852) we should filter the workOffers with freeCores>=CPUS_PER_TASK at first for better performance
[ https://issues.apache.org/jira/browse/SPARK-25852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zuotingbing updated SPARK-25852: Priority: Trivial (was: Major) > we should filter the workOffers with freeCores>=CPUS_PER_TASK at first for > better performance > - > > Key: SPARK-25852 > URL: https://issues.apache.org/jira/browse/SPARK-25852 > Project: Spark > Issue Type: Improvement > Components: Scheduler >Affects Versions: 2.3.2 >Reporter: zuotingbing >Priority: Trivial > Attachments: 2018-10-26_162822.png > > > We should filter the workOffers with freeCores>=CPUS_PER_TASK for better > performance. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
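The idea behind the change is that an offer with fewer free cores than a single task needs can never be scheduled on, so dropping it before resource offers are processed avoids wasted work; a minimal sketch (the case class mirrors the scheduler's WorkerOffer shape, but this is illustrative, not the actual patch):
{code:java}
case class WorkerOffer(executorId: String, host: String, freeCores: Int)

val CPUS_PER_TASK = 1 // value of spark.task.cpus

// Filter first: offers that cannot host even one task are useless to
// every later scheduling step, so discard them up front.
def usableOffers(offers: Seq[WorkerOffer]): Seq[WorkerOffer] =
  offers.filter(_.freeCores >= CPUS_PER_TASK)

// usableOffers(Seq(WorkerOffer("1", "n1", 0), WorkerOffer("2", "n2", 4)))
// keeps only the second offer.
{code}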
[jira] [Updated] (SPARK-25852) we should filter the workOffers with freeCores>=CPUS_PER_TASK at first for better performance
[ https://issues.apache.org/jira/browse/SPARK-25852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zuotingbing updated SPARK-25852: Summary: we should filter the workOffers with freeCores>=CPUS_PER_TASK at first for better performance (was: we should filter the workOffers of which freeCores>=CPUS_PER_TASK at first for better performance) > we should filter the workOffers with freeCores>=CPUS_PER_TASK at first for > better performance > - > > Key: SPARK-25852 > URL: https://issues.apache.org/jira/browse/SPARK-25852 > Project: Spark > Issue Type: Improvement > Components: Scheduler >Affects Versions: 2.3.2 >Reporter: zuotingbing >Priority: Major > Attachments: 2018-10-26_162822.png > > > We should filter the workOffers with freeCores>=CPUS_PER_TASK for better > performance. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25852) we should filter the workOffers of which freeCores>=CPUS_PER_TASK at first for better performance
[ https://issues.apache.org/jira/browse/SPARK-25852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zuotingbing updated SPARK-25852: Description: We should filter the workOffers with freeCores>=CPUS_PER_TASK for better performance. (was: We should filter the workOffers of which freeCores>=CPUS_PER_TASK for better performance.) > we should filter the workOffers of which freeCores>=CPUS_PER_TASK at first > for better performance > - > > Key: SPARK-25852 > URL: https://issues.apache.org/jira/browse/SPARK-25852 > Project: Spark > Issue Type: Improvement > Components: Scheduler >Affects Versions: 2.3.2 >Reporter: zuotingbing >Priority: Major > Attachments: 2018-10-26_162822.png > > > We should filter the workOffers with freeCores>=CPUS_PER_TASK for better > performance. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25852) we should filter the workOffers of which freeCores>=CPUS_PER_TASK at first for better performance
[ https://issues.apache.org/jira/browse/SPARK-25852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zuotingbing updated SPARK-25852: Summary: we should filter the workOffers of which freeCores>=CPUS_PER_TASK at first for better performance (was: we should filter the workOffers of which freeCores>CPUS_PER_TASK at first for better performance) > we should filter the workOffers of which freeCores>=CPUS_PER_TASK at first > for better performance > - > > Key: SPARK-25852 > URL: https://issues.apache.org/jira/browse/SPARK-25852 > Project: Spark > Issue Type: Improvement > Components: Scheduler >Affects Versions: 2.3.2 >Reporter: zuotingbing >Priority: Major > Attachments: 2018-10-26_162822.png > > > We should filter the workOffers of which freeCores>=CPUS_PER_TASK for better > performance. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25852) we should filter the workOffers of which freeCores>CPUS_PER_TASK at first for better performance
[ https://issues.apache.org/jira/browse/SPARK-25852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zuotingbing updated SPARK-25852: Description: We should filter the workOffers of which freeCores>=CPUS_PER_TASK for better performance. (was: We should filter the workOffers of which freeCores=0 for better performance.) > we should filter the workOffers of which freeCores>CPUS_PER_TASK at first for > better performance > > > Key: SPARK-25852 > URL: https://issues.apache.org/jira/browse/SPARK-25852 > Project: Spark > Issue Type: Improvement > Components: Scheduler >Affects Versions: 2.3.2 >Reporter: zuotingbing >Priority: Major > Attachments: 2018-10-26_162822.png > > > We should filter the workOffers of which freeCores>=CPUS_PER_TASK for better > performance. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25852) we should filter the workOffers of which freeCores>CPUS_PER_TASK at first for better performance
[ https://issues.apache.org/jira/browse/SPARK-25852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zuotingbing updated SPARK-25852: Summary: we should filter the workOffers of which freeCores>CPUS_PER_TASK at first for better performance (was: we should filter the workOffers of which freeCores>0 for better performance) > we should filter the workOffers of which freeCores>CPUS_PER_TASK at first for > better performance > > > Key: SPARK-25852 > URL: https://issues.apache.org/jira/browse/SPARK-25852 > Project: Spark > Issue Type: Improvement > Components: Scheduler >Affects Versions: 2.3.2 >Reporter: zuotingbing >Priority: Major > Attachments: 2018-10-26_162822.png > > > We should filter the workOffers of which freeCores=0 for better performance. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25852) we should filter the workOffers of which freeCores>0 for better performance
[ https://issues.apache.org/jira/browse/SPARK-25852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zuotingbing updated SPARK-25852: Priority: Major (was: Minor) > we should filter the workOffers of which freeCores>0 for better performance > --- > > Key: SPARK-25852 > URL: https://issues.apache.org/jira/browse/SPARK-25852 > Project: Spark > Issue Type: Improvement > Components: Scheduler >Affects Versions: 2.3.2 >Reporter: zuotingbing >Priority: Major > Attachments: 2018-10-26_162822.png > > > We should filter the workOffers of which freeCores=0 for better performance. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25852) we should filter the workOffers of which freeCores>0 for better performance
[ https://issues.apache.org/jira/browse/SPARK-25852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zuotingbing updated SPARK-25852: Description: We should filter the workOffers of which freeCores=0 for better performance. (was: We should filter the workOffers of which freeCores=0 when make fake resource offers on all executors.) > we should filter the workOffers of which freeCores>0 for better performance > --- > > Key: SPARK-25852 > URL: https://issues.apache.org/jira/browse/SPARK-25852 > Project: Spark > Issue Type: Improvement > Components: Scheduler >Affects Versions: 2.3.2 >Reporter: zuotingbing >Priority: Minor > Attachments: 2018-10-26_162822.png > > > We should filter the workOffers of which freeCores=0 for better performance. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25852) we should filter the workOffers of which freeCores>0 for better performance
[ https://issues.apache.org/jira/browse/SPARK-25852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zuotingbing updated SPARK-25852: Summary: we should filter the workOffers of which freeCores>0 for better performance (was: we should filter the workOffers of which freeCores>0 when make fake resource offers on all executors) > we should filter the workOffers of which freeCores>0 for better performance > --- > > Key: SPARK-25852 > URL: https://issues.apache.org/jira/browse/SPARK-25852 > Project: Spark > Issue Type: Improvement > Components: Scheduler >Affects Versions: 2.3.2 >Reporter: zuotingbing >Priority: Minor > Attachments: 2018-10-26_162822.png > > > We should filter the workOffers of which freeCores=0 when make fake resource > offers on all executors. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25852) we should filter the workOffers of which freeCores>0 when make fake resource offers on all executors
[ https://issues.apache.org/jira/browse/SPARK-25852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zuotingbing updated SPARK-25852: Component/s: (was: Spark Core) Scheduler > we should filter the workOffers of which freeCores>0 when make fake resource > offers on all executors > > > Key: SPARK-25852 > URL: https://issues.apache.org/jira/browse/SPARK-25852 > Project: Spark > Issue Type: Improvement > Components: Scheduler >Affects Versions: 2.3.2 >Reporter: zuotingbing >Priority: Minor > Attachments: 2018-10-26_162822.png > > > We should filter the workOffers of which freeCores=0 when make fake resource > offers on all executors. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25852) we should filter the workOffers of which freeCores>0 when make fake resource offers on all executors
[ https://issues.apache.org/jira/browse/SPARK-25852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zuotingbing updated SPARK-25852: Attachment: 2018-10-26_162822.png > we should filter the workOffers of which freeCores>0 when make fake resource > offers on all executors > > > Key: SPARK-25852 > URL: https://issues.apache.org/jira/browse/SPARK-25852 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.3.2 >Reporter: zuotingbing >Priority: Minor > Attachments: 2018-10-26_162822.png > > > We should filter the workOffers of which freeCores=0 when make fake resource > offers on all executors. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25852) we should filter the workOffers of which freeCores>0 when make fake resource offers on all executors
[ https://issues.apache.org/jira/browse/SPARK-25852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zuotingbing updated SPARK-25852: Summary: we should filter the workOffers of which freeCores>0 when make fake resource offers on all executors (was: we should filter the workOffers of which freeCores=0 when make fake resource offers on all executors) > we should filter the workOffers of which freeCores>0 when make fake resource > offers on all executors > > > Key: SPARK-25852 > URL: https://issues.apache.org/jira/browse/SPARK-25852 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.3.2 >Reporter: zuotingbing >Priority: Minor > > We should filter the workOffers of which freeCores=0 when make fake resource > offers on all executors. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-25852) we should filter the workOffers of which freeCores=0 when make fake resource offers on all executors
zuotingbing created SPARK-25852: --- Summary: we should filter the workOffers of which freeCores=0 when make fake resource offers on all executors Key: SPARK-25852 URL: https://issues.apache.org/jira/browse/SPARK-25852 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 2.3.2 Reporter: zuotingbing We should filter the workOffers of which freeCores=0 when make fake resource offers on all executors. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-25451) Stages page doesn't show the right number of the total tasks
[ https://issues.apache.org/jira/browse/SPARK-25451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618837#comment-16618837 ] zuotingbing edited comment on SPARK-25451 at 9/18/18 10:11 AM: --- yes, thanks [~yumwang] was (Author: zuo.tingbing9): yes, thanks > Stages page doesn't show the right number of the total tasks > > > Key: SPARK-25451 > URL: https://issues.apache.org/jira/browse/SPARK-25451 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.3.1 >Reporter: zuotingbing >Priority: Major > Attachments: mshot.png > > > > See the attached pic. > !mshot.png! > Executor 1 has 7 tasks, but in the Stages page the total tasks of the > executor is 6. > > to reproduce this, simply start a shell: > {code:java} > $SPARK_HOME/bin/spark-shell --executor-cores 1 --executor-memory 1g > --total-executor-cores 2 --master spark://localhost.localdomain:7077{code} > Run job as follows: > {code:java} > sc.parallelize(1 to 1, 3).map{ x => throw new RuntimeException("Bad > executor")}.collect() {code} > > Go to the stages page and you will see that the Total Tasks count is not right in the > {code:java} > Aggregated Metrics by Executor{code} > table. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25451) Stages page doesn't show the right number of the total tasks
[ https://issues.apache.org/jira/browse/SPARK-25451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zuotingbing updated SPARK-25451: Target Version/s: (was: 2.3.1) > Stages page doesn't show the right number of the total tasks > > > Key: SPARK-25451 > URL: https://issues.apache.org/jira/browse/SPARK-25451 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.3.1 >Reporter: zuotingbing >Priority: Major > Attachments: mshot.png > > > > See the attached pic. > !mshot.png! > Executor 1 has 7 tasks, but in the Stages page the total tasks of the > executor is 6. > > to reproduce this, simply start a shell: > {code:java} > $SPARK_HOME/bin/spark-shell --executor-cores 1 --executor-memory 1g > --total-executor-cores 2 --master spark://localhost.localdomain:7077{code} > Run job as follows: > {code:java} > sc.parallelize(1 to 1, 3).map{ x => throw new RuntimeException("Bad > executor")}.collect() {code} > > Go to the stages page and you will see that the Total Tasks count is not right in the > {code:java} > Aggregated Metrics by Executor{code} > table. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25451) Stages page doesn't show the right number of the total tasks
[ https://issues.apache.org/jira/browse/SPARK-25451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618837#comment-16618837 ] zuotingbing commented on SPARK-25451: - yes, thanks > Stages page doesn't show the right number of the total tasks > > > Key: SPARK-25451 > URL: https://issues.apache.org/jira/browse/SPARK-25451 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.3.1 >Reporter: zuotingbing >Priority: Major > Attachments: mshot.png > > > > See the attached pic. > !mshot.png! > Executor 1 has 7 tasks, but in the Stages page the total tasks of the > executor is 6. > > to reproduce this, simply start a shell: > {code:java} > $SPARK_HOME/bin/spark-shell --executor-cores 1 --executor-memory 1g > --total-executor-cores 2 --master spark://localhost.localdomain:7077{code} > Run job as follows: > {code:java} > sc.parallelize(1 to 1, 3).map{ x => throw new RuntimeException("Bad > executor")}.collect() {code} > > Go to the stages page and you will see that the Total Tasks count is not right in the > {code:java} > Aggregated Metrics by Executor{code} > table. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25451) Stages page doesn't show the right number of the total tasks
[ https://issues.apache.org/jira/browse/SPARK-25451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zuotingbing updated SPARK-25451: Description: See the attached pic. !mshot.png! Executor 1 has 7 tasks, but in the Stages page the total tasks of the executor is 6. to reproduce this, simply start a shell: {code:java} $SPARK_HOME/bin/spark-shell --executor-cores 1 --executor-memory 1g --total-executor-cores 2 --master spark://localhost.localdomain:7077{code} Run job as follows: {code:java} sc.parallelize(1 to 1, 3).map{ x => throw new RuntimeException("Bad executor")}.collect() {code} Go to the stages page and you will see that the Total Tasks count is not right in the {code:java} Aggregated Metrics by Executor{code} table. was: See the attached pic. !image-2018-09-18-16-35-09-548.png! Executor 1 has 7 tasks, but in the Stages page the total tasks of the executor is 6. to reproduce this, simply start a shell: $SPARK_HOME/bin/spark-shell --executor-cores 1 --executor-memory 1g --total-executor-cores 2 --master spark://localhost.localdomain:7077 Run job as follows: {code:java} sc.parallelize(1 to 1, 3).map{ x => throw new RuntimeException("Bad executor")}.collect() {code} Go to the stages page and you will see that the Total Tasks count is not right in the {code:java} Aggregated Metrics by Executor{code} table. > Stages page doesn't show the right number of the total tasks > > > Key: SPARK-25451 > URL: https://issues.apache.org/jira/browse/SPARK-25451 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.3.1 >Reporter: zuotingbing >Priority: Major > Attachments: mshot.png > > > > See the attached pic. > !mshot.png! > Executor 1 has 7 tasks, but in the Stages page the total tasks of the > executor is 6. > > to reproduce this, simply start a shell: > {code:java} > $SPARK_HOME/bin/spark-shell --executor-cores 1 --executor-memory 1g > --total-executor-cores 2 --master spark://localhost.localdomain:7077{code} > Run job as follows: > {code:java} > sc.parallelize(1 to 1, 3).map{ x => throw new RuntimeException("Bad > executor")}.collect() {code} > > Go to the stages page and you will see that the Total Tasks count is not right in the > {code:java} > Aggregated Metrics by Executor{code} > table. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25451) Stages page doesn't show the right number of the total tasks
[ https://issues.apache.org/jira/browse/SPARK-25451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zuotingbing updated SPARK-25451: Description: See the attached pic. !image-2018-09-18-16-35-09-548.png! Executor 1 has 7 tasks, but in the Stages page the total tasks of the executor is 6. to reproduce this, simply start a shell: $SPARK_HOME/bin/spark-shell --executor-cores 1 --executor-memory 1g --total-executor-cores 2 --master spark://localhost.localdomain:7077 Run job as follows: {code:java} sc.parallelize(1 to 1, 3).map{ x => throw new RuntimeException("Bad executor")}.collect() {code} Go to the stages page and you will see that the Total Tasks count is not right in the {code:java} Aggregated Metrics by Executor{code} table. was: See the attached pic. !image-2018-09-18-16-35-09-548.png! Executor 1 has 7 tasks, but in the Stages page the total tasks of the executor is 6. to reproduce this, simply start a shell: $SPARK_HOME/bin/spark-shell --executor-cores 1 --executor-memory 1g --total-executor-cores 2 --master spark://localhost.localdomain:7077 Run job as follows: {code:java} sc.parallelize(1 to 1, 3).map{ x => throw new RuntimeException("Bad executor")}.collect() {code} Go to the stages page and you will see that the Total Tasks count is not right in the {code:java} Aggregated Metrics by Executor{code} table. > Stages page doesn't show the right number of the total tasks > > > Key: SPARK-25451 > URL: https://issues.apache.org/jira/browse/SPARK-25451 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.3.1 >Reporter: zuotingbing >Priority: Major > Attachments: mshot.png > > > > See the attached pic. > > !image-2018-09-18-16-35-09-548.png! > Executor 1 has 7 tasks, but in the Stages page the total tasks of the > executor is 6. > > to reproduce this, simply start a shell: > $SPARK_HOME/bin/spark-shell --executor-cores 1 --executor-memory 1g > --total-executor-cores 2 --master spark://localhost.localdomain:7077 > Run job as follows: > > > {code:java} > sc.parallelize(1 to 1, 3).map{ x => throw new RuntimeException("Bad > executor")}.collect() {code} > > Go to the stages page and you will see that the Total Tasks count is not right in the > {code:java} > Aggregated Metrics by Executor{code} > table. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25451) Stages page doesn't show the right number of the total tasks
[ https://issues.apache.org/jira/browse/SPARK-25451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zuotingbing updated SPARK-25451: Attachment: mshot.png > Stages page doesn't show the right number of the total tasks > > > Key: SPARK-25451 > URL: https://issues.apache.org/jira/browse/SPARK-25451 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.3.1 >Reporter: zuotingbing >Priority: Major > Attachments: mshot.png > > > > See the attached pic. > !image-2018-09-18-16-35-09-548.png! > Executor 1 has 7 tasks, but in the Stages page the total tasks of the > executor is 6. > > to reproduce this, simply start a shell: > $SPARK_HOME/bin/spark-shell --executor-cores 1 --executor-memory 1g > --total-executor-cores 2 --master spark://localhost.localdomain:7077 > Run job as follows: > > > {code:java} > sc.parallelize(1 to 1, 3).map{ x => throw new RuntimeException("Bad > executor")}.collect() {code} > > Go to the stages page and you will see that the Total Tasks count is not right in the > {code:java} > Aggregated Metrics by Executor{code} > table. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-25451) Stages page doesn't show the right number of the total tasks
zuotingbing created SPARK-25451: --- Summary: Stages page doesn't show the right number of the total tasks Key: SPARK-25451 URL: https://issues.apache.org/jira/browse/SPARK-25451 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 2.3.1 Reporter: zuotingbing See the attached pic. !image-2018-09-18-16-35-09-548.png! Executor 1 has 7 tasks, but in the Stages page the total tasks of the executor is 6. to reproduce this, simply start a shell: $SPARK_HOME/bin/spark-shell --executor-cores 1 --executor-memory 1g --total-executor-cores 2 --master spark://localhost.localdomain:7077 Run job as follows: {code:java} sc.parallelize(1 to 1, 3).map{ x => throw new RuntimeException("Bad executor")}.collect() {code} Go to the stages page and you will see that the Total Tasks count is not right in the {code:java} Aggregated Metrics by Executor{code} table. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-24829) In Spark Thrift Server, CAST AS FLOAT inconsistent with spark-shell or spark-sql
[ https://issues.apache.org/jira/browse/SPARK-24829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zuotingbing updated SPARK-24829: Summary: In Spark Thrift Server, CAST AS FLOAT inconsistent with spark-shell or spark-sql (was: CAST AS FLOAT inconsistent with Hive) > In Spark Thrift Server, CAST AS FLOAT inconsistent with spark-shell or > spark-sql > - > > Key: SPARK-24829 > URL: https://issues.apache.org/jira/browse/SPARK-24829 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.1 >Reporter: zuotingbing >Priority: Major > Attachments: 2018-07-18_110944.png, 2018-07-18_11.png > > > SELECT CAST('4.56' AS FLOAT) > the result is 4.55942779541; it should be 4.56 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
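The stray digits come from float-to-double widening rather than from the cast itself: 4.56 has no exact binary float representation, and a path that widens the float to a double exposes digits that spark-shell's shortest-form float printing hides. A quick plain-Scala illustration of this JVM behavior (independent of Spark):
{code:java}
val f: Float = "4.56".toFloat
println(f)          // 4.56 -- Float.toString prints the shortest round-trip form
println(f.toDouble) // 4.559999942779541 -- widening exposes the inexact binary value
{code}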
[jira] [Updated] (SPARK-24829) In Spark Thrift Server, CAST AS FLOAT inconsistent with spark-shell or spark-sql
[ https://issues.apache.org/jira/browse/SPARK-24829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zuotingbing updated SPARK-24829: Attachment: (was: CAST-FLOAT.png) > In Spark Thrift Server, CAST AS FLOAT inconsistent with spark-shell or > spark-sql > - > > Key: SPARK-24829 > URL: https://issues.apache.org/jira/browse/SPARK-24829 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.1 >Reporter: zuotingbing >Priority: Major > Attachments: 2018-07-18_110944.png, 2018-07-18_11.png > > > SELECT CAST('4.56' AS FLOAT) > the result is 4.55942779541; it should be 4.56 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-24829) In Spark Thrift Server, CAST AS FLOAT inconsistent with spark-shell or spark-sql
[ https://issues.apache.org/jira/browse/SPARK-24829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zuotingbing updated SPARK-24829: Attachment: 2018-07-18_11.png > In Spark Thrift Server, CAST AS FLOAT inconsistent with spark-shell or > spark-sql > - > > Key: SPARK-24829 > URL: https://issues.apache.org/jira/browse/SPARK-24829 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.1 >Reporter: zuotingbing >Priority: Major > Attachments: 2018-07-18_110944.png, 2018-07-18_11.png > > > SELECT CAST('4.56' AS FLOAT) > the result is 4.55942779541; it should be 4.56 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-24829) In Spark Thrift Server, CAST AS FLOAT inconsistent with spark-shell or spark-sql
[ https://issues.apache.org/jira/browse/SPARK-24829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zuotingbing updated SPARK-24829: Attachment: 2018-07-18_110944.png > In Spark Thrift Server, CAST AS FLOAT inconsistent with spark-shell or > spark-sql > - > > Key: SPARK-24829 > URL: https://issues.apache.org/jira/browse/SPARK-24829 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.1 >Reporter: zuotingbing >Priority: Major > Attachments: 2018-07-18_110944.png, 2018-07-18_11.png > > > SELECT CAST('4.56' AS FLOAT) > the result is 4.55942779541; it should be 4.56 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-24829) CAST AS FLOAT inconsistent with Hive
[ https://issues.apache.org/jira/browse/SPARK-24829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zuotingbing updated SPARK-24829: Attachment: CAST-FLOAT.png > CAST AS FLOAT inconsistent with Hive > > > Key: SPARK-24829 > URL: https://issues.apache.org/jira/browse/SPARK-24829 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.1 >Reporter: zuotingbing >Priority: Major > Attachments: CAST-FLOAT.png > > > SELECT CAST('4.56' AS FLOAT) > the result is 4.55942779541; it should be 4.56 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-24829) CAST AS FLOAT inconsistent with Hive
zuotingbing created SPARK-24829: --- Summary: CAST AS FLOAT inconsistent with Hive Key: SPARK-24829 URL: https://issues.apache.org/jira/browse/SPARK-24829 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.3.1 Reporter: zuotingbing SELECT CAST('4.56' AS FLOAT) the result is 4.55942779541; it should be 4.56 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19250) In security cluster, spark beeline connect to hive metastore failed
[ https://issues.apache.org/jira/browse/SPARK-19250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16493254#comment-16493254 ] zuotingbing commented on SPARK-19250: - Same problem in Spark 2.2.1. But when we add kinit for Kerberos before starting the thrift server, beeline works well in Spark 2.0.2 > In security cluster, spark beeline connect to hive metastore failed > --- > > Key: SPARK-19250 > URL: https://issues.apache.org/jira/browse/SPARK-19250 > Project: Spark > Issue Type: Bug >Reporter: meiyoula >Priority: Major > Labels: security-issue > > 1. starting thriftserver in security mode, set hive.metastore.uris to hive > metastore uri, also hive is in security mode. > 2. when using beeline to create a table, it can't connect to the hive metastore > successfully; "Failed to find any Kerberos tgt" occurs. > {quote} > 2017-01-17 16:25:53,618 | ERROR | [pool-25-thread-1] | SASL negotiation > failure | > org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:315) > javax.security.sasl.SaslException: GSS initiate failed [Caused by > GSSException: No valid credentials provided (Mechanism level: Failed to find > any Kerberos tgt)] > at > com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211) > at > org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94) > at > org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271) > at > org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37) > at > org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52) > at > org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1738) > at > org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:513) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:249) > at > org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.(SessionHiveMetaStoreClient.java:74) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at > org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1533) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:86) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104) > at > org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3119) > at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3138) > at > org.apache.hadoop.hive.ql.session.SessionState.setAuthorizerV2Config(SessionState.java:791) > at > org.apache.hadoop.hive.ql.session.SessionState.setupAuth(SessionState.java:755) > at > org.apache.hadoop.hive.ql.session.SessionState.getAuthenticator(SessionState.java:1461) > at >
org.apache.hadoop.hive.ql.session.SessionState.getUserFromAuthenticator(SessionState.java:1014) > at > org.apache.hadoop.hive.ql.metadata.Table.getEmptyTable(Table.java:177) > at org.apache.hadoop.hive.ql.metadata.Table.(Table.java:119) > at > org.apache.spark.sql.hive.client.HiveClientImpl.org$apache$spark$sql$hive$client$HiveClientImpl$$toHiveTable(HiveClientImpl.scala:803) > at > org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createTable$1.apply$mcV$sp(HiveClientImpl.scala:430) > at > org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createTable$1.apply(HiveClientImpl.scala:430) > at > org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createTable$1.apply(HiveClientImpl.scala:430) > at > org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:284) > at > org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:231) >
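The workaround mentioned in the comment above (obtaining a Kerberos TGT before the Thrift server starts) can also be expressed programmatically through Hadoop's UserGroupInformation API; a hedged sketch, with the principal and keytab path as placeholders:
{code:java}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.UserGroupInformation

val hadoopConf = new Configuration()
hadoopConf.set("hadoop.security.authentication", "kerberos")
UserGroupInformation.setConfiguration(hadoopConf)

// Placeholders -- substitute the service principal and keytab of your cluster.
UserGroupInformation.loginUserFromKeytab(
  "hive/_HOST@EXAMPLE.COM", "/etc/security/keytabs/hive.keytab")

// A metastore client created after this point negotiates SASL with a
// valid TGT instead of failing with "Failed to find any Kerberos tgt".
{code}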
[jira] [Comment Edited] (SPARK-15544) Bouncing Zookeeper node causes Active spark master to exit
[ https://issues.apache.org/jira/browse/SPARK-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16440535#comment-16440535 ] zuotingbing edited comment on SPARK-15544 at 4/17/18 8:02 AM: -- cc [~vanzin] [~rxin] [~yhuai] was (Author: zuo.tingbing9): cc [~vanzin] [~rxin] Xiao Li > Bouncing Zookeeper node causes Active spark master to exit > -- > > Key: SPARK-15544 > URL: https://issues.apache.org/jira/browse/SPARK-15544 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.1 > Environment: Ubuntu 14.04. Zookeeper 3.4.6 with 3-node quorum >Reporter: Steven Lowenthal >Priority: Major > > Shutting Down a single zookeeper node caused spark master to exit. The > master should have connected to a second zookeeper node. > {code:title=log output} > 16/05/25 18:21:28 INFO master.Master: Launching executor > app-20160525182128-0006/1 on worker worker-20160524013212-10.16.28.76-59138 > 16/05/25 18:21:28 INFO master.Master: Launching executor > app-20160525182128-0006/2 on worker worker-20160524013204-10.16.21.217-47129 > 16/05/26 00:16:01 INFO zookeeper.ClientCnxn: Unable to read additional data > from server sessionid 0x154dfc0426b0054, likely server has closed socket, > closing socket connection and attempting reconnect > 16/05/26 00:16:01 INFO zookeeper.ClientCnxn: Unable to read additional data > from server sessionid 0x254c701f28d0053, likely server has closed socket, > closing socket connection and attempting reconnect > 16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED > 16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED > 16/05/26 00:16:01 INFO master.ZooKeeperLeaderElectionAgent: We have lost > leadership > 16/05/26 00:16:01 ERROR master.Master: Leadership has been revoked -- master > shutting down. }} > {code} > spark-env.sh: > {code:title=spark-env.sh} > export SPARK_LOCAL_DIRS=/ephemeral/spark/local > export SPARK_WORKER_DIR=/ephemeral/spark/work > export SPARK_LOG_DIR=/var/log/spark > export HADOOP_CONF_DIR=/home/ubuntu/hadoop-2.6.3/etc/hadoop > export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER > -Dspark.deploy.zookeeper.url=gn5456-zookeeper-01:2181,gn5456-zookeeper-02:2181,gn5456-zookeeper-03:2181" > export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true" > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
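The log shows the Curator connection entering SUSPENDED and the election agent immediately treating that as revoked leadership. SUSPENDED only means the ZooKeeper session might still survive a reconnect, while LOST means it is definitely gone, so a bounced ZK node need not be fatal. A simplified sketch of that distinction using Curator's connection-state listener (illustrative, not Spark's actual ZooKeeperLeaderElectionAgent):
{code:java}
import org.apache.curator.framework.CuratorFramework
import org.apache.curator.framework.state.{ConnectionState, ConnectionStateListener}

val listener = new ConnectionStateListener {
  override def stateChanged(client: CuratorFramework, state: ConnectionState): Unit =
    state match {
      case ConnectionState.SUSPENDED =>
        // A bounced ZK node lands here; the session may yet be recovered,
        // so leadership is uncertain but not necessarily lost.
        println("suspended: wait for RECONNECTED or LOST before stepping down")
      case ConnectionState.LOST =>
        // Session expired: leadership really is revoked.
        println("lost: step down")
      case other =>
        println(s"state change: $other")
    }
}
{code}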
[jira] [Commented] (SPARK-15544) Bouncing Zookeeper node causes Active spark master to exit
[ https://issues.apache.org/jira/browse/SPARK-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16440511#comment-16440511 ] zuotingbing commented on SPARK-15544: - The same issue still occurs in Spark 2.3.0; see [SPARK-23530|https://issues.apache.org/jira/browse/SPARK-23530] > Bouncing Zookeeper node causes Active spark master to exit > -- > > Key: SPARK-15544 > URL: https://issues.apache.org/jira/browse/SPARK-15544 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.1 > Environment: Ubuntu 14.04. Zookeeper 3.4.6 with 3-node quorum >Reporter: Steven Lowenthal >Priority: Major > > Shutting down a single ZooKeeper node caused the Spark master to exit. The > master should have connected to a second ZooKeeper node. > {code:title=log output} > 16/05/25 18:21:28 INFO master.Master: Launching executor > app-20160525182128-0006/1 on worker worker-20160524013212-10.16.28.76-59138 > 16/05/25 18:21:28 INFO master.Master: Launching executor > app-20160525182128-0006/2 on worker worker-20160524013204-10.16.21.217-47129 > 16/05/26 00:16:01 INFO zookeeper.ClientCnxn: Unable to read additional data > from server sessionid 0x154dfc0426b0054, likely server has closed socket, > closing socket connection and attempting reconnect > 16/05/26 00:16:01 INFO zookeeper.ClientCnxn: Unable to read additional data > from server sessionid 0x254c701f28d0053, likely server has closed socket, > closing socket connection and attempting reconnect > 16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED > 16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED > 16/05/26 00:16:01 INFO master.ZooKeeperLeaderElectionAgent: We have lost > leadership > 16/05/26 00:16:01 ERROR master.Master: Leadership has been revoked -- master > shutting down. > {code} > spark-env.sh: > {code:title=spark-env.sh} > export SPARK_LOCAL_DIRS=/ephemeral/spark/local > export SPARK_WORKER_DIR=/ephemeral/spark/work > export SPARK_LOG_DIR=/var/log/spark > export HADOOP_CONF_DIR=/home/ubuntu/hadoop-2.6.3/etc/hadoop > export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER > -Dspark.deploy.zookeeper.url=gn5456-zookeeper-01:2181,gn5456-zookeeper-02:2181,gn5456-zookeeper-03:2181" > export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true" > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
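For context on the failure mode in the log above: Curator reports a SUSPENDED connection state when the link to one quorum member drops, and the master treats that as losing leadership and shuts down, even though the ZooKeeper session may still recover on another node. The following is a minimal sketch using Curator's public API, not Spark's actual ZooKeeperLeaderElectionAgent code; the connect string and latch path are made up. It shows the SUSPENDED vs LOST distinction an election agent could use to survive a single node bounce.
{code:java}
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.leader.LeaderLatch;
import org.apache.curator.framework.state.ConnectionState;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class LeaderElectionSketch {
    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
            "zk-01:2181,zk-02:2181,zk-03:2181",   // hypothetical quorum
            new ExponentialBackoffRetry(1000, 3));
        client.start();

        client.getConnectionStateListenable().addListener((c, newState) -> {
            if (newState == ConnectionState.SUSPENDED) {
                // One ZK node bounced; the session may survive on another
                // quorum member, so wait for a reconnect instead of
                // immediately revoking leadership.
                System.out.println("Connection suspended, waiting to reconnect");
            } else if (newState == ConnectionState.LOST) {
                // Session expired: leadership really is gone now.
                System.out.println("Session lost, stepping down");
            }
        });

        LeaderLatch latch = new LeaderLatch(client, "/spark/leader_election");
        latch.start();
        latch.await();  // blocks until this process is elected leader
        System.out.println("Elected leader");
    }
}
{code}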
[jira] [Updated] (SPARK-23745) Remove the directories of the “hive.downloaded.resources.dir” when HiveThriftServer2 stopped
[ https://issues.apache.org/jira/browse/SPARK-23745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zuotingbing updated SPARK-23745: Description: !2018-03-20_164832.png! When HiveThriftServer2 starts, it creates directories under hive.downloaded.resources.dir, but when HiveThriftServer2 stops these directories are not removed, so they can accumulate over time. was: !2018-03-20_164832.png! when start the HiveThriftServer2, we create some directories for hive.downloaded.resources.dir, but when stop the HiveThriftServer2 we do not remove these directories.The directories could accumulate a lot. > Remove the directories of the “hive.downloaded.resources.dir” when > HiveThriftServer2 stopped > > > Key: SPARK-23745 > URL: https://issues.apache.org/jira/browse/SPARK-23745 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 > Environment: linux >Reporter: zuotingbing >Priority: Major > Attachments: 2018-03-20_164832.png > > > !2018-03-20_164832.png! > When HiveThriftServer2 starts, it creates directories under > hive.downloaded.resources.dir, but when HiveThriftServer2 stops these > directories are not removed, so they can accumulate over time. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
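A minimal sketch of the kind of cleanup this report asks for; the hook wiring and the directory path are illustrative assumptions, not the actual HiveThriftServer2 code. The idea is simply to remove the downloaded-resources directories when the server process stops.
{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.stream.Stream;

public class ResourceDirCleanup {
    // Register a JVM shutdown hook that recursively deletes the
    // session-scoped hive.downloaded.resources.dir directory.
    public static void registerCleanup(Path resourcesDir) {
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            try (Stream<Path> paths = Files.walk(resourcesDir)) {
                // Delete children before parents.
                paths.sorted(Comparator.reverseOrder())
                     .forEach(p -> p.toFile().delete());
            } catch (IOException e) {
                System.err.println("Failed to clean " + resourcesDir + ": " + e);
            }
        }));
    }

    public static void main(String[] args) {
        registerCleanup(Paths.get("/tmp/hive_resources"));  // hypothetical dir
    }
}
{code}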
[jira] [Updated] (SPARK-23547) Cleanup the .pipeout file when the Hive Session closed
[ https://issues.apache.org/jira/browse/SPARK-23547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zuotingbing updated SPARK-23547: Description: !2018-03-07_121010.png! When a Hive session is closed, we should also clean up its .pipeout file. was: !2018-03-07_121010.png! when the hive session closed, we should also cleanup the .pipeout file. > Cleanup the .pipeout file when the Hive Session closed > -- > > Key: SPARK-23547 > URL: https://issues.apache.org/jira/browse/SPARK-23547 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: zuotingbing >Priority: Major > Attachments: 2018-03-07_121010.png > > > !2018-03-07_121010.png! > > When a Hive session is closed, we should also clean up its .pipeout file. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
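For background, HiveServer2 writes a per-session .pipeout file into its scratch directory, and the report is that these are left behind when the session closes. The sketch below shows the requested cleanup in isolation; the directory layout and the session-id-prefixed naming are assumptions, not Spark's actual session-close code.
{code:java}
import java.io.File;

public class PipeoutCleanup {
    // Delete the .pipeout files that belong to a closed session.
    public static void deletePipeoutFiles(File scratchDir, String sessionId) {
        File[] leftovers = scratchDir.listFiles(
            (dir, name) -> name.startsWith(sessionId) && name.endsWith(".pipeout"));
        if (leftovers == null) {
            return;  // scratchDir missing or not a directory
        }
        for (File f : leftovers) {
            if (!f.delete()) {
                System.err.println("Could not delete " + f.getAbsolutePath());
            }
        }
    }
}
{code}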
[jira] [Comment Edited] (SPARK-22793) Memory leak in Spark Thrift Server
[ https://issues.apache.org/jira/browse/SPARK-22793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16292314#comment-16292314 ] zuotingbing edited comment on SPARK-22793 at 12/26/17 2:00 AM: --- Yes, the master branch also has this problem. was (Author: zuo.tingbing9): yes the master branch also has this problem, but the difference is so big between branch master and 2.0 . Could someone help to merge this to the master branch? > Memory leak in Spark Thrift Server > -- > > Key: SPARK-22793 > URL: https://issues.apache.org/jira/browse/SPARK-22793 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.2, 2.2.1 >Reporter: zuotingbing >Priority: Critical > > 1. Start HiveThriftServer2. > 2. Connect to thriftserver through beeline. > 3. Close the beeline. > 4. Repeat steps 2 and 3 several times, which causes the memory leak. > We found many directories that are never dropped under the paths > {code:java} > hive.exec.local.scratchdir > {code} and > {code:java} > hive.exec.scratchdir > {code}. As we know, each scratchdir is added to deleteOnExit when it is > created, so the FileSystem deleteOnExit cache keeps growing until the JVM terminates. > In addition, we used > {code:java} > jmap -histo:live [PID] > {code} to print the object counts in the HiveThriftServer2 process, and found that > instances of "org.apache.spark.sql.hive.client.HiveClientImpl" and > "org.apache.hadoop.hive.ql.session.SessionState" keep increasing even after > we closed all the beeline connections, which causes the memory leak. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
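The deleteOnExit behaviour described in the report is easy to reproduce in isolation. The snippet below is a standalone illustration (the paths are invented) of why a long-lived server leaks through this API: Hadoop's FileSystem.deleteOnExit only drains its internal set when the JVM exits, never while the process keeps running.
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DeleteOnExitLeakDemo {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.getLocal(new Configuration());
        for (int i = 0; i < 10_000; i++) {
            Path scratch = new Path("/tmp/hive-scratch-" + i);
            fs.mkdirs(scratch);
            // Each call adds an entry to FileSystem's internal deleteOnExit
            // set, which is only cleared when the JVM terminates. In a
            // long-lived thrift server this set grows without bound.
            fs.deleteOnExit(scratch);
        }
    }
}
{code}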
[jira] [Updated] (SPARK-22837) Session timeout checker does not work in SessionManager
[ https://issues.apache.org/jira/browse/SPARK-22837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zuotingbing updated SPARK-22837: Summary: Session timeout checker does not work in SessionManager (was: Session timeout checker does not work in Hive Thrift Server) > Session timeout checker does not work in SessionManager > --- > > Key: SPARK-22837 > URL: https://issues.apache.org/jira/browse/SPARK-22837 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.2, 2.2.1 >Reporter: zuotingbing > > Currently, > {code:java} > SessionManager.init > {code} > is never called, so the session timeout checker configs > {code:java} > HIVE_SERVER2_SESSION_CHECK_INTERVAL HIVE_SERVER2_IDLE_SESSION_TIMEOUT > HIVE_SERVER2_IDLE_SESSION_CHECK_OPERATION > {code} > cannot be loaded, and the session timeout checker does not work. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
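To make the failure mode concrete: the timeout checker only runs if the check interval is loaded during init. The sketch below is a simplified stand-in, not the actual Hive SessionManager class, showing why a skipped init leaves idle sessions open forever.
{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class SessionTimeoutCheckerSketch {
    private long checkIntervalMs = 0;  // stays 0 when init() is skipped
    private ScheduledExecutorService checker;

    // Mirrors SessionManager.init loading HIVE_SERVER2_SESSION_CHECK_INTERVAL.
    public void init(long configuredIntervalMs) {
        this.checkIntervalMs = configuredIntervalMs;
    }

    public void start() {
        if (checkIntervalMs <= 0) {
            // init() never ran, so no interval was loaded: the checker
            // thread is never started and idle sessions are never expired.
            return;
        }
        checker = Executors.newSingleThreadScheduledExecutor();
        checker.scheduleWithFixedDelay(
            () -> System.out.println("closing idle sessions..."),
            checkIntervalMs, checkIntervalMs, TimeUnit.MILLISECONDS);
    }
}
{code}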
[jira] [Comment Edited] (SPARK-22793) Memory leak in Spark Thrift Server
[ https://issues.apache.org/jira/browse/SPARK-22793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16292314#comment-16292314 ] zuotingbing edited comment on SPARK-22793 at 12/15/17 10:48 AM: yes the master branch also has this problem, but the difference is so big between branch master and 2.0 . Could someone help to merge this to the master branch? was (Author: zuo.tingbing9): yes the master branch also has this problem,but the different is so big between branch master to 2.0 . i am not sure this can be merged to the master branch. > Memory leak in Spark Thrift Server > -- > > Key: SPARK-22793 > URL: https://issues.apache.org/jira/browse/SPARK-22793 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.2 >Reporter: zuotingbing >Priority: Critical > > 1. Start HiveThriftServer2. > 2. Connect to thriftserver through beeline. > 3. Close the beeline. > 4. repeat step2 and step 3 for several times, which caused the leak of Memory. > we found there are many directories never be dropped under the path > {code:java} > hive.exec.local.scratchdir > {code} and > {code:java} > hive.exec.scratchdir > {code} , as we know the scratchdir has been added to deleteOnExit when it be > created. So it means that the cache size of FileSystem deleteOnExit will keep > increasing until JVM terminated. > In addition, we use > {code:java} > jmap -histo:live [PID] > {code} to printout the size of objects in HiveThriftServer2 Process, we can > find the object "org.apache.spark.sql.hive.client.HiveClientImpl" and > "org.apache.hadoop.hive.ql.session.SessionState" keep increasing even though > we closed all the beeline connections, which caused the leak of Memory. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-22793) Memory leak in Spark Thrift Server
[ https://issues.apache.org/jira/browse/SPARK-22793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16292314#comment-16292314 ] zuotingbing commented on SPARK-22793: - Yes, the master branch also has this problem, but the difference between the master branch and 2.0 is so big. I am not sure this can be merged to the master branch. > Memory leak in Spark Thrift Server > -- > > Key: SPARK-22793 > URL: https://issues.apache.org/jira/browse/SPARK-22793 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.2 >Reporter: zuotingbing >Priority: Critical > > 1. Start HiveThriftServer2. > 2. Connect to the Thrift Server through beeline. > 3. Close beeline. > 4. Repeat steps 2 and 3 several times, which causes the memory leak. > We found there are many directories under the paths > {code:java} > hive.exec.local.scratchdir > {code} and > {code:java} > hive.exec.scratchdir > {code} that are never dropped. As we know, the scratchdir has been added to > deleteOnExit when it is created, so the FileSystem deleteOnExit cache keeps > increasing until the JVM terminates. > In addition, we used > {code:java} > jmap -histo:live [PID] > {code} to print the object histogram of the HiveThriftServer2 process and > found that the counts of "org.apache.spark.sql.hive.client.HiveClientImpl" and > "org.apache.hadoop.hive.ql.session.SessionState" keep increasing even after we > closed all the beeline connections, which causes the memory leak. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-22793) Memory leak in Spark Thrift Server
[ https://issues.apache.org/jira/browse/SPARK-22793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16292248#comment-16292248 ] zuotingbing commented on SPARK-22793: - OK, I will try to check it in the master branch. Thanks. > Memory leak in Spark Thrift Server > -- > > Key: SPARK-22793 > URL: https://issues.apache.org/jira/browse/SPARK-22793 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.2 >Reporter: zuotingbing >Priority: Critical > > 1. Start HiveThriftServer2. > 2. Connect to the Thrift Server through beeline. > 3. Close beeline. > 4. Repeat steps 2 and 3 several times, which causes the memory leak. > We found there are many directories under the paths > {code:java} > hive.exec.local.scratchdir > {code} and > {code:java} > hive.exec.scratchdir > {code} that are never dropped. As we know, the scratchdir has been added to > deleteOnExit when it is created, so the FileSystem deleteOnExit cache keeps > increasing until the JVM terminates. > In addition, we used > {code:java} > jmap -histo:live [PID] > {code} to print the object histogram of the HiveThriftServer2 process and > found that the counts of "org.apache.spark.sql.hive.client.HiveClientImpl" and > "org.apache.hadoop.hive.ql.session.SessionState" keep increasing even after we > closed all the beeline connections, which causes the memory leak. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-22793) Memory leak in Spark Thrift Server
[ https://issues.apache.org/jira/browse/SPARK-22793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16292174#comment-16292174 ] zuotingbing commented on SPARK-22793: - {code:java} lazy val metadataHive: HiveClient = sharedState.metadataHive.newSession() {code} A HiveClient has already been created by {code:java} sharedState.metadataHive {code} but another one is created again by {code:java} .newSession() {code} > Memory leak in Spark Thrift Server > -- > > Key: SPARK-22793 > URL: https://issues.apache.org/jira/browse/SPARK-22793 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.2 >Reporter: zuotingbing >Priority: Critical > > 1. Start HiveThriftServer2. > 2. Connect to the Thrift Server through beeline. > 3. Close beeline. > 4. Repeat steps 2 and 3 several times, which causes the memory leak. > We found there are many directories under the paths > {code:java} > hive.exec.local.scratchdir > {code} and > {code:java} > hive.exec.scratchdir > {code} that are never dropped. As we know, the scratchdir has been added to > deleteOnExit when it is created, so the FileSystem deleteOnExit cache keeps > increasing until the JVM terminates. > In addition, we used > {code:java} > jmap -histo:live [PID] > {code} to print the object histogram of the HiveThriftServer2 process and > found that the counts of "org.apache.spark.sql.hive.client.HiveClientImpl" and > "org.apache.hadoop.hive.ql.session.SessionState" keep increasing even after we > closed all the beeline connections, which causes the memory leak. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
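The snippet in the comment above is Scala from Spark's Thrift Server session setup. As an illustration only (toy types, not Spark's real classes), the pattern it points at looks like this: a shared client already exists, yet every connection derives another client via newSession() and nothing ever closes the derived clients, matching the ever-growing HiveClientImpl/SessionState counts in the jmap histogram:

{code:java}
import scala.collection.mutable

// Toy stand-in for HiveClientImpl; the `live` set mimics what the
// jmap -histo:live counts are showing.
class ToyHiveClient {
  ToyHiveClient.live += this
  def newSession(): ToyHiveClient = new ToyHiveClient
  def close(): Unit = ToyHiveClient.live -= this
}
object ToyHiveClient { val live = mutable.Set.empty[ToyHiveClient] }

object LeakDemo {
  def main(args: Array[String]): Unit = {
    val shared = new ToyHiveClient // like sharedState.metadataHive
    for (_ <- 1 to 1000) {
      // One per beeline connection, like
      //   lazy val metadataHive = sharedState.metadataHive.newSession()
      val perSession = shared.newSession()
      // ...the connection is closed, but perSession.close() never runs...
    }
    println(s"live clients: ${ToyHiveClient.live.size}") // 1001, not 1
  }
}
{code}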
[jira] [Updated] (SPARK-22793) Memory leak in Spark Thrift Server
[ https://issues.apache.org/jira/browse/SPARK-22793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zuotingbing updated SPARK-22793: Description: 1. Start HiveThriftServer2. 2. Connect to the Thrift Server through beeline. 3. Close beeline. 4. Repeat steps 2 and 3 several times, which causes the memory leak. We found there are many directories under the paths {code:java} hive.exec.local.scratchdir {code} and {code:java} hive.exec.scratchdir {code} that are never dropped. As we know, the scratchdir has been added to deleteOnExit when it is created, so the FileSystem deleteOnExit cache keeps increasing until the JVM terminates. In addition, we used {code:java} jmap -histo:live [PID] {code} to print the object histogram of the HiveThriftServer2 process and found that the counts of "org.apache.spark.sql.hive.client.HiveClientImpl" and "org.apache.hadoop.hive.ql.session.SessionState" keep increasing even after we closed all the beeline connections, which causes the memory leak. was: 1. Start HiveThriftServer2 2. Connect to the Thrift Server through beeline 3. Close beeline 4. Repeat steps 2 and 3 several times We found there are many directories under the paths {code:java} hive.exec.local.scratchdir {code} and {code:java} hive.exec.scratchdir {code} that are never dropped. As we know, the scratchdir has been added to deleteOnExit when it is created, so the FileSystem deleteOnExit cache keeps increasing until the JVM terminates. In addition, we used {code:java} jmap -histo:live [PID] {code} to print the object histogram of the HiveThriftServer2 process and found that the counts of "org.apache.spark.sql.hive.client.HiveClientImpl" and "org.apache.hadoop.hive.ql.session.SessionState" keep increasing even after we closed all the beeline connections, which causes the memory leak. > Memory leak in Spark Thrift Server > -- > > Key: SPARK-22793 > URL: https://issues.apache.org/jira/browse/SPARK-22793 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.2 >Reporter: zuotingbing >Priority: Critical > > 1. Start HiveThriftServer2. > 2. Connect to the Thrift Server through beeline. > 3. Close beeline. > 4. Repeat steps 2 and 3 several times, which causes the memory leak. > We found there are many directories under the paths > {code:java} > hive.exec.local.scratchdir > {code} and > {code:java} > hive.exec.scratchdir > {code} that are never dropped. As we know, the scratchdir has been added to > deleteOnExit when it is created, so the FileSystem deleteOnExit cache keeps > increasing until the JVM terminates. > In addition, we used > {code:java} > jmap -histo:live [PID] > {code} to print the object histogram of the HiveThriftServer2 process and > found that the counts of "org.apache.spark.sql.hive.client.HiveClientImpl" and > "org.apache.hadoop.hive.ql.session.SessionState" keep increasing even after we > closed all the beeline connections, which causes the memory leak. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
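If the diagnosis in this description is right, one plausible mitigation (a sketch only, with an assumed hook name; not a committed Spark fix) is to clean a session's scratch directories up eagerly when the session closes, dropping them from the delete-on-exit set via FileSystem.cancelDeleteOnExit instead of waiting for JVM shutdown:

{code:java}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object ScratchDirCleanup {
  // Assumed hook: call this from wherever the Thrift Server tears down a
  // closed session, passing that session's scratchdir.
  def onSessionClosed(scratch: Path, conf: Configuration): Unit = {
    val fs = scratch.getFileSystem(conf)
    // Remove the cached delete-on-exit entry so the set stops growing...
    fs.cancelDeleteOnExit(scratch)
    // ...and delete the directory now instead of at JVM shutdown.
    if (fs.exists(scratch)) {
      fs.delete(scratch, true) // recursive
    }
  }
}
{code}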
[jira] [Updated] (SPARK-22793) Memory leak in Spark Thrift Server
[ https://issues.apache.org/jira/browse/SPARK-22793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zuotingbing updated SPARK-22793: Description: 1. Start HiveThriftServer2 2. Connect to the Thrift Server through beeline 3. Close beeline 4. Repeat steps 2 and 3 several times We found there are many directories under the paths {code:java} hive.exec.local.scratchdir {code} and {code:java} hive.exec.scratchdir {code} that are never dropped. As we know, the scratchdir has been added to deleteOnExit when it is created, so the FileSystem deleteOnExit cache keeps increasing until the JVM terminates. In addition, we used {code:java} jmap -histo:live [PID] {code} to print the object histogram of the HiveThriftServer2 process and found that the counts of "org.apache.spark.sql.hive.client.HiveClientImpl" and "org.apache.hadoop.hive.ql.session.SessionState" keep increasing even after we closed all the beeline connections, which causes the memory leak. was: 1. Start HiveThriftServer2 2. Connect to the Thrift Server through beeline 3. Close beeline 4. Repeat steps 2 and 3 several times We found there are many directories under the paths {code:java} hive.exec.local.scratchdir {code} and {code:java} hive.exec.scratchdir {code} that are never dropped. As we know, the scratchdir is added to deleteOnExit when it is created, so the FileSystem deleteOnExit cache keeps increasing until the JVM terminates. In addition, we used {code:java} jmap -histo:live [PID] {code} to print the object histogram of the HiveThriftServer2 process and found that the counts of "org.apache.spark.sql.hive.client.HiveClientImpl" and "org.apache.hadoop.hive.ql.session.SessionState" keep increasing even after we closed all the beeline connections, which causes the memory leak. > Memory leak in Spark Thrift Server > -- > > Key: SPARK-22793 > URL: https://issues.apache.org/jira/browse/SPARK-22793 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.2 >Reporter: zuotingbing >Priority: Critical > > 1. Start HiveThriftServer2 > 2. Connect to the Thrift Server through beeline > 3. Close beeline > 4. Repeat steps 2 and 3 several times > We found there are many directories under the paths > {code:java} > hive.exec.local.scratchdir > {code} and > {code:java} > hive.exec.scratchdir > {code} that are never dropped. As we know, the scratchdir has been added to > deleteOnExit when it is created, so the FileSystem deleteOnExit cache keeps > increasing until the JVM terminates. > In addition, we used > {code:java} > jmap -histo:live [PID] > {code} to print the object histogram of the HiveThriftServer2 process and > found that the counts of "org.apache.spark.sql.hive.client.HiveClientImpl" and > "org.apache.hadoop.hive.ql.session.SessionState" keep increasing even after we > closed all the beeline connections, which causes the memory leak. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org