[jira] [Comment Edited] (CASSANDRA-8072) Exception during startup: Unable to gossip with any seeds
[ https://issues.apache.org/jira/browse/CASSANDRA-8072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066524#comment-15066524 ]

Stefania edited comment on CASSANDRA-8072 at 12/21/15 3:54 PM:
---------------------------------------------------------------

Building on [~brandon.williams]'s previous analysis, but taking into account more recent changes where we do close sockets, the problem is still that the seed node sends the ACK to the old socket, even after it has been closed by the decommissioned node. This is because we only send on these sockets, so we cannot know when they are closed until the send buffers are exceeded, unless we also try to read from them.

However, the problem should now only last until the node is convicted, approximately 10 seconds with a {{phi_convict_threshold}} of 8. I verified this by adding a sleep of 15 seconds in my test before restarting the node, and it restarted without problems. [~slowenthal] or [~rhatch], would you be able to confirm this with your tests?

If we cannot detect when an outgoing socket is closed by its peer, then we need an out-of-band notification. This could come from the departing node announcing its shutdown at the end of its decommission, but the existing logic in {{Gossiper.stop()}} prevents this for the dead states (*removing, removed, left and hibernate*) and for *bootstrapping*. This was introduced by CASSANDRA-8336, and the same problem has already been raised in CASSANDRA-9630. Even if we undo CASSANDRA-8336, there is another issue: since CASSANDRA-9765 we can no longer join a cluster in status SHUTDOWN, and I believe this is correct. So the answer cannot be to announce a shutdown after decommission, not without significant changes to the Gossip protocol.

Closing the socket earlier, say when we get the status LEFT notification, is not sufficient, because during the RING_DELAY sleep period we may re-establish the connection to the node before it dies, typically for a Gossip update.

So I think we only have two options:
* read from outgoing sockets purely to detect when they are closed
* send a new GOSSIP flag indicating it is time to close the sockets to a node

was (Author: stefania):
Building on [~brandon.williams] previous analysis but taking into account more recent changes where we do close sockets, the problem is still that the seed node is sending the ACK to the old socket, even after it has been closed by the decommissioned node. This is because we only send on these sockets, so we cannot know when they are closed until the send buffers are exceeded or unless we try to read from them as well.

However, the problem should now only be true until the node is convicted, approx 10 seconds with a {{phi_convict_threshold}} of 8. I verified this by adding a sleep of 15 seconds in my test before restarting the node, and it restarted without problems. [~slowenthal] would you be able to confirm this with your tests?

If we cannot detect when an outgoing socket is closed by its peer, then we need an out-of-bound notification. This could come from the departing node announcing its shutdown at the end of its decommission but the existing logic in {{Gossiper.stop()}} prevents this for the dead states (*removing, removed, left and hibernate*) or for *bootstrapping*. This was introduced by CASSANDRA-8336 and the same problem has already been raised in CASSANDRA-9630. Even if we undo CASSANDRA-8336 there is then another issue: since CASSANDRA-9765 we can no longer join a cluster in status SHUTDOWN and I believe this is correct. So the answer cannot be to announce a shutdown after decommission, not without significant changes to the Gossip protocol.

Closing the socket earlier, say when we get the status LEFT notification, is not sufficient because during the RING_DELAY sleep period we may re-establish the connection to the node before it dies, typically for a Gossip update.
So I think we only have two options:
* read from outgoing sockets purely to detect when they are closed
* send a new GOSSIP flag indicating it is time to close the sockets to a node

> Exception during startup: Unable to gossip with any seeds
> ---------------------------------------------------------
>
>          Key: CASSANDRA-8072
>          URL: https://issues.apache.org/jira/browse/CASSANDRA-8072
>      Project: Cassandra
>   Issue Type: Bug
>   Components: Lifecycle
>     Reporter: Ryan Springer
>     Assignee: Stefania
>      Fix For: 2.1.x
>
>  Attachments: cas-dev-dt-01-uw1-cassandra-seed01_logs.tar.bz2,
> cas-dev-dt-01-uw1-cassandra-seed02_logs.tar.bz2,
> cas-dev-dt-01-uw1-cassandra02_logs.tar.bz2,
> casandra-system-log-with-assert-patch.log, screenshot-1.png,
> trace_logs.tar.bz2
>
>
> When Opscenter 4.1.4 or 5.0.1 tries to provision a 2-node DSC 2.0.10 cluster
> in either ec2 or locally, an error occurs sometimes with one
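The first option in the comment above (reading from an outgoing socket purely to detect that the peer has closed it) can be illustrated outside Cassandra with plain `java.net` sockets. The sketch below is not Cassandra code: the class `PeerCloseProbe`, its `probe` and `demo` methods, and the timeout values are hypothetical names chosen for illustration. It assumes the peer never sends data on this connection, which matches the "we only send on these sockets" situation described in the comment; if the peer could send data, the probe would consume a byte.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class PeerCloseProbe {
    /** Possible outcomes of probing a send-only socket with a short read. */
    public enum State { OPEN, CLOSED_BY_PEER }

    /**
     * Reads from a socket we normally only write to. A return value of -1
     * means the peer closed its end; a timeout means nothing has arrived,
     * so the connection is still considered open.
     */
    public static State probe(Socket socket, int timeoutMillis) {
        try {
            socket.setSoTimeout(timeoutMillis);
            int b = socket.getInputStream().read();
            return b == -1 ? State.CLOSED_BY_PEER : State.OPEN;
        } catch (SocketTimeoutException e) {
            return State.OPEN;
        } catch (IOException e) {
            // A connection reset also means the peer is gone.
            return State.CLOSED_BY_PEER;
        }
    }

    /** Loopback demo: connect, probe, let the peer close, probe again. */
    public static State[] demo() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket outgoing = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket peerSide = server.accept()) {
            State before = probe(outgoing, 200);
            peerSide.close(); // the peer goes away, as a decommissioned node would
            State after = probe(outgoing, 2000);
            return new State[] { before, after };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        State[] states = demo();
        System.out.println("before peer close: " + states[0]);
        System.out.println("after peer close:  " + states[1]);
    }
}
```

The point of the sketch is only that a read observes the peer's close immediately, whereas a send-only connection has no such signal. How such a probe would be scheduled inside the messaging layer is not something this thread specifies.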
[jira] [Comment Edited] (CASSANDRA-8072) Exception during startup: Unable to gossip with any seeds
[ https://issues.apache.org/jira/browse/CASSANDRA-8072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14985589#comment-14985589 ]

Kenneth Failbus edited comment on CASSANDRA-8072 at 11/2/15 5:38 PM:
---------------------------------------------------------------------

Upon enabling trace on the one seed and re-bootstrapping the new node, I got the following exception on the node that was bootstrapping.
{code}
2015-11-02 17:34:52,150 [ACCEPT-/10.22.168.53] DEBUG MessagingService Error reading the socket Socket[addr=/10.xx.xx.xx,port=46678,localport=10xxx]
java.net.SocketTimeoutException
    at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:229)
    at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
    at java.io.InputStream.read(InputStream.java:101)
    at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:81)
    at java.io.DataInputStream.readInt(DataInputStream.java:387)
    at org.apache.cassandra.net.MessagingService$SocketThread.run(MessagingService.java:916)
{code}

was (Author: kenfailbus):
Upon enabling trace on the one see and re-bootstrapping the new node, I got the following exception
{code}
2015-11-02 17:34:52,150 [ACCEPT-/10.22.168.53] DEBUG MessagingService Error reading the socket Socket[addr=/10.xx.xx.xx,port=46678,localport=10xxx]
java.net.SocketTimeoutException
    at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:229)
    at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
    at java.io.InputStream.read(InputStream.java:101)
    at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:81)
    at java.io.DataInputStream.readInt(DataInputStream.java:387)
    at org.apache.cassandra.net.MessagingService$SocketThread.run(MessagingService.java:916)
{code}

> Exception during startup: Unable to gossip with any seeds
> ---------------------------------------------------------
>
>          Key: CASSANDRA-8072
>          URL: https://issues.apache.org/jira/browse/CASSANDRA-8072
>      Project: Cassandra
>   Issue Type: Bug
>     Reporter: Ryan Springer
>     Assignee: Stefania
>      Fix For: 2.1.x
>
>  Attachments: cas-dev-dt-01-uw1-cassandra-seed01_logs.tar.bz2,
> cas-dev-dt-01-uw1-cassandra-seed02_logs.tar.bz2,
> cas-dev-dt-01-uw1-cassandra02_logs.tar.bz2,
> casandra-system-log-with-assert-patch.log, screenshot-1.png,
> trace_logs.tar.bz2
>
>
> When Opscenter 4.1.4 or 5.0.1 tries to provision a 2-node DSC 2.0.10 cluster
> in either ec2 or locally, an error occurs sometimes with one of the nodes
> refusing to start C*. The error in the /var/log/cassandra/system.log is:
> ERROR [main] 2014-10-06 15:54:52,292 CassandraDaemon.java (line 513) Exception encountered during startup
> java.lang.RuntimeException: Unable to gossip with any seeds
>     at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1200)
>     at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:444)
>     at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:655)
>     at org.apache.cassandra.service.StorageService.initServer(StorageService.java:609)
>     at org.apache.cassandra.service.StorageService.initServer(StorageService.java:502)
>     at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:378)
>     at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:496)
>     at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:585)
> INFO [StorageServiceShutdownHook] 2014-10-06 15:54:52,326 Gossiper.java (line 1279) Announcing shutdown
> INFO [StorageServiceShutdownHook] 2014-10-06 15:54:54,326 MessagingService.java (line 701) Waiting for messaging service to quiesce
> INFO [ACCEPT-localhost/127.0.0.1] 2014-10-06 15:54:54,327 MessagingService.java (line 941) MessagingService has terminated the accept() thread
> This error does not always occur when provisioning a 2-node cluster, but
> probably around half of the time on only one of the nodes. I haven't been
> able to reproduce this error with DSC 2.0.9, and there have been no code or
> definition file changes in Opscenter.
> I can reproduce locally with the above steps. I'm happy to test any proposed
> fixes since I'm the only person able to reproduce reliably so far.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
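The stack trace in this message ends in `DataInputStream.readInt` raising `java.net.SocketTimeoutException` on an accepted connection, which is what happens when a peer connects but no header bytes arrive before the read timeout expires. The following is a minimal loopback sketch of that failure mode, not the `MessagingService` code itself: the class `SilentPeerDemo`, the method `headerReadTimesOut`, and the 300 ms timeout are assumptions made for illustration.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class SilentPeerDemo {
    /**
     * Accepts one loopback connection whose client never writes, then tries
     * to read a 4-byte header with a read timeout set on the accepted socket.
     * Returns true if the read timed out.
     */
    public static boolean headerReadTimesOut(int timeoutMillis) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket silentClient = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {
            accepted.setSoTimeout(timeoutMillis);
            DataInputStream in = new DataInputStream(accepted.getInputStream());
            try {
                in.readInt(); // blocks: the client connected but sends nothing
                return false;
            } catch (SocketTimeoutException e) {
                return true;  // same exception type as in the trace above
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("header read timed out: " + headerReadTimesOut(300));
    }
}
```

The sketch only reproduces the exception type. It says nothing about why the connecting side stayed silent in the reported cluster.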
[jira] [Comment Edited] (CASSANDRA-8072) Exception during startup: Unable to gossip with any seeds
[ https://issues.apache.org/jira/browse/CASSANDRA-8072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14985589#comment-14985589 ]

Kenneth Failbus edited comment on CASSANDRA-8072 at 11/2/15 5:49 PM:
---------------------------------------------------------------------

Upon enabling trace on the one seed node and re-bootstrapping the new node, I got the following exception on the node that was bootstrapping.
{code}
2015-11-02 17:34:52,150 [ACCEPT-/10.22.168.53] DEBUG MessagingService Error reading the socket Socket[addr=/10.xx.xx.xx,port=46678,localport=10xxx]
java.net.SocketTimeoutException
    at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:229)
    at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
    at java.io.InputStream.read(InputStream.java:101)
    at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:81)
    at java.io.DataInputStream.readInt(DataInputStream.java:387)
    at org.apache.cassandra.net.MessagingService$SocketThread.run(MessagingService.java:916)
{code}

was (Author: kenfailbus):
Upon enabling trace on the one see and re-bootstrapping the new node, I got the following exception on the node that was bootstrapping.
{code}
2015-11-02 17:34:52,150 [ACCEPT-/10.22.168.53] DEBUG MessagingService Error reading the socket Socket[addr=/10.xx.xx.xx,port=46678,localport=10xxx]
java.net.SocketTimeoutException
    at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:229)
    at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
    at java.io.InputStream.read(InputStream.java:101)
    at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:81)
    at java.io.DataInputStream.readInt(DataInputStream.java:387)
    at org.apache.cassandra.net.MessagingService$SocketThread.run(MessagingService.java:916)
{code}

> Exception during startup: Unable to gossip with any seeds
> ---------------------------------------------------------
>
>          Key: CASSANDRA-8072
>          URL: https://issues.apache.org/jira/browse/CASSANDRA-8072
>      Project: Cassandra
>   Issue Type: Bug
>     Reporter: Ryan Springer
>     Assignee: Stefania
>      Fix For: 2.1.x
>
>  Attachments: cas-dev-dt-01-uw1-cassandra-seed01_logs.tar.bz2,
> cas-dev-dt-01-uw1-cassandra-seed02_logs.tar.bz2,
> cas-dev-dt-01-uw1-cassandra02_logs.tar.bz2,
> casandra-system-log-with-assert-patch.log, screenshot-1.png,
> trace_logs.tar.bz2
>
>
> When Opscenter 4.1.4 or 5.0.1 tries to provision a 2-node DSC 2.0.10 cluster
> in either ec2 or locally, an error occurs sometimes with one of the nodes
> refusing to start C*.
[jira] [Comment Edited] (CASSANDRA-8072) Exception during startup: Unable to gossip with any seeds
[ https://issues.apache.org/jira/browse/CASSANDRA-8072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14985589#comment-14985589 ]

Kenneth Failbus edited comment on CASSANDRA-8072 at 11/2/15 5:50 PM:
---------------------------------------------------------------------

Upon enabling trace on the one seed node and re-bootstrapping the new node, I got the following exception on the node that was bootstrapping.
{code}
2015-11-02 17:34:52,150 [ACCEPT-/10.22.168.53] DEBUG MessagingService Error reading the socket Socket[addr=/10.xx.xx.xx,port=46678,localport=10xxx]
java.net.SocketTimeoutException
    at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:229)
    at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
    at java.io.InputStream.read(InputStream.java:101)
    at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:81)
    at java.io.DataInputStream.readInt(DataInputStream.java:387)
    at org.apache.cassandra.net.MessagingService$SocketThread.run(MessagingService.java:916)
{code}
And then the usual exception that this ticket is about:
{code}
2015-11-02 17:34:55,526 [EXPIRING-MAP-REAPER:1] TRACE ExpiringMap Expired 0 entries
2015-11-02 17:35:00,526 [EXPIRING-MAP-REAPER:1] TRACE ExpiringMap Expired 0 entries
2015-11-02 17:35:05,527 [EXPIRING-MAP-REAPER:1] TRACE ExpiringMap Expired 0 entries
2015-11-02 17:35:10,527 [EXPIRING-MAP-REAPER:1] TRACE ExpiringMap Expired 0 entries
2015-11-02 17:35:11,982 [main] ERROR CassandraDaemon Exception encountered during startup
java.lang.RuntimeException: Unable to gossip with any seeds
    at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1296)
    at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:457)
    at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:671)
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:623)
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:515)
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:437)
    at com.datastax.bdp.server.DseDaemon.setup(DseDaemon.java:423)
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:567)
    at com.datastax.bdp.server.DseDaemon.main(DseDaemon.java:641)
2015-11-02 17:35:11,986 [Thread-7] INFO DseDaemon DSE shutting down...
2015-11-02 17:35:11,987 [StorageServiceShutdownHook] WARN Gossiper No local state or state is in silent shutdown, not announcing shutdown
2015-11-02 17:35:11,987 [StorageServiceShutdownHook] INFO MessagingService Waiting for messaging service to quiesce
2015-11-02 17:35:11,987 [StorageServiceShutdownHook] DEBUG MessagingService Closing accept() thread
2015-11-02 17:35:11,988 [ACCEPT-/10.22.168.53] DEBUG MessagingService Asynchronous close seen by server thread
2015-11-02 17:35:11,988 [ACCEPT-/10.22.168.53] INFO MessagingService MessagingService has terminated the accept() thread
2015-11-02 17:35:12,068 [Thread-7] ERROR CassandraDaemon Exception in thread Thread[Thread-7,5,main]
{code}

was (Author: kenfailbus):
Upon enabling trace on the one seed node and re-bootstrapping the new node, I got the following exception on the node that was bootstrapping.
{code}
2015-11-02 17:34:52,150 [ACCEPT-/10.22.168.53] DEBUG MessagingService Error reading the socket Socket[addr=/10.xx.xx.xx,port=46678,localport=10xxx]
java.net.SocketTimeoutException
    at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:229)
    at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
    at java.io.InputStream.read(InputStream.java:101)
    at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:81)
    at java.io.DataInputStream.readInt(DataInputStream.java:387)
    at org.apache.cassandra.net.MessagingService$SocketThread.run(MessagingService.java:916)
{code}

> Exception during startup: Unable to gossip with any seeds
> ---------------------------------------------------------
>
>          Key: CASSANDRA-8072
>          URL: https://issues.apache.org/jira/browse/CASSANDRA-8072
>      Project: Cassandra
>   Issue Type: Bug
>     Reporter: Ryan Springer
>     Assignee: Stefania
>      Fix For: 2.1.x
>
>  Attachments: cas-dev-dt-01-uw1-cassandra-seed01_logs.tar.bz2,
> cas-dev-dt-01-uw1-cassandra-seed02_logs.tar.bz2,
> cas-dev-dt-01-uw1-cassandra02_logs.tar.bz2,
> casandra-system-log-with-assert-patch.log, screenshot-1.png,
> trace_logs.tar.bz2
>
>
> When Opscenter 4.1.4 or 5.0.1 tries to provision a 2-node DSC 2.0.10 cluster
> in either ec2 or locally, an error occurs sometimes with one of the nodes
> refusing to start C*. The error in the /var/log/cassandra/system.log is:
> ERROR [main] 2014-10-06 15:54:52,292 CassandraDaemon.java (line
[jira] [Comment Edited] (CASSANDRA-8072) Exception during startup: Unable to gossip with any seeds
[ https://issues.apache.org/jira/browse/CASSANDRA-8072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14742914#comment-14742914 ]

Steven Lowenthal edited comment on CASSANDRA-8072 at 9/14/15 4:49 AM:
----------------------------------------------------------------------

I can easily reproduce this with my automated launcher. I even tried to better randomize when non-seed nodes come in to join. I recently noticed that the seed node has a socket stuck in CLOSE_WAIT for the nodes that report that they can't gossip with any seeds. Perhaps the solution lies in ensuring that both ends of the connection properly close the connection. It's likely that the client (the node asking to join) hits an exception and dies without cleanly closing the connection.

See the screenshot. Also, the well-known service name for port 7000 is afs3-fileserver.

was (Author: slowenthal):
I can easily reproduce this with my automated launcher I even tried to better randomize when non-seed nodes come in to join. I recently noticed that the seed node has a socket stuck in CLOSE_WAIT for the nodes that report can't gossip with any seeds. Perhaps the solution lies in ensuring that both ends of the connection properly close the connection. It's likely the client (the node asking to join) exceptions out and dies without elegantly closing the connection.

> Exception during startup: Unable to gossip with any seeds
> ---------------------------------------------------------
>
>          Key: CASSANDRA-8072
>          URL: https://issues.apache.org/jira/browse/CASSANDRA-8072
>      Project: Cassandra
>   Issue Type: Bug
>     Reporter: Ryan Springer
>     Assignee: Brandon Williams
>      Fix For: 2.1.x
>
>  Attachments: cas-dev-dt-01-uw1-cassandra-seed01_logs.tar.bz2,
> cas-dev-dt-01-uw1-cassandra-seed02_logs.tar.bz2,
> cas-dev-dt-01-uw1-cassandra02_logs.tar.bz2,
> casandra-system-log-with-assert-patch.log, screenshot-1.png,
> trace_logs.tar.bz2
>
>
> When Opscenter 4.1.4 or 5.0.1 tries to provision a 2-node DSC 2.0.10 cluster
> in either ec2 or locally, an error occurs sometimes with one of the nodes
> refusing to start C*.
[jira] [Comment Edited] (CASSANDRA-8072) Exception during startup: Unable to gossip with any seeds
[ https://issues.apache.org/jira/browse/CASSANDRA-8072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14580341#comment-14580341 ]

Andreas Schnitzerling edited comment on CASSANDRA-8072 at 6/10/15 10:15 AM:
----------------------------------------------------------------------------

Hello, I took the following steps: I decommissioned a 2.0.15 node with 128 vnodes and tried to bootstrap 2.1.6-R on the same node with 256 vnodes in write survey mode as a test. 2.1.6 doesn't bootstrap because of the "Unable to gossip" exception, but the old 2.0.15 does so without problems. Even if I use the cassandra.yaml from 2.0.15 (with the properties that are invalid for 2.1.6 deleted), it doesn't start. I have 14 nodes running 2.0.15 on Windows 7.
{panel:title=system.log}
ERROR [main] 2015-06-10 12:03:22,200 CassandraDaemon.java:553 - Exception encountered during startup
java.lang.RuntimeException: Unable to gossip with any seeds
    at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1307) ~[apache-cassandra-2.1.6.jar:2.1.6]
    at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:530) ~[apache-cassandra-2.1.6.jar:2.1.6]
    at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:774) ~[apache-cassandra-2.1.6.jar:2.1.6]
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:711) ~[apache-cassandra-2.1.6.jar:2.1.6]
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:602) ~[apache-cassandra-2.1.6.jar:2.1.6]
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:394) [apache-cassandra-2.1.6.jar:2.1.6]
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:536) [apache-cassandra-2.1.6.jar:2.1.6]
    at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:625) [apache-cassandra-2.1.6.jar:2.1.6]
WARN [StorageServiceShutdownHook] 2015-06-10 12:03:22,200 Gossiper.java:1418 - No local state or state is in silent shutdown, not announcing shutdown
INFO [StorageServiceShutdownHook] 2015-06-10 12:03:22,200 MessagingService.java:708 - Waiting for messaging service to quiesce
INFO [ACCEPT-PC5771/10.2.0.61] 2015-06-10 12:03:22,200 MessagingService.java:958 - MessagingService has terminated the accept() thread
{panel}

was (Author: andie78):
Hello, I made following steps: I decom a 2.0.15 node with 128 vnodes and tried to bootstrap 2.1.6-R on the same node w/ 256 vnodes in write survey mode to test. 2.1.6 doesn't bootstrap because of the unable gossib exception but the old 2.0.15 does it w/o problems. Even if i use cassandra.yaml from 2.0.15 (deletetd properties invalid for 2.1.6) it doesn't start. I have 14 nodes 2.0.15 running on Windows 7.

> Exception during startup: Unable to gossip with any seeds
> ---------------------------------------------------------
>
>          Key: CASSANDRA-8072
>          URL: https://issues.apache.org/jira/browse/CASSANDRA-8072
>      Project: Cassandra
>   Issue Type: Bug
>     Reporter: Ryan Springer
>     Assignee: Brandon Williams
>      Fix For: 2.1.x, 2.0.x
>
>  Attachments: cas-dev-dt-01-uw1-cassandra-seed01_logs.tar.bz2,
> cas-dev-dt-01-uw1-cassandra-seed02_logs.tar.bz2,
> cas-dev-dt-01-uw1-cassandra02_logs.tar.bz2,
> casandra-system-log-with-assert-patch.log, trace_logs.tar.bz2
>
>
> When Opscenter 4.1.4 or 5.0.1 tries to provision a 2-node DSC 2.0.10 cluster
> in either ec2 or locally, an error occurs sometimes with one of the nodes
> refusing to start C*.
[jira] [Comment Edited] (CASSANDRA-8072) Exception during startup: Unable to gossip with any seeds
[ https://issues.apache.org/jira/browse/CASSANDRA-8072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14580409#comment-14580409 ]

Andreas Schnitzerling edited comment on CASSANDRA-8072 at 6/10/15 11:16 AM:
----------------------------------------------------------------------------

I tried several times to switch between 2.0.15 and 2.1.6 (start the bootstrap on 2.0.15, stop, copy the data to 2.1.6), but 2.1.6 doesn't start properly. One time, 2.1.6 saw only the other nodes, not itself. Now I have deleted all data on that node again and removed the node with nodetool, and the bootstrap with 2.0.15 works well in write survey mode. I'll wait until the bootstrap has finished and then try to upgrade the same node to 2.1.6, also in write survey mode. It seems that 2.1.6 is not backward compatible when bootstrapping into a 2.0.x cluster.

was (Author: andie78):
I tried severel times to switch between 2.0.15 and 2.1.6 (start bootstr at 2.0.15, stop, copy data to 2.1.6) but 2.1.6 don't start probably. One time, 2.1.6 have seen only other nodes, not himself. Now I deleted all again on that node, removed the node with nodetool and now the bootstrap w/ 2.0.15 works well in write survey mode. I'll wait until bootstrap finished and then I try to upgrade the same node to 2.1.6 w/ write survey as well. Seems like 2.1.6 is not backward compatible bootstrapping to a 2.0.x cluster.

> Exception during startup: Unable to gossip with any seeds
> ---------------------------------------------------------
>
>          Key: CASSANDRA-8072
>          URL: https://issues.apache.org/jira/browse/CASSANDRA-8072
>      Project: Cassandra
>   Issue Type: Bug
>     Reporter: Ryan Springer
>     Assignee: Brandon Williams
>      Fix For: 2.1.x, 2.0.x
>
>  Attachments: cas-dev-dt-01-uw1-cassandra-seed01_logs.tar.bz2,
> cas-dev-dt-01-uw1-cassandra-seed02_logs.tar.bz2,
> cas-dev-dt-01-uw1-cassandra02_logs.tar.bz2,
> casandra-system-log-with-assert-patch.log, trace_logs.tar.bz2
>
>
> When Opscenter 4.1.4 or 5.0.1 tries to provision a 2-node DSC 2.0.10 cluster
> in either ec2 or locally, an error occurs sometimes with one of the nodes
> refusing to start C*.
[jira] [Comment Edited] (CASSANDRA-8072) Exception during startup: Unable to gossip with any seeds
[ https://issues.apache.org/jira/browse/CASSANDRA-8072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14580409#comment-14580409 ]

Andreas Schnitzerling edited comment on CASSANDRA-8072 at 6/10/15 11:17 AM:
----------------------------------------------------------------------------

I tried several times to switch between 2.0.15 and 2.1.6 (start the bootstrap on 2.0.15, stop, copy the data to 2.1.6), but 2.1.6 doesn't start properly. One time, 2.1.6 saw only the other nodes, not itself. Now I have deleted all data on that node again and removed the node with nodetool, and the bootstrap with 2.0.15 works well in write survey mode. I'll wait until the bootstrap has finished and then try to upgrade the same node to 2.1.6, also in write survey mode. It seems that 2.1.6 is not backward compatible when bootstrapping into a 2.0.x cluster. Can you confirm that behavior?

was (Author: andie78):
I tried severel times to switch between 2.0.15 and 2.1.6 (start bootstr at 2.0.15, stop, copy data to 2.1.6) but 2.1.6 doesn't start probably. One time, 2.1.6 had seen only other nodes, not himself. Now I deleted all data again on that node, removed the node with nodetool and now the bootstrap w/ 2.0.15 works well in write survey mode. I'll wait until bootstrap finished and then I try to upgrade the same node to 2.1.6 w/ write survey as well. Seems like 2.1.6 is not backward compatible bootstrapping to a 2.0.x cluster.

> Exception during startup: Unable to gossip with any seeds
> ---------------------------------------------------------
>
>          Key: CASSANDRA-8072
>          URL: https://issues.apache.org/jira/browse/CASSANDRA-8072
>      Project: Cassandra
>   Issue Type: Bug
>     Reporter: Ryan Springer
>     Assignee: Brandon Williams
>      Fix For: 2.1.x, 2.0.x
>
>  Attachments: cas-dev-dt-01-uw1-cassandra-seed01_logs.tar.bz2,
> cas-dev-dt-01-uw1-cassandra-seed02_logs.tar.bz2,
> cas-dev-dt-01-uw1-cassandra02_logs.tar.bz2,
> casandra-system-log-with-assert-patch.log, trace_logs.tar.bz2
>
>
> When Opscenter 4.1.4 or 5.0.1 tries to provision a 2-node DSC 2.0.10 cluster
> in either ec2 or locally, an error occurs sometimes with one of the nodes
> refusing to start C*.
The error in the /var/log/cassandra/system.log is:

ERROR [main] 2014-10-06 15:54:52,292 CassandraDaemon.java (line 513) Exception encountered during startup
java.lang.RuntimeException: Unable to gossip with any seeds
        at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1200)
        at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:444)
        at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:655)
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:609)
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:502)
        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:378)
        at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:496)
        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:585)
 INFO [StorageServiceShutdownHook] 2014-10-06 15:54:52,326 Gossiper.java (line 1279) Announcing shutdown
 INFO [StorageServiceShutdownHook] 2014-10-06 15:54:54,326 MessagingService.java (line 701) Waiting for messaging service to quiesce
 INFO [ACCEPT-localhost/127.0.0.1] 2014-10-06 15:54:54,327 MessagingService.java (line 941) MessagingService has terminated the accept() thread

This error does not always occur when provisioning a 2-node cluster, but probably around half of the time, and only on one of the nodes. I haven't been able to reproduce this error with DSC 2.0.9, and there have been no code or definition file changes in Opscenter.

I can reproduce locally with the above steps. I'm happy to test any proposed fixes since I'm the only person able to reproduce reliably so far.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-8072) Exception during startup: Unable to gossip with any seeds
[ https://issues.apache.org/jira/browse/CASSANDRA-8072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500545#comment-14500545 ]

Brandon Williams edited comment on CASSANDRA-8072 at 4/19/15 6:39 PM:

After deep packet inspection, I believe I've found the root cause of the non-reconnectable-snitch part of this issue. When you decommission a node, it never correctly tears down its ITC pools, which leaves the other side with a dead OTC pool:

{noformat}
tcp        1      0 10.208.8.123:33441      10.208.8.63:7000        CLOSE_WAIT  18401/java
{noformat}

Now when you try to bootstrap with the same IP, the shadow syn is correctly sent and the ack reply is built and queued, but MS tries to use the now defunct OTC pool and the message never makes it back to the node, since the node just replies with TCP RSTs, which finally kill the connection. But since the gossip syn is only sent once, the seed has nothing else to send the node and never reestablishes the connection, leaving the bootstrapping node thinking it never talked to a seed and throwing this error.

was (Author: brandon.williams):
After deep packet inspection, I believe I've found the root non-reconnectable snitch part of this issue. When you decom a node, it never correctly tears down its ITC pools, which leaves the other side with a dead OTC pool:

{noformat}
tcp        1      0 10.208.8.123:33441      10.208.8.63:7000        CLOSE_WAIT  18401/java
{noformat}

Now when you try to bootstrap with the same IP, the shadow syn is correctly sent and the ack reply is built and queued, but MS tries to use the now default OTC pool and the message never makes it back to the node, since it just sends RSTs which finally kills the connection. But since the syn is only sent once, the seed has nothing else to send the node and never reestablishes the connection, leaving the bootstrapping node thinking it never talked to a seed and throwing this error.
Exception during startup: Unable to gossip with any seeds

Key: CASSANDRA-8072
URL: https://issues.apache.org/jira/browse/CASSANDRA-8072
Project: Cassandra
Issue Type: Bug
Reporter: Ryan Springer
Assignee: Brandon Williams
Fix For: 2.0.15, 2.1.5
Attachments: cas-dev-dt-01-uw1-cassandra-seed01_logs.tar.bz2, cas-dev-dt-01-uw1-cassandra-seed02_logs.tar.bz2, cas-dev-dt-01-uw1-cassandra02_logs.tar.bz2, casandra-system-log-with-assert-patch.log, trace_logs.tar.bz2

When Opscenter 4.1.4 or 5.0.1 tries to provision a 2-node DSC 2.0.10 cluster in either ec2 or locally, an error occurs sometimes with one of the nodes refusing to start C*.
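The CLOSE_WAIT entry in Brandon Williams's comment reflects a general property of TCP: an endpoint that only ever writes to a connection does not notice that its peer has closed until a later write fails, whereas a read reports end-of-stream immediately. The following is a minimal, self-contained Java sketch of that behaviour on a loopback connection. It is not Cassandra code; the class name HalfClosedSocketDemo and its members are hypothetical and chosen only for illustration.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo class, not part of Cassandra.
public class HalfClosedSocketDemo {

    /** What the surviving endpoint observes after its peer has closed. */
    public static final class Probe {
        public final boolean readSawEof;     // read() returned -1
        public final boolean writeSucceeded; // first write raised no error

        Probe(boolean readSawEof, boolean writeSucceeded) {
            this.readSawEof = readSawEof;
            this.writeSucceeded = writeSucceeded;
        }
    }

    public static Probe probeAfterPeerClose() {
        try (ServerSocket listener = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket outbound = new Socket(listener.getInetAddress(), listener.getLocalPort())) {

            // The peer accepts the connection and closes it straight away,
            // which leaves our end of the connection in CLOSE_WAIT.
            listener.accept().close();

            // Reading is what reveals the close: read() returns -1 once the
            // peer's FIN has arrived. The timeout only guards against hangs.
            outbound.setSoTimeout(5000);
            boolean readSawEof = outbound.getInputStream().read() == -1;

            // A write-only user of the socket sees nothing wrong yet: the
            // first write is accepted by the local TCP stack even though the
            // peer is gone. Only a later write would be expected to fail.
            boolean writeSucceeded;
            try {
                OutputStream out = outbound.getOutputStream();
                out.write(1);
                out.flush();
                writeSucceeded = true;
            } catch (IOException e) {
                writeSucceeded = false;
            }
            return new Probe(readSawEof, writeSucceeded);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Probe p = probeAfterPeerClose();
        System.out.println("read saw EOF: " + p.readSawEof);
        System.out.println("first write succeeded: " + p.writeSucceeded);
    }
}
```

On a typical Linux loopback both values are expected to be true, which matches the description in the comment: the sender happily queues a message on a connection whose other end no longer exists. How an outbound-only connection pool should learn about the close is a design question this sketch does not try to answer.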
[jira] [Comment Edited] (CASSANDRA-8072) Exception during startup: Unable to gossip with any seeds
[ https://issues.apache.org/jira/browse/CASSANDRA-8072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14490383#comment-14490383 ]

John Alberts edited comment on CASSANDRA-8072 at 4/10/15 9:23 PM:

Logs from a Cassandra cluster with logging set to TRACE. They are from a newly launched node on which Cassandra failed to start. The cluster runs on EC2 using the Ec2MultiRegionSnitch. I was able to reproduce this issue on a new cluster: I decommissioned a node, shut it down, and brought up a new node with the same EIP, and the new node failed to start.

was (Author: albertsj1):
Logs from a Cassandra cluster with logging set to TRACE. They are from a newly launched node on which Cassandra failed to start.

Exception during startup: Unable to gossip with any seeds

Key: CASSANDRA-8072
URL: https://issues.apache.org/jira/browse/CASSANDRA-8072
Project: Cassandra
Issue Type: Bug
Reporter: Ryan Springer
Assignee: Brandon Williams
Fix For: 2.0.15, 2.1.5
Attachments: casandra-system-log-with-assert-patch.log, trace_logs.tar.bz2

When Opscenter 4.1.4 or 5.0.1 tries to provision a 2-node DSC 2.0.10 cluster in either ec2 or locally, an error occurs sometimes with one of the nodes refusing to start C*.
[jira] [Comment Edited] (CASSANDRA-8072) Exception during startup: Unable to gossip with any seeds
[ https://issues.apache.org/jira/browse/CASSANDRA-8072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201236#comment-14201236 ]

Joseph Clark edited comment on CASSANDRA-8072 at 11/6/14 11:52 PM:

[CASSANDRA-8274|https://issues.apache.org/jira/browse/CASSANDRA-8274] appears to me to be the root cause, in my situation at least, of [CASSANDRA-7292|https://issues.apache.org/jira/browse/CASSANDRA-7292]. I'm still not convinced that 7292 and 8072 are duplicate issues.

was (Author: jw.clark):
[CASSANDRA-8274|https://issues.apache.org/jira/browse/CASSANDRA-8274] appears to me to be the root cause, in my situation at least, of [CASSANDRA-7292|https://issues.apache.org/jira/browse/CASSANDRA-7292]. I'm still not convinced that 7292 is a duplicate issue.

Exception during startup: Unable to gossip with any seeds

Key: CASSANDRA-8072
URL: https://issues.apache.org/jira/browse/CASSANDRA-8072
Project: Cassandra
Issue Type: Bug
Reporter: Ryan Springer
Assignee: Brandon Williams

When Opscenter 4.1.4 or 5.0.1 tries to provision a 2-node DSC 2.0.10 cluster in either ec2 or locally, an error occurs sometimes with one of the nodes refusing to start C*.