RE: Re: Unable to gossip with peers when starting cluster

Ben Klein Fri, 11 Nov 2022 09:46:39 -0800

0.9 was never a seed before.

Based on your comment, I also tried restarting 0.7 with all three nodes up (following the initial bootstrap). This failed with the same error.
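For reference, a single-seed setup like the one described in the original message would normally be written in cassandra.yaml roughly as follows. This is a sketch that assumes the stock SimpleSeedProvider and the 4.0 key names. The attached configuration file is authoritative, and the timeout values are only the ones tried in this thread, not recommendations.

```yaml
# Sketch only: assumes the default SimpleSeedProvider; values are taken from this thread.
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      # Every node, including .7 and .9, lists only Node 1 (.31) as its seed.
      - seeds: "192.168.0.31:7000"

# Internode (gossip/messaging) port seen in the log addresses.
storage_port: 7000

# Settings that were tried without success, per the original message.
cross_node_timeout: false
internode_tcp_connect_timeout_in_ms: 20000
```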



On 2022/11/09 15:37:24 Jeff Jirsa wrote:
> When you say you configured them to talk to .0.31 as a seed, did you do
> that by changing the yaml?
>
> Was 0.9 ever a seed before?
>
> I expect if you start 0.7 and 0.9 at the same time, it all works. This
> looks like a logic/state bug that needs to be fixed, though.
>
> (If you're going to upgrade, usually you start with all 3 hosts up and
> restart one at a time. Starting with 0 online is likely poorly tested, and
> we should fix that.)
>
>
>
> On Wed, Nov 9, 2022 at 7:08 AM Klein, Benjamin E (PERATON) <
> benjamin.e.kl...@peraton.com> wrote:
>
> > I am trying to upgrade a three-node Cassandra cluster (192.168.0.31,
> > 192.168.0.7, and 192.168.0.9) from 3.11 to 4.0.3. At the start of the
> > process, all three nodes are down. I have configured all three nodes to
> > have 192.168.0.31:7000 as their only seed.
> >
> > I am trying to bring all three nodes up, one at a time. Starting Node 1
> > (.31) works just fine. However, Node 2 (.7) fails to start with the error
> > message "Unable to gossip with any peers". The configuration file and log
> > from Node 2 are attached (the log has had lines related to loading
> > individual tables snipped); the relevant portion of the log is at the
> > bottom of this message. Note that this node was able to successfully
> > connect to the other seed node.
> >
> > I have already tried the following unsuccessfully:
> >
> > * Starting with a completely blank (i.e., newly formatted) /datadrive on
> > all nodes. This worked fine the first time the cluster started; however,
> > attempting to restart the cluster gives the same error.
> > * Ensuring that all clocks are synchronized to the same NTP servers, which
> > have a ping time to all three nodes of approximately 0.5-1.0 ms
> > * Setting the cross_node_timeout configuration entry to false
> > * Setting the internode_tcp_connect_timeout_in_ms configuration entry to
> > 20000
> > * Adding an entry for each node in its /etc/hosts file (e.g., Node 1 gets
> > the entry "192.168.0.31 node-1")
> >
> > Is there anything else I should try?
> >
> > ---
> > Relevant portion of Cassandra log:
> > INFO [main] 2022-11-04 16:57:02,541 StorageService.java:755 - Loading
> > persisted ring state
> > INFO [main] 2022-11-04 16:57:02,541 StorageService.java:838 - Populating
> > token metadata from system tables
> > INFO [GossipStage:1] 2022-11-04 16:57:02,570 Gossiper.java:1969 - Adding
> > /192.168.0.31:7000 as there was no previous epState; new state is
> > EndpointState: HeartBeatState = HeartBeat: generation = 0, version = -1,
> > AppStateMap = {}
> > INFO [GossipStage:1] 2022-11-04 16:57:02,570 Gossiper.java:1969 - Adding
> > /192.168.0.9:7000 as there was no previous epState; new state is
> > EndpointState: HeartBeatState = HeartBeat: generation = 0, version = -1,
> > AppStateMap = {}
> > INFO [main] 2022-11-04 16:57:02,705 InboundConnectionInitiator.java:127 -
> > Listening on address: (/192.168.0.7:7000), nic: eth0, encryption:
> > unencrypted
> > INFO [Messaging-EventLoop-3-3] 2022-11-04 16:57:02,993
> > OutboundConnection.java:1150 -
> > /192.168.0.7:7000(/192.168.0.7:55882)->/192.168.0.31:7000-URGENT_MESSAGES-ef0bde62
> > successfully connected, version = 12, framing = CRC, encryption = unencrypted
> > INFO [Messaging-EventLoop-3-6] 2022-11-04 16:57:07,938
> > NoSpamLogger.java:92 -
> > /192.168.0.7:7000->/192.168.0.9:7000-URGENT_MESSAGES-[no-channel]
> > failed to connect
> > io.netty.channel.AbstractChannel$AnnotatedConnectException:
> > finishConnect(..) failed: Connection refused: /192.168.0.9:7000
> > Caused by: java.net.ConnectException: finishConnect(..) failed: Connection
> > refused
> > at io.netty.channel.unix.Errors.throwConnectException(Errors.java:124)
> > at io.netty.channel.unix.Socket.finishConnect(Socket.java:251)
> > at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.doFinishConnect(AbstractEpollChannel.java:673)
> > at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.finishConnect(AbstractEpollChannel.java:650)
> > at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.epollOutReady(AbstractEpollChannel.java:530)
> > at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:470)
> > at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
> > at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
> > at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
> > at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> > at java.base/java.lang.Thread.run(Thread.java:829)
> > Exception (java.lang.RuntimeException) encountered during startup: Unable
> > to gossip with any peers
> > java.lang.RuntimeException: Unable to gossip with any peers
> > at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1844)
> > at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:650)
> > at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:936)
> > at org.apache.cassandra.service.StorageService.initServer(StorageService.java:786)
> > at org.apache.cassandra.service.StorageService.initServer(StorageService.java:731)
> > at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:420)
> > at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:765)
> > at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:889)
> > ERROR [main] 2022-11-04 16:58:03,943 CassandraDaemon.java:911 - Exception
> > encountered during startup
> > java.lang.RuntimeException: Unable to gossip with any peers
> > at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1844)
> > at