Re: Nodes show different number of tokens than initially
On Fri, Feb 2, 2018 at 2:37 AM, kurt greaves wrote:

> So one time I tried to understand why only a single node could have a
> token, and it appeared that it came over the fence from Facebook and has
> been kept ever since. Personally I don't think it's necessary, and I agree
> that it is kind of problematic (but there's probably lots of stuff that
> relies on this now). Multiple DCs is one example, but the same could apply
> to racks. There's no real reason (with NTS) that two nodes in separate
> racks can't have the same token. In fact, being able to do this would make
> token allocation much simpler, and smart allocation algorithms could work
> much better with vnodes.

I understand that it might be way too late to change this. My biggest gripe, though, is that all these subtle (but essential for real understanding) details are so poorly documented. I hope that with the move away from DataStax to the community website this might gradually improve.

Regards,
--
Alex
Re: Nodes show different number of tokens than initially
So one time I tried to understand why only a single node could have a token, and it appeared that it came over the fence from Facebook and has been kept ever since. Personally I don't think it's necessary, and I agree that it is kind of problematic (but there's probably lots of stuff that relies on this now). Multiple DCs is one example, but the same could apply to racks. There's no real reason (with NTS) that two nodes in separate racks can't have the same token. In fact, being able to do this would make token allocation much simpler, and smart allocation algorithms could work much better with vnodes.

On 1 February 2018 at 17:35, Oleksandr Shulgin wrote:

> On Thu, Feb 1, 2018 at 5:19 AM, Jeff Jirsa wrote:
>
>>> The reason I find it surprising is that it makes very little *sense* to
>>> put a token belonging to a node from one DC between tokens of nodes from
>>> another one.
>>
>> I don't want to really turn this into an argument over what should and
>> shouldn't make sense, but I do agree, it doesn't make sense to put a token
>> on one node in one DC onto another node in another DC.
>
> This is not what I was trying to say. I should have used an example to
> express myself more clearly. Here goes (disclaimer: it might sound like a
> rant, take it with a grain of salt):
>
> $ ccm create -v 3.0.15 -n 3:3 -s 2dcs
>
> For a more meaningful multi-DC setup than the default SimpleStrategy, use
> NTS:
>
> $ ccm node1 cqlsh -e "ALTER KEYSPACE system_auth WITH replication =
> {'class': 'NetworkTopologyStrategy', 'dc1': 2, 'dc2': 2};"
>
> $ ccm node1 nodetool ring
>
> Datacenter: dc1
> ==========
> Address    Rack  Status  State   Load       Owns    Token
>                                                     3074457345618258602
> 127.0.0.1  r1    Up      Normal  117.9 KB   66.67%  -9223372036854775808
> 127.0.0.2  r1    Up      Normal  131.56 KB  66.67%  -3074457345618258603
> 127.0.0.3  r1    Up      Normal  117.88 KB  66.67%  3074457345618258602
>
> Datacenter: dc2
> ==========
> Address    Rack  Status  State   Load       Owns    Token
>                                                     3074457345618258702
> 127.0.0.4  r1    Up      Normal  121.54 KB  66.67%  -9223372036854775708
> 127.0.0.5  r1    Up      Normal  118.59 KB  66.67%  -3074457345618258503
> 127.0.0.6  r1    Up      Normal  114.12 KB  66.67%  3074457345618258702
>
> Note that CCM is aware of the cross-DC clashes and selects the tokens for
> dc2 shifted by 100.
>
> Then look at the token ring (output abbreviated and aligned by me):
>
> $ ccm node1 nodetool describering system_auth
>
> Schema Version:4f7d0ad0-350d-3ea0-ae8b-53d5bc34fc7e
> TokenRange:
> TokenRange(start_token:-9223372036854775808, end_token:-9223372036854775708, endpoints:[127.0.0.4, 127.0.0.2, 127.0.0.5, 127.0.0.3], ...
> TokenRange(start_token:-9223372036854775708, end_token:-3074457345618258603, endpoints:[127.0.0.2, 127.0.0.5, 127.0.0.3, 127.0.0.6], ...
> TokenRange(start_token:-3074457345618258603, end_token:-3074457345618258503, endpoints:[127.0.0.5, 127.0.0.3, 127.0.0.6, 127.0.0.1], ...
> TokenRange(start_token:-3074457345618258503, end_token: 3074457345618258602, endpoints:[127.0.0.3, 127.0.0.6, 127.0.0.1, 127.0.0.4], ...
> TokenRange(start_token: 3074457345618258602, end_token: 3074457345618258702, endpoints:[127.0.0.6, 127.0.0.1, 127.0.0.4, 127.0.0.2], ...
> TokenRange(start_token: 3074457345618258702, end_token:-9223372036854775808, endpoints:[127.0.0.1, 127.0.0.4, 127.0.0.2, 127.0.0.5], ...
>
> So in this setup, every token range has one end contributed by a node from
> dc1 and the other end by a node from dc2. That doesn't model anything in
> the real topology of the cluster.
>
> I see that it's easy to lump together tokens from all nodes and sort them
> to produce a single token ring (and this is obviously the reason why tokens
> have to be unique throughout the cluster as a whole). That doesn't mean
> it's a meaningful thing to do.
>
> This introduces complexity which is not present in the problem domain
> initially. This was a deliberate choice of the developers, dare I say, to
> complect the separate DCs together in a single token ring. This has
> profound consequences on the operations side. If anything, it prevents
> bootstrapping multiple nodes at the same time even if they are in different
> DCs. Or would you suggest setting consistent_range_movement=false and
> hoping it will work out?
>
> If the whole reason for having separate DCs is to provide isolation, I
> fail to see how the single token ring design does anything towards
> achieving that.
>
>> But also being very clear (I want to make sure I understand what you're
>> saying): that's a manual thing you did, Cassandra didn't do it for you,
>> right? The fact that Cassandra didn't STOP you from doing it could be
>> considered a bug, but YOU made that config choice?
>
> Yes, we have chosen exactly the same token for two nodes in different DCs
Re: Nodes show different number of tokens than initially
On Thu, Feb 1, 2018 at 5:19 AM, Jeff Jirsa wrote:

>> The reason I find it surprising is that it makes very little *sense* to
>> put a token belonging to a node from one DC between tokens of nodes from
>> another one.
>
> I don't want to really turn this into an argument over what should and
> shouldn't make sense, but I do agree, it doesn't make sense to put a token
> on one node in one DC onto another node in another DC.

This is not what I was trying to say. I should have used an example to express myself more clearly. Here goes (disclaimer: it might sound like a rant, take it with a grain of salt):

$ ccm create -v 3.0.15 -n 3:3 -s 2dcs

For a more meaningful multi-DC setup than the default SimpleStrategy, use NTS:

$ ccm node1 cqlsh -e "ALTER KEYSPACE system_auth WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 2, 'dc2': 2};"

$ ccm node1 nodetool ring

Datacenter: dc1
==========
Address    Rack  Status  State   Load       Owns    Token
                                                    3074457345618258602
127.0.0.1  r1    Up      Normal  117.9 KB   66.67%  -9223372036854775808
127.0.0.2  r1    Up      Normal  131.56 KB  66.67%  -3074457345618258603
127.0.0.3  r1    Up      Normal  117.88 KB  66.67%  3074457345618258602

Datacenter: dc2
==========
Address    Rack  Status  State   Load       Owns    Token
                                                    3074457345618258702
127.0.0.4  r1    Up      Normal  121.54 KB  66.67%  -9223372036854775708
127.0.0.5  r1    Up      Normal  118.59 KB  66.67%  -3074457345618258503
127.0.0.6  r1    Up      Normal  114.12 KB  66.67%  3074457345618258702

Note that CCM is aware of the cross-DC clashes and selects the tokens for dc2 shifted by 100.

Then look at the token ring (output abbreviated and aligned by me):

$ ccm node1 nodetool describering system_auth

Schema Version:4f7d0ad0-350d-3ea0-ae8b-53d5bc34fc7e
TokenRange:
TokenRange(start_token:-9223372036854775808, end_token:-9223372036854775708, endpoints:[127.0.0.4, 127.0.0.2, 127.0.0.5, 127.0.0.3], ...
TokenRange(start_token:-9223372036854775708, end_token:-3074457345618258603, endpoints:[127.0.0.2, 127.0.0.5, 127.0.0.3, 127.0.0.6], ...
TokenRange(start_token:-3074457345618258603, end_token:-3074457345618258503, endpoints:[127.0.0.5, 127.0.0.3, 127.0.0.6, 127.0.0.1], ...
TokenRange(start_token:-3074457345618258503, end_token: 3074457345618258602, endpoints:[127.0.0.3, 127.0.0.6, 127.0.0.1, 127.0.0.4], ...
TokenRange(start_token: 3074457345618258602, end_token: 3074457345618258702, endpoints:[127.0.0.6, 127.0.0.1, 127.0.0.4, 127.0.0.2], ...
TokenRange(start_token: 3074457345618258702, end_token:-9223372036854775808, endpoints:[127.0.0.1, 127.0.0.4, 127.0.0.2, 127.0.0.5], ...

So in this setup, every token range has one end contributed by a node from dc1 and the other end by a node from dc2. That doesn't model anything in the real topology of the cluster.

I see that it's easy to lump together tokens from all nodes and sort them to produce a single token ring (and this is obviously the reason why tokens have to be unique throughout the cluster as a whole). That doesn't mean it's a meaningful thing to do.

This introduces complexity which is not present in the problem domain initially. This was a deliberate choice of the developers, dare I say, to complect the separate DCs together in a single token ring. This has profound consequences on the operations side. If anything, it prevents bootstrapping multiple nodes at the same time even if they are in different DCs. Or would you suggest setting consistent_range_movement=false and hoping it will work out?

If the whole reason for having separate DCs is to provide isolation, I fail to see how the single token ring design does anything towards achieving that.
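To make the single combined ring concrete, here is a minimal sketch in plain Java (hypothetical names; a TreeMap of long tokens standing in for Cassandra's actual Token and TokenMetadata classes, not the real code). Every token from every DC lands in one sorted structure, and the primary owner of a key's token is simply the next entry in that combined order, wrapping around - which is exactly why the describering ranges above interleave dc1 and dc2 endpoints:

import java.util.Map;
import java.util.TreeMap;

public class SingleRingSketch
{
    // One combined, sorted ring: token -> owning node, regardless of DC.
    private final TreeMap<Long, String> ring = new TreeMap<>();

    void addNode(String node, long... tokens)
    {
        for (long t : tokens)
            ring.put(t, node);              // tokens must be unique cluster-wide
    }

    // The primary owner of a token is the node holding the next token in the
    // ring (wrapping around past the maximum token back to the minimum).
    String primaryOwner(long token)
    {
        Map.Entry<Long, String> e = ring.ceilingEntry(token);
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    public static void main(String[] args)
    {
        SingleRingSketch ring = new SingleRingSketch();
        ring.addNode("dc1-node1", Long.MIN_VALUE);        // -9223372036854775808
        ring.addNode("dc2-node4", Long.MIN_VALUE + 100);  // -9223372036854775708, CCM's +100 shift
        // A key hashing just above Long.MIN_VALUE falls in the range
        // (MIN_VALUE, MIN_VALUE + 100], whose primary owner sits in dc2.
        System.out.println(ring.primaryOwner(Long.MIN_VALUE + 1));  // dc2-node4
    }
}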
> But also being very clear (I want to make sure I understand what you're
> saying): that's a manual thing you did, Cassandra didn't do it for you,
> right? The fact that Cassandra didn't STOP you from doing it could be
> considered a bug, but YOU made that config choice?

Yes, we have chosen exactly the same token for two nodes in different DCs because we were unaware of this global uniqueness requirement. Yes, we believe it's a bug that Cassandra didn't stop us from doing that.

> You can trivially predict what would happen with SimpleStrategy in
> multi-DC: run nodetool ring, and the first RF nodes listed after a given
> token own that data, regardless of which DC they're in. Because it's all
> one big ring.

In any case, I don't think SimpleStrategy is a valid argument to consider in a multi-DC setup. It is true that you can start a cluster spanning multiple DCs from scratch while using SimpleStrategy, but there is no way to add a new DC to the cluster unless you go NTS, so why pull this example?

Cheers,
--
Alex
Re: Nodes show different number of tokens than initially
> I don’t know why this is a surprise (maybe because people like to talk
> about multiple rings, but the fact that replication strategy is set per
> keyspace and that you could use SimpleStrategy in a multiple-DC cluster
> demonstrates this), but we can chat about that another time

This is actually a point of confusion for a lot of new users. It seems obvious for people who know the internals or who have been around since pre-NTS/vnodes, but it's really not. Especially because NTS makes it seem like there are two separate rings.

> that's a manual thing you did, Cassandra didn't do it for you, right? The
> fact that Cassandra didn't STOP you from doing it could be considered a
> bug, but YOU made that config choice?

> This should be fairly easy to reproduce, however Kurt mentioned that there
> was supposed to be some sort of protection against that. I'll try again
> tomorrow.

Sorry, the behaviour was expected. I was under the impression that you couldn't 'steal' a token from another node (thought C* stopped you), and I misread the code. It actually gives the token up to the new node - not the other way round. I haven't thought about it long enough to really consider what the behaviour should be, or whether the current behaviour is right or wrong, though.
Re: Nodes show different number of tokens than initially
On Wed, Jan 31, 2018 at 12:08 PM, Oleksandr Shulgin <oleksandr.shul...@zalando.de> wrote:

> On 31 Jan 2018 17:18, "Jeff Jirsa" wrote:
>
>> I don’t know why this is a surprise (maybe because people like to talk
>> about multiple rings, but the fact that replication strategy is set per
>> keyspace and that you could use SimpleStrategy in a multiple-DC cluster
>> demonstrates this), but we can chat about that another time
>
> The reason I find it surprising is that it makes very little *sense* to
> put a token belonging to a node from one DC between tokens of nodes from
> another one.

I don't want to really turn this into an argument over what should and shouldn't make sense, but I do agree, it doesn't make sense to put a token on one node in one DC onto another node in another DC.

But also being very clear (I want to make sure I understand what you're saying): that's a manual thing you did, Cassandra didn't do it for you, right? The fact that Cassandra didn't STOP you from doing it could be considered a bug, but YOU made that config choice?

> Having token ranges like that, with ends from nodes in different DCs,
> doesn't convey any *meaning* and has no correspondence to what is being
> modelled here. It also makes it nearly impossible to reason about range
> ownership (unless you're a machine, in which case you probably don't care).
>
> I understand that it works in the end, but it doesn't help to know that.
> It is an implementation detail sticking out of the code's guts and it sure
> *is* surprising in all its ugliness. It also opens up the possibility of
> problems just like the one which started this discussion.
>
> I don't find the argument of using SimpleStrategy for multi-DC
> particularly interesting, nor can I predict what to expect from such an
> attempt.

You can trivially predict what would happen with SimpleStrategy in multi-DC: run nodetool ring, and the first RF nodes listed after a given token own that data, regardless of which DC they're in. Because it's all one big ring.
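As a rough illustration of "the first RF nodes listed after a given token" (a hedged sketch with made-up names, not Cassandra's actual SimpleStrategy code): walk the combined ring clockwise from the key's token and collect distinct endpoints until RF of them are found, with DC and rack playing no part at all.

import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

class SimpleStrategySketch
{
    // ring: token -> endpoint, one combined ring for the whole cluster.
    // Assumes a non-empty ring; purely illustrative.
    static List<String> replicasFor(TreeMap<Long, String> ring, long token, int rf)
    {
        Set<String> replicas = new LinkedHashSet<>();
        // Start at the first token >= the key's token, wrapping past the max.
        Long t = ring.ceilingKey(token);
        if (t == null)
            t = ring.firstKey();
        for (int i = 0; i < ring.size() && replicas.size() < rf; i++)
        {
            replicas.add(ring.get(t));      // DC and rack are ignored entirely
            t = ring.higherKey(t);
            if (t == null)
                t = ring.firstKey();
        }
        return new ArrayList<>(replicas);
    }
}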
Re: Nodes show different number of tokens than initially
On 31 Jan 2018 17:18, "Jeff Jirsa" wrote:

> I don’t know why this is a surprise (maybe because people like to talk
> about multiple rings, but the fact that replication strategy is set per
> keyspace and that you could use SimpleStrategy in a multiple-DC cluster
> demonstrates this), but we can chat about that another time

The reason I find it surprising is that it makes very little *sense* to put a token belonging to a node from one DC between tokens of nodes from another one.

Having token ranges like that, with ends from nodes in different DCs, doesn't convey any *meaning* and has no correspondence to what is being modelled here. It also makes it nearly impossible to reason about range ownership (unless you're a machine, in which case you probably don't care).

I understand that it works in the end, but it doesn't help to know that. It is an implementation detail sticking out of the code's guts and it sure *is* surprising in all its ugliness. It also opens up the possibility of problems just like the one which started this discussion.

I don't find the argument of using SimpleStrategy for multi-DC particularly interesting, nor can I predict what to expect from such an attempt.

>> If this is deemed invalid config why does the new node *silently* steal
>> the existing token, badly affecting the ownership of the rest of the
>> nodes? It should just refuse to start!
>
> Philosophically, with multiple DCs, it may start up and not see the other
> DC for minutes/hours/days before it realizes there’s a token conflict -
> what should it do then?

This was not the case for us - the new node had seen all of the ring and could detect that there was a conflict. Still it decided to claim the token ownership, removing it from a longer-lived node. This should be fairly easy to reproduce, however Kurt mentioned that there was supposed to be some sort of protection against that. I'll try again tomorrow.

> If your suggestion to resolve that is to make sure we see the whole ring
> before starting up, we end up in a situation where we try not to start up
> unless we can see all nodes, and create outages during DC separations.

I don't really see a problem here. A newly started node learns the topology from the seed nodes - it doesn't need to *see* all nodes, just learn that they *exist* and which tokens are assigned to them. A node which is restarting doesn't even need to do that, because it doesn't need to reconsider its token ownership.

Cheers,
--
Alex
Re: Nodes show different number of tokens than initially
> On Jan 31, 2018, at 12:35 AM, Oleksandr Shulgin wrote:
>
>> On Tue, Jan 30, 2018 at 5:44 PM, Jeff Jirsa wrote:
>> All DCs in a cluster use the same token space in the DHT,
>
> I can't believe my bloody eyes, but this seems to be true...

I don’t know why this is a surprise (maybe because people like to talk about multiple rings, but the fact that replication strategy is set per keyspace and that you could use SimpleStrategy in a multiple-DC cluster demonstrates this), but we can chat about that another time.

>> so token conflicts across datacenters are invalid config
>
> If this is deemed invalid config why does the new node *silently* steal
> the existing token, badly affecting the ownership of the rest of the
> nodes? It should just refuse to start!

Philosophically, with multiple DCs, it may start up and not see the other DC for minutes/hours/days before it realizes there’s a token conflict - what should it do then? Which node gets stopped?

If your suggestion to resolve that is to make sure we see the whole ring before starting up, we end up in a situation where we try not to start up unless we can see all nodes, and create outages during DC separations. Distributed systems and occasional availability make these decisions harder.

Please open a JIRA if you think it’s wrong, but I’m not sure I know what the “right” answer is either.
Re: Nodes show different number of tokens than initially
So the only reason that the new node would "steal" the token is if it started up earlier - which is based off how many heartbeats have occurred since entering NORMAL status on each node. I can't see any reason the new nodes would have higher generation numbers, so sounds likely there's a bug somewhere there. I'm not really sure why this comparison would be relevant unless you were starting multiple nodes at the same time, and based off your example it seems it definitely shouldn't have happened. Can you create a JIRA ticket with the above information?
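For what it's worth, a toy model of the tie-break described above - entirely hypothetical field names, and explicitly not the real gossip/ownership code (pinning down the real behaviour is what the JIRA ticket is for): when two endpoints claim the same token, keep it on whichever has spent more heartbeats in NORMAL state.

class TokenConflictSketch
{
    static final class Endpoint
    {
        final String address;
        final long heartbeatsSinceNormal;   // hypothetical stand-in for gossip state

        Endpoint(String address, long heartbeatsSinceNormal)
        {
            this.address = address;
            this.heartbeatsSinceNormal = heartbeatsSinceNormal;
        }
    }

    // Toy tie-break: the endpoint that has been NORMAL longer keeps the token.
    // If a freshly bootstrapped node ever won this comparison, it would
    // "steal" the token - the surprising behaviour reported in this thread.
    static Endpoint tokenOwner(Endpoint current, Endpoint claimant)
    {
        return claimant.heartbeatsSinceNormal > current.heartbeatsSinceNormal
               ? claimant
               : current;
    }
}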
Re: Nodes show different number of tokens than initially
On Wed, Jan 31, 2018 at 5:06 AM, Dikang Gu wrote:

> What's the partitioner you use? We have logic to prevent duplicate tokens.

We are using the default Murmur3Partitioner. The problem arises from the fact that we are manually allocating the tokens, as described earlier.

--
Alex
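Since manually allocated initial_token values bypass that safeguard, a pre-flight check before bootstrapping can catch the clash. A minimal sketch under obvious assumptions (you would have to collect the set of tokens already in use yourself, e.g. by parsing nodetool ring across all DCs; all names here are made up):

import java.util.List;
import java.util.Set;

class InitialTokenCheck
{
    // Fail fast if any manually chosen token is already owned anywhere in the
    // cluster - tokens are unique across the whole cluster, not per DC or rack.
    static void assertTokensFree(List<Long> proposed, Set<Long> tokensInUse)
    {
        for (Long t : proposed)
        {
            if (tokensInUse.contains(t))
                throw new IllegalArgumentException(
                        "Token " + t + " is already assigned to an existing node");
        }
    }
}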
Re: Nodes show different number of tokens than initially
On Tue, Jan 30, 2018 at 5:44 PM, Jeff Jirsa wrote:

> All DCs in a cluster use the same token space in the DHT,

I can't believe my bloody eyes, but this seems to be true...

> so token conflicts across datacenters are invalid config

If this is deemed invalid config, why does the new node *silently* steal the existing token, badly affecting the ownership of the rest of the nodes? It should just refuse to start!

--
Alex
Re: Nodes show different number of tokens than initially
What's the partitioner you use? We have logic to prevent duplicate tokens.

private static Collection<Token> adjustForCrossDatacenterClashes(final TokenMetadata tokenMetadata,
                                                                 StrategyAdapter strategy, Collection<Token> tokens)
{
    List<Token> filtered = Lists.newArrayListWithCapacity(tokens.size());

    for (Token t : tokens)
    {
        while (tokenMetadata.getEndpoint(t) != null)
        {
            InetAddress other = tokenMetadata.getEndpoint(t);
            if (strategy.inAllocationRing(other))
                throw new ConfigurationException(String.format("Allocated token %s already assigned to node %s. Is another node also allocating tokens?", t, other));
            t = t.increaseSlightly();
        }
        filtered.add(t);
    }
    return filtered;
}

On Tue, Jan 30, 2018 at 8:44 AM, Jeff Jirsa wrote:

> All DCs in a cluster use the same token space in the DHT, so token
> conflicts across datacenters are invalid config
>
> --
> Jeff Jirsa
>
> On Jan 29, 2018, at 11:50 PM, Oleksandr Shulgin <oleksandr.shul...@zalando.de> wrote:
>
> On Tue, Jan 30, 2018 at 5:13 AM, kurt greaves wrote:
>
>> Shouldn't happen. Can you send through nodetool ring output from one of
>> those nodes? Also, did the logs have anything to say about tokens when you
>> started the 3 seed nodes?
>
> Hi Kurt,
>
> I cannot run nodetool ring anymore, since these test nodes are long gone.
> However, I've grepped the logs and this is what I've found:
>
> Jan 25 08:57:18 ip-172-31-128-41 docker/cf3ea463915a[854]: INFO 08:57:18
> Nodes /172.31.128.31 and /172.31.128.41 have the same token
> -9223372036854775808. Ignoring /172.31.128.31
> Jan 25 08:57:18 ip-172-31-128-41 docker/cf3ea463915a[854]: INFO 08:57:18
> Nodes /172.31.144.32 and /172.31.128.41 have the same token
> -8454757700450211158. Ignoring /172.31.144.32
> Jan 25 08:58:30 ip-172-31-144-41 docker/48fba443d99f[852]: INFO 08:58:30
> Nodes /172.31.128.41 and /172.31.128.31 have the same token
> -9223372036854775808. /172.31.128.41 is the new owner
> Jan 25 08:58:30 ip-172-31-144-41 docker/48fba443d99f[852]: INFO 08:58:30
> Nodes /172.31.144.32 and /172.31.128.41 have the same token
> -8454757700450211158. Ignoring /172.31.144.32
> Jan 25 08:59:45 ip-172-31-160-41 docker/cced70e132f2[849]: INFO 08:59:45
> Nodes /172.31.128.41 and /172.31.128.31 have the same token
> -9223372036854775808. /172.31.128.41 is the new owner
> Jan 25 08:59:45 ip-172-31-160-41 docker/cced70e132f2[849]: INFO 08:59:45
> Nodes /172.31.144.32 and /172.31.128.41 have the same token
> -8454757700450211158. Ignoring /172.31.144.32
>
> Since we are allocating the tokens for seed nodes manually, it appears
> that the first seed node in the new ring (172.31.128.41) gets the same
> first token (-9223372036854775808) as the node in the old ring
> (172.31.128.31). The same goes for the 3rd token of the new seed node
> (-8454757700450211158).
>
> What is beyond me is why that would matter and why token ownership would
> change at all, while these nodes are in *different virtual DCs*. To me
> this sounds like a particularly nasty bug...
>
> --
> Oleksandr "Alex" Shulgin | Database Engineer | Zalando SE | Tel: +49 176
> 127-59-707

--
Dikang
Re: Nodes show different number of tokens than initially
All DCs in a cluster use the same token space in the DHT, so token conflicts across datacenters are invalid config

--
Jeff Jirsa

> On Jan 29, 2018, at 11:50 PM, Oleksandr Shulgin <oleksandr.shul...@zalando.de> wrote:
>
>> On Tue, Jan 30, 2018 at 5:13 AM, kurt greaves wrote:
>> Shouldn't happen. Can you send through nodetool ring output from one of
>> those nodes? Also, did the logs have anything to say about tokens when you
>> started the 3 seed nodes?
>
> Hi Kurt,
>
> I cannot run nodetool ring anymore, since these test nodes are long gone.
> However, I've grepped the logs and this is what I've found:
>
> Jan 25 08:57:18 ip-172-31-128-41 docker/cf3ea463915a[854]: INFO 08:57:18
> Nodes /172.31.128.31 and /172.31.128.41 have the same token
> -9223372036854775808. Ignoring /172.31.128.31
> Jan 25 08:57:18 ip-172-31-128-41 docker/cf3ea463915a[854]: INFO 08:57:18
> Nodes /172.31.144.32 and /172.31.128.41 have the same token
> -8454757700450211158. Ignoring /172.31.144.32
> Jan 25 08:58:30 ip-172-31-144-41 docker/48fba443d99f[852]: INFO 08:58:30
> Nodes /172.31.128.41 and /172.31.128.31 have the same token
> -9223372036854775808. /172.31.128.41 is the new owner
> Jan 25 08:58:30 ip-172-31-144-41 docker/48fba443d99f[852]: INFO 08:58:30
> Nodes /172.31.144.32 and /172.31.128.41 have the same token
> -8454757700450211158. Ignoring /172.31.144.32
> Jan 25 08:59:45 ip-172-31-160-41 docker/cced70e132f2[849]: INFO 08:59:45
> Nodes /172.31.128.41 and /172.31.128.31 have the same token
> -9223372036854775808. /172.31.128.41 is the new owner
> Jan 25 08:59:45 ip-172-31-160-41 docker/cced70e132f2[849]: INFO 08:59:45
> Nodes /172.31.144.32 and /172.31.128.41 have the same token
> -8454757700450211158. Ignoring /172.31.144.32
>
> Since we are allocating the tokens for seed nodes manually, it appears that
> the first seed node in the new ring (172.31.128.41) gets the same first token
> (-9223372036854775808) as the node in the old ring (172.31.128.31). The same
> goes for the 3rd token of the new seed node (-8454757700450211158).
>
> What is beyond me is why that would matter and why token ownership would
> change at all, while these nodes are in *different virtual DCs*. To me
> this sounds like a particularly nasty bug...
>
> --
> Oleksandr "Alex" Shulgin | Database Engineer | Zalando SE | Tel: +49 176
> 127-59-707
Re: Nodes show different number of tokens than initially
On Tue, Jan 30, 2018 at 5:13 AM, kurt greaves wrote:

> Shouldn't happen. Can you send through nodetool ring output from one of
> those nodes? Also, did the logs have anything to say about tokens when you
> started the 3 seed nodes?

Hi Kurt,

I cannot run nodetool ring anymore, since these test nodes are long gone. However, I've grepped the logs and this is what I've found:

Jan 25 08:57:18 ip-172-31-128-41 docker/cf3ea463915a[854]: INFO 08:57:18 Nodes /172.31.128.31 and /172.31.128.41 have the same token -9223372036854775808. Ignoring /172.31.128.31
Jan 25 08:57:18 ip-172-31-128-41 docker/cf3ea463915a[854]: INFO 08:57:18 Nodes /172.31.144.32 and /172.31.128.41 have the same token -8454757700450211158. Ignoring /172.31.144.32
Jan 25 08:58:30 ip-172-31-144-41 docker/48fba443d99f[852]: INFO 08:58:30 Nodes /172.31.128.41 and /172.31.128.31 have the same token -9223372036854775808. /172.31.128.41 is the new owner
Jan 25 08:58:30 ip-172-31-144-41 docker/48fba443d99f[852]: INFO 08:58:30 Nodes /172.31.144.32 and /172.31.128.41 have the same token -8454757700450211158. Ignoring /172.31.144.32
Jan 25 08:59:45 ip-172-31-160-41 docker/cced70e132f2[849]: INFO 08:59:45 Nodes /172.31.128.41 and /172.31.128.31 have the same token -9223372036854775808. /172.31.128.41 is the new owner
Jan 25 08:59:45 ip-172-31-160-41 docker/cced70e132f2[849]: INFO 08:59:45 Nodes /172.31.144.32 and /172.31.128.41 have the same token -8454757700450211158. Ignoring /172.31.144.32

Since we are allocating the tokens for seed nodes manually, it appears that the first seed node in the new ring (172.31.128.41) gets the same first token (-9223372036854775808) as the node in the old ring (172.31.128.31). The same goes for the 3rd token of the new seed node (-8454757700450211158).

What is beyond me is why that would matter and why token ownership would change at all, while these nodes are in *different virtual DCs*. To me this sounds like a particularly nasty bug...

--
Oleksandr "Alex" Shulgin | Database Engineer | Zalando SE | Tel: +49 176 127-59-707
Re: Nodes show different number of tokens than initially
Shouldn't happen. Can you send through nodetool ring output from one of those nodes? Also, did the logs have anything to say about tokens when you started the 3 seed nodes?
Re: Nodes show different number of tokens than initially
On Fri, Jan 26, 2018 at 3:08 PM, Kenneth Brotman <kenbrot...@yahoo.com.invalid> wrote:

> Could it be that after distributing the data, some of the nodes did not
> need to have a fourth token?

I'm not sure, but that would definitely be against my understanding of how token assignment works. E.g. these nodes still have num_tokens=4 in the configuration file, so if they are restarted, the Cassandra server will refuse to start, right?

--
Alex
RE: Nodes show different number of tokens than initially
Oleksandr,

Could it be that after distributing the data, some of the nodes did not need to have a fourth token?

Kenneth Brotman

From: Oleksandr Shulgin [mailto:oleksandr.shul...@zalando.de]
Sent: Thursday, January 25, 2018 3:44 AM
To: User
Subject: Nodes show different number of tokens than initially

Hello,

While testing token allocation with version 3.0.15 we are experiencing some quite unexpected results.

We have deployed a secondary virtual DC with 6 nodes, 4 tokens per node. Then we were adding the 7th node to the new DC in order to observe the effect of ownership re-distribution.

To set up the new DC we've used the following steps:

1. Alter all keyspaces to replicate to the upcoming new DC.
2. Deploy 3 seed nodes (IP ends with .31) with num_tokens=4 and tokens specified by initial_token list, auto_bootstrap=false.
3. Deploy 3 more nodes (IP ends with .32) with num_tokens=4 and allocate_tokens_for_keyspace=data_ks, auto_bootstrap=true.
4. Rebuild all new nodes specifying eu-central as the source DC (for the 3 already bootstrapped nodes, workaround by truncating system.available_ranges first).

The following is the output of nodetool status after starting to bootstrap the 7th node (172.31.128.33):

Datacenter: eu-central
======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
UN  172.31.160.12  26.4 GB    256     48.9%             89067222-b0eb-49e5-be7d-758ea24ace9a  1c
UN  172.31.144.12  28.92 GB   256     52.6%             2ab4786f-9722-4418-ba78-9c435cbb30e5  1b
UN  172.31.128.12  28.13 GB   256     47.9%             c4733a5c-abc5-4bab-9449-1e3f584cf64f  1a
UN  172.31.128.11  29.84 GB   256     52.2%             6083369c-1a0f-4098-a420-313dacd429b6  1a
UN  172.31.160.11  28.25 GB   256     51.1%             4dc361fc-818a-4b7f-abd3-9121488a7db1  1c
UN  172.31.144.11  28.14 GB   256     47.4%             05e5df92-d196-46d5-8812-e843fbbd2922  1b

Datacenter: eu-central_4vn
==========================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
UN  172.31.128.31  24.83 GB   4       45.8%             4d7decb3-8692-4aec-a2e1-2ac89aed8c5a  1a
UN  172.31.144.31  26.52 GB   4       45.8%             2eb29602-2df5-4f4f-b419-b5a94cf785f0  1b
UN  172.31.160.31  24.8 GB    4       45.8%             f1bd4696-c25c-4bc3-8c30-292f2bd027c1  1c
UJ  172.31.128.33  568.94 MB  4       ?                 ffa21d50-9bb4-4d2b-9e3e-7a6945f6f071  1a
UN  172.31.144.32  29.3 GB    4       54.2%             5ce019f6-99fd-4333-b231-d04a266229bb  1b
UN  172.31.160.32  27.8 GB    4       54.2%             193bef27-eea8-4aa6-9d5f-8baf3decdd76  1c
UN  172.31.128.32  30.5 GB    4       54.2%             6a046b64-31f9-4881-85b0-ab3a2f6dcdc4  1a

Then we wanted to start testing distribution with 8 vnodes. For that we started to deploy yet another DC.

The following is the output of nodetool status after deploying the 3 seed nodes of the 8-tokens DC:

Datacenter: eu-central
======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
UN  172.31.160.12  26.4 GB    256     48.9%             89067222-b0eb-49e5-be7d-758ea24ace9a  1c
UN  172.31.144.12  28.92 GB   256     52.6%             2ab4786f-9722-4418-ba78-9c435cbb30e5  1b
UN  172.31.128.12  28.13 GB   256     47.9%             c4733a5c-abc5-4bab-9449-1e3f584cf64f  1a
UN  172.31.128.11  29.84 GB   256     52.2%             6083369c-1a0f-4098-a420-313dacd429b6  1a
UN  172.31.160.11  28.25 GB   256     51.1%             4dc361fc-818a-4b7f-abd3-9121488a7db1  1c
UN  172.31.144.11  28.14 GB   256     47.4%             05e5df92-d196-46d5-8812-e843fbbd2922  1b

Datacenter: eu-central_4vn
==========================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
UN  172.31.128.31  24.83 GB   3       45.8%             4d7decb3-8692-4aec-a2e1-2ac89aed8c5a  1a
UN  172.31.144.31  26.52 GB   4       45.8%             2eb29602-2df5-4f4f-b419-b5a94cf785f0  1b
UN  172.31.160.31  24.8 GB    4       45.8%             f1bd4696-c25c-4bc3-8c30-292f2bd027c1  1c
UJ  172.31.128.33  4.21 GB    4       ?                 ffa21d50-9bb4-4d2b-9e3e-7a6945f6f071  1a
UN  172.31.160.32  27.8 GB    4       54.2%             193bef27-eea8-4aa6-9d5f-8baf3decdd76  1c
UN  172.31.144.32  29.3 GB    3       54.2%             5ce019f6-99fd-4333-b231-d04a266229bb  1b
UN  172.31.128.32  30.5 GB    4       54.2%             6a046b64-31f9-4881-85b0-ab3a2f6dcdc4  1a
Nodes show different number of tokens than initially
Hello,

While testing token allocation with version 3.0.15 we are experiencing some quite unexpected results.

We have deployed a secondary virtual DC with 6 nodes, 4 tokens per node. Then we were adding the 7th node to the new DC in order to observe the effect of ownership re-distribution.

To set up the new DC we've used the following steps:

1. Alter all keyspaces to replicate to the upcoming new DC.
2. Deploy 3 seed nodes (IP ends with .31) with num_tokens=4 and tokens specified by initial_token list, auto_bootstrap=false.
3. Deploy 3 more nodes (IP ends with .32) with num_tokens=4 and allocate_tokens_for_keyspace=data_ks, auto_bootstrap=true.
4. Rebuild all new nodes specifying eu-central as the source DC (for the 3 already bootstrapped nodes, workaround by truncating system.available_ranges first).

The following is the output of nodetool status after starting to bootstrap the 7th node (172.31.128.33):

Datacenter: eu-central
======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
UN  172.31.160.12  26.4 GB    256     48.9%             89067222-b0eb-49e5-be7d-758ea24ace9a  1c
UN  172.31.144.12  28.92 GB   256     52.6%             2ab4786f-9722-4418-ba78-9c435cbb30e5  1b
UN  172.31.128.12  28.13 GB   256     47.9%             c4733a5c-abc5-4bab-9449-1e3f584cf64f  1a
UN  172.31.128.11  29.84 GB   256     52.2%             6083369c-1a0f-4098-a420-313dacd429b6  1a
UN  172.31.160.11  28.25 GB   256     51.1%             4dc361fc-818a-4b7f-abd3-9121488a7db1  1c
UN  172.31.144.11  28.14 GB   256     47.4%             05e5df92-d196-46d5-8812-e843fbbd2922  1b

Datacenter: eu-central_4vn
==========================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
UN  172.31.128.31  24.83 GB   4       45.8%             4d7decb3-8692-4aec-a2e1-2ac89aed8c5a  1a
UN  172.31.144.31  26.52 GB   4       45.8%             2eb29602-2df5-4f4f-b419-b5a94cf785f0  1b
UN  172.31.160.31  24.8 GB    4       45.8%             f1bd4696-c25c-4bc3-8c30-292f2bd027c1  1c
UJ  172.31.128.33  568.94 MB  4       ?                 ffa21d50-9bb4-4d2b-9e3e-7a6945f6f071  1a
UN  172.31.144.32  29.3 GB    4       54.2%             5ce019f6-99fd-4333-b231-d04a266229bb  1b
UN  172.31.160.32  27.8 GB    4       54.2%             193bef27-eea8-4aa6-9d5f-8baf3decdd76  1c
UN  172.31.128.32  30.5 GB    4       54.2%             6a046b64-31f9-4881-85b0-ab3a2f6dcdc4  1a

Then we wanted to start testing distribution with 8 vnodes. For that we started to deploy yet another DC.

The following is the output of nodetool status after deploying the 3 seed nodes of the 8-tokens DC:

Datacenter: eu-central
======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
UN  172.31.160.12  26.4 GB    256     48.9%             89067222-b0eb-49e5-be7d-758ea24ace9a  1c
UN  172.31.144.12  28.92 GB   256     52.6%             2ab4786f-9722-4418-ba78-9c435cbb30e5  1b
UN  172.31.128.12  28.13 GB   256     47.9%             c4733a5c-abc5-4bab-9449-1e3f584cf64f  1a
UN  172.31.128.11  29.84 GB   256     52.2%             6083369c-1a0f-4098-a420-313dacd429b6  1a
UN  172.31.160.11  28.25 GB   256     51.1%             4dc361fc-818a-4b7f-abd3-9121488a7db1  1c
UN  172.31.144.11  28.14 GB   256     47.4%             05e5df92-d196-46d5-8812-e843fbbd2922  1b

Datacenter: eu-central_4vn
==========================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
*UN 172.31.128.31  24.83 GB   3       45.8%             4d7decb3-8692-4aec-a2e1-2ac89aed8c5a  1a*
UN  172.31.144.31  26.52 GB   4       45.8%             2eb29602-2df5-4f4f-b419-b5a94cf785f0  1b
UN  172.31.160.31  24.8 GB    4       45.8%             f1bd4696-c25c-4bc3-8c30-292f2bd027c1  1c
UJ  172.31.128.33  4.21 GB    4       ?                 ffa21d50-9bb4-4d2b-9e3e-7a6945f6f071  1a
UN  172.31.160.32  27.8 GB    4       54.2%             193bef27-eea8-4aa6-9d5f-8baf3decdd76  1c
*UN 172.31.144.32  29.3 GB    3       54.2%             5ce019f6-99fd-4333-b231-d04a266229bb  1b*
UN  172.31.128.32  30.5 GB    4       54.2%             6a046b64-31f9-4881-85b0-ab3a2f6dcdc4  1a

Datacenter: eu-central_8vn
==========================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
UN  172.31.128.41  111.11 KB  8       0.0%              e218b68e-9837-4e6a-acbe-9833fda285bc  1a
UN  172.31.144.41  113.2 KB   8       0.0%              3ec883e9-6b84-4314-85bd-a3c00c4f47c8  1b
UN  172.31.160.41  82.22 KB   8       0.0%              cfaee6c5-ee9c-4d29-aa54-ca3e8e74e356  1c

What is absolutely unexpected is that here we see that 2 nodes in the _4vn DC apparently now have a reduced number of tokens: 3 instead of 4. How could that happen?

--
Alex
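As an aside on step 2 above (manually chosen initial_token values for the seed nodes): one way to avoid exactly the cross-DC clash discussed earlier in this thread is to space the tokens evenly over the Murmur3 range and add a small per-DC offset, the same trick CCM was shown using (shifting dc2 by 100). A rough, hypothetical sketch, not an official tool:

import java.math.BigInteger;
import java.util.Arrays;

class InitialTokenGenerator
{
    // Evenly spaced tokens over the Murmur3 range for `nodes` nodes with
    // `vnodes` tokens each, shifted by a small per-DC offset so that two DCs
    // deployed with the same scheme never end up with identical tokens.
    static long[][] generate(int nodes, int vnodes, long dcOffset)
    {
        BigInteger range = BigInteger.valueOf(2).pow(64);    // full token space
        BigInteger min = BigInteger.valueOf(Long.MIN_VALUE);
        int total = nodes * vnodes;
        long[][] tokens = new long[nodes][vnodes];
        for (int i = 0; i < total; i++)
        {
            BigInteger t = min.add(range.multiply(BigInteger.valueOf(i))
                                        .divide(BigInteger.valueOf(total)))
                              .add(BigInteger.valueOf(dcOffset));
            tokens[i % nodes][i / nodes] = t.longValueExact();   // interleave across nodes
        }
        return tokens;
    }

    public static void main(String[] args)
    {
        // e.g. 3 seed nodes with num_tokens=4 in the second DC, offset by 100
        for (long[] nodeTokens : generate(3, 4, 100))
            System.out.println(Arrays.toString(nodeTokens));
    }
}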