Re: Nodes show different number of tokens than initially

2018-02-01 Thread Oleksandr Shulgin
On Fri, Feb 2, 2018 at 2:37 AM, kurt greaves  wrote:

> So one time I tried to understand why only a single node could have a
> token, and it appeared that it came over the fence from facebook and has
> been kept ever since. Personally I don't think it's necessary, and agree
> that it is kind of problematic (but there's probably lots of stuff that
> relies on this now). Multiple DCs is one example, but the same could apply
> to racks. There's no real reason (with NTS) that two nodes in separate
> racks can't have the same token. In fact being able to do this would make
> token allocation much simpler, and smart allocation algorithms could work
> much better with vnodes.
>

I understand that it might be way too late to change this.

My biggest gripe, though, is that all these subtle (but essential for real
understanding) details are ever so poorly documented.  I hope that with the
move away from DataStax to the community website this might gradually improve.

Regards,
--
Alex


Re: Nodes show different number of tokens than initially

2018-02-01 Thread kurt greaves
So one time I tried to understand why only a single node could have a
token, and it appeared that it came over the fence from facebook and has
been kept ever since. Personally I don't think it's necessary, and agree
that it is kind of problematic (but there's probably lots of stuff that
relies on this now). Multiple DCs is one example, but the same could apply
to racks. There's no real reason (with NTS) that two nodes in separate
racks can't have the same token. In fact being able to do this would make
token allocation much simpler, and smart allocation algorithms could work
much better with vnodes.

On 1 February 2018 at 17:35, Oleksandr Shulgin  wrote:

> On Thu, Feb 1, 2018 at 5:19 AM, Jeff Jirsa  wrote:
>
>>
>>> The reason I find it surprising, is that it makes very little *sense* to
> >>> put a token belonging to a node from one DC between tokens of nodes from
>>> another one.
>>>
>>
>> I don't want to really turn this into an argument over what should and
>> shouldn't make sense, but I do agree, it doesn't make sense to put a token
>> on one node in one DC onto another node in another DC.
>>
>
> This is not what I was trying to say.  I should have used an example to
> express myself clearer.  Here goes (disclaimer: it might sound like a rant,
> take it with a grain of salt):
>
> $ ccm create -v 3.0.15 -n 3:3 -s 2dcs
>
> For a more meaningful multi-DC setup than the default SimpleStrategy, use
> NTS:
>
> $ ccm node1 cqlsh -e "ALTER KEYSPACE system_auth WITH replication =
> {'class': 'NetworkTopologyStrategy', 'dc1': 2, 'dc2': 2};"
>
> $ ccm node1 nodetool ring
>
> Datacenter: dc1
> ===============
> Address    Rack  Status  State   Load       Owns    Token
>                                                     3074457345618258602
> 127.0.0.1  r1    Up      Normal  117.9 KB   66.67%  -9223372036854775808
> 127.0.0.2  r1    Up      Normal  131.56 KB  66.67%  -3074457345618258603
> 127.0.0.3  r1    Up      Normal  117.88 KB  66.67%  3074457345618258602
>
> Datacenter: dc2
> ===============
> Address    Rack  Status  State   Load       Owns    Token
>                                                     3074457345618258702
> 127.0.0.4  r1    Up      Normal  121.54 KB  66.67%  -9223372036854775708
> 127.0.0.5  r1    Up      Normal  118.59 KB  66.67%  -3074457345618258503
> 127.0.0.6  r1    Up      Normal  114.12 KB  66.67%  3074457345618258702
>
> Note that CCM is aware of the cross-DC clashes and selects the tokens for
> dc2 shifted by 100.
>
> Then look at the token ring (output abbreviated and aligned by me):
>
> $ ccm node1 nodetool describering system_auth
>
> Schema Version:4f7d0ad0-350d-3ea0-ae8b-53d5bc34fc7e
> TokenRange:
> TokenRange(start_token:-9223372036854775808, end_token:-9223372036854775708, endpoints:[127.0.0.4, 127.0.0.2, 127.0.0.5, 127.0.0.3], ...
> TokenRange(start_token:-9223372036854775708, end_token:-3074457345618258603, endpoints:[127.0.0.2, 127.0.0.5, 127.0.0.3, 127.0.0.6], ...
> TokenRange(start_token:-3074457345618258603, end_token:-3074457345618258503, endpoints:[127.0.0.5, 127.0.0.3, 127.0.0.6, 127.0.0.1], ...
> TokenRange(start_token:-3074457345618258503, end_token: 3074457345618258602, endpoints:[127.0.0.3, 127.0.0.6, 127.0.0.1, 127.0.0.4], ...
> TokenRange(start_token: 3074457345618258602, end_token: 3074457345618258702, endpoints:[127.0.0.6, 127.0.0.1, 127.0.0.4, 127.0.0.2], ...
> TokenRange(start_token: 3074457345618258702, end_token:-9223372036854775808, endpoints:[127.0.0.1, 127.0.0.4, 127.0.0.2, 127.0.0.5], ...
>
> So in this setup, every token range has one end contributed by a node from
> dc1 and the other end -- from dc2.  That doesn't model anything in the real
> topology of the cluster.
>
> I see that it's easy to lump together tokens from all nodes and sort them,
> to produce a single token ring (and this is obviously the reason why tokens
> have to be unique throughout the cluster as a whole).  That doesn't mean
> it's a meaningful thing to do.
>
> This introduces complexity which is not present in the problem domain
> initially.  This was a deliberate choice of the developers, dare I say, to
> complect the separate DCs together in a single token ring.  This has
> profound consequences on the operations side.  If anything, it prevents
> bootstrapping multiple nodes at the same time even if they are in different
> DCs.  Or would you suggest setting consistent_range_movement=false and
> hoping it will work out?
>
> If the whole reason for having separate DCs is to provide isolation, I
> fail to see how the single token ring design does anything towards
> achieving that.
>
> But also being very clear (I want to make sure I understand what you're
>> saying): that's a manual thing you did, Cassandra didn't do it for you,
>> right? The fact that Cassandra didn't STOP you from doing it could be
>> considered a bug, but YOU made that config choice?
>>
>
> Yes, we have chosen exactly the same token for two nodes in different DCs
> 

Re: Nodes show different number of tokens than initially

2018-02-01 Thread Oleksandr Shulgin
On Thu, Feb 1, 2018 at 5:19 AM, Jeff Jirsa  wrote:

>
>> The reason I find it surprising, is that it makes very little *sense* to
>> put a token belonging to a node from one DC between tokens of nodes from
>> another one.
>>
>
> I don't want to really turn this into an argument over what should and
> shouldn't make sense, but I do agree, it doesn't make sense to put a token
> on one node in one DC onto another node in another DC.
>

This is not what I was trying to say.  I should have used an example to
express myself clearer.  Here goes (disclaimer: it might sound like a rant,
take it with a grain of salt):

$ ccm create -v 3.0.15 -n 3:3 -s 2dcs

For a more meaningful multi-DC setup than the default SimpleStrategy, use
NTS:

$ ccm node1 cqlsh -e "ALTER KEYSPACE system_auth WITH replication =
{'class': 'NetworkTopologyStrategy', 'dc1': 2, 'dc2': 2};"

$ ccm node1 nodetool ring

Datacenter: dc1
===============
Address    Rack  Status  State   Load       Owns    Token
                                                    3074457345618258602
127.0.0.1  r1    Up      Normal  117.9 KB   66.67%  -9223372036854775808
127.0.0.2  r1    Up      Normal  131.56 KB  66.67%  -3074457345618258603
127.0.0.3  r1    Up      Normal  117.88 KB  66.67%  3074457345618258602

Datacenter: dc2
===============
Address    Rack  Status  State   Load       Owns    Token
                                                    3074457345618258702
127.0.0.4  r1    Up      Normal  121.54 KB  66.67%  -9223372036854775708
127.0.0.5  r1    Up      Normal  118.59 KB  66.67%  -3074457345618258503
127.0.0.6  r1    Up      Normal  114.12 KB  66.67%  3074457345618258702

Note that CCM is aware of the cross-DC clashes and selects the tokens for
dc2 shifted by 100.

Then look at the token ring (output abbreviated and aligned by me):

$ ccm node1 nodetool describering system_auth

Schema Version:4f7d0ad0-350d-3ea0-ae8b-53d5bc34fc7e
TokenRange:
TokenRange(start_token:-9223372036854775808, end_token:-9223372036854775708, endpoints:[127.0.0.4, 127.0.0.2, 127.0.0.5, 127.0.0.3], ...
TokenRange(start_token:-9223372036854775708, end_token:-3074457345618258603, endpoints:[127.0.0.2, 127.0.0.5, 127.0.0.3, 127.0.0.6], ...
TokenRange(start_token:-3074457345618258603, end_token:-3074457345618258503, endpoints:[127.0.0.5, 127.0.0.3, 127.0.0.6, 127.0.0.1], ...
TokenRange(start_token:-3074457345618258503, end_token: 3074457345618258602, endpoints:[127.0.0.3, 127.0.0.6, 127.0.0.1, 127.0.0.4], ...
TokenRange(start_token: 3074457345618258602, end_token: 3074457345618258702, endpoints:[127.0.0.6, 127.0.0.1, 127.0.0.4, 127.0.0.2], ...
TokenRange(start_token: 3074457345618258702, end_token:-9223372036854775808, endpoints:[127.0.0.1, 127.0.0.4, 127.0.0.2, 127.0.0.5], ...

So in this setup, every token range has one end contributed by a node from
dc1 and the other end -- from dc2.  That doesn't model anything in the real
topology of the cluster.

I see that it's easy to lump together tokens from all nodes and sort them,
to produce a single token ring (and this is obviously the reason why tokens
have to be unique throughout the cluster as a whole).  That doesn't mean
it's a meaningful thing to do.
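
To make the "single sorted ring" point concrete, here is a minimal sketch in
plain Java -- not Cassandra code, just an illustration reusing the six tokens
from the ccm example above -- that merges the tokens of both DCs into one
sorted map and walks the resulting ranges.  Exactly as in the describering
output, every range ends up with one boundary token from dc1 and one from dc2:

import java.util.Map;
import java.util.TreeMap;

public class SingleRingDemo
{
    public static void main(String[] args)
    {
        // All tokens of both DCs land in one sorted structure.
        TreeMap<Long, String> ring = new TreeMap<>();
        ring.put(Long.MIN_VALUE,         "127.0.0.1 (dc1)");  // -9223372036854775808
        ring.put(-3074457345618258603L,  "127.0.0.2 (dc1)");
        ring.put(3074457345618258602L,   "127.0.0.3 (dc1)");
        ring.put(-9223372036854775708L,  "127.0.0.4 (dc2)");
        ring.put(-3074457345618258503L,  "127.0.0.5 (dc2)");
        ring.put(3074457345618258702L,   "127.0.0.6 (dc2)");

        // Each range runs from one token (exclusive) to the next (inclusive),
        // wrapping around from the highest token back to the lowest.
        Long start = ring.lastKey();
        for (Map.Entry<Long, String> end : ring.entrySet())
        {
            System.out.printf("(%d, %d] -> %s%n", start, end.getKey(), end.getValue());
            start = end.getKey();
        }
    }
}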

This introduces complexity which is not present in the problem domain
initially.  This was a deliberate choice of the developers, dare I say, to
complect the separate DCs together in a single token ring.  This has
profound consequences on the operations side.  If anything, it prevents
bootstrapping multiple nodes at the same time even if they are in different
DCs.  Or would you suggest setting consistent_range_movement=false and
hoping it will work out?

If the whole reason for having separate DCs is to provide isolation, I fail
to see how the single token ring design does anything towards achieving
that.

But also being very clear (I want to make sure I understand what you're
> saying): that's a manual thing you did, Cassandra didn't do it for you,
> right? The fact that Cassandra didn't STOP you from doing it could be
> considered a bug, but YOU made that config choice?
>

Yes, we have chosen exactly the same token for two nodes in different DCs
because we were unaware of this global uniqueness requirement.  Yes, we
believe it's a bug that Cassandra didn't stop us from doing that.

You can trivially predict what would happen with SimpleStrategy in
> multi-DC: run nodetool ring, and the first RF nodes listed after a given
> token own that data, regardless of which DC they're in. Because it's all
> one big ring.


In any case, I don't think SimpleStrategy is a valid argument to consider in
a multi-DC setup.  It is true that you can start a cluster spanning multiple
DCs from scratch while using SimpleStrategy, but there is no way to add a
new DC to the cluster unless you go NTS, so why bring up this example?

Cheers,
--
Alex


Re: Nodes show different number of tokens than initially

2018-01-31 Thread kurt greaves
>
> I don’t know why this is a surprise (maybe because people like to talk
> about multiple rings, but the fact that replication strategy is set per
> keyspace and that you could use SimpleStrategy in a multiple dc cluster
> demonstrates this), but we can chat about that another time

This is actually a point of confusion for a lot of new users. It seems
obvious for people who know the internals or who have been around since
pre-NTS/vnodes, but it's really not. Especially because NTS makes it seem
like there are two separate rings.

> that's a manual thing you did, Cassandra didn't do it for you, right? The
> fact that Cassandra didn't STOP you from doing it could be considered a
> bug, but YOU made that config choice?

> This should be fairly easy to reproduce, however Kurt mentioned that there
> is supposed to be some sort of protection against that. I'll try again
> tomorrow.

Sorry, the behaviour was expected. I was under the impression that you
couldn't 'steal' a token from another node (I thought C* stopped you), and I
misread the code. It actually gives the token up to the new node - not the
other way round. I haven't thought about it long enough to really consider
what the behaviour should be, or whether the current behaviour is right or
wrong though.


Re: Nodes show different number of tokens than initially

2018-01-31 Thread Jeff Jirsa
On Wed, Jan 31, 2018 at 12:08 PM, Oleksandr Shulgin <
oleksandr.shul...@zalando.de> wrote:

> On 31 Jan 2018 17:18, "Jeff Jirsa"  wrote:
>
>
> I don’t know why this is a surprise (maybe because people like to talk
> about multiple rings, but the fact that replication strategy is set per
> keyspace and that you could use SimpleStrategy in a multiple dc cluster
> demonstrates this), but we can chat about that another time
>
>
> The reason I find it surprising, is that it makes very little *sense* to
> put a token belonging to a node from one DC between tokens of nodes from
> another one.
>
>
I don't want to really turn this into an argument over what should and
shouldn't make sense, but I do agree, it doesn't make sense to put a token
on one node in one DC onto another node in another DC. But also being very
clear (I want to make sure I understand what you're saying): that's a
manual thing you did, Cassandra didn't do it for you, right? The fact that
Cassandra didn't STOP you from doing it could be considered a bug, but YOU
made that config choice?


> Having token ranges like that, with ends from nodes in different DCs,
> doesn't convey any *meaning* and has no correspondence to what is being
> modelled here. It also makes it nearly impossible to reason about range
> ownership (unless you're a machine, in which case you probably don't care).
>
> I understand that it works in the end, but it doesn't help to know that.
> It is an implementation detail sticking out of the guts of the code, and it sure
> *is* surprising in all its ugliness. It also opens up the possibility of
> problems just like the one which has started this discussion.
>
> I don't find the argument of using SimpleStrategy for multi-DC
> particularly interesting, nor can I predict what is to be expected from such
> an attempt.
>

You can trivially predict what would happen with SimpleStrategy in
multi-DC: run nodetool ring, and the first RF nodes listed after a given
token own that data, regardless of which DC they're in. Because it's all
one big ring.
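
As a rough illustration of that rule -- a sketch only, not Cassandra's actual
SimpleStrategy code, and the six-token two-DC ring below is made up for the
example -- replica selection is just "walk the combined ring clockwise from
the key's token and take the first RF distinct nodes, whatever DC they are in":

import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

public class SimpleStrategyWalkDemo
{
    // First RF distinct endpoints at or after the key's token, ignoring DCs.
    static List<String> replicasFor(long keyToken, TreeMap<Long, String> ring, int rf)
    {
        Set<String> replicas = new LinkedHashSet<>();
        Long t = ring.ceilingKey(keyToken);   // first token at or after the key
        while (replicas.size() < rf)
        {
            if (t == null)
                t = ring.firstKey();          // wrap around past the highest token
            replicas.add(ring.get(t));
            t = ring.higherKey(t);
        }
        return new ArrayList<>(replicas);
    }

    public static void main(String[] args)
    {
        TreeMap<Long, String> ring = new TreeMap<>();
        ring.put(Long.MIN_VALUE,        "node1 (dc1)");
        ring.put(-3074457345618258603L, "node2 (dc1)");
        ring.put(3074457345618258602L,  "node3 (dc1)");
        ring.put(-9223372036854775708L, "node4 (dc2)");
        ring.put(-3074457345618258503L, "node5 (dc2)");
        ring.put(3074457345618258702L,  "node6 (dc2)");

        // With RF=3 the chosen replicas freely mix both DCs: one big ring.
        System.out.println(replicasFor(0L, ring, 3));  // [node3 (dc1), node6 (dc2), node1 (dc1)]
    }
}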


Re: Nodes show different number of tokens than initially

2018-01-31 Thread Oleksandr Shulgin
On 31 Jan 2018 17:18, "Jeff Jirsa"  wrote:


I don’t know why this is a surprise (maybe because people like to talk
about multiple rings, but the fact that replication strategy is set per
keyspace and that you could use SimpleStrategy in a multiple dc cluster
demonstrates this), but we can chat about that another time


The reason I find it surprising, is that it makes very little *sense* to
put a token belonging to a node from one DC between tokens of nodes from
another one.

Having token ranges like that, with ends from nodes in different DCs,
doesn't convey any *meaning* and has no correspondence to what is being
modelled here. It also makes it nearly impossible to reason about range
ownership (unless you're a machine, in which case you probably don't care).

I understand that it works in the end, but it doesn't help to know that. It
is an implementation detail sticking out of the guts of the code, and it sure *is*
surprising in all its ugliness. It also opens up the possibility of
problems just like the one which has started this discussion.

I don't find the argument of using SimpleStrategy for multi-DC particularly
interesting, nor can I predict what is to be expected from such an attempt.

If this is deemed an invalid config, why does the new node *silently* steal
the existing token, badly affecting the ownership of the rest of the
nodes?  It should just refuse to start!

Philosophically, with multiple DCs, it may start up and not see the other
DC for minutes/hours/days before it realizes there’s a token conflict -
what should it do then?


This was not the case for us - the new node has seen all of the ring and
could detect that there is a conflict. Still, it decided to claim the token
ownership, taking it away from a longer-lived node.

This should be fairly easy to reproduce, however Kurt mentioned that there
is supposed to be some sort of protection against that. I'll try again
tomorrow.

If your suggestion to resolve that is to make sure we see the whole ring
before starting up, we end up in a situation where we try not to start up
unless we can see all nodes, and create outages during DC separations.


I don't really see a problem here. A newly started node learns topology
from the seed nodes - it doesn't need to *see* all nodes, just learn that
they *exist* and which tokens are assigned to them. A node which is
restarting doesn't even need to do that, because it doesn't need to
reconsider its token ownership.

Cheers,
--
Alex


Re: Nodes show different number of tokens than initially

2018-01-31 Thread Jeff Jirsa



> On Jan 31, 2018, at 12:35 AM, Oleksandr Shulgin 
>  wrote:
> 
>> On Tue, Jan 30, 2018 at 5:44 PM, Jeff Jirsa  wrote:
>> All DCs in a cluster use the same token space in the DHT,
> 
> I can't believe my bloody eyes, but this seems to be true...

I don’t know why this is a surprise (maybe because people like to talk about 
multiple rings, but the fact that replication strategy is set per keyspace and 
that you could use SimpleStrategy in a multiple dc cluster demonstrates this), 
but we can chat about that another time 


> 
>> so token conflicts across datacenters are invalid config
>  
> If this is deemed an invalid config, why does the new node *silently* steal the
> existing token, badly affecting the ownership of the rest of the nodes?  It 
> should just refuse to start!


Philosophically, with multiple DCs, it may start up and not see the other DC
for minutes/hours/days before it realizes there’s a token conflict - what 
should it do then? Which node gets stopped? If your suggestion to resolve that 
is to make sure we see the whole ring before starting up, we end up in a 
situation where we try not to start up unless we can see all nodes, and create
outages during DC separations. 

Distributed systems and occasional availability make these decisions harder.

Please open a jira if you think it’s wrong, but I’m not sure I know what the 
“right” answer is either.

Re: Nodes show different number of tokens than initially

2018-01-31 Thread kurt greaves
So the only reason that the new node would "steal" the token is if it
started up earlier - which is based off how many heartbeats have occurred
since entering NORMAL status on each node. I can't see any reason the new
nodes would have higher generation numbers, so it sounds likely there's a bug
somewhere there. I'm not really sure why this comparison would be relevant
unless you were starting multiple nodes at the same time, and based off
your example it seems it definitely shouldn't have happened. Can you create
a JIRA ticket with the above information?


Re: Nodes show different number of tokens than initially

2018-01-31 Thread Oleksandr Shulgin
On Wed, Jan 31, 2018 at 5:06 AM, Dikang Gu  wrote:

> What's the partitioner you use? We have logic to prevent duplicate tokens.
>

We are using the default Murmur3Partitioner.  The problem arises from the
fact that we are allocating the tokens manually, as described earlier.

--
Alex


Re: Nodes show different number of tokens than initially

2018-01-31 Thread Oleksandr Shulgin
On Tue, Jan 30, 2018 at 5:44 PM, Jeff Jirsa  wrote:

> All DCs in a cluster use the same token space in the DHT,
>

I can't believe my bloody eyes, but this seems to be true...

so token conflicts across datacenters are invalid config
>

If this is deemed an invalid config, why does the new node *silently* steal
the existing token, badly affecting the ownership of the rest of the
nodes?  It should just refuse to start!

--
Alex


Re: Nodes show different number of tokens than initially

2018-01-30 Thread Dikang Gu
What's the partitioner you use? We have logic to prevent duplicate tokens.

private static Collection<Token> adjustForCrossDatacenterClashes(final TokenMetadata tokenMetadata,
                                                                 StrategyAdapter strategy,
                                                                 Collection<Token> tokens)
{
    List<Token> filtered = Lists.newArrayListWithCapacity(tokens.size());

    for (Token t : tokens)
    {
        // If the candidate token is already taken by some endpoint, either fail
        // (when that endpoint is in the same allocation ring) or nudge the token
        // until a free one is found.
        while (tokenMetadata.getEndpoint(t) != null)
        {
            InetAddress other = tokenMetadata.getEndpoint(t);
            if (strategy.inAllocationRing(other))
                throw new ConfigurationException(String.format(
                        "Allocated token %s already assigned to node %s. Is another node also allocating tokens?",
                        t, other));
            t = t.increaseSlightly();
        }
        filtered.add(t);
    }
    return filtered;
}



On Tue, Jan 30, 2018 at 8:44 AM, Jeff Jirsa  wrote:

> All DCs in a cluster use the same token space in the DHT, so token
> conflicts across datacenters are invalid config
>
>
> --
> Jeff Jirsa
>
>
> On Jan 29, 2018, at 11:50 PM, Oleksandr Shulgin <
> oleksandr.shul...@zalando.de> wrote:
>
> On Tue, Jan 30, 2018 at 5:13 AM, kurt greaves 
> wrote:
>
>> Shouldn't happen. Can you send through nodetool ring output from one of
>> those nodes? Also, did the logs have anything to say about tokens when you
>> started the 3 seed nodes?
>>
>
> Hi Kurt,
>
> I cannot run nodetool ring anymore, since these test nodes are long gone.
> However I've grepped the logs and this is what I've found:
>
> Jan 25 08:57:18 ip-172-31-128-41 docker/cf3ea463915a[854]: INFO 08:57:18
> Nodes /172.31.128.31 and /172.31.128.41 have the same token
> -9223372036854775808. Ignoring /172.31.128.31
> Jan 25 08:57:18 ip-172-31-128-41 docker/cf3ea463915a[854]: INFO 08:57:18
> Nodes /172.31.144.32 and /172.31.128.41 have the same token
> -8454757700450211158. Ignoring /172.31.144.32
> Jan 25 08:58:30 ip-172-31-144-41 docker/48fba443d99f[852]: INFO 08:58:30
> Nodes /172.31.128.41 and /172.31.128.31 have the same token
> -9223372036854775808. /172.31.128.41 is the new owner
> Jan 25 08:58:30 ip-172-31-144-41 docker/48fba443d99f[852]: INFO 08:58:30
> Nodes /172.31.144.32 and /172.31.128.41 have the same token
> -8454757700450211158. Ignoring /172.31.144.32
> Jan 25 08:59:45 ip-172-31-160-41 docker/cced70e132f2[849]: INFO 08:59:45
> Nodes /172.31.128.41 and /172.31.128.31 have the same token
> -9223372036854775808. /172.31.128.41 is the new owner
> Jan 25 08:59:45 ip-172-31-160-41 docker/cced70e132f2[849]: INFO 08:59:45
> Nodes /172.31.144.32 and /172.31.128.41 have the same token
> -8454757700450211158. Ignoring /172.31.144.32
>
> Since we are allocating the tokens for seed nodes manually, it appears
> that the first seed node in the new ring (172.31.128.41) gets the same
> first token (-9223372036854775808) as the node in the old ring
> (172.31.128.31).  The same goes for the 3rd token of the new seed node
> (-8454757700450211158).
>
> What is beyond me is why would that matter and why would token ownership
> change at all, while these nodes are in *different virtual DCs*?  To me
> this sounds like a particularly nasty bug...
>
> --
> Oleksandr "Alex" Shulgin | Database Engineer | Zalando SE | Tel: +49 176
> 127-59-707 <+49%20176%2012759707>
>
>


-- 
Dikang


Re: Nodes show different number of tokens than initially

2018-01-30 Thread Jeff Jirsa
All DCs in a cluster use the same token space in the DHT, so token conflicts 
across datacenters are invalid config
 

-- 
Jeff Jirsa


> On Jan 29, 2018, at 11:50 PM, Oleksandr Shulgin 
>  wrote:
> 
>> On Tue, Jan 30, 2018 at 5:13 AM, kurt greaves  wrote:
>> Shouldn't happen. Can you send through nodetool ring output from one of 
>> those nodes? Also, did the logs have anything to say about tokens when you 
>> started the 3 seed nodes?
> 
> Hi Kurt,
> 
> I cannot run nodetool ring anymore, since these test nodes are long gone.  
> However I've grepped the logs and this is what I've found:
> 
> Jan 25 08:57:18 ip-172-31-128-41 docker/cf3ea463915a[854]: INFO  08:57:18 
> Nodes /172.31.128.31 and /172.31.128.41 have the same token 
> -9223372036854775808.  Ignoring /172.31.128.31
> Jan 25 08:57:18 ip-172-31-128-41 docker/cf3ea463915a[854]: INFO  08:57:18 
> Nodes /172.31.144.32 and /172.31.128.41 have the same token 
> -8454757700450211158.  Ignoring /172.31.144.32
> Jan 25 08:58:30 ip-172-31-144-41 docker/48fba443d99f[852]: INFO  08:58:30 
> Nodes /172.31.128.41 and /172.31.128.31 have the same token 
> -9223372036854775808.  /172.31.128.41 is the new owner
> Jan 25 08:58:30 ip-172-31-144-41 docker/48fba443d99f[852]: INFO  08:58:30 
> Nodes /172.31.144.32 and /172.31.128.41 have the same token 
> -8454757700450211158.  Ignoring /172.31.144.32
> Jan 25 08:59:45 ip-172-31-160-41 docker/cced70e132f2[849]: INFO  08:59:45 
> Nodes /172.31.128.41 and /172.31.128.31 have the same token 
> -9223372036854775808.  /172.31.128.41 is the new owner
> Jan 25 08:59:45 ip-172-31-160-41 docker/cced70e132f2[849]: INFO  08:59:45 
> Nodes /172.31.144.32 and /172.31.128.41 have the same token 
> -8454757700450211158.  Ignoring /172.31.144.32
> 
> Since we are allocating the tokens for seed nodes manually, it appears that 
> the first seed node in the new ring (172.31.128.41) gets the same first token 
> (-9223372036854775808) as the node in the old ring (172.31.128.31).  The same 
> goes for the 3rd token of the new seed node (-8454757700450211158).
> 
> What is beyond me is why would that matter and why would token ownership 
> change at all, while these nodes are in *different virtual DCs*?  To me
> this sounds like a particularly nasty bug...
> 
> -- 
> Oleksandr "Alex" Shulgin | Database Engineer | Zalando SE | Tel: +49 176 
> 127-59-707
> 


Re: Nodes show different number of tokens than initially

2018-01-29 Thread Oleksandr Shulgin
On Tue, Jan 30, 2018 at 5:13 AM, kurt greaves  wrote:

> Shouldn't happen. Can you send through nodetool ring output from one of
> those nodes? Also, did the logs have anything to say about tokens when you
> started the 3 seed nodes?
>

Hi Kurt,

I cannot run nodetool ring anymore, since these test nodes are long gone.
However I've grepped the logs and this is what I've found:

Jan 25 08:57:18 ip-172-31-128-41 docker/cf3ea463915a[854]: INFO 08:57:18
Nodes /172.31.128.31 and /172.31.128.41 have the same token
-9223372036854775808. Ignoring /172.31.128.31
Jan 25 08:57:18 ip-172-31-128-41 docker/cf3ea463915a[854]: INFO 08:57:18
Nodes /172.31.144.32 and /172.31.128.41 have the same token
-8454757700450211158. Ignoring /172.31.144.32
Jan 25 08:58:30 ip-172-31-144-41 docker/48fba443d99f[852]: INFO 08:58:30
Nodes /172.31.128.41 and /172.31.128.31 have the same token
-9223372036854775808. /172.31.128.41 is the new owner
Jan 25 08:58:30 ip-172-31-144-41 docker/48fba443d99f[852]: INFO 08:58:30
Nodes /172.31.144.32 and /172.31.128.41 have the same token
-8454757700450211158. Ignoring /172.31.144.32
Jan 25 08:59:45 ip-172-31-160-41 docker/cced70e132f2[849]: INFO 08:59:45
Nodes /172.31.128.41 and /172.31.128.31 have the same token
-9223372036854775808. /172.31.128.41 is the new owner
Jan 25 08:59:45 ip-172-31-160-41 docker/cced70e132f2[849]: INFO 08:59:45
Nodes /172.31.144.32 and /172.31.128.41 have the same token
-8454757700450211158. Ignoring /172.31.144.32

Since we are allocating the tokens for seed nodes manually, it appears that
the first seed node in the new ring (172.31.128.41) gets the same first
token (-9223372036854775808) as the node in the old ring (172.31.128.31).
The same goes for the 3rd token of the new seed node (-8454757700450211158).
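
A likely explanation, assuming the usual evenly spaced token recipe was used
(that recipe is an assumption, sketched below only for illustration): splitting
the Murmur3 range evenly, token_i = i * 2^64/N - 2^63, always yields
-9223372036854775808 as the first token, so the seeds of two rings initialised
independently with the same recipe and no per-DC offset are bound to collide on
it -- exactly the clash reported in the log above.

import java.math.BigInteger;

public class EvenTokensDemo
{
    // Evenly spaced Murmur3 tokens: token_i = -2^63 + i * 2^64 / totalTokens.
    static long[] evenTokens(int totalTokens)
    {
        BigInteger range = BigInteger.ONE.shiftLeft(64);         // 2^64
        BigInteger min = BigInteger.ONE.shiftLeft(63).negate();  // -2^63
        long[] tokens = new long[totalTokens];
        for (int i = 0; i < totalTokens; i++)
            tokens[i] = min.add(range.multiply(BigInteger.valueOf(i))
                                     .divide(BigInteger.valueOf(totalTokens)))
                           .longValueExact();
        return tokens;
    }

    public static void main(String[] args)
    {
        // e.g. 3 seed nodes with num_tokens=4 (12 tokens total);
        // token 0 comes out the same for any N.
        long[] tokens = evenTokens(12);
        // Prints -9223372036854775808 for any ring built with this recipe.
        System.out.println("first token: " + tokens[0]);
    }
}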

What is beyond me is why would that matter and why would token ownership
change at all, while these nodes are in *different virtual DCs*?  To me
this sounds like a particularly nasty bug...

-- 
Oleksandr "Alex" Shulgin | Database Engineer | Zalando SE | Tel: +49 176
127-59-707


Re: Nodes show different number of tokens than initially

2018-01-29 Thread kurt greaves
Shouldn't happen. Can you send through nodetool ring output from one of
those nodes? Also, did the logs have anything to say about tokens when you
started the 3 seed nodes?


Re: Nodes show different number of tokens than initially

2018-01-26 Thread Oleksandr Shulgin
On Fri, Jan 26, 2018 at 3:08 PM, Kenneth Brotman <
kenbrot...@yahoo.com.invalid> wrote:

>
>
> Could it be that after distributing the data, some of the nodes did not
> need to have a fourth token?
>

I'm not sure, but that would definitely be against my understanding of how
token assignment works.  E.g. these nodes still have num_tokens=4 in the
configuration file, so if they are restarted, the Cassandra server will
refuse to start, right?

--
Alex


RE: Nodes show different number of tokens than initially

2018-01-26 Thread Kenneth Brotman
Oleksandr,

 

Could it be that after distributing the data, some of the nodes did not need to 
have a fourth token?

 

Kenneth Brotman

 

From: Oleksandr Shulgin [mailto:oleksandr.shul...@zalando.de] 
Sent: Thursday, January 25, 2018 3:44 AM
To: User
Subject: Nodes show different number of tokens than initially

 

Hello,

 

While testing token allocation with version 3.0.15 we are experiencing some 
quite unexpected results.

 

We have deployed a secondary virtual DC with 6 nodes, 4 tokens per node.  Then 
we were adding the 7th node to the new DC in order to observe the effect of 
ownership re-distribution.

 

To set up the new DC we've used the following steps:

 

1. Alter all keyspaces to replicate to the upcoming new DC.

2. Deploy 3 seed nodes (IP ends with .31) with num_tokens=4 and tokens 
specified by initial_token list, auto_bootstrap=false.

3. Deploy 3 more nodes (IP ends with .32) with num_tokens=4 and 
allocate_tokens_for_keyspace=data_ks, auto_bootstrap=true.

4. Rebuild all new nodes specifying eu-central as the source DC (for the 3 
already bootstrapped nodes, workaround by truncating system.available_ranges 
first).

 

The following is the output of nodetool status after starting to bootstrap the 
7th node (172.31.128.33):

 

Datacenter: eu-central
======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
UN  172.31.160.12  26.4 GB    256     48.9%             89067222-b0eb-49e5-be7d-758ea24ace9a  1c
UN  172.31.144.12  28.92 GB   256     52.6%             2ab4786f-9722-4418-ba78-9c435cbb30e5  1b
UN  172.31.128.12  28.13 GB   256     47.9%             c4733a5c-abc5-4bab-9449-1e3f584cf64f  1a
UN  172.31.128.11  29.84 GB   256     52.2%             6083369c-1a0f-4098-a420-313dacd429b6  1a
UN  172.31.160.11  28.25 GB   256     51.1%             4dc361fc-818a-4b7f-abd3-9121488a7db1  1c
UN  172.31.144.11  28.14 GB   256     47.4%             05e5df92-d196-46d5-8812-e843fbbd2922  1b

Datacenter: eu-central_4vn
==========================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
UN  172.31.128.31  24.83 GB   4       45.8%             4d7decb3-8692-4aec-a2e1-2ac89aed8c5a  1a
UN  172.31.144.31  26.52 GB   4       45.8%             2eb29602-2df5-4f4f-b419-b5a94cf785f0  1b
UN  172.31.160.31  24.8 GB    4       45.8%             f1bd4696-c25c-4bc3-8c30-292f2bd027c1  1c
UJ  172.31.128.33  568.94 MB  4       ?                 ffa21d50-9bb4-4d2b-9e3e-7a6945f6f071  1a
UN  172.31.144.32  29.3 GB    4       54.2%             5ce019f6-99fd-4333-b231-d04a266229bb  1b
UN  172.31.160.32  27.8 GB    4       54.2%             193bef27-eea8-4aa6-9d5f-8baf3decdd76  1c
UN  172.31.128.32  30.5 GB    4       54.2%             6a046b64-31f9-4881-85b0-ab3a2f6dcdc4  1a

 

 

Then we wanted to start testing distribution with 8 vnodes.  For that we 
started to deploy yet another DC.

 

The following is the output of nodetool status after deploying the 3 seed nodes 
of the 8-tokens DC:

 

Datacenter: eu-central
======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
UN  172.31.160.12  26.4 GB    256     48.9%             89067222-b0eb-49e5-be7d-758ea24ace9a  1c
UN  172.31.144.12  28.92 GB   256     52.6%             2ab4786f-9722-4418-ba78-9c435cbb30e5  1b
UN  172.31.128.12  28.13 GB   256     47.9%             c4733a5c-abc5-4bab-9449-1e3f584cf64f  1a
UN  172.31.128.11  29.84 GB   256     52.2%             6083369c-1a0f-4098-a420-313dacd429b6  1a
UN  172.31.160.11  28.25 GB   256     51.1%             4dc361fc-818a-4b7f-abd3-9121488a7db1  1c
UN  172.31.144.11  28.14 GB   256     47.4%             05e5df92-d196-46d5-8812-e843fbbd2922  1b

Datacenter: eu-central_4vn
==========================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
UN  172.31.128.31  24.83 GB   3       45.8%             4d7decb3-8692-4aec-a2e1-2ac89aed8c5a  1a
UN  172.31.144.31  26.52 GB   4       45.8%             2eb29602-2df5-4f4f-b419-b5a94cf785f0  1b
UN  172.31.160.31  24.8 GB    4       45.8%             f1bd4696-c25c-4bc3-8c30-292f2bd027c1  1c
UJ  172.31.128.33  4.21 GB    4       ?                 ffa21d50-9bb4-4d2b-9e3e-7a6945f6f071  1a
UN  172.31.160.32  27.8 GB    4       54.2%             193bef27-eea8-4aa6-9d5f-8baf3decdd76  1c
UN  172.31.144.32  29.3 GB    3       54.2%             5ce019f6-99fd-4333-b231-d04a266229bb  1b
UN  172.31.128.32  30.5 GB    4       54.2

Nodes show different number of tokens than initially

2018-01-25 Thread Oleksandr Shulgin
Hello,

While testing token allocation with version 3.0.15 we are experiencing some
quite unexpected results.

We have deployed a secondary virtual DC with 6 nodes, 4 tokens per node.
Then we were adding the 7th node to the new DC in order to observe the
effect of ownership re-distribution.

To set up the new DC we've used the following steps:

1. Alter all keyspaces to replicate to the upcoming new DC.
2. Deploy 3 seed nodes (IP ends with .31) with num_tokens=4 and tokens
specified by initial_token list, auto_bootstrap=false.
3. Deploy 3 more nodes (IP ends with .32) with num_tokens=4 and
allocate_tokens_for_keyspace=data_ks, auto_bootstrap=true.
4. Rebuild all new nodes specifying eu-central as the source DC (for the 3
already bootstrapped nodes, workaround by truncating
system.available_ranges first).

The following is the output of nodetool status after starting to bootstrap
the 7th node (172.31.128.33):

Datacenter: eu-central
======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
UN  172.31.160.12  26.4 GB    256     48.9%             89067222-b0eb-49e5-be7d-758ea24ace9a  1c
UN  172.31.144.12  28.92 GB   256     52.6%             2ab4786f-9722-4418-ba78-9c435cbb30e5  1b
UN  172.31.128.12  28.13 GB   256     47.9%             c4733a5c-abc5-4bab-9449-1e3f584cf64f  1a
UN  172.31.128.11  29.84 GB   256     52.2%             6083369c-1a0f-4098-a420-313dacd429b6  1a
UN  172.31.160.11  28.25 GB   256     51.1%             4dc361fc-818a-4b7f-abd3-9121488a7db1  1c
UN  172.31.144.11  28.14 GB   256     47.4%             05e5df92-d196-46d5-8812-e843fbbd2922  1b

Datacenter: eu-central_4vn
==========================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
UN  172.31.128.31  24.83 GB   4       45.8%             4d7decb3-8692-4aec-a2e1-2ac89aed8c5a  1a
UN  172.31.144.31  26.52 GB   4       45.8%             2eb29602-2df5-4f4f-b419-b5a94cf785f0  1b
UN  172.31.160.31  24.8 GB    4       45.8%             f1bd4696-c25c-4bc3-8c30-292f2bd027c1  1c
UJ  172.31.128.33  568.94 MB  4       ?                 ffa21d50-9bb4-4d2b-9e3e-7a6945f6f071  1a
UN  172.31.144.32  29.3 GB    4       54.2%             5ce019f6-99fd-4333-b231-d04a266229bb  1b
UN  172.31.160.32  27.8 GB    4       54.2%             193bef27-eea8-4aa6-9d5f-8baf3decdd76  1c
UN  172.31.128.32  30.5 GB    4       54.2%             6a046b64-31f9-4881-85b0-ab3a2f6dcdc4  1a


Then we wanted to start testing distribution with 8 vnodes.  For that we
started to deploy yet another DC.

The following is the output of nodetool status after deploying the 3 seed
nodes of the 8-tokens DC:

Datacenter: eu-central
======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
UN  172.31.160.12  26.4 GB    256     48.9%             89067222-b0eb-49e5-be7d-758ea24ace9a  1c
UN  172.31.144.12  28.92 GB   256     52.6%             2ab4786f-9722-4418-ba78-9c435cbb30e5  1b
UN  172.31.128.12  28.13 GB   256     47.9%             c4733a5c-abc5-4bab-9449-1e3f584cf64f  1a
UN  172.31.128.11  29.84 GB   256     52.2%             6083369c-1a0f-4098-a420-313dacd429b6  1a
UN  172.31.160.11  28.25 GB   256     51.1%             4dc361fc-818a-4b7f-abd3-9121488a7db1  1c
UN  172.31.144.11  28.14 GB   256     47.4%             05e5df92-d196-46d5-8812-e843fbbd2922  1b

Datacenter: eu-central_4vn
==========================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
*UN  172.31.128.31  24.83 GB   3       45.8%             4d7decb3-8692-4aec-a2e1-2ac89aed8c5a  1a*
UN  172.31.144.31  26.52 GB   4       45.8%             2eb29602-2df5-4f4f-b419-b5a94cf785f0  1b
UN  172.31.160.31  24.8 GB    4       45.8%             f1bd4696-c25c-4bc3-8c30-292f2bd027c1  1c
UJ  172.31.128.33  4.21 GB    4       ?                 ffa21d50-9bb4-4d2b-9e3e-7a6945f6f071  1a
UN  172.31.160.32  27.8 GB    4       54.2%             193bef27-eea8-4aa6-9d5f-8baf3decdd76  1c
*UN  172.31.144.32  29.3 GB    3       54.2%             5ce019f6-99fd-4333-b231-d04a266229bb  1b*
UN  172.31.128.32  30.5 GB    4       54.2%             6a046b64-31f9-4881-85b0-ab3a2f6dcdc4  1a

Datacenter: eu-central_8vn
==========================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
UN  172.31.128.41  111.11 KB  8       0.0%              e218b68e-9837-4e6a-acbe-9833fda285bc  1a
UN  172.31.144.41  113.2 KB   8       0.0%              3ec883e9-6b84-4314-85bd-a3c00c4f47c8  1b
UN  172.31.160.41  82.22 KB   8       0.0%              cfaee6c5-ee9c-4d29-aa54-ca3e8e74e356  1c

What is absolutely unexpected is that here we see that 2 nodes in the _4vn
DC apparently now have a reduced number of tokens: 3 instead of 4.

How could that happen?
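
(One way to cross-check such a discrepancy -- just a thought, we have not
tried it here -- is to read the tokens directly from each node instead of
relying on nodetool status, e.g.:

$ cqlsh <node_address> -e "SELECT tokens FROM system.local;"

which lists every token the node itself believes it owns.)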

--
Alex