Because executing "removenode" streamed extra data from live nodes to the "gaining" replica.
Oversimplified (if you had one token per node):
If you start with A B C
Then add D
D should bootstrap a range from each of A, B and C, but at the end, some of the data that was on A B C becomes B C D.
When you removenode, you tell B and C to send data back to A.
A, B and C will eventually compact that data away. Eventually.
If you get around to adding D again, running "cleanup" when you're done (successfully) will remove a lot of it.
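For example, once the node has finished joining again, running cleanup on each of the original nodes (one at a time) drops the data for ranges they no longer own; limiting it to a single keyspace is optional:

# nodetool cleanup
# nodetool cleanup <keyspace>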
On Apr 3, 2023, at 8:14 PM, David Tinker <david.tin...@gmail.com> wrote:
Looks like the remove has sorted things out. Thanks.
One thing I am wondering about is why the nodes are now carrying a lot more data. The loads were about 2.7T before, now 3.4T.
# nodetool status
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load      Tokens  Owns (effective)  Host ID                               Rack
UN  xxx.xxx.xxx.105  3.4 TiB   256     100.0%            afd02287-3f88-4c6f-8b27-06f7a8192402  rack3
UN  xxx.xxx.xxx.253  3.34 TiB  256     100.0%            e1af72be-e5df-4c6b-a124-c7bc48c6602a  rack2
UN  xxx.xxx.xxx.107  3.44 TiB  256     100.0%            ab72f017-be96-41d2-9bef-a551dec2c7b5  rack1
That's correct. nodetool removenode is strongly preferred when your node is already down. If the node is still functional, use nodetool decommission on the node instead.
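For example, note which command runs where:

On the node that is leaving, while it is still up and part of the ring:
# nodetool decommission

From any other live node, when the node to be removed is already down (the host ID comes from nodetool status):
# nodetool removenode <host-id>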
On 03/04/2023 16:32, Jeff Jirsa wrote:
FWIW, `nodetool decommission` is strongly preferred. `nodetool removenode` is designed to be run when a host is offline. Only decommission is guaranteed to maintain consistency / correctness, and removenode probably streams a lot more data around than decommission.
nodetool removenode is strongly preferred in most circumstances; only resort to assassinate if you do not care about data consistency, or you know there won't be any consistency issue (e.g. no new writes and nodetool cleanup was not run). Since the size of data on the new node is small, nodetool removenode should finish fairly quickly and bring your cluster back.
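For example, using the new node's host ID from your earlier nodetool status output, and then checking progress from any live node:

# nodetool removenode c4e8b4a0-f014-45e6-afb4-648aad4f8500
# nodetool removenode status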
Next time when you are doing something like this again, please test it out in a non-production environment and make sure everything works as expected before moving on to production.
On 03/04/2023 06:28, David Tinker wrote:
Should I use assassinate or removenode, given that there is some data on the node? Or will that data be found on the other nodes? Sorry for all the questions, but I really don't want to mess up.
That's what nodetool assassinate will do.
Is it possible for me to remove the node from the cluster, i.e. to undo this mess and get the cluster operating again?
You can leave it in the seed list of the other nodes, just make sure it's not included in this node's own seed list. However, if you do decide to fix the issue with the racks, first assassinate this node (nodetool assassinate <ip>) and update the rack name before you restart.
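A rough outline of that sequence, assuming GossipingPropertyFileSnitch (so the rack name lives in cassandra-rackdc.properties):

From one of the healthy nodes:
# nodetool assassinate xxx.xxx.xxx.24

Then on the new node, before restarting, set the rack to one of the existing racks, e.g.:
dc=dc1
rack=rack1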
It is also in the seeds list for the other nodes. Should I remove it from those, restart them one at a time, then restart it?
/etc/cassandra # grep -i bootstrap *
doesn't show anything, so I don't think I have auto_bootstrap set to false.
Thanks very much for the help.
Just remove it from the seed list in the cassandra.yaml file and restart the node. Make sure that auto_bootstrap is set to true first though.
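For illustration, the relevant part of cassandra.yaml on the new node would look something like this (the seeds here are the three original nodes; auto_bootstrap is not in the default file and defaults to true, so add it only if you need to be explicit):

auto_bootstrap: true
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "xxx.xxx.xxx.105,xxx.xxx.xxx.253,xxx.xxx.xxx.107"

Note that the node's own address (xxx.xxx.xxx.24) is not listed in its own seeds.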
So, likely because I made it a seed node when I added it to the cluster, it didn't do the bootstrap process. How can I recover from this?
Yes, replication factor is 3. I ran nodetool repair -pr on all the nodes (one at a time) and am still having issues getting data back from queries.
I did make the new node a seed node.
Re "rack4": I assumed that was just an indication of the physical location of the server for redundancy. This one is separate from the others, so I used rack4.
I'm assuming that your replication factor is 3. If that's the case, did you intentionally put this node in rack 4? Typically, you want to add nodes in multiples of your replication factor in order to keep the "racks" balanced. In other words, this node should have been added to rack 1, 2 or 3.

Having said that, you should be able to easily fix your problem by running a nodetool repair -pr on the new node.
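For example, on the new node:

# nodetool repair -pr

The -pr (--partitioner-range) option repairs only the token ranges the node is a primary replica for; running it on every node, one at a time, covers the whole ring.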
Hi All

I recently added a node to my 3 node Cassandra 4.0.5 cluster and now many reads are not returning rows! What do I need to do to fix this? There weren't any errors in the logs or other problems that I could see. I expected the cluster to balance itself but this hasn't happened (yet?). The nodes are similar so I have num_tokens=256 for each. I am using the Murmur3Partitioner.
# nodetool status
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load       Tokens  Owns (effective)  Host ID                               Rack
UN  xxx.xxx.xxx.105  2.65 TiB   256     72.9%             afd02287-3f88-4c6f-8b27-06f7a8192402  rack3
UN  xxx.xxx.xxx.253  2.6 TiB    256     73.9%             e1af72be-e5df-4c6b-a124-c7bc48c6602a  rack2
UN  xxx.xxx.xxx.24   93.82 KiB  256     80.0%             c4e8b4a0-f014-45e6-afb4-648aad4f8500  rack4
UN  xxx.xxx.xxx.107  2.65 TiB   256     73.2%             ab72f017-be96-41d2-9bef-a551dec2c7b5  rack1
# nodetool netstats
Mode: NORMAL
Not sending any streams.
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name        Active  Pending  Completed  Dropped
Large messages   n/a     0        71754      0
Small messages   n/a     0        8398184    14
Gossip messages  n/a     0        1303634    0
# nodetool ring
Datacenter: dc1
==========
Address          Rack   Status  State   Load       Owns    Token
                                                           9189523899826545641
xxx.xxx.xxx.24   rack4  Up      Normal  93.82 KiB  79.95%  -9194674091837769168
xxx.xxx.xxx.107  rack1  Up      Normal  2.65 TiB   73.25%  -9168781258594813088
xxx.xxx.xxx.253  rack2  Up      Normal  2.6 TiB    73.92%  -9163037340977721917
xxx.xxx.xxx.105  rack3  Up      Normal  2.65 TiB   72.88%  -9148860739730046229
xxx.xxx.xxx.107  rack1  Up      Normal  2.65 TiB   73.25%  -9125240034139323535
xxx.xxx.xxx.253  rack2  Up      Normal  2.6 TiB   73.92%  -9112518853051755414
xxx.xxx.xxx.105  rack3  Up      Normal  2.65 TiB   72.88%  -9100516173422432134
...
This is causing a serious production issue. Please help if you can.

Thanks
David