Ok. Have to psych myself up for the add-node task a bit. Didn't go well the first time round!
Tasks:

- Make sure the new node is not in the seeds list!
- Check cluster name, listen address, rpc address
- Give it its own rack in cassandra-rackdc.properties
- Delete cassandra-topology.properties if it exists
- Make sure no compactions are on the go
- rm -rf /var/lib/cassandra/*
- rm /data/cassandra/commitlog/* (this is on a different disk)
- systemctl start cassandra

And it should start streaming data from the other nodes and join the cluster (see the command sketch below). Anything else I have to watch out for? Tx.
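For what it's worth, here is the checklist above as a rough shell sketch. It assumes the config lives under /etc/cassandra (the path used elsewhere in this thread) and uses the data/commitlog paths from the list; treat it as a sketch to adapt, not a tested procedure.

    # paths assumed: /etc/cassandra for config, /var/lib/cassandra and
    # /data/cassandra/commitlog for data, as per the checklist above

    # confirm this node's own IP is NOT in its seeds list
    grep -A4 seed_provider /etc/cassandra/cassandra.yaml

    # check cluster name, listen address, rpc address
    grep -E 'cluster_name|listen_address|rpc_address' /etc/cassandra/cassandra.yaml

    # check the dc/rack assignment for this node
    cat /etc/cassandra/cassandra-rackdc.properties

    # delete the legacy topology file if it exists
    rm -f /etc/cassandra/cassandra-topology.properties

    # on the existing nodes: confirm no compactions are in flight
    nodetool compactionstats

    # wipe old state, start, then watch the node stream and join (UJ -> UN)
    rm -rf /var/lib/cassandra/*
    rm /data/cassandra/commitlog/*
    systemctl start cassandra
    nodetool status
    nodetool netstats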
On Tue, Apr 4, 2023 at 5:25 AM Jeff Jirsa <jji...@gmail.com> wrote:

> Because executing “removenode” streamed extra data from live nodes to the
> “gaining” replica
>
> Oversimplified (if you had one token per node)
>
> If you start with A B C
>
> Then add D
>
> D should bootstrap a range from each of A B and C, but at the end, some of
> the data that was A B C becomes B C D
>
> When you removenode, you tell B and C to send data back to A.
>
> A B and C will eventually compact that data away. Eventually.
>
> If you get around to adding D again, running “cleanup” when you’re done
> (successfully) will remove a lot of it.
>
>
> On Apr 3, 2023, at 8:14 PM, David Tinker <david.tin...@gmail.com> wrote:
>
> Looks like the remove has sorted things out. Thanks.
>
> One thing I am wondering about is why the nodes are carrying a lot more
> data? The loads were about 2.7T before, now 3.4T.
>
> # nodetool status
> Datacenter: dc1
> ===============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address          Load      Tokens  Owns (effective)  Host ID                               Rack
> UN  xxx.xxx.xxx.105  3.4 TiB   256     100.0%            afd02287-3f88-4c6f-8b27-06f7a8192402  rack3
> UN  xxx.xxx.xxx.253  3.34 TiB  256     100.0%            e1af72be-e5df-4c6b-a124-c7bc48c6602a  rack2
> UN  xxx.xxx.xxx.107  3.44 TiB  256     100.0%            ab72f017-be96-41d2-9bef-a551dec2c7b5  rack1
>
> On Mon, Apr 3, 2023 at 5:42 PM Bowen Song via user <user@cassandra.apache.org> wrote:
>
>> That's correct. nodetool removenode is strongly preferred when your node
>> is already down. If the node is still functional, use nodetool
>> decommission on the node instead.
>>
>> On 03/04/2023 16:32, Jeff Jirsa wrote:
>>
>> FWIW, `nodetool decommission` is strongly preferred. `nodetool
>> removenode` is designed to be run when a host is offline. Only decommission
>> is guaranteed to maintain consistency / correctness, and removenode
>> probably streams a lot more data around than decommission.
>>
>> On Mon, Apr 3, 2023 at 6:47 AM Bowen Song via user <user@cassandra.apache.org> wrote:
>>
>>> Using nodetool removenode is strongly preferred in most circumstances;
>>> only resort to assassinate if you do not care about data consistency or
>>> you know there won't be any consistency issue (e.g. no new writes and
>>> nodetool cleanup has not been run).
>>>
>>> Since the size of data on the new node is small, nodetool removenode
>>> should finish fairly quickly and bring your cluster back.
>>>
>>> Next time when you are doing something like this again, please test it
>>> out in a non-production environment and make sure everything works as
>>> expected before moving on to production.
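To make the two removal paths discussed above concrete, a minimal sketch. The host ID is the rack4 node from the nodetool status output further down the thread; substitute your own.

    # preferred: run ON the node being removed while it is still up;
    # it streams its data to the remaining replicas before leaving the ring
    nodetool decommission

    # if the node is already down: run from any live node, using its host ID
    # (ID below taken from the status output further down; replace with yours)
    nodetool removenode c4e8b4a0-f014-45e6-afb4-648aad4f8500
    nodetool removenode status

    # afterwards, on each remaining node, drop the data it no longer owns
    nodetool cleanup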
>>> On 03/04/2023 06:28, David Tinker wrote:
>>>
>>> Should I use assassinate or removenode? Given that there is some data on
>>> the node. Or will that be found on the other nodes? Sorry for all the
>>> questions but I really don't want to mess up.
>>>
>>> On Mon, Apr 3, 2023 at 7:21 AM Carlos Diaz <crdiaz...@gmail.com> wrote:
>>>
>>>> That's what nodetool assassinate will do.
>>>>
>>>> On Sun, Apr 2, 2023 at 10:19 PM David Tinker <david.tin...@gmail.com> wrote:
>>>>
>>>>> Is it possible for me to remove the node from the cluster, i.e. to undo
>>>>> this mess and get the cluster operating again?
>>>>>
>>>>> On Mon, Apr 3, 2023 at 7:13 AM Carlos Diaz <crdiaz...@gmail.com> wrote:
>>>>>
>>>>>> You can leave it in the seed list of the other nodes, just make sure
>>>>>> it's not included in this node's seed list. However, if you do decide to
>>>>>> fix the issue with the racks, first assassinate this node (nodetool
>>>>>> assassinate <ip>) and update the rack name before you restart.
>>>>>>
>>>>>> On Sun, Apr 2, 2023 at 10:06 PM David Tinker <david.tin...@gmail.com> wrote:
>>>>>>
>>>>>>> It is also in the seeds list for the other nodes. Should I remove it
>>>>>>> from those, restart them one at a time, then restart it?
>>>>>>>
>>>>>>> /etc/cassandra # grep -i bootstrap *
>>>>>>> doesn't show anything, so I don't think I have auto_bootstrap false.
>>>>>>>
>>>>>>> Thanks very much for the help.
>>>>>>>
>>>>>>> On Mon, Apr 3, 2023 at 7:01 AM Carlos Diaz <crdiaz...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Just remove it from the seed list in the cassandra.yaml file and
>>>>>>>> restart the node. Make sure that auto_bootstrap is set to true first
>>>>>>>> though.
>>>>>>>>
>>>>>>>> On Sun, Apr 2, 2023 at 9:59 PM David Tinker <david.tin...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> So likely because I made it a seed node when I added it to the
>>>>>>>>> cluster, it didn't do the bootstrap process. How can I recover this?
>>>>>>>>>
>>>>>>>>> On Mon, Apr 3, 2023 at 6:41 AM David Tinker <david.tin...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Yes, replication factor is 3.
>>>>>>>>>>
>>>>>>>>>> I ran nodetool repair -pr on all the nodes (one at a time) and
>>>>>>>>>> am still having issues getting data back from queries.
>>>>>>>>>>
>>>>>>>>>> I did make the new node a seed node.
>>>>>>>>>>
>>>>>>>>>> Re "rack4": I assumed that was just an indication as to the
>>>>>>>>>> physical location of the server for redundancy. This one is
>>>>>>>>>> separate from the others so I used rack4.
>>>>>>>>>>
>>>>>>>>>> On Mon, Apr 3, 2023 at 6:30 AM Carlos Diaz <crdiaz...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> I'm assuming that your replication factor is 3. If that's the
>>>>>>>>>>> case, did you intentionally put this node in rack 4? Typically,
>>>>>>>>>>> you want to add nodes in multiples of your replication factor in
>>>>>>>>>>> order to keep the "racks" balanced. In other words, this node
>>>>>>>>>>> should have been added to rack 1, 2 or 3.
>>>>>>>>>>>
>>>>>>>>>>> Having said that, you should be able to easily fix your problem
>>>>>>>>>>> by running a nodetool repair -pr on the new node.
>>>>>>>>>>>
>>>>>>>>>>> On Sun, Apr 2, 2023 at 8:16 PM David Tinker <david.tin...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi All
>>>>>>>>>>>>
>>>>>>>>>>>> I recently added a node to my 3 node Cassandra 4.0.5 cluster
>>>>>>>>>>>> and now many reads are not returning rows! What do I need to do
>>>>>>>>>>>> to fix this? There weren't any errors in the logs or other
>>>>>>>>>>>> problems that I could see. I expected the cluster to balance
>>>>>>>>>>>> itself but this hasn't happened (yet?). The nodes are similar so
>>>>>>>>>>>> I have num_tokens=256 for each. I am using the Murmur3Partitioner.
>>>>>>>>>>>>
>>>>>>>>>>>> # nodetool status
>>>>>>>>>>>> Datacenter: dc1
>>>>>>>>>>>> ===============
>>>>>>>>>>>> Status=Up/Down
>>>>>>>>>>>> |/ State=Normal/Leaving/Joining/Moving
>>>>>>>>>>>> --  Address          Load       Tokens  Owns (effective)  Host ID                               Rack
>>>>>>>>>>>> UN  xxx.xxx.xxx.105  2.65 TiB   256     72.9%             afd02287-3f88-4c6f-8b27-06f7a8192402  rack3
>>>>>>>>>>>> UN  xxx.xxx.xxx.253  2.6 TiB    256     73.9%             e1af72be-e5df-4c6b-a124-c7bc48c6602a  rack2
>>>>>>>>>>>> UN  xxx.xxx.xxx.24   93.82 KiB  256     80.0%             c4e8b4a0-f014-45e6-afb4-648aad4f8500  rack4
>>>>>>>>>>>> UN  xxx.xxx.xxx.107  2.65 TiB   256     73.2%             ab72f017-be96-41d2-9bef-a551dec2c7b5  rack1
>>>>>>>>>>>>
>>>>>>>>>>>> # nodetool netstats
>>>>>>>>>>>> Mode: NORMAL
>>>>>>>>>>>> Not sending any streams.
>>>>>>>>>>>> Read Repair Statistics:
>>>>>>>>>>>> Attempted: 0
>>>>>>>>>>>> Mismatch (Blocking): 0
>>>>>>>>>>>> Mismatch (Background): 0
>>>>>>>>>>>> Pool Name        Active  Pending  Completed  Dropped
>>>>>>>>>>>> Large messages   n/a     0        71754      0
>>>>>>>>>>>> Small messages   n/a     0        8398184    14
>>>>>>>>>>>> Gossip messages  n/a     0        1303634    0
>>>>>>>>>>>>
>>>>>>>>>>>> # nodetool ring
>>>>>>>>>>>> Datacenter: dc1
>>>>>>>>>>>> ==========
>>>>>>>>>>>> Address          Rack   Status  State   Load       Owns    Token
>>>>>>>>>>>>                                                             9189523899826545641
>>>>>>>>>>>> xxx.xxx.xxx.24   rack4  Up      Normal  93.82 KiB  79.95%  -9194674091837769168
>>>>>>>>>>>> xxx.xxx.xxx.107  rack1  Up      Normal  2.65 TiB   73.25%  -9168781258594813088
>>>>>>>>>>>> xxx.xxx.xxx.253  rack2  Up      Normal  2.6 TiB    73.92%  -9163037340977721917
>>>>>>>>>>>> xxx.xxx.xxx.105  rack3  Up      Normal  2.65 TiB   72.88%  -9148860739730046229
>>>>>>>>>>>> xxx.xxx.xxx.107  rack1  Up      Normal  2.65 TiB   73.25%  -9125240034139323535
>>>>>>>>>>>> xxx.xxx.xxx.253  rack2  Up      Normal  2.6 TiB    73.92%  -9112518853051755414
>>>>>>>>>>>> xxx.xxx.xxx.105  rack3  Up      Normal  2.65 TiB   72.88%  -9100516173422432134
>>>>>>>>>>>> ...
>>>>>>>>>>>>
>>>>>>>>>>>> This is causing a serious production issue. Please help if you can.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks
>>>>>>>>>>>> David
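A few read-only checks tied to the two causes identified above (the new node being a seed, so it never bootstrapped, and the rack placement). This is a sketch that assumes cqlsh is available on the node and the /etc/cassandra path used earlier in the thread; nothing here changes cluster state.

    # replication strategy and factor per keyspace
    cqlsh -e "SELECT keyspace_name, replication FROM system_schema.keyspaces;"

    # dc/rack this node announces via cassandra-rackdc.properties
    grep -E '^(dc|rack)=' /etc/cassandra/cassandra-rackdc.properties

    # auto_bootstrap is usually absent (it defaults to true); note that a node
    # listed in its own seeds list skips bootstrap regardless, which is why the
    # new node sat at ~94 KiB
    grep -i bootstrap /etc/cassandra/cassandra.yaml

    # confirm whether the node is actually streaming anything
    nodetool netstats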