Thanks. Hmm, the removenode has been busy for hours but seems to be progressing.
I have been running this on the nodes to monitor progress:
# nodetool netstats | grep Already
Receiving 92 files, 843934103369 bytes total. Already received 82 files (89.13%), 590204687299 bytes total (69.93%)
Sending 84 files, 860198753783 bytes total. Already sent 56 files (66.67%), 307038785732 bytes total (35.69%)
Sending 78 files, 815573435637 bytes total. Already sent 56 files (71.79%), 313079823738 bytes total (38.39%)
The percentages are ticking up.
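In case it is useful, I have just been polling that in a loop on each node (quick-and-dirty, assuming a normal Linux shell on the nodes):
# while true; do date; nodetool netstats | grep -E 'Mode|Already'; sleep 60; done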
# nodetool ring | head -20
Datacenter: dc1
==========
Address          Rack   Status  State    Load       Owns    Token
                                                            9189523899826545641
xxx.xxx.xxx.24   rack4  Down    Leaving  26.62 GiB  79.95%  -9194674091837769168
xxx.xxx.xxx.107  rack1  Up      Normal   2.68 TiB   73.25%  -9168781258594813088
xxx.xxx.xxx.253  rack2  Up      Normal   2.63 TiB   73.92%  -9163037340977721917
xxx.xxx.xxx.105  rack3  Up      Normal   2.68 TiB   72.88%  -9148860739730046229
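I have also been checking the removal itself from the node where I started it (assuming I am reading the tooling right, this reports the ongoing removal):
# nodetool removenode status
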
On Mon, Apr 3, 2023 at 3:46 PM Bowen Song via user <
[email protected]> wrote:
> Using nodetool removenode is strongly preferred in most circumstances;
> only resort to assassinate if you do not care about data consistency, or
> you know there won't be any consistency issue (e.g. no new writes and
> nodetool cleanup has not been run).
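>
> For reference, the rough shape of those commands (the host ID comes from
> nodetool status; assassinate only as a last resort) is:
>
> nodetool removenode <host-id>
> nodetool assassinate <ip-address>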
>
> Since the size of data on the new node is small, nodetool removenode
> should finish fairly quickly and bring your cluster back.
>
> Next time you are doing something like this, please test it out in a
> non-production environment and make sure everything works as expected
> before moving on to production.
>
>
> On 03/04/2023 06:28, David Tinker wrote:
>
> Should I use assassinate or removenode, given that there is some data on
> the node? Or will that data be found on the other nodes? Sorry for all the
> questions, but I really don't want to mess up.
>
> On Mon, Apr 3, 2023 at 7:21 AM Carlos Diaz <[email protected]> wrote:
>
>> That's what nodetool assassinate will do.
>>
>> On Sun, Apr 2, 2023 at 10:19 PM David Tinker <[email protected]>
>> wrote:
>>
>>> Is it possible for me to remove the node from the cluster i.e. to undo
>>> this mess and get the cluster operating again?
>>>
>>> On Mon, Apr 3, 2023 at 7:13 AM Carlos Diaz <[email protected]> wrote:
>>>
>>>> You can leave it in the seed list of the other nodes, just make sure
>>>> it's not included in this node's seed list. However, if you do decide to
>>>> fix the issue with the racks, first assassinate this node (nodetool
>>>> assassinate <ip>) and update the rack name before you restart.
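>>>>
>>>> (If it helps: assuming you are using GossipingPropertyFileSnitch, the
>>>> rack name comes from cassandra-rackdc.properties on that node, e.g.
>>>>
>>>> dc=dc1
>>>> rack=rack1
>>>>
>>>> and the change only takes effect after a restart.)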
>>>>
>>>> On Sun, Apr 2, 2023 at 10:06 PM David Tinker <[email protected]>
>>>> wrote:
>>>>
>>>>> It is also in the seeds list for the other nodes. Should I remove it
>>>>> from those, restart them one at a time, then restart it?
>>>>>
>>>>> /etc/cassandra # grep -i bootstrap *
>>>>> doesn't show anything, so I don't think I have auto_bootstrap set to false.
>>>>>
>>>>> Thanks very much for the help.
>>>>>
>>>>>
>>>>> On Mon, Apr 3, 2023 at 7:01 AM Carlos Diaz <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Just remove it from the seed list in the cassandra.yaml file and
>>>>>> restart the node. Make sure that auto_bootstrap is set to true first
>>>>>> though.
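>>>>>>
>>>>>> (Roughly, the seeds are the seed_provider section in cassandra.yaml,
>>>>>> something like:
>>>>>>
>>>>>> seed_provider:
>>>>>>     - class_name: org.apache.cassandra.locator.SimpleSeedProvider
>>>>>>       parameters:
>>>>>>           - seeds: "xxx.xxx.xxx.107,xxx.xxx.xxx.253,xxx.xxx.xxx.105"
>>>>>>
>>>>>> so drop the new node's own IP from that seeds line on the new node. Also
>>>>>> note that auto_bootstrap defaults to true when it isn't present in
>>>>>> cassandra.yaml at all.)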
>>>>>>
>>>>>> On Sun, Apr 2, 2023 at 9:59 PM David Tinker <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> So, likely because I made it a seed node when I added it to the
>>>>>>> cluster, it didn't do the bootstrap process. How can I recover from this?
>>>>>>>
>>>>>>> On Mon, Apr 3, 2023 at 6:41 AM David Tinker <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Yes replication factor is 3.
>>>>>>>>
>>>>>>>> I ran nodetool repair -pr on all the nodes (one at a time) and am
>>>>>>>> still having issues getting data back from queries.
>>>>>>>>
>>>>>>>> I did make the new node a seed node.
>>>>>>>>
>>>>>>>> Re "rack4": I assumed that was just an indication as to the
>>>>>>>> physical location of the server for redundancy. This one is separate
>>>>>>>> from
>>>>>>>> the others so I used rack4.
>>>>>>>>
>>>>>>>> On Mon, Apr 3, 2023 at 6:30 AM Carlos Diaz <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I'm assuming that your replication factor is 3. If that's the case,
>>>>>>>>> did you intentionally put this node in rack 4? Typically, you want to
>>>>>>>>> add nodes in multiples of your replication factor in order to keep the
>>>>>>>>> "racks" balanced. In other words, this node should have been added to
>>>>>>>>> rack 1, 2 or 3.
>>>>>>>>>
>>>>>>>>> Having said that, you should be able to easily fix your problem by
>>>>>>>>> running a nodetool repair -pr on the new node.
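>>>>>>>>>
>>>>>>>>> (For example: nodetool repair -pr my_keyspace, run once per application
>>>>>>>>> keyspace; my_keyspace is just a placeholder here.)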
>>>>>>>>>
>>>>>>>>> On Sun, Apr 2, 2023 at 8:16 PM David Tinker <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi All
>>>>>>>>>>
>>>>>>>>>> I recently added a node to my 3 node Cassandra 4.0.5 cluster and
>>>>>>>>>> now many reads are not returning rows! What do I need to do to fix
>>>>>>>>>> this? There weren't any errors in the logs or other problems that I
>>>>>>>>>> could see. I expected the cluster to balance itself but this hasn't
>>>>>>>>>> happened (yet?). The nodes are similar so I have num_tokens=256 for
>>>>>>>>>> each. I am using the Murmur3Partitioner.
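>>>>>>>>>>
>>>>>>>>>> (That is just num_tokens: 256 in each node's cassandra.yaml, if that
>>>>>>>>>> matters.)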
>>>>>>>>>>
>>>>>>>>>> # nodetool status
>>>>>>>>>> Datacenter: dc1
>>>>>>>>>> ===============
>>>>>>>>>> Status=Up/Down
>>>>>>>>>> |/ State=Normal/Leaving/Joining/Moving
>>>>>>>>>> --  Address          Load       Tokens  Owns (effective)  Host ID                               Rack
>>>>>>>>>> UN  xxx.xxx.xxx.105  2.65 TiB   256     72.9%             afd02287-3f88-4c6f-8b27-06f7a8192402  rack3
>>>>>>>>>> UN  xxx.xxx.xxx.253  2.6 TiB    256     73.9%             e1af72be-e5df-4c6b-a124-c7bc48c6602a  rack2
>>>>>>>>>> UN  xxx.xxx.xxx.24   93.82 KiB  256     80.0%             c4e8b4a0-f014-45e6-afb4-648aad4f8500  rack4
>>>>>>>>>> UN  xxx.xxx.xxx.107  2.65 TiB   256     73.2%             ab72f017-be96-41d2-9bef-a551dec2c7b5  rack1
>>>>>>>>>>
>>>>>>>>>> # nodetool netstats
>>>>>>>>>> Mode: NORMAL
>>>>>>>>>> Not sending any streams.
>>>>>>>>>> Read Repair Statistics:
>>>>>>>>>> Attempted: 0
>>>>>>>>>> Mismatch (Blocking): 0
>>>>>>>>>> Mismatch (Background): 0
>>>>>>>>>> Pool Name        Active  Pending  Completed  Dropped
>>>>>>>>>> Large messages   n/a     0        71754      0
>>>>>>>>>> Small messages   n/a     0        8398184    14
>>>>>>>>>> Gossip messages  n/a     0        1303634    0
>>>>>>>>>>
>>>>>>>>>> # nodetool ring
>>>>>>>>>> Datacenter: dc1
>>>>>>>>>> ==========
>>>>>>>>>> Address          Rack   Status  State   Load       Owns    Token
>>>>>>>>>>                                                            9189523899826545641
>>>>>>>>>> xxx.xxx.xxx.24   rack4  Up      Normal  93.82 KiB  79.95%  -9194674091837769168
>>>>>>>>>> xxx.xxx.xxx.107  rack1  Up      Normal  2.65 TiB   73.25%  -9168781258594813088
>>>>>>>>>> xxx.xxx.xxx.253  rack2  Up      Normal  2.6 TiB    73.92%  -9163037340977721917
>>>>>>>>>> xxx.xxx.xxx.105  rack3  Up      Normal  2.65 TiB   72.88%  -9148860739730046229
>>>>>>>>>> xxx.xxx.xxx.107  rack1  Up      Normal  2.65 TiB   73.25%  -9125240034139323535
>>>>>>>>>> xxx.xxx.xxx.253  rack2  Up      Normal  2.6 TiB    73.92%  -9112518853051755414
>>>>>>>>>> xxx.xxx.xxx.105  rack3  Up      Normal  2.65 TiB   72.88%  -9100516173422432134
>>>>>>>>>> ...
>>>>>>>>>>
>>>>>>>>>> This is causing a serious production issue. Please help if you
>>>>>>>>>> can.
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>> David