Re: removenode stuck - cassandra 4.1.0

2023-01-23 Thread Joe Obernberger

Thank you - I was just impatient.  :)

-Joe

On 1/23/2023 12:56 PM, Jeff Jirsa wrote:

Those hosts are likely sending streams.

If you do `nodetool netstats` on the replicas of the node you're
removing, you should see byte counters and file counters - they should
all be incrementing. If one of them isn't incrementing, that one is
probably stuck.


There's at least one bug in 4.1 where (I think) the rate limiters can
interact in a way that causes this.
https://issues.apache.org/jira/browse/CASSANDRA-18110 describes it and
has a workaround.




On Mon, Jan 23, 2023 at 9:41 AM Joe Obernberger wrote:


I had a drive fail (first drive in the list) on a Cassandra cluster.
I've stopped the node (as it no longer starts), and am trying to remove
it from the cluster, but the removenode command is hung (been running
for 3 hours so far):
nodetool removenode status is always reporting the same token as being
removed.  Help?

nodetool removenode status
RemovalStatus: Removing token (-9196617215347134065). Waiting for
replication confirmation from
[/172.16.100.248,/172.16.100.249,/172.16.100.251,/172.16.100.252,/172.16.100.34,/172.16.100.35,/172.16.100.36,/172.16.100.37,/172.16.100.38,/172.16.100.42,/172.16.100.44,/172.16.100.45].

Thanks.

-Joe




Re: removenode stuck - cassandra 4.1.0

2023-01-23 Thread Jeff Jirsa
Those hosts are likely sending streams.

If you do `nodetool netstats` on the replicas of the node you're removing,
you should see byte counters and file counters - they should all be
incrementing. If one of them isn't incrementing, that one is probably stuck.
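
A rough way to check that (replica addresses below are placeholders; add
JMX port/auth flags if your cluster needs them):

  # Snapshot netstats on each replica twice; if a replica's byte/file
  # counters are identical across both passes, that stream is likely stuck.
  replicas="172.16.100.34 172.16.100.35 172.16.100.36"
  for pass in 1 2; do
    for host in $replicas; do
      echo "=== $host (pass $pass) ==="
      nodetool -h "$host" netstats | head -n 25
    done
    [ "$pass" -eq 1 ] && sleep 60
  done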

There's at least one bug in 4.1 where (I think) the rate limiters can
interact in a way that causes this.
https://issues.apache.org/jira/browse/CASSANDRA-18110 describes it and has
a workaround.



On Mon, Jan 23, 2023 at 9:41 AM Joe Obernberger <
joseph.obernber...@gmail.com> wrote:

> I had a drive fail (first drive in the list) on a Cassandra cluster.
> I've stopped the node (as it no longer starts), and am trying to remove
> it from the cluster, but the removenode command is hung (been running
> for 3 hours so far):
> nodetool removenode status is always reporting the same token as being
> removed.  Help?
>
> nodetool removenode status
> RemovalStatus: Removing token (-9196617215347134065). Waiting for
> replication confirmation from
> [/172.16.100.248,/172.16.100.249,/172.16.100.251,/172.16.100.252,/172.16.100.34,/172.16.100.35,/172.16.100.36,/172.16.100.37,/172.16.100.38,/172.16.100.42,/172.16.100.44,/172.16.100.45].
>
> Thanks.
>
> -Joe
>
>
>


removenode stuck - cassandra 4.1.0

2023-01-23 Thread Joe Obernberger
I had a drive fail (first drive in the list) on a Cassandra cluster.  
I've stopped the node (as it no longer starts), and am trying to remove 
it from the cluster, but the removenode command is hung (been running 
for 3 hours so far):
nodetool removenode status is always reporting the same token as being 
removed.  Help?


nodetool removenode status
RemovalStatus: Removing token (-9196617215347134065). Waiting for 
replication confirmation from 
[/172.16.100.248,/172.16.100.249,/172.16.100.251,/172.16.100.252,/172.16.100.34,/172.16.100.35,/172.16.100.36,/172.16.100.37,/172.16.100.38,/172.16.100.42,/172.16.100.44,/172.16.100.45].


Thanks.

-Joe




Re: Failed disks - correct procedure

2023-01-23 Thread Joe Obernberger
Some more observations.  If the first drive fails on a node, then you 
can't just remove it from the list.  Example:

We have:
/data/1/cassandra
/data/2/cassandra
/data/3/cassandra
/data/4/cassandra
...

If /data/1 fails and I remove it from the list, then when Cassandra
starts on that node it says a node with that address already exists and
needs to be replaced.  I think the only option at that point is to
bootstrap it and use the replace_address option.
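
For reference, a sketch of the replace_address route (the IP is a
placeholder for the dead node's own address; file paths vary by install):

  # With the rebuilt node stopped and its data directories empty, point it
  # at the address it is replacing, typically via cassandra-env.sh:
  echo 'JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address_first_boot=172.16.100.x"' \
    >> /etc/cassandra/cassandra-env.sh
  systemctl start cassandra
  # The node bootstraps and takes over the old node's tokens. Once it shows
  # UN in `nodetool status`, remove the flag again so later restarts behave
  # normally.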


-Joe

On 1/17/2023 10:41 AM, C. Scott Andreas wrote:
Bumping this note from Andy downthread to make sure everyone has seen 
it and is aware:


“Before you do that, you will want to make sure a cycle of repairs has 
run on the replicas of the down node to ensure they are consistent 
with each other.”


When replacing an instance, it’s necessary to run repair (incremental 
or full) among the surviving replicas *before* bootstrapping a 
replacement instance in. If you don’t do this, Cassandra’s quorum 
consistency guarantees won’t be met and data may appear to be lost. 
It’s not possible to use Cassandra as a consistent database without 
doing so.


Given replicas A, B, C, and replacement replica A*:
- Quorum write is witnessed by A, B
- A fails
- A* is bootstrapped in without repair of B, C
- Quorum read succeeds against A*, C
- The successful quorum read will not observe data from the previous 
successful quorum write and the data will appear to be lost.


Repairing surviving replicas before bootstrapping a replacement node 
is necessary to avoid this.
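
A minimal sketch of that pre-replacement repair (hostnames and keyspace are
placeholders; run it for each keyspace that had replicas on the dead node):

  # Repair the surviving replicas (B and C in the example above) before
  # bootstrapping the replacement, so quorum guarantees still hold.
  for host in replica-b replica-c; do
    nodetool -h "$host" repair --full my_keyspace
  done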


— Scott

On Jan 17, 2023, at 7:28 AM, Joe Obernberger wrote:




I come from the hadoop world where we have a cluster with probably 
over 500 drives.  Drives fail all the time; well, several a year 
anyway.  We remove that single drive from HDFS, HDFS re-balances, and 
when we get around to it, we swap in a new drive, format it, and add 
it back to HDFS.  We keep the OS drives separate from the data drives 
and ensure that the OS volume is in a RAID mirror.  It's painful when 
OS drives fail, so mirror works.  When space is low, we add another 
node with lots of disks.
We are repurposing this same hardware to run a large Cassandra 
cluster.  I'd love it if Cassandra could support larger individual 
nodes, but we've been trying to configure it with lots of disks for 
redundancy, with the idea that we won't use an entire node's storage 
only for Cassandra.  As was mentioned a long while back, blades seem 
to make more sense for Cassandra than single nodes with lots of disk, 
but we've got what we've got!

:)

So far, no issues with:
Stop node, remove drive from cassandra config, start node, run repair 
- version 4.1.
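
Roughly what that looks like (paths follow the layout from this thread;
adjust to your own, and note that the first directory in the list is a
special case, per the replace_address note above):

  # With the node stopped, drop the dead directory from
  # data_file_directories in cassandra.yaml, e.g.:
  #
  #   data_file_directories:
  #     - /data/1/cassandra
  #     - /data/2/cassandra
  #     - /data/3/cassandra   <- failed drive, remove this line
  #     - /data/4/cassandra
  #
  # then restart and repair so anything that lived on the dead drive is
  # streamed back from the other replicas:
  systemctl start cassandra
  nodetool repair --full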


-Joe

On 1/17/2023 10:11 AM, Durity, Sean R via user wrote:


For physical hardware when disks fail, I do a removenode, wait for 
the drive to be replaced, reinstall Cassandra, and then bootstrap 
the node back in (and run clean-up across the DC).
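
In outline (the host ID is a placeholder; take the real one from
`nodetool status` for the dead node):

  # 1. Remove the dead node by its host ID:
  nodetool removenode 00000000-0000-0000-0000-000000000000
  # 2. Replace the drive, reinstall Cassandra, and let the node bootstrap
  #    back in as usual.
  # 3. Then, on each remaining node in the DC, drop data for ranges the
  #    nodes no longer own:
  nodetool cleanup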


All of our disks are presented as one file system for data, which is 
not what the original question was asking.


Sean R. Durity

*From:* Marc Hoppins 
*Sent:* Tuesday, January 17, 2023 3:57 AM
*To:* user@cassandra.apache.org
*Subject:* [EXTERNAL] RE: Failed disks - correct procedure


Hi all,
I was pondering this very situation.
We have a node with a crapped-out disk (not the first time).
Removenode vs repairnode: in regard to time, there is going to be
little difference twixt replacing a dead node and removing then
re-installing a node.  There is going to be a bunch of reads/writes
and verifications (or similar) which is going to take a similar
amount of time... or do I read that wrong?
For myself, I just go with removenode and then rejoin after the HDD has
been replaced.  Usually the fix exceeds the wait time and the node is
then out of the system anyway.

-Original Message-
From: Joe Obernberger 
Sent: Monday, January 16, 2023 6:31 PM
To: Jeff Jirsa ; user@cassandra.apache.org
Subject: Re: Failed disks - correct procedure
EXTERNAL
I'm using 4.1.0-1.
I've been doing a lot of truncates lately before the drive failed 
(research project).  Current drives have about 100GBytes of data 
each, although the actual amount of data in Cassandra is much less 
(because of truncates and snapshots).  The cluster is not
homogeneous; some nodes have more drives than others.

nodetool status -r
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address                     Load      Tokens  Owns  Host ID                               Rack
UN  nyx.querymasters.com        7.9 GiB   250     ?     07bccfce-45f1-41a3-a5c4-ee748a7a9b98  rack1
UN  enceladus.querymasters.com  6.34 GiB  200     ?     274a6e8d-de37-4e0b-b000-02d221d858

Re: Cassandra nightly process

2023-01-23 Thread Loïc CHANEL via user
Thanks for your help guys.
You were right, the problem actually came from a very heavy data-processing
job that runs every 2 hours starting at midnight. Processing performance
was heavily affected, causing one node to write hints because communication
with the other node was degraded.
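
For anyone checking for a similar hint backlog, something like this can
confirm it (the hints path is the package-install default; hints_directory
in cassandra.yaml may point elsewhere):

  ls -lh /var/lib/cassandra/hints   # .hints files piling up => a peer was unreachable or slow
  nodetool statushandoff            # shows whether hinted handoff is enabled and running
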
Best regards,


Loïc CHANEL
System Big Data engineer
SoftAtHome (Lyon, France)


On Mon, Jan 16, 2023 at 5:23 PM Gábor Auth wrote:

> Hi,
>
> On Mon, Jan 16, 2023 at 3:07 PM Loïc CHANEL via user <
> user@cassandra.apache.org> wrote:
>
>> So my question here is: am I missing a Cassandra internal process that
>> is triggered on a daily basis at 0:00 and 2:00 ?
>>
>
> I bet it's not a Cassandra issue. Do you have any other metrics for your
> VPSs (CPU, memory, load, I/O stats, disk throughput, network traffic, etc.)?
> I think some process (on another virtual machine or on the host) is stealing
> your resources, so your Cassandra cannot process the requests and the other
> instance needs to put data into hints.
>
> --
> Bye,
> Gábor Auth
>