Thank you Andy.
Is there a way to just remove the drive from the cluster and replace it later?  Ordering replacement drives isn't a fast process...
What I've done so far is:
Stop node
Remove the drive's entry from data_file_directories in /etc/cassandra/conf/cassandra.yaml
Restart node
Run repair
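Sketched as commands, those steps might look something like this (the service name, config path, and the /data/disk07 mount point are assumptions about this install, not taken from the thread):

```shell
# Sketch only, assuming a package install managed by systemd and a
# failing drive mounted at /data/disk07 (hypothetical path).

sudo systemctl stop cassandra

# Delete the failing drive's line from data_file_directories in
# cassandra.yaml, i.e. a line like:    - /data/disk07
sudo sed -i '\|^[[:space:]]*-[[:space:]]*/data/disk07[[:space:]]*$|d' \
  /etc/cassandra/conf/cassandra.yaml

sudo systemctl start cassandra

# Repair so this node re-streams whatever data lived on the lost drive
nodetool repair -full
```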

Will that work?  Right now, it's showing all nodes as up.

-Joe

On 1/16/2023 11:55 AM, Tolbert, Andy wrote:
Hi Joe,

I'd recommend just doing a replacement, bringing up a new node with
-Dcassandra.replace_address_first_boot=ip.you.are.replacing as
described here:
https://cassandra.apache.org/doc/4.1/cassandra/operating/topo_changes.html#replacing-a-dead-node
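A rough sketch of how that flag is usually passed on a 4.x node (the address 10.1.2.3 is a placeholder for the dead node's IP, and the paths assume a package install):

```shell
# Sketch: start the replacement node with the replace flag set.

# Option A: one-off via the environment (tarball install); cassandra-env.sh
# appends JVM_EXTRA_OPTS to the JVM options.
JVM_EXTRA_OPTS="-Dcassandra.replace_address_first_boot=10.1.2.3" bin/cassandra

# Option B: add the flag to jvm-server.options, start the service, and
# remove the line again once bootstrap has completed.
echo '-Dcassandra.replace_address_first_boot=10.1.2.3' | \
  sudo tee -a /etc/cassandra/conf/jvm-server.options
sudo systemctl start cassandra
```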

Before you do that, you will want to make sure a cycle of repairs has
run on the replicas of the down node to ensure they are consistent
with each other.
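That repair cycle could look roughly like the following, run on each node that holds replicas of the dead node's ranges (my_keyspace is a placeholder):

```shell
# Sketch: full repair on each replica of the down node's ranges,
# before starting the replacement node.
nodetool repair -full my_keyspace

# Ownership per node for a keyspace, to see which nodes are involved:
nodetool status my_keyspace
```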

Make sure you also have 'auto_bootstrap: true' in the yaml of the node
you are replacing, and that the initial_token matches the node you are
replacing (if you are not using vnodes), so the node doesn't skip
bootstrapping.  'auto_bootstrap: true' is the default, but it felt
worth mentioning.
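For reference, a sketch of the relevant cassandra.yaml settings on the replacement node (the token value is a placeholder, not from the thread):

```yaml
# Sketch of cassandra.yaml on the replacement node.
auto_bootstrap: true   # the default; ensures the node streams data in

# Only relevant when NOT using vnodes: pin the dead node's token.
# initial_token: -9223372036854775808   # placeholder value
```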

You can also remove the dead node, which should stream its data to the
replicas that pick up its ranges, but you will want to run repairs
ahead of time for that as well.  To be honest it's not something I've
done recently, so I'm not as confident on executing that procedure.
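The removal alternative is driven by the dead node's Host ID; a sketch (the Host ID shown is a placeholder):

```shell
# Sketch: remove the dead node instead of replacing it.
nodetool status          # dead node shows as DN; note its Host ID

# Stream the dead node's ranges to the remaining replicas.
nodetool removenode 11111111-2222-3333-4444-555555555555

# Monitor progress of an in-flight removal:
nodetool removenode status
```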

Thanks,
Andy


On Mon, Jan 16, 2023 at 9:28 AM Joe Obernberger
<joseph.obernber...@gmail.com> wrote:
Hi all - what is the correct procedure when handling a failed disk?
I have a node in a 15 node cluster.  This node has 16 drives and
cassandra data is split across them.  One drive is failing.  Can I just
remove it from the list and Cassandra will then replicate?  If not,
what's the procedure?
Thank you!

-Joe


--
This email has been checked for viruses by AVG antivirus software.
www.avg.com

