Re: Failed disks - correct procedure

2023-01-16 Thread Joe Obernberger

I'm using 4.1.0-1.
I've been doing a lot of truncates lately, before the drive failed 
(it's a research project).  Current drives have about 100 GB of data each, 
although the actual amount of data in Cassandra is much less (because of 
the truncates and snapshots).  The cluster is not homogeneous; some nodes 
have more drives than others.


nodetool status -r
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address                     Load       Tokens  Owns  Host ID                               Rack
UN  nyx.querymasters.com        7.9 GiB    250     ?     07bccfce-45f1-41a3-a5c4-ee748a7a9b98  rack1
UN  enceladus.querymasters.com  6.34 GiB   200     ?     274a6e8d-de37-4e0b-b000-02d221d858a5  rack1
UN  aion.querymasters.com       6.31 GiB   200     ?     59150c47-274a-46fb-9d5e-bed468d36797  rack1
UN  calypso.querymasters.com    6.26 GiB   200     ?     e83aa851-69b4-478f-88f6-60e657ea6539  rack1
UN  fortuna.querymasters.com    7.1 GiB    200     ?     49e4f571-7d1c-4e1e-aca7-5bbe076596f7  rack1
UN  kratos.querymasters.com     6.36 GiB   200     ?     0d9509cc-2f23-4117-a883-469a1be54baf  rack1
UN  charon.querymasters.com     6.35 GiB   200     ?     d9702f96-256e-45ae-8e12-69a42712be50  rack1
UN  eros.querymasters.com       6.4 GiB    200     ?     93f9cb0f-ea71-4e3d-b62a-f0ea0e888c47  rack1
UN  ursula.querymasters.com     6.24 GiB   200     ?     4bbbe57c-6219-41e5-bbac-de92a9594d53  rack1
UN  gaia.querymasters.com       6.28 GiB   200     ?     b2e5366e-8386-40ec-a641-27944a5a7cfa  rack1
UN  chaos.querymasters.com      3.78 GiB   120     ?     08a19658-40be-4e55-8709-812b3d4ac750  rack1
UN  pallas.querymasters.com     6.24 GiB   200     ?     b74b6e65-af63-486a-b07f-9e304ec30a39  rack1
UN  paradigm7.querymasters.com  16.25 GiB  500     ?     1ccd2cc5-3ee5-43c5-a8c3-7065bdc24297  rack1
UN  aether.querymasters.com     6.36 GiB   200     ?     352fd049-32f8-4be8-9275-68b145ac2832  rack1
UN  athena.querymasters.com     15.85 GiB  500     ?     b088a8e6-42f3-4331-a583-47ef5149598f  rack1


-Joe

On 1/16/2023 12:23 PM, Jeff Jirsa wrote:

Prior to CASSANDRA-6696 you'd have to treat one missing disk as a failed 
machine, wipe all the data and re-stream it, as the tombstone for a given value 
may be on one disk and the data on another (effectively resurrecting deleted data).

So the answer has to be version dependent, too - which version were you using?


On Jan 16, 2023, at 9:08 AM, Tolbert, Andy  wrote:

Hi Joe,

Reading it back I realized I misunderstood that part of your email, so
you must be using data_file_directories with 16 drives?  That's a lot
of drives!  I imagine this may happen from time to time given that
disks like to fail.

That's a bit of an interesting scenario that I would have to think
about.  If you brought the node up without the bad drive, repairs are
probably going to do a ton of repair overstreaming if you aren't using
4.0 (https://issues.apache.org/jira/browse/CASSANDRA-3200) which may
put things into a really bad state (lots of streaming = lots of
compactions = slower reads) and you may be seeing some inconsistency
if repairs weren't regularly running beforehand.

How much data was on the drive that failed?  How much data do you
usually have per node?

Thanks,
Andy


On Mon, Jan 16, 2023 at 10:59 AM Joe Obernberger
 wrote:

Thank you Andy.
Is there a way to just remove the drive from the cluster and replace it
later?  Ordering replacement drives isn't a fast process...
What I've done so far is:
Stop node
Remove drive reference from /etc/cassandra/conf/cassandra.yaml
Restart node
Run repair

Will that work?  Right now, it's showing all nodes as up.

-Joe


On 1/16/2023 11:55 AM, Tolbert, Andy wrote:
Hi Joe,

I'd recommend just doing a replacement, bringing up a new node with
-Dcassandra.replace_address_first_boot=ip.you.are.replacing as
described here:
https://cassandra.apache.org/doc/4.1/cassandra/operating/topo_changes.html#replacing-a-dead-node

Before you do that, you will want to make sure a cycle of repairs has
run on the replicas of the down node to ensure they are consistent
with each other.

Make sure you also have 'auto_bootstrap: true' in the yaml of the node
you are replacing and that the initial_token matches the node you are
replacing (If you are not using vnodes) so the node doesn't skip
bootstrapping.  This is the default, but felt worth mentioning.

You can also remove the dead node, which should stream data to
replicas that will pick up new ranges, but you also will want to do
repairs ahead of time too.  To be honest it's not something I've done
recently, so I'm not as confident on executing that procedure.

Thanks,
Andy


On Mon, Jan 16, 2023 at 9:28 AM Joe Obernberger
 wrote:

Hi all - what is the correct procedure when handling a failed disk?
Have a node in a 15 node cluster.  This node has 16 drives and cassandra
data is split across them.  One drive is failing.  Can I just remove it
from the list and cassandra will then replicate? If not - what?
Thank you!

-Joe



Re: Failed disks - correct procedure

2023-01-16 Thread Jeff Jirsa
Prior to CASSANDRA-6696 you'd have to treat one missing disk as a failed 
machine, wipe all the data and re-stream it, as the tombstone for a given value 
may be on one disk and the data on another (effectively resurrecting deleted data).

So the answer has to be version dependent, too - which version were you using? 

> On Jan 16, 2023, at 9:08 AM, Tolbert, Andy  wrote:
> 
> Hi Joe,
> 
> Reading it back I realized I misunderstood that part of your email, so
> you must be using data_file_directories with 16 drives?  That's a lot
> of drives!  I imagine this may happen from time to time given that
> disks like to fail.
> 
> That's a bit of an interesting scenario that I would have to think
> about.  If you brought the node up without the bad drive, repairs are
> probably going to do a ton of repair overstreaming if you aren't using
> 4.0 (https://issues.apache.org/jira/browse/CASSANDRA-3200) which may
> put things into a really bad state (lots of streaming = lots of
> compactions = slower reads) and you may be seeing some inconsistency
> if repairs weren't regularly running beforehand.
> 
> How much data was on the drive that failed?  How much data do you
> usually have per node?
> 
> Thanks,
> Andy
> 
>> On Mon, Jan 16, 2023 at 10:59 AM Joe Obernberger
>>  wrote:
>> 
>> Thank you Andy.
>> Is there a way to just remove the drive from the cluster and replace it
>> later?  Ordering replacement drives isn't a fast process...
>> What I've done so far is:
>> Stop node
>> Remove drive reference from /etc/cassandra/conf/cassandra.yaml
>> Restart node
>> Run repair
>> 
>> Will that work?  Right now, it's showing all nodes as up.
>> 
>> -Joe
>> 
>>> On 1/16/2023 11:55 AM, Tolbert, Andy wrote:
>>> Hi Joe,
>>> 
>>> I'd recommend just doing a replacement, bringing up a new node with
>>> -Dcassandra.replace_address_first_boot=ip.you.are.replacing as
>>> described here:
>>> https://cassandra.apache.org/doc/4.1/cassandra/operating/topo_changes.html#replacing-a-dead-node
>>> 
>>> Before you do that, you will want to make sure a cycle of repairs has
>>> run on the replicas of the down node to ensure they are consistent
>>> with each other.
>>> 
>>> Make sure you also have 'auto_bootstrap: true' in the yaml of the node
>>> you are replacing and that the initial_token matches the node you are
>>> replacing (If you are not using vnodes) so the node doesn't skip
>>> bootstrapping.  This is the default, but felt worth mentioning.
>>> 
>>> You can also remove the dead node, which should stream data to
>>> replicas that will pick up new ranges, but you also will want to do
>>> repairs ahead of time too.  To be honest it's not something I've done
>>> recently, so I'm not as confident on executing that procedure.
>>> 
>>> Thanks,
>>> Andy
>>> 
>>> 
>>> On Mon, Jan 16, 2023 at 9:28 AM Joe Obernberger
>>>  wrote:
 Hi all - what is the correct procedure when handling a failed disk?
 Have a node in a 15 node cluster.  This node has 16 drives and cassandra
 data is split across them.  One drive is failing.  Can I just remove it
 from the list and cassandra will then replicate? If not - what?
 Thank you!
 
 -Joe
 
 
>> 


Re: Failed disks - correct procedure

2023-01-16 Thread Tolbert, Andy
Hi Joe,

Reading it back I realized I misunderstood that part of your email, so
you must be using data_file_directories with 16 drives?  That's a lot
of drives!  I imagine this may happen from time to time given that
disks like to fail.

That's a bit of an interesting scenario that I would have to think
about.  If you brought the node up without the bad drive, repairs are
probably going to do a ton of repair overstreaming if you aren't using
4.0 (https://issues.apache.org/jira/browse/CASSANDRA-3200) which may
put things into a really bad state (lots of streaming = lots of
compactions = slower reads) and you may be seeing some inconsistency
if repairs weren't regularly running beforehand.

How much data was on the drive that failed?  How much data do you
usually have per node?

Thanks,
Andy

On Mon, Jan 16, 2023 at 10:59 AM Joe Obernberger
 wrote:
>
> Thank you Andy.
> Is there a way to just remove the drive from the cluster and replace it
> later?  Ordering replacement drives isn't a fast process...
> What I've done so far is:
> Stop node
> Remove drive reference from /etc/cassandra/conf/cassandra.yaml
> Restart node
> Run repair
>
> Will that work?  Right now, it's showing all nodes as up.
>
> -Joe
>
> On 1/16/2023 11:55 AM, Tolbert, Andy wrote:
> > Hi Joe,
> >
> > I'd recommend just doing a replacement, bringing up a new node with
> > -Dcassandra.replace_address_first_boot=ip.you.are.replacing as
> > described here:
> > https://cassandra.apache.org/doc/4.1/cassandra/operating/topo_changes.html#replacing-a-dead-node
> >
> > Before you do that, you will want to make sure a cycle of repairs has
> > run on the replicas of the down node to ensure they are consistent
> > with each other.
> >
> > Make sure you also have 'auto_bootstrap: true' in the yaml of the node
> > you are replacing and that the initial_token matches the node you are
> > replacing (If you are not using vnodes) so the node doesn't skip
> > bootstrapping.  This is the default, but felt worth mentioning.
> >
> > You can also remove the dead node, which should stream data to
> > replicas that will pick up new ranges, but you also will want to do
> > repairs ahead of time too.  To be honest it's not something I've done
> > recently, so I'm not as confident on executing that procedure.
> >
> > Thanks,
> > Andy
> >
> >
> > On Mon, Jan 16, 2023 at 9:28 AM Joe Obernberger
> >  wrote:
> >> Hi all - what is the correct procedure when handling a failed disk?
> >> Have a node in a 15 node cluster.  This node has 16 drives and cassandra
> >> data is split across them.  One drive is failing.  Can I just remove it
> >> from the list and cassandra will then replicate? If not - what?
> >> Thank you!
> >>
> >> -Joe
> >>
> >>
>


Re: Failed disks - correct procedure

2023-01-16 Thread Joe Obernberger

Thank you Andy.
Is there a way to just remove the drive from the cluster and replace it 
later?  Ordering replacement drives isn't a fast process...

What I've done so far is:
Stop node
Remove drive reference from /etc/cassandra/conf/cassandra.yaml
Restart node
Run repair

Will that work?  Right now, it's showing all nodes as up.
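
For concreteness, step 2 is just dropping the failed mount point from the
data_file_directories list in cassandra.yaml; the paths below are placeholders
rather than the real layout:

# /etc/cassandra/conf/cassandra.yaml (sketch; mount points are hypothetical)
data_file_directories:
    - /data/disk01/cassandra
    - /data/disk02/cassandra
    # - /data/disk03/cassandra    # failed drive, entry removed before restart
    - /data/disk04/cassandra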

-Joe

On 1/16/2023 11:55 AM, Tolbert, Andy wrote:

Hi Joe,

I'd recommend just doing a replacement, bringing up a new node with
-Dcassandra.replace_address_first_boot=ip.you.are.replacing as
described here:
https://cassandra.apache.org/doc/4.1/cassandra/operating/topo_changes.html#replacing-a-dead-node

Before you do that, you will want to make sure a cycle of repairs has
run on the replicas of the down node to ensure they are consistent
with each other.

Make sure you also have 'auto_bootstrap: true' in the yaml of the node
you are replacing and that the initial_token matches the node you are
replacing (If you are not using vnodes) so the node doesn't skip
bootstrapping.  This is the default, but felt worth mentioning.

You can also remove the dead node, which should stream data to
replicas that will pick up new ranges, but you also will want to do
repairs ahead of time too.  To be honest it's not something I've done
recently, so I'm not as confident on executing that procedure.

Thanks,
Andy


On Mon, Jan 16, 2023 at 9:28 AM Joe Obernberger
 wrote:

Hi all - what is the correct procedure when handling a failed disk?
Have a node in a 15 node cluster.  This node has 16 drives and cassandra
data is split across them.  One drive is failing.  Can I just remove it
from the list and cassandra will then replicate? If not - what?
Thank you!

-Joe






Re: Failed disks - correct procedure

2023-01-16 Thread Tolbert, Andy
Hi Joe,

I'd recommend just doing a replacement, bringing up a new node with
-Dcassandra.replace_address_first_boot=ip.you.are.replacing as
described here:
https://cassandra.apache.org/doc/4.1/cassandra/operating/topo_changes.html#replacing-a-dead-node

Before you do that, you will want to make sure a cycle of repairs has
run on the replicas of the down node to ensure they are consistent
with each other.

Make sure you also have 'auto_bootstrap: true' in the yaml of the node
you are replacing and that the initial_token matches the node you are
replacing (If you are not using vnodes) so the node doesn't skip
bootstrapping.  This is the default, but felt worth mentioning.
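
As a rough sketch (not a full runbook; the IP and file locations are
placeholders, adjust for your packaging), the replacement start looks
something like this:

# On the new node: install the same Cassandra version and copy the cluster
# settings (cluster_name, seeds, snitch, num_tokens, etc.) into cassandra.yaml.
# Add the replace flag, e.g. as one line in conf/jvm-server.options:
-Dcassandra.replace_address_first_boot=10.1.2.3
# Then start the service and watch it stream the dead node's ranges:
sudo systemctl start cassandra
nodetool netstats    # streaming progress
nodetool status      # ring should show the new node once replacement completes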

You can also remove the dead node, which should stream data to
replicas that will pick up new ranges, but you also will want to do
repairs ahead of time too.  To be honest it's not something I've done
recently, so I'm not as confident on executing that procedure.
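
If you do go the removal route, it is driven from any live node using the
dead node's host ID (shown as a placeholder here):

nodetool status                              # note the Host ID of the DN node
nodetool removenode <host-id-of-dead-node>   # surviving replicas stream data
nodetool removenode status                   # check progress
# nodetool removenode force                  # only if streaming gets stuck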

Thanks,
Andy


On Mon, Jan 16, 2023 at 9:28 AM Joe Obernberger
 wrote:
>
> Hi all - what is the correct procedure when handling a failed disk?
> Have a node in a 15 node cluster.  This node has 16 drives and cassandra
> data is split across them.  One drive is failing.  Can I just remove it
> from the list and cassandra will then replicate? If not - what?
> Thank you!
>
> -Joe
>
>


Re: Cassandra nightly process

2023-01-16 Thread Gábor Auth
Hi,

On Mon, Jan 16, 2023 at 3:07 PM Loïc CHANEL via user <
user@cassandra.apache.org> wrote:

> So my question here is : am I missing a Cassandra internal process that is
> triggered on a daily basis at 0:00 and 2:00 ?
>

I bet it's not a Cassandra issue. Do you have any other metrics for your
VPSs (CPU, memory, load, I/O stats, disk throughput, network traffic, etc.)?
I suspect some process (on another virtual machine or on the host) is
stealing your resources, so Cassandra can't keep up with requests and the
other instance has to write hints.
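
If nothing is being collected yet, even a few minutes of the standard Linux
tools around midnight would tell a lot (a sketch, assuming the sysstat
package is installed):

vmstat 5         # CPU, run queue, and the 'st' (steal) column
iostat -x 5      # per-disk utilisation and latency
sar -n DEV 5     # per-interface network throughput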

-- 
Bye,
Gábor Auth


Re: Cassandra nightly process

2023-01-16 Thread Patrick McFadin
My general advice for any time you see hints accumulating: consider that
smoke from a more pressing fire happening somewhere else. You correctly
identified the right path to consider, which is some sort of scheduled
activity. Cassandra doesn't have any scheduled internal jobs: compactions
happen as needed, and maintenance jobs such as repair are scheduled by the
operator. The thing to investigate at this point is what other spikes are
happening in the system at the same time. Network? Disk activity? CPU?

Other things to investigate:
Unrelated systems consuming network bandwidth
VPN tunnels resetting nightly (yes, I've seen that one)
You mentioned virtualization, noisy neighbor?
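
A quick, generic way to correlate those at the moment of a spike (default
paths assumed; adjust for your install):

# run on the node that is writing hints, around 0:00 / 2:00
nodetool tpstats | grep -i hint         # hint dispatch activity
ls /var/lib/cassandra/hints | wc -l     # are hint files piling up?
iostat -x 5                             # disk saturation
ping -c 60 <other-node>                 # packet loss / latency to the peer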

Good luck with the sleuthing!

Patrick



On Mon, Jan 16, 2023 at 6:33 AM Yakir Gibraltar  wrote:

> Check if you see packet loss at this time
>
> On Mon, Jan 16, 2023 at 4:08 PM Loïc CHANEL via user <
> user@cassandra.apache.org> wrote:
>
>> Hi team,
>>
>> I am currently running a 2-nodes Cassandra database. Although that's not
>> the best setup, the cluster is doing pretty fine.
>> Still, I noticed that for (at least) 5 days now, one of my two nodes is
>> writing hints during the night, and then it recovers the data-sync with the
>> other node in the morning. What's interesting here is that we can see that
>> the hints writing stuff happens at 0:00 every day, then it starts to
>> recover for a few minutes before 2:00, then the node starts writing hints
>> again at 2:00, then it (sometimes) starts to recover a few minutes before
>> 4:00 and then it start writing hints again at 4:00 (and sometimes same
>> thing happens once more around 6:00).
>> See the following graph that shows the number of hints files in hints
>> directory :
>>
>> [image: image.png]
>>
>> I am not aware of any nightly sync or any such process in Cassandra
>> database, but I checked cron logs on the servers and nothing seems to
>> happen at these times, and there is no particular activity at that time on
>> the virtualization platform (the daily snapshot is taken at 1:30). So my
>> question here is : am I missing a Cassandra internal process that is
>> triggered on a daily basis at 0:00 and 2:00 ?
>> Thanks,
>>
>>
>> Loïc CHANEL
>> System Big Data engineer
>> SoftAtHome (Lyon, France)
>>
>
>
> --
> *Best regards,*
> *Yakir Gibraltar*
>


Failed disks - correct procedure

2023-01-16 Thread Joe Obernberger

Hi all - what is the correct procedure when handling a failed disk?
Have a node in a 15 node cluster.  This node has 16 drives and cassandra 
data is split across them.  One drive is failing.  Can I just remove it 
from the list and cassandra will then replicate? If not - what?

Thank you!

-Joe




Re: Cassandra nightly process

2023-01-16 Thread Yakir Gibraltar
Check if you see packet loss at this time

On Mon, Jan 16, 2023 at 4:08 PM Loïc CHANEL via user <
user@cassandra.apache.org> wrote:

> Hi team,
>
> I am currently running a 2-nodes Cassandra database. Although that's not
> the best setup, the cluster is doing pretty fine.
> Still, I noticed that for (at least) 5 days now, one of my two nodes is
> writing hints during the night, and then it recovers the data-sync with the
> other node in the morning. What's interesting here is that we can see that
> the hints writing stuff happens at 0:00 every day, then it starts to
> recover for a few minutes before 2:00, then the node starts writing hints
> again at 2:00, then it (sometimes) starts to recover a few minutes before
> 4:00 and then it start writing hints again at 4:00 (and sometimes same
> thing happens once more around 6:00).
> See the following graph that shows the number of hints files in hints
> directory :
>
> [image: image.png]
>
> I am not aware of any nightly sync or any such process in Cassandra
> database, but I checked cron logs on the servers and nothing seems to
> happen at these times, and there is no particular activity at that time on
> the virtualization platform (the daily snapshot is taken at 1:30). So my
> question here is : am I missing a Cassandra internal process that is
> triggered on a daily basis at 0:00 and 2:00 ?
> Thanks,
>
>
> Loïc CHANEL
> System Big Data engineer
> SoftAtHome (Lyon, France)
>


-- 
*Best regards,*
*Yakir Gibraltar*


Cassandra nightly process

2023-01-16 Thread Loïc CHANEL via user
Hi team,

I am currently running a two-node Cassandra cluster. Although that's not
the best setup, the cluster is doing pretty fine.
Still, I noticed that for (at least) 5 days now, one of my two nodes has been
writing hints during the night, and then it re-syncs with the other node in
the morning. What's interesting is that the hint writing starts at 0:00 every
day, recovery begins a few minutes before 2:00, the node starts writing hints
again at 2:00, recovery (sometimes) begins a few minutes before 4:00, hint
writing starts again at 4:00, and sometimes the same thing happens once more
around 6:00.
See the following graph, which shows the number of hint files in the hints
directory:

[image: image.png]

I am not aware of any nightly sync or similar internal process in Cassandra,
but I checked the cron logs on the servers and nothing seems to happen at
those times, and there is no particular activity at that time on the
virtualization platform (the daily snapshot is taken at 1:30). So my question
is: am I missing a Cassandra internal process that is triggered on a daily
basis at 0:00 and 2:00?
Thanks,


Loïc CHANEL
System Big Data engineer
SoftAtHome (Lyon, France)


Upgrading Cassandra 3.11.14 → 4.1

2023-01-16 Thread Lapo Luchini

Hi all,
is upgrading Cassandra 3.11.14 → 4.1 supported, or is it better to 
follow the 3.11.14 → 4.0 → 4.1 path?


(I think it is okay, as I found no record of deprecated old SSTable 
formats, but I couldn't find any official documentation regarding 
upgrade paths… forgive me if it exists somewhere.)
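
(Whichever path applies, my understanding of the per-node rolling upgrade is
roughly the sketch below; please correct me if it's wrong:)

# per node, one at a time (package/service names vary by platform)
nodetool drain                  # flush memtables and stop accepting traffic
sudo systemctl stop cassandra
# install the new version and merge cassandra.yaml changes
sudo systemctl start cassandra
nodetool status                 # wait for the node to be UN before the next one
# once every node runs the new version:
nodetool upgradesstables        # rewrite SSTables in the new format, node by node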


--
Lapo Luchini
l...@lapo.it