Re: Remove folders of deleted tables

2023-12-05 Thread Bowen Song via user
The same table name with two different CF IDs is not just "temporary 
schema disagreements", it's much worse than that. This breaks the 
eventual consistency guarantee, and leads to silent data corruption. 
It's silently happening in the background, and you don't realise it 
until you suddenly do, and then everything seems to blow up at the same 
time. You need to sort this out ASAP.
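
For anyone wanting to check whether this has happened on a node, here is a minimal sketch that lists table names with more than one directory under a keyspace directory. It assumes the {table name}-{table id} directory layout mentioned in this thread; the function name and the example path are hypothetical. A table that was dropped and re-created under the same name also gets a new ID, so a hit is only a candidate to investigate, not proof of a schema race.

```shell
#!/bin/sh
# Sketch: report table names that own more than one directory under a
# keyspace directory. Assumes directories are named {table name}-{table id}
# and that the id itself contains no hyphen. Function name is hypothetical.
find_duplicate_table_dirs() {
    keyspace_dir=$1
    for d in "$keyspace_dir"/*/; do
        [ -d "$d" ] || continue          # glob matched nothing
        d=${d%/}                         # drop trailing slash
        name=${d##*/}                    # keep only the directory name
        # Strip the trailing "-<id>" to recover the table name.
        printf '%s\n' "${name%-*}"
    done | sort | uniq -d                # print names seen more than once
}
```

Example call (the path is only an illustration): `find_duplicate_table_dirs /var/lib/cassandra/data/my_keyspace`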



On 05/12/2023 19:57, Sébastien Rebecchi wrote:

Hi Bowen,

Thanks for your answer.

I was thinking of extreme use cases, but as far as my own case is
concerned, I can get by with creating and deleting 2 tables every 6
hours per keyspace. That leaves around 8 folders of deleted tables per
day, sometimes more, because I occasionally see 2 folders created for
the same table name with 2 different IDs, caused by temporary schema
disagreements I guess.
That means roughly 20 years before the KS folder has 65K subfolders,
so I would say I have time to think about redesigning the data model ^^
Nevertheless, does that sound like too many tombstones in the system
tables (with the default GC grace period of 10 days)?


Sébastien.

On Tue, 5 Dec 2023, 12:19, Bowen Song via user wrote:


Please rethink your use case. Creating and deleting tables
concurrently often leads to schema disagreement. Even doing so
sequentially on a single node will produce a large number of
tombstones in the system tables.

On 04/12/2023 19:55, Sébastien Rebecchi wrote:

Thank you Dipan.

Do you know if there is a good reason for Cassandra to leave the table
folders in place even when there is no snapshot?

I'm thinking of use cases where small tables need to be created and
deleted at a high rate. You could quickly end up with more than 65K
(the ext4 limit) subdirectories in the KS directory, while 99.9...% of
them are leftovers from deleted tables.

It seems quite sloppy of Cassandra not to clean up its own "garbage",
and quite dangerous for the end user to have to do it alone, don't you
think?

Thanks,

Sébastien.

On Mon, 4 Dec 2023, 11:28, Dipan Shah wrote:

Hello Sebastien,

There are no inbuilt tools that will automatically remove
folders of deleted tables.

Thanks,

Dipan Shah


*From:* Sébastien Rebecchi 
*Sent:* 04 December 2023 13:54
*To:* user@cassandra.apache.org 
*Subject:* Remove folders of deleted tables
Hello,

When we delete a table with Cassandra, it leaves the folder of
that table on the file system, even if there is no snapshot (auto
snapshots disabled).
So we end up with the otherwise empty folder
{data folder}/{keyspace name}/{table name-table id}
containing only 1 subfolder, backups, which is itself empty.
Is there a way to automatically remove folders of deleted tables?

Sébastien.


Re: Remove folders of deleted tables

2023-12-05 Thread Jeff Jirsa
The last time you mentioned this:

On Tue, Dec 5, 2023 at 11:57 AM Sébastien Rebecchi 
wrote:

> Hi Bowen,
>
> Thanks for your answer.
>
> I was thinking of extreme use cases, but as far as my own case is
> concerned, I can get by with creating and deleting 2 tables every 6 hours
> per keyspace. That leaves around 8 folders of deleted tables per day,
> sometimes more, because I occasionally see 2 folders created for the same
> table name with 2 different IDs, caused by temporary schema disagreements
> I guess.
>

I told you it's much worse than you're assuming it is:
https://lists.apache.org/thread/fzkn3vqjyfjslcv97wcycb6w0wn5ltk2

Here's a more detailed explanation:
https://www.mail-archive.com/user@cassandra.apache.org/msg62206.html

(This is fixed and strictly safe in the version of Cassandra with
transactional cluster metadata, which was merged to trunk within the past
month, so it "will be safe soon".)


Re: Remove folders of deleted tables

2023-12-05 Thread Sébastien Rebecchi
Hi Bowen,

Thanks for your answer.

I was thinking of extreme use cases, but as far as my own case is
concerned, I can get by with creating and deleting 2 tables every 6 hours
per keyspace. That leaves around 8 folders of deleted tables per day,
sometimes more, because I occasionally see 2 folders created for the same
table name with 2 different IDs, caused by temporary schema disagreements
I guess.
That means roughly 20 years before the KS folder has 65K subfolders, so I
would say I have time to think about redesigning the data model ^^
Nevertheless, does that sound like too many tombstones in the system
tables (with the default GC grace period of 10 days)?
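
As a rough check of that estimate, here is a small sketch of the arithmetic. The figures (2 tables every 6 hours, a 65K subdirectory limit) come from the message above; the variable names are made up for illustration, and the calculation assumes no extra folders from duplicate IDs.

```shell
#!/bin/sh
# Back-of-the-envelope check of the "about 20 years" estimate.
# Assumption: exactly one leftover folder per dropped table, with no
# duplicates caused by schema disagreements.
tables_per_cycle=2
cycles_per_day=$((24 / 6))
folders_per_day=$((tables_per_cycle * cycles_per_day))
dir_limit=65000
days_to_limit=$((dir_limit / folders_per_day))
years_to_limit=$((days_to_limit / 365))   # integer division, rounds down
echo "folders per day: $folders_per_day"
echo "days to limit:   $days_to_limit"
echo "years to limit:  $years_to_limit"
```

Any duplicate folders would shorten that horizon proportionally.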

Sébastien.



Re: Remove folders of deleted tables

2023-12-05 Thread Jon Haddad
I can't think of a reason to keep empty directories around, so this seems like 
a reasonable change. I don't think most people would run into it, though, as 
snapshots are enabled by default (auto_snapshot: true) and almost nobody 
changes that.  

The use case you described isn't handled well by Cassandra for a host of other 
reasons, and I would *never* do that in a production environment with any 
released version.  The folder thing is the least of the issues you'll run into, 
so even if you contribute a patch and address it, I still wouldn't do it 
until transactional cluster metadata gets released and I've had a chance to 
kick the tires to see what issues you run into besides schema inconsistencies.  
I suspect the drivers won't love it either.

Assuming you're running into an issue now:

find . -type d -empty -exec rmdir {} \;

rmdir only removes empty directories, so you'll need to run it twice (once for 
the backups subfolder, once for the then-empty table folder).  It will remove 
all empty directories in that folder, so if you've got unused tables you'd be 
better off using the find command to get the list, removing the active tables 
from it, and explicitly running rmdir on the directories you want cleaned up.
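
The list-then-remove approach could be sketched roughly as follows. This is an illustration under stated assumptions, not a tested tool: the function names and the keep-list file (one directory name per line, for tables that are still active) are hypothetical, and "contains no files at all" is used as the test for a leftover directory. Adding -depth to find makes it visit backups before its parent, so a single pass is enough.

```shell
#!/bin/sh
# Sketch: list table directories under a keyspace directory that hold no
# files (only empty subdirectories such as "backups"), skipping any
# directory named in a keep list. Names and file format are assumptions.
list_empty_table_dirs() {
    keyspace_dir=$1
    keep_file=$2
    for d in "$keyspace_dir"/*/; do
        [ -d "$d" ] || continue          # glob matched nothing
        d=${d%/}
        name=${d##*/}
        # Skip directories explicitly listed as active tables.
        if [ -f "$keep_file" ] && grep -qxF -- "$name" "$keep_file"; then
            continue
        fi
        # Report the directory only if nothing but directories lives under it.
        if [ -z "$(find "$d" ! -type d | head -n 1)" ]; then
            printf '%s\n' "$d"
        fi
    done
}

remove_empty_tree() {
    # -depth handles children before parents, so "backups" is removed
    # first and the table directory is empty by the time it is tested.
    find "$1" -depth -type d -empty -exec rmdir {} \;
}
```

Review the output of list_empty_table_dirs before passing any entry to remove_empty_tree, since a table that exists but has never been written to looks the same on disk as a dropped one.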

Jon



[RELEASE] Apache Cassandra 5.0-beta1 released

2023-12-05 Thread Mick Semb Wever
The Cassandra team is pleased to announce the release of Apache Cassandra
version 5.0-beta1.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a beta release[1] in the 5.0 series. As always, please pay
attention to the release notes[2] and let us know[3] if you encounter any
problems.

Please note our definition of a beta release; further information is at
https://cwiki.apache.org/confluence/display/CASSANDRA/Release+Lifecycle

For more information on what's in 5.0:
https://cassandra.apache.org/_/Apache-Cassandra-5.0-Moving-Toward-an-AI-Driven-Future.html


Enjoy!

[1]: CHANGES.txt
https://github.com/apache/cassandra/blob/cassandra-5.0-beta1/CHANGES.txt
[2]: NEWS.txt
https://github.com/apache/cassandra/blob/cassandra-5.0-beta1/NEWS.txt
[3]: https://issues.apache.org/jira/browse/CASSANDRA


Re: Remove folders of deleted tables

2023-12-05 Thread Bowen Song via user
Please rethink your use case. Creating and deleting tables concurrently 
often leads to schema disagreement. Even doing so sequentially on a single 
node will produce a large number of tombstones in the system tables.

