Re: Remove folders of deleted tables
The same table name with two different CF IDs is not just "temporary schema disagreements"; it's much worse than that. It breaks the eventual consistency guarantee and leads to silent data corruption. It happens silently in the background, and you don't realise it until you suddenly do, and then everything seems to blow up at the same time. You need to sort this out ASAP.

On 05/12/2023 19:57, Sébastien Rebecchi wrote:

Hi Bowen,

Thanks for your answer.

I was thinking of extreme use cases, but as far as I am concerned I can live with creating and deleting 2 tables every 6 hours per keyspace. That leaves around 8 folders of deleted tables per day, sometimes more, because I sometimes see 2 folders created for the same table name with 2 different IDs, caused by temporary schema disagreements I guess. Basically it means 20 years before the KS folder has 65K subfolders, so I would say I have time to think about redesigning the data model ^^

Nevertheless, does that sound like too many tombstones in the system tables (with the default GC grace period of 10 days)?

Sébastien.

On Tue, 5 Dec 2023, 12:19, Bowen Song via user wrote:

Please rethink your use case. Creating and deleting tables concurrently often leads to schema disagreement. Even doing so on a single node sequentially will lead to a large number of tombstones in the system tables.

On 04/12/2023 19:55, Sébastien Rebecchi wrote:

Thank you Dipan.

Do you know if there is a good reason for Cassandra to leave table folders behind even when there is no snapshot?

I'm thinking of use cases where small tables need to be created and deleted at a high rate. You could quickly end up with more than 65K (limit of ext4) subdirectories in the KS directory, while 99.9..% of them are leftovers of deleted tables.

It looks quite dirty for Cassandra not to clean up its own "garbage" by itself, and quite dangerous for the end user to have to do it alone, don't you think?

Thanks,

Sébastien.

On Mon, 4 Dec 2023, 11:28, Dipan Shah wrote:

Hello Sebastien,

There are no inbuilt tools that will automatically remove folders of deleted tables.

Thanks,

Dipan Shah

*From:* Sébastien Rebecchi
*Sent:* 04 December 2023 13:54
*To:* user@cassandra.apache.org
*Subject:* Remove folders of deleted tables

Hello,

When we delete a table with Cassandra, it leaves the folder of that table on the file system, even if there is no snapshot (auto snapshots disabled). So we end up with the empty folder {data folder}/{keyspace name}/{table name-table id} containing only 1 subfolder, backups, which is itself empty.

Is there a way to automatically remove folders of deleted tables?

Sébastien.
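One way to spot the situation described above, where a table name ends up with more than one directory, is to group a keyspace's subdirectories by table name. The sketch below assumes the {table name}-{table id} layout from the original question and further assumes that the ID is 32 lowercase hexadecimal characters; `duplicate_table_dirs` is a hypothetical helper name. It only reports and changes nothing. A table that was deliberately dropped and recreated under the same name would also show up, so a hit is a prompt to investigate, not proof of a schema disagreement.

```shell
# Print table names that own more than one directory under a keyspace directory.
# Assumption: directories are named <table name>-<table id>, with a 32-hex-char id.
duplicate_table_dirs() {
    ks_dir=$1
    for d in "$ks_dir"/*/; do
        [ -d "$d" ] || continue
        name=$(basename "$d")
        # Strip the trailing "-<32 hex chars>" to recover the table name;
        # directories that do not match the pattern are ignored.
        printf '%s\n' "$name" | sed -n -E 's/^(.+)-[0-9a-f]{32}$/\1/p'
    done | sort | uniq -d
}
```

It could be run as `duplicate_table_dirs /path/to/data/my_keyspace`, where the path is a placeholder for the real data directory.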
Re: Remove folders of deleted tables
The last time you mentioned this:

On Tue, Dec 5, 2023 at 11:57 AM Sébastien Rebecchi wrote:

> Hi Bowen,
>
> Thanks for your answer.
>
> I was thinking of extreme use cases, but as far as I am concerned I can
> live with creating and deleting 2 tables every 6 hours per keyspace.
> That leaves around 8 folders of deleted tables per day, sometimes more,
> because I sometimes see 2 folders created for the same table name with 2
> different IDs, caused by temporary schema disagreements I guess.

I told you it's much worse than you're assuming it is: https://lists.apache.org/thread/fzkn3vqjyfjslcv97wcycb6w0wn5ltk2

Here's a more detailed explanation: https://www.mail-archive.com/user@cassandra.apache.org/msg62206.html

(This is fixed and strictly safe in the version of Cassandra with transactional cluster metadata, which was merged to trunk within the past month, so it "will be safe soon".)
Re: Remove folders of deleted tables
Hi Bowen,

Thanks for your answer.

I was thinking of extreme use cases, but as far as I am concerned I can live with creating and deleting 2 tables every 6 hours per keyspace. That leaves around 8 folders of deleted tables per day, sometimes more, because I sometimes see 2 folders created for the same table name with 2 different IDs, caused by temporary schema disagreements I guess. Basically it means 20 years before the KS folder has 65K subfolders, so I would say I have time to think about redesigning the data model ^^

Nevertheless, does that sound like too many tombstones in the system tables (with the default GC grace period of 10 days)?

Sébastien.

On Tue, 5 Dec 2023, 12:19, Bowen Song via user wrote:

> Please rethink your use case. Creating and deleting tables concurrently often
> leads to schema disagreement. Even doing so on a single node sequentially
> will lead to a large number of tombstones in the system tables.
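As a quick check of the estimate in this message, the sketch below divides the quoted 65K subdirectory figure by the quoted 8 leftover folders per day. Both numbers are taken from the message as given; `days_until_limit` is a hypothetical helper name, and integer division is used throughout.

```shell
# Days until a keyspace directory reaches a subdirectory limit,
# given a steady number of leftover table folders per day.
days_until_limit() {
    limit=$1
    folders_per_day=$2
    echo $(( $limit / $folders_per_day ))
}

days=$(days_until_limit 65000 8)
echo "days: $days, years: $(( $days / 365 ))"
# prints "days: 8125, years: 22"
```

If duplicate folders from schema disagreements doubled the daily count, the same calculation with 16 folders per day would give roughly half that horizon.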
Re: Remove folders of deleted tables
I can't think of a reason to keep empty directories around. It seems like a reasonable change, but I don't think you're butting up against a thing that most people would run into, as snapshots are enabled by default (auto_snapshot: true) and almost nobody changes it.

The use case you described isn't handled well by Cassandra for a host of other reasons, and I would *never* do that in a production environment with any released version. The folder thing is the least of the issues you'll run into, so even if you contribute a patch and address it, I still wouldn't do it until transactional cluster metadata gets released and I've had a chance to kick the tires to see what issues you run into besides schema inconsistencies. I suspect the drivers won't love it either.

Assuming you're running into an issue now:

    find . -type d -empty -exec rmdir {} \;

rmdir only removes empty directories, so you'll need to run it twice (once for the backups directory, once for the then-empty table directory). It will remove all empty directories in that folder, so if you've got unused tables, you'd be better off using the find command to get the list, removing the active tables from it, and explicitly running the rmdir command on the directories you want cleaned up.

Jon

On 2023/12/04 19:55:06 Sébastien Rebecchi wrote:

> Thank you Dipan.
>
> Do you know if there is a good reason for Cassandra to leave table folders
> behind even when there is no snapshot?
>
> I'm thinking of use cases where small tables need to be created and deleted
> at a high rate. You could quickly end up with more than 65K (limit of ext4)
> subdirectories in the KS directory, while 99.9..% of them are leftovers of
> deleted tables.
>
> It looks quite dirty for Cassandra not to clean up its own "garbage" by
> itself, and quite dangerous for the end user to have to do it alone, don't
> you think?
>
> Thanks,
>
> Sébastien.
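The list-review-remove approach suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions: `list_empty_table_dirs` and `remove_listed_dirs` are hypothetical helper names, and the review step in between (deleting the lines for tables that still exist, for example by comparing IDs against `system_schema.tables`) is left to the operator. The first function deletes nothing; the second relies on rmdir refusing to remove anything that is not empty.

```shell
# Print table directories under a keyspace directory that contain no files,
# only (possibly nested) empty subdirectories such as backups/. Deletes nothing.
list_empty_table_dirs() {
    ks_dir=$1
    for d in "$ks_dir"/*/; do
        [ -d "$d" ] || continue
        # Candidate if find sees nothing other than directories in the tree.
        if [ -z "$(find "$d" ! -type d -print | head -n 1)" ]; then
            printf '%s\n' "${d%/}"
        fi
    done
}

# Remove the directories named in a reviewed list file, one path per line.
# -depth visits children first, so backups/ goes before its parent in one pass;
# rmdir fails (and leaves the directory) if anything non-empty remains.
remove_listed_dirs() {
    while IFS= read -r dir; do
        [ -d "$dir" ] || continue
        find "$dir" -depth -type d -exec rmdir {} \;
    done < "$1"
}
```

A possible workflow would be `list_empty_table_dirs /path/to/data/my_keyspace > candidates.txt`, then editing `candidates.txt` by hand, then `remove_listed_dirs candidates.txt`; the path and file name are placeholders.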
[RELEASE] Apache Cassandra 5.0-beta1 released
The Cassandra team is pleased to announce the release of Apache Cassandra version 5.0-beta1.

Apache Cassandra is a fully distributed database. It is the right choice when you need scalability and high availability without compromising performance.

Downloads of source and binary distributions are listed in our download section:

http://cassandra.apache.org/download/

This version is a beta release[1] on the 5.0 series. As always, please pay attention to the release notes[2] and let us know[3] if you were to encounter any problem.

Please note what our definition of a beta release means, further info at https://cwiki.apache.org/confluence/display/CASSANDRA/Release+Lifecycle

For more information on what's in 5.0: https://cassandra.apache.org/_/Apache-Cassandra-5.0-Moving-Toward-an-AI-Driven-Future.html

Enjoy!

[1]: CHANGES.txt https://github.com/apache/cassandra/blob/cassandra-5.0-beta1/CHANGES.txt
[2]: NEWS.txt https://github.com/apache/cassandra/blob/cassandra-5.0-beta1/NEWS.txt
[3]: https://issues.apache.org/jira/browse/CASSANDRA
Re: Remove folders of deleted tables
Please rethink your use case. Creating and deleting tables concurrently often leads to schema disagreement. Even doing so on a single node sequentially will lead to a large number of tombstones in the system tables.

On 04/12/2023 19:55, Sébastien Rebecchi wrote:

> Thank you Dipan.
>
> Do you know if there is a good reason for Cassandra to leave table folders
> behind even when there is no snapshot?
>
> I'm thinking of use cases where small tables need to be created and deleted
> at a high rate. You could quickly end up with more than 65K (limit of ext4)
> subdirectories in the KS directory, while 99.9..% of them are leftovers of
> deleted tables.
>
> It looks quite dirty for Cassandra not to clean up its own "garbage" by
> itself, and quite dangerous for the end user to have to do it alone, don't
> you think?
>
> Thanks,
>
> Sébastien.
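A rough way to check for the schema disagreement mentioned above is to count the schema versions reported by `nodetool describecluster`. The sketch below assumes that the "Schema versions" section of the output lists each version as `<uuid>: [endpoints]`, which may differ between Cassandra versions; `count_schema_versions` is a hypothetical helper name that reads the command output from standard input.

```shell
# Count distinct schema versions in `nodetool describecluster` output (stdin).
# Assumption: each version appears on its own line as "<uuid>: [ip, ip, ...]".
# A healthy cluster is expected to report exactly one version.
count_schema_versions() {
    grep -cE '^[[:space:]]*[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}: \['
}
```

It could be used as `nodetool describecluster | count_schema_versions`; a result greater than 1 would suggest that nodes currently disagree on the schema.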