Re: Remove folders of deleted tables
The same table name with two different CF IDs is not just "temporary schema disagreements"; it's much worse than that. This breaks the eventual consistency guarantee and leads to silent data corruption. It happens silently in the background, and you don't realise it until you suddenly do, and then everything seems to blow up at the same time. You need to sort this out ASAP.

On 05/12/2023 19:57, Sébastien Rebecchi wrote:
> Hi Bowen,
> Thanks for your answer.
> I was thinking of extreme use cases, but as far as I am concerned I can deal with creating and deleting 2 tables every 6 hours for a keyspace. That leaves around 8 folders of deleted tables per day, sometimes more, because I sometimes see 2 folders created for the same table name with 2 different IDs, caused by temporary schema disagreements I guess. Basically it means 20 years before the KS folder has 65K subfolders, so I would say I have time to think about redesigning the data model ^^
> Nevertheless, does that sound like too many tombstones in the system tables (with the default GC grace period of 10 days)?
> Sébastien.
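Sébastien's observation of two folders for the same table name is the symptom Bowen is warning about. As a rough first check, the sketch below lists table names that own more than one directory under a keyspace data directory. It assumes directories are named `<table_name>-<table id>` with the ID written as 32 lowercase hex characters (the `{table name-table id}` layout from the original question); the function name is hypothetical. A hit is only a hint, since a table that was dropped and re-created also leaves an old directory with a different ID behind.

```bash
# Print each table name that owns more than one <table_name>-<id> directory
# under the given keyspace data directory (one name per line).
find_duplicate_table_dirs() {
  local ks_dir="$1" dir name
  for dir in "$ks_dir"/*/; do
    name="$(basename "$dir")"
    # Assumption: table directory = <table_name>-<32 lowercase hex chars>.
    [[ "$name" =~ ^([A-Za-z0-9_]+)-[0-9a-f]{32}$ ]] || continue
    printf '%s\n' "${BASH_REMATCH[1]}"
  done | sort | uniq -d
}

# Example (hypothetical keyspace path):
# find_duplicate_table_dirs /var/lib/cassandra/data/my_keyspace
```

Because the system_schema keyspace is local to each node, a more authoritative check is to run `SELECT table_name, id FROM system_schema.tables WHERE keyspace_name = 'my_keyspace';` through cqlsh against each node in turn and compare the `id` values.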
Re: Remove folders of deleted tables
There are many different ways to avoid or minimise the chance of schema disagreements. The easiest is to always send DDL queries to the same node in the cluster. This is very easy to implement and avoids schema disagreements, at the cost of creating a single point of failure for DDL queries. More sophisticated methods also exist, such as locking and centralised schema modification, and you should consider which one is more suitable for your use case. Ignoring the schema disagreement problem is not recommended: this is not a tested state for the cluster, and you are likely to run into some known and unknown (and possibly severe) issues later.

The system_schema.columns table will almost certainly have more tombstones created than the number of tables deleted, unless each deleted table had only one column. I doubt creating and deleting 8 tables per day will be a problem, but I would recommend you find a way to test it before doing that on a production system, because I don't know of anyone else using Cassandra in this way. On the surface, it does sound like TWCS with the date in the partition key may fit your use case better than creating and deleting tables every day.

On 06/12/2023 08:26, Sébastien Rebecchi wrote:
> Hello Jeff, Bowen,
> Thanks for your answers.
> Now I understand that there is a bug in Cassandra: it cannot handle concurrent schema modifications. I was not aware of how severe it is; I thought temporary schema mismatches were eventually resolved smartly, by a kind of "merge" mechanism.
> In my use case, keyspaces and tables are created "on demand": when an insert fails with an invalid keyspace or table exception, the keyspace and table are created and the insert is retried. I cannot afford to centralize schema modifications in a bottleneck, but I can afford the data inconsistencies while waiting for the fix in Cassandra.
> I'm more worried about tombstones in the system tables. I assume that 8 tombstones per day (or even more, but no more than a few dozen) is reasonable; can you confirm (or refute) that, please?
> Sébastien.
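Bowen's closing suggestion can be sketched as a single long-lived table that buckets rows by day in the partition key and uses TimeWindowCompactionStrategy, letting old data age out through a TTL instead of through table drops. The helper below only prints the CQL; the keyspace, table and column names, the one-day compaction window and the two-day TTL are all hypothetical placeholders, not values from the thread.

```bash
# Print a CQL statement for one long-lived, day-bucketed TWCS table.
# All names and numbers are placeholders (assumptions), not values from the thread.
twcs_table_cql() {
  local keyspace="$1" table="$2"
  cat <<EOF
CREATE TABLE IF NOT EXISTS ${keyspace}.${table} (
    entity_id  text,
    bucket_day date,       -- day bucket, part of the partition key
    ts         timestamp,
    payload    blob,
    PRIMARY KEY ((entity_id, bucket_day), ts)
) WITH compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'DAYS',
    'compaction_window_size': 1
} AND default_time_to_live = 172800;  -- 2 days, assumed retention
EOF
}

# Example: create it once, through a single node (hypothetical host name):
# twcs_table_cql my_keyspace events | cqlsh ddl-node.example
```

The table is created once, so the schema tables see one write instead of a create/drop pair every few hours.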
Re: Remove folders of deleted tables
Hello Jeff, Bowen,
Thanks for your answers.
Now I understand that there is a bug in Cassandra: it cannot handle concurrent schema modifications. I was not aware of how severe it is; I thought temporary schema mismatches were eventually resolved smartly, by a kind of "merge" mechanism.
In my use case, keyspaces and tables are created "on demand": when an insert fails with an invalid keyspace or table exception, the keyspace and table are created and the insert is retried. I cannot afford to centralize schema modifications in a bottleneck, but I can afford the data inconsistencies while waiting for the fix in Cassandra.
I'm more worried about tombstones in the system tables. I assume that 8 tombstones per day (or even more, but no more than a few dozen) is reasonable; can you confirm (or refute) that, please?
Sébastien.

On Wed, 6 Dec 2023 at 03:00, Bowen Song via user wrote:
> The same table name with two different CF IDs is not just "temporary schema disagreements"; it's much worse than that. This breaks the eventual consistency guarantee and leads to silent data corruption. It happens silently in the background, and you don't realise it until you suddenly do, and then everything seems to blow up at the same time. You need to sort this out ASAP.
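Sébastien's "8 tombstones per day" counts one tombstone per dropped table, while Bowen's reply points out that system_schema.columns receives more than that unless a table has a single column. A back-of-the-envelope sketch, assuming one row tombstone in system_schema.tables plus one per column in system_schema.columns for each dropped table, and that each tombstone lives at least for the grace period (it also needs a later compaction before it is purged). The column count is an assumed input, and the function name is hypothetical.

```bash
# Rough number of schema tombstones alive at steady state.
# Assumed model: per dropped table, 1 row tombstone in system_schema.tables
# plus 1 per column in system_schema.columns; each lives >= gc_grace days.
estimate_schema_tombstones() {
  local drops_per_day="$1" columns_per_table="$2" gc_grace_days="$3"
  local per_drop=$(( 1 + columns_per_table ))
  echo $(( drops_per_day * per_drop * gc_grace_days ))
}

# 8 drops/day, 10 columns per table (assumed), 10-day grace period:
# estimate_schema_tombstones 8 10 10   # -> 880
```

The gc_grace_seconds actually applied to the schema tables may differ from the 10-day default for user tables; it can be read with `SELECT gc_grace_seconds FROM system_schema.tables WHERE keyspace_name = 'system_schema' AND table_name = 'columns';`.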
Re: Remove folders of deleted tables
The last time you mentioned this:

On Tue, Dec 5, 2023 at 11:57 AM Sébastien Rebecchi wrote:
> Hi Bowen,
> Thanks for your answer.
> I was thinking of extreme use cases, but as far as I am concerned I can deal with creating and deleting 2 tables every 6 hours for a keyspace. That leaves around 8 folders of deleted tables per day, sometimes more, because I sometimes see 2 folders created for the same table name with 2 different IDs, caused by temporary schema disagreements I guess.

I told you it's much worse than you're assuming it is: https://lists.apache.org/thread/fzkn3vqjyfjslcv97wcycb6w0wn5ltk2

Here's a more detailed explanation: https://www.mail-archive.com/user@cassandra.apache.org/msg62206.html

(This is fixed, and strictly safe, in the version of Cassandra with transactional cluster metadata, which was merged to trunk within the past month, so it "will be safe soon".)
Re: Remove folders of deleted tables
Hi Bowen,
Thanks for your answer.
I was thinking of extreme use cases, but as far as I am concerned I can deal with creating and deleting 2 tables every 6 hours for a keyspace. That leaves around 8 folders of deleted tables per day, sometimes more, because I sometimes see 2 folders created for the same table name with 2 different IDs, caused by temporary schema disagreements I guess. Basically it means 20 years before the KS folder has 65K subfolders, so I would say I have time to think about redesigning the data model ^^
Nevertheless, does that sound like too many tombstones in the system tables (with the default GC grace period of 10 days)?
Sébastien.

On Tue, 5 Dec 2023, 12:19, Bowen Song via user wrote:
> Please rethink your use case. Creating and deleting tables concurrently often leads to schema disagreement. Even doing so sequentially on a single node will lead to a large number of tombstones in the system tables.
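Sébastien's 20-year figure is easy to check. A small sketch, taking 65,000 as the directory limit quoted in the thread at face value and assuming a constant number of new table directories per day; the function name is hypothetical.

```bash
# Days (and whole years) until a keyspace directory reaches `limit`
# subdirectories when `dirs_per_day` new table directories appear per day.
dir_limit_horizon() {
  local limit="$1" dirs_per_day="$2"
  if (( dirs_per_day <= 0 )); then
    echo "dirs_per_day must be positive" >&2
    return 1
  fi
  local days=$(( limit / dirs_per_day ))
  echo "$days days (~$(( days / 365 )) years)"
}

# dir_limit_horizon 65000 8    # -> 8125 days (~22 years)
# dir_limit_horizon 65000 16   # -> 4062 days (~11 years), if every table got a duplicate ID
```

So the estimate roughly holds, but every extra directory left by a schema disagreement shortens it.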
Re: Remove folders of deleted tables
I can't think of a reason to keep empty directories around, so it seems like a reasonable change. But I don't think you're butting up against something most people would run into, as snapshots are enabled by default (auto_snapshot: true) and almost nobody changes that.

The use case you described isn't handled well by Cassandra for a host of other reasons, and I would *never* do that in a production environment with any released version. The folder thing is the least of the issues you'll run into, so even if you contribute a patch and address it, I still wouldn't do it until transactional cluster metadata gets released and I've had a chance to kick the tires and see what issues you run into besides schema inconsistencies. I suspect the drivers won't love it either.

Assuming you're running into an issue now:

find . -type d -empty -exec rmdir {} \;

rmdir only removes empty directories, so you'll need to run it twice (once for backups, once for the now-empty table directory). It will remove all empty directories in that folder, so if you've got unused tables, you'd be better off using the find command to get the list, removing the active tables from it, and explicitly running rmdir on the directories you want cleaned up.

Jon

On 2023/12/04 19:55:06 Sébastien Rebecchi wrote:
> Thank you Dipan.
> Do you know if there is a good reason for Cassandra to leave table folders behind even when there is no snapshot?
> I'm thinking of use cases where small tables need to be created and deleted at a high rate. You could quickly end up with more than 65K (the ext4 limit) subdirectories in the KS directory, while 99.9...% of them are leftovers of deleted tables.
> It looks quite dirty for Cassandra not to clean up its own "garbage" by itself, and quite dangerous for the end user to have to do it alone, don't you think?
> Thanks,
> Sébastien.
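Jon's safer variant (list the empty directories, drop the live tables from the list, then remove only what is left) could look like the sketch below. Assumptions: the keep-list is a file you build yourself with one live directory name (`<table_name>-<table id>`) per line, for instance from `SELECT table_name, id FROM system_schema.tables` with the dashes stripped from the ID; the function name and the `DRY_RUN` switch are hypothetical. Adding `-depth` to Jon's `find` makes it visit `backups/` before its parent, so a single pass removes both levels.

```bash
# Remove table directories under one keyspace directory that are not in the
# keep-list and contain no files at all (only empty dirs such as backups/).
# DRY_RUN defaults to 1: only print what would be removed. Set DRY_RUN=0 to remove.
remove_stale_table_dirs() {
  local ks_dir="$1" keep_list="$2" dir name
  if [[ ! -r "$keep_list" ]]; then
    # Without a keep-list, every empty live table would look removable.
    echo "keep-list not readable: $keep_list" >&2
    return 1
  fi
  for dir in "$ks_dir"/*/; do
    [[ -d "$dir" ]] || continue
    dir="${dir%/}"
    name="$(basename "$dir")"
    # Skip directories that belong to live tables (exact whole-line match).
    grep -qxF -- "$name" "$keep_list" && continue
    # Skip anything that still holds a file (or any other non-directory).
    [[ -z "$(find "$dir" ! -type d -print -quit)" ]] || continue
    if [[ "${DRY_RUN:-1}" == 1 ]]; then
      echo "would remove: $dir"
    else
      # -depth handles children (backups/) before the table directory itself.
      find "$dir" -depth -type d -empty -exec rmdir {} \;
    fi
  done
}

# Example (hypothetical paths): review first, then delete.
# remove_stale_table_dirs /var/lib/cassandra/data/my_keyspace keep.txt
# DRY_RUN=0 remove_stale_table_dirs /var/lib/cassandra/data/my_keyspace keep.txt
```

Defaulting to a dry run and refusing to run without a readable keep-list are deliberate choices: a live table that has no data yet also has an empty directory, which is exactly the case Jon warns about.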
Re: Remove folders of deleted tables
Please rethink your use case. Creating and deleting tables concurrently often leads to schema disagreement. Even doing so sequentially on a single node will lead to a large number of tombstones in the system tables.

On 04/12/2023 19:55, Sébastien Rebecchi wrote:
> Thank you Dipan.
> Do you know if there is a good reason for Cassandra to leave table folders behind even when there is no snapshot?
> I'm thinking of use cases where small tables need to be created and deleted at a high rate. You could quickly end up with more than 65K (the ext4 limit) subdirectories in the KS directory, while 99.9...% of them are leftovers of deleted tables.
> It looks quite dirty for Cassandra not to clean up its own "garbage" by itself, and quite dangerous for the end user to have to do it alone, don't you think?
> Thanks,
> Sébastien.
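Bowen's warning about concurrent DDL, together with his later advice in this thread to send all DDL to the same node, can be combined into a small wrapper that issues each statement through one designated node and waits until `nodetool describecluster` reports a single schema version before and after. This is a sketch, not a lock. It assumes the `Schema versions:` section prints one `<uuid>: [addresses]` line per version (plus an `UNREACHABLE: [...]` line when nodes are down); the function names, host name, 60-second timeout and 2-second poll interval are placeholders.

```bash
# Count schema-version lines (plus an UNREACHABLE group, if any) reported by
# `nodetool describecluster`; 1 means the cluster agrees on the schema.
# Assumption: each version is printed as an indented "<uuid>: [addresses]" line.
schema_version_count() {
  local host="$1"
  nodetool -h "$host" describecluster |
    grep -cE '^[[:space:]]*([0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}|UNREACHABLE): \['
}

# Poll until a single schema version is reported, or give up after timeout_s.
wait_for_schema_agreement() {
  local host="$1" timeout_s="${2:-60}" waited=0
  while (( waited < timeout_s )); do
    [[ "$(schema_version_count "$host")" == 1 ]] && return 0
    sleep 2
    waited=$(( waited + 2 ))
  done
  echo "no schema agreement after ${timeout_s}s" >&2
  return 1
}

# Send one DDL statement to the single designated node, then wait for the
# rest of the cluster to converge before the next DDL is issued.
run_ddl_on_fixed_node() {
  local host="$1" statement="$2"
  wait_for_schema_agreement "$host" || return 1
  cqlsh "$host" -e "$statement" || return 1
  wait_for_schema_agreement "$host"
}

# Example (hypothetical host and table):
# run_ddl_on_fixed_node ddl-node.example "DROP TABLE my_keyspace.events_old;"
```

Inside an application, the drivers' own schema-agreement wait (for example the Python driver's `max_schema_agreement_wait` setting) is the more natural place for the waiting part; the fixed-node routing still has to be arranged separately.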
Re: Remove folders of deleted tables
Thank you Dipan.
Do you know if there is a good reason for Cassandra to leave table folders behind even when there is no snapshot?
I'm thinking of use cases where small tables need to be created and deleted at a high rate. You could quickly end up with more than 65K (the ext4 limit) subdirectories in the KS directory, while 99.9...% of them are leftovers of deleted tables.
It looks quite dirty for Cassandra not to clean up its own "garbage" by itself, and quite dangerous for the end user to have to do it alone, don't you think?
Thanks,
Sébastien.

On Mon, 4 Dec 2023, 11:28, Dipan Shah wrote:
> Hello Sebastien,
> There are no inbuilt tools that will automatically remove folders of deleted tables.
> Thanks,
> Dipan Shah
Re: Remove folders of deleted tables
Hello Sebastien,
There are no inbuilt tools that will automatically remove folders of deleted tables.
Thanks,
Dipan Shah

From: Sébastien Rebecchi
Sent: 04 December 2023 13:54
To: user@cassandra.apache.org
Subject: Remove folders of deleted tables

Hello,
When we delete a table with Cassandra, it leaves the table's folder on the file system, even if there is no snapshot (auto snapshots disabled). So we end up with the empty folder {data folder}/{keyspace name}/{table name-table id}, containing only one subfolder, backups, which is itself empty.
Is there a way to automatically remove folders of deleted tables?
Sébastien.
Remove folders of deleted tables
Hello,
When we delete a table with Cassandra, it leaves the table's folder on the file system, even if there is no snapshot (auto snapshots disabled). So we end up with the empty folder {data folder}/{keyspace name}/{table name-table id}, containing only one subfolder, backups, which is itself empty.
Is there a way to automatically remove folders of deleted tables?
Sébastien.
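The residue described here (a `{table name-table id}` directory whose only content is an empty `backups` subfolder) can be listed without deleting anything. A minimal sketch; the function name is hypothetical. A live table that has never been flushed may look exactly the same, so the output should be checked against system_schema.tables before anything is removed.

```bash
# Print table directories (under one keyspace data directory) whose only
# entry is an empty backups/ subdirectory. Nothing is deleted.
list_residual_table_dirs() {
  local ks_dir="$1" dir
  for dir in "$ks_dir"/*/; do
    dir="${dir%/}"
    [[ -d "$dir/backups" ]] || continue
    # Exactly one entry in the table directory, and it is backups/ ...
    [[ "$(ls -A "$dir")" == "backups" ]] || continue
    # ... and backups/ itself is empty.
    [[ -z "$(ls -A "$dir/backups")" ]] || continue
    printf '%s\n' "$dir"
  done
}

# Example (hypothetical path):
# list_residual_table_dirs /var/lib/cassandra/data/my_keyspace
```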