Thank you guys for the answers - I expected this but wanted to verify
(who knows how smart Cassandra can be in the background! :-) )
@Jeff: unfortunately the records we will pick up for deletion are not
necessarily "neighbours" in terms of creation time, so forming
contiguous ranges is not possible...
Just one more question left in this case...
As this approach will generate lots of row tombstones over this "wide"
table, what would be your recommended table setup here (in terms of
gc_grace_seconds, compaction, compression, etc.)? Currently we have the
default setup for everything, which I believe should be fine-tuned a bit.
FYI: this table gets ~500k new UUID-keyed rows every day in each partition...
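For reference, these are the knobs in question as they appear in a table definition - the values below are placeholders only (not recommendations), and the table name is illustrative:

```cql
-- Placeholder values, not tuning advice: this just shows where the
-- settings mentioned above live in the table options.
ALTER TABLE ks.events
 WITH gc_grace_seconds = 86400          -- default is 864000 (10 days)
  AND compaction = {'class': 'SizeTieredCompactionStrategy'}
  AND compression = {'class': 'LZ4Compressor'};
```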
thanks a lot!
Attila Wind
http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932
On 04.09.2020 at 16:33, Jeff Jirsa wrote:
As someone else pointed out, it's the same number of tombstones. Doing
distinct queries gives you a bit more flexibility to retry if one
fails, but multiple in one command avoids some contention on the
memtable partition objects.
If you happen to be using type 1 UUIDs (timeuuid) AND you're
deleting contiguous ranges, you could do a DELETE ... WHERE uuid >= ?
AND uuid <= ?
This would trade lots of row tombstones for a single range tombstone,
but may not match your model.
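A sketch of that range delete, assuming a hypothetical table with `customer_id` as the partition key and a timeuuid clustering column `event_id` (all names here are illustrative, not from the original thread):

```cql
-- Illustrative schema: clustering by timeuuid is what makes a
-- contiguous range delete possible within a partition.
CREATE TABLE IF NOT EXISTS ks.events (
    customer_id text,
    event_id    timeuuid,
    payload     text,
    PRIMARY KEY (customer_id, event_id)
);

-- One range tombstone instead of thousands of row tombstones;
-- minTimeuuid/maxTimeuuid derive boundary values from timestamps.
DELETE FROM ks.events
 WHERE customer_id = 'acme'
   AND event_id >= minTimeuuid('2020-09-01 00:00+0000')
   AND event_id <= maxTimeuuid('2020-09-02 00:00+0000');
```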
On Sep 3, 2020, at 11:57 PM, Attila Wind <attilaw@swf.technology> wrote:
Hi C* gurus,
I'm looking for the best strategy to delete records from a "wide" table.
"wide" means the table stores records which have a UUID-style id as an
element of the key - within each partition.
So yes, it's not the partitioning key... The partitioning key is
actually kind of a customerId at the moment, and actually I'm not even
sure this is the right model for this table... Given that the number
of customerIds <<< the number of UUIDs, probably not.
But let's set that aside for a moment and come back to my main
question!
So the question:
when I delete records from this table, given that I can and will
delete in a "batch fashion" (imagine a scheduled job which collects
- let's say - 1000 records every time it does deletes)...
Would there be a difference (in terms of generated tombstones) if I
were to
a) issue deletes one-by-one, like
DELETE FROM ... WHERE ... uuid = 'a'
DELETE FROM ... WHERE ... uuid = 'b'
...
DELETE FROM ... WHERE ... uuid = 'z'
or
b) issue deletes in a grouped fashion, like
DELETE FROM ... WHERE ... uuid in ('a', 'b', ... 'z')
?
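For illustration, variant (b) spelled out as a prepared statement against a hypothetical schema (partition key `customer_id`, clustering column `uuid`; names are assumed, not from the original thread):

```cql
-- The full id list for one batch binds to a single IN marker;
-- the partition key must still be restricted in the WHERE clause.
DELETE FROM ks.events WHERE customer_id = ? AND uuid IN ?;
```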
or is there any other way to delete efficiently that I am missing here?
thanks!
--
Attila Wind
http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932