Thank you guys for the answers - I expected this but wanted to verify
(who knows how smart Cassandra can be in the background! :-) )
@Jeff: unfortunately the records we will pick up for deletion are not
necessarily "neighbours" in terms of creation time, so forming
contiguous ranges is not possible...
Just one more question left in this case...
As this approach will generate lots of row tombstones over this "wide"
table, what would be your recommended table setup here (in terms of
gc_grace_seconds, compaction, compression, etc.)? Currently we have the
default setup for everything, which I believe should be fine-tuned a bit.
FYI: this table gets ~500k new UUID-keyed rows every day in each partition...
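For reference, these are the knobs in question as they appear in a table definition - the values below are placeholders only (not recommendations), and the table name is illustrative:

```cql
-- Placeholder values, not tuning advice: this just shows where the
-- settings mentioned above live in the table options.
ALTER TABLE ks.events
 WITH gc_grace_seconds = 86400          -- default is 864000 (10 days)
  AND compaction = {'class': 'SizeTieredCompactionStrategy'}
  AND compression = {'class': 'LZ4Compressor'};
```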
thanks a lot!
Attila Wind
http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932
On 04.09.2020 at 16:33, Jeff Jirsa wrote:
As someone else pointed out, it's the same number of tombstones. Doing
distinct queries gives you a bit more flexibility to retry if one
fails, but multiple in one command avoids some contention on the
memtable partition objects.
If you happen to be using type 1 UUIDs (timeuuid) AND you're
deleting contiguous ranges, you could do a DELETE ... WHERE uuid >= ?
AND uuid <= ?
This would trade lots of row tombstones for a single range tombstone,
but may not match your model.
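A sketch of that range delete, assuming a hypothetical table with `customer_id` as the partition key and a timeuuid clustering column `event_id` (all names here are illustrative, not from the original thread):

```cql
-- Illustrative schema: clustering by timeuuid is what makes a
-- contiguous range delete possible within a partition.
CREATE TABLE IF NOT EXISTS ks.events (
    customer_id text,
    event_id    timeuuid,
    payload     text,
    PRIMARY KEY (customer_id, event_id)
);

-- One range tombstone instead of thousands of row tombstones;
-- minTimeuuid/maxTimeuuid derive boundary values from timestamps.
DELETE FROM ks.events
 WHERE customer_id = 'acme'
   AND event_id >= minTimeuuid('2020-09-01 00:00+0000')
   AND event_id <= maxTimeuuid('2020-09-02 00:00+0000');
```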
On Sep 3, 2020, at 11:57 PM, Attila Wind <attilaw@swf.technology> wrote:
Hi C* gurus,
I'm looking for the best strategy to delete records from a "wide" table.
"wide" means the table stores records which have a UUID-style id as an
element of the key - within each partition.
So yes, it's not the partitioning key... The partitioning key is
actually kind of a customerId at the moment, and actually I'm not even
sure this is the right model for this table... Given that the number
of customerIds <<< the number of UUIDs, probably not.
But let's set that aside for a moment and come back to my main
question!
So the question:
when I delete records from this table, given that I can and will
delete in a "batch fashion" (imagine a scheduled job which collects
- let's say - 1000 records every time it does deletes)...
Would there be a difference (in terms of generated tombstones) if I
were to
a) issue deletes one-by-one, like
DELETE FROM ... WHERE ... uuid = 'a'
DELETE FROM ... WHERE ... uuid = 'b'
...
DELETE FROM ... WHERE ... uuid = 'z'
or
b) issue deletes in a grouped fashion, like
DELETE FROM ... WHERE ... uuid in ('a', 'b', ... 'z')
?
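For illustration, variant (b) spelled out as a prepared statement against a hypothetical schema (partition key `customer_id`, clustering column `uuid`; names are assumed, not from the original thread):

```cql
-- The full id list for one batch binds to a single IN marker;
-- the partition key must still be restricted in the WHERE clause.
DELETE FROM ks.events WHERE customer_id = ? AND uuid IN ?;
```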
or is there any other way to delete efficiently that I am missing here?
thanks!
--
Attila Wind
http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932