I'm probably going to take a shot at doing it, basing it on TWCS. But I
don't know the fundamentals of compaction strategies, or that code, all
that well.
Fundamentally, you have memtables being flushed out to sstables, and then
those sstables being reprocessed by background compaction threads.
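As a rough mental model of that flush-then-compact cycle, here is a toy
sketch in Python - not Cassandra's actual classes or API, just the shape of
an LSM write path (class and method names are made up):

```python
class ToyLSM:
    """Toy LSM store: writes land in a memtable, flushes produce
    immutable sstables, and compact() merges sstables back together."""

    def __init__(self, flush_threshold=3):
        self.memtable = {}
        self.sstables = []  # immutable snapshots, oldest first
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Freeze the current memtable into a new sstable."""
        if self.memtable:
            self.sstables.append(dict(self.memtable))
            self.memtable = {}

    def compact(self):
        """Merge all sstables into one; newer tables win on collisions."""
        merged = {}
        for table in self.sstables:  # oldest first, so later updates win
            merged.update(table)
        self.sstables = [merged] if merged else []

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest sstable first
            if key in table:
                return table[key]
        return None
```

In a real strategy the interesting decisions are all inside compact():
which sstables to pick, and when.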
There’s a company using TWCS in this config - I’m not going to out them, but I
think they do it (or used to) with aggressive tombstone sub-properties. They
may have since extended/enhanced it somewhat.
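For reference, that kind of setup can be expressed as table-level compaction
options. A hypothetical example - the keyspace/table and the specific values
are made up, but `unchecked_tombstone_compaction`, `tombstone_threshold`, and
`tombstone_compaction_interval` are real Cassandra compaction sub-properties:

```sql
ALTER TABLE ks.events WITH compaction = {
  'class': 'TimeWindowCompactionStrategy',
  'compaction_window_unit': 'HOURS',
  'compaction_window_size': '6',
  'unchecked_tombstone_compaction': 'true',
  'tombstone_threshold': '0.2',
  'tombstone_compaction_interval': '3600'
};
```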
> On Feb 16, 2018, at 2:24 PM, Carl Mueller
An even MORE complicated version could address the case where the TTLs are
at the column key rather than the row key. That would divide the row across
sstables by the rowkey, in essence the opposite of what most compaction
strategies try to do: eventually centralize the data for a rowkey in one
sstable.
Oh, and as a further refinement outside of our use case:
If we could group/organize the sstables by the rowkey time value or the
inherent TTL value, the naive version would be evenly distributed buckets
extending into the future.
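A minimal sketch of that naive even bucketing, assuming each sstable can be
tagged with a single TTL-expiry timestamp (the function name and fixed
window width are invented for illustration):

```python
from collections import defaultdict

def group_by_expiry(sstables, window_seconds):
    """Naive even bucketing: map each (name, expiry_ts) sstable to the
    fixed-width future time bucket its TTL expiry falls into."""
    buckets = defaultdict(list)
    for name, expiry_ts in sstables:
        buckets[expiry_ts // window_seconds].append(name)
    return dict(buckets)
```

With hourly windows, sstables expiring at t=100 and t=200 share bucket 0
while one expiring at t=3700 lands in bucket 1, so a whole bucket can be
dropped wholesale once its window has fully expired.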
But many/most data patterns like this have "busy" data in the near term.
We have a scheduler app here at SmartThings, where we track per-second
tasks to be executed.
These are all TTL'd so they are destroyed once the second they were
registered for has passed.
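In CQL terms the pattern looks roughly like this - the scheduler schema
here is invented, but `USING TTL` is standard CQL:

```sql
-- hypothetical schema: one partition per scheduling second
INSERT INTO scheduler.tasks (second_bucket, task_id, payload)
VALUES ('2018-02-16 14:25:00', uuid(), 'fire-callback')
USING TTL 86400;  -- the row self-destructs after its window has passed
```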
If the scheduling window were sufficiently small, say one day, we could
probably use a time window compaction strategy.
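A sketch of what that might look like, assuming a one-day TTL horizon and
hourly windows (the table name and the numbers are illustrative, but
`default_time_to_live` and the TWCS window options are real Cassandra
settings):

```sql
ALTER TABLE scheduler.tasks WITH
  default_time_to_live = 86400
  AND compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'HOURS',
    'compaction_window_size': '1'
  };
```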