Re: scheduled work compaction strategy

2018-02-17 Thread Carl Mueller
I'm probably going to take a shot at it, basing it off of TWCS. But I don't know the fundamentals of compaction strategies or the code that well. Fundamentally, you have memtables being flushed out to sstables, and then those sstables being reprocessed by background compaction threads. And then
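The flush/compact cycle described above can be sketched in a few lines. This is a deliberately simplified, hypothetical model (not Cassandra's actual code): a memtable is flushed to an immutable sstable, and a background compaction merges sstables with newest-write-wins semantics.

```python
# Simplified sketch of the memtable -> sstable -> compaction cycle.
# Hypothetical model only; real compaction strategies also handle
# tombstones, TTLs, overlap checks, and size/time-based selection.

def flush(memtable, sstables):
    """Flush the current memtable out as a new immutable sstable."""
    sstables.append(dict(memtable))
    memtable.clear()

def compact(sstables):
    """Background pass: merge all sstables into one, newest write wins."""
    merged = {}
    for table in sstables:   # iterate oldest first...
        merged.update(table) # ...so later (newer) sstables overwrite
    return [merged]

memtable = {"k1": "v1", "k2": "v2"}
sstables = []
flush(memtable, sstables)
memtable.update({"k2": "v2b", "k3": "v3"})
flush(memtable, sstables)
sstables = compact(sstables)
print(sstables[0])  # {'k1': 'v1', 'k2': 'v2b', 'k3': 'v3'}
```

A compaction strategy's job is essentially deciding *which* sstables the `compact` step picks and when; TWCS picks them by time window.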

Re: scheduled work compaction strategy

2018-02-16 Thread Jeff Jirsa
There’s a company using TWCS in this config - I’m not going to out them, but I think they do it (or used to) with aggressive tombstone sub-properties. They may have since extended/enhanced it somewhat. -- Jeff Jirsa > On Feb 16, 2018, at 2:24 PM, Carl Mueller
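The "aggressive tombstone sub-properties" mentioned here might look something like the following. The option names (`unchecked_tombstone_compaction`, `tombstone_threshold`, `tombstone_compaction_interval`) are standard Cassandra compaction sub-properties; the table name and the specific values are illustrative assumptions, not the unnamed company's actual config.

```sql
-- Hypothetical table name and values; sub-property names are real.
ALTER TABLE scheduler.tasks
  WITH compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'HOURS',
    'compaction_window_size': '1',
    -- aggressive tombstone settings: allow single-sstable tombstone
    -- compactions without overlap checks, trigger at a low tombstone
    -- ratio, and recheck candidate sstables every hour
    'unchecked_tombstone_compaction': 'true',
    'tombstone_threshold': '0.05',
    'tombstone_compaction_interval': '3600'
  };
```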

Re: scheduled work compaction strategy

2018-02-16 Thread Carl Mueller
An even MORE complicated version could address the case where the TTLs are at the column key rather than the row key. That would divide a row's data across sstables even for the same rowkey, in essence the opposite of what most compaction strategies try to do: eventually centralize the data for a rowkey in one
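To illustrate the inversion being described: with per-column TTLs, cells sharing one rowkey expire at different times, so a TTL-aware strategy would file them into different time-bucketed sstables. A hypothetical sketch (the `window_for` helper and the 1-hour bucket width are assumptions):

```python
from collections import defaultdict

WINDOW_SECONDS = 3600  # hypothetical 1-hour expiration windows

def window_for(expiry_ts):
    """Bucket a cell by the window its TTL expires in."""
    return expiry_ts // WINDOW_SECONDS

# One rowkey whose three columns carry different expiry timestamps.
cells = [
    ("row1", "col_a", 1000),
    ("row1", "col_b", 5000),
    ("row1", "col_c", 9000),
]

buckets = defaultdict(list)
for rowkey, col, expiry in cells:
    buckets[window_for(expiry)].append((rowkey, col))

# The same rowkey now spans three sstable buckets -- the opposite of
# strategies that try to consolidate a rowkey into a single sstable.
print(dict(buckets))
```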

Re: scheduled work compaction strategy

2018-02-16 Thread Carl Mueller
Oh, and as a further refinement outside of our use case: if we could group/organize the sstables by the rowkey time value or inherent TTL value, the naive version would be evenly distributed buckets into the future. But many/most data patterns like this have "busy" data in the near term. Far out
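One way to capture "busy near-term, sparse far-out" instead of even buckets is to let bucket width grow with distance into the future, e.g. doubling. This is a sketch of the idea only; the base window and doubling scheme are assumptions, not anything a Cassandra strategy provides out of the box.

```python
import math

def bucket_index(seconds_ahead, base_window=3600):
    """Exponentially widening buckets: bucket 0 covers the first hour,
    bucket 1 the second hour, and each later bucket doubles in width
    (bucket 2 spans two hours, bucket 3 four hours, and so on)."""
    if seconds_ahead < base_window:
        return 0
    return int(math.log2(seconds_ahead // base_window)) + 1

# Near-term data spreads across fine buckets; far-out data coarsens.
print(bucket_index(1800))   # 30 min ahead  -> bucket 0
print(bucket_index(7200))   # 2 hours ahead -> bucket 2
print(bucket_index(86400))  # 1 day ahead   -> bucket 5
```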

scheduled work compaction strategy

2018-02-16 Thread Carl Mueller
We have a scheduler app here at SmartThings, where we track per-second tasks to be executed. These are all TTL'd to be destroyed once the second the event was registered for has passed. If the scheduling window was sufficiently small, say, 1 day, we could probably use a time window compaction
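A table for the per-second scheduler described above might pair a default TTL with TWCS along these lines. The schema, names, and window size are illustrative assumptions; `default_time_to_live` and the TWCS options are real CQL table options.

```sql
-- Hypothetical schema for a 1-day scheduling window.
CREATE TABLE scheduler.scheduled_tasks (
    fire_second timestamp,   -- the second the task should fire
    task_id     uuid,
    payload     blob,
    PRIMARY KEY (fire_second, task_id)
) WITH default_time_to_live = 86400   -- rows expire after the 1-day window
  AND compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'HOURS',
    'compaction_window_size': '1'
  };
```

With a 1-day window and 1-hour compaction windows, whole sstables age out and drop cleanly once their window's TTLs have all expired.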