Anubhav Kale created CASSANDRA-11407:
----------------------------------------

             Summary: Proposal for a simple DTCS
                 Key: CASSANDRA-11407
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11407
             Project: Cassandra
          Issue Type: Improvement
          Components: Compaction
            Reporter: Anubhav Kale
         Attachments: 0001-Simple-DTCS.patch

Today's DTCS implementation has been discussed and debated in a few JIRAs 
already (the notable one is 
https://issues.apache.org/jira/browse/CASSANDRA-9666). One of the main 
challenges with the current approach is that it is very difficult to reason 
about how the "Target" class makes buckets, thus making it difficult to reason 
about the expected file layout on disk.

I am proposing a simplification of the current approach that keeps intact most 
of the DTCS properties that make it a great fit for time-series data. The 
simplification is as follows.

Given the min and max timestamps across all SSTables in question, start from 
min and make windows based on base and min_threshold. The logic in GetWindow 
simply tries to fit maximum-sized windows from min to max.
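
To make the idea concrete, here is a minimal sketch of one way such windowing 
could work. It is not the code from the attached patch: the function name 
make_windows, the half-open (start, end) window representation, and the 
assumption that window sizes are base * min_threshold^k are all illustrative 
choices of mine. Starting at min, it repeatedly takes the largest such window 
that still fits before max, falling back to a single base-sized window when 
nothing larger fits.

```python
from typing import List, Tuple


def make_windows(min_ts: int, max_ts: int, base: int,
                 min_threshold: int) -> List[Tuple[int, int]]:
    """Greedily tile [min_ts, max_ts] with half-open windows (start, end).

    Hypothetical sketch: window sizes are assumed to be
    base * min_threshold**k, and at each step the largest size that
    still fits before max_ts is chosen.
    """
    if base <= 0 or min_threshold < 2:
        raise ValueError("base must be > 0 and min_threshold must be >= 2")

    end = max_ts + 1  # exclusive upper bound, so max_ts itself is covered
    windows = []
    start = min_ts
    while start < end:
        size = base
        # Grow the window while the next larger size still fits.
        while start + size * min_threshold <= end:
            size *= min_threshold
        # If even a base-sized window overshoots, it is used anyway,
        # so the last window may extend past max_ts.
        windows.append((start, start + size))
        start += size
    return windows


if __name__ == "__main__":
    for window in make_windows(0, 99, base=10, min_threshold=4):
        print(window)
```

With min=0, max=99, base=10 and min_threshold=4, this sketch yields the windows 
(0, 40), (40, 80), (80, 90), (90, 100): the oldest data lands in the largest 
windows and the newest data in base-sized ones.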

This keeps the DTCS properties intact, except that we don't need to wait for 
min_threshold windows before making a bigger one. I would argue this simplifies 
the algorithm to a great extent and is easy to reason about, and the end result 
isn't drastically different from the original DTCS in most cases. We give up 
the "alignment" logic in the current class, but I honestly don't think it buys 
us a lot besides complexity.

The implementation can obviously be optimized and cleaned up more if folks 
think this is a good idea. 
