Thank you, Jeff Jirsa, for your comments. How can we do this: "fix this by not scheduling the major compaction until we know all of the sstables in the window are available to be compacted"?
About the column-family schema: I had to customize the cassandra-stress tool so that it could create a reasonable number of rows per partition. By default it keeps generating repeated clustering keys for each partition, so most of the data gets updated instead of inserted.

Lucas B. Dias

2018-05-14 14:03 GMT-03:00 Jeff Jirsa <jji...@gmail.com>:

> Interesting!
>
> I suspect I know what causes the increased disk usage in TWCS, and it's a
> solvable problem. The problem is roughly something like this:
> - Window 1 has sstables 1, 2, 3, 4, 5, 6
> - We start compacting 1, 2, 3, 4 (using STCS-in-TWCS first window)
> - The TWCS window rolls over
> - We flush (sstable 7), and trigger the TWCS window major compaction,
> which starts compacting 5, 6, 7 + any other sstable from that window
> - If the first compaction (1,2,3,4) has finished by the time sstable 7 is
> flushed, we'll include its result in that compaction; if it hasn't, we'll
> have to do the major compaction twice to guarantee we have exactly one
> sstable per window, which will temporarily increase disk space
>
> We can likely fix this by not scheduling the major compaction until we
> know all of the sstables in the window are available to be compacted.
>
> Also your data model is probably typical, but not well suited for time
> series cases - if you find my 2016 Cassandra Summit TWCS talk (it's on
> youtube), I mention aligning partition keys to TWCS windows, which involves
> adding a second component to the partition key. This is hugely important in
> terms of making sure TWCS data expires quickly and avoiding having to read
> from more than one TWCS window at a time.
>
>
> - Jeff
>
>
>
> On Mon, May 14, 2018 at 7:12 AM, Lucas Benevides <
> lu...@maurobenevides.com.br> wrote:
>
>> Dear community,
>>
>> I want to tell you about my paper published in a conference in March. The
>> title is "NoSQL Database Performance Tuning for IoT Data - Cassandra
>> Case Study" and it is available (not for free) at
>> http://www.scitepress.org/DigitalLibrary/Link.aspx?doi=10.5220/0006782702770284 .
>>
>> TWCS is used and compared with DTCS.
>>
>> I hope you can download it; unfortunately I cannot send copies, as the
>> publisher holds the copyright.
>>
>> Lucas B. Dias
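
As a rough illustration of the "second component" of the partition key mentioned in the quoted message, here is a minimal Python sketch of how a window bucket could be derived from a timestamp. The names `window_bucket` and `partition_key`, the `sensor-1` id, and the one-day window size are assumptions made for this example; they are not taken from the talk or from Cassandra itself. It also assumes that windows start at multiples of the window size counted from the Unix epoch.

```python
from datetime import datetime, timezone

# Assumption: a one-day window. Use the window size configured on the table.
WINDOW_SECONDS = 24 * 60 * 60


def window_bucket(ts: datetime, window_seconds: int = WINDOW_SECONDS) -> int:
    """Return the start of the window containing ts, in epoch seconds."""
    epoch = int(ts.timestamp())
    return epoch - (epoch % window_seconds)


def partition_key(sensor_id: str, ts: datetime) -> tuple:
    """Hypothetical composite partition key: (sensor id, window bucket)."""
    return (sensor_id, window_bucket(ts))


# Two readings on the same day share a partition; the next day starts a new one.
a = partition_key("sensor-1", datetime(2018, 5, 14, 7, 12, tzinfo=timezone.utc))
b = partition_key("sensor-1", datetime(2018, 5, 14, 23, 59, tzinfo=timezone.utc))
c = partition_key("sensor-1", datetime(2018, 5, 15, 0, 0, tzinfo=timezone.utc))
```

With a key shaped like this, each partition holds data from a single window, so a read for one sensor and one day should only need to touch that window's sstables.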