[ https://issues.apache.org/jira/browse/CASSANDRA-8360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226463#comment-14226463 ]
Björn Hegerfors commented on CASSANDRA-8360:
OK, sounds fair. That essentially means that we want to treat the incoming
window specially. A question worth asking is what we want the incoming window
for. Currently, its purpose is to keep the most recent base_time_seconds of data
compacted at all times. Although it respects min_threshold, a value written
early in the window will still be recompacted once every (min_threshold - 1)
subsequent SSTable flushes. I'm aware that this might be a bad idea; or rather,
I wasn't sure it was the right thing to do. It is entirely inspired by STCS's
min_sstable_size, which seems to do the same thing, i.e. it does not apply the
logarithmic-complexity, tree-like merging to small enough SSTables. (It reminds
me a bit of insertion sort being fastest on small enough arrays.) So
base_time_seconds has the same purpose. One problem is that it might be harder
to set a good default for a time than for a size.
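To make the recompaction cost concrete, here is a toy Python simulation of the
incoming window as described above: every flush adds one SSTable, and as soon
as the window holds min_threshold SSTables they are merged into one. It assumes
one flush per step, that nothing leaves the window, and no real I/O; the
function name is hypothetical, not Cassandra code.

```python
def rewrites_of_first_value(flushes: int, min_threshold: int) -> int:
    """Count how often data from the first flush in the incoming window is rewritten.

    Toy model (hypothetical, not Cassandra code): each flush adds one SSTable
    to the window, and whenever the window holds min_threshold SSTables they
    are compacted into a single SSTable that still contains the first value.
    """
    sstables_in_window = 0
    rewrites = 0
    for _ in range(flushes):
        sstables_in_window += 1
        if sstables_in_window >= min_threshold:
            sstables_in_window = 1  # everything merged into one SSTable
            rewrites += 1
    return rewrites


if __name__ == "__main__":
    # With min_threshold=4 the first value is rewritten at flushes 4, 7, 10, ...
    for n in (4, 7, 10, 31):
        print(n, rewrites_of_first_value(n, min_threshold=4))
```

With min_threshold=4, this reports one rewrite for every three flushes after
the first, which is the (min_threshold - 1) figure above.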
Setting min_sstable_size to 0 in STCS has a near-equivalent in DTCS: setting
base_time_seconds to 1. The windows will then be powers of min_threshold
seconds (up to min_threshold windows of each size), starting at 1 second. Even
with this setting, data that is an hour old will be in roughly hour-sized
windows. The only meaningful difference is that SSTables 2 seconds and 10
seconds old will not be in the same window. What I mean is that setting
base_time_seconds to 1 is perfectly reasonable; it's just like setting
min_sstable_size to 0 or 1 in STCS. I just want to make it clear that you
should not set base_time_seconds to 1 hour (3600) just because you want SSTables
older than 1 hour to be in nice 1-hour chunks. If you set it to 900 with
min_threshold=4, SSTables older than 1 hour will still be in perfect 1-hour
chunks (because up to four 900-second windows are preceded by a
4*900 = 3600-second window).
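The window layout described above can be sketched in Python. This is a sketch
under stated assumptions, not Cassandra's code: each window is aligned to a
multiple of its own size, and after at most min_threshold windows of one size
the next older window is min_threshold times larger. The function name
dtcs_windows is hypothetical.

```python
def dtcs_windows(now: int, base_time_seconds: int, min_threshold: int, count: int):
    """List up to `count` time windows, newest first, as (start, end) in seconds.

    Assumptions (sketch only): windows are aligned to multiples of their own
    size, and the size grows by a factor of min_threshold once an aligned
    block of min_threshold smaller windows has been covered.
    """
    size = base_time_seconds
    index = now // size  # position of the window containing `now`
    windows = []
    while len(windows) < count and index >= 0:
        windows.append((index * size, (index + 1) * size - 1))
        if index % min_threshold > 0:
            index -= 1  # step back one window of the same size
        else:
            # Move to the next tier: min_threshold times larger, still aligned.
            size *= min_threshold
            index = index // min_threshold - 1
    return windows


if __name__ == "__main__":
    # base_time_seconds=900, min_threshold=4, "now" is 4.5 hours after time 0.
    for start, end in dtcs_windows(16200, 900, 4, count=7):
        print(start, end, end - start + 1)
```

For now = 16200 this yields three 900-second windows followed by 3600-second
windows that start on hour boundaries, even though base_time_seconds is 900
rather than 3600. With base_time_seconds=1 and min_threshold=4, every window
size it produces is a power of 4.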
So I guess respecting min_threshold in the 'incoming window' is just as right
as respecting min_threshold when compacting SSTables smaller than
min_sstable_size in STCS, which I believe STCS does. So there's my roundabout
way of coming to the same conclusion as you, [~jbellis] :). I just have a
feeling that the meaning of base_time_seconds isn't well understood.
In DTCS, always compact SSTables in the same time window, even if they are fewer than min_threshold
---
Key: CASSANDRA-8360
URL: https://issues.apache.org/jira/browse/CASSANDRA-8360
Project: Cassandra
Issue Type: Improvement
Reporter: Björn Hegerfors
Priority: Minor
DTCS uses min_threshold to decide how many time windows of the same size need
to accumulate before they are merged into a larger window. The age of an
SSTable is determined by its min timestamp, and it always falls into exactly
one of the time windows. If multiple SSTables fall into the same window, DTCS
considers compacting them, but if there are fewer than min_threshold, it
decides not to.
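The current rule can be illustrated with a small Python helper. It is a
hypothetical sketch, not Cassandra's code, and for brevity it assumes fixed
1-hour windows instead of DTCS's growing windows; `window_of` is an assumed
callback that maps a min timestamp to a window identifier.

```python
from collections import defaultdict


def windows_to_compact(min_timestamps, window_of, min_threshold):
    """Return {window: [sstable names]} for the windows that would be compacted.

    Hypothetical helper illustrating the rule described above: an SSTable's
    age is its min timestamp, so it lands in exactly one window, and a window
    is only compacted once it holds at least min_threshold SSTables.
    """
    buckets = defaultdict(list)
    for name, min_ts in min_timestamps.items():
        buckets[window_of(min_ts)].append(name)
    return {
        window: sorted(names)
        for window, names in buckets.items()
        if len(names) >= min_threshold
    }


def hour_window(ts):
    # Simplification (assumption): fixed 1-hour windows keyed by hour number.
    return ts // 3600


if __name__ == "__main__":
    sstables = {"a": 10, "b": 20, "c": 30, "d": 40, "e": 4000, "f": 4100}
    print(windows_to_compact(sstables, hour_window, min_threshold=4))
```

Window 1 holds two SSTables (e and f) and is skipped because 2 is below
min_threshold; leftover groups like that are what this issue is about.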
When do more than 1 but fewer than min_threshold SSTables end up in the same
time window (other than the current window), you might ask? In the current
state, DTCS can spill some extra SSTables into bigger windows when the
previous window wasn't fully compacted, which happens all the time when the
latest window stops being the current one. Repairs and hints can also put
new SSTables into old windows.
I think, and [~jjordan] agreed in a comment on CASSANDRA-6602, that DTCS
should ignore min_threshold and compact SSTables in the same window regardless
of how few there are. I guess max_threshold should still be respected.
[~jjordan] suggested that this should apply to all windows except the current
window, where all the new SSTables end up. That could make sense. I'm not
sure whether compacting many SSTables at once is more cost-efficient when it
comes to the very newest and smallest SSTables. Maybe compacting as soon as 2
SSTables are seen is fine if the initial window size is small enough? Then
again, the opposite could be the case: perhaps the very newest SSTables should
be compacted many at a time?
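The variant suggested here, where only the current window keeps waiting for
min_threshold, could look like the following per-window decision. This is a
hypothetical Python sketch, not a Cassandra patch; it assumes the caller passes
SSTables in the order it prefers, since the selection order when more than
max_threshold are present is not specified in this issue.

```python
def sstables_to_compact(window_sstables, is_current_window, min_threshold, max_threshold):
    """Pick the SSTables to compact in one window under the proposed rule.

    Hypothetical sketch: the current (incoming) window still waits for
    min_threshold SSTables, while any older window is compacted as soon as it
    holds two or more. max_threshold caps the selection in both cases, taking
    the first max_threshold SSTables in the order given (an assumption).
    """
    required = min_threshold if is_current_window else 2
    if len(window_sstables) < required:
        return []
    return list(window_sstables[:max_threshold])


if __name__ == "__main__":
    print(sstables_to_compact(["s1", "s2"], False, 4, 32))  # older window
    print(sstables_to_compact(["s1", "s2"], True, 4, 32))   # current window
```

The older window with two SSTables is compacted immediately, while the current
window with the same two SSTables keeps waiting for min_threshold.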
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)