[jira] [Commented] (CASSANDRA-8360) In DTCS, always compact SSTables in the same time window, even if they are fewer than min_threshold

2014-11-26 Thread Björn Hegerfors (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226463#comment-14226463
 ] 

Björn Hegerfors commented on CASSANDRA-8360:


OK, sounds fair. That essentially means that we want to treat the incoming 
window specially. A question worth asking is what we want the incoming window 
for. Currently it is to keep the last unit of base_time_seconds compacted at all 
times. While it respects min_threshold, a value written early in the window 
will essentially be constantly recompacted once every (min_threshold - 1) 
subsequent sstable flushes. I'm fully aware that this might be a bad idea, or 
rather I wasn't sure if it was the right thing to do. Really, it's completely 
inspired by STCS's min_sstable_size which seems to do the same thing, i.e. not 
respect the logarithmic complexity tree-like merging on small enough SSTables. 
(Reminds me a bit of insertion sort being fastest on small enough arrays). So 
base_time_seconds has the same purpose. A problem is that it might be harder 
to set a good default on time than on size.
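
To make the recompaction cadence concrete, here is a small simulation sketch. It assumes a simplified model that is not taken from the DTCS code: every flush adds one SSTable to the incoming window, and as soon as the window holds min_threshold SSTables they are all merged into one. The function name is hypothetical.

```python
def compaction_flushes(num_flushes: int, min_threshold: int) -> list:
    """Return the 1-based flush numbers at which the incoming window compacts.

    Simplified, hypothetical model: each flush adds one SSTable to the window,
    and reaching min_threshold SSTables merges all of them into one.
    """
    sstables_in_window = 0
    compacted_at = []
    for flush in range(1, num_flushes + 1):
        sstables_in_window += 1
        if sstables_in_window >= min_threshold:
            compacted_at.append(flush)
            sstables_in_window = 1  # the merged SSTable stays in the window
    return compacted_at


print(compaction_flushes(10, 4))  # [4, 7, 10]
print(compaction_flushes(10, 2))  # [2, 3, 4, 5, 6, 7, 8, 9, 10]
```

Under this model, data written at the start of the window is rewritten once every (min_threshold - 1) flushes after the first compaction, and lowering the threshold to 2 rewrites it on every flush.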

Setting min_sstable_size in STCS to 0 has a near-equivalent in DTCS: setting 
base_time_seconds to 1. The window sizes will be base_time_seconds times powers 
of min_threshold (up to min_threshold windows of each size), starting at 1 
second. Even with this setting, 
data that is an hour old will be in near-hour large windows. The only 
meaningful difference is that SSTables 2 seconds and 10 seconds old will not be 
in the same window. What I mean by this is that setting base_time_seconds to 1 
is perfectly reasonable; it's just the same as setting min_sstable_size to 0 or 
1 in STCS. I just want to make it clear that base_time_seconds is not really 
something that you should set to 1 hour (3600) just because you want SSTables 
older than 1 hour to be in nice 1-hour chunks. If you set it to 900 with 
min_threshold=4, SSTables older than 1 hour will still be in perfect 1-hour 
chunks (because the up to four 900-second chunks are preceded by a 
4*900=3600-second chunk).
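
The arithmetic above can be checked with a short sketch that enumerates windows from newest to oldest. It assumes that windows are aligned to multiples of their own size counted from timestamp 0, and that a window grows by a factor of min_threshold whenever its index is a multiple of min_threshold. Both are assumptions of this sketch, not something stated in this thread, and dtcs_windows is a hypothetical name.

```python
from typing import List, Tuple


def dtcs_windows(now: int, base_time_seconds: int, min_threshold: int,
                 count: int) -> List[Tuple[int, int]]:
    """Return (start, size) in seconds for the `count` newest windows, newest first."""
    size = base_time_seconds
    pos = now // size  # index of the window that contains `now`
    windows = []
    for _ in range(count):
        windows.append((pos * size, size))
        if pos % min_threshold > 0:
            pos -= 1  # previous window of the same size
        else:
            size *= min_threshold  # next window is min_threshold times larger
            pos = pos // min_threshold - 1
    return windows


# base_time_seconds=900, min_threshold=4, "now" a bit past ten hours
sizes = [size for _, size in dtcs_windows(38000, 900, 4, 7)]
print(sizes)  # [900, 900, 900, 3600, 3600, 14400, 14400]
```

In this example everything older than one hour already sits in windows of 3600 seconds or more, even though base_time_seconds is only 900.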

So I guess respecting min_threshold in the 'incoming window' is just as right 
as respecting min_threshold when compacting SSTables smaller than 
min_sstable_size in STCS. Which I believe it does. So there's my roundabout way 
of coming to the same conclusion as you, [~jbellis] :). I just have this 
feeling that the meaning of base_time_seconds isn't well understood.

 In DTCS, always compact SSTables in the same time window, even if they are 
 fewer than min_threshold
 ---

 Key: CASSANDRA-8360
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8360
 Project: Cassandra
  Issue Type: Improvement
Reporter: Björn Hegerfors
Priority: Minor

 DTCS uses min_threshold to decide how many time windows of the same size 
 need to accumulate before merging into a larger window. The age of an SSTable 
 is determined as its min timestamp, and it always falls into exactly one of 
 the time windows. If multiple SSTables fall into the same window, DTCS 
 considers compacting them, but if they are fewer than min_threshold, it 
 decides not to do it.
 When do more than 1 but fewer than min_threshold SSTables end up in the same 
 time window (except for the current window), you might ask? In the current 
 state, DTCS can spill some extra SSTables into bigger windows when the 
 previous window wasn't fully compacted, which happens all the time when the 
 latest window stops being the current one. Also, repairs and hints can put 
 new SSTables in old windows.
 I think, and [~jjordan] agreed in a comment on CASSANDRA-6602, that DTCS 
 should ignore min_threshold and compact SSTables in the same window regardless 
 of how few they are. I guess max_threshold should still be respected.
 [~jjordan] suggested that this should apply to all windows but the current 
 window, where all the new SSTables end up. That could make sense. I'm not 
 clear on whether compacting many SSTables at once is more cost efficient or 
 not, when it comes to the very newest and smallest SSTables. Maybe compacting 
 as soon as 2 SSTables are seen is fine if the initial window size is small 
 enough? I guess the opposite could be the case too; that the very newest 
 SSTables should be compacted very many at a time?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8360) In DTCS, always compact SSTables in the same time window, even if they are fewer than min_threshold

2014-11-25 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225806#comment-14225806
 ] 

Jonathan Ellis commented on CASSANDRA-8360:
---

The max is there to make sure we don't OOM or overwhelm the heap with 
compaction buffers.  So that should probably be respected at all times.

I agree that ignoring min, except for the 'incoming' window, makes the most 
sense -- you don't want to constantly recompact 90% of the data every time a 
new sstable is flushed.  That's a big hit to DTCS's advantage in write 
amplification.

It's possible that, as you say, this is fine if the window is small enough -- but 
if it's that small (smaller than flush interval) then it will be the 
previously active window soon enough.  So I don't think it's worth trying to 
special case that.
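
The rule discussed here (always respect max, ignore min everywhere except the incoming window) could be sketched roughly as follows. This is an illustration only, not the DTCS code: the function name and parameters are hypothetical, and taking the first max_threshold SSTables in the given order is an arbitrary choice made for this sketch.

```python
def sstables_to_compact(window_sstables: list, is_incoming_window: bool,
                        min_threshold: int, max_threshold: int) -> list:
    """Pick the SSTables to compact from a single time window.

    Hypothetical sketch: the incoming window still needs min_threshold
    SSTables, while any older window compacts as soon as it holds 2 or more.
    max_threshold caps the size of the compaction in every window.
    """
    required = min_threshold if is_incoming_window else 2
    if len(window_sstables) < required:
        return []
    # Which SSTables to keep when over the cap is an assumption of this sketch.
    return window_sstables[:max_threshold]


print(sstables_to_compact(["a", "b"], False, 4, 32))  # ['a', 'b']
print(sstables_to_compact(["a", "b"], True, 4, 32))   # []
```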



