[jira] [Commented] (CASSANDRA-8243) DTCS can leave time-overlaps, limiting ability to expire entire SSTables

2014-11-14 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14211998#comment-14211998
 ] 

Sylvain Lebresne commented on CASSANDRA-8243:

bq. any reason to still not submit this patch to 2.0?

Yes, this is really an improvement and we're not committing improvements to 2.0 
at this point. I don't contest the reasoning, but we've all seen simple and 
well-reasoned patches backfire in unexpected ways from time to time, and that's 
why we implement rules like this. I'm not absolutely dead set against 
committing to 2.0 if everyone else wants to, but I'm not in favor of it on 
principle.

> DTCS can leave time-overlaps, limiting ability to expire entire SSTables
> 
>
> Key: CASSANDRA-8243
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8243
> Project: Cassandra
> Issue Type: Bug
> Reporter: Björn Hegerfors
> Assignee: Björn Hegerfors
> Priority: Minor
> Labels: compaction, performance
> Fix For: 2.0.12, 2.1.3
>
> Attachments: cassandra-trunk-CASSANDRA-8243-aggressiveTTLExpiry.txt, 
> cassandra-trunk-CASSANDRA-8243-aggressiveTTLExpiry.txt
>
>
> CASSANDRA-6602 (DTCS) and CASSANDRA-5228 are supposed to be a perfect match 
> for tables where every value is written with a TTL. DTCS makes sure to keep 
> old data separate from new data. So shortly after the TTL has passed, 
> Cassandra should be able to throw away the whole SSTable containing a given 
> data point.
> CASSANDRA-5228 deletes the very oldest SSTables, and only if they don't 
> overlap (in terms of timestamps) with another SSTable which cannot be deleted.
> DTCS, however, can't guarantee that SSTables won't overlap (again, in terms of 
> timestamps). In a test that I ran, every single SSTable overlapped with its 
> nearest neighbors by a very tiny amount. My reasoning for why this could 
> happen is that the dumped memtables were already overlapping from the start. 
> DTCS will never create an overlap where there is none. I surmised that this 
> happened in my case because I sent parallel writes which must have come out 
> of order. This was just locally, and out of order writes should be much more 
> common non-locally.
> That means that the SSTable removal optimization may never get a chance to 
> kick in!
> I can see two solutions:
> 1. Make DTCS split SSTables on time window borders. This will essentially 
> only be done on a newly dumped memtable once every base_time_seconds.
> 2. Make TTL SSTable expiry more aggressive. Relax the conditions on which an 
> SSTable can be dropped completely, of course without affecting any semantics.
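
A minimal Python sketch of the overlap problem and of solution 1 from the description above. It is a toy model, not Cassandra code: the function names, the flush size of 3 and the window of 4 are assumptions made for illustration (the window stands in for base_time_seconds). Writes that arrive slightly out of order are flushed in fixed-size batches, and each batch's (min, max) timestamp range is compared with its neighbor's.

```python
def flush_ranges(arrival_order, flush_size):
    """(min, max) timestamp of each memtable flushed after flush_size writes."""
    chunks = [
        arrival_order[i:i + flush_size]
        for i in range(0, len(arrival_order), flush_size)
    ]
    return [(min(c), max(c)) for c in chunks]


def overlapping_neighbors(ranges):
    """Adjacent pairs whose timestamp ranges touch or overlap (inclusive)."""
    return [(a, b) for a, b in zip(ranges, ranges[1:]) if b[0] <= a[1]]


def split_on_window_borders(timestamps, window):
    """Regroup writes by time window, modelling a split on window borders."""
    buckets = {}
    for ts in timestamps:
        buckets.setdefault(ts // window, []).append(ts)
    return [(min(b), max(b)) for _, b in sorted(buckets.items())]


# Timestamps in arrival order; a few writes arrive slightly late.
arrival = [1, 2, 4, 3, 5, 7, 6, 8, 10, 9, 11, 12]

flushed = flush_ranges(arrival, flush_size=3)
print(flushed)                         # [(1, 4), (3, 7), (6, 10), (9, 12)]
print(overlapping_neighbors(flushed))  # every neighboring pair overlaps

split = split_on_window_borders(arrival, window=4)
print(split)                           # [(1, 3), (4, 7), (8, 11), (12, 12)]
print(overlapping_neighbors(split))    # []
```

In this model a single late write per flush is enough to make every neighboring pair overlap, while regrouping on window borders removes the overlaps by construction.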



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8243) DTCS can leave time-overlaps, limiting ability to expire entire SSTables

2014-11-13 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14210057#comment-14210057
 ] 

Björn Hegerfors commented on CASSANDRA-8243:


An expired column is equivalent to a tombstone with the same timestamp in 
Cassandra's eyes, right? Compactions even turn them into tombstones, if they 
can't be immediately purged. So to simplify, we're dealing with all-tombstone 
SSTables. Both the old and new implementation agree that removing an SSTable 
can only happen if the oldest SSTable (the one with lowest minTimestamp) is 
all-tombstones (= has fully expired). Both implementations also agree that this 
oldest SSTable may not overlap (in time span) with an SSTable containing any 
non-tombstone data. If there is no such overlap, everything in any SSTable (with 
an overlapping row range, anyway) written with a timestamp less than or equal 
to this oldest table's maxTimestamp is guaranteed to be a tombstone.

Also, since any SSTable that either of the implementations removes is an 
all-tombstone SSTable, the only thing that can happen is that something is 
resurrected. Combined with the reasoning in my previous paragraph, the only 
thing that could be resurrected when a tombstone for column x with timestamp t 
is removed is another tombstone for column x, with a lower timestamp t'! When 
could that matter? Only if some other SSTable makes a constructive write to 
column x in the interval (t', t]. But that's impossible, because that would 
then be an SSTable containing some non-tombstone data with a minTimestamp less 
than or equal to the oldest SSTable's maxTimestamp, which goes against the 
assumption that no such SSTable exists!

There you have a proof by contradiction that the oldest SSTable can be safely 
removed if it is all-tombstones and doesn't overlap with any SSTable containing 
any non-tombstone data. If we then consider the oldest SSTable free to remove, 
the same rules apply to the oldest remaining SSTable and so on. This is the 
rule that my implementation uses. From the comments it looks like we already 
agree intuitively on this, but I thought a more formal proof like this might 
help this get committed. [~slebresne] any reason to still not submit this patch 
to 2.0?
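
As an illustration only (this is not the attached patch), the rule argued for above can be sketched in Python. The SSTable class, its fields and the function name are hypothetical; fully_expired stands for "all-tombstones", and an overlap is treated as inclusive, following the "less than or equal" wording of the proof.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SSTable:
    name: str
    min_ts: int          # lowest write timestamp in the SSTable
    max_ts: int          # highest write timestamp in the SSTable
    fully_expired: bool  # True if it holds only tombstones / expired columns


def removable_sstables(sstables):
    """Names of SSTables that can be dropped whole, oldest first."""
    # Lowest timestamp of any SSTable that still holds non-tombstone data.
    live_min_ts = min(
        (s.min_ts for s in sstables if not s.fully_expired),
        default=float("inf"),
    )
    removable = []
    for s in sorted(sstables, key=lambda s: s.min_ts):
        # Stop at the first oldest-remaining SSTable that is live or that
        # overlaps (inclusive bound) an SSTable with non-tombstone data.
        if not s.fully_expired or s.max_ts >= live_min_ts:
            break
        removable.append(s.name)
    return removable


tables = [
    SSTable("a", 0, 10, True),
    SSTable("b", 10, 20, True),   # overlaps "a" at timestamp 10
    SSTable("c", 20, 30, True),   # overlaps live "d" at timestamp 30
    SSTable("d", 30, 40, False),
]
print(removable_sstables(tables))  # ['a', 'b']
```

Here "a" and "b" overlap each other, but both are all-tombstones and neither reaches the live data in "d", so both can go. "c" stays because it shares timestamp 30 with "d".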

Oh, and I noticed that I didn't update the Javadoc, so here comes a new patch.






[jira] [Commented] (CASSANDRA-8243) DTCS can leave time-overlaps, limiting ability to expire entire SSTables

2014-11-13 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209735#comment-14209735
 ] 

Sylvain Lebresne commented on CASSANDRA-8243:

I suspect my comment was meant to avoid dropping a tombstone that shadows data 
in another non-dropped sstable, but you're right that in that case we're 
guaranteed that the shadowed data is actually expired (and even purgeable). So 
I agree with you: I don't think it matters, and this looks safe to do (though 
I'd rather avoid committing this to 2.0, just in case).






[jira] [Commented] (CASSANDRA-8243) DTCS can leave time-overlaps, limiting ability to expire entire SSTables

2014-11-13 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209684#comment-14209684
 ] 

Marcus Eriksson commented on CASSANDRA-8243:


The problem with this is that we might drop tombstones that actually cover data 
in other sstables, even though that data is also expired.

I don't see any reason that this would make a difference to users, but I'm 
gonna throw up the [~slebresne]-flag here as he said back in CASSANDRA-5228 
that we must account for the timestamp of candidates that cover data in other 
sstables (in the code in the comment from Mar 21st)



