[jira] [Commented] (CASSANDRA-9420) Table option for promising that you will never touch a column twice

2017-12-11 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16287120#comment-16287120
 ] 

Jeff Jirsa commented on CASSANDRA-9420:
---

Linking to 9779, the 'append only' table optimization ticket.


> Table option for promising that you will never touch a column twice
> ---
>
> Key: CASSANDRA-9420
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9420
> Project: Cassandra
> Issue Type: New Feature
> Reporter: Björn Hegerfors
>
> There are time series use cases where you write all values with various TTLs,
> have gc_grace_seconds = 0, and never update or delete a column after insertion.
> When all TTLs are the same, DTCS with recent patches works great. But when
> the TTLs vary a lot, you are forced to choose between splitting your table
> into multiple TTL tiers, having your SSTables filled mostly with tombstones,
> or running frequent major compactions.
> The problem stems from the fact that Cassandra plays it safe when a TTL has
> expired and turns the cell into a tombstone rather than getting rid of it on
> the spot. The reason is that the expired write _may_ have been to a column
> that had an earlier write without a TTL (or with a longer one), and that
> earlier write should now be treated as deleted too.
> I propose a table-level setting that says "I guarantee that there will never
> be any updates to any columns". With that option enabled, all tombstones and
> expired TTLs would always be removed immediately during compaction, and the
> check for dropping entirely expired SSTables could be loosened considerably
> for these tables.
> This option should probably require gc_grace_seconds to be set to zero. It is
> also questionable whether writes without a TTL should be allowed to such a
> table, since those would become constants.
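
To make the hazard in the description concrete, here is a minimal sketch in Python. It is a toy model, not Cassandra code: the dict-based "sstables", the read() helper and all timestamps are hypothetical. It shows how dropping an expired cell on the spot can bring back an earlier write that lives in an sstable outside the compaction.

```python
# Toy model (hypothetical, not Cassandra internals): an "sstable" maps a
# column name to (write_timestamp, value, expires_at), expires_at=None for no TTL.

def read(sstables, column, now):
    """Newest write wins; an expired newest write reads as deleted."""
    versions = [t[column] for t in sstables if column in t]
    if not versions:
        return None
    _, value, expires_at = max(versions, key=lambda v: v[0])
    if expires_at is not None and expires_at <= now:
        return None
    return value

old_sstable = {"temp": (100, "21.5", None)}  # earlier write without a TTL
new_sstable = {"temp": (200, "22.0", 260)}   # later write, expires at t=260
now = 300

# The expired write still shadows the older one, so the column reads as deleted.
before = read([old_sstable, new_sstable], "temp", now)

# Compact new_sstable alone and drop the expired cell immediately.
compacted = {k: v for k, v in new_sstable.items() if v[2] is None or v[2] > now}

# The older write is no longer shadowed and becomes visible again.
after = read([old_sstable, compacted], "temp", now)
```

In this model `before` is None and `after` is the old value, which is exactly the case the proposed option asks the user to promise will never occur.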



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9420) Table option for promising that you will never touch a column twice

2015-05-21 Thread Matt Stump (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554386#comment-14554386
 ] 

Matt Stump commented on CASSANDRA-9420:
---

Yes, it is extremely common for customers to have an append-only data model.
Sometimes the access pattern has a uniform TTL, but some customers will have
different TTLs because they use the retention period to distinguish between
different levels of service or plans in a multi-tenant environment. To deal
with this we have multiple CFs, one per data retention period. This isn't ideal.

I would be very much in favor of a feature that avoids tombstone creation by
telling C* about the access patterns. Tombstones are the Achilles' heel of
TTLs.


[jira] [Commented] (CASSANDRA-9420) Table option for promising that you will never touch a column twice

2015-05-20 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552197#comment-14552197
 ] 

Aleksey Yeschenko commented on CASSANDRA-9420:
--

bq. If gcGrace == 0, which you seem fine with, then we'll get rid of expired 
column on the spot. 

Can we really? Even if gc_grace_seconds is 0, we must ensure that there is no
intersection with other sstables before purging it.

This option would allow us to skip that check and indeed drop it on the spot.
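
As a rough sketch of the check being discussed, in Python: an expired cell that is already past gc grace is only safe to drop if no sstable outside the compaction could hold an older write it still shadows. The function name, its parameters and the never_overwritten flag are hypothetical stand-ins for the proposed table option, and the timestamp comparison is a simplification of what the compaction code actually does.

```python
def can_drop_expired(cell_timestamp, overlapping_min_timestamps,
                     never_overwritten=False):
    """Decide whether an expired cell (already past gc grace) may be dropped.

    cell_timestamp: write timestamp of the expired cell.
    overlapping_min_timestamps: minimum write timestamp of each sstable that
        is NOT part of this compaction but may contain the same partition.
    never_overwritten: hypothetical table option from this ticket; the user
        promises that no column is ever written twice.
    """
    if never_overwritten:
        # The promise means there is nothing older to shadow: skip the check.
        return True
    # Safe only if every overlapping sstable holds strictly newer data.
    return all(cell_timestamp < m for m in overlapping_min_timestamps)
```

For example, can_drop_expired(200, [100]) is False because an sstable with data as old as timestamp 100 might contain a write this cell shadows, while the same call with never_overwritten=True returns True.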



[jira] [Commented] (CASSANDRA-9420) Table option for promising that you will never touch a column twice

2015-05-20 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552266#comment-14552266
 ] 

Sylvain Lebresne commented on CASSANDRA-9420:
-

bq. What we should have done back then was to make that option not a default,
but an enforced constraint. So that every update/insert always comes with the
same TTL

The problem Björn wants to improve is explicitly the case where the TTLs vary
a lot, so such an option won't really help. And as he said, when all TTLs are
the same, DTCS is able to drop expired sstables very efficiently, so this kind
of option is not really needed for that case either.

bq. Even if gc_grace_seconds is 0, we must ensure that there is no intersection
with other sstables before purging it.

Yes, but since the description mentions time series and DTCS, I assumed the
min timestamp checks would solve this. But I was wrong. As Björn says, with
variable TTLs, the min timestamp check will rarely help.

That said, there are probably better heuristics to improve this case. For
instance, in theory, we could use the sstable max/min clustering values
which, for time series with DTCS, would allow us to decide that most cells
can't intersect any non-compacted sstable (one question being how efficient we
can make that check).
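
A minimal sketch of that heuristic in Python, assuming each sstable can be summarized by a (min, max) pair of clustering values such as time buckets. The function names and the tuple representation are hypothetical; real clustering values are multi-component and the real check would live in the compaction code.

```python
def ranges_overlap(a_min, a_max, b_min, b_max):
    """Closed intervals [a_min, a_max] and [b_min, b_max] share a point."""
    return a_min <= b_max and b_min <= a_max

def may_intersect(candidate, others):
    """True if the candidate's clustering range overlaps any other sstable.

    candidate: (min_clustering, max_clustering) of the sstable being compacted.
    others: the same pair for each sstable outside the compaction.
    """
    c_min, c_max = candidate
    return any(ranges_overlap(c_min, c_max, o_min, o_max)
               for o_min, o_max in others)
```

With time-ordered clustering, an sstable covering (100, 199) cannot intersect ones covering (200, 299) and (300, 399), so its expired cells could be dropped without looking inside those sstables. The cost question raised above is about doing this comparison per partition or per cell rather than once per sstable.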

Cassandra cannot guarantee that you won't overwrite a column, so adding an
option that tells it to assume that blindly feels pretty dangerous, and I'm
personally really not a fan.


[jira] [Commented] (CASSANDRA-9420) Table option for promising that you will never touch a column twice

2015-05-20 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552323#comment-14552323
 ] 

Aleksey Yeschenko commented on CASSANDRA-9420:
--

Right. I was speaking slightly out of context of this ticket.

Anecdotally, from observing #cassandra and talking to Field people, it seems 
like there are many users relying just on TTL and never doing any updates. 
Having an option for them to express it in metadata, and for us to enforce it, 
would allow us to drop expired TTL cells much more aggressively.

I'm speaking here of using the same TTL everywhere, which should be at least
as common as varying TTLs. We don't care about overwrites in that case at all;
we only need to enforce that all UPDATEs and INSERTs come with the same default
TTL at all times. Maybe forbid DELETEs too.
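
For illustration only, the enforcement described above could look roughly like this in Python. Nothing here is an existing Cassandra API: validate_write, TtlPolicyError and the allow_deletes flag are hypothetical names, and the statement is reduced to its type and TTL.

```python
class TtlPolicyError(ValueError):
    """Raised when a write violates the table's enforced TTL policy."""

def validate_write(statement_type, ttl, required_ttl, allow_deletes=False):
    """Return the effective TTL for a write, or raise if the write is not allowed.

    statement_type: "INSERT", "UPDATE" or "DELETE".
    ttl: TTL given in the statement, or None if the statement has none.
    required_ttl: the single TTL every write to this table must carry.
    """
    if statement_type == "DELETE":
        if not allow_deletes:
            raise TtlPolicyError("DELETE is not allowed on this table")
        return None
    if statement_type not in ("INSERT", "UPDATE"):
        raise TtlPolicyError(f"unsupported statement type: {statement_type}")
    # A missing TTL falls back to the table's TTL, as a default TTL would.
    effective = required_ttl if ttl is None else ttl
    if effective != required_ttl:
        raise TtlPolicyError(
            f"TTL {effective} differs from required TTL {required_ttl}")
    return effective
```

With a required TTL of 86400, an INSERT without a TTL gets 86400, an INSERT with TTL 3600 is rejected, and a DELETE is rejected unless allow_deletes is set.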

[~mstump] Am I talking nonsense here? Is this relevant at all?


[jira] [Commented] (CASSANDRA-9420) Table option for promising that you will never touch a column twice

2015-05-20 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552356#comment-14552356
 ] 

Sylvain Lebresne commented on CASSANDRA-9420:
-

bq.  I was speaking slightly out of context of this ticket.

Then would you mind filing a separate ticket so things are talked of in context?


[jira] [Commented] (CASSANDRA-9420) Table option for promising that you will never touch a column twice

2015-05-19 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549993#comment-14549993
 ] 

Sylvain Lebresne commented on CASSANDRA-9420:
-

bq. The problem stems from the fact that Cassandra plays safe when a TTL has 
expired, and turns it into a tombstone, rather than getting rid of it on the 
spot.

If {{gcGrace == 0}}, which you seem fine with, then we'll get rid of expired
columns on the spot. The fact that expired columns are turned into tombstones
is a technicality that only exists so we keep expired columns for gcGrace
seconds after expiration, but if the latter is 0, they are removed right away.
If you've observed differently in practice, then that would sound like a bug,
and if you provide reproduction steps we can certainly look into it. But,
unless I've misunderstood what you're saying, we don't need yet another option
for this.
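
The timeline described here can be written down as a small Python sketch. It is a simplified model with hypothetical names, times are plain seconds, and the exact boundary comparisons in Cassandra may differ by a second; it also ignores the overlap check against other sstables.

```python
def cell_state(write_time, ttl, gc_grace, now):
    """Classify a TTL'd cell at time `now` (all values in seconds)."""
    expires_at = write_time + ttl
    if now < expires_at:
        return "live"
    if now < expires_at + gc_grace:
        # Expired, but kept around as a tombstone for gc_grace seconds.
        return "tombstone"
    return "purgeable"
```

With gc_grace = 0 the tombstone window is empty, so a cell written at t=0 with a 60 second TTL goes straight from "live" to "purgeable" at t=60. With gc_grace = 3600 the same cell stays a tombstone until t=3660.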


[jira] [Commented] (CASSANDRA-9420) Table option for promising that you will never touch a column twice

2015-05-18 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549363#comment-14549363
 ] 

Aleksey Yeschenko commented on CASSANDRA-9420:
--

Or an {{enforce_default_ttl}} boolean per-table option. 


[jira] [Commented] (CASSANDRA-9420) Table option for promising that you will never touch a column twice

2015-05-18 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549346#comment-14549346
 ] 

Aleksey Yeschenko commented on CASSANDRA-9420:
--

We've discussed this before, when introducing default TTL.

What we should have done back then was to make that option not a default, but
an enforced constraint, so that *every* update/insert always comes with the
same TTL.

I personally wouldn't mind adding a new {{required_ttl}} table option and
having it behave like this.
