[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-09-03 Thread mck (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150446#comment-16150446
 ] 

mck edited comment on CASSANDRA-13418 at 9/4/17 5:31 AM:
-

Updated:
|| branch || testall || dtest ||
| [cassandra-3.11_13418|https://github.com/thelastpickle/cassandra/tree/mck/cassandra-3.11_13418] | [testall|https://circleci.com/gh/thelastpickle/cassandra/tree/mck%2Fcassandra-3.11_13418] | [dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/265] |
| [trunk_13418|https://github.com/thelastpickle/cassandra/tree/mck/trunk_13418] | [testall|https://circleci.com/gh/thelastpickle/cassandra/tree/mck%2Ftrunk_13418] | [dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/267] |


was (Author: michaelsembwever):
Updated:
|| branch || testall || dtest ||
| [cassandra-3.11_13418|https://github.com/thelastpickle/cassandra/tree/mck/cassandra-3.11_13418] | [testall|https://circleci.com/gh/thelastpickle/cassandra/tree/mck%2Fcassandra-3.11_13418] | [dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/265] |
| [trunk_13418|https://github.com/thelastpickle/cassandra/tree/mck/trunk_13418] | [testall|https://circleci.com/gh/thelastpickle/cassandra/tree/mck%2Ftrunk_13418] | [dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/265] |

> Allow TWCS to ignore overlaps when dropping fully expired sstables
> --
>
> Key: CASSANDRA-13418
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13418
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction
>Reporter: Corentin Chary
>Assignee: Romain GERARD
>  Labels: twcs
> Fix For: 3.11.x, 4.x
>
> Attachments: twcs-cleanup.png
>
>
> http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If 
> you really want read-repairs you're going to have sstables blocking the 
> expiration of other fully expired SSTables because they overlap.
> You can set unchecked_tombstone_compaction = true or tombstone_threshold to a 
> very low value and that will purge the blockers of old data that should 
> already have expired, thus removing the overlaps and allowing the other 
> SSTables to expire.
> The thing is that this is rather CPU intensive and not optimal. If you have 
> time series, you might not care if all your data doesn't exactly expire at 
> the right time, or if data re-appears for some time, as long as it gets 
> deleted as soon as it can. And in this situation I believe it would be really 
> beneficial to allow users to simply ignore overlapping SSTables when looking 
> for fully expired ones.
> To the question: why would you need read-repairs ?
> - Full repairs basically take longer than the TTL of the data on my dataset, 
> so this isn't really effective.
> - Even with a 10% chances of doing a repair, we found out that this would be 
> enough to greatly reduce entropy of the most used data (and if you have 
> timeseries, you're likely to have a dashboard doing the same important 
> queries over and over again).
> - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow.
> I'll try to come up with a patch demonstrating how this would work, try it on 
> our system and report the effects.
> cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already.
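
To make the blocking behaviour described above concrete, here is a minimal sketch in Python (not Cassandra's actual Java code). The `SSTable` fields, the `fully_expired` function and the `ignore_overlaps` flag are hypothetical names chosen for illustration, and the overlap rule is a simplification: an expired sstable is held back while any overlapping, still-live sstable contains writes older than the expired sstable's newest write.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SSTable:
    # Hypothetical, simplified view of sstable metadata.
    name: str
    min_timestamp: int      # oldest write timestamp in the sstable
    max_timestamp: int      # newest write timestamp in the sstable
    max_deletion_time: int  # time at which the last cell in the sstable expires


def fully_expired(candidates, overlapping, gc_before, ignore_overlaps=False):
    """Return the candidates that can be dropped whole.

    Assumption: a candidate is expired when everything in it expired
    before gc_before. Unless ignore_overlaps is set, it is also required
    that no overlapping live sstable holds writes older than the
    candidate's newest write (the candidate could be shadowing them).
    """
    expired = [s for s in candidates if s.max_deletion_time < gc_before]
    if ignore_overlaps:
        return expired
    live = [s for s in overlapping if s.max_deletion_time >= gc_before]
    min_live_ts = min((s.min_timestamp for s in live), default=float("inf"))
    return [s for s in expired if s.max_timestamp < min_live_ts]


# An old, fully expired sstable and a newer one that received
# read-repaired data with old timestamps, so the two overlap.
old = SSTable("old", min_timestamp=0, max_timestamp=100, max_deletion_time=1000)
blocker = SSTable("blocker", min_timestamp=50, max_timestamp=500, max_deletion_time=5000)

blocked = fully_expired([old], [blocker], gc_before=2000)
dropped = fully_expired([old], [blocker], gc_before=2000, ignore_overlaps=True)
```

With the overlap check in place the old sstable stays on disk until the blocker itself expires; with the check skipped it is dropped immediately, at the cost that shadowed data may re-appear for some time.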



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-09-01 Thread mck (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16151196#comment-16151196
 ] 

mck edited comment on CASSANDRA-13418 at 9/1/17 9:39 PM:
-

{quote}P.s: 
https://github.com/thelastpickle/cassandra/commit/58440e707cd6490847a37dc8d76c150d3eb27aab#diff-e8e282423dcbf34d30a3578c8dec15cdR176
 still think is less clear to inline it.{quote}

I agree, but found no clear method name to use. As Marcus commented, 
{{getFullyExpiredSSTables(..)}} isn't appropriate.
Any suggestions for a clear name? Otherwise the method is 70 lines long, 
not great but no disaster, so I'm ok either way.


was (Author: michaelsembwever):
{quote}P.s: 
https://github.com/thelastpickle/cassandra/commit/58440e707cd6490847a37dc8d76c150d3eb27aab#diff-e8e282423dcbf34d30a3578c8dec15cdR176
 still think is less clear to inline it.{quote}

I agree, but found not clear method name to use. As Marcus' comments, 
{{getFullyExpiredSSTables(..)}} isn't appropriate.
Any suggestions for a clear name? Otherwise the method is at 70 lines length, 
not great but no disaster, so i'm ok either way.


[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-09-01 Thread Romain GERARD (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150559#comment-16150559
 ] 

Romain GERARD edited comment on CASSANDRA-13418 at 9/1/17 1:56 PM:
---

Don't worry [~michaelsembwever], I am currently working on an issue with 
Couchbase so I couldn't have checked it until Monday. So no hard feelings :)


P.s: 
https://github.com/thelastpickle/cassandra/commit/58440e707cd6490847a37dc8d76c150d3eb27aab#diff-e8e282423dcbf34d30a3578c8dec15cdR176
 I still think it is less clear to inline it.


was (Author: rgerard):
Don't worry [~michaelsembwever], I am currently working with an issue on 
couchbase so I couldn't have checked it until monday. So no hard feeling :)


P.s: 
https://github.com/thelastpickle/cassandra/commit/58440e707cd6490847a37dc8d76c150d3eb27aab#diff-e8e282423dcbf34d30a3578c8dec15cdR176
 still think is less clear to inline it.


[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-09-01 Thread Romain GERARD (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150559#comment-16150559
 ] 

Romain GERARD edited comment on CASSANDRA-13418 at 9/1/17 1:51 PM:
---

Don't worry [~michaelsembwever], I am currently working with an issue on 
couchbase so I couldn't have checked it until monday. So no hard feeling :)


P.s: 
https://github.com/thelastpickle/cassandra/commit/58440e707cd6490847a37dc8d76c150d3eb27aab#diff-e8e282423dcbf34d30a3578c8dec15cdR176
 still think is less clear to inline it.


was (Author: rgerard):
Don't worry [~mck], I am currently working with an issue on couchbase so I 
couldn't have checked it until monday. So no hard feeling :)


P.s: 
https://github.com/thelastpickle/cassandra/commit/58440e707cd6490847a37dc8d76c150d3eb27aab#diff-e8e282423dcbf34d30a3578c8dec15cdR176
 still think is less clear to inline it.


[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-09-01 Thread mck (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150409#comment-16150409
 ] 

mck edited comment on CASSANDRA-13418 at 9/1/17 12:15 PM:
--

[~rgerard], I failed to see your last comment until now.
I've addressed [~krummas]'s concerns 
[here|https://github.com/thelastpickle/cassandra/commit/58440e707cd6490847a37dc8d76c150d3eb27aab],
 but feel terrible now for stepping on your toes.

A few code style issues beyond the braces have been fixed. Thanks for the 
pushback, Marcus!
For example, I changed the names of the constants in 
{{TimeWindowCompactionStrategyOptions}} to be more in line with the existing 
constants there.

Two additions have been made to the tests in {{TimeWindowCompactionStrategyTest}}: 
one for {{TimeWindowCompactionStrategyOptions.validateOptions}}, which is 
only there for the tests, and a new test method that does what Marcus asks 
for. ([~krummas], do you still want a dtest?)


was (Author: michaelsembwever):
[~rgerard], i failed to see your last comment til now.
I've addressed [~krummas]'s concerns 
[here|https://github.com/thelastpickle/cassandra/commit/17b1d30ac8f07c49bfc4d51b14d3201cc969fcfe],
 but feel terrible now for stepping on your toes.

A few code style issues beyond the braces have been fixed. Thanks for the push 
back Marcus!
For example, I change the names of the constants in 
{{TimeWindowCompactionStrategyOptions}} to be more in align with the previous 
constants there.

Two additions to the tests in {{TimeWindowCompactionStrategyTest}} are added. 
One for the {{TimeWindowCompactionStrategyOptions.validateOptions}} which is 
only there for the tests, and a new test method which does what Marcus asks 
for. ([~krummas], do you still want a dtest?)


[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-09-01 Thread mck (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150409#comment-16150409
 ] 

mck edited comment on CASSANDRA-13418 at 9/1/17 12:06 PM:
--

[~rgerard], i failed to see your last comment til now.
I've addressed [~krummas]'s concerns 
[here|https://github.com/thelastpickle/cassandra/commit/17b1d30ac8f07c49bfc4d51b14d3201cc969fcfe],
 but feel terrible now for stepping on your toes.

A few code style issues beyond the braces have been fixed. Thanks for the push 
back Marcus!
For example, I change the names of the constants in 
{{TimeWindowCompactionStrategyOptions}} to be more in align with the previous 
constants there.

Two additions to the tests in {{TimeWindowCompactionStrategyTest}} are added. 
One for the {{TimeWindowCompactionStrategyOptions.validateOptions}} which is 
only there for the tests, and a new test method which does what Marcus asks 
for. ([~krummas], do you still want a dtest?)


was (Author: michaelsembwever):
[~rgerard], i failed to see your last comment til now.
I've addressed [~krummas]'s concerns 
[here|https://github.com/thelastpickle/cassandra/commit/17b1d30ac8f07c49bfc4d51b14d3201cc969fcfe],
 but feel terrible now for stepping on your toes.

A few code style issues beyond the braces have been fixed. Thanks for the push 
back Marcus!
For example, I change the names of the constants in 
{{TimeWindowCompactionStrategyOptions}} to be more in align with the previous 
constants there.

Two additions to the tests in {{TimeWindowCompactionStrategyTest}} are added. 
One for the {{TimeWindowCompactionStrategyOptions.validateOptions}} which is 
only there for the tests, and a new test method which does what Marcus asks 
for. ([~krummas], do you still want a dtest still warranted?)


[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-09-01 Thread mck (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150409#comment-16150409
 ] 

mck edited comment on CASSANDRA-13418 at 9/1/17 12:05 PM:
--

[~rgerard], i failed to see your last comment til now.
I've addressed [~krummas]'s concerns 
[here|https://github.com/thelastpickle/cassandra/commit/17b1d30ac8f07c49bfc4d51b14d3201cc969fcfe],
 but feel terrible now for stepping on your toes.

A few code style issues beyond the braces have been fixed. Thanks for the push 
back Marcus!
For example, I change the names of the constants in 
{{TimeWindowCompactionStrategyOptions}} to be more in align with the previous 
constants there.

Two additions to the tests in {{TimeWindowCompactionStrategyTest}} are added. 
One for the {{TimeWindowCompactionStrategyOptions.validateOptions}} which is 
only there for the tests, and a new test method which does what Marcus asks 
for. ([~krummas], do you still want a dtest still warranted?)


was (Author: michaelsembwever):
[~rgerard], i failed to see your last comment til now.
I've addressed [~krummas]'s concerns 
[here|https://github.com/thelastpickle/cassandra/commit/17b1d30ac8f07c49bfc4d51b14d3201cc969fcfe],
 but feel terrible now for stepping on your toes.

A few code style issues beyond the braces have been fixed. Thanks for the push 
back Marcus!
For example, I change the names of the constants in 
{{TimeWindowCompactionStrategyOptions}} to be more in align with the previous 
constants there.

Two additions to the tests in {{TimeWindowCompactionStrategyTest}} are added. 
One for the {{TimeWindowCompactionStrategyTest.validateOptions}} which is only 
there for the tests, and a new test method which does what Marcus asks for. 
([~krummas], do you still want a dtest still warranted?)


[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-08-30 Thread mck (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16147112#comment-16147112
 ] 

mck edited comment on CASSANDRA-13418 at 8/30/17 12:25 PM:
---

[~KurtG], thanks for the write-up of the two use cases. At least it's documented 
here to begin with.
The "more compactions" in the first scenario also depend on the values of 
tombstone_threshold and tombstone_compaction_interval. Because of this I'm now 
sitting on the fence as to whether the logged warning should be in the patch. 
In certain situations the second use case is actually an optimisation.
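
As a rough illustration of how tombstone_threshold and tombstone_compaction_interval gate single-sstable tombstone compactions, here is a simplified Python sketch. The function name, its parameters and the default values shown are assumptions for illustration only (check the documentation for your version), and the handling of overlaps is deliberately cruder than what the real strategy does.

```python
def worth_tombstone_compaction(
    droppable_ratio: float,
    sstable_age_seconds: int,
    has_overlaps: bool,
    tombstone_threshold: float = 0.2,            # assumed default
    tombstone_compaction_interval: int = 86400,  # assumed default, in seconds
    unchecked_tombstone_compaction: bool = False,
) -> bool:
    """Decide whether an sstable is worth compacting on its own.

    Hypothetical simplification of the decision, not the real code path.
    """
    # Too young: the interval has not elapsed since the sstable was written.
    if sstable_age_seconds < tombstone_compaction_interval:
        return False
    # Not enough droppable tombstones to justify the work.
    if droppable_ratio <= tombstone_threshold:
        return False
    # The operator asked to skip the overlap check entirely.
    if unchecked_tombstone_compaction:
        return True
    # Assumption: treat any overlap as blocking; the real check is finer grained.
    return not has_overlaps


two_days = 2 * 86400
blocked = worth_tombstone_compaction(0.5, two_days, has_overlaps=True)
forced = worth_tombstone_compaction(
    0.5, two_days, has_overlaps=True, unchecked_tombstone_compaction=True
)
```

Lowering the threshold or the interval makes more sstables eligible, which is why the number of extra compactions in the first scenario depends on both settings.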

Patches updated:
 - the new TWCS* classes renamed to TimeWindow*, as that's the standard prefix,
 - log message shortened a little, and now using the terminology (property names) 
known to the operator,
 - logging the message via the NoSpamLogger (max one line every 15 minutes), and
 - site and cql docs updated (just in trunk)
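
The idea of logging a warning at most once every 15 minutes can be sketched as follows. This is a stand-alone Python illustration of rate-limited logging, not the API of Cassandra's NoSpamLogger; the class name, the constructor parameters and the example message are all hypothetical.

```python
import time


class RateLimitedLogger:
    """Emit a given message at most once per interval (illustrative only)."""

    def __init__(self, min_interval_seconds, clock=time.monotonic, sink=print):
        self._min_interval = min_interval_seconds
        self._clock = clock  # injectable so the behaviour can be tested
        self._sink = sink    # where accepted messages are written
        self._last = {}      # message -> time it was last emitted

    def warn(self, message):
        """Log the message unless it was logged too recently.

        Returns True if the message was emitted, False if suppressed.
        """
        now = self._clock()
        last = self._last.get(message)
        if last is not None and now - last < self._min_interval:
            return False
        self._last[message] = now
        self._sink(message)
        return True


# Usage with a fake clock: one line per 15 minutes (900 seconds).
ticks = [0.0]
lines = []
log = RateLimitedLogger(900, clock=lambda: ticks[0], sink=lines.append)
first = log.warn("ignoring overlaps is enabled")   # emitted
ticks[0] = 60.0
second = log.warn("ignoring overlaps is enabled")  # suppressed
ticks[0] = 900.0
third = log.warn("ignoring overlaps is enabled")   # emitted again
```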

Shall I just remove the log warning altogether now that it's in the docs?

|| branch || testall || dtest ||
| [cassandra-3.11_13418|https://github.com/thelastpickle/cassandra/tree/mck/cassandra-3.11_13418] | [testall|https://circleci.com/gh/thelastpickle/cassandra/tree/mck%2Fcassandra-3.11_13418] | [dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/241] |
| [trunk_13418|https://github.com/thelastpickle/cassandra/tree/mck/trunk_13418] | [testall|https://circleci.com/gh/thelastpickle/cassandra/tree/mck%2Ftrunk_13418] | [dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/241] |


was (Author: michaelsembwever):
[~KurtG], thanks for the two use cases write up. at least it's documented here 
to begin with.
The "more compactions" in the first scenario also depends on values for 
tombstone_threshold and tombstone_compaction_interval. Because of this I'm now 
sitting on the fence for whether the logged warning should be in the patch. 

Patches updated:
 - the new TWCS* classes renamed to TimeWindow*, as that's the standard prefix,
 - log message shorten a little, and using the terminology (property names) as 
known by the operator,
 - logging the message via the NoSpamLogger (max one line every 15 minutes), and
 - site and cql docs updated (just in trunk)

Shall I just remove the log warning altogether now it's in the docs

|| branch || testall || dtest ||
| [cassandra-3.11_13418|https://github.com/thelastpickle/cassandra/tree/mck/cassandra-3.11_13418] | [testall|https://circleci.com/gh/thelastpickle/cassandra/tree/mck%2Fcassandra-3.11_13418] | [dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/241] |
| [trunk_13418|https://github.com/thelastpickle/cassandra/tree/mck/trunk_13418] | [testall|https://circleci.com/gh/thelastpickle/cassandra/tree/mck%2Ftrunk_13418] | [dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/241] |


[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-08-30 Thread mck (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16147112#comment-16147112
 ] 

mck edited comment on CASSANDRA-13418 at 8/30/17 11:55 AM:
---

[~KurtG], thanks for the two use cases write up. at least it's documented here 
to begin with.
The "more compactions" in the first scenario also depends on values for 
tombstone_threshold and tombstone_compaction_interval. Because of this I'm now 
sitting on the fence for whether the logged warning should be in the patch. 

Patches updated:
 - the new TWCS* classes renamed to TimeWindow*, as that's the standard prefix,
 - log message shorten a little, and using the terminology (property names) as 
known by the operator,
 - logging the message via the NoSpamLogger (max one line every 15 minutes), and
 - site and cql docs updated (just in trunk)

Shall I just remove the log warning altogether now it's in the docs

|| branch || testall || dtest ||
| [cassandra-3.11_13418|https://github.com/thelastpickle/cassandra/tree/mck/cassandra-3.11_13418] | [testall|https://circleci.com/gh/thelastpickle/cassandra/tree/mck%2Fcassandra-3.11_13418] | [dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/241] |
| [trunk_13418|https://github.com/thelastpickle/cassandra/tree/mck/trunk_13418] | [testall|https://circleci.com/gh/thelastpickle/cassandra/tree/mck%2Ftrunk_13418] | [dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/241] |


was (Author: michaelsembwever):
[~KurtG], thanks for writing up the two use cases. At least they're documented 
here to begin with.
The "more compactions" in the first scenario also depends on the values of 
tombstone_threshold and tombstone_compaction_interval. I'm sitting on the fence 
as to whether even the logged warning should be in the patch. 

Patches updated:
 - the new TWCS* classes to TimeWindow*, as that's the standard prefix,
 - log message shortened a little, and now using the terminology (property 
names) known by the operator,
 - logging the message via the NoSpamLogger (max one line every 15 minutes), and
 - site and cql docs updated (just in trunk)

Shall I just remove the log warning altogether now that it's in the docs?

|| branch || testall || dtest ||
| [cassandra-3.11_13418|https://github.com/thelastpickle/cassandra/tree/mck/cassandra-3.11_13418] | [testall|https://circleci.com/gh/thelastpickle/cassandra/tree/mck%2Fcassandra-3.11_13418] | [dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/241] |
| [trunk_13418|https://github.com/thelastpickle/cassandra/tree/mck/trunk_13418] | [testall|https://circleci.com/gh/thelastpickle/cassandra/tree/mck%2Ftrunk_13418] | [dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/241] |

> Allow TWCS to ignore overlaps when dropping fully expired sstables
> --
>
> Key: CASSANDRA-13418
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13418
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction
>Reporter: Corentin Chary
>Assignee: Romain GERARD
>  Labels: twcs
> Attachments: twcs-cleanup.png
>
>
> http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If 
> you really want read-repairs you're going to have sstables blocking the 
> expiration of other fully expired SSTables because they overlap.
> You can set unchecked_tombstone_compaction = true or tombstone_threshold to a 
> very low value and that will purge the blockers of old data that should 
> already have expired, thus removing the overlaps and allowing the other 
> SSTables to expire.
> The thing is that this is rather CPU intensive and not optimal. If you have 
> time series, you might not care if all your data doesn't exactly expire at 
> the right time, or if data re-appears for some time, as long as it gets 
> deleted as soon as it can. And in this situation I believe it would be really 
> beneficial to allow users to simply ignore overlapping SSTables when looking 
> for fully expired ones.
> To the question: why would you need read-repairs ?
> - Full repairs basically take longer than the TTL of the data on my dataset, 
> so this isn't really effective.
> - Even with a 10% chance of doing a repair, we found that this would be 
> enough to greatly reduce the entropy of the most-used data (and if you have 
> time series, you're likely to have a dashboard doing the same important 
> queries over and over again).
> - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow.
> I'll try to come up with a patch demonstrating how this would work, try it on 
> our system and report the effects.
> cc: [~adejanovski], [~rgerard] 
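
The blocking effect described in the issue can be illustrated with a small sketch. Everything here is a simplification under stated assumptions: the class {{ExpirationSketch}}, its {{SSTable}} holder and the method {{fullyExpired}} are hypothetical names, every sstable is assumed to overlap every other one, and memtables are ignored. It only shows the idea that an sstable whose data has all expired is still kept while a live sstable may hold older timestamps, unless overlaps are ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified sketch; the names below are hypothetical, not Cassandra's API.
final class ExpirationSketch
{
    // The three pieces of sstable metadata this sketch relies on.
    static final class SSTable
    {
        final String name;
        final long minTimestamp;        // oldest write timestamp in the sstable
        final long maxTimestamp;        // newest write timestamp in the sstable
        final int maxLocalDeletionTime; // time (seconds) at which its last cell expires

        SSTable(String name, long minTimestamp, long maxTimestamp, int maxLocalDeletionTime)
        {
            this.name = name;
            this.minTimestamp = minTimestamp;
            this.maxTimestamp = maxTimestamp;
            this.maxLocalDeletionTime = maxLocalDeletionTime;
        }
    }

    // Assumption: all sstables overlap each other (same token range).
    static List<String> fullyExpired(List<SSTable> sstables, int gcBefore, boolean ignoreOverlaps)
    {
        // Oldest write timestamp held by any sstable that still contains live data.
        long minLiveTimestamp = Long.MAX_VALUE;
        for (SSTable s : sstables)
            if (s.maxLocalDeletionTime >= gcBefore)
                minLiveTimestamp = Math.min(minLiveTimestamp, s.minTimestamp);

        List<String> expired = new ArrayList<>();
        for (SSTable s : sstables)
        {
            boolean everythingExpired = s.maxLocalDeletionTime < gcBefore;
            // A live sstable with older timestamps blocks the drop unless overlaps are ignored.
            boolean blocked = !ignoreOverlaps && s.maxTimestamp >= minLiveTimestamp;
            if (everythingExpired && !blocked)
                expired.add(s.name);
        }
        return expired;
    }

    public static void main(String[] args)
    {
        List<SSTable> sstables = Arrays.asList(
            new SSTable("older-window", 10, 90, 400),
            new SSTable("old-window", 100, 200, 500),
            // Written recently by a read-repair, but carrying an old write timestamp.
            new SSTable("read-repaired", 150, 900, 2000));

        System.out.println(fullyExpired(sstables, 1000, false)); // [older-window]
        System.out.println(fullyExpired(sstables, 1000, true));  // [older-window, old-window]
    }
}
```

In the example, "old-window" is entirely past its TTL but cannot be dropped by default because the read-repaired sstable contains a timestamp inside its range. Ignoring overlaps drops it immediately, at the price noted in the issue that data may re-appear for some time.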

[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-08-29 Thread Romain GERARD (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16144901#comment-16144901
 ] 

Romain GERARD edited comment on CASSANDRA-13418 at 8/29/17 8:15 AM:


bq. 2. only enabling unsafe_aggressive_sstable_expiration

When looking for expired sstables, you will *ignore* the overlaps and only 
check locally whether the current sstable is eligible.
When looking for sstables to compact, you will *not ignore* the overlaps and 
will check globally whether the current sstable is eligible.

In this case, you will most likely not trigger any compaction to purge 
tombstones if you run into an overlap.


bq. 1. enabling both
When looking for expired sstables, you will *ignore* the overlaps and only 
check locally whether the current sstable is eligible.
When looking for sstables to compact, you will *ignore* the overlaps and 
check locally whether the current sstable is eligible.

In this case, you will always trigger a compaction to purge tombstones, even 
if you run into an overlap.
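
The "looking for sstables to compact" half of the two scenarios can be sketched as below. This is a simplified illustration, not the actual Cassandra code: {{TombstoneCompactionSketch}} and {{worthCompacting}} are hypothetical names, the threshold and interval values are example inputs, and when overlaps exist the sketch simply refuses to compact, whereas a real implementation may make a finer estimate.

```java
// Simplified sketch of a single-sstable tombstone compaction check.
// Names are hypothetical; this is not Cassandra's implementation.
final class TombstoneCompactionSketch
{
    final double tombstoneThreshold;               // cf. tombstone_threshold
    final long tombstoneCompactionIntervalSeconds; // cf. tombstone_compaction_interval
    final boolean uncheckedTombstoneCompaction;    // cf. unchecked_tombstone_compaction

    TombstoneCompactionSketch(double threshold, long intervalSeconds, boolean unchecked)
    {
        this.tombstoneThreshold = threshold;
        this.tombstoneCompactionIntervalSeconds = intervalSeconds;
        this.uncheckedTombstoneCompaction = unchecked;
    }

    boolean worthCompacting(long sstableAgeSeconds, double droppableRatio, boolean hasOverlaps)
    {
        // Too young: the interval has not elapsed since the sstable was written.
        if (sstableAgeSeconds < tombstoneCompactionIntervalSeconds)
            return false;
        // Not enough droppable tombstones to be worth the work.
        if (droppableRatio <= tombstoneThreshold)
            return false;
        // "Local" check only: overlaps are ignored.
        if (uncheckedTombstoneCompaction)
            return true;
        // "Global" check: an overlapping sstable may prevent tombstones from being purged.
        return !hasOverlaps;
    }

    public static void main(String[] args)
    {
        TombstoneCompactionSketch checkedOnly = new TombstoneCompactionSketch(0.2, 86400, false);
        TombstoneCompactionSketch unchecked = new TombstoneCompactionSketch(0.2, 86400, true);

        // Scenario 2 (only the expiration toggle): an overlap prevents the compaction.
        System.out.println(checkedOnly.worthCompacting(90000, 0.5, true)); // false
        // Scenario 1 (both enabled): the compaction runs despite the overlap.
        System.out.println(unchecked.worthCompacting(90000, 0.5, true));   // true
    }
}
```

The sketch also shows why the number of compactions depends on tombstone_threshold and tombstone_compaction_interval: both gates are evaluated before the overlap question is even reached.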

-


I made a new version of the patch that no longer enables 
uncheckedTombstoneCompaction automatically and logs a warning message instead.
https://github.com/criteo-forks/cassandra/commit/1800b23ddfbb308645c44022e15c1760a0124025
The diff:

{noformat}
diff --git a/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java b/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java
index 43c90c7042..d21222c484 100644
--- a/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java
+++ b/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java
@@ -67,9 +67,9 @@ public class TimeWindowCompactionStrategy extends AbstractCompactionStrategy
         else
             logger.debug("Enabling tombstone compactions for TWCS");

-        if (this.options.ignoreOverlaps)
-            this.uncheckedTombstoneCompaction = true;
-
+        if(this.options.ignoreOverlaps && !this.uncheckedTombstoneCompaction) {
+            logger.warn("You are running with sstables overlapping checks disabled but without unchecked tombstone compaction, check that this is what you want");
+        }
     }
{noformat}
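
As a standalone illustration of the condition in the diff above, the following sketch isolates the check so that all four option combinations can be tried. The class {{TwcsOptionCheck}} and the method {{warningFor}} are hypothetical names introduced here; only the condition and the message text are taken from the diff.

```java
import java.util.Optional;

// Hypothetical helper isolating the condition from the diff above.
final class TwcsOptionCheck
{
    // Returns the warning text when overlap checks are disabled but
    // unchecked tombstone compaction is not enabled; empty otherwise.
    static Optional<String> warningFor(boolean ignoreOverlaps, boolean uncheckedTombstoneCompaction)
    {
        if (ignoreOverlaps && !uncheckedTombstoneCompaction)
            return Optional.of("You are running with sstables overlapping checks disabled "
                               + "but without unchecked tombstone compaction, "
                               + "check that this is what you want");
        return Optional.empty();
    }

    public static void main(String[] args)
    {
        boolean[] values = { false, true };
        for (boolean ignoreOverlaps : values)
            for (boolean unchecked : values)
                System.out.println("ignoreOverlaps=" + ignoreOverlaps
                                   + " unchecked=" + unchecked
                                   + " -> " + warningFor(ignoreOverlaps, unchecked).orElse("(no warning)"));
    }
}
```

Only one of the four combinations produces the warning, which matches the intent of leaving the decision to the operator rather than silently flipping the second option.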






[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-08-29 Thread Romain GERARD (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16144901#comment-16144901
 ] 

Romain GERARD edited comment on CASSANDRA-13418 at 8/29/17 8:11 AM:


bq. 2. only enabling unsafe_aggressive_sstable_expiration

When looking for expired sstables, you will *ignore* the overlaps and only 
check locally whether the current sstable is eligible.
When looking for sstables to compact, you will *not ignore* the overlaps and 
will check globally whether the current sstable is eligible.

bq. 1. enabling both
When looking for expired sstables, you will *ignore* the overlaps and only 
check locally whether the current sstable is eligible.
When looking for sstables to compact, you will *ignore* the overlaps and 
check locally whether the current sstable is eligible.


-


I made a new version of the patch that no longer enables 
uncheckedTombstoneCompaction automatically and logs a warning message instead.
https://github.com/criteo-forks/cassandra/commit/1800b23ddfbb308645c44022e15c1760a0124025
The diff:

{noformat}
diff --git a/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java b/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java
index 43c90c7042..d21222c484 100644
--- a/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java
+++ b/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java
@@ -67,9 +67,9 @@ public class TimeWindowCompactionStrategy extends AbstractCompactionStrategy
         else
             logger.debug("Enabling tombstone compactions for TWCS");

-        if (this.options.ignoreOverlaps)
-            this.uncheckedTombstoneCompaction = true;
-
+        if(this.options.ignoreOverlaps && !this.uncheckedTombstoneCompaction) {
+            logger.warn("You are running with sstables overlapping checks disabled but without unchecked tombstone compaction, check that this is what you want");
+        }
     }
{noformat}






[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-08-29 Thread Romain GERARD (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16144901#comment-16144901
 ] 

Romain GERARD edited comment on CASSANDRA-13418 at 8/29/17 8:11 AM:


bq. 2. only enabling unsafe_aggressive_sstable_expiration

When looking for expired sstables, you will *ignore* the overlaps and only 
check locally whether the current sstable is eligible.
When looking for sstables to compact, you will *not ignore* the overlaps and 
will check globally whether the current sstable is eligible.

bq. 1. enabling both
When looking for expired sstables, you will *ignore* the overlaps and only 
check locally whether the current sstable is eligible.
When looking for sstables to compact, you will *ignore* the overlaps and 
check locally whether the current sstable is eligible.


-


I made a new version of the patch that no longer enables 
uncheckedTombstoneCompaction automatically and logs a warning message instead.
https://github.com/criteo-forks/cassandra/commit/1800b23ddfbb308645c44022e15c1760a0124025
The diff:

{noformat}
diff --git a/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java b/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java
index 43c90c7042..d21222c484 100644
--- a/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java
+++ b/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java
@@ -67,9 +67,9 @@ public class TimeWindowCompactionStrategy extends AbstractCompactionStrategy
         else
             logger.debug("Enabling tombstone compactions for TWCS");

-        if (this.options.ignoreOverlaps)
-            this.uncheckedTombstoneCompaction = true;
-
+        if(this.options.ignoreOverlaps && !this.uncheckedTombstoneCompaction) {
+            logger.warn("You are running with sstables overlapping checks disabled but without unchecked tombstone compaction, check that this is what you want");
+        }
     }
{noformat}






[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-08-29 Thread Romain GERARD (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16144901#comment-16144901
 ] 

Romain GERARD edited comment on CASSANDRA-13418 at 8/29/17 8:10 AM:


bq. 2. only enabling unsafe_aggressive_sstable_expiration

When looking for expired sstables, you will *ignore* the overlaps and only 
check locally whether the current sstable is eligible.
When looking for sstables to compact, you will *not ignore* the overlaps and 
will check globally whether the current sstable is eligible.

bq. 1. enabling both
When looking for expired sstables, you will *ignore* the overlaps and only 
check locally whether the current sstable is eligible.
When looking for sstables to compact, you will *ignore* the overlaps and 
check locally whether the current sstable is eligible.


-


I made a new version of the patch that no longer enables 
uncheckedTombstoneCompaction automatically and logs a warning message instead.
https://github.com/criteo-forks/cassandra/commit/1800b23ddfbb308645c44022e15c1760a0124025
The diff:

{noformat}
diff --git a/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java b/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java
index 43c90c7042..d21222c484 100644
--- a/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java
+++ b/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java
@@ -67,9 +67,9 @@ public class TimeWindowCompactionStrategy extends AbstractCompactionStrategy
         else
             logger.debug("Enabling tombstone compactions for TWCS");

-        if (this.options.ignoreOverlaps)
-            this.uncheckedTombstoneCompaction = true;
-
+        if(this.options.ignoreOverlaps && !this.uncheckedTombstoneCompaction) {
+            logger.warn("You are running with sstables overlapping checks disabled but without unchecked tombstone compaction, check that this is what you want");
+        }
     }
{noformat}






[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-08-29 Thread Romain GERARD (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16144901#comment-16144901
 ] 

Romain GERARD edited comment on CASSANDRA-13418 at 8/29/17 8:09 AM:


bq. 2. only enabling unsafe_aggressive_sstable_expiration

When looking for expired sstables, you will *ignore* the overlaps and only 
check locally whether the current sstable is eligible.
When looking for sstables to compact, you will *not ignore* the overlaps and 
will check globally whether the current sstable is eligible.

bq. 1. enabling both
When looking for expired sstables, you will *ignore* the overlaps and only 
check locally whether the current sstable is eligible.
When looking for sstables to compact, you will *ignore* the overlaps and 
check locally whether the current sstable is eligible.


I made a new version of the patch that no longer enables 
uncheckedTombstoneCompaction automatically and logs a warning message instead.
https://github.com/criteo-forks/cassandra/commit/800ab325cbf7d9d4d5e60e2b959918426e121815

The diff:

{noformat}
diff --git a/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java b/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java
index 43c90c7042..d21222c484 100644
--- a/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java
+++ b/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java
@@ -67,9 +67,9 @@ public class TimeWindowCompactionStrategy extends AbstractCompactionStrategy
         else
             logger.debug("Enabling tombstone compactions for TWCS");

-        if (this.options.ignoreOverlaps)
-            this.uncheckedTombstoneCompaction = true;
-
+        if(this.options.ignoreOverlaps && !this.uncheckedTombstoneCompaction) {
+            logger.warn("You are running with sstables overlapping checks disabled but without unchecked tombstone compaction, check that this what you want");
+        }
     }
{noformat}



was (Author: rgerard):
2. only enabling unsafe_aggressive_sstable_expiration

When looking for expired sstables, you will *ignore* the overlaps and only 
check locally whether the current sstable is eligible.
When looking for sstables to compact, you will *not ignore* the overlaps and 
will check globally whether the current sstable is eligible.

bq. 1. enabling both
When looking for expired sstables, you will *ignore* the overlaps and only 
check locally whether the current sstable is eligible.
When looking for sstables to compact, you will *ignore* the overlaps and 
check locally whether the current sstable is eligible.


I made a new version of the patch that no longer enables 
uncheckedTombstoneCompaction automatically and logs a warning message instead.
https://github.com/criteo-forks/cassandra/commit/800ab325cbf7d9d4d5e60e2b959918426e121815

The diff:

{noformat}
diff --git a/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java b/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java
index 43c90c7042..d21222c484 100644
--- a/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java
+++ b/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java
@@ -67,9 +67,9 @@ public class TimeWindowCompactionStrategy extends AbstractCompactionStrategy
         else
             logger.debug("Enabling tombstone compactions for TWCS");

-        if (this.options.ignoreOverlaps)
-            this.uncheckedTombstoneCompaction = true;
-
+        if(this.options.ignoreOverlaps && !this.uncheckedTombstoneCompaction) {
+            logger.warn("You are running with sstables overlapping checks disabled but without unchecked tombstone compaction, check that this what you want");
+        }
     }
{noformat}



[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-08-29 Thread Romain GERARD (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16144901#comment-16144901
 ] 

Romain GERARD edited comment on CASSANDRA-13418 at 8/29/17 8:09 AM:


bq. 2. only enabling unsafe_aggressive_sstable_expiration

When looking for expired sstables, you will *ignore* the overlaps and only 
check locally whether the current sstable is eligible.
When looking for sstables to compact, you will *not ignore* the overlaps and 
will check globally whether the current sstable is eligible.

bq. 1. enabling both
When looking for expired sstables, you will *ignore* the overlaps and only 
check locally whether the current sstable is eligible.
When looking for sstables to compact, you will *ignore* the overlaps and 
check locally whether the current sstable is eligible.


-


I made a new version of the patch that no longer enables 
uncheckedTombstoneCompaction automatically and logs a warning message instead.
https://github.com/criteo-forks/cassandra/commit/800ab325cbf7d9d4d5e60e2b959918426e121815

The diff:

{noformat}
diff --git a/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java b/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java
index 43c90c7042..d21222c484 100644
--- a/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java
+++ b/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java
@@ -67,9 +67,9 @@ public class TimeWindowCompactionStrategy extends AbstractCompactionStrategy
         else
             logger.debug("Enabling tombstone compactions for TWCS");

-        if (this.options.ignoreOverlaps)
-            this.uncheckedTombstoneCompaction = true;
-
+        if(this.options.ignoreOverlaps && !this.uncheckedTombstoneCompaction) {
+            logger.warn("You are running with sstables overlapping checks disabled but without unchecked tombstone compaction, check that this what you want");
+        }
     }
{noformat}






[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-08-27 Thread mck (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16142240#comment-16142240
 ] 

mck edited comment on CASSANDRA-13418 at 8/27/17 9:22 PM:
--

{quote}N.B: I tried to apply the style guide found in 
.idea/codeStyleSettings.xml but it changes a lot of things. Do you know 
if it is up to date?{quote}
I don't use IntelliJ so I can't answer that for you, sorry. [~krummas]?
Otherwise you can ask on IRC #cassandra or on the user mailing list.

{quote}I enable uncheckedTombstoneCompaction when ignoreOverlaps is 
activated{quote}
I'm -1 on this for the moment. While the argument is logical, as you explain, 
it's not intuitive for the user. The user has to know that this happens (via 
docs or via code). I'd be more comfortable expecting users of an 
advanced toggle like this (requires system properties and a table option) to 
appreciate the difference between {{uncheckedTombstoneCompaction}} and 
{{unsafe_aggressive_sstable_expiration}} and to enable both. Any smarts can be 
added later on with further user feedback and experience.

Could we, instead of setting {{uncheckedTombstoneCompaction}}, log a warning 
telling the user that they probably want {{uncheckedTombstoneCompaction}} 
set as well?
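
The suggestion above (warn, but leave the option alone) can be sketched as a small pure function. This is only an illustration under assumptions, not the patch itself: the class and method names ({{OverlapOptionCheck}}, {{warningsFor}}) and the message wording are hypothetical, and the real strategy would log through its own logger instead of returning strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Cassandra: models the proposed behaviour of
// warning the user instead of silently enabling unchecked tombstone compaction.
final class OverlapOptionCheck
{
    static final String WARNING =
        "Overlap checks are disabled but unchecked tombstone compaction is not enabled; "
        + "check that this is what you want";

    // Returns the warnings to log for a given combination of the two options.
    // Neither option is modified, so the user stays in control of both toggles.
    static List<String> warningsFor(boolean ignoreOverlaps, boolean uncheckedTombstoneCompaction)
    {
        List<String> warnings = new ArrayList<>();
        if (ignoreOverlaps && !uncheckedTombstoneCompaction)
            warnings.add(WARNING);
        return warnings;
    }

    public static void main(String[] args)
    {
        // Only the "ignore overlaps without unchecked tombstone compaction" case warns.
        for (String w : warningsFor(true, false))
            System.out.println("WARN " + w);
    }
}
```

Keeping the check free of side effects makes the "no hidden option changes" intent explicit: the two settings are read, never written.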



> Allow TWCS to ignore overlaps when dropping fully expired sstables
> --
>
> Key: CASSANDRA-13418
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13418
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction
>Reporter: Corentin Chary
>  Labels: twcs
> Attachments: twcs-cleanup.png
>
>
> http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If 
> you really want read-repairs you're going to have sstables blocking the 
> expiration of other fully expired SSTables because they overlap.
> You can set unchecked_tombstone_compaction = true or tombstone_threshold to a 
> very low value and that will purge the blockers of old data that should 
> already have expired, thus removing the overlaps and allowing the other 
> SSTables to expire.
> The thing is that this is rather CPU intensive and not optimal. If you have 
> time series, you might not care if all your data doesn't exactly expire at 
> the right time, or if data re-appears for some time, as long as it gets 
> deleted as soon as it can. And in this situation I believe it would be really 
> beneficial to allow users to simply ignore overlapping SSTables when looking 
> for fully expired ones.
> To the question: why would you need read-repairs ?
> - Full repairs basically take longer than the TTL of the data on my dataset, 
> so this isn't really effective.
> - Even with a 10% chances of doing a repair, we found out that this would be 
> enough to greatly reduce entropy of the most used data (and if you have 
> timeseries, you're likely to have a dashboard doing the same important 
> queries over and over again).
> - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow.
> I'll try to come up with a patch demonstrating how this would work, try it on 
> our system and report the effects.
> cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-08-25 Thread Romain GERARD (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16141425#comment-16141425
 ] 

Romain GERARD edited comment on CASSANDRA-13418 at 8/25/17 9:41 AM:


{quote}looks just like a flakey test to me. {quote}
Ok

{quote}you can let me know if you agree{quote}
I am at peace with that :)






[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-08-24 Thread mck (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16141130#comment-16141130
 ] 

mck edited comment on CASSANDRA-13418 at 8/25/17 4:50 AM:
--

{quote}is this bad news? 
https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/214/
{quote} It looks just like a flaky test to me. 

The CHANGES.txt entry was in the wrong place. If this is a patch against trunk, 
then it should go under 4.0. But a patch would also be nice for 3.11.
I'll update this in thelastpickle repo, see links below, and you can let me 
know if you agree. The commit message has been updated as well, per practice.

These patches are then:

|| branch || testall || dtest ||
| [cassandra-3.11_13418|https://github.com/thelastpickle/cassandra/tree/mck/cassandra-3.11_13418] | [testall|https://circleci.com/gh/thelastpickle/cassandra/tree/mck%2Fcassandra-3.11_13418] | [dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/219/] |
| [trunk_13418|https://github.com/thelastpickle/cassandra/tree/mck/trunk_13418] | [testall|https://circleci.com/gh/thelastpickle/cassandra/tree/mck%2Ftrunk_13418] | [dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/214] |








[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-08-24 Thread mck (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16141130#comment-16141130
 ] 

mck edited comment on CASSANDRA-13418 at 8/25/17 4:45 AM:
--

{quote}is this bad news? 
https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/214/
{quote} It looks just like a flaky test to me. 

The CHANGES.txt entry was in the wrong place. If this is a patch against trunk, 
then it should go under 4.0. But a patch would also be nice for 3.11.
I'll update this in thelastpickle repo, see links below, and you can let me 
know if you agree. The commit message has been updated as well, per practice.

These patches are then:

|| branch || testall || dtest ||
| [cassandra-3.11_13418|https://github.com/thelastpickle/cassandra/tree/mck/cassandra-3.11_13418] | [testall|https://circleci.com/gh/thelastpickle/cassandra/tree/mck%2Fcassandra-3.11_13418] | [dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/219/] |
| [trunk_13418|https://github.com/thelastpickle/cassandra/tree/mck/trunk_13418] | [testall|https://circleci.com/gh/thelastpickle/cassandra/mck%2Ftrunk_13418] | [dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/214] |








[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-08-22 Thread Romain GERARD (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136652#comment-16136652
 ] 

Romain GERARD edited comment on CASSANDRA-13418 at 8/22/17 11:50 AM:
-

New version here 
https://github.com/criteo-forks/cassandra/commit/95c7bb758478a86abf3506fd6e3ddb5d06413bce

{{---}}


I can remove {{TWCSCompactionController.getFullyExpiredSSTables(..)}} if you 
wish, I don't have any strong opinion about it, just say so

{{---}}

{quote}
I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated
---
Do we want this? It feels like if we expect to be able to drop entire sstables 
due to being expired, it would be pretty wasteful to run a single sstable 
tombstone compaction when there are 20% tombstones in the sstable? We would 
probably be better off waiting until 100% is expired and drop the entire 
sstable without compaction?{quote}

In my case you are right: activating disableTombstoneCompaction or setting the 
tombstoneThreshold high enough should be better performance-wise. My intention 
in activating the option was to guarantee consistent behavior for the 
overlap checks. I wasn't comfortable ignoring overlaps when checking for 
fully expired sstables but not when looking for sstables to 
compact, as in both cases the job would not get done because of checking 
globally instead of just locally to the sstable. I wanted to enforce the rule 
{{if you want to drop stuff and ignoreOverlaps is activated then look locally 
instead of globally}}
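
The "globally" versus "locally" distinction can be sketched with a toy model. This is a simplified illustration under assumptions, not Cassandra's actual code: the {{ExpirySketch}} and {{SSTable}} names are hypothetical, every sstable is assumed to overlap every other one, and memtables are ignored. An sstable is a drop candidate when everything in it has expired (local check); with overlap checking, it is additionally blocked if any sstable with live data contains writes at least as old as its newest write (global check).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model for illustration only; none of these names are Cassandra's real API.
final class ExpirySketch
{
    // Minimal stand-in for the sstable metadata used by the expiry check.
    static final class SSTable
    {
        final String name;
        final long minTimestamp;        // oldest write timestamp in the sstable
        final long maxTimestamp;        // newest write timestamp in the sstable
        final int maxLocalDeletionTime; // when the last cell in the sstable expires

        SSTable(String name, long minTimestamp, long maxTimestamp, int maxLocalDeletionTime)
        {
            this.name = name;
            this.minTimestamp = minTimestamp;
            this.maxTimestamp = maxTimestamp;
            this.maxLocalDeletionTime = maxLocalDeletionTime;
        }
    }

    // Returns the names of the sstables that could be dropped without compaction.
    static List<String> fullyExpired(List<SSTable> sstables, int gcBefore, boolean ignoreOverlaps)
    {
        // Global view: oldest timestamp held by any sstable that still has live data.
        long minLiveTimestamp = Long.MAX_VALUE;
        for (SSTable s : sstables)
            if (s.maxLocalDeletionTime >= gcBefore)
                minLiveTimestamp = Math.min(minLiveTimestamp, s.minTimestamp);

        List<String> droppable = new ArrayList<>();
        for (SSTable s : sstables)
        {
            if (s.maxLocalDeletionTime >= gcBefore)
                continue; // local check failed: not everything in it has expired
            // Without ignoreOverlaps, a live sstable holding older data blocks the drop.
            if (ignoreOverlaps || s.maxTimestamp < minLiveTimestamp)
                droppable.add(s.name);
        }
        return droppable;
    }

    public static void main(String[] args)
    {
        List<SSTable> sstables = Arrays.asList(
            new SSTable("old-window", 100, 200, 1000),     // fully expired at gcBefore = 2000
            new SSTable("read-repaired", 150, 900, 5000)); // live, but contains old writes
        System.out.println(fullyExpired(sstables, 2000, false)); // []
        System.out.println(fullyExpired(sstables, 2000, true));  // [old-window]
    }
}
```

In this model the read-repaired sstable plays the role of the blocker described in the issue: it keeps the expired one on disk until the overlap check is skipped.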

{{---}}

{quote} CompactionController:232 any reason not to return an immutable 
set?{quote}
I tried to change everything to an ImmutableSet but it breaks a lot of tests.

{{---}}

N.B: I tried to apply the style guide found in {{.idea/codeStyleSettings.xml}} 
but it changes a lot of things. Do you know if it is up to date?




[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-08-22 Thread Romain GERARD (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136652#comment-16136652
 ] 

Romain GERARD edited comment on CASSANDRA-13418 at 8/22/17 11:47 AM:
-

New version here 
https://github.com/criteo-forks/cassandra/commit/cfabb2ddd31f16ae127d4b22e0c02a1676ba336b


{{---}}


I can remove {{TWCSCompactionController.getFullyExpiredSSTables(..)}} if you 
wish, I don't have any strong opinion about it, just say so

{{---}}

{quote}
I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated
---
Do we want this? It feels like if we expect to be able to drop entire sstables 
due to being expired, it would be pretty wasteful to run a single sstable 
tombstone compaction when there are 20% tombstones in the sstable? We would 
probably be better off waiting until 100% is expired and drop the entire 
sstable without compaction?{quote}

In my case you are right: activating disableTombstoneCompaction or setting the 
tombstoneThreshold high enough should be better performance-wise. My intention 
in activating the option was to guarantee consistent behavior for the 
overlap checks. I wasn't comfortable ignoring overlaps when checking for 
fully expired sstables but not when looking for sstables to 
compact, as in both cases the job would not get done because of checking 
globally instead of just locally to the sstable. I wanted to enforce the rule 
{{if you want to drop stuff and ignoreOverlaps is activated then look locally 
instead of globally}}

{{---}}

{quote} CompactionController:232 any reason not to return an immutable 
set?{quote}
I tried to change everything to an ImmutableSet but it breaks a lot of tests.

{{---}}

N.B: I tried to apply the style guide found in {{.idea/codeStyleSettings.xml}} 
but it changes a lot of things. Do you know if it is up to date?




[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-08-22 Thread Romain GERARD (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136652#comment-16136652
 ] 

Romain GERARD edited comment on CASSANDRA-13418 at 8/22/17 11:46 AM:
-

New version here 
https://github.com/criteo-forks/cassandra/commit/cfabb2ddd31f16ae127d4b22e0c02a1676ba336b


{{---}}


I can remove {{TWCSCompactionController.getFullyExpiredSSTables(..)}} if you 
wish, I don't have any strong opinion about it, just say so

{{---}}

{quote}
I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated
---
Do we want this? It feels like if we expect to be able to drop entire sstables 
due to being expired, it would be pretty wasteful to run a single sstable 
tombstone compaction when there are 20% tombstones in the sstable? We would 
probably be better off waiting until 100% is expired and drop the entire 
sstable without compaction?{quote}

In my case you are right: activating disableTombstoneCompaction or setting the 
tombstoneThreshold high enough should be better performance-wise. My intention 
in activating the option was to guarantee consistent behavior for the 
overlap checks. I wasn't comfortable ignoring overlaps when checking for 
fully expired sstables but not when looking for sstables to 
compact, as in both cases the job would not get done because of checking 
globally instead of just locally to the sstable. I wanted to enforce the rule 
{{if you want to drop stuff and ignoreOverlaps is activated then look locally 
instead of globally}}

{{---}}

{quote} CompactionController:232 any reason not to return an immutable 
set?{quote}
I tried to change everything to an ImmutableSet but it breaks a lot of tests.

N.B: I tried to apply the style guide found in {{.idea/codeStyleSettings.xml}} 
but it changes a lot of things. Do you know if it is up to date?




[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-08-22 Thread Romain GERARD (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136652#comment-16136652
 ] 

Romain GERARD edited comment on CASSANDRA-13418 at 8/22/17 11:46 AM:
-

New version here 
https://github.com/criteo-forks/cassandra/commit/cfabb2ddd31f16ae127d4b22e0c02a1676ba336b
* I can remove {{TWCSCompactionController.getFullyExpiredSSTables(..)}} if you 
wish, I don't have any strong opinion about it, just say so

{{---}}

{quote}
I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated
---
Do we want this? It feels like if we expect to be able to drop entire sstables 
due to being expired, it would be pretty wasteful to run a single sstable 
tombstone compaction when there are 20% tombstones in the sstable? We would 
probably be better off waiting until 100% is expired and drop the entire 
sstable without compaction?{quote}

In my case you are right: activating disableTombstoneCompaction or setting the 
tombstoneThreshold high enough should be better performance-wise. My intention 
when activating the option is to guarantee consistent behavior for overlap 
checks. I wasn't comfortable ignoring overlaps when checking for fully expired 
sstables but not ignoring them when looking for sstables to compact, as in 
both cases the job does not get done because the check is global instead of 
local to the sstable. I wanted to enforce the rule 
{{if you want to drop stuff and ignoreOverlaps is activated then look locally 
instead of globally}}
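
To make the "locally instead of globally" distinction concrete, here is a minimal Python sketch of the idea. It is a simplified model and not the Cassandra implementation: the {{SSTable}} class, its fields, and the {{fully_expired}} function are hypothetical names chosen for illustration, and memtables and in-progress compactions are left out.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class SSTable:
    """Hypothetical stand-in for the metadata of one sstable."""
    name: str
    min_timestamp: int            # oldest write timestamp in the sstable
    max_timestamp: int            # newest write timestamp in the sstable
    max_local_deletion_time: int  # after this time every cell has expired


def fully_expired(candidates: Sequence[SSTable],
                  overlapping: Sequence[SSTable],
                  gc_before: int,
                  ignore_overlaps: bool = False) -> List[SSTable]:
    """Return the candidates that can be dropped without compaction."""
    # Local check: everything inside the sstable has expired.
    expired = [s for s in candidates if s.max_local_deletion_time < gc_before]
    if ignore_overlaps:
        # Look only at the sstable itself (the behaviour discussed above).
        return expired

    # Global check: a live overlapping sstable holding older data blocks the
    # drop, because the expired data may still shadow something in it.
    blockers = [s for s in overlapping if s.max_local_deletion_time >= gc_before]
    if not blockers:
        return expired
    min_blocking_ts = min(s.min_timestamp for s in blockers)
    return [s for s in expired if s.max_timestamp < min_blocking_ts]


old = SSTable("old", min_timestamp=0, max_timestamp=10, max_local_deletion_time=100)
blocker = SSTable("blocker", min_timestamp=5, max_timestamp=50, max_local_deletion_time=500)

print(fully_expired([old], [blocker], gc_before=200))
print(fully_expired([old], [blocker], gc_before=200, ignore_overlaps=True))
```

With the global check the overlapping {{blocker}} keeps {{old}} on disk; with {{ignore_overlaps=True}} only the sstable's own expiration is considered, which is the trade-off the ticket proposes for time series data.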

{{---}}

{quote} CompactionController:232 any reason not to return an immutable 
set?{quote}
I tried to change everything to an ImmutableSet but it breaks a lot of tests.
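
The comment does not say which tests failed or why. One plausible reason, stated here purely as an assumption, is that some callers modify the returned set in place. The Python sketch below is only an analogy: {{frozenset}} stands in for an immutable set, and all function names are hypothetical.

```python
from typing import AbstractSet, FrozenSet, Set


def expired_mutable() -> Set[str]:
    """Hypothetical getter that hands out a set the caller may modify."""
    return {"sstable-1", "sstable-2"}


def expired_immutable() -> FrozenSet[str]:
    """Same result, but read-only."""
    return frozenset({"sstable-1", "sstable-2"})


def caller_that_mutates(result):
    # Works only if the returned set is mutable.
    result.add("sstable-3")
    return result


def caller_that_copies(result: AbstractSet[str]) -> Set[str]:
    # Works with either variant because it never touches the original.
    copy = set(result)
    copy.add("sstable-3")
    return copy


print(caller_that_mutates(expired_mutable()))
print(caller_that_copies(expired_immutable()))
try:
    caller_that_mutates(expired_immutable())
except AttributeError as exc:
    print("immutable result rejected the mutation:", exc)
```

If that assumption holds, switching the return type would mean first changing every mutating caller to copy the result.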

N.B.: I tried to apply the style guide found in {{.idea/codeStyleSettings.xml}} 
but it changes a lot of things. Do you know if it is up to date?



[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-08-22 Thread Romain GERARD (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136652#comment-16136652
 ] 

Romain GERARD edited comment on CASSANDRA-13418 at 8/22/17 11:45 AM:
-

New version here 
https://github.com/criteo-forks/cassandra/commit/cfabb2ddd31f16ae127d4b22e0c02a1676ba336b
* I can remove {{TWCSCompactionController.getFullyExpiredSSTables(..)}} if you 
wish, I don't have any strong opinion about it, just say so

{{_}}

{quote}
I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated
---
Do we want this? It feels like if we expect to be able to drop entire sstables 
due to being expired, it would be pretty wasteful to run a single sstable 
tombstone compaction when there are 20% tombstones in the sstable? We would 
probably be better off waiting until 100% is expired and drop the entire 
sstable without compaction?{quote}

In my case you are right: activating disableTombstoneCompaction or setting the 
tombstoneThreshold high enough should be better performance-wise. My intention 
when activating the option is to guarantee consistent behavior for overlap 
checks. I wasn't comfortable ignoring overlaps when checking for fully expired 
sstables but not ignoring them when looking for sstables to compact, as in 
both cases the job does not get done because the check is global instead of 
local to the sstable. I wanted to enforce the rule 
{{if you want to drop stuff and ignoreOverlaps is activated then look locally 
instead of globally}}

{{_}}

{quote} CompactionController:232 any reason not to return an immutable 
set?{quote}
I tried to change everything to an ImmutableSet but it breaks a lot of tests.

N.B.: I tried to apply the style guide found in {{.idea/codeStyleSettings.xml}} 
but it changes a lot of things. Do you know if it is up to date?



[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-08-22 Thread Romain GERARD (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136652#comment-16136652
 ] 

Romain GERARD edited comment on CASSANDRA-13418 at 8/22/17 11:45 AM:
-

New version here 
https://github.com/criteo-forks/cassandra/commit/cfabb2ddd31f16ae127d4b22e0c02a1676ba336b
* I can remove {{TWCSCompactionController.getFullyExpiredSSTables(..)}} if you 
wish, I don't have any strong opinion about it, just say so

{{}}

{quote}
I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated
---
Do we want this? It feels like if we expect to be able to drop entire sstables 
due to being expired, it would be pretty wasteful to run a single sstable 
tombstone compaction when there are 20% tombstones in the sstable? We would 
probably be better off waiting until 100% is expired and drop the entire 
sstable without compaction?{quote}

In my case you are right: activating disableTombstoneCompaction or setting the 
tombstoneThreshold high enough should be better performance-wise. My intention 
when activating the option is to guarantee consistent behavior for overlap 
checks. I wasn't comfortable ignoring overlaps when checking for fully expired 
sstables but not ignoring them when looking for sstables to compact, as in 
both cases the job does not get done because the check is global instead of 
local to the sstable. I wanted to enforce the rule 
{{if you want to drop stuff and ignoreOverlaps is activated then look locally 
instead of globally}}

{{}}

{quote} CompactionController:232 any reason not to return an immutable 
set?{quote}
I tried to change everything to an ImmutableSet but it breaks a lot of tests.

N.B.: I tried to apply the style guide found in {{.idea/codeStyleSettings.xml}} 
but it changes a lot of things. Do you know if it is up to date?



[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-08-22 Thread Romain GERARD (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136652#comment-16136652
 ] 

Romain GERARD edited comment on CASSANDRA-13418 at 8/22/17 11:44 AM:
-

New version here 
https://github.com/criteo-forks/cassandra/commit/cfabb2ddd31f16ae127d4b22e0c02a1676ba336b
* I can remove {{TWCSCompactionController.getFullyExpiredSSTables(..)}} if you 
wish, I don't have any strong opinion about it, just say so


{quote}
I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated
---
Do we want this? It feels like if we expect to be able to drop entire sstables 
due to being expired, it would be pretty wasteful to run a single sstable 
tombstone compaction when there are 20% tombstones in the sstable? We would 
probably be better off waiting until 100% is expired and drop the entire 
sstable without compaction?{quote}

In my case you are right: activating disableTombstoneCompaction or setting the 
tombstoneThreshold high enough should be better performance-wise. My intention 
when activating the option is to guarantee consistent behavior for overlap 
checks. I wasn't comfortable ignoring overlaps when checking for fully expired 
sstables but not ignoring them when looking for sstables to compact, as in 
both cases the job does not get done because the check is global instead of 
local to the sstable. I wanted to enforce the rule 
{{if you want to drop stuff and ignoreOverlaps is activated then look locally 
instead of globally}}


* CompactionController:232 any reason not to return an immutable set?
I tried to change everything to an ImmutableSet but it breaks a lot of tests.

N.B.: I tried to apply the style guide found in {{.idea/codeStyleSettings.xml}} 
but it changes a lot of things. Do you know if it is up to date?



[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-08-22 Thread Romain GERARD (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136652#comment-16136652
 ] 

Romain GERARD edited comment on CASSANDRA-13418 at 8/22/17 11:44 AM:
-

New version here 
https://github.com/criteo-forks/cassandra/commit/cfabb2ddd31f16ae127d4b22e0c02a1676ba336b
* I can remove {{TWCSCompactionController.getFullyExpiredSSTables(..)}} if you 
wish, I don't have any strong opinion about it, just say so


{quote}
I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated
---
Do we want this? It feels like if we expect to be able to drop entire sstables 
due to being expired, it would be pretty wasteful to run a single sstable 
tombstone compaction when there are 20% tombstones in the sstable? We would 
probably be better off waiting until 100% is expired and drop the entire 
sstable without compaction?{quote}

In my case you are right: activating disableTombstoneCompaction or setting the 
tombstoneThreshold high enough should be better performance-wise. My intention 
when activating the option is to guarantee consistent behavior for overlap 
checks. I wasn't comfortable ignoring overlaps when checking for fully expired 
sstables but not ignoring them when looking for sstables to compact, as in 
both cases the job does not get done because the check is global instead of 
local to the sstable. I wanted to enforce the rule 
{{if you want to drop stuff and ignoreOverlaps is activated then look locally 
instead of globally}}


{quote} CompactionController:232 any reason not to return an immutable 
set?{quote}
I tried to change everything to an ImmutableSet but it breaks a lot of tests.

N.B.: I tried to apply the style guide found in {{.idea/codeStyleSettings.xml}} 
but it changes a lot of things. Do you know if it is up to date?



[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-08-22 Thread Romain GERARD (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136652#comment-16136652
 ] 

Romain GERARD edited comment on CASSANDRA-13418 at 8/22/17 11:39 AM:
-

New version here 
https://github.com/criteo-forks/cassandra/commit/cfabb2ddd31f16ae127d4b22e0c02a1676ba336b
* I can remove {{TWCSCompactionController.getFullyExpiredSSTables(..)}} if you 
wish, I don't have any strong opinion about it, just say so


{quote}
I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated
---
Do we want this? It feels like if we expect to be able to drop entire sstables 
due to being expired, it would be pretty wasteful to run a single sstable 
tombstone compaction when there are 20% tombstones in the sstable? We would 
probably be better off waiting until 100% is expired and drop the entire 
sstable without compaction?{quote}

In my case you are right: activating disableTombstoneCompaction or setting the 
tombstoneThreshold high enough should be better performance-wise. My intention 
when activating the option is to guarantee consistent behavior for overlap 
checks. I wasn't comfortable ignoring overlaps when checking for fully expired 
sstables but not ignoring them when looking for sstables to compact, as in 
both cases the job does not get done because the check is global instead of 
local to the sstable. I wanted to enforce the rule 
{{if you want to drop stuff and ignoreOverlaps is activated then look locally 
instead of globally}}

N.B.: I tried to apply the style guide found in {{.idea/codeStyleSettings.xml}} 
but it changes a lot of things. Do you know if it is up to date?



[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-08-22 Thread Romain GERARD (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136652#comment-16136652
 ] 

Romain GERARD edited comment on CASSANDRA-13418 at 8/22/17 11:36 AM:
-

New version here 
https://github.com/criteo-forks/cassandra/commit/cfabb2ddd31f16ae127d4b22e0c02a1676ba336b
* I can remove {{TWCSCompactionController.getFullyExpiredSSTables(..)}} if you 
wish, I don't have any strong opinion about it, just say so


{quote}
I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated
---
Do we want this? It feels like if we expect to be able to drop entire sstables 
due to being expired, it would be pretty wasteful to run a single sstable 
tombstone compaction when there are 20% tombstones in the sstable? We would 
probably be better off waiting until 100% is expired and drop the entire 
sstable without compaction?{quote}

In my case you are right: activating disableTombstoneCompaction or setting the 
tombstoneThreshold high enough should be better performance-wise. My intention 
when activating the option is to guarantee consistent behavior for overlap 
checks. I wasn't comfortable ignoring overlaps when checking for fully expired 
sstables but not ignoring them when looking for sstables to compact, as in 
both cases the job does not get done because the check is global instead of 
local to the sstable. I wanted to enforce the rule 
{{if you want to drop stuff and ignoreOverlaps is activated then look locally 
instead of globally}}

N.B.: I tried to apply the style guide found in {{.idea/codeStyleSettings.xml}} 
but it changes a lot of things. Do you know if it is up to date?


was (Author: rgerard):
New version here 
https://github.com/criteo-forks/cassandra/commit/cfabb2ddd31f16ae127d4b22e0c02a1676ba336b
* I can remove {{TWCSCompactionController.getFullyExpiredSSTables(..)}} if you 
wish, I don't have any strong opinion about it


{quote}
I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated
---
Do we want this? It feels like if we expect to be able to drop entire sstables 
due to being expired, it would be pretty wasteful to run a single sstable 
tombstone compaction when there are 20% tombstones in the sstable? We would 
probably be better off waiting until 100% is expired and drop the entire 
sstable without compaction?{quote}

In my case you are right: activating disableTombstoneCompaction or setting the 
tombstoneThreshold high enough should be better performance-wise. My intention 
when activating the option is to guarantee consistent behavior for overlap 
checks. I wasn't comfortable ignoring overlaps when checking for fully expired 
sstables but not ignoring them when looking for sstables to compact, as in 
both cases the job does not get done because the check is global instead of 
local to the sstable. I wanted to enforce the rule 
{{if you want to drop stuff and ignoreOverlaps is activated then look locally 
instead of globally}}

N.B: I tried to apply the syle guide found in {{.idea/codeStyleSettings.xml}} 
but it is changing me a lot of things. Do you know if it is up to date ?


[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-08-22 Thread Romain GERARD (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136652#comment-16136652
 ] 

Romain GERARD edited comment on CASSANDRA-13418 at 8/22/17 11:34 AM:
-

New version here 
https://github.com/criteo-forks/cassandra/commit/cfabb2ddd31f16ae127d4b22e0c02a1676ba336b
* I can remove {{TWCSCompactionController.getFullyExpiredSSTables(..)}} if you 
wish, I don't have any strong opinion about it


{quote}
I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated
---
Do we want this? It feels like if we expect to be able to drop entire sstables 
due to being expired, it would be pretty wasteful to run a single sstable 
tombstone compaction when there are 20% tombstones in the sstable? We would 
probably be better off waiting until 100% is expired and drop the entire 
sstable without compaction?{quote}

In my case you are right, activating disableTombstoneCompaction or setting the 
tombstoneThreshold high enough should be better performance-wise. My intention 
when activating the option is to guarantee consistent behavior for 
overlap checks. I wasn't comfortable ignoring overlaps when checking for 
fully expired sstables but not ignoring them when looking for sstables to 
compact, as in both cases the job would not get done because the check is made 
globally instead of just locally to the sstable. I wanted to enforce the rule 
{{if you want to drop stuff and ignoreOverlaps is activated then look locally 
instead of globally}}

N.B: I tried to apply the style guide found in {{.idea/codeStyleSettings.xml}} 
but it changes a lot of things for me. Do you know if it is up to date?




[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-08-22 Thread Romain GERARD (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136652#comment-16136652
 ] 

Romain GERARD edited comment on CASSANDRA-13418 at 8/22/17 11:32 AM:
-

New version here 
https://github.com/criteo-forks/cassandra/commit/cfabb2ddd31f16ae127d4b22e0c02a1676ba336b
* I can remove {{TWCSCompactionController.getFullyExpiredSSTables(..)}} if you 
wish, I don't have any strong opinion about it


{quote}
I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated
---
Do we want this? It feels like if we expect to be able to drop entire sstables 
due to being expired, it would be pretty wasteful to run a single sstable 
tombstone compaction when there are 20% tombstones in the sstable? We would 
probably be better off waiting until 100% is expired and drop the entire 
sstable without compaction?{quote}

In my case you are right, activating disableTombstoneCompaction or setting the 
tombstoneThreshold high enough should be better performance-wise. My intention 
when activating the option is to guarantee consistent behavior for 
overlap checks. I wasn't comfortable ignoring overlaps when checking for 
fully expired sstables but not ignoring them when looking for sstables to 
compact, as in both cases the job would not get done because the check is made 
globally instead of just locally to the sstable. I wanted to enforce the rule 
{{if you want to drop stuff and ignoreOverlaps is activated then look locally 
instead of globally}}




[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-08-22 Thread Romain GERARD (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136652#comment-16136652
 ] 

Romain GERARD edited comment on CASSANDRA-13418 at 8/22/17 11:31 AM:
-

New version here 
https://github.com/criteo-forks/cassandra/commit/cfabb2ddd31f16ae127d4b22e0c02a1676ba336b
* I can remove {{TWCSCompactionController.getFullyExpiredSSTables(..)}} if you 
wish, I don't have any strong opinion about it


{quote}
I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated
---
Do we want this? It feels like if we expect to be able to drop entire sstables 
due to being expired, it would be pretty wasteful to run a single sstable 
tombstone compaction when there are 20% tombstones in the sstable? We would 
probably be better off waiting until 100% is expired and drop the entire 
sstable without compaction?{quote}

In my case you are right, activating disableTombstoneCompaction or setting the 
tombstoneThreshold high enough should be better performance-wise. My intention 
when activating the option is to guarantee consistent behavior for 
overlap checks. I wasn't comfortable ignoring overlaps when checking for 
fully expired sstables but not ignoring them when looking for sstables to 
compact, as in both cases the job would not get done because the check is made 
globally instead of just locally to the sstable. I wanted to enforce the rule 
{{if you want to drop stuff and ignoreOverlaps is activated then look locally 
instead of globally}}




[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-08-22 Thread Romain GERARD (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136652#comment-16136652
 ] 

Romain GERARD edited comment on CASSANDRA-13418 at 8/22/17 11:26 AM:
-

New version here 
https://github.com/criteo-forks/cassandra/commit/cfabb2ddd31f16ae127d4b22e0c02a1676ba336b
* I can remove {{TWCSCompactionController.getFullyExpiredSSTables(..)}} if you 
wish, I don't have any strong opinion about it


{quote}
I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated
---
Do we want this? It feels like if we expect to be able to drop entire sstables 
due to being expired, it would be pretty wasteful to run a single sstable 
tombstone compaction when there are 20% tombstones in the sstable? We would 
probably be better off waiting until 100% is expired and drop the entire 
sstable without compaction?{quote}

In my case you are right, activating disableTombstoneCompaction or setting the 
tombstoneThreshold high enough should be better performance-wise. My intention 
when activating the option is to guarantee consistent behavior for 
overlap checks. I wasn't comfortable ignoring overlaps when checking for 
fully expired sstables but not ignoring them when looking for sstables to 
compact, as in both cases the job would not get done because the check is made 
globally instead of just locally to the sstable.






[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-08-22 Thread Romain GERARD (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136652#comment-16136652
 ] 

Romain GERARD edited comment on CASSANDRA-13418 at 8/22/17 11:23 AM:
-

New version here 
https://github.com/criteo-forks/cassandra/commit/cfabb2ddd31f16ae127d4b22e0c02a1676ba336b
* I can remove {{TWCSCompactionController.getFullyExpiredSSTables(..)}} if you 
wish, I don't have any strong opinion about it


{quote}
I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated
---
Do we want this? It feels like if we expect to be able to drop entire sstables 
due to being expired, it would be pretty wasteful to run a single sstable 
tombstone compaction when there are 20% tombstones in the sstable? We would 
probably be better off waiting until 100% is expired and drop the entire 
sstable without compaction?{quote}

In my case you are right, activating disableTombstoneCompaction or setting the 
tombstoneThreshold high enough should be better performance-wise. My intention 
when activating the option is to guarantee consistent behavior for 
overlap checks. I wasn't comfortable ignoring overlaps when checking for 
fully expired sstables but not ignoring them when looking for sstables to 
compact.






[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-08-22 Thread Romain GERARD (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136652#comment-16136652
 ] 

Romain GERARD edited comment on CASSANDRA-13418 at 8/22/17 11:22 AM:
-

New version here 
https://github.com/criteo-forks/cassandra/commit/cfabb2ddd31f16ae127d4b22e0c02a1676ba336b
* I can remove {{TWCSCompactionController.getFullyExpiredSSTables(..)}} if you 
wish, I don't have any strong opinion about it


{quote}Do we want this? It feels like if we expect to be able to drop entire 
sstables due to being expired, it would be pretty wasteful to run a single 
sstable tombstone compaction when there are 20% tombstones in the sstable? We 
would probably be better off waiting until 100% is expired and drop the entire 
sstable without compaction?{quote}

In my case you are right, activating disableTombstoneCompaction or setting the 
tombstoneThreshold high enough should be better performance-wise. My intention 
when activating the option is to guarantee consistent behavior for 
overlap checks. I wasn't comfortable ignoring overlaps when checking for 
fully expired sstables but not ignoring them when looking for sstables to 
compact.






[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-08-22 Thread Romain GERARD (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136652#comment-16136652
 ] 

Romain GERARD edited comment on CASSANDRA-13418 at 8/22/17 11:22 AM:
-

New version here 
https://github.com/criteo-forks/cassandra/commit/cfabb2ddd31f16ae127d4b22e0c02a1676ba336b
* I can remove {{TWCSCompactionController.getFullyExpiredSSTables(..)}} if you 
wish, I don't have any strong opinion about it


{quote}Do we want this? It feels like if we expect to be able to drop entire 
sstables due to being expired, it would be pretty wasteful to run a single 
sstable tombstone compaction when there are 20% tombstones in the sstable? We 
would probably be better off waiting until 100% is expired and drop the entire 
sstable without compaction?{quote}

In my case you are right, activating disableTombstoneCompaction or setting the 
tombstoneThreshold high enough should be better performance-wise. My intention 
when activating the option is to guarantee consistent behavior for 
overlap checks. I wasn't comfortable ignoring overlaps when checking for 
fully expired sstables but not ignoring them when looking for sstables to 
compact.



> Allow TWCS to ignore overlaps when dropping fully expired sstables
> --
>
> Key: CASSANDRA-13418
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13418
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction
>Reporter: Corentin Chary
>  Labels: twcs
> Attachments: twcs-cleanup.png
>
>
> http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If 
> you really want read-repairs you're going to have sstables blocking the 
> expiration of other fully expired SSTables because they overlap.
> You can set unchecked_tombstone_compaction = true or tombstone_threshold to a 
> very low value and that will purge the blockers of old data that should 
> already have expired, thus removing the overlaps and allowing the other 
> SSTables to expire.
> The thing is that this is rather CPU intensive and not optimal. If you have 
> time series, you might not care if all your data doesn't exactly expire at 
> the right time, or if data re-appears for some time, as long as it gets 
> deleted as soon as it can. And in this situation I believe it would be really 
> beneficial to allow users to simply ignore overlapping SSTables when looking 
> for fully expired ones.
> To the question: why would you need read-repairs?
> - Full repairs basically take longer than the TTL of the data on my dataset, 
> so this isn't really effective.
> - Even with a 10% chance of doing a repair, we found out that this would be 
> enough to greatly reduce entropy of the most used data (and if you have 
> timeseries, you're likely to have a dashboard doing the same important 
> queries over and over again).
> - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow.
> I'll try to come up with a patch demonstrating how this would work, try it on 
> our system and report the effects.
> cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already.
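
The blocking effect described in the issue can be sketched with a small model. This is only an illustration under simplifying assumptions, not the code in the patch: `SSTableInfo`, `FullyExpiredSketch` and `fullyExpired` are hypothetical names, memtables are left out, and the rule used is "a candidate may be dropped when everything in it is past gcBefore and it is older than every overlapping sstable that still holds live data".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, stripped-down stand-in for an sstable's metadata.
final class SSTableInfo {
    final String name;
    final long minTimestamp;        // oldest write timestamp in the sstable
    final long maxTimestamp;        // newest write timestamp in the sstable
    final int maxLocalDeletionTime; // once gcBefore passes this, all of its data is expired

    SSTableInfo(String name, long minTimestamp, long maxTimestamp, int maxLocalDeletionTime) {
        this.name = name;
        this.minTimestamp = minTimestamp;
        this.maxTimestamp = maxTimestamp;
        this.maxLocalDeletionTime = maxLocalDeletionTime;
    }
}

final class FullyExpiredSketch {
    // Returns the names of the candidates that can be dropped without compaction.
    static List<String> fullyExpired(List<SSTableInfo> candidates,
                                     List<SSTableInfo> overlapping,
                                     int gcBefore,
                                     boolean ignoreOverlaps) {
        // Oldest timestamp among overlapping sstables that are not yet expired.
        long minLiveTimestamp = Long.MAX_VALUE;
        if (!ignoreOverlaps) {
            for (SSTableInfo s : overlapping) {
                if (s.maxLocalDeletionTime >= gcBefore) {
                    minLiveTimestamp = Math.min(minLiveTimestamp, s.minTimestamp);
                }
            }
        }
        List<String> droppable = new ArrayList<>();
        for (SSTableInfo c : candidates) {
            boolean expired = c.maxLocalDeletionTime < gcBefore;
            boolean notShadowingLiveData = c.maxTimestamp < minLiveTimestamp;
            if (expired && notShadowingLiveData) {
                droppable.add(c.name);
            }
        }
        return droppable;
    }

    public static void main(String[] args) {
        SSTableInfo old = new SSTableInfo("old-window", 10L, 100L, 1_000);
        // A newer sstable that received old data (for example through read-repair).
        SSTableInfo blocker = new SSTableInfo("read-repaired", 50L, 900L, 5_000);
        List<SSTableInfo> candidates = Collections.singletonList(old);
        List<SSTableInfo> overlapping = Collections.singletonList(blocker);
        System.out.println(fullyExpired(candidates, overlapping, 2_000, false)); // []
        System.out.println(fullyExpired(candidates, overlapping, 2_000, true));  // [old-window]
    }
}
```

In this model the read-repaired sstable keeps the expired one on disk until it expires itself, unless the overlap check is skipped.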






[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-08-22 Thread Romain GERARD (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136652#comment-16136652
 ] 

Romain GERARD edited comment on CASSANDRA-13418 at 8/22/17 11:22 AM:
-

New version here: 
https://github.com/criteo-forks/cassandra/commit/cfabb2ddd31f16ae127d4b22e0c02a1676ba336b
* I can remove `TWCSCompactionController.getFullyExpiredSSTables(..)` if you 
wish; I don't have a strong opinion about it.


{quote}Do we want this? It feels like if we expect to be able to drop entire 
sstables due to being expired, it would be pretty wasteful to run a single 
sstable tombstone compaction when there are 20% tombstones in the sstable? We 
would probably be better off waiting until 100% is expired and drop the entire 
sstable without compaction?{quote}

In my case you are right: activating disableTombstoneCompaction or setting 
tombstoneThreshold high enough should be better performance-wise. My intention 
in activating the option is to guarantee consistent behavior for the 
overlap checks. I wasn't comfortable ignoring overlaps when checking for 
fully expired sstables but not ignoring them when looking for sstables to 
compact.


was (Author: rgerard):
New version here 
https://github.com/criteo-forks/cassandra/commit/cfabb2ddd31f16ae127d4b22e0c02a1676ba336b
* I can remove `TWCSCompactionController.getFullyExpiredSSTables(..)` if you 
wish, I don't any strong opinion about it


{quote}Do we want this? It feels like if we expect to be able to drop entire 
sstables due to being expired, it would be pretty wasteful to run a single 
sstable tombstone compaction when there are 20% tombstones in the sstable? We 
would probably be better off waiting until 100% is expired and drop the entire 
sstable without compaction?{quote}

In my case you are right, activating disableTombstoneCompaction or setting the 
tombstoneThresold high enough should be better performance wise. My intention 
when activating the option is to guarantee a consistent behavior for 
overlapping checks. I wasn't comfortable to ignore overlaps when checking for 
fully expired sstables but not ignoring it when looking for sstables to 
compact. 




[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-08-21 Thread Romain GERARD (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16135283#comment-16135283
 ] 

Romain GERARD edited comment on CASSANDRA-13418 at 8/21/17 3:22 PM:


Initial review comments: 
* CHANGES.txt needs a line -> OK
* I think so too, as it greatly helps to have stable behavior when using TWCS 
for time series
* Not at all, it was just to pack things together and to inform the reader that 
a TWCSCompactionController exists
* OK

Trivial stuff:
* OK
* I don't like returning immutable collections when only the base type (Set, 
List, Map, ...) is declared: because of type erasure, someone will get burned 
at runtime (by an unchecked exception). Also, the parent function already uses 
a mutable set, so I stuck with that; returning sometimes a mutable set and 
sometimes an immutable one is a leaky abstraction to me. (I will check whether 
I can change everything to an ImmutableSet.) 
* OK
* OK

I will propose another patch tomorrow.

P.S.: The patch has been running in production since last Friday without hiccups.
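
The concern about handing back an immutable set behind a plain `Set` return type can be illustrated with the JDK alone. This is a minimal sketch, not code from the patch: the class and method names are made up for the example, and `Collections.unmodifiableSet` stands in for Guava's `ImmutableSet`, which fails in the same way on mutation.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

final class ImmutableReturnSketch {
    // Returns a set the caller is free to modify.
    static Set<String> mutableResult() {
        Set<String> result = new HashSet<>();
        result.add("sstable-1");
        return result;
    }

    // Same declared return type, but mutation fails at runtime.
    static Set<String> immutableResult() {
        return Collections.unmodifiableSet(mutableResult());
    }

    // Tries to add an element; reports whether the set accepted the call.
    static boolean tryAdd(Set<String> set, String value) {
        try {
            set.add(value);
            return true;
        } catch (UnsupportedOperationException e) {
            // Nothing in the static type Set<String> warned the caller about this.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(tryAdd(mutableResult(), "sstable-2"));   // true
        System.out.println(tryAdd(immutableResult(), "sstable-2")); // false
    }
}
```

Both methods compile against the same signature, so a caller that mutates the result only finds out which one it got when the exception is thrown.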


was (Author: rgerard):
Initial review comments: 
* CHANGES.txt needs a line -> OK
* I also think so as it greatly help having a stable behavior when using TWCS 
for time series
* Not at all, it was just to pack things together and to inform the reader that 
a TWCSCompactionController exist
* OK

Trivial stuff :
* Ok
* I don't like to return Immutable collections when only the base type (Set, 
List, Map,...) is specified as due to the type erasure someone will get burn at 
runtime with that (due to unchecked exception). And also as the parent function 
already use a mutable set I sticked with that because returning sometime a 
mutable set and sometime an immutable set is kind of a leaky abstraction for me 
(Will check if I can change everything for an ImmutableSet) 
* OK
* OK

Will propose an other patch tomorrow.

P.S: The patch has been running in production since last Friday without hickups.




[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-08-21 Thread Romain GERARD (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16135283#comment-16135283
 ] 

Romain GERARD edited comment on CASSANDRA-13418 at 8/21/17 3:20 PM:


Initial review comments: 
* CHANGES.txt needs a line -> OK
* I think so too, as it greatly helps to have stable behavior when using TWCS 
for time series
* Not at all, it was just to pack things together and to inform the reader that 
a TWCSCompactionController exists
* OK

Trivial stuff:
* OK
* I don't like returning immutable collections when only the base type (Set, 
List, Map, ...) is declared: because of type erasure, someone will get burned 
at runtime (by an unchecked exception). Also, the parent function already uses 
a mutable set, so I stuck with that; returning sometimes a mutable set and 
sometimes an immutable one is a leaky abstraction to me. (I will check whether 
I can change everything to an ImmutableSet.) 
* OK
* OK

I will propose another patch tomorrow.

P.S.: The patch has been running in production since last Friday without hiccups.


was (Author: rgerard):
Initial review comments: 
* CHANGES.txt needs a line -> OK
* I also think so as it greatly help having a stable behavior when using TWCS 
for time series
* Not at all, it was just to pack things together and to inform the reader that 
a TWCSCompactionController exist
* OK

Trivial stuff :
* Ok
* I don't like to return Immutable collections when only the base type (Set, 
List, Map,...) is specified as due to the type erasure someone will get burn at 
runtime with that (due to unchecked exception). And also as the parent function 
already use a mutable set I sticked with that because returning sometime a 
mutable set and sometime an immutable set is kind of a leaky abstraction for me 
(Will check if I can change everything for an ImmutableSet) 
* OK
* OK

Will propose an other patch tomorrow.

P.S: The patch has been running in production since last Friday without hickups.




[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-08-21 Thread Romain GERARD (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16135283#comment-16135283
 ] 

Romain GERARD edited comment on CASSANDRA-13418 at 8/21/17 3:20 PM:


Initial review comments: 
* CHANGES.txt needs a line -> OK
* I think so too, as it greatly helps to have stable behavior when using TWCS 
for time series
* Not at all, it was just to pack things together and to inform the reader that 
a TWCSCompactionController exists
* OK

Trivial stuff:
* OK
* I don't like returning immutable collections when only the base type (Set, 
List, Map, ...) is declared: because of type erasure, someone will get burned 
at runtime (by an unchecked exception). Also, the parent function already uses 
a mutable set, so I stuck with that; returning sometimes a mutable set and 
sometimes an immutable one is a leaky abstraction to me. (I will check whether 
I can change everything to an ImmutableSet.) 
* OK
* OK

I will propose another patch tomorrow.

P.S.: The patch has been running in production since last Friday without hiccups.


was (Author: rgerard):
Initial review comments: 
* CHANGES.txt needs a line -> OK
* I think so also as it greatly help having a stable behavior when using TWCS 
for time series
* Not at all, it was just to pack things together and to inform the reader that 
a TWCSCompactionController exist
* OK

Trivial stuff :
* Ok
* I don't like to return Immutable collections when only the base type (Set, 
List, Map,...) is specified as due to the type erasure someone will get burn at 
runtime with that (due to unchecked exception). And also as the parent function 
already use a mutable set I sticked with that because returning sometime a 
mutable set and sometime an immutable set is kind of a leaky abstraction for me 
(Will check if I can change everything for an ImmutableSet) 
* OK
* OK

Will propose an other patch tomorrow.

P.S: The patch has been running in production since last Friday without hickups.




[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-08-21 Thread Romain GERARD (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16130152#comment-16130152
 ] 

Romain GERARD edited comment on CASSANDRA-13418 at 8/21/17 11:16 AM:
-

Hi,

I am back with a new proposal: 
https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05

Major differences: 
 * Used [~krummas]'s way of introducing the ignore-overlaps option
 * I split the function that does the overlap checks because, in the 
previous patch, I was wrongly checking for overlaps in memtables (even when 
the option was activated) 
https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05#diff-e8e282423dcbf34d30a3578c8dec15cdR170
 * I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated 
https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05#diff-e83635b2fb3079d9b91b039c605c15daR71
It seems a sane default to me: even if we drop fully expired sstables, 
we will still check for worth-dropping ones, and we also want to ignore the 
overlap check in that case. (This was the default behavior in the last patch.)
 * Added a simple test case. I will look into adding more (feel free to suggest some)
 * Rebased onto trunk

All tests pass (ant test), and I will deploy this patch internally to 
confirm that it works as expected.
[~krummas], let me know if you have any remarks in the meantime.
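
The coupling between the two options described above can be sketched as follows. This is an assumption-laden illustration rather than the patch itself: the class `TwcsOptionsSketch` and the option key `ignore_overlaps` are hypothetical, and only `unchecked_tombstone_compaction` is a name taken from the ticket.

```java
import java.util.HashMap;
import java.util.Map;

final class TwcsOptionsSketch {
    // Hypothetical key for the new option; the real name is whatever the patch defines.
    static final String IGNORE_OVERLAPS_KEY = "ignore_overlaps";
    // Existing compaction sub-option mentioned in the ticket description.
    static final String UNCHECKED_TOMBSTONE_COMPACTION_KEY = "unchecked_tombstone_compaction";

    final boolean ignoreOverlaps;
    final boolean uncheckedTombstoneCompaction;

    TwcsOptionsSketch(Map<String, String> options) {
        ignoreOverlaps = Boolean.parseBoolean(options.getOrDefault(IGNORE_OVERLAPS_KEY, "false"));
        boolean requested =
            Boolean.parseBoolean(options.getOrDefault(UNCHECKED_TOMBSTONE_COMPACTION_KEY, "false"));
        // Ignoring overlaps implies skipping the overlap check for tombstone compactions too,
        // so both code paths treat overlaps the same way.
        uncheckedTombstoneCompaction = requested || ignoreOverlaps;
    }

    public static void main(String[] args) {
        Map<String, String> options = new HashMap<>();
        options.put(IGNORE_OVERLAPS_KEY, "true");
        TwcsOptionsSketch parsed = new TwcsOptionsSketch(options);
        System.out.println(parsed.ignoreOverlaps);               // true
        System.out.println(parsed.uncheckedTombstoneCompaction); // true
    }
}
```

Deriving the second flag from the first keeps the overlap handling consistent between the fully-expired check and the tombstone-compaction check, which is the stated intent of the change.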


was (Author: rgerard):
Hi,

I am back with a new proposition 
https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05

Majors differences : 
 + Used [~krummas] way for introducing the ignore Overlaps
 + I splitted the function that is doing the overlapingChecks as in the 
previous patch, I was wrongfully checking for overlaps in memtables (even if 
the option was activated) 
https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05#diff-e8e282423dcbf34d30a3578c8dec15cdR170
 + I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated  
https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05#diff-e83635b2fb3079d9b91b039c605c15daR71
It seems a sane default for me, as even if we drop fully expired sstables, 
we will still check for worth Dropping ones and we want to also ignore overlaps 
check in this case. (was the default behavior in the last patch)
+ Added a simple test case. I will look to add more (feel free to suggest some)
+ Rebased upon trunk

Every tests passes (ant test) and I will deploy this patch internally to 
confirm that it works as expected.
If you have any remarks [~krummas] in the mean time


[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-08-21 Thread mck (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16135028#comment-16135028
 ] 

mck edited comment on CASSANDRA-13418 at 8/21/17 10:55 AM:
---

[~rgerard],

Initial review comments: 
 - {{CHANGES.txt}} needs a line
 - is this suitable for 3.11.x as well? (Does it constitute a stability patch? 
I think so, but that's an opinion)
 - is the method {{TWCSCompactionController.getFullyExpiredSSTables(..)}} 
really needed? (CompactionController:170 seems to do what we need already, or 
am I missing something?)
 - {{CompactionController.ignoreOverlaps()}} is a bit of a concept, it deserves 
some apidoc love.

Trivial stuff…
 - CompactionController:108 missing space in {{if(}}, same on :170
 - CompactionController:232 any reason not to return an immutable set?
 - TWCSCompactionController:29 needs a blank line after
 - TWCSCompactionController:33 Java declaration order. (static fields go before 
member fields)
 - TimeWindowCompactionStrategy:70  missing space in {{if(}}


was (Author: michaelsembwever):
[~rgerard],

Initial review comments: 
 - {{CHANGES.txt}} needs a line
 - is this suitable for 3.11.x as well? (Does it constitute a stability patch? 
i think so, but that's an opinion)
 - is the method {{TWCSCompactionController.getFullyExpiredSSTables(..)}} 
really needed? (CompactionController:170 seems to do what we need already)
 - {{CompactionController.ignoreOverlaps()}} is a bit of a concept, it deserves 
some apidoc love.

Trivial stuff…
 - CompactionController:108 missing space in {{if(}}, same on :170
 - CompactionController:232 any reason not to return an immutable set?
 - TWCSCompactionController:29 needs a blank line after
 - TWCSCompactionController:33 java declaration order. (static fields go before 
member fields)
 - TimeWindowCompactionStrategy:70  missing space in {{if(}}




[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-08-17 Thread Romain GERARD (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16130152#comment-16130152
 ] 

Romain GERARD edited comment on CASSANDRA-13418 at 8/17/17 9:57 AM:


Hi,

I am back with a new proposal: 
https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05

Major differences: 
 + Used [~krummas]'s way of introducing the ignore-overlaps option
 + I split the function that does the overlap checks because, in the 
previous patch, I was wrongly checking for overlaps in memtables (even when 
the option was activated) 
https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05#diff-e8e282423dcbf34d30a3578c8dec15cdR170
 + I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated 
https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05#diff-e83635b2fb3079d9b91b039c605c15daR71
It seems a sane default to me: even if we drop fully expired sstables, 
we will still check for worth-dropping ones, and we also want to ignore the 
overlap check in that case. (This was the default behavior in the last patch.)
 + Added a simple test case. I will look into adding more (feel free to suggest some)
 + Rebased onto trunk

All tests pass (ant test), and I will deploy this patch internally to 
confirm that it works as expected.
[~krummas], let me know if you have any remarks in the meantime.


was (Author: rgerard):
Hi,

I am back with a new proposition 
https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05

Majors differences : 
 + Used [~krummas] way for introducing the ignore Overlaps
 + I splitted the function that is doing the overlapingChecks as in the 
previous patch, I was wrongfully checking for overlaps in memtables (even if 
the option was activated) 
https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05#diff-e8e282423dcbf34d30a3578c8dec15cdR170
 + I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated  
https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05#diff-e83635b2fb3079d9b91b039c605c15daR71
It seems a sane default for me, as even if we drop fully expired sstables, 
we will still check for worth Dropping ones and we want to also ignore overlaps 
check in this case. (was the default behavior in the last patch)
+ Added a simple test case. I will look to add more (feel free to suggest some)
+ Rebased upon trunk

Every tests passes and I will deploy this patch internally to confirm that it 
works as expected.
If you have any remarks [~krummas] in the mean time

> Allow TWCS to ignore overlaps when dropping fully expired sstables
> --
>
> Key: CASSANDRA-13418
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13418
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction
>Reporter: Corentin Chary
>  Labels: twcs
> Attachments: twcs-cleanup.png
>
>
> http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If 
> you really want read-repairs you're going to have sstables blocking the 
> expiration of other fully expired SSTables because they overlap.
> You can set unchecked_tombstone_compaction = true or tombstone_threshold to a 
> very low value and that will purge the blockers of old data that should 
> already have expired, thus removing the overlaps and allowing the other 
> SSTables to expire.
> The thing is that this is rather CPU intensive and not optimal. If you have 
> time series, you might not care if all your data doesn't exactly expire at 
> the right time, or if data re-appears for some time, as long as it gets 
> deleted as soon as it can. And in this situation I believe it would be really 
> beneficial to allow users to simply ignore overlapping SSTables when looking 
> for fully expired ones.
> To the question: why would you need read-repairs ?
> - Full repairs basically take longer than the TTL of the data on my dataset, 
> so this isn't really effective.
> - Even with a 10% chance of doing a repair, we found out that this would be 
> enough to greatly reduce entropy of the most used data (and if you have 
> timeseries, you're likely to have a dashboard doing the same important 
> queries over and over again).
> - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow.
> I'll try to come up with a patch demonstrating how this would work, try it on 
> our system and report the effects.
> cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already.
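
The blocking effect described in the issue above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Cassandra's actual code: the `SSTable` fields, the `fully_expired` function and the `ignore_overlaps` flag are hypothetical names, and every sstable is assumed to overlap every other one.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SSTable:
    name: str
    min_timestamp: int      # oldest write timestamp in the file
    max_timestamp: int      # newest write timestamp in the file
    max_deletion_time: int  # time at which the last cell in the file expires


def fully_expired(sstables: List[SSTable], gc_before: int,
                  ignore_overlaps: bool = False) -> List[str]:
    """Return the names of sstables that could be dropped without compaction.

    Toy model: all sstables are assumed to overlap each other.
    """
    expired = [s for s in sstables if s.max_deletion_time < gc_before]
    if ignore_overlaps:
        # Hypothetical option: trust the expiry time alone.
        return [s.name for s in expired]
    # Oldest write timestamp among sstables that still hold live data.
    live = [s for s in sstables if s.max_deletion_time >= gc_before]
    min_live_timestamp = min((s.min_timestamp for s in live),
                             default=float("inf"))
    # An expired sstable is only safe to drop if it is entirely older
    # than everything held by the live, overlapping sstables.
    return [s.name for s in expired if s.max_timestamp < min_live_timestamp]


# "blocker" holds recent data plus an old, read-repaired cell (timestamp 50).
old = SSTable("old", min_timestamp=0, max_timestamp=100, max_deletion_time=200)
blocker = SSTable("blocker", min_timestamp=50, max_timestamp=1000,
                  max_deletion_time=1100)

blocked = fully_expired([old, blocker], gc_before=500)
dropped = fully_expired([old, blocker], gc_before=500, ignore_overlaps=True)
```

In this model `old` has expired, but the read-repaired cell in `blocker` makes the two overlap in time, so `old` is kept until the flag is set.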



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: 

[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-08-17 Thread Romain GERARD (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16130152#comment-16130152
 ] 

Romain GERARD edited comment on CASSANDRA-13418 at 8/17/17 9:53 AM:


Hi,

I am back with a new proposal: 
https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05

Major differences: 
 + Used [~krummas]'s approach for introducing the ignoreOverlaps option
 + I split the function that does the overlap checks because, in the 
previous patch, I was wrongly checking for overlaps in memtables (even if 
the option was activated) 
https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05#diff-e8e282423dcbf34d30a3578c8dec15cdR170
 + I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated  
https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05#diff-e83635b2fb3079d9b91b039c605c15daR71
This seems a sane default to me: even if we drop fully expired sstables, 
we will still check for sstables that are worth dropping tombstones from, and we 
also want to ignore the overlap check in that case. (This was the default 
behavior in the last patch.)
 + Added a simple test case. I will look to add more (feel free to suggest some)
 + Rebased upon trunk

All tests pass and I will deploy this patch internally to confirm that it 
works as expected.
[~krummas], if you have any remarks in the meantime, let me know.
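
As a rough illustration of the option coupling described in this comment, here is a minimal sketch. It is not the code from the linked commit: `resolve_options` and the `ignore_overlaps` key are hypothetical names chosen for the example, while `unchecked_tombstone_compaction` is the existing compaction option mentioned in the issue description.

```python
def resolve_options(options):
    """Turn string-valued compaction options into effective booleans.

    'ignore_overlaps' is a hypothetical option name used for illustration.
    """
    def as_bool(key):
        return str(options.get(key, "false")).strip().lower() == "true"

    ignore_overlaps = as_bool("ignore_overlaps")
    return {
        "ignore_overlaps": ignore_overlaps,
        # Ignoring overlaps for fully expired sstables also switches on
        # unchecked tombstone compaction, so both checks skip overlaps.
        "unchecked_tombstone_compaction":
            as_bool("unchecked_tombstone_compaction") or ignore_overlaps,
    }


resolved = resolve_options({"ignore_overlaps": "true"})
```

With this coupling a user only sets one option, and an explicit `unchecked_tombstone_compaction` setting is still honored when overlaps are not ignored.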






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: 

[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-08-17 Thread Romain GERARD (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16130152#comment-16130152
 ] 

Romain GERARD edited comment on CASSANDRA-13418 at 8/17/17 9:52 AM:


Hi,

I am back with a new proposal: 
https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05

Major differences: 
 + Used [~krummas]'s approach for introducing the ignoreOverlaps option
 + I split the function that does the overlap checks because, in the 
previous patch, I was wrongly checking for overlaps in memtables (even if 
the option was activated) 
https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05#diff-e8e282423dcbf34d30a3578c8dec15cdR170
 + I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated  
https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05#diff-e83635b2fb3079d9b91b039c605c15daR71
This seems a sane default to me: even if we drop fully expired sstables, 
we will still check for sstables that are worth dropping tombstones from, and we 
also want to ignore the overlap check in that case.
 + Added a simple test case. I will look to add more (feel free to suggest some)
 + Rebased upon trunk

All tests pass and I will deploy this patch internally to confirm that it 
works as expected.
[~krummas], if you have any remarks in the meantime, let me know.






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-08-17 Thread Romain GERARD (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16130152#comment-16130152
 ] 

Romain GERARD edited comment on CASSANDRA-13418 at 8/17/17 9:51 AM:


Hi,

I am back with a new proposal: 
https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05

Major differences: 
 + Used [~krummas]'s approach for introducing the ignoreOverlaps option
 + I split the function that does the overlap checks because, in the 
previous patch, I was wrongly checking for overlaps in memtables (even if 
the option was activated) 
https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05#diff-e8e282423dcbf34d30a3578c8dec15cdR170
 + I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated  
https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05#diff-e83635b2fb3079d9b91b039c605c15daR71
This seems a sane default to me: even if we drop fully expired sstables, 
we will still check for sstables that are worth dropping tombstones from, and we 
also want to ignore the overlap check in that case.
 + Added a simple test case. I will look to add more (feel free to suggest some)
 + Rebased upon trunk

All tests pass and I will deploy this patch internally to confirm that it 
works as expected.






[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-08-17 Thread Romain GERARD (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16130152#comment-16130152
 ] 

Romain GERARD edited comment on CASSANDRA-13418 at 8/17/17 9:43 AM:


Hi,

I am back with a new proposal: 
https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05

Major differences: 
 + Used [~krummas]'s approach for introducing the ignoreOverlaps option
 + I split the function that does the overlap checks because, in the 
previous patch, I was wrongly checking for overlaps in memtables (even if 
the option was activated) 
https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05#diff-e8e282423dcbf34d30a3578c8dec15cdR170
 + I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated  
https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05#diff-e83635b2fb3079d9b91b039c605c15daR71
This seems a sane default to me: even if we drop fully expired sstables, 
we will still check for sstables that are worth dropping tombstones from, and we 
also want to ignore the overlap check in that case.
 + Added a simple test case. I will look to add more (feel free to suggest some)
 + Rebased upon trunk

All tests pass and I will deploy this patch internally to confirm that it 
works as expected.






[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-08-17 Thread Romain GERARD (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16130152#comment-16130152
 ] 

Romain GERARD edited comment on CASSANDRA-13418 at 8/17/17 9:41 AM:


Hi,

I am back with a new proposal: 
https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05

Major differences: 
 + Used [~krummas]'s approach for introducing the ignoreOverlaps option
 + I split the function that does the overlap checks because, in the 
previous patch, I was wrongly checking for overlaps in memtables (even if 
the option was activated) 
https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05#diff-e8e282423dcbf34d30a3578c8dec15cdR170
 + I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated  
https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05#diff-e83635b2fb3079d9b91b039c605c15daR71
This seems a sane default to me: even if we drop fully expired sstables, 
we will still check for sstables that are worth dropping tombstones from, and we 
also want to ignore the overlap check in that case.
 + Added a simple test case. I will look to add more (feel free to suggest some)
 + Rebased upon trunk

All tests pass and I will deploy this patch internally to confirm that it 
works as expected.






[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-08-17 Thread Romain GERARD (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16130152#comment-16130152
 ] 

Romain GERARD edited comment on CASSANDRA-13418 at 8/17/17 9:41 AM:


Hi,

I am back with a new proposal: 
https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05

Major differences: 
 + Used [~krummas]'s approach for introducing the ignoreOverlaps option
 + I split the function that does the overlap checks because, in the 
previous patch, I was wrongly checking for overlaps in memtables (even if 
the option was activated) 
https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05#diff-e8e282423dcbf34d30a3578c8dec15cdR170
 + I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated  
https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05#diff-e83635b2fb3079d9b91b039c605c15daR71
This seems a sane default to me: even if we drop fully expired sstables, 
we will still check for sstables that are worth dropping tombstones from, and we 
also want to ignore the overlap check in that case.
 + Added a simple test case. I will look to add more (feel free to suggest some)
 + Rebased upon trunk

All tests passed and I will deploy this patch internally to confirm that it 
works as expected.






[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-07-05 Thread Romain GERARD (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074581#comment-16074581
 ] 

Romain GERARD edited comment on CASSANDRA-13418 at 7/5/17 11:02 AM:


That seems better, and I am going to test it.
I will keep you updated on the results.

Thanks for the direction, [~krummas]!



> Allow TWCS to ignore overlaps when dropping fully expired sstables
> --
>
> Key: CASSANDRA-13418
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13418
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction
>Reporter: Corentin Chary
>  Labels: twcs
> Attachments: twcs-cleanup.png
>
>
> http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If 
> you really want read-repairs you're going to have sstables blocking the 
> expiration of other fully expired SSTables because they overlap.
> You can set unchecked_tombstone_compaction = true or tombstone_threshold to a 
> very low value and that will purge the blockers of old data that should 
> already have expired, thus removing the overlaps and allowing the other 
> SSTables to expire.
> The thing is that this is rather CPU intensive and not optimal. If you have 
> time series, you might not care if all your data doesn't exactly expire at 
> the right time, or if data re-appears for some time, as long as it gets 
> deleted as soon as it can. And in this situation I believe it would be really 
> beneficial to allow users to simply ignore overlapping SSTables when looking 
> for fully expired ones.
> To the question: why would you need read-repairs?
> - Full repairs basically take longer than the TTL of the data on my dataset, 
> so this isn't really effective.
> - Even with a 10% chance of doing a repair, we found out that this would be 
> enough to greatly reduce entropy of the most used data (and if you have 
> time series, you're likely to have a dashboard doing the same important 
> queries over and over again).
> - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow.
> I'll try to come up with a patch demonstrating how this would work, try it on 
> our system and report the effects.
> cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already.
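
To make the blocking behaviour described above concrete, here is a minimal Java sketch. It is not Cassandra's implementation: the SSTable class, the fullyExpired method and the ignoreOverlaps flag are hypothetical stand-ins, and the check is a simplified reading of the idea that an sstable can only be dropped when all of its data is past gcBefore and it is older than every overlapping sstable that still holds live data.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ExpiredSSTableSketch {
    /** Hypothetical stand-in for sstable metadata; not Cassandra's SSTableReader. */
    static final class SSTable {
        final String name;
        final long minTimestamp;        // oldest write timestamp in the sstable
        final long maxTimestamp;        // newest write timestamp in the sstable
        final int maxLocalDeletionTime; // time after which every cell is expired

        SSTable(String name, long minTimestamp, long maxTimestamp, int maxLocalDeletionTime) {
            this.name = name;
            this.minTimestamp = minTimestamp;
            this.maxTimestamp = maxTimestamp;
            this.maxLocalDeletionTime = maxLocalDeletionTime;
        }
    }

    /** Returns the candidates that can be dropped without compacting them. */
    static List<SSTable> fullyExpired(List<SSTable> candidates, List<SSTable> overlapping,
                                      int gcBefore, boolean ignoreOverlaps) {
        long minOverlapTimestamp = Long.MAX_VALUE;
        if (!ignoreOverlaps) {
            for (SSTable s : overlapping) {
                // Only overlapping sstables that still hold live data can block a drop.
                if (s.maxLocalDeletionTime >= gcBefore)
                    minOverlapTimestamp = Math.min(minOverlapTimestamp, s.minTimestamp);
            }
        }
        List<SSTable> expired = new ArrayList<>();
        for (SSTable s : candidates) {
            boolean allDataExpired = s.maxLocalDeletionTime < gcBefore;
            boolean olderThanLiveOverlaps = s.maxTimestamp < minOverlapTimestamp;
            if (allDataExpired && olderThanLiveOverlaps)
                expired.add(s);
        }
        return expired;
    }

    public static void main(String[] args) {
        List<SSTable> candidates = Collections.singletonList(new SSTable("old", 10, 20, 500));
        List<SSTable> overlapping = Collections.singletonList(new SSTable("blocker", 15, 90, 2000));
        // With the overlap check, "old" is blocked by "blocker".
        System.out.println(fullyExpired(candidates, overlapping, 1000, false).size()); // prints 0
        // Ignoring overlaps, "old" is dropped as soon as its data has expired.
        System.out.println(fullyExpired(candidates, overlapping, 1000, true).size());  // prints 1
    }
}
```

The trade-off mentioned in the description shows up in the second call: skipping the overlap check frees the expired sstable immediately, at the cost of possibly letting older data in the overlapping sstable become visible again for a while.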



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-07-05 Thread Romain GERARD (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16066894#comment-16066894
 ] 

Romain GERARD edited comment on CASSANDRA-13418 at 7/5/17 8:56 AM:
---

Hi again Marcus,

I took your comments into account. Regarding the first one, I wanted to do 
that at first, but getFullyExpiredSSTables is also used in 
[CompactionTask|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/CompactionTask.java#L165].
 So modifying things only at the TWCS level would have resulted in compacting 
the sstables that we wanted to drop, and I was not too inclined to touch 
CompactionTask.

It also makes worthDroppingTombstones [ignore 
overlaps|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java#L141]
 and respect the specified tombstoneThreshold (we can turn on 
uncheckedTombstoneCompaction for this one).
To sum up, moving things closer to TWCS was not possible (for me) without 
impacting more external code.

Regarding the second question,
I put the code validating the option in 
[TimeWindowCompactionStrategyOptions|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategyOptions.java#L157]
 in order to [trigger an 
exception|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/schema/CompactionParams.java#L161]
 if the option is used with a strategy other than TWCS.

P.S.: I will have more time in the upcoming days, so I will be more responsive.
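
The validation approach described in the comment above (the strategy's options class consumes the option, and a generic check rejects anything left over) could look roughly like the following Java sketch. All names here are hypothetical, including the option key, which is a placeholder and not the name used by the patch; the sketch also throws IllegalArgumentException rather than Cassandra's own exception type.

```java
import java.util.HashMap;
import java.util.Map;

final class OptionValidationSketch {
    // Placeholder option name for illustration only; not the name used by the patch.
    static final String IGNORE_OVERLAPS_KEY = "example_ignore_overlaps";

    /** TWCS-like validation: checks the option, then removes it from the unchecked set. */
    static Map<String, String> validateTwcsLike(Map<String, String> options,
                                                Map<String, String> unchecked) {
        String value = options.get(IGNORE_OVERLAPS_KEY);
        if (value != null && !value.equalsIgnoreCase("true") && !value.equalsIgnoreCase("false"))
            throw new IllegalArgumentException(IGNORE_OVERLAPS_KEY + " must be true or false, got " + value);
        unchecked.remove(IGNORE_OVERLAPS_KEY);
        return unchecked;
    }

    /** A strategy that does not know the option leaves it in the unchecked set. */
    static Map<String, String> validateOther(Map<String, String> options,
                                             Map<String, String> unchecked) {
        return unchecked;
    }

    /** Generic final check: any option nobody consumed is rejected. */
    static void rejectLeftovers(Map<String, String> unchecked) {
        if (!unchecked.isEmpty())
            throw new IllegalArgumentException("Unsupported compaction options: " + unchecked.keySet());
    }

    public static void main(String[] args) {
        Map<String, String> options = new HashMap<>();
        options.put(IGNORE_OVERLAPS_KEY, "true");

        // Accepted: the TWCS-like validator consumes the option.
        rejectLeftovers(validateTwcsLike(options, new HashMap<>(options)));

        // Rejected: another strategy leaves the option unconsumed.
        try {
            rejectLeftovers(validateOther(options, new HashMap<>(options)));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Keeping the check inside the strategy's own options class means no other strategy needs to know the option exists; they reject it simply by not consuming it.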


was (Author: rgerard):
Hi again Marcus,

I took your comments into account. Regarding the first one, I wanted to do 
that at first, but getFullyExpiredSSTables is also used in 
[CompactionTask|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/CompactionTask.java#L165].
 So modifying things only at the TWCS level would have resulted in compacting 
the sstables that we wanted to drop, and I was not too inclined to touch 
CompactionTask.

It also makes worthDroppingTombstones [ignore 
overlaps|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java#L141]
 and respect the specified tombstoneThreshold (we can turn on 
uncheckedTombstoneCompaction for this one).
To sum up, moving things closer to TWCS was not possible (for me) without 
impacting more external code.

Regarding the second question,
I put the code validating the option in 
[TimeWindowCompactionStrategyOptions|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategyOptions.java#L157]
 in order to [trigger an 
exception|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/schema/CompactionParams.java#L161]
 if the option is used with a strategy other than TWCS.

P.S.: I will have more time in the upcoming days, so I will be more responsive.


[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-07-05 Thread Romain GERARD (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074408#comment-16074408
 ] 

Romain GERARD edited comment on CASSANDRA-13418 at 7/5/17 8:26 AM:
---

Sorry about the bad name :(

So here is the current patch we are using in production
https://github.com/criteo-forks/cassandra/commit/9424d9d25978e11b34d725a3bdf8a4956a7cbc82
 
and the branch we are using is this one 
https://github.com/criteo-forks/cassandra/commits/cassandra-3.11-criteo


was (Author: rgerard):
Sorry about the bad name :(

So here is the current patch we are using in production
https://github.com/criteo-forks/cassandra/commit/9424d9d25978e11b34d725a3bdf8a4956a7cbc82
 

and the branch we are using is this one 
https://github.com/criteo-forks/cassandra/commits/cassandra-3.11-criteo




[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-07-05 Thread Romain GERARD (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074408#comment-16074408
 ] 

Romain GERARD edited comment on CASSANDRA-13418 at 7/5/17 8:26 AM:
---

Sorry about the bad name :(

So here is the current patch we are using in production
https://github.com/criteo-forks/cassandra/commit/9424d9d25978e11b34d725a3bdf8a4956a7cbc82
 

and the branch we are using is this one 
https://github.com/criteo-forks/cassandra/commits/cassandra-3.11-criteo


was (Author: rgerard):
Sorry about the bad name :(

So here is the current patch we are using in production
https://github.com/criteo-forks/cassandra/commit/9424d9d25978e11b34d725a3bdf8a4956a7cbc82
 and the branch we are using is this one 
https://github.com/criteo-forks/cassandra/commits/cassandra-3.11-criteo




[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-06-28 Thread Romain GERARD (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16066894#comment-16066894
 ] 

Romain GERARD edited comment on CASSANDRA-13418 at 6/28/17 8:20 PM:


Hi again Marcus,

I took your comments into account. Regarding the first one, I wanted to do 
that at first, but getFullyExpiredSSTables is also used in 
[CompactionTask|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/CompactionTask.java#L165].
 So modifying things only at the TWCS level would have resulted in compacting 
the sstables that we wanted to drop, and I was not too inclined to touch 
CompactionTask.

It also makes worthDroppingTombstones [ignore 
overlaps|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java#L141]
 and respect the specified tombstoneThreshold (we can turn on 
uncheckedTombstoneCompaction for this one).
To sum up, moving things closer to TWCS was not possible (for me) without 
impacting more external code.

Regarding the second question,
I put the code validating the option in 
[TimeWindowCompactionStrategyOptions|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategyOptions.java#L157]
 in order to [trigger an 
exception|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/schema/CompactionParams.java#L161]
 if the option is used with a strategy other than TWCS.

P.S.: I will have more time in the upcoming days, so I will be more responsive.


was (Author: rgerard):
Hi again Marcus,

I took your comments into account. Regarding the first one, I wanted to do 
that at first, but getFullyExpiredSSTables is also used in 
[CompactionTask|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/CompactionTask.java#L165].
 So modifying things only at the TWCS level would have resulted in compacting 
the sstables that we wanted to drop, and I was not too inclined to touch 
CompactionTask.

It also makes worthDroppingTombstones [ignore 
overlaps|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java#L141]
 and respect the specified tombstoneThreshold (we can turn on 
uncheckedTombstoneCompaction for this one).
To sum up, moving things closer to TWCS was not possible (for me) without 
impacting more external code.

Regarding the second question, I put the code validating the option in 
[TimeWindowCompactionStrategyOptions|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategyOptions.java#L157]
 in order to [trigger an 
exception|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/schema/CompactionParams.java#L161]
 if the option is used with a strategy other than TWCS.

P.S.: I will have more time in the upcoming days, so I will be more responsive.


[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-06-28 Thread Romain GERARD (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16066894#comment-16066894
 ] 

Romain GERARD edited comment on CASSANDRA-13418 at 6/28/17 8:18 PM:


Hi again Marcus,

I took your comments into account. Regarding the first one, I wanted to do 
that at first, but getFullyExpiredSSTables is also used in 
[CompactionTask|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/CompactionTask.java#L165].
 So modifying things only at the TWCS level would have resulted in compacting 
the sstables that we wanted to drop, and I was not too inclined to touch 
CompactionTask.

It also makes worthDroppingTombstones [ignore 
overlaps|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java#L141]
 and respect the specified tombstoneThreshold (we can turn on 
uncheckedTombstoneCompaction for this one).
To sum up, moving things closer to TWCS was not possible (for me) without 
impacting more external code.

Regarding the second question, I put the code validating the option in 
[TimeWindowCompactionStrategyOptions|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategyOptions.java#L157]
 in order to [trigger an 
exception|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/schema/CompactionParams.java#L161]
 if the option is used with a strategy other than TWCS.

P.S.: I will have more time in the upcoming days, so I will be more responsive.


was (Author: rgerard):
Hi again Marcus,

I took your comments into account. Regarding the first one, I wanted to do 
that at first, but getFullyExpiredSSTables is also used in 
[CompactionTask|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/CompactionTask.java#L165].
 So modifying things only at the TWCS level would have resulted in compacting 
the sstables that we wanted to drop, and I was not too inclined to touch 
CompactionTask.

It also makes worthDroppingTombstones [ignore 
overlaps|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java#L141]
 and respect the specified tombstoneThreshold (we can turn on 
uncheckedTombstoneCompaction for this one).
To sum up, moving things closer to TWCS was not possible (for me) without 
impacting more external code.

Regarding the second question, I put the code validating the option in 
[TimeWindowCompactionStrategyOptions|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategyOptions.java#L157]
 in order to [trigger an 
exception|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/schema/CompactionParams.java#L161]
 if the option is used with a strategy other than TWCS.

P.S.: I will have more time in the upcoming days, so I will be more responsive.


[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-06-28 Thread Romain GERARD (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16066894#comment-16066894
 ] 

Romain GERARD edited comment on CASSANDRA-13418 at 6/28/17 8:18 PM:


Hi again Marcus,

I took your comments into account. Regarding the first one, I wanted to do 
that at first, but getFullyExpiredSSTables is also used in 
[CompactionTask|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/CompactionTask.java#L165].
 So modifying things only at the TWCS level would have resulted in compacting 
the sstables that we wanted to drop, and I was not too inclined to touch 
CompactionTask.

It also makes worthDroppingTombstones [ignore 
overlaps|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java#L141]
 and respect the specified tombstoneThreshold (we can turn on 
uncheckedTombstoneCompaction for this one).
To sum up, moving things closer to TWCS was not possible (for me) without 
impacting more external code.

Regarding the second question, I put the code validating the option in 
[TimeWindowCompactionStrategyOptions|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategyOptions.java#L157]
 in order to [trigger an 
exception|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/schema/CompactionParams.java#L161]
 if the option is used with a strategy other than TWCS.

P.S.: I will have more time in the upcoming days, so I will be more responsive.


was (Author: rgerard):
Hi again Marcus,

I took your comments into account. Regarding the first one, I wanted to do 
that at first, but getFullyExpiredSSTables is also used in 
[CompactionTask|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/CompactionTask.java#L165].
 So modifying things only at the TWCS level would have resulted in compacting 
the sstables that we wanted to drop, and I was not too inclined to touch 
CompactionTask.

It also makes worthDroppingTombstones [ignore 
overlaps|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java#L141]
 and respect the specified tombstoneThreshold (we can turn on 
uncheckedTombstoneCompaction for this one).
To sum up, moving things closer to TWCS was not possible (for me) without 
impacting more external code.

Regarding the second question, I put the code validating the option in 
[TimeWindowCompactionStrategyOptions|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategyOptions.java#L157]
 in order to [trigger an 
exception|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/schema/CompactionParams.java#L161]
 if the option is used with a strategy other than TWCS.

P.S.: I will have more time in the upcoming days, so I will be more responsive.


[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-06-28 Thread Romain GERARD (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16066894#comment-16066894
 ] 

Romain GERARD edited comment on CASSANDRA-13418 at 6/28/17 5:24 PM:


Hi again Marcus,

I took your comments into account. Regarding the first one, I wanted to do 
that at first, but getFullyExpiredSSTables is also used in 
[CompactionTask|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/CompactionTask.java#L165].
 So modifying things only at the TWCS level would have resulted in compacting 
the sstables that we wanted to drop, and I was not too inclined to touch 
CompactionTask.

It also makes worthDroppingTombstones [ignore 
overlaps|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java#L141]
 and respect the specified tombstoneThreshold (we can turn on 
uncheckedTombstoneCompaction for this one).
To sum up, moving things closer to TWCS was not possible (for me) without 
impacting more external code.

Regarding the second question, I put the code validating the option in 
[TimeWindowCompactionStrategyOptions|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategyOptions.java#L157]
 in order to [trigger an 
exception|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/schema/CompactionParams.java#L161]
 if the option is used with a strategy other than TWCS.

P.S.: I will have more time in the upcoming days, so I will be more responsive.


was (Author: rgerard):
Hi again Marcus,

I took your comments into account. Regarding the first one, I wanted to do 
that at first, but getFullyExpiredSSTables is also used in 
[CompactionTask|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/CompactionTask.java#L165].
 So modifying things only at the TWCS level would have resulted in compacting 
the sstables that we wanted to drop, and I was not too inclined to touch 
CompactionTask.

It also makes worthDroppingTombstones [ignore 
overlaps|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java#L141]
 and respect the specified tombstoneThreshold (we can turn on 
uncheckedTombstoneCompaction for this one).
To sum up, moving things closer to TWCS was not possible (for me) without 
impacting more external code.

Regarding the second question, I put the code validating the option in 
[TimeWindowCompactionStrategyOptions|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategyOptions.java#L157]
 in order to [trigger an 
exception|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/schema/CompactionParams.java#L161]
 if the option is used with a strategy other than TWCS.

P.S.: I will have more time in the upcoming days, so I will be more responsive.


[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-06-28 Thread Romain GERARD (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16066894#comment-16066894
 ] 

Romain GERARD edited comment on CASSANDRA-13418 at 6/28/17 5:24 PM:


Hi again Marcus,

I took your comments into account. Regarding the 1st one, I wanted to do 
that at first, but getFullyExpiredSSTables is also used in 
[CompactionTask|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/CompactionTask.java#L165].
 So only modifying things at the TWCS level would have resulted in compacting 
the sstables that we wanted to drop, and I was not too inclined to touch 
CompactionTask.

It also makes worthDroppingTombstones [ignore 
overlaps|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java#L141]
 and respect the specified tombstoneThreshold (we can turn on 
uncheckedTombstoneCompaction for this one).
To sum up, moving things closer to TWCS was not possible (for me) without 
impacting more external code.
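
To illustrate why overlaps matter when looking for fully expired sstables, here is a minimal Python sketch. It is a toy model, not Cassandra's Java code: the SSTable record, its field names and the ignore_overlaps flag are assumptions made for illustration, and the check is a simplification of what getFullyExpiredSSTables does (memtables, for instance, are left out).

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class SSTable:
    """Toy stand-in for an sstable's metadata (field names are assumptions)."""
    name: str
    min_timestamp: int            # oldest write timestamp in the file
    max_timestamp: int            # newest write timestamp in the file
    max_local_deletion_time: int  # once gc_before passes this, all data in the file is expired


def fully_expired(candidates: Iterable[SSTable],
                  overlapping: Iterable[SSTable],
                  gc_before: int,
                  ignore_overlaps: bool = False) -> List[SSTable]:
    """Return the candidates that can be dropped without compacting them."""
    candidates = list(candidates)
    expired = [s for s in candidates if s.max_local_deletion_time < gc_before]
    if ignore_overlaps:
        # Hypothetical switch: trust the TTL alone and skip the overlap check.
        return expired

    # Any sstable that still holds live data may contain writes that the
    # expired file shadows, so it blocks the drop if it reaches back far enough.
    live = [s for s in list(overlapping) + candidates
            if s.max_local_deletion_time >= gc_before]
    min_live_timestamp = min((s.min_timestamp for s in live), default=float("inf"))
    return [s for s in expired if s.max_timestamp < min_live_timestamp]


if __name__ == "__main__":
    old = SSTable("old", min_timestamp=0, max_timestamp=100, max_local_deletion_time=500)
    # A read repair wrote a little old data into a much newer sstable.
    blocker = SSTable("blocker", min_timestamp=50, max_timestamp=900, max_local_deletion_time=2000)

    print([s.name for s in fully_expired([old], [blocker], gc_before=1000)])
    print([s.name for s in fully_expired([old], [blocker], gc_before=1000, ignore_overlaps=True)])
```

In this model the first call keeps "old" because "blocker" still overlaps it, while the second call drops it. Because the same helper serves both the strategy and the compaction task, a switch handled only in the strategy would leave the task free to compact the file anyway, which is the concern raised above.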

Regarding the 2nd question, I put the code validating the option in 
[TimeWindowCompactionStrategyOptions|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategyOptions.java#L157]
in order to [trigger an 
exception|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/schema/CompactionParams.java#L161]
 if the option is used anywhere other than TWCS.

P.S.: I will have more time in the upcoming days, so I will be more responsive.
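
The validation idea from the comment above (the strategy consumes the options it understands, and anything left over triggers an exception) can be sketched as follows. This is a minimal Python model under stated assumptions, not the code in TimeWindowCompactionStrategyOptions or CompactionParams: the option name ignore_overlaps, the function names and the ConfigurationError class are all hypothetical.

```python
from typing import Callable, Dict

Options = Dict[str, str]


class ConfigurationError(Exception):
    """Raised when a compaction option is invalid or not understood."""


# Options every strategy is assumed to accept in this toy model.
COMMON_OPTIONS = {"tombstone_threshold", "unchecked_tombstone_compaction"}

# Hypothetical name for the TWCS-only switch discussed in this thread.
IGNORE_OVERLAPS = "ignore_overlaps"


def twcs_validate_options(options: Options) -> Options:
    """Check the TWCS-only option and return the options still unchecked."""
    unchecked = dict(options)
    value = unchecked.pop(IGNORE_OVERLAPS, None)
    if value is not None and value.lower() not in ("true", "false"):
        raise ConfigurationError(
            f"{IGNORE_OVERLAPS} must be true or false, got {value!r}")
    return unchecked


def other_validate_options(options: Options) -> Options:
    """A strategy that does not know the TWCS-only option leaves it unchecked."""
    return dict(options)


def validate(strategy_validator: Callable[[Options], Options],
             options: Options) -> None:
    """Reject whatever the strategy and the common checks did not consume."""
    leftovers = set(strategy_validator(options)) - COMMON_OPTIONS
    if leftovers:
        raise ConfigurationError(
            f"Properties {sorted(leftovers)} are not understood")
```

With this layout, validate(twcs_validate_options, {"ignore_overlaps": "true"}) passes, while the same options given to other_validate_options raise ConfigurationError, so the switch cannot be silently set on a strategy that would ignore it.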


was (Author: rgerard):
Hi back Marcus,

So I took into account your comments and regarding the 1rst one I wanted to do 
that at first but 
getFullyExpiredSSTables is also used in 
[CompactionTask|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/CompactionTask.java#L165].
 So only modifying things at the TWS level would have resulted in compacting 
the sstables that we wanted to drop, and I was not too incline to touch to 
CompactionTask.

It is also making worthDroppingTombstones [ignoring 
overlaps|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java#L141]
 and respect the tombstoneThresold specified (We can turn on 
uncheckedTombstoneCompaction for this one)
To sump up moving things closer to TWCS as not possible (to me) without 
impacting more external code. 

Regarding the 2nd question I put the code validating the options in 
[TimeWindowCompactionStategyOptions|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategyOptions.java#L157]
in order to [trigger an 
exception|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/schema/CompactionParams.java#L161]
 if the options is used elsewhere than TWCS.







P.s: I will have more time in the upcoming days, so I will be more responsive.

> Allow TWCS to ignore overlaps when dropping fully expired sstables
> --
>
> Key: CASSANDRA-13418
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13418
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction
>Reporter: Corentin Chary
>  Labels: twcs
> Attachments: twcs-cleanup.png
>
>
> http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If 
> you really want read-repairs you're going to have sstables blocking the 
> expiration of other fully expired SSTables because they overlap.
> You can set unchecked_tombstone_compaction = true or tombstone_threshold to a 
> very low value and that will purge the blockers of old data that should 
> already have expired, thus removing the overlaps and allowing the other 
> SSTables to expire.
> The thing is that this is rather CPU intensive and not optimal. If you have 
> time series, you might not care if all your data doesn't exactly expire at 
> the right time, or if data re-appears for some time, as long as it gets 
> deleted as soon as it can. And in this situation I believe it would be really 
> beneficial to allow users to simply ignore overlapping SSTables when looking 
> for fully expired ones.
> To the question: why would you need read-repairs ?
> - Full repairs basically take longer than the TTL of the data on my dataset, 
> so this isn't really effective.
> - Even with a 10% chances of doing a repair, we found out that this would be 
> enough to greatly reduce entropy of the most used data (and if you have 
> timeseries, you're likely to have a dashboard doing the same important 
> queries over and over 

[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-06-28 Thread Romain GERARD (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16066894#comment-16066894
 ] 

Romain GERARD edited comment on CASSANDRA-13418 at 6/28/17 5:24 PM:


Hi again Marcus,

I took your comments into account. Regarding the 1st one, I wanted to do 
that at first, but getFullyExpiredSSTables is also used in 
[CompactionTask|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/CompactionTask.java#L165].
 So only modifying things at the TWCS level would have resulted in compacting 
the sstables that we wanted to drop, and I was not too inclined to touch 
CompactionTask.

It also makes worthDroppingTombstones [ignore 
overlaps|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java#L141]
 and respect the specified tombstoneThreshold (we can turn on 
uncheckedTombstoneCompaction for this one).
To sum up, moving things closer to TWCS was not possible (for me) without 
impacting more external code.

Regarding the 2nd question, I put the code validating the option in 
[TimeWindowCompactionStrategyOptions|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategyOptions.java#L157]
in order to [trigger an 
exception|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/schema/CompactionParams.java#L161]
 if the option is used anywhere other than TWCS.

P.S.: I will have more time in the upcoming days, so I will be more responsive.


was (Author: rgerard):
Hi back Marcus,

So I took into account your comments and regarding the 1rst one I wanted to do 
that at first but 
getFullyExpiredSSTables is also used in 
[CompactionTask|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/CompactionTask.java#L165].
 So only modifying things at the TWS level would have resulted in compacting 
the sstables that we wanted to drop, and I was not too incline to touch to 
CompactionTask.

It is also making worthDroppingTombstones [ignoring 
overlaps|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java#L141]
 and respect the tombstoneThresold specified (We can turn on 
uncheckedTombstoneCompaction for this one)
To sump up moving things closer to TWCS was not possible (to me) without 
impacting more external code. 

Regarding the 2nd question I put the code validating the option in 
[TimeWindowCompactionStategyOptions|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategyOptions.java#L157]
in order to [trigger an 
exception|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/schema/CompactionParams.java#L161]
 if the options is used elsewhere than TWCS.







P.s: I will have more time in the upcoming days, so I will be more responsive.

> Allow TWCS to ignore overlaps when dropping fully expired sstables
> --
>
> Key: CASSANDRA-13418
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13418
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction
>Reporter: Corentin Chary
>  Labels: twcs
> Attachments: twcs-cleanup.png
>
>
> http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If 
> you really want read-repairs you're going to have sstables blocking the 
> expiration of other fully expired SSTables because they overlap.
> You can set unchecked_tombstone_compaction = true or tombstone_threshold to a 
> very low value and that will purge the blockers of old data that should 
> already have expired, thus removing the overlaps and allowing the other 
> SSTables to expire.
> The thing is that this is rather CPU intensive and not optimal. If you have 
> time series, you might not care if all your data doesn't exactly expire at 
> the right time, or if data re-appears for some time, as long as it gets 
> deleted as soon as it can. And in this situation I believe it would be really 
> beneficial to allow users to simply ignore overlapping SSTables when looking 
> for fully expired ones.
> To the question: why would you need read-repairs ?
> - Full repairs basically take longer than the TTL of the data on my dataset, 
> so this isn't really effective.
> - Even with a 10% chances of doing a repair, we found out that this would be 
> enough to greatly reduce entropy of the most used data (and if you have 
> timeseries, you're likely to have a dashboard doing the same important 
> queries over and over 

[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-06-28 Thread Romain GERARD (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16066894#comment-16066894
 ] 

Romain GERARD edited comment on CASSANDRA-13418 at 6/28/17 5:23 PM:


Hi again Marcus,

I took your comments into account. Regarding the 1st one, I wanted to do 
that at first, but getFullyExpiredSSTables is also used in 
[CompactionTask|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/CompactionTask.java#L165].
 So only modifying things at the TWCS level would have resulted in compacting 
the sstables that we wanted to drop, and I was not too inclined to touch 
CompactionTask.

It also makes worthDroppingTombstones [ignore 
overlaps|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java#L141]
 and respect the specified tombstoneThreshold (we can turn on 
uncheckedTombstoneCompaction for this one).
To sum up, moving things closer to TWCS was not possible (for me) without 
impacting more external code.

Regarding the 2nd question, I put the code validating the option in 
[TimeWindowCompactionStrategyOptions|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategyOptions.java#L157]
in order to [trigger an 
exception|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/schema/CompactionParams.java#L161]
 if the option is used anywhere other than TWCS.

P.S.: I will have more time in the upcoming days, so I will be more responsive.


was (Author: rgerard):
Hi back Marcus,

So I took into account your comments and regarding the 1rst one I wanted to do 
that at first but 
getFullyExpiredSSTables is also used in 
[CompactionTask|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/CompactionTask.java#L165]
So only modifying things at the TWS level would have resulted in compacting the 
sstables that we wanted to drop, and I was not too incline to touch to 
CompactionTask.
It is also making worthDroppingTombstones [ignoring 
overlaps|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java#L141]
 and respect the tombstoneThresold specified (We can turn on 
uncheckedTombstoneCompaction for this one)
To sump up moving things closer to TWCS as not possible (to me) without 
impacting more external code. 

Regarding the 2nd question I put the code validating the options in 
[TimeWindowCompactionStategyOptions|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategyOptions.java#L157]
in order to [trigger an 
exception|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/schema/CompactionParams.java#L161]
 if the options is used elsewhere than TWCS.







P.s: I will have more time in the upcoming days, so I will be more responsive.

> Allow TWCS to ignore overlaps when dropping fully expired sstables
> --
>
> Key: CASSANDRA-13418
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13418
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction
>Reporter: Corentin Chary
>  Labels: twcs
> Attachments: twcs-cleanup.png
>
>
> http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If 
> you really want read-repairs you're going to have sstables blocking the 
> expiration of other fully expired SSTables because they overlap.
> You can set unchecked_tombstone_compaction = true or tombstone_threshold to a 
> very low value and that will purge the blockers of old data that should 
> already have expired, thus removing the overlaps and allowing the other 
> SSTables to expire.
> The thing is that this is rather CPU intensive and not optimal. If you have 
> time series, you might not care if all your data doesn't exactly expire at 
> the right time, or if data re-appears for some time, as long as it gets 
> deleted as soon as it can. And in this situation I believe it would be really 
> beneficial to allow users to simply ignore overlapping SSTables when looking 
> for fully expired ones.
> To the question: why would you need read-repairs ?
> - Full repairs basically take longer than the TTL of the data on my dataset, 
> so this isn't really effective.
> - Even with a 10% chances of doing a repair, we found out that this would be 
> enough to greatly reduce entropy of the most used data (and if you have 
> timeseries, you're likely to have a dashboard doing the same important 
> queries over and over again).

[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-06-28 Thread Romain GERARD (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16066894#comment-16066894
 ] 

Romain GERARD edited comment on CASSANDRA-13418 at 6/28/17 5:23 PM:


Hi again Marcus,

I took your comments into account. Regarding the 1st one, I wanted to do 
that at first, but getFullyExpiredSSTables is also used in 
[CompactionTask|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/CompactionTask.java#L165].
So only modifying things at the TWCS level would have resulted in compacting the 
sstables that we wanted to drop, and I was not too inclined to touch 
CompactionTask.
It also makes worthDroppingTombstones [ignore 
overlaps|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java#L141]
 and respect the specified tombstoneThreshold (we can turn on 
uncheckedTombstoneCompaction for this one).
To sum up, moving things closer to TWCS was not possible (for me) without 
impacting more external code.

Regarding the 2nd question, I put the code validating the option in 
[TimeWindowCompactionStrategyOptions|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategyOptions.java#L157]
in order to [trigger an 
exception|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/schema/CompactionParams.java#L161]
 if the option is used anywhere other than TWCS.

P.S.: I will have more time in the upcoming days, so I will be more responsive.


was (Author: rgerard):
Hi back Marcus,

So I took into account your comments and regarding your the 1rst one I wanted 
to do that at first but 
getFullyExpiredSSTables is also used in 
[CompactionTask|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/CompactionTask.java#L165]
So only modifying things at the TWS level would have resulted in compacting the 
sstables that we wanted to drop, and I was not too incline to touch to 
CompactionTask.
It is also making worthDroppingTombstones [ignoring 
overlaps|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java#L141]
 and respect the tombstoneThresold specified (We can turn on 
uncheckedTombstoneCompaction for this one)
To sump up moving things closer to TWCS as not possible (to me) without 
impacting more external code. 

Regarding the 2nd question I put the code validating the options in 
[TimeWindowCompactionStategyOptions|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategyOptions.java#L157]
in order to [trigger an 
exception|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/schema/CompactionParams.java#L161]
 if the options is used elsewhere than TWCS.







P.s: I will have more time in the upcoming days, so I will be more responsive.

> Allow TWCS to ignore overlaps when dropping fully expired sstables
> --
>
> Key: CASSANDRA-13418
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13418
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction
>Reporter: Corentin Chary
>  Labels: twcs
> Attachments: twcs-cleanup.png
>
>
> http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If 
> you really want read-repairs you're going to have sstables blocking the 
> expiration of other fully expired SSTables because they overlap.
> You can set unchecked_tombstone_compaction = true or tombstone_threshold to a 
> very low value and that will purge the blockers of old data that should 
> already have expired, thus removing the overlaps and allowing the other 
> SSTables to expire.
> The thing is that this is rather CPU intensive and not optimal. If you have 
> time series, you might not care if all your data doesn't exactly expire at 
> the right time, or if data re-appears for some time, as long as it gets 
> deleted as soon as it can. And in this situation I believe it would be really 
> beneficial to allow users to simply ignore overlapping SSTables when looking 
> for fully expired ones.
> To the question: why would you need read-repairs ?
> - Full repairs basically take longer than the TTL of the data on my dataset, 
> so this isn't really effective.
> - Even with a 10% chances of doing a repair, we found out that this would be 
> enough to greatly reduce entropy of the most used data (and if you have 
> timeseries, you're likely to have a dashboard doing the same important 
> queries over and over 

[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-06-28 Thread Romain GERARD (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16066894#comment-16066894
 ] 

Romain GERARD edited comment on CASSANDRA-13418 at 6/28/17 5:22 PM:


Hi again Marcus,

I took your comments into account. Regarding the 1st one, I wanted 
to do that at first, but getFullyExpiredSSTables is also used in 
[CompactionTask|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/CompactionTask.java#L165].
So only modifying things at the TWCS level would have resulted in compacting the 
sstables that we wanted to drop, and I was not too inclined to touch 
CompactionTask.
It also makes worthDroppingTombstones [ignore 
overlaps|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java#L141]
 and respect the specified tombstoneThreshold (we can turn on 
uncheckedTombstoneCompaction for this one).
To sum up, moving things closer to TWCS was not possible (for me) without 
impacting more external code.

Regarding the 2nd question, I put the code validating the option in 
[TimeWindowCompactionStrategyOptions|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategyOptions.java#L157]
in order to [trigger an 
exception|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/schema/CompactionParams.java#L161]
 if the option is used anywhere other than TWCS.

P.S.: I will have more time in the upcoming days, so I will be more responsive.


was (Author: rgerard):
Hi back Marcus,

So I took into account your comments and regarding your the 1rst one I wanted 
to do that at first but 
getFullyExpiredSSTables is also used in 
[CompactionTask|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/CompactionTask.java#L165]

So only modifying things at the TWS level would have resulted in compacting the 
sstables that we wanted to drop, and I was not too incline to touch to 
CompactionTask.
It is also making worthDroppingTombstones [ignoring 
overlaps|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java#L141]
 and respect the tombstoneThresold specified (We can turn on 
uncheckedTombstoneCompaction for this one)

Regarding the 2nd question I put the code validating the options in 
[TimeWindowCompactionStategyOptions|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategyOptions.java#L157]
in order to [trigger an 
exception|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/schema/CompactionParams.java#L161]
 if the options is used elsewhere than TWCS.







P.s: I will have more time in the upcoming days, so I will be more responsive.

> Allow TWCS to ignore overlaps when dropping fully expired sstables
> --
>
> Key: CASSANDRA-13418
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13418
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction
>Reporter: Corentin Chary
>  Labels: twcs
> Attachments: twcs-cleanup.png
>
>
> http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If 
> you really want read-repairs you're going to have sstables blocking the 
> expiration of other fully expired SSTables because they overlap.
> You can set unchecked_tombstone_compaction = true or tombstone_threshold to a 
> very low value and that will purge the blockers of old data that should 
> already have expired, thus removing the overlaps and allowing the other 
> SSTables to expire.
> The thing is that this is rather CPU intensive and not optimal. If you have 
> time series, you might not care if all your data doesn't exactly expire at 
> the right time, or if data re-appears for some time, as long as it gets 
> deleted as soon as it can. And in this situation I believe it would be really 
> beneficial to allow users to simply ignore overlapping SSTables when looking 
> for fully expired ones.
> To the question: why would you need read-repairs ?
> - Full repairs basically take longer than the TTL of the data on my dataset, 
> so this isn't really effective.
> - Even with a 10% chances of doing a repair, we found out that this would be 
> enough to greatly reduce entropy of the most used data (and if you have 
> timeseries, you're likely to have a dashboard doing the same important 
> queries over and over again).
> - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow.
> I'll try to come up 

[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-06-28 Thread Romain GERARD (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16066894#comment-16066894
 ] 

Romain GERARD edited comment on CASSANDRA-13418 at 6/28/17 5:21 PM:


Hi again Marcus,

I took your comments into account. Regarding the 1st one, I wanted 
to do that at first, but getFullyExpiredSSTables is also used in 
[CompactionTask|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/CompactionTask.java#L165].

So only modifying things at the TWCS level would have resulted in compacting the 
sstables that we wanted to drop, and I was not too inclined to touch 
CompactionTask.
It also makes worthDroppingTombstones [ignore 
overlaps|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java#L141]
 and respect the specified tombstoneThreshold (we can turn on 
uncheckedTombstoneCompaction for this one).

Regarding the 2nd question, I put the code validating the option in 
[TimeWindowCompactionStrategyOptions|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategyOptions.java#L157]
in order to [trigger an 
exception|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/schema/CompactionParams.java#L161]
 if the option is used anywhere other than TWCS.

P.S.: I will have more time in the upcoming days, so I will be more responsive.


was (Author: rgerard):
Hi back Marcus,

So I took into account your comments and regarding your the 1rst one I wanted 
to do that at first but 
getFullyExpiredSSTables is also used in 
[CompactionTask|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/CompactionTask.java#L165]

So only modifying things at the TWS level would have resulted in compacting the 
sstables that we wanted to drop, and I was not too incline to touch to 
CompactionTask.
It is also making worthDroppingTombstones [ignoring 
overlaps|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java#L141]
 and respect the tombstoneThresold specified (We can turn on 
uncheckedTombstoneCompaction for this one)

Regarding the 2nd question I put the code validating the options in 
TimeWindowCompactionStategyOptions 
https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategyOptions.java#L157
in order to trigger an exception if the options is used elsewhere than TWCS.
https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/schema/CompactionParams.java#L161






P.s: I will have more time in the upcoming days, so I will be more responsive.

> Allow TWCS to ignore overlaps when dropping fully expired sstables
> --
>
> Key: CASSANDRA-13418
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13418
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction
>Reporter: Corentin Chary
>  Labels: twcs
> Attachments: twcs-cleanup.png
>
>
> http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If 
> you really want read-repairs you're going to have sstables blocking the 
> expiration of other fully expired SSTables because they overlap.
> You can set unchecked_tombstone_compaction = true or tombstone_threshold to a 
> very low value and that will purge the blockers of old data that should 
> already have expired, thus removing the overlaps and allowing the other 
> SSTables to expire.
> The thing is that this is rather CPU intensive and not optimal. If you have 
> time series, you might not care if all your data doesn't exactly expire at 
> the right time, or if data re-appears for some time, as long as it gets 
> deleted as soon as it can. And in this situation I believe it would be really 
> beneficial to allow users to simply ignore overlapping SSTables when looking 
> for fully expired ones.
> To the question: why would you need read-repairs ?
> - Full repairs basically take longer than the TTL of the data on my dataset, 
> so this isn't really effective.
> - Even with a 10% chances of doing a repair, we found out that this would be 
> enough to greatly reduce entropy of the most used data (and if you have 
> timeseries, you're likely to have a dashboard doing the same important 
> queries over and over again).
> - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow.
> I'll try to come up with a patch demonstrating how this would work, try it on 
> our system and report the effects.
> cc: 

[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-06-28 Thread Romain GERARD (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16066894#comment-16066894
 ] 

Romain GERARD edited comment on CASSANDRA-13418 at 6/28/17 5:20 PM:


Hi again Marcus,

I took your comments into account. Regarding the 1st one, I wanted 
to do that at first, but getFullyExpiredSSTables is also used in 
[CompactionTask|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/CompactionTask.java#L165].

So only modifying things at the TWCS level would have resulted in compacting the 
sstables that we wanted to drop, and I was not too inclined to touch 
CompactionTask.
It also makes worthDroppingTombstones [ignore 
overlaps|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java#L141]
 and respect the specified tombstoneThreshold (we can turn on 
uncheckedTombstoneCompaction for this one).

Regarding the 2nd question, I put the code validating the option in 
TimeWindowCompactionStrategyOptions 
https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategyOptions.java#L157
in order to trigger an exception if the option is used anywhere other than TWCS.
https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/schema/CompactionParams.java#L161

P.S.: I will have more time in the upcoming days, so I will be more responsive.


was (Author: rgerard):
Hi back Marcus,

So I took into account your comments and regarding your the 1rst one I wanted 
to do that at first but 
getFullyExpiredSSTables is also used in CompactionTask
https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/CompactionTask.java#L165
So only modifying things at the TWS level would have resulted in compacting the 
sstables that we wanted to drop, and I was not too incline to touch to 
CompactionTask.
It is also making worthDroppingTombstones ignoring overlaps
 
https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java#L141
 and respect the tombstoneThresold specified (We can turn on 
uncheckedTombstoneCompaction for this one)

Regarding the 2nd question I put the code validating the options in 
TimeWindowCompactionStategyOptions 
https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategyOptions.java#L157
in order to trigger an exception if the options is used elsewhere than TWCS.
https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/schema/CompactionParams.java#L161






P.S.: I will have more time in the coming days, so I will be more responsive.

> Allow TWCS to ignore overlaps when dropping fully expired sstables
> --
>
> Key: CASSANDRA-13418
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13418
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction
>Reporter: Corentin Chary
>  Labels: twcs
> Attachments: twcs-cleanup.png
>
>
> http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If 
> you really want read-repairs you're going to have sstables blocking the 
> expiration of other fully expired SSTables because they overlap.
> You can set unchecked_tombstone_compaction = true or tombstone_threshold to a 
> very low value and that will purge the blockers of old data that should 
> already have expired, thus removing the overlaps and allowing the other 
> SSTables to expire.
> The thing is that this is rather CPU intensive and not optimal. If you have 
> time series, you might not care if all your data doesn't exactly expire at 
> the right time, or if data re-appears for some time, as long as it gets 
> deleted as soon as it can. And in this situation I believe it would be really 
> beneficial to allow users to simply ignore overlapping SSTables when looking 
> for fully expired ones.
> To the question: why would you need read-repairs ?
> - Full repairs basically take longer than the TTL of the data on my dataset, 
> so this isn't really effective.
> - Even with a 10% chances of doing a repair, we found out that this would be 
> enough to greatly reduce entropy of the most used data (and if you have 
> timeseries, you're likely to have a dashboard doing the same important 
> queries over and over again).
> - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow.
> I'll try to come up with a patch demonstrating how this would work, try it on 
> our system and report the effects.
> cc: [~adejanovski], 

[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-06-21 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16057392#comment-16057392
 ] 

Corentin Chary edited comment on CASSANDRA-13418 at 6/21/17 12:21 PM:
--

Latest version of the patch works as it should: 
https://github.com/criteo-forks/cassandra/commit/da4a5c17448dab64aeb4295bb7401afbea9edf51

!twcs-cleanup.png!


was (Author: iksaif):
Latest version of the patch works as it should: 
https://github.com/criteo-forks/cassandra/commit/da4a5c17448dab64aeb4295bb7401afbea9edf51

!twcs-cleanup.png|thumbnail!

> Allow TWCS to ignore overlaps when dropping fully expired sstables
> --
>
> Key: CASSANDRA-13418
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13418
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction
>Reporter: Corentin Chary
>  Labels: twcs
> Attachments: twcs-cleanup.png
>
>
> http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If 
> you really want read-repairs you're going to have sstables blocking the 
> expiration of other fully expired SSTables because they overlap.
> You can set unchecked_tombstone_compaction = true or tombstone_threshold to a 
> very low value and that will purge the blockers of old data that should 
> already have expired, thus removing the overlaps and allowing the other 
> SSTables to expire.
> The thing is that this is rather CPU intensive and not optimal. If you have 
> time series, you might not care if all your data doesn't exactly expire at 
> the right time, or if data re-appears for some time, as long as it gets 
> deleted as soon as it can. And in this situation I believe it would be really 
> beneficial to allow users to simply ignore overlapping SSTables when looking 
> for fully expired ones.
> To the question: why would you need read-repairs ?
> - Full repairs basically take longer than the TTL of the data on my dataset, 
> so this isn't really effective.
> - Even with a 10% chances of doing a repair, we found out that this would be 
> enough to greatly reduce entropy of the most used data (and if you have 
> timeseries, you're likely to have a dashboard doing the same important 
> queries over and over again).
> - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow.
> I'll try to come up with a patch demonstrating how this would work, try it on 
> our system and report the effects.
> cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-06-20 Thread Romain GERARD (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16055289#comment-16055289
 ] 

Romain GERARD edited comment on CASSANDRA-13418 at 6/20/17 7:44 AM:


Hello Jonathan, thanks for keeping us up to date :)
On my side, I have deployed the patch I mentioned earlier and at first glance 
it is running fine. For now, I lack the time to analyse the new behavior 
in more depth, but I will do so in the upcoming weeks.
I will keep the thread informed.


was (Author: rgerard):
Hello Jonathan, thanks for keeping us up to date :)
On my side, I have deployed the patch I mentioned earlier and at first glance 
it is running fine. For now, I lack the time to analyse the new behavior 
in more depth, but I will do so in the upcoming weeks. So I will 
keep the thread informed.




[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-05-10 Thread Romain GERARD (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16000735#comment-16000735
 ] 

Romain GERARD edited comment on CASSANDRA-13418 at 5/10/17 7:32 AM:


I am trying things out by merging your ideas [~iksaif], [~jjirsa], 
[~adejanovski]:
https://github.com/erebe/cassandra/commit/f70b1efa5e2b589a5d4fa7245cd307b693ca701c

However, I am not sure what to do if one node of the ring has not started 
Cassandra with -Dcassandra.unsafe.xxx:
https://github.com/erebe/cassandra/commit/12f085a53df62361f2fad5c046dc770ff746b417#diff-e8e282423dcbf34d30a3578c8dec15cdR101
For now I just disable the option with a warning, even if the compactionParams say 
otherwise.

Let me know if this is not the right direction for you.
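
As a rough sketch of the fallback described above (not the code from the linked commit), the node-level system property can act as a gate on the table-level setting. The property name `example.allow_ignore_overlaps` and the class `UnsafeOptionGate` are hypothetical, since the thread only refers to the flag as -Dcassandra.unsafe.xxx, and `System.err` stands in for a real logger.

```java
/** Illustrative only: a table-level option that is honoured only if the node opted in. */
final class UnsafeOptionGate
{
    // Hypothetical property name; the real flag is not spelled out in this thread.
    static final String ALLOW_PROPERTY = "example.allow_ignore_overlaps";

    /** Returns the setting that actually applies on this node. */
    static boolean effectiveIgnoreOverlaps(boolean requestedByCompactionParams)
    {
        // Boolean.getBoolean reads the JVM system property and is true only for "true".
        boolean allowedOnThisNode = Boolean.getBoolean(ALLOW_PROPERTY);

        if (requestedByCompactionParams && !allowedOnThisNode)
        {
            // Stand-in for a logger call: warn and fall back to the safe behaviour.
            System.err.println("WARN: ignoring overlaps was requested but -D" + ALLOW_PROPERTY +
                               "=true is not set on this node; keeping the option disabled");
            return false;
        }
        return requestedByCompactionParams;
    }
}
```

With this shape, a node started without the flag keeps the default overlap checks while the rest of the ring applies the option, which is the mixed-ring situation the question is about.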



was (Author: rgerard):
I am trying things out by merging your ideas [~iksaif], [~jjirsa], 
[~adejanovski]:
https://github.com/erebe/cassandra/commit/12f085a53df62361f2fad5c046dc770ff746b417

However, I am not sure what to do if one node of the ring has not started 
Cassandra with -Dcassandra.unsafe.xxx:
https://github.com/erebe/cassandra/commit/12f085a53df62361f2fad5c046dc770ff746b417#diff-e8e282423dcbf34d30a3578c8dec15cdR101
For now I just disable the option with a warning, even if the compactionParams say 
otherwise.

Let me know if this is not the right direction for you.





[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-05-09 Thread Romain GERARD (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16000735#comment-16000735
 ] 

Romain GERARD edited comment on CASSANDRA-13418 at 5/9/17 10:56 AM:


I am trying things out by merging your ideas [~iksaif], [~jjirsa], 
[~adejanovski]:
https://github.com/erebe/cassandra/commit/12f085a53df62361f2fad5c046dc770ff746b417

However, I am not sure what to do if one node of the ring has not started 
Cassandra with -Dcassandra.unsafe.xxx:
https://github.com/erebe/cassandra/commit/12f085a53df62361f2fad5c046dc770ff746b417#diff-e8e282423dcbf34d30a3578c8dec15cdR101
For now I just disable the option with a warning, even if the compactionParams say 
otherwise.

Let me know if this is not the right direction for you.



was (Author: rgerard):
I am trying things out by merging your ideas [~iksaif], [~jjirsa], 
[~adejanovski]:
https://github.com/erebe/cassandra/commit/12f085a53df62361f2fad5c046dc770ff746b417

However, I am not sure what to do if one node of the ring has not started 
Cassandra with -Dcassandra.unsafe.xxx:
https://github.com/erebe/cassandra/commit/12f085a53df62361f2fad5c046dc770ff746b417#diff-e8e282423dcbf34d30a3578c8dec15cdR101
For now I just disable the option with a warning, even if the compactionParams say 
otherwise.

Let me know if this is not the right direction for you.





[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-05-08 Thread Romain GERARD (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16000735#comment-16000735
 ] 

Romain GERARD edited comment on CASSANDRA-13418 at 5/8/17 1:38 PM:
---

I am trying things out by merging your ideas [~iksaif], [~jjirsa], 
[~adejanovski]:
https://github.com/erebe/cassandra/commit/12f085a53df62361f2fad5c046dc770ff746b417

However, I am not sure what to do if one node of the ring has not started 
Cassandra with -Dcassandra.unsafe.xxx:
https://github.com/erebe/cassandra/commit/12f085a53df62361f2fad5c046dc770ff746b417#diff-e8e282423dcbf34d30a3578c8dec15cdR101
For now I just disable the option with a warning, even if the compactionParams say 
otherwise.

Let me know if this is not the right direction for you.



was (Author: rgerard):
I am trying things out by merging your ideas [~iksaif], [~jjirsa], 
[~adejanovski]:
https://github.com/erebe/cassandra/commit/12f085a53df62361f2fad5c046dc770ff746b417

However, I am not sure what to do if one node of the ring has not started 
Cassandra with -Dcassandra.unsafe.xxx:
https://github.com/erebe/cassandra/commit/12f085a53df62361f2fad5c046dc770ff746b417#diff-e8e282423dcbf34d30a3578c8dec15cdR101

Let me know if this is not the right direction for you.





[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-05-08 Thread Romain GERARD (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16000735#comment-16000735
 ] 

Romain GERARD edited comment on CASSANDRA-13418 at 5/8/17 1:37 PM:
---

I am trying things out by merging your ideas [~iksaif], [~jjirsa], 
[~adejanovski]:
https://github.com/erebe/cassandra/commit/12f085a53df62361f2fad5c046dc770ff746b417

However, I am not sure what to do if one node of the ring has not started 
Cassandra with -Dcassandra.unsafe.xxx:
https://github.com/erebe/cassandra/commit/12f085a53df62361f2fad5c046dc770ff746b417#diff-e8e282423dcbf34d30a3578c8dec15cdR101

Let me know if this is not the right direction for you.



was (Author: rgerard):
I am trying things out by merging your ideas [~iksaif] [~jjirsa] [~adejanovski]:
https://github.com/erebe/cassandra/commit/12f085a53df62361f2fad5c046dc770ff746b417

However, I am not sure what to do if one node of the ring has not started 
Cassandra with -Dcassandra.unsafe.xxx:
https://github.com/erebe/cassandra/commit/12f085a53df62361f2fad5c046dc770ff746b417#diff-e8e282423dcbf34d30a3578c8dec15cdR101

Let me know if this is not the right direction for you.





[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-04-27 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15987267#comment-15987267
 ] 

Jeff Jirsa edited comment on CASSANDRA-13418 at 4/27/17 6:56 PM:
-

I think Marcus' concern is valid, but having run TWCS in production for a long 
time, I really wish we just had a dangerous-sounding option, defaulting to 
a safe state, that would let append-only users ignore overlaps when they want to 
drop sstables.

Adding code to flush/split read-repaired data to different sstables is a lot 
more invasive, and would require follow-up changes to TWCS (so as not to try to 
immediately recompact those sstables with the large final sstables produced by 
the post-window major compaction). We talked about doing this in 
CASSANDRA-9666 (in fact, we committed to 
it as a condition of merging TWCS), and I think it's probably a perfectly 
reasonable thing to do, but it's a lot more effort than simply telling 
cassandra "this table has no deletes, we don't care about overlaps".

Maybe the right thing is to get CASSANDRA-9779 done so we can block deletes, and then 
this is a much-less-scary option?



was (Author: jjirsa):
I think Marcus' concern is valid, but having run TWCS in production for a long 
time, I really wish we just had a dangerous-sounding option, defaulting to 
a safe state, that would let append-only users ignore overlaps when they want to 
drop sstables.

Adding code to flush read-repaired data to different sstables is a lot more 
invasive, and would require follow-up changes to TWCS (so as not to try to 
immediately recompact those sstables with the large final sstables produced by 
the post-window major compaction). We talked about doing this in 
CASSANDRA-9666 (in fact, we committed to 
it as a condition of merging TWCS), and I think it's probably a perfectly 
reasonable thing to do, but it's a lot more effort than simply telling 
cassandra "this table has no deletes, we don't care about overlaps".

Maybe the right thing is to get CASSANDRA-9779 done so we can block deletes, and then 
this is a much-less-scary option?





[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-04-08 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15961710#comment-15961710
 ] 

Jeff Jirsa edited comment on CASSANDRA-13418 at 4/8/17 6:49 AM:


{quote}
What do you think about provide_overlapping_tombstones = "ignore" ? This 
would integrate nicely with the code and does not add yet another 
compaction option (but sounds a little weird).
{quote}

As a table property instead of as a compaction property? Or as a system 
property? It feels like a compaction property to me. [~krummas], do you have any 
suggestions on how something like this should be done (or whether it 
should be done at all)? 




was (Author: jjirsa):
{quote}
What do you think about provide_overlapping_tombstones = "ignore" ? This 
would integrate nicely with the code and does not add yet another 
compaction option (but sounds a little weird).
{quote}

As a table property instead of as a compaction property? It feels like a 
compaction property to me. [~krummas], do you have any suggestions on how 
something like this should be done (or whether it should be done at all)? 





[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-04-07 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15961662#comment-15961662
 ] 

Jeff Jirsa edited comment on CASSANDRA-13418 at 4/8/17 3:29 AM:


unchecked_tombstone_compaction and tombstone_compaction_interval help here, but 
that recompaction is still unnecessary: you may be able to promise/guarantee that all writes 
are appends (not overwrites or deletes), in which case there's no reason to 
recompact all the sstables to incrementally remove garbage when the data truly 
is expired and obsolete.

This sorta borders on (but isn't directly the same as) CASSANDRA-9779: if we 
know we're not going to overwrite/delete any data, there are a number of 
optimizations that can be done. 
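
To make the trade-off concrete, here is a minimal, simplified model of the expiry check under discussion. It is not Cassandra's actual implementation: `SSTableInfo`, `fullyExpired` and the `ignoreOverlaps` flag are names made up for the illustration, and real sstable metadata carries far more than these fields.

```java
import java.util.ArrayList;
import java.util.List;

final class ExpiredSSTableSketch
{
    /** Minimal stand-in for sstable metadata; this is not Cassandra's SSTableReader. */
    static final class SSTableInfo
    {
        final String name;
        final long minTimestamp;        // oldest write timestamp in the sstable
        final long maxTimestamp;        // newest write timestamp in the sstable
        final int maxLocalDeletionTime; // second at which the last cell expires

        SSTableInfo(String name, long minTimestamp, long maxTimestamp, int maxLocalDeletionTime)
        {
            this.name = name;
            this.minTimestamp = minTimestamp;
            this.maxTimestamp = maxTimestamp;
            this.maxLocalDeletionTime = maxLocalDeletionTime;
        }
    }

    /** Returns the candidates that can be dropped whole, without being rewritten. */
    static List<SSTableInfo> fullyExpired(List<SSTableInfo> candidates,
                                          List<SSTableInfo> overlapping,
                                          int gcBefore,
                                          boolean ignoreOverlaps)
    {
        // Oldest write among overlapping sstables that still hold unexpired data.
        long minOverlappingTimestamp = Long.MAX_VALUE;
        for (SSTableInfo o : overlapping)
            if (o.maxLocalDeletionTime >= gcBefore)
                minOverlappingTimestamp = Math.min(minOverlappingTimestamp, o.minTimestamp);

        List<SSTableInfo> droppable = new ArrayList<>();
        for (SSTableInfo c : candidates)
        {
            if (c.maxLocalDeletionTime >= gcBefore)
                continue; // still contains data that has not expired

            // Safe path: drop only if everything here is newer than what overlaps hold.
            // Append-only path: the caller promises no overwrites/deletes, so skip the check.
            if (ignoreOverlaps || c.maxTimestamp < minOverlappingTimestamp)
                droppable.add(c);
        }
        return droppable;
    }
}
```

In this model, an expired sstable with writes up to timestamp 100 is blocked by an overlapping live sstable whose oldest write is at timestamp 50, unless `ignoreOverlaps` is set. That is the situation where a read-repaired sstable holds back the expiration of older ones.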


was (Author: jjirsa):
unchecked_tombstone_compaction and tombstone_compaction_interval help here, but 
that recompaction is still unnecessary: you may be able to promise/guarantee that all writes 
are appends (not overwrites or deletes), in which case there's no reason to 
recompact all the sstables to incrementally remove garbage when the data truly 
is expired and obsolete.

