[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150446#comment-16150446 ]

mck edited comment on CASSANDRA-13418 at 9/4/17 5:31 AM:

Updated:
|| branch || testall || dtest ||
| [cassandra-3.11_13418|https://github.com/thelastpickle/cassandra/tree/mck/cassandra-3.11_13418] | [testall|https://circleci.com/gh/thelastpickle/cassandra/tree/mck%2Fcassandra-3.11_13418] | [dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/265] |
| [trunk_13418|https://github.com/thelastpickle/cassandra/tree/mck/trunk_13418] | [testall|https://circleci.com/gh/thelastpickle/cassandra/tree/mck%2Ftrunk_13418] | [dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/267] |

was (Author: michaelsembwever):
Updated:
|| branch || testall || dtest ||
| [cassandra-3.11_13418|https://github.com/thelastpickle/cassandra/tree/mck/cassandra-3.11_13418] | [testall|https://circleci.com/gh/thelastpickle/cassandra/tree/mck%2Fcassandra-3.11_13418] | [dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/265] |
| [trunk_13418|https://github.com/thelastpickle/cassandra/tree/mck/trunk_13418] | [testall|https://circleci.com/gh/thelastpickle/cassandra/tree/mck%2Ftrunk_13418] | [dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/265] |

> Allow TWCS to ignore overlaps when dropping fully expired sstables
> ------------------------------------------------------------------
>
>          Key: CASSANDRA-13418
>          URL: https://issues.apache.org/jira/browse/CASSANDRA-13418
>      Project: Cassandra
>   Issue Type: Improvement
>   Components: Compaction
>     Reporter: Corentin Chary
>     Assignee: Romain GERARD
>       Labels: twcs
>      Fix For: 3.11.x, 4.x
>
>  Attachments: twcs-cleanup.png
>
>
> http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If
> you really want read-repairs you're going to have sstables blocking the
> expiration of other fully expired SSTables because they overlap.
> You can set unchecked_tombstone_compaction = true or tombstone_threshold to a
> very low value and that will purge the blockers of old data that should
> already have expired, thus removing the overlaps and allowing the other
> SSTables to expire.
> The thing is that this is rather CPU intensive and not optimal. If you have
> time series, you might not care if all your data doesn't exactly expire at
> the right time, or if data re-appears for some time, as long as it gets
> deleted as soon as it can. And in this situation I believe it would be really
> beneficial to allow users to simply ignore overlapping SSTables when looking
> for fully expired ones.
> To the question: why would you need read-repairs?
> - Full repairs basically take longer than the TTL of the data on my dataset,
> so this isn't really effective.
> - Even with a 10% chance of doing a repair, we found out that this would be
> enough to greatly reduce entropy of the most used data (and if you have
> timeseries, you're likely to have a dashboard doing the same important
> queries over and over again).
> - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow.
> I'll try to come up with a patch demonstrating how this would work, try it on
> our system and report the effects.
> cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
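The blocking behaviour described in the issue can be illustrated with a small sketch. This is a simplified model, not the code from the patch or from Cassandra itself: the `Table` class, its fields, and the `fullyExpired` method are hypothetical names, and the sketch assumes every sstable overlaps every other in key range. An sstable whose data has all passed `gcBefore` is only dropped when its newest write is older than the oldest write in any sstable that still holds live data, unless `ignoreOverlaps` is set.

```java
import java.util.ArrayList;
import java.util.List;

class ExpiredSSTableSketch {
    /** Hypothetical stand-in for sstable metadata; not a Cassandra class. */
    static final class Table {
        final String name;
        final long minTimestamp;        // oldest write timestamp in the table
        final long maxTimestamp;        // newest write timestamp in the table
        final int maxLocalDeletionTime; // time after which all of its data has expired

        Table(String name, long minTimestamp, long maxTimestamp, int maxLocalDeletionTime) {
            this.name = name;
            this.minTimestamp = minTimestamp;
            this.maxTimestamp = maxTimestamp;
            this.maxLocalDeletionTime = maxLocalDeletionTime;
        }
    }

    /**
     * Returns the names of tables that may be dropped whole.
     * Assumption: all tables overlap each other in key range.
     */
    static List<String> fullyExpired(List<Table> tables, int gcBefore, boolean ignoreOverlaps) {
        // Oldest write timestamp found in any table that still holds live data.
        long minLiveTimestamp = Long.MAX_VALUE;
        for (Table t : tables)
            if (t.maxLocalDeletionTime >= gcBefore)
                minLiveTimestamp = Math.min(minLiveTimestamp, t.minTimestamp);

        List<String> expired = new ArrayList<>();
        for (Table t : tables) {
            boolean allDataExpired = t.maxLocalDeletionTime < gcBefore;
            // A live table holding older writes "blocks" the drop unless overlaps are ignored.
            boolean olderThanLiveData = t.maxTimestamp < minLiveTimestamp;
            if (allDataExpired && (ignoreOverlaps || olderThanLiveData))
                expired.add(t.name);
        }
        return expired;
    }
}
```

In this model, a recent sstable that received a read-repaired old cell (a low `minTimestamp`) keeps an otherwise expired sstable on disk; passing `ignoreOverlaps = true` drops it regardless, which is the trade-off the issue asks to make available.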
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16151196#comment-16151196 ]

mck edited comment on CASSANDRA-13418 at 9/1/17 9:39 PM:

{quote}P.s: https://github.com/thelastpickle/cassandra/commit/58440e707cd6490847a37dc8d76c150d3eb27aab#diff-e8e282423dcbf34d30a3578c8dec15cdR176 still think is less clear to inline it.{quote}

I agree, but found no clear method name to use. As Marcus commented, {{getFullyExpiredSSTables(..)}} isn't appropriate. Any suggestions for a clear name?
Otherwise the method is 70 lines long, not great but no disaster, so I'm OK either way.

was (Author: michaelsembwever):
{quote}P.s: https://github.com/thelastpickle/cassandra/commit/58440e707cd6490847a37dc8d76c150d3eb27aab#diff-e8e282423dcbf34d30a3578c8dec15cdR176 still think is less clear to inline it.{quote}

I agree, but found not clear method name to use. As Marcus' comments, {{getFullyExpiredSSTables(..)}} isn't appropriate. Any suggestions for a clear name?
Otherwise the method is at 70 lines length, not great but no disaster, so i'm ok either way.
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150559#comment-16150559 ]

Romain GERARD edited comment on CASSANDRA-13418 at 9/1/17 1:56 PM:

Don't worry [~michaelsembwever], I am currently working on an issue with Couchbase so I couldn't have checked it until Monday. So no hard feelings :)

P.S.: https://github.com/thelastpickle/cassandra/commit/58440e707cd6490847a37dc8d76c150d3eb27aab#diff-e8e282423dcbf34d30a3578c8dec15cdR176 I still think it is less clear to inline it.

was (Author: rgerard):
Don't worry [~michaelsembwever], I am currently working with an issue on couchbase so I couldn't have checked it until monday. So no hard feeling :)

P.s: https://github.com/thelastpickle/cassandra/commit/58440e707cd6490847a37dc8d76c150d3eb27aab#diff-e8e282423dcbf34d30a3578c8dec15cdR176 still think is less clear to inline it.
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150559#comment-16150559 ]

Romain GERARD edited comment on CASSANDRA-13418 at 9/1/17 1:51 PM:

Don't worry [~michaelsembwever], I am currently working with an issue on Couchbase so I couldn't have checked it until Monday. So no hard feelings :)

P.S.: https://github.com/thelastpickle/cassandra/commit/58440e707cd6490847a37dc8d76c150d3eb27aab#diff-e8e282423dcbf34d30a3578c8dec15cdR176 I still think it is less clear to inline it.

was (Author: rgerard):
Don't worry [~mck], I am currently working with an issue on couchbase so I couldn't have checked it until monday. So no hard feeling :)

P.s: https://github.com/thelastpickle/cassandra/commit/58440e707cd6490847a37dc8d76c150d3eb27aab#diff-e8e282423dcbf34d30a3578c8dec15cdR176 still think is less clear to inline it.
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150409#comment-16150409 ]

mck edited comment on CASSANDRA-13418 at 9/1/17 12:15 PM:

[~rgerard], I failed to see your last comment until now. I've addressed [~krummas]'s concerns [here|https://github.com/thelastpickle/cassandra/commit/58440e707cd6490847a37dc8d76c150d3eb27aab], but feel terrible now for stepping on your toes.

A few code style issues beyond the braces have been fixed. Thanks for the pushback, Marcus! For example, I changed the names of the constants in {{TimeWindowCompactionStrategyOptions}} to align better with the existing constants there.

Two additions have been made to the tests in {{TimeWindowCompactionStrategyTest}}: one for {{TimeWindowCompactionStrategyOptions.validateOptions}}, which is only there for the tests, and a new test method which does what Marcus asks for. ([~krummas], do you still want a dtest?)

was (Author: michaelsembwever):
[~rgerard], i failed to see your last comment til now. I've addressed [~krummas]'s concerns [here|https://github.com/thelastpickle/cassandra/commit/17b1d30ac8f07c49bfc4d51b14d3201cc969fcfe], but feel terrible now for stepping on your toes.

A few code style issues beyond the braces have been fixed. Thanks for the push back Marcus! For example, I change the names of the constants in {{TimeWindowCompactionStrategyOptions}} to be more in align with the previous constants there.

Two additions to the tests in {{TimeWindowCompactionStrategyTest}} are added. One for the {{TimeWindowCompactionStrategyOptions.validateOptions}} which is only there for the tests, and a new test method which does what Marcus asks for. ([~krummas], do you still want a dtest?)
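As a rough illustration of what an option-validation test like the one mentioned above would exercise, here is a minimal sketch of validating a boolean compaction option held in a string-keyed map. It is not the code from the patch: the class, the option key `example_ignore_overlaps`, and the use of `IllegalArgumentException` are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

class OptionValidationSketch {
    // Hypothetical option key and default, used only for illustration.
    static final String IGNORE_OVERLAPS_KEY = "example_ignore_overlaps";
    static final boolean DEFAULT_IGNORE_OVERLAPS = false;

    /**
     * Checks the option if present and returns a copy of the map without it,
     * so the caller can see which options were left unrecognised.
     */
    static Map<String, String> validateOptions(Map<String, String> options) {
        Map<String, String> unchecked = new HashMap<>(options);
        String value = unchecked.remove(IGNORE_OVERLAPS_KEY);
        if (value != null && !value.equalsIgnoreCase("true") && !value.equalsIgnoreCase("false"))
            throw new IllegalArgumentException(
                IGNORE_OVERLAPS_KEY + " must be 'true' or 'false', got: " + value);
        return unchecked;
    }

    /** Reads the option, falling back to the default when it is absent. */
    static boolean ignoreOverlaps(Map<String, String> options) {
        String value = options.get(IGNORE_OVERLAPS_KEY);
        return value == null ? DEFAULT_IGNORE_OVERLAPS : Boolean.parseBoolean(value);
    }
}
```

A unit test against such a method would typically cover three cases: a valid value is accepted and consumed, an absent value yields the default, and a malformed value is rejected.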
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150409#comment-16150409 ]

mck edited comment on CASSANDRA-13418 at 9/1/17 12:06 PM:

[~rgerard], I failed to see your last comment until now. I've addressed [~krummas]'s concerns [here|https://github.com/thelastpickle/cassandra/commit/17b1d30ac8f07c49bfc4d51b14d3201cc969fcfe], but feel terrible now for stepping on your toes.

A few code style issues beyond the braces have been fixed. Thanks for the pushback, Marcus! For example, I changed the names of the constants in {{TimeWindowCompactionStrategyOptions}} to align better with the existing constants there.

Two additions have been made to the tests in {{TimeWindowCompactionStrategyTest}}: one for {{TimeWindowCompactionStrategyOptions.validateOptions}}, which is only there for the tests, and a new test method which does what Marcus asks for. ([~krummas], do you still want a dtest?)

was (Author: michaelsembwever):
[~rgerard], i failed to see your last comment til now. I've addressed [~krummas]'s concerns [here|https://github.com/thelastpickle/cassandra/commit/17b1d30ac8f07c49bfc4d51b14d3201cc969fcfe], but feel terrible now for stepping on your toes.

A few code style issues beyond the braces have been fixed. Thanks for the push back Marcus! For example, I change the names of the constants in {{TimeWindowCompactionStrategyOptions}} to be more in align with the previous constants there.

Two additions to the tests in {{TimeWindowCompactionStrategyTest}} are added. One for the {{TimeWindowCompactionStrategyOptions.validateOptions}} which is only there for the tests, and a new test method which does what Marcus asks for. ([~krummas], do you still want a dtest still warranted?)
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150409#comment-16150409 ]

mck edited comment on CASSANDRA-13418 at 9/1/17 12:05 PM:

[~rgerard], I failed to see your last comment until now. I've addressed [~krummas]'s concerns [here|https://github.com/thelastpickle/cassandra/commit/17b1d30ac8f07c49bfc4d51b14d3201cc969fcfe], but feel terrible now for stepping on your toes.

A few code style issues beyond the braces have been fixed. Thanks for the pushback, Marcus! For example, I changed the names of the constants in {{TimeWindowCompactionStrategyOptions}} to align better with the existing constants there.

Two additions have been made to the tests in {{TimeWindowCompactionStrategyTest}}: one for {{TimeWindowCompactionStrategyOptions.validateOptions}}, which is only there for the tests, and a new test method which does what Marcus asks for. ([~krummas], do you still want a dtest?)

was (Author: michaelsembwever):
[~rgerard], i failed to see your last comment til now. I've addressed [~krummas]'s concerns [here|https://github.com/thelastpickle/cassandra/commit/17b1d30ac8f07c49bfc4d51b14d3201cc969fcfe], but feel terrible now for stepping on your toes.

A few code style issues beyond the braces have been fixed. Thanks for the push back Marcus! For example, I change the names of the constants in {{TimeWindowCompactionStrategyOptions}} to be more in align with the previous constants there.

Two additions to the tests in {{TimeWindowCompactionStrategyTest}} are added. One for the {{TimeWindowCompactionStrategyTest.validateOptions}} which is only there for the tests, and a new test method which does what Marcus asks for. ([~krummas], do you still want a dtest still warranted?)
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16147112#comment-16147112 ]

mck edited comment on CASSANDRA-13418 at 8/30/17 12:25 PM:

[~KurtG], thanks for the write-up of the two use cases. At least it's documented here to begin with.

The "more compactions" in the first scenario also depends on the values of tombstone_threshold and tombstone_compaction_interval. Because of this I'm now sitting on the fence as to whether the logged warning should be in the patch. In certain situations the second use case is actually an optimisation.

Patches updated:
 - the new TWCS* classes renamed to TimeWindow*, as that's the standard prefix,
 - log message shortened a little, and now using the terminology (property names) known to the operator,
 - logging the message via the NoSpamLogger (max one line every 15 minutes), and
 - site and cql docs updated (just in trunk)

Shall I just remove the log warning altogether now that it's in the docs?

|| branch || testall || dtest ||
| [cassandra-3.11_13418|https://github.com/thelastpickle/cassandra/tree/mck/cassandra-3.11_13418] | [testall|https://circleci.com/gh/thelastpickle/cassandra/tree/mck%2Fcassandra-3.11_13418] | [dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/241] |
| [trunk_13418|https://github.com/thelastpickle/cassandra/tree/mck/trunk_13418] | [testall|https://circleci.com/gh/thelastpickle/cassandra/tree/mck%2Ftrunk_13418] | [dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/241] |

was (Author: michaelsembwever):
[~KurtG], thanks for the two use cases write up. at least it's documented here to begin with.

The "more compactions" in the first scenario also depends on values for tombstone_threshold and tombstone_compaction_interval. Because of this I'm now sitting on the fence for whether the logged warning should be in the patch.

Patches updated:
 - the new TWCS* classes renamed to TimeWindow*, as that's the standard prefix,
 - log message shorten a little, and using the terminology (property names) as known by the operator,
 - logging the message via the NoSpamLogger (max one line every 15 minutes), and
 - site and cql docs updated (just in trunk)

Shall I just remove the log warning altogether now it's in the docs

|| branch || testall || dtest ||
| [cassandra-3.11_13418|https://github.com/thelastpickle/cassandra/tree/mck/cassandra-3.11_13418] | [testall|https://circleci.com/gh/thelastpickle/cassandra/tree/mck%2Fcassandra-3.11_13418] | [dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/241] |
| [trunk_13418|https://github.com/thelastpickle/cassandra/tree/mck/trunk_13418] | [testall|https://circleci.com/gh/thelastpickle/cassandra/tree/mck%2Ftrunk_13418] | [dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/241] |
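The "max one line every 15 minutes" behaviour mentioned in the comment above can be sketched as a small rate limiter. This is an illustrative stand-in, not Cassandra's NoSpamLogger: the class name `RateLimitedWarningSketch` and its `tryAcquire` method are hypothetical, and the caller is assumed to pass in a monotonic clock reading such as `System.nanoTime()`.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

class RateLimitedWarningSketch {
    // Sentinel meaning "nothing has been emitted yet".
    private static final long NEVER = Long.MIN_VALUE;

    private final long minIntervalNanos;
    private final AtomicLong lastEmitNanos = new AtomicLong(NEVER);

    RateLimitedWarningSketch(long minInterval, TimeUnit unit) {
        this.minIntervalNanos = unit.toNanos(minInterval);
    }

    /**
     * Returns true if a message may be emitted at the given clock reading,
     * and records that reading as the time of the last emission.
     */
    boolean tryAcquire(long nowNanos) {
        long last = lastEmitNanos.get();
        if (last != NEVER && nowNanos - last < minIntervalNanos)
            return false; // still inside the quiet period
        // Only one of several racing threads wins the right to log.
        return lastEmitNanos.compareAndSet(last, nowNanos);
    }
}
```

A caller would wrap the warning as `if (limiter.tryAcquire(System.nanoTime())) logger.warn(...)`, so a condition that is re-evaluated on every compaction check produces at most one log line per interval.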
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16147112#comment-16147112 ] mck edited comment on CASSANDRA-13418 at 8/30/17 11:55 AM: --- [~KurtG], thanks for the two use cases write up. at least it's documented here to begin with. The "more compactions" in the first scenario also depends on values for tombstone_threshold and tombstone_compaction_interval. Because of this I'm now sitting on the fence for whether the logged warning should be in the patch. Patches updated: - the new TWCS* classes renamed to TimeWindow*, as that's the standard prefix, - log message shorten a little, and using the terminology (property names) as known by the operator, - logging the message via the NoSpamLogger (max one line every 15 minutes), and - site and cql docs updated (just in trunk) Shall I just remove the log warning altogether now it's in the docs || branch || testall || dtest || | [cassandra-3.11_13418|https://github.com/thelastpickle/cassandra/tree/mck/cassandra-3.11_13418] | [testall|https://circleci.com/gh/thelastpickle/cassandra/tree/mck%2Fcassandra-3.11_13418] | [dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/241] | | [trunk_13418|https://github.com/thelastpickle/cassandra/tree/mck/trunk_13418] | [testall|https://circleci.com/gh/thelastpickle/cassandra/tree/mck%2Ftrunk_13418] | [dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/241] | was (Author: michaelsembwever): [~KurtG], thanks for the two use cases write up. at least it's documented here to begin with. The "more compactions" in the first scenario also depends on values for tombstone_threshold and tombstone_compaction_interval. I'm sitting on the fence for whether even the logged warning should be in the patch. 
Patches updated: - the new TWCS* classes to TimeWindow* as that's the standard prefix, - log message shorten a little, and using the terminology (property names) as known by the operator, - logging the message via the NoSpamLogger (max one line every 15 minutes), and - site and cql docs updated (just in trunk) Shall I just remove the log warning altogether now it's in the docs || branch || testall || dtest || | [cassandra-3.11_13418|https://github.com/thelastpickle/cassandra/tree/mck/cassandra-3.11_13418] | [testall|https://circleci.com/gh/thelastpickle/cassandra/tree/mck%2Fcassandra-3.11_13418] | [dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/241] | | [trunk_13418|https://github.com/thelastpickle/cassandra/tree/mck/trunk_13418] | [testall|https://circleci.com/gh/thelastpickle/cassandra/tree/mck%2Ftrunk_13418] | [dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/241] | > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary >Assignee: Romain GERARD > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. 
If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard]
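The comment above mentions logging the warning through Cassandra's NoSpamLogger, at most one line every 15 minutes. The following is a minimal, self-contained sketch of that rate-limiting idea only; it is not Cassandra's NoSpamLogger and not the code in the patch. The class name `RateLimitedWarning`, its method, and the message text are hypothetical, and the caller passes the current time explicitly so the behaviour is easy to check.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in illustrating "max one line every 15 minutes".
// It is NOT Cassandra's NoSpamLogger; names and message are assumptions.
public class RateLimitedWarning {
    private final long minIntervalNanos;
    private boolean loggedOnce = false;
    private long lastLoggedNanos = 0L;

    public RateLimitedWarning(long minInterval, TimeUnit unit) {
        this.minIntervalNanos = unit.toNanos(minInterval);
    }

    /**
     * Emits the warning unless one was already emitted less than the
     * minimum interval ago. Returns true if the line was written.
     */
    public synchronized boolean warn(long nowNanos, String message) {
        if (loggedOnce && nowNanos - lastLoggedNanos < minIntervalNanos) {
            return false; // suppressed: still inside the quiet period
        }
        loggedOnce = true;
        lastLoggedNanos = nowNanos;
        System.err.println("WARN " + message);
        return true;
    }

    public static void main(String[] args) {
        RateLimitedWarning warning = new RateLimitedWarning(15, TimeUnit.MINUTES);
        // Hypothetical message using the operator-facing property names.
        String msg = "unsafe_aggressive_sstable_expiration is enabled "
                   + "without unchecked_tombstone_compaction";
        long start = System.nanoTime();
        warning.warn(start, msg);                                   // written
        warning.warn(start + TimeUnit.MINUTES.toNanos(1), msg);     // suppressed
        warning.warn(start + TimeUnit.MINUTES.toNanos(15), msg);    // written again
    }
}
```

Rate limiting of this kind keeps a condition that is evaluated on every compaction pass from flooding the log, which is the concern the comment raises.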
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16144901#comment-16144901 ]

Romain GERARD edited comment on CASSANDRA-13418 at 8/29/17 8:15 AM:

bq. 2. only enabling unsafe_aggressive_sstable_expiration
When looking for expired sstables, you will *ignore* the overlaps and only check locally whether the current sstable is eligible.
When looking for sstables to compact, you will *not ignore* the overlaps and will check globally whether the current sstable is eligible.
In this case, you will most likely not trigger any compaction to purge tombstones if you run into an overlap.

bq. 1. enabling both
When looking for expired sstables, you will *ignore* the overlaps and only check locally whether the current sstable is eligible.
When looking for sstables to compact, you will *ignore* the overlaps and check locally whether the current sstable is eligible.
In this case, you will always trigger a compaction to purge tombstones, even if you run into an overlap.

-
I made a new version of the patch with uncheckedTombstoneCompaction disabled and a warning message.
https://github.com/criteo-forks/cassandra/commit/1800b23ddfbb308645c44022e15c1760a0124025

the diff
{noformat}
diff --git a/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java b/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java
index 43c90c7042..d21222c484 100644
--- a/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java
+++ b/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java
@@ -67,9 +67,9 @@ public class TimeWindowCompactionStrategy extends AbstractCompactionStrategy
         else
             logger.debug("Enabling tombstone compactions for TWCS");
-        if (this.options.ignoreOverlaps)
-            this.uncheckedTombstoneCompaction = true;
-
+        if(this.options.ignoreOverlaps && !this.uncheckedTombstoneCompaction) {
+            logger.warn("You are running with sstables overlapping checks disabled but without unchecked tombstone compaction, check that this is what you want");
+        }
     }
{noformat}

was (Author: rgerard):
bq. 2. only enabling unsafe_aggressive_sstable_expiration
When looking for sstables expired, you will *ignore* the overlaps and only look locally if the current sstable is eligible.
When looking for sstables to compact, you will *not ignore* the overlaps and look globally if the current sstable is eligible.

bq. 1. enabling both
When looking for sstables expired, you will *ignore* the overlaps and only look locally if the current sstable is eligible.
When looking for sstables to compact, you will *ignore* the overlaps and look locally if the current sstable is eligible.

-
I made a new version of the patch with uncheckedTombstoneCompaction disabled and a warning message.
https://github.com/criteo-forks/cassandra/commit/1800b23ddfbb308645c44022e15c1760a0124025 the diff {noformat} diff --git a/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java b/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java index 43c90c7042..d21222c484 100644 --- a/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java +++ b/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java @@ -67,9 +67,9 @@ public class TimeWindowCompactionStrategy extends AbstractCompactionStrategy else logger.debug("Enabling tombstone compactions for TWCS"); -if (this.options.ignoreOverlaps) -this.uncheckedTombstoneCompaction = true; - +if(this.options.ignoreOverlaps && !this.uncheckedTombstoneCompaction) { +logger.warn("You are running with sstables overlapping checks disabled but without unchecked tombstone compaction, check that this is what you want"); +} } {noformat} > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is
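The two scenarios in the comment above can be summarised in a small model. This is only a sketch of the decision logic as described in the thread, assuming that `ignoreOverlaps` corresponds to unsafe_aggressive_sstable_expiration and that `uncheckedTombstoneCompaction` corresponds to unchecked_tombstone_compaction. The class `OverlapChecks`, its nested `Options` type, and the method names are hypothetical and do not exist in Cassandra; only the warning condition is taken from the diff.

```java
// Hypothetical model of the two options discussed in the thread.
// None of these names are Cassandra APIs; they only restate the comment.
public class OverlapChecks {

    /** Holder for the two table options (names assumed for illustration). */
    public static final class Options {
        public final boolean ignoreOverlaps;               // unsafe_aggressive_sstable_expiration
        public final boolean uncheckedTombstoneCompaction; // unchecked_tombstone_compaction

        public Options(boolean ignoreOverlaps, boolean uncheckedTombstoneCompaction) {
            this.ignoreOverlaps = ignoreOverlaps;
            this.uncheckedTombstoneCompaction = uncheckedTombstoneCompaction;
        }
    }

    /** True if overlapping sstables are consulted when looking for fully expired sstables. */
    public static boolean checksOverlapsForExpiration(Options o) {
        return !o.ignoreOverlaps;
    }

    /** True if overlapping sstables are consulted when picking an sstable for tombstone compaction. */
    public static boolean checksOverlapsForTombstoneCompaction(Options o) {
        return !o.uncheckedTombstoneCompaction;
    }

    /** Same condition as the diff: overlaps ignored, but tombstone compaction still checks them. */
    public static boolean shouldWarn(Options o) {
        return o.ignoreOverlaps && !o.uncheckedTombstoneCompaction;
    }

    public static void main(String[] args) {
        Options onlyExpiration = new Options(true, false); // scenario 2
        Options both = new Options(true, true);            // scenario 1

        for (Options o : new Options[] { onlyExpiration, both }) {
            System.out.println(
                "expiration checks overlaps: " + checksOverlapsForExpiration(o)
                + ", tombstone compaction checks overlaps: " + checksOverlapsForTombstoneCompaction(o)
                + ", warn: " + shouldWarn(o));
        }
    }
}
```

In this model only scenario 2 triggers the warning, which matches the condition added in the diff.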
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16144901#comment-16144901 ] Romain GERARD edited comment on CASSANDRA-13418 at 8/29/17 8:11 AM: bq. 2. only enabling unsafe_aggressive_sstable_expiration When looking for sstables expired, you will *ignore* the overlaps and only look locally if the current sstable is eligible. When looking for sstables to compact, you will *not ignore* the overlaps and look globally if the current sstable is eligible. bq. 1. enabling both When looking for sstables expired, you will *ignore *the overlaps and only look locally if the current sstable is eligible. When looking for sstables to compact, you will *ignore* the overlaps and look globally if the current sstable is eligible. - I made a new version of the patch with uncheckedTombstoneCompaction disabled and a warning message. https://github.com/criteo-forks/cassandra/commit/1800b23ddfbb308645c44022e15c1760a0124025 the diff {noformat} diff --git a/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java b/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java index 43c90c7042..d21222c484 100644 --- a/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java +++ b/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java @@ -67,9 +67,9 @@ public class TimeWindowCompactionStrategy extends AbstractCompactionStrategy else logger.debug("Enabling tombstone compactions for TWCS"); -if (this.options.ignoreOverlaps) -this.uncheckedTombstoneCompaction = true; - +if(this.options.ignoreOverlaps && !this.uncheckedTombstoneCompaction) { +logger.warn("You are running with sstables overlapping checks disabled but without unchecked tombstone compaction, check that this is what you want"); +} } {noformat} was (Author: rgerard): bq. 2. 
only enabling unsafe_aggressive_sstable_expiration When looking for sstables expired, you will *ignore *the overlaps and only look locally if the current sstable is eligible. When looking for sstables to compact, you will *not ignore* the overlaps and look globally if the current sstable is eligible. bq. 1. enabling both When looking for sstables expired, you will *ignore *the overlaps and only look locally if the current sstable is eligible. When looking for sstables to compact, you will *ignore* the overlaps and look globally if the current sstable is eligible. - I made a new version of the patch with uncheckedTombstoneCompaction disabled and a warning message. https://github.com/criteo-forks/cassandra/commit/1800b23ddfbb308645c44022e15c1760a0124025 the diff {noformat} diff --git a/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java b/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java index 43c90c7042..d21222c484 100644 --- a/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java +++ b/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java @@ -67,9 +67,9 @@ public class TimeWindowCompactionStrategy extends AbstractCompactionStrategy else logger.debug("Enabling tombstone compactions for TWCS"); -if (this.options.ignoreOverlaps) -this.uncheckedTombstoneCompaction = true; - +if(this.options.ignoreOverlaps && !this.uncheckedTombstoneCompaction) { +logger.warn("You are running with sstables overlapping checks disabled but without unchecked tombstone compaction, check that this is what you want"); +} } {noformat} > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > 
http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16144901#comment-16144901 ] Romain GERARD edited comment on CASSANDRA-13418 at 8/29/17 8:11 AM: bq. 2. only enabling unsafe_aggressive_sstable_expiration When looking for sstables expired, you will *ignore* the overlaps and only look locally if the current sstable is eligible. When looking for sstables to compact, you will *not ignore* the overlaps and look globally if the current sstable is eligible. bq. 1. enabling both When looking for sstables expired, you will *ignore* the overlaps and only look locally if the current sstable is eligible. When looking for sstables to compact, you will *ignore* the overlaps and look locally if the current sstable is eligible. - I made a new version of the patch with uncheckedTombstoneCompaction disabled and a warning message. https://github.com/criteo-forks/cassandra/commit/1800b23ddfbb308645c44022e15c1760a0124025 the diff {noformat} diff --git a/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java b/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java index 43c90c7042..d21222c484 100644 --- a/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java +++ b/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java @@ -67,9 +67,9 @@ public class TimeWindowCompactionStrategy extends AbstractCompactionStrategy else logger.debug("Enabling tombstone compactions for TWCS"); -if (this.options.ignoreOverlaps) -this.uncheckedTombstoneCompaction = true; - +if(this.options.ignoreOverlaps && !this.uncheckedTombstoneCompaction) { +logger.warn("You are running with sstables overlapping checks disabled but without unchecked tombstone compaction, check that this is what you want"); +} } {noformat} was (Author: rgerard): bq. 2. 
only enabling unsafe_aggressive_sstable_expiration When looking for sstables expired, you will *ignore* the overlaps and only look locally if the current sstable is eligible. When looking for sstables to compact, you will *not ignore* the overlaps and look globally if the current sstable is eligible. bq. 1. enabling both When looking for sstables expired, you will *ignore *the overlaps and only look locally if the current sstable is eligible. When looking for sstables to compact, you will *ignore* the overlaps and look globally if the current sstable is eligible. - I made a new version of the patch with uncheckedTombstoneCompaction disabled and a warning message. https://github.com/criteo-forks/cassandra/commit/1800b23ddfbb308645c44022e15c1760a0124025 the diff {noformat} diff --git a/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java b/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java index 43c90c7042..d21222c484 100644 --- a/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java +++ b/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java @@ -67,9 +67,9 @@ public class TimeWindowCompactionStrategy extends AbstractCompactionStrategy else logger.debug("Enabling tombstone compactions for TWCS"); -if (this.options.ignoreOverlaps) -this.uncheckedTombstoneCompaction = true; - +if(this.options.ignoreOverlaps && !this.uncheckedTombstoneCompaction) { +logger.warn("You are running with sstables overlapping checks disabled but without unchecked tombstone compaction, check that this is what you want"); +} } {noformat} > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > 
http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16144901#comment-16144901 ] Romain GERARD edited comment on CASSANDRA-13418 at 8/29/17 8:10 AM: bq. 2. only enabling unsafe_aggressive_sstable_expiration When looking for sstables expired, you will *ignore *the overlaps and only look locally if the current sstable is eligible. When looking for sstables to compact, you will *not ignore* the overlaps and look globally if the current sstable is eligible. bq. 1. enabling both When looking for sstables expired, you will *ignore *the overlaps and only look locally if the current sstable is eligible. When looking for sstables to compact, you will *ignore* the overlaps and look globally if the current sstable is eligible. - I made a new version of the patch with uncheckedTombstoneCompaction disabled and a warning message. https://github.com/criteo-forks/cassandra/commit/1800b23ddfbb308645c44022e15c1760a0124025 the diff {noformat} diff --git a/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java b/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java index 43c90c7042..d21222c484 100644 --- a/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java +++ b/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java @@ -67,9 +67,9 @@ public class TimeWindowCompactionStrategy extends AbstractCompactionStrategy else logger.debug("Enabling tombstone compactions for TWCS"); -if (this.options.ignoreOverlaps) -this.uncheckedTombstoneCompaction = true; - +if(this.options.ignoreOverlaps && !this.uncheckedTombstoneCompaction) { +logger.warn("You are running with sstables overlapping checks disabled but without unchecked tombstone compaction, check that this is what you want"); +} } {noformat} was (Author: rgerard): bq. 2. 
only enabling unsafe_aggressive_sstable_expiration When looking for sstables expired, you will *ignore *the overlaps and only look locally if the current sstable is eligible. When looking for sstables to compact, you will *not ignore* the overlaps and look globally if the current sstable is eligible. bq. 1. enabling both When looking for sstables expired, you will *ignore *the overlaps and only look locally if the current sstable is eligible. When looking for sstables to compact, you will *ignore* the overlaps and look globally if the current sstable is eligible. - I made a new version of the patch with uncheckedTombstoneCompaction disabled and a warning message. https://github.com/criteo-forks/cassandra/commit/800ab325cbf7d9d4d5e60e2b959918426e121815 the diff {noformat} diff --git a/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java b/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java index 43c90c7042..d21222c484 100644 --- a/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java +++ b/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java @@ -67,9 +67,9 @@ public class TimeWindowCompactionStrategy extends AbstractCompactionStrategy else logger.debug("Enabling tombstone compactions for TWCS"); -if (this.options.ignoreOverlaps) -this.uncheckedTombstoneCompaction = true; - +if(this.options.ignoreOverlaps && !this.uncheckedTombstoneCompaction) { +logger.warn("You are running with sstables overlapping checks disabled but without unchecked tombstone compaction, check that this what you want"); +} } {noformat} > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html 
explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16144901#comment-16144901 ] Romain GERARD edited comment on CASSANDRA-13418 at 8/29/17 8:09 AM: bq. 2. only enabling unsafe_aggressive_sstable_expiration When looking for sstables expired, you will *ignore *the overlaps and only look locally if the current sstable is eligible. When looking for sstables to compact, you will *not ignore* the overlaps and look globally if the current sstable is eligible. bq. 1. enabling both When looking for sstables expired, you will *ignore *the overlaps and only look locally if the current sstable is eligible. When looking for sstables to compact, you will *ignore* the overlaps and look globally if the current sstable is eligible. I made a new version of the patch with uncheckedTombstoneCompaction disabled and a warning message. https://github.com/criteo-forks/cassandra/commit/800ab325cbf7d9d4d5e60e2b959918426e121815 the diff {noformat} diff --git a/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java b/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java index 43c90c7042..d21222c484 100644 --- a/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java +++ b/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java @@ -67,9 +67,9 @@ public class TimeWindowCompactionStrategy extends AbstractCompactionStrategy else logger.debug("Enabling tombstone compactions for TWCS"); -if (this.options.ignoreOverlaps) -this.uncheckedTombstoneCompaction = true; - +if(this.options.ignoreOverlaps && !this.uncheckedTombstoneCompaction) { +logger.warn("You are running with sstables overlapping checks disabled but without unchecked tombstone compaction, check that this what you want"); +} } {noformat} was (Author: rgerard): 2. 
only enabling unsafe_aggressive_sstable_expiration When looking for sstables expired, you will *ignore *the overlaps and only look locally if the current sstable is eligible. When looking for sstables to compact, you will *not ignore* the overlaps and look globally if the current sstable is eligible. bq. 1. enabling both When looking for sstables expired, you will *ignore *the overlaps and only look locally if the current sstable is eligible. When looking for sstables to compact, you will *ignore* the overlaps and look globally if the current sstable is eligible. I made a new version of the patch with uncheckedTombstoneCompaction disabled and a warning message. https://github.com/criteo-forks/cassandra/commit/800ab325cbf7d9d4d5e60e2b959918426e121815 the diff {noformat} diff --git a/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java b/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java index 43c90c7042..d21222c484 100644 --- a/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java +++ b/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java @@ -67,9 +67,9 @@ public class TimeWindowCompactionStrategy extends AbstractCompactionStrategy else logger.debug("Enabling tombstone compactions for TWCS"); -if (this.options.ignoreOverlaps) -this.uncheckedTombstoneCompaction = true; - +if(this.options.ignoreOverlaps && !this.uncheckedTombstoneCompaction) { +logger.warn("You are running with sstables overlapping checks disabled but without unchecked tombstone compaction, check that this what you want"); +} } {noformat} > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html 
explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can.
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16144901#comment-16144901 ] Romain GERARD edited comment on CASSANDRA-13418 at 8/29/17 8:09 AM: bq. 2. only enabling unsafe_aggressive_sstable_expiration When looking for sstables expired, you will *ignore *the overlaps and only look locally if the current sstable is eligible. When looking for sstables to compact, you will *not ignore* the overlaps and look globally if the current sstable is eligible. bq. 1. enabling both When looking for sstables expired, you will *ignore *the overlaps and only look locally if the current sstable is eligible. When looking for sstables to compact, you will *ignore* the overlaps and look globally if the current sstable is eligible. - I made a new version of the patch with uncheckedTombstoneCompaction disabled and a warning message. https://github.com/criteo-forks/cassandra/commit/800ab325cbf7d9d4d5e60e2b959918426e121815 the diff {noformat} diff --git a/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java b/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java index 43c90c7042..d21222c484 100644 --- a/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java +++ b/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java @@ -67,9 +67,9 @@ public class TimeWindowCompactionStrategy extends AbstractCompactionStrategy else logger.debug("Enabling tombstone compactions for TWCS"); -if (this.options.ignoreOverlaps) -this.uncheckedTombstoneCompaction = true; - +if(this.options.ignoreOverlaps && !this.uncheckedTombstoneCompaction) { +logger.warn("You are running with sstables overlapping checks disabled but without unchecked tombstone compaction, check that this what you want"); +} } {noformat} was (Author: rgerard): bq. 2. 
only enabling unsafe_aggressive_sstable_expiration When looking for sstables expired, you will *ignore *the overlaps and only look locally if the current sstable is eligible. When looking for sstables to compact, you will *not ignore* the overlaps and look globally if the current sstable is eligible. bq. 1. enabling both When looking for sstables expired, you will *ignore *the overlaps and only look locally if the current sstable is eligible. When looking for sstables to compact, you will *ignore* the overlaps and look globally if the current sstable is eligible. I made a new version of the patch with uncheckedTombstoneCompaction disabled and a warning message. https://github.com/criteo-forks/cassandra/commit/800ab325cbf7d9d4d5e60e2b959918426e121815 the diff {noformat} diff --git a/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java b/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java index 43c90c7042..d21222c484 100644 --- a/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java +++ b/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java @@ -67,9 +67,9 @@ public class TimeWindowCompactionStrategy extends AbstractCompactionStrategy else logger.debug("Enabling tombstone compactions for TWCS"); -if (this.options.ignoreOverlaps) -this.uncheckedTombstoneCompaction = true; - +if(this.options.ignoreOverlaps && !this.uncheckedTombstoneCompaction) { +logger.warn("You are running with sstables overlapping checks disabled but without unchecked tombstone compaction, check that this what you want"); +} } {noformat} > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html 
explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16142240#comment-16142240 ]

mck edited comment on CASSANDRA-13418 at 8/27/17 9:22 PM:
--
{quote}N.B: I tried to apply the style guide found in .idea/codeStyleSettings.xml but it is changing me a lot of things. Do you know if it is up to date ?{quote}
I don't use IntelliJ so I can't answer that for you, sorry. [~krummas]? Otherwise you can ask on irc #cassandra or on the user mailing list.

{quote}I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated{quote}
I'm -1 on this for the moment. While there is a logical argument for it, as you explain, it's not intuitive for the user. The user has to know that this happens (via docs or via code). I'd be more comfortable expecting users of an advanced toggle like this (it requires system properties and a table option) to appreciate the difference between {{uncheckedTombstoneCompaction}} and {{unsafe_aggressive_sstable_expiration}} and to enable both. Any smarts can be added later on with further user feedback and experience.

Could we, instead of setting {{uncheckedTombstoneCompaction}}, log a warning telling the user that they probably want {{uncheckedTombstoneCompaction}} set as well?

was (Author: michaelsembwever): {quote}N.B: I tried to apply the syle guide found in .idea/codeStyleSettings.xml but it is changing me a lot of things. Do you know if it is up to date ?{quote} I don't use IntelliJ so I can't answer that for you, sry. [~krummas]? Otherwise you can ask on irc #cassandra or on the user mailing list. {quote}I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated{quote} I'm -1 on this for the moment. While it holds a logic argument, as you explain, it's not intuitive for the user. The user has to know that this happens (via docs or via code).
I'd be more comfortable expecting the users using an advanced toggle like this (requires system properties and table option) to appreciate the difference between {{uncheckedTombstoneCompaction}} and {{unsafe_aggressive_sstable_expiration}} and to enable both. Any smarts can be added latter on with further user feedback and experience. > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). 
> - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
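The suggestion in the comment above (warn the user instead of silently switching on {{uncheckedTombstoneCompaction}}) could look roughly like the sketch below. It is only an illustration in Python, not the Cassandra code: the function name validate_twcs_options and the plain dict of options are assumptions made for the example, and only the two option names come from the thread.

```python
import logging

logger = logging.getLogger("twcs_options")


def _is_true(options: dict, key: str) -> bool:
    """Read a boolean-like option value ('true'/'false', any case)."""
    return str(options.get(key, "false")).strip().lower() == "true"


def validate_twcs_options(options: dict) -> list:
    """Hypothetical validator: returns warnings and never changes the options.

    Mirrors the proposal in the thread: if overlaps are ignored for
    expiration but unchecked tombstone compaction is off, tell the user
    rather than enabling it behind their back.
    """
    warnings = []
    ignore_overlaps = _is_true(options, "unsafe_aggressive_sstable_expiration")
    unchecked = _is_true(options, "unchecked_tombstone_compaction")
    if ignore_overlaps and not unchecked:
        message = (
            "unsafe_aggressive_sstable_expiration is enabled but "
            "unchecked_tombstone_compaction is not; you probably want both"
        )
        logger.warning(message)
        warnings.append(message)
    return warnings
```

Keeping the check side-effect free means the user's table options stay exactly as written, which is the point of the -1 above.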
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16141425#comment-16141425 ] Romain GERARD edited comment on CASSANDRA-13418 at 8/25/17 9:41 AM: {quote}looks just like a flakey test to me. {quote} Ok {quote}you can let me know if you agree{quote} I am at peace with that :) was (Author: rgerard): I am at peace with that :) > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. 
> - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already.
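The behaviour requested in the issue description can be illustrated with a toy model. This is a simplified sketch in Python, not Cassandra's implementation: the SSTable fields and the function fully_expired are names chosen for the example, every sstable is assumed to overlap every other one, and memtables are ignored. An sstable is a drop candidate once all of its data is past gc_before; without the new option it is still blocked if some sstable holding live data contains writes that are as old as, or older than, the candidate's newest write.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SSTable:
    name: str
    min_timestamp: int            # oldest write timestamp in the sstable
    max_timestamp: int            # newest write timestamp in the sstable
    max_local_deletion_time: int  # after this time everything in it is expired


def fully_expired(sstables: List[SSTable], gc_before: int,
                  ignore_overlaps: bool = False) -> List[SSTable]:
    """Toy model of the fully-expired check (all sstables assumed to overlap)."""
    candidates = [s for s in sstables if s.max_local_deletion_time < gc_before]
    if ignore_overlaps:
        # Proposed behaviour: expired sstables are dropped regardless of overlaps.
        return candidates
    # Default behaviour: sstables that still hold live data may contain old
    # writes (for example from read repair) that block the candidates.
    live = [s for s in sstables if s.max_local_deletion_time >= gc_before]
    min_live_ts = min((s.min_timestamp for s in live), default=float("inf"))
    return [s for s in candidates if s.max_timestamp < min_live_ts]


# An expired sstable and a newer one that received a read-repaired old cell.
old = SSTable("old", min_timestamp=10, max_timestamp=20, max_local_deletion_time=100)
blocker = SSTable("blocker", min_timestamp=15, max_timestamp=200, max_local_deletion_time=500)
```

With gc_before=300, fully_expired([old, blocker], 300) keeps "old" on disk because "blocker" overlaps it, while passing ignore_overlaps=True lets "old" be dropped.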
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16141130#comment-16141130 ] mck edited comment on CASSANDRA-13418 at 8/25/17 4:50 AM: -- {quote}is this bad news? https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/214/ {quote} looks just like a flakey test to me. The CHANGES.txt entry was in the wrong place. If this is a patch against trunk, then it should go under 4.0. But a patch would also be nice for 3.11. I'll update this in the thelastpickle repo, see links below, and you can let me know if you agree. The commit message has been updated as well, per practice. These patches are then: || branch || testall || dtest || | [cassandra-3.11_13418|https://github.com/thelastpickle/cassandra/tree/mck/cassandra-3.11_13418] | [testall|https://circleci.com/gh/thelastpickle/cassandra/tree/mck%2Fcassandra-3.11_13418] | [dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/219/] | | [trunk_13418|https://github.com/thelastpickle/cassandra/tree/mck/trunk_13418] | [testall|https://circleci.com/gh/thelastpickle/cassandra/tree/mck%2Ftrunk_13418] | [dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/214] | was (Author: michaelsembwever): {quote}is this bad new ? https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/214/ {quote} looks just like a flakey test to me. The CHANGES.txt entry was in the wrong place. If this is a patch against trunk, then it's to go under 4.0. But a patch would also be nice for 3.11. I'll update this in thelastpickle repo, see links below, and you can let me know if you agree. The commit message has been updated as well, per practice. 
These patches are then: || branch || testall || dtest || | [cassandra-3.11_13418|https://github.com/thelastpickle/cassandra/tree/mck/cassandra-3.11_13418] | [testall|https://circleci.com/gh/thelastpickle/cassandra/tree/mck%2Fcassandra-3.11_13418] | [dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/219/] | | [trunk_13418|https://github.com/thelastpickle/cassandra/tree/mck/trunk_13418] | [testall|https://circleci.com/gh/thelastpickle/cassandra/mck%2Ftrunk_13418] | [dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/214] | > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. 
> - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already.
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16141130#comment-16141130 ] mck edited comment on CASSANDRA-13418 at 8/25/17 4:45 AM: -- {quote}is this bad news? https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/214/ {quote} looks just like a flakey test to me. The CHANGES.txt entry was in the wrong place. If this is a patch against trunk, then it should go under 4.0. But a patch would also be nice for 3.11. I'll update this in the thelastpickle repo, see links below, and you can let me know if you agree. The commit message has been updated as well, per practice. These patches are then: || branch || testall || dtest || | [cassandra-3.11_13418|https://github.com/thelastpickle/cassandra/tree/mck/cassandra-3.11_13418] | [testall|https://circleci.com/gh/thelastpickle/cassandra/tree/mck%2Fcassandra-3.11_13418] | [dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/219/] | | [trunk_13418|https://github.com/thelastpickle/cassandra/tree/mck/trunk_13418] | [testall|https://circleci.com/gh/thelastpickle/cassandra/mck%2Ftrunk_13418] | [dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/214] | was (Author: michaelsembwever): {quote}is this bad new ? https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/214/ {quote} looks just like a flakey test to me. The CHANGES.txt entry was in the wrong place. If this is a patch against trunk, then it's to go under 4.0. But a patch would also be nice for 3.11. I'll update this in thelastpickle repo, see links below, and you can let me know if you agree. The commit message has been updated as well, per practice. 
These patches are then: || branch || testall || dtest || | [cassandra-3.11_13418|https://github.com/criteo-forks/cassandra/tree/mck/cassandra-3.11_13418] | [testall|https://circleci.com/gh/thelastpickle/cassandra/tree/mck%2Fcassandra-3.11_13418] | [dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/219/] | | [trunk_13418|https://circleci.com/gh/thelastpickle/cassandra/tree/mck%2Ftrunk_13418] | [testall|https://circleci.com/gh/thelastpickle/cassandra/22] | [dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/214] | > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. 
> - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already.
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136652#comment-16136652 ] Romain GERARD edited comment on CASSANDRA-13418 at 8/22/17 11:50 AM: - New version here: https://github.com/criteo-forks/cassandra/commit/95c7bb758478a86abf3506fd6e3ddb5d06413bce {{---}} I can remove {{TWCSCompactionController.getFullyExpiredSSTables(..)}} if you wish; I don't have a strong opinion about it, just say so. {{---}} {quote} I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated --- Do we want this? It feels like if we expect to be able to drop entire sstables due to being expired, it would be pretty wasteful to run a single sstable tombstone compaction when there are 20% tombstones in the sstable? We would probably be better off waiting until 100% is expired and drop the entire sstable without compaction?{quote} In my case you are right: activating disableTombstoneCompaction or setting the tombstoneThreshold high enough should be better performance-wise. My intention when activating the option is to guarantee consistent behavior for the overlap checks. I wasn't comfortable ignoring overlaps when checking for fully expired sstables but not ignoring them when looking for sstables to compact, as in both cases the job does not get done because the check is made globally instead of locally to the sstable. I wanted to enforce the rule {{if you want to drop stuff and ignoreOverlaps is activated then look locally instead of globally}}. {{---}} {quote} CompactionController:232 any reason not to return an immutable set?{quote} I tried to change everything to an ImmutableSet but it breaks a lot of tests. {{---}} N.B: I tried to apply the style guide found in {{.idea/codeStyleSettings.xml}} but it changes a lot of things for me. Do you know if it is up to date? 
was (Author: rgerard): New version here https://github.com/criteo-forks/cassandra/commit/cfabb2ddd31f16ae127d4b22e0c02a1676ba336b {{---}} I can remove {{TWCSCompactionController.getFullyExpiredSSTables(..)}} if you wish, I don't have any strong opinion about it, just say so {{---}} {quote} I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated --- Do we want this? It feels like if we expect to be able to drop entire sstables due to being expired, it would be pretty wasteful to run a single sstable tombstone compaction when there are 20% tombstones in the sstable? We would probably be better off waiting until 100% is expired and drop the entire sstable without compaction?{quote} In my case you are right, activating disableTombstoneCompaction or setting the tombstoneThresold high enough should be better performance wise. My intention when activating the option is to guarantee a consistent behavior for overlapping checks. I wasn't confortable to ignore overlaps when checking for fully expired sstables but not ignoring them when looking for sstables to compact, as in both case it will result in not doing the job due to checking globally instead of just locally to the sstable. I was willing to enforce the {{if you want to drop stuff and ignoreOverlaps is activated then look locally instead of globally}} {{---}} {quote} CompactionController:232 any reason not to return an immutable set?{quote} I tried to change everything to an ImmutableSet but it breaks a lot of tests. {{---}} N.B: I tried to apply the syle guide found in {{.idea/codeStyleSettings.xml}} but it is changing me a lot of things. Do you know if it is up to date ? 
> Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136652#comment-16136652 ] Romain GERARD edited comment on CASSANDRA-13418 at 8/22/17 11:47 AM: - New version here: https://github.com/criteo-forks/cassandra/commit/cfabb2ddd31f16ae127d4b22e0c02a1676ba336b {{---}} I can remove {{TWCSCompactionController.getFullyExpiredSSTables(..)}} if you wish; I don't have a strong opinion about it, just say so. {{---}} {quote} I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated --- Do we want this? It feels like if we expect to be able to drop entire sstables due to being expired, it would be pretty wasteful to run a single sstable tombstone compaction when there are 20% tombstones in the sstable? We would probably be better off waiting until 100% is expired and drop the entire sstable without compaction?{quote} In my case you are right: activating disableTombstoneCompaction or setting the tombstoneThreshold high enough should be better performance-wise. My intention when activating the option is to guarantee consistent behavior for the overlap checks. I wasn't comfortable ignoring overlaps when checking for fully expired sstables but not ignoring them when looking for sstables to compact, as in both cases the job does not get done because the check is made globally instead of locally to the sstable. I wanted to enforce the rule {{if you want to drop stuff and ignoreOverlaps is activated then look locally instead of globally}}. {{---}} {quote} CompactionController:232 any reason not to return an immutable set?{quote} I tried to change everything to an ImmutableSet but it breaks a lot of tests. {{---}} N.B: I tried to apply the style guide found in {{.idea/codeStyleSettings.xml}} but it changes a lot of things for me. Do you know if it is up to date? 
was (Author: rgerard): New version here https://github.com/criteo-forks/cassandra/commit/cfabb2ddd31f16ae127d4b22e0c02a1676ba336b {{---}} I can remove {{TWCSCompactionController.getFullyExpiredSSTables(..)}} if you wish, I don't have any strong opinion about it, just say so {{---}} {quote} I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated --- Do we want this? It feels like if we expect to be able to drop entire sstables due to being expired, it would be pretty wasteful to run a single sstable tombstone compaction when there are 20% tombstones in the sstable? We would probably be better off waiting until 100% is expired and drop the entire sstable without compaction?{quote} In my case you are right, activating disableTombstoneCompaction or setting the tombstoneThresold high enough should be better performance wise. My intention when activating the option is to guarantee a consistent behavior for overlapping checks. I wasn't confortable to ignore overlaps when checking for fully expired sstables but not ignoring them when looking for sstables to compact, as in both case it will result in not doing the job due to checking globally instead of just locally to the sstable. I was willing to enforce the {{if you want to drop stuff and ignoreOverlaps is activated then look locally instead of globally}} {{---}} {quote} CompactionController:232 any reason not to return an immutable set?{quote} I tried to change everything to an ImmutableSet but it breaks a lot of tests. N.B: I tried to apply the syle guide found in {{.idea/codeStyleSettings.xml}} but it is changing me a lot of things. Do you know if it is up to date ? 
> Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136652#comment-16136652 ] Romain GERARD edited comment on CASSANDRA-13418 at 8/22/17 11:46 AM: - New version here: https://github.com/criteo-forks/cassandra/commit/cfabb2ddd31f16ae127d4b22e0c02a1676ba336b {{---}} I can remove {{TWCSCompactionController.getFullyExpiredSSTables(..)}} if you wish; I don't have a strong opinion about it, just say so. {{---}} {quote} I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated --- Do we want this? It feels like if we expect to be able to drop entire sstables due to being expired, it would be pretty wasteful to run a single sstable tombstone compaction when there are 20% tombstones in the sstable? We would probably be better off waiting until 100% is expired and drop the entire sstable without compaction?{quote} In my case you are right: activating disableTombstoneCompaction or setting the tombstoneThreshold high enough should be better performance-wise. My intention when activating the option is to guarantee consistent behavior for the overlap checks. I wasn't comfortable ignoring overlaps when checking for fully expired sstables but not ignoring them when looking for sstables to compact, as in both cases the job does not get done because the check is made globally instead of locally to the sstable. I wanted to enforce the rule {{if you want to drop stuff and ignoreOverlaps is activated then look locally instead of globally}}. {{---}} {quote} CompactionController:232 any reason not to return an immutable set?{quote} I tried to change everything to an ImmutableSet but it breaks a lot of tests. N.B: I tried to apply the style guide found in {{.idea/codeStyleSettings.xml}} but it changes a lot of things for me. Do you know if it is up to date? 
was (Author: rgerard): New version here https://github.com/criteo-forks/cassandra/commit/cfabb2ddd31f16ae127d4b22e0c02a1676ba336b * I can remove {{TWCSCompactionController.getFullyExpiredSSTables(..)}} if you wish, I don't have any strong opinion about it, just say so {{---}} {quote} I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated --- Do we want this? It feels like if we expect to be able to drop entire sstables due to being expired, it would be pretty wasteful to run a single sstable tombstone compaction when there are 20% tombstones in the sstable? We would probably be better off waiting until 100% is expired and drop the entire sstable without compaction?{quote} In my case you are right, activating disableTombstoneCompaction or setting the tombstoneThresold high enough should be better performance wise. My intention when activating the option is to guarantee a consistent behavior for overlapping checks. I wasn't confortable to ignore overlaps when checking for fully expired sstables but not ignoring them when looking for sstables to compact, as in both case it will result in not doing the job due to checking globally instead of just locally to the sstable. I was willing to enforce the {{if you want to drop stuff and ignoreOverlaps is activated then look locally instead of globally}} {{---}} {quote} CompactionController:232 any reason not to return an immutable set?{quote} I tried to change everything to an ImmutableSet but it breaks a lot of tests. N.B: I tried to apply the syle guide found in {{.idea/codeStyleSettings.xml}} but it is changing me a lot of things. Do you know if it is up to date ? 
> Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136652#comment-16136652 ] Romain GERARD edited comment on CASSANDRA-13418 at 8/22/17 11:46 AM: - New version here: https://github.com/criteo-forks/cassandra/commit/cfabb2ddd31f16ae127d4b22e0c02a1676ba336b * I can remove {{TWCSCompactionController.getFullyExpiredSSTables(..)}} if you wish; I don't have a strong opinion about it, just say so. {{---}} {quote} I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated --- Do we want this? It feels like if we expect to be able to drop entire sstables due to being expired, it would be pretty wasteful to run a single sstable tombstone compaction when there are 20% tombstones in the sstable? We would probably be better off waiting until 100% is expired and drop the entire sstable without compaction?{quote} In my case you are right: activating disableTombstoneCompaction or setting the tombstoneThreshold high enough should be better performance-wise. My intention when activating the option is to guarantee consistent behavior for the overlap checks. I wasn't comfortable ignoring overlaps when checking for fully expired sstables but not ignoring them when looking for sstables to compact, as in both cases the job does not get done because the check is made globally instead of locally to the sstable. I wanted to enforce the rule {{if you want to drop stuff and ignoreOverlaps is activated then look locally instead of globally}}. {{---}} {quote} CompactionController:232 any reason not to return an immutable set?{quote} I tried to change everything to an ImmutableSet but it breaks a lot of tests. N.B: I tried to apply the style guide found in {{.idea/codeStyleSettings.xml}} but it changes a lot of things for me. Do you know if it is up to date? 
was (Author: rgerard): New version here https://github.com/criteo-forks/cassandra/commit/cfabb2ddd31f16ae127d4b22e0c02a1676ba336b * I can remove {{TWCSCompactionController.getFullyExpiredSSTables(..)}} if you wish, I don't have any strong opinion about it, just say so {{_}} {quote} I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated --- Do we want this? It feels like if we expect to be able to drop entire sstables due to being expired, it would be pretty wasteful to run a single sstable tombstone compaction when there are 20% tombstones in the sstable? We would probably be better off waiting until 100% is expired and drop the entire sstable without compaction?{quote} In my case you are right, activating disableTombstoneCompaction or setting the tombstoneThresold high enough should be better performance wise. My intention when activating the option is to guarantee a consistent behavior for overlapping checks. I wasn't confortable to ignore overlaps when checking for fully expired sstables but not ignoring them when looking for sstables to compact, as in both case it will result in not doing the job due to checking globally instead of just locally to the sstable. I was willing to enforce the {{if you want to drop stuff and ignoreOverlaps is activated then look locally instead of globally}} {{_}} {quote} CompactionController:232 any reason not to return an immutable set?{quote} I tried to change everything to an ImmutableSet but it breaks a lot of tests. N.B: I tried to apply the syle guide found in {{.idea/codeStyleSettings.xml}} but it is changing me a lot of things. Do you know if it is up to date ? 
> Allow TWCS to ignore overlaps when dropping fully expired sstables
> --
>
> Key: CASSANDRA-13418
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13418
> Project: Cassandra
> Issue Type: Improvement
> Components: Compaction
> Reporter: Corentin Chary
> Labels: twcs
> Attachments: twcs-cleanup.png
>
> http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If you really want read-repairs you're going to have sstables blocking the expiration of other fully expired SSTables because they overlap.
> You can set unchecked_tombstone_compaction = true or tombstone_threshold to a very low value and that will purge the blockers of old data that should already have expired, thus removing the overlaps and allowing the other SSTables to expire.
> The thing is that this is rather CPU intensive and not optimal. If you have time series, you might not care if all your data doesn't exactly expire at the right time, or if data re-appears for some time, as long as it gets deleted as soon as it can. And in this situation I believe it would be really beneficial to allow users to simply ignore overlapping SSTables when looking for fully expired ones.
> To the
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136652#comment-16136652 ] Romain GERARD edited comment on CASSANDRA-13418 at 8/22/17 11:45 AM: - New version here https://github.com/criteo-forks/cassandra/commit/cfabb2ddd31f16ae127d4b22e0c02a1676ba336b * I can remove {{TWCSCompactionController.getFullyExpiredSSTables(..)}} if you wish, I don't have any strong opinion about it, just say so {{_}} {quote} I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated --- Do we want this? It feels like if we expect to be able to drop entire sstables due to being expired, it would be pretty wasteful to run a single sstable tombstone compaction when there are 20% tombstones in the sstable? We would probably be better off waiting until 100% is expired and drop the entire sstable without compaction?{quote} In my case you are right, activating disableTombstoneCompaction or setting the tombstoneThresold high enough should be better performance wise. My intention when activating the option is to guarantee a consistent behavior for overlapping checks. I wasn't confortable to ignore overlaps when checking for fully expired sstables but not ignoring them when looking for sstables to compact, as in both case it will result in not doing the job due to checking globally instead of just locally to the sstable. I was willing to enforce the {{if you want to drop stuff and ignoreOverlaps is activated then look locally instead of globally}} {{_}} {quote} CompactionController:232 any reason not to return an immutable set?{quote} I tried to change everything to an ImmutableSet but it breaks a lot of tests. N.B: I tried to apply the syle guide found in {{.idea/codeStyleSettings.xml}} but it is changing me a lot of things. Do you know if it is up to date ? 
was (Author: rgerard): New version here https://github.com/criteo-forks/cassandra/commit/cfabb2ddd31f16ae127d4b22e0c02a1676ba336b * I can remove {{TWCSCompactionController.getFullyExpiredSSTables(..)}} if you wish, I don't have any strong opinion about it, just say so {{}} {quote} I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated --- Do we want this? It feels like if we expect to be able to drop entire sstables due to being expired, it would be pretty wasteful to run a single sstable tombstone compaction when there are 20% tombstones in the sstable? We would probably be better off waiting until 100% is expired and drop the entire sstable without compaction?{quote} In my case you are right, activating disableTombstoneCompaction or setting the tombstoneThresold high enough should be better performance wise. My intention when activating the option is to guarantee a consistent behavior for overlapping checks. I wasn't confortable to ignore overlaps when checking for fully expired sstables but not ignoring them when looking for sstables to compact, as in both case it will result in not doing the job due to checking globally instead of just locally to the sstable. I was willing to enforce the {{if you want to drop stuff and ignoreOverlaps is activated then look locally instead of globally}} {{}} {quote} CompactionController:232 any reason not to return an immutable set?{quote} I tried to change everything to an ImmutableSet but it breaks a lot of tests. N.B: I tried to apply the syle guide found in {{.idea/codeStyleSettings.xml}} but it is changing me a lot of things. Do you know if it is up to date ? 
> Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question:
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136652#comment-16136652 ] Romain GERARD edited comment on CASSANDRA-13418 at 8/22/17 11:45 AM: - New version here https://github.com/criteo-forks/cassandra/commit/cfabb2ddd31f16ae127d4b22e0c02a1676ba336b * I can remove {{TWCSCompactionController.getFullyExpiredSSTables(..)}} if you wish, I don't have any strong opinion about it, just say so {{}} {quote} I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated --- Do we want this? It feels like if we expect to be able to drop entire sstables due to being expired, it would be pretty wasteful to run a single sstable tombstone compaction when there are 20% tombstones in the sstable? We would probably be better off waiting until 100% is expired and drop the entire sstable without compaction?{quote} In my case you are right, activating disableTombstoneCompaction or setting the tombstoneThresold high enough should be better performance wise. My intention when activating the option is to guarantee a consistent behavior for overlapping checks. I wasn't confortable to ignore overlaps when checking for fully expired sstables but not ignoring them when looking for sstables to compact, as in both case it will result in not doing the job due to checking globally instead of just locally to the sstable. I was willing to enforce the {{if you want to drop stuff and ignoreOverlaps is activated then look locally instead of globally}} {{}} {quote} CompactionController:232 any reason not to return an immutable set?{quote} I tried to change everything to an ImmutableSet but it breaks a lot of tests. N.B: I tried to apply the syle guide found in {{.idea/codeStyleSettings.xml}} but it is changing me a lot of things. Do you know if it is up to date ? 
was (Author: rgerard): New version here https://github.com/criteo-forks/cassandra/commit/cfabb2ddd31f16ae127d4b22e0c02a1676ba336b * I can remove {{TWCSCompactionController.getFullyExpiredSSTables(..)}} if you wish, I don't have any strong opinion about it, just say so {quote} I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated --- Do we want this? It feels like if we expect to be able to drop entire sstables due to being expired, it would be pretty wasteful to run a single sstable tombstone compaction when there are 20% tombstones in the sstable? We would probably be better off waiting until 100% is expired and drop the entire sstable without compaction?{quote} In my case you are right, activating disableTombstoneCompaction or setting the tombstoneThresold high enough should be better performance wise. My intention when activating the option is to guarantee a consistent behavior for overlapping checks. I wasn't confortable to ignore overlaps when checking for fully expired sstables but not ignoring them when looking for sstables to compact, as in both case it will result in not doing the job due to checking globally instead of just locally to the sstable. I was willing to enforce the {{if you want to drop stuff and ignoreOverlaps is activated then look locally instead of globally}} {quote} CompactionController:232 any reason not to return an immutable set?{quote} I tried to change everything to an ImmutableSet but it breaks a lot of tests. N.B: I tried to apply the syle guide found in {{.idea/codeStyleSettings.xml}} but it is changing me a lot of things. Do you know if it is up to date ? 
> Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136652#comment-16136652 ] Romain GERARD edited comment on CASSANDRA-13418 at 8/22/17 11:44 AM: - New version here https://github.com/criteo-forks/cassandra/commit/cfabb2ddd31f16ae127d4b22e0c02a1676ba336b * I can remove {{TWCSCompactionController.getFullyExpiredSSTables(..)}} if you wish, I don't have any strong opinion about it, just say so {quote} I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated --- Do we want this? It feels like if we expect to be able to drop entire sstables due to being expired, it would be pretty wasteful to run a single sstable tombstone compaction when there are 20% tombstones in the sstable? We would probably be better off waiting until 100% is expired and drop the entire sstable without compaction?{quote} In my case you are right, activating disableTombstoneCompaction or setting the tombstoneThresold high enough should be better performance wise. My intention when activating the option is to guarantee a consistent behavior for overlapping checks. I wasn't confortable to ignore overlaps when checking for fully expired sstables but not ignoring them when looking for sstables to compact, as in both case it will result in not doing the job due to checking globally instead of just locally to the sstable. I was willing to enforce the {{if you want to drop stuff and ignoreOverlaps is activated then look locally instead of globally}} * CompactionController:232 any reason not to return an immutable set? I tried to change everything to an ImmutableSet but it breaks a lot of tests. N.B: I tried to apply the syle guide found in {{.idea/codeStyleSettings.xml}} but it is changing me a lot of things. Do you know if it is up to date ? 
was (Author: rgerard): New version here https://github.com/criteo-forks/cassandra/commit/cfabb2ddd31f16ae127d4b22e0c02a1676ba336b * I can remove {{TWCSCompactionController.getFullyExpiredSSTables(..)}} if you wish, I don't have any strong opinion about it, just say so {quote} I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated --- Do we want this? It feels like if we expect to be able to drop entire sstables due to being expired, it would be pretty wasteful to run a single sstable tombstone compaction when there are 20% tombstones in the sstable? We would probably be better off waiting until 100% is expired and drop the entire sstable without compaction?{quote} In my case you are right, activating disableTombstoneCompaction or setting the tombstoneThresold high enough should be better performance wise. My intention when activating the option is to guarantee a consistent behavior for overlapping checks. I wasn't confortable to ignore overlaps when checking for fully expired sstables but not ignoring them when looking for sstables to compact, as in both case it will result in not doing the job due to checking globally instead of just locally to the sstable. I was willing to enforce the {{if you want to drop stuff and ignoreOverlaps is activated then look locally instead of globally}} N.B: I tried to apply the syle guide found in {{.idea/codeStyleSettings.xml}} but it is changing me a lot of things. Do you know if it is up to date ? > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. 
If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136652#comment-16136652 ] Romain GERARD edited comment on CASSANDRA-13418 at 8/22/17 11:44 AM: - New version here https://github.com/criteo-forks/cassandra/commit/cfabb2ddd31f16ae127d4b22e0c02a1676ba336b * I can remove {{TWCSCompactionController.getFullyExpiredSSTables(..)}} if you wish, I don't have any strong opinion about it, just say so {quote} I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated --- Do we want this? It feels like if we expect to be able to drop entire sstables due to being expired, it would be pretty wasteful to run a single sstable tombstone compaction when there are 20% tombstones in the sstable? We would probably be better off waiting until 100% is expired and drop the entire sstable without compaction?{quote} In my case you are right, activating disableTombstoneCompaction or setting the tombstoneThresold high enough should be better performance wise. My intention when activating the option is to guarantee a consistent behavior for overlapping checks. I wasn't confortable to ignore overlaps when checking for fully expired sstables but not ignoring them when looking for sstables to compact, as in both case it will result in not doing the job due to checking globally instead of just locally to the sstable. I was willing to enforce the {{if you want to drop stuff and ignoreOverlaps is activated then look locally instead of globally}} {quote} CompactionController:232 any reason not to return an immutable set?{quote} I tried to change everything to an ImmutableSet but it breaks a lot of tests. N.B: I tried to apply the syle guide found in {{.idea/codeStyleSettings.xml}} but it is changing me a lot of things. Do you know if it is up to date ? 
was (Author: rgerard): New version here https://github.com/criteo-forks/cassandra/commit/cfabb2ddd31f16ae127d4b22e0c02a1676ba336b * I can remove {{TWCSCompactionController.getFullyExpiredSSTables(..)}} if you wish, I don't have any strong opinion about it, just say so {quote} I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated --- Do we want this? It feels like if we expect to be able to drop entire sstables due to being expired, it would be pretty wasteful to run a single sstable tombstone compaction when there are 20% tombstones in the sstable? We would probably be better off waiting until 100% is expired and drop the entire sstable without compaction?{quote} In my case you are right, activating disableTombstoneCompaction or setting the tombstoneThresold high enough should be better performance wise. My intention when activating the option is to guarantee a consistent behavior for overlapping checks. I wasn't confortable to ignore overlaps when checking for fully expired sstables but not ignoring them when looking for sstables to compact, as in both case it will result in not doing the job due to checking globally instead of just locally to the sstable. I was willing to enforce the {{if you want to drop stuff and ignoreOverlaps is activated then look locally instead of globally}} * CompactionController:232 any reason not to return an immutable set? I tried to change everything to an ImmutableSet but it breaks a lot of tests. N.B: I tried to apply the syle guide found in {{.idea/codeStyleSettings.xml}} but it is changing me a lot of things. Do you know if it is up to date ? 
> Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? >
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136652#comment-16136652 ] Romain GERARD edited comment on CASSANDRA-13418 at 8/22/17 11:39 AM: - New version here https://github.com/criteo-forks/cassandra/commit/cfabb2ddd31f16ae127d4b22e0c02a1676ba336b * I can remove {{TWCSCompactionController.getFullyExpiredSSTables(..)}} if you wish, I don't have any strong opinion about it, just say so {quote} I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated --- Do we want this? It feels like if we expect to be able to drop entire sstables due to being expired, it would be pretty wasteful to run a single sstable tombstone compaction when there are 20% tombstones in the sstable? We would probably be better off waiting until 100% is expired and drop the entire sstable without compaction?{quote} In my case you are right, activating disableTombstoneCompaction or setting the tombstoneThresold high enough should be better performance wise. My intention when activating the option is to guarantee a consistent behavior for overlapping checks. I wasn't confortable to ignore overlaps when checking for fully expired sstables but not ignoring them when looking for sstables to compact, as in both case it will result in not doing the job due to checking globally instead of just locally to the sstable. I was willing to enforce the {{if you want to drop stuff and ignoreOverlaps is activated then look locally instead of globally}} N.B: I tried to apply the syle guide found in {{.idea/codeStyleSettings.xml}} but it is changing me a lot of things. Do you know if it is up to date ? 
was (Author: rgerard): New version here https://github.com/criteo-forks/cassandra/commit/cfabb2ddd31f16ae127d4b22e0c02a1676ba336b * I can remove {{TWCSCompactionController.getFullyExpiredSSTables(..)}} if you wish, I don't have any strong opinion about it, just say so {quote} I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated --- Do we want this? It feels like if we expect to be able to drop entire sstables due to being expired, it would be pretty wasteful to run a single sstable tombstone compaction when there are 20% tombstones in the sstable? We would probably be better off waiting until 100% is expired and drop the entire sstable without compaction?{quote} In my case you are right, activating disableTombstoneCompaction or setting the tombstoneThresold high enough should be better performance wise. My intention when activating the option is to guarantee a consistent behavior for overlapping checks. I wasn't confortable to ignore overlaps when checking for fully expired sstables but not ignoring it when looking for sstables to compact, as in both case it will result in not doing the job due to checking globally instead of just locally to the sstable. I was willing to enforce the {{if you want to drop stuff and ignoreOverlaps is activated then look locally instead of globally}} N.B: I tried to apply the syle guide found in {{.idea/codeStyleSettings.xml}} but it is changing me a lot of things. Do you know if it is up to date ? > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. 
If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136652#comment-16136652 ] Romain GERARD edited comment on CASSANDRA-13418 at 8/22/17 11:36 AM: - New version here https://github.com/criteo-forks/cassandra/commit/cfabb2ddd31f16ae127d4b22e0c02a1676ba336b * I can remove {{TWCSCompactionController.getFullyExpiredSSTables(..)}} if you wish, I don't have any strong opinion about it, just say so {quote} I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated --- Do we want this? It feels like if we expect to be able to drop entire sstables due to being expired, it would be pretty wasteful to run a single sstable tombstone compaction when there are 20% tombstones in the sstable? We would probably be better off waiting until 100% is expired and drop the entire sstable without compaction?{quote} In my case you are right, activating disableTombstoneCompaction or setting the tombstoneThresold high enough should be better performance wise. My intention when activating the option is to guarantee a consistent behavior for overlapping checks. I wasn't confortable to ignore overlaps when checking for fully expired sstables but not ignoring it when looking for sstables to compact, as in both case it will result in not doing the job due to checking globally instead of just locally to the sstable. I was willing to enforce the {{if you want to drop stuff and ignoreOverlaps is activated then look locally instead of globally}} N.B: I tried to apply the syle guide found in {{.idea/codeStyleSettings.xml}} but it is changing me a lot of things. Do you know if it is up to date ? 
was (Author: rgerard): New version here https://github.com/criteo-forks/cassandra/commit/cfabb2ddd31f16ae127d4b22e0c02a1676ba336b * I can remove {{TWCSCompactionController.getFullyExpiredSSTables(..)}} if you wish, I don't have any strong opinion about it {quote} I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated --- Do we want this? It feels like if we expect to be able to drop entire sstables due to being expired, it would be pretty wasteful to run a single sstable tombstone compaction when there are 20% tombstones in the sstable? We would probably be better off waiting until 100% is expired and drop the entire sstable without compaction?{quote} In my case you are right, activating disableTombstoneCompaction or setting the tombstoneThresold high enough should be better performance wise. My intention when activating the option is to guarantee a consistent behavior for overlapping checks. I wasn't confortable to ignore overlaps when checking for fully expired sstables but not ignoring it when looking for sstables to compact, as in both case it will result in not doing the job due to checking globally instead of just locally to the sstable. I was willing to enforce the {{if you want to drop stuff and ignoreOverlaps is activated then look locally instead of globally}} N.B: I tried to apply the syle guide found in {{.idea/codeStyleSettings.xml}} but it is changing me a lot of things. Do you know if it is up to date ? > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. 
> You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136652#comment-16136652 ] Romain GERARD edited comment on CASSANDRA-13418 at 8/22/17 11:34 AM: - New version here https://github.com/criteo-forks/cassandra/commit/cfabb2ddd31f16ae127d4b22e0c02a1676ba336b * I can remove {{TWCSCompactionController.getFullyExpiredSSTables(..)}} if you wish, I don't have any strong opinion about it {quote} I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated --- Do we want this? It feels like if we expect to be able to drop entire sstables due to being expired, it would be pretty wasteful to run a single sstable tombstone compaction when there are 20% tombstones in the sstable? We would probably be better off waiting until 100% is expired and drop the entire sstable without compaction?{quote} In my case you are right, activating disableTombstoneCompaction or setting the tombstoneThresold high enough should be better performance wise. My intention when activating the option is to guarantee a consistent behavior for overlapping checks. I wasn't confortable to ignore overlaps when checking for fully expired sstables but not ignoring it when looking for sstables to compact, as in both case it will result in not doing the job due to checking globally instead of just locally to the sstable. I was willing to enforce the {{if you want to drop stuff and ignoreOverlaps is activated then look locally instead of globally}} N.B: I tried to apply the syle guide found in {{.idea/codeStyleSettings.xml}} but it is changing me a lot of things. Do you know if it is up to date ? 
was (Author: rgerard): New version here https://github.com/criteo-forks/cassandra/commit/cfabb2ddd31f16ae127d4b22e0c02a1676ba336b * I can remove {{TWCSCompactionController.getFullyExpiredSSTables(..)}} if you wish, I don't have any strong opinion about it {quote} I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated --- Do we want this? It feels like if we expect to be able to drop entire sstables due to being expired, it would be pretty wasteful to run a single sstable tombstone compaction when there are 20% tombstones in the sstable? We would probably be better off waiting until 100% is expired and drop the entire sstable without compaction?{quote} In my case you are right, activating disableTombstoneCompaction or setting the tombstoneThresold high enough should be better performance wise. My intention when activating the option is to guarantee a consistent behavior for overlapping checks. I wasn't confortable to ignore overlaps when checking for fully expired sstables but not ignoring it when looking for sstables to compact, as in both case it will result in not doing the job due to checking globally instead of just locally to the sstable. I was willing to enforce the {{if you want to drop stuff and ignoreOverlaps is activated then look locally instead of globally}} > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. 
> You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chance of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already.
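The difference between checking "globally" and "locally" that this thread keeps returning to can be sketched roughly as below. This is only a simplified model under stated assumptions, not the code from the patch: the class {{ExpiredSSTableSketch}}, the {{Table}} type, its fields, and the {{fullyExpired}} method are all hypothetical names invented for illustration, and real sstable metadata and memtables are left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ExpiredSSTableSketch {
    /** Hypothetical, minimal stand-in for the metadata of one sstable. */
    public static final class Table {
        public final String name;
        public final long minTimestamp;        // oldest write in the sstable
        public final long maxTimestamp;        // newest write in the sstable
        public final int maxLocalDeletionTime; // when the last cell expires

        public Table(String name, long minTimestamp, long maxTimestamp, int maxLocalDeletionTime) {
            this.name = name;
            this.minTimestamp = minTimestamp;
            this.maxTimestamp = maxTimestamp;
            this.maxLocalDeletionTime = maxLocalDeletionTime;
        }
    }

    /**
     * Returns the names of candidates that could be dropped whole.
     * With ignoreOverlaps == false the check is "global": an overlapping sstable
     * that still holds unexpired data blocks every candidate that has newer writes.
     * With ignoreOverlaps == true the check is "local": only the candidate's own
     * expiration time is considered.
     */
    public static Set<String> fullyExpired(List<Table> candidates, List<Table> overlapping,
                                           int gcBefore, boolean ignoreOverlaps) {
        long minLiveTimestamp = Long.MAX_VALUE;
        if (!ignoreOverlaps) {
            for (Table t : overlapping) {
                if (t.maxLocalDeletionTime >= gcBefore) {
                    minLiveTimestamp = Math.min(minLiveTimestamp, t.minTimestamp);
                }
            }
        }
        Set<String> expired = new LinkedHashSet<>();
        for (Table t : candidates) {
            if (t.maxLocalDeletionTime < gcBefore && t.maxTimestamp < minLiveTimestamp) {
                expired.add(t.name);
            }
        }
        return expired;
    }

    public static void main(String[] args) {
        List<Table> candidates = new ArrayList<>();
        candidates.add(new Table("old", 0L, 100L, 10));
        List<Table> overlapping = new ArrayList<>();
        // e.g. an sstable written by read repair that overlaps the old time window
        overlapping.add(new Table("blocker", 50L, 900L, 1000));

        System.out.println(fullyExpired(candidates, overlapping, 500, false)); // prints []
        System.out.println(fullyExpired(candidates, overlapping, 500, true));  // prints [old]
    }
}
```

In this model the "blocker" keeps "old" on disk until it expires itself, which is the situation the ticket describes; ignoring overlaps trades that safety check for earlier cleanup.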
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136652#comment-16136652 ] Romain GERARD edited comment on CASSANDRA-13418 at 8/22/17 11:32 AM: - New version here https://github.com/criteo-forks/cassandra/commit/cfabb2ddd31f16ae127d4b22e0c02a1676ba336b * I can remove {{TWCSCompactionController.getFullyExpiredSSTables(..)}} if you wish, I don't have any strong opinion about it {quote} I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated --- Do we want this? It feels like if we expect to be able to drop entire sstables due to being expired, it would be pretty wasteful to run a single sstable tombstone compaction when there are 20% tombstones in the sstable? We would probably be better off waiting until 100% is expired and drop the entire sstable without compaction?{quote} In my case you are right, activating disableTombstoneCompaction or setting the tombstoneThresold high enough should be better performance wise. My intention when activating the option is to guarantee a consistent behavior for overlapping checks. I wasn't confortable to ignore overlaps when checking for fully expired sstables but not ignoring it when looking for sstables to compact, as in both case it will result in not doing the job due to checking globally instead of just locally to the sstable. I was willing to enforce the {{if you want to drop stuff and ignoreOverlaps is activated then look locally instead of globally}} was (Author: rgerard): New version here https://github.com/criteo-forks/cassandra/commit/cfabb2ddd31f16ae127d4b22e0c02a1676ba336b * I can remove {{TWCSCompactionController.getFullyExpiredSSTables(..)}} if you wish, I don't have any strong opinion about it {quote} I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated --- Do we want this? 
It feels like if we expect to be able to drop entire sstables due to being expired, it would be pretty wasteful to run a single sstable tombstone compaction when there are 20% tombstones in the sstable? We would probably be better off waiting until 100% is expired and drop the entire sstable without compaction?{quote} In my case you are right, activating disableTombstoneCompaction or setting the tombstoneThresold high enough should be better performance wise. My intention when activating the option is to guarantee a consistent behavior for overlapping checks. I wasn't confortable to ignore overlaps when checking for fully expired sstables but not ignoring it when looking for sstables to compact, as in both case it will result in not doing the job due to checking globally instead of just locally to the sstable. I was willing to enforce the {{if you want to drop stuff if ignoreOverlaps is activated look locally instead of globally}} > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. 
And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. --
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136652#comment-16136652 ] Romain GERARD edited comment on CASSANDRA-13418 at 8/22/17 11:31 AM: - New version here https://github.com/criteo-forks/cassandra/commit/cfabb2ddd31f16ae127d4b22e0c02a1676ba336b * I can remove {{TWCSCompactionController.getFullyExpiredSSTables(..)}} if you wish, I don't have any strong opinion about it {quote} I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated --- Do we want this? It feels like if we expect to be able to drop entire sstables due to being expired, it would be pretty wasteful to run a single sstable tombstone compaction when there are 20% tombstones in the sstable? We would probably be better off waiting until 100% is expired and drop the entire sstable without compaction?{quote} In my case you are right, activating disableTombstoneCompaction or setting the tombstoneThresold high enough should be better performance wise. My intention when activating the option is to guarantee a consistent behavior for overlapping checks. I wasn't confortable to ignore overlaps when checking for fully expired sstables but not ignoring it when looking for sstables to compact, as in both case it will result in not doing the job due to checking globally instead of just locally to the sstable. I was willing to enforce the {{if you want to drop stuff if ignoreOverlaps is activated look locally instead of globally}} was (Author: rgerard): New version here https://github.com/criteo-forks/cassandra/commit/cfabb2ddd31f16ae127d4b22e0c02a1676ba336b * I can remove {{TWCSCompactionController.getFullyExpiredSSTables(..)}} if you wish, I don't have any strong opinion about it {quote} I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated --- Do we want this? 
It feels like if we expect to be able to drop entire sstables due to being expired, it would be pretty wasteful to run a single sstable tombstone compaction when there are 20% tombstones in the sstable? We would probably be better off waiting until 100% is expired and drop the entire sstable without compaction?{quote} In my case you are right, activating disableTombstoneCompaction or setting the tombstoneThresold high enough should be better performance wise. My intention when activating the option is to guarantee a consistent behavior for overlapping checks. I wasn't confortable to ignore overlaps when checking for fully expired sstables but not ignoring it when looking for sstables to compact, as in both case it will result in not doing the job due to checking globally instead of just locally to the sstable. > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. 
And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136652#comment-16136652 ] Romain GERARD edited comment on CASSANDRA-13418 at 8/22/17 11:26 AM: - New version here https://github.com/criteo-forks/cassandra/commit/cfabb2ddd31f16ae127d4b22e0c02a1676ba336b * I can remove {{TWCSCompactionController.getFullyExpiredSSTables(..)}} if you wish, I don't have any strong opinion about it {quote} I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated --- Do we want this? It feels like if we expect to be able to drop entire sstables due to being expired, it would be pretty wasteful to run a single sstable tombstone compaction when there are 20% tombstones in the sstable? We would probably be better off waiting until 100% is expired and drop the entire sstable without compaction?{quote} In my case you are right, activating disableTombstoneCompaction or setting the tombstoneThresold high enough should be better performance wise. My intention when activating the option is to guarantee a consistent behavior for overlapping checks. I wasn't confortable to ignore overlaps when checking for fully expired sstables but not ignoring it when looking for sstables to compact, as in both case it will result in not doing the job due to checking globally instead of just locally to the sstable. was (Author: rgerard): New version here https://github.com/criteo-forks/cassandra/commit/cfabb2ddd31f16ae127d4b22e0c02a1676ba336b * I can remove {{TWCSCompactionController.getFullyExpiredSSTables(..)}} if you wish, I don't have any strong opinion about it {quote} I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated --- Do we want this? It feels like if we expect to be able to drop entire sstables due to being expired, it would be pretty wasteful to run a single sstable tombstone compaction when there are 20% tombstones in the sstable? 
We would probably be better off waiting until 100% is expired and drop the entire sstable without compaction?{quote} In my case you are right, activating disableTombstoneCompaction or setting the tombstoneThresold high enough should be better performance wise. My intention when activating the option is to guarantee a consistent behavior for overlapping checks. I wasn't comfortable to ignore overlaps when checking for fully expired sstables but not ignoring it when looking for sstables to compact. > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. 
> - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136652#comment-16136652 ] Romain GERARD edited comment on CASSANDRA-13418 at 8/22/17 11:23 AM: - New version here https://github.com/criteo-forks/cassandra/commit/cfabb2ddd31f16ae127d4b22e0c02a1676ba336b * I can remove {{TWCSCompactionController.getFullyExpiredSSTables(..)}} if you wish, I don't have any strong opinion about it {quote} I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated --- Do we want this? It feels like if we expect to be able to drop entire sstables due to being expired, it would be pretty wasteful to run a single sstable tombstone compaction when there are 20% tombstones in the sstable? We would probably be better off waiting until 100% is expired and drop the entire sstable without compaction?{quote} In my case you are right, activating disableTombstoneCompaction or setting the tombstoneThresold high enough should be better performance wise. My intention when activating the option is to guarantee a consistent behavior for overlapping checks. I wasn't comfortable to ignore overlaps when checking for fully expired sstables but not ignoring it when looking for sstables to compact. was (Author: rgerard): New version here https://github.com/criteo-forks/cassandra/commit/cfabb2ddd31f16ae127d4b22e0c02a1676ba336b * I can remove {{TWCSCompactionController.getFullyExpiredSSTables(..)}} if you wish, I don't have any strong opinion about it {quote}Do we want this? It feels like if we expect to be able to drop entire sstables due to being expired, it would be pretty wasteful to run a single sstable tombstone compaction when there are 20% tombstones in the sstable? 
We would probably be better off waiting until 100% is expired and drop the entire sstable without compaction?{quote} In my case you are right, activating disableTombstoneCompaction or setting the tombstoneThresold high enough should be better performance wise. My intention when activating the option is to guarantee a consistent behavior for overlapping checks. I wasn't comfortable to ignore overlaps when checking for fully expired sstables but not ignoring it when looking for sstables to compact. > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. 
> - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136652#comment-16136652 ] Romain GERARD edited comment on CASSANDRA-13418 at 8/22/17 11:22 AM: - New version here https://github.com/criteo-forks/cassandra/commit/cfabb2ddd31f16ae127d4b22e0c02a1676ba336b * I can remove `TWCSCompactionController.getFullyExpiredSSTables(..)` if you wish, I don't any strong opinion about it {quote}Do we want this? It feels like if we expect to be able to drop entire sstables due to being expired, it would be pretty wasteful to run a single sstable tombstone compaction when there are 20% tombstones in the sstable? We would probably be better off waiting until 100% is expired and drop the entire sstable without compaction?{quote} In my case you are right, activating disableTombstoneCompaction or setting the tombstoneThresold high enough should be better performance wise. My intention when activating the option is to guarantee a consistent behavior for overlapping checks. I wasn't comfortable to ignore overlaps when checking for fully expired sstables but not ignoring it when looking for sstables to compact. was (Author: rgerard): New version here https://github.com/criteo-forks/cassandra/commit/cfabb2ddd31f16ae127d4b22e0c02a1676ba336b * I can remove `TWCSCompactionController.getFullyExpiredSSTables(..)` if you wish, I don't any strong opinion about it {quote}Do we want this? It feels like if we expect to be able to drop entire sstables due to being expired, it would be pretty wasteful to run a single sstable tombstone compaction when there are 20% tombstones in the sstable? We would probably be better off waiting until 100% is expired and drop the entire sstable without compaction?{quote} In my case you are right, activating disableTombstoneCompaction or setting the tombstoneThresold high enough should be better performance wise. 
My intention when activating the option is to guarantee a consistent behavior for overlapping checks. I wasn't comfortable to ignore overlaps when checking for fully expired sstables but not ignoring it when looking for sstables to compact. > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. 
> I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136652#comment-16136652 ] Romain GERARD edited comment on CASSANDRA-13418 at 8/22/17 11:22 AM: - New version here https://github.com/criteo-forks/cassandra/commit/cfabb2ddd31f16ae127d4b22e0c02a1676ba336b * I can remove {{TWCSCompactionController.getFullyExpiredSSTables(..)}} if you wish, I don't have any strong opinion about it {quote}Do we want this? It feels like if we expect to be able to drop entire sstables due to being expired, it would be pretty wasteful to run a single sstable tombstone compaction when there are 20% tombstones in the sstable? We would probably be better off waiting until 100% is expired and drop the entire sstable without compaction?{quote} In my case you are right, activating disableTombstoneCompaction or setting the tombstoneThresold high enough should be better performance wise. My intention when activating the option is to guarantee a consistent behavior for overlapping checks. I wasn't comfortable to ignore overlaps when checking for fully expired sstables but not ignoring it when looking for sstables to compact. was (Author: rgerard): New version here https://github.com/criteo-forks/cassandra/commit/cfabb2ddd31f16ae127d4b22e0c02a1676ba336b * I can remove `TWCSCompactionController.getFullyExpiredSSTables(..)` if you wish, I don't have any strong opinion about it {quote}Do we want this? It feels like if we expect to be able to drop entire sstables due to being expired, it would be pretty wasteful to run a single sstable tombstone compaction when there are 20% tombstones in the sstable? We would probably be better off waiting until 100% is expired and drop the entire sstable without compaction?{quote} In my case you are right, activating disableTombstoneCompaction or setting the tombstoneThresold high enough should be better performance wise. 
My intention when activating the option is to guarantee a consistent behavior for overlapping checks. I wasn't comfortable to ignore overlaps when checking for fully expired sstables but not ignoring it when looking for sstables to compact. > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. 
> I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136652#comment-16136652 ]

Romain GERARD edited comment on CASSANDRA-13418 at 8/22/17 11:22 AM:

New version here: https://github.com/criteo-forks/cassandra/commit/cfabb2ddd31f16ae127d4b22e0c02a1676ba336b
* I can remove `TWCSCompactionController.getFullyExpiredSSTables(..)` if you wish, I don't have any strong opinion about it

{quote}Do we want this? It feels like if we expect to be able to drop entire sstables due to being expired, it would be pretty wasteful to run a single sstable tombstone compaction when there are 20% tombstones in the sstable? We would probably be better off waiting until 100% is expired and drop the entire sstable without compaction?{quote}

In my case you are right: activating disableTombstoneCompaction or setting tombstoneThreshold high enough should be better performance-wise.
My intention in activating the option is to guarantee consistent behavior for the overlap checks. I wasn't comfortable ignoring overlaps when checking for fully expired sstables but not ignoring them when looking for sstables to compact.

was (Author: rgerard):

New version here: https://github.com/criteo-forks/cassandra/commit/cfabb2ddd31f16ae127d4b22e0c02a1676ba336b
* I can remove `TWCSCompactionController.getFullyExpiredSSTables(..)` if you wish, I don't any strong opinion about it

{quote}Do we want this? It feels like if we expect to be able to drop entire sstables due to being expired, it would be pretty wasteful to run a single sstable tombstone compaction when there are 20% tombstones in the sstable? We would probably be better off waiting until 100% is expired and drop the entire sstable without compaction?{quote}

In my case you are right: activating disableTombstoneCompaction or setting tombstoneThreshold high enough should be better performance-wise.
My intention in activating the option is to guarantee consistent behavior for the overlap checks. I wasn't comfortable ignoring overlaps when checking for fully expired sstables but not ignoring them when looking for sstables to compact.

> Allow TWCS to ignore overlaps when dropping fully expired sstables
> --
>
> Key: CASSANDRA-13418
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13418
> Project: Cassandra
> Issue Type: Improvement
> Components: Compaction
> Reporter: Corentin Chary
> Labels: twcs
> Attachments: twcs-cleanup.png
>
>
> http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If
> you really want read-repairs you're going to have sstables blocking the
> expiration of other fully expired SSTables because they overlap.
> You can set unchecked_tombstone_compaction = true or tombstone_threshold to a
> very low value and that will purge the blockers of old data that should
> already have expired, thus removing the overlaps and allowing the other
> SSTables to expire.
> The thing is that this is rather CPU intensive and not optimal. If you have
> time series, you might not care if all your data doesn't exactly expire at
> the right time, or if data re-appears for some time, as long as it gets
> deleted as soon as it can. And in this situation I believe it would be really
> beneficial to allow users to simply ignore overlapping SSTables when looking
> for fully expired ones.
> To the question: why would you need read-repairs?
> - Full repairs basically take longer than the TTL of the data on my dataset,
> so this isn't really effective.
> - Even with a 10% chance of doing a repair, we found out that this would be
> enough to greatly reduce entropy of the most used data (and if you have
> timeseries, you're likely to have a dashboard doing the same important
> queries over and over again).
> - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow.
> I'll try to come up with a patch demonstrating how this would work, try it on
> our system and report the effects.
> cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
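Note for readers of the archive: the consistency concern in the comment above (ignoring overlaps in one check but not in another) is easier to follow with a small model of the fully-expired check. The sketch below is a simplified illustration, not the patch: the `Table` class, its fields, and `fullyExpired(..)` are hypothetical stand-ins, and the rule it encodes (an sstable is droppable only if all of its data is past `gcBefore` and it is older than every overlapping sstable that still holds live data) is an assumption about how such a check is usually described.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ExpiredSSTableSketch {
    /** Hypothetical stand-in for sstable metadata (not Cassandra's SSTableReader). */
    static final class Table {
        final String name;
        final long minTimestamp;        // oldest write timestamp in the table
        final long maxTimestamp;        // newest write timestamp in the table
        final int maxLocalDeletionTime; // time at which the last cell expires

        Table(String name, long minTimestamp, long maxTimestamp, int maxLocalDeletionTime) {
            this.name = name;
            this.minTimestamp = minTimestamp;
            this.maxTimestamp = maxTimestamp;
            this.maxLocalDeletionTime = maxLocalDeletionTime;
        }
    }

    /** Returns the names of candidates that may be dropped without compaction. */
    static List<String> fullyExpired(List<Table> candidates, List<Table> overlapping,
                                     int gcBefore, boolean ignoreOverlaps) {
        // Oldest write among overlapping tables that still hold live data.
        long minLiveTimestamp = Long.MAX_VALUE;
        if (!ignoreOverlaps) {
            for (Table t : overlapping) {
                if (t.maxLocalDeletionTime >= gcBefore)
                    minLiveTimestamp = Math.min(minLiveTimestamp, t.minTimestamp);
            }
        }
        List<String> expired = new ArrayList<>();
        for (Table c : candidates) {
            boolean everythingExpired = c.maxLocalDeletionTime < gcBefore;
            // Always true when overlaps are ignored (the bound stays at Long.MAX_VALUE).
            boolean olderThanLiveOverlaps = c.maxTimestamp < minLiveTimestamp;
            if (everythingExpired && olderThanLiveOverlaps)
                expired.add(c.name);
        }
        return expired;
    }

    public static void main(String[] args) {
        List<Table> candidates = Arrays.asList(new Table("old", 1, 10, 100));
        List<Table> overlaps = Arrays.asList(new Table("blocker", 5, 50, 1000));
        System.out.println(fullyExpired(candidates, overlaps, 500, false)); // prints []
        System.out.println(fullyExpired(candidates, overlaps, 500, true));  // prints [old]
    }
}
```

In this model the "blocker" table keeps the expired one on disk until the overlap check is skipped, which is the behavior the issue description asks to make optional.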
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16135283#comment-16135283 ]

Romain GERARD edited comment on CASSANDRA-13418 at 8/21/17 3:22 PM:

Initial review comments:
* CHANGES.txt needs a line -> OK
* I also think so, as it greatly helps to have stable behavior when using TWCS for time series
* Not at all, it was just to pack things together and to inform the reader that a TWCSCompactionController exists
* OK

Trivial stuff:
* OK
* I don't like returning immutable collections when only the base type (Set, List, Map, ...) is specified because, due to type erasure, someone will get burned at runtime (by an unchecked exception). Also, as the parent function already uses a mutable set, I stuck with that, because returning sometimes a mutable set and sometimes an immutable set is a leaky abstraction to me. (I will check whether I can change everything to an ImmutableSet.)
* OK
* OK

I will propose another patch tomorrow.

P.S.: The patch has been running in production since last Friday without hiccups.

was (Author: rgerard):

Initial review comments:
* CHANGES.txt needs a line -> OK
* I also think so, as it greatly helps to have stable behavior when using TWCS for time series
* Not at all, it was just to pack things together and to inform the reader that a TWCSCompactionController exists
* OK

Trivial stuff:
* OK
* I don't like returning immutable collections when only the base type (Set, List, Map, ...) is specified because, due to type erasure, someone will get burned at runtime (by an unchecked exception). Also, as the parent function already uses a mutable set, I stuck with that, because returning sometimes a mutable set and sometimes an immutable set is a leaky abstraction to me. (I will check whether I can change everything to an ImmutableSet.)
* OK
* OK

I will propose another patch tomorrow.

P.S.: The patch has been running in production since last Friday without hiccups.
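Note for readers of the archive: the objection to returning an immutable set behind a plain `Set` return type can be shown in a few lines. This is a minimal sketch under stated assumptions: it uses the JDK's `Collections.unmodifiableSet` as a stand-in for Guava's `ImmutableSet`, and the class and method names are hypothetical. Strictly speaking the surprise comes from the declared type `Set` not saying whether the instance is mutable, so the failure only appears at runtime as an unchecked `UnsupportedOperationException`.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

final class SetContractSketch {
    /** Hypothetical method that returns a mutable set, like the parent function in the thread. */
    static Set<String> mutableResult() {
        Set<String> result = new HashSet<>();
        result.add("sstable-1");
        return result;
    }

    /** Same declared return type, but the instance rejects every mutation. */
    static Set<String> immutableResult() {
        return Collections.unmodifiableSet(mutableResult());
    }

    /** Tries to add a value; reports whether the set accepted the call. */
    static boolean tryAdd(Set<String> set, String value) {
        try {
            set.add(value);
            return true;
        } catch (UnsupportedOperationException e) {
            // The compiler could not warn the caller: both methods return Set<String>.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(tryAdd(mutableResult(), "sstable-2"));   // prints true
        System.out.println(tryAdd(immutableResult(), "sstable-2")); // prints false
    }
}
```

Declaring the return type as the immutable type itself would make the contract visible to callers at compile time, which may be what the remark about changing everything to an ImmutableSet is aiming at.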
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16135283#comment-16135283 ]

Romain GERARD edited comment on CASSANDRA-13418 at 8/21/17 3:20 PM:

Initial review comments:
* CHANGES.txt needs a line -> OK
* I also think so, as it greatly helps to have stable behavior when using TWCS for time series
* Not at all, it was just to pack things together and to inform the reader that a TWCSCompactionController exists
* OK

Trivial stuff:
* OK
* I don't like returning immutable collections when only the base type (Set, List, Map, ...) is specified because, due to type erasure, someone will get burned at runtime (by an unchecked exception). Also, as the parent function already uses a mutable set, I stuck with that, because returning sometimes a mutable set and sometimes an immutable set is a leaky abstraction to me. (I will check whether I can change everything to an ImmutableSet.)
* OK
* OK

I will propose another patch tomorrow.

P.S.: The patch has been running in production since last Friday without hiccups.

was (Author: rgerard):

Initial review comments:
* CHANGES.txt needs a line -> OK
* I also think so, as it greatly helps to have stable behavior when using TWCS for time series
* Not at all, it was just to pack things together and to inform the reader that a TWCSCompactionController exists
* OK

Trivial stuff:
* OK
* I don't like returning immutable collections when only the base type (Set, List, Map, ...) is specified because, due to type erasure, someone will get burned at runtime (by an unchecked exception). Also, as the parent function already uses a mutable set, I stuck with that, because returning sometimes a mutable set and sometimes an immutable set is a leaky abstraction to me. (I will check whether I can change everything to an ImmutableSet.)
* OK
* OK

I will propose another patch tomorrow.

P.S.: The patch has been running in production since last Friday without hiccups.
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16135283#comment-16135283 ]

Romain GERARD edited comment on CASSANDRA-13418 at 8/21/17 3:20 PM:

Initial review comments:
* CHANGES.txt needs a line -> OK
* I also think so, as it greatly helps to have stable behavior when using TWCS for time series
* Not at all, it was just to pack things together and to inform the reader that a TWCSCompactionController exists
* OK

Trivial stuff:
* OK
* I don't like returning immutable collections when only the base type (Set, List, Map, ...) is specified because, due to type erasure, someone will get burned at runtime (by an unchecked exception). Also, as the parent function already uses a mutable set, I stuck with that, because returning sometimes a mutable set and sometimes an immutable set is a leaky abstraction to me. (I will check whether I can change everything to an ImmutableSet.)
* OK
* OK

I will propose another patch tomorrow.

P.S.: The patch has been running in production since last Friday without hiccups.

was (Author: rgerard):

Initial review comments:
* CHANGES.txt needs a line -> OK
* I think so also, as it greatly helps to have stable behavior when using TWCS for time series
* Not at all, it was just to pack things together and to inform the reader that a TWCSCompactionController exists
* OK

Trivial stuff:
* OK
* I don't like returning immutable collections when only the base type (Set, List, Map, ...) is specified because, due to type erasure, someone will get burned at runtime (by an unchecked exception). Also, as the parent function already uses a mutable set, I stuck with that, because returning sometimes a mutable set and sometimes an immutable set is a leaky abstraction to me. (I will check whether I can change everything to an ImmutableSet.)
* OK
* OK

I will propose another patch tomorrow.

P.S.: The patch has been running in production since last Friday without hiccups.
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16130152#comment-16130152 ]

Romain GERARD edited comment on CASSANDRA-13418 at 8/21/17 11:16 AM:

Hi, I am back with a new proposal: https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05

Major differences:
* Used [~krummas]'s approach for introducing ignoreOverlaps
* I split the function that does the overlap checks because, in the previous patch, I was wrongly checking for overlaps in memtables (even when the option was activated) https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05#diff-e8e282423dcbf34d30a3578c8dec15cdR170
* I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05#diff-e83635b2fb3079d9b91b039c605c15daR71 It seems a sane default to me: even if we drop fully expired sstables, we will still check for sstables worth dropping tombstones from, and we also want to ignore the overlap checks in that case. (This was the default behavior in the last patch.)
* Added a simple test case. I will look into adding more (feel free to suggest some)
* Rebased onto trunk

Every test passes (ant test) and I will deploy this patch internally to confirm that it works as expected. [~krummas], let me know if you have any remarks in the meantime.

was (Author: rgerard):

Hi, I am back with a new proposal: https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05

Major differences:
+ Used [~krummas]'s approach for introducing ignoreOverlaps
+ I split the function that does the overlap checks because, in the previous patch, I was wrongly checking for overlaps in memtables (even when the option was activated) https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05#diff-e8e282423dcbf34d30a3578c8dec15cdR170
+ I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05#diff-e83635b2fb3079d9b91b039c605c15daR71 It seems a sane default to me: even if we drop fully expired sstables, we will still check for sstables worth dropping tombstones from, and we also want to ignore the overlap checks in that case. (This was the default behavior in the last patch.)
+ Added a simple test case. I will look into adding more (feel free to suggest some)
+ Rebased onto trunk

Every test passes (ant test) and I will deploy this patch internally to confirm that it works as expected. [~krummas], let me know if you have any remarks in the meantime.
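Note for readers of the archive: the default described in the comment above (turning on unchecked tombstone compaction whenever overlaps are ignored) can be sketched as option parsing. This is an illustration only and does not reproduce the linked commit: the class name and the `ignore_overlaps` key are hypothetical, while `unchecked_tombstone_compaction` is the option named in the issue description.

```java
import java.util.HashMap;
import java.util.Map;

final class TwcsOptionsSketch {
    // Hypothetical option key, used only for this illustration.
    static final String IGNORE_OVERLAPS_KEY = "ignore_overlaps";
    // Option name taken from the issue description.
    static final String UNCHECKED_KEY = "unchecked_tombstone_compaction";

    final boolean ignoreOverlaps;
    final boolean uncheckedTombstoneCompaction;

    TwcsOptionsSketch(Map<String, String> options) {
        ignoreOverlaps = Boolean.parseBoolean(options.getOrDefault(IGNORE_OVERLAPS_KEY, "false"));
        boolean requested = Boolean.parseBoolean(options.getOrDefault(UNCHECKED_KEY, "false"));
        // Ignoring overlaps implies skipping the overlap check for
        // single-sstable tombstone compactions too, so both checks agree.
        uncheckedTombstoneCompaction = requested || ignoreOverlaps;
    }

    public static void main(String[] args) {
        Map<String, String> options = new HashMap<>();
        options.put(IGNORE_OVERLAPS_KEY, "true");
        System.out.println(new TwcsOptionsSketch(options).uncheckedTombstoneCompaction); // prints true
    }
}
```

Deriving one flag from the other keeps the two overlap checks consistent, at the cost of the extra tombstone compactions questioned later in the thread.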
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16135028#comment-16135028 ]

mck edited comment on CASSANDRA-13418 at 8/21/17 10:55 AM:

[~rgerard],

Initial review comments:
- {{CHANGES.txt}} needs a line
- is this suitable for 3.11.x as well? (Does it constitute a stability patch? I think so, but that's an opinion)
- is the method {{TWCSCompactionController.getFullyExpiredSSTables(..)}} really needed? (CompactionController:170 seems to do what we need already, or am I missing something?)
- {{CompactionController.ignoreOverlaps()}} is a bit of a concept, it deserves some apidoc love.

Trivial stuff…
- CompactionController:108 missing space in {{if(}}, same on :170
- CompactionController:232 any reason not to return an immutable set?
- TWCSCompactionController:29 needs a blank line after
- TWCSCompactionController:33 java declaration order. (static fields go before member fields)
- TimeWindowCompactionStrategy:70 missing space in {{if(}}

was (Author: michaelsembwever):

[~rgerard],

Initial review comments:
- {{CHANGES.txt}} needs a line
- is this suitable for 3.11.x as well? (Does it constitute a stability patch? I think so, but that's an opinion)
- is the method {{TWCSCompactionController.getFullyExpiredSSTables(..)}} really needed? (CompactionController:170 seems to do what we need already)
- {{CompactionController.ignoreOverlaps()}} is a bit of a concept, it deserves some apidoc love.

Trivial stuff…
- CompactionController:108 missing space in {{if(}}, same on :170
- CompactionController:232 any reason not to return an immutable set?
- TWCSCompactionController:29 needs a blank line after
- TWCSCompactionController:33 java declaration order. (static fields go before member fields)
- TimeWindowCompactionStrategy:70 missing space in {{if(}}
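Note for readers of the archive: the style points in the review above (apidoc for `ignoreOverlaps()`, static fields before member fields, a space after `if`) might look like the following. This is a hypothetical class written for illustration, not code from the patch, and the wording of the doc comment is an assumption based on the issue description.

```java
final class ControllerStyleSketch {
    // Static fields are declared before member fields.
    static final String OVERLAPS_IGNORED = "overlaps ignored";
    static final String OVERLAPS_CHECKED = "overlaps checked";

    private final boolean ignoreOverlaps;

    ControllerStyleSketch(boolean ignoreOverlaps) {
        this.ignoreOverlaps = ignoreOverlaps;
    }

    /**
     * Whether overlapping sstables are ignored when looking for fully expired
     * sstables. When true, an expired sstable can be dropped even if a newer
     * sstable overlaps it, so data may re-appear for some time.
     */
    boolean ignoreOverlaps() {
        return ignoreOverlaps;
    }

    String describe() {
        if (ignoreOverlaps()) // note the space between "if" and "("
            return OVERLAPS_IGNORED;
        return OVERLAPS_CHECKED;
    }

    public static void main(String[] args) {
        System.out.println(new ControllerStyleSketch(true).describe()); // prints overlaps ignored
    }
}
```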
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16130152#comment-16130152 ]

Romain GERARD edited comment on CASSANDRA-13418 at 8/17/17 9:57 AM:

Hi, I am back with a new proposal: https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05

Major differences:
+ Used [~krummas]'s approach for introducing ignoreOverlaps
+ I split the function that does the overlap checks because, in the previous patch, I was wrongly checking for overlaps in memtables (even when the option was activated) https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05#diff-e8e282423dcbf34d30a3578c8dec15cdR170
+ I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05#diff-e83635b2fb3079d9b91b039c605c15daR71 It seems a sane default to me: even if we drop fully expired sstables, we will still check for sstables worth dropping tombstones from, and we also want to ignore the overlap checks in that case. (This was the default behavior in the last patch.)
+ Added a simple test case. I will look into adding more (feel free to suggest some)
+ Rebased onto trunk

Every test passes (ant test) and I will deploy this patch internally to confirm that it works as expected. [~krummas], let me know if you have any remarks in the meantime.

was (Author: rgerard):

Hi, I am back with a new proposal: https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05

Major differences:
+ Used [~krummas]'s approach for introducing ignoreOverlaps
+ I split the function that does the overlap checks because, in the previous patch, I was wrongly checking for overlaps in memtables (even when the option was activated) https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05#diff-e8e282423dcbf34d30a3578c8dec15cdR170
+ I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05#diff-e83635b2fb3079d9b91b039c605c15daR71 It seems a sane default to me: even if we drop fully expired sstables, we will still check for sstables worth dropping tombstones from, and we also want to ignore the overlap checks in that case. (This was the default behavior in the last patch.)
+ Added a simple test case. I will look into adding more (feel free to suggest some)
+ Rebased onto trunk

Every test passes and I will deploy this patch internally to confirm that it works as expected. [~krummas], let me know if you have any remarks in the meantime.
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16130152#comment-16130152 ]

Romain GERARD edited comment on CASSANDRA-13418 at 8/17/17 9:53 AM:

Hi, I am back with a new proposal: https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05

Major differences:
+ Used [~krummas]'s approach for introducing ignoreOverlaps
+ I split the function that does the overlap checks because, in the previous patch, I was wrongly checking for overlaps in memtables (even when the option was activated) https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05#diff-e8e282423dcbf34d30a3578c8dec15cdR170
+ I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05#diff-e83635b2fb3079d9b91b039c605c15daR71 It seems a sane default to me: even if we drop fully expired sstables, we will still check for sstables worth dropping tombstones from, and we also want to ignore the overlap checks in that case. (This was the default behavior in the last patch.)
+ Added a simple test case. I will look into adding more (feel free to suggest some)
+ Rebased onto trunk

Every test passes and I will deploy this patch internally to confirm that it works as expected. [~krummas], let me know if you have any remarks in the meantime.

was (Author: rgerard):

Hi, I am back with a new proposal: https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05

Major differences:
+ Used [~krummas]'s approach for introducing ignoreOverlaps
+ I split the function that does the overlap checks because, in the previous patch, I was wrongly checking for overlaps in memtables (even when the option was activated) https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05#diff-e8e282423dcbf34d30a3578c8dec15cdR170
+ I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05#diff-e83635b2fb3079d9b91b039c605c15daR71 It seems a sane default to me: even if we drop fully expired sstables, we will still check for sstables worth dropping tombstones from, and we also want to ignore the overlap checks in that case.
+ Added a simple test case. I will look into adding more (feel free to suggest some)
+ Rebased onto trunk

Every test passes and I will deploy this patch internally to confirm that it works as expected. [~krummas], let me know if you have any remarks in the meantime.
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16130152#comment-16130152 ] Romain GERARD edited comment on CASSANDRA-13418 at 8/17/17 9:52 AM: Hi, I am back with a new proposition https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05 Majors differences : + Used [~krummas] way for introducing the ignore Overlaps + I splitted the function that is doing the overlapingChecks as in the previous patch, I was wrongfully checking for overlaps in memtables (even if the option was activated) https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05#diff-e8e282423dcbf34d30a3578c8dec15cdR170 + I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05#diff-e83635b2fb3079d9b91b039c605c15daR71 It seems a sane default for me, as even if we drop fully expired sstables, we will still check for worth Dropping ones and we want to also ignore overlaps check in this case. + Added a simple test case. I will look to add more (feel free to suggest some) + Rebased upon trunk Every tests passes and I will deploy this patch internally to confirm that it works as expected. 
If you have any remarks [~krummas] in the mean time was (Author: rgerard): Hi, I am back with a new proposition https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05 Majors differences : + Used [~krummas] way for introducing the ignore Overlaps + I splitted the function that is doing the overlapingChecks as in the previous patch, I was wrongfully checking for overlaps in memtables (even if the option was activated) https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05#diff-e8e282423dcbf34d30a3578c8dec15cdR170 + I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05#diff-e83635b2fb3079d9b91b039c605c15daR71 It seems a sane default for me, as even if we drop fully expired sstables, we will still check for worth Dropping ones and we want to also ignore overlaps check in this case. + Added a simple test case. I will look to add more (feel free to suggest some) + Rebased upon trunk Every tests passes and I will deploy this patch internally to confirm that it works as expected > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. 
> The thing is that this is rather CPU intensive and not optimal. If you have
> time series, you might not care if all your data doesn't exactly expire at
> the right time, or if data re-appears for some time, as long as it gets
> deleted as soon as it can. And in this situation I believe it would be really
> beneficial to allow users to simply ignore overlapping SSTables when looking
> for fully expired ones.
> To the question: why would you need read-repairs ?
> - Full repairs basically take longer than the TTL of the data on my dataset,
> so this isn't really effective.
> - Even with a 10% chances of doing a repair, we found out that this would be
> enough to greatly reduce entropy of the most used data (and if you have
> timeseries, you're likely to have a dashboard doing the same important
> queries over and over again).
> - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow.
> I'll try to come up with a patch demonstrating how this would work, try it on
> our system and report the effects.
> cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already.
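The effect of ignoring overlaps when looking for fully expired SSTables can be illustrated with a small model. This is only a sketch in Python, not the Java code from the patch: the `SSTable` fields, the `fully_expired` function and the `ignore_overlaps` flag are hypothetical names chosen for illustration, memtables are left out entirely, and timestamps and deletion times are plain integers.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class SSTable:
    name: str
    min_timestamp: int       # oldest write timestamp contained in the sstable
    max_timestamp: int       # newest write timestamp contained in the sstable
    max_deletion_time: int   # moment after which every cell has expired


def fully_expired(candidates: Iterable[SSTable],
                  overlapping: Iterable[SSTable],
                  gc_before: int,
                  ignore_overlaps: bool = False) -> List[SSTable]:
    """Return the candidates that could be dropped without compacting them."""
    min_live_timestamp = float("inf")
    if not ignore_overlaps:
        # An overlapping sstable that is not itself expired may hold older
        # data that the candidate currently shadows, so it blocks the drop.
        for other in overlapping:
            if other.max_deletion_time >= gc_before:
                min_live_timestamp = min(min_live_timestamp, other.min_timestamp)
    return [s for s in candidates
            if s.max_deletion_time < gc_before
            and s.max_timestamp < min_live_timestamp]


# "old" is entirely expired; "blocker" overlaps it and still holds live data
# (for example because a read-repair wrote old rows into a recent sstable).
old = SSTable("old", min_timestamp=10, max_timestamp=100, max_deletion_time=500)
blocker = SSTable("blocker", min_timestamp=50, max_timestamp=900, max_deletion_time=2000)

print(fully_expired([old], [blocker], gc_before=1000))
print(fully_expired([old], [blocker], gc_before=1000, ignore_overlaps=True))
```

With the overlap check in place the expired sstable is kept because of the blocker; with the check skipped it becomes droppable, at the cost described in the issue (data may briefly re-appear).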
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16130152#comment-16130152 ] Romain GERARD edited comment on CASSANDRA-13418 at 8/17/17 9:51 AM: Hi, I am back with a new proposition https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05 Majors differences : + Used [~krummas] way for introducing the ignore Overlaps + I splitted the function that is doing the overlapingChecks as in the previous patch, I was wrongfully checking for overlaps in memtables (even if the option was activated) https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05#diff-e8e282423dcbf34d30a3578c8dec15cdR170 + I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05#diff-e83635b2fb3079d9b91b039c605c15daR71 It seems a sane default for me, as even if we drop fully expired sstables, we will still check for worth Dropping ones and we want to also ignore overlaps check in this case. + Added a simple test case. 
I will look to add more (feel free to suggest some) + Rebased upon trunk Every tests passes and I will deploy this patch internally to confirm that it works as expected was (Author: rgerard): Hi, I am back with a new proposition https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05 Majors differences : + Used [~krummas] way for introducing the ignore Overlaps + I splitted the function that is doing the overlapingChecks as in the previous patch, I was wrongfully checking for overlaps in memtables (even if the option was activated) https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05#diff-e8e282423dcbf34d30a3578c8dec15cdR170 + I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05#diff-e83635b2fb3079d9b91b039c605c15daR71 It seems a sane default for me, as even if we drop fully expired sstables, we will still check for worth Dropping ones and we want to also ignore overlaps check in this case. + Added a simple test case. I will look to add more (feel free to suggest some) + Rebased upon trunk Every tests pass and I will deploy this patch internally to confirm that it works as expected > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. 
> You can set unchecked_tombstone_compaction = true or tombstone_threshold to a
> very low value and that will purge the blockers of old data that should
> already have expired, thus removing the overlaps and allowing the other
> SSTables to expire.
> The thing is that this is rather CPU intensive and not optimal. If you have
> time series, you might not care if all your data doesn't exactly expire at
> the right time, or if data re-appears for some time, as long as it gets
> deleted as soon as it can. And in this situation I believe it would be really
> beneficial to allow users to simply ignore overlapping SSTables when looking
> for fully expired ones.
> To the question: why would you need read-repairs ?
> - Full repairs basically take longer than the TTL of the data on my dataset,
> so this isn't really effective.
> - Even with a 10% chances of doing a repair, we found out that this would be
> enough to greatly reduce entropy of the most used data (and if you have
> timeseries, you're likely to have a dashboard doing the same important
> queries over and over again).
> - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow.
> I'll try to come up with a patch demonstrating how this would work, try it on
> our system and report the effects.
> cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already.
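The interaction between the proposed option and unchecked_tombstone_compaction, as described in the comment above, might look roughly like the following. It is a simplified Python model under stated assumptions: `ignore_overlaps` is a hypothetical option key, `resolve_options` and `worth_dropping_tombstones` are illustrative helpers rather than Cassandra's methods, and the real check does further estimation when overlaps exist, where this model simply refuses.

```python
def resolve_options(options: dict) -> dict:
    """Turn on unchecked tombstone compaction whenever overlaps are ignored."""
    resolved = dict(options)
    if resolved.get("ignore_overlaps", False):
        resolved["unchecked_tombstone_compaction"] = True
    return resolved


def worth_dropping_tombstones(droppable_ratio: float,
                              options: dict,
                              overlaps_other_sstables: bool) -> bool:
    """Decide whether a single-sstable tombstone compaction is worthwhile."""
    # The tombstone threshold is always respected (0.2 is Cassandra's default).
    if droppable_ratio <= options.get("tombstone_threshold", 0.2):
        return False
    # With the unchecked flag set, the overlap check is skipped entirely.
    if options.get("unchecked_tombstone_compaction", False):
        return True
    # Simplification: overlapping sstables veto the compaction outright.
    return not overlaps_other_sstables


opts = resolve_options({"ignore_overlaps": True, "tombstone_threshold": 0.2})
print(worth_dropping_tombstones(0.5, opts, overlaps_other_sstables=True))
```

The point of the default is that an sstable above the tombstone threshold is still compacted even when it overlaps others, which matches the intent of ignoring overlaps elsewhere.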
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16130152#comment-16130152 ] Romain GERARD edited comment on CASSANDRA-13418 at 8/17/17 9:43 AM: Hi, I am back with a new proposition https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05 Majors differences : + Used [~krummas] way for introducing the ignore Overlaps + I splitted the function that is doing the overlapingChecks as in the previous patch, I was wrongfully checking for overlaps in memtables (even if the option was activated) https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05#diff-e8e282423dcbf34d30a3578c8dec15cdR170 + I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05#diff-e83635b2fb3079d9b91b039c605c15daR71 It seems a sane default for me, as even if we drop fully expired sstables, we will still check for worth Dropping ones and we want to also ignore overlaps check in this case. + Added a simple test case. 
I will look to add more (feel free to suggest some) + Rebased upon trunk Every tests pass and I will deploy this patch internally to confirm that it works as expected was (Author: rgerard): Hi, I am back with a new proposition https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05 Majors differences : + Used [~krummas] way for introducing the ignore Overlaps + I splitted the function that is doing the overlapingChecks as in the previous patch, I was wrongfully checking for overlaps in memtables (even if the option was activated) https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05#diff-e8e282423dcbf34d30a3578c8dec15cdR170 + I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05#diff-e83635b2fb3079d9b91b039c605c15daR71 It seems a sane default for me, as even if we drop fully expired sstables, we will still check for worth Dropping ones and we want to also ignore overlaps check in this case. + Added a simple test case. I will look to add more (feel free to suggest somes) + Rebased upon trunk Every tests pass and I will deploy this patch internally to confirm that it works as expected > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. 
> You can set unchecked_tombstone_compaction = true or tombstone_threshold to a
> very low value and that will purge the blockers of old data that should
> already have expired, thus removing the overlaps and allowing the other
> SSTables to expire.
> The thing is that this is rather CPU intensive and not optimal. If you have
> time series, you might not care if all your data doesn't exactly expire at
> the right time, or if data re-appears for some time, as long as it gets
> deleted as soon as it can. And in this situation I believe it would be really
> beneficial to allow users to simply ignore overlapping SSTables when looking
> for fully expired ones.
> To the question: why would you need read-repairs ?
> - Full repairs basically take longer than the TTL of the data on my dataset,
> so this isn't really effective.
> - Even with a 10% chances of doing a repair, we found out that this would be
> enough to greatly reduce entropy of the most used data (and if you have
> timeseries, you're likely to have a dashboard doing the same important
> queries over and over again).
> - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow.
> I'll try to come up with a patch demonstrating how this would work, try it on
> our system and report the effects.
> cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already.
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16130152#comment-16130152 ] Romain GERARD edited comment on CASSANDRA-13418 at 8/17/17 9:41 AM: Hi, I am back with a new proposition https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05 Majors differences : + Used [~krummas] way for introducing the ignore Overlaps + I splitted the function that is doing the overlapingChecks as in the previous patch, I was wrongfully checking for overlaps in memtables (even if the option was activated) https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05#diff-e8e282423dcbf34d30a3578c8dec15cdR170 + I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05#diff-e83635b2fb3079d9b91b039c605c15daR71 It seems a sane default for me, as even if we drop fully expired sstables, we will still check for worth Dropping ones and we want to also ignore overlaps check in this case. + Added a simple test case. 
I will look to add more (feel free to suggest somes) + Rebased upon trunk Every tests pass and I will deploy this patch internally to confirm that it works as expected was (Author: rgerard): Hi, I am back with a new proposition https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05 Majors differences : + Used [~krummas] way for introducing the ignore Overlaps + I splitted the function that is doing the overlapingChecks as in the previous patch, I was wrongfully checking for overlaps in memtables (even if the option was activated) https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05#diff-e8e282423dcbf34d30a3578c8dec15cdR170 + I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05#diff-e83635b2fb3079d9b91b039c605c15daR71 It seems a sane default for me, as even if we drop fully expired sstables, we will still check for worth Dropping ones and we want to also ignore overlaps check in this case. + Added a simple test case. I will look to add more (feel free to suggest somes) + Rebased upon trunk Every tests passed and I will deploy this patch internally to confirm that it works as expected > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. 
> You can set unchecked_tombstone_compaction = true or tombstone_threshold to a
> very low value and that will purge the blockers of old data that should
> already have expired, thus removing the overlaps and allowing the other
> SSTables to expire.
> The thing is that this is rather CPU intensive and not optimal. If you have
> time series, you might not care if all your data doesn't exactly expire at
> the right time, or if data re-appears for some time, as long as it gets
> deleted as soon as it can. And in this situation I believe it would be really
> beneficial to allow users to simply ignore overlapping SSTables when looking
> for fully expired ones.
> To the question: why would you need read-repairs ?
> - Full repairs basically take longer than the TTL of the data on my dataset,
> so this isn't really effective.
> - Even with a 10% chances of doing a repair, we found out that this would be
> enough to greatly reduce entropy of the most used data (and if you have
> timeseries, you're likely to have a dashboard doing the same important
> queries over and over again).
> - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow.
> I'll try to come up with a patch demonstrating how this would work, try it on
> our system and report the effects.
> cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already.
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16130152#comment-16130152 ] Romain GERARD edited comment on CASSANDRA-13418 at 8/17/17 9:41 AM: Hi, I am back with a new proposition https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05 Majors differences : + Used [~krummas] way for introducing the ignore Overlaps + I splitted the function that is doing the overlapingChecks as in the previous patch, I was wrongfully checking for overlaps in memtables (even if the option was activated) https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05#diff-e8e282423dcbf34d30a3578c8dec15cdR170 + I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05#diff-e83635b2fb3079d9b91b039c605c15daR71 It seems a sane default for me, as even if we drop fully expired sstables, we will still check for worth Dropping ones and we want to also ignore overlaps check in this case. + Added a simple test case. 
I will look to add more (feel free to suggest somes) + Rebased upon trunk Every tests passed and I will deploy this patch internally to confirm that it works as expected was (Author: rgerard): Hi, I am back with a new proposition https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05 Majors differences : + Used [~krummas] way for introducing the ignore Overlaps + I splitted the function that is doing the overlapingChecks as in the previous patch, I was wrongfully checking for overlaps in memtables (even if the option was activated) https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05#diff-e8e282423dcbf34d30a3578c8dec15cdR170 + I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05#diff-e83635b2fb3079d9b91b039c605c15daR71 It seems a sane default for me, as even if we drop fully expired sstables, we will still check for worth Dropping ones and we want to also ignore overlaps check in this case. + Added a simple test case. I will look to add more (feel free to suggest somes) > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. 
> The thing is that this is rather CPU intensive and not optimal. If you have
> time series, you might not care if all your data doesn't exactly expire at
> the right time, or if data re-appears for some time, as long as it gets
> deleted as soon as it can. And in this situation I believe it would be really
> beneficial to allow users to simply ignore overlapping SSTables when looking
> for fully expired ones.
> To the question: why would you need read-repairs ?
> - Full repairs basically take longer than the TTL of the data on my dataset,
> so this isn't really effective.
> - Even with a 10% chances of doing a repair, we found out that this would be
> enough to greatly reduce entropy of the most used data (and if you have
> timeseries, you're likely to have a dashboard doing the same important
> queries over and over again).
> - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow.
> I'll try to come up with a patch demonstrating how this would work, try it on
> our system and report the effects.
> cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already.
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074581#comment-16074581 ] Romain GERARD edited comment on CASSANDRA-13418 at 7/5/17 11:02 AM: Seems better and I am going to test it. I will keep you updated of the result. Thanks [~krummas] for the direction ! was (Author: rgerard): Seems better and will try it out. I will keep you updated of the result. Thanks [~krummas] for the direction ! > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. 
> - Even with a 10% chances of doing a repair, we found out that this would be
> enough to greatly reduce entropy of the most used data (and if you have
> timeseries, you're likely to have a dashboard doing the same important
> queries over and over again).
> - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow.
> I'll try to come up with a patch demonstrating how this would work, try it on
> our system and report the effects.
> cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already.
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16066894#comment-16066894 ]

Romain GERARD edited comment on CASSANDRA-13418 at 7/5/17 8:56 AM:

Hi again Marcus,

I took your comments into account. Regarding the first one, I wanted to do that at first, but getFullyExpiredSSTables is also used in [CompactionTask|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/CompactionTask.java#L165]. Modifying things only at the TWCS level would therefore have resulted in compacting the sstables that we wanted to drop, and I was not too inclined to touch CompactionTask.
The patch also makes worthDroppingTombstones [ignore overlaps|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java#L141] while still respecting the specified tombstoneThreshold (we can turn on uncheckedTombstoneCompaction for this one).
To sum up, moving things closer to TWCS was not possible (for me) without impacting more external code.

Regarding the second question, I put the code validating the option in [TimeWindowCompactionStrategyOptions|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategyOptions.java#L157] in order to [trigger an exception|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/schema/CompactionParams.java#L161] if the option is used with a strategy other than TWCS.

P.S.: I will have more time in the upcoming days, so I will be more responsive.

was (Author: rgerard):
Hi back Marcus, So I took into account your comments and regarding the 1rst one I wanted to do that at first but getFullyExpiredSSTables is also used in [CompactionTask|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/CompactionTask.java#L165].
So only modifying things at the TWS level would have resulted in compacting the sstables that we wanted to drop, and I was not too incline to touch to CompactionTask. It is also making worthDroppingTombstones [ignoring overlaps|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java#L141] and respect the tombstoneThresold specified (We can turn on uncheckedTombstoneCompaction for this one) To sum up moving things closer to TWCS was not possible (to me) without impacting more external code. Regarding the 2nd question, I put the code validating the option in [TimeWindowCompactionStategyOptions|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategyOptions.java#L157] in order to [trigger an exception|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/schema/CompactionParams.java#L161] if the option is used elsewhere than TWCS. P.s: I will have more time in the upcoming days, so I will be more responsive. > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. 
If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over
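The validation approach mentioned in the comment (the option is recognised only by the TWCS options class, so using it with another strategy triggers an exception) can be sketched as follows. This is a minimal Python model, not the linked Java code: `ignore_overlaps` is a hypothetical option key, the option sets are deliberately incomplete, and `validate_options` and its error message are illustrative.

```python
TWCS = "TimeWindowCompactionStrategy"
STCS = "SizeTieredCompactionStrategy"
IGNORE_OVERLAPS = "ignore_overlaps"  # hypothetical option key

# Options understood by every strategy (incomplete, for illustration only).
COMMON_OPTIONS = {"tombstone_threshold", "unchecked_tombstone_compaction"}

# Options understood by one strategy only; the new option lives with TWCS.
STRATEGY_OPTIONS = {
    TWCS: {"compaction_window_unit", "compaction_window_size", IGNORE_OVERLAPS},
    STCS: {"min_sstable_size"},
}


def validate_options(strategy: str, options: dict) -> None:
    """Raise ValueError if any option is not understood by the strategy."""
    allowed = COMMON_OPTIONS | STRATEGY_OPTIONS.get(strategy, set())
    unknown = sorted(set(options) - allowed)
    if unknown:
        raise ValueError(f"Options {unknown} are not understood by {strategy}")


validate_options(TWCS, {IGNORE_OVERLAPS: "true"})       # accepted
try:
    validate_options(STCS, {IGNORE_OVERLAPS: "true"})   # rejected
except ValueError as err:
    print(err)
```

Keeping the option in the strategy-specific set means no other strategy has to know about it, which is the property the comment relies on.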
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074408#comment-16074408 ] Romain GERARD edited comment on CASSANDRA-13418 at 7/5/17 8:26 AM: --- Sorry about the bad name :( So here is the current patch we are using in production https://github.com/criteo-forks/cassandra/commit/9424d9d25978e11b34d725a3bdf8a4956a7cbc82 and the branch we are using is this one https://github.com/criteo-forks/cassandra/commits/cassandra-3.11-criteo was (Author: rgerard): Sorry about the bad name :( So here is the current patch we are using in production https://github.com/criteo-forks/cassandra/commit/9424d9d25978e11b34d725a3bdf8a4956a7cbc82 and the branch we are using is this one https://github.com/criteo-forks/cassandra/commits/cassandra-3.11-criteo > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. 
And in this situation I believe it would be really
> beneficial to allow users to simply ignore overlapping SSTables when looking
> for fully expired ones.
> To the question: why would you need read-repairs ?
> - Full repairs basically take longer than the TTL of the data on my dataset,
> so this isn't really effective.
> - Even with a 10% chances of doing a repair, we found out that this would be
> enough to greatly reduce entropy of the most used data (and if you have
> timeseries, you're likely to have a dashboard doing the same important
> queries over and over again).
> - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow.
> I'll try to come up with a patch demonstrating how this would work, try it on
> our system and report the effects.
> cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already.
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16066894#comment-16066894 ]

Romain GERARD edited comment on CASSANDRA-13418 at 6/28/17 8:20 PM:
--------------------------------------------------------------------

Hi again Marcus,

I took your comments into account. Regarding the first one, I wanted to do that at first, but getFullyExpiredSSTables is also used in [CompactionTask|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/CompactionTask.java#L165]. Modifying things only at the TWCS level would therefore have resulted in compacting the sstables that we wanted to drop, and I was not too inclined to touch CompactionTask.
It also makes worthDroppingTombstones [ignore overlaps|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java#L141] and respect the specified tombstoneThreshold (we can turn on uncheckedTombstoneCompaction for this one).
To sum up, moving things closer to TWCS was not possible (for me) without impacting more external code.

Regarding the second question, I put the code validating the option in [TimeWindowCompactionStrategyOptions|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategyOptions.java#L157] in order to [trigger an exception|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/schema/CompactionParams.java#L161] if the option is used with anything other than TWCS.

P.S.: I will have more time in the upcoming days, so I will be more responsive.

was (Author: rgerard):
Hi again Marcus,

I took your comments into account. Regarding the first one, I wanted to do that at first, but getFullyExpiredSSTables is also used in [CompactionTask|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/CompactionTask.java#L165]. Modifying things only at the TWCS level would therefore have resulted in compacting the sstables that we wanted to drop, and I was not too inclined to touch CompactionTask.
It also makes worthDroppingTombstones [ignore overlaps|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java#L141] and respect the specified tombstoneThreshold (we can turn on uncheckedTombstoneCompaction for this one).
To sum up, moving things closer to TWCS was not possible (for me) without impacting more external code.

Regarding the second question I put the code validating the option in [TimeWindowCompactionStrategyOptions|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategyOptions.java#L157] in order to [trigger an exception|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/schema/CompactionParams.java#L161] if the option is used with anything other than TWCS.

P.S.: I will have more time in the upcoming days, so I will be more responsive.

> Allow TWCS to ignore overlaps when dropping fully expired sstables
> ------------------------------------------------------------------
>
>          Key: CASSANDRA-13418
>          URL: https://issues.apache.org/jira/browse/CASSANDRA-13418
>      Project: Cassandra
>   Issue Type: Improvement
>   Components: Compaction
>     Reporter: Corentin Chary
>       Labels: twcs
>  Attachments: twcs-cleanup.png
>
>
> http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If
> you really want read-repairs, you're going to have SSTables blocking the
> expiration of other fully expired SSTables because they overlap.
> You can set unchecked_tombstone_compaction = true or tombstone_threshold to a
> very low value, and that will purge the blockers of old data that should
> already have expired, thus removing the overlaps and allowing the other
> SSTables to expire.
> The thing is that this is rather CPU intensive and not optimal. If you have
> time series, you might not care if all your data doesn't expire at exactly
> the right time, or if data re-appears for some time, as long as it gets
> deleted as soon as it can. In this situation I believe it would be really
> beneficial to allow users to simply ignore overlapping SSTables when looking
> for fully expired ones.
> To the question: why would you need read-repairs?
> - Full repairs basically take longer than the TTL of the data on my dataset,
> so this isn't really effective.
> - Even with a 10% chance of doing a repair, we found out that this would be
> enough to greatly reduce entropy of the most used data (and if you have
> time series, you're likely to have a dashboard doing the same important
> queries over and over again).
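The validation approach described in the comment above (the TWCS options class recognises the new option, and any option left unrecognised by the chosen strategy is rejected) can be sketched roughly as follows. This is a toy model, not the patch's code: the option name `ignore_overlaps`, the `ConfigurationError` class and all function names are hypothetical, and only two strategies are modelled.

```python
class ConfigurationError(ValueError):
    """Raised when compaction options are invalid for the chosen strategy."""


IGNORE_OVERLAPS = "ignore_overlaps"  # hypothetical option name, for illustration only


def validate_common(options):
    """Consume the options every strategy understands; return the leftovers."""
    unchecked = dict(options)
    for key in ("tombstone_threshold", "unchecked_tombstone_compaction"):
        unchecked.pop(key, None)
    return unchecked


def validate_twcs(options):
    """TWCS also understands the overlap option and checks that it is a boolean."""
    unchecked = validate_common(options)
    value = unchecked.pop(IGNORE_OVERLAPS, "false")
    if value.lower() not in ("true", "false"):
        raise ConfigurationError(f"{IGNORE_OVERLAPS} must be 'true' or 'false', got {value!r}")
    return unchecked


def validate_stcs(options):
    """A strategy that does not know the overlap option leaves it unchecked."""
    return validate_common(options)


VALIDATORS = {
    "TimeWindowCompactionStrategy": validate_twcs,
    "SizeTieredCompactionStrategy": validate_stcs,
}


def validate_params(strategy, options):
    """Reject any option that the strategy's own validator did not consume."""
    leftover = VALIDATORS[strategy](options)
    if leftover:
        raise ConfigurationError(
            f"Properties {sorted(leftover)} are not understood by {strategy}")


# Accepted: the option is used with TWCS.
validate_params("TimeWindowCompactionStrategy", {IGNORE_OVERLAPS: "true"})

# Rejected: the same option used with another strategy.
try:
    validate_params("SizeTieredCompactionStrategy", {IGNORE_OVERLAPS: "true"})
    rejected = False
except ConfigurationError:
    rejected = True
```

Keeping the check inside the strategy-specific validator means no other strategy has to know that the option exists; it is rejected simply because nothing consumed it.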
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16066894#comment-16066894 ] Romain GERARD edited comment on CASSANDRA-13418 at 6/28/17 5:23 PM: Hi back Marcus, So I took into account your comments and regarding the 1rst one I wanted to do that at first but getFullyExpiredSSTables is also used in [CompactionTask|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/CompactionTask.java#L165]. So only modifying things at the TWS level would have resulted in compacting the sstables that we wanted to drop, and I was not too incline to touch to CompactionTask. It is also making worthDroppingTombstones [ignoring overlaps|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java#L141] and respect the tombstoneThresold specified (We can turn on uncheckedTombstoneCompaction for this one) To sump up moving things closer to TWCS as not possible (to me) without impacting more external code. Regarding the 2nd question I put the code validating the options in [TimeWindowCompactionStategyOptions|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategyOptions.java#L157] in order to [trigger an exception|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/schema/CompactionParams.java#L161] if the options is used elsewhere than TWCS. P.s: I will have more time in the upcoming days, so I will be more responsive. 
was (Author: rgerard): Hi back Marcus, So I took into account your comments and regarding the 1rst one I wanted to do that at first but getFullyExpiredSSTables is also used in [CompactionTask|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/CompactionTask.java#L165] So only modifying things at the TWS level would have resulted in compacting the sstables that we wanted to drop, and I was not too incline to touch to CompactionTask. It is also making worthDroppingTombstones [ignoring overlaps|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java#L141] and respect the tombstoneThresold specified (We can turn on uncheckedTombstoneCompaction for this one) To sump up moving things closer to TWCS as not possible (to me) without impacting more external code. Regarding the 2nd question I put the code validating the options in [TimeWindowCompactionStategyOptions|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategyOptions.java#L157] in order to [trigger an exception|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/schema/CompactionParams.java#L161] if the options is used elsewhere than TWCS. P.s: I will have more time in the upcoming days, so I will be more responsive. > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. 
> You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again).
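The validation approach described in the comment above, where a strategy-specific option is rejected when it is set on any other strategy, can be sketched roughly as follows. This is a minimal illustration under assumptions: `OptionValidationSketch`, the option key `example_ignore_overlaps`, and the use of `IllegalArgumentException` are hypothetical and are not the names used in the patch. The assumed mechanism is that each strategy consumes the options it understands and anything left over is reported as an error.

```java
import java.util.HashMap;
import java.util.Map;

final class OptionValidationSketch {
    // Hypothetical option key, used for illustration only.
    static final String IGNORE_OVERLAPS = "example_ignore_overlaps";

    /** Options every strategy understands: consume them and return the rest. */
    static Map<String, String> validateCommon(Map<String, String> options) {
        Map<String, String> unchecked = new HashMap<>(options);
        unchecked.remove("tombstone_threshold");
        unchecked.remove("unchecked_tombstone_compaction");
        return unchecked;
    }

    /** Time-window-like strategy: additionally checks and consumes its own option. */
    static Map<String, String> validateTimeWindow(Map<String, String> options) {
        Map<String, String> unchecked = validateCommon(options);
        String value = unchecked.remove(IGNORE_OVERLAPS);
        if (value != null && !value.equalsIgnoreCase("true") && !value.equalsIgnoreCase("false")) {
            throw new IllegalArgumentException(IGNORE_OVERLAPS + " must be true or false, got " + value);
        }
        return unchecked;
    }

    /** Caller-side check: any option that no strategy consumed is an error. */
    static void rejectUnknown(String strategy, Map<String, String> unchecked) {
        if (!unchecked.isEmpty()) {
            throw new IllegalArgumentException("Unknown options for " + strategy + ": " + unchecked.keySet());
        }
    }

    public static void main(String[] args) {
        Map<String, String> options = new HashMap<>();
        options.put(IGNORE_OVERLAPS, "true");

        // The time-window strategy consumes the option, so nothing is left over.
        rejectUnknown("time-window strategy", validateTimeWindow(options));

        // Any other strategy leaves the option unconsumed, which triggers the exception.
        try {
            rejectUnknown("other strategy", validateCommon(options));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Keeping the check inside the strategy's own option class means the shared validation path does not need to know about the new option at all.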
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16066894#comment-16066894 ] Romain GERARD edited comment on CASSANDRA-13418 at 6/28/17 5:23 PM: Hi back Marcus, So I took into account your comments and regarding the 1rst one I wanted to do that at first but getFullyExpiredSSTables is also used in [CompactionTask|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/CompactionTask.java#L165] So only modifying things at the TWS level would have resulted in compacting the sstables that we wanted to drop, and I was not too incline to touch to CompactionTask. It is also making worthDroppingTombstones [ignoring overlaps|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java#L141] and respect the tombstoneThresold specified (We can turn on uncheckedTombstoneCompaction for this one) To sump up moving things closer to TWCS as not possible (to me) without impacting more external code. Regarding the 2nd question I put the code validating the options in [TimeWindowCompactionStategyOptions|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategyOptions.java#L157] in order to [trigger an exception|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/schema/CompactionParams.java#L161] if the options is used elsewhere than TWCS. P.s: I will have more time in the upcoming days, so I will be more responsive. 
was (Author: rgerard): Hi back Marcus, So I took into account your comments and regarding your the 1rst one I wanted to do that at first but getFullyExpiredSSTables is also used in [CompactionTask|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/CompactionTask.java#L165] So only modifying things at the TWS level would have resulted in compacting the sstables that we wanted to drop, and I was not too incline to touch to CompactionTask. It is also making worthDroppingTombstones [ignoring overlaps|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java#L141] and respect the tombstoneThresold specified (We can turn on uncheckedTombstoneCompaction for this one) To sump up moving things closer to TWCS as not possible (to me) without impacting more external code. Regarding the 2nd question I put the code validating the options in [TimeWindowCompactionStategyOptions|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategyOptions.java#L157] in order to [trigger an exception|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/schema/CompactionParams.java#L161] if the options is used elsewhere than TWCS. P.s: I will have more time in the upcoming days, so I will be more responsive. > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. 
> You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16066894#comment-16066894 ] Romain GERARD edited comment on CASSANDRA-13418 at 6/28/17 5:22 PM: Hi back Marcus, So I took into account your comments and regarding your the 1rst one I wanted to do that at first but getFullyExpiredSSTables is also used in [CompactionTask|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/CompactionTask.java#L165] So only modifying things at the TWS level would have resulted in compacting the sstables that we wanted to drop, and I was not too incline to touch to CompactionTask. It is also making worthDroppingTombstones [ignoring overlaps|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java#L141] and respect the tombstoneThresold specified (We can turn on uncheckedTombstoneCompaction for this one) To sump up moving things closer to TWCS as not possible (to me) without impacting more external code. Regarding the 2nd question I put the code validating the options in [TimeWindowCompactionStategyOptions|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategyOptions.java#L157] in order to [trigger an exception|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/schema/CompactionParams.java#L161] if the options is used elsewhere than TWCS. P.s: I will have more time in the upcoming days, so I will be more responsive. 
was (Author: rgerard): Hi back Marcus, So I took into account your comments and regarding your the 1rst one I wanted to do that at first but getFullyExpiredSSTables is also used in [CompactionTask|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/CompactionTask.java#L165] So only modifying things at the TWS level would have resulted in compacting the sstables that we wanted to drop, and I was not too incline to touch to CompactionTask. It is also making worthDroppingTombstones [ignoring overlaps|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java#L141] and respect the tombstoneThresold specified (We can turn on uncheckedTombstoneCompaction for this one) Regarding the 2nd question I put the code validating the options in [TimeWindowCompactionStategyOptions|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategyOptions.java#L157] in order to [trigger an exception|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/schema/CompactionParams.java#L161] if the options is used elsewhere than TWCS. P.s: I will have more time in the upcoming days, so I will be more responsive. > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. 
> You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16066894#comment-16066894 ] Romain GERARD edited comment on CASSANDRA-13418 at 6/28/17 5:21 PM: Hi back Marcus, So I took into account your comments and regarding your the 1rst one I wanted to do that at first but getFullyExpiredSSTables is also used in [CompactionTask|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/CompactionTask.java#L165] So only modifying things at the TWS level would have resulted in compacting the sstables that we wanted to drop, and I was not too incline to touch to CompactionTask. It is also making worthDroppingTombstones [ignoring overlaps|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java#L141] and respect the tombstoneThresold specified (We can turn on uncheckedTombstoneCompaction for this one) Regarding the 2nd question I put the code validating the options in [TimeWindowCompactionStategyOptions|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategyOptions.java#L157] in order to [trigger an exception|https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/schema/CompactionParams.java#L161] if the options is used elsewhere than TWCS. P.s: I will have more time in the upcoming days, so I will be more responsive. 
was (Author: rgerard): Hi back Marcus, So I took into account your comments and regarding your the 1rst one I wanted to do that at first but getFullyExpiredSSTables is also used in [CompactionTask|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/CompactionTask.java#L165] So only modifying things at the TWS level would have resulted in compacting the sstables that we wanted to drop, and I was not too incline to touch to CompactionTask. It is also making worthDroppingTombstones [ignoring overlaps|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java#L141] and respect the tombstoneThresold specified (We can turn on uncheckedTombstoneCompaction for this one) Regarding the 2nd question I put the code validating the options in TimeWindowCompactionStategyOptions https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategyOptions.java#L157 in order to trigger an exception if the options is used elsewhere than TWCS. https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/schema/CompactionParams.java#L161 P.s: I will have more time in the upcoming days, so I will be more responsive. > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. 
> You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc:
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16066894#comment-16066894 ] Romain GERARD edited comment on CASSANDRA-13418 at 6/28/17 5:20 PM: Hi back Marcus, So I took into account your comments and regarding your the 1rst one I wanted to do that at first but getFullyExpiredSSTables is also used in [CompactionTask|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/CompactionTask.java#L165] So only modifying things at the TWS level would have resulted in compacting the sstables that we wanted to drop, and I was not too incline to touch to CompactionTask. It is also making worthDroppingTombstones [ignoring overlaps|https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java#L141] and respect the tombstoneThresold specified (We can turn on uncheckedTombstoneCompaction for this one) Regarding the 2nd question I put the code validating the options in TimeWindowCompactionStategyOptions https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategyOptions.java#L157 in order to trigger an exception if the options is used elsewhere than TWCS. https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/schema/CompactionParams.java#L161 P.s: I will have more time in the upcoming days, so I will be more responsive. 
was (Author: rgerard): Hi back Marcus, So I took into account your comments and regarding your the 1rst one I wanted to do that at first but getFullyExpiredSSTables is also used in CompactionTask https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/CompactionTask.java#L165 So only modifying things at the TWS level would have resulted in compacting the sstables that we wanted to drop, and I was not too incline to touch to CompactionTask. It is also making worthDroppingTombstones ignoring overlaps https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java#L141 and respect the tombstoneThresold specified (We can turn on uncheckedTombstoneCompaction for this one) Regarding the 2nd question I put the code validating the options in TimeWindowCompactionStategyOptions https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategyOptions.java#L157 in order to trigger an exception if the options is used elsewhere than TWCS. https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/schema/CompactionParams.java#L161 P.s: I will have more time in the upcoming days, so I will be more responsive. > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. 
> You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski],
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16057392#comment-16057392 ] Corentin Chary edited comment on CASSANDRA-13418 at 6/21/17 12:21 PM: -- Latest version of the patch works as it should: https://github.com/criteo-forks/cassandra/commit/da4a5c17448dab64aeb4295bb7401afbea9edf51 !twcs-cleanup.png! was (Author: iksaif): Latest version of the patch works as it should: https://github.com/criteo-forks/cassandra/commit/da4a5c17448dab64aeb4295bb7401afbea9edf51 !twcs-cleanup.png|thumbnail! > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. 
> - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
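The blocking behaviour described in the issue can be sketched in a few lines. This is a simplified, self-contained model and not Cassandra's code: `SSTableInfo`, `ExpirationSketch`, and the `ignoreOverlaps` flag are hypothetical names, and the rule shown (an expired SSTable stays blocked while an overlapping SSTable with live data holds a timestamp at or below the candidate's newest timestamp) is an assumption about how the overlap check behaves.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified stand-in for SSTable metadata. */
final class SSTableInfo {
    final String name;
    final long minTimestamp;        // oldest write timestamp in the file
    final long maxTimestamp;        // newest write timestamp in the file
    final int maxLocalDeletionTime; // time (seconds) after which everything in the file is expired

    SSTableInfo(String name, long minTimestamp, long maxTimestamp, int maxLocalDeletionTime) {
        this.name = name;
        this.minTimestamp = minTimestamp;
        this.maxTimestamp = maxTimestamp;
        this.maxLocalDeletionTime = maxLocalDeletionTime;
    }
}

final class ExpirationSketch {
    /** Returns the names of the candidates that could be dropped without compacting them. */
    static List<String> fullyExpired(List<SSTableInfo> candidates,
                                     List<SSTableInfo> overlapping,
                                     int gcBefore,
                                     boolean ignoreOverlaps) {
        // Oldest timestamp held by an overlapping SSTable that still contains live data.
        long minLiveTimestamp = Long.MAX_VALUE;
        for (SSTableInfo o : overlapping) {
            if (o.maxLocalDeletionTime >= gcBefore) {
                minLiveTimestamp = Math.min(minLiveTimestamp, o.minTimestamp);
            }
        }

        List<String> expired = new ArrayList<>();
        for (SSTableInfo c : candidates) {
            boolean everythingExpired = c.maxLocalDeletionTime < gcBefore;
            // Assumed rule: a candidate is blocked if it may shadow older live data elsewhere.
            boolean blocked = !ignoreOverlaps && c.maxTimestamp >= minLiveTimestamp;
            if (everythingExpired && !blocked) {
                expired.add(c.name);
            }
        }
        return expired;
    }

    public static void main(String[] args) {
        List<SSTableInfo> candidates = Arrays.asList(new SSTableInfo("old-window", 10, 100, 50));
        // A read-repaired SSTable that overlaps the old window and still holds live data.
        List<SSTableInfo> overlapping = Arrays.asList(new SSTableInfo("read-repaired", 80, 900, 500));

        System.out.println(fullyExpired(candidates, overlapping, 200, false));
        System.out.println(fullyExpired(candidates, overlapping, 200, true));
    }
}
```

With the flag off, the single read-repaired SSTable keeps the old window on disk; with it on, the old window is reported as droppable, at the cost of the data re-appearance risk the description mentions.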
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16055289#comment-16055289 ] Romain GERARD edited comment on CASSANDRA-13418 at 6/20/17 7:44 AM: Hello Jonathan, thanks for keeping us up to date :) On my side, I have deployed the patch I mentioned earlier and at first glance it is running fine. For now, I lack the time to analyse the new behavior in more depth, but I will do so in the upcoming weeks. I will keep the thread informed. was (Author: rgerard): Hello Jonathan, thanks to keep us up to date :) On my side, I have deployed the patch I mentioned earlier and at first glance it is running fine. For now, I lack the time to analyse the new behavior further and more in depth but I will do it in the upcoming weeks. So I will keep the thread informed. > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can.
And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already.
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16000735#comment-16000735 ] Romain GERARD edited comment on CASSANDRA-13418 at 5/10/17 7:32 AM: I am trying things out by merging your ideas [~iksaif], [~jjirsa], [~adejanovski] https://github.com/erebe/cassandra/commit/f70b1efa5e2b589a5d4fa7245cd307b693ca701c but I am not sure what to do if one node of the ring has not started Cassandra with -Dcassandra.unsafe.xxx https://github.com/erebe/cassandra/commit/12f085a53df62361f2fad5c046dc770ff746b417#diff-e8e282423dcbf34d30a3578c8dec15cdR101 For now, I just disable it with a warning, even if the compactionParams say otherwise. Let me know if this is not the right direction for you. was (Author: rgerard): I am trying things out by merging your ideas [~iksaif], [~jjirsa], [~adejanovski] https://github.com/erebe/cassandra/commit/12f085a53df62361f2fad5c046dc770ff746b417 but I am not sure of what to do if one node of the ring has not activated cassadra with -Dcassandra.unsafe.xxx https://github.com/erebe/cassandra/commit/12f085a53df62361f2fad5c046dc770ff746b417#diff-e8e282423dcbf34d30a3578c8dec15cdR101 for now I just disable it with a warning even if the compactionParams says otherwise. Let me know if this is not the right direction for you > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap.
> You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already.
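The fallback described in the comment above (a table asks for the unsafe behaviour, but the local node was not started with the matching JVM flag, so the option is disabled with a warning) might look roughly like this. It is only a sketch under assumptions: `UnsafeOptionGate` and the property name `demo.unsafe.flag` are hypothetical, and the real property name, which is elided in the thread, is deliberately not reproduced here.

```java
import java.util.logging.Logger;

final class UnsafeOptionGate {
    private static final Logger LOGGER = Logger.getLogger(UnsafeOptionGate.class.getName());

    /**
     * @param requestedByTable value coming from the table's compaction parameters
     * @param gateProperty     name of the JVM system property that must be "true" on this node
     * @return the value this node will actually use
     */
    static boolean effectiveValue(boolean requestedByTable, String gateProperty) {
        // Boolean.getBoolean is true only if the property exists and equals "true" (ignoring case).
        boolean allowedOnThisNode = Boolean.getBoolean(gateProperty);
        if (requestedByTable && !allowedOnThisNode) {
            LOGGER.warning("Option requested by the table, but -D" + gateProperty
                           + "=true is not set on this node; ignoring the option.");
            return false;
        }
        return requestedByTable;
    }

    public static void main(String[] args) {
        String property = "demo.unsafe.flag"; // hypothetical property name, for illustration only

        // Node started without the flag: the table-level setting is overridden.
        System.out.println(effectiveValue(true, property));

        // Node started with the flag: the table-level setting is honoured.
        System.setProperty(property, "true");
        System.out.println(effectiveValue(true, property));
    }
}
```

One consequence of this design is that nodes in the same ring can behave differently for the same table, which is the open question raised in the comment.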
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16000735#comment-16000735 ]

Romain GERARD edited comment on CASSANDRA-13418 at 5/9/17 10:56 AM:

I am trying things out by merging your ideas, [~iksaif], [~jjirsa], [~adejanovski]:
https://github.com/erebe/cassandra/commit/12f085a53df62361f2fad5c046dc770ff746b417
but I am not sure what to do if one node of the ring has not started Cassandra with -Dcassandra.unsafe.xxx:
https://github.com/erebe/cassandra/commit/12f085a53df62361f2fad5c046dc770ff746b417#diff-e8e282423dcbf34d30a3578c8dec15cdR101
For now I just disable it with a warning, even if the compactionParams say otherwise.
Let me know if this is not the right direction for you.

was (Author: rgerard):
I am trying things out by merging your ideas, [~iksaif], [~jjirsa], [~adejanovski]:
https://github.com/erebe/cassandra/commit/12f085a53df62361f2fad5c046dc770ff746b417
but I am not sure of what do if one node of the ring has not started Cassandra with -Dcassandra.unsafe.xxx:
https://github.com/erebe/cassandra/commit/12f085a53df62361f2fad5c046dc770ff746b417#diff-e8e282423dcbf34d30a3578c8dec15cdR101
For now I just disable it with a warning, even if the compactionParams say otherwise.
Let me know if this is not the right direction for you.
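The fallback described in the comment above (disable the option with a warning when the node was not started with the unsafe system property, even if the compaction parameters request it) might look roughly like the sketch below. It is an illustration only, not the code in the linked commit: the option name `ignore_overlaps`, the function name and the boolean `node_unsafe_flag` are assumptions, and the real system property is left elided as in the comment.

```python
import logging

logger = logging.getLogger("twcs_sketch")

# Hypothetical table-level compaction option name, used only in this sketch.
IGNORE_OVERLAPS_OPTION = "ignore_overlaps"


def effective_ignore_overlaps(compaction_params: dict,
                              node_unsafe_flag: bool) -> bool:
    """Decide whether this node may ignore overlaps for a table.

    compaction_params: the table's compaction options as strings.
    node_unsafe_flag:  True if this node was started with the (elided)
                       -Dcassandra.unsafe.xxx system property.
    """
    raw = str(compaction_params.get(IGNORE_OVERLAPS_OPTION, "false"))
    requested = raw.strip().lower() == "true"
    if requested and not node_unsafe_flag:
        # The table asks for the unsafe behaviour but this node has not
        # opted in: stay in the safe state and tell the operator.
        logger.warning(
            "%s is set on the table but the node was not started with the "
            "unsafe system property; overlaps will NOT be ignored",
            IGNORE_OVERLAPS_OPTION,
        )
        return False
    return requested
```

The design choice here is that the safe behaviour wins whenever the two settings disagree, so a node that has not opted in never drops overlapping sstables; the cost is that nodes in the same ring can behave differently until all of them are restarted with the property.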
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16000735#comment-16000735 ]

Romain GERARD edited comment on CASSANDRA-13418 at 5/8/17 1:38 PM:

I am trying things out by merging your ideas, [~iksaif], [~jjirsa], [~adejanovski]:
https://github.com/erebe/cassandra/commit/12f085a53df62361f2fad5c046dc770ff746b417
but I am not sure what to do if one node of the ring has not started Cassandra with -Dcassandra.unsafe.xxx:
https://github.com/erebe/cassandra/commit/12f085a53df62361f2fad5c046dc770ff746b417#diff-e8e282423dcbf34d30a3578c8dec15cdR101
For now I just disable it with a warning, even if the compactionParams say otherwise.
Let me know if this is not the right direction for you.

was (Author: rgerard):
I am trying things out by merging your ideas, [~iksaif], [~jjirsa], [~adejanovski]:
https://github.com/erebe/cassandra/commit/12f085a53df62361f2fad5c046dc770ff746b417
but I am not sure what to do if one node of the ring has not started Cassandra with -Dcassandra.unsafe.xxx:
https://github.com/erebe/cassandra/commit/12f085a53df62361f2fad5c046dc770ff746b417#diff-e8e282423dcbf34d30a3578c8dec15cdR101
Let me know if this is not the right direction for you.
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16000735#comment-16000735 ]

Romain GERARD edited comment on CASSANDRA-13418 at 5/8/17 1:37 PM:

I am trying things out by merging your ideas, [~iksaif], [~jjirsa], [~adejanovski]:
https://github.com/erebe/cassandra/commit/12f085a53df62361f2fad5c046dc770ff746b417
but I am not sure what to do if one node of the ring has not started Cassandra with -Dcassandra.unsafe.xxx:
https://github.com/erebe/cassandra/commit/12f085a53df62361f2fad5c046dc770ff746b417#diff-e8e282423dcbf34d30a3578c8dec15cdR101
Let me know if this is not the right direction for you.

was (Author: rgerard):
I am trying things out by merging your ideas [~iksaif] [~jjirsa] [~adejanovski]:
https://github.com/erebe/cassandra/commit/12f085a53df62361f2fad5c046dc770ff746b417
but I am not sure what to do if one node of the ring has not started Cassandra with -Dcassandra.unsafe.xxx:
https://github.com/erebe/cassandra/commit/12f085a53df62361f2fad5c046dc770ff746b417#diff-e8e282423dcbf34d30a3578c8dec15cdR101
Let me know if this is not the right direction for you.
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15987267#comment-15987267 ]

Jeff Jirsa edited comment on CASSANDRA-13418 at 4/27/17 6:56 PM:

I think Marcus' concern is valid, but having run TWCS in production for a long time, I really wish we just had a dangerous-sounding option that defaulted to a safe state and would let append-only users ignore overlaps when they want to drop sstables.

Adding code to flush/split read-repaired data to different sstables is a lot more invasive, and would require follow-up changes to TWCS (so as not to try to immediately recompact those sstables with the larger post-window-major final sstables). We talked about doing this in 9666 (in fact, we committed to it as a condition of merging TWCS), and I think it's probably a perfectly reasonable thing to do, but it's a lot more effort than simply telling Cassandra "this table has no deletes, we don't care about overlaps".

Maybe the right thing is to get 9779 done so we can block deletes, and then this is a much-less-scary option?

was (Author: jjirsa):
I think Marcus' concern is valid, but having run TWCS in production for a long time, I really wish we just had a dangerous-sounding option that defaulted to a safe state and would let append-only users ignore overlaps when they want to drop sstables.

Adding code to flush read-repaired data to different sstables is a lot more invasive, and would require follow-up changes to TWCS (so as not to try to immediately recompact those sstables with the larger post-window-major final sstables). We talked about doing this in 9666 (in fact, we committed to it as a condition of merging TWCS), and I think it's probably a perfectly reasonable thing to do, but it's a lot more effort than simply telling Cassandra "this table has no deletes, we don't care about overlaps".

Maybe the right thing is to get 9779 done so we can block deletes, and then this is a much-less-scary option?
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15961710#comment-15961710 ]

Jeff Jirsa edited comment on CASSANDRA-13418 at 4/8/17 6:49 AM:

{quote}
What do you think about provide_overlapping_tombstones = "ignore"? This would integrate nicely with the code and does not add yet another compaction option (but sounds a little weird).
{quote}

As a table property instead of as a compaction property? Or as a system property?

It feels like a compaction property to me. [~krummas], do you have any suggestions on how something like this should be done (or whether it should be done at all)?

was (Author: jjirsa):
{quote}
What do you think about provide_overlapping_tombstones = "ignore"? This would integrate nicely with the code and does not add yet another compaction option (but sounds a little weird).
{quote}

As a table property instead of as a compaction property?

It feels like a compaction property to me. [~krummas], do you have any suggestions on how something like this should be done (or whether it should be done at all)?
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15961662#comment-15961662 ]

Jeff Jirsa edited comment on CASSANDRA-13418 at 4/8/17 3:29 AM:

unchecked_tombstone_compaction and tombstone_compaction_interval help here, but it's still not necessary - you may be able to promise/guarantee that all writes are appends (not overwrites or deletes), in which case there's no reason to recompact all the sstables to incrementally remove garbage when the data truly is expired and obsolete.

This sorta borders on (but isn't directly the same as) CASSANDRA-9779 - if we know we're not going to overwrite/delete any data, there are a number of optimizations that can be done.

was (Author: jjirsa):
unchecked_tombstone_compaction and tombstone_compaction_interval help here, but it's still not necessary - you may be able to promise/guarantee that all writes are appends (not overwrites or deletes), in which case there's no reason to recompact all the sstables to incrementally remove garbage when the data truly is expired and obsolete.
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15959035#comment-15959035 ]

Romain GERARD edited comment on CASSANDRA-13418 at 4/6/17 2:53 PM:

I may be wrong, but wasn't unchecked_tombstone_compaction combined with tombstone_compaction_interval designed for this use case? Even if dropping the sstable is more efficient than compacting it, I would be pleased if someone knowledgeable could tell me.

I am not against adding another option, but I would rather be confident that I am adding it out of need and not because I missed something that already exists in Cassandra.

was (Author: rgerard):
I may be wrong, but wasn't unchecked_tombstone_compaction combined with tombstone_compaction_interval designed for this use case? Even if dropping the sstable is more efficient than compacting it. I would be pleased if someone knowledgeable could tell me.

I am not against adding another option, but I would rather be confident that I am adding it out of need and not because I missed something that already exists in Cassandra.
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15959035#comment-15959035 ]

Romain GERARD edited comment on CASSANDRA-13418 at 4/6/17 2:51 PM:

I may be wrong, but wasn't `unchecked_tombstone_compaction` combined with `tombstone_compaction_interval` designed for this use case? Even if dropping the sstable is more efficient than compacting it.

I am not against adding another option, but I would rather be confident that I am adding it out of need and not because I missed something that already exists in Cassandra.

was (Author: rgerard):
I may be wrong, but wasn't unchecked_tombstone_compaction combined with tombstone_compaction_interval designed for this use case? Even if dropping the sstable is more efficient than compacting it.

I am not against adding another option, but I would rather be confident that I am adding it out of need and not because I missed something that already exists in Cassandra.
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15959035#comment-15959035 ]

Romain GERARD edited comment on CASSANDRA-13418 at 4/6/17 2:51 PM:

I may be wrong, but wasn't unchecked_tombstone_compaction combined with tombstone_compaction_interval designed for this use case? Even if dropping the sstable is more efficient than compacting it.

I am not against adding another option, but I would rather be confident that I am adding it out of need and not because I missed something that already exists in Cassandra.

was (Author: rgerard):
I may be wrong, but wasn't `unchecked_tombstone_compaction` combined with `tombstone_compaction_interval` designed for this use case? Even if dropping the sstable is more efficient than compacting it.

I am not against adding another option, but I would rather be confident that I am adding it out of need and not because I missed something that already exists in Cassandra.
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15959035#comment-15959035 ]

Romain GERARD edited comment on CASSANDRA-13418 at 4/6/17 2:52 PM:

I may be wrong, but wasn't unchecked_tombstone_compaction combined with tombstone_compaction_interval designed for this use case? Even if dropping the sstable is more efficient than compacting it. I would be pleased if someone knowledgeable could tell me.

I am not against adding another option, but I would rather be confident that I am adding it out of need and not because I missed something that already exists in Cassandra.

was (Author: rgerard):
I may be wrong, but wasn't unchecked_tombstone_compaction combined with tombstone_compaction_interval designed for this use case? Even if dropping the sstable is more efficient than compacting it.

I am not against adding another option, but I would rather be confident that I am adding it out of need and not because I missed something that already exists in Cassandra.