[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16154249#comment-16154249 ] Mick Semb Wever commented on CASSANDRA-13418: - thanks for the patch. is going to make a big difference to a number of users that have append-only datamodels! -- Mick Semb Wever Australia The Last Pickle Apache Cassandra Consulting http://www.thelastpickle.com > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary >Assignee: Romain GERARD > Labels: twcs > Fix For: 3.11.1, 4.0 > > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16153498#comment-16153498 ] Romain GERARD commented on CASSANDRA-13418: --- Champagne ! Thanks for the commitment :) > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary >Assignee: Romain GERARD > Labels: twcs > Fix For: 3.11.1, 4.0 > > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16151196#comment-16151196 ] mck commented on CASSANDRA-13418: - {quote}P.s: https://github.com/thelastpickle/cassandra/commit/58440e707cd6490847a37dc8d76c150d3eb27aab#diff-e8e282423dcbf34d30a3578c8dec15cdR176 still think is less clear to inline it.{quote} I agree, but found not clear method name to use. As Marcus' comments, {{getFullyExpiredSSTables(..)}} isn't appropriate. Any suggestions for a clear name? Otherwise the method is at 70 lines length, not great but no disaster, so i'm ok either way. > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary >Assignee: Romain GERARD > Labels: twcs > Fix For: 3.11.x, 4.x > > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16150559#comment-16150559 ] Romain GERARD commented on CASSANDRA-13418: --- Don't worry [~mck], I am currently working with an issue on couchbase so I couldn't have checked it until monday. So no hard feeling :) P.s: https://github.com/thelastpickle/cassandra/commit/58440e707cd6490847a37dc8d76c150d3eb27aab#diff-e8e282423dcbf34d30a3578c8dec15cdR176 still think is less clear to inline it. > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary >Assignee: Romain GERARD > Labels: twcs > Fix For: 3.11.x, 4.x > > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16150446#comment-16150446 ] mck commented on CASSANDRA-13418: - Updated: || branch || testall || dtest || | [cassandra-3.11_13418|https://github.com/thelastpickle/cassandra/tree/mck/cassandra-3.11_13418] | [testall|https://circleci.com/gh/thelastpickle/cassandra/tree/mck%2Fcassandra-3.11_13418] | [dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/265] | | [trunk_13418|https://github.com/thelastpickle/cassandra/tree/mck/trunk_13418] | [testall|https://circleci.com/gh/thelastpickle/cassandra/tree/mck%2Ftrunk_13418] | [dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/265] | > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary >Assignee: Romain GERARD > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16150409#comment-16150409 ] mck commented on CASSANDRA-13418: - [~rgerard], i failed to see your last comment til now. I've addressed [~krummas]'s concerns [here|https://github.com/thelastpickle/cassandra/commit/17b1d30ac8f07c49bfc4d51b14d3201cc969fcfe], but feel terrible now for stepping on your toes. A few code style issues beyond the braces have been fixed. Thanks for the push back Marcus! For example, I change the names of the constants in {{TimeWindowCompactionStrategyOptions}} to be more in align with the previous constants there. Two additions to the tests in {{TimeWindowCompactionStrategyTest}} are added. One for the {{TimeWindowCompactionStrategyTest.validateOptions}} which is only there for the tests, and a new test method which does what Marcus asks for. ([~krummas], do you still want a dtest still warranted?) > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary >Assignee: Romain GERARD > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16150175#comment-16150175 ] Marcus Eriksson commented on CASSANDRA-13418: - A dtest that makes sure that we drop sstables when the option is enabled and that we don't drop them when it is not enabled > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary >Assignee: Romain GERARD > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16150173#comment-16150173 ] Romain GERARD commented on CASSANDRA-13418: --- Will change code style and the protected to private (Splitting from getFullyExpiredSSTables seems more readable to me) If you can think of any more test. I will add them > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary >Assignee: Romain GERARD > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16148879#comment-16148879 ] Marcus Eriksson commented on CASSANDRA-13418: - I had a quick read through and I have a few comments; * code style problems, brace on newline in a couple of places * I don't see the need for this method: {{protected static Set getFullyExpiredSSTables(Iterable compacting, int gcBefore)}} - we could just handle that in the current {{getFullyExpiredSSTables}}. If you disagree with that, it should at least be private and renamed as it is dangerous * I still think this needs more tests > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary >Assignee: Romain GERARD > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16148862#comment-16148862 ] Romain GERARD commented on CASSANDRA-13418: --- +1 > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary >Assignee: Romain GERARD > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16148739#comment-16148739 ] mck commented on CASSANDRA-13418: - Both patches are updated, with the warning log removed. The links in my previous comment remain correct. I'm +1 and ready to merge this, so long i hear no other objections. > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary >Assignee: Romain GERARD > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16148606#comment-16148606 ] Romain GERARD commented on CASSANDRA-13418: --- {quote}Shall I just remove the log warning altogether now it's in the docs{quote} +1 I found the warning log confusing. The doc should be a better alternative imo Thanks for the NoSpamLogger, it is much better > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary >Assignee: Romain GERARD > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16147112#comment-16147112 ] mck commented on CASSANDRA-13418: - [~KurtG], thanks for the two use cases write up. at least it's documented here to begin with. The "more compactions" in the first scenario also depends on values for tombstone_threshold and tombstone_compaction_interval. I'm sitting on the fence for whether even the logged warning should be in the patch. Patches updated: - the new TWCS* classes to TimeWindow* as that's the standard prefix, - log message shorten a little, and using the terminology (property names) as known by the operator, - logging the message via the NoSpamLogger (max one line every 15 minutes), and - site and cql docs updated (just in trunk) Shall I just remove the log warning altogether now it's in the docs || branch || testall || dtest || | [cassandra-3.11_13418|https://github.com/thelastpickle/cassandra/tree/mck/cassandra-3.11_13418] | [testall|https://circleci.com/gh/thelastpickle/cassandra/tree/mck%2Fcassandra-3.11_13418] | [dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/241] | | [trunk_13418|https://github.com/thelastpickle/cassandra/tree/mck/trunk_13418] | [testall|https://circleci.com/gh/thelastpickle/cassandra/tree/mck%2Ftrunk_13418] | [dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/241] | > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary >Assignee: Romain GERARD > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146693#comment-16146693 ] mck commented on CASSANDRA-13418: - Am working on docs… (i think i will intentionally only add doc additions to trunk, is that ok?) I've also renamed the new TWCS* classes to TimeWindow* as that's the standard prefix here. an updated patch on its way… > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146634#comment-16146634 ] Kurt Greaves commented on CASSANDRA-13418: -- Yeah more want to document the implications for the user. Basically my understanding is this: Case 1 (enabling both: {{unsafe_aggressive_sstable_expiration}} and {{uncheckedTombstoneCompaction}}: If data is read repaired into an earlier window, we will potentially trigger more compactions including the overlapping data but only when {{provide_overlapping_tombstones}} is set. If it is not set we will potentially trigger only single SSTable tombstone compactions which _may or may not_ provide a benefit. Basically if you enable more options the benefit is that you potentially get rid of read-repaired expired data faster, at the cost of more compactions. So you should only really do this if you have a significant amount of read-repaired data and hitting a lot of tombstones because of it. Case 2 (enabling only {{unsafe_aggressive_sstable_expiration}}): If data is read repaired into an earlier window, it will not be removed until the whole SSTable is expired, which means it will hang around until TTL passes for that table. If that's not the whole story/correct please elaborate. I think it's worthwhile having it documented clearly somewhere (docs) why you'd set these options (together or separately). > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16144901#comment-16144901 ] Romain GERARD commented on CASSANDRA-13418: --- 2. only enabling unsafe_aggressive_sstable_expiration When looking for sstables expired, you will *ignore *the overlaps and only look locally if the current sstable is eligible. When looking for sstables to compact, you will *not ignore* the overlaps and look globally if the current sstable is eligible. bq. 1. enabling both When looking for sstables expired, you will *ignore *the overlaps and only look locally if the current sstable is eligible. When looking for sstables to compact, you will *ignore* the overlaps and look globally if the current sstable is eligible. I made a new version of the patch with uncheckedTombstoneCompaction disabled and a warning message. https://github.com/criteo-forks/cassandra/commit/800ab325cbf7d9d4d5e60e2b959918426e121815 the diff {noformat} diff --git a/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java b/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java index 43c90c7042..d21222c484 100644 --- a/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java +++ b/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java @@ -67,9 +67,9 @@ public class TimeWindowCompactionStrategy extends AbstractCompactionStrategy else logger.debug("Enabling tombstone compactions for TWCS"); -if (this.options.ignoreOverlaps) -this.uncheckedTombstoneCompaction = true; - +if(this.options.ignoreOverlaps && !this.uncheckedTombstoneCompaction) { +logger.warn("You are running with sstables overlapping checks disabled but without unchecked tombstone compaction, check that this what you want"); +} } {noformat} > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16144660#comment-16144660 ] Kurt Greaves commented on CASSANDRA-13418: -- >Watchers, feel free to jump on the question May be missing something obvious but can someone clarify why you wouldn't want to enable {{uncheckedTombstoneCompaction}} when {{unsafe_aggressive_sstable_expiration}} is also enabled? What exactly are the implications of: 1. enabling both 2. only enabling unsafe_aggressive_sstable_expiration > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143749#comment-16143749 ] Marcus Eriksson commented on CASSANDRA-13418: - bq. N.B: I tried to apply the syle guide found in .idea/codeStyleSettings.xml but it is changing me a lot of things. Do you know if it is up to date ? It is up to date, not sure what you mean by "apply" here? bq. Could we, instead of setting uncheckedTombstoneCompaction, log a warning telling the user that they probably want to uncheckedTombstoneCompaction set as well? +1 > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143548#comment-16143548 ] Romain GERARD commented on CASSANDRA-13418: --- Point noted. My only issue with putting a log is that it is easily missable, especially at startup. But as you stated, this is an advance option so we can think that people will not stumble-upon it by mistake and will read the doc first before activating it. You already have paid the cost for looking for this option, so adding a couple instead of one should be painless. We can go without at first, as you said, as it is the safest default. @Watchers, feel free to jump on the question > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16142240#comment-16142240 ] mck commented on CASSANDRA-13418: - {quote}N.B: I tried to apply the syle guide found in .idea/codeStyleSettings.xml but it is changing me a lot of things. Do you know if it is up to date ?{quote} I don't use IntelliJ so I can't answer that for you, sry. [~krummas]? Otherwise you can ask on irc #cassandra or on the user mailing list. {quote}I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated{quote} I'm -1 on this for the moment. While it holds a logic argument, as you explain, it's not intuitive for the user. The user has to know that this happens (via docs or via code). I'd be more comfortable expecting the users using an advanced toggle like this (requires system properties and table option) to appreciate the difference between {{uncheckedTombstoneCompaction}} and {{unsafe_aggressive_sstable_expiration}} and to enable both. Any smarts can be added latter on with further user feedback and experience. > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16141425#comment-16141425 ] Romain GERARD commented on CASSANDRA-13418: --- I am at peace with that :) > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16141130#comment-16141130 ] mck commented on CASSANDRA-13418: - {quote}is this bad new ? https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/214/ {quote} looks just like a flakey test to me. The CHANGES.txt entry was in the wrong place. If this is a patch against trunk, then it's to go under 4.0. But a patch would also be nice for 3.11. I'll update this in thelastpickle repo, see links below, and you can let me know if you agree. The commit message has been updated as well, per practice. These patches are then: || branch || testall || dtest || | [cassandra-3.11_13418|https://github.com/criteo-forks/cassandra/tree/mck/cassandra-3.11_13418] | [testall|https://circleci.com/gh/thelastpickle/cassandra/tree/mck%2Fcassandra-3.11_13418] | [dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/219/] | | [trunk_13418|https://circleci.com/gh/thelastpickle/cassandra/tree/mck%2Ftrunk_13418] | [testall|https://circleci.com/gh/thelastpickle/cassandra/22] | [dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/214] | > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16140505#comment-16140505 ] Romain GERARD commented on CASSANDRA-13418: --- is this bad new ? https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/214/ > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16139378#comment-16139378 ] mck commented on CASSANDRA-13418: - [~rgerard]'s patch, again (a35b9e8): || branch || testall || dtest || | [trunk_13418|https://github.com/criteo-forks/cassandra/tree/CASSANDRA-13418] | [testall|https://circleci.com/gh/thelastpickle/cassandra/22] | [dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/214] | > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16138221#comment-16138221 ] Romain GERARD commented on CASSANDRA-13418: --- {quote}I think so yes. Not because it's my opinion, but because that's the general minimalist style of the C* codebase.{quote} Understood, here the update https://github.com/criteo-forks/cassandra/commit/a35b9e818a5294f7fd99588bd407fe909b3402a7 > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16137722#comment-16137722 ] mck commented on CASSANDRA-13418: - {quote}I can remove TWCSCompactionController.getFullyExpiredSSTables(..) if you wish, I don't have any strong opinion about it, just say so{quote} I think so yes. Not because it's my opinion, but because that's the general minimalist style of the C* codebase. {quote}CompactionController:232 any reason not to return an immutable set?{quote} Leave the code as-was in your 0c4d342 commit. Your original rebuttal justifying it to be mutable made sense. {quote}Do you have in mind any test that should be good to have ?{quote} I'll get back to you on this [~rgerard]. But the test you added was a good start. > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16136964#comment-16136964 ] Romain GERARD commented on CASSANDRA-13418: --- Do you have in mind any test that should be good to have ? > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16136652#comment-16136652 ] Romain GERARD commented on CASSANDRA-13418: --- New version here https://github.com/criteo-forks/cassandra/commit/cfabb2ddd31f16ae127d4b22e0c02a1676ba336b * I can remove `TWCSCompactionController.getFullyExpiredSSTables(..)` if you wish, I don't any strong opinion about it {quote}Do we want this? It feels like if we expect to be able to drop entire sstables due to being expired, it would be pretty wasteful to run a single sstable tombstone compaction when there are 20% tombstones in the sstable? We would probably be better off waiting until 100% is expired and drop the entire sstable without compaction?{quote} In my case you are right, activating disableTombstoneCompaction or setting the tombstoneThresold high enough should be better performance wise. My intention when activating the option is to guarantee a consistent behavior for overlapping checks. I wasn't comfortable to ignore overlaps when checking for fully expired sstables but not ignoring it when looking for sstables to compact. > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16136544#comment-16136544 ] Marcus Eriksson commented on CASSANDRA-13418: - bq. I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated Do we want this? It feels like if we expect to be able to drop entire sstables due to being expired, it would be pretty wasteful to run a single sstable tombstone compaction when there are 20% tombstones in the sstable? We would probably be better off waiting until 100% is expired and drop the entire sstable without compaction? > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16135748#comment-16135748 ] mck commented on CASSANDRA-13418: - {quote}And also as the parent function already use a mutable set…{quote} fair enough, i was too lazy to check so far :-) > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16135283#comment-16135283 ] Romain GERARD commented on CASSANDRA-13418: --- Initial review comments: * CHANGES.txt needs a line -> OK * I think so also as it greatly help having a stable behavior when using TWCS for time series * Not at all, it was just to pack things together and to inform the reader that a TWCSCompactionController exist * OK Trivial stuff : * Ok * I don't like to return Immutable collections when only the base type (Set, List, Map,...) is specified as due to the type erasure someone will get burn at runtime with that (due to unchecked exception). And also as the parent function already use a mutable set I sticked with that because returning sometime a mutable set and sometime an immutable set is kind of a leaky abstraction for me (Will check if I can change everything for an ImmutableSet) * OK * OK Will propose an other patch tomorrow. P.S: The patch has been running in production since last Friday without hickups. > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16135028#comment-16135028 ] mck commented on CASSANDRA-13418: - [~rgerard], Initial review comments: - {{CHANGES.txt}} needs a line - is this suitable for 3.11.x as well? (Does it constitute a stability patch? i think so, but that's an opinion) - is the method {{TWCSCompactionController.getFullyExpiredSSTables(..)}} really needed? (CompactionController:170 seems to do what we need already) - {{CompactionController.ignoreOverlaps()}} is a bit of a concept, it deserves some apidoc love. Trivial stuff… - CompactionController:108 missing space in {{if(}}, same on :170 - CompactionController:232 any reason not to return an immutable set? - TWCSCompactionController:29 needs a blank line after - TWCSCompactionController:33 java declaration order. (static fields go before member fields) - TimeWindowCompactionStrategy:70 missing space in {{if(}} > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16134638#comment-16134638 ] mck commented on CASSANDRA-13418: - [~rgerard]'s patch: || branch || testall || dtest || | [trunk_13418|https://github.com/criteo-forks/cassandra/tree/CASSANDRA-13418] | [testall|https://circleci.com/gh/thelastpickle/cassandra/21] | [dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/194] | > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16130152#comment-16130152 ] Romain GERARD commented on CASSANDRA-13418: --- Hi, I am back with a new proposition https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05 Majors differences : + Used [~krummas] way for introducing the ignore Overlaps + I splitted the function that is doing the overlapingChecks as in the previous patch, I was wrongfully checking for overlaps in memtables (even if the option was activated) https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05#diff-e8e282423dcbf34d30a3578c8dec15cdR170 + I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated https://github.com/criteo-forks/cassandra/commit/0c4d342341340115d2c8d15f78b2cb3eab3c2f05#diff-e83635b2fb3079d9b91b039c605c15daR71 It seems a sane default for me, as even if we drop fully expired sstables, we will still check for worth Dropping ones and we want to also ignore overlaps check in this case. + Added a simple test case. I will look to add more (feel free to suggest somes) > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16091218#comment-16091218 ] Romain GERARD commented on CASSANDRA-13418: --- Yes I am still looking forward to test the proposition of [~krummas] and to add unit tests. I am right now in vacation, so I haven't had the time to play around with the code, but it is defintly in my todo list on my return (begin of August) So I haven't forgotten the patch. > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16091082#comment-16091082 ] mck commented on CASSANDRA-13418: - [~rgerard], are you looking to add any unit tests to [~krummas]'s patch? And have you also had any chance to do any manual testing? > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16074910#comment-16074910 ] Romain GERARD commented on CASSANDRA-13418: --- Agreed for the unit tests > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16074593#comment-16074593 ] Marcus Eriksson commented on CASSANDRA-13418: - this needs unit tests before getting committed as well > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16074581#comment-16074581 ] Romain GERARD commented on CASSANDRA-13418: --- Seems better and will try it out. I will keep you updated of the result. Thanks [~krummas] for the direction ! > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16074565#comment-16074565 ] Marcus Eriksson commented on CASSANDRA-13418: - how about something like this? https://github.com/krummas/cassandra/commits/twcs_drop_unsafe (untested) > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16074408#comment-16074408 ] Romain GERARD commented on CASSANDRA-13418: --- Sorry about the bad name :( So here is the current patch we are using in production https://github.com/criteo-forks/cassandra/commit/9424d9d25978e11b34d725a3bdf8a4956a7cbc82 and the branch we are using is this one https://github.com/criteo-forks/cassandra/commits/cassandra-3.11-criteo > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16074381#comment-16074381 ] Marcus Eriksson commented on CASSANDRA-13418: - [~rgerard] you seem to be mentioning the wrong person could you point me to the branch you are working on? > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16074357#comment-16074357 ] Romain GERARD commented on CASSANDRA-13418: --- Gentle bump, [~markerickson-wk] > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16066894#comment-16066894 ] Romain GERARD commented on CASSANDRA-13418: --- Hi back Marcus, So I took into account your comments and regarding your the 1rst one I wanted to do that at first but getFullyExpiredSSTables is also used in CompactionTask https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/CompactionTask.java#L165 So only modifying things at the TWS level would have resulted in compacting the sstables that we wanted to drop, and I was not too incline to touch to CompactionTask. It is also making worthDroppingTombstones ignoring overlaps https://github.com/criteo-forks/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java#L141 and respect the tombstoneThresold specified (We can turn on uncheckedTombstoneCompaction for this one) Regarding the 2nd question I put the code validating the options in TimeWindowCompactionStategyOptions https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategyOptions.java#L157 in order to trigger an exception if the options is used elsewhere than TWCS. https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/src/java/org/apache/cassandra/schema/CompactionParams.java#L161 P.s: I will have more time in the upcoming days, so I will be more responsive. > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16062629#comment-16062629 ] Romain GERARD commented on CASSANDRA-13418: --- Will have a look at it this week [~markerickson-wk] > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057490#comment-16057490 ] Marcus Eriksson commented on CASSANDRA-13418: - Had a (too?) quick read-through Would it be possible to move all (or at least most) logic to the TWCS-classes as that is the only place it is used? Perhaps just adding a boolean to getFullyExpiredSSTables() that states whether we are aggressive or not? Also making the CompactionParams.Option.UNSAFE_AGGRESSIVE_SSTABLE_EXPIRATION a TWCS-only option would make sense right? > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057392#comment-16057392 ] Corentin Chary commented on CASSANDRA-13418: Latest version of the patch works as it should: https://github.com/criteo-forks/cassandra/commit/da4a5c17448dab64aeb4295bb7401afbea9edf51 !twcs-cleanup.png|thumbnail! > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16055289#comment-16055289 ] Romain GERARD commented on CASSANDRA-13418: --- Hello Jonathan, thanks to keep us up to date :) On my side, I have deployed the patch I mentioned earlier and at first glance it is running fine. For now, I lack the time to analyse the new behavior further and more in depth but I will do it in the upcoming weeks. So I will keep the thread informed. > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16041465#comment-16041465 ] Jonathan Owens commented on CASSANDRA-13418: I also should mention that our TWCS compaction window length on this system is 12 hours, but our partitions are 1 month, so partitions overlap at best 60 tables. I think TWCS recommendations are to align compaction window length to partition width, which would result in few-to-no overlapping partitions across tables. We didn't know that at the time. I'm not sure how that interacts here. > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16037380#comment-16037380 ] Jonathan Owens commented on CASSANDRA-13418: We're chasing what may be a gotcha in our implementation of this. We have one cluster that does regular incremental repairs, and is ending up with a whole lot of duplicated data across sstables, we guess due to overstreaming. Explicitly ignoring overlap is awesome for compacting away tombstones, but does nothing to detect duplicate partitions across tables on disk. And in TWCS, because it uses largest-timestamp to bucket, tables with older data in them that was streamed later will never appear in the same compaction operation as the table they "should have" been written in the first time. CASSANDRA-10496 would resolve this eventually by pushing that older data into the correct bucket, but we need a workaround sooner. We're contemplating a few options: * I remember, or imagined, a ticket to try to suss out overlapping sstables and include them in the current compaction operation if found, rather than cancelling the operation. That seems good here, because in TWCS you should not have many overlaps, and if you do they need to be addressed somehow or you end up with duplicates. * We could switch to cassandra-reaper or something similar and do higher-precision repairs to reduce overstreaming, though that's a lot of work to fix what seems really like a compaction artifact. * Reverting the change would put us back in the world where tombstones don't expire due to overlap checks failing, so that's out. * We can write an external tool to detect overlaps and issue user-defined compactions against them, but that seems really yucky. * We could never run incremental repairs and rely only on higher consistency levels on write/read, and let read repair do the work. This fixes the problem only by decreasing the magnitude. I still believe this patch is a good idea, as optimizing for tombstone expiry is essential with TWCS, but the repair interaction here is worth pointing out. > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16011913#comment-16011913 ] Romain GERARD commented on CASSANDRA-13418: --- Any advices to take a next step forward ? > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16000735#comment-16000735 ] Romain GERARD commented on CASSANDRA-13418: --- I am trying things out by merging your ideas [~iksaif] [~jjirsa] [~adejanovski] https://github.com/erebe/cassandra/commit/12f085a53df62361f2fad5c046dc770ff746b417 but I am not sure of what do if one node of the ring has not activated cassadra with -Dcassandra.unsafe.xxx https://github.com/erebe/cassandra/commit/12f085a53df62361f2fad5c046dc770ff746b417#diff-e8e282423dcbf34d30a3578c8dec15cdR101 Let me know if this is not the right direction for you > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15988846#comment-15988846 ] Jeff Jirsa commented on CASSANDRA-13418: I think the path forward is this: Let's create a new compaction option, that ISNT in any of the autocomplete logic, and has a scary name {{unsafe_aggressive_sstable_expiration}} . We could also add a system property {{-Dcassandra.unsafe.aggressive_sstable_expiration}} to make sure nobody ever enables this by mistake and gets unsafe behavior. The same patch should add a section to the documentation explaining exactly why this option is unsafe. Finally, The approach in the patch on this ticket is different from the approach on the fork [~intjonathan] linked - the reviewer (I can do it, or perhaps [~krummas] ) should definitely evaluate which approach is going to be better. > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15988836#comment-15988836 ] Nate McCall commented on CASSANDRA-13418: - I would like to see an append-only optimization as [~iksaif] and others have described: off by default and scary warnings in the docs. This is a common enough use case (particularly in that folks like [~intjonathan] and crew having a patched version deployed already) that we should provide an optimization for it. > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15988490#comment-15988490 ] Corentin Chary commented on CASSANDRA-13418: I agree that fixing CASSANDRA-13418 would be a better solution, but this is likely to take more time, and I'm unsure we will get to the point where it really solve our issues in all the cases. I'll would be inclined to add the option too, with appropriate documentation. > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15987304#comment-15987304 ] Jonathan Owens commented on CASSANDRA-13418: We did exactly this about a year ago for our timeseries use case, and it works great. Made compactions a whole lot more efficient. Here's how we did it: https://github.com/newrelic-forks/cassandra/pull/8 > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15987267#comment-15987267 ] Jeff Jirsa commented on CASSANDRA-13418: I think Marcus' concern is valid, but having run TWCS in production for a long time, I really wish we just had a dangerous-sounding option that defaulted into a safe state that would let append-only users ignore overlaps when they want to drop sstables. Adding code to flush read repaired data to different sstables is a lot more invasive, and would require follow-up changes to TWCS (so as not to try to immediately recompact those sstables with the larger post-window-major large final sstables). We talked about doing this in 9666 (in fact, we committed to it as a condition of merging TWCS), and I think it's probably a perfectly reasonable thing to do, but it's a lot more effort than simply telling cassandra "this table has no deletes, we don't care about overlaps". Maybe the right thing is to get 9779 done so we can block deletes, and then this is a much-less-scary option? > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15986853#comment-15986853 ] Jon Haddad commented on CASSANDRA-13418: I get the desire to guard users to not shoot themselves in the foot, but this is a pretty reasonable approach to solving the problem. The tradeoff is a bit of risk. There's been a lot of proposed features like these that are in the "great if you know what you're doing but crazy dangerous otherwise" category. I feel like these features should be thoroughly documented and guarded behind a config flag that's disabled by default. > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15984600#comment-15984600 ] Corentin Chary commented on CASSANDRA-13418: [~krummas]: good point, CASSANDRA-10496 seems to come with its own set of issues: number of sstables would probably get huge, except if you add some kind of "buffering" like what is done for the first window. I'll see if I can find a reasonable solution for TWCS or propose it in the related ticket. If can't agree on a good solution, we can fallback to what is proposed here. [~adejanovski]: about skipping getOverlappingSSTables() completely, I though about that too, but I think it's used in some other place and I wasn't sure what the result would be. > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15984564#comment-15984564 ] Marcus Eriksson commented on CASSANDRA-13418: - I think we should instead fix the actual problem, that old data (read-repaired for example) can get flushed together with new data (maybe CASSANDRA-10496) > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15984514#comment-15984514 ] Alexander Dejanovski commented on CASSANDRA-13418: -- [~iksaif], I have a similar patch waiting to be tested on my laptop actually :) The naming of the option in my version was deliberately scary so that people would (hopefully) give it a good thinking before using it : unsafe_expired_sstable_deletion I fully agree that this option (whatever the name) should be available for TWCS (and why not DTCS) because the typical use case is to use TTL as a deletion mechanism and not explicit DELETE statements, which should be done in some rare cases only. If a cluster is a bit unhealthy for whatever reason, it is painful to see read repair forcing tens of GB of data to stay on disk because of timestamp overlaps. The only possible zombie data in a correct TWCS use case (all data is written with TTLs) would be if the tombstone and the data it shadows are written in the same time window (and of course the data is missing on one node). If the data and the tombstone live in different buckets, we'll be in the following scenario : - data is written in bucket 1 with a TTL but the write fails on one node - the tombstone is written in bucket 2 on all nodes : data and tombstone will then never be compacted together since they live in different buckets. - In bucket 3 there is a read repair that replicates the data on the node that missed it, which should have been in bucket 1. It's written with the same timestamp/TTL and will expire at the same time than all other nodes, even if the tombstone is collected before (which won't happen until TTL expires). If the tombstone and the data it shadows live in the same bucket, and the TTL is longer than gc_grace_seconds, then it's indeed possible to have reappearing data, but even then I'm not sure it could happen : During the bucket's major compaction, the data and tombstone would most likely be merged and only the tombstone would survive, preventing the possibility of having a subsequent read repair to replicate the data in the next time windows. [~jjirsa] [~krummas] : I may be wrong here in the way compaction actually merges tombstones and data before gc_grace_seconds, so please correct me if necessary. IMHO it is worth enduring a slight chance of reappearing data in a TTL workload, by choice, in order to allow optimal space savings. After looking at your patch, it could be interesting performance wise to fully skip calling getOverlappingSSTables() in order to avoid searching and storing overlaps only to void them afterwards, by modifying this line instead : https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java#L107 To sum up : big +1, it'll help ops that try to fight with low disk space and don't understand why expired SSTables don't get deleted. > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. >
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15984354#comment-15984354 ] Corentin Chary commented on CASSANDRA-13418: Trying to go forward, [~jjirsa], [~adejanovski], [~krummas]: what is your opinion on adding a custom option to TWCS and DTCS only doing basically what my current patch does ? The only drawback that I see if a fully expired overlapping table is removed is that read-repaired data that was explicitely deleted could eventually re-appear. If you're aware of more dangerous situations I'd be glad to hear about it. > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15974996#comment-15974996 ] Jeff Jirsa commented on CASSANDRA-13418: [~iksaif] - this isn't a review, but at first glance I'm not in love with the idea of extending that config option in that way - it wasn't meant for that purpose, though it's sort of tangential (it was really meant for the specific task of grouping sstables for the cleanup compaction, and this isn't the cleanup compaction). There's also the bigger question of whether or not we really want to expose this to users. It's dangerous. I really really wanted something like this at my last employer, but the "this can be dangerous" factor prevented me from writing it. > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15974024#comment-15974024 ] Corentin Chary commented on CASSANDRA-13418: [~rgerard]: No it should certainly not be the default. If you look at the description of our usecase, it's only necessary when you have short-lived data with a lot of cells, which make running periodic repairs impossible or very impractical and when you also need/want read-repairs because you can't afford QUORUM reads (datacenters on separate continents and low latency requirements). So there is a need for it, but it should not be the default. [~jjirsa], any opinion ? > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15973651#comment-15973651 ] Romain GERARD commented on CASSANDRA-13418: --- I haven't had time to check my own questions yet (I will this week) so I am holding my judgment for now. My main concern is 'yet an other option ?' I need to convince myself first if we can't make it a default for TWCS ? Why ? As the need is tightly coupled to TWCS can't we bundle the option more closely to it, ... I will definitively spend some time reading Cassandra this week, > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15972898#comment-15972898 ] Corentin Chary commented on CASSANDRA-13418: Here is an attempt at a patch: https://github.com/iksaif/cassandra/tree/CASSANDRA-13005-trunk Works with: {code} ALTER TABLE test.test WITH compaction = {'class': 'TimeWindowCompactionStrategy', 'provide_overlapping_tombstones': 'ignore_overlaps'}; {code} This outputs: {code} WARN [CompactionExecutor:4] 2017-04-18 17:17:00,538 CompactionController.java:96 - You are running with overlapping sstable sanity checks for tombstones disabled on test:test,this can lead to inconsistencies when running explicit deletions. {code} I'm still not sure about reusing the existing option, I could be conviced overwise (but it should not be hard to change). Once we agree on that I can add documentation and unit tests. > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15962212#comment-15962212 ] Corentin Chary commented on CASSANDRA-13418: If I understand things correctly here, the worst that can happen is that data could re-appear. Remember that we just drop SSTables were *all* the items have expired. (The worst that can happen if you don't have the option is that you suddently stop dropping SSTables and all your disk full up.) > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15962188#comment-15962188 ] Marcus Eriksson commented on CASSANDRA-13418: - I'll think more about this, but generally adding an option to unsafely drop sstables feels wrong. Someone is going to misunderstand it and lose legit data. > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15962183#comment-15962183 ] Corentin Chary commented on CASSANDRA-13418: AFAIK provide_overlapping_tombstones is a compaction property that we already have. I'm suggesting to add "ignore" on top of the existing "none" (default), "cell" and "row" values. > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15961710#comment-15961710 ] Jeff Jirsa commented on CASSANDRA-13418: {quote} What do you think about provide_overlapping_tombstones = "ignore" ? This is a little would integrate nicely with the code and does not add yet another compaction option (but sounds a little weird). {quote} As a table property instead of as a compaction property? It feels like a compaction property to me, [~krummas] do you have any suggestions on how you feel something like this should be done (or, if it should be done at all)? > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15961662#comment-15961662 ] Jeff Jirsa commented on CASSANDRA-13418: unchecked_tombstone_compaction and tombstone_compaction_interval help here, but it's still not necessary - you may be able to promise/guarantee that all writes are appends (not overwrites or deletes), in which case there's no reason to recompact all the sstables to incrementally remove garbage, when the data truly is expired and obsolete. > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary > Labels: twcs > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.3.15#6346)