[jira] [Commented] (CASSANDRA-13885) Allow to run full repairs in 3.0 without additional cost of anti-compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190016#comment-17190016 ] Sagar Linge commented on CASSANDRA-13885:
What solution was implemented for this issue? I am facing the same issue in Cassandra 3.11.
> Allow to run full repairs in 3.0 without additional cost of anti-compaction
> ---
>
> Key: CASSANDRA-13885
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13885
> Project: Cassandra
> Issue Type: Bug
> Reporter: Thomas Steinmaurer
> Priority: Normal
>
> This ticket is basically the result of the discussion in Cassandra user list:
> https://www.mail-archive.com/user@cassandra.apache.org/msg53562.html
> I was asked to open a ticket by Paulo Motta to think about back-porting
> running full repairs without the additional cost of anti-compaction.
> Basically there is no way in 3.0 to run full repairs from several nodes
> concurrently without troubles caused by (overlapping?) anti-compactions.
> Coming from 2.1 this is a major change from an operational POV, basically
> breaking any e.g. cron job based solution kicking off -pr based repairs on
> several nodes concurrently.
--
This message was sent by Atlassian Jira (v8.3.4#803005)
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13885) Allow to run full repairs in 3.0 without additional cost of anti-compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16207842#comment-16207842 ] Blake Eggleston commented on CASSANDRA-13885:
Both are problems. There's also the issue of what to do with these split data sets after the patch release. If a user is only doing full repairs, they'll have a growing unrepaired data set that will never be compacted with the repaired data set. In any case, making changes like this to repair behavior isn't appropriate for a 3.0 patch release.
[jira] [Commented] (CASSANDRA-13885) Allow to run full repairs in 3.0 without additional cost of anti-compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16206984#comment-16206984 ] Kurt Greaves commented on CASSANDRA-13885:
OK, so you literally meant that it would be complex (code-wise) to skip anti-compaction? Or are you referring to the complexities Stefan mentioned, i.e., that mixing incremental with full repairs would cause problems?
[jira] [Commented] (CASSANDRA-13885) Allow to run full repairs in 3.0 without additional cost of anti-compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16206972#comment-16206972 ] Blake Eggleston commented on CASSANDRA-13885:
[~KurtG] from my previous comment:
{quote}
fixing this would mean some non-trivial changes to repair behavior which have the potential to affect correctness
{quote}
In other words, the changes needed to add support for this flag would be invasive enough that there would be a real risk of breaking other, more critical things.
[jira] [Commented] (CASSANDRA-13885) Allow to run full repairs in 3.0 without additional cost of anti-compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16206922#comment-16206922 ] Kurt Greaves commented on CASSANDRA-13885:
I don't understand why providing a flag to skip anti-compaction for full repairs is such a big deal. For anyone using vnodes and full repairs, this is going to make it much more difficult to repair at the same rate as before, as they'll only be able to run one repair per table at a time.
[jira] [Commented] (CASSANDRA-13885) Allow to run full repairs in 3.0 without additional cost of anti-compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16174313#comment-16174313 ] Marcus Eriksson commented on CASSANDRA-13885:
You can run repair with an explicit token range ({{-st}}/{{-et}}) to avoid anticompaction in 3.0.
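A minimal sketch of how a cron job could script this suggestion, assuming Murmur3Partitioner (tokens from -2^63 to 2^63-1) and that you already know the primary ranges of the node, for example from `nodetool describering` output parsed separately. The helper names `split_range` and `repair_commands` and the example range are hypothetical. The sketch only builds `nodetool repair -full -st ... -et ...` command lines; it does not run them. Ranges are treated as start-exclusive and end-inclusive, a wrapping range is cut at the ends of the ring, and each subrange stays inside one of the node's own ranges rather than splitting the whole ring.

```python
MIN_TOKEN = -(2**63)   # Murmur3Partitioner minimum token
MAX_TOKEN = 2**63 - 1  # Murmur3Partitioner maximum token


def split_range(start, end, parts):
    """Split the token range (start, end] into up to `parts` contiguous subranges.

    A wrapping range (start >= end) is cut at the ends of the ring first, so it
    may yield up to 2 * parts subranges.
    """
    if start >= end:
        pieces = []
        if start < MAX_TOKEN:
            pieces += split_range(start, MAX_TOKEN, parts)
        if end > MIN_TOKEN:
            pieces += split_range(MIN_TOKEN, end, parts)
        return pieces
    width = end - start
    parts = max(1, min(parts, width))  # never produce empty subranges
    bounds = [start + width * i // parts for i in range(parts + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def repair_commands(ranges, keyspace, tables, parts=4):
    """Build (but do not run) one full subrange repair command per subrange."""
    commands = []
    for start, end in ranges:
        for sub_start, sub_end in split_range(start, end, parts):
            commands.append(
                f"nodetool repair -full -st {sub_start} -et {sub_end} "
                f"{keyspace} {' '.join(tables)}"
            )
    return commands


if __name__ == "__main__":
    # Hypothetical primary range of one node; in practice, take it from the ring.
    for cmd in repair_commands([(0, 3074457345618258602)], "mykeyspace",
                               ["mycf1", "mycf2"], parts=2):
        print(cmd)
```

Splitting each owned range into several smaller subranges keeps individual repair sessions short, and running them one after another on a node avoids kicking off many overlapping sessions at once.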
[jira] [Commented] (CASSANDRA-13885) Allow to run full repairs in 3.0 without additional cost of anti-compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173655#comment-16173655 ] Blake Eggleston commented on CASSANDRA-13885:
I agree that this behavior is weird, and that it has some negative operational implications. However, fixing this would mean some non-trivial changes to repair behavior which have the potential to affect correctness. I'd lean pretty strongly towards not fixing this one.
[jira] [Commented] (CASSANDRA-13885) Allow to run full repairs in 3.0 without additional cost of anti-compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173049#comment-16173049 ] Thomas Steinmaurer commented on CASSANDRA-13885:
It is about easing the operational side. 2.2+ is a major shift: it behaves differently and is much more complex when I simply want to run a full repair across my 9-node cluster on 2 small-volume CFs on a daily basis (grace period = 72hr). With 2.1 I was used to doing this by running the following, kicked off in parallel on all nodes:
{code}
nodetool repair -pr mykeyspace mycf1 mycf2
{code}
OK, I learned that incremental repair is the default since 2.2, so I need to additionally pass the -full option. Not a big deal, but when running the following with 3.0.14, again kicked off in parallel on all nodes:
{code}
nodetool repair -full -pr mykeyspace mycf1 mycf2
{code}
I start to see basically the following nodetool output:
{code}
...
[2017-09-20 11:34:49,968] Some repair failed
[2017-09-20 11:34:49,968] Repair command #8 finished in 0 seconds
error: Repair job has failed with the error message: [2017-09-20 11:34:49,968] Some repair failed
-- StackTrace --
java.lang.RuntimeException: Repair job has failed with the error message: [2017-09-20 11:34:49,968] Some repair failed
    at org.apache.cassandra.tools.RepairRunner.progress(RepairRunner.java:115)
    at org.apache.cassandra.utils.progress.jmx.JMXNotificationProgressListener.handleNotification(JMXNotificationProgressListener.java:77)
    at com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.dispatchNotification(ClientNotifForwarder.java:583)
    at com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.doRun(ClientNotifForwarder.java:533)
    at com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.run(ClientNotifForwarder.java:452)
    at com.sun.jmx.remote.internal.ClientNotifForwarder$LinearExecutor$1.run(ClientNotifForwarder.java:108)
{code}
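A toy model of why concurrent `-pr` repairs collide, under stated assumptions: SimpleStrategy-style placement (the owner of a token plus the next RF-1 distinct nodes clockwise), random tokens, and RF=3, which the ticket does not state. The function names are hypothetical, and the model covers replica overlap only, not Cassandra's actual SSTable bookkeeping. For each node it counts how many nodes' `-pr` sessions it would take part in if all nine nodes start at the same time; in 3.0, each of those full repair sessions is followed by anti-compaction on the participating replicas.

```python
import random


def build_ring(num_nodes, num_tokens, seed=0):
    """Toy token ring: `num_tokens` random Murmur3-range tokens per node, sorted."""
    rng = random.Random(seed)
    ring = [
        (rng.randint(-(2**63) + 1, 2**63 - 1), node)
        for node in range(num_nodes)
        for _ in range(num_tokens)
    ]
    ring.sort()
    return ring


def replicas(ring, index, rf):
    """SimpleStrategy-style placement (assumption): the node owning ring[index]
    plus the next rf - 1 distinct nodes found walking the ring clockwise."""
    nodes = []
    i = index
    while len(nodes) < rf:
        node = ring[i % len(ring)][1]
        if node not in nodes:
            nodes.append(node)
        i += 1
    return nodes


def pr_participation(num_nodes, num_tokens, rf, seed=0):
    """For each node, count the distinct nodes whose `-pr` repair it takes part
    in when every node starts `nodetool repair -pr` at the same time."""
    if rf > num_nodes:
        raise ValueError("rf cannot exceed the number of nodes")
    ring = build_ring(num_nodes, num_tokens, seed)
    initiators = {n: set() for n in range(num_nodes)}
    for index, (_, owner) in enumerate(ring):
        # The range ending at ring[index] is one of `owner`'s primary ranges,
        # so `owner`'s -pr session repairs it on all of its replicas.
        for replica in replicas(ring, index, rf):
            initiators[replica].add(owner)
    return {n: len(s) for n, s in initiators.items()}


if __name__ == "__main__":
    print("single token:", pr_participation(num_nodes=9, num_tokens=1, rf=3))
    print("256 vnodes:  ", pr_participation(num_nodes=9, num_tokens=256, rf=3))
```

With one token per node, each node takes part in exactly RF concurrent sessions. With vnodes (for example `num_tokens=256`), a node typically takes part in every node's session, which is consistent with Kurt's point that vnode users are effectively limited to one repair per table at a time.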
[jira] [Commented] (CASSANDRA-13885) Allow to run full repairs in 3.0 without additional cost of anti-compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173034#comment-16173034 ] Stefan Podkowinski commented on CASSANDRA-13885:
It's always a potential problem before CASSANDRA-9143, yes. But since only unrepaired data is affected, running incremental repairs often enough before gc_grace will minimize the chance that an sstable is skipped from anti-compaction and remains in the unrepaired set afterwards. That's what incremental repairs are designed for anyway: to be run regularly on new data. The important thing is that, in the end, all data must be successfully promoted to the repaired set before gc_grace.
Why is that important? Because after gc_grace, deleted data may be compacted away on replicas. But this will not happen if the tombstone and the corresponding data are in different sets (repaired/unrepaired), as those are never compacted together. Also remember that incremental repair only validates sstables in the unrepaired set. As a consequence, after the next incremental repair, the data from the unrepaired set (but not the tombstone from the repaired set) will be transferred to the other replicas, where the data had already been compacted away.
So how would this situation change if we did not run anti-compaction (promotion to repaired) after full repairs at all? In that case we'd just let the unrepaired set grow, which should not be a problem on its own. But the operator would be responsible for scheduling incremental repairs often enough to make sure promotion happens before gc_grace, to avoid the potential data inconsistency issues described above. The only other way to avoid these would be to not run incremental repairs at all anymore, which would be fine, too.
So yes, I guess we could agree in this ticket on the situations in which it would be acceptable to run full repairs with a --skip-anticompaction flag, but I'd also like to hear how to communicate the correct scheduling to users without just handing them a loaded gun. Currently you can't go wrong by mixing full and incremental repairs (as far as I can tell), and we can get away with telling people to run any kind of repair at least once before gc_grace, e.g. weekly incremental repairs with every n-th one being a full repair.
Exclusively running full repairs, even with the anti-compaction at the end, is by the way not as broken as you may think. In that situation you simply don't care about the unrepaired set. The anti-compaction at the end of the repair is a waste, yes, but it's not so bad performance-wise, as we only have to anti-compact new unrepaired data written since the last repair. Not being able to perform parallel -pr repairs is an unfortunate side effect of this, but I'd still recommend avoiding -pr in parallel and falling back to range-based repairs if the cluster size doesn't allow this.
Doing subrange repairs would actually cause the same problems as -pr, but with CASSANDRA-10422 it was decided to skip anti-compaction for them, so all the caveats described above apply there, although I'd not expect users to mix subrange repairs with incremental repairs.
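To make the scheduling rule concrete, here is a small sketch under stated assumptions: one repair per week, every n-th run replaced by a full repair, and Cassandra's default gc_grace_seconds of 864000 (10 days). The function names are hypothetical, and repair duration is ignored. It compares two gaps: the longest gap between repairs of any kind, which is what matters while full repairs still anti-compact and promote data, and the longest gap between incremental repairs alone, which is what would matter if full repairs skipped anti-compaction.

```python
from datetime import timedelta

DEFAULT_GC_GRACE = timedelta(seconds=864000)  # default gc_grace_seconds (10 days)


def weekly_schedule(weeks, full_every):
    """Hypothetical schedule: one repair per week, every `full_every`-th one is full."""
    return [
        (timedelta(weeks=w), "full" if (w + 1) % full_every == 0 else "incremental")
        for w in range(weeks)
    ]


def longest_gap(schedule, kinds):
    """Longest interval between consecutive repairs whose kind is in `kinds`."""
    times = sorted(t for t, kind in schedule if kind in kinds)
    if len(times) < 2:
        raise ValueError("need at least two matching repairs to measure a gap")
    return max(later - earlier for earlier, later in zip(times, times[1:]))


def check_schedule(schedule, gc_grace=DEFAULT_GC_GRACE):
    return {
        # Rule while full repairs still anti-compact: any repair promotes data.
        "any_repair_within_gc_grace":
            longest_gap(schedule, {"full", "incremental"}) < gc_grace,
        # Rule if full repairs skipped anti-compaction: only incremental runs promote.
        "incremental_within_gc_grace":
            longest_gap(schedule, {"incremental"}) < gc_grace,
    }


print(check_schedule(weekly_schedule(weeks=12, full_every=4)))
```

With every fourth weekly run being full, repairs of any kind are never more than a week apart, but incremental runs are up to two weeks apart, which exceeds the 10-day default. That schedule is safe with today's behavior but not with a hypothetical --skip-anticompaction flag.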
[jira] [Commented] (CASSANDRA-13885) Allow to run full repairs in 3.0 without additional cost of anti-compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16172945#comment-16172945 ] Paulo Motta commented on CASSANDRA-13885:
bq. What we need to avoid here is to end up with a tombstone in the repaired set and the corresponding data in unrepaired.
Given that anti-compaction is non-deterministic on 3.0 due to CASSANDRA-9143, you can't guarantee that both the data and the tombstone will be marked as repaired after incremental repair, so this will always be a potential problem whether or not you run anti-compaction after full repairs. I don't see how running anti-compaction after full repairs can improve this, since it's still subject to the same limitations. Since I might be missing some edge case here, would you mind giving an example where skipping anti-compaction after full repair could be a problem when mixing with incremental repairs?
bq. Or at least make sure that incremental repairs - if run at all - will be run at least once before gc_grace.
This is a basic requirement of repair, so if you don't do that you're basically accepting the risk of data resurrection, whether or not anti-compaction is run after full repairs.
bq. Really -1 on any changes to fundamental repair assumptions and paradigms in 3.0, if not for really critical bug fixing
I'd agree with that if we had reliable incremental repairs, which is not the case on 3.0, and we only became fully aware of their limitations quite late in the 3.0 line. But some users are just starting to adopt 3.0, so it's fair to give them an option to stick with non-incremental repairs if they prefer to for operational reasons. Perhaps we could just add a {{--skip-anticompaction}} flag which can be used together with {{--full}} to skip anti-compactions?
[jira] [Commented] (CASSANDRA-13885) Allow to run full repairs in 3.0 without additional cost of anti-compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16172893#comment-16172893 ] Stefan Podkowinski commented on CASSANDRA-13885:
If you really want to stop doing anti-compaction for full repairs, you'd also have to prevent users from running both full and incremental repairs in their repair schedules. Or at least make sure that incremental repairs - if run at all - will be run at least once before gc_grace. What we need to avoid here is to end up with a tombstone in the repaired set and the corresponding data in unrepaired. Assuming gc_grace has passed and both have already been compacted away on the other replicas, running an incremental repair would bring the data back as zombies on those replicas, because incremental repair only works on the unrepaired set, while the local tombstone is in the repaired set and thus won't be transferred or considered during Merkle tree (MT) creation.
Really -1 on any changes to fundamental repair assumptions and paradigms in 3.0, if not for really critical bug fixing
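A toy model of the resurrection scenario described above, under simplifying assumptions that are not Cassandra's actual code. Each replica keeps a repaired and an unrepaired set that are compacted separately. An expired tombstone is purged only if no older data for the same key remains in the other set, which is a crude stand-in for Cassandra's overlap check. Incremental repair compares and streams unrepaired cells only. All names are hypothetical, and timestamps are in hours with an illustrative 10-day gc_grace. Replica A starts with the tombstone in its repaired set and the data in its unrepaired set. B and C hold both cells in their unrepaired sets.

```python
GC_GRACE = 240  # hours; illustrative, matches the 10-day default gc_grace_seconds


def newest_per_key(cells):
    """Last-write-wins merge: keep the cell with the highest timestamp per key."""
    newest = {}
    for key, kind, ts in cells:
        if key not in newest or ts > newest[key][2]:
            newest[key] = (key, kind, ts)
    return newest


def compact(cells, other_set, now):
    """Compact one set in isolation (repaired and unrepaired never mix)."""
    kept = []
    for key, kind, ts in newest_per_key(cells).values():
        expired = kind == "tombstone" and now - ts > GC_GRACE
        # Crude overlap check: older data for this key in the other set keeps
        # the tombstone alive, because this compaction cannot drop that data.
        older_elsewhere = any(k == key and t < ts for k, _, t in other_set)
        if expired and not older_elsewhere:
            continue  # purge; data it shadowed in this set was dropped by the merge
        kept.append((key, kind, ts))
    return kept


def compact_replica(replica, now):
    repaired, unrepaired = replica["repaired"], replica["unrepaired"]
    replica["repaired"] = compact(repaired, unrepaired, now)
    replica["unrepaired"] = compact(unrepaired, repaired, now)


def incremental_repair(replicas):
    """Compare and stream unrepaired cells only; repaired sets are ignored."""
    merged = newest_per_key([c for r in replicas for c in r["unrepaired"]])
    for r in replicas:
        local = newest_per_key(r["unrepaired"])
        for key, cell in merged.items():
            if key not in local or local[key][2] < cell[2]:
                r["unrepaired"].append(cell)


def read(replica, key):
    cell = newest_per_key(replica["repaired"] + replica["unrepaired"]).get(key)
    return None if cell is None or cell[1] == "tombstone" else cell


data, tomb = ("k", "data", 0), ("k", "tombstone", 1)
a = {"repaired": [tomb], "unrepaired": [data]}  # split across the two sets
b = {"repaired": [], "unrepaired": [data, tomb]}
c = {"repaired": [], "unrepaired": [data, tomb]}
before = [read(r, "k") for r in (a, b, c)]

now = tomb[2] + GC_GRACE + 1  # gc_grace has passed since the delete
for r in (a, b, c):
    compact_replica(r, now)
incremental_repair([a, b, c])
after = [read(r, "k") for r in (a, b, c)]
print(before, after)
```

Before compaction, all three replicas hide the deleted row. Afterwards, B and C have purged both cells, the incremental repair streams A's unrepaired data back to them, and they serve the deleted row again. A still hides it because its tombstone survives in the repaired set.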
[jira] [Commented] (CASSANDRA-13885) Allow to run full repairs in 3.0 without additional cost of anti-compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16171831#comment-16171831 ] Jeff Jirsa commented on CASSANDRA-13885:
cc [~krummas] and [~bdeggleston] for visibility. It'll be far less invasive to remove (parts of?) CASSANDRA-7586 than it would be to backport CASSANDRA-9143 and the ~10 or so follow-up patches [~bdeggleston] has done to make incremental repair viable in trunk. Neither of these options feels very appropriate for 3.0 though, if I'm being honest.