[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17049772#comment-17049772 ] Nihar Dudam commented on CASSANDRA-10726: - Thanks Jeff and Jeremiah. Seems like we are facing this issue only on 2 tables where reads/writes/deletes are very heavy (for every 5 min). We will surely look at performance issues and ask for help in the slack or users mailing list. > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Coordination >Reporter: Richard Low >Assignee: Blake Eggleston >Priority: Normal > Fix For: 4.0 > > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17049757#comment-17049757 ] Jeremiah Jordan commented on CASSANDRA-10726: - Not saying this patch isn’t helpful. It is very helpful to improve things if things are failing. But this patch is not going to help you if the steady state of your cluster is that the blocking read repairs cause read timeouts. > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Coordination >Reporter: Richard Low >Assignee: Blake Eggleston >Priority: Normal > Fix For: 4.0 > > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17049745#comment-17049745 ] Jeff Jirsa commented on CASSANDRA-10726: [~jjordan]: bq. In 3.11 you should set both read_repair_chance and dc_local_read_repair_chance to 0 on all your tables. That's not what this is about - this is in foreground read repair where one of the queried replicas doesn't match. bq. If such things are common on your cluster, then you will want to investigate the performance issues further and try to resolve them. This change means your application will no longer be trying to wait for the replicas to be up to date before the read returns It helps when disks are failing, and it helps when GC is high. Both of those things happen in the real world, and this patch turns a failing machine into a non-issue. That said - backporting this patch is almost certainly not going to happen. It's a large patch in a critical piece of code and it's not suitable for a stable branch like 3.11. It's a meaningful, important patch, but probably safest to wait for 3.11 > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Coordination >Reporter: Richard Low >Assignee: Blake Eggleston >Priority: Normal > Fix For: 4.0 > > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17049591#comment-17049591 ] Nihar Dudam commented on CASSANDRA-10726: - @Jeremiah Thanks for the input. Replicas are getting updated and read timeouts are gone after some time. So backporting this fix to 3.11 will help us to get rid of reading timeouts if a single replica is not up-to-date I guess. > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Coordination >Reporter: Richard Low >Assignee: Blake Eggleston >Priority: Normal > Fix For: 4.0 > > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17049553#comment-17049553 ] Jeremiah Jordan commented on CASSANDRA-10726: - In 3.11 you should set both read_repair_chance and dc_local_read_repair_chance to 0 on all your tables. bq. Our application scale is increasing day by day and replicas are not up-to-date when required. Not sure what you expect this change to fix there. If such things are common on your cluster, then you will want to investigate the performance issues further and try to resolve them. This change means your application will no longer be trying to wait for the replicas to be up to date before the read returns. As per the description of this ticket, this is helpful if some "bad thing" is happening to make writes fail, and you don't want that to interfere with reads. So it will help with transient issues. If this is a common issue in your cluster, then you have some work to do. You might try the #cassandra slack channel, the users mailing list, or some consulting to help you with the persistent issue. > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Coordination >Reporter: Richard Low >Assignee: Blake Eggleston >Priority: Normal > Fix For: 4.0 > > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17049547#comment-17049547 ] Nihar Dudam commented on CASSANDRA-10726: - We are also very much interested in 3.11 backport. We are facing this issue very frequently in our application. Is there any workaround? Our application scale is increasing day by day and replicas are not up-to-date when required. I see Digest mismatch WARN almost every minute now. > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Coordination >Reporter: Richard Low >Assignee: Blake Eggleston >Priority: Normal > Fix For: 4.0 > > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044701#comment-17044701 ] Jerome BAROTIN commented on CASSANDRA-10726: [~isaacreath] Nice work ! We are indeed very interested by a 3.11 backport. Hope you'll can do that. > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Coordination >Reporter: Richard Low >Assignee: Blake Eggleston >Priority: Normal > Fix For: 4.0 > > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974101#comment-16974101 ] Sanjit Mohanty commented on CASSANDRA-10726: [~isaacreath]: We are using C* version 3.11.4 & quite frequently encountering this situation. It is quite critical for us to get this fixed. Please let me know if you would need any support. I can as well contribute in the back port. > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Coordination >Reporter: Richard Low >Assignee: Blake Eggleston >Priority: Normal > Fix For: 4.0 > > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966946#comment-16966946 ] Federico commented on CASSANDRA-10726: -- [~isaacreath] would you? I've encountered this problem in my application and Cassandra 4 is so far away that I think many teams would benefit from this. > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Coordination >Reporter: Richard Low >Assignee: Blake Eggleston >Priority: Normal > Fix For: 4.0 > > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16824557#comment-16824557 ] Isaac Reath commented on CASSANDRA-10726: - I've had an issue pop up a couple of times where blocking read repair failing caused reads to time out where [https://github.com/apache/cassandra/pull/94] would have solved the issue. I'd be more than happy to write a 3.11 backport of the speculative retry logic if there was any interest in the community in having one. > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Coordination >Reporter: Richard Low >Assignee: Blake Eggleston >Priority: Normal > Fix For: 4.0 > > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804130#comment-16804130 ] Blake Eggleston commented on CASSANDRA-10726: - 4.0 only I'm afraid. It's built on a refactor that's too invasive to backport to 3.11 > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Coordination >Reporter: Richard Low >Assignee: Blake Eggleston >Priority: Normal > Fix For: 4.0 > > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803747#comment-16803747 ] Mikael Svensson commented on CASSANDRA-10726: - Will this be backported to 3.11? I run in to the RTE a lot with the problem described with this issue https://issues.apache.org/jira/browse/CASSANDRA-14480 > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Coordination >Reporter: Richard Low >Assignee: Blake Eggleston >Priority: Normal > Fix For: 4.0 > > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589495#comment-16589495 ] Kurt Greaves commented on CASSANDRA-10726: -- Great, thanks Blake. I just needed a quick fix to get my tests running :). > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Richard Low >Assignee: Blake Eggleston >Priority: Major > Fix For: 4.0 > > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589010#comment-16589010 ] Blake Eggleston commented on CASSANDRA-10726: - Thanks Kurt. Opening the response in a try block like that would close the response too early though. I fixed it without auto-closing it [here|https://github.com/apache/cassandra/commit/7082b64fc9a9a14c9f9cb9a492c0b25d109f4edc]. I also noticed there's at least one place {{StorageProxy#query}} can could throw an exception, so I also added a catch block to close any previously opened responses. > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Richard Low >Assignee: Blake Eggleston >Priority: Major > Fix For: 4.0 > > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588731#comment-16588731 ] Kurt Greaves commented on CASSANDRA-10726: -- [~bdeggleston], looks like this [broke eclipse-warnings|https://circleci.com/gh/kgreav/cassandra/204] because we don't close the SingleRangeResponse in {{sendNextRequests}}. I fixed it with [this change|https://github.com/apache/cassandra/commit/75f9bcb40bd98f4c0c364f9d26d46be88a1f107f]. > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Richard Low >Assignee: Blake Eggleston >Priority: Major > Fix For: 4.0 > > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16587730#comment-16587730 ] Marcus Eriksson commented on CASSANDRA-10726: - this lgtm, +1 > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Richard Low >Assignee: Blake Eggleston >Priority: Major > Fix For: 4.x > > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16585612#comment-16585612 ] Alex Petrov commented on CASSANDRA-10726: - Thank you! A couple more nits: * [here|https://github.com/bdeggleston/cassandra/commit/7b753828e38bb92884219867e10d01e4f4184da6#diff-b677a5a6a3f1a90a889bcf906c1f8001R56] here switch to {{ConcurrentMap}} seems to be unnecessary. If it's for explicit contract, I don't mind to leave it. * [here|https://github.com/bdeggleston/cassandra/commit/7b753828e38bb92884219867e10d01e4f4184da6#diff-0246c72855070863c2fdbee6d97f494dR156] we break unconditionally, so we might want to remove loop. I realise that the collection might be empty though, in which case this might be justified. * [here|https://github.com/bdeggleston/cassandra/commit/7b753828e38bb92884219867e10d01e4f4184da6#diff-0246c72855070863c2fdbee6d97f494dR170] I'm not 100% on board with rename as we're delegating the call to {{repair.maybeSendAdditionalRepairs}}, I think renaming {{awaitReads}} was enough. Or, if writes here are meant in the context of repair anyways, we should rename delegated method. Whichever way you might prefer. > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Richard Low >Assignee: Blake Eggleston >Priority: Major > Fix For: 4.x > > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16581234#comment-16581234 ] Blake Eggleston commented on CASSANDRA-10726: - Review changes pushed to branch with a few exceptions. [~krummas], for {{BlockingPartitionRepair#isLocal}}and {{BlockingPartitionRepair#sendRR}}, these methods are both used in {{ReadRepairTestInstrumentedReadRepairHandler}} for testing. [~ifesdjeen]: bq. We might want to simplify the code a little by not caching versions here. I'd really prefer to leave as is if it's ok with you. It's not that complicated, and we avoid re-calculating mutation size for every endpoint we send a repair mutation to, which could be fairly regularly for higher rf keyspaces. bq. Partition range code path is also affected (since StorageProxy#fetchRows is changed). It'd be great to have dtests for partition ranges as well. fetchRows only processes SinglePartitionReadCommands (although it can process multiple from {{SELECT WHERE IN (...)}} queries). Partition range reads are handled elsewhere, and don't support speculation in the normal or read repair path. They are affected by the read repair refactor, but are already tested with existing dtests. > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Richard Low >Assignee: Blake Eggleston >Priority: Major > Fix For: 4.x > > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16574643#comment-16574643 ] Alex Petrov commented on CASSANDRA-10726: - It seems that there still might be a problem with {{Accumulator}} during {{BlockingReadRepair}}. {{DataResolver}} is created in {{BlockingReadRepair#startRepair}}, which is gets {{allReplicas}} from {{ReadCallback#endpoints}}, which gets it through {{AbstractReadExecutor#getReadExecutor}}, where they represent {{consistencyLevel.filterForQuery(keyspace, allReplicas)}}. This means that, when we're sending more data requests, we will call {{getLiveSortedEndpoints}} and can end up with more nodes, but since {{Accumulator}} was initialised with just "target" nodes, if we keep getting responses (e.g. if the node was slow, not dead, which is more often the case), {{Accumulator}} will overflow. Unfortunately, testing with RF3 won't reveal that. > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Richard Low >Assignee: Blake Eggleston >Priority: Major > Fix For: 4.x > > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16531020#comment-16531020 ] Marcus Eriksson commented on CASSANDRA-10726: - this looks good to me in general, a few minor comments below * Update {{StorageProxy#fetchRows}} comment or maybe add inline comments on each of the calls in that method * {{private final ReadRepair readRepair}} in {{ReadCallback}} can be removed since background RR was removed * In {{BlockingPartitionRepair}}, make the {{pendingRepairs}} {{Map}} a {{ConcurrentMap}} to indicate that it needs to be concurrent * {{BlockingPartitionRepair#isLocal}} is not tested, but annotated with {{@VisibleForTesting}} (could just remove it and call {{CL.isLocal}} in {{shouldBlockOn}}) * {{BlockingPartitionRepair#sendRR}} is not tested but annotated with {{@VisibleForTesting}}, make private and remove annotation? * {{BlockingReadRepair}} {{endpoints}} is not used * {{BlockingReadRepair}} needs comments on its methods, for example in {{startRepair}}, why {{digestResolver}} is unused and the difference between {{allEndpoints}} and {{targetEndpoints}} * {{BlockingReadRepair#shouldSpeculate}} could be private or add a test as it is not completely trivial * {{BlockingReadRepair#maybeSendAdditionalDataRequests}} sends out full data reads to all non-contacted endpoints (in the local dc if cl is local) - should we cap this at 1 additional replica? I guess it will most often just be a single extra replica, but might make sense to limit it for high-rf clusters? * {{ReadRepair#awaitRepair}}/{{ReadRepair#awaitRepairs}} naming is a bit confusing, maybe rename {{awaitRepair()}} to {{awaitRepairReads()}} or something? * {{ReadRepair}} interface needs comments on all methods * Maybe throw {{UnsupportedOperationException}} in {{PartitionRangeReadCommand#getReplicaToken}} instead since we never call {{sendAdditionalDataRequests}} when doing {{PartitionRangeReadCommand}}. An option could perhaps be to pass in the key to {{BlockingReadRepair#startRepair}} and keep it in {{BRR.DigestRepair}} and remove {{getReplicaToken()}}? * nice dtests, they need {{@since('4.0')}} and a few have empty docstrings > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Richard Low >Assignee: Blake Eggleston >Priority: Major > Fix For: 4.x > > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16525356#comment-16525356 ] ASF GitHub Bot commented on CASSANDRA-10726: Github user bdeggleston commented on the issue: https://github.com/apache/cassandra/pull/94 @tedcarroll this PR is out of date. See the patch on https://issues.apache.org/jira/browse/CASSANDRA-10726 > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Richard Low >Assignee: Blake Eggleston >Priority: Major > Fix For: 4.x > > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16525324#comment-16525324 ] ASF GitHub Bot commented on CASSANDRA-10726: Github user tedcarroll commented on a diff in the pull request: https://github.com/apache/cassandra/pull/94#discussion_r198575430 --- Diff: src/java/org/apache/cassandra/service/DataResolver.java --- @@ -40,18 +40,26 @@ import org.apache.cassandra.exceptions.ReadTimeoutException; import org.apache.cassandra.net.*; import org.apache.cassandra.tracing.Tracing; -import org.apache.cassandra.utils.FBUtilities; +import org.apache.cassandra.utils.Pair; public class DataResolver extends ResponseResolver { -@VisibleForTesting -final List repairResults = Collections.synchronizedList(new ArrayList<>()); +private final Map> repairResponseRequestMap = new HashMap<>(); --- End diff -- Should be a ConcurrentHashMap. > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Richard Low >Assignee: Blake Eggleston >Priority: Major > Fix For: 4.x > > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16525311#comment-16525311 ] ASF GitHub Bot commented on CASSANDRA-10726: Github user tedcarroll commented on the issue: https://github.com/apache/cassandra/pull/94 This patch contains the potential for a ConcurrentModificationException. The HashMap at line 47 of DataResolver.java should probably be a ConcurrentHashMap. > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Richard Low >Assignee: Blake Eggleston >Priority: Major > Fix For: 4.x > > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16476556#comment-16476556 ] Blake Eggleston commented on CASSANDRA-10726: - I reworked the patch after the recent read repair chance changes [here|https://github.com/bdeggleston/cassandra/tree/10726-v4], with dtests [here|https://github.com/bdeggleston/cassandra-dtest/tree/10726-v2], and a green test run [here|https://circleci.com/workflow-run/8e150317-aaf1-43db-b3ba-36327f05b7db] > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Richard Low >Assignee: Blake Eggleston >Priority: Major > Fix For: 4.x > > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257697#comment-16257697 ] Blake Eggleston commented on CASSANDRA-10726: - I opened CASSANDRA-14058 for the refactor since that's sort of a separate task and to keep refactor discussion out of this ticket. There's an initial implementation up, take a look and let me know what you think. > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Richard Low >Assignee: Xiaolong Jiang > Fix For: 4.x > > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234071#comment-16234071 ] Jeff Jirsa commented on CASSANDRA-10726: Agree as well - keep the default as-is, but I like the other two options. > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Richard Low >Assignee: Xiaolong Jiang >Priority: Major > Fix For: 4.x > > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234058#comment-16234058 ] Jeremiah Jordan commented on CASSANDRA-10726: - {quote}And I'm leaning towards these three settings: 1. Perform blocking RR (current default, left default) 2. Write hints instead of sending RR mutations, and let hint delivery repair the inconsistency, in a manner decoupled from the original read request 3. Do nothing, for those who rely on regular repair for one reason or another (thinking TWCS, DTCS users?){quote} I think the current default needs to stay the default (I was bitten by the original problem more than once in the 0.6/0.7 days), but giving advanced users the ability to disable that seems fine to me. +1 to those options. > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Richard Low >Assignee: Xiaolong Jiang >Priority: Major > Fix For: 4.x > > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16220351#comment-16220351 ] Aleksey Yeschenko commented on CASSANDRA-10726: --- I don't agree that it's *required*, but I'm not talking here about changing the default behaviour. I'm talking about providing extra options, while retaining the default blocking behaviour. > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Richard Low >Assignee: Xiaolong Jiang > Fix For: 4.x > > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219759#comment-16219759 ] Richard Low commented on CASSANDRA-10726: - Background read repair is quite different. This foreground read repair is required to be blocking as the discussion at the beginning of the ticket shows. Now I understand it, I think this is an important guarantee and people would be very surprised if this behaviour changed. So I'm strongly in favour of 1, although the title of the ticket may be misleading :) > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Richard Low >Assignee: Xiaolong Jiang > Fix For: 4.x > > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219254#comment-16219254 ] Aleksey Yeschenko commented on CASSANDRA-10726: --- I've been thinking more about this, in particular in context of deprecating/removing {{read_repair_chance}} and {{dclocal_read_repair_chance}} options. The question I asked myself was: what kind of tuning do we want for read repair, once those two are gone? And I'm leaning towards these three settings: 1. Perform blocking RR (current default, left default) 2. Write hints instead of sending RR mutations, and let hint delivery repair the inconsistency, in a manner decoupled from the original read request 3. Do nothing, for those who rely on regular repair for one reason or another (thinking TWCS, DTCS users?) Any opinions? > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Richard Low >Assignee: Xiaolong Jiang > Fix For: 4.x > > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16130504#comment-16130504 ] Xiaolong Jiang commented on CASSANDRA-10726: [~iamaleksey] Because I don't know how to do refactor to make my change clean except we change the iteractor style. If we do, it will be a much bigger change which we can put in a different new JIRA. But I am not sure how safe to refactor since I think the iteractor style is new engine style and it's all over the place, which seems the significant change and new direction after 3.0. If you have any suggestion of how to refactor or even specific refactor to this JIRA, I would be more than happy to make the proper change. > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Richard Low >Assignee: Xiaolong Jiang > Fix For: 4.x > > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16130320#comment-16130320 ] Aleksey Yeschenko commented on CASSANDRA-10726: --- bq. I wound not refactor the whole read pipeline right now I guess even though I do agree the code becomes so complicated Can you elaborate on reasons why you wouldn't do that? > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Richard Low >Assignee: Xiaolong Jiang > Fix For: 4.x > > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129843#comment-16129843 ] Xiaolong Jiang commented on CASSANDRA-10726: [~krummas] we don't want to share results. if there is speculative read retry kicks in, it will mess up and may trigger a background async repair, which actually will do the repair twice together with foreground read repair. We are already doing duplicated repair already before. So I want to two repairs (foreground and background async repair) not to mess up each other. Ideally, I would return the map back as the input for next steps, which we don't have any race at all. However the iterator close thing makes it impossible to pass around value cleanly, that's how I end up with this shared map hack. [~iamaleksey] I wound not refactor the whole read pipeline right now I guess even though I do agree the code becomes so complicated :(. Regarding the 1M rows timeout, I compared with what we did before with I am doing now. It turns out previous code is waiting repair back with write rpc timeout. I was hoping I can make it better, but it turns out I am making it worse. It's better to wait longer instead of returning failure for read. If we can not get result even after waiting longer, the client will get timeout anyway. Thus I changed the repair wait time out same as before. and I also ran the stress test with 1M rows by shutdown node3 when writing and then read with cl=ALL to force read repair. It's looking to me, no read timeout now. I pushed the fix to same PR I sent previously. Could you please check again? > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Richard Low >Assignee: Xiaolong Jiang > Fix For: 4.x > > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128848#comment-16128848 ] Aleksey Yeschenko commented on CASSANDRA-10726: --- Not a fan of what the patch does to already complicated {{AbstractReadExecutor}} and {{DataResolver}}. Given that the patch is going into trunk, we might use the opportunity to maybe do some deeper refactoring instead of hacking this on top of already overly complicated path? > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Richard Low >Assignee: Xiaolong Jiang > Fix For: 4.x > > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128832#comment-16128832 ] Marcus Eriksson commented on CASSANDRA-10726: - Your patch fixes the issue, but don't we actually want the different invocations to share results? I tried replacing the HashMap with a ConcurrentHashMap and that also fixes the issue. I also ran a small test to make sure standard read performance is not horrible, but it seems we are getting way more timeouts with this patch - I started a 3 node ccm cluster, inserted 1M rows with stress, flushed and stopped node3, removed all its data and restarted, then read back 1M rows with CL=ALL. With the patch I'm getting lots of timeouts, without it, no timeouts. > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Richard Low >Assignee: Xiaolong Jiang > Fix For: 4.x > > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16126694#comment-16126694 ] Xiaolong Jiang commented on CASSANDRA-10726: [~krummas] this is the patch to fix CME https://github.com/krummas/cassandra/pull/4 Can u please take a look? If it sounds good, can you merge to your repo and retrigger the dtest? > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Richard Low >Assignee: Xiaolong Jiang > Fix For: 4.x > > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16126376#comment-16126376 ] Xiaolong Jiang commented on CASSANDRA-10726: [~krummas] After digging into this, I think the problem is there are two pathes to do the read repair. The first path is the one I fixed which is directly part of read call. The other path is back ground AsyncReadRepair runner which use same DataReslover to resolve the conflict. Since they are using same DataReslover to reslove, they are sharing same repairResponseRequestMap object. So the background AsyncRepairRunner is also changing this map causing the interation on the repairResponseRequestMap keySet get CME. Let me see how I can fix this. > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Richard Low >Assignee: Xiaolong Jiang > Fix For: 4.x > > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16125586#comment-16125586 ] Marcus Eriksson commented on CASSANDRA-10726: - [~xiaolong...@gmail.com] seems we are getting a CME: {code} java.util.ConcurrentModificationException: null at java.util.HashMap$HashIterator.nextNode(HashMap.java:1437) ~[na:1.8.0_149-apple] at java.util.HashMap$KeyIterator.next(HashMap.java:1461) ~[na:1.8.0_149-apple] at org.apache.cassandra.service.DataResolver$RepairMergeListener.awaitRepairResponses(DataResolver.java:298) ~[main/:na] at org.apache.cassandra.service.DataResolver$RepairMergeListener.waitRepairToFinishWithPossibleRetry(DataResolver.java:223) ~[main/:na] at org.apache.cassandra.service.DataResolver$RepairMergeListener.close(DataResolver.java:175) ~[main/:na] at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$2.close(UnfilteredPartitionIterators.java:175) ~[main/:na] at org.apache.cassandra.db.transform.BaseIterator.close(BaseIterator.java:92) ~[main/:na] at org.apache.cassandra.service.DataResolver.compareResponses(DataResolver.java:103) ~[main/:na] at org.apache.cassandra.service.ReadCallback$AsyncRepairRunner.run(ReadCallback.java:232) ~[main/:na] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_149-apple] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_149-apple] at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81) ~[main/:na] at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_149-apple] {code} in the {{materialized_views_test.py:TestMaterializedViewsConsistency.multi_partition_consistent_reads_after_write_test}} dtest > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Richard Low >Assignee: Xiaolong Jiang > Fix For: 4.x > > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16123604#comment-16123604 ] Xiaolong Jiang commented on CASSANDRA-10726: [~krummas] Thanks for the review. Yes, your change looks good to me. I saw the tests in circle CI passed. Dtest is still running. Please go ahead and merge when dtest passes. I think we can go to 4.0 only for open source version. > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Richard Low >Assignee: Xiaolong Jiang > Fix For: 4.x > > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16123171#comment-16123171 ] Marcus Eriksson commented on CASSANDRA-10726: - btw, I think this should go to 4.0 only, do you agree? > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Richard Low >Assignee: Xiaolong Jiang > Fix For: 4.x > > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16123166#comment-16123166 ] Marcus Eriksson commented on CASSANDRA-10726: - This LGTM, pushed a branch with some small nits fixed here: https://github.com/krummas/cassandra/tree/xiaolong/10726 (please have a look) running tests: https://circleci.com/gh/krummas/cassandra/67 https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/175 Will commit if the tests look good and you think my nits are ok > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Richard Low >Assignee: Xiaolong Jiang > Fix For: 3.0.x > > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16078474#comment-16078474 ] Xiaolong Jiang commented on CASSANDRA-10726: [~bdeggleston]I made a few changes based on your comments and squashed to one commit. Please take a look and let me know what you think. https://github.com/apache/cassandra/pull/94/files > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Richard Low >Assignee: Xiaolong Jiang > Fix For: 3.0.x > > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16077257#comment-16077257 ] Xiaolong Jiang commented on CASSANDRA-10726: 1. I will change isQuorum to satisfiesQuorumFor and add unit tests. Not sure about your suggestion "satisfiedQuorumFor(int quorum)" though. I will mock keyspace and do the unit test 2. I will remove FBUtilities#waitOnFuturesNanos 3. I will make changes to wait maximum timeToWaitNanos for all responses instead of for each one 4. I do have tests to cover read repair response from second node which is testResolveOneReadRepairRetry in DataResolverTest. It's not directly checking the response, it's making sure the correct data is sent to peer4. (the response is actually mocked by calling resolver.preprocess which is meanless, we only need to make sure correct data is retried to peer4) 5. hum, it's building in my personal CASSANDRA-10726 branch. I will remove the "final" keyword. > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Richard Low >Assignee: Xiaolong Jiang > Fix For: 3.0.x > > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16077236#comment-16077236 ] Xiaolong Jiang commented on CASSANDRA-10726: [~bdeggleston]Thanks for taking the time to review. The reason why I need responseCntSnapshot is sources.length != responseCntSnapshot because of the read speculative retry. When read is slow, cassandra will try read on one more host > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Richard Low >Assignee: Xiaolong Jiang > Fix For: 3.0.x > > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16068995#comment-16068995 ] Blake Eggleston commented on CASSANDRA-10726: - Sorry for the delay getting to this. I have some comments, but conceptually, this seems good. Here's my first round of comments: {{ConsistencyLevel#isQuorum}} * a name like {{satisfiesQuorumFor}} would be more descriptive * needs unit tests that demonstrates it works as expected for different rf/cl combos. It might be easier to have this call an {{satisfiedQuorumFor(int quorum)}}, and test that. {{FBUtilities#waitOnFuturesNanos}} * this can be removed, it’s not used anywhere {{DataResolver}} * {{repairResponseRequestMap}} ** unless you need to mutate this as part of a test, this should remain private. For testing purposes, you should have a getter method that makes a defensive copy * {{responseCntSnapshot}} ** I’m pretty sure you can get rid of this class member. In the resolve method, sources.length == responseCntSnapshot, so you can just use RepairMergeListener.sources.length instead of setting a variable on the class. * {{timeOuts}} ** I think the intended behavior is to wait for a max of {{timeToWaitNanos}}, but return immediately if any of the read repair recipients have responded. This method waits for {{timeToWaitNanos}} for each AsyncOneResponse, so if the last response has been received, it will still wait for each preceding response future before getting to it and returning. ** should also have a more descriptive name like {{awaitRepairResponses}} or something * {{waitRepairToFinishWithPossibleRetry}} ** this method needs a detailed javadoc explaining what sort of guarantees it’s providing with regard to read repair. Specifically, the idea of a “monotonic read” is brought up in it’s comments, this has been discussed in this JIRA, but it also needs to be explained in the code. How it provides that guarantee without unnecessary blocking should also be explained {{DataResolverTest}} * You’re using the wrong brace style for some of the tests still * I don’t see any tests confirming that the method actually returns as expected if you receive a read repair response from the second node. Also, could you please also fix these more general issues before posting changes: * fix the compilation errors (it won’t build for me) * squash and rebase onto trunk, the way the various commits are interleaved with other commits makes it difficult to review this without the help of github. * remove the unnecessary uses of the {{final}} keyword. It's used in a lot of local variables and method arguments. {{final}} typically isn't used in C* outside of class members > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Richard Low >Assignee: Xiaolong Jiang > Fix For: 3.0.x > > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16006704#comment-16006704 ] Xiaolong Jiang commented on CASSANDRA-10726: regarding distinctHostNum: /** * When doing the read repair, the mutation is per partition key, so it's possible we will repair multiple * partitions into different hosts. let's say RF = 5, we need to read partition p1, p2, p3, p4 from three nodes, * n1, n2, n3. If n1 contains latest data, n2 is missing p1 and p2, n2 is missing p3 and p4. So we need to run * repair for n2 by sending p1 and p2 partitions and run repair for n3 by sending p3 and p4 partitions. it's * possible p1 and p3 repair is slow, so beloew distinctHostNum will return 2. In this case, I will not retry * a new node for read repair since this read repair retry will only handle one slow host. If p3 and p4 is fast, * p1 and p2 repair is slow or just p1 repair is slow, below distinctHostNum will return 1, in this case, I will * retry 1 extra node and send p1, p2 to extra node or just p1 if only p1 read repair times out. * In same host, we can have multiple partition read repair and we can only handle one host slowness, so we should * get distinct host from read repair response future. */ > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Richard Low >Assignee: Xiaolong Jiang > Fix For: 3.0.x > > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16006701#comment-16006701 ] Xiaolong Jiang commented on CASSANDRA-10726: regarding the response snapshot count: we need to capture this count and save to this data resolver. All return results is based on this many reponse count and below "responseCntSnapshot" is used to calculate whether read repair is ok or not since response list can get more response later, but we only iterator "responseCntSnapshot" for any future operations including read repair retry. So we have to save this count in current state. > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Richard Low >Assignee: Xiaolong Jiang > Fix For: 3.0.x > > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15925260#comment-15925260 ] Jason Brown commented on CASSANDRA-10726: - [~xiaolong...@gmail.com] Thanks for the comments they add more context to the patch. Here's my initial round of comments on your patch: - general nit: code style is incorrect wrt braces - rename {{ConsistencyLevel#isMajority}} to {{ConsistencyLevel#isQuorum}} as it's more in line with our general naming conventions - {{AbstractReadExecutor#getReadRepairExtra}} -- function needs a javadoc comment -- rename {{#readRepairExtra}} to ... ??? "spareReadRepairNode". I would like to make this Optional as it makes it clearer that it's not required (rather than piggybacking on "null" to indicate that sematic meaning). -- {{readRepairExtra = allSortedReplicas.get(blockForReplicas.size());}} - this seems unsafe as it assumes the {{List}}s are ordered the same. - {{DataResolver}} -- {{#reppairExtra}} is misspelled. Further, make it consistent with whatever we name it in {{AbstractReadExecutor}} -- {{#close}} - don't add the exception to the {{Tracing.trace()}} call, and it's debatable if you need it on the {{logger.debug}} line, as well. -- {{repairResponseRequestMap}} and {{#responseCntSnapshot}} - I don't think you actually need these. You just care about the number of replicas that have responded, and you can just call {{responses.size()}} where you actually care about (in {{#waitRepairToFinishWithPossibleRetry()}}. -- {{#distinctHostNum()}} I'm not sure why you need this as we should only send the message to distinct hosts. I think you can just use {{results.size()}} instead. -- {{#waitRepairToFinishWithPossibleRetry}} --- wrt the block starting at line 224. You iterate over all the timed out entries, but you need to cover the full merge set and send that "repairExtra" node. You don't know what data that replica has or needs, thus you need to send the full merged set. I'm not sure off the top of my head where/how to grab the merged rows, but I'm sure you can figure it out ;) --- Furthermore, once you get the merged set, you do not need to send multiple messages to the target node, and can instead send one. Thus you can simplify the block starting at line 224 as such: {code} Tracing.trace("retry read-repair-mutation to {}", reppairExtra); PartitionUpdate update = // get merged result from some location ... MessageOut messageOut = new Mutation(update).createMessage(MessagingService.Verb.READ_REPAIR); AsyncOneResponse response = MessagingService.instance().sendRR(messageOut,reppairExtra); response.get(waitTimeNanos, TimeUnit.NANOSECONDS); // TimeoutException will be thrown {code} The biggest thing this patch needs is testing. You might be able to unit test this one (in fact [~xiaolong...@gmail.com] spoke offline about some idea about how to do it), but it will take some time (worthwhile, in my opinion). A dtest will probably be required, as well, even though that will get tricky - byteman will probably necessary to help you out. > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Richard Low >Assignee: Xiaolong Jiang > Fix For: 3.0.x > > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15923200#comment-15923200 ] Xiaolong Jiang commented on CASSANDRA-10726: The patch is trying to do 2 things: 1. Before, when we read, say quorum read, let RF = 3 (replica1, replica2, replica3), so the client request is trying to read from 2 replicas (replica1, replica2), but there is a digest mismatch between these 2 replicas, so read repair will kick in. Let's say the stale data is in replica2, read repair will send the correct data to replica2. But for some reason, the write request got timeout, then we send "read timeout " to client side. After this patch, we will wait for replica2 write for some time, if it didn't come back, correct data is sent to replica3 no matter whether replica3 already has latest data or not. Because we know if replica3 write succeeds, it's guaranteed 2 replicas got the correct data, client will return success with data for read request, and next time the quorum read will definitely read correct data. 2. The second thing this patch is trying to do is to make sure in read repair part, we don't block for replicas beyond what is needed for consistency level to reply back in speculative retry/read repair chance case. For example, we still use above RF = 3 quorum read case, it's trying to read from replica1 and replica2, but replica2 is slow, then speculative retry kicks in, read will try to read replica3, then all 3 replicas read come back, but there is digest mismatch, both replica2 and replica3 are stale data, what happens before is read repair will block for both replica2 and replica3 to finish read repair, but there is no need to wait for both to come back, we only need to wait for one repair to come back since we only need one successful repair to guarantee successful quorum read. And next quorum read will definitely read latest data even replica 3 read repair failed.This is applied same to read repiar chance. Let's say the read repair chance is "GLOBAL", we don't need to block for all replicas to finish repair, we only need to block what the read consistency level needs. > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Richard Low >Assignee: Xiaolong Jiang > Fix For: 3.0.x > > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872885#comment-15872885 ] Xiaolong Jiang commented on CASSANDRA-10726: [~jasobrown] Patch is ready https://github.com/apache/cassandra/pull/94 > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Richard Low >Assignee: Xiaolong Jiang > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872879#comment-15872879 ] ASF GitHub Bot commented on CASSANDRA-10726: GitHub user xiaolong302 opened a pull request: https://github.com/apache/cassandra/pull/94 CASSANDRA-10726: Read repair inserts should use speculative retry 1. do an extra read repair retry to only guarantee “monotonic quorum read”. Here “quorum” means majority of nodes among replicas 2. only block what is needed for resolving the digest mismatch no matter whether it’s speculative retry or read repair chance. You can merge this pull request into a Git repository by running: $ git pull https://github.com/xiaolong302/cassandra CASSANDRA-10726 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/cassandra/pull/94.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #94 commit a587c20e82ffc4aa7c4a3cb1468551b255fc7f71 Author: Xiaolong JiangDate: 2017-01-17T05:31:06Z Cass: CASSANDRA-10726: Read repair inserts should use speculative retry 1. do an extra read repair retry to only guarantee “monotonic quorum read”. Here “quorum” means majority of nodes among replicas 2. only block what is needed for resolving the digest mismatch no matter whether it’s speculative retry or read repair chance. > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Richard Low >Assignee: Xiaolong Jiang > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15658728#comment-15658728 ] Xiaolong Jiang commented on CASSANDRA-10726: [~kohlisankalp]Can you assign to me? [~rlow] did a quick chat to me about this JIRA. I will try to fix this one. > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Richard Low >Assignee: Nachiket Patil > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15394236#comment-15394236 ] sankalp kohli commented on CASSANDRA-10726: --- Assigned to [~nachiket_patil] > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Richard Low >Assignee: Nachiket Patil > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15181364#comment-15181364 ] Jonathan Ellis commented on CASSANDRA-10726: Yes. And if two of the nodes agree on the most recent value then we don't need to block on the third being updated. > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Richard Low > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15181335#comment-15181335 ] Richard Low commented on CASSANDRA-10726: - What do you think [~jbellis] [~slebresne]? > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Richard Low > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15122431#comment-15122431 ] Richard Low commented on CASSANDRA-10726: - Actually, isn't the real problem here that speculative retry doesn't include the RR write? We should give up waiting for the write to complete and retry on another replica. A slow RR insert is just as bad as a slow read. > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Richard Low > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15093872#comment-15093872 ] Sylvain Lebresne commented on CASSANDRA-10726: -- bq. Would a reasonable half way house be to keep the write as blocking but return success in the case of a write timeout? That would still break the "monotonic quorum reads": unless you get positive acks from the read-repair, you can't guarantee a quorum of replica is now up to date. Granted, it will work more often if we do that (than if we don't block at all), but guarantees are not about "most of the time" :) And just to recap my personal position on this, I do feel we should keep the guarantee, at least by default, and still feel the right way to deal with the scenario you're complaining about would be a better way to deal with nodes backing up on writes. But we all know it's easier said than fixed, and while I'd rather we spend time on that better way to deal with the 2 scenario [~jbellis] mentioned above, I'm not too strongly opposed to a -D stopgap for advanced users. > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Richard Low > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15093975#comment-15093975 ] Jonathan Ellis commented on CASSANDRA-10726: I can live with the -D approach. > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Richard Low > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15092081#comment-15092081 ] Brandon Williams commented on CASSANDRA-10726: -- I understand your sentiment, but maybe a -D flag for the 'power user' isn't too awful? > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Richard Low > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15092135#comment-15092135 ] Aleksey Yeschenko commented on CASSANDRA-10726: --- bq. I understand your sentiment, but maybe a -D flag for the 'power user' isn't too awful? I agree. We lose nothing here by providing a -D flag. > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Richard Low > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15092150#comment-15092150 ] Richard Low commented on CASSANDRA-10726: - It would lose a guarantee (which admittedly I didn't know existed), but most people who care about what happens when there's a write timeout will use CAS read and write. Would a reasonable half way house be to keep the write as blocking but return success in the case of a write timeout? Then almost always the behaviour will be the same, but it would avoid the timeouts caused by a single broken replica. > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Richard Low > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15092013#comment-15092013 ] Jonathan Ellis commented on CASSANDRA-10726: For the last, we already have options for how much read repair to do. For the former, I'm really -0 leaning to -1 against adding an option to discard an important guarantee. > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Richard Low > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15081936#comment-15081936 ] Brandon Williams commented on CASSANDRA-10726: -- Let's make it configurable for a) blocking read repair, b) non-blocking read repair, and c) no read repair at all. > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Richard Low > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15081929#comment-15081929 ] Richard Low commented on CASSANDRA-10726: - +1 on the option to disable. > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Richard Low > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15068248#comment-15068248 ] Jonathan Ellis commented on CASSANDRA-10726: Seeing reads "go backwards in time" is one of the most confusing aspects of eventual consistency for people, so I do think it's important that quorum reads avoid that, even more so because users tend to oversimplify quorum reads as "strong consistency that means I don't have to think about EC." So to the degree we can make that assumption true, we should, especially if that's been our behavior already for 4+ years. It seems like there are two primary problem scenarios: * When a node is overloaded for writes, this stops reads as well. First, delaying reads when we're behind on writes is arguably a good thing that will help you recover faster. Second, the right way to tackle this is with better handling of the write overload as in CASANDRA-9318. * When data is read-only because disks are failing. I agree with Sylvain that half-broken is often worse than completely broken, and in this specific case if a disk puts itself in read-only mode then it won't be long until it isn't readable either. This is another case where "mark a disk bad and broadcast to other nodes not to send me requests for tokens pinned to it" as envisioned in CASSANDRA-6696 would be useful, along with an option for "promote write errors to blacklist on reads as wells." > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Richard Low > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15062414#comment-15062414 ] Aleksey Yeschenko commented on CASSANDRA-10726: --- Need to think about it more, but I wouldn't mind, in principle, dropping the guarantee. Arguably the behaviour seen by Richard is more confusing/painful than CASSANDRA-2494 issue. Should at the very least make an option to disable this. > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Richard Low > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059773#comment-15059773 ] Sylvain Lebresne commented on CASSANDRA-10726: -- The comment in the code is not terribly informative, but the original reason for this is CASSANDRA-2494. I'll let you read up for precise context, but the summary is that we wait for read repair to ensure "monotonic quorum reads", i.e. that if you do 2 successive quorum reads, you're guaranteed the 2nd one won't see something older than the 1st one, and this even if you had a failed quorum write that made you write the most up to date value only to a minority of replicas. How important that guarantee is in practice is probably debatable, I'm sure some would be totally fine without it, but we've been providing it (silently) pretty much forever at this point so some users are likely relying on it (even if without realizing it). I also more generally think we should try to always lean towards providing more guarantee rather than less when we can as it yield a less surprising system. So, without pretending this is the best guarantee since sliced bread, I'm not terribly enthusiastic at the idea of dropping it. Which doesn't mean I ignore the problem you're raising. If a node properly respond to read but not to writes, then it can indeed be a problem for the reads it is participating in, and that's not great. I'm just not sure dropping our monotonic quorum read guarantee is the correct way to mitigate that problem. Part of me feels like a node that is dropping writes consistently shouldn't be serving read as if everything was fine (a half broken node is often worth than a fully broken one) but I'm not saying I have a good solution of the top of my head to ensure that without too much downsides. I'd certainly welcome a broader range of opinions/ideas ([~jbellis], [~iamaleksey] in particular?). > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Richard Low > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058496#comment-15058496 ] sankalp kohli commented on CASSANDRA-10726: --- cc [~brandon.williams] [~slebresne] What do you think? > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Richard Low > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058502#comment-15058502 ] Brandon Williams commented on CASSANDRA-10726: -- Blocking foreground repair basically means a quorum or all consistency level. If we return before completing the write, that violates consistency. > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Richard Low > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058576#comment-15058576 ] Richard Low commented on CASSANDRA-10726: - How does it violate consistency? The replica was already inconsistent to require a read repair insert so returning before completing the write can't make it any worse. > Read repair inserts should not be blocking > -- > > Key: CASSANDRA-10726 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10726 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Richard Low > > Today, if there’s a digest mismatch in a foreground read repair, the insert > to update out of date replicas is blocking. This means, if it fails, the read > fails with a timeout. If a node is dropping writes (maybe it is overloaded or > the mutation stage is backed up for some other reason), all reads to a > replica set could fail. Further, replicas dropping writes get more out of > sync so will require more read repair. > The comment on the code for why the writes are blocking is: > {code} > // wait for the repair writes to be acknowledged, to minimize impact on any > replica that's > // behind on writes in case the out-of-sync row is read multiple times in > quick succession > {code} > but the bad side effect is that reads timeout. Either the writes should not > be blocking or we should return success for the read even if the write times > out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)