Re: [DISCUSS] CASSANDRA-12126: LWTs correctness and performance
Thank you Sylvain and Benedict for the patch and thank you to everybody that took the time to contribute to this discussion :-)
On Fri, Nov 27, 2020 at 5:15 PM Sylvain Lebresne wrote:
> I hope I haven't misread this, but it appears we've reached a kind of consensus for committing the fix, so I went ahead and did it.
Re: [DISCUSS] CASSANDRA-12126: LWTs correctness and performance
I hope I haven't misread this, but it appears we've reached a kind of consensus for committing the fix, so I went ahead and did it. I added a NEWS entry that I hope is clear (and points to the flag that disables the fix if someone wants to go that route), but any committers can feel free to ninja-nitpick that NEWS entry if they so wish. Many thanks to Benjamin for driving the discussion here.
-- Sylvain
On Tue, Nov 24, 2020 at 3:43 PM Ekaterina Dimitrova wrote:
> I am +1 on Benjamin’s proposal
Re: [DISCUSS] CASSANDRA-12126: LWTs correctness and performance
I am +1 on Benjamin’s proposal and less interruptions during upgrades. For more visibility maybe we can also write a short article about the options and the tradeoffs, further to NEWS.txt (that’s not something to decide now, of course :-) )
On Tue, 24 Nov 2020 at 9:13, Benjamin Lerer wrote:
> Paulo, what you propose with the yaml seems different from default to *correctness*.
Re: [DISCUSS] CASSANDRA-12126: LWTs correctness and performance
> Benedict suggested that Sylvain and I made the choice. Sylvain did not want to make the final call.
> I chose correctness. If it is a problem and people prefer to vote. It is perfectly fine for me too :-)
+1
Appreciate it having been raised for exposure and discussion Benjamin, and happy to leave the final say to those carrying the work on, especially in this case :-)
Re: [DISCUSS] CASSANDRA-12126: LWTs correctness and performance
Fair points. I retract the yaml suggestion and +1 to go with the correctness route.
On Tue, Nov 24, 2020 at 11:13, Benjamin Lerer <benjamin.le...@datastax.com> wrote:
> Paulo, what you propose with the yaml seems different from default to *correctness*. It means to me that we are forcing the user to choose between *correctness* and *performance*.
Re: [DISCUSS] CASSANDRA-12126: LWTs correctness and performance
Paulo, what you propose with the yaml seems different from defaulting to *correctness*. It means to me that we are forcing the user to choose between *correctness* and *performance*. Most of us have a good understanding of the problem and it is a hard choice for us. I imagine that most users do not fully understand LWTs and will not know what to choose. Some might not even use LWTs and will suddenly be forced to make a choice that they do not understand. It does not feel right to me to push them to make that choice.
I also agree with Benedict and Mick that it is a risky thing to do.
> something that can bring a cluster down upon an unprepared user.
I do not think that will be the case (feel free to correct me, Benedict). The impact will probably be an increase in the number of write/read timeouts for LWT reads/writes. Under heavy load, that could make the services depending on those queries unreliable. On the other hand, the impact of the current problem is that we can hit correctness issues without even knowing it.
We need to choose between two imperfect solutions, and we are having some difficulty agreeing on which one to choose.
Benedict suggested that Sylvain and I make the choice. Sylvain did not want to make the final call. I chose correctness. If it is a problem and people prefer to vote, that is perfectly fine with me too :-)
I just want us to move forward.
On Tue, Nov 24, 2020 at 12:52 PM Mick Semb Wever wrote:
> I agree. We just don't know what users are doing, this is risky.
> IMO the same applies to a performance degradation, i.e. something that can bring a cluster down upon an unprepared user.
Re: [DISCUSS] CASSANDRA-12126: LWTs correctness and performance
> I think the keyword there is "normally" - if we can't say _certainly_, then this is probably an unsafe change to make.
>
> I can imagine any number of hacky upgrade processes that would be dangerous with this change.
I agree. We just don't know what users are doing, this is risky.
IMO the same applies to a performance degradation, i.e. something that can bring a cluster down upon an unprepared user. Despite our best efforts with NEWS.txt, we should still look after such users. IMHO we have to carry the imperfection of LWTs on past branches. I'm well aware this is easier said than done, even for far simpler changes. Having the flag there to switch to "correct LWT" is still a huge win for users.
Re: [DISCUSS] CASSANDRA-12126: LWTs correctness and performance
I think the keyword there is "normally" - if we can't say _certainly_, then this is probably an unsafe change to make. I can imagine any number of hacky upgrade processes that would be dangerous with this change. But, happy to defer to the consensus of others.
On 24/11/2020, 11:04, "Paulo Motta" wrote:
> I don't see how this can pose an outage risk to the cluster given upgrades are normally performed in a rolling restart fashion, so the worst that could happen is the first node in the sequence not starting, so the upgrade would not proceed.
Re: [DISCUSS] CASSANDRA-12126: LWTs correctness and performance
In this case the breaking change is a feature, not a bug. The exact intention of this is to require manual intervention to raise awareness about the potential performance degradation. This sounds reasonable, given that we are already breaking the contract of not introducing performance regressions in a minor release. I don't see how this can pose an outage risk to the cluster, given that upgrades are normally performed in a rolling-restart fashion: the worst that could happen is that the first node in the sequence does not start, so the upgrade would not proceed. In my view this would be far less harmful than discovering a performance regression after all your nodes are upgraded. Nevertheless, I'm fine with retracting the suggestion so we can move forward with the proposal if you feel strongly about it.
On Tue, Nov 24, 2020 at 07:26, Benedict Elliott Smith <bened...@apache.org> wrote:
> In my parlance the config property would be a breaking change, whereas the LWT behaviour would be a performance regression. This latter might cause partial outages or service degradation, but refusing to start a prod cluster without manual intervention is potentially a much worse situation, and even more surprising for a patch upgrade.
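Paulo's argument that a mandatory setting fails safely relies on the ordering of a rolling restart: nodes are restarted one at a time, so a node that refuses to start stops the rollout before the rest of the cluster is touched. The sketch below models only that ordering. It is a hypothetical illustration, not Cassandra tooling: `rolling_upgrade`, `starts_ok` and the node names are made up, and it assumes the operator halts at the first failure.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class RollingUpgradeResult:
    upgraded: List[str] = field(default_factory=list)  # nodes now running the new version
    halted_at: Optional[str] = None                    # first node that failed to start, if any


def rolling_upgrade(nodes: List[str], starts_ok: Callable[[str], bool]) -> RollingUpgradeResult:
    """Restart nodes one at a time and stop at the first one that fails to start.

    `starts_ok` stands in for "install the new version, restart the node and
    check that it came back up"; it is a placeholder, not a real API.
    """
    result = RollingUpgradeResult()
    for node in nodes:
        if not starts_ok(node):
            # Halt here: the remaining nodes are untouched and keep running
            # the old version.
            result.halted_at = node
            return result
        result.upgraded.append(node)
    return result
```

With a stale cassandra.yaml on every node, `rolling_upgrade(["node1", "node2", "node3"], lambda n: False)` stops at `node1` with nothing upgraded. The model does not capture Benedict's counterpoint in this thread, that even one node refusing to start during a patch upgrade may be worse than a performance regression.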
Re: [DISCUSS] CASSANDRA-12126: LWTs correctness and performance
In my parlance the config property would be a breaking change, whereas the LWT behaviour would be a performance regression. This latter might cause partial outages or service degradation, but refusing to start a prod cluster without manual intervention is potentially a much worse situation, and even more surprising for a patch upgrade.
On 24/11/2020, 01:05, "Paulo Motta" wrote:
> Isn't the plan to change LWT implementation (and performance expectation) in a patch version? This is a breaking change by itself, I'm just proposing to make the trade-off choice explicit in the yaml to prevent unexpected performance degradation during upgrade (for users who are not aware of the change).
Re: [DISCUSS] CASSANDRA-12126: LWTs correctness and performance
Isn't the plan to change LWT implementation (and performance expectation) in a patch version? This is a breaking change by itself, I'm just proposing to make the trade-off choice explicit in the yaml to prevent unexpected performance degradation during upgrade (for users who are not aware of the change). Just to make it clear, I'm proposing having a "lwt_legacy_mode: false" uncommented in the default yaml with a descriptive comment about CASSANDRA-12126, so new users will always get the new behavior, but users using a yaml template based on a previous 3.X version will not be able to start the node because this property will be missing. I believe the majority of operators will just update their yaml with "lwt_legacy_mode: false" and move on with their upgrades, but people wanting to keep the previous performance will become aware of the breaking change and set it to true. On Mon, Nov 23, 2020 at 21:07, Benedict Elliott Smith < bened...@apache.org> wrote: > What do you mean by minor upgrade? We can't break patch upgrades for any > of 3.x, as this could also cause surprise outages. > > On 23/11/2020, 23:51, "Paulo Motta" wrote: > > I was thinking about the YAML requirement during the 3.X minor > upgrade to > make the decision explicit (need to update yaml) rather than implicit > (by > upgrading you agree with the change), since the latter can go > unnoticed by > those who don't pay attention to NEWS.txt > > On Mon, Nov 23, 2020 at 20:03, Benedict Elliott Smith < > bened...@apache.org> wrote: > > > What's the value of the yaml? The user is likely to have upgraded to > > latest 3.x as part of the upgrade process to 4.0, so they'll already > have > > had a decision made for them. If correctness didn't break anything, > there > > doesn't any longer seem much point in offering a choice? > > > > On 23/11/2020, 22:45, "Brandon Williams" wrote: > > > > +1 to both as well.
> > > > On Mon, Nov 23, 2020, 4:42 PM Blake Eggleston > > > > wrote: > > > > > +1 to correctness, and I like the yaml idea > > > > > > > On Nov 23, 2020, at 4:20 AM, Paulo Motta < > pauloricard...@gmail.com > > > > > > wrote: > > > > > > > > +1 to defaulting for correctness. > > > > > > > > In addition to that, how about making it a mandatory > cassandra.yaml > > > > property defaulting to correctness? This would make upgrades > with > > an old > > > > cassandra.yaml fail unless an option is explicitly specified, > > making > > > > operators aware of the issue and forcing them to make a > choice. > > > > > > > >> On Mon, Nov 23, 2020 at 07:30, Benjamin Lerer < > > > >> benjamin.le...@datastax.com> wrote: > > > >> > > > >> Thank you very much to everybody that provided feedback. It > > helped a > > > lot to > > > >> limit our options. > > > >> > > > >> Unfortunately, it seems that some poor soul (me, really!!!) > will > > have to > > > >> make the final call between #3 and #4. > > > >> > > > >> If I reformulate the question to: Do we default to > *correctness > > *or to > > > >> *performance*? > > > >> > > > >> I would choose to default to *correctness*. > > > >> > > > >> Of course the situation is more complex than that but it > seems > > that > > > >> somebody has to make a call and live with it. It seems to > me that > > being > > > >> blamed for choosing correctness is easier to live with ;-) > > > >> > > > >> Benjamin > > > >> > > > >> PS: I tried to push the choice on Sylvain but he dodged the > > bullet.
> > > >> > > > >> On Sat, Nov 21, 2020 at 12:30 AM Benedict Elliott Smith < > > > >> bened...@apache.org> > > > >> wrote: > > > >> > > > >>> I think I meant #4 __♂️ > > > >>> > > > >>> On 20/11/2020, 21:11, "Blake Eggleston" > > > > > > > > >>> wrote: > > > >>> > > > >>>I’d also prefer #3 over #4 > > > >>> > > > On Nov 20, 2020, at 10:03 AM, Benedict Elliott Smith < > > > >>> bened...@apache.org> wrote: > > > > > > Well, I expressed a preference for #3 over #4, > particularly for > > > >> the > > > >>> 3.x series. However at this point, I think the lack of a > clear > > project > > > >>> decision means we can punt it back to you and Sylvain to > make > > the final > > > >>> call. > > > > > > On 20/11/2020, 16:23, "Benjamin Lerer" < > > > >> benjamin.le...@datastax.com> > > > >>> wrote: > > > > > > I will try to summarize the discussion to clarify
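Paulo's proposal can be illustrated with a small sketch. Everything in it is hypothetical except the property name. `lwt_legacy_mode` is the name proposed in this message, and it is not necessarily the flag that was eventually committed. `resolve_lwt_mode` and `ConfigurationError` are illustrative names, not Cassandra APIs. The sketch also assumes the yaml has already been parsed into a dict. Cassandra itself is written in Java; Python is used here only to keep the example short. The `mandatory` parameter contrasts the two behaviours discussed in the thread: refusing to start when the property is absent, or silently defaulting to correctness.

```python
from typing import Any, Mapping


class ConfigurationError(Exception):
    """Hypothetical error meaning the node should refuse to start."""


def resolve_lwt_mode(config: Mapping[str, Any], mandatory: bool = True) -> bool:
    """Return True if the legacy (pre-fix) LWT behaviour should be used.

    `config` is assumed to be cassandra.yaml already parsed into a dict.
    `lwt_legacy_mode` is the property name proposed in the thread and is
    hypothetical here.
    """
    if "lwt_legacy_mode" not in config:
        if mandatory:
            # Proposal: a yaml copied from an earlier 3.x release lacks the
            # key, so startup fails and the operator has to make a choice.
            raise ConfigurationError(
                "lwt_legacy_mode is missing from cassandra.yaml (see CASSANDRA-12126): "
                "set it to false for correct LWTs or true to keep the previous behaviour"
            )
        # Non-mandatory variant: a missing key silently defaults to correctness.
        return False
    value = config["lwt_legacy_mode"]
    if not isinstance(value, bool):
        raise ConfigurationError("lwt_legacy_mode must be true or false")
    return value


if __name__ == "__main__":
    # The new default yaml ships with the property uncommented and set to false.
    print(resolve_lwt_mode({"lwt_legacy_mode": False}))  # False: fixed behaviour
    # An operator who wants the previous performance opts back in explicitly.
    print(resolve_lwt_mode({"lwt_legacy_mode": True}))   # True: legacy behaviour
    # A yaml from an earlier 3.x release has no such key.
    try:
        resolve_lwt_mode({})
    except ConfigurationError as err:
        print("refusing to start:", err)
```

In this sketch, the two variants differ only in what a missing key means. With `mandatory=True`, a node upgraded with an unchanged 3.x yaml fails at startup. That is what makes the choice explicit, but it is also the kind of surprise outage on a patch upgrade that Benedict objects to.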
Re: [DISCUSS] CASSANDRA-12126: LWTs correcteness and performance
What do you mean by minor upgrade? We can't break patch upgrades for any of 3.x, as this could also cause surprise outages.

On 23/11/2020, 23:51, "Paulo Motta" wrote:
> I was thinking about the YAML requirement during the 3.X minor upgrade to make the decision explicit (need to update yaml) rather than implicit (by upgrading you agree with the change), since the latter can go unnoticed by those who don't pay attention to NEWS.txt
Re: [DISCUSS] CASSANDRA-12126: LWTs correcteness and performance
I was thinking about the YAML requirement during the 3.X minor upgrade to make the decision explicit (need to update yaml) rather than implicit (by upgrading you agree with the change), since the latter can go unnoticed by those who don't pay attention to NEWS.txt

On Mon, Nov 23, 2020 at 20:03, Benedict Elliott Smith <bened...@apache.org> wrote:
> What's the value of the yaml? The user is likely to have upgraded to latest 3.x as part of the upgrade process to 4.0, so they'll already have had a decision made for them. If correctness didn't break anything, there doesn't any longer seem much point in offering a choice?
Re: [DISCUSS] CASSANDRA-12126: LWTs correcteness and performance
What's the value of the yaml? The user is likely to have upgraded to latest 3.x as part of the upgrade process to 4.0, so they'll already have had a decision made for them. If correctness didn't break anything, there doesn't any longer seem much point in offering a choice?

On 23/11/2020, 22:45, "Brandon Williams" wrote:
> +1 to both as well.
Re: [DISCUSS] CASSANDRA-12126: LWTs correcteness and performance
+1 to both as well.

On Mon, Nov 23, 2020, 4:42 PM Blake Eggleston wrote:
> +1 to correctness, and I like the yaml idea
Re: [DISCUSS] CASSANDRA-12126: LWTs correcteness and performance
+1 to correctness, and I like the yaml idea

> On Nov 23, 2020, at 4:20 AM, Paulo Motta wrote:
>
> +1 to defaulting for correctness.
>
> In addition to that, how about making it a mandatory cassandra.yaml property defaulting to correctness? This would make upgrades with an old cassandra.yaml fail unless an option is explicitly specified, making operators aware of the issue and forcing them to make a choice.
Re: [DISCUSS] CASSANDRA-12126: LWTs correcteness and performance
+1 to defaulting for correctness.

In addition to that, how about making it a mandatory cassandra.yaml property defaulting to correctness? This would make upgrades with an old cassandra.yaml fail unless an option is explicitly specified, making operators aware of the issue and forcing them to make a choice.

On Mon, Nov 23, 2020 at 07:30, Benjamin Lerer <benjamin.le...@datastax.com> wrote:
> Thank you very much to everybody that provided feedback. It helped a lot to limit our options.
>
> Unfortunately, it seems that some poor soul (me, really!!!) will have to make the final call between #3 and #4.
>
> If I reformulate the question to: Do we default to *correctness* or to *performance*?
>
> I would choose to default to *correctness*.
>
> Of course the situation is more complex than that but it seems that somebody has to make a call and live with it. It seems to me that being blamed for choosing correctness is easier to live with ;-)
>
> Benjamin
>
> PS: I tried to push the choice on Sylvain but he dodged the bullet.
Re: [DISCUSS] CASSANDRA-12126: LWTs correcteness and performance
Thank you very much to everybody that provided feedback. It helped a lot to limit our options.

Unfortunately, it seems that some poor soul (me, really!!!) will have to make the final call between #3 and #4.

If I reformulate the question to: Do we default to *correctness* or to *performance*?

I would choose to default to *correctness*.

Of course the situation is more complex than that but it seems that somebody has to make a call and live with it. It seems to me that being blamed for choosing correctness is easier to live with ;-)

Benjamin

PS: I tried to push the choice on Sylvain but he dodged the bullet.

On Sat, Nov 21, 2020 at 12:30 AM Benedict Elliott Smith wrote:
> I think I meant #4 🤷‍♂️
Re: [DISCUSS] CASSANDRA-12126: LWTs correcteness and performance
I think I meant #4.

On 20/11/2020, 21:11, "Blake Eggleston" wrote:

> I’d also prefer #3 over #4
Re: [DISCUSS] CASSANDRA-12126: LWTs correcteness and performance
I’d also prefer #3 over #4.

> On Nov 20, 2020, at 10:03 AM, Benedict Elliott Smith wrote:
>
> Well, I expressed a preference for #3 over #4, particularly for the 3.x
> series. However at this point, I think the lack of a clear project decision
> means we can punt it back to you and Sylvain to make the final call.
Re: [DISCUSS] CASSANDRA-12126: LWTs correcteness and performance
Well, I expressed a preference for #3 over #4, particularly for the 3.x series. However, at this point, I think the lack of a clear project decision means we can punt it back to you and Sylvain to make the final call.

On 20/11/2020, 16:23, "Benjamin Lerer" wrote:

> Does anybody have some idea on how to choose between those 2 choices, or
> some extra opinions on #3 versus #4?
Re: [DISCUSS] CASSANDRA-12126: LWTs correcteness and performance
I will try to summarize the discussion to clarify the outcome.

Mick is in favor of #4.
Sumanth is in favor of #4.
Sylvain's answer was not clear to me. I understood it as "I prefer #3 to #4, and I am also fine with #1".
Jeff is in favor of #3 and will understand #4.
David is in favor of #3 (fix bug and add flag to roll back to old behavior) in 4.0 and #4 in 3.0 and 3.11.

Do not hesitate to correct me if I misunderstood your answer.

Based on these answers, it seems clear that most people prefer to go for #3 or #4.

The choice between #3 (fix correctness, opt-in to current behavior) and #4 (current behavior, opt-in to correctness) is a bit less clear, especially if we consider the 3.x branches or 4.0.

Does anybody have some idea on how to choose between those 2 choices, or some extra opinions on #3 versus #4?

On Wed, Nov 18, 2020 at 9:45 PM David Capwell wrote:

> I feel that #4 (fix bug and add flag to roll back to old behavior) is best.
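Since the thread's numbering of the options caused some confusion, the difference between #3 and #4 can be reduced to which behavior a node runs when nobody touches the setting. A minimal Python sketch, assuming a single boolean operator toggle per node; the names `Option`, `runs_corrected_lwt` and `flag_set` are hypothetical and are not Cassandra's actual configuration:

```python
from enum import Enum


class Option(Enum):
    """Rollout options #3 and #4 as summarized in the thread."""
    FIX_BY_DEFAULT = 3      # #3: fix correctness, opt in to current behavior
    LEGACY_BY_DEFAULT = 4   # #4: keep current behavior, opt in to correctness


def runs_corrected_lwt(option: Option, flag_set: bool) -> bool:
    """Return True if a node runs the corrected LWT path.

    `flag_set` is a hypothetical operator toggle: under #3 it opts back into
    the legacy behavior, under #4 it opts into the fix.
    """
    if option is Option.FIX_BY_DEFAULT:
        return not flag_set
    return flag_set


if __name__ == "__main__":
    # Print the full truth table for both options.
    for option in Option:
        for flag_set in (False, True):
            print(f"#{option.value} flag_set={flag_set}: "
                  f"corrected={runs_corrected_lwt(option, flag_set)}")
```

An operator who never reads `NEWS.txt` leaves `flag_set=False`, so under #3 they get the fix and under #4 they keep the current behavior; that default is exactly what the thread is weighing.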
Re: [DISCUSS] CASSANDRA-12126: LWTs correcteness and performance
I feel that #4 (fix bug and add flag to roll back to old behavior) is best.

About the alternative implementation, I am fine adding it to 3.x and 4.0, but we should treat it as a different path, disabled by default, that you can opt into, with a plan to enable it by default "eventually".

On Wed, Nov 18, 2020 at 11:10 AM Benedict Elliott Smith wrote:

> Perhaps there might be broader appetite to weigh in on which major
> releases we might target for work that fixes the correctness bug without
> serious performance regression?
>
> i.e., if we were to fix the correctness bug now, introducing a serious
> performance regression (either opt-in or opt-out), but were to land work
> without this problem for 5.0, would there be appetite to backport this work
> to any of 4.0, 3.11 or 3.0?
Re: [DISCUSS] CASSANDRA-12126: LWTs correcteness and performance
Perhaps there might be broader appetite to weigh in on which major releases we might target for work that fixes the correctness bug without serious performance regression?

i.e., if we were to fix the correctness bug now, introducing a serious performance regression (either opt-in or opt-out), but were to land work without this problem for 5.0, would there be appetite to backport this work to any of 4.0, 3.11 or 3.0?

On 18/11/2020, 18:31, "Jeff Jirsa" wrote:

> This is complicated and relatively few people on earth understand it, so
> having little feedback is mostly expected, unfortunately.
>
> My normal emotional response is "correctness is required, opt-in to
> performance improvements that sacrifice strict correctness", but I'm also
> sure this is going to surprise people, and would understand / accept #4
> (default to current, opt-in to correct).
Re: [DISCUSS] CASSANDRA-12126: LWTs correcteness and performance
This is complicated and relatively few people on earth understand it, so having little feedback is mostly expected, unfortunately.

My normal emotional response is "correctness is required, opt-in to performance improvements that sacrifice strict correctness", but I'm also sure this is going to surprise people, and would understand / accept #4 (default to current, opt-in to correct).

On Wed, Nov 18, 2020 at 4:54 AM Benedict Elliott Smith wrote:

> It doesn't seem like there's much enthusiasm for any of the options
> available here...
Re: [DISCUSS] CASSANDRA-12126: LWTs correcteness and performance
It doesn't seem like there's much enthusiasm for any of the options available here...
Re: [DISCUSS] CASSANDRA-12126: LWTs correcteness and performance
> Is the new implementation a separate, distinctly modularized new body of work

It's primarily a distinct, modularised and new body of work; however, some shared code has been modified, namely PaxosState (in which legacy code is maintained but modified for compatibility) and the system.paxos table (which receives a new column and slightly modified serialization code). It is conceptually an optimised version of the existing algorithm.

If there's a chance of it being of value to 4.0, I can try to put up a patch next week alongside a high-level description of the changes.

> But a performance regression is a regression; I'm not shrugging it off.

I don't want to give the impression I'm shrugging off the correctness issue either. It's a serious issue to fix, but since all successful updates to the database are linearizable, I think it's likely that many applications behave correctly with the present semantics, or at least encounter only transient errors. No doubt many also do not, but I have no idea of the ratio.

The regression isn't a simple issue either: depending on the topology and message latencies, it is not difficult to produce inescapable contention, i.e. guaranteed timeouts, that might persist as long as clients continue to retry. It could be quite a serious degradation of service to impose on our users.

I don't pretend to know the correct way to make a decision balancing these considerations, but I am perhaps more concerned about imposing service outages than about temporarily maintaining semantics our users have apparently accepted for years, though I absolutely share your embarrassment there.

On 12/11/2020, 12:41, "Joshua McKenzie" wrote:

Is the new implementation a separate, distinctly modularized new body of work, or does it make substantial changes to the existing implementation and subsume it?
On Thu, Nov 12, 2020 at 3:56 AM Sylvain Lebresne wrote:

> Regarding option #4, I'll remark that experience tends to suggest users don't consistently read the `NEWS.txt` file on upgrade, so option #4 will likely essentially mean "LWT has a correctness issue, but once it has broken your data enough that you notice, you'll be able to dig up the proper flag to fix it for next time". I guess it's better than nothing, of course, but I'll admit that defaulting to "opt-in correctness", especially for a feature (LWT) that exists solely to provide additional guarantees, is something I have a hard time rallying behind.
>
> But a performance regression is a regression; I'm not shrugging it off. Still, I feel we shouldn't leave LWT with a fairly serious known correctness bug, and I frankly feel bad for "the project" that this has been known for so long without action, so I'm a bit biased in wanting to get it fixed ASAP.
>
> But maybe I'm overstating the urgency here, and maybe option #1 is a better way forward.
>
> --
> Sylvain
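One way to picture the "inescapable contention" Benedict describes is the classic Paxos "dueling proposers" problem. The sketch below is a toy model, not Cassandra's implementation, and it only assumes that this is the kind of contention meant. It also assumes three replicas, a majority quorum, two coordinators that take strictly alternating turns, and globally increasing ballots. The names `simulate` and `retry_delay` are hypothetical. When coordinators retry immediately, each new prepare preempts the other coordinator's in-flight proposal, so neither ever commits. Waiting one turn before retrying lets one of them through. Real implementations typically use randomized backoff rather than a fixed delay.

```python
from dataclasses import dataclass
from typing import Optional

REPLICAS = 3
QUORUM = REPLICAS // 2 + 1  # simple majority: 2 of 3


@dataclass
class Acceptor:
    promised: int = 0  # highest ballot this replica has promised

    def prepare(self, ballot: int) -> bool:
        # Promise only ballots strictly higher than any seen so far.
        if ballot > self.promised:
            self.promised = ballot
            return True
        return False

    def accept(self, ballot: int) -> bool:
        # Reject the proposal if a higher ballot was promised in the meantime.
        if ballot >= self.promised:
            self.promised = ballot
            return True
        return False


@dataclass
class Proposer:
    name: str
    retry_delay: int  # turns to wait after being preempted (hypothetical knob)
    ballot: int = 0
    prepared: bool = False
    sleep: int = 0


def simulate(retry_delay: int, max_turns: int = 100) -> Optional[str]:
    """Return the name of the coordinator that commits, or None on livelock."""
    acceptors = [Acceptor() for _ in range(REPLICAS)]
    proposers = [Proposer("A", retry_delay), Proposer("B", retry_delay)]
    next_ballot = 1
    for turn in range(max_turns):
        p = proposers[turn % 2]  # strict alternation: A, B, A, B, ...
        if p.sleep > 0:
            p.sleep -= 1
            continue
        if p.prepared:
            votes = sum(a.accept(p.ballot) for a in acceptors)
            if votes >= QUORUM:
                return p.name  # proposal accepted by a quorum
            # Preempted by a higher ballot: back off, or retry at once.
            p.prepared = False
            p.sleep = p.retry_delay
            if p.sleep > 0:
                continue
        # Prepare phase with a fresh, higher ballot.
        p.ballot, next_ballot = next_ballot, next_ballot + 1
        promises = sum(a.prepare(p.ballot) for a in acceptors)
        p.prepared = promises >= QUORUM
    return None


print(simulate(retry_delay=0))  # None: the coordinators preempt each other forever
print(simulate(retry_delay=1))  # B
```

With `retry_delay=0`, the run reaches `max_turns` without a commit however large that limit is set. This is the toy analogue of timeouts that "persist as long as clients continue to retry".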
Re: [DISCUSS] CASSANDRA-12126: LWTs correcteness and performance
Is the new implementation a separate, distinctly modularized new body of work, or does it make substantial changes to the existing implementation and subsume it?

On Thu, Nov 12, 2020 at 3:56 AM Sylvain Lebresne wrote:

> Regarding option #4, I'll remark that experience tends to suggest users don't consistently read the `NEWS.txt` file on upgrade, so option #4 will likely essentially mean "LWT has a correctness issue, but once it has broken your data enough that you notice, you'll be able to dig up the proper flag to fix it for next time". I guess it's better than nothing, of course, but I'll admit that defaulting to "opt-in correctness", especially for a feature (LWT) that exists solely to provide additional guarantees, is something I have a hard time rallying behind.
>
> But a performance regression is a regression; I'm not shrugging it off. Still, I feel we shouldn't leave LWT with a fairly serious known correctness bug, and I frankly feel bad for "the project" that this has been known for so long without action, so I'm a bit biased in wanting to get it fixed ASAP.
>
> But maybe I'm overstating the urgency here, and maybe option #1 is a better way forward.
>
> --
> Sylvain
Re: [DISCUSS] CASSANDRA-12126: LWTs correcteness and performance
Regarding option #4, I'll remark that experience tends to suggest users don't consistently read the `NEWS.txt` file on upgrade, so option #4 will likely essentially mean "LWT has a correctness issue, but once it has broken your data enough that you notice, you'll be able to dig up the proper flag to fix it for next time". I guess it's better than nothing, of course, but I'll admit that defaulting to "opt-in correctness", especially for a feature (LWT) that exists solely to provide additional guarantees, is something I have a hard time rallying behind.

But a performance regression is a regression; I'm not shrugging it off. Still, I feel we shouldn't leave LWT with a fairly serious known correctness bug, and I frankly feel bad for "the project" that this has been known for so long without action, so I'm a bit biased in wanting to get it fixed ASAP.

But maybe I'm overstating the urgency here, and maybe option #1 is a better way forward.

--
Sylvain
Re: [DISCUSS] CASSANDRA-12126: LWTs correcteness and performance
Knowing there is a correctness issue in LWT, and given that users use LWT primarily for correctness, my opinion is that we should commit the correctness patch (making it one of #1, #3 or #4).

I agree we should not cause further delay to the 4.0 release (making it one of #3 or #4).

The con for #3 is that applications may have to rework their (and their downstream systems') configurations to accommodate the potential performance regression, which may not be ideal for the seamless 4.0 upgrade we expect users to experience.

Now, given that this correctness issue has existed since the beginning, existing LWT users would potentially notice no difference with respect to correctness, since they may have already worked around this bug (if they noticed it), so +1 to option #4.

On Wed, Nov 11, 2020 at 1:49 PM Benedict Elliott Smith wrote:

> In my opinion, a similar calculus should be applied to 3.0 and 3.11. This is an (arguably quite serious) bug, so whatever is not overly onerous to backport should be considered while they are supported. The work under discussion has two components: a replacement for the core consensus algorithm, and mechanisms to ensure safety across range movements. The latter might be more invasive for 3.x, but the former should be quite easy to backport and as such probably quite well justified.
>
> > can it also be pluggable (either opt-in or opt-out)?
>
> I think pluggable means something different to opt-in/opt-out, at least to me. I'm all for more pluggability, and also for more optionality, but the decision is very sensitive to context. We need to be able to select between our options, which for consensus practically means supporting live migration, which is exceptionally challenging in any general sense (and perhaps inherently non-pluggable).
>
> As to future development for consensus, I personally hope the work we are discussing here will be a strong platform for it, but obviously that's for the community to decide later on. I think the work to take it forward to something EPaxos-like will not be that herculean, with some incremental milestones en route. But that's a totally different discussion for the future, and either a CEP or a small intercollegiate working group.
>
> On 11/11/2020, 18:48, "Michael Semb Wever" wrote:
>
> > Regarding CASSANDRA-12126 and 4.0 we are facing several options and Benedict, Sylvain and I wanted to get the community feedback on them.
> >
> > We can:
> >
> > 1. Try to use Benedict's proposal for 4.0 if the community has the appetite for it. The main issue there is some potential extra delay for 4.0.
> > 2. Do nothing for 4.0, meaning do not commit the current patch. We have lived a long time with that issue and we can probably wait a bit more for a proper solution.
> > 3. Commit the patch as such, fixing the correctness but potentially introducing some performance issue until we release a better solution.
> > 4. Change the patch to default to the current behavior but allow people to enable the new one if the correctness is a problem for them.
>
> If these options are for 4.0, is it then (4) that is getting applied to 3.0 and 3.11?
>
> If that is the case, then I would vote for also applying (4) to 4.0, given we are now in front of beta4. Please let's not further delay 4.0.
>
> Post 4.0, if (1) is as described "a parallel implementation of the same underlying Paxos algorithm", can it also be pluggable (either opt-in or opt-out)? And would/could EPaxos become pluggable too in a similar manner (if it eventuates)? I'm in favour of providing more pluggable interfaces into C*, along with the code quality improvements that would have to accompany them.
Re: [DISCUSS] CASSANDRA-12126: LWTs correcteness and performance
In my opinion, a similar calculus should be applied to 3.0 and 3.11. This is an (arguably quite serious) bug, so whatever is not overly onerous to backport should be considered while they are supported. The work under discussion has two components: a replacement for the core consensus algorithm, and mechanisms to ensure safety across range movements. The latter might be more invasive for 3.x, but the former should be quite easy to backport and as such probably quite well justified.

> can it also be pluggable (either opt-in or opt-out)?

I think pluggable means something different to opt-in/opt-out, at least to me. I'm all for more pluggability, and also for more optionality, but the decision is very sensitive to context. We need to be able to select between our options, which for consensus practically means supporting live migration, which is exceptionally challenging in any general sense (and perhaps inherently non-pluggable).

As to future development for consensus, I personally hope the work we are discussing here will be a strong platform for it, but obviously that's for the community to decide later on. I think the work to take it forward to something EPaxos-like will not be that herculean, with some incremental milestones en route. But that's a totally different discussion for the future, and either a CEP or a small intercollegiate working group.

On 11/11/2020, 18:48, "Michael Semb Wever" wrote:

> Regarding CASSANDRA-12126 and 4.0 we are facing several options and Benedict, Sylvain and I wanted to get the community feedback on them.
>
> We can:
>
> 1. Try to use Benedict's proposal for 4.0 if the community has the appetite for it. The main issue there is some potential extra delay for 4.0.
> 2. Do nothing for 4.0, meaning do not commit the current patch. We have lived a long time with that issue and we can probably wait a bit more for a proper solution.
> 3. Commit the patch as such, fixing the correctness but potentially introducing some performance issue until we release a better solution.
> 4. Change the patch to default to the current behavior but allow people to enable the new one if the correctness is a problem for them.

If these options are for 4.0, is it then (4) that is getting applied to 3.0 and 3.11?

If that is the case, then I would vote for also applying (4) to 4.0, given we are now in front of beta4. Please let's not further delay 4.0.

Post 4.0, if (1) is as described "a parallel implementation of the same underlying Paxos algorithm", can it also be pluggable (either opt-in or opt-out)? And would/could EPaxos become pluggable too in a similar manner (if it eventuates)? I'm in favour of providing more pluggable interfaces into C*, along with the code quality improvements that would have to accompany them.
Re: [DISCUSS] CASSANDRA-12126: LWTs correcteness and performance
> Regarding CASSANDRA-12126 and 4.0 we are facing several options and Benedict, Sylvain and I wanted to get the community feedback on them.
>
> We can:
>
> 1. Try to use Benedict's proposal for 4.0 if the community has the appetite for it. The main issue there is some potential extra delay for 4.0.
> 2. Do nothing for 4.0, meaning do not commit the current patch. We have lived a long time with that issue and we can probably wait a bit more for a proper solution.
> 3. Commit the patch as such, fixing the correctness but potentially introducing some performance issue until we release a better solution.
> 4. Change the patch to default to the current behavior but allow people to enable the new one if the correctness is a problem for them.

If these options are for 4.0, is it then (4) that is getting applied to 3.0 and 3.11?

If that is the case, then I would vote for also applying (4) to 4.0, given we are now in front of beta4. Please let's not further delay 4.0.

Post 4.0, if (1) is as described "a parallel implementation of the same underlying Paxos algorithm", can it also be pluggable (either opt-in or opt-out)? And would/could EPaxos become pluggable too in a similar manner (if it eventuates)? I'm in favour of providing more pluggable interfaces into C*, along with the code quality improvements that would have to accompany them.
Re: [DISCUSS] CASSANDRA-12126: LWTs correcteness and performance
Got it. Thanks for the extra context. No real opinion here. :)

On Wed, Nov 11, 2020 at 11:29 AM Benedict Elliott Smith wrote:

> It's been there since the beginning.
>
> If we were to consider the alternative proposal for 4.0, it would not have to be blocking for release. I had planned to come forward after 4.0, primarily because I did not want to create further political complexities for the project at this time, but also because I do not presently have the time to produce all of the documentation we might like for such a proposal. However, the work is ready, has already been reviewed by multiple committers, has had more extensive testing than any feature I'm aware of to date, and could be made available for 4.0 in fairly short order. While the work itself is non-trivial, the work to integrate it is not complex. It would also be optional, and configurable at runtime.
>
> The only likely blocker would be the process of review, and any other due diligence the project might want to undertake. I am absolutely not advocating for or against an accelerated timescale; I have no personal preference for the approach taken, and am just providing this for context.
>
> On 11/11/2020, 16:18, "Joshua McKenzie" wrote:
>
> How old is the defect surfaced by C-12126? I.e., is this something we've had since the initial introduction of Paxos, or is it a regression we introduced somewhere along the way?
>
> On Wed, Nov 11, 2020 at 11:03 AM Benjamin Lerer <benjamin.le...@datastax.com> wrote:
>
> > CASSANDRA-12126 addresses one correctness issue of Lightweight Transactions. Unfortunately, the current patch developed by Sylvain and Benedict requires an extra round trip between the coordinator and the replicas for SERIAL and LOCAL_SERIAL reads. After some experimentation, Benedict discovered that this extra round trip could lead to a significant increase in timeouts for read-heavy workloads.
> >
> > Users for whom this behavior is a problem will be able to switch back to the old behavior using a system property, therefore choosing performance over correctness.
> >
> > On the side, Benedict has worked on another approach that does not suffer from that performance problem and also addresses some LWT correctness issues that can happen when adding or removing nodes. He initially intended to deliver that improvement in 4.X but can try to incorporate it into 4.0.
> >
> > Regarding CASSANDRA-12126 and 4.0 we are facing several options and Benedict, Sylvain and I wanted to get the community feedback on them.
> >
> > We can:
> >
> > 1. Try to use Benedict's proposal for 4.0 if the community has the appetite for it. The main issue there is some potential extra delay for 4.0.
> > 2. Do nothing for 4.0, meaning do not commit the current patch. We have lived a long time with that issue and we can probably wait a bit more for a proper solution.
> > 3. Commit the patch as such, fixing the correctness but potentially introducing some performance issue until we release a better solution.
> > 4. Change the patch to default to the current behavior but allow people to enable the new one if the correctness is a problem for them.
> >
> > Thanks in advance for your feedback.
Re: [DISCUSS] CASSANDRA-12126: LWTs correcteness and performance
It's been there since the beginning.

If we were to consider the alternative proposal for 4.0, it would not have to be blocking for release. I had planned to come forward after 4.0, primarily because I did not want to create further political complexities for the project at this time, but also because I do not presently have the time to produce all of the documentation we might like for such a proposal. However, the work is ready, has already been reviewed by multiple committers, has had more extensive testing than any feature I'm aware of to date, and could be made available for 4.0 in fairly short order. While the work itself is non-trivial, the work to integrate it is not complex. It would also be optional, and configurable at runtime.

The only likely blocker would be the process of review, and any other due diligence the project might want to undertake. I am absolutely not advocating for or against an accelerated timescale; I have no personal preference for the approach taken, and am just providing this for context.

On 11/11/2020, 16:18, "Joshua McKenzie" wrote:

How old is the defect surfaced by C-12126? I.e., is this something we've had since the initial introduction of Paxos, or is it a regression we introduced somewhere along the way?

On Wed, Nov 11, 2020 at 11:03 AM Benjamin Lerer wrote:

> CASSANDRA-12126 addresses one correctness issue of Lightweight Transactions. Unfortunately, the current patch developed by Sylvain and Benedict requires an extra round trip between the coordinator and the replicas for SERIAL and LOCAL_SERIAL reads. After some experimentation, Benedict discovered that this extra round trip could lead to a significant increase in timeouts for read-heavy workloads.
>
> Users for whom this behavior is a problem will be able to switch back to the old behavior using a system property, therefore choosing performance over correctness.
>
> On the side, Benedict has worked on another approach that does not suffer from that performance problem and also addresses some LWT correctness issues that can happen when adding or removing nodes. He initially intended to deliver that improvement in 4.X but can try to incorporate it into 4.0.
>
> Regarding CASSANDRA-12126 and 4.0 we are facing several options and Benedict, Sylvain and I wanted to get the community feedback on them.
>
> We can:
>
> 1. Try to use Benedict's proposal for 4.0 if the community has the appetite for it. The main issue there is some potential extra delay for 4.0.
> 2. Do nothing for 4.0, meaning do not commit the current patch. We have lived a long time with that issue and we can probably wait a bit more for a proper solution.
> 3. Commit the patch as such, fixing the correctness but potentially introducing some performance issue until we release a better solution.
> 4. Change the patch to default to the current behavior but allow people to enable the new one if the correctness is a problem for them.
>
> Thanks in advance for your feedback.
Re: [DISCUSS] CASSANDRA-12126: LWTs correcteness and performance
How old is the defect surfaced by C-12126? I.e., is this something we've had since the initial introduction of Paxos, or is it a regression we introduced somewhere along the way?

On Wed, Nov 11, 2020 at 11:03 AM Benjamin Lerer wrote:

> CASSANDRA-12126 addresses one correctness issue of Lightweight Transactions. Unfortunately, the current patch developed by Sylvain and Benedict requires an extra round trip between the coordinator and the replicas for SERIAL and LOCAL_SERIAL reads. After some experimentation, Benedict discovered that this extra round trip could lead to a significant increase in timeouts for read-heavy workloads.
>
> Users for whom this behavior is a problem will be able to switch back to the old behavior using a system property, therefore choosing performance over correctness.
>
> On the side, Benedict has worked on another approach that does not suffer from that performance problem and also addresses some LWT correctness issues that can happen when adding or removing nodes. He initially intended to deliver that improvement in 4.X but can try to incorporate it into 4.0.
>
> Regarding CASSANDRA-12126 and 4.0 we are facing several options and Benedict, Sylvain and I wanted to get the community feedback on them.
>
> We can:
>
> 1. Try to use Benedict's proposal for 4.0 if the community has the appetite for it. The main issue there is some potential extra delay for 4.0.
> 2. Do nothing for 4.0, meaning do not commit the current patch. We have lived a long time with that issue and we can probably wait a bit more for a proper solution.
> 3. Commit the patch as such, fixing the correctness but potentially introducing some performance issue until we release a better solution.
> 4. Change the patch to default to the current behavior but allow people to enable the new one if the correctness is a problem for them.
>
> Thanks in advance for your feedback.
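Benjamin describes a trade-off: the patch adds one coordinator-replica round trip to SERIAL and LOCAL_SERIAL reads, and a system property lets users switch back. Options #3 and #4 differ in which behaviour is the default. The sketch below illustrates both points under stated assumptions and is not Cassandra code. The property name `example.lwt.linearizable_serial_reads` and all three functions are hypothetical. A Python mapping stands in for JVM system properties. The baseline of two round trips, the 400 ms per round trip and the 1000 ms budget are made-up numbers. They only show how one more sequential round trip can push a read past its timeout.

```python
from typing import Mapping

# Hypothetical property name, used only for illustration. The real flag is
# the one referenced by the CASSANDRA-12126 entry in NEWS.txt.
LINEARIZABLE_READS = "example.lwt.linearizable_serial_reads"


def use_linearizable_serial_reads(props: Mapping[str, str], default: bool) -> bool:
    """Option #3 ships with default=True (opt-out); option #4 with default=False (opt-in)."""
    value = props.get(LINEARIZABLE_READS)
    if value is None:
        return default  # operator never set the property
    return value.strip().lower() == "true"


def serial_read_round_trips(legacy_round_trips: int, linearizable: bool) -> int:
    """The patch adds one coordinator-replica round trip to SERIAL/LOCAL_SERIAL reads."""
    return legacy_round_trips + (1 if linearizable else 0)


def within_timeout(round_trip_ms: float, round_trips: int, timeout_ms: float) -> bool:
    """Crude model: round trips run one after another, so latencies add up."""
    return round_trip_ms * round_trips <= timeout_ms


# Option #3, no override: the fix is on, so a read needs one more round trip.
linearizable = use_linearizable_serial_reads({}, default=True)
rounds = serial_read_round_trips(2, linearizable)
print(rounds, within_timeout(400, rounds, 1000))  # 3 False

# Same default, but the operator explicitly opts back into the old behaviour.
legacy = use_linearizable_serial_reads({LINEARIZABLE_READS: "false"}, default=True)
rounds = serial_read_round_trips(2, legacy)
print(rounds, within_timeout(400, rounds, 1000))  # 2 True
```

Under option #4 (`default=False`), a user who never reads `NEWS.txt` never sets the property and stays on the legacy path. This is the crux of Sylvain's objection to "opt-in correctness".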