Re: Stabilising Internode Messaging in 4.0

2019-04-14 Thread Nate McCall
Josh - You are 100% correct on that - I appreciate you calling it out.
In re-reading this, I think that came out sideways and significantly
harsher than I intended. My apologies to Benedict and Aleksey.

My (poorly made) point was that collaboration is the lifeblood of any
open source project. It requires extra work, but sometimes large,
complex efforts demand a level of pragmatism that can be at odds with
that goal.



-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org

Re: Stabilising Internode Messaging in 4.0

2019-04-13 Thread Joshua McKenzie
>
> a couple of people (even if I know them personally,
> consider them friends and are both among the best engineers I've ever
> met) going off in a room and producing something in isolation is more
> or less a giant "f*k you" to the open source process and our community
> as a whole.

Two co-located engineers spending a couple of months collaborating on
something before opening it up for broad consensus debate is hardly as
extreme as I think you're characterizing it. If we want to have a formal
process on this project where people preemptively get consensus on any work
greater than some arbitrary LoC or complexity metric, that's fine, but
let's not retcon this and try and impose restrictions or interpretations on
peoples' behavior after the fact.

Everyone acted in good faith here. Collaborating internationally with
dozens of engineers is Hard; let's assume the best of each other.







Re: Stabilising Internode Messaging in 4.0

2019-04-12 Thread Nate McCall
As someone who has been here a (very) long time and worked on C* in
production envs. back to version 0.4, this large patch - taken by
itself - does, to be frank, scare the shit out of me. In a complex
system, any large change will have side effects that are impossible to
anticipate. I have seen this hold true too many times.

That said, I think we all agree that internode has been a source of
warts since Facebook spun this out 10+ yrs ago, and that we are all
tired of applying band-aids.

As has been discussed elsewhere in this thread - and this is the super
crucial point for me - we also have a substantially better testing
story, internally and externally, coming together than at any point in
the project's past.

This next part is partially selfish, but I want to be 100% clear it is
in the immediate interest of the project's future:
I am getting on stage in about a month to keynote the first
Cassandra-focused event with any notable attendance we have had for a
long time. We are then all going out to Vegas in Sept. to discuss the
future of our project, and ideally have some cool use cases to show a
bunch of users.

For both of these, we need a story to tell. It needs to be clear and
cohesive. And I think it's super important to get in front of these
people and have part of this story be "we took three years because we
didn't compromise on quality." If we don't have our shit together here,
I think we will start losing users at a much faster pace, and we
seriously risk becoming "that thing you can run only if you are a
large company and can put a bunch of people on it who know it
intimately." Whether that is the case or not, *it will* be the
perception. We are just running out of time.

So back to this patch: on the surface, it fixes a lot of stuff and
puts us on the right track for the future. I'm willing to set aside
the number of times I've been burned over the past decade because I
think we are in a much better position - as a whole community - to
find, report and fix issues this patch will introduce and do so much
faster than we ever have.

I do want to end this with one more point because it needs to be
called out: a couple of people (even if I know them personally,
consider them friends, and they are both among the best engineers I've
ever met) going off in a room and producing something in isolation is
more or less a giant "f*k you" to the open source process and our
community as a whole. Internode is a particularly complex, messy,
baggage-ridden component, and there is an argument to be made that
uninterrupted concentration was the only way to achieve this, but it
must be understood that the perception of actions like this is
toe-stepping and devaluation of opinions, and is generally not
conducive to holding a community together. Again, I doubt this was the
intention, but it is the perception. Please let's avoid this in the
future.

In sum, +1. I wish this process were smoother but we're running out of time.

-Nate




Re: Stabilising Internode Messaging in 4.0

2019-04-12 Thread Pavel Yaskevich
It seems to me that the cornerstone here is the development process.
If the work and review were done openly (e.g. on JIRA or GitHub), we
wouldn't be having this post-factum conversation, because all of the
progress would be visible, so it would make sense to just squash before
committing if so preferred.

It's indeed not really bisect-friendly to drop squashed changes into
the repository, but I've been guilty of that myself with SASI for a
number of reasons, so I can't blame the authors for this without
sounding hypocritical.

As I mentioned before, it would be great if we could establish a
process for how development is supposed to happen, like other projects
do. But that, like most (if not all) of the other concerns, could be
discussed separately.




Re: Stabilising Internode Messaging in 4.0

2019-04-12 Thread Benedict Elliott Smith
Can you start a new thread to build consensus on your proposals for modifying 
the commit process?

I do not share your views, as already laid out in my first email.  The 
community makes these decisions through building consensus, and potentially a 
vote of the PMC.  This scope of change requires its own thread of discussion.






Re: Stabilising Internode Messaging in 4.0

2019-04-12 Thread Blake Eggleston
It seems like one of the main points of contention isn’t so much the
content of the patch as the amount of review this patch has/will receive
relative to its perceived risk. If it’s the latter, then I think it
would be more effective to explain why that’s the case, and what level
of review would be more appropriate.

I’m personally +0 on requiring additional review. I feel like the 3 people
involved so far have sufficient expertise, and trust them to be responsible, 
including soliciting additional reviews if they feel they’re needed.

If dev@ does collectively want more eyes on this, I’d suggest we solicit 
reviews from people who are very familiar with the messaging code, and let them 
decide what additional work and documentation they’d need to make a review 
manageable, if any. Everyone has their own review style, and there’s no need to 
ask for a bunch of additional work if it’s not needed.




Re: Stabilising Internode Messaging in 4.0

2019-04-12 Thread Jordan West
Since there seems to be an assumption that I haven’t read the code, let
me clarify: I am working on making time to be a reviewer on this, and I
had already spent a few hours with the patch before I sent any replies,
likely more than most who are replying here. Again, because I disagree
on non-technical matters does not mean I haven’t considered the
technical. I am sharing what I think is necessary for the authors to
make review higher quality. I will not compromise my review standards on
a patch like this, as I have said already. Telling me to review it to
talk more about it directly ignores my feedback and requires me to
concede all of my concerns, which, as I said, I won’t do as a reviewer.

And yes, I am arguing for changing how the Cassandra community
approaches large patches, in the same way the freeze changed how we
approached major releases - and that decision has been a net benefit as
measured by quality and stability. Existing community members have
already chimed in in support of things like better commit hygiene.

The past approaches haven’t prioritized quality and stability, and it
really shows. What I and others here are suggesting has worked all over
our industry and is adopted by companies big (like Google, as I linked
previously) and small (like many startups I and others have worked for).
Everything we want to do - better testing, better review, better code -
is made easier with better design review, better discussion, and more
digestible patches, among the many other things suggested in this
thread.

Jordan



Re: Stabilising Internode Messaging in 4.0

2019-04-12 Thread Benedict Elliott Smith
I would once again exhort everyone making these kinds of comments to actually
read the code, and to comment on Jira - preferably with a justification, by
reference to the code, for how or why it would improve the patch.

As far as a design document is concerned, it’s very unclear what is being 
requested.  We already had plans, as Jordan knows, to produce a wiki page for 
posterity, and a blog post closer to release.  However, I have never heard of 
this as a requirement for review, or for commit.  We have so far taken two 
members of the community through the patch over video chat, and would be more 
than happy to do the same for others.  So far nobody has had any difficulty 
getting to grips with its structure.

If the project wants to modify its normal process for putting a patch together, 
this is a whole different can of worms, and I am strongly -1.  I’m not sure 
what precedent we’re trying to set imposing arbitrary constraints pre-commit 
for work that has already met the project’s inclusion criteria.


> On 12 Apr 2019, at 18:58, Pavel Yaskevich  wrote:
> 
> I haven't actually looked at the code





Re: Stabilising Internode Messaging in 4.0

2019-04-12 Thread Pavel Yaskevich
On Fri, Apr 12, 2019 at 10:15 AM Jordan West  wrote:

> I understand these non-technical discussions are not what everyone wants to
> focus on, but they are extremely pertinent to the stability of the project.
> What I would like to see before merging this is below. These are all
> reasonable asks, in my opinion, that will still result in the patch being
> merged - only with even more confidence in its quality. More details on my
> thoughts behind them follow.
>
> 1. Additional third-party/independent reviewers added and required before
> merge.
>
> 2. The patch should at a minimum be broken up (commit-wise, not
> ticket-wise) so the newly added features (CRC/backpressure/virtual tables)
> can be reviewed independently of the large patch set and can be easily
> included or excluded based on community discussion or reviewer feedback -
> using the exception process for new features we have used in the past
> during this freeze (merging 13304 the day after the freeze, the zstd
> changes, etc.).
>
> 3. As Sankalp mentioned, and I believe is in progress, a test plan should
> be published and executed (or at least part of it should be executed before
> the merge; it’s possible some will happen post-merge, but this should be
> minimal).
>
> 4. A (design) doc should be published to make it easier for reviewers to
> approach the code.
>

I haven't actually looked at the code, but these seem like reasonable asks.

I'd expect the mechanical changes split into separate commits, as well as
(at least) the categories (e.g. CRC and framing, backpressure) that
Benedict/Aleksey outlined. A design doc would be great, at least for
posterity's sake.


> With the above said, I apologize if the 8099 comment was a parallel that
> was too close to home for some. I am sure they are not a direct comparison,
> but the parallel I was trying to draw (and continue to) is that the
> development process that led to a project like 8099 continues to be used
> here. My focus for Cassandra is improving the quality and stability of 4.0
> (and ideally the 3.x series along the way) - especially in light of the
> recent status email I sent that included over 25 bugs found in the last 6
> months.
>
> There is no question that the majority of this patch should go in - the
> bug fixes are necessary and we have no alternatives written. The question
> to me, besides increasing confidence in the patch, is: can the authors make
> this better for reviewers by putting some more effort into the
> non-technical aspects of the patch, and should the new features be included
> given the already risky changes proposed in this patch and the demand to
> review them? The goal of the freeze was to reduce the time spent on new
> work so we can improve what existed. We do have a process for exceptions to
> that, and if the community feels strongly about these features then we
> should follow that process - which would involve isolating the changes from
> the larger patch and having them be considered separately.
>
> Further, as someone who has reviewed 13304 and found a bug others didn’t,
> I don’t think having the code authors dictate the complexity or timeframe
> of the review makes sense. That’s not to say I didn’t read the email as
> suggested. I encourage you to consider that it’s possible my experience
> informed my contrary point of view, and how that sort of denigrating and
> unneeded comment affects the community.
>
> Anyways, part of those improvements have to come from how we design and
> develop the database, not just how we test and run it. Having worked on
> several large projects on multiple databases (Cassandra’s SASI*, Riak’s
> cluster metadata/membership, Riak’s bucket types, and Riak’s ring-resizing
> feature, among others) and for large companies (those projects I can’t
> talk about), I am sure it is possible to design and develop features with
> a better process than the one used here. It is certainly possible, and
> hugely beneficial, to break up code into smaller commits (Google also
> feels this way: https://arxiv.org/pdf/1702.01715.pdf), and it’s not
> unreasonable to ask by any means. It should be a pre-requisite. A patch
> like this requires more from the authors than simply writing the code.
> Holding ourselves to a higher standard will increase the quality of the
> database dramatically, the same way committing to real testing has done
> (I again refer to all of the bugs found during the freeze that were not
> found previously).
>
> Hopefully its clear from the above that I am very supportive of getting the
> majority of these changes in. I think it would benefit the future of the
> project if we did that in a more deliberate way than how risky changes like
> this, late in the cycle, were handled in the past. We have an opportunity
> to change that here and it would benefit the project significantly.
> Cassandra’s willingness to jettison code has kept it relevant. Its process
> for doing so however has had negative effects on the databases brand — the
> deployment of the 3.x ser

Re: Stabilising Internode Messaging in 4.0

2019-04-12 Thread Sam Tunnicliffe
+1. Thanks for articulating that so well, Josh.

Sam


Re: Stabilising Internode Messaging in 4.0

2019-04-12 Thread Jordan West
I understand these non-technical discussions are not what everyone wants to
focus on, but they are extremely pertinent to the stability of the project.
What I would like to see before this is merged is listed below. In my
opinion, these are all reasonable asks that will still result in the patch
being merged — only with even more confidence in its quality. More details
on my thoughts behind them follow.

1. Additional third-party/independent reviewers added and required before
merge.

2. The patch should at a minimum be broken up (commit-wise, not
ticket-wise) so the newly added features (CRC/backpressure/virtual tables)
can be reviewed independently of the large patch set and can be easily
included or excluded based on community discussion or reviewer feedback —
using the exception process for new features we have used in the past
during this freeze (merging 13304 the day after the freeze, the zstd
changes, etc.).

3. As Sankalp mentioned, and I believe is in progress, a test plan should
be published and executed (or at least part of it should be executed before
the merge; it's possible some of it will happen post-merge, but this should
be minimal).

4. A (design) doc should be published to make it easier for reviewers to
approach the code.

With the above said, I apologize if the 8099 comment was a parallel that
was too close to home for some. I am sure the two are not a direct
comparison, but the parallel I was trying to draw (and continue to) is that
the development process that led to a project like 8099 continues to be
used here. My focus for Cassandra is improving the Quality and Stability of
4.0 (and ideally the 3.x series along the way) — especially in light of the
recent status email I sent that included over 25 bugs found in the last 6
months.

There is no question that the majority of this patch should go in — the bug
fixes are necessary and we have no alternatives written. The question to
me, besides increasing confidence in the patch, is twofold: can the authors
make this better for reviewers by putting more effort into the
non-technical aspects of the patch, and should the new features be included
given the already risky changes proposed in this patch and the demands of
reviewing them? The goal of the freeze was to reduce the time spent on new
work so we can improve what already exists. We do have a process for
exceptions to that, and if the community feels strongly about these
features then we should follow that process — which would involve isolating
the changes from the larger patch and having them considered separately.

Further, as someone who has reviewed 13304 and found a bug others didn’t, I
don’t think having the code authors dictate the complexity or timeframe of
the review makes sense. That’s not to say I didn’t read the email as
suggested. I encourage you to consider that it’s possible my experience
informed my contrary point of view, and how that sort of denigrating and
unneeded comment affects the community.

Anyway, part of those improvements has to come from how we design and
develop the database, not just how we test and run it. Having worked on
several large projects on multiple databases (Cassandra’s SASI*, Riak’s
Cluster Metadata / Membership, Riak’s bucket types, and Riak’s ring
resizing feature, among others) and for large companies (those projects I
can’t talk about), I am sure it is possible to design and develop features
with a better process than the one used here. It is certainly possible, and
hugely beneficial, to break up code into smaller commits (Google also feels
this way: https://arxiv.org/pdf/1702.01715.pdf), and it’s not unreasonable
to ask by any means — it should be a prerequisite. A patch like this
requires more from the authors than simply writing the code. Holding
ourselves to a higher standard will increase the quality of the database
dramatically, the same way committing to real testing has (I again refer to
all of the bugs found during the freeze that were not found previously).

Hopefully it’s clear from the above that I am very supportive of getting
the majority of these changes in. I think it would benefit the future of
the project if we did that in a more deliberate way than risky,
late-in-the-cycle changes like this have been handled in the past. We have
an opportunity to change that here, and it would benefit the project
significantly. Cassandra’s willingness to jettison code has kept it
relevant. Its process for doing so, however, has had negative effects on
the database’s brand — the deployment of the 3.x series was directly
affected by presumptions (real or otherwise) about quality. We could take
this as an opportunity to fix that and keep the nimble aspects of the
database alive at the same time.

Jordan

* Unfortunately, with SASI we did contribute one big commit publicly, but
there was a better commit history during development that could have been
shared, and I would have liked to see us make it more digestible (I think
we would have found more bugs before merge).

On Fri, Apr 12, 2019 at 8:21 AM 

Re: Stabilising Internode Messaging in 4.0

2019-04-12 Thread Blake Eggleston
Well said Josh. You’ve pretty much summarized my thoughts on this as well.

+1 to moving forward with this


Re: Stabilising Internode Messaging in 4.0

2019-04-12 Thread Benedict Elliott Smith
I don’t have a lot to add to Josh’s contribution, except that I’d like to
really hammer home that many people were a party to 8099, and as a project we
learned a great deal from the experience.  It’s a very complex topic that does
not lend itself to simple comparisons, but I think anyone who participated in
that work would find it strange to see these two pieces of work compared.  I
think it would also be helpful if we stopped using it as some kind of bogeyman.
 It seems too easy to forget how much positive change came out of 8099, and how
many bugs we have since avoided because of it.  A lot of people put a herculean
effort into making it happen: by my recollection, Sylvain alone spent perhaps a
year of his time on the initial and follow-up work.  Most of the active
contributors participated in some way, for months in many cases.  Every time we
talk about it in this way, it denigrates a lot of good work.  Using it as a
rhetorical device without seemingly appreciating what was involved, or where it
went wrong, is even less helpful.

On a personal note, I found a couple of the responses to this thread 
disappointing.  As far as I can tell, neither engaged with my email, in which I 
justify our approach on most of their areas of concern.  Nor accepted the 
third-party reviewer’s comments that the patch is manageable to review and of 
acceptable scope.  Nor seemingly read the patch with care to reach their own 
conclusion, with the one concrete factual assertion about the code being false.

We’re trying to build a more positive and constructive community here than 
there has been in the past.  I want to encourage and welcome critical feedback, 
but I think it is incumbent on critics to do some basic research and to engage 
with the target of their criticism - lest they appear to have a goal of 
frustrating a body of work rather than improving it.  Please take a moment to 
read my email, take a closer look at the patch itself, and then engage with us 
on Jira with specific constructive feedback, and concrete positive suggestions.

I'd like to thank everyone else for taking the time to provide their thoughts, 
and we hope to address any lingering concerns.  I would love to hear your 
feedback on our testing and documentation plan [1] that we have put together 
and are executing on.

[1] 
https://cwiki.apache.org/confluence/display/CASSANDRA/4.0+Internode+Messaging+Test+Plan
 




Re: Stabilising Internode Messaging in 4.0

2019-04-12 Thread Pavel Yaskevich

+1. Based on all of the discussion here and in JIRA, it seems to me that
we'd be doing a big disservice to the users by outright rejecting the
changes just based on +/- LoC or complexity of review. From the points
raised, it seems like enabling encryption by default (or even making it the
only available option?), upstreaming Netty-related changes, possible steps
to improve the codebase, as well as how the changes should be formatted to
aid the reviewers, could all be discussed separately.

I think at the end of the day it _might be_ reasonable for the PMC to have
a final say on the matter, maybe even on a point-by-point basis.


Re: Stabilising Internode Messaging in 4.0

2019-04-11 Thread Joshua McKenzie
As one of the two people that re-wrote all our unit tests to try and help
Sylvain get 8099 out the door, I think it's inaccurate to compare the scope
and potential stability impact of this work to the truly sweeping work that
went into 8099 (not to downplay the scope and extent of this work here).

TBH, one of the big reasons we tend to drop such large PRs is the fact that
> Cassandra's code is highly intertwined and it makes it hard to precisely
> change things. We need to iterate towards interfaces that allow us to
> iterate quickly and reduce the amount of highly intertwined code. It helps
> with testing as well. I want us to have a meaningful discussion around it
> before we drop a big PR.

This has been a huge issue with our codebase since at least back when I
first encountered it five years ago. To date, while we have made progress
on this front, it's been nowhere near sufficient to mitigate the issues in
the codebase and allow for large, meaningful changes in smaller incremental
patches or commits. Having yet another discussion around this (there have
been many, many of them over the years) as a blocker for significant work
to go into the codebase is an unnecessary and dangerous blocker. Not to say
we shouldn't formalize a path to try and make incremental progress to
improve the situation, far from it, but blocking other progress on a
decade's worth of accumulated hygiene problems isn't going to make the
community focus on fixing those problems imo, it'll just turn away
contributions.

So let me second jd (and many others') opinion here: "it makes sense to get
it right the first time, rather than applying bandaids to 4.0 and rewriting
things for 4.next". And fwiw, asking people who have already done a huge
body of work to reformat that work into a series of commits or to break up
that work in a fashion that's more to the liking of people not involved in
either the writing of the patch or reviewing of it doesn't make much sense
to me. As I am neither an assignee nor reviewer on this contribution, I
leave it up to the parties involved to do things professionally and with a
high standard of quality. Admittedly, a large code change merging in like
this has implications for rebasing on anyone else's work that's in flight,
but be it one commit merged or 50, or be it one JIRA ticket or ten, the
end-result is the same; any large contribution in any format will ripple
outwards and require re-work from others in the community.

The one thing I *would* strongly argue for is performance benchmarking of
the new messaging code on a representative sample of different
general-purpose queries, LWT's, etc, preferably in a 3 node RF=3 cluster,
plus a healthy suite of jmh micro-benches (assuming they're not already in
the diff. If they are, disregard / sorry). From speaking with Aleksey
offline about this work, my understanding is that that's something they
plan on doing before putting a bow on things.
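(For readers unfamiliar with jmh: it is the standard Java micro-benchmark harness, but it requires an external dependency and harness setup. Purely as an illustration of the kind of messaging hot path worth measuring — not code from the patch, and with invented class and method names — here is a crude, stdlib-only timing loop over a CRC32 checksum of a 1 KiB payload:)

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.zip.CRC32;

// Crude stand-in for a jmh micro-benchmark: measures average time to
// checksum a 1 KiB payload. All names here are illustrative.
public class ChecksumBench
{
    public static long checksum(byte[] payload)
    {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return crc.getValue();
    }

    public static void main(String[] args)
    {
        byte[] payload = new byte[1024];
        ThreadLocalRandom.current().nextBytes(payload);

        long blackhole = 0;           // consume results to prevent dead-code elimination
        int iterations = 200_000;

        // warm-up pass so the JIT compiles the hot path before we time it
        for (int i = 0; i < iterations; i++)
            blackhole += checksum(payload);

        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++)
            blackhole += checksum(payload);
        long elapsed = System.nanoTime() - start;

        System.out.printf("avg %.1f ns/op (blackhole=%d)%n",
                          (double) elapsed / iterations, blackhole);
    }
}
```

A real jmh benchmark would additionally handle forking, multiple measurement iterations, and a proper Blackhole, which is why jmh is the right tool for anything that ends up in the tree.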

In the balance between "fear change because it destabilizes" and "go forth
blindly into that dark night, rewriting All The Things", I think the
Cassandra project's willingness to jettison the old and introduce the new
has served it well in keeping relevant as the years have gone by. I'd hate
to see that culture of progress get mired in a dogmatic adherence to
requirements on commit counts, lines of code allowed / expected on a given
patch set, or any other metrics that might stymie the professional needs of
some of the heaviest contributors to the project.



Re: Stabilising Internode Messaging in 4.0

2019-04-10 Thread Oleksandr Petrov
Sorry to pick only a few points to address, but I think these are ones
where I can contribute productively to the discussion.

> In principle, I agree with the technical improvements you
mention (backpressure / checksumming / etc). These things should be there.
Are they a hard requirement for 4.0?

One thing that comes to mind is protocol versioning and consistency. If the
changes adding checksumming and the handshake do not make it into 4.0, we
grow the upgrade matrix and have to put the changes into a separate protocol
version. I'm not sure how many other internode protocol changes we have
planned for 4.next, but this is definitely something we should keep in mind.
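To make the upgrade-matrix point concrete, here is a hypothetical sketch of
internode version negotiation. The class, constants, and values are
illustrative only; the real logic and version numbers live in Cassandra's
MessagingService and handshake code:

```java
// Hypothetical sketch of internode protocol version negotiation; the names
// and version constants are illustrative, not Cassandra's actual values.
final class VersionNegotiation {
    static final int VERSION_30 = 10;  // 3.0.x wire format (illustrative)
    static final int VERSION_3X = 11;  // 3.x wire format (illustrative)
    static final int VERSION_40 = 12;  // 4.0 wire format (illustrative)

    // Pick the highest protocol version both peers speak. Every extra
    // version kept alive adds a row and column to the upgrade matrix
    // that has to be tested.
    static int negotiate(int ourMax, int peerMax) {
        return Math.min(ourMax, peerMax);
    }

    public static void main(String[] args) {
        // A 4.0 node talking to a 3.x node falls back to the 3.x format.
        System.out.println(negotiate(VERSION_40, VERSION_3X)); // prints 11
    }
}
```

Each additional wire format that ships in a separate release multiplies the
pairs of versions this negotiation (and its tests) must cover.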

> 2. We should not be measuring complexity in LoC with the exception that
all 20k lines *do need to be review* (not just the important parts and
because code refactoring tools have bugs too) and more lines take more time.

Everything should definitely be reviewed, but with different rigour: it is
one thing to review byte arithmetic and protocol formats, and a completely
different thing to verify that a Verb moved from one place to another is
still used. Concentrating on a smaller subset doesn't make the patch
smaller, it just helps to guide reviewers and observers. My primary goal was
to help people, hence my reference to the JIRA comment I'm working on.


On Wed, Apr 10, 2019 at 6:13 PM Sankalp Kohli 
wrote:

> I think we should wait for testing doc on confluence to come up and
> discuss what all needs to be added there to gain confidence.
>
> If the work is more to split the patch into smaller parts and delays 4.0
> even more, can we use time in adding more test coverage, documentation and
> design docs to this component?  Will that be a middle ground here ?
>
> Examples of large changes not going well is due to lack of testing,
> documentation and design docs not because they were big. Being big adds to
> the complexity and increased the total bug count but small changes can be
> equally worse in terms of impact.
>
>
> > On Apr 10, 2019, at 8:53 AM, Jordan West  wrote:
> >
> > There is a lot of discuss here so I’ll try to keep my opinions brief:
> >
> > 1. The bug fixes are a requirement in order to have a stable 4.0. Whether
> > they come from this patch or the original I have less of an opinion. I do
> > think its important to minimize code changes at this time in the
> > development/freeze cycle — including large refactors which add risk
> despite
> > how they are being discussed here. From that perspective, I would prefer
> to
> > see more targeted fixes but since we don’t have them and we have this
> patch
> > the decision is different.
> >
> > 2. We should not be measuring complexity in LoC with the exception that
> all
> > 20k lines *do need to be review* (not just the important parts and
> because
> > code refactoring tools have bugs too) and more lines take more time.
> > Otherwise, its a poor metric for how long this will take to review.
> > Further, it seems odd that the authors are projecting how long it will
> take
> > to review — this should be the charge of the reviewers and I would like
> to
> > hear from them on this. Its clear this a complex patch — as risky as
> > something like 8099 (and the original rewrite by Jason). We should treat
> it
> > as such and not try to merge it in quickly for the sake of it, repeating
> > past mistakes. The goal of reviewing the messaging service work was to do
> > just that. It would be a disservice to rush in these changes now. If the
> > goal is to get the patch in that should be the priority, not completing
> it
> > “in two weeks”. Its clear several community members have pushed back on
> > that and are not comfortable with the time frame.
> >
> > 3. If we need to add special forks of Netty classes to the code because
> of
> > “how we use Netty” that is a concern to me w.r.t to quality, stability,
> and
> > maintenance. Is there a place that documents/justifies our
> non-traditional
> > usage (I saw some in JavaDocs but found it lacking in *why* but had a lot
> > of how/what was changed). Given folks in the community have decent
> > relationships with the Netty team perhaps we should leverage that as
> well.
> > Have we reached out to them?
> >
> > 4. In principle, I agree with the technical improvements you mention
> > (backpressure / checksumming / etc). These things should be there. Are
> they
> > a hard requirement for 4.0? In my opinion no and Dinesh has done a good
> job
> > of providing some reasons as to why so I won’t go into much detail here.
> In
> > short, a bug and a missing safety mechanism are not the same thing. Its
> > also important to consider how many users a change like that covers and
> for
> > what risk — we found a bug in 13304 late in review, had it slipped
> through
> > it would have subjected users to silent corruption they thought couldn’t
> > occur anymore because we included the feature in a prod Cassandra
> release.
> >
> > 5. The patch could seriously benefit from some com

Re: Stabilising Internode Messaging in 4.0

2019-04-10 Thread Dinesh Joshi
> On Apr 10, 2019, at 9:06 AM, Sankalp Kohli  wrote:
> 
> I think we should wait for testing doc on confluence to come up and discuss 
> what all needs to be added there to gain confidence. 
> 
> If the work is more to split the patch into smaller parts and delays 4.0 even 
> more, can we use time in adding more test coverage, documentation and design 
> docs to this component?  Will that be a middle ground here ? 

+1 for finding the right middle ground.

If splitting is the concern, we can look for opportunities to split this.
Here are the benefits –

1. It makes verifying and testing the code a lot easier.
2. It also helps us pinpoint all sorts of regressions (git bisect, anybody?) 
precisely. 
3. It also gives us the opportunity to add the correct interfaces. More on this 
later.
4. It helps make the review quicker.

TBH, one of the big reasons we tend to drop such large PRs is the fact that 
Cassandra's code is highly intertwined, which makes it hard to change things 
precisely. We need to iterate towards interfaces that allow us to move 
quickly and reduce the amount of highly intertwined code. That helps with 
testing as well. I want us to have a meaningful discussion around this before 
we drop a big PR.
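To illustrate the kind of seam interface meant here (this is a hypothetical
example, not Cassandra's actual API): callers depend on a narrow interface,
so unit tests can substitute an in-memory implementation for the real
socket-backed one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical seam interface: production code wires in a socket-backed
// implementation, tests wire in a recording one. Names are illustrative.
interface MessageSink {
    void send(String destination, byte[] payload);
}

final class RecordingSink implements MessageSink {
    final List<String> sent = new ArrayList<>();

    @Override
    public void send(String destination, byte[] payload) {
        // Record instead of touching the network.
        sent.add(destination + ":" + payload.length);
    }
}

class SeamDemo {
    public static void main(String[] args) {
        RecordingSink sink = new RecordingSink();
        sink.send("10.0.0.2", new byte[] { 1, 2, 3 });
        System.out.println(sink.sent); // prints [10.0.0.2:3]
    }
}
```

With a seam like this, the intertwined parts behind the interface can be
reworked and tested independently of the callers.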

> Examples of large changes not going well is due to lack of testing, 
> documentation and design docs not because they were big. Being big adds to 
> the complexity and increased the total bug count but small changes can be 
> equally worse in terms of impact. 

+1 so many times, Sankalp; I recognize this as well. Thank you for calling 
it out.

I was working with some of the PMCs to develop a document with concrete 
guidelines to help us standardize and avoid issues like these. For various 
reasons, that document did not go anywhere. We had a similar debate on the 
sidecar, and we used a CIP to help with it. It did help make the discussion 
more meaningful.

Lack of documentation, as many people have called out, makes it really hard 
for us to review things, especially with such a large change.

Dinesh


> 
>> On Apr 10, 2019, at 8:53 AM, Jordan West  wrote:
>> 
>> There is a lot of discuss here so I’ll try to keep my opinions brief:
>> 
>> 1. The bug fixes are a requirement in order to have a stable 4.0. Whether
>> they come from this patch or the original I have less of an opinion. I do
>> think its important to minimize code changes at this time in the
>> development/freeze cycle — including large refactors which add risk despite
>> how they are being discussed here. From that perspective, I would prefer to
>> see more targeted fixes but since we don’t have them and we have this patch
>> the decision is different.
>> 
>> 2. We should not be measuring complexity in LoC with the exception that all
>> 20k lines *do need to be review* (not just the important parts and because
>> code refactoring tools have bugs too) and more lines take more time.
>> Otherwise, its a poor metric for how long this will take to review.
>> Further, it seems odd that the authors are projecting how long it will take
>> to review — this should be the charge of the reviewers and I would like to
>> hear from them on this. Its clear this a complex patch — as risky as
>> something like 8099 (and the original rewrite by Jason). We should treat it
>> as such and not try to merge it in quickly for the sake of it, repeating
>> past mistakes. The goal of reviewing the messaging service work was to do
>> just that. It would be a disservice to rush in these changes now. If the
>> goal is to get the patch in that should be the priority, not completing it
>> “in two weeks”. Its clear several community members have pushed back on
>> that and are not comfortable with the time frame.
>> 
>> 3. If we need to add special forks of Netty classes to the code because of
>> “how we use Netty” that is a concern to me w.r.t to quality, stability, and
>> maintenance. Is there a place that documents/justifies our non-traditional
>> usage (I saw some in JavaDocs but found it lacking in *why* but had a lot
>> of how/what was changed). Given folks in the community have decent
>> relationships with the Netty team perhaps we should leverage that as well.
>> Have we reached out to them?
>> 
>> 4. In principle, I agree with the technical improvements you mention
>> (backpressure / checksumming / etc). These things should be there. Are they
>> a hard requirement for 4.0? In my opinion no and Dinesh has done a good job
>> of providing some reasons as to why so I won’t go into much detail here. In
>> short, a bug and a missing safety mechanism are not the same thing. Its
>> also important to consider how many users a change like that covers and for
>> what risk — we found a bug in 13304 late in review, had it slipped through
>> it would have subjected users to silent corruption they thought couldn’t
>> occur anymore because we included the feature in a prod Cassandra release.
>> 
>> 5. The patch could seriously benefit from some commit hygiene that would
>>

Re: Stabilising Internode Messaging in 4.0

2019-04-10 Thread Sankalp Kohli
I think we should wait for the testing doc on Confluence to come up and 
discuss what needs to be added there to gain confidence. 

If splitting the patch into smaller parts is more work and delays 4.0 even 
further, can we instead use the time to add more test coverage, 
documentation, and design docs for this component? Would that be a middle 
ground here? 

Examples of large changes going badly come down to a lack of testing, 
documentation, and design docs, not to the changes being big. Being big adds 
complexity and increases the total bug count, but small changes can be 
equally bad in terms of impact. 


> On Apr 10, 2019, at 8:53 AM, Jordan West  wrote:
> 
> There is a lot of discuss here so I’ll try to keep my opinions brief:
> 
> 1. The bug fixes are a requirement in order to have a stable 4.0. Whether
> they come from this patch or the original I have less of an opinion. I do
> think its important to minimize code changes at this time in the
> development/freeze cycle — including large refactors which add risk despite
> how they are being discussed here. From that perspective, I would prefer to
> see more targeted fixes but since we don’t have them and we have this patch
> the decision is different.
> 
> 2. We should not be measuring complexity in LoC with the exception that all
> 20k lines *do need to be review* (not just the important parts and because
> code refactoring tools have bugs too) and more lines take more time.
> Otherwise, its a poor metric for how long this will take to review.
> Further, it seems odd that the authors are projecting how long it will take
> to review — this should be the charge of the reviewers and I would like to
> hear from them on this. Its clear this a complex patch — as risky as
> something like 8099 (and the original rewrite by Jason). We should treat it
> as such and not try to merge it in quickly for the sake of it, repeating
> past mistakes. The goal of reviewing the messaging service work was to do
> just that. It would be a disservice to rush in these changes now. If the
> goal is to get the patch in that should be the priority, not completing it
> “in two weeks”. Its clear several community members have pushed back on
> that and are not comfortable with the time frame.
> 
> 3. If we need to add special forks of Netty classes to the code because of
> “how we use Netty” that is a concern to me w.r.t to quality, stability, and
> maintenance. Is there a place that documents/justifies our non-traditional
> usage (I saw some in JavaDocs but found it lacking in *why* but had a lot
> of how/what was changed). Given folks in the community have decent
> relationships with the Netty team perhaps we should leverage that as well.
> Have we reached out to them?
> 
> 4. In principle, I agree with the technical improvements you mention
> (backpressure / checksumming / etc). These things should be there. Are they
> a hard requirement for 4.0? In my opinion no and Dinesh has done a good job
> of providing some reasons as to why so I won’t go into much detail here. In
> short, a bug and a missing safety mechanism are not the same thing. Its
> also important to consider how many users a change like that covers and for
> what risk — we found a bug in 13304 late in review, had it slipped through
> it would have subjected users to silent corruption they thought couldn’t
> occur anymore because we included the feature in a prod Cassandra release.
> 
> 5. The patch could seriously benefit from some commit hygiene that would
> make it easier for folks to review. Had this been done not only would
> review be easier but also the piecemeal breakup of features/fixes could
> have been done more easily. I don’t buy the premise that this wasn’t
> possible. If we had to add the feature/fix later it would have been
> possible. I’m sure there was a smart way we could have organized it, if it
> was a priority.
> 
> 6. Have any upgrade tests been done/added? I would also like to see some
> performance benchmarks before merging so that the patch is in a similar
> state as 14503 in terms of performance testing.
> 
> I’m sure I have more thoughts but these seem like the important ones for
> now.
> 
> Jordan
> 
> On Wed, Apr 10, 2019 at 8:21 AM Dinesh Joshi 
> wrote:
> 
>> Here's are my 2¢.
>> 
>> I think the general direction of this work is valuable but I have a few
>> concerns I’d like to address. More inline.
>> 
>>> On Apr 4, 2019, at 1:13 PM, Aleksey Yeschenko 
>> wrote:
>>> 
>>> I would like to propose CASSANDRA-15066 [1] - an important set of bug
>> fixes
>>> and stability improvements to internode messaging code that Benedict, I,
>>> and others have been working on for the past couple of months.
>>> 
>>> First, some context.   This work started off as a review of
>> CASSANDRA-14503
>>> (Internode connection management is race-prone [2]), CASSANDRA-13630
>>> (Support large internode messages with netty) [3], and a pre-4.0
>>> confirmatory review of such a major new feature.
>>> 
>>> However, as w

Re: Stabilising Internode Messaging in 4.0

2019-04-10 Thread Jordan West
There is a lot to discuss here, so I’ll try to keep my opinions brief:

1. The bug fixes are a requirement in order to have a stable 4.0. Whether
they come from this patch or the original, I have less of an opinion. I do
think it’s important to minimize code changes at this time in the
development/freeze cycle — including large refactors, which add risk despite
how they are being discussed here. From that perspective, I would prefer to
see more targeted fixes, but since we don’t have them and we have this
patch, the decision is different.

2. We should not be measuring complexity in LoC, with the exception that all
20k lines *do need to be reviewed* (not just the important parts, because
code refactoring tools have bugs too) and more lines take more time.
Otherwise, it’s a poor metric for how long this will take to review.
Further, it seems odd that the authors are projecting how long it will take
to review — this should be the charge of the reviewers, and I would like to
hear from them on this. It’s clear this is a complex patch — as risky as
something like 8099 (and the original rewrite by Jason). We should treat it
as such and not try to merge it in quickly for the sake of it, repeating
past mistakes. The goal of reviewing the messaging service work was to do
just that. It would be a disservice to rush in these changes now. If the
goal is to get the patch in, that should be the priority, not completing it
“in two weeks”. It’s clear several community members have pushed back on
that and are not comfortable with the time frame.

3. If we need to add special forks of Netty classes to the code because of
“how we use Netty”, that is a concern to me w.r.t. quality, stability, and
maintenance. Is there a place that documents/justifies our non-traditional
usage? (I saw some in the JavaDocs, but found it heavy on how/what was
changed and lacking in the *why*.) Given folks in the community have decent
relationships with the Netty team, perhaps we should leverage that as well.
Have we reached out to them?

4. In principle, I agree with the technical improvements you mention
(backpressure / checksumming / etc). These things should be there. Are they
a hard requirement for 4.0? In my opinion, no, and Dinesh has done a good
job of providing some reasons as to why, so I won’t go into much detail
here. In short, a bug and a missing safety mechanism are not the same thing.
It’s also important to consider how many users a change like that covers and
for what risk — we found a bug in 13304 late in review; had it slipped
through, it would have subjected users to silent corruption they thought
couldn’t occur anymore, because we included the feature in a prod Cassandra
release.

5. The patch could seriously benefit from some commit hygiene that would
make it easier for folks to review. Had this been done, not only would
review be easier, but the piecemeal breakup of features/fixes could also
have been done more readily. I don’t buy the premise that this wasn’t
possible. If we had to add the feature/fix later, it would have been
possible. I’m sure there is a smart way we could have organized it, if it
had been a priority.

6. Have any upgrade tests been done/added? I would also like to see some
performance benchmarks before merging so that the patch is in a similar
state as 14503 in terms of performance testing.

I’m sure I have more thoughts but these seem like the important ones for
now.

Jordan

On Wed, Apr 10, 2019 at 8:21 AM Dinesh Joshi 
wrote:

> Here's are my 2¢.
>
> I think the general direction of this work is valuable but I have a few
> concerns I’d like to address. More inline.
>
> > On Apr 4, 2019, at 1:13 PM, Aleksey Yeschenko 
> wrote:
> >
> > I would like to propose CASSANDRA-15066 [1] - an important set of bug
> fixes
> > and stability improvements to internode messaging code that Benedict, I,
> > and others have been working on for the past couple of months.
> >
> > First, some context.   This work started off as a review of
> CASSANDRA-14503
> > (Internode connection management is race-prone [2]), CASSANDRA-13630
> > (Support large internode messages with netty) [3], and a pre-4.0
> > confirmatory review of such a major new feature.
> >
> > However, as we dug in, we realized this was insufficient. With more than
> 50
> > bugs uncovered [4] - dozens of them critical to correctness and/or
> > stability of the system - a substantial rework was necessary to
> guarantee a
> > solid internode messaging subsystem for the 4.0 release.
> >
> > In addition to addressing all of the uncovered bugs [4] that were unique
> to
> > trunk + 13630 [3] + 14503 [2], we used this opportunity to correct some
> > long-existing, pre-4.0 bugs and stability issues. For the complete list
> of
> > notable bug fixes, read the comments to CASSANDRA-15066 [1]. But I’d like
> > to highlight a few.
>
> Do you have regression tests that will fail if these bugs are reintroduced
> at a later point?
>
> > # Lack of message integrity checks
> >
> > It’s known that TCP checksums 

Re: Stabilising Internode Messaging in 4.0

2019-04-10 Thread Dinesh Joshi
Here are my 2¢.

I think the general direction of this work is valuable but I have a few 
concerns I’d like to address. More inline.

> On Apr 4, 2019, at 1:13 PM, Aleksey Yeschenko  wrote:
> 
> I would like to propose CASSANDRA-15066 [1] - an important set of bug fixes
> and stability improvements to internode messaging code that Benedict, I,
> and others have been working on for the past couple of months.
> 
> First, some context.   This work started off as a review of CASSANDRA-14503
> (Internode connection management is race-prone [2]), CASSANDRA-13630
> (Support large internode messages with netty) [3], and a pre-4.0
> confirmatory review of such a major new feature.
> 
> However, as we dug in, we realized this was insufficient. With more than 50
> bugs uncovered [4] - dozens of them critical to correctness and/or
> stability of the system - a substantial rework was necessary to guarantee a
> solid internode messaging subsystem for the 4.0 release.
> 
> In addition to addressing all of the uncovered bugs [4] that were unique to
> trunk + 13630 [3] + 14503 [2], we used this opportunity to correct some
> long-existing, pre-4.0 bugs and stability issues. For the complete list of
> notable bug fixes, read the comments to CASSANDRA-15066 [1]. But I’d like
> to highlight a few.

Do you have regression tests that will fail if these bugs are reintroduced at a 
later point?

> # Lack of message integrity checks
> 
> It’s known that TCP checksums are too weak [5] and Ethernet CRC cannot be
> relied upon [6] for integrity. With sufficient scale or time, you will hit
> bit flips. Sadly, most of the time these go undetected.  Cassandra’s
> replication model makes this issue much more serious, as the faulty data
> can infect the cluster.
> 
> We recognised this problem, and recently introduced a fix for server-client
> messages, implementing CRCs in CASSANDRA-13304 (Add checksumming to the
> native protocol) [7].

This was discussed in my review and Jason created a ticket[1] to track it. We 
explicitly decided to defer this work not only due to the feature freeze in the 
community but also for technical reasons detailed below. 

Regarding new features during the feature freeze window, we have had such 
discussions in the past. The most recent was the one I initiated on the Zstd 
compressor, which went positively, and we moved forward after assessing risk 
& community consensus.

Regarding checksumming, please scroll down to the comments section in the 
link[2] you provided. You'll notice this discussion –

>> Daniel Fox Franke:
>> Please don't design new network protocols that don't either run over TLS or 
>> do some other kind of cryptographic authentication. If you have 
>> cryptographic authentication, then CRC is redundant.
> 
> Evan Jones:
> Good point. Even internal applications inside a data center should be using 
> encryption, and today the performance impact is probably small (I haven't 
> actually measured it myself these days).

Enabling TLS & internode compression are mitigation strategies to avoid data 
corruption in transit. By your own admission in CASSANDRA-13304[3], we don't 
require checksumming if TLS is present. Here's your full quote –

> Aleksey Yeschenko:
> Checksums and TLS are orthogonal. It just so happens that you don't need the 
> former if you already have the latter.

I want to be fair: later you did say that we don't want to force people to 
pay the cost of TLS overhead. However, I would also like to point out that 
with the introduction of Netty for internode communication, we have 4-5x the 
TLS performance thanks to the OpenSSL JNI bindings. You can refer to 
Norman's or my talks on the topic. So TLS is practical & compression is 
necessary for performance. Both strategies work fine to protect against data 
corruption, making checksumming redundant. With SSL certificate hot 
reloading, it also avoids unnecessary cluster restarts, providing maximum 
availability.

In the same vein, it's 2019, and if people are not using TLS for internode 
traffic, that is really, really bad for data security in our industry and we 
should not be encouraging it. In fact, I would go so far as to make TLS the 
default. 

Managing TLS infrastructure is beyond Cassandra's scope, and operators 
should have figured it out by now, for their own and their users' sake. 
Cassandra makes it super easy & performant to have TLS enabled. People 
should be using it.
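For reference, enabling internode TLS is a matter of a few lines in
cassandra.yaml. A minimal sketch (option names as of the 3.x/4.0 era; paths
and passwords are placeholders, and the exact set of options varies slightly
by release, so check the cassandra.yaml shipped with your version):

```yaml
# Sketch of internode TLS configuration in cassandra.yaml.
server_encryption_options:
    internode_encryption: all          # encrypt all node-to-node traffic
    keystore: conf/.keystore           # placeholder path
    keystore_password: changeit        # placeholder
    truststore: conf/.truststore      # placeholder path
    truststore_password: changeit      # placeholder
```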

> But until CASSANDRA-15066 [1] lands, this is also a critical flaw
> internode. We have addressed it by ensuring that no matter what, whether
> you use SSL or not, whether you use internode compression or not, a
> protocol level CRC is always present, for every message frame. It’s our
> deep and sincere belief that shipping a new implementation of the messaging
> protocol without application-level data integrity checks would be
> unacceptable for a modern database.

My previous technical arguments have provided enough evidence that 
protocol-level checksumming is not a show stopper.
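For readers following the checksumming debate: the mechanism under
discussion is an application-level CRC over each message frame. A minimal
sketch of the idea, using the JDK's CRC32 (this is illustrative, not
Cassandra's actual framing code):

```java
import java.util.zip.CRC32;

// Minimal sketch of per-frame integrity checking: the sender appends a
// CRC32 over the payload, and the receiver recomputes and compares it
// before trusting the bytes. Not Cassandra's actual framing code.
final class FrameCrc {
    static long crcOf(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return crc.getValue();
    }

    static boolean verify(byte[] payload, long expectedCrc) {
        return crcOf(payload) == expectedCrc;
    }

    public static void main(String[] args) {
        byte[] frame = "hello".getBytes();
        long crc = crcOf(frame);
        System.out.println(verify(frame, crc));  // prints true
        frame[0] ^= 0x01;                        // simulate a bit flip
        System.out.println(verify(frame, crc));  // prints false
    }
}
```

CRC32 detects all single-bit errors, which is exactly the class of silent
corruption (bit flips below TCP's weak checksum) the thread is arguing
about; TLS provides equivalent integrity via its MAC, hence the
"redundant if you have TLS" position.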

Re: Stabilising Internode Messaging in 4.0

2019-04-10 Thread Oleksandr Petrov
To be fair, even though the patch totals 20K LoC, the core of the patch
is within reasonable bounds (around net.async.*). There are many changes
because of code that got moved around. Some changes look large because Java
is quite a verbose language (e.g., metric tables). Unfortunately, I do not
see a good way to split out the bug fixes specific to the Netty refactor
from the other changes, since some things were fundamentally reworked.

I'll leave a more elaborate comment that summarises the way I've been
approaching the patch review, which might help to identify the "important"
parts that one should concentrate on while reviewing.

I've been reviewing the patch and can say that it has several advantages,
including the fact that the incoming and outgoing paths are now easier to
test (e.g., to write unit tests for). This has helped to validate rather
complex scenarios, such as the handshake protocol, back pressure, dropping
messages, large messages, expiry, counting, and more. The patch also
integrates well with the in-jvm tests, which allowed us to verify several
behaviors that would otherwise have been somewhat difficult to reproduce.

I do agree that validating this patch is no easy endeavor but, that said,
we would have to invest a similar amount of time to validate 4.0 with or
without this patch.

I'm personally leaning towards including it in 4.0, since this helps to keep
the community unified, testing the same version together, instead of some
concentrating on making 4.0 work while others skip it and proceed with
4.next.


On Wed, Apr 10, 2019 at 4:55 PM Benedict Elliott Smith 
wrote:

> Appreciate everyone's input. It sounds like there's broad agreement that
> the fixes need to make it into 4.0, which I'm really pleased to see. The
> question seems to be moving towards scope and timing.
>
> TL;DR: This patch is probably the most efficient route to a stable 4.0.
> The patch is already reviewed, extensively tested, and more thorough
> testing is coming. From our perspective, a timeline for merging this is on
> the order of two weeks.
>
> To best answer the question of ideal scope and timing, it's perhaps
> worthwhile considering what our ideal approach would look like, given
> realistic breathing room, and then how other pressures might prefer to
> modify that. In my opinion, both views endorse proceeding with the patch
> largely as it stands, though I will highlight some of the strongest cases
> for splitting.
>
> To answer the first question: Were I starting with 3.0, and producing a
> major refactor of messaging, would I have produced this patch, or one very
> like it? The answer is yes.
>
> These changes have been designed together, intentionally. By making them
> simultaneously, we not only reduce the review and work burden, we reduce
> the overall risk by ensuring all of the semantics of each component are
> designed in harmony. A trickle of changes requires subtly updating the
> semantics of each component, across multiple constantly shifting interfaces
> between old and new. Each time we touch one of these interfaces, we risk
> failing to properly modify (or more likely hack) one side to understand the
> other, and carrying those misunderstandings and hacks through to later
> patches.
>
> Each time we do this, it is costly in both time and risk. The project
> can't really afford either, even in the wider picture sense. There is
> already too much work to do, and risk to mitigate.
>
> Just as importantly, most of these changes are necessary, and actually fix
> bugs. For example, the “performance improvements” from re-implementing
> Netty classes are actually to avoid bugs in our usage, since they are
> designed for use only on the event loop.
>
> - Using our own allocator avoids cross-thread recycling, which can cause
> unbounded heap growth in Netty today (and is the topic of active discussion
> on the Netty bug tracker [1]). We already have our own allocator that
> works, and arguably it is lowest risk to repurpose it. We also reduce risk
> by using existing ByteBuffer methods, keeping the code more idiomatic.
> - AsyncPromise is used to avoid blocking the event loop. Modifying and
> invoking listeners on a DefaultPromise requires taking a mutex, and we do
> this on threads besides the event loop. This could stall the event loop for
> an entire scheduler quanta (or more, if the owning thread is low priority
> or has been extremely busy recently), which is a bug for a real-time
> networking thread that could be servicing dozens of connections.
> - FutureCombiner in Netty is not thread-safe, and we need it to be.
>
> Some are perhaps not bug fixes, but a fundamental part of the design of
> the patch. You either get them for free, or you write a different patch.
> For example, checksumming, which falls naturally out of framing - which is
> integral to the new message flow. Or dropping messages eagerly when reading
> off the wire, which is simply natural to do in this design. Back pressure
> f

Re: Stabilising Internode Messaging in 4.0

2019-04-10 Thread Benedict Elliott Smith
Appreciate everyone's input. It sounds like there's broad agreement that the 
fixes need to make it into 4.0, which I'm really pleased to see. The question 
seems to be moving towards scope and timing.

TL;DR: This patch is probably the most efficient route to a stable 4.0. The 
patch is already reviewed, extensively tested, and more thorough testing is 
coming. From our perspective, a timeline for merging this is on the order of 
two weeks.

To best answer the question of ideal scope and timing, it's perhaps worthwhile 
considering what our ideal approach would look like, given realistic breathing 
room, and then how other pressures might prefer to modify that. In my opinion, 
both views endorse proceeding with the patch largely as it stands, though I 
will highlight some of the strongest cases for splitting.

To answer the first question: Were I starting with 3.0, and producing a major 
refactor of messaging, would I have produced this patch, or one very like it? 
The answer is yes.

These changes have been designed together, intentionally. By making them 
simultaneously, we not only reduce the review and work burden, we reduce the 
overall risk by ensuring all of the semantics of each component are designed in 
harmony. A trickle of changes requires subtly updating the semantics of each 
component, across multiple constantly shifting interfaces between old and new. 
Each time we touch one of these interfaces, we risk failing to properly modify 
(or more likely hack) one side to understand the other, and carrying those 
misunderstandings and hacks through to later patches.

Each time we do this, it is costly in both time and risk. The project can't 
really afford either, even in the wider picture sense. There is already too 
much work to do, and risk to mitigate.

Just as importantly, most of these changes are necessary, and actually fix 
bugs. For example, the “performance improvements” from re-implementing Netty 
classes are actually to avoid bugs in our usage, since they are designed for 
use only on the event loop.

- Using our own allocator avoids cross-thread recycling, which can cause 
unbounded heap growth in Netty today (and is the topic of active discussion on 
the Netty bug tracker [1]). We already have our own allocator that works, and 
arguably it is lowest risk to repurpose it. We also reduce risk by using 
existing ByteBuffer methods, keeping the code more idiomatic.
- AsyncPromise is used to avoid blocking the event loop. Modifying and invoking 
listeners on a DefaultPromise requires taking a mutex, and we do this on 
threads besides the event loop. This could stall the event loop for an entire 
scheduler quantum (or more, if the owning thread is low priority or has been 
extremely busy recently), which is a bug for a real-time networking thread that 
could be servicing dozens of connections.
- FutureCombiner in Netty is not thread-safe, and we need it to be.
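To make the AsyncPromise point concrete: the following is a minimal, hypothetical sketch (not the actual Cassandra or Netty code) of the lock-free style of promise being described. Listener registration and completion both go through a CAS loop over an immutable listener chain, so no caller ever holds a mutex that could stall the event loop — in contrast to a mutex-based promise, where adding or notifying listeners from an arbitrary thread serializes against the event loop.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

// Illustrative sketch only. Assumes the result value is never null and
// never itself a Listener, so a single AtomicReference can hold either
// the pending listener chain or the completed value.
final class CasPromise<V>
{
    private static final class Listener<V>
    {
        final Consumer<V> fn;
        final Listener<V> next;
        Listener(Consumer<V> fn, Listener<V> next) { this.fn = fn; this.next = next; }
    }

    // null or Listener chain while incomplete; the result value once complete.
    private final AtomicReference<Object> state = new AtomicReference<>(null);

    @SuppressWarnings("unchecked")
    void addListener(Consumer<V> fn)
    {
        while (true)
        {
            Object cur = state.get();
            if (cur == null || cur instanceof Listener)
            {
                // Push onto the chain with CAS; retry on contention. No lock held.
                if (state.compareAndSet(cur, new Listener<>(fn, (Listener<V>) cur)))
                    return;
            }
            else
            {
                fn.accept((V) cur); // already complete: run immediately
                return;
            }
        }
    }

    @SuppressWarnings("unchecked")
    boolean trySuccess(V value)
    {
        while (true)
        {
            Object cur = state.get();
            if (!(cur == null || cur instanceof Listener))
                return false; // already complete
            if (state.compareAndSet(cur, value))
            {
                // Notify listeners (in reverse registration order, for brevity).
                for (Listener<V> l = (Listener<V>) cur; l != null; l = l.next)
                    l.fn.accept(value);
                return true;
            }
        }
    }
}
```

The point of the sketch is the absence of `synchronized`: a low-priority or busy thread can be preempted mid-CAS-loop without blocking any other thread, whereas being preempted while holding a promise's monitor blocks everyone else until it is rescheduled.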

Some are perhaps not bug fixes, but a fundamental part of the design of the 
patch. You either get them for free, or you write a different patch. For 
example, checksumming, which falls naturally out of framing - which is integral 
to the new message flow. Or dropping messages eagerly when reading off the 
wire, which is simply natural to do in this design. Back pressure falls roughly 
into this category as well, and fixes a major stability bug. Perhaps the 
biggest stability bug we have today.

The new handshake protocol is probably the best candidate for splitting into 
its own patch. There's a good case to be made here, but the question is: what 
are we hoping to achieve by separating it? Reviewing these specific changes is 
not a significant burden, I don't think. Testing it independently probably 
doesn't yield significant benefit - if anything it's probably advantageous to 
include the changes in our atypically thorough testing of this subsystem.

Perhaps we can take this discussion onto Jira, to dive into specifics on more 
items? I've tried to keep it high level, and only select a few items, and it's 
perhaps already too deep in the weeds. We're absolutely open to modifying the 
scope if it definitely makes sense, and that's probably best established with 
some deep discussions on specific points.

So, what about our current pressures?

I think it's clear that landing this patch is the most efficient route to 4.0 
release. It has already exceeded the normal barrier for inclusion in the 
project - it has been reviewed by two people already (50% each by me and 
Aleksey, and 100% by Alex Petrov). It is already well tested, and I will 
shortly be posting a test plan to the Wiki outlining plans for further 
coverage. Suffice to say we anticipate atypically thorough test coverage before 
the patch is committed, and an extremely high confidence in the result. We also 
don't anticipate this taking a significant length of time to achieve.

Attempting to incorporate the patch in a piecemeal fashion can only slow things 
down and, in my

Re: Stabilising Internode Messaging in 4.0

2019-04-09 Thread Joseph Lynch
Let's try this again, apparently email is hard ...

I am relatively new to these code paths—especially compared to the
committers that have been working on these issues for years such as
the 15066 authors as well as Jason Brown—but like many Cassandra users
I am familiar with many of the classes of issues Aleksey and Benedict
have identified with this patchset (especially related to messaging
correctness, performance and the lack of message backpressure). We
believe that every single fix and feature in this patch is valuable
and we desire that we are able to get them all merged in and
validated. We don’t think it’s even a question whether we want to merge
these: we should want these excellent changes. The only questions—in
my opinion—are how do we safely merge them and when do we merge them?

Due to my and Vinay’s relative lack of knowledge of these code paths,
we hope that we can get as many experienced eyes as we can to review
the patch and evaluate the risk-reward tradeoffs of some of the deeper
changes. We don’t feel qualified to make assertions about risk vs
reward in this patchset, but I know there are a number of people on
this mailing list who are qualified and I think we would all
appreciate their insight and help.

I completely understand that we don’t live in an ideal world, but I do
personally feel that in an ideal world it would be possible to pull
the bug fixes (bugs specific to the 4.0 netty refactor) out from the
semantic changes (e.g. droppability, checksumming, back pressure,
handshake changes), code refactors (e.g. verb handler,
MessageIn/MessageOut) and performance changes (various
re-implementations of Netty internals, some optimizations around
dropping dead messages earlier). Then we can review, validate, and
benchmark each change independently and iteratively move towards
better messaging. At the same time, I recognize that it may be hard to
pull these changes apart, but I worry that review and validation of
the patch, as is, may take the testing community many months to
properly vet and will either mean that we cut 4.0 many, many months
from now or we cut 4.0 before we can properly test the patchset.

I think we are all agreed we don’t want an unstable 4.0, so the main
decision point here is: what set of changes from this valuable and
important patch set do we put in 4.0, and which do we try to put in
4.next? Once we determine that, the community can hopefully start
allocating the necessary review, testing, and benchmarking resources
to ensure that 4.0 is our first ever rock solid “.0” release.

-Joey


On Thu, Apr 4, 2019 at 5:56 PM Jon Haddad  wrote:
>
> Given the number of issues that are addressed, I definitely think it's
> worth strongly considering merging this in.  I think it might be a
> little unrealistic to cut the first alpha after the merge though.
> Being realistic, any 20K+ LOC change is going to introduce its own
> bugs, and we should be honest with ourselves about that.  It seems
> likely the issues the patch addressed would have affected the 4.0
> release in some form *anyways* so the question might be do we fix them
> now or after someone's cluster burns down because there's no inbound /
> outbound message load shedding.
>
> Giving it a quick code review and going through the JIRA comments
> (well written, thanks guys) there seem to be some pretty important bug
> fixes in here as well as paying off a bit of technical debt.
>
> Jon
>
> On Thu, Apr 4, 2019 at 1:37 PM Pavel Yaskevich  wrote:
> >
> > Great to see such significant progress being made in this area!
> >
> > On Thu, Apr 4, 2019 at 1:13 PM Aleksey Yeschenko  wrote:
> >
> > > I would like to propose CASSANDRA-15066 [1] - an important set of bug 
> > > fixes
> > > and stability improvements to internode messaging code that Benedict, I,
> > > and others have been working on for the past couple of months.
> > >
> > > First, some context.   This work started off as a review of 
> > > CASSANDRA-14503
> > > (Internode connection management is race-prone [2]), CASSANDRA-13630
> > > (Support large internode messages with netty) [3], and a pre-4.0
> > > confirmatory review of such a major new feature.
> > >
> > > However, as we dug in, we realized this was insufficient. With more than 
> > > 50
> > > bugs uncovered [4] - dozens of them critical to correctness and/or
> > > stability of the system - a substantial rework was necessary to guarantee 
> > > a
> > > solid internode messaging subsystem for the 4.0 release.
> > >
> > > In addition to addressing all of the uncovered bugs [4] that were unique 
> > > to
> > > trunk + 13630 [3] + 14503 [2], we used this opportunity to correct some
> > > long-existing, pre-4.0 bugs and stability issues. For the complete list of
> > > notable bug fixes, read the comments to CASSANDRA-15066 [1]. But I’d like
> > > to highlight a few.
> > >
> > > # Lack of message integrity checks
> > >
> > > It’s known that TCP checksums are too weak [5] and Ethernet CRC cannot be
> > > relied upon [6] for in


Re: Stabilising Internode Messaging in 4.0

2019-04-04 Thread Jon Haddad
Given the number of issues that are addressed, I definitely think it's
worth strongly considering merging this in.  I think it might be a
little unrealistic to cut the first alpha after the merge though.
Being realistic, any 20K+ LOC change is going to introduce its own
bugs, and we should be honest with ourselves about that.  It seems
likely the issues the patch addressed would have affected the 4.0
release in some form *anyways* so the question might be do we fix them
now or after someone's cluster burns down because there's no inbound /
outbound message load shedding.

Giving it a quick code review and going through the JIRA comments
(well written, thanks guys) there seem to be some pretty important bug
fixes in here as well as paying off a bit of technical debt.

Jon

On Thu, Apr 4, 2019 at 1:37 PM Pavel Yaskevich  wrote:
>
> Great to see such significant progress being made in this area!
>
> On Thu, Apr 4, 2019 at 1:13 PM Aleksey Yeschenko  wrote:
>
> > I would like to propose CASSANDRA-15066 [1] - an important set of bug fixes
> > and stability improvements to internode messaging code that Benedict, I,
> > and others have been working on for the past couple of months.
> >
> > First, some context.   This work started off as a review of CASSANDRA-14503
> > (Internode connection management is race-prone [2]), CASSANDRA-13630
> > (Support large internode messages with netty) [3], and a pre-4.0
> > confirmatory review of such a major new feature.
> >
> > However, as we dug in, we realized this was insufficient. With more than 50
> > bugs uncovered [4] - dozens of them critical to correctness and/or
> > stability of the system - a substantial rework was necessary to guarantee a
> > solid internode messaging subsystem for the 4.0 release.
> >
> > In addition to addressing all of the uncovered bugs [4] that were unique to
> > trunk + 13630 [3] + 14503 [2], we used this opportunity to correct some
> > long-existing, pre-4.0 bugs and stability issues. For the complete list of
> > notable bug fixes, read the comments to CASSANDRA-15066 [1]. But I’d like
> > to highlight a few.
> >
> > # Lack of message integrity checks
> >
> > It’s known that TCP checksums are too weak [5] and Ethernet CRC cannot be
> > relied upon [6] for integrity. With sufficient scale or time, you will hit
> > bit flips. Sadly, most of the time these go undetected.  Cassandra’s
> > replication model makes this issue much more serious, as the faulty data
> > can infect the cluster.
> >
> > We recognised this problem, and recently introduced a fix for server-client
> > messages, implementing CRCs in CASSANDRA-13304 (Add checksumming to the
> > native protocol) [7].
> >
> > But until CASSANDRA-15066 [1] lands, this is also a critical flaw
> > internode. We have addressed it by ensuring that no matter what, whether
> > you use SSL or not, whether you use internode compression or not, a
> > protocol level CRC is always present, for every message frame. It’s our
> > deep and sincere belief that shipping a new implementation of the messaging
> > protocol without application-level data integrity checks would be
> > unacceptable for a modern database.
> >
>
> I'm all for introducing more correctness checks at all levels especially in
> communication.
> Having dealt with multiple data corruption bugs that could have been easily
> prevented by
> having a checksum, it's great to see that we are moving in this direction.
>
>
> > # Lack of back-pressure and memory usage constraints
> >
> > As it stands today, it’s far too easy for a single slow node to become
> > overwhelmed by messages from its peers.  Conversely, multiple coordinators
> > can be made unstable by the backlog of messages to deliver to just one
> > struggling node.
> >
> > To address this problem, we have introduced strict memory usage constraints
> > that apply TCP-level back-pressure, on both outbound and inbound.  It is
> > now impossible for a node to be swamped on inbound, and on outbound it is
> > made significantly harder to overcommit resources.  It’s a simple, reliable
> > mechanism that drastically improves cluster stability under load, and
> > especially overload.
> >
> > Cassandra is a mature system, and introducing an entirely new messaging
> > implementation without resolving this fundamental stability issue is
> > difficult to justify in our view.
> >
>
> I'd say that this is required to be able to ship 4.0 as a release focused
> on stability.
> I personally have been waiting for this to happen for years. Significant
> step forward in our QoS story.
>
>
> >
> > # State of the patch, feature freeze and 4.0 timeline concerns
> >
> > The patch is essentially complete, with much improved unit tests all
> > passing, dtests green, and extensive fuzz testing underway - with initial
> > results all positive.  We intend to further improve in-code documentation
> > and test coverage in the next week or two, and do some minor additional
> > code review, but we believe it will be basically 

Re: Stabilising Internode Messaging in 4.0

2019-04-04 Thread Pavel Yaskevich
Great to see such significant progress being made in this area!

On Thu, Apr 4, 2019 at 1:13 PM Aleksey Yeschenko  wrote:

> I would like to propose CASSANDRA-15066 [1] - an important set of bug fixes
> and stability improvements to internode messaging code that Benedict, I,
> and others have been working on for the past couple of months.
>
> First, some context.   This work started off as a review of CASSANDRA-14503
> (Internode connection management is race-prone [2]), CASSANDRA-13630
> (Support large internode messages with netty) [3], and a pre-4.0
> confirmatory review of such a major new feature.
>
> However, as we dug in, we realized this was insufficient. With more than 50
> bugs uncovered [4] - dozens of them critical to correctness and/or
> stability of the system - a substantial rework was necessary to guarantee a
> solid internode messaging subsystem for the 4.0 release.
>
> In addition to addressing all of the uncovered bugs [4] that were unique to
> trunk + 13630 [3] + 14503 [2], we used this opportunity to correct some
> long-existing, pre-4.0 bugs and stability issues. For the complete list of
> notable bug fixes, read the comments to CASSANDRA-15066 [1]. But I’d like
> to highlight a few.
>
> # Lack of message integrity checks
>
> It’s known that TCP checksums are too weak [5] and Ethernet CRC cannot be
> relied upon [6] for integrity. With sufficient scale or time, you will hit
> bit flips. Sadly, most of the time these go undetected.  Cassandra’s
> replication model makes this issue much more serious, as the faulty data
> can infect the cluster.
>
> We recognised this problem, and recently introduced a fix for server-client
> messages, implementing CRCs in CASSANDRA-13304 (Add checksumming to the
> native protocol) [7].
>
> But until CASSANDRA-15066 [1] lands, this is also a critical flaw
> internode. We have addressed it by ensuring that no matter what, whether
> you use SSL or not, whether you use internode compression or not, a
> protocol level CRC is always present, for every message frame. It’s our
> deep and sincere belief that shipping a new implementation of the messaging
> protocol without application-level data integrity checks would be
> unacceptable for a modern database.
>

I'm all for introducing more correctness checks at all levels, especially in
communication. Having dealt with multiple data corruption bugs that could
have been easily prevented by a checksum, it's great to see that we are
moving in this direction.


> # Lack of back-pressure and memory usage constraints
>
> As it stands today, it’s far too easy for a single slow node to become
> overwhelmed by messages from its peers.  Conversely, multiple coordinators
> can be made unstable by the backlog of messages to deliver to just one
> struggling node.
>
> To address this problem, we have introduced strict memory usage constraints
> that apply TCP-level back-pressure, on both outbound and inbound.  It is
> now impossible for a node to be swamped on inbound, and on outbound it is
> made significantly harder to overcommit resources.  It’s a simple, reliable
> mechanism that drastically improves cluster stability under load, and
> especially overload.
>
> Cassandra is a mature system, and introducing an entirely new messaging
> implementation without resolving this fundamental stability issue is
> difficult to justify in our view.
>

I'd say that this is required to be able to ship 4.0 as a release focused
on stability. I personally have been waiting for this to happen for years.
It's a significant step forward in our QoS story.


>
> # State of the patch, feature freeze and 4.0 timeline concerns
>
> The patch is essentially complete, with much improved unit tests all
> passing, dtests green, and extensive fuzz testing underway - with initial
> results all positive.  We intend to further improve in-code documentation
> and test coverage in the next week or two, and do some minor additional
> code review, but we believe it will be basically good to commit in a couple
> weeks.
>
> The patch could also use some performance testing. We are hoping that our
> colleagues at Netflix and new Cassandra committers Joey and Vinay will help
> us with this aspect.  However, this work needs to be done regardless, so
> provides no significant delay.
>
> I would classify absolutely most of the work done in this patch as
> necessary bug fixes and stability improvements - in line with the stated
> goals of the feature freeze.
>
> The only clear “feature” introduced is the expanded metrics, and virtual
> tables to observe them.  If the freeze is to be strictly interpreted these
> can be removed, but we think this would be unwise.
>
> We hope that the community will appreciate and welcome these changes.
> We’ve worked hard to deliver this promptly, to minimise delays to 4.0 were
> these issues to be addressed piecemeal, and we sincerely believe this work
> will have a positive impact on stability and performance of Cassandra
> clusters for yea

Re: Stabilising Internode Messaging in 4.0

2019-04-04 Thread J. D. Jordan
Definitely sounds like it is worth taking a 2nd look here. Given that this is 
in relation to brand new code for 4.0, I agree that it makes sense to get it 
right the first time, rather than applying bandaids to 4.0 and rewriting things 
for 4.next. I think 4.0 should be a solid starting point for future work and 
not have brand new code in it that we are knowingly planning to re-write right 
after we cut the release...

-Jeremiah

> On Apr 4, 2019, at 3:13 PM, Aleksey Yeschenko  wrote:
> 
> I would like to propose CASSANDRA-15066 [1] - an important set of bug fixes
> and stability improvements to internode messaging code that Benedict, I,
> and others have been working on for the past couple of months.
> 
> First, some context.   This work started off as a review of CASSANDRA-14503
> (Internode connection management is race-prone [2]), CASSANDRA-13630
> (Support large internode messages with netty) [3], and a pre-4.0
> confirmatory review of such a major new feature.
> 
> However, as we dug in, we realized this was insufficient. With more than 50
> bugs uncovered [4] - dozens of them critical to correctness and/or
> stability of the system - a substantial rework was necessary to guarantee a
> solid internode messaging subsystem for the 4.0 release.
> 
> In addition to addressing all of the uncovered bugs [4] that were unique to
> trunk + 13630 [3] + 14503 [2], we used this opportunity to correct some
> long-existing, pre-4.0 bugs and stability issues. For the complete list of
> notable bug fixes, read the comments to CASSANDRA-15066 [1]. But I’d like
> to highlight a few.
> 
> # Lack of message integrity checks
> 
> It’s known that TCP checksums are too weak [5] and Ethernet CRC cannot be
> relied upon [6] for integrity. With sufficient scale or time, you will hit
> bit flips. Sadly, most of the time these go undetected.  Cassandra’s
> replication model makes this issue much more serious, as the faulty data
> can infect the cluster.
> 
> We recognised this problem, and recently introduced a fix for server-client
> messages, implementing CRCs in CASSANDRA-13304 (Add checksumming to the
> native protocol) [7].
> 
> But until CASSANDRA-15066 [1] lands, this is also a critical flaw
> internode. We have addressed it by ensuring that no matter what, whether
> you use SSL or not, whether you use internode compression or not, a
> protocol level CRC is always present, for every message frame. It’s our
> deep and sincere belief that shipping a new implementation of the messaging
> protocol without application-level data integrity checks would be
> unacceptable for a modern database.
> 
> # Lack of back-pressure and memory usage constraints
> 
> As it stands today, it’s far too easy for a single slow node to become
> overwhelmed by messages from its peers.  Conversely, multiple coordinators
> can be made unstable by the backlog of messages to deliver to just one
> struggling node.
> 
> To address this problem, we have introduced strict memory usage constraints
> that apply TCP-level back-pressure, on both outbound and inbound.  It is
> now impossible for a node to be swamped on inbound, and on outbound it is
> made significantly harder to overcommit resources.  It’s a simple, reliable
> mechanism that drastically improves cluster stability under load, and
> especially overload.
> 
> Cassandra is a mature system, and introducing an entirely new messaging
> implementation without resolving this fundamental stability issue is
> difficult to justify in our view.
> 
> # State of the patch, feature freeze and 4.0 timeline concerns
> 
> The patch is essentially complete, with much improved unit tests all
> passing, dtests green, and extensive fuzz testing underway - with initial
> results all positive.  We intend to further improve in-code documentation
> and test coverage in the next week or two, and do some minor additional
> code review, but we believe it will be basically good to commit in a couple
> weeks.
> 
> The patch could also use some performance testing. We are hoping that our
> colleagues at Netflix and new Cassandra committers Joey and Vinay will help
> us with this aspect.  However, this work needs to be done regardless, so
> provides no significant delay.
> 
> I would classify absolutely most of the work done in this patch as
> necessary bug fixes and stability improvements - in line with the stated
> goals of the feature freeze.
> 
> The only clear “feature” introduced is the expanded metrics, and virtual
> tables to observe them.  If the freeze is to be strictly interpreted these
> can be removed, but we think this would be unwise.
> 
> We hope that the community will appreciate and welcome these changes.
> We’ve worked hard to deliver this promptly, to minimise delays to 4.0 were
> these issues to be addressed piecemeal, and we sincerely believe this work
> will have a positive impact on stability and performance of Cassandra
> clusters for years to come.
> 
> P.S. I believe that once it’s committed, we 

Stabilising Internode Messaging in 4.0

2019-04-04 Thread Aleksey Yeschenko
I would like to propose CASSANDRA-15066 [1] - an important set of bug fixes
and stability improvements to internode messaging code that Benedict, I,
and others have been working on for the past couple of months.

First, some context.   This work started off as a review of CASSANDRA-14503
(Internode connection management is race-prone [2]), CASSANDRA-13630
(Support large internode messages with netty) [3], and a pre-4.0
confirmatory review of such a major new feature.

However, as we dug in, we realized this was insufficient. With more than 50
bugs uncovered [4] - dozens of them critical to correctness and/or
stability of the system - a substantial rework was necessary to guarantee a
solid internode messaging subsystem for the 4.0 release.

In addition to addressing all of the uncovered bugs [4] that were unique to
trunk + 13630 [3] + 14503 [2], we used this opportunity to correct some
long-existing, pre-4.0 bugs and stability issues. For the complete list of
notable bug fixes, read the comments to CASSANDRA-15066 [1]. But I’d like
to highlight a few.

# Lack of message integrity checks

It’s known that TCP checksums are too weak [5] and Ethernet CRC cannot be
relied upon [6] for integrity. With sufficient scale or time, you will hit
bit flips. Sadly, most of the time these go undetected.  Cassandra’s
replication model makes this issue much more serious, as the faulty data
can infect the cluster.

We recognised this problem, and recently introduced a fix for server-client
messages, implementing CRCs in CASSANDRA-13304 (Add checksumming to the
native protocol) [7].

But until CASSANDRA-15066 [1] lands, this is also a critical flaw
internode. We have addressed it by ensuring that no matter what, whether
you use SSL or not, whether you use internode compression or not, a
protocol level CRC is always present, for every message frame. It’s our
deep and sincere belief that shipping a new implementation of the messaging
protocol without application-level data integrity checks would be
unacceptable for a modern database.
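The general technique being described — a per-frame CRC trailer that catches corruption TCP and Ethernet checksums miss — can be sketched as follows. This is an illustration only, not Cassandra's actual frame layout or checksum placement; it simply appends a CRC32C (from the JDK's `java.util.zip`) to each payload on send and verifies it on receipt.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32C;

// Hypothetical frame format for illustration: [payload bytes][4-byte CRC32C].
// A real protocol would also checksum the frame header and length fields.
final class FrameCrc
{
    static byte[] encode(byte[] payload)
    {
        CRC32C crc = new CRC32C();
        crc.update(payload, 0, payload.length);
        ByteBuffer out = ByteBuffer.allocate(payload.length + 4);
        out.put(payload).putInt((int) crc.getValue()); // trailer: CRC of the payload
        return out.array();
    }

    static byte[] decode(byte[] frame)
    {
        ByteBuffer in = ByteBuffer.wrap(frame);
        byte[] payload = new byte[frame.length - 4];
        in.get(payload);
        int expected = in.getInt();

        CRC32C crc = new CRC32C();
        crc.update(payload, 0, payload.length);
        if ((int) crc.getValue() != expected)
            throw new IllegalStateException("frame CRC mismatch: corrupted in flight");
        return payload;
    }
}
```

Because the check is applied per frame at the application level, it holds regardless of whether SSL or internode compression is in use — a bit flip anywhere between the sender's serialization and the receiver's deserialization surfaces as an exception rather than as silently infected replicas.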

# Lack of back-pressure and memory usage constraints

As it stands today, it’s far too easy for a single slow node to become
overwhelmed by messages from its peers.  Conversely, multiple coordinators
can be made unstable by the backlog of messages to deliver to just one
struggling node.

To address this problem, we have introduced strict memory usage constraints
that apply TCP-level back-pressure, on both outbound and inbound.  It is
now impossible for a node to be swamped on inbound, and on outbound it is
made significantly harder to overcommit resources.  It’s a simple, reliable
mechanism that drastically improves cluster stability under load, and
especially overload.

Cassandra is a mature system, and introducing an entirely new messaging
implementation without resolving this fundamental stability issue is
difficult to justify in our view.
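The mechanism described above — a strict memory budget whose exhaustion propagates back-pressure — can be illustrated with a minimal sketch. This is not Cassandra's implementation; it just shows the core idea using a counting semaphore where each permit represents one byte: a message must reserve its size before being queued, and releases it once processed.

```java
import java.util.concurrent.Semaphore;

// Illustrative byte budget for one direction of messaging. When the
// budget is exhausted, reservations fail (or block), and the caller
// reacts by pausing socket reads / refusing to enqueue — i.e. applying
// back-pressure instead of letting the backlog grow without bound.
final class ByteBudget
{
    private final Semaphore permits; // one permit == one byte of budget

    ByteBudget(int capacityBytes) { permits = new Semaphore(capacityBytes); }

    // Non-blocking reservation: returns false when the message would
    // exceed the remaining budget.
    boolean tryReserve(int bytes) { return permits.tryAcquire(bytes); }

    // Called once the message has been fully processed and its memory freed.
    void release(int bytes) { permits.release(bytes); }

    int availableBytes() { return permits.availablePermits(); }
}
```

On inbound, a failed reservation translates to not reading further from the socket, so the kernel's TCP window fills and the sender slows down; on outbound, it means refusing to buffer yet more data destined for a struggling peer.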

# State of the patch, feature freeze and 4.0 timeline concerns

The patch is essentially complete, with much improved unit tests all
passing, dtests green, and extensive fuzz testing underway - with initial
results all positive.  We intend to further improve in-code documentation
and test coverage in the next week or two, and do some minor additional
code review, but we believe it will be basically good to commit in a couple
weeks.

The patch could also use some performance testing. We are hoping that our
colleagues at Netflix and new Cassandra committers Joey and Vinay will help
us with this aspect.  However, this work needs to be done regardless, so
provides no significant delay.

I would classify absolutely most of the work done in this patch as
necessary bug fixes and stability improvements - in line with the stated
goals of the feature freeze.

The only clear “feature” introduced is the expanded metrics, and virtual
tables to observe them.  If the freeze is to be strictly interpreted these
can be removed, but we think this would be unwise.

We hope that the community will appreciate and welcome these changes.
We’ve worked hard to deliver this promptly, to minimise delays to 4.0 were
these issues to be addressed piecemeal, and we sincerely believe this work
will have a positive impact on stability and performance of Cassandra
clusters for years to come.

P.S. I believe that once it’s committed, we should cut our first 4.0 alpha.
It’s about time we started this train (:

[1] https://issues.apache.org/jira/browse/CASSANDRA-15066
[2] https://issues.apache.org/jira/browse/CASSANDRA-14503
[3] https://issues.apache.org/jira/browse/CASSANDRA-13630
[4] https://gist.github.com/belliottsmith/0d12df678d8e9ab06776e29116d56b91
(incomplete list)
[5] https://www.evanjones.ca/tcp-checksums.html
[6] https://www.evanjones.ca/tcp-and-ethernet-checksums-fail.html
[7] https://issues.apache.org/jira/browse/CASSANDRA-13304