[quagga-dev 12327] Re: [PATCH 2/5] bgpd: strip incorrect Graceful Restart R-bit code

David Lamparter Sun, 17 May 2015 11:53:40 -0700

[meme images were inserted into this mail after I switched into f*ck-it
mode.]

On Sun, May 17, 2015 at 06:31:04PM +0100, Paul Jakma wrote:
> > And there's Non-Stop Forwarding, i.e. keeping forwarding state, to get 
> > rid of that flap entirely.  That only works if the router is doing a 
> > reasonably controlled reboot (either by user command, or by a 
> > well-designed crash/panic handler).  Both of these are single-system 
> > short-time events.
> 
> We don't do NSF.

... yet.
https://i.imgur.com/WA9TwtG.png

> Further, the global delay of route-calculation, for /all/ peers, is likely 
> ungood for the routing of the local speaker even with NSF, IMO.

I disagree.

> >> The update-delay removes a nice convergence feature.
> >
> > It's a bug, and here's why.
> 
> > 1.  it generates network churn by exiting cork mode on a per-peer basis,
> > which is simply too early:
> 
> > Assume you have a router starting up with 4 peers.  They all differ in
> > their speed & processing power:
> >
> > "<" = session establishment, numbers = routes/updates, ">" = EoR
> > A     <123456789>
> > B    <1 2 3 4 5 6 7 8 9>
> > C          <1  2  3  4  5  6  7  8  9>
> > D                 <123456789>
> >
> > A will start receiving updates quite early, in particular at a point
> > where it is the only source for everything numbered 6 and above.  It
> > can, worst-case, receive 3 updates about prefixes 6789;  it can receive
> > 2 updates about prefixes 345.
> 
> > This only gets worse when there are more peers, and worse yet if someone
> > doesn't send updates in ascending table order.
> 
> Why is this worse?

Because this is a scenario with only one router in the R set, yet we're
generating a significant amount of network-propagated churn that can
impact larger parts of the DFZ.

We can end up telling A that our best path to 6 is B, then D, then C, in
rapid succession - and it's quite possible this is actually the best
path for A, meaning it'll readvertise this to its own peers.

http://cdn.meme.am/instances/500x/20566851.jpg

> Remember, in general, you have:

No, I don't remember that, because I'm assuming the number of routers
that have restarted to be 0 or 1.

[snip large bulk of text that assumes more than 1 router in R]

> (The GR RFC does not mention the restart-restart case specifically at all. 
> Indeed, you have to thread together text from different parts of the RFC 
> to figure out that case, with a dose of implication required. Note that no 
> where does the GR RFC explicitly say that a restarted peer must defer 
> sending updates to another restarted peer).

The GR RFC explicitly says the restarting speaker "MUST defer route
selection" and notes that "prior to route selection, the speaker has no
routes to advertise to its peers and no routes to update the forwarding
state."  Here, 'its peers' includes other peers that are restarting.
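The deferral rule can be sketched as follows. This is a minimal Python model of my reading of RFC 4724 section 4.1 (one address family, the Selection_Deferral_Timer reduced to a boolean, and a made-up lowest-metric rule standing in for the real BGP decision process); `Peer`, `RestartingSpeaker` and its methods are hypothetical names, not bgpd structures. What it shows is that `outbound()` is a single table: while selection is deferred, it is empty for every peer.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    gr_capable: bool = True      # peer advertised the Graceful Restart capability
    restart_state: bool = False  # R-bit set in the capability the peer sent us
    eor_received: bool = False


@dataclass
class RestartingSpeaker:
    """Sketch of route-selection deferral on a restarting speaker
    (one address family; the Selection_Deferral_Timer is modelled as a flag)."""
    peers: dict                                     # name -> Peer
    deferral_timer_expired: bool = False
    adj_rib_in: dict = field(default_factory=dict)  # prefix -> {peer name: metric}

    def waiting_on(self):
        # Peers whose EoR is still outstanding. Peers that set the R-bit or
        # do not support GR are not waited for (my reading of RFC 4724, 4.1).
        return sorted(name for name, p in self.peers.items()
                      if p.gr_capable and not p.restart_state and not p.eor_received)

    def may_select(self):
        return self.deferral_timer_expired or not self.waiting_on()

    def receive(self, peer, prefix, metric):
        self.adj_rib_in.setdefault(prefix, {})[peer] = metric

    def receive_eor(self, peer):
        self.peers[peer].eor_received = True

    def outbound(self):
        # Before route selection there is nothing to advertise to *any* peer:
        # the deferral is global, not per peer.
        if not self.may_select():
            return {}
        # Hypothetical stand-in for the BGP decision process: lowest metric wins.
        return {prefix: min((metric, peer) for peer, metric in paths.items())
                for prefix, paths in self.adj_rib_in.items()}


speaker = RestartingSpeaker({"A": Peer(), "B": Peer(), "C": Peer(restart_state=True)})
speaker.receive("A", "192.0.2.0/24", 20)
speaker.receive_eor("A")
print(speaker.waiting_on(), speaker.outbound())  # ['B'] {}
speaker.receive("B", "192.0.2.0/24", 10)
speaker.receive_eor("B")
print(speaker.waiting_on(), speaker.outbound())  # [] {'192.0.2.0/24': (10, 'B')}
```

Peer C, which is itself restarting, is not waited for, yet it gets nothing until selection has run, the same as everybody else.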

> > 2.  it doesn't work with non-stop forwarding.
> >
> > keeping forwarding state would translate on Quagga to a zebra & bgpd 
> > restart while keeping routes in the kernel untouched.  The implication 
> > here is that other routers will keep the previous state in their table 
> > and replace it only when they receive incoming updates, finally removing 
> > stale nonrefreshed entries on seeing EoR.
> 
> First we'd have to get bgpd restart, with zebra still running, working 
> nicely.

That sounds like a challenge, one that I would likely take up if I didn't
have a responsibility to review patches submitted to this list.
Unfortunately I have enough work on my TODO list as it is.

http://weknowmemes.com/wp-content/uploads/2011/11/challenge-denied-rage-face.jpg

This is something that a patch could be submitted for, though...
http://i3.kym-cdn.com/photos/images/newsfeed/000/264/241/9e9.gif
[no payment implied]
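The helper-side behaviour in the quoted NSF paragraph (keep the previous state, replace entries as UPDATEs arrive, drop stale non-refreshed entries on EoR) looks roughly like this. It is a minimal Python sketch for a single restarting peer, with timers and withdrawals left out; `HelperAdjRibIn` is a hypothetical name. Between `peer_restarted()` and `end_of_rib()` the helper keeps forwarding on the old paths, which is only correct if the restarted router is also still forwarding on its pre-restart state.

```python
class HelperAdjRibIn:
    """Sketch: routes a receiving (helper) speaker holds from ONE restarting peer."""

    def __init__(self):
        self.routes = {}    # prefix -> path (e.g. next hop)
        self.stale = set()  # prefixes not yet refreshed since the restart

    def peer_restarted(self):
        # Keep using the pre-restart paths, but mark every one of them stale.
        self.stale = set(self.routes)

    def update(self, prefix, path):
        # A fresh UPDATE replaces the old path and clears its stale mark.
        self.routes[prefix] = path
        self.stale.discard(prefix)

    def end_of_rib(self):
        # EoR from the restarted peer: drop whatever was never refreshed.
        for prefix in self.stale:
            del self.routes[prefix]
        self.stale.clear()


rib = HelperAdjRibIn()
rib.update("198.51.100.0/24", "via R1")
rib.update("203.0.113.0/24", "via R1")
rib.peer_restarted()
rib.update("198.51.100.0/24", "via R1 (refreshed)")
rib.end_of_rib()
print(rib.routes)  # {'198.51.100.0/24': 'via R1 (refreshed)'}
```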

> > This isn't possible to do on a per-peer level.  We can't send our
> > current table to one peer - sending that table means we must have the
> > state installed, or we're lying/actively breaking BGP! - while still
> > claiming to another peer that we're using our pre-restart state.
> 
> Being able to defer UPDATEs at a more fine-grained level than a global 
> condition still permits deferring at a global level. If local-NSF required 
> global.

I'm still arguing my (1.) and (2.) from previous mail, i.e.:
1. global deferral is necessary to avoid network-propagated churn in
   the situation where only 1 router is restarting
2. NSF requires global deferral

If there is something that we can do on a per-peer level *on top of
'normal' GR*, I would like to see a reviewed spec for that.

[more text cut]

> Though, I'd assert some of it started in the very first reply to 
> my patches, which didn't help in setting the tone.

Could you point out what exactly was wrong with my very first reply?  I
don't see the issue (it seemed neutral and to the point to me), and
would like to avoid repeating the mistake.

> >> "Send UPDATEs without artificial delays when it's perfectly acceptable" is
> >> abusing people's networks?
> >
> > Please stop twisting my words to your liking.  I complained that it 
> > isn't acceptable to push experimental, undocumented, unreviewed new 
> > protocol inventions into Quagga.  That would be using our users' 
> > networks as testbeds.
> 
> It's not. It's normal BGP! :)
> 
> If speaking normal BGP and sending UPDATEs to a peer without delay (as per 
> normal BGP) is using users as test-beds we'd better stop shipping bgpd. :)

BGP consists of more than bytes on the wire.

> > Unfortunately, as pointed out in the "bug, not feature" section, it's 
> > not "perfectly acceptable" to send these updates;  in fact it will 
> > generate extra churn.
> 
> What extra churn though? There is no extra churn relative to BGP-4.

It's either extra delay relative to BGP-4, or extra churn relative to
RFC 4724 GR.  Feel free to pick one.

> it filters out the worst transients (sending a route that the remote-peer 
> has a better path for before you've got it, leading to 
> UPDATE-then-WITHDRAW to that remote peer)

Those are actually the least problematic because the peer won't select
them and they won't continue to travel through the BGP domain.

[cut remainder of cited mail]

Ah f*ck it, I'm too tired to continue arguing on this.

The base disagreement seems to be whether it's realistic to cater to the
case of more than one router restarting.  I don't think it is, and I'd
like to have a nice NSF/ISSU update possibility at some point, allowing
the user to update their Quagga installation without it showing up
worldwide as a flap for their prefix.

That, to my understanding, is what GR was invented for.  It's a damn
f*cking restart that you put on your schedule when Cisco goes
oopsie-daisy & tells you you have to move from 15.2S to 15.2(1)S2, or we
c*ck up and tell you to move from 0.99.22.3 to 0.99.22.4 because
Phtheven here was in charge of RelEng:
https://s-media-cache-ak0.pinimg.com/736x/43/97/7d/43977d9c3d497f918e65a62d0578ceb9.jpg

I don't know how to continue on this.  We can do another threadnaught on
whether R>1 is realistic or not.  Or we can discuss how NAKs work.  Or
we can discuss how our community works overall.
... either way, this topic is now plonk'd to the absolute bottom of my
priority list.  There's enough things to productively apply time to.
http://ak-hdl.buzzfed.com/static/2015-02/4/5/enhanced/webdr04/enhanced-buzz-21683-1423047209-9.jpg

-David

_______________________________________________
Quagga-dev mailing list
Quagga-dev@lists.quagga.net
https://lists.quagga.net/mailman/listinfo/quagga-dev
