[quagga-dev 12310] Re: [PATCH 2/5] bgpd: strip incorrect Graceful Restart R-bit code

David Lamparter Thu, 14 May 2015 15:50:08 -0700

On Thu, May 14, 2015 at 10:45:58PM +0100, Paul Jakma wrote:
> a) GR peers that are both in update-delay just immediately send EoR to
>     each other, and so go out of update-delay mode with each other.


This is incorrect; you seem to have forgotten update-delay is global.
Each side will stay in update-delay until *all* of its peers are in an
acceptable state, where

    acceptable := EoR || R=1 || (NoGR && Keepalive)

or alternatively the timer expires.
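
To make the condition concrete, here is a rough sketch in C.  This is not
the bgpd code; struct peer_state, peer_acceptable() and
update_delay_finished() are names made up for illustration, and the flags
are assumed to be filled in by the session handling code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-peer flags; not the actual bgpd structures. */
struct peer_state {
    bool gr_capable;     /* peer negotiated Graceful Restart */
    bool r_bit;          /* peer signalled R=1, i.e. it is restarting too */
    bool eor_received;   /* End-of-RIB marker received from this peer */
    bool keepalive_seen; /* a Keepalive has been received from this peer */
};

/* acceptable := EoR || R=1 || (NoGR && Keepalive) */
static bool peer_acceptable(const struct peer_state *p)
{
    return p->eor_received || p->r_bit ||
           (!p->gr_capable && p->keepalive_seen);
}

/*
 * Update-delay is global: it ends only when *every* peer is acceptable,
 * or when the timer expires.
 */
static bool update_delay_finished(const struct peer_state *peers, size_t n,
                                  bool timer_expired)
{
    if (timer_expired)
        return true;
    for (size_t i = 0; i < n; i++)
        if (!peer_acceptable(&peers[i]))
            return false;
    return true;
}
```

Note that a single GR peer that has sent neither EoR nor R=1 keeps the
whole box in update-delay, no matter what the other peers have done.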

> b) Non-GR peers sit in update-delay mode until one or both send keepalives
>     to kick the other out of update-delay mode.

Again, they stay in update-delay mode until all peers globally are in an
acceptable state, or the timer expires.  The "look for Keepalive instead
of EoR" thing seems to be a Cumulus invention; we could strip that out
if needed.  It's not really a core aspect of the functionality.

(With Non-GR, we don't really know when the peer is done sending us
their full table... using keepalives for that seems, hm, "innovative",
though I agree with you it might turn out to be a bad idea.  Should
investigate this further.)

> It is a basic tenet of convergence churn reduction to have the "more 
> converged" side (call it M) send its state /first/ to the "less converged" 
> side (L), so that the processing churn happens on L and transients are 
> hopefully confined to L and the sub-network on its side of the partition. 
> /Then/ have L send its result to M.
> 
> This implies:
> 
> 1. You need some mechanism or heuristic to negotiate the "more converged"
>     and "less converged" properties, to decide who is L and who is M (if
>     any at all). The mechanism may be imperfect and not always produce the
>     optimal answer, but such is life.
> 
> 2. You need an ordering mechanism to make the "M send first, then L" part
>     work.
> 
> These patches ignore this completely.

Uh, no, they don't.  The patch creates a global per-box M/L state;  a
box starts out believing it is in L and transitions into M once the
above "acceptable" condition holds on *all* peers, or on timer expiry.
As long as we're L, we don't send any outgoing updates, as an automatic
side-effect of not running bestpath selection.

(i.e.  bgp_update_delay_active(bgp) ? L : M )
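
Spelled out a bit more, as a sketch only:  struct bgp_box, box_role() and
run_bestpath() below are hypothetical stand-ins, with the
update_delay_active flag playing the part of bgp_update_delay_active(bgp).

```c
#include <stdbool.h>

enum convergence_role { ROLE_L, ROLE_M };

/* Hypothetical per-box state; not the actual struct bgp. */
struct bgp_box {
    bool update_delay_active; /* stand-in for bgp_update_delay_active(bgp) */
    unsigned pending_routes;  /* routes waiting for bestpath selection */
};

/* A box is L while update-delay is active, and M afterwards. */
static enum convergence_role box_role(const struct bgp_box *b)
{
    return b->update_delay_active ? ROLE_L : ROLE_M;
}

/*
 * Bestpath selection is skipped while the box is L, so nothing becomes
 * eligible for advertisement.  Returns the number of routes processed.
 */
static unsigned run_bestpath(struct bgp_box *b)
{
    if (box_role(b) == ROLE_L)
        return 0;

    unsigned processed = b->pending_routes;
    b->pending_routes = 0;
    return processed;
}
```

The point is that there is no separate "hold back updates" switch;  the
routes simply stay queued until the box considers itself M.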

> You need some kind of asymmetry and ordering conditions in the
> protocol, but "Wait for everyone else to send EoR or keepalive" is
> inherently symmetrical.

Please reread the patchset;  bgp_update_restarted_peers() handles this
by not waiting for peers that indicate R=1, thus taking care of an L<>L
session.  Also, a peer in M will never wait.  There's your asymmetry ;)

> Who goes first with this patch set? No one and everyone. This is not GR.

This patchset is pretty much exactly RFC 4724, aside from the Non-GR
Keepalive thing.  Peers in M will never wait, thus "go first".  Peers in
L will wait until they think they have a reasonable view of the network.
That includes not waiting for other peers in L.

> I know why you're interested in this case, because it helps a route-server 
> with many peers, after the route-server restarts. However, these patches 
> also need to work for the general case.

(It doesn't, actually;  routeservers generally see each prefix from only
one source due to the IXP model of everyone only announcing their own
prefixes.  That makes GR rather useless in that case.  The case this
helps in is normal, actual routers with several peers providing
connectivity on a per-prefix basis.)

> Also, I have expertise elsewhere in network protocol analysis, convergence 
> optimisation and churn reduction, and formal verification of same.
>
> Maybe we should try working with each other.

Yeah, maybe we should do that...  instead of arguing higher authority...


-David
