[quagga-dev 12313] Re: [PATCH 2/5] bgpd: strip incorrect Graceful Restart R-bit code

Paul Jakma Fri, 15 May 2015 00:55:50 -0700

On Fri, 15 May 2015, David Lamparter wrote:

On Thu, May 14, 2015 at 10:45:58PM +0100, Paul Jakma wrote:

a) GR peers that are both in update-delay just immediately send EoR to
    each other, and so go out of update-delay mode with each other.

This is incorrect; you seem to have forgotten update-delay is global.

No I didn't, it is the global aspect of delaying all UPDATEs that I have the issue with.

Each side will stay in update-delay until *all* of its peers are in an
acceptable state, where
acceptable := EoR || R=1 || (NoGR && Keepalive)
or alternatively the timer expires.
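
The condition above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this thread, not the patchset's code; the `Peer` fields and the function names are hypothetical, made up for the example.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Peer:
    # All field names are hypothetical, chosen for this sketch only.
    gr_capable: bool = False          # peer negotiated Graceful Restart
    r_bit: bool = False               # peer set R=1 in its GR capability
    eor_received: bool = False        # End-of-RIB seen from the peer
    keepalive_received: bool = False  # Keepalive seen on this session


def acceptable(peer: Peer) -> bool:
    """acceptable := EoR || R=1 || (NoGR && Keepalive)"""
    return (
        peer.eor_received
        or peer.r_bit
        or (not peer.gr_capable and peer.keepalive_received)
    )


def in_update_delay(peers: Iterable[Peer], timer_expired: bool) -> bool:
    """Global state: leave update-delay once every peer is acceptable,
    or once the timer has expired, whichever comes first."""
    if timer_expired:
        return False
    return not all(acceptable(p) for p in peers)
```

Note that a single GR-capable peer that has sent neither EoR nor R=1 keeps the whole box in update-delay until the timer fires, which is the global aspect under discussion.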


Yes, understood.

b) Non-GR peers sit in update-delay mode until one or both send keepalives
    to kick the other out of update-delay mode.

Again, they stay in update-delay mode until all peers globally are in an acceptable state, or the timer expires. The "look for Keepalive instead of EoR" thing seems to be a Cumulus invention, we could strip that out if needed. It's not a core aspect of the functionality, really.

The keepalive thing is sort of neat, though it kind of depends on what you're optimising for.

My issue with it is that this is optimising for one specific case. One router restarts (R in the centre), and this is optimising for where all its neighbours are stable (stable in the uptime sense - not the "quiescent RIB sense" - RIB-stability and uptime-stability may well be inversely related):


 S S S
  \|/
   R
  /|\
 S S S

I.e. one router has restarted.

More generally, the restarted router has 0 or more non-restarted peers and 0 or more restarted peers:


   0...
  S S S
   \|/
    R
   / \
  R   R
   0...

To minimise the impact of the restarted routing domain on the stable domain, you would want the restarted-domain to converge fully first (internally and on the information from the stable domain), and only then allow routing information to go from the restarted domain to the stable domain. That'd be hard to do, even with free rein to change BGP. So you can only consider local information.

Locally, you want to send updates to other routers in the R-domain as quickly as possible, and delay sending to the S-domain as long as possible.

Stopping all RIB processing is fine in the first case (you could argue this single-router R-domain is more common, and it's fine for route-servers), but it's less good in general for routing convergence.

What I would like is to defer UPDATEs only from the R-domain to S-domain peers. This is what the current code does.
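
As a rough sketch of that preference (hypothetical names, Python purely for illustration, not bgpd code): a restarted router sends at once to peers that have themselves restarted, and holds back UPDATEs only towards peers that stayed up. The sketch assumes the peer's R-bit is the local signal that it belongs to the R-domain.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    name: str
    r_bit: bool  # assumption: R=1 marks a peer in the restarted (R) domain


def split_updates(we_restarted, deferral_over, peers):
    """Return (send_now, deferred) lists of peer names."""
    send_now, deferred = [], []
    for p in peers:
        # Defer only in the R-domain -> S-domain direction:
        # we restarted, the peer did not, and the deferral has not ended.
        if we_restarted and not p.r_bit and not deferral_over:
            deferred.append(p.name)
        else:
            send_now.append(p.name)
    return send_now, deferred
```

With peers `s1` (R=0) and `r1` (R=1), a restarted router would send to `r1` immediately and defer `s1`; a router that did not restart defers nothing.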

With the global update-delay, CPU churn is being traded off for worse convergence (I had one other concern about it, I initially thought it was adding a queue - but that wasn't the case). I'm not sure that that is a trade-off that everyone wants. I also dislike configurable options - we have lots already. It'd be better to "do the right thing" as much as possible.


What I've been trying to do is explore how to get there.

(With Non-GR, we don't really know when the peer is done sending us their full table... using keepalives for that seems, hm, "innovative", though I agree with you it might turn out to be a bad idea. Should investigate this further.)

It's a neat trick, if you want to wait. Not everyone wants their routers to "wait and see" though, sometimes they want them to get back to passing UPDATEs ASAP.

Uh, no, they don't. The patch creates a global per-box M/L state; a box starts out believing it is in L and transitions into M based on above "acceptable" condition on *all* peers, or timer expiry. As long as we're L, we don't send any outgoing updates as an automatic side-effect of not running bestpath selection.


And that is precisely the issue.

Nothing gets sent globally, even though it would make sense to still send where both sides are equal. The L is a /relative/ thing - not absolute.

Please reread the patchset; bgp_update_restarted_peers() handles this by not waiting for peers that indicate R=1, thus taking care of a L<>L session. Also, a peer in M will never wait. There's your asymmetry ;)

Ok, so it doesn't include them in the calculation. It still defers sending updates to them, needlessly - if "fast convergence, while minimising disruption to the network that stayed up" is a goal.

This patchset is pretty exactly RFC 4724, aside from the Non-GR Keepalive thing. Peers in M will never wait, thus "go first". Peers in L will wait until they think they have a reasonable view of the network. That includes not waiting for other peers in L.

If delaying, it sends EoR immediately though, doesn't it? That's not RFC GR, and that needn't play nice with other implementations.

Yeah, maybe we should do that...  instead of arguing higher authority...

I'm not arguing to higher authority. I'm asking that you stop treating me as if I'm an idiot, and at least _listen_ to me.


E.g.:

- You NACK my patch to remove the startup timer and make the R-bit be
  dependent on state (any state is better than that damn timer - and
  I was perfectly happy to refine exactly what state, as discussed via
  IRC) with:

  "The R bit is intended to be based on wallclock time."

  After a (good) off-list discussion, and me posting a follow-up to
  address a concern you raised, which you reply to again arguing the need
  for a timer, you then post your own patch which, gosh, goes and removes
  the startup timer and makes the R-bit timer be, gosh, dependent on
  state, just as I had argued.

  WTF dude? Is that working together?

- We had a (productive I thought) discussion on IRC. We didn't resolve
  everything, but I thought we were going in the right direction, and I
  thought it became clear that we had different use-cases in mind. That
  you were concerned with minimising CPU churn on the restarted router,
  while I was concerned with minimising transient churn on the
  non-restarted router.

  The conclusion of that seemed to be for me to post a patch to fix the
  concrete issue you raised (restricting to the startup peers) and we
  could discuss further. I did that, you reply again with NACK on all the
  patches. WTF, how is that working together?

regards,
--
Paul Jakma      [email protected]  @pjakma Key ID: 64A2FF6A
Fortune:
You're never too old to become younger.
                -- Mae West

_______________________________________________
Quagga-dev mailing list
[email protected]
https://lists.quagga.net/mailman/listinfo/quagga-dev
