On Fri, 15 May 2015, David Lamparter wrote:
On Thu, May 14, 2015 at 10:45:58PM +0100, Paul Jakma wrote:
a) GR peers that are both in update-delay just immediately send EoR to
each other, and so go out of update-delay mode with each other.
This is incorrect; you seem to have forgotten update-delay is global.
No I didn't, it is the global aspect of delaying all UPDATEs that I have
the issue with.
Each side will stay in update-delay until *all* of its peers are in an
acceptable state, where
acceptable := EoR || R=1 || (NoGR && Keepalive)
or alternatively the timer expires.
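
(For readers following along, the condition above can be sketched in C roughly as below. This is only an illustration of the predicate as stated in this thread, not the patch's code: struct peer_state, its fields, and both function names are hypothetical and do not correspond to Quagga's struct peer.)

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-peer flags; NOT Quagga's struct peer. */
struct peer_state {
    bool gr_capable;     /* peer advertised the Graceful Restart capability */
    bool r_bit;          /* peer set R=1 in its GR capability */
    bool eor_received;   /* End-of-RIB marker received from this peer */
    bool keepalive_seen; /* Keepalive received from this peer (the "NoGR" case) */
};

/* acceptable := EoR || R=1 || (NoGR && Keepalive) */
static bool peer_acceptable(const struct peer_state *p)
{
    return p->eor_received || p->r_bit ||
           (!p->gr_capable && p->keepalive_seen);
}

/* The global update-delay ends once every peer is acceptable,
 * or as soon as the update-delay timer has expired. */
static bool update_delay_over(const struct peer_state *peers, size_t n,
                              bool timer_expired)
{
    assert(peers != NULL || n == 0);

    if (timer_expired)
        return true;
    for (size_t i = 0; i < n; i++) {
        if (!peer_acceptable(&peers[i]))
            return false;
    }
    return true;
}
```

(Note the "all peers" loop: a single GR peer that has sent neither EoR nor R=1 keeps the whole box in update-delay until the timer fires.)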
Yes, understood.
b) Non-GR peers sit in update-delay mode until one or both send keepalives
to kick the other out of update-delay mode.
Again, they stay in update-delay mode until all peers globally are in an
acceptable state, or the timer expires. The "look for Keepalive instead
of EoR" thing seems to be a Cumulus invention, we could strip that out
if needed. It's not a core aspect of the functionality, really.
The keepalive thing is sort of neat, though it kind of depends on what
you're optimising for.
My issue with it is that this is optimising for one specific case. One
router restarts (R in the centre), and this is optimising for where all
its neighbours are stable (stable in the uptime sense - not the "quiescent
RIB sense" - RIB-stability and uptime-stability may well be inversely
related):
S S S
 \|/
  R
 /|\
S S S
I.e. one router has restarted.
More generally, the restarted router has 0 or more non-restarted peers and
0 or more restarted peers:
 0...
S S S
 \|/
  R
 / \
R   R
 0...
To minimise the impact of the restarted routing domain on the stable
domain, you would want the restarted-domain to converge fully first
(internally and on the information from the stable domain), and only then
allow routing information to go from the restarted domain to the stable
domain. That'd be hard to do, even with free rein to change BGP. So you
can only consider local information.
Locally, you want to send updates to other routers in the R-domain as
quickly as possible, and delay sending to the S-domain as long as
possible.
Stopping all RIB processing is fine in the first case (you could argue
this single-router R-domain is more common, and it's fine for
route-servers), but it's less good in general for routing convergence.
What I would like is to defer UPDATEs only from the R-domain to S-domain
peers. This is what the current code does.
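
(To make the difference between the two policies concrete, here is a rough C sketch. It is an assumption-laden illustration, not code from either patch: the enum, struct peer_view and defer_updates_to() are hypothetical names, and using the peer's R bit to decide that it belongs to the "R-domain" is a simplification made for the example.)

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical policy selector, for illustration only. */
enum delay_policy {
    DELAY_GLOBAL,           /* hold UPDATEs to every peer while delaying */
    DELAY_STABLE_PEERS_ONLY /* hold UPDATEs only towards S-domain peers */
};

/* Hypothetical view of a peer; NOT Quagga's struct peer. */
struct peer_view {
    bool r_bit; /* assumption: R=1 marks the peer as restarted (R-domain) */
};

/* Should UPDATEs towards this peer be held back right now? */
static bool defer_updates_to(enum delay_policy policy,
                             bool local_in_update_delay,
                             const struct peer_view *peer)
{
    assert(peer != NULL);

    if (!local_in_update_delay)
        return false;     /* not delaying at all: send to everyone */
    if (policy == DELAY_GLOBAL)
        return true;      /* global delay: nothing goes out to anyone */
    return !peer->r_bit;  /* per-peer: only stable peers have to wait */
}
```

(With DELAY_GLOBAL a restarted neighbour waits just as long as a stable one; with DELAY_STABLE_PEERS_ONLY two restarted routers can exchange UPDATEs straight away while the stable domain is shielded.)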
With the global update-delay, CPU churn is being traded off for worse
convergence (I had one other concern about it, I initially thought it was
adding a queue - but that wasn't the case). I'm not sure that that is a
trade-off that everyone wants. I also dislike configurable options - we
have lots already. It'd be better to "do the right thing" as much as
possible.
What I've been trying to do is explore how to get there.
(With Non-GR, we don't really know when the peer is done sending us
their full table... using keepalives for that seems, hm, "innovative",
though I agree with you it might turn out to be a bad idea. Should
investigate this further.)
It's a neat trick, if you want to wait. Not everyone wants their routers
to "wait and see" though, sometimes they want them to get back to passing
UPDATEs ASAP.
Uh, no, they don't. The patch creates a global per-box M/L state; a
box starts out believing it is in L and transitions into M based on
above "acceptable" condition on *all* peers, or timer expiry. As long
as we're L, we don't send any outgoing updates as an automatic
side-effect of not running bestpath selection.
And that is precisely the issue.
Nothing gets sent globally, even though it would make sense to still send
where both sides are equal. The L is a /relative/ thing - not absolute.
Please reread the patchset; bgp_update_restarted_peers() handles this
by not waiting for peers that indicate R=1, thus taking care of a L<>L
session. Also, a peer in M will never wait. There's your asymmetry ;)
Ok, so it doesn't include them in the calculation. It still defers sending
updates to them, needlessly - if "fast convergence, while minimising
disruption to the network that stayed up" is a goal.
This patchset is pretty exactly RFC 4724, aside from the Non-GR
Keepalive thing. Peers in M will never wait, thus "go first". Peers in
L will wait until they think they have a reasonable view of the network.
That includes not waiting for other peers in L.
If delaying, it sends EoR immediately though, doesn't it? That's not RFC
GR, and that needn't play nice with other implementations.
Yeah, maybe we should do that... instead of arguing higher authority...
I'm not arguing to higher authority. I'm asking that you stop treating me
as if I'm an idiot, and at least _listen_ to me.
E.g.:
- You NACK my patch to remove the startup timer and make the R-bit be
dependent on state (any state is better than that damn timer - and
I was perfectly happy to refine exactly what state, as discussed via
IRC) with:
"The R bit is intended to be based on wallclock time."
After a (good) off-list discussion, and me posting a follow-up to
address a concern you raised, which you reply to again arguing the need
for a timer, you then post your own patch which, gosh, goes and removes
the startup timer and makes the R-bit timer be, gosh, dependent on
state, just as I had argued.
WTF dude? Is that working together?
- We had a (productive I thought) discussion on IRC. We didn't resolve
everything, but I thought we were going in the right direction, and I
thought it became clear that we had different use-cases in mind. That
you were concerned with minimising CPU churn on the restarted router,
while I was concerned with minimising transient churn on the
non-restarted router.
The conclusion of that seemed to be for me to post a patch to fix the
concrete issue you raised (restricting to the startup peers) and we
could discuss further. I did that, you reply again with NACK on all the
patches. WTF, how is that working together?
regards,
--
Paul Jakma [email protected] @pjakma Key ID: 64A2FF6A
Fortune:
You're never too old to become younger.
-- Mae West