[quagga-dev 12328] Re: [PATCH 2/5] bgpd: strip incorrect Graceful Restart R-bit code

Paul Jakma Sun, 17 May 2015 12:38:17 -0700

On Sun, 17 May 2015, David Lamparter wrote:

[meme images were inserted into this mail after i switched into f*ck it
mode.]

I can see. The mode that is. I generally read email with a terminal-based MUA.

Because this is a scenario with only one router in the R set, yet we're
generating a significant amount of network-propagated churn that can
impact larger parts of the DFZ.


Compared to what? Not to normal BGP.

Could an update-delay with a global wait do better? Yes. *IF* the timer has been tuned *just right*.

Tell me how we do that for all Quagga users so we can enable update-delay by default?

I don't believe we can. To enable update-delay, the max-delay would have to be so low that it'd be plain-BGP for many people. In which case, the peer-variant would be better.

I say we can support all the use-cases. You say only your use-case should be supported.

We can end up telling A that our best path to 6 is B, then D, then C, in rapid succession - and it's quite possible this is actually the best path for A, meaning it'll readvertise this to its own peers.


So that's fine.

If you update a peer with routes that are still via you, its FIB doesn't change - it should still go via you (your own FIB may change needlessly though).

All the granularities achieve that. They all avoid the spurious "advertise-withdraw" because the restarted router sends an ultimately-non-best route to the stable peer before.

Coalescing more over a longer time-span. I do *NOT* object to that. However, I would object to making it so every peer /always/ has to be subject to that delay.

Further, we will (I strongly suspect) never be able to enable the global update-delay by default.

We can though have the default be much better. Maybe not as good as update-delay with a suitable max-delay for certain use cases, but certainly better than default, plain BGP-4.

Yet you seem to want to make this an either/or thing. Either plain-BGP or your way, and no other. I really don't get that.

Remember, in general, you have:


No, I don't remember that, because I'm assuming the number of routers
that has restarted to be 0 or 1.

[snip large bulk of text that assumes more than 1 router in R]


Yes, the general case.

If you limit the case so there are no other restarted peers, then this doesn't matter. If you constrain things to just the case that suits your argument, then sure, your argument does indeed win. Is that your concern here?

The GR RFC explicitly says the restarting speaker "MUST defer route selection" and "noted that prior to route selection, the speaker has no routes to advertise to its peers and no routes to update the forwarding state." Where 'its peers' includes other peers that are restarting.

Great. So what. We can do better, by not applying this optimisation where it's not needed and so allowing us to provide a timer-free and hence friendlier to use, easier to enable, GR that will still optimise out some churn and without harming convergence time. And perfectly interoperably. And while still providing the update-delay mode you want.
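
The two deferral granularities being argued over can be put side by side in a few lines. This is only a sketch of how the thread describes them, not bgpd code: the names `Deferral` and `may_advertise_to` are hypothetical, and the only state modelled is which peers have sent their End-of-RIB marker.

```python
from enum import Enum


class Deferral(Enum):
    """Hypothetical labels for the modes discussed in this thread."""
    NONE = "plain-bgp"      # advertise as soon as a best path exists
    PER_PEER = "per-peer"   # hold back only the peer we are still learning from
    GLOBAL = "global"       # hold back everyone until all peers are done


def may_advertise_to(peer, peers, eor_received, mode):
    """Return True if the restarted speaker may send UPDATEs to `peer`.

    peers        -- set of all configured peers (assumed input)
    eor_received -- set of peers whose End-of-RIB marker has arrived
    """
    if mode is Deferral.NONE:
        return True
    if mode is Deferral.PER_PEER:
        # Only this peer's own End-of-RIB matters.
        return peer in eor_received
    # GLOBAL: every peer must have finished sending its routes.
    return all(p in eor_received for p in peers)
```

With peers A, B and C and only A's End-of-RIB received, the per-peer mode already lets A be updated, while the global mode holds back all three. In the global mode a single slow or silent peer delays every other peer, which is where the max-delay timer comes in.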

I'm still arguing my (1.) and (2.) from previous mail, i.e.:

1. global deferral is necessary to avoid network-propagating churn in
  the situation where only 1 router is restarting

Well, yes, if you want to limit the R-set to just 1-router you can ignore my arguments. Sure.


In reality, in general, there will be 0...m peers that have restarted.

BGP consists of more than bytes on the wire.

I look forward to your patches to store received routes in a separate AdjIn RIB, as per RFC4271. ;) Until then, gosh, look, our RIB /does/ in fact have routes to advertise. ;)

What extra churn though? There is no extra churn relative to BGP-4.


It's either extra delay relative to BGP-4, or extra churn relative to
4724 GR.  Feel free to pick one.

Yes, there are trade-offs. I acknowledged that at least in our IRC discussion.


Minimising CPU churn vs optimising for best convergence. Etc.

it filters out the worst transients (sending a route that the remote-peer
has a better path for before you've got it, leading to
UPDATE-then-WITHDRAW to that remote peer)


Those are actually the least problematic because the peer won't select
them and they won't continue to travel through the BGP domain.


I am not arguing against the mode you want at all!

I'm saying other modes are also useful, because not everyone wants to minimise CPU churn. Some people are not running massive route-servers. You want to optimise for one case only.

Note, the "UPDATE, oops, yours was better. WITHDRAW" case that GR can avoid (both peer-specific or global) can cause issues locally, because local FIB update-loads from such transients can be a problem of themselves. Including for us, for those who use Quagga for forwarding.
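
To make the "UPDATE, oops, yours was better. WITHDRAW" transient concrete, here is a toy model for a single prefix. It is a sketch under loud assumptions, not how bgpd works: `messages_to_peer` is a hypothetical helper, best path is simply the highest preference number, and a route learned from a peer is never advertised back to that peer.

```python
def messages_to_peer(target, arrivals, defer_until_eor):
    """List the messages a restarted router sends to `target` for one prefix.

    arrivals        -- events in arrival order: (peer, preference) for a
                       received route, or (peer, "EOR") for End-of-RIB
    defer_until_eor -- if True, send nothing to `target` until its own
                       End-of-RIB has arrived (the peer-specific deferral)
    """
    routes = {}        # peer -> preference of the route it sent us
    eor = set()        # peers whose End-of-RIB we have seen
    advertised = None  # peer whose route we last advertised to target
    sent = []
    for peer, item in arrivals:
        if item == "EOR":
            eor.add(peer)
        else:
            routes[peer] = item
        if defer_until_eor and target not in eor:
            continue
        best = max(routes, key=routes.get) if routes else None
        # Assumed simplification: never send target its own route back.
        want = best if best != target else None
        if want != advertised:
            sent.append("UPDATE" if want is not None else "WITHDRAW")
            advertised = want
    return sent


# B's route arrives first; stable peer A turns out to have the better one.
arrivals = [("B", 100), ("B", "EOR"), ("A", 200), ("A", "EOR")]
print(messages_to_peer("A", arrivals, defer_until_eor=False))  # ['UPDATE', 'WITHDRAW']
print(messages_to_peer("A", arrivals, defer_until_eor=True))   # []
```

Without deferral, A is sent a route via B and then a withdraw once its own better route shows up; waiting for A's End-of-RIB suppresses both messages. A global wait would suppress them as well in this scenario.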

Or we can discuss how our community works overall.


Yes, please do kick that off.

regards,
--
Paul Jakma      [email protected]  @pjakma Key ID: 64A2FF6A
Fortune:
"If you are afraid of loneliness, don't marry."
-- Chekhov
