[quagga-dev 14070] Re: [PATCH] docs: Update bgpd docs, inc. on decision process, and with a section on MED.

Paul Jakma Thu, 03 Dec 2015 06:13:54 -0800

On Thu, 3 Dec 2015, Daniel Walton wrote:

I think it is a good idea to document how the bestpath algorithm works but personally there is an overwhelming amount of text here about MED.

There's a dearth of operator-orientated info out there on MED and its issues. Most of the docs seem to focus on the immediate, local consequences of DMED and always-compare, but don't really explore the bigger picture.

In particular, the non-transitive preferences over routes that MED can induce and the intrinsic problems that can cause with route-hiding iBGP mechanisms don't seem well-described outside of academic literature - which isn't very accessible to non-academics (both in literal access terms, and ease of reading).

 
      +@deffn {BGP} {bgp bestpath compare-routerid} {}
      +@anchor{bgp bestpath compare-routerid}
      +
      +Ensure that where iBGP routes are equal on most metrics, including
      +local-pref, AS_PATH length, IGP cost, MED, the tie is broken based on
      +router-ID.  If a route has an ORIGINATOR_ID attribute, i.e.  it has been
      +reflected, that ID will be used.  Otherwise, the router-ID of the peer the
      +route was received from will be used.
      +
      +The advantage of this is that the route-selection (at this point) will be
      +deterministic, across iBGP.  The disadvantage is that such equal routes will
      +tend to take the same exit out of the AS, via the lowest-ID router.
      +

Comparing the router-id always happens if both paths are from iBGP peers; it is only if they are both from eBGP peers that it applies.

"iBGP routes" in the above is probably badly worded. I didn't mean that to be iBGP origin there, but routes being compared where both were received from iBGP (or both from eBGP, as you note - but then only if the external-age check didn't do a return).


Or did you mean something else?
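
To make the tie-break described in the quoted text concrete, here is a minimal Python sketch. It is not bgpd code: the Path class, its field names and the addresses are made up for illustration, and it assumes router-IDs compare as plain IPv4 addresses with the lowest winning.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address
from typing import Optional


@dataclass(frozen=True)
class Path:
    # Hypothetical model of a received path, not a bgpd structure.
    peer_router_id: str                  # router-ID of the peer the path came from
    originator_id: Optional[str] = None  # ORIGINATOR_ID, present only if reflected


def tie_break_id(path: Path) -> IPv4Address:
    """ID used for the tie-break: ORIGINATOR_ID if present, else the peer's router-ID."""
    return IPv4Address(path.originator_id or path.peer_router_id)


def prefer_by_router_id(a: Path, b: Path) -> Path:
    """Return the path whose tie-break ID is numerically lowest."""
    return a if tie_break_id(a) <= tie_break_id(b) else b


direct = Path(peer_router_id="10.0.0.9")
reflected = Path(peer_router_id="10.0.0.1", originator_id="10.0.0.20")
best = prefer_by_router_id(direct, reflected)
```

Here the reflected path is judged on its ORIGINATOR_ID (10.0.0.20), not on the reflector's own router-ID (10.0.0.1), so the direct path is preferred.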

Not your change but above reads "The use of t is not" instead of "The use of
it is not"


Can fix in another trivial patch.

      +A deterministic comparison tends to imply an additional overhead of sorting
      +over any set of n routes to a destination.  The implementation of
      +deterministic MED in Quagga scales significantly worse than most sorting
      +algorithms at present, with the number of paths to a given destination.
      +That number is often low enough to not cause any issues, but where there are
      +many paths, the deterministic comparison may quickly become increasingly
      +expensive in terms of CPU.

I would say that the details of the sorting algorithm used are probably more
info than the average person is interested in if they are trying to
understand how bestpath works.

It seems relevant to an operator. DMED is not free, it has an intrinsic cost. Operators surely will want to have the information they need to be able to balance the costs against the benefits?
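
For readers unfamiliar with what the deterministic comparison does, a rough Python sketch of the idea follows. It is an illustration only and is not how Quagga implements it: the Route class, the example values and the reduction of the rest of the decision process to IGP cost are all assumptions. Routes are first grouped by neighbouring AS, MED picks a winner inside each group, and only the group winners are compared with each other.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Route:
    # Hypothetical, simplified route: only the fields this sketch needs.
    name: str
    neighbor_as: int
    med: int
    igp_cost: int


def dmed_best(routes):
    """Pick a best route independent of the order the routes arrive in."""
    # Step 1: within each neighbouring AS, keep only the lowest-MED route.
    winners = {}
    for r in routes:
        cur = winners.get(r.neighbor_as)
        if cur is None or (r.med, r.igp_cost, r.name) < (cur.med, cur.igp_cost, cur.name):
            winners[r.neighbor_as] = r
    # Step 2: compare the per-AS winners on IGP cost (name breaks exact ties).
    return min(winners.values(), key=lambda r: (r.igp_cost, r.name))


ROUTES = [
    Route("A", neighbor_as=1, med=10, igp_cost=1),
    Route("B", neighbor_as=2, med=5, igp_cost=2),
    Route("C", neighbor_as=1, med=0, igp_cost=3),
]
results = {dmed_best(list(p)).name for p in permutations(ROUTES)}
```

Every arrival order yields the same winner, which is the determinism being paid for. The grouping step is the extra work per destination, and how expensive it is depends entirely on how it is implemented.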

      +There is as of this writing @emph{no} known way to use MED for its original
      +purpose; @emph{and} reduce routing information in non-full-mesh iBGP
      +topologies (e.g with reflectors); @emph{and} be sure to avoid the
      +instability problems of MED due the non-transitive routing preferences it
      +can induce.


But there is a way :)

Is it _sure to avoid_ though? There are many many networks, and there are different ways BGP can behave even on the same network.

MED intrinsically has an undefined order of preference across routes, that's the source of all the issues with it. iBGP topologies are getting bigger and more complex (though some, DCs, are getting more regular in structure), and we ship with defaults that leave the more complex iBGP networks wide open to problems caused by MED.
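
A toy Python illustration of that missing order, under stated assumptions: the Route class and the values are invented, MED is compared only between routes from the same neighbouring AS (i.e. no always-compare), and everything else in the decision process is collapsed into IGP cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    # Hypothetical, simplified route: only the fields this sketch needs.
    name: str
    neighbor_as: int
    med: int
    igp_cost: int


def prefer(x: Route, y: Route) -> Route:
    """Pairwise preference: MED only within one neighbour AS, otherwise IGP cost."""
    if x.neighbor_as == y.neighbor_as and x.med != y.med:
        return x if x.med < y.med else y
    return x if x.igp_cost <= y.igp_cost else y


def sequential_best(routes):
    """Compare routes one after another in arrival order (non-deterministic MED)."""
    best = routes[0]
    for r in routes[1:]:
        best = prefer(best, r)
    return best


A = Route("A", neighbor_as=1, med=10, igp_cost=1)
B = Route("B", neighbor_as=2, med=5, igp_cost=2)
C = Route("C", neighbor_as=1, med=0, igp_cost=3)

# A beats B (IGP cost), B beats C (IGP cost), yet C beats A (MED): a cycle.
cycle = (prefer(A, B).name, prefer(B, C).name, prefer(C, A).name)
by_order = {
    "ABC": sequential_best([A, B, C]).name,
    "BCA": sequential_best([B, C, A]).name,
    "CAB": sequential_best([C, A, B]).name,
}
```

With these three routes the pairwise preferences form a cycle, so there is no consistent "best", and the sequential comparison returns a different winner for each of the three arrival orders shown.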

 *  Preferring the oldest external path solves one scenario
 *  "Type I" churn (as described in RFC 3345) can be solved by tweaking IGP
    metrics.  If you are using RRs you just have to make your inter-cluster
    links have a higher cost than your intra-cluster links (same theory with
    confeds).  When we first discovered MED churn most customers that were
    hitting it were able to solve it via this approach.
 *  "Type II" churn can be solved by using addpath to TX the bestpath per
    neighbor-AS...see draft-ietf-idr-route-oscillation-stop-01

I don't know how to describe these cases in a way that an operator could apply the advice and be sure they had avoided MED issues though. Patch to the doc welcome though. ;)

Indeed, can you prove the issues are solved with those approaches? There are papers that derive quite simple rules that engineers can apply and be _sure_ that their path-vector protocol will converge, and even will converge on optimal routes. The IGP cost case potentially can be proven to meet those rules, but that proof will be specific to the network - not a general proof to any network.


There are simple fixes to the "churn" issues, certainly if one leverages the academic work and recognises the root of the problem: it's due to fundamental ordering properties of the metrics involved (or utter lack thereof).

I think the text above remains correct, there is no way to have all those 3 things, including being "sure to avoid the instability problems", as far as I am aware. At least in general (that phrase might be missing).

      +Note that even if action is taken to address the MED non-transitivity
      +issues, other oscillations may still be possible.  E.g.  on IGP cost if iBGP
      +and IGP topologies are at cross-purposes with each other.


Can you clarify here?

Flavel and Roughan give an example, and I think at least one of Griffin's papers might give a few examples of IGP "wedgies", iirc.

Would say "produces deterministic" instead of "produces more deterministic".


Ack.

      +Setting this option will have a performance cost that may be
      +noticeable when there are many routes for each destination.
      +Currently in Quagga it is implemented in a way that scales poorly
      +as the number of routes per destination increases.

Why don't we fix our implementation so that it is less expensive and chop the paragraph above? I am worried that we will end up discouraging customers from enabling deterministic-med.

Well, I'm not aware of DMED fixing anything, so I'm not going to spend my time on that. Someone else could, and update the above.

Till then, it seems like important information for admins. If they choose not to enable DMED, they're not losing anything afaik.

Really, they should enable always-compare and set all MEDs to 0 when received from eBGP, unless they have a specific use for MED. In which case, DMED would be irrelevant anyway.
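
As a rough illustration of that suggestion, a bgpd configuration fragment might look something like the following. The AS numbers, the neighbour address and the route-map name ZERO-MED are placeholders, and this is a sketch rather than a tested recommendation for any particular network.

```
router bgp 64512
 ! compare MED across all neighbouring ASes, not only within one
 bgp always-compare-med
 neighbor 192.0.2.1 remote-as 64513
 ! reset MED on everything received from this eBGP peer
 neighbor 192.0.2.1 route-map ZERO-MED in
!
route-map ZERO-MED permit 10
 set metric 0
```

With every received MED reset to 0, MED no longer distinguishes any routes, so the comparison it feeds cannot introduce the ordering problems discussed above.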

      +Note that there are other sources of indeterminism in the route selection
      +process, @xref{BGP decision process}.

Other than "prefer oldest external" what sources of indeterminism are there?


That's the one I had in mind.

regards,
--
Paul Jakma, HPE Networking, Advanced Technology Group
Fortune:
  Live within your income, even if you have to borrow to do so.
  -Josh Billings

_______________________________________________
Quagga-dev mailing list
[email protected]
https://lists.quagga.net/mailman/listinfo/quagga-dev
