On Thu, 3 Dec 2015, Daniel Walton wrote:

I think it is a good idea to document how the bestpath algorithm works but personally there is an overwhelming amount of text here about MED.

There's a dearth of operator-orientated info out there on MED and its issues. Most of the docs seem to focus on the immediate, local consequences of DMED and always-compare, but don't really explore the bigger picture.

In particular, the non-transitive preferences over routes that MED can induce, and the intrinsic problems those can cause with route-hiding iBGP mechanisms, don't seem well described outside of the academic literature, which isn't very accessible to non-academics (both in literal access terms and in ease of reading).

 
      +@deffn {BGP} {bgp bestpath compare-routerid} {}
      +@anchor{bgp bestpath compare-routerid}
      +
      +Ensure that where iBGP routes are equal on most metrics, including
      +local-pref, AS_PATH length, IGP cost, MED, the tie is broken based on
      +router-ID.  If a route has an ORIGINATOR_ID attribute, i.e.  it has been
      +reflected, that ID will be used.  Otherwise, the router-ID of the peer the
      +route was received from will be used.
      +
      +The advantage of this is that the route-selection (at this point) will be
      +deterministic, across iBGP.  The disadvantage is that such equal routes will
      +tend to take the same exit out of the AS, via the lowest-ID router.
      +

Comparing the router-id always happens if both paths are from iBGP peers; the option only applies if they are both from eBGP peers.

"iBGP routes" in the above is probably badly worded. I didn't mean that to be iBGP origin there, but routes being compared where both were received from iBGP (or both from eBGP, as you note - but then only if the external-age check didn't do a return).

Or did you mean something else?
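
To make the tie-break described in the patch text concrete, here is a minimal Python sketch. It is not Quagga code: the `Path` class and both function names are invented for illustration, and it assumes all earlier bestpath steps have already tied. It only shows the rule as worded above: use ORIGINATOR_ID if the route was reflected, otherwise the router-ID of the sending peer, and prefer the lower ID.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address
from typing import Optional


@dataclass(frozen=True)
class Path:
    # Hypothetical container, not a Quagga structure.
    peer_router_id: str                  # router-ID of the peer that sent the route
    originator_id: Optional[str] = None  # ORIGINATOR_ID, present only on reflected routes


def tiebreak_id(path: Path) -> IPv4Address:
    """ID used for the tie-break: ORIGINATOR_ID if present, else the peer's router-ID."""
    return IPv4Address(path.originator_id or path.peer_router_id)


def router_id_tiebreak(a: Path, b: Path) -> Path:
    """Return the path with the numerically lower ID (a wins an exact tie)."""
    return a if tiebreak_id(a) <= tiebreak_id(b) else b
```

IDs are compared numerically rather than as strings, so 10.0.0.9 ranks below 10.0.0.10.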
  
Not your change but above reads "The use of t is not" instead of "The use of
it is not"

Can fix in another trivial patch.
 
      +A deterministic comparison tends to imply an additional overhead of sorting
      +over any set of n routes to a destination.  The implementation of
      +deterministic MED in Quagga scales significantly worse than most sorting
      +algorithms at present, with the number of paths to a given destination.
      +That number is often low enough to not cause any issues, but where there are
      +many paths, the deterministic comparison may quickly become increasingly
      +expensive in terms of CPU.

I would say that the details of the sorting algorithm used are probably more
info than the average person is interested in if they are trying to
understand how bestpath works.

It seems relevant to an operator. DMED is not free, it has an intrinsic cost. Operators surely will want to have the information they need to be able to balance the costs against the benefits?
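
For readers wondering where the sorting overhead comes from, here is a rough Python sketch of a deterministic-MED style selection. It is an illustration under simplifying assumptions, not how Quagga implements it: the `Path` tuple and `dmed_best` are made-up names, and only MED and IGP cost are modelled. Paths are grouped by neighbour AS, the lowest MED wins within each group, and the group winners are then compared on IGP cost.

```python
from itertools import groupby
from typing import List, NamedTuple


class Path(NamedTuple):
    # Hypothetical, heavily simplified path record.
    name: str
    neighbor_as: int
    med: int
    igp_cost: int


def dmed_best(paths: List[Path]) -> Path:
    """Pick a best path that does not depend on the order paths arrived in."""
    # Sorting costs O(n log n) for n paths; name is a final, arbitrary tie-break.
    ordered = sorted(paths, key=lambda p: (p.neighbor_as, p.med, p.igp_cost, p.name))
    # The first path of each neighbour-AS group has the lowest MED in that group.
    winners = [next(group) for _, group in groupby(ordered, key=lambda p: p.neighbor_as)]
    # Group winners come from different ASes, so MED is not compared between them.
    return min(winners, key=lambda p: (p.igp_cost, p.name))
```

Even this sort-based version has a per-destination cost that grows with the number of paths, which is the trade-off an operator would want to know about.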
  
      +There is as of this writing @emph{no} known way to use MED for its original
      +purpose; @emph{and} reduce routing information in non-full-mesh iBGP
      +topologies (e.g with reflectors); @emph{and} be sure to avoid the
      +instability problems of MED due the non-transitive routing preferences it
      +can induce.


But there is a way :)

Is it _sure to avoid_ though? There are many many networks, and there are different ways BGP can behave even on the same network.

MED intrinsically has an undefined order of preference across routes, that's the source of all the issues with it. iBGP topologies are getting bigger and more complex (though some, DCs, are getting more regular in structure), and we ship with defaults that leave the more complex iBGP networks wide open to problems caused by MED.
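
A small worked example of that undefined order, as a Python sketch. The three paths, their values and the `prefer` function are invented for illustration, and the decision process is cut down to two steps: MED (compared only between paths from the same neighbour AS) and then IGP cost.

```python
from typing import NamedTuple


class Path(NamedTuple):
    # Hypothetical, heavily simplified path record.
    name: str
    neighbor_as: int
    med: int
    igp_cost: int


def prefer(a: Path, b: Path) -> Path:
    """Pairwise choice: MED only between paths from the same neighbour AS, then IGP cost."""
    if a.neighbor_as == b.neighbor_as and a.med != b.med:
        return a if a.med < b.med else b
    return a if a.igp_cost <= b.igp_cost else b


A = Path("A", neighbor_as=1, med=2, igp_cost=1)
B = Path("B", neighbor_as=2, med=0, igp_cost=2)
C = Path("C", neighbor_as=1, med=1, igp_cost=3)

# A beats B on IGP cost, B beats C on IGP cost, yet C beats A on MED.
cycle = [prefer(A, B).name, prefer(B, C).name, prefer(C, A).name]
print(cycle)  # ['A', 'B', 'C']
```

No ranking of A, B and C is consistent with all three pairwise results, so which route a router ends up with can depend on which of the others it happens to see.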

 *  Preferring the oldest external path solves one scenario
 *  "Type I" churn (as described in RFC 3345) can be solved by tweaking IGP
    metrics.  If you are using RRs you just have to make your inter-cluster
    links have a higher cost than your intra-cluster links (same theory with
    confeds).  When we first discovered MED churn most customers that were
    hitting it were able to solve it via this approach.
 *  "Type II" churn can be solved by using addpath to TX the bestpath per
    neighbor-AS...see draft-ietf-idr-route-oscillation-stop-01

I don't know how to describe these cases in a way that an operator could apply the advice and be sure they had avoided MED issues. Patch to the doc welcome though. ;)

Indeed, can you prove the issues are solved with those approaches? There are papers that derive quite simple rules that engineers can apply and be _sure_ that their path-vector protocol will converge, and even will converge on optimal routes. The IGP cost case potentially can be proven to meet those rules, but that proof will be specific to the network - not a general proof to any network.

There are simple fixes to the "churn" issues, certainly if one leverages the academic work and recognises the root of the problem: it's due to fundamental ordering properties of the metrics involved (or utter lack thereof).

I think the text above remains correct: there is no way to have all 3 of those things, including being "sure to avoid the instability problems", as far as I am aware. At least in general (that phrase might be missing).

      +Note that even if action is taken to address the MED non-transitivity
      +issues, other oscillations may still be possible.  E.g.  on IGP cost if iBGP
      +and IGP topologies are at cross-purposes with each other.


Can you clarify here? 

Flavel and Roughan give an example, and I think at least one of Griffin's papers might give a few examples of IGP "wedgies", iirc.


Would say "produces deterministic" instead of "produces more deterministic".

Ack.

      +Setting this option will have a performance cost that may be noticeable when
      +there are many routes for each destination.  Currently in Quagga it is
      +implemented in a way that scales poorly as the number of routes per
      +destination increases.


Why don't we fix our implementation so that it is less expensive and chop the paragraph above? I am worried that we will end up discouraging customers from enabling deterministic-med.

Well, I'm not aware of DMED fixing anything, so I'm not going to spend my time on that. Someone else could, and update the above.

Till then, it seems like important information for admins. If they choose not to enable DMED, they're not losing anything afaik.

Really, they should enable always-compare and set all MEDs to 0 when received from eBGP, unless they have a specific use for MED. In which case, DMED would be irrelevant anyway.
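
A sketch of why that combination behaves well, again in Python with made-up names (`Path`, `zero_med`, `best_always_compare`) and only MED and IGP cost modelled. With always-compare, MED is compared across all neighbour ASes, so every path can be ranked by one sort key; a single key is a total order, and the winner cannot depend on arrival order. Zeroing MED on receipt then takes it out of the decision entirely.

```python
from typing import List, NamedTuple


class Path(NamedTuple):
    # Hypothetical, heavily simplified path record.
    name: str
    neighbor_as: int
    med: int
    igp_cost: int


def zero_med(path: Path) -> Path:
    """Model an inbound policy that resets MED to 0 on routes learned from eBGP."""
    return path._replace(med=0)


def best_always_compare(paths: List[Path]) -> Path:
    """With MED compared across all ASes, one sort key ranks every path."""
    return min(paths, key=lambda p: (p.med, p.igp_cost, p.name))


paths = [
    Path("A", neighbor_as=1, med=2, igp_cost=1),
    Path("B", neighbor_as=2, med=0, igp_cost=2),
    Path("C", neighbor_as=1, med=1, igp_cost=3),
]
print(best_always_compare(paths).name)                         # B (lowest MED)
print(best_always_compare([zero_med(p) for p in paths]).name)  # A (lowest IGP cost)
```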
  
      +Note that there are other sources of indeterminism in the route selection
      +process, @xref{BGP decision process}.

Other than "prefer oldest external" what sources of indeterminism are there?

That's the one I had in mind.

regards,
--
Paul Jakma, HPE Networking, Advanced Technology Group
Fortune:
  Live within your income, even if you have to borrow to do so.
  -Josh Billings