Take 4, fix placeholder reference with the actual xref.

* bgpd.texi: Document the -l argument. Update the 'BGP decision process' table
  to reflect what /actually/ is implemented. Add docs on 'compare-routerid' in
  the bestpath section.

  Add a section on MED, to highlight the issues it has by default, and to
  highlight that it is terminally broken for its original purpose in many
  modern iBGP topologies.

* routemap.texi: set an anchor on 'set metric' so bgpd.texi can reference it.
---
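For reviewers, here is a rough illustration of the order-dependence that the
new BGP MED section describes.  It is only a sketch, not bgpd code: the Route
fields, the three sample routes and the helper names (better, pairwise_best,
grouped_best) are all made up for illustration, and the comparison is reduced
to MED plus IGP cost, assuming every earlier step of the decision process
tied.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Route:
    # Hypothetical, heavily simplified route record (not a bgpd structure).
    name: str
    neighbour_as: str  # left-most AS in the received AS_PATH
    med: int
    igp_cost: int


def better(new, cur):
    """Pairwise check: MED is compared only within one neighbouring AS."""
    if new.neighbour_as == cur.neighbour_as and new.med != cur.med:
        return new.med < cur.med
    return new.igp_cost < cur.igp_cost


def pairwise_best(routes):
    """Scan in stored order, keeping the winner of each pairwise check."""
    best = routes[0]
    for route in routes[1:]:
        if better(route, best):
            best = route
    return best


def grouped_best(routes):
    """Pick a winner per neighbouring AS first, then compare the winners."""
    winners = {}
    for route in routes:
        cur = winners.get(route.neighbour_as)
        if cur is None or better(route, cur):
            winners[route.neighbour_as] = route
    # Sort so the final scan does not depend on arrival order.
    return pairwise_best(sorted(winners.values(), key=lambda r: r.name))


# Sample values chosen so that A beats C (MED), C beats B and B beats A (IGP).
A = Route("A", "X", med=10, igp_cost=30)
B = Route("B", "Y", med=0, igp_cost=20)
C = Route("C", "X", med=20, igp_cost=10)

for order in permutations([A, B, C]):
    names = "".join(r.name for r in order)
    print(names, pairwise_best(list(order)).name, grouped_best(list(order)).name)
```

With these sample values the pairwise scan can return any of A, B or C,
depending on the order the routes are stored in, while grouping by
neighbouring AS first always returns B.
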
 doc/bgpd.texi     | 264 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
 doc/routemap.texi |   1 +
 2 files changed, 259 insertions(+), 6 deletions(-)

diff --git a/doc/bgpd.texi b/doc/bgpd.texi
index 7d92b5e..83d55a9 100644
--- a/doc/bgpd.texi
+++ b/doc/bgpd.texi
@@ -18,6 +18,7 @@ BGP-4.
 @menu
 * Starting BGP::                
 * BGP router::                  
+* BGP MED::
 * BGP network::                 
 * BGP Peer::                    
 * BGP Peer Group::              
@@ -53,6 +54,13 @@ Set the bgp protocol's port number.
 @item -r
 @itemx --retain
 When program terminates, retain BGP routes added by zebra.
+
+@item -l
+@itemx --listenon
+Specify the IP address for bgpd to listen on, rather than its
+default of INADDR_ANY / IN6ADDR_ANY. This can be useful to constrain bgpd
+to an internal address, or to run multiple bgpd processes on one host.
+
 @end table
 
 @node BGP router
@@ -104,18 +112,59 @@ This command set distance value to
 @node BGP decision process
 @subsection BGP decision process
 
+The decision process Quagga BGP uses to select routes is as follows:
+
 @table @asis
 @item 1. Weight check
+Prefer higher local weight routes to lower routes.
   
-@item 2. Local preference check.
+@item 2. Local preference check
+Prefer higher local preference routes to lower.
+
+@item 3. Local route check
+Prefer local routes (statics, aggregates, redistributed) to received routes.
+
+@item 4. AS path length check
+Prefer shortest hop-count AS_PATHs.
+
+@item 5. Origin check
+Prefer the lowest origin type route.  That is, prefer IGP origin routes to
+EGP, and EGP to Incomplete routes.
+
+@item 6. MED check
+Where routes with a MED were received from the same AS,
+prefer the route with the lowest MED. @xref{BGP MED}.
+
+@item 7. External check
+Prefer the route received from an external, eBGP peer
+over routes received from other types of peers.
+
+@item 8. IGP cost check
+Prefer the route with the lower IGP cost.
+
+@item 9. Multi-path check
+If multi-pathing is enabled, then check whether
+the routes not yet distinguished in preference may be considered equal. If
+@ref{bgp bestpath as-path multipath-relax} is set, all such routes are
+considered equal, otherwise routes received via iBGP with identical AS_PATHs
+or routes received from eBGP neighbours in the same AS are considered equal.
+
 
-@item 3. Local route check.
+@item 10. Router-ID check
+Prefer the route with the lowest router-ID.  If the
+route has an ORIGINATOR_ID attribute, through iBGP reflection, then that
+router ID is used, otherwise the router-ID of the peer the route was
+received from is used.
 
-@item 4. AS path length check.
+@item 11. Cluster-List length check
+The route with the shortest cluster-list
+length is used.  The cluster-list reflects the iBGP reflection path the
+route has taken.
 
-@item 5. Origin check.
+@item 12. Peer address
+Prefer the route received from the peer with the higher
+transport layer address, as a last-resort tie-breaker.
 
-@item 6. MED check.
 @end table
 
 @deffn {BGP} {bgp bestpath as-path confed} {}
@@ -125,11 +174,31 @@ decision process.
 @end deffn
 
 @deffn {BGP} {bgp bestpath as-path multipath-relax} {}
+@anchor{bgp bestpath as-path multipath-relax}
 This command specifies that BGP decision process should consider paths
 of equal AS_PATH length candidates for multipath computation. Without
 the knob, the entire AS_PATH must match for multipath computation.
 @end deffn
 
+@deffn {BGP} {bgp bestpath compare-routerid} {}
+@anchor{bgp bestpath compare-routerid}
+
+Ensure that where iBGP routes are equal on most metrics, including
+local-pref, AS_PATH length, IGP cost, MED, the tie is broken based on
+router-ID.  If a route has an ORIGINATOR_ID attribute, i.e.  it has been
+reflected, that ID will be used.  Otherwise, the router-ID of the peer the
+route was received from will be used.
+
+The advantage of this is that the route-selection (at this point) will be
+deterministic, across iBGP.  The disadvantage is that such equal routes will
+tend to take the same exit out of the AS, via the lowest-ID router.
+
+If this option is enabled, then the external-age check, where already
+selected eBGP routes are preferred, is skipped.
+@end deffn
+
+
+
 @node BGP route flap dampening
 @subsection BGP route flap dampening
 
@@ -151,6 +220,189 @@ The route-flap damping algorithm is compatible with @cite{RFC2439}. The use of t
 is not recommended nowadays, see @uref{http://www.ripe.net/ripe/docs/ripe-378,,RIPE-378}.
 @end deffn
 
+@node BGP MED
+@section BGP MED
+
+The BGP @acronym{MED, Multi_Exit_Discriminator} attribute is intended to
+allow one AS to indicate its preferences for its ingress points to another
+AS.  The MED attribute will not be propagated on to another AS by the
+receiving AS - it is `non-transitive' in the BGP sense.
+
+E.g.@:, if AS X and AS Y have 2 different BGP peering points, then AS X
+might set a MED of 100 on routes advertised at one and a MED of 200 at the
+other.  When AS Y selects between otherwise equal routes to or via
+AS X, AS Y should prefer to take the path via the lower MED peering of 100 with
+AS X.  Setting the MED allows an AS to influence the routing taken to it
+within another, neighbouring AS.
+
+In this use of MED it is not really meaningful to compare the MED value on
+routes where the next AS on the paths differs.  E.g.@:, if AS Y also had a
+route for some destination via AS Z in addition to the routes from AS X, and
+AS Z had also set a MED, it wouldn't make sense for AS Y to compare AS Z's
+MED values to those of AS X.  The MED values have been set by different
+administrators, with different frames of reference.
+
+The default behaviour of BGP therefore is to not compare MED values across
+routes received from different neighbouring ASes.  In Quagga this is done by
+comparing the neighbouring, left-most AS in the received AS_PATHs of the
+routes and only comparing MED if those are the same.
+
+Unfortunately, this behaviour of MED, of sometimes being compared across
+routes and sometimes not, depending on the properties of those other routes,
+means MED can cause the order of preference over all the routes to be
+undefined.  That is, given routes A, B, and C, if A is preferred to B, and B
+is preferred to C, then a well-defined order should mean the preference is
+transitive (in the sense of orders @footnote{For some set of objects to have
+an order, there @emph{must} be some binary ordering relation that is defined
+between @emph{every} combination of those objects, @math{a \prec b}, and
+that relation @emph{must} be transitive, i.e.  if @math{a \prec b} and
+@math{b \prec c} then that relation must carry over and it must be that
+@math{a \prec c} for the objects to have an order.  If the relation allows
+for equality, i.e. if @math{a \prec b} and @math{b \prec a} may both be true
+and this implies that @math{a = b}, then some objects may be equal in order to each
+other and the order is partial.  Otherwise, if there is an order, all the
+objects are distinct and have a total order.  MED unfortunately does not
+define its order over all cases.}) and that A would be preferred to C.
+
+However, when MED is involved this need not be the case.  With MED it is
+possible that C is actually preferred over A.  This can be true even where
+BGP defines a deterministic ``most preferred'' route out of the full set of
+A,B,C. With MED, for any given set of routes there may be a deterministically
+preferred route, but there may be no way to arrange them into
+any order of preference.
+
+That MED can induce non-transitive orders of preference over routes can
+cause issues.  Firstly, it may be perceived to cause routing table churn
+locally at speakers; secondly it may cause routing instability in
+non-full-mesh iBGP topologies, where sets of speakers continually oscillate
+between different paths.
+
+The first issue arises from how speakers often implement routing decisions. 
+Though BGP defines a selection process that will deterministically select
+the same route as best at any given speaker, even with MED, that process
+requires evaluating all routes together.  For performance and ease of
+implementation reasons, many implementations evaluate route preferences in a
+pair-wise fashion instead.  Given there is no well-defined order when MED is
+involved, the best route that will be chosen becomes subject to
+implementation details, such as the order the routes are stored in.  That
+may be (locally) non-deterministic, e.g.@: it may be the order the routes
+were received in.  
+
+This indeterminism may be considered undesirable, though it need not cause
+problems.  It may mean additional routing churn is perceived, as sometimes
+more updates may be produced than at other times in reaction to some event.
+
+This first issue can be fixed with a more deterministic route selection that
+ensures routes are ordered by the neighbouring AS during selection. 
+@xref{bgp deterministic-med}.  This may reduce the number of updates as
+routes are received, and may in some cases reduce routing churn.  However, it
+could equally deterministically produce the largest possible set of updates
+in response to the most common sequence of received updates.
+
+A deterministic comparison tends to imply an additional overhead of sorting
+over any set of n routes to a destination.  The implementation of
+deterministic MED in Quagga scales significantly worse than most sorting
+algorithms at present, with the number of paths to a given destination. 
+That number is often low enough to not cause any issues, but where there are
+many paths, the deterministic comparison may quickly become increasingly
+expensive in terms of CPU.
+
+Deterministic local evaluation can @emph{not} fix the second issue of MED
+however, which is that the non-transitive preference of routes MED can
+cause may lead to routing instability or oscillation across multiple
+speakers.  This can occur with non-full-mesh iBGP topologies that reduce the
+routing information known to each speaker.  This has primarily been
+documented with iBGP route-reflection topologies.  However, any other
+route-hiding technologies potentially could also cause oscillation with MED.
+
+The second issue occurs where speakers each have only a subset of routes. 
+E.g.@: speaker X might have routes A,B, and speaker Y might have route C.  X
+selects A as its best, Y obviously can only choose C.  They exchange routes
+and then X might choose C as best from A,B,C while Y might choose A as best
+from A,C - the non-transitive, non-defined order of preference of routes
+that MED may induce allows this.  They then withdraw their routes and the
+cycle repeats.  This can occur even if all speakers use a deterministic
+order in route selection. 
+
+More complex and insidious cycles of oscillation have been documented in the
+literature.  See, e.g., @cite{McPherson, D.  and Gill, V.  and Walton, D.,
+ "Border Gateway Protocol (BGP) Persistent Route Oscillation Condition",
+ IETF RFC3345}, and @cite{Flavel, A.  and M.  Roughan, "Stable and flexible
+ iBGP", ACM SIGCOMM 2009}, and @cite{Griffin, T.  and G.  Wilfong,
+"On the correctness of IBGP configuration", ACM SIGCOMM 2002} for concrete examples and further
+references.
+
+There is as of this writing @emph{no} known way to use MED for its original
+purpose; @emph{and} reduce routing information in non-full-mesh iBGP
+topologies (e.g.@: with reflectors); @emph{and} be sure to avoid the
+instability problems of MED due to the non-transitive routing preferences it
+can induce.
+
+The instability problems that MED can introduce on more complex,
+non-full-mesh, iBGP topologies may be avoided by any of the following:
+
+@itemize 
+@item
+Deleting MED from all routes received from neighbouring ASes,
+and/or by ignoring MED entirely in the decision process.  There is no way to
+do this at this time in Quagga.
+@item
+Setting @ref{bgp always-compare-med}.  However, this allows MED to be compared
+across values set by different neighbour ASes, which may not produce
+desirable results.
+@item
+Setting MED to the same value (e.g.  0) using @ref{routemap set metric} on all
+received routes, in combination with setting @ref{bgp always-compare-med} on
+all speakers.
+@end itemize
+
+As MED is evaluated after the AS_PATH length check, another possible use for
+MED is for intra-AS steering of routes with equal AS_PATH length, as an
+extension of the last case above.  As MED is evaluated before IGP metric,
+this can allow cold-potato routing to be implemented, sending traffic to
+preferred hand-offs with neighbours, rather than the closest hand-off
+according to the IGP metric.  This would be done with @ref{routemap set
+metric} and by setting @ref{bgp always-compare-med} on all speakers.
+
+Note that even if action is taken to address the MED non-transitivity
+issues, other oscillations may still be possible, e.g.@: on IGP cost if iBGP
+and IGP topologies are at cross-purposes with each other.
+
+@deffn {BGP} {bgp deterministic-med} {}
+@anchor{bgp deterministic-med}
+
+Carry out route-selection in a way that produces more deterministic answers
+locally, even in the face of MED and the lack of a well-defined order of
+preference it can induce on routes.  Without this option the preferred route
+with MED may be determined largely by the order that routes were received
+in.
+
+Setting this option will have a performance cost that may be noticeable when
+there are many routes for each destination.  Currently in Quagga it is
+implemented in a way that scales poorly as the number of routes per
+destination increases.
+
+The default is that this option is not set.
+@end deffn
+
+Note that there are other sources of indeterminism in the route selection
+process, see @ref{BGP decision process}.
+
+@deffn {BGP} {bgp always-compare-med} {}
+@anchor{bgp always-compare-med}
+
+Always compare the MED on routes, even when they were received from
+different neighbouring ASes.  Setting this option makes the order of
+preference of routes more defined, and should eliminate MED-induced
+oscillations.
+
+This option can be used, together with @ref{routemap set metric}, to use MED
+as an intra-AS metric to steer equal-length AS_PATH routes to, e.g., desired
+exit points.
+@end deffn
+
+
+
 @node BGP network
 @section BGP network
 
@@ -188,7 +440,7 @@ This command specifies an aggregate address.
 @end deffn
 
 @deffn {BGP} {aggregate-address @var{A.B.C.D/M} as-set} {}
-This command specifies an aggregate address.  Resulting routes inlucde
+This command specifies an aggregate address.  Resulting routes include
 AS set.
 @end deffn
 
diff --git a/doc/routemap.texi b/doc/routemap.texi
index db3e72d..7938c96 100644
--- a/doc/routemap.texi
+++ b/doc/routemap.texi
@@ -171,6 +171,7 @@ Set the route's weight.
 @end deffn
 
 @deffn {Route-map Command} {set metric @var{metric}} {}
+@anchor{routemap set metric}
 Set the BGP attribute MED.
 @end deffn
 
-- 
2.5.0
