Take 2. Fix some typos and mispoolings. Some word-smithing and
clarifications.
I've tried to keep it neutral editorially, but my low opinion of MED may
have snuck in here and there.
It's worth noting that Quagga's defaults are "dangerous by default" at
present wrt MED. It'd be nice to fix that.
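
For reviewers, here is a minimal sketch of the non-transitivity that the new MED section describes. It is not part of the patch and it is not how Quagga implements route selection: the Route record, the three example routes and the functions prefer, sequential_best and grouped_best are all hypothetical, and the comparison is cut down to just the MED and IGP cost steps. It only shows that pair-wise selection can depend on arrival order, while grouping by neighbouring AS first does not.

```python
from collections import namedtuple
from itertools import permutations

# Hypothetical route record: only the fields this sketch needs.
Route = namedtuple("Route", "name neighbor_as med igp_cost")

def prefer(x, y):
    """True if x beats y in a pairwise check (MED, then IGP cost only)."""
    # MED is compared only when both routes come from the same neighbouring AS.
    if x.neighbor_as == y.neighbor_as and x.med != y.med:
        return x.med < y.med
    return x.igp_cost < y.igp_cost

def sequential_best(routes):
    """Pairwise selection in list order, e.g. the order routes arrived."""
    best = routes[0]
    for r in routes[1:]:
        if prefer(r, best):
            best = r
    return best

def grouped_best(routes):
    """Pick a winner per neighbouring AS first, then compare the winners."""
    winners = {}
    for r in routes:
        cur = winners.get(r.neighbor_as)
        if cur is None or prefer(r, cur):
            winners[r.neighbor_as] = r
    # Sort the winners by AS so the final pass always sees the same order.
    return sequential_best(sorted(winners.values(), key=lambda r: r.neighbor_as))

# Assumed example values, chosen to produce a preference cycle.
A = Route("A", neighbor_as=100, med=100, igp_cost=30)
B = Route("B", neighbor_as=100, med=200, igp_cost=10)
C = Route("C", neighbor_as=200, med=50, igp_cost=20)

# A beats B (MED), B beats C (IGP), yet C beats A (IGP): no consistent order.
print(prefer(A, B), prefer(B, C), prefer(C, A))

for order in permutations([A, B, C]):
    names = "".join(r.name for r in order)
    print(names, sequential_best(list(order)).name, grouped_best(list(order)).name)
```

With these assumed values the pair-wise pass returns A, B or C depending on the order the routes are presented in, while the grouped pass returns C for every order.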
-----8<---------8<---------8<---------8<---------8<---------8<----
* bgpd.texi: Document the -l argument. Update the 'BGP decision
process' table to reflect what /actually/ is implemented. Add docs on
'compare-routerid' in the bestpath section.
Add a section on MED, to highlight the issues it has by default, and
to highlight that it is terminally broken for its original purpose in
many modern iBGP topologies.
* routemap.texi: set an anchor on 'set metric' so bgpd.texi can
reference it.
---
doc/bgpd.texi | 252 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
doc/routemap.texi | 1 +
2 files changed, 247 insertions(+), 6 deletions(-)
diff --git a/doc/bgpd.texi b/doc/bgpd.texi
index 7d92b5e..1c5d701 100644
--- a/doc/bgpd.texi
+++ b/doc/bgpd.texi
@@ -53,6 +53,13 @@ Set the bgp protocol's port number.
@item -r
@itemx --retain
When program terminates, retain BGP routes added by zebra.
+
+@item -l
+@itemx --listenon
+Specify the IP address for bgpd to listen on, rather than its
+default of INADDR_ANY / IN6ADDR_ANY. This can be useful to constrain bgpd
+to an internal address, or to run multiple bgpd processes on one host.
+
@end table
@node BGP router
@@ -104,18 +111,59 @@ This command set distance value to
@node BGP decision process
@subsection BGP decision process
+The decision process Quagga BGP uses to select routes is as follows:
+
@table @asis
@item 1. Weight check
+Prefer higher local weight routes to lower routes.
-@item 2. Local preference check.
+@item 2. Local preference check
+Prefer higher local preference routes to lower.
+
+@item 3. Local route check
+Prefer local routes (statics, aggregates, redistributed) to received routes.
+
+@item 4. AS path length check
+Prefer shortest hop-count AS_PATHs.
+
+@item 5. Origin check
+Prefer the lowest origin type route. That is, prefer IGP origin routes to
+EGP routes, and EGP routes to Incomplete routes.
+
+@item 6. MED check
+Where routes with a MED were received from the same AS,
+prefer the route with the lowest MED. See ...
+
+@item 7. External check
+Prefer the route received from an external, eBGP peer
+over routes received from other types of peers.
+
+@item 8. IGP cost check
+Prefer the route with the lower IGP cost.
+
+@item 9. Multi-path check
+If multi-pathing is enabled, then check whether
+the routes not yet distinguished in preference may be considered equal. If
+@ref{bgp bestpath as-path multipath-relax} is set, all such routes are
+considered equal, otherwise routes received via iBGP with identical AS_PATHs
+or routes received from eBGP neighbours in the same AS are considered equal.
+
-@item 3. Local route check.
+@item 10. Router-ID check
+Prefer the route with the lowest router-ID. If the
+route has an ORIGINATOR_ID attribute, through iBGP reflection, then that
+router ID is used, otherwise the router-ID of the peer the route was
+received from is used.
-@item 4. AS path length check.
+@item 11. Cluster-List length check
+The route with the shortest cluster-list
+length is used. The cluster-list reflects the iBGP reflection path the
+route has taken.
-@item 5. Origin check.
+@item 12. Peer address
+Prefer the route received from the peer with the higher
+transport layer address, as a last-resort tie-breaker.
-@item 6. MED check.
@end table
@deffn {BGP} {bgp bestpath as-path confed} {}
@@ -125,11 +173,31 @@ decision process.
@end deffn
@deffn {BGP} {bgp bestpath as-path multipath-relax} {}
+@anchor{bgp bestpath as-path multipath-relax}
This command specifies that BGP decision process should consider paths
of equal AS_PATH length candidates for multipath computation. Without
the knob, the entire AS_PATH must match for multipath computation.
@end deffn
+@deffn {BGP} {bgp bestpath compare-routerid} {}
+@anchor{bgp bestpath compare-routerid}
+
+Ensure that where iBGP routes are equal on most metrics, including
+local-pref, AS_PATH length, IGP cost, MED, the tie is broken based on
+router-ID. If a route has an ORIGINATOR_ID attribute, i.e. it has been
+reflected, that ID will be used. Otherwise, the router-ID of the peer the
+route was received from will be used.
+
+The advantage of this is that the route-selection (at this point) will be
+deterministic, across iBGP. The disadvantage is that such equal routes will
+tend to take the same exit out of the AS, via the lowest-ID router.
+
+If this option is enabled, then the external-age check, where already
+selected eBGP routes are preferred, is skipped.
+@end deffn
+
+
+
@node BGP route flap dampening
@subsection BGP route flap dampening
@@ -151,6 +219,178 @@ The route-flap damping algorithm is compatible with @cite{RFC2439}. The use of t
is not recommended nowadays, see
@uref{http://www.ripe.net/ripe/docs/ripe-378,,RIPE-378}.
@end deffn
+@node BGP MED
+@section BGP MED
+
+The BGP @acronym{MED, Multi_Exit_Discriminator} attribute is intended to
+allow one AS to indicate preferences for ingress points to another AS. The
+MED attribute will not be propagated on to another AS by the receiving AS -
+it is `non-transitive' in the BGP sense.
+
+E.g.@:, if AS X and AS Y have 2 different BGP peerings, then AS X might set
+a MED of 100 on routes advertised on one of those and a MED of 200 on
+the other. AS Y then, when selecting between otherwise equal routes to or via
+AS X, should prefer to take the path via the lower MED peering of 100 with
+AS X. Setting the MED allows an AS to influence the routing taken to it
+within another, neighbouring AS.
+
+In this use of MED, it is not really meaningful to compare the MED value on
+routes where the next AS on the paths differs. E.g., if AS Y also had a
+route for some destination via AS Z in addition to the routes from AS X, and
+AS Z had also set a MED, it wouldn't make sense for AS Y to compare AS Z's
+MED values to those of AS X. The MED values have been set by different
+administrators, with different frames of reference.
+
+The default behaviour of BGP therefore is to not compare MED values across
+routes received from different neighbouring ASes. In Quagga this is done by
+comparing the neighbouring, left-most AS in the received AS_PATHs of the
+routes and only comparing MED if those are the same.
+
+Unfortunately, this behaviour means MED can cause the order of preference
+over all the routes to be undefined. That is, given routes A, B, and C, if
+A is preferred to B, and B is preferred to C, then a well-defined order
+should mean the preference is transitive (in the mathematical sense in the
+context of orders) and that A would be preferred to C.
+
+However, when MED is involved this need not be the case. With MED it is
+possible that C is actually preferred over A. This can be true even where
+BGP defines a deterministic ``most preferred'' route out of the full set of
+A,B,C. With MED, for any given set of routes there may be a deterministically
+preferred route, but there may be no way to arrange them into
+any order of preference.
+
+That MED can induce non-transitive orders of preference over routes can
+cause issues. Firstly, it may be perceived to cause routing table churn
+locally at speakers; secondly it may cause routing instability in
+non-full-mesh iBGP topologies, where sets of speakers continually oscillate
+between different paths.
+
+The first issue arises from how speakers often implement routing decisions.
+Though BGP defines a selection process that will deterministically select
+the same route as best at any given speaker, even with MED, that process
+requires evaluating all routes together. For performance and ease of
+implementation reasons, many implementations evaluate route preferences in a
+pair-wise fashion instead. Given there is no well-defined order when MED is
+involved, the best route that will be chosen becomes subject to
+implementation details, such as the order the routes are stored in. That
+may be (locally) non-deterministic, e.g.@: it may be the order the routes
+were received in.
+
+This indeterminism may be considered undesirable, though it need not cause
+problems. It may mean additional routing churn is perceived, as sometimes
+more updates may be produced than at other times in reaction to some event.
+
+This first issue can be fixed with a more deterministic route selection that
+ensures routes are ordered by the neighbouring AS during selection.
+@xref{bgp deterministic-med}. This may reduce the number of updates as
+routes are received, and may in some cases reduce routing churn. Though, it
+could equally deterministically produce the largest possible set of updates
+in response to the most common sequence of received updates.
+
+A deterministic comparison tends to imply an additional overhead of sorting
+over any set of n routes to a destination. The implementation of
+deterministic MED in Quagga scales significantly worse than most sorting
+algorithms at present, with the number of paths to a given destination.
+That number is often low enough to not cause any issues, but where there are
+many paths, the deterministic comparison may quickly become increasingly
+expensive in terms of CPU.
+
+Deterministic local evaluation can @emph{not} fix the second issue of MED,
+however: the non-transitive preference of routes that MED can
+cause may lead to routing instability or oscillation across multiple
+speakers. This can occur with non-full-mesh iBGP topologies that reduce the
+routing information known to each speaker. This has primarily been
+documented with iBGP route-reflection topologies. However, any other
+route-hiding technologies potentially could also cause oscillation with MED.
+
+The second issue occurs where speakers each have only a subset of routes.
+E.g. speaker X might have routes A,B, and speaker Y might have route C. X
+selects A as its best, Y obviously can only choose C. They exchange routes
+and then X might choose C as best from A,B,C while Y might choose A as best
+from A,C - the non-transitive, non-defined order of preference of routes
+that MED may induce allows this. They then withdraw their routes and the
+cycle repeats. This can occur even if all speakers use a deterministic
+order in route selection.
+
+More complex and insidious cycles of oscillation have been documented in the
+literature. See, e.g., @cite{McPherson, D. and Gill, V. and Walton, D.,
+ "Border Gateway Protocol (BGP) Persistent Route Oscillation Condition",
+ IETF RFC3345}, and @cite{Flavel, A. and M. Roughan, "Stable and flexible
+ iBGP", ACM SIGCOMM 2009}, and @cite{Griffin, T. and G. Wilfong,
+"On the correctness of IBGP configuration", ACM SIGCOMM 2002} for concrete
+examples and further references.
+
+There is as of this writing @emph{no} known way to use MED for its original
+purpose; @emph{and} reduce routing information in non-full-mesh iBGP
+topologies (e.g. with reflectors); @emph{and} be sure to avoid the
+instability problems of MED due to the non-transitive routing preferences it
+can induce.
+
+The instability problems that MED can introduce on more complex,
+non-full-mesh, iBGP topologies may be avoided either by:
+
+@itemize
+@item
+Deleting MED from all routes received from neighbouring ASes,
+and/or by ignoring MED entirely in the decision process. There is no way to
+do this at this time in Quagga.
+@item
+Setting @ref{bgp always-compare-med}, however this allows MED to be compared
+across values set by different neighbour ASes, which may not produce
+desirable results.
+@item
+Setting MED to the same value (e.g. 0) using @ref{routemap set metric} on all
+received routes, in combination with setting @ref{bgp always-compare-med} on
+all speakers.
+@end itemize
+
+As MED is evaluated after the AS_PATH length check, another possible use for
+MED is for intra-AS steering of routes with equal AS_PATH length, as an
+extension of the last case above. As MED is evaluated before IGP metric,
+this can allow cold-potato routing to be implemented, sending traffic to
+preferred hand-offs with neighbours, rather than the closest IGP hand-off.
+This would be done with @ref{routemap set metric} and by setting @ref{bgp
+always-compare-med} on all speakers.
+
+Note that even if action is taken to address the MED non-transitivity
+issues, other oscillations may still be possible, e.g. on IGP cost if iBGP
+and IGP topologies are at cross-purposes with each other.
+
+@deffn {BGP} {bgp deterministic-med} {}
+@anchor{bgp deterministic-med}
+
+Carry out route-selection in a way that produces more deterministic answers
+locally, even in the face of MED and the lack of a well-defined order of
+preference it can induce on routes. Without this option the preferred route
+with MED may be determined largely by the order that routes were received
+in.
+
+Setting this option will have a performance cost that may be noticeable when
+there are many routes for each destination. Currently in Quagga it is
+implemented in a way that scales poorly as the number of routes per
+destination increases.
+
+The default is that this option is not set.
+@end deffn
+
+Note that there are other sources of indeterminism in the route selection
+process, see @ref{BGP decision process}.
+
+@deffn {BGP} {bgp always-compare-med} {}
+@anchor{bgp always-compare-med}
+
+Always compare the MED on routes, even when they were received from
+different neighbouring ASes. Setting this option makes the order of
+preference of routes more defined, and should eliminate MED induced
+oscillations.
+
+This option can be used, together with @ref{routemap set metric}, to use MED
+as an intra-AS metric to steer equal-length AS_PATH routes to, e.g., desired
+exit points.
+@end deffn
+
+
+
@node BGP network
@section BGP network
@@ -188,7 +428,7 @@ This command specifies an aggregate address.
@end deffn
@deffn {BGP} {aggregate-address @var{A.B.C.D/M} as-set} {}
-This command specifies an aggregate address. Resulting routes inlucde
+This command specifies an aggregate address. Resulting routes include
AS set.
@end deffn
diff --git a/doc/routemap.texi b/doc/routemap.texi
index db3e72d..7938c96 100644
--- a/doc/routemap.texi
+++ b/doc/routemap.texi
@@ -171,6 +171,7 @@ Set the route's weight.
@end deffn
@deffn {Route-map Command} {set metric @var{metric}} {}
+@anchor{routemap set metric}
Set the BGP attribute MED.
@end deffn
--
2.5.0