Acked-by: Donald Sharp <[email protected]>
On Thu, Dec 3, 2015 at 7:45 AM, Paul Jakma <[email protected]> wrote:
> Take 4, fix placeholder reference with the actual xref.
>
> * bgpd.texi: Document the -l argument. Update the 'BGP decision process' table
> to reflect what /actually/ is implemented. Add docs on 'compare-routerid' in
> the bestpath section.
>
> Add a section on MED, to highlight the issues it has by default, and to
> highlight that it is terminally broken for its original purpose in many
> modern iBGP topologies.
>
> * routemap.texi: set an anchor on 'set metric' so bgpd.texi can reference it.
> ---
> doc/bgpd.texi | 264 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
> doc/routemap.texi | 1 +
> 2 files changed, 259 insertions(+), 6 deletions(-)
>
> diff --git a/doc/bgpd.texi b/doc/bgpd.texi
> index 7d92b5e..83d55a9 100644
> --- a/doc/bgpd.texi
> +++ b/doc/bgpd.texi
> @@ -18,6 +18,7 @@ BGP-4.
> @menu
> * Starting BGP::
> * BGP router::
> +* BGP MED::
> * BGP network::
> * BGP Peer::
> * BGP Peer Group::
> @@ -53,6 +54,13 @@ Set the bgp protocol's port number.
> @item -r
> @itemx --retain
> When program terminates, retain BGP routes added by zebra.
> +
> +@item -l
> +@itemx --listenon
> +Specify a specific IP address for bgpd to listen on, rather than its
> +default of INADDR_ANY / IN6ADDR_ANY. This can be useful to constrain bgpd
> +to an internal address, or to run multiple bgpd processes on one host.
> +
> @end table
>
> @node BGP router
> @@ -104,18 +112,59 @@ This command set distance value to
> @node BGP decision process
> @subsection BGP decision process
>
> +The decision process Quagga BGP uses to select routes is as follows:
> +
> @table @asis
> @item 1. Weight check
> +Prefer higher local weight routes to lower routes.
>
> -@item 2. Local preference check.
> +@item 2. Local preference check
> +Prefer higher local preference routes to lower.
> +
> +@item 3. Local route check
> +Prefer local routes (statics, aggregates, redistributed) to received routes.
> +
> +@item 4. AS path length check
> +Prefer shortest hop-count AS_PATHs.
> +
> +@item 5. Origin check
> +Prefer the lowest origin type route. That is, prefer IGP origin routes to
> +EGP, to Incomplete routes.
> +
> +@item 6. MED check
> +Where routes with a MED were received from the same AS,
> +prefer the route with the lowest MED. @xref{BGP MED}.
> +
> +@item 7. External check
> +Prefer the route received from an external, eBGP peer
> +over routes received from other types of peers.
> +
> +@item 8. IGP cost check
> +Prefer the route with the lower IGP cost.
> +
> +@item 9. Multi-path check
> +If multi-pathing is enabled, then check whether
> +the routes not yet distinguished in preference may be considered equal. If
> +@ref{bgp bestpath as-path multipath-relax} is set, all such routes are
> +considered equal, otherwise routes received via iBGP with identical AS_PATHs
> +or routes received from eBGP neighbours in the same AS are considered equal.
> +
>
> -@item 3. Local route check.
> +@item 10. Router-ID check
> +Prefer the route with the lowest router-ID. If the
> +route has an ORIGINATOR_ID attribute, through iBGP reflection, then that
> +router ID is used, otherwise the router-ID of the peer the route was
> +received from is used.
>
> -@item 4. AS path length check.
> +@item 11. Cluster-List length check
> +The route with the shortest cluster-list
> +length is used. The cluster-list reflects the iBGP reflection path the
> +route has taken.
>
> -@item 5. Origin check.
> +@item 12. Peer address
> +Prefer the route received from the peer with the higher
> +transport layer address, as a last-resort tie-breaker.
>
> -@item 6. MED check.
> @end table
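
For readers following along, the table above can be read as a pairwise comparison that walks the checks in order and stops at the first one that separates two routes. The Python below is only a minimal sketch of that reading, not bgpd's implementation: the `Route` fields and the `prefer` function are hypothetical names invented for illustration, addresses and router-IDs are modelled as plain integers, and the multi-path check is left out because it decides equality rather than preference.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Route:
    # Hypothetical field names, chosen only for this illustration.
    name: str
    weight: int = 0
    local_pref: int = 100
    is_local: bool = False
    as_path: Tuple[int, ...] = ()
    origin: int = 0        # 0 = IGP, 1 = EGP, 2 = Incomplete
    med: int = 0
    is_ebgp: bool = False
    igp_cost: int = 0
    router_id: int = 0     # ORIGINATOR_ID if reflected, else the peer's router-ID
    cluster_len: int = 0
    peer_addr: int = 0


def prefer(a: Route, b: Route) -> Route:
    """Return whichever of a and b the documented checks prefer."""
    # Each entry: (value for a, value for b, True when the higher value wins).
    checks = [
        (a.weight, b.weight, True),
        (a.local_pref, b.local_pref, True),
        (a.is_local, b.is_local, True),
        (len(a.as_path), len(b.as_path), False),
        (a.origin, b.origin, False),
    ]
    # MED is only compared when both routes share the same neighbouring AS.
    if a.as_path[:1] == b.as_path[:1]:
        checks.append((a.med, b.med, False))
    checks += [
        (a.is_ebgp, b.is_ebgp, True),
        (a.igp_cost, b.igp_cost, False),
        (a.router_id, b.router_id, False),
        (a.cluster_len, b.cluster_len, False),
        (a.peer_addr, b.peer_addr, True),
    ]
    for va, vb, higher_wins in checks:
        if va != vb:
            return a if (va > vb) == higher_wins else b
    return a  # indistinguishable on every check


r1 = Route("r1", as_path=(65001, 65010), med=50, igp_cost=5)
r2 = Route("r2", as_path=(65001, 65020), med=10, igp_cost=9)
r3 = Route("r3", as_path=(65002, 65020), med=0, igp_cost=7)
print(prefer(r1, r2).name)  # same neighbouring AS, so MED decides
print(prefer(r1, r3).name)  # different neighbouring AS, so IGP cost decides
```

Note how the MED entry is conditional on the neighbouring AS: that one conditional step is what the new MED section spends most of its time on.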
>
> @deffn {BGP} {bgp bestpath as-path confed} {}
> @@ -125,11 +174,31 @@ decision process.
> @end deffn
>
> @deffn {BGP} {bgp bestpath as-path multipath-relax} {}
> +@anchor{bgp bestpath as-path multipath-relax}
> This command specifies that BGP decision process should consider paths
> of equal AS_PATH length candidates for multipath computation. Without
> the knob, the entire AS_PATH must match for multipath computation.
> @end deffn
>
> +@deffn {BGP} {bgp bestpath compare-routerid} {}
> +@anchor{bgp bestpath compare-routerid}
> +
> +Ensure that where iBGP routes are equal on most metrics, including
> +local-pref, AS_PATH length, IGP cost, MED, the tie is broken based on
> +router-ID. If a route has an ORIGINATOR_ID attribute, i.e. it has been
> +reflected, that ID will be used. Otherwise, the router-ID of the peer the
> +route was received from will be used.
> +
> +The advantage of this is that the route-selection (at this point) will be
> +deterministic, across iBGP. The disadvantage is that such equal routes will
> +tend to take the same exit out of the AS, via the lowest-ID router.
> +
> +If this option is enabled, then the external-age check, where already
> +selected eBGP routes are preferred, is skipped.
> +@end deffn
> +
> +
> +
> @node BGP route flap dampening
> @subsection BGP route flap dampening
>
> @@ -151,6 +220,189 @@ The route-flap damping algorithm is compatible with @cite{RFC2439}. The use of t
> is not recommended nowadays, see @uref{http://www.ripe.net/ripe/docs/ripe-378,,RIPE-378}.
> @end deffn
>
> +@node BGP MED
> +@section BGP MED
> +
> +The BGP @acronym{MED, Multi_Exit_Discriminator} attribute is intended to
> +allow one AS to indicate its preferences for its ingress points to another
> +AS. The MED attribute will not be propagated on to another AS by the
> +receiving AS - it is `non-transitive' in the BGP sense.
> +
> +E.g.@:, if AS X and AS Y have 2 different BGP peering points, then AS X
> +might set a MED of 100 on routes advertised at one and a MED of 200 at the
> +other. When AS Y selects between otherwise equal routes to or via
> +AS X, AS Y should prefer to take the path via the lower MED peering of 100 with
> +AS X. Setting the MED allows an AS to influence the routing taken to it
> +within another, neighbouring AS.
> +
> +In this use of MED it is not really meaningful to compare the MED value on
> +routes where the next AS on the paths differs. E.g., if AS Y also had a
> +route for some destination via AS Z in addition to the routes from AS X, and
> +AS Z had also set a MED, it wouldn't make sense for AS Y to compare AS Z's
> +MED values to those of AS X. The MED values have been set by different
> +administrators, with different frames of reference.
> +
> +The default behaviour of BGP therefore is to not compare MED values across
> +routes received from different neighbouring ASes. In Quagga this is done by
> +comparing the neighbouring, left-most AS in the received AS_PATHs of the
> +routes and only comparing MED if those are the same.
> +
> +Unfortunately, this behaviour of MED, of sometimes being compared across
> +routes and sometimes not, depending on the properties of those other routes,
> +means MED can cause the order of preference over all the routes to be
> +undefined. That is, given routes A, B, and C, if A is preferred to B, and B
> +is preferred to C, then a well-defined order should mean the preference is
> +transitive (in the sense of orders @footnote{For some set of objects to have
> +an order, there @emph{must} be some binary ordering relation that is defined
> +between @emph{every} combination of those objects, @math{a \prec b}, and
> +that relation @emph{must} be transitive, i.e. if @math{a \prec b} and
> +@math{b \prec c} then that relation must carry over and it must be that
> +@math{a \prec c} for the objects to have an order. If the relation allows
> +for equality, i.e. if @math{a \prec b} and @math{b \prec a} may both be true
> +and this implies that @math{a = b}, then some objects may be equal in order to each
> +other and the order is partial. Otherwise, if there is an order, all the
> +objects are distinct and have a total order. MED unfortunately does not
> +define its order over all cases.}) and that A would be preferred to C.
> +
> +However, when MED is involved this need not be the case. With MED it is
> +possible that C is actually preferred over A. This can be true even where
> +BGP defines a deterministic ``most preferred'' route out of the full set of
> +A,B,C. With MED, for any given set of routes there may be a deterministically
> +preferred route, but there may be no way to arrange them into
> +any order of preference.
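
A concrete instance of the A, B, C cycle described above may help readers. The sketch below is an assumption-laden simplification, not Quagga code: it reduces the decision process to just two checks (MED when the neighbouring AS matches, then IGP cost), and the `Route` class, the `preferred` function and all the numeric values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    # Hypothetical, heavily reduced route: only the fields this example needs.
    name: str
    neighbour_as: int
    med: int
    igp_cost: int


def preferred(a: Route, b: Route) -> bool:
    """True if a beats b: MED first (same neighbouring AS only), then IGP cost."""
    if a.neighbour_as == b.neighbour_as and a.med != b.med:
        return a.med < b.med
    return a.igp_cost < b.igp_cost


A = Route("A", neighbour_as=1, med=10, igp_cost=10)
B = Route("B", neighbour_as=2, med=0, igp_cost=20)
C = Route("C", neighbour_as=1, med=0, igp_cost=30)

print(preferred(A, B))  # A beats B on IGP cost (different neighbouring ASes)
print(preferred(B, C))  # B beats C on IGP cost (different neighbouring ASes)
print(preferred(C, A))  # C beats A on MED (same neighbouring AS)
```

All three comparisons hold at once, so A is preferred to B, B to C, and yet C to A: no ordering of the three routes is consistent with every pairwise result.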
> +
> +That MED can induce non-transitive orders of preference over routes can
> +cause issues. Firstly, it may be perceived to cause routing table churn
> +locally at speakers; secondly it may cause routing instability in
> +non-full-mesh iBGP topologies, where sets of speakers continually oscillate
> +between different paths.
> +
> +The first issue arises from how speakers often implement routing decisions.
> +Though BGP defines a selection process that will deterministically select
> +the same route as best at any given speaker, even with MED, that process
> +requires evaluating all routes together. For performance and ease of
> +implementation reasons, many implementations evaluate route preferences in a
> +pair-wise fashion instead. Given there is no well-defined order when MED is
> +involved, the best route that will be chosen becomes subject to
> +implementation details, such as the order the routes are stored in. That
> +may be (locally) non-deterministic, e.g.@: it may be the order the routes
> +were received in.
> +
> +This indeterminism may be considered undesirable, though it need not cause
> +problems. It may mean additional routing churn is perceived, as sometimes
> +more updates may be produced than at other times in reaction to some event.
> +
> +This first issue can be fixed with a more deterministic route selection that
> +ensures routes are ordered by the neighbouring AS during selection.
> +@xref{bgp deterministic-med}. This may reduce the number of updates as
> +routes are received, and may in some cases reduce routing churn. Though, it
> +could equally deterministically produce the largest possible set of updates
> +in response to the most common sequence of received updates.
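
To make the contrast between pair-wise scanning and selection ordered by neighbouring AS concrete, here is a small Python sketch. It is not how bgpd implements `bgp deterministic-med`; the reduced two-check comparison, the `Route` class, and the `pairwise_best` and `grouped_best` helpers are all hypothetical names and simplifications chosen for this illustration.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Route:
    # Hypothetical, heavily reduced route: only the fields this example needs.
    name: str
    neighbour_as: int
    med: int
    igp_cost: int


def preferred(a: Route, b: Route) -> bool:
    """True if a beats b: MED first (same neighbouring AS only), then IGP cost."""
    if a.neighbour_as == b.neighbour_as and a.med != b.med:
        return a.med < b.med
    return a.igp_cost < b.igp_cost


def pairwise_best(routes):
    """Scan in storage order, keeping the winner of each pairwise comparison."""
    best = routes[0]
    for r in routes[1:]:
        if preferred(r, best):
            best = r
    return best


def grouped_best(routes):
    """Pick a winner per neighbouring AS first, then compare the group winners."""
    winners = {}
    for r in routes:
        current = winners.get(r.neighbour_as)
        if current is None or preferred(r, current):
            winners[r.neighbour_as] = r
    # Group winners all have different neighbouring ASes, so MED never applies
    # between them and IGP cost (with the name as a tie-break) orders them.
    return min(winners.values(), key=lambda r: (r.igp_cost, r.name))


A = Route("A", neighbour_as=1, med=10, igp_cost=10)
B = Route("B", neighbour_as=2, med=0, igp_cost=20)
C = Route("C", neighbour_as=1, med=0, igp_cost=30)

for order in permutations([A, B, C]):
    names = "".join(r.name for r in order)
    print(names, pairwise_best(list(order)).name, grouped_best(list(order)).name)
```

With these made-up values the pair-wise scan can end on any of the three routes depending on arrival order, while the grouped selection settles on the same route for every ordering.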
> +
> +A deterministic comparison tends to imply an additional overhead of sorting
> +over any set of n routes to a destination. The implementation of
> +deterministic MED in Quagga scales significantly worse than most sorting
> +algorithms at present, with the number of paths to a given destination.
> +That number is often low enough to not cause any issues, but where there are
> +many paths, the deterministic comparison may quickly become increasingly
> +expensive in terms of CPU.
> +
> +Deterministic local evaluation can @emph{not} fix the second issue of MED,
> +however, which is that the non-transitive preference of routes MED can
> +cause may lead to routing instability or oscillation across multiple
> +speakers. This can occur with non-full-mesh iBGP topologies that reduce the
> +routing information known to each speaker. This has primarily been
> +documented with iBGP route-reflection topologies. However, any other
> +route-hiding technologies potentially could also cause oscillation with MED.
> +
> +The second issue occurs where speakers each have only a subset of routes.
> +E.g. speaker X might have routes A,B, and speaker Y might have route C. X
> +selects A as its best, Y obviously can only choose C. They exchange routes
> +and then X might choose C as best from A,B,C while Y might choose A as best
> +from A,C - the non-transitive, non-defined order of preference of routes
> +that MED may induce allows this. They then withdraw their routes and the
> +cycle repeats. This can occur even if all speakers use a deterministic
> +order in route selection.
> +
> +More complex and insidious cycles of oscillation have been documented in the
> +literature. See, e.g., @cite{McPherson, D. and Gill, V. and Walton, D.,
> + "Border Gateway Protocol (BGP) Persistent Route Oscillation Condition",
> + IETF RFC3345}, and @cite{Flavel, A. and M. Roughan, "Stable and flexible
> + iBGP", ACM SIGCOMM 2009}, and @cite{Griffin, T. and G. Wilfong,
> +"On the correctness of IBGP configuration", ACM SIGCOMM 2002} for concrete examples and further
> +references.
> +
> +There is as of this writing @emph{no} known way to use MED for its original
> +purpose; @emph{and} reduce routing information in non-full-mesh iBGP
> +topologies (e.g.@: with reflectors); @emph{and} be sure to avoid the
> +instability problems of MED due to the non-transitive routing preferences it
> +can induce.
> +
> +The instability problems that MED can introduce on more complex,
> +non-full-mesh, iBGP topologies may be avoided either by:
> +
> +@itemize
> +@item
> +Deleting MED from all routes received from neighbouring ASes,
> +and/or by ignoring MED entirely in the decision process. There is no way to
> +do this at this time in Quagga.
> +@item
> +Setting @ref{bgp always-compare-med}, however this allows MED to be compared
> +across values set by different neighbour ASes, which may not produce
> +desirable results.
> +@item
> +Setting MED to the same value (e.g. 0) using @ref{routemap set metric} on all
> +received routes, in combination with setting @ref{bgp always-compare-med} on
> +all speakers.
> +@end itemize
> +
> +As MED is evaluated after the AS_PATH length check, another possible use for
> +MED is for intra-AS steering of routes with equal AS_PATH length, as an
> +extension of the last case above. As MED is evaluated before IGP metric,
> +this can allow cold-potato routing to be implemented, sending traffic to
> +preferred hand-offs with neighbours, rather than the closest hand-off
> +according to the IGP metric. This would be done with @ref{routemap set
> +metric} and by setting @ref{bgp always-compare-med} on all speakers.
> +
> +Note that even if action is taken to address the MED non-transitivity
> +issues, other oscillations may still be possible. E.g. on IGP cost if iBGP
> +and IGP topologies are at cross-purposes with each other.
> +
> +@deffn {BGP} {bgp deterministic-med} {}
> +@anchor{bgp deterministic-med}
> +
> +Carry out route-selection in a way that produces more deterministic answers
> +locally, even in the face of MED and the lack of a well-defined order of
> +preference it can induce on routes. Without this option the preferred route
> +with MED may be determined largely by the order that routes were received
> +in.
> +
> +Setting this option will have a performance cost that may be noticeable when
> +there are many routes for each destination. Currently in Quagga it is
> +implemented in a way that scales poorly as the number of routes per
> +destination increases.
> +
> +The default is that this option is not set.
> +@end deffn
> +
> +Note that there are other sources of indeterminism in the route selection
> +process; see @ref{BGP decision process}.
> +
> +@deffn {BGP} {bgp always-compare-med} {}
> +@anchor{bgp always-compare-med}
> +
> +Always compare the MED on routes, even when they were received from
> +different neighbouring ASes. Setting this option makes the order of
> +preference of routes more defined, and should eliminate MED induced
> +oscillations.
> +
> +This option can be used, together with @ref{routemap set metric} to use MED
> +as an intra-AS metric to steer equal-length AS_PATH routes to, e.g., desired
> +exit points.
> +@end deffn
> +
> +
> +
> @node BGP network
> @section BGP network
>
> @@ -188,7 +440,7 @@ This command specifies an aggregate address.
> @end deffn
>
> @deffn {BGP} {aggregate-address @var{A.B.C.D/M} as-set} {}
> -This command specifies an aggregate address. Resulting routes inlucde
> +This command specifies an aggregate address. Resulting routes include
> AS set.
> @end deffn
>
> diff --git a/doc/routemap.texi b/doc/routemap.texi
> index db3e72d..7938c96 100644
> --- a/doc/routemap.texi
> +++ b/doc/routemap.texi
> @@ -171,6 +171,7 @@ Set the route's weight.
> @end deffn
>
> @deffn {Route-map Command} {set metric @var{metric}} {}
> +@anchor{routemap set metric}
> Set the BGP attribute MED.
> @end deffn
>
> --
> 2.5.0
>
>
> _______________________________________________
> Quagga-dev mailing list
> [email protected]
> https://lists.quagga.net/mailman/listinfo/quagga-dev
>