On Thu, 3 Dec 2015, Daniel Walton wrote:
If both routes are received from iBGP peers (doesn't matter if they were
originated in our AS or not) then we compare the router-id. The
compare-routerid knob only comes into play when comparing two paths from eBGP
peers because it causes us to skip the 'prefer oldest' check.
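(For reference, the knob under discussion, in bgpd configuration terms -
the AS number here is purely illustrative:)

  router bgp 64512
   ! Skip the 'prefer oldest' tie-break for paths from eBGP peers,
   ! so the decision falls through to the router-id comparison.
   bgp bestpath compare-routerid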
My point being that it is confusing to explain the knob in the context of
iBGP when the knob only has an impact for eBGP peers.
Yes. Tried to fix.
It seems relevant to an operator. DMED is not free; it has an
intrinsic cost. Operators will surely want to have the
information they need to be able to balance the costs against
the benefits?
It is an implementation detail though. We should explain what the knob
does and based on that the operator can decide whether or not he/she
wants to enable it. If doing so causes performance issues then we
address those and fix them.
That cost won't be fixable though. It unavoidably comes with some
additional cost - supra-linear at least, so far as we know.
We have a metric which intrinsically has no order.
So it becomes impossible to scan along, pair by pair, in one O(n) pass and
find the "largest" or "most preferred", because such a thing is
intrinsically undefined and non-existent for this metric.
Therefore, in order to get a deterministic answer from 'preference', we
must _impose_ some other order on to the routes - given that we don't like
the existing "free" order of 'received order' that we store them in.
Imposing that external order unavoidably has a cost. I can't think of a
way to impose an order other than via a sort. So that implies an O(n log n)
scaling cost, at least.
We can amortise that cost by doing an insertion sort as routes are
received, so that any given incoming route doesn't need more than O(n)
work, but overall it will still be an _additional_ O(n log n) amount of
work, at least.
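To make that concrete, here's a minimal sketch in C of imposing the order
by insertion sort on receipt - the types and names are hypothetical, not
Quagga's actual structures:

  #include <stdint.h>

  /* Hypothetical path entry - not bgpd's struct bgp_info. */
  struct path
  {
    uint32_t neigh_as; /* neighbouring AS the path was learnt from */
    uint32_t med;      /* MED is only comparable within one neigh_as */
    struct path *next;
  };

  /* There is no total order on MED across all paths, so we impose
   * one: group by neighbouring AS, then order by MED within each
   * group.  Each insert is at most O(n) work, but sorting n paths
   * can never beat the O(n log n) comparison-sort lower bound
   * overall. */
  static void
  path_insert_sorted (struct path **head, struct path *new)
  {
    struct path **pp;

    for (pp = head;
         *pp && ((*pp)->neigh_as < new->neigh_as
                 || ((*pp)->neigh_as == new->neigh_as
                     && (*pp)->med <= new->med));
         pp = &(*pp)->next)
      ;

    new->next = *pp;
    *pp = new;
  }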
Further, if one destination has a lot of paths, often many other
destinations in the routing table will too, so it's more like O(d·n log n) -
and if d is the number of Internet DFZ prefixes, well, that's been growing
super-linearly, polynomially or maybe even exponentially.
It could be a very real cost, if you have a lot of neighbouring ASes. And
DMED doesn't even fix much, to balance against that cost. If I was an
operator I'd want to know these things. :)
Note: for add-path bestpath-TX, where we'll have to sort anyway to do the
best-path comparison, then yes, you get DMED for free (well, a
constant-factor additional cost that can hence be ignored, when compared
to an O(n log n) scaling cost).
I could say that the algorithm for replicating updates among members of
a peer-group is non-existent and that therefore you shouldn't configure
lots of peers because it will have a performance impact. That would be
a true statement but the operator doesn't really need to know that
unless they configure so many peers and carry so many prefixes that they
notice poor performance.
I don't know. If we knew the scaling cost of something, and if the feature
was optional and the admin might want to balance the costs against the
benefits, why not give them that information?
I don't know the costs of adding peers, but if someone wants to do a study
and measure it, I'd be happy to put it in the docs. :) (If the scaling
figure looks bad and that makes someone go improve it, all the better).
Other vendors might be afraid to tell admins less-than-glossy facts about
their implementation, or the reality of the costs of additional features
in their documentation, but we're different, surely? :)
I have not done a mathematical proof of it but it has been 14 years since we
found MED churn and we do not know of any flavor of MED churn that is not
solved by the three bullets below.
Tweaking IGP metrics can indeed solve things, but the rules you give are a
bit topology-specific. They're not general, and it's not even clear they
always apply (maybe they stop applying with certain combinations of
clusters, who knows - sounds like your Type-II churn is that).
Also, pity the poor operators who have to figure this stuff out, experiment
with it and try to get it right, potentially while customers are getting
irritated at their cat-video packets regularly suffering routing glitches as
BGP oscillates.
There are some simple mathematical rules that can be followed in designing
a path-vector routing protocol and its metrics which will _ensure_ the
protocol converges, always, on any network. What I'd love is if we could
embrace those rules and start figuring out if and how we can bend BGP into
following those rules.
1. Make paths map to a weight (lowest weight == most preferred), and do so
without reference to any other path (MED fails on that)
2. Ensure that the weight of a path always increases when propagated
(many BGP metrics fail on this, but saved by IGP cost hopefully
reflecting this, except if iBGP and IGP topologies are at
cross-purposes / not aligned, in which case you can have oscillation on
IGP costs alone; or if you don't run an IGP at all - seems more common
today; the CLUSTER_LIST length check is good though, it always
increases with reflection - lift that earlier into the decision process
and bingo...).
That's it. Do that, or something that can be shown to be equivalent, and
you can be _sure_ your iBGP will converge. No tweaking needed.
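As a minimal sketch of what those two rules ask for (hypothetical
attributes, not bgpd's actual decision process; lowest weight wins):

  #include <stdint.h>

  /* Hypothetical per-path attributes, for illustration only. */
  struct path_attrs
  {
    uint32_t as_path_len;      /* grows by >= 1 at every eBGP hop */
    uint32_t cluster_list_len; /* grows by 1 at every reflection */
  };

  /* Rule 1: the weight is a function of this path alone - it never
   * compares against another path (which is where MED falls down).
   * Rule 2: neither input ever shrinks as the path propagates, and
   * at least one grows at every eBGP hop or reflection, so the
   * weight of a propagated path always increases - and convergence
   * follows. */
  static uint64_t
  path_weight (const struct path_attrs *p)
  {
    return ((uint64_t) p->as_path_len << 32) | p->cluster_list_len;
  }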
If the CLUSTER_LIST length were lifted to before the IGP check, you
wouldn't even need to care about tweaking IGP costs and having to ensure
they were _just right_. BGP would just work reliably, even if you screwed
up the IGP costs, no matter how complex your iBGP topology - no need for
complex extensions.
I would like Quagga to do what it can to give operators a BGP that "Just
Works" out of the box. :)
I don't know how to do that, but I do think we need to stop revering the
old and provably borked stuff.
We have 'prefer oldest external' enabled by default, which does help.
It doesn't help on any of the examples in the literature I can think of.
:)
The only other knob we could flip is always-compare-med but I don't
think customers would like it much if we did.
We obviously can't, but for any operator who doesn't care about MED, i.e.
who doesn't want to give neighbours the power to inject badly-defined
internal routing metrics, the smart thing to do is:
- set MED to 0 on all routes
- set always-compare-med on
No performance cost.
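In bgpd configuration terms that's roughly the following - the route-map
name, AS numbers and neighbour address are illustrative:

  route-map MED-ZERO permit 10
   ! Wipe whatever MED the neighbour sent.
   set metric 0
  !
  router bgp 64512
   bgp always-compare-med
   neighbor 192.0.2.1 remote-as 64513
   neighbor 192.0.2.1 route-map MED-ZERO in

With every MED forced to 0, the MED comparison can never change the
outcome, which is what makes it free.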
The reality is that not many people actually hit MED churn and almost
all of the ones that do hit the Type I scenario, which is easily solved.
Why even risk having to solve it? Just disable MED, as above. Problem gone
for good.
I think it would be best to just refer them to RFC 3345 and say "read this
if you are seeing oscillating routes in BGP".
Will add that.
Mathematically prove... no, I have never attempted to do so and we've
never been asked to provide a mathematical proof. Am I confident that
MED churn is solved with these approaches? Absolutely.
At least one is admin-intensive and even error-prone, and still a prereq
for the other.
"It just works, provably, by maths, you really can't screw it up" - is
much better, and what I'd prefer to work towards. Maybe that's just me.
Maybe mention it in the section on 'prefer oldest external' and compare
router-id and chop the mention here?
Made it more specific, but it does seem relevant there?
I suspect a lot of people going into those docs won't read the explanatory
stuff and only read the command-specific text.
regards,
--
Paul Jakma, HPE Networking, Advanced Technology Group