On Wed, 16 Mar 2016, Donald Sharp wrote:

If you have an interface that bounces allot, we were seeing crashes associated with a 'show ospf ...' command during a period when the interface was down. So since we are dumping some data to the vty it was decided that the most intelligent thing to do if we got into that situation was to safely not attempt to dereference a NULL pointer.

Weird, would love to know how the oi gets into that state though.

I can't really answer this question. The person who wrote this code is 2+ years gone from the company, and the code was written just under 4 years ago now. There are no meaningful notes left for me to peruse at this point in time. I do know from what little documentation is there, that OSPF was experiencing some starvation because watchquagga was killing it and restarting it. This is one of the fixes that went in to fix this whole issue.

So, add batching, then have to tinker with hellos cause of the batching..? How about, we just drop those patches?


Should this start by adding Piotr's patch first and then building on?
Also, does this reflect the comments on that patch set when submitted again
last year?


These patches took Piotr's patch and reworked them, we credited him very
clearly as the original author.  Additionally Piotr's code didn't patch
cleanly at the point in time due to the massive changes already in place
for bgp and zebra.  I see no resubmittal of Piotr's patch set last year and
I don't see any CR comments for this section of code, what am I missing?

I'll send you a link to the thread in a couple of weeks so. Was late last year I think.

It'd probably be preferable to have it as his patch + then Cumulus changes.

 ospfd: 16.0 rfc2328 compliance


What was the motivation for this?


We had/have(tense? oh I hate thee) customers seeing loops without this
patch?


RFC3137 and 6987 both allow for routers to treat "infinite" cost links as
infinite, and the intent of the stub-router support definitely is to wholly
prevent any infinite cost link being used for transit. Quagga is aggressive
on that, but not incorrect in terms of those RFCs.

Without this patch when I set 'max-metric router-lsa administrative' on a
switch, it is no longer considered as part of the spf calculation for
routes.

Suppose I have

source of traffic -----A ---- B ---- C ---- Network I want to talk to.

A is running fastpath, B and C are running Quagga ospf.  B has a default
route towards A for traffic.

If on 'C' I do 'max-metric router-lsa'.  This causes the links cost to be
infinite.  On B without this patch the links are ignored

As intended, yes.

and the path to 'Network I want to talk to' is dropped, leaving a default route back to A, which the packet goes back out on. A picks it up and sends it right back to B. We have achieved a loopage.

Well yes.

There is nothing in the RFC's that specify that we should not use a
max-metric link, while in Quagga we don't consider it.

As I said, Quagga is aggressive about not using a stub-router as a transit router. The RFC is pretty clear that that is an allowed outcome, and indeed the "desired" one (as the RFC seems to put it).

From motivation:

"  In some situations, it may be advantageous to inform routers in a
   network not to use a specific router as a transit point but to still
   route to it.
   ..

   Otherwise, traffic
   destined for the networks and reachable through such a stub router
   may still be routed through it."

(Note "may" in the last sentence).

Deployment considerations:

"  If the path through the stub router is the only one, the routers of
   the first type _will not use the stub router for transit_ (which is
   _the desired behavior_), while the routers of the second type will
   still use this path."

Added emphasis (via underscores) is mine.

The intent of stub-router support is to let a router 'exit' from the OSPF domain gracefully - by updating its OSPF advertisement to /stop/ other routers sending /any/ packets toward it for transit, while still maintaining adjacencies so that it can still calculate valid routes to forward any packets that happen to still arrive (i.e. cause they were sent before other routers were able to react); but keeping connectivity to itself intact.

It seems to me it was intended to be fairly equivalent to shutting down the router, as far as OSPF transit routing goes. If you set your only router to a destination to be a "non-transit" "stub router" then, going just by that language, yes, you should expect routing to that destination is likely no longer to work.

If you don't want to discourage /all/ traffic, I'd have thought you really would have wanted to set a metric *less than* infinite ("infinite" cost seems pretty clear to me). That starts to get into traffic-engineering, and some kind of global, router-level "Set X, in: set all links to min(max, link cost + X)" knob may be more what people are after.

If there are people who thought "stub-router" didn't fully mean that, that's interesting. Wasn't how I read the RFC or how I would have interpreted an "infinite" cost link (the original OSPF RFCs interpreted the same but later removed that requirement, I see). Given existing Quagga OSPF, the best way to get that relaxed sense would be to use 0xfffe, or 0xffff - some X, as the cost.

Whether to take that patch as is, or whether to have a 'strict' max-metric, and also a more relaxed version, hmmm. (Even as is, the commit message language should be changed).

Was the motivation keeping the router accessible, even if the management IP was also OSPF routed?

Yes.

That should already be the case, certainly for the RID. If other local IPs aren't, then maybe there's an issue in a later stage of the routing calc where the router's stub links are added.

If you've queried this patch a few times, I've missed it. I have 100 patches I'm trying to get through the process. It was not my intention to miss anything.

I know. :)

Though, generally, if things are or have been inefficient on side of the patch review process, it doesn't help if the queries or comments that are made then also just get dropped on the other side.

I think we've had this discussion... I'm pretty sure you said that you
would submit a RFC to get this handled?

That was for the decision order change thing, i.e. applies to:

It's a knob that someone can turn on/off if they need it.

It's more a pin on what admins may not realise is a hand grenade. BGP already has enough hidden-danger areas, and a plethora of config options. I'd like to try aim to make BGP - at least in Quagga - a bit more forgiving and easier to administer. I don't want to seed it with further config land-mines anyway.

MED similarly is a hand grenade.

And yes, with Det.MED and every-neighbour-AS-best-path-TX Add-Path you can apply sufficient thrust to make MED sort of fly, at least in the cases we've thought of so far. However, it still doesn't fix the core underlying brokenness of MED, and it can still cause breakage - e.g. if an admin doesn't run Add-Path/best-every-AS everywhere, or some other scenario comes up that we hadn't anticipated perhaps in future BGP extensions. Also, D.MED and every-neighbour-as-best-tx both add significant additional processing overheads to BGP (and a cost in code complexity). Tx every-neighbour-AS-best-path further adds significant communication overheads to BGP - and BGP already is quite an expensive and high-overhead protocol in terms of communication (compared to link-state anyway, which is O(1) messages per link for any link-event; path-vector is something like O(# simple-paths between link-event and receiver), perhaps more, for worst case).

It'd be far far more performant to just fix the underlying BGP metric issues and go make BGP safe and fast. It might just be better to go do that, rather than try workaround the issues of a very problematic part of BGP with a series of ever more costly changes to BGP.

Why not aim for simpler, safer, faster BGP?

At a minimum, that max-MED patch should sort after Add-Path + every-neighbour-AS-best-path though.

regards,
--
Paul Jakma      [email protected]  @pjakma Key ID: 64A2FF6A
Fortune:
If A equals success, then the formula is A = X + Y + Z.
X is work. Y is play.  Z is keep your mouth shut.
                -- Albert Einstein

_______________________________________________
Quagga-dev mailing list
[email protected]
https://lists.quagga.net/mailman/listinfo/quagga-dev

Reply via email to