[quagga-dev 14940] Re: [PATCH 00/89] Cumulus Take-3 Electric Bugaloooooooooooooooo

Donald Sharp Sat, 19 Mar 2016 01:28:04 -0700

<inline>

On Tue, Mar 8, 2016 at 7:53 AM, Paul Jakma <[email protected]> wrote:


> A quick comment, just on some patches in there that I'm getting a sense of
> deja vu on mostly. ;) Non-exhaustive.
>
> On Fri, 11 Dec 2015, Donald Sharp wrote:
>
>  ospfd: Ensure deletion is clean
>>  ospfd: Ensure that OSPF can send out multiple packets per thread
>>  ospfd: Limit the number of interfaces serviced at one time
>>
>
> The first, I don't understand the point of - where is it possible that oi
> can be valid, but not have an address? (What is the problem?)


If you have an interface that bounces allot, we were seeing crashes
associated with a 'show ospf ...' command during a period when the
interface was down.  So since we are dumping some data to the vty it was
decided that the most intelligent thing to do if we got into that situation
was to safely not attempt to dereference a NULL pointer.



> The latter two need more justification. Note that if we start batching
> packet writes, it becomes more difficult to interleave anything (other
> things, or prioritising some kind of packets).
>

So in cases where an OSPF system was very busy and a large number of
packets were being sent and received we added a knob to allow the user to
batch up a few more outgoing writes in one pass.  In this case the default
went from 1 -> 3.  I'm not particularly worried about interleaving or
prioritization as that this reduces overhead in the #'s of threads we have
handling outgoing by a factor of 3, while only slightly increasing length
of time the cpu is held for this write thread.



>
> Again, what's the motivation for this? What was the original problem? Was
> it hellos being blocked? Maybe then it'd be simpler to just put hellos at
> the top of the queue instead, or just send them direct from the timer?
>

I can't really answer this question.  The person who wrote this code is 2+
years gone from the company, and the code was written just under 4 years
ago now.  There are no meaningful notes left for me to peruse at this point
in time.  I do know from what little documentation is there, that OSPF was
experiencing some starvation because watchquagga was killing it and
restarting it. This is one of the fixes that went in to fix this whole
issue.



> To expand on the motivation thing:
>
> The commit message should tell the story of why anyone should care about
> this patch, and not dismiss it as churn. It should sell the patch:
>
>  "Here's this terrible problem the original code has! <ideally, back
>   this up with clear examples and even measurements if available>
>
>   Here's the high-level solution!
>
>   Here's a description of the implementation of that solution, how it
>   was intended at least!
>
>   (ideally) Here's the evidence that it works! <examples and
>   measurements, where possible, ideally>"
>
> The more complex a patch, the more important it is that commit message
> follows a structured:
>
>  "Problem
>
>   Solution
>
>   Implementation
>
>   Results"
>
> form, and the more important it is to give details.
>
> Giving a description of the "Implementation" is particularly useful to
> signal to reviewers "I've actually re-read my patch carefully, so there's a
> better chance that I've already caught any silly mistakes", as well as
> provide a view of what the contributor /thought/ they implemented - so
> reviewers can better catch any other silly mistakes that got through.
>
> Yeah, I know, everyone hates writing commit messages - doesn't the code
> speak for itself? However, if we want to achieve throughput in getting
> patches in, then we have to start offloading work from reviewers and onto
> contributors. So, instead of:
>
> "One line vague commit message"
>
> And reviewers having to pick through a non-trivial patch and trying to
> figure it out for themselves, contributors instead need to start spending
> more time providing the "sales pitch" (problem, solution) and the "map"
> ("here's exactly how the solution was intended to be implemented") so that
> the reviewer doesn't have to spend as much time mentally constructing the
> same.
>
> *Contributors are only shooting themselves in the foot when they give
>  vague and/or terse messages.*
>
>  zebra: Add internal support for route tags
>>  bgpd, lib, ospfd, zebra: Add ability to read/write tag value
>>  bgpd, vtysh: Add support for route tags
>>  ospfd, vtysh: Add support for Route tags
>>  ripd, ripngd: add support for route tags
>>  zebra: add support for route tags
>>
>
> Should this start by adding Piotr's patch first and then building on?
> Also, does this reflect the comments on that patch set when submitted again
> last year?
>

These patches took Piotr's patch and reworked them, we credited him very
clearly as the original author.  Additionally Piotr's code didn't patch
cleanly at the point in time due to the massive changes already in place
for bgp and zebra.  I see no resubmittal of Piotr's patch set last year and
I don't see any CR comments for this section of code, what am I missing?




>  ospfd: 16.0 rfc2328 compliance
>>
>
> What was the motivation for this?
>

We had/have(tense? oh I hate thee) customers seeing loops without this
patch?


> RFC3137 and 6987 both allow for routers to treat "infinite" cost links as
> infinite, and the intent of the stub-router support definitely is to wholly
> prevent any infinite cost link being used for transit. Quagga is aggressive
> on that, but not incorrect in terms of those RFCs.
>
>
Without this patch when I set 'max-metric router-lsa administrative' on a
switch, it is no longer considered as part of the spf calculation for
routes.

Suppose I have

source of traffic -----A ---- B ---- C ---- Network I want to talk to.

A is running fastpath, B and C are running Quagga ospf.  B has a default
route towards A for traffic.

If on 'C' I do 'max-metric router-lsa'.  This causes the links cost to be
infinite.  On B without this patch the links are ignored and the path to
'Network I want to talk to' is dropped, leaving a default route back to A,
which the packet goes back out on.  A picks it up and sends it right back
to B.  We have achieved a loopage.

There is nothing in the RFC's that specify that we should not use a
max-metric link, while in Quagga we don't consider it.

Was the motivation keeping the router accessible, even if the management IP
> was also OSPF routed?
>


Yes.


>
> The commit message has slightly condescending language that probably could
> have been avoided ("pretty simple huh"), and also is confused:
>
> "   In this case, the router will continue
>     to forward any traffic sent to it while these "max-metric" LSAs are
>     propagated through the network; at which point, the router can be
>     taken out of service."
>
> This is irrelevant to that patch. The existing code allows for exactly
> this, because it allows the local router to still calculate paths out, and
> so continue to route any packets received out to and via other
> non-max-metric routers.
>
> Further, the spf_next stage of SPF is used to construct just the transit
> SPF tree. In this, the link in a router-LSA is only used to calculate the
> cost to *leave* a vertex V to reach the next candidate W. I.e., ignoring a
> max-metric link out of V should not affect reachability to V itself (i.e.
> it's router-ID). So, ignoring max-metric links was a very deliberately
> chosen way of implementing the "don't transit a stub-router" part of
> RFC3137 (and still in 6987) while still keeping the stub-router itself
> reachable.
>
> The proposed patch opens up the possibility that ospfd /would/ be able to
> start using max-metric routers for transit routes - which, to me,
> definitely is *not* what the stub-router RFCs intend.
>
> I've queried this patch a few times now, and not heard an explanation back
> as to what the problem is that it's trying to solve. So, can we have that
> explanation and discussion, or can you ensure this patch is not
> re-submitted? :)
>
>
If you've queried this patch a few times, I've missed it.  I have 100
patches I'm trying to get through the process.  It was not my intention to
miss anything.


> (E.g., it could be that what you really want is to set the metric to
> 0xfffe... or that something went wrong elsewhere in adding the router -
> depends on what problem Cumulus saw that led to that patch; if there wasn't
> an observed problem, then I wonder if it was due to someone being
> over-hasty in thinking they saw incorrect code, but not understanding SPF..
> (which is not exactly trivial - not just in the code, but even just the
> RFC)).
>
>  bgpd: Add maxmed command
>>
>
> MED is a terminally broken and unfixable "metric" (note: it fails to meet
> any normal definition of metric), which can easily cause pathological
> behaviours in iBGP. What BGP operator in their right mind, once fully
> informed, would want to allow any neighbour ASes to inject such a broken
> and dangerous metric into their iBGP?
>
> My view then would be that MED needs to be deprecated from BGP - as least
> its classic use. So should we really be encouraging further use of it?


I think we've had this discussion... I'm pretty sure you said that you
would submit a RFC to get this handled?

>
>
> bgpd: cluster-id length equality for multipath
>>
>
> We had a discussion on that one already. The sentiment is really good, and
> I'd even lift the cluster-list check up a lot higher, but it's a bit
> dangerous on its own.
>

It's a knob that someone can turn on/off if they need it.

donald

>
> I've had a draft on something similar years ago, but I didn't think I
> could put it into Quagga...
>
> regards,
> --
> Paul Jakma      [email protected]  @pjakma Key ID: 64A2FF6A
> Fortune:
> Force it!!!
> If it breaks, well, it wasn't working anyway...
> No, don't force it, get a bigger hammer.
>

_______________________________________________
Quagga-dev mailing list
[email protected]
https://lists.quagga.net/mailman/listinfo/quagga-dev

[quagga-dev 14940] Re: [PATCH 00/89] Cumulus Take-3 Electric Bugaloooooooooooooooo

Reply via email to