Re: Mirja Kühlewind's Discuss on draft-ietf-bfd-seamless-base-09: (with DISCUSS)

Carlos Pignataro (cpignata) Wed, 04 May 2016 08:50:59 -0700

Hi, Mirja,

> On May 4, 2016, at 11:29 AM, Mirja Kuehlewind (IETF) <[email protected]> 
> wrote:
> 
> Hi Carlos,
> 
>> On 04.05.2016 at 17:13, Carlos Pignataro (cpignata)
>> <[email protected]> wrote:
>> 
>> Hi, Mirja,
>> 
>>> On May 4, 2016, at 10:41 AM, Mirja Kuehlewind (IETF) <[email protected]> 
>>> wrote:
>>> 
>>> Hi Carlos,
>>> 
>>> below.
>>> 
>>>> On 04.05.2016 at 16:33, Carlos Pignataro (cpignata)
>>>> <[email protected]> wrote:
>>>> 
>>>> Thanks much for the response, Mirja!
>>>> 
>>>> I think we are converging, please see inline.
>>>> 
>>>>> On May 4, 2016, at 10:13 AM, Mirja Kuehlewind (IETF) 
>>>>> <[email protected]> wrote:
>>>>> 
>>>>> Hi Carlos,
>>>>> 
>>>>> see below.
>>>>> 
>>>>>> On 03.05.2016 at 19:24, Carlos Pignataro (cpignata)
>>>>>> <[email protected]> wrote:
>>>>>> 
>>>>>> Hi, Mirja,
>>>>>> 
>>>>>>> On May 3, 2016, at 12:31 PM, Mirja Kuehlewind (IETF) 
>>>>>>> <[email protected]> wrote:
>>>>>>> 
>>>>>>> Hi Carlos,
>>>>>>> 
>>>>>>> 
>>>>>>>> On 03.05.2016 at 15:40, Carlos Pignataro (cpignata)
>>>>>>>> <[email protected]> wrote:
>>>>>>>> 
>>>>>>>> Hi, Mirja,
>>>>>>>> 
>>>>>>>> What is an uncontrolled packet in an IP network, and what entity 
>>>>>>>> controls controlled ones? :-)
>>>>>>> 
>>>>>>> Questions upon questions… :-)
>>>>>>> 
>>>>>>> See below...
>>>>>>> 
>>>>>>>> 
>>>>>>>> More seriously, please see inline.
>>>>>>>> 
>>>>>>>>> On May 3, 2016, at 5:35 AM, Mirja Kuehlewind <[email protected]> 
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> Mirja Kühlewind has entered the following ballot position for
>>>>>>>>> draft-ietf-bfd-seamless-base-09: Discuss
>>>>>>>>> 
>>>>>>>>> When responding, please keep the subject line intact and reply to all
>>>>>>>>> email addresses included in the To and CC lines. (Feel free to cut 
>>>>>>>>> this
>>>>>>>>> introductory paragraph, however.)
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Please refer to 
>>>>>>>>> https://www.ietf.org/iesg/statement/discuss-criteria.html
>>>>>>>>> for more information about IESG DISCUSS and COMMENT positions.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> The document, along with other ballot positions, can be found here:
>>>>>>>>> https://datatracker.ietf.org/doc/draft-ietf-bfd-seamless-base/
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> ----------------------------------------------------------------------
>>>>>>>>> DISCUSS:
>>>>>>>>> ----------------------------------------------------------------------
>>>>>>>>> 
>>>>>>>>> As S-BFD no longer has an initiation process, it is not guaranteed
>>>>>>>>> that the receiver/responder actually exists. That means that packets
>>>>>>>>> could float (uncontrolled) in the network or even outside of the
>>>>>>>>> administrative domain (e.g. due to configuration mistakes). From my
>>>>>>>>> point of view this document should recommend/require two things:
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> We have called out the misconfiguration — however:
>>>>>>>> 
>>>>>>>>> 1) A maximum number of S-BFD packets that are allowed to be sent
>>>>>>>>> without getting a response (maybe leading to a local error report).
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> This can result in a deadlock situation if an S-BFD Reflector is 
>>>>>>>> enabled much later. I’m very hesitant to cap the packets sent. We can 
>>>>>>>> say, and I think it is useful, that an implementation MAY log an error 
>>>>>>>> for multiple timeouts.
>>>>>>> 
>>>>>>> Okay, I understand that a hard limit probably does not make sense. An 
>>>>>>> error log definitely seems useful.
>>>>>> 
>>>>>> OK, that sounds good. See below.
>>>>>> 
>>>>>>> Another proposal for consideration: Currently the draft says an 
>>>>>>> initiator should only send one packet per second if the target is in 
>>>>>>> ADMINDOWN state. In this case the state is explicitly announced. 
>>>>>>> However, if the other end just disappears or was never/not yet there, 
>>>>>>> one could use an exponential back off instead, starting with a smaller 
>>>>>>> interval than one second and then increasing it exponentially. Just an 
>>>>>>> idea...
>>>>>> 
>>>>>> Thanks for the proposal. Please keep in mind, however, that this is a 
>>>>>> protocol for detecting liveness (and lack of it), so increasing the 
>>>>>> interval exponentially defeats the purpose.
>>>>>> 
>>>>>> Further, exponential back off may not be the best choice when 
>>>>>> interacting with routing protocols.
>>>>>> 
>>>>>> What we currently say is:
>>>>>>  The criteria for declaring loss of
>>>>>>  reachability and the action that would be triggered as a result
>>>>>>  are outside the scope of this document.
>>>>>> 
>>>>>> Many of these are implementation choices.
>>>>>> 
>>>>>> But we can add at the end “, and MAY include logging an error.”
>>>>> 
>>>>> Please do so.
>>>> 
>>>> Done.
>>>> 
>>>>> 
>>>>>>> 
>>>>>>>> 
>>>>>>>>> 2) Egress filtering at the administrative border of the domain that
>>>>>>>>> uses S-BFD to make sure that no S-BFD packets leave the domain.
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> This is no different than any other application that uses UDP; a 
>>>>>>>> misconfigured DNS server will result in the same, and a traceroute is 
>>>>>>>> also not too different. This seems too onerous a requirement. An 
>>>>>>>> administrative domain filters at ingress.
>>>>>>> 
>>>>>>> First of all, just because other protocols might have such a problem, 
>>>>>>> that does not mean it’s okay.
>>>>>> 
>>>>>> I agree with this. I had a different point in mind though — trying to 
>>>>>> specify this on every UDP application might not be the most effective 
>>>>>> way. Perhaps there’s a UDP guideline you are uncovering.
>>>>>> 
>>>>>>> However, correct me if I’m wrong, but here the situation seems 
>>>>>>> slightly different because there is no termination criterion at all, 
>>>>>>> which means an S-BFD node would send useless data forever (… until 
>>>>>>> manual change of the config).
>>>>>>> 
>>>>>> 
>>>>>> But as far as this doc is concerned, let me try to make some 
>>>>>> clarifications (and a correction).
>>>>>> 
>>>>>> There are termination criteria — the document says:
>>>>>> 
>>>>>> An SBFDInitiator may be a persistent session on the initiator with a
>>>>>> timer for S-BFD control packet transmissions (stateful
>>>>>> SBFDInitiator).  An SBFDInitiator may also be a module, a script or a
>>>>>> tool on the initiator that transmits one or more S-BFD control
>>>>>> packets "when needed" (stateless SBFDInitiator).
>>>>>> 
>>>>>> For the case in which you have a “when needed” SBFDInitiator, the 
>>>>>> workflow is like a “ping”.
>>>>>> 
>>>>>> For the case in which you have a “persistent” SBFDInitiator, in theory 
>>>>>> this can run forever (for some value of ever). However, please don’t 
>>>>>> lose track of why this protocol exists. Having OAM failures and doing 
>>>>>> nothing about them defeats the purpose of having OAM. Meaning, a red 
>>>>>> alarm will blink, a horn will honk, and the config state will be changed 
>>>>>> (manually or by some support system).
>>>>>> 
>>>>>> In other words, I do not think this is such a unique case (insofar as 
>>>>>> running ad infinitum).
>>>>> 
>>>>> I still believe that the case where you have a misconfiguration and the 
>>>>> initiator sends packets (forever) but never gets a reply (and has never 
>>>>> seen a reply in the past) is a different case and can be detected 
>>>>> and handled separately.
>>>>> 
>>>> 
>>>> Again, I would not qualify this as ‘forever’, but I understand what you 
>>>> mean.
>>>> 
>>>>>> 
>>>>>>> I still believe that egress filtering would be more appropriate here 
>>>>>>> (than ingress) because the domain that is using S-BFD knows about it 
>>>>>>> and therefore can set up the respective filters and should not spam 
>>>>>>> others while hoping that filters are in place.
>>>>>>> 
>>>>>> 
>>>>>> To me, there is a not insignificant operational complexity in requiring 
>>>>>> the addition of filters throughout, to keep one particular application 
>>>>>> from leaking (where the leak does not cause anything special), and when 
>>>>>> the leak might happen because of a misconfiguration (or bug) but will be 
>>>>>> detected by the operational support systems. The ROI does not seem to 
>>>>>> add up.
>>>>> 
>>>>> Okay, the document should probably not require egress filtering in any 
>>>>> case, but what about saying something like:
>>>>> 
>>>>> “If S-BFD is used it SHOULD be ensured that S-BFD control packets do not 
>>>>> propagate outside of the administrative domain that uses it.”
>>>>> 
>>>> 
>>>> We can add an additional explanation of the problem before a statement, 
>>>> but I do not think that particular SHOULD is actionable. How’s something 
>>>> like:
>>>> 
>>>> Explain that without handshake, a persistent initiator can send blindly, 
>>>> to then add “In such case, operational measures SHOULD be taken to 
>>>> identify if S-BFD packets are not responded to for an extended period of 
>>>> time, and remediate the situation”
>>>> 
>>>>> This is not an uncommon thing to specify also for other protocols.
>>>>> 
>>>> 
>>>> I agree. Frankly, I am happy with either statement, but I think the latter 
>>>> might be more operationally actionable.
>>>> 
>>>> Preference?
>>> 
>>> I would still prefer something along the lines of what I proposed. I think 
>>> there could effectively be different actions to be taken here, e.g. egress 
>>> filtering or measurement to detect failure, as well as no action if for 
>>> some other reason it can be ensured that such a misconfiguration cannot 
>>> happen (or is at least very unlikely to happen), e.g. because things are 
>>> automated and there are additional checks before applying a config.
>>> 
>> 
>> Perhaps I can add “for an extended period of time” to the first statement 
>> (or similar wording of your liking)?
>> 
>> Your main concern is the “forever”. Let’s ensure it is not “forever”. 
>> However, I’m concerned that a single packet out (like a ping to the wrong 
>> address) will be violating “it SHOULD be ensured that S-BFD control packets 
>> do not propagate outside”.
> 
> The concern is not “forever” but putting (unnecessary) load on other networks 
> (by accident). So I agree, one or a few packets is not a problem. So yes, 
> adding “for an extended period of time” is fine. We could also/instead 
> replace the word “ensure” with something else (maybe “control”…?).
>


These two changes would certainly work.

Thank you. We will post a new rev today.

[I still think that a few packets are not “(unnecessary) load” for an IP 
device; it’s really no different from doing a traceroute and getting an 
icmp.unreach port unreachable (or, if it is critical and unwelcome load for a 
device, those devices are protected at ingress at their border).

But in any case, I do think that explaining the problem you highlight helps and 
improves the doc, and the new text on what to do does not hurt.]

Thanks,

— Carlos.

> Mirja
> 
> 
> 
>> 
>> Would that work?
>> 
>> Thanks,
>> 
>> — Carlos.
>> 
>>> The second SHOULD that you proposed is from my point of view actually an 
>>> additional point that I would also be happy to see in the doc.
>>> 
>>> Mirja
>>> 
>>> 
>>>> 
>>>> Thanks,
>>>> 
>>>> — Carlos.
>>>> 
>>>>> Mirja
>>>>> 
>>>>> 
>>>>>> 
>>>>>> Does the explanation of the termination criteria help?
>>>>>> 
>>>>>>>> 
>>>>>>>> Seems to me the logging will alert someone/something to take action, 
>>>>>>>> and should be enough.
>>>>>>> 
>>>>>>> Logging plus alerts is definitely a good thing.
>>>>>>> 
>>>>>> 
>>>>>> I agree.
>>>>>> 
>>>>>> Will add “, and MAY include logging an error.” as per above.
>>>>>> 
>>>>>> Do you think we should expand on this somewhere else in the document?
>>>>>> 
>>>>>> Thanks,
>>>>>> 
>>>>>> — Carlos.
>>>>>> 
>>>>>>> Mirja
>>>>>>> 
>>>>>>> 
>>>>>>>> 
>>>>>>>> Thoughts?
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> 
>>>>>>>> — Carlos.
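
The outcome converged on in this thread (no hard cap on transmitted S-BFD
packets, but an error MAY be logged and operational measures SHOULD notice
when packets go unanswered for an extended period of time) might be sketched
roughly as below. This is only an illustration under stated assumptions: the
class name UnansweredMonitor, the alert_after_s threshold and the logger name
are hypothetical and do not come from the draft, and the sketch tracks time
only, without sending any packets.

```python
import logging
from typing import Optional

log = logging.getLogger("sbfd.initiator")  # hypothetical logger name


class UnansweredMonitor:
    """Tracks how long a persistent initiator has gone without a response.

    Hypothetical helper, not part of any S-BFD implementation or the draft.
    """

    def __init__(self, alert_after_s: float) -> None:
        # Operator-chosen meaning of "an extended period of time" (assumption).
        self.alert_after_s = alert_after_s
        self.first_miss: Optional[float] = None
        self.alerted = False

    def on_response(self, now: float) -> None:
        # Any valid response clears the unanswered condition.
        self.first_miss = None
        self.alerted = False

    def on_timeout(self, now: float) -> bool:
        """Record a missed response; return True if an error was logged now."""
        if self.first_miss is None:
            self.first_miss = now
        unanswered_for = now - self.first_miss
        if not self.alerted and unanswered_for >= self.alert_after_s:
            log.error("S-BFD target unanswered for %.0f s", unanswered_for)
            self.alerted = True
            return True
        # Transmission is never stopped here: the thread rejected a hard cap
        # because a reflector may be enabled later.
        return False
```

The error is logged once per unanswered stretch rather than on every timeout,
so a long misconfiguration produces one alert for the support systems instead
of a flood of log lines.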
