The case I had in mind is where multi-hop BFD is being used to monitor
availability of remote servers. There are many equal-cost paths to reach
them, especially in a DC. BFD detecting network issues is only incidental
there. And even if it recovers, it can leave a monitoring/alerting trail. If
it happens often, it would/should not be ignored.
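
To illustrate the equal-cost-path point, here is a minimal sketch of how a
5-tuple hash could spread BFD source ports over ECMP paths. The hash (CRC32),
the function names, and the example addresses are assumptions made for
illustration only; real devices use their own hash functions and inputs. The
destination port 4784 is the multi-hop BFD port and the source port range is
the one quoted from RFC 5881 further down.

```python
import zlib

BFD_MULTIHOP_PORT = 4784   # UDP destination port for multi-hop BFD
UDP_PROTO = 17

def ecmp_path(src_ip, dst_ip, proto, src_port, dst_port, n_paths):
    """Pick one of n_paths from the 5-tuple (illustrative hash, not a vendor's)."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    return zlib.crc32(key) % n_paths

def paths_by_source_port(src_ip, dst_ip, n_paths, ports=range(49152, 65536)):
    """Group the allowed BFD source ports by the path index they hash onto."""
    buckets = {i: [] for i in range(n_paths)}
    for port in ports:
        idx = ecmp_path(src_ip, dst_ip, UDP_PROTO, port, BFD_MULTIHOP_PORT, n_paths)
        buckets[idx].append(port)
    return buckets

if __name__ == "__main__":
    # Documentation addresses, chosen arbitrarily for the example.
    buckets = paths_by_source_port("192.0.2.1", "198.51.100.7", n_paths=4)
    for idx, ports in buckets.items():
        print(f"path {idx}: {len(ports)} source ports")
```

With a fixed source port the session stays on one path index for its whole
lifetime; a different source port may or may not land on a different path.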

I take your point about most applications only experiencing latency without
dropping the TCP connection. I guess BFD in that case is helping them get
disconnected (e.g. directly associated protocols like BGP, or causing a load
balancer in the path to direct packets to the wrong server). Though continuous
flapping is the flip side.
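
As a rough sketch of what "continuous flapping" could mean operationally, one
option is to count Up-to-Down transitions inside a sliding time window. The
class name, the threshold, and the window length below are hypothetical
values chosen for illustration; nothing here comes from the BFD RFCs.

```python
from collections import deque

class FlapDetector:
    """Flags a session as flapping after max_downs Down events within window_s seconds."""

    def __init__(self, max_downs=3, window_s=60.0):
        self.max_downs = max_downs      # hypothetical threshold
        self.window_s = window_s        # hypothetical window length, in seconds
        self._downs = deque()           # timestamps of recent Up->Down transitions

    def record_down(self, now):
        """Record an Up->Down transition at time `now`; return True if flapping."""
        self._downs.append(now)
        # Forget transitions that have aged out of the window.
        while self._downs and now - self._downs[0] > self.window_s:
            self._downs.popleft()
        return len(self._downs) >= self.max_downs
```

A monitoring system could feed this from session state-change events and raise
an alert once `record_down` returns True.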

Thanks
Abhinav


On Wed, 22 Mar, 2023, 11:27 pm Jeff Tantsura, <[email protected]>
wrote:

> Abhinav,
>
> Let’s clarify a couple of points.
> What you are trying to do is to change entropy to change the local hashing
> outcome; however, for hashing to even be relevant there has to be either
> ECMP or LAG in the path to the destination, otherwise the shortest path will
> be used regardless. So statistically, some of the flows between a given
> pair of end points (5-tuple) will be traversing the (partially) broken link;
> would you really like BFD to “pretend” that everything is just fine?
> Moreover, by far, in case of congestion, most applications won’t change
> their ports but will have their TX rate reduced.
> There’s work done by Tom Herbert for IPv6/TCP (kernel patch upstreamed a
> few years ago), which had been presented in RTGWG pre-Covid, that on RTO
> changes the flow label value (that some might or might not include in
> hashing), which is strongly not recommended to be used outside of a tightly
> controlled homogeneous environment (think within a DC).
> Outside of what BFD spec tells us (don’t), the above should provide enough
> motivation not to do this.
>
> Cheers,
> Jeff
>
> On Mar 23, 2023, at 05:44, Abhinav Srivastava <[email protected]> wrote:
>
> 
> Multi-hop BFD would be the mechanism that detects the failure on the path
> it happens to be using for the session. I wasn't thinking of another
> mechanism. Detection timer expiry would be the trigger for recovery, which
> could be augmented with a few other possible criteria, like how long the
> session hasn't been able to come back up, or prolonged flapping.
>
> Thanks
> Abhinav
>
> On Wed, 22 Mar, 2023, 3:05 pm Greg Mirsky, <[email protected]> wrote:
>
>> Hi Abhinav,
>> thank you for presenting an interesting scenario for a discussion. I have
>> several questions to better understand it:
>>
>>    - How is the network failure that triggers the recovery process
>>    detected?
>>    - If the failure detection mechanism is not multi-hop BFD, what is
>>    the relationship between the detection intervals of that mechanism and
>>    the multi-hop BFD session?
>>
>> Regards,
>> Greg
>>
>> On Wed, Mar 22, 2023 at 4:36 PM Abhinav Srivastava <[email protected]>
>> wrote:
>>
>>> Hi all,
>>>
>>>
>>>
>>> I needed clarification around whether the source port can be changed for
>>> a BFD session in the case of multi-hop BFD. The ability to change the BFD
>>> source port when the BFD session goes down helps the session recover if
>>> it's stuck on a network path where there is some intermittent but
>>> significant packet loss.
>>>
>>>
>>>
>>> In such cases, normally without BFD, end-to-end application traffic
>>> would eventually settle down on a good path, as applications typically
>>> change source port after experiencing disconnection or failures. But if
>>> BFD is being used to monitor some part of a path which is experiencing
>>> significant but not 100% packet loss, it will cause the next-hop list of
>>> the associated static route, or the associated BGP sessions, to flap
>>> forever, as BFD packets would be stuck on that partially lossy path
>>> (until the BFD session is deleted and recreated by admin action). This may
>>> also hinder the typical application recovery strategy of changing source
>>> port on failure.
>>>
>>>
>>>
>>> The ability to dynamically change the BFD source port can help BFD
>>> recover in such cases. Is this something that is allowed as per the RFC?
>>> RFC 5881, Section 4 (for the single-hop case) states:
>>>
>>> *“The source port MUST be in the range 49152 through 65535. The same UDP
>>> source port number MUST be used for all BFD Control packets associated with
>>> a particular session”*
>>>
>>>
>>>
>>> Thanks
>>>
>>> Abhinav
>>>
>>
