The case I had in mind is where multi-hop BFD is being used to monitor the availability of remote servers. There are many equal-cost paths to reach them, especially in a DC. BFD detecting network issues is only incidental there. And even if the session recovers, it can leave a monitoring/alerting trail. If that is happening often, it would not (and should not) be ignored.
I take your point about most applications only experiencing latency without dropping the TCP connection. I guess BFD in that case is helping them get disconnected (e.g. directly associated protocols like BGP, or by causing a load balancer in the path to direct packets to the wrong server). Though continuous flapping is the flip side.

Thanks
Abhinav

On Wed, 22 Mar, 2023, 11:27 pm Jeff Tantsura, <[email protected]> wrote:

> Abhinav,
>
> Let’s clarify a couple of points.
> What you are trying to do is to change entropy to change local hashing
> outcome, however for hashing to even be relevant there has to be either
> ECMP or LAG in the path to the destination, otherwise shortest path will
> be used regardless, so statistically, some of the flows between a given
> pair of end points (5 tuple) will be traversing the (partially) broken
> link, would you really like BFD to “pretend” that everything is just fine?
> Moreover, by far, in case of congestion - most applications won’t change
> their ports but have their TX rate reduced.
> There’s work done by Tom Herbert for IPv6/TCP (kernel patch upstreamed a
> few years ago) - had been presented in RTGWG pre-Covid, that on RTO
> changes flow label value (that some might or might not include in hashing),
> which is strongly not recommended to be used outside of a tightly
> controlled homogeneous environment (think within DC).
> Outside of what BFD spec tells us (don’t), the above should provide enough
> motivation not to do this.
>
> Cheers,
> Jeff
>
> On Mar 23, 2023, at 05:44, Abhinav Srivastava <[email protected]> wrote:
>
> Multi-hop BFD would be the mechanism that detects the failure on the path
> it happens to be using for the session. I wasn't thinking of another
> mechanism. Detection timer expiry would be the trigger for recovery, which
> could be augmented with a few other possible criteria like how long the
> session hasn't been able to come back up, or prolonged flapping.
>
> Thanks
> Abhinav
>
> On Wed, 22 Mar, 2023, 3:05 pm Greg Mirsky, <[email protected]> wrote:
>
>> Hi Abhinav,
>> thank you for presenting an interesting scenario for a discussion. I have
>> several questions to better understand it:
>>
>> - How is the network failure that triggers the recovery process
>> detected?
>> - If the failure detection mechanism is not multi-hop BFD, what is
>> the relationship between the detection intervals of that mechanism and
>> the multi-hop BFD session?
>>
>> Regards,
>> Greg
>>
>> On Wed, Mar 22, 2023 at 4:36 PM Abhinav Srivastava <[email protected]>
>> wrote:
>>
>>> Hi all,
>>>
>>> I needed clarification around whether the source port can be changed for
>>> a BFD session in case of multi-hop BFD. The ability to change the BFD
>>> source port when the BFD session goes down helps the session recover if
>>> it's stuck on a network path where there is some intermittent but
>>> significant packet loss.
>>>
>>> In such cases, normally without BFD, end-to-end application traffic
>>> would eventually settle down on a good path, as applications typically
>>> change source port after experiencing disconnection or failures. But if
>>> BFD is being used to monitor some part of a path which is experiencing
>>> significant but not 100% packet loss, it will cause the next hop list
>>> of the associated static route or the associated BGP sessions to flap
>>> forever, as BFD packets would be stuck on that partially lossy path
>>> (until the BFD session is deleted and recreated by admin action). This
>>> may also hinder the typical application recovery strategy of changing
>>> source port on failure.
>>>
>>> The ability to dynamically change the BFD source port can help BFD
>>> recover in such cases. Is this something that is allowed as per RFC?
>>> RFC5881, section 4 (for the single-hop case) states that –
>>>
>>> *“The source port MUST be in the range 49152 through 65535. The same UDP
>>> source port number MUST be used for all BFD Control packets associated
>>> with a particular session”*
>>>
>>> Thanks
>>>
>>> Abhinav
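
For reference, the hashing point in this thread can be illustrated with a rough sketch. It only shows why a different source port may land the session on a different equal-cost path. It is not a suggestion that a BFD implementation should do this, since the RFC 5881 text quoted above requires the same source port for a session. Everything here is an assumption for illustration: real routers use vendor-specific hash functions, `zlib.crc32` is just a stand-in, and the names `ecmp_path`, `next_source_port` and `paths_seen` are hypothetical.

```python
import zlib

BFD_MULTIHOP_DST_PORT = 4784               # UDP destination port for multi-hop BFD (RFC 5883)
SRC_PORT_MIN, SRC_PORT_MAX = 49152, 65535  # source port range quoted from RFC 5881, section 4
UDP_PROTO = 17                             # IP protocol number for UDP


def ecmp_path(src_ip, dst_ip, src_port, dst_port, proto, n_paths):
    """Toy ECMP: hash the 5-tuple and pick one of n_paths next hops.

    crc32 is an assumed stand-in for whatever hash a real device uses.
    """
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    return zlib.crc32(key) % n_paths


def next_source_port(port):
    """Hypothetical helper: step to the next port, wrapping inside the allowed range."""
    span = SRC_PORT_MAX - SRC_PORT_MIN + 1
    return SRC_PORT_MIN + (port - SRC_PORT_MIN + 1) % span


def paths_seen(src_ip, dst_ip, n_paths, start_port=SRC_PORT_MIN, tries=64):
    """Return the set of path indices hit while stepping through source ports."""
    seen, port = set(), start_port
    for _ in range(tries):
        seen.add(ecmp_path(src_ip, dst_ip, port,
                           BFD_MULTIHOP_DST_PORT, UDP_PROTO, n_paths))
        port = next_source_port(port)
    return seen


if __name__ == "__main__":
    # Documentation addresses (RFC 5737), 4 assumed equal-cost paths.
    print(sorted(paths_seen("192.0.2.1", "198.51.100.1", n_paths=4)))
```

With a fixed source port the toy hash always returns the same path index, which is the "stuck on one lossy path" situation described in the original question. Stepping the port only helps if there actually is ECMP or LAG on the way, as Jeff points out.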
