Eduard,

On Mon, 2 Aug 2021 at 13:45, Vasilenko Eduard <[email protected]> wrote:

> It is the key in this presentation “This behavior MUST be switched off by
> default”
>
> It has been shown on slides 7-10 that flow label change on RTO is enabled
> by default for BSD and LINUX.
>
> It needs regulation. It needs a standard RFC. Because it kills Anycast
> otherwise.
>

As I'm partially responsible for the key points of the presentation, I can stress that it is a bit different.

- We have an opportunity for self-healing TCP on top of IPv6, and it should be preserved;
- The Linux defaults should be changed to a safe mode to prevent session timeouts;
- The hash recalculation behavior should be documented.

I'm not sure what you mean by the term 'regulation'.

> The story of how to use RTO to work-around “silent drop” vendor’s bugs
> could be a good informational RFC.
>
> Maybe people developing iOAM would pay more attention to this use case.
>
> IMHO: these are 2 separate drafts.
>

I'm not sure about it. We'll try to provide -00 before the next IETF meeting; let's see how it progresses.

> Eduard
>
> *From:* Alexander Azimov [mailto:[email protected]]
> *Sent:* Monday, August 2, 2021 1:20 PM
> *To:* Vasilenko Eduard <[email protected]>; Jeff Tantsura <[email protected]>
> *Cc:* routing WG <[email protected]>
> *Subject:* Re: Self-healing Networking with Flow Label
>
> Eduard,
>
> Please see the quote from the slide 28. My suggestion was:
>
> Client – sends SYN, Server – responds with SYN&ACK
>
> - In case of SYN_RTO or RTO events Server SHOULD recalculate its TCP
>   socket hash, thus change Flow Label. This behavior MAY be switched on by
>   default;
> - In case of SYN_RTO or RTO events Client MAY recalculate its TCP
>   socket hash, thus change Flow Label. This behavior MUST be switched off by
>   default;
>
> This looks like a safe default behavior, that saves the part of the
> improvements, but makes the work with stateful anycast services safe.
>
> And yes, IMO it's ok to have a knob to enable it in the controlled
> environment.
> If you ask how to enable it in the presence of internal
> anycast services - there was also a suggestion in the slides: eBPF. It
> gives a good way to make this kind of separation.
>
> 02.08.2021, 11:48, "Vasilenko Eduard" <[email protected]>:
>
> Hi Jeff,
>
> The situation when Control Plane does not understand what the Forwarding
> plane is doing is a bug.
>
> Yes, RTO in TCP helps to find a work-around for this bug. And yes, Anycast
> is typically absent inside DC – it does not create the problem in the DC
> environment.
>
> But the same LINUX is used outside DC. RTO Flow Label change here would
> create even more problems if Anycast would happen on the traffic path (not
> much predictable for client).
>
> Do we need separate LINUX distribution for DC and separate distribution
> for other environments?
>
> Or should we rely on the proper non-default configuration for different
> environments? (Admin should not forget to change)
>
> What if Anycast would become needed in DC?
>
> RTO flow label recalculation is mutually exclusive with Anycast on the
> traffic path.
>
> What is more valuable for the public?
>
> IMHO: It is better to fight the problem of such type of a bug with iOAM
> than to cancel Anycast.
>
> IMHO: It is better to have Flow Label recalculation on RTO as “off” by
> default.
>
> DC Admin has the higher qualification to activate it in a controlled
> environment than every client worldwide that should not forget to disable
> it.
>
> Eduard
>
> *From:* Jeff Tantsura [mailto:[email protected]]
> *Sent:* Monday, August 2, 2021 6:56 AM
> *To:* Vasilenko Eduard <[email protected]>
> *Cc:* [email protected]; routing WG <[email protected]>
> *Subject:* Re: Self-healing Networking with Flow Label
>
> Eduard,
>
> The issue is present not in link/device case, if well implemented - fast
> rehash takes care of updating forwarding within a number of ms.
> The problem is with “gray” failures, when the link in question is UP from
> the routing/forwarding perspective but drops traffic (mostly occasionally,
> and when a HW bug occurs the drops affect distinct flow attributes).
>
> In many large DC fabrics, the majority of the traffic is east-west,
> between end-points that aren’t anycast. In such deployments - the solution
> solves issues rather elegantly and without any interventions from the
> operator.
>
> The issues/side effects are well understood and will be documented.
>
> The best way to receive RTGWG emails is well… to subscribe to RTGWG ;-)
>
> Cheers,
>
> Jeff
>
> On Aug 1, 2021, at 09:47, Vasilenko Eduard <[email protected]>
> wrote:
>
> Hi Alexander,
>
> Have I understood your presentation right?
>
> The client SHOULD change IPv6 flow label after SYN RTO to have a chance to
> be moved to the working path inside DC fabric (if DC fabric supports flow
> label for hash calculation)
>
> But at the same time
>
> The client SHOULD NOT change the IPv6 flow label after SYN RTO to avoid
> being switched to a different TCP proxy engine.
>
> Looks like a deadlock, especially if both things should happen for the
> same traffic:
>
> it should reach DC fabric
>
> and
>
> it should be hash load-balanced between different TCP proxy engines (or
> applications) inside DC Fabric.
>
> I see one bad solution (“Disable Flow Label”):
>
> Routers up to TCP proxy engine SHOULD be configured not to use flow label
> (by the way these are all routers on the Internet),
>
> TCP flow engines SHOULD be outside of the DC Fabric (CLOS) – probably in
> front of it.
>
> Routers/Switches inside DC Fabric SHOULD use flow labels.
>
> I see another bad solution (“Disable Anycast”):
>
> Disable anycast on routers in principle, use only stateful LB.
>
> It has been commented in the chat that Anycast is not possible in
> principle for stateful connection. It is too general a statement.
>
> Anycast is just not compatible with Flow Label. It is not a problem for
> IPv4 anycast even if the connection is stateful (TCP) because 5-tuple for
> hash would not change.
>
> Hence, IPv6 anycast has become dead at the time when Flow Label change has
> been added in LINUX for active TCP session.
>
> Among 3 things:
>
> - Anycast
> - Flow Label load balancing (basic Flow Label functionality)
> - Flow Label change on the active session for application to be
>   more active in new path search
>
> You have to choose which one to kill – all 3 are not compatible with each
> other at the same time.
>
> I vote to disable Flow Label change in LINUX. Then wait till the network
> would fix itself.
>
> We have so many fancy TE tools these days. A broken link or a broken node
> could be excluded from routing within 50ms.
>
> PS: I am not subscribed to the RTGWG alias, please keep me on a copy of
> this thread.
>
> Best Regards
> Eduard Vasilenko
> Senior Architect
> Europe Standardization & Industry Development Department
> Tel: +7(985) 910-1105, +7(916) 800-5506
>
> _______________________________________________
> rtgwg mailing list
> [email protected]
> https://www.ietf.org/mailman/listinfo/rtgwg
>
> --
> Best regards,
> Alexander Azimov
>

--
Best regards,
Alexander Azimov
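
For readers of the archive, the conflict discussed in this thread can be illustrated with a toy model. The sketch below is not Linux or router code: `pick_next_hop`, `rehash_on_rto`, and `flow_label_after_rto` are hypothetical names, the SHA-256-based hash only stands in for whatever hash a real ECMP device uses, and the knob defaults simply mirror the slide 28 wording quoted above (server MAY be on by default, client MUST be off by default).

```python
import hashlib


def pick_next_hop(src, dst, sport, dport, flow_label, n_paths):
    """Toy ECMP choice: hash addresses, ports and Flow Label, modulo path count.

    Assumption: the device includes the IPv6 Flow Label in its hash input.
    """
    key = f"{src}|{dst}|{sport}|{dport}|{flow_label}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_paths


def rehash_on_rto(role, client_knob=False, server_knob=True):
    """Hypothetical knobs following the proposed defaults: client off, server on."""
    return server_knob if role == "server" else client_knob


def flow_label_after_rto(role, flow_label, **knobs):
    """Return the Flow Label used after an RTO event for the given role."""
    if not rehash_on_rto(role, **knobs):
        return flow_label  # label is stable, so the hashed path is stable
    # Illustrative rehash only: derive a new 20-bit label from the old one.
    digest = hashlib.sha256(flow_label.to_bytes(3, "big")).digest()
    return int.from_bytes(digest[:3], "big") & 0xFFFFF


if __name__ == "__main__":
    flow = ("2001:db8::1", "2001:db8:ffff::53", 40000, 443)
    old_label = 0x12345
    for role in ("client", "server"):
        new_label = flow_label_after_rto(role, old_label)
        before = pick_next_hop(*flow, old_label, 4)
        after = pick_next_hop(*flow, new_label, 4)
        print(role, hex(new_label), "path before:", before, "path after:", after)
```

With the client knob off, the client-to-anycast direction keeps its label, so a hash that includes the Flow Label keeps selecting the same anycast instance. Turning the client knob on (`client_knob=True`) models the behavior that Eduard objects to: the label may change mid-session and the flow may land on a different instance.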
