Re: [Lsr] [Teas] Fwd: Working Group Last Call for "Applicability of IS-IS Multi-Topology (MT) for Segment Routing based Network Resource Partition (NRP)" - draft-ietf-lsr-isis-sr-vtn-mt-06
From the draft:
===
> The mechanism described in this document is considered useful for network scenarios in which
> the required number of NRP is small, as no control protocol extension is required. For network
> scenarios where the number of required NRP is large, more scalable solution would be needed,
> which may require further protocol extensions and enhancements.

So the proposed draft is about a solution that doesn't scale (well). And then later, we might get another solution that does scale (better). Then we'll end up with two solutions for one problem. One bad solution, and one (hopefully) better solution.

If that is the case, then I suggest we wait a bit, and see what else the TEAS workgroup comes up with. I'd rather have one good solution than two half-baked ones. Or even one good and one half-baked.

Less is more.

henk.

> On 01/11/2024 4:40 AM CET Chongfeng Xie wrote:
>
> Hi Les,
>
> Thanks for your comments.
>
> This is an informational document which describes the applicability of existing IS-IS MT mechanisms for building SR based NRPs. All the normative references are either RFCs or stable WG documents. It is true that some informative references are individual documents, while they just provide additional information related to this topic, thus would not impact the stability and maturity of the proposed mechanism.
>
> The text you quoted from draft-ietf-teas-nrp-scalability are about the considerations when the number of NRP increases, how to minimize the impact to the routing protocols (e.g. IGP).
While as described in the scalability > considerations section of this document, the benefit and limitation of using > this mechanism for NRP are analyzed, and it also sets the target scenarios of > this mechanism: > > “The mechanism described in this document is considered useful for > network scenarios in which the required number of NRP is small” > > Thus it is clear that this solution is not recommended for network scenarios > where the number of required NRP is large. > > Please note section 3 of draft-ietf-teas-nrp-scalability also mentioned that: > > “The result of this is that different operators can choose to deploy > things at different scales.” > > And > > “In particular, we should be open to the use of approaches that do not > require control plane extensions and that can be applied to deployments with > limited scope.” > > According to the above text, we believe the mechanism described in this > document complies to the design principles discussed in > draft-ietf-teas-nrp-scalability and provides a valid solution for building > NRPs in a limited scope. > > Hope this solves your concerns about the maturity and scalability of this > mechanism. > > Best regards, > > Chongfeng > > > > > > > > From: Les Ginsberg \(ginsberg\) mailto:ginsberg=40cisco@dmarc.ietf.org > > Date: 2024-01-11 08:21 > > To: Joel Halpern mailto:j...@joelhalpern.com; Acee Lindem > > mailto:acee.i...@gmail.com; t...@ietf.org mailto:t...@ietf.org; > > lsr@ietf.org mailto:lsr@ietf.org > > Subject: Re: [Lsr] [Teas] Fwd: Working Group Last Call for "Applicability > > of IS-IS Multi-Topology (MT) for Segment Routing based Network Resource > > Partition (NRP)" - draft-ietf-lsr-isis-sr-vtn-mt-06 > > > > (NOTE: I am replying to Joel’s post rather than the original last call > > email because I share some of Joel’s concerns – though my opinion on the > > merits of the draft is very different. > > Also, I want to be sure the TEAS WG gets to see this email.) 
> >
> > I oppose Last Call for draft-ietf-lsr-isis-sr-vtn-mt.
> >
> > It is certainly true, as Joel points out, that this draft references many drafts which are not yet RFCs – and in some cases are not even WG documents. Therefore, it is definitely premature to last call this draft.
> >
> > I also want to point out that the direction TEAS WG has moved to recommends that routing protocols NOT be used as a means of supporting NRP.
> >
> > https://www.ietf.org/archive/id/draft-ietf-teas-nrp-scalability-03.html#name-scalabliity-design-principl
> > states:
> >
> > “…it is desirable for NRPs to have no more than small impact (zero being preferred) on the IGP information that is propagated today, and to not required additional SPF computations beyond those that are already required.”
> >
> > https://www.ietf.org/archive/id/draft-ietf-teas-nrp-scalability-03.html#name-scalabliity-design-principl
> > states:
> >
> > “The routing protocols (IGP or BGP) do not need to be involved in any of these points, and it is important to isolate them from these aspects in order that there is no impact on scaling or stability.”
> >
> > Another draft which is referenced is https://datatracker.ietf.org/doc/draft-dong-lsr-sr-enhanced-vpn/ - which is
Re: [Lsr] WG Adoption Call - draft-pkaneria-lsr-multi-tlv (11/17/2023 - 12/09/2023)
Support.

As the mechanism described in the draft has already been implemented by the three largest vendors of ISP-class routers, and that software has been deployed in real networks today, we had better document this asap in an RFC.

henk.

> On 11/17/2023 6:23 PM CET Yingzhen Qu wrote:
>
> Hi,
>
> This begins a WG adoption call for draft-pkaneria-lsr-multi-tlv:
> draft-pkaneria-lsr-multi-tlv-04 - Multi-part TLVs in IS-IS (ietf.org)
> https://datatracker.ietf.org/doc/draft-pkaneria-lsr-multi-tlv/
>
> Please send your support or objection to the list before December 9th, 2023. An extra week is allowed for the US Thanksgiving holiday.
>
> Thanks,
> Yingzhen
> ___
> Lsr mailing list
> Lsr@ietf.org
> https://www.ietf.org/mailman/listinfo/lsr

___ Lsr mailing list Lsr@ietf.org https://www.ietf.org/mailman/listinfo/lsr
Re: [Lsr] New Version Notification for draft-pkaneria-lsr-multi-tlv-01.txt
Hi Tony,

> Yes, I'm advocate for putting things elsewhere, but that proposal has
> met with crickets. You don't get it both ways: no capabilities in the
> protocol and nowhere else does not work.

I'm not sure I know what you are talking about. Did you write a draft?

> Because the thought of trying to deploy this capability at scale without
> this attribute seems impossible. Consider the case of Tier 1 providers
> who have large IS-IS deployments. Are you really going to evaluate 2000+
> nodes without some kind of help?

With the help of the management-plane? How did those providers make changes to their configs/features/architecture before? I would expect them to use the same tools.

> And the routers will do computations based on the multi-part TLVs.
> One level of indirection for a capability does not seem extreme.

Not extreme, indeed. But again, I'd rather not see 20 different minor or irrelevant things in the router-capability TLV. Certainly not at 2 octets per item. 1 bit would already be (16 times) better.

> > Regardless whether we do that or not, this discussion maybe should be done
> > outside the multipart TLV discussion. Maybe another draft should be written
> > about these software-capabilities in general?
>
> Please feel free. My proposal was shot down.

Are you talking about a very recent proposal? Linked to the multipart-TLV draft? Or something older? I vaguely remember some idea about "generic transport" in IS-IS (or rather: outside the regular IS-IS instance).

henk.
Re: [Lsr] New Version Notification for draft-pkaneria-lsr-multi-tlv-01.txt
Hi Tony, > Some exist today. There are many TLVs where they have never been specified. My point was: multipart TLVs exist today, before the introduction of the capability advertisement. So when you look at a LSPDB, you still don't know for sure which routers support multipart TLVs. Some might, but don't advertise it. Because their software was written before the new capability existed. >> In the end, every detail, will get its own router capability. > That's correct. It will. That's going to happen independently of this draft. I hope not. And I will oppose those attempts too. > we still do not have an effective management plane and > continue to stuff things into the LSDB that belong in the management plane. Yes. But that does not make it an excuse to put just anything in the LSPDB. I've seen you in this work-group as someone who tried to keep things out of IS-IS that don't belong there. I am surprised to see you want this capability in. > The entire definition of a Flex Algo topology constraint should be > in the management plane. Sure. But at least the routers do make route-calculation decisions based on that information. > That's not an excuse for not trying to do a good job now. That is the whole question. This capability is adding 2 more octets to LSPs. Is that worth it? What if indeed a few dozen drafts will follow to advertise more of these capabilities? Should we define a new top-level TLV for "feature/software support capability"? Not whether something is configured or not (as does the router-capability TLV), but whether a router's software has that capability to begin with. Or should we define a new variable-length bitmask sub-TLV for the existing router capability TLV. Where every bit indicates another piece of software the router supports? Regardless whether we do that or not, this discussion maybe should be done outside the multipart TLV discussion. Maybe another draft should be written about these software-capabilities in general? henk. 
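As a rough illustration of the variable-length bitmask idea raised above, here is a minimal Python sketch. Everything in it is hypothetical: the function names, the bit numbering (bit 0 is the most significant bit of the first octet) and the idea of one bit per software capability are assumptions for illustration only, not anything defined in a draft or RFC.

```python
from __future__ import annotations


def encode_capabilities(bits: set[int]) -> bytes:
    """Pack capability bit numbers into the shortest bitmask that holds them.

    Hypothetical layout: bit 0 is the most significant bit of octet 0.
    """
    if not bits:
        return b""
    buf = bytearray(max(bits) // 8 + 1)
    for bit in bits:
        buf[bit // 8] |= 0x80 >> (bit % 8)
    return bytes(buf)


def decode_capabilities(data: bytes) -> set[int]:
    """Return the set of capability bit numbers that are set in the bitmask."""
    return {
        octet_index * 8 + offset
        for octet_index, octet in enumerate(data)
        for offset in range(8)
        if octet & (0x80 >> offset)
    }


# Twenty capabilities: a bitmask value versus 2 octets per capability.
bitmask_octets = len(encode_capabilities(set(range(20))))
per_item_octets = 20 * 2
print(bitmask_octets, per_item_octets)
```

Under these assumptions, twenty capabilities fit in a 3-octet value, where 2 octets per capability would take 40 octets (sub-TLV headers not counted in either case).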
> On 11-09-2022 21:32 Tony Li wrote:
>
> Hi Henk,
>
> > > If we want to introduce MP-TLVs, that change would warrant the existence of the flag.
> >
> > Multipart TLVs already exist today.
>
> Some exist today. There are many TLVs where they have never been specified.
>
> > As discussed here, after introducing a "software capability TLV", if a router doesn't advertise any of those new capabilities, we still don't know whether that router supports multipart TLVs or not.
> > So the new capability seems to have limited value.
>
> In particular, the current proposal on the table is to have the capability apply to the TLVs where multi-part has not been previously defined.
>
> > > I dispute that a binary flag warrants the word 'complexity'.
> >
> > You might think of a single bit now.
> > But people might want to add more. What TLVs on a router can and can not be received multipart? What about sub-TLVs?
>
> We are not proposing that level of specificity. It’s all or nothing.
>
> > It seems these days we have more people in the LSR work-group that prefer to write drafts than that prefer to write code.
>
> Ok, I’m offended.
>
> > I fear a large amount of drafts about router capabilities to advertise support for every bit, every TLV and every sub-TLV in LSPs. In the end, every feature, every detail or option in a feature, will get its own router capability.
>
> That’s correct. It will. That’s going to happen independently of this draft.
>
> > I'm happy nobody wants routers to react to advertised software capabilities.
> > But if routers don't react to info in a LSP, I don't think that info belongs in the control-plane.
> > It belongs in the management-plane.
>
> Thank you, but we still (40 years in?) do not have an effective management plane and continue to stuff things into the LSDB that belong in the management plane.
>
> > That's a different thing, imho.
It's a single exception. It's very useful > > to identify LSPs and routers. > > I don't see other management information we should put in LSPs. > > > There’s tons of stuff. The entire definition of a Flex Algo topology > constraint should be in the management plane. Almost everything that is in > the router capability TLV should be in the management plane. > > We stepped down this slippery slope a long, long, time ago. > > > > Twenty-five years ago, you and me were the first people to implement > > support for wide metrics. > > I came up with a strategy to migrate a running network. > > I introduced the "metric-style [wide|narrow]" command. That was supposed to > > be a temporary thing. > > Just for use in the next 1-2 years, when every IS-IS network was migrating > > to wide metrics. > > > And it’s still there. > > > > When I came back to work, in 2015, I saw that the Nokia SR-OS routers still > > have narrow metrics as > > the default se
Re: [Lsr] New Version Notification for draft-pkaneria-lsr-multi-tlv-01.txt
Hi Tony,

> If we want to introduce MP-TLVs, that change would warrant the existence of the flag.

Multipart TLVs already exist today. As discussed here, after introducing a "software capability TLV", if a router doesn't advertise any of those new capabilities, we still don't know whether that router supports multipart TLVs or not. So the new capability seems to have limited value.

> I dispute that a binary flag warrants the word 'complexity'.

You might think of a single bit now. But people might want to add more. What TLVs on a router can and can not be received multipart? What about sub-TLVs?

> We are not allowing that level of granularity. A system that is
> going to support MP-TLVs should take care to operate correctly
> for ALL TLVs before advertising that it supports them.

It seems these days we have more people in the LSR work-group that prefer to write drafts than that prefer to write code. I fear a large number of drafts about router capabilities to advertise support for every bit, every TLV and every sub-TLV in LSPs. In the end, every feature, every detail or option in a feature, will get its own router capability.

I'm happy nobody wants routers to react to advertised software capabilities. But if routers don't react to info in a LSP, I don't think that info belongs in the control-plane. It belongs in the management-plane.

> We have been sending management information in the LSDB since
> we introduced the hostname TLV.

That's a different thing, imho. It's a single exception. It's very useful to identify LSPs and routers. I don't see other management information we should put in LSPs.

Twenty-five years ago, you and I were the first people to implement support for wide metrics. I came up with a strategy to migrate a running network. I introduced the "metric-style [wide|narrow]" command. That was supposed to be a temporary thing. Just for use in the next 1-2 years, when every IS-IS network was migrating to wide metrics.
When I came back to work, in 2015, I saw that the Nokia SR-OS routers still have narrow metrics as the default setting. I laughed. Now I am back at Cisco, and I see that IOS-XR also has narrow metrics as the default. I cried. FYI, both OSes had their FCS many years after 1997. If I am not mistaken, JunOS has "metric-style both" as the default. So on all these 3 OSes, you need to explicitly configure "metric-style wide". Twenty-five years after the migration.

I fear that the same will happen with your router-capability. The new capability will have some value now. To help migrate to a network where all boxes support multipart TLVs. But 1-2 years from now, all (major) IS-IS implementations will support multipart TLVs for all TLVs. And then the new router-capability will have no use anymore. But routers all around the world will still advertise it. I predict that 25 years from now, 23 years after all IS-IS implementations started to support multi-part TLVs, routers will still advertise your capability.

I don't like that.

henk.
Re: [Lsr] LSR WG Adoption Poll for "IS-IS Topology-Transparent Zone" - draft-chen-isis-ttz-11.txt
I object to the introduction of a new major concept, called "zone". It adds nothing to solve problems we can not already solve. It just adds unnecessary complexity and technical debt.

RFC 1925, rule (12): In protocol design, perfection has been reached not when there is nothing left to add, but when there is nothing left to take away.

henk.

Acee Lindem (acee) wrote on 2020-08-18 16:16:
> Based on the discussions in the last meeting and on the mailing list regarding draft-chen-isis-ttz-11, the chairs feel that there are enough differences with draft-ietf-lsr-isis-area-proxy-03 and in the community to consider advancing it independently on the experimental track. These differences include abstraction at arbitrary boundaries and IS-IS extensions for smooth transition to/from zone abstraction.
>
> We are now starting an LSR WG adoption call for draft-chen-isis-ttz-11.txt. Please indicate your support or objection to adoption prior to Tuesday, September 2nd, 2020.
>
> Thanks,
> Acee and Chris
Re: [Lsr] Request WG adoption of TTZ
Hello Tianran,

Warning, long email again.

> What's the criterion to evaluate the benefit?

As people have asked before, did any provider or enterprise ever use RFC 8099 in their network ?

As I wrote, one of my criteria is RFC 1925. I like technology to be understandable. I like protocols to be (relatively) easy to implement. The more unused cruft there is, the further we get away from that goal.

I'll give you an example. Did you, or your company ever implement RFC 2973 ? That's mesh-groups in IS-IS. I'm sure some customers put it on their wishlist. Did any provider or customer ever use it ? I asked this question at my last job, and nobody knew the answer. I suspect nobody in the world ever used mesh-groups.

Around the time I got in touch with IS-IS, in spring 1996, there was a problem that was seen at 2 of the 3 largest ISPs in the US (UUnet and iMCI). Both networks melted because of IS-IS. All routers in their networks were spending 100% of their CPU time running IS-IS, busy exchanging LSPs, while no progress was made. The only solution was to reboot all routers in the backbone at the same time (several hundred routers). This happened more than once in both networks.

To relieve the burden of flooding, mesh-groups were implemented, and RFC 2973 was written. However, a short while later I became the sole IS-IS programmer for that router vendor. I was able to reproduce the problem in the lab. I then realized what the issue was. A fix of 10 lines of extra code fixed the problem. No customer ever reported those meltdowns again. That fix was the real solution. Not writing another RFC.

In the meantime, we have an extra RFC, about mesh-groups. Every book and manual on IS-IS has to spend time explaining what mesh-groups are. Every vendor has to implement it. Even when nobody in the world is using it. Mesh-groups were a superfluous idea.

What I (and many others) are saying is that we don't want to specify and implement unnecessary things.
Even when nobody is using such a thing, it will live on forever.

> What I see the TTZ does have benefit.

Yes, TTZ and proxy-areas have benefit. Nobody is disagreeing. But what people don't like is the new concept of a zone. If you can abstract exactly one area into exactly one proxy-LSP, that is good enough for 99.9 % of cases.

In OSPF it is harder to split or merge an area. In IS-IS it is a lot easier. So a network operator can design and change her/his areas first. And then implement proxy-areas as she/he wishes. Without much downtime.

If we introduce the concept of a "zone", someone is going to have to explain that to everybody in the world who uses IS-IS. Have you ever taught a class on IS-IS to people who don't know routing protocols very well ?

> I am also wondering how it hurts the protocol in the long run ?

Adding stuff that nobody uses makes everything more complex. I know it seems as if the goal over the last 15 years was to make everything more complex. So what's the problem with adding yet another RFC ? But I like simple things.

henk.

Tianran Zhou wrote on 2020-07-16 02:41:
> "Adding a new concept, with very little benefit, hurts the protocol in
> the long run. The ability to abstract an area, and not also a zone, is
> strong enough to be worthwhile, imho."

Your conclusion here seems very subjective. What's the criterion to evaluate the benefit? What I see the TTZ does have benefit. I am also wondering how it hurts the protocol in the long run?

Tianran
Re: [Lsr] Request WG adoption of TTZ
Huaimo Chen wrote on 2020-07-14 06:09:
> 2). IS-IS TTZ abstracts a zone to a single node. A zone is any target block or piece of an IS-IS area, which is to be abstracted. This seems more flexible and convenient to users.

I don't agree that this convenience is really beneficial. I actually think this convenience is a downside.

Link-state protocols are not easy to understand. And we already have the misfortune that IS-IS and OSPF use different names for things. Adding the new concept of a "zone", while we already have the concept of an area, only makes things more complex.

How often will this new flexibility be used in the real world ? I still haven't seen an answer to Christian Hopps's simple question: "Has RFC8099 been deployed by anyone ?" Anyone who has an answer ?

My favorite rule of RFC1925 is rule 12: In protocol design, perfection has been reached not when there is nothing left to add, but when there is nothing left to take away.

Adding a new concept, with very little benefit, hurts the protocol in the long run. The ability to abstract an area, and not also a zone, is strong enough to be worthwhile, imho.

henk.
Re: [Lsr] WG adoption call for draft-li-lsr-isis-area-proxy-06
> It’s very clear that this is inadequate.

It's not so clear to me, sorry. Does anyone have an example (link or jpg) of a (sensible) topology that would not work with multiple levels of hierarchy, but works nicely/better with area-proxies (or FRs) ? Just curious.

> The structure of legacy IS-IS areas effectively precludes a scalable network for using lower levels for transit. This constrains ISPs to ‘cauliflower’ topology where you have L1 on the outside, L2 just inside of L1, L3 inside of L2, etc.

I understand. L1-8 forces a hierarchical network design. But even if one would have the tools to design a non-hierarchical network, that doesn't mean one should do so. :-)

> We already see networks who are unwilling to use the two levels that we have today due to this constraint.

I think L1-8 levels would be a good starting principle for designing large networks. If there are spots in the network where the hierarchical constraints are a problem in the real world, indeed it would be nice to have tools like area-proxies in the tool-set, to help solve those problems. I would like to have both tools. I think you do too (as you are author of both drafts).

henk.
Re: [Lsr] WG adoption call for draft-li-lsr-isis-area-proxy-06
I support the area-proxy draft.

I think both the area-proxy draft and the flood-reflection drafts are a bit hacky. However, the result of the area-proxy draft has a certain elegance: only one L2 LSP per area in the backbone.

The flood-reflection draft is just messy, imho.
1) The edge-routers of each area are still visible in L2. Making the L2 scaling benefits a factor less, compared to area-proxy.
2) You need a new tunneling technology to flood LSPs between edge-routers and the flood-reflector. Even when you simply use TCP, this adds new and unnecessary complexity.
3) You either need to tunnel user-traffic between edge-routers. Or you redistribute all L2 prefixes into L1. Negating the benefit of having L1-only routers in your transit area. I don't like either option.

BTW, personally I think the proper solution to scale IS-IS to larger networks is 8 levels of hierarchy. Too bad that idea gets so little push from vendors and operators.

henk.

Christian Hopps wrote on 2020-06-10 21:27:
> This begins a 2 week WG adoption call for the following draft:
> https://datatracker.ietf.org/doc/draft-li-lsr-isis-area-proxy/
>
> The draft would be adopted on the Experimental track.
>
> Please indicate your support or objection by June 24, 2020.
>
> Authors, please respond to the list indicating whether you are aware of any IPR that applies to this draft.
>
> Thanks,
> Chris and Acee.
Re: [Lsr] Why only a congestion-avoidance algorithm on the sender isn't enough
Mitchel wrote:
> IS-IS has two levels of neighbors via hello level 1s (LSAs) and hello level 2s :, so immediate is somewhat relative..

As Tony said, Level-2 neighbors are still directly adjacent. There might be layer-2 switches between them. But there are never layer-3 routers between 2 adjacent level-2 neighbors.

Les's point is that interfaces, linecards, and the interface between the data-plane and the control-plane can all be seen as points between 2 ISIS instances/processes on two different routers, where ISIS messages might be dropped. And that therefore you need congestion-control (instead of, or in addition to) receiver-side flow-control.

> Sorry, I disagree, Link capacity is always an issue..

Note, we're not trying to find the maximum number of LSPs we can transmit. We just want to improve the speed a bit. From 33 LSPs/sec today to 10k LSPs/sec or something in that order. There's no need to send 10 million LSPs/sec.

Suppose the average LSP is 500 bytes. Suppose a router sends 10k LSPs per second. I think if ISISes can send 10k LSPs/sec, we've solved the problem for 99.99% of networks. 10k LSPs of 500 bytes each is 5 000 000 bytes, which is 40 000 000 bits. So a continuous stream of 10k LSPs/sec takes 40 Mbps to transmit. For LSP-flooding, bandwidth itself is never the problem.

henk.
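The bandwidth arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name is made up, and it counts LSP payload bytes only, so link-layer framing and any other per-packet overhead are deliberately ignored.

```python
def flooding_bandwidth_bps(lsps_per_second: int, avg_lsp_bytes: int) -> int:
    """Bits per second needed to flood LSPs at a steady rate (payload only)."""
    return lsps_per_second * avg_lsp_bytes * 8


# The target rate from the message: 10k LSPs/sec at 500 bytes per LSP.
target = flooding_bandwidth_bps(10_000, 500)
print(f"{target / 1_000_000:.0f} Mbps")  # prints: 40 Mbps

# The same average LSP size at the 33 LSPs/sec mentioned as today's rate.
today = flooding_bandwidth_bps(33, 500)
print(f"{today / 1_000:.0f} kbps")  # prints: 132 kbps
```

Both figures are small next to the capacity of typical backbone links, which is the point being made about bandwidth not being the bottleneck.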
Re: [Lsr] Why only a congestion-avoidance algorithm on the sender isn't enough
On Friday I wrote:
> I still think we'll end up re-implementing a new (and weaker) TCP.

Christian Hopps wrote 2020-05-04 01:27:
> Let's not be too cynical at the start though! :)

I wasn't trying to be cynical. Let me explain my line of reasoning two years ago.

When reading about the proposals for limiting the flooding topology in IS-IS, I read a requirement doc. It said that the goal was to support areas (flooding domains) of 10k routers. Or maybe even 100k routers. My immediate thought was: "how are you gonna sync the LSDB when a router boots up ? That takes 300 to 3000 seconds !?". This is the problem I wanted to solve. I hadn't even thought of routers in dense topologies that have 1k+ neighbors.

There are currently heathens that use BGP as IGP in their data-centers. There's even a cult that is developing a new IGP on top of BGP (LSVR). If they think BGP/BGP-LS/LSVR are good choices for an IGP, why is that ? One reason is that people claim that BGP is more scalable. Note, when doing "Internet-routing" with large number of prefixes, routers, or some implementations of BGP, still sometimes need minutes, or dozens of minutes to process and re-advertise all those prefixes. So when we talk about minutes, why do people think BGP is so much more wonderful ?

I think it's TCP. TCP can transport lots of info quickly and efficiently. And conceptually TCP is easy to understand for the user ("you write into a socket, you read from a socket on the other box. done"). If TCP is good enough for BGP bulk-transport, it should be good enough for IS-IS bulk-transport.

If there are issues with using TCP for routing-protocols, I'm sure we've solved those by now (in our implementations). We can use those same solutions/tweaks we use for BGP's TCP in ISIS's TCP. Or am I too naive now ? BTW, all the implementations I've worked with used regular TCP. All the Open Source BGPs seem to be using the regular TCP in their kernels. Can someone explain why TCP is good for BGP but not for IS-IS ?
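The "300 to 3000 seconds" estimate above can be reproduced with a small Python sketch. The assumptions are mine, not stated in the message: each router originates exactly one LSP (no fragments), and the neighbor paces flooding at the 33 LSPs/sec figure cited elsewhere in this thread. The function name is hypothetical.

```python
def full_sync_seconds(lsp_count: int, lsps_per_second: float) -> float:
    """Time to receive a whole LSDB from one neighbor at a fixed pacing rate."""
    return lsp_count / lsps_per_second


# Assumed pacing rate of 33 LSPs/sec, one LSP per router.
for routers in (10_000, 100_000):
    seconds = full_sync_seconds(routers, 33)
    print(f"{routers} routers: about {seconds:.0f} seconds")
```

Under those assumptions a 10k-router area needs roughly 300 seconds and a 100k-router area roughly 3000 seconds, which matches the range quoted in the message.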
Almost 24 years ago, I sat on a bench in Santa Cruz discussing protocols with an engineer who had a lot more experience than I had, and still have. He was designing LDP at the time (with Yakov). LDP also uses TCP. He said "if we had to design IS-IS now, of course we'd use TCP as transport now". I never forgot that.

The goal here is not to make IS-IS transport optimal. We don't need to use maximum available bandwidth. I just happen to think we need the same 2 elements that TCP has: sender-side congestion-avoidance and receiver-side flow-control.

I hope I have explained why sender-side congestion-control in IS-IS is not enough (you don't get the feedback you need to make it work). Les and others have tried to explain why receiver-side flow-control is hard to implement (the receiving IS-IS might not know about the state of its interfaces, linecards, etc). That's why I think we need both. And when we implement both, it'll start to look like TCP. So why not use TCP itself ? Or QUIC ? Or another transport that's already implemented ?

> I'd note that our environment is a bit more controlled than the end-to-end internet environment. In IS-IS we are dealing with single link (logical) so very simple solutions (CTS/RTS, ethernet PAUSE) could be viable.

Les's argument is that it's often not so controlled.

Let me ask you one question: In your algorithm, the receiving IS-IS will send a "pause signal" when it is overrun. How does IS-IS know it is overrun ? The router is dropping IS-IS PDUs on the interface, on the linecard, on the queue between linecards and Control Plane, on the IS-IS process's input-queue. When queues are full, you can't send a message up saying "we didn't have space for an IS-IS message, but we're sending you this message that we've just dropped an IS-IS message". How do you envision this works ?

Imho receiver-side flow-control can only send a rough upper-bound on how many PDUs it can receive normally.
A solution with a "pause signal" is basically the same as a receiver-side flow-control, where the receive-window is either 0 or infinite.

> Thus our choice of algorithms may well be less restricted.

I'm looking forward to seeing (an outline of) your algorithm.

Again, I'm not pushing for TCP (anymore). I'm not pushing for anything. I'm just trying to explain the problems that I see with solutions that are, imho, a bit too simple to really help. Maybe I'm wrong, and the problem is simpler than I think. Experimentation would be nice.

henk.
[Lsr] Why only a congestion-avoidance algorithm on the sender isn't enough
Hello all,

Two years ago, Gunter Van de Velde and I published this draft:
https://tools.ietf.org/html/draft-hsmit-lsr-isis-flooding-over-tcp-00
That started this discussion about flow/congestion control and ISIS flooding.

My thoughts were that once we start implementing new algorithms to optimize ISIS flooding speed, we'll end up with our own version of TCP. I think most people here have a good general understanding of TCP. But if not, this is a good overview how TCP does it:
https://en.wikipedia.org/wiki/TCP_congestion_control

What does TCP do:

TCP does 2 things: flow control and congestion control.
1) Flow control is: the receiver trying to prevent itself from being overloaded. The receiver indicates, through the receiver-window-size in the TCP acks, how much data it can or wants to receive.
2) Congestion control is: the sender trying to prevent the links between sender and receiver from being overloaded. The sender makes an educated guess at what speed it can send.

The part we seem to be missing:

For the sender to make a guess at what speed it can send, it looks at how the transmission is behaving. Are there drops ? What is the RTT ? Do drop-percentage and RTT change ? Do acks come in at the same rate as the sender sends segments ? Are there duplicate acks ? To be able to do this, the sender must know what to expect. How acks behave.

If you want an ISIS sender to make a guess at what speed it can send, without changing the protocol, the only thing the sender can do is look at the PSNPs that come back from the receiver. But the RTT of PSNPs can not be predicted. Because a good ISIS implementation does not immediately send a PSNP when it receives a LSP. 1) the receiver should jitter the PSNP, like it should jitter all packets. And 2) the receiver should wait a little to see if it can combine multiple acks into a single PSNP packet.
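For readers less familiar with how a TCP sender combines the two mechanisms described above, here is a minimal Python sketch. It shows only the standard rule that the amount of unacknowledged data is bounded by the smaller of the congestion window and the receiver's advertised window; the function and parameter names are illustrative, and nothing here is a proposal for IS-IS itself.

```python
def usable_window(cwnd: int, rwnd: int, in_flight: int) -> int:
    """Bytes a sender may still transmit right now.

    cwnd:      congestion window, the sender's own estimate (congestion control)
    rwnd:      window advertised by the receiver in its acks (flow control)
    in_flight: bytes sent but not yet acknowledged
    """
    return max(0, min(cwnd, rwnd) - in_flight)


# The receiver is the bottleneck: it advertises less than the sender would try.
print(usable_window(cwnd=10_000, rwnd=4_000, in_flight=1_000))  # prints: 3000

# The path is the bottleneck: the congestion window is already full.
print(usable_window(cwnd=2_000, rwnd=8_000, in_flight=2_000))  # prints: 0
```

A receive window of zero stops the sender completely, whatever the congestion window says.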
In TCP, if a single segment gets lost, each new segment will cause the receiver to send an ack with the seqnr of the last byte it received in order. These are called "duplicate acks". They trigger the sender to do fast-retransmission. In ISIS, this can't be done. The information a sender can get from looking at incoming PSNPs is a lot less than what TCP can learn from incoming acks.

The problem with sender-side congestion control:

In ISIS, all we know is that the default retransmit-interval is 5 seconds. And I think most implementations use that as the default. This means that the receiver of an LSP has one requirement: send a PSNP within 5 seconds. For the rest, implementations are free to send PSNPs however and whenever they want. This means a sender cannot really draw conclusions about flooding speed, dropped LSPs, capacity of the receiver, etc. There is no ordering when flooding LSPs, or sending PSNPs. This makes a sender-side algorithm for ISIS a lot harder.

When you think about it, you realize that a sender should wait the full 5 seconds before it can draw any real conclusions about dropped LSPs. If a sender looks at PSNPs to determine its flooding speed, it will probably not be able to react without a delay of a few seconds. A sender might send hundreds or thousands of LSPs in those 5 seconds, which might all or partially be dropped, complicating matters even further.

A sender-side algorithm should specify how to do PSNPs.

So imho a sender-side only algorithm can't work just like that in a multi-vendor environment. We must not only specify a congestion-control algorithm for the sender. We must also specify for the receiver a more specific algorithm for how and when to send PSNPs. At least how to do PSNPs under load. Note that this might result in the receiver sending more (and smaller) PSNPs. More packets might mean more congestion (inside routers).

Will receiver-side flow-control work ?

I don't know if that's enough. It will certainly help.
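To put a number on the "hundreds or thousands" of LSPs a sender can emit before the retransmit interval tells it anything about loss, here is a small sketch. The 5-second default comes from the mail above; the send rates used in the usage lines are hypothetical.

```python
RETRANSMIT_INTERVAL_S = 5.0  # default retransmit interval quoted in the mail


def lsps_sent_before_loss_is_certain(send_rate_lsps_per_s: float,
                                     wait_s: float = RETRANSMIT_INTERVAL_S) -> int:
    """LSPs transmitted while the sender is still waiting to learn about a drop."""
    return int(send_rate_lsps_per_s * wait_s)


# Hypothetical send rates, for illustration only.
print(lsps_sent_before_loss_is_certain(100))    # 500
print(lsps_sent_before_loss_is_certain(1000))   # 5000
```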
I think to tackle this problem, we need 3 parts:
1) sender-side congestion-control algorithm
2) more detailed algorithm on receiver when and how to send PSNPs
3) receiver-side flow-control mechanism

As discussed at length, I don't know if the ISIS process on the receiving router can actually know if it's running out of resources (buffers on interfaces, linecards, etc). That's implementation dependent. A receiver can definitely advertise a fixed value. So the sender has an upper bound to use when doing congestion-control. Just like TCP has both a flow-control window and a congestion-control window, and a sender uses both. Maybe the receiver can even advertise a dynamic value. Maybe now, maybe only in the future. An advertised upper limit seems useful to me today.

What I didn't like about our own proposal (flooding over TCP):

The problem I saw with flooding over TCP concerns multi-point networks (LANs). When flooding over a multi-point network, setting up TCP connections introduces serious challenges. Who are the endpoints of the TCP connections ? Full mesh ? Or
Re: [Lsr] Dynamic flow control for flooding
Hello Les,

Thanks for taking the time to respond.

[Les:] Base specification defines partialSNPInterval (2 seconds). Clearly with faster flooding we should look at decreasing this timer - but we certainly should not do away with it.

That was the point I was trying to make: You kept mentioning that your "tx based flow control" only needed changes to the internal implementation of the LSP-sender. That's not the case. Your algorithm also depends on the behaviour of the LSP-receiver. I did not see that mentioned anywhere before. Good to see that you (and Tony) now acknowledge this necessity.

I hope you also realize (and agree) that changing the algorithm to send PSNPs on the LSP-receiver, in a way to improve the flow-control algorithm for the LSP-sender, will probably have a negative impact on the current efficiency of bundling acks in PSNPs. And that change can multiply the number of PSNPs (and thus ISIS PDUs in input queues) that need to be received on routers.

If you don't like the name we can certainly find something more appealing.

I don't care much about the name. (In general I do care about naming in programming. And even 10x more about naming in protocol documents. But that's not important in the discussion at the moment). The point I was trying to get across is that your proposal is not something that happens internally on a single individual router. It is an algorithm that involves 2 routers. And thus it is a protocol issue.

What I am proposing does not require protocol extensions - therefore no draft is required.

Protocols do not only describe octets on the wire. They also describe behaviour. Thus, as Tony has already said, your proposed algorithm also needs to be documented. In an RFC probably.

Whether a BCP draft is desired is something I am open to considering.

I don't know much about process in the IETF. But I was always under the assumption that BCPs were mostly network design/configuration recommendations for network operators.
From an earlier email:

[Les:] I think you know what I am about to say.. :)

Yes, my question of why use exponential backoffs was a rhetorical question (as I wrote at the end of my email). I wrote: I hope it is clear to everyone that these are not serious questions. I'm just saying: "sometimes fast is slow".

FYI, few people probably know this, but I happen to be the guy that initially came up with the idea of exponential backoffs in IGPs. (Back in 1999 when I was at cisco).

Anyway, to reiterate my point: "sometimes fast is slow". It seems we now all agree that sending LSPs "rapidly" and then assuming retransmissions will fix any problems, is an approach that is way too naive. Good.

henk.
Re: [Lsr] Dynamic flow control for flooding
Les Ginsberg (ginsberg) wrote on 2019-07-23 22:29:

It is a mistake to equate LSP flooding with a set of independent P2P “connections” - each of which can operate at a rate independent of the other.

Of course, if some routers are slow, convergence in parts of the network might be slow. But as Stephane has already suggested, it is up to the operators to determine if slower convergence in parts of their network is acceptable. E.g. they can choose to put fast/expensive/new routers in the center of their network, and move older routers to, or buy cheaper routers for, the edges of their network.

But I have a question for you, Les: During your talk, or maybe in older emails, I was under the impression that you wanted to warn about another problem. Namely microloops. I am not sure I understand correctly. So let me try to explain what I understood. And please correct me if I am wrong.

Between the time a link breaks, and the first router(s) start to repair the network, some traffic is dropped. Bad for that traffic of course. But the network keeps functioning. Once all routers have re-converged and adjusted their FIBs, everything is fine again. In the time in between, between the first router adjusting its FIB and the last router adjusting its FIB, you can have microloops. Microloops multiply traffic. Which can cause the whole network to suffer from congestion. Impacting traffic that did not (originally) go over the broken link.

So you want the transition from "wrong FIBs", that point over the broken path, to "the final FIBs", where all traffic flows correctly, to happen on all routers at once. That would make the network go from "drop some traffic" to "forward over the new path" without a stage of "some microloops" in between.

Am I correct ? Is this what you try to prevent ? Is this why you want all flooding between routers to go at the same speed ?

Thanks in advance,

henk.
Re: [Lsr] Dynamic flow control for flooding
Hello Robert,

Tony brought up the example of a partitioned network. But there are more examples. E.g. in a network there is a router with 1000 neighbors. (When discussing distributed vs centralized flooding-topology reduction algorithms, I've been told these network designs exist). When such a router reboots/crashes/comes back up, all 1000 neighbors will create a new version of their own LSP. This causes 1000 different LSPs to be flooded through the network at the same time. Impacting every router in the network.

The case I was thinking of myself, was when a router in a large network boots. When it brings up a number of adjacencies, each neighbor will try to synchronize its LSPDB with the newly booted router. As the newly booted router will send empty CSNPs to each of its neighbors, each neighbor will start sending the full LSPDB. If such a network has 10k LSPs, and such a router has 100 neighbors, that router will receive 100 * 10k = 1 million LSPs.

Having a faster and more efficient flooding transport, with flow-control, will make a reboot in such a topology less painful.

(In that last case, creative use of the overload-bit could prevent black-holing or microloops while ISIS synchronizes its LSPDB after a reboot. Just like we used the overload-bit to solve the problem of slow convergence of BGP after a reboot, 22 years ago. I have no idea if there are any implementations that use the overload-bit to alleviate slow convergence of IS-IS after a reboot).

henk.

Robert Raszuk wrote on 2019-07-24 15:33:

Hey Henk & all,

If acks for 1000 LSPs take 16 PSNPs (max 66 per PSNP), or even if, as Tony mentioned, the full flooding may take 33 sec - is this really a problem ? Remember we are not talking about protocol convergence after link flap or node going down. We are talking about serious network partitioning which itself may have lasted for minutes, hours or days.
While just considering absolute numbers yields a desire to go faster and faster, if we put things in the overall perspective is there really a problem to be solved in the first place ? Would there still be a problem if LSR WG recommends faster acking, maybe not for each LSP but for say 20 or 30 max ?

Thx, R.
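The figures traded in this exchange can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch; the helper names are made up for illustration, and the inputs are the numbers quoted in the two mails above.

```python
import math


def psnps_needed(lsps: int, acks_per_psnp: int) -> int:
    """PSNPs required to acknowledge `lsps` LSPs when acks are bundled."""
    return math.ceil(lsps / acks_per_psnp)


def lsps_received_on_boot(lsdb_size: int, neighbors: int) -> int:
    """LSPs a freshly booted router receives if every neighbor sends its full LSPDB."""
    return lsdb_size * neighbors


print(psnps_needed(1000, 66))              # 16
print(lsps_received_on_boot(10_000, 100))  # 1000000
```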
Re: [Lsr] Dynamic flow control for flooding
Hello Les,

Les Ginsberg (ginsberg) wrote on 2019-07-24 07:17:

If you accept that, then it makes sense to look for the simplest way to do flow control and that is decidedly not from the RX side. (I expect Tony Li to disagree with that 😊 - but I have already outlined why it is more complex to do it from the Rx side.)

In your talk on Monday you called the idea in draft-decraene-lsr-isis-flooding-speed-01 "receiver driven flow control". You don't like that. You want "transmit based flow control". You argued that you can do "transmit based flow control" on the sender only. Therefore your algorithm is merely a "local trick". And "local tricks" don't need RFCs. I agree with that. But I don't agree that your algorithm is just a "local trick".

In your algorithm, a "sender" sends a number of LSPs to a receiver. Without waiting for acks (PSNPs). Like in any sliding window protocol. The sending router keeps an eye on the number of unacked LSPs. And it determines how fast it can send more LSPs based on the current number of unacked LSPs. Every time the sender receives a PSNP, it knows the receiver got a number of LSPs, so it can increase its send-window again, and then send more LSPs. Correct ?

I agree that the core idea of this algorithm makes sense. After all, it looks a lot like TCP. I believe the authors of draft-decraene-lsr-isis-flooding-speed were planning something like that for the next version of their draft.

However, I do not agree with the name "tx driven flow control". I also do not agree that this algorithm is "a local trick". Therefore I also do not agree that this algorithm doesn't need to be documented (in an RFC).

In your "tx based flow control", the sender (tx) sends LSPs at a rate that is derived from the rate at which it receives PSNPs. Therefore it is the sender of the PSNPs that sets the speed of transmission ! So it is still the receiver (of LSPs) that controls the flow control. The name "tx based flow control" is a little misleading, imho.
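The sliding-window behaviour described above can be sketched as follows. This is a simplified illustration, not the algorithm from any draft or implementation: the class name, the fixed window size and the use of plain LSP-ID strings are all assumptions. It does show the point being argued, namely that new LSPs only leave the sender when a PSNP arrives, so the receiver's PSNP behaviour sets the pace.

```python
from collections import deque


class WindowedLspSender:
    """Toy sender that never has more than `window` unacknowledged LSPs."""

    def __init__(self, window):
        self.window = window      # max LSPs in flight (hypothetical fixed value)
        self.unacked = set()      # LSP IDs sent but not yet acked by a PSNP
        self.queue = deque()      # LSP IDs waiting to be flooded

    def enqueue(self, lsp_id):
        self.queue.append(lsp_id)

    def pump(self):
        """Send queued LSPs until the window is full; return what was sent."""
        sent = []
        while self.queue and len(self.unacked) < self.window:
            lsp_id = self.queue.popleft()
            self.unacked.add(lsp_id)
            sent.append(lsp_id)
        return sent

    def on_psnp(self, acked_ids):
        """A PSNP acks some LSPs, which reopens the window."""
        self.unacked.difference_update(acked_ids)
        return self.pump()
```

With a window of 2 and four queued LSPs, `pump()` sends the first two and then stalls; each later `on_psnp()` call releases as many LSPs as it acknowledged.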
It is important to realize that the success of your algorithm actually depends on the behaviour of the receiver. How does it send PSNPs ? Does it send one PSNP per received LSP ? Or does it pack multiple acks in one PSNP ? Does it send a PSNP immediately, or does it wait a short time ? Does it try to fill a PSNP to the max (putting ~90 acks in one PSNP) ? Does the receiver do something in between ? I don't think the behaviour is specified exactly anywhere.

I know about an IS-IS implementation from the nineties. When a router received an LSP, it would a) set the SSN bit (for that LSP/interface), and b) start the psnp-timer for that interface (if not already running). The psnp-timer would expire 2 seconds later. The router would then walk the LSPDB, find all LSPs with the SSN-bit set for that interface. And then build a PSNP with acks for all those LSPs. The result would be that: a) the first PSNP would be sent 2 seconds (+/- jitter) after receiving the first LSP, and b) the PSNP would include ~66 acks. (As a router receiving at full speed would have received 66 LSPs in 2 seconds).

For your "tx based flow control" algorithm to work properly, this has to change. The receiving router must send PSNPs more quickly and more aggressively. The result would be that there will be fewer acks in each PSNP. And thus more PSNPs will be sent.

This makes us realize: in the current situation, if a router receives 1000 LSPs, and sends those LSPs to 64 neighbors, it would receive:
- the 1000 LSPs from an upstream neighbor, plus
- 1000/66 = 16 (rounded up) PSNPs from each downstream neighbor = 64 * 16 = 1024 PSNPs.
This makes a total of ~2000 PDUs received.

If routers would send one PSNP per LSP (to have faster flow control), then the router in this example would receive:
- the 1000 LSPs from an upstream neighbor, plus
- 1000 PSNPs from each downstream neighbor = 64 * 1000 = 64000 PSNPs.
This makes a total of ~65000 PDUs received.

The total number of PDUs received on this router would go from 2K PDUs to 65K PDUs. Remember that the problem we're trying to solve here is to make sure that routers do not get overrun on the receipt side with too many packets too quickly. It seems an aggressive PSNP-scheme, to achieve faster flow-control, is actually very counter-productive.

Of course the algorithm can be tweaked. E.g. TCP sends one ack per every 2 received segments (if I'm not mistaken). If we do that here, the number of PDUs would go down from 65K to 33K PDUs.

What do you propose ? How do you want the feedback of PSNPs to be quick, while maintaining an efficient packing of multiple acks per PSNP ?

In any case, the points I'm trying to make here:
*) Your algorithm is not sender-driven, but still receiver-driven.
*) Your algorithm changes/dictates behaviour both on sender and receiver.
*) Interaction between a sender and a receiver is what we call a protocol.

If you want to make this work, especially in multi-vendor environments, we need to document these algorith
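The PDU counts for the example router can be checked with a short calculation. The sketch below assumes the scenario stated in the mail (1000 LSPs arriving from one upstream neighbor and being flooded to 64 downstream neighbors); the function name is made up for illustration.

```python
import math


def pdus_received(lsps: int, downstream_neighbors: int, acks_per_psnp: int) -> int:
    """LSPs from upstream plus the PSNPs that every downstream neighbor sends back."""
    psnps_per_neighbor = math.ceil(lsps / acks_per_psnp)
    return lsps + downstream_neighbors * psnps_per_neighbor


print(pdus_received(1000, 64, 66))  # 2024  (acks bundled, ~66 per PSNP)
print(pdus_received(1000, 64, 1))   # 65000 (one PSNP per LSP)
print(pdus_received(1000, 64, 2))   # 33000 (one PSNP per two LSPs)
```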
Re: [Lsr] IS-IS over TCP
Les Ginsberg (ginsberg) wrote 2018-11-07 17:06: The problem that RFC6213 tries to solve is a case where one of the neighbors is thinking that the other does not support BFD. And thus the lack of BFD is not used as an indication that something is wrong. Right ? [Les:] This is not correct. The key paragraph is in https://tools.ietf.org/html/rfc6213#section-2 " The problem with this solution is that it assumes that the transmission and receipt of IS-IS Hellos (IIHs) shares fate with forwarded data packets. This is not a fair assumption to make given that the primary use of BFD is to protect IPv4 (and IPv6) forwarding, and IS-IS does not utilize IPv4 or IPv6 for sending or receiving its hellos." We have seen cases where IPv4/IPv6 data packet delivery has been compromised - but IS-IS PDU delivery was unaffected. This led to the following behavior: 1)IS-IS exchanges hellos and bring adjacency up. Routes using the link are installed 2)BFD session is started and comes up 3)After a time some problem occurs which only impact IP traffic - BFD session goes down. 4)IS-IS adjacency is brought down due to BFD session down, but IS-IS continues to send hellos. If they are successfully exchanged then IS-IS adjacency is almost immediately restored and we resume installing IP routes using the link even though BFD session never comes up. Data traffic gets dropped. Thanks for the explanation. But that is kinda what I wanted to say. BFD is failing (because the IP-path is failing), and IS-IS doesn't realize this, because it thinks that BFD isn't being used (because "BFD session never comes up". The extensions in RFC 6213 allow IS-IS to know when both sides support BFD, which means the sequence changes to: 1)IS-IS exchanges hellos - but adjacency remains in INIT state. 2)BFD session is initiated - IS-IS adjacency remains in INIT state until BFD session comes up. Thus IS-IS never installs routes using the link unless we know IP traffic can be successfully delivered. 
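The change in sequence that Les describes can be condensed into one decision function. This is a deliberately simplified sketch of the rule as stated in the mail, not of the full RFC 6213 procedure (which works per topology and per protocol); the names and the three-state model are assumptions made for illustration.

```python
from enum import Enum


class AdjState(Enum):
    DOWN = "down"
    INIT = "init"
    UP = "up"


def adjacency_state(hellos_exchanged: bool,
                    both_sides_advertise_bfd: bool,
                    bfd_session_up: bool) -> AdjState:
    """Hold the adjacency in INIT until BFD is up, if both neighbors signal BFD support."""
    if not hellos_exchanged:
        return AdjState.DOWN
    if both_sides_advertise_bfd and not bfd_session_up:
        return AdjState.INIT   # no routes get installed over this link yet
    return AdjState.UP
```

In the failure case from the mail (hellos still flow, BFD never comes up) the function keeps returning INIT, so the link is never used for forwarding.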
We can then use BFD both as a requirement to bring the adjacency up AND as fast failure detection.

OK. What do you suggest we do to fix this failure case if we do flooding over TCP ?
- We could make BFD mandatory when flooding over TCP ? If the IP-path is broken, TCP will fail, but BFD will also fail ?
- We bring the adjacency to UP state, but we don't include it in our own LSP immediately. Only after the TCP session has been established, we advertise the new adjacency in our LSP. Would that be enough ? It would stop routes from being calculated over the new adjacency. Maybe wait until the TCP connection has been set up, and a pair of IIHs has been exchanged over it ? (So authentication and other stuff can be verified for the TCP session). Or maybe even wait until IIHs have been sent, and then full sets of CSNPs are exchanged in both directions ? That last suggestion starts approaching the way OSPF does this. If I recall correctly, OSPF will only include adjacencies in its type-1 LSA after DDs have been exchanged, and the full LSDB has been synchronized. Would you want IS-IS to do the same ?
- If the TCP session breaks, do you want to stop including the adjacency in the LSP ? This will make things like NSF, process restart and control plane failover much harder.

What if two routers can exchange IIHs and do proper flooding of LSPs. But they can not exchange IP packets ? This could happen. IS-IS does not have a way to deal with this.

[Les:] RFC 6213 was written precisely to address this case - and works very well.

The fact that BFD is working does not mean it is 100% sure that all IP traffic will work. Failures might depend on protocol number, port numbers, packet size, etc. I agree that it is likely that if BFD works, all of IP will work. But it's no guarantee. Likewise we have to decide how paranoid we want to be: if IIHs are exchanged, how sure are we that TCP can exchange LSPs as well ?
Maybe a good compromise would be:
1) don't advertise the adjacency in your LSPs until the TCP flooding connection has been established. (And maybe IIHs/CSNPs are exchanged).
2) after the connection is fully up (IIHs and flooding works), use longer time-outs to determine whether TCP is still working.
3) when the other side closes a TCP connection (by FIN or RST), don't stop advertising the adjacency in your LSP immediately. Instead, for the next 10 seconds or so, try to re-establish the TCP connection first. If re-establishment doesn't work, then the router can stop including the adjacency in its LSP.

This would prevent routers advertising new adjacencies that have a problem with TCP. But if it works, and suddenly stops working, convergence is slower (10 seconds or so). But the protocol has the ability to re-establish the TCP session, to make it more flexible. Would that be acceptable ?

[Les:] Actually they have. :-) That's why we wrote RFC 6213 - because the problem has been seen in the field.

I was talking about a router fo
Re: [Lsr] IS-IS over TCP
Jeffrey Haas wrote 2018-11-07 20:56:

I guess my question to those who live in IGP land is how often is this a problem? In the case of an IGP, the backpressure means you have databases that are out of sync and end up with bad forwarding.

As discussed below, if you have multiple flooding paths, and not all of them are congested or throttled, when at least one copy of the LSP makes it across, convergence will be fast.

Both iBGP and eBGP. The two general issues are slow receivers (scale) or responses to dropped packets.

Slow receivers are a problem for native flooding too. Although I suspect you mean: after congestion problems, and the situation improves, native flooding will recover quickly, while TCP might intentionally keep things slow for a longer period of time. Correct ? Yes, that is an issue.

The general experiment I recommend to people trying to do this sort of thing is take your TCP stream of choice, pace it according to your transmission needs, and then drop 5-30% of the packets. Observe what happens.

Don't we have DSCP for this ? And remember, the TCP connection will (almost) always be between two directly connected endpoints.

TCP recovers fine. But the hiccups can do bad things when timeliness is expected. For example, 3 second hold times for aggressive BGP peering may time out.

Our proposal is to do only flooding over TCP. Adjacency management is still done based on native IIHs (and BFD). Even if TCP stalls the flooding, the adjacency should stay up. With flooding over multiple paths, it should not be a fatal event.

I guess I'd restate my concern as "for this application, ensure that you're okay with the results of stalled transmission". Effectively, see the answer to the question I asked above about native behaviors.

[Flooding happens over multiple paths. As long as one path is quick, convergence will be quick too].

This, I think, is a better point addressing my concerns. Thanks.

I expect there will be more issues that need to be addressed. E.g.
an old rule of thumb is: don't generate a routing update (packet) unless you are pretty sure you can send it right away, and the receiver can receive it. Otherwise a lot of the actual communication might be stale information. I'm not sure if people find this rule of thumb still relevant these days. (I know people who do not). With an abundance of cpu-cores, memory and bandwidth, it seems many problems of the past are not visible anymore. Unless you start pushing beyond what most people do. But if you do care, it is advisable to keep your TCP window-sizes small. Maybe at the default 16KB, maybe even 8KB or 4KB. With a window-size of 4KB you might be able to still send a dozen average-size LSPs, and those might get stuck/stale in TCP. But I think that's a good trade-off to get syncing of large LSDBs. As long as you don't set the window-size to 64KB or larger. And maybe even then, stale LSPs might be less of a problem than old-timers think. henk. ___ Lsr mailing list Lsr@ietf.org https://www.ietf.org/mailman/listinfo/lsr
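The trade-off between window size and stale LSPs can be estimated with one division. The sketch below is an illustration only: the average LSP size of 340 bytes is a hypothetical value, picked so that a 4KB window holds about a dozen LSPs as the mail suggests.

```python
AVG_LSP_BYTES = 340  # hypothetical average LSP size, not a measured value


def lsps_buffered_in_window(window_bytes: int, avg_lsp_bytes: int = AVG_LSP_BYTES) -> int:
    """Upper bound on whole LSPs that can sit unacknowledged inside one TCP window."""
    return window_bytes // avg_lsp_bytes


for window_kb in (4, 8, 16, 64):
    print(window_kb, "KB ->", lsps_buffered_in_window(window_kb * 1024), "LSPs")
```

The number of LSPs that can get stuck grows linearly with the window, which is why a small window limits how much stale information TCP can hold.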
Re: [Lsr] IS-IS over TCP
Hi Jeff,

Jeffrey Haas wrote on 2018-11-06 05:20:

I'm ambivalent of the transport, but agree that TCP shouldn't be the default answer.

I picked TCP because every router has a working TCP implementation. And TCP is good enough for BGP. And thus also considered good enough for LSVR. If that's the case, I'd assume it is good enough for IS-IS as well.

It's easy to adjust our draft so that new transports can be introduced over time. We can do TCP now. Add QUIC later. And add other, new, better transports later, when they become available.

I don't know much about QUIC. But it seems the protocol and details are not 100% stable yet. Maybe soon. So maybe we can do TCP now, and QUIC later ? Also, QUIC might be easy to implement for router OSes that run on top of Unix. But for OSes that use QNX, VxWorks, or something proprietary, QUIC might be more work. (It's up to others to decide if that's important or not. I have no opinion on this matter myself).

My concerns that I tried raising via jabber summarize roughly as follows:
- TCP is prone to interesting backpressure issues, typically as a result of packet loss or slow receivers.

If a receiver is slow, that's the same situation as when IS-IS on the neighbor is slow. Retransmissions happen. Retransmissions with 10589 flooding are fixed time (5 seconds). (I guess some modern implementations do something smarter). So convergence would have been impacted with native flooding too.

Note that our proposal only does TCP over directly connected routers. I'm sorry to say I have little experience with behaviour of BGP in real networks. Where did you observe these backpressure issues ? EBGP or iBGP ? I expect to see more problems with iBGP, because iBGP goes over multiple hops, which can cause all kinds of issues. EBGP is mostly over directly connected interfaces. I expect TCP to behave much nicer there. TCP behaviour of IS-IS flooding would resemble eBGP more than iBGP. At least, that is my expectation.

Note.
If you do flooding over a tunnel, flooding over TCP might be beneficial too. Because of tunnel overhead (e.g. GRE-headers) tunnels usually have a smaller MTU. Therefore all max-sized LSPs will need to be fragmented when sent over the tunnel. This is also (especially) true for CSNPs. For packet-based tunneling-protocols, that means 2 packets for each max-sized LSP or CSNP. When using TCP, the LSPs get spread out over multiple segments, which should make reassembly a bit easier/cheaper.

- TCP timers can react poorly in some environments where you may want time sensitive things. This includes something as long as 3 second BGP hold timers.

When you do flooding-over-tcp, then you don't need to send PSNPs (acks) or do retransmission of LSPs. So you don't need timers for those. Things become less time-sensitive (at the cost of potentially slower flooding).

- IGPs have a lot of interesting timer hacks to try to ensure that a given domain has a consistent database prior to running an SPF. In the face of "stuck" flooding due to backpressure or other things, some of these may need to be revisited.

Again, it is my expectation that in case of problems with TCP, that same situation would have been worse with native ISIS flooding.

Also note that in BGP, every update packet over every peering has significance. If one gets delayed, it slows down overall convergence. In ISIS flooding, a router will receive multiple copies of the same LSP. So if one TCP-connection is slow, the router might still receive the same LSPs over other paths. And the impact on overall convergence is likely to be less.

Of course, this implies that routers flood over more than 1 or 2 interfaces. If we do one of the flooding-reduction proposals, I hope we'll end up with a situation where we have 3 or 4 redundant flooding topologies, so that routers will still receive LSPs quickly over other topologies when the primary topology has problems.

It's been over a year since I looked at QUIC.
I agree with Tony that a number of the properties it had on my last read are desirable. I'd suggest that its behavior (especially timers) in the event of packet loss should be looked at, based on the comments above.

One other benefit of doing flooding over TCP is that part of the flooding administration is now outsourced to TCP. And TCP usually runs in another thread or process, inside or outside the kernel. This means that we'll automatically get a light form of multi-threading. Less work for the IS-IS process/threads.

QUIC runs in user-space. I don't know if that means it is a library, and functions run in the user's thread/process. Or whether QUIC is a separate process/thread. If QUIC runs in IS-IS's thread, it means we lose a cheap form of performance improvement from multi-threading.

henk.
Re: [Lsr] IS-IS over TCP
ith situations where single routers have 1k adjacencies. If we want to improve the protocol, I think we should also improve the situation where we need to flood 10k LSPs over a single adjacency. That's the problem we try to solve here.

My point here is that there are existing implementations which would get no benefit from your proposal. It might be argued that someone writing a new implementation may find it simpler to make use of a transport mechanism like TCP - but I do not think there is compelling data that demonstrates that the scalability of an implementation using your proposal is better than that of many existing implementations.

When I worked at cisco in the nineties, that IS-IS implementation could deal with 250 routers in a double full mesh. That's 500 adjacencies per router. Those routers were running 100MHz and 200MHz mips cpus. Some were even cisco 7000s with 68040 cpus. That worked. With my outdated knowledge, I cannot understand how a datacenter fabric where routers have 64 or 128 adjacencies could be a problem. Not with a proper implementation. But we're trying to fix that too.

This then suggests that for existing implementations the main motivation to support your proposal is to help other implementations which have not optimized their existing implementations. :-) Comments?

Proper IS-IS implementations should split up in 3 threads at least. One thread for maintaining adjacencies, one for doing flooding and one for doing SPFs (and route-installs). That way an SPF doesn't hold up flooding. And heavy flooding or long SPFs don't break adjacencies. How many implementations in the field do that ? Heck, I've even seen today's implementations that do not split off adjacency-maintenance as a separate process or thread.

Anyway, the question is: if you want to have 10k LSPs in your flooding domain, do we depend on custom improvements, or do we want something that's documented ?

One of the things that inspired me to do this proposal was LSVR.
LSVR uses BGP-LS to transport LSPs. Why ? The word on the street is that "BGP scales so much better". Why does BGP scale so well ? Imho there is one main reason: BGP uses TCP for transport. Now I don't like LSVR (and I don't like BGP-LS). Because LSVR seems to re-invent the wheel, with no real improvements over IS-IS, except for the fact that it uses BGP which uses TCP. If that's the main benefit of LSVR, then why not just use TCP and be done with it ?

BTW, there is another reason for our proposal. With the incoming drafts about flooding-topology-reduction, there is a new problem. All these proposals have situations where non-flooding adjacencies suddenly change to flooding adjacencies. When that happens, the LSDBs need to be synchronized again. To do that, all of them propose "just send a CSNP and be done with it". Well, the more LSPs, the more CSNPs that need to be sent. With 10k LSPs that's 110 CSNPs. CSNPs are not reliable. This re-synchronization happens when there is churn in the IGP. Are we sure CSNPs aren't dropped somewhere ? Can we start sending LSPs before we know that the neighbor has sent all its CSNPs ? With reliable transport for LSPs and SNPs these worst-case scenarios will improve.

Apologies for the long text. I hope it explains our goals and proposal a bit more.

henk.
Re: [Lsr] IS-IS over TCP
Thanks, Tony.

We picked TCP because every router on the planet already has a TCP stack in it. That made it the obvious choice.

Our draft describes a TLV in the IIHs to indicate a router's ability to use TCP for flooding. That TLV has several sub-TLVs:
1) the TCP port-number
2) an IPv4 address
3) and/or an IPv6 address

We can change the first sub-TLV so that it indicates:
1) 1 or 2 bytes indicating what protocol to use
2) the remainder of the sub-TLV is an indicator of what port-number or other identifier to use to connect over that protocol.

This way we can start improving IS-IS with TCP today, and add other protocols, or replace TCP with them, in the future.

henk.

tony...@tony.li wrote on 2018-11-06 04:51:
> Per the WG meeting, discussing on the list:
>
> This is good work and I support it.
>
> I would remind folks that TCP is NOT the only transport protocol
> available and that perhaps we should be considering QUIC while we’re
> at it. In particular, flooding is a (relatively) low bandwidth
> operation in the modern network and we could avoid slow-start issues
> by using QUIC.
>
> Tony
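To make the revised sub-TLV idea a bit more concrete, here is a minimal Python sketch of one possible encoding: a one-byte protocol indicator followed by a protocol-specific identifier (for TCP, a 2-byte port number). None of the code points come from the draft or from this thread; SUBTLV_TRANSPORT, PROTO_TCP, PROTO_QUIC and the port 5000 are made-up values for illustration only.

```python
import struct
from typing import Tuple

# All code points below are hypothetical; nothing in the thread assigns them.
SUBTLV_TRANSPORT = 1   # assumed sub-TLV type: "protocol + identifier"
PROTO_TCP = 1          # assumed protocol indicator for TCP
PROTO_QUIC = 2         # assumed protocol indicator for QUIC


def encode_transport_subtlv(proto: int, identifier: bytes) -> bytes:
    """Encode type (1 byte), length (1 byte), then protocol byte + identifier."""
    value = struct.pack("!B", proto) + identifier
    if len(value) > 255:
        raise ValueError("sub-TLV value does not fit in a 1-byte length field")
    return struct.pack("!BB", SUBTLV_TRANSPORT, len(value)) + value


def decode_transport_subtlv(data: bytes) -> Tuple[int, bytes]:
    """Return (protocol indicator, identifier bytes) from an encoded sub-TLV."""
    if len(data) < 2:
        raise ValueError("truncated sub-TLV header")
    stype, length = struct.unpack_from("!BB", data)
    if stype != SUBTLV_TRANSPORT:
        raise ValueError("unexpected sub-TLV type")
    if length < 1 or len(data) < 2 + length:
        raise ValueError("truncated sub-TLV value")
    value = data[2:2 + length]
    return value[0], value[1:]


# Example: advertise TCP on a hypothetical port 5000.
encoded = encode_transport_subtlv(PROTO_TCP, struct.pack("!H", 5000))
proto, ident = decode_transport_subtlv(encoded)
port = struct.unpack("!H", ident)[0]
```

Because the identifier is opaque to the encoder, a future protocol indicator could carry something other than a port number without changing the sub-TLV layout, which is the point of the change proposed above.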
[Lsr] A new proposal on how to do Sparse Link-State Flooding in Dense Topologies
Hello all,

Gunter Van De Velde and I have submitted a first version of our new draft regarding sparse link-state flooding in dense topologies. You can find the new draft here:
https://datatracker.ietf.org/doc/draft-hsmit-lsr-isis-dnfm/

Abstract

This document describes a technology extension to reduce link-state flooding in highly resilient dense networks. It does this by using simple and backwards-compatible extensions to reduce the number of adjacencies over which link-state flooding takes place.

"IS-IS Sparse Link-State Flooding" is an extension to the IS-IS routing protocol. It is relatively easy to understand and implement. It is backwards compatible. It requires no per-node configuration. It uses a distributed algorithm, therefore no centralized computations are required. No complex computations are required on each node in the network. The algorithm has no requirements for the network topology. It can be deployed in a redundant way to improve robustness and convergence times.

The element that distinguishes this algorithm from other proposals is the fact that it uses a new TLV in IIHs to signal suppression of flooding between two adjacent routers. A network has an "anchor", which will function as the root of the tree that forms the flooding topology. Each router will request, via its IIHs, to flood only over an adjacency to another router that is closer to the anchor.

More details are in the draft. Although the algorithm isn't overly complex, we might not have been 100% successful yet in writing down the perfect description and explanation. All suggestions, comments and questions are welcome.

Thanks in advance,

Gunter & henk.
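As a rough illustration of the anchor idea, the Python sketch below simulates the outcome on a small topology: it computes hop distances from the anchor and lets every other router keep flooding on one adjacency towards a neighbor that is closer to the anchor. This is not the algorithm from the draft. The centralized simulation, the hop-count metric, the lowest-name tie-break, and the names hop_distances and flooding_adjacencies are all assumptions made for the example; in the proposal each router decides locally and signals its choice in IIHs.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Set


def hop_distances(adj: Dict[str, List[str]], anchor: str) -> Dict[str, int]:
    """Breadth-first search: hop count from the anchor to every reachable router."""
    dist = {anchor: 0}
    queue = deque([anchor])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def flooding_adjacencies(adj: Dict[str, List[str]], anchor: str) -> Set[FrozenSet[str]]:
    """Each non-anchor router keeps one adjacency to a neighbor closer to the anchor."""
    dist = hop_distances(adj, anchor)
    flooding: Set[FrozenSet[str]] = set()
    for node in adj:
        if node == anchor or node not in dist:
            continue
        closer = [n for n in adj[node] if n in dist and dist[n] < dist[node]]
        upstream = min(closer)  # assumed tie-break: lowest router name
        flooding.add(frozenset((node, upstream)))
    return flooding


# Two spines, three leaves, every leaf connected to every spine (6 adjacencies).
topology = {
    "s1": ["l1", "l2", "l3"],
    "s2": ["l1", "l2", "l3"],
    "l1": ["s1", "s2"],
    "l2": ["s1", "s2"],
    "l3": ["s1", "s2"],
}
tree = flooding_adjacencies(topology, anchor="s1")
```

With s1 as the anchor, flooding is kept on four of the six adjacencies in this example. A single tree like this has no redundancy, so it only shows the basic selection rule, not the redundant deployment the abstract mentions.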
[Lsr] New draft: IS-IS flooding over TCP
Hello all,

Gunter van de Velde (Nokia) and I have been working on two ideas to improve the scalability of IS-IS flooding in large-scale data-center networks.

One idea deals with reducing IS-IS flooding in dense topologies. Next week or so we'll publish a draft that explains this idea. It's a distributed algorithm, backwards compatible, requires no extra computations, works in any topology, and has redundancy.

The other idea is simpler. This afternoon Gunter and I published a draft about it. You can find it here:
https://datatracker.ietf.org/doc/draft-hsmit-lsr-isis-flooding-over-tcp/

Abstract:

This document proposes a solution to use TCP for IS-IS flooding. The proposed solution is a relatively simple extension to implement. IS-IS flooding over TCP brings BGP's property of scalable transport via TCP to Link-State protocols. This proposal defines a new TLV in point-to-point IIHs to signal the intent of a router to do flooding over TCP, and it defines a small header to encapsulate IS-IS PDUs in the TCP byte-stream.

The idea is simple:
- Routers include a new TLV in IIHs to indicate that they want to flood over TCP.
- When both routers agree (both have the new TLV included), the router with the lowest system ID opens a TCP connection to its new neighbor.
- This TCP connection is used to send LSPs and SNPs over.
- IIHs are still sent the classic way (directly in a layer-2 frame).
- Flooding over TCP is only done on p2p interfaces, not on multipoint interfaces.

There are several benefits for IS-IS:
- the IS-IS process does not need to send or receive SNPs (acks)
- the IS-IS process doesn't need to do retransmissions of LSPs
- multiple LSPs can be packed into fewer TCP segments
- TCP will bring high throughput and flow-control to IS-IS flooding

Of course, all feedback is welcome,

Gunter & Henk
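The two mechanical pieces of the idea, deciding who opens the connection and delimiting PDUs in the byte-stream, can be sketched in a few lines of Python. The header format here is an assumption: this message only says the draft defines "a small header", so the sketch uses a plain 2-byte length prefix as a stand-in. The function names opens_connection, frame_pdu and extract_pdus are likewise made up for the example.

```python
import struct
from typing import List, Tuple


def opens_connection(local_sysid: bytes, neighbor_sysid: bytes) -> bool:
    """The router with the lowest system ID initiates the TCP connection."""
    return local_sysid < neighbor_sysid


def frame_pdu(pdu: bytes) -> bytes:
    """Prefix a PDU with an assumed 2-byte big-endian length header."""
    if len(pdu) > 0xFFFF:
        raise ValueError("PDU too large for a 2-byte length header")
    return struct.pack("!H", len(pdu)) + pdu


def extract_pdus(buffer: bytes) -> Tuple[List[bytes], bytes]:
    """Split received bytes into complete PDUs plus any unconsumed remainder."""
    pdus: List[bytes] = []
    offset = 0
    while len(buffer) - offset >= 2:
        (length,) = struct.unpack_from("!H", buffer, offset)
        if len(buffer) - offset - 2 < length:
            break  # PDU not fully received yet; wait for more bytes
        pdus.append(buffer[offset + 2:offset + 2 + length])
        offset += 2 + length
    return pdus, buffer[offset:]


# Two PDUs packed back to back, with the last two bytes not yet received.
stream = frame_pdu(b"LSP-A") + frame_pdu(b"SNP-B")
complete, leftover = extract_pdus(stream[:-2])
```

The remainder returned by extract_pdus is what a receiver would keep and prepend to the next chunk read from the socket, since TCP does not preserve PDU boundaries.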