Re: [Lsr] [Teas] Fwd: Working Group Last Call for "Applicability of IS-IS Multi-Topology (MT) for Segment Routing based Network Resource Partition (NRP)" - draft-ietf-lsr-isis-sr-vtn-mt-06

2024-01-12 Thread Henk Smit
From the draft:
===
> The mechanism described in this document is considered useful for network 
> scenarios in which
> the required number of NRP is small, as no control protocol extension is 
> required. For network
> scenarios where the number of required NRP is large, more scalable solution 
> would be needed,
> which may require further protocol extensions and enhancements.
 
So the proposed draft is about a solution that doesn't scale (well).
And then later, we might get another solution that does scale (better).
Then we'll end up with two solutions for one problem. One bad solution, and one 
(hopefully) better solution.
 
If that is the case, then I suggest we wait a bit, and see what else the TEAS
working group comes up with.
I'd rather have one good solution than two half-baked ones. Or even one good and one
half-baked. Less is more.
 
henk.
 

> On 01/11/2024 4:40 AM CET Chongfeng Xie  wrote:
>  
>  
>  
> Hi Les,
>  
> Thanks for your comments.
>  
> This is an informational document which describes the applicability of 
> existing IS-IS MT mechanisms for building SR based NRPs. All the normative 
> references are either RFCs or stable WG documents. It is true that some 
> informative references are individual documents, while they just provide 
> additional information related to this topic, thus would not impact the 
> stability and maturity of the proposed mechanism.
>  
> The text you quoted from draft-ietf-teas-nrp-scalability are about the 
> considerations when the number of NRP increases, how to minimize the impact 
> to the routing protocols (e.g. IGP). While as described in the scalability 
> considerations section of this document, the benefit and limitation of using 
> this mechanism for NRP are analyzed, and it also sets the target scenarios of 
> this mechanism:
>  
>  “The mechanism described in this document is considered useful for 
> network scenarios in which the required number of NRP is small”
>  
> Thus it is clear that this solution is not recommended for network scenarios 
> where the number of required NRP is large.
>  
> Please note section 3 of draft-ietf-teas-nrp-scalability also mentioned that:
>  
>   “The result of this is that different operators can choose to deploy 
> things at different scales.”
>  
> And
>  
>   “In particular, we should be open to the use of approaches that do not 
> require control plane extensions and that can be applied to deployments with 
> limited scope.”
>  
>  According to the above text, we believe the mechanism described in this 
> document complies to the design principles discussed in 
> draft-ietf-teas-nrp-scalability and provides a valid solution for building 
> NRPs in a limited scope.
>  
>  Hope this solves your concerns about the maturity and scalability of this 
> mechanism.
>  
>  Best regards,
>  
> Chongfeng
>  
> 
> >  
> > 
> > From: Les Ginsberg (ginsberg) <ginsberg=40cisco@dmarc.ietf.org>
> > Date: 2024-01-11 08:21
> > To: Joel Halpern <j...@joelhalpern.com>; Acee Lindem <acee.i...@gmail.com>;
> > t...@ietf.org; lsr@ietf.org
> > Subject: Re: [Lsr] [Teas] Fwd: Working Group Last Call for "Applicability 
> > of IS-IS Multi-Topology (MT) for Segment Routing based Network Resource 
> > Partition (NRP)" - draft-ietf-lsr-isis-sr-vtn-mt-06
> > 
> > (NOTE: I am replying to Joel’s post rather than the original last call 
> > email because I share some of Joel’s concerns – though my opinion on the 
> > merits of the draft is very different.
> > Also, I want to be sure the TEAS WG gets to see this email.)
> >  
> > I oppose Last Call for draft-ietf-lsr-isis-sr-vtn-mt.
> >  
> > It is certainly true, as Joel points out, that this draft references many 
> > drafts which are not yet RFCs – and in some cases are not even WG 
> > documents. Therefore, it is definitely premature to last call this draft.
> >  
> > I also want to point out that the direction TEAS WG has moved to recommends 
> > that routing protocols NOT be used as a means of supporting NRP.
> >  
> > https://www.ietf.org/archive/id/draft-ietf-teas-nrp-scalability-03.html#name-scalabliity-design-principl
> >  states:
> >  
> > “…it is desirable for NRPs to have no more than small impact (zero being 
> > preferred) on the IGP information that is propagated today, and to not 
> > required additional SPF computations beyond those that are already 
> > required.”
> >  
> > https://www.ietf.org/archive/id/draft-ietf-teas-nrp-scalability-03.html#name-scalabliity-design-principl
> >  states:
> >  
> > “The routing protocols (IGP or BGP) do not need to be involved in any of 
> > these points, and it is important to isolate them from these aspects in 
> > order that there is no impact on scaling or stability.”
> >  
> > Another draft which is referenced is 
> > https://datatracker.ietf.org/doc/draft-dong-lsr-sr-enhanced-vpn/ - which is 

Re: [Lsr] WG Adoption Call - draft-pkaneria-lsr-multi-tlv (11/17/2023 - 12/09/2023)

2023-11-30 Thread Henk Smit
Support.
 
As the mechanism described in the draft has already been implemented by the
three largest vendors of ISP-class routers, and that software has been deployed
in real networks today, we had better document this ASAP in an RFC.
 
henk.
 
 

> On 11/17/2023 6:23 PM CET Yingzhen Qu  wrote:
>  
>  
> Hi,
>  
> This begins a WG adoption call for draft-pkaneria-lsr-multi-tlv-04
> (Multi-part TLVs in IS-IS):
> https://datatracker.ietf.org/doc/draft-pkaneria-lsr-multi-tlv/
>  
> Please send your support or objection to the list before December 9th, 2023. 
> An extra week is allowed for the US Thanksgiving holiday.
>  
> Thanks,
> Yingzhen 
> 
 


Re: [Lsr] New Version Notification for draft-pkaneria-lsr-multi-tlv-01.txt

2022-09-13 Thread Henk Smit
Hi Tony,


> Yes, I'm advocate for putting things elsewhere, but that proposal has
> met with crickets.  You don't get it both ways: no capabilities in the
> protocol and nowhere else does not work.

I'm not sure I know what you are talking about.
Did you write a draft?

> Because the thought of trying to deploy this capability at scale without
> this attribute seems impossible. Consider the case of Tier 1 providers
> who have large IS-IS deployments. Are you really going to evaluate 2000+
> nodes without some kind of help?

With the help of the management-plane?
How did those providers make changes to their configs/features/architecture 
before?
I would expect them to use the same tools.

> And the routers will do computations based on the multi-part TLVs.
> One level of indirection for a capability does not seem extreme.

Not extreme, indeed.
But again, I'd rather not see 20 different minor or irrelevant things
in the router-capability TLV. Certainly not at 2 octets per item.
1 bit would already be (16 times) better.

> > Regardless whether we do that or not, this discussion maybe should be done
> > outside the multipart TLV  discussion. Maybe another draft should be written
> > about these software-capabilities in general?
> 
> Please feel free.  My proposal was shot down.

Are you talking about a very recent proposal? Linked to the multipart-TLV
draft? Or something older? I vaguely remember some idea about
"generic transport" in IS-IS (or rather: outside the regular IS-IS instance).

henk.



Re: [Lsr] New Version Notification for draft-pkaneria-lsr-multi-tlv-01.txt

2022-09-12 Thread Henk Smit
Hi Tony,

> Some exist today. There are many TLVs where they have never been specified.

My point was: multipart TLVs exist today, before the introduction of the
capability advertisement. So when you look at an LSPDB, you still don't know for
sure which routers support multipart TLVs. Some might, but don't advertise it.
Because their software was written before the new capability existed.

>> In the end, every detail, will get its own router capability.
>  That's correct. It will. That's going to happen independently of this draft.

I hope not. And I will oppose those attempts too.

> we still do not have an effective management plane and
> continue to stuff things into the LSDB that belong in the management plane.

Yes. But that is no excuse to put just anything in the LSPDB.

I've seen you in this work-group as someone who tried to keep things out of
IS-IS that don't belong there. I am surprised to see you want this capability 
in.

> The entire definition of a Flex Algo topology constraint should be
> in the management plane.

Sure. But at least the routers do make route-calculation decisions based on
that information.

> That's not an excuse for not trying to do a good job now.

That is the whole question. This capability is adding 2 more octets to LSPs.
Is that worth it? What if indeed a few dozen drafts will follow to advertise
more of these capabilities?

Should we define a new top-level TLV for "feature/software support capability"?
Not whether something is configured or not (as does the router-capability TLV),
but whether a router's software has that capability to begin with.
Or should we define a new variable-length bitmask sub-TLV for the existing 
router
capability TLV. Where every bit indicates another piece of software the router
supports?
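
To make that second option concrete, something like this (a rough Python
sketch, purely illustrative; the sub-TLV type 250 and the bit assignments
are invented, they are not from any draft or registry):

    import struct

    HYPOTHETICAL_SUBTLV_TYPE = 250   # invented code point, illustration only
    CAP_MULTIPART_TLV = 0            # bit 0: "my software can receive multi-part TLVs"
    CAP_SOME_FUTURE_FEATURE = 1      # bit 1: some other future software capability

    def encode_capability_bitmask(cap_bits):
        """Encode a variable-length bitmask sub-TLV from a set of bit positions."""
        nbytes = (max(cap_bits) // 8) + 1 if cap_bits else 1
        bitmask = bytearray(nbytes)
        for bit in cap_bits:
            bitmask[bit // 8] |= 0x80 >> (bit % 8)   # bit 0 = MSB of the first octet
        return struct.pack("!BB", HYPOTHETICAL_SUBTLV_TYPE, nbytes) + bytes(bitmask)

    # A router whose software supports multipart TLVs sets a single bit:
    print(encode_capability_bitmask({CAP_MULTIPART_TLV}).hex())   # -> 'fa0180'

The whole sub-TLV above is 3 octets and can already carry 8 capability bits,
versus 2 octets for every single capability today.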

Regardless whether we do that or not, this discussion maybe should be done
outside the multipart TLV  discussion. Maybe another draft should be written
about these software-capabilities in general?

henk.



> On 11-09-2022 21:32, Tony Li wrote:
> 
>  
> Hi Henk,
> 
> > > If we want to introduce MP-TLVs, that change would warrant the existence 
> > > of the flag.
> > 
> > Multipart TLVs already exist today. 
> 
> 
> Some exist today.  There are many TLVs where they have never been specified.
> 
> 
> > As discussed here, after introducing a "software capability TLV", if a 
> > router doesn't
> > advertise any of those new capabilities, we still don't know whether that 
> > router supports
> > multipart TLVs or not.
> > So the new capability seems to have limited value.
> 
> 
> In particular, the current proposal on the table is to have the capability 
> apply to the TLVs where multi-part has not been previously defined.
> 
> 
> > > I dispute that a binary flag warrants the word 'complexity'.
> > 
> > You might think of a single bit now. 
> > But people might want to add more. What TLVs on a router can and can not be 
> > received
> > multipart? What about sub-TLVs?
> 
> 
> We are not proposing that level of specificity. It’s all or nothing.
> 
> 
> > It seems these days we have more people in the LSR work-group that prefer 
> > to write drafts
> > than that prefer to write code.
> 
> 
> Ok, I’m offended.
> 
> 
> > I fear a large amount of drafts about router capabilities to
> > advertise support for every bit, every TLV and every sub-TLV in LSPs. In 
> > the end, every
> > feature, every detail or option in a feature, will get its own router 
> > capability.
> 
> 
> That’s correct. It will. That’s going to happen independently of this draft.
> 
> 
> > I'm happy nobody wants routers to react to advertised software 
> > capabilities. 
> > But if routers don't react to info in a LSP, I don't think that info 
> > belongs in the control-plane.
> > It belongs in the management-plane.
> 
> 
> Thank you, but we still (40 years in?) do not have an effective management 
> plane and continue to stuff things into the LSDB that belong in the 
> management plane.
> 
> 
> > That's a different thing, imho. It's a single exception. It's very useful 
> > to identify LSPs and routers.
> > I don't see other management information we should put in LSPs.
> 
> 
> There’s tons of stuff.  The entire definition of a Flex Algo topology 
> constraint should be in the management plane. Almost everything that is in 
> the router capability TLV should be in the management plane.
> 
> We stepped down this slippery slope a long, long, time ago.
> 
> 
> > Twenty-five years ago, you and me were the first people to implement 
> > support for wide metrics.
> > I came up with a strategy to migrate a running network.
> > I introduced the "metric-style [wide|narrow]" command. That was supposed to 
> > be a temporary thing.
> > Just for use in the next 1-2 years, when every IS-IS network was migrating 
> > to wide metrics.
> 
> 
> And it’s still there.
> 
> 
> > When I came back to work, in 2015, I saw that the Nokia SR-OS routers still 
> > have narrow metrics as
> > the default 

Re: [Lsr] New Version Notification for draft-pkaneria-lsr-multi-tlv-01.txt

2022-09-11 Thread Henk Smit



Hi Tony, 


 > If we want to introduce MP-TLVs, that change would warrant the existence of 
 > the flag. 


 Multipart TLVs already exist today. 
As discussed here, after introducing a "software capability TLV", if a router 
doesn't 
 advertise any of those new capabilities, we still don't know whether that 
router supports 
 multipart TLVs or not. 
 So the new capability seems to have limited value. 


 > I dispute that a binary flag warrants the word 'complexity'. 


 You might think of a single bit now. 
But people might want to add more. What TLVs on a router can and can not be 
received 
 multipart? What about sub-TLVs? 


 > We are not allowing that level of granularity. A system that is 
> going to support MP-TLVs should take care to operate correctly 
> for ALL TLVs before advertising that it supports them. 


 It seems these days we have more people in the LSR work-group that prefer to 
write drafts 
 than that prefer to write code. I fear a large amount of drafts about router 
capabilities to 
 advertise support for every bit, every TLV and every sub-TLV in LSPs. In the 
end, every 
 feature, every detail or option in a feature, will get its own router 
capability. 





 I'm happy nobody wants routers to react to advertised software capabilities. 
But if routers don't react to info in a LSP, I don't think that info belongs in 
the control-plane. 
 It belongs in the management-plane. 


 > We have been sending management information in the LSDB since 
> we introduced the hostname TLV. 


 That's a different thing, imho. It's a single exception. It's very useful to 
identify LSPs and routers. 
 I don't see other management information we should put in LSPs. 





 Twenty-five years ago, you and me were the first people to implement support 
for wide metrics. 
 I came up with a strategy to migrate a running network. 
 I introduced the "metric-style [wide|narrow]" command. That was supposed to be 
a temporary thing. 
 Just for use in the next 1-2 years, when every IS-IS network was migrating to 
wide metrics. 


 When I came back to work, in 2015, I saw that the Nokia SR-OS routers still 
have narrow metrics as 
 the default setting. I laughed. Now I am back at cisco, and I see that IOS-XR 
also has narrow metrics 
 as the default. I cried. FYI, both OSes had their FCS many years after 1997. 
If I am not mistaken, JunOS has "metric-style both" as the default. 
So on all these 3 OSes, you need to explicitly configure "metric-style wide". 
Twenty five years after the migration . 


 I fear that the same will happen with your router-capability. The new 
capability will have some 
 value now. To help migrate to a network where all boxes support multipart 
TLVs. 
 But 1-2 years from now, all (major) IS-IS implementations will support 
multipart TLVs for all TLVs. 
 And then the new router-capability will have no use anymore. But routers all 
around the world will 
 still advertise it. I predict that 25 years from now, 23 years after all IS-IS 
implementations started 
 to support multi-part TLVs, routers will still advertise your capability. 


 I don't like that. 


 henk. 



Re: [Lsr] LSR WG Adoption Poll for "IS-IS Topology-Transparent Zone" - draft-chen-isis-ttz-11.txt

2020-08-19 Thread Henk Smit

I object to the introduction of a new major concept, called "zone".
It adds nothing to solve problems we can not already solve.
It just adds unnecessary complexity and technical debt.

(12) In protocol design, perfection has been reached not when there
 is nothing left to add, but when there is nothing left to take away.


henk.


Acee Lindem (acee) wrote on 2020-08-18 16:16:

Based on the discussions in the last meeting and on the mailing list
regarding draft-chen-isis-ttz-11, the chairs feel that there are
enough differences with draft-ietf-lsr-isis-area-proxy-03 and in the
community to consider advancing it independently on the experimental
track.

These differences include abstraction at arbitrary boundaries and
IS-IS extensions for smooth transition to/from zone abstraction.

We are now starting an LSR WG adoption call for
draft-chen-isis-ttz-11.txt. Please indicate your support or objection
to adoption prior to Tuesday, September 2nd, 2020.

Thanks,

Acee and Chris




Re: [Lsr] Request WG adoption of TTZ

2020-07-16 Thread Henk Smit



Hello Tianran,

Warning, long email again.


What's the criterion to evaluate the benefit?


As people have asked before, did any provider or
enterprise ever use rfc8099 in their network ?

As I wrote, one of my criteria is rfc1925. I like
technology to be understandable. I like protocols to
be (relatively) easy to implement. The more unused
cruft there is, the further we get away from that goal.


I'll give you an example. Did you, or your company ever
implement rfc2973 ? That's mesh-groups in IS-IS.
I'm sure some customers put it on their wishlist.
Did any provider or customer ever use it ?
I asked this question at my last job, and nobody knew the
answer. I suspect nobody in the world ever used mesh-groups.

Around the time I got in touch with IS-IS, in spring 1996,
there was a problem that was seen at 2 of the 3 largest ISPs
in the US (UUnet and iMCI). Both networks melted because
of IS-IS. All routers in their networks were spending 100% of their
CPU time running IS-IS, busy exchanging LSPs, while no progress
was made. The only solution was to reboot all routers in
the backbone at the same time (several hundred routers).
This happened more than once in both networks.

To relieve the burden of flooding, mesh-groups were
implemented, and rfc2973 was written. However, a short
while later I became the sole IS-IS programmer for that
router vendor. I was able to reproduce the problem in the lab.
I then realized what the issue was. A fix of 10 lines
of extra code fixed the problem. No customer ever reported
those meltdowns again. That fix was the real solution.
Not writing another RFC.

In the meantime, we have an extra RFC, about mesh-groups.
Every book and manual on IS-IS has to spend time explaining
what mesh-groups are. Every vendor has to implement it.
Even when nobody in the world is using it. Mesh-groups were
a superfluous idea. What I (and many others) are saying is
that we don't want to specify and implement unnecessary things.
Even when nobody is using such a thing, it will live on forever.


What I see the TTZ does have benefit.


Yes, TTZ and proxy-areas have benefit. Nobody is disagreeing.

But what people don't like is the new concept of a zone.
If you can abstract exactly one area into exactly one proxy-LSP,
that is good enough for 99.9 % of cases. In OSPF it is harder to
split or merge an area. In IS-IS it is a lot easier. So a
network operator can design and change his areas first. And
then implement proxy-areas as she/he wishes. Without much
downtime.

If we introduce the concept of a "zone", someone is going to
have to explain that to everybody in the world who uses IS-IS.
Have you ever taught a class on IS-IS to people who don't know
routing protocols very well ?


I am also wondering how it hurts the protocol in the long run ?


Adding stuff that nobody uses makes everything more complex.
I know it seems as if the goal over the last 15 years was to make
everything more complex. So what's the problem with adding yet
another RFC ?

But I like simple things.

henk.


Tianran Zhou wrote on 2020-07-16 02:41:


> "Adding a new concept, with very little benefit, hurts the protocol in
> the long run. The ability to abstract an area, and not also a zone, is
> strong enough to be worthwhile, imho."

Your conclusion here seems very subjective.
What's the criterion to evaluate the benefit? What I see the TTZ does
have benefit.
I am also wondering how it hurts the protocol in the long run?


Tianran




Re: [Lsr] Request WG adoption of TTZ

2020-07-15 Thread Henk Smit

Huaimo Chen wrote on 2020-07-14 06:09:


 2). IS-IS TTZ abstracts a zone to a single node. A zone is any target
block or piece of an IS-IS area, which is to be abstracted. This seems
more flexible and convenient to users.


I don't agree that this convenience is really beneficial.
I actually think this convenience is a downside.


Link-state protocols are not easy to understand. And we already
have the misfortune that IS-IS and OSPF use different names for things.
Adding the new concept of a "zone", while we already have the
concept of an area makes things only more complex.

How often will this new flexibility be used in the real world ?
I still haven't seen an answer to Christian Hopps's simple question:
"Has RFC8099 been deployed by anyone ?"
Anyone who has an answer ?

My favorite rule of RFC1925 is rule 12:
   In protocol design, perfection has been reached not when there is
   nothing left to add, but when there is nothing left to take away.

Adding a new concept, with very little benefit, hurts the protocol
in the long run. The ability to abstract an area, and not also a zone,
is strong enough to be worthwhile, imho.

henk.



Re: [Lsr] WG adoption call for draft-li-lsr-isis-area-proxy-06

2020-06-15 Thread Henk Smit



It’s very clear that this is inadequate.


It's not so clear to me, sorry.
Does anyone have an example (link or jpg) of a (sensible) topology
that would not work with multiple levels of hierarchy, but works
nicely/better with area-proxies (or FRs) ? Just curious.


The structure of legacy
IS-IS areas effectively precludes a scalable network for using lower
levels for transit. This constrains ISPs to ‘cauliflower’ topology
where you have L1 on the outside, L2 just inside of L1, L3 inside of
L2, etc.


I understand. L1-8 forces a hierarchical network design.
But even if one would have the tools to design a non-hierarchical
network, that doesn't mean one should do so. :-)


We already see networks who are unwilling to use the two levels that
we have today due to this constraint.


I think L1-8 levels would be a good starting principle for designing 
large networks.

If there are spots in the network where the hierarchical constraints
are a problem in the real world, indeed it would be nice to have tools
like area-proxies in the tool-set, to help solve those problems.

I would like to have both tools.
I think you do too (as you are author of both drafts).

henk.



Re: [Lsr] WG adoption call for draft-li-lsr-isis-area-proxy-06

2020-06-15 Thread Henk Smit



I support the area-proxy draft.


I think both the area-proxy draft and the flood-reflection drafts are
a bit hacky. However, the result of the area-proxy draft has a certain
elegance: only one L2 LSP per area in the backbone.

The flood-reflection draft is just messy, imho.
1) The edge-routers of each area are still visible in L2.
Making the L2 scaling benefits a factor less, compared to area-proxy.
2) You need a new tunneling technology to flood LSPs between
edge-routers and the flood-reflector. Even when you simply use TCP,
this adds new and unnecessary complexity.
3) You either need to tunnel user-traffic between edge-routers.
Or you redistribute all L2 prefixes into L1. Negating the benefit of
having L1-only routers in your transit area. I don't like either option.


BTW, personally I think the proper solution to scale IS-IS to larger
networks is 8 levels of hierarchy. Too bad that idea gets so little
push from vendors and operators.

henk.


Christian Hopps wrote on 2020-06-10 21:27:

This begins a 2 week WG adoption call for the following draft:

  https://datatracker.ietf.org/doc/draft-li-lsr-isis-area-proxy/

The draft would be adopted on the Experimental track.

Please indicate your support or objection by June 24, 2020.

Authors, please respond to the list indicating whether you are aware
of any IPR that applies to this draft.

Thanks,
Chris and Acee.





Re: [Lsr] Why only a congestion-avoidance algorithm on the sender isn't enough

2020-05-04 Thread Henk Smit



Mitchel wrote:

IS-IS has two levels of neighbors via hello level 1s (LSAs) and hello
level  2s :, so immediate is somewhat relative..


As Tony said, Level-2 neighbors are still directly adjacent.
There might be layer-2 switches between them.
But there are never layer-3 routers between 2 adjacent level-2 
neighbors.


Les's point is that interfaces, linecards, and the interface
between the data-plane and the control-plane can all be seen
as points between 2 ISIS instances/processes on two different
routers, where ISIS messages might be dropped. And that
therefore you need congestion-control (instead of, or in addition
to) receiver-side flow-control.


Sorry, I disagree, Link capacity is always an issue..


Note, we're not trying to find the maximum number of LSPs
we can transmit. We just want to improve the speed a bit.
From 33 LSPs/sec today to 10k LSPs/sec or something in
that order. There's no need to send 10 million LSPs/sec.

Suppose the average LSP is 500 bytes.
Suppose a router sends 10k LSPs per second.
I think if ISISes can send 10k LSPs/sec, we've solved the problem
for 99.99% of networks.

10k LSPs is 5 000 000 bytes. Is 40 000 000 bits. Is 40 Mbps.
So a continuous stream of 10k LSPs/sec takes 40 Mbps to transmit.
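
The arithmetic, as a trivial Python sketch with the numbers assumed above:

    avg_lsp_size_bytes = 500     # assumed average LSP size
    lsps_per_second = 10_000     # the target flooding rate discussed above

    mbps = avg_lsp_size_bytes * 8 * lsps_per_second / 1_000_000
    print(mbps, "Mbps")          # -> 40.0 Mbps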

For LSP-flooding, bandwidth itself is never the problem.

henk.



Re: [Lsr] Why only a congestion-avoidance algorithm on the sender isn't enough

2020-05-04 Thread Henk Smit



On Friday I wrote:

I still think we'll end up re-implementing a new (and weaker) TCP.


Christian Hopps wrote 2020-05-04 01:27:

Let's not be too cynical at the start though! :)


I wasn't trying to be cynical.
Let me explain my line of reasoning two years ago.

When reading about the proposals for limiting the flooding topology
in IS-IS, I read a requirement doc. It said that the goal was to
support areas (flooding domains) of 10k routers. Or maybe even 100k
routers. My immediate thought was: "how are you gonna sync the LSDB
when a router boots up ? That takes 300 to 3000 seconds !?".
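
(That 300 to 3000 seconds is just the arithmetic below, assuming roughly one
LSP per router and the ~33 LSPs/sec rate mentioned elsewhere in these threads;
a trivial Python sketch:)

    legacy_rate = 33                   # LSPs per second
    for lsps in (10_000, 100_000):     # roughly one LSP per router in the area
        print(lsps, "LSPs:", round(lsps / legacy_rate), "seconds")
    # -> 10000 LSPs: 303 seconds / 100000 LSPs: 3030 seconds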

This is the problem I wanted to solve. I hadn't even thought of
routers in dense topologies that have 1k+ neighbors.


There are currently heathens that use BGP as IGP in their data-centers.
There's even a cult that is developing a new IGP on top of BGP (LSVR).
If they think BGP/BGP-LS/LSVR are good choices for an IGP, why is that ?
One reason is that people claim that BGP is more scalable. Note, when
doing "Internet-routing" with large number of prefixes, routers, or
some implementations of BGP, still sometimes need minutes, or dozens
of minutes to process and re-advetise all those prefixes. So when we
talk about minutes, why do people think BGP is so much more wonderful ?
I think it's TCP. TCP can transport lots of info quickly and 
efficiently.

And conceptually TCP is easy to understand for the user ("you write
into a socket, you read from a socket on the other box. done").

If TCP is good enough for BGP bulk-transport, it should be good
enough for IS-IS bulk-transport.

If there are issues with using TCP for routing-protocols, I'm sure
we've solved those by now (in our implementations). We can use those
same solutions/tweaks we use for BGP's TCP in ISIS's TCP. Or am I
too naive now ?

BTW, all the implementations I've worked with used regular TCP. All
the Open Source BGPs seem to be using the regular TCP in their
kernels. Can someone explain why TCP is good for BGP but not for IS-IS ?

Almost 24 years ago, I sat on a bench in Santa Cruz discussing protocols
with an engineer who had a lot more experience than I had, and still 
have.

He was designing LDP at the time (with Yakov). LDP also uses TCP.
He said "if we had to design IS-IS now, of course we'd use TCP as
transport now". I never forgot that.


The goal here is not to make IS-IS transport optimal. We don't need to
use maximum available bandwidth. I just happen to think we need the
same 2 elements that TCP has: sender-side congestion-avoidance and
receiver-side flow-control. I hope I have explained why sender-side
congestion-control in IS-IS is not enough (you don't get the feedback
you need to make it work). Les and others have tried to explain
why receiver-side flow-control is hard to implement (the receiving
IS-IS might not know about the state of its interfaces, linecards, etc).

That's why I think we need both.
And when we implement both, it'll start to look like TCP.
So why not use TCP itself ?
Or Quic ? Or another transport that's already implemented ?


I'd note that our environment is a bit more controlled than the
end-to-end internet environment. In IS-IS we are dealing with single
link (logical) so very simple solutions (CTS/RTS, ethernet PAUSE)
could be viable.


Les's argument is that it's often not so controlled.

Let me ask you one question:
In your algorithm, the receiving IS-IS will send a "pause signal" when
it is overrun. How does IS-IS know it is overrun ? The router is 
dropping
IS-IS pdu's on the interface, on the linecard, on the queue between 
linecards
and Control Plane, on the IS-IS process's input-queue. When queues are 
full,
you can't send a message up saying "we didn't have space for an IS-IS 
message,
but we're sending you this message that we've just dropped an IS-IS 
message".

How do you envision this working ?

Imho receiver-side flow-control can only send a rough upper-bound on how
many pdu's it can receive normally.

A solution with a "pause signal" is basically the same as receiver-side
flow-control, where the receive-window is either 0 or infinite.


Thus our choice of algorithms may well be less restricted.


I'm looking forward to seeing (an outline of) your algorithm.

Again, I'm not pushing for TCP (anymore). I'm not pushing for anything.
I'm just trying to explain the problems that I see with solutions
that are, imho, a bit too simple to really help. Maybe I'm wrong, and
the problem is simpler than I think. Experimentation would be nice.

henk.



[Lsr] Why only a congestion-avoidance algorithm on the sender isn't enough

2020-04-30 Thread Henk Smit



Hello all,

Two years ago, Gunter Van de Velde and myself published this draft:
https://tools.ietf.org/html/draft-hsmit-lsr-isis-flooding-over-tcp-00
That started this discussion about flow/congestion control and ISIS 
flooding.


My thoughts were that once we start implementing new algorithms to
optimize ISIS flooding speed, we'll end up with our own version of TCP.
I think most people here have a good general understanding of TCP.
But if not, this is a good overview how TCP does it:
https://en.wikipedia.org/wiki/TCP_congestion_control


What does TCP do:

TCP does 2 things: flow control and congestion control.

1) Flow control is: the receiver trying to prevent itself from being
overloaded. The receiver indicates, through the receiver-window-size
in the TCP acks, how much data it can or wants to receive.
2) Congestion control is: the sender trying to prevent the links between
sender and receiver from being overloaded. The sender makes an educated
guess at what speed it can send.


The part we seem to be missing:

For the sender to make a guess at what speed it can send, it looks at
how the transmission is behaving. Are there drops ? What is the RTT ?
Do drop-percentage and RTT change ? Do acks come in at the same rate
as the sender sends segments ? Are there duplicate acks ? To be able
to do this, the sender must know what to expect. How acks behave.

If you want an ISIS sender to make a guess at what speed it can send,
without changing the protocol, the only thing the sender can do is look
at the PSNPs that come back from the receiver. But the RTT of PSNPs can
not be predicted. Because a good ISIS implementation does not immediately
send a PSNP when it receives an LSP. 1) the receiver should jitter the PSNP,
like it should jitter all packets. And 2) the receiver should wait a little
to see if it can combine multiple acks into a single PSNP packet.

In TCP, if a single segment gets lost, each new segment will cause the
receiver to send an ack with the seqnr of the last received byte. This
is called "duplicate acks". This triggers the sender to do
fast-retransmission. In ISIS, this can't be be done. The information
a sender can get from looking at incoming PSNPs is a lot less than what
TCP can learn from incoming acks.


The problem with sender-side congestion control:

In ISIS, all we know is that the default retransmit-interval is 5 
seconds.
And I think most implementations use that as the default. This means 
that
the receiver of an LSP has one requirement: send a PSNP within 5 
seconds.
For the rest, implementations are free to send PSNPs however and whenever
they want. This means a sender can not really make conclusions about
flooding speed, dropped LSPs, capacity of the receiver, etc.
There is no ordering when flooding LSPs, or sending PSNPs. This makes
a sender-side algorithm for ISIS a lot harder.

When you think about it, you realize that a sender should wait the
full 5 seconds before it can make any real conclusions about dropped LSPs.
If a sender looks at PSNPs to determine its flooding speed, it will probably
not be able to react without a delay of a few seconds. A sender might send
hundreds or thousands of LSPs in those 5 seconds, which might all or
partially be dropped, complicating matters even further.


A sender-sider algorithm should specify how to do PSNPs.

So imho a sender-side only algorithm can't work just like that in a
multi-vendor environment. We must not only specify a congestion-control
algorithm for the sender. We must also specify for the receiver a more
specific algorithm how and when to send PSNPs. At least how to do PSNPs
under load.

Note that this might result in the receiver sending more (and smaller) 
PSNPs.

More packets might mean more congestion (inside routers).


Will receiver-side flow-control work ?

I don't know if that's enough. It will certainly help.

I think to tackle this problem, we need 3 parts:
1) sender-side congestion-control algorithm
2) more detailed algorithm on receiver when and how to send PSNPs
3) receiver-side flow-control mechanism

As discussed at length, I don't know if the ISIS process on the receiving
router can actually know if it's running out of resources (buffers on
interfaces, linecards, etc). That's implementation dependent. A receiver
can definitely advertise a fixed value. So the sender has an upper bound
to use when doing congestion-control. Just like TCP has both a flow-control
window and a congestion-control window, and a sender uses both. Maybe the
receiver can even advertise a dynamic value. Maybe now, maybe only in the
future. An advertised upper limit seems useful to me today.
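
As a sketch of how those parts could fit together on the sender (illustrative
Python only, not from any draft, and all numbers invented): the sender never
has more unacked LSPs in flight than the smaller of its own congestion window
and whatever upper bound the receiver advertised.

    class LspSender:
        """Illustrative only: combine a sender-grown congestion window with a
        receiver-advertised upper bound, TCP-style."""
        def __init__(self, receiver_advertised_limit=100):
            self.rwnd = receiver_advertised_limit  # fixed value advertised by the receiver
            self.cwnd = 10                         # sender's own congestion window (in LSPs)
            self.unacked = 0                       # LSPs sent but not yet acked in a PSNP

        def can_send(self):
            return self.unacked < min(self.cwnd, self.rwnd)

        def on_lsp_sent(self):
            self.unacked += 1

        def on_psnp(self, n_acked):
            self.unacked = max(0, self.unacked - n_acked)
            self.cwnd = min(self.cwnd + n_acked, self.rwnd)   # grow while acks keep coming

        def on_retransmit_timeout(self):
            self.cwnd = max(1, self.cwnd // 2)                # back off on apparent loss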


What I didn't like about our own proposal (flooding over TCP):

The problem I saw with flooding over TCP concerns multi-point networks (LANs).

When flooding over a multi-point network, setting up TCP connections
introduces serious challenges. Who are the endpoints of the TCP 
connections ?

Full mesh ? Or 

Re: [Lsr] Dynamic flow control for flooding

2019-07-25 Thread Henk Smit


Hello Les,

Thanks for taking the time to respond.


[Les:] Base specification defines partialSNPInterval (2 seconds).
Clearly w faster flooding we should look at decreasing this
timer - but we certainly should not do away with it.


That was the point I was trying to make:
You kept mentioning that your "tx based flow control" only needed
changes to the internal implementation of the LSP-sender.
That's not the case. Your algorithm also depends on behaviour
of the LSP-receiver. I did not see that mentioned anywhere before.
Good to see that you (and Tony) now acknowledge this necessity.

I hope you also realize (and agree) that changing the algorithm
to send PSNPs on the LSP-receiver, in a way to improve the
flow-control algorithm for the LSP-sender, will probably have a
negative impact on the current efficiency of bundling acks in
PSNPs. And that change can multiply the number of PSNPs (and thus
ISIS PDUs in input queues) that need to be received on routers.

If you don’t like the name we can certainly find something more 
appealing.


I don't care much about the name.
(In general I do care about naming in programming. And even 10x more about
naming in protocol documents. But that's not important in the discussion
at the moment).

The point I was trying to get across is that your proposal is not
something that happens internally on a single individual router. It is
an algorithm that involves 2 routers. And thus it is a protocol issue.


What I am proposing does not require protocol extensions -
therefore no draft is required.


Protocols do no only describe octets on the wire. They also describe
behaviour. Thus, as Tony has already said, your proposed algorithm
also need to be documented. In an RFC probably.


Whether a BCP draft is desired is something I am open to considering.


I don't know much about process in the IETF. But I was always under
the assumptions that BCPs were mostly network design/configuration
recommendations for network operators.


From an earlier email:

[Les:] I think you know what I am about to say.. :)


Yes, my question of why use exponential backoffs was a rhetorical
question (as I wrote at the end of my email).
I wrote:
I hope it is clear to everyone that these are not serious questions. I'm
just saying: "sometimes fast is slow".


FYI, few people probably know this, but I happen to be the guy that
intially came up with the idea of exponential backoffs in IGPs.
(Back in 1999 when I was at cisco).

Anyway, to reiterate my point: "sometimes fast is slow". It seems we
now all agree that sending LSPs "rapidly" and then assuming retransmissions
will fix any problems, is an approach that is way too naive. Good.

henk.



Re: [Lsr] Dynamic flow control for flooding

2019-07-24 Thread Henk Smit

Les Ginsberg (ginsberg) schreef op 2019-07-23 22:29:


It is a mistake to equate LSP flooding with a set of independent P2P
“connections” – each of which can operate at a rate independent
of the other.


Of course, if some routers are slow, convergence in parts of the network
might be slow. But as Stephane has already suggested, it is up to the
operators to determine if slower convergence in parts of their network
is acceptable. E.g. they can choose to put fast/expensive/new routers
in the center of their network, and move older routers to, or buy cheaper
routers for, the edges of their network.


But I have a question for you, Les:

During your talk, or maybe in older emails, I was under the impression
that you wanted to warn for another problem. Namely microloops.
I am not sure I understand correctly. So let me try to explain what
I understood. And please correct me if I am wrong.


Between the time a link breaks, and the first router(s) start to repair
the network, some traffic is dropped. Bad for that traffic of course. But
the network keeps functioning. Once all routers have re-converged and
adjusted their FIBs, everything is fine again.

In the time in between, between the first router adjusting its FIB and
the last router adjusting its FIB, you can have microloops. Microloops
multiply traffic. Which can cause the whole network to suffer from
congestion. Impacting traffic that did not (originally) go over the broken link.

So you want the transition from the "wrong FIBs", which still point over the
broken path, to "the final FIBs", where all traffic flows correctly, to happen
on all routers at once. That would make the network
go from "drop some traffic" to "forward over the new path" without a stage
of "some microloops" in between.

Am I correct ? Is this what you try to prevent ?
Is this why you want all flooding between routers go at the same speed ?

Thanks in advance,

henk.



Re: [Lsr] Dynamic flow control for flooding

2019-07-24 Thread Henk Smit



Hello Robert,

Tony brought up the example of a partioned network.
But there are more examples.

E.g. in a network there is a router with 1000 neighbors.
(When discussing distributed vs centralized flooding-topology
 reduction algorithms, I've been told these network designs exist).
When such a router reboots/crashes/comes back up, all 1000 neighbors
will create a new version of their own LSP. This causes 1000 different
LSPs to be flooded through the network at the same time. Impacting every
router in the network.

The case I was thinking of myself, was when a router in a large network
boots. When it brings up a number of adjacencies, each neighbor will
try to synchronize its LSPDB with the newly booted router. As the newly
booted router will send empty CSNPs to each of its neighbors, each
neighbor will start sending the full LSPDB. If such a network has 10k
LSPs, and such a router has 100 neighbors, that router will receive 100 * 10k
= 1 million LSPs. Having a faster and more efficient flooding transport,
with flow-control, will make a reboot in such a topology less painful.
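
(The arithmetic, as a trivial sketch with the numbers above:)

    lsps_in_area = 10_000    # LSPs in the flooding domain
    neighbors = 100          # adjacencies the rebooting router brings up
    # Each neighbor sees empty CSNPs from the rebooting router and sends its full LSPDB:
    print(lsps_in_area * neighbors, "LSPs received")    # -> 1000000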

(In that last case, creative use of the overload-bit could prevent 
black-holing
or microloops while ISIS synchronizes its LSPDB after a reboot. Just 
like we
used the overload-bit to solve the problem of slow convergence of BGP 
after
a reboot, 22 years ago. I have no idea if there are any implementations 
that
use the overload-bit to alleviate slow convergence of IS-IS after a 
reboot).


henk.


Robert Raszuk wrote on 2019-07-24 15:33:

Hey Henk & all,

If acks for 1000 LSPs take 16 PSNPs (max 66 per PSNP) or even as long
as Tony mentioned the full flooding as Tony said may take 33 sec - is
this really a problem ?

Remember we are not talking about protocol convergence after link flap
or node going down. We are talking about serious network partitioning
which itself may have lasted for minutes, hours or days. While just
considering absolute numbers yields a desire to go faster and faster, if
we put things in the overall perspective is there really a problem to
be solved in the first place ?

Would there still be a problem if LSR WG recommends faster acking
maybe not for each LSP but for say 20 or 30 max ?

Thx,
R.




Re: [Lsr] Dynamic flow control for flooding

2019-07-24 Thread Henk Smit


Hello Les,

Les Ginsberg (ginsberg) wrote on 2019-07-24 07:17:


If you accept that, then it makes sense to look for the simplest way
to do flow control and that is decidedly not from the RX side. (I
expect Tony Li to disagree with that  – but I have already
outlined why it is more complex to do it from the Rx side.)


In your talk on Monday you called the idea in
draft-decraene-lsr-isis-flooding-speed-01 "receiver driven flow 
control".

You don't like that. You want "transmit based flow control".
You argued that you can do "transmit based flow control" on the sender 
only.

Therefore your algorithm is merely a "local trick".
And "local tricks" don't need RFCs. I agree with that.
But I don't agree that your algorithm is just a "local trick".


In your algorithm, a "sender" sends a number of LSPs to a receiver.
Without waiting for acks (PNSPs). Like in any sliding window protocol.
The sending router keeps an eye on the number of unacked LSPs.
And it determines how fast it can send more LSPs based on the current
number of unacked LSPs. Every time the sender receives a PSNP, it
knows the receiver got a number of LSPs, so it can increase its
send-window again, and then send more LSPs.
Correct ?
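
In a few lines of Python, my reading of that idea is something like this
(purely my interpretation, and the window size is an invented number):

    MAX_UNACKED = 50    # invented send-window, in LSPs

    class TxPacedFlooder:
        """Sketch of pacing purely on the number of unacked LSPs."""
        def __init__(self):
            self.unacked = 0

        def try_send(self, lsp_queue):
            sent = []
            while lsp_queue and self.unacked < MAX_UNACKED:   # send without waiting for acks
                sent.append(lsp_queue.pop(0))
                self.unacked += 1
            return sent                                       # stop when the window is full

        def on_psnp(self, n_acked):
            self.unacked = max(0, self.unacked - n_acked)     # each PSNP re-opens the window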

I agree that the core idea of this algorithm makes sense.
After all, it looks a lot like TCP.
I believe the authors of draft-decraene-lsr-isis-flooding-speed were
planning something like that for the next version of their draft.


However, I do not agree with the name "tx driven flow control".
I also do not agree that this algorithm is "a local trick".
Therefore I also think this algorithm does need to be
documented (in an RFC).

In your "tx based flow control", the sender (tx) sends LSPs at a rate
that is derived from the rate at which it receives PSNPs. Therefore
it is the sender of the PSNPs that sets the speed of transmission !
So it is still the receiver (of LSPs) that controls the flow control.
The name "tx based flow control" is a little misleading, imho.


It is important to realize that the success of your algorithm actually
depends on the behaviour of the receiver. How does it send PSNPs ?
Does it send one PSNP per received LSP ? Or does it pack multiple acks
in one PSNP ? Does it send a PSNP immediately, or does it wait a short
time ? Does it try to fill a PSNP to the max (putting ~90 acks in one
PSNP) ? Does the receiver does something in between ? I don't think
the behaviour is specified exactly anywhere.

I know about an IS-IS implementation from the nineties. When a router
would receive an LSP, it would a) set the SSN bit (for that LSP/interface),
and b) start the psnp-timer for that interface (if not already running).
The psnp-timer would expire 2 seconds later. The router would then walk
the LSPDB, find all LSPs with the SSN-bit set for that interface. And
then build a PSNP with acks for all those LSPs. The result would be
that: a) the first PSNP would be sent 2 seconds (+/- jitter) after
receiving the first LSP, and b) the PSNP would include ~66 acks. (As
a router receiving at full speed would have received 66 LSPs in 2 seconds).
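
In rough Python, that nineties behaviour looks like this (a sketch from
memory, not any particular implementation):

    PSNP_INTERVAL = 2.0    # seconds (partialSNPInterval from the base spec)

    class InterfaceAckState:
        """Sketch of the SSN-bit plus psnp-timer behaviour described above."""
        def __init__(self):
            self.ssn = set()               # LSP-IDs with the SSN bit set for this interface
            self.psnp_timer_expiry = None

        def on_lsp_received(self, lsp_id, now):
            self.ssn.add(lsp_id)                       # a) set SSN for this LSP/interface
            if self.psnp_timer_expiry is None:         # b) start the psnp-timer if not running
                self.psnp_timer_expiry = now + PSNP_INTERVAL

        def on_timer(self, now):
            if self.psnp_timer_expiry is not None and now >= self.psnp_timer_expiry:
                acks, self.ssn = sorted(self.ssn), set()   # walk the SSN bits, clear them
                self.psnp_timer_expiry = None
                return acks                # one PSNP with all pending acks (~66 at full speed)
            return None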


For your "tx based flow control" algorithm to work properly, this has
to change. The receiving router must send PSNPs more quickly and more
aggressively. The result would be that there will be less acks in each
PSNP. And thus more PSNPs will be sent.

This makes us realize: in the current situation, if a router receives
1000 LSPs, and sends those LSPs to 64 neighbors, it would receive:
- the 1000 LSPs from an upstream neighbor, plus
- 1000/66 = 16 PSNPs from each downstream neighbor = 64 * 16 = 1024 PSNPs.
This makes a total of ~2000 PDUs received.

If routers would send one PSNP per LSP (to have faster flow control),
then the router in this example would receive:
- the 1000 LSPs from an upstream neighbor, plus
- 1000 PSNPs from each downstream neighbor = 64 * 1000 = 64000 PSNPs.
This makes a total of ~65000 PDUs received.

The total number of PDUs received on this router would go from ~2K PDUs
to ~65K PDUs.
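
(The comparison as a small sketch, with the 64 neighbors and 1000 LSPs from
the example above, and the number of acks packed per PSNP as the variable:)

    import math

    def pdus_received(lsps=1000, neighbors=64, acks_per_psnp=66):
        psnps = neighbors * math.ceil(lsps / acks_per_psnp)
        return lsps + psnps    # LSPs from upstream + PSNPs from all downstream neighbors

    print(pdus_received(acks_per_psnp=66))   # -> 2024  (~2K, today's batching)
    print(pdus_received(acks_per_psnp=1))    # -> 65000 (~65K, one PSNP per LSP)
    print(pdus_received(acks_per_psnp=2))    # -> 33000 (~33K, TCP-style one ack per 2)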

Remember that the problem we're trying to solve here is to make sure
that routers do not get overrun on the receipt side with too many
packets too quickly. It seems an aggressive PSNP-scheme, to achieve
faster flow-control, is actually very counter-productive.

Of course the algorithm can be tweaked. E.g. TCP sends one ack per
every 2 received segments (if I'm not mistaken). If we do that here,
the number of PDUs would go down from ~65K to ~33K PDUs. What do you
propose ? How do you want the feedback of PSNPs to be quick, while
maintaining an efficient packing of multiple acks per PSNP ?


In any case, the points I'm trying to make here:
*) Your algorithm is not sender-driven, but still receiver-driven.
*) Your algorithm changes/dictates behaviour both on sender and receiver.
*) Interaction between a sender and a receiver is what we call a protocol.
   If you want to make this work, especially in multi-vendor environments,
   we need to document these 

Re: [Lsr] IS-IS over TCP

2018-11-07 Thread Henk Smit




Les Ginsberg (ginsberg) wrote 2018-11-07 17:06:


The problem that RFC6213 tries to solve is a case where one of the
neighbors is thinking that the other does not support BFD. And thus
the lack of BFD is not used as an indication that something is wrong.
Right ?


[Les:] This is not correct.
The key paragraph is in https://tools.ietf.org/html/rfc6213#section-2

" The problem with this solution is that it assumes that the
   transmission and receipt of IS-IS Hellos (IIHs) shares fate with
   forwarded data packets.  This is not a fair assumption to make given
   that the primary use of BFD is to protect IPv4 (and IPv6) 
forwarding,

   and IS-IS does not utilize IPv4 or IPv6 for sending or receiving its
   hellos."

We have seen cases where IPv4/IPv6 data packet delivery has been
compromised - but IS-IS PDU delivery was unaffected. This led to the
following behavior:

1)IS-IS exchanges hellos and bring adjacency up. Routes using the link
are installed
2)BFD session is started and comes up
3)After a time some problem occurs which only impact IP traffic - BFD
session goes down.
4)IS-IS adjacency is brought down due to BFD session down, but IS-IS
continues to send hellos. If they are successfully exchanged then
IS-IS adjacency is almost immediately restored and we resume
installing IP routes using the link even though BFD session never
comes up. Data traffic gets dropped.


Thanks for the explanation.
But that is kinda what I wanted to say.
BFD is failing (because the IP-path is failing), and IS-IS doesn't
realize this, because it thinks that BFD isn't being used (because
"BFD session never comes up").


The extensions in RFC 6213 allow IS-IS to know when both sides support
BFD, which means the sequence changes to:

1)IS-IS exchanges hellos - but adjacency remains in INIT state.
2)BFD session is initiated - IS-IS adjacency remains in INIT state
until BFD session comes up.

Thus IS-IS never installs routes using the link unless we know IP
traffic can be successfully delivered.
We can then use BFD both as a requirement to bring the adjacency up
AND as fast failure detection.


OK. What do you suggest we do to fix this failure case if we do
flooding over TCP ?

- We could make BFD mandatory when flooding over TCP ? If the IP-path
is broken, TCP will fail, but BFD will also fail ?

- We bring the adjacency to UP state, but we don't include it in our own
LSP immediately. Only after the TCP session has been established, we
advertise the new adjacency in our LSP. Would that be enough ? It would
stop routes from being calculated over the new adjacency.
Maybe wait until the TCP connection has been set up, and a pair of IIHs
has been exchanged over it ? (So authentication and other stuff can
be verified for the TCP session).
Or maybe even wait until IIHs have been sent, and then full sets of CSNPs
are exchanged in both directions ?

That last suggestion starts approaching the way OSPF does this. If I recall
correctly, OSPF will only include adjacencies in its type-1 LSA after DDs
have been exchanged, and the full LSDB has been synchronized. Would you
want IS-IS to do the same ?

- If the TCP session breaks, do you want to stop including the adjacency
in the LSP ? This will make things like NSF, process restart and control
plane failover much harder.


What if two routers can exchange IIHs and do proper flooding of
LSPs. But they can not exchange IP packets ? This could happen.
IS-IS does not have a way to deal with this.


[Les:] RFC 6213 was written precisely to address this case - and works
very well.


The fact that BFD is working does not mean it is 100% sure that all
IP traffic will work. Failures might depend on protocol number,
portnumbers, packetsize, etc. I agree that it is likely that if BFD
works, all of IP will work. But it's no guarantee.
Likewise we have to decide how paranoid we want to be: if IIHs
are exchanged, how sure are we that TCP can exchange LSPs as well ?

Maybe a good compromise would be:
1) don't advertise the adjacency in your LSPs until the TCP flooding
connection has been established. (And maybe IIHs/CSNPs are exchanged).
2) after connection is fully up (IIHs and flooding works), use longer
time-outs to determine whether TCP is still working.
3) when the other side closes a TCP connection (by FIN or RST), don't
stop advertising the adjacency in your LSP immediately. Instead,
for the next 10 seconds or so, try to re-establish the TCP connection
first. If re-establishment doesn't work, then the router can stop
including the adjacency in its LSP.

This would prevent routers advertising new adjacencies that have a problem
with TCP. But if it works, and suddenly stops working, convergence is
slower (10 seconds or so). But the protocol has the ability to re-establish
the TCP session, to make it more flexible.

Would that be acceptable ?
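
In rough Python, the compromise would be something like this (illustrative
only, nothing normative; the 10-second grace period is just the example value
from above):

    RECONNECT_GRACE = 10.0    # seconds to retry before withdrawing the adjacency

    class FloodingAdjacency:
        """Sketch: advertise the adjacency only while the TCP flooding session
        is usable, with a grace period for re-establishment on a FIN/RST."""
        def __init__(self):
            self.advertised = False        # do we include this adjacency in our LSP ?
            self.withdraw_deadline = None  # set when the TCP session goes down

        def on_tcp_established(self, iihs_csnps_done=True):
            if iihs_csnps_done:            # optionally wait for IIHs/CSNPs over the session
                self.advertised = True
                self.withdraw_deadline = None

        def on_tcp_closed(self, now):      # FIN or RST from the neighbour
            self.withdraw_deadline = now + RECONNECT_GRACE   # keep advertising, try to reconnect

        def on_timer(self, now):
            if self.advertised and self.withdraw_deadline and now >= self.withdraw_deadline:
                self.advertised = False    # re-establishment failed: withdraw, regenerate our LSP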


[Les:] Actually they have. :-) That's why we wrote RFC 6213 - because
the problem has been seen in the field.


I was talking about a router 

Re: [Lsr] IS-IS over TCP

2018-11-07 Thread Henk Smit




Jeffrey Haas wrote 2018-11-07 20:56:

I guess my question to those who live in IGP land is how often is this 
a
problem?  In the case of an IGP, the backpressure means you have 
databases

that are out of sync and end up with bad forwarding.


As discussed below, if you have multiple flooding paths, and not all of
them are congested or throttled, when at least one copy of the LSP makes
it across, convergence will be fast.

Both iBGP and eBGP.  The two general issues are slow receivers (scale) 
or

responses to dropped packets.


Slow receivers are a problem for native flooding too.
Although I suspect you mean: after congestion problems, and the situation
improves, native flooding will recover quickly, while TCP might intentionally
keep things slow for a longer period of time. Correct ?
Yes, that is an issue.

The general experiment I recommend to people trying to do this sort of 
thing
is take your TCP stream of choice, pace it according to your 
transmission

needs, and then drop 5-30% of the packets.  Observe what happens.


Don't we have DSCP for this ?
And remember, the TCP connection will (almost) always be between two
directly connected endpoints.

TCP recovers fine.  But the hiccups can do bad things when timeliness 
is
expected.  For example, 3 second hold times for aggressive BGP peering 
may

time out.


Our proposal is to do only flooding over TCP.
Adjacency management is still done based on native IIHs (and BFD).
Even if TCP stalls the flooding, the adjacency should stay up.
With flooding over multiple paths, it should not be a fatal event.

I guess I'd restate my concern as "for this application, ensure that you're
okay with the results of stalled transmission".  Effectively, see the answer
to the question I asked above about native behaviors.


[Flooding happens over multiple paths. As long as one path is quick,
convergence will be quick too].


This, I think, is a better point addressing my concerns.  Thanks.


I expect there will be more issues that need to be addressed.
E.g. an old rule of thumb is: don't generate a routing update (packet)
unless you are pretty sure you can send it right away, and the receiver
can receive it. Otherwise a lot of the actual communication might be
stale information.

I'm not sure if people find this rule of thumb still relevant these days.
(I know people who do not). With an abundance of CPU cores, memory and
bandwidth, it seems many problems of the past are not visible anymore,
unless you start pushing beyond what most people do. But if you do care,
it is advisable to keep your TCP window sizes small: maybe at the default
16KB, maybe even 8KB or 4KB. With a window size of 4KB you might still be
able to send a dozen average-size LSPs, and those might get stuck/stale
in TCP. But I think that's a good trade-off to get syncing of large
LSDBs, as long as you don't set the window size to 64KB or larger. And
maybe even then, stale LSPs might be less of a problem than old-timers
think.
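
To put some rough numbers on that (the 300-byte "average LSP" is just my
own illustrative guess; 1492 is the usual LSPBufferSize default):

    # Back-of-the-envelope: how many LSPs can sit unacknowledged in the
    # TCP send window for a given window size?
    AVG_LSP_SIZE = 300        # bytes, illustrative guess
    MAX_LSP_SIZE = 1492       # bytes, common default LSPBufferSize

    for window in (4 * 1024, 8 * 1024, 16 * 1024, 64 * 1024):
        print("window %5d bytes: ~%3d average LSPs, ~%2d max-sized LSPs"
              % (window, window // AVG_LSP_SIZE, window // MAX_LSP_SIZE))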


henk.



Re: [Lsr] IS-IS over TCP

2018-11-07 Thread Henk Smit



Hi Jeff,

Jeffrey Haas wrote on 2018-11-06 05:20:

I'm ambivalent of the transport, but agree that TCP shouldn't be the
default answer.


I picked TCP because every router has a working TCP implementation.
And TCP is good enough for BGP. And thus also considered good enough
for LSVR. If that's the case, I'd assume it is good enough for IS-IS
as well.

It's easy to adjust our draft so that new transports can be introduced
over time. We can do TCP now. Add Quic later. And add other, new, better
transports later, when they become available.

I don't know much about Quic. But it seems the protocol and details are
not 100% stable yet. Maybe soon. So maybe we can do TCP now, and Quic
later ?

Also, Quic might be easy to implement for router OSes that run on top
of Unix. But for OSes that use QNX, vxWorks, or something proprietary,
Quic might be more work. (It's up to others to decide if that's
important or not. I have no opinion on this matter myself).

My concerns that I tried raising via jabber summarize roughly as follows:
- TCP is prone to interesting backpressure issues, typically as a result
  of packet loss or slow receivers.


If a receiver is slow, that's the same situation as when IS-IS on the
neighbor is slow. Retransmissions happen. Retransmissions with 10589
flooding are fixed time (5 seconds). (I guess some modern implementations
do something smarter). So convergence would have been impacted with
native flooding too.

Note that our proposal only does TCP over directly connected routers.
I'm sorry to say I have little experience with behaviour of BGP in
real networks. Where did you observe these backpressure issues ? eBGP
or iBGP ? I expect to see more problems with iBGP, because iBGP goes
over multiple hops, which can cause all kinds of issues. eBGP is mostly
over directly connected interfaces. I expect TCP to behave much nicer
there. TCP behaviour of IS-IS flooding would resemble eBGP more than
iBGP.

At least, that is my expectation.

Note. If you would do flooding over a tunnel, flooding over TCP might
be beneficial too. Because of tunnel overhead (e.g. GRE headers), tunnels
usually have a smaller MTU. Therefore all max-sized LSPs will need to be
fragmented when sent over the tunnel. This is also (especially) true for
CSNPs. For packet-based tunneling protocols, that means 2 packets for
each max-sized LSP or CSNP. When using TCP, the LSPs get spread out over
multiple segments, which should make reassembly a bit easier/cheaper.
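
A quick back-of-the-envelope, using typical defaults (1500-byte link MTU,
20+4 bytes of IPv4+GRE overhead, 1492-byte LSPBufferSize); adjust the
numbers for your own setup:

    # Why a max-sized LSP or CSNP fragments over a GRE tunnel but not over TCP.
    LINK_MTU = 1500
    GRE_OVERHEAD = 24          # 20-byte outer IPv4 header + 4-byte GRE header
    LSP_SIZE = 1492            # common default LSPBufferSize

    tunnel_mtu = LINK_MTU - GRE_OVERHEAD            # 1476 bytes
    packets_per_lsp = 1 if LSP_SIZE <= tunnel_mtu else 2
    print("tunnel MTU %d: each %d-byte LSP/CSNP becomes %d packets"
          % (tunnel_mtu, LSP_SIZE, packets_per_lsp))
    # Over TCP the same data is simply cut into MSS-sized segments, so no
    # IP fragmentation/reassembly is needed.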

- TCP timers can react poorly in some environments where you may want
  time-sensitive things.  This includes something as long as 3 second BGP
  hold timers.


When you do flooding over TCP, then you don't need to send PSNPs (acks)
or do retransmission of LSPs. So you don't need timers for those. Things
become less time-sensitive (at the cost of potentially slower flooding).

- IGPs have a lot of interesting timer hacks to try to ensure that a
  given domain has a consistent database prior to running an SPF.  In the
  face of "stuck" flooding due to backpressure or other things, some of
  these may need to be revisited.


Again, it is my expectation that in case of problems with TCP, the same
situation would have been worse with native IS-IS flooding.

Also note that in BGP, every update packet over every peering has
significance. If one gets delayed, it slows down overall convergence. In
IS-IS flooding, a router will receive multiple copies of the same LSP.
So if one TCP connection is slow, the router might still receive the
same LSPs over other paths. And the impact on overall convergence is
likely to be less.
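
(A tiny sketch of why the duplicate copies are harmless; the LSP-ID and
sequence numbers below are made up for illustration:)

    # Sketch: copies of the same LSP arriving over different flooding
    # sessions don't hurt; only a newer sequence number updates the LSDB.
    lsdb = {}   # lsp_id -> highest sequence number seen

    def receive_lsp(lsp_id, seqnum, from_neighbor):
        if seqnum > lsdb.get(lsp_id, 0):
            lsdb[lsp_id] = seqnum
            return "installed newer copy from %s" % from_neighbor
        return "duplicate/older copy from %s ignored" % from_neighbor

    print(receive_lsp("r1.00-00", 42, "fast path"))
    print(receive_lsp("r1.00-00", 42, "slow TCP session"))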

Of course, this implies that routers flood over more than 1 or 2
interfaces. If we do one of the flooding-reduction proposals, I hope
we'll end up with a situation where we have 3 or 4 redundant flooding
topologies, so that routers will still receive LSPs quickly over other
topologies when the primary topology has problems.



It's been over a year since I looked at QUIC.  I agree with Tony that a
number of the properties it had on my last read are desirable.  I'd
suggest that its behavior (especially timers) in the event of packet
loss should be given a look at based on the comments above.


One other benefit of doing flooding over TCP is that part of the flooding
administration is now outsourced to TCP. And TCP usually runs in another
thread or process, inside or outside the kernel. This means that we'll
automatically get a light form of multi-threading. Less work for the
IS-IS process/threads. Quic runs in user-space. I don't know if that
means it is a library, and functions run in the user's thread/process,
or whether Quic is a separate process/thread. If Quic runs in IS-IS's
thread, it means we lose a cheap form of performance improvement because
of multi-threading.

henk.



Re: [Lsr] IS-IS over TCP

2018-11-07 Thread Henk Smit
 1k adjacencies. If we want to improve the
protocol, I think we should also improve the situation where we need
to flood 10k LSPs over a single adjacency. That's the problem we try
to solve here.


My point here is that there are existing implementations which would
get no benefit from your proposal. It might be argued that someone
writing a new implementation may find it simpler to make use of a
transport mechanism like TCP - but I do not think there is compelling
data that demonstrates that the scalability of an implementation using
your proposal is better than that of many existing implementations.


When I worked at cisco in the nineties, that IS-IS implementation
could deal with 250 routers in a double full mesh. That's 500
adjacencies per router. Those routers were running 100MHz and 200MHz
MIPS CPUs. Some were even cisco 7000s with 68040 CPUs. That worked.
With my outdated knowledge, I cannot understand how a datacenter
fabric where routers have 64 or 128 adjacencies could be a problem.
Not with a proper implementation. But we're trying to fix that too.


This then suggests that for existing implementations the main
motivation to support your proposal is to help other implementations
which have not optimized their existing implementations. :-)
Comments?


Proper IS-IS implementations should split up into 3 threads at least:
one thread for maintaining adjacencies, one for doing flooding and
one for doing SPFs (and route-installs). That way an SPF doesn't
hold up flooding. And heavy flooding or long SPFs don't break
adjacencies. How many implementations in the field do that ?
Heck, I've even seen today's implementations that do not split
off adjacency-maintenance as a separate process or thread.
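
Something like this, structurally (just a sketch of the split, with the
actual protocol work stubbed out):

    # Sketch: adjacency maintenance, flooding and SPF each in their own
    # thread, decoupled by queues, so a long SPF never holds up hellos
    # or flooding.
    import queue
    import threading
    import time

    flood_q = queue.Queue()    # LSPs waiting to be processed/flooded
    spf_q = queue.Queue()      # "topology changed" triggers

    def adjacency_thread():
        while True:
            time.sleep(1)      # placeholder: send/receive IIHs, run hold timers

    def flooding_thread():
        while True:
            lsp = flood_q.get()      # placeholder: update LSDB, flood onwards
            spf_q.put("lsdb changed")

    def spf_thread():
        while True:
            spf_q.get()              # placeholder: run SPF, install routes

    for fn in (adjacency_thread, flooding_thread, spf_thread):
        threading.Thread(target=fn, daemon=True).start()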

Anyway, the question is: if you want to have 10k LSPs in your
flooding domain, do we depend on custom improvements, or do
we want something that's documented ?


One of the things that inspired me to do this proposal was LSVR.
LSVR uses BGP-LS to transport LSPs. Why ? The word on the street
is that "BGP scales so much better". Why does BGP scale so well ?
Imho there is one main reason: BGP uses TCP for transport.
Now I don't like LSVR (and I don't like BGP-LS), because LSVR
seems to re-invent the wheel, with no real improvements over
IS-IS, except for the fact that it uses BGP which uses TCP.
If that's the main benefit of LSVR, then why not just use
TCP and be done with it ?


BTW, there is another reason for our proposal. With the upcoming
drafts about flooding-topology-reduction, there is a new problem.
All these proposals have situations where non-flooding adjacencies
suddenly change to flooding adjacencies. When that happens, the
LSDBs need to be synchronized again. To do that, all of them
propose "just send a CSNP and be done with it". Well, the more
LSPs, the more CSNPs that need to be sent. With 10k LSPs that's
110 CSNPs. CSNPs are not reliable. This re-synchronization happens
when there is churn in the IGP. Are we sure CSNPs aren't dropped
somewhere ? Can we start sending LSPs before we know the neighbor
has sent all its CSNPs ? With reliable transport for LSPs and
SNPs these worst-case scenarios will improve.
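
Where the 110 comes from, roughly (using the usual defaults: 1492-byte
PDUs, a 33-byte CSNP header, and 16 bytes per LSP entry in the
LSP-Entries TLV, at most 15 entries per 255-byte TLV):

    # Rough count of CSNPs needed to describe 10k LSPs.
    import math

    PDU_SIZE = 1492
    CSNP_HEADER = 33
    TLV_OVERHEAD = 2
    ENTRY_SIZE = 16
    ENTRIES_PER_TLV = 15

    tlvs_per_csnp = (PDU_SIZE - CSNP_HEADER) // (TLV_OVERHEAD + ENTRIES_PER_TLV * ENTRY_SIZE)
    entries_per_csnp = tlvs_per_csnp * ENTRIES_PER_TLV       # 90
    print(math.ceil(10000 / entries_per_csnp))               # ~112 CSNPs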

Apologies for the long text.
I hope it explains our goals and proposal a bit more.

henk.





   Les



-Original Message-
From: Lsr  On Behalf Of Henk Smit
Sent: Monday, November 05, 2018 8:22 PM
To: tony...@tony.li
Cc: lsr@ietf.org
Subject: Re: [Lsr] IS-IS over TCP


Thanks, Tony.

We picked TCP because every router on the planet already has a TCP stack
in it. That made it the obvious choice.

Our draft described a TLV in the IIHs to indicate a router's
ability to use TCP for flooding.
That TLV has several sub-TLVs:
1) the TCP port-number
2) an IPv4 address
3) and/or an IPv6 address

We can change the first sub-TLV so that it indicates:
1) 1 or 2 bytes indicating what protocol to use
2) the remainder of the sub-TLV is an indicator of what port-number
   or other identifier to use to connect over that protocol.

This way we can start improving IS-IS with TCP today.
And add/replace it with other protocols in the future.

henk.



tony...@tony.li schreef op 2018-11-06 04:51:
> Per the WG meeting, discussing on the list:
>
> This is good work and I support it.
>
> I would remind folks that TCP is NOT the only transport protocol
> available and that perhaps we should be considering QUIC while we’re
> at it.  In particular, flooding is a (relatively) low bandwidth
> operation in the modern network and we could avoid slow-start issues
> by using QUIC.
>
> Tony
>





Re: [Lsr] IS-IS over TCP

2018-11-05 Thread Henk Smit


Thanks, Tony.

We picked TCP because every router on the planet already has a TCP stack
in it. That made it the obvious choice.

Our draft described a TLV in the IIHs to indicate a router's
ability to use TCP for flooding.
That TLV has several sub-TLVs:
1) the TCP port-number
2) an IPv4 address
3) and/or an IPv6 address

We can change the first sub-TLV so that it indicates:
1) 1 or 2 bytes indicating what protocol to use
2) the remainder of the sub-TLV is an indicator of what port-number
   or other identifier to use to connect over that protocol.

This way we can start improving IS-IS with TCP today.
And add/replace it with other protocols in the future.
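
To make the revised sub-TLV concrete, a small encoding sketch (the type
codes, the protocol code points and the port number below are made up
for illustration; the draft would have to assign real values):

    # Sketch: sub-TLV = 1 byte transport protocol + 2 byte port/identifier.
    import struct

    TRANSPORT_TCP = 1          # hypothetical code point
    TRANSPORT_QUIC = 2         # hypothetical code point

    def transport_subtlv(subtlv_type, protocol, port):
        value = struct.pack("!BH", protocol, port)   # protocol + 16-bit port
        return struct.pack("!BB", subtlv_type, len(value)) + value

    # "I can do flooding over TCP on port 4711" (illustrative port).
    print(transport_subtlv(1, TRANSPORT_TCP, 4711).hex())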

henk.



tony...@tony.li schreef op 2018-11-06 04:51:

Per the WG meeting, discussing on the list:

This is good work and I support it.

I would remind folks that TCP is NOT the only transport protocol
available and that perhaps we should be considering QUIC while we’re
at it.  In particular, flooding is a (relatively) low bandwidth
operation in the modern network and we could avoid slow-start issues
by using QUIC.

Tony


