Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-04-27 Thread tony . li

Robert,

> Apologies if I missed it but so far I understood that the new signalling from 
> the receiver could be different for each LSP sender (i.e. per Hello). 


Correct.  We propose to piggyback feedback information in IIHs and *SNPs.


> Above I am suggesting that such signalling be "global" per receiver 
> (i.e. sent identically to all LSP senders) during moments of stress/congestion. 


How does that help?

Also, in systems where there are multiple forwarding plane silicon instances, 
two different interfaces may have very different control plane resources 
available.  A single response would not allow the receiver to accurately 
describe its situation.

Tony




Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-04-27 Thread Robert Raszuk
>
> > Perhaps such a flag to "slow down, guys" could be sent by the receiver
> uniformly to all peers when under LSP flooding congestion?
>
> That’s effectively what we’re proposing, though it need not be a binary
> flag.  It allows us to do simpler things, like saying “we’re running out of
> buffer space, please slow down a bit”. Again, the goal is to find the
> optimal goodput.  Granularity in the feedback will be helpful.
>

Apologies if I missed it but so far I understood that the new signalling
from the receiver could be different for each LSP sender (i.e. per Hello).

Above I am suggesting that such signalling be "global" per receiver
(i.e. sent identically to all LSP senders) during moments of stress/congestion.

Robert.


Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-04-27 Thread tony . li

Robert,

> For per-peer flow control I do not get how the receiver's ISIS process is to come 
> up with a per-peer timer if, under congestion, it may never see a given peer's LSPs 
> (they being dropped on the single RE control-plane queue or at the interface). 


I’m sorry, but I can’t parse this comment.  The intent is not for the receiver 
to specify a timer value.  The point is for the receiver to provide the LSP 
sender with feedback about available resources on the receiver. This can inform 
the sender’s computation of a reasonable transmit bandwidth.
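
To make the idea concrete, here is a rough, purely illustrative sketch (the function name, feedback fields, and constants are assumptions, not anything defined in the draft) of how a sender might fold such receiver feedback into its per-neighbor transmit rate:

```python
# Hypothetical sketch (names and formula are illustrative, not from any draft):
# a sender turning receiver-advertised feedback into a per-neighbor LSP rate.

def update_tx_rate(current_rate, rx_free_window_lsps, rx_ack_delay_s,
                   min_rate=30.0, max_rate=5000.0):
    """Return a new LSP transmit rate (LSPs/second) for one neighbor.

    rx_free_window_lsps : receiver-advertised free buffer space, in LSPs
    rx_ack_delay_s      : receiver-advertised maximum acknowledgement delay
    """
    # Upper bound: never offer more per ack interval than the receiver buffers.
    ceiling = rx_free_window_lsps / max(rx_ack_delay_s, 1e-3)
    if ceiling > current_rate:
        new_rate = current_rate * 1.25   # probe upward while there is headroom
    else:
        new_rate = ceiling * 0.8         # back off below the advertised limit
    return min(max(new_rate, min_rate), max_rate)

# Example: receiver says "90 LSPs of buffer free, I ack within 200 ms".
print(update_tx_rate(current_rate=300.0, rx_free_window_lsps=90,
                     rx_ack_delay_s=0.2))
```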


> Perhaps such a flag to "slow down, guys" could be sent by the receiver uniformly to 
> all peers when under LSP flooding congestion? 


That’s effectively what we’re proposing, though it need not be a binary flag.  It 
allows us to do simpler things, like saying “we’re running out of buffer space, 
please slow down a bit”. Again, the goal is to find the optimal goodput.  
Granularity in the feedback will be helpful.

Tony




Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-04-27 Thread Robert Raszuk
Ahh ok.

I was under the assumption that flooding reduction is something we will
have sooner than LSP flow control. But maybe this was too optimistic.

- - -

For per-peer flow control I do not get how the receiver's ISIS process is to
come up with a per-peer timer if, under congestion, it may never see a given
peer's LSPs (they being dropped on the single RE control-plane queue or at the interface).

Perhaps such a flag to "slow down, guys" could be sent by the receiver uniformly
to all peers when under LSP flooding congestion? And instead of
specifying an absolute number it would just mean: slow down N times (e.g. slow down
2 times or 4 times). Such a signal could be either temporary with a fixed duration or
in effect until relaxed explicitly.

LSP senders would still need to handle it on a per-peer basis but as
indicated this is not an issue.
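
As a minimal sketch of what such a global "slow down N times" signal could mean on the sender side (hypothetical names and defaults; nothing here is a defined mechanism):

```python
# Minimal sketch of the idea above (hypothetical, not a defined mechanism):
# one "slow down N times" signal sent identically to all neighbors; each
# sender applies it per neighbor until it is explicitly relaxed.

class NeighborFlooder:
    def __init__(self, base_interval_s=0.033):
        self.base_interval_s = base_interval_s   # normal inter-LSP pacing
        self.slowdown_factor = 1                 # 1 means no congestion signal

    def on_congestion_signal(self, factor):
        # factor is e.g. 2 or 4 ("slow 2 times"); a factor of 1 relaxes it.
        self.slowdown_factor = max(1, factor)

    def tx_interval(self):
        # Effective pacing interval grows by the advertised factor.
        return self.base_interval_s * self.slowdown_factor

flooder = NeighborFlooder()
flooder.on_congestion_signal(4)
print(flooder.tx_interval())   # 0.132 -> roughly 4x slower flooding
```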

Thx.
R.

On Mon, Apr 27, 2020 at 8:57 PM  wrote:

>
> If we have 1000 interfaces and all peers *all at the same time*
> send us an LSP of max size 1492 octets, the control plane buffer RAM
> required to store them would be as huge as 1.5 MB. And that assumes we
> did not process any from the arrival of the first to the arrival of the last
> one.
>
>
>
> And that’s only one LSP.  If they don’t stop there and each sends 1000
> LSPs, then you can have 10^6 incoming packets, requiring 1.6GB.
>
> Further, since the bottleneck is likely the queue of packets on the
> forwarding chip(s) to the CPU, this 1.6GB needs to exist on the forwarding
> silicon.  Needless to say, it doesn’t.
>
> Yes, the CPU can probably keep up with one of the peers. This implies that
> the forwarding plane queue grows at the rate that 999 peers are sending at.
> Thus, congestion.
>
> Tony
>
>


Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-04-27 Thread tony . li

> If we have 1000 interfaces and all peers *all at the same time* send 
> us an LSP of max size 1492 octets, the control plane buffer RAM 
> required to store them would be as huge as 1.5 MB. And that assumes we did 
> not process any from the arrival of the first to the arrival of the last one. 


And that’s only one LSP.  If they don’t stop there and each sends 1000 LSPs, 
then you can have 10^6 incoming packets, requiring 1.6GB.

Further, since the bottleneck is likely the queue of packets on the forwarding 
chip(s) to the CPU, this 1.6GB needs to exist on the forwarding silicon.  
Needless to say, it doesn’t.

Yes, the CPU can probably keep up with one of the peers. This implies that the 
forwarding plane queue grows at the rate that 999 peers are sending at. Thus, 
congestion.
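
For reference, a quick back-of-the-envelope check of the figures in this exchange (illustrative only; it ignores per-packet and queueing overhead, which is presumably where the 1.6GB rounding comes from):

```python
# Back-of-the-envelope check of the numbers discussed above (worst-case LSP
# of 1492 octets, 1000 IS-IS neighbors; overheads ignored).

LSP_SIZE = 1492      # octets
NEIGHBORS = 1000

one_each = NEIGHBORS * LSP_SIZE         # one LSP from every neighbor
burst = NEIGHBORS * 1000 * LSP_SIZE     # 1000 LSPs from every neighbor

print(f"{one_each / 1e6:.1f} MB")       # ~1.5 MB
print(f"{burst / 1e9:.2f} GB")          # ~1.49 GB, the order of the 1.6GB above
```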

Tony



Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-04-27 Thread Robert Raszuk
> Meanwhile, buffering is finite and control planes really can’t keep up.
> Forwarding is a parallel activity. The control plane is not.  This presents
> us with a situation where congestion is pretty much inevitable. We need to
> deal with it.


At least we are lucky that ISIS LSPs are produced by some other control
plane not by a silicon chip :)

If we have 1000 interfaces and all peers *all at the same time*
send us an LSP of max size 1492 octets, the control plane buffer RAM
required to store them would be as huge as 1.5 MB. And that assumes we
did not process any from the arrival of the first to the arrival of the last
one.

Thx,
R.


Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-04-27 Thread tony . li

Hi Robert,

> Today from what I see operators (if they even change the default) normally 
> apply the same timer value on all interfaces. If the timer is static, what would 
> be the incentive for any implementation not to group interfaces with 
> identical transmit delay? 


Why should the timer be static in an optimal system?  We want to avoid the need 
for the timer and have systems be adaptive.


> While this thread is very interesting I must observe that from my experience 
> the issue is usually on the receiver. If LSR would publish a one-page 
> draft/RFC mandating that link state packets MUST or SHOULD be recognized and 
> separated from any other control plane traffic at the ingress interface level 
> (on their way to the local RE/RP) we likely wouldn't be having such a debate. 
> 
> Slowing senders just due to a bad implementation of the receiving router is 
> IMHO a somewhat suboptimal (not to say wrong) thing to do. 


Heck, we can do better than that: we can outlaw bad implementations of 
anything.  Do you think that will help? 

The fact of the matter is that even good implementations can congest.  As 
silicon continues to scale, the ratio of interface bandwidth to control plane 
processing power continues to shift. Silicon has completely taken over our 
forwarding planes and scales upwards, where a single chip is now forwarding 
for hundreds of interfaces. Meanwhile, buffering is finite and control planes 
really can’t keep up. Forwarding is a parallel activity. The control plane is 
not.  This presents us with a situation where congestion is pretty much 
inevitable. We need to deal with it.

Tony



Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-04-27 Thread Robert Raszuk
Hi Tony,


> You already have a per-interface flooding ‘queue' through the
> implementation of the SRM bit in the LSDB, which must be managed on a
> per-interface basis.
>

Today from what I see operators (if they even change the default) normally
apply the same timer value on all interfaces. If the timer is static, what would
be the incentive for any implementation not to group interfaces with
identical transmit delay?
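
As a loose illustration of that per-interface "queue" (a sketch under the assumption that it is essentially the set of SRM-marked LSP IDs plus a pacing timer; the names and the send_lsp stub are made up, not any vendor's code):

```python
# Rough sketch of the per-interface "queue" implied by SRM bits: each
# interface tracks which LSP IDs have SRM set and drains them at its
# pacing interval.

import time

def send_lsp(interface_name, lsp):
    """Placeholder for the platform-specific transmit path."""
    print(f"flooding {lsp} on {interface_name}")

class Interface:
    def __init__(self, name, pacing_interval_s=0.033):
        self.name = name
        self.pacing_interval_s = pacing_interval_s
        self.srm = set()          # LSP IDs with Send-Routing-Message set
        self.next_tx = 0.0

    def mark_srm(self, lsp_id):
        self.srm.add(lsp_id)

    def maybe_flood_one(self, lsdb, now=None):
        now = time.monotonic() if now is None else now
        if self.srm and now >= self.next_tx:
            lsp_id = self.srm.pop()
            # Whatever is in the LSDB *now* is transmitted, so a newer copy of
            # the same LSP automatically supersedes the one originally marked.
            send_lsp(self.name, lsdb[lsp_id])
            self.next_tx = now + self.pacing_interval_s
```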

- - -

While this thread is very interesting I must observe that from my
experience the issue is usually on the receiver. If LSR would publish a one-page
draft/RFC mandating that link state packets MUST or SHOULD be
recognized and separated from any other control plane traffic at the
ingress interface level (on their way to the local RE/RP) we likely wouldn't be
having such a debate.

Slowing senders just due to a bad implementation of the receiving router is
IMHO a somewhat suboptimal (not to say wrong) thing to do.

Kind regards,
R.


Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-04-27 Thread Robert Raszuk
> In both cases, this calls, IMO, for a signaling capability from the receiver to
the sender.

So essentially you are asking for a per-peer flooding queue.

Now this gets a little bit tricky (especially if you are dealing
with relatively small timers) if one peer sends you 1 ms, a second 50 ms and
a 10th 250 ms.

Imagine that the LSP to be flooded to the 10th peer has already been overwritten
by a new LSP but is still sitting in the out queue ... do you drain that
queue and start over with the new LSP, or do you replace the old one in place,
keeping the running timer?

I am just curious what will happen under the hood :)
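
One plausible answer, sketched below purely for illustration (an assumption about implementation behaviour, not a statement of what any stack actually does): if the out-queue is keyed by LSP ID, the newer LSP replaces the stale copy in place and inherits the already-running timer slot.

```python
# Hypothetical illustration of the "replace in place, keep the running timer"
# option: the per-neighbor out-queue is keyed by LSP ID, so a newer version
# simply overwrites the queued copy and keeps its position.

from collections import OrderedDict

class OutQueue:
    def __init__(self):
        self.pending = OrderedDict()          # lsp_id -> latest LSP payload

    def enqueue(self, lsp_id, lsp):
        # Re-assigning an existing key keeps its position, i.e. the newer LSP
        # replaces the stale one without restarting any pacing timer.
        self.pending[lsp_id] = lsp

    def dequeue(self):
        # Called when the per-neighbor pacing timer fires.
        return self.pending.popitem(last=False) if self.pending else None

q = OutQueue()
q.enqueue("1921.6800.1001.00-00", {"seq": 0x42})
q.enqueue("1921.6800.1001.00-00", {"seq": 0x43})   # overwritten while queued
print(q.dequeue())   # only the newest copy is ever transmitted
```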

Cheers,
R.




On Mon, Apr 27, 2020 at 3:02 PM  wrote:

> Hi Acee,
>
>
>
> Please see inline [Bruno2]
>
>
>
> *From:* Acee Lindem (acee) [mailto:a...@cisco.com]
> *Sent:* Monday, April 27, 2020 2:39 PM
> *To:* DECRAENE Bruno TGI/OLN; Robert Raszuk
> *Cc:* Les Ginsberg (ginsberg); lsr@ietf.org; Tony Przygienda
> *Subject:* Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed
>
>
>
> Hi Bruno,
>
>
>
> *From: *Lsr  on behalf of Bruno Decraene <
> bruno.decra...@orange.com>
> *Date: *Monday, April 27, 2020 at 8:15 AM
> *To: *Robert Raszuk 
> *Cc: *"Les Ginsberg (ginsberg)" , "
> lsr@ietf.org" , Tony Przygienda 
> *Subject: *Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed
>
>
>
> Robert,
>
>
>
> *From:* Robert Raszuk [mailto:rob...@raszuk.net]
> *Sent:* Monday, April 27, 2020 12:09 PM
> *To:* DECRAENE Bruno TGI/OLN
> *Cc:* Tony Przygienda; Les Ginsberg (ginsberg); lsr@ietf.org
> *Subject:* Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed
>
>
>
>
>
> > Slow flooding increases the likelihood of multiple IGP SPF computations
>
>
>
> True.
>
>
>
> But if you keep your IGP nicely organized in area and levels, get rid of
> flooding anything incl. /32s domain wide to address bugs in MPLS
> architecture then your flooding radius is usually very small.
>
> [Bruno] First of all, the use of areas/levels brings tradeoffs. Then,
> after their initial design, networks grow and change.
>
>
>
> Coming back to flooding, if you have a core router with 50 IGP neighbors,
> the failure of this neighbor requires flooding 50 LSPs. At 33ms pacing
> between LSPs that’s a 1.6s delay/tax, before any computation & FIB update.
> As you see, it’s not related to the number of /32 nor the network diameter.
>
> Some may be fine with this additional 1.6s. Some may not.
>
>
>
> I’m not nearly as familiar with IS-IS deployments as OSPF. Are there any
> implementations that don’t offer configuration to override the 33ms
> inter-LSP interval?
>
> [Bruno2] AFAIK, all implementations allow the configuration of the
> inter-LSP interval.
>
> The question is which value do you set while not risking to overload your
> IS-IS neighbor. Which brings two issues:
>
> -  This depends on the receiver. E.g. High end router vs low end;
> PE with only 2 high adjacencies vs P with 50 adjacencies, router
> generation… While this is currently configured on the sender on a per
> receiver basis
>
> -  Even though the vendor know the design and how it is
> implemented, some vendors may not commit on scaling values supported by a
> given receiver.
>
>
>
> So there are two options:
>
> -  Have the receiver advertise static values (default but
> according to its platform capability)
>
> -  Use dynamic values. Flow control is likely to also require
> some signaling from the receiver to the sender.
>
>
>
> In both cases, this calls, IMO, for a signaling capability from the receiver to
> the sender. The same signaling may then be used to advertise either
> static/default or dynamic values, depending on what the receiver prefers
> (some tend to prefer static values, some tend to prefer dynamic values)
>
>
>
> Thanks,
>
> BR
>
> --Bruno
>
> At Redback (circa 2000), our OSPF implementation defaulted to fast
> flooding and for the MinLSInterval and MinLSArrival OSPF values, you had to
> explicitly remove the fast flooding default if  you wanted to follow RFC
> 2328. Thanks,
> Acee
>
>
>
> Best
>
> --Bruno
>
>
>
>
>
> That in turn allows for both fast flooding and fast topology computation
> while only dealing with few external summaries. I am yet to see a
> practical case where a well designed network with today's ISIS requires
> flooding speedup.
>
>
>
> Best,
>
> R.
>
>
>
>
>
>
>
>
>
> On Mon, Apr 27, 2020 at 10:34 AM  wrote:
>
> ISIS flooding churn (and room for optimization) becomes a problem when
> node boots up (IMHO this is not a pro

Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-04-27 Thread Acee Lindem (acee)
Hi Bruno,

From: Lsr  on behalf of Bruno Decraene 

Date: Monday, April 27, 2020 at 8:15 AM
To: Robert Raszuk 
Cc: "Les Ginsberg (ginsberg)" , 
"lsr@ietf.org" , Tony Przygienda 
Subject: Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

Robert,

From: Robert Raszuk [mailto:rob...@raszuk.net]
Sent: Monday, April 27, 2020 12:09 PM
To: DECRAENE Bruno TGI/OLN
Cc: Tony Przygienda; Les Ginsberg (ginsberg); lsr@ietf.org
Subject: Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed


> Slow flooding increases the likelihood of multiple IGP SPF computations

True.

But if you keep your IGP nicely organized in areas and levels, and get rid of 
flooding anything incl. /32s domain wide to address bugs in the MPLS architecture, 
then your flooding radius is usually very small.
[Bruno] First of all, the use of areas/levels brings tradeoffs. Then, after 
their initial design, networks grow and change.

Coming back to flooding, if you have a core router with 50 IGP neighbors, the 
failure of this neighbor requires flooding 50 LSPs. At 33ms pacing between LSPs 
that’s a 1.6s delay/tax, before any computation & FIB update. As you see, it’s 
not related to the number of /32 nor the network diameter.
Some may be fine with this additional 1.6s. Some may not.
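
For reference, the 1.6s figure is simply the pacing interval multiplied out; a trivial check (illustrative only):

```python
# 50 LSPs received over one interface at the classic 33 ms inter-LSP interval.
PACING_S = 0.033
LSPS = 50
print(f"{PACING_S * LSPS:.2f} s of serialisation delay")   # ~1.65 s
```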

I’m not nearly as familiar with IS-IS deployments as OSPF. Are there any 
implementations that don’t offer configuration to override the 33ms inter-LSP 
interval? At Redback (circa 2000), our OSPF implementation defaulted to fast 
flooding and for the MinLSInterval and MinLSArrival OSPF values, you had to 
explicitly remove the fast flooding default if  you wanted to follow RFC 2328. 
Thanks,
Acee

Best
--Bruno


That in turn allows for both fast flooding and fast topology computation while 
only dealing with a few external summaries. I have yet to see a practical case 
where a well designed network with today's ISIS requires a flooding speedup.

Best,
R.




On Mon, Apr 27, 2020 at 10:34 AM <bruno.decra...@orange.com> wrote:

> ISIS flooding churn (and room for optimization) becomes a problem when a node 
> boots up (IMHO this is not a problem) and when a node fails while having many 
> neighbors attached. Yes, maybe the second case could be improved, but a well designed 
> and operated network should have pre-programmed bypass paths against such 
> cases, so IMO stressing the IGP to "converge" faster, while great in principle, may 
> not really be needed in practice.


I don’t think that FRR is a replacement for “fast” (I’d rather say adequate) 
IGP convergence & flooding.

For multiple reasons such as:

-  Multiple ‘things’ depend on the IGP, such as BGP best path 
selection, CSPF/TE/PCE computations, FRR computations

-  Slow flooding increases the likelihood of multiple IGP SPF 
computations, which is harmful for other computations, which are typically 
heavier and manifold (cf. above)

-  Multiple IGP SPF computations also create multiple transient 
forwarding loops. There are some techniques to remove forwarding loops but this 
is still an advanced topic and some implementations do not handle consecutive 
IGP SPFs (with ‘overlapping’ convergences and combined distributed forwarding 
loops)

-  For FRR, you mostly need to pre-decide/configure whether you want to 
protect link or node failures. Tradeoffs are involved and, given the probability 
of events, link protection is usually enabled (hence not node protection)

-  …

Also, given the current “state of the art”, there is no stressing involved. 
Really. Using TCP, my 200€ mobile running on battery and over 
wifi+ADSL+Internet can achieve better communication throughput than an n*100k€ 
high-end IS-IS router.
I think many persons agree that IS-IS could do better in terms of flooding 
(possibly not as good as a brand-new approach, but incremental improvements also 
have some benefits). Eventually, we don’t need everyone to agree on this.



>  PS. Does anyone have a pointer to any real data showing that performance of 
> real life WAN ISIS deployments is bad ?

In some of our ASes, we do monitor IS-IS by listening to and recording flooded 
LSPs. I can’t share any data.
Next question could be what is “good enough”. I guess this may depend on the 
size of your network, its topology, and your requirements.

We also ran tests in labs. I may share some results during my presentation. (no 
names, possibly no KPI, but some high level outcomes).

Regards,
Bruno


From: Robert Raszuk [mailto:rob...@raszuk.net]
Sent: Friday, April 24, 2020 12:42 PM
To: DECRAENE Bruno TGI/OLN
Cc: Tony Przygienda; Les Ginsberg (ginsberg); lsr@ietf.org
Subject: Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

Hi Bruno  & all,

[Bruno] On my side, I’ll try once and I think the LSR WG should also try to 
improve IS-IS performance. May be if we want to move, we should first release 

Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-04-27 Thread Robert Raszuk
> Slow flooding increases the likelihood of multiple IGP SPF computations

True.

But if you keep your IGP nicely organized in areas and levels, and get rid of
flooding anything incl. /32s domain wide to address bugs in the MPLS
architecture, then your flooding radius is usually very small.

That in turn allows for both fast flooding and fast topology computation
while only dealing with a few external summaries. I have yet to see a
practical case where a well designed network with today's ISIS requires
a flooding speedup.

Best,
R.




On Mon, Apr 27, 2020 at 10:34 AM  wrote:

> ISIS flooding churn (and room for optimization) becomes a problem when
> a node boots up (IMHO this is not a problem) and when a node fails while having
> many neighbors attached. Yes, maybe the second case could be improved, but a well
> designed and operated network should have pre-programmed bypass paths
> against such cases, so IMO stressing the IGP to "converge" faster, while great in
> principle, may not really be needed in practice.
>
>
>
> I don’t think that FRR is a replacement for “fast” (I’d rather say
> adequate) IGP convergence & flooding.
>
> For multiple reasons such as:
>
> -  Multiple ‘things’ depend on the IGP, such as BGP best path
> selection, CSPF/TE/PCE computations, FRR computations
>
> -  Slow flooding increases the likelihood of multiple IGP SPF
> computations, which is harmful for other computations, which are typically
> heavier and manifold (cf. above)
>
> -  Multiple IGP SPF computations also create multiple transient
> forwarding loops. There are some techniques to remove forwarding loops but
> this is still an advanced topic and some implementations do not handle
> consecutive IGP SPFs (with ‘overlapping’ convergences and combined
> distributed forwarding loops)
>
> -  For FRR, you mostly need to pre-decide/configure whether you
> want to protect link or node failures. Tradeoffs are involved and, given the
> probability of events, link protection is usually enabled (hence not node
> protection)
>
> -  …
>
>
>
> Also, given the current “state of the art”, there is no stressing
> involved. Really. Using TCP, my 200€ mobile running on battery and over
> wifi+ADSL+Internet can achieve better communication throughput than an
> n*100k€ high-end IS-IS router.
>
> I think many persons agree that IS-IS could do better in terms of flooding
> (possibly not as good as a brand-new approach, but incremental improvements
> also have some benefits). Eventually, we don’t need everyone to agree on
> this.
>
>
>
> PS. Does anyone have a pointer to any real data showing that
> performance of real life WAN ISIS deployments is bad?
>
>
>
> In some of our ASes, we do monitor IS-IS by listening to and recording
> flooded LSPs. I can’t share any data.
>
> Next question could be what is “good enough”. I guess this may depend on
> the size of your network, its topology, and your requirements.
>
>
>
> We also ran tests in labs. I may share some results during my
> presentation. (no names, possibly no KPI, but some high level outcomes).
>
>
>
> Regards,
>
> Bruno
>
>
>
>
>
> *From**:* Robert Raszuk [mailto:rob...@raszuk.net]
> *Sent:* Friday, April 24, 2020 12:42 PM
> *To:* DECRAENE Bruno TGI/OLN
> *Cc:* Tony Przygienda; Les Ginsberg (ginsberg); lsr@ietf.org
> *Subject:* Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed
>
>
>
> Hi Bruno  & all,
>
>
>
> [Bruno] On my side, I’ll try once and I think the LSR WG should also try
> to improve IS-IS performance. May be if we want to move, we should first
> release the brakes.
>
>
>
> Well from my observations releasing the brakes means increasing the risks.
>
>
>
> Take BGP - brakes are off and see what happens :)
>
>
>
> My personal observation is that ISIS implementations across vendors are
> just fine for the vast majority of deployments today. That actually also
> includes the vast majority of compute clusters as they consist of max 10s of
> racks.
>
>
>
> Of course there are larger clusters with 1000+ or 10K and above network
> elements themselves and x 20 L3 computes, but is there really a need to stretch
> the protocol to accommodate those? Those usually run BGP anyway. And also
> there is the DV+LS hybrid too now.
>
>
>
> ISIS flooding churn (and room for optimization) becomes a problem when
> node boots up (IMHO this is not a problem) and when node fails while having
> many neighbors attached. Yes maybe second case could be improved but well
> designed and operated network should have pre-programmed bypass paths
> against such cases so IMO stressing IGP to "converge" faster while great in
> p

Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-04-27 Thread bruno.decraene
> ISIS flooding churn (and room for optimization) becomes a problem when a node 
> boots up (IMHO this is not a problem) and when a node fails while having many 
> neighbors attached. Yes, maybe the second case could be improved, but a well designed 
> and operated network should have pre-programmed bypass paths against such cases, 
> so IMO stressing the IGP to "converge" faster, while great in principle, may not 
> really be needed in practice.


I don’t think that FRR is a replacement for “fast” (I’d rather say adequate) 
IGP convergence & flooding.

For multiple reasons such as:

-  Multiple ‘things’ depend on the IGP, such as BGP best path 
selection, CSPF/TE/PCE computations, FRR computations

-  Slow flooding increases the likelihood of multiple IGP SPF 
computations, which is harmful for other computations, which are typically 
heavier and manifold (cf. above)

-  Multiple IGP SPF computations also create multiple transient 
forwarding loops. There are some techniques to remove forwarding loops but this 
is still an advanced topic and some implementations do not handle consecutive 
IGP SPFs (with ‘overlapping’ convergences and combined distributed forwarding 
loops)

-  For FRR, you mostly need to pre-decide/configure whether you want to 
protect link or node failures. Tradeoffs are involved and, given the probability 
of events, link protection is usually enabled (hence not node protection)

-  …

Also, given the current “state of the art”, there is no stressing involved. 
Really. Using TCP, my 200€ mobile running on battery and over 
wifi+ADSL+Internet can achieve better communication throughput than an n*100k€ 
high-end IS-IS router.
I think many persons agree that IS-IS could do better in terms of flooding 
(possibly not as good as a brand-new approach, but incremental improvements also 
have some benefits). Eventually, we don’t need everyone to agree on this.



> PS. Does anyone have a pointer to any real data showing that performance of 
> real life WAN ISIS deployments is bad?

In some of our ASes, we do monitor IS-IS by listening to and recording flooded 
LSPs. I can’t share any data.
Next question could be what is “good enough”. I guess this may depend on the 
size of your network, its topology, and your requirements.

We also ran tests in labs. I may share some results during my presentation. (no 
names, possibly no KPI, but some high level outcomes).

Regards,
Bruno


From: Robert Raszuk [mailto:rob...@raszuk.net]
Sent: Friday, April 24, 2020 12:42 PM
To: DECRAENE Bruno TGI/OLN
Cc: Tony Przygienda; Les Ginsberg (ginsberg); lsr@ietf.org
Subject: Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

Hi Bruno  & all,

[Bruno] On my side, I’ll try once and I think the LSR WG should also try to 
improve IS-IS performance. Maybe if we want to move, we should first release 
the brakes.

Well from my observations releasing the brakes means increasing the risks.

Take BGP - brakes are off and see what happens :)

My personal observation is that ISIS implementations across vendors are just 
fine for the vast majority of deployments today. That actually also includes the vast 
majority of compute clusters as they consist of max 10s of racks.

Of course there are larger clusters with 1000+ or 10K and above network 
elements themselves and x 20 L3 computes, but is there really a need to stretch 
the protocol to accommodate those? Those usually run BGP anyway. And also there is 
the DV+LS hybrid too now.

ISIS flooding churn (and room for optimization) becomes a problem when a node 
boots up (IMHO this is not a problem) and when a node fails while having many 
neighbors attached. Yes, maybe the second case could be improved, but a well designed 
and operated network should have pre-programmed bypass paths against such cases, 
so IMO stressing the IGP to "converge" faster, while great in principle, may not be 
really needed in practice.

Lastly, I am worried that when the IETF defines changes to core protocol behaviour, the 
quality of the implementations of those changes may really differ across 
vendors, overall resulting in much worse performance and stability compared 
to where we are today.

I am just not sure if the possible gains for a few deployments are greater than the risk 
for 1000s of today's deployments. Maybe one size does not fit all and for 
massive-scale ISIS we should define a notion of an "ISIS-DC-PLUGIN" which can be 
optionally added at run time when/if needed. If that requires protocol changes 
to accommodate such dynamic plugins - that work should take place.

Many thx,
R.

PS. Does anyone have a pointer to any real data showing that performance of 
real life WAN ISIS deployments is bad ?


On Fri, Apr 24, 2020 at 11:26 AM <bruno.decra...@orange.com> wrote:
Tony

From: Tony Przygienda [mailto:tonysi...@gmail.com]
Sent: Thursday, April 23, 2020 7:29 PM
To: DECRAENE Bruno TGI/OLN
Cc: lsr@ietf.org; Les Ginsbe

Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-04-24 Thread Robert Raszuk
Hi Bruno  & all,

[Bruno] On my side, I’ll try once and I think the LSR WG should also try to
improve IS-IS performance. Maybe if we want to move, we should first
release the brakes.

Well from my observations releasing the brakes means increasing the risks.

Take BGP - brakes are off and see what happens :)

My personal observation is that ISIS implementations across vendors are
just fine for the vast majority of deployments today. That actually also
includes the vast majority of compute clusters as they consist of max 10s of
racks.

Of course there are larger clusters with 1000+ or 10K and above network
elements themselves and x 20 L3 computes, but is there really a need to stretch
the protocol to accommodate those? Those usually run BGP anyway. And also
there is the DV+LS hybrid too now.

ISIS flooding churn (and room for optimization) becomes a problem when a node
boots up (IMHO this is not a problem) and when a node fails while having many
neighbors attached. Yes, maybe the second case could be improved, but a well
designed and operated network should have pre-programmed bypass paths
against such cases, so IMO stressing the IGP to "converge" faster, while great in
principle, may not really be needed in practice.

Lastly, I am worried that when the IETF defines changes to core protocol behaviour,
the quality of the implementations of those changes may really differ
across vendors, overall resulting in much worse performance and stability
compared to where we are today.

I am just not sure if the possible gains for a few deployments are greater
than the risk for 1000s of today's deployments. Maybe one size does not fit all
and for massive-scale ISIS we should define a notion of an "ISIS-DC-PLUGIN"
which can be optionally added at run time when/if needed. If that requires
protocol changes to accommodate such dynamic plugins - that work should
take place.

Many thx,
R.

PS. Does anyone have a pointer to any real data showing that performance of
real life WAN ISIS deployments is bad ?


On Fri, Apr 24, 2020 at 11:26 AM  wrote:

> Tony
>
>
>
> *From:* Tony Przygienda [mailto:tonysi...@gmail.com]
> *Sent:* Thursday, April 23, 2020 7:29 PM
> *To:* DECRAENE Bruno TGI/OLN
> *Cc:* lsr@ietf.org; Les Ginsberg (ginsberg)
> *Subject:* Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed
>
>
>
> I was refering to RFC4960. Bruno, for all practical purposes I think that
> seems to go down the path of trying to reinvent RFC4960 (or ultimately use
> it).
>
> [Bruno] I don’t think that SCTP (RFC4960) is a better fit than TCP. Many
> more features and options than TCP, way more than needed given the existing
> IS-IS flooding mechanism. Much less implementation experience and
> improvement than TCP.
>
> Or, changing the packet formats heavily to incorporate all the control
> loop data you need to the point you have a different control channel along
> those lines since you'll find most of the problems RFC4960 is describing
> (minus stuff like multiple paths).
>
> [Bruno] Really, adding one sub-TLV in IS-IS is not “changing the packet
> formats heavily”.
>
> Nothing wrong with that but it's ambitious on a 30-year-old antique
> artefact we're nursing forward here ;-)
>
> [Bruno] I’m perfectly fine with revolution approaches. I think that we can
> also provide incremental improvement to IS-IS.
>
> As entertaining footnote, I saw in last 20 years at least 3 attempts to
> allow multiple TCP sessions in BGP between peers to speed/prioritize things
> up. All failed, after the first one I helped to push I smarted up ;-)
>
>  [Bruno] On my side, I’ll try once and I think the LSR WG should also try
> to improve IS-IS performance. Maybe if we want to move, we should first
> release the brakes. I’ve seen some progress, e.g., from “there is no need to
> improve flooding” to “we all agree to improve flooding”, or from “Network
> operators just need to configure existing CLI” to “We agree that we need
> something more automated/dynamic”. But this has been very slow progress
> over a year.
>
>
>
> --Bruno
>
>
>
> As another footnote: I looked @ all the stuff in RIFT (tcp, quic, 4960,
> more ephemeral stuff). I ended up adding to rift bunch very rudimentary
> things and do roughly what Les/Peter/Acee started to write (modulo the algorithm
> I contributed and a bunch of things that would be helpful but we can't fit into
> existing packet format). This was as much decision as to "what's available
> & well debugged" as well as "does it meet requirements" as "how complex is
> that vs. rtx in flooding architecture  we have today + some feedback". Not
> on powerpoint, in real production code ;-) rift draft shows you the outcome
> of that as IMO best trade-off to achieve high flooding speeds ;-)
>
>
>
> my 2c
>
>
>
> -- tony
>
>

Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-04-24 Thread bruno.decraene
Tony

From: Tony Przygienda [mailto:tonysi...@gmail.com]
Sent: Thursday, April 23, 2020 7:29 PM
To: DECRAENE Bruno TGI/OLN
Cc: lsr@ietf.org; Les Ginsberg (ginsberg)
Subject: Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

I was referring to RFC4960. Bruno, for all practical purposes I think that seems 
to go down the path of trying to reinvent RFC4960 (or ultimately use it).
[Bruno] I don’t think that SCTP (RFC4960) is a better fit than TCP. Many more 
features and options than TCP, way more than needed given the existing IS-IS 
flooding mechanism. Much less implementation experience and improvement than 
TCP.
Or, changing the packet formats heavily to incorporate all the control loop 
data you need to the point you have a different control channel along those 
lines since you'll find most of the problems RFC4960 is describing (minus stuff 
like multiple paths).
[Bruno] Really, adding one sub-TLV in IS-IS is not “changing the packet formats 
heavily”.
Nothing wrong with that but it's ambitious on a 30-year-old antique artefact 
we're nursing forward here ;-)
[Bruno] I’m perfectly fine with revolutionary approaches. I think that we can also 
provide incremental improvements to IS-IS.
As entertaining footnote, I saw in last 20 years at least 3 attempts to allow 
multiple TCP sessions in BGP between peers to speed/prioritize things up. All 
failed, after the first one I helped to push I smarted up ;-)
 [Bruno] On my side, I’ll try once and I think the LSR WG should also try to 
improve IS-IS performance. Maybe if we want to move, we should first release 
the brakes. I’ve seen some progress, e.g., from “there is no need to improve 
flooding” to “we all agree to improve flooding”, or from “Network operators just 
need to configure existing CLI” to “We agree that we need something more 
automated/dynamic”. But this has been very slow progress over a year.

--Bruno

As another footnote: I looked @ all the stuff in RIFT (tcp, quic, 4960, more 
ephemeral stuff). I ended up adding to rift bunch very rudimentary things and 
do roughly what Les/Peter/Acee started to write (modulo the algorithm I contributed 
and a bunch of things that would be helpful but we can't fit into the existing packet 
format). This was as much decision as to "what's available & well debugged" as 
well as "does it meet requirements" as "how complex is that vs. rtx in flooding 
architecture  we have today + some feedback". Not on powerpoint, in real 
production code ;-) rift draft shows you the outcome of that as IMO best 
trade-off to achieve high flooding speeds ;-)

my 2c

-- tony

On Thu, Apr 23, 2020 at 10:15 AM <bruno.decra...@orange.com> wrote:
Tony,
Thanks for engaging.
Please inline [Bruno2]



From: Tony Przygienda [mailto:tonysi...@gmail.com]
Sent: Wednesday, April 22, 2020 9:25 PM
To: DECRAENE Bruno TGI/OLN
Cc: lsr@ietf.org; Les Ginsberg (ginsberg)
Subject: Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed



On Wed, Apr 22, 2020 at 11:03 AM <bruno.decra...@orange.com> wrote:
Tony, all,

Thanks Tony for the technical and constructive feedback.
Please inline [Bruno]

From: Tony Przygienda [mailto:tonysi...@gmail.com]
Sent: Wednesday, April 22, 2020 1:19 AM
To: Les Ginsberg (ginsberg)
Cc: DECRAENE Bruno TGI/OLN; lsr@ietf.org
Subject: Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

neither am I aware of anything like this (i.e. per platform/product flooding 
rate constants) in any major vendor stack for whatever that's worth. It's 
simply unmaintainable, point. All major vendors have extensive product lines 
over so many changing hardware configuration/setups it is simply not viable to 
attempt precise measurements (and even then, user changing config can throw the 
stuff off in a millisecond). There may have been here and there per deployment 
scenario some "recommended config" (not something I immediately recall either) 
but that means very fixed configuration of things & how they go into networks 
and even then I'm not aware of anyone having had a "precise model of the chain 
in the box". yes, probes to measure lots and lots of stuff in the boxes exist 
but again, those are chip/linecard/backplane/chassis/routing engine specific 
and mostly used in complex test/performance scenarios and not to derive some 
kind of equations that can predict anything much ...
[Bruno] Good points.
Yet, one of the pieces of information that we propose to have advertised by the LSP receiver to 
the LSP sender is the Receive Window.

-  This is a very common parameter and algorithm. Nothing fancy nor 
reinvented. In particular it’s a parameter used by TCP.

-  I would argue that TCP implementations also run on a variety of 
hardware and systems, a much wider range than IS-IS platforms. And those TCP 
implementations on all those platforms manag

Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-04-23 Thread Tony Przygienda
I was referring to RFC4960. Bruno, for all practical purposes I think that
seems to go down the path of trying to reinvent RFC4960 (or ultimately use
it). Or, changing the packet formats heavily to incorporate all the control
loop data you need to the point you have a different control channel along
those lines since you'll find most of the problems RFC4960 is describing
(minus stuff like multiple paths). Nothing wrong with that but it's
ambitious on a 30-year-old antique artefact we're nursing forward here ;-)
As entertaining footnote, I saw in last 20 years at least 3 attempts to
allow multiple TCP sessions in BGP between peers to speed/prioritize things
up. All failed, after the first one I helped to push I smarted up ;-)

As another footnote: I looked @ all the stuff in RIFT (tcp, quic, 4960,
more ephemeral stuff). I ended up adding to rift bunch very rudimentary
things and do roughly what Les/Peter/Acee started to write (modulo the algorithm
I contributed and a bunch of things that would be helpful but we can't fit into
existing packet format). This was as much decision as to "what's available
& well debugged" as well as "does it meet requirements" as "how complex is
that vs. rtx in flooding architecture  we have today + some feedback". Not
on powerpoint, in real production code ;-) rift draft shows you the outcome
of that as IMO best trade-off to achieve high flooding speeds ;-)

my 2c

-- tony

On Thu, Apr 23, 2020 at 10:15 AM  wrote:

> Tony,
>
> Thanks for engaging.
>
> Please inline [Bruno2]
>
>
>
>
>
>
>
> *From:* Tony Przygienda [mailto:tonysi...@gmail.com]
> *Sent:* Wednesday, April 22, 2020 9:25 PM
> *To:* DECRAENE Bruno TGI/OLN
> *Cc:* lsr@ietf.org; Les Ginsberg (ginsberg)
> *Subject:* Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed
>
>
>
>
>
>
>
> On Wed, Apr 22, 2020 at 11:03 AM  wrote:
>
> Tony, all,
>
>
>
> Thanks Tony for the technical and constructive feedback.
>
> Please inline [Bruno]
>
>
>
> *From:* Tony Przygienda [mailto:tonysi...@gmail.com]
> *Sent:* Wednesday, April 22, 2020 1:19 AM
> *To:* Les Ginsberg (ginsberg)
> *Cc:* DECRAENE Bruno TGI/OLN; lsr@ietf.org
> *Subject:* Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed
>
>
>
> neither am I aware of anything like this (i.e. per platform/product
> flooding rate constants) in any major vendor stack for whatever that's
> worth. It's simply unmaintainable, point. All major vendors have extensive
> product lines over so many changing hardware configuration/setups it is
> simply not viable to attempt precise measurements (and even then, user
> changing config can throw the stuff off in a millisecond). There may have
> been here and there per deployment scenario some "recommended config"
> (not something I immediately recall either) but that means very fixed
> configuration of things & how they go into networks and even then I'm not
> aware of anyone having had a "precise model of the chain in the box". yes,
> probes to measure lots and lots of stuff in the boxes exist but again,
> those are chip/linecard/backplane/chassis/routing engine specific and
> mostly used in complex test/performance scenarios and not to derive some
> kind of equations that can predict anything much ...
>
> [Bruno] Good points.
>
> Yet, one of the pieces of information that we propose to have advertised by the LSP
> receiver to the LSP sender is the Receive Window.
>
> -  This is a very common parameter and algorithm. Nothing fancy
> nor reinvented. In particular it’s a parameter used by TCP.
>
> -  I would argue that TCP implementations also run on a variety
> of hardware and systems, a much wider range than IS-IS platforms. And those
> TCP implementations on all those platforms manage to advertise this
> parameter (TCP window)
>
> -  I fail to understand that when some WG contributors proposed
> the use of TCP, nobody said that determining and advertising a Receive
> Window would be an issue, difficult or even impossible. But when we propose
> to advertise a Receive Window in an IS-IS TLV, this becomes difficult or
> even impossible for some platforms. Can anyone help me understand the
> technical difference?
>
>
>
>
>
> Bruno, I was waiting for that ;-)
>
> [Bruno2] Good ;-)
>
>
>
> Discounted for the fact that I'm not a major TCP expert: TCP is a very
> different beast. it has a 100-200msec fast timer & 500msec slow (which have
> to be quite accurate, it's really one timer for all connections + mbuf
> and other magic) and it sends a _lot_ of packets back and forth with window
> size indications so the negotiation can happen very quickly.  Also, TCP
> can detect losses based on sequence number r

Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-04-23 Thread bruno.decraene
Tony,
Thanks for engaging.
Please inline [Bruno2]



From: Tony Przygienda [mailto:tonysi...@gmail.com]
Sent: Wednesday, April 22, 2020 9:25 PM
To: DECRAENE Bruno TGI/OLN
Cc: lsr@ietf.org; Les Ginsberg (ginsberg)
Subject: Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed



On Wed, Apr 22, 2020 at 11:03 AM 
mailto:bruno.decra...@orange.com>> wrote:
Tony, all,

Thanks Tony for the technical and constructive feedback.
Please inline [Bruno]

From: Tony Przygienda [mailto:tonysi...@gmail.com<mailto:tonysi...@gmail.com>]
Sent: Wednesday, April 22, 2020 1:19 AM
To: Les Ginsberg (ginsberg)
Cc: DECRAENE Bruno TGI/OLN; lsr@ietf.org<mailto:lsr@ietf.org>
Subject: Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

neither am I aware of anything like this (i.e. per platform/product flooding 
rate constants) in any major vendor stack for whatever that's worth. It's 
simply unmaintainable, point. All major vendors have extensive product lines 
over so many changing hardware configuration/setups it is simply not viable to 
attempt precise measurements (and even then, user changing config can throw the 
stuff off in a millisecond). There may have been here and there per deployment 
scenario some "recommended config" (not something I immediately recall either) 
but that means very fixed configuration of things & how they go into networks 
and even then I'm not aware of anyone having had a "precise model of the chain 
in the box". yes, probes to measure lots and lots of stuff in the boxes exist 
but again, those are chip/linecard/backplane/chassis/routing engine specific 
and mostly used in complex test/performance scenarios and not to derive some 
kind of equations that can predict anything much ...
[Bruno] Good points.
Yet, one of the pieces of information that we propose to have advertised by the LSP receiver to 
the LSP sender is the Receive Window.

-  This is a very common parameter and algorithm. Nothing fancy nor 
reinvented. In particular it’s a parameter used by TCP.

-  I would argue that TCP implementations also run on a variety of 
hardware and systems, a much wider range than IS-IS platforms. And those TCP 
implementations on all those platforms manage to advertise this parameter (TCP 
window)

-  I fail to understand that when some WG contributors proposed the use 
of TCP, nobody said that determining and advertising a Receive Window would be 
an issue, difficult or even impossible. But when we propose to advertise a 
Receive Window in an IS-IS TLV, this becomes difficult or even impossible for 
some platforms. Can anyone help me understand the technical difference?
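
As a hedged sketch of the concept under discussion (the derivation and field names are assumptions for illustration, not the TLV defined in the draft), a receiver could derive the Receive Window it advertises from the free space in the queue feeding IS-IS:

```python
# Hypothetical derivation of a receiver-advertised Receive Window.

def advertised_receive_window(free_input_queue_slots, isis_share=0.5,
                              reserve=16):
    """Number of unacknowledged LSPs this node is willing to accept,
    derived from free slots in the input/punt queue feeding IS-IS."""
    usable = int(free_input_queue_slots * isis_share) - reserve
    return max(usable, 1)

# Example: 480 free slots, half of which may be used by IS-IS.
print(advertised_receive_window(480))   # -> 224
```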


Bruno, I was waiting for that ;-)
[Bruno2] Good ;-)

Discounted for the fact that I'm not a major TCP expert: TCP is a very 
different beast. it has a 100-200msec fast timer & 500msec slow (which have to 
be quite accurate, it's really one timer for all connections + mbuf and other 
magic) and it sends a _lot_ of packets back and forth with window size 
indications so the negotiation can happen very quickly.  Also, TCP can detect 
losses based on sequence number received contrary to routing protocols (that's 
one of the things we added in RIFT BTW) and it can retransmit quickly when it 
sees a "hole". Contrary to that in ISIS ACKs may or may not come, they may be 
bundled, hellos may or may not come and we can't retransmit stuff on 100msec 
timers either. It's an utterly different transport channel.
[Bruno2] I would distinguish two things, which I think we have done in 
https://tools.ietf.org/html/draft-decraene-lsr-isis-flooding-speed-03

-  How fast you can adapt the sending rate. This seems mostly dependent 
on the speed of the feedback loop, rather than the format of the message. E.g. in 
IS-IS the receiver can give feedback more or less quickly (e.g. depending on 
how fast/bundled it sends the PSNP). In theory, I don’t see a major difference. 
From an implementation standpoint, I’m guessing that the difference is 
probably bigger (e.g. TCP is probably lower level/closer to the 
system/hardware, than IS-IS which is more user space and possibly Platform 
Independent in some organizations)

-  How fast you can detect packet loss. I agree that TCP & IS-IS are 
very different on this. We have proposed to improve this by allowing the 
receiver to advertise to the sender how fast it will ack the LSP. Currently the 
timer/behavior is known to the receiver but not to the sender. Hence the sender 
needs to assume the worst case (ISO default).
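
An illustrative sketch of that second point (an assumption about how a sender could use such a receiver-advertised ack delay, not text from the draft):

```python
# If the receiver advertises how quickly it acknowledges LSPs via PSNP, the
# sender can declare a loss much sooner than with the ISO default timers.

import time

def detect_losses(tx_times, acked_ids, advertised_ack_delay_s,
                  margin_s=0.05, now=None):
    """Return LSP IDs considered lost: transmitted, still unacknowledged, and
    older than the receiver-advertised ack delay plus a small margin."""
    now = time.monotonic() if now is None else now
    deadline = advertised_ack_delay_s + margin_s
    return [lsp_id for lsp_id, sent_at in tx_times.items()
            if lsp_id not in acked_ids and now - sent_at > deadline]

# Example: two LSPs sent 300 ms ago, one acked, receiver acks within 100 ms.
t0 = time.monotonic() - 0.3
print(detect_losses({"lsp-a": t0, "lsp-b": t0}, {"lsp-a"}, 0.1))  # ['lsp-b']
```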

In more abstract terms, TCP is a sliding N-window protocol (where N is adjusted 
all the time & losses can be efficiently detected) whereas LSR flooding is not 
a windowing protocol (or rather LSDB-size window protocol with selective 
retransmission but no detection of loss [or only very slow based on lack of ACK 
& CSNPs]). Disadvantage of something like TCP (I think I sent out the RFC with 
UDP control pr

Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-04-23 Thread Peter Psenak
, and ask network
operator to configure them?

Thank you,

--Bruno

*From:*DECRAENE Bruno TGI/OLN
*Sent:* Wednesday, February 26, 2020 8:03 PM
*To:* 'Les Ginsberg (ginsberg)'
*Cc:* lsr@ietf.org <mailto:lsr@ietf.org>
*Subject:* RE: Flow Control Discussion for IS-IS Flooding Speed

Les,

Please see inline[Bruno]

*From:*Lsr [mailto:lsr-boun...@ietf.org] *On Behalf Of *Les Ginsberg
(ginsberg)
*Sent:* Wednesday, February 19, 2020 3:32 AM
*To:* lsr@ietf.org <mailto:lsr@ietf.org>
*Subject:* Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

Base protocol operation of the Update process tracks the flooding of

LSPs/interface and guarantees timer-based retransmission on P2P interfaces

until an acknowledgment is received.

Using this base protocol mechanism in combination with exponential
backoff of the

retransmission timer provides flow control in the event of temporary
overload

of the receiver.

This mechanism works without protocol extensions, is dynamic, operates

independent of the reason for delayed acknowledgment (dropped packets, CPU

overload), and does not require additional signaling during the overloaded

period.

This is consistent with the recommendations in RFC 4222 (OSPF).
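
As a minimal sketch of the behaviour described above (the semantics and timer values are assumed for illustration; ISO 10589's default LSP retransmission interval is 5 seconds):

```python
# Per-interface timer-based retransmission of unacknowledged LSPs, with the
# interval backing off exponentially while acknowledgements fail to arrive.

class RetransmitState:
    def __init__(self, base_interval_s=5.0, max_interval_s=60.0):
        self.base = base_interval_s
        self.cap = max_interval_s
        self.attempts = 0

    def next_interval(self):
        # 5 s, 10 s, 20 s, 40 s, 60 s, 60 s, ... until an ack resets it.
        interval = min(self.base * (2 ** self.attempts), self.cap)
        self.attempts += 1
        return interval

    def on_ack(self):
        self.attempts = 0

r = RetransmitState()
print([r.next_interval() for _ in range(6)])   # [5.0, 10.0, 20.0, 40.0, 60.0, 60.0]
```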

Receiver-based flow control (as proposed in
https://datatracker.ietf.org/doc/draft-decraene-lsr-isis-flooding-speed/ )

requires protocol extensions and introduces additional signaling during

periods of high load. The asserted reason for this is to optimize
throughput -

but there is no evidence that it will achieve this goal.

Mention has been made to TCP-like flow control mechanisms as a model - which

are indeed receiver based. However, there are significant differences
between

TCP sessions and IGP flooding.

TCP consists of a single session between two endpoints. Resources

(primarily buffer space) for this session are typically allocated in the

control plane and current usage is easily measurable.

IGP flooding is point-to-multi-point, resources to support IGP flooding

involve both control plane queues and dataplane queues, both of which are

typically not per interface - nor even dedicated to a particular protocol

instance. What input is required to optimize receiver-based flow control
is not fully specified.

https://datatracker.ietf.org/doc/draft-decraene-lsr-isis-flooding-speed/
suggests (Section 5) that the values

to be advertised:

"use a formula based on an off line tests of

 the overall LSPDU processing speed for a particular set of hardware

 and the number of interfaces configured for IS-IS"

implying that the advertised value is intentionally not dynamic. As such,

it could just as easily be configured on the transmit side and not require

additional signaling. As a static value, it would necessarily be somewhat

conservative as it has to account for the worst case under the current

configuration - which means it needs to consider concurrent use of the CPU

and dataplane by all protocols/features which are enabled on a router -
not all of whose

use is likely to be synchronized with peak IS-IS flooding load.
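
A hypothetical sketch of the "static value" approach being criticised here (the formula and numbers are invented for illustration): derive one conservative per-neighbor rate from an offline measurement and advertise or simply configure it, instead of adapting dynamically.

```python
# Offline-derived, conservative per-neighbor flooding rate.

def static_advertised_rate(offline_lsp_rate_pps, num_isis_interfaces,
                           safety_factor=0.5):
    """Conservative LSPs/second a receiver claims it can absorb per neighbor,
    assuming every neighbor may flood at the same time."""
    return max(1, int(offline_lsp_rate_pps * safety_factor
                      / num_isis_interfaces))

# Example: a platform measured offline at 4000 LSPs/s, 50 IS-IS interfaces.
print(static_advertised_rate(4000, 50))   # -> 40 LSPs/s per neighbor
```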

[Bruno] _/Assuming/_ that the parameters are static, those parameters

- are the same as the ones implemented (configured) on multiple
implementations, including the one from your employer. Now do you
believe that those parameters can be determined?

  - If yes, how do you do it _/today/_ on your implementation? (this seems to
contradict your statement that you have no way to figure out how to find
the right value)

  - If no, why did you implement those parameters, and ask network operators
to configure them?

  - There is also the option to reply: I don’t know but don’t care as I
leave the issue to the network operator.

- can still provide some form of dynamicity, by using the PSNP as a dynamic
acknowledgement.

- are really dependent on the receiver, not the sender.

  - the sender will never overload itself.

  - The receiver has more information, knowing its processing power (low
end, high end, 80s, 20s (currently we are stuck with 20 years old value
assuming the worst possible receiver (and worst there were, including
with packet processing partly done in the control plane processor)), its
expected IS-IS load (#neighbors), its preference for bursty LSP
reception (high delay between IS-IS CPU allocation cycles, memory not an
issue up to x kilo-octet…), its expected control plane load if IS-IS
traffic has not higher priority over other control plane traffic…), it’s
expected level of QoS prioritization [1]

[1] look for “Extended SPD Headroom”. E.g. “Since IGP and link
stability are more tenuous and more crucial than BGP stability, such
packets are now given the highest priority and are given extended SPD
headroom with a default of 10 packets. This means that these packets are
not dropped if the size of the input hold queue is lower than 185 (input
queue default size + spd headroom size + spd extended headroom).”

- And this is for a distributed architecture, 15 years ago. So what

Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-04-23 Thread bruno.decraene
S-IS Flooding Speed
> > 
> > Les,
> > 
> > After nearly 2 months, can we expect an answer from your side?
> > 
> > More specifically, the below question
> > 
> > [Bruno] _Assuming_ that the parameters are static, the parameters 
> > proposed in draft-decraene-lsr-isis-flooding-speed are the same as the 
> > one implemented (configured) on multiple implementations, including the 
> > one from your employer.
> > 
> > Now do you believe that those parameters can be determined?
> > 
> > §  If yes, how do you do _today_ on your implementation? (this seems to 
> > contradict your statement that you have no way to figure out how to find 
> > the right value)
> > 
> > §  If no, why did you implement those parameters, and ask network 
> > operator to configure them?
> > 
> > Thank you,
> > 
> > --Bruno
> > 
> > *From:*DECRAENE Bruno TGI/OLN
> > *Sent:* Wednesday, February 26, 2020 8:03 PM
> > *To:* 'Les Ginsberg (ginsberg)'
> > *Cc:* lsr@ietf.org <mailto:lsr@ietf.org>
> > *Subject:* RE: Flow Control Discussion for IS-IS Flooding Speed
> > 
> > Les,
> > 
> > Please see inline[Bruno]
> > 
> > *From:*Lsr [mailto:lsr-boun...@ietf.org] *On Behalf Of *Les Ginsberg 
> > (ginsberg)
> > *Sent:* Wednesday, February 19, 2020 3:32 AM
> > *To:* lsr@ietf.org <mailto:lsr@ietf.org>
> > *Subject:* Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed
> > 
> > Base protocol operation of the Update process tracks the flooding of
> > 
> > LSPs/interface and guarantees timer-based retransmission on P2P interfaces
> > 
> > until an acknowledgment is received.
> > 
> > Using this base protocol mechanism in combination with exponential 
> > backoff of the
> > 
> > retransmission timer provides flow control in the event of temporary 
> > overload
> > 
> > of the receiver.
> > 
> > This mechanism works without protocol extensions, is dynamic, operates
> > 
> > independent of the reason for delayed acknowledgment (dropped packets, CPU
> > 
> > overload), and does not require additional signaling during the overloaded
> > 
> > period.
> > 
> > This is consistent with the recommendations in RFC 4222 (OSPF).
> > 
> > Receiver-based flow control (as proposed in 
> > https://datatracker.ietf.org/doc/draft-decraene-lsr-isis-flooding-speed/ )
> > 
> > requires protocol extensions and introduces additional signaling during
> > 
> > periods of high load. The asserted reason for this is to optimize 
> > throughput -
> > 
> > but there is no evidence that it will achieve this goal.
> > 
> > Mention has been made to TCP-like flow control mechanisms as a model - which
> > 
> > are indeed receiver based. However, there are significant differences 
> > between
> > 
> > TCP sessions and IGP flooding.
> > 
> > TCP consists of a single session between two endpoints. Resources
> > 
> > (primarily buffer space) for this session are typically allocated in the
> > 
> > control plane and current usage is easily measurable..
> > 
> > IGP flooding is point-to-multi-point, resources to support IGP flooding
> > 
> > involve both control plane queues and dataplane queues, both of which are
> > 
> > typically not per interface - nor even dedicated to a particular protocol
> > 
> > instance. What input is required to optimize receiver-based flow control 
> > is not fully specified.
> > 
> > https://datatracker.ietf.org/doc/draft-decraene-lsr-isis-flooding-speed/ 
> > suggests (Section 5) that the values
> > 
> > to be advertised:
> > 
> > "use a formula based on an off line tests of
> > 
> > the overall LSPDU processing speed for a particular set of hardware
> > 
> > and the number of interfaces configured for IS-IS"
> > 
> > implying that the advertised value is intentionally not dynamic. As such,
> > 
> > it could just as easily be configured on the transmit side and not require
> > 
> > additional signaling. As a static value, it would necessarily be somewhat
> > 
> > conservative as it has to account for the worst case under the current
> > 
> > configuration - which means it needs to consider concurrent use of the CPU
> > 
> > and dataplane by all protocols/features which are enabled on a router - 
> > not all of whose
> > 
> > use is likely to be synchronized with peak IS-IS flooding load.
> > 
> > [Bruno] _/Ass

Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-04-23 Thread Peter Psenak

Bruno,

On 22/04/2020 20:04, bruno.decra...@orange.com wrote:

Les,

Please see inline

*From:*Les Ginsberg (ginsberg) [mailto:ginsb...@cisco.com]
*Sent:* Tuesday, April 21, 2020 11:48 PM
*To:* DECRAENE Bruno TGI/OLN
*Cc:* lsr@ietf.org
*Subject:* RE: Flow Control Discussion for IS-IS Flooding Speed

Bruno –

You have made an assumption that historically vendors have tuned LSP 
transmission rates to a platform specific value.


[Bruno] I don’t think so. What makes you think so?

In all cases, that is not my assumption, and for multiple reasons.

That certainly is not true in the case of my employer (happy to hear 
what other vendors have been doing for the past 20 years).


The default value is based on minimumBroadcastLSPTransmissionInterval 
specified in ISO10589.


A knob is available to allow tuning (faster or slower) in a given 
deployment – though in my experience this knob is rarely used.


[Bruno] I would agree on both. More interesting is the why: why 
aren’t those existing sending parameters tuned, while we agree that 
the default configurations are suboptimal?


My take is that:

- We don’t want to overload the receiver and create loss of LSPs (as this 
will delay the LSDB synchronization and decrease the goodput). Hence 
this (the parameters) is receiver dependent (e.g., platform type, number 
of IGP adjacencies, …).


- We don’t know which value to configure: difficult for the network 
operator to evaluate without significant knowledge of the receiver 
implementation (in particular QoS parameters for incoming LSPs), vendors 
are not really proposing values or guidance, especially since the sending 
behavior is not standardized and slightly different between vendors. So 
everyone stays safe with the default 20-year-old values.


We already discuss in 
https://tools.ietf.org/html/draft-ginsberg-lsr-isis-flooding-scale-02#section-2 
that this common interpretation isn’t the most appropriate, but 
historically the concern has been to avoid flooding too fast – not to 
optimize flooding speed.


Obviously, we are revisiting that approach and saying it needs to change 
– which is something I think we have consensus on.


I have provided a description in my recent response as to why it is 
difficult to derive an optimal value on a per platform basis. You may 
disagree – but please do not claim that we actually have been doing this 
for years. We haven’t been.


[Bruno] I’m not sure how to rephrase my below email so that we could 
understand each other, have an answer from your side (I mean employer 
side), and progress.


More succinctly: How does a network operator correctly configure the 
flooding parameter values on your employer’s implementation? In 
particular, given the variety of conditions on the LSP receiver side.


the answer is test and see which value fits best in your specific 
environment.


One reason to have some sort of feedback mechanism (be it tx- or 
rx-based) is to avoid the need to tune today's static parameters and flood 
as fast as the receiver is able to consume and slow down if the receiver 
is not able to cope with the rate we flood.
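As a rough illustration of such a loop (hypothetical only, not taken from either draft; the bounds and step sizes are invented), a sender could keep probing upward while the receiver keeps up and back off multiplicatively on a congestion signal:

    # Hypothetical adaptive flooding-rate loop (illustration only).
    MIN_RATE = 10.0     # LSPs per second, assumed floor
    MAX_RATE = 5000.0   # LSPs per second, assumed ceiling

    def adjust_rate(current_rate: float, receiver_congested: bool) -> float:
        """Return the next flooding rate after one round of feedback."""
        if receiver_congested:
            # Multiplicative decrease when the receiver signals stress.
            return max(MIN_RATE, current_rate / 2.0)
        # Additive increase while the receiver keeps up.
        return min(MAX_RATE, current_rate + 50.0)

The open question in the thread is where the congestion signal comes from: a value advertised by the receiver, or something observed locally by the transmitter such as unacknowledged LSPs.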


thanks,
Peter








--Bruno

   Les

*From:*bruno.decra...@orange.com 
*Sent:* Monday, April 20, 2020 4:47 AM
*To:* Les Ginsberg (ginsberg) 
*Cc:* lsr@ietf.org
*Subject:* RE: Flow Control Discussion for IS-IS Flooding Speed

Les,

After nearly 2 months, can we expect an answer from your side?

More specifically, the below question

[Bruno] _Assuming_ that the parameters are static, the parameters 
proposed in draft-decraene-lsr-isis-flooding-speed are the same as the 
one implemented (configured) on multiple implementations, including the 
one from your employer.


Now do you believe that those parameters can be determined?

§  If yes, how do you do _today_ on your implementation? (this seems to 
contradict your statement that you have no way to figure out how to find 
the right value)


§  If no, why did you implement those parameters, and ask network 
operator to configure them?


Thank you,

--Bruno

*From:*DECRAENE Bruno TGI/OLN
*Sent:* Wednesday, February 26, 2020 8:03 PM
*To:* 'Les Ginsberg (ginsberg)'
*Cc:* lsr@ietf.org <mailto:lsr@ietf.org>
*Subject:* RE: Flow Control Discussion for IS-IS Flooding Speed

Les,

Please see inline[Bruno]

*From:*Lsr [mailto:lsr-boun...@ietf.org] *On Behalf Of *Les Ginsberg 
(ginsberg)

*Sent:* Wednesday, February 19, 2020 3:32 AM
*To:* lsr@ietf.org <mailto:lsr@ietf.org>
*Subject:* Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

Base protocol operation of the Update process tracks the flooding of

LSPs/interface and guarantees timer-based retransmission on P2P interfaces

until an acknowledgment is received.

Using this base protocol mechanism in combination with exponential 
backoff of the


retransmission timer provides flow control in the event of temporary 
overload


of the receiver.

This mechanism works without protocol extensions, is dynamic, operates

independent o

Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-04-22 Thread Tony Przygienda
On Wed, Apr 22, 2020 at 11:03 AM  wrote:

> Tony, all,
>
>
>
> Thanks Tony for the technical and constructive feedback.
>
> Please see inline [Bruno]
>
>
>
> *From:* Tony Przygienda [mailto:tonysi...@gmail.com]
> *Sent:* Wednesday, April 22, 2020 1:19 AM
> *To:* Les Ginsberg (ginsberg)
> *Cc:* DECRAENE Bruno TGI/OLN; lsr@ietf.org
> *Subject:* Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed
>
>
>
> neither am I aware of anything like this (i.e. per platform/product
> flooding rate constants) in any major vendor stack for whatever that's
> worth. It's simply unmaintanable, point. All major vendors have extensive
> product lines over so many changing hardware configuration/setups it is
> simply not viable to attempt precise measurements (and even then, user
> changing config can throw the stuff off in a millisecond). There may have
> been here and there per deployment scenario some "recommended config"
> (not something I immediately recall either) but that means very fixed
> configuration of things & how they go into networks and even then I'm not
> aware of anyone having had a "precise model of the chain in the box". yes,
> probes to measure lots and lots of stuff in the boxes exist but again,
> those are chip/linecard/backplane/chassis/routing engine specific and
> mostly used in complex test/performance scenarios and not to derive some
> kind of equations that can predict anything much ...
>
> [Bruno] Good points.
>
> Yet, one of the pieces of information that we propose the LSP receiver
> advertise to the LSP sender is the Receive Window.
>
> -  This is a very common parameter and algorithm. Nothing fancy
> nor reinvented. In particular it’s a parameter used by TCP.
>
> -  I would argue that TCP implementations also run on a variety
> of hardware and systems, a much wider range than IS-IS platforms. And those
> TCP implementations on all those platforms manage to advertise this
> parameter (TCP window)
>
> -  I fail to understand that when some WG contributors proposed
> the use of TCP, nobody said that determining and advertising a Receive
> Window would be an issue, difficult or even impossible. But when we propose
> to advertise a Receive Window in an IS-IS TLV, this becomes difficult or
> even impossible for some platforms. Can anyone help me understand the
> technical difference?
>
>
>

Bruno, I was waiting for that ;-) Discounting for the fact that I'm not a
major TCP expert: TCP is a very different beast. It has a 100-200msec fast
timer & 500msec slow (which have to be quite accurate, it's really one
timer for all connections + mbuf and other magic) and it sends a _lot_ of
packets back and forth with window size indications so the negotiation can
happen very quickly.  Also, TCP can detect losses based on sequence number
received contrary to routing protocols (that's one of the things we added
in RIFT BTW) and it can retransmit quickly when it sees a "hole". Contrary
to that in ISIS ACKs may or may not come, they may be bundled, hellos may
or may not come and we can't retransmit stuff on 100msec timers either.
It's an utterly different transport channel.

In more abstract terms, TCP is a sliding N-window protocol (where N is
adjusted all the time & losses can be efficiently detected) whereas LSR
flooding is not a windowing protocol (or rather LSDB-size window protocol
with selective retransmission but no detection of loss [or only very slow
based on lack of ACK & CSNPs]). Disadvantage of something like TCP (I think
I sent out the RFC with UDP control protocol work to make it more TCP like)
is that you are stuck when you put something into the pipe, no
prioritization possible and if receiver is slow you may have multiple
obsolete copies in the pipe waiting & lots of retransmission BW when holes are
punched into the data through loss. And plain TCP is actually quite bad
for control protocol traffic @ scale, almost all vendors run a special version
of it for BGP for that reason. Why that is is out of scope of this list I
think ... Flooding is really good to send lots of data prioritized/in
parallel but on losses re-TX is slow.
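To make that contrast concrete, a hypothetical sketch of the flooding side (the timer value is an assumption): each interface keeps a retransmission list, and loss is only inferred when the timer fires without an ack, never from a sequence "hole" as in TCP:

    import time

    RETX_INTERVAL = 5.0  # seconds; assumed retransmission interval

    class FloodingInterface:
        """Per-interface tracking: an LSP stays listed until acknowledged."""
        def __init__(self):
            self.unacked = {}  # lsp_id -> time of last transmission

        def send_lsp(self, lsp_id):
            self.unacked[lsp_id] = time.monotonic()

        def ack_received(self, lsp_id):
            # A PSNP/CSNP covering this LSP clears it from the list.
            self.unacked.pop(lsp_id, None)

        def due_for_retransmission(self):
            # Loss is only detected here, by timeout -- there is no per-packet
            # sequence space that would expose a "hole" the way TCP does.
            now = time.monotonic()
            return [lsp for lsp, sent in self.unacked.items()
                    if now - sent >= RETX_INTERVAL]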


> Bruno, if you're so deeply interested in that stuff we can talk 1:1
> off-line about implementation work on rift towards adaptable flooding
> rate
>
> [Bruno] Sure. My pleasure. Please propose a timeslot offline. Please
> note that I’m based in Europe (CEST), so a priori during your morning and
> my evening.
>
> If you can also extend the offer to discuss the implementation work on the
> IS-IS implementation of your employer with regards to adaptable flooding
> rate, and/or how network operator can configure the CLI parameters of the
> LSP senders so as to improve flooding rate without overloading th

Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-04-22 Thread bruno.decraene
Les,

Please see inline

From: Les Ginsberg (ginsberg) [mailto:ginsb...@cisco.com]
Sent: Tuesday, April 21, 2020 11:48 PM
To: DECRAENE Bruno TGI/OLN
Cc: lsr@ietf.org
Subject: RE: Flow Control Discussion for IS-IS Flooding Speed

Bruno -

You have made an assumption that historically vendors have tuned LSP 
transmission rates to a platform specific value.
[Bruno] I don't think so. What makes you think so?
In all cases, that is not my assumption, and for multiple reasons.

That certainly is not true in the case of my employer (happy to hear what other 
vendors have been doing for the past 20 years).

The default value is based on minimumBroadcastLSPTransmissionInterval specified 
in ISO10589.
A knob is available to allow tuning (faster or slower) in a given deployment - 
though in my experience this knob is rarely used.
[Bruno] I would agree on both. More interesting is the why: why aren't those 
existing sending parameters tuned, while we agree that the default 
configurations are suboptimal?
My take is that:

-  We don't want to overload the receiver and create loss of LSPs (as 
this will delay the LSDB synchronization and decrease the goodput). Hence this 
(the parameters) is receiver dependent (e.g., platform type, number of IGP 
adjacencies, ...).

-  We don't know which value to configure: difficult for the network 
operator to evaluate without significant knowledge of the receiver 
implementation (in particular QoS parameters for incoming LSPs), vendors are not 
really proposing values or guidance, especially since the sending behavior is 
not standardized and slightly different between vendors. So everyone stays safe 
with the default 20-year-old values.

We already discuss in 
https://tools.ietf.org/html/draft-ginsberg-lsr-isis-flooding-scale-02#section-2 
that this common interpretation isn't the most appropriate, but historically 
the concern has been to avoid flooding too fast - not to optimize flooding 
speed.
Obviously, we are revisiting that approach and saying it needs to change - 
which is something I think we have consensus on.

I have provided a description in my recent response as to why it is difficult 
to derive an optimal value on a per platform basis. You may disagree - but 
please do not claim that we actually have been doing this for years. We haven't 
been.
[Bruno] I'm not sure how to rephrase my below email so that we could understand 
each other, have an answer from your side (I mean employer side), and progress.
More succinctly: How does a network operator correctly configure the flooding 
parameter values on your employer's implementation? In particular, given 
the variety of conditions on the LSP receiver side.

--Bruno

  Les

From: bruno.decra...@orange.com 
Sent: Monday, April 20, 2020 4:47 AM
To: Les Ginsberg (ginsberg) 
Cc: lsr@ietf.org
Subject: RE: Flow Control Discussion for IS-IS Flooding Speed

Les,

After nearly 2 months, can we expect an answer from your side?

More specifically, the below question

[Bruno] _Assuming_ that the parameters are static, the parameters proposed in 
draft-decraene-lsr-isis-flooding-speed are the same as the one implemented 
(configured) on multiple implementations, including the one from your employer.
Now do you believe that those parameters can be determined?

§  If yes, how do you do _today_ on your implementation? (this seems to 
contradict your statement that you have no way to figure out how to find the 
right value)

§  If no, why did you implement those parameters, and ask network operator to 
configure them?


Thank you,
--Bruno

From: DECRAENE Bruno TGI/OLN
Sent: Wednesday, February 26, 2020 8:03 PM
To: 'Les Ginsberg (ginsberg)'
Cc: lsr@ietf.org<mailto:lsr@ietf.org>
Subject: RE: Flow Control Discussion for IS-IS Flooding Speed

Les,

Please see inline[Bruno]

From: Lsr [mailto:lsr-boun...@ietf.org] On Behalf Of Les Ginsberg (ginsberg)
Sent: Wednesday, February 19, 2020 3:32 AM
To: lsr@ietf.org<mailto:lsr@ietf.org>
Subject: Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

Base protocol operation of the Update process tracks the flooding of
LSPs/interface and guarantees timer-based retransmission on P2P interfaces
until an acknowledgment is received.

Using this base protocol mechanism in combination with exponential backoff of 
the
retransmission timer provides flow control in the event of temporary overload
of the receiver.

This mechanism works without protocol extensions, is dynamic, operates
independent of the reason for delayed acknowledgment (dropped packets, CPU
overload), and does not require additional signaling during the overloaded
period.

This is consistent with the recommendations in RFC 4222 (OSPF).

Receiver-based flow control (as proposed in 
https://datatracker.ietf.org/doc/draft-decraene-lsr-isis-flooding-speed/ )
requires protocol extensions and introduces additional signaling during
periods of high load. The asserted reason for this is to optimize throughput -
but there is

Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-04-22 Thread bruno.decraene
Tony, all,

Thanks Tony for the technical and constructive feedback.
Please see inline [Bruno]

From: Tony Przygienda [mailto:tonysi...@gmail.com]
Sent: Wednesday, April 22, 2020 1:19 AM
To: Les Ginsberg (ginsberg)
Cc: DECRAENE Bruno TGI/OLN; lsr@ietf.org
Subject: Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

neither am I aware of anything like this (i.e. per platform/product flooding 
rate constants) in any major vendor stack for whatever that's worth. It's 
simply unmaintainable, point. All major vendors have extensive product lines 
over so many changing hardware configuration/setups it is simply not viable to 
attempt precise measurements (and even then, user changing config can throw the 
stuff off in a millisecond). There may have been here and there per deployment 
scenario some "recommended config" (not something I immediately recall either) 
but that means very fixed configuration of things & how they go into networks 
and even then I'm not aware of anyone having had a "precise model of the chain 
in the box". yes, probes to measure lots and lots of stuff in the boxes exist 
but again, those are chip/linecard/backplane/chassis/routing engine specific 
and mostly used in complex test/performance scenarios and not to derive some 
kind of equations that can predict anything much ...
[Bruno] Good points.
Yet, one of the pieces of information that we propose the LSP receiver 
advertise to the LSP sender is the Receive Window.

-  This is a very common parameter and algorithm. Nothing fancy nor 
reinvented. In particular it’s a parameter used by TCP.

-  I would argue that TCP implementations also run on a variety of 
hardware and systems, a much wider range than IS-IS platforms. And those TCP 
implementations on all those platforms manage to advertise this parameter (TCP 
window)

-  I fail to understand that when some WG contributors proposed the use 
of TCP, nobody said that determining and advertising a Receive Window would be 
an issue, difficult or even impossible. But when we propose to advertise a 
Receive Window in an IS-IS TLV, this becomes difficult or even impossible for 
some platforms. Can anyone help me understand the technical difference?
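For concreteness, a hypothetical sketch of the sender-side bookkeeping a Receive Window would imply (the class and field names are invented for illustration; the TLV encoding itself is left to the draft):

    class NeighborFlowState:
        """Sender-side view of one neighbor that advertised a Receive Window."""
        def __init__(self, advertised_window: int):
            self.window = advertised_window  # learned from the neighbor, e.g. via IIH/SNP
            self.in_flight = 0               # LSPs sent but not yet acknowledged

        def can_send(self) -> bool:
            return self.in_flight < self.window

        def on_send(self):
            self.in_flight += 1

        def on_ack(self, count: int = 1):
            self.in_flight = max(0, self.in_flight - count)

        def on_window_update(self, new_window: int):
            # The receiver may re-advertise a smaller window under stress.
            self.window = new_window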

Bruno, if you're so deeply interested in that stuff we can talk 1:1 off-line 
about implementation work on rift towards adaptable flooding rate
[Bruno] Sure. My pleasure. Please propose a timeslot offline. Please note 
that I’m based in Europe (CEST), so a priori during your morning and my evening.
If you can also extend the offer to discuss the implementation work on the 
IS-IS implementation of your employer with regards to adaptable flooding rate, 
and/or how a network operator can configure the CLI parameters of the LSP senders 
so as to improve flooding rate without overloading the Juniper receiver 
(possibly depending on the capability of the receiver, its number of IS-IS 
neighbors… and/or whatever parameters you feel are relevant) that would be 
most appreciated. And if you believe that the Juniper LSP receiver can handle 
any rate from any reasonable (e.g. 50) number of IGP neighbors, without 
(significantly) dropping the received LSPs, that would be even simpler, please 
advise.

--Bruno
(algorithm you see in the -02 draft Les put out is a _very rough_ approximation 
of that BTW. I joined as co-author after we had some very fruitful discussions 
and I consider the draft close to what can be _realistically_ done  today in 
ISIS. I don't consider further details generic enough to merit wide forum 
discussions). And RIFT put a couple of things into packet formats we can't put 
into ISIS (I talked with Les about it) to improve the adaptability of the 
flooding rate BTW and some interesting, coarse indication from receiver. Again, 
this is not a constant that is calculated, it's all adaptive driven almost 
completely from the transmitter side and the feedback it gathers.

all my very own 2c

-- tony

On Tue, Apr 21, 2020 at 2:48 PM Les Ginsberg (ginsberg) wrote:
Bruno –

You have made an assumption that historically vendors have tuned LSP 
transmission rates to a platform specific value.
That certainly is not true in the case of my employer (happy to hear what other 
vendors have been doing for the past 20 years).

The default value is based on minimumBroadcastLSPTransmissionInterval specified 
in ISO10589.
A knob is available to allow tuning (faster or slower) in a given deployment – 
though in my experience this knob is rarely used.

We already discuss in 
https://tools.ietf.org/html/draft-ginsberg-lsr-isis-flooding-scale-02#section-2 
that this common interpretation isn’t the most appropriate, but historically 
the concern has been to avoid flooding too fast – not to optimize flooding 
speed.
Obviously, we are revisiting that approach and saying it needs to change – 
which is something I think we have consensus on.

I have provided a des

Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-04-21 Thread Tony Przygienda
neither am I aware of anything like this (i.e. per platform/product
flooding rate constants) in any major vendor stack for whatever that's
worth. It's simply unmaintainable, point. All major vendors have extensive
product lines over so many changing hardware configuration/setups it is
simply not viable to attempt precise measurements (and even then, user
changing config can throw the stuff off in a millisecond). There may have
been here and there per deployment scenario some "recommended config" (not
something I immediately recall either) but that means very fixed
configuration of things & how they go into networks and even then I'm not
aware of anyone having had a "precise model of the chain in the box". yes,
probes to measure lots and lots of stuff in the boxes exist but again,
those are chip/linecard/backplane/chassis/routing engine specific and
mostly used in complex test/performance scenarios and not to derive some
kind of equations that can predict anything much ...

Bruno, if you're so deeply interested in that stuff we can talk 1:1
off-line about implementation work on rift towards adaptable flooding rate
(algorithm you see in the -02 draft Les put out is a _very rough_
approximation of that BTW. I joined as co-author after we had some very
fruitful discussions and I consider the draft close to what can be
_realistically_ done  today in ISIS. I don't consider further details
generic enough to merit wide forum discussions). And RIFT put a couple of
things into packet formats we can't put into ISIS (I talked with Les about
it) to improve the adaptability of the flooding rate BTW and some
interesting, coarse indication from receiver. Again, this is not a constant
that is calculated, it's all adaptive driven almost completely from the
transmitter side and the feedback it gathers.

all my very own 2c

-- tony

On Tue, Apr 21, 2020 at 2:48 PM Les Ginsberg (ginsberg)  wrote:

> Bruno –
>
>
>
> You have made an assumption that historically vendors have tuned LSP
> transmission rates to a platform specific value.
>
> That certainly is not true in the case of my employer (happy to hear what
> other vendors have been doing for the past 20 years).
>
>
>
> The default value is based on minimumBroadcastLSPTransmissionInterval
> specified in ISO10589.
>
> A knob is available to allow tuning (faster or slower) in a given
> deployment – though in my experience this knob is rarely used.
>
>
>
> We already discuss in
> https://tools.ietf.org/html/draft-ginsberg-lsr-isis-flooding-scale-02#section-2
> that this common interpretation isn’t the most appropriate, but
> historically the concern has been to avoid flooding too fast – not to
> optimize flooding speed.
>
> Obviously, we are revisiting that approach and saying it needs to change –
> which is something I think we have consensus on.
>
>
>
> I have provided a description in my recent response as to why it is
> difficult to derive an optimal value on a per platform basis. You may
> disagree – but please do not claim that we actually have been doing this
> for years. We haven’t been.
>
>
>
>   Les
>
>
>
> *From:* bruno.decra...@orange.com 
> *Sent:* Monday, April 20, 2020 4:47 AM
> *To:* Les Ginsberg (ginsberg) 
> *Cc:* lsr@ietf.org
> *Subject:* RE: Flow Control Discussion for IS-IS Flooding Speed
>
>
>
> Les,
>
>
>
> After nearly 2 months, can we expect an answer from your side?
>
>
>
> More specifically, the below question
>
>
>
> [Bruno] _Assuming_ that the parameters are static, the parameters proposed
> in draft-decraene-lsr-isis-flooding-speed are the same as the one
> implemented (configured) on multiple implementations, including the one
> from your employer.
>
> Now do you believe that those parameters can be determined?
>
>
>
> §  If yes, how do you do _today_ on your implementation? (this seems to
> contradict your statement that you have no way to figure out how to find
> the right value)
>
>
>
> §  If no, why did you implement those parameters, and ask network operator
> to configure them?
>
>
>
>
>
> Thank you,
>
> --Bruno
>
>
>
> *From:* DECRAENE Bruno TGI/OLN
> *Sent:* Wednesday, February 26, 2020 8:03 PM
> *To:* 'Les Ginsberg (ginsberg)'
> *Cc:* lsr@ietf.org
> *Subject:* RE: Flow Control Discussion for IS-IS Flooding Speed
>
>
>
> Les,
>
>
>
> Please see inline[Bruno]
>
>
>
> *From:* Lsr [mailto:lsr-boun...@ietf.org ] *On
> Behalf Of *Les Ginsberg (ginsberg)
> *Sent:* Wednesday, February 19, 2020 3:32 AM
> *To:* lsr@ietf.org
> *Subject:* Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed
>
>
>
> Base protocol operation of the Update process tracks the flooding of
>
> LSPs/i

Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-04-21 Thread Les Ginsberg (ginsberg)
Bruno -

You have made an assumption that historically vendors have tuned LSP 
transmission rates to a platform specific value.
That certainly is not true in the case of my employer (happy to hear what other 
vendors have been doing for the past 20 years).

The default value is based on minimumBroadcastLSPTransmissionInterval specified 
in ISO10589.
A knob is available to allow tuning (faster or slower) in a given deployment - 
though in my experience this knob is rarely used.

We already discuss in 
https://tools.ietf.org/html/draft-ginsberg-lsr-isis-flooding-scale-02#section-2 
that this common interpretation isn't the most appropriate, but historically 
the concern has been to avoid flooding too fast - not to optimize flooding 
speed.
Obviously, we are revisiting that approach and saying it needs to change - 
which is something I think we have consensus on.

I have provided a description in my recent response as to why it is difficult 
to derive an optimal value on a per platform basis. You may disagree - but 
please do not claim that we actually have been doing this for years. We haven't 
been.

  Les

From: bruno.decra...@orange.com 
Sent: Monday, April 20, 2020 4:47 AM
To: Les Ginsberg (ginsberg) 
Cc: lsr@ietf.org
Subject: RE: Flow Control Discussion for IS-IS Flooding Speed

Les,

After nearly 2 months, can we expect an answer from your side?

More specifically, the below question

[Bruno] _Assuming_ that the parameters are static, the parameters proposed in 
draft-decraene-lsr-isis-flooding-speed are the same as the one implemented 
(configured) on multiple implementations, including the one from your employer.
Now do you believe that those parameters can be determined?

§  If yes, how do you do _today_ on your implementation? (this seems to 
contradict your statement that you have no way to figure out how to find the 
right value)

§  If no, why did you implement those parameters, and ask network operator to 
configure them?


Thank you,
--Bruno

From: DECRAENE Bruno TGI/OLN
Sent: Wednesday, February 26, 2020 8:03 PM
To: 'Les Ginsberg (ginsberg)'
Cc: lsr@ietf.org<mailto:lsr@ietf.org>
Subject: RE: Flow Control Discussion for IS-IS Flooding Speed

Les,

Please see inline[Bruno]

From: Lsr [mailto:lsr-boun...@ietf.org] On Behalf Of Les Ginsberg (ginsberg)
Sent: Wednesday, February 19, 2020 3:32 AM
To: lsr@ietf.org<mailto:lsr@ietf.org>
Subject: Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

Base protocol operation of the Update process tracks the flooding of
LSPs/interface and guarantees timer-based retransmission on P2P interfaces
until an acknowledgment is received.

Using this base protocol mechanism in combination with exponential backoff of 
the
retransmission timer provides flow control in the event of temporary overload
of the receiver.

This mechanism works without protocol extensions, is dynamic, operates
independent of the reason for delayed acknowledgment (dropped packets, CPU
overload), and does not require additional signaling during the overloaded
period.

This is consistent with the recommendations in RFC 4222 (OSPF).
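For illustration, a hypothetical sketch of that retransmission back-off (the base interval and cap are assumed values, not numbers mandated by ISO 10589 or any draft):

    def next_retransmit_interval(attempts: int,
                                 base: float = 5.0, cap: float = 80.0) -> float:
        """Exponentially back off the per-LSP retransmission timer.

        attempts -- how many times this LSP has already been retransmitted
        base/cap -- assumed seconds; doubling stops once the cap is reached
        """
        return min(cap, base * (2 ** attempts))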

Receiver-based flow control (as proposed in 
https://datatracker.ietf.org/doc/draft-decraene-lsr-isis-flooding-speed/ )
requires protocol extensions and introduces additional signaling during
periods of high load. The asserted reason for this is to optimize throughput -
but there is no evidence that it will achieve this goal.

Mention has been made to TCP-like flow control mechanisms as a model - which
are indeed receiver based. However, there are significant differences between
TCP sessions and IGP flooding.

TCP consists of a single session between two endpoints. Resources
(primarily buffer space) for this session are typically allocated in the
control plane and current usage is easily measurable.

IGP flooding is point-to-multi-point, resources to support IGP flooding
involve both control plane queues and dataplane queues, both of which are
typically not per interface - nor even dedicated to a particular protocol
instance. What input is required to optimize receiver-based flow control is not 
fully specified.

https://datatracker.ietf.org/doc/draft-decraene-lsr-isis-flooding-speed/ 
suggests (Section 5) that the values
to be advertised:

"use a formula based on an off line tests of
   the overall LSPDU processing speed for a particular set of hardware
   and the number of interfaces configured for IS-IS"

implying that the advertised value is intentionally not dynamic. As such,
it could just as easily be configured on the transmit side and not require
additional signaling. As a static value, it would necessarily be somewhat
conservative as it has to account for the worst case under the current
configuration - which means it needs to consider concurrent use of the CPU
and dataplane by all protocols/features which are enabled on a router - not all 
of whose
use is likely to be synchronized with peak IS-IS flooding lo

Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed -- A plea for cooperation

2020-04-20 Thread tony . li

Gentlebeings,

This discussion is producing far more heat than light.  Can we please refocus 
our attentions?

I think that we all agree that the legacy parameters are no longer serving us 
well and that we need to reconsider our flooding parameters and mechanisms.

If we fail to reach a constructive consensus, we will end up with a protocol 
that severely underperforms, hurting our end user experience and allowing other
protocols to supplant what we’ve worked so hard for. It behooves us to all work 
together to find a common ground that allows all implementations to 
converge rapidly.

At this point, arguing back and forth does not seem to be helping.

Our goal is to enable rapid flooding of a large LSP database. We need everyone 
to be able to do this. While it may help an implementation to be 
rapid when operating with just itself, in today’s industrial environment, not 
being able to do this across implementations is not helpful. We need
all implementations to be able to be performant.

We have iterated on the points of the discussion repeatedly. Further repetition 
is not helpful. What is lacking is experimentation and facts. 
We need the results of running code. Packet traces with analysis of flow 
control mechanisms. Demonstrations of how particular parameters and
mechanisms perform across implementations and across hardware platform classes. 
Comparisons of our flooding mechanisms with the throughput 
of other transport protocols.

What is going to help all platforms be their best?

Don’t just tell us what you think: prove it.

The competition is not the other implementation. It is other protocols who will 
supplant everything.

Regards,
Tony

___
Lsr mailing list
Lsr@ietf.org
https://www.ietf.org/mailman/listinfo/lsr


Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-04-20 Thread tony . li

Bruno,

> While waiting for more details, as per your below email, the IS-IS receiver does 
> have a queue, sometimes dedicated to IS-IS, but in general relatively 
> dedicated to very important and time-sensitive traffic to the control plane. 
> Details are indeed implementation specific. But in general, this queue is 
> designed to protect the IS-IS traffic from lower-priority traffic, e.g. 
> bursty BGP. So can we assume that the receiver has (or at least may have) 
> such a queue, and work with this?


That would be ideal, but probably not realistic. Not all implementations are 
going to have separate queues for BGP and IS-IS.

IMHO, that’s a very good thing, of course, but not all merchant silicon is 
quite that sophisticated. Yet. And it takes years to change, so it’s best that 
we proceed without.

Tony

___
Lsr mailing list
Lsr@ietf.org
https://www.ietf.org/mailman/listinfo/lsr


Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-04-20 Thread bruno.decraene
 to be 
conservative as they would need to account for the worst case scenarios.

draft-ginsberg-lsr-isis-flooding-scale proposes dynamic flow control based on 
the state of the transmitter. In this model, there is no dependency on platform 
implementation. The number of unacknowledged LSPs sent on an interface is used 
as input to the flow control algorithm. This accounts for all reasons why a 
receiver may be slow to acknowledge without requiring knowledge of which 
stage(s) described above are affecting the receiver's ability to provide timely 
acknowledgements.

   Les


From: bruno.decra...@orange.com 
Sent: Wednesday, February 26, 2020 11:03 AM
To: Les Ginsberg (ginsberg) 
Cc: lsr@ietf.org
Subject: RE: Flow Control Discussion for IS-IS Flooding Speed

Les,

Please see inline[Bruno]

From: Lsr [mailto:lsr-boun...@ietf.org] On Behalf Of Les Ginsberg (ginsberg)
Sent: Wednesday, February 19, 2020 3:32 AM
To: lsr@ietf.org<mailto:lsr@ietf.org>
Subject: Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

Base protocol operation of the Update process tracks the flooding of
LSPs/interface and guarantees timer-based retransmission on P2P interfaces
until an acknowledgment is received.

Using this base protocol mechanism in combination with exponential backoff of 
the
retransmission timer provides flow control in the event of temporary overload
of the receiver.

This mechanism works without protocol extensions, is dynamic, operates
independent of the reason for delayed acknowledgment (dropped packets, CPU
overload), and does not require additional signaling during the overloaded
period.

This is consistent with the recommendations in RFC 4222 (OSPF).

Receiver-based flow control (as proposed in 
https://datatracker.ietf.org/doc/draft-decraene-lsr-isis-flooding-speed/ )
requires protocol extensions and introduces additional signaling during
periods of high load. The asserted reason for this is to optimize throughput -
but there is no evidence that it will achieve this goal.

Mention has been made to TCP-like flow control mechanisms as a model - which
are indeed receiver based. However, there are significant differences between
TCP sessions and IGP flooding.

TCP consists of a single session between two endpoints. Resources
(primarily buffer space) for this session are typically allocated in the
control plane and current usage is easily measurable.

IGP flooding is point-to-multi-point, resources to support IGP flooding
involve both control plane queues and dataplane queues, both of which are
typically not per interface - nor even dedicated to a particular protocol
instance. What input is required to optimize receiver-based flow control is not 
fully specified.

https://datatracker.ietf.org/doc/draft-decraene-lsr-isis-flooding-speed/ 
suggests (Section 5) that the values
to be advertised:

"use a formula based on an off line tests of
   the overall LSPDU processing speed for a particular set of hardware
   and the number of interfaces configured for IS-IS"

implying that the advertised value is intentionally not dynamic. As such,
it could just as easily be configured on the transmit side and not require
additional signaling. As a static value, it would necessarily be somewhat
conservative as it has to account for the worst case under the current
configuration - which means it needs to consider concurrent use of the CPU
and dataplane by all protocols/features which are enabled on a router - not all 
of whose
use is likely to be synchronized with peak IS-IS flooding load.
[Bruno] _Assuming_ that the parameters are static, those parameters
o  are the same as the one implemented (configured) on multiple 
implementations, including the one from your employer. Now do you believe that 
those parameters can be determined?
§  If yes, how do you do _today_ on your implementation? (this seems to 
contradict your statement that you have no way to figure out how to find the 
right value)
§  If no, why did you implement those parameters, and ask network operator to 
configure them?
§  There is also the option to reply: I don't know but don't care as I leave 
the issue to the network operator.
o  can still provide some form of dynamicity, by using the PSNP as dynamic 
acknowledgement.
o  are really dependent on the receiver, not the sender.
§  the sender will never overload itself.
§  The receiver has more information,  knowing its processing power (low end, 
high end, 80s, 20s (currently we are stuck with 20 years old value assuming the 
worst possible receiver (and worst there were, including with packet processing 
partly done in the control plane processor)), its expected IS-IS load 
(#neighbors), its preference for bursty LSP reception (high delay between IS-IS 
CPU allocation cycles, memory not an issue up to x kilo-octet...), its expected 
control plane load if IS-IS traffic does not have higher priority over other control 
plane traffic...), its expected level of QoS prioritization [1]
·  [1] looks for "
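Purely as an illustration of the kind of "off line tests" derivation referred to above (the inputs mirror the factors listed here; the formula and numbers are invented):

    def advertised_lsp_rate(per_lsp_cost_ms: float, isis_neighbors: int,
                            cpu_share_for_isis: float = 0.3) -> float:
        """Hypothetical derivation of a per-neighbor LSP rate to advertise.

        per_lsp_cost_ms    -- measured offline: ms to receive and process one LSP
        isis_neighbors     -- adjacencies that may flood to this node concurrently
        cpu_share_for_isis -- assumed fraction of control-plane capacity for IS-IS
        """
        total_rate = 1000.0 / per_lsp_cost_ms          # LSPs/s the box can absorb
        return (total_rate * cpu_share_for_isis) / max(1, isis_neighbors)

For example, 0.5 ms per LSP and 50 neighbors would yield roughly 12 LSPs per second per neighbor under these invented assumptions.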

Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-04-20 Thread bruno.decraene
Les,

After nearly 2 months, can we expect an answer from your side?

More specifically, the below question

[Bruno] _Assuming_ that the parameters are static, the parameters proposed in 
draft-decraene-lsr-isis-flooding-speed are the same as the one implemented 
(configured) on multiple implementations, including the one from your employer.
Now do you believe that those parameters can be determined?

§  If yes, how do you do _today_ on your implementation? (this seems to 
contradict your statement that you have no way to figure out how to find the 
right value)

§  If no, why did you implement those parameters, and ask network operator to 
configure them?


Thank you,
--Bruno

From: DECRAENE Bruno TGI/OLN
Sent: Wednesday, February 26, 2020 8:03 PM
To: 'Les Ginsberg (ginsberg)'
Cc: lsr@ietf.org
Subject: RE: Flow Control Discussion for IS-IS Flooding Speed

Les,

Please see inline[Bruno]

From: Lsr [mailto:lsr-boun...@ietf.org] On Behalf Of Les Ginsberg (ginsberg)
Sent: Wednesday, February 19, 2020 3:32 AM
To: lsr@ietf.org
Subject: Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

Base protocol operation of the Update process tracks the flooding of
LSPs/interface and guarantees timer-based retransmission on P2P interfaces
until an acknowledgment is received.

Using this base protocol mechanism in combination with exponential backoff of 
the
retransmission timer provides flow control in the event of temporary overload
of the receiver.

This mechanism works without protocol extensions, is dynamic, operates
independent of the reason for delayed acknowledgment (dropped packets, CPU
overload), and does not require additional signaling during the overloaded
period.

This is consistent with the recommendations in RFC 4222 (OSPF).

Receiver-based flow control (as proposed in 
https://datatracker.ietf.org/doc/draft-decraene-lsr-isis-flooding-speed/ )
requires protocol extensions and introduces additional signaling during
periods of high load. The asserted reason for this is to optimize throughput -
but there is no evidence that it will achieve this goal.

Mention has been made to TCP-like flow control mechanisms as a model - which
are indeed receiver based. However, there are significant differences between
TCP sessions and IGP flooding.

TCP consists of a single session between two endpoints. Resources
(primarily buffer space) for this session are typically allocated in the
control plane and current usage is easily measurable.

IGP flooding is point-to-multi-point, resources to support IGP flooding
involve both control plane queues and dataplane queues, both of which are
typically not per interface - nor even dedicated to a particular protocol
instance. What input is required to optimize receiver-based flow control is not 
fully specified.

https://datatracker.ietf.org/doc/draft-decraene-lsr-isis-flooding-speed/ 
suggests (Section 5) that the values
to be advertised:

"use a formula based on an off line tests of
   the overall LSPDU processing speed for a particular set of hardware
   and the number of interfaces configured for IS-IS"

implying that the advertised value is intentionally not dynamic. As such,
it could just as easily be configured on the transmit side and not require
additional signaling. As a static value, it would necessarily be somewhat
conservative as it has to account for the worst case under the current
configuration - which means it needs to consider concurrent use of the CPU
and dataplane by all protocols/features which are enabled on a router - not all 
of whose
use is likely to be synchronized with peak IS-IS flooding load.
[Bruno] _Assuming_ that the parameters are static, those parameters

o   are the same as the one implemented (configured) on multiple 
implementations, including the one from your employer. Now do you believe that 
those parameters can be determined?

§  If yes, how do you do _today_ on your implementation? (this seems to 
contradict your statement that you have no way to figure out how to find the 
right value)

§  If no, why did you implement those parameters, and ask network operator to 
configure them?

§  There is also the option to reply: I don't know but don't care as I leave 
the issue to the network operator.

o   can still provide some form of dynamicity, by using the PSNP as dynamic 
acknowledgement.

o   are really dependent on the receiver, not the sender.

§  the sender will never overload itself.

§  The receiver has more information,  knowing its processing power (low end, 
high end, 80s, 20s (currently we are stuck with 20 years old value assuming the 
worst possible receiver (and worst there were, including with packet processing 
partly done in the control plane processor)), its expected IS-IS load 
(#neighbors), its preference for bursty LSP reception (high delay between IS-IS 
CPU allocation cycles, memory not an issue up to x kilo-octet...), its expected 
control plane load if IS-IS traffic has not higher priority over oth

Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-04-18 Thread Tony Przygienda
>
> Obtaining feedback from earlier stages requires real-time updates from
> data plane to IS-IS in the control plane. This is much more challenging.
> Idiosyncrasies of platforms will have a significant impact on how to
> meaningfully interpret and use the data. How to integrate data from the
> various stages – especially when the numbers are not specific to IS-IS
> packets – is not intuitive.
>
>
>
> https://tools.ietf.org/html/draft-decraene-lsr-isis-flooding-speed-03
> provides no guidance on how information from the dataplane could be used.
>
> As an alternative, it suggests that static parameters derived from offline
> tests could be advertised. But static parameters would necessarily have to
> be conservative as they would need to account for the worst case scenarios.
>
>
>
> draft-ginsberg-lsr-isis-flooding-scale proposes dynamic flow control based
> on the state of the transmitter. In this model, there is no dependency on
> platform implementation. The number of unacknowledged LSPs sent on an
> interface is used as input to the flow control algorithm. This accounts for
> all reasons why a receiver may be slow to acknowledge without requiring
> knowledge of which stage(s) described above are affecting the receiver’s
> ability to provide timely acknowledgements.
>
>
>
>Les
>
>
>
>
>
> *From:* bruno.decra...@orange.com 
> *Sent:* Wednesday, February 26, 2020 11:03 AM
> *To:* Les Ginsberg (ginsberg) 
> *Cc:* lsr@ietf.org
> *Subject:* RE: Flow Control Discussion for IS-IS Flooding Speed
>
>
>
> Les,
>
>
>
> Please see inline[Bruno]
>
>
>
> *From:* Lsr [mailto:lsr-boun...@ietf.org ] *On
> Behalf Of *Les Ginsberg (ginsberg)
> *Sent:* Wednesday, February 19, 2020 3:32 AM
> *To:* lsr@ietf.org
> *Subject:* Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed
>
>
>
> Base protocol operation of the Update process tracks the flooding of
>
> LSPs/interface and guarantees timer-based retransmission on P2P interfaces
>
> until an acknowledgment is received.
>
>
>
> Using this base protocol mechanism in combination with exponential backoff
> of the
>
> retransmission timer provides flow control in the event of temporary
> overload
>
> of the receiver.
>
>
>
> This mechanism works without protocol extensions, is dynamic, operates
>
> independent of the reason for delayed acknowledgment (dropped packets, CPU
>
> overload), and does not require additional signaling during the overloaded
>
> period.
>
>
>
> This is consistent with the recommendations in RFC 4222 (OSPF).
>
>
>
> Receiver-based flow control (as proposed in
> https://datatracker.ietf.org/doc/draft-decraene-lsr-isis-flooding-speed/ )
>
> requires protocol extensions and introduces additional signaling during
>
> periods of high load. The asserted reason for this is to optimize
> throughput -
>
> but there is no evidence that it will achieve this goal.
>
>
>
> Mention has been made to TCP-like flow control mechanisms as a model -
> which
>
> are indeed receiver based. However, there are significant differences
> between
>
> TCP sessions and IGP flooding.
>
>
>
> TCP consists of a single session between two endpoints. Resources
>
> (primarily buffer space) for this session are typically allocated in the
>
> control plane and current usage is easily measurable.
>
>
>
> IGP flooding is point-to-multi-point, resources to support IGP flooding
>
> involve both control plane queues and dataplane queues, both of which are
>
> typically not per interface - nor even dedicated to a particular protocol
>
> instance. What input is required to optimize receiver-based flow control
> is not fully specified.
>
>
>
> https://datatracker.ietf.org/doc/draft-decraene-lsr-isis-flooding-speed/
> suggests (Section 5) that the values
>
> to be advertised:
>
>
>
> "use a formula based on an off line tests of
>
>the overall LSPDU processing speed for a particular set of hardware
>
>and the number of interfaces configured for IS-IS"
>
>
>
> implying that the advertised value is intentionally not dynamic. As such,
>
> it could just as easily be configured on the transmit side and not require
>
> additional signaling. As a static value, it would necessarily be somewhat
>
> conservative as it has to account for the worst case under the current
>
> configuration - which means it needs to consider concurrent use of the CPU
>
> and dataplane by all protocols/features which are enabled on a router -
> not all of whose
>
> use is likely to be synchronized with peak IS-IS flooding

Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-04-17 Thread Les Ginsberg (ginsberg)
Bruno -

Returning to this old thread...

The following is a generic description of how receive path for IS-IS PDUs 
functions today - based on examination of implementations on a variety of 
platforms from my employer. It is, for obvious reasons,  generic and 
intentionally omits any discussion of implementation details.
But it is hopefully detailed enough to illustrate some of the challenges in 
regards to providing real-time  accurate feedback to the control plane that 
could be used by a receiver based flow control mechanism.

Stage 1: Input Policing

As a first step, a policer operates at ingress to rate limit the number of 
packets which will be punted for local processing. This typically operates in 
an interface independent manner - limiting the total number of packets per 
second across all interfaces.
The policer may rate limit different classes of packets or simply limit all 
types of packets. A given platform may apply a limit specific to IS-IS PDUs or 
apply a limit to a class of packets (e.g., OSPF+BGP+IS-IS combined).

Stage 2: Punt Queue Shaping

Received packets are then placed in a queue for eventual transfer to the 
control plane. The number of queues varies. In some cases a single queue for 
all packet types is used. In other cases packets are placed on different queues 
based on packet classes.
Each queue is typically bounded to a maximum number of packets. As the queue 
usage approaches a limit, shaping policies are applied to prioritize certain 
packet types over others.
An upper limit specific to IS-IS packets may be employed - or a limit may be 
applied to a larger class of packet types of which IS-IS is only one of the 
packet types in the class.

Stage 3: Transfer to control Plane

Packets are then transferred to the control plane. Control plane input queues 
may map 1-1 with the data plane queues or map many to 1.
If the incoming packets are encapsulated (for example GRE) they may be 
transferred to a media specific control plane queue to process the 
encapsulation header. In some cases encapsulation may be processed in the data 
plane prior to transfer to the control plane.

Stage 4: Transfer to IS-IS

Packets are then transferred to a queue which is read directly by IS-IS. In the 
event there are multiple IS-IS instances, implementations may choose to have a 
shared queue which drives the execution of all instances or have instance 
specific queues filtered based on incoming interface.

A single queue is typically used for all interfaces and all IS-IS packet types. 
Subsequent processing may requeue packets based on packet type e.g., separating 
processing of hellos from processing of LSPs/SNPs.
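A deliberately crude model of those four stages (all limits below are invented, and the stage-1 policer is modeled as a simple bound for brevity): drops at the earlier stages are invisible to the IS-IS process reading the stage-4 queue, which is exactly the difficulty being described:

    from collections import deque

    class BoundedQueue:
        """One stage in the punt path: packets beyond the bound are dropped."""
        def __init__(self, limit: int):
            self.q = deque()
            self.limit = limit
            self.drops = 0

        def offer(self, pkt):
            if len(self.q) >= self.limit:
                self.drops += 1        # not visible to IS-IS downstream
            else:
                self.q.append(pkt)

    # Hypothetical per-stage bounds; real platforms differ and often share
    # these queues across protocols.
    ingress_policer  = BoundedQueue(limit=1000)  # Stage 1: ingress policing
    punt_queue       = BoundedQueue(limit=500)   # Stage 2: punt queue shaping
    control_plane_q  = BoundedQueue(limit=200)   # Stage 3: transfer to control plane
    isis_input_queue = BoundedQueue(limit=100)   # Stage 4: queue read by IS-IS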

*

A receiver based flow control mechanism which attempts to make dynamic 
adjustments needs to obtain real-time feedback from one or more of the above 
stages. Monitoring the state of the input queue to IS-IS is easy to do, but 
would not account for drops at previous stages.
Obtaining feedback from earlier stages requires real-time updates from data 
plane to IS-IS in the control plane. This is much more challenging. 
Idiosyncrasies of platforms will have a significant impact on how to 
meaningfully interpret and use the data. How to integrate data from the various 
stages - especially when the numbers are not specific to IS-IS packets - is not 
intuitive.

https://tools.ietf.org/html/draft-decraene-lsr-isis-flooding-speed-03 provides 
no guidance on how information from the dataplane could be used.
As an alternative, it suggests that static parameters derived from offline 
tests could be advertised. But static parameters would necessarily have to be 
conservative as they would need to account for the worst case scenarios.

draft-ginsberg-lsr-isis-flooding-scale proposes dynamic flow control based on 
the state of the transmitter. In this model, there is no dependency on platform 
implementation. The number of unacknowledged LSPs sent on an interface is used 
as input to the flow control algorithm. This accounts for all reasons why a 
receiver may be slow to acknowledge without requiring knowledge of which 
stage(s) described above are affecting the receiver's ability to provide timely 
acknowledgements.
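A hypothetical sketch of that transmitter-side loop (the thresholds and step sizes are invented and are not the algorithm text of draft-ginsberg-lsr-isis-flooding-scale): the only input is the locally known count of unacknowledged LSPs on the interface:

    def next_tx_rate(current_rate: float, unacked_lsps: int,
                     low_mark: int = 30, high_mark: int = 90,
                     min_rate: float = 33.0, max_rate: float = 2000.0) -> float:
        """Adjust the per-interface flooding rate from the unacked-LSP count.

        All thresholds are hypothetical; the point is only that the signal is
        purely local to the transmitter and needs no new protocol machinery.
        """
        if unacked_lsps > high_mark:
            return max(min_rate, current_rate * 0.5)   # receiver falling behind
        if unacked_lsps < low_mark:
            return min(max_rate, current_rate * 1.1)   # room to probe faster
        return current_rate                            # hold steady in between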

   Les


From: bruno.decra...@orange.com 
Sent: Wednesday, February 26, 2020 11:03 AM
To: Les Ginsberg (ginsberg) 
Cc: lsr@ietf.org
Subject: RE: Flow Control Discussion for IS-IS Flooding Speed

Les,

Please see inline[Bruno]

From: Lsr [mailto:lsr-boun...@ietf.org] On Behalf Of Les Ginsberg (ginsberg)
Sent: Wednesday, February 19, 2020 3:32 AM
To: lsr@ietf.org<mailto:lsr@ietf.org>
Subject: Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

Base protocol operation of the Update process tracks the flooding of
LSPs/interface and guarantees timer-based retransmission on P2P interfaces
until an acknowledgment is received.

Using this base protocol mechanism in combination with exponential backoff of 
the
retransmission timer provides flow control in the

Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-03-10 Thread Tony Przygienda
> > > Tony –
> > >
> > > If you have a suggestion for Tx back-off algorithm please feel free to
> share.
> > > The proposal in the draft is just a suggestion.
> > > As this is a local matter there is no interoperability issue, but
> certainly documenting a better algorithm is worthwhile.
> >
> > [as WG member]
> >
> > The main thing I'm afraid of is we're just making up some new overly
> simple congestion control algorithm (are there CC experts reviewing this?);
> maybe simulate it a few ways, deploy it, and have it work poorly or make
> things worse. In any case, damn the torpedos...
> >
> > In this current algorithm how does MaxLSPTx get set? What happens if
> MaxLSPTx is too high? If its too low we could be missing a much faster
> convergence capability.
> >
> > What if we had more quality information from the receiver, could we do a
> better job here? Maybe faster ACKs, or could we include a timestamp somehow
> to calculate RTT? This is the type of data that is used by existing CC
> algorithms (https://tools.ietf.org/html/rfc4342,
> https://tools.ietf.org/html/rfc5348). Of course going through these
> documents (which I've had to do in another area) can start making one
> think "Punt to TCP" :)
> >
> > What would be nice, if we're going to attempt CC, is that the algorithm
> would be good enough to send relatively fast to start, adjust quickly if
> need be, and allow for *increasing* the send rate. The increasing part I
> think is important, if we take this work on, and I don't think it's
> currently covered.
> >
> > I also don't have a good feel for how quickly the current suggested
> algorithm adjusts its send rate when it needs to. The correct value for
> Usafe seems very much dependent on the receivers partialSNPInterval. It's
> so dependent that one might imagine it would be smart for the receiver to
> signal the value to the transmitter so that the transmitter can set Usafe
> correctly.
> >
> > Thanks,
> > Chris.
> > [as WG member]
> >
> >
> >
> > >
> > >Les (claws in check  )
> > >
> > >
> > > From: Tony Przygienda 
> > > Sent: Wednesday, February 19, 2020 11:25 AM
> > > To: Les Ginsberg (ginsberg) 
> > > Cc: Peter Psenak (ppsenak) ; Tony Li <
> tony1ath...@gmail.com>; lsr@ietf.org; tony...@tony.li
> > > Subject: Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed
> > >
> > > Having worked for last couple of years on implementation of flooding
> speeds that converge LSDBs some order of magnitudes above today's speeds
> ;-) here's a bunch of observations
> > >
> > > 1. TX side is easy and useful. My observation having gone quickly over
> the -ginsberg- draft is that you really want a better hysteresis there,
> it's a bit too vertical and you will generate oscillations rather than walk
> around the equilibrium ;-)
> > > 2. Queue per interface is fairly trivial with modern implementation
> techniques and memory sizes if done correctly. Yes, very memory constrained
> platforms are a mildly different game and kind of precondition a different
> discussion.
> > > 3. RX side is possible and somewhat useful but much harder to do well
> depending on flavor. If we're talking about the RX advertising a very
> static value to cap the flooding speed that's actually a useful knob to
> have IMO/IME. Trying to cleverly communicate to the TXer a window size is
> not only fiendishly difficult, incurs back propagation speed (not
> negligible @ those rates IME) but can easily lead to subtle flood
> starvation behaviors and lots of slow starts due to mixture of control loop
> dynamics and implementation complexity of such a scheme. Though, giving the
> TXer some hint that a backpressure is desired is however not a bad thing
> IME and can be derived fairly easily without needs for checking queue sizes
> and so on. It's observable by looking @ some standard stats on what is
> productive incoming rate on the interface. Anything smarter needs new TLVs
> on packets & then you have a problem under/oversampling based on hellos
> (too low a frequency) and ACKs (too bursty, too batchy) and flooded back
> LSPs (too unpredictable)
> > >
> > > For more details I can recommend rift draft of course ;-)
> > >
> > > otherwise I'm staying out from this mildly feline spat ;-)
> > >
> > > --- tony
> > >
> > > On Wed, Feb 19, 2020 at 9:59 AM Les Ginsberg (ginsberg) <
> ginsb...@cisco.com<mailto:ginsb...@cisco.com>> wrote:
> > > Tony -
> > >
> > > Peter has done a great job of highlighting that "single queue" is an oversimplification ...

Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-03-10 Thread Christian Hopps
> ... maybe simulate it a few ways, deploy it, and have it work poorly or make things 
> worse. In any case, damn the torpedos...
> 
> In this current algorithm how does MaxLSPTx get set? What happens if MaxLSPTx 
> is too high? If its too low we could be missing a much faster convergence 
> capability.
> 
> What if we had more quality information from the receiver, could we do a 
> better job here? Maybe faster ACKs, or could we include a timestamp somehow 
> to calculate RTT? This is the type of data that is used by existing CC 
> algorithms (https://tools.ietf.org/html/rfc4342, 
> https://tools.ietf.org/html/rfc5348). Of course going through these documents 
> (which I've had to do for in another area) can start making one think "Punt 
> to TCP" :)
> 
> What would be nice, if we're going to attempt CC, is that the algorithm would 
> be good enough to send relatively fast to start, adjust quickly if need be, 
> and allow for *increasing* the send rate. The increasing part I think is 
> important, if we take this work on, and I don't think it's currently covered.
> 
> I also don't have a good feel for how quickly the current suggested algorithm 
> adjusts its send rate when it needs to. The correct value for Usafe seems 
> very much dependent on the receivers partialSNPInterval. It's so dependent 
> that one might imagine it would be smart for the receiver to signal the value 
> to the transmitter so that the transmitter can set Usafe correctly.
> 
> Thanks,
> Chris.
> [as WG member]
> 
> 
> 
> >
> >Les (claws in check  )
> >
> >
> > From: Tony Przygienda 
> > Sent: Wednesday, February 19, 2020 11:25 AM
> > To: Les Ginsberg (ginsberg) 
> > Cc: Peter Psenak (ppsenak) ; Tony Li 
> > ; lsr@ietf.org; tony...@tony.li
> > Subject: Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed
> >
> > Having worked for last couple of years on implementation of flooding speeds 
> > that converge LSDBs some order of magnitudes above today's speeds  ;-) 
> > here's a bunch of observations
> >
> > 1. TX side is easy and useful. My observation having gone quickly over the 
> > -ginsberg- draft is that you really want a better hysterisis there, it's 
> > bit too vertical and you will generate oscillations rather than walk around 
> > the equilibrium ;-)
> > 2. Queue per interface is fairly trivial with modern implementation 
> > techniques and memory sizes if done correctly. Yes, very memory constrained 
> > platforms are a mildly different game and kind of precondition a different 
> > discussion.
> > 3. RX side is possible and somewhat useful but much harder to do well 
> > depending on flavor. If we're talking about the RX advertising a very 
> > static value to cap the flooding speed that's actually a useful knob to 
> > have IMO/IME. Trying to cleverly communicate to the TXer a window size is 
> > not only fiendishly difficult, incurs back propagation speed (not 
> > neglectible @ those rates IME) but can easily lead to subtle flood 
> > starvation behaviors and lots of slow starts due to mixture of control loop 
> > dynamics and implementation complexity of such a scheme. Though, giving the 
> > TXer some hint that a backpressure is desired is however not a bad thing 
> > IME and can be derived failry easily without needs for checking queue sizes 
> > and so on. It's observable by looking @ some standard stats on what is 
> > productive incoming rate on the interface. Anything smarter needs new TLVs 
> > on packets & then you have a problem under/oversampling based on hellos 
> > (too low a frequency) and ACKs (too bursty, too batchy) and flooded back 
> > LSPs (too unpredictable)
> >
> > For more details I can recommend rift draft of course ;-)
> >
> > otherwise I'm staying out from this mildly feline spat ;-)
> >
> > --- tony
> >
> > On Wed, Feb 19, 2020 at 9:59 AM Les Ginsberg (ginsberg) 
> > mailto:ginsb...@cisco.com>> wrote:
> > Tony -
> >
> > Peter has a done a great job of highlighting that "single queue" is an 
> > oversimplification - I have nothing to add to that discussion.
> >
> > I would like to point out another aspect of the Rx based solution.
> >
> > As you need to send signaling based upon dynamic receiver state and this 
> > signaling is contained in unreliable PDUs (hellos) and to be useful this 
> > signaling needs to be sent ASAP - you cannot wait until the next periodic 
> > hello interval (default 10 seconds) to expire. So you are going to have to 
> > introduce extra hello traffic at a time when protocol input queues are 
> already stressed.

Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-03-10 Thread Tony Przygienda
On Tue, Mar 10, 2020 at 9:43 AM  wrote:

> With regards to punting to TCP, I think that TCP (stacks) enforce packet
> ordering.
>
> i.e. if you receive packets 1, 3,4, …,N, then you can only use packet 1
> until you receive packet 2. In the meantime, you cannot use the (N-2)
> packets that you did received.
>
> That seems like a regression for IS-IS which doesn’t requiring LSPs
> ordering. (vs BGP).
>

well, TCP is a windowing protocol and they're hard and very
bandwidth*delay/loss/jitter sensitive and yes, TCP is reliable transport
and not unreliable like ISIS flooding is ... Having said that, insane
amount of work has been done on TCP variants in all kind of stacks &
flavors coming full circle from fast start to slow start to fast start
again ;-)

Stepping back a bit, what we have in ISIS is basically a DHT and the
fastest way to synchronize that AFAIS is bittorrent transport ;-) When
doing RIFT I looked @ their transport & one can learn a lot from that but
it's a brave new world compared to the esteemed 10589 ;-) If one does that
kind of transport well, it blows TCP perf away in good scenario but then,
if we look for something of the 'download-open-source-and-plug-in' kind, then except some
10-year-old DCCP code you ain't gonna find anything massively hardened,
small or easily pluggable into an ISIS implementation. Modulo open source
QUIC (I didn't check the state of it for the last year or so), or some secret sauce
someone found, or maybe some of the p2p code having gotten modularized and documented
;-) ...


>
> Also, from what I’ve been told from BGP implementers, TCP is not magic and
> can’t be treated as a black box (if you want scale/performance).
>

that's a mild understatement if you ask me after having lived in the
trenches since 1995 or so ;-)  On the other hand, the work has been done
often already and could be piggy-backed on but then, we buy ourselves
mandatory IP addressing into ISIS transport and lots of other interesting,
undesirable things TCP does since it's not a hop-by-hop transport but
"something that always tries to find a way". And next thing we'll need
TCP-AO so beware what you wish for ;-)

--- tony
___
Lsr mailing list
Lsr@ietf.org
https://www.ietf.org/mailman/listinfo/lsr


Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-03-10 Thread bruno.decraene
With regards to punting to TCP, I think that TCP (stacks) enforce packet 
ordering.
i.e. if you receive packets 1, 3, 4, …, N, then you can only use packet 1 until 
you receive packet 2. In the meantime, you cannot use the (N-2) packets that 
you did receive.
That seems like a regression for IS-IS, which doesn't require LSP ordering 
(vs BGP).

Also, from what I’ve been told from BGP implementers, TCP is not magic and 
can’t be treated as a black box (if you want scale/performance).

1 cent
--Bruno

From: Lsr [mailto:lsr-boun...@ietf.org] On Behalf Of Tony Przygienda
Sent: Tuesday, March 10, 2020 4:23 PM
To: Christian Hopps
Cc: lsr@ietf.org; tony...@tony.li; Tony Li; Peter Psenak (ppsenak)
Subject: Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

Hey Christian, MaxTX is not that hard to derive since it's basically limited by 
the local system and its CPU/prioritization/queuing architecture.

For the rest of your email, in short, you have my observations in the previous 
email on what I think is useful and can be done ... BTW, timestamps are useless 
unless you synchronize clocks, and with all the queuing that ISIS normally does through 
the system to get stuff done it is very hard to account for the delay 
between a packet being generated (or rx'ed on an interface) and the last queue it's 
pulled from. More musings below, backed by a good amount of work & 
empirical experience ;-)

If we try to punt to TCP (like BGP did in its time which I argued wasn't the 
optimal idea that has bitten us back endless amount of times for the shortcut 
it was ;-) then de facto, no real heavy duty IP box is using stock TCP stack, 
at least in the scope of experience I have across bunch of leading vendors. If 
you worked on mod'ing TCP for convergence speed with BGP and cute little things 
like GR/NSR you will know the practical problems and also why stock TCP is 
actually fairly underwhelming when it comes to push large amounts of control 
data around (mod' distro, mod rusty 2c, mod etc but that's my data)..

And agreed, control theory is a wonderful thing, and transfer windowing 
protocols etc. are long-standing research if you know where to look, and lots of the stuff 
is e.g. present in TCP, QUIC or https://tools.ietf.org/html/rfc4340 and so on. 
All of them are quite a lot of stuff to put into ISIS/link-state and mostly do 
not do precisely what we need, or precondition things we can't afford under 
heavy load (very fast, non-slip timers, which are absolutely non-trivial if 
you're not in kernel). On top of that you'd need to drag 2 protocol transports 
around now: the old ISIS flooding with RE-TX and the new thing that should be 
doing the stuff by itself (and negotiate transport on top and so on). To 
give you a rough idea, DCCP, which is probably the smallest, is ~10KLOC of C in user 
space, in BETA and with zero docs ;-) I looked @ the practically existing stuff 2+ 
years ago in detail when doing RIFT ;-) and with all that I practically found I 
ended up carving out the pieces we need for fast flooding without introducing 
fast-acks, which IMO would be a non-starter for high scale link-state; or rather, 
if we really want that, the loop closes and we should, practically speaking, go 
to TCP (or 4340, which looked like a better choice to me), just like e.g. 
Open-R did, and be done with it, or wait for the mythical QUIC 
all-singing-all-dancing public domain implementation maybe. For many reasons I 
do not think it would be a particularly good development entangling a control 
protocol again with a user transport in the whole ball of yarn that IP is 
already.

kind of all I had to say, next thing ;-)

--- tony

On Tue, Mar 10, 2020 at 7:48 AM Christian Hopps 
mailto:cho...@chopps.org>> wrote:

Les Ginsberg (ginsberg) mailto:ginsb...@cisco.com>> writes:

> Tony –
>
> If you have a suggestion for Tx back-off algorithm please feel free to share.
> The proposal in the draft is just a suggestion.
> As this is a local matter there is no interoperability issue, but certainly 
> documenting a better algorithm is worthwhile.

[as WG member]

The main thing I'm afraid of is we're just making up some new overly simple 
congestion control algorithm (are there CC experts reviewing this?); maybe 
simulate it a few ways, deploy it, and have it work poorly or make things 
worse. In any case, damn the torpedos...

In this current algorithm how does MaxLSPTx get set? What happens if MaxLSPTx 
is too high? If its too low we could be missing a much faster convergence 
capability.

What if we had more quality information from the receiver, could we do a better 
job here? Maybe faster ACKs, or could we include a timestamp somehow to 
calculate RTT? This is the type of data that is used by existing CC algorithms 
(https://tools.ietf.org/html/rfc4342, https://tools.ietf.org/html/rfc5348). Of 
course going through these documents (which I've had to do for in another area) 
can start making one think "Punt to TCP" :)


Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-03-10 Thread Tony Przygienda
> Thanks,
> Chris.
> [as WG member]
>
>
>
> >
> >Les (claws in check  )
> >
> >
> > From: Tony Przygienda 
> > Sent: Wednesday, February 19, 2020 11:25 AM
> > To: Les Ginsberg (ginsberg) 
> > Cc: Peter Psenak (ppsenak) ; Tony Li <
> tony1ath...@gmail.com>; lsr@ietf.org; tony...@tony.li
> > Subject: Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed
> >
> > Having worked for last couple of years on implementation of flooding
> speeds that converge LSDBs some order of magnitudes above today's speeds
> ;-) here's a bunch of observations
> >
> > 1. TX side is easy and useful. My observation having gone quickly over
> the -ginsberg- draft is that you really want a better hysterisis there,
> it's bit too vertical and you will generate oscillations rather than walk
> around the equilibrium ;-)
> > 2. Queue per interface is fairly trivial with modern implementation
> techniques and memory sizes if done correctly. Yes, very memory constrained
> platforms are a mildly different game and kind of precondition a different
> discussion.
> > 3. RX side is possible and somewhat useful but much harder to do well
> depending on flavor. If we're talking about the RX advertising a very
> static value to cap the flooding speed that's actually a useful knob to
> have IMO/IME. Trying to cleverly communicate to the TXer a window size is
> not only fiendishly difficult, incurs back propagation speed (not
> neglectible @ those rates IME) but can easily lead to subtle flood
> starvation behaviors and lots of slow starts due to mixture of control loop
> dynamics and implementation complexity of such a scheme. Though, giving the
> TXer some hint that a backpressure is desired is however not a bad thing
> IME and can be derived failry easily without needs for checking queue sizes
> and so on. It's observable by looking @ some standard stats on what is
> productive incoming rate on the interface. Anything smarter needs new TLVs
> on packets & then you have a problem under/oversampling based on hellos
> (too low a frequency) and ACKs (too bursty, too batchy) and flooded back
> LSPs (too unpredictable)
> >
> > For more details I can recommend rift draft of course ;-)
> >
> > otherwise I'm staying out from this mildly feline spat ;-)
> >
> > --- tony
> >
> > On Wed, Feb 19, 2020 at 9:59 AM Les Ginsberg (ginsberg) <
> ginsb...@cisco.com<mailto:ginsb...@cisco.com>> wrote:
> > Tony -
> >
> > Peter has a done a great job of highlighting that "single queue" is an
> oversimplification - I have nothing to add to that discussion.
> >
> > I would like to point out another aspect of the Rx based solution.
> >
> > As you need to send signaling based upon dynamic receiver state and this
> signaling is contained in unreliable PDUs (hellos) and to be useful this
> signaling needs to be sent ASAP - you cannot wait until the next periodic
> hello interval (default 10 seconds) to expire. So you are going to have to
> introduce extra hello traffic at a time when protocol input queues are
> already stressed.
> >
> > Given hellos are unreliable, the question of how many transmissions of
> the update flow info is enough arises. You could make this more
> deterministic by enhancing the new TLV to include information received from
> the neighbor so that each side would know when the neighbor had received
> the updated info. This then requires additional hellos be sent in both
> directions - which exacerbates the queue issues on both receiver and
> transmitter.
> >
> > It is true (of course) that hellos should be treated with higher
> priority than other PDUs, but this does not mean that the additional hellos
> have no impact on the queue space available for LSPs/SNPs.
> >
> > Also, it seems like you are proposing interface independent logic, so
> you will be adjusting flow information on all interfaces enabled for IS-IS,
> which means that additional hello traffic will occur on all interfaces. At
> scale this is concerning.
> >
> >Les
> >
> >
> >> -Original Message-
> >> From: Peter Psenak mailto:ppse...@cisco.com>>
> >> Sent: Wednesday, February 19, 2020 2:49 AM
> >> To: Tony Li mailto:tony1ath...@gmail.com>>
> >> Cc: Les Ginsberg (ginsberg)  ginsb...@cisco.com>>; tony...@tony.li<mailto:tony...@tony.li>;
> >> lsr@ietf.org<mailto:lsr@ietf.org>
> >> Subject: Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed
> >>
> >> Tony,
> >>
> >> On 19/02/2020 11:37, Tony Li wrote:
> >> > Peter,

Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-03-10 Thread Christian Hopps


Les Ginsberg (ginsberg)  writes:


Tony –

If you have a suggestion for Tx back-off algorithm please feel free to share.
The proposal in the draft is just a suggestion.
As this is a local matter there is no interoperability issue, but certainly 
documenting a better algorithm is worthwhile.


[as WG member]

The main thing I'm afraid of is we're just making up some new overly simple 
congestion control algorithm (are there CC experts reviewing this?); maybe 
simulate it a few ways, deploy it, and have it work poorly or make things 
worse. In any case, damn the torpedoes...

In this current algorithm how does MaxLSPTx get set? What happens if MaxLSPTx 
is too high? If it's too low we could be missing a much faster convergence 
capability.

What if we had more quality information from the receiver, could we do a better job here? 
Maybe faster ACKs, or could we include a timestamp somehow to calculate RTT? This is the 
type of data that is used by existing CC algorithms (https://tools.ietf.org/html/rfc4342, 
https://tools.ietf.org/html/rfc5348). Of course going through these documents (which I've 
had to do in another area) can start making one think "Punt to TCP" :)
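
As one concrete reading of the RTT idea: even without a timestamp on the wire,
a sender could estimate a per-neighbor "flooding RTT" from the gap between
sending an LSP and seeing it acknowledged in a PSNP, smoothed the way TCP
smooths RTT samples. A minimal sketch with an assumed EWMA weight; note the
sample also includes the receiver's deliberate PSNP batching delay, so it is a
congestion proxy rather than a true network RTT:

    import time

    class FloodRttEstimator:
        # Smoothed LSP-to-PSNP acknowledgment delay on one adjacency,
        # usable as a congestion signal (sketch only).
        ALPHA = 0.125  # same smoothing weight TCP's RTT estimator uses

        def __init__(self):
            self.sent_at = {}   # lsp_id -> send timestamp
            self.srtt = None    # smoothed "flooding RTT" in seconds

        def on_lsp_sent(self, lsp_id):
            self.sent_at[lsp_id] = time.monotonic()

        def on_psnp_ack(self, lsp_id):
            t0 = self.sent_at.pop(lsp_id, None)
            if t0 is None:
                return
            sample = time.monotonic() - t0
            if self.srtt is None:
                self.srtt = sample
            else:
                self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * sample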

What would be nice, if we're going to attempt CC, is that the algorithm would 
be good enough to send relatively fast to start, adjust quickly if need be, and 
allow for *increasing* the send rate. The increasing part I think is important, 
if we take this work on, and I don't think it's currently covered.

I also don't have a good feel for how quickly the current suggested algorithm 
adjusts its send rate when it needs to. The correct value for Usafe seems very 
much dependent on the receiver's partialSNPInterval. It's so dependent that one 
might imagine it would be smart for the receiver to signal the value to the 
transmitter so that the transmitter can set Usafe correctly.

Thanks,
Chris.
[as WG member]





   Les (claws in check  )


From: Tony Przygienda 
Sent: Wednesday, February 19, 2020 11:25 AM
To: Les Ginsberg (ginsberg) 
Cc: Peter Psenak (ppsenak) ; Tony Li 
; lsr@ietf.org; tony...@tony.li
Subject: Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

Having worked for last couple of years on implementation of flooding speeds 
that converge LSDBs some order of magnitudes above today's speeds  ;-) here's a 
bunch of observations

1. TX side is easy and useful. My observation having gone quickly over the 
-ginsberg- draft is that you really want a better hysterisis there, it's bit 
too vertical and you will generate oscillations rather than walk around the 
equilibrium ;-)
2. Queue per interface is fairly trivial with modern implementation techniques 
and memory sizes if done correctly. Yes, very memory constrained platforms are 
a mildly different game and kind of precondition a different discussion.
3. RX side is possible and somewhat useful but much harder to do well depending on 
flavor. If we're talking about the RX advertising a very static value to cap the 
flooding speed that's actually a useful knob to have IMO/IME. Trying to cleverly 
communicate to the TXer a window size is not only fiendishly difficult, incurs back 
propagation speed (not neglectible @ those rates IME) but can easily lead to subtle 
flood starvation behaviors and lots of slow starts due to mixture of control loop 
dynamics and implementation complexity of such a scheme. Though, giving the TXer 
some hint that a backpressure is desired is however not a bad thing IME and can be 
derived failry easily without needs for checking queue sizes and so on. It's 
observable by looking @ some standard stats on what is productive incoming rate on 
the interface. Anything smarter needs new TLVs on packets & then you have a 
problem under/oversampling based on hellos (too low a frequency) and ACKs (too 
bursty, too batchy) and flooded back LSPs (too unpredictable)

For more details I can recommend rift draft of course ;-)

otherwise I'm staying out from this mildly feline spat ;-)

--- tony

On Wed, Feb 19, 2020 at 9:59 AM Les Ginsberg (ginsberg) 
mailto:ginsb...@cisco.com>> wrote:
Tony -

Peter has a done a great job of highlighting that "single queue" is an 
oversimplification - I have nothing to add to that discussion.

I would like to point out another aspect of the Rx based solution.

As you need to send signaling based upon dynamic receiver state and this 
signaling is contained in unreliable PDUs (hellos) and to be useful this 
signaling needs to be sent ASAP - you cannot wait until the next periodic hello 
interval (default 10 seconds) to expire. So you are going to have to introduce 
extra hello traffic at a time when protocol input queues are already stressed.

Given hellos are unreliable, the question of how many transmissions of the 
update flow info is enough arises. You could make this more deterministic by 
enhancing the new TLV to include information received from the neighbor so that 
each side would know when the neighbor had received the updated info.

Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-02-26 Thread Les Ginsberg (ginsberg)
Bruno –

I know there are more issues behind your comment, but sending PSNPs in response 
to receipt of LSPs is base protocol behavior – there is nothing new here.
And sending them in a timely manner is also nothing new.
I think the discussion with Robert was only that – in the context of greater 
scale/faster flooding – keeping the delay in sending PSNP to a small value 
becomes more important.

There is a legitimate discussion to be had as to whether additional information 
from the receiver is useful – and that discussion has started.
If there is agreement on the need for new information we can then discuss in 
what form to send it. New TLV in hellos has been mentioned.  It is conceivable 
that new TLV in SNPs could be used – though this would be less appealing on 
LANs.

But first we have to agree on whether new information should be sent at all.

   Les

From: bruno.decra...@orange.com 
Sent: Wednesday, February 26, 2020 10:34 AM
To: Les Ginsberg (ginsberg) 
Cc: lsr@ietf.org
Subject: RE: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

Les,


From: Lsr [mailto:lsr-boun...@ietf.org] On Behalf Of Les Ginsberg (ginsberg)
Sent: Wednesday, February 19, 2020 6:49 PM
To: Robert Raszuk
Cc: lsr@ietf.org<mailto:lsr@ietf.org>
Subject: Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

Robert –

Thanx for your input.

Note that one of the suggestions in 
https://datatracker.ietf.org/doc/draft-ginsberg-lsr-isis-flooding-scale/  is to 
prioritize the reception of SNPs over LSPs so that we are less likely to drop 
ACKs.
It is not clear to me why you think SNP generation would be an issue.
Once a received LSP is processed one of the outputs is to set a per interface 
flag indicating that an ACK (PSNP) needs to be sent (SSN flag). Implementations 
usually implement some small delay so that multiple ACKs can be sent in a 
single PSNP – but I do not see why this should be viewed as a bottleneck.

If your concern is that we need to emphasize the importance of sending timely 
ACKs, I think we could look at text that makes that point.
[Bruno]
So you need a new behavior on the Rx side (Rx with respect to LSP).  This is 
_not_ Tx only with no need for protocol change.
And BTW, this is called a feedback from the Rx to the Tx.

As we change the protocol on the Rx side, we have the opportunity to report 
more information from Rx to the Tx.

--Bruno

   Les


From: Lsr mailto:lsr-boun...@ietf.org>> On Behalf Of 
Robert Raszuk
Sent: Wednesday, February 19, 2020 1:07 AM
To: Les Ginsberg (ginsberg) mailto:ginsb...@cisco.com>>
Cc: lsr@ietf.org<mailto:lsr@ietf.org>
Subject: Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

Hi Les & all,

Watching this discussion I would like to state that IMO going with transmitter 
based rate limiting (I would not call it flow control) is much easier option to 
deploy and operate. It also has no dependency across other side of p2p adj 
which is a very important factor. The only issue here is if generation of 
[P|C]SNPs is fast enough.

Receiver based flow control is simple in flow theory however I have a feeling 
that if we are to go that path we would be much better to actually run ISIS 
flooding over DC-TCP and avoid reinventing the wheel.

Thx,
Robert.

On Wed, Feb 19, 2020 at 3:26 AM Les Ginsberg (ginsberg) 
mailto:ginsb...@cisco.com>> wrote:
Two recent drafts advocate for the use of faster LSP flooding speeds in IS-IS:

https://datatracker.ietf.org/doc/draft-decraene-lsr-isis-flooding-speed/
https://datatracker.ietf.org/doc/draft-ginsberg-lsr-isis-flooding-scale/

There is strong agreement on two key points:

1)Modern networks require much faster flooding speeds than are commonly in use 
today

2)To deploy faster flooding speeds safely some form of flow control is needed

The key point of contention between the two drafts is how flow control should 
be implemented.

https://datatracker.ietf.org/doc/draft-decraene-lsr-isis-flooding-speed/ 
advocates for a receiver based flow control where the receiver advertises in 
hellos the parameters which indicate the rate/burst size which the receiver is 
capable of supporting on the interface. Senders are required to limit the rate 
of LSP transmission on that interface in accordance with the values advertised 
by the receiver.
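
For illustration only, a minimal token-bucket sketch of what this implies on
the sender side, assuming the receiver has advertised a rate (LSPs/s) and a
burst size for the interface; the names and the numbers below are invented,
not taken from the draft:

    import time

    class ReceiverPacedFlooder:
        # Paces LSP transmission on one interface according to the rate and
        # burst the neighbor advertised (illustrative sketch only).
        def __init__(self, rate_lsps_per_s=50.0, burst_lsps=10):
            self.rate = rate_lsps_per_s
            self.burst = burst_lsps
            self.tokens = float(burst_lsps)
            self.last = time.monotonic()

        def update_from_hello(self, rate, burst):
            # The receiver (re)advertises its capabilities in IIHs.
            self.rate, self.burst = rate, burst

        def try_send(self, lsp, send):
            now = time.monotonic()
            self.tokens = min(float(self.burst),
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1.0:
                self.tokens -= 1.0
                send(lsp)
                return True
            return False  # caller keeps the LSP queued and retries later

The caller would retry queued LSPs on a timer tick or whenever a new hello
raises the advertised rate.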

https://datatracker.ietf.org/doc/draft-ginsberg-lsr-isis-flooding-scale/  
advocates for a transmit based flow control where the transmitter monitors the 
number of unacknowledged LSPs sent on each interface and implements a backoff 
algorithm to slow the rate of sending LSPs based on the length of the per 
interface unacknowledged queue.

While other differences between the two drafts exist, it is fair to say that if 
agreement could be reached on the form of flow control  then it is likely other 
issues could be resolved easily.

This email starts the discussion regarding the flow control issue.



___
Lsr mailing list
Lsr@ietf.org
https://www.ietf.org/mailman/listinfo/lsr

Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-02-26 Thread bruno.decraene
Les,

Please see inline[Bruno]

From: Lsr [mailto:lsr-boun...@ietf.org] On Behalf Of Les Ginsberg (ginsberg)
Sent: Wednesday, February 19, 2020 3:32 AM
To: lsr@ietf.org
Subject: Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

Base protocol operation of the Update process tracks the flooding of
LSPs/interface and guarantees timer-based retransmission on P2P interfaces
until an acknowledgment is received.

Using this base protocol mechanism in combination with exponential backoff of 
the
retransmission timer provides flow control in the event of temporary overload
of the receiver.

This mechanism works without protocol extensions, is dynamic, operates
independent of the reason for delayed acknowledgment (dropped packets, CPU
overload), and does not require additional signaling during the overloaded
period.

This is consistent with the recommendations in RFC 4222 (OSPF).
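
For illustration, a minimal sketch of that base mechanism; the initial
interval, the 2x multiplier and the cap below are invented values (ISO 10589
only requires retransmission until acknowledgment), so treat this as one
reading of the idea rather than a recommended implementation:

    # Sketch: timer-driven LSP retransmission with exponential backoff.
    # All numeric values are assumptions for illustration.
    import time

    INITIAL_RETX_INTERVAL = 5.0   # seconds, assumed starting point
    MAX_RETX_INTERVAL = 60.0      # assumed cap on the backoff

    class PendingLsp:
        def __init__(self, lsp_id, pdu):
            self.lsp_id = lsp_id
            self.pdu = pdu
            self.interval = INITIAL_RETX_INTERVAL
            self.next_retx = time.monotonic() + self.interval

    def on_psnp_ack(pending, lsp_id):
        # A PSNP covering this LSP removes it from the retransmission list.
        pending.pop(lsp_id, None)

    def retx_tick(pending, send):
        # Called periodically: resend overdue LSPs and back their timers off,
        # which implicitly slows the sender while the receiver is overloaded.
        now = time.monotonic()
        for entry in pending.values():
            if now >= entry.next_retx:
                send(entry.pdu)
                entry.interval = min(entry.interval * 2, MAX_RETX_INTERVAL)
                entry.next_retx = now + entry.interval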

Receiver-based flow control (as proposed in 
https://datatracker.ietf.org/doc/draft-decraene-lsr-isis-flooding-speed/ )
requires protocol extensions and introduces additional signaling during
periods of high load. The asserted reason for this is to optimize throughput -
but there is no evidence that it will achieve this goal.

Mention has been made to TCP-like flow control mechanisms as a model - which
are indeed receiver based. However, there are significant differences between
TCP sessions and IGP flooding.

TCP consists of a single session between two endpoints. Resources
(primarily buffer space) for this session are typically allocated in the
control plane and current usage is easily measurable.

IGP flooding is point-to-multi-point, resources to support IGP flooding
involve both control plane queues and dataplane queues, both of which are
typically not per interface - nor even dedicated to a particular protocol
instance. What input is required to optimize receiver-based flow control is not 
fully specified.

https://datatracker.ietf.org/doc/draft-decraene-lsr-isis-flooding-speed/ 
suggests (Section 5) that the values
to be advertised:

"use a formula based on an off line tests of
   the overall LSPDU processing speed for a particular set of hardware
   and the number of interfaces configured for IS-IS"

implying that the advertised value is intentionally not dynamic. As such,
it could just as easily be configured on the transmit side and not require
additional signaling. As a static value, it would necessarily be somewhat
conservative as it has to account for the worst case under the current
configuration - which means it needs to consider concurrent use of the CPU
and dataplane by all protocols/features which are enabled on a router - not all 
of whose
use is likely to be synchronized with peak IS-IS flooding load.
[Bruno] _Assuming_ that the parameters are static, those parameters

o   are the same as the ones implemented (configured) on multiple 
implementations, including the one from your employer. Now do you believe that 
those parameters can be determined?

§  If yes, how do you do it _today_ on your implementation? (this seems to 
contradict your statement that you have no way to figure out how to find the 
right value)

§  If no, why did you implement those parameters, and ask the network operator to 
configure them?

§  There is also the option to reply: I don't know but don't care as I leave 
the issue to the network operator.

o   can still provide some form of dynamicity, by using the PSNP as dynamic 
acknowledgement.

o   are really dependent on the receiver, not the sender.

§  the sender will never overload itself.

§  The receiver has more information, knowing its processing power (low end, 
high end, 80s, 20s (currently we are stuck with a 20-year-old value assuming the 
worst possible receiver (and worse there were, including with packet processing 
partly done in the control plane processor)), its expected IS-IS load 
(#neighbors), its preference for bursty LSP reception (high delay between IS-IS 
CPU allocation cycles, memory not an issue up to x kilo-octets...), its expected 
control plane load if IS-IS traffic does not have higher priority over other control 
plane traffic...), and its expected level of QoS prioritization [1]

·  [1]: look for "Extended SPD Headroom". E.g. "Since IGP and link 
stability are more tenuous and more crucial than BGP stability, such packets 
are now given the highest priority and are given extended SPD headroom with a 
default of 10 packets. This means that these packets are not dropped if the 
size of the input hold queue is lower than 185 (input queue default size + spd 
headroom size + spd extended headroom)."

o   And this is for a distributed architecture, 15 years ago. So what about using 
the above number (in the router configuration), applying Tony's proposal 
(* oversubscription / number of IS-IS neighbors), and advertising this value to your 
LSP sender? (A rough worked example follows below.)



[1] 
https://www.cisco.com/c/en/us/support/docs/routers/12000-series-routers/29920-spd.html
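
Purely as an illustration of that last bullet; only the 185-packet figure comes
from the quoted document, the oversubscription factor and neighbor count are
invented:

    # Illustrative arithmetic only: derive a per-neighbor burst advertisement
    # from a platform input-queue figure, as suggested above.
    input_queue_plus_headroom = 185   # packets, from the quoted SPD note [1]
    oversubscription = 2.0            # assumed: not all neighbors flood at once
    isis_neighbors = 10               # assumed

    per_neighbor_burst = int(input_queue_plus_headroom * oversubscription / isis_neighbors)
    print(per_neighbor_burst)         # -> 37 LSPs advertised as the burst size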



Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-02-26 Thread bruno.decraene
Les,


From: Lsr [mailto:lsr-boun...@ietf.org] On Behalf Of Les Ginsberg (ginsberg)
Sent: Wednesday, February 19, 2020 6:49 PM
To: Robert Raszuk
Cc: lsr@ietf.org
Subject: Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

Robert –

Thanx for your input.

Note that one of the suggestions in 
https://datatracker.ietf.org/doc/draft-ginsberg-lsr-isis-flooding-scale/  is to 
prioritize the reception of SNPs over LSPs so that we are less likely to drop 
ACKs.
It is not clear to me why you think SNP generation would be an issue.
Once a received LSP is processed one of the outputs is to set a per interface 
flag indicating that an ACK (PSNP) needs to be sent (SSN flag). Implementations 
usually implement some small delay so that multiple ACKs can be sent in a 
single PSNP – but I do not see why this should be viewed as a bottleneck.

If your concern is that we need to emphasize the importance of sending timely 
ACKs, I think we could look at text that makes that point.
[Bruno]
So you need a new behavior on the Rx side (Rx with respect to LSP).  This is 
_not_ Tx only with no need for protocol change.
And BTW, this is called a feedback from the Rx to the Tx.

As we change the protocol on the Rx side, we have the opportunity to report 
more information from Rx to the Tx.

--Bruno

   Les


From: Lsr  On Behalf Of Robert Raszuk
Sent: Wednesday, February 19, 2020 1:07 AM
To: Les Ginsberg (ginsberg) 
Cc: lsr@ietf.org
Subject: Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

Hi Les & all,

Watching this discussion I would like to state that IMO going with transmitter 
based rate limiting (I would not call it flow control) is much easier option to 
deploy and operate. It also has no dependency across other side of p2p adj 
which is a very important factor. The only issue here is if generation of 
[P|C]SNPs is fast enough.

Receiver based flow control is simple in flow theory however I have a feeling 
that if we are to go that path we would be much better to actually run ISIS 
flooding over DC-TCP and avoid reinventing the wheel.

Thx,
Robert.

On Wed, Feb 19, 2020 at 3:26 AM Les Ginsberg (ginsberg) 
mailto:ginsb...@cisco.com>> wrote:
Two recent drafts advocate for the use of faster LSP flooding speeds in IS-IS:

https://datatracker.ietf.org/doc/draft-decraene-lsr-isis-flooding-speed/
https://datatracker.ietf.org/doc/draft-ginsberg-lsr-isis-flooding-scale/

There is strong agreement on two key points:

1)Modern networks require much faster flooding speeds than are commonly in use 
today

2)To deploy faster flooding speeds safely some form of flow control is needed

The key point of contention between the two drafts is how flow control should 
be implemented.

https://datatracker.ietf.org/doc/draft-decraene-lsr-isis-flooding-speed/ 
advocates for a receiver based flow control where the receiver advertises in 
hellos the parameters which indicate the rate/burst size which the receiver is 
capable of supporting on the interface. Senders are required to limit the rate 
of LSP transmission on that interface in accordance with the values advertised 
by the receiver.

https://datatracker.ietf.org/doc/draft-ginsberg-lsr-isis-flooding-scale/  
advocates for a transmit based flow control where the transmitter monitors the 
number of unacknowledged LSPs sent on each interface and implements a backoff 
algorithm to slow the rate of sending LSPs based on the length of the per 
interface unacknowledged queue.
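
Read literally, the transmit side could look roughly like the sketch below,
which lengthens the inter-LSP gap as the per-interface count of unacknowledged
LSPs grows; the threshold and the scaling are invented for illustration and
the algorithm actually suggested in the draft may differ:

    class TxFlowControl:
        # Per-interface sender-side throttle driven purely by local state:
        # the number of LSPs sent but not yet acknowledged (sketch only).
        def __init__(self, base_interval_ms=33, backoff_threshold=30,
                     max_interval_ms=1000):
            self.base = base_interval_ms        # pacing gap at low load (assumed)
            self.threshold = backoff_threshold  # unacked LSPs before backing off (assumed)
            self.max = max_interval_ms
            self.unacked = set()

        def on_lsp_sent(self, lsp_id):
            self.unacked.add(lsp_id)

        def on_ack(self, lsp_id):
            self.unacked.discard(lsp_id)

        def next_send_interval_ms(self):
            # Grow the gap between LSPs as the unacknowledged queue grows.
            excess = len(self.unacked) - self.threshold
            if excess <= 0:
                return self.base
            return min(self.max, self.base * (1 + excess))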

While other differences between the two drafts exist, it is fair to say that if 
agreement could be reached on the form of flow control  then it is likely other 
issues could be resolved easily.

This email starts the discussion regarding the flow control issue.



___
Lsr mailing list
Lsr@ietf.org<mailto:Lsr@ietf.org>
https://www.ietf.org/mailman/listinfo/lsr


___
Lsr mailing list
Lsr@ietf.org
https://www.ietf.org/mailman/listinfo/lsr


Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-02-20 Thread Tony Li

Les,

> With respect, it is hard to know what you are proposing since there has never 
> been a public description.

With respect, I’ve said it many times, both in person and in email. You need to 
hear it again? Sure.


> The draft on which you are a co-author does not discuss any sort of algorithm 
> to dynamically alter the advertised value based on current router state. In 
> fact it argues (or at least suggests) that this shouldn't be done.


I’m suggesting being more dynamic than what that draft advocates.

> Apparently you have a different idea, which maybe the next version of 
> draft-decraene will include, but right now all we have as a description is a 
> series of isolated sentences in multiple emails. You'll have to forgive me if 
> I am not always clear about what you intend but have yet to state.

Again, what I’m suggesting is that the LSP receiver use a TLV (possibly 
Bruno’s, possibly tweaked) to tell the LSP transmitter the current space in the 
input packet queue. As previously described, we may want to consider 
oversubscription when advertising this value. This MAY be included in IIH’s and 
SNP’s, piggy backed on existing transmissions. 

The LSP transmitter MAY use this information, in conjunction with the knowledge 
of unacknowledged LSPs, to optimize the transmission rate. If no information is 
learned from the LSP receiver, the LSP transmitter is no worse off. If the 
information that was learned is old, then LSP transmitter is free to ignore it.

If the LSP receiver can’t provide a useful value (e.g., if the PD layer hasn’t 
been implemented), it is free to provide either the static value or no value.
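
A rough sketch of how a transmitter might combine such an advertised
free-queue-space value with its own count of unacknowledged LSPs; the field
names, the oversubscription handling and the staleness timeout are assumptions
for illustration, not part of any agreed TLV:

    import time

    class RxFeedbackPacer:
        # Sender-side estimate of how many more LSPs an interface can absorb,
        # based on the neighbor's last advertised free input-queue space (sketch).
        def __init__(self, stale_after_s=15.0, oversubscription=1.0):
            self.advertised_free = None       # packets, from IIH/SNP feedback
            self.received_at = 0.0
            self.stale_after = stale_after_s  # ignore feedback older than this (assumed)
            self.oversub = oversubscription   # receiver may oversubscribe its queue
            self.unacked = 0                  # LSPs sent on this interface, not yet acked

        def on_feedback(self, free_queue_space):
            self.advertised_free = free_queue_space
            self.received_at = time.monotonic()

        def can_send(self):
            # With no (or stale) feedback the sender falls back to whatever it
            # would do anyway; the feedback only ever tightens the limit.
            if self.advertised_free is None:
                return True
            if time.monotonic() - self.received_at > self.stale_after:
                return True
            return self.unacked < self.advertised_free * self.oversub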

As always, I am expecting that implementation experience and inter-operability 
testing will contribute to this effort, and I am not expecting the right answer 
in the first pass.

Tony

___
Lsr mailing list
Lsr@ietf.org
https://www.ietf.org/mailman/listinfo/lsr


Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-02-19 Thread Tony Przygienda
My sense of humor to be excused, Les 

Yes, so here's a suggestion that will build a better sloped hysteresis that
will allow you to ramp up slower first to not oscillate and also ramp down
somewhat more gracefully. It's a sketch, more interesting metrics can be
taken into it further but that's the flavor, good enough for the draft IMO.
Holding down on a timer is better achieved by 'normalizing' with an exponential
decay on every tick that 'primes' to max rate over time when nothing is
happening.  Queue lengths, except some low/high watermark that stops/starts
sending, are quite misleading IME given that queue length is not very
meaningful unless one can measure producer/consumer rates, and those depend
largely on CPU cycles available & memory congestion and can be very, very
bursty. And so are all the queues involved, to an extent (IME in a halfway
non-trivial architecture @ least 3 of those in- and out- the box).

MaxAllowedEver/sec = maximum rate ever allowed (constant)
MinAllowedEver/sec = minimum rate ever allowed (constant)
MaxAllowedRate/sec = maximum packets (CSNP/PSNP/LSP) allowed out the
interface per sec
CurrentRate/sec = packets sent out this second
Re-TX/sec = retransmissions this second

per every second tic:

if Re-TX {
CurrentRate/sec = max(MinAllowedEver, CurrentRate - 30% ) // slope down
fast on re-tx
}

if CurrentRate/sec >= MaxAllowedRate/sec {
  MaxAllowedRate/sec = min(MaxAllowedEver, MaxAllowedRate + 20%) // when
under load slope up fast
} else {
  MaxAllowedRate/sec += min(MaxAllowedRate, CurrentRate + 5%) // slowly
normalizes even if no traffic
}
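
One possible translation of the sketch above into runnable form; where the
pseudocode is ambiguous (which variable the re-TX branch cuts, and what the
idle-tick update converges to) the choices below are one reading, noted inline:

    class LspTxPacer:
        # Per-interface LSP transmit rate cap with a sloped hysteresis,
        # following one reading of the sketch above; ceiling/floor are assumed.
        MAX_ALLOWED_EVER = 1000.0   # LSPs/s ceiling (assumed)
        MIN_ALLOWED_EVER = 10.0     # LSPs/s floor (assumed)

        def __init__(self):
            self.max_allowed = self.MAX_ALLOWED_EVER  # start primed to the ceiling
            self.sent_this_sec = 0
            self.retx_this_sec = 0

        def one_second_tick(self):
            if self.retx_this_sec:
                # Re-TX seen: slope down fast (the sketch cuts CurrentRate;
                # cutting the cap instead is one reading of the intent).
                self.max_allowed = max(self.MIN_ALLOWED_EVER, self.max_allowed * 0.70)
            elif self.sent_this_sec >= self.max_allowed:
                # Running at the cap without trouble: slope up fast.
                self.max_allowed = min(self.MAX_ALLOWED_EVER, self.max_allowed * 1.20)
            else:
                # Otherwise decay slowly back up, so the cap "primes" to the
                # max rate over time when nothing is happening.
                self.max_allowed = min(self.MAX_ALLOWED_EVER, self.max_allowed * 1.05)
            self.sent_this_sec = 0
            self.retx_this_sec = 0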

What you write about ack'ing only is dangerous and can seriously affect the
protocol. If you're holding newer LSPs than the ones received, ack'ing old
versions is not helpful; one should flood back the newer stuff. Generally it is
better to run proper WFQ per type and not just start sending one type of
packet, since that can hit ugly corner conditions in flooding & starve it,
IME, if the queues get long enough that one can't drain them in a sec tic.
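
For the WFQ point, a minimal weighted round-robin sketch over per-PDU-type
queues so that a long LSP backlog cannot starve hellos or SNPs; the weights are
purely illustrative:

    from collections import deque

    class PduTypeScheduler:
        # Weighted round-robin over per-PDU-type transmit queues (sketch);
        # the weights are illustrative, not recommendations.
        def __init__(self):
            self.queues = {"iih": deque(), "snp": deque(), "lsp": deque()}
            self.weights = {"iih": 4, "snp": 2, "lsp": 1}

        def enqueue(self, pdu_type, pdu):
            self.queues[pdu_type].append(pdu)

        def dequeue_round(self):
            # One scheduling round: each type may send up to its weight, so no
            # single type monopolizes the interface within a tick.
            batch = []
            for pdu_type, weight in self.weights.items():
                q = self.queues[pdu_type]
                for _ in range(min(weight, len(q))):
                    batch.append(q.popleft())
            return batch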

hope that's precise & lucid enough without smothering folks with too much
detail ...

--- tony






On Wed, Feb 19, 2020 at 2:01 PM Les Ginsberg (ginsberg) 
wrote:

> Tony –
>
>
>
> If you have a suggestion for Tx back-off algorithm please feel free to
> share.
>
> The proposal in the draft is just a suggestion.
>
> As this is a local matter there is no interoperability issue, but
> certainly documenting a better algorithm is worthwhile.
>
>
>
>Les (claws in check  )
>
>
>
>
>
> *From:* Tony Przygienda 
> *Sent:* Wednesday, February 19, 2020 11:25 AM
> *To:* Les Ginsberg (ginsberg) 
> *Cc:* Peter Psenak (ppsenak) ; Tony Li <
> tony1ath...@gmail.com>; lsr@ietf.org; tony...@tony.li
> *Subject:* Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed
>
>
>
> Having worked for last couple of years on implementation of flooding
> speeds that converge LSDBs some order of magnitudes above today's speeds
> ;-) here's a bunch of observations
>
>
>
> 1. TX side is easy and useful. My observation having gone quickly over the
> -ginsberg- draft is that you really want a better hysterisis there, it's
> bit too vertical and you will generate oscillations rather than walk around
> the equilibrium ;-)
>
> 2. Queue per interface is fairly trivial with modern implementation
> techniques and memory sizes if done correctly. Yes, very memory constrained
> platforms are a mildly different game and kind of precondition a different
> discussion.
>
> 3. RX side is possible and somewhat useful but much harder to do well
> depending on flavor. If we're talking about the RX advertising a very
> static value to cap the flooding speed that's actually a useful knob to
> have IMO/IME. Trying to cleverly communicate to the TXer a window size is
> not only fiendishly difficult, incurs back propagation speed (not
> neglectible @ those rates IME) but can easily lead to subtle flood
> starvation behaviors and lots of slow starts due to mixture of control loop
> dynamics and implementation complexity of such a scheme. Though, giving the
> TXer some hint that a backpressure is desired is however not a bad thing
> IME and can be derived failry easily without needs for checking queue sizes
> and so on. It's observable by looking @ some standard stats on what is
> productive incoming rate on the interface. Anything smarter needs new TLVs
> on packets & then you have a problem under/oversampling based on hellos
> (too low a frequency) and ACKs (too bursty, too batchy) and flooded back
> LSPs (too unpredictable)
>
>
>
> For more details I can recommend rift draft of course ;-)
>
>
>
> otherwise I'm staying out from this mildly feline spat ;-)
>
>
>
> --- tony
>
>
>
> On Wed, Feb 19, 2020 at 9:59 

Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-02-19 Thread Les Ginsberg (ginsberg)
Tony –

If you have a suggestion for Tx back-off algorithm please feel free to share.
The proposal in the draft is just a suggestion.
As this is a local matter there is no interoperability issue, but certainly 
documenting a better algorithm is worthwhile.

   Les (claws in check  )


From: Tony Przygienda 
Sent: Wednesday, February 19, 2020 11:25 AM
To: Les Ginsberg (ginsberg) 
Cc: Peter Psenak (ppsenak) ; Tony Li 
; lsr@ietf.org; tony...@tony.li
Subject: Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

Having worked for last couple of years on implementation of flooding speeds 
that converge LSDBs some order of magnitudes above today's speeds  ;-) here's a 
bunch of observations

1. TX side is easy and useful. My observation having gone quickly over the 
-ginsberg- draft is that you really want a better hysterisis there, it's bit 
too vertical and you will generate oscillations rather than walk around the 
equilibrium ;-)
2. Queue per interface is fairly trivial with modern implementation techniques 
and memory sizes if done correctly. Yes, very memory constrained platforms are 
a mildly different game and kind of precondition a different discussion.
3. RX side is possible and somewhat useful but much harder to do well depending 
on flavor. If we're talking about the RX advertising a very static value to cap 
the flooding speed that's actually a useful knob to have IMO/IME. Trying to 
cleverly communicate to the TXer a window size is not only fiendishly 
difficult, incurs back propagation speed (not neglectible @ those rates IME) 
but can easily lead to subtle flood starvation behaviors and lots of slow 
starts due to mixture of control loop dynamics and implementation complexity of 
such a scheme. Though, giving the TXer some hint that a backpressure is desired 
is however not a bad thing IME and can be derived failry easily without needs 
for checking queue sizes and so on. It's observable by looking @ some standard 
stats on what is productive incoming rate on the interface. Anything smarter 
needs new TLVs on packets & then you have a problem under/oversampling based on 
hellos (too low a frequency) and ACKs (too bursty, too batchy) and flooded back 
LSPs (too unpredictable)

For more details I can recommend rift draft of course ;-)

otherwise I'm staying out from this mildly feline spat ;-)

--- tony

On Wed, Feb 19, 2020 at 9:59 AM Les Ginsberg (ginsberg) 
mailto:ginsb...@cisco.com>> wrote:
Tony -

Peter has a done a great job of highlighting that "single queue" is an 
oversimplification - I have nothing to add to that discussion.

I would like to point out another aspect of the Rx based solution.

As you need to send signaling based upon dynamic receiver state and this 
signaling is contained in unreliable PDUs (hellos) and to be useful this 
signaling needs to be sent ASAP - you cannot wait until the next periodic hello 
interval (default 10 seconds) to expire. So you are going to have to introduce 
extra hello traffic at a time when protocol input queues are already stressed.

Given hellos are unreliable, the question of how many transmissions of the 
update flow info is enough arises. You could make this more deterministic by 
enhancing the new TLV to include information received from the neighbor so that 
each side would know when the neighbor had received the updated info. This then 
requires additional hellos be sent in both directions - which exacerbates the 
queue issues on both receiver and transmitter.

It is true (of course) that hellos should be treated with higher priority than 
other PDUs, but this does not mean that the additional hellos have no impact on 
the queue space available for LSPs/SNPs.

Also, it seems like you are proposing interface independent logic, so you will 
be adjusting flow information on all interfaces enabled for IS-IS, which means 
that additional hello traffic will occur on all interfaces. At scale this is 
concerning.

   Les


> -Original Message-
> From: Peter Psenak mailto:ppse...@cisco.com>>
> Sent: Wednesday, February 19, 2020 2:49 AM
> To: Tony Li mailto:tony1ath...@gmail.com>>
> Cc: Les Ginsberg (ginsberg) mailto:ginsb...@cisco.com>>; 
> tony...@tony.li<mailto:tony...@tony.li>;
> lsr@ietf.org<mailto:lsr@ietf.org>
> Subject: Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed
>
> Tony,
>
> On 19/02/2020 11:37, Tony Li wrote:
> > Peter,
> >
> >> I'm aware of the PD layer and that is not the issue. The problem is that
> there is no common value to report across different PD layers, as each
> architecture may have different number of queues involved, etc. Trying to
> find a common value to report to IPGs across various PDs would involve
> some PD specific logic and that is the part I'm referring to and I would like
> NOT to get into.
> >
> >
> > I’m sorry that scares you.  It would seem lik

Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-02-19 Thread Les Ginsberg (ginsberg)
Tony -

With respect, it is hard to know what you are proposing since there has never 
been a public description.

The draft on which you are a co-author does not discuss any sort of algorithm 
to dynamically alter the advertised value based on current router state. In 
fact it argues (or at least suggests) that this shouldn't be done. Section 5 
states:

" ... a reasonable choice
   might be for a node to use a formula based on an off line tests of
   the overall LSPDU processing speed for a particular set of hardware
   and the number of interfaces configured for IS-IS.  With such a
   formula, the values advertised in the Flooding Speed TLV would only
   change when additional IS-IS interfaces are configured.  On the other
   hand, it would be undesirable to use a formula that depends, for
   example, on an active measurement of the CPU load to modify the
   values advertised in the Flooding Speed TLV..."

Apparently you have a different idea, which maybe the next version of 
draft-decraene will include, but right now all we have as a description is a 
series of isolated sentences in multiple emails. You'll have to forgive me if I 
am not always clear about what you intend but have yet to state.

Inline.

> -Original Message-
> From: Tony Li  On Behalf Of tony...@tony.li
> Sent: Wednesday, February 19, 2020 11:29 AM
> To: Les Ginsberg (ginsberg) 
> Cc: Peter Psenak (ppsenak) ; lsr@ietf.org
> Subject: Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed
> 
> 
> Les,
> 
> > As you need to send signaling based upon dynamic receiver state and this
> signaling is contained in unreliable PDUs (hellos) and to be useful this
> signaling needs to be sent ASAP - you cannot wait until the next periodic
> hello interval (default 10 seconds) to expire. So you are going to have to
> introduce extra hello traffic at a time when protocol input queues are already
> stressed.
> 
> 
> I am not proposing that we add additional packets at this time.  Yes, I 
> realize
> that it may limit the effectiveness of the feedback, but without serious
> experimentation, the cure may be worse than the disease.  I propose to
> tread very carefully.
> 
> 
[Les:] So you intend to simply update the feedback in the next scheduled hello?

If so, this is an interesting choice.

The occurrence of high flooding rates is an infrequent event. When it occurs, 
it will persist for some finite amount of time - which with higher flooding 
rates we hope will be modest - in the 10s of seconds or less.
If you aren't going to signal "slow down" for up to 10 seconds, the impact of 
the control is significantly diminished.

This is another significant difference relative to  the TCP analogy. For a TCP 
session it is quite reasonable to expect that high sustained throughput is 
desired for long periods of time.
For IGP flooding, high sustained throughput occurs rarely - and is of limited 
duration.

Les


> > Given hellos are unreliable, the question of how many transmissions of the
> update flow info is enough arises. You could make this more deterministic by
> enhancing the new TLV to include information received from the neighbor so
> that each side would know when the neighbor had received the updated
> info. This then requires additional hellos be sent in both directions - which
> exacerbates the queue issues on both receiver and transmitter.
> 
> 
> I am not proposing this.  If the hello is lost, then the transmitter has less
> information to work with, which is not an unreasonable situation.
> 
> Tony

___
Lsr mailing list
Lsr@ietf.org
https://www.ietf.org/mailman/listinfo/lsr


Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-02-19 Thread tony . li


> easy to say with a single PD. If you have 20, each with a different 
> architecture, it becomes a different problem.


My employer has multiple PD implementations. I sympathize, but it’s still 
necessary.

Tony

___
Lsr mailing list
Lsr@ietf.org
https://www.ietf.org/mailman/listinfo/lsr


Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-02-19 Thread Peter Psenak

Tony,

On 19/02/2020 20:25, tony...@tony.li wrote:


Peter,

I'm not scared of polynomial evaluation, but the fact that my IGP 
implementation is dependent on the PD specifics, which are not 
generally available and need to be custom built for each PD 
specifically. I always thought a good IGP implementation is PD agnostic.



Your implementation is always dependent on the underlying hardware.  We 
have timers, we have filesystems, we have I/O subsystems, threads, and 
clocks to contend with. 


none of the above is dependent on the LC specific hardware.

The input queue in the hardware is a fact of 
life and knowing about it can improve our implementations.


Because the PD layer can provide isolation from the specifics, the IGP 
implementation is reasonably abstracted from those specifics, in much 
the same way that the OS has abstracted us from the remainder of the 
underlying hardware. All I’m proposing is adding one more item to that 
PD abstraction.


easy to say with a single PD. If you have 20, each with a different 
architecture, it becomes a different problem.


thanks,
Peter




Regards,
Tony



___
Lsr mailing list
Lsr@ietf.org
https://www.ietf.org/mailman/listinfo/lsr


Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-02-19 Thread tony . li

Peter,

> I'm not scared of polynomial evaluation, but the fact that my IGP 
> implementation is dependent on the PD specifics, which are not generally 
> available and need to be custom built for each PD specifically. I always 
> thought a good IGP implementation is PD agnostic.


Your implementation is always dependent on the underlying hardware.  We have 
timers, we have filesystems, we have I/O subsystems, threads, and clocks to 
contend with. The input queue in the hardware is a fact of life and knowing 
about it can improve our implementations.

Because the PD layer can provide isolation from the specifics, the IGP 
implementation is reasonably abstracted from those specifics, in much the same 
way that the OS has abstracted us from the remainder of the underlying 
hardware. All I’m proposing is adding one more item to that PD abstraction.

Regards,
Tony

___
Lsr mailing list
Lsr@ietf.org
https://www.ietf.org/mailman/listinfo/lsr


Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-02-19 Thread Tony Przygienda
Having worked for the last couple of years on an implementation of flooding speeds
that converge LSDBs some orders of magnitude above today's speeds ;-)
here's a bunch of observations

1. TX side is easy and useful. My observation having gone quickly over the
-ginsberg- draft is that you really want a better hysteresis there, it's a
bit too vertical and you will generate oscillations rather than walk around
the equilibrium ;-)
2. Queue per interface is fairly trivial with modern implementation
techniques and memory sizes if done correctly. Yes, very memory constrained
platforms are a mildly different game and kind of precondition a different
discussion.
3. RX side is possible and somewhat useful but much harder to do well
depending on flavor. If we're talking about the RX advertising a very
static value to cap the flooding speed that's actually a useful knob to
have IMO/IME. Trying to cleverly communicate to the TXer a window size is
not only fiendishly difficult and incurs back-propagation delay (not
negligible @ those rates IME) but can easily lead to subtle flood
starvation behaviors and lots of slow starts due to a mixture of control loop
dynamics and the implementation complexity of such a scheme. Though, giving the
TXer some hint that backpressure is desired is however not a bad thing
IME and can be derived fairly easily without the need for checking queue sizes
and so on. It's observable by looking @ some standard stats on what is
productive incoming rate on the interface. Anything smarter needs new TLVs
on packets & then you have a problem under/oversampling based on hellos
(too low a frequency) and ACKs (too bursty, too batchy) and flooded back
LSPs (too unpredictable)
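
For illustration only, one shape such a two-threshold hysteresis could take
(Python; the names, watermarks and step sizes below are invented, not taken
from either draft or from any implementation):

    # Sketch of TX-side rate adaptation with a dead band between two
    # watermarks, so small fluctuations in the unacked queue do not
    # cause the rate to oscillate around the equilibrium.
    LOW_WATERMARK = 10      # unacked LSPs: below this, probe upward
    HIGH_WATERMARK = 40     # unacked LSPs: above this, back off
    MIN_RATE, MAX_RATE = 50, 5000   # LSPs/sec, purely illustrative

    def adjust_tx_rate(current_rate, unacked_lsps):
        """Return the new per-interface transmit rate in LSPs/sec."""
        if unacked_lsps > HIGH_WATERMARK:
            return max(MIN_RATE, current_rate * 0.5)   # multiplicative decrease
        if unacked_lsps < LOW_WATERMARK:
            return min(MAX_RATE, current_rate + 100)   # additive increase
        return current_rate                            # dead band: no change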

For more details I can recommend rift draft of course ;-)

otherwise I'm staying out from this mildly feline spat ;-)

--- tony

On Wed, Feb 19, 2020 at 9:59 AM Les Ginsberg (ginsberg) 
wrote:

> Tony -
>
> Peter has done a great job of highlighting that "single queue" is an
> oversimplification - I have nothing to add to that discussion.
>
> I would like to point out another aspect of the Rx based solution.
>
> As you need to send signaling based upon dynamic receiver state and this
> signaling is contained in unreliable PDUs (hellos) and to be useful this
> signaling needs to be sent ASAP - you cannot wait until the next periodic
> hello interval (default 10 seconds) to expire. So you are going to have to
> introduce extra hello traffic at a time when protocol input queues are
> already stressed.
>
> Given hellos are unreliable, the question of how many transmissions of the
> update flow info is enough arises. You could make this more deterministic
> by enhancing the new TLV to include information received from the neighbor
> so that each side would know when the neighbor had received the updated
> info. This then requires additional hellos be sent in both directions -
> which exacerbates the queue issues on both receiver and transmitter.
>
> It is true (of course) that hellos should be treated with higher priority
> than other PDUs, but this does not mean that the additional hellos have no
> impact on the queue space available for LSPs/SNPs.
>
> Also, it seems like you are proposing interface independent logic, so you
> will be adjusting flow information on all interfaces enabled for IS-IS,
> which means that additional hello traffic will occur on all interfaces. At
> scale this is concerning.
>
>Les
>
>
> > -Original Message-
> > From: Peter Psenak 
> > Sent: Wednesday, February 19, 2020 2:49 AM
> > To: Tony Li 
> > Cc: Les Ginsberg (ginsberg) ; tony...@tony.li;
> > lsr@ietf.org
> > Subject: Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed
> >
> > Tony,
> >
> > On 19/02/2020 11:37, Tony Li wrote:
> > > Peter,
> > >
> > >> I'm aware of the PD layer and that is not the issue. The problem is
> that
> > there is no common value to report across different PD layers, as each
> > architecture may have different number of queues involved, etc. Trying to
> find a common value to report to IGPs across various PDs would involve
> > some PD specific logic and that is the part I'm referring to and I would
> like
> > NOT to get into.
> > >
> > >
> > > I’m sorry that scares you.  It would seem like an initial
> implementation
> > might be to take the min of the free space of the queues leading from the
> > >interface to the CPU. I grant you that some additional sophistication
> may be
> > necessary, but I suspect that this is not going to become more
> >complicated
> > than polynomial evaluation.
> >
> > I'm not scared of polynomial evaluation, but the fact that my IGP
> > implementation is dependent on the 

Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-02-19 Thread Les Ginsberg (ginsberg)
Robert –

Sure – we can add some text in this area.

Implementations which don’t do well at current flooding speeds clearly won’t do 
well at faster flooding speeds unless they are enhanced. And such 
implementations won’t do well at scale even w/o faster flooding.
As always with these kinds of improvements, deployment has to be done 
carefully. I suppose this argues for more discussion of deployment 
considerations.
Added to the list…

   Les

From: Robert Raszuk 
Sent: Wednesday, February 19, 2020 10:55 AM
To: Les Ginsberg (ginsberg) 
Cc: lsr@ietf.org
Subject: Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

Hi Les,

Yes this "small delay" of ACK aggregation is something which I am a bit worried 
here from SNPs sender side.

Now as indeed draft mentioned prioritizing SNPs on reception let me indicate 
that some platforms I have not so long ago dealt with do not even prioritize 
any IGP packet over other packets at neither ingress LC nor queue to RE/RP. If 
that channel takes 100s of ms within the box I am afraid all bets for flooding 
improvement are off.

Thx
R,.


On Wed, Feb 19, 2020 at 6:48 PM Les Ginsberg (ginsberg) wrote:
Robert –

Thanx for your input.

Note that one of the suggestions in 
https://datatracker.ietf.org/doc/draft-ginsberg-lsr-isis-flooding-scale/  is to 
prioritize the reception of SNPs over LSPs so that we are less likely to drop 
ACKs.
It is not clear to me why you think SNP generation would be an issue.
Once a received LSP is processed one of the outputs is to set a per interface 
flag indicating that an ACK (PSNP) needs to be sent (SSN flag). Implementations 
usually implement some small delay so that multiple ACKs can be sent in a 
single PSNP – but I do not see why this should be viewed as a bottleneck.

If your concern is that we need to emphasize the importance of sending timely 
ACKs, I think we could look at text that makes that point.

   Les


From: Lsr  On Behalf Of 
Robert Raszuk
Sent: Wednesday, February 19, 2020 1:07 AM
To: Les Ginsberg (ginsberg) 
Cc: lsr@ietf.org
Subject: Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

Hi Les & all,

Watching this discussion I would like to state that IMO going with transmitter 
based rate limiting (I would not call it flow control) is a much easier option to 
deploy and operate. It also has no dependency on the other side of the p2p adj, 
which is a very important factor. The only issue here is whether generation of 
[P|C]SNPs is fast enough.

Receiver based flow control is simple in flow theory, however I have a feeling 
that if we are to go that path we would be much better off actually running ISIS 
flooding over DC-TCP and avoiding reinventing the wheel.

Thx,
Robert.
___
Lsr mailing list
Lsr@ietf.org
https://www.ietf.org/mailman/listinfo/lsr


Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-02-19 Thread Robert Raszuk
Hi Les,

Yes this "small delay" of ACK aggregation is something which I am a bit
worried here from SNPs sender side.

Now as indeed draft mentioned prioritizing SNPs on reception let me
indicate that some platforms I have not so long ago dealt with do not even
prioritize any IGP packet over other packets at neither ingress LC nor
queue to RE/RP. If that channel takes 100s of ms within the box I am
afraid all bets for flooding improvement are off.

Thx
R,.


On Wed, Feb 19, 2020 at 6:48 PM Les Ginsberg (ginsberg) 
wrote:

> Robert –
>
>
>
> Thanx for your input.
>
>
>
> Note that one of the suggestions in
> https://datatracker.ietf.org/doc/draft-ginsberg-lsr-isis-flooding-scale/
>  is to prioritize the reception of SNPs over LSPs so that we are less
> likely to drop ACKs.
>
> It is not clear to me why you think SNP generation would be an issue.
>
> Once a received LSP is processed one of the outputs is to set a per
> interface flag indicating that an ACK (PSNP) needs to be sent (SSN flag).
> Implementations usually implement some small delay so that multiple ACKs
> can be sent in a single PSNP – but I do not see why this should be viewed
> as a bottleneck.
>
>
>
> If your concern is that we need to emphasize the importance of sending
> timely ACKs, I think we could look at text that makes that point.
>
>
>
>Les
>
>
>
>
>
> *From:* Lsr  *On Behalf Of * Robert Raszuk
> *Sent:* Wednesday, February 19, 2020 1:07 AM
> *To:* Les Ginsberg (ginsberg) 
> *Cc:* lsr@ietf.org
> *Subject:* Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed
>
>
>
> Hi Les & all,
>
>
>
> Watching this discussion I would like to state that IMO going with
> transmitter based rate limiting (I would not call it flow control) is a much
> easier option to deploy and operate. It also has no dependency on the other
> side of the p2p adj, which is a very important factor. The only issue here is
> whether generation of [P|C]SNPs is fast enough.
>
>
>
> Receiver based flow control is simple in flow theory, however I have a
> feeling that if we are to go that path we would be much better off actually
> running ISIS flooding over DC-TCP and avoiding reinventing the wheel.
>
>
>
> Thx,
>
> Robert.
>
>
___
Lsr mailing list
Lsr@ietf.org
https://www.ietf.org/mailman/listinfo/lsr


Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-02-19 Thread Les Ginsberg (ginsberg)
Tony -

Peter has done a great job of highlighting that "single queue" is an 
oversimplification - I have nothing to add to that discussion.

I would like to point out another aspect of the Rx based solution.

As you need to send signaling based upon dynamic receiver state and this 
signaling is contained in unreliable PDUs (hellos) and to be useful this 
signaling needs to be sent ASAP - you cannot wait until the next periodic hello 
interval (default 10 seconds) to expire. So you are going to have to introduce 
extra hello traffic at a time when protocol input queues are already stressed.

Given hellos are unreliable, the question of how many transmissions of the 
update flow info is enough arises. You could make this more deterministic by 
enhancing the new TLV to include information received from the neighbor so that 
each side would know when the neighbor had received the updated info. This then 
requires additional hellos be sent in both directions - which exacerbates the 
queue issues on both receiver and transmitter.

It is true (of course) that hellos should be treated with higher priority than 
other PDUs, but this does not mean that the additional hellos have no impact on 
the queue space available for LSPs/SNPs.

Also, it seems like you are proposing interface independent logic, so you will 
be adjusting flow information on all interfaces enabled for IS-IS, which means 
that additional hello traffic will occur on all interfaces. At scale this is 
concerning.

   Les


> -Original Message-
> From: Peter Psenak 
> Sent: Wednesday, February 19, 2020 2:49 AM
> To: Tony Li 
> Cc: Les Ginsberg (ginsberg) ; tony...@tony.li;
> lsr@ietf.org
> Subject: Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed
> 
> Tony,
> 
> On 19/02/2020 11:37, Tony Li wrote:
> > Peter,
> >
> >> I'm aware of the PD layer and that is not the issue. The problem is that
> there is no common value to report across different PD layers, as each
> architecture may have different number of queues involved, etc. Trying to
> find a common value to report to IGPs across various PDs would involve
> some PD specific logic and that is the part I'm referring to and I would like
> NOT to get into.
> >
> >
> > I’m sorry that scares you.  It would seem like an initial implementation
> might be to take the min of the free space of the queues leading from the
> >interface to the CPU. I grant you that some additional sophistication may be
> necessary, but I suspect that this is not going to become more >complicated
> than polynomial evaluation.
> 
> I'm not scared of polynomial evaluation, but the fact that my IGP
> implementation is dependent on the PD specifics, which are not generally
> available and need to be custom built for each PD specifically. I always
> thought a good IGP implementation is PD agnostic.
> 
> thanks,
> Peter
> 
> >
> > Tony
> >
> > ___
> > Lsr mailing list
> > Lsr@ietf.org
> > https://www.ietf.org/mailman/listinfo/lsr
> >
> >

___
Lsr mailing list
Lsr@ietf.org
https://www.ietf.org/mailman/listinfo/lsr


Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-02-19 Thread Les Ginsberg (ginsberg)
Robert –

Thanx for your input.

Note that one of the suggestions in 
https://datatracker.ietf.org/doc/draft-ginsberg-lsr-isis-flooding-scale/  is to 
prioritize the reception of SNPs over LSPs so that we are less likely to drop 
ACKs.
It is not clear to me why you think SNP generation would be an issue.
Once a received LSP is processed one of the outputs is to set a per interface 
flag indicating that an ACK (PSNP) needs to be sent (SSN flag). Implementations 
usually implement some small delay so that multiple ACKs can be sent in a 
single PSNP – but I do not see why this should be viewed as a bottleneck.
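
Purely as an illustration of the SSN/PSNP batching just described (a hypothetical
Python sketch; the structure and the 50 ms delay are invented, not taken from any
implementation):

    import threading

    ACK_DELAY_SECONDS = 0.05   # small delay so several ACKs share one PSNP

    class Interface:
        def __init__(self, name):
            self.name = name
            self.ssn_lsps = []     # LSP IDs with the SSN flag set
            self.timer = None

        def lsp_processed(self, lsp_id):
            """Called once a received LSP has been processed: set SSN."""
            self.ssn_lsps.append(lsp_id)
            if self.timer is None:   # first pending ACK arms the timer
                self.timer = threading.Timer(ACK_DELAY_SECONDS, self.send_psnp)
                self.timer.start()

        def send_psnp(self):
            """Send one PSNP acknowledging everything flagged so far."""
            batch, self.ssn_lsps, self.timer = self.ssn_lsps, [], None
            print(f"PSNP on {self.name} acks {len(batch)} LSPs: {batch}")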

If your concern is that we need to emphasize the importance of sending timely 
ACKs, I think we could look at text that makes that point.

   Les


From: Lsr  On Behalf Of Robert Raszuk
Sent: Wednesday, February 19, 2020 1:07 AM
To: Les Ginsberg (ginsberg) 
Cc: lsr@ietf.org
Subject: Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

Hi Les & all,

Watching this discussion I would like to state that IMO going with transmitter 
based rate limiting (I would not call it flow control) is a much easier option to 
deploy and operate. It also has no dependency on the other side of the p2p adj, 
which is a very important factor. The only issue here is whether generation of 
[P|C]SNPs is fast enough.

Receiver based flow control is simple in flow theory, however I have a feeling 
that if we are to go that path we would be much better off actually running ISIS 
flooding over DC-TCP and avoiding reinventing the wheel.

Thx,
Robert.

On Wed, Feb 19, 2020 at 3:26 AM Les Ginsberg (ginsberg) wrote:
Two recent drafts advocate for the use of faster LSP flooding speeds in IS-IS:

https://datatracker.ietf.org/doc/draft-decraene-lsr-isis-flooding-speed/
https://datatracker.ietf.org/doc/draft-ginsberg-lsr-isis-flooding-scale/

There is strong agreement on two key points:

1)Modern networks require much faster flooding speeds than are commonly in use 
today

2)To deploy faster flooding speeds safely some form of flow control is needed

The key point of contention between the two drafts is how flow control should 
be implemented.

https://datatracker.ietf.org/doc/draft-decraene-lsr-isis-flooding-speed/ 
advocates for a receiver based flow control where the receiver advertises in 
hellos the parameters which indicate the rate/burst size which the receiver is 
capable of supporting on the interface. Senders are required to limit the rate 
of LSP transmission on that interface in accordance with the values advertised 
by the receiver.

https://datatracker.ietf.org/doc/draft-ginsberg-lsr-isis-flooding-scale/  
advocates for a transmit based flow control where the transmitter monitors the 
number of unacknowledged LSPs sent on each interface and implements a backoff 
algorithm to slow the rate of sending LSPs based on the length of the per 
interface unacknowledged queue.

While other differences between the two drafts exist, it is fair to say that if 
agreement could be reached on the form of flow control  then it is likely other 
issues could be resolved easily.

This email starts the discussion regarding the flow control issue.



___
Lsr mailing list
Lsr@ietf.org
https://www.ietf.org/mailman/listinfo/lsr
___
Lsr mailing list
Lsr@ietf.org
https://www.ietf.org/mailman/listinfo/lsr


Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-02-19 Thread Peter Psenak

Tony,

On 19/02/2020 11:37, Tony Li wrote:

Peter,


I'm aware of the PD layer and that is not the issue. The problem is that there 
is no common value to report across different PD layers, as each architecture 
may have different number of queues involved, etc. Trying to find a common 
value to report to IGPs across various PDs would involve some PD specific logic 
and that is the part I'm referring to and I would like NOT to get into.



I’m sorry that scares you.  It would seem like an initial implementation might be to 
take the min of the free space of the queues leading from the interface to the 
CPU. I grant you that some additional sophistication may be necessary, but I suspect 
that this is not going to become more complicated than polynomial evaluation.


I'm not scared of polynomial evaluation, but the fact that my IGP 
implementation is dependent on the PD specifics, which are not generally 
available and need to be custom built for each PD specifically. I always 
thought a good IGP implementation is PD agnostic.


thanks,
Peter



Tony

___
Lsr mailing list
Lsr@ietf.org
https://www.ietf.org/mailman/listinfo/lsr




___
Lsr mailing list
Lsr@ietf.org
https://www.ietf.org/mailman/listinfo/lsr


Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-02-19 Thread Tony Li

Peter,

> I'm aware of the PD layer and that is not the issue. The problem is that 
> there is no common value to report across different PD layers, as each 
> architecture may have different number of queues involved, etc. Trying to 
> find a common value to report to IGPs across various PDs would involve some 
> PD specific logic and that is the part I'm referring to and I would like NOT 
> to get into.


I’m sorry that scares you.  It would seem like an initial implementation might 
be to take the min of the free space of the queues leading from the interface 
to the CPU. I grant you that some additional sophistication may be necessary, 
but I suspect that this is not going to become more complicated than polynomial 
evaluation. 
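
As a rough sketch of that starting point (Python, illustrative only; the queue
names and numbers are invented):

    # "Take the min of the free space of the queues leading from the
    # interface to the CPU" - each queue is (capacity, occupancy).
    def free_space_to_report(path_queues):
        return min(capacity - occupancy for capacity, occupancy in path_queues)

    # e.g. NPU ingress queue, LC punt queue, RP socket buffer:
    print(free_space_to_report([(512, 100), (256, 240), (1024, 10)]))   # -> 16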

Tony

___
Lsr mailing list
Lsr@ietf.org
https://www.ietf.org/mailman/listinfo/lsr


Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-02-19 Thread Peter Psenak

Tony,

On 19/02/2020 10:47, Tony Li wrote:


Peter,


Given many different hardware architectures one may run a single IGP 
implementation on, this becomes impractical and complex as each hardware 
architecture has its own specifics. One would rather keep the IGP 
implementation hardware agnostic, rather than providing hardware specific hooks 
for each platform it runs on.


This is why your software architecture has a Platform Dependent component. This 
is a service that you should require from the PD layer.


I'm aware of the PD layer and that is not the issue. The problem is that 
there is no common value to report across different PD layers, as each 
architecture may have different number of queues involved, etc. Trying 
to find a common value to report to IGPs across various PDs would 
involve some PD specific logic and that is the part I'm referring to and 
I would like NOT to get into.


thanks,
Peter




Tony





___
Lsr mailing list
Lsr@ietf.org
https://www.ietf.org/mailman/listinfo/lsr


Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-02-19 Thread Tony Li


Peter,

> Given many different hardware architectures one may run a single IGP 
> implementation on, this becomes impractical and complex as each hardware 
> architecture has its own specifics. One would rather keep the IGP 
> implementation hardware agnostic, rather than providing hardware specific 
> hooks for each platform it runs on.

This is why your software architecture has a Platform Dependent component. This 
is a service that you should require from the PD layer.

Tony

___
Lsr mailing list
Lsr@ietf.org
https://www.ietf.org/mailman/listinfo/lsr


Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-02-19 Thread Peter Psenak

Tony,

On 19/02/2020 10:30, Tony Li wrote:


Peter,


above is nowhere close to what the reality is, especially in a distributed 
system. In such a system, packets traverse multiple queues on both the LC and RP, 
and an application like the IGP has no visibility into these queues.



As you may recall, I was lead software architect for NCS 6000.  


I do recall :)


I am well aware that a distributed implementation is more complex.  However, Les 
is insistent that we discuss specifics and that is most easily done with a 
simpler model.

The applications can get visibility into the queues through the same fundamental 
mechanism: poll the various NPUs on the data path and report to the 
application CPU.


everything is possible...

Given many different hardware architectures one may run a single IGP 
implementation on, this becomes impractical and complex as each hardware 
architecture has its own specifics. One would rather keep the IGP 
implementation hardware agnostic, rather than providing hardware 
specific hooks for each platform it runs on.


thanks,
Peter



Tony





___
Lsr mailing list
Lsr@ietf.org
https://www.ietf.org/mailman/listinfo/lsr


Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-02-19 Thread Tony Li


Peter,

> above is nowhere close to what the reality is, especially in a distributed 
> system. In such a system, packets traverse multiple queues on both the LC and 
> RP, and an application like the IGP has no visibility into these queues.


As you may recall, I was lead software architect for NCS 6000.  I am well aware 
that a distributed implementation is more complex.  However, Les is insistent 
that we discuss specifics and that is most easily done with a simpler model.

The applications can get visibility into the queues through the same 
fundamental mechanism: poll the various NPUs on the data path and report to the 
application CPU.  

Tony

___
Lsr mailing list
Lsr@ietf.org
https://www.ietf.org/mailman/listinfo/lsr


Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-02-19 Thread Peter Psenak

Tony,

On 19/02/2020 08:52, tony...@tony.li wrote:


Les,

Overall, I think you are making  general statements and not providing 
needed specifics.



I’m sorry it’s not specific enough for you.  I’m not sure that I can 
help to your satisfaction.



Maybe it’s obvious to you how a receiver based window would be 
calculated – but it isn’t obvious to me – so please help me out here 
with specifics.
What inputs do you need on the receive side in order to do the 
necessary calculation?



Well, there can be many, as it depends on the receiver’s architecture. 
Now, I can’t talk about things that are under NDA or company secret, so 
I’m pretty constrained.  Talking about any specific implementation is 
going to not be very helpful, so I propose that we stick with a 
simplified model to start: a box with N interfaces and a single input 
queue up to the CPU.  The input queue is the only possible bottleneck.


above is nowhere close to what the reality is, especially in a 
distributed system. In such a system, packets traverse multiple 
queues on both the LC and RP, and an application like the IGP has no 
visibility into these queues.


thanks,
Peter


  Further, to avoid undue complexity (for the moment — it may return), 
let’s assume that the input queue is in max-MTU sized packets, so that 
knowing the free entries in this queue is entirely sufficient.  Let the 
number of free entries be F.


As previously noted, we will want some oversubscription factor.  For the 
sake of a simple model, let’s consider this a constant and call it O. 
  [For future reference, I suspect that we will want to come back and 
make this more sophisticated, such as a Kalman filter, but again, to 
start simply… ]


Now, we want to report the free space devoted to the interface, but 
derated by the oversubscription factor, so we end up reporting F*O/N.


Is that specific enough?


What assumptions are you making about how an implementation receives, 
punts, dequeues IS-IS LSPs?



None.


And how will this lead to better performance than having TX react to 
actual throughput?



The receiver will have better information. The transmitter can now 
convey useful things like “I processed all of your packets but my queue 
is still congested”; this would be a PSNP that acknowledges all 
outstanding LSPs but shows no free buffers.


And please do not say  “just like TCP”. I have made some specific 
statements about how managing the resources associated with a TCP 
connection is not at all similar to managing resources for IGP flooding.

If you disagree – please provide some specific explanations.



I disagree with your disagreement.  A control loop is a very simple 
primitive in control theory.  That’s what we’re trying to create. 
  Modulating the receive window through control feedback is a control 
theory 101 technique.



It can look at its input queue and report the current space.  ~”Hi, 
I’ve got buffers available for 20 packets, totalling 20kB.”~
[Les2:] None of the implementations I have worked on (at least 3) 
work this way.



Well, sorry, some of them do.  In particular the Cisco AGS+ worked 
exactly this way under IOS Classic in the day.  It may have morphed.




For me how to do this is not at all obvious given common
implementation issues such as:

  * Sharing of a single punt path queue among many incoming
protocols/incoming interfaces

The receiver gets to decide how much window it wants to provide to 
each transmitter. Some oversubscription is probably a good thing.
[Les2:] That wasn’t my point. Neither of us is advocating trying to 
completely eliminate retransmissions and/or transient overload.
And since drops are possible, looking at the length of an input 
queue isn’t necessarily going to tell you whether you are indeed 
overloaded and if so due to what interface(s).



Looking at the length of the input queue does give you a snapshot at 
your congestion level.  You are correct, it does NOT ascribe it to 
specific interfaces.  A more sophisticated implementation might modulate 
its receive window inversely proportional to its interface input rate.




Tx side flow control is agnostic to receiver implementation strategy 
and the reasons why LSPs remain unacknowledged.



Yes, it’s ignorant.  That doesn’t make it better.  The point is to 
maximize the goodput.  Systems theory tells us that we improve frequency 
response when we provide feedback.  That’s all I’m suggesting.




  * Distributed dataplanes

This should definitely be a non-issue. An implementation should know 
the data path from the interface to the IS-IS process, for all data 
planes involved, and measure accordingly.

[Les2:] Again, you provide no specifics. Measure “what” accordingly?



The input queue size for the data path from the given interface.


IF I do not have a queue dedicated solely to IS-IS packets to be 
punted (and implementations may well use a single queue for multiple 

Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-02-18 Thread tony . li

Les,

> Overall, I think you are making  general statements and not providing needed 
> specifics.


I’m sorry it’s not specific enough for you.  I’m not sure that I can help to 
your satisfaction.


> Maybe it’s obvious to you how a receiver based window would be calculated – 
> but it isn’t obvious to me – so please help me out here with specifics.
> What inputs do you need on the receive side in order to do the necessary 
> calculation?


Well, there can be many, as it depends on the receiver’s architecture. Now, I 
can’t talk about things that are under NDA or company secret, so I’m pretty 
constrained.  Talking about any specific implementation is going to not be very 
helpful, so I propose that we stick with a simplified model to start: a box 
with N interfaces and a single input queue up to the CPU.  The input queue is 
the only possible bottleneck.  Further, to avoid undue complexity (for the 
moment — it may return), let’s assume that the input queue is in max-MTU sized 
packets, so that knowing the free entries in this queue is entirely sufficient. 
 Let the number of free entries be F.

As previously noted, we will want some oversubscription factor.  For the sake 
of a simple model, let’s consider this a constant and call it O.  [For future 
reference, I suspect that we will want to come back and make this more 
sophisticated, such as a Kalman filter, but again, to start simply… ]

Now, we want to report the free space devoted to the interface, but derated by 
the oversubscription factor, so we end up reporting F*O/N. 
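
As a throwaway sketch of that arithmetic (Python; F, O and N are as defined
above, the concrete numbers are invented):

    # One input queue to the CPU, sized in max-MTU packets, shared by N
    # interfaces; report F * O / N per interface.
    def advertised_window(free_entries, oversub_factor, num_interfaces):
        return (free_entries * oversub_factor) // num_interfaces

    # 120 free queue entries, oversubscription factor 2, 16 interfaces:
    # each neighbor may have 15 unacknowledged LSPs in flight.
    print(advertised_window(120, 2, 16))   # -> 15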

Is that specific enough?


> What assumptions are you making about how an implementation receives, punts, 
> dequeues IS-IS LSPs?


None.


> And how will this lead to better performance than having TX react to actual 
> throughput?


The receiver will have better information. The transmitter can now convey 
useful things like “I processed all of your packets but my queue is still 
congested”; this would be a PSNP that acknowledges all outstanding LSPs but 
shows no free buffers.

 
> And please do not say  “just like TCP”. I have made some specific statements 
> about how managing the resources associated with a TCP connection is not at 
> all similar to managing resources for IGP flooding.
> If you disagree – please provide some specific explanations.


I disagree with your disagreement.  A control loop is a very simple primitive 
in control theory.  That’s what we’re trying to create.  Modulating the receive 
window through control feedback is a control theory 101 technique.


>  It can look at its input queue and report the current space.  ~”Hi, I’ve got 
> buffers available for 20 packets, totalling 20kB.”~  
>  
> [Les2:] None of the implementations I have worked on (at least 3) work this 
> way.


Well, sorry, some of them do.  In particular the Cisco AGS+ worked exactly this 
way under IOS Classic in the day.  It may have morphed.


> For me how to do this is not at all obvious given common implementation 
> issues such as:
>  
> Sharing of a single punt path queue among many incoming protocols/incoming 
> interfaces
>  
>  
> The receiver gets to decide how much window it wants to provide to each 
> transmitter. Some oversubscription is probably a good thing.
> [Les2:] That wasn’t my point. Neither of us is advocating trying to 
> completely eliminate retransmissions and/or transient overload.
> And since drops are possible, looking at the length of an input queue isn’t 
> necessarily going to tell you whether you are indeed overloaded and if so due 
> to what interface(s).


Looking at the length of the input queue does give you a snapshot at your 
congestion level.  You are correct, it does NOT ascribe it to specific 
interfaces.  A more sophisticated implementation might modulate its receive 
window inversely proportional to its interface input rate.
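
A rough sketch of that idea (Python, illustrative only; interface names and
rates are invented):

    # Split a total receive window across interfaces in inverse proportion
    # to their recent LSP arrival rates (rates assumed > 0).
    def per_interface_windows(total_window, input_rates):
        weights = {ifc: 1.0 / rate for ifc, rate in input_rates.items()}
        scale = total_window / sum(weights.values())
        return {ifc: int(w * scale) for ifc, w in weights.items()}

    # The busy interface gets a smaller advertised window than the quiet one:
    print(per_interface_windows(100, {"ge-0/0/0": 200.0, "ge-0/0/1": 50.0}))
    # -> {'ge-0/0/0': 20, 'ge-0/0/1': 80}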


> Tx side flow control is agnostic to receiver implementation strategy and the 
> reasons why LSPs remain unacknowledged..


Yes, it’s ignorant.  That doesn’t make it better.  The point is to maximize the 
goodput.  Systems theory tells us that we improve frequency response when we 
provide feedback.  That’s all I’m suggesting.


>  
> Distributed dataplanes
>  
>  
> This should definitely be a non-issue. An implementation should know the data 
> path from the interface to the IS-IS process, for all data planes involved, 
> and measure accordingly.
>  
> [Les2:] Again, you provide no specifics. Measure “what” accordingly?


The input queue size for the data path from the given interface.


> IF I do not have a queue dedicated solely to IS-IS packets to be punted (and 
> implementations may well use a single queue for multiple protocols) what 
> should I measure? How to get that info to the control plane in real time?


You should STILL use that queue size.  That is still the bottleneck.

You get that to the control plane by doing a PIO to the queue status register 
in the dataplane ASIC.  This is trivial.


> If we 

Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-02-18 Thread Les Ginsberg (ginsberg)
Tony –

Overall, I think you are making  general statements and not providing needed 
specifics.
Maybe it’s obvious to you how a receiver based window would be calculated – but 
it isn’t obvious to me – so please help me out here with specifics.
What inputs do you need on the receive side in order to do the necessary 
calculation?
What assumptions are you making about how an implementation receives, punts, 
dequeues IS-IS LSPs?
And how will this lead to better performance than having TX react to actual 
throughput?

And please do not say  “just like TCP”. I have made some specific statements 
about how managing the resources associated with a TCP connection is not at all 
similar to managing resources for IGP flooding.
If you disagree – please provide some specific explanations.

A few more comments inline – but rather than go back-and-forth on each line 
item, it would be far better if you wrote up the details of the RX side 
solution.
Thanx.


From: Tony Li  On Behalf Of tony...@tony.li
Sent: Tuesday, February 18, 2020 10:43 PM
To: Les Ginsberg (ginsberg) 
Cc: lsr@ietf.org
Subject: Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed


Les,

Then the LSP transmitter is operating without information from the LSP 
receiver. Additional information from the receiver can help the transmitter 
maintain a more accurate picture of reality and adapt to it more quickly.

[Les:] This is your claim – but you have not provided any specifics as to how 
information sent by the receiver would provide better adaptability than a Tx 
based flow control which is based on actual performance.



This is not a claim. This is normally how control loops work. See TCP. When the 
receiver’s window opens, it can tell the transmitter. When the receiver’s 
window closes, it can tell the transmitter. If it only opens a little bit, it 
can tell the transmitter.

[Les2:] TCP != IGP flooding – please see my remarks in my initial posting on 
this thread.


Nor have you addressed how the receiver would dynamically calculate the values 
it would send.


It can look at its input queue and report the current space.  ~”Hi, I’ve got 
buffers available for 20 packets, totalling 20kB.”~

[Les2:] None of the implementations I have worked on (at least 3) work this way.


For me how to do this is not at all obvious given common implementation issues 
such as:


  *   Sharing of a single punt path queue among many incoming 
protocols/incoming interfaces


The receiver gets to decide how much window it wants to provide to each 
transmitter. Some oversubscription is probably a good thing.
[Les2:] That wasn’t my point. Neither of us is advocating trying to completely 
eliminate retransmissions and/or transient overload.
And since drops are possible, looking at the length of an input queue isn’t 
necessarily going to tell you whether you are indeed overloaded and if so due 
to what interface(s).
Tx side flow control is agnostic to receiver implementation strategy and the 
reasons why LSPs remain unacknowledged..



  *   Single interface independent input queue to IS-IS itself, making it 
difficult to track the contribution of a single interface to the current backlog


It’s not clear that this is problematic.  Again, reporting the window size in 
this queue is helpful.

[Les2:] Sorry, this is exactly the sort of generic statement that doesn’t add 
much. I know you believe this, but you need to explain how this is better than 
simply looking at what remains unacknowledged.


  *   Distributed dataplanes


This should definitely be a non-issue. An implementation should know the data 
path from the interface to the IS-IS process, for all data planes involved, and 
measure accordingly.

[Les2:] Again, you provide no specifics. Measure “what” accordingly? IF I do 
not have a queue dedicated solely to IS-IS packets to be punted (and 
implementations may well use a single queue for multiple protocols) what should 
I measure? How to get that info to the control plane in real time?

If we are to introduce new signaling/protocol extensions there needs to be good 
reason and it must be practical to implement – especially since we have an 
alternate solution which is practical to implement, dynamically responds to 
current state, and does not require any protocol extensions.


If we are to introduce new behaviors, they must be helpful. Estimates that do 
not utilize the available information may be sufficiently erroneous as to be 
harmful (see silly window syndrome).

[Les2:] Again – you try to apply TCP heuristics to IGP flooding. Not at all 
intuitive to me that this applies – I have stated why.

   Les

Tony


___
Lsr mailing list
Lsr@ietf.org
https://www.ietf.org/mailman/listinfo/lsr


Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-02-18 Thread Les Ginsberg (ginsberg)
Tony -

From: Tony Li  On Behalf Of tony...@tony.li
Sent: Tuesday, February 18, 2020 10:16 PM
To: Les Ginsberg (ginsberg) 
Cc: lsr@ietf.org
Subject: Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

The TX side flow control is purely based on performance on each interface – 
there are no implementation requirements imposed or implied as regards the 
receiver.

Then the LSP transmitter is operating without information from the LSP 
receiver. Additional information from the receiver can help the transmitter 
maintain a more accurate picture of reality and adapt to it more quickly.

[Les:] This is your claim – but you have not provided any specifics as to how 
information sent by the receiver would provide better adaptability than a Tx 
based flow control which is based on actual performance.
Nor have you addressed how the receiver would dynamically calculate the values 
it would send. For me how to do this is not at all obvious given common 
implementation issues such as:


  *   Sharing of a single punt path queue among many incoming 
protocols/incoming interfaces
  *   Single interface independent input queue to IS-IS itself, making it 
difficult to track the contribution of a single interface to the current backlog
  *   Distributed dataplanes

If we are to introduce new signaling/protocol extensions there needs to be good 
reason and it must be practical to implement – especially since we have an 
alternate solution which is practical to implement, dynamically responds to 
current state, and does not require any protocol extensions.

Les



___
Lsr mailing list
Lsr@ietf.org
https://www.ietf.org/mailman/listinfo/lsr


Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-02-18 Thread tony . li
> The TX side flow control is purely based on performance on each interface – 
> there are no implementation requirements imposed or implied as regards the 
> receiver.

Then the LSP transmitter is operating without information from the LSP 
receiver. Additional information from the receiver can help the transmitter 
maintain a more accurate picture of reality and adapt to it more quickly.

Tony


___
Lsr mailing list
Lsr@ietf.org
https://www.ietf.org/mailman/listinfo/lsr


Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-02-18 Thread Les Ginsberg (ginsberg)
Tony –

There is no such assumption.

Transmitter has exact knowledge of how many unacknowledged LSPs have been 
transmitted on each interface.

Using an algorithm functionally equivalent to the example algorithm in the 
draft, the transmitter slows down when the neighbor is not acknowledging in a 
timely manner LSPs sent on that interface.
The reason the neighbor is falling behind is irrelevant.

Maybe the receiver has a per interface queue and the associated line card is 
overloaded.
Maybe the receiver has a single queue but there are so many LSPs received on 
other interfaces in the front of the queue that the receiver hasn’t yet 
processed the ones received on this interface.
Maybe the receiver received the same LSPs on other interfaces and is now so 
busy sending these LSPs that it has fallen behind on processing its receive 
queue.
Maybe BGP is consuming high CPU and starving IS-IS…

The transmitter doesn’t care.  It just adjusts the transmission rate based on 
actual performance.

If all interfaces on the receiver are backed up all the neighbors will slow 
down their transmission rate.

The TX side flow control is purely based on performance on each interface – 
there are no implementation requirements imposed or implied as regards the 
receiver.

Les


From: Tony Li 
Sent: Tuesday, February 18, 2020 7:10 PM
To: Les Ginsberg (ginsberg) 
Cc: lsr@ietf.org
Subject: Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed


https://datatracker.ietf.org/doc/draft-ginsberg-lsr-isis-flooding-scale/  
advocates for a transmit based flow control where the transmitter monitors the 
number of unacknowledged LSPs sent on each interface and implements a backoff 
algorithm to slow the rate of sending LSPs based on the length of the per 
interface unacknowledged queue.


Les,

This makes the assumption that there is a per-interface queue on the LSP 
receiver. That has never been the case on any implementation that I’ve ever 
seen.

Without this assumption or more information, it seems difficult for the LSP 
transmitter to have enough information about how to proceed.

Tony

___
Lsr mailing list
Lsr@ietf.org
https://www.ietf.org/mailman/listinfo/lsr


Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-02-18 Thread Tony Li

> https://datatracker.ietf.org/doc/draft-ginsberg-lsr-isis-flooding-scale/ 
>   
> advocates for a transmit based flow control where the transmitter monitors 
> the number of unacknowledged LSPs sent on each interface and implements a 
> backoff algorithm to slow the rate of sending LSPs based on the length of the 
> per interface unacknowledged queue.


Les,

This makes the assumption that there is a per-interface queue on the LSP 
receiver. That has never been the case on any implementation that I’ve ever 
seen.

Without this assumption or more information, it seems difficult for the LSP 
transmitter to have enough information about how to proceed.

Tony

___
Lsr mailing list
Lsr@ietf.org
https://www.ietf.org/mailman/listinfo/lsr


Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-02-18 Thread Les Ginsberg (ginsberg)
Base protocol operation of the Update process tracks the flooding of
LSPs/interface and guarantees timer-based retransmission on P2P interfaces
until an acknowledgment is received.
Using this base protocol mechanism in combination with exponential backoff of 
the
retransmission timer provides flow control in the event of temporary overload
of the receiver.
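
As an illustrative sketch of such a backoff (Python; the base interval and cap
below are invented, not taken from the draft):

    BASE_RETRANSMIT_INTERVAL = 5.0    # seconds, illustrative
    MAX_RETRANSMIT_INTERVAL = 64.0    # cap on the backoff, illustrative

    def next_retransmit_interval(current, acked_since_last_try):
        """Double the interval while the neighbor is not acknowledging,
        return to the base interval once it catches up again."""
        if acked_since_last_try:
            return BASE_RETRANSMIT_INTERVAL
        return min(MAX_RETRANSMIT_INTERVAL, current * 2)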

This mechanism works without protocol extensions, is dynamic, operates
independent of the reason for delayed acknowledgment (dropped packets, CPU
overload), and does not require additional signaling during the overloaded
period.
This is consistent with the recommendations in RFC 4222 (OSPF).
Receiver-based flow control (as proposed in 
https://datatracker.ietf.org/doc/draft-decraene-lsr-isis-flooding-speed/ )
requires protocol extensions and introduces additional signaling during
periods of high load. The asserted reason for this is to optimize throughput -
but there is no evidence that it will achieve this goal.
Mention has been made to TCP-like flow control mechanisms as a model - which
are indeed receiver based. However, there are significant differences between
TCP sessions and IGP flooding.
TCP consists of a single session between two endpoints. Resources
(primarily buffer space) for this session are typically allocated in the
control plane and current usage is easily measurable.
IGP flooding is point-to-multi-point, resources to support IGP flooding
involve both control plane queues and dataplane queues, both of which are
typically not per interface - nor even dedicated to a particular protocol
instance. What input is required to optimize receiver-based flow control is not 
fully specified.
https://datatracker.ietf.org/doc/draft-decraene-lsr-isis-flooding-speed/ 
suggests (Section 5) that the values
to be advertised:
"use a formula based on an off line tests of
   the overall LSPDU processing speed for a particular set of hardware
   and the number of interfaces configured for IS-IS"
implying that the advertised value is intentionally not dynamic. As such,
it could just as easily be configured on the transmit side and not require
additional signaling. As a static value, it would necessarily be somewhat
conservative as it has to account for the worst case under the current
configuration - which means it needs to consider concurrent use of the CPU
and dataplane by all protocols/features which are enabled on a router - not all 
of whose
use is likely to be synchronized with peak IS-IS flooding load.
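
As a worked illustration of why such a static value ends up conservative
(all numbers below are invented):

    lspdu_processing_rate = 4000   # LSPs/sec from offline tests of the box
    isis_interfaces = 40           # interfaces configured for IS-IS
    worst_case_derating = 0.5      # allowance for concurrent BGP/other features

    advertised_rate = lspdu_processing_rate * worst_case_derating / isis_interfaces
    print(advertised_rate)         # -> 50.0 LSPs/sec per neighbor, a static cap
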
Unless a good case can be made as to why transmit-based flow control is not a 
good
fit and why receiver-based flow control is demonstrably better, it seems
unnecessary to extend the protocol.

Les


From: Lsr  On Behalf Of Les Ginsberg (ginsberg)
Sent: Tuesday, February 18, 2020 6:25 PM
To: lsr@ietf.org
Subject: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

Two recent drafts advocate for the use of faster LSP flooding speeds in IS-IS:

https://datatracker.ietf.org/doc/draft-decraene-lsr-isis-flooding-speed/
https://datatracker.ietf.org/doc/draft-ginsberg-lsr-isis-flooding-scale/

There is strong agreement on two key points:

1)Modern networks require much faster flooding speeds than are commonly in use 
today

2)To deploy faster flooding speeds safely some form of flow control is needed

The key point of contention between the two drafts is how flow control should 
be implemented.

https://datatracker.ietf.org/doc/draft-decraene-lsr-isis-flooding-speed/ 
advocates for a receiver based flow control where the receiver advertises in 
hellos the parameters which indicate the rate/burst size which the receiver is 
capable of supporting on the interface. Senders are required to limit the rate 
of LSP transmission on that interface in accordance with the values advertised 
by the receiver.

https://datatracker.ietf.org/doc/draft-ginsberg-lsr-isis-flooding-scale/  
advocates for a transmit based flow control where the transmitter monitors the 
number of unacknowledged LSPs sent on each interface and implements a backoff 
algorithm to slow the rate of sending LSPs based on the length of the per 
interface unacknowledged queue.

While other differences between the two drafts exist, it is fair to say that if 
agreement could be reached on the form of flow control  then it is likely other 
issues could be resolved easily.

This email starts the discussion regarding the flow control issue.



___
Lsr mailing list
Lsr@ietf.org
https://www.ietf.org/mailman/listinfo/lsr


[Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-02-18 Thread Les Ginsberg (ginsberg)
Two recent drafts advocate for the use of faster LSP flooding speeds in IS-IS:

https://datatracker.ietf.org/doc/draft-decraene-lsr-isis-flooding-speed/
https://datatracker.ietf.org/doc/draft-ginsberg-lsr-isis-flooding-scale/

There is strong agreement on two key points:

1)Modern networks require much faster flooding speeds than are commonly in use 
today

2)To deploy faster flooding speeds safely some form of flow control is needed

The key point of contention between the two drafts is how flow control should 
be implemented.

https://datatracker.ietf.org/doc/draft-decraene-lsr-isis-flooding-speed/ 
advocates for a receiver based flow control where the receiver advertises in 
hellos the parameters which indicate the rate/burst size which the receiver is 
capable of supporting on the interface. Senders are required to limit the rate 
of LSP transmission on that interface in accordance with the values advertised 
by the receiver.
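
(As an illustrative sketch only: a sender honoring an advertised rate/burst pair
would amount to something like the following token bucket; the names are invented,
not from the draft.)

    import time

    class LspShaper:
        def __init__(self, rate_lsps_per_sec, burst_lsps):
            self.rate = rate_lsps_per_sec
            self.burst = burst_lsps
            self.tokens = burst_lsps
            self.last = time.monotonic()

        def may_send_lsp(self):
            """True if one more LSP may be sent now without exceeding
            the advertised rate/burst."""
            now = time.monotonic()
            self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False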

https://datatracker.ietf.org/doc/draft-ginsberg-lsr-isis-flooding-scale/  
advocates for a transmit based flow control where the transmitter monitors the 
number of unacknowledged LSPs sent on each interface and implements a backoff 
algorithm to slow the rate of sending LSPs based on the length of the per 
interface unacknowledged queue.

While other differences between the two drafts exist, it is fair to say that if 
agreement could be reached on the form of flow control  then it is likely other 
issues could be resolved easily.

This email starts the discussion regarding the flow control issue.



___
Lsr mailing list
Lsr@ietf.org
https://www.ietf.org/mailman/listinfo/lsr