Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-02-18 Thread Tony Li

> https://datatracker.ietf.org/doc/draft-ginsberg-lsr-isis-flooding-scale/ 
>   
> advocates for a transmit-based flow control in which the transmitter monitors 
> the number of unacknowledged LSPs sent on each interface and implements a 
> backoff algorithm to slow the rate of sending LSPs based on the length of the 
> per-interface unacknowledged queue.


Les,

This makes the assumption that there is a per-interface queue on the LSP 
receiver. That has never been the case on any implementation that I’ve ever 
seen.

Without this assumption or more information, it seems difficult for the LSP 
transmitter to have enough information about how to proceed.

Tony

___
Lsr mailing list
Lsr@ietf.org
https://www.ietf.org/mailman/listinfo/lsr


Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-02-18 Thread Les Ginsberg (ginsberg)
Tony -

From: Tony Li  On Behalf Of tony...@tony.li
Sent: Tuesday, February 18, 2020 10:16 PM
To: Les Ginsberg (ginsberg) 
Cc: lsr@ietf.org
Subject: Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

The TX side flow control is purely based on performance on each interface – 
there are no implementation requirements imposed or implied as regards the 
receiver.

Then the LSP transmitter is operating without information from the LSP 
receiver. Additional information from the receiver can help the transmitter 
maintain a more accurate picture of reality and adapt to it more quickly.

[Les:] This is your claim – but you have not provided any specifics as to how 
information sent by the receiver would provide better adaptability than a 
Tx-based flow control which is based on actual performance.
Nor have you addressed how the receiver would dynamically calculate the values 
it would send. For me, how to do this is not at all obvious, given common 
implementation issues such as:


  *   Sharing of a single punt path queue among many incoming 
protocols/incoming interfaces
  *   A single, interface-independent input queue to IS-IS itself, making it 
difficult to track the contribution of a single interface to the current backlog
  *   Distributed dataplanes

If we are to introduce new signaling/protocol extensions there needs to be good 
reason and it must be practical to implement – especially since we have an 
alternate solution which is practical to implement, dynamically responds to 
current state, and does not require any protocol extensions.

Les





[Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-02-18 Thread Les Ginsberg (ginsberg)
Two recent drafts advocate for the use of faster LSP flooding speeds in IS-IS:

https://datatracker.ietf.org/doc/draft-decraene-lsr-isis-flooding-speed/
https://datatracker.ietf.org/doc/draft-ginsberg-lsr-isis-flooding-scale/

There is strong agreement on two key points:

1) Modern networks require much faster flooding speeds than are commonly in use 
today.

2) To deploy faster flooding speeds safely, some form of flow control is needed.

The key point of contention between the two drafts is how flow control should 
be implemented.

https://datatracker.ietf.org/doc/draft-decraene-lsr-isis-flooding-speed/ 
advocates for a receiver-based flow control in which the receiver advertises in 
hellos the parameters indicating the rate/burst size that it is capable of 
supporting on the interface. Senders are required to limit the rate of LSP 
transmission on that interface in accordance with the values advertised by the 
receiver.
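Mechanically, the sender side of such a scheme resembles a token bucket keyed to the advertised values. The sketch below is illustrative only (the class and parameter names are mine, not the draft's):

```python
import time

class AdvertisedRateLimiter:
    """Token-bucket pacing of LSP transmission, driven by the rate
    (LSPs/second) and burst size that the neighbor advertised in its
    hellos.  Purely a sketch of the receiver-advertised scheme."""

    def __init__(self, rate: float, burst: int):
        self.rate = rate              # advertised LSPs per second
        self.burst = burst            # advertised maximum burst
        self.tokens = float(burst)    # start with a full bucket
        self.last = time.monotonic()

    def try_send(self) -> bool:
        """Return True if one LSP may be flooded now on this interface."""
        now = time.monotonic()
        # Refill at the advertised rate, capped at the advertised burst.
        self.tokens = min(float(self.burst),
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A sender would call `try_send()` before each flood and defer the LSP when it returns False; updated hello values would simply overwrite `rate` and `burst`.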

https://datatracker.ietf.org/doc/draft-ginsberg-lsr-isis-flooding-scale/ 
advocates for a transmit-based flow control in which the transmitter monitors 
the number of unacknowledged LSPs sent on each interface and implements a 
backoff algorithm to slow the rate of sending LSPs based on the length of the 
per-interface unacknowledged queue.
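A minimal sketch of that transmit-side behavior follows; the threshold, gap values, and doubling rule here are assumptions for illustration, and the draft's example algorithm may differ:

```python
class TxBackoffPacer:
    """Per-interface transmit pacing driven by the count of LSPs sent
    but not yet acknowledged.  The inter-LSP gap grows as the
    unacknowledged queue grows, so a neighbor that falls behind (for
    whatever reason) is flooded more slowly.  Values are illustrative."""

    def __init__(self, base_gap_ms: float = 1.0, threshold: int = 50,
                 max_gap_ms: float = 33.0):
        self.base_gap_ms = base_gap_ms  # gap while the backlog is small
        self.threshold = threshold      # backlog at which backoff begins
        self.max_gap_ms = max_gap_ms    # never pace slower than this gap
        self.unacked = 0                # LSPs sent, not yet acknowledged

    def on_send(self) -> None:
        self.unacked += 1

    def on_ack(self, count: int = 1) -> None:
        self.unacked = max(0, self.unacked - count)

    def next_gap_ms(self) -> float:
        """Gap before the next LSP: doubles for each multiple of the
        threshold that the unacknowledged queue has reached."""
        if self.unacked < self.threshold:
            return self.base_gap_ms
        backoff = 2 ** (self.unacked // self.threshold)
        return min(self.max_gap_ms, self.base_gap_ms * backoff)
```

Acknowledgments shrink the backlog and the pacer speeds back up on its own, with no signaling from the receiver.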

While other differences between the two drafts exist, it is fair to say that if 
agreement could be reached on the form of flow control, then it is likely the 
other issues could be resolved easily.

This email starts the discussion regarding the flow control issue.





Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-02-18 Thread tony . li
> The TX side flow control is purely based on performance on each interface – 
> there are no implementation requirements imposed or implied as regards the 
> receiver.

Then the LSP transmitter is operating without information from the LSP 
receiver. Additional information from the receiver can help the transmitter 
maintain a more accurate picture of reality and adapt to it more quickly.

Tony




Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-02-18 Thread Les Ginsberg (ginsberg)
Base protocol operation of the Update process tracks the flooding of LSPs per
interface and guarantees timer-based retransmission on P2P interfaces
until an acknowledgment is received.
Using this base protocol mechanism in combination with exponential backoff of 
the
retransmission timer provides flow control in the event of temporary overload
of the receiver.

This mechanism works without protocol extensions, is dynamic, operates
independently of the reason for delayed acknowledgment (dropped packets, CPU
overload), and does not require additional signaling during the overloaded
period.
This is consistent with the recommendations in RFC 4222 (OSPF).
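As a sketch of the mechanism described above - ordinary timer-based retransmission with the interval backed off exponentially while acknowledgments are delayed - the 5-second base echoes the classic minimumLSPTransmissionInterval default, while the doubling rule and cap are my assumptions:

```python
class RetransmitBackoff:
    """Exponential backoff of the P2P retransmission timer for one
    unacknowledged LSP.  Base interval and cap are illustrative."""

    def __init__(self, base_s: float = 5.0, cap_s: float = 60.0):
        self.base_s = base_s
        self.cap_s = cap_s
        self.attempts = 0   # retransmissions so far without an ack

    def next_interval(self) -> float:
        """Interval to wait before the next retransmission; doubles on
        each unacknowledged attempt, up to the cap."""
        interval = min(self.cap_s, self.base_s * (2 ** self.attempts))
        self.attempts += 1
        return interval

    def acked(self) -> None:
        """Acknowledgment received: reset to the base interval."""
        self.attempts = 0
```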
Receiver-based flow control (as proposed in 
https://datatracker.ietf.org/doc/draft-decraene-lsr-isis-flooding-speed/ )
requires protocol extensions and introduces additional signaling during
periods of high load. The asserted reason for this is to optimize throughput -
but there is no evidence that it will achieve this goal.
Mention has been made of TCP-like flow control mechanisms as a model - and these
are indeed receiver-based. However, there are significant differences between
TCP sessions and IGP flooding.
TCP consists of a single session between two endpoints. Resources
(primarily buffer space) for this session are typically allocated in the
control plane and current usage is easily measurable.
IGP flooding is point-to-multipoint, and the resources to support it
involve both control-plane queues and dataplane queues, neither of which is
typically per interface - nor even dedicated to a particular protocol
instance. What input is required to optimize receiver-based flow control is not 
fully specified.
https://datatracker.ietf.org/doc/draft-decraene-lsr-isis-flooding-speed/ 
suggests (Section 5) that the values
to be advertised:
"use a formula based on an off line tests of
   the overall LSPDU processing speed for a particular set of hardware
   and the number of interfaces configured for IS-IS"
implying that the advertised value is intentionally not dynamic. As such,
it could just as easily be configured on the transmit side and not require
additional signaling. As a static value, it would necessarily be somewhat
conservative, since it has to account for the worst case under the current
configuration - which means considering concurrent use of the CPU and
dataplane by all protocols/features enabled on a router, not all of which
are likely to peak at the same time as IS-IS flooding load.
Unless a good case can be made as to why transmit-based flow control is not a 
good
fit and why receiver-based flow control is demonstrably better, it seems
unnecessary to extend the protocol.

Les





Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-02-18 Thread Les Ginsberg (ginsberg)
Tony –

There is no such assumption.

The transmitter has exact knowledge of how many unacknowledged LSPs have been 
transmitted on each interface.

Using an algorithm functionally equivalent to the example algorithm in the 
draft, the transmitter slows down when the neighbor is not acknowledging LSPs 
sent on that interface in a timely manner.
The reason the neighbor is falling behind is irrelevant.

Maybe the receiver has a per interface queue and the associated line card is 
overloaded.
Maybe the receiver has a single queue but there are so many LSPs received on 
other interfaces in the front of the queue that the receiver hasn’t yet 
processed the ones received on this interface.
Maybe the receiver received the same LSPs on other interfaces and is now so 
busy sending these LSPs that it has fallen behind on processing its receive 
queue.
Maybe BGP is consuming high CPU and starving IS-IS…

The transmitter doesn’t care.  It just adjusts the transmission rate based on 
actual performance.

If all interfaces on the receiver are backed up, all the neighbors will slow 
down their transmission rate.

The TX side flow control is purely based on performance on each interface – 
there are no implementation requirements imposed or implied as regards the 
receiver.

Les





Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-02-18 Thread Les Ginsberg (ginsberg)
Tony –

Overall, I think you are making general statements and not providing the needed 
specifics.
Maybe it’s obvious to you how a receiver-based window would be calculated – but 
it isn’t obvious to me – so please help me out here with specifics.
What inputs do you need on the receive side in order to do the necessary 
calculation?
What assumptions are you making about how an implementation receives, punts, 
dequeues IS-IS LSPs?
And how will this lead to better performance than having TX react to actual 
throughput?

And please do not say “just like TCP”. I have made some specific statements 
about how managing the resources associated with a TCP connection is not at all 
similar to managing resources for IGP flooding.
If you disagree – please provide some specific explanations.

A few more comments inline – but rather than go back-and-forth on each line 
item, it would be far better if you wrote up the details of the RX side 
solution.
Thanx.


From: Tony Li  On Behalf Of tony...@tony.li
Sent: Tuesday, February 18, 2020 10:43 PM
To: Les Ginsberg (ginsberg) 
Cc: lsr@ietf.org
Subject: Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed


Les,

Then the LSP transmitter is operating without information from the LSP 
receiver. Additional information from the receiver can help the transmitter 
maintain a more accurate picture of reality and adapt to it more quickly.

[Les:] This is your claim – but you have not provided any specifics as to how 
information sent by the receiver would provide better adaptability than a 
Tx-based flow control which is based on actual performance.



This is not a claim. This is normally how control loops work. See TCP. When the 
receiver’s window opens, it can tell the transmitter. When the receiver’s 
window closes, it can tell the transmitter. If it only opens a little bit, it 
can tell the transmitter.

[Les2:] TCP != IGP flooding – please see my remarks in my initial posting on 
this thread.


Nor have you addressed how the receiver would dynamically calculate the values 
it would send.


It can look at its input queue and report the current space.  “Hi, I’ve got 
buffers available for 20 packets, totalling 20kB.”

[Les2:] None of the implementations I have worked on (at least 3) work this way.


For me, how to do this is not at all obvious, given common implementation issues 
such as:


  *   Sharing of a single punt path queue among many incoming 
protocols/incoming interfaces


The receiver gets to decide how much window it wants to provide to each 
transmitter. Some oversubscription is probably a good thing.
[Les2:] That wasn’t my point. Neither of us is advocating trying to completely 
eliminate retransmissions and/or transient overload.
And since drops are possible, looking at the length of an input queue isn’t 
necessarily going to tell you whether you are indeed overloaded and, if so, due 
to what interface(s).
Tx-side flow control is agnostic to the receiver's implementation strategy and 
the reasons why LSPs remain unacknowledged.



  *   A single, interface-independent input queue to IS-IS itself, making it 
difficult to track the contribution of a single interface to the current backlog


It’s not clear that this is problematic.  Again, reporting the window size in 
this queue is helpful.

[Les2:] Sorry, this is exactly the sort of generic statement that doesn’t add 
much. I know you believe this, but you need to explain how this is better than 
simply looking at what remains unacknowledged.


  *   Distributed dataplanes


This should definitely be a non-issue. An implementation should know the data 
path from the interface to the IS-IS process, for all data planes involved, and 
measure accordingly.

[Les2:] Again, you provide no specifics. Measure “what” accordingly? If I do 
not have a queue dedicated solely to IS-IS packets to be punted (and 
implementations may well use a single queue for multiple protocols), what should 
I measure? And how do I get that info to the control plane in real time?

If we are to introduce new signaling/protocol extensions there needs to be good 
reason and it must be practical to implement – especially since we have an 
alternate solution which is practical to implement, dynamically responds to 
current state, and does not require any protocol extensions.


If we are to introduce new behaviors, they must be helpful. Estimates that do 
not utilize the available information may be sufficiently erroneous as to be 
harmful (see silly window syndrome).

[Les2:] Again – you try to apply TCP heuristics to IGP flooding. Not at all 
intuitive to me that this applies – I have stated why.

   Les

Tony




Re: [Lsr] Flow Control Discussion for IS-IS Flooding Speed

2020-02-18 Thread tony . li

Les,

> Overall, I think you are making  general statements and not providing needed 
> specifics.


I’m sorry it’s not specific enough for you.  I’m not sure that I can help to 
your satisfaction.


> Maybe it’s obvious to you how a receiver based window would be calculated – 
> but it isn’t obvious to me – so please help me out here with specifics.
> What inputs do you need on the receive side in order to do the necessary 
> calculation?


Well, there can be many, as it depends on the receiver’s architecture. Now, I 
can’t talk about things that are under NDA or company secret, so I’m pretty 
constrained.  Talking about any specific implementation is not going to be very 
helpful, so I propose that we stick with a simplified model to start: a box 
with N interfaces and a single input queue up to the CPU.  The input queue is 
the only possible bottleneck.  Further, to avoid undue complexity (for the 
moment — it may return), let’s assume that the input queue is in max-MTU sized 
packets, so that knowing the free entries in this queue is entirely sufficient. 
Let the number of free entries be F.

As previously noted, we will want some oversubscription factor.  For the sake 
of a simple model, let’s consider this a constant and call it O.  [For future 
reference, I suspect that we will want to come back and make this more 
sophisticated, such as a Kalman filter, but again, to start simply… ]

Now, we want to report the free space devoted to the interface, but derated by 
the oversubscription factor, so we end up reporting F*O/N. 
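In code, the model above reduces to a one-liner (the example numbers are made up):

```python
def advertised_window(f: int, o: float, n: int) -> int:
    """Per-interface receive window under the simplified model:
    F free max-MTU-sized input-queue entries shared across N
    interfaces, scaled by a constant oversubscription factor O."""
    return int(f * o / n)

# e.g. 1200 free entries, 1.5x oversubscription, 8 IS-IS interfaces:
# each neighbor is offered a window of 225 LSPs.
```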

Is that specific enough?


> What assumptions are you making about how an implementation receives, punts, 
> dequeues IS-IS LSPs?


None.


> And how will this lead to better performance than having TX react to actual 
> throughput?


The receiver will have better information, and it can now convey useful things 
like “I processed all of your packets but my queue is still congested”: this 
would be a PSNP that acknowledges all outstanding LSPs while showing no free 
buffers.

 
> And please do not say  “just like TCP”. I have made some specific statements 
> about how managing the resources associated with a TCP connection is not at 
> all similar to managing resources for IGP flooding.
> If you disagree – please provide some specific explanations.


I disagree with your disagreement.  A control loop is a very simple primitive 
in control theory.  That’s what we’re trying to create.  Modulating the receive 
window through control feedback is a control theory 101 technique.


> It can look at its input queue and report the current space.  “Hi, I’ve got 
> buffers available for 20 packets, totalling 20kB.”
>  
> [Les2:] None of the implementations I have worked on (at least 3) work this 
> way.


Well, sorry, some of them do.  In particular the Cisco AGS+ worked exactly this 
way under IOS Classic in the day.  It may have morphed.


> For me how to do this is not at all obvious given common implementation 
> issues such as:
>  
> Sharing of a single punt path queue among many incoming protocols/incoming 
> interfaces
>  
>  
> The receiver gets to decide how much window it wants to provide to each 
> transmitter. Some oversubscription is probably a good thing.
> [Les2:] That wasn’t my point. Neither of us is advocating trying to 
> completely eliminate retransmissions and/or transient overload.
> And since drops are possible, looking at the length of an input queue isn’t 
> necessarily going to tell you whether you are indeed overloaded and, if so, due 
> to what interface(s).


Looking at the length of the input queue does give you a snapshot of your 
congestion level.  You are correct, it does NOT ascribe it to specific 
interfaces.  A more sophisticated implementation might modulate its receive 
window in inverse proportion to its interface input rate.
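One way such modulation could look - splitting the derated free space so that interfaces with heavier recent input get smaller windows - is sketched below; the inverse-rate weighting is my assumption, since only the idea is stated above:

```python
def per_interface_windows(free_entries, oversub, input_rates):
    """Split the oversubscribed free space across interfaces in inverse
    proportion to each interface's recent input rate (packets/second),
    so heavier senders are granted smaller receive windows."""
    budget = free_entries * oversub
    # Guard against a zero rate; each weight is 1/rate.
    weights = [1.0 / max(rate, 1.0) for rate in input_rates]
    total = sum(weights)
    return [round(budget * w / total) for w in weights]

# e.g. 900 free entries, no oversubscription, one neighbor flooding at
# twice the rate of the other: windows come out [600, 300].
```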


> Tx-side flow control is agnostic to the receiver's implementation strategy and 
> the reasons why LSPs remain unacknowledged.


Yes, it’s ignorant.  That doesn’t make it better.  The point is to maximize the 
goodput.  Systems theory tells us that we improve frequency response when we 
provide feedback.  That’s all I’m suggesting.


>  
> Distributed dataplanes
>  
>  
> This should definitely be a non-issue. An implementation should know the data 
> path from the interface to the IS-IS process, for all data planes involved, 
> and measure accordingly.
>  
> [Les2:] Again, you provide no specifics. Measure “what” accordingly?


The input queue size for the data path from the given interface.


> If I do not have a queue dedicated solely to IS-IS packets to be punted (and 
> implementations may well use a single queue for multiple protocols) what 
> should I measure? How to get that info to the control plane in real time?


You should STILL use that queue size.  That is still the bottleneck.

You get that to the control plane by doing a PIO to the queue status register 
in the dataplane ASIC.  This is trivial.


> If we