Re: [Cake] [Bloat] [Make-wifi-fast] TCP_NOTSENT_LOWAT applied to e2e TCP msg latency

2021-10-29 Thread Bob McMahon via Cake
Thanks for pointing out the congestion window. Not sure why it doesn't
increase. I think that takes a stack expert ;) The run below with rx window
clamp does seem to align with linux blocking the writes.

Yes, in the previous runt the worst cases were 5.121ms which does align
with the RTT.

As a side note: I wonder if WiFi AP folks can somehow better "schedule
aggregates" based on GSO "predictions." One of the challenges for WiFi is
to align aggregates with what TCP is feeding it. I'm not sure if an
intermediary last hop AP could keep the queue size based upon the e2e
source "big tcp" so-to-speak. This is all out of my areas of expertise but
it might be nice if the two non-linear control loops, being the AP &
802.11ax first/last link hop scheduling and e2e TCP's feedback loop could
somehow plugged together in a way to help with both e2e low latency and
throughput.

Here's a run with receive side window clamping set to 1024 bytes which I
think should force CWND not to grow. In this case it does look like
linux is blocking the writes as the TCP_NOTSENT_LOWAT select waits are sub
100 microseconds so the write must have blocked.

[root@localhost iperf2-code]# src/iperf -c 192.168.1.1%eth1 --trip-times -i
1 -e --tcp-write-prefetch 4 -l 4K --burst-size=40K --histograms
WARN: option of --burst-size without --burst-period defaults --burst-period
to 1 second

Client connecting to 192.168.1.1, TCP port 5001 with pid 24601 via eth1 (1
flows)
Write buffer size: 4096 Byte
Bursting: 40.0 KByte every 1.00 seconds
TCP window size: 85.0 KByte (default)
Event based writes (pending queue watermark at 4 bytes)
Enabled select histograms bin-width=0.100 ms, bins=1

[  1] local 192.168.1.4%eth1 port 46042 connected with 192.168.1.1 port
5001 (MSS=576) (prefetch=4) (trip-times) (sock=3) (ct=5.01 ms) on
2021-10-29 13:57:22 (PDT)
[ ID] IntervalTransferBandwidth   Write/Err  Rtry
Cwnd/RTTNetPwr
[  1] 0.00-1.00 sec  40.1 KBytes   329 Kbits/sec  10/0  0
 5K/10109 us  4
[  1] 0.00-1.00 sec S8-PDF: bin(w=100us):cnt(10)=1:1,40:1,50:7,51:1
(5.00/95.00/99.7%=1/51/51,Outliers=0,obl/obu=0/0) (5.015
ms/1635541042.537251)
[  1] 1.00-2.00 sec  40.0 KBytes   328 Kbits/sec  10/0  0
 5K/4941 us  8
[  1] 1.00-2.00 sec S8-PDF: bin(w=100us):cnt(10)=1:10
(5.00/95.00/99.7%=1/1/1,Outliers=0,obl/obu=0/0) (0.015 ms/1635541043.465805)
[  1] 2.00-3.00 sec  40.0 KBytes   328 Kbits/sec  10/0  0
 5K/5036 us  8
[  1] 2.00-3.00 sec S8-PDF: bin(w=100us):cnt(10)=1:10
(5.00/95.00/99.7%=1/1/1,Outliers=0,obl/obu=0/0) (0.013 ms/1635541044.602288)
[  1] 3.00-4.00 sec  40.0 KBytes   328 Kbits/sec  10/0  0
 5K/4956 us  8
[  1] 3.00-4.00 sec S8-PDF: bin(w=100us):cnt(10)=1:10
(5.00/95.00/99.7%=1/1/1,Outliers=0,obl/obu=0/0) (0.015 ms/1635541045.465820)
[  1] 4.00-5.00 sec  40.0 KBytes   328 Kbits/sec  10/0  0
 5K/5121 us  8
[  1] 4.00-5.00 sec S8-PDF: bin(w=100us):cnt(10)=1:10
(5.00/95.00/99.7%=1/1/1,Outliers=0,obl/obu=0/0) (0.014 ms/1635541046.664221)
[  1] 5.00-6.00 sec  40.0 KBytes   328 Kbits/sec  10/0  0
 5K/5029 us  8
[  1] 5.00-6.00 sec S8-PDF: bin(w=100us):cnt(10)=1:10
(5.00/95.00/99.7%=1/1/1,Outliers=0,obl/obu=0/0) (0.091 ms/1635541047.466021)
[  1] 6.00-7.00 sec  40.0 KBytes   328 Kbits/sec  10/0  0
 5K/4930 us  8
[  1] 6.00-7.00 sec S8-PDF: bin(w=100us):cnt(10)=1:9,2:1
(5.00/95.00/99.7%=1/2/2,Outliers=0,obl/obu=0/0) (0.121 ms/1635541048.466058)
[  1] 7.00-8.00 sec  40.0 KBytes   328 Kbits/sec  10/0  0
 5K/5096 us  8
[  1] 7.00-8.00 sec S8-PDF: bin(w=100us):cnt(10)=1:10
(5.00/95.00/99.7%=1/1/1,Outliers=0,obl/obu=0/0) (0.015 ms/1635541049.465821)
[  1] 8.00-9.00 sec  40.0 KBytes   328 Kbits/sec  10/0  0
 5K/5086 us  8
[  1] 8.00-9.00 sec S8-PDF: bin(w=100us):cnt(10)=1:10
(5.00/95.00/99.7%=1/1/1,Outliers=0,obl/obu=0/0) (0.015 ms/1635541050.466051)
[  1] 9.00-10.00 sec  40.0 KBytes   328 Kbits/sec  10/0  0
 5K/5112 us  8
[  1] 9.00-10.00 sec S8-PDF: bin(w=100us):cnt(10)=1:10
(5.00/95.00/99.7%=1/1/1,Outliers=0,obl/obu=0/0) (0.015 ms/1635541051.465915)
[  1] 0.00-10.02 sec   400 KBytes   327 Kbits/sec  100/0  0
 5K/6518 us  6
[  1] 0.00-10.02 sec S8(f)-PDF:
bin(w=100us):cnt(100)=1:90,2:1,40:1,50:7,51:1
(5.00/95.00/99.7%=1/50/51,Outliers=9,obl/obu=0/0) (5.015
ms/1635541042.537251)


[root@localhost iperf2-code]# src/iperf -s -i 1 -e -B 192.168.1.1%ap0
--tcp-rx-window-clamp 1024

Server listening on TCP port 5001 with pid 22772
Binding to local address 192.168.1.1 and iface ap0
Read buffer size:  128 KByte (Dist bin width=16.0 KByte)
TCP window size:  128 KByte (default)

[  1] local 192.168.1.1%ap0 port 5001 connected with 192.168.1.4 port 46042
(MSS=1448) (clamp=1024) (burst-period=1.00s) (trip-times) (sock=4) (peer

Re: [Cake] [Bloat] [Make-wifi-fast] TCP_NOTSENT_LOWAT applied to e2e TCP msg latency

2021-10-28 Thread Christoph Paasch via Cake


> On Oct 26, 2021, at 8:45 PM, Bob McMahon  wrote:
> 
> This is linux. The code flow is burst writes until the burst size, take a 
> timestamp, call select(), take second timestamp and insert time delta into 
> histogram, await clock_nanosleep() to schedule the next burst. (actually, the 
> deltas, inserts into the histogram and user i/o are done in another thread, 
> i.e. iperf 2's reporter thread.)
> I still must be missing something.  Does anything else need to be set to 
> reduce the skb size? Everything seems to be indicating 4K writes even when 
> gso_max_size is 2000 (I assume these are units of bytes?) There are ten 
> writes, ten reads and ten  RTTs for the bursts.  I don't see partial writes 
> at the app level. 

One thing to keep in mind is that once the congestion-window increased to > 
40KB (your burst-size), all of the writes will not be blocking at all. 
TCP_NOTSENT_LOWAT is really just about the "notsent" part. Once the 
congestion-window is big enough to send 40KB in a burst, it will just all be 
immediately sent out.

> [root@localhost iperf2-code]# ip link set dev eth1 gso_max_size 2000
> [root@localhost iperf2-code]# ip -d link sh dev eth1
> 9: eth1:  mtu 1500 qdisc fq_codel state 
> UNKNOWN mode DEFAULT group default qlen 1000
> link/ether 00:90:4c:40:04:59 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 
> 68 maxmtu 1500 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 
> 2000 gso_max_segs 65535
> [root@localhost iperf2-code]# uname -r
> 5.0.9-301.fc30.x86_64
> 
> It looks like RTT is being driven by WiFi TXOPs as doubling the write size 
> increases the aggregation by two but has no significant effect on the RTTs.
> 
> 4K writes: tot_mpdus 328 tot_ampdus 209 mpduperampdu 2
> 
> 8k writes:  tot_mpdus 317 tot_ampdus 107 mpduperampdu 3
> 
> [root@localhost iperf2-code]# src/iperf -c 192.168.1.1%eth1 --trip-times -i 1 
> -e --tcp-write-prefetch 4 -l 4K --burst-size=40K --histograms
> WARN: option of --burst-size without --burst-period defaults --burst-period 
> to 1 second
> 
> Client connecting to 192.168.1.1, TCP port 5001 with pid 5145 via eth1 (1 
> flows)
> Write buffer size: 4096 Byte
> Bursting: 40.0 KByte every 1.00 seconds
> TCP window size: 85.0 KByte (default)
> Event based writes (pending queue watermark at 4 bytes)
> Enabled select histograms bin-width=0.100 ms, bins=1
> 
> [  1] local 192.168.1.4%eth1 port 45680 connected with 192.168.1.1 port 5001 
> (MSS=1448) (prefetch=4) (trip-times) (sock=3) (ct=5.30 ms) on 2021-10-26 
> 20:25:29 (PDT)
> [ ID] IntervalTransferBandwidth   Write/Err  Rtry 
> Cwnd/RTTNetPwr
> [  1] 0.00-1.00 sec  40.1 KBytes   329 Kbits/sec  11/0  0   
> 14K/10091 us  4
> [  1] 0.00-1.00 sec S8-PDF: 
> bin(w=100us):cnt(10)=1:1,36:1,40:1,44:1,46:1,48:1,49:1,50:2,52:1 
> (5.00/95.00/99.7%=1/52/52,Outliers=0,obl/obu=0/0) (5.121 ms/1635305129.152339)

Am I reading this correctly, that your writes take worst-case 5 milli-seconds ?

This looks correct then, because you seem to have an RTT of around 5ms.


It's surprising though that your congestion-window is not increasing.


Christoph


> [  1] 1.00-2.00 sec  40.0 KBytes   328 Kbits/sec  10/0  0   
> 14K/4990 us  8
> [  1] 1.00-2.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,39:1,45:1,49:5,50:1 
> (5.00/95.00/99.7%=1/50/50,Outliers=0,obl/obu=0/0) (4.991 ms/1635305130.153330)
> [  1] 2.00-3.00 sec  40.0 KBytes   328 Kbits/sec  10/0  0   
> 14K/4904 us  8
> [  1] 2.00-3.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,29:1,49:4,50:1,59:1,75:1 
> (5.00/95.00/99.7%=1/75/75,Outliers=0,obl/obu=0/0) (7.455 ms/1635305131.147353)
> [  1] 3.00-4.00 sec  40.0 KBytes   328 Kbits/sec  10/0  0   
> 14K/4964 us  8
> [  1] 3.00-4.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,49:4,50:2,59:1,65:1 
> (5.00/95.00/99.7%=1/65/65,Outliers=0,obl/obu=0/0) (6.460 ms/1635305132.146338)
> [  1] 4.00-5.00 sec  40.0 KBytes   328 Kbits/sec  10/0  0   
> 14K/4970 us  8
> [  1] 4.00-5.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,49:6,59:1,65:1 
> (5.00/95.00/99.7%=1/65/65,Outliers=0,obl/obu=0/0) (6.404 ms/1635305133.146335)
> [  1] 5.00-6.00 sec  40.0 KBytes   328 Kbits/sec  10/0  0   
> 14K/4986 us  8
> [  1] 5.00-6.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,48:1,49:1,50:4,59:1,64:1 
> (5.00/95.00/99.7%=1/64/64,Outliers=0,obl/obu=0/0) (6.395 ms/1635305134.146343)
> [  1] 6.00-7.00 sec  40.0 KBytes   328 Kbits/sec  10/0  0   
> 14K/5059 us  8
> [  1] 6.00-7.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,39:1,49:3,50:2,60:1,85:1 
> (5.00/95.00/99.7%=1/85/85,Outliers=0,obl/obu=0/0) (8.417 ms/1635305135.148343)
> [  1] 7.00-8.00 sec  40.0 KBytes   328 Kbits/sec  10/0  0   
> 14K/5407 us  8
> [  1] 7.00-8.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,40:1,49:4,50:1,59:1,75:1 
> (5.00/95.00/99.7%=1/75/75,Outliers=0,obl/obu=0/0) 

Re: [Cake] [Bloat] [Make-wifi-fast] TCP_NOTSENT_LOWAT applied to e2e TCP msg latency

2021-10-27 Thread Christoph Paasch via Cake
Hello,

> On Oct 25, 2021, at 9:24 PM, Eric Dumazet  wrote:
> 
> 
> 
> On 10/25/21 8:11 PM, Stuart Cheshire via Bloat wrote:
>> On 21 Oct 2021, at 17:51, Bob McMahon via Make-wifi-fast 
>>  wrote:
>> 
>>> Hi All,
>>> 
>>> Sorry for the spam. I'm trying to support a meaningful TCP message latency 
>>> w/iperf 2 from the sender side w/o requiring e2e clock synchronization. I 
>>> thought I'd try to use the TCP_NOTSENT_LOWAT event to help with this. It 
>>> seems that this event goes off when the bytes are in flight vs have reached 
>>> the destination network stack. If that's the case, then iperf 2 client 
>>> (sender) may be able to produce the message latency by adding the drain 
>>> time (write start to TCP_NOTSENT_LOWAT) and the sampled RTT.
>>> 
>>> Does this seem reasonable?
>> 
>> I’m not 100% sure what you’re asking, but I will try to help.
>> 
>> When you set TCP_NOTSENT_LOWAT, the TCP implementation won’t report your 
>> endpoint as writable (e.g., via kqueue or epoll) until less than that 
>> threshold of data remains unsent. It won’t stop you writing more bytes if 
>> you want to, up to the socket send buffer size, but it won’t *ask* you for 
>> more data until the TCP_NOTSENT_LOWAT threshold is reached.
> 
> 
> When I implemented TCP_NOTSENT_LOWAT back in 2013 [1], I made sure that 
> sendmsg() would actually
> stop feeding more bytes in TCP transmit queue if the current amount of unsent 
> bytes
> was above the threshold.
> 
> So it looks like Apple implementation is different, based on your description 
> ?

Yes, TCP_NOTSENT_LOWAT only impacts the wakeup on iOS/macOS/...

An app can still fill the send-buffer if it does a sendmsg() with a large 
buffer or does repeated calls to sendmsg().

Fur Apple, the goal of TCP_NOTSENT_LOWAT was to allow an app to quickly change 
the data it "scheduled" to send. And thus allow the app to write the smallest 
"logical unit" it has. If that unit is 512KB large, the app is allowed to send 
that.
For example, in case of video-streaming one may want to skip ahead in the 
video. In that case the app still needs to transmit the remaining parts of the 
previous frame anyways, before it can send the new video frame.
That's the reason why the Apple implementation allows one to write more than 
just the lowat threshold.


That being said, I do think that Linux's way allows for an easier API because 
the app does not need to be careful at how much data it sends after an 
epoll/kqueue wakeup. So, the latency-benefits will be easier to get.


Christoph



> [1] 
> https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=c9bee3b7fdecb0c1d070c7b54113b3bdfb9a3d36
> 
> netperf does not use epoll(), but rather a loop over sendmsg().
> 
> One of the point of TCP_NOTSENT_LOWAT for Google was to be able to 
> considerably increase
> max number of bytes in transmit queues (3rd column of 
> /proc/sys/net/ipv4/tcp_wmem)
> by 10x, allowing for autotune to increase BDP for big RTT flows, this without
> increasing memory needs for flows with small RTT.
> 
> In other words, the TCP implementation attempts to keep BDP bytes in flight + 
> TCP_NOTSENT_LOWAT bytes buffered and ready to go. The BDP of bytes in flight 
> is necessary to fill the network pipe and get good throughput. The 
> TCP_NOTSENT_LOWAT of bytes buffered and ready to go is provided to give the 
> source software some advance notice that the TCP implementation will soon be 
> looking for more bytes to send, so that the buffer doesn’t run dry, thereby 
> lowering throughput. (The old SO_SNDBUF option conflates both “bytes in 
> flight” and “bytes buffered and ready to go” into the same number.)
>> 
>> If you wait for the TCP_NOTSENT_LOWAT notification, write a chunk of n bytes 
>> of data, and then wait for the next TCP_NOTSENT_LOWAT notification, that 
>> will tell you roughly how long it took n bytes to depart the machine. You 
>> won’t know why, though. The bytes could depart the machine in response for 
>> acks indicating that the same number of bytes have been accepted at the 
>> receiver. But the bytes can also depart the machine because CWND is growing. 
>> Of course, both of those things are usually happening at the same time.
>> 
>> How to use TCP_NOTSENT_LOWAT is explained in this video:
>> 
>> 
>> 
>> Later in the same video is a two-minute demo (time offset 42:00 to time 
>> offset 44:00) showing a “before and after” demo illustrating the dramatic 
>> difference this makes for screen sharing responsiveness.
>> 
>> 
>> 
>> Stuart Cheshire
>> ___
>> Bloat mailing list
>> bl...@lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/bloat
>> 
> ___
> Bloat mailing list
> bl...@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/bloat


Re: [Cake] [Bloat] [Make-wifi-fast] TCP_NOTSENT_LOWAT applied to e2e TCP msg latency

2021-10-27 Thread Christoph Paasch via Cake
Hi Bob,

> On Oct 26, 2021, at 4:23 PM, Bob McMahon  wrote:
> I'm confused. I don't see any blocking nor partial writes per the write() at 
> the app level with TCP_NOTSENT_LOWAT set at 4 bytes. The burst is 40K, the 
> write size is 4K and the watermark is 4 bytes. There are ten writes per burst.

You are on Linux here, right?

AFAICS, Linux will still accept whatever fits in an skb. And that is likely 
more than 4K (with GSO on by default).

However, do you go back to select() after each write() or do you loop over the 
write() calls?


Christoph

> The S8 histograms are the times waiting on the select().  The first value is 
> the bin number (multiplied by 100usec bin width) and second the bin count. 
> The worst case time is at the end and is timestamped per unix epoch.
> 
> The second run is over a controlled WiFi link where a 99.7% point of 4-8ms 
> for a WiFi TX op arbitration win is in the ballpark. The first is 1G wired 
> and is in the 600 usec range. (No media arbitration there.)
> 
>  [root@localhost iperf2-code]# src/iperf -c 10.19.87.9 --trip-times -i 1 -e 
> --tcp-write-prefetch 4 -l 4K --burst-size=40K --histograms
> WARN: option of --burst-size without --burst-period defaults --burst-period 
> to 1 second
> 
> Client connecting to 10.19.87.9, TCP port 5001 with pid 2124 (1 flows)
> Write buffer size: 4096 Byte
> Bursting: 40.0 KByte every 1.00 seconds
> TCP window size: 85.0 KByte (default)
> Event based writes (pending queue watermark at 4 bytes)
> Enabled select histograms bin-width=0.100 ms, bins=1
> 
> [  1] local 10.19.87.10%eth0 port 33166 connected with 10.19.87.9 port 5001 
> (MSS=1448) (prefetch=4) (trip-times) (sock=3) (ct=0.54 ms) on 2021-10-26 
> 16:07:33 (PDT)
> [ ID] IntervalTransferBandwidth   Write/Err  Rtry 
> Cwnd/RTTNetPwr
> [  1] 0.00-1.00 sec  40.1 KBytes   329 Kbits/sec  11/0  0   
> 14K/5368 us  8
> [  1] 0.00-1.00 sec S8-PDF: bin(w=100us):cnt(10)=1:1,2:5,3:2,4:1,11:1 
> (5.00/95.00/99.7%=1/11/11,Outliers=0,obl/obu=0/0) (1.089 ms/1635289653.928360)
> [  1] 1.00-2.00 sec  40.0 KBytes   328 Kbits/sec  10/0  0   
> 14K/569 us  72
> [  1] 1.00-2.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,2:1,3:4,4:1,7:1,8:1 
> (5.00/95.00/99.7%=1/8/8,Outliers=0,obl/obu=0/0) (0.736 ms/1635289654.928088)
> [  1] 2.00-3.00 sec  40.0 KBytes   328 Kbits/sec  10/0  0   
> 14K/312 us  131
> [  1] 2.00-3.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,2:2,3:2,5:2,6:1 
> (5.00/95.00/99.7%=1/6/6,Outliers=0,obl/obu=0/0) (0.548 ms/1635289655.927776)
> [  1] 3.00-4.00 sec  40.0 KBytes   328 Kbits/sec  10/0  0   
> 14K/302 us  136
> [  1] 3.00-4.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,2:2,3:5,6:1 
> (5.00/95.00/99.7%=1/6/6,Outliers=0,obl/obu=0/0) (0.584 ms/1635289656.927814)
> [  1] 4.00-5.00 sec  40.0 KBytes   328 Kbits/sec  10/0  0   
> 14K/316 us  130
> [  1] 4.00-5.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,3:2,4:2,5:2,6:1 
> (5.00/95.00/99.7%=1/6/6,Outliers=0,obl/obu=0/0) (0.572 ms/1635289657.927810)
> [  1] 5.00-6.00 sec  40.0 KBytes   328 Kbits/sec  10/0  0   
> 14K/253 us  162
> [  1] 5.00-6.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,2:2,3:4,5:1 
> (5.00/95.00/99.7%=1/5/5,Outliers=0,obl/obu=0/0) (0.417 ms/1635289658.927630)
> [  1] 6.00-7.00 sec  40.0 KBytes   328 Kbits/sec  10/0  0   
> 14K/290 us  141
> [  1] 6.00-7.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,3:3,4:3,6:1 
> (5.00/95.00/99.7%=1/6/6,Outliers=0,obl/obu=0/0) (0.573 ms/1635289659.927771)
> [  1] 7.00-8.00 sec  40.0 KBytes   328 Kbits/sec  10/0  0   
> 14K/359 us  114
> [  1] 7.00-8.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,3:4,4:3,6:1 
> (5.00/95.00/99.7%=1/6/6,Outliers=0,obl/obu=0/0) (0.570 ms/1635289660.927753)
> [  1] 8.00-9.00 sec  40.0 KBytes   328 Kbits/sec  10/0  0   
> 14K/349 us  117
> [  1] 8.00-9.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,3:5,4:1,7:1 
> (5.00/95.00/99.7%=1/7/7,Outliers=0,obl/obu=0/0) (0.608 ms/1635289661.927843)
> [  1] 9.00-10.00 sec  40.0 KBytes   328 Kbits/sec  10/0  0   
> 14K/347 us  118
> [  1] 9.00-10.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,2:1,3:5,8:1 
> (5.00/95.00/99.7%=1/8/8,Outliers=0,obl/obu=0/0) (0.725 ms/1635289662.927861)
> [  1] 0.00-10.01 sec   400 KBytes   327 Kbits/sec  102/0  0   
> 14K/1519 us  27
> [  1] 0.00-10.01 sec S8(f)-PDF: 
> bin(w=100us):cnt(100)=1:25,2:13,3:36,4:11,5:5,6:5,7:2,8:2,11:1 
> (5.00/95.00/99.7%=1/7/11,Outliers=0,obl/obu=0/0) (1.089 ms/1635289653.928360)
> 
> [root@localhost iperf2-code]# src/iperf -c 192.168.1.1 --trip-times -i 1 -e 
> --tcp-write-prefetch 4 -l 4K --burst-size=40K --histograms
> WARN: option of --burst-size without --burst-period defaults --burst-period 
> to 1 second
> 
> Client connecting to 192.168.1.1, TCP port 5001 with pid 

Re: [Cake] [Bloat] [Make-wifi-fast] TCP_NOTSENT_LOWAT applied to e2e TCP msg latency

2021-10-26 Thread Eric Dumazet


On 10/26/21 8:45 PM, Bob McMahon wrote:
> This is linux. The code flow is burst writes until the burst size, take a 
> timestamp, call select(), take second timestamp and insert time delta into 
> histogram, await clock_nanosleep() to schedule the next burst. (actually, the 
> deltas, inserts into the histogram and user i/o are done in another thread, 
> i.e. iperf 2's reporter thread.)
> 
> I still must be missing something.  Does anything else need to be set to 
> reduce the skb size? Everything seems to be indicating 4K writes even when 
> gso_max_size is 2000 (I assume these are units of bytes?) There are ten 
> writes, ten reads and ten  RTTs for the bursts.  I don't see partial writes 
> at the app level. 
> 
> [root@localhost iperf2-code]# ip link set dev eth1 gso_max_size 2000

You could check with tcpdump on eth1, that outgoing packets are no longer 
'TSO/GSO', but single MSS ones.

(Note: this device gso_max_size is only taken into account for flows 
established after the change)

> 
> [root@localhost iperf2-code]# ip -d link sh dev eth1
> 9: eth1:  mtu 1500 qdisc fq_codel state 
> UNKNOWN mode DEFAULT group default qlen 1000
>     link/ether 00:90:4c:40:04:59 brd ff:ff:ff:ff:ff:ff promiscuity 0 
> minmtu 68 maxmtu 1500 addrgenmode eui64 numtxqueues 1 numrxqueues 1 
> gso_max_size 2000 gso_max_segs 65535
> [root@localhost iperf2-code]# uname -r
> 5.0.9-301.fc30.x86_64
> 
> 
> It looks like RTT is being driven by WiFi TXOPs as doubling the write size 
> increases the aggregation by two but has no significant effect on the RTTs.
> 
> 4K writes: tot_mpdus 328 tot_ampdus 209 mpduperampdu 2
> 
> 
> 8k writes:  tot_mpdus 317 tot_ampdus 107 mpduperampdu 3
> 
> 
> [root@localhost iperf2-code]# src/iperf -c 192.168.1.1%eth1 --trip-times -i 1 
> -e --tcp-write-prefetch 4 -l 4K --burst-size=40K --histograms
> WARN: option of --burst-size without --burst-period defaults --burst-period 
> to 1 second
> 
> Client connecting to 192.168.1.1, TCP port 5001 with pid 5145 via eth1 (1 
> flows)
> Write buffer size: 4096 Byte
> Bursting: 40.0 KByte every 1.00 seconds
> TCP window size: 85.0 KByte (default)
> Event based writes (pending queue watermark at 4 bytes)
> Enabled select histograms bin-width=0.100 ms, bins=1
> 
> [  1] local 192.168.1.4%eth1 port 45680 connected with 192.168.1.1 port 5001 
> (MSS=1448) (prefetch=4) (trip-times) (sock=3) (ct=5.30 ms) on 2021-10-26 
> 20:25:29 (PDT)
> [ ID] Interval        Transfer    Bandwidth       Write/Err  Rtry     
> Cwnd/RTT        NetPwr
> [  1] 0.00-1.00 sec  40.1 KBytes   329 Kbits/sec  11/0          0       
> 14K/10091 us  4
> [  1] 0.00-1.00 sec S8-PDF: 
> bin(w=100us):cnt(10)=1:1,36:1,40:1,44:1,46:1,48:1,49:1,50:2,52:1 
> (5.00/95.00/99.7%=1/52/52,Outliers=0,obl/obu=0/0) (5.121 ms/1635305129.152339)
> [  1] 1.00-2.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       
> 14K/4990 us  8
> [  1] 1.00-2.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,39:1,45:1,49:5,50:1 
> (5.00/95.00/99.7%=1/50/50,Outliers=0,obl/obu=0/0) (4.991 ms/1635305130.153330)
> [  1] 2.00-3.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       
> 14K/4904 us  8
> [  1] 2.00-3.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,29:1,49:4,50:1,59:1,75:1 
> (5.00/95.00/99.7%=1/75/75,Outliers=0,obl/obu=0/0) (7.455 ms/1635305131.147353)
> [  1] 3.00-4.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       
> 14K/4964 us  8
> [  1] 3.00-4.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,49:4,50:2,59:1,65:1 
> (5.00/95.00/99.7%=1/65/65,Outliers=0,obl/obu=0/0) (6.460 ms/1635305132.146338)
> [  1] 4.00-5.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       
> 14K/4970 us  8
> [  1] 4.00-5.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,49:6,59:1,65:1 
> (5.00/95.00/99.7%=1/65/65,Outliers=0,obl/obu=0/0) (6.404 ms/1635305133.146335)
> [  1] 5.00-6.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       
> 14K/4986 us  8
> [  1] 5.00-6.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,48:1,49:1,50:4,59:1,64:1 
> (5.00/95.00/99.7%=1/64/64,Outliers=0,obl/obu=0/0) (6.395 ms/1635305134.146343)
> [  1] 6.00-7.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       
> 14K/5059 us  8
> [  1] 6.00-7.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,39:1,49:3,50:2,60:1,85:1 
> (5.00/95.00/99.7%=1/85/85,Outliers=0,obl/obu=0/0) (8.417 ms/1635305135.148343)
> [  1] 7.00-8.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       
> 14K/5407 us  8
> [  1] 7.00-8.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,40:1,49:4,50:1,59:1,75:1 
> (5.00/95.00/99.7%=1/75/75,Outliers=0,obl/obu=0/0) (7.428 ms/1635305136.147343)
> [  1] 8.00-9.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       
> 14K/5188 us  8
> [  1] 8.00-9.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,40:1,49:3,50:3,64:1 
> (5.00/95.00/99.7%=1/64/64,Outliers=0,obl/obu=0/0) (6.388 ms/1635305137.146284)
> [  1] 9.00-10.00 sec  40.0 KBytes   

Re: [Cake] [Bloat] [Make-wifi-fast] TCP_NOTSENT_LOWAT applied to e2e TCP msg latency

2021-10-26 Thread Bob McMahon via Cake
This is linux. The code flow is burst writes until the burst size, take a
timestamp, call select(), take second timestamp and insert time delta into
histogram, await clock_nanosleep() to schedule the next burst. (actually,
the deltas, inserts into the histogram and user i/o are done in another
thread, i.e. iperf 2's reporter thread.)

I still must be missing something.  Does anything else need to be set to
reduce the skb size? Everything seems to be indicating 4K writes even when
gso_max_size is 2000 (I assume these are units of bytes?) There are ten
writes, ten reads and ten  RTTs for the bursts.  I don't see partial writes
at the app level.

[root@localhost iperf2-code]# ip link set dev eth1 gso_max_size 2000
[root@localhost iperf2-code]# ip -d link sh dev eth1
9: eth1:  mtu 1500 qdisc fq_codel state
UNKNOWN mode DEFAULT group default qlen 1000
link/ether 00:90:4c:40:04:59 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu
68 maxmtu 1500 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size
2000 gso_max_segs 65535
[root@localhost iperf2-code]# uname -r
5.0.9-301.fc30.x86_64


It looks like RTT is being driven by WiFi TXOPs as doubling the write size
increases the aggregation by two but has no significant effect on the RTTs.

4K writes: tot_mpdus 328 tot_ampdus 209 mpduperampdu 2


8k writes:  tot_mpdus 317 tot_ampdus 107 mpduperampdu 3


[root@localhost iperf2-code]# src/iperf -c 192.168.1.1%eth1 --trip-times -i
1 -e --tcp-write-prefetch 4 -l 4K --burst-size=40K --histograms
WARN: option of --burst-size without --burst-period defaults --burst-period
to 1 second

Client connecting to 192.168.1.1, TCP port 5001 with pid 5145 via eth1 (1
flows)
Write buffer size: 4096 Byte
Bursting: 40.0 KByte every 1.00 seconds
TCP window size: 85.0 KByte (default)
Event based writes (pending queue watermark at 4 bytes)
Enabled select histograms bin-width=0.100 ms, bins=1

[  1] local 192.168.1.4%eth1 port 45680 connected with 192.168.1.1 port
5001 (MSS=1448) (prefetch=4) (trip-times) (sock=3) (ct=5.30 ms) on
2021-10-26 20:25:29 (PDT)
[ ID] IntervalTransferBandwidth   Write/Err  Rtry
Cwnd/RTTNetPwr
[  1] 0.00-1.00 sec  40.1 KBytes   329 Kbits/sec  11/0  0
14K/10091 us  4
[  1] 0.00-1.00 sec S8-PDF:
bin(w=100us):cnt(10)=1:1,36:1,40:1,44:1,46:1,48:1,49:1,50:2,52:1
(5.00/95.00/99.7%=1/52/52,Outliers=0,obl/obu=0/0) (5.121
ms/1635305129.152339)
[  1] 1.00-2.00 sec  40.0 KBytes   328 Kbits/sec  10/0  0
14K/4990 us  8
[  1] 1.00-2.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,39:1,45:1,49:5,50:1
(5.00/95.00/99.7%=1/50/50,Outliers=0,obl/obu=0/0) (4.991
ms/1635305130.153330)
[  1] 2.00-3.00 sec  40.0 KBytes   328 Kbits/sec  10/0  0
14K/4904 us  8
[  1] 2.00-3.00 sec S8-PDF:
bin(w=100us):cnt(10)=1:2,29:1,49:4,50:1,59:1,75:1
(5.00/95.00/99.7%=1/75/75,Outliers=0,obl/obu=0/0) (7.455
ms/1635305131.147353)
[  1] 3.00-4.00 sec  40.0 KBytes   328 Kbits/sec  10/0  0
14K/4964 us  8
[  1] 3.00-4.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,49:4,50:2,59:1,65:1
(5.00/95.00/99.7%=1/65/65,Outliers=0,obl/obu=0/0) (6.460
ms/1635305132.146338)
[  1] 4.00-5.00 sec  40.0 KBytes   328 Kbits/sec  10/0  0
14K/4970 us  8
[  1] 4.00-5.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,49:6,59:1,65:1
(5.00/95.00/99.7%=1/65/65,Outliers=0,obl/obu=0/0) (6.404
ms/1635305133.146335)
[  1] 5.00-6.00 sec  40.0 KBytes   328 Kbits/sec  10/0  0
14K/4986 us  8
[  1] 5.00-6.00 sec S8-PDF:
bin(w=100us):cnt(10)=1:2,48:1,49:1,50:4,59:1,64:1
(5.00/95.00/99.7%=1/64/64,Outliers=0,obl/obu=0/0) (6.395
ms/1635305134.146343)
[  1] 6.00-7.00 sec  40.0 KBytes   328 Kbits/sec  10/0  0
14K/5059 us  8
[  1] 6.00-7.00 sec S8-PDF:
bin(w=100us):cnt(10)=1:2,39:1,49:3,50:2,60:1,85:1
(5.00/95.00/99.7%=1/85/85,Outliers=0,obl/obu=0/0) (8.417
ms/1635305135.148343)
[  1] 7.00-8.00 sec  40.0 KBytes   328 Kbits/sec  10/0  0
14K/5407 us  8
[  1] 7.00-8.00 sec S8-PDF:
bin(w=100us):cnt(10)=1:2,40:1,49:4,50:1,59:1,75:1
(5.00/95.00/99.7%=1/75/75,Outliers=0,obl/obu=0/0) (7.428
ms/1635305136.147343)
[  1] 8.00-9.00 sec  40.0 KBytes   328 Kbits/sec  10/0  0
14K/5188 us  8
[  1] 8.00-9.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,40:1,49:3,50:3,64:1
(5.00/95.00/99.7%=1/64/64,Outliers=0,obl/obu=0/0) (6.388
ms/1635305137.146284)
[  1] 9.00-10.00 sec  40.0 KBytes   328 Kbits/sec  10/0  0
14K/5306 us  8
[  1] 9.00-10.00 sec S8-PDF:
bin(w=100us):cnt(10)=1:2,39:1,49:2,50:2,51:1,60:1,65:1
(5.00/95.00/99.7%=1/65/65,Outliers=0,obl/obu=0/0) (6.422
ms/1635305138.146316)
[  1] 0.00-10.01 sec   400 KBytes   327 Kbits/sec  102/0  0
14K/5939 us  7
[  1] 0.00-10.01 sec S8(f)-PDF:
bin(w=100us):cnt(100)=1:19,29:1,36:1,39:3,40:3,44:1,45:1,46:1,48:2,49:33,50:18,51:1,52:1,59:5,60:2,64:2,65:3,75:2,85:1
(5.00/95.00/99.7%=1/65/85,Outliers=0,obl/obu=0/0) (8.417
ms/1635305135.148343)

[root@localhost iperf2-code]# 

Re: [Cake] [Bloat] [Make-wifi-fast] TCP_NOTSENT_LOWAT applied to e2e TCP msg latency

2021-10-26 Thread Eric Dumazet


On 10/26/21 4:38 PM, Christoph Paasch wrote:
> Hi Bob,
> 
>> On Oct 26, 2021, at 4:23 PM, Bob McMahon > > wrote:
>> I'm confused. I don't see any blocking nor partial writes per the write() at 
>> the app level with TCP_NOTSENT_LOWAT set at 4 bytes. The burst is 40K, the 
>> write size is 4K and the watermark is 4 bytes. There are ten writes per 
>> burst.
> 
> You are on Linux here, right?
> 
> AFAICS, Linux will still accept whatever fits in an skb. And that is likely 
> more than 4K (with GSO on by default).

This (max payload per skb) can be tuned at the driver level, at least for 
experimental purposes or dedicated devices.

ip link set dev eth0 gso_max_size 8000

To fetch current values :

ip -d link sh dev eth0


> 
> However, do you go back to select() after each write() or do you loop over 
> the write() calls?
> 
> 
> Christoph
> 
>> The S8 histograms are the times waiting on the select().  The first value is 
>> the bin number (multiplied by 100usec bin width) and second the bin count. 
>> The worst case time is at the end and is timestamped per unix epoch.
>>
>> The second run is over a controlled WiFi link where a 99.7% point of 4-8ms 
>> for a WiFi TX op arbitration win is in the ballpark. The first is 1G wired 
>> and is in the 600 usec range. (No media arbitration there.)
>>
>>  [root@localhost iperf2-code]# src/iperf -c 10.19.87.9 --trip-times -i 1 -e 
>> --tcp-write-prefetch 4 -l 4K --burst-size=40K --histograms
>> WARN: option of --burst-size without --burst-period defaults --burst-period 
>> to 1 second
>> 
>> Client connecting to 10.19.87.9, TCP port 5001 with pid 2124 (1 flows)
>> Write buffer size: 4096 Byte
>> Bursting: 40.0 KByte every 1.00 seconds
>> TCP window size: 85.0 KByte (default)
>> Event based writes (pending queue watermark at 4 bytes)
>> Enabled select histograms bin-width=0.100 ms, bins=1
>> 
>> [  1] local 10.19.87.10%eth0 port 33166 connected with 10.19.87.9 port 5001 
>> (MSS=1448) (prefetch=4) (trip-times) (sock=3) (ct=0.54 ms) on 2021-10-26 
>> 16:07:33 (PDT)
>> [ ID] Interval        Transfer    Bandwidth       Write/Err  Rtry     
>> Cwnd/RTT        NetPwr
>> [  1] 0.00-1.00 sec  40.1 KBytes   329 Kbits/sec  11/0          0       
>> 14K/5368 us  8
>> [  1] 0.00-1.00 sec S8-PDF: bin(w=100us):cnt(10)=1:1,2:5,3:2,4:1,11:1 
>> (5.00/95.00/99.7%=1/11/11,Outliers=0,obl/obu=0/0) (1.089 
>> ms/1635289653.928360)
>> [  1] 1.00-2.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       
>> 14K/569 us  72
>> [  1] 1.00-2.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,2:1,3:4,4:1,7:1,8:1 
>> (5.00/95.00/99.7%=1/8/8,Outliers=0,obl/obu=0/0) (0.736 ms/1635289654.928088)
>> [  1] 2.00-3.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       
>> 14K/312 us  131
>> [  1] 2.00-3.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,2:2,3:2,5:2,6:1 
>> (5.00/95.00/99.7%=1/6/6,Outliers=0,obl/obu=0/0) (0.548 ms/1635289655.927776)
>> [  1] 3.00-4.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       
>> 14K/302 us  136
>> [  1] 3.00-4.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,2:2,3:5,6:1 
>> (5.00/95.00/99.7%=1/6/6,Outliers=0,obl/obu=0/0) (0.584 ms/1635289656.927814)
>> [  1] 4.00-5.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       
>> 14K/316 us  130
>> [  1] 4.00-5.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,3:2,4:2,5:2,6:1 
>> (5.00/95.00/99.7%=1/6/6,Outliers=0,obl/obu=0/0) (0.572 ms/1635289657.927810)
>> [  1] 5.00-6.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       
>> 14K/253 us  162
>> [  1] 5.00-6.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,2:2,3:4,5:1 
>> (5.00/95.00/99.7%=1/5/5,Outliers=0,obl/obu=0/0) (0.417 ms/1635289658.927630)
>> [  1] 6.00-7.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       
>> 14K/290 us  141
>> [  1] 6.00-7.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,3:3,4:3,6:1 
>> (5.00/95.00/99.7%=1/6/6,Outliers=0,obl/obu=0/0) (0.573 ms/1635289659.927771)
>> [  1] 7.00-8.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       
>> 14K/359 us  114
>> [  1] 7.00-8.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,3:4,4:3,6:1 
>> (5.00/95.00/99.7%=1/6/6,Outliers=0,obl/obu=0/0) (0.570 ms/1635289660.927753)
>> [  1] 8.00-9.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       
>> 14K/349 us  117
>> [  1] 8.00-9.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,3:5,4:1,7:1 
>> (5.00/95.00/99.7%=1/7/7,Outliers=0,obl/obu=0/0) (0.608 ms/1635289661.927843)
>> [  1] 9.00-10.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       
>> 14K/347 us  118
>> [  1] 9.00-10.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,2:1,3:5,8:1 
>> (5.00/95.00/99.7%=1/8/8,Outliers=0,obl/obu=0/0) (0.725 ms/1635289662.927861)
>> [  1] 0.00-10.01 sec   400 KBytes   327 Kbits/sec  102/0          0       
>> 14K/1519 us  27
>> [  1] 0.00-10.01 sec S8(f)-PDF: 
>> bin(w=100us):cnt(100)=1:25,2:13,3:36,4:11,5:5,6:5,7:2,8:2,11:1 
>> 

Re: [Cake] [Bloat] [Make-wifi-fast] TCP_NOTSENT_LOWAT applied to e2e TCP msg latency

2021-10-26 Thread Bob McMahon via Cake
I'm confused. I don't see any blocking nor partial writes per the write()
at the app level with TCP_NOTSENT_LOWAT set at 4 bytes. The burst is 40K,
the write size is 4K and the watermark is 4 bytes. There are ten writes per
burst.

The S8 histograms are the times waiting on the select().  The first value
is the bin number (multiplied by 100usec bin width) and second the bin
count. The worst case time is at the end and is timestamped per unix epoch.

The second run is over a controlled WiFi link where a 99.7% point of 4-8ms
for a WiFi TX op arbitration win is in the ballpark. The first is 1G wired
and is in the 600 usec range. (No media arbitration there.)

 [root@localhost iperf2-code]# src/iperf -c 10.19.87.9 --trip-times -i 1 -e
--tcp-write-prefetch 4 -l 4K --burst-size=40K --histograms
WARN: option of --burst-size without --burst-period defaults --burst-period
to 1 second

Client connecting to 10.19.87.9, TCP port 5001 with pid 2124 (1 flows)
Write buffer size: 4096 Byte
Bursting: 40.0 KByte every 1.00 seconds
TCP window size: 85.0 KByte (default)
Event based writes (pending queue watermark at 4 bytes)
Enabled select histograms bin-width=0.100 ms, bins=1

[  1] local 10.19.87.10%eth0 port 33166 connected with 10.19.87.9 port 5001
(MSS=1448) (prefetch=4) (trip-times) (sock=3) (ct=0.54 ms) on 2021-10-26
16:07:33 (PDT)
[ ID] IntervalTransferBandwidth   Write/Err  Rtry
Cwnd/RTTNetPwr
[  1] 0.00-1.00 sec  40.1 KBytes   329 Kbits/sec  11/0  0
14K/5368 us  8
[  1] 0.00-1.00 sec S8-PDF: bin(w=100us):cnt(10)=1:1,2:5,3:2,4:1,11:1
(5.00/95.00/99.7%=1/11/11,Outliers=0,obl/obu=0/0) (1.089
ms/1635289653.928360)
[  1] 1.00-2.00 sec  40.0 KBytes   328 Kbits/sec  10/0  0
14K/569 us  72
[  1] 1.00-2.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,2:1,3:4,4:1,7:1,8:1
(5.00/95.00/99.7%=1/8/8,Outliers=0,obl/obu=0/0) (0.736 ms/1635289654.928088)
[  1] 2.00-3.00 sec  40.0 KBytes   328 Kbits/sec  10/0  0
14K/312 us  131
[  1] 2.00-3.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,2:2,3:2,5:2,6:1
(5.00/95.00/99.7%=1/6/6,Outliers=0,obl/obu=0/0) (0.548 ms/1635289655.927776)
[  1] 3.00-4.00 sec  40.0 KBytes   328 Kbits/sec  10/0  0
14K/302 us  136
[  1] 3.00-4.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,2:2,3:5,6:1
(5.00/95.00/99.7%=1/6/6,Outliers=0,obl/obu=0/0) (0.584 ms/1635289656.927814)
[  1] 4.00-5.00 sec  40.0 KBytes   328 Kbits/sec  10/0  0
14K/316 us  130
[  1] 4.00-5.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,3:2,4:2,5:2,6:1
(5.00/95.00/99.7%=1/6/6,Outliers=0,obl/obu=0/0) (0.572 ms/1635289657.927810)
[  1] 5.00-6.00 sec  40.0 KBytes   328 Kbits/sec  10/0  0
14K/253 us  162
[  1] 5.00-6.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,2:2,3:4,5:1
(5.00/95.00/99.7%=1/5/5,Outliers=0,obl/obu=0/0) (0.417 ms/1635289658.927630)
[  1] 6.00-7.00 sec  40.0 KBytes   328 Kbits/sec  10/0  0
14K/290 us  141
[  1] 6.00-7.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,3:3,4:3,6:1
(5.00/95.00/99.7%=1/6/6,Outliers=0,obl/obu=0/0) (0.573 ms/1635289659.927771)
[  1] 7.00-8.00 sec  40.0 KBytes   328 Kbits/sec  10/0  0
14K/359 us  114
[  1] 7.00-8.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,3:4,4:3,6:1
(5.00/95.00/99.7%=1/6/6,Outliers=0,obl/obu=0/0) (0.570 ms/1635289660.927753)
[  1] 8.00-9.00 sec  40.0 KBytes   328 Kbits/sec  10/0  0
14K/349 us  117
[  1] 8.00-9.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,3:5,4:1,7:1
(5.00/95.00/99.7%=1/7/7,Outliers=0,obl/obu=0/0) (0.608 ms/1635289661.927843)
[  1] 9.00-10.00 sec  40.0 KBytes   328 Kbits/sec  10/0  0
14K/347 us  118
[  1] 9.00-10.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,2:1,3:5,8:1
(5.00/95.00/99.7%=1/8/8,Outliers=0,obl/obu=0/0) (0.725 ms/1635289662.927861)
[  1] 0.00-10.01 sec   400 KBytes   327 Kbits/sec  102/0  0
14K/1519 us  27
[  1] 0.00-10.01 sec S8(f)-PDF:
bin(w=100us):cnt(100)=1:25,2:13,3:36,4:11,5:5,6:5,7:2,8:2,11:1
(5.00/95.00/99.7%=1/7/11,Outliers=0,obl/obu=0/0) (1.089
ms/1635289653.928360)

[root@localhost iperf2-code]# src/iperf -c 192.168.1.1 --trip-times -i 1 -e
--tcp-write-prefetch 4 -l 4K --burst-size=40K --histograms
WARN: option of --burst-size without --burst-period defaults --burst-period
to 1 second

Client connecting to 192.168.1.1, TCP port 5001 with pid 2131 (1 flows)
Write buffer size: 4096 Byte
Bursting: 40.0 KByte every 1.00 seconds
TCP window size: 85.0 KByte (default)
Event based writes (pending queue watermark at 4 bytes)
Enabled select histograms bin-width=0.100 ms, bins=1

[  1] local 192.168.1.4%eth1 port 45518 connected with 192.168.1.1 port
5001 (MSS=1448) (prefetch=4) (trip-times) (sock=3) (ct=5.48 ms) on
2021-10-26 16:07:56 (PDT)
[ ID] IntervalTransferBandwidth   Write/Err  Rtry
Cwnd/RTTNetPwr
[  1] 0.00-1.00 sec  40.1 KBytes 

Re: [Cake] [Bloat] [Make-wifi-fast] TCP_NOTSENT_LOWAT applied to e2e TCP msg latency

2021-10-25 Thread Eric Dumazet


On 10/25/21 8:11 PM, Stuart Cheshire via Bloat wrote:
> On 21 Oct 2021, at 17:51, Bob McMahon via Make-wifi-fast 
>  wrote:
> 
>> Hi All,
>>
>> Sorry for the spam. I'm trying to support a meaningful TCP message latency 
>> w/iperf 2 from the sender side w/o requiring e2e clock synchronization. I 
>> thought I'd try to use the TCP_NOTSENT_LOWAT event to help with this. It 
>> seems that this event goes off when the bytes are in flight vs have reached 
>> the destination network stack. If that's the case, then iperf 2 client 
>> (sender) may be able to produce the message latency by adding the drain time 
>> (write start to TCP_NOTSENT_LOWAT) and the sampled RTT.
>>
>> Does this seem reasonable?
> 
> I’m not 100% sure what you’re asking, but I will try to help.
> 
> When you set TCP_NOTSENT_LOWAT, the TCP implementation won’t report your 
> endpoint as writable (e.g., via kqueue or epoll) until less than that 
> threshold of data remains unsent. It won’t stop you writing more bytes if you 
> want to, up to the socket send buffer size, but it won’t *ask* you for more 
> data until the TCP_NOTSENT_LOWAT threshold is reached.


When I implemented TCP_NOTSENT_LOWAT back in 2013 [1], I made sure that 
sendmsg() would actually
stop feeding more bytes in TCP transmit queue if the current amount of unsent 
bytes
was above the threshold.

So it looks like Apple implementation is different, based on your description ?

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=c9bee3b7fdecb0c1d070c7b54113b3bdfb9a3d36

netperf does not use epoll(), but rather a loop over sendmsg().

One of the point of TCP_NOTSENT_LOWAT for Google was to be able to considerably 
increase
max number of bytes in transmit queues (3rd column of 
/proc/sys/net/ipv4/tcp_wmem)
by 10x, allowing for autotune to increase BDP for big RTT flows, this without
increasing memory needs for flows with small RTT.

 In other words, the TCP implementation attempts to keep BDP bytes in flight + 
TCP_NOTSENT_LOWAT bytes buffered and ready to go. The BDP of bytes in flight is 
necessary to fill the network pipe and get good throughput. The 
TCP_NOTSENT_LOWAT of bytes buffered and ready to go is provided to give the 
source software some advance notice that the TCP implementation will soon be 
looking for more bytes to send, so that the buffer doesn’t run dry, thereby 
lowering throughput. (The old SO_SNDBUF option conflates both “bytes in flight” 
and “bytes buffered and ready to go” into the same number.)
> 
> If you wait for the TCP_NOTSENT_LOWAT notification, write a chunk of n bytes 
> of data, and then wait for the next TCP_NOTSENT_LOWAT notification, that will 
> tell you roughly how long it took n bytes to depart the machine. You won’t 
> know why, though. The bytes could depart the machine in response for acks 
> indicating that the same number of bytes have been accepted at the receiver. 
> But the bytes can also depart the machine because CWND is growing. Of course, 
> both of those things are usually happening at the same time.
> 
> How to use TCP_NOTSENT_LOWAT is explained in this video:
> 
> 
> 
> Later in the same video is a two-minute demo (time offset 42:00 to time 
> offset 44:00) showing a “before and after” demo illustrating the dramatic 
> difference this makes for screen sharing responsiveness.
> 
> 
> 
> Stuart Cheshire
> ___
> Bloat mailing list
> bl...@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/bloat
> 
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake