Increasing TCP TSO size support

2024-02-02 Thread Scheffenegger, Richard
Hi, We have run a test for an RPC workload with 1MB IO sizes, and collected the tcp_default_output() len(gth) during the first pass in the output loop. In such a scenario, where the application frequently introduces small pauses (since the next large IO is only sent after the corresponding

Re: TSO + ECN

2023-12-22 Thread Scheffenegger, Richard
hen the CWR bit is encountered... I would like to gather some feedback from those who work on the various network drivers (intel, mlx, virtio, ...) on whether that sounds like a viable plan to rectify the sad state of ECN support with TSO - while becoming future-proof. > On Dec 20, 2023, at 12:1

TSO + ECN

2023-12-20 Thread Scheffenegger, Richard
Hi, I am curious if anyone here has experience with the handling of ECN in TSO-enabled drivers/hardware... The other day I found that the virtio driver would bail out with ENOTSUP when encountering the TCP CWR header bit on a TSO-enabled flow, when the host does not also claim ECN-support

RE: Network starvation question

2023-11-04 Thread Scheffenegger, Richard
Cheng is correct: a non-reactive UDP flow (an application pushing data as quickly as it can, without any regard to whether the packet even departs the machine) will always be able to usurp excessive amounts of network capacity. TCP uses (well-designed) congestion control, in order to prevent TCP

RE: Very slow scp performance comparing to Linux

2023-10-25 Thread Scheffenegger, Richard
Posting the full "iperf3 -i 1" output, as well as "netstat -snp tcp" before and after (or just the delta) would be nice; On high speed NICs, iperf3 is nowadays typically core-limited (scales with clock speed of the active core where the singular worker thread is running), but that should be

RE: BPF to filter/mod ARP

2023-03-01 Thread Scheffenegger, Richard
>> On 1. Mar 2023, at 21:33, Scheffenegger, Richard wrote: >> >> Hi group, >> >> Maybe someone can help me with this question - as I am usually only looking >> at L4 and the top side of L3 ;) > >> In order to validate a peculiar switch's behavior,

mlx5en & tcpdump -Q

2023-03-01 Thread Scheffenegger, Richard
Related to the other issue just mentioned, I found that when trying to perform unidirectional packet captures with the tcpdump -Q option against a CX5 NIC, I get this error message: tcpdump: e4a: pcap_setdirection() failed: Setting direction is not implemented on this

BPF to filter/mod ARP

2023-03-01 Thread Scheffenegger, Richard
Hi group, Maybe someone can help me with this question - as I am usually only looking at L4 and the top side of L3 ;) In order to validate a peculiar switch's behavior, I want to adjust some fields in gratuitous ARPs sent out by an interface, after a new IP is assigned or changed. I believe

RE: Too aggressive TCP ACKs

2022-11-10 Thread Scheffenegger, Richard
This is the current draft in this space: https://datatracker.ietf.org/doc/draft-gomez-tcpm-ack-rate-request/ and it has been adopted as a WG document at this week's IETF, from what I can tell. So it has traction. If you want to give your feedback, please subscribe to the tcpm mailing list, and

RE: Too aggressive TCP ACKs

2022-10-27 Thread Scheffenegger, Richard
To come back to this: with TSO / LRO disabled, FBSD is behaving per RFC by acking every other data packet, or by sending a delayed ACK after a short delay (e.g. when data stops on an "uneven" packet). If you want to see different ACK ratios, and also higher gigabit throughput rates for a

RE: Too aggressive TCP ACKs

2022-10-27 Thread Scheffenegger, Richard
It focuses on QUIC, but congestion control dynamics don't change with the protocol. You should be able to read there, but if not I'm happy to send anyone a pdf. >>> Is QUIC using an L=2 for ABC? >> >> I think that is the rfc recommendation, actual deployed reality is >> more

RE: IPv6 - NS, DAD and MLDv2 interaction

2022-02-23 Thread Scheffenegger, Richard
-Original Message- From: Lutz Donnerhacke > Yup. IPv6 replaced broadcast by multicast on the link layer. > >> It appears that some vendors of switches have started to become overly >> restrictive in forwarding Ethernet Multicast, and only deliver these >> *after* a Host has registered

IPv6 - NS, DAD and MLDv2 interaction

2022-02-23 Thread Scheffenegger, Richard
Hi, I hope someone more knowledgeable than me in IPv6 affairs can give an informed opinion on the following: As far as I know, an IPv6 host initially tries to perform Duplicate Address Detection, as well as Neighbor Discovery / Neighbor Solicitation. All of this typically works on Ethernet,

AW: NFS Mount Hangs

2021-04-12 Thread Scheffenegger, Richard
1857 Mobile Phone richard.scheffeneg...@netapp.com https://ts.la/richard49892 -Original Message- From: Rick Macklem Sent: Monday, 12 April 2021 00:50 To: Scheffenegger, Richard ; tue...@freebsd.org Cc: Youssef GHORBAL ; freebsd-net@freebsd.org Subject: Re: NFS Mount Hangs Ne

Re: NFS Mount Hangs

2021-04-11 Thread Scheffenegger, Richard
still shows differences between prior to my central upcall change, post that change and with d29690 ... From: tue...@freebsd.org Sent: Sunday, April 11, 2021 2:30:09 PM To: Rick Macklem Cc: Scheffenegger, Richard ; Youssef GHORBAL ; freebsd-net@freebsd.org Betr

AW: NFS Mount Hangs

2021-04-10 Thread Scheffenegger, Richard
bile Phone richard.scheffeneg...@netapp.com https://ts.la/richard49892 -Original Message- From: tue...@freebsd.org Sent: Saturday, 10 April 2021 18:13 To: Rick Macklem Cc: Scheffenegger, Richard ; Youssef GHORBAL ; freebsd-net@freebsd.org Subject: Re: NFS Mount Hangs Ne

Re: NFS Mount Hangs

2021-04-10 Thread Scheffenegger, Richard
From: tue...@freebsd.org Sent: Saturday, April 10, 2021 2:19 PM To: Scheffenegger, Richard Cc: Rick Macklem; Youssef GHORBAL; freebsd-net@freebsd.org Subject: Re: NFS Mount Hangs NetApp Security WARNING: This is an external email. Do not click links or open

AW: NFS Mount Hangs

2021-04-10 Thread Scheffenegger, Richard
Hi Rick, > Well, I have some good news and some bad news (the bad is mostly for Richard). > > The only message logged is: > tcpflags 0x4; tcp_do_segment: Timestamp missing, segment processed > normally > > But...the RST battle no longer occurs. Just one RST that works and then the > SYN gets

Re: NFS Mount Hangs

2021-04-04 Thread Scheffenegger, Richard
For what it's worth, SUSE found two bugs in the Linux nf_conntrack (stateful firewall) and the pfifo_fast scheduler, which could conspire to make TCP sessions hang forever. One is a missed update when the client is not using the noresvport mount option, which makes the firewall think RSTs are

AW: tcp-testsuite into src?

2021-03-23 Thread Scheffenegger, Richard
>> Yeah, it's not a problem to use binaries from ports in /usr/tests. As >> long as the tests can compile they can live in the base system. Is >> there a strong incentive to import them? > > The tests are just scripts, which can be executed by packetdrill, which is > available in the ports

AW: NFS Mount Hangs

2021-03-19 Thread Scheffenegger, Richard
Sorry, I thought this was a problem on stable/13. This is only in HEAD, stable/13 and 13.0 - never MFC'd to stable/12 or backported to 12.1 > I did some reshuffling of socket-upcalls recently in the TCP stack, to > prevent some race conditions with our $work in-kernel NFS server >

AW: NFS Mount Hangs

2021-03-19 Thread Scheffenegger, Richard
be impacted by this. Richard Scheffenegger -Original Message- From: owner-freebsd-...@freebsd.org On Behalf Of Rick Macklem Sent: Friday, 19 March 2021 16:58 To: tue...@freebsd.org Cc: Scheffenegger, Richard ; freebsd-net@freebsd.org; Alexander Motin Subject: Re: NFS

AW: NFS Mount Hangs

2021-03-18 Thread Scheffenegger, Richard
>>Output from the NFS Client when the issue occurs # netstat -an | grep >>NFS.Server.IP.X >>tcp 0 0 NFS.Client.IP.X:46896 NFS.Server.IP.X:2049 >>FIN_WAIT2 >I'm no TCP guy. Hopefully others might know why the client would be stuck in >FIN_WAIT2 (I vaguely recall this means

AW: panic: sackhint bytes rtx >= 0

2021-02-23 Thread Scheffenegger, Richard
Hi Andriy, I guess I am currently the person who has the most recent knowledge about that part of the base stack... Do you happen to have more (preceding) information about this, or a way to reproduce this? Are you running any special stack (RACK, BBR) which may have switched back to the

RE: Socket option to configure Ethernet PCP / CoS per-flow

2020-10-09 Thread Scheffenegger, Richard
t regards, Richard Scheffenegger -Original Message- From: Ryan Stone Sent: Thursday, 24 September 2020 23:31 To: Scheffenegger, Richard Cc: n...@freebsd.org; transp...@freebsd.org Subject: Re: Socket option to configure Ethernet PCP / CoS per-flow Hi Richard, At $WORK we're run

RE: Socket option to configure Ethernet PCP / CoS per-flow

2020-09-24 Thread Scheffenegger, Richard
tively lets each new session rotate through all PCPs, to make PFC more useful and not degrade into simple xon/xoff "global" flow control? Richard Scheffenegger -Original Message- From: Ryan Stone Sent: Thursday, 24 September 2020 23:31 To: Scheffenegger, Richard Cc: n...@free

RE: Socket option to configure Ethernet PCP / CoS per-flow

2020-09-11 Thread Scheffenegger, Richard
at FC issue? Richard Scheffenegger -Original Message- From: sth...@nethelp.no Sent: Friday, 11 September 2020 18:55 To: Scheffenegger, Richard Cc: n...@freebsd.org; transp...@freebsd.org Subject: Re: Socket option to configure Ethernet PCP / CoS per-flow NetApp Security WARNING: This i

Socket option to configure Ethernet PCP / CoS per-flow

2020-09-11 Thread Scheffenegger, Richard
Hi, Currently, upstream head has only an IOCTL API to set up interface-wide default PCP marking: #define SIOCGVLANPCP SIOCGLANPCP /* Get VLAN PCP */ #define SIOCSVLANPCP SIOCSLANPCP /* Set VLAN PCP */ And the interface is via ifconfig pcp . However, while this allows all

RE: TFO for NFS

2020-08-29 Thread Scheffenegger, Richard
of-concept patch, and try to get benchmark data. Let's continue the discussion then. Best regards, Richard -Original Message- From: Rick Macklem Sent: Saturday, 29 August 2020 04:11 To: Scheffenegger, Richard Cc: Michael Tuexen ; freebsd-net@freebsd.org Subject: Re: TFO for NFS N

RE: Fast recovery ssthresh value

2020-08-23 Thread Scheffenegger, Richard
Hi Liang, In SACK loss recovery, you can recover up to ssthresh (prior cwnd/2 [or 70% in case of cubic]) lost bytes - at least in theory. In comparison, (New)Reno can only recover one lost packet per window, and then keeps on transmitting new segments (ack + cwnd), even before the receipt of
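
The ssthresh figures quoted in this message (half of the prior cwnd, or 70% for CUBIC) can be sketched in C as below. This is only an illustration under stated assumptions: the enum and the function name ssthresh_after_loss are hypothetical and are not taken from the FreeBSD sources, and real stacks apply further limits (such as a minimum of a few segments) that are left out here.

```c
#include <stdint.h>

/* Hypothetical selector for the congestion control algorithm in use. */
enum cc_algo { CC_NEWRENO, CC_CUBIC };

/*
 * Sketch: ssthresh after a loss event, following the ratios quoted in
 * the message (cwnd/2 for Reno-style halving, 70% of cwnd for CUBIC).
 * All values are in bytes.
 */
static uint32_t
ssthresh_after_loss(uint32_t cwnd, enum cc_algo algo)
{
	if (algo == CC_CUBIC) {
		/* 70% of the prior cwnd; widen to 64 bit to avoid overflow. */
		return (uint32_t)(((uint64_t)cwnd * 7) / 10);
	}
	/* Classic multiplicative decrease: half of the prior cwnd. */
	return cwnd / 2;
}
```

With a prior cwnd of 100000 bytes, this yields an ssthresh of 50000 bytes for the Reno-style case and 70000 bytes for CUBIC, which is the upper bound on bytes recoverable per window that the message refers to.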

RE: FreeBSD TCP/IP Tasks I (a contributor) could work on?

2020-08-12 Thread Scheffenegger, Richard
Hi Neel, If you are brave enough to leave the (mostly) stateless domain of L3 packet handling, and take on the challenge to dip your toes into the unforgiving realm of stateful L4 transport protocols, that would certainly be an area where every helping hand counts. E.g. Rod has recently found

RE: SFP I2C interface in drivers (driver SIOCGI2C support)

2020-06-22 Thread Scheffenegger, Richard
subclass = ethernet Which is the driver qlxgbe... Richard Scheffenegger -Original Message- From: Alexander V. Chernikov Sent: Monday, 22 June 2020 14:49 To: Scheffenegger, Richard ; n...@freebsd.org Subject: Re: SFP I2C interface in drivers (driver SIOCGI2C support) NetApp Security

SFP I2C interface in drivers (driver SIOCGI2C support)

2020-06-22 Thread Scheffenegger, Richard
Hi, I am just curious if anyone is working on NIC driver support for reading the pluggables' I2C status (temperature, voltage level, optical power levels) from Intel NICs and Qlogic CNAs? Richard Scheffenegger ___ freebsd-net@freebsd.org mailing

RE: Some question about DCTCP implementation in FreeBSD

2019-06-06 Thread Scheffenegger, Richard
Hi Yu He, This code is simply using integer arithmetic (float is not really possible in the kernel), scaling the fractional value of g by 1024 (a left shift by 10 bits). Max_alpha_value = 1024 is "1" shifted left by 10. Agreed that this is not clearly documented, and I believe the sysctl handler also
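
The fixed-point scheme described in this message can be illustrated with a short C sketch. It assumes the usual DCTCP estimator, alpha = (1 - g) * alpha + g * F, with both alpha and g stored as integers scaled by 1024. The names ALPHA_SHIFT, ALPHA_ONE and dctcp_alpha_update are hypothetical and chosen for this example; they are not the identifiers used in the FreeBSD module.

```c
#include <assert.h>
#include <stdint.h>

#define ALPHA_SHIFT 10                   /* fractions are scaled by 2^10 */
#define ALPHA_ONE   (1u << ALPHA_SHIFT)  /* 1024 represents "1.0" */

/*
 * Sketch: one update of the DCTCP alpha estimate using integers only.
 * alpha and g are fractions in [0, 1], stored as value * 1024.
 * bytes_ecn / bytes_total is the fraction F of ECN-marked bytes.
 */
static uint32_t
dctcp_alpha_update(uint32_t alpha, uint32_t g,
    uint64_t bytes_ecn, uint64_t bytes_total)
{
	assert(bytes_total > 0 && bytes_ecn <= bytes_total);
	assert(alpha <= ALPHA_ONE && g <= ALPHA_ONE);

	/* F scaled by 1024: shift first, then divide, to keep precision. */
	uint32_t frac = (uint32_t)((bytes_ecn << ALPHA_SHIFT) / bytes_total);

	/* alpha - g*alpha + g*F; each product is rescaled by >> 10. */
	uint32_t next = alpha - ((alpha * g) >> ALPHA_SHIFT) +
	    ((frac * g) >> ALPHA_SHIFT);

	/* Clamp to the fixed-point representation of 1.0. */
	return next > ALPHA_ONE ? ALPHA_ONE : next;
}
```

For example, with g = 1/16 (stored as 64) and alpha at its maximum of 1024, a window without any marked bytes lowers alpha to 960, while a fully marked window keeps it at 1024.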

RE: RFC8312 Cubic

2019-01-23 Thread Scheffenegger, Richard
art not always invoked with a properly (reset) ssthresh. Best regards, Richard From: Freddie Cash Sent: Wednesday, 23 January 2019 21:41 To: Scheffenegger, Richard Cc: freebsd-transp...@freebsd.org; freebsd-net@freebsd.org Subject: Re: RFC8312 Cubic NetApp Security WARNING: This is an external em

RFC8312 Cubic

2019-01-23 Thread Scheffenegger, Richard
Hi, we encountered an issue with the BSD11 cubic implementation during testing, when dealing with app-stalled and app-limited traffic patterns. The gist is that at least in the after_idle cong_func for cubic, the states for ssthresh and cubic epoch time are not reset, leading to excessive cwnd

RFC6675

2018-12-21 Thread Scheffenegger, Richard
For those inclined, I have a working patch for full RFC6675 support now (need to validate pipe and behavior in various scenarios still): a) enters Loss Recovery on either Nth dupack, or when more than (N-1)*MSS bytes are sacked (the latter relevant for ack thinning) b) a cumulative ack below
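
The entry condition in item a) of this message can be sketched as a small C predicate. This is a hedged illustration only: the function name should_enter_recovery and its parameters are hypothetical and do not come from the patch, which is not shown in this listing; dupthresh plays the role of N and is assumed to be at least 1.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch: decide whether to enter loss recovery, per item a) above.
 * Either the Nth duplicate ACK has arrived, or more than (N-1)*MSS
 * bytes have been SACKed (which still triggers when the ACK stream
 * has been thinned and fewer duplicate ACKs arrive).
 */
static bool
should_enter_recovery(uint32_t dupacks, uint64_t sacked_bytes,
    uint32_t mss, uint32_t dupthresh)
{
	if (dupacks >= dupthresh)
		return true;
	/* Strictly more than (N-1) full segments reported via SACK. */
	return sacked_bytes > (uint64_t)(dupthresh - 1) * mss;
}
```

With N = 3 and an MSS of 1460 bytes, a single duplicate ACK that reports 2921 SACKed bytes is enough to enter recovery, whereas exactly 2920 SACKed bytes is not.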

RE: ECN+ Implementation

2018-11-04 Thread Scheffenegger, Richard
Hi Pavan, Try to make this behavior change dependent on a sysctl, possibly with 3 different settings (legacy, ECN+, ECN++): You may also want to look at ECN++ https://tools.ietf.org/html/draft-ietf-tcpm-generalized-ecn-03 Especially when you are looking to deploy this in Datacenters in

TCP SACK improvements (RFC6675 rescue retransmission and lost retransmission detection)

2015-03-25 Thread Scheffenegger, Richard
Hi, I hope this is the correct forum to ask for help improving a rather crude patch to introduce RFC6675 Rescue Retransmissions and efficient Lost Retransmission Detection. Note that this is not a full implementation of RFC6675. The patch that I have is against 8.0, but I believe the SACK

RE: 1gbit LFN WAN link - odd tcp behavior

2011-06-28 Thread Scheffenegger, Richard
What is the effective latency under load, and the packet loss probability? Just to add some more detail. Richard Scheffenegger -Original Message- From: William Salt [mailto:williamejs...@googlemail.com] Sent: Monday, 27 June 2011 12:15 To: freebsd-net@freebsd.org Subject: 1gbit

kern/140597: Lost Retransmission Detection

2011-05-31 Thread Scheffenegger, Richard
Hi, please review the following patch, which enables the detection and recovery of lost retransmissions for SACK. This patch addresses the second most prominent cause of retransmission timeouts (after the failure to initiate loss recovery for small window sessions - e.g. Early Retransmit). The

RFC3517bis rescue retransmission

2011-05-31 Thread Scheffenegger, Richard
Hi, RFC3517bis has added a provision to fix a special corner case of SACK loss recovery. Under certain circumstances (end of stream), TCP SACK can be much less effective in loss recovery than TCP NewReno. For a history of this corner case, please see

Re: [CFT] Early Retransmit for TCP (rfc5827) patch

2011-05-31 Thread Scheffenegger, Richard
Hi Weongyo, Good to know that you are addressing the primary reason for retransmission timeouts with SACK. (Small window (early retransmit) is ~70%, lost retransmission ~25%, end-of-stream loss ~5% of all addressable causes for an RTO). I looked at your code to enable RFC5827 Early Retransmits.

Re: kern/140597 implement Lost Retransmission Detection

2010-04-20 Thread Scheffenegger, Richard
The following reply was made to PR kern/140597; it has been noted by GNATS. From: Scheffenegger, Richard r...@netapp.com To: bug-follo...@freebsd.org, Lawrence Stewart lastew...@swin.edu.au Cc: Biswas, Anumita anumita.bis...@netapp.com Subject: Re: kern/140597 implement Lost Retransmission

Re: kern/140597 implement Lost Retransmission Detection

2010-04-02 Thread Scheffenegger, Richard
The following reply was made to PR kern/140597; it has been noted by GNATS. From: Scheffenegger, Richard r...@netapp.com To: bug-follo...@freebsd.org, Lawrence Stewart lastew...@swin.edu.au Cc: Biswas, Anumita anumita.bis...@netapp.com Subject: Re: kern/140597 implement Lost Retransmission