On Wed, Feb 03, 2021 at 10:56:38AM +0100, Jan Klemkow wrote:
> On Tue, Jan 05, 2021 at 10:30:33AM +0100, Claudio Jeker wrote:
> > On Tue, Jan 05, 2021 at 10:16:04AM +0100, Jan Klemkow wrote:
> > > On Wed, Dec 23, 2020 at 11:59:13AM +0000, Stuart Henderson wrote:
> > > > On 2020/12/17 20:50, Jan Klemkow wrote:
> > > > > ping
> > > > > 
> > > > > On Fri, Nov 06, 2020 at 01:10:52AM +0100, Jan Klemkow wrote:
> > > > > > > bluhm and I made some network performance measurements and kernel
> > > > > > profiling.
> > > > 
> > > > I've been running this on my workstation since you sent it out - lots
> > > > of long-running ssh connections, hourly reposync, daily rsync of base
> > > > snapshots.
> > > > 
> > > > I don't know enough about TCP stack behaviour to really give a
> > > > meaningful OK, but certainly not seeing any problems with it.
> > > 
> > > Thanks, Stuart.  Has someone else tested this diff?  Or, are there some
> > > opinions or objections about it?  Even bike-shedding is welcome :-)
> > 
> > From my memory TCP uses the ACKs on startup to increase the send window
> > and so your diff could slow down the initial startup. Not sure if that
> > matters actually. It can have some impact if userland reads in big blocks
> > at infrequent intervals since then the ACK clock slows down.
> > 
> > I guess to get coverage it would be best to commit this and then monitor
> > the lists for possible slowdowns.
> 
> Is there a way to commit this, or to test the diff in snapshots?

Just commit it. OK claudio@
If people see problems we can back it out again.
 
> bye,
> Jan
>  
> > > > > > > Setup:      Linux (iperf) -10gbit-> OpenBSD (relayd) -10gbit-> Linux (iperf)
> > > > > > 
> > > > > > > We figured out that the kernel spends a huge amount of processing time
> > > > > > > sending ACKs to the sender on the receiving interface.  After
> > > > > > > receiving a data segment, we send out two ACKs.  The first one is sent
> > > > > > > in tcp_input() directly after receiving.  The second ACK is sent out
> > > > > > > after userland or the sosplice task has read some data out of the
> > > > > > > socket buffer.
> > > > > > 
> > > > > > > The first ACK in tcp_input() is sent after receiving every other data
> > > > > > > segment, as described in RFC 1122:
> > > > > > 
> > > > > >     4.2.3.2  When to Send an ACK Segment
> > > > > >             A TCP SHOULD implement a delayed ACK, but an ACK should
> > > > > >             not be excessively delayed; in particular, the delay
> > > > > >             MUST be less than 0.5 seconds, and in a stream of
> > > > > >             full-sized segments there SHOULD be an ACK for at least
> > > > > >             every second segment.
> > > > > > 
> > > > > > > This advice is based on the paper "Congestion Avoidance and Control":
> > > > > > 
> > > > > >     4 THE GATEWAY SIDE OF CONGESTION CONTROL
> > > > > >             The 8 KBps senders were talking to 4.3+BSD receivers
> > > > > >             which would delay an ack for atmost one packet (because
> > > > > >             of an ack’s clock’ role, the authors believe that the
> > > > > >             minimum ack frequency should be every other packet).
> > > > > > 
> > > > > > > Sending the first ACK (on every other packet) costs us too much
> > > > > > > processing time.  Thus, we run into a full socket buffer earlier.  The
> > > > > > > first ACK just acknowledges the received data, but does not update the
> > > > > > > window.  The second ACK, caused by the socket buffer reader, both
> > > > > > > acknowledges the data and updates the window.  So the second ACK is
> > > > > > > worth much more for fast packet processing than the first one.
> > > > > > 
> > > > > > > The performance improvement is about 33% with splicing and about 20%
> > > > > > > without splicing:
> > > > > > 
> > > > > >                     splicing        relaying
> > > > > > 
> > > > > >     current         3.1 GBit/s      2.6 GBit/s
> > > > > >     w/o first ack   4.1 GBit/s      3.1 GBit/s
> > > > > > 
> > > > > > > As far as I understand the implementations of other operating systems:
> > > > > > > Linux has implemented a custom TCP_QUICKACK socket option to turn this
> > > > > > > kind of feature on and off.  FreeBSD and NetBSD still depend on it
> > > > > > > when using the New Reno implementation.
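
For reference, a minimal sketch of how an application could toggle the
Linux option mentioned above.  set_quickack() is a hypothetical helper
name, not part of the diff or of any library; only setsockopt(2) and
the TCP_QUICKACK option from Linux tcp(7) are real.  On systems without
the option the helper simply fails with ENOPROTOOPT.

```c
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Hypothetical helper (not from the diff): switch quickack mode on
 * (immediate ACKs) or off (delayed ACKs) for one TCP socket.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int
set_quickack(int fd, int on)
{
#ifdef TCP_QUICKACK
	/* Linux only; per tcp(7) the flag is not permanent. */
	return setsockopt(fd, IPPROTO_TCP, TCP_QUICKACK, &on, sizeof(on));
#else
	(void)fd;
	(void)on;
	errno = ENOPROTOOPT;	/* option not available on this system */
	return -1;
#endif
}
```

Since tcp(7) describes the flag as non-permanent, a caller that wants a
fixed behavior would probably have to call set_quickack(fd, 0) or
set_quickack(fd, 1) again after its reads.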
> > > > > > 
> > > > > > > The following diff turns off the direct ACK on every other segment.  We
> > > > > > > have been running this diff in production on our own machines at genua
> > > > > > > and on our products for several months now.  We haven't noticed any
> > > > > > > problems, neither with interactive network sessions (ssh) nor with
> > > > > > > bulk traffic.
> > > > > > 
> > > > > > > Another solution could be a sysctl(3) or an additional socket option,
> > > > > > > similar to Linux, to control this behavior per socket or system wide.
> > > > > > > Also, a counter to ACK every 3rd, 4th... data segment could solve the
> > > > > > > problem.
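
The counter idea can be illustrated with a toy model.  This is a sketch
and not kernel code: count_input_acks() and its parameters are
hypothetical names, and the model ignores PUSH handling, loopback and
the delayed ACK timer.  It only counts how many ACKs the input path
would send on its own for a burst of data segments.

```c
/*
 * Toy model, not kernel code: count the ACKs sent straight from the
 * input path for a burst of `nsegs` full-sized data segments.
 *
 * ack_every == 0: never ACK from the input path; the ACK is left to
 *                 the socket buffer reader or the delayed ACK timer.
 * ack_every == n: ACK once n segments are pending; n == 2 models the
 *                 "every second segment" rule from RFC 1122.
 */
static unsigned int
count_input_acks(unsigned int nsegs, unsigned int ack_every)
{
	unsigned int pending = 0;	/* segments received, not yet ACKed */
	unsigned int acks = 0;
	unsigned int i;

	for (i = 0; i < nsegs; i++) {
		pending++;
		if (ack_every != 0 && pending >= ack_every) {
			acks++;
			pending = 0;
		}
	}
	return acks;
}
```

In this model a burst of 10 segments causes 5 input-path ACKs with
ack_every == 2, 2 with ack_every == 4, and none with ack_every == 0.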
> > > > > > 
> > > > > > bye,
> > > > > > Jan
> > > > > > 
> > > > > > Index: netinet/tcp_input.c
> > > > > > ===================================================================
> > > > > > RCS file: /cvs/src/sys/netinet/tcp_input.c,v
> > > > > > retrieving revision 1.365
> > > > > > diff -u -p -r1.365 tcp_input.c
> > > > > > --- netinet/tcp_input.c     19 Jun 2020 22:47:22 -0000      1.365
> > > > > > +++ netinet/tcp_input.c     5 Nov 2020 23:00:34 -0000
> > > > > > @@ -165,8 +165,8 @@ do { \
> > > > > >  #endif
> > > > > >  
> > > > > >  /*
> > > > > > > - * Macro to compute ACK transmission behavior.  Delay the ACK unless
> > > > > > > - * we have already delayed an ACK (must send an ACK every two segments).
> > > > > > > + * Macro to compute ACK transmission behavior.  Delay the ACK until
> > > > > > > + * a read from the socket buffer or the delayed ACK timer causes one.
> > > > > > >   * We also ACK immediately if we received a PUSH and the ACK-on-PUSH
> > > > > >   * option is enabled or when the packet is coming from a loopback
> > > > > >   * interface.
> > > > > > @@ -176,8 +176,7 @@ do { \
> > > > > >     struct ifnet *ifp = NULL; \
> > > > > >     if (m && (m->m_flags & M_PKTHDR)) \
> > > > > >             ifp = if_get(m->m_pkthdr.ph_ifidx); \
> > > > > > -   if (TCP_TIMER_ISARMED(tp, TCPT_DELACK) || \
> > > > > > -       (tcp_ack_on_push && (tiflags) & TH_PUSH) || \
> > > > > > +   if ((tcp_ack_on_push && (tiflags) & TH_PUSH) || \
> > > > > >         (ifp && (ifp->if_flags & IFF_LOOPBACK))) \
> > > > > >             tp->t_flags |= TF_ACKNOW; \
> > > > > >     else \
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> > -- 
> > :wq Claudio
> 

-- 
:wq Claudio
