This is a note to let you know that I've just added the patch titled

    [PATCH 03/28] tcp: allow splice() to build full TSO packets

to the 3.3-stable tree which can be found at:
    
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     tcp-allow-splice-to-build-full-tso-packets.patch
and it can be found in the queue-3.3 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <[email protected]> know about it.


>From 3aebf9d9bb951e41e6e87aa6a63c1ebe0a4e31b6 Mon Sep 17 00:00:00 2001
From: Eric Dumazet <[email protected]>
Date: Tue, 24 Apr 2012 22:12:06 -0400
Subject: [PATCH 03/28] tcp: allow splice() to build full TSO packets


From: Eric Dumazet <[email protected]>

[ This combines upstream commit
  2f53384424251c06038ae612e56231b96ab610ee and the follow-on bug fix
  commit 35f9c09fe9c72eb8ca2b8e89a593e1c151f28fc2 ]

vmsplice()/splice(pipe, socket) call do_tcp_sendpages() one page at a
time, adding at most 4096 bytes to an skb. (assuming PAGE_SIZE=4096)

The call to tcp_push() at the end of do_tcp_sendpages() forces an
immediate xmit when pipe is not already filled, and tso_fragment() try
to split these skb to MSS multiples.

4096 bytes are usually split in a skb with 2 MSS, and a remaining
sub-mss skb (assuming MTU=1500)

This makes slow start suboptimal because many small frames are sent to
qdisc/driver layers instead of big ones (constrained by cwnd and packets
in flight of course)

In fact, applications using sendmsg() (adding an additional memory copy)
instead of vmsplice()/splice()/sendfile() are a bit faster because of
this anomaly, especially if serving small files in environments with
large initial [c]wnd.

Call tcp_push() only if MSG_MORE is not set in the flags parameter.

This bit is automatically provided by splice() internals but for the
last page, or on all pages if user specified SPLICE_F_MORE splice()
flag.

In some workloads, this can reduce number of sent logical packets by an
order of magnitude, making zero-copy TCP actually faster than
one-copy :)

Reported-by: Tom Herbert <[email protected]>
Cc: Nandita Dukkipati <[email protected]>
Cc: Neal Cardwell <[email protected]>
Cc: Tom Herbert <[email protected]>
Cc: Yuchung Cheng <[email protected]>
Cc: H.K. Jerry Chu <[email protected]>
Cc: Maciej Å»enczykowski <[email protected]>
Cc: Mahesh Bandewar <[email protected]>
Cc: Ilpo Järvinen <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
 fs/splice.c            |    5 ++++-
 include/linux/socket.h |    2 +-
 net/ipv4/tcp.c         |    2 +-
 net/socket.c           |    6 +++---
 4 files changed, 9 insertions(+), 6 deletions(-)

--- a/fs/splice.c
+++ b/fs/splice.c
@@ -30,6 +30,7 @@
 #include <linux/uio.h>
 #include <linux/security.h>
 #include <linux/gfp.h>
+#include <linux/socket.h>
 
 /*
  * Attempt to steal a page from a pipe buffer. This should perhaps go into
@@ -690,7 +691,9 @@ static int pipe_to_sendpage(struct pipe_
        if (!likely(file->f_op && file->f_op->sendpage))
                return -EINVAL;
 
-       more = (sd->flags & SPLICE_F_MORE) || sd->len < sd->total_len;
+       more = (sd->flags & SPLICE_F_MORE) ? MSG_MORE : 0;
+       if (sd->len < sd->total_len)
+               more |= MSG_SENDPAGE_NOTLAST;
        return file->f_op->sendpage(file, buf->page, buf->offset,
                                    sd->len, &pos, more);
 }
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -265,7 +265,7 @@ struct ucred {
 #define MSG_NOSIGNAL   0x4000  /* Do not generate SIGPIPE */
 #define MSG_MORE       0x8000  /* Sender will send more */
 #define MSG_WAITFORONE 0x10000 /* recvmmsg(): block until 1+ packets avail */
-
+#define MSG_SENDPAGE_NOTLAST 0x20000 /* sendpage() internal : not the last 
page */
 #define MSG_EOF         MSG_FIN
 
 #define MSG_CMSG_CLOEXEC 0x40000000    /* Set close_on_exit for file
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -858,7 +858,7 @@ wait_for_memory:
        }
 
 out:
-       if (copied)
+       if (copied && !(flags & MSG_SENDPAGE_NOTLAST))
                tcp_push(sk, flags, mss_now, tp->nonagle);
        return copied;
 
--- a/net/socket.c
+++ b/net/socket.c
@@ -811,9 +811,9 @@ static ssize_t sock_sendpage(struct file
 
        sock = file->private_data;
 
-       flags = !(file->f_flags & O_NONBLOCK) ? 0 : MSG_DONTWAIT;
-       if (more)
-               flags |= MSG_MORE;
+       flags = (file->f_flags & O_NONBLOCK) ? MSG_DONTWAIT : 0;
+       /* more is a combination of MSG_MORE and MSG_SENDPAGE_NOTLAST */
+       flags |= more;
 
        return kernel_sendpage(sock, page, offset, size, flags);
 }


Patches currently in stable-queue which might be from [email protected] are

queue-3.3/net-fix-a-race-in-sock_queue_err_skb.patch
queue-3.3/tcp-allow-splice-to-build-full-tso-packets.patch
queue-3.3/net-fix-proc-net-dev-regression.patch
queue-3.3/tcp-restore-correct-limit.patch
queue-3.3/net-allow-pskb_expand_head-to-get-maximum-tailroom.patch
queue-3.3/net-smsc911x-fix-skb-handling-in-receive-path.patch
queue-3.3/tcp-avoid-order-1-allocations-on-wifi-and-tx-path.patch
queue-3.3/netlink-fix-races-after-skb-queueing.patch
--
To unsubscribe from this list: send the line "unsubscribe stable" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to