Hi,

On Monday, December 5, 2011 10:53 CET, "Sebastian Reitenbach" 
<sebas...@l00-bugdead-prods.de> wrote: 
 
> On Sunday, December 4, 2011 21:01 CET, Mark Kettenis 
> <mark.kette...@xs4all.nl> wrote: 
>  
> > > Date: Sun, 4 Dec 2011 15:10:56 +0100
> > > From: Claudio Jeker <cje...@diehard.n-r-g.com>
> > > 
> > > On Sun, Dec 04, 2011 at 01:35:33PM +0100, Sebastian Reitenbach wrote:
> > > > On Sunday, December 4, 2011 13:24 CET, Camiel Dobbelaar 
> > > > <c...@sentia.nl> wrote: 
> > > >  
> > > > > On 4-12-2011 13:01, Sebastian Reitenbach wrote:
> > > > > > the default maximum size of the tcp send and receive buffer used by 
> > > > > > the autosizing algorithm is way too small, when trying to get 
> > > > > > maximum speed with high bandwidth and high latency connections.
> > > > > 
> > > > > I have tweaked SB_MAX on a system too, but it was for UDP.
> > > > > 
> > > > > > When running a busy Unbound resolver, the recommendation is to bump
> > > > > > the receive buffer to 4M or even 8M. See
> > > > > http://unbound.net/documentation/howto_optimise.html
> > > > > 
> > > > > Otherwise a lot of queries are dropped when the cache is cold.
> > > > > 
> > > > > I don't think there's a magic value that's right for everyone, so a
> > > > > sysctl would be nice.  Maybe separate ones for tcp and udp.
> > > > > 
> > > > > I know similar sysctl's have been removed recently, and that they are
> > > > > sometimes abused, but I'd say we have two valid use cases now.
> > > > > 
> > > > > So I'd love some more discussion.  :-)
> > > > 
> > > > > Since they were removed, and since there is this "keep it simple, too
> > > > > many knobs are bad" attitude, which I think is not too bad, I just
> > > > > bumped the SB_MAX value.
> > > > > If there is consensus that a sysctl would make sense, I'd also look into
> > > > > that approach and send a new patch.
> > >  
> > > > SB_MAX is there to protect your system. It gives an upper bound on how much
> > > > memory a socket may allocate. The current value is a compromise. Running
> > > > with a huge SB_MAX may make one connection faster but it will cause
> > > > resource starvation issues on busy systems.
> > > > Sure you can bump it but be aware of the consequences (and it is why I
> > > think we should not bump it at the moment). A proper change needs to
> > > include some sort of resource management that ensures that we do not run
> > > the kernel out of memory.
> > 
> > But 256k simply isn't enough for some use cases.  Turning this into a
> > sysctl tunable like FreeBSD and NetBSD would be a good idea if you ask
> > me.  Yes, people will use it to shoot themselves in the foot.  I don't
> > care.
> 
> So to be able to shoot myself in the foot without the need to compile the 
> kernel, I'll look into adding a sysctl to tweak the maximum size of the 
> buffer. Well, depending on time and how fast I figure out how to do that, 
> might take some time.
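
For context on the "256k simply isn't enough" point above: a TCP connection
can have at most one buffer's worth of data in flight per round trip, so the
buffer cap bounds throughput at roughly buffer size / RTT. The following is a
minimal sketch of that arithmetic, not code from the patch. The function names
are made up for illustration, and it assumes that 256k means 256 * 1024 bytes.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative helpers only; neither function exists in the kernel.
 *
 * bdp_bytes: bandwidth-delay product, i.e. how many bytes must be in
 * flight to keep a link of the given speed busy for one round trip.
 */
uint64_t
bdp_bytes(uint64_t bits_per_sec, uint64_t rtt_ms)
{
	/* bits/s * ms / (8 bits per byte * 1000 ms per s) */
	return (bits_per_sec * rtt_ms / 8000);
}

/*
 * max_throughput_bps: upper bound on throughput in bits per second
 * when the socket buffer (and so the window) is capped at buf_bytes.
 */
uint64_t
max_throughput_bps(uint64_t buf_bytes, uint64_t rtt_ms)
{
	assert(rtt_ms > 0);	/* avoid division by zero */
	return (buf_bytes * 8 * 1000 / rtt_ms);
}
```

With these definitions, a 1 Gbit/s path with 100 ms RTT needs
bdp_bytes(1000000000, 100) = 12500000 bytes in flight, while a 262144 byte
buffer limits the same path to max_throughput_bps(262144, 100) = 20971520
bits per second, about 21 Mbit/s.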

Here is a first attempt at such a sysctl. I called it net.inet.ip.sb-max.
Maybe there is a better name, or a better place under a different hierarchy?
The default value, SB_MAX as defined in sys/socketvar.h, is unchanged. I used
sysctl_int for the sysctl, but I am not sure whether that is right: sb_max
is a u_long in sys/kern/uipc_socket2.c, so maybe sysctl_quad would fit better?
Tested on i386, where it works for me.
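
One consequence of the int versus u_long mismatch: the patch copies the int
ip_sb_max into the u_long sb_max, and in C a negative int converts to a very
large unsigned value, so a negative setting would effectively disable the
"cc > sb_max" check in sbreserve(). Below is a rough userland sketch of a
guard against that. The helper name and the 4k floor are assumptions made up
for illustration; they are not part of the patch or of any existing kernel
interface.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical lower bound for illustration; not taken from the patch. */
#define SB_MAX_FLOOR	(4UL * 1024)

/*
 * Return the limit to use after a sysctl write of "requested".
 * Negative or too-small requests are ignored and the current limit is
 * kept, so the value never wraps around when widened to unsigned long.
 */
unsigned long
sb_max_apply(unsigned long current, int requested)
{
	assert(current >= SB_MAX_FLOOR);	/* caller starts from a sane limit */

	if (requested < 0 || (unsigned long)requested < SB_MAX_FLOOR)
		return (current);
	return ((unsigned long)requested);
}
```

For example, sb_max_apply(262144, -1) keeps 262144, whereas a plain
assignment of -1 to an unsigned long yields ULONG_MAX.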

It's my first attempt in kernel land, and I'm no expert on the network
stack, so there may be things I should have done better. Please comment and let
me know.

cheers,
Sebastian

Index: lib/libc/gen/sysctl.3
===================================================================
RCS file: /cvs/src/lib/libc/gen/sysctl.3,v
retrieving revision 1.210
diff -u -r1.210 sysctl.3
--- lib/libc/gen/sysctl.3       9 Dec 2011 16:14:54 -0000       1.210
+++ lib/libc/gen/sysctl.3       25 Dec 2011 13:50:15 -0000
@@ -1210,6 +1210,7 @@
 .It ip Ta porthilast Ta integer Ta yes
 .It ip Ta portlast Ta integer Ta yes
 .It ip Ta redirect Ta integer Ta yes
+.It ip Ta sb-max Ta integer Ta yes
 .It ip Ta sourceroute Ta integer Ta yes
 .It ip Ta stats Ta structure Ta no
 .It ip Ta ttl Ta integer Ta yes
@@ -1517,6 +1518,9 @@
 .Tn IP
 packets,
 and should normally be enabled on all systems.
+.It Li ip.sb-max
+Maximum size of socket buffers. This value is also used by the TCP
+send and receive buffer autosizing algorithm.
 .It Li ip.sourceroute
 Returns 1 when forwarding of source-routed packets is enabled for
 the host.
Index: sbin/sysctl/sysctl.8
===================================================================
RCS file: /cvs/src/sbin/sysctl/sysctl.8,v
retrieving revision 1.162
diff -u -r1.162 sysctl.8
--- sbin/sysctl/sysctl.8        3 Sep 2011 22:59:08 -0000       1.162
+++ sbin/sysctl/sysctl.8        25 Dec 2011 13:50:54 -0000
@@ -228,6 +228,7 @@
 .It net.inet.ip.porthilast Ta integer Ta yes
 .It net.inet.ip.maxqueue Ta integer Ta yes
 .It net.inet.ip.encdebug Ta integer Ta yes
+.It net.inet.ip.sb-max Ta integer Ta yes
 .It net.inet.ip.ipsec-expire-acquire Ta integer Ta yes
 .It net.inet.ip.ipsec-invalid-life Ta integer Ta yes
 .It net.inet.ip.ipsec-pfs Ta integer Ta yes
Index: sys/kern/uipc_socket2.c
===================================================================
RCS file: /cvs/src/sys/kern/uipc_socket2.c,v
retrieving revision 1.52
diff -u -r1.52 uipc_socket2.c
--- sys/kern/uipc_socket2.c     4 Apr 2011 21:11:22 -0000       1.52
+++ sys/kern/uipc_socket2.c     25 Dec 2011 13:51:30 -0000
@@ -50,6 +50,7 @@
  * Primitive routines for operating on sockets and socket buffers
  */
 
+extern int ip_sb_max;
 u_long sb_max = SB_MAX;                /* patchable */
 
 extern struct pool mclpools[];
@@ -385,6 +386,7 @@
 sbreserve(struct sockbuf *sb, u_long cc)
 {
 
+       sb_max = ip_sb_max;
        if (cc == 0 || cc > sb_max)
                return (1);
        sb->sb_hiwat = cc;
Index: sys/netinet/in.h
===================================================================
RCS file: /cvs/src/sys/netinet/in.h,v
retrieving revision 1.90
diff -u -r1.90 in.h
--- sys/netinet/in.h    6 Jul 2011 01:57:37 -0000       1.90
+++ sys/netinet/in.h    25 Dec 2011 13:51:33 -0000
@@ -655,7 +655,8 @@
 #define        IPCTL_MRTPROTO          34      /* type of multicast */
 #define        IPCTL_MRTSTATS          35
 #define        IPCTL_ARPQUEUED         36
-#define        IPCTL_MAXID             37
+#define        IPCTL_SB_MAX            37      /* int: max socketbuffer size */
+#define        IPCTL_MAXID             38
 
 #define        IPCTL_NAMES { \
        { 0, 0 }, \
@@ -695,6 +696,7 @@
        { "mrtproto", CTLTYPE_INT }, \
        { "mrtstats", CTLTYPE_STRUCT }, \
        { "arpqueued", CTLTYPE_INT }, \
+       { "sb-max", CTLTYPE_INT }, \
 }
 #define        IPCTL_VARS { \
        NULL, \
@@ -733,7 +735,8 @@
        NULL, \
        NULL, \
        NULL, \
-       &la_hold_total \
+       &la_hold_total, \
+       &ip_sb_max \
 }
 
 /* INET6 stuff */
Index: sys/netinet/ip_input.c
===================================================================
RCS file: /cvs/src/sys/netinet/ip_input.c,v
retrieving revision 1.195
diff -u -r1.195 ip_input.c
--- sys/netinet/ip_input.c      6 Jul 2011 02:42:28 -0000       1.195
+++ sys/netinet/ip_input.c      25 Dec 2011 13:51:39 -0000
@@ -107,6 +107,7 @@
 int    ip_mtudisc = 1;
 u_int  ip_mtudisc_timeout = IPMTUDISCTIMEOUT;
 int    ip_directedbcast = 0;
+int    ip_sb_max = SB_MAX;
 #ifdef DIAGNOSTIC
 int    ipprintfs = 0;
 #endif
@@ -1700,6 +1701,8 @@
 #else
                return (EOPNOTSUPP);
 #endif
+       case IPCTL_SB_MAX:
+               return (sysctl_int(oldp, oldlenp, newp, newlen, &ip_sb_max));
        default:
                if (name[0] < IPCTL_MAXID)
                        return (sysctl_int_arr(ipctl_vars, name, namelen,
