On 12/04/2011 09:10 AM, Claudio Jeker wrote:
> On Sun, Dec 04, 2011 at 01:35:33PM +0100, Sebastian Reitenbach wrote:
>> On Sunday, December 4, 2011 13:24 CET, Camiel Dobbelaar <c...@sentia.nl> wrote:
>>> On 4-12-2011 13:01, Sebastian Reitenbach wrote:
>>>> The default maximum size of the TCP send and receive buffers used by
>>>> the autosizing algorithm is way too small when trying to get maximum
>>>> speed over high-bandwidth, high-latency connections.
>>> I have tweaked SB_MAX on a system too, but it was for UDP.
>>> When running a busy Unbound resolver, the recommendation is to bump the
>>> receive buffer to 4M or even 8M. See
>>> http://unbound.net/documentation/howto_optimise.html
>>> Otherwise a lot of queries are dropped when the cache is cold.
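As far as I can tell, Unbound's so-rcvbuf option boils down to an
SO_RCVBUF request on the listening socket. A minimal standalone sketch
of that request (not Unbound's actual code; the 4MB figure is just the
one from the howto above):

#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Sketch only: request a 4MB UDP receive buffer the way a resolver
 * would.  On OpenBSD the request is refused once it exceeds the
 * SB_MAX-derived cap, which is exactly the limit being discussed here.
 */
int
main(void)
{
        int s, want = 4 * 1024 * 1024;
        socklen_t len = sizeof(want);

        if ((s = socket(AF_INET, SOCK_DGRAM, 0)) == -1) {
                perror("socket");
                return 1;
        }
        if (setsockopt(s, SOL_SOCKET, SO_RCVBUF, &want, sizeof(want)) == -1)
                perror("setsockopt(SO_RCVBUF)");        /* over the cap */
        if (getsockopt(s, SOL_SOCKET, SO_RCVBUF, &want, &len) == 0)
                printf("kernel granted %d bytes\n", want);
        return 0;
}

So the unbound.conf knob only helps once the kernel-side cap allows it.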
>>> I don't think there's a magic value that's right for everyone, so a
>>> sysctl would be nice. Maybe separate ones for TCP and UDP.
>>> I know similar sysctls have been removed recently, and that they are
>>> sometimes abused, but I'd say we have two valid use cases now.
>>> So I'd love some more discussion. :-)
>> Since they were removed, and there is this "keep it simple, too many
>> knobs are bad" attitude (which I think is not a bad one), I just bumped
>> the SB_MAX value.
>> If there is consensus that a sysctl would make sense, I'd also look into
>> that approach and send a new patch.
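For anyone following along: the bump itself is a one-line, compile-time
change; the cap is the SB_MAX define in sys/sys/socketvar.h, so a kernel
rebuild is needed. The value below is illustrative, not the one from
Sebastian's patch:

/*
 * Illustrative only -- not the actual patch.  "Bumping SB_MAX" means
 * changing this one define and rebuilding the kernel; it is the global
 * ceiling that buffer autosizing and SO_SNDBUF/SO_RCVBUF requests run
 * into.
 */
#define SB_MAX          (2*1024*1024)   /* example value, not the stock one */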
> SB_MAX is there to protect your system. It gives an upper bound on how
> much memory a socket may allocate. The current value is a compromise.
> Running with a huge SB_MAX may make one connection faster, but it will
> cause resource starvation issues on busy systems.
> Sure, you can bump it, but be aware of the consequences (and that is why
> I think we should not bump it at the moment). A proper change needs to
> include some sort of resource management that ensures that we do not run
> the kernel out of memory.
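To make the mechanism concrete, here is a stripped-down sketch of the
idea -- not the real sbreserve() -- showing where a global cap like
SB_MAX bites: every per-socket reservation is checked against one
system-wide number, so no single connection can claim an unbounded share
of mbuf memory.

#include <stdio.h>

static unsigned long sb_max_sketch = 256UL * 1024;      /* illustrative cap */

/* Refuse any per-socket buffer reservation above the global cap. */
static int
reserve_sketch(unsigned long *hiwat, unsigned long cc)
{
        if (cc > sb_max_sketch)
                return 0;               /* refused */
        *hiwat = cc;                    /* granted */
        return 1;
}

int
main(void)
{
        unsigned long hiwat = 0;

        printf("ask 128k: %s\n",
            reserve_sketch(&hiwat, 128UL * 1024) ? "granted" : "refused");
        printf("ask 8M:   %s\n",
            reserve_sketch(&hiwat, 8UL * 1024 * 1024) ? "granted" : "refused");
        return 0;
}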
How many high-speed, high-latency connections would it take to use a
"significant" proportion of kernel memory? Waving hands at the problem:
at 500 ms round-trip delay, a saturated 1Gb/s link needs roughly 63MB of
buffer per direction, and a saturated 100Mb/s link roughly 7MB per
direction. Multiple sockets sharing such a link should use a similar
amount in total under the autosizing algorithm. If this is approximately
correct, documenting a formula might be useful for sysadmins.
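The formula is just the bandwidth-delay product; a throwaway calculation
like the one below (using the 500 ms delay and the two link speeds from
above) is all it amounts to:

#include <stdio.h>

/*
 * Bandwidth-delay product: the per-direction buffer needed to keep a
 * link of a given speed full at a given round-trip time:
 *     buffer_bytes = (bits_per_second / 8) * rtt_seconds
 * The 63MB and 7MB figures above are these results, rounded up.
 */
int
main(void)
{
        double rtt = 0.5;                       /* round-trip time, seconds */
        double links[] = { 1e9, 100e6 };        /* 1Gb/s and 100Mb/s */
        int i;

        for (i = 0; i < 2; i++) {
                double bytes = links[i] / 8.0 * rtt;
                printf("%4.0f Mb/s at %.0f ms RTT -> %.1f MB per direction\n",
                    links[i] / 1e6, rtt * 1000, bytes / 1e6);
        }
        return 0;
}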
A system with 512MB physical memory should be able to saturate 2 or 3
100Mb/s links with large delays without seriously depleting kernel
memory. It seems unlikely that a small system with multiple saturated
1Gb/s links (or 1 10Gb/s link) could do anything very useful.
The pathological case: many sockets, each one in turn saturating the
link (growing a large buffer) and then going idle without releasing it.
The current limit does not defend against this.
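To put an illustrative number on it: suppose the per-socket cap were
raised to 8MB (a made-up figure) and a hundred such sockets went idle
while still holding their buffers:

#include <stdio.h>

/*
 * Worst case for the pattern above, with made-up numbers: many sockets,
 * each autosized up to a raised cap and then left idle while still
 * pinning its buffer.  Total memory held is simply nsockets * cap per
 * direction.
 */
int
main(void)
{
        unsigned long cap = 8UL * 1024 * 1024;  /* hypothetical raised cap */
        unsigned long nsockets = 100;

        printf("%lu idle sockets x %luMB = %luMB of kernel memory held\n",
            nsockets, cap / (1024 * 1024), nsockets * cap / (1024 * 1024));
        return 0;
}

That is already more than the 512MB machine above has in total.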
To generalize this problem: kernel memory is limited. It is autosized at
boot time. Allowing any kernel subsystem to use a large amount
jeopardizes system stability.
Does it make sense, philosophically and technically, to allow the
sysadmin to add physical memory to the kernel at run time, perhaps
limited to (arbitrarily) 50% of physical memory?
Geoff Steckel