Hi Hank,
On Sat, Nov 27, 2010 at 06:56:34AM -0800, Hank A. Paulson wrote:
1 - With recent CPUs Intel 5300/5400/5500/5600 and AMD 6100 the set of
optimal compiler settings for optimizations :) is not something anyone can
keep up with - not to mention different versions of gcc that understand
none, some or all of the features of these CPUs. -march=native allows gcc to
take on the burden of optimizing the compile time settings, so if that
could be added as one of the options in the makefile, it would be helpful
because then I could use the same make... line on every machine but it
would self-adjust for that machine.
(...)
That's a good idea, I have implemented it and even ported it to 1.4.
I have also added ARCH=32 and ARCH=64 to be used in combination with
CPU=native, so that you can select whether you explicitly want a 32
or 64-bit executable.
2 - Google has pushed via both TCP-related RFCs and patches to the
networking code for the Linux kernel to allow the initial cwnd to be set as
a socket option - this would be a huge help to sites that communicate with
the same clients over and over and/or with many small requests allowing a
full response in one (or at least fewer) round trips. For one site that I
work on that is over 250 ms away with a very reliable gateway on the other
end, I burn through several round trips to deliver an icon/small gif/etc -
an icon that could have all the necessary packets in flight before the
first ack. It turns out the small initial cwnd creates more traffic across
the undersea cables than an initial cwnd of 8 or 10 or 12.
http://www.amailbox.org/mailarchive/linux-netdev/2010/5/26/6278007
Indeed it can be nice in mobile environments for instance, where the
RTT is quite high. It does not seem too hard to add, I'm adding this
to the 1.5 TODO list.
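To put rough numbers on the round trips discussed above, here is a small
sketch in C. It assumes an idealized slow start (the window doubles every
RTT, no loss, no delayed ACKs) and ignores the handshake. The function name
rounds_to_deliver() and the 1460-byte MSS are illustrative, not taken from
the kernel or from haproxy.

```c
/* Idealized slow-start model: count the round trips needed to push
 * `bytes` of payload when the sender starts with `init_cwnd` segments
 * and doubles its window every RTT. This is an assumption-laden sketch,
 * not how any real TCP stack accounts for its window. */
unsigned rounds_to_deliver(unsigned long bytes, unsigned mss, unsigned init_cwnd)
{
    unsigned long segments, cwnd, sent;
    unsigned rounds = 0;

    if (mss == 0 || init_cwnd == 0)
        return 0;                       /* nothing sensible to compute */

    segments = (bytes + mss - 1) / mss; /* round up to whole segments */
    cwnd = init_cwnd;

    while (segments > 0) {
        sent = segments < cwnd ? segments : cwnd;
        segments -= sent;               /* one window goes out per RTT */
        cwnd *= 2;                      /* slow start: double each round */
        rounds++;
    }
    return rounds;
}
```

Under these assumptions a 12000-byte icon needs 2 round trips with an initial
cwnd of 3 and 1 round trip with an initial cwnd of 10, which on a 250 ms path
is a quarter of a second saved per object.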
I also wanted to see if you were aware of two other recent kernel changes
that could be helpful to haproxy performance, the first could be helpful
for the new UNIX socket connections in recent haproxy versions:
Implementation of recvmmsg:
recvmmsg() is a new syscall that allows receiving, with a single syscall,
multiple messages that would otherwise require multiple calls to recvmsg(). For
high-bandwidth, small-packet applications, throughput and latency are
improved greatly.
Unfortunately, this will have no effect here because recvmmsg()'s goal is
to receive multiple datagrams at once, but we're not working with datagrams
but with streams, and segments are already combined to return as many of
them as possible.
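The difference can be seen with a pair of local sockets. The sketch below is
only an illustration on AF_UNIX socketpairs, not haproxy code, and the two
function names are made up for the example. It sends three 4-byte messages,
then reads them back once through recvmmsg() on a datagram socket and once
through a plain recv() on a stream socket.

```c
#define _GNU_SOURCE             /* recvmmsg() and struct mmsghdr are Linux-specific */
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define NMSG 3                  /* messages sent in each demo */
#define VLEN 8                  /* receive slots offered to recvmmsg() */

/* Queue NMSG 4-byte messages on sv[0]; returns 0 on success, -1 on error. */
static int send_pings(int fd)
{
    int i;

    for (i = 0; i < NMSG; i++)
        if (send(fd, "ping", 4, 0) != 4)
            return -1;
    return 0;
}

/* Datagram case: one recvmmsg() call returns several separate messages.
 * Returns the number of datagrams received, or -1 on error. */
int datagrams_in_one_recvmmsg(void)
{
    int sv[2], i, n = -1;
    char bufs[VLEN][16];
    struct iovec iov[VLEN];
    struct mmsghdr msgs[VLEN];

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;
    if (send_pings(sv[0]) == 0) {
        memset(msgs, 0, sizeof(msgs));
        for (i = 0; i < VLEN; i++) {
            iov[i].iov_base = bufs[i];
            iov[i].iov_len = sizeof(bufs[i]);
            msgs[i].msg_hdr.msg_iov = &iov[i];
            msgs[i].msg_hdr.msg_iovlen = 1;
        }
        n = recvmmsg(sv[1], msgs, VLEN, MSG_DONTWAIT, NULL);
    }
    close(sv[0]);
    close(sv[1]);
    return n;
}

/* Stream case: message boundaries do not exist, so a single recv() already
 * returns the bytes of every queued write. Returns the byte count or -1. */
int bytes_in_one_stream_recv(void)
{
    int sv[2];
    char buf[64];
    ssize_t n = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    if (send_pings(sv[0]) == 0)
        n = recv(sv[1], buf, sizeof(buf), MSG_DONTWAIT);
    close(sv[0]);
    close(sv[1]);
    return (int)n;
}
```

On a Linux kernel with recvmmsg() support, the first function should report
3 datagrams and the second 12 bytes from one recv(), which is why batching
receives brings nothing for stream sockets.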
A small improvement we can work on is to use accept4() instead of accept()
to save the extra call (an fcntl()) that makes the new socket non-blocking.
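As a rough sketch of what that saving looks like, assuming the call being
saved is the one that puts the accepted socket in non-blocking mode (done
with fcntl() here): the function names below are hypothetical and this is
not haproxy's accept code. loopback_listener_with_client() exists only to
make the example testable.

```c
#define _GNU_SOURCE             /* accept4() is a Linux extension */
#include <arpa/inet.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Classic path: accept(), then a second syscall to set O_NONBLOCK. */
int accept_then_fcntl(int listen_fd)
{
    int fd = accept(listen_fd, NULL, NULL);

    if (fd < 0)
        return -1;
    if (fcntl(fd, F_SETFL, O_NONBLOCK) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* accept4() path: the kernel applies the flag within the same syscall. */
int accept_nonblocking(int listen_fd)
{
    return accept4(listen_fd, NULL, NULL, SOCK_NONBLOCK);
}

/* Demo helper: open a listener on 127.0.0.1 with a kernel-chosen port and
 * connect one client to it. Returns the listener, stores the client fd in
 * *client_fd, or returns -1 on error. */
int loopback_listener_with_client(int *client_fd)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int lfd = socket(AF_INET, SOCK_STREAM, 0);

    if (lfd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;          /* let the kernel pick a free port */

    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) < 0)
        goto fail;

    *client_fd = socket(AF_INET, SOCK_STREAM, 0);
    if (*client_fd < 0)
        goto fail;
    if (connect(*client_fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(*client_fd);
        goto fail;
    }
    return lfd;
fail:
    close(lfd);
    return -1;
}
```

Both accept variants return a non-blocking socket. The second one simply gets
there with one syscall per connection instead of two, at the cost of needing
a kernel and libc that provide accept4().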
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=a2e2725541fad72416326798c2d7fa4dafb7d337
The second is RPS from Google to improve network processing performance
with multiple CPUs - similar to MSI-X but Google found that both together
performed even better than MSI-X alone:
http://kernelnewbies.org/Linux_2_6_35#head-94daf753b96280181e79a71ca4bb7f7a423e302a
http://lwn.net/Articles/362339/
Yes, I've followed that. There is nothing to do to make use of that,
you just need to upgrade your kernel :-)
Cheers,
Willy