Raphael Kraus wrote:
G'day all,

We've got Ubuntu and Debian machines connected to our gigabit network.

Ubuntu kernel: 2.6.15-26-amd64-server
Debian kernel: 2.6.8.3-k7-smp

(Processors are AMD Athlon64 dual-cores.)

Neither machine seems to be getting good data throughput across
the network. We're getting 10-12 MB/s (usually ~11 MB/s).

By my calculations we should be getting around ten times this.  Netgear
switches are being used.

Is anyone aware of any issues that would be causing this?

1)
Ignore those saying use a cross-over cable.  GbE is
auto-MDI and a crossover cable just confuses it. GbE
does need all 8 cores and fairly low capacitance,
so use a high-quality cable (these are usually marked
"category 6") and avoid bends with a radius of <20cm
(i.e., don't curl the excess cable).

2)
Are you seeing packet loss? That pushes TCP into slow
start (since it thinks it is always seeing a congested
link).

The most common stuff-up here is that people insist on defeating
autonegotiation by setting speed or duplex manually and fail to
realise that this forces the other party to fall back to half
duplex (a duplex mismatch). Thus *both* ends need to autonegotiate
or *both* ends need to be manually set.  A dumb switch like
the Netgear will autonegotiate, so the PC needs to autonegotiate
too.

Use ethtool -S on the Linux box and look at the stats on
the switch. Any non-zero error count is bad news.
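A quick way to eyeball those counters (a minimal sketch; eth0 is a
placeholder for your actual interface name):

```shell
# filter_nic_errors: read "ethtool -S" output on stdin and keep only
# the error/drop/CRC/collision counters; on a healthy link every one
# of these should be zero.
filter_nic_errors() {
    grep -Ei 'err|drop|crc|coll'
}

# Typical use (eth0 is a placeholder):
#   ethtool -S eth0 | filter_nic_errors
#
# And to confirm both ends really did negotiate 1000/full:
#   ethtool eth0 | grep -E 'Speed|Duplex|Auto'
# If autonegotiation was switched off earlier, re-enable it with:
#   ethtool -s eth0 autoneg on
```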

3)
Do you have enough TCP buffer? Linux TCP buffers are tuned
for a 100Mbps LAN.  So you're probably 10x deficient in
buffering.  Calculate the bandwidth-delay product and set
the TCP buffers accordingly.  For a worked example see:
  <http://www.aarnet.edu.au/~gdt/presentations/2006-05-11-cebit-pamphlet/>
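As a back-of-envelope sketch of that calculation (the 1 ms RTT is an
assumption for a small switched LAN; measure yours with ping, and treat
the sysctl values as illustrative only):

```shell
# Bandwidth-delay product for a GbE path with an assumed 1 ms RTT.
# Shell arithmetic is integer-only, so the RTT is kept as a fraction.
RATE_BPS=1000000000            # 1 Gb/s
RTT_NUM=1; RTT_DEN=1000        # 1 ms = 1/1000 s
BDP_BYTES=$(( RATE_BPS * RTT_NUM / RTT_DEN / 8 ))
echo "BDP = $BDP_BYTES bytes"  # 125000 bytes for these figures

# The buffers then go into /etc/sysctl.conf as "min default max",
# with max rounded up comfortably past the BDP, e.g.:
#   net.ipv4.tcp_rmem = 4096 87380 262144
#   net.ipv4.tcp_wmem = 4096 65536 262144
```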

In theory the switches need 0.25 of the BDP, and I can tell
you that a cheap Netgear switch won't have that. The saving
grace here is that both links are GbE, so in practice no
buffering beyond a small allowance for jitter should be needed.

4)
Early Realtek controllers sucked: the last packet never got
pushed, so Linux has a hack that pushes it when the clock
ticks. But that distorts the TCP round-trip time
estimate (everything appears >5ms away).

So use ethtool to turn off the performance features like
segmentation offload and see if that improves matters.
With the amount of CPU you've got you won't miss them.
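One way to do that safely is to print the commands first and only run
them once they look right (a sketch; eth0 is a placeholder, and the
feature list covers the classic ethtool -K offload flags):

```shell
# Emit the ethtool commands that switch off the common offload
# features on an interface.  Review the output, then pipe it to
# sh if it looks right.
offload_off_cmds() {
    dev="$1"
    # tso = TCP segmentation offload, sg = scatter-gather,
    # tx/rx = transmit/receive checksum offload
    for feat in tso sg tx rx; do
        echo "ethtool -K $dev $feat off"
    done
}

offload_off_cmds eth0
```

Then re-run your throughput test; if the numbers improve, you've found
the culprit and can turn the features back on one at a time.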

5)
The CPU should be enough to drive GbE to capacity with standard
frame sizes.  You can check how much CPU is used in kernel
context if you suspect that might be a problem.
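A rough way to check that, sampling the first line of /proc/stat over a
second (Linux-only; counter fields are user, nice, system, idle):

```shell
# Rough share of CPU time spent in kernel ("system") context.
read _ u1 n1 s1 i1 _rest < /proc/stat
sleep 1
read _ u2 n2 s2 i2 _rest < /proc/stat
total=$(( (u2 - u1) + (n2 - n1) + (s2 - s1) + (i2 - i1) ))
echo "kernel CPU over the last second: $(( 100 * (s2 - s1) / total ))%"
```

Run it while the transfer is going; anything near 100% in kernel
context while you're stuck at 11 MB/s would point at the driver.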

Jumbo frames (9000 bytes) do work a lot better for backups
(since an 8K disk block fits nicely and the kernel has to
do a lot less work). But all the interfaces on a subnet
need to have the same MTU, so it's probably not a goer in
your situation.
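To see why the kernel does so much less work, count the frames (and
hence interrupts and header processing) needed to move a gigabyte at
each MTU (rough figures: MSS taken as MTU minus 40 bytes of IP+TCP
headers, no options counted):

```shell
# Frames needed to move 1 GiB of payload at standard vs jumbo MTU.
PAYLOAD=$(( 1024 * 1024 * 1024 ))
for mtu in 1500 9000; do
    mss=$(( mtu - 40 ))                      # usable payload per frame
    frames=$(( (PAYLOAD + mss - 1) / mss ))  # round up
    echo "MTU $mtu: $frames frames"
done
```

That's roughly a six-fold reduction in per-packet work for the same
amount of data.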

6)
The GbE controllers are probably sitting on an internal
PCI-X bus, which will do 7.5Gbps and thus is enough.

7)
Grab a dump using Ethereal and turn on the TCP analysis.
Then look for duplicate ACKs, which indicate a lost packet.
You shouldn't see any of these until the capacity of the
end-to-end path is reached.

--
 Glen Turner         Tel: (08) 8303 3936 or +61 8 8303 3936
 Australia's Academic & Research Network  www.aarnet.edu.au
--
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html
