Ian,

I agree that something is wrong. The whole point of using those fancy Intel 
NICs should be to reduce the CPU load, right?

Here is a technote from HELIOS, makers of the enterprise AFP server EtherShare, 
about 10GbE tuning:

        <http://www.helios.de/web/EN/support/TI/154.html>

They say that tuning should usually not be necessary unless you are running 
10GbE on both ends of a connection.

Oh, and look at this. It sounds like X520 CPU load issues have been a problem 
in FreeBSD too:

        <http://unix.derkeiler.com/Mailing-Lists/FreeBSD/performance/2012-01/msg00021.html>

Unfortunately, I can't dig into this much further right now, but please keep us 
in the loop if you solve things on your end!

Best,
Chris


On 09.08.2014 at 01:29, Ian Collins via smartos-discuss 
<[email protected]> wrote:

> Chris Ferebee wrote:
>> Ian,
>> 
>> Right now I'm fighting with my Finder/AFP/netatalk/getcwd() performance 
>> issues, which are a minefield, so the 10GbE slowdowns are the least of my 
>> worries. But here is what I did find out.
>> 
>> It helps to tune the following TCP stack parameters:
>> 
>> # ndd -set /dev/tcp tcp_recv_hiwat 400000
>> # ndd -set /dev/tcp tcp_xmit_hiwat 400000
>> # ndd -set /dev/tcp tcp_max_buf 2097152
>> # ndd -set /dev/tcp tcp_cwnd_max 16777216
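>> 
>> (For reference, the current values can be checked the same way with 
>> ndd -get, e.g.:
>> 
>> # ndd -get /dev/tcp tcp_recv_hiwat
>> # ndd -get /dev/tcp tcp_xmit_hiwat
>> # ndd -get /dev/tcp tcp_max_buf
>> # ndd -get /dev/tcp tcp_cwnd_max
>> 
>> Keep in mind that ndd settings are runtime-only and are lost at reboot.)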
>> 
>> Still, I can max out one CPU core at 100% by running a small number of 
>> netcat or iperf threads in parallel (it doesn't really matter which). The 
>> other cores stay mostly idle.
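>> 
>> (The per-core imbalance is easy to see if you leave mpstat running in 
>> another terminal while the test runs, e.g.:
>> 
>> # mpstat 5
>> 
>> One CPU sits near 0% in the idl column while the others stay close to 
>> 100% idle.)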
>> 
>> After tuning the parameters as above, I was seeing about 3 Gbit/s throughput 
>> over the X520 with several threads.
>> 
>> I think the bottleneck is in the ixgbe driver, because running the same 
>> tests on localhost gives about 200 Gbit/s throughput, so the threads 
>> producing and consuming the data are definitely not at fault.
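>> 
>> (If anyone wants to reproduce this, my runs looked roughly like the 
>> following: iperf in server mode on the SmartOS box and a multi-threaded 
>> client on the other end. The host name and stream count are just 
>> placeholders:
>> 
>> # iperf -s                           (on the SmartOS host)
>> $ iperf -c smartos-host -P 4 -t 30   (from the 10GbE client)
>> $ iperf -c 127.0.0.1 -P 4 -t 30      (loopback baseline on the same box)
>> 
>> -P sets the number of parallel streams, -t the test duration in seconds.)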
>> 
>> Considering comments by Nick Perry and others, I suspect it would be worth 
>> trying to increase the ixgbe driver's rx_queue_number and tx_queue_number 
>> via /kernel/drv/ixgbe.conf. As I recall, the maximum number of queues 
>> depends on your hardware revision and is either 8 or 16, depending on the 
>> Intel part number, while the default is 1. As I understand it, this would 
>> allow the driver load to be parallelized across multiple cores.
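>> 
>> Untested on my side, but something along these lines in 
>> /kernel/drv/ixgbe.conf, followed by a reboot or driver reload, is what I 
>> have in mind; treat the value 8 as a placeholder to be checked against the 
>> limits for your specific X520 variant:
>> 
>> # /kernel/drv/ixgbe.conf: enable multiple rx/tx rings
>> rx_queue_number = 8;
>> tx_queue_number = 8;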
>> 
> 
> I'm still very skeptical about all these tweaks.  I have tried them and I've 
> been using them for streaming ZFS over 1GE networks.
> 
> Running a test application of mine (without changing any TCP settings) that 
> loads a big file (typically something that can't be compressed, such as a 
> video file) into RAM and then writes it multiple times to disk, I get a peak 
> throughput of >700MB/s to an NFS share on a Solaris 11 host using Intel X540 
> 10GE cards.  The overall transfer is limited by the pool's write capacity 
> (~400MB/s long term).
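> 
> (For comparison, a rough shell approximation of that test would be staging 
> the file in /tmp, which is swap-backed on Solaris, and then timing repeated 
> writes to the NFS mount; the paths below are placeholders:
> 
> $ cp big-video.mov /tmp/
> $ for i in 1 2 3; do time dd if=/tmp/big-video.mov of=/net/server/share/out$i bs=1024k; done
> 
> My application keeps the file in RAM rather than re-reading it, but the 
> network and pool behaviour should be similar.)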
> 
> -- 
> Ian.


