Re: e1000 full-duplex TCP performance well below wire speed
Hi all,

Rick Jones wrote:
> 2) use the aforementioned burst TCP_RR test. This is then a single netperf
> with data flowing both ways on a single connection, so no issue of skew,
> but perhaps an issue of being one connection and so one process on each end.

Since our major goal is to establish a reliable way to test duplex connections, this looks like a very good choice. Right now we just run this back to back (a cable connecting two hosts), but we want to move to a high-performance network with up to three switches between hosts. For this we want to have a stable test.

I doubt that I will be able to finish the tests tonight, but I'll post a follow-up on Monday at the latest. Have a nice week-end and thanks a lot for all the suggestions so far!

Cheers

Carsten
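For reference, a single-connection burst-mode TCP_RR run might look like this (a sketch only: the hostname is a placeholder, -b needs a netperf built with --enable-burst, and the burst depth and request/response sizes are tunables):

  netperf -t TCP_RR -H <remote-host> -l 60 -- -b 4 -r 64K

With a large -r and a few transactions kept in flight by -b, data flows in both directions on the one connection, which is what makes it attractive as a duplex test.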
Re: e1000 full-duplex TCP performance well below wire speed
Good morning (my TZ). I'll try to answer all questions; however, if I miss something big, please point my nose to it again.

Rick Jones wrote:
> As asked in the LKML thread, please post the exact netperf command used to
> start the client/server, whether or not you're using irqbalanced (aka
> irqbalance) and what cat /proc/interrupts looks like (you ARE using MSI,
> right?)

netperf was used without any special tuning parameters. Usually we start two processes on two hosts which start (almost) simultaneously, last for 20-60 seconds and simply use UDP_STREAM (works well) and TCP_STREAM, i.e.

  on 192.168.0.202:  netperf -H 192.168.2.203 -t TCP_STREAM -l 20
  on 192.168.0.203:  netperf -H 192.168.2.202 -t TCP_STREAM -l 20

192.168.0.20[23] here is on eth0, which cannot do jumbo frames, thus we use the .2. subnet for eth1 for a range of MTUs. The server is started on both nodes with start-stop-daemon and no special parameters I'm aware of. /proc/interrupts shows PCI-MSI-edge, thus I think YES.

> In particular, it would be good to know if you are doing two concurrent
> streams, or if you are using the burst mode TCP_RR with large
> request/response sizes method which then is only using one connection.

As outlined above: two concurrent streams right now. If you think TCP_RR would be better, I'm happy to rerun some tests.

More in other emails. I'll wade through them slowly.

Carsten
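To make the "(almost) simultaneously" part reproducible, both streams could be launched from a single control host, e.g. (a sketch; hostnames and log paths are placeholders, and it assumes passwordless ssh and netserver already running on both nodes):

  #!/bin/sh
  # start both directions in the background, then wait for both to finish
  ssh n0002 'netperf -H 192.168.2.203 -t TCP_STREAM -l 20' > n0002.log &
  ssh n0003 'netperf -H 192.168.2.202 -t TCP_STREAM -l 20' > n0003.log &
  wait
  cat n0002.log n0003.log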
Re: e1000 full-duplex TCP performance well below wire speed
Hi all,

slowly crawling through the mails.

Brandeburg, Jesse wrote:
>> The test was done with various mtu sizes ranging from 1500 to 9000,
>> with ethernet flow control switched on and off, and using reno and
>> cubic as a TCP congestion control.
>
> As asked in the LKML thread, please post the exact netperf command used to
> start the client/server, whether or not you're using irqbalanced (aka
> irqbalance) and what cat /proc/interrupts looks like (you ARE using MSI,
> right?)

We are using MSI; /proc/interrupts looks like:

n0003:~# cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3
  0:    6536963          0          0          0   IO-APIC-edge      timer
  1:          2          0          0          0   IO-APIC-edge      i8042
  3:          1          0          0          0   IO-APIC-edge      serial
  8:          0          0          0          0   IO-APIC-edge      rtc
  9:          0          0          0          0   IO-APIC-fasteoi   acpi
 14:      32321          0          0          0   IO-APIC-edge      libata
 15:          0          0          0          0   IO-APIC-edge      libata
 16:          0          0          0          0   IO-APIC-fasteoi   uhci_hcd:usb5
 18:          0          0          0          0   IO-APIC-fasteoi   uhci_hcd:usb4
 19:          0          0          0          0   IO-APIC-fasteoi   uhci_hcd:usb3
 23:          0          0          0          0   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb2
378:   17234866          0          0          0   PCI-MSI-edge      eth1
379:     129826          0          0          0   PCI-MSI-edge      eth0
NMI:          0          0          0          0
LOC:    6537181    6537326    6537149    6537052
ERR:          0

(sorry for the line break)

What we don't understand is why only core0 gets the interrupts, since the affinity is set to f:

# cat /proc/irq/378/smp_affinity
f

Right now irqbalance is not running, though I can give it a shot if people think this will make a difference.

> I would suggest you try TCP_RR with a command line something like this:
> netperf -t TCP_RR -H hostname -C -c -- -b 4 -r 64K

I did that, and the results can be found here:
https://n0.aei.uni-hannover.de/wiki/index.php/NetworkTest

The results with netperf running like
  netperf -t TCP_STREAM -H host -l 20
can be found here:
https://n0.aei.uni-hannover.de/wiki/index.php/NetworkTestNetperf1

I reran the tests with
  netperf -t <test> -H host -l 20 -c -C
or, in the case of TCP_RR, with the suggested burst settings -b 4 -r 64K.

> Yes, InterruptThrottleRate=8000 means there will be no more than 8000
> ints/second from that adapter, and if interrupts are generated faster
> than that they are aggregated. Interestingly, since you are interested
> in ultra low latency, and may be willing to give up some cpu for it
> during bulk transfers, you should try InterruptThrottleRate=1 (can
> generate up to 7 ints/s)

On the web page you'll see that there are about 4000 interrupts/s for most tests and up to 20,000/s for the TCP_RR test. Shall I change the throttle rate?

> just for completeness can you post the dump of ethtool -e eth0 and
> lspci -vvv?

Yup, we'll give that info also.
n0002:~# ethtool -e eth1
Offset          Values
------          ------
0x0000          00 30 48 93 94 2d 20 0d 46 f7 57 00 ff ff ff ff
0x0010          ff ff ff ff 6b 02 9a 10 d9 15 9a 10 86 80 df 80
0x0020          00 00 00 20 54 7e 00 00 00 10 da 00 04 00 00 27
0x0030          c9 6c 50 31 32 07 0b 04 84 29 00 00 00 c0 06 07
0x0040          08 10 00 00 04 0f ff 7f 01 4d ff ff ff ff ff ff
0x0050          14 00 1d 00 14 00 1d 00 af aa 1e 00 00 00 1d 00
0x0060          00 01 00 40 1e 12 ff ff ff ff ff ff ff ff ff ff
0x0070          ff ff ff ff ff ff ff ff ff ff ff ff ff ff cf 2f

lspci -vvv for this card:

0e:00.0 Ethernet controller: Intel Corporation 82573L Gigabit Ethernet Controller
        Subsystem: Super Micro Computer Inc Unknown device 109a
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 378
        Region 0: Memory at ee20 (32-bit, non-prefetchable) [size=128K]
        Region 2: I/O ports at 5000 [size=32]
        Capabilities: [c8] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [d0] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable+
                Address: fee0f00c  Data: 41b9
        Capabilities: [e0] Express Endpoint IRQ 0
                Device: Supported: MaxPayload 256 bytes, PhantFunc 0, ExtTag-
                Device: Latency L0s 512ns, L1 64us
                Device: AtnBtn- AtnInd- PwrInd-
                Device: Errors: Correctable- Non-Fatal- Fatal-
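For anyone wanting to reproduce, the two knobs discussed in the mail above could be exercised like this (a sketch, not verified on this hardware; IRQ 378 and eth1 are taken from the output above):

  # pin eth1's IRQ to CPU1 only (bitmask 2) instead of the full mask f;
  # with the mask at f and no irqbalance running, the APIC tends to
  # route everything to CPU0 anyway
  echo 2 > /proc/irq/378/smp_affinity

  # reload e1000 with a different throttle setting: 0 = throttling off,
  # 1 = the driver's dynamic mode, N = at most N ints/s (e.g. the 8000
  # mentioned above)
  modprobe -r e1000
  modprobe e1000 InterruptThrottleRate=1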
Re: e1000 full-duplex TCP performance well below wire speed
Brief question I forgot to ask: right now we are using the old driver version 7.3.20-k2. To save some effort on your end, shall we upgrade to 7.6.15, or should our version be good enough?

Thanks

Carsten
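(The running driver version can be read with ethtool -i; the output below is illustrative, with the values taken from this thread:)

  n0002:~# ethtool -i eth1
  driver: e1000
  version: 7.3.20-k2
  bus-info: 0000:0e:00.0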
Re: e1000 full-duplex TCP performance well below wire speed
Hi Andi,

Andi Kleen wrote:
> Another issue with full duplex TCP not mentioned yet is that if TSO is
> used, the output will be somewhat bursty and might cause problems with
> the TCP ACK clock of the other direction, because the ACKs would need
> to squeeze in between full TSO bursts.
>
> You could try disabling TSO with ethtool.

I just tried that:
https://n0.aei.uni-hannover.de/wiki/index.php/NetworkTestNetperf3

It seems the numbers do get better (the sweet spot seems to be MTU 6000 with 914 MBit/s and 927 MBit/s), however for other settings the results vary a lot, so I'm not sure how large the statistical fluctuations are.

Next I'll test whether it makes sense to enlarge the ring buffers.

Thanks

Carsten
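For completeness, the corresponding ethtool invocations would look like this (a sketch; eth1 as in the earlier mails, and the maximum ring size is adapter-dependent):

  # show offload settings, then turn TSO off
  ethtool -k eth1
  ethtool -K eth1 tso off

  # show current and maximum ring sizes, then enlarge both rings
  ethtool -g eth1
  ethtool -G eth1 rx 4096 tx 4096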
Re: e1000 full-duplex TCP performance well below wire speed
Hi all,

Brandeburg, Jesse wrote:
>>> I would suggest you try TCP_RR with a command line something like this:
>>> netperf -t TCP_RR -H hostname -C -c -- -b 4 -r 64K
>>
>> I did that and the results can be found here:
>> https://n0.aei.uni-hannover.de/wiki/index.php/NetworkTest
>
> seems something went wrong and all you ran was the 1 byte tests, where
> it should have been 64K both directions (request/response).

Yes, shell quoting got me there. I'll re-run the tests, so please don't look at the TCP_RR results too closely. I think I'll be able to run maybe one or two more tests today; the rest will follow tomorrow.

Thanks for bearing with me

Carsten

PS: Am I right that the TCP_RR tests should only be run on a single node at a time, not on both ends simultaneously?
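One way such quoting goes wrong (an illustration; the actual failing command line is not shown in the thread): if the test-specific options after "--" are swallowed by an outer shell or wrapper script, netperf silently falls back to its 1-byte TCP_RR default. Passing the whole command as a single quoted string avoids that, e.g. when launching over ssh:

  # the quotes keep -- -b 4 -r 64K intact for the remote netperf
  ssh <host> 'netperf -t TCP_RR -H <peer> -C -c -- -b 4 -r 64K'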