Re: TCP many-connection regression between 4.7 and 4.13 kernels.
On 01/22/2018 10:46 AM, Josh Hunt wrote:
> On Mon, Jan 22, 2018 at 10:30 AM, Ben Greear wrote:
>> On 01/22/2018 10:16 AM, Eric Dumazet wrote:
>>> On Mon, 2018-01-22 at 09:28 -0800, Ben Greear wrote:
>>>> My test case is to have 6 processes each create 5000 TCP IPv4 connections
>>>> to each other on a system with 16GB RAM and send slow-speed data. This
>>>> works fine on a 4.7 kernel, but will not work at all on a 4.13. The 4.13
>>>> first complains about running out of tcp memory, but even after forcing
>>>> those values higher, the max connections we can get is around 15k.
>>>>
>>>> Both kernels have my out-of-tree patches applied, so it is possible it
>>>> is my fault at this point.
>>>>
>>>> Any suggestions as to what this might be caused by, or if it is fixed in
>>>> more recent kernels?
>>>>
>>>> I will start bisecting in the meantime...
>>>
>>> Hi Ben
>>>
>>> Unfortunately I have no idea.
>>>
>>> Are you using loopback flows, or have I misunderstood you?
>>>
>>> How can loopback connections be slow-speed?
>>
>> I am sending to self, but over external network interfaces, by using
>> routing tables and rules and such.
>>
>> On 4.13.16+, I see the Intel driver bouncing when I try to start 20k
>> connections. In this case, I have a pair of 10G ports doing 15k, and then
>> I try to start 5k on two of the 1G ports:
>>
>> Jan 22 10:15:41 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Down
>> Jan 22 10:15:41 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
>> Jan 22 10:15:41 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Down
>> Jan 22 10:15:41 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
>> Jan 22 10:15:41 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Down
>> Jan 22 10:15:41 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
>> Jan 22 10:15:43 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Down
>> Jan 22 10:15:45 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
>> Jan 22 10:15:51 lf1003-e3v2-13100124-f20x64 kernel: NETDEV WATCHDOG: eth3 (e1000e): transmit queue 0 timed out, trans_s...es: 1
>> Jan 22 10:15:51 lf1003-e3v2-13100124-f20x64 kernel: e1000e :07:00.0 eth3: Reset adapter unexpectedly
>>
>> Ben
>
> We had an interface doing this, and grabbing these commits resolved it for us:
>
> 4aea7a5c5e94 e1000e: Avoid receiver overrun interrupt bursts
> 19110cfbb34d e1000e: Separate signaling for link check/link up
> d3509f8bc7b0 e1000e: Fix return value test
> 65a29da1f5fd e1000e: Fix wrong comment related to link detection
> c4c40e51f9c3 e1000e: Fix error path in link detection
>
> They are in the LTS kernels now, but I don't believe they were when we first
> hit this problem.
>
> Josh

Thanks a lot for the suggestions. I can confirm that these patches, applied to
my 4.13.16+ tree, do indeed seem to fix the problem.

Thanks,
Ben

--
Ben Greear
Candela Technologies Inc
http://www.candelatech.com
Re: TCP many-connection regression between 4.7 and 4.13 kernels.
On Mon, Jan 22, 2018 at 10:30 AM, Ben Greear wrote:
> On 01/22/2018 10:16 AM, Eric Dumazet wrote:
>> On Mon, 2018-01-22 at 09:28 -0800, Ben Greear wrote:
>>> My test case is to have 6 processes each create 5000 TCP IPv4 connections
>>> to each other on a system with 16GB RAM and send slow-speed data. This
>>> works fine on a 4.7 kernel, but will not work at all on a 4.13. The 4.13
>>> first complains about running out of tcp memory, but even after forcing
>>> those values higher, the max connections we can get is around 15k.
>>>
>>> Both kernels have my out-of-tree patches applied, so it is possible it is
>>> my fault at this point.
>>>
>>> Any suggestions as to what this might be caused by, or if it is fixed in
>>> more recent kernels?
>>>
>>> I will start bisecting in the meantime...
>>
>> Hi Ben
>>
>> Unfortunately I have no idea.
>>
>> Are you using loopback flows, or have I misunderstood you?
>>
>> How can loopback connections be slow-speed?
>
> I am sending to self, but over external network interfaces, by using
> routing tables and rules and such.
>
> On 4.13.16+, I see the Intel driver bouncing when I try to start 20k
> connections. In this case, I have a pair of 10G ports doing 15k, and then
> I try to start 5k on two of the 1G ports:
>
> Jan 22 10:15:41 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Down
> Jan 22 10:15:41 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
> Jan 22 10:15:41 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Down
> Jan 22 10:15:41 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
> Jan 22 10:15:41 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Down
> Jan 22 10:15:41 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
> Jan 22 10:15:43 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Down
> Jan 22 10:15:45 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
> Jan 22 10:15:51 lf1003-e3v2-13100124-f20x64 kernel: NETDEV WATCHDOG: eth3 (e1000e): transmit queue 0 timed out, trans_s...es: 1
> Jan 22 10:15:51 lf1003-e3v2-13100124-f20x64 kernel: e1000e :07:00.0 eth3: Reset adapter unexpectedly
>
> Ben

We had an interface doing this, and grabbing these commits resolved it for us:

4aea7a5c5e94 e1000e: Avoid receiver overrun interrupt bursts
19110cfbb34d e1000e: Separate signaling for link check/link up
d3509f8bc7b0 e1000e: Fix return value test
65a29da1f5fd e1000e: Fix wrong comment related to link detection
c4c40e51f9c3 e1000e: Fix error path in link detection

They are in the LTS kernels now, but I don't believe they were when we first
hit this problem.

Josh
Re: TCP many-connection regression between 4.7 and 4.13 kernels.
On 01/22/2018 10:30 AM, Ben Greear wrote:
> On 01/22/2018 10:16 AM, Eric Dumazet wrote:
>> On Mon, 2018-01-22 at 09:28 -0800, Ben Greear wrote:
>>> My test case is to have 6 processes each create 5000 TCP IPv4 connections
>>> to each other on a system with 16GB RAM and send slow-speed data. This
>>> works fine on a 4.7 kernel, but will not work at all on a 4.13. The 4.13
>>> first complains about running out of tcp memory, but even after forcing
>>> those values higher, the max connections we can get is around 15k.
>>>
>>> Both kernels have my out-of-tree patches applied, so it is possible it is
>>> my fault at this point.
>>>
>>> Any suggestions as to what this might be caused by, or if it is fixed in
>>> more recent kernels?
>>>
>>> I will start bisecting in the meantime...
>>
>> Hi Ben
>>
>> Unfortunately I have no idea.
>>
>> Are you using loopback flows, or have I misunderstood you?
>>
>> How can loopback connections be slow-speed?
>
> I am sending to self, but over external network interfaces, by using
> routing tables and rules and such.
>
> On 4.13.16+, I see the Intel driver bouncing when I try to start 20k
> connections. In this case, I have a pair of 10G ports doing 15k, and then
> I try to start 5k on two of the 1G ports:
>
> Jan 22 10:15:41 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Down
> Jan 22 10:15:41 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
> Jan 22 10:15:41 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Down
> Jan 22 10:15:41 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
> Jan 22 10:15:41 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Down
> Jan 22 10:15:41 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
> Jan 22 10:15:43 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Down
> Jan 22 10:15:45 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
> Jan 22 10:15:51 lf1003-e3v2-13100124-f20x64 kernel: NETDEV WATCHDOG: eth3 (e1000e): transmit queue 0 timed out, trans_s...es: 1
> Jan 22 10:15:51 lf1003-e3v2-13100124-f20x64 kernel: e1000e :07:00.0 eth3: Reset adapter unexpectedly
>
> System reports 10+GB RAM free in this case, btw.
>
> Actually, maybe the good kernel was even older than 4.7... I see the same
> resets and inability to do a full 20k connections on 4.7 too.
>
> I double-checked with system-test and it seems 4.4 was a good kernel. I'll
> test that next.
>
> Here is the splat from 4.7:
>
> [ 238.921679] [ cut here ]
> [ 238.921689] WARNING: CPU: 0 PID: 3 at /home/greearb/git/linux-bisect/net/sched/sch_generic.c:272 dev_watchdog+0xd4/0x12f
> [ 238.921690] NETDEV WATCHDOG: eth3 (e1000e): transmit queue 0 timed out
> [ 238.921691] Modules linked in: nf_conntrack_netlink nf_conntrack nfnetlink nf_defrag_ipv4 cfg80211 macvlan pktgen bnep bluetooth fuse coretemp intel_rapl ftdi_sio x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm iTCO_wdt iTCO_vendor_support joydev ie31200_edac ipmi_devintf irqbypass serio_raw ipmi_si edac_core shpchp fjes video i2c_i801 tpm_tis lpc_ich ipmi_msghandler tpm nfsd auth_rpcgss nfs_acl lockd grace sunrpc mgag200 i2c_algo_bit drm_kms_helper ttm drm i2c_core e1000e ixgbe mdio hwmon dca ptp pps_core ipv6 [last unloaded: nf_conntrack]
> [ 238.921720] CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 4.7.0 #62
> [ 238.921721] Hardware name: Supermicro X9SCI/X9SCA/X9SCI/X9SCA, BIOS 2.0b 09/17/2012
> [ 238.921723] 88041cdd7cd8 81352a23 88041cdd7d28
> [ 238.921725] 88041cdd7d18 810ea5dd 01101cdd7d90
> [ 238.921727] 880417a84000 0100 8163ecff 880417a84440
> [ 238.921728] Call Trace:
> [ 238.921733] [] dump_stack+0x61/0x7d
> [ 238.921736] [] __warn+0xbd/0xd8
> [ 238.921738] [] ? netif_tx_lock+0x81/0x81
> [ 238.921740] [] warn_slowpath_fmt+0x46/0x4e
> [ 238.921741] [] ? netif_tx_lock+0x74/0x81
> [ 238.921743] [] dev_watchdog+0xd4/0x12f
> [ 238.921746] [] call_timer_fn+0x65/0x11b
> [ 238.921748] [] ? netif_tx_lock+0x81/0x81
> [ 238.921749] [] run_timer_softirq+0x1ad/0x1d7
> [ 238.921751] [] __do_softirq+0xfb/0x25c
> [ 238.921752] [] run_ksoftirqd+0x19/0x35
> [ 238.921755] [] smpboot_thread_fn+0x169/0x1a9
> [ 238.921756] [] ? sort_range+0x1d/0x1d
> [ 238.921759] [] kthread+0xa0/0xa8
> [ 238.921763] [] ret_from_fork+0x1f/0x40
> [ 238.921764] [] ? init_completion+0x24/0x24
> [ 238.921765] ---[ end trace 933912956c6ee5ff ]---
> [ 238.961672] e1000e :07:00.0 eth3: Reset adapter unexpectedly

So, on 4.4.8+, I see this and other splats related to e1000e. I guess that is
a separate issue.

I can easily start 40k connections, however: 30k across the two 10G ports, and
10k more across a pair of mac-vlans on the 10G ports (since I was out of
address space to add a full 40k on the two physical ports).

Looks like the e1000e problem is a separate issue, so
Re: TCP many-connection regression between 4.7 and 4.13 kernels.
On 01/22/2018 10:16 AM, Eric Dumazet wrote:
> On Mon, 2018-01-22 at 09:28 -0800, Ben Greear wrote:
>> My test case is to have 6 processes each create 5000 TCP IPv4 connections
>> to each other on a system with 16GB RAM and send slow-speed data. This
>> works fine on a 4.7 kernel, but will not work at all on a 4.13. The 4.13
>> first complains about running out of tcp memory, but even after forcing
>> those values higher, the max connections we can get is around 15k.
>>
>> Both kernels have my out-of-tree patches applied, so it is possible it is
>> my fault at this point.
>>
>> Any suggestions as to what this might be caused by, or if it is fixed in
>> more recent kernels?
>>
>> I will start bisecting in the meantime...
>
> Hi Ben
>
> Unfortunately I have no idea.
>
> Are you using loopback flows, or have I misunderstood you?
>
> How can loopback connections be slow-speed?

I am sending to self, but over external network interfaces, by using routing
tables and rules and such.

On 4.13.16+, I see the Intel driver bouncing when I try to start 20k
connections. In this case, I have a pair of 10G ports doing 15k, and then I
try to start 5k on two of the 1G ports:

Jan 22 10:15:41 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Down
Jan 22 10:15:41 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Jan 22 10:15:41 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Down
Jan 22 10:15:41 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Jan 22 10:15:41 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Down
Jan 22 10:15:41 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Jan 22 10:15:43 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Down
Jan 22 10:15:45 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Jan 22 10:15:51 lf1003-e3v2-13100124-f20x64 kernel: NETDEV WATCHDOG: eth3 (e1000e): transmit queue 0 timed out, trans_s...es: 1
Jan 22 10:15:51 lf1003-e3v2-13100124-f20x64 kernel: e1000e :07:00.0 eth3: Reset adapter unexpectedly

System reports 10+GB RAM free in this case, btw.

Actually, maybe the good kernel was even older than 4.7... I see the same
resets and inability to do a full 20k connections on 4.7 too.

I double-checked with system-test and it seems 4.4 was a good kernel. I'll
test that next.

Here is the splat from 4.7:

[ 238.921679] [ cut here ]
[ 238.921689] WARNING: CPU: 0 PID: 3 at /home/greearb/git/linux-bisect/net/sched/sch_generic.c:272 dev_watchdog+0xd4/0x12f
[ 238.921690] NETDEV WATCHDOG: eth3 (e1000e): transmit queue 0 timed out
[ 238.921691] Modules linked in: nf_conntrack_netlink nf_conntrack nfnetlink nf_defrag_ipv4 cfg80211 macvlan pktgen bnep bluetooth fuse coretemp intel_rapl ftdi_sio x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm iTCO_wdt iTCO_vendor_support joydev ie31200_edac ipmi_devintf irqbypass serio_raw ipmi_si edac_core shpchp fjes video i2c_i801 tpm_tis lpc_ich ipmi_msghandler tpm nfsd auth_rpcgss nfs_acl lockd grace sunrpc mgag200 i2c_algo_bit drm_kms_helper ttm drm i2c_core e1000e ixgbe mdio hwmon dca ptp pps_core ipv6 [last unloaded: nf_conntrack]
[ 238.921720] CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 4.7.0 #62
[ 238.921721] Hardware name: Supermicro X9SCI/X9SCA/X9SCI/X9SCA, BIOS 2.0b 09/17/2012
[ 238.921723] 88041cdd7cd8 81352a23 88041cdd7d28
[ 238.921725] 88041cdd7d18 810ea5dd 01101cdd7d90
[ 238.921727] 880417a84000 0100 8163ecff 880417a84440
[ 238.921728] Call Trace:
[ 238.921733] [] dump_stack+0x61/0x7d
[ 238.921736] [] __warn+0xbd/0xd8
[ 238.921738] [] ? netif_tx_lock+0x81/0x81
[ 238.921740] [] warn_slowpath_fmt+0x46/0x4e
[ 238.921741] [] ? netif_tx_lock+0x74/0x81
[ 238.921743] [] dev_watchdog+0xd4/0x12f
[ 238.921746] [] call_timer_fn+0x65/0x11b
[ 238.921748] [] ? netif_tx_lock+0x81/0x81
[ 238.921749] [] run_timer_softirq+0x1ad/0x1d7
[ 238.921751] [] __do_softirq+0xfb/0x25c
[ 238.921752] [] run_ksoftirqd+0x19/0x35
[ 238.921755] [] smpboot_thread_fn+0x169/0x1a9
[ 238.921756] [] ? sort_range+0x1d/0x1d
[ 238.921759] [] kthread+0xa0/0xa8
[ 238.921763] [] ret_from_fork+0x1f/0x40
[ 238.921764] [] ? init_completion+0x24/0x24
[ 238.921765] ---[ end trace 933912956c6ee5ff ]---
[ 238.961672] e1000e :07:00.0 eth3: Reset adapter unexpectedly

Thanks,
Ben

--
Ben Greear
Candela Technologies Inc
http://www.candelatech.com
Re: TCP many-connection regression between 4.7 and 4.13 kernels.
Hi Eric,

On Mon, Jan 22, 2018 at 10:16:06AM -0800, Eric Dumazet wrote:
> On Mon, 2018-01-22 at 09:28 -0800, Ben Greear wrote:
>> My test case is to have 6 processes each create 5000 TCP IPv4 connections
>> to each other on a system with 16GB RAM and send slow-speed data. This
>> works fine on a 4.7 kernel, but will not work at all on a 4.13. The 4.13
>> first complains about running out of tcp memory, but even after forcing
>> those values higher, the max connections we can get is around 15k.
>>
>> Both kernels have my out-of-tree patches applied, so it is possible it is
>> my fault at this point.
>>
>> Any suggestions as to what this might be caused by, or if it is fixed in
>> more recent kernels?
>>
>> I will start bisecting in the meantime...
>
> Hi Ben
>
> Unfortunately I have no idea.
>
> Are you using loopback flows, or have I misunderstood you?
>
> How can loopback connections be slow-speed?

A few quick points: I have not noticed this on 4.9, which we use with pretty
satisfying performance (typically around 100k conn/s). However, during some
recent tests I did around the Meltdown fixes on 4.15, I noticed a high
connect() or bind() cost to find a local port when running on the loopback,
which I didn't have time to compare against older kernels. Strace clearly
showed that bind() (or connect() if bind was not used) could take as much as
2-3 ms as source ports were filling up.

To be clear, it was just a quick observation and anything could be wrong
there, including my tests. I'm just saying this in case it matches anything
Ben has observed. I can try to get more info if that helps, but it could be a
different case.

Cheers,
Willy
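[A rough, hypothetical sketch of the measurement Willy describes: open many
connections to a single loopback listener and time each connect(), which gets
more expensive as the ephemeral source-port space toward that destination
fills up. The host, port, and counts below are assumptions, not anything from
the thread; strace -T on the real workload gives the same kind of view.]

# Sketch only: assumed host/port/counts; needs a raised fd limit to finish.
import resource
import socket
import time

# Room for ~2 fds per connection; raising the hard limit may require root.
try:
    resource.setrlimit(resource.RLIMIT_NOFILE, (100000, 100000))
except ValueError:
    pass  # keep the existing limit if we are not allowed to raise it

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 9000))
listener.listen(1024)

socks = []
for i in range(40000):
    c = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    t0 = time.monotonic()
    try:
        c.connect(("127.0.0.1", 9000))  # kernel must find a free source port
    except OSError as e:
        print(f"connect #{i} failed: {e}")  # e.g. EADDRNOTAVAIL once ports run out
        break
    dt_ms = (time.monotonic() - t0) * 1000.0
    socks.append(c)
    socks.append(listener.accept()[0])  # keep the server-side socket open too
    if i % 1000 == 0:
        print(f"conn {i}: connect() took {dt_ms:.3f} ms")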
Re: TCP many-connection regression between 4.7 and 4.13 kernels.
On Mon, 2018-01-22 at 09:28 -0800, Ben Greear wrote:
> My test case is to have 6 processes each create 5000 TCP IPv4 connections
> to each other on a system with 16GB RAM and send slow-speed data. This
> works fine on a 4.7 kernel, but will not work at all on a 4.13. The 4.13
> first complains about running out of tcp memory, but even after forcing
> those values higher, the max connections we can get is around 15k.
>
> Both kernels have my out-of-tree patches applied, so it is possible it is
> my fault at this point.
>
> Any suggestions as to what this might be caused by, or if it is fixed in
> more recent kernels?
>
> I will start bisecting in the meantime...

Hi Ben

Unfortunately I have no idea.

Are you using loopback flows, or have I misunderstood you?

How can loopback connections be slow-speed?
TCP many-connection regression between 4.7 and 4.13 kernels.
My test case is to have 6 processes each create 5000 TCP IPv4 connections to
each other on a system with 16GB RAM and send slow-speed data. This works fine
on a 4.7 kernel, but will not work at all on a 4.13. The 4.13 first complains
about running out of tcp memory, but even after forcing those values higher,
the max connections we can get is around 15k.

Both kernels have my out-of-tree patches applied, so it is possible it is my
fault at this point.

Any suggestions as to what this might be caused by, or if it is fixed in more
recent kernels?

I will start bisecting in the meantime...

Thanks,
Ben

--
Ben Greear
Candela Technologies Inc
http://www.candelatech.com
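[For readers who want to approximate this kind of load, here is a minimal
Python sketch: several processes each holding thousands of TCP connections
open and trickling small writes. It is only an illustration under assumed
addresses, ports, payload size, and pacing; it is not the actual test tool
used in this thread, and it needs a listener on the peer address plus the
routed "send-to-self" setup Ben mentions if you want the traffic to leave the
box.]

# Minimal sketch only: SERVER_ADDR, payload size, and pacing are assumptions.
import socket
import time
from multiprocessing import Process

SERVER_ADDR = ("10.1.1.2", 5001)   # assumed peer address with a listener running
NUM_PROCESSES = 6
CONNS_PER_PROCESS = 5000           # 6 x 5000 = 30k sockets in total
PAYLOAD = b"x" * 64                # small writes keep each flow slow-speed
SEND_INTERVAL = 1.0                # seconds between write passes

def run_client(proc_id):
    socks = []
    for _ in range(CONNS_PER_PROCESS):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.connect(SERVER_ADDR)      # kernel picks an ephemeral source port
        s.setblocking(False)
        socks.append(s)
    while True:                     # trickle a little data on every connection
        for s in socks:
            try:
                s.send(PAYLOAD)
            except BlockingIOError:
                pass                # send buffer full; retry on the next pass
        time.sleep(SEND_INTERVAL)

if __name__ == "__main__":
    procs = [Process(target=run_client, args=(i,)) for i in range(NUM_PROCESSES)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

[Holding tens of thousands of established sockets also requires raising the
per-process file-descriptor limit and, as the thread notes, the tcp memory
thresholds in net.ipv4.tcp_mem.]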