Re: TCP many-connection regression between 4.7 and 4.13 kernels.

2018-01-23 Thread Ben Greear

On 01/22/2018 10:46 AM, Josh Hunt wrote:
> On Mon, Jan 22, 2018 at 10:30 AM, Ben Greear  wrote:
>> On 01/22/2018 10:16 AM, Eric Dumazet wrote:
>>> On Mon, 2018-01-22 at 09:28 -0800, Ben Greear wrote:
>>>>
>>>> My test case is to have 6 processes each create 5000 TCP IPv4 connections
>>>> to each other on a system with 16GB RAM and send slow-speed data.  This
>>>> works fine on a 4.7 kernel, but will not work at all on a 4.13.  The 4.13
>>>> kernel first complains about running out of tcp memory, but even after
>>>> forcing those values higher, the max connections we can get is around 15k.
>>>>
>>>> Both kernels have my out-of-tree patches applied, so it is possible it is
>>>> my fault at this point.
>>>>
>>>> Any suggestions as to what this might be caused by, or whether it is
>>>> fixed in more recent kernels?
>>>>
>>>> I will start bisecting in the meantime...
>>>
>>> Hi Ben
>>>
>>> Unfortunately I have no idea.
>>>
>>> Are you using loopback flows, or have I misunderstood you?
>>>
>>> How can loopback connections be slow-speed?
>>
>> I am sending to self, but over external network interfaces, by using
>> routing tables and rules and such.
>>
>> On 4.13.16+, I see the Intel driver bouncing when I try to start 20k
>> connections.  In this case, I have a pair of 10G ports doing 15k, and then
>> I try to start 5k on two of the 1G ports:
>>
>> Jan 22 10:15:41 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Down
>> Jan 22 10:15:41 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
>> Jan 22 10:15:41 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Down
>> Jan 22 10:15:41 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
>> Jan 22 10:15:41 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Down
>> Jan 22 10:15:41 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
>> Jan 22 10:15:43 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Down
>> Jan 22 10:15:45 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
>> Jan 22 10:15:51 lf1003-e3v2-13100124-f20x64 kernel: NETDEV WATCHDOG: eth3 (e1000e): transmit queue 0 timed out, trans_s...es: 1
>> Jan 22 10:15:51 lf1003-e3v2-13100124-f20x64 kernel: e1000e 0000:07:00.0 eth3: Reset adapter unexpectedly
>
> Ben
>
> We had an interface doing this, and grabbing these commits resolved it for us:
>
> 4aea7a5c5e94 e1000e: Avoid receiver overrun interrupt bursts
> 19110cfbb34d e1000e: Separate signaling for link check/link up
> d3509f8bc7b0 e1000e: Fix return value test
> 65a29da1f5fd e1000e: Fix wrong comment related to link detection
> c4c40e51f9c3 e1000e: Fix error path in link detection
>
> They are in the LTS kernels now, but I don't believe they were when we
> first hit this problem.

Thanks a lot for the suggestions.  I can confirm that these patches,
applied to my 4.13.16+ tree, do indeed seem to fix the problem.

Thanks,
Ben

> Josh




--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com



Re: TCP many-connection regression between 4.7 and 4.13 kernels.

2018-01-22 Thread Josh Hunt
On Mon, Jan 22, 2018 at 10:30 AM, Ben Greear  wrote:
> On 01/22/2018 10:16 AM, Eric Dumazet wrote:
>>
>> On Mon, 2018-01-22 at 09:28 -0800, Ben Greear wrote:
>>>
>>> My test case is to have 6 processes each create 5000 TCP IPv4 connections
>>> to each other on a system with 16GB RAM and send slow-speed data.  This
>>> works fine on a 4.7 kernel, but will not work at all on a 4.13.  The 4.13
>>> kernel first complains about running out of tcp memory, but even after
>>> forcing those values higher, the max connections we can get is around 15k.
>>>
>>> Both kernels have my out-of-tree patches applied, so it is possible it is
>>> my fault at this point.
>>>
>>> Any suggestions as to what this might be caused by, or whether it is
>>> fixed in more recent kernels?
>>>
>>> I will start bisecting in the meantime...
>>>
>>
>> Hi Ben
>>
>> Unfortunately I have no idea.
>>
>> Are you using loopback flows, or have I misunderstood you?
>>
>> How can loopback connections be slow-speed?
>>
>
> I am sending to self, but over external network interfaces, by using
> routing tables and rules and such.
>
> On 4.13.16+, I see the Intel driver bouncing when I try to start 20k
> connections.  In this case, I have a pair of 10G ports doing 15k, and then
> I try to start 5k on two of the 1G ports:
>
> Jan 22 10:15:41 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Down
> Jan 22 10:15:41 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
> Jan 22 10:15:41 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Down
> Jan 22 10:15:41 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
> Jan 22 10:15:41 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Down
> Jan 22 10:15:41 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
> Jan 22 10:15:43 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Down
> Jan 22 10:15:45 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
> Jan 22 10:15:51 lf1003-e3v2-13100124-f20x64 kernel: NETDEV WATCHDOG: eth3 (e1000e): transmit queue 0 timed out, trans_s...es: 1
> Jan 22 10:15:51 lf1003-e3v2-13100124-f20x64 kernel: e1000e 0000:07:00.0 eth3: Reset adapter unexpectedly
>

Ben

We had an interface doing this, and grabbing these commits resolved it for us:

4aea7a5c5e94 e1000e: Avoid receiver overrun interrupt bursts
19110cfbb34d e1000e: Separate signaling for link check/link up
d3509f8bc7b0 e1000e: Fix return value test
65a29da1f5fd e1000e: Fix wrong comment related to link detection
c4c40e51f9c3 e1000e: Fix error path in link detection

They are in the LTS kernels now, but I don't believe they were when we
first hit this problem.

Josh


Re: TCP many-connection regression between 4.7 and 4.13 kernels.

2018-01-22 Thread Ben Greear

On 01/22/2018 10:30 AM, Ben Greear wrote:
> On 01/22/2018 10:16 AM, Eric Dumazet wrote:
>> On Mon, 2018-01-22 at 09:28 -0800, Ben Greear wrote:
>>>
>>> My test case is to have 6 processes each create 5000 TCP IPv4 connections
>>> to each other on a system with 16GB RAM and send slow-speed data.  This
>>> works fine on a 4.7 kernel, but will not work at all on a 4.13.  The 4.13
>>> kernel first complains about running out of tcp memory, but even after
>>> forcing those values higher, the max connections we can get is around 15k.
>>>
>>> Both kernels have my out-of-tree patches applied, so it is possible it is
>>> my fault at this point.
>>>
>>> Any suggestions as to what this might be caused by, or whether it is
>>> fixed in more recent kernels?
>>>
>>> I will start bisecting in the meantime...
>>
>> Hi Ben
>>
>> Unfortunately I have no idea.
>>
>> Are you using loopback flows, or have I misunderstood you?
>>
>> How can loopback connections be slow-speed?
>
> I am sending to self, but over external network interfaces, by using
> routing tables and rules and such.
>
> On 4.13.16+, I see the Intel driver bouncing when I try to start 20k
> connections.  In this case, I have a pair of 10G ports doing 15k, and then
> I try to start 5k on two of the 1G ports:
>
> Jan 22 10:15:41 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Down
> Jan 22 10:15:41 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
> Jan 22 10:15:41 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Down
> Jan 22 10:15:41 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
> Jan 22 10:15:41 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Down
> Jan 22 10:15:41 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
> Jan 22 10:15:43 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Down
> Jan 22 10:15:45 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
> Jan 22 10:15:51 lf1003-e3v2-13100124-f20x64 kernel: NETDEV WATCHDOG: eth3 (e1000e): transmit queue 0 timed out, trans_s...es: 1
> Jan 22 10:15:51 lf1003-e3v2-13100124-f20x64 kernel: e1000e 0000:07:00.0 eth3: Reset adapter unexpectedly
>
> System reports 10+GB RAM free in this case, btw.
>
> Actually, maybe the good kernel was even older than 4.7... I see the same
> resets and inability to do a full 20k connections on 4.7 too.  I
> double-checked with system-test, and it seems 4.4 was a good kernel.  I'll
> test that next.  Here is the splat from 4.7:
>
> [  238.921679] [ cut here ]
> [  238.921689] WARNING: CPU: 0 PID: 3 at /home/greearb/git/linux-bisect/net/sched/sch_generic.c:272 dev_watchdog+0xd4/0x12f
> [  238.921690] NETDEV WATCHDOG: eth3 (e1000e): transmit queue 0 timed out
> [  238.921691] Modules linked in: nf_conntrack_netlink nf_conntrack nfnetlink nf_defrag_ipv4 cfg80211 macvlan pktgen bnep bluetooth fuse coretemp intel_rapl ftdi_sio x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm iTCO_wdt iTCO_vendor_support joydev ie31200_edac ipmi_devintf irqbypass serio_raw ipmi_si edac_core shpchp fjes video i2c_i801 tpm_tis lpc_ich ipmi_msghandler tpm nfsd auth_rpcgss nfs_acl lockd grace sunrpc mgag200 i2c_algo_bit drm_kms_helper ttm drm i2c_core e1000e ixgbe mdio hwmon dca ptp pps_core ipv6 [last unloaded: nf_conntrack]
> [  238.921720] CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 4.7.0 #62
> [  238.921721] Hardware name: Supermicro X9SCI/X9SCA/X9SCI/X9SCA, BIOS 2.0b 09/17/2012
> [  238.921723]   88041cdd7cd8 81352a23 88041cdd7d28
> [  238.921725]   88041cdd7d18 810ea5dd 01101cdd7d90
> [  238.921727]  880417a84000 0100 8163ecff 880417a84440
> [  238.921728] Call Trace:
> [  238.921733]  [] dump_stack+0x61/0x7d
> [  238.921736]  [] __warn+0xbd/0xd8
> [  238.921738]  [] ? netif_tx_lock+0x81/0x81
> [  238.921740]  [] warn_slowpath_fmt+0x46/0x4e
> [  238.921741]  [] ? netif_tx_lock+0x74/0x81
> [  238.921743]  [] dev_watchdog+0xd4/0x12f
> [  238.921746]  [] call_timer_fn+0x65/0x11b
> [  238.921748]  [] ? netif_tx_lock+0x81/0x81
> [  238.921749]  [] run_timer_softirq+0x1ad/0x1d7
> [  238.921751]  [] __do_softirq+0xfb/0x25c
> [  238.921752]  [] run_ksoftirqd+0x19/0x35
> [  238.921755]  [] smpboot_thread_fn+0x169/0x1a9
> [  238.921756]  [] ? sort_range+0x1d/0x1d
> [  238.921759]  [] kthread+0xa0/0xa8
> [  238.921763]  [] ret_from_fork+0x1f/0x40
> [  238.921764]  [] ? init_completion+0x24/0x24
> [  238.921765] ---[ end trace 933912956c6ee5ff ]---
> [  238.961672] e1000e 0000:07:00.0 eth3: Reset adapter unexpectedly


So, on 4.4.8+, I see this and other splats related to e1000e.  I guess that
is a separate issue.  I can easily start 40k connections, however: 30k
across the two 10G ports, and 10k more across a pair of mac-vlans on the
10G ports (since I was out of address space to add a full 40k on the two
physical ports).

Looks like the e1000e problem is a separate issue, so 

Re: TCP many-connection regression between 4.7 and 4.13 kernels.

2018-01-22 Thread Ben Greear

On 01/22/2018 10:16 AM, Eric Dumazet wrote:
> On Mon, 2018-01-22 at 09:28 -0800, Ben Greear wrote:
>>
>> My test case is to have 6 processes each create 5000 TCP IPv4 connections
>> to each other on a system with 16GB RAM and send slow-speed data.  This
>> works fine on a 4.7 kernel, but will not work at all on a 4.13.  The 4.13
>> kernel first complains about running out of tcp memory, but even after
>> forcing those values higher, the max connections we can get is around 15k.
>>
>> Both kernels have my out-of-tree patches applied, so it is possible it is
>> my fault at this point.
>>
>> Any suggestions as to what this might be caused by, or whether it is
>> fixed in more recent kernels?
>>
>> I will start bisecting in the meantime...
>
> Hi Ben
>
> Unfortunately I have no idea.
>
> Are you using loopback flows, or have I misunderstood you?
>
> How can loopback connections be slow-speed?

I am sending to self, but over external network interfaces, by using
routing tables and rules and such.
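
To sketch that send-to-self idea (hypothetical addresses; this is not the
actual test tool): each socket binds a source address owned by one
interface, and policy-routing rules steer that source out its physical
port, so the traffic really crosses the wire instead of being
short-circuited over loopback:

/* Sketch: one send-to-self connection that egresses a real port.
 * Assumes 10.0.0.1 is on eth2 and 10.0.0.2 is on eth3 (both hypothetical),
 * with policy routing along the lines of:
 *   ip rule add from 10.0.0.1 table 100
 *   ip route add default dev eth2 table 100
 * so the bound source address forces the flow onto the wire. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

int main(void)
{
	struct sockaddr_in src = { .sin_family = AF_INET };
	struct sockaddr_in dst = { .sin_family = AF_INET,
				   .sin_port = htons(5001) };
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	inet_pton(AF_INET, "10.0.0.1", &src.sin_addr);	/* owned by eth2 */
	inet_pton(AF_INET, "10.0.0.2", &dst.sin_addr);	/* owned by eth3 */

	if (bind(fd, (struct sockaddr *)&src, sizeof(src)) < 0)
		perror("bind");
	if (connect(fd, (struct sockaddr *)&dst, sizeof(dst)) < 0)
		perror("connect");
	return 0;
}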

On 4.13.16+, I see the Intel driver bouncing when I try to start 20k
connections.  In this case, I have a pair of 10G ports doing 15k, and then
I try to start 5k on two of the 1G ports:

Jan 22 10:15:41 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Down
Jan 22 10:15:41 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Jan 22 10:15:41 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Down
Jan 22 10:15:41 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Jan 22 10:15:41 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Down
Jan 22 10:15:41 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Jan 22 10:15:43 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Down
Jan 22 10:15:45 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Jan 22 10:15:51 lf1003-e3v2-13100124-f20x64 kernel: NETDEV WATCHDOG: eth3 (e1000e): transmit queue 0 timed out, trans_s...es: 1
Jan 22 10:15:51 lf1003-e3v2-13100124-f20x64 kernel: e1000e 0000:07:00.0 eth3: Reset adapter unexpectedly

System reports 10+GB RAM free in this case, btw.

Actually, maybe the good kernel was even older than 4.7... I see the same
resets and inability to do a full 20k connections on 4.7 too.  I
double-checked with system-test, and it seems 4.4 was a good kernel.  I'll
test that next.  Here is the splat from 4.7:

[  238.921679] [ cut here ]
[  238.921689] WARNING: CPU: 0 PID: 3 at /home/greearb/git/linux-bisect/net/sched/sch_generic.c:272 dev_watchdog+0xd4/0x12f
[  238.921690] NETDEV WATCHDOG: eth3 (e1000e): transmit queue 0 timed out
[  238.921691] Modules linked in: nf_conntrack_netlink nf_conntrack nfnetlink nf_defrag_ipv4 cfg80211 macvlan pktgen bnep bluetooth fuse coretemp intel_rapl ftdi_sio x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm iTCO_wdt iTCO_vendor_support joydev ie31200_edac ipmi_devintf irqbypass serio_raw ipmi_si edac_core shpchp fjes video i2c_i801 tpm_tis lpc_ich ipmi_msghandler tpm nfsd auth_rpcgss nfs_acl lockd grace sunrpc mgag200 i2c_algo_bit drm_kms_helper ttm drm i2c_core e1000e ixgbe mdio hwmon dca ptp pps_core ipv6 [last unloaded: nf_conntrack]
[  238.921720] CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 4.7.0 #62
[  238.921721] Hardware name: Supermicro X9SCI/X9SCA/X9SCI/X9SCA, BIOS 2.0b 09/17/2012
[  238.921723]   88041cdd7cd8 81352a23 88041cdd7d28
[  238.921725]   88041cdd7d18 810ea5dd 01101cdd7d90
[  238.921727]  880417a84000 0100 8163ecff 880417a84440
[  238.921728] Call Trace:
[  238.921733]  [] dump_stack+0x61/0x7d
[  238.921736]  [] __warn+0xbd/0xd8
[  238.921738]  [] ? netif_tx_lock+0x81/0x81
[  238.921740]  [] warn_slowpath_fmt+0x46/0x4e
[  238.921741]  [] ? netif_tx_lock+0x74/0x81
[  238.921743]  [] dev_watchdog+0xd4/0x12f
[  238.921746]  [] call_timer_fn+0x65/0x11b
[  238.921748]  [] ? netif_tx_lock+0x81/0x81
[  238.921749]  [] run_timer_softirq+0x1ad/0x1d7
[  238.921751]  [] __do_softirq+0xfb/0x25c
[  238.921752]  [] run_ksoftirqd+0x19/0x35
[  238.921755]  [] smpboot_thread_fn+0x169/0x1a9
[  238.921756]  [] ? sort_range+0x1d/0x1d
[  238.921759]  [] kthread+0xa0/0xa8
[  238.921763]  [] ret_from_fork+0x1f/0x40
[  238.921764]  [] ? init_completion+0x24/0x24
[  238.921765] ---[ end trace 933912956c6ee5ff ]---
[  238.961672] e1000e 0000:07:00.0 eth3: Reset adapter unexpectedly

Thanks,
Ben


--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com



Re: TCP many-connection regression between 4.7 and 4.13 kernels.

2018-01-22 Thread Willy Tarreau
Hi Eric,

On Mon, Jan 22, 2018 at 10:16:06AM -0800, Eric Dumazet wrote:
> On Mon, 2018-01-22 at 09:28 -0800, Ben Greear wrote:
> > My test case is to have 6 processes each create 5000 TCP IPv4 connections
> > to each other on a system with 16GB RAM and send slow-speed data.  This
> > works fine on a 4.7 kernel, but will not work at all on a 4.13.  The 4.13
> > kernel first complains about running out of tcp memory, but even after
> > forcing those values higher, the max connections we can get is around 15k.
> > 
> > Both kernels have my out-of-tree patches applied, so it is possible it is
> > my fault at this point.
> > 
> > Any suggestions as to what this might be caused by, or whether it is
> > fixed in more recent kernels?
> > 
> > I will start bisecting in the meantime...
> > 
> 
> Hi Ben
> 
> Unfortunately I have no idea.
> 
> Are you using loopback flows, or have I misunderstood you?
> 
> How can loopback connections be slow-speed?

A few quick points: I have not noticed this on 4.9, which we use with
pretty satisfying performance (typically around 100k conn/s).  However,
during some recent tests I did around the meltdown fixes on 4.15, I
noticed a high connect() or bind() cost to find a local port when running
on the loopback, which I didn't have time to compare against older
kernels.  strace clearly showed that bind() (or connect() if bind was not
used) could take as much as 2-3 ms as source ports were filling up.

To be clear, it was just a quick observation and anything could be wrong
there, including my tests.  I'm just saying this in case it matches
anything Ben has observed.  I can try to get more info if that helps, but
it could be a different case.
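
For anyone wanting to repeat that measurement, a rough sketch: it assumes
a listener on 127.0.0.1:8080 (hypothetical) and an fd limit raised above
the connection count, and it deliberately keeps sockets open so the
source-port range fills up:

/* Sketch: time successive connect() calls to a loopback listener to see
 * whether source-port selection slows down as the ephemeral range fills.
 * Assumes a listener on 127.0.0.1:8080 (hypothetical) and a raised fd
 * limit (ulimit -n); sockets are left open on purpose. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <time.h>

int main(void)
{
	struct sockaddr_in sa = { .sin_family = AF_INET,
				  .sin_port = htons(8080) };

	inet_pton(AF_INET, "127.0.0.1", &sa.sin_addr);

	for (int i = 0; i < 30000; i++) {
		struct timespec t0, t1;
		int fd = socket(AF_INET, SOCK_STREAM, 0);

		if (fd < 0)
			break;
		clock_gettime(CLOCK_MONOTONIC, &t0);
		if (connect(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0)
			break;
		clock_gettime(CLOCK_MONOTONIC, &t1);

		if (i % 1000 == 0)	/* print every 1000th connect's cost */
			printf("conn %5d: %ld us\n", i,
			       (t1.tv_sec - t0.tv_sec) * 1000000L +
			       (t1.tv_nsec - t0.tv_nsec) / 1000);
	}
	return 0;
}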

Cheers,
Willy


Re: TCP many-connection regression between 4.7 and 4.13 kernels.

2018-01-22 Thread Eric Dumazet
On Mon, 2018-01-22 at 09:28 -0800, Ben Greear wrote:
> My test case is to have 6 processes each create 5000 TCP IPv4 connections
> to each other on a system with 16GB RAM and send slow-speed data.  This
> works fine on a 4.7 kernel, but will not work at all on a 4.13.  The 4.13
> kernel first complains about running out of tcp memory, but even after
> forcing those values higher, the max connections we can get is around 15k.
> 
> Both kernels have my out-of-tree patches applied, so it is possible it is
> my fault at this point.
> 
> Any suggestions as to what this might be caused by, or whether it is
> fixed in more recent kernels?
> 
> I will start bisecting in the meantime...
> 

Hi Ben

Unfortunately I have no idea.

Are you using loopback flows, or have I misunderstood you?

How can loopback connections be slow-speed?



TCP many-connection regression between 4.7 and 4.13 kernels.

2018-01-22 Thread Ben Greear

My test case is to have 6 processes each create 5000 TCP IPv4 connections
to each other on a system with 16GB RAM and send slow-speed data.  This
works fine on a 4.7 kernel, but will not work at all on a 4.13.  The 4.13
kernel first complains about running out of tcp memory, but even after
forcing those values higher, the max connections we can get is around 15k.
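
For reference, the tcp memory limits in question are the net.ipv4.tcp_mem
sysctl triplet (min, pressure, max, counted in pages).  A minimal sketch
of forcing them higher from C; the numbers are purely illustrative, not
the exact values used in this test:

/* Raise net.ipv4.tcp_mem (min/pressure/max, in pages; values are
 * illustrative only -- size them to your RAM).  Must run as root.
 * Equivalent to: sysctl -w net.ipv4.tcp_mem="1533030 2044040 3066060" */
#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/proc/sys/net/ipv4/tcp_mem", "w");

	if (!f) {
		perror("tcp_mem");
		return 1;
	}
	fprintf(f, "1533030 2044040 3066060\n");
	fclose(f);
	return 0;
}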

Both kernels have my out-of-tree patches applied, so it is possible it is
my fault at this point.

Any suggestions as to what this might be caused by, or whether it is fixed
in more recent kernels?

I will start bisecting in the meantime...
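
While the bisect runs, here is a rough sketch of the load pattern for
anyone trying to reproduce it.  The peer address, port, and counts are
hypothetical, there is no error recovery, and it assumes the fd limit is
already raised; the real test spreads six such processes across several
interfaces via routing rules, and is not this program:

/* Sketch: open many TCP connections and trickle data on each.
 * PEER/PORT/NCONNS are hypothetical; assumes a listener is accepting
 * there and that the fd limit (ulimit -n) is already raised. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define NCONNS 5000		/* per process; the test runs 6 of these */
#define PEER   "192.168.1.2"
#define PORT   5001

int main(void)
{
	static int fds[NCONNS];
	struct sockaddr_in sa;
	int n;

	memset(&sa, 0, sizeof(sa));
	sa.sin_family = AF_INET;
	sa.sin_port = htons(PORT);
	inet_pton(AF_INET, PEER, &sa.sin_addr);

	/* Open connections until something gives out (tcp memory, ports). */
	for (n = 0; n < NCONNS; n++) {
		fds[n] = socket(AF_INET, SOCK_STREAM, 0);
		if (fds[n] < 0 ||
		    connect(fds[n], (struct sockaddr *)&sa, sizeof(sa)) < 0) {
			fprintf(stderr, "stalled at %d connections\n", n);
			break;
		}
	}

	/* "Slow-speed data": one byte per connection per second. */
	for (;;) {
		for (int i = 0; i < n; i++)
			(void)write(fds[i], "x", 1);
		sleep(1);
	}
}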

Thanks,
Ben

--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com