Re: CURRENT: em0 NIC freezes under heavy I/O on net

2017-01-12 Thread Sean Bruno


On 01/11/17 01:27, O. Hartmann wrote:
> Running recent CURRENT (FreeBSD 12.0-CURRENT #5 r311919: Wed Jan 11 08:24:28
> CET 2017 amd64), the system freezes when doing a rsync over automounted
> (autofs) NFSv4 filesystem, mounted from another CURRENT server (same revision,
> but with BCM NICs).
> 
> The host in question is a Fujitsu Celsius M740 equipped with an Intel NIC:
> 
> [...]
> em0:  port 0xf020-0xf03f mem 0xfb30-0xfb31,0xfb339000-0xfb339fff at device 25.0 numa-domain 0 on pci1
> em0: attach_pre capping queues at 1
> em0: using 1024 tx descriptors and 1024 rx descriptors
> em0: msix_init qsets capped at 1
> em0: Unable to map MSIX table 
> em0: Using an MSI interrupt
> em0: allocated for 1 tx_queues
> em0: allocated for 1 rx_queues
> em0: netmap queues/slots: TX 1/1024, RX 1/1024
> [...]
> 
> The pciconf output reveals:
> 
> em0@pci0:0:25:0: class=0x02 card=0x11ed1734 chip=0x153a8086 rev=0x05 hdr=0x00
> vendor = 'Intel Corporation'
> device = 'Ethernet Connection I217-LM'
> class  = network
> subclass   = ethernet
> bar   [10] = type Memory, range 32, base 0xfb30, size 131072, enabled
> bar   [14] = type Memory, range 32, base 0xfb339000, size 4096, enabled
> bar   [18] = type I/O Port, range 32, base 0xf020, size 32, enabled
> cap 01[c8] = powerspec 2  supports D0 D3  current D0
> cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
> cap 13[e0] = PCI Advanced Features: FLR TP
> 
> I have a customized kernel. The NIC has revealed itself all the time as an
> "emX" device (never as igbX). The kernel contains device netmap (if
> relevant).
> 
> The phenomenon:
> 
> Syncing a poudriere repository between two remote hosts, I use rsync on an NFSv4
> exported filesystem, mounted via AUTOFS. Two days ago this worked perfectly.
> Since yesterday, syncing brings down the network connection - the connection is
> simply dead. Terminating the rsync and bringing em0 down and up again doesn't
> help much: the connection is established for a short moment, but dies within
> seconds. Restarting all network services via "service netif restart" has the
> same effect: after the disaster, it is impossible for me to bring the
> NIC/connection back to normal; I have to reboot. The same happens under heavy
> network load in general, but the time it takes varies, and even rsync isn't
> "deadly" within the same timeframe every time: sometimes it takes a couple of
> seconds, sometimes only one or two seconds, for the connection to die.
> 
> I checked by dd'ing a large file over that connection; it then takes several
> seconds for the connection to freeze (so someone could reproduce it without
> necessarily using rsync).
> 
> Kind regards,
> 
> oh

If you have the time today or tomorrow, can you please capture 'sysctl
dev.em.0' and post it here?

In addition, I would like to have this patch tested in your configuration:

https://people.freebsd.org/~sbruno/em_tx_limit.diff

Finally, if you have any loader.conf entries for hw.em, please post them
as well.

sean
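
For anyone gathering the same details, the two items requested above (the
dev.em.0 sysctl tree and any hw.em loader tunables) could be collected in one
go. This is only a minimal sketch: the function names and the LOADER_CONF
override are made up for illustration, and /boot/loader.conf is assumed to be
where the tunables live.

```sh
#!/bin/sh
# Sketch: gather the em(4) details requested above into one report.
# hw_em_entries, em_report and LOADER_CONF are hypothetical names.

# Print uncommented hw.em.* lines from a loader.conf-style file.
hw_em_entries() {
    grep -E '^[[:space:]]*hw\.em\.' "$1" 2>/dev/null
}

# Write the driver sysctl tree and any hw.em tunables to stdout.
em_report() {
    conf="${LOADER_CONF:-/boot/loader.conf}"
    echo "== sysctl dev.em.0 =="
    sysctl dev.em.0 2>&1 || echo "(sysctl dev.em.0 failed)"
    echo "== hw.em entries in ${conf} =="
    hw_em_entries "$conf" || echo "(none found)"
}
```

Running "em_report > em0-report.txt" after sourcing the snippet would leave a
single file that can be attached to a reply.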





Re: CURRENT: em0 NIC freezes under heavy I/O on net

2017-01-11 Thread O. Hartmann
On Wed, 11 Jan 2017 03:06:19 -0800
Matthew Macy  wrote:

Hello,

thanks for responding.

Your Email looks funny in my claws-mail ;-)

You asked whether it started with the introduction of IFLIB - I do not know.
Last week (I think it was Friday, and I did at least two updates of
world/kernel that day), I had a very similar situation on this box,
but it could be solved by disabling/commenting out the officially unsupported
option "options EM_MULTIQUEUE".

Around yesterday, also after several buildworld/buildkernel runs (so I cannot
tell the revision number), the problem under heavy load occurred even without
EM_MULTIQUEUE.

I have no idea when the iflib code first landed in HEAD.

The problem can be worked around temporarily with "ifconfig down && ifconfig up",
as long as there is no load. That way, I managed to rsync a repository, but it
took a while ...

As long as the NIC is not under pressure/heavy I/O load, there is no problem so
far. We run lots of i350 and i210 devices, and I also have some in my SoHo
setup; I have not had these severe issues there, even when putting a high load
on two servers with the same rsyncing of a ports repo. They took the load
(i350). The i210 has not been tested under load.

Hopefully, this naive observation is of use. I have no debug kernels at the
moment ... sorry.

Kind regards,

Oliver Hartmann


___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: CURRENT: em0 NIC freezes under heavy I/O on net

2017-01-11 Thread Matthew Macy




Sorry, I meant to send that to the other thread. Was this after the
iflib driver commit? If so it's odd that we haven't seen anything like this and
I'll try to get a fix in ASAP.







Re: CURRENT: em0 NIC freezes under heavy I/O on net

2017-01-11 Thread Matthew Macy




It looks like I have the wrong msix bar value for your NIC. Will
fix in the next day or so.
-M

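As a side note on the MSI-X remark: the capability lines in the pciconf listing
from the original report list cap 01, cap 05 (MSI) and cap 13, with no cap 11
(MSI-X) entry. A rough way to check this on other machines is sketched below.
The function name irq_caps is made up for illustration, and the sketch only
looks at the capability IDs printed by pciconf; it says nothing about what the
driver itself decides.

```sh
#!/bin/sh
# Sketch: classify interrupt capabilities from `pciconf -lc` style text
# read on stdin. irq_caps is a hypothetical helper name.
# PCI capability IDs used here: 05 = MSI, 11 = MSI-X.
irq_caps() {
    awk '
        /cap 05\[/ { msi = 1 }
        /cap 11\[/ { msix = 1 }
        END {
            if (msix)     print "msix"
            else if (msi) print "msi"
            else          print "legacy"
        }'
}
```

Fed the three capability lines from the report, this prints "msi", which
matches the "Using an MSI interrupt" line in the boot messages. On a live
system it could be used as "pciconf -lc em0 | irq_caps", assuming the installed
pciconf accepts a device name after -l.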




