Re: CURRENT: em0 NIC freezes under heavy I/O on net
On 01/11/17 01:27, O. Hartmann wrote:
> Running recent CURRENT (FreeBSD 12.0-CURRENT #5 r311919: Wed Jan 11
> 08:24:28 CET 2017 amd64), the system freezes when doing an rsync over an
> automounted (autofs) NFSv4 filesystem, mounted from another CURRENT server
> (same revision, but with BCM NICs).
> [...]
> I checked with dd'ing a large file over that connection; it takes several
> seconds then to make the connection freeze (so one could reproduce it
> without necessarily using rsync).
>
> Kind regards,
>
> oh

If you have the time today or tomorrow, can you please capture 'sysctl
dev.em.0' and post it here?

In addition, I would like to have this patch tested in your configuration:

https://people.freebsd.org/~sbruno/em_tx_limit.diff

Finally, if you have any loader.conf entries for hw.em, please post them as
well.

sean
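The requested capture and patch test could look roughly like this; the
patch level (-p0), the temporary file paths, and the KERNCONF name are
assumptions, so adjust them for your own tree:

```shell
# Capture the full per-device sysctl tree for em0 (run as root).
sysctl dev.em.0 > /tmp/dev.em.0.txt

# Fetch and apply Sean's test patch; -p0 is an assumed patch level.
fetch -o /tmp/em_tx_limit.diff https://people.freebsd.org/~sbruno/em_tx_limit.diff
cd /usr/src && patch -p0 < /tmp/em_tx_limit.diff

# Rebuild and install the kernel, then reboot. MYKERNEL stands in for the
# custom kernel config mentioned in the report.
make buildkernel installkernel KERNCONF=MYKERNEL

# Show any em(4) tunables set at boot time, as requested.
grep 'hw\.em' /boot/loader.conf
```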
Re: CURRENT: em0 NIC freezes under heavy I/O on net
On Wed, 11 Jan 2017 03:06:19 -0800, Matthew Macy wrote:

Hello,

thanks for your response. Your Email looks funny in my claws-mail ;-)

You asked whether it started with the introduction of IFLIB - I do not know.
Last week (I think it was Friday, and I did at least two updates of
world/kernel that day), I had a very similar situation on this box, but it
could be solved by disabling/commenting out the officially unsupported
option "options EM_MULTIQUEUE". Around yesterday, also after several
buildworld/buildkernel runs (so I can not tell the exact revision number),
the problem occurred under heavy load even without EM_MULTIQUEUE. I have no
idea when the first of that code really flushed into HEAD.

The problem can be worked around temporarily by "ifconfig em0 down &&
ifconfig em0 up", as long as there is no load. That way I managed to rsync a
repository, but it took a while ...

As long as the NIC is not under pressure/heavy I/O load, there is no problem
so far. We run lots of i350 and i210 devices, and I also have those in my
SoHo setup, and I have not had these severe issues even when putting a high
load on two servers with the same rsyncing of a ports repo. They took the
load (i350). The i210 has not been tested under load.

Hopefully this naive observation is of use. I have no debug kernels at the
moment ... sorry.

Kind regards,
Oliver Hartmann

> It looks like I have the wrong msix bar value for your NIC. Will fix in
> the next day or so. -M
> [...]
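The temporary workaround described above amounts to cycling the interface
while the link is idle; a sketch (interface name taken from the report):

```shell
# Cycle the stuck interface (run as root, while there is no load).
ifconfig em0 down && ifconfig em0 up

# Alternatively, restart network configuration for just that interface:
service netif restart em0
```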
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: CURRENT: em0 NIC freezes under heavy I/O on net
Sorry, I meant to send that to the other thread.

Was this after the iflib driver commit? If so, it's odd that we haven't seen
anything like this, and I'll try to get a fix in ASAP.

On Wed, 11 Jan 2017 03:06:19 -0800 Me wrote:
> It looks like I have the wrong msix bar value for your NIC. Will fix in
> the next day or so. -M
> [...]
Re: CURRENT: em0 NIC freezes under heavy I/O on net
It looks like I have the wrong msix bar value for your NIC. Will fix in the
next day or so. -M

On Wed, 11 Jan 2017 00:27:30 -0800, O. Hartmann wrote:
> Running recent CURRENT (FreeBSD 12.0-CURRENT #5 r311919: Wed Jan 11
> 08:24:28 CET 2017 amd64), the system freezes when doing an rsync over an
> automounted (autofs) NFSv4 filesystem, mounted from another CURRENT server
> (same revision, but with BCM NICs).
> [...]
CURRENT: em0 NIC freezes under heavy I/O on net
Running recent CURRENT (FreeBSD 12.0-CURRENT #5 r311919: Wed Jan 11 08:24:28
CET 2017 amd64), the system freezes when doing an rsync over an automounted
(autofs) NFSv4 filesystem, mounted from another CURRENT server (same
revision, but with BCM NICs).

The host in question is a Fujitsu Celsius M740 equipped with an Intel NIC:

[...]
em0: port 0xf020-0xf03f mem 0xfb30-0xfb31,0xfb339000-0xfb339fff at device 25.0 numa-domain 0 on pci1
em0: attach_pre capping queues at 1
em0: using 1024 tx descriptors and 1024 rx descriptors
em0: msix_init qsets capped at 1
em0: Unable to map MSIX table
em0: Using an MSI interrupt
em0: allocated for 1 tx_queues
em0: allocated for 1 rx_queues
em0: netmap queues/slots: TX 1/1024, RX 1/1024
[...]

The pciconf output reveals:

em0@pci0:0:25:0: class=0x02 card=0x11ed1734 chip=0x153a8086 rev=0x05 hdr=0x00
    vendor   = 'Intel Corporation'
    device   = 'Ethernet Connection I217-LM'
    class    = network
    subclass = ethernet
    bar [10] = type Memory, range 32, base 0xfb30, size 131072, enabled
    bar [14] = type Memory, range 32, base 0xfb339000, size 4096, enabled
    bar [18] = type I/O Port, range 32, base 0xf020, size 32, enabled
    cap 01[c8] = powerspec 2  supports D0 D3  current D0
    cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
    cap 13[e0] = PCI Advanced Features: FLR TP

I have a customized kernel. The NIC has revealed itself all the time as an
"emX" device (never as igbX). The kernel contains device netmap (if
relevant).

The phenomenon:

Syncing a poudriere repository between two remote hosts, I use rsync on an
NFSv4-exported filesystem, mounted via autofs. Until two days ago, this
worked perfectly. Since yesterday, syncing brings down the network
connection - the connection is simply dead. Terminating the rsync and
bringing em0 down and up again doesn't help much: for short moments the
connection is established, but it dies within seconds. Restarting all
network services via "service netif restart" has the same effect: after the
disaster, it is impossible for me to bring the NIC/connection back to
normal; I have to reboot. The same happens under any heavy network load, but
it takes time, and even rsync isn't "deadly" within the same timeframe -
sometimes it takes a couple of seconds, another time only one or two seconds
to make the connection die.

I checked with dd'ing a large file over that connection; it takes several
seconds then to make the connection freeze (so one could reproduce it
without necessarily using rsync).

Kind regards,

oh
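The dd-based reproduction can be sketched as follows. The target directory
is an assumption: in the report it would be a path on the autofs-mounted
NFSv4 filesystem, but it defaults to /tmp here so the commands can be
dry-run locally:

```shell
#!/bin/sh
# Reproduction sketch: generate sustained sequential write traffic, as the
# report does with dd. TARGET would be a directory on the NFSv4/autofs
# mount (assumed); defaulting to /tmp allows a local dry run.
TARGET=${TARGET:-/tmp}

# Write 64 MiB of zeroes; over the affected em0 link this reportedly
# stalls the connection within a few seconds.
dd if=/dev/zero of="$TARGET/em0-test.bin" bs=1048576 count=64 2>/dev/null

# Report how many bytes actually made it across before any stall.
wc -c < "$TARGET/em0-test.bin"
```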