Re: NetBSD 9.0 vs 8.0 network slow
Are you in a position to compile a custom kernel? If so, can you please try
disabling MSI-X and confirm that it is indeed the problem?

Can you also please do pcidump for the device and send it over to me? The
particular thing to watch for is whether the device supports more than one
MSI-X vector while it has only a single MSI vector.

The attached patch makes the attach use only INTx and MSI.

Jaromir

On Mon, Apr 13, 2020 at 21:46, Dima Veselov wrote:
>
> Greetings,
>
> is there any chance to debug and fix msix problem or even bring it
> back to msi?
>
> It seems bge interface on 9.0 have only difference with 8.0 by having
> msix vs msi and it is making Dell R220 servers totally unusable with
> 9.0 because sometimes it takes seconds for a packet to leave interface.
>
> 07.04.2020 17:05, Dima Veselov wrote:
> >> Another possible issue is a switch from single interrupt to multiple msi
> >> (and bugs in that area). If you check your -8 and -9 dmesg you should see
> >> details printed about interrupts routed to the bge interfaces (not sure
> >> if you need to boot -v for that nowadays).
> >
> > There is no any difference except 8.1 find pic msi1 but 9.0 find pic msix1
> > here:
> >
> > [ 1.033124] allocated pic msix1 type edge pin 0 level 6 to cpu0 slot 17
> > idt entry 99
> >
> > all other information is letter for letter. In a case of missing something
> > I've placed all output here: http://kab00m.ru/temp/almaz.tgz
>
> --
> Dima Veselov
> Physics R&D Establishment of Saint-Petersburg University

Index: if_bge.c
===
RCS file: /cvsroot/src/sys/dev/pci/if_bge.c,v
retrieving revision 1.345
diff -u -p -r1.345 if_bge.c
--- if_bge.c	7 Feb 2020 00:04:28 - 1.345
+++ if_bge.c	13 Apr 2020 20:16:51 -
@@ -3579,7 +3579,7 @@ bge_attach(device_t parent, device_t sel
 	int counts[PCI_INTR_TYPE_SIZE] = {
 		[PCI_INTR_TYPE_INTX] = 1,
 		[PCI_INTR_TYPE_MSI] = 1,
-		[PCI_INTR_TYPE_MSIX] = 1,
+		[PCI_INTR_TYPE_MSIX] = 0,
 	};
 	int max_type = PCI_INTR_TYPE_MSIX;
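For readers following along, the effect of setting the MSI-X count to 0 can be sketched in userland. This is only a simplified model of the fallback order (MSI-X, then MSI, then INTx), not the kernel's allocation code; the enum and the function pick_intr_type() are hypothetical names made up for this illustration, and the real allocator may behave differently in details such as partial MSI-X allocations.

```c
#include <assert.h>

/* Local stand-ins for the kernel's PCI_INTR_TYPE_* constants (hypothetical). */
enum intr_type {
	INTR_INTX = 0,
	INTR_MSI = 1,
	INTR_MSIX = 2,
	INTR_TYPE_SIZE = 3
};

/*
 * Simplified model of interrupt type selection.
 * requested[t]: number of vectors the driver asks for (0 = never use type t).
 * available[t]: number of vectors the device/platform can provide.
 * Types are tried from max_type downwards: MSI-X, then MSI, then INTx.
 * Returns the chosen type, or -1 if nothing could be allocated.
 */
int
pick_intr_type(const int requested[INTR_TYPE_SIZE],
    const int available[INTR_TYPE_SIZE], int max_type)
{
	assert(max_type >= INTR_INTX && max_type < INTR_TYPE_SIZE);

	for (int t = max_type; t >= INTR_INTX; t--) {
		/* A count of 0 means the driver opted out of this type. */
		if (requested[t] > 0 && available[t] >= requested[t])
			return t;
	}
	return -1;
}
```

With requested counts {1, 1, 1} (the unpatched driver) and a device offering all three types, the model picks MSI-X; with {1, 1, 0} (the patched driver) it falls back to MSI, which is the behaviour NetBSD 8 showed in the dmesg output quoted in this thread.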
Re: NetBSD 9.0 vs 8.0 network slow
Greetings,

Is there any chance to debug and fix the MSI-X problem, or even to switch
it back to MSI?

It seems the only difference in the bge interface between 9.0 and 8.0 is
the use of MSI-X instead of MSI, and this makes Dell R220 servers totally
unusable with 9.0, because it sometimes takes seconds for a packet to leave
the interface.

07.04.2020 17:05, Dima Veselov wrote:
>> Another possible issue is a switch from single interrupt to multiple msi
>> (and bugs in that area). If you check your -8 and -9 dmesg you should see
>> details printed about interrupts routed to the bge interfaces (not sure
>> if you need to boot -v for that nowadays).
>
> There is no any difference except 8.1 find pic msi1 but 9.0 find pic msix1
> here:
>
> [ 1.033124] allocated pic msix1 type edge pin 0 level 6 to cpu0 slot 17
> idt entry 99
>
> all other information is letter for letter. In a case of missing something
> I've placed all output here: http://kab00m.ru/temp/almaz.tgz

--
Dima Veselov
Physics R&D Establishment of Saint-Petersburg University
Re: NetBSD 9.0 vs 8.0 network slow
On Tue, Mar 03, 2020 at 10:07:00AM +0100, Martin Husemann wrote:
> > Would be nice if someone can point me out what and
> > where I may debug to find the cause.
>
> There are multiple possibilities, could you please double check the
> ifconfig output on both -8 and -9 and verify that the negotiated media types
> are identical? Same for offload features?

The only difference:

    ec_enabled=2 on 9.0
    ec_enabled=3 on 8.1

Full output:

    bge0: flags=0x8843 mtu 1500
            capabilities=3f80
            capabilities=3f80
            enabled=0
            ec_capabilities=7
            ec_enabled=2
            address: 20:47:47:87:3f:20
            media: Ethernet autoselect (1000baseT full-duplex)
            status: active

I think VLAN_MTU is irrelevant; it is enabled on 8.1 because VLANs are in
use there.

> Another possible issue is a switch from single interrupt to multiple msi
> (and bugs in that area). If you check your -8 and -9 dmesg you should see
> details printed about interrupts routed to the bge interfaces (not sure
> if you need to boot -v for that nowadays).

There is no difference, except that 8.1 finds pic msi1 while 9.0 finds pic
msix1 here:

    [ 1.033124] allocated pic msix1 type edge pin 0 level 6 to cpu0 slot 17 idt entry 99

All other information is identical, letter for letter. In case I have
missed something, I've placed all the output here:
http://kab00m.ru/temp/almaz.tgz

--
Sincerely yours,
Dima Veselov
Physics R&D Establishment of Saint-Petersburg University
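As a side note on reading the ec_enabled values above: the following is a minimal sketch that decodes the low three bits into names. The bit values (0x1 VLAN_MTU, 0x2 VLAN_HWTAGGING, 0x4 JUMBO_MTU) are an assumption based on my recollection of the ETHERCAP_* flags in NetBSD's sys/net/if_ether.h and should be checked against the header for the release in question; the CAP_* macros and ethercap_names() are hypothetical names used only here.

```c
#include <stdio.h>
#include <string.h>

/* Assumed bit values, to be verified against sys/net/if_ether.h. */
#define CAP_VLAN_MTU		0x1u
#define CAP_VLAN_HWTAGGING	0x2u
#define CAP_JUMBO_MTU		0x4u

/*
 * Write a comma-separated list of the capability names set in caps
 * into buf (at most len bytes including the terminating NUL).
 * Returns buf.
 */
const char *
ethercap_names(unsigned caps, char *buf, size_t len)
{
	static const struct {
		unsigned bit;
		const char *name;
	} tab[] = {
		{ CAP_VLAN_MTU, "VLAN_MTU" },
		{ CAP_VLAN_HWTAGGING, "VLAN_HWTAGGING" },
		{ CAP_JUMBO_MTU, "JUMBO_MTU" },
	};

	if (len == 0)
		return buf;
	buf[0] = '\0';
	for (size_t i = 0; i < sizeof(tab) / sizeof(tab[0]); i++) {
		if ((caps & tab[i].bit) == 0)
			continue;
		/* Separate names with a comma after the first one. */
		if (buf[0] != '\0')
			strncat(buf, ",", len - strlen(buf) - 1);
		strncat(buf, tab[i].name, len - strlen(buf) - 1);
	}
	return buf;
}
```

Under these assumed values, ec_enabled=3 decodes to VLAN_MTU plus VLAN_HWTAGGING and ec_enabled=2 to VLAN_HWTAGGING alone, so the two systems would differ only in the VLAN_MTU bit (2 XOR 3 = 1), which matches the remark about VLAN_MTU above.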
Re: NetBSD 9.0 vs 8.0 network slow
On Tue, Mar 03, 2020 at 12:32:01AM +0300, Dima Veselov wrote:
> Would be nice if someone can point me out what and
> where I may debug to find the cause.

There are multiple possibilities. Could you please double-check the
ifconfig output on both -8 and -9 and verify that the negotiated media
types are identical? Same for the offload features?

Another possible issue is the switch from a single interrupt to multiple
MSI (and bugs in that area). If you check your -8 and -9 dmesg you should
see details printed about the interrupts routed to the bge interfaces (not
sure if you need to boot -v for that nowadays).

Martin
NetBSD 9.0 vs 8.0 network slow
Greetings,

I have two similar servers and tried to update one of them to be the first
one running NetBSD 9. It seems to work fine, but sometimes it freezes,
usually in network services. These short freezes really mess things up, so
I had to remove the server from production use.

The hardware is identical (maybe slightly different firmware), and both
machines proved to work very fast on NetBSD 8-STABLE. It would be nice if
someone could point out what I could debug, and where, to find the cause.

N9 ethernet:

    [ 1,008595] bge1 at pci2 dev 0 function 1: Broadcom BCM5720 Gigabit Ethernet
    [ 1,008595] bge1: APE firmware NCSI 1.5.12.0
    [ 1,008595] bge1: interrupting at msix2 vec 0
    [ 1,008595] bge1: HW config 002b1194, 6014, 4000aa08,
    [ 1,008595] bge1: ASIC BCM5720 A0 (0x572), Ethernet address 20:47:47:87:3f:22
    [ 1,008595] bge1: setting short Tx thresholds
    [ 1,008595] brgphy1 at bge1 phy 2: BCM5720C 1000BASE-T media interface, rev. 0

N8 ethernet:

    bge1 at pci2 dev 0 function 1: Broadcom BCM5720 Gigabit Ethernet
    bge1: APE firmware NCSI 1.3.7.0
    bge1: interrupting at msi2 vec 0
    bge1: HW config 002b1194, 6014, 4000aa08,
    bge1: ASIC BCM5720 A0 (0x572), Ethernet address 20:47:47:8f:00:02
    bge1: setting short Tx thresholds
    brgphy1 at bge1 phy 2: BCM5720C 1000BASE-T media interface, rev. 0

Both sysctl and kernel parameters are at their defaults. Both interfaces
have VLANs, one on each, and both interfaces on N9 are affected.

Both servers hit port-amd64/47016 and port-amd64/53687, but on NetBSD 8
this causes no noticeable performance issues.

Symptoms I have noticed:

1. named was awfully slow and could take seconds to answer for a local
   zone.
2. A host request (without named) usually takes 0.2 sec, but one in ten
   takes up to 10 sec. tcpdump shows that the request does not go out on
   the interface for all that time.
3. lighttpd on N8 does the same work and always takes 0% CPU. The same
   work on N9 can make it take up to 30% CPU, always in the system part
   and almost always in kqueue. Any other process using the network shows
   the same; for example, find over an NFS directory took 40% system CPU
   time in kqueue.
4. Testing network speed with /dev/zero->netcat->netcat->/dev/null shows
   7-8 Mb/s between N8 and N9 and more than 70 Mb/s between N8 and
   anything else.
5. A constant ping from N9 to N8 has much more varying times around 1 ms,
   often rising to 10 ms and once even to 300 ms, but ping from N8 to N9
   is much more stable, like pinging any other host.
6. Other NetBSD-9 systems show good performance on the same network.

For now I have reverted the OS and confirmed that NetBSD 8 does not cause
any of the above.

Thanks in advance.

--
Dima Veselov
Physics R&D Establishment of Saint-Petersburg University
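The one visible difference between the two dmesg excerpts above is the "interrupting at" line (msix2 on N9, msi2 on N8). A minimal sketch for pulling that field out of saved dmesg output, so that the two releases can be compared mechanically, might look like this. The function name parse_intr_line() and the fixed 16-byte buffers are choices made for this illustration only, and the line format is assumed to match the excerpts shown in this message.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse a dmesg line such as
 *   "[ 1,008595] bge1: interrupting at msix2 vec 0"
 * or, without a timestamp,
 *   "bge1: interrupting at msi2 vec 0"
 * On success, store the device name ("bge1") in dev and the interrupt
 * source ("msix2" or "msi2") in pic, and return 1. Return 0 for any
 * line that is not an "interrupting at" line.
 */
int
parse_intr_line(const char *line, char dev[16], char pic[16])
{
	const char *p = line;

	/* Skip an optional "[ timestamp] " prefix. */
	if (*p == '[') {
		p = strchr(p, ']');
		if (p == NULL)
			return 0;
		p++;
		while (*p == ' ')
			p++;
	}
	/* %15[^:] reads the device name up to, but not including, ':'. */
	return sscanf(p, "%15[^:]: interrupting at %15s", dev, pic) == 2;
}
```

A caller could then test the result with strncmp(pic, "msix", 4) == 0 to flag interfaces that attached with MSI-X.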