Re: NetBSD 9.0 vs 8.0 network slow

2020-04-13 Thread Jaromír Doleček
Are you in a position to compile a custom kernel? If so, can you please
try disabling MSI-X and confirm that it is indeed the problem?

Can you also please do a pcidump of the device and send it over to me?
The particular thing to watch for is whether the device supports more
than one MSI-X vector while it has only a single MSI vector.
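For reference when reading the pcidump output: per the PCI Local Bus Specification, the MSI capability's Message Control register encodes the number of vectors the function can request in bits 3:1 as a power of two, while the MSI-X Message Control register encodes the table size minus one in bits 10:0. The helpers below are a minimal sketch of that decoding; the function names are hypothetical, and the raw Message Control words have to be copied from the dump by hand.

```c
#include <stdint.h>

/* MSI Message Control, bits 3:1 ("Multiple Message Capable"):
 * log2 of the number of requestable vectors (0 -> 1, ..., 5 -> 32). */
unsigned int
msi_vectors(uint16_t msgctl)
{
	unsigned int mmc = (msgctl >> 1) & 0x7;

	/* Encodings 6 and 7 are reserved by the spec; clamp to 32. */
	if (mmc > 5)
		mmc = 5;
	return 1u << mmc;
}

/* MSI-X Message Control, bits 10:0: table size encoded as N - 1. */
unsigned int
msix_vectors(uint16_t msgctl)
{
	return (unsigned int)(msgctl & 0x07ff) + 1;
}

/* The case to look for: several MSI-X vectors on offer while
 * plain MSI offers only a single vector. */
int
msix_multi_but_msi_single(uint16_t msi_ctl, uint16_t msix_ctl)
{
	return msix_vectors(msix_ctl) > 1 && msi_vectors(msi_ctl) == 1;
}
```

Feed it the Message Control words shown for the MSI and MSI-X capabilities of the bge function in the dump.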

The attached patch makes bge_attach() use only INTx and MSI.

Jaromir

Index: if_bge.c
===================================================================
RCS file: /cvsroot/src/sys/dev/pci/if_bge.c,v
retrieving revision 1.345
diff -u -p -r1.345 if_bge.c
--- if_bge.c	7 Feb 2020 00:04:28 -	1.345
+++ if_bge.c	13 Apr 2020 20:16:51 -
@@ -3579,7 +3579,7 @@ bge_attach(device_t parent, device_t sel
 	int counts[PCI_INTR_TYPE_SIZE] = {
 		[PCI_INTR_TYPE_INTX] = 1,
 		[PCI_INTR_TYPE_MSI] = 1,
-		[PCI_INTR_TYPE_MSIX] = 1,
+		[PCI_INTR_TYPE_MSIX] = 0,
 	};
 	int max_type = PCI_INTR_TYPE_MSIX;
 

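Why the one-line patch is enough: the attach code passes a per-type counts array together with max_type to NetBSD's pci_intr_alloc(), which, as I understand it, tries MSI-X first, then MSI, then INTx, skipping any type whose requested count is zero and falling back if allocation fails. The function below is a simplified, hypothetical model of that fallback order, not the kernel implementation; the enum mirrors the PCI_INTR_TYPE_* ordering, and the supported array stands in for what the device and platform can actually provide.

```c
#include <stdbool.h>

/* Hypothetical mirror of the PCI_INTR_TYPE_* ordering used by the patch. */
enum intr_type { INTR_INTX, INTR_MSI, INTR_MSIX, INTR_TYPE_SIZE };

/*
 * Return the first interrupt type, starting at max_type and walking
 * down towards INTx, that was requested (counts[t] > 0) and that the
 * device supports. Returns -1 if nothing can be allocated.
 */
int
pick_intr_type(const int counts[INTR_TYPE_SIZE],
    const bool supported[INTR_TYPE_SIZE], enum intr_type max_type)
{
	for (int t = (int)max_type; t >= INTR_INTX; t--) {
		if (counts[t] > 0 && supported[t])
			return t;
	}
	return -1;
}
```

With the 9.0 defaults (a count of 1 for every type) a device that offers MSI-X gets MSI-X; with the patched counts (MSI-X set to 0) the same call falls back to MSI, which is what the 8.x dmesg shows ("interrupting at msi2").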

Re: NetBSD 9.0 vs 8.0 network slow

2020-04-13 Thread Dima Veselov

Greetings,

Is there any chance to debug and fix the MSI-X problem, or even to go
back to MSI?

It seems the only difference in the bge interface between 9.0 and 8.0
is MSI-X vs. MSI, and it is making Dell R220 servers totally unusable
with 9.0, because sometimes it takes seconds for a packet to leave the
interface.

--
Dima Veselov
Physics R Establishment of Saint-Petersburg University


Re: NetBSD 9.0 vs 8.0 network slow

2020-04-07 Thread Dima Veselov
On Tue, Mar 03, 2020 at 10:07:00AM +0100, Martin Husemann wrote:
> >
> > It would be nice if someone could point out what to debug,
> > and where, to find the cause.
> 
> There are multiple possibilities, could you please double check the
> ifconfig output on both -8 and -9 and verify that the negotiated media types
> are identical? Same for offload features?

The only difference: 

ec_enabled=2 on 9.0
ec_enabled=3 on 8.1

Full output (9.0):

bge0: flags=0x8843 mtu 1500
capabilities=3f80
capabilities=3f80
enabled=0
ec_capabilities=7
ec_enabled=2
address: 20:47:47:87:3f:20
media: Ethernet autoselect (1000baseT full-duplex)
status: active

I think VLAN_MTU is irrelevant; it is enabled on 8.1 because VLANs are in use.
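The ec_capabilities/ec_enabled values are bitmasks. Assuming the bit assignments in NetBSD's <net/if_ether.h> (ETHERCAP_VLAN_MTU = 0x1, ETHERCAP_VLAN_HWTAGGING = 0x2, ETHERCAP_JUMBO_MTU = 0x4; worth double-checking against your source tree), 3 on 8.1 is VLAN_MTU|VLAN_HWTAGGING and 2 on 9.0 is VLAN_HWTAGGING alone, so the difference is exactly VLAN_MTU. A minimal decoding sketch; the CAP_* names are local stand-ins, not the kernel headers.

```c
#include <string.h>

/* Local stand-ins; assumed to match ETHERCAP_* in <net/if_ether.h>. */
#define CAP_VLAN_MTU        0x1u
#define CAP_VLAN_HWTAGGING  0x2u
#define CAP_JUMBO_MTU       0x4u

/* Write the names of the bits set in mask into buf (len > 0),
 * comma-separated, truncating if buf is too small. */
void
ethercap_names(unsigned int mask, char *buf, size_t len)
{
	static const struct { unsigned int bit; const char *name; } tab[] = {
		{ CAP_VLAN_MTU, "VLAN_MTU" },
		{ CAP_VLAN_HWTAGGING, "VLAN_HWTAGGING" },
		{ CAP_JUMBO_MTU, "JUMBO_MTU" },
	};

	buf[0] = '\0';
	for (size_t i = 0; i < sizeof(tab) / sizeof(tab[0]); i++) {
		if ((mask & tab[i].bit) == 0)
			continue;
		if (buf[0] != '\0')
			strncat(buf, ",", len - strlen(buf) - 1);
		strncat(buf, tab[i].name, len - strlen(buf) - 1);
	}
}
```

Calling it on 3 & ~2 (bits enabled on 8.1 but not on 9.0) yields "VLAN_MTU".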

> Another possible issue is a switch from single interrupt to multiple msi
> (and bugs in that area). If you check your -8 and -9 dmesg you should see
> details printed about interrupts routed to the bge interfaces (not sure
> if you need to boot -v for that nowadays).

There is no difference, except that 8.1 finds pic msi1 while 9.0 finds
pic msix1 here:

[ 1.033124] allocated pic msix1 type edge pin 0 level 6 to cpu0 slot 17 idt entry 99

All other information is identical, letter for letter. In case I have
missed something, I've placed all the output here: http://kab00m.ru/temp/almaz.tgz

-- 
Sincerely yours,
Dima Veselov
Physics R Establishment of Saint-Petersburg University


Re: NetBSD 9.0 vs 8.0 network slow

2020-03-03 Thread Martin Husemann
On Tue, Mar 03, 2020 at 12:32:01AM +0300, Dima Veselov wrote:
> It would be nice if someone could point out what to debug,
> and where, to find the cause.

There are multiple possibilities, could you please double check the
ifconfig output on both -8 and -9 and verify that the negotiated media types
are identical? Same for offload features?

Another possible issue is a switch from single interrupt to multiple msi
(and bugs in that area). If you check your -8 and -9 dmesg you should see
details printed about interrupts routed to the bge interfaces (not sure
if you need to boot -v for that nowadays).

Martin
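To compare interrupt routing between a -8 and a -9 dmesg, the lines to look at are the driver's "interrupting at ..." messages, e.g. "bge1: interrupting at msix2 vec 0" on 9.0 versus "bge1: interrupting at msi2 vec 0" on 8.x in the boot logs quoted in this thread. A minimal sketch, assuming that message format (the enum and function names are hypothetical), that classifies such a line by interrupt type:

```c
#include <string.h>

enum dmesg_intr {
	DMESG_INTR_NONE,	/* not an "interrupting at" line */
	DMESG_INTR_MSIX,
	DMESG_INTR_MSI,
	DMESG_INTR_OTHER	/* e.g. a legacy INTx/ioapic source */
};

/* Classify a dmesg line containing "interrupting at <source>"
 * by the name of the interrupt source. */
enum dmesg_intr
classify_intr_line(const char *line)
{
	static const char key[] = "interrupting at ";
	const char *p = strstr(line, key);

	if (p == NULL)
		return DMESG_INTR_NONE;
	p += sizeof(key) - 1;
	/* Check "msix" before "msi", since "msi" is a prefix of "msix". */
	if (strncmp(p, "msix", 4) == 0)
		return DMESG_INTR_MSIX;
	if (strncmp(p, "msi", 3) == 0)
		return DMESG_INTR_MSI;
	return DMESG_INTR_OTHER;
}
```

Because it uses strstr(), timestamped lines such as "[ 1,008595] bge1: interrupting at msix2 vec 0" are handled as well.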


NetBSD 9.0 vs 8.0 network slow

2020-03-02 Thread Dima Veselov

Greetings,

Having two similar servers, I tried updating one of them
to be the first one running NetBSD 9. It seems to work fine,
but sometimes it freezes, usually in network services.
These short freezes are disruptive enough that I had to
remove the server from production use.

The hardware is identical (maybe slightly different firmware),
and both have been proven to work very fast on NetBSD
8-STABLE.

It would be nice if someone could point out what to debug,
and where, to find the cause.

N9 ethernet:
[ 1,008595] bge1 at pci2 dev 0 function 1: Broadcom BCM5720 Gigabit Ethernet
[ 1,008595] bge1: APE firmware NCSI 1.5.12.0
[ 1,008595] bge1: interrupting at msix2 vec 0
[ 1,008595] bge1: HW config 002b1194, 6014, 4000aa08,
[ 1,008595] bge1: ASIC BCM5720 A0 (0x572), Ethernet address 20:47:47:87:3f:22
[ 1,008595] bge1: setting short Tx thresholds
[ 1,008595] brgphy1 at bge1 phy 2: BCM5720C 1000BASE-T media interface, rev. 0


N8 ethernet:
bge1 at pci2 dev 0 function 1: Broadcom BCM5720 Gigabit Ethernet
bge1: APE firmware NCSI 1.3.7.0
bge1: interrupting at msi2 vec 0
bge1: HW config 002b1194, 6014, 4000aa08,  
bge1: ASIC BCM5720 A0 (0x572), Ethernet address 20:47:47:8f:00:02
bge1: setting short Tx thresholds
brgphy1 at bge1 phy 2: BCM5720C 1000BASE-T media interface, rev. 0

Both sysctl and kernel parameters are at their defaults.

Both interfaces have VLANs, one on each, and both interfaces
on N9 are affected.

Both servers hit port-amd64/47016 and port-amd64/53687, but
on NetBSD 8 this causes no noticeable performance issues.

Symptoms I have noticed:
1. named was awfully slow; it could take seconds to answer
for a local zone.

2. A host request (without named) usually takes 0.2 s, but one
in ten takes up to 10 s. tcpdump shows that the request does
not go out on the interface for all that time.

3. lighttpd on N8 does the same work but always takes 0% CPU.
The same work on N9 can make it take up to 30% CPU, always
in system time and almost always in kqueue. Any other
process using the network shows the same; for example,
find over an NFS directory took 40% system CPU time in kqueue.

4. Testing network speed with /dev/zero -> netcat -> netcat -> /dev/null
shows 7-8 Mb/s between N8 and N9, and more than 70 Mb/s
between N8 and anything else.

5. A continuous ping from N9 to N8 shows much more variation
around 1 ms, often rising to 10 ms and once even to 300 ms,
while a ping from N8 to N9 is much more stable, like pinging
any other host.

6. Other NetBSD 9 systems show good performance on the same
network.
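To put numbers on item 5 and compare the N9 -> N8 and N8 -> N9 directions, the RTT samples can be summarised as min/avg/max plus a count of samples above a chosen threshold (for instance 10 ms). A minimal sketch; the struct and function names are hypothetical, and the threshold is the caller's choice.

```c
#include <stddef.h>

struct rtt_summary {
	double min, avg, max;	/* milliseconds */
	size_t over;		/* samples strictly above the threshold */
};

/* Summarise n > 0 round-trip times given in milliseconds. */
struct rtt_summary
rtt_summarise(const double *rtt, size_t n, double threshold_ms)
{
	struct rtt_summary s = { rtt[0], 0.0, rtt[0], 0 };
	double sum = 0.0;

	for (size_t i = 0; i < n; i++) {
		if (rtt[i] < s.min)
			s.min = rtt[i];
		if (rtt[i] > s.max)
			s.max = rtt[i];
		if (rtt[i] > threshold_ms)
			s.over++;
		sum += rtt[i];
	}
	s.avg = sum / (double)n;
	return s;
}
```

Running it on the times from both directions makes the asymmetry visible as a difference in max and in the outlier count rather than only in eyeballed ping output.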

For now I have reverted the OS and confirmed that
NetBSD 8 does not cause any of the above.

Thanks in advance.

--
Dima Veselov
Physics R Establishment of Saint-Petersburg University