Re: em(4) multiqueue

2023-04-25 Thread Jan Klemkow
On Fri, Apr 14, 2023 at 10:26:14AM +0800, Kevin Lo wrote:
> On Thu, Apr 13, 2023 at 01:30:36PM -0500, Brian Conway wrote:
> > Reviving this thread, apologies for discontinuity in mail readers: 
> > https://marc.info/?t=16564219358
> > 
> > After rebasing on 7.3, my results have mirrored Hrvoje's testing at
> > the end of that thread. No issues with throughput, unusual latency,
> > or reliability. `vmstat -i` shows some level of balancing between
> > the queues. I've been testing on as many em(4) systems as I have
> access to, some manually, some in packet forwarder/firewall
> > scenarios:
> 
> Last time I tested (about a year ago) on I211, rx locked up if I tried 
> something
> like iperf3 or tcpbench.  Don't know if you have a similar problem.

I rebased the rest to current and tested it with tcpbench between the
following interfaces:

em0 at pci7 dev 0 function 0 "Intel 82580" rev 0x01, msix, 4 queues, address 
90:e2:ba:df:d5:2c
em0 at pci5 dev 0 function 0 "Intel I350" rev 0x01, msix, 8 queues, address 
00:25:90:eb:b3:c2

After a second the connection got stuck.  As far as I can see, the
sending side has a problem.

ot45# tcpbench 192.168.99.3
  elapsed_ms  bytes mbps   bwidth
1012   14574120  115.210  100.00%
Conn:   1 Mbps:  115.210 Peak Mbps:  115.210 Avg Mbps:  115.210
        2022          0    0.000     -nan%
...

ot46# tcpbench -s
  elapsed_ms  bytes mbps   bwidth
1017   14313480  112.594  100.00%
Conn:   1 Mbps:  112.594 Peak Mbps:  112.594 Avg Mbps:  112.594
        2027          0    0.000     -nan%
...

ot45# netstat  -nf inet -p tcp
Active Internet connections
Proto   Recv-Q Send-Q  Local Address          Foreign Address        TCP-State
tcp          0 260640  192.168.99.1.18530     192.168.99.3.12345     CLOSING

When I retried it, it sometimes worked and most times did not.

kstat tells me that transmit queues 1 to 3 are oactive and only queue 0
works:

em0:0:txq:0
 packets: 4042648 packets
   bytes: 5310138322 bytes
  qdrops: 9 packets
  errors: 0 packets
qlen: 0 packets
 maxqlen: 511 packets
 oactive: false
em0:0:txq:1
 packets: 9812 packets
   bytes: 14846716 bytes
  qdrops: 0 packets
  errors: 0 packets
qlen: 184 packets
 maxqlen: 511 packets
 oactive: true
em0:0:txq:2
 packets: 690362 packets
   bytes: 60011484 bytes
  qdrops: 0 packets
  errors: 0 packets
qlen: 185 packets
 maxqlen: 511 packets
 oactive: true
em0:0:txq:3
 packets: 443181 packets
   bytes: 43829886 bytes
  qdrops: 0 packets
  errors: 0 packets
qlen: 198 packets
 maxqlen: 511 packets
 oactive: true
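
(The per-queue counters above are kstat(1) output, gathered with something
like "kstat em0:0:txq:0" for each queue.)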

This is the rebased diff against -current that I tested:

Index: dev/pci/files.pci
===================================================================
RCS file: /cvs/src/sys/dev/pci/files.pci,v
retrieving revision 1.361
diff -u -p -r1.361 files.pci
--- dev/pci/files.pci   23 Apr 2023 00:20:26 -0000      1.361
+++ dev/pci/files.pci   25 Apr 2023 11:25:47 -0000
@@ -334,7 +334,7 @@ attach  fxp at pci with fxp_pci
 file   dev/pci/if_fxp_pci.cfxp_pci
 
 # Intel Pro/1000
-device em: ether, ifnet, ifmedia
+device em: ether, ifnet, ifmedia, intrmap, stoeplitz
 attach em at pci
 file   dev/pci/if_em.c em
 file   dev/pci/if_em_hw.c  em
Index: dev/pci/if_em.c
===================================================================
RCS file: /cvs/src/sys/dev/pci/if_em.c,v
retrieving revision 1.365
diff -u -p -r1.365 if_em.c
--- dev/pci/if_em.c 9 Feb 2023 21:21:27 -0000   1.365
+++ dev/pci/if_em.c 25 Apr 2023 11:25:47 -0000
@@ -247,6 +247,7 @@ int  em_intr(void *);
 int  em_allocate_legacy(struct em_softc *);
 void em_start(struct ifqueue *);
 int  em_ioctl(struct ifnet *, u_long, caddr_t);
+int  em_rxrinfo(struct em_softc *, struct if_rxrinfo *);
 void em_watchdog(struct ifnet *);
 void em_init(void *);
 void em_stop(void *, int);
@@ -309,8 +310,10 @@ int  em_setup_queues_msix(struct em_soft
 int  em_queue_intr_msix(void *);
 int  em_link_intr_msix(void *);
 void em_enable_queue_intr_msix(struct em_queue *);
+void em_setup_rss(struct em_softc *);
 #else
 #define em_allocate_msix(_sc)  (-1)
+#define em_setup_rss(_sc)  0
 #endif
 
 #if NKSTAT > 0
@@ -333,7 +336,6 @@ struct cfdriver em_cd = {
 };
 
 static int em_smart_pwr_down = FALSE;
-int em_enable_msix = 0;
 
 /*
  *  Device identification routine
@@ -629,12 +631,12 @@ err_pci:
 void
 em_start(struct ifqueue *ifq)
 {
+   struct em_queue *que = ifq->ifq_softc;
struct ifnet *ifp = ifq->ifq_if;
struct em_softc *sc = ifp->if_softc;
u_int head, free, used;
struct mbuf *m;
int post = 0;
-   struct 

Re: em(4) multiqueue

2023-04-13 Thread Kevin Lo
On Thu, Apr 13, 2023 at 01:30:36PM -0500, Brian Conway wrote:
> 
> Reviving this thread, apologies for discontinuity in mail readers: 
> https://marc.info/?t=16564219358
> 
> After rebasing on 7.3, my results have mirrored Hrvoje's testing at the end 
> of that thread. No issues with throughput, unusual latency, or reliability. 
> `vmstat -i` shows some level of balancing between the queues. I've been 
> testing on as many em(4) systems as I have access to, some manually, some in 
> packet forwarder/firewall scenarios:
> 
> em0 at pci1 dev 0 function 0 "Intel I210" rev 0x03, msix, 2 queues, address 
> 00:f1:f3:...
> em1 at pci2 dev 0 function 0 "Intel I210" rev 0x03, msix, 2 queues, address 
> 00:f1:f3:...
> em2 at pci3 dev 0 function 0 "Intel I210" rev 0x03, msix, 2 queues, address 
> 00:f1:f3:...
> em3 at pci4 dev 0 function 0 "Intel I210" rev 0x03, msix, 2 queues, address 
> 00:f1:f3:...
> em4 at pci5 dev 0 function 0 "Intel I210" rev 0x03, msix, 2 queues, address 
> 00:f1:f3:...
> em5 at pci6 dev 0 function 0 "Intel I210" rev 0x03, msix, 2 queues, address 
> 00:f1:f3:...
> 
> em0 at pci1 dev 0 function 0 "Intel I210" rev 0x03, msix, 4 queues, address 
> 00:0d:b9:...
> em1 at pci2 dev 0 function 0 "Intel I210" rev 0x03, msix, 4 queues, address 
> 00:0d:b9:...
> em2 at pci3 dev 0 function 0 "Intel I210" rev 0x03, msix, 4 queues, address 
> 00:0d:b9:...
> 
> em0 at pci1 dev 0 function 0 "Intel I211" rev 0x03, msix, 2 queues, address 
> 00:0d:b9:...
> em1 at pci2 dev 0 function 0 "Intel I211" rev 0x03, msix, 2 queues, address 
> 00:0d:b9:...
> em2 at pci3 dev 0 function 0 "Intel I211" rev 0x03, msix, 2 queues, address 
> 00:0d:b9:...

Last time I tested (about a year ago) on I211, rx locked up if I tried something
like iperf3 or tcpbench.  Don't know if you have a similar problem.

> em0 at pci1 dev 0 function 0 "Intel 82574L" rev 0x00: msi, address 
> 68:05:ca:...
> 
> The only questions I have are around queue identification. All the specs I've 
> been able to find indicate the I210 should have 4 queues; did Intel make a 
> cheaper version with 2 toward the end of production? Or could it be an I211 
> masquerading as an I210 (and would that be bad for the driver)?
> 
> Also, 
> https://www.mouser.com/pdfdocs/Intel_82574L_82574IT_GbE_Controller_brief.pdf 
> indicates that the 82574L should have 2 queues?
> 
> Anyway, great work, please let me know if there's more I can do to help this 
> move forward.
> 
> Brian Conway
> Lead Software Engineer, Owner
> RCE Software, LLC
> 



Re: em(4) multiqueue

2023-04-13 Thread Stuart Henderson
On 2023/04/13 16:45, Sonic wrote:
> Is this multiqueue support in 7.3 or does it require patching?
> According to Intel the i211 should have 2 queues but I see no msi-x
> support in dmesg:
> em0 at pci1 dev 0 function 0 "Intel I211" rev 0x03: msi, address

It is not committed, there's a diff.



Re: em(4) multiqueue

2023-04-13 Thread Sonic
Is this multiqueue support in 7.3 or does it require patching?
According to Intel the i211 should have 2 queues but I see no msi-x
support in dmesg:
em0 at pci1 dev 0 function 0 "Intel I211" rev 0x03: msi, address

Thanks.
Chris



Re: em(4) multiqueue

2023-04-13 Thread Brian Conway
On Thu, Apr 13, 2023, at 2:45 PM, Stuart Henderson wrote:
> On 2023/04/13 13:30, Brian Conway wrote:
>> Reviving this thread, apologies for discontinuity in mail readers: 
>> https://marc.info/?t=16564219358
>> 
>> After rebasing on 7.3, my results have mirrored Hrvoje's testing at the end 
>> of that thread. No issues with throughput, unusual latency, or reliability. 
>> `vmstat -i` shows some level of balancing between the queues. I've been 
>> testing on as many em(4) systems as I have access to, some manually, some in 
>> packet forwarder/firewall scenarios:
>> 
>> em0 at pci1 dev 0 function 0 "Intel I210" rev 0x03, msix, 2 queues, address 
>> 00:f1:f3:...
>> em1 at pci2 dev 0 function 0 "Intel I210" rev 0x03, msix, 2 queues, address 
>> 00:f1:f3:...
>> em2 at pci3 dev 0 function 0 "Intel I210" rev 0x03, msix, 2 queues, address 
>> 00:f1:f3:...
>> em3 at pci4 dev 0 function 0 "Intel I210" rev 0x03, msix, 2 queues, address 
>> 00:f1:f3:...
>> em4 at pci5 dev 0 function 0 "Intel I210" rev 0x03, msix, 2 queues, address 
>> 00:f1:f3:...
>> em5 at pci6 dev 0 function 0 "Intel I210" rev 0x03, msix, 2 queues, address 
>> 00:f1:f3:...
>> 
>> em0 at pci1 dev 0 function 0 "Intel I210" rev 0x03, msix, 4 queues, address 
>> 00:0d:b9:...
>> em1 at pci2 dev 0 function 0 "Intel I210" rev 0x03, msix, 4 queues, address 
>> 00:0d:b9:...
>> em2 at pci3 dev 0 function 0 "Intel I210" rev 0x03, msix, 4 queues, address 
>> 00:0d:b9:...
>> 
>> em0 at pci1 dev 0 function 0 "Intel I211" rev 0x03, msix, 2 queues, address 
>> 00:0d:b9:...
>> em1 at pci2 dev 0 function 0 "Intel I211" rev 0x03, msix, 2 queues, address 
>> 00:0d:b9:...
>> em2 at pci3 dev 0 function 0 "Intel I211" rev 0x03, msix, 2 queues, address 
>> 00:0d:b9:...
>> 
>> em0 at pci1 dev 0 function 0 "Intel 82574L" rev 0x00: msi, address 
>> 68:05:ca:...
>> 
>> The only questions I have are around queue identification. All the specs 
>> I've been able to find indicate the I210 should have 4 queues; did Intel 
>> make a cheaper version with 2 toward the end of production? Or could it be 
>> an I211 masquerading as an I210 (and would that be bad for the driver)?
>
> Is it a 2-cpu machine?

Ah, you're right. The level of detail I provided was insufficient.

>> Also, 
>> https://www.mouser.com/pdfdocs/Intel_82574L_82574IT_GbE_Controller_brief.pdf 
>> indicates that the 82574L should have 2 queues?
>
> No msix in your dmesg excerpt for that one

I'll lug that one back out and take a look. Probably safe to assume a 
misunderstanding on my part. Thanks.

-b



Re: em(4) multiqueue

2023-04-13 Thread Stuart Henderson
On 2023/04/13 13:30, Brian Conway wrote:
> Reviving this thread, apologies for discontinuity in mail readers: 
> https://marc.info/?t=16564219358
> 
> After rebasing on 7.3, my results have mirrored Hrvoje's testing at the end 
> of that thread. No issues with throughput, unusual latency, or reliability. 
> `vmstat -i` shows some level of balancing between the queues. I've been 
> testing on as many em(4) systems as I have access to, some manually, some in 
> packet forwarder/firewall scenarios:
> 
> em0 at pci1 dev 0 function 0 "Intel I210" rev 0x03, msix, 2 queues, address 
> 00:f1:f3:...
> em1 at pci2 dev 0 function 0 "Intel I210" rev 0x03, msix, 2 queues, address 
> 00:f1:f3:...
> em2 at pci3 dev 0 function 0 "Intel I210" rev 0x03, msix, 2 queues, address 
> 00:f1:f3:...
> em3 at pci4 dev 0 function 0 "Intel I210" rev 0x03, msix, 2 queues, address 
> 00:f1:f3:...
> em4 at pci5 dev 0 function 0 "Intel I210" rev 0x03, msix, 2 queues, address 
> 00:f1:f3:...
> em5 at pci6 dev 0 function 0 "Intel I210" rev 0x03, msix, 2 queues, address 
> 00:f1:f3:...
> 
> em0 at pci1 dev 0 function 0 "Intel I210" rev 0x03, msix, 4 queues, address 
> 00:0d:b9:...
> em1 at pci2 dev 0 function 0 "Intel I210" rev 0x03, msix, 4 queues, address 
> 00:0d:b9:...
> em2 at pci3 dev 0 function 0 "Intel I210" rev 0x03, msix, 4 queues, address 
> 00:0d:b9:...
> 
> em0 at pci1 dev 0 function 0 "Intel I211" rev 0x03, msix, 2 queues, address 
> 00:0d:b9:...
> em1 at pci2 dev 0 function 0 "Intel I211" rev 0x03, msix, 2 queues, address 
> 00:0d:b9:...
> em2 at pci3 dev 0 function 0 "Intel I211" rev 0x03, msix, 2 queues, address 
> 00:0d:b9:...
> 
> em0 at pci1 dev 0 function 0 "Intel 82574L" rev 0x00: msi, address 
> 68:05:ca:...
> 
> The only questions I have are around queue identification. All the specs I've 
> been able to find indicate the I210 should have 4 queues; did Intel make a 
> cheaper version with 2 toward the end of production? Or could it be an I211 
> masquerading as an I210 (and would that be bad for the driver)?

Is it a 2-cpu machine?
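
(For context: the diff pulls in intrmap(9), which spreads a device's MSI-X
vectors over the CPUs and therefore never hands out more vectors than there
are CPUs.  A minimal sketch of that pattern, with the softc field names,
msix_count and EM_MAX_VECTORS made up for illustration:

	sc->sc_intrmap = intrmap_create(&sc->sc_dev, msix_count,
	    EM_MAX_VECTORS, INTRMAP_POWEROF2);
	/* intrmap caps this at the number of CPUs */
	sc->sc_nqueues = intrmap_count(sc->sc_intrmap);

so a 2-cpu box attaches with "2 queues" even if the I210 itself has 4.)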

> Also, 
> https://www.mouser.com/pdfdocs/Intel_82574L_82574IT_GbE_Controller_brief.pdf 
> indicates that the 82574L should have 2 queues?

No msix in your dmesg excerpt for that one



Re: em(4) multiqueue

2023-04-13 Thread Brian Conway
Reviving this thread, apologies for discontinuity in mail readers: 
https://marc.info/?t=16564219358

After rebasing on 7.3, my results have mirrored Hrvoje's testing at the end of 
that thread. No issues with throughput, unusual latency, or reliability. 
`vmstat -i` shows some level of balancing between the queues. I've been testing 
on as many em(4) systems as I have access to, some manually, some in packet 
forwarder/firewall scenarios:

em0 at pci1 dev 0 function 0 "Intel I210" rev 0x03, msix, 2 queues, address 
00:f1:f3:...
em1 at pci2 dev 0 function 0 "Intel I210" rev 0x03, msix, 2 queues, address 
00:f1:f3:...
em2 at pci3 dev 0 function 0 "Intel I210" rev 0x03, msix, 2 queues, address 
00:f1:f3:...
em3 at pci4 dev 0 function 0 "Intel I210" rev 0x03, msix, 2 queues, address 
00:f1:f3:...
em4 at pci5 dev 0 function 0 "Intel I210" rev 0x03, msix, 2 queues, address 
00:f1:f3:...
em5 at pci6 dev 0 function 0 "Intel I210" rev 0x03, msix, 2 queues, address 
00:f1:f3:...

em0 at pci1 dev 0 function 0 "Intel I210" rev 0x03, msix, 4 queues, address 
00:0d:b9:...
em1 at pci2 dev 0 function 0 "Intel I210" rev 0x03, msix, 4 queues, address 
00:0d:b9:...
em2 at pci3 dev 0 function 0 "Intel I210" rev 0x03, msix, 4 queues, address 
00:0d:b9:...

em0 at pci1 dev 0 function 0 "Intel I211" rev 0x03, msix, 2 queues, address 
00:0d:b9:...
em1 at pci2 dev 0 function 0 "Intel I211" rev 0x03, msix, 2 queues, address 
00:0d:b9:...
em2 at pci3 dev 0 function 0 "Intel I211" rev 0x03, msix, 2 queues, address 
00:0d:b9:...

em0 at pci1 dev 0 function 0 "Intel 82574L" rev 0x00: msi, address 68:05:ca:...

The only questions I have are around queue identification. All the specs I've 
been able to find indicate the I210 should have 4 queues; did Intel make a 
cheaper version with 2 toward the end of production? Or could it be an I211 
masquerading as an I210 (and would that be bad for the driver)?

Also, 
https://www.mouser.com/pdfdocs/Intel_82574L_82574IT_GbE_Controller_brief.pdf 
indicates that the 82574L should have 2 queues?

Anyway, great work, please let me know if there's more I can do to help this 
move forward.

Brian Conway
Lead Software Engineer, Owner
RCE Software, LLC



Re: em(4) multiqueue

2022-12-25 Thread Hrvoje Popovski
On 15.8.2022. 20:51, Hrvoje Popovski wrote:
> On 12.8.2022. 22:15, Hrvoje Popovski wrote:
>> Hi,
>>
>> I'm testing forwarding over
>>
>> em0 at pci7 dev 0 function 0 "Intel 82576" rev 0x01, msix, 4 queues,
>> em1 at pci7 dev 0 function 1 "Intel 82576" rev 0x01, msix, 4 queues,
>> em2 at pci8 dev 0 function 0 "Intel I210" rev 0x03, msix, 4 queues,
>> em3 at pci9 dev 0 function 0 "Intel I210" rev 0x03, msix, 4 queues,
>> em4 at pci12 dev 0 function 0 "Intel I350" rev 0x01, msix, 4 queues,
>> em5 at pci12 dev 0 function 1 "Intel I350" rev 0x01, msix, 4 queues,
> I've managed to get linux pktgen to send traffic on all 6 em interfaces
> at the same time, and the box seems to work just fine. Some systat, vmstat
> and kstat details are in the attachment, taken while traffic is flowing over that box.

Hi,

after 95 days in production with this diff and an I350, everything works
as expected. I'm sending this because it's time to upgrade :)
Is it maybe time to put this diff in?


ix0 at pci5 dev 0 function 0 "Intel X540T" rev 0x01, msix, 8 queues,
address a0:36:9f:29:f3:28
ix1 at pci5 dev 0 function 1 "Intel X540T" rev 0x01, msix, 8 queues,
address a0:36:9f:29:f3:2a
em0 at pci6 dev 0 function 0 "Intel I350" rev 0x01, msix, 8 queues,
address ac:1f:6b:14:bd:b2
em1 at pci6 dev 0 function 1 "Intel I350" rev 0x01, msix, 8 queues,
address ac:1f:6b:14:bd:b3


fw2# uptime
 6:34PM  up 95 days, 19:26, 1 user, load averages: 0.00, 0.00, 0.00


fw2# vmstat -i
interrupt   total rate
irq0/clock 6622294171  799
irq0/ipi   8263089839  998
irq96/acpi0 10
irq114/ix0:0514761687   62
irq115/ix0:1510189468   61
irq116/ix0:2522691117   63
irq117/ix0:3531638415   64
irq118/ix0:4534116996   64
irq119/ix0:5511162669   61
irq120/ix0:6535267806   64
irq121/ix0:7519707637   62
irq122/ix0  20
irq99/xhci0680
irq100/ehci0   190
irq132/em0:0498689640   60
irq133/em0:1516744073   62
irq134/em0:2520784714   62
irq135/em0:3512596405   61
irq136/em0:4521988376   63
irq137/em0:5513939246   62
irq138/em0:6517184525   62
irq139/em0:7509781661   61
irq140/em0  20
irq141/em1:0216273893   26
irq143/em1:2283094667   34
irq148/em1:520
irq151/em1 180
irq100/ehci1   190
irq103/ahci0  50490680
Total 23681046204 2860




Re: em(4) multiqueue

2022-08-15 Thread Hrvoje Popovski
On 12.8.2022. 22:15, Hrvoje Popovski wrote:
> Hi,
> 
> I'm testing forwarding over
> 
> em0 at pci7 dev 0 function 0 "Intel 82576" rev 0x01, msix, 4 queues,
> em1 at pci7 dev 0 function 1 "Intel 82576" rev 0x01, msix, 4 queues,
> em2 at pci8 dev 0 function 0 "Intel I210" rev 0x03, msix, 4 queues,
> em3 at pci9 dev 0 function 0 "Intel I210" rev 0x03, msix, 4 queues,
> em4 at pci12 dev 0 function 0 "Intel I350" rev 0x01, msix, 4 queues,
> em5 at pci12 dev 0 function 1 "Intel I350" rev 0x01, msix, 4 queues,

I've managed to get linux pktgen to send traffic on all 6 em interfaces
at the same time, and the box seems to work just fine. Some systat, vmstat
and kstat details are in the attachment, taken while traffic is flowing over that box.

irq124/em0:0701061017 7450
irq125/em0:1700477475 7444
irq126/em0:2700518530 7445
irq127/em0:3700477219 7444
irq128/em0 120
irq129/em1:0702693602 7468
irq130/em1:1702621154 7467
irq131/em1:2702638755 7467
irq132/em1:3702619278 7467
irq133/em1  80
irq134/em2:0700792107 7448
irq135/em2:1685857158 7289
irq136/em2:2685987301 7290
irq137/em2:3685853293 7289
irq138/em2 120
irq139/em3:0702784432 7469
irq140/em3:1702673600 7468
irq141/em3:2702692900 7468
irq142/em3:3702670362 7468
irq143/em3  80
irq146/em4:0691767956 7352
irq147/em4:1687629590 7308
irq148/em4:2687675100 7308
irq149/em4:3687627987 7308
irq150/em4 120
irq151/em5:0702655585 7467
irq152/em5:1702482994 7466
irq153/em5:2702502382 7466
irq154/em5:3702481315 7466
irq155/em5  80

[attached per-CPU pool cache statistics (columns NAME, LEN, IDLE, NGC and
per-CPU REQ, REL, LREQ, LREL) for the knotepl, mbufpl, mcl12k, mcl16k, mcl2k,
mcl2k2, mcl4k and mcl64k pools; table mangled and truncated in the archive]

Re: em(4) multiqueue

2022-08-12 Thread Hrvoje Popovski
On 28.6.2022. 15:11, Jonathan Matthew wrote:
> This adds the (not quite) final bits to em(4) to enable multiple rx/tx queues.
> Note that desktop/laptop models (I218, I219 etc.) do not support multiple 
> queues,
> so this only really applies to servers and network appliances (including 
> APU2).
> 
> It also removes the 'em_enable_msix' variable, in favour of using MSI-X on 
> devices
> that support multiple queues and MSI or INTX everywhere else.
> 
> I've tested this with an I350 on amd64 and arm64, where it works as expected, 
> and
> with the I218-LM in my laptop where it does nothing (as expected).
> More testing is welcome, especially in forwarding environments.


Hi,

I'm testing forwarding over

em0 at pci7 dev 0 function 0 "Intel 82576" rev 0x01, msix, 4 queues,
em1 at pci7 dev 0 function 1 "Intel 82576" rev 0x01, msix, 4 queues,
em2 at pci8 dev 0 function 0 "Intel I210" rev 0x03, msix, 4 queues,
em3 at pci9 dev 0 function 0 "Intel I210" rev 0x03, msix, 4 queues,
em4 at pci12 dev 0 function 0 "Intel I350" rev 0x01, msix, 4 queues,
em5 at pci12 dev 0 function 1 "Intel I350" rev 0x01, msix, 4 queues,

and it seems that plain forwarding works as expected.
I'm sending traffic from em0 to em1, from em2 to em3 and from em4 to
em5; em6 is for ssh ...


irq124/em0:0  1233974 1316
irq125/em0:1  1233943 1316
irq126/em0:2  1233942 1316
irq127/em0:3  1233944 1316
irq128/em0  20
irq129/em1:0  1021586 1090
irq132/em1:320
irq133/em1  40
irq98/xhci0940
irq99/ehci0190
irq134/em2:0   466894  498
irq135/em2:1   466846  498
irq136/em2:2   466846  498
irq137/em2:3   466846  498
irq138/em2  20
irq139/em3:0   467019  498
irq143/em3  20
irq146/em4:0  1192252 1272
irq147/em4:1  1192213 1272
irq148/em4:2  1192211 1272
irq149/em4:3  1192212 1272
irq150/em4  20
irq151/em5:0  1192354 1272
irq155/em5  20
irq156/em6:0 29363
irq157/em6:1   840
irq158/em6:2   320
irq159/em6:3   300
irq160/em6  20


OpenBSD 7.2-beta (GENERIC.MP) #0: Fri Aug 12 12:50:45 CEST 2022
r...@smc4.srce.hr:/sys/arch/amd64/compile/GENERIC.MP
real mem = 17052663808 (16262MB)
avail mem = 16518463488 (15753MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xed9b0 (48 entries)
bios0: vendor American Megatrends Inc. version "2.3" date 05/07/2021
bios0: Supermicro Super Server
acpi0 at bios0: ACPI 5.0
acpi0: sleep states S0 S4 S5
acpi0: tables DSDT FACP APIC FPDT FIDT SPMI MCFG UEFI DBG2 HPET WDDT
SSDT SSDT SSDT PRAD DMAR HEST BERT ERST EINJ
acpi0: wakeup devices IP2P(S4) EHC1(S4) EHC2(S4) RP07(S4) RP08(S4)
BR1A(S4) BR1B(S4) BR2A(S4) BR2B(S4) BR2C(S4) BR2D(S4) BR3A(S4) BR3B(S4)
BR3C(S4) BR3D(S4) RP01(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Xeon(R) CPU D-1518 @ 2.20GHz, 2200.34 MHz, 06-56-03
cpu0:
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,PQM,RDSEED,ADX,SMAP,PT,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB
64b/line 8-way L2 cache, 6MB 64b/line 12-way L3 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 99MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.2, IBE
cpu1 at mainbus0: apid 2 (application processor)
cpu1: Intel(R) Xeon(R) CPU D-1518 @ 2.20GHz, 2200.01 MHz, 06-56-03
cpu1:

Re: em(4) multiqueue

2022-07-02 Thread David Gwynne



> On 2 Jul 2022, at 08:44, Hrvoje Popovski  wrote:
> 
> On 28.6.2022. 15:11, Jonathan Matthew wrote:
>> This adds the (not quite) final bits to em(4) to enable multiple rx/tx 
>> queues.
>> Note that desktop/laptop models (I218, I219 etc.) do not support multiple 
>> queues,
>> so this only really applies to servers and network appliances (including 
>> APU2).
>> 
>> It also removes the 'em_enable_msix' variable, in favour of using MSI-X on 
>> devices
>> that support multiple queues and MSI or INTX everywhere else.
>> 
>> I've tested this with an I350 on amd64 and arm64, where it works as 
>> expected, and
>> with the I218-LM in my laptop where it does nothing (as expected).
>> More testing is welcome, especially in forwarding environments.
> 
> 
> Hi,
> 
> I'm testing this diff in a forwarding setup where the source is 10.113.0/24
> connected to em2 and the destination is 10.114.0/24 connected to em3. I'm
> randomizing the source and destination IP addresses.
> 
> dmesg:
> em2 at pci6 dev 0 function 2 "Intel I350" rev 0x01, msix, 8 queues
> em3 at pci6 dev 0 function 3 "Intel I350" rev 0x01, msix, 8 queues
> 
> netstat:
> 10.113.0/24        192.168.113.11     UGS        0         0 -     8 em2
> 10.114.0/24        192.168.114.11     UGS        0 404056853 -     8 em3
> 
> 
> ifconfig:
> em2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
>         lladdr 40:f2:e9:ec:b4:14
>         index 5 priority 0 llprio 3
>         media: Ethernet autoselect (1000baseT full-duplex,master)
>         status: active
>         inet 192.168.113.1 netmask 0xffffff00 broadcast 192.168.113.255
> em3: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
>         lladdr 40:f2:e9:ec:b4:15
>         index 6 priority 0 llprio 3
>         media: Ethernet autoselect (1000baseT full-duplex,rxpause,txpause)
>         status: active
>         inet 192.168.114.1 netmask 0xffffff00 broadcast 192.168.114.255
> 
> 
> with vmstat -i
> irq160/em2:0  4740972 3538
> irq161/em2:1  4740979 3538
> irq162/em2:2  4740977 3538
> irq163/em2:3  4740978 3538
> irq164/em2:4  4740965 3538
> irq165/em2:5  4740972 3538
> irq166/em2:6  4740971 3538
> irq167/em2:7  4740965 3538
> irq168/em2  20
> irq169/em3:0  4741258 3538
> irq177/em3  20
> 
> 
> should I see 8 queues on em3 as on em2?

em(4) isn't populating the mbuf flowid field with the rss hash value the chip 
calculates when it receives packets, so there's no flow identifier for the 
network stack to use to assign packets to output queues on the way out. this 
means everything lands on the default (0th) queue.
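
(A minimal sketch of the usual pattern for that, the way drivers like ix(4)
do it on the rx completion path; ph_flowid and M_FLOWID are the real mbuf(9)
names, the descriptor field below is made up for illustration:

	uint32_t hash = rxd->rss_hash;		/* hypothetical rx descriptor field */

	if (hashed) {				/* chip computed an RSS hash for this packet */
		m->m_pkthdr.ph_flowid = hash;	/* flow id the stack keys tx queue choice on */
		m->m_pkthdr.csum_flags |= M_FLOWID;	/* mark the flow id as valid */
	}

With that, flows get spread across the tx queues instead of all landing on
queue 0.)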

cheers,
dlg


> 
> x3550m4# tcpdump -ni em3
> tcpdump: listening on em3, link-type EN10MB
> 00:39:26.663617 10.113.0.230.9 > 10.114.0.154.9: udp 18
> 00:39:26.663618 10.113.0.176.9 > 10.114.0.3.9: udp 18
> 00:39:26.663619 10.113.0.37.9 > 10.114.0.7.9: udp 18
> 00:39:26.663620 10.113.0.200.9 > 10.114.0.197.9: udp 18
> 00:39:26.663620 10.113.0.37.9 > 10.114.0.230.9: udp 18
> 00:39:26.663621 10.113.0.95.9 > 10.114.0.216.9: udp 18
> 00:39:26.663622 10.113.0.8.9 > 10.114.0.187.9: udp 18
> 00:39:26.663623 10.113.0.56.9 > 10.114.0.107.9: udp 18
> 00:39:26.663624 10.113.0.4.9 > 10.114.0.39.9: udp 18
> 00:39:26.663624 10.113.0.244.9 > 10.114.0.188.9: udp 18
> 00:39:26.663625 10.113.0.166.9 > 10.114.0.15.9: udp 18
> 00:39:26.663626 10.113.0.7.9 > 10.114.0.78.9: udp 18
> 00:39:26.663627 10.113.0.147.9 > 10.114.0.202.9: udp 18
> 00:39:26.663628 10.113.0.144.9 > 10.114.0.184.9: udp 18
> 00:39:26.663628 10.113.0.221.9 > 10.114.0.100.9: udp 18
> 00:39:26.663630 10.113.0.69.9 > 10.114.0.231.9: udp 18
> 00:39:26.663648 10.113.0.71.9 > 10.114.0.64.9: udp 18
> 
> 
> vmstat -iz
> irq160/em2:0  4740972 3501
> irq161/em2:1  4740979 3501
> irq162/em2:2  4740977 3501
> irq163/em2:3  4740978 3501
> irq164/em2:4  4740965 3501
> irq165/em2:5  4740972 3501
> irq166/em2:6  4740971 3501
> irq167/em2:7  4740965 3501
> irq168/em2  20
> irq169/em3:0  4741258 3501
> irq170/em3:100
> irq171/em3:200
> irq172/em3:300
> irq173/em3:400
> irq174/em3:500
> irq175/em3:600
> irq176/em3:700
> irq177/em3  20
> 



Re: em(4) multiqueue

2022-07-01 Thread Hrvoje Popovski
On 28.6.2022. 15:11, Jonathan Matthew wrote:
> This adds the (not quite) final bits to em(4) to enable multiple rx/tx queues.
> Note that desktop/laptop models (I218, I219 etc.) do not support multiple 
> queues,
> so this only really applies to servers and network appliances (including 
> APU2).
> 
> It also removes the 'em_enable_msix' variable, in favour of using MSI-X on 
> devices
> that support multiple queues and MSI or INTX everywhere else.
> 
> I've tested this with an I350 on amd64 and arm64, where it works as expected, 
> and
> with the I218-LM in my laptop where it does nothing (as expected).
> More testing is welcome, especially in forwarding environments.


Hi,

I'm testing this diff in a forwarding setup where the source is 10.113.0/24
connected to em2 and the destination is 10.114.0/24 connected to em3. I'm
randomizing the source and destination IP addresses.

dmesg:
em2 at pci6 dev 0 function 2 "Intel I350" rev 0x01, msix, 8 queues
em3 at pci6 dev 0 function 3 "Intel I350" rev 0x01, msix, 8 queues

netstat:
10.113.0/24        192.168.113.11     UGS        0         0 -     8 em2
10.114.0/24        192.168.114.11     UGS        0 404056853 -     8 em3


ifconfig:
em2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        lladdr 40:f2:e9:ec:b4:14
        index 5 priority 0 llprio 3
        media: Ethernet autoselect (1000baseT full-duplex,master)
        status: active
        inet 192.168.113.1 netmask 0xffffff00 broadcast 192.168.113.255
em3: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        lladdr 40:f2:e9:ec:b4:15
        index 6 priority 0 llprio 3
        media: Ethernet autoselect (1000baseT full-duplex,rxpause,txpause)
        status: active
        inet 192.168.114.1 netmask 0xffffff00 broadcast 192.168.114.255


with vmstat -i
irq160/em2:0  4740972 3538
irq161/em2:1  4740979 3538
irq162/em2:2  4740977 3538
irq163/em2:3  4740978 3538
irq164/em2:4  4740965 3538
irq165/em2:5  4740972 3538
irq166/em2:6  4740971 3538
irq167/em2:7  4740965 3538
irq168/em2  20
irq169/em3:0  4741258 3538
irq177/em3  20


should I see 8 queues on em3 as on em2?

x3550m4# tcpdump -ni em3
tcpdump: listening on em3, link-type EN10MB
00:39:26.663617 10.113.0.230.9 > 10.114.0.154.9: udp 18
00:39:26.663618 10.113.0.176.9 > 10.114.0.3.9: udp 18
00:39:26.663619 10.113.0.37.9 > 10.114.0.7.9: udp 18
00:39:26.663620 10.113.0.200.9 > 10.114.0.197.9: udp 18
00:39:26.663620 10.113.0.37.9 > 10.114.0.230.9: udp 18
00:39:26.663621 10.113.0.95.9 > 10.114.0.216.9: udp 18
00:39:26.663622 10.113.0.8.9 > 10.114.0.187.9: udp 18
00:39:26.663623 10.113.0.56.9 > 10.114.0.107.9: udp 18
00:39:26.663624 10.113.0.4.9 > 10.114.0.39.9: udp 18
00:39:26.663624 10.113.0.244.9 > 10.114.0.188.9: udp 18
00:39:26.663625 10.113.0.166.9 > 10.114.0.15.9: udp 18
00:39:26.663626 10.113.0.7.9 > 10.114.0.78.9: udp 18
00:39:26.663627 10.113.0.147.9 > 10.114.0.202.9: udp 18
00:39:26.663628 10.113.0.144.9 > 10.114.0.184.9: udp 18
00:39:26.663628 10.113.0.221.9 > 10.114.0.100.9: udp 18
00:39:26.663630 10.113.0.69.9 > 10.114.0.231.9: udp 18
00:39:26.663648 10.113.0.71.9 > 10.114.0.64.9: udp 18


vmstat -iz
irq160/em2:0  4740972 3501
irq161/em2:1  4740979 3501
irq162/em2:2  4740977 3501
irq163/em2:3  4740978 3501
irq164/em2:4  4740965 3501
irq165/em2:5  4740972 3501
irq166/em2:6  4740971 3501
irq167/em2:7  4740965 3501
irq168/em2  20
irq169/em3:0  4741258 3501
irq170/em3:100
irq171/em3:200
irq172/em3:300
irq173/em3:400
irq174/em3:500
irq175/em3:600
irq176/em3:700
irq177/em3  20



Re: em(4) multiqueue

2022-06-29 Thread Stuart Henderson
On 2022/06/29 13:19, Stuart Henderson wrote:
> On 2022/06/28 23:11, Jonathan Matthew wrote:
> > This adds the (not quite) final bits to em(4) to enable multiple rx/tx 
> > queues.
> > Note that desktop/laptop models (I218, I219 etc.) do not support multiple 
> > queues,
> > so this only really applies to servers and network appliances (including 
> > APU2).
> > 
> > It also removes the 'em_enable_msix' variable, in favour of using MSI-X on 
> > devices
> > that support multiple queues and MSI or INTX everywhere else.
> > 
> > I've tested this with an I350 on amd64 and arm64, where it works as 
> > expected, and
> > with the I218-LM in my laptop where it does nothing (as expected).
> > More testing is welcome, especially in forwarding environments.
> 
> Doesn't break things but doesn't do anything on i386 (I guess there's no 
> MSI-X?)

On amd64 on a similar machine, it works

I guess it may be related to this in ppb.c


#ifdef __i386__
	if (pci_intr_map(pa, &ih) == 0)
		sc->sc_intrhand = pci_intr_establish(pc, ih, IPL_BIO,
		    ppb_intr, sc, self->dv_xname);
#else
	if (pci_intr_map_msi(pa, &ih) == 0 ||
	    pci_intr_map(pa, &ih) == 0)
		sc->sc_intrhand = pci_intr_establish(pc, ih, IPL_BIO,
		    ppb_intr, sc, self->dv_xname);
#endif