Re: bnxt panic - HWRM_RING_ALLOC command returned RESOURCE_ALLOC_ERROR error.

2024-01-09 Thread Alexander Bluhm
On Tue, Jan 09, 2024 at 12:04:17PM +1000, Jonathan Matthew wrote:
> On Wed, Jan 03, 2024 at 10:14:12AM +0100, Hrvoje Popovski wrote:
> > On 3.1.2024. 7:51, Jonathan Matthew wrote:
> > > On Wed, Jan 03, 2024 at 01:50:06AM +0100, Alexander Bluhm wrote:
> > >> On Wed, Jan 03, 2024 at 12:26:26AM +0100, Hrvoje Popovski wrote:
> > >>> While testing kettenis@ ipl diff from tech@ and doing iperf3 to bnxt
> > >>> interface and ifconfig bnxt0 down/up at the same time I can trigger
> > >>> a panic. The panic can be triggered without the kettenis@ diff as well.
> > >> It is easy to reproduce.  ifconfig bnxt1 down/up a few times while
> > >> receiving TCP traffic with iperf3.  Machine still has kettenis@ diff.
> > >> My panic looks different.
> > > It looks like I wasn't trying very hard when I wrote bnxt_down().
> > > I think there's also a problem with bnxt_up() unwinding after failure
> > > in various places, but that's a different issue.
> > > 
> > > This makes it more resilient for me, though it still logs
> > > 'bnxt0: unexpected completion type 3' a lot if I take the interface
> > > down while it's in use.  I'll look at that separately.
> > 
> > Hi,
> > 
> > with this diff I can still panic the box with ifconfig up/down, but not as
> > fast as without it
> 
> Right, this is the other problem where bnxt_up() wasn't cleaning up properly
> after failing part way through.  This diff should fix that, but I don't think
> it will fix the 'HWRM_RING_ALLOC command returned RESOURCE_ALLOC_ERROR error'
> problem, so the interface will still stop working at that point.

OK bluhm@

> Index: if_bnxt.c
> ===
> RCS file: /cvs/src/sys/dev/pci/if_bnxt.c,v
> retrieving revision 1.39
> diff -u -p -r1.39 if_bnxt.c
> --- if_bnxt.c 10 Nov 2023 15:51:20 -  1.39
> +++ if_bnxt.c 9 Jan 2024 01:59:38 -
> @@ -1073,7 +1081,7 @@ bnxt_up(struct bnxt_softc *sc)
>   if (bnxt_hwrm_vnic_ctx_alloc(sc, &sc->sc_vnic.rss_id) != 0) {
>   printf("%s: failed to allocate vnic rss context\n",
>   DEVNAME(sc));
> - goto down_queues;
> + goto down_all_queues;
>   }
>  
>   sc->sc_vnic.id = (uint16_t)HWRM_NA_SIGNATURE;
> @@ -1139,8 +1147,11 @@ dealloc_vnic:
>   bnxt_hwrm_vnic_free(sc, &sc->sc_vnic);
>  dealloc_vnic_ctx:
>   bnxt_hwrm_vnic_ctx_free(sc, &sc->sc_vnic.rss_id);
> +
> +down_all_queues:
> + i = sc->sc_nqueues;
>  down_queues:
> - for (i = 0; i < sc->sc_nqueues; i++)
> + while (i-- > 0)
>   bnxt_queue_down(sc, &sc->sc_queues[i]);
>  
>   bnxt_dmamem_free(sc, sc->sc_rx_cfg);



Re: bnxt panic - HWRM_RING_ALLOC command returned RESOURCE_ALLOC_ERROR error.

2024-01-08 Thread Hrvoje Popovski
On 9.1.2024. 3:04, Jonathan Matthew wrote:
> On Wed, Jan 03, 2024 at 10:14:12AM +0100, Hrvoje Popovski wrote:
>> On 3.1.2024. 7:51, Jonathan Matthew wrote:
>>> On Wed, Jan 03, 2024 at 01:50:06AM +0100, Alexander Bluhm wrote:
>>>> On Wed, Jan 03, 2024 at 12:26:26AM +0100, Hrvoje Popovski wrote:
>>>>> While testing kettenis@ ipl diff from tech@ and doing iperf3 to bnxt
>>>>> interface and ifconfig bnxt0 down/up at the same time I can trigger
>>>>> a panic. The panic can be triggered without the kettenis@ diff as well.
>>>> It is easy to reproduce.  ifconfig bnxt1 down/up a few times while
>>>> receiving TCP traffic with iperf3.  Machine still has kettenis@ diff.
>>>> My panic looks different.
>>> It looks like I wasn't trying very hard when I wrote bnxt_down().
>>> I think there's also a problem with bnxt_up() unwinding after failure
>>> in various places, but that's a different issue.
>>>
>>> This makes it more resilient for me, though it still logs
>>> 'bnxt0: unexpected completion type 3' a lot if I take the interface
>>> down while it's in use.  I'll look at that separately.
>>
>> Hi,
>>
>> with this diff I can still panic the box with ifconfig up/down, but not as
>> fast as without it
> 
> Right, this is the other problem where bnxt_up() wasn't cleaning up properly
> after failing part way through.  This diff should fix that, but I don't think
> it will fix the 'HWRM_RING_ALLOC command returned RESOURCE_ALLOC_ERROR error'
> problem, so the interface will still stop working at that point.
> 


With this diff bnxt behaves exactly as you said.
After many ifconfig down/up cycles, at some point I get:

smc24# ifconfig bnxt0 down
smc24# ifconfig bnxt0 up
bnxt0: attempt to re-allocate ring 0010
bnxt0: failed to allocate completion queue 0

and bnxt stops working.



> 
> Index: if_bnxt.c
> ===
> RCS file: /cvs/src/sys/dev/pci/if_bnxt.c,v
> retrieving revision 1.39
> diff -u -p -r1.39 if_bnxt.c
> --- if_bnxt.c 10 Nov 2023 15:51:20 -  1.39
> +++ if_bnxt.c 9 Jan 2024 01:59:38 -
> @@ -1073,7 +1081,7 @@ bnxt_up(struct bnxt_softc *sc)
>   if (bnxt_hwrm_vnic_ctx_alloc(sc, &sc->sc_vnic.rss_id) != 0) {
>   printf("%s: failed to allocate vnic rss context\n",
>   DEVNAME(sc));
> - goto down_queues;
> + goto down_all_queues;
>   }
>  
>   sc->sc_vnic.id = (uint16_t)HWRM_NA_SIGNATURE;
> @@ -1139,8 +1147,11 @@ dealloc_vnic:
>   bnxt_hwrm_vnic_free(sc, &sc->sc_vnic);
>  dealloc_vnic_ctx:
>   bnxt_hwrm_vnic_ctx_free(sc, &sc->sc_vnic.rss_id);
> +
> +down_all_queues:
> + i = sc->sc_nqueues;
>  down_queues:
> - for (i = 0; i < sc->sc_nqueues; i++)
> + while (i-- > 0)
>   bnxt_queue_down(sc, &sc->sc_queues[i]);
>  
>   bnxt_dmamem_free(sc, sc->sc_rx_cfg);
> 



Re: bnxt panic - HWRM_RING_ALLOC command returned RESOURCE_ALLOC_ERROR error.

2024-01-08 Thread Jonathan Matthew
On Wed, Jan 03, 2024 at 10:14:12AM +0100, Hrvoje Popovski wrote:
> On 3.1.2024. 7:51, Jonathan Matthew wrote:
> > On Wed, Jan 03, 2024 at 01:50:06AM +0100, Alexander Bluhm wrote:
> >> On Wed, Jan 03, 2024 at 12:26:26AM +0100, Hrvoje Popovski wrote:
> >>> While testing kettenis@ ipl diff from tech@ and doing iperf3 to bnxt
> >>> interface and ifconfig bnxt0 down/up at the same time I can trigger
> >>> a panic. The panic can be triggered without the kettenis@ diff as well.
> >> It is easy to reproduce.  ifconfig bnxt1 down/up a few times while
> >> receiving TCP traffic with iperf3.  Machine still has kettenis@ diff.
> >> My panic looks different.
> > It looks like I wasn't trying very hard when I wrote bnxt_down().
> > I think there's also a problem with bnxt_up() unwinding after failure
> > in various places, but that's a different issue.
> > 
> > This makes it more resilient for me, though it still logs
> > 'bnxt0: unexpected completion type 3' a lot if I take the interface
> > down while it's in use.  I'll look at that separately.
> 
> Hi,
> 
> with this diff I can still panic the box with ifconfig up/down, but not as
> fast as without it

Right, this is the other problem where bnxt_up() wasn't cleaning up properly
after failing part way through.  This diff should fix that, but I don't think
it will fix the 'HWRM_RING_ALLOC command returned RESOURCE_ALLOC_ERROR error'
problem, so the interface will still stop working at that point.


Index: if_bnxt.c
===
RCS file: /cvs/src/sys/dev/pci/if_bnxt.c,v
retrieving revision 1.39
diff -u -p -r1.39 if_bnxt.c
--- if_bnxt.c   10 Nov 2023 15:51:20 -  1.39
+++ if_bnxt.c   9 Jan 2024 01:59:38 -
@@ -1073,7 +1081,7 @@ bnxt_up(struct bnxt_softc *sc)
 	if (bnxt_hwrm_vnic_ctx_alloc(sc, &sc->sc_vnic.rss_id) != 0) {
 		printf("%s: failed to allocate vnic rss context\n",
 		    DEVNAME(sc));
-		goto down_queues;
+		goto down_all_queues;
 	}
 
 	sc->sc_vnic.id = (uint16_t)HWRM_NA_SIGNATURE;
@@ -1139,8 +1147,11 @@ dealloc_vnic:
 	bnxt_hwrm_vnic_free(sc, &sc->sc_vnic);
 dealloc_vnic_ctx:
 	bnxt_hwrm_vnic_ctx_free(sc, &sc->sc_vnic.rss_id);
+
+down_all_queues:
+	i = sc->sc_nqueues;
 down_queues:
-	for (i = 0; i < sc->sc_nqueues; i++)
+	while (i-- > 0)
 		bnxt_queue_down(sc, &sc->sc_queues[i]);
 
 	bnxt_dmamem_free(sc, sc->sc_rx_cfg);
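The unwinding in this diff can be modelled in userland. This is an illustration under stated assumptions, not driver code: struct model_softc, model_queue_up(), model_queue_down() and model_up() are hypothetical stand-ins for the softc, the per-queue setup and teardown (bnxt_queue_down() in the diff) and bnxt_up(), and the model assumes a queue whose setup fails releases its own partial state. A failure while bringing up queue i jumps to down_queues with i still holding the failing index, so `while (i-- > 0)` releases only queues i-1 down to 0. A failure after every queue is up (such as the vnic rss context allocation) goes through down_all_queues, which sets i = sc_nqueues first. The assert() marks the invariant the old `for (i = 0; i < sc->sc_nqueues; i++)` loop could break by releasing a queue that was never set up, which fits the bnxt_up() -> bnxt_queue_down() trace in Hrvoje's report.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_QUEUES 8

/* Hypothetical per-queue state: true once the queue owns its rings. */
struct model_queue {
	bool up;
};

/* Hypothetical stand-in for struct bnxt_softc. */
struct model_softc {
	int nqueues;
	struct model_queue queues[MODEL_MAX_QUEUES];
	int fail_queue;		/* index whose setup fails, or -1 for none */
	bool fail_late;		/* simulate a later failure, e.g. vnic ctx alloc */
	int torn_down;		/* number of queues released while unwinding */
};

static int
model_queue_up(struct model_softc *sc, int i)
{
	/* Assumption: a queue that fails to come up cleans up after itself. */
	if (i == sc->fail_queue)
		return -1;
	sc->queues[i].up = true;
	return 0;
}

static void
model_queue_down(struct model_softc *sc, int i)
{
	/* Releasing a queue that was never set up is the bug being avoided. */
	assert(sc->queues[i].up);
	sc->queues[i].up = false;
	sc->torn_down++;
}

static int
model_up(struct model_softc *sc)
{
	int i;

	for (i = 0; i < sc->nqueues; i++) {
		if (model_queue_up(sc, i) != 0)
			goto down_queues;	/* i is the index that failed */
	}

	if (sc->fail_late)
		goto down_all_queues;		/* every queue is up here */

	return 0;

down_all_queues:
	i = sc->nqueues;
down_queues:
	while (i-- > 0)				/* releases i-1 down to 0 */
		model_queue_down(sc, i);
	return -1;
}
```

With `.nqueues = 4, .fail_queue = 2`, model_up() returns -1 after releasing queues 1 and 0 only; with `.fail_queue = -1, .fail_late = true` it releases all four.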



Re: bnxt panic - HWRM_RING_ALLOC command returned RESOURCE_ALLOC_ERROR error.

2024-01-03 Thread Jonathan Matthew
On Wed, Jan 03, 2024 at 01:04:05PM +0100, Alexander Bluhm wrote:
> On Wed, Jan 03, 2024 at 04:51:39PM +1000, Jonathan Matthew wrote:
> > On Wed, Jan 03, 2024 at 01:50:06AM +0100, Alexander Bluhm wrote:
> > > On Wed, Jan 03, 2024 at 12:26:26AM +0100, Hrvoje Popovski wrote:
> > > > While testing kettenis@ ipl diff from tech@ and doing iperf3 to bnxt
> > > > interface and ifconfig bnxt0 down/up at the same time I can trigger
> > > > a panic. The panic can be triggered without the kettenis@ diff as well.
> > > 
> > > It is easy to reproduce.  ifconfig bnxt1 down/up a few times while
> > > receiving TCP traffic with iperf3.  Machine still has kettenis@ diff.
> > > My panic looks different.
> > 
> > It looks like I wasn't trying very hard when I wrote bnxt_down().
> > I think there's also a problem with bnxt_up() unwinding after failure
> > in various places, but that's a different issue.
> > 
> > This makes it more resilient for me, though it still logs
> > 'bnxt0: unexpected completion type 3' a lot if I take the interface
> > down while it's in use.  I'll look at that separately.
> 
> Should we intr_barrier(sc->sc_queues[0].q_ihc) if sc->sc_intrmap == NULL ?

In that case, we only have one interrupt vector and sc->sc_ih is its
cookie.
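Taken together with the diff, this answer means the set of interrupt cookies bnxt_down() waits on is sc->sc_ih alone when sc->sc_intrmap is NULL (a single vector), and sc->sc_ih plus every queue's q_ihc otherwise. A minimal sketch of that selection, using hypothetical stand-in types (struct model_intr_softc, struct model_iqueue) and a hypothetical helper model_barrier_cookies() that only collects the cookies instead of calling intr_barrier():

```c
#include <stddef.h>

#define MODEL_INTR_MAXQ 8

/* Hypothetical stand-ins for the softc fields named in this thread. */
struct model_iqueue {
	void *q_ihc;		/* per-queue interrupt cookie */
};

struct model_intr_softc {
	void *sc_ih;		/* the only cookie when sc_intrmap == NULL */
	void *sc_intrmap;	/* NULL when a single vector is used */
	int sc_nqueues;
	struct model_iqueue sc_queues[MODEL_INTR_MAXQ];
};

/*
 * Fill `out` with the cookies the diff passes to intr_barrier():
 * sc_ih always, each queue's q_ihc only when sc_intrmap != NULL.
 * Returns the number of cookies written (at most `max`).
 */
static size_t
model_barrier_cookies(const struct model_intr_softc *sc, void **out,
    size_t max)
{
	size_t n = 0;
	int i;

	if (n < max)
		out[n++] = sc->sc_ih;
	if (sc->sc_intrmap != NULL) {
		for (i = 0; i < sc->sc_nqueues && n < max; i++)
			out[n++] = sc->sc_queues[i].q_ihc;
	}
	return n;
}
```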

> 
> All these barriers make sense to me.  OK bluhm@

Thanks.

> 
> > Index: if_bnxt.c
> > ===
> > RCS file: /cvs/src/sys/dev/pci/if_bnxt.c,v
> > retrieving revision 1.39
> > diff -u -p -r1.39 if_bnxt.c
> > --- if_bnxt.c   10 Nov 2023 15:51:20 -  1.39
> > +++ if_bnxt.c   3 Jan 2024 06:36:02 -
> > @@ -1158,12 +1159,16 @@ bnxt_down(struct bnxt_softc *sc)
> >  
> > CLR(ifp->if_flags, IFF_RUNNING);
> >  
> > +   intr_barrier(sc->sc_ih);
> > +
> > for (i = 0; i < sc->sc_nqueues; i++) {
> > ifq_clr_oactive(ifp->if_ifqs[i]);
> > ifq_barrier(ifp->if_ifqs[i]);
> > -   /* intr barrier? */
> >  
> > -   timeout_del(&sc->sc_queues[i].q_rx.rx_refill);
> > +   timeout_del_barrier(&sc->sc_queues[i].q_rx.rx_refill);
> > +
> > +   if (sc->sc_intrmap != NULL)
> > +   intr_barrier(sc->sc_queues[i].q_ihc);
> > }
> >  
> > bnxt_hwrm_free_filter(sc, &sc->sc_vnic);
> > 
> 



Re: bnxt panic - HWRM_RING_ALLOC command returned RESOURCE_ALLOC_ERROR error.

2024-01-03 Thread Alexander Bluhm
On Wed, Jan 03, 2024 at 04:51:39PM +1000, Jonathan Matthew wrote:
> On Wed, Jan 03, 2024 at 01:50:06AM +0100, Alexander Bluhm wrote:
> > On Wed, Jan 03, 2024 at 12:26:26AM +0100, Hrvoje Popovski wrote:
> > > While testing kettenis@ ipl diff from tech@ and doing iperf3 to bnxt
> > > interface and ifconfig bnxt0 down/up at the same time I can trigger
> > > a panic. The panic can be triggered without the kettenis@ diff as well.
> > 
> > It is easy to reproduce.  ifconfig bnxt1 down/up a few times while
> > receiving TCP traffic with iperf3.  Machine still has kettenis@ diff.
> > My panic looks different.
> 
> It looks like I wasn't trying very hard when I wrote bnxt_down().
> I think there's also a problem with bnxt_up() unwinding after failure
> in various places, but that's a different issue.
> 
> This makes it more resilient for me, though it still logs
> 'bnxt0: unexpected completion type 3' a lot if I take the interface
> down while it's in use.  I'll look at that separately.

Should we intr_barrier(sc->sc_queues[0].q_ihc) if sc->sc_intrmap == NULL ?

All these barriers make sense to me.  OK bluhm@

> Index: if_bnxt.c
> ===
> RCS file: /cvs/src/sys/dev/pci/if_bnxt.c,v
> retrieving revision 1.39
> diff -u -p -r1.39 if_bnxt.c
> --- if_bnxt.c 10 Nov 2023 15:51:20 -  1.39
> +++ if_bnxt.c 3 Jan 2024 06:36:02 -
> @@ -1158,12 +1159,16 @@ bnxt_down(struct bnxt_softc *sc)
>  
>   CLR(ifp->if_flags, IFF_RUNNING);
>  
> + intr_barrier(sc->sc_ih);
> +
>   for (i = 0; i < sc->sc_nqueues; i++) {
>   ifq_clr_oactive(ifp->if_ifqs[i]);
>   ifq_barrier(ifp->if_ifqs[i]);
> - /* intr barrier? */
>  
> - timeout_del(&sc->sc_queues[i].q_rx.rx_refill);
> + timeout_del_barrier(&sc->sc_queues[i].q_rx.rx_refill);
> +
> + if (sc->sc_intrmap != NULL)
> + intr_barrier(sc->sc_queues[i].q_ihc);
>   }
>  
>   bnxt_hwrm_free_filter(sc, &sc->sc_vnic);
> 



Re: bnxt panic - HWRM_RING_ALLOC command returned RESOURCE_ALLOC_ERROR error.

2024-01-03 Thread Hrvoje Popovski
On 3.1.2024. 7:51, Jonathan Matthew wrote:
> On Wed, Jan 03, 2024 at 01:50:06AM +0100, Alexander Bluhm wrote:
>> On Wed, Jan 03, 2024 at 12:26:26AM +0100, Hrvoje Popovski wrote:
>>> While testing kettenis@ ipl diff from tech@ and doing iperf3 to bnxt
>>> interface and ifconfig bnxt0 down/up at the same time I can trigger
>>> a panic. The panic can be triggered without the kettenis@ diff as well.
>> It is easy to reproduce.  ifconfig bnxt1 down/up a few times while
>> receiving TCP traffic with iperf3.  Machine still has kettenis@ diff.
>> My panic looks different.
> It looks like I wasn't trying very hard when I wrote bnxt_down().
> I think there's also a problem with bnxt_up() unwinding after failure
> in various places, but that's a different issue.
> 
> This makes it more resilient for me, though it still logs
> 'bnxt0: unexpected completion type 3' a lot if I take the interface
> down while it's in use.  I'll look at that separately.

Hi,

with this diff I can still panic the box with ifconfig up/down, but not as
fast as without it

panic with the diff:

bnxt0: HWRM_RING_ALLOC command returned RESOURCE_ALLOC_ERROR error.
bnxt0: failed to set up tx ring
uvm_fault(0xfd8e57e02460, 0xff0, 0, 1) -> e
kernel: page fault trap, code=0
Stopped at      bnxt_queue_down+0x62:   movq    0(%r12,%rax,1),%rsi
    TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
* 70181  53204      0         0x3          0    0K ifconfig
bnxt_queue_down(802c9000,802c9f88) at bnxt_queue_down+0x62
bnxt_up(802c9000) at bnxt_up+0x36b
bnxt_ioctl(802c9048,80206910,8000607fffd0) at bnxt_ioctl+0x162
ifioctl(fd8e417ab758,80206910,8000607fffd0,800060797aa8) at ifioctl+0x726
sys_ioctl(800060797aa8,8000608000d0,800060800120) at sys_ioctl+0x2af
syscall(800060800190) at syscall+0x3b4
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0x7e3d0a930430, count: 8
https://www.openbsd.org/ddb.html describes the minimum info required in
bug reports.  Insufficient info makes it difficult to find and fix bugs.


ddb{0}> show reg
rdi               0x8244b950    pci_bus_dma_tag
rsi               0x802c9f88
rbp           0x8000607ffe40
rbx                    0x101
rdx                   0xc803
rcx                    0x206
rax                    0xff0
r8                      0x3f
r9                         0
r10       0xa14b312597c5ea6a
r11               0x819fac40    _bus_dmamap_destroy
r12                        0
r13                    0x100
r14               0x802c9f88
r15               0x802c9000
rip               0x81b578e2    bnxt_queue_down+0x62
cs                       0x8
rflags               0x10216    __ALIGN_SIZE+0xf216
rsp           0x8000607ffde0
ss                      0x10
bnxt_queue_down+0x62:   movq    0(%r12,%rax,1),%rsi



ddb{0}> ps
   PID     TID   PPID    UID  S       FLAGS  WAIT      COMMAND
*53204   70181  81971      0  7         0x3            ifconfig
 57044  336864  81971      0  3    0x100083  kqread    iperf3
 57044  317909  81971      0  3   0x4100083  kqread    iperf3
 57044  253167  81971      0  3   0x4100083  kqread    iperf3
 57044  199984  81971      0  3   0x4100083  kqread    iperf3
 57044  343144  81971      0  3   0x4100083  kqread    iperf3
 81971  379109      1      0  3    0x10008b  sigsusp   ksh
 69236  410163      1      0  3    0x100098  kqread    cron
 28984  478747  27164     95  3   0x1100092  kqread    smtpd
 75309  290569  27164    103  3   0x1100092  kqread    smtpd
  3782  175531  27164     95  3   0x1100092  kqread    smtpd
 60089   38850  27164     95  3    0x100092  kqread    smtpd
 72803  151501  27164     95  3   0x1100092  kqread    smtpd
 88240  203086  27164     95  3   0x1100092  kqread    smtpd
 27164  293957      1      0  3    0x100080  kqread    smtpd
 51687  170066      1      0  3        0x88  kqread    sshd
 82716  114406      1      0  3    0x100080  kqread    ntpd
 95469  439610  76144     83  3    0x100092  kqread    ntpd
 76144  242283      1     83  3   0x1100092  kqread    ntpd
 25275  206721  16938     73  3   0x1100090  kqread    syslogd
 16938  424245      1      0  3    0x100082  netio     syslogd
 92580  279098      0      0  3     0x14200  bored     smr
 40549  159120      0      0  3     0x14200  pgzero    zerothread
 12488  115575      0      0  3     0x14200  aiodoned  aiodoned
 91171  460632      0      0  3     0x14200  syncer    update
 83952  275089      0      0  3     0x14200  cleaner   cleaner
  6394  148862      0      0  3     0x14200  reaper    reaper
 60888  287201      0      0  3     0x14200  pgdaemon  pagedaemon
 25804  403088      0      0  3     0x14200  usbtsk    usbtask
 39034  435293      0      0  3     0x14200  usbatsk

Re: bnxt panic - HWRM_RING_ALLOC command returned RESOURCE_ALLOC_ERROR error.

2024-01-02 Thread Jonathan Matthew
On Wed, Jan 03, 2024 at 01:50:06AM +0100, Alexander Bluhm wrote:
> On Wed, Jan 03, 2024 at 12:26:26AM +0100, Hrvoje Popovski wrote:
> > While testing kettenis@ ipl diff from tech@ and doing iperf3 to bnxt
> > interface and ifconfig bnxt0 down/up at the same time I can trigger
> > a panic. The panic can be triggered without the kettenis@ diff as well.
> 
> It is easy to reproduce.  ifconfig bnxt1 down/up a few times while
> receiving TCP traffic with iperf3.  Machine still has kettenis@ diff.
> My panic looks different.

It looks like I wasn't trying very hard when I wrote bnxt_down().
I think there's also a problem with bnxt_up() unwinding after failure
in various places, but that's a different issue.

This makes it more resilient for me, though it still logs
'bnxt0: unexpected completion type 3' a lot if I take the interface
down while it's in use.  I'll look at that separately.


Index: if_bnxt.c
===
RCS file: /cvs/src/sys/dev/pci/if_bnxt.c,v
retrieving revision 1.39
diff -u -p -r1.39 if_bnxt.c
--- if_bnxt.c   10 Nov 2023 15:51:20 -  1.39
+++ if_bnxt.c   3 Jan 2024 06:36:02 -
@@ -1158,12 +1159,16 @@ bnxt_down(struct bnxt_softc *sc)
 
 	CLR(ifp->if_flags, IFF_RUNNING);
 
+	intr_barrier(sc->sc_ih);
+
 	for (i = 0; i < sc->sc_nqueues; i++) {
 		ifq_clr_oactive(ifp->if_ifqs[i]);
 		ifq_barrier(ifp->if_ifqs[i]);
-		/* intr barrier? */
 
-		timeout_del(&sc->sc_queues[i].q_rx.rx_refill);
+		timeout_del_barrier(&sc->sc_queues[i].q_rx.rx_refill);
+
+		if (sc->sc_intrmap != NULL)
+			intr_barrier(sc->sc_queues[i].q_ihc);
 	}
 
 	bnxt_hwrm_free_filter(sc, &sc->sc_vnic);
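The ordering this change gives bnxt_down() (clear IFF_RUNNING, wait for handlers that may still be running, and only then release the rings) can be modelled in userland. This is an analogy under assumptions, not kernel code: a pthread mutex held for the whole of each simulated interrupt plays the role of interrupt context, model_barrier() stands in for intr_barrier(9), which waits until a handler already running for that cookie has finished, and every model_* name is hypothetical. timeout_del_barrier() plays the same role for the rx refill timeout.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of one interrupt source and the ring it touches. */
struct model_nic {
	pthread_mutex_t intr_lock;	/* held for one whole "interrupt" */
	atomic_bool running;		/* stands in for IFF_RUNNING */
	int *ring;			/* resource released by model_down() */
	int handled;			/* interrupts that touched the ring */
};

static int
model_init(struct model_nic *n)
{
	if (pthread_mutex_init(&n->intr_lock, NULL) != 0)
		return -1;
	n->ring = calloc(16, sizeof(*n->ring));
	if (n->ring == NULL) {
		pthread_mutex_destroy(&n->intr_lock);
		return -1;
	}
	atomic_init(&n->running, true);
	n->handled = 0;
	return 0;		/* mutex teardown omitted for brevity */
}

/* Interrupt handler analogue: touches the ring only while running. */
static void
model_intr(struct model_nic *n)
{
	pthread_mutex_lock(&n->intr_lock);
	if (atomic_load(&n->running)) {
		n->ring[0]++;
		n->handled++;
	}
	pthread_mutex_unlock(&n->intr_lock);
}

/* intr_barrier() analogue: returns once no handler is mid-flight. */
static void
model_barrier(struct model_nic *n)
{
	pthread_mutex_lock(&n->intr_lock);
	pthread_mutex_unlock(&n->intr_lock);
}

/* bnxt_down()-style ordering: clear the flag, wait, then free. */
static void
model_down(struct model_nic *n)
{
	atomic_store(&n->running, false);
	model_barrier(n);
	/* A handler entering from now on sees running == false. */
	free(n->ring);
	n->ring = NULL;
}
```

In this model, a second thread calling model_intr() while the main thread runs model_down() either finishes its ring access before model_barrier() returns or sees running == false afterwards. Dropping the barrier reopens the window in which the ring is freed under a running handler, which resembles the bnxt_rx_fill() fault from bnxt_intr() reported in this thread.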




Re: bnxt panic - HWRM_RING_ALLOC command returned RESOURCE_ALLOC_ERROR error.

2024-01-02 Thread Alexander Bluhm
On Wed, Jan 03, 2024 at 12:26:26AM +0100, Hrvoje Popovski wrote:
> While testing kettenis@ ipl diff from tech@ and doing iperf3 to bnxt
> interface and ifconfig bnxt0 down/up at the same time I can trigger
> a panic. The panic can be triggered without the kettenis@ diff as well.

It is easy to reproduce.  ifconfig bnxt1 down/up a few times while
receiving TCP traffic with iperf3.  Machine still has kettenis@ diff.
My panic looks different.

root@ot42:.../~# ifconfig bnxt1 down
bnxt1: unexpected completion type 3
...
bnxt1: unexpected completion type 3
uvm_fault(0x8256c0b8, 0x30, 0, 1) -> e
kernel: page fault trap, code=0
Stopped at      bnxt_rx_fill+0x5f:  movq    0x30(%rdx),%rdx
    TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
 452275   8801      0        0x13      0x400    3  iperf3
 343849  34751      0     0x14000      0x200    2  softnet1
 154248  41240      0     0x14000      0x200    1  softnet0
bnxt_rx_fill(802df888) at bnxt_rx_fill+0x5f
bnxt_intr(802df888) at bnxt_intr+0x406
intr_handler(80005c04c040,800a7800) at intr_handler+0x72
Xintr_ioapic_edge1_untramp() at Xintr_ioapic_edge1_untramp+0x18f
acpicpu_idle() at acpicpu_idle+0x11f
sched_idle(80005a61fff0) at sched_idle+0x282
end trace frame: 0x0, count: 9
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports.  Insufficient info makes it difficult to find and fix bugs.
ddb{7}> show panic
*cpu7: uvm_fault(0x8256c0b8, 0x30, 0, 1) -> e
ddb{7}> trace
bnxt_rx_fill(802df888) at bnxt_rx_fill+0x5f
bnxt_intr(802df888) at bnxt_intr+0x406
intr_handler(80005c04c040,800a7800) at intr_handler+0x72
Xintr_ioapic_edge1_untramp() at Xintr_ioapic_edge1_untramp+0x18f
acpicpu_idle() at acpicpu_idle+0x11f
sched_idle(80005a61fff0) at sched_idle+0x282
end trace frame: 0x0, count: -6
ddb{7}> show register
rdi               0x802df958
rsi               0x802df918
rbp           0x80005c04bf20
rbx               0x802df024
rdx                        0
rcx                        0
rax                      0x4
r8                    0xcc01
r9                       0x1
r10       0x7be05f26dfeb8079
r11       0x81c2c48b86f2e7bd
r12                      0x1
r13                      0x1
r14               0x802df888
r15               0x802df000
rip               0x81b6180f    bnxt_rx_fill+0x5f
cs                       0x8
rflags               0x10202    __ALIGN_SIZE+0xf202
rsp           0x80005c04bee0
ss                      0x10
bnxt_rx_fill+0x5f:  movq    0x30(%rdx),%rdx

In my case, I would say rx->rx_ring_mem is NULL.
	slots = bnxt_rx_fill_slots(sc, &rx->rx_ring,
	    BNXT_DMA_KVA(rx->rx_ring_mem), rx->rx_slots,
	    &rx->rx_prod, MCLBYTES,
	    RX_PROD_PKT_BD_TYPE_RX_PROD_PKT, slots);
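The register dump supports this reading: the faulting instruction is `movq 0x30(%rdx),%rdx` with rdx = 0, so the load address is 0 + 0x30, matching the uvm_fault address 0x30. Reading a member at offset 0x30 through a NULL struct pointer produces exactly that address. The sketch below computes it with offsetof(); struct model_dmamem is a hypothetical layout chosen only so that its kva member sits at 0x30, not the real bnxt DMA memory descriptor.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical DMA memory descriptor: member sizes are chosen only so
 * that `kva` lands at offset 0x30. This is NOT the real bnxt layout.
 */
struct model_dmamem {
	uint64_t map;
	uint64_t seg[4];
	uint64_t size;
	uint64_t kva;
};

/*
 * Address a load of base->kva would touch. Nothing is dereferenced;
 * with base == 0 the result is just the member offset.
 */
static uintptr_t
model_kva_load_addr(uintptr_t base)
{
	return base + offsetof(struct model_dmamem, kva);
}
```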

For Hrvoje's panic it looks like tx->tx_slots is NULL.
	bnxt_free_slots(sc, tx->tx_slots, tx->tx_ring.ring_size,
	    tx->tx_ring.ring_size);
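The same arithmetic fits Hrvoje's dump: the faulting instruction is `movq 0(%r12,%rax,1),%rsi` with r12 = 0 and rax = 0xff0, a load from NULL + 0xff0. Assuming each slot is 16 bytes (two 64-bit words, e.g. a DMA map and an mbuf pointer on amd64; the layout is an assumption here), 0xff0 / 16 = 255, so the faulting access would be slot 255 of an array whose base pointer, tx->tx_slots, is NULL. struct model_slot and both helpers are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-byte slot: two 64-bit words, like two pointers on amd64. */
struct model_slot {
	uint64_t map;
	uint64_t m;
};

/* Address of slots[idx] for an array starting at `base`. */
static uintptr_t
model_slot_addr(uintptr_t base, size_t idx)
{
	return base + idx * sizeof(struct model_slot);
}

/* Slot index implied by a fault address when the array base is NULL. */
static size_t
model_slot_index(uintptr_t fault_addr)
{
	return fault_addr / sizeof(struct model_slot);
}
```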



bnxt panic - HWRM_RING_ALLOC command returned RESOURCE_ALLOC_ERROR error.

2024-01-02 Thread Hrvoje Popovski
Hi all,

While testing kettenis@ ipl diff from tech@ and doing iperf3 to bnxt
interface and ifconfig bnxt0 down/up at the same time I can trigger
a panic. The panic can be triggered without the kettenis@ diff as well.


bnxt0: HWRM_RING_ALLOC command returned RESOURCE_ALLOC_ERROR error.
bnxt0: failed to set up tx ring
uvm_fault(0xfd8e57f12a20, 0xff0, 0, 1) -> e
kernel: page fault trap, code=0
Stopped at      bnxt_queue_down+0x62:   movq    0(%r12,%rax,1),%rsi
    TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
*292054  36537      0         0x3          0    0K ifconfig
 163937  81780      0     0x14000     0x4200    6  sensors
bnxt_queue_down(802c9000,802c9f88) at bnxt_queue_down+0x62
bnxt_up(802c9000) at bnxt_up+0x36b
bnxt_ioctl(802c9048,80206910,8000607295f0) at bnxt_ioctl+0x162
ifioctl(fd8e442f2758,80206910,8000607295f0,8000607cf2b0) at ifioctl+0x726
sys_ioctl(8000607cf2b0,8000607296f0,800060729740) at sys_ioctl+0x2af
syscall(8000607297b0) at syscall+0x3b4
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0x726ac871f790, count: 8
https://www.openbsd.org/ddb.html describes the minimum info required in
bug reports.  Insufficient info makes it difficult to find and fix bugs.


ddb{0}> show reg
rdi               0x82485c78    pci_bus_dma_tag
rsi               0x802c9f88
rbp           0x800060729460
rbx                    0x101
rdx                   0xc803
rcx                    0x286
rax                    0xff0
r8                      0x3f
r9                         0
r10       0x96b31028f3e5d46c
r11               0x81825410    _bus_dmamap_destroy
r12                        0
r13                    0x100
r14               0x802c9f88
r15               0x802c9000
rip               0x81db3da2    bnxt_queue_down+0x62
cs                       0x8
rflags               0x10216    __ALIGN_SIZE+0xf216
rsp           0x800060729400
ss                      0x10
bnxt_queue_down+0x62:   movq    0(%r12,%rax,1),%rsi
ddb{0}>


ddb{0}> ps
   PID     TID   PPID    UID  S       FLAGS  WAIT      COMMAND
*36537  292054  47404      0  7         0x3            ifconfig
 86797  280843  47404      0  3    0x100083  kqread    iperf3
 86797  429491  47404      0  3   0x4100083  kqread    iperf3
 86797  214299  47404      0  3   0x4100083  kqread    iperf3
 86797  368590  47404      0  3   0x4100083  kqread    iperf3
 86797  380965  47404      0  3   0x4100083  kqread    iperf3
 47404  299766      1      0  3    0x10008b  sigsusp   ksh
  7161  521423      1      0  3    0x100098  kqread    cron
 39740  121938  83604     95  3   0x1100092  kqread    smtpd
 94839  467744  83604    103  3   0x1100092  kqread    smtpd
 31264  522699  83604     95  3   0x1100092  kqread    smtpd
 94528  511199  83604     95  3    0x100092  kqread    smtpd
 37502  123618  83604     95  3   0x1100092  kqread    smtpd
 89306   15887  83604     95  3   0x1100092  kqread    smtpd
 83604  206718      1      0  3    0x100080  kqread    smtpd
   428   70010      1      0  3        0x88  kqread    sshd
 94146  379619      1      0  3    0x100080  kqread    ntpd
 23446  401588  82414     83  3    0x100092  kqread    ntpd
 82414  378350      1     83  3   0x1100092  kqread    ntpd
 80891  252069  55631     73  3   0x1100090  kqread    syslogd
 55631   62854      1      0  3    0x100082  netio     syslogd
 60491  452354      0      0  3     0x14200  bored     smr
 20945   92407      0      0  3     0x14200  pgzero    zerothread
 36925    5987      0      0  3     0x14200  aiodoned  aiodoned
 55091  437847      0      0  3     0x14200  syncer    update
 13970  164134      0      0  3     0x14200  cleaner   cleaner
 36841  522592      0      0  3     0x14200  reaper    reaper
 93326  303752      0      0  3     0x14200  pgdaemon  pagedaemon
  7898  311095      0      0  3     0x14200  usbtsk    usbtask
  2747  192075      0      0  3     0x14200  usbatsk   usbatsk
 97645  203456      0      0  3  0x40014200  acpi0     acpi0
 57525   67008      0      0  7  0x40014200            idle23
 51862  472206      0      0  7  0x40014200            idle22
 60651  418998      0      0  7  0x40014200            idle21
  3576  237393      0      0  7  0x40014200            idle20
  6504  170181      0      0  7  0x40014200            idle19
 20706    3186      0      0  7  0x40014200            idle18
 78053  233580      0      0  7  0x40014200            idle17
 29625   58284      0      0  7  0x40014200            idle16
 94538  146456      0      0  7  0x40014200            idle15
 84902  429192      0      0  7  0x40014200