Re: OpenBSD 6.7-beta on arm64 with urtwn(4) gets panic: _dmamap_sync: ran off map!
On Mon, Jun 01, 2020 at 08:46:32AM +, Mikolaj Kucharski wrote: > On Sat, Apr 18, 2020 at 02:01:14PM +0200, Patrick Wildt wrote: > > On Tue, Apr 14, 2020 at 06:42:53PM +, Mikolaj Kucharski wrote: > > > On Tue, Apr 14, 2020 at 11:06:00AM +0200, Patrick Wildt wrote: > > > > On Mon, Apr 13, 2020 at 09:36:04PM +0200, Mark Kettenis wrote: > > > > > Can you print the status (full 32 bits) of that particular QTD? > > > > > > > > Yeah, I wish I could see the status field for each sqtd in that > > > > chain. > > > > > > XXX #1# ehci_idone: len=4 actlen=4294951308, xf_actlen=0, status=0x8d00, > > > sqtd_count=3 > > > XXX #2# ehci_idone: x_actlen=4294951304, sqtd_len=8, ehci_qtd_len=16000, > > > status=0xbe808d00, qtd_status=0xbe808d00, sqtd_count=1 > > > XXX #2# ehci_idone: x_actlen=4294951308, sqtd_len=4, ehci_qtd_len=0, > > > status=0xc00, qtd_status=0xc00, sqtd_count=2 > > > XXX #2# ehci_idone: x_actlen=4294951308, sqtd_len=0, ehci_qtd_len=0, > > > status=0x8d00, qtd_status=0x8d00, sqtd_count=3 > > > panic: _dmamap_sync: ran off map! > > > > Thanks for the information! I'll try and see if I can find what's > > happening. But yeah, that's good information. This is a control > > transfers, which has 3 transfer descriptors. > > > > The three TDs should be: SETUP, DATA (because of len=4), and STATUS. > > > > As you can see, the middle one, DATA, has a len of 4, and the BYTES > > field in the status is 0, indicating that the transfer completed. > > > > Then we have STATUS, which has no len, looks just fine as well. > > > > But the first one, the SETUP, is weird. The len is fine, 8 bytes, > > but ehci_qtd_len looks completely off, the BYTES field also says > > "lots of bytes remaining". This is where we fuck up, I suppose. > > > > What's weird is that we actually account it, because we have this > > check: > > > > if (EHCI_QTD_GET_PID(status) != EHCI_QTD_PID_SETUP) > > actlen += sqtd->len - EHCI_QTD_GET_BYTES(status); > > > > But it looks like the status field on the SETUP descriptor does > > not say PID_SETUP, but PID_IN. Did the DMA controller change the > > bit? Or did someone else overwrite it? > > > > Patrick > > I'm kinda stuck with this. I added KASSERT()s in places where I think > condtions should be checked and otherwise kernel should panic. Obviously > as I can trigger bogus condition I ended up with following ddb trace: > > urtwn0: SCAN -> INIT > XXX S1 ehci_idone: actlen=4, sqtd_len=0, ehci_qtd_len=16000, status=0x3e808d00 > XXX S2 ehci_idone: actlen=-15996, sqtd_len=0, ehci_qtd_len=16000, > status=0x3e808d00 > panic: kernel diagnostic assertion "sqtd_len >= ehci_qtd_len" failed: file > "/home/mkucharski/src/sys/dev/usb/ehci.c", line 915 > ... > db_enter() at panic+0x14c > panic() at ehci_idone+0x264 > ehci_isoc_idone() at ehci_softintr+0x158 > ehci_softintr() at softintr_biglock_wrap+0x1c > softintr_biglock_wrap() at softintr_dispatch+0x9c > softintr_dispatch() at arm_do_pending_intr+0xa8 > arm_do_pending_intr() at ampintc_irq_handler+0x178 > > This is retyped by hand from a glass console, be aware of typos. > > I don't understand why ehci_idone() goes to ehci_isoc_idone() in above > ddb trace output. Hi, I haven't gotten to it yet, but mpi@ said we should try to rely less on the hardware. I think an idea would be to store in each sqtd whether or not it carries data (that needs to be counted). So then you only read the actlen from the SQTD if it actually is supposed to carry data. This way we don't even try to read a length from anything else, like STATUS. Patrick
Re: OpenBSD 6.7-beta on arm64 with urtwn(4) gets panic: _dmamap_sync: ran off map!
On Sat, Apr 18, 2020 at 02:01:14PM +0200, Patrick Wildt wrote: > On Tue, Apr 14, 2020 at 06:42:53PM +, Mikolaj Kucharski wrote: > > On Tue, Apr 14, 2020 at 11:06:00AM +0200, Patrick Wildt wrote: > > > On Mon, Apr 13, 2020 at 09:36:04PM +0200, Mark Kettenis wrote: > > > > Can you print the status (full 32 bits) of that particular QTD? > > > > > > Yeah, I wish I could see the status field for each sqtd in that > > > chain. > > > > XXX #1# ehci_idone: len=4 actlen=4294951308, xf_actlen=0, status=0x8d00, > > sqtd_count=3 > > XXX #2# ehci_idone: x_actlen=4294951304, sqtd_len=8, ehci_qtd_len=16000, > > status=0xbe808d00, qtd_status=0xbe808d00, sqtd_count=1 > > XXX #2# ehci_idone: x_actlen=4294951308, sqtd_len=4, ehci_qtd_len=0, > > status=0xc00, qtd_status=0xc00, sqtd_count=2 > > XXX #2# ehci_idone: x_actlen=4294951308, sqtd_len=0, ehci_qtd_len=0, > > status=0x8d00, qtd_status=0x8d00, sqtd_count=3 > > panic: _dmamap_sync: ran off map! > > Thanks for the information! I'll try and see if I can find what's > happening. But yeah, that's good information. This is a control > transfers, which has 3 transfer descriptors. > > The three TDs should be: SETUP, DATA (because of len=4), and STATUS. > > As you can see, the middle one, DATA, has a len of 4, and the BYTES > field in the status is 0, indicating that the transfer completed. > > Then we have STATUS, which has no len, looks just fine as well. > > But the first one, the SETUP, is weird. The len is fine, 8 bytes, > but ehci_qtd_len looks completely off, the BYTES field also says > "lots of bytes remaining". This is where we fuck up, I suppose. > > What's weird is that we actually account it, because we have this > check: > > if (EHCI_QTD_GET_PID(status) != EHCI_QTD_PID_SETUP) > actlen += sqtd->len - EHCI_QTD_GET_BYTES(status); > > But it looks like the status field on the SETUP descriptor does > not say PID_SETUP, but PID_IN. Did the DMA controller change the > bit? Or did someone else overwrite it? > > Patrick I'm kinda stuck with this. I added KASSERT()s in places where I think condtions should be checked and otherwise kernel should panic. Obviously as I can trigger bogus condition I ended up with following ddb trace: urtwn0: SCAN -> INIT XXX S1 ehci_idone: actlen=4, sqtd_len=0, ehci_qtd_len=16000, status=0x3e808d00 XXX S2 ehci_idone: actlen=-15996, sqtd_len=0, ehci_qtd_len=16000, status=0x3e808d00 panic: kernel diagnostic assertion "sqtd_len >= ehci_qtd_len" failed: file "/home/mkucharski/src/sys/dev/usb/ehci.c", line 915 ... db_enter() at panic+0x14c panic() at ehci_idone+0x264 ehci_isoc_idone() at ehci_softintr+0x158 ehci_softintr() at softintr_biglock_wrap+0x1c softintr_biglock_wrap() at softintr_dispatch+0x9c softintr_dispatch() at arm_do_pending_intr+0xa8 arm_do_pending_intr() at ampintc_irq_handler+0x178 This is retyped by hand from a glass console, be aware of typos. I don't understand why ehci_idone() goes to ehci_isoc_idone() in above ddb trace output. Index: sys/dev/usb/ehci.c === RCS file: /cvs/src/sys/dev/usb/ehci.c,v retrieving revision 1.210 diff -u -p -u -r1.210 ehci.c --- sys/dev/usb/ehci.c 3 Apr 2020 20:11:47 - 1.210 +++ sys/dev/usb/ehci.c 1 Jun 2020 06:19:27 - @@ -870,6 +870,7 @@ ehci_idone(struct usbd_xfer *xfer) struct ehci_soft_qtd *sqtd; u_int32_t status = 0, nstatus = 0; int actlen, cerr; + u_int32_t sqtd_len, ehci_qtd_len; #ifdef DIAGNOSTIC { @@ -899,8 +900,20 @@ ehci_idone(struct usbd_xfer *xfer) if (sqtd->qtd.qtd_next == htole32(EHCI_LINK_TERMINATE) && EHCI_QTD_GET_BYTES(status) == 0) status &= ~EHCI_QTD_HALTED; - if (EHCI_QTD_GET_PID(status) != EHCI_QTD_PID_SETUP) - actlen += sqtd->len - EHCI_QTD_GET_BYTES(status); + if (EHCI_QTD_GET_PID(status) != EHCI_QTD_PID_SETUP) { + sqtd_len = sqtd->len; + ehci_qtd_len = EHCI_QTD_GET_BYTES(status); + if (ehci_qtd_len > sqtd_len) { + printf("XXX S1 %s: actlen=%d, sqtd_len=%u, ehci_qtd_len=%u, status=0x%x\n", + __func__, actlen, sqtd_len, ehci_qtd_len, status); + } + actlen += sqtd_len - ehci_qtd_len; + if (ehci_qtd_len > sqtd_len) { + printf("XXX S2 %s: actlen=%d, sqtd_len=%u, ehci_qtd_len=%u, status=0x%x\n", + __func__, actlen, sqtd_len, ehci_qtd_len, status); + } + KASSERT(sqtd_len >= ehci_qtd_len); + } } cerr = EHCI_QTD_GET_CERR(status); Index: sys/dev/usb/usbdi.c === RCS file: /cvs/src/sys/dev/usb/usbdi.c,v retrieving
Re: OpenBSD 6.7-beta on arm64 with urtwn(4) gets panic: _dmamap_sync: ran off map!
On Tue, Apr 14, 2020 at 06:42:53PM +, Mikolaj Kucharski wrote: > On Tue, Apr 14, 2020 at 11:06:00AM +0200, Patrick Wildt wrote: > > On Mon, Apr 13, 2020 at 09:36:04PM +0200, Mark Kettenis wrote: > > > Can you print the status (full 32 bits) of that particular QTD? > > > > Yeah, I wish I could see the status field for each sqtd in that > > chain. > > XXX #1# ehci_idone: len=4 actlen=4294951308, xf_actlen=0, status=0x8d00, > sqtd_count=3 > XXX #2# ehci_idone: x_actlen=4294951304, sqtd_len=8, ehci_qtd_len=16000, > status=0xbe808d00, qtd_status=0xbe808d00, sqtd_count=1 > XXX #2# ehci_idone: x_actlen=4294951308, sqtd_len=4, ehci_qtd_len=0, > status=0xc00, qtd_status=0xc00, sqtd_count=2 > XXX #2# ehci_idone: x_actlen=4294951308, sqtd_len=0, ehci_qtd_len=0, > status=0x8d00, qtd_status=0x8d00, sqtd_count=3 > panic: _dmamap_sync: ran off map! Thanks for the information! I'll try and see if I can find what's happening. But yeah, that's good information. This is a control transfers, which has 3 transfer descriptors. The three TDs should be: SETUP, DATA (because of len=4), and STATUS. As you can see, the middle one, DATA, has a len of 4, and the BYTES field in the status is 0, indicating that the transfer completed. Then we have STATUS, which has no len, looks just fine as well. But the first one, the SETUP, is weird. The len is fine, 8 bytes, but ehci_qtd_len looks completely off, the BYTES field also says "lots of bytes remaining". This is where we fuck up, I suppose. What's weird is that we actually account it, because we have this check: if (EHCI_QTD_GET_PID(status) != EHCI_QTD_PID_SETUP) actlen += sqtd->len - EHCI_QTD_GET_BYTES(status); But it looks like the status field on the SETUP descriptor does not say PID_SETUP, but PID_IN. Did the DMA controller change the bit? Or did someone else overwrite it? Patrick > Above are my debug lines in dmesg generated just before panic. > > The code which prints those lines is below. It's similar for-loop to > what is currently in CVS in ehci_idone() function. I'm doing this > for-loop twice. First version of for-loop is executed as is in CVS > and record condition that panic will happen, then below for-loop is > executed again with more debug output, otherwise printf() will slow > the code path so mutch that I will not be able to trigger error > condition. > > With below code (lines from 1 to 29) I already know panic condition is > met and I'm executing code with printf()s added. > > sqtd_count=3 means that for-loop from line 6 to 28 was executed 3 times. > > XXX #1# ehci_idone: len=4 actlen=4294951308, xf_actlen=0, status=0x8d00, > sqtd_count=3 > > Above message is from line 2 of the code listed below. > > Then we enter the for-loop at line 6, which iterates 3 times. Each > iteration gives following results: > > XXX #2# ehci_idone: x_actlen=4294951304, sqtd_len=8, ehci_qtd_len=16000, > status=0xbe808d00, qtd_status=0xbe808d00, sqtd_count=1 > XXX #2# ehci_idone: x_actlen=4294951308, sqtd_len=4, ehci_qtd_len=0, > status=0xc00, qtd_status=0xc00, sqtd_count=2 > XXX #2# ehci_idone: x_actlen=4294951308, sqtd_len=0, ehci_qtd_len=0, > status=0x8d00, qtd_status=0x8d00, sqtd_count=3 > > then code proceeds and triggers panic. > > 1if (xfer->length < actlen) { > 2printf("XXX #1# ehci_idone: len=%u, actlen=%u, > xf_actlen=%u, " > 3"status=0x%x, sqtd_count=%d\n", xfer->length, > actlen, xfer->actlen, status, sqtd_count); > 4x_actlen = 0; > 5sqtd_count = 0; > 6for (sqtd = ex->sqtdstart; sqtd != NULL; sqtd = sqtd->nextqtd) { > 7sqtd_count++; > 8usb_syncmem(>dma, sqtd->offs, sizeof(sqtd->qtd), > 9BUS_DMASYNC_POSTWRITE | BUS_DMASYNC_POSTREAD); > 10nstatus = letoh32(sqtd->qtd.qtd_status); > 11qtd_status = nstatus; > 12if (nstatus & EHCI_QTD_ACTIVE) > 13break; > 14 > 15status = nstatus; > 16/* halt is ok if descriptor is last, and complete */ > 17if (sqtd->qtd.qtd_next == htole32(EHCI_LINK_TERMINATE) > && > 18EHCI_QTD_GET_BYTES(status) == 0) > 19status &= ~EHCI_QTD_HALTED; > 20if (EHCI_QTD_GET_PID(status) != EHCI_QTD_PID_SETUP) { > 21sqtd_len = sqtd->len; > 22ehci_qtd_len = EHCI_QTD_GET_BYTES(status); > 23x_actlen += sqtd_len - ehci_qtd_len; > 24printf("XXX #2# ehci_idone: x_actlen=%u, > sqtd_len=%u, ehci_qtd_len=%u, " > 25"status=0x%x, qtd_status=0x%x, > sqtd_count=%d\n", > 26x_actlen, sqtd_len, ehci_qtd_len, > status, qtd_status, sqtd_count); > 27} > 28
Re: OpenBSD 6.7-beta on arm64 with urtwn(4) gets panic: _dmamap_sync: ran off map!
On Tue, Apr 14, 2020 at 11:06:00AM +0200, Patrick Wildt wrote: > On Mon, Apr 13, 2020 at 09:36:04PM +0200, Mark Kettenis wrote: > > Can you print the status (full 32 bits) of that particular QTD? > > Yeah, I wish I could see the status field for each sqtd in that > chain. XXX #1# ehci_idone: len=4 actlen=4294951308, xf_actlen=0, status=0x8d00, sqtd_count=3 XXX #2# ehci_idone: x_actlen=4294951304, sqtd_len=8, ehci_qtd_len=16000, status=0xbe808d00, qtd_status=0xbe808d00, sqtd_count=1 XXX #2# ehci_idone: x_actlen=4294951308, sqtd_len=4, ehci_qtd_len=0, status=0xc00, qtd_status=0xc00, sqtd_count=2 XXX #2# ehci_idone: x_actlen=4294951308, sqtd_len=0, ehci_qtd_len=0, status=0x8d00, qtd_status=0x8d00, sqtd_count=3 panic: _dmamap_sync: ran off map! Above are my debug lines in dmesg generated just before panic. The code which prints those lines is below. It's similar for-loop to what is currently in CVS in ehci_idone() function. I'm doing this for-loop twice. First version of for-loop is executed as is in CVS and record condition that panic will happen, then below for-loop is executed again with more debug output, otherwise printf() will slow the code path so mutch that I will not be able to trigger error condition. With below code (lines from 1 to 29) I already know panic condition is met and I'm executing code with printf()s added. sqtd_count=3 means that for-loop from line 6 to 28 was executed 3 times. XXX #1# ehci_idone: len=4 actlen=4294951308, xf_actlen=0, status=0x8d00, sqtd_count=3 Above message is from line 2 of the code listed below. Then we enter the for-loop at line 6, which iterates 3 times. Each iteration gives following results: XXX #2# ehci_idone: x_actlen=4294951304, sqtd_len=8, ehci_qtd_len=16000, status=0xbe808d00, qtd_status=0xbe808d00, sqtd_count=1 XXX #2# ehci_idone: x_actlen=4294951308, sqtd_len=4, ehci_qtd_len=0, status=0xc00, qtd_status=0xc00, sqtd_count=2 XXX #2# ehci_idone: x_actlen=4294951308, sqtd_len=0, ehci_qtd_len=0, status=0x8d00, qtd_status=0x8d00, sqtd_count=3 then code proceeds and triggers panic. 1 if (xfer->length < actlen) { 2 printf("XXX #1# ehci_idone: len=%u, actlen=%u, xf_actlen=%u, " 3 "status=0x%x, sqtd_count=%d\n", xfer->length, actlen, xfer->actlen, status, sqtd_count); 4 x_actlen = 0; 5 sqtd_count = 0; 6 for (sqtd = ex->sqtdstart; sqtd != NULL; sqtd = sqtd->nextqtd) { 7 sqtd_count++; 8 usb_syncmem(>dma, sqtd->offs, sizeof(sqtd->qtd), 9 BUS_DMASYNC_POSTWRITE | BUS_DMASYNC_POSTREAD); 10 nstatus = letoh32(sqtd->qtd.qtd_status); 11 qtd_status = nstatus; 12 if (nstatus & EHCI_QTD_ACTIVE) 13 break; 14 15 status = nstatus; 16 /* halt is ok if descriptor is last, and complete */ 17 if (sqtd->qtd.qtd_next == htole32(EHCI_LINK_TERMINATE) && 18 EHCI_QTD_GET_BYTES(status) == 0) 19 status &= ~EHCI_QTD_HALTED; 20 if (EHCI_QTD_GET_PID(status) != EHCI_QTD_PID_SETUP) { 21 sqtd_len = sqtd->len; 22 ehci_qtd_len = EHCI_QTD_GET_BYTES(status); 23 x_actlen += sqtd_len - ehci_qtd_len; 24 printf("XXX #2# ehci_idone: x_actlen=%u, sqtd_len=%u, ehci_qtd_len=%u, " 25 "status=0x%x, qtd_status=0x%x, sqtd_count=%d\n", 26 x_actlen, sqtd_len, ehci_qtd_len, status, qtd_status, sqtd_count); 27 } 28 } 29 } I can provide full patch if someone is interested what exactly I am doing. -- Regards, Mikolaj
Re: OpenBSD 6.7-beta on arm64 with urtwn(4) gets panic: _dmamap_sync: ran off map!
On Mon, Apr 13, 2020 at 09:36:04PM +0200, Mark Kettenis wrote: > > From: "Theo de Raadt" > > cc: Patrick Wildt > > Comments: In-reply-to Mikolaj Kucharski > >message dated "Mon, 13 Apr 2020 18:55:37 -." > > Content-Type: text/plain; charset="us-ascii" > > Content-ID: <61132.158680562...@cvs.openbsd.org> > > Date: Mon, 13 Apr 2020 13:20:27 -0600 > > > > Mikolaj Kucharski wrote: > > > > > On Mon, Apr 13, 2020 at 12:44:54PM -0600, Theo de Raadt wrote: > > > > Then you need to look at why this crazy value is composed here in this > > > > loop: > > > > > > > > actlen = 0; > > > > for (sqtd = ex->sqtdstart; sqtd != NULL; sqtd = sqtd->nextqtd) { > > > > usb_syncmem(>dma, sqtd->offs, sizeof(sqtd->qtd), > > > > BUS_DMASYNC_POSTWRITE | BUS_DMASYNC_POSTREAD); > > > > nstatus = letoh32(sqtd->qtd.qtd_status); > > > > if (nstatus & EHCI_QTD_ACTIVE) > > > > break; > > > > > > > > status = nstatus; > > > > /* halt is ok if descriptor is last, and complete */ > > > > if (sqtd->qtd.qtd_next == htole32(EHCI_LINK_TERMINATE) > > > > && > > > > EHCI_QTD_GET_BYTES(status) == 0) > > > > status &= ~EHCI_QTD_HALTED; > > > > if (EHCI_QTD_GET_PID(status) != EHCI_QTD_PID_SETUP) > > > > actlen += sqtd->len - > > > > EHCI_QTD_GET_BYTES(status); > > > > } > > > > > > But I just described that in one of my previous emails few minutes ago. > > > > > > In my diff I'm using second loop of the same code, but I guess I can > > > provide more surgurucal diff to be more explicit. What I wrote in my > > > previous email can be written also as follows: > > > > > > actlen += sqtd->len - EHCI_QTD_GET_BYTES(status); > > > > > > In above equation, on first for loop iteration: > > > > > > actlen is 0 > > > EHCI_QTD_GET_BYTES(status) is 16000 > > > sqtd->len is 8 > > > > > > actlen += 8 - 1600; > > > > > > Which results of actlen being 4294951304, from that moment we are > > > heading to kernel panic. > > > > Then next step is to figure out why the sqtd is incoherent. > > I actually suspect that the 16000 here is bogus. Exactly. I looked at it yesterday, without his additional info, and realized that in this case the calculated actlen for that control transfer must be broken. xfer->length was 8, xfer->actlen was 15996, so something was doing (0 - 4), or maybe (4 - 1600), which actually makes much more sense, since otherwise sqtd->len would have been 0, which would be very weird. So now, why does EHCI_QTD_GET_BYTES return 16000... > Can you print the status (full 32 bits) of that particular QTD? Yeah, I wish I could see the status field for each sqtd in that chain. On the bright side: This is not a bug in my DMA diff! But it exposed a bug in the actlen calculation. Patrick
Re: OpenBSD 6.7-beta on arm64 with urtwn(4) gets panic: _dmamap_sync: ran off map!
On Mon, Apr 13, 2020 at 01:20:27PM -0600, Theo de Raadt wrote: > Then next step is to figure out why the sqtd is incoherent. Ok :) If someone has spare time to help me more on this I'm all ears. Maybe this crash output makes it more clear: XXX S01: ehci_idone: actlen=0, sqtd_len=8, ehci_qtd_len=16000, status=0x3e808d00 XXX S02: ehci_idone: actlen=4294951304, sqtd_len=8, ehci_qtd_len=16000, status=0x3e808d00 panic: _dmamap_sync: ran off map! Above output with below changes on top of CVS as of today: Index: sys/dev/usb/ehci.c === RCS file: /cvs/src/sys/dev/usb/ehci.c,v retrieving revision 1.210 diff -u -p -u -r1.210 ehci.c --- sys/dev/usb/ehci.c 3 Apr 2020 20:11:47 - 1.210 +++ sys/dev/usb/ehci.c 13 Apr 2020 19:14:44 - @@ -869,7 +869,8 @@ ehci_idone(struct usbd_xfer *xfer) struct ehci_xfer *ex = (struct ehci_xfer *)xfer; struct ehci_soft_qtd *sqtd; u_int32_t status = 0, nstatus = 0; - int actlen, cerr; + u_int32_t actlen, sqtd_len, ehci_qtd_len; + int cerr; #ifdef DIAGNOSTIC { @@ -899,8 +900,19 @@ ehci_idone(struct usbd_xfer *xfer) if (sqtd->qtd.qtd_next == htole32(EHCI_LINK_TERMINATE) && EHCI_QTD_GET_BYTES(status) == 0) status &= ~EHCI_QTD_HALTED; - if (EHCI_QTD_GET_PID(status) != EHCI_QTD_PID_SETUP) - actlen += sqtd->len - EHCI_QTD_GET_BYTES(status); + if (EHCI_QTD_GET_PID(status) != EHCI_QTD_PID_SETUP) { + sqtd_len = sqtd->len; + ehci_qtd_len = EHCI_QTD_GET_BYTES(status); + if (ehci_qtd_len > sqtd_len) { + printf("XXX S01 %s: actlen=%u, sqtd_len=%u, ehci_qtd_len=%u, status=0x%x\n", + __func__, actlen, sqtd_len, ehci_qtd_len, status); + } + actlen += sqtd_len - ehci_qtd_len; + if (ehci_qtd_len > sqtd_len) { + printf("XXX S02 %s: actlen=%u, sqtd_len=%u, ehci_qtd_len=%u, status=0x%x\n", + __func__, actlen, sqtd_len, ehci_qtd_len, status); + } + } } cerr = EHCI_QTD_GET_CERR(status); -- Regards, Mikolaj
Re: OpenBSD 6.7-beta on arm64 with urtwn(4) gets panic: _dmamap_sync: ran off map!
> From: "Theo de Raadt" > cc: Patrick Wildt > Comments: In-reply-to Mikolaj Kucharski >message dated "Mon, 13 Apr 2020 18:55:37 -." > Content-Type: text/plain; charset="us-ascii" > Content-ID: <61132.158680562...@cvs.openbsd.org> > Date: Mon, 13 Apr 2020 13:20:27 -0600 > > Mikolaj Kucharski wrote: > > > On Mon, Apr 13, 2020 at 12:44:54PM -0600, Theo de Raadt wrote: > > > Then you need to look at why this crazy value is composed here in this > > > loop: > > > > > > actlen = 0; > > > for (sqtd = ex->sqtdstart; sqtd != NULL; sqtd = sqtd->nextqtd) { > > > usb_syncmem(>dma, sqtd->offs, sizeof(sqtd->qtd), > > > BUS_DMASYNC_POSTWRITE | BUS_DMASYNC_POSTREAD); > > > nstatus = letoh32(sqtd->qtd.qtd_status); > > > if (nstatus & EHCI_QTD_ACTIVE) > > > break; > > > > > > status = nstatus; > > > /* halt is ok if descriptor is last, and complete */ > > > if (sqtd->qtd.qtd_next == htole32(EHCI_LINK_TERMINATE) && > > > EHCI_QTD_GET_BYTES(status) == 0) > > > status &= ~EHCI_QTD_HALTED; > > > if (EHCI_QTD_GET_PID(status) != EHCI_QTD_PID_SETUP) > > > actlen += sqtd->len - EHCI_QTD_GET_BYTES(status); > > > } > > > > But I just described that in one of my previous emails few minutes ago. > > > > In my diff I'm using second loop of the same code, but I guess I can > > provide more surgurucal diff to be more explicit. What I wrote in my > > previous email can be written also as follows: > > > > actlen += sqtd->len - EHCI_QTD_GET_BYTES(status); > > > > In above equation, on first for loop iteration: > > > > actlen is 0 > > EHCI_QTD_GET_BYTES(status) is 16000 > > sqtd->len is 8 > > > > actlen += 8 - 1600; > > > > Which results of actlen being 4294951304, from that moment we are > > heading to kernel panic. > > Then next step is to figure out why the sqtd is incoherent. I actually suspect that the 16000 here is bogus. Can you print the status (full 32 bits) of that particular QTD?
Re: OpenBSD 6.7-beta on arm64 with urtwn(4) gets panic: _dmamap_sync: ran off map!
Mikolaj Kucharski wrote: > On Mon, Apr 13, 2020 at 12:44:54PM -0600, Theo de Raadt wrote: > > Then you need to look at why this crazy value is composed here in this > > loop: > > > > actlen = 0; > > for (sqtd = ex->sqtdstart; sqtd != NULL; sqtd = sqtd->nextqtd) { > > usb_syncmem(>dma, sqtd->offs, sizeof(sqtd->qtd), > > BUS_DMASYNC_POSTWRITE | BUS_DMASYNC_POSTREAD); > > nstatus = letoh32(sqtd->qtd.qtd_status); > > if (nstatus & EHCI_QTD_ACTIVE) > > break; > > > > status = nstatus; > > /* halt is ok if descriptor is last, and complete */ > > if (sqtd->qtd.qtd_next == htole32(EHCI_LINK_TERMINATE) && > > EHCI_QTD_GET_BYTES(status) == 0) > > status &= ~EHCI_QTD_HALTED; > > if (EHCI_QTD_GET_PID(status) != EHCI_QTD_PID_SETUP) > > actlen += sqtd->len - EHCI_QTD_GET_BYTES(status); > > } > > But I just described that in one of my previous emails few minutes ago. > > In my diff I'm using second loop of the same code, but I guess I can > provide more surgurucal diff to be more explicit. What I wrote in my > previous email can be written also as follows: > > actlen += sqtd->len - EHCI_QTD_GET_BYTES(status); > > In above equation, on first for loop iteration: > > actlen is 0 > EHCI_QTD_GET_BYTES(status) is 16000 > sqtd->len is 8 > > actlen += 8 - 1600; > > Which results of actlen being 4294951304, from that moment we are > heading to kernel panic. Then next step is to figure out why the sqtd is incoherent.
Re: OpenBSD 6.7-beta on arm64 with urtwn(4) gets panic: _dmamap_sync: ran off map!
On Mon, Apr 13, 2020 at 12:44:54PM -0600, Theo de Raadt wrote: > Then you need to look at why this crazy value is composed here in this > loop: > > actlen = 0; > for (sqtd = ex->sqtdstart; sqtd != NULL; sqtd = sqtd->nextqtd) { > usb_syncmem(>dma, sqtd->offs, sizeof(sqtd->qtd), > BUS_DMASYNC_POSTWRITE | BUS_DMASYNC_POSTREAD); > nstatus = letoh32(sqtd->qtd.qtd_status); > if (nstatus & EHCI_QTD_ACTIVE) > break; > > status = nstatus; > /* halt is ok if descriptor is last, and complete */ > if (sqtd->qtd.qtd_next == htole32(EHCI_LINK_TERMINATE) && > EHCI_QTD_GET_BYTES(status) == 0) > status &= ~EHCI_QTD_HALTED; > if (EHCI_QTD_GET_PID(status) != EHCI_QTD_PID_SETUP) > actlen += sqtd->len - EHCI_QTD_GET_BYTES(status); > } But I just described that in one of my previous emails few minutes ago. In my diff I'm using second loop of the same code, but I guess I can provide more surgurucal diff to be more explicit. What I wrote in my previous email can be written also as follows: actlen += sqtd->len - EHCI_QTD_GET_BYTES(status); In above equation, on first for loop iteration: actlen is 0 EHCI_QTD_GET_BYTES(status) is 16000 sqtd->len is 8 actlen += 8 - 1600; Which results of actlen being 4294951304, from that moment we are heading to kernel panic. -- Regards, Mikolaj
Re: OpenBSD 6.7-beta on arm64 with urtwn(4) gets panic: _dmamap_sync: ran off map!
Mikolaj Kucharski wrote: > On Mon, Apr 13, 2020 at 12:34:48PM -0600, Theo de Raadt wrote: > > Mikolaj Kucharski wrote: > > > > > Index: sys/dev/usb/ehci.c > > > === > > > RCS file: /cvs/src/sys/dev/usb/ehci.c,v > > > retrieving revision 1.210 > > > diff -u -p -u -r1.210 ehci.c > > > --- sys/dev/usb/ehci.c3 Apr 2020 20:11:47 - 1.210 > > > +++ sys/dev/usb/ehci.c13 Apr 2020 18:04:51 - > > > @@ -869,7 +869,9 @@ ehci_idone(struct usbd_xfer *xfer) > > > struct ehci_xfer *ex = (struct ehci_xfer *)xfer; > > > struct ehci_soft_qtd *sqtd; > > > u_int32_t status = 0, nstatus = 0; > > > - int actlen, cerr; > > > + u_int32_t actlen, x_actlen; > > > + int cerr; > > > + u_int32_t x1, x2; > > > > > > #ifdef DIAGNOSTIC > > > { > > > @@ -903,8 +905,33 @@ ehci_idone(struct usbd_xfer *xfer) > > > actlen += sqtd->len - EHCI_QTD_GET_BYTES(status); > > > } > > > > > > + if (xfer->length < actlen) { > > > + printf("XXX ehci_idone: len=%u, actlen=%u, xf_actlen=%u, " > > > + "status=0x%x\n", xfer->length, actlen, xfer->actlen, > > > status); > > > > actlen hasn't been intialized, so you've introduced a new bug and > > aren't chasing the same bug. It is random stack garbage. > > With my diff actlen is initialized to 0 on line 891. It's existing code > already in CVS, where that initialization exists on line 889. In my diff > initialization of actlen is not visible as I didn't touch those lines, > so it's outside of diffs context. Anyway, I don't see what you see. Then you need to look at why this crazy value is composed here in this loop: actlen = 0; for (sqtd = ex->sqtdstart; sqtd != NULL; sqtd = sqtd->nextqtd) { usb_syncmem(>dma, sqtd->offs, sizeof(sqtd->qtd), BUS_DMASYNC_POSTWRITE | BUS_DMASYNC_POSTREAD); nstatus = letoh32(sqtd->qtd.qtd_status); if (nstatus & EHCI_QTD_ACTIVE) break; status = nstatus; /* halt is ok if descriptor is last, and complete */ if (sqtd->qtd.qtd_next == htole32(EHCI_LINK_TERMINATE) && EHCI_QTD_GET_BYTES(status) == 0) status &= ~EHCI_QTD_HALTED; if (EHCI_QTD_GET_PID(status) != EHCI_QTD_PID_SETUP) actlen += sqtd->len - EHCI_QTD_GET_BYTES(status); }
Re: OpenBSD 6.7-beta on arm64 with urtwn(4) gets panic: _dmamap_sync: ran off map!
On Mon, Apr 13, 2020 at 12:34:48PM -0600, Theo de Raadt wrote: > Mikolaj Kucharski wrote: > > > Index: sys/dev/usb/ehci.c > > === > > RCS file: /cvs/src/sys/dev/usb/ehci.c,v > > retrieving revision 1.210 > > diff -u -p -u -r1.210 ehci.c > > --- sys/dev/usb/ehci.c 3 Apr 2020 20:11:47 - 1.210 > > +++ sys/dev/usb/ehci.c 13 Apr 2020 18:04:51 - > > @@ -869,7 +869,9 @@ ehci_idone(struct usbd_xfer *xfer) > > struct ehci_xfer *ex = (struct ehci_xfer *)xfer; > > struct ehci_soft_qtd *sqtd; > > u_int32_t status = 0, nstatus = 0; > > - int actlen, cerr; > > + u_int32_t actlen, x_actlen; > > + int cerr; > > + u_int32_t x1, x2; > > > > #ifdef DIAGNOSTIC > > { > > @@ -903,8 +905,33 @@ ehci_idone(struct usbd_xfer *xfer) > > actlen += sqtd->len - EHCI_QTD_GET_BYTES(status); > > } > > > > + if (xfer->length < actlen) { > > + printf("XXX ehci_idone: len=%u, actlen=%u, xf_actlen=%u, " > > + "status=0x%x\n", xfer->length, actlen, xfer->actlen, > > status); > > actlen hasn't been intialized, so you've introduced a new bug and > aren't chasing the same bug. It is random stack garbage. With my diff actlen is initialized to 0 on line 891. It's existing code already in CVS, where that initialization exists on line 889. In my diff initialization of actlen is not visible as I didn't touch those lines, so it's outside of diffs context. Anyway, I don't see what you see. -- Regards, Mikolaj
Re: OpenBSD 6.7-beta on arm64 with urtwn(4) gets panic: _dmamap_sync: ran off map!
Mikolaj Kucharski wrote: > Index: sys/dev/usb/ehci.c > === > RCS file: /cvs/src/sys/dev/usb/ehci.c,v > retrieving revision 1.210 > diff -u -p -u -r1.210 ehci.c > --- sys/dev/usb/ehci.c3 Apr 2020 20:11:47 - 1.210 > +++ sys/dev/usb/ehci.c13 Apr 2020 18:04:51 - > @@ -869,7 +869,9 @@ ehci_idone(struct usbd_xfer *xfer) > struct ehci_xfer *ex = (struct ehci_xfer *)xfer; > struct ehci_soft_qtd *sqtd; > u_int32_t status = 0, nstatus = 0; > - int actlen, cerr; > + u_int32_t actlen, x_actlen; > + int cerr; > + u_int32_t x1, x2; > > #ifdef DIAGNOSTIC > { > @@ -903,8 +905,33 @@ ehci_idone(struct usbd_xfer *xfer) > actlen += sqtd->len - EHCI_QTD_GET_BYTES(status); > } > > + if (xfer->length < actlen) { > + printf("XXX ehci_idone: len=%u, actlen=%u, xf_actlen=%u, " > + "status=0x%x\n", xfer->length, actlen, xfer->actlen, > status); actlen hasn't been intialized, so you've introduced a new bug and aren't chasing the same bug. It is random stack garbage.
Re: OpenBSD 6.7-beta on arm64 with urtwn(4) gets panic: _dmamap_sync: ran off map!
On Sun, Apr 12, 2020 at 01:58:18PM +, Mikolaj Kucharski wrote: > On Sun, Apr 12, 2020 at 03:22:01PM +0200, Patrick Wildt wrote: > > Maybe, make that printf conditional on something weird, like the > > following. Since, if it runs off the DMA buffers map, maybe actlen > > is broken. > > > > if (xfer->length < xfer->actlen) > > printf("ehci_idon: > > Bingo. Kernel panic on first boot. > > ehci_idone: len=4, actlen=-15996, cerr=3, status=0x8c00 type=0x0 > At the end is my debugging patch, so it's easier (hopefully) to understand dmesg output just before kernel panic. Please be aware of wrong indentations, but I wanted to make my changes against existing code minimal. Below diff is with latest Patrick's changes in this thread. So, with below diff crash looks as follows (transcribed by hand): urtwn0: RUN -> INIT urtwn0: INIT -> SCAN urtwn0: begin active scan urtwn0: SCAN -> SCAN urtwn0: sending probe_req to ff:ff:ff:ff:ff:ff on channel 2 mode auto urtwn0: SCAN -> INIT XXX ehci_idone: len=1, actlen=4294951305, xf_actlen=0, status=0x8c00 XXX ehci_idone: x_actlen=4294951304, x1=16000, x2=8 XXX ehci_idone: x_actlen=4294951305, x1=0, x2=1 XXX ehci_idone: x_actlen=4294951305, x1=0, x2=0 ehci_idone: len=1, actlen=4294951305, x_actlen=4294951305, cerr=3, status=0x8c00 type=0x0 panic: _dmamap_sync: ran off map! and the usual panic trace with ehci_idone() starts from that moment. With above output I'm following this code path: 908 if (xfer->length < actlen) { 909 printf("XXX ehci_idone: len=%u, actlen=%u, xf_actlen=%u, " 910 "status=0x%x\n", xfer->length, actlen, xfer->actlen, status); 911 x_actlen = 0; 912 for (sqtd = ex->sqtdstart; sqtd != NULL; sqtd = sqtd->nextqtd) { 913 usb_syncmem(>dma, sqtd->offs, sizeof(sqtd->qtd), 914 BUS_DMASYNC_POSTWRITE | BUS_DMASYNC_POSTREAD); 915 nstatus = letoh32(sqtd->qtd.qtd_status); 916 if (nstatus & EHCI_QTD_ACTIVE) 917 break; 918 919 status = nstatus; 920 /* halt is ok if descriptor is last, and complete */ 921 if (sqtd->qtd.qtd_next == htole32(EHCI_LINK_TERMINATE) && 922 EHCI_QTD_GET_BYTES(status) == 0) 923 status &= ~EHCI_QTD_HALTED; 924 if (EHCI_QTD_GET_PID(status) != EHCI_QTD_PID_SETUP) { 925 x1 = EHCI_QTD_GET_BYTES(status); 926 x2 = sqtd->len; 927 x_actlen += x2 - x1; 928 printf("XXX ehci_idone: x_actlen=%u, x1=%u, x2=%u\n", x_actlen, x1, x2); 929 } 930 } 931 } On the first iteration of above for loop x_actlen ends up as 4294951304, in line 927 because EHCI_QTD_GET_BYTES(status) returns 16000, line 925 and sqtd->len returns 8 line 926. So EHCI_QTD_GET_BYTES(status) is bigger than sqtd->len and I don't know is this expected, can this happened. From that moment x_actlen will have large value and that will lead to kernel panic. I actually have seen this before also with urtwn(4), but don't remember was that Pinebook or amd64, but dmesg says: usb_transfer_complete: actlen > len 4294951172 > 4 That was couple of days, or weeks before recent changes to USB stack. That message comes from usb_transfer_complete() file dev/usb/usbdi.c around lines 738 to 744. I'm writing this, so maybe someone can push me in the correct direction how to help debug this further. Index: sys/dev/usb/ehci.c === RCS file: /cvs/src/sys/dev/usb/ehci.c,v retrieving revision 1.210 diff -u -p -u -r1.210 ehci.c --- sys/dev/usb/ehci.c 3 Apr 2020 20:11:47 - 1.210 +++ sys/dev/usb/ehci.c 13 Apr 2020 18:04:51 - @@ -869,7 +869,9 @@ ehci_idone(struct usbd_xfer *xfer) struct ehci_xfer *ex = (struct ehci_xfer *)xfer; struct ehci_soft_qtd *sqtd; u_int32_t status = 0, nstatus = 0; - int actlen, cerr; + u_int32_t actlen, x_actlen; + int cerr; + u_int32_t x1, x2; #ifdef DIAGNOSTIC { @@ -903,8 +905,33 @@ ehci_idone(struct usbd_xfer *xfer) actlen += sqtd->len - EHCI_QTD_GET_BYTES(status); } + if (xfer->length < actlen) { + printf("XXX ehci_idone: len=%u, actlen=%u, xf_actlen=%u, " + "status=0x%x\n", xfer->length, actlen, xfer->actlen, status); + x_actlen = 0; + for (sqtd = ex->sqtdstart; sqtd != NULL; sqtd = sqtd->nextqtd) { + usb_syncmem(>dma, sqtd->offs, sizeof(sqtd->qtd), + BUS_DMASYNC_POSTWRITE | BUS_DMASYNC_POSTREAD); + nstatus = letoh32(sqtd->qtd.qtd_status); + if (nstatus & EHCI_QTD_ACTIVE) + break; + + status = nstatus; + /* halt is ok if descriptor
Re: OpenBSD 6.7-beta on arm64 with urtwn(4) gets panic: _dmamap_sync: ran off map!
On Sun, Apr 12, 2020 at 03:22:01PM +0200, Patrick Wildt wrote: > Maybe, make that printf conditional on something weird, like the > following. Since, if it runs off the DMA buffers map, maybe actlen > is broken. > > if (xfer->length < xfer->actlen) > printf("ehci_idon: Bingo. Kernel panic on first boot. ehci_idone: len=4, actlen=-15996, cerr=3, status=0x8c00 type=0x0 -- Regards, Mikolaj
Re: OpenBSD 6.7-beta on arm64 with urtwn(4) gets panic: _dmamap_sync: ran off map!
On Sun, Apr 12, 2020 at 11:51:52AM +, Mikolaj Kucharski wrote: > On Sun, Apr 12, 2020 at 10:05:28AM +0200, Patrick Wildt wrote: > > I'm now trying to reproduce it, running this diff. I'm sure the printfs > > will slow the system down, but it's one way to see what's going on. > > > > If you can reproduce it easily, would you mind trying again, with the > > debug printfs? > > So far I cannot reproduce kernel panic with below diff. At the end of > this email dmesg output after machine boots up and works for few > minutes. I will continue rebooting and hope panic will finally > trigger. Giving how easy it was to trigger before, and as I rebooted few > times Pinebook already I think with below changes it will not panic :/ Thats weird yhough, since it's the same diff as before, but with a printf. Maybe the printf changes the timing and thus the behaviour. > > Patrick > > > > diff --git a/sys/dev/usb/ehci.c b/sys/dev/usb/ehci.c > > index 41851defe4f..00a9af6903f 100644 > > --- a/sys/dev/usb/ehci.c > > +++ b/sys/dev/usb/ehci.c > > @@ -912,13 +912,18 @@ ehci_idone(struct usbd_xfer *xfer) > > xfer->status = USBD_STALLED; > > else > > xfer->status = USBD_IOERROR; /* more info XXX */ > > - } else > > + } else { > > + if (xfer->actlen) { Maybe, make that printf conditional on something weird, like the following. Since, if it runs off the DMA buffers map, maybe actlen is broken. if (xfer->length < xfer->actlen) printf("ehci_idon: > > + printf("ehci_idone: len=%d, actlen=%d, cerr=%d, " > > + "status=0x%x type=0x%x\n", xfer->length, actlen, > > cerr, > > + status, > > UE_GET_XFERTYPE(xfer->pipe->endpoint->edesc->bmAttributes)); > > + usb_syncmem(>dmabuf, 0, xfer->actlen, > > + usbd_xfer_isread(xfer) ? > > + BUS_DMASYNC_POSTREAD : BUS_DMASYNC_POSTWRITE); > > + } > > xfer->status = USBD_NORMAL_COMPLETION; > > + } > > > > - if (xfer->actlen) > > - usb_syncmem(>dmabuf, 0, xfer->actlen, > > - usbd_xfer_isread(xfer) ? > > - BUS_DMASYNC_POSTREAD : BUS_DMASYNC_POSTWRITE); > > usb_transfer_complete(xfer); > > DPRINTFN(/*12*/2, ("ehci_idone: ex=%p done\n", ex)); > > } > > > ehci_idone: len=16384, actlen=256, cerr=3, status=0xbf008d08 type=0x2 > ehci_idone: len=16384, actlen=256, cerr=3, status=0x3f008d00 type=0x2 > ehci_idone: len=1580, actlen=1580, cerr=3, status=0x8c00 type=0x2 > ehci_idone: len=776, actlen=776, cerr=3, status=0x8c00 type=0x2 > ehci_idone: len=16384, actlen=256, cerr=3, status=0xbf008d00 type=0x2 > ehci_idone: len=16384, actlen=256, cerr=3, status=0x3f008d00 type=0x2 > ehci_idone: len=1580, actlen=1580, cerr=3, status=0x8c00 type=0x2 > ehci_idone: len=16384, actlen=256, cerr=3, status=0xbf008d00 type=0x2 > ehci_idone: len=1208, actlen=1208, cerr=3, status=0x80008c00 type=0x2 > ehci_idone: len=16384, actlen=256, cerr=3, status=0x3f008d00 type=0x2 > ehci_idone: len=16384, actlen=256, cerr=3, status=0xbf008d00 type=0x2 > ehci_idone: len=1480, actlen=1480, cerr=3, status=0x8c00 type=0x2 > ehci_idone: len=752, actlen=752, cerr=3, status=0x8c00 type=0x2 > ehci_idone: len=16384, actlen=256, cerr=3, status=0x3f008d00 type=0x2 > ehci_idone: len=504, actlen=504, cerr=3, status=0x80008c00 type=0x2 > ehci_idone: len=16384, actlen=256, cerr=3, status=0xbf008d00 type=0x2 > ehci_idone: len=16384, actlen=256, cerr=3, status=0x3f008d00 type=0x2 > ehci_idone: len=584, actlen=584, cerr=3, status=0x80008c00 type=0x2 > ehci_idone: len=16384, actlen=256, cerr=3, status=0xbf008d00 type=0x2 > ehci_idone: len=584, actlen=584, cerr=3, status=0x80008c00 type=0x2 > ehci_idone: len=16384, actlen=256, cerr=3, status=0x3f008d00 type=0x2 > ehci_idone: len=584, actlen=584, cerr=3, status=0x80008c00 type=0x2 > ehci_idone: len=16384, actlen=256, cerr=3, status=0xbf008d00 type=0x2 > ehci_idone: len=16384, actlen=256, cerr=3, status=0x3f008d00 type=0x2 > ehci_idone: len=1, actlen=1, cerr=3, status=0x8c00 type=0x0 > ehci_idone: len=2, actlen=2, cerr=3, status=0x8d00 type=0x0 > ehci_idone: len=16384, actlen=256, cerr=3, status=0xbf008d00 type=0x2 > ehci_idone: len=4, actlen=4, cerr=3, status=0x8d00 type=0x0 > ehci_idone: len=4, actlen=4, cerr=3, status=0x8d00 type=0x0 > ehci_idone: len=16384, actlen=256, cerr=3, status=0x3f008d00 type=0x2 > ehci_idone: len=384, actlen=384, cerr=3, status=0x8c00 type=0x2 > ehci_idone: len=384, actlen=384, cerr=3, status=0x80008c00 type=0x2 > ehci_idone: len=16384, actlen=256, cerr=3, status=0xbf008d00 type=0x2 > ehci_idone: len=448, actlen=448, cerr=3, status=0x8c00 type=0x2 > ehci_idone: len=16384, actlen=256, cerr=3, status=0x3f008d00 type=0x2 > ehci_idone: len=448, actlen=448, cerr=3, status=0x80008c00 type=0x2 > ehci_idone: len=16384, actlen=256, cerr=3, status=0xbf008d00 type=0x2 > ehci_idone: len=384, actlen=384,
Re: OpenBSD 6.7-beta on arm64 with urtwn(4) gets panic: _dmamap_sync: ran off map!
On Sun, Apr 12, 2020 at 10:05:28AM +0200, Patrick Wildt wrote: > I'm now trying to reproduce it, running this diff. I'm sure the printfs > will slow the system down, but it's one way to see what's going on. > > If you can reproduce it easily, would you mind trying again, with the > debug printfs? So far I cannot reproduce kernel panic with below diff. At the end of this email dmesg output after machine boots up and works for few minutes. I will continue rebooting and hope panic will finally trigger. Giving how easy it was to trigger before, and as I rebooted few times Pinebook already I think with below changes it will not panic :/ > Patrick > > diff --git a/sys/dev/usb/ehci.c b/sys/dev/usb/ehci.c > index 41851defe4f..00a9af6903f 100644 > --- a/sys/dev/usb/ehci.c > +++ b/sys/dev/usb/ehci.c > @@ -912,13 +912,18 @@ ehci_idone(struct usbd_xfer *xfer) > xfer->status = USBD_STALLED; > else > xfer->status = USBD_IOERROR; /* more info XXX */ > - } else > + } else { > + if (xfer->actlen) { > + printf("ehci_idone: len=%d, actlen=%d, cerr=%d, " > + "status=0x%x type=0x%x\n", xfer->length, actlen, > cerr, > + status, > UE_GET_XFERTYPE(xfer->pipe->endpoint->edesc->bmAttributes)); > + usb_syncmem(>dmabuf, 0, xfer->actlen, > + usbd_xfer_isread(xfer) ? > + BUS_DMASYNC_POSTREAD : BUS_DMASYNC_POSTWRITE); > + } > xfer->status = USBD_NORMAL_COMPLETION; > + } > > - if (xfer->actlen) > - usb_syncmem(>dmabuf, 0, xfer->actlen, > - usbd_xfer_isread(xfer) ? > - BUS_DMASYNC_POSTREAD : BUS_DMASYNC_POSTWRITE); > usb_transfer_complete(xfer); > DPRINTFN(/*12*/2, ("ehci_idone: ex=%p done\n", ex)); > } ehci_idone: len=16384, actlen=256, cerr=3, status=0xbf008d08 type=0x2 ehci_idone: len=16384, actlen=256, cerr=3, status=0x3f008d00 type=0x2 ehci_idone: len=1580, actlen=1580, cerr=3, status=0x8c00 type=0x2 ehci_idone: len=776, actlen=776, cerr=3, status=0x8c00 type=0x2 ehci_idone: len=16384, actlen=256, cerr=3, status=0xbf008d00 type=0x2 ehci_idone: len=16384, actlen=256, cerr=3, status=0x3f008d00 type=0x2 ehci_idone: len=1580, actlen=1580, cerr=3, status=0x8c00 type=0x2 ehci_idone: len=16384, actlen=256, cerr=3, status=0xbf008d00 type=0x2 ehci_idone: len=1208, actlen=1208, cerr=3, status=0x80008c00 type=0x2 ehci_idone: len=16384, actlen=256, cerr=3, status=0x3f008d00 type=0x2 ehci_idone: len=16384, actlen=256, cerr=3, status=0xbf008d00 type=0x2 ehci_idone: len=1480, actlen=1480, cerr=3, status=0x8c00 type=0x2 ehci_idone: len=752, actlen=752, cerr=3, status=0x8c00 type=0x2 ehci_idone: len=16384, actlen=256, cerr=3, status=0x3f008d00 type=0x2 ehci_idone: len=504, actlen=504, cerr=3, status=0x80008c00 type=0x2 ehci_idone: len=16384, actlen=256, cerr=3, status=0xbf008d00 type=0x2 ehci_idone: len=16384, actlen=256, cerr=3, status=0x3f008d00 type=0x2 ehci_idone: len=584, actlen=584, cerr=3, status=0x80008c00 type=0x2 ehci_idone: len=16384, actlen=256, cerr=3, status=0xbf008d00 type=0x2 ehci_idone: len=584, actlen=584, cerr=3, status=0x80008c00 type=0x2 ehci_idone: len=16384, actlen=256, cerr=3, status=0x3f008d00 type=0x2 ehci_idone: len=584, actlen=584, cerr=3, status=0x80008c00 type=0x2 ehci_idone: len=16384, actlen=256, cerr=3, status=0xbf008d00 type=0x2 ehci_idone: len=16384, actlen=256, cerr=3, status=0x3f008d00 type=0x2 ehci_idone: len=1, actlen=1, cerr=3, status=0x8c00 type=0x0 ehci_idone: len=2, actlen=2, cerr=3, status=0x8d00 type=0x0 ehci_idone: len=16384, actlen=256, cerr=3, status=0xbf008d00 type=0x2 ehci_idone: len=4, actlen=4, cerr=3, status=0x8d00 type=0x0 ehci_idone: len=4, actlen=4, cerr=3, status=0x8d00 type=0x0 ehci_idone: len=16384, actlen=256, cerr=3, status=0x3f008d00 type=0x2 ehci_idone: len=384, actlen=384, cerr=3, status=0x8c00 type=0x2 ehci_idone: len=384, actlen=384, cerr=3, status=0x80008c00 type=0x2 ehci_idone: len=16384, actlen=256, cerr=3, status=0xbf008d00 type=0x2 ehci_idone: len=448, actlen=448, cerr=3, status=0x8c00 type=0x2 ehci_idone: len=16384, actlen=256, cerr=3, status=0x3f008d00 type=0x2 ehci_idone: len=448, actlen=448, cerr=3, status=0x80008c00 type=0x2 ehci_idone: len=16384, actlen=256, cerr=3, status=0xbf008d00 type=0x2 ehci_idone: len=384, actlen=384, cerr=3, status=0x8c00 type=0x2 ehci_idone: len=16384, actlen=256, cerr=3, status=0x3f008d00 type=0x2 ehci_idone: len=376, actlen=376, cerr=3, status=0x80008c00 type=0x2 ehci_idone: len=16384, actlen=256, cerr=3, status=0xbf008d00 type=0x2 ehci_idone: len=384, actlen=384, cerr=3, status=0x8c00 type=0x2 ehci_idone: len=16384, actlen=256, cerr=3, status=0x3f008d00 type=0x2 ehci_idone: len=304, actlen=304, cerr=3, status=0x80008c00 type=0x2 ehci_idone: len=16384, actlen=256, cerr=3, status=0xbf008d00 type=0x2 ehci_idone: len=16384, actlen=256,
Re: OpenBSD 6.7-beta on arm64 with urtwn(4) gets panic: _dmamap_sync: ran off map!
On Sun, Apr 12, 2020 at 09:19:48AM +0200, Patrick Wildt wrote: > On Sun, Apr 12, 2020 at 05:41:15AM +, Mikolaj Kucharski wrote: > > On Sat, Apr 11, 2020 at 09:46:49PM +0200, Patrick Wildt wrote: > > > > OpenBSD 6.7-beta (GENERIC.MP) #552: Fri Apr 10 20:48:05 MDT 2020 > > > > dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP > > > > > > > > panic: _dmamap_sync: ran off map! > > > > > > Oh, wow! That means that the DMA sync is using a longer length than the > > > DMA buffer has segments for. It's very plausible that this happens on > > > the Pinebook, and not on an x86 machine, because on the Pinebook it has > > > to do DMA syncs, while on x86 that's essentially a no-op. > > > > > > > Stopped at panic+0x150: > > > > TID PID UID PRFLAGS PFLAGS CPU COMMAND > > > > 242142 87433 0 0x120 3 sha256 > > > > 192114 36379 0 0x14000 0x200 2K sdmmc2 > > > > db_enter() at panic+0x14c > > > > panic() at ehci_idone+0x1d4 > > > > ehci_idone() at ehci_softintr+0x158 > > > > > > I see! This is the one. I think I made a slight error in this > > > function, since for all the other drivers I made sure to sync only > > > on successful completions, while for ehci(4) I missed this for one. > > > > > > I think this diff should fix it, can you give it a go? > > > > Unfortunately kernel panics in very similar way: > > Ok, too bad. I'll reproduce this on my Pinebook and hopefully have an > update for you soon. > > Patrick I'm now trying to reproduce it, running this diff. I'm sure the printfs will slow the system down, but it's one way to see what's going on. If you can reproduce it easily, would you mind trying again, with the debug printfs? Patrick diff --git a/sys/dev/usb/ehci.c b/sys/dev/usb/ehci.c index 41851defe4f..00a9af6903f 100644 --- a/sys/dev/usb/ehci.c +++ b/sys/dev/usb/ehci.c @@ -912,13 +912,18 @@ ehci_idone(struct usbd_xfer *xfer) xfer->status = USBD_STALLED; else xfer->status = USBD_IOERROR; /* more info XXX */ - } else + } else { + if (xfer->actlen) { + printf("ehci_idone: len=%d, actlen=%d, cerr=%d, " + "status=0x%x type=0x%x\n", xfer->length, actlen, cerr, + status, UE_GET_XFERTYPE(xfer->pipe->endpoint->edesc->bmAttributes)); + usb_syncmem(>dmabuf, 0, xfer->actlen, + usbd_xfer_isread(xfer) ? + BUS_DMASYNC_POSTREAD : BUS_DMASYNC_POSTWRITE); + } xfer->status = USBD_NORMAL_COMPLETION; + } - if (xfer->actlen) - usb_syncmem(>dmabuf, 0, xfer->actlen, - usbd_xfer_isread(xfer) ? - BUS_DMASYNC_POSTREAD : BUS_DMASYNC_POSTWRITE); usb_transfer_complete(xfer); DPRINTFN(/*12*/2, ("ehci_idone: ex=%p done\n", ex)); }
Re: OpenBSD 6.7-beta on arm64 with urtwn(4) gets panic: _dmamap_sync: ran off map!
On Sun, Apr 12, 2020 at 05:41:15AM +, Mikolaj Kucharski wrote: > On Sat, Apr 11, 2020 at 09:46:49PM +0200, Patrick Wildt wrote: > > > OpenBSD 6.7-beta (GENERIC.MP) #552: Fri Apr 10 20:48:05 MDT 2020 > > > dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP > > > > > > panic: _dmamap_sync: ran off map! > > > > Oh, wow! That means that the DMA sync is using a longer length than the > > DMA buffer has segments for. It's very plausible that this happens on > > the Pinebook, and not on an x86 machine, because on the Pinebook it has > > to do DMA syncs, while on x86 that's essentially a no-op. > > > > > Stopped atpanic+0x150: > > > TID PID UID PRFLAGS PFLAGS CPU COMMAND > > > 24214287433 0 0x120 3 sha256 > > > 19211436379 0 0x14000 0x200 2K sdmmc2 > > > db_enter() at panic+0x14c > > > panic() at ehci_idone+0x1d4 > > > ehci_idone() at ehci_softintr+0x158 > > > > I see! This is the one. I think I made a slight error in this > > function, since for all the other drivers I made sure to sync only > > on successful completions, while for ehci(4) I missed this for one. > > > > I think this diff should fix it, can you give it a go? > > Unfortunately kernel panics in very similar way: Ok, too bad. I'll reproduce this on my Pinebook and hopefully have an update for you soon. Patrick > ddb{0}> show panic > _dmamap_sync: ran off map! > > ddb{0}> trace > db_enter() at panic+0x14c > panic() at ehci_idone+0x1d8 > ehci_idone() at ehci_softintr+0x158 > ehci_softintr() at softintr_biglock_wrap+0x1c > softintr_biglock_wrap() at softintr_dispatch+0x9c > softintr_dispatch() at arm_do_pending_intr+0xa8 > arm_do_pending_intr() at ampintc_irq_handle+0x178 > ampintc_irq_handle() at arm_cpu_intr+0x30 > arm_cpu_intr() at handle_el1h_irq+06c > handle_el1h_irq() at sched_idle+0x220 > sched_idle() at proc_trampoline+0x10 > > > mpi@, kettenis@: ok? > > > > Patrick > > > > diff --git a/sys/dev/usb/ehci.c b/sys/dev/usb/ehci.c > > index 41851defe4f..249bb073fca 100644 > > --- a/sys/dev/usb/ehci.c > > +++ b/sys/dev/usb/ehci.c > > @@ -912,13 +912,14 @@ ehci_idone(struct usbd_xfer *xfer) > > xfer->status = USBD_STALLED; > > else > > xfer->status = USBD_IOERROR; /* more info XXX */ > > - } else > > + } else { > > + if (xfer->actlen) > > + usb_syncmem(>dmabuf, 0, xfer->actlen, > > + usbd_xfer_isread(xfer) ? > > + BUS_DMASYNC_POSTREAD : BUS_DMASYNC_POSTWRITE); > > xfer->status = USBD_NORMAL_COMPLETION; > > + } > > > > - if (xfer->actlen) > > - usb_syncmem(>dmabuf, 0, xfer->actlen, > > - usbd_xfer_isread(xfer) ? > > - BUS_DMASYNC_POSTREAD : BUS_DMASYNC_POSTWRITE); > > usb_transfer_complete(xfer); > > DPRINTFN(/*12*/2, ("ehci_idone: ex=%p done\n", ex)); > > } > > -- > Regards, > Mikolaj >
Re: OpenBSD 6.7-beta on arm64 with urtwn(4) gets panic: _dmamap_sync: ran off map!
On Sat, Apr 11, 2020 at 09:46:49PM +0200, Patrick Wildt wrote: > > OpenBSD 6.7-beta (GENERIC.MP) #552: Fri Apr 10 20:48:05 MDT 2020 > > dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP > > > > panic: _dmamap_sync: ran off map! > > Oh, wow! That means that the DMA sync is using a longer length than the > DMA buffer has segments for. It's very plausible that this happens on > the Pinebook, and not on an x86 machine, because on the Pinebook it has > to do DMA syncs, while on x86 that's essentially a no-op. > > > Stopped at panic+0x150: > > TID PID UID PRFLAGS PFLAGS CPU COMMAND > > 242142 87433 0 0x120 3 sha256 > > 192114 36379 0 0x14000 0x200 2K sdmmc2 > > db_enter() at panic+0x14c > > panic() at ehci_idone+0x1d4 > > ehci_idone() at ehci_softintr+0x158 > > I see! This is the one. I think I made a slight error in this > function, since for all the other drivers I made sure to sync only > on successful completions, while for ehci(4) I missed this for one. > > I think this diff should fix it, can you give it a go? Unfortunately kernel panics in very similar way: ddb{0}> show panic _dmamap_sync: ran off map! ddb{0}> trace db_enter() at panic+0x14c panic() at ehci_idone+0x1d8 ehci_idone() at ehci_softintr+0x158 ehci_softintr() at softintr_biglock_wrap+0x1c softintr_biglock_wrap() at softintr_dispatch+0x9c softintr_dispatch() at arm_do_pending_intr+0xa8 arm_do_pending_intr() at ampintc_irq_handle+0x178 ampintc_irq_handle() at arm_cpu_intr+0x30 arm_cpu_intr() at handle_el1h_irq+06c handle_el1h_irq() at sched_idle+0x220 sched_idle() at proc_trampoline+0x10 > mpi@, kettenis@: ok? > > Patrick > > diff --git a/sys/dev/usb/ehci.c b/sys/dev/usb/ehci.c > index 41851defe4f..249bb073fca 100644 > --- a/sys/dev/usb/ehci.c > +++ b/sys/dev/usb/ehci.c > @@ -912,13 +912,14 @@ ehci_idone(struct usbd_xfer *xfer) > xfer->status = USBD_STALLED; > else > xfer->status = USBD_IOERROR; /* more info XXX */ > - } else > + } else { > + if (xfer->actlen) > + usb_syncmem(>dmabuf, 0, xfer->actlen, > + usbd_xfer_isread(xfer) ? > + BUS_DMASYNC_POSTREAD : BUS_DMASYNC_POSTWRITE); > xfer->status = USBD_NORMAL_COMPLETION; > + } > > - if (xfer->actlen) > - usb_syncmem(>dmabuf, 0, xfer->actlen, > - usbd_xfer_isread(xfer) ? > - BUS_DMASYNC_POSTREAD : BUS_DMASYNC_POSTWRITE); > usb_transfer_complete(xfer); > DPRINTFN(/*12*/2, ("ehci_idone: ex=%p done\n", ex)); > } -- Regards, Mikolaj
Re: OpenBSD 6.7-beta on arm64 with urtwn(4) gets panic: _dmamap_sync: ran off map!
> Date: Sat, 11 Apr 2020 21:46:49 +0200 > From: Patrick Wildt > > On Sat, Apr 11, 2020 at 12:41:36PM +, Mikolaj Kucharski wrote: > > On Sat, Apr 11, 2020 at 09:06:57AM +, Mikolaj Kucharski wrote: > > > >Synopsis:kernel panic with message _dmamap_sync: ran off map! > > > >Category:kernel > > > >Environment: > > > System : OpenBSD 6.7 > > > Details : OpenBSD 6.7-beta (GENERIC.MP) #546: Mon Apr 6 12:39:22 > > > MDT 2020 > > > > > > dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP > > > > > > Architecture: OpenBSD.arm64 > > > Machine : arm64 > > > >Description: > > > Kernel panic. It happens after usage of network via wireless adapter > > > connected via USB, urtwn(4): > > > > > > urtwn0 at uhub4 port 2 configuration 1 interface 0 "Realtek 802.11n WLAN > > > Adapter" rev 2.00/2.00 addr 5 > > > urtwn0: MAC/BB RTL8188CUS, RF 6052 1T1R, address 80:1f:02:4b:6a:6b > > > > > > Currently don't have trace output from ddb as it doesn't have function > > > names, but addresses. > > > Will try to update kernel to latest arm64 snapshot. > > > > > > >How-To-Repeat: > > > Connect urtwn(4) to Pinebook and use network, after a while kernel > > > panics. > > > For me simplest way is to start `sysupgrade -s -f` > > > > > > >Fix: > > > Unknown. > > > > OpenBSD 6.7-beta (GENERIC.MP) #552: Fri Apr 10 20:48:05 MDT 2020 > > dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP > > > > panic: _dmamap_sync: ran off map! > > Oh, wow! That means that the DMA sync is using a longer length than the > DMA buffer has segments for. It's very plausible that this happens on > the Pinebook, and not on an x86 machine, because on the Pinebook it has > to do DMA syncs, while on x86 that's essentially a no-op. > > > Stopped at panic+0x150: > > TID PID UID PRFLAGS PFLAGS CPU COMMAND > > 242142 87433 0 0x120 3 sha256 > > 192114 36379 0 0x14000 0x200 2K sdmmc2 > > db_enter() at panic+0x14c > > panic() at ehci_idone+0x1d4 > > ehci_idone() at ehci_softintr+0x158 > > I see! This is the one. I think I made a slight error in this > function, since for all the other drivers I made sure to sync only > on successful completions, while for ehci(4) I missed this for one. > > I think this diff should fix it, can you give it a go? > > mpi@, kettenis@: ok? If that works, yes ok kettenis@ > diff --git a/sys/dev/usb/ehci.c b/sys/dev/usb/ehci.c > index 41851defe4f..249bb073fca 100644 > --- a/sys/dev/usb/ehci.c > +++ b/sys/dev/usb/ehci.c > @@ -912,13 +912,14 @@ ehci_idone(struct usbd_xfer *xfer) > xfer->status = USBD_STALLED; > else > xfer->status = USBD_IOERROR; /* more info XXX */ > - } else > + } else { > + if (xfer->actlen) > + usb_syncmem(>dmabuf, 0, xfer->actlen, > + usbd_xfer_isread(xfer) ? > + BUS_DMASYNC_POSTREAD : BUS_DMASYNC_POSTWRITE); > xfer->status = USBD_NORMAL_COMPLETION; > + } > > - if (xfer->actlen) > - usb_syncmem(>dmabuf, 0, xfer->actlen, > - usbd_xfer_isread(xfer) ? > - BUS_DMASYNC_POSTREAD : BUS_DMASYNC_POSTWRITE); > usb_transfer_complete(xfer); > DPRINTFN(/*12*/2, ("ehci_idone: ex=%p done\n", ex)); > } > >
Re: OpenBSD 6.7-beta on arm64 with urtwn(4) gets panic: _dmamap_sync: ran off map!
On Sat, Apr 11, 2020 at 12:41:36PM +, Mikolaj Kucharski wrote: > On Sat, Apr 11, 2020 at 09:06:57AM +, Mikolaj Kucharski wrote: > > >Synopsis: kernel panic with message _dmamap_sync: ran off map! > > >Category: kernel > > >Environment: > > System : OpenBSD 6.7 > > Details : OpenBSD 6.7-beta (GENERIC.MP) #546: Mon Apr 6 12:39:22 > > MDT 2020 > > > > dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP > > > > Architecture: OpenBSD.arm64 > > Machine : arm64 > > >Description: > > Kernel panic. It happens after usage of network via wireless adapter > > connected via USB, urtwn(4): > > > > urtwn0 at uhub4 port 2 configuration 1 interface 0 "Realtek 802.11n WLAN > > Adapter" rev 2.00/2.00 addr 5 > > urtwn0: MAC/BB RTL8188CUS, RF 6052 1T1R, address 80:1f:02:4b:6a:6b > > > > Currently don't have trace output from ddb as it doesn't have function > > names, but addresses. > > Will try to update kernel to latest arm64 snapshot. > > > > >How-To-Repeat: > > Connect urtwn(4) to Pinebook and use network, after a while kernel > > panics. > > For me simplest way is to start `sysupgrade -s -f` > > > > >Fix: > > Unknown. Thanks for the report. I can't help you with the crash but yesterday I updated an amd64 laptop using urtwn and everything worked fine. The problem is something related to arm64. > > OpenBSD 6.7-beta (GENERIC.MP) #552: Fri Apr 10 20:48:05 MDT 2020 > dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP > > panic: _dmamap_sync: ran off map! > Stopped atpanic+0x150: > TID PID UID PRFLAGS PFLAGS CPU COMMAND > 24214287433 0 0x120 3 sha256 > 19211436379 0 0x14000 0x200 2K sdmmc2 > db_enter() at panic+0x14c > panic() at ehci_idone+0x1d4 > ehci_idone() at ehci_softintr+0x158 > ehci_softintr() at softintr_biglock_wrap+0x1c > softintr_biglock_wrap() at softintr_dispatch+0x9c > softintr_dispatch() at arm_do_pending_intr+0xa8 > arm_do_pending_intr() at ampintc_irq_handle+0x178 > > ddb{0}> trace > db_enter() at panic+0x14c > panic() at ehci_idone+0x1d4 > ehci_idone() at ehci_softintr+0x158 > ehci_softintr() at softintr_biglock_wrap+0x1c > softintr_biglock_wrap() at softintr_dispatch+0x9c > softintr_dispatch() at arm_do_pending_intr+0xa8 > arm_do_pending_intr() at ampintc_irq_handle+0x178 > ampintc_irq_handle() at arm_cpu_intr+0x30 > arm_cpu_intr() at handle_el1h_irq+06c > handle_el1h_irq() at sched_idle+0x220 > sched_idle() at proc_trampoline+0x10 > > ddb{0}> machine ddb 1 > Stopped atampintc_ipi_ddb+0x1c: db_enter() at ampintc_ipi_ddb+0x18: > ampintc_ipi_ddb() at arm_cpu_intr+0x30 > arm_cpu_intr() at handle_el1h_irq+0x6c > handle_el1h_irq() at sched_idle+0x220 > sched_idle() at proc_trampoline+0x10 > > ddb{1}> trace > db_enter() at ampintc_ipi_ddb+0x18 > ampintc_ipi_ddb() at arm_cpu_intr+0x30 > arm_cpu_intr() at handle_el1h_irq+0x6c > handle_el1h_irq() at sched_idle+0x220 > sched_idle() at proc_trampoline+0x10 > > ddb{1}> machine ddb 2 > Stopped atampintc_ipi_ddb+0x1c: db_enter() at ampintc_ipi_ddb+0x18: > ampintc_ipi_ddb() at arm_cpu_intr+0x30 > arm_cpu_intr() at handle_el1h_irq+0x6c > handle_el1h_irq() at sximmc_exec_command+0x240 > sximmc_exec_command() at sdmmc_mmc_command+0x38 > sdmmc_mmc_command() at sdmmc_mem_read_block_subr+0xa4 > sdmmc_mem_read_block_subr() at sdmmc_mem_read_block+0x11c > > ddb{2}> trace > db_enter() at ampintc_ipi_ddb+0x18 > ampintc_ipi_ddb() at arm_cpu_intr+0x30 > arm_cpu_intr() at handle_el1h_irq+0x6c > handle_el1h_irq() at sximmc_exec_command+0x240 > sximmc_exec_command() at sdmmc_mmc_command+0x38 > sdmmc_mmc_command() at sdmmc_mem_read_block_subr+0xa4 > sdmmc_mem_read_block_subr() at sdmmc_mem_read_block+0x11c > sdmmc_mem_read_block() at sdmmc_complete_xs+0x68 > sdmmc_complete_xs() at sdmmc_task_thread+0x13c > sdmmc_task_thread() at proc_trampoline+0x10 > > ddb{2}> machine ddb 3 > Stopped atampintc_ipi_ddb+0x1c: db_enter() at ampintc_ipi_ddb+0x18: > ampintc_ipi_ddb() at arm_cpu_intr+0x30 > arm_cpu_intr() at handle_el1h_irq+0x6c > handle_el1h_irq() at tsleep+0x13c > tsleep() at getblk+0xd8 > getblk() at bread_cluster+0x48 > bread_cluster() at ffs_read+0x190 > > ddb{3}> trace > db_enter() at ampintc_ipi_ddb+0x18 > ampintc_ipi_ddb() at arm_cpu_intr+0x30 > arm_cpu_intr() at handle_el1h_irq+0x6c > handle_el1h_irq() at tsleep+0x13c > tsleep() at getblk+0xd8 > getblk() at bread_cluster+0x48 > bread_cluster() at ffs_read+0x190 > ffs_read() at VOP_READ+0x34 > VOP_READ() at vn_read+0xac > vn_read() at dofilereadv+0x114 > dofilereadv() at sys_read+0x54 > sys_read() at svc_handler+0x2c8 > svc_handler() at do_el0_sync+0x120 > do_el0_sync() at handle_el0_sync+0x74 > handle_el0_sync() at 0xb6cbfb550 > --- trap --- > > > Above is transcribed by hand, so there may be typos. Sometimes I cannot > reproduce this panic, but rebooting Pinebook to different version of
Re: OpenBSD 6.7-beta on arm64 with urtwn(4) gets panic: _dmamap_sync: ran off map!
On Sat, Apr 11, 2020 at 12:41:36PM +, Mikolaj Kucharski wrote: > On Sat, Apr 11, 2020 at 09:06:57AM +, Mikolaj Kucharski wrote: > > >Synopsis: kernel panic with message _dmamap_sync: ran off map! > > >Category: kernel > > >Environment: > > System : OpenBSD 6.7 > > Details : OpenBSD 6.7-beta (GENERIC.MP) #546: Mon Apr 6 12:39:22 > > MDT 2020 > > > > dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP > > > > Architecture: OpenBSD.arm64 > > Machine : arm64 > > >Description: > > Kernel panic. It happens after usage of network via wireless adapter > > connected via USB, urtwn(4): > > > > urtwn0 at uhub4 port 2 configuration 1 interface 0 "Realtek 802.11n WLAN > > Adapter" rev 2.00/2.00 addr 5 > > urtwn0: MAC/BB RTL8188CUS, RF 6052 1T1R, address 80:1f:02:4b:6a:6b > > > > Currently don't have trace output from ddb as it doesn't have function > > names, but addresses. > > Will try to update kernel to latest arm64 snapshot. > > > > >How-To-Repeat: > > Connect urtwn(4) to Pinebook and use network, after a while kernel > > panics. > > For me simplest way is to start `sysupgrade -s -f` > > > > >Fix: > > Unknown. > > OpenBSD 6.7-beta (GENERIC.MP) #552: Fri Apr 10 20:48:05 MDT 2020 > dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP > > panic: _dmamap_sync: ran off map! Oh, wow! That means that the DMA sync is using a longer length than the DMA buffer has segments for. It's very plausible that this happens on the Pinebook, and not on an x86 machine, because on the Pinebook it has to do DMA syncs, while on x86 that's essentially a no-op. > Stopped atpanic+0x150: > TID PID UID PRFLAGS PFLAGS CPU COMMAND > 24214287433 0 0x120 3 sha256 > 19211436379 0 0x14000 0x200 2K sdmmc2 > db_enter() at panic+0x14c > panic() at ehci_idone+0x1d4 > ehci_idone() at ehci_softintr+0x158 I see! This is the one. I think I made a slight error in this function, since for all the other drivers I made sure to sync only on successful completions, while for ehci(4) I missed this for one. I think this diff should fix it, can you give it a go? mpi@, kettenis@: ok? Patrick diff --git a/sys/dev/usb/ehci.c b/sys/dev/usb/ehci.c index 41851defe4f..249bb073fca 100644 --- a/sys/dev/usb/ehci.c +++ b/sys/dev/usb/ehci.c @@ -912,13 +912,14 @@ ehci_idone(struct usbd_xfer *xfer) xfer->status = USBD_STALLED; else xfer->status = USBD_IOERROR; /* more info XXX */ - } else + } else { + if (xfer->actlen) + usb_syncmem(>dmabuf, 0, xfer->actlen, + usbd_xfer_isread(xfer) ? + BUS_DMASYNC_POSTREAD : BUS_DMASYNC_POSTWRITE); xfer->status = USBD_NORMAL_COMPLETION; + } - if (xfer->actlen) - usb_syncmem(>dmabuf, 0, xfer->actlen, - usbd_xfer_isread(xfer) ? - BUS_DMASYNC_POSTREAD : BUS_DMASYNC_POSTWRITE); usb_transfer_complete(xfer); DPRINTFN(/*12*/2, ("ehci_idone: ex=%p done\n", ex)); }
Re: OpenBSD 6.7-beta on arm64 with urtwn(4) gets panic: _dmamap_sync: ran off map!
On Sat, Apr 11, 2020 at 08:40:09PM +0200, Juan Francisco Cantero Hurtado wrote: > Thanks for the report. I can't help you with the crash but yesterday I > updated an amd64 laptop using urtwn and everything worked fine. The > problem is something related to arm64. Yeah, I'm trying to reproduce this on amd64 and RPi3 without luck. RPi3 is not using/having EHCI, like Pinebook and my testing amd64 urtwn(4) is attached via xHCI, so also not the same code path like on Pinebook. -- Regards, Mikolaj
Re: OpenBSD 6.7-beta on arm64 with urtwn(4) gets panic: _dmamap_sync: ran off map!
On Sat, Apr 11, 2020 at 09:06:57AM +, Mikolaj Kucharski wrote: > >Synopsis:kernel panic with message _dmamap_sync: ran off map! > >Category:kernel > >Environment: > System : OpenBSD 6.7 > Details : OpenBSD 6.7-beta (GENERIC.MP) #546: Mon Apr 6 12:39:22 > MDT 2020 > > dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP > > Architecture: OpenBSD.arm64 > Machine : arm64 > >Description: > Kernel panic. It happens after usage of network via wireless adapter > connected via USB, urtwn(4): > > urtwn0 at uhub4 port 2 configuration 1 interface 0 "Realtek 802.11n WLAN > Adapter" rev 2.00/2.00 addr 5 > urtwn0: MAC/BB RTL8188CUS, RF 6052 1T1R, address 80:1f:02:4b:6a:6b > > Currently don't have trace output from ddb as it doesn't have function names, > but addresses. > Will try to update kernel to latest arm64 snapshot. > > >How-To-Repeat: > Connect urtwn(4) to Pinebook and use network, after a while kernel > panics. > For me simplest way is to start `sysupgrade -s -f` > > >Fix: > Unknown. OpenBSD 6.7-beta (GENERIC.MP) #552: Fri Apr 10 20:48:05 MDT 2020 dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP panic: _dmamap_sync: ran off map! Stopped at panic+0x150: TID PID UID PRFLAGS PFLAGS CPU COMMAND 242142 87433 0 0x120 3 sha256 192114 36379 0 0x14000 0x200 2K sdmmc2 db_enter() at panic+0x14c panic() at ehci_idone+0x1d4 ehci_idone() at ehci_softintr+0x158 ehci_softintr() at softintr_biglock_wrap+0x1c softintr_biglock_wrap() at softintr_dispatch+0x9c softintr_dispatch() at arm_do_pending_intr+0xa8 arm_do_pending_intr() at ampintc_irq_handle+0x178 ddb{0}> trace db_enter() at panic+0x14c panic() at ehci_idone+0x1d4 ehci_idone() at ehci_softintr+0x158 ehci_softintr() at softintr_biglock_wrap+0x1c softintr_biglock_wrap() at softintr_dispatch+0x9c softintr_dispatch() at arm_do_pending_intr+0xa8 arm_do_pending_intr() at ampintc_irq_handle+0x178 ampintc_irq_handle() at arm_cpu_intr+0x30 arm_cpu_intr() at handle_el1h_irq+06c handle_el1h_irq() at sched_idle+0x220 sched_idle() at proc_trampoline+0x10 ddb{0}> machine ddb 1 Stopped at ampintc_ipi_ddb+0x1c: db_enter() at ampintc_ipi_ddb+0x18: ampintc_ipi_ddb() at arm_cpu_intr+0x30 arm_cpu_intr() at handle_el1h_irq+0x6c handle_el1h_irq() at sched_idle+0x220 sched_idle() at proc_trampoline+0x10 ddb{1}> trace db_enter() at ampintc_ipi_ddb+0x18 ampintc_ipi_ddb() at arm_cpu_intr+0x30 arm_cpu_intr() at handle_el1h_irq+0x6c handle_el1h_irq() at sched_idle+0x220 sched_idle() at proc_trampoline+0x10 ddb{1}> machine ddb 2 Stopped at ampintc_ipi_ddb+0x1c: db_enter() at ampintc_ipi_ddb+0x18: ampintc_ipi_ddb() at arm_cpu_intr+0x30 arm_cpu_intr() at handle_el1h_irq+0x6c handle_el1h_irq() at sximmc_exec_command+0x240 sximmc_exec_command() at sdmmc_mmc_command+0x38 sdmmc_mmc_command() at sdmmc_mem_read_block_subr+0xa4 sdmmc_mem_read_block_subr() at sdmmc_mem_read_block+0x11c ddb{2}> trace db_enter() at ampintc_ipi_ddb+0x18 ampintc_ipi_ddb() at arm_cpu_intr+0x30 arm_cpu_intr() at handle_el1h_irq+0x6c handle_el1h_irq() at sximmc_exec_command+0x240 sximmc_exec_command() at sdmmc_mmc_command+0x38 sdmmc_mmc_command() at sdmmc_mem_read_block_subr+0xa4 sdmmc_mem_read_block_subr() at sdmmc_mem_read_block+0x11c sdmmc_mem_read_block() at sdmmc_complete_xs+0x68 sdmmc_complete_xs() at sdmmc_task_thread+0x13c sdmmc_task_thread() at proc_trampoline+0x10 ddb{2}> machine ddb 3 Stopped at ampintc_ipi_ddb+0x1c: db_enter() at ampintc_ipi_ddb+0x18: ampintc_ipi_ddb() at arm_cpu_intr+0x30 arm_cpu_intr() at handle_el1h_irq+0x6c handle_el1h_irq() at tsleep+0x13c tsleep() at getblk+0xd8 getblk() at bread_cluster+0x48 bread_cluster() at ffs_read+0x190 ddb{3}> trace db_enter() at ampintc_ipi_ddb+0x18 ampintc_ipi_ddb() at arm_cpu_intr+0x30 arm_cpu_intr() at handle_el1h_irq+0x6c handle_el1h_irq() at tsleep+0x13c tsleep() at getblk+0xd8 getblk() at bread_cluster+0x48 bread_cluster() at ffs_read+0x190 ffs_read() at VOP_READ+0x34 VOP_READ() at vn_read+0xac vn_read() at dofilereadv+0x114 dofilereadv() at sys_read+0x54 sys_read() at svc_handler+0x2c8 svc_handler() at do_el0_sync+0x120 do_el0_sync() at handle_el0_sync+0x74 handle_el0_sync() at 0xb6cbfb550 --- trap --- Above is transcribed by hand, so there may be typos. Sometimes I cannot reproduce this panic, but rebooting Pinebook to different version of the same kernel (kernel relinking) makes the problem trigger easily. After testing, the easiest way to reproduce it is: 1) configure wireless: # cat /etc/hostname.urtwn0 debug nwid access-point-name wpakey secret-key inet6 autoconf dhcp 2) configure hotplugd(8) attach script: # cat /etc/hotplug/attach #!/bin/sh logger -t "$0[$$]" -- "$@" if [ "$1" -eq 3 ] then ifconfig "$2" | logger -t "ifconfig[$$]" if [ -r "/etc/hostname.$2" ] then
OpenBSD 6.7-beta on arm64 with urtwn(4) gets panic: _dmamap_sync: ran off map!
>Synopsis: kernel panic with message _dmamap_sync: ran off map! >Category: kernel >Environment: System : OpenBSD 6.7 Details : OpenBSD 6.7-beta (GENERIC.MP) #546: Mon Apr 6 12:39:22 MDT 2020 dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP Architecture: OpenBSD.arm64 Machine : arm64 >Description: Kernel panic. It happens after usage of network via wireless adapter connected via USB, urtwn(4): urtwn0 at uhub4 port 2 configuration 1 interface 0 "Realtek 802.11n WLAN Adapter" rev 2.00/2.00 addr 5 urtwn0: MAC/BB RTL8188CUS, RF 6052 1T1R, address 80:1f:02:4b:6a:6b Currently don't have trace output from ddb as it doesn't have function names, but addresses. Will try to update kernel to latest arm64 snapshot. >How-To-Repeat: Connect urtwn(4) to Pinebook and use network, after a while kernel panics. For me simplest way is to start `sysupgrade -s -f` >Fix: Unknown. dmesg: OpenBSD 6.7-beta (GENERIC.MP) #546: Mon Apr 6 12:39:22 MDT 2020 dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP real mem = 2015850496 (1922MB) avail mem = 1923846144 (1834MB) mainbus0 at root: Pinebook cpu0 at mainbus0 mpidr 0: ARM Cortex-A53 r0p4 cpu0: 32KB 64b/line 2-way L1 VIPT I-cache, 32KB 64b/line 4-way L1 D-cache cpu0: 512KB 64b/line 16-way L2 cache efi0 at mainbus0: UEFI 2.7 efi0: Das U-Boot rev 0x20190400 apm0 at mainbus0 psci0 at mainbus0: PSCI 0.2 "osc24M_clk" at mainbus0 not configured "osc32k_clk" at mainbus0 not configured "internal-osc-clk" at mainbus0 not configured "sound_spdif" at mainbus0 not configured "spdif-out" at mainbus0 not configured agtimer0 at mainbus0: tick rate 24000 KHz simplebus0 at mainbus0: "soc" sxisyscon0 at simplebus0 sxisid0 at simplebus0 sxiccmu0 at simplebus0 sxipio0 at simplebus0: 103 pins ampintc0 at simplebus0 nirq 224, ncpu 4 ipi: 0, 1: "interrupt-controller" sxirtc0 at simplebus0 sxiccmu1 at simplebus0 sxipio1 at simplebus0: 13 pins sxirsb0 at simplebus0 axppmic0 at sxirsb0 addr 0x3a3: AXP803 "de2" at simplebus0 not configured "dma-controller" at simplebus0 not configured "lcd-controller" at simplebus0 not configured "lcd-controller" at simplebus0 not configured sximmc0 at simplebus0 sdmmc0 at sximmc0: 4-bit, sd high-speed, mmc high-speed, dma sximmc1 at simplebus0 sdmmc1 at sximmc1: 4-bit, sd high-speed, mmc high-speed, dma sximmc2 at simplebus0 sdmmc2 at sximmc2: 8-bit, sd high-speed, mmc high-speed, dma "phy" at simplebus0 not configured ehci0 at simplebus0 usb0 at ehci0: USB revision 2.0 uhub0 at usb0 configuration 1 interface 0 "Generic EHCI root hub" rev 2.00/1.00 addr 1 ohci0 at simplebus0: version 1.0 ehci1 at simplebus0 usb1 at ehci1: USB revision 2.0 uhub1 at usb1 configuration 1 interface 0 "Generic EHCI root hub" rev 2.00/1.00 addr 1 ohci1 at simplebus0: version 1.0 com0 at simplebus0sxiccmu_ccu_reset: 0x002e : ns16550, no working fifo sxipwm0 at simplebus0 "hdmi-phy" at simplebus0 not configured "interrupt-controller" at simplebus0 not configured sxitwi0 at simplebus0 iic0 at sxitwi0 "analogix,anx6345" at iic0 addr 0x38 not configured sxidog0 at simplebus0 gpio0 at sxipio0: 32 pins gpio1 at sxipio0: 32 pins gpio2 at sxipio0: 32 pins gpio3 at sxipio0: 32 pins gpio4 at sxipio0: 32 pins gpio5 at sxipio0: 32 pins gpio6 at sxipio0: 32 pins gpio7 at sxipio0: 32 pins gpio8 at sxipio1: 32 pins usb2 at ohci0: USB revision 1.0 uhub2 at usb2 configuration 1 interface 0 "Generic OHCI root hub" rev 1.00/1.00 addr 1 usb3 at ohci1: USB revision 1.0 uhub3 at usb3 configuration 1 interface 0 "Generic OHCI root hub" rev 1.00/1.00 addr 1 "regulator" at mainbus0 not configured pwmbl0 at mainbus0 "gpio_keys" at mainbus0 not configured "vcc3v3" at mainbus0 not configured "wifi_pwrseq" at mainbus0 not configured "binman" at mainbus0 not configured simplefb0 at mainbus0: 1920x1080, 32bpp wsdisplay0 at simplefb0 mux 1: console (std, vt100 emulation) wsdisplay0: screen 1-5 added (std, vt100 emulation) cpu1 at mainbus0 mpidr 1: ARM Cortex-A53 r0p4 cpu1: 32KB 64b/line 2-way L1 VIPT I-cache, 32KB 64b/line 4-way L1 D-cache cpu1: 512KB 64b/line 16-way L2 cache cpu2 at mainbus0 mpidr 2: ARM Cortex-A53 r0p4 cpu2: 32KB 64b/line 2-way L1 VIPT I-cache, 32KB 64b/line 4-way L1 D-cache cpu2: 512KB 64b/line 16-way L2 cache cpu3 at mainbus0 mpidr 3: ARM Cortex-A53 r0p4 cpu3: 32KB 64b/line 2-way L1 VIPT I-cache, 32KB 64b/line 4-way L1 D-cache cpu3: 512KB 64b/line 16-way L2 cache manufacturer 0x024c, product 0xb703 at sdmmc1 function 1 not configured scsibus0 at sdmmc0: 2 targets, initiator 0 sd0 at scsibus0 targ 1 lun 0: removable sd0: 244016MB, 512 bytes/sector, 499744768 sectors scsibus1 at sdmmc2: 2 targets, initiator 0 sd1 at scsibus1 targ 1 lun 0: removable sd1: 14910MB, 512 bytes/sector, 30535680 sectors uhub4 at uhub1 port 1 configuration 1 interface 0 "Genesys Logic USB2.0 Hub" rev 2.00/88.32 addr 2 uhidev0 at uhub4 port 1 configuration 1 interface 0