Re: panic: ehci_alloc_std: curlen == 0 on 6.8-beta
On Fri, Nov 27, 2020 at 12:57:02PM +, Mikolaj Kucharski wrote: > I think something as simple as below would be okay. If requested I can > put in DPRINTFN()s based on current printf()s, like I proposed in > earlier diff in this thread. However more important part is, that I > think DIAGNOSTIC ifdef should be removed as rest of the code, which > relies on `if (curlen > len) curlen = len;` is not enclosed with > `#ifdef DIAGNOSTIC` Right. That code should be outside of DIAGNOSTIC. Though I would leave the printf's in as DPRINTF's for the time being. If you can send such a diff I'm fine. > Index: dev/usb/ehci.c > === > RCS file: /cvs/src/sys/dev/usb/ehci.c,v > retrieving revision 1.212 > diff -u -p -u -r1.212 ehci.c > --- dev/usb/ehci.c23 Oct 2020 20:25:35 - 1.212 > +++ dev/usb/ehci.c27 Nov 2020 10:16:23 - > @@ -2393,16 +2406,10 @@ ehci_alloc_sqtd_chain(struct ehci_softc > /* must use multiple TDs, fill as much as possible. */ > curlen = EHCI_QTD_NBUFFERS * EHCI_PAGE_SIZE - >EHCI_PAGE_OFFSET(dataphys); > -#ifdef DIAGNOSTIC > - if (curlen > len) { > - printf("ehci_alloc_sqtd_chain: curlen=%u " > - "len=%u offs=0x%x\n", curlen, len, > - EHCI_PAGE_OFFSET(dataphys)); > - printf("lastpage=0x%x page=0x%x phys=0x%x\n", > - dataphyslastpage, dataphyspage, dataphys); > + > + if (curlen > len) > curlen = len; > - } > -#endif > + > /* the length must be a multiple of the max size */ > curlen -= curlen % mps; > DPRINTFN(1,("ehci_alloc_sqtd_chain: multiple QTDs, " > > > On Sun, Nov 22, 2020 at 01:36:10AM +, Mikolaj Kucharski wrote: > > Hi, > > > > Whould below diff be okay, or just simple: > > > > if (curlen > len) > > curlen = len; > > > > be more appropriate here? > > > > On Wed, Nov 11, 2020 at 09:02:49AM +, Mikolaj Kucharski wrote: > > > On Sat, Oct 24, 2020 at 09:08:45AM +0200, Marcus Glocker wrote: > > > > Now you have on M less in your tree checkout :-) > > > > Thanks for tracking this down. > > > > > > There is one more change, which I would consider. It was visible after I > > > switched back to official snapshot kernel. Now that kernel is not > > > panicing, when the specific code path from this email thread is executed > > > it prints: > > > > > > ehci_alloc_sqtd_chain: curlen=20480 len=0 offs=0x0 > > > lastpage=0xcfe66000 page=0xcfe67000 phys=0xcfe67000 > > > > > > and I think this is not needed by default any more, so I have this diff: > > > > > > Index: dev/usb/ehci.c > > > === > > > RCS file: /cvs/src/sys/dev/usb/ehci.c,v > > > retrieving revision 1.212 > > > diff -u -p -u -r1.212 ehci.c > > > --- dev/usb/ehci.c23 Oct 2020 20:25:35 - 1.212 > > > +++ dev/usb/ehci.c11 Nov 2020 08:55:01 - > > > @@ -2395,11 +2408,11 @@ ehci_alloc_sqtd_chain(struct ehci_softc > > >EHCI_PAGE_OFFSET(dataphys); > > > #ifdef DIAGNOSTIC > > > if (curlen > len) { > > > - printf("ehci_alloc_sqtd_chain: curlen=%u " > > > + DPRINTFN(1,("ehci_alloc_sqtd_chain: curlen=%u " > > > "len=%u offs=0x%x\n", curlen, len, > > > - EHCI_PAGE_OFFSET(dataphys)); > > > - printf("lastpage=0x%x page=0x%x phys=0x%x\n", > > > - dataphyslastpage, dataphyspage, dataphys); > > > + EHCI_PAGE_OFFSET(dataphys))); > > > + DPRINTFN(1,("lastpage=0x%x page=0x%x > > > phys=0x%x\n", > > > + dataphyslastpage, dataphyspage, dataphys)); > > > curlen = len; > > > } > > > #endif > > > > > > to mute those messages. I'm also wondering could above be just as simple > > > as: > > > > > > if (curlen > len) { > > > curlen = len; > > > > > > and to drop completly above printf()s / DPRINTFN()s as for me they > > > didn't bring a lot of troubleshooting value. Dunno. Anyway one way or > > > another muting those I think would be good. > > > > > > > > > > On Fri, Oct 23, 2020 at 06:50:53PM +0200, Marcus Glocker wrote: > > > > > > > > > Honestly, I haven't spent much time to investigate how the curlen = 0 > > > > > is > > > > > getting generated exactly, because for me it will be very difficult to > > > > > understand that without the hardware on my side re-producing the same. > > > > > > > > > > But I had look when the code was introduced to
Re: panic: ehci_alloc_std: curlen == 0 on 6.8-beta
I think something as simple as below would be okay. If requested I can put in DPRINTFN()s based on current printf()s, like I proposed in earlier diff in this thread. However more important part is, that I think DIAGNOSTIC ifdef should be removed as rest of the code, which relies on `if (curlen > len) curlen = len;` is not enclosed with `#ifdef DIAGNOSTIC` Index: dev/usb/ehci.c === RCS file: /cvs/src/sys/dev/usb/ehci.c,v retrieving revision 1.212 diff -u -p -u -r1.212 ehci.c --- dev/usb/ehci.c 23 Oct 2020 20:25:35 - 1.212 +++ dev/usb/ehci.c 27 Nov 2020 10:16:23 - @@ -2393,16 +2406,10 @@ ehci_alloc_sqtd_chain(struct ehci_softc /* must use multiple TDs, fill as much as possible. */ curlen = EHCI_QTD_NBUFFERS * EHCI_PAGE_SIZE - EHCI_PAGE_OFFSET(dataphys); -#ifdef DIAGNOSTIC - if (curlen > len) { - printf("ehci_alloc_sqtd_chain: curlen=%u " - "len=%u offs=0x%x\n", curlen, len, - EHCI_PAGE_OFFSET(dataphys)); - printf("lastpage=0x%x page=0x%x phys=0x%x\n", - dataphyslastpage, dataphyspage, dataphys); + + if (curlen > len) curlen = len; - } -#endif + /* the length must be a multiple of the max size */ curlen -= curlen % mps; DPRINTFN(1,("ehci_alloc_sqtd_chain: multiple QTDs, " On Sun, Nov 22, 2020 at 01:36:10AM +, Mikolaj Kucharski wrote: > Hi, > > Whould below diff be okay, or just simple: > > if (curlen > len) > curlen = len; > > be more appropriate here? > > On Wed, Nov 11, 2020 at 09:02:49AM +, Mikolaj Kucharski wrote: > > On Sat, Oct 24, 2020 at 09:08:45AM +0200, Marcus Glocker wrote: > > > Now you have on M less in your tree checkout :-) > > > Thanks for tracking this down. > > > > There is one more change, which I would consider. It was visible after I > > switched back to official snapshot kernel. Now that kernel is not > > panicing, when the specific code path from this email thread is executed > > it prints: > > > > ehci_alloc_sqtd_chain: curlen=20480 len=0 offs=0x0 > > lastpage=0xcfe66000 page=0xcfe67000 phys=0xcfe67000 > > > > and I think this is not needed by default any more, so I have this diff: > > > > Index: dev/usb/ehci.c > > === > > RCS file: /cvs/src/sys/dev/usb/ehci.c,v > > retrieving revision 1.212 > > diff -u -p -u -r1.212 ehci.c > > --- dev/usb/ehci.c 23 Oct 2020 20:25:35 - 1.212 > > +++ dev/usb/ehci.c 11 Nov 2020 08:55:01 - > > @@ -2395,11 +2408,11 @@ ehci_alloc_sqtd_chain(struct ehci_softc > > EHCI_PAGE_OFFSET(dataphys); > > #ifdef DIAGNOSTIC > > if (curlen > len) { > > - printf("ehci_alloc_sqtd_chain: curlen=%u " > > + DPRINTFN(1,("ehci_alloc_sqtd_chain: curlen=%u " > > "len=%u offs=0x%x\n", curlen, len, > > - EHCI_PAGE_OFFSET(dataphys)); > > - printf("lastpage=0x%x page=0x%x phys=0x%x\n", > > - dataphyslastpage, dataphyspage, dataphys); > > + EHCI_PAGE_OFFSET(dataphys))); > > + DPRINTFN(1,("lastpage=0x%x page=0x%x > > phys=0x%x\n", > > + dataphyslastpage, dataphyspage, dataphys)); > > curlen = len; > > } > > #endif > > > > to mute those messages. I'm also wondering could above be just as simple > > as: > > > > if (curlen > len) { > > curlen = len; > > > > and to drop completly above printf()s / DPRINTFN()s as for me they > > didn't bring a lot of troubleshooting value. Dunno. Anyway one way or > > another muting those I think would be good. > > > > > > > On Fri, Oct 23, 2020 at 06:50:53PM +0200, Marcus Glocker wrote: > > > > > > > Honestly, I haven't spent much time to investigate how the curlen = 0 is > > > > getting generated exactly, because for me it will be very difficult to > > > > understand that without the hardware on my side re-producing the same. > > > > > > > > But I had look when the code was introduced to handle curlen == 0 later > > > > in the function: > > > > > > > > if (iscontrol) { > > > > /* > > > > * adjust the toggle based on the number of packets > > > > * in this qtd > > > > */ > > > > if curlen + mps - 1) / mps) & 1) || curlen == 0) > > > > qtdstatus ^= EHCI_QTD_TOGGLE_MASK; > > > > } > > > > > > > > This was
Re: panic: ehci_alloc_std: curlen == 0 on 6.8-beta
Hi, Whould below diff be okay, or just simple: if (curlen > len) curlen = len; be more appropriate here? On Wed, Nov 11, 2020 at 09:02:49AM +, Mikolaj Kucharski wrote: > On Sat, Oct 24, 2020 at 09:08:45AM +0200, Marcus Glocker wrote: > > Now you have on M less in your tree checkout :-) > > Thanks for tracking this down. > > There is one more change, which I would consider. It was visible after I > switched back to official snapshot kernel. Now that kernel is not > panicing, when the specific code path from this email thread is executed > it prints: > > ehci_alloc_sqtd_chain: curlen=20480 len=0 offs=0x0 > lastpage=0xcfe66000 page=0xcfe67000 phys=0xcfe67000 > > and I think this is not needed by default any more, so I have this diff: > > Index: dev/usb/ehci.c > === > RCS file: /cvs/src/sys/dev/usb/ehci.c,v > retrieving revision 1.212 > diff -u -p -u -r1.212 ehci.c > --- dev/usb/ehci.c23 Oct 2020 20:25:35 - 1.212 > +++ dev/usb/ehci.c11 Nov 2020 08:55:01 - > @@ -2395,11 +2408,11 @@ ehci_alloc_sqtd_chain(struct ehci_softc >EHCI_PAGE_OFFSET(dataphys); > #ifdef DIAGNOSTIC > if (curlen > len) { > - printf("ehci_alloc_sqtd_chain: curlen=%u " > + DPRINTFN(1,("ehci_alloc_sqtd_chain: curlen=%u " > "len=%u offs=0x%x\n", curlen, len, > - EHCI_PAGE_OFFSET(dataphys)); > - printf("lastpage=0x%x page=0x%x phys=0x%x\n", > - dataphyslastpage, dataphyspage, dataphys); > + EHCI_PAGE_OFFSET(dataphys))); > + DPRINTFN(1,("lastpage=0x%x page=0x%x > phys=0x%x\n", > + dataphyslastpage, dataphyspage, dataphys)); > curlen = len; > } > #endif > > to mute those messages. I'm also wondering could above be just as simple > as: > > if (curlen > len) { > curlen = len; > > and to drop completly above printf()s / DPRINTFN()s as for me they > didn't bring a lot of troubleshooting value. Dunno. Anyway one way or > another muting those I think would be good. > > > > On Fri, Oct 23, 2020 at 06:50:53PM +0200, Marcus Glocker wrote: > > > > > Honestly, I haven't spent much time to investigate how the curlen = 0 is > > > getting generated exactly, because for me it will be very difficult to > > > understand that without the hardware on my side re-producing the same. > > > > > > But I had look when the code was introduced to handle curlen == 0 later > > > in the function: > > > > > > if (iscontrol) { > > > /* > > >* adjust the toggle based on the number of packets > > >* in this qtd > > >*/ > > > if curlen + mps - 1) / mps) & 1) || curlen == 0) > > > qtdstatus ^= EHCI_QTD_TOGGLE_MASK; > > > } > > > > > > This was introduced by revision 1.57 of ehci.c 14 years ago: > > > > > > *** > > > > > > If a zero-length bulk or interrupt transfer is requested then assume > > > USBD_FORCE_SHORT_XFER to ensure that we actually build and execute > > > a transfer. > > > > > > Based on changes in FreeBSD rev1.47 > > > > > > *** > > > > > > While the DIAGNOSTIC code to panic at curlen == 0 was introduced with > > > the first commit of ehci.c. I think the revision 1.57 should have > > > removed that DIAGNOSTIC code already, since we obviously can cope > > > with curlen = 0. > > > > > > Given that, your below diff would be OK for me. > > > > > > On Fri, Oct 23, 2020 at 10:09:14AM +, Mikolaj Kucharski wrote: > > > > > > > On Sat, Sep 26, 2020 at 11:00:46PM +, Mikolaj Kucharski wrote: > > > > > On Wed, Sep 16, 2020 at 09:19:37PM +, Mikolaj Kucharski wrote: > > > > > > So time of the crash varies and I guess probably there is no pattern > > > > > > there. Here is new panic report, with some kernel printf()s added. > > > > > > > > > > > > Problem seems to be related that dataphyspage is greater than > > > > > > dataphyslastpage: lastpage=0xa0e page=0xa0e1000 phys=0xa0e1000. > > > > > > The same is observable in all those previous panics. > > > > > > > > > > > > > > > > Here ia nother analysis, without my patches, I think they may be > > > > > confusing. I will show flow of ehci_alloc_sqtd_chain() with values > > > > > of variables which I collected via my debugging effort. > > > > > > > > > > I think I have understanding what is the flow, but not sure what > > > > > code should do in those circumstances. > > > > > > > > > > > > > I didn't get any feedback on this so far. I'm running custom kernel > > > > with additional printf()'s to help me understand the flow. > > > > > > > > I ended up with following diff, which I think is safe to do, > > > > as code handles situation when curlen == 0
Re: panic: ehci_alloc_std: curlen == 0 on 6.8-beta
On Sat, Oct 24, 2020 at 09:08:45AM +0200, Marcus Glocker wrote: > Now you have on M less in your tree checkout :-) > Thanks for tracking this down. There is one more change, which I would consider. It was visible after I switched back to official snapshot kernel. Now that kernel is not panicing, when the specific code path from this email thread is executed it prints: ehci_alloc_sqtd_chain: curlen=20480 len=0 offs=0x0 lastpage=0xcfe66000 page=0xcfe67000 phys=0xcfe67000 and I think this is not needed by default any more, so I have this diff: Index: dev/usb/ehci.c === RCS file: /cvs/src/sys/dev/usb/ehci.c,v retrieving revision 1.212 diff -u -p -u -r1.212 ehci.c --- dev/usb/ehci.c 23 Oct 2020 20:25:35 - 1.212 +++ dev/usb/ehci.c 11 Nov 2020 08:55:01 - @@ -2395,11 +2408,11 @@ ehci_alloc_sqtd_chain(struct ehci_softc EHCI_PAGE_OFFSET(dataphys); #ifdef DIAGNOSTIC if (curlen > len) { - printf("ehci_alloc_sqtd_chain: curlen=%u " + DPRINTFN(1,("ehci_alloc_sqtd_chain: curlen=%u " "len=%u offs=0x%x\n", curlen, len, - EHCI_PAGE_OFFSET(dataphys)); - printf("lastpage=0x%x page=0x%x phys=0x%x\n", - dataphyslastpage, dataphyspage, dataphys); + EHCI_PAGE_OFFSET(dataphys))); + DPRINTFN(1,("lastpage=0x%x page=0x%x phys=0x%x\n", + dataphyslastpage, dataphyspage, dataphys)); curlen = len; } #endif to mute those messages. I'm also wondering could above be just as simple as: if (curlen > len) { curlen = len; and to drop completly above printf()s / DPRINTFN()s as for me they didn't bring a lot of troubleshooting value. Dunno. Anyway one way or another muting those I think would be good. > On Fri, Oct 23, 2020 at 06:50:53PM +0200, Marcus Glocker wrote: > > > Honestly, I haven't spent much time to investigate how the curlen = 0 is > > getting generated exactly, because for me it will be very difficult to > > understand that without the hardware on my side re-producing the same. > > > > But I had look when the code was introduced to handle curlen == 0 later > > in the function: > > > > if (iscontrol) { > > /* > > * adjust the toggle based on the number of packets > > * in this qtd > > */ > > if curlen + mps - 1) / mps) & 1) || curlen == 0) > > qtdstatus ^= EHCI_QTD_TOGGLE_MASK; > > } > > > > This was introduced by revision 1.57 of ehci.c 14 years ago: > > > > *** > > > > If a zero-length bulk or interrupt transfer is requested then assume > > USBD_FORCE_SHORT_XFER to ensure that we actually build and execute > > a transfer. > > > > Based on changes in FreeBSD rev1.47 > > > > *** > > > > While the DIAGNOSTIC code to panic at curlen == 0 was introduced with > > the first commit of ehci.c. I think the revision 1.57 should have > > removed that DIAGNOSTIC code already, since we obviously can cope > > with curlen = 0. > > > > Given that, your below diff would be OK for me. > > > > On Fri, Oct 23, 2020 at 10:09:14AM +, Mikolaj Kucharski wrote: > > > > > On Sat, Sep 26, 2020 at 11:00:46PM +, Mikolaj Kucharski wrote: > > > > On Wed, Sep 16, 2020 at 09:19:37PM +, Mikolaj Kucharski wrote: > > > > > So time of the crash varies and I guess probably there is no pattern > > > > > there. Here is new panic report, with some kernel printf()s added. > > > > > > > > > > Problem seems to be related that dataphyspage is greater than > > > > > dataphyslastpage: lastpage=0xa0e page=0xa0e1000 phys=0xa0e1000. > > > > > The same is observable in all those previous panics. > > > > > > > > > > > > > Here ia nother analysis, without my patches, I think they may be > > > > confusing. I will show flow of ehci_alloc_sqtd_chain() with values > > > > of variables which I collected via my debugging effort. > > > > > > > > I think I have understanding what is the flow, but not sure what > > > > code should do in those circumstances. > > > > > > > > > > I didn't get any feedback on this so far. I'm running custom kernel > > > with additional printf()'s to help me understand the flow. > > > > > > I ended up with following diff, which I think is safe to do, > > > as code handles situation when curlen == 0 further down in > > > ehci_alloc_sqtd_chain() function. My machine runs more than > > > two weeks with below panic() removed and condition of panic > > > is triggered multiple times during uptime and I didn't notice > > > any side effects. > > > > > > However, flow of ehci_alloc_sqtd_chain() is hard to follow. Well, for me > > > anyway. At this stage I don't know how to improve
Re: panic: ehci_alloc_std: curlen == 0 on 6.8-beta
On Sat, Oct 24, 2020 at 09:08:45AM +0200, Marcus Glocker wrote: > Now you have on M less in your tree checkout :-) > Thanks for tracking this down. Awesome, thank you! One down, more to go... -- Regards, Mikolaj
Re: panic: ehci_alloc_std: curlen == 0 on 6.8-beta
Now you have on M less in your tree checkout :-) Thanks for tracking this down. On Fri, Oct 23, 2020 at 06:50:53PM +0200, Marcus Glocker wrote: > Honestly, I haven't spent much time to investigate how the curlen = 0 is > getting generated exactly, because for me it will be very difficult to > understand that without the hardware on my side re-producing the same. > > But I had look when the code was introduced to handle curlen == 0 later > in the function: > > if (iscontrol) { > /* >* adjust the toggle based on the number of packets >* in this qtd >*/ > if curlen + mps - 1) / mps) & 1) || curlen == 0) > qtdstatus ^= EHCI_QTD_TOGGLE_MASK; > } > > This was introduced by revision 1.57 of ehci.c 14 years ago: > > *** > > If a zero-length bulk or interrupt transfer is requested then assume > USBD_FORCE_SHORT_XFER to ensure that we actually build and execute > a transfer. > > Based on changes in FreeBSD rev1.47 > > *** > > While the DIAGNOSTIC code to panic at curlen == 0 was introduced with > the first commit of ehci.c. I think the revision 1.57 should have > removed that DIAGNOSTIC code already, since we obviously can cope > with curlen = 0. > > Given that, your below diff would be OK for me. > > On Fri, Oct 23, 2020 at 10:09:14AM +, Mikolaj Kucharski wrote: > > > On Sat, Sep 26, 2020 at 11:00:46PM +, Mikolaj Kucharski wrote: > > > On Wed, Sep 16, 2020 at 09:19:37PM +, Mikolaj Kucharski wrote: > > > > So time of the crash varies and I guess probably there is no pattern > > > > there. Here is new panic report, with some kernel printf()s added. > > > > > > > > Problem seems to be related that dataphyspage is greater than > > > > dataphyslastpage: lastpage=0xa0e page=0xa0e1000 phys=0xa0e1000. > > > > The same is observable in all those previous panics. > > > > > > > > > > Here ia nother analysis, without my patches, I think they may be > > > confusing. I will show flow of ehci_alloc_sqtd_chain() with values > > > of variables which I collected via my debugging effort. > > > > > > I think I have understanding what is the flow, but not sure what > > > code should do in those circumstances. > > > > > > > I didn't get any feedback on this so far. I'm running custom kernel > > with additional printf()'s to help me understand the flow. > > > > I ended up with following diff, which I think is safe to do, > > as code handles situation when curlen == 0 further down in > > ehci_alloc_sqtd_chain() function. My machine runs more than > > two weeks with below panic() removed and condition of panic > > is triggered multiple times during uptime and I didn't notice > > any side effects. > > > > However, flow of ehci_alloc_sqtd_chain() is hard to follow. Well, for me > > anyway. At this stage I don't know how to improve things, beyond below > > diff. > > > > Any feedback would be appreciated. I don't like running custom kernels, > > so I would like to drop below M from my source checkout. > > > > > > Index: ehci.c > > === > > RCS file: /cvs/src/sys/dev/usb/ehci.c,v > > retrieving revision 1.211 > > diff -u -p -u -r1.211 ehci.c > > --- ehci.c 6 Aug 2020 14:06:12 - 1.211 > > +++ ehci.c 23 Oct 2020 10:00:35 - > > @@ -2407,10 +2407,6 @@ ehci_alloc_sqtd_chain(struct ehci_softc > > curlen -= curlen % mps; > > DPRINTFN(1,("ehci_alloc_sqtd_chain: multiple QTDs, " > > "curlen=%u\n", curlen)); > > -#ifdef DIAGNOSTIC > > - if (curlen == 0) > > - panic("ehci_alloc_std: curlen == 0"); > > -#endif > > } > > > > DPRINTFN(4,("ehci_alloc_sqtd_chain: dataphys=0x%08x " > > > > -- > > Regards, > > Mikolaj > > >
Re: panic: ehci_alloc_std: curlen == 0 on 6.8-beta
Honestly, I haven't spent much time to investigate how the curlen = 0 is getting generated exactly, because for me it will be very difficult to understand that without the hardware on my side re-producing the same. But I had look when the code was introduced to handle curlen == 0 later in the function: if (iscontrol) { /* * adjust the toggle based on the number of packets * in this qtd */ if curlen + mps - 1) / mps) & 1) || curlen == 0) qtdstatus ^= EHCI_QTD_TOGGLE_MASK; } This was introduced by revision 1.57 of ehci.c 14 years ago: *** If a zero-length bulk or interrupt transfer is requested then assume USBD_FORCE_SHORT_XFER to ensure that we actually build and execute a transfer. Based on changes in FreeBSD rev1.47 *** While the DIAGNOSTIC code to panic at curlen == 0 was introduced with the first commit of ehci.c. I think the revision 1.57 should have removed that DIAGNOSTIC code already, since we obviously can cope with curlen = 0. Given that, your below diff would be OK for me. On Fri, Oct 23, 2020 at 10:09:14AM +, Mikolaj Kucharski wrote: > On Sat, Sep 26, 2020 at 11:00:46PM +, Mikolaj Kucharski wrote: > > On Wed, Sep 16, 2020 at 09:19:37PM +, Mikolaj Kucharski wrote: > > > So time of the crash varies and I guess probably there is no pattern > > > there. Here is new panic report, with some kernel printf()s added. > > > > > > Problem seems to be related that dataphyspage is greater than > > > dataphyslastpage: lastpage=0xa0e page=0xa0e1000 phys=0xa0e1000. > > > The same is observable in all those previous panics. > > > > > > > Here ia nother analysis, without my patches, I think they may be > > confusing. I will show flow of ehci_alloc_sqtd_chain() with values > > of variables which I collected via my debugging effort. > > > > I think I have understanding what is the flow, but not sure what > > code should do in those circumstances. > > > > I didn't get any feedback on this so far. I'm running custom kernel > with additional printf()'s to help me understand the flow. > > I ended up with following diff, which I think is safe to do, > as code handles situation when curlen == 0 further down in > ehci_alloc_sqtd_chain() function. My machine runs more than > two weeks with below panic() removed and condition of panic > is triggered multiple times during uptime and I didn't notice > any side effects. > > However, flow of ehci_alloc_sqtd_chain() is hard to follow. Well, for me > anyway. At this stage I don't know how to improve things, beyond below > diff. > > Any feedback would be appreciated. I don't like running custom kernels, > so I would like to drop below M from my source checkout. > > > Index: ehci.c > === > RCS file: /cvs/src/sys/dev/usb/ehci.c,v > retrieving revision 1.211 > diff -u -p -u -r1.211 ehci.c > --- ehci.c6 Aug 2020 14:06:12 - 1.211 > +++ ehci.c23 Oct 2020 10:00:35 - > @@ -2407,10 +2407,6 @@ ehci_alloc_sqtd_chain(struct ehci_softc > curlen -= curlen % mps; > DPRINTFN(1,("ehci_alloc_sqtd_chain: multiple QTDs, " > "curlen=%u\n", curlen)); > -#ifdef DIAGNOSTIC > - if (curlen == 0) > - panic("ehci_alloc_std: curlen == 0"); > -#endif > } > > DPRINTFN(4,("ehci_alloc_sqtd_chain: dataphys=0x%08x " > > -- > Regards, > Mikolaj >
Re: panic: ehci_alloc_std: curlen == 0 on 6.8-beta
On Sat, Sep 26, 2020 at 11:00:46PM +, Mikolaj Kucharski wrote: > On Wed, Sep 16, 2020 at 09:19:37PM +, Mikolaj Kucharski wrote: > > So time of the crash varies and I guess probably there is no pattern > > there. Here is new panic report, with some kernel printf()s added. > > > > Problem seems to be related that dataphyspage is greater than > > dataphyslastpage: lastpage=0xa0e page=0xa0e1000 phys=0xa0e1000. > > The same is observable in all those previous panics. > > > > Here ia nother analysis, without my patches, I think they may be > confusing. I will show flow of ehci_alloc_sqtd_chain() with values > of variables which I collected via my debugging effort. > > I think I have understanding what is the flow, but not sure what > code should do in those circumstances. > I didn't get any feedback on this so far. I'm running custom kernel with additional printf()'s to help me understand the flow. I ended up with following diff, which I think is safe to do, as code handles situation when curlen == 0 further down in ehci_alloc_sqtd_chain() function. My machine runs more than two weeks with below panic() removed and condition of panic is triggered multiple times during uptime and I didn't notice any side effects. However, flow of ehci_alloc_sqtd_chain() is hard to follow. Well, for me anyway. At this stage I don't know how to improve things, beyond below diff. Any feedback would be appreciated. I don't like running custom kernels, so I would like to drop below M from my source checkout. Index: ehci.c === RCS file: /cvs/src/sys/dev/usb/ehci.c,v retrieving revision 1.211 diff -u -p -u -r1.211 ehci.c --- ehci.c 6 Aug 2020 14:06:12 - 1.211 +++ ehci.c 23 Oct 2020 10:00:35 - @@ -2407,10 +2407,6 @@ ehci_alloc_sqtd_chain(struct ehci_softc curlen -= curlen % mps; DPRINTFN(1,("ehci_alloc_sqtd_chain: multiple QTDs, " "curlen=%u\n", curlen)); -#ifdef DIAGNOSTIC - if (curlen == 0) - panic("ehci_alloc_std: curlen == 0"); -#endif } DPRINTFN(4,("ehci_alloc_sqtd_chain: dataphys=0x%08x " -- Regards, Mikolaj
Re: panic: ehci_alloc_std: curlen == 0 on 6.8-beta
On Wed, Sep 16, 2020 at 09:19:37PM +, Mikolaj Kucharski wrote: > So time of the crash varies and I guess probably there is no pattern > there. Here is new panic report, with some kernel printf()s added. > > Problem seems to be related that dataphyspage is greater than > dataphyslastpage: lastpage=0xa0e page=0xa0e1000 phys=0xa0e1000. > The same is observable in all those previous panics. > Here ia nother analysis, without my patches, I think they may be confusing. I will show flow of ehci_alloc_sqtd_chain() with values of variables which I collected via my debugging effort. I think I have understanding what is the flow, but not sure what code should do in those circumstances. Values are based on panic: lastpage=0xa0ec000 ehci_page=0xa0ed000 phys=0xa0ed000 ehci_alloc_sqtd_chain(800db000,1000,fd812e8d6990,8000225adfc0,8000225adfc8) at ehci_alloc_sqtd_chain+0x766 but there are also panics with alen=0x2000 ehci_alloc_sqtd_chain(800db000,2000,fd812e8d6aa0,8000225c60e0,8000225c60e8) at ehci_alloc_sqtd_chain+0x729 1 usbd_status 2 ehci_alloc_sqtd_chain(struct ehci_softc *sc, u_int alen, struct usbd_xfer *xfer, 3 struct ehci_soft_qtd **sp, struct ehci_soft_qtd **ep) 4 { 5 struct ehci_soft_qtd *next, *cur; 6 ehci_physaddr_t dataphys, dataphyspage, dataphyslastpage, nextphys; 7 u_int32_t qtdstatus; 8 u_int len, curlen; 9 int mps, i, iscontrol, forceshort; 10 int rd = usbd_xfer_isread(xfer); 11 struct usb_dma *dma = >dmabuf; 12 13 DPRINTFN(alen<4*4096,("ehci_alloc_sqtd_chain: start len=%d\n", alen)); 14 15 len = alen; alen=4096 16 iscontrol = UE_GET_XFERTYPE(xfer->pipe->endpoint->edesc->bmAttributes) == 17 UE_CONTROL; iscontrol=0 18 19 dataphys = DMAADDR(dma, 0); 20 dataphyslastpage = EHCI_PAGE(dataphys + len - 1); dataphys=0xa0ec000 dataphyslastpage=0xa0ec000 21 qtdstatus = EHCI_QTD_ACTIVE | 22 EHCI_QTD_SET_PID(rd ? EHCI_QTD_PID_IN : EHCI_QTD_PID_OUT) | 23 EHCI_QTD_SET_CERR(3); /* IOC and BYTES set below */ 24 mps = UGETW(xfer->pipe->endpoint->edesc->wMaxPacketSize); mps=512 25 forceshort = ((xfer->flags & USBD_FORCE_SHORT_XFER) || len == 0) && 26 len % mps == 0; xfer->flags=0x9 USBD_FORCE_SHORT_XFER=0x8 (xfer->flags & USBD_FORCE_SHORT_XFER)=0x8 forceshort=1 27 /* 28 * The control transfer data stage always starts with a toggle of 1. 29 * For other transfers we let the hardware track the toggle state. 30 */ 31 if (iscontrol) 32 qtdstatus |= EHCI_QTD_SET_TOGGLE(1); 33 34 cur = ehci_alloc_sqtd(sc); 35 *sp = cur; 36 if (cur == NULL) 37 goto nomem; 38 39 usb_syncmem(dma, 0, alen, 40 rd ? BUS_DMASYNC_PREREAD : BUS_DMASYNC_PREWRITE); 41 for (;;) { 42 dataphyspage = EHCI_PAGE(dataphys); dataphyspage=0xa0ec000 43 /* The EHCI hardware can handle at most 5 pages. */ 44 if (dataphyslastpage - dataphyspage < 45 EHCI_QTD_NBUFFERS * EHCI_PAGE_SIZE) { EHCI_QTD_NBUFFERS=5 EHCI_PAGE_SIZE=4096 0 < 5*4096 (20480) -> true 46 /* we can handle it in this QTD */ 47 curlen = len; curlen=4096 48 } else { else branch is not executed 49 /* must use multiple TDs, fill as much as possible. */ 50 curlen = EHCI_QTD_NBUFFERS * EHCI_PAGE_SIZE - 51 EHCI_PAGE_OFFSET(dataphys); 52 #ifdef DIAGNOSTIC 53 if (curlen > len) { 54 printf("ehci_alloc_sqtd_chain: curlen=%u " 55 "len=%u offs=0x%x\n", curlen, len, 56 EHCI_PAGE_OFFSET(dataphys)); 57 printf("lastpage=0x%x page=0x%x phys=0x%x\n", 58 dataphyslastpage, dataphyspage, dataphys); 59 curlen = len; 60 } 61 #endif 62 /* the length must be a multiple of the max size */ 63