Re: panic: ehci_alloc_std: curlen == 0 on 6.8-beta

2020-11-27 Thread Marcus Glocker
On Fri, Nov 27, 2020 at 12:57:02PM +, Mikolaj Kucharski wrote:

> I think something as simple as below would be okay. If requested I can
> put in DPRINTFN()s based on current printf()s, like I proposed in
> earlier diff in this thread. However more important part is, that I
> think DIAGNOSTIC ifdef should be removed as rest of the code, which
> relies on `if (curlen > len) curlen = len;` is not enclosed with
> `#ifdef DIAGNOSTIC`

Right.  That code should be outside of DIAGNOSTIC.  Though I would
leave the printf's in as DPRINTF's for the time being.  If you can
send such a diff I'm fine.
 
> Index: dev/usb/ehci.c
> ===
> RCS file: /cvs/src/sys/dev/usb/ehci.c,v
> retrieving revision 1.212
> diff -u -p -u -r1.212 ehci.c
> --- dev/usb/ehci.c23 Oct 2020 20:25:35 -  1.212
> +++ dev/usb/ehci.c27 Nov 2020 10:16:23 -
> @@ -2393,16 +2406,10 @@ ehci_alloc_sqtd_chain(struct ehci_softc 
>   /* must use multiple TDs, fill as much as possible. */
>   curlen = EHCI_QTD_NBUFFERS * EHCI_PAGE_SIZE -
>EHCI_PAGE_OFFSET(dataphys);
> -#ifdef DIAGNOSTIC
> - if (curlen > len) {
> - printf("ehci_alloc_sqtd_chain: curlen=%u "
> - "len=%u offs=0x%x\n", curlen, len,
> - EHCI_PAGE_OFFSET(dataphys));
> - printf("lastpage=0x%x page=0x%x phys=0x%x\n",
> - dataphyslastpage, dataphyspage, dataphys);
> +
> + if (curlen > len)
>   curlen = len;
> - }
> -#endif
> +
>   /* the length must be a multiple of the max size */
>   curlen -= curlen % mps;
>   DPRINTFN(1,("ehci_alloc_sqtd_chain: multiple QTDs, "
> 
> 
> On Sun, Nov 22, 2020 at 01:36:10AM +, Mikolaj Kucharski wrote:
> > Hi,
> > 
> > Whould below diff be okay, or just simple:
> > 
> > if (curlen > len)
> > curlen = len;
> > 
> > be more appropriate here?
> > 
> > On Wed, Nov 11, 2020 at 09:02:49AM +, Mikolaj Kucharski wrote:
> > > On Sat, Oct 24, 2020 at 09:08:45AM +0200, Marcus Glocker wrote:
> > > > Now you have on M less in your tree checkout :-)
> > > > Thanks for tracking this down.
> > > 
> > > There is one more change, which I would consider. It was visible after I
> > > switched back to official snapshot kernel. Now that kernel is not
> > > panicing, when the specific code path from this email thread is executed
> > > it prints:
> > > 
> > > ehci_alloc_sqtd_chain: curlen=20480 len=0 offs=0x0
> > > lastpage=0xcfe66000 page=0xcfe67000 phys=0xcfe67000
> > > 
> > > and I think this is not needed by default any more, so I have this diff:
> > > 
> > > Index: dev/usb/ehci.c
> > > ===
> > > RCS file: /cvs/src/sys/dev/usb/ehci.c,v
> > > retrieving revision 1.212
> > > diff -u -p -u -r1.212 ehci.c
> > > --- dev/usb/ehci.c23 Oct 2020 20:25:35 -  1.212
> > > +++ dev/usb/ehci.c11 Nov 2020 08:55:01 -
> > > @@ -2395,11 +2408,11 @@ ehci_alloc_sqtd_chain(struct ehci_softc 
> > >EHCI_PAGE_OFFSET(dataphys);
> > >  #ifdef DIAGNOSTIC
> > >   if (curlen > len) {
> > > - printf("ehci_alloc_sqtd_chain: curlen=%u "
> > > + DPRINTFN(1,("ehci_alloc_sqtd_chain: curlen=%u "
> > >   "len=%u offs=0x%x\n", curlen, len,
> > > - EHCI_PAGE_OFFSET(dataphys));
> > > - printf("lastpage=0x%x page=0x%x phys=0x%x\n",
> > > - dataphyslastpage, dataphyspage, dataphys);
> > > + EHCI_PAGE_OFFSET(dataphys)));
> > > + DPRINTFN(1,("lastpage=0x%x page=0x%x 
> > > phys=0x%x\n",
> > > + dataphyslastpage, dataphyspage, dataphys));
> > >   curlen = len;
> > >   }
> > >  #endif
> > > 
> > > to mute those messages. I'm also wondering could above be just as simple
> > > as:
> > > 
> > >   if (curlen > len) {
> > >   curlen = len;
> > > 
> > > and to drop completly above printf()s / DPRINTFN()s as for me they
> > > didn't bring a lot of troubleshooting value. Dunno. Anyway one way or
> > > another muting those I think would be good.
> > > 
> > > 
> > > > On Fri, Oct 23, 2020 at 06:50:53PM +0200, Marcus Glocker wrote:
> > > > 
> > > > > Honestly, I haven't spent much time to investigate how the curlen = 0 
> > > > > is
> > > > > getting generated exactly, because for me it will be very difficult to
> > > > > understand that without the hardware on my side re-producing the same.
> > > > > 
> > > > > But I had look when the code was introduced to 

Re: panic: ehci_alloc_std: curlen == 0 on 6.8-beta

2020-11-27 Thread Mikolaj Kucharski
I think something as simple as below would be okay. If requested I can
put in DPRINTFN()s based on current printf()s, like I proposed in
earlier diff in this thread. However more important part is, that I
think DIAGNOSTIC ifdef should be removed as rest of the code, which
relies on `if (curlen > len) curlen = len;` is not enclosed with
`#ifdef DIAGNOSTIC`


Index: dev/usb/ehci.c
===
RCS file: /cvs/src/sys/dev/usb/ehci.c,v
retrieving revision 1.212
diff -u -p -u -r1.212 ehci.c
--- dev/usb/ehci.c  23 Oct 2020 20:25:35 -  1.212
+++ dev/usb/ehci.c  27 Nov 2020 10:16:23 -
@@ -2393,16 +2406,10 @@ ehci_alloc_sqtd_chain(struct ehci_softc 
/* must use multiple TDs, fill as much as possible. */
curlen = EHCI_QTD_NBUFFERS * EHCI_PAGE_SIZE -
 EHCI_PAGE_OFFSET(dataphys);
-#ifdef DIAGNOSTIC
-   if (curlen > len) {
-   printf("ehci_alloc_sqtd_chain: curlen=%u "
-   "len=%u offs=0x%x\n", curlen, len,
-   EHCI_PAGE_OFFSET(dataphys));
-   printf("lastpage=0x%x page=0x%x phys=0x%x\n",
-   dataphyslastpage, dataphyspage, dataphys);
+
+   if (curlen > len)
curlen = len;
-   }
-#endif
+
/* the length must be a multiple of the max size */
curlen -= curlen % mps;
DPRINTFN(1,("ehci_alloc_sqtd_chain: multiple QTDs, "


On Sun, Nov 22, 2020 at 01:36:10AM +, Mikolaj Kucharski wrote:
> Hi,
> 
> Whould below diff be okay, or just simple:
> 
>   if (curlen > len)
>   curlen = len;
> 
> be more appropriate here?
> 
> On Wed, Nov 11, 2020 at 09:02:49AM +, Mikolaj Kucharski wrote:
> > On Sat, Oct 24, 2020 at 09:08:45AM +0200, Marcus Glocker wrote:
> > > Now you have on M less in your tree checkout :-)
> > > Thanks for tracking this down.
> > 
> > There is one more change, which I would consider. It was visible after I
> > switched back to official snapshot kernel. Now that kernel is not
> > panicing, when the specific code path from this email thread is executed
> > it prints:
> > 
> > ehci_alloc_sqtd_chain: curlen=20480 len=0 offs=0x0
> > lastpage=0xcfe66000 page=0xcfe67000 phys=0xcfe67000
> > 
> > and I think this is not needed by default any more, so I have this diff:
> > 
> > Index: dev/usb/ehci.c
> > ===
> > RCS file: /cvs/src/sys/dev/usb/ehci.c,v
> > retrieving revision 1.212
> > diff -u -p -u -r1.212 ehci.c
> > --- dev/usb/ehci.c  23 Oct 2020 20:25:35 -  1.212
> > +++ dev/usb/ehci.c  11 Nov 2020 08:55:01 -
> > @@ -2395,11 +2408,11 @@ ehci_alloc_sqtd_chain(struct ehci_softc 
> >  EHCI_PAGE_OFFSET(dataphys);
> >  #ifdef DIAGNOSTIC
> > if (curlen > len) {
> > -   printf("ehci_alloc_sqtd_chain: curlen=%u "
> > +   DPRINTFN(1,("ehci_alloc_sqtd_chain: curlen=%u "
> > "len=%u offs=0x%x\n", curlen, len,
> > -   EHCI_PAGE_OFFSET(dataphys));
> > -   printf("lastpage=0x%x page=0x%x phys=0x%x\n",
> > -   dataphyslastpage, dataphyspage, dataphys);
> > +   EHCI_PAGE_OFFSET(dataphys)));
> > +   DPRINTFN(1,("lastpage=0x%x page=0x%x 
> > phys=0x%x\n",
> > +   dataphyslastpage, dataphyspage, dataphys));
> > curlen = len;
> > }
> >  #endif
> > 
> > to mute those messages. I'm also wondering could above be just as simple
> > as:
> > 
> > if (curlen > len) {
> > curlen = len;
> > 
> > and to drop completly above printf()s / DPRINTFN()s as for me they
> > didn't bring a lot of troubleshooting value. Dunno. Anyway one way or
> > another muting those I think would be good.
> > 
> > 
> > > On Fri, Oct 23, 2020 at 06:50:53PM +0200, Marcus Glocker wrote:
> > > 
> > > > Honestly, I haven't spent much time to investigate how the curlen = 0 is
> > > > getting generated exactly, because for me it will be very difficult to
> > > > understand that without the hardware on my side re-producing the same.
> > > > 
> > > > But I had look when the code was introduced to handle curlen == 0 later
> > > > in the function:
> > > > 
> > > > if (iscontrol) {
> > > > /*
> > > >  * adjust the toggle based on the number of packets
> > > >  * in this qtd
> > > >  */
> > > > if curlen + mps - 1) / mps) & 1) || curlen == 0)
> > > > qtdstatus ^= EHCI_QTD_TOGGLE_MASK;
> > > > }
> > > > 
> > > > This was 

Re: panic: ehci_alloc_std: curlen == 0 on 6.8-beta

2020-11-21 Thread Mikolaj Kucharski
Hi,

Whould below diff be okay, or just simple:

if (curlen > len)
curlen = len;

be more appropriate here?

On Wed, Nov 11, 2020 at 09:02:49AM +, Mikolaj Kucharski wrote:
> On Sat, Oct 24, 2020 at 09:08:45AM +0200, Marcus Glocker wrote:
> > Now you have on M less in your tree checkout :-)
> > Thanks for tracking this down.
> 
> There is one more change, which I would consider. It was visible after I
> switched back to official snapshot kernel. Now that kernel is not
> panicing, when the specific code path from this email thread is executed
> it prints:
> 
> ehci_alloc_sqtd_chain: curlen=20480 len=0 offs=0x0
> lastpage=0xcfe66000 page=0xcfe67000 phys=0xcfe67000
> 
> and I think this is not needed by default any more, so I have this diff:
> 
> Index: dev/usb/ehci.c
> ===
> RCS file: /cvs/src/sys/dev/usb/ehci.c,v
> retrieving revision 1.212
> diff -u -p -u -r1.212 ehci.c
> --- dev/usb/ehci.c23 Oct 2020 20:25:35 -  1.212
> +++ dev/usb/ehci.c11 Nov 2020 08:55:01 -
> @@ -2395,11 +2408,11 @@ ehci_alloc_sqtd_chain(struct ehci_softc 
>EHCI_PAGE_OFFSET(dataphys);
>  #ifdef DIAGNOSTIC
>   if (curlen > len) {
> - printf("ehci_alloc_sqtd_chain: curlen=%u "
> + DPRINTFN(1,("ehci_alloc_sqtd_chain: curlen=%u "
>   "len=%u offs=0x%x\n", curlen, len,
> - EHCI_PAGE_OFFSET(dataphys));
> - printf("lastpage=0x%x page=0x%x phys=0x%x\n",
> - dataphyslastpage, dataphyspage, dataphys);
> + EHCI_PAGE_OFFSET(dataphys)));
> + DPRINTFN(1,("lastpage=0x%x page=0x%x 
> phys=0x%x\n",
> + dataphyslastpage, dataphyspage, dataphys));
>   curlen = len;
>   }
>  #endif
> 
> to mute those messages. I'm also wondering could above be just as simple
> as:
> 
>   if (curlen > len) {
>   curlen = len;
> 
> and to drop completly above printf()s / DPRINTFN()s as for me they
> didn't bring a lot of troubleshooting value. Dunno. Anyway one way or
> another muting those I think would be good.
> 
> 
> > On Fri, Oct 23, 2020 at 06:50:53PM +0200, Marcus Glocker wrote:
> > 
> > > Honestly, I haven't spent much time to investigate how the curlen = 0 is
> > > getting generated exactly, because for me it will be very difficult to
> > > understand that without the hardware on my side re-producing the same.
> > > 
> > > But I had look when the code was introduced to handle curlen == 0 later
> > > in the function:
> > > 
> > > if (iscontrol) {
> > >   /*
> > >* adjust the toggle based on the number of packets
> > >* in this qtd
> > >*/
> > >   if curlen + mps - 1) / mps) & 1) || curlen == 0)
> > >   qtdstatus ^= EHCI_QTD_TOGGLE_MASK;
> > > }
> > > 
> > > This was introduced by revision 1.57 of ehci.c 14 years ago:
> > > 
> > > ***
> > > 
> > > If a zero-length bulk or interrupt transfer is requested then assume
> > > USBD_FORCE_SHORT_XFER to ensure that we actually build and execute
> > > a transfer.
> > > 
> > > Based on changes in FreeBSD rev1.47
> > > 
> > > ***
> > > 
> > > While the DIAGNOSTIC code to panic at curlen == 0 was introduced with
> > > the first commit of ehci.c.  I think the revision 1.57 should have
> > > removed that DIAGNOSTIC code already, since we obviously can cope
> > > with curlen = 0.
> > > 
> > > Given that, your below diff would be OK for me.
> > > 
> > > On Fri, Oct 23, 2020 at 10:09:14AM +, Mikolaj Kucharski wrote:
> > > 
> > > > On Sat, Sep 26, 2020 at 11:00:46PM +, Mikolaj Kucharski wrote:
> > > > > On Wed, Sep 16, 2020 at 09:19:37PM +, Mikolaj Kucharski wrote:
> > > > > > So time of the crash varies and I guess probably there is no pattern
> > > > > > there. Here is new panic report, with some kernel printf()s added.
> > > > > > 
> > > > > > Problem seems to be related that dataphyspage is greater than
> > > > > > dataphyslastpage: lastpage=0xa0e page=0xa0e1000 phys=0xa0e1000.
> > > > > > The same is observable in all those previous panics.
> > > > > > 
> > > > > 
> > > > > Here ia nother analysis, without my patches, I think they may be
> > > > > confusing. I will show flow of ehci_alloc_sqtd_chain() with values
> > > > > of variables which I collected via my debugging effort.
> > > > > 
> > > > > I think I have understanding what is the flow, but not sure what
> > > > > code should do in those circumstances.
> > > > > 
> > > > 
> > > > I didn't get any feedback on this so far. I'm running custom kernel
> > > > with additional printf()'s to help me understand the flow.
> > > > 
> > > > I ended up with following diff, which I think is safe to do,
> > > > as code handles situation when curlen == 0 

Re: panic: ehci_alloc_std: curlen == 0 on 6.8-beta

2020-11-11 Thread Mikolaj Kucharski
On Sat, Oct 24, 2020 at 09:08:45AM +0200, Marcus Glocker wrote:
> Now you have on M less in your tree checkout :-)
> Thanks for tracking this down.

There is one more change, which I would consider. It was visible after I
switched back to official snapshot kernel. Now that kernel is not
panicing, when the specific code path from this email thread is executed
it prints:

ehci_alloc_sqtd_chain: curlen=20480 len=0 offs=0x0
lastpage=0xcfe66000 page=0xcfe67000 phys=0xcfe67000

and I think this is not needed by default any more, so I have this diff:

Index: dev/usb/ehci.c
===
RCS file: /cvs/src/sys/dev/usb/ehci.c,v
retrieving revision 1.212
diff -u -p -u -r1.212 ehci.c
--- dev/usb/ehci.c  23 Oct 2020 20:25:35 -  1.212
+++ dev/usb/ehci.c  11 Nov 2020 08:55:01 -
@@ -2395,11 +2408,11 @@ ehci_alloc_sqtd_chain(struct ehci_softc 
 EHCI_PAGE_OFFSET(dataphys);
 #ifdef DIAGNOSTIC
if (curlen > len) {
-   printf("ehci_alloc_sqtd_chain: curlen=%u "
+   DPRINTFN(1,("ehci_alloc_sqtd_chain: curlen=%u "
"len=%u offs=0x%x\n", curlen, len,
-   EHCI_PAGE_OFFSET(dataphys));
-   printf("lastpage=0x%x page=0x%x phys=0x%x\n",
-   dataphyslastpage, dataphyspage, dataphys);
+   EHCI_PAGE_OFFSET(dataphys)));
+   DPRINTFN(1,("lastpage=0x%x page=0x%x 
phys=0x%x\n",
+   dataphyslastpage, dataphyspage, dataphys));
curlen = len;
}
 #endif

to mute those messages. I'm also wondering could above be just as simple
as:

if (curlen > len) {
curlen = len;

and to drop completly above printf()s / DPRINTFN()s as for me they
didn't bring a lot of troubleshooting value. Dunno. Anyway one way or
another muting those I think would be good.


> On Fri, Oct 23, 2020 at 06:50:53PM +0200, Marcus Glocker wrote:
> 
> > Honestly, I haven't spent much time to investigate how the curlen = 0 is
> > getting generated exactly, because for me it will be very difficult to
> > understand that without the hardware on my side re-producing the same.
> > 
> > But I had look when the code was introduced to handle curlen == 0 later
> > in the function:
> > 
> > if (iscontrol) {
> > /*
> >  * adjust the toggle based on the number of packets
> >  * in this qtd
> >  */
> > if curlen + mps - 1) / mps) & 1) || curlen == 0)
> > qtdstatus ^= EHCI_QTD_TOGGLE_MASK;
> > }
> > 
> > This was introduced by revision 1.57 of ehci.c 14 years ago:
> > 
> > ***
> > 
> > If a zero-length bulk or interrupt transfer is requested then assume
> > USBD_FORCE_SHORT_XFER to ensure that we actually build and execute
> > a transfer.
> > 
> > Based on changes in FreeBSD rev1.47
> > 
> > ***
> > 
> > While the DIAGNOSTIC code to panic at curlen == 0 was introduced with
> > the first commit of ehci.c.  I think the revision 1.57 should have
> > removed that DIAGNOSTIC code already, since we obviously can cope
> > with curlen = 0.
> > 
> > Given that, your below diff would be OK for me.
> > 
> > On Fri, Oct 23, 2020 at 10:09:14AM +, Mikolaj Kucharski wrote:
> > 
> > > On Sat, Sep 26, 2020 at 11:00:46PM +, Mikolaj Kucharski wrote:
> > > > On Wed, Sep 16, 2020 at 09:19:37PM +, Mikolaj Kucharski wrote:
> > > > > So time of the crash varies and I guess probably there is no pattern
> > > > > there. Here is new panic report, with some kernel printf()s added.
> > > > > 
> > > > > Problem seems to be related that dataphyspage is greater than
> > > > > dataphyslastpage: lastpage=0xa0e page=0xa0e1000 phys=0xa0e1000.
> > > > > The same is observable in all those previous panics.
> > > > > 
> > > > 
> > > > Here ia nother analysis, without my patches, I think they may be
> > > > confusing. I will show flow of ehci_alloc_sqtd_chain() with values
> > > > of variables which I collected via my debugging effort.
> > > > 
> > > > I think I have understanding what is the flow, but not sure what
> > > > code should do in those circumstances.
> > > > 
> > > 
> > > I didn't get any feedback on this so far. I'm running custom kernel
> > > with additional printf()'s to help me understand the flow.
> > > 
> > > I ended up with following diff, which I think is safe to do,
> > > as code handles situation when curlen == 0 further down in
> > > ehci_alloc_sqtd_chain() function. My machine runs more than
> > > two weeks with below panic() removed and condition of panic
> > > is triggered multiple times during uptime and I didn't notice
> > > any side effects.
> > > 
> > > However, flow of ehci_alloc_sqtd_chain() is hard to follow. Well, for me
> > > anyway. At this stage I don't know how to improve 

Re: panic: ehci_alloc_std: curlen == 0 on 6.8-beta

2020-10-25 Thread Mikolaj Kucharski
On Sat, Oct 24, 2020 at 09:08:45AM +0200, Marcus Glocker wrote:
> Now you have on M less in your tree checkout :-)
> Thanks for tracking this down.

Awesome, thank you! One down, more to go...

-- 
Regards,
 Mikolaj



Re: panic: ehci_alloc_std: curlen == 0 on 6.8-beta

2020-10-24 Thread Marcus Glocker
Now you have on M less in your tree checkout :-)
Thanks for tracking this down.

On Fri, Oct 23, 2020 at 06:50:53PM +0200, Marcus Glocker wrote:

> Honestly, I haven't spent much time to investigate how the curlen = 0 is
> getting generated exactly, because for me it will be very difficult to
> understand that without the hardware on my side re-producing the same.
> 
> But I had look when the code was introduced to handle curlen == 0 later
> in the function:
> 
> if (iscontrol) {
>   /*
>* adjust the toggle based on the number of packets
>* in this qtd
>*/
>   if curlen + mps - 1) / mps) & 1) || curlen == 0)
>   qtdstatus ^= EHCI_QTD_TOGGLE_MASK;
> }
> 
> This was introduced by revision 1.57 of ehci.c 14 years ago:
> 
> ***
> 
> If a zero-length bulk or interrupt transfer is requested then assume
> USBD_FORCE_SHORT_XFER to ensure that we actually build and execute
> a transfer.
> 
> Based on changes in FreeBSD rev1.47
> 
> ***
> 
> While the DIAGNOSTIC code to panic at curlen == 0 was introduced with
> the first commit of ehci.c.  I think the revision 1.57 should have
> removed that DIAGNOSTIC code already, since we obviously can cope
> with curlen = 0.
> 
> Given that, your below diff would be OK for me.
> 
> On Fri, Oct 23, 2020 at 10:09:14AM +, Mikolaj Kucharski wrote:
> 
> > On Sat, Sep 26, 2020 at 11:00:46PM +, Mikolaj Kucharski wrote:
> > > On Wed, Sep 16, 2020 at 09:19:37PM +, Mikolaj Kucharski wrote:
> > > > So time of the crash varies and I guess probably there is no pattern
> > > > there. Here is new panic report, with some kernel printf()s added.
> > > > 
> > > > Problem seems to be related that dataphyspage is greater than
> > > > dataphyslastpage: lastpage=0xa0e page=0xa0e1000 phys=0xa0e1000.
> > > > The same is observable in all those previous panics.
> > > > 
> > > 
> > > Here ia nother analysis, without my patches, I think they may be
> > > confusing. I will show flow of ehci_alloc_sqtd_chain() with values
> > > of variables which I collected via my debugging effort.
> > > 
> > > I think I have understanding what is the flow, but not sure what
> > > code should do in those circumstances.
> > > 
> > 
> > I didn't get any feedback on this so far. I'm running custom kernel
> > with additional printf()'s to help me understand the flow.
> > 
> > I ended up with following diff, which I think is safe to do,
> > as code handles situation when curlen == 0 further down in
> > ehci_alloc_sqtd_chain() function. My machine runs more than
> > two weeks with below panic() removed and condition of panic
> > is triggered multiple times during uptime and I didn't notice
> > any side effects.
> > 
> > However, flow of ehci_alloc_sqtd_chain() is hard to follow. Well, for me
> > anyway. At this stage I don't know how to improve things, beyond below
> > diff.
> > 
> > Any feedback would be appreciated. I don't like running custom kernels,
> > so I would like to drop below M from my source checkout.
> > 
> > 
> > Index: ehci.c
> > ===
> > RCS file: /cvs/src/sys/dev/usb/ehci.c,v
> > retrieving revision 1.211
> > diff -u -p -u -r1.211 ehci.c
> > --- ehci.c  6 Aug 2020 14:06:12 -   1.211
> > +++ ehci.c  23 Oct 2020 10:00:35 -
> > @@ -2407,10 +2407,6 @@ ehci_alloc_sqtd_chain(struct ehci_softc 
> > curlen -= curlen % mps;
> > DPRINTFN(1,("ehci_alloc_sqtd_chain: multiple QTDs, "
> > "curlen=%u\n", curlen));
> > -#ifdef DIAGNOSTIC
> > -   if (curlen == 0)
> > -   panic("ehci_alloc_std: curlen == 0");
> > -#endif
> > }
> >  
> > DPRINTFN(4,("ehci_alloc_sqtd_chain: dataphys=0x%08x "
> > 
> > -- 
> > Regards,
> >  Mikolaj
> > 
> 



Re: panic: ehci_alloc_std: curlen == 0 on 6.8-beta

2020-10-23 Thread Marcus Glocker
Honestly, I haven't spent much time to investigate how the curlen = 0 is
getting generated exactly, because for me it will be very difficult to
understand that without the hardware on my side re-producing the same.

But I had look when the code was introduced to handle curlen == 0 later
in the function:

if (iscontrol) {
/*
 * adjust the toggle based on the number of packets
 * in this qtd
 */
if curlen + mps - 1) / mps) & 1) || curlen == 0)
qtdstatus ^= EHCI_QTD_TOGGLE_MASK;
}

This was introduced by revision 1.57 of ehci.c 14 years ago:

***

If a zero-length bulk or interrupt transfer is requested then assume
USBD_FORCE_SHORT_XFER to ensure that we actually build and execute
a transfer.

Based on changes in FreeBSD rev1.47

***

While the DIAGNOSTIC code to panic at curlen == 0 was introduced with
the first commit of ehci.c.  I think the revision 1.57 should have
removed that DIAGNOSTIC code already, since we obviously can cope
with curlen = 0.

Given that, your below diff would be OK for me.

On Fri, Oct 23, 2020 at 10:09:14AM +, Mikolaj Kucharski wrote:

> On Sat, Sep 26, 2020 at 11:00:46PM +, Mikolaj Kucharski wrote:
> > On Wed, Sep 16, 2020 at 09:19:37PM +, Mikolaj Kucharski wrote:
> > > So time of the crash varies and I guess probably there is no pattern
> > > there. Here is new panic report, with some kernel printf()s added.
> > > 
> > > Problem seems to be related that dataphyspage is greater than
> > > dataphyslastpage: lastpage=0xa0e page=0xa0e1000 phys=0xa0e1000.
> > > The same is observable in all those previous panics.
> > > 
> > 
> > Here ia nother analysis, without my patches, I think they may be
> > confusing. I will show flow of ehci_alloc_sqtd_chain() with values
> > of variables which I collected via my debugging effort.
> > 
> > I think I have understanding what is the flow, but not sure what
> > code should do in those circumstances.
> > 
> 
> I didn't get any feedback on this so far. I'm running custom kernel
> with additional printf()'s to help me understand the flow.
> 
> I ended up with following diff, which I think is safe to do,
> as code handles situation when curlen == 0 further down in
> ehci_alloc_sqtd_chain() function. My machine runs more than
> two weeks with below panic() removed and condition of panic
> is triggered multiple times during uptime and I didn't notice
> any side effects.
> 
> However, flow of ehci_alloc_sqtd_chain() is hard to follow. Well, for me
> anyway. At this stage I don't know how to improve things, beyond below
> diff.
> 
> Any feedback would be appreciated. I don't like running custom kernels,
> so I would like to drop below M from my source checkout.
> 
> 
> Index: ehci.c
> ===
> RCS file: /cvs/src/sys/dev/usb/ehci.c,v
> retrieving revision 1.211
> diff -u -p -u -r1.211 ehci.c
> --- ehci.c6 Aug 2020 14:06:12 -   1.211
> +++ ehci.c23 Oct 2020 10:00:35 -
> @@ -2407,10 +2407,6 @@ ehci_alloc_sqtd_chain(struct ehci_softc 
>   curlen -= curlen % mps;
>   DPRINTFN(1,("ehci_alloc_sqtd_chain: multiple QTDs, "
>   "curlen=%u\n", curlen));
> -#ifdef DIAGNOSTIC
> - if (curlen == 0)
> - panic("ehci_alloc_std: curlen == 0");
> -#endif
>   }
>  
>   DPRINTFN(4,("ehci_alloc_sqtd_chain: dataphys=0x%08x "
> 
> -- 
> Regards,
>  Mikolaj
> 



Re: panic: ehci_alloc_std: curlen == 0 on 6.8-beta

2020-10-23 Thread Mikolaj Kucharski
On Sat, Sep 26, 2020 at 11:00:46PM +, Mikolaj Kucharski wrote:
> On Wed, Sep 16, 2020 at 09:19:37PM +, Mikolaj Kucharski wrote:
> > So time of the crash varies and I guess probably there is no pattern
> > there. Here is new panic report, with some kernel printf()s added.
> > 
> > Problem seems to be related that dataphyspage is greater than
> > dataphyslastpage: lastpage=0xa0e page=0xa0e1000 phys=0xa0e1000.
> > The same is observable in all those previous panics.
> > 
> 
> Here ia nother analysis, without my patches, I think they may be
> confusing. I will show flow of ehci_alloc_sqtd_chain() with values
> of variables which I collected via my debugging effort.
> 
> I think I have understanding what is the flow, but not sure what
> code should do in those circumstances.
> 

I didn't get any feedback on this so far. I'm running custom kernel
with additional printf()'s to help me understand the flow.

I ended up with following diff, which I think is safe to do,
as code handles situation when curlen == 0 further down in
ehci_alloc_sqtd_chain() function. My machine runs more than
two weeks with below panic() removed and condition of panic
is triggered multiple times during uptime and I didn't notice
any side effects.

However, flow of ehci_alloc_sqtd_chain() is hard to follow. Well, for me
anyway. At this stage I don't know how to improve things, beyond below
diff.

Any feedback would be appreciated. I don't like running custom kernels,
so I would like to drop below M from my source checkout.


Index: ehci.c
===
RCS file: /cvs/src/sys/dev/usb/ehci.c,v
retrieving revision 1.211
diff -u -p -u -r1.211 ehci.c
--- ehci.c  6 Aug 2020 14:06:12 -   1.211
+++ ehci.c  23 Oct 2020 10:00:35 -
@@ -2407,10 +2407,6 @@ ehci_alloc_sqtd_chain(struct ehci_softc 
curlen -= curlen % mps;
DPRINTFN(1,("ehci_alloc_sqtd_chain: multiple QTDs, "
"curlen=%u\n", curlen));
-#ifdef DIAGNOSTIC
-   if (curlen == 0)
-   panic("ehci_alloc_std: curlen == 0");
-#endif
}
 
DPRINTFN(4,("ehci_alloc_sqtd_chain: dataphys=0x%08x "

-- 
Regards,
 Mikolaj



Re: panic: ehci_alloc_std: curlen == 0 on 6.8-beta

2020-09-26 Thread Mikolaj Kucharski
On Wed, Sep 16, 2020 at 09:19:37PM +, Mikolaj Kucharski wrote:
> So time of the crash varies and I guess probably there is no pattern
> there. Here is new panic report, with some kernel printf()s added.
> 
> Problem seems to be related that dataphyspage is greater than
> dataphyslastpage: lastpage=0xa0e page=0xa0e1000 phys=0xa0e1000.
> The same is observable in all those previous panics.
> 

Here ia nother analysis, without my patches, I think they may be
confusing. I will show flow of ehci_alloc_sqtd_chain() with values
of variables which I collected via my debugging effort.

I think I have understanding what is the flow, but not sure what
code should do in those circumstances.

Values are based on panic:

lastpage=0xa0ec000 ehci_page=0xa0ed000 phys=0xa0ed000

ehci_alloc_sqtd_chain(800db000,1000,fd812e8d6990,8000225adfc0,8000225adfc8)
 at ehci_alloc_sqtd_chain+0x766

but there are also panics with alen=0x2000

ehci_alloc_sqtd_chain(800db000,2000,fd812e8d6aa0,8000225c60e0,8000225c60e8)
 at ehci_alloc_sqtd_chain+0x729

  1 usbd_status
  2 ehci_alloc_sqtd_chain(struct ehci_softc *sc, u_int alen, struct 
usbd_xfer *xfer,
  3 struct ehci_soft_qtd **sp, struct ehci_soft_qtd **ep)
  4 {
  5 struct ehci_soft_qtd *next, *cur;
  6 ehci_physaddr_t dataphys, dataphyspage, dataphyslastpage, 
nextphys;
  7 u_int32_t qtdstatus;
  8 u_int len, curlen;
  9 int mps, i, iscontrol, forceshort;
 10 int rd = usbd_xfer_isread(xfer);
 11 struct usb_dma *dma = >dmabuf;
 12 
 13 DPRINTFN(alen<4*4096,("ehci_alloc_sqtd_chain: start len=%d\n", 
alen));
 14 
 15 len = alen;

alen=4096   


 16 iscontrol = 
UE_GET_XFERTYPE(xfer->pipe->endpoint->edesc->bmAttributes) ==
 17 UE_CONTROL;

iscontrol=0


 18 
 19 dataphys = DMAADDR(dma, 0);
 20 dataphyslastpage = EHCI_PAGE(dataphys + len - 1);

dataphys=0xa0ec000
dataphyslastpage=0xa0ec000


 21 qtdstatus = EHCI_QTD_ACTIVE |
 22 EHCI_QTD_SET_PID(rd ? EHCI_QTD_PID_IN : EHCI_QTD_PID_OUT) |
 23 EHCI_QTD_SET_CERR(3); /* IOC and BYTES set below */
 24 mps = UGETW(xfer->pipe->endpoint->edesc->wMaxPacketSize);

mps=512


 25 forceshort = ((xfer->flags & USBD_FORCE_SHORT_XFER) || len == 
0) &&
 26 len % mps == 0;

xfer->flags=0x9
USBD_FORCE_SHORT_XFER=0x8
(xfer->flags & USBD_FORCE_SHORT_XFER)=0x8
forceshort=1


 27 /*
 28  * The control transfer data stage always starts with a toggle 
of 1.
 29  * For other transfers we let the hardware track the toggle 
state.
 30  */
 31 if (iscontrol)
 32 qtdstatus |= EHCI_QTD_SET_TOGGLE(1);
 33 
 34 cur = ehci_alloc_sqtd(sc);
 35 *sp = cur;
 36 if (cur == NULL)
 37 goto nomem;
 38 
 39 usb_syncmem(dma, 0, alen,
 40 rd ? BUS_DMASYNC_PREREAD : BUS_DMASYNC_PREWRITE);
 41 for (;;) {
 42 dataphyspage = EHCI_PAGE(dataphys);

dataphyspage=0xa0ec000


 43 /* The EHCI hardware can handle at most 5 pages. */
 44 if (dataphyslastpage - dataphyspage <
 45 EHCI_QTD_NBUFFERS * EHCI_PAGE_SIZE) {

EHCI_QTD_NBUFFERS=5
EHCI_PAGE_SIZE=4096

0 < 5*4096 (20480) -> true


 46 /* we can handle it in this QTD */
 47 curlen = len;

curlen=4096


 48 } else {

else branch is not executed


 49 /* must use multiple TDs, fill as much as 
possible. */
 50 curlen = EHCI_QTD_NBUFFERS * EHCI_PAGE_SIZE -
 51  EHCI_PAGE_OFFSET(dataphys);
 52 #ifdef DIAGNOSTIC
 53 if (curlen > len) {
 54 printf("ehci_alloc_sqtd_chain: 
curlen=%u "
 55 "len=%u offs=0x%x\n", curlen, len,
 56 EHCI_PAGE_OFFSET(dataphys));
 57 printf("lastpage=0x%x page=0x%x 
phys=0x%x\n",
 58 dataphyslastpage, dataphyspage, 
dataphys);
 59 curlen = len;
 60 }
 61 #endif
 62 /* the length must be a multiple of the max 
size */
 63