Re: XHCI support for bulk xfers >64k

2019-11-21 Thread sc . dying
On 2019/11/21 07:59, Patrick Wildt wrote:
> On Thu, Nov 21, 2019 at 06:52:36AM +, sc.dy...@gmail.com wrote:
>> Hi,
>>
>> On 2019/11/18 20:34, Marcus Glocker wrote:
>>> On Sun, Nov 17, 2019 at 12:36:49PM +0100, Marcus Glocker wrote:
>>>
 While recently testing UDL on XHCI I faced a lot of USB timeouts, and
 therefore screen rendering issues.
 The reason is that XHCI currently only supports single bulk transfers
 up to 64k, while UDL can schedule a bulk transfer up to 1m.

 The following diff adds XHCI bulk transfer support >64k and makes UDL
 work fine for me on XHCI.

 mpi@ already did an initial re-view, and I have adapted some of his
 comments already in this diff.

 More testing and comments welcome.
>>>
>>> After patrick@ did feedback that the current bulk transfer code in
>>> XHCI should also work for larger transfers as is, we have found the
>>> bug which resulted in a much smaller diff :-)
>>>
>>> Attached for your reference.  It just has been committed.
>>>
>>>
>>> Index: xhci.c
>>> ===
>>> RCS file: /cvs/src/sys/dev/usb/xhci.c,v
>>> retrieving revision 1.106
>>> diff -u -p -u -p -r1.106 xhci.c
>>> --- xhci.c  6 Oct 2019 17:30:00 -   1.106
>>> +++ xhci.c  18 Nov 2019 19:32:30 -
>>> @@ -2871,7 +2871,7 @@ xhci_device_generic_start(struct usbd_xf
>>> /* If the buffer crosses a 64k boundary, we need one more. */
>>> len = XHCI_TRB_MAXSIZE - (paddr & (XHCI_TRB_MAXSIZE - 1));
>>> if (len < xfer->length)
>>> -   ntrb++;
>>> +   ntrb = howmany(xfer->length - len, XHCI_TRB_MAXSIZE) + 1;
>>> else
>>> len = xfer->length;
>>>  
>>>
>>
>> Should xhci_device_isoc_start also count up the chunks after boundary?
>>
> 
> We had discussed this as well, but since a frame cannot be bigger than
> 64k, a hard boundary which is visible when you realize frlengths is a
> uint16_t *, that change would essentially be a no-op.

Oh, I missed it.
Thank you for explanation.
I see, xhci_event_xfer_isoc does not need fix.

> 
> We could change it for the sake of having the same calculation in all
> places.
> 
> Thanks,
> Patrick
> 
>>
>> --- sys/dev/usb/xhci.c.orig  Mon Nov 18 22:47:25 2019
>> +++ sys/dev/usb/xhci.c   Thu Nov 21 05:13:02 2019
>> @@ -3049,7 +3049,8 @@ xhci_device_isoc_start(struct usbd_xfer *xfer)
>>  /* If the buffer crosses a 64k boundary, we need one more. */
>>  len = XHCI_TRB_MAXSIZE - (paddr & (XHCI_TRB_MAXSIZE - 1));
>>  if (len < xfer->frlengths[i])
>> -ntrb++;
>> +ntrb = howmany(xfer->frlengths[i] - len,
>> +XHCI_TRB_MAXSIZE) + 1;
>>  
>>  paddr += xfer->frlengths[i];
>>  }
>> @@ -3066,7 +3067,8 @@ xhci_device_isoc_start(struct usbd_xfer *xfer)
>>  /* If the buffer crosses a 64k boundary, we need one more. */
>>  len = XHCI_TRB_MAXSIZE - (paddr & (XHCI_TRB_MAXSIZE - 1));
>>  if (len < xfer->frlengths[i])
>> -ntrb++;
>> +ntrb = howmany(xfer->frlengths[i] - len,
>> +XHCI_TRB_MAXSIZE) + 1;
>>  else
>>  len = xfer->frlengths[i];
>>  
>>



Re: XHCI support for bulk xfers >64k

2019-11-21 Thread Patrick Wildt
On Thu, Nov 21, 2019 at 06:52:36AM +, sc.dy...@gmail.com wrote:
> Hi,
> 
> On 2019/11/18 20:34, Marcus Glocker wrote:
> > On Sun, Nov 17, 2019 at 12:36:49PM +0100, Marcus Glocker wrote:
> > 
> >> While recently testing UDL on XHCI I faced a lot of USB timeouts, and
> >> therefore screen rendering issues.
> >> The reason is that XHCI currently only supports single bulk transfers
> >> up to 64k, while UDL can schedule a bulk transfer up to 1m.
> >>
> >> The following diff adds XHCI bulk transfer support >64k and makes UDL
> >> work fine for me on XHCI.
> >>
> >> mpi@ already did an initial re-view, and I have adapted some of his
> >> comments already in this diff.
> >>
> >> More testing and comments welcome.
> > 
> > After patrick@ did feedback that the current bulk transfer code in
> > XHCI should also work for larger transfers as is, we have found the
> > bug which resulted in a much smaller diff :-)
> > 
> > Attached for your reference.  It just has been committed.
> > 
> > 
> > Index: xhci.c
> > ===
> > RCS file: /cvs/src/sys/dev/usb/xhci.c,v
> > retrieving revision 1.106
> > diff -u -p -u -p -r1.106 xhci.c
> > --- xhci.c  6 Oct 2019 17:30:00 -   1.106
> > +++ xhci.c  18 Nov 2019 19:32:30 -
> > @@ -2871,7 +2871,7 @@ xhci_device_generic_start(struct usbd_xf
> > /* If the buffer crosses a 64k boundary, we need one more. */
> > len = XHCI_TRB_MAXSIZE - (paddr & (XHCI_TRB_MAXSIZE - 1));
> > if (len < xfer->length)
> > -   ntrb++;
> > +   ntrb = howmany(xfer->length - len, XHCI_TRB_MAXSIZE) + 1;
> > else
> > len = xfer->length;
> >  
> > 
> 
> Should xhci_device_isoc_start also count up the chunks after boundary?
> 

We had discussed this as well, but since a frame cannot be bigger than
64k, a hard boundary which is visible when you realize frlengths is a
uint16_t *, that change would essentially be a no-op.

We could change it for the sake of having the same calculation in all
places.

Thanks,
Patrick

> 
> --- sys/dev/usb/xhci.c.orig   Mon Nov 18 22:47:25 2019
> +++ sys/dev/usb/xhci.cThu Nov 21 05:13:02 2019
> @@ -3049,7 +3049,8 @@ xhci_device_isoc_start(struct usbd_xfer *xfer)
>   /* If the buffer crosses a 64k boundary, we need one more. */
>   len = XHCI_TRB_MAXSIZE - (paddr & (XHCI_TRB_MAXSIZE - 1));
>   if (len < xfer->frlengths[i])
> - ntrb++;
> + ntrb = howmany(xfer->frlengths[i] - len,
> + XHCI_TRB_MAXSIZE) + 1;
>  
>   paddr += xfer->frlengths[i];
>   }
> @@ -3066,7 +3067,8 @@ xhci_device_isoc_start(struct usbd_xfer *xfer)
>   /* If the buffer crosses a 64k boundary, we need one more. */
>   len = XHCI_TRB_MAXSIZE - (paddr & (XHCI_TRB_MAXSIZE - 1));
>   if (len < xfer->frlengths[i])
> - ntrb++;
> + ntrb = howmany(xfer->frlengths[i] - len,
> + XHCI_TRB_MAXSIZE) + 1;
>   else
>   len = xfer->frlengths[i];
>  
> 



Re: XHCI support for bulk xfers >64k

2019-11-20 Thread sc . dying
Hi,

On 2019/11/18 20:34, Marcus Glocker wrote:
> On Sun, Nov 17, 2019 at 12:36:49PM +0100, Marcus Glocker wrote:
> 
>> While recently testing UDL on XHCI I faced a lot of USB timeouts, and
>> therefore screen rendering issues.
>> The reason is that XHCI currently only supports single bulk transfers
>> up to 64k, while UDL can schedule a bulk transfer up to 1m.
>>
>> The following diff adds XHCI bulk transfer support >64k and makes UDL
>> work fine for me on XHCI.
>>
>> mpi@ already did an initial re-view, and I have adapted some of his
>> comments already in this diff.
>>
>> More testing and comments welcome.
> 
> After patrick@ did feedback that the current bulk transfer code in
> XHCI should also work for larger transfers as is, we have found the
> bug which resulted in a much smaller diff :-)
> 
> Attached for your reference.  It just has been committed.
> 
> 
> Index: xhci.c
> ===
> RCS file: /cvs/src/sys/dev/usb/xhci.c,v
> retrieving revision 1.106
> diff -u -p -u -p -r1.106 xhci.c
> --- xhci.c6 Oct 2019 17:30:00 -   1.106
> +++ xhci.c18 Nov 2019 19:32:30 -
> @@ -2871,7 +2871,7 @@ xhci_device_generic_start(struct usbd_xf
>   /* If the buffer crosses a 64k boundary, we need one more. */
>   len = XHCI_TRB_MAXSIZE - (paddr & (XHCI_TRB_MAXSIZE - 1));
>   if (len < xfer->length)
> - ntrb++;
> + ntrb = howmany(xfer->length - len, XHCI_TRB_MAXSIZE) + 1;
>   else
>   len = xfer->length;
>  
> 

Should xhci_device_isoc_start also count up the chunks after boundary?


--- sys/dev/usb/xhci.c.orig Mon Nov 18 22:47:25 2019
+++ sys/dev/usb/xhci.c  Thu Nov 21 05:13:02 2019
@@ -3049,7 +3049,8 @@ xhci_device_isoc_start(struct usbd_xfer *xfer)
/* If the buffer crosses a 64k boundary, we need one more. */
len = XHCI_TRB_MAXSIZE - (paddr & (XHCI_TRB_MAXSIZE - 1));
if (len < xfer->frlengths[i])
-   ntrb++;
+   ntrb = howmany(xfer->frlengths[i] - len,
+   XHCI_TRB_MAXSIZE) + 1;
 
paddr += xfer->frlengths[i];
}
@@ -3066,7 +3067,8 @@ xhci_device_isoc_start(struct usbd_xfer *xfer)
/* If the buffer crosses a 64k boundary, we need one more. */
len = XHCI_TRB_MAXSIZE - (paddr & (XHCI_TRB_MAXSIZE - 1));
if (len < xfer->frlengths[i])
-   ntrb++;
+   ntrb = howmany(xfer->frlengths[i] - len,
+   XHCI_TRB_MAXSIZE) + 1;
else
len = xfer->frlengths[i];
 



Re: XHCI support for bulk xfers >64k

2019-11-18 Thread Marcus Glocker
On Sun, Nov 17, 2019 at 12:36:49PM +0100, Marcus Glocker wrote:

> While recently testing UDL on XHCI I faced a lot of USB timeouts, and
> therefore screen rendering issues.
> The reason is that XHCI currently only supports single bulk transfers
> up to 64k, while UDL can schedule a bulk transfer up to 1m.
> 
> The following diff adds XHCI bulk transfer support >64k and makes UDL
> work fine for me on XHCI.
> 
> mpi@ already did an initial re-view, and I have adapted some of his
> comments already in this diff.
> 
> More testing and comments welcome.

After patrick@ did feedback that the current bulk transfer code in
XHCI should also work for larger transfers as is, we have found the
bug which resulted in a much smaller diff :-)

Attached for your reference.  It just has been committed.


Index: xhci.c
===
RCS file: /cvs/src/sys/dev/usb/xhci.c,v
retrieving revision 1.106
diff -u -p -u -p -r1.106 xhci.c
--- xhci.c  6 Oct 2019 17:30:00 -   1.106
+++ xhci.c  18 Nov 2019 19:32:30 -
@@ -2871,7 +2871,7 @@ xhci_device_generic_start(struct usbd_xf
/* If the buffer crosses a 64k boundary, we need one more. */
len = XHCI_TRB_MAXSIZE - (paddr & (XHCI_TRB_MAXSIZE - 1));
if (len < xfer->length)
-   ntrb++;
+   ntrb = howmany(xfer->length - len, XHCI_TRB_MAXSIZE) + 1;
else
len = xfer->length;
 



XHCI support for bulk xfers >64k

2019-11-17 Thread Marcus Glocker
While recently testing UDL on XHCI I faced a lot of USB timeouts, and
therefore screen rendering issues.
The reason is that XHCI currently only supports single bulk transfers
up to 64k, while UDL can schedule a bulk transfer up to 1m.

The following diff adds XHCI bulk transfer support >64k and makes UDL
work fine for me on XHCI.

mpi@ already did an initial re-view, and I have adapted some of his
comments already in this diff.

More testing and comments welcome.


Index: xhci.c
===
RCS file: /cvs/src/sys/dev/usb/xhci.c,v
retrieving revision 1.106
diff -u -p -u -p -r1.106 xhci.c
--- xhci.c  6 Oct 2019 17:30:00 -   1.106
+++ xhci.c  17 Nov 2019 11:22:22 -
@@ -816,9 +816,29 @@ xhci_event_xfer_generic(struct xhci_soft
 uint8_t code, uint8_t slot, uint8_t dci)
 {
struct xhci_xfer *xx = (struct xhci_xfer *)xfer;
-
+   int trb0_idx;
+   
switch (code) {
case XHCI_CODE_SUCCESS:
+   if (trb_idx == 0)
+   trb0_idx = xp->ring.ntrb - 2;
+   else
+   trb0_idx = trb_idx - 1;
+
+   if (xp->ring.trbs[trb0_idx].trb_flags & XHCI_TRB_CHAIN) {
+   xfer->actlen += XHCI_TRB_LEN(xp->ring.trbs[trb0_idx].
+   trb_status) - remain;
+   }
+   xfer->actlen += XHCI_TRB_LEN(xp->ring.trbs[trb_idx].
+   trb_status) - remain;
+
+   DPRINTF(("XHCI_CODE_SUCCESS: "
+   "index=%d, trb_idx=%d, actlen=%d, length=%d, remain=%d\n"
+   xx->index, trb_idx, xfer->actlen, xfer->length, remain));
+
+   if (xx->index != trb_idx)
+   return (1);
+
/*
 * This might be the last TRB of a TD that ended up
 * with a Short Transfer condition, see below.
@@ -2856,68 +2876,93 @@ xhci_device_generic_start(struct usbd_xf
struct xhci_trb *trb0, *trb;
uint32_t len, remain, flags;
uint32_t mps = UGETW(xfer->pipe->endpoint->edesc->wMaxPacketSize);
-   uint64_t paddr = DMAADDR(&xfer->dmabuf, 0);
+   uint64_t paddr;
uint8_t toggle;
-   int s, i, ntrb;
+   int s, i, j, ntrb;
+   int chlengths[XHCI_MAX_BULK_CHUNKS];
+   int nchunks;
 
KASSERT(!(xfer->rqflags & URQ_REQUEST));
 
if (sc->sc_bus.dying || xp->halted)
return (USBD_IOERROR);
 
-   /* How many TRBs do we need for this transfer? */
-   ntrb = howmany(xfer->length, XHCI_TRB_MAXSIZE);
+   /* Split the payload in chunks of XHCI_TRB_MAXSIZE. */
+   nchunks = xfer->length / XHCI_TRB_MAXSIZE;
+   remain = xfer->length % XHCI_TRB_MAXSIZE;
 
-   /* If the buffer crosses a 64k boundary, we need one more. */
-   len = XHCI_TRB_MAXSIZE - (paddr & (XHCI_TRB_MAXSIZE - 1));
-   if (len < xfer->length)
-   ntrb++;
-   else
-   len = xfer->length;
+   if ((nchunks + (remain ? 1 : 0)) >= XHCI_MAX_BULK_CHUNKS)
+   return (USBD_NOMEM);
 
-   /* If we need to append a zero length packet, we need one more. */
-   if ((xfer->flags & USBD_FORCE_SHORT_XFER || xfer->length == 0) &&
-   (xfer->length % UE_GET_SIZE(mps) == 0))
-   ntrb++;
+   for (i = 0; i < nchunks; i++)
+   chlengths[i] = XHCI_TRB_MAXSIZE;
+   if (remain != 0) {
+   chlengths[i++] = remain;
+   nchunks++;
+   }
+
+   paddr = DMAADDR(&xfer->dmabuf, 0);
+
+   /* How many TRBs do for all Transfers? */
+   for (i = 0, ntrb = 0; i < nchunks; i++) {
+   /* How many TRBs do we need for this transfer? */
+   ntrb += howmany(chlengths[i], XHCI_TRB_MAXSIZE);
+
+   /* If the buffer crosses a 64k boundary, we need one more. */
+   len = XHCI_TRB_MAXSIZE - (paddr & (XHCI_TRB_MAXSIZE - 1));
+   if (len < chlengths[i])
+   ntrb++;
+
+   /* If we need to append a zero length packet, we need 1 more. */
+   if ((xfer->flags & USBD_FORCE_SHORT_XFER || xfer->length == 0)
+   && (xfer->length % UE_GET_SIZE(mps) == 0))
+   ntrb++;
+
+   paddr += chlengths[i];
+}
 
if (xp->free_trbs < ntrb)
return (USBD_NOMEM);
 
-   /* We'll toggle the first TRB once we're finished with the chain. */
-   trb0 = xhci_xfer_get_trb(sc, xfer, &toggle, (ntrb == 1));
-   flags = XHCI_TRB_TYPE_NORMAL | (toggle ^ 1);
-   if (usbd_xfer_isread(xfer))
-   flags |= XHCI_TRB_ISP;
-   flags |= (ntrb == 1) ? XHCI_TRB_IOC : XHCI_TRB_CHAIN;
-
-   trb0->trb_paddr = htole64(DMAADDR(&xfer->dmabuf, 0));
-   trb0->trb_status = htole32(
-   XHCI_TRB_INTR(0) | XHCI_TRB_LEN(len) |
-   xhci_xfer_tdsize(xfer, xfer->length, len)
-   );
-   trb0->trb