hi, The patch works well on Intel PCH xhci, but not on AMD Bolton xHCI. I'll investigate this problem.
BTW please check if malloc returns NULL. On 2020/07/13 09:59, Marcus Glocker wrote: > On Sat, 11 Jul 2020 23:43:43 +0200 > Marcus Glocker <[email protected]> wrote: > >> On Sat, 11 Jul 2020 20:14:03 +0200 >> Marcus Glocker <[email protected]> wrote: >> >>> On Sat, 11 Jul 2020 11:56:38 +0200 >>> Marcus Glocker <[email protected]> wrote: >>> >>>> On Fri, 10 Jul 2020 14:32:47 +0200 >>>> Marcus Glocker <[email protected]> wrote: >>>> >>>>> On Fri, 10 Jul 2020 14:19:00 +0200 >>>>> Patrick Wildt <[email protected]> wrote: >>>>> >>>>>> On Fri, Jul 10, 2020 at 01:00:31PM +0200, Marcus Glocker >>>>>> wrote: >>>>>>> Hi All, >>>>>>> >>>>>>> Laurence Tratt was reporting about corrupted video images >>>>>>> when using uvideo(4) on xhci(4) with MJPEG and higher >>>>>>> resolutions, e.g. when using ffplay: >>>>>>> >>>>>>> $ ffplay -f v4l2 -input_format mjpeg -video_size 1280x720 -f >>>>>>> /dev/video1 >>>>>>> >>>>>>> When trying to re-produce the issue on my side, the video >>>>>>> images were still OK, but I also could see a lot of errors >>>>>>> returned by ffplay related to corrupted MJPEG data. >>>>>>> >>>>>>> When debugging the error I noticed that the corruption only >>>>>>> happens when TRBs are get chained due to hitting a 64k >>>>>>> boundary, and therefore require to queue up two TRBs. So, >>>>>>> what happened? >>>>>>> >>>>>>> In xhci.c:xhci_event_xfer_isoc() we do the accounting of the >>>>>>> processed, chained TD: >>>>>>> >>>>>>> /* >>>>>>> * If we queued two TRBs for a frame and this is the second >>>>>>> TRB, >>>>>>> * check if the first TRB needs accounting since it might >>>>>>> not have >>>>>>> * raised an interrupt in case of full data received. >>>>>>> */ >>>>>>> >>>>>>> If there was a zero length transfer with such a TD, and the >>>>>>> 2. TRB receives the interrupt with len=0, it assumes that >>>>>>> the 1. TRB was processed (but it wasn't), and adds the >>>>>>> requested 1. TRB length to actlen, passing back that data >>>>>>> has been received, which is obviously wrong, resulting in >>>>>>> the seen data corruptions. >>>>>>> >>>>>>> Instead, when the 2. TRB receives the IOC interrupt with >>>>>>> len=0, we need to ignore the 1. TRB, considering this was a >>>>>>> zero length transfer. >>>>>>> >>>>>>> With the below diff Laurent is getting a clear video image >>>>>>> now with high resolutions, and I could remove all the output >>>>>>> errors from ffplay in my case. I also tested with an >>>>>>> uaudio(4) device. >>>>>>> >>>>>>> Feedback, more testers, OKs? >>>>>>> >>>>>>> Thanks, >>>>>>> Marcus >>>>>>> >>>>>> >>>>>> What if the first TRB was filled completely, but the second >>>>>> TRB transferred 0? Wouldn't we then need to add 1. TRB? I >>>>>> think your diff wouldn't do that. >>>>> >>>>> Right - That was the only case I was thinking about as well >>>>> which wouldn't get handled at the moment, but I wasn't sure if >>>>> it can be a case at all (according to your comment it can :-) >>>>> >>>>>> I think what we actually need is some "processed" flag. If >>>>>> you look on bugs@ or so, there's a diff that tries to fix the >>>>>> same issue by adding a "CHAINED" or so flag. I'm not sure if >>>>>> that's the right way. >>>>>> >>>>>> But, anyway, I think we need a way to say "this TRB got an >>>>>> interrupt" or "this TRB was processed". >>>> >>>> Sorry, I just did see the "xhci: zero length multi-TRB inbound >>>> xfer does not work" mail thread now. >>>> >>>> Honestly, at the moment I also don't know what's the right way to >>>> implement this. I need to understand the approach from sc.dyings >>>> diff a bit better first. >>> >>> Ok, I think sc.dyings diff is going in to the right direction but I >>> agree with Patrick that we should track the TRB processed status per >>> TRB. >>> >>> How's that as a start (for isoc)? >> >> A slightly re-worked diff which moves the trb_processed=false >> initialization to xhci_ring_produce(). >> I couldn't manage to move the setting of trb_processed=true to >> xhci_ring_consume(). Maybe at one point we need to introduce a new >> xhci_ring_* function/macro to do this which is getting called by the >> according TRB processing functions? > > Updated diff including feedback from mpi@ that we don't want to > introduce 'bool' in the kernel. > > > Index: xhci.c > =================================================================== > RCS file: /cvs/src/sys/dev/usb/xhci.c,v > retrieving revision 1.116 > diff -u -p -r1.116 xhci.c > --- xhci.c 30 Jun 2020 10:21:59 -0000 1.116 > +++ xhci.c 13 Jul 2020 09:49:23 -0000 > @@ -945,6 +945,7 @@ xhci_event_xfer_isoc(struct usbd_xfer *x > if (trb0_idx++ == (xp->ring.ntrb - 1)) > trb0_idx = 0; > } > + xp->ring.trb_processed[trb_idx] = 1; > > /* > * If we queued two TRBs for a frame and this is the second TRB, > @@ -958,7 +959,7 @@ xhci_event_xfer_isoc(struct usbd_xfer *x > trb0_idx = xp->ring.ntrb - 2; > else > trb0_idx = trb_idx - 1; > - if (xfer->frlengths[frame_idx] == 0) { > + if (xp->ring.trb_processed[trb0_idx] == 0) { > xfer->frlengths[frame_idx] = XHCI_TRB_LEN(letoh32( > xp->ring.trbs[trb0_idx].trb_status)); > } > @@ -1702,6 +1703,9 @@ xhci_ring_alloc(struct xhci_softc *sc, s > > ring->ntrb = ntrb; > > + ring->trb_processed = malloc(ring->ntrb * sizeof(*ring->trb_processed), > + M_DEVBUF, M_NOWAIT); > + > xhci_ring_reset(sc, ring); > > return (0); > @@ -1710,6 +1714,8 @@ xhci_ring_alloc(struct xhci_softc *sc, s > void > xhci_ring_free(struct xhci_softc *sc, struct xhci_ring *ring) > { > + free(ring->trb_processed, M_DEVBUF, > + ring->ntrb * sizeof(*ring->trb_processed)); > usbd_dma_contig_free(&sc->sc_bus, &ring->dma); > } > > @@ -1721,6 +1727,8 @@ xhci_ring_reset(struct xhci_softc *sc, s > size = ring->ntrb * sizeof(struct xhci_trb); > > memset(ring->trbs, 0, size); > + memset(ring->trb_processed, 0, > + ring->ntrb * sizeof(*ring->trb_processed)); > > ring->index = 0; > ring->toggle = XHCI_TRB_CYCLE; > @@ -1805,6 +1813,8 @@ xhci_ring_produce(struct xhci_softc *sc, > ring->index = 0; > ring->toggle ^= 1; > } > + > + ring->trb_processed[(trb - &ring->trbs[0])] = 0; > > return (trb); > } > Index: xhcivar.h > =================================================================== > RCS file: /cvs/src/sys/dev/usb/xhcivar.h,v > retrieving revision 1.11 > diff -u -p -r1.11 xhcivar.h > --- xhcivar.h 6 Oct 2019 17:30:00 -0000 1.11 > +++ xhcivar.h 13 Jul 2020 09:49:23 -0000 > @@ -44,6 +44,7 @@ struct xhci_xfer { > > struct xhci_ring { > struct xhci_trb *trbs; > + uint8_t *trb_processed; > size_t ntrb; > struct usbd_dma_info dma; > > >
