On Fri, Jul 10, 2020 at 01:00:31PM +0200, Marcus Glocker wrote: > Hi All, > > Laurence Tratt was reporting about corrupted video images when using > uvideo(4) on xhci(4) with MJPEG and higher resolutions, e.g. when using > ffplay: > > $ ffplay -f v4l2 -input_format mjpeg -video_size 1280x720 -f /dev/video1 > > When trying to re-produce the issue on my side, the video images were > still OK, but I also could see a lot of errors returned by ffplay > related to corrupted MJPEG data. > > When debugging the error I noticed that the corruption only happens > when TRBs are get chained due to hitting a 64k boundary, and therefore > require to queue up two TRBs. So, what happened? > > In xhci.c:xhci_event_xfer_isoc() we do the accounting of the > processed, chained TD: > > /* > * If we queued two TRBs for a frame and this is the second TRB, > * check if the first TRB needs accounting since it might not have > * raised an interrupt in case of full data received. > */ > > If there was a zero length transfer with such a TD, and the 2. TRB > receives the interrupt with len=0, it assumes that the 1. TRB was > processed (but it wasn't), and adds the requested 1. TRB length to > actlen, passing back that data has been received, which is obviously > wrong, resulting in the seen data corruptions. > > Instead, when the 2. TRB receives the IOC interrupt with len=0, we > need to ignore the 1. TRB, considering this was a zero length transfer. > > With the below diff Laurent is getting a clear video image now with > high resolutions, and I could remove all the output errors from ffplay > in my case. I also tested with an uaudio(4) device. > > Feedback, more testers, OKs? > > Thanks, > Marcus >
What if the first TRB was filled completely, but the second TRB transferred 0? Wouldn't we then need to add 1. TRB? I think your diff wouldn't do that. I think what we actually need is some "processed" flag. If you look on bugs@ or so, there's a diff that tries to fix the same issue by adding a "CHAINED" or so flag. I'm not sure if that's the right way. But, anyway, I think we need a way to say "this TRB got an interrupt" or "this TRB was processed". > > Index: xhci.c > =================================================================== > RCS file: /cvs/src/sys/dev/usb/xhci.c,v > retrieving revision 1.116 > diff -u -p -u -p -r1.116 xhci.c > --- xhci.c 30 Jun 2020 10:21:59 -0000 1.116 > +++ xhci.c 10 Jul 2020 05:57:11 -0000 > @@ -932,6 +932,7 @@ xhci_event_xfer_isoc(struct usbd_xfer *x > struct usbd_xfer *skipxfer; > struct xhci_xfer *xx = (struct xhci_xfer *)xfer; > int trb0_idx, frame_idx = 0; > + uint32_t len; > > KASSERT(xx->index >= 0); > trb0_idx = > @@ -945,6 +946,7 @@ xhci_event_xfer_isoc(struct usbd_xfer *x > if (trb0_idx++ == (xp->ring.ntrb - 1)) > trb0_idx = 0; > } > + len = XHCI_TRB_LEN(letoh32(xp->ring.trbs[trb_idx].trb_status)) - remain; > > /* > * If we queued two TRBs for a frame and this is the second TRB, > @@ -954,18 +956,17 @@ xhci_event_xfer_isoc(struct usbd_xfer *x > if ((letoh32(xp->ring.trbs[trb_idx].trb_flags) & XHCI_TRB_TYPE_MASK) == > XHCI_TRB_TYPE_NORMAL) { > frame_idx--; > - if (trb_idx == 0) > - trb0_idx = xp->ring.ntrb - 2; > - else > - trb0_idx = trb_idx - 1; > - if (xfer->frlengths[frame_idx] == 0) { > + if (len > 0 && xfer->frlengths[frame_idx] == 0) { > + if (trb_idx == 0) > + trb0_idx = xp->ring.ntrb - 2; > + else > + trb0_idx = trb_idx - 1; > xfer->frlengths[frame_idx] = XHCI_TRB_LEN(letoh32( > xp->ring.trbs[trb0_idx].trb_status)); > } > } > > - xfer->frlengths[frame_idx] += > - XHCI_TRB_LEN(letoh32(xp->ring.trbs[trb_idx].trb_status)) - remain; > + xfer->frlengths[frame_idx] += len; > xfer->actlen += xfer->frlengths[frame_idx]; > > if (xx->index != trb_idx) >