Re: xhci(4) isoc: fix bogus handling of chained TRBs

Marcus Glocker Wed, 15 Jul 2020 15:18:29 -0700

On Wed, 15 Jul 2020 21:28:29 +0000
[email protected] wrote:

> hi,
> 
> The patch works well on Intel PCH xhci, but not on AMD Bolton xHCI.
> I'll investigate this problem.


Ok, thanks for checking this - I unfortunately only can test this on a
Intel xhci currently.

> BTW please check if malloc returns NULL.

Sure thing ...

> 
> 
> On 2020/07/13 09:59, Marcus Glocker wrote:
> > On Sat, 11 Jul 2020 23:43:43 +0200
> > Marcus Glocker <[email protected]> wrote:
> >   
> >> On Sat, 11 Jul 2020 20:14:03 +0200
> >> Marcus Glocker <[email protected]> wrote:
> >>  
> >>> On Sat, 11 Jul 2020 11:56:38 +0200
> >>> Marcus Glocker <[email protected]> wrote:
> >>>     
> >>>> On Fri, 10 Jul 2020 14:32:47 +0200
> >>>> Marcus Glocker <[email protected]> wrote:
> >>>>       
> >>>>> On Fri, 10 Jul 2020 14:19:00 +0200
> >>>>> Patrick Wildt <[email protected]> wrote:
> >>>>>         
> >>>>>> On Fri, Jul 10, 2020 at 01:00:31PM +0200, Marcus Glocker
> >>>>>> wrote:   
> >>>>>>> Hi All,
> >>>>>>>
> >>>>>>> Laurence Tratt was reporting about corrupted video images
> >>>>>>> when using uvideo(4) on xhci(4) with MJPEG and higher
> >>>>>>> resolutions, e.g. when using ffplay:
> >>>>>>>
> >>>>>>> $ ffplay -f v4l2 -input_format mjpeg -video_size 1280x720 -f
> >>>>>>> /dev/video1
> >>>>>>>
> >>>>>>> When trying to re-produce the issue on my side, the video
> >>>>>>> images were still OK, but I also could see a lot of errors
> >>>>>>> returned by ffplay related to corrupted MJPEG data.
> >>>>>>>
> >>>>>>> When debugging the error I noticed that the corruption only
> >>>>>>> happens when TRBs are get chained due to hitting a 64k
> >>>>>>> boundary, and therefore require to queue up two TRBs.  So,
> >>>>>>> what happened?
> >>>>>>>
> >>>>>>> In xhci.c:xhci_event_xfer_isoc() we do the accounting of the
> >>>>>>> processed, chained TD:
> >>>>>>>
> >>>>>>> /*
> >>>>>>>  * If we queued two TRBs for a frame and this is the second
> >>>>>>> TRB,
> >>>>>>>  * check if the first TRB needs accounting since it might
> >>>>>>> not have
> >>>>>>>  * raised an interrupt in case of full data received.
> >>>>>>>  */
> >>>>>>>
> >>>>>>> If there was a zero length transfer with such a TD, and the
> >>>>>>> 2. TRB receives the interrupt with len=0, it assumes that
> >>>>>>> the 1. TRB was processed (but it wasn't), and adds the
> >>>>>>> requested 1. TRB length to actlen, passing back that data
> >>>>>>> has been received, which is obviously wrong, resulting in
> >>>>>>> the seen data corruptions.
> >>>>>>>
> >>>>>>> Instead, when the 2. TRB receives the IOC interrupt with
> >>>>>>> len=0, we need to ignore the 1. TRB, considering this was a
> >>>>>>> zero length transfer.
> >>>>>>>
> >>>>>>> With the below diff Laurent is getting a clear video image
> >>>>>>> now with high resolutions, and I could remove all the output
> >>>>>>> errors from ffplay in my case.  I also tested with an
> >>>>>>> uaudio(4) device.
> >>>>>>>
> >>>>>>> Feedback, more testers, OKs?
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Marcus
> >>>>>>>             
> >>>>>>
> >>>>>> What if the first TRB was filled completely, but the second
> >>>>>> TRB transferred 0?  Wouldn't we then need to add 1. TRB?  I
> >>>>>> think your diff wouldn't do that.          
> >>>>>
> >>>>> Right - That was the only case I was thinking about as well
> >>>>> which wouldn't get handled at the moment, but I wasn't sure if
> >>>>> it can be a case at all (according to your comment it can :-)
> >>>>>         
> >>>>>> I think what we actually need is some "processed" flag.  If
> >>>>>> you look on bugs@ or so, there's a diff that tries to fix the
> >>>>>> same issue by adding a "CHAINED" or so flag.  I'm not sure if
> >>>>>> that's the right way.
> >>>>>>
> >>>>>> But, anyway, I think we need a way to say "this TRB got an
> >>>>>> interrupt" or "this TRB was processed".          
> >>>>
> >>>> Sorry, I just did see the "xhci: zero length multi-TRB inbound
> >>>> xfer does not work" mail thread now.
> >>>>
> >>>> Honestly, at the moment I also don't know what's the right way to
> >>>> implement this.  I need to understand the approach from sc.dyings
> >>>> diff a bit better first.      
> >>>
> >>> Ok, I think sc.dyings diff is going in to the right direction but
> >>> I agree with Patrick that we should track the TRB processed
> >>> status per TRB.
> >>>
> >>> How's that as a start (for isoc)?    
> >>
> >> A slightly re-worked diff which moves the trb_processed=false
> >> initialization to xhci_ring_produce().
> >> I couldn't manage to move the setting of trb_processed=true to
> >> xhci_ring_consume().  Maybe at one point we need to introduce a new
> >> xhci_ring_* function/macro to do this which is getting called by
> >> the according TRB processing functions?  
> > 
> > Updated diff including feedback from mpi@ that we don't want to
> > introduce 'bool' in the kernel.
> > 
> > 
> > Index: xhci.c
> > ===================================================================
> > RCS file: /cvs/src/sys/dev/usb/xhci.c,v
> > retrieving revision 1.116
> > diff -u -p -r1.116 xhci.c
> > --- xhci.c  30 Jun 2020 10:21:59 -0000      1.116
> > +++ xhci.c  13 Jul 2020 09:49:23 -0000
> > @@ -945,6 +945,7 @@ xhci_event_xfer_isoc(struct usbd_xfer *x
> >             if (trb0_idx++ == (xp->ring.ntrb - 1))
> >                     trb0_idx = 0;
> >     }
> > +   xp->ring.trb_processed[trb_idx] = 1;
> >  
> >     /*
> >      * If we queued two TRBs for a frame and this is the
> > second TRB, @@ -958,7 +959,7 @@ xhci_event_xfer_isoc(struct
> > usbd_xfer *x trb0_idx = xp->ring.ntrb - 2;
> >             else
> >                     trb0_idx = trb_idx - 1;
> > -           if (xfer->frlengths[frame_idx] == 0) {
> > +           if (xp->ring.trb_processed[trb0_idx] == 0) {
> >                     xfer->frlengths[frame_idx] =
> > XHCI_TRB_LEN(letoh32( xp->ring.trbs[trb0_idx].trb_status));
> >             }
> > @@ -1702,6 +1703,9 @@ xhci_ring_alloc(struct xhci_softc *sc, s
> >  
> >     ring->ntrb = ntrb;
> >  
> > +   ring->trb_processed = malloc(ring->ntrb *
> > sizeof(*ring->trb_processed),
> > +       M_DEVBUF, M_NOWAIT);
> > +
> >     xhci_ring_reset(sc, ring);
> >  
> >     return (0);
> > @@ -1710,6 +1714,8 @@ xhci_ring_alloc(struct xhci_softc *sc, s
> >  void
> >  xhci_ring_free(struct xhci_softc *sc, struct xhci_ring *ring)
> >  {
> > +   free(ring->trb_processed, M_DEVBUF,
> > +       ring->ntrb * sizeof(*ring->trb_processed));
> >     usbd_dma_contig_free(&sc->sc_bus, &ring->dma);
> >  }
> >  
> > @@ -1721,6 +1727,8 @@ xhci_ring_reset(struct xhci_softc *sc, s
> >     size = ring->ntrb * sizeof(struct xhci_trb);
> >  
> >     memset(ring->trbs, 0, size);
> > +   memset(ring->trb_processed, 0,
> > +       ring->ntrb * sizeof(*ring->trb_processed));
> >  
> >     ring->index = 0;
> >     ring->toggle = XHCI_TRB_CYCLE;
> > @@ -1805,6 +1813,8 @@ xhci_ring_produce(struct xhci_softc *sc,
> >             ring->index = 0;
> >             ring->toggle ^= 1;
> >     }
> > +
> > +   ring->trb_processed[(trb - &ring->trbs[0])] = 0;
> >  
> >     return (trb);
> >  }
> > Index: xhcivar.h
> > ===================================================================
> > RCS file: /cvs/src/sys/dev/usb/xhcivar.h,v
> > retrieving revision 1.11
> > diff -u -p -r1.11 xhcivar.h
> > --- xhcivar.h       6 Oct 2019 17:30:00 -0000       1.11
> > +++ xhcivar.h       13 Jul 2020 09:49:23 -0000
> > @@ -44,6 +44,7 @@ struct xhci_xfer {
> >  
> >  struct xhci_ring {
> >     struct xhci_trb         *trbs;
> > +   uint8_t                 *trb_processed;
> >     size_t                   ntrb;
> >     struct usbd_dma_info     dma;
> >  
> >  
> >   
>

Re: xhci(4) isoc: fix bogus handling of chained TRBs

Reply via email to