On Wed, Apr 01, 2020 at 12:23:53PM +0200, Patrick Wildt wrote: > On Wed, Apr 01, 2020 at 12:04:25PM +0200, Patrick Wildt wrote: > > On Wed, Apr 01, 2020 at 09:40:06AM +0200, Patrick Wildt wrote: > > > On Wed, Apr 01, 2020 at 09:22:07AM +0200, Patrick Wildt wrote: > > > > On Wed, Apr 01, 2020 at 04:47:10PM +1100, Jonathan Gray wrote: > > > > > On Wed, Apr 01, 2020 at 12:58:23PM +1100, Jonathan Gray wrote: > > > > > > On Wed, Mar 18, 2020 at 01:41:06PM +0100, Patrick Wildt wrote: > > > > > > > On Wed, Mar 18, 2020 at 11:22:40AM +0100, Patrick Wildt wrote: > > > > > > > > Hi, > > > > > > > > > > > > > > > > I've spent a few days investigating why USB ethernet adapters > > > > > > > > are so > > > > > > > > horribly slow on my ARMs. Using dt(4) I realized that it was > > > > > > > > spending > > > > > > > > most of its time in memcpy. But, why? As it turns out, all > > > > > > > > USB data > > > > > > > > buffers are mapped COHERENT, which on some/most ARMs means > > > > > > > > uncached. > > > > > > > > Using cached data buffers makes the performance rise from 20 > > > > > > > > mbit/s to > > > > > > > > 200 mbit/s. Quite a difference. > > > > > > > > > > > > > > > > sys/dev/usb/usb_mem.c: > > > > > > > > error = bus_dmamem_map(tag, p->segs, p->nsegs, p->size, > > > > > > > > &p->kaddr, > > > > > > > > BUS_DMA_NOWAIT|BUS_DMA_COHERENT); > > > > > > > > > > > > > > > > On x86, COHERENT is essentially a no-op. On ARM, it depends on > > > > > > > > the SoC. > > > > > > > > Some SoCs have cache-coherent USB controllers, some don't. > > > > > > > > Mine does > > > > > > > > not, so mapping it COHERENT means uncached and thus slow. > > > > > > > > > > > > > > > > Why do we do that? Well, when the code was imported in 99, it > > > > > > > > was > > > > > > > > already there. Since then we have gained infrastructure for DMA > > > > > > > > syncs in the USB stack, which I think are proper. > > > > > > > > > > > > > > > > sys/dev/usb/usbdi.c - usbd_transfer() (before transfer) > > > > > > > > > > > > > > > > if (!usbd_xfer_isread(xfer)) { > > > > > > > > if ((xfer->flags & USBD_NO_COPY) == 0) > > > > > > > > memcpy(KERNADDR(&xfer->dmabuf, 0), > > > > > > > > xfer->buffer, > > > > > > > > xfer->length); > > > > > > > > usb_syncmem(&xfer->dmabuf, 0, xfer->length, > > > > > > > > BUS_DMASYNC_PREWRITE); > > > > > > > > } else > > > > > > > > usb_syncmem(&xfer->dmabuf, 0, xfer->length, > > > > > > > > BUS_DMASYNC_PREREAD); > > > > > > > > err = pipe->methods->transfer(xfer); > > > > > > > > > > > > > > > > sys/dev/usb/usbdi.c - usb_transfer_complete() (after transfer) > > > > > > > > > > > > > > > > if (xfer->actlen != 0) { > > > > > > > > if (usbd_xfer_isread(xfer)) { > > > > > > > > usb_syncmem(&xfer->dmabuf, 0, > > > > > > > > xfer->actlen, > > > > > > > > BUS_DMASYNC_POSTREAD); > > > > > > > > if (!(xfer->flags & USBD_NO_COPY)) > > > > > > > > memcpy(xfer->buffer, > > > > > > > > KERNADDR(&xfer->dmabuf, 0), > > > > > > > > xfer->actlen); > > > > > > > > } else > > > > > > > > usb_syncmem(&xfer->dmabuf, 0, > > > > > > > > xfer->actlen, > > > > > > > > BUS_DMASYNC_POSTWRITE); > > > > > > > > } > > > > > > > > > > > > > > > > We cannot just remove COHERENT, since some drivers, like > > > > > > > > ehci(4), use > > > > > > > > the same backend to allocate their rings. And I can't vouch > > > > > > > > for those > > > > > > > > drivers' sanity. > > > > > > > > > > > > > > > > As a first step, I would like to go ahead with another > > > > > > > > solution, which > > > > > > > > is based on a diff from Marius Strobl, who added those syncs in > > > > > > > > the > > > > > > > > first place. Essentially it splits the memory handling into > > > > > > > > cacheable > > > > > > > > and non-cacheable blocks. The USB data transfers and everyone > > > > > > > > who uses > > > > > > > > usbd_alloc_buffer() then use cacheable buffers, while code like > > > > > > > > ehci(4) > > > > > > > > still don't. This is a bit of a safer approach imho, since we > > > > > > > > don't > > > > > > > > hurt the controller drivers, but speed up the data buffers. > > > > > > > > > > > > > > > > Once we have verified that there are no regressions, we can > > > > > > > > adjust > > > > > > > > ehci(4) and the like, add proper syncs, make sure they still > > > > > > > > work as > > > > > > > > well as before, and maybe then back this out again. > > > > > > > > > > > > > > > > Keep note that this is all a no-op on X86, but all the other > > > > > > > > archs will > > > > > > > > profit from this. > > > > > > > > > > > > > > > > ok? > > > > > > > > > > > > > > > > Patrick > > > > > > > > > > > > > > Update diff with inverted logic. kettenis@ argues that we should > > > > > > > invert the logic, and those who need COHERENT memory should ask > > > > > > > for that explicitly, since for bus_dmamem_map() it also needs to > > > > > > > be passed explicitly. This also points out all those users that > > > > > > > use usb_allocmem() internally, where we might want to have a look > > > > > > > if COHERENT is actually needed or not, or if it can be refactored > > > > > > > in another way. > > > > > > > > > > > > These commits broke usb on imx.6 with cubox: > > > > > > > > > > > > imxehci0 at simplebus3 > > > > > > usb0 at imxehci0: USB revision 2.0 > > > > > > usb0: root hub problem > > > > > > imxehci1 at simplebus3 > > > > > > usb1 at imxehci1: USB revision 2.0 > > > > > > usb1: root hub problem > > > > > > "usbmisc" at simplebus3 not configured > > > > > > > > > > pandaboard with omap4 (another cortex a9) also has broken usb with > > > > > the latest snapshot: > > > > > > > > > > omehci0 at simplebus0 > > > > > usb0 at omehci0: USB revision 2.0 > > > > > usb0: root hub problem > > > > > > > > I think I know what it is. When we enqueue a request for the root > > > > hub, the buffer, for which the USB subsystem allocates a DMA buffer, > > > > is filled not by an external device, but by our driver. > > > > > > > > Now on completion of that request, since it's doing a READ of the > > > > root hub, it will do a DMA sync with POSTREAD. Since the ehci code > > > > hasn't flushed the buffer do memory, but simply did a memcpy to the > > > > buffer, the POSTREAD will essentially drop whatever ehci's memcpy > > > > did. > > > > > > > > Can you check if that makes a difference? It essentially forces the > > > > root hub code to flush the data to the caches, so that the POSTREAD > > > > can successfully flush the cache and read the data from memory. I > > > > wonder if there's a better way of doing this, but I kinda doubt it. > > > > > > > > Patrick > > > > > > > > diff --git a/sys/dev/usb/ehci.c b/sys/dev/usb/ehci.c > > > > index a352d83eaf4..d81901d3762 100644 > > > > --- a/sys/dev/usb/ehci.c > > > > +++ b/sys/dev/usb/ehci.c > > > > @@ -2154,6 +2154,9 @@ ehci_root_ctrl_start(struct usbd_xfer *xfer) > > > > err = USBD_IOERROR; > > > > goto ret; > > > > } > > > > + if (totlen != 0) > > > > + usb_syncmem(&xfer->dmabuf, 0, totlen, > > > > + BUS_DMASYNC_PREWRITE); > > > > xfer->actlen = totlen; > > > > err = USBD_NORMAL_COMPLETION; > > > > ret: > > > > > > > > > > If that diff does indeed work, I think a better fix would be to move the > > > DMA syncs from the USB layer into the driver layer, so that for the root > > > methods no sync has to be done, but all the individual device methods do > > > proper syncs. Preparing that diff would take some more minutes though. > > > In the meantime we could use USB_DMA_COHERENT for all requests without > > > an extra allocated buffer, which includes all the control requests. > > > Then move the syncs to the drivers, and at last remove the flag again. > > > > Since kettenis@ wondered how that should look like, I took the time to > > look at eHCI and here's how I approached it. No diff yet, I'll come up > > with that soon. > > > > ====== > > Before Start of Transfer: We need to make sure that the buffers are > > synced before handing them to the hardware. > > > > Check device_*_start to sync &xfer->dmabuf. Easiest way is to check, > > where DMAADDR(&xfer->dmabuf) used. That's where we tell the hardware > > where our transfer buffer is. So those indicate places where we should > > sync before telling the HW to go. > > > > * ehci_pcd(): No device transfer, just memset, sync would be wrong. OK > > * ehci_root_ctrl_start(): No device transfer, sync would be wrong. OK > > * ehci_alloc_sqtd_chain(): chain is allocated based on the buffer, we > > already call usb_syncmem() for the buffer. OK > > * ehci_device_intr_done(): called _after_ the transfer completes and > > after we memcpy the data from a successful transfer. on repeat, where > > it enqueues the transfer again, it calls usb_syncmem() for POST. I > > think that one is nonsense, and should have been done before calling > > usb_transfer_complete(). Then it calls ehci_alloc_sqtd_chain(), which > > does a proper PRE-sync. NOT OK > > * ehci_alloc_itd_chain(): Similar to alloc_sqtd_chain(), but there's no > > sync. NOT OK > > * ehci_alloc_sitd_chain(): Similar to alloc_sqtd_chain(), but there's no > > sync. NOT OK > > ====== > > Before Transfer Completion: We need to make sure buffers are synced > > before handing them back to the stack. > > > > Check usb_transfer_complete() callers to have synced &xfer->dmabuf. But > > not those for error handling, only those were we supposedly have buffers > > with valid data. > > > > * ehci_isoc_idone(): only the descriptors are synced, probably needs a > > sync on the buffer. NOT OK > > * ehci_idone(): only the descriptors are synced, probably needs a sync on > > the buffer, especially since there's the following comment. NOT OK > > > > /* XXX transfer_complete memcpys out transfer data (for in endpoints) > > * during this call, before methods->done is called: dma sync required > > * beforehand? */ > > usb_transfer_complete(xfer); > > > > * ehci_root_ctrl_start: No device transfer, same code path as in the > > other part. OK. > > * all other functions: error handling, no successfull completion, no sync > > needed. OK. > > ==== > > > > So, to summarize: > > > > * ehci_alloc_itd_chain() and ehci_alloc_sitd_chain() both should sync > > so that the buffers handed to the hardware are fine. > > * ehci_isoc_idone() and ehci_idone() both should sync the buffers > > modified by the hardware, before handing them to the stack. > > * ehci_device_intr_done() doesn't need to call usb_syncmem again, > > after we have fixed ehci_idone() to do the sync. > > > > With those changes, removing USB_DMA_COHERENT from the usb_allocmem() > > again, as well as removing the usb_syncmem() calls in usbdi.c, should > > make the Cubox-i work. I'll come up with a diff and see if I can find > > some Cubox-i to test it with. > > > > Though we'll still need to check ohci, uhci, xhci and dwc2 before we > > can actually rip that out again. > > > > Patrick > > And that's the diff I'd try based on the analysis. > > Keep note that the reason the isochronous completion uses xfer->length > instead of xfer->actlen is, that isochronous transfers don't use > continuous buffers. So it's either going through each successful frame > and doing a sync per frame, or simply syncing the whole buffer.
Hi, I went through all the drivers and this is what I came up with. DWC2: This one felt like the hardest, because the complexity of the driver is... huge. It's not a host controller, it's a gadget controller that we use like a host controller. Anyway, I figured there are two relevant points: dwc2_device_start(), where we set up the transfer. And there's a nice if-case already where I just put the PRE-sync. For the POST-, there is a single exit-point that we can use, and if we were successful and transfered data, we can do a sync. EHCI: I talked about that in the previous mail. OHCI: bulk&ctrl use ohci_alloc_std_chain() for setting up the DMA, while intr/isoc have their own specific one. All xfertypes have their exit-point in ohci_softintr, so that's an easy point to use for the syncs. UHCI: intr&ctrl&bulk use uhci_alloc_std_chain(), only isoc has its own again. The single exit-point in uhci_idone makes it easy as well. XHCI: the device_*_start() functions each handle the TDs themselves, so it's easy to add the syncs. There are two exit points, one for isoc, one for the "rest". USBDI: With the syncs now in the driver, we can remove the syncs there, and also remove the workaround I committed yesterday. Patrick diff --git a/sys/dev/usb/dwc2/dwc2.c b/sys/dev/usb/dwc2/dwc2.c index 6ca3cc658e5..e14dbf02746 100644 --- a/sys/dev/usb/dwc2/dwc2.c +++ b/sys/dev/usb/dwc2/dwc2.c @@ -1260,6 +1260,9 @@ dwc2_device_start(struct usbd_xfer *xfer) dwc2_urb->usbdma = &xfer->dmabuf; dwc2_urb->buf = KERNADDR(dwc2_urb->usbdma, 0); dwc2_urb->dma = DMAADDR(dwc2_urb->usbdma, 0); + usb_syncmem(&xfer->dmabuf, 0, xfer->length, + usbd_xfer_isread(xfer) ? + BUS_DMASYNC_PREREAD : BUS_DMASYNC_PREWRITE); } dwc2_urb->length = len; dwc2_urb->flags = flags; @@ -1649,6 +1652,17 @@ void dwc2_host_complete(struct dwc2_hsotg *hsotg, struct dwc2_qtd *qtd, xfer); } + if (xfer->status == USBD_NORMAL_COMPLETION) { + if (xfertype == UE_ISOCHRONOUS) + usb_syncmem(&xfer->dmabuf, 0, xfer->length, + usbd_xfer_isread(xfer) ? + BUS_DMASYNC_POSTREAD : BUS_DMASYNC_POSTWRITE); + else if (xfer->actlen) + usb_syncmem(&xfer->dmabuf, 0, xfer->actlen, + usbd_xfer_isread(xfer) ? + BUS_DMASYNC_POSTREAD : BUS_DMASYNC_POSTWRITE); + } + qtd->urb = NULL; timeout_del(&xfer->timeout_handle); usb_rem_task(xfer->device, &xfer->abort_task); diff --git a/sys/dev/usb/ehci.c b/sys/dev/usb/ehci.c index a352d83eaf4..94a27f48bd9 100644 --- a/sys/dev/usb/ehci.c +++ b/sys/dev/usb/ehci.c @@ -856,6 +856,10 @@ ehci_isoc_idone(struct usbd_xfer *xfer) #endif xfer->actlen = actlen; xfer->status = USBD_NORMAL_COMPLETION; + + usb_syncmem(&xfer->dmabuf, 0, xfer->length, + usbd_xfer_isread(xfer) ? + BUS_DMASYNC_POSTREAD : BUS_DMASYNC_POSTWRITE); usb_transfer_complete(xfer); } @@ -911,9 +915,10 @@ ehci_idone(struct usbd_xfer *xfer) } else xfer->status = USBD_NORMAL_COMPLETION; - /* XXX transfer_complete memcpys out transfer data (for in endpoints) - * during this call, before methods->done is called: dma sync required - * beforehand? */ + if (xfer->actlen) + usb_syncmem(&xfer->dmabuf, 0, xfer->actlen, + usbd_xfer_isread(xfer) ? + BUS_DMASYNC_POSTREAD : BUS_DMASYNC_POSTWRITE); usb_transfer_complete(xfer); DPRINTFN(/*12*/2, ("ehci_idone: ex=%p done\n", ex)); } @@ -3208,9 +3213,6 @@ ehci_device_intr_done(struct usbd_xfer *xfer) if (xfer->pipe->repeat) { ehci_free_sqtd_chain(sc, ex); - usb_syncmem(&xfer->dmabuf, 0, xfer->length, - usbd_xfer_isread(xfer) ? - BUS_DMASYNC_POSTREAD : BUS_DMASYNC_POSTWRITE); sqh = epipe->sqh; err = ehci_alloc_sqtd_chain(sc, xfer->length, xfer, &data, &dataend); @@ -3408,6 +3410,9 @@ ehci_alloc_itd_chain(struct ehci_softc *sc, struct usbd_xfer *xfer) if (nframes == 0) return (1); + usb_syncmem(&xfer->dmabuf, 0, xfer->length, + usbd_xfer_isread(xfer) ? + BUS_DMASYNC_PREREAD : BUS_DMASYNC_PREWRITE); for (i = 0; i < nframes; i++) { uint32_t froffs = offs; @@ -3522,6 +3527,9 @@ ehci_alloc_sitd_chain(struct ehci_softc *sc, struct usbd_xfer *xfer) if (usbd_xfer_isread(xfer)) endp |= EHCI_SITD_SET_DIR(1); + usb_syncmem(&xfer->dmabuf, 0, xfer->length, + usbd_xfer_isread(xfer) ? + BUS_DMASYNC_PREREAD : BUS_DMASYNC_PREWRITE); for (i = 0; i < nframes; i++) { uint32_t addr = DMAADDR(&xfer->dmabuf, offs); uint32_t page = EHCI_PAGE(addr + xfer->frlengths[i] - 1); diff --git a/sys/dev/usb/ohci.c b/sys/dev/usb/ohci.c index 7aa2303eabd..1144228b7dc 100644 --- a/sys/dev/usb/ohci.c +++ b/sys/dev/usb/ohci.c @@ -491,6 +491,10 @@ ohci_alloc_std_chain(struct ohci_softc *sc, u_int alen, struct usbd_xfer *xfer, DPRINTFN(alen < 4096,("ohci_alloc_std_chain: start len=%u\n", alen)); + usb_syncmem(&xfer->dmabuf, 0, xfer->length, + usbd_xfer_isread(xfer) ? + BUS_DMASYNC_PREREAD : BUS_DMASYNC_PREWRITE); + len = alen; cur = sp; end = NULL; @@ -1279,6 +1283,12 @@ ohci_softintr(void *v) ohci_free_std(sc, std); if (done) { + if (xfer->actlen) + usb_syncmem(&xfer->dmabuf, 0, + xfer->actlen, + usbd_xfer_isread(xfer) ? + BUS_DMASYNC_POSTREAD : + BUS_DMASYNC_POSTWRITE); xfer->status = USBD_NORMAL_COMPLETION; s = splusb(); usb_transfer_complete(xfer); @@ -1309,9 +1319,15 @@ ohci_softintr(void *v) if (cc == OHCI_CC_STALL) xfer->status = USBD_STALLED; - else if (cc == OHCI_CC_DATA_UNDERRUN) + else if (cc == OHCI_CC_DATA_UNDERRUN) { + if (xfer->actlen) + usb_syncmem(&xfer->dmabuf, 0, + xfer->actlen, + usbd_xfer_isread(xfer) ? + BUS_DMASYNC_POSTREAD : + BUS_DMASYNC_POSTWRITE); xfer->status = USBD_NORMAL_COMPLETION; - else + } else xfer->status = USBD_IOERROR; s = splusb(); usb_transfer_complete(xfer); @@ -1389,6 +1405,13 @@ ohci_softintr(void *v) xfer->actlen = actlen; xfer->hcpriv = NULL; + if (xfer->status == USBD_NORMAL_COMPLETION) { + usb_syncmem(&xfer->dmabuf, 0, xfer->length, + usbd_xfer_isread(xfer) ? + BUS_DMASYNC_POSTREAD : + BUS_DMASYNC_POSTWRITE); + } + s = splusb(); usb_transfer_complete(xfer); splx(s); @@ -2846,6 +2869,10 @@ ohci_device_intr_start(struct usbd_xfer *xfer) panic("ohci_device_intr_transfer: a request"); #endif + usb_syncmem(&xfer->dmabuf, 0, xfer->length, + usbd_xfer_isread(xfer) ? + BUS_DMASYNC_PREREAD : BUS_DMASYNC_PREWRITE); + len = xfer->length; endpt = xfer->pipe->endpoint->edesc->bEndpointAddress; @@ -3057,6 +3084,10 @@ ohci_device_isoc_enter(struct usbd_xfer *xfer) if (sc->sc_bus.dying) return; + usb_syncmem(&xfer->dmabuf, 0, xfer->length, + usbd_xfer_isread(xfer) ? + BUS_DMASYNC_PREREAD : BUS_DMASYNC_PREWRITE); + if (iso->next == -1) { /* Not in use yet, schedule it a few frames ahead. */ iso->next = letoh32(sc->sc_hcca->hcca_frame_number) + 5; diff --git a/sys/dev/usb/uhci.c b/sys/dev/usb/uhci.c index bae6c1b81a9..e3c7c57580f 100644 --- a/sys/dev/usb/uhci.c +++ b/sys/dev/usb/uhci.c @@ -1236,6 +1236,9 @@ uhci_idone(struct usbd_xfer *xfer) actlen += len; } upipe->u.iso.inuse -= nframes; + usb_syncmem(&xfer->dmabuf, 0, xfer->length, + usbd_xfer_isread(xfer) ? + BUS_DMASYNC_POSTREAD : BUS_DMASYNC_POSTWRITE); xfer->actlen = actlen; xfer->status = USBD_NORMAL_COMPLETION; goto end; @@ -1299,6 +1302,10 @@ uhci_idone(struct usbd_xfer *xfer) else xfer->status = USBD_IOERROR; /* more info XXX */ } else { + if (xfer->actlen) + usb_syncmem(&xfer->dmabuf, 0, xfer->actlen, + usbd_xfer_isread(xfer) ? + BUS_DMASYNC_POSTREAD : BUS_DMASYNC_POSTWRITE); xfer->status = USBD_NORMAL_COMPLETION; } @@ -1527,6 +1534,10 @@ uhci_alloc_std_chain(struct uhci_softc *sc, u_int len, struct usbd_xfer *xfer, __func__, addr, UE_GET_ADDR(endpt), len, xfer->device->speed, flags)); + usb_syncmem(&xfer->dmabuf, 0, xfer->length, + usbd_xfer_isread(xfer) ? + BUS_DMASYNC_PREREAD : BUS_DMASYNC_PREWRITE); + mps = UGETW(xfer->pipe->endpoint->edesc->wMaxPacketSize); if (mps == 0) { printf("uhci_alloc_std_chain: mps=0\n"); @@ -2127,6 +2138,10 @@ uhci_device_isoc_enter(struct usbd_xfer *xfer) printf("uhci_device_isoc_enter: overflow!\n"); #endif + usb_syncmem(&xfer->dmabuf, 0, xfer->length, + usbd_xfer_isread(xfer) ? + BUS_DMASYNC_PREREAD : BUS_DMASYNC_PREWRITE); + next = iso->next; if (next == -1) { /* Not in use yet, schedule it a few frames ahead. */ diff --git a/sys/dev/usb/usbdi.c b/sys/dev/usb/usbdi.c index f6707432e41..5ba01fbd0bd 100644 --- a/sys/dev/usb/usbdi.c +++ b/sys/dev/usb/usbdi.c @@ -305,22 +305,15 @@ usbd_transfer(struct usbd_xfer *xfer) if (xfer->rqflags & URQ_AUTO_DMABUF) printf("usbd_transfer: has old buffer!\n"); #endif - err = usb_allocmem(bus, xfer->length, 0, USB_DMA_COHERENT, - &xfer->dmabuf); + err = usb_allocmem(bus, xfer->length, 0, 0, &xfer->dmabuf); if (err) return (err); xfer->rqflags |= URQ_AUTO_DMABUF; } - if (!usbd_xfer_isread(xfer)) { - if ((xfer->flags & USBD_NO_COPY) == 0) - memcpy(KERNADDR(&xfer->dmabuf, 0), xfer->buffer, - xfer->length); - usb_syncmem(&xfer->dmabuf, 0, xfer->length, - BUS_DMASYNC_PREWRITE); - } else - usb_syncmem(&xfer->dmabuf, 0, xfer->length, - BUS_DMASYNC_PREREAD); + if (!usbd_xfer_isread(xfer) && (xfer->flags & USBD_NO_COPY) == 0) + memcpy(KERNADDR(&xfer->dmabuf, 0), xfer->buffer, + xfer->length); usb_tap(bus, xfer, USBTAP_DIR_OUT); @@ -750,17 +743,10 @@ usb_transfer_complete(struct usbd_xfer *xfer) } #endif - if (xfer->actlen != 0) { - if (usbd_xfer_isread(xfer)) { - usb_syncmem(&xfer->dmabuf, 0, xfer->actlen, - BUS_DMASYNC_POSTREAD); - if (!(xfer->flags & USBD_NO_COPY)) - memcpy(xfer->buffer, KERNADDR(&xfer->dmabuf, 0), - xfer->actlen); - } else - usb_syncmem(&xfer->dmabuf, 0, xfer->actlen, - BUS_DMASYNC_POSTWRITE); - } + if (usbd_xfer_isread(xfer) && xfer->actlen != 0 && + (xfer->flags & USBD_NO_COPY) == 0) + memcpy(xfer->buffer, KERNADDR(&xfer->dmabuf, 0), + xfer->actlen); /* if we allocated the buffer in usbd_transfer() we free it here. */ if (xfer->rqflags & URQ_AUTO_DMABUF) { diff --git a/sys/dev/usb/xhci.c b/sys/dev/usb/xhci.c index f98f9a0b8fe..b73d4fc2ac2 100644 --- a/sys/dev/usb/xhci.c +++ b/sys/dev/usb/xhci.c @@ -851,6 +851,10 @@ xhci_event_xfer_generic(struct xhci_softc *sc, struct usbd_xfer *xfer, else xfer->actlen = xfer->length; } + if (xfer->actlen) + usb_syncmem(&xfer->dmabuf, 0, xfer->actlen, + usbd_xfer_isread(xfer) ? + BUS_DMASYNC_POSTREAD : BUS_DMASYNC_POSTWRITE); xfer->status = USBD_NORMAL_COMPLETION; break; case XHCI_CODE_SHORT_XFER: @@ -871,6 +875,10 @@ xhci_event_xfer_generic(struct xhci_softc *sc, struct usbd_xfer *xfer, DEVNAME(sc), xfer, xx->index)); return (1); } + if (xfer->actlen) + usb_syncmem(&xfer->dmabuf, 0, xfer->actlen, + usbd_xfer_isread(xfer) ? + BUS_DMASYNC_POSTREAD : BUS_DMASYNC_POSTWRITE); xfer->status = USBD_NORMAL_COMPLETION; break; case XHCI_CODE_TXERR: @@ -975,6 +983,9 @@ xhci_event_xfer_isoc(struct usbd_xfer *xfer, struct xhci_pipe *xp, xp->skip = 0; } + usb_syncmem(&xfer->dmabuf, 0, xfer->length, + usbd_xfer_isread(xfer) ? + BUS_DMASYNC_POSTREAD : BUS_DMASYNC_POSTWRITE); xfer->status = USBD_NORMAL_COMPLETION; return (0); @@ -2809,6 +2820,11 @@ xhci_device_ctrl_start(struct usbd_xfer *xfer) if (xp->free_trbs < 3) return (USBD_NOMEM); + if (len != 0) + usb_syncmem(&xfer->dmabuf, 0, len, + usbd_xfer_isread(xfer) ? + BUS_DMASYNC_PREREAD : BUS_DMASYNC_PREWRITE); + /* We'll toggle the setup TRB once we're finished with the stages. */ trb0 = xhci_xfer_get_trb(sc, xfer, &toggle, 0); @@ -2935,6 +2951,10 @@ xhci_device_generic_start(struct usbd_xfer *xfer) if (xp->free_trbs < (ntrb + zerotd)) return (USBD_NOMEM); + usb_syncmem(&xfer->dmabuf, 0, xfer->length, + usbd_xfer_isread(xfer) ? + BUS_DMASYNC_PREREAD : BUS_DMASYNC_PREWRITE); + /* We'll toggle the first TRB once we're finished with the chain. */ trb0 = xhci_xfer_get_trb(sc, xfer, &toggle, (ntrb == 1)); flags = XHCI_TRB_TYPE_NORMAL | (toggle ^ 1); @@ -3091,6 +3111,10 @@ xhci_device_isoc_start(struct usbd_xfer *xfer) if (xp->free_trbs < ntrb) return (USBD_NOMEM); + usb_syncmem(&xfer->dmabuf, 0, xfer->length, + usbd_xfer_isread(xfer) ? + BUS_DMASYNC_PREREAD : BUS_DMASYNC_PREWRITE); + paddr = DMAADDR(&xfer->dmabuf, 0); for (i = 0, trb0 = NULL; i < xfer->nframes; i++) {