Module Name:	src
Committed By:	riastradh
Date:		Fri Aug 12 13:44:12 UTC 2022
Modified Files:
	src/sys/arch/x86/x86: bus_dma.c

Log Message:
x86: Adjust fences issued in bus_dmamap_sync after bouncing.

And expand the comment on the lfence for POSTREAD before bouncing.

Net change:

	op                    before bounce   after bounce
	                                      old      new
	PREREAD               nop             lfence   sfence
	PREWRITE              nop             mfence   sfence
	PREREAD|PREWRITE      nop             mfence   sfence
	POSTREAD              lfence          lfence   nop[*]
	POSTWRITE             nop             mfence   nop
	POSTREAD|POSTWRITE    lfence          mfence   nop[*]

The case of PREREAD is as follows:

1. loads and stores before DMA buffer may be allocated for the purpose
2. bus_dmamap_sync(BUS_DMASYNC_PREREAD)
3. store to register or DMA descriptor to trigger DMA

The register or DMA descriptor may be in any type of memory (or I/O).

lfence at (2) is _not enough_ to ensure stores at (1) have completed
before the store in (3) in case the register or DMA descriptor lives in
wc/wc+ memory, or the store to it is non-temporal: in that case, the
store may be executed early, before all the stores in (1) have
completed.

On the other hand, lfence at (2) is _not needed_ to ensure loads in (1)
have completed before the store in (3), because x86 never reorders
load;store to store;load.

So we may need to enforce store/store ordering, but not any other
ordering, hence sfence.

The case of PREWRITE is as follows:

1. stores to DMA buffer (and loads from it, before allocated)
2. bus_dmamap_sync(BUS_DMASYNC_PREWRITE)
3. store to register or DMA descriptor to trigger DMA

Ensuring prior loads have completed is not necessary because x86 never
reorders load;store to store;load (and, in any case, the device isn't
changing the DMA buffer, so it's safe to read it over and over again).

But we must ensure the stores in (1) have completed before the store in
(3).  So we need sfence, in case either the DMA buffer or the register
or the DMA descriptor is in wc/wc+ memory, or either store is
non-temporal.  But we don't need mfence.

The case of POSTREAD is as follows:

1. load from register or DMA descriptor notifying DMA completion
2. bus_dmamap_sync(BUS_DMASYNC_POSTREAD)
   (a) lfence [*]
   (b) if bouncing, memcpy(userbuf, bouncebuf, ...)
   (c) ???
3. loads from DMA buffer to use data, and stores to reuse buffer

This certainly needs an lfence to prevent the loads at (3) from being
executed early -- but bus_dmamap_sync already issues lfence in that
case at 2(a), before it conditionally loads from the bounce buffer into
the user's buffer.  So we don't need any _additional_ fence _after_
bouncing at 2(c).

The case of POSTWRITE is as follows:

1. load from register or DMA descriptor notifying DMA completion
2. bus_dmamap_sync(BUS_DMASYNC_POSTWRITE)
3. loads and stores to reuse buffer

Stores at (3) will never be executed early, because x86 never reorders
load;store to store;load for any memory types.  Loads at (3) are
harmless because the device isn't changing the buffer -- it's supposed
to be fixed from the time of PREWRITE to the time of POSTWRITE, as far
as the CPU can witness.

Proposed on port-amd64 last month:
https://mail-index.netbsd.org/port-amd64/2022/07/16/msg003593.html

Reference: AMD64 Architecture Programmer's Manual, Volume 2: System
Programming, 24593--Rev. 3.38--November 2021, Sec. 7.4.2 Memory Barrier
Interaction with Memory Types, Table 7-3, p. 196.
https://www.amd.com/system/files/TechDocs/24593.pdf


To generate a diff of this commit:
cvs rdiff -u -r1.85 -r1.86 src/sys/arch/x86/x86/bus_dma.c

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.