Module Name:    src
Committed By:   riastradh
Date:           Fri Aug 12 13:44:12 UTC 2022

Modified Files:
        src/sys/arch/x86/x86: bus_dma.c

Log Message:
x86: Adjust fences issued in bus_dmamap_sync after bouncing.

And expand the comment on the lfence for POSTREAD before bouncing.

Net change:

op                      before bounce       after bounce
                                        old             new
PREREAD                 nop             lfence          sfence
PREWRITE                nop             mfence          sfence
PREREAD|PREWRITE        nop             mfence          sfence
POSTREAD                lfence          lfence          nop[*]
POSTWRITE               nop             mfence          nop
POSTREAD|POSTWRITE      lfence          mfence          nop[*]

The case of PREREAD is as follows:

1. loads and stores before DMA buffer may be allocated for the purpose
2. bus_dmamap_sync(BUS_DMASYNC_PREREAD)
3. store to register or DMA descriptor to trigger DMA

The register or DMA descriptor may be in any type of memory (or I/O).

lfence at (2) is _not enough_ to ensure stores at (1) have completed
before the store in (3) in case the register or DMA descriptor lives
in wc/wc+ memory, or the store to it is non-temporal: in that case,
it may executed early before all the stores in (1) have completed.

On the other hand, lfence at (2) is _not needed_ to ensure loads in
(1) have completed before the store in (3), because x86 never
reorders load;store to store;load.  So we may need to enforce
store/store ordering, but not any other ordering, hence sfence.

The case of PREWRITE is as follows:

1. stores to DMA buffer (and loads from it, before allocated)
2. bus_dmamap_sync(BUS_DMASYNC_PREWRITE)
3. store to register or DMA descriptor to trigger DMA

Ensuring prior loads have completed is not necessary because x86
never reorders load;store to store;load (and in any case, the device
isn't changing the DMA buffer, so it's safe to read over and over
again).  But we must ensure the stores in (1) have completed before
the store in (3).  So we need sfence, in case either the DMA buffer
or the register or the DMA descriptor is in wc/wc+ memory or either
store is non-temporal.  But we don't need mfence.

The case of POSTREAD is as follows:

1. load from register or DMA descriptor notifying DMA completion
2. bus_dmamap_sync(BUS_DMASYNC_POSTREAD)
   (a) lfence [*]
   (b) if bouncing, memcpy(userbuf, bouncebuf, ...)
   (c) ???
3. loads from DMA buffer to use data, and stores to reuse buffer

This certainly needs an lfence to prevent the loads at (3) from being
executed early -- but bus_dmamap_sync already issues lfence in that
case at 2(a), before it conditionally loads from the bounce buffer
into the user's buffer.  So we don't need any _additional_ fence
_after_ bouncing at 2(c).

The case of POSTWRITE is as follows:

1. load from register or DMA descriptor notifying DMA completion
2. bus_dmamap_sync(BUS_DMASYNC_POSTWRITE)
3. loads and stores to reuse buffer

Stores at (3) will never be executed early because x86 never reorders
load;store to store;load for any memory types.  Loads at (3) are
harmless because the device isn't changing the buffer -- it's
supposed to be fixed from the time of PREWRITE to the time of
POSTWRITE as far as the CPU can witness.

Proposed on port-amd64 last month:

https://mail-index.netbsd.org/port-amd64/2022/07/16/msg003593.html

Reference:

AMD64 Architecture Programmer's Manual, Volume 2: System Programming,
24593--Rev. 3.38--November 2021, Sec. 7.4.2 Memory Barrier Interaction
with Memory Types, Table 7-3, p. 196.
https://www.amd.com/system/files/TechDocs/24593.pdf


To generate a diff of this commit:
cvs rdiff -u -r1.85 -r1.86 src/sys/arch/x86/x86/bus_dma.c

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.

Reply via email to