Module Name:    src
Committed By:   riastradh
Date:           Fri Aug 12 13:24:37 UTC 2022

Modified Files:
        src/share/man/man9: bus_space.9

Log Message:
bus_space(9): Update barrier semantics to match reality and sense.

As proposed on tech-kern:
https://mail-index.netbsd.org/tech-kern/2022/07/16/msg028249.html

tl;dr:
- bus_space_barrier is needed only with prefetchable/cacheable mappings.
- BUS_SPACE_BARRIER_READ is like membar_acquire.
- BUS_SPACE_BARRIER_WRITE is like membar_release.
- READ|WRITE is like membar_sync.
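
The acquire/release analogy above can be sketched in portable C11. This is
an illustration only, under stated assumptions: atomic_thread_fence stands
in for membar_release/membar_acquire, and the ring structure and the names
ring_submit, ring_poll and ring_roundtrip are hypothetical. Plain C11 fences
order normal memory only; they may not order bus space I/O, which is what
bus_space_barrier is for.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SLOTS 4	/* hypothetical ring size */

/* Hypothetical ring in normal memory: payload slots plus a head index. */
struct ring {
	uint32_t slot[RING_SLOTS];	/* payload, written with plain stores */
	atomic_uint head;		/* plays the role of the pointer register */
};

/* Producer: write the payload, then publish it by advancing head. */
static void
ring_submit(struct ring *r, uint32_t cmd)
{
	unsigned i = atomic_load_explicit(&r->head, memory_order_relaxed);

	r->slot[i % RING_SLOTS] = cmd;
	/* Release fence: prior writes are ordered before the head update,
	 * analogous to BUS_SPACE_BARRIER_WRITE before the pointer write. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&r->head, i + 1, memory_order_relaxed);
}

/* Consumer: read head first, then the payload it covers. */
static bool
ring_poll(struct ring *r, unsigned *tail, uint32_t *cmd)
{
	unsigned head = atomic_load_explicit(&r->head, memory_order_relaxed);

	if (*tail == head)
		return false;	/* nothing new has been published */
	/* Acquire fence: the head read is ordered before the payload read,
	 * analogous to BUS_SPACE_BARRIER_READ after the pointer read. */
	atomic_thread_fence(memory_order_acquire);
	*cmd = r->slot[*tail % RING_SLOTS];
	(*tail)++;
	return true;
}

/* Single-threaded round trip, just to exercise both halves. */
uint32_t
ring_roundtrip(uint32_t cmd)
{
	struct ring r = {0};
	unsigned tail = 0;
	uint32_t out = 0;
	bool ok;

	atomic_init(&r.head, 0);
	ring_submit(&r, cmd);
	ok = ring_poll(&r, &tail, &out);
	assert(ok);
	return out;
}
```

In real use ring_submit and ring_poll would run on different CPUs; the
fences matter only then. READ|WRITE would correspond to a
memory_order_seq_cst fence, which this pattern does not need.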


To generate a diff of this commit:
cvs rdiff -u -r1.53 -r1.54 src/share/man/man9/bus_space.9

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.

Modified files:

Index: src/share/man/man9/bus_space.9
diff -u src/share/man/man9/bus_space.9:1.53 src/share/man/man9/bus_space.9:1.54
--- src/share/man/man9/bus_space.9:1.53	Mon Nov 13 09:10:37 2017
+++ src/share/man/man9/bus_space.9	Fri Aug 12 13:24:37 2022
@@ -1,4 +1,4 @@
-.\" $NetBSD: bus_space.9,v 1.53 2017/11/13 09:10:37 wiz Exp $
+.\" $NetBSD: bus_space.9,v 1.54 2022/08/12 13:24:37 riastradh Exp $
 .\"
 .\" Copyright (c) 1997 The NetBSD Foundation, Inc.
 .\" All rights reserved.
@@ -27,7 +27,7 @@
 .\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
 .\" POSSIBILITY OF SUCH DAMAGE.
 .\"
-.Dd September 15, 2016
+.Dd August 12, 2022
 .Dt BUS_SPACE 9
 .Os
 .Sh NAME
@@ -466,49 +466,30 @@ that all of the data being accessed is i
 handle describes.
 Trying to access data outside that region is an error.
 .Pp
-Because some architectures' memory systems use buffering to improve
-memory and device access performance, there is a mechanism which can be
-used to create
-.Dq barriers
-in the bus space read and write stream.
-.Pp
-There are two types of barriers: ordering barriers and completion
-barriers.
-.Pp
-Ordering barriers prevent some operations from bypassing other
-operations.
-They are relatively light weight and described in terms of the
-operations they are intended to order.
-The important thing to note is that they create specific ordering
-constraint surrounding bus accesses but do not necessarily force any
-synchronization themselves.
-So, if there is enough distance between the memory operations being
-ordered, the preceding ones could complete by themselves resulting
-in no performance penalty.
-.Pp
-For instance, a write before read barrier will force any writes
-issued before the barrier instruction to complete before any reads
-after the barrier are issued.
-This forces processors with write buffers to read data from memory rather
-than from the pending write in the write buffer.
-.Pp
-Ordering barriers are usually sufficient for most circumstances,
-and can be combined together.
-For instance a read before write barrier can be combined with a write
-before write barrier to force all memory operations to complete before
-the next write is started.
-.Pp
-Completion barriers force all memory operations and any pending
-exceptions to be completed before any instructions after the
-barrier may be issued.
-Completion barriers are extremely expensive and almost never required
-in device driver code.
-A single completion barrier can force the processor to stall on memory
-for hundreds of cycles on some machines.
-.Pp
-Correctly-written drivers will include all appropriate barriers,
-and assume only the read/write ordering imposed by the barrier
-operations.
+Bus space I/O operations on mappings made with
+.Dv BUS_SPACE_MAP_PREFETCHABLE
+or
+.Dv BUS_SPACE_MAP_CACHEABLE
+may be reordered or combined for performance on devices that support
+it, such as write-combining
+.Pq "a.k.a." Sq prefetchable
+graphics framebuffers or cacheable ROM images.
+The
+.Fn bus_space_barrier
+function orders reads and writes in prefetchable or cacheable mappings
+relative to other reads and writes in bus spaces.
+Barriers are needed
+.Em only
+when prefetchable or cacheable mappings are involved:
+.Bl -bullet
+.It
+Bus space reads and writes on non-prefetchable, non-cacheable mappings
+at a single device are totally ordered with one another.
+.It
+Ordering of memory operations on normal memory with bus space I/O
+for triggering DMA or being notified of DMA completion requires
+.Xr bus_dmamap_sync 9 .
+.El
 .Pp
 People trying to write portable drivers with the
 .Nm
@@ -1185,9 +1166,9 @@ be read, on others it may cause a system
 .Pp
 Read operations done by the
 .Fn bus_space_read_N
-functions may be executed out
-of order with respect to other pending read and write operations unless
-order is enforced by use of the
+functions may be executed out of order with respect to other read and
+write operations if either are on prefetchable or cacheable mappings
+unless order is enforced by use of the
 .Fn bus_space_barrier
 function.
 .Pp
@@ -1223,8 +1204,8 @@ to be written, on others it may cause a 
 .Pp
 Write operations done by the
 .Fn bus_space_write_N
-functions may be executed
-out of order with respect to other pending read and write operations
+functions may be executed out of order with respect to other read and
+write operations if either are on prefetchable or cacheable mappings
 unless order is enforced by use of the
 .Fn bus_space_barrier
 function.
@@ -1267,13 +1248,6 @@ functions also apply to
 and
 .Fn bus_space_poke_N .
 .Pp
-In addition, explicit calls to the
-.Fn bus_space_barrier
-function are not required as the implementation will ensure all
-pending operations complete before the peek or poke operation starts.
-The implementation will also ensure that the peek or poke operations
-complete before returning.
-.Pp
 The return value indicates the outcome of the peek or poke operation.
 A return value of zero implies that a hardware device is
 responding to the operation at the specified offset in the bus space.
@@ -1334,12 +1308,19 @@ of the bus space specified by
 .Fa space .
 .El
 .Sh BARRIERS
-In order to allow high-performance buffering implementations to avoid bus
-activity on every operation, read and write ordering should be specified
-explicitly by drivers when necessary.
-The
+Devices that support prefetchable (also known as
+.Sq write-combining )
+or cacheable I/O may be mapped with
+.Dv BUS_SPACE_MAP_PREFETCHABLE
+or
+.Dv BUS_SPACE_MAP_CACHEABLE
+for higher performance by allowing bus space read and write operations
+to be reordered, fused, torn, and/or cached by the system.
+.Pp
+When a driver requires ordering, e.g. to write to a command ring in bus
+space and then update the command ring pointer, the
 .Fn bus_space_barrier
-function provides that ability.
+function enforces it.
 .Pp
 .Bl -ohang -compact
 .It Fn bus_space_barrier "space" "handle" "offset" "length" "flags"
@@ -1362,67 +1343,95 @@ argument controls what types of operatio
 Supported flags are:
 .Bl -tag -width BUS_SPACE_BARRIER_WRITE -offset indent
 .It Dv BUS_SPACE_BARRIER_READ
-Force all
-.Nm
-operations before the barrier to complete before any reads
-after the barrier may be issued.
+Guarantee that any program-prior bus space read on
+.Em any
+bus space has returned data from the device or memory before any
+program-later bus space read, bus space write, or memory access via
+.Fn bus_space_vaddr ,
+on the specified range in the given bus space.
+.Pp
+This functions similarly to
+.Xr membar_acquire 3 ,
+but additionally orders bus space I/O which
+.Xr membar_ops 3
+may not.
 .It Dv BUS_SPACE_BARRIER_WRITE
-Force all
-.Nm
-operations before the barrier to complete before any writes
-after the barrier may be issued.
+Guarantee that any program-prior bus space read, bus space write, or
+memory access via
+.Fn bus_space_vaddr ,
+on the specified range in the given bus space, has completed before any
+program-later bus space write on
+.Em any
+bus space.
+.Pp
+This functions similarly to
+.Xr membar_release 3 ,
+but additionally orders bus space I/O which
+.Xr membar_ops 3
+may not.
+.It Dv "BUS_SPACE_BARRIER_READ" Li "|" Dv "BUS_SPACE_BARRIER_WRITE"
+Guarantee that any program-prior bus space read, bus space write, or
+memory access via
+.Fn bus_space_vaddr
+on
+.Em any
+bus space has completed before any program-later bus space read, bus
+space write, or memory access via
+.Fn bus_space_vaddr
+on
+.Em any
+bus space.
+.Pp
+Note that this is independent of the specified bus space and range.
+.Pp
+This functions similarly to
+.Xr membar_sync 3 ,
+but additionally orders bus space I/O which
+.Xr membar_ops 3
+may not.
+This combination is very unusual, and often much more expensive; most
+drivers do not need it.
 .El
 .Pp
-Those flags can be combined (or-ed together) to enforce ordering on
-different combinations of read and write operations.
+Example: Consider a command ring in bus space with a command ring
+pointer register, and a response ring in bus space with a response ring
+pointer register.
+.Bd -literal
+error = bus_space_map(sc->sc_regt, ..., 0, &sc->sc_regh);
+if (error)
+	\&...
+error = bus_space_map(sc->sc_memt, ..., BUS_SPACE_MAP_PREFETCHABLE,
+    &sc->sc_memh);
+if (error)
+	\&...
+.Ed
 .Pp
-All of the specified type(s) of operation which are done to the region
-before the barrier operation are guaranteed to complete before any of the
-specified type(s) of operation done after the barrier.
-.Pp
-Example: Consider a hypothetical device with two single-byte ports, one
-write-only input port (at offset 0) and a read-only output port (at
-offset 1).
-Operation of the device is as follows: data bytes are written to the
-input port, and are placed by the device on a stack, the top of
-which is read by reading from the output port.
-The sequence to correctly write two data bytes to the device then read
-those two data bytes back would be:
+To submit a command (assuming there is space in the ring), first write
+it out and then update the pointer:
 .Bd -literal
-/*
- * t and h are the tag and handle for the mapped device's
- * space.
- */
-bus_space_write_1(t, h, 0, data0);
-bus_space_barrier(t, h, 0, 1, BUS_SPACE_BARRIER_WRITE); /* 1 */
-bus_space_write_1(t, h, 0, data1);
-bus_space_barrier(t, h, 0, 2, BUS_SPACE_BARRIER_WRITE);  /* 2 */
-ndata1 = bus_space_read_1(t, h, 1);
-bus_space_barrier(t, h, 1, 1, BUS_SPACE_BARRIER_READ);   /* 3 */
-ndata0 = bus_space_read_1(t, h, 1);
-/* data0 == ndata0, data1 == ndata1 */
+i = sc->sc_nextcmdslot;
+bus_space_write_4(sc->sc_memt, sc->sc_memh, CMDSLOT(i), cmd);
+bus_space_write_4(sc->sc_memt, sc->sc_memh, CMDSLOT(i) + 4, arg1);
+bus_space_write_4(sc->sc_memt, sc->sc_memh, CMDSLOT(i) + 8, arg2);
+\&...
+bus_space_write_4(sc->sc_memt, sc->sc_memh, CMDSLOT(i) + 4*n, argn);
+bus_space_barrier(sc->sc_memt, sc->sc_memh, CMDSLOT(i), 4*n,
+    BUS_SPACE_BARRIER_WRITE);
+bus_space_write_4(sc->sc_regt, sc->sc_regh, CMDPTR, i);
+sc->sc_nextcmdslot = (i + n + 1) % sc->sc_ncmdslots;
 .Ed
 .Pp
-The first barrier makes sure that the first write finishes before the
-second write is issued, so that two writes to the input port are done
-in order and are not collapsed into a single write.
-This ensures that the data bytes are written to the device correctly and
-in order.
-.Pp
-The second barrier forces the writes to the output port finish before
-any of the reads to the input port are issued, thereby making sure
-that all of the writes are finished before data is read.
-This ensures that the first byte read from the device really is the last
-one that was written.
-.Pp
-The third barrier makes sure that the first read finishes before the
-second read is issued, ensuring that data is read correctly and in order.
-.Pp
-The barriers in the example above are specified to cover the absolute
-minimum number of bus space locations.
-It is correct (and often easier) to make barrier operations cover the
-device's whole range of bus space, that is, to specify an offset of zero
-and the size of the whole region.
+To obtain a response, read the pointer first and then the ring data:
+.Bd -literal
+ptr = bus_space_read_4(sc->sc_regt, sc->sc_regh, RESPPTR);
+while ((i = sc->sc_nextrespslot) != ptr) {
+	bus_space_barrier(sc->sc_memt, sc->sc_memh, RESPSLOT(i), 4,
+	    BUS_SPACE_BARRIER_READ);
+	status = bus_space_read_4(sc->sc_memt, sc->sc_memh, RESPSLOT(i));
+	handle_response(status);
+	sc->sc_nextrespslot = (i + 1) % sc->sc_nrespslots;
+}
+.Ed
 .El
 .Sh REGION OPERATIONS
 Some devices use buffers which are mapped as regions in bus space.
@@ -1479,8 +1488,9 @@ to be read, on others it may cause a sys
 Read operations done by the
 .Fn bus_space_read_region_N
 functions may be executed in any order.
-They may also be executed out of order with respect to other pending
-read and write operations unless order is enforced by use of the
+They may also be executed out of order with respect to other read and
+write operations if either are on prefetchable or cacheable mappings
+unless order is enforced by use of the
 .Fn bus_space_barrier
 function.
 There is no way to insert barriers between reads of individual bus
@@ -1529,8 +1539,9 @@ Write operations done by the
 .Fn bus_space_write_region_N
 functions may be
 executed in any order.
-They may also be executed out of order with respect to other pending read
-and write operations unless order is enforced by use of the
+They may also be executed out of order with respect to other read and
+write operations if either are on prefetchable or cacheable mappings
+unless order is enforced by use of the
 .Fn bus_space_barrier
 function.
 There is no way to insert barriers between writes of individual bus
@@ -1582,9 +1593,11 @@ to be copied, on others it may cause a s
 Read and write operations done by the
 .Fn bus_space_copy_region_N
 functions may be executed in any order.
-They may also be executed out of order with respect to other pending
-read and write operations unless order is enforced by use of the
-.Fn bus_space_barrier function .
+They may also be executed out of order with respect to other read and
+write operations if either are on prefetchable or cacheable mappings
+unless order is enforced by use of the
+.Fn bus_space_barrier
+function.
 There is no way to insert barriers between reads or writes of
 individual bus space locations executed by the
 .Fn bus_space_copy_region_N
@@ -1635,8 +1648,9 @@ Write operations done by the
 .Fn bus_space_set_region_N
 functions may be
 executed in any order.
-They may also be executed out of order with respect to other pending read
-and write operations unless order is enforced by use of the
+They may also be executed out of order with respect to other read and
+write operations if either are on prefetchable or cacheable mappings
+unless order is enforced by use of the
 .Fn bus_space_barrier
 function.
 There is no way to insert barriers between writes of
@@ -1693,16 +1707,13 @@ to be read, on others it may cause a sys
 .Pp
 Read operations done by the
 .Fn bus_space_read_multi_N
-functions may be
-executed out of order with respect to other pending read and write
-operations unless order is enforced by use of the
+functions may be executed out of order with respect to other read and
+write operations if the latter are on prefetchable or cacheable
+mappings unless order is enforced by use of the
 .Fn bus_space_barrier
 function.
-Because the
 .Fn bus_space_read_multi_N
-functions read the same bus space location multiple times, they
-place an implicit read barrier between each successive read of that bus
-space location.
+makes no sense itself on prefetchable or cacheable mappings.
 .Pp
 These functions will never fail.
 If they would fail (e.g., because of an argument error), that indicates
@@ -1741,15 +1752,13 @@ to be written, on others it may cause a 
 .Pp
 Write operations done by the
 .Fn bus_space_write_multi_N
-functions may be executed out of order with respect to other pending
-read and write operations unless order is enforced by use of the
+functions may be executed out of order with respect to other read and
+write operations if the latter are on prefetchable or cacheable
+mappings unless order is enforced by use of the
 .Fn bus_space_barrier
 function.
-Because the
 .Fn bus_space_write_multi_N
-functions write the same bus space location multiple times, they
-place an implicit write barrier between each successive write of that
-bus space location.
+makes no sense itself on prefetchable or cacheable mappings.
 .Pp
 These functions will never fail.
 If they would fail (e.g., because of an argument error), that indicates
@@ -1900,22 +1909,6 @@ There are several changes and improvemen
 including:
 .Bl -bullet
 .It
-Providing a mechanism by which incorrectly-written drivers will be
-automatically given barriers and properly-written drivers won't be forced
-to use more barriers than they need.
-This should probably be done via a
-.Li #define
-in the incorrectly-written drivers.
-Unfortunately, at this time, few drivers actually use barriers correctly
-(or at all).
-Because of that,
-.Nm
-implementations on architectures which do buffering must always
-do the barriers inside the
-.Nm
-calls, to be safe.
-That has a potentially significant performance impact.
-.It
 Exporting the
 .Nm
 functions to userland so that applications
