> Date: Sun, 24 Nov 2019 19:25:52 +
> From: Taylor R Campbell
>
> This thread is not converging on consensus, so we're discussing the
> semantics and naming of these operations as core and will come back
> with a decision by the end of the week.

We (core) carefully read the thread, and discussed this and the
related Linux READ_ONCE/WRITE_ONCE macros as well as the C11 atomic
API.

For maxv: Please add conditional definitions in <sys/atomic.h>
according to what KCSAN needs, and use atomic_load/store_relaxed
for counters and other integer objects in the rest of your patch.
(I didn't see any pointer loads there.) For uvm's lossy counters,
please use atomic_store_relaxed(p, 1 + atomic_load_relaxed(p)) and
not an __add_once macro -- since these should really be per-CPU
counters, we don't want to endorse this pattern by making it
pretty.
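
As a rough illustration of the lossy-counter pattern requested above,
here is a userland sketch. The two macro definitions are stand-ins
written for this example only: they follow the note that the macros are
currently volatile loads and stores, and they assume the GCC/Clang
__typeof__ extension. struct lossy_stats and lossy_stats_bump are
hypothetical names, not uvm code.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for this sketch only; not the <sys/atomic.h> definitions. */
#define atomic_load_relaxed(p) \
	(*(const volatile __typeof__(*(p)) *)(p))
#define atomic_store_relaxed(p, v) \
	((void)(*(volatile __typeof__(*(p)) *)(p) = (v)))

/* Hypothetical statistics structure with one lossy counter. */
struct lossy_stats {
	unsigned long nevents;
};

static void
lossy_stats_bump(struct lossy_stats *s)
{
	assert(s != NULL);
	/*
	 * A separate load and store, not a read/modify/write: no
	 * tearing and no fusing, but two CPUs bumping at the same
	 * time may lose an increment.
	 */
	atomic_store_relaxed(&s->nevents,
	    1 + atomic_load_relaxed(&s->nevents));
}
```

The deliberately verbose call makes it visible at the use site that the
update is two operations rather than one atomic increment.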

* Summary

We added a few macros to <sys/atomic.h> for the purpose,
atomic_load_<ordering>(p) and atomic_store_<ordering>(p,v). The
orderings are relaxed, acquire, consume, and release, and are intended
to match C11 semantics. See the new atomic_loadstore(9) man page for
reference.
Currently they are defined in terms of volatile loads and stores, but
we should eventually use the C11 atomic API instead in order to
provide the intended atomicity guarantees under all compilers without
having to rely on the folklore interpretations of volatile.
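
To make the two implementation strategies concrete, here is a sketch for
a single type. The function names are hypothetical and neither function
is the actual <sys/atomic.h> definition; the second one uses the
standard C11 atomic_load_explicit from <stdatomic.h>.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Current approach per the text: a volatile access to a plain object. */
static unsigned
load_relaxed_volatile(unsigned *p)
{
	assert(p != NULL);
	return *(const volatile unsigned *)p;
}

/* Possible later approach: the C11 API, which wants an _Atomic object. */
static unsigned
load_relaxed_c11(_Atomic unsigned *p)
{
	assert(p != NULL);
	return atomic_load_explicit(p, memory_order_relaxed);
}
```

Note that the C11 form changes the declared type of the object, whereas
the volatile form only changes how it is accessed.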

* Details

There are four main properties involved in the operations under
discussion:
1. No tearing. A 32-bit write can't be split into two separate 16-bit
writes, for instance.
* In _some_ cases, namely aligned pointers to sufficiently small
objects, Linux READ_ONCE/WRITE_ONCE guarantee no tearing.
* C11 atomic_load/store guarantees no tearing -- although on large
objects it may involve locks, requiring the C11 type qualifier
_Atomic and changing the ABI.
This was the primary motivation for maxv's original question.
2. No fusing. Consecutive writes can't be combined into one, for
instance, and a read that follows a write can't be skipped in favor
of returning the value that was just written.
* Linux's READ_ONCE/WRITE_ONCE and C11's atomic_load/store
guarantee no fusing.
3. Data-dependent memory ordering. If you read a pointer, and then
dereference the pointer (maybe plus some offset), the reads happen
in that order.
* Linux's READ_ONCE guarantees this by issuing the analogue of
membar_datadep_consumer on DEC Alpha, and nothing on other CPUs.
* C11's atomic_load guarantees this with seq_cst, acquire, or
consume memory ordering.
4. Cost. There's no need to incur the cost of read/modify/write atomic
operations, and for many purposes, no need to incur the cost of
memory-ordering barriers.
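
Property 2 is perhaps easiest to see in a polling loop. With a plain
*flag, a compiler may read the flag once and reuse the value on every
iteration. The sketch below forces each iteration to perform its own
load by going through a volatile-qualified lvalue. poll_flag is a
hypothetical name, and the loop is bounded only so the example
terminates.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return 1 if *flag is seen nonzero within maxspins reads, else 0.
 * No ordering with respect to other objects is implied.
 */
static int
poll_flag(int *flag, int maxspins)
{
	assert(flag != NULL);
	for (int i = 0; i < maxspins; i++) {
		/* One load per iteration; may not be fused or hoisted. */
		if (*(const volatile int *)flag != 0)
			return 1;
	}
	return 0;
}
```

This costs an ordinary load per iteration: no read/modify/write and no
barrier, which is the point of property 4.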
To express these, we've decided to add a few macros that are similar
to Linux's READ_ONCE/WRITE_ONCE and C11's atomic_load/store_explicit
but are less error-prone and less cumbersome:

    #include <sys/atomic.h>

- atomic_load_relaxed(p) is like *p, but guarantees no tearing and no
fusing. No ordering relative to memory operations on other objects
is guaranteed.
- atomic_store_relaxed(p, v) is like *p = v, but guarantees no tearing
and no fusing. No ordering relative to memory operations on other
objects is guaranteed.
- atomic_store_release(p, v) and atomic_load_acquire(p) are,
respectively, like *p = v and *p, but guarantee no tearing and no
fusing. They _also_ guarantee for logic like

    Thread A                        Thread B

    stuff();
    atomic_store_release(p, v);
                                    u = atomic_load_acquire(p);
                                    things();

that _if_ the atomic_load_acquire(p) in thread B witnesses the state
of the object at p set by atomic_store_release(p, v) in thread A,
then all memory operations in stuff() happen before any memory
operations in things().
No guarantees if only one thread participates -- the store-release
and load-acquire _must_ be paired.
- atomic_load_consume(p) is like atomic_load_acquire(p), but it only
guarantees ordering for data-dependent memory references. Like
atomic_load_acquire, it must be paired with atomic_store_release.
However, on most CPUs, it is as _cheap_ as atomic_load_relaxed.
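
The store-release/load-acquire pairing described above can be sketched
in userland with the C11 calls the macros are intended to match. The
names struct msg, slot, published, producer_publish, and
consumer_try_read are hypothetical, and C11 atomic_store_explicit and
atomic_load_explicit stand in for atomic_store_release and
atomic_load_acquire.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct msg {
	int payload;
};

static struct msg slot;			/* filled in by the producer */
static struct msg *_Atomic published;	/* NULL until slot is ready */

/* Thread A: stuff(), then the store-release. */
static void
producer_publish(int value)
{
	slot.payload = value;		/* stuff() */
	atomic_store_explicit(&published, &slot, memory_order_release);
}

/* Thread B: the load-acquire, then things(). */
static int
consumer_try_read(int *out)
{
	struct msg *m;

	assert(out != NULL);
	m = atomic_load_explicit(&published, memory_order_acquire);
	if (m == NULL)
		return 0;	/* store not witnessed: no ordering implied */
	*out = m->payload;	/* things(): ordered after stuff() */
	return 1;
}
```

Since the consumer only dereferences the pointer it loaded, this is also
the kind of data-dependent access for which the consume variant is
described as sufficient.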
The atomic load/store operations are defined _only_ on objects no
larger than the architecture can support -- so, for example, on 32-bit
platforms they cannot be used on 64-bit quantities; attempts to do so
will lead to compile-time errors. They are also defined _only_ on
aligned pointers -- using them on unaligned pointers may lead to
run-time crashes, even on architectures without strict alignment
requirements.
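
The size and alignment restrictions could be checked along the following
lines. This is only a sketch: LOADSTORE_SIZE_OK and loadstore_aligned
are hypothetical names, and treating pointer size as the largest
supported object is an assumption made for the example, not something
the macros are stated to do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the largest supported object is pointer-sized. */
#define LOADSTORE_SIZE_OK(type)	(sizeof(type) <= sizeof(void *))

/* Compile-time check; a 64-bit type would fail this on a 32-bit platform. */
static_assert(LOADSTORE_SIZE_OK(uint32_t),
    "uint32_t too large for atomic load/store");

/* Run-time check that an address is a multiple of the object size. */
static int
loadstore_aligned(const void *p, size_t size)
{
	assert(size != 0);
	return ((uintptr_t)p % size) == 0;
}
```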

* Why the names atomic_{load,store}_<ordering>?

- Atomic. Although `atomic' may suggest `expensive' to some people
(and I'm guilty of making that connection in the past), what's
really expensive is atomic _read/modify/write_ operations and
_memory ordering guarantees_.
Merely preventing tearing