On Thu, Mar 10, 2022 at 10:45:47AM +0000, Laurence Tratt wrote:
> On Thu, Mar 10, 2022 at 09:05:54AM +0000, Visa Hankala wrote:
>
> Hello Visa,
>
> > In general, atomic_* functions have not provided implicit memory
> > barriers on OpenBSD.
>
> I've used atomics fairly extensively in other settings. Forgive me if I'm
> explaining the obvious, but I had a devil of a job making sense of this
> stuff a few years back, and so perhaps others might find it useful to expand
> on this point.
>
> Quick background: modern CPUs come in two main flavours, weakly ordered
> (e.g. most Arm systems) and strongly ordered (e.g. x86), which determine the
> rules of when multiple cores can see the reads/writes of other cores. Weakly
> ordered systems can move/optimise loads/stores around more than strongly
> ordered systems (so code that seems to work fine on x86 can then fail on
> Arm).
>
> There are in a sense two "safe" ways to use atomics: to assume that each
> atomic is isolated and that reading/writing to it tells you nothing about any
> other location in memory; or that every atomic is fully ordered with respect
> to every other atomic (i.e. no reorderings of atomic operations are allowed).
> The former is fast but (without additional operations) can't even express
> a mutex safely. The latter doesn't have very good performance.
>
> C11 thus allows you to do various atomic operations with different memory
> orderings [1] so that you can choose on a case-by-case basis what you're
> prepared to tolerate. "relaxed" is the most efficient but has the least
> guarantees; "seq_cst" is the least efficient but has the most guarantees.
>
> I would be very nervous about adding further atomic functions (as in the
> original diff) to OpenBSD that don't allow the user to specify what ordering
> they want: it's impossible to pick a memory ordering that suits all use
> cases. For example, neither READ_ONCE nor the existing atomic_* instructions
> define an ordering: I suppose I'd have to to assume they're relaxed. I worry
> that people might assume these atomic operations provide greater guarantees
> than they actually do.
My understanding is that OpenBSD's atomic_* instructions are relaxed
in terms of memory order. To accomplish a specific form of semantics,
the user adds the appropriate memory barrier instruction. Well, this
is the plan at least, I think.
Below is a patch that shows how to accomplish release semantics with
the file and refcnt APIs. (The added memory barriers ensure that the
CPU completes its loads and stores to the object before dropping the
reference. Another CPU might delete the object immediately after.
The barrier might not be strictly necessary with refcnt_finalize(),
though.)
> Fortunately since, AFAICT, we already use C11 (or C17?!) for base, and LLVM
> includes all of the relevant functions (e.g. the compare_exchange family
> [2]) I don't think we need to add any functions of our own? It might not
> even be a bad idea to deprecate the current atomic_* functions in base
> and migrate to the C11 alternatives?
Some of the architectures are still using GCC 4.2.1.
Base does not use C11 at the moment.
Index: kern/kern_synch.c
===================================================================
RCS file: src/sys/kern/kern_synch.c,v
retrieving revision 1.182
diff -u -p -r1.182 kern_synch.c
--- kern/kern_synch.c 19 Feb 2022 23:56:18 -0000 1.182
+++ kern/kern_synch.c 10 Mar 2022 13:37:50 -0000
@@ -825,6 +825,7 @@ refcnt_rele(struct refcnt *r)
{
u_int refcnt;
+ membar_exit_before_atomic();
refcnt = atomic_dec_int_nv(&r->refs);
KASSERT(refcnt != ~0);
@@ -844,6 +845,7 @@ refcnt_finalize(struct refcnt *r, const
struct sleep_state sls;
u_int refcnt;
+ membar_exit_before_atomic();
refcnt = atomic_dec_int_nv(&r->refs);
while (refcnt) {
sleep_setup(&sls, r, PWAIT, wmesg, 0);
Index: sys/file.h
===================================================================
RCS file: src/sys/sys/file.h,v
retrieving revision 1.65
diff -u -p -r1.65 file.h
--- sys/file.h 20 Jan 2022 03:43:31 -0000 1.65
+++ sys/file.h 10 Mar 2022 13:37:50 -0000
@@ -36,6 +36,7 @@
#include <sys/fcntl.h>
#else /* _KERNEL */
+#include <sys/atomic.h>
#include <sys/queue.h>
#include <sys/mutex.h>
#endif /* _KERNEL */
@@ -113,13 +114,21 @@ struct file {
atomic_inc_int(&(fp)->f_count); \
} while (0)
-#define FRELE(fp,p) \
- (atomic_dec_int_nv(&fp->f_count) == 0 ? fdrop(fp, p) : 0)
-
#define FDUP_MAX_COUNT (UINT_MAX - 2 * MAXCPUS)
int fdrop(struct file *, struct proc *);
+static inline int
+FRELE(struct file *fp, struct proc *p)
+{
+ int error = 0;
+
+ membar_exit_before_atomic();
+ if (atomic_dec_int_nv(&fp->f_count) == 0)
+ error = fdrop(fp, p);
+ return (error);
+}
+
static inline off_t
foffset(struct file *fp)
{