On Wed, 3 May 2023 at 08:08, Richard Henderson
<richard.hender...@linaro.org> wrote:
>
> Create ldst_atomicity.c.inc.
>
> Not required for user-only code loads, because we've ensured that
> the page is read-only before beginning to translate code.
>
> Reviewed-by: Alex Bennée <alex.ben...@linaro.org>
> Signed-off-by: Richard Henderson <richard.hender...@linaro.org>
> ---
> +/**
> + * required_atomicity:
> + *
> + * Return the lg2 bytes of atomicity required by @memop for @p.
> + * If the operation must be split into two operations to be
> + * examined separately for atomicity, return -lg2.
> + */
> +static int required_atomicity(CPUArchState *env, uintptr_t p, MemOp memop)
> +{
> +    int atmax = memop & MO_ATMAX_MASK;
> +    int size = memop & MO_SIZE;
> +    unsigned tmp;
> +
> +    if (atmax == MO_ATMAX_SIZE) {
> +        atmax = size;
> +    } else {
> +        atmax >>= MO_ATMAX_SHIFT;
> +    }
> +
> +    switch (memop & MO_ATOM_MASK) {
> +    case MO_ATOM_IFALIGN:
> +        tmp = (1 << atmax) - 1;
> +        if (p & tmp) {
> +            return MO_8;
> +        }
> +        break;
> +    case MO_ATOM_NONE:
> +        return MO_8;
> +    case MO_ATOM_SUBALIGN:
> +        tmp = p & -p;
> +        if (tmp != 0 && tmp < atmax) {
> +            atmax = tmp;
> +        }
> +        break;

I don't understand the bit manipulation going on here. AIUI what we're
trying to do is say "if e.g. p is only 2-aligned then we only get
2-alignment". But, suppose p == 0x1002. Then (p & -p) is 0x2. But
that's MO_32, not MO_16. Am I missing something?

(Also, it would be nice to have a comment mentioning what (p & -p)
does, so readers don't have to try to search for a not-very-searchable
expression to find out.)

> +    case MO_ATOM_WITHIN16:
> +        tmp = p & 15;
> +        if (tmp + (1 << size) <= 16) {
> +            atmax = size;

OK, so this is "whole operation is within 16 bytes, whole operation
must be atomic"...

> +        } else if (atmax == size) {
> +            return MO_8;

...but I don't understand the interaction of WITHIN16 and also
specifying an ATMAX value that's not ATMAX_SIZE.

> +        } else if (tmp + (1 << atmax) != 16) {

Why is this doing an exact inequality check? What if you're asking for
a load of 8 bytes at MO_ATMAX_2 from a pointer that's at an offset of
10 bytes from a 16-byte boundary? Then tmp is 10, tmp + (1 << atmax)
is 12, but we could still do the loads at atomicity 2.
This doesn't seem to me to be any different from the case it does
catch where the first ATMAX_2-sized unit happens to be the only thing
in this 16-byte block.

The doc comment in patch 1 could probably be beefed up to better
explain the interaction of WITHIN16 and ATMAX_*.

> +            /*
> +             * Paired load/store, where the pairs aren't aligned.
> +             * One of the two must still be handled atomically.
> +             */
> +            atmax = -atmax;
> +        }
> +        break;
> +    default:
> +        g_assert_not_reached();
> +    }
> +
> +    /*
> +     * Here we have the architectural atomicity of the operation.
> +     * However, when executing in a serial context, we need no extra
> +     * host atomicity in order to avoid racing.  This reduction
> +     * avoids looping with cpu_loop_exit_atomic.
> +     */
> +    if (cpu_in_serial_context(env_cpu(env))) {
> +        return MO_8;
> +    }
> +    return atmax;
> +}
> +
> +/**
> + * load_atomic8_or_exit:
> + * @env: cpu context
> + * @ra: host unwind address
> + * @pv: host address
> + *
> + * Atomically load 8 aligned bytes from @pv.
> + * If this is not possible, longjmp out to restart serially.
> + */
> +static uint64_t load_atomic8_or_exit(CPUArchState *env, uintptr_t ra, void *pv)
> +{
> +    if (HAVE_al8) {
> +        return load_atomic8(pv);
> +    }
> +
> +#ifdef CONFIG_USER_ONLY
> +    /*
> +     * If the page is not writable, then assume the value is immutable
> +     * and requires no locking.  This ignores the case of MAP_SHARED with
> +     * another process, because the fallback start_exclusive solution
> +     * provides no protection across processes.
> +     */
> +    if (!page_check_range(h2g(pv), 8, PAGE_WRITE)) {
> +        uint64_t *p = __builtin_assume_aligned(pv, 8);
> +        return *p;
> +    }

This will also do a non-atomic read for the case where the guest has
mapped the same memory twice at different addresses, once read-only
and once writeable, I think. In theory in that situation we could use
start_exclusive. But maybe that's a weird corner case we can ignore?

> +#endif
> +
> +    /* Ultimate fallback: re-execute in serial context. */
> +    cpu_loop_exit_atomic(env_cpu(env), ra);
> +}
> +

thanks
-- PMM