The main objective here is to support Arm FEAT_LSE2, which says that any single memory access that does not cross a 16-byte boundary is atomic. This is the MO_ATOM_WITHIN16 control.
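To make the MO_ATOM_WITHIN16 condition concrete, here is a minimal sketch of the boundary test implied by the sentence above. The helper name access_within16 is hypothetical and is not taken from the series; it only illustrates the arithmetic, assuming accesses of 1 to 16 bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the series): return true if the
 * access [addr, addr + size) stays inside one 16-byte aligned block,
 * i.e. it does not cross a 16-byte boundary.
 */
static bool access_within16(uint64_t addr, unsigned size)
{
    /* Assumption: only accesses of 1..16 bytes are considered. */
    assert(size >= 1 && size <= 16);

    /* Offset inside the 16-byte block plus the length must fit in 16. */
    return (addr & 15) + size <= 16;
}
```

For example, an 8-byte access at offset 8 of a block passes the test, while the same access at offset 12 crosses into the next block and does not.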

While I'm touching all of this, a secondary objective is to handle the atomicity of the IBM machines. Both Power and s390x treat misaligned accesses as atomic on the lsb of the pointer. For instance, an 8-byte access at ptr % 8 == 4 will appear as two atomic 4-byte accesses, and ptr % 4 == 2 will appear as four 2-byte accesses. This is the MO_ATOM_SUBALIGN control.

By default, accesses are atomic only if aligned, which is the current behaviour of the tcg code generator (mostly, anyway; there were bugs). This is the MO_ATOM_IFALIGN control.

Further, one can say that a large memory access is really a set of contiguous smaller accesses, and we need not provide more atomicity than that (modulo MO_ATOM_WITHIN16). This is the MO_ATMAX_* control.

While I've had a go at documenting all of this, I'm certain it could be improved -- soliciting suggestions.

r~

Based-on: 20221118091858.242569-1-richard.hender...@linaro.org
("main-loop: Introduce QEMU_IOTHREAD_LOCK_GUARD")
which itself depends on "tcg: Support for Int128 with helpers".
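As an illustration of the MO_ATOM_SUBALIGN rule described earlier in this letter, the following is a rough sketch of how the size of each atomic piece could be derived from the lsb of the pointer. The function name subalign_atom_size is hypothetical and does not appear in the series; it assumes power-of-two access sizes and only mirrors the two examples given in the text.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the series): for an access of
 * 'size' bytes at 'addr', return the size in bytes of each atomic
 * piece when atomicity follows the lowest set bit of the pointer.
 */
static unsigned subalign_atom_size(uint64_t addr, unsigned size)
{
    /* Assumption: size is a nonzero power of two. */
    assert(size != 0 && (size & (size - 1)) == 0);

    /* Misalignment relative to the natural alignment of the access. */
    unsigned misalign = (unsigned)(addr & (size - 1));

    if (misalign == 0) {
        return size;            /* aligned: the whole access is atomic */
    }
    /* Isolate the lowest set bit: 4 -> 4, 2 or 6 -> 2, odd -> 1. */
    return misalign & -misalign;
}
```

With this sketch, an 8-byte access at ptr % 8 == 4 yields 4-byte pieces (two of them), and one at ptr % 4 == 2 yields 2-byte pieces (four of them), matching the examples given for MO_ATOM_SUBALIGN.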

Richard Henderson (29):
  include/qemu/cpuid: Introduce xgetbv_low
  include/exec/memop: Add bits describing atomicity
  accel/tcg: Add cpu_in_serial_context
  accel/tcg: Introduce tlb_read_idx
  accel/tcg: Reorg system mode load helpers
  accel/tcg: Reorg system mode store helpers
  accel/tcg: Honor atomicity of loads
  accel/tcg: Honor atomicity of stores
  tcg/tci: Use cpu_{ld,st}_mmu
  tcg: Unify helper_{be,le}_{ld,st}*
  accel/tcg: Implement helper_{ld,st}*_mmu for user-only
  tcg: Add 128-bit guest memory primitives
  meson: Detect atomic128 support with optimization
  tcg/i386: Add have_atomic16
  include/qemu/int128: Add vector type to Int128Alias
  accel/tcg: Use have_atomic16 in ldst_atomicity.c.inc
  tcg/aarch64: Add have_lse, have_lse2
  accel/tcg: Add aarch64 specific support in ldst_atomicity
  tcg: Introduce TCG_OPF_TYPE_MASK
  tcg: Add INDEX_op_qemu_{ld,st}_i128
  tcg/i386: Introduce tcg_out_mov2
  tcg/i386: Introduce tcg_out_testi
  tcg/i386: Use full load/store helpers in user-only mode
  tcg/i386: Replace is64 with type in qemu_ld/st routines
  tcg/i386: Mark Win64 call-saved vector regs as reserved
  tcg/i386: Examine MemOp for atomicity and alignment
  tcg/i386: Support 128-bit load/store with have_atomic16
  tcg/i386: Add vex_v argument to tcg_out_vex_modrm_pool
  tcg/i386: Honor 64-bit atomicity in 32-bit mode

 accel/tcg/internal.h             |    5 +
 accel/tcg/tcg-runtime.h          |    3 +
 include/exec/cpu-defs.h          |    7 +-
 include/exec/cpu_ldst.h          |   26 +-
 include/exec/memop.h             |   36 +
 include/qemu/cpuid.h             |   25 +
 include/qemu/int128.h            |   10 +-
 include/tcg/tcg-ldst.h           |   70 +-
 include/tcg/tcg-opc.h            |    8 +
 include/tcg/tcg.h                |   22 +-
 tcg/aarch64/tcg-target.h         |    5 +
 tcg/arm/tcg-target.h             |    2 +
 tcg/i386/tcg-target.h            |    4 +
 tcg/loongarch64/tcg-target.h     |    2 +
 tcg/mips/tcg-target.h            |    2 +
 tcg/ppc/tcg-target.h             |    2 +
 tcg/riscv/tcg-target.h           |    2 +
 tcg/s390x/tcg-target.h           |    2 +
 tcg/sparc64/tcg-target.h         |    2 +
 tcg/tci/tcg-target.h             |    2 +
 accel/tcg/cpu-exec-common.c      |    3 +
 accel/tcg/cputlb.c               | 1884 +++++++++++++++++++-----------
 accel/tcg/tb-maint.c             |    2 +-
 accel/tcg/user-exec.c            |  478 +++++---
 tcg/optimize.c                   |   15 +-
 tcg/tcg-op.c                     |  246 ++--
 tcg/tcg.c                        |    8 +-
 tcg/tci.c                        |  127 +-
 util/bufferiszero.c              |    3 +-
 accel/tcg/ldst_atomicity.c.inc   | 1170 +++++++++++++++++++
 docs/devel/loads-stores.rst      |   36 +-
 meson.build                      |   52 +-
 tcg/README                       |   10 +-
 tcg/aarch64/tcg-target.c.inc     |   57 +-
 tcg/arm/tcg-target.c.inc         |   45 +-
 tcg/i386/tcg-target.c.inc        | 1228 +++++++++++++------
 tcg/loongarch64/tcg-target.c.inc |   25 +-
 tcg/mips/tcg-target.c.inc        |   40 +-
 tcg/ppc/tcg-target.c.inc         |   30 +-
 tcg/riscv/tcg-target.c.inc       |   51 +-
 tcg/s390x/tcg-target.c.inc       |   38 +-
 tcg/sparc64/tcg-target.c.inc     |   37 +-
 tcg/tci/tcg-target.c.inc         |    3 +-
 43 files changed, 4145 insertions(+), 1680 deletions(-)
 create mode 100644 accel/tcg/ldst_atomicity.c.inc

--
2.34.1