This is the fifth iteration of the MTTCG patches and I'm finally dropping the RFC tag from the series. Previous versions suffered from hangs, which have been fixed by the additional cputlb fixes. A lot of races were identified and fixed using ThreadSanitizer (although a chunk of those fixes will come in a separate series).

I'm hoping to get this into 2.8, although if the maintainers aren't quite ready to take the full tree I'd appreciate cherry picking a good chunk of the clean-up patches to reduce the delta if we need to hold over some of the work to the 2.9 cycle.

This series enables MTTCG for ARM guests on x86_64 hosts by default.

Prerequisites
=============

Most of the prerequisites have already been merged. The final one is a solution for atomic instruction emulation. This series has been based on v7 of Emilio & Richard's cmpxchg based atomics series. Once that is merged this series should apply cleanly. You can find the base of my tree at:

  https://github.com/stsquad/qemu/tree/mttcg/cmpxchg-atomics-v7-prepull

Changes
=======

Since the last posting there have been a number of updates to the original patches:

  - usual update of r-b tags
  - fixed a bunch of races identified by ThreadSanitizer
  - updated the single-threaded kick timer as per review comments
  - a bunch of BQL asserts (IRQ processing)
  - use of parallel_cpus/tb_flush to ensure correct codegen
  - cputlb updates for atomic setting of dirty flags
  - cputlb fixes where work was not being deferred to async safe work

The series also introduces a new patch to add run_on_cpu_data as a type for the *_run_on_cpu functions. The main aim is to ensure a target pointer (i.e. target_ulong) can always be passed in one argument even when emulating 64 bit targets on a 32 bit build.

Finally there are some ARM specific updates:

  - cpu_reset is deferred to async work
  - ARM specific messing with the TLB removed
  - BQL taken for ARM_CP_IO register access
  - some helpers take BQL

The last two patches expand on the approach we take for device emulation through MMIO. Any case where the emulation may touch global state (device emulation, cross-vCPU) needs to take the BQL. Simple helper functions which only update their own cpu->env are not affected. Testing on additional hardware models would be useful although pretty much any MMIO device is already protected by the BQL.
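
As a rough illustration of the locking rule above (this is not code from the series), the distinction between the two kinds of helper could be sketched as follows. Every name here (big_lock, VCPUEnv, irq_raise, helper_set_reg, helper_write_io_reg) is a hypothetical stand-in, and a plain pthread mutex plays the role of the BQL:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the BQL, not QEMU's real implementation. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
/* Per-thread flag so code can assert that the caller holds the lock. */
static _Thread_local bool big_lock_held;

static void big_lock_acquire(void)
{
    pthread_mutex_lock(&big_lock);
    big_lock_held = true;
}

static void big_lock_release(void)
{
    big_lock_held = false;
    pthread_mutex_unlock(&big_lock);
}

/* Per-vCPU state: only ever written by the thread running that vCPU. */
typedef struct {
    uint64_t regs[4];
} VCPUEnv;

/* Global state shared by all vCPUs (think: an interrupt controller). */
static uint32_t irq_pending_mask;

/* Device-model code asserts the lock rather than taking it itself. */
static void irq_raise(unsigned line)
{
    assert(big_lock_held);
    assert(line < 32);
    irq_pending_mask |= 1u << line;
}

/* Only updates its own env, so no global lock is needed. */
static void helper_set_reg(VCPUEnv *env, unsigned idx, uint64_t val)
{
    env->regs[idx] = val;
}

/* May touch global state, so it takes the lock around the access. */
static void helper_write_io_reg(VCPUEnv *env, unsigned line)
{
    (void)env;
    big_lock_acquire();
    irq_raise(line);
    big_lock_release();
}
```

Putting the assert in the shared code, rather than taking the lock there, means a caller that forgets the lock fails loudly instead of racing quietly.
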
The ARM_CP_IO registers were a little special as they updated the GIC which needed locking for serialisation.

As usual the patches themselves have a revision summary under the ---

A copy of the tree can be found at:

  https://github.com/stsquad/qemu/tree/mttcg/base-patches-v5

Testing
=======

I've tested that this boots ARMv7/ARMv8 Debian with a repeating compile test load (which previously would trigger cputlb races) as well as both ARMv7 and v8 kvm-unit-tests with both:

  -accel tcg,thread=single

and:

  -accel tcg,thread=multi

Performance
===========

The following was measured with my boot+build benchmark:

  $QEMU_BIN -machine type=virt -display none -m 4096 \
    -cpu $CPU -serial telnet:127.0.0.1:4444 -monitor stdio \
    -netdev user,id=unet,hostfwd=tcp::2222-:22 \
    -device virtio-net-device,netdev=unet \
    -drive file=${JESSIE}.qcow2,id=myblock,index=0,if=none,snapshot=on \
    -device virtio-blk-device,drive=myblock -append "console=ttyAMA0 root=/dev/vda1 systemd.unit=benchmark-build.service" \
    -kernel ${KERNEL} -name debug-threads=on \
    -machine gic-version=3 -accel tcg,thread=multi -smp

@ My Desktop (i7, 4+4)

| smp | armv7, single | armv7, multi |    x | armv8, single | armv8, multi |    x |
|-----+---------------+--------------+------+---------------+--------------+------|
|   1 |       224.035 |      224.010 | 1.00 |       397.285 |      399.456 | 0.99 |
|   2 |       231.043 |      125.923 | 1.83 |       415.307 |      225.760 | 1.84 |
|   3 |       235.548 |       94.837 | 2.48 |       422.565 |      170.647 | 2.48 |
|   4 |       239.403 |       81.145 | 2.95 |       432.743 |      146.869 | 2.95 |
|   5 |       243.107 |       81.045 | 3.00 |       435.414 |      146.367 | 2.97 |
|   6 |       249.164 |       78.742 | 3.16 |       445.176 |      143.415 | 3.10 |

Alex

Alex Bennée (28):
  cpus: make all_vcpus_paused() return bool
  translate_all: DEBUG_FLUSH -> DEBUG_TB_FLUSH
  translate-all: add DEBUG_LOCKING asserts
  cpu-exec: include cpu_index in CPU_LOG_EXEC messages
  docs: new design document multi-thread-tcg.txt (DRAFTING)
  linux-user/elfload: ensure mmap_lock() held while setting up
  translate-all: Add assert_(memory|tb)_lock annotations
  target-arm/arm-powerctl: wake up sleeping CPUs
  tcg: move tcg_exec_all and helpers above thread fn
  tcg: cpus rm tcg_exec_all()
  tcg: add kick timer for single-threaded vCPU emulation
  tcg: rename tcg_current_cpu to tcg_current_rr_cpu
  cpus: re-factor out handle_icount_deadline
  tcg: remove global exit_request
  tcg: move locking for tb_invalidate_phys_page_range up
  tcg: enable tb_lock() for SoftMMU
  tcg: enable thread-per-vCPU
  atomic: introduce cmpxchg_bool
  *_run_on_cpu: introduce run_on_cpu_data type
  cputlb: add assert_cpu_is_self checks
  cputlb: tweak qemu_ram_addr_from_host_nofail reporting
  cputlb: atomically update tlb fields used by tlb_reset_dirty
  cputlb: make tlb_flush_by_mmuidx safe for MTTCG
  target-arm/powerctl: defer cpu reset work to CPU context
  target-arm/cpu: don't reset TLB structures, use cputlb to do it
  target-arm: ensure BQL taken for ARM_CP_IO register access
  target-arm: helpers which may affect global state need the BQL
  tcg: enable MTTCG by default for ARM on x86 hosts

Jan Kiszka (1):
  tcg: drop global lock during TCG code execution

KONRAD Frederic (3):
  tcg: protect translation related stuff with tb_lock.
  tcg: add options for enabling MTTCG
  cputlb: introduce tlb_flush_* async work.

Paolo Bonzini (1):
  tcg: comment on which functions have to be called with tb_lock held

 bsd-user/mmap.c                 |   5 +
 configure                       |  12 +
 cpu-exec-common.c               |   3 -
 cpu-exec.c                      |  48 ++--
 cpus-common.c                   |   9 +-
 cpus.c                          | 544 ++++++++++++++++++++++++++--------------
 cputlb.c                        | 400 +++++++++++++++++++++++------
 default-configs/arm-softmmu.mak |   2 +
 docs/multi-thread-tcg.txt       | 310 +++++++++++++++++++++++
 exec.c                          |  28 +++
 hw/core/irq.c                   |   1 +
 hw/i386/kvm/apic.c              |  14 +-
 hw/i386/kvmvapic.c              |  17 +-
 hw/intc/arm_gicv3_cpuif.c       |   3 +
 hw/ppc/ppce500_spin.c           |   6 +-
 hw/ppc/spapr.c                  |   7 +-
 hw/ppc/spapr_hcall.c            |  12 +-
 include/exec/cputlb.h           |   2 -
 include/exec/exec-all.h         |   7 +-
 include/qemu/atomic.h           |   9 +
 include/qom/cpu.h               |  51 +++-
 include/sysemu/cpus.h           |   2 +
 kvm-all.c                       |  20 +-
 linux-user/elfload.c            |   4 +
 linux-user/mmap.c               |   5 +
 memory.c                        |   2 +
 qemu-options.hx                 |  20 ++
 qom/cpu.c                       |  10 +
 target-arm/Makefile.objs        |   2 +-
 target-arm/arm-powerctl.c       | 142 ++++++-----
 target-arm/cpu.c                |   6 +
 target-arm/helper.c             |   6 +
 target-arm/op_helper.c          |  43 +++-
 target-i386/helper.c            |   8 +-
 target-i386/kvm.c               |   4 +-
 target-i386/smm_helper.c        |   7 +
 target-s390x/cpu.c              |   4 +-
 target-s390x/cpu.h              |   4 +-
 target-s390x/misc_helper.c      |   9 +-
 tcg/tcg.h                       |   2 +
 translate-all.c                 | 192 +++++++++++---
 translate-common.c              |  21 +-
 vl.c                            |  49 +++-
 43 files changed, 1590 insertions(+), 462 deletions(-)
 create mode 100644 docs/multi-thread-tcg.txt

--
2.10.1