Branch: refs/heads/master Home: https://github.com/qemu/qemu Commit: d7f425fdea991f052241c6479acd9feae834063b https://github.com/qemu/qemu/commit/d7f425fdea991f052241c6479acd9feae834063b Author: Richard Henderson <richard.hender...@linaro.org> Date: 2018-10-18 (Thu, 18 Oct 2018)
Changed paths: M accel/tcg/cpu-exec.c M tcg/tcg-op.c Log Message: ----------- tcg: Implement CPU_LOG_TB_NOCHAIN during expansion Rather than test NOCHAIN before linking, do not emit the goto_tb opcode at all. We already do this for goto_ptr. Signed-off-by: Richard Henderson <richard.hender...@linaro.org> Commit: fff42f183ea4c3967405d4c1dce6d97dae4d64c8 https://github.com/qemu/qemu/commit/fff42f183ea4c3967405d4c1dce6d97dae4d64c8 Author: Emilio G. Cota <c...@braap.org> Date: 2018-10-18 (Thu, 18 Oct 2018) Changed paths: M accel/tcg/tcg-all.c M accel/tcg/translate-all.c M qom/cpu.c Log Message: ----------- tcg: access cpu->icount_decr.u16.high with atomics Consistently access u16.high with atomics to avoid undefined behaviour in MTTCG. Note that icount_decr.u16.low is only used in icount mode, so regular accesses to it are OK. Reviewed-by: Richard Henderson <richard.hender...@linaro.org> Signed-off-by: Emilio G. Cota <c...@braap.org> Message-Id: <20181010144853.13005-2-c...@braap.org> Signed-off-by: Richard Henderson <richard.hender...@linaro.org> Commit: c1f543b739086733024e31d74a52d9e41553f316 https://github.com/qemu/qemu/commit/c1f543b739086733024e31d74a52d9e41553f316 Author: Emilio G. Cota <c...@braap.org> Date: 2018-10-18 (Thu, 18 Oct 2018) Changed paths: M tcg/tcg.c Log Message: ----------- tcg: fix use of uninitialized variable under CONFIG_PROFILER We forgot to initialize n in commit 15fa08f845 ("tcg: Dynamically allocate TCGOps", 2017-12-29). Reviewed-by: Philippe Mathieu-Daudé <phi...@redhat.com> Signed-off-by: Emilio G. Cota <c...@braap.org> Message-Id: <20181010144853.13005-3-c...@braap.org> Signed-off-by: Richard Henderson <richard.hender...@linaro.org> Commit: dd1d7da23b0abef87f46d9ab39ba9b0974eaec04 https://github.com/qemu/qemu/commit/dd1d7da23b0abef87f46d9ab39ba9b0974eaec04 Author: Emilio G. 
Cota <c...@braap.org> Date: 2018-10-18 (Thu, 18 Oct 2018) Changed paths: M tcg/tcg.h Log Message: ----------- tcg: plug holes in struct TCGProfile This plugs two 4-byte holes in 64-bit. Signed-off-by: Emilio G. Cota <c...@braap.org> Message-Id: <20181010144853.13005-4-c...@braap.org> Signed-off-by: Richard Henderson <richard.hender...@linaro.org> Commit: 72fd2efbbd52c1a7974000a60a0c2131b1a4aaf2 https://github.com/qemu/qemu/commit/72fd2efbbd52c1a7974000a60a0c2131b1a4aaf2 Author: Emilio G. Cota <c...@braap.org> Date: 2018-10-18 (Thu, 18 Oct 2018) Changed paths: M cpus.c M include/qemu/timer.h M monitor.c M tcg/tcg.c M tcg/tcg.h Log Message: ----------- tcg: distribute tcg_time into TCG contexts When we implemented per-vCPU TCG contexts, we forgot to also distribute the tcg_time counter, which has remained as a global accessed without any serialization, leading to potentially missed counts. Fix it by distributing the field over the TCG contexts, embedding it into TCGProfile with a field called "cpu_exec_time", which is more descriptive than "tcg_time". Add a function to query this value directly, and for completeness, fill in the field in tcg_profile_snapshot, even though its callers do not use it. Signed-off-by: Emilio G. Cota <c...@braap.org> Message-Id: <20181010144853.13005-5-c...@braap.org> Signed-off-by: Richard Henderson <richard.hender...@linaro.org> Commit: 6e11beecfde0d8b7ed13164fbfb1cea30d66f9c9 https://github.com/qemu/qemu/commit/6e11beecfde0d8b7ed13164fbfb1cea30d66f9c9 Author: Emilio G. Cota <c...@braap.org> Date: 2018-10-18 (Thu, 18 Oct 2018) Changed paths: M target/alpha/cpu.c Log Message: ----------- target/alpha: remove tlb_flush from alpha_cpu_initfn As far as I can tell tlb_flush does not need to be called this early. tlb_flush is eventually called after the CPU has been realized. This change paves the way to the introduction of tlb_init, which will be called from cpu_exec_realizefn. 
Reviewed-by: Alex Bennée <alex.ben...@linaro.org> Reviewed-by: Richard Henderson <richard.hender...@linaro.org> Signed-off-by: Emilio G. Cota <c...@braap.org> Message-Id: <20181009174557.16125-2-c...@braap.org> Signed-off-by: Richard Henderson <richard.hender...@linaro.org> Commit: 022d6378c7fda797ef91fe71a4e13a7a651298b8 https://github.com/qemu/qemu/commit/022d6378c7fda797ef91fe71a4e13a7a651298b8 Author: Emilio G. Cota <c...@braap.org> Date: 2018-10-18 (Thu, 18 Oct 2018) Changed paths: M target/unicore32/cpu.c Log Message: ----------- target/unicore32: remove tlb_flush from uc32_init_fn As far as I can tell tlb_flush does not need to be called this early. tlb_flush is eventually called after the CPU has been realized. This change paves the way to the introduction of tlb_init, which will be called from cpu_exec_realizefn. Cc: Guan Xuetao <g...@mprc.pku.edu.cn> Reviewed-by: Alex Bennée <alex.ben...@linaro.org> Reviewed-by: Richard Henderson <richard.hender...@linaro.org> Signed-off-by: Emilio G. Cota <c...@braap.org> Message-Id: <20181009174557.16125-3-c...@braap.org> Signed-off-by: Richard Henderson <richard.hender...@linaro.org> Commit: 5005e2537d090bee87aca3b924dcd17920fd146a https://github.com/qemu/qemu/commit/5005e2537d090bee87aca3b924dcd17920fd146a Author: Emilio G. Cota <c...@braap.org> Date: 2018-10-18 (Thu, 18 Oct 2018) Changed paths: M accel/tcg/cputlb.c M exec.c M include/exec/exec-all.h Log Message: ----------- exec: introduce tlb_init Paves the way for the addition of a per-TLB lock. Reviewed-by: Alex Bennée <alex.ben...@linaro.org> Reviewed-by: Richard Henderson <richard.hender...@linaro.org> Signed-off-by: Emilio G. Cota <c...@braap.org> Message-Id: <20181009174557.16125-4-c...@braap.org> Signed-off-by: Richard Henderson <richard.hender...@linaro.org> Commit: ea9025cb49027d9b3c4f48c56602351b9cf65ff1 https://github.com/qemu/qemu/commit/ea9025cb49027d9b3c4f48c56602351b9cf65ff1 Author: Emilio G. 
Cota <c...@braap.org> Date: 2018-10-18 (Thu, 18 Oct 2018) Changed paths: M accel/tcg/cputlb.c Log Message: ----------- cputlb: fix assert_cpu_is_self macro Reviewed-by: Richard Henderson <richard.hender...@linaro.org> Reviewed-by: Alex Bennée <alex.ben...@linaro.org> Signed-off-by: Emilio G. Cota <c...@braap.org> Message-Id: <20181009174557.16125-5-c...@braap.org> Signed-off-by: Richard Henderson <richard.hender...@linaro.org> Commit: 71aec3541d87d611f6efad71d45b310e515372cc https://github.com/qemu/qemu/commit/71aec3541d87d611f6efad71d45b310e515372cc Author: Emilio G. Cota <c...@braap.org> Date: 2018-10-18 (Thu, 18 Oct 2018) Changed paths: M accel/tcg/cputlb.c M include/exec/cpu-defs.h Log Message: ----------- cputlb: serialize tlb updates with env->tlb_lock Currently we rely on atomic operations for cross-CPU invalidations. There are two cases that these atomics miss: cross-CPU invalidations can race with either (1) vCPU threads flushing their TLB, which happens via memset, or (2) vCPUs calling tlb_reset_dirty on their TLB, which updates .addr_write with a regular store. This results in undefined behaviour, since we're mixing regular and atomic ops on concurrent accesses. Fix it by using tlb_lock, a per-vCPU lock. All updaters of tlb_table and the corresponding victim cache now hold the lock. The readers that do not hold tlb_lock must use atomic reads when reading .addr_write, since this field can be updated by other threads; the conversion to atomic reads is done in the next patch. Note that an alternative fix would be to expand the use of atomic ops. However, in the case of TLB flushes this would have a huge performance impact, since (1) TLB flushes can happen very frequently and (2) we currently use a full memory barrier to flush each TLB entry, and a TLB has many entries. Instead, acquiring the lock is barely slower than a full memory barrier since it is uncontended, and with a single lock acquisition we can flush the entire TLB. 
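For illustration only, the locking scheme described in "cputlb: serialize tlb updates with env->tlb_lock" above can be sketched in a few lines of C11. This is a minimal sketch under stated assumptions, not QEMU's cputlb code: every name (SketchTLB, sketch_tlb_flush, and so on), the table size, and the page size are hypothetical. A plain atomic flag stands in for env->tlb_lock, and the sketch uses relaxed atomic stores under the lock where the real code may use regular stores or memset.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* All names below are hypothetical; this is not QEMU's cputlb code. */
#define SKETCH_PAGE_BITS 12
#define SKETCH_TLB_SIZE  256
#define SKETCH_INVALID   UINT64_MAX   /* not page-aligned, so it never matches */

typedef struct {
    atomic_bool locked;                           /* stand-in for env->tlb_lock */
    _Atomic uint64_t addr_write[SKETCH_TLB_SIZE]; /* may be read without the lock */
} SketchTLB;

static uint64_t sketch_page(uint64_t addr)
{
    return addr & ~((UINT64_C(1) << SKETCH_PAGE_BITS) - 1);
}

static unsigned sketch_index(uint64_t addr)
{
    return (unsigned)((addr >> SKETCH_PAGE_BITS) & (SKETCH_TLB_SIZE - 1));
}

static void sketch_lock(SketchTLB *tlb)
{
    /* Spin until we are the thread that flips the flag from false to true. */
    while (atomic_exchange_explicit(&tlb->locked, true, memory_order_acquire)) {
    }
}

static void sketch_unlock(SketchTLB *tlb)
{
    atomic_store_explicit(&tlb->locked, false, memory_order_release);
}

static void sketch_tlb_init(SketchTLB *tlb)
{
    atomic_init(&tlb->locked, false);
    for (unsigned i = 0; i < SKETCH_TLB_SIZE; i++) {
        atomic_init(&tlb->addr_write[i], SKETCH_INVALID);
    }
}

/* Updater: every write to the table happens with the lock held. */
static void sketch_tlb_set_page(SketchTLB *tlb, uint64_t addr)
{
    assert(sketch_page(addr) != SKETCH_INVALID);
    sketch_lock(tlb);
    atomic_store_explicit(&tlb->addr_write[sketch_index(addr)],
                          sketch_page(addr), memory_order_relaxed);
    sketch_unlock(tlb);
}

/* Flush: one lock acquisition covers the whole table, no per-entry barrier. */
static void sketch_tlb_flush(SketchTLB *tlb)
{
    sketch_lock(tlb);
    for (unsigned i = 0; i < SKETCH_TLB_SIZE; i++) {
        atomic_store_explicit(&tlb->addr_write[i], SKETCH_INVALID,
                              memory_order_relaxed);
    }
    sketch_unlock(tlb);
}

/* Reader: does not take the lock, so it must use an atomic read. */
static bool sketch_tlb_hit(SketchTLB *tlb, uint64_t addr)
{
    uint64_t entry = atomic_load_explicit(&tlb->addr_write[sketch_index(addr)],
                                          memory_order_relaxed);
    return entry == sketch_page(addr);
}
```

The point of the design, as the log message puts it, is that a flush pays for a single uncontended lock acquisition instead of a full memory barrier per entry, while readers stay lock-free and only need an atomic load.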
Tested-by: Alex Bennée <alex.ben...@linaro.org> Reviewed-by: Alex Bennée <alex.ben...@linaro.org> Signed-off-by: Emilio G. Cota <c...@braap.org> Message-Id: <20181009174557.16125-6-c...@braap.org> Signed-off-by: Richard Henderson <richard.hender...@linaro.org> Commit: 383beda9cf32f795616c3b93f7d6154d70372d4b https://github.com/qemu/qemu/commit/383beda9cf32f795616c3b93f7d6154d70372d4b Author: Richard Henderson <richard.hender...@linaro.org> Date: 2018-10-18 (Thu, 18 Oct 2018) Changed paths: M accel/tcg/cputlb.c M accel/tcg/softmmu_template.h M include/exec/cpu_ldst.h M include/exec/cpu_ldst_template.h Log Message: ----------- tcg: Add tlb_index and tlb_entry helpers Isolate the computation of an index from an address into a helper before we change that function. Reviewed-by: Alex Bennée <alex.ben...@linaro.org> Signed-off-by: Richard Henderson <richard.hender...@linaro.org> [ cota: convert tlb_vaddr_to_host; use atomic_read on addr_write ] Signed-off-by: Emilio G. Cota <c...@braap.org> Message-Id: <20181009175129.17888-2-c...@braap.org> Commit: e6cd4bb59b8154fa00da611200beef7eb4e8ec56 https://github.com/qemu/qemu/commit/e6cd4bb59b8154fa00da611200beef7eb4e8ec56 Author: Richard Henderson <richard.hender...@linaro.org> Date: 2018-10-18 (Thu, 18 Oct 2018) Changed paths: M accel/tcg/atomic_template.h M accel/tcg/cputlb.c M accel/tcg/user-exec.c M configure A include/qemu/atomic128.h M include/qemu/compiler.h M tcg/tcg.h Log Message: ----------- tcg: Split CONFIG_ATOMIC128 GCC7+ will no longer advertise support for 16-byte __atomic operations if only cmpxchg is supported, as for x86_64. Fortunately, x86_64 still has support for __sync_compare_and_swap_16 and we can make use of that. AArch64 does not have, nor ever has had such support, so open-code it. Reviewed-by: Emilio G. 
Cota <c...@braap.org> Signed-off-by: Richard Henderson <richard.hender...@linaro.org> Commit: e1ed709fbe687b1c92b31014b0ecfcd059252ec1 https://github.com/qemu/qemu/commit/e1ed709fbe687b1c92b31014b0ecfcd059252ec1 Author: Richard Henderson <richard.hender...@linaro.org> Date: 2018-10-18 (Thu, 18 Oct 2018) Changed paths: M target/i386/mem_helper.c Log Message: ----------- target/i386: Convert to HAVE_CMPXCHG128 Reviewed-by: Emilio G. Cota <c...@braap.org> Reviewed-by: Philippe Mathieu-Daudé <phi...@redhat.com> Signed-off-by: Richard Henderson <richard.hender...@linaro.org> Commit: 1ec182c3337993a7d8b3983a1ac4f608c1d0fd64 https://github.com/qemu/qemu/commit/1ec182c3337993a7d8b3983a1ac4f608c1d0fd64 Author: Richard Henderson <richard.hender...@linaro.org> Date: 2018-10-18 (Thu, 18 Oct 2018) Changed paths: M target/arm/helper-a64.c Log Message: ----------- target/arm: Convert to HAVE_CMPXCHG128 Reviewed-by: Emilio G. Cota <c...@braap.org> Signed-off-by: Richard Henderson <richard.hender...@linaro.org> Commit: 62823083b8a2da8e126bb82b7b70f68eaa27b338 https://github.com/qemu/qemu/commit/62823083b8a2da8e126bb82b7b70f68eaa27b338 Author: Richard Henderson <richard.hender...@linaro.org> Date: 2018-10-18 (Thu, 18 Oct 2018) Changed paths: M target/arm/helper-a64.c M target/arm/translate-a64.c Log Message: ----------- target/arm: Check HAVE_CMPXCHG128 at translate time Reviewed-by: Emilio G. Cota <c...@braap.org> Reviewed-by: Philippe Mathieu-Daudé <phi...@redhat.com> Signed-off-by: Richard Henderson <richard.hender...@linaro.org> Commit: f34ec0f6d79f1b9f52e41cd89bf7f0e7c853b124 https://github.com/qemu/qemu/commit/f34ec0f6d79f1b9f52e41cd89bf7f0e7c853b124 Author: Richard Henderson <richard.hender...@linaro.org> Date: 2018-10-18 (Thu, 18 Oct 2018) Changed paths: M target/ppc/helper.h M target/ppc/mem_helper.c M target/ppc/translate.c Log Message: ----------- target/ppc: Convert to HAVE_CMPXCHG128 and HAVE_ATOMIC128 Reviewed-by: Emilio G. 
Cota <c...@braap.org> Signed-off-by: Richard Henderson <richard.hender...@linaro.org> Commit: 5e95612e2e3bf6647972b9596f3e72532fd65192 https://github.com/qemu/qemu/commit/5e95612e2e3bf6647972b9596f3e72532fd65192 Author: Richard Henderson <richard.hender...@linaro.org> Date: 2018-10-18 (Thu, 18 Oct 2018) Changed paths: M target/s390x/mem_helper.c Log Message: ----------- target/s390x: Convert to HAVE_CMPXCHG128 and HAVE_ATOMIC128 Reviewed-by: David Hildenbrand <da...@redhat.com> Signed-off-by: Richard Henderson <richard.hender...@linaro.org> Commit: 0c9fa16805b450f53f602b9a8043e1d5fbe997ff https://github.com/qemu/qemu/commit/0c9fa16805b450f53f602b9a8043e1d5fbe997ff Author: Richard Henderson <richard.hender...@linaro.org> Date: 2018-10-18 (Thu, 18 Oct 2018) Changed paths: M target/s390x/mem_helper.c Log Message: ----------- target/s390x: Split do_cdsg, do_lpq, do_stpq Reviewed-by: David Hildenbrand <da...@redhat.com> Signed-off-by: Richard Henderson <richard.hender...@linaro.org> Commit: 72d8ad67ba6d2fb71b84c884bd9f7e7817e2817d https://github.com/qemu/qemu/commit/72d8ad67ba6d2fb71b84c884bd9f7e7817e2817d Author: Richard Henderson <richard.hender...@linaro.org> Date: 2018-10-18 (Thu, 18 Oct 2018) Changed paths: M target/s390x/translate.c Log Message: ----------- target/s390x: Skip wout, cout helpers if op helper does not return When op raises an exception, it may not have initialized the output temps that would be written back by wout or cout. 
Reviewed-by: David Hildenbrand <da...@redhat.com> Signed-off-by: Richard Henderson <richard.hender...@linaro.org> Commit: 830bf10c82b49c7e8e2e3e6ff0cc6e440cdcf8d4 https://github.com/qemu/qemu/commit/830bf10c82b49c7e8e2e3e6ff0cc6e440cdcf8d4 Author: Richard Henderson <richard.hender...@linaro.org> Date: 2018-10-18 (Thu, 18 Oct 2018) Changed paths: M target/s390x/mem_helper.c M target/s390x/translate.c Log Message: ----------- target/s390x: Check HAVE_ATOMIC128 and HAVE_CMPXCHG128 at translate Reviewed-by: David Hildenbrand <da...@redhat.com> Signed-off-by: Richard Henderson <richard.hender...@linaro.org> Commit: 403f290c0603f35f2d09c982bf5549b6d0803ec1 https://github.com/qemu/qemu/commit/403f290c0603f35f2d09c982bf5549b6d0803ec1 Author: Emilio G. Cota <c...@braap.org> Date: 2018-10-18 (Thu, 18 Oct 2018) Changed paths: M accel/tcg/cputlb.c M accel/tcg/softmmu_template.h M include/exec/cpu_ldst.h M include/exec/cpu_ldst_template.h Log Message: ----------- cputlb: read CPUTLBEntry.addr_write atomically Updates can come from other threads, so readers that do not take tlb_lock must use atomic_read to avoid undefined behaviour (UB). This completes the conversion to tlb_lock. This conversion results on average in no performance loss, as the following experiments (run on an Intel i7-6700K CPU @ 4.00GHz) show. 1. 
aarch64 bootup+shutdown test:

- Before:
 Performance counter stats for 'taskset -c 0 ../img/aarch64/die.sh' (10 runs):

       7487.087786      task-clock (msec)   #    0.998 CPUs utilized      ( +- 0.12% )
    31,574,905,303      cycles              #    4.217 GHz                ( +- 0.12% )
    57,097,908,812      instructions        #    1.81  insns per cycle    ( +- 0.08% )
    10,255,415,367      branches            # 1369.747 M/sec              ( +- 0.08% )
       173,278,962      branch-misses       #    1.69% of all branches    ( +- 0.18% )

       7.504481349 seconds time elapsed                                   ( +- 0.14% )

- After:
 Performance counter stats for 'taskset -c 0 ../img/aarch64/die.sh' (10 runs):

       7462.441328      task-clock (msec)   #    0.998 CPUs utilized      ( +- 0.07% )
    31,478,476,520      cycles              #    4.218 GHz                ( +- 0.07% )
    57,017,330,084      instructions        #    1.81  insns per cycle    ( +- 0.05% )
    10,251,929,667      branches            # 1373.804 M/sec              ( +- 0.05% )
       173,023,787      branch-misses       #    1.69% of all branches    ( +- 0.11% )

       7.474970463 seconds time elapsed                                   ( +- 0.07% )

2. SPEC06int:

[Plot: SPEC06int (test set). Y axis: speedup over master. Series: tlb-lock-v2 (mutex) and tlb-lock-v3 (spinlock). X axis: individual SPEC06int benchmarks and their geomean.]

png: https://imgur.com/a/BHzpPTW

Notes:
- tlb-lock-v2 corresponds to an implementation with a mutex.
- tlb-lock-v3 corresponds to the current implementation, i.e. a spinlock and a single lock acquisition in tlb_set_page_with_attrs.

Signed-off-by: Emilio G. Cota <c...@braap.org> Message-Id: <20181016153840.25877-1-c...@braap.org> Signed-off-by: Richard Henderson <richard.hender...@linaro.org> Commit: 31e213e30617b986a6e8ab4d9a0646eb4e6a4227 https://github.com/qemu/qemu/commit/31e213e30617b986a6e8ab4d9a0646eb4e6a4227 Author: Peter Maydell <peter.mayd...@linaro.org> Date: 2018-10-19 (Fri, 19 Oct 2018) Changed paths: M accel/tcg/atomic_template.h M accel/tcg/cpu-exec.c M accel/tcg/cputlb.c M accel/tcg/softmmu_template.h M accel/tcg/tcg-all.c M accel/tcg/translate-all.c M accel/tcg/user-exec.c M configure M cpus.c M exec.c M include/exec/cpu-defs.h M include/exec/cpu_ldst.h M include/exec/cpu_ldst_template.h M include/exec/exec-all.h A include/qemu/atomic128.h M include/qemu/compiler.h M include/qemu/timer.h M monitor.c M qom/cpu.c M target/alpha/cpu.c M target/arm/helper-a64.c M target/arm/translate-a64.c M target/i386/mem_helper.c M target/ppc/helper.h M target/ppc/mem_helper.c M target/ppc/translate.c M target/s390x/mem_helper.c M target/s390x/translate.c M target/unicore32/cpu.c M tcg/tcg-op.c M
tcg/tcg.c M tcg/tcg.h Log Message: ----------- Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20181018' into staging Queued tcg patches. # gpg: Signature made Fri 19 Oct 2018 07:03:20 BST # gpg: using RSA key 64DF38E8AF7E215F # gpg: Good signature from "Richard Henderson <richard.hender...@linaro.org>" # Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A 05C0 64DF 38E8 AF7E 215F * remotes/rth/tags/pull-tcg-20181018: (21 commits) cputlb: read CPUTLBEntry.addr_write atomically target/s390x: Check HAVE_ATOMIC128 and HAVE_CMPXCHG128 at translate target/s390x: Skip wout, cout helpers if op helper does not return target/s390x: Split do_cdsg, do_lpq, do_stpq target/s390x: Convert to HAVE_CMPXCHG128 and HAVE_ATOMIC128 target/ppc: Convert to HAVE_CMPXCHG128 and HAVE_ATOMIC128 target/arm: Check HAVE_CMPXCHG128 at translate time target/arm: Convert to HAVE_CMPXCHG128 target/i386: Convert to HAVE_CMPXCHG128 tcg: Split CONFIG_ATOMIC128 tcg: Add tlb_index and tlb_entry helpers cputlb: serialize tlb updates with env->tlb_lock cputlb: fix assert_cpu_is_self macro exec: introduce tlb_init target/unicore32: remove tlb_flush from uc32_init_fn target/alpha: remove tlb_flush from alpha_cpu_initfn tcg: distribute tcg_time into TCG contexts tcg: plug holes in struct TCGProfile tcg: fix use of uninitialized variable under CONFIG_PROFILER tcg: access cpu->icount_decr.u16.high with atomics ... Signed-off-by: Peter Maydell <peter.mayd...@linaro.org> Compare: https://github.com/qemu/qemu/compare/784c2e4f232a...31e213e30617