The bug describes a race whereby cpu_exec_step_atomic can acquire a TB which is invalidated by a tb_flush before we execute it. This doesn't affect the other cpu_exec modes as a tb_flush by it's nature can only occur on a quiescent system. The race was described as:
B2. tcg_cpu_exec => cpu_exec => tb_find => tb_gen_code B3. tcg_tb_alloc obtains a new TB C3. TB obtained with tb_lookup__cpu_state or tb_gen_code (same TB as B2) A3. start_exclusive critical section entered A4. do_tb_flush is called, TB memory freed/re-allocated A5. end_exclusive exits critical section B2. tcg_cpu_exec => cpu_exec => tb_find => tb_gen_code B3. tcg_tb_alloc reallocates TB from B2 C4. start_exclusive critical section entered C5. cpu_tb_exec executes the TB code that was free in A4 The simplest fix is to widen the exclusive period to include the TB lookup. As a result we can drop the complication of checking we are in the exclusive region before we end it. Signed-off-by: Alex Bennée <alex.ben...@linaro.org> Cc: Yifan <m...@yifanlu.com> Cc: Bug 1863025 <1863...@bugs.launchpad.net> --- accel/tcg/cpu-exec.c | 21 +++++++++++---------- 1 file changed, 11 insertions(+), 10 deletions(-) diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c index 2560c90eec7..d95c4848a47 100644 --- a/accel/tcg/cpu-exec.c +++ b/accel/tcg/cpu-exec.c @@ -240,6 +240,8 @@ void cpu_exec_step_atomic(CPUState *cpu) uint32_t cf_mask = cflags & CF_HASH_MASK; if (sigsetjmp(cpu->jmp_env, 0) == 0) { + start_exclusive(); + tb = tb_lookup__cpu_state(cpu, &pc, &cs_base, &flags, cf_mask); if (tb == NULL) { mmap_lock(); @@ -247,8 +249,6 @@ void cpu_exec_step_atomic(CPUState *cpu) mmap_unlock(); } - start_exclusive(); - /* Since we got here, we know that parallel_cpus must be true. */ parallel_cpus = false; cc->cpu_exec_enter(cpu); @@ -271,14 +271,15 @@ void cpu_exec_step_atomic(CPUState *cpu) qemu_plugin_disable_mem_helpers(cpu); } - if (cpu_in_exclusive_context(cpu)) { - /* We might longjump out of either the codegen or the - * execution, so must make sure we only end the exclusive - * region if we started it. - */ - parallel_cpus = true; - end_exclusive(); - } + + /* + * As we start the exclusive region before codegen we must still + * be in the region if we longjump out of either the codegen or + * the execution. + */ + g_assert(cpu_in_exclusive_context(cpu)); + parallel_cpus = true; + end_exclusive(); } struct tb_desc { -- 2.20.1