Jørgen Hansen <jorgen.han...@wdc.com> writes:

> Hi,
>
> While doing some testing using numactl-based interleaving of application
> memory across regular memory and CXL-based memory using QEMU with tcg, I
> ran into an issue similar to what we saw a while back - link to old issue:
>
> https://lore.kernel.org/qemu-devel/CAFEAcA_a_AyQ=epz3_+cheat8crsk9mou894wbnw_fywamk...@mail.gmail.com/#t
>
> When running:
>
>   numactl --interleave 0,1 ./cachebench …
>
> I hit the following:
>
>   numactl --interleave 0,1 ./cachebench --json_test_config \
>       ../test_configs/hit_ratio/graph_cache_follower_assocs/config.json
>   qemu: fatal: cpu_io_recompile: could not find TB for pc=0x7fffc3926dd4
OK, so this will be because we (by design) don't cache TBs for MMIO
regions. But as you say:

> Looking at the tb being executed, it looks like it is a
> single-instruction tb, so with my _very_ limited understanding of tcg,
> it shouldn't be necessary to do the IO recompile:
>
> (gdb) up 13
> #13 0x0000555555ca13ac in cpu_loop_exec_tb (tb_exit=0x7ffff49ff6d8,
>     last_tb=<synthetic pointer>, pc=<optimized out>,
>     tb=0x7fffc3926cc0 <code_gen_buffer+464678035>, cpu=0x5555578c19c0)
>     at ../accel/tcg/cpu-exec.c:904
> 904         tb = cpu_tb_exec(cpu, tb, tb_exit);
> (gdb) print *tb
> $1 = {pc = 0, cs_base = 0, flags = 415285939, cflags = 4278321152,
>     size = 7, icount = 1,
>     tc = {ptr = 0x7fffc3926d80 <code_gen_buffer+464678227>, size = 176},
>     page_next = {0, 0},
>     page_addr = {18446744073709551615, 18446744073709551615},
>     jmp_lock = {value = 0}, jmp_reset_offset = {65535, 65535},
>     jmp_insn_offset = {65535, 65535}, jmp_target_addr = {0, 0},
>     jmp_list_head = 140736474540928, jmp_list_next = {0, 0},
>     jmp_dest = {0, 0}}

With an icount of 1 we can, by definition, do IO. This is handled in the
translator_loop:

    /*
     * Disassemble one instruction.  The translate_insn hook should
     * update db->pc_next and db->is_jmp to indicate what should be
     * done next -- either exiting this loop or locate the start of
     * the next instruction.
     */
    if (db->num_insns == db->max_insns) {
        /* Accept I/O on the last instruction.  */
        set_can_do_io(db, true);
    }

and we see later on:

    tb->icount = db->num_insns;

So I guess we must have gone into the translator loop expecting to
translate more? (Side note: having an int8_t type for the saved bool
value seems weird.)

Could you confirm by doing something like:

    if (tb->icount == 1 && db->max_insns > 1) {
        fprintf(stderr, "ended up doing one insn (%d, %d)\n",
                db->max_insns, db->saved_can_do_io);
    }

after we set tb->icount? (There is a slightly fuller placement sketch at
the end of this mail.)

> If the application is run entirely out of MMIO memory, things work fine
> (the previous patches related to this are in Jonathan's branch), so one
> thought is that it is related to having the code on a mix of regular and
> CXL memory. Since we previously had issues with code crossing page
> boundaries where only the second page is MMIO, I tried out the following
> change to the fix introduced for that issue, thinking that reverting to
> the slow path in the middle of the translation might not correctly
> update can_do_io:
>
> diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
> index 38c34009a5..db6ea360e0 100644
> --- a/accel/tcg/translator.c
> +++ b/accel/tcg/translator.c
> @@ -258,6 +258,7 @@ static void *translator_access(CPUArchState *env, DisasContextBase *db,
>          if (unlikely(new_page1 == -1)) {
>              tb_unlock_pages(tb);
>              tb_set_page_addr0(tb, -1);
> +            set_can_do_io(db, true);
>              return NULL;
>          }
>
> With that change, things work fine. Not saying that this is a valid fix
> for the issue, but just trying to provide as much information as
> possible :)

I can see why this works; my worry is whether we address all the early
exit cases here.

> Thanks,
> Jorgen

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro
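P.S. To make the suggestion above concrete, this is roughly where the
check would sit in translator_loop (a sketch against my reading of
accel/tcg/translator.c, untested, and purely a debugging aid rather than
a fix; the surrounding context may differ on your tree):

    /* accel/tcg/translator.c, in translator_loop(), after the main
     * per-instruction loop has finished: */
    tb->icount = db->num_insns;

    /*
     * Debug: flag TBs that ended up containing a single instruction
     * even though we entered the loop expecting to translate more,
     * and report the can_do_io state we saved on the way in.
     */
    if (tb->icount == 1 && db->max_insns > 1) {
        fprintf(stderr, "ended up doing one insn (%d, %d)\n",
                db->max_insns, db->saved_can_do_io);
    }

If that fires for the TB that later aborts in cpu_io_recompile, it would
confirm we entered the translator expecting several instructions and
bailed out early on the MMIO page crossing, without ever reaching the
"last instruction" point where set_can_do_io(db, true) is emitted.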