On Wed, 2023-01-11 at 02:47 +0100, Ilya Leoshkevich wrote:
> Add ability to dump /tmp/perf-<pid>.map and jit-<pid>.dump.
> The first one allows the perf tool to map samples to each individual
> translation block. The second one adds the ability to resolve symbol
> names, line numbers and inspect JITed code.
>
> Example of use:
>
>     perf record qemu-x86_64 -perfmap ./a.out
>     perf report
>
> or
>
>     perf record -k 1 qemu-x86_64 -jitdump ./a.out
>     DEBUGINFOD_URLS= perf inject -j -i perf.data -o perf.data.jitted
>     perf report -i perf.data.jitted
>
> Co-developed-by: Vanderson M. do Rosario <vanderson...@gmail.com>
> Co-developed-by: Alex Bennée <alex.ben...@linaro.org>
> Signed-off-by: Ilya Leoshkevich <i...@linux.ibm.com>
> ---
>  accel/tcg/meson.build     |   1 +
>  accel/tcg/perf.c          | 366 ++++++++++++++++++++++++++++++++++++++
>  accel/tcg/perf.h          |  49 +++++
>  accel/tcg/translate-all.c |   8 +
>  docs/devel/tcg.rst        |  23 +++
>  linux-user/exit.c         |   2 +
>  linux-user/main.c         |  15 ++
>  qemu-options.hx           |  20 +++
>  softmmu/vl.c              |  11 ++
>  tcg/tcg.c                 |   2 +
>  10 files changed, 497 insertions(+)
>  create mode 100644 accel/tcg/perf.c
>  create mode 100644 accel/tcg/perf.h
...

> +void perf_report_code(unsigned long long guest_pc, size_t icount,
> +                      const void *start, size_t size)
> +{
> +    struct debuginfo_query *q;
> +    size_t insn;
> +
> +    if (!perfmap && !jitdump) {
> +        return;
> +    }
> +
> +    q = g_try_malloc0_n(icount, sizeof(*q));
> +    if (!q) {
> +        return;
> +    }
> +
> +    debuginfo_lock();
> +
> +    /* Query debuginfo for each guest instruction. */
> +    for (insn = 0; insn < icount; insn++) {
> +        q[insn].address = tcg_ctx->gen_insn_data[insn][0] +
> +                          (TARGET_TB_PCREL ? guest_pc : 0);

Currently this produces plausible-looking, but actually wrong,
addresses. This needs to match restore_state_to_opc(), so at least:

--- a/accel/tcg/perf.c
+++ b/accel/tcg/perf.c
@@ -325,8 +325,10 @@ void perf_report_code(unsigned long long guest_pc, size_t icount,
     /* Query debuginfo for each guest instruction. */
     for (insn = 0; insn < icount; insn++) {
-        q[insn].address = tcg_ctx->gen_insn_data[insn][0] +
-                          (TARGET_TB_PCREL ? guest_pc : 0);
+        q[insn].address = tcg_ctx->gen_insn_data[insn][0];
+        if (TARGET_TB_PCREL) {
+            q[insn].address |= (guest_pc & TARGET_PAGE_MASK);
+        }
         q[insn].flags = DEBUGINFO_SYMBOL | (jitdump ? DEBUGINFO_LINE : 0);
     }
     debuginfo_query(q, icount);

Apparently even with this there are corner cases, e.g. in
x86_restore_state_to_opc() we have:

    if (TARGET_TB_PCREL) {
        env->eip = (env->eip & TARGET_PAGE_MASK) | data[0];
    } else {
        env->eip = data[0] - tb->cs_base;
    }

so if tb->cs_base != 0, the result is still going to be wrong.
I wonder if it would make sense to create a new TCGCPUOps member
purely for resolving a PC from data[0]?
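
To make the difference concrete, here is a minimal standalone sketch of
the three computations. It is not QEMU code: the function names and
PAGE_BITS are made up for illustration, and it assumes that with
TARGET_TB_PCREL data[0] holds only the in-page offset of the
instruction, which is what the x86_restore_state_to_opc() snippet above
suggests.

```c
#include <stdint.h>

/* Assumption for this sketch: 4 KiB guest pages. */
#define PAGE_BITS 12
#define PAGE_MASK (~((UINT64_C(1) << PAGE_BITS) - 1))

/*
 * What the patch currently does in the PCREL case: add the TB start PC
 * to the recorded value. If data0 is an in-page offset, the in-page
 * part of tb_pc is counted twice.
 */
static uint64_t resolve_add(uint64_t tb_pc, uint64_t data0)
{
    return tb_pc + data0;
}

/*
 * What the proposed fix does: take the page from the TB start PC and
 * the in-page offset from data0.
 */
static uint64_t resolve_mask(uint64_t tb_pc, uint64_t data0)
{
    return (tb_pc & PAGE_MASK) | data0;
}

/*
 * The x86 non-PCREL corner case: data0 has cs_base folded in, so the
 * guest-visible eip is only recovered by subtracting it again.
 */
static uint64_t x86_eip_from_data0(uint64_t data0, uint64_t cs_base)
{
    return data0 - cs_base;
}
```

For a TB starting at 0x401234 whose second instruction sits at
0x401236 (data0 == 0x236), resolve_add() gives 0x40146a, while
resolve_mask() gives 0x401236. With cs_base == 0x10000 and
data0 == 0x10236, x86_eip_from_data0() gives 0x236, whereas using data0
directly would be off by cs_base.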