v1: https://lists.gnu.org/archive/html/qemu-devel/2018-10/msg05682.html
Changes since v1:

- Drop the 2-pass translation. Instead, empty instrumentation is
  injected during translation. If it turns out that this empty
  instrumentation is not needed, it is removed from the output.
  For this, add 2 TCG ops that mark the beginning and end of this
  empty instrumentation. This is cleaner than 2-pass translation,
  although it ends up being quite a bit more code, since we have to
  copy backend TCG ops, which is tedious.

  Performance-wise, it is at worst ~9% slower (~1.3% avg) than 2-pass
  for SPEC06int:
    https://imgur.com/a/bUNox3H
  This is for an "empty" plugin (also added to tests/plugin/empty.c).
  That is, it subscribes to TB translation events and does nothing
  with them (i.e. no execution-time subscriptions). This means the
  empty instrumentation has to be injected and then removed, which is
  the worst-case scenario since all the injection work is wasted.

- Add QTAILQ_REMOVE_SEVERAL, which helps speed up the removal of
  empty instrumentation.

- Drop the "TCG runtime helper" support. We do not need it for empty
  instrumentation; we just replace the function pointer in the copied
  "call" op directly.
  + To detect when an instruction uses helpers, just strncmp the
    helper's name against "plugin_".

- Drop tb->plugin_mask. Instead, read cpu->plugin_mask from
  translator_loop.

- Drop the xxhash patches, since I submitted those as a separate
  series.

- Move a lot of plugin-related code from translator.c to
  plugin-gen.c, leaving only a few function calls in translator.c.

- Add support for only subscribing to an instruction's reads or
  writes. This is implemented via a flag added to the memory
  registration functions of the public API.

- Disentangle callbacks into separate arrays. Instead of just having
  3 arrays (tb, insn and mem callbacks), have 5 arrays (tb, insn,
  virt. mem, hostaddr mem) of 2 arrays each (udata_cb and inline).
  This takes a bit more space per TB, but note that this struct is
  allocated only once in each TCGContext. OTOH, it makes the code
  much simpler.
  The union in struct dyn_cb remains, since for instrumenting memory
  accesses from helpers we still coalesce all types of memory
  callbacks into a single array.

- Add get_page_addr_code_hostp to get the host address of code from
  common code. Use this to export the host address of instructions
  (qemu_plugin_insn_haddr() added to the public API).

- Define TCGMemOp MO_HADDR. If set, the TCG backend copies on a TLB
  hit the corresponding host address to env->hostaddr. This allows us
  to only do this copy when needed.

- Use helpers for reading and setting env->hostaddr, so that we
  minimize the use of #ifdef CONFIG_PLUGIN.

- Only define env->hostaddr if CONFIG_PLUGIN.

- Drop the trailing 'S' in CONFIG_PLUGINS: it is now CONFIG_PLUGIN.

- Drop a few optional features from the RFC:
  + lockstep execution
  + plugin-chan
  + guest hooks
  + virtual clock control

- Define translator_ld* helpers and use them, as suggested by Alex
  and rth. All target ISAs that use translator_loop have been
  converted, except s390x and mips.

- Do not bloat TCGContext if !CONFIG_PLUGIN.

- Define TCGContext.plugin_tb as a pointer, instead of the whole
  struct.

- Test on 32-bit and 64-bit hosts (i386, x86_64, ppc64, aarch64).

- Add cpu_in_exclusive_work_context() and use it in tb_flush(), as
  suggested by Alex.

- configure fixes, including MacOSX builds thanks to Roman's help.

- Remove macros in atomic_template.h, as suggested by Alex. Turns out
  they aren't needed, inlines are enough.

- Fixed a bug by which cpu->plugin_mem was not being cleared if the
  instruction that used helpers was the last one in a TB (e.g. an
  exception). Fix it by adding checks (1) when returning from
  longjmp, and (2) when finishing a TB from tcg, so that we're sure
  to leave cpu->plugin_mem in a good state. (I noticed the bug by
  uninstalling a plugin that had registered memory callbacks, which
  resulted in callbacks to the uninstalled [dlclose'd] plugin.)

- Make sure tcg_ctx->plugin_mem_cb is always NULL after finishing the
  translation of a TB.
  This fixes a bug on uninstall.

- Do not abort when qemu_plugin_uninstall is called more than once.
  This is actually quite common, so just silently return on
  subsequent calls to uninstall.

- Drop the "qemu"/QEMU from some overly long function/macro names.
  This applies to qemu-internal files, of course.

- Keep the plugin's argument array in memory until the plugin is
  uninstalled, so that plugins don't have to strdup their arguments.

- Drop nargs argument from tcg_op_insert_before/after; it's unused.

- Rename plugin-api.h to qemu-plugin.h, which is the same name it
  gets in the final destination (after `make install').

- Add insn_inline function to the API.

- Add some sample plugins to tests/plugin.

You can fetch this series from:
  https://github.com/cota/qemu/tree/plugin-v2

Thanks,

		Emilio
---
 .gitignore                                |    2 +
 Makefile                                  |    8 +-
 Makefile.target                           |   18 +
 accel/tcg/Makefile.objs                   |    1 +
 accel/tcg/atomic_template.h               |  117 +++-
 accel/tcg/cpu-exec.c                      |    2 +
 accel/tcg/cputlb.c                        |   23 +-
 accel/tcg/plugin-gen.c                    | 1085 +++++++++++++++++++++++++++++
 accel/tcg/plugin-helpers.h                |    6 +
 accel/tcg/softmmu_template.h              |   43 +-
 accel/tcg/translate-all.c                 |   15 +-
 accel/tcg/translator.c                    |   16 +
 bsd-user/syscall.c                        |   12 +
 configure                                 |   86 ++-
 cpus-common.c                             |    2 +
 cpus.c                                    |   10 +
 exec.c                                    |    2 +
 include/exec/cpu-defs.h                   |    9 +
 include/exec/cpu_ldst.h                   |    9 +
 include/exec/cpu_ldst_template.h          |   43 +-
 include/exec/cpu_ldst_useronly_template.h |   42 +-
 include/exec/exec-all.h                   |   13 +
 include/exec/helper-gen.h                 |    1 +
 include/exec/helper-proto.h               |    1 +
 include/exec/helper-tcg.h                 |    1 +
 include/exec/plugin-gen.h                 |   75 ++
 include/exec/translator.h                 |   28 +
 include/qemu/plugin.h                     |  253 +++++++
 include/qemu/qemu-plugin.h                |  241 +++++++
 include/qemu/queue.h                      |   10 +
 include/qom/cpu.h                         |   19 +
 linux-user/exit.c                         |    1 +
 linux-user/main.c                         |   18 +
 linux-user/syscall.c                      |    3 +
 plugin.c                                  | 1030 +++++++++++++++++++++++++++
 qemu-options.hx                           |   17 +
 qemu-plugins.symbols                      |   34 +
 qom/cpu.c                                 |    2 +
 target/alpha/translate.c                  |    2 +-
 target/arm/translate-a64.c                |    2 +
 target/arm/translate.c                    |    8 +-
 target/hppa/translate.c                   |    2 +-
 target/i386/translate.c                   |   10 +-
 target/m68k/translate.c                   |    2 +-
 target/openrisc/translate.c               |    2 +-
 target/ppc/translate.c                    |    8 +-
 target/riscv/translate.c                  |    2 +-
 target/sh4/translate.c                    |    2 +-
 target/sparc/translate.c                  |    2 +-
 target/xtensa/translate.c                 |    4 +-
 tcg/README                                |    2 +-
 tcg/i386/tcg-target.inc.c                 |    7 +
 tcg/optimize.c                            |    4 +-
 tcg/tcg-op.c                              |   44 +-
 tcg/tcg-op.h                              |   16 +
 tcg/tcg-opc.h                             |    3 +
 tcg/tcg.c                                 |   27 +-
 tcg/tcg.h                                 |   32 +-
 tests/plugin/Makefile                     |   28 +
 tests/plugin/bb.c                         |   66 ++
 tests/plugin/empty.c                      |   30 +
 tests/plugin/insn.c                       |   63 ++
 tests/plugin/mem.c                        |   93 +++
 trace-events                              |    2 +-
 vl.c                                      |   11 +
 65 files changed, 3653 insertions(+), 119 deletions(-)
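
For readers who want a feel for the "inject empty instrumentation, then
remove it if unused" approach from the changelog, here is a minimal
standalone sketch. It is not code from the series. The op kinds, struct
names and function names are all made up for illustration, and it uses a
plain doubly-linked list rather than QEMU's QTAILQ. It only shows the
general idea: a begin marker and an end marker delimit each injected
region, and an unused region is unlinked as a whole range in one step
(roughly what a helper like QTAILQ_REMOVE_SEVERAL is meant to speed up).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical op kinds; the series adds two marker ops, but these names are invented. */
enum op_kind { OP_GUEST, OP_EMPTY_BEGIN, OP_EMPTY_CALL, OP_EMPTY_END };

struct op {
    enum op_kind kind;
    struct op *prev, *next;
};

struct op_list {
    struct op *head, *tail;
};

/* Append one op at the tail of the list. */
static void op_append(struct op_list *l, struct op *op)
{
    op->next = NULL;
    op->prev = l->tail;
    if (l->tail) {
        l->tail->next = op;
    } else {
        l->head = op;
    }
    l->tail = op;
}

/*
 * Unlink the inclusive range [first, last] with a fixed number of pointer
 * updates, no matter how many ops sit between the two ends.
 */
static void op_remove_range(struct op_list *l, struct op *first, struct op *last)
{
    assert(first && last);
    if (first->prev) {
        first->prev->next = last->next;
    } else {
        l->head = last->next;
    }
    if (last->next) {
        last->next->prev = first->prev;
    } else {
        l->tail = first->prev;
    }
    first->prev = NULL;
    last->next = NULL;
}

/*
 * If the instrumentation turned out not to be needed, drop every
 * BEGIN..END region (markers included). Returns the number of regions removed.
 */
static int strip_empty_instrumentation(struct op_list *l, bool needed)
{
    int removed = 0;
    struct op *op = l->head;

    while (!needed && op) {
        struct op *next = op->next;

        if (op->kind == OP_EMPTY_BEGIN) {
            struct op *end = op;

            while (end && end->kind != OP_EMPTY_END) {
                end = end->next;
            }
            if (!end) {
                break; /* unbalanced markers: leave the rest untouched */
            }
            next = end->next; /* save before the range is unlinked */
            op_remove_range(l, op, end);
            removed++;
        }
        op = next;
    }
    return removed;
}

/* Count the ops currently linked in the list. */
static size_t op_count(const struct op_list *l)
{
    size_t n = 0;

    for (const struct op *op = l->head; op; op = op->next) {
        n++;
    }
    return n;
}
```

With a list such as guest, begin, call, end, guest, begin, call, end,
calling strip_empty_instrumentation(&list, false) leaves only the two
guest ops, while passing true leaves the list untouched so the
placeholder calls can later be pointed at real callbacks.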