On Tue, Jul 04, 2017 at 10:30:13AM +0200, Lluís Vilanova wrote: > Optimizes tracing of events with the 'tcg' and 'vcpu' properties (e.g., memory > accesses), making it feasible to statically enable them by default on all QEMU > builds. Last patch shows that overheads are completely eliminated in various > types of benchmarks for linux-user and softmmu (overheads where up to 2x > before). > > > Right now, events with the 'tcg' property always generate TCG code to trace > that > event at guest code execution time, where the event's dynamic state is > checked. > > This series adds a performance optimization where TCG code for events with the > 'tcg' and 'vcpu' properties is not generated if the event is dynamically > disabled. This optimization raises two issues: > > * An event can be dynamically disabled/enabled after the corresponding TCG > code > has been generated (i.e., a new TB with the corresponding code should be > used). > > * Each vCPU can have a different dynamic state for the same event (i.e., > tracing > the memory accesses of only one process pinned to a vCPU). > > To handle both issues, this series integrates the dynamic tracing event state > into the TB hashing function, so that vCPUs tracing different events will use > separate TBs. Note that only events with the 'vcpu' property are used for > hashing (as stored in the bitmap of #CPUState::trace_dstate). > > This makes dynamic event state changes on vCPUs very efficient, since they can > use TBs produced by other vCPUs while on the same event state combination (or > produced by the same vCPU, earlier). > > Discarded alternatives: > > * Emitting TCG code to check if an event needs tracing, where we should still > move the tracing call code to either a cold path (making tracing performance > worse), or leave it inlined (making non-tracing performance worse). > > * Eliding TCG code only when *zero* vCPUs are tracing an event, since enabling > it on a single vCPU will impact the performance of all other vCPUs that are > not tracing that event. > > Signed-off-by: Lluís Vilanova <address@hidden> > --- > > Changes in v11 > ============== > > * Rebase on 931892e8a6. > * Drop patch for flushing per-CPU virtual TB cache. > * Fix typo in comments. > * Change documentation format in TranslationBlock. > > > Changes in v10 > ============== > > * Replace lingering trace_get_vcpu_event_count() with > CPU_TRACE_DSTATE_MAX_EVENTS [Emilio G. Cota]. > * Add performance results for dbt-bench [Emilio G. Cota]. > > > Changes in v9 > ============= > > * Rebase on 931892e8a6. > * Undo renaming of tb->trace_vcpu_dstate to the shorter tb->trace_ds. > * Add measurements to commit enabling all guest events. > > > Changes in v8 > ============= > > [Emilio G. Cota] > > * Ported to current dev tree. > > * Allocate cpu->trace_dstate statically. This > * allows us to drop the event_count inline patch. > * simplifies and improves the performance of accessing cpu->trace_dstate: > we just need to dereference, instead of going through bitmap_copy and > an intermediate unsigned long. > > * If we try to register more CPU events than the max we support (there's a > constant for it), drop the event and tell the user with error_report. But > really this is a bug, since we control what CPU events are traceable. Should > we abort() as well? > > * Added rth's R-b tag > > * Addressed my own comments: > * rename tb->trace_vcpu_dstate to the shorter tb->trace_ds > * use uint32_t for tb->trace_ds instead of a typedef > * add BUILD_BUG_ON check to make sure tb->trace_ds is big enough > * fix xxhash > > * Do not add trace_dstate to tb_htable_lookup, since we can grab it from > cpu->trace_dstate. > > This patchset applies cleanly on top of rth's tcg-next (a01792e1e). > > > Changes in v7 > ============= > > * Fix delayed dstate changes (now uses async_run_on_cpu() as suggested by > Paolo > Bonzini). > > * Note to Richard: patch 4 has been adapted to the new patch 3 async callback, > but is essentially the same. > > > Changes in v6 > ============= > > * Check hashing size error with QEMU_BUILD_BUG_ON [Richard Henderson]. > > > Changes in v5 > ============= > > * Move define into "qemu-common.h" to allow compilation of tests. > > > Changes in v4 > ============= > > * Incorporate trace_dstate into the TB hashing function instead of using > multiple physical TB caches [suggested by Richard Henderson]. > > > Changes in v3 > ============= > > * Rebase on 0737f32daf. > * Do not use reserved symbol prefixes ("__") [Stefan Hajnoczi]. > * Refactor trace_get_vcpu_event_count() to be inlinable. > * Optimize cpu_tb_cache_set_requested() (hottest path). > > > Changes in v2 > ============= > > * Fix bitmap copy in cpu_tb_cache_set_apply(). > * Split generated code re-alignment into a separate patch [Daniel P. > Berrange]. > > Lluís Vilanova (6): > trace: Allocate cpu->trace_dstate in place > trace: [tcg] Delay changes to dynamic state when translating > exec: [tcg] Use different TBs according to the vCPU's dynamic tracing > state > trace: [tcg] Do not generate TCG code to trace dynamically-disabled > events > trace: [tcg,trivial] Re-align generated code > trace: [trivial] Statically enable all guest events > > > accel/tcg/cpu-exec.c | 8 ++++++-- > accel/tcg/translate-all.c | 11 +++++++++-- > include/exec/exec-all.h | 3 +++ > include/exec/tb-hash-xx.h | 7 +++++-- > include/exec/tb-hash.h | 5 +++-- > include/qom/cpu.h | 12 ++++++------ > qom/cpu.c | 8 -------- > scripts/tracetool/__init__.py | 3 ++- > scripts/tracetool/backend/dtrace.py | 4 ++-- > scripts/tracetool/backend/ftrace.py | 20 ++++++++++---------- > scripts/tracetool/backend/log.py | 19 ++++++++++--------- > scripts/tracetool/backend/simple.py | 4 ++-- > scripts/tracetool/backend/syslog.py | 6 +++--- > scripts/tracetool/backend/ust.py | 4 ++-- > scripts/tracetool/format/h.py | 26 +++++++++++++++++++------- > scripts/tracetool/format/tcg_h.py | 21 +++++++++++++++++---- > scripts/tracetool/format/tcg_helper_c.py | 5 +++-- > tcg/tcg-runtime.c | 3 ++- > tests/qht-bench.c | 2 +- > trace-events | 6 +++--- > trace/control-target.c | 21 ++++++++++++++++++--- > trace/control.c | 9 ++++++++- > trace/control.h | 3 +++ > 23 files changed, 137 insertions(+), 73 deletions(-) > > > To: qemu-devel@nongnu.org > Cc: Stefan Hajnoczi <stefa...@redhat.com> > Cc: Eduardo Habkost <ehabk...@redhat.com> > Cc: Eric Blake <ebl...@redhat.com> > Cc: Emilio G. Cota <c...@braap.org> >
Thanks, applied to my tracing tree: https://github.com/stefanha/qemu/commits/tracing Stefan
signature.asc
Description: PGP signature