Re: [Qemu-devel] [PATCH v8 02/11] accel: collecting TB execution count

2019-08-30 Thread Vanderson Martins do Rosario
Ok. I haven't change it before because I would like to be able to collect
information for already translated TBs when I, for instance, remove the
filter during execution. Having the TBStats already created guarantee this.
To guarantee this in your approach, we would need to tb_flush when changing
the filter. Does it make sense? Is that ok for you?

Vanderson M. Rosario


On Fri, Aug 30, 2019 at 7:21 AM Alex Bennée  wrote:

>
> vandersonmr  writes:
>
> > If a TB has a TBS (TBStatistics) with the TB_EXEC_STATS
> > enabled, then we instrument the start code of this TB
> > to atomically count the number of times it is executed.
> > We count both the number of "normal" executions and atomic
> > executions of a TB.
> >
> > The execution count of the TB is stored in its respective
> > TBS.
> >
> > All TBStatistics are created by default with the flags from
> > default_tbstats_flag.
> >
> > Signed-off-by: Vanderson M. do Rosario 
> > ---
> >  accel/tcg/cpu-exec.c  |  4 
> >  accel/tcg/tb-stats.c  |  5 +
> >  accel/tcg/tcg-runtime.c   |  7 +++
> >  accel/tcg/tcg-runtime.h   |  2 ++
> >  accel/tcg/translate-all.c |  7 +++
> >  accel/tcg/translator.c|  1 +
> >  include/exec/gen-icount.h |  9 +
> >  include/exec/tb-stats.h   | 19 +++
> >  util/log.c|  1 +
> >  9 files changed, 55 insertions(+)
> >
> > diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
> > index 48272c781b..9b2b7bff80 100644
> > --- a/accel/tcg/cpu-exec.c
> > +++ b/accel/tcg/cpu-exec.c
> > @@ -251,6 +251,10 @@ void cpu_exec_step_atomic(CPUState *cpu)
> >
> >  start_exclusive();
> >
> > +if (tb_stats_enabled(tb, TB_EXEC_STATS)) {
> > +tb->tb_stats->executions.atomic++;
> > +}
> > +
> >  /* Since we got here, we know that parallel_cpus must be true.
> */
> >  parallel_cpus = false;
> >  in_exclusive_region = true;
> > diff --git a/accel/tcg/tb-stats.c b/accel/tcg/tb-stats.c
> > index 948b107e68..1db81d83e7 100644
> > --- a/accel/tcg/tb-stats.c
> > +++ b/accel/tcg/tb-stats.c
> > @@ -61,3 +61,8 @@ bool tb_stats_collection_paused(void)
> >  {
> >  return tcg_collect_tb_stats == TB_STATS_PAUSED;
> >  }
> > +
> > +uint32_t get_default_tbstats_flag(void)
> > +{
> > +return default_tbstats_flag;
> > +}
> > diff --git a/accel/tcg/tcg-runtime.c b/accel/tcg/tcg-runtime.c
> > index 8a1e408e31..6f4aafba11 100644
> > --- a/accel/tcg/tcg-runtime.c
> > +++ b/accel/tcg/tcg-runtime.c
> > @@ -167,3 +167,10 @@ void HELPER(exit_atomic)(CPUArchState *env)
> >  {
> >  cpu_loop_exit_atomic(env_cpu(env), GETPC());
> >  }
> > +
> > +void HELPER(inc_exec_freq)(void *ptr)
> > +{
> > +TBStatistics *stats = (TBStatistics *) ptr;
> > +g_assert(stats);
> > +atomic_inc(&stats->executions.normal);
> > +}
> > diff --git a/accel/tcg/tcg-runtime.h b/accel/tcg/tcg-runtime.h
> > index 4fa61b49b4..bf0b75dbe8 100644
> > --- a/accel/tcg/tcg-runtime.h
> > +++ b/accel/tcg/tcg-runtime.h
> > @@ -28,6 +28,8 @@ DEF_HELPER_FLAGS_1(lookup_tb_ptr, TCG_CALL_NO_WG_SE,
> ptr, env)
> >
> >  DEF_HELPER_FLAGS_1(exit_atomic, TCG_CALL_NO_WG, noreturn, env)
> >
> > +DEF_HELPER_FLAGS_1(inc_exec_freq, TCG_CALL_NO_RWG, void, ptr)
> > +
> >  #ifdef CONFIG_SOFTMMU
> >
> >  DEF_HELPER_FLAGS_5(atomic_cmpxchgb, TCG_CALL_NO_WG,
> > diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> > index b7bccacd3b..e72aeba682 100644
> > --- a/accel/tcg/translate-all.c
> > +++ b/accel/tcg/translate-all.c
> > @@ -1785,6 +1785,13 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
> >   */
> >  if (tb_stats_collection_enabled()) {
> >  tb->tb_stats = tb_get_stats(phys_pc, pc, cs_base, flags, tb);
> > +
> > +if (qemu_log_in_addr_range(tb->pc)) {
>
> We can open this out because this test will always pass if no dfilter
> has been set and there is no point creating a tb_stats record if we
> won't fill it in. So
>
>   if (qemu_log_in_addr_range(tb->pc)) {
>  tb->tb_stats = tb_get_stats(phys_pc, pc, cs_base, flags, tb);
>  uint32_t flag = get_default_tbstats_flag();
>
>  if (flag & TB_EXEC_STATS) {
>...
>
> And the additional tests that get added later. This way we'll only
> create and collect stats for what we want.
>
> > +uint32_t flag = get_default_tbstats_flag();
> > +if (flag & TB_EXEC_STATS) {
> > +tb->tb_stats->stats_enabled |= TB_EXEC_STATS;
> > +}
> > +}
> >  } else {
> >  tb->tb_stats = NULL;
> >  }
> > diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
> > index 70c66c538c..ec6bd829a0 100644
> > --- a/accel/tcg/translator.c
> > +++ b/accel/tcg/translator.c
> > @@ -46,6 +46,7 @@ void translator_loop(const TranslatorOps *ops,
> DisasContextBase *db,
> >
> >  ops->init_disas_context(db, cpu);
> >  tcg_debug_assert(db->is_jmp == DISAS_NEXT);  /* no early exit */
> > +gen_tb_exec_count(tb);
> >
> >  /* Reset the temp count s

Re: [Qemu-devel] [PATCH v8 08/11] Adding tb_stats [start|pause|stop|filter] command to hmp.

2019-08-29 Thread Vanderson Martins do Rosario
Ops, this commit shouldn't exist. My mistake. The series still work though.
I will remove this on v9.


Vanderson M. Rosario


On Thu, Aug 29, 2019 at 2:35 PM vandersonmr  wrote:

> This allows controlling the collection of statistics.
> It is also possible to set the level of collection:
> all, jit, or exec.
>
> tb_stats filter allow to only collect statistics for the TB
> in the last_search list.
>
> The goal of this command is to allow the dynamic exploration
> of the TCG behavior and quality. Therefore, for now, a
> corresponding QMP command is not worthwhile.
>
> Acked-by: Dr. David Alan Gilbert 
> Signed-off-by: Vanderson M. do Rosario 
> ---
>  monitor/misc.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/monitor/misc.c b/monitor/misc.c
> index b389ca09a1..218263d29a 100644
> --- a/monitor/misc.c
> +++ b/monitor/misc.c
> @@ -74,6 +74,8 @@
>  #include "sysemu/cpus.h"
>  #include "qemu/cutils.h"
>  #include "tcg/tcg.h"
> +#include "exec/tb-stats.h"
> +#include "qemu-common.h"
>
>  #if defined(TARGET_S390X)
>  #include "hw/s390x/storage-keys.h"
> --
> 2.22.0
>
>


Re: [Qemu-devel] [PATCH v1 1/2] accel/tcg: adding integration with linux perf

2019-08-21 Thread Vanderson Martins do Rosario
On Thu, Aug 15, 2019 at 11:40 AM Stefan Hajnoczi  wrote:

> On Wed, Aug 14, 2019 at 11:37:24PM -0300, vandersonmr wrote:
> > This commit adds support to Linux Perf in order
> > to be able to analyze qemu jitted code and
> > also to able to see the TBs PC in it.
>
> Is there any reference to the file format?  Please include it in a code
> comment, if such a thing exists.
>
> > diff --git a/accel/tcg/perf/jitdump.c b/accel/tcg/perf/jitdump.c
> > new file mode 100644
> > index 00..6f4c0911c2
> > --- /dev/null
> > +++ b/accel/tcg/perf/jitdump.c
> > @@ -0,0 +1,180 @@
>
> License header?
>
> > +#ifdef __linux__
>
> If the entire source file is #ifdef __linux__ then Makefile.objs should
> probably contain obj-$(CONFIG_LINUX) += jitdump.o instead.  Letting the
> build system decide what to compile is cleaner than ifdeffing large
> amounts of code.
>
> > +
> > +#include 
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#include "jitdump.h"
> > +#include "qemu-common.h"
>
> Please follow QEMU's header ordering conventions.  See ./HACKING "1.2.
> Include directives".
>
> > +void start_jitdump_file(void)
> > +{
> > +GString *dumpfile_name = g_string_new(NULL);;
> > +g_string_printf(dumpfile_name, "./jit-%d.dump", getpid());
>
> Simpler:
>
>   gchar *dumpfile_name = g_strdup_printf("./jit-%d.dump", getpid());
>   ...
>   g_free(dumpfile_name);
>
> > +dumpfile = fopen(dumpfile_name->str, "w+");
>
> getpid() and the global dumpfile variable make me wonder what happens
> with multi-threaded TCG?
>

I did some tests and it appears to be working fine with multi-threaded TCG.
tcg_exec_init should execute only once even in multi-threaded TCG, right?
If so, we are going to create only one jitdump file. Also, both in Windows
and Linux/POSIX fwrites is thread-safe, thus we would be safely updating
the jitdump file. Does it make sense?


>
> > +
> > +perf_marker = mmap(NULL, sysconf(_SC_PAGESIZE),
>
> Please mention the point of this mmap in a comment.  My best guess is
> that perf stores the /proc/$PID/maps and this is how it finds the
> jitdump file?
>
> > +  PROT_READ | PROT_EXEC,
> > +  MAP_PRIVATE,
> > +  fileno(dumpfile), 0);
> > +
> > +if (perf_marker == MAP_FAILED) {
> > +printf("Failed to create mmap marker file for perf %d\n",
> fileno(dumpfile));
> > +fclose(dumpfile);
> > +return;
> > +}
> > +
> > +g_string_free(dumpfile_name, TRUE);
> > +
> > +struct jitheader *header = g_new0(struct jitheader, 1);
>
> Why g_new this struct?  It's small and can be declared on the stack.
>
> Please use g_free() with g_malloc/new/etc().  It's not safe to mismatch
> glib and libc memory allocation functions.
>
> > +header->magic = 0x4A695444;
> > +header->version = 1;
> > +header->elf_mach = get_e_machine();
> > +header->total_size = sizeof(struct jitheader);
> > +header->pid = getpid();
> > +header->timestamp = get_timestamp();
> > +
> > +fwrite(header, header->total_size, 1, dumpfile);
> > +
> > +free(header);
> > +fflush(dumpfile);
> > +}
> > +
> > +void append_load_in_jitdump_file(TranslationBlock *tb)
> > +{
> > +GString *func_name = g_string_new(NULL);
> > +g_string_printf(func_name, "TB virt:0x"TARGET_FMT_lx"%c", tb->pc,
> '\0');
>
> The explicit NUL character looks strange to me.  I think the idea is to
> avoid func_name->len + 1?  Adding NUL characters to C strings can be a
> source of bugs, I would stick to convention and do len + 1 instead of
> putting NUL characters into the GString.  This is a question of style
> though.
>
> > +
> > +struct jr_code_load *load_event = g_new0(struct jr_code_load, 1);
>
> No need to allocate load_event on the heap.
>
> > diff --git a/qemu-options.hx b/qemu-options.hx
> > index 9621e934c0..1c26eeeb9c 100644
> > --- a/qemu-options.hx
> > +++ b/qemu-options.hx
> > @@ -4147,6 +4147,18 @@ STEXI
> >  Enable FIPS 140-2 compliance mode.
> >  ETEXI
> >
> > +#ifdef __linux__
> > +DEF("perf", 0, QEMU_OPTION_perf,
> > +"-perfdump jitdump files to help linux perf JIT code
> visualization\n",
> > +QEMU_ARCH_ALL)
> > +#endif
> > +STEXI
> > +@item -perf
> > +@findex -perf
> > +Dumps jitdump files to help linux perf JIT code visualization
>
> Suggestions on expanding the documentation:
>
> Where are the jitdump files dumped?  The current working directory?
>
> Anything to say about the naming scheme for these files?
>
> Can you include an example of how to load them into perf(1)?
>


Re: [Qemu-devel] [PATCH v5 08/10] Adding info [tbs|tb|coverset] commands to HMP. These commands allow the exploration of TBs generated by the TCG. Understand which one hotter, with more guest/host ins

2019-08-21 Thread Vanderson Martins do Rosario
Do you think that "info tb-list 15" as a command to list 15 TBs make more
sense than "info tbs 15"?

It would be "into tb id" to investigate a specific TB and "into tb-list
number" to list the TBs.

Vanderson M. Rosario


On Thu, Aug 15, 2019 at 6:00 AM Dr. David Alan Gilbert 
wrote:

> * vandersonmr (vanderson...@gmail.com) wrote:
> > The goal of this command is to allow the dynamic exploration
> > of TCG behavior and code quality. Therefore, for now, a
> > corresponding QMP command is not worthwhile.
> >
> > Signed-off-by: Vanderson M. do Rosario 
> > ---
> >  accel/tcg/tb-stats.c | 398 ++-
> >  accel/tcg/translate-all.c|   2 +-
> >  disas.c  |  31 ++-
> >  hmp-commands-info.hx |  24 +++
> >  include/exec/tb-stats.h  |  43 +++-
> >  include/qemu/log-for-trace.h |   4 +
> >  include/qemu/log.h   |   2 +
> >  monitor/misc.c   |  74 +++
> >  util/log.c   |  52 -
> >  9 files changed, 609 insertions(+), 21 deletions(-)
> >
> > diff --git a/accel/tcg/tb-stats.c b/accel/tcg/tb-stats.c
> > index f28fd7b434..f5e519bdb7 100644
> > --- a/accel/tcg/tb-stats.c
> > +++ b/accel/tcg/tb-stats.c
> > @@ -11,9 +11,36 @@
> >
> >  /* only accessed in safe work */
> >  static GList *last_search;
> > -
> > +int id = 1; /* display_id increment counter */
> >  uint64_t dev_time;
> >
> > +static TBStatistics *get_tbstats_by_id(int id)
> > +{
> > +GList *iter;
> > +
> > +for (iter = last_search; iter; iter = g_list_next(iter)) {
> > +TBStatistics *tbs = iter->data;
> > +if (tbs && tbs->display_id == id) {
> > +return tbs;
> > +break;
> > +}
> > +}
> > +return NULL;
> > +}
> > +
> > +static TBStatistics *get_tbstats_by_addr(target_ulong pc)
> > +{
> > +GList *iter;
> > +for (iter = last_search; iter; iter = g_list_next(iter)) {
> > +TBStatistics *tbs = iter->data;
> > +if (tbs && tbs->pc == pc) {
> > +return tbs;
> > +break;
> > +}
> > +}
> > +return NULL;
> > +}
> > +
> >  struct jit_profile_info {
> >  uint64_t translations;
> >  uint64_t aborted;
> > @@ -155,6 +182,7 @@ static void clean_tbstats(void)
> >  qht_destroy(&tb_ctx.tb_stats);
> >  }
> >
> > +
> >  void do_hmp_tbstats_safe(CPUState *cpu, run_on_cpu_data icmd)
> >  {
> >  struct TbstatsCommand *cmdinfo = icmd.host_ptr;
> > @@ -242,6 +270,374 @@ void init_tb_stats_htable_if_not(void)
> >  }
> >  }
> >
> > +static void collect_tb_stats(void *p, uint32_t hash, void *userp)
> > +{
> > +last_search = g_list_prepend(last_search, p);
> > +}
> > +
> > +static void dump_tb_targets(TBStatistics *tbs)
> > +{
> > +if (tbs && tbs->tb) {
> > +uintptr_t dst1 = atomic_read(tbs->tb->jmp_dest);
> > +uintptr_t dst2 = atomic_read(tbs->tb->jmp_dest + 1);
> > +TranslationBlock* tb_dst1 = dst1 > 1 ? (TranslationBlock *)
> dst1 : 0;
> > +TranslationBlock* tb_dst2 = dst2 > 1 ? (TranslationBlock *)
> dst2 : 0;
> > +target_ulong pc1 = tb_dst1 ? tb_dst1->pc : 0;
> > +target_ulong pc2 = tb_dst2 ? tb_dst2->pc : 0;
> > +
> > +/* if there is no display id from the last_search, then create
> one */
> > +TBStatistics *tbstats_pc1 = get_tbstats_by_addr(pc1);
> > +TBStatistics *tbstats_pc2 = get_tbstats_by_addr(pc2);
> > +
> > +if (!tbstats_pc1 && tb_dst1 && tb_dst1->tb_stats) {
> > +last_search = g_list_append(last_search, tb_dst1->tb_stats);
> > +tbstats_pc1 = tb_dst1->tb_stats;
> > +}
> > +
> > +if (!tbstats_pc2 && tb_dst2 && tb_dst2->tb_stats) {
> > +last_search = g_list_append(last_search, tb_dst2->tb_stats);
> > +tbstats_pc2 = tb_dst2->tb_stats;
> > +}
> > +
> > +if (tbstats_pc1 && tbstats_pc1->display_id == 0) {
> > +tbstats_pc1->display_id = id++;
> > +}
> > +
> > +if (tbstats_pc2 && tbstats_pc2->display_id == 0) {
> > +tbstats_pc2->display_id = id++;
> > +}
> > +
> > +if (pc1 && !pc2) {
> > +qemu_log("\t| targets: 0x"TARGET_FMT_lx" (id:%d)\n",
> > +pc1, tb_dst1 ? tbstats_pc1->display_id : -1);
> > +} else if (pc1 && pc2) {
> > +qemu_log("\t| targets: 0x"TARGET_FMT_lx" (id:%d), "
> > + "0x"TARGET_FMT_lx" (id:%d)\n",
> > +pc1, tb_dst1 ? tbstats_pc1->display_id : -1,
> > +pc2, tb_dst2 ? tbstats_pc2->display_id : -1);
> > +} else {
> > +qemu_log("\t| targets: no direct target\n");
> > +}
> > +}
> > +}
> > +
> > +static void dump_tb_header(TBStatistics *tbs)
> > +{
> > +unsigned g = stat_per_translation(tbs, code.num_guest_inst);
> > +unsigned ops = stat_per_translation(tbs, code.num_tcg_ops);
> > +unsigned ops_opt = stat_per_translation(tbs, code.num_tcg_ops_opt);

Re: [Qemu-devel] [PATCH 2/3] pc: Improve error message when die-id is omitted

2019-08-15 Thread Vanderson Martins do Rosario
Reviewed-by: Vanderson M. do Rosario 

Vanderson M. Rosario


On Thu, Aug 15, 2019 at 3:43 PM Eduardo Habkost  wrote:

> The error message when die-id is omitted doesn't make sense:
>
>   $ qemu-system-x86_64 -smp 1,sockets=6,maxcpus=6 \
> -device qemu64-x86_64-cpu,socket-id=1,core-id=0,thread-id=0
>   qemu-system-x86_64: -device
> qemu64-x86_64-cpu,socket-id=1,core-id=0,thread-id=0: \
> Invalid CPU die-id: 4294967295 must be in range 0:0
>
> Fix it, so it will now read:
>
>   qemu-system-x86_64: -device
> qemu64-x86_64-cpu,socket-id=1,core-id=0,thread-id=0: \
> CPU die-id is not set
>
> Signed-off-by: Eduardo Habkost 
> ---
>  hw/i386/pc.c | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 24b03bb49c..fb4ac5ca90 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -2410,6 +2410,10 @@ static void pc_cpu_pre_plug(HotplugHandler
> *hotplug_dev,
>  error_setg(errp, "Invalid CPU socket-id: %u must be in range
> 0:%u",
> cpu->socket_id, max_socket);
>  return;
> +}
> +if (cpu->die_id < 0) {
> +error_setg(errp, "CPU die-id is not set");
> +return;
>  } else if (cpu->die_id > pcms->smp_dies - 1) {
>  error_setg(errp, "Invalid CPU die-id: %u must be in range
> 0:%u",
> cpu->die_id, pcms->smp_dies - 1);
> --
> 2.21.0
>
>
>


Re: [Qemu-devel] [PATCH 1/3] pc: Fix error message on die-id validation

2019-08-15 Thread Vanderson Martins do Rosario
Reviewed-by: Vanderson M. do Rosario 

Vanderson M. Rosario


On Thu, Aug 15, 2019 at 3:48 PM Eduardo Habkost  wrote:

> The error message for die-id range validation is incorrect.  Example:
>
>   $ qemu-system-x86_64 -smp 1,sockets=6,maxcpus=6 \
> -device qemu64-x86_64-cpu,socket-id=1,die-id=1,core-id=0,thread-id=0
>   qemu-system-x86_64: -device
> qemu64-x86_64-cpu,socket-id=1,die-id=1,core-id=0,thread-id=0: \
> Invalid CPU die-id: 1 must be in range 0:5
>
> The actual range for die-id in this example is 0:0.
>
> Fix the error message to use smp_dies and print the correct range.
>
> Signed-off-by: Eduardo Habkost 
> ---
>  hw/i386/pc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 549c437050..24b03bb49c 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -2412,7 +2412,7 @@ static void pc_cpu_pre_plug(HotplugHandler
> *hotplug_dev,
>  return;
>  } else if (cpu->die_id > pcms->smp_dies - 1) {
>  error_setg(errp, "Invalid CPU die-id: %u must be in range
> 0:%u",
> -   cpu->die_id, max_socket);
> +   cpu->die_id, pcms->smp_dies - 1);
>  return;
>  }
>  if (cpu->core_id < 0) {
> --
> 2.21.0
>
>
>


Re: [Qemu-devel] [PATCH v3 5/6] monitor: adding info tb and tbs to monitor

2019-07-07 Thread Vanderson Martins do Rosario
Markus,

Thank you for your comments! Based on your questions and suggestions of
writing a more complete explanation in my commits, I decided to start to
describe our whole work on the wiki:
https://wiki.qemu.org/Internships/ProjectIdeas/TCGCodeQuality
I will update and expand it weekly, so I can link and cite it in future
patches to give a better whole vision for anyone interested.

Btw, thank you for your QMP tips, I will discuss with Alex how to address
it in our approach.

[]'s
Vanderson M. Rosario


On Sat, Jul 6, 2019 at 3:09 AM Markus Armbruster  wrote:

> Cc: Marc-André, who has patches that might be useful here.
>
> Alex Bennée  writes:
>
> > Markus Armbruster  writes:
> >
> >> vandersonmr  writes:
> >>
> > 
> >
> > I'll leave Vanderson to address your other comments.
> >
> >>
> >> Debugging commands are kind of borderline.  Debugging is commonly a
> >> human activity, where HMP is just fine.  However, humans create tools to
> >> assist with their activities, and then QMP is useful.  While I wouldn't
> >> encourage HMP-only for the debugging use case, I wouldn't veto it.
> >>
> >> Your (overly terse!) commit message and help texts make me guess the
> >> commands are for gathering statistics.  Statistics can have debugging
> >> uses.  But they often have non-debugging uses as well.  What use cases
> >> can you imagine for these commands?
> >
> > So this is all really aimed at making TCG go faster - but before we can
> > make it go faster we need better tools for seeing where the time is
> > being spent and examining the code that we generate. So I expect the
> > main users of this functionality will be QEMU developers.
> >
> > That said I can see a good rationale for supporting QMP because it is
> > more amenable to automation. However this is early days so I would
> > caution about exposing this stuff too early least we bake in a woolly
> > API.
>
> Development tools should exempt themselves from QMP's interface
> stability promise: prefix the command names with 'x-'.
>
> > The other wrinkle is we do have to take control of the emulator to
> > safely calculate some of the numbers we output. This essentially means
> > the HMP commands are asynchronous - we kick of safe work which waits
> > until all vCPU threads are stopped before we go through the records and
> > add up numbers. This is fine for the HMP because we just output to the
> > monitor FD when we are ready. I assume for QMP commands there is more
> > housekeeping to do? Can QMP commands wait for a response to be
> > calculated by another thread? Are there any existing commands that have
> > to support this sort of pattern?
>
> Let me clarify "synchronous" to avoid confusion.
>
> Both QMP and HMP commands are synchronous protocols in the sense that
> commands are executed one after the other, without overlap.  When a
> client sends multiple commands, it can assume that each one starts only
> after the previous one completed.
>
> Both HMP and QMP commands execute synchronously in the sense that the
> command runs to completion without ever yielding the thread.  Any
> blocking operations put the thread to sleep (but see below).
>
> HMP runs in the main thread.  Putting the main thread to sleep is
> generally undesirable.
>
> QMP used to run in the main thread, too.  Nowadays, the QMP core runs in
> an I/O thread shared by all monitors, and dispatches commands to the
> main thread.  Moving command execution out of the main thread as well
> requires careful review of the command's code for hidden assumptions.
> Major project.
>
> Fine print: OOB commands are a special case, but I doubt you want to
> know more.
>
> Fine print: certain character devices can't support use of an I/O
> thread; QMP runs in the main thread then.  The ones you want to use with
> QMP all support I/O threads.
>
> You wrote "we kick of safe work which waits until all vCPU threads are
> stopped before we go through the records and add up numbers [...] we
> just output to the monitor FD".  Does this mean the HMP command kicks
> off the work, terminates, and some time later something else prints
> results to the monitor?  How much later?
>
> If "later" is actually "soon", for a suitable value of "soon",
> Marc-André's work on "asynchronous" QMP might be pertinent.  I put
> "asynchronous" in scare quotes, because of the confusion it has caused.
> My current understanding (Marc-André, please correct me if wrong): it
> lets QMP commands to block without putting their thread to sleep.  It
> does not make QMP an asynchronous protocol.
>
> If "later" need not be "soon", read on.
>
> In QMP, there are two established ways to do potentially long-running
> work.  Both ways use a command that kicks off the work, then terminates
> without waiting for it to complete.
>
> The first way is traditional: pair the kick off command with a query
> command and optionally an event.
>
> When the work completes, it fires off the event.  The event is broadcast
> to all QMP monitors (we could imple

Re: [Qemu-devel] [PATCH v2 1/4] add and link a statistic struct to TBs

2019-06-25 Thread Vanderson Martins do Rosario
When the tb_flush removes a block and it's recreated, this shouldn't
be creating a new block but using the one that is found by:

lookup_result = g_list_find_custom(tb_ctx.tb_statistics, new_stats,
statistics_cmp);

So the tb_statisticics will be reused and we could just add this
regen counter in the if statement: if (lookup_result)

isn't that correct?

Vanderson M. Rosario


On Tue, Jun 25, 2019 at 1:28 PM Alex Bennée  wrote:

>
> Alex Bennée  writes:
>
> > vandersonmr  writes:
> >
> >> We want to store statistics for each TB even after flushes.
> >> We do not want to modify or grow the TB struct.
> >> So we create a new struct to contain this statistics and
> >> link it to each TB while they are created.
> >>
> >> Signed-off-by: Vanderson M. do Rosario 
> >> ---
> >>  accel/tcg/translate-all.c | 40 +++
> >>  include/exec/exec-all.h   | 21 
> >>  include/exec/tb-context.h |  1 +
> >>  include/qemu/log.h|  1 +
> >>  4 files changed, 63 insertions(+)
> >>
> >> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> >> index 5d1e08b169..f7e99f90e0 100644
> >> --- a/accel/tcg/translate-all.c
> >> +++ b/accel/tcg/translate-all.c
> >> @@ -1118,6 +1118,18 @@ static inline void code_gen_alloc(size_t tb_size)
> >>  }
> >>  }
> >>
> >> +static gint statistics_cmp(gconstpointer p1, gconstpointer p2)
> >> +{
> >> +const TBStatistics *a = (TBStatistics *) p1;
> >> +const TBStatistics *b = (TBStatistics *) p2;
> >> +
> >> +return (a->pc == b->pc &&
> >> +   a->cs_base == b->cs_base &&
> >> +   a->flags == b->flags &&
> >> +   a->page_addr[0] == b->page_addr[0] &&
> >> +   a->page_addr[1] == b->page_addr[1]) ? 0 : 1;
> >> +}
> >> +
> >
> > There are a bunch of white space errors that need fixing up as you
> > probably already know from patchew ;-)
> >
> >>  static bool tb_cmp(const void *ap, const void *bp)
> >>  {
> >>  const TranslationBlock *a = ap;
> >> @@ -1586,6 +1598,29 @@ static inline void tb_page_add(PageDesc *p,
> TranslationBlock *tb,
> >>  #endif
> >>  }
> >>
> >> +static void tb_insert_statistics_structure(TranslationBlock *tb) {
> >> +TBStatistics *new_stats = g_new0(TBStatistics, 1);
> >> +new_stats->pc = tb->pc;
> >> +new_stats->cs_base = tb->cs_base;
> >> +new_stats->flags = tb->flags;
> >> +new_stats->page_addr[0] = tb->page_addr[0];
> >> +new_stats->page_addr[1] = tb->page_addr[1];
> >> +
> >> +GList *lookup_result = g_list_find_custom(tb_ctx.tb_statistics,
> new_stats, statistics_cmp);
> >> +
> >> +if (lookup_result) {
> >> +/* If there is already a TBStatistic for this TB from a
> previous flush
> >> +* then just make the new TB point to the older TBStatistic
> >> +*/
> >> +free(new_stats);
> >> +tb->tb_stats = lookup_result->data;
> >> +} else {
> >> +/* If not, then points to the new tb_statistics and add it
> to the hash */
> >> +tb->tb_stats = new_stats;
> >> +tb_ctx.tb_statistics = g_list_prepend(tb_ctx.tb_statistics,
> >> new_stats);
> >
> > This is going to race as nothing prevents two threads attempting to
> > insert records at the same time.
> >
> >> +}
> >> +}
> >> +
> >>  /* add a new TB and link it to the physical page tables. phys_page2 is
> >>   * (-1) to indicate that only one page contains the TB.
> >>   *
> >> @@ -1636,6 +1671,11 @@ tb_link_page(TranslationBlock *tb,
> tb_page_addr_t phys_pc,
> >>  void *existing_tb = NULL;
> >>  uint32_t h;
> >>
> >> +if (qemu_loglevel_mask(CPU_LOG_HOT_TBS)) {
> >> +/* create and link to its TB a structure to store
> statistics */
> >> +tb_insert_statistics_structure(tb);
> >> +}
> >> +
> >>  /* add in the hash table */
> >>  h = tb_hash_func(phys_pc, tb->pc, tb->flags, tb->cflags &
> CF_HASH_MASK,
> >>   tb->trace_vcpu_dstate);
> >
> > Perhaps we could just have a second qht array which allows for
> > "unlocked" insertion of record. You could take advantage of the fact the
> > hash would be the same:
> >
> > h = tb_hash_func(phys_pc, tb->pc, tb->flags, tb->cflags &
> CF_HASH_MASK,
> >  tb->trace_vcpu_dstate);
> > qht_insert(&tb_ctx.htable, tb, h, &existing_tb);
> >
> > /* remove TB from the page(s) if we couldn't insert it */
> > if (unlikely(existing_tb)) {
> > ...
> > tb = existing_tb;
> > } else if (qemu_loglevel_mask(CPU_LOG_HOT_TBS)) {
> > TBStatistics *new_stats = g_new0(TBStatistics, 1);
> > bool ok;
> > new_stats->pc = tb->pc;
> > new_stats->cs_base = tb->cs_base;
> > new_stats->flags = tb->flags;
> > new_stats->page_addr[0] = tb->page_addr[0];
> > new_stats->page_addr[1] = tb->page_addr[1];
> >

[Qemu-devel] [GSOC] Improving Measurement of Tiny Code Generation Quality

2019-03-27 Thread Vanderson Martins do Rosario
Hi everyone,

I’m sending this email to present myself and a project that I’m planning to
submit to QEMU/GSoC this year. As I see that this project could result in
useful improvements and tools to the community, I would love to have
feedback from the community and perhaps make it even more useful.

-

So, before I start, I would like to present myself:

I'm Vanderson Martins do Rosario, a Ph.D. candidate in Computer Science at
Unicamp in Campinas, Brazil.

Motivations to GSOC:  My first contact with the Free and Open Source
movement was through a neighbor of mine back when I was in high-school
(2009) and since then I have been using free software daily, became a
regular user of GNU/Linux and graduated in Computer Science (2016). My main
contributions to the FOSS community was related only to organizing events
(FLISOL) and founding a Hackerspace in my city helping to spread free
software philosophy. However, I have not yet had the chance to contribute
in code for main FOSS projects (only a few LLVM commits), initially because
I did not have the skills necessary and then because I did not have time.
Today I am a Ph.D. candidate in Computer Science working with JIT compilers
and Dynamic Binary Translation and when I saw QEMU in GSoC, I realize that
now I have the knowledge (it is what I have been working in my Ph.D.) and
also the time (as I have already finished all my classes and obligations).
Thus, I thought that this would be a great opportunity to finally
contribute to the community with code.
Technically, I will not be on vacation during the GSoC because in my Ph.D.
program there is not such a thing, but my advisor agreed that this will be
a great opportunity for me and, as it is related to my thesis theme, he
gave me permission to only work and focus on GSoC  during the necessary
period.

Experience and Contributions: I have experience with Dynamic Compilers and
Compilers in general. I have already done an internship as Compiler
Engineer at MATERA Systems and I have worked in a research project with the
Android JIT compiler (ART) at Samsung Brazil.
I am proficient in C, modern C++, and Python. I also have experience
working with x86, ARM, MIPS, and RISC-V assembly.

Some of my relevant projects:
- OI-DBT: A Dynamic Binary Translator for a research ISA named OpenISA that
uses the LLVM to generate optimized translations to x86 and ARM.
https://github.com/OpenISA/oi-dbt
- Custom RISC-V Tracer: I changed RV-8 RISC-V emulator to dump execution
traces and others execution information in a unique gzip file.
https://github.com/vandersonmr/custom-riscv-tracer
- RAIn: A DBT simulator that uses traces from the Custom RISC-V Tracer to
test new proposed DBT’s configurations. https://github.com/vandersonmr/Rain3
- RISC-V SBT: A RISC-V LLVM-based Static Binary Translator which I have
helped building https://github.com/OpenISA/riscv-sbt

-

Project: Improving Measurement of Tiny Code Generation Quality.
Mentor: Alex Bennée

I/ Introduction

In most applications, the majority of the execution time is spent in a very
small portion of code. Regions of a code which have high-frequency
execution are called hot while all other regions are called cold. As a
direct consequence, emulators also spent most of their execution time
emulating these hot regions and, so, dynamic compilers and translators need
to pay extra attention to them. To guarantee that these hot regions are
compiled/translated generating high-quality code is fundamental to achieve
a final high-performance emulation. Thus, one of the most important steps
in tuning an emulator performance is to identify which are the hot regions
and to measure their translation quality.

QEMU is not different and it offers the ‘-d’ options which can dump the
imputed assembly (guest binary) with ‘-d in_asm’, the generated TCG
Intermediate Representation (IR, TCG ops)  with ‘-d op_opt‘ and the final
host assembly (target binary) with ‘-d out_asm’. So, one could use these
dumps to analyze the quality of the translations, but in practice, these
options generate a huge amount of data and, therefore, doing this analyze
by hand can easily become infeasible. The ‘-dfilter’ can be used to specify
a range of address to filter the dump, so we could use it to only dump
information for the hot Translated Blocks (TB). However, in QEMU,
currently, there is also no easy way to identify hot TBs. We could use the
‘-d exec’ information to reconstruct the emulation frequency and identify
the hot TBs, but, once again, it is not trivial to do so as it generates
lots of raw information.

Having in mind the importance of detecting and inspecting hot regions of
code and giving the lack of tools to do so in QEMU, we propose in this
project the enhance of QEMU log system to add these capabilities. Our plan
is to add three new capabilities to QEMU: (