On Fri, Aug 8, 2025 at 9:23 PM Pierrick Bouvier <pierrick.bouv...@linaro.org> wrote: > > On 8/8/25 11:13 AM, Manos Pitsidianakis wrote: > > On Fri, Aug 8, 2025 at 9:03 PM Pierrick Bouvier > > <pierrick.bouv...@linaro.org> wrote: > >> > >> On 8/8/25 2:11 AM, Manos Pitsidianakis wrote: > >>> On Fri, 08 Aug 2025 05:06, Pierrick Bouvier <pierrick.bouv...@linaro.org> > >>> wrote: > >>>> We implement tracing, following uftrace format. > >>>> Trace is flushed every 32 MB, so file operations don't impact > >>>> performance at runtime. > >>>> > >>>> A different trace is generated per cpu, and we ensure they have a unique > >>>> name, based on vcpu_index, while keeping room for privilege level coming > >>>> in next commit. > >>> > >>> Suggestion (not a request): put some kind of documentation about the > >>> format this patch implements, maybe a commit sha & URL to a header file > >>> from upstream uftrace. > >>> > >>>> > >>>> Signed-off-by: Pierrick Bouvier <pierrick.bouv...@linaro.org> > >>>> --- > >>>> contrib/plugins/uftrace.c | 149 +++++++++++++++++++++++++++++++++++++- > >>>> 1 file changed, 148 insertions(+), 1 deletion(-) > >>>> > >>>> diff --git a/contrib/plugins/uftrace.c b/contrib/plugins/uftrace.c > >>>> index d51faceb344..402a242433e 100644 > >>>> --- a/contrib/plugins/uftrace.c > >>>> +++ b/contrib/plugins/uftrace.c > >>>> @@ -12,6 +12,13 @@ > >>>> #include <qemu-plugin.h> > >>>> #include <glib.h> > >>>> #include <stdio.h> > >>>> +#include <sys/stat.h> > >>>> +#include <sys/time.h> > >>>> +#include <time.h> > >>>> +#include <unistd.h> > >>>> + > >>>> +#define MiB (INT64_C(1) << 20) > >>>> +#define NANOSECONDS_PER_SECOND 1000000000LL > >>>> > >>>> QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION; > >>>> > >>>> @@ -24,6 +31,13 @@ typedef struct { > >>>> uint64_t frame_pointer; > >>>> } CallstackEntry; > >>>> > >>>> +typedef struct { > >>>> + GArray *t; > >>>> + GString *path; > >>>> + GString *name; > >>>> + uint32_t id; > >>>> +} Trace; > >>>> + > >>>> typedef struct Cpu Cpu; > >>>> > >>>> typedef struct { > >>>> @@ -34,6 +48,7 @@ typedef struct { > >>>> } CpuOps; > >>>> > >>>> typedef struct Cpu { > >>>> + Trace *trace; > >>>> Callstack *cs; > >>>> GByteArray *buf; > >>>> CpuOps ops; > >>>> @@ -44,9 +59,41 @@ typedef struct { > >>>> struct qemu_plugin_register *reg_fp; > >>>> } Aarch64Cpu; > >>>> > >>>> +typedef struct { > >>>> + uint64_t timestamp; > >>>> + uint64_t data; > >>>> +} UftraceEntry; > >>>> + > >>>> +typedef enum { > >>>> + UFTRACE_ENTRY, > >>>> + UFTRACE_EXIT, > >>>> + UFTRACE_LOST, > >>>> + UFTRACE_EVENT > >>>> +} UftraceRecordType; > >>>> + > >>>> static struct qemu_plugin_scoreboard *score; > >>>> static CpuOps arch_ops; > >>>> > >>>> +static uint64_t gettime_ns(void) > >>>> +{ > >>>> +#ifdef _WIN32 > >>>> + /* > >>>> + * On Windows, timespec_get is available only with UCRT, but not > >>>> with > >>>> + * MinGW64 environment. Simplify by using only gettimeofday on this > >>>> + * platform. This may result in a precision loss. > >>>> + */ > >>>> + struct timeval tv; > >>>> + gettimeofday(&tv, NULL); > >>>> + uint64_t now_ns = tv.tv_sec * NANOSECONDS_PER_SECOND + tv.tv_usec * > >>>> 1000; > >>>> +#else > >>>> + /* We need nanosecond precision for short lived functions. */ > >>>> + struct timespec ts; > >>>> + timespec_get(&ts, TIME_UTC); > >>>> + uint64_t now_ns = ts.tv_sec * NANOSECONDS_PER_SECOND + ts.tv_nsec; > >>>> +#endif > >>>> + return now_ns; > >>>> +} > >>>> + > >>>> static Callstack *callstack_new(void) > >>>> { > >>>> Callstack *cs = g_new0(Callstack, 1); > >>>> @@ -112,6 +159,85 @@ static CallstackEntry callstack_pop(Callstack *cs) > >>>> return e; > >>>> } > >>>> > >>>> +static Trace *trace_new(uint32_t id, GString *name) > >>>> +{ > >>>> + Trace *t = g_new0(Trace, 1); > >>>> + t->t = g_array_new(false, false, sizeof(UftraceEntry)); > >>>> + t->path = g_string_new(NULL); > >>>> + g_string_append_printf(t->path, "./uftrace.data/%"PRIu32".dat", id); > >>>> + t->name = g_string_new(name->str); > >>>> + t->id = id; > >>>> + return t; > >>>> +} > >>>> + > >>>> +static void trace_free(Trace *t) > >>>> +{ > >>>> + g_assert(t->t->len == 0); > >>>> + g_array_free(t->t, true); > >>>> + t->t = NULL; > >>>> + g_string_free(t->path, true); > >>>> + t->path = NULL; > >>>> + g_string_free(t->name, true); > >>>> + t->name = NULL; > >>>> + g_free(t); > >>>> +} > >>>> + > >>>> +static void trace_flush(Trace *t, bool append) > >>>> +{ > >>>> + int create_dir = g_mkdir_with_parents("./uftrace.data", > >>>> + S_IRWXU | S_IRWXG | S_IRWXO); > >>>> + g_assert(create_dir == 0); > >>>> + FILE *dat = fopen(t->path->str, append ? "a" : "w"); > >>>> + g_assert(dat); > >>>> + GArray *data = t->t; > >>>> + if (data->len) { > >>>> + fwrite(data->data, data->len, sizeof(UftraceEntry), dat); > >>> > >>> fwrite might not write all bytes, how about using the > >>> g_file_set_contents() wrapper? > >>> > >> > >> If I see correctly, g_file_set_contents does not allow to append data, > >> which is what we need to do here (that's the point of flushing every > >> 32MB). > > > > Ah you're right, my bad. It might be because it renames a temporary > > file to the destination filename in order to make the operation > > atomic. > > > >> I can add an assert on fwrite return to make sure we wrote > >> everything. > > > > Or a while loop with a bytes_written counter. > > > > fwrite guarantees all data will be written (to the opposite or raw write > calls), including if syscall is interrupted, doing a proper write loop > until all data is written.
Right, as the manpage says: > RETURN VALUE > On success, fread() and fwrite() return the number of items read or > written. This number equals the number of bytes transferred only when size > is 1. If an error occurs, or the > end of the file is reached, the return value is a short item count (or > zero). I thought it meant that "on error" included EAGAIN/EINTR, that's why I suggested a loop. But the manpage doesn't list error values or even mention whether it sets errno at all so I was mistaken. > You can check glibc source if in doubt. > Thus, the only situation where it might fail is if disk space is full. > In this case, I think that an assert (similar to a panic!) is good > enough because we can't really do anything to solve the problem. > > > Or keep it all in memory and write it on exit? Your call > > That was the first version, but it does not scale on long executions, > where traces can reach several gb. > > By the way, adding the assert caught a bug, size and num_elems were > reversed, so definitely thanks for pointing this out.