On 4/8/25 11:03, Yannis Bolliger wrote:

Hi Pierrick,

Thanks again for your help.

That's a good starting point.

Although gva2gpa usually only works on either the user virtual address or the
kernel one, perhaps depending on which context the CPU is running in?


I'm not familiar with our monitor command gva2gpa, but I guess it relies
on the current page table set. Kernel and user space may have different
ones (and additional ones depending on privilege level and architecture).
To be able to write to user space, the kernel must still map this address
in its own page table, so there should still be a correct
virtual -> physical mapping from the kernel context.

I'm not sure how those details are handled in the Linux kernel, but I know
that kernel space is mapped to a specific partition of the address space,
and user space to the rest. So the user address space is accessible to the
kernel (but not the opposite, for obvious reasons).
Before the Meltdown exploit, it used to be the same page table (but with
the U bit set to 0 on kernel entries). To mitigate it, there are now
distinct page tables (KPTI) [1][2], but I don't think it's enabled by
default because of the cost related to syscalls (flushing the TLB
completely).

[1] https://www.kernel.org/doc/html/next/x86/pti.html
[2] https://en.wikipedia.org/wiki/Kernel_page-table_isolation

There is no single, absolute answer to "Which physical address matches
this virtual one?" that holds for the whole execution. It will vary based
on the current context.
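
One consequence: from a plugin, a virtual address is best resolved in the
context of the access itself. A rough sketch using qemu_plugin_get_hwaddr()
and qemu_plugin_hwaddr_phys_addr() from the plugin API (error handling
omitted):

   static void mem_cb(unsigned int cpu_index, qemu_plugin_meminfo_t info,
                      uint64_t vaddr, void *udata)
   {
       /* The translation is done by the vCPU performing the access, at
        * the time of the access, so it uses whatever page table is
        * active for that access. */
       struct qemu_plugin_hwaddr *hw = qemu_plugin_get_hwaddr(info, vaddr);
       if (hw && !qemu_plugin_hwaddr_is_io(hw)) {
           uint64_t paddr = qemu_plugin_hwaddr_phys_addr(hw);
           /* ... log vaddr -> paddr ... */
       }
   }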

I read the intro to KPTI and it seems it's mostly about avoiding mapping
the kernel while running in user space, but in kernel space (where I do my
logging) the page table should include both mappings, if I interpreted it
correctly.
I checked with the monitor commands that when I'm in kernel context (EL1),
the user virtual address I just logged is reported as unmapped by gva2gpa.
Of course a user page may be paged out, but I don't think that should
happen this quickly. I see that the code used by gva2gpa
(arm_cpu_get_phys_page_attrs_debug, if I traced it correctly) looks up an
MMU index based on the exception level, so that may have something to do
with it, but I have not had enough time to look into it in more detail.


If you use the QEMU monitor, I'm not surprised you don't see it.
Looking at the gva2gpa implementation, it uses the monitor vCPU, which is
the first vCPU by default. So if the page table associated with this
process at this time does not have the mapping (another process running),
then it won't be mapped.
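
If you want to double-check from the monitor anyway, you can first change
which vCPU the monitor uses ('info cpus' marks the default one with '*'),
for example (the address below is made up):

   (qemu) info cpus
   (qemu) cpu 1
   (qemu) gva2gpa 0xffff0000c0001000

Even then, the result only reflects the page table that vCPU has loaded at
that exact moment.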


Do you see any store at all (even with a different address)?

Yes, I see many stores too, just not to the addresses I expect.
In other examples I see only stores to the user buffer address but no
loads from the corresponding kernel buffer.
I just never see both in an interleaved manner, as should happen based on
the ARM memcpy assembly code used for the buffer copies in the kernel.


If you don't see any store, the most obvious idea I have is to check
whether you use QEMU_PLUGIN_MEM_RW, and not only QEMU_PLUGIN_MEM_R, when
registering the plugin memory callback.
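
For reference, that's the fourth argument of the per-instruction
registration, something like:

   qemu_plugin_register_vcpu_mem_cb(insn, vcpu_mem,
                                    QEMU_PLUGIN_CB_NO_REGS,
                                    QEMU_PLUGIN_MEM_RW, /* not _R only */
                                    NULL);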

From QEMU's perspective, whether those accesses happen from kernel or
user space does not really matter. All it sees are loads/stores, and it
instruments them.


If you can, please share your plugin code, or at least the memory
callback setup to make sure everything is ok.

I added it to the end of this message.

It's possible that what you try to observe is split amongst several
vCPUs. I'm not sure how the Linux kernel deals with those copies, but if
several kernel threads are involved, you won't see all the side effects by
observing a single vCPU.

I would suggest debugging with -smp 1 first.
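
For example (libmemtrace.so is just a placeholder for your plugin name):

   qemu-system-aarch64 -smp 1 -plugin ./libmemtrace.so [rest of your usual command line]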

Although I log my memory trace per CPU, I do an offline merge of it in
the end, so the behavior I observe should include all vCPUs.
But just to be sure, testing with -smp 1 is definitely a good idea.
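
(For context, the idea of the merge is simply to order records across the
per-vCPU files by the global insn_count; a minimal sketch of that idea,
not my actual tool:)

/* Minimal k-way merge of the per-vCPU trace files by global insn_count.
 * Assumes the same LogRecord layout (and padding) as the plugin below. */
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct LogRecord {
    uint64_t insn_count;
    char store;
    uint64_t address;
} LogRecord;

#define MAX_CPUS 64

int main(int argc, char **argv)
{
    FILE *in[MAX_CPUS];
    LogRecord cur[MAX_CPUS];
    bool valid[MAX_CPUS];
    int n = argc - 1;              /* ./merge exec.log.0 exec.log.1 ... */

    if (n > MAX_CPUS) {
        n = MAX_CPUS;
    }
    for (int i = 0; i < n; i++) {
        in[i] = fopen(argv[i + 1], "rb");
        valid[i] = in[i] && fread(&cur[i], sizeof(LogRecord), 1, in[i]) == 1;
    }
    for (;;) {
        int best = -1;             /* input holding the smallest insn_count */
        for (int i = 0; i < n; i++) {
            if (valid[i] && (best < 0 ||
                             cur[i].insn_count < cur[best].insn_count)) {
                best = i;
            }
        }
        if (best < 0) {
            break;                 /* all inputs exhausted */
        }
        printf("%" PRIu64 " %c 0x%016" PRIx64 "\n", cur[best].insn_count,
               cur[best].store ? 'W' : 'R', cur[best].address);
        valid[best] = fread(&cur[best], sizeof(LogRecord), 1, in[best]) == 1;
    }
    return 0;
}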

I'll keep you updated if I find something.

Thanks and kind regards,
Yannis

------------- Code below -------------

#include <glib.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#include <qemu-plugin.h>

#define BUF_SIZE (1 * 1024 * 1024)

typedef struct LogRecord {
   uint64_t insn_count;
   char store;
   uint64_t address;
} LogRecord;

typedef struct CPU {
   char *last_exec;
   char *buf_start;
   FILE *logfile;
   LogRecord record;
} CPU;

QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;

static CPU *cpus;
static int cpu_len;

static uint64_t insn_count;

static inline void buf_dump(CPU *cpu) {
   fwrite(cpu->buf_start, cpu->last_exec - cpu->buf_start, 1, cpu->logfile);
   cpu->last_exec = cpu->buf_start;
}

static inline void buf_write(CPU *cpu, void *value, size_t len) {
   if ((cpu->last_exec + len) > (cpu->buf_start + BUF_SIZE)) {
     buf_dump(cpu);
   }
   memcpy(cpu->last_exec, value, len);
   cpu->last_exec += len;
}

static void vcpu_mem(unsigned int cpu_index, qemu_plugin_meminfo_t info,
                      uint64_t vaddr, void *udata) {
   if (!qemu_plugin_log_is_enabled()) {
     return;
   }
   CPU *c = cpus + cpu_index;
   if (qemu_plugin_mem_is_store(info)) {
     c->record.store = 1;
   } else {
     c->record.store = 0;
   }
   struct qemu_plugin_hwaddr *hwaddr = qemu_plugin_get_hwaddr(info, vaddr);
   if (qemu_plugin_hwaddr_is_io(hwaddr)) {
     return;
   }

Could you try logging those ones as well?
They should not be there, but better to check this.

You can compare to:
qemu-system-aarch64 -plugin ./build/tests/tcg/plugins/libmem.so,print-accesses=on -d plugin

This plugin prints all accesses unconditionally.
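
If you want to keep the same record format while logging them, one option
(just a sketch, using only what is already in your code) is to tag I/O
accesses in a spare bit of 'store' and record the virtual address, since
there is no RAM physical address in that case:

   if (qemu_plugin_hwaddr_is_io(hwaddr)) {
     c->record.store |= 2;         /* bit 1: I/O access (ad-hoc flag) */
     c->record.address = vaddr;    /* no RAM physical address for I/O */
     buf_write(c, &c->record, sizeof(LogRecord));
     return;
   }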

   uint64_t addr = qemu_plugin_hwaddr_phys_addr(hwaddr);
   c->record.address = addr;
   buf_write(c, &c->record, sizeof(LogRecord));
}

static void vcpu_insn_exec(unsigned int cpu_index, void *udata) {
   if (!qemu_plugin_log_is_enabled()) {
     return;
   }
   cpus[cpu_index].record.insn_count =
       __atomic_fetch_add(&insn_count, 1, __ATOMIC_SEQ_CST);
}

static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb) {
   struct qemu_plugin_insn *insn;

   size_t n_insns = qemu_plugin_tb_n_insns(tb);
   for (size_t i = 0; i < n_insns; i++) {
     insn = qemu_plugin_tb_get_insn(tb, i);
     qemu_plugin_register_vcpu_mem_cb(insn, vcpu_mem, QEMU_PLUGIN_CB_NO_REGS,
                                      QEMU_PLUGIN_MEM_RW, NULL);
     qemu_plugin_register_vcpu_insn_exec_cb(insn, vcpu_insn_exec,
                                            QEMU_PLUGIN_CB_NO_REGS, NULL);
   }
}

static void vcpu_init(qemu_plugin_id_t id, unsigned int vcpu_index) {
   CPU *c;
   char filename[32];
   snprintf(filename, sizeof(filename), "exec.log.%u", vcpu_index);
   c = cpus + vcpu_index;
   c->logfile = fopen(filename, "w");
   c->buf_start = malloc(BUF_SIZE); // 1MB
   c->last_exec = c->buf_start;
}

static void plugin_exit(qemu_plugin_id_t id, void *p) {
   int i;
   for (i = 0; i < cpu_len; i++) {
     CPU *c = cpus + i;
     buf_dump(c);
     fclose(c->logfile);
     free(c->buf_start);
   }
}

QEMU_PLUGIN_EXPORT int qemu_plugin_install(qemu_plugin_id_t id,
                                            const qemu_info_t *info, int argc,
                                            char **argv) {
   cpus = malloc(info->system.max_vcpus * sizeof(CPU));
   cpu_len = info->system.max_vcpus;
   /* Register init, translation block and exit callbacks */
   qemu_plugin_register_vcpu_init_cb(id, vcpu_init);
   qemu_plugin_register_vcpu_tb_trans_cb(id, vcpu_tb_trans);
   qemu_plugin_register_atexit_cb(id, plugin_exit, NULL);

   return 0;
}


