
I was extending perf counters to sample the stack of a KVM guest from
a module[0].

The current KVM profiling architecture keeps a CPU-local variable,
current_vcpu, that is set to the vcpu about to run before vm_enter
and cleared after vm_exit.

Then, when an NMI occurs, the handler can check current_vcpu and, if
the NMI arrived while the VM was running, get statistics of the guest
from it.
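
To make the protocol concrete, here is a toy userspace model of it. It
is only a sketch: one thread stands in for one CPU, and struct vcpu,
before_vm_enter, after_vm_exit and nmi_hit_guest are hypothetical names
that do not exist in KVM.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct kvm_vcpu. */
struct vcpu {
    unsigned long guest_ip;
};

/* Stand-in for the static per-CPU variable; one thread models one CPU. */
static _Thread_local struct vcpu *current_vcpu;

/* Called just before entering the guest. */
static void before_vm_enter(struct vcpu *v)
{
    current_vcpu = v;
}

/* Called right after leaving the guest. */
static void after_vm_exit(void)
{
    current_vcpu = NULL;
}

/*
 * What an NMI handler can learn from the variable alone:
 * returns 1 and fills *ip if the "CPU" was running a guest, else 0.
 */
static int nmi_hit_guest(unsigned long *ip)
{
    struct vcpu *v = current_vcpu;

    if (v == NULL)
        return 0;
    *ip = v->guest_ip;
    return 1;
}
```

An external module is in the position of nmi_hit_guest without being
able to name current_vcpu, which is the problem described next.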

What I needed was to sample the guest stack every time an NMI occurs.
That requires two things.

1. A way to add code that runs when a PMI occurs.
- Possible with the public register_nmi_handler API.
2. A way to access the CPU-local variable current_vcpu.
- Problematic, since current_vcpu is static.

What I eventually did: since KVM exposes a "setter" for current_vcpu,
I scanned the machine code of the setter and looked for a direct move
from a register to gs (where CPU-local variables are stored) plus an
offset. I then take this offset and use it to access current_vcpu.

What can fail?

1. The KVM profiling implementation is changed completely.
2. The compiler uses different instructions to set CPU-local
variables (e.g., accessing the CPU-local variable by "mov $offset,
%r2; mov $value, (%r2)").

I think both cases are unlikely. This mechanism was written in 2010,
and had a cosmetic change in 2011 (an access function for CPU-local
variables). I expect it will be a few years before this approach
could fail.

Obviously, the correct approach is to fix perf counters in the kernel
to support stack sampling (not trivial). But sometimes you need a
solution now, without patching all your host kernels.

I would be grateful for feedback on this approach, and especially on
possible pitfalls I haven't considered.

The gist of the code is[1]:

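    for (;;) {
        u8 *p;
        /* find the next gs segment-override prefix */
        c = memchr(c, GS_SEG_OVERRIDE, end - c);
        if (c == NULL)
            return -1;
        p = ++c; /* on mismatch, the search resumes after this prefix */
        if (end - p < 8) /* REX + opcode + ModRM + SIB + disp32 */
            return -1;
        if (!IS_RX_W(*p))
            continue;
        p++;
        if (*p != MOV_M_TO_R_OPCODE)
            continue;
        p++;
        /* We need direct access to memory with displacement */
        /* Don't care which registers are used */
        if (MOD(*p) != 0 || RM(*p) != 0b100)
            continue;
        p++;
        if (BASE(*p) != 0b101 || INDEX(*p) != 0b100)
            continue;
        p++;
        /* grab displacement32 value */
        return *(u32 *)p;
    }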
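
The fragment above depends on macros that are not shown in this mail.
For anyone who wants to try the idea outside the kernel, here is a
minimal userspace sketch of the same scan. The macro values are my
reading of the x86-64 encoding, not copied from the module, and the
function name find_gs_store_disp32, the two byte arrays and the offset
0x1a2b0 are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed x86-64 encodings; the names mirror the fragment above. */
#define GS_SEG_OVERRIDE   0x65
#define MOV_M_TO_R_OPCODE 0x89                /* MOV r/m64, r64 (store) */
#define IS_RX_W(b) (((b) & 0xf8) == 0x48)     /* REX prefix with W=1 (REX.X not checked) */
#define MOD(b)     (((b) >> 6) & 0x3)         /* ModRM.mod */
#define RM(b)      ((b) & 0x7)                /* ModRM.rm */
#define BASE(b)    ((b) & 0x7)                /* SIB.base */
#define INDEX(b)   (((b) >> 3) & 0x7)         /* SIB.index */

/* Returns 0 and stores the disp32 in *disp on a match, -1 otherwise. */
static int find_gs_store_disp32(const uint8_t *code, size_t len, uint32_t *disp)
{
    const uint8_t *c = code, *end = code + len;

    while (c < end) {
        const uint8_t *p;

        c = memchr(c, GS_SEG_OVERRIDE, (size_t)(end - c));
        if (c == NULL)
            return -1;
        p = ++c;             /* byte after the prefix; search resumes here */
        if (end - p < 8)     /* REX + opcode + ModRM + SIB + disp32 */
            return -1;
        if (!IS_RX_W(p[0]) || p[1] != MOV_M_TO_R_OPCODE)
            continue;
        if (MOD(p[2]) != 0 || RM(p[2]) != 4)      /* rm=100: SIB follows */
            continue;
        if (BASE(p[3]) != 5 || INDEX(p[3]) != 4)  /* no base, no index */
            continue;
        /* disp32 is stored little-endian */
        *disp = (uint32_t)p[4] | ((uint32_t)p[5] << 8) |
                ((uint32_t)p[6] << 16) | ((uint32_t)p[7] << 24);
        return 0;
    }
    return -1;
}

/* Made-up setter body containing: mov %rdi,%gs:0x1a2b0 */
static const uint8_t sample_setter[] = {
    0x55,                          /* push %rbp */
    0x48, 0x89, 0xe5,              /* mov %rsp,%rbp */
    0x65, 0x48, 0x89, 0x3c, 0x25,  /* mov %rdi,%gs:disp32 */
    0xb0, 0xa2, 0x01, 0x00,        /* disp32 = 0x0001a2b0 */
    0x5d,                          /* pop %rbp */
    0xc3,                          /* ret */
};

/* Failure case 2: the offset goes through a register first. */
static const uint8_t indirect_store[] = {
    0x48, 0xc7, 0xc0, 0xb0, 0xa2, 0x01, 0x00,  /* mov $0x1a2b0,%rax */
    0x65, 0x48, 0x89, 0x38,                    /* mov %rdi,%gs:(%rax) */
    0x90, 0x90, 0x90, 0x90,                    /* nop padding */
    0xc3,                                      /* ret */
};
```

Calling find_gs_store_disp32 on sample_setter yields the offset
0x1a2b0, while on indirect_store it returns -1, because the ModRM byte
names a register and there is no SIB byte or disp32 to read. That is
the second failure mode listed above.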

[0] https://github.com/elazarl/gueststack
[1] https://github.com/elazarl/gueststack/blob/master/module.c#L114

Linux-il mailing list
