RE: [PATCH] whpx: Added support for saving/restoring VM state

2022-05-18 Thread Ivan Shcherbakov
Hi Paolo,

Thanks for pointing out the relevant fields of the XSAVE state - it explains 
what is going on.
I did some quick experiments with modifying the buffer manually, so:

* Hyper-V doesn't care about the value of MXCSR_MASK. Both 0x and 0x 
work.
* Setting XCOMP_BV to 0 does trigger the error.

My best guess is that Hyper-V emulates XSAVE/XRSTOR programmatically and only 
implements the compacted format.

Do you think it would be a reasonable workaround to handle it like this:

1. Pass the XSAVE data received from Hyper-V to x86_cpu_xrstor_all_areas().
2. Also save it into the snapshot area like we do now.
3. When restoring, first try to pass the data from x86_cpu_xsave_all_areas() to 
Hyper-V.
4. If it rejects it, pass the original block saved in step 2.

If either Microsoft adds support for the regular format or someone on the QEMU 
side implements the compacted one, this logic will start properly synchronizing 
the QEMU-level XSAVE structure with Hyper-V. Until then (or if it breaks again), 
it will fall back to saving/restoring the data "as is", which is sufficient for 
snapshots.
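For what it's worth, the try-then-fall-back restore order from steps 1-4 could be sketched roughly as follows. This is purely illustrative, not patch code: whpx_restore_xsave() and toy_set_xsave() are made-up names, and the function pointer stands in for the actual WHvSetVirtualProcessorXsaveState() call.

```c
#include <stdint.h>

/* Stand-in for WHvSetVirtualProcessorXsaveState(); returns 0 on success. */
typedef int (*set_xsave_fn)(const void *buf, uint32_t len);

/*
 * Try the buffer produced by x86_cpu_xsave_all_areas() first; if the
 * hypervisor rejects it (e.g. because it only accepts the compacted
 * format), fall back to the raw block captured from Hyper-V at save time.
 */
static int whpx_restore_xsave(set_xsave_fn set_xsave,
                              const void *qemu_buf, uint32_t qemu_len,
                              const void *raw_buf, uint32_t raw_len)
{
    if (set_xsave(qemu_buf, qemu_len) == 0) {
        return 0;               /* formats compatible: fully synchronized */
    }
    return set_xsave(raw_buf, raw_len); /* fall back to the "as is" block */
}

/* Toy acceptor used only for this sketch: accepts only buffers whose first
 * byte is nonzero, standing in for Hyper-V's format check. */
static int toy_set_xsave(const void *buf, uint32_t len)
{
    (void)len;
    return *(const uint8_t *)buf ? 0 : -1;
}
```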

Best,
Ivan

-Original Message-
From: Qemu-devel  On Behalf Of 
Paolo Bonzini
Sent: Tuesday, May 17, 2022 7:12 AM
To: Ivan Shcherbakov ; qemu-devel@nongnu.org
Subject: Re: [PATCH] whpx: Added support for saving/restoring VM state

On 5/16/22 20:44, Ivan Shcherbakov wrote:
> Passing it to x86_cpu_xrstor_all_areas()/x86_cpu_xsave_all_areas() changed 
> the following values:
> 
> 0x001C: ff ff -> 00 00
> 0x0208: 07 -> 00
> 0x020F: 80 -> 00

0x1C-0x1F is MXCSR_MASK.  There's already a field in the x86 CPUState, but it 
was forgotten in x86_cpu_xsave_all_areas()/x86_cpu_xrstor_all_areas().  The 
field should also be initialized to 0x in the CPU reset function.

0x208...0x20F is XCOMP_BV, and bit 63 in there is indeed signaling the compacted 
format.  First of all, I'd start with your patch and hack it to check whether 
Hyper-V accepts zero at 0x208...0x20F; in this specific case, where 
0x208...0x20F has all the low consecutive bits set plus bit 63, it's fine to do 
just that.  If so, x86_cpu_xrstor_all_areas() needs no support for the compacted 
format.  I would be somewhat surprised if Hyper-V needs support in XRSTOR too.
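The "low consecutive bits plus bit 63" condition described here could be expressed as a small helper; this is a sketch of the check only (the function name is made up, and the value 0x8000000000000007 comes from the dump later in this thread):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Check whether XCOMP_BV has bit 63 (compacted-format flag) set plus only
 * consecutive low bits and no gaps, e.g. 0x8000000000000007.  In that case
 * the compacted layout is expected to match what QEMU writes for the
 * standard format, so replacing XCOMP_BV with zero should be acceptable.
 */
static bool xcomp_bv_is_trivially_compacted(uint64_t xcomp_bv)
{
    uint64_t low = xcomp_bv & ~(1ULL << 63);

    /* low must be of the form 2^k - 1, i.e. bits 0..k-1 set with no gaps */
    return (xcomp_bv >> 63) != 0 && (low & (low + 1)) == 0;
}
```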

For XSAVE, the algorithm to compute the offset (instead of just using 
x->offset) is given in the Intel manual:

If XCOMP_BV[i] = 0, state component i is not in the XSAVE area at all.

If XCOMP_BV[i] = 1, state component i is located at a byte offset from the
base address of the XSAVE area that is determined by the following steps:

- If i is the first bit set in bits 62:2 of the XCOMP_BV, state component i 
starts at offset 576

- Otherwise, take CPUID[EAX=0DH,ECX=i].ECX[1]:

   - If it is 0, state component i starts right after the preceding state
 component whose bit is set in XCOMP_BV (where the size of component
 j is enumerated in CPUID[EAX=0DH,ECX=j].EAX).

   - If it is 1, state component i starts after the preceding state
 component whose bit is set in XCOMP_BV, but on a 64-byte aligned
 offset relative to the beginning of the XSAVE area.
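The steps above can be turned into a small helper; this is a sketch, not QEMU code, under the assumption that the caller supplies the CPUID-derived size and 64-byte-alignment tables as plain arrays:

```c
#include <stdint.h>

/*
 * Offset of state component i in a compacted-format XSAVE area, following
 * the Intel SDM steps quoted above.  The per-component sizes and alignment
 * flags normally come from CPUID[EAX=0DH,ECX=i].EAX and
 * CPUID[EAX=0DH,ECX=i].ECX[1]; they are passed in as arrays here so the
 * example is self-contained.  Returns 0 if component i is absent.
 */
static uint32_t xsave_compacted_offset(uint64_t xcomp_bv, int i,
                                       const uint32_t *sizes,
                                       const uint8_t *align64)
{
    /* Legacy region (512 bytes) plus XSAVE header (64 bytes). */
    uint32_t offset = 576;
    int j;

    if (!(xcomp_bv & (1ULL << i))) {
        return 0;
    }
    /* Components 0 and 1 live in the legacy area, so walk bits 2..i-1. */
    for (j = 2; j < i; j++) {
        if (!(xcomp_bv & (1ULL << j))) {
            continue;
        }
        if (align64[j]) {
            offset = (offset + 63) & ~63u; /* component j is 64-byte aligned */
        }
        offset += sizes[j];
    }
    if (align64[i]) {
        offset = (offset + 63) & ~63u;
    }
    return offset;
}
```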

Paolo




RE: [PATCH] whpx: Added support for saving/restoring VM state

2022-05-16 Thread Ivan Shcherbakov
Hi Paolo,

>What are the differences?  Is it using the XSAVEC/XSAVES ("compacted") format?

I am not very familiar with the format internals, so I briefly checked whether 
I could reuse the general logic from the HVF port. Here's what I got on a 
booted Linux VM:

WHvGetVirtualProcessorXsaveState() returned this block (844 bytes):

7f 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 a0 1f 
00 00 ff ff 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 58 65 c8 34 7f 00 00 04 7f 65 c8 34 7f 00 00 44 e1 68 c8 34 7f 
00 00 f8 35 65 c8 34 7f 00 00 ec 4f 65 c8 34 7f 00 00 5c 4f 6a c8 34 7f 00 00 
d0 7d 65 c8 34 7f 00 00 3c 6a 67 c8 34 7f 00 00 64 90 67 c8 34 7f 00 00 9c 0c 
78 c8 34 7f 00 00 3c 6a 67 c8 34 7f 00 00 40 60 65 c8 34 7f 00 00 5c 4f 6a c8 
34 7f 00 00 d0 7d 65 c8 34 7f 00 00 f8 35 65 c8 34 7f 00 00 ec 4f 65 c8 34 7f 
00 00 70 3d 66 cb 34 7f 00 00 00 00 00 00 00 00 00 00 69 00 65 00 6f 00 20 00 
00 00 00 00 00 00 00 00 ff 00 ff 00 ff 00 ff 00 ff 00 ff 00 ff 00 ff 00 ff 00 
00 00 ff ff 00 00 00 00 00 00 00 00 ff ff 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 02 00 00 00 0c 00 00 00 05 00 00 00 22 00 00 00 05 00 00 00 10 00 
00 00 05 00 00 00 22 00 00 00 10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03 00 00 00 00 00 00 00 
07 00 00 00 00 00 00 80 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Passing it to x86_cpu_xrstor_all_areas()/x86_cpu_xsave_all_areas() changed the 
following values:

0x001C: ff ff -> 00 00
0x0208: 07 -> 00
0x020F: 80 -> 00

Trying to pass the updated block to WHvSetVirtualProcessorXsaveState() just 
made it reject the entire block with error c0350005 (generic "invalid 
parameter").

I haven't looked too deep into it, since just saving the state "as is" is 
sufficient to get the snapshots working. If there is something obvious I 
missed, I can give it another brief try.

Best,
Ivan




[PATCH] whpx: Added support for saving/restoring VM state

2022-05-13 Thread Ivan Shcherbakov
This patch adds support for the savevm/loadvm commands when using WHPX.

It saves the XSAVE state and the relevant internal registers that were
not otherwise captured into a separate "whpx/cpustate" object in the
snapshot, and restores them when the snapshot is loaded.

Note that due to the XSAVE format differences between the WHPX API and
QEMU, the XSAVE state captured from WHPX is not reflected in the X86CPU
structure, and is instead saved to the snapshots "as is".

Signed-off-by: Ivan Shcherbakov 
---
 target/i386/cpu.h|   2 +-
 target/i386/whpx/whpx-all.c  | 205 +--
 target/i386/whpx/whpx-internal.h |   7 +-
 3 files changed, 201 insertions(+), 13 deletions(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 9661f9fbd1..9c16199679 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1698,7 +1698,7 @@ typedef struct CPUArchState {
 int64_t user_tsc_khz; /* for sanity check only */
 uint64_t apic_bus_freq;
 uint64_t tsc;
-#if defined(CONFIG_KVM) || defined(CONFIG_HVF)
+#if defined(CONFIG_KVM) || defined(CONFIG_HVF) || defined(CONFIG_WHPX)
 void *xsave_buf;
 uint32_t xsave_buf_len;
 #endif
diff --git a/target/i386/whpx/whpx-all.c b/target/i386/whpx/whpx-all.c
index b22a3314b4..6f95e9c780 100644
--- a/target/i386/whpx/whpx-all.c
+++ b/target/i386/whpx/whpx-all.c
@@ -26,6 +26,9 @@
 #include "qapi/qapi-types-common.h"
 #include "qapi/qapi-visit-common.h"
 #include "migration/blocker.h"
+#include "migration/register.h"
+#include "migration/qemu-file-types.h"
+#include "qemu/memalign.h"
 #include 
 
 #include "whpx-internal.h"
@@ -242,6 +245,10 @@ struct whpx_vcpu {
 WHV_RUN_VP_EXIT_CONTEXT exit_ctx;
 };
 
+enum {
+WHPX_XSAVE_AREA_ALIGNMENT = 4096,
+};
+
 static bool whpx_allowed;
 static bool whp_dispatch_initialized;
 static HMODULE hWinHvPlatform, hWinHvEmulation;
@@ -251,6 +258,29 @@ static WHV_PROCESSOR_XSAVE_FEATURES whpx_xsave_cap;
 struct whpx_state whpx_global;
 struct WHPDispatch whp_dispatch;
 
+static void whpx_xsave_save(QEMUFile *f, void *opaque);
+static int whpx_xsave_load(QEMUFile *f, void *opaque, int version_id);
+
+/*
+ * As of Windows 10 21H1, the layout of the XSAVE data returned by the
+ * WHPX API does not match the layout used by QEMU.
+ *
+ * Specifically, trying to pass the state returned by
+ * x86_cpu_xsave_all_areas() to WHvSetVirtualProcessorXsaveState() causes
+ * it to return an error.
+ *
+ * As a result, we do not reflect the captured XSAVE state in the X86CPU
+ * structure, and instead manually save it to the snapshots via the
+ * whpx_xsave_*() callbacks.
+ *
+ * Note that unlike the device drivers that can use the new
+ * VMStateDescription mechanism via the 'DeviceClass::vmsd' field,
+ * AccelClass objects cannot easily do it. Hence, we rely on the legacy
+ * state management API.
+ */
+static SaveVMHandlers savevm_whpx = {
+    .save_state = whpx_xsave_save,
+    .load_state = whpx_xsave_load,
+};
+
 static bool whpx_has_xsave(void)
 {
 return whpx_xsave_cap.XsaveSupport;
@@ -307,18 +337,35 @@ static SegmentCache whpx_seg_h2q(const WHV_X64_SEGMENT_REGISTER *hs)
 }
 
 /* X64 Extended Control Registers */
-static void whpx_set_xcrs(CPUState *cpu)
+static void whpx_set_xsave(CPUState *cpu)
 {
 CPUX86State *env = cpu->env_ptr;
 HRESULT hr;
 struct whpx_state *whpx = &whpx_global;
 WHV_REGISTER_VALUE xcr0;
 WHV_REGISTER_NAME xcr0_name = WHvX64RegisterXCr0;
+void *xsave = X86_CPU(cpu)->env.xsave_buf;
+uint32_t xsave_len = X86_CPU(cpu)->env.xsave_buf_len;
 
 if (!whpx_has_xsave()) {
 return;
 }
 
+if (xsave) {
+/*
+ *  See the comment on 'savevm_whpx' for an explanation why
+ *  we cannot do this:
+ *  x86_cpu_xsave_all_areas(X86_CPU(cpu), xsave, xsave_len);
+ */
+
+hr = whp_dispatch.WHvSetVirtualProcessorXsaveState(
+whpx->partition, cpu->cpu_index, xsave, xsave_len);
+
+if (FAILED(hr)) {
+error_report("WHPX: Failed to set xsave state, hr=%08lx", hr);
+}
+}
+
 /* Only xcr0 is supported by the hypervisor currently */
 xcr0.Reg64 = env->xcr0;
 hr = whp_dispatch.WHvSetVirtualProcessorRegisters(
@@ -326,6 +373,7 @@ static void whpx_set_xcrs(CPUState *cpu)
 if (FAILED(hr)) {
 error_report("WHPX: Failed to set register xcr0, hr=%08lx", hr);
 }
+
 }
 
 static int whpx_set_tsc(CPUState *cpu)
@@ -474,7 +522,7 @@ static void whpx_set_registers(CPUState *cpu, int level)
  * Extended control registers needs to be handled separately depending
  * on whether xsave is supported/enabled or not.
  */
-whpx_set_xcrs(cpu);
+whpx_set_xsave(cpu);
 
 /* 16 XMM registers */
 assert(whpx_register_names[idx] == WHvX64RegisterXmm0);
@@ -582,28 +630,57 @@ static int whpx_get_tsc(CPUState *cpu)
 r

[PATCH v2] whpx: Added support for breakpoints and stepping

2022-03-02 Thread Ivan Shcherbakov
Below is the updated version of the patch adding debugging support to WHPX.
It incorporates feedback from Alex Bennée and Peter Maydell regarding not
changing the emulation logic depending on the gdb connection status.

Instead of checking for an active gdb connection to determine whether QEMU
should intercept the INT1 exceptions, it now checks whether any breakpoints
have been set, or whether gdb has explicitly requested one or more CPUs to
do single-stepping. Having none of these conditions present now has the same
effect as not using gdb at all.

---

This adds support for breakpoints and stepping when debugging
WHPX-accelerated guests with gdb.
It enables reliable debugging of the Linux kernel in both single-CPU and SMP
modes.

Signed-off-by: Ivan Shcherbakov 
---
 gdbstub.c |  10 +-
 include/sysemu/accel-ops.h|   1 +
 include/sysemu/runstate.h |   8 +-
 softmmu/cpus.c|  12 +-
 target/i386/whpx/whpx-accel-ops.c |   1 +
 target/i386/whpx/whpx-accel-ops.h |   1 +
 target/i386/whpx/whpx-all.c   | 770 +-
 target/i386/whpx/whpx-internal.h  |  30 ++
 8 files changed, 815 insertions(+), 18 deletions(-)

diff --git a/gdbstub.c b/gdbstub.c
index 3c14c6a038..ab3cd44f72 100644
--- a/gdbstub.c
+++ b/gdbstub.c
@@ -519,7 +519,15 @@ static int gdb_continue_partial(char *newstates)
 int flag = 0;
 
 if (!runstate_needs_reset()) {
-if (vm_prepare_start()) {
+bool step_requested = false;
+CPU_FOREACH(cpu) {
+if (newstates[cpu->cpu_index] == 's') {
+step_requested = true;
+break;
+}
+}
+
+if (vm_prepare_start(step_requested)) {
 return 0;
 }
 
diff --git a/include/sysemu/accel-ops.h b/include/sysemu/accel-ops.h
index 032f6979d7..f40ca4c033 100644
--- a/include/sysemu/accel-ops.h
+++ b/include/sysemu/accel-ops.h
@@ -35,6 +35,7 @@ struct AccelOpsClass {
 void (*synchronize_post_init)(CPUState *cpu);
 void (*synchronize_state)(CPUState *cpu);
 void (*synchronize_pre_loadvm)(CPUState *cpu);
+void (*synchronize_pre_resume)(bool step_pending);
 
 void (*handle_interrupt)(CPUState *cpu, int mask);
 
diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
index a535691573..798ba13ecb 100644
--- a/include/sysemu/runstate.h
+++ b/include/sysemu/runstate.h
@@ -34,7 +34,13 @@ static inline bool shutdown_caused_by_guest(ShutdownCause cause)
 }
 
 void vm_start(void);
-int vm_prepare_start(void);
+
+/**
+ * vm_prepare_start: Prepare for starting/resuming the VM
+ *
+ * @step_pending: whether any of the CPUs is about to be single-stepped by gdb
+ */
+int vm_prepare_start(bool step_pending);
 int vm_stop(RunState state);
 int vm_stop_force_state(RunState state);
 int vm_shutdown(void);
diff --git a/softmmu/cpus.c b/softmmu/cpus.c
index 035395ae13..acd695a229 100644
--- a/softmmu/cpus.c
+++ b/softmmu/cpus.c
@@ -662,7 +662,7 @@ int vm_stop(RunState state)
  * Returns -1 if the vCPUs are not to be restarted (e.g. if they are already
  * running or in case of an error condition), 0 otherwise.
  */
-int vm_prepare_start(void)
+int vm_prepare_start(bool step_pending)
 {
 RunState requested;
 
@@ -682,6 +682,14 @@ int vm_prepare_start(void)
 return -1;
 }
 
+/*
+ * WHPX accelerator needs to know whether we are going to step
+ * any CPUs, before starting the first one.
+ */
+if (cpus_accel->synchronize_pre_resume) {
+cpus_accel->synchronize_pre_resume(step_pending);
+}
+
 /* We are sending this now, but the CPUs will be resumed shortly later */
 qapi_event_send_resume();
 
@@ -693,7 +701,7 @@ int vm_prepare_start(void)
 
 void vm_start(void)
 {
-if (!vm_prepare_start()) {
+if (!vm_prepare_start(false)) {
 resume_all_vcpus();
 }
 }
diff --git a/target/i386/whpx/whpx-accel-ops.c b/target/i386/whpx/whpx-accel-ops.c
index 6bc47c5309..1a859a084d 100644
--- a/target/i386/whpx/whpx-accel-ops.c
+++ b/target/i386/whpx/whpx-accel-ops.c
@@ -94,6 +94,7 @@ static void whpx_accel_ops_class_init(ObjectClass *oc, void *data)
 ops->synchronize_post_init = whpx_cpu_synchronize_post_init;
 ops->synchronize_state = whpx_cpu_synchronize_state;
 ops->synchronize_pre_loadvm = whpx_cpu_synchronize_pre_loadvm;
+ops->synchronize_pre_resume = whpx_cpu_synchronize_pre_resume;
 }
 
 static const TypeInfo whpx_accel_ops_type = {
diff --git a/target/i386/whpx/whpx-accel-ops.h b/target/i386/whpx/whpx-accel-ops.h
index 2dee6d61ea..b5102dd1ee 100644
--- a/target/i386/whpx/whpx-accel-ops.h
+++ b/target/i386/whpx/whpx-accel-ops.h
@@ -21,6 +21,7 @@ void whpx_cpu_synchronize_state(CPUState *cpu);
 void whpx_cpu_synchronize_post_reset(CPUState *cpu);
 void whpx_cpu_synchronize_post_init(CPUState *cpu);
 void whpx_cpu_synchronize_pre_loadvm(CPUState *cpu);
+void whpx_cpu_synchronize_pre_resume(bool step_pending);
 
 /* stat

RE: [PATCH 3/3] whpx: Added support for breakpoints and stepping

2022-03-01 Thread Ivan Shcherbakov
Hi Alex,

Is there anything I could do to get the WHPX debugging support accepted into 
QEMU? Would the proposed callback AccelOpsClass work for you, or would you 
prefer another approach?

Best,
Ivan

-Original Message-
From: Qemu-devel  On Behalf Of 
Ivan Shcherbakov
Sent: Monday, February 28, 2022 6:09 PM
To: 'Alex Bennée' 
Cc: 'Peter Maydell' ; m...@redhat.com; 
qemu-devel@nongnu.org; arm...@redhat.com; 'Paolo Bonzini' 
Subject: RE: [PATCH 3/3] whpx: Added support for breakpoints and stepping

Hi Alex,

Thanks for getting back to me. It is definitely the latter case (i.e. it is 
possible to change it while the target is stopped at a breakpoint before 
resuming any VCPUs).
vm_state_notify() does look like it's intended for this type of notification, 
but unfortunately, it doesn't make a distinction between stepping and running 
normally.
Below is the relevant code from gdbstub.c:

>static int gdb_continue_partial(char *newstates) {
>int flag = 0;
>
>/* Various corner cases omitted for brevity  */
>if (vm_prepare_start()) {
>return 0;
>}
>CPU_FOREACH(cpu) {
>switch (newstates[cpu->cpu_index]) {
>case 's':
>trace_gdbstub_op_stepping(cpu->cpu_index);
>cpu_single_step(cpu, gdbserver_state.sstep_flags);
>cpu_resume(cpu);
>flag = 1;
>break;
>case 'c':
>trace_gdbstub_op_continue_cpu(cpu->cpu_index);
>cpu_resume(cpu);
>flag = 1;
>break;
>default:
>res = -1;
>break;
>}
>}
>}

Also:

>int vm_prepare_start(void)
>{
>runstate_set(RUN_STATE_RUNNING);
>vm_state_notify(1, RUN_STATE_RUNNING);
>return 0;
>}

and:

>void vm_state_notify(bool running, RunState state);

So, currently, vm_prepare_start() has no way of distinguishing between 
single-stepping and normal running unless gdb_continue_partial() scans the 
'newstates' array before calling it, and passes some extra argument to 
vm_prepare_start(), indicating whether a step request was present anywhere in 
the array.

I couldn't find any existing run state that would match single-stepping, and 
adding another run state looks like a very non-trivial change that can easily 
backfire. Adding another argument to vm_state_notify() could be easier, but I 
am still afraid it could break some other part of QEMU, so I thought adding a 
new member to AccelOpsClass would be a safer bet. But again, if you think 
another way to do it is better, I am very open to it.

Best regards,
Ivan

-Original Message-
From: Alex Bennée 
Sent: Monday, February 28, 2022 2:28 AM
To: Ivan Shcherbakov 
Cc: 'Peter Maydell' ; 'Paolo Bonzini' 
; qemu-devel@nongnu.org; arm...@redhat.com; m...@redhat.com
Subject: Re: [PATCH 3/3] whpx: Added support for breakpoints and stepping

Is the limitation that whpx_set_exception_exit_bitmap needs to be set before 
any vCPU can be run or that it cannot be set if any vCPUs are currently running?
If it is the later wouldn't adding a hook into the vm_change_state_head 
callbacks be enough?




RE: [PATCH 3/3] whpx: Added support for breakpoints and stepping

2022-02-28 Thread Ivan Shcherbakov
Hi Alex,

Thanks for getting back to me. It is definitely the latter case (i.e. it is 
possible to change it while the target is stopped at a breakpoint before 
resuming any VCPUs).
vm_state_notify() does look like it's intended for this type of notification, 
but unfortunately, it doesn't make a distinction between stepping and running 
normally.
Below is the relevant code from gdbstub.c:

>static int gdb_continue_partial(char *newstates)
>{
>int flag = 0;
>
>/* Various corner cases omitted for brevity  */
>if (vm_prepare_start()) {
>return 0;
>}
>CPU_FOREACH(cpu) {
>switch (newstates[cpu->cpu_index]) {
>case 's':
>trace_gdbstub_op_stepping(cpu->cpu_index);
>cpu_single_step(cpu, gdbserver_state.sstep_flags);
>cpu_resume(cpu);
>flag = 1;
>break;
>case 'c':
>trace_gdbstub_op_continue_cpu(cpu->cpu_index);
>cpu_resume(cpu);
>flag = 1;
>break;
>default:
>res = -1;
>break;
>}
>}
>}

Also:

>int vm_prepare_start(void)
>{
>runstate_set(RUN_STATE_RUNNING);
>vm_state_notify(1, RUN_STATE_RUNNING);
>return 0;
>}

and:

>void vm_state_notify(bool running, RunState state);

So, currently, vm_prepare_start() has no way of distinguishing between 
single-stepping and normal running unless gdb_continue_partial() scans the 
'newstates' array before calling it, and passes some extra argument to 
vm_prepare_start(), indicating whether a step request was present anywhere in 
the array.

I couldn't find any existing run state that would match single-stepping, and 
adding another run state looks like a very non-trivial change that can easily 
backfire. Adding another argument to vm_state_notify() could be easier, but I 
am still afraid it could break some other part of QEMU, so I thought adding a 
new member to AccelOpsClass would be a safer bet. But again, if you think 
another way to do it is better, I am very open to it.

Best regards,
Ivan

-Original Message-
From: Alex Bennée  
Sent: Monday, February 28, 2022 2:28 AM
To: Ivan Shcherbakov 
Cc: 'Peter Maydell' ; 'Paolo Bonzini' 
; qemu-devel@nongnu.org; arm...@redhat.com; m...@redhat.com
Subject: Re: [PATCH 3/3] whpx: Added support for breakpoints and stepping

Is the limitation that whpx_set_exception_exit_bitmap needs to be set before 
any vCPU can be run or that it cannot be set if any vCPUs are currently running?
If it is the later wouldn't adding a hook into the vm_change_state_head 
callbacks be enough?




RE: [PATCH 3/3] whpx: Added support for breakpoints and stepping

2022-02-27 Thread Ivan Shcherbakov
Hi Alex and Peter,

It would be great to hear your feedback on handling the WHPX stepping via an 
added argument to vm_prepare_start(). It is only called from 2 places, so the 
changes will be minimal. If this works for you, I will gladly send an updated 
patch.

I am also fully open to other suggestions. You obviously know the QEMU codebase 
much better, so I'm happy to refactor it so that it blends in well with the 
rest of the code.

Best,
Ivan

-Original Message-
From: Qemu-devel  On Behalf Of 
Ivan Shcherbakov
Sent: Thursday, February 24, 2022 7:54 AM
To: 'Alex Bennée' ; 'Peter Maydell' 

Cc: 'Paolo Bonzini' ; qemu-devel@nongnu.org; 
arm...@redhat.com; m...@redhat.com
Subject: RE: [PATCH 3/3] whpx: Added support for breakpoints and stepping

> I haven't looked at the rest of the patch -- but can you explain where 
> whpx is different from how other accelerators handle debug such that 
> it needs to know whether gdb is connected or not ?
This mainly comes from the way single-stepping is handled. WHPX needs to know 
whether you want to trap INT1 before starting the first VCPU. The current 
gdbstub implementation doesn’t make it easy. Assume the scenario:

1. You have 2 VCPUs. You run the first one and step the second one.
2. gdb_continue_partial() calls cpu_resume(0)
3. gdb_continue_partial() calls cpu_single_step(1).

WHPX needs to know whether to trap INT1 at step #2 (starting the first CPU), 
but it won't know it until step #3. So, the current logic simply checks if gdb 
is connected at all in step #2.

>Just the fact you have connected to the gdbserver shouldn't affect how you run 
>WHPX up until the point there are things you need to trap - i.e.
>handling installed breakpoints.

This can be addressed by adding a "bool stepping_expected" argument to 
vm_prepare_start(). It will be set to true if gdb_continue_partial() expects 
ANY thread to be stepped, and will be false otherwise. It will also require a 
new callback in AccelOpsClass (e.g. on_vm_starting(bool stepping_expected)) 
that will be called from vm_prepare_start(). The WHPX implementation will then 
check if any breakpoints are set, and if the last call to this function 
expected stepping, and use it to decide whether to trap INT1.

Let me know if this will work better.

Best,
Ivan




RE: [PATCH 3/3] whpx: Added support for breakpoints and stepping

2022-02-24 Thread Ivan Shcherbakov
> I haven't looked at the rest of the patch -- but can you explain where 
> whpx is different from how other accelerators handle debug such that 
> it needs to know whether gdb is connected or not ?
This mainly comes from the way single-stepping is handled. WHPX needs to know 
whether you want to trap INT1 before starting the first VCPU. The current 
gdbstub implementation doesn’t make it easy. Assume the scenario:

1. You have 2 VCPUs. You run the first one and step the second one.
2. gdb_continue_partial() calls cpu_resume(0)
3. gdb_continue_partial() calls cpu_single_step(1).

WHPX needs to know whether to trap INT1 at step #2 (starting the first CPU), 
but it won't know it until step #3. So, the current logic simply checks if gdb 
is connected at all in step #2.

>Just the fact you have connected to the gdbserver shouldn't affect how you run 
>WHPX up until the point there are things you need to trap - i.e.
>handling installed breakpoints.

This can be addressed by adding a "bool stepping_expected" argument to 
vm_prepare_start(). It will be set to true if gdb_continue_partial() expects 
ANY thread to be stepped, and will be false otherwise. It will also require a 
new callback in AccelOpsClass (e.g. on_vm_starting(bool stepping_expected)) 
that will be called from vm_prepare_start(). The WHPX implementation will then 
check if any breakpoints are set, and if the last call to this function 
expected stepping, and use it to decide whether to trap INT1.

Let me know if this will work better.

Best,
Ivan




RE: [PATCH 3/3] whpx: Added support for breakpoints and stepping

2022-02-23 Thread Ivan Shcherbakov
Hi Paolo,

Thanks for getting back to me. Please see my comments below:

>Please use WhpxStepMode and likewise for WhpxBreakpointState.
No problem, I have updated the patch.

>(In the case of WhpxStepMode I would also consider simply a "bool exclusive" 
>in whpx_cpu_run).
This is a leftover from prior experiments with stepping into exception handlers, 
which involved reading the IDT/GDT, writing INT3 into the original handler, 
disabling INT1 interception, and intercepting INT3 instead. It would have been 
implemented via a third step mode. I ended up removing it due to the 
unbelievable complexity of properly handling all the corner cases, but I thought 
it would make sense to keep it as an enum in case someone else decides to add 
something similar later.

>Please leave the empty line.
Oops, leftover from other experiments. Thanks for pointing out.

>Out of curiosity, does the guest see TF=1 if it single steps through a PUSHF 
>(and then break horribly on POPF :))?
This is a very good point. It would indeed get pushed onto the stack and popped 
back with POPF, raising another INT1.
It shouldn't be too horrible though: it would report another stop to gdb, and 
doing a step again would clear it.
A somewhat bigger issue is that stepping through POPF or LAHF could reset TF 
to 0, effectively ending the single-stepping mode.

It could be addressed by emulating PUSHF/POPF, but there are a few corner cases 
(alignment checks, page faults on RSP,
flag translation for the virtual 8086 mode), that could make things more 
complicated. Also, PUSHF/POPF is not the only
special case. Stepping over IRET, into page fault handlers, and when the guest 
itself is running a debugger that wants
to do single-stepping would also require rather non-trivial workarounds.
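A toy model of the PUSHF/POPF interaction described above (not actual WHPX code, just an illustration of the failure mode):

```c
#include <assert.h>
#include <stdint.h>

#define TF_MASK (1u << 8)   /* x86 RFLAGS trap flag, bit 8 */

/* The hypervisor sets TF to get a #DB trap after each instruction, but
 * PUSHF copies the live RFLAGS, TF included, onto the stack, and a
 * later POPF restores whatever the stack holds. */
static uint64_t toy_pushf(uint64_t rflags, uint64_t *stack_slot)
{
    *stack_slot = rflags;   /* TF leaks into the guest-visible saved copy */
    return rflags;
}

static uint64_t toy_popf(uint64_t stack_slot)
{
    return stack_slot;      /* TF from the stack overrides the live value */
}
```

If the guest clears TF in the saved copy before POPF, single-stepping silently ends, which is exactly the limitation discussed here.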

I have added a detailed comment outlining the limitations of the current 
stepping logic and possible workarounds above
the definition of WhpxStepMode. The current implementation works like a charm 
for debugging real-world kernel modules
on x64 Ubuntu, and if some of these limitations end up breaking other debug 
scenarios, I would much rather
address specific narrow issues than add another layer of complexity 
proactively, if that's OK with you. 

I have taken special care to make sure the new functionality won't cause any 
regressions when not debugging.
The intercepted exception mask is set to 0 when gdb is not connected, 100% 
matching the behavior prior to the patch.

>Why separate the original addresses in a different array
This accommodates the case when different parts of QEMU would set multiple 
breakpoints at the same location, or when
a breakpoint removed on the QEMU level could not be removed from RAM (e.g. the 
page got swapped out). 
Synchronizing the QEMU breakpoint list with the RAM breakpoint list currently 
has O(N^2) complexity. However, in many
cases (e.g. stopping to handle timers), the QEMU breakpoints won't change 
between the invocations, so we can just do a
quick O(N) comparison of the old breakpoint list vs. new breakpoint list, and 
avoid the O(N^2) part. You can find this logic
by searching for the 'update_pending' variable.
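The quick comparison behind 'update_pending' can be sketched as follows (the field and function names are assumptions for illustration, not the patch's actual identifiers):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* If the ordered QEMU breakpoint list is unchanged since the last sync,
 * the O(N^2) RAM-breakpoint rebuild can be skipped entirely. */
struct bp_entry {
    uint64_t vaddr;   /* guest virtual address of the breakpoint */
};

static bool bp_list_unchanged(const struct bp_entry *old_list, size_t old_count,
                              const struct bp_entry *new_list, size_t new_count)
{
    if (old_count != new_count) {
        return false;   /* breakpoint added or removed: rebuild needed */
    }
    for (size_t i = 0; i < old_count; i++) {
        if (old_list[i].vaddr != new_list[i].vaddr) {
            return false;
        }
    }
    return true;        /* identical lists: skip the O(N^2) pass */
}
```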

This could be avoided if cpu_breakpoint_insert() or gdb_breakpoint_insert() 
could invoke a WHPX-specific handler,
similar to what is currently done with kvm_insert_breakpoint(), or a generic 
callback via AccelOpsClass, that could just
mark the RAM breakpoint list dirty. But I didn't want to introduce unnecessary 
changes outside the WHPX module.

>(and why the different logic, with used/allocated for one array and an exact 
>size for the other)
When we are rebuilding the memory breakpoint list, we don't know how many of 
the CPU breakpoints will refer to the same
memory addresses, or overlap with the breakpoints that could not be removed. 
So, we allocate for the worst case, and keep
track of the actually created entries in the 'used' field.
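The worst-case allocation strategy can be sketched like this (the names are illustrative assumptions, not the patch's actual fields):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* The RAM-breakpoint array is sized for every CPU breakpoint being
 * distinct; 'used' records how many entries survive deduplication. */
struct mem_bp_array {
    struct { uint64_t vaddr; } *data;
    size_t allocated;   /* worst-case capacity: one slot per CPU breakpoint */
    size_t used;        /* entries actually created */
};

static int mem_bp_array_init(struct mem_bp_array *arr, size_t cpu_bp_count)
{
    arr->data = calloc(cpu_bp_count, sizeof(*arr->data));
    if (!arr->data && cpu_bp_count) {
        return -1;
    }
    arr->allocated = cpu_bp_count;
    arr->used = 0;
    return 0;
}

static void mem_bp_add(struct mem_bp_array *arr, uint64_t vaddr)
{
    for (size_t i = 0; i < arr->used; i++) {
        if (arr->data[i].vaddr == vaddr) {
            return;   /* two CPU breakpoints collapse into one RAM entry */
        }
    }
    arr->data[arr->used++].vaddr = vaddr;
}
```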

Updated patch below. Let me know if you have any further concerns and I will be 
happy to address them.

This adds support for breakpoints and stepping when debugging WHPX-accelerated 
guests with gdb.
It enables reliable debugging of the Linux kernel in both single-CPU and SMP 
modes.

Signed-off-by: Ivan Shcherbakov 
---
 gdbstub.c|  10 +
 include/exec/gdbstub.h   |   8 +
 target/i386/whpx/whpx-all.c  | 763 ++-
 target/i386/whpx/whpx-internal.h |  29 ++
 4 files changed, 796 insertions(+), 14 deletions(-)

diff --git a/gdbstub.c b/gdbstub.c
index 3c14c6a038..d30cbfa478 100644
--- a/gdbstub.c
+++ b/gdbstub.c
@@ -373,6 +373,12 @@ typedef struct GDBState {
 } GDBState;
 
 static GDBState gdbserver_state;
+static bool gdbserver_is_connected;
+
+bool gdb_is_connected(void)
+{
+return gdbserver_is_connected;
+}
 
 static void init_gdbserver_state(void)
 {
@@ -3410,6 +3416,10 @@ static void gdb_chr_event(void *opaque, QEMUChrEvent event)
 vm_stop(RUN_STATE_PAUSED);

[PATCH 3/3] whpx: Added support for breakpoints and stepping

2022-02-22 Thread Ivan Shcherbakov
This adds support for breakpoints and stepping when debugging
WHPX-accelerated guests with gdb.
It enables reliable debugging of the Linux kernel in both single-CPU and SMP
modes.

Signed-off-by: Ivan Shcherbakov 
---
 gdbstub.c|  10 +
 include/exec/gdbstub.h   |   8 +
 target/i386/whpx/whpx-all.c  | 689 ++-
 target/i386/whpx/whpx-internal.h |  29 ++
 4 files changed, 721 insertions(+), 15 deletions(-)

diff --git a/gdbstub.c b/gdbstub.c
index 3c14c6a038..d30cbfa478 100644
--- a/gdbstub.c
+++ b/gdbstub.c
@@ -373,6 +373,12 @@ typedef struct GDBState {
 } GDBState;
 
 static GDBState gdbserver_state;
+static bool gdbserver_is_connected;
+
+bool gdb_is_connected(void)
+{
+return gdbserver_is_connected;
+}
 
 static void init_gdbserver_state(void)
 {
@@ -3410,6 +3416,10 @@ static void gdb_chr_event(void *opaque, QEMUChrEvent event)
 vm_stop(RUN_STATE_PAUSED);
 replay_gdb_attached();
 gdb_has_xml = false;
+gdbserver_is_connected = true;
+break;
+case CHR_EVENT_CLOSED:
+gdbserver_is_connected = false;
 break;
 default:
 break;
diff --git a/include/exec/gdbstub.h b/include/exec/gdbstub.h
index a024a0350d..0ef54cdeb5 100644
--- a/include/exec/gdbstub.h
+++ b/include/exec/gdbstub.h
@@ -188,4 +188,12 @@ extern bool gdb_has_xml;
 /* in gdbstub-xml.c, generated by scripts/feature_to_c.sh */
 extern const char *const xml_builtin[][2];
 
+/**
+ * gdb_is_connected: Check whether gdb is currently connected.
+ * This function is used to determine if gdb is currently connected to qemu.
+ * It is used by the WHPX engine to enable interception of debug-related
+ * exceptions when debugging with gdb, and pass them to the guest otherwise.
+ */
+bool gdb_is_connected(void);
+
 #endif
diff --git a/target/i386/whpx/whpx-all.c b/target/i386/whpx/whpx-all.c
index 8a8b5d55d1..030988fa51 100644
--- a/target/i386/whpx/whpx-all.c
+++ b/target/i386/whpx/whpx-all.c
@@ -12,6 +12,7 @@
 #include "cpu.h"
 #include "exec/address-spaces.h"
 #include "exec/ioport.h"
+#include "exec/gdbstub.h"
 #include "qemu-common.h"
 #include "qemu/accel.h"
 #include "sysemu/whpx.h"
@@ -148,6 +149,12 @@ struct whpx_register_set {
 WHV_REGISTER_VALUE values[RTL_NUMBER_OF(whpx_register_names)];
 };
 
+enum whpx_step_mode {
+whpx_step_none = 0,
+/* Halt other VCPUs */
+whpx_step_exclusive,
+};
+
 struct whpx_vcpu {
 WHV_EMULATOR_HANDLE emulator;
 bool window_registered;
@@ -156,7 +163,6 @@ struct whpx_vcpu {
 uint64_t tpr;
 uint64_t apic_base;
 bool interruption_pending;
-
 /* Must be the last field as it may have a tail */
 WHV_RUN_VP_EXIT_CONTEXT exit_ctx;
 };
@@ -793,6 +799,515 @@ static int whpx_handle_portio(CPUState *cpu,
 return 0;
 }
 
+/*
+ * Controls whether we should intercept various exceptions on the guest,
+ * namely breakpoint/single-step events.
+ *
+ * The 'exceptions' argument accepts a bitmask, e.g:
+ * (1 << WHvX64ExceptionTypeDebugTrapOrFault) | (...)
+ */
+static HRESULT whpx_set_exception_exit_bitmap(UINT64 exceptions)
+{
+struct whpx_state *whpx = &whpx_global;
+WHV_PARTITION_PROPERTY prop = { 0, };
+HRESULT hr;
+
+if (exceptions == whpx->exception_exit_bitmap) {
+return S_OK;
+}
+
+prop.ExceptionExitBitmap = exceptions;
+
+hr = whp_dispatch.WHvSetPartitionProperty(
+whpx->partition,
+WHvPartitionPropertyCodeExceptionExitBitmap,
+&prop,
+sizeof(WHV_PARTITION_PROPERTY));
+
+if (SUCCEEDED(hr)) {
+whpx->exception_exit_bitmap = exceptions;
+}
+
+return hr;
+}
+
+
+/*
+ * This function is called before/after stepping over a single instruction.
+ * It will update the CPU registers to arm/disarm the instruction stepping
+ * accordingly.
+ */
+static HRESULT whpx_vcpu_configure_single_stepping(CPUState *cpu,
+bool set,
+uint64_t *exit_context_rflags)
+{
+WHV_REGISTER_NAME reg_name;
+WHV_REGISTER_VALUE reg_value;
+HRESULT hr;
+struct whpx_state *whpx = &whpx_global;
+
+/*
+ * If we are trying to step over a single instruction, we need to set
the
+ * TF bit in rflags. Otherwise, clear it.
+ */
+reg_name = WHvX64RegisterRflags;
+hr = whp_dispatch.WHvGetVirtualProcessorRegisters(
+whpx->partition,
+cpu->cpu_index,
+&reg_name,
+1,
+&reg_value);
+
+if (FAILED(hr)) {
+error_report("WHPX: Failed to get rflags, hr=%08lx", hr);
+return hr;
+}
+
+if (exit_context_rflags) {
+assert(*exit_context_rflags == reg_value.Reg64);
+}
+
+if (set) {
+/* Raise WHvX64ExceptionTypeDebugTrapOrFault after each instruction */
+reg_value.Reg64 |= TF_MASK;
+} else {
+reg_value.Reg64 &= ~TF_MASK;
+}
+
+if (exit_context_rflags) {
+*exit_context_rflags = reg_value.Reg64;

[PATCH 2/3] whpx: Fixed incorrect CR8/TPR synchronization

2022-02-22 Thread Ivan Shcherbakov
This fixes the following error triggered when stopping and resuming a 64-bit
Linux kernel via gdb:

qemu-system-x86_64.exe: WHPX: Failed to set virtual processor context,
hr=c0350005

The previous logic for synchronizing the values did not take into account
that the lower 4 bits of the CR8 register, containing the priority level,
map to bits 7:4 of the APIC.TPR register (see section 10.8.6.1 of Volume 3
of the Intel 64 and IA-32 Architectures Software Developer's Manual).
This caused WHvSetVirtualProcessorRegisters() to fail with an error,
effectively preventing GDB from changing the guest context.

Signed-off-by: Ivan Shcherbakov 
---
 target/i386/whpx/whpx-all.c | 49 +++--
 1 file changed, 41 insertions(+), 8 deletions(-)

diff --git a/target/i386/whpx/whpx-all.c b/target/i386/whpx/whpx-all.c
index edd4fafbdf..8a8b5d55d1 100644
--- a/target/i386/whpx/whpx-all.c
+++ b/target/i386/whpx/whpx-all.c
@@ -256,6 +256,28 @@ static int whpx_set_tsc(CPUState *cpu)
 return 0;
 }
 
+/*
+ * The CR8 register in the CPU is mapped to the TPR register of the APIC,
+ * however, they use a slightly different encoding. Specifically:
+ *
+ * APIC.TPR[bits 7:4] = CR8[bits 3:0]
+ *
+ * This mechanism is described in section 10.8.6.1 of Volume 3 of Intel 64
+ * and IA-32 Architectures Software Developer's Manual.
+ *
+ * The functions below translate the value of CR8 to TPR and vice versa.
+ */
+
+static uint64_t whpx_apic_tpr_to_cr8(uint64_t tpr)
+{
+return tpr >> 4;
+}
+
+static uint64_t whpx_cr8_to_apic_tpr(uint64_t cr8)
+{
+return cr8 << 4;
+}
+
 static void whpx_set_registers(CPUState *cpu, int level)
 {
 struct whpx_state *whpx = &whpx_global;
@@ -284,7 +306,7 @@ static void whpx_set_registers(CPUState *cpu, int level)
 v86 = (env->eflags & VM_MASK);
 r86 = !(env->cr[0] & CR0_PE_MASK);
 
-vcpu->tpr = cpu_get_apic_tpr(x86_cpu->apic_state);
+vcpu->tpr = whpx_apic_tpr_to_cr8(cpu_get_apic_tpr(x86_cpu->apic_state));
 vcpu->apic_base = cpu_get_apic_base(x86_cpu->apic_state);
 
 idx = 0;
@@ -475,6 +497,17 @@ static void whpx_get_registers(CPUState *cpu)
  hr);
 }
 
+if (whpx_apic_in_platform()) {
+/*
+ * Fetch the TPR value from the emulated APIC. It may get overwritten
+ * below with the value from CR8 returned by
+ * WHvGetVirtualProcessorRegisters().
+ */
+whpx_apic_get(x86_cpu->apic_state);
+vcpu->tpr = whpx_apic_tpr_to_cr8(
+cpu_get_apic_tpr(x86_cpu->apic_state));
+}
+
 idx = 0;
 
 /* Indexes for first 16 registers match between HV and QEMU definitions */
@@ -521,8 +554,12 @@ static void whpx_get_registers(CPUState *cpu)
 assert(whpx_register_names[idx] == WHvX64RegisterCr8);
 tpr = vcxt.values[idx++].Reg64;
 if (tpr != vcpu->tpr) {
+/*
+ * TPR value stored in the CR8 register doesn't match the one fetched
+ * from the emulated APIC. Override the latter with the former.
+ */
 vcpu->tpr = tpr;
-cpu_set_apic_tpr(x86_cpu->apic_state, tpr);
+cpu_set_apic_tpr(x86_cpu->apic_state, whpx_cr8_to_apic_tpr(tpr));
 }
 
 /* 8 Debug Registers - Skipped */
@@ -600,10 +637,6 @@ static void whpx_get_registers(CPUState *cpu)
 
 assert(idx == RTL_NUMBER_OF(whpx_register_names));
 
-if (whpx_apic_in_platform()) {
-whpx_apic_get(x86_cpu->apic_state);
-}
-
 x86_update_hflags(env);
 
 return;
@@ -865,7 +898,7 @@ static void whpx_vcpu_pre_run(CPUState *cpu)
  }
 
 /* Sync the TPR to the CR8 if was modified during the intercept */
-tpr = cpu_get_apic_tpr(x86_cpu->apic_state);
+tpr = whpx_apic_tpr_to_cr8(cpu_get_apic_tpr(x86_cpu->apic_state));
 if (tpr != vcpu->tpr) {
 vcpu->tpr = tpr;
 reg_values[reg_count].Reg64 = tpr;
@@ -914,7 +947,7 @@ static void whpx_vcpu_post_run(CPUState *cpu)
 if (vcpu->tpr != tpr) {
 vcpu->tpr = tpr;
 qemu_mutex_lock_iothread();
-cpu_set_apic_tpr(x86_cpu->apic_state, vcpu->tpr);
+cpu_set_apic_tpr(x86_cpu->apic_state, whpx_cr8_to_apic_tpr(vcpu->tpr));
 qemu_mutex_unlock_iothread();
 }
 
-- 
2.29.2.windows.3





[PATCH 1/3] whpx: Fixed reporting of the CPU context to GDB for 64-bit

2022-02-22 Thread Ivan Shcherbakov
Hi All,

We have been looking into kernel-debugging Linux VMs running on Windows with
Hyper-V enabled (which forces the virtualization software to use WHPX), and
it turned out that none of the major virtualization tools supports it properly.
I've added the missing parts to QEMU and it looks pretty solid: setting
breakpoints in the kernel, running, stepping in/over works reliably and
fast.
The changes involved 3 parts:
1. Fixing the x64 register reporting to gdb (this patch)
2. Fixing synchronization of CR8 <=> APIC.TPR, that was preventing
WHvSetVirtualProcessorRegisters() from working
3. Implementing software breakpoints 

It would be great if the changes could be integrated into the QEMU
repository, allowing other Windows users to debug their VMs.
Below is the description of the first patch.

This change makes sure that stopping in the 64-bit mode will set the
HF_CS64_MASK flag in env->hflags (see x86_update_hflags() in
target/i386/cpu.c).
Without it, the code in gdbstub.c would only use the 32-bit register values
when debugging 64-bit targets, making debugging effectively impossible.

Signed-off-by: Ivan Shcherbakov 
---
 target/i386/whpx/whpx-all.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/target/i386/whpx/whpx-all.c b/target/i386/whpx/whpx-all.c
index ef896da0a2..edd4fafbdf 100644
--- a/target/i386/whpx/whpx-all.c
+++ b/target/i386/whpx/whpx-all.c
@@ -604,6 +604,8 @@ static void whpx_get_registers(CPUState *cpu)
 whpx_apic_get(x86_cpu->apic_state);
 }
 
+x86_update_hflags(env);
+
 return;
 }
 
--