Re: [PATCH v3 5/6] Move tcg implementation of x86 get_physical_address into common helper code.

2024-06-15 Thread Don Porter

On 6/7/24 1:03 PM, Richard Henderson wrote:

On 6/6/24 07:02, Don Porter wrote:

Signed-off-by: Don Porter 
---
  target/i386/cpu.h    |  42 ++
  target/i386/helper.c | 515 +
  target/i386/tcg/sysemu/excp_helper.c | 555 +--
  3 files changed, 562 insertions(+), 550 deletions(-)


Why do you want to move this code out of a tcg specific file?
I think this is definitely wrong. 


I want to share the get_physical_address() code with 
x86_cpu_get_phys_page_attrs_debug().


I assume that newly shared code should migrate into a common file, but 
happy to adopt another strategy.


Best,

Don




Re: [PATCH v3 1/6] Add an "info pg" command that prints the current page tables

2024-06-14 Thread Don Porter

On 6/7/24 1:43 PM, Richard Henderson wrote:

On 6/6/24 07:02, Don Porter wrote:

+/**
+ * get_pte - Copy the contents of the page table entry at node[i] 
into pt_entry.
+ *   Optionally, add the relevant bits to the virtual 
address in

+ *   vaddr_pte.
+ *
+ * @cs - CPU state
+ * @node - physical address of the current page table node
+ * @i - index (in page table entries, not bytes) of the page table
+ *  entry, within node
+ * @height - height of node within the tree (leaves are 1, not 0)
+ * @pt_entry - Poiter to a PTE_t, stores the contents of the page 
table entry
+ * @vaddr_parent - The virtual address bits already translated in 
walking the
+ * page table to node.  Optional: only used if 
vaddr_pte is set.
+ * @vaddr_pte - Optional pointer to a variable storing the virtual 
address bits

+ *  translated by node[i].
+ * @pte_paddr - Pointer to the physical address of the PTE within node.
+ *  Optional parameter.
+ */
+void
+x86_get_pte(CPUState *cs, hwaddr node, int i, int height,
+    PTE_t *pt_entry, vaddr vaddr_parent, vaddr *vaddr_pte,
+    hwaddr *pte_paddr)
+
+{
+    X86CPU *cpu = X86_CPU(cs);
+    CPUX86State *env = >env;
+    int32_t a20_mask = x86_get_a20_mask(env);
+    hwaddr pte;
+
+    if (env->hflags & HF_LMA_MASK) {
+    /* 64 bit */
+    int pte_width = 8;
+    pte = (node + (i * pte_width)) & a20_mask;
+    pt_entry->pte64_t = address_space_ldq(cs->as, pte,
+ MEMTXATTRS_UNSPECIFIED, NULL);
+    } else {
+    /* 32 bit */
+    int pte_width = 4;
+    pte = (node + (i * pte_width)) & a20_mask;
+    pt_entry->pte32_t = address_space_ldl(cs->as, pte,
+ MEMTXATTRS_UNSPECIFIED, NULL);
+    }
+
+    if (vaddr_pte) {
+    int shift = 0;
+    _mmu_decode_va_parameters(cs, height, , NULL);
+    *vaddr_pte = vaddr_parent | ((i & 0x1ffULL) << shift);
+    }
+
+    if (pte_paddr) {
+    *pte_paddr = pte;
+    }
+}


This fails to recurse with nested page tables, which definitely breaks 
the TCG walker.


Hi Richard,

Thank you again for all of the advice and feedback.

What do you think the correct semantics should be for nested paging?

My understanding is that the current 'info mem' command on x86 does not 
recur on nested page tables, but this does seem like a useful 
extension.  Same for 'info tlb'.


In the case of 'info pg', I might want to print each page table 
separately, rather than a combined/shadow view.


My reading of the tcg code is that it also walks the guest page tables 
first, then uses probe_access_full() to query the guest->host physical 
translation, but I may be misunderstanding the code: 
https://github.com/qemu/qemu/blob/046a64b9801343e2e89eef10c7a48eec8d8c0d4f/target/i386/tcg/sysemu/excp_helper.c#L432


In patch 6 of the series, I replace the chunk of mmu_translate() in tcg 
that walks the guest page tables, but leave the portion alone that does 
the nested page walk.


I'm open to implementing a nested walker, but it might be better not to 
add more functionality/changes to this patch series.


Thank you,

Don




Re: [PATCH v3 1/6] Add an "info pg" command that prints the current page tables

2024-06-14 Thread Don Porter

Hi Richard,

Thank you for all of the helpful comments!  v4 will be much cleaner as a 
result.


A few follow-ups inline.

On 6/7/24 12:57 PM, Richard Henderson wrote:

On 6/6/24 07:02, Don Porter wrote:

+
+    /**
+ * @mon_init_page_table_iterator: Callback to configure a page 
table

+ * iterator for use by a monitor function.
+ * Returns true on success, false if not supported (e.g., no 
paging disabled

+ * or not implemented on this CPU).
+ */
+    bool (*mon_init_page_table_iterator)(Monitor *mon,
+ struct mem_print_state 
*state);


I don't understand the purpose of this one as a target-specific hook.


The iterator needs some architecture-specific initialization, such as 
getting the virtual

and physical address width.




+    /**
+ * @flush_page_table_iterator_state: Prints the last entry,
+ * if one is present.  Useful for iterators that aggregate 
information

+ * across page table entries.
+ */
+    bool (*mon_flush_page_print_state)(CPUState *cs,
+   struct mem_print_state *state);


Is this specific to "info pg" or not?
It appears to be, but the description suggests it is not.


It is.

Thank you,

Don






Re: [PATCH v3 1/6] Add an "info pg" command that prints the current page tables

2024-06-11 Thread Don Porter

On 6/7/24 3:16 AM, Daniel P. Berrangé wrote:

On Thu, Jun 06, 2024 at 10:02:48AM -0400, Don Porter wrote:
Please don't add new HMP commands that don't have a QMP
equivalent.

This should be adding an 'x-query-pg' QMP command, which
returns HumanReadableText, and then call that from the HMP

There is guidance on this here:

   
https://www.qemu.org/docs/master/devel/writing-monitor-commands.html#writing-a-debugging-aid-returning-unstructured-text

If you need more real examples, look at the various
'x-query-' commands in qapi/machine.json  and
their impl.


Thank you both for the pointers.  This makes sense to me;
outputting a string is much cleaner.  Will implement in v4...

-dp




[PATCH v3 5/6] Move tcg implementation of x86 get_physical_address into common helper code.

2024-06-06 Thread Don Porter
Signed-off-by: Don Porter 
---
 target/i386/cpu.h|  42 ++
 target/i386/helper.c | 515 +
 target/i386/tcg/sysemu/excp_helper.c | 555 +--
 3 files changed, 562 insertions(+), 550 deletions(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 1e463cc556..2c7cfe7901 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -2136,6 +2136,43 @@ struct X86CPUClass {
 ResettablePhases parent_phases;
 };
 
+
+typedef struct X86TranslateParams {
+target_ulong addr;
+target_ulong cr3;
+int pg_mode;
+int mmu_idx;
+int ptw_idx;
+MMUAccessType access_type;
+} X86TranslateParams;
+
+typedef struct X86TranslateResult {
+hwaddr paddr;
+int prot;
+int page_size;
+} X86TranslateResult;
+
+typedef enum X86TranslateFaultStage2 {
+S2_NONE,
+S2_GPA,
+S2_GPT,
+} X86TranslateFaultStage2;
+
+typedef struct X86TranslateFault {
+int exception_index;
+int error_code;
+target_ulong cr2;
+X86TranslateFaultStage2 stage2;
+} X86TranslateFault;
+
+typedef struct X86PTETranslate {
+CPUX86State *env;
+X86TranslateFault *err;
+int ptw_idx;
+void *haddr;
+hwaddr gaddr;
+} X86PTETranslate;
+
 #ifndef CONFIG_USER_ONLY
 extern const VMStateDescription vmstate_x86_cpu;
 #endif
@@ -2180,6 +2217,11 @@ void x86_cpu_list(void);
 int cpu_x86_support_mca_broadcast(CPUX86State *env);
 
 #ifndef CONFIG_USER_ONLY
+bool x86_cpu_get_physical_address(CPUX86State *env, vaddr addr,
+  MMUAccessType access_type, int mmu_idx,
+  X86TranslateResult *out,
+  X86TranslateFault *err, uint64_t ra);
+
 hwaddr x86_cpu_get_phys_page_attrs_debug(CPUState *cpu, vaddr addr,
  MemTxAttrs *attrs);
 int cpu_get_pic_interrupt(CPUX86State *s);
diff --git a/target/i386/helper.c b/target/i386/helper.c
index f9d1381f90..746570a442 100644
--- a/target/i386/helper.c
+++ b/target/i386/helper.c
@@ -26,6 +26,7 @@
 #include "sysemu/hw_accel.h"
 #include "monitor/monitor.h"
 #include "kvm/kvm_i386.h"
+#include "exec/cpu_ldst.h"
 #endif
 #include "qemu/log.h"
 #ifdef CONFIG_TCG
@@ -231,6 +232,520 @@ void cpu_x86_update_cr4(CPUX86State *env, uint32_t 
new_cr4)
 }
 
 #if !defined(CONFIG_USER_ONLY)
+
+static inline uint32_t ptw_ldl(const X86PTETranslate *in, uint64_t ra)
+{
+if (likely(in->haddr)) {
+return ldl_p(in->haddr);
+}
+return cpu_ldl_mmuidx_ra(in->env, in->gaddr, in->ptw_idx, ra);
+}
+
+static inline uint64_t ptw_ldq(const X86PTETranslate *in, uint64_t ra)
+{
+if (likely(in->haddr)) {
+return ldq_p(in->haddr);
+}
+return cpu_ldq_mmuidx_ra(in->env, in->gaddr, in->ptw_idx, ra);
+}
+/*
+ * Note that we can use a 32-bit cmpxchg for all page table entries,
+ * even 64-bit ones, because PG_PRESENT_MASK, PG_ACCESSED_MASK and
+ * PG_DIRTY_MASK are all in the low 32 bits.
+ */
+static bool ptw_setl_slow(const X86PTETranslate *in, uint32_t old, uint32_t 
new)
+{
+uint32_t cmp;
+
+/* Does x86 really perform a rmw cycle on mmio for ptw? */
+start_exclusive();
+cmp = cpu_ldl_mmuidx_ra(in->env, in->gaddr, in->ptw_idx, 0);
+if (cmp == old) {
+cpu_stl_mmuidx_ra(in->env, in->gaddr, new, in->ptw_idx, 0);
+}
+end_exclusive();
+return cmp == old;
+}
+
+static inline bool ptw_setl(const X86PTETranslate *in, uint32_t old,
+uint32_t set)
+{
+if (set & ~old) {
+uint32_t new = old | set;
+if (likely(in->haddr)) {
+old = cpu_to_le32(old);
+new = cpu_to_le32(new);
+return qatomic_cmpxchg((uint32_t *)in->haddr, old, new) == old;
+}
+return ptw_setl_slow(in, old, new);
+}
+return true;
+}
+
+
+static bool ptw_translate(X86PTETranslate *inout, hwaddr addr, uint64_t ra)
+{
+CPUTLBEntryFull *full;
+int flags;
+
+inout->gaddr = addr;
+flags = probe_access_full(inout->env, addr, 0, MMU_DATA_STORE,
+  inout->ptw_idx, true, >haddr, , ra);
+
+if (unlikely(flags & TLB_INVALID_MASK)) {
+X86TranslateFault *err = inout->err;
+
+assert(inout->ptw_idx == MMU_NESTED_IDX);
+*err = (X86TranslateFault){
+.error_code = inout->env->error_code,
+.cr2 = addr,
+.stage2 = S2_GPT,
+};
+return false;
+}
+return true;
+}
+
+static bool x86_mmu_translate(CPUX86State *env, const X86TranslateParams *in,
+  X86TranslateResult *out,
+  X86TranslateFault *err, uint64_t ra)
+{
+const target_ulong addr = in->addr;
+const int pg_mode = in->pg_mode;
+const bool is_user = is_mmu_index_user(in->mmu_idx);
+

[PATCH v3 6/6] Convert x86_mmu_translate() to use common code.

2024-06-06 Thread Don Porter
Signed-off-by: Don Porter 
---
 target/i386/arch_memory_mapping.c|  44 +++-
 target/i386/cpu.h|   5 +-
 target/i386/helper.c | 374 +++
 target/i386/tcg/sysemu/excp_helper.c |   2 +-
 4 files changed, 129 insertions(+), 296 deletions(-)

diff --git a/target/i386/arch_memory_mapping.c 
b/target/i386/arch_memory_mapping.c
index b52e98133c..bccd290b9f 100644
--- a/target/i386/arch_memory_mapping.c
+++ b/target/i386/arch_memory_mapping.c
@@ -228,9 +228,38 @@ static void _mmu_decode_va_parameters(CPUState *cs, int 
height,
 }
 
 /**
- * get_pte - Copy the contents of the page table entry at node[i] into 
pt_entry.
- *   Optionally, add the relevant bits to the virtual address in
- *   vaddr_pte.
+ * x86_virtual_to_pte_index - Given a virtual address and height in
+ *   the page table radix tree, return the index that should be
+ *   used to look up the next page table entry (pte) in
+ *   translating an address.
+ *
+ * @cs - CPU state
+ * @vaddr - The virtual address to translate
+ * @height - height of node within the tree (leaves are 1, not 0).
+ *
+ * Example: In 32-bit x86 page tables, the virtual address is split
+ * into 10 bits at height 2, 10 bits at height 1, and 12 offset bits.
+ * So a call with VA and height 2 would return the first 10 bits of va,
+ * right shifted by 22.
+ */
+
+int x86_virtual_to_pte_index(CPUState *cs, target_ulong vaddr, int height)
+{
+int shift = 0;
+int width = 0;
+int mask = 0;
+
+_mmu_decode_va_parameters(cs, height, , );
+
+mask = (1 << width) - 1;
+
+return (vaddr >> shift) & mask;
+}
+
+/**
+ * x86_get_pte - Copy the contents of the page table entry at node[i]
+ *   into pt_entry.  Optionally, add the relevant bits to
+ *   the virtual address in vaddr_pte.
  *
  * @cs - CPU state
  * @node - physical address of the current page table node
@@ -249,7 +278,6 @@ void
 x86_get_pte(CPUState *cs, hwaddr node, int i, int height,
 PTE_t *pt_entry, vaddr vaddr_parent, vaddr *vaddr_pte,
 hwaddr *pte_paddr)
-
 {
 X86CPU *cpu = X86_CPU(cs);
 CPUX86State *env = >env;
@@ -282,8 +310,8 @@ x86_get_pte(CPUState *cs, hwaddr node, int i, int height,
 }
 
 
-static bool
-mmu_pte_check_bits(CPUState *cs, PTE_t *pte, int64_t mask)
+bool
+x86_pte_check_bits(CPUState *cs, PTE_t *pte, int64_t mask)
 {
 X86CPU *cpu = X86_CPU(cs);
 CPUX86State *env = >env;
@@ -300,7 +328,7 @@ mmu_pte_check_bits(CPUState *cs, PTE_t *pte, int64_t mask)
 bool
 x86_pte_present(CPUState *cs, PTE_t *pte)
 {
-return mmu_pte_check_bits(cs, pte, PG_PRESENT_MASK);
+return x86_pte_check_bits(cs, pte, PG_PRESENT_MASK);
 }
 
 /**
@@ -312,7 +340,7 @@ x86_pte_present(CPUState *cs, PTE_t *pte)
 bool
 x86_pte_leaf(CPUState *cs, int height, PTE_t *pte)
 {
-return height == 1 || mmu_pte_check_bits(cs, pte, PG_PSE_MASK);
+return height == 1 || x86_pte_check_bits(cs, pte, PG_PSE_MASK);
 }
 
 /**
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 2c7cfe7901..978841a624 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -2198,6 +2198,8 @@ bool x86_pte_present(CPUState *cs, PTE_t *pte);
 bool x86_pte_leaf(CPUState *cs, int height, PTE_t *pte);
 hwaddr x86_pte_child(CPUState *cs, PTE_t *pte, int height);
 uint64_t x86_pte_flags(uint64_t pte);
+bool x86_pte_check_bits(CPUState *cs, PTE_t *pte, int64_t mask);
+int x86_virtual_to_pte_index(CPUState *cs, target_ulong vaddr, int height);
 bool x86_cpu_get_memory_mapping(CPUState *cpu, MemoryMappingList *list,
 Error **errp);
 bool x86_mon_init_page_table_iterator(Monitor *mon,
@@ -2220,7 +,8 @@ int cpu_x86_support_mca_broadcast(CPUX86State *env);
 bool x86_cpu_get_physical_address(CPUX86State *env, vaddr addr,
   MMUAccessType access_type, int mmu_idx,
   X86TranslateResult *out,
-  X86TranslateFault *err, uint64_t ra);
+  X86TranslateFault *err, uint64_t ra,
+  bool read_only);
 
 hwaddr x86_cpu_get_phys_page_attrs_debug(CPUState *cpu, vaddr addr,
  MemTxAttrs *attrs);
diff --git a/target/i386/helper.c b/target/i386/helper.c
index 746570a442..4e5467ee57 100644
--- a/target/i386/helper.c
+++ b/target/i386/helper.c
@@ -308,7 +308,8 @@ static bool ptw_translate(X86PTETranslate *inout, hwaddr 
addr, uint64_t ra)
 
 static bool x86_mmu_translate(CPUX86State *env, const X86TranslateParams *in,
   X86TranslateResult *out,
-  X86TranslateFault *err, uint64_t ra)
+  X86TranslateFault *err, uint64_t ra,
+  bool read_only)
 {
 const target_ulong addr = in->addr;
 const int pg_mode = in->pg_mode;
@@ -324,6 +325,10 @@ s

[PATCH v3 3/6] Convert 'info mem' to use generic iterator

2024-06-06 Thread Don Porter
Signed-off-by: Don Porter 
---
 include/hw/core/sysemu-cpu-ops.h |   6 +
 include/monitor/monitor.h|   4 +
 monitor/hmp-cmds-target.c|   5 +-
 target/i386/cpu.c|   1 +
 target/i386/cpu.h|   1 +
 target/i386/monitor.c| 354 ---
 6 files changed, 60 insertions(+), 311 deletions(-)

diff --git a/include/hw/core/sysemu-cpu-ops.h b/include/hw/core/sysemu-cpu-ops.h
index bf3de3e004..3bef129460 100644
--- a/include/hw/core/sysemu-cpu-ops.h
+++ b/include/hw/core/sysemu-cpu-ops.h
@@ -250,6 +250,12 @@ typedef struct SysemuCPUOps {
 void (*mon_print_pte) (Monitor *mon, CPUArchState *env, hwaddr addr,
hwaddr pte);
 
+/**
+ * @mon_print_mem: Hook called by the monitor to print a range
+ * of memory mappings in 'info mem'
+ */
+bool (*mon_print_mem)(CPUState *cs, struct mem_print_state *state);
+
 } SysemuCPUOps;
 
 #endif /* SYSEMU_CPU_OPS_H */
diff --git a/include/monitor/monitor.h b/include/monitor/monitor.h
index 965f5d5450..e954946ba0 100644
--- a/include/monitor/monitor.h
+++ b/include/monitor/monitor.h
@@ -5,6 +5,7 @@
 #include "qapi/qapi-types-misc.h"
 #include "qemu/readline.h"
 #include "exec/hwaddr.h"
+#include "hw/core/cpu.h"
 
 typedef struct MonitorHMP MonitorHMP;
 typedef struct MonitorOptions MonitorOptions;
@@ -63,4 +64,7 @@ void monitor_register_hmp_info_hrt(const char *name,
 int error_vprintf_unless_qmp(const char *fmt, va_list ap) G_GNUC_PRINTF(1, 0);
 int error_printf_unless_qmp(const char *fmt, ...) G_GNUC_PRINTF(1, 2);
 
+int compressing_iterator(CPUState *cs, void *data, PTE_t *pte, vaddr vaddr_in,
+ int height, int offset);
+
 #endif /* MONITOR_H */
diff --git a/monitor/hmp-cmds-target.c b/monitor/hmp-cmds-target.c
index 3393e5ad0b..8ce37d3187 100644
--- a/monitor/hmp-cmds-target.c
+++ b/monitor/hmp-cmds-target.c
@@ -122,9 +122,8 @@ void hmp_info_registers(Monitor *mon, const QDict *qdict)
 }
 
 /* Assume only called on present entries */
-static
-int compressing_iterator(CPUState *cs, void *data, PTE_t *pte,
- vaddr vaddr_in, int height, int offset)
+int compressing_iterator(CPUState *cs, void *data, PTE_t *pte, vaddr vaddr_in,
+ int height, int offset)
 {
 CPUClass *cc = CPU_GET_CLASS(cs);
 struct mem_print_state *state = (struct mem_print_state *) data;
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 8bd6164b68..046d75f6bb 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -8316,6 +8316,7 @@ static const struct SysemuCPUOps i386_sysemu_ops = {
 .mon_init_page_table_iterator = _mon_init_page_table_iterator,
 .mon_info_pg_print_header = _mon_info_pg_print_header,
 .mon_flush_page_print_state = _mon_flush_print_pg_state,
+.mon_print_mem = _mon_print_mem,
 };
 #endif
 
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 1346ec0033..1e463cc556 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -2169,6 +2169,7 @@ void x86_mon_info_pg_print_header(Monitor *mon, struct 
mem_print_state *state);
 bool x86_mon_flush_print_pg_state(CPUState *cs, struct mem_print_state *state);
 void x86_mon_print_pte(Monitor *mon, CPUArchState *env, hwaddr addr,
hwaddr pte);
+bool x86_mon_print_mem(CPUState *cs, struct mem_print_state *state);
 
 void x86_cpu_dump_state(CPUState *cs, FILE *f, int flags);
 
diff --git a/target/i386/monitor.c b/target/i386/monitor.c
index ecde164857..215c018d1f 100644
--- a/target/i386/monitor.c
+++ b/target/i386/monitor.c
@@ -281,332 +281,70 @@ void hmp_info_tlb(Monitor *mon, const QDict *qdict)
 for_each_pte(cs, _print_tlb, , false, false);
 }
 
-static void mem_print(Monitor *mon, CPUArchState *env,
-  hwaddr *pstart, int *plast_prot,
-  hwaddr end, int prot)
+bool x86_mon_print_mem(CPUState *cs, struct mem_print_state *state)
 {
-int prot1;
-prot1 = *plast_prot;
-if (prot != prot1) {
-if (*pstart != -1) {
-monitor_printf(mon, HWADDR_FMT_plx "-" HWADDR_FMT_plx " "
-   HWADDR_FMT_plx " %c%c%c\n",
-   addr_canonical(env, *pstart),
-   addr_canonical(env, end),
-   addr_canonical(env, end - *pstart),
-   prot1 & PG_USER_MASK ? 'u' : '-',
-   'r',
-   prot1 & PG_RW_MASK ? 'w' : '-');
-}
-if (prot != 0)
-*pstart = end;
-else
-*pstart = -1;
-*plast_prot = prot;
-}
-}
+CPUArchState *env = state->env;
+int i = 0;
 
-static void mem_info_32(Monitor *mon, CPUArchState *env)
-{
-unsigned int l1, l2;
-int prot, last_prot;
-uint32_t pgd, pde, pte;
-hwaddr start, end;
-
-pgd = env->

[PATCH v3 2/6] Convert 'info tlb' to use generic iterator

2024-06-06 Thread Don Porter
Signed-off-by: Don Porter 
---
 include/hw/core/sysemu-cpu-ops.h |   7 +
 monitor/hmp-cmds-target.c|   1 +
 target/i386/cpu.h|   2 +
 target/i386/monitor.c| 217 ++-
 4 files changed, 53 insertions(+), 174 deletions(-)

diff --git a/include/hw/core/sysemu-cpu-ops.h b/include/hw/core/sysemu-cpu-ops.h
index eb16a1c3e2..bf3de3e004 100644
--- a/include/hw/core/sysemu-cpu-ops.h
+++ b/include/hw/core/sysemu-cpu-ops.h
@@ -243,6 +243,13 @@ typedef struct SysemuCPUOps {
 bool (*mon_flush_page_print_state)(CPUState *cs,
struct mem_print_state *state);
 
+/**
+ * @mon_print_pte: Hook called by the monitor to print a page
+ * table entry at address addr, with contents pte.
+ */
+void (*mon_print_pte) (Monitor *mon, CPUArchState *env, hwaddr addr,
+   hwaddr pte);
+
 } SysemuCPUOps;
 
 #endif /* SYSEMU_CPU_OPS_H */
diff --git a/monitor/hmp-cmds-target.c b/monitor/hmp-cmds-target.c
index 60a8bd0c37..3393e5ad0b 100644
--- a/monitor/hmp-cmds-target.c
+++ b/monitor/hmp-cmds-target.c
@@ -318,6 +318,7 @@ void hmp_info_pg(Monitor *mon, const QDict *qdict)
 /* Print last entry, if one present */
 cc->sysemu_ops->mon_flush_page_print_state(cs, );
 }
+
 static void memory_dump(Monitor *mon, int count, int format, int wsize,
 hwaddr addr, int is_physical)
 {
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index cbb6f6fc4d..1346ec0033 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -2167,6 +2167,8 @@ bool x86_mon_init_page_table_iterator(Monitor *mon,
   struct mem_print_state *state);
 void x86_mon_info_pg_print_header(Monitor *mon, struct mem_print_state *state);
 bool x86_mon_flush_print_pg_state(CPUState *cs, struct mem_print_state *state);
+void x86_mon_print_pte(Monitor *mon, CPUArchState *env, hwaddr addr,
+   hwaddr pte);
 
 void x86_cpu_dump_state(CPUState *cs, FILE *f, int flags);
 
diff --git a/target/i386/monitor.c b/target/i386/monitor.c
index 65e82e73e8..ecde164857 100644
--- a/target/i386/monitor.c
+++ b/target/i386/monitor.c
@@ -214,202 +214,71 @@ static hwaddr addr_canonical(CPUArchState *env, hwaddr 
addr)
 return addr;
 }
 
-static void print_pte(Monitor *mon, CPUArchState *env, hwaddr addr,
-  hwaddr pte, hwaddr mask)
+void x86_mon_print_pte(Monitor *mon, CPUArchState *env, hwaddr addr,
+   hwaddr pte)
 {
+char buf[128];
+char *pos = buf, *end = buf + sizeof(buf);
+
 addr = addr_canonical(env, addr);
 
-monitor_printf(mon, HWADDR_FMT_plx ": " HWADDR_FMT_plx
-   " %c%c%c%c%c%c%c%c%c\n",
-   addr,
-   pte & mask,
-   pte & PG_NX_MASK ? 'X' : '-',
-   pte & PG_GLOBAL_MASK ? 'G' : '-',
-   pte & PG_PSE_MASK ? 'P' : '-',
-   pte & PG_DIRTY_MASK ? 'D' : '-',
-   pte & PG_ACCESSED_MASK ? 'A' : '-',
-   pte & PG_PCD_MASK ? 'C' : '-',
-   pte & PG_PWT_MASK ? 'T' : '-',
-   pte & PG_USER_MASK ? 'U' : '-',
-   pte & PG_RW_MASK ? 'W' : '-');
-}
+pos += snprintf(pos, end - pos, HWADDR_FMT_plx ": " HWADDR_FMT_plx " ",
+addr, (hwaddr) (pte & PG_ADDRESS_MASK));
 
-static void tlb_info_32(Monitor *mon, CPUArchState *env)
-{
-unsigned int l1, l2;
-uint32_t pgd, pde, pte;
+pos += snprintf(pos, end - pos, " %s", pg_bits(pte));
 
-pgd = env->cr[3] & ~0xfff;
-for(l1 = 0; l1 < 1024; l1++) {
-cpu_physical_memory_read(pgd + l1 * 4, , 4);
-pde = le32_to_cpu(pde);
-if (pde & PG_PRESENT_MASK) {
-if ((pde & PG_PSE_MASK) && (env->cr[4] & CR4_PSE_MASK)) {
-/* 4M pages */
-print_pte(mon, env, (l1 << 22), pde, ~((1 << 21) - 1));
-} else {
-for(l2 = 0; l2 < 1024; l2++) {
-cpu_physical_memory_read((pde & ~0xfff) + l2 * 4, , 4);
-pte = le32_to_cpu(pte);
-if (pte & PG_PRESENT_MASK) {
-print_pte(mon, env, (l1 << 22) + (l2 << 12),
-  pte & ~PG_PSE_MASK,
-  ~0xfff);
-}
-}
-}
-}
+/* Trim line to fit screen */
+if (pos - buf > 79) {
+strcpy(buf + 77, "..");
 }
-}
 
-static void tlb_info_pae32(Monitor *mon, CPUArchState *env)
-{
-unsigned int l1, l2, l3;
-uint64_t pdpe, pde, pte;
-uint64_t pdp_addr, pd_addr, pt_addr;
-
-pdp_addr = env->cr[3] & ~0x1f;
-for (l1 = 0; l1 < 4; l1++) {
-   

[PATCH v3 4/6] Convert x86_cpu_get_memory_mapping() to use generic iterators

2024-06-06 Thread Don Porter
Signed-off-by: Don Porter 
---
 target/i386/arch_memory_mapping.c | 320 --
 1 file changed, 43 insertions(+), 277 deletions(-)

diff --git a/target/i386/arch_memory_mapping.c 
b/target/i386/arch_memory_mapping.c
index 562a00b5a7..b52e98133c 100644
--- a/target/i386/arch_memory_mapping.c
+++ b/target/i386/arch_memory_mapping.c
@@ -19,6 +19,7 @@
  ** code hook implementations for x86 ***
  */
 
+/* PAE Paging or IA-32e Paging */
 #define PML4_ADDR_MASK 0xff000ULL /* selects bits 51:12 */
 
 /**
@@ -365,301 +366,66 @@ x86_pte_child(CPUState *cs, PTE_t *pte, int height)
 return -1;
 }
 
-/* PAE Paging or IA-32e Paging */
-static void walk_pte(MemoryMappingList *list, AddressSpace *as,
- hwaddr pte_start_addr,
- int32_t a20_mask, target_ulong start_line_addr)
-{
-hwaddr pte_addr, start_paddr;
-uint64_t pte;
-target_ulong start_vaddr;
-int i;
-
-for (i = 0; i < 512; i++) {
-pte_addr = (pte_start_addr + i * 8) & a20_mask;
-pte = address_space_ldq(as, pte_addr, MEMTXATTRS_UNSPECIFIED, NULL);
-if (!(pte & PG_PRESENT_MASK)) {
-/* not present */
-continue;
-}
-
-start_paddr = (pte & ~0xfff) & ~(0x1ULL << 63);
-if (cpu_physical_memory_is_io(start_paddr)) {
-/* I/O region */
-continue;
-}
-
-start_vaddr = start_line_addr | ((i & 0x1ff) << 12);
-memory_mapping_list_add_merge_sorted(list, start_paddr,
- start_vaddr, 1 << 12);
-}
-}
-
-/* 32-bit Paging */
-static void walk_pte2(MemoryMappingList *list, AddressSpace *as,
-  hwaddr pte_start_addr, int32_t a20_mask,
-  target_ulong start_line_addr)
-{
-hwaddr pte_addr, start_paddr;
-uint32_t pte;
-target_ulong start_vaddr;
-int i;
-
-for (i = 0; i < 1024; i++) {
-pte_addr = (pte_start_addr + i * 4) & a20_mask;
-pte = address_space_ldl(as, pte_addr, MEMTXATTRS_UNSPECIFIED, NULL);
-if (!(pte & PG_PRESENT_MASK)) {
-/* not present */
-continue;
-}
-
-start_paddr = pte & ~0xfff;
-if (cpu_physical_memory_is_io(start_paddr)) {
-/* I/O region */
-continue;
-}
-
-start_vaddr = start_line_addr | ((i & 0x3ff) << 12);
-memory_mapping_list_add_merge_sorted(list, start_paddr,
- start_vaddr, 1 << 12);
-}
-}
-
-/* PAE Paging or IA-32e Paging */
-#define PLM4_ADDR_MASK 0xff000ULL /* selects bits 51:12 */
+/**
+ * Back to x86 hooks
+ */
+struct memory_mapping_data {
+MemoryMappingList *list;
+};
 
-static void walk_pde(MemoryMappingList *list, AddressSpace *as,
- hwaddr pde_start_addr,
- int32_t a20_mask, target_ulong start_line_addr)
+static int add_memory_mapping_to_list(CPUState *cs, void *data, PTE_t *pte,
+  vaddr vaddr_in, int height,
+  int offset)
 {
-hwaddr pde_addr, pte_start_addr, start_paddr;
-uint64_t pde;
-target_ulong line_addr, start_vaddr;
-int i;
-
-for (i = 0; i < 512; i++) {
-pde_addr = (pde_start_addr + i * 8) & a20_mask;
-pde = address_space_ldq(as, pde_addr, MEMTXATTRS_UNSPECIFIED, NULL);
-if (!(pde & PG_PRESENT_MASK)) {
-/* not present */
-continue;
-}
-
-line_addr = start_line_addr | ((i & 0x1ff) << 21);
-if (pde & PG_PSE_MASK) {
-/* 2 MB page */
-start_paddr = (pde & ~0x1f) & ~(0x1ULL << 63);
-if (cpu_physical_memory_is_io(start_paddr)) {
-/* I/O region */
-continue;
-}
-start_vaddr = line_addr;
-memory_mapping_list_add_merge_sorted(list, start_paddr,
- start_vaddr, 1 << 21);
-continue;
-}
+X86CPU *cpu = X86_CPU(cs);
+CPUX86State *env = >env;
 
-pte_start_addr = (pde & PLM4_ADDR_MASK) & a20_mask;
-walk_pte(list, as, pte_start_addr, a20_mask, line_addr);
-}
-}
+struct memory_mapping_data *mm_data = (struct memory_mapping_data *) data;
 
-/* 32-bit Paging */
-static void walk_pde2(MemoryMappingList *list, AddressSpace *as,
-  hwaddr pde_start_addr, int32_t a20_mask,
-  bool pse)
-{
-hwaddr pde_addr, pte_start_addr, start_paddr, high_paddr;
-uint32_t pde;
-target_ulong line_addr, start_vaddr;
-int i;
-
-for (i = 0; i < 1024; i++) {
-pde_addr = (pde_start_addr + i * 4) & a20_mask;
-pde = address_space_ldl(as, pde_addr, MEMTXATTRS_UNSPECI

[PATCH v3 0/6] Rework x86 page table walks

2024-06-06 Thread Don Porter
This version of the 'info pg' command adopts Peter Maydell's request
to write guest-agnostic page table iterator and accessor code, along
with architecture-specific hooks.  The first patch in this series
contributes a generic page table iterator and an x86 instantiation.
As a client, we first introduce an 'info pg' monitor command, as well
as a compressing callback hook for creating succinct page table
representations.

After this, each successive patch replaces an exisitng x86 page table
walker with a use of common iterator code.

I could use advice on how to ensure this is sufficiently well tested.
I used 'make check' and 'make check-avocado', which both pass; what is
the typical standard for testing something like a page table related
change?

As far as generality, I have only attempted this on x86, but I expect
the design would work for any similar radix-tree style page table.

I am still new enough to the code base that I wasn't certain about
where to put the generic code, as well as naming conventions.

Per David Gilbert's suggestion, I was careful to ensure that monitor
calls do not perturb TLB state (see the read-only flag in some
functions).

Version 3 of this patch series moves 'info pg' into common monitor
code and implements the architecture-specific code hooks.  I did not
do this with the 'info mem' and 'info tlb' commands, since they have
implementations on other ISAs.

Don Porter (6):
  Add an "info pg" command that prints the current page tables
  Convert 'info tlb' to use generic iterator
  Convert 'info mem' to use generic iterator
  Convert x86_cpu_get_memory_mapping() to use generic iterators
  Move tcg implementation of x86 get_physical_address into common helper
code.
  Convert x86_mmu_translate() to use common code.

 hmp-commands-info.hx |  13 +
 hw/core/cpu-sysemu.c | 140 ++
 include/hw/core/cpu.h|  34 +-
 include/hw/core/sysemu-cpu-ops.h | 169 +++
 include/monitor/hmp-target.h |   1 +
 include/monitor/monitor.h|   4 +
 monitor/hmp-cmds-target.c| 198 
 target/i386/arch_memory_mapping.c| 621 ++-
 target/i386/cpu.c|  12 +
 target/i386/cpu.h|  63 +++
 target/i386/helper.c | 523 +++
 target/i386/monitor.c| 724 +--
 target/i386/tcg/sysemu/excp_helper.c | 555 +---
 13 files changed, 1688 insertions(+), 1369 deletions(-)

--
2.34.1



[PATCH v3 1/6] Add an "info pg" command that prints the current page tables

2024-06-06 Thread Don Porter
The new "info pg" monitor command prints the current page table,
including virtual address ranges, flag bits, and snippets of physical
page numbers.  Completely filled regions of the page table with
compatible flags are "folded", with the result that the complete
output for a freshly booted x86-64 Linux VM can fit in a single
terminal window.  The output looks like this:

VPN range Entry FlagsPhysical page
[7f000-7f000] PML4[0fe] ---DA--UWP
  [7f28c-7f28f]  PDP[0a3] ---DA--UWP
[7f28c4600-7f28c47ff]  PDE[023] ---DA--UWP
  [7f28c4655-7f28c4656]  PTE[055-056] X--D---U-P 007f14-007f15
  [7f28c465b-7f28c465b]  PTE[05b] A--U-P 001cfc
...
[ff800-ff800] PML4[1ff] ---DA--UWP
  [8-b]  PDP[1fe] ---DA---WP
[81000-81dff]  PDE[008-00e] -GSDA---WP 001000-001dff
  [c-f]  PDP[1ff] ---DA--UWP
[ff400-ff5ff]  PDE[1fa] ---DA--UWP
  [ff5fb-ff5fc]  PTE[1fb-1fc] XG-DACT-WP 0fec00 0fee00
[ff600-ff7ff]  PDE[1fb] ---DA--UWP
  [ff600-ff600]  PTE[000] -G-DA--U-P 001467

This draws heavy inspiration from Austin Clements' original patch.

This also adds a generic page table walker, which other monitor
and execution commands will be migrated to in subsequent patches.

Signed-off-by: Don Porter 
---
 hmp-commands-info.hx  |  13 ++
 hw/core/cpu-sysemu.c  | 140 
 include/hw/core/cpu.h |  34 ++-
 include/hw/core/sysemu-cpu-ops.h  | 156 +
 include/monitor/hmp-target.h  |   1 +
 monitor/hmp-cmds-target.c | 198 +
 target/i386/arch_memory_mapping.c | 351 +-
 target/i386/cpu.c |  11 +
 target/i386/cpu.h |  15 ++
 target/i386/monitor.c | 165 ++
 10 files changed, 1082 insertions(+), 2 deletions(-)

diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
index 20a9835ea8..a873841920 100644
--- a/hmp-commands-info.hx
+++ b/hmp-commands-info.hx
@@ -242,6 +242,19 @@ SRST
 Show memory tree.
 ERST
 
+{
+.name   = "pg",
+.args_type  = "",
+.params = "",
+.help   = "show the page table",
+.cmd= hmp_info_pg,
+},
+
+SRST
+  ``info pg``
+Show the active page table.
+ERST
+
 #if defined(CONFIG_TCG)
 {
 .name   = "jit",
diff --git a/hw/core/cpu-sysemu.c b/hw/core/cpu-sysemu.c
index 2a9a2a4eb5..fd936fa90c 100644
--- a/hw/core/cpu-sysemu.c
+++ b/hw/core/cpu-sysemu.c
@@ -142,3 +142,143 @@ GuestPanicInformation *cpu_get_crash_info(CPUState *cpu)
 }
 return res;
 }
+
+/**
+ * _for_each_pte - recursive helper function
+ *
+ * @cs - CPU state
+ * @fn(cs, data, pte, vaddr, height) - User-provided function to call on each
+ * pte.
+ *   * @cs - pass through cs
+ *   * @data - user-provided, opaque pointer
+ *   * @pte - current pte
+ *   * @vaddr_in - virtual address translated by pte
+ *   * @height - height in the tree of pte
+ * @data - user-provided, opaque pointer, passed to fn()
+ * @visit_interior_nodes - if true, call fn() on page table entries in
+ * interior nodes.  If false, only call fn() on page
+ * table entries in leaves.
+ * @visit_not_present - if true, call fn() on entries that are not present.
+ * if false, visit only present entries.
+ * @node - The physical address of the current page table radix tree node
+ * @vaddr_in - The virtual address bits translated in walking the page
+ *  table to node
+ * @height - The height of node in the radix tree
+ *
+ * height starts at the max and counts down.
+ * In a 4 level x86 page table, pml4e is level 4, pdpe is level 3,
+ *  pde is level 2, and pte is level 1
+ *
+ * Returns true on success, false on error.
+ */
+static bool
+_for_each_pte(CPUState *cs,
+  int (*fn)(CPUState *cs, void *data, PTE_t *pte,
+vaddr vaddr_in, int height, int offset),
+  void *data, bool visit_interior_nodes,
+  bool visit_not_present, hwaddr node,
+  vaddr vaddr_in, int height)
+{
+int ptes_per_node;
+int i;
+
+assert(height > 0);
+
+CPUClass *cc = CPU_GET_CLASS(cs);
+
+if ((!cc->sysemu_ops->page_table_entries_per_node)
+|| (!cc->sysemu_ops->get_pte)
+|| (!cc->sysemu_ops->pte_present)
+|| (!cc->sysemu_ops->pte_leaf)
+|| (!cc->sysemu_ops->pte_child)) {
+return false;
+}
+
+ptes_per_node = cc->sysemu_ops->page_table_entries_per_node(cs, height);
+
+for (i = 0; i < ptes_per_node; i++) {
+PTE_t pt_entry;
+vaddr vaddr_i;
+bool pte_present;
+
+

Re: [PATCH v2 2/6] Convert 'info tlb' to use generic iterator

2024-06-05 Thread Don Porter

On 6/5/24 14:44, Richard Henderson wrote:

On 6/5/24 13:35, Don Porter wrote:


On 5/31/24 10:18, Dr. David Alan Gilbert wrote:

* Don Porter (por...@cs.unc.edu) wrote:

Signed-off-by: Don Porter

If this changes the output of 'info tlb' could you add a before/after
to the commit message please.


Thanks for the advice.  It should not change the output of info tlb.

Also, have a look at glib's g_printf and friends, you might find 
they're

easier;
https://www.manpagez.com/html/glib/glib-2.52.3/glib-String-Utility-Functions.php#g-printf 



Thanks for this tip too!  It isn't clear to me that converting will 
substantially simplify
the patch at this point, but I'm open to it if I'm missing something 
or this is the preferred

project style.


sprintf is deprecated in the current MacOS version, so using it 
produces warnings.  Once we convert our existing usage, we will want 
to poison new uses entirely.




Gotcha, makes sense.

Thank you,

Don




Re: [PATCH v2 2/6] Convert 'info tlb' to use generic iterator

2024-06-05 Thread Don Porter


On 5/31/24 10:18, Dr. David Alan Gilbert wrote:

* Don Porter (por...@cs.unc.edu) wrote:

Signed-off-by: Don Porter

If this changes the output of 'info tlb' could you add a before/after
to the commit message please.


Thanks for the advice.  It should not change the output of info tlb.


Also, have a look at glib's g_printf and friends, you might find they're
easier;
https://www.manpagez.com/html/glib/glib-2.52.3/glib-String-Utility-Functions.php#g-printf


Thanks for this tip too!  It isn't clear to me that converting will 
substantially simplify
the patch at this point, but I'm open to it if I'm missing something or 
this is the preferred

project style.

-Don

Re: [PATCH v2 1/6] Add an "info pg" command that prints the current page tables

2024-06-03 Thread Don Porter

On 6/3/24 4:46 AM, Philippe Mathieu-Daudé wrote:

On 31/5/24 16:11, Dr. David Alan Gilbert wrote:

* Don Porter (por...@cs.unc.edu) wrote:

The new "info pg" monitor command prints the current page table,
including virtual address ranges, flag bits, and snippets of physical
page numbers.  Completely filled regions of the page table with
compatible flags are "folded", with the result that the complete
output for a freshly booted x86-64 Linux VM can fit in a single
terminal window.  The output looks like this:

VPN range Entry Flags    Physical page
[7f000-7f000] PML4[0fe] ---DA--UWP
   [7f28c-7f28f]  PDP[0a3] ---DA--UWP
 [7f28c4600-7f28c47ff]  PDE[023] ---DA--UWP
   [7f28c4655-7f28c4656]  PTE[055-056] X--D---U-P 
007f14-007f15

   [7f28c465b-7f28c465b]  PTE[05b] A--U-P 001cfc
...
[ff800-ff800] PML4[1ff] ---DA--UWP
   [8-b]  PDP[1fe] ---DA---WP
 [81000-81dff]  PDE[008-00e] -GSDA---WP 
001000-001dff

   [c-f]  PDP[1ff] ---DA--UWP
 [ff400-ff5ff]  PDE[1fa] ---DA--UWP
   [ff5fb-ff5fc]  PTE[1fb-1fc] XG-DACT-WP 0fec00 
0fee00

 [ff600-ff7ff]  PDE[1fb] ---DA--UWP
   [ff600-ff600]  PTE[000] -G-DA--U-P 001467

This draws heavy inspiration from Austin Clements' original patch.

This also adds a generic page table walker, which other monitor
and execution commands will be migrated to in subsequent patches.

Signed-off-by: Don Porter 
---
  hmp-commands-info.hx  |  26 ++
  include/monitor/hmp-target.h  |   1 +
  target/i386/arch_memory_mapping.c | 486 
+-

  target/i386/cpu.h |  16 +
  target/i386/monitor.c | 380 +++
  5 files changed, 908 insertions(+), 1 deletion(-)

diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
index 20a9835ea8..918b82015c 100644
--- a/hmp-commands-info.hx
+++ b/hmp-commands-info.hx
@@ -237,6 +237,32 @@ ERST
  .cmd    = hmp_info_mtree,
  },
  +#if defined(TARGET_I386)


FYI in order to unify all QEMU system binaries as a single
one, we are trying to remove target-specific bits in monitor.

How 'info pg' should work in a binary supporting heterogeneous
emulation? 


In the discussion of v1 of this patch, it was suggested that we
rework this mmu related code into generic hooks that need to be
implemented on each ISA.  The hooks need to be migrated, but in
the monitor.c code there are several functions (pg_print,
pg_print_header, etc), and in arch_memory_mapping.c
(mmu_page_table_root, mmu_page_table_entries_per_node, etc)
that each ISA would need to implement to interpret its particular
page table structures.

I suppose one can also push the check for whether a given ISA supports
the command down one level in the code, and call a wrapper that
may do nothing on some ISAs.

Does that answer the question?

Thanks,

Don




Re: [PATCH v2 1/6] Add an "info pg" command that prints the current page tables

2024-06-03 Thread Don Porter

On 5/31/24 10:11 AM, Dr. David Alan Gilbert wrote:

* Don Porter (por...@cs.unc.edu) wrote:

+
+SRST   
|
+  ``info pg``  
|
+Show the active page table.
|
+ERST

Are those |'s over ->in the actual patch?


+{
+.name   = "mtree",
+.args_type  = "flatview:-f,dispatch_tree:-d,owner:-o,disabled:-D",
+.params = "[-f][-d][-o][-D]",
+.help   = "show memory tree (-f: dump flat view for address 
spaces;"
+  "-d: dump dispatch tree, valid with -f only);"
+  "-o: dump region owners/parents;"
+  "-D: dump disabled regions",
+.cmd= hmp_info_mtree,
+},

Hmm this looks like a copy-pasteism ?


Yep, my bad. Sorry about that - will be addressed in v3.




[PATCH v2 5/6] Move tcg implementation of x86 get_physical_address into common helper code.

2024-05-24 Thread Don Porter
Signed-off-by: Don Porter 
---
 target/i386/cpu.h|  41 ++
 target/i386/helper.c | 515 +
 target/i386/tcg/sysemu/excp_helper.c | 555 +--
 3 files changed, 561 insertions(+), 550 deletions(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index fc3ae55213..39ce49e61f 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -2094,6 +2094,42 @@ struct X86CPUClass {
 ResettablePhases parent_phases;
 };
 
+typedef struct X86TranslateParams {
+target_ulong addr;
+target_ulong cr3;
+int pg_mode;
+int mmu_idx;
+int ptw_idx;
+MMUAccessType access_type;
+} X86TranslateParams;
+
+typedef struct X86TranslateResult {
+hwaddr paddr;
+int prot;
+int page_size;
+} X86TranslateResult;
+
+typedef enum X86TranslateFaultStage2 {
+S2_NONE,
+S2_GPA,
+S2_GPT,
+} X86TranslateFaultStage2;
+
+typedef struct X86TranslateFault {
+int exception_index;
+int error_code;
+target_ulong cr2;
+X86TranslateFaultStage2 stage2;
+} X86TranslateFault;
+
+typedef struct X86PTETranslate {
+CPUX86State *env;
+X86TranslateFault *err;
+int ptw_idx;
+void *haddr;
+hwaddr gaddr;
+} X86PTETranslate;
+
 /* Intended to become a generic PTE type */
 typedef union PTE {
 uint64_t pte64_t;
@@ -2137,6 +2173,11 @@ void x86_cpu_list(void);
 int cpu_x86_support_mca_broadcast(CPUX86State *env);
 
 #ifndef CONFIG_USER_ONLY
+bool x86_cpu_get_physical_address(CPUX86State *env, vaddr addr,
+  MMUAccessType access_type, int mmu_idx,
+  X86TranslateResult *out,
+  X86TranslateFault *err, uint64_t ra);
+
 hwaddr x86_cpu_get_phys_page_attrs_debug(CPUState *cpu, vaddr addr,
  MemTxAttrs *attrs);
 int cpu_get_pic_interrupt(CPUX86State *s);
diff --git a/target/i386/helper.c b/target/i386/helper.c
index 48d1513a35..21445e84b2 100644
--- a/target/i386/helper.c
+++ b/target/i386/helper.c
@@ -26,6 +26,7 @@
 #include "sysemu/hw_accel.h"
 #include "monitor/monitor.h"
 #include "kvm/kvm_i386.h"
+#include "exec/cpu_ldst.h"
 #endif
 #include "qemu/log.h"
 #ifdef CONFIG_TCG
@@ -227,6 +228,520 @@ void cpu_x86_update_cr4(CPUX86State *env, uint32_t 
new_cr4)
 }
 
 #if !defined(CONFIG_USER_ONLY)
+
+static inline uint32_t ptw_ldl(const X86PTETranslate *in, uint64_t ra)
+{
+if (likely(in->haddr)) {
+return ldl_p(in->haddr);
+}
+return cpu_ldl_mmuidx_ra(in->env, in->gaddr, in->ptw_idx, ra);
+}
+
+static inline uint64_t ptw_ldq(const X86PTETranslate *in, uint64_t ra)
+{
+if (likely(in->haddr)) {
+return ldq_p(in->haddr);
+}
+return cpu_ldq_mmuidx_ra(in->env, in->gaddr, in->ptw_idx, ra);
+}
+/*
+ * Note that we can use a 32-bit cmpxchg for all page table entries,
+ * even 64-bit ones, because PG_PRESENT_MASK, PG_ACCESSED_MASK and
+ * PG_DIRTY_MASK are all in the low 32 bits.
+ */
+static bool ptw_setl_slow(const X86PTETranslate *in, uint32_t old, uint32_t 
new)
+{
+uint32_t cmp;
+
+/* Does x86 really perform a rmw cycle on mmio for ptw? */
+start_exclusive();
+cmp = cpu_ldl_mmuidx_ra(in->env, in->gaddr, in->ptw_idx, 0);
+if (cmp == old) {
+cpu_stl_mmuidx_ra(in->env, in->gaddr, new, in->ptw_idx, 0);
+}
+end_exclusive();
+return cmp == old;
+}
+
+static inline bool ptw_setl(const X86PTETranslate *in, uint32_t old,
+uint32_t set)
+{
+if (set & ~old) {
+uint32_t new = old | set;
+if (likely(in->haddr)) {
+old = cpu_to_le32(old);
+new = cpu_to_le32(new);
+return qatomic_cmpxchg((uint32_t *)in->haddr, old, new) == old;
+}
+return ptw_setl_slow(in, old, new);
+}
+return true;
+}
+
+
+static bool ptw_translate(X86PTETranslate *inout, hwaddr addr, uint64_t ra)
+{
+CPUTLBEntryFull *full;
+int flags;
+
+inout->gaddr = addr;
+flags = probe_access_full(inout->env, addr, 0, MMU_DATA_STORE,
+  inout->ptw_idx, true, >haddr, , ra);
+
+if (unlikely(flags & TLB_INVALID_MASK)) {
+X86TranslateFault *err = inout->err;
+
+assert(inout->ptw_idx == MMU_NESTED_IDX);
+*err = (X86TranslateFault){
+.error_code = inout->env->error_code,
+.cr2 = addr,
+.stage2 = S2_GPT,
+};
+return false;
+}
+return true;
+}
+
+static bool x86_mmu_translate(CPUX86State *env, const X86TranslateParams *in,
+  X86TranslateResult *out,
+  X86TranslateFault *err, uint64_t ra)
+{
+const target_ulong addr = in->addr;
+const int pg_mode = in->pg_mode;
+const bool is_user = is_mmu_index_user(in->mmu_idx);
+

[PATCH v2 2/6] Convert 'info tlb' to use generic iterator

2024-05-24 Thread Don Porter
Signed-off-by: Don Porter 
---
 target/i386/monitor.c | 203 ++
 1 file changed, 28 insertions(+), 175 deletions(-)

diff --git a/target/i386/monitor.c b/target/i386/monitor.c
index d7aae99c73..adf95edfb4 100644
--- a/target/i386/monitor.c
+++ b/target/i386/monitor.c
@@ -430,201 +430,54 @@ void hmp_info_pg(Monitor *mon, const QDict *qdict)
 }
 
 static void print_pte(Monitor *mon, CPUArchState *env, hwaddr addr,
-  hwaddr pte, hwaddr mask)
+  hwaddr pte)
 {
-addr = addr_canonical(env, addr);
-
-monitor_printf(mon, HWADDR_FMT_plx ": " HWADDR_FMT_plx
-   " %c%c%c%c%c%c%c%c%c\n",
-   addr,
-   pte & mask,
-   pte & PG_NX_MASK ? 'X' : '-',
-   pte & PG_GLOBAL_MASK ? 'G' : '-',
-   pte & PG_PSE_MASK ? 'P' : '-',
-   pte & PG_DIRTY_MASK ? 'D' : '-',
-   pte & PG_ACCESSED_MASK ? 'A' : '-',
-   pte & PG_PCD_MASK ? 'C' : '-',
-   pte & PG_PWT_MASK ? 'T' : '-',
-   pte & PG_USER_MASK ? 'U' : '-',
-   pte & PG_RW_MASK ? 'W' : '-');
-}
+char buf[128];
+char *pos = buf;
 
-static void tlb_info_32(Monitor *mon, CPUArchState *env)
-{
-unsigned int l1, l2;
-uint32_t pgd, pde, pte;
+addr = addr_canonical(env, addr);
 
-pgd = env->cr[3] & ~0xfff;
-for(l1 = 0; l1 < 1024; l1++) {
-cpu_physical_memory_read(pgd + l1 * 4, , 4);
-pde = le32_to_cpu(pde);
-if (pde & PG_PRESENT_MASK) {
-if ((pde & PG_PSE_MASK) && (env->cr[4] & CR4_PSE_MASK)) {
-/* 4M pages */
-print_pte(mon, env, (l1 << 22), pde, ~((1 << 21) - 1));
-} else {
-for(l2 = 0; l2 < 1024; l2++) {
-cpu_physical_memory_read((pde & ~0xfff) + l2 * 4, , 4);
-pte = le32_to_cpu(pte);
-if (pte & PG_PRESENT_MASK) {
-print_pte(mon, env, (l1 << 22) + (l2 << 12),
-  pte & ~PG_PSE_MASK,
-  ~0xfff);
-}
-}
-}
-}
-}
-}
+pos += sprintf(pos, HWADDR_FMT_plx ": " HWADDR_FMT_plx " ", addr,
+   (hwaddr) (pte & PG_ADDRESS_MASK));
 
-static void tlb_info_pae32(Monitor *mon, CPUArchState *env)
-{
-unsigned int l1, l2, l3;
-uint64_t pdpe, pde, pte;
-uint64_t pdp_addr, pd_addr, pt_addr;
+pos += sprintf(pos, " %s", pg_bits(pte));
 
-pdp_addr = env->cr[3] & ~0x1f;
-for (l1 = 0; l1 < 4; l1++) {
-cpu_physical_memory_read(pdp_addr + l1 * 8, , 8);
-pdpe = le64_to_cpu(pdpe);
-if (pdpe & PG_PRESENT_MASK) {
-pd_addr = pdpe & 0x3f000ULL;
-for (l2 = 0; l2 < 512; l2++) {
-cpu_physical_memory_read(pd_addr + l2 * 8, , 8);
-pde = le64_to_cpu(pde);
-if (pde & PG_PRESENT_MASK) {
-if (pde & PG_PSE_MASK) {
-/* 2M pages with PAE, CR4.PSE is ignored */
-print_pte(mon, env, (l1 << 30) + (l2 << 21), pde,
-  ~((hwaddr)(1 << 20) - 1));
-} else {
-pt_addr = pde & 0x3f000ULL;
-for (l3 = 0; l3 < 512; l3++) {
-cpu_physical_memory_read(pt_addr + l3 * 8, , 
8);
-pte = le64_to_cpu(pte);
-if (pte & PG_PRESENT_MASK) {
-print_pte(mon, env, (l1 << 30) + (l2 << 21)
-  + (l3 << 12),
-  pte & ~PG_PSE_MASK,
-  ~(hwaddr)0xfff);
-}
-}
-}
-}
-}
-}
+/* Trim line to fit screen */
+if (pos - buf > 79) {
+strcpy(buf + 77, "..");
 }
-}
 
-#ifdef TARGET_X86_64
-static void tlb_info_la48(Monitor *mon, CPUArchState *env,
-uint64_t l0, uint64_t pml4_addr)
-{
-uint64_t l1, l2, l3, l4;
-uint64_t pml4e, pdpe, pde, pte;
-uint64_t pdp_addr, pd_addr, pt_addr;
-
-for (l1 = 0; l1 < 512; l1++) {
-cpu_physical_memory_read(pml4_addr + l1 * 8, , 8);
-pml4e = le64_to_cpu(pml4e);
-if (!(pml4e & PG_PRESENT_MASK)) {
-continue;
-}
-
-pdp_addr = pml4e & 0x3f000ULL;
-for (l2 = 0; l2 < 512; l2++) {
-cpu_physical_memory_rea

[PATCH v2 6/6] Convert x86_mmu_translate() to use common code.

2024-05-24 Thread Don Porter
Signed-off-by: Don Porter 
---
 target/i386/arch_memory_mapping.c|  37 ++-
 target/i386/cpu.h|  11 +-
 target/i386/helper.c | 371 ++-
 target/i386/tcg/sysemu/excp_helper.c |   2 +-
 4 files changed, 128 insertions(+), 293 deletions(-)

diff --git a/target/i386/arch_memory_mapping.c 
b/target/i386/arch_memory_mapping.c
index 040464dd34..9ea5aeff16 100644
--- a/target/i386/arch_memory_mapping.c
+++ b/target/i386/arch_memory_mapping.c
@@ -33,7 +33,7 @@
  * Returns a hardware address on success.  Should not fail (i.e., caller is
  * responsible to ensure that a page table is actually present).
  */
-static hwaddr mmu_page_table_root(CPUState *cs, int *height)
+hwaddr mmu_page_table_root(CPUState *cs, int *height)
 {
 X86CPU *cpu = X86_CPU(cs);
 CPUX86State *env = >env;
@@ -228,6 +228,35 @@ static void _mmu_decode_va_parameters(CPUState *cs, int 
height,
 }
 }
 
+/**
+ * mmu_virtual_to_pte_index - Given a virtual address and height in the
+ *   page table radix tree, return the index that should be used
+ *   to look up the next page table entry (pte) in translating an
+ *   address.
+ *
+ * @cs - CPU state
+ * @vaddr - The virtual address to translate
+ * @height - height of node within the tree (leaves are 1, not 0).
+ *
+ * Example: In 32-bit x86 page tables, the virtual address is split
+ * into 10 bits at height 2, 10 bits at height 1, and 12 offset bits.
+ * So a call with VA and height 2 would return the first 10 bits of va,
+ * right shifted by 22.
+ */
+
+int mmu_virtual_to_pte_index(CPUState *cs, target_ulong vaddr, int height)
+{
+int shift = 0;
+int width = 0;
+int mask = 0;
+
+_mmu_decode_va_parameters(cs, height, , );
+
+mask = (1 << width) - 1;
+
+return (vaddr >> shift) & mask;
+}
+
 /**
  * get_pte - Copy the contents of the page table entry at node[i] into 
pt_entry.
  *   Optionally, add the relevant bits to the virtual address in
@@ -247,7 +276,7 @@ static void _mmu_decode_va_parameters(CPUState *cs, int 
height,
  *  Optional parameter.
  */
 
-static void
+void
 get_pte(CPUState *cs, hwaddr node, int i, int height,
 PTE_t *pt_entry, target_ulong vaddr_parent, target_ulong *vaddr_pte,
 hwaddr *pte_paddr)
@@ -284,7 +313,7 @@ get_pte(CPUState *cs, hwaddr node, int i, int height,
 }
 
 
-static bool
+bool
 mmu_pte_check_bits(CPUState *cs, PTE_t *pte, int64_t mask)
 {
 X86CPU *cpu = X86_CPU(cs);
@@ -300,7 +329,7 @@ mmu_pte_check_bits(CPUState *cs, PTE_t *pte, int64_t mask)
  * mmu_pte_presetn - Return true if the pte is
  *   marked 'present'
  */
-static bool
+bool
 mmu_pte_present(CPUState *cs, PTE_t *pte)
 {
 return mmu_pte_check_bits(cs, pte, PG_PRESENT_MASK);
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 39ce49e61f..51d4a55e6b 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -2151,15 +2151,23 @@ int x86_cpu_write_elf64_qemunote(WriteCoreDumpFunction 
f, CPUState *cpu,
 int x86_cpu_write_elf32_qemunote(WriteCoreDumpFunction f, CPUState *cpu,
  DumpState *s);
 
+hwaddr mmu_page_table_root(CPUState *cs, int *height);
+bool mmu_pte_check_bits(CPUState *cs, PTE_t *pte, int64_t mask);
+bool mmu_pte_present(CPUState *cs, PTE_t *pte);
 bool mmu_pte_leaf(CPUState *cs, int height, PTE_t *pte);
 target_ulong mmu_pte_leaf_page_size(CPUState *cs, int height);
 hwaddr mmu_pte_child(CPUState *cs, PTE_t *pte, int height);
 int mmu_page_table_entries_per_node(CPUState *cs, int height);
+int mmu_virtual_to_pte_index(CPUState *cs, target_ulong vaddr, int height);
 bool for_each_pte(CPUState *cs,
   int (*fn)(CPUState *cs, void *data, PTE_t *pte,
 target_ulong vaddr, int height, int offset),
   void *data, bool visit_interior_nodes,
   bool visit_not_present);
+void get_pte(CPUState *cs, hwaddr node, int i, int height, PTE_t *pt_entry,
+ target_ulong vaddr_parent, target_ulong *vaddr_pte,
+ hwaddr *pte_paddr);
+
 
 bool x86_cpu_get_memory_mapping(CPUState *cpu, MemoryMappingList *list,
 Error **errp);
@@ -2176,7 +2184,8 @@ int cpu_x86_support_mca_broadcast(CPUX86State *env);
 bool x86_cpu_get_physical_address(CPUX86State *env, vaddr addr,
   MMUAccessType access_type, int mmu_idx,
   X86TranslateResult *out,
-  X86TranslateFault *err, uint64_t ra);
+  X86TranslateFault *err, uint64_t ra,
+  bool read_only);
 
 hwaddr x86_cpu_get_phys_page_attrs_debug(CPUState *cpu, vaddr addr,
  MemTxAttrs *attrs);
diff --git a/target/i386/helper.c b/target/i386/helper.c
index 21445e84b2..17ffba200d 100644
--- a/target/i386/helper.c
+++ b/target/i386/helper

[PATCH v2 1/6] Add an "info pg" command that prints the current page tables

2024-05-24 Thread Don Porter
The new "info pg" monitor command prints the current page table,
including virtual address ranges, flag bits, and snippets of physical
page numbers.  Completely filled regions of the page table with
compatible flags are "folded", with the result that the complete
output for a freshly booted x86-64 Linux VM can fit in a single
terminal window.  The output looks like this:

VPN range Entry FlagsPhysical page
[7f000-7f000] PML4[0fe] ---DA--UWP
  [7f28c-7f28f]  PDP[0a3] ---DA--UWP
[7f28c4600-7f28c47ff]  PDE[023] ---DA--UWP
  [7f28c4655-7f28c4656]  PTE[055-056] X--D---U-P 007f14-007f15
  [7f28c465b-7f28c465b]  PTE[05b] A--U-P 001cfc
...
[ff800-ff800] PML4[1ff] ---DA--UWP
  [8-b]  PDP[1fe] ---DA---WP
[81000-81dff]  PDE[008-00e] -GSDA---WP 001000-001dff
  [c-f]  PDP[1ff] ---DA--UWP
[ff400-ff5ff]  PDE[1fa] ---DA--UWP
  [ff5fb-ff5fc]  PTE[1fb-1fc] XG-DACT-WP 0fec00 0fee00
[ff600-ff7ff]  PDE[1fb] ---DA--UWP
  [ff600-ff600]  PTE[000] -G-DA--U-P 001467

This draws heavy inspiration from Austin Clements' original patch.

This also adds a generic page table walker, which other monitor
and execution commands will be migrated to in subsequent patches.

Signed-off-by: Don Porter 
---
 hmp-commands-info.hx  |  26 ++
 include/monitor/hmp-target.h  |   1 +
 target/i386/arch_memory_mapping.c | 486 +-
 target/i386/cpu.h |  16 +
 target/i386/monitor.c | 380 +++
 5 files changed, 908 insertions(+), 1 deletion(-)

diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
index 20a9835ea8..918b82015c 100644
--- a/hmp-commands-info.hx
+++ b/hmp-commands-info.hx
@@ -237,6 +237,32 @@ ERST
 .cmd= hmp_info_mtree,
 },
 
+#if defined(TARGET_I386)
+{
+.name   = "pg",
+.args_type  = "",
+.params = "",
+.help   = "show the page table",
+.cmd= hmp_info_pg,
+},
+#endif
+
+SRST   
|
+  ``info pg``  
|
+Show the active page table.
|
+ERST
+
+{
+.name   = "mtree",
+.args_type  = "flatview:-f,dispatch_tree:-d,owner:-o,disabled:-D",
+.params = "[-f][-d][-o][-D]",
+.help   = "show memory tree (-f: dump flat view for address 
spaces;"
+  "-d: dump dispatch tree, valid with -f only);"
+  "-o: dump region owners/parents;"
+  "-D: dump disabled regions",
+.cmd= hmp_info_mtree,
+},
+
 SRST
   ``info mtree``
 Show memory tree.
diff --git a/include/monitor/hmp-target.h b/include/monitor/hmp-target.h
index b679aaebbf..9af72ea58d 100644
--- a/include/monitor/hmp-target.h
+++ b/include/monitor/hmp-target.h
@@ -50,6 +50,7 @@ CPUState *mon_get_cpu(Monitor *mon);
 void hmp_info_mem(Monitor *mon, const QDict *qdict);
 void hmp_info_tlb(Monitor *mon, const QDict *qdict);
 void hmp_mce(Monitor *mon, const QDict *qdict);
+void hmp_info_pg(Monitor *mon, const QDict *qdict);
 void hmp_info_local_apic(Monitor *mon, const QDict *qdict);
 void hmp_info_sev(Monitor *mon, const QDict *qdict);
 void hmp_info_sgx(Monitor *mon, const QDict *qdict);
diff --git a/target/i386/arch_memory_mapping.c 
b/target/i386/arch_memory_mapping.c
index d1ff659128..00bf2a2116 100644
--- a/target/i386/arch_memory_mapping.c
+++ b/target/i386/arch_memory_mapping.c
@@ -15,6 +15,491 @@
 #include "cpu.h"
 #include "sysemu/memory_mapping.h"
 
+/**
+ ** code hook implementations for x86 ***
+ */
+
+#define PML4_ADDR_MASK 0xff000ULL /* selects bits 51:12 */
+
+/**
+ * mmu_page_table_root - Given a CPUState, return the physical address
+ *   of the current page table root, as well as
+ *   write the height of the tree into *height.
+ *
+ * @cs - CPU state
+ * @height - a pointer to an integer, to store the page table tree height
+ *
+ * Returns a hardware address on success.  Should not fail (i.e., caller is
+ * responsible to ensure that a page table is actually present).
+ */
+static hwaddr mmu_page_table_root(CPUState *cs, int *height)
+{
+X86CPU *cpu = X86_CPU(cs);
+CPUX86State *env = >env;
+/*
+ * DEP 5/15/24: Some original page table walking code sets the a20
+ * mask as a 32 bit integer and checks it on each level of hte
+ * page table walk; some only checks it against the final result.
+ * For 64 bits, I th

[PATCH v2 4/6] Convert x86_cpu_get_memory_mapping() to use generic iterators

2024-05-24 Thread Don Porter
Signed-off-by: Don Porter 
---
 target/i386/arch_memory_mapping.c | 318 --
 1 file changed, 40 insertions(+), 278 deletions(-)

diff --git a/target/i386/arch_memory_mapping.c 
b/target/i386/arch_memory_mapping.c
index 00bf2a2116..040464dd34 100644
--- a/target/i386/arch_memory_mapping.c
+++ b/target/i386/arch_memory_mapping.c
@@ -19,6 +19,7 @@
  ** code hook implementations for x86 ***
  */
 
+/* PAE Paging or IA-32e Paging */
 #define PML4_ADDR_MASK 0xff000ULL /* selects bits 51:12 */
 
 /**
@@ -499,302 +500,63 @@ bool for_each_pte(CPUState *cs,
 /**
  * Back to x86 hooks
  */
+struct memory_mapping_data {
+MemoryMappingList *list;
+};
 
-/* PAE Paging or IA-32e Paging */
-static void walk_pte(MemoryMappingList *list, AddressSpace *as,
- hwaddr pte_start_addr,
- int32_t a20_mask, target_ulong start_line_addr)
-{
-hwaddr pte_addr, start_paddr;
-uint64_t pte;
-target_ulong start_vaddr;
-int i;
-
-for (i = 0; i < 512; i++) {
-pte_addr = (pte_start_addr + i * 8) & a20_mask;
-pte = address_space_ldq(as, pte_addr, MEMTXATTRS_UNSPECIFIED, NULL);
-if (!(pte & PG_PRESENT_MASK)) {
-/* not present */
-continue;
-}
-
-start_paddr = (pte & ~0xfff) & ~(0x1ULL << 63);
-if (cpu_physical_memory_is_io(start_paddr)) {
-/* I/O region */
-continue;
-}
-
-start_vaddr = start_line_addr | ((i & 0x1ff) << 12);
-memory_mapping_list_add_merge_sorted(list, start_paddr,
- start_vaddr, 1 << 12);
-}
-}
-
-/* 32-bit Paging */
-static void walk_pte2(MemoryMappingList *list, AddressSpace *as,
-  hwaddr pte_start_addr, int32_t a20_mask,
-  target_ulong start_line_addr)
-{
-hwaddr pte_addr, start_paddr;
-uint32_t pte;
-target_ulong start_vaddr;
-int i;
-
-for (i = 0; i < 1024; i++) {
-pte_addr = (pte_start_addr + i * 4) & a20_mask;
-pte = address_space_ldl(as, pte_addr, MEMTXATTRS_UNSPECIFIED, NULL);
-if (!(pte & PG_PRESENT_MASK)) {
-/* not present */
-continue;
-}
-
-start_paddr = pte & ~0xfff;
-if (cpu_physical_memory_is_io(start_paddr)) {
-/* I/O region */
-continue;
-}
-
-start_vaddr = start_line_addr | ((i & 0x3ff) << 12);
-memory_mapping_list_add_merge_sorted(list, start_paddr,
- start_vaddr, 1 << 12);
-}
-}
-
-/* PAE Paging or IA-32e Paging */
-#define PLM4_ADDR_MASK 0xff000ULL /* selects bits 51:12 */
-
-static void walk_pde(MemoryMappingList *list, AddressSpace *as,
- hwaddr pde_start_addr,
- int32_t a20_mask, target_ulong start_line_addr)
+static int add_memory_mapping_to_list(CPUState *cs, void *data, PTE_t *pte,
+  target_ulong vaddr, int height,
+  int offset)
 {
-hwaddr pde_addr, pte_start_addr, start_paddr;
-uint64_t pde;
-target_ulong line_addr, start_vaddr;
-int i;
-
-for (i = 0; i < 512; i++) {
-pde_addr = (pde_start_addr + i * 8) & a20_mask;
-pde = address_space_ldq(as, pde_addr, MEMTXATTRS_UNSPECIFIED, NULL);
-if (!(pde & PG_PRESENT_MASK)) {
-/* not present */
-continue;
-}
-
-line_addr = start_line_addr | ((i & 0x1ff) << 21);
-if (pde & PG_PSE_MASK) {
-/* 2 MB page */
-start_paddr = (pde & ~0x1f) & ~(0x1ULL << 63);
-if (cpu_physical_memory_is_io(start_paddr)) {
-/* I/O region */
-continue;
-}
-start_vaddr = line_addr;
-memory_mapping_list_add_merge_sorted(list, start_paddr,
- start_vaddr, 1 << 21);
-continue;
-}
-
-pte_start_addr = (pde & PLM4_ADDR_MASK) & a20_mask;
-walk_pte(list, as, pte_start_addr, a20_mask, line_addr);
-}
-}
+X86CPU *cpu = X86_CPU(cs);
+CPUX86State *env = >env;
 
-/* 32-bit Paging */
-static void walk_pde2(MemoryMappingList *list, AddressSpace *as,
-  hwaddr pde_start_addr, int32_t a20_mask,
-  bool pse)
-{
-hwaddr pde_addr, pte_start_addr, start_paddr, high_paddr;
-uint32_t pde;
-target_ulong line_addr, start_vaddr;
-int i;
+struct memory_mapping_data *mm_data = (struct memory_mapping_data *) data;
 
-for (i = 0; i < 1024; i++) {
-pde_addr = (pde_start_addr + i * 4) & a20_mask;
-pde = address_space_ldl(as, pde_addr, MEMTXATTRS_UNSPECIFIED, NULL);
-   

[PATCH v2 0/6] Rework x86 page table walks

2024-05-24 Thread Don Porter
This version of the 'info pg' command adopts Peter Maydell's request
to write some guest-agnostic page table iterator and accessor code,
along with architecture-specific hooks.  The first patch in this
series contributes a generic page table iterator and an x86
instantiation.  As a client, we first introduce an 'info pg' monitor
command, as well as a compressing callback hook for creating succinct
page table representations.

After this, each successive patch replaces an exisitng x86 page table
walker with a use of common iterator code.

I could use advice on how to ensure this is sufficiently well tested.
I used 'make check' and 'make check-avocado', which both pass; what is
the typical standard for testing something like a page table related
change?

As far as generality, I have only attempted this on x86, but I expect
the design would work for any similar radix-tree style page table.

I am still new enough to the code base that I wasn't certain about
where to put the generic code, as well as naming conventions.

Per David Gilbert's suggestion, I was careful to ensure that monitor
calls do not perturb TLB state (see the read-only flag in some
functions).

I appreciate Nadav's suggestion about other ways to pursue the same
goal: I ended up deciding I would like to try my hand at consolidating
the x86 page table code.

Don Porter (6):
  Add an "info pg" command that prints the current page tables
  Convert 'info tlb' to use generic iterator
  Convert 'info mem' to use generic iterator
  Convert x86_cpu_get_memory_mapping() to use generic iterators
  Move tcg implementation of x86 get_physical_address into common helper
code.
  Convert x86_mmu_translate() to use common code.

 hmp-commands-info.hx |  26 +
 include/monitor/hmp-target.h |   1 +
 target/i386/arch_memory_mapping.c| 735 +++---
 target/i386/cpu.h|  66 ++
 target/i386/helper.c | 518 
 target/i386/monitor.c| 883 +--
 target/i386/tcg/sysemu/excp_helper.c | 555 +
 7 files changed, 1439 insertions(+), 1345 deletions(-)

--
2.34.1



[PATCH v2 3/6] Convert 'info mem' to use generic iterator

2024-05-24 Thread Don Porter
Signed-off-by: Don Porter 
---
 target/i386/monitor.c | 344 +-
 1 file changed, 35 insertions(+), 309 deletions(-)

diff --git a/target/i386/monitor.c b/target/i386/monitor.c
index adf95edfb4..147743392d 100644
--- a/target/i386/monitor.c
+++ b/target/i386/monitor.c
@@ -480,332 +480,58 @@ void hmp_info_tlb(Monitor *mon, const QDict *qdict)
 for_each_pte(cs, _print_tlb, , false, false);
 }
 
-static void mem_print(Monitor *mon, CPUArchState *env,
-  hwaddr *pstart, int *plast_prot,
-  hwaddr end, int prot)
-{
-int prot1;
-prot1 = *plast_prot;
-if (prot != prot1) {
-if (*pstart != -1) {
-monitor_printf(mon, HWADDR_FMT_plx "-" HWADDR_FMT_plx " "
-   HWADDR_FMT_plx " %c%c%c\n",
-   addr_canonical(env, *pstart),
-   addr_canonical(env, end),
-   addr_canonical(env, end - *pstart),
-   prot1 & PG_USER_MASK ? 'u' : '-',
-   'r',
-   prot1 & PG_RW_MASK ? 'w' : '-');
-}
-if (prot != 0)
-*pstart = end;
-else
-*pstart = -1;
-*plast_prot = prot;
-}
-}
-
-static void mem_info_32(Monitor *mon, CPUArchState *env)
+static
+bool mem_print(CPUState *cs, struct mem_print_state *state)
 {
-unsigned int l1, l2;
-int prot, last_prot;
-uint32_t pgd, pde, pte;
-hwaddr start, end;
-
-pgd = env->cr[3] & ~0xfff;
-last_prot = 0;
-start = -1;
-for(l1 = 0; l1 < 1024; l1++) {
-cpu_physical_memory_read(pgd + l1 * 4, , 4);
-pde = le32_to_cpu(pde);
-end = l1 << 22;
-if (pde & PG_PRESENT_MASK) {
-if ((pde & PG_PSE_MASK) && (env->cr[4] & CR4_PSE_MASK)) {
-prot = pde & (PG_USER_MASK | PG_RW_MASK | PG_PRESENT_MASK);
-mem_print(mon, env, , _prot, end, prot);
-} else {
-for(l2 = 0; l2 < 1024; l2++) {
-cpu_physical_memory_read((pde & ~0xfff) + l2 * 4, , 4);
-pte = le32_to_cpu(pte);
-end = (l1 << 22) + (l2 << 12);
-if (pte & PG_PRESENT_MASK) {
-prot = pte & pde &
-(PG_USER_MASK | PG_RW_MASK | PG_PRESENT_MASK);
-} else {
-prot = 0;
-}
-mem_print(mon, env, , _prot, end, prot);
-}
-}
-} else {
-prot = 0;
-mem_print(mon, env, , _prot, end, prot);
-}
-}
-/* Flush last range */
-mem_print(mon, env, , _prot, (hwaddr)1 << 32, 0);
-}
+CPUArchState *env = state->env;
+int i = 0;
 
-static void mem_info_pae32(Monitor *mon, CPUArchState *env)
-{
-unsigned int l1, l2, l3;
-int prot, last_prot;
-uint64_t pdpe, pde, pte;
-uint64_t pdp_addr, pd_addr, pt_addr;
-hwaddr start, end;
-
-pdp_addr = env->cr[3] & ~0x1f;
-last_prot = 0;
-start = -1;
-for (l1 = 0; l1 < 4; l1++) {
-cpu_physical_memory_read(pdp_addr + l1 * 8, , 8);
-pdpe = le64_to_cpu(pdpe);
-end = l1 << 30;
-if (pdpe & PG_PRESENT_MASK) {
-pd_addr = pdpe & 0x3f000ULL;
-for (l2 = 0; l2 < 512; l2++) {
-cpu_physical_memory_read(pd_addr + l2 * 8, , 8);
-pde = le64_to_cpu(pde);
-end = (l1 << 30) + (l2 << 21);
-if (pde & PG_PRESENT_MASK) {
-if (pde & PG_PSE_MASK) {
-prot = pde & (PG_USER_MASK | PG_RW_MASK |
-  PG_PRESENT_MASK);
-mem_print(mon, env, , _prot, end, prot);
-} else {
-pt_addr = pde & 0x3f000ULL;
-for (l3 = 0; l3 < 512; l3++) {
-cpu_physical_memory_read(pt_addr + l3 * 8, , 
8);
-pte = le64_to_cpu(pte);
-end = (l1 << 30) + (l2 << 21) + (l3 << 12);
-if (pte & PG_PRESENT_MASK) {
-prot = pte & pde & (PG_USER_MASK | PG_RW_MASK |
-PG_PRESENT_MASK);
-} else {
-prot = 0;
-}
-mem_print(mon, env, , _prot, end, prot);
-}
-}
-} else {
-prot = 0;
-mem_print(mon, env, , _prot, end, prot);
-

Re: Add 'info pg' command to monitor

2024-04-16 Thread Don Porter

On 4/16/24 13:03, Peter Maydell wrote:

On Tue, 16 Apr 2024 at 17:53, Don Porter  wrote:

There is still a lot I am learning about the code base, but it seems
that qemu_get_guest_memory_mapping() does most of what one would need.
It currently only returns the "leaves" of the page table tree in a list.

What if I extend this function with an optional argument to either
1) return the interior nodes of the page table in additional lists (and
then parse+print in the monitor code), or
2) inline the monitor printing in the arch-specific hook, and pass a
flag to get_guest_memory_mapping() that turns on/off the statements that
pretty print the page tables?

It looks like most CPUs implement this function as part of checkpointing.

As far as I can see only x86 implements the get_memory_mapping
function, so once again somebody has added some bit of
functionality that does a walk of the page tables that is
x86 only and that shares no code with any of the other
page table walking code :-(


My mistake - get_memory_mappings() is only implemented in x86.

In doing some searching of the code, many architectures implement 
mmu_translate() and
get_physical_address() functions, but they are not standardized. I also 
see your larger point

about replicating page walking code in x86.

I imagine you have something in mind that abstracts things like the 
height of the radix tree,

entries per node, checking permissions, printing the contents, etc.

Perhaps I should start by trying to merge the x86 page walking code into 
one set of common
helper functions, get more feedback (perhaps on a new patch thread?), 
and then consider

how to abstract across architectures after getting feedback on this?

In looking at x86 code, I see the following places where there is page 
table walking code to

potentially merge:

* target/i386/monitor.c - existing info commands

* target/i386/helper.c - get_phys_page_attrs_debug

* target/i386/arch_memory_mapping.c - implements get_memory_mapping

* tcg/sysemu/excp_helper.c: implements mmu_translate() and 
get_physical_address()


Thanks,

Don




Re: Add 'info pg' command to monitor

2024-04-16 Thread Don Porter

On 4/15/24 12:37, Peter Maydell wrote:

On Mon, 15 Apr 2024 at 17:09, Don Porter  wrote:

This patch set adds an 'info pg' command to the monitor, which prints
a nicer view of the page tables.  A project in my graduate OS course
involves implementing x86 page table support, and my students have
found this helpful for debugging.
So, my issue with this is that it's x86 specific, and it adds
yet another monitor command that is doing "show some kind of debug
info related to the guest page tables", along with "info mem"
and "info tlb". Plus it is yet another lump of code that's
walking the guest page tables and interpreting them.

What I'd really like to see is some infrastructure that is
at least somewhat guest-architecture-agnostic, so we can
minimise what a guest architecture needs to implement (and
then make providing that mandatory).

The other thing I'd like to see is perhaps some investigation of
whether there's any way to implement something useful by
using/extending the existing get_phys_page_attrs_debug() and
similar functions, so that you don't have to write one lot
of page-table-walking code for QEMU to use when it's executing
guest code and a separate lot (that's bound to get out of
sync or not support new functionality/changes) that's only
handling these monitor debug commands. There's a lot of
complexity in figuring out things like permissions in a
modern architecture...

thanks
-- PMM
  


Thank you for the feedback!

There is still a lot I am learning about the code base, but it seems 
that qemu_get_guest_memory_mapping() does most of what one would need.  
It currently only returns the "leaves" of the page table tree in a list.


What if I extend this function with an optional argument to either
1) return the interior nodes of the page table in additional lists (and 
then parse+print in the monitor code), or
2) inline the monitor printing in the arch-specific hook, and pass a 
flag to get_guest_memory_mapping() that turns on/off the statements that 
pretty print the page tables?


It looks like most CPUs implement this function as part of checkpointing.

Thoughts?

Don




[PATCH 1/2] monitor: Implement a generic x86 page table iterator

2024-04-15 Thread Don Porter
From: Austin Clements 

This iterator provides a way to traverse 32-bit, PAE, and 64-bit page
tables by abstracting them as n-ary trees.  A struct describes the
full range of x86 page table layouts and the iterator builds on this
to provide a "successor" function for efficiently traversing the page
table tree.

This code is currently unused, but provides the groundwork for the
following "info pg" patch.  It could also be used to unify and
simplify the implementations of "info mem" and "info tlb".

Signed-off-by: Austin Clements 
[geo...@ldpreload.com: Rebased on top of 2.9.0]
Signed-off-by: Geoffrey Thomas 
Signed-off-by: Don Porter 
---
 target/i386/monitor.c | 149 ++
 1 file changed, 149 insertions(+)

diff --git a/target/i386/monitor.c b/target/i386/monitor.c
index 3a281dab02..7924de6520 100644
--- a/target/i386/monitor.c
+++ b/target/i386/monitor.c
@@ -34,6 +34,155 @@
 #include "qapi/qapi-commands-misc-target.h"
 #include "qapi/qapi-commands-misc.h"
 
+/**
+ * PTLayout describes the layout of an x86 page table in enough detail
+ * to fully decode up to a 4-level 64-bit page table tree.
+ */
+typedef struct PTLayout {
+int levels, entsize;
+int entries[4]; /* Entries in each table level */
+int shift[4];   /* VA bit shift each each level */
+bool pse[4];/* Whether PSE bit is valid */
+const char *names[4];
+int vaw, paw;   /* VA and PA width in characters */
+} PTLayout;
+
+/**
+ * PTIter provides a generic way to traverse and decode an x86 page
+ * table tree.
+ */
+typedef struct PTIter {
+const PTLayout *layout;
+bool pse;   /* PSE enabled */
+int level;  /* Current level */
+int i[4];   /* Index at each level */
+hwaddr base[4]; /* Physical base pointer */
+
+uint64_t ent;   /* Current entry */
+bool present, leaf;
+target_ulong va;
+hwaddr pa;
+target_ulong  size;
+} PTIter;
+
+static bool ptiter_succ(PTIter *it);
+
+/**
+ * Initialize a PTIter to point to the first entry of the page table's
+ * top level.  On failure, prints a message to mon and returns false.
+ */
+static bool
+__attribute__ ((unused))
+ptiter_init(Monitor *mon, PTIter *it)
+{
+static const PTLayout l32 = {
+2, 4, {1024, 1024}, {22, 12}, {1, 0}, {"PDE", "PTE"}, 8, 8
+};
+static const PTLayout lpae = {
+3, 8, {4, 512, 512}, {30, 21, 12}, {0, 1, 0},
+{"PDP", "PDE", "PTE"}, 8, 13
+};
+#ifdef TARGET_X86_64
+static const PTLayout l64 = {
+4, 8, {512, 512, 512, 512}, {39, 30, 21, 12}, {0, 1, 1, 0},
+{"PML4", "PDP", "PDE", "PTE"}, 12, 13
+};
+#endif
+CPUArchState *env;
+
+env = mon_get_cpu_env(mon);
+
+if (!(env->cr[0] & CR0_PG_MASK)) {
+monitor_printf(mon, "PG disabled\n");
+return false;
+}
+
+memset(it, 0, sizeof(*it));
+if (env->cr[4] & CR4_PAE_MASK) {
+#ifdef TARGET_X86_64
+if (env->hflags & HF_LMA_MASK) {
+it->layout = 
+it->base[0] = env->cr[3] & 0x3f000ULL;
+} else
+#endif
+{
+it->layout = 
+it->base[0] = env->cr[3] & ~0x1f;
+}
+it->pse = true;
+} else {
+it->layout = 
+it->base[0] = env->cr[3] & ~0xfff;
+it->pse = (env->cr[4] & CR4_PSE_MASK);
+}
+
+/* Trick ptiter_succ into doing the hard initialization. */
+it->i[0] = -1;
+it->leaf = true;
+ptiter_succ(it);
+return true;
+}
+
+/**
+ * Move a PTIter to the successor of the current entry.  Specifically:
+ * if the iterator points to a leaf, move to its next sibling, or to
+ * the next sibling of a parent if it has no more siblings.  If the
+ * iterator points to a non-leaf, move to its first child.  If there
+ * is no successor, return false.
+ *
+ * Note that the resulting entry may not be marked present, though
+ * non-present entries are always leafs (within a page
+ * table/directory/etc, this will always visit all entries).
+ */
+static bool ptiter_succ(PTIter *it)
+{
+int i, l, entsize;
+uint64_t ent64;
+uint32_t ent32;
+bool large;
+
+if (it->level < 0) {
+return false;
+} else if (!it->leaf) {
+/* Move to this entry's first child */
+it->level++;
+it->base[it->level] = it->pa;
+it->i[it->level] = 0;
+} else {
+/* Move forward and, if we hit the end of this level, up */
+while (++it->i[it->level] == it->layout->entries[it->level]) {
+if (it->level-- == 0) {
+/* We're out of page

Add 'info pg' command to monitor

2024-04-15 Thread Don Porter
Hi all,

I am a CS professor (and, newly, a second-time contributor). I have
been using qemu in my courses for over a decade, especially a course
that asks students to write major pieces of an OS kernel from starter
code.

I have some patches, originally from Austin Clements at MIT, that I
have found useful over the years and that may be useful to others.  It
would also be nice not to have to build a custom qemu each semester.  I
have cleared upstreaming these with Austin, the original author.

This patch set adds an 'info pg' command to the monitor, which prints
a nicer view of the page tables.  A project in my graduate OS course
involves implementing x86 page table support, and my students have
found this helpful for debugging.

Thank you in advance for your time,
Don Porter





[PATCH 2/2] monitor: Add an "info pg" command that prints the current page tables

2024-04-15 Thread Don Porter
From: Austin Clements 

The new "info pg" monitor command prints the current page table,
including virtual address ranges, flag bits, and snippets of physical
page numbers.  Completely filled regions of the page table with
compatible flags are "folded", with the result that the complete
output for a freshly booted x86-64 Linux VM can fit in a single
terminal window.  The output looks like this:

VPN range Entry FlagsPhysical page
[7f000-7f000] PML4[0fe] ---DA--UWP
  [7f28c-7f28f]  PDP[0a3] ---DA--UWP
[7f28c4600-7f28c47ff]  PDE[023] ---DA--UWP
  [7f28c4655-7f28c4656]  PTE[055-056] X--D---U-P 007f14-007f15
  [7f28c465b-7f28c465b]  PTE[05b] A--U-P 001cfc
...
[ff800-ff800] PML4[1ff] ---DA--UWP
  [8-b]  PDP[1fe] ---DA---WP
[81000-81dff]  PDE[008-00e] -GSDA---WP 001000-001dff
  [c-f]  PDP[1ff] ---DA--UWP
[ff400-ff5ff]  PDE[1fa] ---DA--UWP
  [ff5fb-ff5fc]  PTE[1fb-1fc] XG-DACT-WP 0fec00 0fee00
[ff600-ff7ff]  PDE[1fb] ---DA--UWP
  [ff600-ff600]  PTE[000] -G-DA--U-P 001467

Signed-off-by: Austin Clements 
[geo...@ldpreload.com: Rebased on top of 2.9.0]
Signed-off-by: Geoffrey Thomas 
Signed-off-by: Don Porter 
---
 hmp-commands-info.hx |  15 +++
 include/monitor/hmp-target.h |   2 +
 target/i386/monitor.c| 179 ++-
 3 files changed, 195 insertions(+), 1 deletion(-)

diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
index ad1b1306e3..ae7de74041 100644
--- a/hmp-commands-info.hx
+++ b/hmp-commands-info.hx
@@ -239,6 +239,21 @@ SRST
 Show the active virtual memory mappings.
 ERST
 
+#if defined(TARGET_I386)
+{
+.name   = "pg",
+.args_type  = "",
+.params = "",
+.help   = "show the page table",
+.cmd= hmp_info_pg,
+},
+#endif
+
+SRST   
|
+  ``info pg``  
|
+Show the active page table.
|
+ERST
+
 {
 .name   = "mtree",
 .args_type  = "flatview:-f,dispatch_tree:-d,owner:-o,disabled:-D",
diff --git a/include/monitor/hmp-target.h b/include/monitor/hmp-target.h
index d78e979f05..68f58d2a31 100644
--- a/include/monitor/hmp-target.h
+++ b/include/monitor/hmp-target.h
@@ -48,6 +48,8 @@ void hmp_info_mem(Monitor *mon, const QDict *qdict);
 void hmp_info_tlb(Monitor *mon, const QDict *qdict);
 void hmp_mce(Monitor *mon, const QDict *qdict);
 void hmp_info_local_apic(Monitor *mon, const QDict *qdict);
+void hmp_info_io_apic(Monitor *mon, const QDict *qdict);
+void hmp_info_pg(Monitor *mon, const QDict *qdict);
 void hmp_info_sev(Monitor *mon, const QDict *qdict);
 void hmp_info_sgx(Monitor *mon, const QDict *qdict);
 void hmp_info_via(Monitor *mon, const QDict *qdict);
diff --git a/target/i386/monitor.c b/target/i386/monitor.c
index 7924de6520..4cf39a4140 100644
--- a/target/i386/monitor.c
+++ b/target/i386/monitor.c
@@ -72,7 +72,6 @@ static bool ptiter_succ(PTIter *it);
  * top level.  On failure, prints a message to mon and returns false.
  */
 static bool
-__attribute__ ((unused))
 ptiter_init(Monitor *mon, PTIter *it)
 {
 static const PTLayout l32 = {
@@ -88,10 +87,16 @@ ptiter_init(Monitor *mon, PTIter *it)
 {"PML4", "PDP", "PDE", "PTE"}, 12, 13
 };
 #endif
+
 CPUArchState *env;
 
 env = mon_get_cpu_env(mon);
 
+if (!env) {
+monitor_printf(mon, "No CPU available\n");
+return false;
+}
+
 if (!(env->cr[0] & CR0_PG_MASK)) {
 monitor_printf(mon, "PG disabled\n");
 return false;
@@ -200,6 +205,178 @@ static hwaddr addr_canonical(CPUArchState *env, hwaddr 
addr)
 return addr;
 }
 
+/**
+ * Return true if the page tree rooted at iter is complete and
+ * compatible with compat.  last will be filled with the last entry at
+ * each level.  If false, does not change iter and last can be filled
+ * with anything; if true, returns with iter at the next entry on the
+ * same level, or the next parent entry if iter is on the last entry
+ * of this level.
+ */
+static bool pg_complete(PTIter *root, const PTIter compat[], PTIter last[])
+{
+PTIter iter = *root;
+
+if ((root->ent & 0xfff) != (compat[root->level].ent & 0xfff)) {
+return false;
+}
+
+last[root->level] = *root;
+ptiter_succ();
+if (!root->leaf) {
+/* Are all of the direct children of root complete? */
+while (iter.level == root->level + 1) {
+if (!pg_complete(, compat, last)) {
+return fa

[PATCH v3] e1000: Convert debug macros into tracepoints.

2024-04-08 Thread Don Porter
The E1000 debug messages are very useful for developing drivers.
Make these available to users without recompiling QEMU.

Signed-off-by: Austin Clements 
[geo...@ldpreload.com: Rebased on top of 2.9.0]
Signed-off-by: Geoffrey Thomas 
Signed-off-by: Don Porter 
Reviewed-by: Richard Henderson 
---
 hw/net/e1000.c  | 90 +++--
 hw/net/trace-events | 25 -
 2 files changed, 54 insertions(+), 61 deletions(-)

diff --git a/hw/net/e1000.c b/hw/net/e1000.c
index 43f3a4a701..24475636a3 100644
--- a/hw/net/e1000.c
+++ b/hw/net/e1000.c
@@ -44,26 +44,6 @@
 #include "trace.h"
 #include "qom/object.h"
 
-/* #define E1000_DEBUG */
-
-#ifdef E1000_DEBUG
-enum {
-DEBUG_GENERAL,  DEBUG_IO,   DEBUG_MMIO, DEBUG_INTERRUPT,
-DEBUG_RX,   DEBUG_TX,   DEBUG_MDIC, DEBUG_EEPROM,
-DEBUG_UNKNOWN,  DEBUG_TXSUM,DEBUG_TXERR,DEBUG_RXERR,
-DEBUG_RXFILTER, DEBUG_PHY,  DEBUG_NOTYET,
-};
-#define DBGBIT(x)(1<mac_reg[ICR],
-s->mac_reg[IMS]);
+trace_e1000_set_ics(val, s->mac_reg[ICR], s->mac_reg[IMS]);
 set_interrupt_cause(s, 0, val | s->mac_reg[ICR]);
 }
 
@@ -425,8 +404,7 @@ set_rx_control(E1000State *s, int index, uint32_t val)
 s->mac_reg[RCTL] = val;
 s->rxbuf_size = e1000x_rxbufsize(val);
 s->rxbuf_min_shift = ((val / E1000_RCTL_RDMTS_QUAT) & 3) + 1;
-DBGOUT(RX, "RCTL: %d, mac_reg[RCTL] = 0x%x\n", s->mac_reg[RDT],
-   s->mac_reg[RCTL]);
+trace_e1000_set_rx_control(s->mac_reg[RDT], s->mac_reg[RCTL]);
 timer_mod(s->flush_queue_timer,
   qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 1000);
 }
@@ -440,16 +418,16 @@ set_mdic(E1000State *s, int index, uint32_t val)
 if ((val & E1000_MDIC_PHY_MASK) >> E1000_MDIC_PHY_SHIFT != 1) // phy #
 val = s->mac_reg[MDIC] | E1000_MDIC_ERROR;
 else if (val & E1000_MDIC_OP_READ) {
-DBGOUT(MDIC, "MDIC read reg 0x%x\n", addr);
+trace_e1000_mdic_read_register(addr);
 if (!(phy_regcap[addr] & PHY_R)) {
-DBGOUT(MDIC, "MDIC read reg %x unhandled\n", addr);
+trace_e1000_mdic_read_register_unhandled(addr);
 val |= E1000_MDIC_ERROR;
 } else
 val = (val ^ data) | s->phy_reg[addr];
 } else if (val & E1000_MDIC_OP_WRITE) {
-DBGOUT(MDIC, "MDIC write reg 0x%x, value 0x%x\n", addr, data);
+trace_e1000_mdic_write_register(addr, data);
 if (!(phy_regcap[addr] & PHY_W)) {
-DBGOUT(MDIC, "MDIC write reg %x unhandled\n", addr);
+trace_e1000_mdic_write_register_unhandled(addr);
 val |= E1000_MDIC_ERROR;
 } else {
 if (addr < NPHYWRITEOPS && phyreg_writeops[addr]) {
@@ -471,8 +449,8 @@ get_eecd(E1000State *s, int index)
 {
 uint32_t ret = E1000_EECD_PRES|E1000_EECD_GNT | s->eecd_state.old_eecd;
 
-DBGOUT(EEPROM, "reading eeprom bit %d (reading %d)\n",
-   s->eecd_state.bitnum_out, s->eecd_state.reading);
+trace_e1000_get_eecd(s->eecd_state.bitnum_out, s->eecd_state.reading);
+
 if (!s->eecd_state.reading ||
 ((s->eeprom_data[(s->eecd_state.bitnum_out >> 4) & 0x3f] >>
   ((s->eecd_state.bitnum_out & 0xf) ^ 0xf))) & 1)
@@ -511,9 +489,8 @@ set_eecd(E1000State *s, int index, uint32_t val)
 s->eecd_state.reading = (((s->eecd_state.val_in >> 6) & 7) ==
 EEPROM_READ_OPCODE_MICROWIRE);
 }
-DBGOUT(EEPROM, "eeprom bitnum in %d out %d, reading %d\n",
-   s->eecd_state.bitnum_in, s->eecd_state.bitnum_out,
-   s->eecd_state.reading);
+trace_e1000_set_eecd(s->eecd_state.bitnum_in, s->eecd_state.bitnum_out,
+ s->eecd_state.reading);
 }
 
 static uint32_t
@@ -580,8 +557,7 @@ xmit_seg(E1000State *s)
 
 if (tp->cptse) {
 css = props->ipcss;
-DBGOUT(TXSUM, "frames %d size %d ipcss %d\n",
-   frames, tp->size, css);
+trace_e1000_xmit_seg1(frames, tp->size, css);
 if (props->ip) {/* IPv4 */
 stw_be_p(tp->data+css+2, tp->size - css);
 stw_be_p(tp->data+css+4,
@@ -591,7 +567,7 @@ xmit_seg(E1000State *s)
 }
 css = props->tucss;
 len = tp->size - css;
-DBGOUT(TXSUM, "tcp %d tucss %d len %d\n", props->tcp, css, len);
+trace_e1000_xmit_seg2(props->tcp, css, len);
 if (props->tcp) {
 sofar = frames * props->mss;
 stl_be_p(tp->data+css+4, ldl_be_p(tp->data+css+4)+sofar); /* seq */
@@ -759,7 +735,7 @@ start_xmit(E1000State *s)
 uint32_t tdh_start = s->mac_reg[TDH], cause = E1000_ICS_TXQE;
 
 if (!(s->mac_reg[TCT

Re: [PATCH v2] e1000: Convert debug macros into tracepoints.

2024-04-08 Thread Don Porter

On 4/3/24 2:44 PM, Austin Clements wrote:
At this point there's not much of my original code left. :D Don, 
you're welcome to take the credit in the commit.


Thanks Austin.  I'll send v3 with this change :)

BTW, my attempt to include the appropriate maintainer from 
scripts/get_maintainer.pl (jasonw...@redhat.com) bounced.  Any pointers 
on who else should be cc-ed are appreciated.


-dp




[PATCH v2] e1000: Convert debug macros into tracepoints.

2024-04-03 Thread Don Porter
From: Austin Clements 

The E1000 debug messages are very useful for developing drivers.
Make these available to users without recompiling QEMU.

Signed-off-by: Austin Clements 
[geo...@ldpreload.com: Rebased on top of 2.9.0]
Signed-off-by: Geoffrey Thomas 
Signed-off-by: Don Porter 
---
 hw/net/e1000.c  | 90 +++--
 hw/net/trace-events | 25 -
 2 files changed, 54 insertions(+), 61 deletions(-)

diff --git a/hw/net/e1000.c b/hw/net/e1000.c
index 43f3a4a701..24475636a3 100644
--- a/hw/net/e1000.c
+++ b/hw/net/e1000.c
@@ -44,26 +44,6 @@
 #include "trace.h"
 #include "qom/object.h"
 
-/* #define E1000_DEBUG */
-
-#ifdef E1000_DEBUG
-enum {
-DEBUG_GENERAL,  DEBUG_IO,   DEBUG_MMIO, DEBUG_INTERRUPT,
-DEBUG_RX,   DEBUG_TX,   DEBUG_MDIC, DEBUG_EEPROM,
-DEBUG_UNKNOWN,  DEBUG_TXSUM,DEBUG_TXERR,DEBUG_RXERR,
-DEBUG_RXFILTER, DEBUG_PHY,  DEBUG_NOTYET,
-};
-#define DBGBIT(x)(1<mac_reg[ICR],
-s->mac_reg[IMS]);
+trace_e1000_set_ics(val, s->mac_reg[ICR], s->mac_reg[IMS]);
 set_interrupt_cause(s, 0, val | s->mac_reg[ICR]);
 }
 
@@ -425,8 +404,7 @@ set_rx_control(E1000State *s, int index, uint32_t val)
 s->mac_reg[RCTL] = val;
 s->rxbuf_size = e1000x_rxbufsize(val);
 s->rxbuf_min_shift = ((val / E1000_RCTL_RDMTS_QUAT) & 3) + 1;
-DBGOUT(RX, "RCTL: %d, mac_reg[RCTL] = 0x%x\n", s->mac_reg[RDT],
-   s->mac_reg[RCTL]);
+trace_e1000_set_rx_control(s->mac_reg[RDT], s->mac_reg[RCTL]);
 timer_mod(s->flush_queue_timer,
   qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 1000);
 }
@@ -440,16 +418,16 @@ set_mdic(E1000State *s, int index, uint32_t val)
 if ((val & E1000_MDIC_PHY_MASK) >> E1000_MDIC_PHY_SHIFT != 1) // phy #
 val = s->mac_reg[MDIC] | E1000_MDIC_ERROR;
 else if (val & E1000_MDIC_OP_READ) {
-DBGOUT(MDIC, "MDIC read reg 0x%x\n", addr);
+trace_e1000_mdic_read_register(addr);
 if (!(phy_regcap[addr] & PHY_R)) {
-DBGOUT(MDIC, "MDIC read reg %x unhandled\n", addr);
+trace_e1000_mdic_read_register_unhandled(addr);
 val |= E1000_MDIC_ERROR;
 } else
 val = (val ^ data) | s->phy_reg[addr];
 } else if (val & E1000_MDIC_OP_WRITE) {
-DBGOUT(MDIC, "MDIC write reg 0x%x, value 0x%x\n", addr, data);
+trace_e1000_mdic_write_register(addr, data);
 if (!(phy_regcap[addr] & PHY_W)) {
-DBGOUT(MDIC, "MDIC write reg %x unhandled\n", addr);
+trace_e1000_mdic_write_register_unhandled(addr);
 val |= E1000_MDIC_ERROR;
 } else {
 if (addr < NPHYWRITEOPS && phyreg_writeops[addr]) {
@@ -471,8 +449,8 @@ get_eecd(E1000State *s, int index)
 {
 uint32_t ret = E1000_EECD_PRES|E1000_EECD_GNT | s->eecd_state.old_eecd;
 
-DBGOUT(EEPROM, "reading eeprom bit %d (reading %d)\n",
-   s->eecd_state.bitnum_out, s->eecd_state.reading);
+trace_e1000_get_eecd(s->eecd_state.bitnum_out, s->eecd_state.reading);
+
 if (!s->eecd_state.reading ||
 ((s->eeprom_data[(s->eecd_state.bitnum_out >> 4) & 0x3f] >>
   ((s->eecd_state.bitnum_out & 0xf) ^ 0xf))) & 1)
@@ -511,9 +489,8 @@ set_eecd(E1000State *s, int index, uint32_t val)
 s->eecd_state.reading = (((s->eecd_state.val_in >> 6) & 7) ==
 EEPROM_READ_OPCODE_MICROWIRE);
 }
-DBGOUT(EEPROM, "eeprom bitnum in %d out %d, reading %d\n",
-   s->eecd_state.bitnum_in, s->eecd_state.bitnum_out,
-   s->eecd_state.reading);
+trace_e1000_set_eecd(s->eecd_state.bitnum_in, s->eecd_state.bitnum_out,
+ s->eecd_state.reading);
 }
 
 static uint32_t
@@ -580,8 +557,7 @@ xmit_seg(E1000State *s)
 
 if (tp->cptse) {
 css = props->ipcss;
-DBGOUT(TXSUM, "frames %d size %d ipcss %d\n",
-   frames, tp->size, css);
+trace_e1000_xmit_seg1(frames, tp->size, css);
 if (props->ip) {/* IPv4 */
 stw_be_p(tp->data+css+2, tp->size - css);
 stw_be_p(tp->data+css+4,
@@ -591,7 +567,7 @@ xmit_seg(E1000State *s)
 }
 css = props->tucss;
 len = tp->size - css;
-DBGOUT(TXSUM, "tcp %d tucss %d len %d\n", props->tcp, css, len);
+trace_e1000_xmit_seg2(props->tcp, css, len);
 if (props->tcp) {
 sofar = frames * props->mss;
 stl_be_p(tp->data+css+4, ldl_be_p(tp->data+css+4)+sofar); /* seq */
@@ -759,7 +735,7 @@ start_xmit(E1000State *s)
 uint32_t tdh_start = s->mac_reg[TDH], cause = E1000_ICS_TXQE;
 
 if (!(s->mac_reg[TCTL] & E100

[PATCH 0/1] Upstreaming Course Debugging Changes

2024-03-29 Thread Don Porter
Hi all,

I am a CS professor (and first time contributor) and have been using
qemu in my courses for over a decade, especially a course that asks
students to write major pieces of an OS kernel from starter code.

I have some patches, originally from Austin Clements at MIT, that I
have found useful over the years and that may be useful to others.  It
would also be nice not to have to build a custom qemu each semester.  I
have cleared upstreaming these with Austin, the original author.

In order to learn the process and community, I thought I would start
with one patch.  As the description says, it enables e1000 debugging
without recompilation.  One project in the course is to write an e1000
driver, and these logs are quite helpful to students.

Thank you in advance for your time,
Don Porter

Austin Clements (1):
  e1000: Get debug flags from an environment variable

 hw/net/e1000.c | 65 +-
 1 file changed, 59 insertions(+), 6 deletions(-)

--
2.25.1



[PATCH 1/1] e1000: Get debug flags from an environment variable

2024-03-29 Thread Don Porter
From: Austin Clements 

The E1000 debug messages are very useful for developing drivers, so
this introduces an E1000_DEBUG environment variable that lets the
debug flags be set without recompiling QEMU.

Signed-off-by: Austin Clements 
[geo...@ldpreload.com: Rebased on top of 2.9.0]
Signed-off-by: Geoffrey Thomas 
Signed-off-by: Don Porter 
---
 hw/net/e1000.c | 65 +-
 1 file changed, 59 insertions(+), 6 deletions(-)

diff --git a/hw/net/e1000.c b/hw/net/e1000.c
index 43f3a4a701..8d46225944 100644
--- a/hw/net/e1000.c
+++ b/hw/net/e1000.c
@@ -30,11 +30,14 @@
 #include "hw/pci/pci_device.h"
 #include "hw/qdev-properties.h"
 #include "migration/vmstate.h"
+#include "monitor/monitor.h"
 #include "net/eth.h"
 #include "net/net.h"
 #include "net/checksum.h"
 #include "sysemu/sysemu.h"
 #include "sysemu/dma.h"
+#include "qapi/qmp/qerror.h"
+#include "qemu/error-report.h"
 #include "qemu/iov.h"
 #include "qemu/module.h"
 #include "qemu/range.h"
@@ -44,15 +47,19 @@
 #include "trace.h"
 #include "qom/object.h"
 
-/* #define E1000_DEBUG */
-
-#ifdef E1000_DEBUG
 enum {
 DEBUG_GENERAL,  DEBUG_IO,   DEBUG_MMIO, DEBUG_INTERRUPT,
 DEBUG_RX,   DEBUG_TX,   DEBUG_MDIC, DEBUG_EEPROM,
 DEBUG_UNKNOWN,  DEBUG_TXSUM,DEBUG_TXERR,DEBUG_RXERR,
 DEBUG_RXFILTER, DEBUG_PHY,  DEBUG_NOTYET,
 };
+
+static const char *debugnames[] = {
+"GENERAL",  "IO",   "MMIO", "INTERRUPT",
+"RX",   "TX",   "MDIC", "EEPROM",
+"UNKNOWN",  "TXSUM","TXERR","RXERR",
+"RXFILTER", "PHY",  "NOTYET",   NULL
+};
 #define DBGBIT(x)(1<