date:20250520

On Tue, May 20, 2025 at 08:40:28AM +0200, Jan Beulich wrote:
> On 09.05.2025 11:05, Jiqian Chen wrote:
> > When init_msi() fails, the previous new changes will hide MSI
> > capability, it can't rely on vpci_deassign_device() to remove
> > all MSI related resources anymore, those resources must be
> > removed in cleanup function of MSI.
> 
> That's because vpci_deassign_device() simply isn't called anymore?
> Could do with wording along these lines then. But (also applicable
> to the previous patch) - doesn't this need to come earlier? And is
> it sufficient to simply remove the register intercepts? Don't you
> need to put in place ones dropping all writes and making all reads
> return either 0 or ~0 (covering in particular Dom0, while for DomU-s
> this may already be the case by default behavior)?

For domUs this is already the default behavior.

For dom0 I think it should be enough to hide the capability from the
linked list, but not hide all the capability related
registers.  IMO a well behaved dom0 won't try to access capabilities
disconnected from the linked list, and in general we allow dom0 access
to as much as possible.

For dom0 Xen could drop writes to the MSI(-X) enable bit, thus forcing
MSI(-X) to stay disabled.  I however don't see this as a mandatory
step for the work here.

Regards, Roger.

RE: [PATCH v4 12/15] tools/xenpm: Print CPPC parameters for amd-cppc driver

[Public]

> -Original Message-
> From: Jan Beulich 
> Sent: Tuesday, May 13, 2025 4:03 PM
> To: Penny, Zheng 
> Cc: Huang, Ray ; Anthony PERARD
> ; xen-devel@lists.xenproject.org
> Subject: Re: [PATCH v4 12/15] tools/xenpm: Print CPPC parameters for amd-cppc
> driver
>
> On 09.05.2025 08:36, Penny, Zheng wrote:
> >> -Original Message-
> >> From: Jan Beulich 
> >> Sent: Wednesday, April 30, 2025 9:55 PM
> >>
> >> On 14.04.2025 09:40, Penny Zheng wrote:
> >>> HWP, amd-cppc, amd-cppc-epp are all the implementation of ACPI CPPC
> >>> (Collaborative Processor Performace Control), so we introduce
> >>> cppc_mode flag to print CPPC-related para.
> >>>
> >>> And HWP and amd-cppc-epp are both governor-less driver, so we
> >>> introduce hw_auto flag to bypass governor-related print.
> >>
> >> But in the EPP driver you use the information which governor is active.
> >
> > We want to have a one-one mapping between governor and epp value, such
> > as, If users choose performance governor, no matter via "xenpm" or
> > cmdline, users want maximum performance, We set epp with 0 to meet the
> expectation.
> > And if users choose powersave governor, users want the least power
> > consumption, then we shall set epp with 255 to meet the expectation.
>
> That's all fine, but completely misses the point of my question: If the 
> governor is
> relevant, why would you bypass respective printing?
>

The only useful info in governor for epp mode, IMO, is its name.
I deduce which performance policy user wants to apply through which governor 
they choose.
If user chooses performance governor, they want maximum performance.
If user chooses powersave governor, they want the least power consumption
I could only provide appropriate epp value in above two scenarios.
ondemand and userspace are forbidden choices, and if users specify such options 
in cmdline,
I shall give warning message to say  that amd-cppc in active mode is 
governor-less, and users could
set epp values on runtime to specify bias towards performance or efficiency.

If above is messy, I could also totally forbid governor thing for active 
mode... wdyt?


> > Ondemand is a tricky part, h, I don't know which value is suitable
> > for it, the medium one? So I neglect it in the first place
>
> Medium one may be okay-ish, but it's not really an appropriate fit. We may 
> want to
> at least consider rejecting the use of ondemand with the EPP driver.
> That, however, heavily depends on how hardware would behave when using the
> medium value.
>
> >>> --- a/tools/misc/xenpm.c
> >>> +++ b/tools/misc/xenpm.c
> >>> @@ -790,9 +790,18 @@ static unsigned int
> >>> calculate_activity_window(const xc_cppc_para_t *cppc,
> >>>  /* print out parameters about cpu frequency */  static void
> >>> print_cpufreq_para(int cpuid, struct xc_get_cpufreq_para *p_cpufreq)
> >>> {
> >>> -bool hwp = strcmp(p_cpufreq->scaling_driver,
> XEN_HWP_DRIVER_NAME)
> >> == 0;
> >>> +bool cppc_mode = false, hw_auto = false;
> >>>  int i;
> >>>
> >>> +if ( !strcmp(p_cpufreq->scaling_driver, XEN_HWP_DRIVER_NAME) ||
> >>> + !strcmp(p_cpufreq->scaling_driver,
> XEN_AMD_CPPC_DRIVER_NAME) ||
> >>> + !strcmp(p_cpufreq->scaling_driver,
> >> XEN_AMD_CPPC_EPP_DRIVER_NAME) )
> >>> +cppc_mode = true;
> >>> +
> >>> +if ( !strcmp(p_cpufreq->scaling_driver, XEN_HWP_DRIVER_NAME) ||
> >>> + !strcmp(p_cpufreq->scaling_driver,
> >> XEN_AMD_CPPC_EPP_DRIVER_NAME) )
> >>> +hw_auto = true;
> >>
> >> Please avoid doing the same strcmp()s twice. There are several ways
> >> how to, so I'm not going to make a particular suggestion.
> >
> > Maybe we shall use switch-case() to replace the same strcmp()s Since
> > it's not easy to switch-case() string value, I had a draft idea to
> > include an new entry in "struct xen_cppc_para",
> > See:
> > ```
> > diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
> > index fa431fd983..b872f1b66a 100644
> > --- a/xen/include/public/sysctl.h
> > +++ b/xen/include/public/sysctl.h
> > @@ -308,6 +308,10 @@ struct xen_ondemand {
> >
> >  struct xen_cppc_para {
> >  /* OUT */
> > +#define XEN_SYSCTL_CPPC_VENDOR_HWP  1
> > +#define XEN_SYSCTL_CPPC_VENDOR_AMD  2
> > +#define XEN_SYSCTL_CPPC_VENDOR_AMD_EPP  3
> > +uint8_t vendor;
> >  /* activity_window supported if set */  #define
> > XEN_SYSCTL_CPPC_FEAT_ACT_WINDOW  (1 << 0)
> >  uint32_t features; /* bit flags for features */
> >
> > ```
> > A new vendor filed in struct xen_cppc_para could help us differ the 
> > underlying
> implementation.
> > Or any better suggestions?
>
> Well, if you set hw_auto first, then you can use that variable plus one more 
> strcmp()
> to set cppc_mode.
>
> Jan

Re: Request for patch to fix boot loop issue in Xen 4.17.6

On Sat, May 03, 2025 at 02:02:32PM +, Ngamia Djabiri Julie wrote:
> Dear Xen developers,
> 
> I would like to ask if the following fix can also be included in Xen 4.17.6 
> (and eventually in the Xen versions after 4.17.6 that don't have the fix) :

Hello,

4.17.6 is planned for the end of the year (so more than 6 months from
now).  It would be faster if you request the backport to be added to
the Alpine Xen 4.17 package.

Andrew provided a link for the backport to 4.17 that we use in
XenServer, it will most likely apply cleanly to the Alpine package.

Regards, Roger.

Re: [PATCH v4 09/10] vpci/msi: Free MSI resources when init_msi() fails

On 20.05.2025 11:09, Roger Pau Monné wrote:
> On Tue, May 20, 2025 at 08:40:28AM +0200, Jan Beulich wrote:
>> On 09.05.2025 11:05, Jiqian Chen wrote:
>>> When init_msi() fails, the previous new changes will hide MSI
>>> capability, it can't rely on vpci_deassign_device() to remove
>>> all MSI related resources anymore, those resources must be
>>> removed in cleanup function of MSI.
>>
>> That's because vpci_deassign_device() simply isn't called anymore?
>> Could do with wording along these lines then. But (also applicable
>> to the previous patch) - doesn't this need to come earlier? And is
>> it sufficient to simply remove the register intercepts? Don't you
>> need to put in place ones dropping all writes and making all reads
>> return either 0 or ~0 (covering in particular Dom0, while for DomU-s
>> this may already be the case by default behavior)?
> 
> For domUs this is already the default behavior.
> 
> For dom0 I think it should be enough to hide the capability from the
> linked list, but not hide all the capability related
> registers.  IMO a well behaved dom0 won't try to access capabilities
> disconnected from the linked list,

Just that I've seen drivers knowing where their device has certain
capabilities, thus not bothering to look up the respective
capability.

> and in general we allow dom0 access
> to as much as possible.
> 
> For dom0 Xen could drop writes to the MSI(-X) enable bit, thus forcing
> MSI(-X) to stay disabled.  I however don't see this as a mandatory
> step for the work here.

You're the maintainer, so you have the final say.

Jan

Re: [PATCH v4 05/15] xen/x86: introduce "cpufreq=amd-cppc" xen cmdline

On 20.05.2025 10:28, Penny, Zheng wrote:
> [Public]
> 
>> -Original Message-
>> From: Jan Beulich 
>> Sent: Monday, May 19, 2025 9:19 PM
>> To: Penny, Zheng 
>> Cc: Huang, Ray ; Andrew Cooper
>> ; Anthony PERARD ;
>> Orzel, Michal ; Julien Grall ; Roger 
>> Pau
>> Monné ; Stefano Stabellini ; 
>> xen-
>> de...@lists.xenproject.org
>> Subject: Re: [PATCH v4 05/15] xen/x86: introduce "cpufreq=amd-cppc" xen
>> cmdline
>>
>> On 19.05.2025 10:36, Penny, Zheng wrote:
>>> [Public]
>>>
 -Original Message-
 From: Jan Beulich 
 Sent: Tuesday, April 29, 2025 8:52 PM
 To: Penny, Zheng 
 Cc: Huang, Ray ; Andrew Cooper
 ; Anthony PERARD
 ; Orzel, Michal ;
 Julien Grall ; Roger Pau Monné
 ; Stefano Stabellini ;
 xen- de...@lists.xenproject.org
 Subject: Re: [PATCH v4 05/15] xen/x86: introduce "cpufreq=amd-cppc"
 xen cmdline

 On 14.04.2025 09:40, Penny Zheng wrote:
> --- a/xen/include/acpi/cpufreq/processor_perf.h
> +++ b/xen/include/acpi/cpufreq/processor_perf.h
> @@ -5,6 +5,9 @@
>  #include 
>  #include 
>
> +/* ability bits */
> +#define XEN_PROCESSOR_PM_CPPC   8

 This needs correlating (at least via commentary, better by build-time
 checking) with the other XEN_PROCESSOR_PM_* values. Otherwise
>> someone
 adding a new #define in the public header may not (easily) notice a
 possible conflict. With that in mind I also question whether 8 is
 actually a good choice: That's the obvious next value to use in the
 public interface. SIF_PM_MASK is 8 bits wide, so a sensible value to use 
 here
>> would by e.g. 0x100.

>>>
>>> I've added a public flag anchor "XEN_PROCESSOR_PM_PUBLIC_END" in
>> public header:
>>>  #define XEN_PROCESSOR_PM_PUBLIC_END
>> XEN_PROCESSOR_PM_TX
>>> and will do the following build-time checking:
>>> BUILD_BUG_ON(XEN_PROCESSOR_PM_CPPC <=
>>> XEN_PROCESSOR_PM_PUBLIC_END);
>>
>> I don't really see why anything would need to be added to the public header 
>> (and we
>> really should strive to avoid unnecessary additions).
> 
> If I put such definition
> "#define XEN_PROCESSOR_PM_PUBLIC_END XEN_PROCESSOR_PM_TX"
> in internal header, I'm afraid it won't be updated if new ones added in the 
> public in the future.
> Or any other suggestions to provide build-time checking?

Imo it's sufficient to check that the new #define doesn't overlap with
SIF_PM_MASK.

Jan

Re: [PATCH v4 12/15] tools/xenpm: Print CPPC parameters for amd-cppc driver

On 20.05.2025 10:22, Penny, Zheng wrote:
>> -Original Message-
>> From: Jan Beulich 
>> Sent: Tuesday, May 13, 2025 4:03 PM
>>
>> On 09.05.2025 08:36, Penny, Zheng wrote:
 -Original Message-
 From: Jan Beulich 
 Sent: Wednesday, April 30, 2025 9:55 PM

 On 14.04.2025 09:40, Penny Zheng wrote:
> HWP, amd-cppc, amd-cppc-epp are all the implementation of ACPI CPPC
> (Collaborative Processor Performace Control), so we introduce
> cppc_mode flag to print CPPC-related para.
>
> And HWP and amd-cppc-epp are both governor-less driver, so we
> introduce hw_auto flag to bypass governor-related print.

 But in the EPP driver you use the information which governor is active.
>>>
>>> We want to have a one-one mapping between governor and epp value, such
>>> as, If users choose performance governor, no matter via "xenpm" or
>>> cmdline, users want maximum performance, We set epp with 0 to meet the
>> expectation.
>>> And if users choose powersave governor, users want the least power
>>> consumption, then we shall set epp with 255 to meet the expectation.
>>
>> That's all fine, but completely misses the point of my question: If the 
>> governor is
>> relevant, why would you bypass respective printing?
>>
> 
> The only useful info in governor for epp mode, IMO, is its name.
> I deduce which performance policy user wants to apply through which governor 
> they choose.
> If user chooses performance governor, they want maximum performance.
> If user chooses powersave governor, they want the least power consumption
> I could only provide appropriate epp value in above two scenarios.
> ondemand and userspace are forbidden choices, and if users specify such 
> options in cmdline,
> I shall give warning message to say  that amd-cppc in active mode is 
> governor-less, and users could
> set epp values on runtime to specify bias towards performance or efficiency.
> 
> If above is messy, I could also totally forbid governor thing for active 
> mode... wdyt?

Then how would one pick between performance and powersave?

Jan

Re: [PATCH v2 16/16] xen/riscv: add basic UART support



On 5/15/25 11:59 AM, Jan Beulich wrote:

On 06.05.2025 18:51, Oleksii Kurochko wrote:

--- a/xen/arch/riscv/setup.c
+++ b/xen/arch/riscv/setup.c
@@ -4,12 +4,16 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
  #include 
+#include 

Why's this needed? I can't spot anything ...


It should be dropped. This rudiment left when I called percpu_init_areas().




+#include 
  #include 
  #include 
+#include 
  #include 
  #include 
  
@@ -136,8 +140,17 @@ void __init noreturn start_xen(unsigned long bootcpu_id,
  
  intc_preinit();
  
+uart_init();

+console_init_preirq();
+
  intc_init();
  
+timer_init();

+
+local_irq_enable();
+
+console_init_postirq();
+
  printk("All set up\n");
  
  machine_halt();

... relevant here. With the need clarified or with the #include dropped:
Acked-by: Jan Beulich


Thanks.

~ Oleksii

[PATCH v2 2/3] Add lockdown mode

2025-05-20 Thread Kevin Lampis

From: Ross Lagerwall 

The intention of lockdown mode is to prevent attacks from a rogue dom0
userspace from compromising the system. Lockdown mode can be controlled by a
Kconfig option and a command-line parameter. It is also enabled automatically
when Secure Boot is enabled and it cannot be disabled in that case.

If enabled from the command-line then it is required to be first in the
list otherwise Xen may process some insecure parameters before reaching
the lockdown parameter.

Signed-off-by: Ross Lagerwall 
Signed-off-by: Kevin Lampis 
---
Changes in v2:
- Remove custom command line parsing
- Print warning if lockdown is not first on command line
---
 xen/arch/x86/setup.c   |  1 +
 xen/common/Kconfig |  8 ++
 xen/common/Makefile|  1 +
 xen/common/kernel.c|  6 +
 xen/common/lockdown.c  | 54 ++
 xen/include/xen/lockdown.h | 11 
 6 files changed, 81 insertions(+)
 create mode 100644 xen/common/lockdown.c
 create mode 100644 xen/include/xen/lockdown.h

diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 2518954124..276957c4ed 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
diff --git a/xen/common/Kconfig b/xen/common/Kconfig
index 6d43be2e6e..c84073563f 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -576,4 +576,12 @@ config BUDDY_ALLOCATOR_SIZE
  Amount of memory reserved for the buddy allocator to serve Xen heap,
  working alongside the colored one.
 
+config LOCKDOWN_DEFAULT
+   bool "Enable lockdown mode by default"
+   default n
+   help
+ Lockdown mode prevents attacks from a rogue dom0 userspace from
+ compromising the system. This is automatically enabled when Secure
+ Boot is enabled.
+
 endmenu
diff --git a/xen/common/Makefile b/xen/common/Makefile
index 98f0873056..b00a8a925a 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -26,6 +26,7 @@ obj-$(CONFIG_KEXEC) += kexec.o
 obj-$(CONFIG_KEXEC) += kimage.o
 obj-$(CONFIG_LIVEPATCH) += livepatch.o livepatch_elf.o
 obj-$(CONFIG_LLC_COLORING) += llc-coloring.o
+obj-y += lockdown.o
 obj-$(CONFIG_VM_EVENT) += mem_access.o
 obj-y += memory.o
 obj-y += multicall.o
diff --git a/xen/common/kernel.c b/xen/common/kernel.c
index 8b63ca55f1..3538f467ad 100644
--- a/xen/common/kernel.c
+++ b/xen/common/kernel.c
@@ -199,6 +199,8 @@ static int parse_params(const char *cmdline, const struct 
kernel_param *start,
 printk("parameter \"%s\" unknown!\n", key);
 final_rc = -EINVAL;
 }
+
+lockdown_clear_first_flag();
 }
 
 return final_rc;
@@ -216,6 +218,9 @@ static void __init _cmdline_parse(const char *cmdline)
  */
 void __init cmdline_parse(const char *cmdline)
 {
+/* Call this early since it affects command-line parsing */
+lockdown_init(cmdline);
+
 if ( opt_builtin_cmdline[0] )
 {
 printk("Built-in command line: %s\n", opt_builtin_cmdline);
@@ -227,6 +232,7 @@ void __init cmdline_parse(const char *cmdline)
 return;
 
 safe_strcpy(saved_cmdline, cmdline);
+lockdown_set_first_flag();
 _cmdline_parse(cmdline);
 #endif
 }
diff --git a/xen/common/lockdown.c b/xen/common/lockdown.c
new file mode 100644
index 00..cd3deeb63e
--- /dev/null
+++ b/xen/common/lockdown.c
@@ -0,0 +1,54 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+#include 
+#include 
+#include 
+
+#define FIRST_ARG_FLAG 2
+
+static int __ro_after_init lockdown = IS_ENABLED(CONFIG_LOCKDOWN_DEFAULT);
+
+void __init lockdown_set_first_flag(void)
+{
+lockdown |= FIRST_ARG_FLAG;
+}
+
+void __init lockdown_clear_first_flag(void)
+{
+lockdown &= ~FIRST_ARG_FLAG;
+}
+
+static int __init parse_lockdown_opt(const char *s)
+{
+if ( strncmp("no", s, 2) == 0 )
+if ( efi_secure_boot )
+printk("lockdown can't be disabled because Xen booted in Secure 
Boot mode\n");
+else
+lockdown = 0;
+else
+{
+if ( !(lockdown & FIRST_ARG_FLAG) )
+printk("lockdown was not the first argument, unsafe arguments may 
have been already processed\n");
+
+lockdown = 1;
+}
+
+return 0;
+}
+custom_secure_param("lockdown", parse_lockdown_opt);
+
+bool is_locked_down(void)
+{
+return lockdown & ~FIRST_ARG_FLAG;
+}
+
+void __init lockdown_init(const char *cmdline)
+{
+if ( efi_secure_boot )
+{
+printk("Enabling lockdown mode because Secure Boot is enabled\n");
+lockdown = 1;
+}
+
+printk("Lockdown mode is %s\n", is_locked_down() ? "enabled" : "disabled");
+}
diff --git a/xen/include/xen/lockdown.h b/xen/include/xen/lockdown.h
new file mode 100644
index 00..6ae97f9d5f
--- /dev/null
+++ b/xen/include/xen/lockdown.h
@@ -0,0 +1,11 @@
+#ifndef XEN__LOCKDOWN_H
+#define XEN__LOCKDOWN_H
+
+#include 
+
+void lockdown_set_first_flag(void);
+void

[PATCH v2 3/3] Disallow most command-line options when lockdown mode is enabled

2025-05-20 Thread Kevin Lampis

A subset of command-line parameters that are specifically safe to use when
lockdown mode is enabled are annotated as such.

These are commonly used parameters which have been audited to ensure they
cannot be used to undermine the integrity of the system when booted in
Secure Boot mode.

Signed-off-by: Kevin Lampis 
Signed-off-by: Ross Lagerwall 
---
Changes in v2:
- Add more information about the safe parameters
- Add lockdown section to the command line doc
---
 docs/misc/xen-command-line.pandoc | 16 +
 xen/arch/arm/domain_build.c   |  4 +--
 xen/arch/x86/acpi/cpu_idle.c  |  2 +-
 xen/arch/x86/cpu/amd.c|  2 +-
 xen/arch/x86/cpu/mcheck/mce.c |  2 +-
 xen/arch/x86/cpu/microcode/core.c |  2 +-
 xen/arch/x86/dom0_build.c |  4 +--
 xen/arch/x86/hvm/hvm.c|  2 +-
 xen/arch/x86/irq.c|  2 +-
 xen/arch/x86/nmi.c|  2 +-
 xen/arch/x86/setup.c  |  2 +-
 xen/arch/x86/traps.c  |  2 +-
 xen/arch/x86/x86_64/mmconfig-shared.c |  2 +-
 xen/common/domain.c   |  2 +-
 xen/common/kernel.c   | 11 +-
 xen/common/kexec.c|  2 +-
 xen/common/numa.c |  2 +-
 xen/common/page_alloc.c   |  2 +-
 xen/common/shutdown.c |  2 +-
 xen/drivers/char/console.c|  2 +-
 xen/drivers/char/ns16550.c|  4 +--
 xen/drivers/video/vga.c   |  2 +-
 xen/include/xen/param.h   | 49 +--
 23 files changed, 87 insertions(+), 35 deletions(-)

diff --git a/docs/misc/xen-command-line.pandoc 
b/docs/misc/xen-command-line.pandoc
index b0eadd2c5d..7916875f22 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -1798,6 +1798,22 @@ immediately. Specifying `0` will disable all testing of 
illegal lock nesting.
 
 This option is available for hypervisors built with CONFIG_DEBUG_LOCKS only.
 
+### lockdown
+> `= `
+
+> Default: `false`
+
+The intention of lockdown mode is to prevent attacks from a rogue dom0
+userspace from compromising the system. It is also enabled automatically
+when Secure Boot is enabled and it cannot be disabled in that case.
+
+After lockdown mode is enabled some unsafe command line options will be
+ignored by Xen.
+
+If enabling lockdown mode via the command line then ensure it is positioned as
+the first option in the command line string otherwise Xen may process unsafe
+options before reaching the lockdown parameter.
+
 ### loglvl
 > `= [/]` where level is `none | error | warning | 
 > info | debug | all`
 
diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index b189a7cfae..ef1cba8f0f 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -41,7 +41,7 @@
 #include 
 
 static unsigned int __initdata opt_dom0_max_vcpus;
-integer_param("dom0_max_vcpus", opt_dom0_max_vcpus);
+integer_secure_param("dom0_max_vcpus", opt_dom0_max_vcpus);
 
 /*
  * If true, the extended regions support is enabled for dom0 and
@@ -61,7 +61,7 @@ static int __init parse_dom0_mem(const char *s)
 
 return *s ? -EINVAL : 0;
 }
-custom_param("dom0_mem", parse_dom0_mem);
+custom_secure_param("dom0_mem", parse_dom0_mem);
 
 int __init parse_arch_dom0_param(const char *s, const char *e)
 {
diff --git a/xen/arch/x86/acpi/cpu_idle.c b/xen/arch/x86/acpi/cpu_idle.c
index 1dbf15b01e..431fd0c997 100644
--- a/xen/arch/x86/acpi/cpu_idle.c
+++ b/xen/arch/x86/acpi/cpu_idle.c
@@ -113,7 +113,7 @@ static int __init cf_check parse_cstate(const char *s)
 max_csubstate = simple_strtoul(s + 1, NULL, 0);
 return 0;
 }
-custom_param("max_cstate", parse_cstate);
+custom_secure_param("max_cstate", parse_cstate);
 
 static bool __read_mostly local_apic_timer_c2_ok;
 boolean_param("lapic_timer_c2_ok", local_apic_timer_c2_ok);
diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
index 27ae167808..9aab3d05e1 100644
--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -47,7 +47,7 @@ integer_param("cpuid_mask_thermal_ecx", 
opt_cpuid_mask_thermal_ecx);
 
 /* 1 = allow, 0 = don't allow guest creation, -1 = don't allow boot */
 int8_t __read_mostly opt_allow_unsafe;
-boolean_param("allow_unsafe", opt_allow_unsafe);
+boolean_secure_param("allow_unsafe", opt_allow_unsafe);
 
 /* Signal whether the ACPI C1E quirk is required. */
 bool __read_mostly amd_acpi_c1e_quirk;
diff --git a/xen/arch/x86/cpu/mcheck/mce.c b/xen/arch/x86/cpu/mcheck/mce.c
index 1c348e557d..a229af6fd3 100644
--- a/xen/arch/x86/cpu/mcheck/mce.c
+++ b/xen/arch/x86/cpu/mcheck/mce.c
@@ -31,7 +31,7 @@
 #include "vmce.h"
 
 bool __read_mostly opt_mce = true;
-boolean_param("mce", opt_mce);
+boolean_secure_param("mce", opt_mce);
 bool __read_mostly mce_broadcast;
 bool is_mc_panic;
 DEFINE_PER_CPU_READ_MOSTLY(unsigned int, nr_mce_banks);
diff --git a/xen/arch/x86/cpu/microcode/core.c 
b/xen/arch/x86/cpu/microcode

Re: [RFC PATCH] xen: Introduce extra IRQ count domain creation parameter

On Fri, May 16, 2025 at 01:50:25PM +, Teddy Astie wrote:
> When doing PCI Passthrough with high-IRQ devices (e.g NVMe drives),
> the default limit may be unefficient as not all domains requires
> more IRQs.
> 
> Introduce a new parameter to allow the toolstack to tune the IRQ
> count if more is required.
> 
> Signed-off-by: Teddy Astie 
> ---
> 0 extra_irqs is meaningful, though I am not sure how to expose this
> special case.

You could introduce a new CDF flag to signal the contents of
extra_irqs is valid, or maybe use the top bit of the `extra_irqs`
field to signal the value is set?

> This of course wants libxl support next.

It would be nice if this could come together in a patch series.

> ---
>  xen/common/domain.c | 8 +---
>  xen/include/public/domctl.h | 3 +++
>  2 files changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index abf1969e60..5c628962fc 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -912,10 +912,12 @@ struct domain *domain_create(domid_t domid,
>  
>  #ifdef CONFIG_HAS_PIRQ
>  if ( !is_hardware_domain(d) )
> -d->nr_pirqs = nr_static_irqs + extra_domU_irqs;
> +d->nr_pirqs = nr_static_irqs + config->extra_irqs ?: extra_domU_irqs;
>  else
> -d->nr_pirqs = extra_hwdom_irqs ? nr_static_irqs + extra_hwdom_irqs
> -   : arch_hwdom_irqs(d);
> +{
> +unsigned int extra_irqs = config->extra_irqs ?: extra_hwdom_irqs;

Newline.

> +d->nr_pirqs = extra_irqs ? nr_static_irqs + extra_irqs : 
> arch_hwdom_irqs(d);

I think the above line is > 80 characters?

> +}
>  d->nr_pirqs = min(d->nr_pirqs, nr_irqs);
>  
>  radix_tree_init(&d->pirq_tree);
> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
> index 5b2063eed9..e4bb366c78 100644
> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -121,6 +121,9 @@ struct xen_domctl_createdomain {
>  /* CPU pool to use; specify 0 or a specific existing pool */
>  uint32_t cpupool_id;
>  
> +/* Additional IRQ for this guest. 0 to use Xen default value. */
> +uint32_t extra_irqs;

I think you need a domctl version bump, as the last one was done for
4.19.

Regards, Roger.

Re: [PATCH v2 14/16] xen/riscv: add external interrupt handling for hypervisor mode



On 5/15/25 11:54 AM, Jan Beulich wrote:

On 06.05.2025 18:51, Oleksii Kurochko wrote:

+static void cf_check aplic_set_irq_type(struct irq_desc *desc, unsigned int 
type)
+{
+/*
+* Interrupt 0 isn't possible based on the spec:
+*   Each of an APLIC’s interrupt sources has a fixed unique identity 
number in the range 1 to N,
+*   where N is the total number of sources at the APLIC. The number zero 
is not a valid interrupt
+*   identity number at an APLIC. The maximum number of interrupt sources 
an APLIC may support
+*   is 1023.
+*
+* Thereby interrupt 1 will correspond to bit 0 in sourcecfg[] register,
+* interrupt 2 ->sourcecfg[1] and so on.
+*
+* And that is the reason why we need -1.
+*/
+unsigned int irq_bit = desc->irq - 1;
+
+spin_lock(&aplic.lock);
+
+switch(type)

Nit: style


+{
+case IRQ_TYPE_EDGE_RISING:
+writel(APLIC_SOURCECFG_SM_EDGE_RISE, &aplic.regs->sourcecfg[irq_bit]);
+break;
+
+case IRQ_TYPE_EDGE_FALLING:
+writel(APLIC_SOURCECFG_SM_EDGE_FALL, &aplic.regs->sourcecfg[irq_bit]);
+break;
+
+case IRQ_TYPE_LEVEL_HIGH:
+writel(APLIC_SOURCECFG_SM_LEVEL_HIGH, &aplic.regs->sourcecfg[irq_bit]);
+break;
+
+case IRQ_TYPE_LEVEL_LOW:
+writel(APLIC_SOURCECFG_SM_LEVEL_LOW, &aplic.regs->sourcecfg[irq_bit]);
+break;
+
+case IRQ_TYPE_NONE:
+fallthrough;

This is pointless (and hampering readability) when there are no other 
statements.


Oh, okay, it should be:
  case IRQ_TYPE_NONE:
  case IRQ_TYPE_INVALID:
I thought fallthrough should be used always in such cases.
Anyway, I'll drop it.



With both taken care of:
Acked-by: Jan Beulich


Thanks.

I am going also to add "ASSERT(spin_is_locked(&desc->lock));" at the start of 
this
function to be algined with other callbacks which uses 
spin_{un}lock(&aplic.lock).

~ Oleksii

Re: [PATCH v2 15/16] xen/riscv: implement setup_irq()



On 5/15/25 11:57 AM, Jan Beulich wrote:

On 06.05.2025 18:51, Oleksii Kurochko wrote:

@@ -58,6 +59,89 @@ int platform_get_irq(const struct dt_device_node *device, 
int index)
  return dt_irq.irq;
  }
  
+static int _setup_irq(struct irq_desc *desc, unsigned int irqflags,

+  struct irqaction *new)
+{
+bool shared = irqflags & IRQF_SHARED;
+
+ASSERT(new != NULL);
+
+/*
+ * Sanity checks:
+ *  - if the IRQ is marked as shared
+ *  - dev_id is not NULL when IRQF_SHARED is set
+ */
+if ( desc->action != NULL && (!(desc->status & IRQF_SHARED) || !shared) )
+return -EINVAL;
+if ( shared && new->dev_id == NULL )
+return -EINVAL;
+
+if ( shared )
+desc->status |= IRQF_SHARED;
+
+#ifdef CONFIG_IRQ_HAS_MULTIPLE_ACTION
+new->next = desc->action;
+#endif

Didn't you indicate you'd drop this?


To one of replies I wrote that probably something (eg RISC-V's IOMMU) will want 
to setup
multiple handler for the same interrupt. But I'm not sure yet. I can drop for 
now and
return back when it will be really a use case.

~ Oleksii

Re: [PATCH] arm/vgic-v3: Fix write_ignore_64's check in __vgic_v3_rdistr_rd_mmio_write()

On 20/05/2025 2:47 pm, Oleksandr Tyshchenko wrote:
> An attempt to write access the register (i.e. GICR_PROPBASER, GICR_PENDBASER)
> which should be ignored (i.e. no virtual ITS present) causes the data about

Do you mean "data abort" here?  If not, I can't parse the sentence.

> due to incorrect check at the write_ignore_64 label. The check should be
> inverted.
>
> Fixes: c4d6bbdc12e5 ("xen/arm: vgic-v3: Support 32-bit access for 64-bit 
> registers")
> Signed-off-by: Oleksandr Tyshchenko 
> ---
>  xen/arch/arm/vgic-v3.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c
> index 2eaa48fadb..b366b046a2 100644
> --- a/xen/arch/arm/vgic-v3.c
> +++ b/xen/arch/arm/vgic-v3.c
> @@ -649,7 +649,7 @@ bad_width:
>  return 0;
>  
>  write_ignore_64:
> -if ( vgic_reg64_check_access(dabt) ) goto bad_width;
> +if ( !vgic_reg64_check_access(dabt) ) goto bad_width;

As you're modifying anyway, the goto should be on the next line.

~Andrew

Re: [PATCH v2 6/6] x86/hvm: reduce the need to flush caches in memory_type_changed()




On 5/19/25 4:33 PM, Roger Pau Monné wrote:

On Mon, May 19, 2025 at 03:22:32PM +0200, Jan Beulich wrote:

On 19.05.2025 13:08, Roger Pau Monné wrote:

On Sun, May 18, 2025 at 01:44:49PM +0200, Jan Beulich wrote:

On 16.05.2025 11:45, Roger Pau Monne wrote:

Not sure whether this attempt to reduce cache flushing should get some
mention in the CHANGELOG.

Err on the side of adding an entry there?

Oleksii would you be fine with me adding:

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 1ea06524db20..fa971cd9e6ee 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -11,6 +11,9 @@ The format is based on [Keep a 
Changelog](https://keepachangelog.com/en/1.0.0/)
 - For x86, GCC 5.1 and Binutils 2.25, or Clang/LLVM 11
 - For ARM32 and ARM64, GCC 5.1 and Binutils 2.25
   - Linux based device model stubdomains are now fully supported.
+ - On x86:
+   - Restrict the cache flushing done in memory_type_changed() to improve
+ performance.

Maybe better to mention function names here, saying "after a memory type change
by a guest" instead?

It's not just "after a memory type change by a guest", since
memory_type_changed() is also used for toolstack operations like
io{mem,ports}_{permit,deny}_access(), that strictly speaking are not
memory type changes, neither are done by the guest itself.  I could
reword to:

- Restrict the cache flushing done as a result of guest physical
  memory map manipulations and memory type changes.

Does that sound better?


Acked-By: Oleksii Kurochko 

~ Oleksii

Re: [PATCH v2 2/3] Add lockdown mode

2025-05-20 Thread Kevin Lampis

On Tue, May 20, 2025 at 3:23 PM Jan Beulich  wrote:
>
> No comments on the patch itself (yet), just a formal remark: I was puzzled
> by having only v2 2/3 and 3/3 in my inbox. Looks like you sent each as
> reply on the v1 sub-threads. Very occasionally for a larger series it may
> be okay to send just a single update that way. Here, however, please re-
> send as a full, standalone v2 series.

Sorry I will do that in future.

Re: [PATCH v2 2/5] public/sysctl: Clarify usage of pm_{px,cx}_stat

2025-05-20 Thread Ross Lagerwall

On Tue, May 13, 2025 at 3:43 PM Jan Beulich  wrote:
>
> On 12.05.2025 16:46, Ross Lagerwall wrote:
> > --- a/xen/include/public/sysctl.h
> > +++ b/xen/include/public/sysctl.h
> > @@ -215,23 +215,51 @@ typedef struct pm_px_val pm_px_val_t;
> >  DEFINE_XEN_GUEST_HANDLE(pm_px_val_t);
> >
> >  struct pm_px_stat {
> > -uint8_t total;/* total Px states */
> > -uint8_t usable;   /* usable Px states */
> > -uint8_t last; /* last Px state */
> > -uint8_t cur;  /* current Px state */
> > -XEN_GUEST_HANDLE_64(uint64) trans_pt;   /* Px transition table */
> > +/*
> > + * IN: Number of elements in pt, number of rows/columns in trans_pt
> > + * (PMSTAT_get_pxstat)
> > + * OUT: total Px states (PMSTAT_get_max_px, PMSTAT_get_pxstat)
> > + */
> > +uint8_t total;
>
> The part for this field ought to go in patch 1, as already indicated there.
>
> > +uint8_t usable;   /* OUT: usable Px states (PMSTAT_get_pxstat) */
> > +uint8_t last; /* OUT: last Px state (PMSTAT_get_pxstat) */
> > +uint8_t cur;  /* OUT: current Px state (PMSTAT_get_pxstat) */
> > +/*
> > + * OUT: Px transition table. This should have total * total elements.
> > + *  This will not be copied if it is smaller than the hypervisor's
> > + *  Px transition table. (PMSTAT_get_pxstat)
> > + */
> > +XEN_GUEST_HANDLE_64(uint64) trans_pt;
> > +/* OUT: This should have total elements (PMSTAT_get_pxstat) */
> >  XEN_GUEST_HANDLE_64(pm_px_val_t) pt;
>
> As also indicated there, the same constraint as for trans_pt applies to this
> output buffer, just that it's having only one dimension.
>
> >  };
> >
> >  struct pm_cx_stat {
> > -uint32_t nr;/* entry nr in triggers & residencies, including C0 */
> > -uint32_t last;  /* last Cx state */
> > -uint64_aligned_t idle_time; /* idle time from boot */
> > -XEN_GUEST_HANDLE_64(uint64) triggers;/* Cx trigger counts */
> > -XEN_GUEST_HANDLE_64(uint64) residencies; /* Cx residencies */
> > -uint32_t nr_pc;  /* entry nr in pc[] */
> > -uint32_t nr_cc;  /* entry nr in cc[] */
> >  /*
> > + * IN:  Number of elements in triggers, residencies (PMSTAT_get_cxstat)
> > + * OUT: entry nr in triggers & residencies, including C0
> > + *  (PMSTAT_get_cxstat, PMSTAT_get_max_cx)
> > + */
> > +uint32_t nr;
> > +uint32_t last;  /* OUT: last Cx state (PMSTAT_get_cxstat) */
> > +/* OUT: idle time from boot (PMSTAT_get_cxstat)*/
> > +uint64_aligned_t idle_time;
> > +/* OUT: Cx trigger counts, nr elements (PMSTAT_get_cxstat) */
> > +XEN_GUEST_HANDLE_64(uint64) triggers;
> > +/* OUT: Cx residencies, nr elements (PMSTAT_get_cxstat) */
> > +XEN_GUEST_HANDLE_64(uint64) residencies;
> > +/*
> > + * IN: entry nr in pc[] (PMSTAT_get_cxstat)
> > + * OUT: Index of highest non-zero entry set in pc[] (PMSTAT_get_cxstat)
> > + */
> > +uint32_t nr_pc;
> > +/*
> > + * IN: entry nr in cc[] (PMSTAT_get_cxstat)
> > + * OUT: Index of highest non-zero entry set in cc[] (PMSTAT_get_cxstat)
> > + */
>
> For both of these, it's not "highest non-zero" but, according to ...
>
> > +uint32_t nr_cc;
> > +/*
> > + * OUT: (PMSTAT_get_cxstat)
> >   * These two arrays may (and generally will) have unused slots; slots 
> > not
> >   * having a corresponding hardware register will not be written by the
> >   * hypervisor. It is therefore up to the caller to put a suitable 
> > sentinel
>
> ... this comment, "highest written by hypervisor". They're also not "index 
> of",
> but "one higher than the index of" (i.e. counts, not indexes).
>

Looking at this again, I don't think that matches what Xen does (nor
does my previous attempt). The code in question:

#define PUT_xC(what, n) do { \
if ( stat->nr_##what >= n && \
 copy_to_guest_offset(stat->what, n - 1, &hw_res.what##n, 1) ) \
return -EFAULT; \
if ( hw_res.what##n ) \
nr_##what = n; \
} while ( 0 )

Xen will copy all the hardware registers that it knows about (regardless
of whether the hardware actually has them) and will return in nr_pc /
nr_cc the index + 1 of the highest non-zero entry it _would_ have
written if there is sufficient space.

I could describe it simply as "OUT: Required size of cc[]" ?

Thanks,
Ross

Re: [PATCH v1 3/6] xen/riscv: construct the P2M pages pool for guests

On 09.05.2025 17:57, Oleksii Kurochko wrote:
> Implement p2m_set_allocation() to construct p2m pages pool for guests
> based on required number of pages.
> 
> This is implemented by:
> - Adding a `struct paging_domain` which contains a freelist, a
>   counter variable and a spinlock to `struct arch_domain` to
>   indicate the free p2m pages and the number of p2m total pages in
>   the p2m pages pool.
> - Adding a helper `p2m_set_allocation` to set the p2m pages pool
>   size. This helper should be called before allocating memory for
>   a guest and is called from domain_p2m_set_allocation(), the latter
>   is a part of common dom0less code.
> 
> Signed-off-by: Oleksii Kurochko 

As already indicated in reply to patch 2, I expect this pool will want
introducing ahead of that.

> --- a/xen/arch/riscv/p2m.c
> +++ b/xen/arch/riscv/p2m.c
> @@ -1,4 +1,12 @@
>  #include 
> +/*
> + * Because of general_preempt_check() from xen/sched.h which uses
> + * local_events_need_delivery() but latter is declared in .
> + * Thereby it is needed to icnlude  here before xen/sched.h.
> + *
> + * Shouldn't be xen/event.h be included in ?
> + */
> +#include 

The question doesn't belong here; such could be put in the post-commit-
message area. And the answer depends on what dependency you found missing.

> @@ -166,3 +176,60 @@ int p2m_init(struct domain *d)
>  
>  return 0;
>  }
> +
> +/*
> + * Set the pool of pages to the required number of pages.
> + * Returns 0 for success, non-zero for failure.
> + * Call with d->arch.paging.lock held.
> + */
> +int p2m_set_allocation(struct domain *d, unsigned long pages, bool 
> *preempted)
> +{
> +struct page_info *pg;
> +
> +ASSERT(spin_is_locked(&d->arch.paging.lock));
> +
> +for ( ; ; )
> +{
> +if ( d->arch.paging.p2m_total_pages < pages )
> +{
> +/* Need to allocate more memory from domheap */
> +pg = alloc_domheap_page(d, MEMF_no_owner);
> +if ( pg == NULL )
> +{
> +printk(XENLOG_ERR "Failed to allocate P2M pages.\n");
> +return -ENOMEM;
> +}
> +ACCESS_ONCE(d->arch.paging.p2m_total_pages) =
> +d->arch.paging.p2m_total_pages + 1;

Looks like you copied this from Arm, but this code is bogus: Using
ACCESS_ONCE() just on the lhs is pretty pointless. Once also used on the
rhs the expression can easily become

ACCESS_ONCE(d->arch.paging.p2m_total_pages) += 1;

or even

ACCESS_ONCE(d->arch.paging.p2m_total_pages)++;

.

> +page_list_add_tail(pg, &d->arch.paging.p2m_freelist);
> +}
> +else if ( d->arch.paging.p2m_total_pages > pages )
> +{
> +/* Need to return memory to domheap */
> +pg = page_list_remove_head(&d->arch.paging.p2m_freelist);
> +if( pg )
> +{
> +ACCESS_ONCE(d->arch.paging.p2m_total_pages) =
> +d->arch.paging.p2m_total_pages - 1;

Same here then, obviously.

> +free_domheap_page(pg);
> +}
> +else
> +{
> +printk(XENLOG_ERR
> +   "Failed to free P2M pages, P2M freelist is empty.\n");
> +return -ENOMEM;
> +}
> +}
> +else
> +break;
> +
> +/* Check to see if we need to yield and try again */
> +if ( preempted && general_preempt_check() )

While it's this way on both Arm and x86, I wonder how useful it is
to check on every iteration, especially when freeing pages back to the
buddy allocator.

Jan

Re: [PATCH v1 4/6] xen/riscv: define pt_t and pt_walk_t structures

On 09.05.2025 17:57, Oleksii Kurochko wrote:
> Refactor pte_t to be a union which hold page table entry plus
> pt_t and pt_walk_t structures to simpilfy p2m functions.

Is this really simplifying things? I really view ...

> Also, introduce some helpers which are using pt_walk_t.

... these helpers as confusing things, by using the wrong part of the
union relative to what their names are. (I'll re-phrase this some at
the bottom.)

With the union it's also unclear to me how to know which part of the
union is the one that's valid to use, when there's no auxiliary info
available.

> --- a/xen/arch/riscv/include/asm/page.h
> +++ b/xen/arch/riscv/include/asm/page.h
> @@ -99,15 +99,67 @@
>  
>  #endif
>  
> -/* Page Table entry */
>  typedef struct {
> +unsigned long v:1;
> +unsigned long r:1;
> +unsigned long w:1;
> +unsigned long x:1;
> +unsigned long u:1;
> +unsigned long g:1;
> +unsigned long a:1;
> +unsigned long d:1;
> +unsigned long rsw:2;
> +#if RV_STAGE1_MODE == SATP_MODE_SV39
> +unsigned long ppn0:9;
> +unsigned long ppn1:9;
> +unsigned long ppn2:26;
> +unsigned long rsw2:7;
> +unsigned long pbmt:2;
> +unsigned long n:1;
> +#elif RV_STAGE1_MODE == SATP_MODE_SV48
> +unsigned long ppn0:9;
> +unsigned long ppn1:9;
> +unsigned long ppn2:9;
> +unsigned long ppn3:17;
> +unsigned long rsw2:7;
> +unsigned long pbmt:2;
> +unsigned long n:1;
> +#else

Imo you want to settle on whether you want to use bitfields or #define-s
to manipulate page table entries. Using a mix is going to be confusing
(or worse).

> +#error "Add proper bits for SATP_MODE"
> +#endif
> +} pt_t;
> +
> +typedef struct {
> +unsigned long rsw:10;
> +#if RV_STAGE1_MODE == SATP_MODE_SV39 || RV_STAGE1_MODE == SATP_MODE_SV48
> +unsigned long ppn: 44;

Nit: Why's there a blank after the colon here when there's none anywhere else?

> +static inline void pte_set_mfn(pte_t *pte, mfn_t mfn)
> +{
> +pte->walk.ppn = mfn_x(mfn);
> +}
> +
> +static inline mfn_t pte_get_mfn(pte_t pte)
> +{
> +return _mfn(pte.walk.ppn);
> +}

Following to my remark at the top - if you do it this way, what use are the
ppn fields?

Jan

Re: [PATCH v1 5/6] xen/riscv: add new p2m types and helper macros for type classification

On 09.05.2025 17:57, Oleksii Kurochko wrote:
> --- a/xen/arch/riscv/include/asm/p2m.h
> +++ b/xen/arch/riscv/include/asm/p2m.h
> @@ -80,8 +80,36 @@ struct p2m_domain {
>  typedef enum {
>  p2m_invalid = 0,/* Nothing mapped here */
>  p2m_ram_rw, /* Normal read/write domain RAM */
> +p2m_ram_ro, /* Read-only; writes are silently dropped */

This is pretty special a type, which imo better wouldn't be introduced
without there being proper support for it. (I don't suppose RISC-V
hardware alone can effect this type?)

> +p2m_mmio_direct_dev,/* Read/write mapping of genuine Device MMIO area */
> +p2m_map_foreign_rw, /* Read/write RAM pages from foreign domain */
> +p2m_map_foreign_ro, /* Read-only RAM pages from foreign domain */

Aiui you took these from Arm. Looking at its sole use, I'm not convinced
it's used correctly. If it is, the same comment as for p2m_ram_ro above
would apply here, too.

Jan

Re: [PATCH] arm/vgic-v3: Fix write_ignore_64's check in __vgic_v3_rdistr_rd_mmio_write()



On 20.05.25 17:24, Andrew Cooper wrote:

Hello Andrew

> On 20/05/2025 2:47 pm, Oleksandr Tyshchenko wrote:
>> An attempt to write access the register (i.e. GICR_PROPBASER, GICR_PENDBASER)
>> which should be ignored (i.e. no virtual ITS present) causes the data about
> 
> Do you mean "data abort" here?

yes

   If not, I can't parse the sentence.
> 
>> due to incorrect check at the write_ignore_64 label. The check should be
>> inverted.
>>
>> Fixes: c4d6bbdc12e5 ("xen/arm: vgic-v3: Support 32-bit access for 64-bit 
>> registers")
>> Signed-off-by: Oleksandr Tyshchenko 
>> ---
>>   xen/arch/arm/vgic-v3.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c
>> index 2eaa48fadb..b366b046a2 100644
>> --- a/xen/arch/arm/vgic-v3.c
>> +++ b/xen/arch/arm/vgic-v3.c
>> @@ -649,7 +649,7 @@ bad_width:
>>   return 0;
>>   
>>   write_ignore_64:
>> -if ( vgic_reg64_check_access(dabt) ) goto bad_width;
>> +if ( !vgic_reg64_check_access(dabt) ) goto bad_width;
> 
> As you're modifying anyway, the goto should be on the next line.

ok, will move

> 
> ~Andrew

Re: [PATCH v1 6/6] xen/riscv: implement p2m mapping functionality

On 09.05.2025 17:57, Oleksii Kurochko wrote:
> These utilities are needed for building and managing RISC-V guest page
> tables and MMIO mappings by using functions map_regions_p2mt() and
> guest_physmap_add_entry().
> 
> To implement p2m mapping functionality the following is introduced:
> - Define P2M root level/order and entry count.
> - Introdude radix type for p2m types as it isn't enough free bits in pte
>   and the helpers (p2m_type_radix_{get,set}()) to deal with them.
> - Introduce p2m_is_*() helpers() as  pte_is_*() helpers are checking
>   the valid bit set in the PTE but we have to check p2m_type instead
>   (look at the comment above p2m_is_valid() for some details).

May I suggest to name them at least p2me_is_*() then, as they check entries
rather than entire P2Ms? Same perhaps elsewhere.

> - Introduce helper to set p2m's pte permission: p2m_set_permissions().
> - Introduce helper to create p2m entry based on mfn, p2m_type_t and
>   p2m_access_t.
> - Introduce helper to generate table entry with correct attributes:
>   page_to_p2m_table().
> - Introduce p2m page allocation function: p2m_alloc_page().
> - Introduce functions to write/remove p2m's entries: p2m_{write,remove}_pte().
> - Introduce function to allocate p2m table: p2m_create_table().
> - Introduce functions used to free p2m entry.
> - Introduce function for table walking: p2m_next_level().
> - Introduce function to insert an entry in the p2m (p2m_set_entry()).
> - Introduce superpage splitting: p2m_split_superpage()).
> - Introduce page table type defines (PGT_{none,writable_page}, etc).
> 
> Signed-off-by: Oleksii Kurochko 
> ---
>  xen/arch/riscv/include/asm/mm.h   |  32 +-
>  xen/arch/riscv/include/asm/p2m.h  |  17 +-
>  xen/arch/riscv/include/asm/page.h |  11 +
>  xen/arch/riscv/p2m.c  | 780 ++
>  4 files changed, 829 insertions(+), 11 deletions(-)

It's imo too many things you do in one go here.

Jan

Re: [PATCH v2 1/2] xen/domain: introduce non-x86 hardware emulation flags

On 16.05.2025 04:29, dm...@proton.me wrote:
> From: Denis Mukhin 
> 
> Define per-architecture emulation_flags for configuring domain emulation
> features.
> 
> Print d->arch.emulation_flags from 'q' keyhandler for better traceability
> while debugging.
> 
> Signed-off-by: Denis Mukhin 
> ---
> Changes since v1:
> - dropped comments
> ---
>  xen/arch/arm/include/asm/domain.h   | 1 +
>  xen/arch/ppc/include/asm/domain.h   | 1 +
>  xen/arch/riscv/include/asm/domain.h | 1 +
>  xen/common/keyhandler.c | 1 +
>  4 files changed, 4 insertions(+)

If every arch gains identical fields, accessed from common code, why would
those need to live in struct arch_domain?

Jan

Re: [PATCH] arm/vgic-v3: Fix write_ignore_64's check in __vgic_v3_rdistr_rd_mmio_write()



On 20.05.25 18:02, Julien Grall wrote:
> Hi Oleksandr,

Hello Julien

> 
> On 20/05/2025 14:47, Oleksandr Tyshchenko wrote:
>> An attempt to write access the register (i.e. GICR_PROPBASER, 
>> GICR_PENDBASER)
>> which should be ignored (i.e. no virtual ITS present) causes the data 
>> about
> 
> I assume, this is a guest data abort, rather than Xen crash?

yes

> 
>> due to incorrect check at the write_ignore_64 label. The check should be
>> inverted.
> 
> OOI, why would a guest try to write to GICR_PROPBASER if the ITS is not 
> present? Was it a bug in the OS?

no, it was just me experimenting with redistributor registers.

> 
>>
>> Fixes: c4d6bbdc12e5 ("xen/arm: vgic-v3: Support 32-bit access for 
>> 64-bit registers")
>> Signed-off-by: Oleksandr Tyshchenko 
> 
> With the commit message clarified and Andrew's comments addressed:
> 
> Acked-by: Julien Grall 

thanks

> 
> Cheers,
>

Re: [PATCH v1 2/6] xen/riscv: introduce things necessary for p2m initialization

On 09.05.2025 17:57, Oleksii Kurochko wrote:
> --- a/xen/arch/riscv/include/asm/domain.h
> +++ b/xen/arch/riscv/include/asm/domain.h
> @@ -5,6 +5,8 @@
>  #include 
>  #include 
>  
> +#include 
> +
>  struct hvm_domain
>  {
>  uint64_t  params[HVM_NR_PARAMS];
> @@ -16,8 +18,12 @@ struct arch_vcpu_io {
>  struct arch_vcpu {
>  };
>  
> +

Nit: As before, no double blank lines please.

>  struct arch_domain {
>  struct hvm_domain hvm;
> +
> +struct p2m_domain p2m;
> +
>  };

Similarly, no blank lines please at the end of a struct/union/enum.

> --- a/xen/arch/riscv/include/asm/mm.h
> +++ b/xen/arch/riscv/include/asm/mm.h
> @@ -149,6 +149,10 @@ extern struct page_info *frametable_virt_start;
>  #define mfn_to_page(mfn)(frametable_virt_start + mfn_x(mfn))
>  #define page_to_mfn(pg) _mfn((pg) - frametable_virt_start)
>  
> +/* Convert between machine addresses and page-info structures. */
> +#define maddr_to_page(ma) mfn_to_page(maddr_to_mfn(ma))
> +#define page_to_maddr(pg) (mfn_to_maddr(page_to_mfn(pg)))

Nit: The outermost parentheses aren't really needed here. Or if you really
want them, then please be consistent and also add them for the other macro
you add.

> --- /dev/null
> +++ b/xen/arch/riscv/p2m.c
> @@ -0,0 +1,168 @@
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +#include 
> +
> +/*
> + * Force a synchronous P2M TLB flush.
> + *
> + * Must be called with the p2m lock held.
> + *
> + * TODO: add support of flushing TLB connected to VMID.
> + */
> +static void p2m_force_tlb_flush_sync(struct p2m_domain *p2m)
> +{
> +ASSERT(p2m_is_write_locked(p2m));
> +
> +/*
> + * TODO: shouldn't be this flush done for each physical CPU?
> + *   If yes, then SBI call sbi_remote_hfence_gvma() could
> + *   be used for that.
> + */
> +#if defined(__riscv_hh) || defined(__riscv_h)

This is a compiler capability check (which would be applicable if you
used a built-in function below).

> +asm volatile ( "hfence.gvma" ::: "memory" );

Whereas here you use a feature in the assembler, at least for the GNU
toolchain.

> +static void clear_and_clean_page(struct page_info *page)
> +{
> +void *p = __map_domain_page(page);
> +
> +clear_page(p);
> +unmap_domain_page(p);
> +}

What's the "clean" about in the function name? The "clear" is referring
to the clear_page() call afaict. Also aren't you largely open-coding
clear_domain_page() here?

> +static struct page_info *p2m_get_clean_page(struct domain *d)
> +{
> +struct page_info *page;
> +
> +/*
> + * As mentioned in the Priviliged Architecture Spec (version 20240411)
> + * As explained in Section 18.5.1, for the paged virtual-memory schemes
> + * (Sv32x4, Sv39x4, Sv48x4, and Sv57x4), the root page table is 16 KiB
> + * and must be aligned to a 16-KiB boundary.
> + */
> +page = alloc_domheap_pages(NULL, 2, 0);

Shouldn't this allocation come from the domain's P2M pool (which is yet
to be introduced)? Also hard-coding 2 here as order effectively builds
in an assumption that PAGE_SIZE will only ever be 4k. I think to wants
properly calculating instead.

> +if ( page == NULL )
> +return NULL;
> +
> +clear_and_clean_page(page);
> +
> +return page;
> +}

Contrary to the function name you obtained 4 pages here, which is suitable
for ...

> +static struct page_info *p2m_allocate_root(struct domain *d)
> +{
> +return p2m_get_clean_page(d);
> +}

... this but - I expect - no anywhere else.

> +static unsigned long hgatp_from_page_info(struct page_info *page_info)

Except for the struct name please drop _info; we don#t use such anywhere
else.

> +{
> +unsigned long ppn;
> +unsigned long hgatp_mode;
> +
> +ppn = PFN_DOWN(page_to_maddr(page_info)) & HGATP_PPN;
> +
> +/* ASID (VMID) not supported yet */
> +
> +#if RV_STAGE1_MODE == SATP_MODE_SV39
> +hgatp_mode = HGATP_MODE_SV39X4;
> +#elif RV_STAGE1_MODE == SATP_MODE_SV48
> +hgatp_mode = HGATP_MODE_SV48X4;
> +#else
> +#error "add HGATP_MODE"

As before, please have the # of pre-processor directives in the first column.

> +#endif
> +
> +return ppn | (hgatp_mode << HGATP_MODE_SHIFT);

Use MASK_INSR()?

> +}
> +
> +static int p2m_alloc_table(struct domain *d)
> +{
> +struct p2m_domain *p2m = p2m_get_hostp2m(d);
> +
> +p2m->root = p2m_allocate_root(d);
> +if ( !p2m->root )
> +return -ENOMEM;
> +
> +p2m->hgatp = hgatp_from_page_info(p2m->root);
> +
> +/*
> + * Make sure that all TLBs corresponding to the new VMID are flushed
> + * before using it.
> + */
> +p2m_write_lock(p2m);
> +p2m_force_tlb_flush_sync(p2m);
> +p2m_write_unlock(p2m);

While Andrew directed you towards a better model in general, it won't be
usable here then, as the guest didn't run on any pCPU(s) yet. Imo you
want to do a single global flush e.g. when VMIDs wrap around. That'll be
fewer g

[PATCH] xen/argo: Command line handling improvements

Treat "argo" on the command line as a positive boolean, rather than requiring
the user to pass "argo=1/on/enable/true".

Move both opt_argo* variables into __ro_after_init.  They're set during
command line parsing and never modified thereafter.

Signed-off-by: Andrew Cooper 
---
CC: Christopher Clark 
CC: Daniel P. Smith 
CC: Denis Mukhin 

Found while
---
 xen/common/argo.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/xen/common/argo.c b/xen/common/argo.c
index cbe8911a4364..027b37b18a6c 100644
--- a/xen/common/argo.c
+++ b/xen/common/argo.c
@@ -79,8 +79,8 @@ DEFINE_XEN_GUEST_HANDLE(xen_argo_unregister_ring_t);
 DEFINE_COMPAT_HANDLE(compat_argo_iov_t);
 #endif
 
-static bool __read_mostly opt_argo;
-static bool __read_mostly opt_argo_mac_permissive;
+static bool __ro_after_init opt_argo;
+static bool __ro_after_init opt_argo_mac_permissive;
 
 static int __init cf_check parse_argo(const char *s)
 {
@@ -92,7 +92,10 @@ static int __init cf_check parse_argo(const char *s)
 if ( !ss )
 ss = strchr(s, '\0');
 
-if ( (val = parse_bool(s, ss)) >= 0 )
+/* Intepret "argo" as a positive boolean. */
+if ( *s == '\0' )
+opt_argo = true;
+else if ( (val = parse_bool(s, ss)) >= 0 )
 opt_argo = val;
 else if ( (val = parse_boolean("mac-permissive", s, ss)) >= 0 )
 opt_argo_mac_permissive = val;

base-commit: 293abb9e0c5e8df96cc5dfe457c62dfcb7542b19
-- 
2.39.5

Re: [PATCH v2 09/16] xen/riscv: introduce register_intc_ops() and intc_hw_ops.

On 20.05.2025 16:04, Oleksii Kurochko wrote:
> On 5/19/25 3:16 PM, Jan Beulich wrote:
>> On 19.05.2025 11:16, Oleksii Kurochko wrote:
>>> On 5/15/25 10:06 AM, Jan Beulich wrote:
 On 06.05.2025 18:51, Oleksii Kurochko wrote:
> --- a/xen/arch/riscv/include/asm/intc.h
> +++ b/xen/arch/riscv/include/asm/intc.h
> @@ -8,6 +8,8 @@
>#ifndef ASM__RISCV__INTERRUPT_CONTOLLER_H
>#define ASM__RISCV__INTERRUPT_CONTOLLER_H
>
> +#include 
 If you need this include anyway, why ...

> @@ -17,6 +19,26 @@ struct intc_info {
>const struct dt_device_node *node;
>};
>
> +struct irq_desc;
 ... this "forward" decl for something that's then already fully defined?
 I can't, however, spot why xen/irq.h would be needed for anything ...
>>> forward decl for irq_desc could be really dropped.
>>>
>>> Inclusion of xen/irq.h was added because of hw_irq_controller which is 
>>> defined as:
>>> typedef const struct hw_interrupt_type hw_irq_controller;
>>>
>>> And I'm not sure how to do forward declaration properly in this case. Just 
>>> use
>>> an explicit definition of hw_irq_controller for host_irq_type member of 
>>> struct
>>> intc_hw_operations seems as not the best one option:
>>> struct hw_interrupt_type;
>> This isn't needed for the use below.
> 
> Shouldn't I tell the compiler that definition of hw_interrupt_type struct 
> exist
> somewhere else?

The above decl merely introduces the type into (global) scope. The same is
achieved by ...

>>> struct intc_hw_operations {
>>> ...
>>> // const hw_irq_controller *host_irq_type;
>>> const struct hw_interrupt_type *host_irq_type;

... this. The case where the earlier decl matters is when a type is used
as a function parameter in a prototype. There, if not previously introduced
into global scope, the scope would be limited to that of the function decl
(and hence a type conflict would result when later the same type is
introduced into global scope).

> --- a/xen/arch/riscv/intc.c
> +++ b/xen/arch/riscv/intc.c
> @@ -5,6 +5,15 @@
>#include 
>#include 
>
> +#include 
> +
> +static struct __ro_after_init intc_hw_operations *intc_hw_ops;
 Nit: Attributes between type and identifier please. Also shouldn't both
 this and ...

> +void __init register_intc_ops(struct intc_hw_operations *ops)
 ... the parameter here be pointer-to-const?
>>> Then|intc_hw_ops| should also be marked as|const| (with the removal 
>>> of|__ro_after_init|),
>> Why remove the attribute?
> 
> My understanding is that if it is marked as 'const' then it automatically 
> mean that it is read-only
> always before and after init, so '__ro_after_init' is useless.

You need to separate properties of the (pointer type) variable, and what
that pointer points to. __ro_after_init applies to the variable, whereas
the const asked for is to apply to the pointed-to type. (This is more
obvious when you place the attribute as requested. Hence why we want that
particular placement.)

Jan

Re: [PATCH v2 10/16] xen/riscv: imsic_init() implementation



On 5/19/25 8:32 PM, Jan Beulich wrote:

On 19.05.2025 17:19, Oleksii Kurochko wrote:

On 5/15/25 10:42 AM, Jan Beulich wrote:

On 06.05.2025 18:51, Oleksii Kurochko wrote:

--- /dev/null
+++ b/xen/arch/riscv/include/asm/imsic.h
@@ -0,0 +1,65 @@
+/* SPDX-License-Identifier: MIT */
+
+/*
+ * xen/arch/riscv/imsic.h
+ *
+ * RISC-V Incoming MSI Controller support
+ *
+ * (c) 2023 Microchip Technology Inc.
+ */
+
+#ifndef ASM__RISCV__IMSIC_H
+#define ASM__RISCV__IMSIC_H
+
+#include 
+
+#define IMSIC_MMIO_PAGE_SHIFT   12
+#define IMSIC_MMIO_PAGE_SZ  (1UL << IMSIC_MMIO_PAGE_SHIFT)
+
+#define IMSIC_MIN_ID63

This isn't the "minimum ID", but the "minimum permissible number of IDs" as per
its first use in imsic_parse_node(). Further uses there consider it a mask,
which makes me wonder whether the imsic_cfg.nr_ids field name is actually
correct: Isn't this the maximum valid ID rather than their count/number?

imsic_cfg.nr_ids it is a value of interrupt identities supported by IMSIC 
interrupt file according to
DT binding:
riscv,num-ids:
  $ref: /schemas/types.yaml#/definitions/uint32
  minimum: 63
  maximum: 2047
  description:
Number of interrupt identities supported by IMSIC interrupt file.

Unless this count accounts for 0 being invalid (and hence isn't counted), I'm
still confused: With the maximum value this would mean 2046 is still valid,
but 2047 isn't anymore. When normally such a boundary would be at an exact
power of 2, i.e. between 2047 and 2048 here.


2047 is inclusive according to the AIA spec:
  The number of interrupt identities supported by an interrupt file
  (and hence the number of active bits in each array) is one less than a 
multiple
  of 64, and may be a minimum of 63 and a maximum of 2047.
  ...
  When an interrupt file supports N distinct interrupt identities,
  valid identity numbers are between 1 and N inclusive.
  The identity numbers within this range are said to be implemented by the 
interrupt
  file; numbers outside this range are not implemented.
  The number zero is never a valid interrupt identity. Identity 0 is just 
ignored.

It is still  not a power of two but it was the AIA spec tells us.

Also, this maximum identity number of 2047 is consistent with related fields 
like
the EIID (External Interrupt Identity) field used in APLICs when forwarding 
MSIs,
which specifies the MSI data value that becomes the minor identity at the target
hart's interrupt file.
The EIID field is typically an 11-bit field, able to hold values from 0 through
2047.
Since identity 0 is invalid, the entire range of valid identity numbers (1-2047)
fits within the values representable by an 11-bit field.

~ Oleksii

Re: [PATCH v1 2/6] xen/riscv: introduce things necessary for p2m initialization

On 12.05.2025 11:24, Oleksii Kurochko wrote:
> On 5/9/25 6:14 PM, Andrew Cooper wrote:
>> On 09/05/2025 4:57 pm, Oleksii Kurochko wrote:
>>> --- /dev/null
>>> +++ b/xen/arch/riscv/p2m.c
>>> @@ -0,0 +1,168 @@
>>> +#include 
>>> +#include 
>>> +#include 
>>> +#include 
>>> +#include 
>>> +#include 
>>> +#include 
>>> +#include 
>>> +
>>> +#include 
>>> +#include 
>>> +
>>> +/*
>>> + * Force a synchronous P2M TLB flush.
>>> + *
>>> + * Must be called with the p2m lock held.
>>> + *
>>> + * TODO: add support of flushing TLB connected to VMID.
>>> + */
>>> +static void p2m_force_tlb_flush_sync(struct p2m_domain *p2m)
>>> +{
>>> +ASSERT(p2m_is_write_locked(p2m));
>>> +
>>> +/*
>>> + * TODO: shouldn't be this flush done for each physical CPU?
>>> + *   If yes, then SBI call sbi_remote_hfence_gvma() could
>>> + *   be used for that.
>>> + */
>>> +#if defined(__riscv_hh) || defined(__riscv_h)
>>> +asm volatile ( "hfence.gvma" ::: "memory" );
>>> +#else
>>> +asm volatile ( ".insn r 0x73, 0x0, 0x31, x0, x0, x0" ::: "memory" );
>>> +#endif
>> TLB flushing needs to happen for each pCPU which potentially has cached
>> a mapping.
>>
>> In other arches, this is tracked by d->dirty_cpumask which is the bitmap
>> of pCPUs where this domain is scheduled.
> 
> I could only find usage of|d->dirty_cpumask| in x86 and common code (grant
> tables) for flushing the TLB. However, it seems that|d->dirty_cpumask| is
> not set anywhere for ARM. Is it sufficient to set a bit in|d->dirty_cpumask|
> in|startup_cpu_idle_loop()|?

No, how would the idle loop be relevant here? The bit needs setting for any
pCPU a vCPU of the domain is running on, i.e. somewhere in context switch
code.

> In addition, it’s also necessary to set and clear bits in|d->dirty_cpumask|
> during|context_switch|, correct? Set it before switching from the previous
> domain, and clear it after switching to the new domain?
> 
> Also, when a bit is set in|d->dirty_cpumask|, the|v->processor| value is also
> stored in|v->dirty_cpu|. Is this needed to track which processor is
> currently being used for the vCPU?
> 
>> CPUs need to flush their TLBs before removing themselves from
>> d->dirty_cpumask, which is typically done during context switch, but it
>> means that to flush the P2M, you only need to IPI a subset of CPUs.
> 
> I can't find where the P2M flush happens for x86/ARM. Could you please point 
> me
> to where it is handled?

Grep for ept_sync_domain, which will give you several involved functions
(for the Intel, i.e. EPT case).

> Also, I found guest_flush_tlb_mask() for x86. I assume that it is x86 specific
> and generally it is enough to have only flush_tlb_mask(), right?

Yes, that's an x86-specific helper which you may or may not want a
counterpart of.

Jan

[PATCH] arm/vgic-v3: Fix write_ignore_64's check in __vgic_v3_rdistr_rd_mmio_write()

An attempt to write access the register (i.e. GICR_PROPBASER, GICR_PENDBASER)
which should be ignored (i.e. no virtual ITS present) causes the data about
due to incorrect check at the write_ignore_64 label. The check should be
inverted.

Fixes: c4d6bbdc12e5 ("xen/arm: vgic-v3: Support 32-bit access for 64-bit 
registers")
Signed-off-by: Oleksandr Tyshchenko 
---
 xen/arch/arm/vgic-v3.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c
index 2eaa48fadb..b366b046a2 100644
--- a/xen/arch/arm/vgic-v3.c
+++ b/xen/arch/arm/vgic-v3.c
@@ -649,7 +649,7 @@ bad_width:
 return 0;
 
 write_ignore_64:
-if ( vgic_reg64_check_access(dabt) ) goto bad_width;
+if ( !vgic_reg64_check_access(dabt) ) goto bad_width;
 return 1;
 
 write_ignore_32:
-- 
2.34.1

Re: [PATCH v2 2/5] public/sysctl: Clarify usage of pm_{px,cx}_stat

On 20.05.2025 16:25, Ross Lagerwall wrote:
> On Tue, May 13, 2025 at 3:43 PM Jan Beulich  wrote:
>>
>> On 12.05.2025 16:46, Ross Lagerwall wrote:
>>> --- a/xen/include/public/sysctl.h
>>> +++ b/xen/include/public/sysctl.h
>>> @@ -215,23 +215,51 @@ typedef struct pm_px_val pm_px_val_t;
>>>  DEFINE_XEN_GUEST_HANDLE(pm_px_val_t);
>>>
>>>  struct pm_px_stat {
>>> -uint8_t total;/* total Px states */
>>> -uint8_t usable;   /* usable Px states */
>>> -uint8_t last; /* last Px state */
>>> -uint8_t cur;  /* current Px state */
>>> -XEN_GUEST_HANDLE_64(uint64) trans_pt;   /* Px transition table */
>>> +/*
>>> + * IN: Number of elements in pt, number of rows/columns in trans_pt
>>> + * (PMSTAT_get_pxstat)
>>> + * OUT: total Px states (PMSTAT_get_max_px, PMSTAT_get_pxstat)
>>> + */
>>> +uint8_t total;
>>
>> The part for this field ought to go in patch 1, as already indicated there.
>>
>>> +uint8_t usable;   /* OUT: usable Px states (PMSTAT_get_pxstat) */
>>> +uint8_t last; /* OUT: last Px state (PMSTAT_get_pxstat) */
>>> +uint8_t cur;  /* OUT: current Px state (PMSTAT_get_pxstat) */
>>> +/*
>>> + * OUT: Px transition table. This should have total * total elements.
>>> + *  This will not be copied if it is smaller than the hypervisor's
>>> + *  Px transition table. (PMSTAT_get_pxstat)
>>> + */
>>> +XEN_GUEST_HANDLE_64(uint64) trans_pt;
>>> +/* OUT: This should have total elements (PMSTAT_get_pxstat) */
>>>  XEN_GUEST_HANDLE_64(pm_px_val_t) pt;
>>
>> As also indicated there, the same constraint as for trans_pt applies to this
>> output buffer, just that it's having only one dimension.
>>
>>>  };
>>>
>>>  struct pm_cx_stat {
>>> -uint32_t nr;/* entry nr in triggers & residencies, including C0 */
>>> -uint32_t last;  /* last Cx state */
>>> -uint64_aligned_t idle_time; /* idle time from boot */
>>> -XEN_GUEST_HANDLE_64(uint64) triggers;/* Cx trigger counts */
>>> -XEN_GUEST_HANDLE_64(uint64) residencies; /* Cx residencies */
>>> -uint32_t nr_pc;  /* entry nr in pc[] */
>>> -uint32_t nr_cc;  /* entry nr in cc[] */
>>>  /*
>>> + * IN:  Number of elements in triggers, residencies (PMSTAT_get_cxstat)
>>> + * OUT: entry nr in triggers & residencies, including C0
>>> + *  (PMSTAT_get_cxstat, PMSTAT_get_max_cx)
>>> + */
>>> +uint32_t nr;
>>> +uint32_t last;  /* OUT: last Cx state (PMSTAT_get_cxstat) */
>>> +/* OUT: idle time from boot (PMSTAT_get_cxstat)*/
>>> +uint64_aligned_t idle_time;
>>> +/* OUT: Cx trigger counts, nr elements (PMSTAT_get_cxstat) */
>>> +XEN_GUEST_HANDLE_64(uint64) triggers;
>>> +/* OUT: Cx residencies, nr elements (PMSTAT_get_cxstat) */
>>> +XEN_GUEST_HANDLE_64(uint64) residencies;
>>> +/*
>>> + * IN: entry nr in pc[] (PMSTAT_get_cxstat)
>>> + * OUT: Index of highest non-zero entry set in pc[] (PMSTAT_get_cxstat)
>>> + */
>>> +uint32_t nr_pc;
>>> +/*
>>> + * IN: entry nr in cc[] (PMSTAT_get_cxstat)
>>> + * OUT: Index of highest non-zero entry set in cc[] (PMSTAT_get_cxstat)
>>> + */
>>
>> For both of these, it's not "highest non-zero" but, according to ...
>>
>>> +uint32_t nr_cc;
>>> +/*
>>> + * OUT: (PMSTAT_get_cxstat)
>>>   * These two arrays may (and generally will) have unused slots; slots 
>>> not
>>>   * having a corresponding hardware register will not be written by the
>>>   * hypervisor. It is therefore up to the caller to put a suitable 
>>> sentinel
>>
>> ... this comment, "highest written by hypervisor". They're also not "index 
>> of",
>> but "one higher than the index of" (i.e. counts, not indexes).
>>
> 
> Looking at this again, I don't think that matches what Xen does (nor
> does my previous attempt). The code in question:
> 
> #define PUT_xC(what, n) do { \
> if ( stat->nr_##what >= n && \
>  copy_to_guest_offset(stat->what, n - 1, &hw_res.what##n, 1) ) \
> return -EFAULT; \
> if ( hw_res.what##n ) \
> nr_##what = n; \
> } while ( 0 )

Right. I should have inserted "that could be" in my reply.

> Xen will copy all the hardware registers that it knows about (regardless
> of whether the hardware actually has them) and will return in nr_pc /
> nr_cc the index + 1 of the highest non-zero entry it _would_ have
> written if there is sufficient space.

Which is kind of bogus when the last (or more) of those would merely be
zero.

> I could describe it simply as "OUT: Required size of cc[]" ?

Maybe append something like "..., for all known to Xen entries to be
written"? Provided we don't want to change the behavior anyway.

Jan

Re: [PATCH] arm/vgic-v3: Fix write_ignore_64's check in __vgic_v3_rdistr_rd_mmio_write()

2025-05-20 Thread Julien Grall


Hi Oleksandr,

On 20/05/2025 14:47, Oleksandr Tyshchenko wrote:

An attempt to write access the register (i.e. GICR_PROPBASER, GICR_PENDBASER)
which should be ignored (i.e. no virtual ITS present) causes the data about


I assume, this is a guest data abort, rather than Xen crash?

due to incorrect check at the write_ignore_64 label. 
The check should be

inverted.


OOI, why would a guest try to write to GICR_PROPBASER if the ITS is not 
present? Was it a bug in the OS?




Fixes: c4d6bbdc12e5 ("xen/arm: vgic-v3: Support 32-bit access for 64-bit 
registers")
Signed-off-by: Oleksandr Tyshchenko 


With the commit message clarified and Andrew's comments addressed:

Acked-by: Julien Grall 

Cheers,

--
Julien Grall

Re: [PATCH v2 2/3] Add lockdown mode

On 20.05.2025 13:57, Kevin Lampis wrote:
> From: Ross Lagerwall 
> 
> The intention of lockdown mode is to prevent attacks from a rogue dom0
> userspace from compromising the system. Lockdown mode can be controlled by a
> Kconfig option and a command-line parameter. It is also enabled automatically
> when Secure Boot is enabled and it cannot be disabled in that case.
> 
> If enabled from the command-line then it is required to be first in the
> list otherwise Xen may process some insecure parameters before reaching
> the lockdown parameter.
> 
> Signed-off-by: Ross Lagerwall 
> Signed-off-by: Kevin Lampis 
> ---
> Changes in v2:
> - Remove custom command line parsing
> - Print warning if lockdown is not first on command line

No comments on the patch itself (yet), just a formal remark: I was puzzled
by having only v2 2/3 and 3/3 in my inbox. Looks like you sent each as
reply on the v1 sub-threads. Very occasionally for a larger series it may
be okay to send just a single update that way. Here, however, please re-
send as a full, standalone v2 series.

Jan

Re: [PATCH v2 2/2] xen/domain: rewrite emulation_flags_ok()

On 16.05.2025 04:29, dm...@proton.me wrote:
> --- a/xen/arch/x86/include/asm/domain.h
> +++ b/xen/arch/x86/include/asm/domain.h
> @@ -494,6 +494,12 @@ struct arch_domain
>   X86_EMU_PIT | X86_EMU_USE_PIRQ |   \
>   X86_EMU_VPCI)
>  
> +/* User-selectable features. */
> +#define X86_EMU_OPTIONAL(X86_EMU_USE_PIRQ)
> +
> +#define X86_EMU_BASELINE(X86_EMU_ALL & ~(X86_EMU_VPCI | \
> + X86_EMU_OPTIONAL))

That is, VPCI is neither baseline nor optional. Certainly at least strange.

Jan

[PATCH V2] arm/vgic-v3: Fix write_ignore_64's check in __vgic_v3_rdistr_rd_mmio_write()

An attempt to write access the register (i.e. GICR_PROPBASER, GICR_PENDBASER)
which should be ignored (i.e. no virtual ITS present) causes the guest data 
abort
due to incorrect check at the write_ignore_64 label. The check should be
inverted.

While at it, move goto to the next line.

Fixes: c4d6bbdc12e5 ("xen/arm: vgic-v3: Support 32-bit access for 64-bit 
registers")
Signed-off-by: Oleksandr Tyshchenko 
Acked-by: Julien Grall 
---
  V2:
   - s/data about/guest data abort in the description
   - add A-b
   - move goto to the next line
---
---
 xen/arch/arm/vgic-v3.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c
index 2eaa48fadb..f20249f731 100644
--- a/xen/arch/arm/vgic-v3.c
+++ b/xen/arch/arm/vgic-v3.c
@@ -649,7 +649,8 @@ bad_width:
 return 0;
 
 write_ignore_64:
-if ( vgic_reg64_check_access(dabt) ) goto bad_width;
+if ( !vgic_reg64_check_access(dabt) )
+goto bad_width;
 return 1;
 
 write_ignore_32:
-- 
2.34.1

Re: [PATCH TEST-ARTEFACTS] (Re)add python3 to alpine rootfs

On Tue, 20 May 2025, Andrew Cooper wrote:
> XTF uses python, and we're looking to reintroduce XTF testing to Xen.
> 
> Signed-off-by: Andrew Cooper 

Acked-by: Stefano Stabellini 


> ---
> CC: Anthony PERARD 
> CC: Stefano Stabellini 
> CC: Marek Marczykowski-Górecki 
> ---
>  scripts/alpine-rootfs.sh | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/scripts/alpine-rootfs.sh b/scripts/alpine-rootfs.sh
> index c304e2ebfbd9..c999b89dbcd8 100755
> --- a/scripts/alpine-rootfs.sh
> +++ b/scripts/alpine-rootfs.sh
> @@ -22,6 +22,9 @@ PKGS=(
>  xz
>  yajl
>  
> +# Xen Test Framework
> +python3
> +
>  # QEMU
>  glib
>  libaio
> -- 
> 2.39.5
>

Re: [PATCH] tools: Add install/uninstall targets to tests/x86_emulator

On Tue, 20 May 2025, Andrew Cooper wrote:
> On 16/05/2024 12:07 pm, Alejandro Vallejo wrote:
> > Bring test_x86_emulator in line with other tests by adding
> > install/uninstall rules.
> >
> > Signed-off-by: Alejandro Vallejo 
> > ---
> >  tools/tests/x86_emulator/Makefile | 11 +--
> >  1 file changed, 9 insertions(+), 2 deletions(-)
> >
> > diff --git a/tools/tests/x86_emulator/Makefile 
> > b/tools/tests/x86_emulator/Makefile
> > index 834b2112e7fe..30edf7e0185d 100644
> > --- a/tools/tests/x86_emulator/Makefile
> > +++ b/tools/tests/x86_emulator/Makefile
> > @@ -269,8 +269,15 @@ clean:
> >  .PHONY: distclean
> >  distclean: clean
> >  
> > -.PHONY: install uninstall
> > -install uninstall:
> > +.PHONY: install
> > +install: all
> > +   $(INSTALL_DIR) $(DESTDIR)$(LIBEXEC_BIN)
> > +   $(if $(TARGET-y),$(INSTALL_PROG) $(TARGET-y) $(DESTDIR)$(LIBEXEC_BIN))
> > +
> > +.PHONY: uninstall
> > +uninstall:
> > +   $(RM) -- $(addprefix $(DESTDIR)$(LIBEXEC_BIN)/,$(TARGET-y))
> > +
> >  
> >  .PHONY: run32 clean32
> >  ifeq ($(XEN_COMPILE_ARCH),x86_64)
> 
> [starting a clean thread]
> 
> x86_emulator is not special enough to behave differently to the rest of
> tools/.
> 
> Theoretical concerns over cross compiling test_x86_emulator for non-x86
> can be fixed by whomever first wants to do this.
> 
> The very real problem is that this doesn't run in x86 CI, because and
> only because it doesn't have an install target.

This patch has been on the list of a while. I do think that having the
x86 emulator better tested as part of the CI-loop is a priority so I am
in favor of taking the patch as is -- anything to make progress in terms
of running test_x86_emulator as part of the CI-loop.

Acked-by: Stefano Stabellini 

Cheers,

Stefano

[PATCH 2/2] MAINTAINERS: add Daniel P. Smith as an Argo maintainer

2025-05-20 Thread Christopher Clark

Daniel is a longstanding contributor to the OpenXT Project where Argo
was developed and is in active use with Xen, and to Argo itself,
involved with the design and development of Argo software.

Signed-off-by: Christopher Clark 
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index e7198363c5..c2e4f8528f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -222,6 +222,7 @@ F:  tools/libacpi/
 
 ARGO
 M: Christopher Clark 
+M: Daniel P. Smith 
 S: Maintained
 F: xen/include/public/argo.h
 F: xen/include/xen/argo.h
-- 
2.25.1

[PATCH 1/2] MAINTAINERS: include Argo documentation in the ARGO section

2025-05-20 Thread Christopher Clark

Signed-off-by: Christopher Clark 
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index c11b82eca9..e7198363c5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -226,6 +226,7 @@ S:  Maintained
 F: xen/include/public/argo.h
 F: xen/include/xen/argo.h
 F: xen/common/argo.c
+F: docs/designs/argo.pandoc
 
 ARINC653 SCHEDULER
 M: Nathan Studer 
-- 
2.25.1

pin_user_pages and foreign mappings error

Hi Juergen and all,

We have an issue where QEMU is mapping foreign pages as usual and
passing them to a driver in Linux (amdxdna). The driver in Linux calls
pin_user_pages_fast() on these pages, and it returns -EFAULT. Stack
trace appended below.

This is Dom0 PVH. We disabled CONFIG_XEN_UNPOPULATED_ALLOC and
CONFIG_XEN_BALLOON_MEMORY_HOTPLUG attemping to make things better but it
did not solved the issue. We tried changing pin_user_pages_fast() to
pin_user_pages(), still -EFAULT. check_vma_flags returns -EFAULT because
of the (VM_IO | VM_PFNMAP) check.

We tried removing (VM_IO | VM_PFNMAP) from privcmd_mmap and
xen_xlate_remap_gfn_array based on the idea that the underlying pages
are normal memory once CONFIG_XEN_UNPOPULATED_ALLOC and
CONFIG_XEN_BALLOON_MEMORY_HOTPLUG are disabled.

In this case, vm_normal_page takes the if
(IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL)) path, none of the checks work,
so it calls print_bad_pte and it breaks.

As another attempt, we tried removing pte_mkspecial from
xlate_mmu.c:remap_pte_fn and remap_pfn_fn, based again on the same idea
that the underlying pages should not be "special". Now it went further
but it broke at unmap_vmas time on a reference counting error. Specifically,
we get "non-zero mapcount" on the callchain from unmap_vmas:

[31789.440433] BUG: Bad page map in process qemu-system-x86  
pte:80018f8a9027 pmd:13c29a067
[31789.440459] page:8316c487 refcount:0 mapcount:-1 
mapping: index:0x0 pfn:0x18f8a9
[31789.440461] flags: 
0x17c214(referenced|dirty|workingset|node=0|zone=2|lastcpupid=0x1f)
[31789.440463] page_type: 0xfffe()
[31789.440465] raw: 0017c214 dead0100 dead0122 

[31789.440467] raw:   fffe 

[31789.440468] page dumped because: bad pte
[31789.440469] addr:780c1213a000 vm_flags:0c0600fb 
anon_vma: mapping:888185672418 index:3a
[31789.440498] file:privcmd fault:privcmd_fault [xen_privcmd] mmap:privcmd_mmap 
[xen_privcmd] read_folio:0x0

So, it would seem that we need to keep treating foreign mapping pages as
special (pte_mkspecial and also VM_IO | VM_PFNMAP) but if we do that
pin_user_pages() fails.

Do you have any ideas how to get pin_user_pages() to work with foreign
mappings from userspace?

Many thanks for your help!!

Cheers,

Stefano

[ 1645.560295] WARNING: CPU: 5 PID: 3989 at mm/gup.c:229 
try_grab_page+0xd7/0x140
[ 1645.560452] CPU: 5 PID: 3989 Comm: qemu-system-x86 Tainted: GB  OE   
   6.8.0+ #36
[ 1645.560457] Hardware name: AMD BIRMANPLUS/BirmanPlus-STX, BIOS TXB1001dB 
12/09/2024 22:52:15
[ 1645.560459] RIP: 0010:try_grab_page+0xd7/0x140
[ 1645.560463] Code: 00 00 00 be 23 00 00 00 48 c1 e8 36 48 8b 3c c5 a0 3d 04 
83 e8 1a 46 fe ff 31 c0 5d 31 d2 31 f6 31 ff 45 31 c0 c3 cc cc cc cc <0f> 0b 41 
b8 f4 ff ff ff 44 89 c0 31 d2 31 f6 31 ff 45 31 c0 c3 cc
[ 1645.560467] RSP: 0018:c90004b6b9d0 EFLAGS: 00010286
[ 1645.560470] RAX: ea00047559c0 RBX: 00080101 RCX: 
[ 1645.560472] RDX:  RSI: 00080101 RDI: ea00047559c0
[ 1645.560474] RBP: c90004b6ba20 R08:  R09: 
[ 1645.560475] R10:  R11:  R12: 8881252fab40
[ 1645.560476] R13: 88818c65c998 R14: ea00047559c0 R15: 80011d567027
[ 1645.560481] FS:  7283a21d3f80() GS:88827e48() 
knlGS:
[ 1645.560483] CS:  0010 DS:  ES:  CR0: 80050033
[ 1645.560485] CR2: 7283a0cb4060 CR3: 00011c986000 CR4: 00750ef0
[ 1645.560490] PKRU: 5554
[ 1645.560492] Call Trace:
[ 1645.560494]  
[ 1645.560499]  ? show_regs+0x72/0x90
[ 1645.560506]  ? try_grab_page+0xd7/0x140
[ 1645.560508]  ? __warn+0x8d/0x160
[ 1645.560512]  ? try_grab_page+0xd7/0x140
[ 1645.560515]  ? report_bug+0x1bb/0x1d0
[ 1645.560520]  ? handle_bug+0x46/0x90
[ 1645.560525]  ? exc_invalid_op+0x19/0x80
[ 1645.560529]  ? asm_exc_invalid_op+0x1b/0x20
[ 1645.560536]  ? try_grab_page+0xd7/0x140
[ 1645.560538]  ? follow_page_pte+0xf1/0x5e0
[ 1645.560541]  follow_page_mask+0x3b5/0x5f0
[ 1645.560544]  ? check_vma_flags+0x1b5/0x290
[ 1645.560547]  __get_user_pages+0x112/0x780
[ 1645.560550]  ? vprintk_emit+0x1ad/0x320
[ 1645.560556]  __gup_longterm_locked+0x231/0xca0
[ 1645.560566]  pin_user_pages+0x78/0xb0
[ 1645.560571]  amdxdna_get_ubuf+0x2c6/0x4d0 [amdxdna]
[ 1645.560667]  amdxdna_gem_create_share_object+0x114/0x170 [amdxdna]
[ 1645.560684]  amdxdna_drm_create_bo_ioctl+0x398/0x5b0 [amdxdna]
[ 1645.560702]  ? __pfx_amdxdna_drm_create_bo_ioctl+0x10/0x10 [amdxdna]
[ 1645.560719]  drm_ioctl_kernel+0xb2/0x110
[ 1645.560727]  drm_ioctl+0x2e2/0x590
[ 1645.560731]  ? __pfx_amdxdna_drm_create_bo_ioctl+0x10/0x10 [amdxdna]
[ 1645.560749]  ? syscall_exit_to_user_mode+0x86/0x260
[ 1645.560753]  ? do_syscall_64+0x93/0x180
[ 1645.560759]  ? __f_unlock_pos+0x12/0x20
[ 1645.560770]  __x64_sys_ioctl+0

Re: pin_user_pages and foreign mappings error

2025-05-20 Thread Demi Marie Obenour

On 5/20/25 20:24, Stefano Stabellini wrote:
> Hi Juergen and all,
> 
> We have an issue where QEMU is mapping foreign pages as usual and
> passing them to a driver in Linux (amdxdna). The driver in Linux calls
> pin_user_pages_fast() on these pages, and it returns -EFAULT. Stack
> trace appended below.

Is the QEMU virtual device that does this upstreamed?

> This is Dom0 PVH. We disabled CONFIG_XEN_UNPOPULATED_ALLOC and
> CONFIG_XEN_BALLOON_MEMORY_HOTPLUG attemping to make things better but it
> did not solved the issue. We tried changing pin_user_pages_fast() to
> pin_user_pages(), still -EFAULT. check_vma_flags returns -EFAULT because
> of the (VM_IO | VM_PFNMAP) check.
> 
> We tried removing (VM_IO | VM_PFNMAP) from privcmd_mmap and
> xen_xlate_remap_gfn_array based on the idea that the underlying pages
> are normal memory once CONFIG_XEN_UNPOPULATED_ALLOC and
> CONFIG_XEN_BALLOON_MEMORY_HOTPLUG are disabled.
> 
> In this case, vm_normal_page takes the if
> (IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL)) path, none of the checks work,
> so it calls print_bad_pte and it breaks.
> 
> As another attempt, we tried removing pte_mkspecial from
> xlate_mmu.c:remap_pte_fn and remap_pfn_fn, based again on the same idea
> that the underlying pages should not be "special". Now it went further
> but it broke at unmap_vmas time on a reference counting error. Specifically,
> we get "non-zero mapcount" on the callchain from unmap_vmas:
> 
> [31789.440433] BUG: Bad page map in process qemu-system-x86  
> pte:80018f8a9027 pmd:13c29a067
> [31789.440459] page:8316c487 refcount:0 mapcount:-1 
> mapping: index:0x0 pfn:0x18f8a9
> [31789.440461] flags: 
> 0x17c214(referenced|dirty|workingset|node=0|zone=2|lastcpupid=0x1f)
> [31789.440463] page_type: 0xfffe()
> [31789.440465] raw: 0017c214 dead0100 dead0122 
> 
> [31789.440467] raw:   fffe 
> 
> [31789.440468] page dumped because: bad pte
> [31789.440469] addr:780c1213a000 vm_flags:0c0600fb 
> anon_vma: mapping:888185672418 index:3a
> [31789.440498] file:privcmd fault:privcmd_fault [xen_privcmd] 
> mmap:privcmd_mmap [xen_privcmd] read_folio:0x0
> 
> So, it would seem that we need to keep treating foreign mapping pages as
> special (pte_mkspecial and also VM_IO | VM_PFNMAP) but if we do that
> pin_user_pages() fails.
> 
> Do you have any ideas how to get pin_user_pages() to work with foreign
> mappings from userspace?I

Does the privcmd driver try to free the pages while amdxdna is still
using them?  privcmd might be assuming that the pages are freed once
the unmap ioctl is called from userspace and the pages are unmapped
from userspace memory.  That isn't true if the pages are pinned by
another driver.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)

OpenPGP_0xB288B55FFF9C22C1.asc
Description: OpenPGP public key


OpenPGP_signature.asc
Description: OpenPGP digital signature

Re: [PATCH v2 2/2] xen/domain: rewrite emulation_flags_ok()

On Tue, May 20, 2025 at 05:24:33PM +0200, Jan Beulich wrote:
> On 16.05.2025 04:29, dm...@proton.me wrote:
> > --- a/xen/arch/x86/include/asm/domain.h
> > +++ b/xen/arch/x86/include/asm/domain.h
> > @@ -494,6 +494,12 @@ struct arch_domain
> >   X86_EMU_PIT | X86_EMU_USE_PIRQ |   \
> >   X86_EMU_VPCI)
> >
> > +/* User-selectable features. */
> > +#define X86_EMU_OPTIONAL(X86_EMU_USE_PIRQ)
> > +
> > +#define X86_EMU_BASELINE(X86_EMU_ALL & ~(X86_EMU_VPCI | \
> > + X86_EMU_OPTIONAL))
> 
> That is, VPCI is neither baseline nor optional. Certainly at least strange.

IMO, X86_EMU_OPTIONAL should include both VPCI and PIRQ.

But that will be a behavior change: AFAIU, VPCI is injected implicitly for dom0
case only, "default" xl toolstack currently excludes VPCI for HVM domains.

Do I understand it correctly that "BASELINE" in the symbol name is a concern?

> 
> Jan

[PATCH v8 2/3] xen/domain: adjust domain ID allocation for Arm

From: Denis Mukhin 

Remove the hardcoded domain ID 0 allocation for hardware domain and replace it
with a call to get_initial_domain_id() (returns the value of hardware_domid on
Arm).

Update domid_alloc(DOMID_INVALID) case to ensure that get_initial_domain_id()
ID is skipped during domain ID allocation to cover domU case in dom0less
configuration. That also fixes a potential issue with re-using ID#0 for domUs
when get_initial_domain_id() returns non-zero.

Signed-off-by: Denis Mukhin 
---
Changes since v7:
- use `bool reserved;` in domid_alloc()
---
 xen/arch/arm/domain_build.c | 4 ++--
 xen/common/device-tree/dom0less-build.c | 9 +++--
 xen/common/domain.c | 6 ++
 3 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index e9d563c269..0ad80b020a 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -2035,9 +2035,9 @@ void __init create_dom0(void)
 if ( !llc_coloring_enabled )
 flags |= CDF_directmap;
 
-domid = domid_alloc(0);
+domid = domid_alloc(get_initial_domain_id());
 if ( domid == DOMID_INVALID )
-panic("Error allocating domain ID 0\n");
+panic("Error allocating domain ID %d\n", get_initial_domain_id());
 
 dom0 = domain_create(domid, &dom0_cfg, flags);
 if ( IS_ERR(dom0) )
diff --git a/xen/common/device-tree/dom0less-build.c 
b/xen/common/device-tree/dom0less-build.c
index 9236dbae11..8e38affd0c 100644
--- a/xen/common/device-tree/dom0less-build.c
+++ b/xen/common/device-tree/dom0less-build.c
@@ -974,14 +974,11 @@ void __init create_domUs(void)
 
 arch_create_domUs(node, &d_cfg, flags);
 
-/*
- * The variable max_init_domid is initialized with zero, so here it's
- * very important to use the pre-increment operator to call
- * domain_create() with a domid > 0. (domid == 0 is reserved for Dom0)
- */
-domid = domid_alloc(++max_init_domid);
+domid = domid_alloc(DOMID_INVALID);
 if ( domid == DOMID_INVALID )
 panic("Error allocating ID for domain %s\n", dt_node_name(node));
+if ( max_init_domid < domid )
+max_init_domid = domid;
 
 d = domain_create(domid, &d_cfg, flags);
 if ( IS_ERR(d) )
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 9c6932c457..01a65cb35d 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -2423,6 +2423,9 @@ domid_t domid_alloc(domid_t domid)
 }
 else
 {
+bool reserved = __test_and_set_bit(get_initial_domain_id(),
+   domid_bitmap);
+
 domid = find_next_zero_bit(domid_bitmap, DOMID_FIRST_RESERVED,
domid_last);
 
@@ -2438,6 +2441,9 @@ domid_t domid_alloc(domid_t domid)
 __set_bit(domid, domid_bitmap);
 domid_last = domid;
 }
+
+if ( !reserved )
+__clear_bit(reserved, domid_bitmap);
 }
 
 spin_unlock(&domid_lock);
-- 
2.34.1

[PATCH v8 1/3] xen/domain: unify domain ID allocation

From: Denis Mukhin 

Currently, hypervisor code has two different non-system domain ID allocation
implementations:

  (a) Sequential IDs allocation in dom0less Arm code based on max_init_domid;

  (b) Sequential IDs allocation in XEN_DOMCTL_createdomain; does not use
  max_init_domid (both Arm and x86).

It makes sense to have a common helper code for such task across architectures
(Arm and x86) and between dom0less / toolstack domU allocation.

Wrap the domain ID allocation as an arch-independent function domid_alloc() in
common/domain.c based on the bitmap.

Allocation algorithm:
- If an explicit domain ID is provided, verify its availability and use it if
  ID is not used;
- If DOMID_INVALID is provided, search the range [0..DOMID_FIRST_RESERVED-1],
  starting from the last used ID and wrapping around as needed. It guarantees
  that two consecutive calls will never return the same ID.

Also, remove is_free_domid() helper as it is not needed now.

No functional change intended.

Signed-off-by: Denis Mukhin 
---
Changes since v7:
- Use DOMID_FIRST_RESERVED
---
 xen/arch/arm/domain_build.c | 17 ++---
 xen/arch/x86/setup.c| 11 +++---
 xen/common/device-tree/dom0less-build.c | 10 +++---
 xen/common/domain.c | 47 +
 xen/common/domctl.c | 42 +++---
 xen/include/xen/domain.h|  3 ++
 6 files changed, 80 insertions(+), 50 deletions(-)

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index b189a7cfae..e9d563c269 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -2010,6 +2010,7 @@ void __init create_dom0(void)
 .grant_opts = XEN_DOMCTL_GRANT_version(opt_gnttab_max_version),
 };
 unsigned int flags = CDF_privileged | CDF_hardware;
+domid_t domid;
 int rc;
 
 /* The vGIC for DOM0 is exactly emulating the hardware GIC */
@@ -2034,19 +2035,25 @@ void __init create_dom0(void)
 if ( !llc_coloring_enabled )
 flags |= CDF_directmap;
 
-dom0 = domain_create(0, &dom0_cfg, flags);
+domid = domid_alloc(0);
+if ( domid == DOMID_INVALID )
+panic("Error allocating domain ID 0\n");
+
+dom0 = domain_create(domid, &dom0_cfg, flags);
 if ( IS_ERR(dom0) )
-panic("Error creating domain 0 (rc = %ld)\n", PTR_ERR(dom0));
+panic("Error creating domain %d (rc = %ld)\n", domid, PTR_ERR(dom0));
 
 if ( llc_coloring_enabled && (rc = dom0_set_llc_colors(dom0)) )
-panic("Error initializing LLC coloring for domain 0 (rc = %d)\n", rc);
+panic("Error initializing LLC coloring for domain %pd (rc = %d)\n",
+  dom0, rc);
 
 if ( alloc_dom0_vcpu0(dom0) == NULL )
-panic("Error creating domain 0 vcpu0\n");
+panic("Error creating domain %pdv0\n", dom0);
 
 rc = construct_dom0(dom0);
 if ( rc )
-panic("Could not set up DOM0 guest OS (rc = %d)\n", rc);
+panic("Could not set up guest OS for domain %pd (rc = %d)\n",
+  dom0, rc);
 
 set_xs_domain(dom0);
 }
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 2518954124..ac1c3e669b 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -1030,8 +1030,11 @@ static struct domain *__init create_dom0(struct 
boot_info *bi)
 if ( iommu_enabled )
 dom0_cfg.flags |= XEN_DOMCTL_CDF_iommu;
 
-/* Create initial domain.  Not d0 for pvshim. */
-bd->domid = get_initial_domain_id();
+/* Allocate initial domain ID. Not d0 for pvshim. */
+bd->domid = domid_alloc(get_initial_domain_id());
+if ( bd->domid == DOMID_INVALID )
+panic("Error allocating domain ID %d\n", get_initial_domain_id());
+
 d = domain_create(bd->domid, &dom0_cfg,
   pv_shim ? 0 : CDF_privileged | CDF_hardware);
 if ( IS_ERR(d) )
@@ -1063,7 +1066,7 @@ static struct domain *__init create_dom0(struct boot_info 
*bi)
 
 if ( (strlen(acpi_param) == 0) && acpi_disabled )
 {
-printk("ACPI is disabled, notifying Domain 0 (acpi=off)\n");
+printk("ACPI is disabled, notifying domain %pd (acpi=off)\n", d);
 safe_strcpy(acpi_param, "off");
 }
 
@@ -1078,7 +1081,7 @@ static struct domain *__init create_dom0(struct boot_info 
*bi)
 
 bd->d = d;
 if ( construct_dom0(bd) != 0 )
-panic("Could not construct domain 0\n");
+panic("Could not construct domain %pd\n", d);
 
 bd->cmdline = NULL;
 xfree(cmdline);
diff --git a/xen/common/device-tree/dom0less-build.c 
b/xen/common/device-tree/dom0less-build.c
index 2c56f13771..9236dbae11 100644
--- a/xen/common/device-tree/dom0less-build.c
+++ b/xen/common/device-tree/dom0less-build.c
@@ -850,15 +850,13 @@ void __init create_domUs(void)
 struct xen_domctl_createdomain d_cfg = {0};
 unsigned int flags = 0U;
 bool has_dtb = false;
+domid_t domid;
 uint32_t val;

[PATCH v8 3/3] xen/domain: introduce CONFIG_MAX_DOMID

From: Denis Mukhin 

Embedded deployments of Xen do not need to have support for more than dozen of
domains.

Introduce build-time configuration option to limit the number of domains during
run-time.

Also, move DOMID_FIRST_RESERVED compile-time check from Arm to common code.

Suggested-by: Julien Grall 
Signed-off-by: Denis Mukhin 
---
This is an RFC.

Changes since v7:
- moved DOMID_FIRST_RESERVED compile-time checks to common code
- changed description for MAX_DOMID Kconfig option
---
 xen/arch/arm/tee/ffa.c  |  3 +--
 xen/arch/x86/cpu/mcheck/mce.c   |  2 +-
 xen/arch/x86/cpu/vpmu.c |  2 +-
 xen/common/Kconfig  |  7 +++
 xen/common/domain.c | 21 -
 xen/common/sched/core.c |  4 ++--
 xen/drivers/passthrough/vtd/iommu.c |  2 +-
 xen/include/public/domctl.h |  2 +-
 8 files changed, 26 insertions(+), 17 deletions(-)

diff --git a/xen/arch/arm/tee/ffa.c b/xen/arch/arm/tee/ffa.c
index 3bbdd7168a..7417ce6bed 100644
--- a/xen/arch/arm/tee/ffa.c
+++ b/xen/arch/arm/tee/ffa.c
@@ -331,10 +331,9 @@ static int ffa_domain_init(struct domain *d)
  * reserved for the hypervisor and we only support secure endpoints using
  * FF-A IDs with BIT 15 set to 1 so make sure those are not used by Xen.
  */
-BUILD_BUG_ON(DOMID_FIRST_RESERVED >= UINT16_MAX);
 BUILD_BUG_ON((DOMID_MASK & BIT(15, U)) != 0);
 
-if ( d->domain_id >= DOMID_FIRST_RESERVED )
+if ( d->domain_id >= CONFIG_MAX_DOMID )
 return -ERANGE;
 
 ctx = xzalloc(struct ffa_ctx);
diff --git a/xen/arch/x86/cpu/mcheck/mce.c b/xen/arch/x86/cpu/mcheck/mce.c
index 1c348e557d..ee8ddd33b0 100644
--- a/xen/arch/x86/cpu/mcheck/mce.c
+++ b/xen/arch/x86/cpu/mcheck/mce.c
@@ -1493,7 +1493,7 @@ long do_mca(XEN_GUEST_HANDLE_PARAM(xen_mc_t) u_xen_mc)
 d = rcu_lock_domain_by_any_id(mc_msrinject->mcinj_domid);
 if ( d == NULL )
 {
-if ( mc_msrinject->mcinj_domid >= DOMID_FIRST_RESERVED )
+if ( mc_msrinject->mcinj_domid >= CONFIG_MAX_DOMID )
 return x86_mcerr("do_mca inject: incompatible flag "
  "MC_MSRINJ_F_GPADDR with domain %d",
  -EINVAL, domid);
diff --git a/xen/arch/x86/cpu/vpmu.c b/xen/arch/x86/cpu/vpmu.c
index c28192ea26..67d423e088 100644
--- a/xen/arch/x86/cpu/vpmu.c
+++ b/xen/arch/x86/cpu/vpmu.c
@@ -174,7 +174,7 @@ void vpmu_do_interrupt(void)
  * in XENPMU_MODE_ALL, for everyone.
  */
 if ( (vpmu_mode & XENPMU_MODE_ALL) ||
- (sampled->domain->domain_id >= DOMID_FIRST_RESERVED) )
+ (sampled->domain->domain_id >= CONFIG_MAX_DOMID) )
 {
 sampling = choose_hwdom_vcpu();
 if ( !sampling )
diff --git a/xen/common/Kconfig b/xen/common/Kconfig
index 6d43be2e6e..66b91840f2 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -576,4 +576,11 @@ config BUDDY_ALLOCATOR_SIZE
  Amount of memory reserved for the buddy allocator to serve Xen heap,
  working alongside the colored one.
 
+config MAX_DOMID
+   int "Maximum number of user domains"
+   range 1 32752
+   default 32752
+   help
+ Specifies the maximum number of domains a user can create.
+
 endmenu
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 01a65cb35d..204b71d096 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -68,7 +68,7 @@ struct domain *domain_list;
 
 /* Non-system domain ID allocator. */
 static DEFINE_SPINLOCK(domid_lock);
-static DECLARE_BITMAP(domid_bitmap, DOMID_FIRST_RESERVED);
+static DECLARE_BITMAP(domid_bitmap, CONFIG_MAX_DOMID);
 static domid_t domid_last;
 
 /*
@@ -155,7 +155,7 @@ int domain_init_states(void)
 ASSERT(rw_is_write_locked_by_me(¤t->domain->event_lock));
 
 dom_state_changed = xvzalloc_array(unsigned long,
-   BITS_TO_LONGS(DOMID_FIRST_RESERVED));
+   BITS_TO_LONGS(CONFIG_MAX_DOMID));
 if ( !dom_state_changed )
 return -ENOMEM;
 
@@ -235,7 +235,7 @@ int get_domain_state(struct xen_domctl_get_domain_state 
*info, struct domain *d,
 while ( dom_state_changed )
 {
 dom = find_first_bit(dom_state_changed, DOMID_MASK + 1);
-if ( dom >= DOMID_FIRST_RESERVED )
+if ( dom >= CONFIG_MAX_DOMID )
 break;
 if ( test_and_clear_bit(dom, dom_state_changed) )
 {
@@ -824,7 +824,7 @@ struct domain *domain_create(domid_t domid,
 /* Sort out our idea of is_hardware_domain(). */
 if ( (flags & CDF_hardware) || domid == hardware_domid )
 {
-if ( hardware_domid < 0 || hardware_domid >= DOMID_FIRST_RESERVED )
+if ( hardware_domid < 0 || hardware_domid >= CONFIG_MAX_DOMID )
 panic("The value of hardware_dom must be a valid domain ID\n");
 
 /* late_hwdom is only allowed for dom0. */
@@ -2414,9 +2414,12 @@ domid

[PATCH v8 0/3] xen/domain: domain ID allocation

The patch series adds new library calls for allocating domain IDs.

Patch 1 introduces new domid_{alloc,free} calls.
Patch 2 adjusts hardware domain ID treatment on Arm.
Patch 3 is an RFC: introduces new CONFIG_MAX_DOMID parameter to limit the
number of user domains during run-time.

Link to v7: 
https://lore.kernel.org/xen-devel/20250519192306.1364471-1-dmuk...@ford.com/ 
Link to CI: 
https://gitlab.com/xen-project/people/dmukhin/xen/-/pipelines/1828093998 

Denis Mukhin (3):
  xen/domain: unify domain ID allocation
  xen/domain: adjust domain ID allocation for Arm
  xen/domain: introduce CONFIG_MAX_DOMID

 xen/arch/arm/domain_build.c | 17 +--
 xen/arch/arm/tee/ffa.c  |  3 +-
 xen/arch/x86/cpu/mcheck/mce.c   |  2 +-
 xen/arch/x86/cpu/vpmu.c |  2 +-
 xen/arch/x86/setup.c| 11 +++--
 xen/common/Kconfig  |  7 +++
 xen/common/device-tree/dom0less-build.c | 17 ---
 xen/common/domain.c | 62 +++--
 xen/common/domctl.c | 42 ++---
 xen/common/sched/core.c |  4 +-
 xen/drivers/passthrough/vtd/iommu.c |  2 +-
 xen/include/public/domctl.h |  2 +-
 xen/include/xen/domain.h|  3 ++
 13 files changed, 108 insertions(+), 66 deletions(-)

-- 
2.34.1

Re: [PATCH 0/3] CI: Improvements to -tools-test- jobs

On Tue, May 20, 2025 at 09:52:36PM +0100, Andrew Cooper wrote:
> Rearrange tools/tests to be more ameanable to running in CI, and drop the
> special casing holding it together.
> 
> 
> 
> Andrew Cooper (3):
>   tools/tests: Drop depriv-fd-checker
>   tools/tests: Install tests into $(LIBEXEC)/tests
>   CI: Drop custom handling of tools/tests

The code changes look good to me

Reviewed-by: Denis Mukhin 

for the series.

> 
>  .gitignore |   1 -
>  automation/scripts/build   |   1 -
>  automation/scripts/qubes-x86-64.sh |   7 +-
>  automation/scripts/run-tools-tests |  43 ++-
>  tools/tests/Makefile   |   1 -
>  tools/tests/cpu-policy/Makefile|   6 +-
>  tools/tests/depriv/Makefile|  52 ---
>  tools/tests/depriv/depriv-fd-checker.c | 436 -
>  tools/tests/paging-mempool/Makefile|   6 +-
>  tools/tests/rangeset/Makefile  |   6 +-
>  tools/tests/resource/Makefile  |   6 +-
>  tools/tests/tsx/Makefile   |   6 +-
>  tools/tests/xenstore/Makefile  |   6 +-
>  13 files changed, 40 insertions(+), 537 deletions(-)
>  delete mode 100644 tools/tests/depriv/Makefile
>  delete mode 100644 tools/tests/depriv/depriv-fd-checker.c
> 
> --
> 2.39.5
> 
>

Re: [PATCH 2/2] MAINTAINERS: add Daniel P. Smith as an Argo maintainer

On Wed, 21 May 2025, Christopher Clark wrote:
> Daniel is a longstanding contributor to the OpenXT Project where Argo
> was developed and is in active use with Xen, and to Argo itself,
> involved with the design and development of Argo software.
> 
> Signed-off-by: Christopher Clark 

Acked-by: Stefano Stabellini 


> ---
>  MAINTAINERS | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index e7198363c5..c2e4f8528f 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -222,6 +222,7 @@ F:tools/libacpi/
>  
>  ARGO
>  M:   Christopher Clark 
> +M:   Daniel P. Smith 
>  S:   Maintained
>  F:   xen/include/public/argo.h
>  F:   xen/include/xen/argo.h
> -- 
> 2.25.1
>

Re: [PATCH 1/2] MAINTAINERS: include Argo documentation in the ARGO section

On Wed, 21 May 2025, Christopher Clark wrote:
> Signed-off-by: Christopher Clark 

Acked-by: Stefano Stabellini 


> ---
>  MAINTAINERS | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index c11b82eca9..e7198363c5 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -226,6 +226,7 @@ S:Maintained
>  F:   xen/include/public/argo.h
>  F:   xen/include/xen/argo.h
>  F:   xen/common/argo.c
> +F:   docs/designs/argo.pandoc
>  
>  ARINC653 SCHEDULER
>  M:   Nathan Studer 
> -- 
> 2.25.1
>

Re: [PATCH v2 2/2] xen/domain: rewrite emulation_flags_ok()

On Tue, 20 May 2025, dm...@proton.me wrote:
> On Tue, May 20, 2025 at 05:24:33PM +0200, Jan Beulich wrote:
> > On 16.05.2025 04:29, dm...@proton.me wrote:
> > > --- a/xen/arch/x86/include/asm/domain.h
> > > +++ b/xen/arch/x86/include/asm/domain.h
> > > @@ -494,6 +494,12 @@ struct arch_domain
> > >   X86_EMU_PIT | X86_EMU_USE_PIRQ |   \
> > >   X86_EMU_VPCI)
> > >
> > > +/* User-selectable features. */
> > > +#define X86_EMU_OPTIONAL(X86_EMU_USE_PIRQ)
> > > +
> > > +#define X86_EMU_BASELINE(X86_EMU_ALL & ~(X86_EMU_VPCI | \
> > > + X86_EMU_OPTIONAL))
> > 
> > That is, VPCI is neither baseline nor optional. Certainly at least strange.

I think Denis tried to keep the code more similar to the original. This
way it is easier to review the code change and it seems correct. But at
the same time it is easier to spot inconsistency that were present even
before the patch.

> IMO, X86_EMU_OPTIONAL should include both VPCI and PIRQ.

It looks that way to me too.

However, then we need to be careful as the check would differ from the
original, but maybe that's OK. We want vPCI to be potentially exposed to
DomUs as well.

> But that will be a behavior change: AFAIU, VPCI is injected implicitly for 
> dom0
> case only, "default" xl toolstack currently excludes VPCI for HVM domains.
> 
> Do I understand it correctly that "BASELINE" in the symbol name is a concern?

Re: [PATCH] xen/argo: Command line handling improvements

2025-05-20 Thread Christopher Clark

On Tue, May 20, 2025 at 3:10 PM Andrew Cooper 
wrote:

> Treat "argo" on the command line as a positive boolean, rather than
> requiring
> the user to pass "argo=1/on/enable/true".
>
> Move both opt_argo* variables into __ro_after_init.  They're set during
> command line parsing and never modified thereafter.
>

Instead of binding to static values set at boot, would
boolean_runtime_param be acceptable? This allows the system administrator
to adjust the Argo settings if the hypervisor has been compiled with it
enabled and booted with the default settings, and avoid having to reboot
and modify the bootloader configuration.

I agree with Daniel's request for a doc update where affected.

thanks,

Christopher



>
> Signed-off-by: Andrew Cooper 
> ---
> CC: Christopher Clark 
> CC: Daniel P. Smith 
> CC: Denis Mukhin 
>
> Found while
> ---
>  xen/common/argo.c | 9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/xen/common/argo.c b/xen/common/argo.c
> index cbe8911a4364..027b37b18a6c 100644
> --- a/xen/common/argo.c
> +++ b/xen/common/argo.c
> @@ -79,8 +79,8 @@ DEFINE_XEN_GUEST_HANDLE(xen_argo_unregister_ring_t);
>  DEFINE_COMPAT_HANDLE(compat_argo_iov_t);
>  #endif
>
> -static bool __read_mostly opt_argo;
> -static bool __read_mostly opt_argo_mac_permissive;
> +static bool __ro_after_init opt_argo;
> +static bool __ro_after_init opt_argo_mac_permissive;
>
>  static int __init cf_check parse_argo(const char *s)
>  {
> @@ -92,7 +92,10 @@ static int __init cf_check parse_argo(const char *s)
>  if ( !ss )
>  ss = strchr(s, '\0');
>
> -if ( (val = parse_bool(s, ss)) >= 0 )
> +/* Intepret "argo" as a positive boolean. */
> +if ( *s == '\0' )
> +opt_argo = true;
> +else if ( (val = parse_bool(s, ss)) >= 0 )
>  opt_argo = val;
>  else if ( (val = parse_boolean("mac-permissive", s, ss)) >= 0 )
>  opt_argo_mac_permissive = val;
>
> base-commit: 293abb9e0c5e8df96cc5dfe457c62dfcb7542b19
> --
> 2.39.5
>
>

RE: [PATCH v3 01/20] xen/x86: remove "depends on !PV_SHIM_EXCLUSIVE"

[Public]

> -Original Message-
> From: Jan Beulich 
> Sent: Wednesday, April 30, 2025 11:17 PM
> To: Penny, Zheng 
> Cc: Huang, Ray ; Andrew Cooper
> ; Roger Pau Monné ;
> Anthony PERARD ; Orzel, Michal
> ; Julien Grall ; Stefano Stabellini
> ; xen-devel@lists.xenproject.org
> Subject: Re: [PATCH v3 01/20] xen/x86: remove "depends
> on !PV_SHIM_EXCLUSIVE"
>
> On 21.04.2025 09:37, Penny Zheng wrote:
> > Remove all "depends on !PV_SHIM_EXCLUSIVE" (also the functionally
> > equivalent "if !...") in Kconfig file, since negative dependancy will
> > badly affect allyesconfig.
> > This commit is based on "x86: provide an inverted Kconfig control for
> > shim-exclusive mode"[1]
>
> Recall me asking to avoid wording like "This commit" in commit messages?
> Also personally I consider "is based on" ambiguous: It could also mean the 
> one here
> needs to go on top of that other one. It's not entirely clear to me what kind 
> of
> (relevant) information you're trying to convey with this sentence. Surely you 
> didn't
> really need to even look at that patch of mine to find all the 
> !PV_SHIM_EXCLUSIVE;
> that's a matter of a simply grep.
>
> > ---
> >  xen/arch/x86/Kconfig  | 4 
> >  xen/arch/x86/hvm/Kconfig  | 1 -
> >  xen/drivers/video/Kconfig | 4 ++--
> >  3 files changed, 2 insertions(+), 7 deletions(-)
>
> With the changes here, what does this mean for the in-tree shim build, or any 
> others
> using xen/arch/x86/configs/pvshim_defconfig as the basis? You aren't altering 
> that
> file, so I expect the binary produced will change significantly (when it 
> shouldn't,
> unless explicitly stated otherwise in the description, which may be warranted 
> for
> SHADOW_PAGING).
>

Yes, I've missed the changes in defconfig
I'll explicitly state above options in xen/arch/x86/configs/pvshim_defconfig

For SHADOW_PAGING and TBOOT, maybe we shall add back default y, otherwise 
x86_64_defconfig
will change...

> Jan

[PATCH 0/2] SUPPORT.md: update the xenstore support state

2025-05-20 Thread Juergen Gross

Two updates regarding C Xenstore support.

Juergen Gross (2):
  SUPPORT.md: add xenstore stubdom as supported
  SUPPORT.md: mark xenstore live update as supported

 SUPPORT.md | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

-- 
2.43.0

[PATCH 2/2] SUPPORT.md: mark xenstore live update as supported

2025-05-20 Thread Juergen Gross

Live update of xenstored is available since Xen 4.15 and it is tested
on a regular basis since then.

Switch the live update support from "Tech Preview" to "Supported".

Signed-off-by: Juergen Gross 
---
 SUPPORT.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/SUPPORT.md b/SUPPORT.md
index 7dbb477765..198dfb4229 100644
--- a/SUPPORT.md
+++ b/SUPPORT.md
@@ -270,7 +270,7 @@ or itself will not be regarded a security issue.
 ### C xenstored daemon
 
 Status: Supported
-Status, Liveupdate: Tech Preview
+Status, Liveupdate: Supported
 
 ### C xenstore stubdom PV
 
-- 
2.43.0

[PATCH 1/2] SUPPORT.md: add xenstore stubdom as supported

2025-05-20 Thread Juergen Gross

SUPPORT.md is missing a suupport statement for Xenstore stubdom.

As SUSE is using it in production since several years now, it should
be added as "supported". This covers the PV and the PVH variant.

Signed-off-by: Juergen Gross 
---
 SUPPORT.md | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/SUPPORT.md b/SUPPORT.md
index e8fd0c251e..7dbb477765 100644
--- a/SUPPORT.md
+++ b/SUPPORT.md
@@ -272,6 +272,16 @@ or itself will not be regarded a security issue.
 Status: Supported
 Status, Liveupdate: Tech Preview
 
+### C xenstore stubdom PV
+
+Status: Supported
+Status, Liveupdate: Not implemented
+
+### C xenstore stubdom PVH
+
+Status: Supported
+Status, Liveupdate: Not implemented
+
 ### OCaml xenstored daemon
 
 Status: Supported
-- 
2.43.0

xen/arm: Virtio-PCI for dom0less on ARM

2025-05-20 Thread Edgar E. Iglesias

Hi all,

Following up on the ARM virtio-pci series I posted a while back ago.

There have been some concerns around the delayed and silent apperance of
devices on the ECAM area. The spec is not super clear wether this is OK or
not but I'm providing some references to the PCI specs and to some real cases
where this is used for FPGAs.

There are two options how to implement virtio-pci that we've discussed:
1. VPCI + IOREQ
2. IOREQ only

There are pros and cons with both. For example with #1, has the benefit that
we would only have a single PCIe RC (in Xen) and we could emulate a hotplug
capable expansion port with a standard way to notify when PCI devices plug in.
Approach #2 has the benefit that there is (almost) no additional complexity
or code added to Xen, almost everything lives outside.
IMO, both options have merit and both could co-exist.

For dynamic xl flows, options #2 already works without modifications to Xen.
Users need to pass the correct command-line options to QEMU and a device-tree
fragment with the pci-generic-ecam-host device.

For static dom0less flows, we can do the same as for xl flows but we have the
additional problem of domU's PCI bus enumeration racing with QEMU.
On x86, when domU's access a memory range that has not yet got IOREQ's
connected to it, the accesses succeeds with reads returning 0x and
writes ignored. This makes it easy for guests to wait for IOREQ devices to
pop up since guests will find an empty bus and can initiate a rescan later
when QEMU attaches. On ARM, we trap on these accesses.

If we on ARM add support for MMIO background regions with a default read value,
i.e mmio handlers that have lower priority than IOREQs and that are
read-const + writes-ignored, we could support the same flow on ARM.
This may be generally useful for other devices as well (e.g virtio-mmio or
something else). We could also use this to defer PCI enumeration.

So the next versions of this series I was thinking to remove the PCI specifics
and instead add FDT bindings to ARM dom0less enabling setup of user
configurable (by address-range and read-value) background mmio regions.
Xen would then support option #2 without any PCI specifics added.

Thoughts?

Cheers,
Edgar

# References to spec

PCI express base specification:
7.5.1.1.1 Vendor ID Register (Offset 00h)
The Vendor ID register is HwInit and the value in this register identifies the
manufacturer of the Function. In keeping with
PCI-SIG procedures, valid vendor identifiers must be allocated by the PCI-SIG
to ensure uniqueness. Each vendor must
have at least one Vendor ID. It is recommended that software read the Vendor ID
register to determine if a Function is
present, where a value of h indicates that no Function is present.

PCI Firmware Specification:
3.5 Device State at Firmware/Operating System Handoff
Page 34:
The operating system is required to configure PCI subsystems:
 During hotplug
 For devices that take too long to come out of reset
 PCI-to-PCI bridges that are at levels below what firmware is designed to
configure

Page 36:
Note: The operating system does not have to walk all buses during boot. The
kernel can
automatically configure devices on request; i.e., an event can cause a scan of
I/O on demand.

FPGA's can be programmed at runtime and appear on the ECAM bus silently.
An PCI rescan needs to be triggered for the OS to discover the device:
Intel FPGAs:
https://www.intel.com/content/www/us/en/docs/programmable/683190/1-3-1/how-to-rescan-bus-and-re-enable-aer.html

Re: [PATCH v2 1/5] x86/pmstat: Check size of PMSTAT_get_pxstat buffers

2025-05-20 Thread Ross Lagerwall

On Tue, May 13, 2025 at 3:27 PM Jan Beulich  wrote:
>
> On 12.05.2025 16:46, Ross Lagerwall wrote:
> > Check that the total number of states passed in and hence the size of
> > buffers is sufficient to avoid writing more than the caller has
> > allocated.
> >
> > The interface is not explicit about whether getpx.total is expected to
> > be set by the caller in this case but since it is always set in
> > libxenctrl it seems reasonable to check it.
>
> Yet if we start checking the value, I think the public header should also
> be made say so (in a comment).
>
> > --- a/xen/drivers/acpi/pmstat.c
> > +++ b/xen/drivers/acpi/pmstat.c
> > @@ -103,8 +103,10 @@ int do_get_pm_info(struct xen_sysctl_get_pmstat *op)
> >
> >  cpufreq_residency_update(op->cpuid, pxpt->u.cur);
> >
> > -ct = pmpt->perf.state_count;
> > -if ( copy_to_guest(op->u.getpx.trans_pt, pxpt->u.trans_pt, ct*ct) )
> > +ct = min_t(uint32_t, pmpt->perf.state_count, op->u.getpx.total);
>
> With this, ...
>
> > +if ( ct <= op->u.getpx.total &&
>
> ... this is going to be always true, isn't it? Which constitutes a violation
> of Misra rule 14.3.
>
> Also it would be nice if the min_t() could become a normal min(), e.g. by
> "promoting" op->u.getpx.total to unsigned int via adding 0U. This way it's
> clear there's no hidden truncation (or else there might be an argument for
> keeping the check above).
>
> > + copy_to_guest(op->u.getpx.trans_pt, pxpt->u.trans_pt, ct * 
> > ct) )
> >  {
> >  spin_unlock(cpufreq_statistic_lock);
> >  ret = -EFAULT;
>
> Why would you constrain this copy-out but not the one just out of context
> below here? (The question is of course moot if the condition was dropped.)
>

Oh, I had intended this condition to be...

if ( ct == op->u.getpx.total &&

... based on your previous comment about the difficulties of partially
copying a 2d array.

> An option may be to document that this array is copied only when the
> buffer is large enough.

I left the other alone since partially copying a 1d array makes sense.

If you would prefer, I can drop the condition and just let the caller
deal with the partially copied 2d array?

Thanks,
Ross

Re: [PATCH] xen: Give compile.h header guards

2025-05-20 Thread Frediano Ziglio

On Mon, May 19, 2025 at 2:52 PM Andrew Cooper  wrote:
>
> Signed-off-by: Andrew Cooper 
> ---
> CC: Anthony PERARD 
> CC: Michal Orzel 
> CC: Jan Beulich 
> CC: Julien Grall 
> CC: Roger Pau Monné 
> CC: Stefano Stabellini 
> ---
>  xen/include/xen/compile.h.in | 3 +++
>  xen/tools/process-banner.sed | 5 +
>  2 files changed, 8 insertions(+)
>
> diff --git a/xen/include/xen/compile.h.in b/xen/include/xen/compile.h.in
> index 3151d1e7d1bf..9206341ba692 100644
> --- a/xen/include/xen/compile.h.in
> +++ b/xen/include/xen/compile.h.in
> @@ -1,3 +1,6 @@
> +#ifndef XEN_COMPILE_H
> +#define XEN_COMPILE_H
> +

Why not follow CODING_STYLE ?

OT: Maybe while on it, why not add SPDX comments too ?

>  #define XEN_COMPILE_DATE   "@@date@@"
>  #define XEN_COMPILE_TIME   "@@time@@"
>  #define XEN_COMPILE_BY "@@whoami@@"
> diff --git a/xen/tools/process-banner.sed b/xen/tools/process-banner.sed
> index 56c76558bcd9..4cf3f9a1163a 100755
> --- a/xen/tools/process-banner.sed
> +++ b/xen/tools/process-banner.sed
> @@ -12,3 +12,8 @@ s_(.*)_"\1\\n"_
>
>  # Trailing \ on all but the final line.
>  $!s_$_ \\_
> +
> +# Append closing header guard
> +$a\
> +\
> +#endif /* XEN_COMPILE_H */
>
> base-commit: 6fc02ebdd053856221f37ba5306232ac1575332d
> prerequisite-patch-id: 7bc1c498ba2c9c4a4939a56a0006f820f47f2a66
> --
> 2.39.5
>
>
Frediano

Re: [PATCH v2 12/16] xen/riscv: introduce intc_init() and helpers



On 5/15/25 11:29 AM, Jan Beulich wrote:

On 06.05.2025 18:51, Oleksii Kurochko wrote:

Introduce intc_init() to initialize the interrupt controller using the
registered hardware ops.
Also add intc_route_irq_to_xen() to route IRQs to Xen, with support for
setting IRQ type and priority via new internal helpers intc_set_irq_type()
and intc_set_irq_priority().

Call intc_init() to do basic initialization steps for APLIC and IMSIC.

Co-developed-by: Romain Caritey
Signed-off-by: Oleksii Kurochko
---
Changes in V2:
  - This patch was part of "xen/riscv: Introduce intc_hw_operations abstraction"
and splitted to have ability to merge patch "xen/riscv: initialize interrupt
controller" to the current patch (where intc_init() call is actually
introduced).
  - Add checks of that callbacks aren't set to NULL in intc_set_irq_priority()
and intc_set_irq_type().
  - add num_irqs member to struct intc_info as it is used now in
intc_route_irq_to_xen().
  - Add ASSERT(spin_is_locked(&desc->lock)) to intc_set_irq_priority() in
the case this function will be called outside intc_route_irq_to_xen().
---
  xen/arch/riscv/include/asm/intc.h |  4 +++
  xen/arch/riscv/intc.c | 45 +++
  xen/arch/riscv/setup.c|  2 ++
  3 files changed, 51 insertions(+)

diff --git a/xen/arch/riscv/include/asm/intc.h 
b/xen/arch/riscv/include/asm/intc.h
index 2d55d74a2e..45a41147a6 100644
--- a/xen/arch/riscv/include/asm/intc.h
+++ b/xen/arch/riscv/include/asm/intc.h
@@ -44,4 +44,8 @@ void intc_preinit(void);
  
  void register_intc_ops(struct intc_hw_operations *ops);
  
+void intc_init(void);

+
+void intc_route_irq_to_xen(struct irq_desc *desc, unsigned int priority);
+
  #endif /* ASM__RISCV__INTERRUPT_CONTOLLER_H */
diff --git a/xen/arch/riscv/intc.c b/xen/arch/riscv/intc.c
index 122e7b32b5..15f358601d 100644
--- a/xen/arch/riscv/intc.c
+++ b/xen/arch/riscv/intc.c
@@ -1,9 +1,12 @@
  /* SPDX-License-Identifier: GPL-2.0-only */
  
  #include 

+#include 
  #include 
  #include 
+#include 
  #include 
+#include 
  
  #include 
  
@@ -21,3 +24,45 @@ void __init intc_preinit(void)

  else
  panic("ACPI interrupt controller preinit() isn't implemented\n");
  }
+
+void __init intc_init(void)
+{
+ASSERT(intc_hw_ops);

What's the goal of this check (also elsewhere below)? You'll crash anyway ...


Agree, that it could be dropped.

The goal was that it will a little bit easier to find a place where NULL
pointer de-reference  will/could happen.

Then it make sense to drop ASSERT(intc_hw_ops) in intc_set_irq_type() and
intc_set_irq_priority() as intc_init() will be called first and so if it
won't crash, then intc_hw_ops is registered.




+if ( intc_hw_ops->init() )

... here if the point is still NULL, just like you will if the function
pointer is unpopulated (and hence NULL).


Here, I think it would be better to rewrite to:
  void __init intc_init(void)
  {
  if ( !intc_hw_ops->init )
  return;

  if ( intc_hw_ops->init() )
  panic("Failed to initialize the interrupt controller drivers\n");
  }
For the case, if an interrupt controller doesn't want to do some initialization.
(but I am not sure if it makes sense, likely an interrupt controller will want
to do some initialization. Then it makes do drop the first if (...)).



Preferably with all of these (but not the other assertions) dropped
Acked-by: Jan Beulich


Thanks.

~ Oleksii

Re: [PATCH v2 13/16] xen/riscv: implementation of aplic and imsic operations



On 5/15/25 11:44 AM, Jan Beulich wrote:

@@ -159,6 +270,8 @@ static int __init aplic_preinit(struct dt_device_node 
*node, const void *dat)
  
  dt_irq_xlate = aplic_irq_xlate;
  
+spin_lock_init(&aplic.lock);

Can't you have the struct field have a suitable initializer?


Sure, I will use struct initializer:
  static struct aplic_priv aplic = {
  .lock = SPIN_LOCK_UNLOCKED,
  };


+static void imsic_local_eix_update(unsigned long base_id, unsigned long num_id,
+   bool pend, bool val)
+{
+unsigned long id = base_id, last_id = base_id + num_id;
+
+while ( id < last_id )
+{
+unsigned long isel, ireg;
+unsigned long start_id = id & (__riscv_xlen - 1);
+unsigned long chunk = __riscv_xlen - start_id;
+unsigned long count = (last_id - id < chunk) ? last_id - id : chunk;
+
+isel = id / __riscv_xlen;
+isel *= __riscv_xlen / IMSIC_EIPx_BITS;
+isel += pend ? IMSIC_EIP0 : IMSIC_EIE0;
+
+ireg = GENMASK(start_id + count - 1, start_id);
+
+id += count;
+
+if ( val )
+imsic_csr_set(isel, ireg);
+else
+imsic_csr_clear(isel, ireg);
+}
+}
+
+void imsic_irq_enable(unsigned int irq)
+{
+unsigned long flags;
+
+spin_lock_irqsave(&imsic_cfg.lock, flags);
+/*
+ * There is no irq - 1 here (look at aplic_set_irq_type()) because:
+ * From the spec:
+ *   When an interrupt file supports distinct interrupt identities,
+ *   valid identity numbers are between 1 and inclusive. The identity
+ *   numbers within this range are said to be implemented by the interrupt
+ *   file; numbers outside this range are not implemented. The number zero
+ *   is never a valid interrupt identity.
+ *   ...
+ *   Bit positions in a valid eiek register that don’t correspond to a
+ *   supported interrupt identity (such as bit 0 of eie0) are read-only 
zeros.
+ *
+ * So in EIx registers interrupt i corresponds to bit i in comparison wiht
+ * APLIC's sourcecfg which starts from 0. (l)

What's this 'l' in parentheses here to indicate?


I don't really remember, it seems like I want to point to the spec, but
then just make a quote from the spec instead. I'll just drop it.


+ */
+imsic_local_eix_update(irq, 1, false, true);
+spin_unlock_irqrestore(&imsic_cfg.lock, flags);
+}
+
+void imsic_irq_disable(unsigned int irq)
+{
+unsigned long flags;
+
+spin_lock_irqsave(&imsic_cfg.lock, flags);
+imsic_local_eix_update(irq, 1, false, false);
+spin_unlock_irqrestore(&imsic_cfg.lock, flags);
+}

The sole caller of the function has doubly turned off IRQs already; perhaps
no need to it a 3rd time, unless other callers are to appear? Same for
imsic_irq_enable() as it looks.


I checked a code in private branches and it seems like these functions are 
called
only in aplic_irq_{enable,disable}(), so we could do, at 
least,spin_lock(&imsic_cfg.lock)
+ ASSERT(!local_irq_is_enabled());

~ Oleksii

RE: [PATCH v4 05/15] xen/x86: introduce "cpufreq=amd-cppc" xen cmdline

[Public]

> -Original Message-
> From: Jan Beulich 
> Sent: Monday, May 19, 2025 9:19 PM
> To: Penny, Zheng 
> Cc: Huang, Ray ; Andrew Cooper
> ; Anthony PERARD ;
> Orzel, Michal ; Julien Grall ; Roger Pau
> Monné ; Stefano Stabellini ; 
> xen-
> de...@lists.xenproject.org
> Subject: Re: [PATCH v4 05/15] xen/x86: introduce "cpufreq=amd-cppc" xen
> cmdline
>
> On 19.05.2025 10:36, Penny, Zheng wrote:
> > [Public]
> >
> >> -Original Message-
> >> From: Jan Beulich 
> >> Sent: Tuesday, April 29, 2025 8:52 PM
> >> To: Penny, Zheng 
> >> Cc: Huang, Ray ; Andrew Cooper
> >> ; Anthony PERARD
> >> ; Orzel, Michal ;
> >> Julien Grall ; Roger Pau Monné
> >> ; Stefano Stabellini ;
> >> xen- de...@lists.xenproject.org
> >> Subject: Re: [PATCH v4 05/15] xen/x86: introduce "cpufreq=amd-cppc"
> >> xen cmdline
> >>
> >> On 14.04.2025 09:40, Penny Zheng wrote:
> >>> --- a/xen/include/acpi/cpufreq/processor_perf.h
> >>> +++ b/xen/include/acpi/cpufreq/processor_perf.h
> >>> @@ -5,6 +5,9 @@
> >>>  #include 
> >>>  #include 
> >>>
> >>> +/* ability bits */
> >>> +#define XEN_PROCESSOR_PM_CPPC   8
> >>
> >> This needs correlating (at least via commentary, better by build-time
> >> checking) with the other XEN_PROCESSOR_PM_* values. Otherwise
> someone
> >> adding a new #define in the public header may not (easily) notice a
> >> possible conflict. With that in mind I also question whether 8 is
> >> actually a good choice: That's the obvious next value to use in the
> >> public interface. SIF_PM_MASK is 8 bits wide, so a sensible value to use 
> >> here
> would by e.g. 0x100.
> >>
> >
> > I've added a public flag anchor "XEN_PROCESSOR_PM_PUBLIC_END" in
> public header:
> >  #define XEN_PROCESSOR_PM_PUBLIC_END
> XEN_PROCESSOR_PM_TX
> > and will do the following build-time checking:
> > BUILD_BUG_ON(XEN_PROCESSOR_PM_CPPC <=
> > XEN_PROCESSOR_PM_PUBLIC_END);
>
> I don't really see why anything would need to be added to the public header 
> (and we
> really should strive to avoid unnecessary additions).

If I put such definition
"#define XEN_PROCESSOR_PM_PUBLIC_END XEN_PROCESSOR_PM_TX"
in internal header, I'm afraid it won't be updated if new ones added in the 
public in the future.
Or any other suggestions to provide build-time checking?

>
> Jan

Re: [PATCH v4 09/10] vpci/msi: Free MSI resources when init_msi() fails

On Tue, May 20, 2025 at 11:14:27AM +0200, Jan Beulich wrote:
> On 20.05.2025 11:09, Roger Pau Monné wrote:
> > On Tue, May 20, 2025 at 08:40:28AM +0200, Jan Beulich wrote:
> >> On 09.05.2025 11:05, Jiqian Chen wrote:
> >>> When init_msi() fails, the previous new changes will hide MSI
> >>> capability, it can't rely on vpci_deassign_device() to remove
> >>> all MSI related resources anymore, those resources must be
> >>> removed in cleanup function of MSI.
> >>
> >> That's because vpci_deassign_device() simply isn't called anymore?
> >> Could do with wording along these lines then. But (also applicable
> >> to the previous patch) - doesn't this need to come earlier? And is
> >> it sufficient to simply remove the register intercepts? Don't you
> >> need to put in place ones dropping all writes and making all reads
> >> return either 0 or ~0 (covering in particular Dom0, while for DomU-s
> >> this may already be the case by default behavior)?
> > 
> > For domUs this is already the default behavior.
> > 
> > For dom0 I think it should be enough to hide the capability from the
> > linked list, but not hide all the capability related
> > registers.  IMO a well behaved dom0 won't try to access capabilities
> > disconnected from the linked list,
> 
> Just that I've seen drivers knowing where their device has certain
> capabilities, thus not bothering to look up the respective
> capability.

OK, so let's make the control register read-only in case of failure.

If MSI(-X) is already enabled we should also make the entries
read-only, and while that's not very complicated for MSI, it does get
more convoluted for MSI-X.  I'm fine with just making the control
register read-only for the time being.

Thanks, Roger.

RE: [PATCH v4 05/15] xen/x86: introduce "cpufreq=amd-cppc" xen cmdline

[Public]

> -Original Message-
> From: Jan Beulich 
> Sent: Tuesday, May 20, 2025 5:16 PM
> To: Penny, Zheng 
> Cc: Huang, Ray ; Andrew Cooper
> ; Anthony PERARD ;
> Orzel, Michal ; Julien Grall ; Roger Pau
> Monné ; Stefano Stabellini ; 
> xen-
> de...@lists.xenproject.org
> Subject: Re: [PATCH v4 05/15] xen/x86: introduce "cpufreq=amd-cppc" xen
> cmdline
>
> On 20.05.2025 10:28, Penny, Zheng wrote:
> > [Public]
> >
> >> -Original Message-
> >> From: Jan Beulich 
> >> Sent: Monday, May 19, 2025 9:19 PM
> >> To: Penny, Zheng 
> >> Cc: Huang, Ray ; Andrew Cooper
> >> ; Anthony PERARD
> >> ; Orzel, Michal ;
> >> Julien Grall ; Roger Pau Monné
> >> ; Stefano Stabellini ;
> >> xen- de...@lists.xenproject.org
> >> Subject: Re: [PATCH v4 05/15] xen/x86: introduce "cpufreq=amd-cppc"
> >> xen cmdline
> >>
> >> On 19.05.2025 10:36, Penny, Zheng wrote:
> >>> [Public]
> >>>
>  -Original Message-
>  From: Jan Beulich 
>  Sent: Tuesday, April 29, 2025 8:52 PM
>  To: Penny, Zheng 
>  Cc: Huang, Ray ; Andrew Cooper
>  ; Anthony PERARD
>  ; Orzel, Michal ;
>  Julien Grall ; Roger Pau Monné
>  ; Stefano Stabellini
>  ;
>  xen- de...@lists.xenproject.org
>  Subject: Re: [PATCH v4 05/15] xen/x86: introduce "cpufreq=amd-cppc"
>  xen cmdline
> 
>  On 14.04.2025 09:40, Penny Zheng wrote:
> > --- a/xen/include/acpi/cpufreq/processor_perf.h
> > +++ b/xen/include/acpi/cpufreq/processor_perf.h
> > @@ -5,6 +5,9 @@
> >  #include 
> >  #include 
> >
> > +/* ability bits */
> > +#define XEN_PROCESSOR_PM_CPPC   8
> 
>  This needs correlating (at least via commentary, better by
>  build-time
>  checking) with the other XEN_PROCESSOR_PM_* values. Otherwise
> >> someone
>  adding a new #define in the public header may not (easily) notice a
>  possible conflict. With that in mind I also question whether 8 is
>  actually a good choice: That's the obvious next value to use in the
>  public interface. SIF_PM_MASK is 8 bits wide, so a sensible value
>  to use here
> >> would by e.g. 0x100.
> 
> >>>
> >>> I've added a public flag anchor "XEN_PROCESSOR_PM_PUBLIC_END" in
> >> public header:
> >>>  #define XEN_PROCESSOR_PM_PUBLIC_END
> >> XEN_PROCESSOR_PM_TX
> >>> and will do the following build-time checking:
> >>> BUILD_BUG_ON(XEN_PROCESSOR_PM_CPPC <=
> >>> XEN_PROCESSOR_PM_PUBLIC_END);
> >>
> >> I don't really see why anything would need to be added to the public
> >> header (and we really should strive to avoid unnecessary additions).
> >
> > If I put such definition
> > "#define XEN_PROCESSOR_PM_PUBLIC_END XEN_PROCESSOR_PM_TX"
> > in internal header, I'm afraid it won't be updated if new ones added in the 
> > public in
> the future.
> > Or any other suggestions to provide build-time checking?
>
> Imo it's sufficient to check that the new #define doesn't overlap with
> SIF_PM_MASK.

Understood!

>
> Jan

RE: [PATCH v4 12/15] tools/xenpm: Print CPPC parameters for amd-cppc driver

[Public]

> -Original Message-
> From: Jan Beulich 
> Sent: Tuesday, May 20, 2025 5:18 PM
> To: Penny, Zheng 
> Cc: Huang, Ray ; Anthony PERARD
> ; xen-devel@lists.xenproject.org
> Subject: Re: [PATCH v4 12/15] tools/xenpm: Print CPPC parameters for amd-cppc
> driver
>
> On 20.05.2025 10:22, Penny, Zheng wrote:
> >> -Original Message-
> >> From: Jan Beulich 
> >> Sent: Tuesday, May 13, 2025 4:03 PM
> >>
> >> On 09.05.2025 08:36, Penny, Zheng wrote:
>  -Original Message-
>  From: Jan Beulich 
>  Sent: Wednesday, April 30, 2025 9:55 PM
> 
>  On 14.04.2025 09:40, Penny Zheng wrote:
> > HWP, amd-cppc, amd-cppc-epp are all the implementation of ACPI
> > CPPC (Collaborative Processor Performace Control), so we introduce
> > cppc_mode flag to print CPPC-related para.
> >
> > And HWP and amd-cppc-epp are both governor-less driver, so we
> > introduce hw_auto flag to bypass governor-related print.
> 
>  But in the EPP driver you use the information which governor is active.
> >>>
> >>> We want to have a one-one mapping between governor and epp value,
> >>> such as, If users choose performance governor, no matter via "xenpm"
> >>> or cmdline, users want maximum performance, We set epp with 0 to
> >>> meet the
> >> expectation.
> >>> And if users choose powersave governor, users want the least power
> >>> consumption, then we shall set epp with 255 to meet the expectation.
> >>
> >> That's all fine, but completely misses the point of my question: If
> >> the governor is relevant, why would you bypass respective printing?
> >>
> >
> > The only useful info in governor for epp mode, IMO, is its name.
> > I deduce which performance policy user wants to apply through which governor
> they choose.
> > If user chooses performance governor, they want maximum performance.
> > If user chooses powersave governor, they want the least power
> > consumption I could only provide appropriate epp value in above two 
> > scenarios.
> > ondemand and userspace are forbidden choices, and if users specify
> > such options in cmdline, I shall give warning message to say  that
> > amd-cppc in active mode is governor-less, and users could set epp values on
> runtime to specify bias towards performance or efficiency.
> >
> > If above is messy, I could also totally forbid governor thing for active 
> > mode...
> wdyt?
>
> Then how would one pick between performance and powersave?

In hwp, we defined
"
#define CPPC_ENERGY_PERF_BALANCE 0x80
"
to provide the balanced value. Users could run "xenpm set-cpufreq-cppc balance" 
to achieve on runtime
Any other specific epp value, users shall run "xenpm set-cpufreq-cppc 
energy-perf:N" to achieve
We didn't provide any options in cmdline, so I tried to re-use the governor to 
achieve a few. hmmm,
now it seems it brings confusion

>
> Jan

Re: [PATCH v2 1/5] x86/pmstat: Check size of PMSTAT_get_pxstat buffers

On 20.05.2025 12:53, Ross Lagerwall wrote:
> On Tue, May 13, 2025 at 3:27 PM Jan Beulich  wrote:
>>
>> On 12.05.2025 16:46, Ross Lagerwall wrote:
>>> Check that the total number of states passed in and hence the size of
>>> buffers is sufficient to avoid writing more than the caller has
>>> allocated.
>>>
>>> The interface is not explicit about whether getpx.total is expected to
>>> be set by the caller in this case but since it is always set in
>>> libxenctrl it seems reasonable to check it.
>>
>> Yet if we start checking the value, I think the public header should also
>> be made say so (in a comment).
>>
>>> --- a/xen/drivers/acpi/pmstat.c
>>> +++ b/xen/drivers/acpi/pmstat.c
>>> @@ -103,8 +103,10 @@ int do_get_pm_info(struct xen_sysctl_get_pmstat *op)
>>>
>>>  cpufreq_residency_update(op->cpuid, pxpt->u.cur);
>>>
>>> -ct = pmpt->perf.state_count;
>>> -if ( copy_to_guest(op->u.getpx.trans_pt, pxpt->u.trans_pt, ct*ct) )
>>> +ct = min_t(uint32_t, pmpt->perf.state_count, op->u.getpx.total);
>>
>> With this, ...
>>
>>> +if ( ct <= op->u.getpx.total &&
>>
>> ... this is going to be always true, isn't it? Which constitutes a violation
>> of Misra rule 14.3.
>>
>> Also it would be nice if the min_t() could become a normal min(), e.g. by
>> "promoting" op->u.getpx.total to unsigned int via adding 0U. This way it's
>> clear there's no hidden truncation (or else there might be an argument for
>> keeping the check above).
>>
>>> + copy_to_guest(op->u.getpx.trans_pt, pxpt->u.trans_pt, ct * 
>>> ct) )
>>>  {
>>>  spin_unlock(cpufreq_statistic_lock);
>>>  ret = -EFAULT;
>>
>> Why would you constrain this copy-out but not the one just out of context
>> below here? (The question is of course moot if the condition was dropped.)
>>
> 
> Oh, I had intended this condition to be...
> 
> if ( ct == op->u.getpx.total &&
> 
> ... based on your previous comment about the difficulties of partially
> copying a 2d array.
> 
>> An option may be to document that this array is copied only when the
>> buffer is large enough.
> 
> I left the other alone since partially copying a 1d array makes sense.

Right, if the relation there becomes == it indeed reflects that the 2-D
one is different in this regard from the 1-D one.

> If you would prefer, I can drop the condition and just let the caller
> deal with the partially copied 2d array?

With the adjusted relation I think all is going to be fine as you
(otherwise) had it. There may want to be a brief comment on that extra
condition you add.

Jan

Re: [PATCH v7 3/3] xen/domain: introduce CONFIG_MAX_DOMID

On Tue, May 20, 2025 at 08:04:14AM +0200, Jan Beulich wrote:
> On 19.05.2025 21:23, dm...@proton.me wrote:
> > From: Denis Mukhin 
> >
> > Embedded deployments of Xen do not need to have support for more than dozen 
> > of
> > domains.
> >
> > Introduce build-time configuration option to limit the number of domains 
> > during
> > run-time.
> 
> I fear I don't see the (sufficiently meaningful) gain of this. And I must 
> have ...
> 
> > Suggested-by: Julien Grall 
> 
> ... missed tis earlier suggestion, or else I would have asked the question 
> already
> there.

The code change is based on the feedback here:

  
https://lore.kernel.org/xen-devel/2e5afdf1-3913-4b6f-86ea-21b3ccd08...@xen.org/

It probably should have been sent as an RFC change.

> 
> > --- a/xen/arch/arm/tee/ffa.c
> > +++ b/xen/arch/arm/tee/ffa.c
> > @@ -333,8 +333,9 @@ static int ffa_domain_init(struct domain *d)
> >   */
> >  BUILD_BUG_ON(DOMID_FIRST_RESERVED >= UINT16_MAX);
> >  BUILD_BUG_ON((DOMID_MASK & BIT(15, U)) != 0);
> > +BUILD_BUG_ON(DOMID_FIRST_RESERVED < CONFIG_MAX_DOMID);
> 
> We want this check, yes, but in common code. It's entirely unrelated to Arm's 
> TEE.
> 
> > --- a/xen/common/Kconfig
> > +++ b/xen/common/Kconfig
> > @@ -576,4 +576,11 @@ config BUDDY_ALLOCATOR_SIZE
> >   Amount of memory reserved for the buddy allocator to serve Xen heap,
> >   working alongside the colored one.
> >
> > +config MAX_DOMID
> > +   int "Maximum number of non-system domains"
> 
> Hmm, without clarifying what a system domain is (is hwdom one? is a control
> domain one), I fear this may be ambiguous to users.
> 
> Jan
>

Re: [PATCH] xen: Give compile.h header guards

On Tue, 20 May 2025, Jan Beulich wrote:
> On 19.05.2025 23:34, Stefano Stabellini wrote:
> > On Mon, 19 May 2025, Jan Beulich wrote:
> >> On 19.05.2025 15:52, Andrew Cooper wrote:
> >>> Signed-off-by: Andrew Cooper 
> >>
> >> Is this to please Misra in some way?
> >>
> >>> --- a/xen/include/xen/compile.h.in
> >>> +++ b/xen/include/xen/compile.h.in
> >>> @@ -1,3 +1,6 @@
> >>> +#ifndef XEN_COMPILE_H
> >>> +#define XEN_COMPILE_H
> >>> +
> >>>  #define XEN_COMPILE_DATE "@@date@@"
> >>>  #define XEN_COMPILE_TIME "@@time@@"
> >>>  #define XEN_COMPILE_BY   "@@whoami@@"
> >>> --- a/xen/tools/process-banner.sed
> >>> +++ b/xen/tools/process-banner.sed
> >>> @@ -12,3 +12,8 @@ s_(.*)_"\1\\n"_
> >>>  
> >>>  # Trailing \ on all but the final line.
> >>>  $!s_$_ \\_
> >>> +
> >>> +# Append closing header guard
> >>> +$a\
> >>> +\
> >>> +#endif /* XEN_COMPILE_H */
> >>
> >> This split of #ifndef and #endif is ugly. Can't we switch to something
> >> more conventional, like
> >>
> >> #define XEN_BANNER "@@banner@@"
> >>
> >> with the first sed invocation then replacing this by the result of
> >> a nested sed invocation using process-banner.sed (which of course would
> >> need adjusting some)? (Maybe the double quotes would need omitting here,
> >> for process-banner.sed to uniformly apply them.)
> > 
> > While I agree with Jan that this is kind of ugly, it is a unique case
> > and I would prefer an ugly but simple solution than a more complex
> > solution. This would be different if we were talking about a widespread
> > pattern, in which case the price for complexity would be worth it.
> > 
> > My 2 cents in this case are in favor of the simplest approach. I would
> > ack this patch.
> 
> Feel free to do so; my comment was not meant as a plain objection, but more
> as a suggestion. The one thing I would ask for is a non-empty description,
> though.

Fair enough. Andrew, please add a better commit message. With that:

Reviewed-by: Stefano Stabellini

[PATCH TEST-ARTEFACTS] (Re)add python3 to alpine rootfs

XTF uses python, and we're looking to reintroduce XTF testing to Xen.

Signed-off-by: Andrew Cooper 
---
CC: Anthony PERARD 
CC: Stefano Stabellini 
CC: Marek Marczykowski-Górecki 
---
 scripts/alpine-rootfs.sh | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/scripts/alpine-rootfs.sh b/scripts/alpine-rootfs.sh
index c304e2ebfbd9..c999b89dbcd8 100755
--- a/scripts/alpine-rootfs.sh
+++ b/scripts/alpine-rootfs.sh
@@ -22,6 +22,9 @@ PKGS=(
 xz
 yajl
 
+# Xen Test Framework
+python3
+
 # QEMU
 glib
 libaio
-- 
2.39.5

Re: [PATCH] xen/argo: Command line handling improvements

On Tue, May 20, 2025 at 03:10:27PM +0100, Andrew Cooper wrote:
> Treat "argo" on the command line as a positive boolean, rather than requiring
> the user to pass "argo=1/on/enable/true".
> 
> Move both opt_argo* variables into __ro_after_init.  They're set during
> command line parsing and never modified thereafter.
> 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Denis Mukhin 

> ---
> CC: Christopher Clark 
> CC: Daniel P. Smith 
> CC: Denis Mukhin 
> 
> Found while
> ---
>  xen/common/argo.c | 9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/xen/common/argo.c b/xen/common/argo.c
> index cbe8911a4364..027b37b18a6c 100644
> --- a/xen/common/argo.c
> +++ b/xen/common/argo.c
> @@ -79,8 +79,8 @@ DEFINE_XEN_GUEST_HANDLE(xen_argo_unregister_ring_t);
>  DEFINE_COMPAT_HANDLE(compat_argo_iov_t);
>  #endif
> 
> -static bool __read_mostly opt_argo;
> -static bool __read_mostly opt_argo_mac_permissive;
> +static bool __ro_after_init opt_argo;
> +static bool __ro_after_init opt_argo_mac_permissive;
> 
>  static int __init cf_check parse_argo(const char *s)
>  {
> @@ -92,7 +92,10 @@ static int __init cf_check parse_argo(const char *s)
>  if ( !ss )
>  ss = strchr(s, '\0');
> 
> -if ( (val = parse_bool(s, ss)) >= 0 )
> +/* Intepret "argo" as a positive boolean. */
> +if ( *s == '\0' )
> +opt_argo = true;
> +else if ( (val = parse_bool(s, ss)) >= 0 )
>  opt_argo = val;
>  else if ( (val = parse_boolean("mac-permissive", s, ss)) >= 0 )
>  opt_argo_mac_permissive = val;
> 
> base-commit: 293abb9e0c5e8df96cc5dfe457c62dfcb7542b19
> --
> 2.39.5
>

Re: [PATCH] CI: Rename qubes-x86-64 parameter "" to "dom0pv"

2025-05-20 Thread Marek Marczykowski-Górecki

On Tue, May 20, 2025 at 06:37:19PM +0100, Andrew Cooper wrote:
> This really is a legacy of not having parameters to start with.  Give PV dom0
> with a PVH domU a real name.
> 
> Reformat the table to fix alignment.
> 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Marek Marczykowski-Górecki 

> ---
> CC: Marek Marczykowski-Górecki 
> ---
>  automation/gitlab-ci/test.yaml |  8 
>  automation/scripts/qubes-x86-64.sh | 22 +++---
>  2 files changed, 15 insertions(+), 15 deletions(-)
> 
> diff --git a/automation/gitlab-ci/test.yaml b/automation/gitlab-ci/test.yaml
> index a603d4039aef..842cecf71382 100644
> --- a/automation/gitlab-ci/test.yaml
> +++ b/automation/gitlab-ci/test.yaml
> @@ -257,7 +257,7 @@ xilinx-smoke-dom0-x86_64-gcc-debug-argo:
>  adl-smoke-x86-64-gcc-debug:
>extends: .adl-x86-64
>script:
> -- ./automation/scripts/qubes-x86-64.sh 2>&1 | tee ${LOGFILE}
> +- ./automation/scripts/qubes-x86-64.sh dom0pv 2>&1 | tee ${LOGFILE}
>needs:
>  - *x86-64-test-needs
>  - alpine-3.18-gcc-debug
> @@ -335,7 +335,7 @@ adl-tools-tests-pvh-x86-64-gcc-debug:
>  kbl-smoke-x86-64-gcc-debug:
>extends: .kbl-x86-64
>script:
> -- ./automation/scripts/qubes-x86-64.sh 2>&1 | tee ${LOGFILE}
> +- ./automation/scripts/qubes-x86-64.sh dom0pv 2>&1 | tee ${LOGFILE}
>needs:
>  - *x86-64-test-needs
>  - alpine-3.18-gcc-debug
> @@ -413,7 +413,7 @@ kbl-tools-tests-pvh-x86-64-gcc-debug:
>  zen2-smoke-x86-64-gcc-debug:
>extends: .zen2-x86-64
>script:
> -- ./automation/scripts/qubes-x86-64.sh 2>&1 | tee ${LOGFILE}
> +- ./automation/scripts/qubes-x86-64.sh dom0pv 2>&1 | tee ${LOGFILE}
>needs:
>  - *x86-64-test-needs
>  - alpine-3.18-gcc-debug
> @@ -429,7 +429,7 @@ zen2-suspend-x86-64-gcc-debug:
>  zen3p-smoke-x86-64-gcc-debug:
>extends: .zen3p-x86-64
>script:
> -- ./automation/scripts/qubes-x86-64.sh 2>&1 | tee ${LOGFILE}
> +- ./automation/scripts/qubes-x86-64.sh dom0pv 2>&1 | tee ${LOGFILE}
>needs:
>  - *x86-64-test-needs
>  - alpine-3.18-gcc-debug
> diff --git a/automation/scripts/qubes-x86-64.sh 
> b/automation/scripts/qubes-x86-64.sh
> index bfdd2ceb99ba..aa47ba6bf5c0 100755
> --- a/automation/scripts/qubes-x86-64.sh
> +++ b/automation/scripts/qubes-x86-64.sh
> @@ -2,16 +2,16 @@
>  
>  set -ex -o pipefail
>  
> -# One of:
> -#  - "" PV dom0,  PVH domU
> -#  - dom0pvhPVH dom0, PVH domU
> -#  - dom0pvh-hvmPVH dom0, HVM domU
> -#  - pci-hvmPV dom0,  HVM domU + PCI Passthrough
> -#  - pci-pv PV dom0,  PV domU + PCI Passthrough
> -#  - pvshim PV dom0,  PVSHIM domU
> -#  - s3 PV dom0,  S3 suspend/resume
> -#  - tools-tests-pv PV dom0, run tests from tools/tests/*
> -#  - tools-tests-pvh PVH dom0, run tests from tools/tests/*
> +# One of:  dom0:   Testing:
> +#  - dom0pv  PV  PVH domU
> +#  - dom0pvh PVH PVH domU
> +#  - dom0pvh-hvm PVH HVM domU
> +#  - pci-hvm PV  HVM domU + PCI Passthrough
> +#  - pci-pv  PV  PV domU + PCI Passthrough
> +#  - pvshim  PV  PVSHIM domU
> +#  - s3  PV  S3 suspend/resume
> +#  - tools-tests-pv  PV  Run tests from tools/tests/*
> +#  - tools-tests-pvh PVH Run tests from tools/tests/*
>  test_variant=$1
>  
>  ### defaults
> @@ -25,7 +25,7 @@ retrieve_xml=
>  
>  case "${test_variant}" in
>  ### test: smoke test & smoke test PVH & smoke test HVM & smoke test 
> PVSHIM
> -""|"dom0pvh"|"dom0pvh-hvm"|"pvshim")
> +"dom0pv"|"dom0pvh"|"dom0pvh-hvm"|"pvshim")
>  passed="ping test passed"
>  domU_check="
>  ifconfig eth0 192.168.0.2
> -- 
> 2.39.5
> 

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab


signature.asc
Description: PGP signature

[PATCH 1/3] tools/tests: Drop depriv-fd-checker

Unlike the other tests, this is not standalone.  It requires poking at a live
system, making it unweildly to use.

It hasn't been touched in 7 years, despite changes in libraries and kernel
devices using the deprivilege infrastructure.

Signed-off-by: Andrew Cooper 
---
CC: Anthony PERARD 
CC: Stefano Stabellini 
CC: Marek Marczykowski-Górecki 
---
 .gitignore |   1 -
 tools/tests/Makefile   |   1 -
 tools/tests/depriv/Makefile|  52 ---
 tools/tests/depriv/depriv-fd-checker.c | 436 -
 4 files changed, 490 deletions(-)
 delete mode 100644 tools/tests/depriv/Makefile
 delete mode 100644 tools/tests/depriv/depriv-fd-checker.c

diff --git a/.gitignore b/.gitignore
index 53f5df000383..4a4e20680464 100644
--- a/.gitignore
+++ b/.gitignore
@@ -165,7 +165,6 @@ tools/misc/xencov
 tools/pkg-config/*
 tools/qemu-xen-build
 tools/xentrace/xenalyze
-tools/tests/depriv/depriv-fd-checker
 tools/tests/x86_emulator/*.bin
 tools/tests/x86_emulator/*.tmp
 tools/tests/x86_emulator/32/x86_emulate
diff --git a/tools/tests/Makefile b/tools/tests/Makefile
index 3e60ab6b268d..36928676a666 100644
--- a/tools/tests/Makefile
+++ b/tools/tests/Makefile
@@ -9,7 +9,6 @@ ifneq ($(clang),y)
 SUBDIRS-$(CONFIG_X86) += x86_emulator
 endif
 SUBDIRS-y += xenstore
-SUBDIRS-y += depriv
 SUBDIRS-y += rangeset
 SUBDIRS-y += vpci
 SUBDIRS-y += paging-mempool
diff --git a/tools/tests/depriv/Makefile b/tools/tests/depriv/Makefile
deleted file mode 100644
index 5404a12f4780..
--- a/tools/tests/depriv/Makefile
+++ /dev/null
@@ -1,52 +0,0 @@
-XEN_ROOT=$(CURDIR)/../../..
-include $(XEN_ROOT)/tools/Rules.mk
-
-CFLAGS += $(CFLAGS_xeninclude)
-CFLAGS += $(CFLAGS_libxenctrl)
-CFLAGS += $(CFLAGS_libxencall)
-CFLAGS += $(CFLAGS_libxenevtchn)
-CFLAGS += $(CFLAGS_libxengnttab)
-CFLAGS += $(CFLAGS_libxenforeignmemory)
-CFLAGS += $(CFLAGS_libxendevicemodel)
-CFLAGS += $(CFLAGS_libxentoolcore)
-CFLAGS += $(CFLAGS_libxentoollog)
-
-LDLIBS += $(LDLIBS_xeninclude)
-LDLIBS += $(LDLIBS_libxenctrl)
-LDLIBS += $(LDLIBS_libxencall)
-LDLIBS += $(LDLIBS_libxenevtchn)
-LDLIBS += $(LDLIBS_libxengnttab)
-LDLIBS += $(LDLIBS_libxenforeignmemory)
-LDLIBS += $(LDLIBS_libxendevicemodel)
-LDLIBS += $(LDLIBS_libxentoolcore)
-LDLIBS += $(LDLIBS_libxentoollog)
-
-INSTALL_PRIVBIN-y += depriv-fd-checker
-INSTALL_PRIVBIN := $(INSTALL_PRIVBIN-y)
-TARGETS += $(INSTALL_PRIVBIN)
-
-.PHONY: all
-all: build
-
-.PHONY: build
-build: $(TARGETS)
-
-.PHONY: clean
-clean:
-   $(RM) *.o $(TARGETS) *~ $(DEPS_RM)
-
-.PHONY: distclean
-distclean: clean
-
-depriv-fd-checker: depriv-fd-checker.o
-   $(CC) $(LDFLAGS) -o $@ $< $(LDLIBS) $(APPEND_LDFLAGS)
-
-install: all
-   $(INSTALL_DIR) $(DESTDIR)$(LIBEXEC_BIN)
-   $(INSTALL_PROG) $(INSTALL_PRIVBIN) $(DESTDIR)$(LIBEXEC_BIN)
-
-.PHONY: uninstall
-uninstall:
-   rm -f $(addprefix $(DESTDIR)$(LIBEXEC_BIN)/, $(INSTALL_PRIVBIN))
-
--include $(DEPS_INCLUDE)
diff --git a/tools/tests/depriv/depriv-fd-checker.c 
b/tools/tests/depriv/depriv-fd-checker.c
deleted file mode 100644
index 98a27a03d543..
--- a/tools/tests/depriv/depriv-fd-checker.c
+++ /dev/null
@@ -1,436 +0,0 @@
-/*
- * depriv-fd-checker
- *
- * utility to check whether file descriptor(s) are deprivileged
- *
- * usage:
- *  .../depriv-fd-checker CLASS FD X-INFO [CLASS FD X-INFO...]
- *
- * CLASS is one of:
- *privcmd gntdev evtchn FD should be appropriate Xen control fd
- *readonly  FD is expected to be readonly
- *appendonlyFD is expected to be append write only
- #tun   FD is expected to be an open tun device
- *
- * In each case FD is probably a reference to an open-file stolen
- * from another process, eg by the use of fishdescriptor.
- *
- * X-INFO is simply appended to the discursive reportage.
- *
- * It is an error if depriv-fd-checker cannot open the control
- * facilities itself, or something goes wrong with checking, or an FD
- * is entirely the wrong kind for the specified CLASS.  Otherwise:
- *
- * depriv-fd-checker will perhaps print, for each triplet:
- *   CLASS checking FD INFORMATION... X-INFO
- * and in any case print, for each triplet, exactly one of:
- *   CLASS pass|fail FD INFORMATION... X-INFO
- *   tun maybe FD IFNAME X-INFO
- *
- * "pass" means that the descriptor was restricted as expected.
- * "fail" means that the descriptor was unrestricted.
- * "maybe" means that further information is printed, as detailed above,
- * and the caller should check that it is as expected
- */
-/*
- * Copyright (C)2018 Citrix Systems R&D
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU Lesser General Public License
- * as published by the Free Software Foundation; version 2.1 of the
- * License.
- *
- * This library is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
-

[PATCH 3/3] CI: Drop custom handling of tools/tests

... and use them from their installed location.

The full recusive copy of tools/tests brings in all build and intermediate
artefacts.  e.g. for test-tsx alone:

  ./tests/tsx
  ./tests/tsx/.test-tsx.o.d
  ./tests/tsx/test-tsx.o
  ./tests/tsx/.gitignore
  ./tests/tsx/test-tsx
  ./tests/tsx/Makefile
  ./tests/tsx/test-tsx.c

duplicating the test binary which is also in ./usr/lib/xen/tests

Rewrite run-tools-tests to run tests from their installed
location (/usr/lib/xen/tests in alpine), which effectively removes the outer
loop over $dir.

No practical change.

Signed-off-by: Andrew Cooper 
---
CC: Anthony PERARD 
CC: Stefano Stabellini 
CC: Marek Marczykowski-Górecki 

This doesn't change any tests that run, although in the XML we get two fewer 
skips.

Both skips can be fixed by giving vpci and x86_emulator some install targets
---
 automation/scripts/build   |  1 -
 automation/scripts/qubes-x86-64.sh |  7 +++--
 automation/scripts/run-tools-tests | 43 +-
 3 files changed, 22 insertions(+), 29 deletions(-)

diff --git a/automation/scripts/build b/automation/scripts/build
index cdb8cd7c722b..0e7494ff6d87 100755
--- a/automation/scripts/build
+++ b/automation/scripts/build
@@ -109,6 +109,5 @@ else
 # even though dist/ contains everything, while some containers don't even
 # build Xen
 (cd dist/install; find | cpio -o -H newc | gzip) > 
binaries/xen-tools.cpio.gz
-cp -r tools/tests binaries/
 collect_xen_artefacts
 fi
diff --git a/automation/scripts/qubes-x86-64.sh 
b/automation/scripts/qubes-x86-64.sh
index aa47ba6bf5c0..577a00238a75 100755
--- a/automation/scripts/qubes-x86-64.sh
+++ b/automation/scripts/qubes-x86-64.sh
@@ -136,7 +136,7 @@ done
 passed="test passed"
 domU_check=""
 dom0_check="
-/tests/run-tools-tests /tests /tmp/tests-junit.xml && echo \"${passed}\"
+/root/run-tools-tests /usr/lib/xen/tests /tmp/tests-junit.xml && echo 
\"${passed}\"
 nc -l -p 8080 < /tmp/tests-junit.xml >/dev/null &
 "
 if [ "${test_variant}" = "tools-tests-pvh" ]; then
@@ -195,9 +195,8 @@ cat binaries/xen-tools.cpio.gz >> 
binaries/dom0-rootfs.cpio.gz
 # test-local configuration
 mkdir -p rootfs
 cd rootfs
-mkdir -p boot etc/local.d
-cp -ar ../binaries/tests .
-cp -a ../automation/scripts/run-tools-tests tests/
+mkdir -p boot etc/local.d root
+cp -a ../automation/scripts/run-tools-tests root/
 
 echo "#!/bin/bash
 
diff --git a/automation/scripts/run-tools-tests 
b/automation/scripts/run-tools-tests
index 770e97c3e943..8d7aa8fa5140 100755
--- a/automation/scripts/run-tools-tests
+++ b/automation/scripts/run-tools-tests
@@ -12,30 +12,25 @@ printf '\n' > 
"$xml_out"
 printf '\n' >> "$xml_out"
 printf ' \n' >> "$xml_out"
 failed=
-for dir in "$1"/*; do
-[ -d "$dir" ] || continue
-echo "Running test in $dir"
-printf '  \n' "$dir" >> "$xml_out"
-ret=
-for f in "$dir"/*; do
-[ -f "$f" ] || continue
-[ -x "$f" ] || continue
-"$f" 2>&1 | tee /tmp/out
-ret=$?
-if [ "$ret" -ne 0 ]; then
-echo "FAILED: $ret"
-failed+=" $dir"
-printf '   \n' "$f" "$ret" >> "$xml_out"
-# TODO: could use xml escaping... but current tests seems to
-# produce sane output
-cat /tmp/out >> "$xml_out"
-printf '   \n' >> "$xml_out"
-else
-echo "PASSED"
-fi
-done
-if [ -z "$ret" ]; then
-printf '   \n' "$dir" >> "$xml_out"
+for f in "$1"/*; do
+if [ -x "$f" ]; then
+echo "SKIP: $f not executable"
+continue
+fi
+echo "Running $f"
+printf '  \n' "$f" >> "$xml_out"
+"$f" 2>&1 | tee /tmp/out
+ret=$?
+if [ "$ret" -ne 0 ]; then
+echo "FAILED: $f"
+failed+=" $f"
+printf '   \n' "$f" "$ret" >> "$xml_out"
+# TODO: could use xml escaping... but current tests seems to
+# produce sane output
+cat /tmp/out >> "$xml_out"
+printf '   \n' >> "$xml_out"
+else
+echo "PASSED"
 fi
 printf '  \n' >> "$xml_out"
 done
-- 
2.39.5

[PATCH 2/3] tools/tests: Install tests into $(LIBEXEC)/tests

$(LIBEXEC_BIN) is a dumping ground of many things.  Separate the "clearly
tests" from everything else so we can clean up how they're run in CI.

Signed-off-by: Andrew Cooper 
---
CC: Anthony PERARD 
CC: Stefano Stabellini 
CC: Marek Marczykowski-Górecki 
---
 tools/tests/cpu-policy/Makefile | 6 +++---
 tools/tests/paging-mempool/Makefile | 6 +++---
 tools/tests/rangeset/Makefile   | 6 +++---
 tools/tests/resource/Makefile   | 6 +++---
 tools/tests/tsx/Makefile| 6 +++---
 tools/tests/xenstore/Makefile   | 6 +++---
 6 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/tools/tests/cpu-policy/Makefile b/tools/tests/cpu-policy/Makefile
index 5df9b1ebbd7e..24f87e2eca2a 100644
--- a/tools/tests/cpu-policy/Makefile
+++ b/tools/tests/cpu-policy/Makefile
@@ -29,12 +29,12 @@ distclean: clean
 
 .PHONY: install
 install: all
-   $(INSTALL_DIR) $(DESTDIR)$(LIBEXEC_BIN)
-   $(if $(TARGETS),$(INSTALL_PROG) $(TARGETS) $(DESTDIR)$(LIBEXEC_BIN))
+   $(INSTALL_DIR) $(DESTDIR)$(LIBEXEC)/tests
+   $(if $(TARGETS),$(INSTALL_PROG) $(TARGETS) $(DESTDIR)$(LIBEXEC)/tests)
 
 .PHONY: uninstall
 uninstall:
-   $(RM) -- $(addprefix $(DESTDIR)$(LIBEXEC_BIN)/,$(TARGETS))
+   $(RM) -- $(addprefix $(DESTDIR)$(LIBEXEC)/tests/,$(TARGETS))
 
 CFLAGS += -D__XEN_TOOLS__
 CFLAGS += $(CFLAGS_xeninclude)
diff --git a/tools/tests/paging-mempool/Makefile 
b/tools/tests/paging-mempool/Makefile
index a081b3baa514..a1e12584ce80 100644
--- a/tools/tests/paging-mempool/Makefile
+++ b/tools/tests/paging-mempool/Makefile
@@ -16,12 +16,12 @@ distclean: clean
 
 .PHONY: install
 install: all
-   $(INSTALL_DIR) $(DESTDIR)$(LIBEXEC_BIN)
-   $(INSTALL_PROG) $(TARGET) $(DESTDIR)$(LIBEXEC_BIN)
+   $(INSTALL_DIR) $(DESTDIR)$(LIBEXEC)/tests
+   $(INSTALL_PROG) $(TARGET) $(DESTDIR)$(LIBEXEC)/tests
 
 .PHONY: uninstall
 uninstall:
-   $(RM) -- $(DESTDIR)$(LIBEXEC_BIN)/$(TARGET)
+   $(RM) -- $(DESTDIR)$(LIBEXEC)/tests/$(TARGET)
 
 CFLAGS += $(CFLAGS_xeninclude)
 CFLAGS += $(CFLAGS_libxenctrl)
diff --git a/tools/tests/rangeset/Makefile b/tools/tests/rangeset/Makefile
index 3dafcbd0546c..e3bfce471cd3 100644
--- a/tools/tests/rangeset/Makefile
+++ b/tools/tests/rangeset/Makefile
@@ -20,12 +20,12 @@ distclean: clean
 
 .PHONY: install
 install: all
-   $(INSTALL_DIR) $(DESTDIR)$(LIBEXEC_BIN)
-   $(INSTALL_PROG) $(TARGET) $(DESTDIR)$(LIBEXEC_BIN)
+   $(INSTALL_DIR) $(DESTDIR)$(LIBEXEC)/tests
+   $(INSTALL_PROG) $(TARGET) $(DESTDIR)$(LIBEXEC)/tests
 
 .PHONY: uninstall
 uninstall:
-   $(RM) -- $(addprefix $(DESTDIR)$(LIBEXEC_BIN)/,$(TARGET))
+   $(RM) -- $(addprefix $(DESTDIR)$(LIBEXEC)/tests/,$(TARGET))
 
 list.h: $(XEN_ROOT)/xen/include/xen/list.h
 rangeset.h: $(XEN_ROOT)/xen/include/xen/rangeset.h
diff --git a/tools/tests/resource/Makefile b/tools/tests/resource/Makefile
index a5856bf09590..09d678fffe3e 100644
--- a/tools/tests/resource/Makefile
+++ b/tools/tests/resource/Makefile
@@ -20,12 +20,12 @@ distclean: clean
 
 .PHONY: install
 install: all
-   $(INSTALL_DIR) $(DESTDIR)$(LIBEXEC_BIN)
-   $(INSTALL_PROG) $(TARGET) $(DESTDIR)$(LIBEXEC_BIN)
+   $(INSTALL_DIR) $(DESTDIR)$(LIBEXEC)/tests
+   $(INSTALL_PROG) $(TARGET) $(DESTDIR)$(LIBEXEC)/tests
 
 .PHONY: uninstall
 uninstall:
-   $(RM) -- $(DESTDIR)$(LIBEXEC_BIN)/$(TARGET)
+   $(RM) -- $(DESTDIR)$(LIBEXEC)/tests/$(TARGET)
 
 CFLAGS += $(CFLAGS_xeninclude)
 CFLAGS += $(CFLAGS_libxenctrl)
diff --git a/tools/tests/tsx/Makefile b/tools/tests/tsx/Makefile
index a4f516b72597..0bb7e7010347 100644
--- a/tools/tests/tsx/Makefile
+++ b/tools/tests/tsx/Makefile
@@ -16,12 +16,12 @@ distclean: clean
 
 .PHONY: install
 install: all
-   $(INSTALL_DIR) $(DESTDIR)$(LIBEXEC_BIN)
-   $(INSTALL_PROG) $(TARGET) $(DESTDIR)$(LIBEXEC_BIN)
+   $(INSTALL_DIR) $(DESTDIR)$(LIBEXEC)/tests
+   $(INSTALL_PROG) $(TARGET) $(DESTDIR)$(LIBEXEC)/tests
 
 .PHONY: uninstall
 uninstall:
-   $(RM) -- $(DESTDIR)$(LIBEXEC_BIN)/$(TARGET)
+   $(RM) -- $(DESTDIR)$(LIBEXEC)/tests/$(TARGET)
 
 .PHONY: uninstall
 uninstall:
diff --git a/tools/tests/xenstore/Makefile b/tools/tests/xenstore/Makefile
index 202dda0d3c23..2ee4a1327e75 100644
--- a/tools/tests/xenstore/Makefile
+++ b/tools/tests/xenstore/Makefile
@@ -20,12 +20,12 @@ distclean: clean
 
 .PHONY: install
 install: all
-   $(INSTALL_DIR) $(DESTDIR)$(LIBEXEC_BIN)
-   $(if $(TARGETS),$(INSTALL_PROG) $(TARGETS) $(DESTDIR)$(LIBEXEC_BIN))
+   $(INSTALL_DIR) $(DESTDIR)$(LIBEXEC)/tests
+   $(if $(TARGETS),$(INSTALL_PROG) $(TARGETS) $(DESTDIR)$(LIBEXEC)/tests)
 
 .PHONY: uninstall
 uninstall:
-   $(RM) -- $(addprefix $(DESTDIR)$(LIBEXEC_BIN)/,$(TARGETS))
+   $(RM) -- $(addprefix $(DESTDIR)$(LIBEXEC)/tests/,$(TARGETS))
 
 CFLAGS += $(CFLAGS_libxenstore)
 CFLAGS += $(APPEND_CFLAGS)
-- 
2.39.5

[PATCH 0/3] CI: Improvements to -tools-test- jobs

Rearrange tools/tests to be more ameanable to running in CI, and drop the
special casing holding it together.



Andrew Cooper (3):
  tools/tests: Drop depriv-fd-checker
  tools/tests: Install tests into $(LIBEXEC)/tests
  CI: Drop custom handling of tools/tests

 .gitignore |   1 -
 automation/scripts/build   |   1 -
 automation/scripts/qubes-x86-64.sh |   7 +-
 automation/scripts/run-tools-tests |  43 ++-
 tools/tests/Makefile   |   1 -
 tools/tests/cpu-policy/Makefile|   6 +-
 tools/tests/depriv/Makefile|  52 ---
 tools/tests/depriv/depriv-fd-checker.c | 436 -
 tools/tests/paging-mempool/Makefile|   6 +-
 tools/tests/rangeset/Makefile  |   6 +-
 tools/tests/resource/Makefile  |   6 +-
 tools/tests/tsx/Makefile   |   6 +-
 tools/tests/xenstore/Makefile  |   6 +-
 13 files changed, 40 insertions(+), 537 deletions(-)
 delete mode 100644 tools/tests/depriv/Makefile
 delete mode 100644 tools/tests/depriv/depriv-fd-checker.c

-- 
2.39.5

Re: [PATCH] tools: Add install/uninstall targets to tests/x86_emulator

On 16/05/2024 12:07 pm, Alejandro Vallejo wrote:
> Bring test_x86_emulator in line with other tests by adding
> install/uninstall rules.
>
> Signed-off-by: Alejandro Vallejo 
> ---
>  tools/tests/x86_emulator/Makefile | 11 +--
>  1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/tools/tests/x86_emulator/Makefile 
> b/tools/tests/x86_emulator/Makefile
> index 834b2112e7fe..30edf7e0185d 100644
> --- a/tools/tests/x86_emulator/Makefile
> +++ b/tools/tests/x86_emulator/Makefile
> @@ -269,8 +269,15 @@ clean:
>  .PHONY: distclean
>  distclean: clean
>  
> -.PHONY: install uninstall
> -install uninstall:
> +.PHONY: install
> +install: all
> + $(INSTALL_DIR) $(DESTDIR)$(LIBEXEC_BIN)
> + $(if $(TARGET-y),$(INSTALL_PROG) $(TARGET-y) $(DESTDIR)$(LIBEXEC_BIN))
> +
> +.PHONY: uninstall
> +uninstall:
> + $(RM) -- $(addprefix $(DESTDIR)$(LIBEXEC_BIN)/,$(TARGET-y))
> +
>  
>  .PHONY: run32 clean32
>  ifeq ($(XEN_COMPILE_ARCH),x86_64)

[starting a clean thread]

x86_emulator is not special enough to behave differently to the rest of
tools/.

Theoretical concerns over cross compiling test_x86_emulator for non-x86
can be fixed by whomever first wants to do this.

The very real problem is that this doesn't run in x86 CI, because and
only because it doesn't have an install target.

~Andrew

[PATCH] CI: Rename qubes-x86-64 parameter "" to "dom0pv"

This really is a legacy of not having parameters to start with.  Give PV dom0
with a PVH domU a real name.

Reformat the table to fix alignment.

Signed-off-by: Andrew Cooper 
---
CC: Marek Marczykowski-Górecki 
---
 automation/gitlab-ci/test.yaml |  8 
 automation/scripts/qubes-x86-64.sh | 22 +++---
 2 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/automation/gitlab-ci/test.yaml b/automation/gitlab-ci/test.yaml
index a603d4039aef..842cecf71382 100644
--- a/automation/gitlab-ci/test.yaml
+++ b/automation/gitlab-ci/test.yaml
@@ -257,7 +257,7 @@ xilinx-smoke-dom0-x86_64-gcc-debug-argo:
 adl-smoke-x86-64-gcc-debug:
   extends: .adl-x86-64
   script:
-- ./automation/scripts/qubes-x86-64.sh 2>&1 | tee ${LOGFILE}
+- ./automation/scripts/qubes-x86-64.sh dom0pv 2>&1 | tee ${LOGFILE}
   needs:
 - *x86-64-test-needs
 - alpine-3.18-gcc-debug
@@ -335,7 +335,7 @@ adl-tools-tests-pvh-x86-64-gcc-debug:
 kbl-smoke-x86-64-gcc-debug:
   extends: .kbl-x86-64
   script:
-- ./automation/scripts/qubes-x86-64.sh 2>&1 | tee ${LOGFILE}
+- ./automation/scripts/qubes-x86-64.sh dom0pv 2>&1 | tee ${LOGFILE}
   needs:
 - *x86-64-test-needs
 - alpine-3.18-gcc-debug
@@ -413,7 +413,7 @@ kbl-tools-tests-pvh-x86-64-gcc-debug:
 zen2-smoke-x86-64-gcc-debug:
   extends: .zen2-x86-64
   script:
-- ./automation/scripts/qubes-x86-64.sh 2>&1 | tee ${LOGFILE}
+- ./automation/scripts/qubes-x86-64.sh dom0pv 2>&1 | tee ${LOGFILE}
   needs:
 - *x86-64-test-needs
 - alpine-3.18-gcc-debug
@@ -429,7 +429,7 @@ zen2-suspend-x86-64-gcc-debug:
 zen3p-smoke-x86-64-gcc-debug:
   extends: .zen3p-x86-64
   script:
-- ./automation/scripts/qubes-x86-64.sh 2>&1 | tee ${LOGFILE}
+- ./automation/scripts/qubes-x86-64.sh dom0pv 2>&1 | tee ${LOGFILE}
   needs:
 - *x86-64-test-needs
 - alpine-3.18-gcc-debug
diff --git a/automation/scripts/qubes-x86-64.sh 
b/automation/scripts/qubes-x86-64.sh
index bfdd2ceb99ba..aa47ba6bf5c0 100755
--- a/automation/scripts/qubes-x86-64.sh
+++ b/automation/scripts/qubes-x86-64.sh
@@ -2,16 +2,16 @@
 
 set -ex -o pipefail
 
-# One of:
-#  - "" PV dom0,  PVH domU
-#  - dom0pvhPVH dom0, PVH domU
-#  - dom0pvh-hvmPVH dom0, HVM domU
-#  - pci-hvmPV dom0,  HVM domU + PCI Passthrough
-#  - pci-pv PV dom0,  PV domU + PCI Passthrough
-#  - pvshim PV dom0,  PVSHIM domU
-#  - s3 PV dom0,  S3 suspend/resume
-#  - tools-tests-pv PV dom0, run tests from tools/tests/*
-#  - tools-tests-pvh PVH dom0, run tests from tools/tests/*
+# One of:  dom0:   Testing:
+#  - dom0pv  PV  PVH domU
+#  - dom0pvh PVH PVH domU
+#  - dom0pvh-hvm PVH HVM domU
+#  - pci-hvm PV  HVM domU + PCI Passthrough
+#  - pci-pv  PV  PV domU + PCI Passthrough
+#  - pvshim  PV  PVSHIM domU
+#  - s3  PV  S3 suspend/resume
+#  - tools-tests-pv  PV  Run tests from tools/tests/*
+#  - tools-tests-pvh PVH Run tests from tools/tests/*
 test_variant=$1
 
 ### defaults
@@ -25,7 +25,7 @@ retrieve_xml=
 
 case "${test_variant}" in
 ### test: smoke test & smoke test PVH & smoke test HVM & smoke test PVSHIM
-""|"dom0pvh"|"dom0pvh-hvm"|"pvshim")
+"dom0pv"|"dom0pvh"|"dom0pvh-hvm"|"pvshim")
 passed="ping test passed"
 domU_check="
 ifconfig eth0 192.168.0.2
-- 
2.39.5

Re: [PATCH] xen/argo: Command line handling improvements

2025-05-20 Thread Daniel P. Smith


On 5/20/25 10:10, Andrew Cooper wrote:

Treat "argo" on the command line as a positive boolean, rather than requiring
the user to pass "argo=1/on/enable/true".

Move both opt_argo* variables into __ro_after_init.  They're set during
command line parsing and never modified thereafter.

Signed-off-by: Andrew Cooper 
---
CC: Christopher Clark 
CC: Daniel P. Smith 
CC: Denis Mukhin 

Found while
---
  xen/common/argo.c | 9 ++---
  1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/xen/common/argo.c b/xen/common/argo.c
index cbe8911a4364..027b37b18a6c 100644
--- a/xen/common/argo.c
+++ b/xen/common/argo.c
@@ -79,8 +79,8 @@ DEFINE_XEN_GUEST_HANDLE(xen_argo_unregister_ring_t);
  DEFINE_COMPAT_HANDLE(compat_argo_iov_t);
  #endif
  
-static bool __read_mostly opt_argo;

-static bool __read_mostly opt_argo_mac_permissive;
+static bool __ro_after_init opt_argo;
+static bool __ro_after_init opt_argo_mac_permissive;
  
  static int __init cf_check parse_argo(const char *s)

  {
@@ -92,7 +92,10 @@ static int __init cf_check parse_argo(const char *s)
  if ( !ss )
  ss = strchr(s, '\0');
  
-if ( (val = parse_bool(s, ss)) >= 0 )

+/* Intepret "argo" as a positive boolean. */
+if ( *s == '\0' )
+opt_argo = true;
+else if ( (val = parse_bool(s, ss)) >= 0 )
  opt_argo = val;
  else if ( (val = parse_boolean("mac-permissive", s, ss)) >= 0 )
  opt_argo_mac_permissive = val;

base-commit: 293abb9e0c5e8df96cc5dfe457c62dfcb7542b19


While it is logical, this does technically change the behavior of the 
command line flag. Should there be an update to xen-command-line.pandoc 
to clarify that the list is optional?


Other than the doc concern,

Reviewed-by: Daniel P. Smith

Re: [PATCH v2 1/2] xen/domain: introduce non-x86 hardware emulation flags

On Tue, May 20, 2025 at 05:21:06PM +0200, Jan Beulich wrote:
> On 16.05.2025 04:29, dm...@proton.me wrote:
> > From: Denis Mukhin 
> >
> > Define per-architecture emulation_flags for configuring domain emulation
> > features.
> >
> > Print d->arch.emulation_flags from 'q' keyhandler for better traceability
> > while debugging.
> >
> > Signed-off-by: Denis Mukhin 
> > ---
> > Changes since v1:
> > - dropped comments
> > ---
> >  xen/arch/arm/include/asm/domain.h   | 1 +
> >  xen/arch/ppc/include/asm/domain.h   | 1 +
> >  xen/arch/riscv/include/asm/domain.h | 1 +
> >  xen/common/keyhandler.c | 1 +
> >  4 files changed, 4 insertions(+)
> 
> If every arch gains identical fields, accessed from common code, why would
> those need to live in struct arch_domain?

I did it this way to keep the diff smaller, but I agree such property
makes sense to put in common domain struct. Will update in v3.

Thanks!

> 
> Jan

Re: [PATCH v4 03/10] vpci/header: Emulate extended capability list for dom0

2025-05-20 Thread Chen, Jiqian

On 2025/5/19 21:21, Roger Pau Monné wrote:
> On Mon, May 19, 2025 at 03:10:17PM +0200, Jan Beulich wrote:
>> On 19.05.2025 09:13, Chen, Jiqian wrote:
>>> On 2025/5/19 14:56, Jan Beulich wrote:
 On 19.05.2025 08:43, Chen, Jiqian wrote:
> On 2025/5/18 22:20, Jan Beulich wrote:
>> On 09.05.2025 11:05, Jiqian Chen wrote:
>>> @@ -827,6 +827,34 @@ static int vpci_init_capability_list(struct 
>>> pci_dev *pdev)
>>>   
>>> PCI_STATUS_RSVDZ_MASK);
>>>  }
>>>  
>>> +static int vpci_init_ext_capability_list(struct pci_dev *pdev)
>>> +{
>>> +unsigned int pos = PCI_CFG_SPACE_SIZE, ttl = 480;
>>
>> The ttl value exists (in the function you took it from) to make sure
>> the loop below eventually ends. That is, to be able to kind of
>> gracefully deal with loops in the linked list. Such loops, however,
>> would ...
>>
>>> +if ( !is_hardware_domain(pdev->domain) )
>>> +/* Extended capabilities read as zero, write ignore for guest 
>>> */
>>> +return vpci_add_register(pdev->vpci, vpci_read_val, NULL,
>>> + pos, 4, (void *)0);
>>> +
>>> +while ( pos >= PCI_CFG_SPACE_SIZE && ttl-- )
>>> +{
>>> +uint32_t header = pci_conf_read32(pdev->sbdf, pos);
>>> +int rc;
>>> +
>>> +if ( !header )
>>> +return 0;
>>> +
>>> +rc = vpci_add_register(pdev->vpci, vpci_read_val, 
>>> vpci_hw_write32,
>>> +   pos, 4, (void *)(uintptr_t)header);
>>
>> ... mean we may invoke this twice for the same capability. Such
>> a secondary invocation would fail with -EEXIST, causing device init
>> to fail altogether. Which is kind of against our aim of exposing
>> (in a controlled manner) as much of the PCI hardware as possible.
> May I know what situation that can make this twice for one capability 
> when initialization?
> Does hardware capability list have a cycle?

 Any of this is to work around flawed hardware, I suppose.

>> Imo we ought to be using a bitmap to detect the situation earlier
>> and hence to be able to avoid redundant register addition. Thoughts?
> Can we just let it go forward and continue to add register for next 
> capability when rc == -EXIST, instead of returning error ?

 Possible, but feels wrong.
>>> How about when EXIST, setting the next bits of previous extended capability 
>>> to be zero and return 0? Then we break the cycle.
>>
>> Hmm. Again an option, yet again I'm not certain. But that's perhaps just
>> me, and Roger may be fine with it. IOW we might as well start out this way,
>> and adjust if (ever) an issue with a real device is found.
> 
> Returning -EEXIST might be fine, but at that point there's no further
> capability to process.  There's a loop in the linked capability list,
> and we should just exit.  There needs to be a warning in this case,
> and since this is for the hardware domain only it shouldn't be fatal.
> 
If I understand correctly, I need to add below in next version?

 rc = vpci_add_register(pdev->vpci, vpci_read_val, vpci_hw_write32,
pos, 4, (void *)(uintptr_t)header);
+
+if ( rc == -EEXIST )
+{
+printk(XENLOG_WARNING
+   "%pd %pp: there is a loop in the linked capability list\n",
+   pdev->domain, &pdev->sbdf);
+return 0;
+}
+
 if ( rc )
 return rc;

> If it was for domUs we would possibly need to discuss whether
> assigning the device should fail if a capability linked list loop is
> found.
> 
> Thanks, Roger.

-- 
Best regards,
Jiqian Chen.

Re: [PATCH v2 1/2] xen/domain: introduce non-x86 hardware emulation flags

On 20.05.2025 23:39, dm...@proton.me wrote:
> On Tue, May 20, 2025 at 05:21:06PM +0200, Jan Beulich wrote:
>> On 16.05.2025 04:29, dm...@proton.me wrote:
>>> From: Denis Mukhin 
>>>
>>> Define per-architecture emulation_flags for configuring domain emulation
>>> features.
>>>
>>> Print d->arch.emulation_flags from 'q' keyhandler for better traceability
>>> while debugging.
>>>
>>> Signed-off-by: Denis Mukhin 
>>> ---
>>> Changes since v1:
>>> - dropped comments
>>> ---
>>>  xen/arch/arm/include/asm/domain.h   | 1 +
>>>  xen/arch/ppc/include/asm/domain.h   | 1 +
>>>  xen/arch/riscv/include/asm/domain.h | 1 +
>>>  xen/common/keyhandler.c | 1 +
>>>  4 files changed, 4 insertions(+)
>>
>> If every arch gains identical fields, accessed from common code, why would
>> those need to live in struct arch_domain?
> 
> I did it this way to keep the diff smaller, but I agree such property
> makes sense to put in common domain struct. Will update in v3.

Provided there's arch maintainer buy-off on generalizing this. There's (at
least) a 3rd option, after all: Have arch-specific and (separate) common
emulation flags.

Jan

Re: [PATCH v4 03/10] vpci/header: Emulate extended capability list for dom0

2025-05-20 Thread Chen, Jiqian

On 2025/5/21 14:25, Jan Beulich wrote:
> On 21.05.2025 08:08, Chen, Jiqian wrote:
>> On 2025/5/19 21:21, Roger Pau Monné wrote:
>>> On Mon, May 19, 2025 at 03:10:17PM +0200, Jan Beulich wrote:
 On 19.05.2025 09:13, Chen, Jiqian wrote:
> On 2025/5/19 14:56, Jan Beulich wrote:
>> On 19.05.2025 08:43, Chen, Jiqian wrote:
>>> On 2025/5/18 22:20, Jan Beulich wrote:
 On 09.05.2025 11:05, Jiqian Chen wrote:
> @@ -827,6 +827,34 @@ static int vpci_init_capability_list(struct 
> pci_dev *pdev)
>   
> PCI_STATUS_RSVDZ_MASK);
>  }
>  
> +static int vpci_init_ext_capability_list(struct pci_dev *pdev)
> +{
> +unsigned int pos = PCI_CFG_SPACE_SIZE, ttl = 480;

 The ttl value exists (in the function you took it from) to make sure
 the loop below eventually ends. That is, to be able to kind of
 gracefully deal with loops in the linked list. Such loops, however,
 would ...

> +if ( !is_hardware_domain(pdev->domain) )
> +/* Extended capabilities read as zero, write ignore for 
> guest */
> +return vpci_add_register(pdev->vpci, vpci_read_val, NULL,
> + pos, 4, (void *)0);
> +
> +while ( pos >= PCI_CFG_SPACE_SIZE && ttl-- )
> +{
> +uint32_t header = pci_conf_read32(pdev->sbdf, pos);
> +int rc;
> +
> +if ( !header )
> +return 0;
> +
> +rc = vpci_add_register(pdev->vpci, vpci_read_val, 
> vpci_hw_write32,
> +   pos, 4, (void *)(uintptr_t)header);

 ... mean we may invoke this twice for the same capability. Such
 a secondary invocation would fail with -EEXIST, causing device init
 to fail altogether. Which is kind of against our aim of exposing
 (in a controlled manner) as much of the PCI hardware as possible.
>>> May I know what situation that can make this twice for one capability 
>>> when initialization?
>>> Does hardware capability list have a cycle?
>>
>> Any of this is to work around flawed hardware, I suppose.
>>
 Imo we ought to be using a bitmap to detect the situation earlier
 and hence to be able to avoid redundant register addition. Thoughts?
>>> Can we just let it go forward and continue to add register for next 
>>> capability when rc == -EXIST, instead of returning error ?
>>
>> Possible, but feels wrong.
> How about when EXIST, setting the next bits of previous extended 
> capability to be zero and return 0? Then we break the cycle.

 Hmm. Again an option, yet again I'm not certain. But that's perhaps just
 me, and Roger may be fine with it. IOW we might as well start out this way,
 and adjust if (ever) an issue with a real device is found.
>>>
>>> Returning -EEXIST might be fine, but at that point there's no further
>>> capability to process.  There's a loop in the linked capability list,
>>> and we should just exit.  There needs to be a warning in this case,
>>> and since this is for the hardware domain only it shouldn't be fatal.
>>>
>> If I understand correctly, I need to add below in next version?
>>
>>  rc = vpci_add_register(pdev->vpci, vpci_read_val, vpci_hw_write32,
>> pos, 4, (void *)(uintptr_t)header);
>> +
>> +if ( rc == -EEXIST )
>> +{
>> +printk(XENLOG_WARNING
>> +   "%pd %pp: there is a loop in the linked capability 
>> list\n",
> 
> I think we shouldn't say "loop" unless we firmly know that's what the
> issue is. Maybe use "overlap" instead? And then also log the offending
> register range? (As a nit: "there is" and "linked" are not adding any
> value to the log message; to keep them short [without losing
> information], please try to avoid such.)
OK, below may be more in line with your opinion.

 rc = vpci_add_register(pdev->vpci, vpci_read_val, vpci_hw_write32,
pos, 4, (void *)(uintptr_t)header);
+
+if ( rc == -EEXIST )
+{
+printk(XENLOG_WARNING
+   "%pd %pp: overlap in extended cap list, offset %#x\n",
+   pdev->domain, &pdev->sbdf, pos);
+return 0;
+}
+
 if ( rc )
 return rc;

> 
> Jan

-- 
Best regards,
Jiqian Chen.

Re: [PATCH v2 2/2] xen/domain: rewrite emulation_flags_ok()

On 21.05.2025 01:00, Stefano Stabellini wrote:
> On Tue, 20 May 2025, dm...@proton.me wrote:
>> On Tue, May 20, 2025 at 05:24:33PM +0200, Jan Beulich wrote:
>>> On 16.05.2025 04:29, dm...@proton.me wrote:
 --- a/xen/arch/x86/include/asm/domain.h
 +++ b/xen/arch/x86/include/asm/domain.h
 @@ -494,6 +494,12 @@ struct arch_domain
   X86_EMU_PIT | X86_EMU_USE_PIRQ |   \
   X86_EMU_VPCI)

 +/* User-selectable features. */
 +#define X86_EMU_OPTIONAL(X86_EMU_USE_PIRQ)
 +
 +#define X86_EMU_BASELINE(X86_EMU_ALL & ~(X86_EMU_VPCI | \
 + X86_EMU_OPTIONAL))
>>>
>>> That is, VPCI is neither baseline nor optional. Certainly at least strange.
> 
> I think Denis tried to keep the code more similar to the original. This
> way it is easier to review the code change and it seems correct. But at
> the same time it is easier to spot inconsistency that were present even
> before the patch.

Right, and that was in response to me complaining on an earlier version
that the behavior was (silently) changed. Yet doing it like this wasn't
the only way to address my comment there. At the same time (or already
before) I raised the question whether having such constants is helpful /
necessary in the first place. And now that their names don't reflect
their purpose, this question becomes yet more relevant.

Jan

Re: [PATCH] xen/argo: Command line handling improvements

On 20.05.2025 20:45, Daniel P. Smith wrote:
> On 5/20/25 10:10, Andrew Cooper wrote:
>> Treat "argo" on the command line as a positive boolean, rather than requiring
>> the user to pass "argo=1/on/enable/true".
>>
>> Move both opt_argo* variables into __ro_after_init.  They're set during
>> command line parsing and never modified thereafter.
>>
>> Signed-off-by: Andrew Cooper 
>> ---
>> CC: Christopher Clark 
>> CC: Daniel P. Smith 
>> CC: Denis Mukhin 
>>
>> Found while
>> ---
>>   xen/common/argo.c | 9 ++---
>>   1 file changed, 6 insertions(+), 3 deletions(-)
>>
>> diff --git a/xen/common/argo.c b/xen/common/argo.c
>> index cbe8911a4364..027b37b18a6c 100644
>> --- a/xen/common/argo.c
>> +++ b/xen/common/argo.c
>> @@ -79,8 +79,8 @@ DEFINE_XEN_GUEST_HANDLE(xen_argo_unregister_ring_t);
>>   DEFINE_COMPAT_HANDLE(compat_argo_iov_t);
>>   #endif
>>   
>> -static bool __read_mostly opt_argo;
>> -static bool __read_mostly opt_argo_mac_permissive;
>> +static bool __ro_after_init opt_argo;
>> +static bool __ro_after_init opt_argo_mac_permissive;
>>   
>>   static int __init cf_check parse_argo(const char *s)
>>   {
>> @@ -92,7 +92,10 @@ static int __init cf_check parse_argo(const char *s)
>>   if ( !ss )
>>   ss = strchr(s, '\0');
>>   
>> -if ( (val = parse_bool(s, ss)) >= 0 )
>> +/* Intepret "argo" as a positive boolean. */
>> +if ( *s == '\0' )
>> +opt_argo = true;
>> +else if ( (val = parse_bool(s, ss)) >= 0 )
>>   opt_argo = val;
>>   else if ( (val = parse_boolean("mac-permissive", s, ss)) >= 0 )
>>   opt_argo_mac_permissive = val;
>>
>> base-commit: 293abb9e0c5e8df96cc5dfe457c62dfcb7542b19
> 
> While it is logical, this does technically change the behavior of the 
> command line flag. Should there be an update to xen-command-line.pandoc 
> to clarify that the list is optional?

I'd view it the other way around: If you look at the neighboring doc
entries, we uniformly say

> `= `

when the options can appear all by themselves. This would therefore be
the expected behavior for "argo" alone, too.

Jan

Re: [PATCH v2 2/2] xen/domain: rewrite emulation_flags_ok()

On 21.05.2025 00:38, dm...@proton.me wrote:
> On Tue, May 20, 2025 at 05:24:33PM +0200, Jan Beulich wrote:
>> On 16.05.2025 04:29, dm...@proton.me wrote:
>>> --- a/xen/arch/x86/include/asm/domain.h
>>> +++ b/xen/arch/x86/include/asm/domain.h
>>> @@ -494,6 +494,12 @@ struct arch_domain
>>>   X86_EMU_PIT | X86_EMU_USE_PIRQ |   \
>>>   X86_EMU_VPCI)
>>>
>>> +/* User-selectable features. */
>>> +#define X86_EMU_OPTIONAL(X86_EMU_USE_PIRQ)
>>> +
>>> +#define X86_EMU_BASELINE(X86_EMU_ALL & ~(X86_EMU_VPCI | \
>>> + X86_EMU_OPTIONAL))
>>
>> That is, VPCI is neither baseline nor optional. Certainly at least strange.
> 
> IMO, X86_EMU_OPTIONAL should include both VPCI and PIRQ.
> 
> But that will be a behavior change: AFAIU, VPCI is injected implicitly for 
> dom0
> case only, "default" xl toolstack currently excludes VPCI for HVM domains.
> 
> Do I understand it correctly that "BASELINE" in the symbol name is a concern?

It's not the word by itself. _If_ we want to have such symbols (which I had put
under question before), they need to be named to accurately reflect the purpose.
Imo with the names chosen an implication is that the two are non-overlapping,
while when combined yield the full set of flags.

Jan

Re: [PATCH v4 03/10] vpci/header: Emulate extended capability list for dom0