Re: [PATCH 10/22] x86/mapcache: initialise the mapcache for the idle domain

2024-01-10 Thread Jan Beulich
On 10.01.2024 17:24, Elias El Yandouzi wrote:
> On 22/12/2022 13:06, Jan Beulich wrote:
>> On 16.12.2022 12:48, Julien Grall wrote:
>>> --- a/xen/arch/x86/mm.c
>>> +++ b/xen/arch/x86/mm.c
>>> @@ -5963,6 +5963,9 @@ int create_perdomain_mapping(struct domain *d, unsigned long va,
>>>   l3tab = __map_domain_page(pg);
>>>   clear_page(l3tab);
>>>   d->arch.perdomain_l3_pg = pg;
>>> +if ( is_idle_domain(d) )
>>> +idle_pg_table[l4_table_offset(PERDOMAIN_VIRT_START)] =
>>> +l4e_from_page(pg, __PAGE_HYPERVISOR_RW);
>>
>> Hmm, having an idle domain check here isn't very nice. I agree putting
>> it in arch_domain_create()'s respective conditional isn't very neat
>> either, but personally I'd consider this at least a little less bad.
>> And the layering violation aspect isn't much worse than that of setting
>> d->arch.ctxt_switch there as well.
> 
> Why do you think it would be less bad to move it in 
> arch_domain_create()? To me, it would make things worse as it would 
> spread the mapping stuff across different functions.

Not sure what to add to what I said: create_perdomain_mapping() gaining
such a check is a layering violation to me. arch_domain_create() otoh
special cases the idle domain already.

Jan



Re: [PATCH 1/2] x86/vmx: Fix IRQ handling for EXIT_REASON_INIT

2024-01-10 Thread Jan Beulich
On 10.01.2024 20:11, Andrew Cooper wrote:
> On 02/11/2023 8:57 am, Jan Beulich wrote:
>> On 01.11.2023 20:20, Andrew Cooper wrote:
>>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>>> @@ -4097,10 +4097,6 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
>>>  case EXIT_REASON_MCE_DURING_VMENTRY:
>>>  do_machine_check(regs);
>>>  break;
>>> -
>>> -case EXIT_REASON_INIT:
>>> -printk(XENLOG_ERR "Error: INIT received - ignoring\n");
>>> -return; /* Renter the guest without further processing */
>>>  }
>> Wouldn't the printk() better remain where it was, and just the "return" be
>> purged?
> 
> Not really... that would hit the unknown vmexit path in the second.

Well, I didn't mean to suggest to purge the other hunk. Instead I meant ...

> We actually have a variety of empty cases in the second.  We could add
> another.

... something along these lines - do nothing but "break;" there.

Jan



[PATCH 8/8] x86/APIC: drop regs parameter from direct vector handler functions

2024-01-10 Thread Jan Beulich
The only place it was needed is in the spurious handler, and there we
can use get_irq_regs() instead.

Signed-off-by: Jan Beulich 

--- a/xen/arch/x86/apic.c
+++ b/xen/arch/x86/apic.c
@@ -1324,7 +1324,7 @@ int reprogram_timer(s_time_t timeout)
 return apic_tmict || !timeout;
 }
 
-static void cf_check apic_timer_interrupt(struct cpu_user_regs *regs)
+static void cf_check apic_timer_interrupt(void)
 {
 ack_APIC_irq();
 perfc_incr(apic_timer);
@@ -1343,7 +1343,7 @@ void smp_send_state_dump(unsigned int cp
 /*
  * Spurious interrupts should _never_ happen with our APIC/SMP architecture.
  */
-static void cf_check spurious_interrupt(struct cpu_user_regs *regs)
+static void cf_check spurious_interrupt(void)
 {
 /*
  * Check if this is a vectored interrupt (most likely, as this is probably
@@ -1357,7 +1357,7 @@ static void cf_check spurious_interrupt(
 is_spurious = !nmi_check_continuation();
 if (this_cpu(state_dump_pending)) {
 this_cpu(state_dump_pending) = false;
-dump_execstate(regs);
+dump_execstate(get_irq_regs());
 is_spurious = false;
 }
 
@@ -1374,7 +1374,7 @@ static void cf_check spurious_interrupt(
  * This interrupt should never happen with our APIC/SMP architecture
  */
 
-static void cf_check error_interrupt(struct cpu_user_regs *regs)
+static void cf_check error_interrupt(void)
 {
 static const char *const esr_fields[] = {
 ", Send CS error",
@@ -1409,7 +1409,7 @@ static void cf_check error_interrupt(str
  * This interrupt handles performance counters interrupt
  */
 
-static void cf_check pmu_interrupt(struct cpu_user_regs *regs)
+static void cf_check pmu_interrupt(void)
 {
 ack_APIC_irq();
 vpmu_do_interrupt();
--- a/xen/arch/x86/cpu/mcheck/mce_intel.c
+++ b/xen/arch/x86/cpu/mcheck/mce_intel.c
@@ -58,7 +58,7 @@ bool __read_mostly lmce_support;
 #define MCE_RING        0x1
 static DEFINE_PER_CPU(int, last_state);
 
-static void cf_check intel_thermal_interrupt(struct cpu_user_regs *regs)
+static void cf_check intel_thermal_interrupt(void)
 {
 uint64_t msr_content;
 unsigned int cpu = smp_processor_id();
@@ -642,7 +642,7 @@ static void cpu_mcheck_disable(void)
 clear_cmci();
 }
 
-static void cf_check cmci_interrupt(struct cpu_user_regs *regs)
+static void cf_check cmci_interrupt(void)
 {
 mctelem_cookie_t mctc;
 struct mca_summary bs;
--- a/xen/arch/x86/guest/xen/xen.c
+++ b/xen/arch/x86/guest/xen/xen.c
@@ -158,7 +158,7 @@ static void __init init_memmap(void)
 }
 }
 
-static void cf_check xen_evtchn_upcall(struct cpu_user_regs *regs)
+static void cf_check xen_evtchn_upcall(void)
 {
 struct vcpu_info *vcpu_info = this_cpu(vcpu_info);
 unsigned long pending;
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2773,7 +2773,7 @@ static struct hvm_function_table __initd
 };
 
 /* Handle VT-d posted-interrupt when VCPU is blocked. */
-static void cf_check pi_wakeup_interrupt(struct cpu_user_regs *regs)
+static void cf_check pi_wakeup_interrupt(void)
 {
 struct vmx_vcpu *vmx, *tmp;
 spinlock_t *lock = &per_cpu(vmx_pi_blocking, smp_processor_id()).lock;
@@ -2805,7 +2805,7 @@ static void cf_check pi_wakeup_interrupt
 }
 
 /* Handle VT-d posted-interrupt when VCPU is running. */
-static void cf_check pi_notification_interrupt(struct cpu_user_regs *regs)
+static void cf_check pi_notification_interrupt(void)
 {
 ack_APIC_irq();
 this_cpu(irq_count)++;
--- a/xen/arch/x86/include/asm/irq.h
+++ b/xen/arch/x86/include/asm/irq.h
@@ -72,17 +72,15 @@ extern int opt_irq_vector_map;
 
 #define platform_legacy_irq(irq)   ((irq) < 16)
 
-void cf_check event_check_interrupt(struct cpu_user_regs *regs);
-void cf_check invalidate_interrupt(struct cpu_user_regs *regs);
-void cf_check call_function_interrupt(struct cpu_user_regs *regs);
-void cf_check irq_move_cleanup_interrupt(struct cpu_user_regs *regs);
+void cf_check event_check_interrupt(void);
+void cf_check invalidate_interrupt(void);
+void cf_check call_function_interrupt(void);
+void cf_check irq_move_cleanup_interrupt(void);
 
 uint8_t alloc_hipriority_vector(void);
 
-void set_direct_apic_vector(
-uint8_t vector, void (*handler)(struct cpu_user_regs *regs));
-void alloc_direct_apic_vector(
-uint8_t *vector, void (*handler)(struct cpu_user_regs *regs));
+void set_direct_apic_vector(uint8_t vector, void (*handler)(void));
+void alloc_direct_apic_vector(uint8_t *vector, void (*handler)(void));
 
 void do_IRQ(struct cpu_user_regs *regs);
 
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -743,7 +743,7 @@ void move_native_irq(struct irq_desc *de
 desc->handler->enable(desc);
 }
 
-void cf_check irq_move_cleanup_interrupt(struct cpu_user_regs *regs)
+void cf_check irq_move_cleanup_interrupt(void)
 {
 unsigned vector, me;
 
@@ -913,16 +913,14 @@ uint8_t alloc_hipriority_vector(void)
 return next++;
 }
 
-static void 

[PATCH 7/8] x86/vPMU: drop regs parameter from interrupt functions

2024-01-10 Thread Jan Beulich
The vendor functions don't use the respective parameters at all. In
vpmu_do_interrupt() there's only a very limited area where the
outer context's state would be needed, retrievable by get_irq_regs().

This is in preparation of dropping the register parameters from direct
APIC vector handler functions.

Signed-off-by: Jan Beulich 

--- a/xen/arch/x86/apic.c
+++ b/xen/arch/x86/apic.c
@@ -1412,7 +1412,7 @@ static void cf_check error_interrupt(str
 static void cf_check pmu_interrupt(struct cpu_user_regs *regs)
 {
 ack_APIC_irq();
-vpmu_do_interrupt(regs);
+vpmu_do_interrupt();
 }
 
 void __init apic_intr_init(void)
--- a/xen/arch/x86/cpu/vpmu.c
+++ b/xen/arch/x86/cpu/vpmu.c
@@ -158,7 +158,7 @@ static inline struct vcpu *choose_hwdom_
 return hardware_domain->vcpu[idx];
 }
 
-void vpmu_do_interrupt(struct cpu_user_regs *regs)
+void vpmu_do_interrupt(void)
 {
 struct vcpu *sampled = current, *sampling;
 struct vpmu_struct *vpmu;
@@ -239,6 +239,7 @@ void vpmu_do_interrupt(struct cpu_user_r
 else
 #endif
 {
+const struct cpu_user_regs *regs = get_irq_regs();
 struct xen_pmu_regs *r = &vpmu->xenpmu_data->pmu.r.regs;
 
 if ( (vpmu_mode & XENPMU_MODE_SELF) )
@@ -301,7 +302,7 @@ void vpmu_do_interrupt(struct cpu_user_r
 /* We don't support (yet) HVM dom0 */
 ASSERT(sampling == sampled);
 
-if ( !alternative_call(vpmu_ops.do_interrupt, regs) ||
+if ( !alternative_call(vpmu_ops.do_interrupt) ||
  !is_vlapic_lvtpc_enabled(vlapic) )
 return;
 
--- a/xen/arch/x86/cpu/vpmu_amd.c
+++ b/xen/arch/x86/cpu/vpmu_amd.c
@@ -175,7 +175,7 @@ static void amd_vpmu_unset_msr_bitmap(st
 msr_bitmap_off(vpmu);
 }
 
-static int cf_check amd_vpmu_do_interrupt(struct cpu_user_regs *regs)
+static int cf_check amd_vpmu_do_interrupt(void)
 {
 return 1;
 }
--- a/xen/arch/x86/cpu/vpmu_intel.c
+++ b/xen/arch/x86/cpu/vpmu_intel.c
@@ -772,7 +772,7 @@ static void cf_check core2_vpmu_dump(con
 }
 }
 
-static int cf_check core2_vpmu_do_interrupt(struct cpu_user_regs *regs)
+static int cf_check core2_vpmu_do_interrupt(void)
 {
 struct vcpu *v = current;
 u64 msr_content;
--- a/xen/arch/x86/include/asm/vpmu.h
+++ b/xen/arch/x86/include/asm/vpmu.h
@@ -31,7 +31,7 @@ struct arch_vpmu_ops {
 int (*initialise)(struct vcpu *v);
 int (*do_wrmsr)(unsigned int msr, uint64_t msr_content);
 int (*do_rdmsr)(unsigned int msr, uint64_t *msr_content);
-int (*do_interrupt)(struct cpu_user_regs *regs);
+int (*do_interrupt)(void);
 void (*arch_vpmu_destroy)(struct vcpu *v);
 int (*arch_vpmu_save)(struct vcpu *v, bool to_guest);
 int (*arch_vpmu_load)(struct vcpu *v, bool from_guest);
@@ -99,7 +99,7 @@ static inline bool vpmu_are_all_set(cons
 
 void vpmu_lvtpc_update(uint32_t val);
 int vpmu_do_msr(unsigned int msr, uint64_t *msr_content, bool is_write);
-void vpmu_do_interrupt(struct cpu_user_regs *regs);
+void vpmu_do_interrupt(void);
 void vpmu_initialise(struct vcpu *v);
 void vpmu_destroy(struct vcpu *v);
 void vpmu_save(struct vcpu *v);




[PATCH 6/8] IRQ: drop register parameter from handler functions

2024-01-10 Thread Jan Beulich
It's simply not needed anymore. Note how Linux made this change many
years ago already, in 2.6.19 (late 2006, see [1]).

Signed-off-by: Jan Beulich 

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=7d12e780e003f93433d49ce78cfedf4b4c52adc5
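
The shape of the change can be sketched in plain C: a miniature of the irqaction chain that do_IRQ() walks, with the regs parameter gone from the handler signature. The struct layout and names here are illustrative stand-ins, not Xen's actual definitions:

```c
#include <stddef.h>

/* Hypothetical miniature of an irqaction chain; handlers no longer take regs. */
struct irqaction {
    void (*handler)(int irq, void *dev_id);
    void *dev_id;
    struct irqaction *next;
};

static int calls;

static void count_handler(int irq, void *dev_id)
{
    (void)irq;
    *(int *)dev_id += 1;   /* per-action state arrives via dev_id */
}

/* Walk the action chain the way do_IRQ() does after this change. */
static void dispatch(int irq, struct irqaction *action)
{
    do {
        action->handler(irq, action->dev_id);
        action = action->next;
    } while ( action );
}
```

Any handler that still needs the interrupted context would call get_irq_regs() internally instead of receiving a pointer it almost never uses.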

--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -397,7 +397,7 @@ void gic_interrupt(struct cpu_user_regs
 } while (1);
 }
 
-static void maintenance_interrupt(int irq, void *dev_id, struct cpu_user_regs *regs)
+static void maintenance_interrupt(int irq, void *dev_id)
 {
 /*
  * This is a dummy interrupt handler.
--- a/xen/arch/arm/irq.c
+++ b/xen/arch/arm/irq.c
@@ -182,8 +182,7 @@ void irq_set_affinity(struct irq_desc *d
 }
 
 int request_irq(unsigned int irq, unsigned int irqflags,
-void (*handler)(int irq, void *dev_id,
-struct cpu_user_regs *regs),
+void (*handler)(int irq, void *dev_id),
 const char *devname, void *dev_id)
 {
 struct irqaction *action;
@@ -276,7 +275,7 @@ void do_IRQ(struct cpu_user_regs *regs,
 
 do
 {
-action->handler(irq, action->dev_id, regs);
+action->handler(irq, action->dev_id);
 action = action->next;
 } while ( action );
 
--- a/xen/arch/arm/time.c
+++ b/xen/arch/arm/time.c
@@ -241,7 +241,7 @@ int reprogram_timer(s_time_t timeout)
 }
 
 /* Handle the firing timer */
-static void htimer_interrupt(int irq, void *dev_id, struct cpu_user_regs *regs)
+static void htimer_interrupt(int irq, void *dev_id)
 {
 if ( unlikely(!(READ_SYSREG(CNTHP_CTL_EL2) & CNTx_CTL_PENDING)) )
 return;
@@ -255,7 +255,7 @@ static void htimer_interrupt(int irq, vo
 WRITE_SYSREG(0, CNTHP_CTL_EL2);
 }
 
-static void vtimer_interrupt(int irq, void *dev_id, struct cpu_user_regs *regs)
+static void vtimer_interrupt(int irq, void *dev_id)
 {
 /*
  * Edge-triggered interrupts can be used for the virtual timer. Even
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -962,7 +962,7 @@ static int __init cf_check irq_ratelimit
 __initcall(irq_ratelimit_init);
 
 int __init request_irq(unsigned int irq, unsigned int irqflags,
-void (*handler)(int irq, void *dev_id, struct cpu_user_regs *regs),
+void (*handler)(int irq, void *dev_id),
 const char * devname, void *dev_id)
 {
 struct irqaction * action;
@@ -2009,7 +2009,7 @@ void do_IRQ(struct cpu_user_regs *regs)
 spin_unlock_irq(&desc->lock);
 
 tsc_in = tb_init_done ? get_cycles() : 0;
-action->handler(irq, action->dev_id, regs);
+action->handler(irq, action->dev_id);
 TRACE_3D(TRC_HW_IRQ_HANDLED, irq, tsc_in, get_cycles());
 
 spin_lock_irq(&desc->lock);
--- a/xen/arch/x86/hpet.c
+++ b/xen/arch/x86/hpet.c
@@ -237,8 +237,7 @@ again:
 }
 }
 
-static void cf_check hpet_interrupt_handler(
-int irq, void *data, struct cpu_user_regs *regs)
+static void cf_check hpet_interrupt_handler(int irq, void *data)
 {
 struct hpet_event_channel *ch = data;
 
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -198,8 +198,7 @@ static void smp_send_timer_broadcast_ipi
 }
 }
 
-static void cf_check timer_interrupt(
-int irq, void *dev_id, struct cpu_user_regs *regs)
+static void cf_check timer_interrupt(int irq, void *dev_id)
 {
 ASSERT(local_irq_is_enabled());
 
--- a/xen/common/irq.c
+++ b/xen/common/irq.c
@@ -29,7 +29,7 @@ int init_one_irq_desc(struct irq_desc *d
 return err;
 }
 
-void cf_check no_action(int cpl, void *dev_id, struct cpu_user_regs *regs)
+void cf_check no_action(int cpl, void *dev_id)
 {
 }
 
--- a/xen/drivers/char/cadence-uart.c
+++ b/xen/drivers/char/cadence-uart.c
@@ -40,7 +40,7 @@ static struct cuart {
 #define cuart_read(uart, off)   readl((uart)->regs + (off))
 #define cuart_write(uart, off,val)  writel((val), (uart)->regs + (off))
 
-static void cuart_interrupt(int irq, void *data, struct cpu_user_regs *regs)
+static void cuart_interrupt(int irq, void *data)
 {
 struct serial_port *port = data;
 struct cuart *uart = port->uart;
--- a/xen/drivers/char/exynos4210-uart.c
+++ b/xen/drivers/char/exynos4210-uart.c
@@ -45,7 +45,7 @@ static struct exynos4210_uart {
 #define exynos4210_read(uart, off)  readl((uart)->regs + off)
 #define exynos4210_write(uart, off, val)writel(val, (uart->regs) + off)
 
-static void exynos4210_uart_interrupt(int irq, void *data, struct cpu_user_regs *regs)
+static void exynos4210_uart_interrupt(int irq, void *data)
 {
 struct serial_port *port = data;
 struct exynos4210_uart *uart = port->uart;
--- a/xen/drivers/char/imx-lpuart.c
+++ b/xen/drivers/char/imx-lpuart.c
@@ -37,8 +37,7 @@ static struct imx_lpuart {
 struct vuart_info vuart;
 } imx8_com;
 
-static void imx_lpuart_interrupt(int irq, void *data,
- struct cpu_user_regs *regs)
+static void imx_lpuart_interrupt(int irq, void *data)
 {
 struct serial_port *port = data;
 struct 

[PATCH 5/8] serial: drop serial_[rt]x_interrupt()'s regs parameter

2024-01-10 Thread Jan Beulich
In the polling functions, (ab)using set_irq_regs() is necessary
to balance the change. This is in preparation of dropping the register
parameters from IRQ handler functions.

Signed-off-by: Jan Beulich 

--- a/xen/drivers/char/cadence-uart.c
+++ b/xen/drivers/char/cadence-uart.c
@@ -51,7 +51,7 @@ static void cuart_interrupt(int irq, voi
 /* ACK.  */
 if ( status & UART_SR_INTR_RTRIG )
 {
-serial_rx_interrupt(port, regs);
+serial_rx_interrupt(port);
 cuart_write(uart, R_UART_CISR, UART_SR_INTR_RTRIG);
 }
 } while ( status & UART_SR_INTR_RTRIG );
--- a/xen/drivers/char/ehci-dbgp.c
+++ b/xen/drivers/char/ehci-dbgp.c
@@ -1253,6 +1253,7 @@ static void cf_check _ehci_dbgp_poll(str
 unsigned long flags;
 unsigned int timeout = MICROSECS(DBGP_CHECK_INTERVAL);
 bool empty = false;
+struct cpu_user_regs *old_regs;
 
 if ( !dbgp->ehci_debug )
 return;
@@ -1268,11 +1269,16 @@ static void cf_check _ehci_dbgp_poll(str
 spin_unlock_irqrestore(&dbgp->tx_lock, flags);
 }
 
+/* Mimic interrupt context. */
+old_regs = set_irq_regs(regs);
+
 if ( dbgp->in.chunk )
-serial_rx_interrupt(port, regs);
+serial_rx_interrupt(port);
 
 if ( empty )
-serial_tx_interrupt(port, regs);
+serial_tx_interrupt(port);
+
+set_irq_regs(old_regs);
 
 if ( spin_trylock_irqsave(&dbgp->tx_lock, flags) )
 {
--- a/xen/drivers/char/exynos4210-uart.c
+++ b/xen/drivers/char/exynos4210-uart.c
@@ -81,7 +81,7 @@ static void exynos4210_uart_interrupt(in
 if ( status & (UINTM_RXD | UINTM_ERROR) )
 {
 /* uart->regs[UINTM] |= RXD|ERROR; */
-serial_rx_interrupt(port, regs);
+serial_rx_interrupt(port);
 /* uart->regs[UINTM] &= ~(RXD|ERROR); */
 exynos4210_write(uart, UINTP, UINTM_RXD | UINTM_ERROR);
 }
@@ -89,7 +89,7 @@ static void exynos4210_uart_interrupt(in
 if ( status & (UINTM_TXD | UINTM_MODEM) )
 {
 /* uart->regs[UINTM] |= TXD|MODEM; */
-serial_tx_interrupt(port, regs);
+serial_tx_interrupt(port);
 /* uart->regs[UINTM] &= ~(TXD|MODEM); */
 exynos4210_write(uart, UINTP, UINTM_TXD | UINTM_MODEM);
 }
--- a/xen/drivers/char/imx-lpuart.c
+++ b/xen/drivers/char/imx-lpuart.c
@@ -48,10 +48,10 @@ static void imx_lpuart_interrupt(int irq
 rxcnt = imx_lpuart_read(uart, UARTWATER) >> UARTWATER_RXCNT_OFF;
 
 if ( (sts & UARTSTAT_RDRF) || (rxcnt > 0) )
-   serial_rx_interrupt(port, regs);
+   serial_rx_interrupt(port);
 
 if ( sts & UARTSTAT_TDRE )
-   serial_tx_interrupt(port, regs);
+   serial_tx_interrupt(port);
 
 imx_lpuart_write(uart, UARTSTAT, sts);
 }
--- a/xen/drivers/char/meson-uart.c
+++ b/xen/drivers/char/meson-uart.c
@@ -69,10 +69,10 @@ static void meson_uart_interrupt(int irq
 uint32_t st = readl(uart->regs + AML_UART_STATUS_REG);
 
 if ( !(st & AML_UART_RX_FIFO_EMPTY) )
-serial_rx_interrupt(port, regs);
+serial_rx_interrupt(port);
 
 if ( !(st & AML_UART_TX_FIFO_FULL) )
-serial_tx_interrupt(port, regs);
+serial_tx_interrupt(port);
 }
 
 static void __init meson_uart_init_preirq(struct serial_port *port)
--- a/xen/drivers/char/mvebu-uart.c
+++ b/xen/drivers/char/mvebu-uart.c
@@ -76,10 +76,10 @@ static void mvebu3700_uart_interrupt(int
 
 if ( st & (STATUS_RX_RDY | STATUS_OVR_ERR | STATUS_FRM_ERR |
STATUS_BRK_DET) )
-serial_rx_interrupt(port, regs);
+serial_rx_interrupt(port);
 
 if ( st & STATUS_TX_RDY )
-serial_tx_interrupt(port, regs);
+serial_tx_interrupt(port);
 }
 
 static void __init mvebu3700_uart_init_preirq(struct serial_port *port)
--- a/xen/drivers/char/ns16550.c
+++ b/xen/drivers/char/ns16550.c
@@ -188,9 +188,9 @@ static void cf_check ns16550_interrupt(
 u8 lsr = ns_read_reg(uart, UART_LSR);
 
 if ( (lsr & uart->lsr_mask) == uart->lsr_mask )
-serial_tx_interrupt(port, regs);
+serial_tx_interrupt(port);
 if ( lsr & UART_LSR_DR )
-serial_rx_interrupt(port, regs);
+serial_rx_interrupt(port);
 
 /* A "busy-detect" condition is observed on Allwinner/sunxi UART
  * after LCR is written during setup. It needs to be cleared at
@@ -211,22 +211,27 @@ static void cf_check __ns16550_poll(stru
 {
 struct serial_port *port = this_cpu(poll_port);
 struct ns16550 *uart = port->uart;
+struct cpu_user_regs *old_regs;
 
 if ( uart->intr_works )
 return; /* Interrupts work - no more polling */
 
+/* Mimic interrupt context. */
+old_regs = set_irq_regs(regs);
+
 while ( ns_read_reg(uart, UART_LSR) & UART_LSR_DR )
 {
 if ( ns16550_ioport_invalid(uart) )
 goto out;
 
-serial_rx_interrupt(port, regs);
+serial_rx_interrupt(port);
 }
 

[PATCH 4/8] PV-shim: drop pv_console_rx()'s regs parameter

2024-01-10 Thread Jan Beulich
It's not needed anymore. This is in preparation of dropping the register
parameters from IRQ handler functions.

Signed-off-by: Jan Beulich 

--- a/xen/arch/x86/guest/xen/xen.c
+++ b/xen/arch/x86/guest/xen/xen.c
@@ -181,7 +181,7 @@ static void cf_check xen_evtchn_upcall(s
 port += l1 * BITS_PER_LONG;
 
 if ( pv_console && port == pv_console_evtchn() )
-pv_console_rx(regs);
+pv_console_rx();
 else if ( pv_shim )
 pv_shim_inject_evtchn(port);
 }
--- a/xen/drivers/char/xen_pv_console.c
+++ b/xen/drivers/char/xen_pv_console.c
@@ -94,7 +94,7 @@ evtchn_port_t pv_console_evtchn(void)
 return cons_evtchn;
 }
 
-size_t pv_console_rx(struct cpu_user_regs *regs)
+size_t pv_console_rx(void)
 {
 char c;
 XENCONS_RING_IDX cons, prod;
--- a/xen/include/xen/pv_console.h
+++ b/xen/include/xen/pv_console.h
@@ -9,7 +9,7 @@ void pv_console_init(void);
 void pv_console_set_rx_handler(serial_rx_fn fn);
 void pv_console_init_postirq(void);
 void pv_console_puts(const char *buf, size_t nr);
-size_t pv_console_rx(struct cpu_user_regs *regs);
+size_t pv_console_rx(void);
 evtchn_port_t pv_console_evtchn(void);
 
 #else
@@ -18,7 +18,7 @@ static inline void pv_console_init(void)
 static inline void pv_console_set_rx_handler(serial_rx_fn fn) { }
 static inline void pv_console_init_postirq(void) { }
 static inline void pv_console_puts(const char *buf, size_t nr) { }
-static inline size_t pv_console_rx(struct cpu_user_regs *regs) { return 0; }
+static inline size_t pv_console_rx(void) { return 0; }
 
 #endif /* !CONFIG_XEN_GUEST */
 #endif /* __XEN_PV_CONSOLE_H__ */




[PATCH 3/8] serial: drop serial_rx_fn's regs parameter

2024-01-10 Thread Jan Beulich
In the one place where it's needed, get_irq_regs() can be used instead.
This is in preparation of dropping the register parameters from IRQ
handler functions.

Signed-off-by: Jan Beulich 

--- a/xen/drivers/char/console.c
+++ b/xen/drivers/char/console.c
@@ -527,12 +527,12 @@ static void switch_serial_input(void)
 printk("\n");
 }
 
-static void __serial_rx(char c, struct cpu_user_regs *regs)
+static void __serial_rx(char c)
 {
 switch ( console_rx )
 {
 case 0:
-return handle_keypress(c, regs);
+return handle_keypress(c, get_irq_regs());
 
 case 1:
 /*
@@ -579,7 +579,7 @@ static void __serial_rx(char c, struct c
 #endif
 }
 
-static void cf_check serial_rx(char c, struct cpu_user_regs *regs)
+static void cf_check serial_rx(char c)
 {
 static int switch_code_count = 0;
 
@@ -595,10 +595,10 @@ static void cf_check serial_rx(char c, s
 }
 
 for ( ; switch_code_count != 0; switch_code_count-- )
-__serial_rx(switch_code, regs);
+__serial_rx(switch_code);
 
 /* Finally process the just-received character. */
-__serial_rx(c, regs);
+__serial_rx(c);
 }
 
 static void cf_check notify_dom0_con_ring(void *unused)
--- a/xen/drivers/char/serial.c
+++ b/xen/drivers/char/serial.c
@@ -68,7 +68,7 @@ void serial_rx_interrupt(struct serial_p
 spin_unlock_irqrestore(&port->rx_lock, flags);
 
 if ( fn != NULL )
-(*fn)(c & 0x7f, regs);
+fn(c & 0x7f);
 }
 
 void serial_tx_interrupt(struct serial_port *port, struct cpu_user_regs *regs)
--- a/xen/drivers/char/xen_pv_console.c
+++ b/xen/drivers/char/xen_pv_console.c
@@ -118,7 +118,7 @@ size_t pv_console_rx(struct cpu_user_reg
 {
 c = cons_ring->in[MASK_XENCONS_IDX(cons++, cons_ring->in)];
 if ( cons_rx_handler )
-cons_rx_handler(c, regs);
+cons_rx_handler(c);
 recv++;
 }
 
--- a/xen/include/xen/serial.h
+++ b/xen/include/xen/serial.h
@@ -15,7 +15,7 @@
 struct cpu_user_regs;
 
 /* Register a character-receive hook on the specified COM port. */
-typedef void (*serial_rx_fn)(char c, struct cpu_user_regs *regs);
+typedef void (*serial_rx_fn)(char c);
 void serial_set_rx_handler(int handle, serial_rx_fn fn);
 
 /* Number of characters we buffer for a polling receiver. */




[PATCH 2/8] IRQ: generalize [gs]et_irq_regs()

2024-01-10 Thread Jan Beulich
Move functions (and their data) to common code, and invoke the functions
on Arm as well. This is in preparation of dropping the register
parameters from handler functions.

Signed-off-by: Jan Beulich 
---
To limit visibility of the per-CPU data item, we may want to consider
making the functions out-of-line ones (in common/irq.c).
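
The save/restore discipline the pair establishes can be sketched in plain C. This is a single-CPU stand-in: a plain static models the per-CPU variable, and `fake_do_IRQ` is an illustrative caller, not Xen's actual do_IRQ():

```c
#include <stddef.h>

/* Stand-in for Xen's struct cpu_user_regs; only pointer identity matters here. */
struct cpu_user_regs { int dummy; };

/* Single-CPU model of the per-CPU irq_regs pointer moved to common code. */
static struct cpu_user_regs *irq_regs;

static struct cpu_user_regs *get_irq_regs(void)
{
    return irq_regs;
}

static struct cpu_user_regs *set_irq_regs(struct cpu_user_regs *new_regs)
{
    struct cpu_user_regs *old_regs = irq_regs;

    irq_regs = new_regs;
    return old_regs;
}

/* Illustrative IRQ entry: publish regs on entry, restore the outer frame on exit. */
static struct cpu_user_regs *seen_by_handler;

static void fake_do_IRQ(struct cpu_user_regs *regs)
{
    struct cpu_user_regs *old_regs = set_irq_regs(regs);

    seen_by_handler = get_irq_regs();   /* a handler fetches regs on demand */
    set_irq_regs(old_regs);
}
```

Because set_irq_regs() returns the previous value, nested entries (an IRQ taken while another frame is already published) unwind correctly.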

--- a/xen/arch/arm/irq.c
+++ b/xen/arch/arm/irq.c
@@ -221,6 +221,7 @@ void do_IRQ(struct cpu_user_regs *regs,
 {
 struct irq_desc *desc = irq_to_desc(irq);
 struct irqaction *action;
+struct cpu_user_regs *old_regs = set_irq_regs(regs);
 
 perfc_incr(irqs);
 
@@ -288,6 +289,7 @@ out:
 out_no_end:
 spin_unlock(>lock);
 irq_exit();
+set_irq_regs(old_regs);
 }
 
 void release_irq(unsigned int irq, const void *dev_id)
--- a/xen/arch/x86/include/asm/irq.h
+++ b/xen/arch/x86/include/asm/irq.h
@@ -70,27 +70,6 @@ extern bool opt_noirqbalance;
 
 extern int opt_irq_vector_map;
 
-/*
- * Per-cpu current frame pointer - the location of the last exception frame on
- * the stack
- */
-DECLARE_PER_CPU(struct cpu_user_regs *, __irq_regs);
-
-static inline struct cpu_user_regs *get_irq_regs(void)
-{
-   return this_cpu(__irq_regs);
-}
-
-static inline struct cpu_user_regs *set_irq_regs(struct cpu_user_regs *new_regs)
-{
-   struct cpu_user_regs *old_regs, **pp_regs = &this_cpu(__irq_regs);
-
-   old_regs = *pp_regs;
-   *pp_regs = new_regs;
-   return old_regs;
-}
-
-
 #define platform_legacy_irq(irq)   ((irq) < 16)
 
 void cf_check event_check_interrupt(struct cpu_user_regs *regs);
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -53,8 +53,6 @@ static DEFINE_SPINLOCK(vector_lock);
 
 DEFINE_PER_CPU(vector_irq_t, vector_irq);
 
-DEFINE_PER_CPU(struct cpu_user_regs *, __irq_regs);
-
 static LIST_HEAD(irq_ratelimit_list);
 static DEFINE_SPINLOCK(irq_ratelimit_lock);
 static struct timer irq_ratelimit_timer;
--- a/xen/common/irq.c
+++ b/xen/common/irq.c
@@ -1,6 +1,8 @@
 #include 
 #include 
 
+DEFINE_PER_CPU(struct cpu_user_regs *, irq_regs);
+
 int init_one_irq_desc(struct irq_desc *desc)
 {
 int err;
--- a/xen/include/xen/irq.h
+++ b/xen/include/xen/irq.h
@@ -131,6 +131,26 @@ void cf_check irq_actor_none(struct irq_
 #define irq_disable_none irq_actor_none
 #define irq_enable_none irq_actor_none
 
+/*
+ * Per-cpu interrupted context register state - the top-most interrupt frame
+ * on the stack.
+ */
+DECLARE_PER_CPU(struct cpu_user_regs *, irq_regs);
+
+static inline struct cpu_user_regs *get_irq_regs(void)
+{
+   return this_cpu(irq_regs);
+}
+
 static inline struct cpu_user_regs *set_irq_regs(struct cpu_user_regs *new_regs)
+{
+   struct cpu_user_regs *old_regs, **pp_regs = &this_cpu(irq_regs);
+
+   old_regs = *pp_regs;
+   *pp_regs = new_regs;
+   return old_regs;
+}
+
 struct domain;
 struct vcpu;
 




[PATCH 1/8] keyhandler: don't pass cpu_user_regs around

2024-01-10 Thread Jan Beulich
There are exactly two handlers which care about the registers. Have
handle_keypress() make the pointer available via a per-CPU variable,
thus eliminating the need to pass it to all IRQ key handlers, making
sure that a console-invoked key's handling can still nest inside a
sysctl-invoked one's.

Signed-off-by: Jan Beulich 
---
Subsequently we may want to eliminate the fn/irq_fn union as well,
along with dropping the now redundant irq_keyhandler_fn_t.
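
A minimal sketch of the stashing scheme described above, as single-CPU C (a plain static stands in for the per-CPU keypress_regs; `debug_key_fn` is a hypothetical regs-needing handler, not Xen's):

```c
#include <stddef.h>

struct cpu_user_regs { int dummy; };

/* Single-CPU stand-in for the per-CPU keypress_regs stash. */
static struct cpu_user_regs *keypress_regs;

static struct cpu_user_regs *observed;

/* An IRQ key handler: no regs parameter; it reads the stash when needed. */
static void debug_key_fn(unsigned char key)
{
    (void)key;
    observed = keypress_regs;
}

/* Sketch of handle_keypress(): save/restore so invocations can nest. */
static void handle_keypress(unsigned char key, struct cpu_user_regs *regs)
{
    struct cpu_user_regs *old = keypress_regs;

    keypress_regs = regs;
    debug_key_fn(key);
    keypress_regs = old;
}
```

Saving and restoring the old value, rather than simply clearing it, is what lets a console-invoked key's handling nest inside a sysctl-invoked one's without losing the outer context.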

--- a/xen/common/keyhandler.c
+++ b/xen/common/keyhandler.c
@@ -80,6 +80,7 @@ static void cf_check keypress_action(voi
 }
 
 static DECLARE_TASKLET(keypress_tasklet, keypress_action, NULL);
+static DEFINE_PER_CPU(struct cpu_user_regs *, keypress_regs);
 
 void handle_keypress(unsigned char key, struct cpu_user_regs *regs)
 {
@@ -91,7 +92,16 @@ void handle_keypress(unsigned char key,
 if ( !in_irq() || h->irq_callback )
 {
 console_start_log_everything();
-h->irq_callback ? h->irq_fn(key, regs) : h->fn(key);
+if ( h->irq_callback )
+{
+struct cpu_user_regs *old = this_cpu(keypress_regs);
+
+this_cpu(keypress_regs) = regs;
+h->irq_fn(key);
+this_cpu(keypress_regs) = old;
+}
+else
+h->fn(key);
 console_end_log_everything();
 }
 else
@@ -171,8 +181,7 @@ void cf_check dump_execstate(struct cpu_
 watchdog_enable();
 }
 
-static void cf_check dump_registers(
-unsigned char key, struct cpu_user_regs *regs)
+static void cf_check dump_registers(unsigned char key)
 {
 unsigned int cpu;
 
@@ -185,8 +194,8 @@ static void cf_check dump_registers(
 cpumask_copy(_execstate_mask, _online_map);
 
 /* Get local execution state out immediately, in case we get stuck. */
-if ( regs )
-dump_execstate(regs);
+if ( this_cpu(keypress_regs) )
+dump_execstate(this_cpu(keypress_regs));
 else
 run_in_exception_handler(dump_execstate);
 
@@ -248,8 +257,7 @@ static void cf_check dump_hwdom_register
 }
 }
 
-static void cf_check reboot_machine(
-unsigned char key, struct cpu_user_regs *regs)
+static void cf_check reboot_machine(unsigned char key)
 {
 printk("'%c' pressed -> rebooting machine\n", key);
 machine_restart(0);
@@ -477,8 +485,7 @@ static void cf_check run_all_nonirq_keyh
 static DECLARE_TASKLET(run_all_keyhandlers_tasklet,
run_all_nonirq_keyhandlers, NULL);
 
-static void cf_check run_all_keyhandlers(
-unsigned char key, struct cpu_user_regs *regs)
+static void cf_check run_all_keyhandlers(unsigned char key)
 {
 struct keyhandler *h;
 unsigned int k;
@@ -494,7 +501,7 @@ static void cf_check run_all_keyhandlers
 if ( !h->irq_fn || !h->diagnostic || !h->irq_callback )
 continue;
 printk("[%c: %s]\n", k, h->desc);
-h->irq_fn(k, regs);
+h->irq_fn(k);
 }
 
 watchdog_enable();
@@ -511,17 +518,16 @@ static void cf_check do_debugger_trap_fa
 barrier();
 }
 
-static void cf_check do_debug_key(unsigned char key, struct cpu_user_regs *regs)
+static void cf_check do_debug_key(unsigned char key)
 {
 printk("'%c' pressed -> trapping into debugger\n", key);
-if ( regs )
-do_debugger_trap_fatal(regs);
+if ( this_cpu(keypress_regs) )
+do_debugger_trap_fatal(this_cpu(keypress_regs));
 else
 run_in_exception_handler(do_debugger_trap_fatal);
 }
 
-static void cf_check do_toggle_alt_key(
-unsigned char key, struct cpu_user_regs *regs)
+static void cf_check do_toggle_alt_key(unsigned char key)
 {
 alt_key_handling = !alt_key_handling;
 printk("'%c' pressed -> using %s key handling\n", key,
--- a/xen/drivers/char/console.c
+++ b/xen/drivers/char/console.c
@@ -280,7 +280,7 @@ static int *__read_mostly upper_thresh_a
 static int *__read_mostly lower_thresh_adj = _lower_thresh;
 static const char *__read_mostly thresh_adj = "standard";
 
-static void cf_check do_toggle_guest(unsigned char key, struct cpu_user_regs *regs)
+static void cf_check do_toggle_guest(unsigned char key)
 {
 if ( upper_thresh_adj == &xenlog_upper_thresh )
 {
@@ -307,13 +307,13 @@ static void do_adj_thresh(unsigned char
loglvl_str(*upper_thresh_adj));
 }
 
-static void cf_check do_inc_thresh(unsigned char key, struct cpu_user_regs *regs)
+static void cf_check do_inc_thresh(unsigned char key)
 {
 ++*lower_thresh_adj;
 do_adj_thresh(key);
 }
 
-static void cf_check do_dec_thresh(unsigned char key, struct cpu_user_regs *regs)
+static void cf_check do_dec_thresh(unsigned char key)
 {
 if ( *lower_thresh_adj )
 --*lower_thresh_adj;
--- a/xen/include/xen/keyhandler.h
+++ b/xen/include/xen/keyhandler.h
@@ -24,9 +24,7 @@ typedef void (keyhandler_fn_t)(unsigned
  *
  * Called in hardirq context with interrupts disabled.
  */
-struct cpu_user_regs;
-typedef void (irq_keyhandler_fn_t)(unsigned char key,
-   struct cpu_user_regs *regs);

[PATCH 0/8] limit passing around of cpu_user_regs

2024-01-10 Thread Jan Beulich
Unlike (synchronous) exception handlers, interrupt handlers don't normally
have a need to know the outer context's register state. Similarly, the vast
majority of key handlers has no need for such.

1: keyhandler: don't pass cpu_user_regs around
2: IRQ: generalize [gs]et_irq_regs()
3: serial: drop serial_rx_fn's regs parameter
4: PV-shim: drop pv_console_rx()'s regs parameter
5: serial: drop serial_[rt]x_interrupt()'s regs parameter
6: IRQ: drop regs parameter from handler functions
7: x86/vPMU: drop regs parameter from interrupt functions
8: x86/APIC: drop regs parameter from direct vector handler functions

Jan



Re: [PATCH 0/8] limit passing around of cpu_user_regs

2024-01-10 Thread Jan Beulich
On 11.01.2024 08:21, Jan Beulich wrote:
> Unlike (synchronous) exception handlers, interrupt handlers don't normally
> have a need to know the outer context's register state. Similarly, the vast
> majority of key handlers has no need for such.
> 
> 1: keyhandler: don't pass cpu_user_regs around
> 2: IRQ: generalize [gs]et_irq_regs()
> 3: serial: drop serial_rx_fn's regs parameter
> 4: PV-shim: drop pv_console_rx()'s regs parameter
> 5: serial: drop serial_[rt]x_interrupt()'s regs parameter
> 6: IRQ: drop regs parameter from handler functions
> 7: x86/vPMU: drop regs parameter from interrupt functions
> 8: x86/APIC: drop regs parameter from direct vector handler functions

I'm sorry, I need to start over - somehow ordering got confused in the
reply-to-s sent so far.

Jan




[PATCH 4/8] PV-shim: drop pv_console_rx()'s regs parameter

2024-01-10 Thread Jan Beulich
It's not needed anymore. This is in preparation of dropping the register
parameters from IRQ handler functions.

Signed-off-by: Jan Beulich 

--- a/xen/arch/x86/guest/xen/xen.c
+++ b/xen/arch/x86/guest/xen/xen.c
@@ -181,7 +181,7 @@ static void cf_check xen_evtchn_upcall(s
 port += l1 * BITS_PER_LONG;
 
 if ( pv_console && port == pv_console_evtchn() )
-pv_console_rx(regs);
+pv_console_rx();
 else if ( pv_shim )
 pv_shim_inject_evtchn(port);
 }
--- a/xen/drivers/char/xen_pv_console.c
+++ b/xen/drivers/char/xen_pv_console.c
@@ -94,7 +94,7 @@ evtchn_port_t pv_console_evtchn(void)
 return cons_evtchn;
 }
 
-size_t pv_console_rx(struct cpu_user_regs *regs)
+size_t pv_console_rx(void)
 {
 char c;
 XENCONS_RING_IDX cons, prod;
--- a/xen/include/xen/pv_console.h
+++ b/xen/include/xen/pv_console.h
@@ -9,7 +9,7 @@ void pv_console_init(void);
 void pv_console_set_rx_handler(serial_rx_fn fn);
 void pv_console_init_postirq(void);
 void pv_console_puts(const char *buf, size_t nr);
-size_t pv_console_rx(struct cpu_user_regs *regs);
+size_t pv_console_rx(void);
 evtchn_port_t pv_console_evtchn(void);
 
 #else
@@ -18,7 +18,7 @@ static inline void pv_console_init(void)
 static inline void pv_console_set_rx_handler(serial_rx_fn fn) { }
 static inline void pv_console_init_postirq(void) { }
 static inline void pv_console_puts(const char *buf, size_t nr) { }
-static inline size_t pv_console_rx(struct cpu_user_regs *regs) { return 0; }
+static inline size_t pv_console_rx(void) { return 0; }
 
 #endif /* !CONFIG_XEN_GUEST */
 #endif /* __XEN_PV_CONSOLE_H__ */




[PATCH 3/8] serial: drop serial_rx_fn's regs parameter

2024-01-10 Thread Jan Beulich
In the one place where it's needed, get_irq_regs() can be used instead.
This is in preparation of dropping the register parameters from IRQ
handler functions.

Signed-off-by: Jan Beulich 

--- a/xen/drivers/char/console.c
+++ b/xen/drivers/char/console.c
@@ -527,12 +527,12 @@ static void switch_serial_input(void)
 printk("\n");
 }
 
-static void __serial_rx(char c, struct cpu_user_regs *regs)
+static void __serial_rx(char c)
 {
 switch ( console_rx )
 {
 case 0:
-return handle_keypress(c, regs);
+return handle_keypress(c, get_irq_regs());
 
 case 1:
 /*
@@ -579,7 +579,7 @@ static void __serial_rx(char c, struct c
 #endif
 }
 
-static void cf_check serial_rx(char c, struct cpu_user_regs *regs)
+static void cf_check serial_rx(char c)
 {
 static int switch_code_count = 0;
 
@@ -595,10 +595,10 @@ static void cf_check serial_rx(char c, s
 }
 
 for ( ; switch_code_count != 0; switch_code_count-- )
-__serial_rx(switch_code, regs);
+__serial_rx(switch_code);
 
 /* Finally process the just-received character. */
-__serial_rx(c, regs);
+__serial_rx(c);
 }
 
 static void cf_check notify_dom0_con_ring(void *unused)
--- a/xen/drivers/char/serial.c
+++ b/xen/drivers/char/serial.c
@@ -68,7 +68,7 @@ void serial_rx_interrupt(struct serial_p
 spin_unlock_irqrestore(&port->rx_lock, flags);
 
 if ( fn != NULL )
-(*fn)(c & 0x7f, regs);
+fn(c & 0x7f);
 }
 
 void serial_tx_interrupt(struct serial_port *port, struct cpu_user_regs *regs)
--- a/xen/drivers/char/xen_pv_console.c
+++ b/xen/drivers/char/xen_pv_console.c
@@ -118,7 +118,7 @@ size_t pv_console_rx(struct cpu_user_reg
 {
 c = cons_ring->in[MASK_XENCONS_IDX(cons++, cons_ring->in)];
 if ( cons_rx_handler )
-cons_rx_handler(c, regs);
+cons_rx_handler(c);
 recv++;
 }
 
--- a/xen/include/xen/serial.h
+++ b/xen/include/xen/serial.h
@@ -15,7 +15,7 @@
 struct cpu_user_regs;
 
 /* Register a character-receive hook on the specified COM port. */
-typedef void (*serial_rx_fn)(char c, struct cpu_user_regs *regs);
+typedef void (*serial_rx_fn)(char c);
 void serial_set_rx_handler(int handle, serial_rx_fn fn);
 
 /* Number of characters we buffer for a polling receiver. */




[PATCH 2/8] IRQ: drop register parameter from handler functions

2024-01-10 Thread Jan Beulich
It's simply not needed anymore. Note how Linux made this change many
years ago already, in 2.6.19 (late 2006, see [1]).

Signed-off-by: Jan Beulich 

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=7d12e780e003f93433d49ce78cfedf4b4c52adc5

--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -397,7 +397,7 @@ void gic_interrupt(struct cpu_user_regs
 } while (1);
 }
 
-static void maintenance_interrupt(int irq, void *dev_id, struct cpu_user_regs *regs)
+static void maintenance_interrupt(int irq, void *dev_id)
 {
 /*
  * This is a dummy interrupt handler.
--- a/xen/arch/arm/irq.c
+++ b/xen/arch/arm/irq.c
@@ -182,8 +182,7 @@ void irq_set_affinity(struct irq_desc *d
 }
 
 int request_irq(unsigned int irq, unsigned int irqflags,
-void (*handler)(int irq, void *dev_id,
-struct cpu_user_regs *regs),
+void (*handler)(int irq, void *dev_id),
 const char *devname, void *dev_id)
 {
 struct irqaction *action;
@@ -276,7 +275,7 @@ void do_IRQ(struct cpu_user_regs *regs,
 
 do
 {
-action->handler(irq, action->dev_id, regs);
+action->handler(irq, action->dev_id);
 action = action->next;
 } while ( action );
 
--- a/xen/arch/arm/time.c
+++ b/xen/arch/arm/time.c
@@ -241,7 +241,7 @@ int reprogram_timer(s_time_t timeout)
 }
 
 /* Handle the firing timer */
-static void htimer_interrupt(int irq, void *dev_id, struct cpu_user_regs *regs)
+static void htimer_interrupt(int irq, void *dev_id)
 {
 if ( unlikely(!(READ_SYSREG(CNTHP_CTL_EL2) & CNTx_CTL_PENDING)) )
 return;
@@ -255,7 +255,7 @@ static void htimer_interrupt(int irq, vo
 WRITE_SYSREG(0, CNTHP_CTL_EL2);
 }
 
-static void vtimer_interrupt(int irq, void *dev_id, struct cpu_user_regs *regs)
+static void vtimer_interrupt(int irq, void *dev_id)
 {
 /*
  * Edge-triggered interrupts can be used for the virtual timer. Even
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -962,7 +962,7 @@ static int __init cf_check irq_ratelimit
 __initcall(irq_ratelimit_init);
 
 int __init request_irq(unsigned int irq, unsigned int irqflags,
-void (*handler)(int irq, void *dev_id, struct cpu_user_regs *regs),
+void (*handler)(int irq, void *dev_id),
 const char * devname, void *dev_id)
 {
 struct irqaction * action;
@@ -2009,7 +2009,7 @@ void do_IRQ(struct cpu_user_regs *regs)
 spin_unlock_irq(&desc->lock);
 
 tsc_in = tb_init_done ? get_cycles() : 0;
-action->handler(irq, action->dev_id, regs);
+action->handler(irq, action->dev_id);
 TRACE_3D(TRC_HW_IRQ_HANDLED, irq, tsc_in, get_cycles());
 
 spin_lock_irq(&desc->lock);
--- a/xen/arch/x86/hpet.c
+++ b/xen/arch/x86/hpet.c
@@ -237,8 +237,7 @@ again:
 }
 }
 
-static void cf_check hpet_interrupt_handler(
-int irq, void *data, struct cpu_user_regs *regs)
+static void cf_check hpet_interrupt_handler(int irq, void *data)
 {
 struct hpet_event_channel *ch = data;
 
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -198,8 +198,7 @@ static void smp_send_timer_broadcast_ipi
 }
 }
 
-static void cf_check timer_interrupt(
-int irq, void *dev_id, struct cpu_user_regs *regs)
+static void cf_check timer_interrupt(int irq, void *dev_id)
 {
 ASSERT(local_irq_is_enabled());
 
--- a/xen/common/irq.c
+++ b/xen/common/irq.c
@@ -29,7 +29,7 @@ int init_one_irq_desc(struct irq_desc *d
 return err;
 }
 
-void cf_check no_action(int cpl, void *dev_id, struct cpu_user_regs *regs)
+void cf_check no_action(int cpl, void *dev_id)
 {
 }
 
--- a/xen/drivers/char/cadence-uart.c
+++ b/xen/drivers/char/cadence-uart.c
@@ -40,7 +40,7 @@ static struct cuart {
 #define cuart_read(uart, off)   readl((uart)->regs + (off))
 #define cuart_write(uart, off,val)  writel((val), (uart)->regs + (off))
 
-static void cuart_interrupt(int irq, void *data, struct cpu_user_regs *regs)
+static void cuart_interrupt(int irq, void *data)
 {
 struct serial_port *port = data;
 struct cuart *uart = port->uart;
--- a/xen/drivers/char/exynos4210-uart.c
+++ b/xen/drivers/char/exynos4210-uart.c
@@ -45,7 +45,7 @@ static struct exynos4210_uart {
 #define exynos4210_read(uart, off)  readl((uart)->regs + off)
 #define exynos4210_write(uart, off, val)writel(val, (uart->regs) + off)
 
-static void exynos4210_uart_interrupt(int irq, void *data, struct cpu_user_regs *regs)
+static void exynos4210_uart_interrupt(int irq, void *data)
 {
 struct serial_port *port = data;
 struct exynos4210_uart *uart = port->uart;
--- a/xen/drivers/char/imx-lpuart.c
+++ b/xen/drivers/char/imx-lpuart.c
@@ -37,8 +37,7 @@ static struct imx_lpuart {
 struct vuart_info vuart;
 } imx8_com;
 
-static void imx_lpuart_interrupt(int irq, void *data,
- struct cpu_user_regs *regs)
+static void imx_lpuart_interrupt(int irq, void *data)
 {
 struct serial_port *port = data;
 struct 

[PATCH 1/8] limit passing around of cpu_user_regs

2024-01-10 Thread Jan Beulich
There are exactly two handlers which care about the registers. Have
handle_keypress() make the pointer available via a per-CPU variable,
thus eliminating the need to pass it to all IRQ key handlers, making
sure that a console-invoked key's handling can still nest inside a
sysctl-invoked one's.

Signed-off-by: Jan Beulich 
---
Subsequently we may want to eliminate the fn/irq_fn union as well,
along with dropping the now redundant irq_keyhandler_fn_t.

--- a/xen/common/keyhandler.c
+++ b/xen/common/keyhandler.c
@@ -80,6 +80,7 @@ static void cf_check keypress_action(voi
 }
 
 static DECLARE_TASKLET(keypress_tasklet, keypress_action, NULL);
+static DEFINE_PER_CPU(struct cpu_user_regs *, keypress_regs);
 
 void handle_keypress(unsigned char key, struct cpu_user_regs *regs)
 {
@@ -91,7 +92,16 @@ void handle_keypress(unsigned char key,
 if ( !in_irq() || h->irq_callback )
 {
 console_start_log_everything();
-h->irq_callback ? h->irq_fn(key, regs) : h->fn(key);
+if ( h->irq_callback )
+{
+struct cpu_user_regs *old = this_cpu(keypress_regs);
+
+this_cpu(keypress_regs) = regs;
+h->irq_fn(key);
+this_cpu(keypress_regs) = old;
+}
+else
+h->fn(key);
 console_end_log_everything();
 }
 else
@@ -171,8 +181,7 @@ void cf_check dump_execstate(struct cpu_
 watchdog_enable();
 }
 
-static void cf_check dump_registers(
-unsigned char key, struct cpu_user_regs *regs)
+static void cf_check dump_registers(unsigned char key)
 {
 unsigned int cpu;
 
@@ -185,8 +194,8 @@ static void cf_check dump_registers(
 cpumask_copy(&dump_execstate_mask, &cpu_online_map);
 
 /* Get local execution state out immediately, in case we get stuck. */
-if ( regs )
-dump_execstate(regs);
+if ( this_cpu(keypress_regs) )
+dump_execstate(this_cpu(keypress_regs));
 else
 run_in_exception_handler(dump_execstate);
 
@@ -248,8 +257,7 @@ static void cf_check dump_hwdom_register
 }
 }
 
-static void cf_check reboot_machine(
-unsigned char key, struct cpu_user_regs *regs)
+static void cf_check reboot_machine(unsigned char key)
 {
 printk("'%c' pressed -> rebooting machine\n", key);
 machine_restart(0);
@@ -477,8 +485,7 @@ static void cf_check run_all_nonirq_keyh
 static DECLARE_TASKLET(run_all_keyhandlers_tasklet,
run_all_nonirq_keyhandlers, NULL);
 
-static void cf_check run_all_keyhandlers(
-unsigned char key, struct cpu_user_regs *regs)
+static void cf_check run_all_keyhandlers(unsigned char key)
 {
 struct keyhandler *h;
 unsigned int k;
@@ -494,7 +501,7 @@ static void cf_check run_all_keyhandlers
 if ( !h->irq_fn || !h->diagnostic || !h->irq_callback )
 continue;
 printk("[%c: %s]\n", k, h->desc);
-h->irq_fn(k, regs);
+h->irq_fn(k);
 }
 
 watchdog_enable();
@@ -511,17 +518,16 @@ static void cf_check do_debugger_trap_fa
 barrier();
 }
 
-static void cf_check do_debug_key(unsigned char key, struct cpu_user_regs *regs)
+static void cf_check do_debug_key(unsigned char key)
 {
 printk("'%c' pressed -> trapping into debugger\n", key);
-if ( regs )
-do_debugger_trap_fatal(regs);
+if ( this_cpu(keypress_regs) )
+do_debugger_trap_fatal(this_cpu(keypress_regs));
 else
 run_in_exception_handler(do_debugger_trap_fatal);
 }
 
-static void cf_check do_toggle_alt_key(
-unsigned char key, struct cpu_user_regs *regs)
+static void cf_check do_toggle_alt_key(unsigned char key)
 {
 alt_key_handling = !alt_key_handling;
 printk("'%c' pressed -> using %s key handling\n", key,
--- a/xen/drivers/char/console.c
+++ b/xen/drivers/char/console.c
@@ -280,7 +280,7 @@ static int *__read_mostly upper_thresh_a
 static int *__read_mostly lower_thresh_adj = &xenlog_lower_thresh;
 static const char *__read_mostly thresh_adj = "standard";
 
-static void cf_check do_toggle_guest(unsigned char key, struct cpu_user_regs *regs)
+static void cf_check do_toggle_guest(unsigned char key)
 {
 if ( upper_thresh_adj == &xenlog_upper_thresh )
 {
@@ -307,13 +307,13 @@ static void do_adj_thresh(unsigned char
loglvl_str(*upper_thresh_adj));
 }
 
-static void cf_check do_inc_thresh(unsigned char key, struct cpu_user_regs *regs)
+static void cf_check do_inc_thresh(unsigned char key)
 {
 ++*lower_thresh_adj;
 do_adj_thresh(key);
 }
 
-static void cf_check do_dec_thresh(unsigned char key, struct cpu_user_regs *regs)
+static void cf_check do_dec_thresh(unsigned char key)
 {
 if ( *lower_thresh_adj )
 --*lower_thresh_adj;
--- a/xen/include/xen/keyhandler.h
+++ b/xen/include/xen/keyhandler.h
@@ -24,9 +24,7 @@ typedef void (keyhandler_fn_t)(unsigned
  *
  * Called in hardirq context with interrupts disabled.
  */
-struct cpu_user_regs;
-typedef void (irq_keyhandler_fn_t)(unsigned char key,
-   struct cpu_user_regs *regs);

[PATCH 0/8] limit passing around of cpu_user_regs

2024-01-10 Thread Jan Beulich
Unlike (synchronous) exception handlers, interrupt handlers don't normally
have a need to know the outer context's register state. Similarly, the vast
majority of key handlers has no need for such.

1: keyhandler: don't pass cpu_user_regs around
2: IRQ: generalize [gs]et_irq_regs()
3: serial: drop serial_rx_fn's regs parameter
4: PV-shim: drop pv_console_rx()'s regs parameter
5: serial: drop serial_[rt]x_interrupt()'s regs parameter
6: IRQ: drop regs parameter from handler functions
7: x86/vPMU: drop regs parameter from interrupt functions
8: x86/APIC: drop regs parameter from direct vector handler functions

Jan



[linux-5.4 test] 184306: regressions - FAIL

2024-01-10 Thread osstest service owner
flight 184306 linux-5.4 real [real]
flight 184317 linux-5.4 real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/184306/
http://logs.test-lab.xenproject.org/osstest/logs/184317/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-armhf-armhf-xl-multivcpu 18 guest-start/debian.repeat fail in 184298 
REGR. vs. 184192

Tests which are failing intermittently (not blocking):
 test-armhf-armhf-xl-multivcpu 14 guest-start   fail pass in 184298
 test-armhf-armhf-xl  18 guest-start/debian.repeat  fail pass in 184298
 test-amd64-amd64-xl-qcow212 debian-di-install  fail pass in 184298

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-xl-rtds18 guest-start/debian.repeat fail REGR. vs. 184192

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail blocked in 
184192
 test-armhf-armhf-xl-credit1 18 guest-start/debian.repeat fail in 184298 
blocked in 184192
 test-armhf-armhf-xl-credit1 15 migrate-support-check fail in 184298 never pass
 test-armhf-armhf-xl-credit1 16 saverestore-support-check fail in 184298 never 
pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check fail in 184298 never 
pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check fail in 184298 
never pass
 test-armhf-armhf-xl-credit1  14 guest-start  fail  like 184192
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 184192
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 184192
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 184192
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 184192
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 184192
 test-armhf-armhf-xl-credit2  18 guest-start/debian.repeatfail  like 184192
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 184192
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 184192
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 184192
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 184192
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 184192
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 184192
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check   

E820 memory allocation issue on Threadripper platforms

2024-01-10 Thread Patrick Plenefisch
Hi,

I ran into a memory allocation issue, I think. It is the same as
https://github.com/QubesOS/qubes-issues/issues/8791 and I saw at the end it
was recommended (by marmarek) that the issue reporter forward the issue to
this list. I searched the list, but as I didn't see it in the list already,
I'm doing that now.

Hardware:
I have an AMD Threadripper 7960X on a ASRock TRX50 WS motherboard. The
Qubes reporter had a Threadripper 3970X on an ASUS Prime TRX40-Pro
Motherboard. I saw a 3rd issue report of a similar issue on another
Threadripper, so I think this may be Threadripper-specific.

Setup:
The QubesOS reporter was using the Qubes installer.
My install was that I had a fresh install of Debian 12 (no gui), and then
did `apt install xen-system-amd64` and rebooted.

The issue:
Any boot of Xen on the hardware results in a halted machine. When
monitoring the logs with `vga=,keep`, we get:

(XEN) *** Serial input to DOM0 (type 'CTRL-a' three times to switch input)
(XEN) Freed 644kB init memory
mapping kernel into physical memory
about to get started…
xen hypervisor allocated kernel memory conflicts with E820
(XEN) Hardware Dom0 halted: halting machine

None of the settings I or the Qubes reporter have tried have been able to
get past this failure.

I am happy to provide debugging support.

Patrick


[ovmf test] 184315: all pass - PUSHED

2024-01-10 Thread osstest service owner
flight 184315 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/184315/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf 889535caf8869e3d4818b75f95f2fc910c84a4d2
baseline version:
 ovmf e7cfdc5f14b408e85fcbcb335aae7c15bbce4dfb

Last test of basis   184310  2024-01-10 16:11:02 Z0 days
Testing same since   184315  2024-01-11 03:12:57 Z0 days1 attempts


People who touched revisions under test:
  Junfeng Guan 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   e7cfdc5f14..889535caf8  889535caf8869e3d4818b75f95f2fc910c84a4d2 -> 
xen-tested-master



[linux-linus test] 184303: regressions - FAIL

2024-01-10 Thread osstest service owner
flight 184303 linux-linus real [real]
flight 184313 linux-linus real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/184303/
http://logs.test-lab.xenproject.org/osstest/logs/184313/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-arm64-arm64-xl-credit1  12 debian-install   fail REGR. vs. 184270
 test-arm64-arm64-xl  12 debian-install   fail REGR. vs. 184270
 test-arm64-arm64-xl-thunderx 12 debian-install   fail REGR. vs. 184270
 test-arm64-arm64-xl-credit2  12 debian-install   fail REGR. vs. 184270
 test-arm64-arm64-xl-xsm  12 debian-install   fail REGR. vs. 184270
 test-arm64-arm64-libvirt-xsm 12 debian-install   fail REGR. vs. 184270

Tests which are failing intermittently (not blocking):
 test-armhf-armhf-libvirt-raw  8 xen-bootfail pass in 184313-retest

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check fail in 184313 like 
184270
 test-armhf-armhf-libvirt-raw 14 migrate-support-check fail in 184313 never pass
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 184270
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 184270
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 184270
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 184270
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 184270
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 184270
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 184270
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass

version targeted for testing:
 linux  ab27740f76654ed58dd32ac0ba0031c18a6dea3b
baseline version:
 linux  0dd3ee31125508cd67f7e7172247f05b7fd1753a

Last test of basis   184270  2024-01-07 20:42:19 Z3 days
Failing since184283  2024-01-08 20:10:43 Z2 days4 attempts
Testing same since   184303  2024-01-10 07:18:09 Z0 days1 attempts


433 people touched revisions under test,
not listing them all

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-arm64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt 

Re: [RFC XEN PATCH v4 4/5] domctl: Use gsi to grant/revoke irq permission

2024-01-10 Thread Chen, Jiqian
On 2024/1/11 04:09, Stewart Hildebrand wrote:
> On 1/5/24 02:09, Jiqian Chen wrote:
>> diff --git a/xen/common/domctl.c b/xen/common/domctl.c
>> index f5a71ee5f78d..eeb975bd0194 100644
>> --- a/xen/common/domctl.c
>> +++ b/xen/common/domctl.c
>> @@ -653,12 +653,20 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
>>  unsigned int pirq = op->u.irq_permission.pirq, irq;
>>  int allow = op->u.irq_permission.allow_access;
>>  
>> -if ( pirq >= current->domain->nr_pirqs )
>> +if ( pirq >= nr_irqs_gsi )
> 
> This doesn't build on ARM, as nr_irqs_gsi is x86 only. This is a wild guess: 
> we may want to keep the existing current->domain->nr_pirqs check, then add the 
> new nr_irqs_gsi check wrapped in #ifdef CONFIG_X86.
As I will add a new hypercall in the next version, this change will no longer
exist. Thank you.

> 
>>  {
>>  ret = -EINVAL;
>>  break;
>>  }
>> -irq = pirq_access_permitted(current->domain, pirq);
>> +
>> +if ( irq_access_permitted(current->domain, pirq) )
>> +irq = pirq;
>> +else
>> +{
>> +ret = -EPERM;
>> +break;
>> +}
>> +
>>  if ( !irq || xsm_irq_permission(XSM_HOOK, d, irq, allow) )
>>  ret = -EPERM;
>>  else if ( allow )

-- 
Best regards,
Jiqian Chen.


Re: [RFC XEN PATCH v4 1/5] xen/vpci: Clear all vpci status of device

2024-01-10 Thread Chen, Jiqian
On 2024/1/10 22:09, Stewart Hildebrand wrote:
> On 1/10/24 01:24, Chen, Jiqian wrote:
>> On 2024/1/9 23:24, Stewart Hildebrand wrote:
>>> On 1/5/24 02:09, Jiqian Chen wrote:
 diff --git a/xen/drivers/pci/physdev.c b/xen/drivers/pci/physdev.c
 index 42db3e6d133c..552ccbf747cb 100644
 --- a/xen/drivers/pci/physdev.c
 +++ b/xen/drivers/pci/physdev.c
 @@ -67,6 +68,39 @@ ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
  break;
  }
  
 +case PHYSDEVOP_pci_device_state_reset: {
 +struct physdev_pci_device dev;
 +struct pci_dev *pdev;
 +pci_sbdf_t sbdf;
 +
 +if ( !is_pci_passthrough_enabled() )
 +return -EOPNOTSUPP;
 +
 +ret = -EFAULT;
 +if ( copy_from_guest(&dev, arg, 1) != 0 )
 +break;
 +sbdf = PCI_SBDF(dev.seg, dev.bus, dev.devfn);
 +
 +ret = xsm_resource_setup_pci(XSM_PRIV, sbdf.sbdf);
 +if ( ret )
 +break;
 +
 +pcidevs_lock();
 +pdev = pci_get_pdev(NULL, sbdf);
 +if ( !pdev )
 +{
 +pcidevs_unlock();
 +ret = -ENODEV;
 +break;
 +}
 +
>>>
>>> write_lock(&pdev->domain->pci_lock);
>>>
 +ret = vpci_reset_device_state(pdev);
>>>
>>> write_unlock(&pdev->domain->pci_lock);
>> vpci_reset_device_state only reset the vpci state of pdev without deleting 
>> pdev from domain, and here has held pcidevs_lock, it has no need to lock 
>> pci_lock?
> 
> Strictly speaking, it is not enforced yet. However, an upcoming change [1] 
> will expand the scope of d->pci_lock to include protecting the pdev->vpci 
> structure to an extent, so it will be required once that change is committed. 
> In my opinion there is no harm in adding the additional lock now. If you 
> prefer to wait I would not object, but in this case I would at least ask for 
> a TODO comment to help remind us to address it later.
> 
> [1] https://lists.xenproject.org/archives/html/xen-devel/2024-01/msg00446.html
> 
Ok, I see. I will add pci_lock in next version, thank you for reminding me.

>>
>>>
 +pcidevs_unlock();
 +if ( ret )
 +printk(XENLOG_ERR "%pp: failed to reset PCI device state\n", &sbdf);
 +break;
 +}
 +
  default:
  ret = -ENOSYS;
  break;
 diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
 index 72ef277c4f8e..3c64cb10ccbb 100644
 --- a/xen/drivers/vpci/vpci.c
 +++ b/xen/drivers/vpci/vpci.c
 @@ -107,6 +107,15 @@ int vpci_add_handlers(struct pci_dev *pdev)
  
  return rc;
  }
 +
 +int vpci_reset_device_state(struct pci_dev *pdev)
 +{
 +ASSERT(pcidevs_locked());
>>>
>>> ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
>>>
 +
 +vpci_remove_device(pdev);
 +return vpci_add_handlers(pdev);
 +}
 +
  #endif /* __XEN__ */
  
  static int vpci_register_cmp(const struct vpci_register *r1,
>>

-- 
Best regards,
Jiqian Chen.


[libvirt test] 184301: tolerable FAIL - PUSHED

2024-01-10 Thread osstest service owner
flight 184301 libvirt real [real]
flight 184312 libvirt real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/184301/
http://logs.test-lab.xenproject.org/osstest/logs/184312/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-armhf-armhf-libvirt-qcow2 13 guest-start   fail pass in 184312-retest

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail in 184312 
like 184289
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-check fail in 184312 never 
pass
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 184289
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 184289
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-arm64-arm64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-arm64-arm64-libvirt-qcow2 15 saverestore-support-checkfail never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass

version targeted for testing:
 libvirt  7cb03e6a28e465c49f0cabe8fe2e7d21edb5aadf
baseline version:
 libvirt  377e30087a8379e7cdede95bd3b95d2e35cec84a

Last test of basis   184289  2024-01-09 04:18:58 Z1 days
Testing same since   184301  2024-01-10 04:20:43 Z0 days1 attempts


People who touched revisions under test:
  Andrea Bolognani 
  Daniel P. Berrangé 
  Jiri Denemark 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-arm64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-arm64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm   pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsmpass
 test-amd64-amd64-libvirt-xsm pass
 test-arm64-arm64-libvirt-xsm pass
 test-amd64-i386-libvirt-xsm  pass
 test-amd64-amd64-libvirt pass
 test-arm64-arm64-libvirt pass
 test-armhf-armhf-libvirt pass
 test-amd64-i386-libvirt  pass
 test-amd64-amd64-libvirt-pairpass
 test-amd64-i386-libvirt-pair pass
 test-arm64-arm64-libvirt-qcow2   pass
 test-armhf-armhf-libvirt-qcow2   fail
 test-arm64-arm64-libvirt-raw pass
 test-armhf-armhf-libvirt-raw pass
 test-amd64-i386-libvirt-raw  pass
 test-amd64-amd64-libvirt-vhd pass



Re: [PATCH v3 29/33] tools/xenstored: map stubdom interface

2024-01-10 Thread Jason Andryuk
On Thu, Jan 4, 2024 at 4:03 AM Juergen Gross  wrote:
>
> When running as stubdom, map the stubdom's Xenstore ring page in order
> to support using the 9pfs frontend.
>
> Use the same pattern as in dom0_init() when running as daemon in dom0
> (introduce the own domain, then send an event to the client side to
> signal Xenstore is ready to communicate).
>
> Signed-off-by: Juergen Gross 

Reviewed-by: Jason Andryuk 



Re: [PATCH v3 28/33] tools/xenstored: split domain_init()

2024-01-10 Thread Jason Andryuk
On Thu, Jan 4, 2024 at 4:59 AM Juergen Gross  wrote:
>
> Today domain_init() is called either just before calling dom0_init()
> in case no live update is being performed, or it is called after
> reading the global state from read_state_global(), as the event
> channel fd is needed.
>
> Split up domain_init() into a preparation part which can be called
> unconditionally, and in a part setting up the event channel handle.
>
> Note that there is no chance that chk_domain_generation() can be
> called now before xc_handle has been setup, so there is no need for
> the related special case anymore.
>
> Signed-off-by: Juergen Gross 

Reviewed-by: Jason Andryuk 



Re: [PATCH v3 26/33] tools/xenstored: get own domid in stubdom case

2024-01-10 Thread Jason Andryuk
On Thu, Jan 4, 2024 at 4:34 AM Juergen Gross  wrote:
>
> Obtain the own domid when running as stubdom.

Maybe "Obtain own domid when running as stubdom."?

>
> Signed-off-by: Juergen Gross 

Reviewed-by: Jason Andryuk 



Re: [PATCH v3 25/33] tools/xenstored: move all socket handling into posix.c

2024-01-10 Thread Jason Andryuk
On Thu, Jan 4, 2024 at 4:10 AM Juergen Gross  wrote:
>
> All of the socket handling is needed only when running as daemon.
>
> Move it into posix.c, allowing the NO_SOCKETS macro to be removed.
>
> Signed-off-by: Juergen Gross 

Reviewed-by: Jason Andryuk 



Re: [PATCH v3 24/33] tools/xenstored: move all log-pipe handling into posix.c

2024-01-10 Thread Jason Andryuk
On Thu, Jan 4, 2024 at 4:11 AM Juergen Gross  wrote:
>
> All of the log-pipe handling is needed only when running as daemon.
>
> Move it into posix.c. This requires having a service function in the
> main event loop for handling the related requests and one for setting
> the fds[] array. Use a generic name for those functions, as socket
> handling can be added to them later, too.
>
> Signed-off-by: Juergen Gross 

Reviewed-by: Jason Andryuk 



Re: [PATCH v3 23/33] tools/xenstored: move systemd handling to posix.c

2024-01-10 Thread Jason Andryuk
On Thu, Jan 4, 2024 at 4:13 AM Juergen Gross  wrote:
>
> Move systemd handling to a new late_init() function in posix.c.
>
> This prepares a future removal of the NO_SOCKETS macro.
>
> Signed-off-by: Juergen Gross 

Reviewed-by: Jason Andryuk 



Re: [PATCH for-4.18 v1] xen/common: Don't dereference overlay_node after checking that it is NULL

2024-01-10 Thread Vikram Garhwal
Hi Javi,
Thank you for spotting and fixing this.
On Tue, Jan 09, 2024 at 03:31:55PM +, Julien Grall wrote:
> Hi Javi,
> 
> Title: Any reason this is titled for-4.18? Shouldn't this patch also be
> merged in staging?
> 
> On 09/01/2024 14:19, Javi Merino wrote:
> > In remove_nodes(), overlay_node is dereferenced when printing the
> > error message even though it is known to be NULLL.  Fix the error
> 
> Typo: s/NULLL/NULL/
> 
> This can be fixed on commit if there is nothing else.
> 
> > message to avoid dereferencing a NULL pointer.
> > 
> > The semantic patch that spots this code is available in
> > 
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/scripts/coccinelle/null/deref_null.cocci?id=1f874787ed9a2d78ed59cb21d0d90ac0178eceb0
> 
> Good catch and glad to see that coccinelle can work on Xen. I am looking
> forward for more work in that area :).
> 
> > 
> > Fixes: 7e5c4a8b86f1 ("xen/arm: Implement device tree node removal 
> > functionalities")
> > Signed-off-by: Javi Merino 
> > ---
> > CC: Vikram Garhwal 
> > 
> > Vikram, I didn't know what to put in the error message.  Feel free to
> > suggest something more appropriate than "Device not present in the
> > tree".
> 
> More questions for Vikram, looking at the code, it is not 100% clear in
> which condition overlay_node could be NULL. Is this a programming error? if
> so, maybe this should be an ASSERT_UNREACHABLE() (could be added separately)
> and it would be fine to print nothing.
> 
This can happen with failures in the add_nodes() function. An add_nodes() failure
will try to call the remove_nodes() function. Depending on where add_nodes() failed,
nodes_address may or may not be NULL.

We also added a detailed comment on this:
https://github.com/xen-project/xen/blob/5a3ace21f3d779b291a2d305824b2820d88de7f1/xen/common/dt-overlay.c#L816

For now, we can return from here without printing anything, as an error message
will be printed by the caller of remove_nodes() anyway.

> Cheers,
> 
> -- 
> Julien Grall



Re: [PATCH v3 22/33] tools/xenstored: add early_init() function

2024-01-10 Thread Jason Andryuk
On Thu, Jan 4, 2024 at 4:12 AM Juergen Gross  wrote:
>
> Some xenstored initialization needs to be done in the daemon case only,
> so split it out into a new early_init() function, which is a stub in the
> stubdom case.
>
> Remove the call of talloc_enable_leak_report_full(), as it serves no
> real purpose: the daemon only ever exits due to a crash, in which case
> a log of talloc()ed memory hardly has any value.
>
> Signed-off-by: Juergen Gross 

Reviewed-by: Jason Andryuk 



Re: [RFC XEN PATCH v4 4/5] domctl: Use gsi to grant/revoke irq permission

2024-01-10 Thread Stewart Hildebrand
On 1/5/24 02:09, Jiqian Chen wrote:
> diff --git a/xen/common/domctl.c b/xen/common/domctl.c
> index f5a71ee5f78d..eeb975bd0194 100644
> --- a/xen/common/domctl.c
> +++ b/xen/common/domctl.c
> @@ -653,12 +653,20 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) 
> u_domctl)
>  unsigned int pirq = op->u.irq_permission.pirq, irq;
>  int allow = op->u.irq_permission.allow_access;
>  
> -if ( pirq >= current->domain->nr_pirqs )
> +if ( pirq >= nr_irqs_gsi )

This doesn't build on ARM, as nr_irqs_gsi is x86 only. This is a wild guess: we 
may want to keep the existing current->domain->nr_pirqs check, then add the new 
nr_irqs_gsi check wrapped in #ifdef CONFIG_X86.

>  {
>  ret = -EINVAL;
>  break;
>  }
> -irq = pirq_access_permitted(current->domain, pirq);
> +
> +if ( irq_access_permitted(current->domain, pirq) )
> +irq = pirq;
> +else
> +{
> +ret = -EPERM;
> +break;
> +}
> +
>  if ( !irq || xsm_irq_permission(XSM_HOOK, d, irq, allow) )
>  ret = -EPERM;
>  else if ( allow )



Re: [PATCH 1/2] x86/vmx: Fix IRQ handling for EXIT_REASON_INIT

2024-01-10 Thread Andrew Cooper
On 02/11/2023 8:57 am, Jan Beulich wrote:
> On 01.11.2023 20:20, Andrew Cooper wrote:
>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>> @@ -4097,10 +4097,6 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
>>  case EXIT_REASON_MCE_DURING_VMENTRY:
>>  do_machine_check(regs);
>>  break;
>> -
>> -case EXIT_REASON_INIT:
>> -printk(XENLOG_ERR "Error: INIT received - ignoring\n");
>> -return; /* Renter the guest without further processing */
>>  }
> Wouldn't the printk() better remain where it was, and just the "return" be
> purged?

Not really... that would hit the unknown-vmexit path in the second switch.

We actually have a variety of empty cases in the second switch.  We could add
another.

~Andrew



[ovmf test] 184310: all pass - PUSHED

2024-01-10 Thread osstest service owner
flight 184310 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/184310/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf e7cfdc5f14b408e85fcbcb335aae7c15bbce4dfb
baseline version:
 ovmf 7d055812cc7a7d2b062cf56291211e8cecca36ed

Last test of basis   184307  2024-01-10 12:41:03 Z0 days
Testing same since   184310  2024-01-10 16:11:02 Z0 days1 attempts


People who touched revisions under test:
  Hou, Wenxing 
  Joey Vagedes 
  Wenxing Hou 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   7d055812cc..e7cfdc5f14  e7cfdc5f14b408e85fcbcb335aae7c15bbce4dfb -> 
xen-tested-master



Re: [PATCH] x86/nmi: ensure Global Performance Counter Control is setup correctly

2024-01-10 Thread Andrew Cooper
On 10/01/2024 4:58 pm, Roger Pau Monné wrote:
> On Wed, Jan 10, 2024 at 03:52:49PM +, Andrew Cooper wrote:
>> On 10/01/2024 3:34 pm, Roger Pau Monne wrote:
>>> When Architectural Performance Monitoring is available, the PERF_GLOBAL_CTRL
>>> MSR contains per-counter enable bits that are ANDed with the enable bit in 
>>> the
>>> counter EVNTSEL MSR in order for a PMC counter to be enabled.
>>>
>>> So far the watchdog code seems to have relied on the PERF_GLOBAL_CTRL enable
>>> bits being set by default, but at least on some Intel Sapphire and Emerald
>>> Rapids this is no longer the case, and Xen reports:
>>>
>>> Testing NMI watchdog on all CPUs: 0 40 stuck
>>>
>>> The first CPU on each socket is started with PERF_GLOBAL_CTRL zeroed, so 
>>> PMC0
>>> doesn't start counting when the enable bit in EVNTSEL0 is set, due to the
>>> relevant enable bit in PERF_GLOBAL_CTRL not being set.
>>>
>>> Fix by detecting when Architectural Performance Monitoring is available and
>>> making sure the enable bit for PMC0 is set in PERF_GLOBAL_CTRL.
>>>
>>> Signed-off-by: Roger Pau Monné 
>>> ---
>>> The fact that it's only the first CPU on each socket that's started with
>>> PERF_GLOBAL_CTRL clear looks like a firmware bug to me, but in any case 
>>> making
>>> sure PERF_GLOBAL_CTRL is properly setup should be done regardless.
>> It's each package-BSP, and yes, this is clearly a firmware bug.  It's
>> probably worth saying that we're raising it with Intel, but this bug is
>> out in production firmware for SPR and EMR.
>>
>>> ---
>>>  xen/arch/x86/nmi.c | 13 +
>>>  1 file changed, 13 insertions(+)
>>>
>>> diff --git a/xen/arch/x86/nmi.c b/xen/arch/x86/nmi.c
>>> index dc79c25e3ffd..7a6601c4fd31 100644
>>> --- a/xen/arch/x86/nmi.c
>>> +++ b/xen/arch/x86/nmi.c
>>> @@ -335,6 +335,19 @@ static void setup_p6_watchdog(unsigned counter)
>>>   nmi_p6_event_width > BITS_PER_LONG )
>>>  return;
>>>  
>>> +if ( cpu_has_arch_perfmon )
>>> +{
>>> +uint64_t global_ctrl;
>>> +
>>> +rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, global_ctrl);
>>> +/*
>>> + * Make sure PMC0 is enabled in global control, as the enable bit 
>>> in
>>> + * PERF_GLOBAL_CTRL is AND'ed with the enable bit in EVNTSEL0.
>>> + */
>>> +if ( !(global_ctrl & 1) )
>>> +wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, global_ctrl | 1);
>> My gut feeling is that we ought to reinstate all bits, not just bit 1. 
>> If nothing else because that will make debugging using other counters
>> more reliable too.
> Hm, yes, I was borderline on enabling all possible counters in
> PERF_GLOBAL_CTRL, as reported by CPUID.0AH: EAX[15:8].
>
> But then wondered if it was going too far, as for the purposes here we
> just care about PMC0.
>
> My reasoning for not doing it would be that such a wide setup of
> PERF_GLOBAL_CTRL would then be gated on the watchdog being enabled:
> usages of counters other than PMC0 would only work while the watchdog
> is in use.  It seems more reliable to me to either do the setting
> of PERF_GLOBAL_CTRL as part of CPU initialization, or defer to each
> user of a PMC to take care of enabling it in PERF_GLOBAL_CTRL.

It is buggy that each socket-BSP is handed over with ctl=0 rather than 0xff.

But we're exacerbating the bug by not returning each socket-BSP to the
default behaviour.


It makes a practical difference if a developer wants to hand-code up
PMC2.  It also makes a practical difference to what a guest sees when it
executes RDPMC in guests, because right now the perf counter values leak
in (there's another outstanding patch series of mine trying to stem this
leak).

The fixup we're performing here isn't "because we're using one
counter".  It's to get state back to default.

~Andrew



Re: [PATCH] x86/nmi: ensure Global Performance Counter Control is setup correctly

2024-01-10 Thread Roger Pau Monné
On Wed, Jan 10, 2024 at 03:52:49PM +, Andrew Cooper wrote:
> On 10/01/2024 3:34 pm, Roger Pau Monne wrote:
> > When Architectural Performance Monitoring is available, the PERF_GLOBAL_CTRL
> > MSR contains per-counter enable bits that are ANDed with the enable bit in 
> > the
> > counter EVNTSEL MSR in order for a PMC counter to be enabled.
> >
> > So far the watchdog code seems to have relied on the PERF_GLOBAL_CTRL enable
> > bits being set by default, but at least on some Intel Sapphire and Emerald
> > Rapids this is no longer the case, and Xen reports:
> >
> > Testing NMI watchdog on all CPUs: 0 40 stuck
> >
> > The first CPU on each socket is started with PERF_GLOBAL_CTRL zeroed, so 
> > PMC0
> > doesn't start counting when the enable bit in EVNTSEL0 is set, due to the
> > relevant enable bit in PERF_GLOBAL_CTRL not being set.
> >
> > Fix by detecting when Architectural Performance Monitoring is available and
> > making sure the enable bit for PMC0 is set in PERF_GLOBAL_CTRL.
> >
> > Signed-off-by: Roger Pau Monné 
> > ---
> > The fact that it's only the first CPU on each socket that's started with
> > PERF_GLOBAL_CTRL clear looks like a firmware bug to me, but in any case 
> > making
> > sure PERF_GLOBAL_CTRL is properly setup should be done regardless.
> 
> It's each package-BSP, and yes, this is clearly a firmware bug.  It's
> probably worth saying that we're raising it with Intel, but this bug is
> out in production firmware for SPR and EMR.
> 
> > ---
> >  xen/arch/x86/nmi.c | 13 +
> >  1 file changed, 13 insertions(+)
> >
> > diff --git a/xen/arch/x86/nmi.c b/xen/arch/x86/nmi.c
> > index dc79c25e3ffd..7a6601c4fd31 100644
> > --- a/xen/arch/x86/nmi.c
> > +++ b/xen/arch/x86/nmi.c
> > @@ -335,6 +335,19 @@ static void setup_p6_watchdog(unsigned counter)
> >   nmi_p6_event_width > BITS_PER_LONG )
> >  return;
> >  
> > +if ( cpu_has_arch_perfmon )
> > +{
> > +uint64_t global_ctrl;
> > +
> > +rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, global_ctrl);
> > +/*
> > + * Make sure PMC0 is enabled in global control, as the enable bit 
> > in
> > + * PERF_GLOBAL_CTRL is AND'ed with the enable bit in EVNTSEL0.
> > + */
> > +if ( !(global_ctrl & 1) )
> > +wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, global_ctrl | 1);
> 
> My gut feeling is that we ought to reinstate all bits, not just bit 1. 
> If nothing else because that will make debugging using other counters
> more reliable too.

Hm, yes, I was borderline on enabling all possible counters in
PERF_GLOBAL_CTRL, as reported by CPUID.0AH: EAX[15:8].

But then wondered if it was going too far, as for the purposes here we
just care about PMC0.

My reasoning for not doing it would be that such a wide setup of
PERF_GLOBAL_CTRL would then be gated on the watchdog being enabled:
usages of counters other than PMC0 would only work while the watchdog
is in use.  It seems more reliable to me to either do the setting
of PERF_GLOBAL_CTRL as part of CPU initialization, or defer to each
user of a PMC to take care of enabling it in PERF_GLOBAL_CTRL.

> vPMU (although mutually exclusive with watchdog) does context switch
> this register as a whole.
> 
> See how global_ctrl_mask gets set up, although I'm not sure how much of
> that infrastructure we really want to reuse here.

Yes, if we want to enable all possible counters we would need to use
something similar to what's done there, albeit without the fixed
counter part.

Thanks, Roger.



[xen-unstable test] 184299: tolerable FAIL - PUSHED

2024-01-10 Thread osstest service owner
flight 184299 xen-unstable real [real]
flight 184309 xen-unstable real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/184299/
http://logs.test-lab.xenproject.org/osstest/logs/184309/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-xl-qemut-debianhvm-i386-xsm 12 debian-hvm-install fail pass 
in 184309-retest

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 184271
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 184271
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 184271
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 184271
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 184271
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 184271
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 184271
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 184271
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 184271
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 184271
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 184271
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 184271
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass

version targeted for testing:
 xen  c27c8922f2c6995d688437b0758cec6a27d18320
baseline version:
 xen  5a3ace21f3d779b291a2d305824b2820d88de7f1

Last test of basis   184271  

Re: [PATCH 10/22] x86/mapcache: initialise the mapcache for the idle domain

2024-01-10 Thread Elias El Yandouzi

Hi,

On 22/12/2022 13:06, Jan Beulich wrote:

On 16.12.2022 12:48, Julien Grall wrote:

From: Hongyan Xia 

In order to use the mapcache in the idle domain, we also have to
populate its page tables in the PERDOMAIN region, and we need to move
mapcache_domain_init() earlier in arch_domain_create().

Note, commit 'x86: lift mapcache variable to the arch level' has
initialised the mapcache for HVM domains. With this patch, PV, HVM,
idle domains now all initialise the mapcache.


But they can't use it yet, can they? This needs saying explicitly, or
else one is going to make wrong implications.



Yes, I tried to use the mapcache right after the idle vCPU gets 
scheduled and it worked. So, I believe it is enough.



--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -732,6 +732,8 @@ int arch_domain_create(struct domain *d,
  
   spin_lock_init(&d->arch.e820_lock);
  
+mapcache_domain_init(d);
+
  /* Minimal initialisation for the idle domain. */
  if ( unlikely(is_idle_domain(d)) )
  {
@@ -829,8 +831,6 @@ int arch_domain_create(struct domain *d,
  
  psr_domain_init(d);
  
-mapcache_domain_init(d);


You move this ahead of error paths taking the "goto out" route, so
adjustments to affected error paths are going to be needed to avoid
memory leaks.


Correct, I'll fix that.




--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -5963,6 +5963,9 @@ int create_perdomain_mapping(struct domain *d, unsigned 
long va,
  l3tab = __map_domain_page(pg);
  clear_page(l3tab);
  d->arch.perdomain_l3_pg = pg;
+if ( is_idle_domain(d) )
+idle_pg_table[l4_table_offset(PERDOMAIN_VIRT_START)] =
+l4e_from_page(pg, __PAGE_HYPERVISOR_RW);


Hmm, having an idle domain check here isn't very nice. I agree putting
it in arch_domain_create()'s respective conditional isn't very neat
either, but personally I'd consider this at least a little less bad.
And the layering violation aspect isn't much worse than that of setting
d->arch.ctxt_switch there as well.



Why do you think it would be less bad to move it in 
arch_domain_create()? To me, it would make things worse as it would 
spread the mapping stuff across different functions.


--
Elias



Re: xenpm cpufrequency settings don't work

2024-01-10 Thread flamv3421
I meant the xenpm get-cpufreq-average result did not match what I had set
as the maximum scaling frequency with xenpm, and there are even times when the
average is at 3GHz while my maximum scaling frequency is set to 80. Even if 
it was accurate, I believe there is something wrong with power management in 
xen since my laptop heats up very fast when using xen, and it does not have 
this issue when I use it without xen.


On Thursday, January 4th, 2024 at 11:39 AM, Jan Beulich wrote:


> On 28.12.2023 12:28, flamv3421 wrote:
> 
> > I used xenpm to disable turbo mode and set the maximum frequency to 80 
> > and governor to powersave, but my laptop fans are still running at full 
> > speed when I am using xen and the average frequency shown does not match 
> > the maximum frequency I set with xenpm which is 80.
> 
> 
> What makes you conclude that the maximum freq isn't 800MHz after you set
> it? All the CPUs are in P15 as per the output you supplied, which is a good
> indication that no lower P-state is in use anymore (as even the CPU where
> the command was carried out was still in P15). Lower P-states would of
> course have been in use prior to you running xenpm. Sadly while there is a
> way to reset the statistics, the hypercall subfunction isn't wired up beyond
> the libxc wrapper function.
> 
> > Why are my fans running at full speed and why doesn't xenpm maximum 
> > frequency setting work?
> 
> 
> I'm afraid I can't answer this question, as I don't know how exactly fan
> speed is controlled on that system of yours.
> 
> Jan



[ovmf test] 184307: all pass - PUSHED

2024-01-10 Thread osstest service owner
flight 184307 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/184307/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf 7d055812cc7a7d2b062cf56291211e8cecca36ed
baseline version:
 ovmf bc34a79cd2a005e1d12d4b05bec6efc3b102cad6

Last test of basis   184305  2024-01-10 10:41:30 Z0 days
Testing same since   184307  2024-01-10 12:41:03 Z0 days1 attempts


People who touched revisions under test:
  Arun Sura 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   bc34a79cd2..7d055812cc  7d055812cc7a7d2b062cf56291211e8cecca36ed -> 
xen-tested-master



Re: [PATCH] x86/nmi: ensure Global Performance Counter Control is setup correctly

2024-01-10 Thread Andrew Cooper
On 10/01/2024 3:34 pm, Roger Pau Monne wrote:
> When Architectural Performance Monitoring is available, the PERF_GLOBAL_CTRL
> MSR contains per-counter enable bits that are ANDed with the enable bit in the
> counter EVNTSEL MSR in order for a PMC counter to be enabled.
>
> So far the watchdog code seems to have relied on the PERF_GLOBAL_CTRL enable
> bits being set by default, but at least on some Intel Sapphire and Emerald
> Rapids this is no longer the case, and Xen reports:
>
> Testing NMI watchdog on all CPUs: 0 40 stuck
>
> The first CPU on each socket is started with PERF_GLOBAL_CTRL zeroed, so PMC0
> doesn't start counting when the enable bit in EVNTSEL0 is set, due to the
> relevant enable bit in PERF_GLOBAL_CTRL not being set.
>
> Fix by detecting when Architectural Performance Monitoring is available and
> making sure the enable bit for PMC0 is set in PERF_GLOBAL_CTRL.
>
> Signed-off-by: Roger Pau Monné 
> ---
> The fact that it's only the first CPU on each socket that's started with
> PERF_GLOBAL_CTRL clear looks like a firmware bug to me, but in any case making
> sure PERF_GLOBAL_CTRL is properly setup should be done regardless.

It's each package-BSP, and yes, this is clearly a firmware bug.  It's
probably worth saying that we're raising it with Intel, but this bug is
out in production firmware for SPR and EMR.

> ---
>  xen/arch/x86/nmi.c | 13 +
>  1 file changed, 13 insertions(+)
>
> diff --git a/xen/arch/x86/nmi.c b/xen/arch/x86/nmi.c
> index dc79c25e3ffd..7a6601c4fd31 100644
> --- a/xen/arch/x86/nmi.c
> +++ b/xen/arch/x86/nmi.c
> @@ -335,6 +335,19 @@ static void setup_p6_watchdog(unsigned counter)
>   nmi_p6_event_width > BITS_PER_LONG )
>  return;
>  
> +if ( cpu_has_arch_perfmon )
> +{
> +uint64_t global_ctrl;
> +
> +rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, global_ctrl);
> +/*
> + * Make sure PMC0 is enabled in global control, as the enable bit in
> + * PERF_GLOBAL_CTRL is AND'ed with the enable bit in EVNTSEL0.
> + */
> +if ( !(global_ctrl & 1) )
> +wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, global_ctrl | 1);

My gut feeling is that we ought to reinstate all bits, not just bit 1. 
If nothing else because that will make debugging using other counters
more reliable too.

vPMU (although mutually exclusive with watchdog) does context switch
this register as a whole.

See how global_ctrl_mask gets set up, although I'm not sure how much of
that infrastructure we really want to reuse here.

~Andrew



[PATCH] x86/nmi: ensure Global Performance Counter Control is setup correctly

2024-01-10 Thread Roger Pau Monne
When Architectural Performance Monitoring is available, the PERF_GLOBAL_CTRL
MSR contains per-counter enable bits that are ANDed with the enable bit in the
counter EVNTSEL MSR in order for a PMC counter to be enabled.

So far the watchdog code seems to have relied on the PERF_GLOBAL_CTRL enable
bits being set by default, but at least on some Intel Sapphire and Emerald
Rapids this is no longer the case, and Xen reports:

Testing NMI watchdog on all CPUs: 0 40 stuck

The first CPU on each socket is started with PERF_GLOBAL_CTRL zeroed, so PMC0
doesn't start counting when the enable bit in EVNTSEL0 is set, due to the
relevant enable bit in PERF_GLOBAL_CTRL not being set.

Fix by detecting when Architectural Performance Monitoring is available and
making sure the enable bit for PMC0 is set in PERF_GLOBAL_CTRL.

Signed-off-by: Roger Pau Monné 
---
The fact that it's only the first CPU on each socket that's started with
PERF_GLOBAL_CTRL clear looks like a firmware bug to me, but in any case making
sure PERF_GLOBAL_CTRL is properly setup should be done regardless.
---
 xen/arch/x86/nmi.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/xen/arch/x86/nmi.c b/xen/arch/x86/nmi.c
index dc79c25e3ffd..7a6601c4fd31 100644
--- a/xen/arch/x86/nmi.c
+++ b/xen/arch/x86/nmi.c
@@ -335,6 +335,19 @@ static void setup_p6_watchdog(unsigned counter)
  nmi_p6_event_width > BITS_PER_LONG )
 return;
 
+if ( cpu_has_arch_perfmon )
+{
+uint64_t global_ctrl;
+
+rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, global_ctrl);
+/*
+ * Make sure PMC0 is enabled in global control, as the enable bit in
+ * PERF_GLOBAL_CTRL is AND'ed with the enable bit in EVNTSEL0.
+ */
+if ( !(global_ctrl & 1) )
+wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, global_ctrl | 1);
+}
+
 clear_msr_range(MSR_P6_EVNTSEL(0), 2);
 clear_msr_range(MSR_P6_PERFCTR(0), 2);
 
-- 
2.43.0




[xtf test] 184308: all pass - PUSHED

2024-01-10 Thread osstest service owner
flight 184308 xtf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/184308/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 xtf  f3bd152f5e83da943535f6ba7b5772d4dbe96717
baseline version:
 xtf  837f771d9612215d5e6c9a1a41bf3b3ab0d0b381

Last test of basis   184294  2024-01-09 12:14:55 Z1 days
Testing same since   184308  2024-01-10 14:12:53 Z0 days1 attempts


People who touched revisions under test:
  Andrew Cooper 

jobs:
 build-amd64-xtf  pass
 build-amd64  pass
 build-amd64-pvopspass
 test-xtf-amd64-amd64-1   pass
 test-xtf-amd64-amd64-2   pass
 test-xtf-amd64-amd64-3   pass
 test-xtf-amd64-amd64-4   pass
 test-xtf-amd64-amd64-5   pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xtf.git
   837f771..f3bd152  f3bd152f5e83da943535f6ba7b5772d4dbe96717 -> 
xen-tested-master



Re: [PATCH] x86/hvm: don't expose XENFEAT_hvm_pirqs by default

2024-01-10 Thread Roger Pau Monné
On Wed, Jan 10, 2024 at 03:47:12PM +0200, Xenia Ragiadakou wrote:
> 
> 
> On 10/1/24 12:26, Jan Beulich wrote:
> > On 10.01.2024 10:53, Roger Pau Monne wrote:
> > > The HVM pirq feature allows routing interrupts from both physical and 
> > > emulated
> > > devices over event channels, this was done as a performance improvement.  
> > > However
> > > its usage is fully undocumented, and the only reference implementation is 
> > > in
> > > Linux.  It defeats the purpose of local APIC hardware virtualization, 
> > > because
> > > when using it interrupts avoid the usage of the local APIC altogether.
> > 
> > So without sufficient APIC acceleration, isn't this arranging for degraded
> > performance then? IOW should the new default perhaps be dependent on the
> > degree of APIC acceleration?
> > 
> > > It has also been reported to not work properly with certain devices, at 
> > > least
> > > when using some AMD GPUs Linux attempts to route interrupts over event
> > > channels, but Xen doesn't correctly detect such routing, which leads to 
> > > the
> > > hypervisor complaining with:
> > > 
> > > (XEN) d15v0: Unsupported MSI delivery mode 7 for Dom15
> > > 
> > > When MSIs are attempted to be routed over event channels the entry 
> > > delivery
> > > mode is set to ExtINT, but Xen doesn't detect such routing and attempts to
> > > inject the interrupt following the native MSI path, and the ExtINT 
> > > delivery
> > > mode is not supported.
> > 
> > Shouldn't this be properly addressed nevertheless? The way it's described
> > it sounds as if MSI wouldn't work at all this way; I can't spot why the
> > issue would only be "with certain devices". Yet that in turn doesn't look
> > to be very likely - pass-through use cases, in particular SR-IOV ones,
> > would certainly have noticed.
> 
> The issue gets triggered when the guest performs save/restore of MSIs,
> because PHYSDEVOP_map_pirq is not implemented for MSIs, and thus, QEMU
> cannot remap the MSI to the event channel once unmapped.

I'm kind of confused by this sentence, PHYSDEVOP_map_pirq does support
MSIs, see xc_physdev_map_pirq_msi() helper in Xen code base.

> So, to fix this issue either would be needed to change QEMU to not unmap
> pirq-emulated MSIs or to implement PHYSDEVOP_map_pirq for MSIs.
> 
> But still, even when no device has been passed-through, scheduling latencies
> (of hundreds of ms) were observed in the guest even when running a simple
> loop application, which disappear once the flag is disabled. We did not have
> the chance to root cause it further.

So XENFEAT_hvm_pirqs is causing such latency issues?  That I certainly
didn't notice.

Regards, Roger.



Re: [PATCH] x86/hvm: don't expose XENFEAT_hvm_pirqs by default

2024-01-10 Thread Andrew Cooper
On 10/01/2024 10:26 am, Jan Beulich wrote:
> On 10.01.2024 10:53, Roger Pau Monne wrote:
>> The HVM pirq feature allows routing interrupts from both physical and 
>> emulated
>> devices over event channels, this was done as a performance improvement.  
>> However
>> its usage is fully undocumented, and the only reference implementation is in
>> Linux.  It defeats the purpose of local APIC hardware virtualization, because
>> when using it interrupts avoid the usage of the local APIC altogether.
> So without sufficient APIC acceleration, isn't this arranging for degraded
> performance then?

Maybe.  Quite possibly.

>  IOW should the new default perhaps be dependent on the
> degree of APIC acceleration?

No.  Getting things working is strictly more important than making
things fast.

HVM_PIRQ is an entirely undocumented pile of Xen-specific deviation from
standards, with a single reference implementation and 0 people who
understand it.  It has also been shown to be the cause of real breakage,
and disabling it has been shown to make guests function better.

If and when we have two working schemes, we can choose a default based
on performance.  We are not in this position right now.

~Andrew



[PATCH v5 5/5] x86/vIRQ: split PCI link load state checking from actual loading

2024-01-10 Thread Jan Beulich
Move the checking into a check hook, and add checking of the padding
fields as well.

Signed-off-by: Jan Beulich 
Reviewed-by: Roger Pau Monné 
---
v4: New.

--- a/xen/arch/x86/hvm/irq.c
+++ b/xen/arch/x86/hvm/irq.c
@@ -749,6 +749,30 @@ static int cf_check irq_load_isa(struct
 return 0;
 }
 
+static int cf_check irq_check_link(const struct domain *d,
+   hvm_domain_context_t *h)
+{
+const struct hvm_hw_pci_link *pci_link = hvm_get_entry(PCI_LINK, h);
+unsigned int link;
+
+if ( !pci_link )
+return -ENODATA;
+
+for ( link = 0; link < ARRAY_SIZE(pci_link->pad0); link++ )
+if ( pci_link->pad0[link] )
+return -EINVAL;
+
+for ( link = 0; link < ARRAY_SIZE(pci_link->route); link++ )
+if ( pci_link->route[link] > 15 )
+{
+printk(XENLOG_G_ERR
+   "HVM restore: PCI-ISA link %u out of range (%u)\n",
+   link, pci_link->route[link]);
+return -EINVAL;
+}
+
+return 0;
+}
 
 static int cf_check irq_load_link(struct domain *d, hvm_domain_context_t *h)
 {
@@ -759,16 +783,6 @@ static int cf_check irq_load_link(struct
 if ( hvm_load_entry(PCI_LINK, h, &hvm_irq->pci_link) != 0 )
 return -EINVAL;
 
-/* Sanity check */
-for ( link = 0; link < 4; link++ )
-if ( hvm_irq->pci_link.route[link] > 15 )
-{
-printk(XENLOG_G_ERR
-   "HVM restore: PCI-ISA link %u out of range (%u)\n",
-   link, hvm_irq->pci_link.route[link]);
-return -EINVAL;
-}
-
 /* Adjust the GSI assert counts for the link outputs.
  * This relies on the PCI and ISA IRQ state being loaded first */
 for ( link = 0; link < 4; link++ )
@@ -788,5 +802,5 @@ HVM_REGISTER_SAVE_RESTORE(PCI_IRQ, irq_s
   1, HVMSR_PER_DOM);
 HVM_REGISTER_SAVE_RESTORE(ISA_IRQ, irq_save_isa, NULL, irq_load_isa,
   1, HVMSR_PER_DOM);
-HVM_REGISTER_SAVE_RESTORE(PCI_LINK, irq_save_link, NULL, irq_load_link,
-  1, HVMSR_PER_DOM);
+HVM_REGISTER_SAVE_RESTORE(PCI_LINK, irq_save_link, irq_check_link,
+  irq_load_link, 1, HVMSR_PER_DOM);




[PATCH v5 4/5] x86/vPIC: check values loaded from state save record

2024-01-10 Thread Jan Beulich
Loading is_master from the state save record can lead to out-of-bounds
accesses via at least the two container_of() uses by vpic_domain() and
__vpic_lock(). Make sure the value is consistent with the instance being
loaded.

For ->int_output (which for whatever reason isn't a 1-bit bitfield),
besides bounds checking also take ->init_state into account.

For ELCR follow what vpic_intercept_elcr_io()'s write path and
vpic_reset() do, i.e. don't insist on the internal view of the value to
be saved.

Move the instance range check as well, leaving just an assertion in the
load handler.

Signed-off-by: Jan Beulich 
Reviewed-by: Roger Pau Monné 
---
v3: vpic_domain() fix and vpic_elcr_mask() adjustment split out. Re-base
over rename in earlier patch.
v2: Introduce separate checking function; switch to refusing to load
bogus values. Re-base.

--- a/xen/arch/x86/hvm/vpic.c
+++ b/xen/arch/x86/hvm/vpic.c
@@ -429,6 +429,38 @@ static int cf_check vpic_save(struct vcp
 return 0;
 }
 
+static int cf_check vpic_check(const struct domain *d, hvm_domain_context_t *h)
+{
+unsigned int inst = hvm_load_instance(h);
+const struct hvm_hw_vpic *s;
+
+if ( !has_vpic(d) )
+return -ENODEV;
+
+/* Which PIC is this? */
+if ( inst >= ARRAY_SIZE(d->arch.hvm.vpic) )
+return -ENOENT;
+
+s = hvm_get_entry(PIC, h);
+if ( !s )
+return -ENODATA;
+
+/*
+ * Check to-be-loaded values are within valid range, for them to represent
+ * actually reachable state.  Uses of some of the values elsewhere assume
+ * this is the case.
+ */
+if ( s->int_output > 1 )
+return -EDOM;
+
+if ( s->is_master != !inst ||
+ (s->int_output && s->init_state) ||
+ (s->elcr & ~vpic_elcr_mask(s, 1)) )
+return -EINVAL;
+
+return 0;
+}
+
 static int cf_check vpic_load(struct domain *d, hvm_domain_context_t *h)
 {
 struct hvm_hw_vpic *s;
@@ -438,18 +470,21 @@ static int cf_check vpic_load(struct dom
 return -ENODEV;
 
 /* Which PIC is this? */
-if ( inst > 1 )
-return -ENOENT;
+ASSERT(inst < ARRAY_SIZE(d->arch.hvm.vpic));
 s = &d->arch.hvm.vpic[inst];
 
 /* Load the state */
 if ( hvm_load_entry(PIC, h, s) != 0 )
 return -EINVAL;
 
+if ( s->is_master )
+s->elcr |= 1 << 2;
+
 return 0;
 }
 
-HVM_REGISTER_SAVE_RESTORE(PIC, vpic_save, NULL, vpic_load, 2, HVMSR_PER_DOM);
+HVM_REGISTER_SAVE_RESTORE(PIC, vpic_save, vpic_check, vpic_load, 2,
+  HVMSR_PER_DOM);
 
 void vpic_reset(struct domain *d)
 {




[PATCH v5 3/5] x86/vPIT: check values loaded from state save record

2024-01-10 Thread Jan Beulich
In particular pit_latch_status() and speaker_ioport_read() perform
calculations which assume in-bounds values. Several of the state save
record fields can hold wider ranges, though. Refuse to load values which
cannot result from normal operation, except mode, the init state of
which (see also below) cannot otherwise be reached.

Note that ->gate should only be possible to be zero for channel 2;
enforce that as well.

Adjust pit_reset()'s writing of ->mode as well, to not unduly affect
the value pit_latch_status() may calculate. The chosen mode of 7 is
still one which cannot be established by writing the control word. Note
that with or without this adjustment effectively all switch() statements
using mode as the control expression aren't quite right when the PIT is
still in that init state; there is an apparent assumption that before
these can sensibly be invoked, the guest would init the PIT (i.e. in
particular set the mode).

Signed-off-by: Jan Beulich 
Reviewed-by: Roger Pau Monné 
---
For mode we could refuse to load values in the [0x08,0xfe] range; I'm
not certain that's going to be overly helpful.

For count I was considering to clip the saved value to 16 bits (i.e. to
convert the internally used 0x10000 back to the architectural 0x0000),
but pit_save() doesn't easily lend itself to such a "fixup". If desired
perhaps better a separate change anyway.
---
v3: Slightly adjust two comments. Re-base over rename in earlier patch.
v2: Introduce separate checking function; switch to refusing to load
bogus values. Re-base.

--- a/xen/arch/x86/emul-i8254.c
+++ b/xen/arch/x86/emul-i8254.c
@@ -47,6 +47,7 @@
 #define RW_STATE_MSB 2
 #define RW_STATE_WORD0 3
 #define RW_STATE_WORD1 4
+#define RW_STATE_NUM 5
 
 #define get_guest_time(v) \
(is_hvm_vcpu(v) ? hvm_get_guest_time(v) : (u64)get_s_time())
@@ -427,6 +428,47 @@ static int cf_check pit_save(struct vcpu
 return rc;
 }
 
+static int cf_check pit_check(const struct domain *d, hvm_domain_context_t *h)
+{
+const struct hvm_hw_pit *hw;
+unsigned int i;
+
+if ( !has_vpit(d) )
+return -ENODEV;
+
+hw = hvm_get_entry(PIT, h);
+if ( !hw )
+return -ENODATA;
+
+/*
+ * Check to-be-loaded values are within valid range, for them to represent
+ * actually reachable state.  Uses of some of the values elsewhere assume
+ * this is the case.  Note that the channels' mode fields aren't checked;
+ * Xen prior to 4.19 might save them as 0xff.
+ */
+if ( hw->speaker_data_on > 1 || hw->pad0 )
+return -EDOM;
+
+for ( i = 0; i < ARRAY_SIZE(hw->channels); ++i )
+{
+const struct hvm_hw_pit_channel *ch = &hw->channels[i];
+
+if ( ch->count > 0x10000 ||
+ ch->count_latched >= RW_STATE_NUM ||
+ ch->read_state >= RW_STATE_NUM ||
+ ch->write_state >= RW_STATE_NUM ||
+ ch->rw_mode > RW_STATE_WORD0 ||
+ ch->gate > 1 ||
+ ch->bcd > 1 )
+return -EDOM;
+
+if ( i != 2 && !ch->gate )
+return -EINVAL;
+}
+
+return 0;
+}
+
 static int cf_check pit_load(struct domain *d, hvm_domain_context_t *h)
 {
 PITState *pit = domain_vpit(d);
@@ -443,6 +485,14 @@ static int cf_check pit_load(struct doma
 goto out;
 }
 
+for ( i = 0; i < ARRAY_SIZE(pit->hw.channels); ++i )
+{
+struct hvm_hw_pit_channel *ch = &pit->hw.channels[i];
+
+if ( (ch->mode &= 7) > 5 )
+ch->mode -= 4;
+}
+
 /*
  * Recreate platform timers from hardware state.  There will be some 
  * time jitter here, but the wall-clock will have jumped massively, so 
@@ -458,7 +508,7 @@ static int cf_check pit_load(struct doma
 return rc;
 }
 
-HVM_REGISTER_SAVE_RESTORE(PIT, pit_save, NULL, pit_load, 1, HVMSR_PER_DOM);
+HVM_REGISTER_SAVE_RESTORE(PIT, pit_save, pit_check, pit_load, 1, 
HVMSR_PER_DOM);
 #endif
 
 /* The intercept action for PIT DM retval: 0--not handled; 1--handled. */
@@ -575,7 +625,7 @@ void pit_reset(struct domain *d)
 for ( i = 0; i < 3; i++ )
 {
 s = >hw.channels[i];
-s->mode = 0xff; /* the init mode */
+s->mode = 7; /* unreachable sentinel */
 s->gate = (i != 2);
 pit_load_count(pit, i, 0);
 }




[PATCH v5 2/5] x86/HVM: adjust save/restore hook registration for optional check handler

2024-01-10 Thread Jan Beulich
Register NULL uniformly as a first step.

Signed-off-by: Jan Beulich 
Acked-by: Andrew Cooper 
---
v2: New.

--- a/xen/arch/x86/cpu/mcheck/vmce.c
+++ b/xen/arch/x86/cpu/mcheck/vmce.c
@@ -374,7 +374,7 @@ static int cf_check vmce_load_vcpu_ctxt(
 return err ?: vmce_restore_vcpu(v, &ctxt);
 }
 
-HVM_REGISTER_SAVE_RESTORE(VMCE_VCPU, vmce_save_vcpu_ctxt,
+HVM_REGISTER_SAVE_RESTORE(VMCE_VCPU, vmce_save_vcpu_ctxt, NULL,
   vmce_load_vcpu_ctxt, 1, HVMSR_PER_VCPU);
 #endif
 
--- a/xen/arch/x86/emul-i8254.c
+++ b/xen/arch/x86/emul-i8254.c
@@ -458,7 +458,7 @@ static int cf_check pit_load(struct doma
 return rc;
 }
 
-HVM_REGISTER_SAVE_RESTORE(PIT, pit_save, pit_load, 1, HVMSR_PER_DOM);
+HVM_REGISTER_SAVE_RESTORE(PIT, pit_save, NULL, pit_load, 1, HVMSR_PER_DOM);
 #endif
 
 /* The intercept action for PIT DM retval: 0--not handled; 1--handled. */
--- a/xen/arch/x86/hvm/hpet.c
+++ b/xen/arch/x86/hvm/hpet.c
@@ -692,7 +692,7 @@ static int cf_check hpet_load(struct dom
 return 0;
 }
 
-HVM_REGISTER_SAVE_RESTORE(HPET, hpet_save, hpet_load, 1, HVMSR_PER_DOM);
+HVM_REGISTER_SAVE_RESTORE(HPET, hpet_save, NULL, hpet_load, 1, HVMSR_PER_DOM);
 
 static void hpet_set(HPETState *h)
 {
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -793,7 +793,7 @@ static int cf_check hvm_load_tsc_adjust(
 return 0;
 }
 
-HVM_REGISTER_SAVE_RESTORE(TSC_ADJUST, hvm_save_tsc_adjust,
+HVM_REGISTER_SAVE_RESTORE(TSC_ADJUST, hvm_save_tsc_adjust, NULL,
   hvm_load_tsc_adjust, 1, HVMSR_PER_VCPU);
 
 static int cf_check hvm_save_cpu_ctxt(struct vcpu *v, hvm_domain_context_t *h)
@@ -1186,7 +1186,7 @@ static int cf_check hvm_load_cpu_ctxt(st
 return 0;
 }
 
-HVM_REGISTER_SAVE_RESTORE(CPU, hvm_save_cpu_ctxt, hvm_load_cpu_ctxt, 1,
+HVM_REGISTER_SAVE_RESTORE(CPU, hvm_save_cpu_ctxt, NULL, hvm_load_cpu_ctxt, 1,
   HVMSR_PER_VCPU);
 
 #define HVM_CPU_XSAVE_SIZE(xcr0) (offsetof(struct hvm_hw_cpu_xsave, \
@@ -1535,6 +1535,7 @@ static int __init cf_check hvm_register_
 hvm_register_savevm(CPU_XSAVE_CODE,
 "CPU_XSAVE",
 hvm_save_cpu_xsave_states,
+NULL,
 hvm_load_cpu_xsave_states,
 HVM_CPU_XSAVE_SIZE(xfeature_mask) +
 sizeof(struct hvm_save_descriptor),
@@ -1543,6 +1544,7 @@ static int __init cf_check hvm_register_
 hvm_register_savevm(CPU_MSR_CODE,
 "CPU_MSR",
 hvm_save_cpu_msrs,
+NULL,
 hvm_load_cpu_msrs,
 HVM_CPU_MSR_SIZE(ARRAY_SIZE(msrs_to_send)) +
 sizeof(struct hvm_save_descriptor),
--- a/xen/arch/x86/hvm/irq.c
+++ b/xen/arch/x86/hvm/irq.c
@@ -784,9 +784,9 @@ static int cf_check irq_load_link(struct
 return 0;
 }
 
-HVM_REGISTER_SAVE_RESTORE(PCI_IRQ, irq_save_pci, irq_load_pci,
+HVM_REGISTER_SAVE_RESTORE(PCI_IRQ, irq_save_pci, NULL, irq_load_pci,
   1, HVMSR_PER_DOM);
-HVM_REGISTER_SAVE_RESTORE(ISA_IRQ, irq_save_isa, irq_load_isa,
+HVM_REGISTER_SAVE_RESTORE(ISA_IRQ, irq_save_isa, NULL, irq_load_isa,
   1, HVMSR_PER_DOM);
-HVM_REGISTER_SAVE_RESTORE(PCI_LINK, irq_save_link, irq_load_link,
+HVM_REGISTER_SAVE_RESTORE(PCI_LINK, irq_save_link, NULL, irq_load_link,
   1, HVMSR_PER_DOM);
--- a/xen/arch/x86/hvm/mtrr.c
+++ b/xen/arch/x86/hvm/mtrr.c
@@ -773,7 +773,7 @@ static int cf_check hvm_load_mtrr_msr(st
 return 0;
 }
 
-HVM_REGISTER_SAVE_RESTORE(MTRR, hvm_save_mtrr_msr, hvm_load_mtrr_msr, 1,
+HVM_REGISTER_SAVE_RESTORE(MTRR, hvm_save_mtrr_msr, NULL, hvm_load_mtrr_msr, 1,
   HVMSR_PER_VCPU);
 
 void memory_type_changed(struct domain *d)
--- a/xen/arch/x86/hvm/pmtimer.c
+++ b/xen/arch/x86/hvm/pmtimer.c
@@ -300,7 +300,7 @@ static int cf_check acpi_load(struct dom
 return 0;
 }
 
-HVM_REGISTER_SAVE_RESTORE(PMTIMER, acpi_save, acpi_load,
+HVM_REGISTER_SAVE_RESTORE(PMTIMER, acpi_save, NULL, acpi_load,
   1, HVMSR_PER_DOM);
 
 int pmtimer_change_ioport(struct domain *d, uint64_t version)
--- a/xen/arch/x86/hvm/rtc.c
+++ b/xen/arch/x86/hvm/rtc.c
@@ -797,7 +797,7 @@ static int cf_check rtc_load(struct doma
 return 0;
 }
 
-HVM_REGISTER_SAVE_RESTORE(RTC, rtc_save, rtc_load, 1, HVMSR_PER_DOM);
+HVM_REGISTER_SAVE_RESTORE(RTC, rtc_save, NULL, rtc_load, 1, HVMSR_PER_DOM);
 
 void rtc_reset(struct domain *d)
 {
--- a/xen/arch/x86/hvm/save.c
+++ b/xen/arch/x86/hvm/save.c
@@ -88,6 +88,7 @@ static struct {
 void __init hvm_register_savevm(uint16_t typecode,
 const char *name,
 hvm_save_handler save_state,
+hvm_check_handler check_state,
 hvm_load_handler load_state,
 size_t size, int kind)
 {
@@ 

[PATCH v5 1/5] x86/HVM: split restore state checking from state loading

2024-01-10 Thread Jan Beulich
..., at least as reasonably feasible without making a check hook
mandatory (in particular strict vs relaxed/zero-extend length checking
can't be done early this way).

Note that only one of the two uses of "real" hvm_load() is accompanied
with a "checking" one. The other directly consumes hvm_save() output,
which ought to be well-formed. This means that while input data related
checks don't need repeating in the "load" function when already done by
the "check" one (albeit assertions to this effect may be desirable),
domain state related checks (e.g. has_xyz(d)) will be required in both
places.

With the split arch_hvm_{check,load}(), also invoke the latter only
after downing all the vCPU-s.

Suggested-by: Roger Pau Monné 
Signed-off-by: Jan Beulich 
---
Do we really need all the copying involved in use of _hvm_read_entry()
(backing hvm_load_entry())? Zero-extending loads are likely easier to
handle that way, but for strict loads all we gain is a reduced risk of
unaligned accesses (compared to simply pointing into h->data[]).
---
v5: Add comment. Re-order vCPU-down vs arch_hvm_load().
v4: Fold hvm_check() into hvm_load().
v2: New.

--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -379,8 +379,12 @@ long arch_do_domctl(
 if ( copy_from_guest(c.data, domctl->u.hvmcontext.buffer, c.size) != 0 )
 goto sethvmcontext_out;
 
+ret = hvm_load(d, false, &c);
+if ( ret )
+goto sethvmcontext_out;
+
 domain_pause(d);
-ret = hvm_load(d, &c);
+ret = hvm_load(d, true, );
 domain_unpause(d);
 
 sethvmcontext_out:
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -5391,7 +5391,7 @@ int hvm_copy_context_and_params(struct d
 }
 
 c.cur = 0;
-rc = hvm_load(dst, &c);
+rc = hvm_load(dst, true, &c);
 
  out:
 vfree(c.data);
--- a/xen/arch/x86/hvm/save.c
+++ b/xen/arch/x86/hvm/save.c
@@ -30,7 +30,8 @@ static void arch_hvm_save(struct domain
 d->arch.hvm.sync_tsc = rdtsc();
 }
 
-static int arch_hvm_load(struct domain *d, const struct hvm_save_header *hdr)
+static int arch_hvm_check(const struct domain *d,
+  const struct hvm_save_header *hdr)
 {
 uint32_t eax, ebx, ecx, edx;
 
@@ -55,6 +56,11 @@ static int arch_hvm_load(struct domain *
"(%#"PRIx32") and restored on another (%#"PRIx32").\n",
d->domain_id, hdr->cpuid, eax);
 
+return 0;
+}
+
+static void arch_hvm_load(struct domain *d, const struct hvm_save_header *hdr)
+{
 /* Restore guest's preferred TSC frequency. */
 if ( hdr->gtsc_khz )
 d->arch.tsc_khz = hdr->gtsc_khz;
@@ -66,13 +72,12 @@ static int arch_hvm_load(struct domain *
 
 /* VGA state is not saved/restored, so we nobble the cache. */
 d->arch.hvm.stdvga.cache = STDVGA_CACHE_DISABLED;
-
-return 0;
 }
 
 /* List of handlers for various HVM save and restore types */
 static struct {
 hvm_save_handler save;
+hvm_check_handler check;
 hvm_load_handler load;
 const char *name;
 size_t size;
@@ -88,6 +93,7 @@ void __init hvm_register_savevm(uint16_t
 {
 ASSERT(typecode <= HVM_SAVE_CODE_MAX);
 ASSERT(hvm_sr_handlers[typecode].save == NULL);
+ASSERT(hvm_sr_handlers[typecode].check == NULL);
 ASSERT(hvm_sr_handlers[typecode].load == NULL);
 hvm_sr_handlers[typecode].save = save_state;
 hvm_sr_handlers[typecode].load = load_state;
@@ -275,12 +281,15 @@ int hvm_save(struct domain *d, hvm_domai
 return 0;
 }
 
-int hvm_load(struct domain *d, hvm_domain_context_t *h)
+/*
+ * @real = false requests checking of the incoming state, while @real = true
+ * requests actual loading, which will then assume that checking was already
+ * done or is unnecessary.
+ */
+int hvm_load(struct domain *d, bool real, hvm_domain_context_t *h)
 {
 const struct hvm_save_header *hdr;
 struct hvm_save_descriptor *desc;
-hvm_load_handler handler;
-struct vcpu *v;
 int rc;
 
 if ( d->is_dying )
@@ -291,50 +300,92 @@ int hvm_load(struct domain *d, hvm_domai
 if ( !hdr )
 return -ENODATA;
 
-rc = arch_hvm_load(d, hdr);
-if ( rc )
-return rc;
+rc = arch_hvm_check(d, hdr);
+if ( real )
+{
+struct vcpu *v;
+
+ASSERT(!rc);
 
-/* Down all the vcpus: we only re-enable the ones that had state saved. */
-for_each_vcpu(d, v)
-if ( !test_and_set_bit(_VPF_down, >pause_flags) )
-vcpu_sleep_nosync(v);
+/*
+ * Down all the vcpus: we only re-enable the ones that had state
+ * saved.
+ */
+for_each_vcpu(d, v)
+if ( !test_and_set_bit(_VPF_down, >pause_flags) )
+vcpu_sleep_nosync(v);
+
+arch_hvm_load(d, hdr);
+}
+else if ( rc )
+return rc;
 
 for ( ; ; )
 {
+const char *name;
+hvm_load_handler load;
+
 if ( h->size - h->cur < sizeof(struct hvm_save_descriptor) )
 {
 /* Run 

[PATCH v5 0/5] x86/HVM: load state checking

2024-01-10 Thread Jan Beulich
With the request to convert bounding to actual refusal, and then
doing so in new hooks, the two previously separate patches now need
to be in a series, with infrastructure work done first. Clearly the
checking in other load handlers could (and likely wants to be) moved
to separate check handlers as well - one example of doing so is
added anew in v4, the rest will want doing down the road.

Only patch 1 changed in v5; see there for details.

1: HVM: split restore state checking from state loading
2: HVM: adjust save/restore hook registration for optional check handler
3: vPIT: check values loaded from state save record
4: vPIC: check values loaded from state save record
5: vIRQ: split PCI link load state checking from actual loading

Jan



[PATCH v3] NUMA: limit first_valid_mfn exposure

2024-01-10 Thread Jan Beulich
Address the TODO regarding first_valid_mfn by making the variable static
when NUMA=y, thus also addressing a Misra C:2012 rule 8.4 concern (on
x86). To carry this out, introduce two new IS_ENABLED()-like macros
conditionally inserting "static". One less macro expansion layer is
sufficient though (I might guess that some early form of IS_ENABLED()
pasted CONFIG_ onto the incoming argument, at which point the extra
layer would have been necessary), and part of the existing helper macros
can be re-used.

Signed-off-by: Jan Beulich 
---
v3: Introduce STATIC_IF{,_NOT}().
v2: New, split off.

--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -255,12 +255,10 @@ static PAGE_LIST_HEAD(page_broken_list);
  */
 
 /*
- * first_valid_mfn is exported because it is used when !CONFIG_NUMA.
- *
- * TODO: Consider if we can conditionally export first_valid_mfn based
- * on whether NUMA is selected.
+ * When !CONFIG_NUMA first_valid_mfn is non-static, for use by respective
+ * stubs.
  */
-mfn_t first_valid_mfn = INVALID_MFN_INITIALIZER;
+STATIC_IF(CONFIG_NUMA) mfn_t first_valid_mfn = INVALID_MFN_INITIALIZER;
 
 struct bootmem_region {
 unsigned long s, e; /* MFNs @s through @e-1 inclusive are free */
--- a/xen/include/xen/kconfig.h
+++ b/xen/include/xen/kconfig.h
@@ -11,6 +11,8 @@
 /* cppcheck is failing to parse the macro so use a dummy one */
 #ifdef CPPCHECK
 #define IS_ENABLED(option) option
+#define STATIC_IF(option) option
+#define STATIC_IF_NOT(option) option
 #else
 /*
  * Getting something that works in C and CPP for an arg that may or may
@@ -31,6 +33,17 @@
  * otherwise.
  */
 #define IS_ENABLED(option) config_enabled(option)
+
+/* Use similar trickery for conditionally inserting "static". */
+#define static_if(value) _static_if(__ARG_PLACEHOLDER_##value)
+#define _static_if(arg1_or_junk) ___config_enabled(arg1_or_junk static,)
+
+#define STATIC_IF(option) static_if(option)
+
+#define static_if_not(value) _static_if_not(__ARG_PLACEHOLDER_##value)
+#define _static_if_not(arg1_or_junk) ___config_enabled(arg1_or_junk, static)
+
+#define STATIC_IF_NOT(option) static_if_not(option)
 #endif
 
 #endif /* __XEN_KCONFIG_H */



Re: [PATCH 6/6] xen/x86: Add topology generator

2024-01-10 Thread Jan Beulich
On 10.01.2024 15:16, Alejandro Vallejo wrote:
> Review-to-self after running in Gitlab:
> 
> On 09/01/2024 15:38, Alejandro Vallejo wrote:
>> +p->basic.lppp = 0xff;
>> +if ( threads_per_pkg < 0xff )
>> +p->basic.lppp = threads_per_pkg;
>> +
>> +switch ( p->x86_vendor )
>> +{
>> +case X86_VENDOR_INTEL:
>> +struct cpuid_cache_leaf *sl = p->cache.subleaf;
>> +for ( size_t i = 0; sl->type &&
>> +i < ARRAY_SIZE(p->cache.raw); i++, sl++ )
>> +{
>> +sl->cores_per_package = cores_per_pkg - 1;
>> +sl->threads_per_cache = threads_per_core - 1;
>> +if ( sl->type == 3 /* unified cache */ )
>> +sl->threads_per_cache = threads_per_pkg - 1;
>> +}
>> +break;
>> +
>> +case X86_VENDOR_AMD:
>> +case X86_VENDOR_HYGON:
> 
> Missing braces around the INTEL block due to the variable declarared
> there. I'll include that in v2 after the rest of the review comments
> come through.

And (just looking at the fragment above) too deep indentation as well.

Jan



Re: [PATCH 6/6] xen/x86: Add topology generator

2024-01-10 Thread Alejandro Vallejo
Review-to-self after running in Gitlab:

On 09/01/2024 15:38, Alejandro Vallejo wrote:
> +p->basic.lppp = 0xff;
> +if ( threads_per_pkg < 0xff )
> +p->basic.lppp = threads_per_pkg;
> +
> +switch ( p->x86_vendor )
> +{
> +case X86_VENDOR_INTEL:
> +struct cpuid_cache_leaf *sl = p->cache.subleaf;
> +for ( size_t i = 0; sl->type &&
> +i < ARRAY_SIZE(p->cache.raw); i++, sl++ )
> +{
> +sl->cores_per_package = cores_per_pkg - 1;
> +sl->threads_per_cache = threads_per_core - 1;
> +if ( sl->type == 3 /* unified cache */ )
> +sl->threads_per_cache = threads_per_pkg - 1;
> +}
> +break;
> +
> +case X86_VENDOR_AMD:
> +case X86_VENDOR_HYGON:

Missing braces around the INTEL block due to the variable declarared
there. I'll include that in v2 after the rest of the review comments
come through.

Cheers,
Alejandro



Re: [RFC XEN PATCH v4 1/5] xen/vpci: Clear all vpci status of device

2024-01-10 Thread Stewart Hildebrand
On 1/10/24 01:24, Chen, Jiqian wrote:
> On 2024/1/9 23:24, Stewart Hildebrand wrote:
>> On 1/5/24 02:09, Jiqian Chen wrote:
>>> diff --git a/xen/drivers/pci/physdev.c b/xen/drivers/pci/physdev.c
>>> index 42db3e6d133c..552ccbf747cb 100644
>>> --- a/xen/drivers/pci/physdev.c
>>> +++ b/xen/drivers/pci/physdev.c
>>> @@ -67,6 +68,39 @@ ret_t pci_physdev_op(int cmd, 
>>> XEN_GUEST_HANDLE_PARAM(void) arg)
>>>  break;
>>>  }
>>>  
>>> +case PHYSDEVOP_pci_device_state_reset: {
>>> +struct physdev_pci_device dev;
>>> +struct pci_dev *pdev;
>>> +pci_sbdf_t sbdf;
>>> +
>>> +if ( !is_pci_passthrough_enabled() )
>>> +return -EOPNOTSUPP;
>>> +
>>> +ret = -EFAULT;
>>> +if ( copy_from_guest(&dev, arg, 1) != 0 )
>>> +break;
>>> +sbdf = PCI_SBDF(dev.seg, dev.bus, dev.devfn);
>>> +
>>> +ret = xsm_resource_setup_pci(XSM_PRIV, sbdf.sbdf);
>>> +if ( ret )
>>> +break;
>>> +
>>> +pcidevs_lock();
>>> +pdev = pci_get_pdev(NULL, sbdf);
>>> +if ( !pdev )
>>> +{
>>> +pcidevs_unlock();
>>> +ret = -ENODEV;
>>> +break;
>>> +}
>>> +
>>
>> write_lock(&pdev->domain->pci_lock);
>>
>>> +ret = vpci_reset_device_state(pdev);
>>
>> write_unlock(&pdev->domain->pci_lock);
> vpci_reset_device_state only resets the vpci state of pdev without deleting
> pdev from the domain, and pcidevs_lock is already held here, so is there no
> need to take pci_lock?

Strictly speaking, it is not enforced yet. However, an upcoming change [1] will 
expand the scope of d->pci_lock to include protecting the pdev->vpci structure 
to an extent, so it will be required once that change is committed. In my 
opinion there is no harm in adding the additional lock now. If you prefer to 
wait I would not object, but in this case I would at least ask for a TODO 
comment to help remind us to address it later.

[1] https://lists.xenproject.org/archives/html/xen-devel/2024-01/msg00446.html

> 
>>
>>> +pcidevs_unlock();
>>> +if ( ret )
>>> +printk(XENLOG_ERR "%pp: failed to reset PCI device state\n", &sbdf);
>>> +break;
>>> +}
>>> +
>>>  default:
>>>  ret = -ENOSYS;
>>>  break;
>>> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
>>> index 72ef277c4f8e..3c64cb10ccbb 100644
>>> --- a/xen/drivers/vpci/vpci.c
>>> +++ b/xen/drivers/vpci/vpci.c
>>> @@ -107,6 +107,15 @@ int vpci_add_handlers(struct pci_dev *pdev)
>>>  
>>>  return rc;
>>>  }
>>> +
>>> +int vpci_reset_device_state(struct pci_dev *pdev)
>>> +{
>>> +ASSERT(pcidevs_locked());
>>
>> ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
>>
>>> +
>>> +vpci_remove_device(pdev);
>>> +return vpci_add_handlers(pdev);
>>> +}
>>> +
>>>  #endif /* __XEN__ */
>>>  
>>>  static int vpci_register_cmp(const struct vpci_register *r1,
> 



Re: [PATCH 1/2] libxl: Remove cdrom forced QDISK w/ stubdom

2024-01-10 Thread Marek Marczykowski-Górecki
On Tue, Jan 09, 2024 at 03:46:54PM -0500, Jason Andryuk wrote:
> A Linux HVM domain ignores PV block devices with type cdrom.  The
> Windows PV drivers also ignore device-type != "disk".  Therefore QEMU's
> emulated CD-ROM support is used.  This allows ejection and other CD-ROM
> features to work.
> 
> With a stubdom, QEMU is running in the stubdom.  A PV disk is still
> connected into the stubdom, and then QEMU can emulate the CD-ROM into
> the guest.  This removes the need for forcing to a QDISK.  Relax the
> checks to support this.
> 
> Signed-off-by: Jason Andryuk 
> ---
>  tools/libs/light/libxl_disk.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/libs/light/libxl_disk.c b/tools/libs/light/libxl_disk.c
> index b65cad33cc..d1f84ef404 100644
> --- a/tools/libs/light/libxl_disk.c
> +++ b/tools/libs/light/libxl_disk.c
> @@ -192,7 +192,8 @@ static int libxl__device_disk_setdefault(libxl__gc *gc, 
> uint32_t domid,
>  
>  /* Force Qdisk backend for CDROM devices of guests with a device model. 
> */
>  if (disk->is_cdrom != 0 &&
> -libxl__domain_type(gc, domid) == LIBXL_DOMAIN_TYPE_HVM) {
> +libxl__domain_type(gc, domid) == LIBXL_DOMAIN_TYPE_HVM &&
> +!libxl_get_stubdom_id(CTX, domid)) {

Should this check for stubdomain flavor too? I guess it won't really
work with qemu-traditional.
Similar check also wants to be in the next patch, instead of completely
dropping stubdomain check.

>  if (!(disk->backend == LIBXL_DISK_BACKEND_QDISK ||
>disk->backend == LIBXL_DISK_BACKEND_UNKNOWN)) {
>  LOGD(ERROR, domid, "Backend for CD devices on HVM guests must be 
> Qdisk");
> -- 
> 2.43.0
> 

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab


signature.asc
Description: PGP signature


Re: [PATCH] x86/hvm: don't expose XENFEAT_hvm_pirqs by default

2024-01-10 Thread Xenia Ragiadakou




On 10/1/24 12:26, Jan Beulich wrote:

On 10.01.2024 10:53, Roger Pau Monne wrote:

The HVM pirq feature allows routing interrupts from both physical and emulated
devices over event channels; this was done as a performance improvement.  However
its usage is fully undocumented, and the only reference implementation is in
Linux.  It defeats the purpose of local APIC hardware virtualization, because
when using it interrupts avoid the usage of the local APIC altogether.


So without sufficient APIC acceleration, isn't this arranging for degraded
performance then? IOW should the new default perhaps be dependent on the
degree of APIC acceleration?


It has also been reported to not work properly with certain devices, at least
when using some AMD GPUs Linux attempts to route interrupts over event
channels, but Xen doesn't correctly detect such routing, which leads to the
hypervisor complaining with:

(XEN) d15v0: Unsupported MSI delivery mode 7 for Dom15

When MSIs are attempted to be routed over event channels the entry delivery
mode is set to ExtINT, but Xen doesn't detect such routing and attempts to
inject the interrupt following the native MSI path, and the ExtINT delivery
mode is not supported.


Shouldn't this be properly addressed nevertheless? The way it's described
it sounds as if MSI wouldn't work at all this way; I can't spot why the
issue would only be "with certain devices". Yet that in turn doesn't look
to be very likely - pass-through use cases, in particular SR-IOV ones,
would certainly have noticed.


The issue gets triggered when the guest performs save/restore of MSIs, 
because PHYSDEVOP_map_pirq is not implemented for MSIs, and thus, QEMU 
cannot remap the MSI to the event channel once unmapped.
So, to fix this issue we would need either to change QEMU to not unmap 
pirq-emulated MSIs or to implement PHYSDEVOP_map_pirq for MSIs.


But still, even when no device has been passed-through, scheduling 
latencies (of hundreds of ms), were observed in the guest even when 
running a simple loop application, that disappear once the flag is 
disabled. We did not have the chance to root cause it further.




Jan





Re: [PATCH 08/22] x86/pv: rewrite how building PV dom0 handles domheap mappings

2024-01-10 Thread El Yandouzi, Elias

Hi Jan,

I have been looking at this series recently and tried my best
to address your comments. I'll get to the other patches shortly too.

On 22/12/2022 11:48, Jan Beulich wrote:

On 16.12.2022 12:48, Julien Grall wrote:

From: Hongyan Xia 

Building a PV dom0 is allocating from the domheap but uses it like the
xenheap. This is clearly wrong. Fix.


"Clearly wrong" would mean there's a bug here, at lest under certain
conditions. But there isn't: Even on huge systems, due to running on
idle page tables, all memory is mapped at present.


I agree with you, I'll rephrase the commit message.




@@ -711,22 +715,32 @@ int __init dom0_construct_pv(struct domain *d,
  v->arch.pv.event_callback_cs= FLAT_COMPAT_KERNEL_CS;
  }
  
+#define UNMAP_MAP_AND_ADVANCE(mfn_var, virt_var, maddr) \

+do {\
+UNMAP_DOMAIN_PAGE(virt_var);\


Not much point using the macro when ...


+mfn_var = maddr_to_mfn(maddr);  \
+maddr += PAGE_SIZE; \
+virt_var = map_domain_page(mfn_var);\


... the variable gets reset again to non-NULL unconditionally right
away.


Sure, I'll change that.




+} while ( false )


This being a local macro and all use sites passing mpt_alloc as the
last argument, I think that parameter wants dropping, which would
improve readability.


I have to disagree. It wouldn't improve readability but would only make 
things more obscure. I'll keep the macro as is.





@@ -792,9 +808,9 @@ int __init dom0_construct_pv(struct domain *d,
  if ( !l3e_get_intpte(*l3tab) )
  {
  maddr_to_page(mpt_alloc)->u.inuse.type_info = 
PGT_l2_page_table;
-l2tab = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
-clear_page(l2tab);
-*l3tab = l3e_from_paddr(__pa(l2tab), L3_PROT);
+UNMAP_MAP_AND_ADVANCE(l2start_mfn, l2start, mpt_alloc);
+clear_page(l2start);
+*l3tab = l3e_from_mfn(l2start_mfn, L3_PROT);


The l2start you map on the last iteration here can be re-used ...


@@ -805,9 +821,17 @@ int __init dom0_construct_pv(struct domain *d,
  unmap_domain_page(l2t);
  }


... in the code the tail of which is visible here, eliminating a
redundant map/unmap pair.


Good catch, I'll remove the redundant pair.




@@ -977,8 +1001,12 @@ int __init dom0_construct_pv(struct domain *d,
   * !CONFIG_VIDEO case so the logic here can be simplified.
   */
  if ( pv_shim )
+{
+l4start = map_domain_page(l4start_mfn);
  pv_shim_setup_dom(d, l4start, v_start, vxenstore_start, 
vconsole_start,
vphysmap_start, si);
+UNMAP_DOMAIN_PAGE(l4start);
+}


The, at the first glance, redundant re-mapping of the L4 table here could
do with explaining in the description. However, I further wonder in how
far in shim mode eliminating the direct map is actually useful. Which is
to say that I question the need for this change in the first place. Or
wait - isn't this (unlike the rest of this patch) actually a bug fix? At
this point we're on the domain's page tables, which may not cover the
page the L4 is allocated at (if a truly huge shim was configured). So I
guess the change is needed but wants breaking out, allowing to at least
consider whether to backport it.



I will create a separate patch for this change.


Jan





Re: [PATCH v6 07/11] virtio-gpu: Handle resource blob commands

2024-01-10 Thread Pierre-Eric Pelloux-Prayer




Le 21/12/2023 à 09:09, Akihiko Odaki a écrit :

On 2023/12/19 16:53, Huang Rui wrote:

From: Antonio Caggiano 

Support BLOB resources creation, mapping and unmapping by calling the
new stable virglrenderer 0.10 interface. Only enabled when available and
via the blob config. E.g. -device virtio-vga-gl,blob=true


I have another concern about delaying virgl_renderer_resource_unref() until the 
resource gets unmapped; the guest will expect the resource ID to be available 
for a new resource immediately after VIRTIO_GPU_CMD_RESOURCE_UNREF, but this 
will break that assumption and may corrupt things.



Yes this is a problem.

And another issue is that virglrenderer is not really thread-safe, so this call stack:

#0  virgl_resource_blob_async_unmap ()
#1  object_finalize ()
#2  object_unref ()
#3  memory_region_unref ()
#4  flatview_destroy ()
#5  call_rcu_thread ()
#6  qemu_thread_start ()

Will call into virgl_renderer_ctx_resource_unmap which in turn uses 
virgl_resource_lookup
without any multithreading considerations.


Regards,
Pierre-Eric



[PATCH v2] x86/hvm: don't expose XENFEAT_hvm_pirqs by default

2024-01-10 Thread Roger Pau Monne
The HVM pirq feature allows routing interrupts from both physical and emulated
devices over event channels; this was done as a performance improvement.  However
its usage is fully undocumented, and the only reference implementation is in
Linux.  It defeats the purpose of local APIC hardware virtualization, because
when using it interrupts avoid the usage of the local APIC altogether.

It has also been reported to not work properly with certain devices, at least
when using some AMD GPUs Linux attempts to route interrupts over event
channels, but Xen doesn't correctly detect such routing, which leads to the
hypervisor complaining with:

(XEN) d15v0: Unsupported MSI delivery mode 7 for Dom15

When MSIs are attempted to be routed over event channels the entry delivery
mode is set to ExtINT, but Xen doesn't detect such routing and attempts to
inject the interrupt following the native MSI path, and the ExtINT delivery
mode is not supported.

Disable HVM PIRQs by default and provide a per-domain option in xl.cfg to
enable such feature.  Also for backwards compatibility keep the feature enabled
for any resumed domains that don't have an explicit selection.

Note that the only user of the feature (Linux) is also able to handle native
interrupts fine, as the feature was already not used if Xen reported local APIC
hardware virtualization active.

Link: https://github.com/QubesOS/qubes-issues/issues/7971
Signed-off-by: Roger Pau Monné 
---
Changes since v1:
 - Fix libxl for PV guests.
---
 docs/man/xl.cfg.5.pod.in  |  7 +++
 tools/include/libxl.h |  7 +++
 tools/libs/light/libxl_create.c   |  7 +--
 tools/libs/light/libxl_types.idl  |  1 +
 tools/libs/light/libxl_x86.c  | 12 +---
 tools/python/xen/lowlevel/xc/xc.c |  4 +++-
 tools/xl/xl_parse.c   |  1 +
 xen/arch/x86/domain.c |  4 +++-
 8 files changed, 36 insertions(+), 7 deletions(-)

diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
index 2e234b450efb..ea8d41727d8e 100644
--- a/docs/man/xl.cfg.5.pod.in
+++ b/docs/man/xl.cfg.5.pod.in
@@ -2460,6 +2460,13 @@ The viridian option can be specified as a boolean. A value of true (1)
 is equivalent to the list [ "defaults" ], and a value of false (0) is
 equivalent to an empty list.
 
+=item B
+
+Select whether the guest is allowed to route interrupts from devices (either
+emulated or passed through) over event channels.
+
+This option is disabled by default.
+
 =back
 
 =head3 Emulated VGA Graphics Device
diff --git a/tools/include/libxl.h b/tools/include/libxl.h
index 907aa0a3303a..f1652b1664f0 100644
--- a/tools/include/libxl.h
+++ b/tools/include/libxl.h
@@ -608,6 +608,13 @@
  * executable in order to not run it as the same user as libxl.
  */
 
+/*
+ * LIBXL_HAVE_HVM_PIRQ indicates the presence of the u.hvm.pirq field in
+ * libxl_domain_build_info that signals whether an HVM guest has access to
+ * the XENFEAT_hvm_pirqs feature.
+ */
+#define LIBXL_HAVE_HVM_PIRQ 1
+
 /*
  * libxl memory management
  *
diff --git a/tools/libs/light/libxl_create.c b/tools/libs/light/libxl_create.c
index ce1d43110336..0008fac607e3 100644
--- a/tools/libs/light/libxl_create.c
+++ b/tools/libs/light/libxl_create.c
@@ -376,6 +376,7 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc,
libxl_defbool_setdefault(&b_info->u.hvm.usb, false);
libxl_defbool_setdefault(&b_info->u.hvm.vkb_device, true);
libxl_defbool_setdefault(&b_info->u.hvm.xen_platform_pci, true);
+libxl_defbool_setdefault(&b_info->u.hvm.pirq, false);
 
libxl_defbool_setdefault(&b_info->u.hvm.spice.enable, false);
 if (!libxl_defbool_val(b_info->u.hvm.spice.enable) &&
@@ -2375,10 +2376,12 @@ int libxl_domain_create_restore(libxl_ctx *ctx, 
libxl_domain_config *d_config,
 
 /*
  * When restoring (either from a save file or for a migration domain) set
- * the MSR relaxed mode for compatibility with older Xen versions if the
- * option is not set as part of the original configuration.
+ * the MSR relaxed mode and HVM PIRQs for compatibility with older Xen
+ * versions if the options are not set as part of the original
+ * configuration.
  */
libxl_defbool_setdefault(&d_config->b_info.arch_x86.msr_relaxed, true);
+libxl_defbool_setdefault(&d_config->b_info.u.hvm.pirq, true);
 
 return do_domain_create(ctx, d_config, domid, restore_fd, send_back_fd,
 params, ao_how, aop_console_how);
diff --git a/tools/libs/light/libxl_types.idl b/tools/libs/light/libxl_types.idl
index 7d8bd5d21667..899ad3096926 100644
--- a/tools/libs/light/libxl_types.idl
+++ b/tools/libs/light/libxl_types.idl
@@ -692,6 +692,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
("rdm", libxl_rdm_reserve),
("rdm_mem_boundary_memkb", MemKB),
("mca_caps",

[ovmf test] 184305: all pass - PUSHED

2024-01-10 Thread osstest service owner
flight 184305 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/184305/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf bc34a79cd2a005e1d12d4b05bec6efc3b102cad6
baseline version:
 ovmf 265b4ab91b8a31d6d1760ad1eaa1e16f9244cba7

Last test of basis   184302  2024-01-10 06:11:07 Z0 days
Testing same since   184305  2024-01-10 10:41:30 Z0 days1 attempts


People who touched revisions under test:
  Nickle Wang 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   265b4ab91b..bc34a79cd2  bc34a79cd2a005e1d12d4b05bec6efc3b102cad6 -> 
xen-tested-master



[linux-5.4 test] 184298: regressions - FAIL

2024-01-10 Thread osstest service owner
flight 184298 linux-5.4 real [real]
flight 184304 linux-5.4 real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/184298/
http://logs.test-lab.xenproject.org/osstest/logs/184304/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-armhf-armhf-xl-multivcpu 18 guest-start/debian.repeat fail REGR. vs. 
184192

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-xl-rtds18 guest-start/debian.repeat fail REGR. vs. 184192

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl-credit1 18 guest-start/debian.repeat fail blocked in 184192
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail blocked in 
184192
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 184192
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 184192
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 184192
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 184192
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 184192
 test-armhf-armhf-xl-credit2  18 guest-start/debian.repeatfail  like 184192
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 184192
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 184192
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 184192
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 184192
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 184192
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 184192
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   

Re: [PATCH] x86/hvm: don't expose XENFEAT_hvm_pirqs by default

2024-01-10 Thread Roger Pau Monné
On Wed, Jan 10, 2024 at 11:26:02AM +0100, Jan Beulich wrote:
> On 10.01.2024 10:53, Roger Pau Monne wrote:
> > The HVM pirq feature allows routing interrupts from both physical and 
> > emulated
> > devices over event channels; this was done as a performance improvement.
> > However
> > its usage is fully undocumented, and the only reference implementation is in
> > Linux.  It defeats the purpose of local APIC hardware virtualization, 
> > because
> > when using it interrupts avoid the usage of the local APIC altogether.
> 
> So without sufficient APIC acceleration, isn't this arranging for degraded
> performance then? IOW should the new default perhaps be dependent on the
> degree of APIC acceleration?

Maybe, I certainly have no figures, but given the feature is not
working as expected I don't think we should block disabling it for
performance reasons.  Reliability should always come before
performance.

> > It has also been reported to not work properly with certain devices, at 
> > least
> > when using some AMD GPUs Linux attempts to route interrupts over event
> > channels, but Xen doesn't correctly detect such routing, which leads to the
> > hypervisor complaining with:
> > 
> > (XEN) d15v0: Unsupported MSI delivery mode 7 for Dom15
> > 
> > When MSIs are attempted to be routed over event channels the entry delivery
> > mode is set to ExtINT, but Xen doesn't detect such routing and attempts to
> > inject the interrupt following the native MSI path, and the ExtINT delivery
> > mode is not supported.
> 
> Shouldn't this be properly addressed nevertheless?

There's no documentation at all about the HVM PIRQ interface, so
every time I had to deal with it I had to guess and reverse engineer
how it's supposed to work.

> The way it's described
> it sounds as if MSI wouldn't work at all this way; I can't spot why the
> issue would only be "with certain devices". Yet that in turn doesn't look
> to be very likely - pass-through use cases, in particular SR-IOV ones,
> would certainly have noticed.

I think it has to do with when/how the MSI is updated.  I can't repro
myself, which makes fixing even more complicated.

TBH, I think the feature is simply broken, and I don't feel like
spending more time fixing it.  The fact that it was added in the first
place, and enabled by default was IMO a mistake.

If someone is willing to fix the issue, I'm fine with that, but I
certainly don't want to spend more time fixing issues for an
undocumented feature, the more that going forward such feature is
likely to be even less relevant with hardware APIC virtualization
being more widely available.

FWIW, this patch has an issue in libxl with PV guests, so will need to
send v2.

Thanks, Roger.



Re: [PATCH v2 08/15] VMX: convert vmx_basic_msr

2024-01-10 Thread Jan Beulich
On 09.01.2024 15:54, Andrew Cooper wrote:
> On 09/01/2024 2:39 pm, Jan Beulich wrote:
>> Even the inner struct plan falls apart pretty quickly (or grows what
>> needs doing by too much for my taste, in the context right here):
>> While for basic_msr this works, and it would apparently also work
>> for vmfunc and tertiary exec control (the latter is itself only part
>> of a yet to be reviewed / approved patch), it doesn't for all the
>> others with split 0-setting and 1-setting halves. This is because
>> what VMX code wants are the calculated values to put in the VMCS,
>> whereas imo in the policy we'd want to store both halves (and what
>> exactly wants to be in the host policy there isn't really clear to
>> me). As a result I can't create a single uniform structure properly
>> serving both purposes. Nor could I have VMX code use the host
>> policy for most of its capability checks.
>>
>> Thought / ideas?
> 
> If it's not actually trivial, then don't worry.
> 
> The policy does need to hold the architectural representation.  The
> in-use settings need storing per-vCPU because they do (or need to me
> made to) vary based on the configuration of the VM, and because they're
> needed on every virtual vmentry when re-calculating VMCS02.

Would it help if I did a hybrid approach, i.e. move to raw/host policies
only what can be easily moved, with the rest kept as is (perhaps with
vmx_caps renamed to vmx_ctls then)?

Jan



Re: [PATCH] x86/hvm: don't expose XENFEAT_hvm_pirqs by default

2024-01-10 Thread Jan Beulich
On 10.01.2024 10:53, Roger Pau Monne wrote:
> The HVM pirq feature allows routing interrupts from both physical and emulated
> devices over event channels; this was done as a performance improvement.  However
> its usage is fully undocumented, and the only reference implementation is in
> Linux.  It defeats the purpose of local APIC hardware virtualization, because
> when using it interrupts avoid the usage of the local APIC altogether.

So without sufficient APIC acceleration, isn't this arranging for degraded
performance then? IOW should the new default perhaps be dependent on the
degree of APIC acceleration?

> It has also been reported to not work properly with certain devices, at least
> when using some AMD GPUs Linux attempts to route interrupts over event
> channels, but Xen doesn't correctly detect such routing, which leads to the
> hypervisor complaining with:
> 
> (XEN) d15v0: Unsupported MSI delivery mode 7 for Dom15
> 
> When MSIs are attempted to be routed over event channels the entry delivery
> mode is set to ExtINT, but Xen doesn't detect such routing and attempts to
> inject the interrupt following the native MSI path, and the ExtINT delivery
> mode is not supported.

Shouldn't this be properly addressed nevertheless? The way it's described
it sounds as if MSI wouldn't work at all this way; I can't spot why the
issue would only be "with certain devices". Yet that in turn doesn't look
to be very likely - pass-through use cases, in particular SR-IOV ones,
would certainly have noticed.

Jan



Re: AMD EPYC VM to VM performance investigation

2024-01-10 Thread David Morel
On Thu, Jan 04, 2024 at 16:39:46PM, Stefano Stabellini wrote:
> On Thu, 4 Jan 2024, David Morel wrote:
> > Hello,
> > 
> > We have a customer and multiple users on our forum seeing performance that
> > seems quite low relative to the general performance of the machines on AMD
> > EPYC Zen hosts when doing VM to VM networking.
> 
> By "VM to VM networking" I take you mean VM-to-VM on the same host using
> PV network?
> 
> 
> > Below you'll find a write up about what we had a look at and what's in the
> > TODO on our side, but in the meantime we would like to ask here for some
> > feedback, suggestions and possible leads.
> > 
> > To sum up, the VM to VM performance on Zen generation server CPUs seems
> > quite low, and only scales minimally when adding threads. They are
> > outperformed by a 10-year-old AMD desktop CPU and a pretty low-frequency
> > Xeon Bronze from 2014.
> > CPU usage does not seem to be the limiting factor as neither the VM threads 
> > or
> > the kthreads on host seems to go to a 100% cpu usage.
> > 
> > As we're Vates, I'm talking about XCP-ng here, so Xen 4.13.5 and a dom0 
> > kernel
> > 4.19. I did try a Xen 4.18-rc2 and kernel 6.1.56 on a Zen4 epyc, but as it 
> > was
> > borrowed from a colleague I was unsure of the setup, so although it was
> > actually worse than on my other test setups, I would not consider that a
> > complete validation the issues is also present on recent Xen versions.
> 
> I think it might be difficult to triage this if you are working on a
> Xen/Linux version that is so different from upstream
I ran some tests on a Xen 4.13.5 with a dom0 in 6.6.10, and on an XCP-ng on
the same machine, the performances are similar, a few percent better on
the recent Xen, but still pretty low for such a machine and similar to
other EPYC we looked at.

> 
> > 1. Has anybody else noticed a similar behavior?
> > 2. Has anybody done any kind of investigation about it beside us?
> > 3. Any insight and suggestions of other points to look at would be welcome 
> > :)
> > 
> > And now the lengthy part about what we tested, I tried to make it shorter 
> > and
> > more legible than a full report…
> > 
> > Investigated
> > 
> > 
> > - Bench various cpu with iperf2 (iperf3 is not actually multithreaded):
> >   - amd fx8320e, xeon 3106: not impacted.
> >   - epyc 7451, 7443, 7302p, 7313p, 9124: impacted, but the zen4 one scales a
> > bit more than zen1, 2 and 3.
> >   - ryzen 5950x, ryzen 7600: performances should likely be better than
> > observed results, but still way better than epycs, and scaling nicely 
> > with
> > more threads.
> > - Bench with tinymembench[1]: performances were as expected and didn't show
> >   issues with rep movsb as discussed in this article[2] and issue[3]. Which
> >   makes sense as it looks like this issues is related to ERMS support which 
> > is
> >   not present on Zen1 and 2 where the issue has been raised.
> > - Bench skb allocation with a small kernel module measuring cycles: actually
> >   same or lower on epyc than on the xeon with higher frequency so can be
> >   considered faster and likely not related to our issue.
> > - mitigations: we tried disabling what can be disabled through boot
> >   parameters, both for xen, dom0 and guests, but this made no differences.
> > - disabling AVX; Zen cpus before zen4 are know to limit boost and cpu 
> > scaling
> >   when doing heavy AVX load on one core, there was no reason to think this 
> > was
> >   related, but it was a quick test and as expected had no effect.
> > - localhost iperf bench on dom0 and guests: we noticed that on other 
> > machines
> >   host/guest with 1 threads are almost 1:1, with 4 threads guests are about
> >   generally not scaling as well in guests. On epyc machines, host tests were
> >   significantly slower than guests both with 1 and 4 threads, first
> >   investigation of profiling didn't help finding a cause yet. More in the
> >   profiling and TODO.
> 
> Wait, are you saying that the localhost iperf benchmark is faster in a
> VM compared to host ("host" I take means baremetal Linux without a
> hypervisor) ?   Maybe you meant the other way around?
> 
> 
> > - cpu load: top/htop/xentop all seem to indicate that machines are not under
> >   full load, queue allocations on dom0 for VIF are by default (1 per vcpu) 
> > and
> >   seem to be all used when traffic is running but at a percentage below 100%
> >   per core/thread.
> > - pinning: manually pinning dom0 and guests to the same node and avoiding
> >   sharing cpu "threads" between host and guests gives a minimal increase of 
> > a
> >   few percents, but nothing drastic. Note, we do not know about the
> >   ccd/ccx/node mapping on these cpus, so we are not sure all memory access 
> > are
> >   "local".
> > - sched weight: playing with sched weight to prioritize dom0 did not make a
> >   difference either, which makes sense as the system are not under full 
> > load.
> > - cpu scaling: it is 

Re: [PATCH v12 07/15] rangeset: add rangeset_purge() function

2024-01-10 Thread Jan Beulich
On 09.01.2024 22:51, Stewart Hildebrand wrote:
> From: Volodymyr Babchuk 
> 
> This function can be used when user wants to remove all rangeset
> entries but do not want to destroy rangeset itself.
> 
> Signed-off-by: Volodymyr Babchuk 
> Signed-off-by: Stewart Hildebrand 

Acked-by: Jan Beulich 





[PATCH] x86/hvm: don't expose XENFEAT_hvm_pirqs by default

2024-01-10 Thread Roger Pau Monne
The HVM pirq feature allows routing interrupts from both physical and emulated
devices over event channels; this was done as a performance improvement.  However
its usage is fully undocumented, and the only reference implementation is in
Linux.  It defeats the purpose of local APIC hardware virtualization, because
when using it interrupts avoid the usage of the local APIC altogether.

It has also been reported to not work properly with certain devices, at least
when using some AMD GPUs Linux attempts to route interrupts over event
channels, but Xen doesn't correctly detect such routing, which leads to the
hypervisor complaining with:

(XEN) d15v0: Unsupported MSI delivery mode 7 for Dom15

When MSIs are attempted to be routed over event channels the entry delivery
mode is set to ExtINT, but Xen doesn't detect such routing and attempts to
inject the interrupt following the native MSI path, and the ExtINT delivery
mode is not supported.

Disable HVM PIRQs by default and provide a per-domain option in xl.cfg to
enable such feature.  Also for backwards compatibility keep the feature enabled
for any resumed domains that don't have an explicit selection.

Note that the only user of the feature (Linux) is also able to handle native
interrupts fine, as the feature was already not used if Xen reported local APIC
hardware virtualization active.

Link: https://github.com/QubesOS/qubes-issues/issues/7971
Signed-off-by: Roger Pau Monné 
---
 docs/man/xl.cfg.5.pod.in  |  7 +++
 tools/include/libxl.h |  7 +++
 tools/libs/light/libxl_create.c   |  7 +--
 tools/libs/light/libxl_types.idl  |  1 +
 tools/libs/light/libxl_x86.c  | 11 ---
 tools/python/xen/lowlevel/xc/xc.c |  4 +++-
 tools/xl/xl_parse.c   |  1 +
 xen/arch/x86/domain.c |  4 +++-
 8 files changed, 35 insertions(+), 7 deletions(-)

diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
index 2e234b450efb..ea8d41727d8e 100644
--- a/docs/man/xl.cfg.5.pod.in
+++ b/docs/man/xl.cfg.5.pod.in
@@ -2460,6 +2460,13 @@ The viridian option can be specified as a boolean. A value of true (1)
 is equivalent to the list [ "defaults" ], and a value of false (0) is
 equivalent to an empty list.
 
+=item B<pirq=BOOLEAN>
+
+Select whether the guest is allowed to route interrupts from devices (either
+emulated or passed through) over event channels.
+
+This option is disabled by default.
+
 =back
 
 =head3 Emulated VGA Graphics Device
diff --git a/tools/include/libxl.h b/tools/include/libxl.h
index 907aa0a3303a..f1652b1664f0 100644
--- a/tools/include/libxl.h
+++ b/tools/include/libxl.h
@@ -608,6 +608,13 @@
  * executable in order to not run it as the same user as libxl.
  */
 
+/*
+ * LIBXL_HAVE_HVM_PIRQ indicates the presence of the u.hvm.pirq field in
+ * libxl_domain_build_info that signals whether an HVM guest has access to
+ * the XENFEAT_hvm_pirqs feature.
+ */
+#define LIBXL_HAVE_HVM_PIRQ 1
+
 /*
  * libxl memory management
  *
diff --git a/tools/libs/light/libxl_create.c b/tools/libs/light/libxl_create.c
index ce1d43110336..0008fac607e3 100644
--- a/tools/libs/light/libxl_create.c
+++ b/tools/libs/light/libxl_create.c
@@ -376,6 +376,7 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc,
     libxl_defbool_setdefault(&b_info->u.hvm.usb,  false);
     libxl_defbool_setdefault(&b_info->u.hvm.vkb_device,   true);
     libxl_defbool_setdefault(&b_info->u.hvm.xen_platform_pci, true);
+    libxl_defbool_setdefault(&b_info->u.hvm.pirq, false);
 
 libxl_defbool_setdefault(_info->u.hvm.spice.enable, false);
 if (!libxl_defbool_val(b_info->u.hvm.spice.enable) &&
@@ -2375,10 +2376,12 @@ int libxl_domain_create_restore(libxl_ctx *ctx, libxl_domain_config *d_config,
 
 /*
  * When restoring (either from a save file or for a migration domain) set
- * the MSR relaxed mode for compatibility with older Xen versions if the
- * option is not set as part of the original configuration.
+ * the MSR relaxed mode and HVM PIRQs for compatibility with older Xen
+ * versions if the options are not set as part of the original
+ * configuration.
  */
     libxl_defbool_setdefault(&d_config->b_info.arch_x86.msr_relaxed, true);
+    libxl_defbool_setdefault(&d_config->b_info.u.hvm.pirq, true);
 
 return do_domain_create(ctx, d_config, domid, restore_fd, send_back_fd,
 params, ao_how, aop_console_how);
diff --git a/tools/libs/light/libxl_types.idl b/tools/libs/light/libxl_types.idl
index 7d8bd5d21667..899ad3096926 100644
--- a/tools/libs/light/libxl_types.idl
+++ b/tools/libs/light/libxl_types.idl
@@ -692,6 +692,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
("rdm", libxl_rdm_reserve),
("rdm_mem_boundary_memkb", MemKB),
("mca_caps", uint64),
+   

Re: [PATCH v5 09/13] xen: add cache coloring allocator for domains

2024-01-10 Thread Jan Beulich
On 10.01.2024 01:46, Stefano Stabellini wrote:
> On Tue, 9 Jan 2024, Jan Beulich wrote:
>> On 02.01.2024 10:51, Carlo Nonato wrote:
>>> This commit adds a new memory page allocator that implements the cache
>>> coloring mechanism. The allocation algorithm enforces equal frequency
>>> distribution of cache partitions, following the coloring configuration of a
>>> domain. This allows an even utilization of cache sets for every domain.
>>>
>>> Pages are stored in a color-indexed array of lists. Those lists are filled
>>> by a simple init function which computes the color of each page.
>>> When a domain requests a page, the allocator extracts the page from the list
>>> with the maximum number of free pages among those that the domain can
>>> access, given its coloring configuration.
>>>
>>> The allocator can only handle requests of order-0 pages. This allows for
>>> easier implementation and since cache coloring targets only embedded 
>>> systems,
>>> it's assumed not to be a major problem.
>>
>> I'm curious about the specific properties of embedded systems that makes
>> the performance implications of deeper page walks less of an issue for
>> them.
> 
> I think Carlo meant to say that embedded systems tend to have a smaller
> amount of RAM (our boards today have 4-8GB of total memory). So higher
> level allocations (2MB/1GB) might not be possible.
> 
> Also, domains that care about interrupt latency tend to be RTOSes (e.g.
> Zephyr, FreeRTOS) and RTOSes are happy to run with less than 1MB of
> total memory available. This is so true that I vaguely remember hitting
> a bug in xl/libxl when I tried to create a domain with 128KB of memory. 
> 
> 
>> Nothing is said about address-constrained allocations. Are such entirely
>> of no interest to domains on Arm, not even to Dom0 (e.g. for filling
>> Linux'es swiotlb)?
> 
> Cache coloring is useful if you can use an IOMMU with all the
> dma-capable devices. If that is not the case, then not even Dom0 would
> be able to boot with cache coloring enabled (because it wouldn't be 1:1
> mapped).
> 
> On ARM we only support booting Dom0 1:1 mapped, or not-1:1-mapped but
> relying on the IOMMU.

So another constraint to be enforced both at the Kconfig level and at
runtime? That said, Linux'es swiotlb allocation can't know whether an
IOMMU is in use by Xen. If something like that was done in a Dom0, the
respective allocations still wouldn't really work correctly (and the
kernel may or may not choke on this).

Jan



Re: [PATCH v6 07/11] virtio-gpu: Handle resource blob commands

2024-01-10 Thread Pierre-Eric Pelloux-Prayer




Le 09/01/2024 à 17:50, Pierre-Eric Pelloux-Prayer a écrit :



Le 19/12/2023 à 08:53, Huang Rui a écrit :

From: Antonio Caggiano 

Support BLOB resources creation, mapping and unmapping by calling the
new stable virglrenderer 0.10 interface. Only enabled when available and
via the blob config. E.g. -device virtio-vga-gl,blob=true

Signed-off-by: Antonio Caggiano 
Signed-off-by: Dmitry Osipenko 
Signed-off-by: Xenia Ragiadakou 
Signed-off-by: Huang Rui 
---

Changes in v6:
- Use new struct virgl_gpu_resource.
- Unmap, unref and destroy the resource only after the memory region
   has been completely removed.
- In unref check whether the resource is still mapped.
- In unmap_blob check whether the resource has been already unmapped.
- Fix coding style

  hw/display/virtio-gpu-virgl.c | 274 +-
  hw/display/virtio-gpu.c   |   4 +-
  meson.build   |   4 +
  3 files changed, 276 insertions(+), 6 deletions(-)

diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
index faab374336..5a3a292f79 100644
--- a/hw/display/virtio-gpu-virgl.c
+++ b/hw/display/virtio-gpu-virgl.c
@@ -17,6 +17,7 @@
  #include "trace.h"
  #include "hw/virtio/virtio.h"
  #include "hw/virtio/virtio-gpu.h"
+#include "hw/virtio/virtio-gpu-bswap.h"
  #include "ui/egl-helpers.h"
@@ -24,8 +25,62 @@
  struct virgl_gpu_resource {
  struct virtio_gpu_simple_resource res;
+    uint32_t ref;
+    VirtIOGPU *g;
+
+#ifdef HAVE_VIRGL_RESOURCE_BLOB
+    /* only blob resource needs this region to be mapped as guest mmio */
+    MemoryRegion *region;
+#endif
  };
+static void vres_get_ref(struct virgl_gpu_resource *vres)
+{
+    uint32_t ref;
+
+    ref = qatomic_fetch_inc(&vres->ref);
+    g_assert(ref < INT_MAX);
+}
+
+static void virgl_resource_destroy(struct virgl_gpu_resource *vres)
+{
+    struct virtio_gpu_simple_resource *res;
+    VirtIOGPU *g;
+
+    if (!vres) {
+    return;
+    }
+
+    g = vres->g;
+    res = &vres->res;
+    QTAILQ_REMOVE(&g->reslist, res, next);
+    virtio_gpu_cleanup_mapping(g, res);
+    g_free(vres);
+}
+
+static void virgl_resource_unref(struct virgl_gpu_resource *vres)
+{
+    struct virtio_gpu_simple_resource *res;
+
+    if (!vres) {
+    return;
+    }
+
+    res = &vres->res;
+    virgl_renderer_resource_detach_iov(res->resource_id, NULL, NULL);
+    virgl_renderer_resource_unref(res->resource_id);
+}
+
+static void vres_put_ref(struct virgl_gpu_resource *vres)
+{
+    g_assert(vres->ref > 0);
+
+    if (qatomic_fetch_dec(&vres->ref) == 1) {
+    virgl_resource_unref(vres);
+    virgl_resource_destroy(vres);
+    }
+}
+
  static struct virgl_gpu_resource *
  virgl_gpu_find_resource(VirtIOGPU *g, uint32_t resource_id)
  {
@@ -59,6 +114,8 @@ static void virgl_cmd_create_resource_2d(VirtIOGPU *g,
 c2d.width, c2d.height);
  vres = g_new0(struct virgl_gpu_resource, 1);
+    vres_get_ref(vres);
+    vres->g = g;
  vres->res.width = c2d.width;
  vres->res.height = c2d.height;
  vres->res.format = c2d.format;
@@ -91,6 +148,8 @@ static void virgl_cmd_create_resource_3d(VirtIOGPU *g,
 c3d.width, c3d.height, c3d.depth);
  vres = g_new0(struct virgl_gpu_resource, 1);
+    vres_get_ref(vres);
+    vres->g = g;
  vres->res.width = c3d.width;
  vres->res.height = c3d.height;
  vres->res.format = c3d.format;
@@ -126,12 +185,21 @@ static void virgl_cmd_resource_unref(VirtIOGPU *g,
  return;
  }
-    virgl_renderer_resource_detach_iov(unref.resource_id, NULL, NULL);
-    virgl_renderer_resource_unref(unref.resource_id);
+#ifdef HAVE_VIRGL_RESOURCE_BLOB
+    if (vres->region) {
+    VirtIOGPUBase *b = VIRTIO_GPU_BASE(g);
+    MemoryRegion *mr = vres->region;
+
+    warn_report("%s: blob resource %d not unmapped",
+    __func__, unref.resource_id);
+    vres->region = NULL;


Shouldn't there be a call to memory_region_unref(mr)?


+    memory_region_set_enabled(mr, false);
+    memory_region_del_subregion(&b->hostmem, mr);
+    object_unparent(OBJECT(mr));
+    }
+#endif /* HAVE_VIRGL_RESOURCE_BLOB */
-    QTAILQ_REMOVE(&g->reslist, &vres->res, next);
-    virtio_gpu_cleanup_mapping(g, &vres->res);
-    g_free(vres);
+    vres_put_ref(vres);
  }
  static void virgl_cmd_context_create(VirtIOGPU *g,
@@ -470,6 +538,191 @@ static void virgl_cmd_get_capset(VirtIOGPU *g,
  g_free(resp);
  }
+#ifdef HAVE_VIRGL_RESOURCE_BLOB
+
+static void virgl_resource_unmap(struct virgl_gpu_resource *vres)
+{
+    if (!vres) {
+    return;
+    }
+
+    virgl_renderer_resource_unmap(vres->res.resource_id);
+
+    vres_put_ref(vres);
+}
+
+static void virgl_resource_blob_async_unmap(void *obj)
+{
+    MemoryRegion *mr = MEMORY_REGION(obj);
+    struct virgl_gpu_resource *vres = mr->opaque;
+
+    virgl_resource_unmap(vres);
+
+    g_free(obj);
+}
+
+static void virgl_cmd_resource_create_blob(VirtIOGPU *g,
+  

Re: [RFC XEN PATCH v4 4/5] domctl: Use gsi to grant/revoke irq permission

2024-01-10 Thread Chen, Jiqian
Thank you Jan and Roger; I think I now understand how to add a new hypercall 
XEN_DOMCTL_gsi_permission, and I will implement it in the next version.

On 2024/1/9 18:46, Jan Beulich wrote:
> On 09.01.2024 11:16, Chen, Jiqian wrote:
>> On 2024/1/9 17:38, Jan Beulich wrote:
>>> On 09.01.2024 09:18, Chen, Jiqian wrote:
 A new hypercall using for granting gsi? If so, how does the caller know to 
 call which hypercall to grant permission, XEN_DOMCTL_irq_permission or 
 that new hypercall?
>>>
>>> Either we add a feature indicator, or the caller simply tries the
>>> new GSI interface first.
>> I am still not sure how to use and implement it.
>> Taking pci_add_dm_done as an example, for now its implementation is:
>> pci_add_dm_done
>>  xc_physdev_map_pirq
>>  xc_domain_irq_permission(,,pirq,)
>>  XEN_DOMCTL_irq_permission
>>
>> And assume the new hypercall is XEN_DOMCTL_gsi_permission, do you mean:
>> pci_add_dm_done
>>  xc_physdev_map_pirq
>>  ret = xc_domain_gsi_permission(,,gsi,)
>>  XEN_DOMCTL_gsi_permission
>>  if ( ret != 0 )
>>  xc_domain_irq_permission(,,pirq,)
>>  XEN_DOMCTL_irq_permission
> 
> No, falling back shouldn't be "blind". Fallback should only happen
> when the new sub-op isn't implemented (hence why a feature indicator
> may be necessary), and only if calling the existing sub-op promises
> to be useful (which iirc would limit that to the PV Dom0 case).
> 
>> But if so, I have a question that in XEN_DOMCTL_gsi_permission, when to fail 
>> and when to success?
> 
> I'm afraid I don't understand the question. Behavior there isn't to
> be fundamentally different from that for XEN_DOMCTL_irq_permission.
> It's just that the incoming value is in another value space.
> 
>> Or do you mean:
>> pci_add_dm_done
>>  xc_physdev_map_pirq
>>  ret = xc_domain_irq_permission(,,pirq,)
>>  XEN_DOMCTL_irq_permission
>>  if ( ret != 0 )
>>  xc_domain_gsi_permission(,,gsi,)
>>  XEN_DOMCTL_gsi_permission
> 
> No, this looks the wrong way round.
> 
>> And in XEN_DOMCTL_gsi_permission, as long as the current domain has the 
>> access of gsi, then granting gsi to caller should be successful. Right?
> 
> I think so; see above.
> 
> Jan

-- 
Best regards,
Jiqian Chen.


Re: [PATCH v2 4/7] xen/device-tree: Fix bootfdt.c to tolerate 0 reserved regions

2024-01-10 Thread Michal Orzel



On 09/01/2024 19:14, Julien Grall wrote:
> 
> 
> (+ Stefano)
> 
> Hi Shawn,
> 
> On 15/12/2023 02:43, Shawn Anastasio wrote:
>> The early_print_info routine in bootfdt.c incorrectly stores the result
>> of a call to fdt_num_mem_rsv() in an unsigned int, which results in the
>> negative error code being interpreted incorrectly in a subsequent loop
>> in the case where the device tree does not contain any memory reserve
>> map entries.
> 
> I have some trouble to reconciliate the code with your explanation.
> Looking at the implementation fdt_num_mem_rsv() should return 0 if there
> are no reserved regions. A negative value would only be returned if the
> device-tree is malformated.
I agree with Julien. The function takes an offset to the reserve map and grabs
blocks of type fdt_reserve_entry from there. In case of no regions, there will
be one entry with addr/size 0 which always acts as a termination region.
The only way to return < 0 is when you have a buggy FDT.

> 
> Do you have a Device-Tree where the issue occurs?
> 
> That said, I agree that the code could be hardened.
> 
>>
>> Signed-off-by: Shawn Anastasio 
>> ---
>>   xen/common/device-tree/bootfdt.c | 5 +++--
>>   1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/xen/common/device-tree/bootfdt.c 
>> b/xen/common/device-tree/bootfdt.c
>> index ae9fa1e3d6..796ac01c18 100644
>> --- a/xen/common/device-tree/bootfdt.c
>> +++ b/xen/common/device-tree/bootfdt.c
>> @@ -466,7 +466,8 @@ static void __init early_print_info(void)
>>   struct meminfo *mem_resv = &bootinfo.reserved_mem;
>>   struct bootmodules *mods = &bootinfo.modules;
>>   struct bootcmdlines *cmds = &bootinfo.cmdlines;
>> -unsigned int i, j, nr_rsvd;
>> +unsigned int i, j;
>> +int nr_rsvd;
>>
>>   for ( i = 0; i < mi->nr_banks; i++ )
>>   printk("RAM: %"PRIpaddr" - %"PRIpaddr"\n",
>> @@ -481,7 +482,7 @@ static void __init early_print_info(void)
>>   boot_module_kind_as_string(mods->module[i].kind));
>>
>>   nr_rsvd = fdt_num_mem_rsv(device_tree_flattened);
> 
> If I am correct above, then I think we should panic() rather than trying
> to continue with a buggy DT.
+1. Furthermore, we already call panic in such case in dt_unreserved_regions().

> 
>> -for ( i = 0; i < nr_rsvd; i++ )
>> +for ( i = 0; nr_rsvd > 0 && i < nr_rsvd; i++ )
>>   {
>>   paddr_t s, e;
>>
> 
> Cheers,
> 
> --
> Julien Grall
> 

~Michal



[ovmf test] 184302: all pass - PUSHED

2024-01-10 Thread osstest service owner
flight 184302 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/184302/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf 265b4ab91b8a31d6d1760ad1eaa1e16f9244cba7
baseline version:
 ovmf edba0779ba05c0598dbdd32006714fed4fd24ae0

Last test of basis   184300  2024-01-10 01:56:08 Z0 days
Testing same since   184302  2024-01-10 06:11:07 Z0 days1 attempts


People who touched revisions under test:
  Abner Chang 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   edba0779ba..265b4ab91b  265b4ab91b8a31d6d1760ad1eaa1e16f9244cba7 -> 
xen-tested-master



Re: [XEN RFC] x86/uaccess: remove __{put,get}_user_bad()

2024-01-10 Thread Federico Serafini

On 08/01/24 15:44, Jan Beulich wrote:

On 08.01.2024 15:01, Federico Serafini wrote:

Additionally, looking at violations of 16.3 on X86 [1],
I think we should also consider generate_exception(),
ASSERT_UNREACHABLE() and PARSE_ERR_RET() as allowed terminals
for a switch-clause, do you agree?


No, and iirc this was discussed before. ASSERT_UNREACHABLE() is a
debug-build-only construct, so unsuitable. The other two aren't
global constructs, and hence shouldn't be deviated globally.
generate_exception() at least ends in a goto anyway, so why would
it need special treatment?


This is related to what I said before regarding the fact that Rule 16.3
is a purely syntactic rule.
The goto is within an if statement; the tool is able to consider the
fact that the condition is always true, but an explicit configuration is
needed.
Sorry for the late reply.

--
Federico Serafini, M.Sc.

Software Engineer, BUGSENG (http://bugseng.com)