[PATCH for-4.19 9/9] x86/irq: forward pending interrupts to new destination in fixup_irqs()

2024-05-29 Thread Roger Pau Monne
fixup_irqs() is used to evacuate interrupts from to be offlined CPUs.  Given
the CPU is to become offline, the normal migration logic used by Xen where the
vector in the previous target(s) is left configured until the interrupt is
received on the new destination is not suitable.

Instead attempt to do as much as possible in order to prevent loosing
interrupts.  If fixup_irqs() is called from the CPU to be offlined (as is
currently the case) attempt to forward pending vectors when interrupts that
target the current CPU are migrated to a different destination.

Additionally, for interrupts that have already been moved from the current CPU
prior to the call to fixup_irqs() but that haven't been delivered to the new
destination (iow: interrupts with move_in_progress set and the current CPU set
in ->arch.old_cpu_mask) also check whether the previous vector is pending and
forward it to the new destination.

Signed-off-by: Roger Pau Monné 
---
 xen/arch/x86/apic.c |  5 +
 xen/arch/x86/include/asm/apic.h |  3 +++
 xen/arch/x86/irq.c  | 39 +++--
 3 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/apic.c b/xen/arch/x86/apic.c
index 6567af685a1b..4d77fba3ed19 100644
--- a/xen/arch/x86/apic.c
+++ b/xen/arch/x86/apic.c
@@ -1543,3 +1543,8 @@ void check_for_unexpected_msi(unsigned int vector)
 {
 BUG_ON(apic_isr_read(vector));
 }
+
+bool lapic_check_pending(unsigned int vector)
+{
+return apic_read(APIC_IRR + (vector / 32 * 0x10)) & (1U << (vector % 32));
+}
diff --git a/xen/arch/x86/include/asm/apic.h b/xen/arch/x86/include/asm/apic.h
index d1cb001fb4ab..7b5a0832c05e 100644
--- a/xen/arch/x86/include/asm/apic.h
+++ b/xen/arch/x86/include/asm/apic.h
@@ -179,6 +179,9 @@ extern void record_boot_APIC_mode(void);
 extern enum apic_mode current_local_apic_mode(void);
 extern void check_for_unexpected_msi(unsigned int vector);
 
+/* Return whether vector is pending in IRR. */
+bool lapic_check_pending(unsigned int vector);
+
 uint64_t calibrate_apic_timer(void);
 uint32_t apic_tmcct_read(void);
 
diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index 33979729257e..211ad3bd7cf5 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -2591,8 +2591,8 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
 
 for ( irq = 0; irq < nr_irqs; irq++ )
 {
-bool break_affinity = false, set_affinity = true;
-unsigned int vector;
+bool break_affinity = false, set_affinity = true, check_irr = false;
+unsigned int vector, cpu = smp_processor_id();
 cpumask_t *affinity = this_cpu(scratch_cpumask);
 
 if ( irq == 2 )
@@ -2643,6 +2643,25 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
  !cpumask_test_cpu(cpu, _online_map) &&
  cpumask_test_cpu(cpu, desc->arch.old_cpu_mask) )
 {
+/*
+ * This to be offlined CPU was the target of an interrupt that's
+ * been moved, and the new destination target hasn't yet
+ * acknowledged any interrupt from it.
+ *
+ * We know the interrupt is configured to target the new CPU at
+ * this point, so we can check IRR for any pending vectors and
+ * forward them to the new destination.
+ *
+ * Note the difference between move_in_progress or
+ * move_cleanup_count being set.  For the later we know the new
+ * destination has already acked at least one interrupt from this
+ * source, and hence there's no need to forward any stale
+ * interrupts.
+ */
+if ( lapic_check_pending(desc->arch.old_vector) )
+send_IPI_mask(cpumask_of(cpumask_any(desc->arch.cpu_mask)),
+  desc->arch.vector);
+
 /*
  * This CPU is going offline, remove it from ->arch.old_cpu_mask
  * and possibly release the old vector if the old mask becomes
@@ -2683,11 +2702,27 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
 if ( desc->handler->disable )
 desc->handler->disable(desc);
 
+/*
+ * If the current CPU is going offline and is (one of) the target(s) of
+ * the interrupt signal to check whether there are any pending vectors
+ * to be handled in the local APIC after the interrupt has been moved.
+ */
+if ( !cpu_online(cpu) && cpumask_test_cpu(cpu, desc->arch.cpu_mask) )
+check_irr = true;
+
 if ( desc->handler->set_affinity )
 desc->handler->set_affinity(desc, affinity);
 else if ( !(warned++) )
 set_affinity = false;
 
+if ( check_irr && lapic_check_pending(vector) )
+/*
+ * Forward pending interrupt to the new destination, this CPU is
+ * going offline and otherwise the interrupt would be lost.
+ */
+   

[PATCH for-4.19 7/9] x86/irq: deal with old_cpu_mask for interrupts in movement in fixup_irqs()

2024-05-29 Thread Roger Pau Monne
Given the current logic it's possible for ->arch.old_cpu_mask to get out of
sync: if a CPU set in old_cpu_mask is offlined and then onlined
again without old_cpu_mask having been updated the data in the mask will no
longer be accurate, as when brought back online the CPU will no longer have
old_vector configured to handle the old interrupt source.

If there's an interrupt movement in progress, and the to be offlined CPU (which
is the call context) is in the old_cpu_mask clear it and update the mask, so it
doesn't contain stale data.

Signed-off-by: Roger Pau Monné 
---
 xen/arch/x86/irq.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index ae8fa574d4e7..8822f382bfe4 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -2602,6 +2602,28 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
 _online_map);
 }
 
+if ( desc->arch.move_in_progress &&
+ !cpumask_test_cpu(cpu, _online_map) &&
+ cpumask_test_cpu(cpu, desc->arch.old_cpu_mask) )
+{
+/*
+ * This CPU is going offline, remove it from ->arch.old_cpu_mask
+ * and possibly release the old vector if the old mask becomes
+ * empty.
+ *
+ * Note cleaning ->arch.old_cpu_mask is required if the CPU is
+ * brought offline and then online again, as when re-onlined the
+ * per-cpu vector table will no longer have ->arch.old_vector
+ * setup, and hence ->arch.old_cpu_mask would be stale.
+ */
+cpumask_clear_cpu(cpu, desc->arch.old_cpu_mask);
+if ( cpumask_empty(desc->arch.old_cpu_mask) )
+{
+desc->arch.move_in_progress = 0;
+release_old_vec(desc);
+}
+}
+
 /*
  * Avoid shuffling the interrupt around as long as current target CPUs
  * are a subset of the input mask.  What fixup_irqs() cares about is
-- 
2.44.0




[PATCH for-4.19 5/9] x86/irq: limit interrupt movement done by fixup_irqs()

2024-05-29 Thread Roger Pau Monne
The current check used in fixup_irqs() to decide whether to move around
interrupts is based on the affinity mask, but such mask can have all bits set,
and hence is unlikely to be a subset of the input mask.  For example if an
interrupt has an affinity mask of all 1s, any input to fixup_irqs() that's not
an all set CPU mask would cause that interrupt to be shuffled around
unconditionally.

What fixup_irqs() care about is evacuating interrupts from CPUs not set on the
input CPU mask, and for that purpose it should check whether the interrupt is
assigned to a CPU not present in the input mask.  Assume that ->arch.cpu_mask
is a subset of the ->affinity mask, and keep the current logic that resets the
->affinity mask if the interrupt has to be shuffled around.

While there also adjust the comment as to the purpose of fixup_irqs().

Signed-off-by: Roger Pau Monné 
---
Changes since v1:
 - Do not AND ->arch.cpu_mask with cpu_online_map.
 - Keep using cpumask_subset().
 - Integrate into bigger series.
---
 xen/arch/x86/include/asm/irq.h | 2 +-
 xen/arch/x86/irq.c | 9 +++--
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/include/asm/irq.h b/xen/arch/x86/include/asm/irq.h
index 80a3aa7a88b9..b7dc75d0acbd 100644
--- a/xen/arch/x86/include/asm/irq.h
+++ b/xen/arch/x86/include/asm/irq.h
@@ -156,7 +156,7 @@ void free_domain_pirqs(struct domain *d);
 int map_domain_emuirq_pirq(struct domain *d, int pirq, int emuirq);
 int unmap_domain_pirq_emuirq(struct domain *d, int pirq);
 
-/* Reset irq affinities to match the given CPU mask. */
+/* Evacuate interrupts assigned to CPUs not present in the input CPU mask. */
 void fixup_irqs(const cpumask_t *mask, bool verbose);
 void fixup_eoi(void);
 
diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index 9716e00e873b..1b7127090377 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -2525,7 +2525,7 @@ static int __init cf_check setup_dump_irqs(void)
 }
 __initcall(setup_dump_irqs);
 
-/* Reset irq affinities to match the given CPU mask. */
+/* Evacuate interrupts assigned to CPUs not present in the input CPU mask. */
 void fixup_irqs(const cpumask_t *mask, bool verbose)
 {
 unsigned int irq;
@@ -2582,7 +2582,12 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
 _online_map);
 }
 
-if ( !desc->action || cpumask_subset(desc->affinity, mask) )
+/*
+ * Avoid shuffling the interrupt around as long as current target CPUs
+ * are a subset of the input mask.  What fixup_irqs() cares about is
+ * evacuating interrupts from CPUs not in the input mask.
+ */
+if ( !desc->action || cpumask_subset(desc->arch.cpu_mask, mask) )
 {
 spin_unlock(>lock);
 continue;
-- 
2.44.0




[PATCH for-4.19 3/9] xen/cpu: ensure get_cpu_maps() returns false if CPU operations are underway

2024-05-29 Thread Roger Pau Monne
Due to the current rwlock logic, if the CPU calling get_cpu_maps() does so from
a cpu_hotplug_{begin,done}() region the function will still return success,
because a CPU taking the rwlock in read mode after having taken it in write
mode is allowed.  Such behavior however defeats the purpose of get_cpu_maps(),
as it should always return false when called with a CPU hot{,un}plug operation
is in progress.  Otherwise the logic in send_IPI_mask() for example is wrong,
as it could decide to use the shorthand even when a CPU operation is in
progress.

Adjust the logic in get_cpu_maps() to return false when the CPUs lock is
already hold in write mode by the current CPU, as read_trylock() would
otherwise return true.

Fixes: 868a01021c6f ('rwlock: allow recursive read locking when already locked 
in write mode')
Signed-off-by: Roger Pau Monné 
---
 xen/common/cpu.c | 3 ++-
 xen/include/xen/rwlock.h | 2 ++
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/xen/common/cpu.c b/xen/common/cpu.c
index 6173220e771b..d76f80fe2e99 100644
--- a/xen/common/cpu.c
+++ b/xen/common/cpu.c
@@ -49,7 +49,8 @@ static DEFINE_RWLOCK(cpu_add_remove_lock);
 
 bool get_cpu_maps(void)
 {
-return read_trylock(_add_remove_lock);
+return !rw_is_write_locked_by_me(_add_remove_lock) &&
+   read_trylock(_add_remove_lock);
 }
 
 void put_cpu_maps(void)
diff --git a/xen/include/xen/rwlock.h b/xen/include/xen/rwlock.h
index a2e98cad343e..4e7802821859 100644
--- a/xen/include/xen/rwlock.h
+++ b/xen/include/xen/rwlock.h
@@ -316,6 +316,8 @@ static always_inline void write_lock_irq(rwlock_t *l)
 
 #define rw_is_locked(l)   _rw_is_locked(l)
 #define rw_is_write_locked(l) _rw_is_write_locked(l)
+#define rw_is_write_locked_by_me(l) \
+lock_evaluate_nospec(_is_write_locked_by_me(atomic_read(&(l)->cnts)))
 
 
 typedef struct percpu_rwlock percpu_rwlock_t;
-- 
2.44.0




[PATCH for-4.19 1/9] x86/irq: remove offline CPUs from old CPU mask when adjusting move_cleanup_count

2024-05-29 Thread Roger Pau Monne
When adjusting move_cleanup_count to account for CPUs that are offline also
adjust old_cpu_mask, otherwise further calls to fixup_irqs() could subtract
those again creating and create an imbalance in move_cleanup_count.

Fixes: 472e0b74c5c4 ('x86/IRQ: deal with move cleanup count state in 
fixup_irqs()')
Signed-off-by: Roger Pau Monné 
---
 xen/arch/x86/irq.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index c16205a9beb6..9716e00e873b 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -2572,6 +2572,14 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
 desc->arch.move_cleanup_count -= cpumask_weight(affinity);
 if ( !desc->arch.move_cleanup_count )
 release_old_vec(desc);
+else
+/*
+ * Adjust old_cpu_mask to account for the offline CPUs,
+ * otherwise further calls to fixup_irqs() could subtract those
+ * again and possibly underflow the counter.
+ */
+cpumask_and(desc->arch.old_cpu_mask, desc->arch.old_cpu_mask,
+_online_map);
 }
 
 if ( !desc->action || cpumask_subset(desc->affinity, mask) )
-- 
2.44.0




[PATCH for-4.19 8/9] x86/irq: handle moving interrupts in _assign_irq_vector()

2024-05-29 Thread Roger Pau Monne
Currently there's logic in fixup_irqs() that attempts to prevent
_assign_irq_vector() from failing, as fixup_irqs() is required to evacuate all
interrupts form the CPUs not present in the input mask.  The current logic in
fixup_irqs() is incomplete, as it doesn't deal with interrupts that have
move_cleanup_count > 0 and a non-empty ->arch.old_cpu_mask field.

Instead of attempting to fixup the interrupt descriptor in fixup_irqs() so that
_assign_irq_vector() cannot fail, introduce some logic in _assign_irq_vector()
to deal with interrupts that have either move_{in_progress,cleanup_count} set
and no remaining online CPUs in ->arch.cpu_mask.

If _assign_irq_vector() is requested to move an interrupt in the state
described above, first attempt to see if ->arch.old_cpu_mask contains any valid
CPUs that could be used as fallback, and if that's the case do move the
interrupt back to the previous destination.  Note this is easier because the
vector hasn't been released yet, so there's no need to allocate and setup a new
vector on the destination.

Due to the logic in fixup_irqs() that clears offline CPUs from
->arch.old_cpu_mask (and releases the old vector if the mask becomes empty) it
shouldn't be possible to get into _assign_irq_vector() with
->arch.move_{in_progress,cleanup_count} set but no online CPUs in
->arch.old_cpu_mask.

Signed-off-by: Roger Pau Monné 
---
 xen/arch/x86/irq.c | 66 ++
 1 file changed, 38 insertions(+), 28 deletions(-)

diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index 8822f382bfe4..33979729257e 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -553,7 +553,44 @@ static int _assign_irq_vector(struct irq_desc *desc, const 
cpumask_t *mask)
 }
 
 if ( desc->arch.move_in_progress || desc->arch.move_cleanup_count )
-return -EAGAIN;
+{
+/*
+ * If the current destination is online refuse to shuffle.  Retry after
+ * the in-progress movement has finished.
+ */
+if ( cpumask_intersects(desc->arch.cpu_mask, _online_map) )
+return -EAGAIN;
+
+/*
+ * Fallback to the old destination if moving is in progress and the
+ * current destination is to be offlined.  This is required by
+ * fixup_irqs() to evacuate interrupts from a CPU to be offlined.
+ *
+ * Due to the logic in fixup_irqs() that clears offlined CPUs from
+ * ->arch.old_cpu_mask it shouldn't be possible to get here with
+ * ->arch.move_{in_progress,cleanup_count} set and no online CPUs in
+ * ->arch.old_cpu_mask.
+ */
+ASSERT(valid_irq_vector(desc->arch.old_vector));
+ASSERT(cpumask_intersects(desc->arch.old_cpu_mask, mask));
+ASSERT(cpumask_intersects(desc->arch.old_cpu_mask, _online_map));
+
+/* Set the old destination as the current one. */
+desc->arch.vector = desc->arch.old_vector;
+cpumask_and(desc->arch.cpu_mask, desc->arch.old_cpu_mask, mask);
+
+/* Undo any possibly done cleanup. */
+for_each_cpu(cpu, desc->arch.cpu_mask)
+per_cpu(vector_irq, cpu)[desc->arch.vector] = irq;
+
+/* Cancel the pending move. */
+desc->arch.old_vector = IRQ_VECTOR_UNASSIGNED;
+cpumask_clear(desc->arch.old_cpu_mask);
+desc->arch.move_in_progress = 0;
+desc->arch.move_cleanup_count = 0;
+
+return 0;
+}
 
 err = -ENOSPC;
 
@@ -2635,33 +2672,6 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
 continue;
 }
 
-/*
- * In order for the affinity adjustment below to be successful, we
- * need _assign_irq_vector() to succeed. This in particular means
- * clearing desc->arch.move_in_progress if this would otherwise
- * prevent the function from succeeding. Since there's no way for the
- * flag to get cleared anymore when there's no possible destination
- * left (the only possibility then would be the IRQs enabled window
- * after this loop), there's then also no race with us doing it here.
- *
- * Therefore the logic here and there need to remain in sync.
- */
-if ( desc->arch.move_in_progress &&
- !cpumask_intersects(mask, desc->arch.cpu_mask) )
-{
-unsigned int cpu;
-
-cpumask_and(affinity, desc->arch.old_cpu_mask, _online_map);
-
-spin_lock(_lock);
-for_each_cpu(cpu, affinity)
-per_cpu(vector_irq, cpu)[desc->arch.old_vector] = ~irq;
-spin_unlock(_lock);
-
-release_old_vec(desc);
-desc->arch.move_in_progress = 0;
-}
-
 if ( !cpumask_intersects(mask, desc->affinity) )
 {
 break_affinity = true;
-- 
2.44.0




[PATCH for-4.19 6/9] x86/irq: restrict CPU movement in set_desc_affinity()

2024-05-29 Thread Roger Pau Monne
If external interrupts are using logical mode it's possible to have an overlap
between the current ->arch.cpu_mask and the provided mask (or TARGET_CPUS).  If
that's the case avoid assigning a new vector and just move the interrupt to a
member of ->arch.cpu_mask that overlaps with the provided mask and is online.

While there also add an extra assert to ensure the mask containing the possible
destinations is not empty before calling cpu_mask_to_apicid(), as at that point
having an empty mask is not expected.

Signed-off-by: Roger Pau Monné 
---
 xen/arch/x86/irq.c | 34 +++---
 1 file changed, 27 insertions(+), 7 deletions(-)

diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index 1b7127090377..ae8fa574d4e7 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -846,19 +846,38 @@ void cf_check irq_complete_move(struct irq_desc *desc)
 
 unsigned int set_desc_affinity(struct irq_desc *desc, const cpumask_t *mask)
 {
-int ret;
-unsigned long flags;
 cpumask_t dest_mask;
 
 if ( mask && !cpumask_intersects(mask, _online_map) )
 return BAD_APICID;
 
-spin_lock_irqsave(_lock, flags);
-ret = _assign_irq_vector(desc, mask ?: TARGET_CPUS);
-spin_unlock_irqrestore(_lock, flags);
+/*
+ * mask input set con contain CPUs that are not online.  To decide whether
+ * the interrupt needs to be migrated restrict the input mask to the CPUs
+ * that are online.
+ */
+if ( mask )
+cpumask_and(_mask, mask, _online_map);
+else
+cpumask_copy(_mask, TARGET_CPUS);
 
-if ( ret < 0 )
-return BAD_APICID;
+/*
+ * Only move the interrupt if there are no CPUs left in ->arch.cpu_mask
+ * that can handle it, otherwise just shuffle it around ->arch.cpu_mask
+ * to an available destination.
+ */
+if ( !cpumask_intersects(desc->arch.cpu_mask, _mask) )
+{
+int ret;
+unsigned long flags;
+
+spin_lock_irqsave(_lock, flags);
+ret = _assign_irq_vector(desc, mask ?: TARGET_CPUS);
+spin_unlock_irqrestore(_lock, flags);
+
+if ( ret < 0 )
+return BAD_APICID;
+}
 
 if ( mask )
 {
@@ -871,6 +890,7 @@ unsigned int set_desc_affinity(struct irq_desc *desc, const 
cpumask_t *mask)
 cpumask_copy(_mask, desc->arch.cpu_mask);
 }
 cpumask_and(_mask, _mask, _online_map);
+ASSERT(!cpumask_empty(_mask));
 
 return cpu_mask_to_apicid(_mask);
 }
-- 
2.44.0




[PATCH for-4.19 4/9] x86/irq: describe how the interrupt CPU movement works

2024-05-29 Thread Roger Pau Monne
The logic to move interrupts across CPUs is complex, attempt to provide a
comment that describes the expected behavior so users of the interrupt system
have more context about the usage of the arch_irq_desc structure fields.

Signed-off-by: Roger Pau Monné 
---
 xen/arch/x86/include/asm/irq.h | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/xen/arch/x86/include/asm/irq.h b/xen/arch/x86/include/asm/irq.h
index 413994d2133b..80a3aa7a88b9 100644
--- a/xen/arch/x86/include/asm/irq.h
+++ b/xen/arch/x86/include/asm/irq.h
@@ -28,6 +28,32 @@ typedef struct {
 
 struct irq_desc;
 
+/*
+ * Xen logic for moving interrupts around CPUs allows manipulating interrupts
+ * that target remote CPUs.  The logic to move an interrupt from CPU(s) is as
+ * follows:
+ *
+ * 1. cpu_mask and vector is copied to old_cpu_mask and old_vector.
+ * 2. New cpu_mask and vector are set, vector is setup at the new destination.
+ * 3. move_in_progress is set.
+ * 4. Interrupt source is updated to target new CPU and vector.
+ * 5. Interrupts arriving at old_cpu_mask are processed normally.
+ * 6. When an interrupt is delivered at the new destination (cpu_mask) as part
+ *of acking the interrupt move_in_progress is cleared and 
move_cleanup_count
+ *is set to the weight of online CPUs in old_cpu_mask.
+ *IRQ_MOVE_CLEANUP_VECTOR is sent to all CPUs in old_cpu_mask.
+ * 7. When receiving IRQ_MOVE_CLEANUP_VECTOR CPUs in old_cpu_mask clean the
+ *vector entry and decrease the count in move_cleanup_count.  The CPU that
+ *sets move_cleanup_count to 0 releases the vector.
+ *
+ * Note that when interrupt movement (either move_in_progress or
+ * move_cleanup_count set) is in progress it's not possible to move the
+ * interrupt to yet a different CPU.
+ *
+ * By keeping the vector in the old CPU(s) configured until the interrupt is
+ * acked on the new destination Xen allows draining any pending interrupts at
+ * the old destinations.
+ */
 struct arch_irq_desc {
 s16 vector;  /* vector itself is only 8 bits, */
 s16 old_vector;  /* but we use -1 for unassigned  */
-- 
2.44.0




[PATCH for-4.19 2/9] xen/cpu: do not get the CPU map in stop_machine_run()

2024-05-29 Thread Roger Pau Monne
The current callers of stop_machine_run() outside of init code already have the
CPU maps locked, and hence there's no reason for stop_machine_run() to attempt
to lock again.

Replace the get_cpu_maps() call with a suitable unreachable assert.

Further changes will modify the conditions under which get_cpu_maps() returns
success and without the adjustment proposed here the usage of
stop_machine_run() in cpu_down() would then return an error.

Signed-off-by: Roger Pau Monné 
---
 xen/common/cpu.c  |  5 +
 xen/common/stop_machine.c | 15 ---
 xen/include/xen/cpu.h |  2 ++
 3 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/xen/common/cpu.c b/xen/common/cpu.c
index 8709db4d2957..6173220e771b 100644
--- a/xen/common/cpu.c
+++ b/xen/common/cpu.c
@@ -68,6 +68,11 @@ void cpu_hotplug_done(void)
 write_unlock(_add_remove_lock);
 }
 
+bool cpu_map_locked(void)
+{
+return rw_is_locked(_add_remove_lock);
+}
+
 static NOTIFIER_HEAD(cpu_chain);
 
 void __init register_cpu_notifier(struct notifier_block *nb)
diff --git a/xen/common/stop_machine.c b/xen/common/stop_machine.c
index 398cfd507c10..7face75648e8 100644
--- a/xen/common/stop_machine.c
+++ b/xen/common/stop_machine.c
@@ -82,9 +82,15 @@ int stop_machine_run(int (*fn)(void *data), void *data, 
unsigned int cpu)
 BUG_ON(!local_irq_is_enabled());
 BUG_ON(!is_idle_vcpu(current));
 
-/* cpu_online_map must not change. */
-if ( !get_cpu_maps() )
+/*
+ * cpu_online_map must not change.  The only two callers of
+ * stop_machine_run() outside of init code already have the CPU map locked.
+ */
+if ( system_state >= SYS_STATE_active && !cpu_map_locked() )
+{
+ASSERT_UNREACHABLE();
 return -EBUSY;
+}
 
 nr_cpus = num_online_cpus();
 if ( cpu_online(this) )
@@ -92,10 +98,7 @@ int stop_machine_run(int (*fn)(void *data), void *data, 
unsigned int cpu)
 
 /* Must not spin here as the holder will expect us to be descheduled. */
 if ( !spin_trylock(_lock) )
-{
-put_cpu_maps();
 return -EBUSY;
-}
 
 stopmachine_data.fn = fn;
 stopmachine_data.fn_data = data;
@@ -136,8 +139,6 @@ int stop_machine_run(int (*fn)(void *data), void *data, 
unsigned int cpu)
 
 spin_unlock(_lock);
 
-put_cpu_maps();
-
 return ret;
 }
 
diff --git a/xen/include/xen/cpu.h b/xen/include/xen/cpu.h
index e1d4eb59675c..d8c8264c58b0 100644
--- a/xen/include/xen/cpu.h
+++ b/xen/include/xen/cpu.h
@@ -13,6 +13,8 @@ void put_cpu_maps(void);
 void cpu_hotplug_begin(void);
 void cpu_hotplug_done(void);
 
+bool cpu_map_locked(void);
+
 /* Receive notification of CPU hotplug events. */
 void register_cpu_notifier(struct notifier_block *nb);
 
-- 
2.44.0




[PATCH for-4.19 0/9] x86/irq: fixes for CPU hot{,un}plug

2024-05-29 Thread Roger Pau Monne
Hello,

The following series aim to fix interrupt handling when doing CPU
plug/unplug operations.  Without this series running:

cpus=`xl info max_cpu_id`
while [ 1 ]; do
for i in `seq 1 $cpus`; do
xen-hptool cpu-offline $i;
xen-hptool cpu-online $i;
done
done

Quite quickly results in interrupts getting lost and "No irq handler for
vector" messages on the Xen console.  Drivers in dom0 also start getting
interrupt timeouts and the system becomes unusable.

After applying the series running the loop over night still result in a
fully usable system, no  "No irq handler for vector" messages at all, no
interrupt loses reported by dom0.  Test with
x2apic-mode={mixed,cluster}.

I'm tagging this for 4.19 as it's IMO bugfixes, but the series has grown
quite bigger than expected, and hence we need to be careful to not
introduce breakages late in the release cycle.  I've attempted to
document all code as good as I could, interrupt handling has some
unexpected corner cases that are hard to diagnose and reason about.

I'm currently also doing some extra testing with XenRT in case I've
missed something.

Thanks, Roger.

Roger Pau Monne (9):
  x86/irq: remove offline CPUs from old CPU mask when adjusting
move_cleanup_count
  xen/cpu: do not get the CPU map in stop_machine_run()
  xen/cpu: ensure get_cpu_maps() returns false if CPU operations are
underway
  x86/irq: describe how the interrupt CPU movement works
  x86/irq: limit interrupt movement done by fixup_irqs()
  x86/irq: restrict CPU movement in set_desc_affinity()
  x86/irq: deal with old_cpu_mask for interrupts in movement in
fixup_irqs()
  x86/irq: handle moving interrupts in _assign_irq_vector()
  x86/irq: forward pending interrupts to new destination in fixup_irqs()

 xen/arch/x86/apic.c |   5 +
 xen/arch/x86/include/asm/apic.h |   3 +
 xen/arch/x86/include/asm/irq.h  |  28 +-
 xen/arch/x86/irq.c  | 172 +---
 xen/common/cpu.c|   8 +-
 xen/common/stop_machine.c   |  15 +--
 xen/include/xen/cpu.h   |   2 +
 xen/include/xen/rwlock.h|   2 +
 8 files changed, 190 insertions(+), 45 deletions(-)

-- 
2.44.0




[PATCH for-4.19] x86/msi: prevent watchdog triggering when dumping MSI state

2024-05-17 Thread Roger Pau Monne
Use the same check that's used in dump_irqs().

Signed-off-by: Roger Pau Monné 
---
 xen/arch/x86/msi.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
index 19830528b65a..0c97fbb3fc03 100644
--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1451,6 +1452,9 @@ static void cf_check dump_msi(unsigned char key)
 unsigned long flags;
 const char *type = "???";
 
+if ( !(irq & 0x1f) )
+process_pending_softirqs();
+
 if ( !irq_desc_initialized(desc) )
 continue;
 
-- 
2.44.0




[PATCH for-4.19 v3 3/3] xen/x86: remove foreign mappings from the p2m on teardown

2024-05-17 Thread Roger Pau Monne
Iterate over the p2m up to the maximum recorded gfn and remove any foreign
mappings, in order to drop the underlying page references and thus don't keep
extra page references if a domain is destroyed while still having foreign
mappings on it's p2m.

The logic is similar to the one used on Arm.

Note that foreign mappings cannot be created by guests that have altp2m or
nested HVM enabled, as p2ms different than the host one are not currently
scrubbed when destroyed in order to drop references to any foreign maps.

It's unclear whether the right solution is to take an extra reference when
foreign maps are added to p2ms different than the host one, or just rely on the
host p2m already having a reference.  The mapping being removed from the host
p2m should cause it to be dropped on all domain p2ms.

Signed-off-by: Roger Pau Monné 
Reviewed-by: Jan Beulich 
---
Changes since v2:
 - Fix arch_acquire_resource_check() condition.

Changes since v1:
 - Use existing p2m max_mapped_pfn field.
 - Prevent creating foreign mappings by guests that have altp2m or nestedhvm
   enabled.
---
 CHANGELOG.md   |  1 +
 xen/arch/x86/domain.c  |  6 +++
 xen/arch/x86/include/asm/p2m.h | 26 +++--
 xen/arch/x86/mm/p2m-basic.c| 18 +
 xen/arch/x86/mm/p2m.c  | 68 --
 5 files changed, 103 insertions(+), 16 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index c43c45d8d4bd..59aad04e9364 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -16,6 +16,7 @@ The format is based on [Keep a 
Changelog](https://keepachangelog.com/en/1.0.0/)
  - xl/libxl configures vkb=[] for HVM domains with priority over vkb_device.
  - Increase the maximum number of CPUs Xen can be built for from 4095 to
16383.
+ - Allow HVM/PVH domains to map foreign pages.
 
 ### Added
  - On x86:
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index a89ceff9ae93..de5f6b3212fb 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -2386,6 +2386,7 @@ int domain_relinquish_resources(struct domain *d)
 enum {
 PROG_iommu_pagetables = 1,
 PROG_shared,
+PROG_mappings,
 PROG_paging,
 PROG_vcpu_pagetables,
 PROG_xen,
@@ -2434,6 +2435,11 @@ int domain_relinquish_resources(struct domain *d)
 }
 #endif
 
+PROGRESS(mappings):
+ret = relinquish_p2m_mapping(d);
+if ( ret )
+return ret;
+
 PROGRESS(paging):
 
 /* Tear down paging-assistance stuff. */
diff --git a/xen/arch/x86/include/asm/p2m.h b/xen/arch/x86/include/asm/p2m.h
index 107b9f260848..c1478ffc3647 100644
--- a/xen/arch/x86/include/asm/p2m.h
+++ b/xen/arch/x86/include/asm/p2m.h
@@ -383,6 +383,8 @@ struct p2m_domain {
 
 /* Number of foreign mappings. */
 unsigned long  nr_foreign;
+/* Cursor for iterating over the p2m on teardown. */
+unsigned long  teardown_gfn;
 #endif /* CONFIG_HVM */
 };
 
@@ -395,16 +397,7 @@ struct p2m_domain {
 #endif
 #include 
 
-static inline bool arch_acquire_resource_check(struct domain *d)
-{
-/*
- * FIXME: Until foreign pages inserted into the P2M are properly
- * reference counted, it is unsafe to allow mapping of
- * resource pages unless the caller is the hardware domain
- * (see set_foreign_p2m_entry()).
- */
-return !paging_mode_translate(d) || is_hardware_domain(d);
-}
+bool arch_acquire_resource_check(const struct domain *d);
 
 /*
  * Updates vCPU's n2pm to match its np2m_base in VMCx12 and returns that np2m.
@@ -720,6 +713,10 @@ p2m_pod_offline_or_broken_hit(struct page_info *p);
 void
 p2m_pod_offline_or_broken_replace(struct page_info *p);
 
+/* Perform cleanup of p2m mappings ahead of teardown. */
+int
+relinquish_p2m_mapping(struct domain *d);
+
 #else
 
 static inline bool
@@ -748,6 +745,11 @@ static inline void 
p2m_pod_offline_or_broken_replace(struct page_info *p)
 ASSERT_UNREACHABLE();
 }
 
+static inline int relinquish_p2m_mapping(struct domain *d)
+{
+return 0;
+}
+
 #endif
 
 
@@ -1043,7 +1045,7 @@ static inline int p2m_entry_modify(struct p2m_domain 
*p2m, p2m_type_t nt,
 break;
 
 case p2m_map_foreign:
-if ( !mfn_valid(nfn) )
+if ( !mfn_valid(nfn) || p2m != p2m_get_hostp2m(p2m->domain) )
 {
 ASSERT_UNREACHABLE();
 return -EINVAL;
@@ -1068,7 +1070,7 @@ static inline int p2m_entry_modify(struct p2m_domain 
*p2m, p2m_type_t nt,
 break;
 
 case p2m_map_foreign:
-if ( !mfn_valid(ofn) )
+if ( !mfn_valid(ofn) || p2m != p2m_get_hostp2m(p2m->domain) )
 {
 ASSERT_UNREACHABLE();
 return -EINVAL;
diff --git a/xen/arch/x86/mm/p2m-basic.c b/xen/arch/x86/mm/p2m-basic.c
index 8599bd15c61a..25d27a0a9f56 100644
--- a/xen/arch/x86/mm/p2m-basic.c
+++ b/xen/arch/x86/mm/p2m-basic.c
@@ -13,6 +13,8 @@
 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include "mm-locks.h"
 

[PATCH for-4.19 v3 1/3] xen/x86: account number of foreign mappings in the p2m

2024-05-17 Thread Roger Pau Monne
Such information will be needed in order to remove foreign mappings during
teardown for HVM guests.

Right now the introduced counter is not consumed.

Signed-off-by: Roger Pau Monné 
Reviewed-by: Jan Beulich 
---
Changes since v1:
 - Drop max_gfn accounting.
---
 xen/arch/x86/include/asm/p2m.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/xen/arch/x86/include/asm/p2m.h b/xen/arch/x86/include/asm/p2m.h
index 111badf89a6e..107b9f260848 100644
--- a/xen/arch/x86/include/asm/p2m.h
+++ b/xen/arch/x86/include/asm/p2m.h
@@ -380,6 +380,9 @@ struct p2m_domain {
 unsigned int flags;
 unsigned long entry_count;
 } ioreq;
+
+/* Number of foreign mappings. */
+unsigned long  nr_foreign;
 #endif /* CONFIG_HVM */
 };
 
@@ -1049,6 +1052,8 @@ static inline int p2m_entry_modify(struct p2m_domain 
*p2m, p2m_type_t nt,
 if ( !page_get_owner_and_reference(mfn_to_page(nfn)) )
 return -EBUSY;
 
+p2m->nr_foreign++;
+
 break;
 
 default:
@@ -1069,6 +1074,7 @@ static inline int p2m_entry_modify(struct p2m_domain 
*p2m, p2m_type_t nt,
 return -EINVAL;
 }
 put_page(mfn_to_page(ofn));
+p2m->nr_foreign--;
 break;
 
 default:
-- 
2.44.0




[PATCH for-4.19 v3 0/3] xen/x86: support foreign mappings for HVM/PVH

2024-05-17 Thread Roger Pau Monne
Hello,

The following series addresses a shortcoming of HVM/PVH guests: the lack of
support for foreign mappings.  Such lack of support prevents, for example,
using PVH-based guests as stubdomains.

Add support in a way similar to how it's done on Arm, by iterating over
the p2m based on the maximum gfn.

Patch 2 is not strictly needed.  Moving the enablement of altp2m from an
HVM param to a create domctl flag avoids any possible race with the HVM
param changing after it's been evaluated.  Note the param can only be
set by the control domain, and libxl currently sets it at domain
create.  Also, altp2m enablement is different from activation, as activation
happens at domain runtime.

Thanks, Roger.

Roger Pau Monne (3):
  xen/x86: account number of foreign mappings in the p2m
  xen/x86: enable altp2m at create domain domctl
  xen/x86: remove foreign mappings from the p2m on teardown

 CHANGELOG.md|  1 +
 tools/libs/light/libxl_create.c | 23 +-
 tools/libs/light/libxl_x86.c| 26 +--
 tools/ocaml/libs/xc/xenctrl_stubs.c |  2 +-
 xen/arch/arm/domain.c   |  6 +++
 xen/arch/x86/domain.c   | 28 
 xen/arch/x86/hvm/hvm.c  | 23 +-
 xen/arch/x86/include/asm/p2m.h  | 32 +-
 xen/arch/x86/mm/p2m-basic.c | 18 
 xen/arch/x86/mm/p2m.c   | 68 +++--
 xen/include/public/domctl.h | 20 -
 xen/include/public/hvm/params.h |  9 +---
 12 files changed, 215 insertions(+), 41 deletions(-)

-- 
2.44.0




[PATCH for-4.19 v3 2/3] xen: enable altp2m at create domain domctl

2024-05-17 Thread Roger Pau Monne
Enabling it using an HVM param is fragile, and complicates the logic when
deciding whether options that interact with altp2m can also be enabled.

Leave the HVM param value for consumption by the guest, but prevent it from
being set.  Enabling is now done using an additional altp2m-specific field in
xen_domctl_createdomain.

Note that, albeit currently only implemented on x86, altp2m could be implemented
on other architectures, hence the field is added to xen_domctl_createdomain
instead of xen_arch_domainconfig.

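As a side note, the series validates that at most one altp2m mode is selected.
A sketch of that kind of check, using the classic `x & (x - 1)` single-bit
test; the flag names and values below are hypothetical stand-ins, not the
actual XEN_DOMCTL_ALTP2M_* definitions:

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins for the mode bits; values are hypothetical. */
#define DEMO_ALTP2M_MIXED    (1u << 0)
#define DEMO_ALTP2M_EXT      (1u << 1)
#define DEMO_ALTP2M_LIMITED  (1u << 2)

/*
 * Accept either no mode (altp2m disabled) or exactly one mode bit.
 * x & (x - 1) clears the lowest set bit, so the result is non-zero
 * exactly when two or more bits are set.
 */
static bool demo_altp2m_config_valid(uint32_t modes)
{
    return (modes & (modes - 1)) == 0;
}
```

The same shape of check appears in the series when sanitising the domctl
input, rejecting configurations that select multiple modes at once.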
Signed-off-by: Roger Pau Monné 
---
Changes since v2:
 - Introduce a new altp2m field in xen_domctl_createdomain.

Changes since v1:
 - New in this version.
---
 tools/libs/light/libxl_create.c | 23 ++-
 tools/libs/light/libxl_x86.c| 26 --
 tools/ocaml/libs/xc/xenctrl_stubs.c |  2 +-
 xen/arch/arm/domain.c   |  6 ++
 xen/arch/x86/domain.c   | 22 ++
 xen/arch/x86/hvm/hvm.c  | 23 ++-
 xen/include/public/domctl.h | 20 +++-
 xen/include/public/hvm/params.h |  9 ++---
 8 files changed, 106 insertions(+), 25 deletions(-)

diff --git a/tools/libs/light/libxl_create.c b/tools/libs/light/libxl_create.c
index 41252ec55370..edeadd57ef5a 100644
--- a/tools/libs/light/libxl_create.c
+++ b/tools/libs/light/libxl_create.c
@@ -372,7 +372,6 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc,
 libxl_defbool_setdefault(_info->u.hvm.viridian,   false);
 libxl_defbool_setdefault(_info->u.hvm.hpet,   true);
 libxl_defbool_setdefault(_info->u.hvm.vpt_align,  true);
-libxl_defbool_setdefault(_info->u.hvm.altp2m, false);
 libxl_defbool_setdefault(_info->u.hvm.usb,false);
 libxl_defbool_setdefault(_info->u.hvm.vkb_device, true);
 libxl_defbool_setdefault(_info->u.hvm.xen_platform_pci,   true);
@@ -678,6 +677,28 @@ int libxl__domain_make(libxl__gc *gc, libxl_domain_config 
*d_config,
 if (info->passthrough == LIBXL_PASSTHROUGH_SYNC_PT)
 create.iommu_opts |= XEN_DOMCTL_IOMMU_no_sharept;
 
+LOG(DETAIL, "altp2m: %s", libxl_altp2m_mode_to_string(b_info->altp2m));
+switch(b_info->altp2m) {
+case LIBXL_ALTP2M_MODE_MIXED:
+create.altp2m_opts |=
+XEN_DOMCTL_ALTP2M_mode(XEN_DOMCTL_ALTP2M_mixed);
+break;
+
+case LIBXL_ALTP2M_MODE_EXTERNAL:
+create.altp2m_opts |=
+XEN_DOMCTL_ALTP2M_mode(XEN_DOMCTL_ALTP2M_external);
+break;
+
+case LIBXL_ALTP2M_MODE_LIMITED:
+create.altp2m_opts |=
+XEN_DOMCTL_ALTP2M_mode(XEN_DOMCTL_ALTP2M_limited);
+break;
+
+case LIBXL_ALTP2M_MODE_DISABLED:
+/* Nothing to do - altp2m disabled is signaled as mode == 0. */
+break;
+}
+
 /* Ultimately, handle is an array of 16 uint8_t, same as uuid */
 libxl_uuid_copy(ctx, (libxl_uuid *), >uuid);
 
diff --git a/tools/libs/light/libxl_x86.c b/tools/libs/light/libxl_x86.c
index a50ec37eb3eb..60643d6f5376 100644
--- a/tools/libs/light/libxl_x86.c
+++ b/tools/libs/light/libxl_x86.c
@@ -407,19 +407,9 @@ static int hvm_set_conf_params(libxl__gc *gc, uint32_t 
domid,
 libxl_ctx *ctx = libxl__gc_owner(gc);
 xc_interface *xch = ctx->xch;
 int ret = ERROR_FAIL;
-unsigned int altp2m = info->altp2m;
 
 switch(info->type) {
 case LIBXL_DOMAIN_TYPE_HVM:
-/* The config parameter "altp2m" replaces the parameter "altp2mhvm". For
- * legacy reasons, both parameters are accepted on x86 HVM guests.
- *
- * If the legacy field info->u.hvm.altp2m is set, activate altp2m.
- * Otherwise set altp2m based on the field info->altp2m. */
-if (info->altp2m == LIBXL_ALTP2M_MODE_DISABLED &&
-libxl_defbool_val(info->u.hvm.altp2m))
-altp2m = libxl_defbool_val(info->u.hvm.altp2m);
-
 if (xc_hvm_param_set(xch, domid, HVM_PARAM_HPET_ENABLED,
  libxl_defbool_val(info->u.hvm.hpet))) {
 LOG(ERROR, "Couldn't set HVM_PARAM_HPET_ENABLED");
@@ -444,10 +434,6 @@ static int hvm_set_conf_params(libxl__gc *gc, uint32_t 
domid,
 LOG(ERROR, "Couldn't set HVM_PARAM_TIMER_MODE");
 goto out;
 }
-if (xc_hvm_param_set(xch, domid, HVM_PARAM_ALTP2M, altp2m)) {
-LOG(ERROR, "Couldn't set HVM_PARAM_ALTP2M");
-goto out;
-}
 break;
 
 default:
@@ -818,6 +804,18 @@ int libxl__arch_domain_build_info_setdefault(libxl__gc *gc,
 libxl_defbool_setdefault(_info->acpi, true);
 libxl_defbool_setdefault(_info->arch_x86.msr_relaxed, false);
 
+/*
+ * The config parameter "altp2m" replaces the parameter "altp2mhvm".
+ * For legacy reasons, both parameters are accepted on x86 

[PATCH for-4.19] xen/x86: limit interrupt movement done by fixup_irqs()

2024-05-16 Thread Roger Pau Monne
The current check used in fixup_irqs() to decide whether to move around
interrupts is based on the affinity mask, but such mask can have all bits set,
and hence is unlikely to be a subset of the input mask.  For example if an
interrupt has an affinity mask of all 1s, any input to fixup_irqs() that's not
an all set CPU mask would cause that interrupt to be shuffled around
unconditionally.

What fixup_irqs() cares about is evacuating interrupts from CPUs not set in the
input CPU mask, and for that purpose it should check whether the interrupt is
assigned to a CPU not present in the input mask.

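The difference between the old and new checks can be sketched with a
simplified single-word cpumask model; the helpers below are illustrative
stand-ins for Xen's cpumask_subset() and friends, not the real implementation:

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified single-word "cpumask" model of the check in fixup_irqs(). */
typedef uint64_t cpumask_demo_t;

static bool demo_cpumask_subset(cpumask_demo_t small, cpumask_demo_t big)
{
    /* True when every bit set in 'small' is also set in 'big'. */
    return (small & ~big) == 0;
}

/*
 * Old check: based on desc->affinity.  An all-ones affinity mask is never
 * a subset of a partial input mask, so the IRQ is always shuffled.
 */
static bool old_check_moves_irq(cpumask_demo_t affinity, cpumask_demo_t input)
{
    return !demo_cpumask_subset(affinity, input);
}

/*
 * New check: based on the CPUs the vector is actually assigned to
 * (desc->arch.cpu_mask restricted to online CPUs).  Only move the IRQ
 * when it currently targets a CPU outside the input mask.
 */
static bool new_check_moves_irq(cpumask_demo_t assigned, cpumask_demo_t input)
{
    return !demo_cpumask_subset(assigned, input);
}
```

With an interrupt assigned to CPU0, an all-ones affinity, and an input mask
excluding only CPU3, the old check shuffles the interrupt needlessly while the
new one leaves it alone.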
Note that the shuffling done by fixup_irqs() is racy: the old destination
target is not allowed to drain any pending interrupts before the new
destination is set, which can lead to spurious 'No irq handler for vector ...'
messages.  While the proposed change doesn't fix the issue, limiting the
amount of shuffling to only what is strictly necessary reduces the chances of stray
interrupts.

While there also adjust the comment as to the purpose of fixup_irqs().

Signed-off-by: Roger Pau Monné 
---
 xen/arch/x86/include/asm/irq.h | 2 +-
 xen/arch/x86/irq.c | 9 +++--
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/include/asm/irq.h b/xen/arch/x86/include/asm/irq.h
index 413994d2133b..33dd7667137b 100644
--- a/xen/arch/x86/include/asm/irq.h
+++ b/xen/arch/x86/include/asm/irq.h
@@ -130,7 +130,7 @@ void free_domain_pirqs(struct domain *d);
 int map_domain_emuirq_pirq(struct domain *d, int pirq, int emuirq);
 int unmap_domain_pirq_emuirq(struct domain *d, int pirq);
 
-/* Reset irq affinities to match the given CPU mask. */
+/* Evacuate interrupts assigned to CPUs not present in the input CPU mask. */
 void fixup_irqs(const cpumask_t *mask, bool verbose);
 void fixup_eoi(void);
 
diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index 3b951d81bd6d..223f813f6ceb 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -2527,7 +2527,7 @@ static int __init cf_check setup_dump_irqs(void)
 }
 __initcall(setup_dump_irqs);
 
-/* Reset irq affinities to match the given CPU mask. */
+/* Evacuate interrupts assigned to CPUs not present in the input CPU mask. */
 void fixup_irqs(const cpumask_t *mask, bool verbose)
 {
 unsigned int irq;
@@ -2576,7 +2576,12 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
 release_old_vec(desc);
 }
 
-if ( !desc->action || cpumask_subset(desc->affinity, mask) )
+/*
+ * Avoid shuffling the interrupt around if it's assigned to a CPU set
+ * that's all covered by the requested affinity mask.
+ */
+cpumask_and(affinity, desc->arch.cpu_mask, _online_map);
+if ( !desc->action || cpumask_subset(affinity, mask) )
 {
 spin_unlock(>lock);
 continue;
-- 
2.44.0




[PATCH for-4.19?] xen/x86: pretty print interrupt CPU affinity masks

2024-05-15 Thread Roger Pau Monne
Print the CPU affinity masks as numeric ranges instead of plain hexadecimal
bitfields.

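The intended output transformation can be sketched with a small formatter that
mimics the numeric-range output of %*pbl (e.g. bits 0,1,2,8 become "0-2,8");
this is an illustrative re-implementation, not Xen's actual vsnprintf handler:

```c
#include <stdio.h>
#include <string.h>

/* Format the set bits of 'mask' as comma-separated numeric ranges. */
static void format_mask_ranges(unsigned long mask, char *buf, size_t len)
{
    size_t pos = 0;
    int bit = 0, nbits = 8 * (int)sizeof(mask);

    buf[0] = '\0';
    while (bit < nbits) {
        if (!(mask & (1UL << bit))) {
            bit++;
            continue;
        }
        int start = bit;
        /* Extend the run while consecutive bits are set. */
        while (bit + 1 < nbits && (mask & (1UL << (bit + 1))))
            bit++;
        pos += snprintf(buf + pos, len - pos, "%s%d", pos ? "," : "", start);
        if (bit > start)
            pos += snprintf(buf + pos, len - pos, "-%d", bit);
        bit++;
    }
}
```

Compared to the raw hexadecimal bitfield, the range form is much easier to
read for wide masks on large machines.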
Signed-off-by: Roger Pau Monné 
---
 xen/arch/x86/irq.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index 80ba8d9fe912..3b951d81bd6d 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -1934,10 +1934,10 @@ void do_IRQ(struct cpu_user_regs *regs)
 if ( ~irq < nr_irqs && irq_desc_initialized(desc) )
 {
 spin_lock(>lock);
-printk("IRQ%d a=%04lx[%04lx,%04lx] v=%02x[%02x] t=%s s=%08x\n",
-   ~irq, *cpumask_bits(desc->affinity),
-   *cpumask_bits(desc->arch.cpu_mask),
-   *cpumask_bits(desc->arch.old_cpu_mask),
+printk("IRQ%d a={%*pbl}[{%*pbl},{%*pbl}] v=%02x[%02x] t=%s s=%08x\n",
+   ~irq, CPUMASK_PR(desc->affinity),
+   CPUMASK_PR(desc->arch.cpu_mask),
+   CPUMASK_PR(desc->arch.old_cpu_mask),
desc->arch.vector, desc->arch.old_vector,
desc->handler->typename, desc->status);
 spin_unlock(>lock);
@@ -2638,7 +2638,7 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
 if ( !set_affinity )
 printk("Cannot set affinity for IRQ%u\n", irq);
 else if ( break_affinity )
-printk("Broke affinity for IRQ%u, new: %*pb\n",
+printk("Broke affinity for IRQ%u, new: {%*pbl}\n",
irq, CPUMASK_PR(affinity));
 }
 
-- 
2.44.0




[PATCH for-4.19] x86/mtrr: avoid system wide rendezvous when setting AP MTRRs

2024-05-13 Thread Roger Pau Monne
There's no point in forcing a system wide update of the MTRRs on all processors
when there are no changes to be propagated.  On AP startup it's only the AP
that needs to write the system wide MTRR values in order to match the rest of
the already online CPUs.

We have occasionally seen the watchdog trigger during `xen-hptool cpu-online`
in one Intel Cascade Lake box with 448 CPUs due to the re-setting of the MTRRs
on all the CPUs in the system.

While there adjust the comment to clarify why the system-wide resetting of the
MTRR registers is not needed for the purposes of mtrr_ap_init().

Signed-off-by: Roger Pau Monné 
---
For consideration for 4.19: it's a bugfix of a rare instance of the watchdog
triggering, but it's also a good performance improvement when performing
cpu-online.

Hopefully runtime changes to MTRR will affect a single MSR at a time, lowering
the chance of the watchdog triggering due to the system-wide resetting of the
range.
---
 xen/arch/x86/cpu/mtrr/main.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/xen/arch/x86/cpu/mtrr/main.c b/xen/arch/x86/cpu/mtrr/main.c
index 90b235f57e68..0a44ebbcb04f 100644
--- a/xen/arch/x86/cpu/mtrr/main.c
+++ b/xen/arch/x86/cpu/mtrr/main.c
@@ -573,14 +573,15 @@ void mtrr_ap_init(void)
if (!mtrr_if || hold_mtrr_updates_on_aps)
return;
/*
-* Ideally we should hold mtrr_mutex here to avoid mtrr entries changed,
-* but this routine will be called in cpu boot time, holding the lock
-* breaks it. This routine is called in two cases: 1.very earily time
-* of software resume, when there absolutely isn't mtrr entry changes;
-* 2.cpu hotadd time. We let mtrr_add/del_page hold cpuhotplug lock to
-* prevent mtrr entry changes
+* hold_mtrr_updates_on_aps takes care of preventing unnecessary MTRR
+* updates when batch starting the CPUs (see
+* mtrr_aps_sync_{begin,end}()).
+*
+* Otherwise just apply the current system wide MTRR values to this AP.
+* Note this doesn't require synchronization with the other CPUs, as
+* there are strictly no modifications of the current MTRR values.
 */
-   set_mtrr(~0U, 0, 0, 0);
+   mtrr_set_all();
 }
 
 /**
-- 
2.44.0




[PATCH for-4.19 v3] tools/xen-cpuid: switch to use cpu-policy defined names

2024-05-13 Thread Roger Pau Monne
As was done recently for libxl, switch to using the feature names
auto-generated from cpufeatureset.h; this allows removing the open-coded
feature names, and unifies the feature naming with libxl and the hypervisor.

Introduce a new auto-generated array that contains the feature names indexed
by featureset bit position; otherwise, using the existing INIT_FEATURE_NAMES
would require iterating over the array elements until a match with the expected
bit position is found.

Note that leaf names need to be kept, as the current auto-generated data
doesn't contain the leaf names.

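The difference between the two lookup shapes can be sketched with made-up
data; the names and layout below are illustrative, not the generated ones:

```c
#include <stddef.h>
#include <string.h>

/* INIT_FEATURE_NAMES-style: (name, bit) pairs that must be searched. */
struct demo_feature_pair {
    const char *name;
    unsigned int bit;
};

static const struct demo_feature_pair demo_pairs[] = {
    { "fpu", 0 }, { "vme", 1 }, { "de", 2 },
};

/* Array indexed directly by featureset bit position: O(1) lookup. */
static const char *const demo_names_by_bit[] = {
    [0] = "fpu", [1] = "vme", [2] = "de",
};

static const char *lookup_by_search(unsigned int bit)
{
    for (size_t i = 0; i < sizeof(demo_pairs) / sizeof(demo_pairs[0]); i++)
        if (demo_pairs[i].bit == bit)
            return demo_pairs[i].name;
    return NULL;
}

static const char *lookup_by_index(unsigned int bit)
{
    if (bit >= sizeof(demo_names_by_bit) / sizeof(demo_names_by_bit[0]))
        return NULL;
    return demo_names_by_bit[bit];
}
```

Both return the same names, but the indexed form avoids the linear search,
which matters when decoding whole featuresets bit by bit.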
Signed-off-by: Roger Pau Monné 
Reviewed-by: Jan Beulich 
---
Changes since v2:
 - Remove __maybe_unused definition from the emulator harness.

Changes since v1:
 - Modify gen-cpuid.py to generate an array of strings with the feature names.
 - Introduce and use __maybe_unused.
---
 tools/include/xen-tools/common-macros.h |   4 +
 tools/misc/xen-cpuid.c  | 320 +++-
 tools/tests/x86_emulator/x86-emulate.h  |   1 -
 xen/tools/gen-cpuid.py  |  26 ++
 4 files changed, 68 insertions(+), 283 deletions(-)

diff --git a/tools/include/xen-tools/common-macros.h 
b/tools/include/xen-tools/common-macros.h
index 60912225cb7a..560528dbc638 100644
--- a/tools/include/xen-tools/common-macros.h
+++ b/tools/include/xen-tools/common-macros.h
@@ -83,6 +83,10 @@
 #define __packed __attribute__((__packed__))
 #endif
 
+#ifndef __maybe_unused
+# define __maybe_unused __attribute__((__unused__))
+#endif
+
 #define container_of(ptr, type, member) ({  \
 typeof(((type *)0)->member) *mptr__ = (ptr);\
 (type *)((char *)mptr__ - offsetof(type, member));  \
diff --git a/tools/misc/xen-cpuid.c b/tools/misc/xen-cpuid.c
index 8893547bebce..2a1ac0ee8326 100644
--- a/tools/misc/xen-cpuid.c
+++ b/tools/misc/xen-cpuid.c
@@ -12,282 +12,33 @@
 
 #include 
 
-static uint32_t nr_features;
-
-static const char *const str_1d[32] =
-{
-[ 0] = "fpu",  [ 1] = "vme",
-[ 2] = "de",   [ 3] = "pse",
-[ 4] = "tsc",  [ 5] = "msr",
-[ 6] = "pae",  [ 7] = "mce",
-[ 8] = "cx8",  [ 9] = "apic",
-/* [10] */ [11] = "sysenter",
-[12] = "mtrr", [13] = "pge",
-[14] = "mca",  [15] = "cmov",
-[16] = "pat",  [17] = "pse36",
-[18] = "psn",  [19] = "clflush",
-/* [20] */ [21] = "ds",
-[22] = "acpi", [23] = "mmx",
-[24] = "fxsr", [25] = "sse",
-[26] = "sse2", [27] = "ss",
-[28] = "htt",  [29] = "tm",
-[30] = "ia64", [31] = "pbe",
-};
-
-static const char *const str_1c[32] =
-{
-[ 0] = "sse3",[ 1] = "pclmulqdq",
-[ 2] = "dtes64",  [ 3] = "monitor",
-[ 4] = "ds-cpl",  [ 5] = "vmx",
-[ 6] = "smx", [ 7] = "est",
-[ 8] = "tm2", [ 9] = "ssse3",
-[10] = "cntx-id", [11] = "sdgb",
-[12] = "fma", [13] = "cx16",
-[14] = "xtpr",[15] = "pdcm",
-/* [16] */[17] = "pcid",
-[18] = "dca", [19] = "sse41",
-[20] = "sse42",   [21] = "x2apic",
-[22] = "movebe",  [23] = "popcnt",
-[24] = "tsc-dl",  [25] = "aesni",
-[26] = "xsave",   [27] = "osxsave",
-[28] = "avx", [29] = "f16c",
-[30] = "rdrnd",   [31] = "hyper",
-};
-
-static const char *const str_e1d[32] =
-{
-[ 0] = "fpu",[ 1] = "vme",
-[ 2] = "de", [ 3] = "pse",
-[ 4] = "tsc",[ 5] = "msr",
-[ 6] = "pae",[ 7] = "mce",
-[ 8] = "cx8",[ 9] = "apic",
-/* [10] */   [11] = "syscall",
-[12] = "mtrr",   [13] = "pge",
-[14] = "mca",[15] = "cmov",
-[16] = "fcmov",  [17] = "pse36",
-/* [18] */   [19] = "mp",
-[20] = "nx", /* [21] */
-[22] = "mmx+",   [23] = "mmx",
-[24] = "fxsr",   [25] = "fxsr+",
-[26] = "pg1g",   [27] = "rdtscp",
-/* [28] */   [29] = "lm",
-[30] = "3dnow+", [31] = "3dnow",
-};
-
-static const char *const str_e1c[32] =
-{
-[ 0] = "lahf-lm",[ 1] = "cmp",
-[ 2] = "svm",[ 3] = "extapic",
-[ 4] = "cr8d",   [ 5] = "lzcnt",
-[ 6] = "sse4a",  [ 7] = "msse",
-[ 8] = "3dnowpf",[ 9] = "osvw",
-[10] = "ibs",[11] = "xop",
-[12] = "skinit", [13] = "wdt",
-/* [14] */   [15] = "lwp",
-[16] = "fma4",   [17] = "tce",
-/* [18] */   [19] = "nodeid",
-/* [20] */   [21] = "tbm",
-[22] = "topoext",[23] = "perfctr-core",
-[24] = "perfctr-nb", /* [25] */
-[26] = "dbx",[27] = "perftsc",
-[28] = "pcx-l2i",[29] = "monitorx",
-[30] = "addr-msk-ext",
-};
-
-static const char *const str_7b0[32] =
-{
-[ 0] = "fsgsbase", [ 1] = "tsc-adj",
-[ 2] = "sgx",  [ 3] = "bmi1",
-[ 4] = "hle",  [ 5] = "avx2",
-[ 6] = "fdp-exn",  [ 7] = "smep",
-[ 8] = "bmi2", [ 9] = "erms",
-[10] = "invpcid",  [11] = "rtm",
-[12] = "pqm",  [13] = "depfpp",
-[14] = "mpx",  [15] = "pqe",
-[16] = "avx512f",  [17] = "avx512dq",
-[18] = "rdseed",   [19] = 

[PATCH for-4.19] libxl: fix population of the online vCPU bitmap for PVH

2024-05-10 Thread Roger Pau Monne
libxl passes some information to libacpi to create the ACPI table for a PVH
guest, and among that information it's a bitmap of which vCPUs are online
which can be less than the maximum number of vCPUs assigned to the domain.

While the population of the bitmap is done correctly for HVM based on the
number of online vCPUs, for PVH the population of the bitmap is done based on
the number of maximum vCPUs allowed.  This leads to all local APIC entries in
the MADT being set as enabled, which contradicts the data in xenstore if the
number of online vCPUs is different from the maximum.

Fix by copying the internal libxl bitmap that's populated based on the vCPUs
parameter.

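The effect of the fix can be sketched with a simplified model of the bitmap
population; names here are illustrative, not the actual libxl ones:

```c
#include <stdint.h>
#include <string.h>

#define DEMO_BITMAP_BYTES 2

/* Buggy PVH path: mark every vCPU up to the maximum as online. */
static void populate_from_max(uint8_t *online, unsigned int max_vcpus)
{
    memset(online, 0, DEMO_BITMAP_BYTES);
    for (unsigned int i = 0; i < max_vcpus; i++)
        online[i / 8] |= 1u << (i & 7);
}

/* Fixed path: copy the bitmap already tracking which vCPUs are online. */
static void populate_from_avail(uint8_t *online, const uint8_t *avail)
{
    memcpy(online, avail, DEMO_BITMAP_BYTES);
}
```

With 2 online vCPUs out of a maximum of 4, the old path marks all 4 local
APIC entries as enabled in the MADT, while the fixed path only marks the 2
that are actually online.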
Reported-by: Arthur Borsboom 
Link: https://gitlab.com/libvirt/libvirt/-/issues/399
Reported-by: Leigh Brown 
Fixes: 14c0d328da2b ('libxl/acpi: Build ACPI tables for HVMlite guests')
Signed-off-by: Roger Pau Monné 
---
Note that the setup of hvm_info_table could be shared between PVH and HVM, as
the fields are very limited, see hvm_build_set_params() for the HVM side.
However this late in the release it's safer to just adjust the PVH path.

Also note the checksum is not provided when hvm_info_table is built for PVH.
This is fine so far because the checksum is only consumed by hvmloader and not
libacpi itself.

It's a bugfix, so should be considered for 4.19.
---
 tools/libs/light/libxl_x86_acpi.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/libs/light/libxl_x86_acpi.c 
b/tools/libs/light/libxl_x86_acpi.c
index 620f3c700c3e..5cf261bd6794 100644
--- a/tools/libs/light/libxl_x86_acpi.c
+++ b/tools/libs/light/libxl_x86_acpi.c
@@ -89,7 +89,7 @@ static int init_acpi_config(libxl__gc *gc,
 uint32_t domid = dom->guest_domid;
 xc_domaininfo_t info;
 struct hvm_info_table *hvminfo;
-int i, r, rc;
+int r, rc;
 
 config->dsdt_anycpu = config->dsdt_15cpu = dsdt_pvh;
 config->dsdt_anycpu_len = config->dsdt_15cpu_len = dsdt_pvh_len;
@@ -138,8 +138,8 @@ static int init_acpi_config(libxl__gc *gc,
 hvminfo->nr_vcpus = info.max_vcpu_id + 1;
 }
 
-for (i = 0; i < hvminfo->nr_vcpus; i++)
-hvminfo->vcpu_online[i / 8] |= 1 << (i & 7);
+memcpy(hvminfo->vcpu_online, b_info->avail_vcpus.map,
+   b_info->avail_vcpus.size);
 
 config->hvminfo = hvminfo;
 
-- 
2.44.0




[PATCH for-4.19 v2 1/3] xen/x86: account number of foreign mappings in the p2m

2024-05-08 Thread Roger Pau Monne
Such information will be needed in order to remove foreign mappings during
teardown for HVM guests.

Right now the introduced counter is not consumed.

Signed-off-by: Roger Pau Monné 
---
Changes since v1:
 - Drop max_gfn accounting.
---
 xen/arch/x86/include/asm/p2m.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/xen/arch/x86/include/asm/p2m.h b/xen/arch/x86/include/asm/p2m.h
index 111badf89a6e..107b9f260848 100644
--- a/xen/arch/x86/include/asm/p2m.h
+++ b/xen/arch/x86/include/asm/p2m.h
@@ -380,6 +380,9 @@ struct p2m_domain {
 unsigned int flags;
 unsigned long entry_count;
 } ioreq;
+
+/* Number of foreign mappings. */
+unsigned long  nr_foreign;
 #endif /* CONFIG_HVM */
 };
 
@@ -1049,6 +1052,8 @@ static inline int p2m_entry_modify(struct p2m_domain 
*p2m, p2m_type_t nt,
 if ( !page_get_owner_and_reference(mfn_to_page(nfn)) )
 return -EBUSY;
 
+p2m->nr_foreign++;
+
 break;
 
 default:
@@ -1069,6 +1074,7 @@ static inline int p2m_entry_modify(struct p2m_domain 
*p2m, p2m_type_t nt,
 return -EINVAL;
 }
 put_page(mfn_to_page(ofn));
+p2m->nr_foreign--;
 break;
 
 default:
-- 
2.44.0




[PATCH for-4.19 v2 2/3] xen/x86: enable altp2m at create domain domctl

2024-05-08 Thread Roger Pau Monne
Enabling it using an HVM param is fragile, and complicates the logic when
deciding whether options that interact with altp2m can also be enabled.

Leave the HVM param value for consumption by the guest, but prevent it from
being set.  Enabling is now done using the misc_flags field in
xen_arch_domainconfig.

Signed-off-by: Roger Pau Monné 
---
Changes since v1:
 - New in this version.
---
 tools/libs/light/libxl_x86.c  | 43 +--
 xen/arch/x86/domain.c | 25 +-
 xen/arch/x86/hvm/hvm.c| 15 ++-
 xen/include/public/arch-x86/xen.h | 18 +
 xen/include/public/hvm/params.h   |  9 ++-
 5 files changed, 87 insertions(+), 23 deletions(-)

diff --git a/tools/libs/light/libxl_x86.c b/tools/libs/light/libxl_x86.c
index a50ec37eb3eb..86ee8132536c 100644
--- a/tools/libs/light/libxl_x86.c
+++ b/tools/libs/light/libxl_x86.c
@@ -6,8 +6,21 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
   libxl_domain_config *d_config,
   struct xen_domctl_createdomain *config)
 {
+unsigned int altp2m = d_config->b_info.altp2m;
+
 switch(d_config->c_info.type) {
 case LIBXL_DOMAIN_TYPE_HVM:
+/*
+ * The config parameter "altp2m" replaces the parameter "altp2mhvm".
+ * For legacy reasons, both parameters are accepted on x86 HVM guests.
+ *
+ * If the legacy field info->u.hvm.altp2m is set, activate altp2m.
+ * Otherwise set altp2m based on the field info->altp2m.
+ */
+if (altp2m == LIBXL_ALTP2M_MODE_DISABLED &&
+libxl_defbool_val(d_config->b_info.u.hvm.altp2m))
+altp2m = libxl_defbool_val(d_config->b_info.u.hvm.altp2m);
+
 config->arch.emulation_flags = (XEN_X86_EMU_ALL & ~XEN_X86_EMU_VPCI);
 if (!libxl_defbool_val(d_config->b_info.u.hvm.pirq))
 config->arch.emulation_flags &= ~XEN_X86_EMU_USE_PIRQ;
@@ -26,6 +39,22 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
 if (libxl_defbool_val(d_config->b_info.arch_x86.msr_relaxed))
 config->arch.misc_flags |= XEN_X86_MSR_RELAXED;
 
+if (d_config->c_info.type != LIBXL_DOMAIN_TYPE_PV) {
+switch (altp2m) {
+case LIBXL_ALTP2M_MODE_MIXED:
+config->arch.misc_flags |= XEN_X86_ALTP2M_MIXED;
+break;
+
+case LIBXL_ALTP2M_MODE_EXTERNAL:
+config->arch.misc_flags |= XEN_X86_ALTP2M_EXT;
+break;
+
+case LIBXL_ALTP2M_MODE_LIMITED:
+config->arch.misc_flags |= XEN_X86_ALTP2M_LIMITED;
+break;
+}
+}
+
 return 0;
 }
 
@@ -407,19 +436,9 @@ static int hvm_set_conf_params(libxl__gc *gc, uint32_t 
domid,
 libxl_ctx *ctx = libxl__gc_owner(gc);
 xc_interface *xch = ctx->xch;
 int ret = ERROR_FAIL;
-unsigned int altp2m = info->altp2m;
 
 switch(info->type) {
 case LIBXL_DOMAIN_TYPE_HVM:
-/* The config parameter "altp2m" replaces the parameter "altp2mhvm". For
- * legacy reasons, both parameters are accepted on x86 HVM guests.
- *
- * If the legacy field info->u.hvm.altp2m is set, activate altp2m.
- * Otherwise set altp2m based on the field info->altp2m. */
-if (info->altp2m == LIBXL_ALTP2M_MODE_DISABLED &&
-libxl_defbool_val(info->u.hvm.altp2m))
-altp2m = libxl_defbool_val(info->u.hvm.altp2m);
-
 if (xc_hvm_param_set(xch, domid, HVM_PARAM_HPET_ENABLED,
  libxl_defbool_val(info->u.hvm.hpet))) {
 LOG(ERROR, "Couldn't set HVM_PARAM_HPET_ENABLED");
@@ -444,10 +463,6 @@ static int hvm_set_conf_params(libxl__gc *gc, uint32_t 
domid,
 LOG(ERROR, "Couldn't set HVM_PARAM_TIMER_MODE");
 goto out;
 }
-if (xc_hvm_param_set(xch, domid, HVM_PARAM_ALTP2M, altp2m)) {
-LOG(ERROR, "Couldn't set HVM_PARAM_ALTP2M");
-goto out;
-}
 break;
 
 default:
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 20e83cf38bbd..dff790060605 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -637,6 +637,9 @@ int arch_sanitise_domain_config(struct 
xen_domctl_createdomain *config)
 bool hap = config->flags & XEN_DOMCTL_CDF_hap;
 bool nested_virt = config->flags & XEN_DOMCTL_CDF_nested_virt;
 unsigned int max_vcpus;
+unsigned int altp2m = config->arch.misc_flags & (XEN_X86_ALTP2M_MIXED |
+ XEN_X86_ALTP2M_EXT |
+ XEN_X86_ALTP2M_LIMITED);
 
 if ( hvm ? !hvm_enabled : !IS_ENABLED(CONFIG_PV) )
 {
@@ -708,13 +711,33 @@ int arch_sanitise_domain_config(struct 
xen_domctl_createdomain *config)
 }
 }
 
-if ( config->arch.misc_flags & ~XEN_X86_MSR_RELAXED )
+if ( config->arch.misc_flags & ~XEN_X86_MISC_FLAGS_ALL )
 {
 

[PATCH for-4.19 v2 3/3] xen/x86: remove foreign mappings from the p2m on teardown

2024-05-08 Thread Roger Pau Monne
Iterate over the p2m up to the maximum recorded gfn and remove any foreign
mappings, in order to drop the underlying page references and thus avoid
keeping extra page references if a domain is destroyed while still having
foreign mappings in its p2m.

The logic is similar to the one used on Arm.

Note that foreign mappings cannot be created by guests that have altp2m or
nested HVM enabled, as p2ms different than the host one are not currently
scrubbed when destroyed in order to drop references to any foreign maps.

It's unclear whether the right solution is to take an extra reference when
foreign maps are added to p2ms different than the host one, or just rely on the
host p2m already having a reference.  The mapping being removed from the host
p2m should cause it to be dropped on all domain p2ms.

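The preemptible teardown pattern used by relinquish_p2m_mapping() can be
sketched as follows; all names, the batch size, and the -ERESTART convention
here are simplified stand-ins for the actual hypervisor logic:

```c
#define DEMO_ERESTART (-85)
#define DEMO_BATCH     64

struct demo_p2m {
    unsigned long max_mapped_pfn;
    unsigned long teardown_gfn;   /* cursor preserved across restarts */
    unsigned long nr_foreign;
};

/*
 * Walk gfns up to the highest mapped one and drop foreign entries,
 * bailing out periodically so the hypercall can be restarted.  For
 * simplicity every gfn in this model holds a foreign mapping.
 */
static int demo_relinquish(struct demo_p2m *p2m)
{
    unsigned long done = 0;

    while (p2m->teardown_gfn <= p2m->max_mapped_pfn && p2m->nr_foreign) {
        p2m->nr_foreign--;       /* stands in for dropping the page ref */
        p2m->teardown_gfn++;
        if (++done == DEMO_BATCH)
            return DEMO_ERESTART; /* let the caller reschedule and retry */
    }
    return 0;
}
```

The cursor (mirroring the new teardown_gfn field) and the foreign-mapping
counter together bound the work: the walk stops early once no foreign
mappings remain, and resumes where it left off after a restart.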
Signed-off-by: Roger Pau Monné 
---
Changes since v1:
 - Use existing p2m max_mapped_pfn field.
 - Prevent creating foreign mappings by guests that have altp2m or nestedhvm
   enabled.
---
 CHANGELOG.md   |  1 +
 xen/arch/x86/domain.c  |  8 +++-
 xen/arch/x86/include/asm/p2m.h | 26 +++--
 xen/arch/x86/mm/p2m-basic.c| 17 +
 xen/arch/x86/mm/p2m.c  | 68 --
 5 files changed, 103 insertions(+), 17 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 8041cfb7d243..09bdb9b97578 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -14,6 +14,7 @@ The format is based on [Keep a 
Changelog](https://keepachangelog.com/en/1.0.0/)
- HVM PIRQs are disabled by default.
- Reduce IOMMU setup time for hardware domain.
  - xl/libxl configures vkb=[] for HVM domains with priority over vkb_device.
+ - Allow HVM/PVH domains to map foreign pages.
 
 ### Added
  - On x86:
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index dff790060605..1cb3ccddab00 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -718,7 +718,7 @@ int arch_sanitise_domain_config(struct 
xen_domctl_createdomain *config)
 return -EINVAL;
 }
 
-if ( altp2m && (altp2m & (altp2m - 1)) )
+if ( altp2m & (altp2m - 1) )
 {
 dprintk(XENLOG_INFO, "Multiple altp2m options selected in flags: 
%#x\n",
 config->flags);
@@ -2387,6 +2387,7 @@ int domain_relinquish_resources(struct domain *d)
 enum {
 PROG_iommu_pagetables = 1,
 PROG_shared,
+PROG_mappings,
 PROG_paging,
 PROG_vcpu_pagetables,
 PROG_xen,
@@ -2435,6 +2436,11 @@ int domain_relinquish_resources(struct domain *d)
 }
 #endif
 
+PROGRESS(mappings):
+ret = relinquish_p2m_mapping(d);
+if ( ret )
+return ret;
+
 PROGRESS(paging):
 
 /* Tear down paging-assistance stuff. */
diff --git a/xen/arch/x86/include/asm/p2m.h b/xen/arch/x86/include/asm/p2m.h
index 107b9f260848..c1478ffc3647 100644
--- a/xen/arch/x86/include/asm/p2m.h
+++ b/xen/arch/x86/include/asm/p2m.h
@@ -383,6 +383,8 @@ struct p2m_domain {
 
 /* Number of foreign mappings. */
 unsigned long  nr_foreign;
+/* Cursor for iterating over the p2m on teardown. */
+unsigned long  teardown_gfn;
 #endif /* CONFIG_HVM */
 };
 
@@ -395,16 +397,7 @@ struct p2m_domain {
 #endif
 #include 
 
-static inline bool arch_acquire_resource_check(struct domain *d)
-{
-/*
- * FIXME: Until foreign pages inserted into the P2M are properly
- * reference counted, it is unsafe to allow mapping of
- * resource pages unless the caller is the hardware domain
- * (see set_foreign_p2m_entry()).
- */
-return !paging_mode_translate(d) || is_hardware_domain(d);
-}
+bool arch_acquire_resource_check(const struct domain *d);
 
 /*
  * Updates vCPU's n2pm to match its np2m_base in VMCx12 and returns that np2m.
@@ -720,6 +713,10 @@ p2m_pod_offline_or_broken_hit(struct page_info *p);
 void
 p2m_pod_offline_or_broken_replace(struct page_info *p);
 
+/* Perform cleanup of p2m mappings ahead of teardown. */
+int
+relinquish_p2m_mapping(struct domain *d);
+
 #else
 
 static inline bool
@@ -748,6 +745,11 @@ static inline void 
p2m_pod_offline_or_broken_replace(struct page_info *p)
 ASSERT_UNREACHABLE();
 }
 
+static inline int relinquish_p2m_mapping(struct domain *d)
+{
+return 0;
+}
+
 #endif
 
 
@@ -1043,7 +1045,7 @@ static inline int p2m_entry_modify(struct p2m_domain 
*p2m, p2m_type_t nt,
 break;
 
 case p2m_map_foreign:
-if ( !mfn_valid(nfn) )
+if ( !mfn_valid(nfn) || p2m != p2m_get_hostp2m(p2m->domain) )
 {
 ASSERT_UNREACHABLE();
 return -EINVAL;
@@ -1068,7 +1070,7 @@ static inline int p2m_entry_modify(struct p2m_domain 
*p2m, p2m_type_t nt,
 break;
 
 case p2m_map_foreign:
-if ( !mfn_valid(ofn) )
+if ( !mfn_valid(ofn) || p2m != p2m_get_hostp2m(p2m->domain) )
 {
 ASSERT_UNREACHABLE();
 return -EINVAL;
diff --git 

[PATCH for-4.19 v2 0/3] xen/x86: support foreign mappings for HVM/PVH

2024-05-08 Thread Roger Pau Monne
Hello,

The following series addresses a shortcoming of HVM/PVH guests: the lack of
support for foreign mappings.  Such lack of support prevents, for example,
using PVH-based guests as stubdomains.

Add support in a way similar to how it's done on Arm, by iterating over
the p2m based on the maximum gfn.

Patch 2 is not strictly needed.  Moving the enablement of altp2m from an
HVM param to a create domctl flag avoids any possible race with the HVM
param changing after it's been evaluated.  Note the param can only be
set by the control domain, and libxl currently sets it at domain
create.  Also, altp2m enablement is different from activation, as activation
happens at domain runtime.

Thanks, Roger.

Roger Pau Monne (3):
  xen/x86: account number of foreign mappings in the p2m
  xen/x86: enable altp2m at create domain domctl
  xen/x86: remove foreign mappings from the p2m on teardown

 CHANGELOG.md  |  1 +
 tools/libs/light/libxl_x86.c  | 43 ---
 xen/arch/x86/domain.c | 31 +-
 xen/arch/x86/hvm/hvm.c| 15 ++-
 xen/arch/x86/include/asm/p2m.h| 32 +--
 xen/arch/x86/mm/p2m-basic.c   | 17 
 xen/arch/x86/mm/p2m.c | 68 +--
 xen/include/public/arch-x86/xen.h | 18 
 xen/include/public/hvm/params.h   |  9 +---
 9 files changed, 195 insertions(+), 39 deletions(-)

-- 
2.44.0




[PATCH for-4.19 v2] tools/xen-cpuid: switch to use cpu-policy defined names

2024-05-02 Thread Roger Pau Monne
As was done recently for libxl, switch to using the feature names
auto-generated from cpufeatureset.h; this allows removing the open-coded
feature names, and unifies the feature naming with libxl and the hypervisor.

Introduce a newly auto-generated array that contains the feature names indexed
by featureset bit position; otherwise using the existing INIT_FEATURE_NAMES
would require iterating over the array elements until a match with the
expected bit position is found.

Note that leaf names need to be kept, as the current auto-generated data
doesn't contain the leaf names.

Signed-off-by: Roger Pau Monné 
---
Changes since v1:
 - Modify gen-cpuid.py to generate an array of strings with the feature names.
 - Introduce and use __maybe_unused.
---
 tools/include/xen-tools/common-macros.h |   4 +
 tools/misc/xen-cpuid.c  | 320 +++-
 xen/tools/gen-cpuid.py  |  26 ++
 3 files changed, 68 insertions(+), 282 deletions(-)

diff --git a/tools/include/xen-tools/common-macros.h 
b/tools/include/xen-tools/common-macros.h
index 07aed92684b5..3e6a66080a4f 100644
--- a/tools/include/xen-tools/common-macros.h
+++ b/tools/include/xen-tools/common-macros.h
@@ -83,6 +83,10 @@
 #define __packed __attribute__((__packed__))
 #endif
 
+#ifndef __maybe_unused
+# define __maybe_unused __attribute__((__unused__))
+#endif
+
 #define container_of(ptr, type, member) ({  \
 typeof(((type *)0)->member) *mptr__ = (ptr);\
 (type *)((char *)mptr__ - offsetof(type, member));  \
diff --git a/tools/misc/xen-cpuid.c b/tools/misc/xen-cpuid.c
index 8893547bebce..2a1ac0ee8326 100644
--- a/tools/misc/xen-cpuid.c
+++ b/tools/misc/xen-cpuid.c
@@ -12,282 +12,33 @@
 
 #include 
 
-static uint32_t nr_features;
-
-static const char *const str_1d[32] =
-{
-[ 0] = "fpu",  [ 1] = "vme",
-[ 2] = "de",   [ 3] = "pse",
-[ 4] = "tsc",  [ 5] = "msr",
-[ 6] = "pae",  [ 7] = "mce",
-[ 8] = "cx8",  [ 9] = "apic",
-/* [10] */ [11] = "sysenter",
-[12] = "mtrr", [13] = "pge",
-[14] = "mca",  [15] = "cmov",
-[16] = "pat",  [17] = "pse36",
-[18] = "psn",  [19] = "clflush",
-/* [20] */ [21] = "ds",
-[22] = "acpi", [23] = "mmx",
-[24] = "fxsr", [25] = "sse",
-[26] = "sse2", [27] = "ss",
-[28] = "htt",  [29] = "tm",
-[30] = "ia64", [31] = "pbe",
-};
-
-static const char *const str_1c[32] =
-{
-[ 0] = "sse3",[ 1] = "pclmulqdq",
-[ 2] = "dtes64",  [ 3] = "monitor",
-[ 4] = "ds-cpl",  [ 5] = "vmx",
-[ 6] = "smx", [ 7] = "est",
-[ 8] = "tm2", [ 9] = "ssse3",
-[10] = "cntx-id", [11] = "sdgb",
-[12] = "fma", [13] = "cx16",
-[14] = "xtpr",[15] = "pdcm",
-/* [16] */[17] = "pcid",
-[18] = "dca", [19] = "sse41",
-[20] = "sse42",   [21] = "x2apic",
-[22] = "movebe",  [23] = "popcnt",
-[24] = "tsc-dl",  [25] = "aesni",
-[26] = "xsave",   [27] = "osxsave",
-[28] = "avx", [29] = "f16c",
-[30] = "rdrnd",   [31] = "hyper",
-};
-
-static const char *const str_e1d[32] =
-{
-[ 0] = "fpu",[ 1] = "vme",
-[ 2] = "de", [ 3] = "pse",
-[ 4] = "tsc",[ 5] = "msr",
-[ 6] = "pae",[ 7] = "mce",
-[ 8] = "cx8",[ 9] = "apic",
-/* [10] */   [11] = "syscall",
-[12] = "mtrr",   [13] = "pge",
-[14] = "mca",[15] = "cmov",
-[16] = "fcmov",  [17] = "pse36",
-/* [18] */   [19] = "mp",
-[20] = "nx", /* [21] */
-[22] = "mmx+",   [23] = "mmx",
-[24] = "fxsr",   [25] = "fxsr+",
-[26] = "pg1g",   [27] = "rdtscp",
-/* [28] */   [29] = "lm",
-[30] = "3dnow+", [31] = "3dnow",
-};
-
-static const char *const str_e1c[32] =
-{
-[ 0] = "lahf-lm",[ 1] = "cmp",
-[ 2] = "svm",[ 3] = "extapic",
-[ 4] = "cr8d",   [ 5] = "lzcnt",
-[ 6] = "sse4a",  [ 7] = "msse",
-[ 8] = "3dnowpf",[ 9] = "osvw",
-[10] = "ibs",[11] = "xop",
-[12] = "skinit", [13] = "wdt",
-/* [14] */   [15] = "lwp",
-[16] = "fma4",   [17] = "tce",
-/* [18] */   [19] = "nodeid",
-/* [20] */   [21] = "tbm",
-[22] = "topoext",[23] = "perfctr-core",
-[24] = "perfctr-nb", /* [25] */
-[26] = "dbx",[27] = "perftsc",
-[28] = "pcx-l2i",[29] = "monitorx",
-[30] = "addr-msk-ext",
-};
-
-static const char *const str_7b0[32] =
-{
-[ 0] = "fsgsbase", [ 1] = "tsc-adj",
-[ 2] = "sgx",  [ 3] = "bmi1",
-[ 4] = "hle",  [ 5] = "avx2",
-[ 6] = "fdp-exn",  [ 7] = "smep",
-[ 8] = "bmi2", [ 9] = "erms",
-[10] = "invpcid",  [11] = "rtm",
-[12] = "pqm",  [13] = "depfpp",
-[14] = "mpx",  [15] = "pqe",
-[16] = "avx512f",  [17] = "avx512dq",
-[18] = "rdseed",   [19] = "adx",
-[20] = "smap", [21] = "avx512-ifma",
-[22] = "pcommit",  [23] = "clflushopt",
-[24] = "clwb", [25] = "proc-trace",
-[26] = 

[PATCH for-4.19? 2/2] xen/x86: remove foreign mappings from the p2m on teardown

2024-04-30 Thread Roger Pau Monne
Iterate over the p2m based on the maximum recorded gfn and remove any foreign
mappings, in order to drop the underlying page references and thus avoid
keeping extra page references if a domain is destroyed while still having
foreign mappings in its p2m.

The logic is similar to the one used on Arm.

Signed-off-by: Roger Pau Monné 
---
I still have to test destroying a guest that does have foreign mappings in
its p2m.
---
 CHANGELOG.md   |  1 +
 xen/arch/x86/domain.c  |  6 +++
 xen/arch/x86/include/asm/p2m.h | 17 +
 xen/arch/x86/mm/p2m.c  | 68 --
 4 files changed, 81 insertions(+), 11 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 8041cfb7d243..09bdb9b97578 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -14,6 +14,7 @@ The format is based on [Keep a 
Changelog](https://keepachangelog.com/en/1.0.0/)
- HVM PIRQs are disabled by default.
- Reduce IOMMU setup time for hardware domain.
  - xl/libxl configures vkb=[] for HVM domains with priority over vkb_device.
+ - Allow HVM/PVH domains to map foreign pages.
 
 ### Added
  - On x86:
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 20e83cf38bbd..5aa2d3744e6b 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -2364,6 +2364,7 @@ int domain_relinquish_resources(struct domain *d)
 enum {
 PROG_iommu_pagetables = 1,
 PROG_shared,
+PROG_mappings,
 PROG_paging,
 PROG_vcpu_pagetables,
 PROG_xen,
@@ -2412,6 +2413,11 @@ int domain_relinquish_resources(struct domain *d)
 }
 #endif
 
+PROGRESS(mappings):
+ret = relinquish_p2m_mapping(d);
+if ( ret )
+return ret;
+
 PROGRESS(paging):
 
 /* Tear down paging-assistance stuff. */
diff --git a/xen/arch/x86/include/asm/p2m.h b/xen/arch/x86/include/asm/p2m.h
index d95341ef4242..8b3e6a473a0c 100644
--- a/xen/arch/x86/include/asm/p2m.h
+++ b/xen/arch/x86/include/asm/p2m.h
@@ -402,13 +402,7 @@ struct p2m_domain {
 
 static inline bool arch_acquire_resource_check(struct domain *d)
 {
-/*
- * FIXME: Until foreign pages inserted into the P2M are properly
- * reference counted, it is unsafe to allow mapping of
- * resource pages unless the caller is the hardware domain
- * (see set_foreign_p2m_entry()).
- */
-return !paging_mode_translate(d) || is_hardware_domain(d);
+return true;
 }
 
 /*
@@ -725,6 +719,10 @@ p2m_pod_offline_or_broken_hit(struct page_info *p);
 void
 p2m_pod_offline_or_broken_replace(struct page_info *p);
 
+/* Perform cleanup of p2m mappings ahead of teardown. */
+int
+relinquish_p2m_mapping(struct domain *d);
+
 #else
 
 static inline bool
@@ -753,6 +751,11 @@ static inline void 
p2m_pod_offline_or_broken_replace(struct page_info *p)
 ASSERT_UNREACHABLE();
 }
 
+static inline int relinquish_p2m_mapping(struct domain *d)
+{
+return 0;
+}
+
 #endif
 
 
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 05d8536adcd7..fac41e5ec808 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -2335,10 +2335,6 @@ static int p2m_add_foreign(struct domain *tdom, unsigned 
long fgfn,
 int rc;
 struct domain *fdom;
 
-/*
- * hvm fixme: until support is added to p2m teardown code to cleanup any
- * foreign entries, limit this to hardware domain only.
- */
 if ( !arch_acquire_resource_check(tdom) )
 return -EPERM;
 
@@ -2695,6 +2691,70 @@ int p2m_set_altp2m_view_visibility(struct domain *d, 
unsigned int altp2m_idx,
 return rc;
 }
 
+/*
+ * Remove foreign mappings from the p2m, as that drops the page reference taken
+ * when mapped.
+ */
+int relinquish_p2m_mapping(struct domain *d)
+{
+struct p2m_domain *p2m = p2m_get_hostp2m(d);
+unsigned long gfn = gfn_x(p2m->max_gfn);
+int rc = 0;
+
+if ( !paging_mode_translate(d) )
+return 0;
+
+BUG_ON(!d->is_dying);
+
+p2m_lock(p2m);
+
+/* Iterate over the whole p2m on debug builds to ensure correctness. */
+while ( gfn && (IS_ENABLED(CONFIG_DEBUG) || p2m->nr_foreign) )
+{
+unsigned int order;
+p2m_type_t t;
+p2m_access_t a;
+
+_get_gfn_type_access(p2m, _gfn(gfn - 1), &t, &a, 0, &order, 0);
+ASSERT(IS_ALIGNED(gfn, 1u << order));
+gfn -= 1 << order;
+
+if ( t == p2m_map_foreign )
+{
+ASSERT(p2m->nr_foreign);
+ASSERT(order == 0);
+/*
+ * Foreign mappings can only be of order 0, hence there's no need
+ * to align the gfn to the entry order.  Otherwise we would need to
+ * adjust gfn to point to the start of the page if order > 0.
+ */
+rc = p2m_set_entry(p2m, _gfn(gfn), INVALID_MFN, order, p2m_invalid,
+   p2m->default_access);
+if ( rc )
+{
+printk(XENLOG_ERR
+   

[PATCH for-4.19? 0/2] xen/x86: support foreign mappings for HVM

2024-04-30 Thread Roger Pau Monne
Hello,

The following series attempts to solve a shortcoming of HVM/PVH guests:
the lack of support for foreign mappings.  Such lack of support
prevents using PVH based guests as stubdomains, for example.

Add support in a way similar to how it was done on Arm, by iterating
over the p2m based on the maximum gfn.

Mostly untested, sending early in case it could be considered for 4.19.

Thanks, Roger.

Roger Pau Monne (2):
  xen/x86: account for max guest gfn and number of foreign mappings in
the p2m
  xen/x86: remove foreign mappings from the p2m on teardown

 CHANGELOG.md   |  1 +
 xen/arch/x86/domain.c  |  6 +++
 xen/arch/x86/include/asm/p2m.h | 28 ++
 xen/arch/x86/mm/p2m.c  | 70 --
 4 files changed, 94 insertions(+), 11 deletions(-)

-- 
2.44.0




[PATCH for-4.19? 1/2] xen/x86: account for max guest gfn and number of foreign mappings in the p2m

2024-04-30 Thread Roger Pau Monne
Keep track of the maximum gfn that has ever been populated into the p2m, and
also account for the number of foreign mappings.  Such information will be
needed in order to remove foreign mappings during teardown for HVM guests.

Right now the introduced counters are not consumed.

Signed-off-by: Roger Pau Monné 
---
 xen/arch/x86/include/asm/p2m.h | 11 +++
 xen/arch/x86/mm/p2m.c  |  2 ++
 2 files changed, 13 insertions(+)

diff --git a/xen/arch/x86/include/asm/p2m.h b/xen/arch/x86/include/asm/p2m.h
index 111badf89a6e..d95341ef4242 100644
--- a/xen/arch/x86/include/asm/p2m.h
+++ b/xen/arch/x86/include/asm/p2m.h
@@ -380,6 +380,14 @@ struct p2m_domain {
 unsigned int flags;
 unsigned long entry_count;
 } ioreq;
+
+/*
+ * Max gfn possibly mapped into the guest p2m.  Note max_gfn is not
+ * adjusted to account for removals from the p2m.
+ */
+gfn_t  max_gfn;
+/* Number of foreign mappings. */
+unsigned long  nr_foreign;
 #endif /* CONFIG_HVM */
 };
 
@@ -1049,6 +1057,8 @@ static inline int p2m_entry_modify(struct p2m_domain 
*p2m, p2m_type_t nt,
 if ( !page_get_owner_and_reference(mfn_to_page(nfn)) )
 return -EBUSY;
 
+p2m->nr_foreign++;
+
 break;
 
 default:
@@ -1069,6 +1079,7 @@ static inline int p2m_entry_modify(struct p2m_domain 
*p2m, p2m_type_t nt,
 return -EINVAL;
 }
 put_page(mfn_to_page(ofn));
+p2m->nr_foreign--;
 break;
 
 default:
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index ce742c12e0de..05d8536adcd7 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -413,6 +413,8 @@ int p2m_set_entry(struct p2m_domain *p2m, gfn_t gfn, mfn_t 
mfn,
 set_rc = p2m->set_entry(p2m, gfn, mfn, order, p2mt, p2ma, -1);
 if ( set_rc )
 rc = set_rc;
+else
+p2m->max_gfn = gfn_max(gfn_add(gfn, 1u << order), p2m->max_gfn);
 
 gfn = gfn_add(gfn, 1UL << order);
 if ( !mfn_eq(mfn, INVALID_MFN) )
-- 
2.44.0




[PATCH for-4.19] ppc/riscv: fix arch_acquire_resource_check()

2024-04-30 Thread Roger Pau Monne
None of the implementations support set_foreign_p2m_entry() yet, nor do they
have a p2m walk in domain_relinquish_resources() in order to remove the foreign
mappings from the p2m and thus drop the extra refcounts.

Adjust the arch helpers to return false and introduce a comment that clearly
states that not only is taking extra refcounts needed, but also dropping them
on domain teardown.

Fixes: 4988704e00d8 ('xen/riscv: introduce p2m.h')
Fixes: 4a2f68f90930 ('xen/ppc: Define minimal stub headers required for full 
build')
Signed-off-by: Roger Pau Monné 
---
 xen/arch/ppc/include/asm/p2m.h   | 7 ---
 xen/arch/riscv/include/asm/p2m.h | 7 ---
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/xen/arch/ppc/include/asm/p2m.h b/xen/arch/ppc/include/asm/p2m.h
index 25ba05466853..f144ef8e1a54 100644
--- a/xen/arch/ppc/include/asm/p2m.h
+++ b/xen/arch/ppc/include/asm/p2m.h
@@ -81,10 +81,11 @@ static inline mfn_t gfn_to_mfn(struct domain *d, gfn_t gfn)
 static inline bool arch_acquire_resource_check(struct domain *d)
 {
 /*
- * The reference counting of foreign entries in set_foreign_p2m_entry()
- * is supported on PPC.
+ * Requires refcounting the foreign mappings and walking the p2m on
+ * teardown in order to remove foreign pages from the p2m and drop the
+ * extra reference counts.
  */
-return true;
+return false;
 }
 
 static inline void p2m_altp2m_check(struct vcpu *v, uint16_t idx)
diff --git a/xen/arch/riscv/include/asm/p2m.h b/xen/arch/riscv/include/asm/p2m.h
index 87b13f897926..387f372b5d26 100644
--- a/xen/arch/riscv/include/asm/p2m.h
+++ b/xen/arch/riscv/include/asm/p2m.h
@@ -79,10 +79,11 @@ static inline mfn_t gfn_to_mfn(struct domain *d, gfn_t gfn)
 static inline bool arch_acquire_resource_check(struct domain *d)
 {
 /*
- * The reference counting of foreign entries in set_foreign_p2m_entry()
- * is supported on RISCV.
+ * Requires refcounting the foreign mappings and walking the p2m on
+ * teardown in order to remove foreign pages from the p2m and drop the
+ * extra reference counts.
  */
-return true;
+return false;
 }
 
 static inline void p2m_altp2m_check(struct vcpu *v, uint16_t idx)
-- 
2.44.0




[PATCH] tools/xen-cpuid: switch to use cpu-policy defined names

2024-04-30 Thread Roger Pau Monne
Like it was done recently for libxl, switch to using the feature names
auto-generated from the processing of cpufeatureset.h.  This allows removing
the open-coded feature names, and unifies the feature naming with libxl and
the hypervisor.

Note that leaf names need to be kept, as the current auto-generated data
doesn't contain the leaf names.

Signed-off-by: Roger Pau Monné 
---
Late for 4.19, but I would still like it to be considered for inclusion since
it's IMO a nice cleanup and reduces the burden of adding new feature bits into
the policy.
---
 tools/misc/xen-cpuid.c | 336 +++--
 1 file changed, 51 insertions(+), 285 deletions(-)

diff --git a/tools/misc/xen-cpuid.c b/tools/misc/xen-cpuid.c
index 8893547bebce..ab5d88472cf1 100644
--- a/tools/misc/xen-cpuid.c
+++ b/tools/misc/xen-cpuid.c
@@ -12,282 +12,33 @@
 
 #include 
 
-static uint32_t nr_features;
-
-static const char *const str_1d[32] =
-{
-[ 0] = "fpu",  [ 1] = "vme",
-[ 2] = "de",   [ 3] = "pse",
-[ 4] = "tsc",  [ 5] = "msr",
-[ 6] = "pae",  [ 7] = "mce",
-[ 8] = "cx8",  [ 9] = "apic",
-/* [10] */ [11] = "sysenter",
-[12] = "mtrr", [13] = "pge",
-[14] = "mca",  [15] = "cmov",
-[16] = "pat",  [17] = "pse36",
-[18] = "psn",  [19] = "clflush",
-/* [20] */ [21] = "ds",
-[22] = "acpi", [23] = "mmx",
-[24] = "fxsr", [25] = "sse",
-[26] = "sse2", [27] = "ss",
-[28] = "htt",  [29] = "tm",
-[30] = "ia64", [31] = "pbe",
-};
-
-static const char *const str_1c[32] =
-{
-[ 0] = "sse3",[ 1] = "pclmulqdq",
-[ 2] = "dtes64",  [ 3] = "monitor",
-[ 4] = "ds-cpl",  [ 5] = "vmx",
-[ 6] = "smx", [ 7] = "est",
-[ 8] = "tm2", [ 9] = "ssse3",
-[10] = "cntx-id", [11] = "sdgb",
-[12] = "fma", [13] = "cx16",
-[14] = "xtpr",[15] = "pdcm",
-/* [16] */[17] = "pcid",
-[18] = "dca", [19] = "sse41",
-[20] = "sse42",   [21] = "x2apic",
-[22] = "movebe",  [23] = "popcnt",
-[24] = "tsc-dl",  [25] = "aesni",
-[26] = "xsave",   [27] = "osxsave",
-[28] = "avx", [29] = "f16c",
-[30] = "rdrnd",   [31] = "hyper",
-};
-
-static const char *const str_e1d[32] =
-{
-[ 0] = "fpu",[ 1] = "vme",
-[ 2] = "de", [ 3] = "pse",
-[ 4] = "tsc",[ 5] = "msr",
-[ 6] = "pae",[ 7] = "mce",
-[ 8] = "cx8",[ 9] = "apic",
-/* [10] */   [11] = "syscall",
-[12] = "mtrr",   [13] = "pge",
-[14] = "mca",[15] = "cmov",
-[16] = "fcmov",  [17] = "pse36",
-/* [18] */   [19] = "mp",
-[20] = "nx", /* [21] */
-[22] = "mmx+",   [23] = "mmx",
-[24] = "fxsr",   [25] = "fxsr+",
-[26] = "pg1g",   [27] = "rdtscp",
-/* [28] */   [29] = "lm",
-[30] = "3dnow+", [31] = "3dnow",
-};
-
-static const char *const str_e1c[32] =
-{
-[ 0] = "lahf-lm",[ 1] = "cmp",
-[ 2] = "svm",[ 3] = "extapic",
-[ 4] = "cr8d",   [ 5] = "lzcnt",
-[ 6] = "sse4a",  [ 7] = "msse",
-[ 8] = "3dnowpf",[ 9] = "osvw",
-[10] = "ibs",[11] = "xop",
-[12] = "skinit", [13] = "wdt",
-/* [14] */   [15] = "lwp",
-[16] = "fma4",   [17] = "tce",
-/* [18] */   [19] = "nodeid",
-/* [20] */   [21] = "tbm",
-[22] = "topoext",[23] = "perfctr-core",
-[24] = "perfctr-nb", /* [25] */
-[26] = "dbx",[27] = "perftsc",
-[28] = "pcx-l2i",[29] = "monitorx",
-[30] = "addr-msk-ext",
-};
-
-static const char *const str_7b0[32] =
-{
-[ 0] = "fsgsbase", [ 1] = "tsc-adj",
-[ 2] = "sgx",  [ 3] = "bmi1",
-[ 4] = "hle",  [ 5] = "avx2",
-[ 6] = "fdp-exn",  [ 7] = "smep",
-[ 8] = "bmi2", [ 9] = "erms",
-[10] = "invpcid",  [11] = "rtm",
-[12] = "pqm",  [13] = "depfpp",
-[14] = "mpx",  [15] = "pqe",
-[16] = "avx512f",  [17] = "avx512dq",
-[18] = "rdseed",   [19] = "adx",
-[20] = "smap", [21] = "avx512-ifma",
-[22] = "pcommit",  [23] = "clflushopt",
-[24] = "clwb", [25] = "proc-trace",
-[26] = "avx512pf", [27] = "avx512er",
-[28] = "avx512cd", [29] = "sha",
-[30] = "avx512bw", [31] = "avx512vl",
-};
-
-static const char *const str_Da1[32] =
-{
-[ 0] = "xsaveopt", [ 1] = "xsavec",
-[ 2] = "xgetbv1",  [ 3] = "xsaves",
-};
-
-static const char *const str_7c0[32] =
-{
-[ 0] = "prefetchwt1",  [ 1] = "avx512-vbmi",
-[ 2] = "umip", [ 3] = "pku",
-[ 4] = "ospke",[ 5] = "waitpkg",
-[ 6] = "avx512-vbmi2", [ 7] = "cet-ss",
-[ 8] = "gfni", [ 9] = "vaes",
-[10] = "vpclmulqdq",   [11] = "avx512-vnni",
-[12] = "avx512-bitalg",
-[14] = "avx512-vpopcntdq",
-
-[22] = "rdpid",
-/* 24 */   [25] = "cldemote",
-/* 26 */   [27] = "movdiri",
-[28] = "movdir64b",[29] = "enqcmd",
-[30] = "sgx-lc",   [31] = "pks",
-};
-
-static const 

[PATCH] xen/x86: add extra pages to unpopulated-alloc if available

2024-04-29 Thread Roger Pau Monne
Commit 262fc47ac174 ('xen/balloon: don't use PV mode extra memory for zone
device allocations') removed the addition of the extra memory ranges to the
unpopulated range allocator, using those only for the balloon driver.

This forces the unpopulated allocator to attach hotplug ranges even when spare
memory (as part of the extra memory ranges) is available.  Furthermore, on PVH
domains it defeats the purpose of commit 38620fc4e893 ('x86/xen: attempt to
inflate the memory balloon on PVH'), as extra memory ranges would only be
used to map foreign memory if the kernel is built without XEN_UNPOPULATED_ALLOC
support.

Fix this by adding a helper that adds the extra memory ranges to the list of
unpopulated pages, and zeroes the ranges so they are not also consumed by the
balloon driver.

This should have been part of 38620fc4e893, hence the fixes tag.

Note the current logic relies on unpopulated_init() (and hence
arch_xen_unpopulated_init()) always being called ahead of balloon_init(), so
that the extra memory regions are consumed by arch_xen_unpopulated_init().

Fixes: 38620fc4e893 ('x86/xen: attempt to inflate the memory balloon on PVH')
Signed-off-by: Roger Pau Monné 
---
There's a lot of duplication between the unpopulated allocator and the balloon
driver.  I feel like the balloon driver should request any extra memory it
needs from the unpopulated allocator, so that the current helpers provided by
the
XEN_BALLOON_MEMORY_HOTPLUG option could be replaced with wrappers around the
unpopulated handlers.

However this is much more work than strictly required here, and won't be
suitable for backport IMO.  Hence the more contained fix presented in this
patch.
---
 arch/x86/xen/enlighten.c | 33 +
 1 file changed, 33 insertions(+)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index a01ca255b0c6..b88722dfc4f8 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -382,3 +382,36 @@ void __init xen_add_extra_mem(unsigned long start_pfn, 
unsigned long n_pfns)
 
memblock_reserve(PFN_PHYS(start_pfn), PFN_PHYS(n_pfns));
 }
+
+#ifdef CONFIG_XEN_UNPOPULATED_ALLOC
+int __init arch_xen_unpopulated_init(struct resource **res)
+{
+   unsigned int i;
+
+   if (!xen_domain())
+   return -ENODEV;
+
+   /* Must be set strictly before calling xen_free_unpopulated_pages(). */
+   *res = &iomem_resource;
+
+   /*
+* Initialize with pages from the extra memory regions (see
+* arch/x86/xen/setup.c).
+*/
+   for (i = 0; i < XEN_EXTRA_MEM_MAX_REGIONS; i++) {
+   unsigned int j;
+
+   for (j = 0; j < xen_extra_mem[i].n_pfns; j++) {
+   struct page *pg =
+   pfn_to_page(xen_extra_mem[i].start_pfn + j);
+
+   xen_free_unpopulated_pages(1, &pg);
+   }
+
+   /* Zero so region is not also added to the balloon driver. */
+   xen_extra_mem[i].n_pfns = 0;
+   }
+
+   return 0;
+}
+#endif
-- 
2.44.0
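The hand-off pattern above — feed every page of the extra memory regions to the unpopulated pool, then zero `n_pfns` so a later balloon_init() pass finds nothing left to consume — can be sketched in isolation.  All names below are illustrative stand-ins, not the kernel's actual symbols.

```c
#include <assert.h>

/* Toy model of the hand-off done by arch_xen_unpopulated_init(). */
#define TOY_MAX_REGIONS 4

struct toy_region {
    unsigned long start_pfn;
    unsigned long n_pfns;
};

static struct toy_region toy_extra_mem[TOY_MAX_REGIONS];
static unsigned long toy_pool_pages;    /* pages handed to the pool */

static void toy_free_unpopulated_page(unsigned long pfn)
{
    (void)pfn;                          /* a real pool would queue the page */
    toy_pool_pages++;
}

static void toy_unpopulated_init(void)
{
    unsigned int i;

    for (i = 0; i < TOY_MAX_REGIONS; i++) {
        unsigned long j;

        for (j = 0; j < toy_extra_mem[i].n_pfns; j++)
            toy_free_unpopulated_page(toy_extra_mem[i].start_pfn + j);

        /* Zero so the region is not also consumed by the balloon driver. */
        toy_extra_mem[i].n_pfns = 0;
    }
}
```

The ordering dependency from the commit message maps directly onto this model: `toy_unpopulated_init()` must run before any balloon-style consumer reads `n_pfns`.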




[PATCH 5/9] create-diff-object: move kpatch_include_symbol()

2024-04-29 Thread Roger Pau Monne
So it can be used by kpatch_process_special_sections() in further changes.

Non-functional change.

Signed-off-by: Roger Pau Monné 
---
 create-diff-object.c | 74 ++--
 1 file changed, 37 insertions(+), 37 deletions(-)

diff --git a/create-diff-object.c b/create-diff-object.c
index 6a751bf3b789..7674d972e301 100644
--- a/create-diff-object.c
+++ b/create-diff-object.c
@@ -1198,6 +1198,43 @@ void kpatch_update_ex_table_addend(struct kpatch_elf 
*kelf,
}
 }
 
+#define inc_printf(fmt, ...) \
+   log_debug("%*s" fmt, recurselevel, "", ##__VA_ARGS__);
+
+static void kpatch_include_symbol(struct symbol *sym, int recurselevel)
+{
+   struct rela *rela;
+   struct section *sec;
+
+   inc_printf("start include_symbol(%s)\n", sym->name);
+   sym->include = 1;
+   inc_printf("symbol %s is included\n", sym->name);
+   /*
+* Check if sym is a non-local symbol (sym->sec is NULL) or
+* if an unchanged local symbol.  This a base case for the
+* inclusion recursion.
+*/
+   if (!sym->sec || sym->sec->include ||
+   (sym->type != STT_SECTION && sym->status == SAME))
+   goto out;
+   sec = sym->sec;
+   sec->include = 1;
+   inc_printf("section %s is included\n", sec->name);
+   if (sec->secsym && sec->secsym != sym) {
+   sec->secsym->include = 1;
+   inc_printf("section symbol %s is included\n", 
sec->secsym->name);
+   }
+   if (!sec->rela)
+   goto out;
+   sec->rela->include = 1;
+   inc_printf("section %s is included\n", sec->rela->name);
+   list_for_each_entry(rela, &sec->rela->relas, list)
+   kpatch_include_symbol(rela->sym, recurselevel+1);
+out:
+   inc_printf("end include_symbol(%s)\n", sym->name);
+   return;
+}
+
 static void kpatch_regenerate_special_section(struct kpatch_elf *kelf,
  struct special_section *special,
  struct section *sec)
@@ -1455,43 +1492,6 @@ static void 
kpatch_include_standard_string_elements(struct kpatch_elf *kelf)
}
 }
 
-#define inc_printf(fmt, ...) \
-   log_debug("%*s" fmt, recurselevel, "", ##__VA_ARGS__);
-
-static void kpatch_include_symbol(struct symbol *sym, int recurselevel)
-{
-   struct rela *rela;
-   struct section *sec;
-
-   inc_printf("start include_symbol(%s)\n", sym->name);
-   sym->include = 1;
-   inc_printf("symbol %s is included\n", sym->name);
-   /*
-* Check if sym is a non-local symbol (sym->sec is NULL) or
-* if an unchanged local symbol.  This a base case for the
-* inclusion recursion.
-*/
-   if (!sym->sec || sym->sec->include ||
-   (sym->type != STT_SECTION && sym->status == SAME))
-   goto out;
-   sec = sym->sec;
-   sec->include = 1;
-   inc_printf("section %s is included\n", sec->name);
-   if (sec->secsym && sec->secsym != sym) {
-   sec->secsym->include = 1;
-   inc_printf("section symbol %s is included\n", 
sec->secsym->name);
-   }
-   if (!sec->rela)
-   goto out;
-   sec->rela->include = 1;
-   inc_printf("section %s is included\n", sec->rela->name);
-   list_for_each_entry(rela, &sec->rela->relas, list)
-   kpatch_include_symbol(rela->sym, recurselevel+1);
-out:
-   inc_printf("end include_symbol(%s)\n", sym->name);
-   return;
-}
-
 static int kpatch_include_changed_functions(struct kpatch_elf *kelf)
 {
struct symbol *sym;
-- 
2.44.0
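The inclusion recursion in kpatch_include_symbol() is essentially a graph walk in which the `include` flag doubles as the visited marker — that is what keeps mutually-referencing symbols from recursing forever.  A minimal stand-alone model (toy types, not kpatch's):

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the kpatch inclusion recursion: including a symbol pulls in
 * the symbols referenced by its section's relocations; the ->include flag
 * terminates the recursion on cyclic references. */
struct toy_sym {
    int include;
    struct toy_sym *refs[4];    /* symbols referenced via relocations */
    size_t nr_refs;
};

static void toy_include_symbol(struct toy_sym *sym)
{
    size_t i;

    if (sym->include)
        return;                 /* already visited: base case */
    sym->include = 1;
    for (i = 0; i < sym->nr_refs; i++)
        toy_include_symbol(sym->refs[i]);
}
```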




[PATCH 9/9] create-diff-object: account for special section changes

2024-04-29 Thread Roger Pau Monne
When deciding whether there's content to generate a payload, also take into
account whether special sections have changed.  Initially, account for changes
to alternatives-related sections as cause for the generation of a livepatch.

Note that accounting for hook sections is already done by
kpatch_include_hook_elements() when deciding whether there's changed content in
the object file.  .bug_frame sections are also not accounted for, since any
change in a bug_frame section will be accompanied by a change in the function
the bug_frame references (the placement of the BUG_INSTR will change in the
caller function).

Signed-off-by: Roger Pau Monné 
---
 create-diff-object.c | 24 
 1 file changed, 24 insertions(+)

diff --git a/create-diff-object.c b/create-diff-object.c
index 8bfb6bf92a1b..957631097b8a 100644
--- a/create-diff-object.c
+++ b/create-diff-object.c
@@ -1663,6 +1663,28 @@ static int kpatch_include_hook_elements(struct 
kpatch_elf *kelf)
return num_new_functions;
 }
 
+static unsigned int include_special_sections(const struct kpatch_elf *kelf)
+{
+   const struct section *sec;
+   unsigned int nr = 0;
+
+   /*
+* Only account for changes in alternatives and exception table related
+* sections.  Hooks are already taken care of separately, and changes
+* in bug_frame sections always go along with changes in the caller
+* functions.
+*/
+   list_for_each_entry(sec, >sections, list)
+   if (sec->status != SAME &&
+   (!strcmp(sec->name, ".altinstructions") ||
+!strcmp(sec->name, ".altinstr_replacement") ||
+!strcmp(sec->name, ".ex_table") ||
+!strcmp(sec->name, ".fixup")))
+   nr++;
+
+   return nr;
+}
+
 static int kpatch_include_new_globals(struct kpatch_elf *kelf)
 {
struct symbol *sym;
@@ -2469,6 +2491,8 @@ int main(int argc, char *argv[])
kpatch_include_debug_sections(kelf_patched);
log_debug("Include hook elements\n");
num_changed += kpatch_include_hook_elements(kelf_patched);
+   log_debug("Include special sections\n");
+   num_changed += include_special_sections(kelf_patched);
log_debug("num_changed = %d\n", num_changed);
log_debug("Include new globals\n");
new_globals_exist = kpatch_include_new_globals(kelf_patched);
-- 
2.44.0
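The name-based filtering in include_special_sections() boils down to counting changed sections whose name appears on a fixed allow-list.  A self-contained sketch with stand-in types:

```c
#include <assert.h>
#include <string.h>

/* Toy version of include_special_sections(): count sections whose status
 * changed and whose name is on the list of alternatives and exception-table
 * related sections. */
enum toy_status { TOY_SAME, TOY_CHANGED };

struct toy_sec {
    const char *name;
    enum toy_status status;
};

static unsigned int toy_count_special(const struct toy_sec *secs,
                                      unsigned int n)
{
    static const char *const names[] = {
        ".altinstructions", ".altinstr_replacement", ".ex_table", ".fixup",
    };
    unsigned int i, j, nr = 0;

    for (i = 0; i < n; i++) {
        if (secs[i].status == TOY_SAME)
            continue;
        for (j = 0; j < sizeof(names) / sizeof(names[0]); j++)
            if (!strcmp(secs[i].name, names[j])) {
                nr++;
                break;
            }
    }
    return nr;
}
```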




[PATCH 7/9] create-diff-object: don't account for changes to .bug_frame.? sections

2024-04-29 Thread Roger Pau Monne
bug_frame-related sections exclusively contain addresses that reference back to
the address where the BUG_INSTR is executed.  As such, any change to the
contents of bug_frame sections (or their relocations) will necessarily require
a change in the caller function, as the placement of the BUG_INSTR must have
changed.

Take advantage of this relation, and unconditionally mark the bug_frame
sections themselves as not having changed; the logic in
kpatch_regenerate_special_section() will already take care of including any
bug_frame element group that references a function that has changed.

This should be a non-functional change in the payload generated by
create-diff-object, but is needed so that changes to the .altinstructions and
.ex_table sections themselves can be taken into account without unnecessarily
also pulling in .bug_frame sections.

Signed-off-by: Roger Pau Monné 
---
 create-diff-object.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/create-diff-object.c b/create-diff-object.c
index f4e4da063d0a..d1b1477be1cd 100644
--- a/create-diff-object.c
+++ b/create-diff-object.c
@@ -1284,6 +1284,17 @@ static void kpatch_regenerate_special_section(struct 
kpatch_elf *kelf,
}
}
 
+   /*
+* For bug_frame sections don't care if the section itself or the
+* relocations have changed, as any change in the bug_frames will be
+* accompanied by a change in the caller function text, as the
+* BUG_INSTR will have a different placement in the caller.
+*/
+   if (!strncmp(special->name, ".bug_frames.", strlen(".bug_frames."))) {
+   sec->status = SAME;
+   sec->base->status = SAME;
+   }
+
for ( ; src_offset < sec->base->sh.sh_size; src_offset += group_size) {
 
group_size = special->group_size(kelf, src_offset);
-- 
2.44.0




[PATCH 3/9] livepatch-build: fix detection of structure sizes

2024-04-29 Thread Roger Pau Monne
The current runes assume that in the list of DWARF attributes DW_AT_byte_size
will come after DW_AT_name, but that's not always the case.  On one of my
builds I've seen:

   DW_AT_name: (indirect string, offset: 0x3c45): 
exception_table_entry
   DW_AT_declaration : 1
 <1>: Abbrev Number: 5 (DW_TAG_const_type)
   DW_AT_type: <0xb617>
 <1>: Abbrev Number: 14 (DW_TAG_pointer_type)
   DW_AT_byte_size   : 8

Instead of assuming such an order, explicitly search for the DW_AT_byte_size
attribute when a match on the DW_AT_name one is found.

Signed-off-by: Roger Pau Monné 
---
All this ad hoc parsing of DWARF data seems very fragile.  This is an
improvement over the current logic, but I would still prefer if we could find a
more reliable way to obtain the struct sizes we need.
---
 livepatch-build | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/livepatch-build b/livepatch-build
index f3ca9399d149..aad9849f2ba9 100755
--- a/livepatch-build
+++ b/livepatch-build
@@ -427,9 +427,16 @@ if [ "${SKIP}" != "build" ]; then
 SPECIAL_VARS=$(readelf -wi "$OUTPUT/xen-syms" |
awk '
BEGIN { a = b = e = 0 }
+   # Ensure no fall through to the next tag without getting the 
size
+   (a == 1 || b == 1 || e == 1) && /DW_AT_name/ \
+   {print "can not get special structure size" > 
"/dev/stderr"; exit 1}
a == 0 && /DW_AT_name.* alt_instr/ {a = 1; next}
b == 0 && /DW_AT_name.* bug_frame/ {b = 1; next}
e == 0 && /DW_AT_name.* exception_table_entry/ {e = 1; next}
+   # Seek the line that contains the size
+   a == 1 && !/DW_AT_byte_size/ {next}
+   b == 1 && !/DW_AT_byte_size/ {next}
+   e == 1 && !/DW_AT_byte_size/ {next}
# Use shell printf to (possibly) convert from hex to decimal
a == 1 {printf("export ALT_STRUCT_SIZE=`printf \"%%d\" 
\"%s\"`\n", $4); a = 2}
b == 1 {printf("export BUG_STRUCT_SIZE=`printf \"%%d\" 
\"%s\"`\n", $4); b = 2}
-- 
2.44.0




[PATCH 1/9] livepatch-build-tools: allow patch file name sizes up to 127 characters

2024-04-29 Thread Roger Pau Monne
XenServer uses quite long Xen version names, encodes them in the livepatch
filename, and is currently running out of space in the file name.

Bump the max filename size to 127, so it also matches the patch name length in
the hypervisor interface.  Note the size of the buffer is 128 characters, and
the last one is reserved for the null terminator.

Signed-off-by: Roger Pau Monné 
---
Changes since v1:
 - Take into account the null terminator.
---
 livepatch-build | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/livepatch-build b/livepatch-build
index 948b2acfc2f6..f3ca9399d149 100755
--- a/livepatch-build
+++ b/livepatch-build
@@ -72,8 +72,9 @@ function make_patch_name()
 fi
 
 # Only allow alphanumerics and '_' and '-' in the patch name.  Everything
-# else is replaced with '-'.  Truncate to 48 chars.
-echo ${PATCHNAME//[^a-zA-Z0-9_-]/-} |cut -c 1-48
+# else is replaced with '-'.  Truncate to 127 chars
+# (XEN_LIVEPATCH_NAME_SIZE - 1).
+echo ${PATCHNAME//[^a-zA-Z0-9_-]/-} |cut -c -127
 }
 
 # Do a full normal build
-- 
2.44.0
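For clarity, the shell rune above can be rendered in C: anything outside `[a-zA-Z0-9_-]` becomes `-`, and the result is truncated to `XEN_LIVEPATCH_NAME_SIZE - 1` characters so the null terminator still fits.  This is an illustrative re-implementation, not code from livepatch-build-tools.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define XEN_LIVEPATCH_NAME_SIZE 128     /* hypervisor buffer, incl. NUL */

/* Sanitize and truncate a patch name the way make_patch_name() does. */
static void toy_make_patch_name(const char *in,
                                char out[XEN_LIVEPATCH_NAME_SIZE])
{
    size_t i;

    for (i = 0; in[i] && i < XEN_LIVEPATCH_NAME_SIZE - 1; i++) {
        unsigned char c = in[i];

        /* Only alphanumerics, '_' and '-' survive; the rest become '-'. */
        out[i] = (isalnum(c) || c == '_' || c == '-') ? c : '-';
    }
    out[i] = '\0';
}
```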




[PATCH 6/9] create-diff-object: detect changes in .altinstr_replacement and .fixup sections

2024-04-29 Thread Roger Pau Monne
The current logic to detect changes in special section elements will only
include group elements that reference function symbols that need including
(either because they have changed or are new).

This works fine for detecting when a special section element needs including
because the function it points to has changed or is newly introduced, but it
doesn't detect changes to the replacement code in the .altinstr_replacement or
.fixup sections, as the relocations in that case are against STT_SECTION
symbols.

Fix this by also including the special section group if the symbol the
relocation points to is of type section.

Signed-off-by: Roger Pau Monné 
---
 create-diff-object.c | 29 +++--
 1 file changed, 27 insertions(+), 2 deletions(-)

diff --git a/create-diff-object.c b/create-diff-object.c
index 7674d972e301..f4e4da063d0a 100644
--- a/create-diff-object.c
+++ b/create-diff-object.c
@@ -1158,11 +1158,21 @@ static int should_keep_rela_group(struct section *sec, int start, int size)
struct rela *rela;
int found = 0;
 
-   /* check if any relas in the group reference any changed functions */
+   /*
+* Check if any relas in the group reference any changed functions.
+*
+* Relocations in the .altinstructions and .ex_table special sections
+* can also reference changed section symbols, since the replacement
+* text in both cases resides on a section that has no function symbols
+* (.altinstr_replacement and .fixup respectively).  Account for that
+* and also check whether the referenced symbols are section ones in
+* order to decide whether the relocation group needs including.
+*/
list_for_each_entry(rela, >relas, list) {
if (rela->offset >= start &&
rela->offset < start + size &&
-   rela->sym->type == STT_FUNC &&
+   (rela->sym->type == STT_FUNC ||
+rela->sym->type == STT_SECTION) &&
rela->sym->sec->include) {
found = 1;
log_debug("new/changed symbol %s found in special section %s\n",
@@ -1338,6 +1348,21 @@ static void kpatch_regenerate_special_section(struct kpatch_elf *kelf,
 
rela->sym->include = 1;
 
+   /*
+* Changes that cause only the
+* .altinstr_replacement or .fixup sections to
+* be modified need the target function of the
+* replacement to also be marked as CHANGED,
+* otherwise livepatch won't replace the
+* function, and the new replacement code won't
+* take effect.
+*/
+   if (rela->sym->type == STT_FUNC &&
+   rela->sym->status == SAME) {
+   rela->sym->status = CHANGED;
+   kpatch_include_symbol(rela->sym, 0);
+   }
+
if (!strcmp(special->name, ".fixup"))
				kpatch_update_ex_table_addend(kelf, special,
							      src_offset,
-- 
2.44.0




[PATCH 8/9] create-diff-object: account for changes in the special section itself

2024-04-29 Thread Roger Pau Monne
Changes to special sections are not accounted for right now.  For example, a
change that only affects .altinstructions or .ex_table won't be included in
the livepatch payload if none of the symbols referenced by those sections have
also changed.

Since it's not possible to know which elements in the special group have
changed, when a change in the special section itself is detected the whole
group (minus elements that reference init section symbols) must be included.

Signed-off-by: Roger Pau Monné 
---
 create-diff-object.c | 33 ++---
 1 file changed, 26 insertions(+), 7 deletions(-)

diff --git a/create-diff-object.c b/create-diff-object.c
index d1b1477be1cd..8bfb6bf92a1b 100644
--- a/create-diff-object.c
+++ b/create-diff-object.c
@@ -1170,13 +1170,32 @@ static int should_keep_rela_group(struct section *sec, int start, int size)
 */
list_for_each_entry(rela, >relas, list) {
if (rela->offset >= start &&
-   rela->offset < start + size &&
-   (rela->sym->type == STT_FUNC ||
-rela->sym->type == STT_SECTION) &&
-   rela->sym->sec->include) {
-   found = 1;
-   log_debug("new/changed symbol %s found in special section %s\n",
- rela->sym->name, sec->name);
+   rela->offset < start + size)
+   {
+   /*
+* Avoid including groups with relocations against
+* symbols living in init sections, those are never
+* included in livepatches.
+*/
+   if (is_init_section(rela->sym->sec))
+   return 0;
+
+   /*
+* If the base section has changed, always include all
+* groups (except those pointing to .init section
+* symbols), as we have no way to differentiate which
+* groups have changed.
+*/
+   if (sec->status != SAME || sec->base->status != SAME)
+   found = 1;
+   else if ((rela->sym->type == STT_FUNC ||
+ rela->sym->type == STT_SECTION) &&
+rela->sym->sec->include) {
+   found = 1;
+   log_debug(
+   "new/changed symbol %s found in special section %s\n",
+ rela->sym->name, sec->name);
+   }
}
}
 
-- 
2.44.0




[PATCH 4/9] create-diff-object: document kpatch_regenerate_special_section() behavior

2024-04-29 Thread Roger Pau Monne
The purpose of kpatch_regenerate_special_section() is fairly opaque without
spending a good amount of time on it and having quite a lot of knowledge about
what the special sections contain.

Introduce some comments in order to give context and describe the expected
functionality.

Signed-off-by: Roger Pau Monné 
---
 create-diff-object.c | 33 +
 1 file changed, 33 insertions(+)

diff --git a/create-diff-object.c b/create-diff-object.c
index d8a2afbf2774..6a751bf3b789 100644
--- a/create-diff-object.c
+++ b/create-diff-object.c
@@ -1210,6 +1210,12 @@ static void kpatch_regenerate_special_section(struct kpatch_elf *kelf,
 
src_offset = 0;
dest_offset = 0;
+
+   /*
+* Special sections processed here are array objects, hence in order to
+* detect whether a special section needs processing attempt to get the
+* element size.  Returning a size of 0 means no processing required.
+*/
group_size = special->group_size(kelf, src_offset);
if (group_size == 0) {
log_normal("Skipping regeneration of a special section: %s\n",
@@ -1246,6 +1252,33 @@ static void kpatch_regenerate_special_section(struct kpatch_elf *kelf,
if (src_offset + group_size > sec->base->sh.sh_size)
group_size = sec->base->sh.sh_size - src_offset;
 
+   /*
+* Special sections handled perform a bunch of different tasks,
+* but they all have something in common: they are array like
+* sections that reference functions in the object file being
+* processed.
+*
+* .bug_frames.* relocations reference the symbol (plus offset)
+* where the exception is triggered from.
+*
+* .altinstructions relocations contain references to
+* coordinates where the alternatives are to be applied, plus
+* coordinates that point to the replacement code in
+* .altinstr_replacement.
+*
+* .ex_table relocations contain references to the coordinates
+* where the fixup code should be executed, plus relocation
+* coordinates that point to the text code to execute living in
+* the .fixup section.
+*
+* .livepatch.hooks.* relocations point to the hook
+* functions.
+*
+* Such dependencies allow to make a decision on whether an
+* element in the array needs including in the livepatch: if
+* the symbol pointed by the relocation is new or has changed
+* the array element needs including.
+*/
include = should_keep_rela_group(sec, src_offset, group_size);
 
if (!include)
-- 
2.44.0




[PATCH 0/9] livepatch-build-tools: some bug fixes and improvements

2024-04-29 Thread Roger Pau Monne
Hello,

The first 3 patches in the series attempt to fix some bugs; I don't think
they will be especially controversial or difficult to review (patch 1 was
already posted standalone).

The rest of the patches are more convoluted, as they attempt to solve
some shortcomings when attempting to create livepatches that change
alternatives or exceptions.  For example a patch that only changes
content in alternative blocks (the code that ends up in the
.altinstr_replacement section) won't work, as create-diff-object will
report that there are no changes.

I've attempted to test as many corner cases as I could come up with, but it's
hard to figure out all the possible changes, and it's also fairly easy to
inadvertently regress existing functionality.

Thanks, Roger.

Roger Pau Monne (9):
  livepatch-build-tools: allow patch file name sizes up to 127
characters
  create-diff-object: update default alt_instr size
  livepatch-build: fix detection of structure sizes
  create-diff-object: document kpatch_regenerate_special_section()
behavior
  create-diff-object: move kpatch_include_symbol()
  create-diff-object: detect changes in .altinstr_replacement and .fixup
sections
  create-diff-object: don't account for changes to .bug_frame.? sections
  create-diff-object: account for changes in the special section itself
  create-diff-object: account for special section changes

 create-diff-object.c | 196 +--
 livepatch-build  |  12 ++-
 2 files changed, 161 insertions(+), 47 deletions(-)

-- 
2.44.0




[PATCH 2/9] create-diff-object: update default alt_instr size

2024-04-29 Thread Roger Pau Monne
The size of the alt_instr structure in Xen is 14 bytes instead of 12, so
adjust the default accordingly.

Signed-off-by: Roger Pau Monné 
---
 create-diff-object.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/create-diff-object.c b/create-diff-object.c
index fed360a9aa68..d8a2afbf2774 100644
--- a/create-diff-object.c
+++ b/create-diff-object.c
@@ -1000,7 +1000,7 @@ static int altinstructions_group_size(struct kpatch_elf *kelf, int offset)
char *str;
if (!size) {
str = getenv("ALT_STRUCT_SIZE");
-   size = str ? atoi(str) : 12;
+   size = str ? atoi(str) : 14;
}
 
log_debug("altinstr_size=%d\n", size);
-- 
2.44.0




[PATCH] xen-livepatch: fix --force option comparison

2024-04-26 Thread Roger Pau Monne
The check for the --force option is inverted: strcmp() returns 0 on a match,
so the comparison result must be negated.

Reported-by: Jan Beulich 
Fixes: 62a72092a517 ('livepatch: introduce --force option')
Signed-off-by: Roger Pau Monné 
---
 tools/misc/xen-livepatch.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/misc/xen-livepatch.c b/tools/misc/xen-livepatch.c
index c16fb6862d6c..f406ea1373ae 100644
--- a/tools/misc/xen-livepatch.c
+++ b/tools/misc/xen-livepatch.c
@@ -576,7 +576,7 @@ int main(int argc, char *argv[])
 return 0;
 }
 
-if ( strcmp("--force", argv[1]) )
+if ( !strcmp("--force", argv[1]) )
 {
 if ( argc <= 2 )
 {
-- 
2.44.0




[PATCH v4 2/4] livepatch: introduce --force option

2024-04-24 Thread Roger Pau Monne
Introduce a xen-livepatch tool --force option that's propagated into the
hypervisor for livepatch operations.  The intention is for the option to be
used to bypass some checks that would otherwise prevent the patch from being
loaded.

Repurpose the pad field in xen_sysctl_livepatch_op as a flags field that
applies to all livepatch operations.  The flag is currently only set by the
hypercall wrappers for the XEN_SYSCTL_LIVEPATCH_UPLOAD operation, as that's so
far the only one where it will be used initially.  Other uses can be added as
required.

Note the helpers used to set the .pad field to 0; that's been removed since
the structure is already zero-initialized at definition.

No functional usage of the new flag is introduced in this patch.

Signed-off-by: Roger Pau Monné 
Acked-by: Jan Beulich 
Acked-by: Anthony PERARD 
---
Changes since v3:
 - Use strcmp instead of strncmp.

Changes since v2:
 - New in this version.
---
 tools/include/xenctrl.h |  3 ++-
 tools/libs/ctrl/xc_misc.c   |  7 +++
 tools/misc/xen-livepatch.c  | 21 +++--
 xen/common/livepatch.c  |  3 ++-
 xen/include/public/sysctl.h |  4 +++-
 5 files changed, 29 insertions(+), 9 deletions(-)

diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 2ef8b4e05422..499685594427 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -2555,7 +2555,8 @@ int xc_psr_get_hw_info(xc_interface *xch, uint32_t socket,
 #endif
 
 int xc_livepatch_upload(xc_interface *xch,
-char *name, unsigned char *payload, uint32_t size);
+char *name, unsigned char *payload, uint32_t size,
+bool force);
 
 int xc_livepatch_get(xc_interface *xch,
  char *name,
diff --git a/tools/libs/ctrl/xc_misc.c b/tools/libs/ctrl/xc_misc.c
index 5ecdfa2c7934..50282fd60dcc 100644
--- a/tools/libs/ctrl/xc_misc.c
+++ b/tools/libs/ctrl/xc_misc.c
@@ -576,7 +576,8 @@ int xc_getcpuinfo(xc_interface *xch, int max_cpus,
 int xc_livepatch_upload(xc_interface *xch,
 char *name,
 unsigned char *payload,
-uint32_t size)
+uint32_t size,
+bool force)
 {
 int rc;
 struct xen_sysctl sysctl = {};
@@ -612,7 +613,7 @@ int xc_livepatch_upload(xc_interface *xch,
 
 sysctl.cmd = XEN_SYSCTL_livepatch_op;
 sysctl.u.livepatch.cmd = XEN_SYSCTL_LIVEPATCH_UPLOAD;
-sysctl.u.livepatch.pad = 0;
+sysctl.u.livepatch.flags = force ? LIVEPATCH_FLAG_FORCE : 0;
 sysctl.u.livepatch.u.upload.size = size;
 set_xen_guest_handle(sysctl.u.livepatch.u.upload.payload, local);
 
@@ -656,7 +657,6 @@ int xc_livepatch_get(xc_interface *xch,
 
 sysctl.cmd = XEN_SYSCTL_livepatch_op;
 sysctl.u.livepatch.cmd = XEN_SYSCTL_LIVEPATCH_GET;
-sysctl.u.livepatch.pad = 0;
 
 sysctl.u.livepatch.u.get.status.state = 0;
 sysctl.u.livepatch.u.get.status.rc = 0;
@@ -985,7 +985,6 @@ static int _xc_livepatch_action(xc_interface *xch,
 
 sysctl.cmd = XEN_SYSCTL_livepatch_op;
 sysctl.u.livepatch.cmd = XEN_SYSCTL_LIVEPATCH_ACTION;
-sysctl.u.livepatch.pad = 0;
 sysctl.u.livepatch.u.action.cmd = action;
 sysctl.u.livepatch.u.action.timeout = timeout;
 sysctl.u.livepatch.u.action.flags = flags;
diff --git a/tools/misc/xen-livepatch.c b/tools/misc/xen-livepatch.c
index 2c4f69e596fa..c16fb6862d6c 100644
--- a/tools/misc/xen-livepatch.c
+++ b/tools/misc/xen-livepatch.c
@@ -19,11 +19,15 @@
 
 static xc_interface *xch;
 
+/* Global option to disable checks. */
+static bool force;
+
 void show_help(void)
 {
 fprintf(stderr,
 "xen-livepatch: live patching tool\n"
-"Usage: xen-livepatch  [args] [command-flags]\n"
+"Usage: xen-livepatch [--force]  [args] [command-flags]\n"
+" Use --force option to bypass some checks.\n"
 "  An unique name of payload. Up to %d characters.\n"
 "Commands:\n"
 "  help   display this help\n"
@@ -240,7 +244,7 @@ static int upload_func(int argc, char *argv[])
 return saved_errno;
 }
 printf("Uploading %s... ", filename);
-rc = xc_livepatch_upload(xch, name, fbuf, len);
+rc = xc_livepatch_upload(xch, name, fbuf, len, force);
 if ( rc )
 {
 rc = errno;
@@ -571,6 +575,19 @@ int main(int argc, char *argv[])
 show_help();
 return 0;
 }
+
+if ( strcmp("--force", argv[1]) )
+{
+if ( argc <= 2 )
+{
+show_help();
+return EXIT_FAILURE;
+}
+force = true;
+argv++;
+argc--;
+}
+
 for ( i = 0; i < ARRAY_SIZE(main_options); i++ )
 if (!strcmp(main_options[i].name, argv[1]))
 break;
diff --git a/xen/common/livepatch.c b/xen/common/livepatch.c
index 351a3e0b9a60..502e264bc6fe 100644
--- a/xen/common/livepatch.c
+++ b/xen/common/livepatch.c
@@ -2125,7 

[PATCH v4 4/4] x86/livepatch: perform sanity checks on the payload exception table contents

2024-04-24 Thread Roger Pau Monne
Ensure the entries of a payload exception table only apply to text regions in
the payload itself.  Since the payload exception table needs to be loaded and
active even before a patch is applied (because hooks might already rely on it),
make sure the exception table (if any) only contains fixups for the payload
text section.

Signed-off-by: Roger Pau Monné 
Reviewed-by: Jan Beulich 
---
Changes since v3:
 - Rename to extable_in_bounds().
 - Use _p() instead of casting to void *.

Changes since v2:
 - New in this version.
---
 xen/arch/x86/extable.c | 18 ++
 xen/arch/x86/include/asm/uaccess.h |  3 +++
 xen/common/livepatch.c |  9 +
 3 files changed, 30 insertions(+)

diff --git a/xen/arch/x86/extable.c b/xen/arch/x86/extable.c
index 8415cd1fa249..2be090b92df8 100644
--- a/xen/arch/x86/extable.c
+++ b/xen/arch/x86/extable.c
@@ -228,3 +228,21 @@ unsigned long asmlinkage search_pre_exception_table(struct cpu_user_regs *regs)
 }
 return fixup;
 }
+
+#ifdef CONFIG_LIVEPATCH
+bool extable_in_bounds(const struct exception_table_entry *ex_start,
+   const struct exception_table_entry *ex_end,
+   const void *start, const void *end)
+{
+for ( ; ex_start < ex_end; ex_start++ )
+{
+const void *addr = _p(ex_addr(ex_start));
+const void *cont = _p(ex_cont(ex_start));
+
+if ( addr < start || addr >= end || cont < start || cont >= end )
+return false;
+}
+
+return true;
+}
+#endif
diff --git a/xen/arch/x86/include/asm/uaccess.h b/xen/arch/x86/include/asm/uaccess.h
index 48b684c19d44..4995edfdd8db 100644
--- a/xen/arch/x86/include/asm/uaccess.h
+++ b/xen/arch/x86/include/asm/uaccess.h
@@ -426,5 +426,8 @@ extern unsigned long search_exception_table(const struct cpu_user_regs *regs,
 extern void sort_exception_tables(void);
 extern void sort_exception_table(struct exception_table_entry *start,
  const struct exception_table_entry *stop);
+extern bool extable_in_bounds(const struct exception_table_entry *ex_start,
+  const struct exception_table_entry *ex_end,
+  const void *start, const void *end);
 
 #endif /* __X86_UACCESS_H__ */
diff --git a/xen/common/livepatch.c b/xen/common/livepatch.c
index 1afde0281402..4702c06902a9 100644
--- a/xen/common/livepatch.c
+++ b/xen/common/livepatch.c
@@ -912,6 +912,15 @@ static int prepare_payload(struct payload *payload,
 s = sec->load_addr;
 e = sec->load_addr + sec->sec->sh_size;
 
+if ( !extable_in_bounds(s, e, payload->text_addr,
+payload->text_addr + payload->text_size) )
+{
+printk(XENLOG_ERR LIVEPATCH
+   "%s: Invalid exception table with out of bounds entries\n",
+   elf->name);
+return -EINVAL;
+}
+
 sort_exception_table(s ,e);
 
 region->ex = s;
-- 
2.44.0




[PATCH v4 0/4] livepatch: minor bug fixes and improvements

2024-04-24 Thread Roger Pau Monne
Hello,

Following series contain some minor bugfixes and improvements for
livepatch logic, both inside the hypervisor and on the command line
tool.

Thanks, Roger.

Roger Pau Monne (4):
  xen-livepatch: fix parameter name parsing
  livepatch: introduce --force option
  livepatch: refuse to resolve symbols that belong to init sections
  x86/livepatch: perform sanity checks on the payload exception table
contents

 tools/include/xenctrl.h|  3 ++-
 tools/libs/ctrl/xc_misc.c  |  7 +++
 tools/misc/xen-livepatch.c | 25 +
 xen/arch/x86/extable.c | 18 ++
 xen/arch/x86/include/asm/uaccess.h |  3 +++
 xen/common/livepatch.c | 25 +++--
 xen/common/livepatch_elf.c | 22 +-
 xen/include/public/sysctl.h|  4 +++-
 xen/include/xen/livepatch_elf.h|  2 +-
 9 files changed, 91 insertions(+), 18 deletions(-)

-- 
2.44.0




[PATCH v4 3/4] livepatch: refuse to resolve symbols that belong to init sections

2024-04-24 Thread Roger Pau Monne
Livepatch payloads containing symbols that belong to init sections can only
lead to page faults later on, as by the time the livepatch is loaded init
sections have already been freed.

Refuse to resolve such symbols and return an error instead.

Note such resolutions are only relevant for symbols that point to undefined
sections (SHN_UNDEF), as that implies the symbol is not in the current payload
and hence must either be a Xen or a different livepatch payload symbol.

Do not allow resolving symbols that point to __init_begin, as that address is
also unmapped.  On the other hand, __init_end is not unmapped, and hence
resolutions against it are allowed.

Since __init_begin can alias other symbols (like _erodata for example)
allow the force flag to override the check and resolve the symbol anyway.

Signed-off-by: Roger Pau Monné 
---
Changes since v3:
 - Print warning message even when using the force option.

Changes since v2:
 - Allow bypassing added check with the force flag.

Changes since v1:
 - Fix off-by-one in range checking.
---
 xen/common/livepatch.c  | 13 -
 xen/common/livepatch_elf.c  | 22 +-
 xen/include/xen/livepatch_elf.h |  2 +-
 3 files changed, 30 insertions(+), 7 deletions(-)

diff --git a/xen/common/livepatch.c b/xen/common/livepatch.c
index 502e264bc6fe..1afde0281402 100644
--- a/xen/common/livepatch.c
+++ b/xen/common/livepatch.c
@@ -1080,7 +1080,8 @@ static void free_payload(struct payload *data)
 xfree(data);
 }
 
-static int load_payload_data(struct payload *payload, void *raw, size_t len)
+static int load_payload_data(struct payload *payload, void *raw, size_t len,
+ bool force)
 {
 struct livepatch_elf elf = { .name = payload->name, .len = len };
 int rc = 0;
@@ -1093,7 +1094,7 @@ static int load_payload_data(struct payload *payload, void *raw, size_t len)
 if ( rc )
 goto out;
 
-rc = livepatch_elf_resolve_symbols();
+rc = livepatch_elf_resolve_symbols(, force);
 if ( rc )
 goto out;
 
@@ -1133,7 +1134,8 @@ static int load_payload_data(struct payload *payload, void *raw, size_t len)
 return rc;
 }
 
-static int livepatch_upload(struct xen_sysctl_livepatch_upload *upload)
+static int livepatch_upload(struct xen_sysctl_livepatch_upload *upload,
+bool force)
 {
 struct payload *data, *found;
 char n[XEN_LIVEPATCH_NAME_SIZE];
@@ -1162,7 +1164,7 @@ static int livepatch_upload(struct xen_sysctl_livepatch_upload *upload)
 {
 memcpy(data->name, n, strlen(n));
 
-rc = load_payload_data(data, raw_data, upload->size);
+rc = load_payload_data(data, raw_data, upload->size, force);
 if ( rc )
 goto out;
 
@@ -2132,7 +2134,8 @@ int livepatch_op(struct xen_sysctl_livepatch_op *livepatch)
 switch ( livepatch->cmd )
 {
 case XEN_SYSCTL_LIVEPATCH_UPLOAD:
-rc = livepatch_upload(>u.upload);
+rc = livepatch_upload(>u.upload,
+  livepatch->flags & LIVEPATCH_FLAG_FORCE);
 break;
 
 case XEN_SYSCTL_LIVEPATCH_GET:
diff --git a/xen/common/livepatch_elf.c b/xen/common/livepatch_elf.c
index 45d73912a3cd..1d0ed8f99bcc 100644
--- a/xen/common/livepatch_elf.c
+++ b/xen/common/livepatch_elf.c
@@ -4,6 +4,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -276,7 +277,7 @@ static int elf_get_sym(struct livepatch_elf *elf, const void *data)
 return 0;
 }
 
-int livepatch_elf_resolve_symbols(struct livepatch_elf *elf)
+int livepatch_elf_resolve_symbols(struct livepatch_elf *elf, bool force)
 {
 unsigned int i;
 int rc = 0;
@@ -310,6 +311,25 @@ int livepatch_elf_resolve_symbols(struct livepatch_elf *elf)
 break;
 }
 }
+
+/*
+ * Ensure not an init symbol.  Only applicable to Xen symbols, as
+ * livepatch payloads don't have init sections or equivalent.
+ */
+else if ( st_value >= (uintptr_t)&__init_begin &&
+  st_value <  (uintptr_t)&__init_end )
+{
+printk("%s" LIVEPATCH "%s: symbol %s is in init section%s\n",
+   force ? XENLOG_WARNING : XENLOG_ERR,
+   elf->name, elf->sym[i].name,
+   force ? "" : ", not resolving");
+if ( !force )
+{
+rc = -ENXIO;
+break;
+}
+}
+
dprintk(XENLOG_DEBUG, LIVEPATCH "%s: Undefined symbol resolved: %s => %#"PRIxElfAddr"\n",
 elf->name, elf->sym[i].name, st_value);
 break;
diff --git a/xen/include/xen/livepatch_elf.h b/xen/include/xen/livepatch_elf.h
index 7116deaddc28..84e9c5eb7be5 100644
--- a/xen/include/xen/livepatch_elf.h
+++ b/xen/include/xen/livepatch_elf.h
@@ -44,7 +44,7 @@ livepatch_elf_sec_by_name(const struct 

[PATCH v4 1/4] xen-livepatch: fix parameter name parsing

2024-04-24 Thread Roger Pau Monne
It's incorrect to restrict strncmp to the length of the command line input
parameter, as then a user passing a rune like:

% xen-livepatch up foo.livepatch

Would match against the "upload" command, because the string comparison is
truncated to the length of the input argument.  Use strcmp() instead, which
doesn't truncate.  Otherwise, in order to keep using strncmp(), we would also
need to check that the strings are of the same length before doing the
comparison.

Fixes: 05bb8afedede ('xen-xsplice: Tool to manipulate xsplice payloads')
Signed-off-by: Roger Pau Monné 
---
Changes since v3:
 - Use strcmp.

Changes since v2:
 - New in this version.
---
 tools/misc/xen-livepatch.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/misc/xen-livepatch.c b/tools/misc/xen-livepatch.c
index 5bf9d9a32b65..2c4f69e596fa 100644
--- a/tools/misc/xen-livepatch.c
+++ b/tools/misc/xen-livepatch.c
@@ -572,13 +572,13 @@ int main(int argc, char *argv[])
 return 0;
 }
 for ( i = 0; i < ARRAY_SIZE(main_options); i++ )
-if (!strncmp(main_options[i].name, argv[1], strlen(argv[1])))
+if (!strcmp(main_options[i].name, argv[1]))
 break;
 
 if ( i == ARRAY_SIZE(main_options) )
 {
 for ( j = 0; j < ARRAY_SIZE(action_options); j++ )
-if (!strncmp(action_options[j].name, argv[1], strlen(argv[1])))
+if (!strcmp(action_options[j].name, argv[1]))
 break;
 
 if ( j == ARRAY_SIZE(action_options) )
-- 
2.44.0




[PATCH v3 1/4] xen-livepatch: fix parameter name parsing

2024-04-23 Thread Roger Pau Monne
It's incorrect to restrict strncmp to the length of the command line input
parameter, as then a user passing a rune like:

% xen-livepatch up foo.livepatch

Would match against the "upload" command, because the string comparison is
truncated to the length of the input argument.  Instead, the truncation should
be based on the length of the command name stored in the internal array of
actions.

Fixes: 05bb8afedede ('xen-xsplice: Tool to manipulate xsplice payloads')
Signed-off-by: Roger Pau Monné 
---
Changes since v2:
 - New in this version.
---
 tools/misc/xen-livepatch.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/tools/misc/xen-livepatch.c b/tools/misc/xen-livepatch.c
index 5bf9d9a32b65..a246e5dfd38e 100644
--- a/tools/misc/xen-livepatch.c
+++ b/tools/misc/xen-livepatch.c
@@ -572,13 +572,15 @@ int main(int argc, char *argv[])
 return 0;
 }
 for ( i = 0; i < ARRAY_SIZE(main_options); i++ )
-if (!strncmp(main_options[i].name, argv[1], strlen(argv[1])))
+if (!strncmp(main_options[i].name, argv[1],
+ strlen(main_options[i].name)))
 break;
 
 if ( i == ARRAY_SIZE(main_options) )
 {
 for ( j = 0; j < ARRAY_SIZE(action_options); j++ )
-if (!strncmp(action_options[j].name, argv[1], strlen(argv[1])))
+if (!strncmp(action_options[j].name, argv[1],
+ strlen(action_options[j].name)))
 break;
 
 if ( j == ARRAY_SIZE(action_options) )
-- 
2.44.0




[PATCH v3 3/4] livepatch: refuse to resolve symbols that belong to init sections

2024-04-23 Thread Roger Pau Monne
Livepatch payloads containing symbols that belong to init sections can only
lead to page faults later on, as by the time the livepatch is loaded init
sections have already been freed.

Refuse to resolve such symbols and return an error instead.

Note such resolutions are only relevant for symbols that point to undefined
sections (SHN_UNDEF), as that implies the symbol is not in the current payload
and hence must either be a Xen or a different livepatch payload symbol.

Do not allow resolving symbols that point to __init_begin, as that address is
also unmapped.  On the other hand, __init_end is not unmapped, and hence
resolutions against it are allowed.

Since __init_begin can alias other symbols (like _erodata for example)
allow the force flag to override the check and resolve the symbol anyway.

Signed-off-by: Roger Pau Monné 
---
Changes since v2:
 - Allow bypassing added check with the force flag.

Changes since v1:
 - Fix off-by-one in range checking.
---
 xen/common/livepatch.c  | 13 -
 xen/common/livepatch_elf.c  | 18 +-
 xen/include/xen/livepatch_elf.h |  2 +-
 3 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/xen/common/livepatch.c b/xen/common/livepatch.c
index 1503a84457e4..36cf4bee8b8a 100644
--- a/xen/common/livepatch.c
+++ b/xen/common/livepatch.c
@@ -1080,7 +1080,8 @@ static void free_payload(struct payload *data)
 xfree(data);
 }
 
-static int load_payload_data(struct payload *payload, void *raw, size_t len)
+static int load_payload_data(struct payload *payload, void *raw, size_t len,
+ bool force)
 {
 struct livepatch_elf elf = { .name = payload->name, .len = len };
 int rc = 0;
@@ -1093,7 +1094,7 @@ static int load_payload_data(struct payload *payload, void *raw, size_t len)
 if ( rc )
 goto out;
 
-rc = livepatch_elf_resolve_symbols();
+rc = livepatch_elf_resolve_symbols(, force);
 if ( rc )
 goto out;
 
@@ -1133,7 +1134,8 @@ static int load_payload_data(struct payload *payload, void *raw, size_t len)
 return rc;
 }
 
-static int livepatch_upload(struct xen_sysctl_livepatch_upload *upload)
+static int livepatch_upload(struct xen_sysctl_livepatch_upload *upload,
+bool force)
 {
 struct payload *data, *found;
 char n[XEN_LIVEPATCH_NAME_SIZE];
@@ -1162,7 +1164,7 @@ static int livepatch_upload(struct xen_sysctl_livepatch_upload *upload)
 {
 memcpy(data->name, n, strlen(n));
 
-rc = load_payload_data(data, raw_data, upload->size);
+rc = load_payload_data(data, raw_data, upload->size, force);
 if ( rc )
 goto out;
 
@@ -2132,7 +2134,8 @@ int livepatch_op(struct xen_sysctl_livepatch_op *livepatch)
 switch ( livepatch->cmd )
 {
 case XEN_SYSCTL_LIVEPATCH_UPLOAD:
-rc = livepatch_upload(>u.upload);
+rc = livepatch_upload(>u.upload,
+  livepatch->flags & LIVEPATCH_FLAG_FORCE);
 break;
 
 case XEN_SYSCTL_LIVEPATCH_GET:
diff --git a/xen/common/livepatch_elf.c b/xen/common/livepatch_elf.c
index 45d73912a3cd..0436f2d5fcbd 100644
--- a/xen/common/livepatch_elf.c
+++ b/xen/common/livepatch_elf.c
@@ -4,6 +4,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -276,7 +277,7 @@ static int elf_get_sym(struct livepatch_elf *elf, const void *data)
 return 0;
 }
 
-int livepatch_elf_resolve_symbols(struct livepatch_elf *elf)
+int livepatch_elf_resolve_symbols(struct livepatch_elf *elf, bool force)
 {
 unsigned int i;
 int rc = 0;
@@ -310,6 +311,21 @@ int livepatch_elf_resolve_symbols(struct livepatch_elf *elf)
 break;
 }
 }
+
+/*
+ * Ensure not an init symbol.  Only applicable to Xen symbols, as
+ * livepatch payloads don't have init sections or equivalent.
+ */
+else if ( st_value >= (uintptr_t)&__init_begin &&
+  st_value <  (uintptr_t)&__init_end && !force )
+{
+printk(XENLOG_ERR LIVEPATCH
+   "%s: symbol %s is in init section, not resolving\n",
+   elf->name, elf->sym[i].name);
+rc = -ENXIO;
+break;
+}
+
dprintk(XENLOG_DEBUG, LIVEPATCH "%s: Undefined symbol resolved: %s => %#"PRIxElfAddr"\n",
 elf->name, elf->sym[i].name, st_value);
 break;
diff --git a/xen/include/xen/livepatch_elf.h b/xen/include/xen/livepatch_elf.h
index 7116deaddc28..84e9c5eb7be5 100644
--- a/xen/include/xen/livepatch_elf.h
+++ b/xen/include/xen/livepatch_elf.h
@@ -44,7 +44,7 @@ livepatch_elf_sec_by_name(const struct livepatch_elf *elf,
 int livepatch_elf_load(struct livepatch_elf *elf, const void *data);
 void livepatch_elf_free(struct livepatch_elf *elf);
 
-int livepatch_elf_resolve_symbols(struct livepatch_elf *elf);
+int 

[PATCH v3 4/4] x86/livepatch: perform sanity checks on the payload exception table contents

2024-04-23 Thread Roger Pau Monne
Ensure the entries of a payload exception table only apply to text regions in
the payload itself.  Since the payload exception table needs to be loaded and
active even before a patch is applied (because hooks might already rely on it),
make sure the exception table (if any) only contains fixups for the payload
text section.

Signed-off-by: Roger Pau Monné 
---
Changes since v2:
 - New in this version.
---
 xen/arch/x86/extable.c | 18 ++
 xen/arch/x86/include/asm/uaccess.h |  4 
 xen/common/livepatch.c |  9 +
 3 files changed, 31 insertions(+)

diff --git a/xen/arch/x86/extable.c b/xen/arch/x86/extable.c
index 8415cd1fa249..9e91e8234e71 100644
--- a/xen/arch/x86/extable.c
+++ b/xen/arch/x86/extable.c
@@ -228,3 +228,21 @@ unsigned long asmlinkage search_pre_exception_table(struct cpu_user_regs *regs)
 }
 return fixup;
 }
+
+#ifdef CONFIG_LIVEPATCH
+bool extable_is_between_bounds(const struct exception_table_entry *ex_start,
+   const struct exception_table_entry *ex_end,
+   const void *start, const void *end)
+{
+for ( ; ex_start < ex_end; ex_start++ )
+{
+const void *addr = (void *)ex_addr(ex_start);
+const void *cont = (void *)ex_cont(ex_start);
+
+if ( addr < start || addr >= end || cont < start || cont >= end )
+return false;
+}
+
+return true;
+}
+#endif
diff --git a/xen/arch/x86/include/asm/uaccess.h b/xen/arch/x86/include/asm/uaccess.h
index 48b684c19d44..0dad61e21a9c 100644
--- a/xen/arch/x86/include/asm/uaccess.h
+++ b/xen/arch/x86/include/asm/uaccess.h
@@ -426,5 +426,9 @@ extern unsigned long search_exception_table(const struct cpu_user_regs *regs,
 extern void sort_exception_tables(void);
 extern void sort_exception_table(struct exception_table_entry *start,
  const struct exception_table_entry *stop);
+extern bool extable_is_between_bounds(
+const struct exception_table_entry *ex_start,
+const struct exception_table_entry *ex_end,
+const void *start, const void *end);
 
 #endif /* __X86_UACCESS_H__ */
diff --git a/xen/common/livepatch.c b/xen/common/livepatch.c
index 36cf4bee8b8a..67b6815d87ac 100644
--- a/xen/common/livepatch.c
+++ b/xen/common/livepatch.c
@@ -912,6 +912,15 @@ static int prepare_payload(struct payload *payload,
 s = sec->load_addr;
 e = sec->load_addr + sec->sec->sh_size;
 
+if ( !extable_is_between_bounds(s, e, payload->text_addr,
+payload->text_addr + payload->text_size) )
+{
+printk(XENLOG_ERR LIVEPATCH
+   "%s: Invalid exception table with out of bounds entries\n",
+   elf->name);
+return -EINVAL;
+}
+
 sort_exception_table(s ,e);
 
 region->ex = s;
-- 
2.44.0




[PATCH v3 2/4] livepatch: introduce --force option

2024-04-23 Thread Roger Pau Monne
Introduce a xen-livepatch tool --force option that's propagated into the
hypervisor for livepatch operations.  The intention is for the option to be
used to bypass some checks that would otherwise prevent the patch from being
loaded.

Repurpose the pad field in xen_sysctl_livepatch_op to be a flags field that
applies to all livepatch operations.  The flag is currently only set by the
hypercall wrappers for the XEN_SYSCTL_LIVEPATCH_UPLOAD operation, as that's
initially the only one where it will be used.  Other uses can be added as
required.

Note that some helpers previously set the .pad field to 0; those assignments
have been removed since the structure is already zero-initialized at
definition.

This patch introduces no functional usage of the new flag.

Signed-off-by: Roger Pau Monné 
---
Changes since v2:
 - New in this version.
---
 tools/include/xenctrl.h |  3 ++-
 tools/libs/ctrl/xc_misc.c   |  7 +++
 tools/misc/xen-livepatch.c  | 21 +++--
 xen/common/livepatch.c  |  3 ++-
 xen/include/public/sysctl.h |  4 +++-
 5 files changed, 29 insertions(+), 9 deletions(-)

diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 2ef8b4e05422..499685594427 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -2555,7 +2555,8 @@ int xc_psr_get_hw_info(xc_interface *xch, uint32_t socket,
 #endif
 
 int xc_livepatch_upload(xc_interface *xch,
-char *name, unsigned char *payload, uint32_t size);
+char *name, unsigned char *payload, uint32_t size,
+bool force);
 
 int xc_livepatch_get(xc_interface *xch,
  char *name,
diff --git a/tools/libs/ctrl/xc_misc.c b/tools/libs/ctrl/xc_misc.c
index 5ecdfa2c7934..50282fd60dcc 100644
--- a/tools/libs/ctrl/xc_misc.c
+++ b/tools/libs/ctrl/xc_misc.c
@@ -576,7 +576,8 @@ int xc_getcpuinfo(xc_interface *xch, int max_cpus,
 int xc_livepatch_upload(xc_interface *xch,
 char *name,
 unsigned char *payload,
-uint32_t size)
+uint32_t size,
+bool force)
 {
 int rc;
 struct xen_sysctl sysctl = {};
@@ -612,7 +613,7 @@ int xc_livepatch_upload(xc_interface *xch,
 
 sysctl.cmd = XEN_SYSCTL_livepatch_op;
 sysctl.u.livepatch.cmd = XEN_SYSCTL_LIVEPATCH_UPLOAD;
-sysctl.u.livepatch.pad = 0;
+sysctl.u.livepatch.flags = force ? LIVEPATCH_FLAG_FORCE : 0;
 sysctl.u.livepatch.u.upload.size = size;
 set_xen_guest_handle(sysctl.u.livepatch.u.upload.payload, local);
 
@@ -656,7 +657,6 @@ int xc_livepatch_get(xc_interface *xch,
 
 sysctl.cmd = XEN_SYSCTL_livepatch_op;
 sysctl.u.livepatch.cmd = XEN_SYSCTL_LIVEPATCH_GET;
-sysctl.u.livepatch.pad = 0;
 
 sysctl.u.livepatch.u.get.status.state = 0;
 sysctl.u.livepatch.u.get.status.rc = 0;
@@ -985,7 +985,6 @@ static int _xc_livepatch_action(xc_interface *xch,
 
 sysctl.cmd = XEN_SYSCTL_livepatch_op;
 sysctl.u.livepatch.cmd = XEN_SYSCTL_LIVEPATCH_ACTION;
-sysctl.u.livepatch.pad = 0;
 sysctl.u.livepatch.u.action.cmd = action;
 sysctl.u.livepatch.u.action.timeout = timeout;
 sysctl.u.livepatch.u.action.flags = flags;
diff --git a/tools/misc/xen-livepatch.c b/tools/misc/xen-livepatch.c
index a246e5dfd38e..cf2e5fada12d 100644
--- a/tools/misc/xen-livepatch.c
+++ b/tools/misc/xen-livepatch.c
@@ -19,11 +19,15 @@
 
 static xc_interface *xch;
 
+/* Global option to disable checks. */
+static bool force;
+
 void show_help(void)
 {
 fprintf(stderr,
 "xen-livepatch: live patching tool\n"
-"Usage: xen-livepatch  [args] [command-flags]\n"
+"Usage: xen-livepatch [--force]  [args] [command-flags]\n"
+" Use --force option to bypass some checks.\n"
 "  An unique name of payload. Up to %d characters.\n"
 "Commands:\n"
 "  help   display this help\n"
@@ -240,7 +244,7 @@ static int upload_func(int argc, char *argv[])
 return saved_errno;
 }
 printf("Uploading %s... ", filename);
-rc = xc_livepatch_upload(xch, name, fbuf, len);
+rc = xc_livepatch_upload(xch, name, fbuf, len, force);
 if ( rc )
 {
 rc = errno;
@@ -571,6 +575,19 @@ int main(int argc, char *argv[])
 show_help();
 return 0;
 }
+
+if ( !strncmp("--force", argv[1], strlen("--force")) )
+{
+if ( argc <= 2 )
+{
+show_help();
+return EXIT_FAILURE;
+}
+force = true;
+argv++;
+argc--;
+}
+
 for ( i = 0; i < ARRAY_SIZE(main_options); i++ )
 if (!strncmp(main_options[i].name, argv[1],
  strlen(main_options[i].name)))
diff --git a/xen/common/livepatch.c b/xen/common/livepatch.c
index 351a3e0b9a60..1503a84457e4 100644
--- a/xen/common/livepatch.c
+++ b/xen/common/livepatch.c
@@ -2125,7 +2125,8 @@ int livepatch_op(struct 

[PATCH v3 0/4] livepatch: minor bug fixes and improvements

2024-04-23 Thread Roger Pau Monne
Hello,

Following series contain some minor bugfixes and improvements for
livepatch logic, both inside the hypervisor and on the command line
tool.

Thanks, Roger.

Roger Pau Monne (4):
  xen-livepatch: fix parameter name parsing
  livepatch: introduce --force option
  livepatch: refuse to resolve symbols that belong to init sections
  x86/livepatch: perform sanity checks on the payload exception table
contents

 tools/include/xenctrl.h|  3 ++-
 tools/libs/ctrl/xc_misc.c  |  7 +++
 tools/misc/xen-livepatch.c | 27 +++
 xen/arch/x86/extable.c | 18 ++
 xen/arch/x86/include/asm/uaccess.h |  4 
 xen/common/livepatch.c | 25 +++--
 xen/common/livepatch_elf.c | 18 +-
 xen/include/public/sysctl.h|  4 +++-
 xen/include/xen/livepatch_elf.h|  2 +-
 9 files changed, 90 insertions(+), 18 deletions(-)

-- 
2.44.0




[PATCH v2 1/2] x86/video: add boot_video_info offset generation to asm-offsets

2024-04-22 Thread Roger Pau Monne
Currently the offsets into the boot_video_info struct are manually encoded in
video.S, which is fragile.  Generate them in asm-offsets.c and switch the
current code to use those instead.

No functional change intended.

Signed-off-by: Roger Pau Monné 
Reviewed-by: Andrew Cooper 
---
Changes since v1:
 - Style changes.
 - Calculate remaining BVI size without referencing specific fields.
---
 xen/arch/x86/boot/video.S | 83 ---
 xen/arch/x86/x86_64/asm-offsets.c | 26 ++
 2 files changed, 58 insertions(+), 51 deletions(-)

diff --git a/xen/arch/x86/boot/video.S b/xen/arch/x86/boot/video.S
index 0ae04f270f8c..f78906878a16 100644
--- a/xen/arch/x86/boot/video.S
+++ b/xen/arch/x86/boot/video.S
@@ -26,32 +26,13 @@
 /* Force 400 scan lines for standard modes (hack to fix bad BIOS behaviour */
 #undef CONFIG_VIDEO_400_HACK
 
-/* Positions of various video parameters passed to the kernel */
-/* (see also include/linux/tty.h) */
-#define PARAM_CURSOR_POS0x00
-#define PARAM_VIDEO_MODE0x02
-#define PARAM_VIDEO_COLS0x03
-#define PARAM_VIDEO_LINES   0x04
-#define PARAM_HAVE_VGA  0x05
-#define PARAM_FONT_POINTS   0x06
-#define PARAM_CAPABILITIES  0x08
-#define PARAM_LFB_LINELENGTH0x0c
-#define PARAM_LFB_WIDTH 0x0e
-#define PARAM_LFB_HEIGHT0x10
-#define PARAM_LFB_DEPTH 0x12
-#define PARAM_LFB_BASE  0x14
-#define PARAM_LFB_SIZE  0x18
-#define PARAM_LFB_COLORS0x1c
-#define PARAM_VESAPM_SEG0x24
-#define PARAM_VESAPM_OFF0x26
-#define PARAM_VESA_ATTRIB   0x28
 #define _param(param) bootsym(boot_vid_info)+(param)
 
 video:  xorw%ax, %ax
 movw%ax, %gs# GS is zero
 cld
 callbasic_detect# Basic adapter type testing (EGA/VGA/MDA/CGA)
-cmpb$0,_param(PARAM_HAVE_VGA)
+cmpb$0, _param(BVI_have_vga)
 je  1f# Bail if there's no VGA
 movwbootsym(boot_vid_mode), %ax # User selected video mode
 cmpw$ASK_VGA, %ax   # Bring up the menu
@@ -69,7 +50,7 @@ vid1:   callstore_edid
 
 # Detect if we have CGA, MDA, EGA or VGA and pass it to the kernel.
 basic_detect:
-movb$0, _param(PARAM_HAVE_VGA)
+movb$0, _param(BVI_have_vga)
 movb$0x12, %ah  # Check EGA/VGA
 movb$0x10, %bl
 int $0x10
@@ -79,7 +60,7 @@ basic_detect:
 int $0x10
 cmpb$0x1a, %al  # 1a means VGA...
 jne basret  # anything else is EGA.
-incb_param(PARAM_HAVE_VGA)  # We've detected a VGA
+incb_param(BVI_have_vga)# We've detected a VGA
 basret: ret
 
 # Store the video mode parameters for later usage by the kernel.
@@ -92,57 +73,57 @@ mode_params:
 movb$0x03, %ah  # Read cursor position
 xorb%bh, %bh
 int $0x10
-movw%dx, _param(PARAM_CURSOR_POS)
+movw%dx, _param(BVI_cursor_pos)
 movb$0x0f, %ah  # Read page/mode/width
 int $0x10
-movw%ax, _param(PARAM_VIDEO_MODE)   # Video mode and screen width
+movw%ax, _param(BVI_video_mode) # Video mode and screen width
 movw%gs:(0x485), %ax# Font size
-movw%ax, _param(PARAM_FONT_POINTS)  # (valid only on EGA/VGA)
+movw%ax, _param(BVI_font_points)# (valid only on EGA/VGA)
 movwbootsym(force_size), %ax# Forced size?
 orw %ax, %ax
 jz  mopar1
 
-movb%ah, _param(PARAM_VIDEO_COLS)
-movb%al, _param(PARAM_VIDEO_LINES)
+movb%ah, _param(BVI_video_cols)
+movb%al, _param(BVI_video_lines)
 ret
 
 mopar1: movb%gs:(0x484), %al# On EGA/VGA, use the EGA+ BIOS
 incb%al # location of max lines.
-mopar2: movb%al, _param(PARAM_VIDEO_LINES)
+mopar2: movb%al, _param(BVI_video_lines)
 ret
 
 # Fetching of VESA frame buffer parameters
 mopar_gr:
 movw$vesa_mode_info, %di
-movb$0x23, _param(PARAM_HAVE_VGA)
+movb$0x23, _param(BVI_have_vga)
 movw16(%di), %ax
-movw%ax, _param(PARAM_LFB_LINELENGTH)
+movw%ax, _param(BVI_lfb_linelength)
 movw18(%di), %ax
-movw%ax, _param(PARAM_LFB_WIDTH)
+movw%ax, _param(BVI_lfb_width)
 movw20(%di), %ax
-movw%ax, _param(PARAM_LFB_HEIGHT)
+movw%ax, _param(BVI_lfb_height)
 movzbw  25(%di), %ax
-movw%ax, _param(PARAM_LFB_DEPTH)
+movw%ax, _param(BVI_lfb_depth)
 movl40(%di), %eax
-movl%eax, _param(PARAM_LFB_BASE)
+movl%eax, _param(BVI_lfb_base)
 movl

[PATCH v2 0/2] x86/pvh: fix VGA reporting when booted as a PVH guest

2024-04-22 Thread Roger Pau Monne
Hello,

Following patches are kind of unconnected after v1, but patch 1/2 is
still a helpful improvement.

Patch 2 fixes the (wrong) reporting of VGA information when Xen is
booted as a PVH guest.

Thanks, Roger.

Roger Pau Monne (2):
  x86/video: add boot_video_info offset generation to asm-offsets
  x86/pvh: zero VGA information

 xen/arch/x86/boot/video.S | 83 ---
 xen/arch/x86/boot/video.h |  2 +
 xen/arch/x86/guest/xen/pvh-boot.c |  9 
 xen/arch/x86/setup.c  |  1 -
 xen/arch/x86/x86_64/asm-offsets.c | 26 ++
 5 files changed, 69 insertions(+), 52 deletions(-)

-- 
2.44.0




[PATCH v2 2/2] x86/pvh: zero VGA information

2024-04-22 Thread Roger Pau Monne
PVH guests skip real mode VGA detection, and never have a VGA available, hence
the default VGA selection is not applicable, and at worst can cause confusion
when parsing the Xen boot log.

Zero the boot_vid_info structure when Xen is booted from the PVH entry point.

This fixes Xen incorrectly reporting:

(XEN) Video information:
(XEN)  VGA is text mode 80x25, font 8x16

When booted as a PVH guest.

Reported-by: Andrew Cooper 
Signed-off-by: Roger Pau Monné 
---
 xen/arch/x86/boot/video.h | 2 ++
 xen/arch/x86/guest/xen/pvh-boot.c | 9 +
 xen/arch/x86/setup.c  | 1 -
 3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/boot/video.h b/xen/arch/x86/boot/video.h
index 6a7775d24292..1203515f9e5b 100644
--- a/xen/arch/x86/boot/video.h
+++ b/xen/arch/x86/boot/video.h
@@ -67,6 +67,8 @@ struct boot_video_info {
 } vesapm;
 uint16_t vesa_attrib;/* 0x28 */
 };
+
+extern struct boot_video_info boot_vid_info;
 #endif /* __ASSEMBLY__ */
 
 #endif /* __BOOT_VIDEO_H__ */
diff --git a/xen/arch/x86/guest/xen/pvh-boot.c b/xen/arch/x86/guest/xen/pvh-boot.c
index 9cbe87b61bdd..cc57ab2cbcde 100644
--- a/xen/arch/x86/guest/xen/pvh-boot.c
+++ b/xen/arch/x86/guest/xen/pvh-boot.c
@@ -15,6 +15,10 @@
 
 #include 
 
+#ifdef CONFIG_VIDEO
+# include "../../boot/video.h"
+#endif
+
 /* Initialised in head.S, before .bss is zeroed. */
 bool __initdata pvh_boot;
 uint32_t __initdata pvh_start_info_pa;
@@ -95,6 +99,11 @@ void __init pvh_init(multiboot_info_t **mbi, module_t **mod)
 ASSERT(xen_guest);
 
 get_memory_map();
+
+#ifdef CONFIG_VIDEO
+/* No VGA available when booted from the PVH entry point. */
+memset(&boot_vid_info, 0, sizeof(boot_vid_info));
+#endif
 }
 
 void __init pvh_print_info(void)
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 86cd8b999774..449a3476531e 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -646,7 +646,6 @@ static struct e820map __initdata boot_e820;
 
 #ifdef CONFIG_VIDEO
 # include "boot/video.h"
-extern struct boot_video_info boot_vid_info;
 #endif
 
 static void __init parse_video_info(void)
-- 
2.44.0




[PATCH v2 1/2] xen: introduce header file with section related symbols

2024-04-19 Thread Roger Pau Monne
Start by declaring the beginning and end of the init section.

No functional change intended.

Requested-by: Andrew Cooper 
Signed-off-by: Roger Pau Monné 
---
 xen/arch/arm/mmu/setup.c   |  3 +--
 xen/arch/x86/setup.c   |  3 +--
 xen/include/xen/sections.h | 17 +
 3 files changed, 19 insertions(+), 4 deletions(-)
 create mode 100644 xen/include/xen/sections.h

diff --git a/xen/arch/arm/mmu/setup.c b/xen/arch/arm/mmu/setup.c
index c0cb17ca2ecf..f4bb424c3c91 100644
--- a/xen/arch/arm/mmu/setup.c
+++ b/xen/arch/arm/mmu/setup.c
@@ -7,6 +7,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -62,8 +63,6 @@ vaddr_t directmap_virt_start __read_mostly;
 unsigned long directmap_base_pdx __read_mostly;
 #endif
 
-extern char __init_begin[], __init_end[];
-
 /* Checking VA memory layout alignment. */
 static void __init __maybe_unused build_assertions(void)
 {
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 86cd8b999774..dd4d1b2887ee 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -5,6 +5,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -309,8 +310,6 @@ void __init discard_initial_images(void)
 initial_images = NULL;
 }
 
-extern unsigned char __init_begin[], __init_end[];
-
 static void __init init_idle_domain(void)
 {
 scheduler_init();
diff --git a/xen/include/xen/sections.h b/xen/include/xen/sections.h
new file mode 100644
index ..b6cb5604c285
--- /dev/null
+++ b/xen/include/xen/sections.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#ifndef __XEN_SECTIONS_H__
+#define __XEN_SECTIONS_H__
+
+/* SAF-0-safe */
+extern char __init_begin[], __init_end[];
+
+#endif /* !__XEN_SECTIONS_H__ */
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.44.0




[PATCH v2 2/2] livepatch: refuse to resolve symbols that belong to init sections

2024-04-19 Thread Roger Pau Monne
Livepatch payloads containing symbols that belong to init sections can only
lead to page faults later on, as by the time the livepatch is loaded init
sections have already been freed.

Refuse to resolve such symbols and return an error instead.

Note such resolutions are only relevant for symbols that point to undefined
sections (SHN_UNDEF), as that implies the symbol is not in the current payload
and hence must either be a Xen or a different livepatch payload symbol.

Do not allow to resolve symbols that point to __init_begin, as that address is
also unmapped.  On the other hand, __init_end is not unmapped, and hence allow
resolutions against it.

Signed-off-by: Roger Pau Monné 
---
Changes since v1:
 - Fix off-by-one in range checking.
---
 xen/common/livepatch_elf.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/xen/common/livepatch_elf.c b/xen/common/livepatch_elf.c
index 45d73912a3cd..a67101eadc02 100644
--- a/xen/common/livepatch_elf.c
+++ b/xen/common/livepatch_elf.c
@@ -4,6 +4,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -310,6 +311,21 @@ int livepatch_elf_resolve_symbols(struct livepatch_elf *elf)
 break;
 }
 }
+
+/*
+ * Ensure not an init symbol.  Only applicable to Xen symbols, as
+ * livepatch payloads don't have init sections or equivalent.
+ */
+else if ( st_value >= (uintptr_t)&__init_begin &&
+  st_value < (uintptr_t)&__init_end )
+{
+printk(XENLOG_ERR LIVEPATCH
+   "%s: symbol %s is in init section, not resolving\n",
+   elf->name, elf->sym[i].name);
+rc = -ENXIO;
+break;
+}
+
dprintk(XENLOG_DEBUG, LIVEPATCH "%s: Undefined symbol resolved: %s => %#"PRIxElfAddr"\n",
 elf->name, elf->sym[i].name, st_value);
 break;
-- 
2.44.0




[PATCH v2 0/2] livepatch: make symbol resolution more robust

2024-04-19 Thread Roger Pau Monne
Hello,

Previously a single patch, now with a pre-patch to place the init
section markers into a common file.

Thanks, Roger.

Roger Pau Monne (2):
  xen: introduce header file with section related symbols
  livepatch: refuse to resolve symbols that belong to init sections

 xen/arch/arm/mmu/setup.c   |  3 +--
 xen/arch/x86/setup.c   |  3 +--
 xen/common/livepatch_elf.c | 16 
 xen/include/xen/sections.h | 17 +
 4 files changed, 35 insertions(+), 4 deletions(-)
 create mode 100644 xen/include/xen/sections.h

-- 
2.44.0




[PATCH v2 0/2] x86/spec: misc fixes for XSA-456

2024-04-18 Thread Roger Pau Monne
Hello,

Fix patch fixing the printing of whether BHB-entry is used for PV or
HVM, second patch refines a bit the logic to decide whether the lfence
on entry can be elided for the entry points.

Thanks, Roger.

Roger Pau Monne (2):
  x86/spec: fix reporting of BHB clearing usage from guest entry points
   x86/spec: adjust logic that elides lfence

 xen/arch/x86/include/asm/cpufeature.h |  3 ---
 xen/arch/x86/spec_ctrl.c  | 14 +++---
 2 files changed, 7 insertions(+), 10 deletions(-)

-- 
2.44.0




[PATCH v2 2/2] x86/spec: adjust logic that elides lfence

2024-04-18 Thread Roger Pau Monne
It's currently too restrictive by just checking whether there's a BHB clearing
sequence selected.  It should instead check whether BHB clearing is used on
entry from PV or HVM specifically.

Switch to use opt_bhb_entry_{pv,hvm} instead, and then remove cpu_has_bhb_seq
since it no longer has any users.

Reported-by: Jan Beulich 
Fixes: 954c983abcee ('x86/spec-ctrl: Software BHB-clearing sequences')
Signed-off-by: Roger Pau Monné 
---
Changes since v1:
 - New in this version.

There's (possibly) still a bit of overhead for dom0 if BHB clearing is not used
for dom0, as Xen would still add the lfence if domUs require it.
---
 xen/arch/x86/include/asm/cpufeature.h | 3 ---
 xen/arch/x86/spec_ctrl.c  | 6 +++---
 2 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/xen/arch/x86/include/asm/cpufeature.h b/xen/arch/x86/include/asm/cpufeature.h
index 743f11f98940..9bc553681f4a 100644
--- a/xen/arch/x86/include/asm/cpufeature.h
+++ b/xen/arch/x86/include/asm/cpufeature.h
@@ -235,9 +235,6 @@ static inline bool boot_cpu_has(unsigned int feat)
 #define cpu_bug_fpu_ptrsboot_cpu_has(X86_BUG_FPU_PTRS)
 #define cpu_bug_null_segboot_cpu_has(X86_BUG_NULL_SEG)
 
-#define cpu_has_bhb_seq(boot_cpu_has(X86_SPEC_BHB_TSX) ||   \
-boot_cpu_has(X86_SPEC_BHB_LOOPS))
-
 enum _cache_type {
 CACHE_TYPE_NULL = 0,
 CACHE_TYPE_DATA = 1,
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index 1e831c1c108e..40f6ae017010 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -2328,7 +2328,7 @@ void __init init_speculation_mitigations(void)
  * unconditional WRMSR.  If we do have it, or we're not using any
  * prior conditional block, then it's safe to drop the LFENCE.
  */
-if ( !cpu_has_bhb_seq &&
+if ( !opt_bhb_entry_pv &&
  (boot_cpu_has(X86_FEATURE_SC_MSR_PV) ||
   !boot_cpu_has(X86_FEATURE_IBPB_ENTRY_PV)) )
 setup_force_cpu_cap(X86_SPEC_NO_LFENCE_ENTRY_PV);
@@ -2344,7 +2344,7 @@ void __init init_speculation_mitigations(void)
  * active in the block that is skipped when interrupting guest
  * context, then it's safe to drop the LFENCE.
  */
-if ( !cpu_has_bhb_seq &&
+if ( !opt_bhb_entry_pv &&
  (boot_cpu_has(X86_FEATURE_SC_MSR_PV) ||
   (!boot_cpu_has(X86_FEATURE_IBPB_ENTRY_PV) &&
!boot_cpu_has(X86_FEATURE_SC_RSB_PV))) )
@@ -2356,7 +2356,7 @@ void __init init_speculation_mitigations(void)
  * A BHB sequence, if used, is the only conditional action, so if we
  * don't have it, we don't need the safety LFENCE.
  */
-if ( !cpu_has_bhb_seq )
+if ( !opt_bhb_entry_hvm )
 setup_force_cpu_cap(X86_SPEC_NO_LFENCE_ENTRY_VMX);
 }
 
-- 
2.44.0




[PATCH v2 1/2] x86/spec: fix reporting of BHB clearing usage from guest entry points

2024-04-18 Thread Roger Pau Monne
Reporting whether the BHB clearing on entry is done for the different domains
types based on cpu_has_bhb_seq is incorrect, as that variable signals whether
there's a BHB clearing sequence selected, but that alone doesn't imply that
such sequence is used from the PV and/or HVM entry points.

Instead use opt_bhb_entry_{pv,hvm} which do signal whether BHB clearing is
performed on entry from PV/HVM.

Fixes: 689ad48ce9cf ('x86/spec-ctrl: Wire up the Native-BHI software sequences')
Signed-off-by: Roger Pau Monné 
---
Changes since v1:
 - Also fix usage of cpu_has_bhb_seq when deciding whether to print "None".
---
 xen/arch/x86/spec_ctrl.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index dd01e30844a1..1e831c1c108e 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -634,7 +634,7 @@ static void __init print_details(enum ind_thunk thunk)
(boot_cpu_has(X86_FEATURE_SC_MSR_HVM) ||
 boot_cpu_has(X86_FEATURE_SC_RSB_HVM) ||
 boot_cpu_has(X86_FEATURE_IBPB_ENTRY_HVM) ||
-cpu_has_bhb_seq || amd_virt_spec_ctrl ||
+opt_bhb_entry_hvm || amd_virt_spec_ctrl ||
opt_eager_fpu || opt_verw_hvm)   ? ""   : " None",
boot_cpu_has(X86_FEATURE_SC_MSR_HVM)  ? " MSR_SPEC_CTRL" : "",
(boot_cpu_has(X86_FEATURE_SC_MSR_HVM) ||
@@ -643,7 +643,7 @@ static void __init print_details(enum ind_thunk thunk)
opt_eager_fpu ? " EAGER_FPU" : "",
opt_verw_hvm  ? " VERW"  : "",
boot_cpu_has(X86_FEATURE_IBPB_ENTRY_HVM)  ? " IBPB-entry": "",
-   cpu_has_bhb_seq   ? " BHB-entry" : "");
+   opt_bhb_entry_hvm ? " BHB-entry" : "");
 
 #endif
 #ifdef CONFIG_PV
@@ -651,14 +651,14 @@ static void __init print_details(enum ind_thunk thunk)
(boot_cpu_has(X86_FEATURE_SC_MSR_PV) ||
 boot_cpu_has(X86_FEATURE_SC_RSB_PV) ||
 boot_cpu_has(X86_FEATURE_IBPB_ENTRY_PV) ||
-cpu_has_bhb_seq ||
+opt_bhb_entry_pv ||
opt_eager_fpu || opt_verw_pv)? ""   : " None",
boot_cpu_has(X86_FEATURE_SC_MSR_PV)   ? " MSR_SPEC_CTRL" : "",
boot_cpu_has(X86_FEATURE_SC_RSB_PV)   ? " RSB"   : "",
opt_eager_fpu ? " EAGER_FPU" : "",
opt_verw_pv   ? " VERW"  : "",
boot_cpu_has(X86_FEATURE_IBPB_ENTRY_PV)   ? " IBPB-entry": "",
-   cpu_has_bhb_seq   ? " BHB-entry" : "");
+   opt_bhb_entry_pv  ? " BHB-entry" : "");
 
 printk("  XPTI (64-bit PV only): Dom0 %s, DomU %s (with%s PCID)\n",
opt_xpti_hwdom ? "enabled" : "disabled",
-- 
2.44.0




[PATCH] x86/spec: fix reporting of BHB clearing usage from guest entry points

2024-04-15 Thread Roger Pau Monne
Reporting whether the BHB clearing on entry is done for the different domains
types based on cpu_has_bhb_seq is incorrect, as that variable signals whether
there's a BHB clearing sequence selected, but that alone doesn't imply that
such sequence is used from the PV and/or HVM entry points.

Instead use opt_bhb_entry_{pv,hvm} which do signal whether BHB clearing is
performed on entry from PV/HVM.

Fixes: 689ad48ce9cf ('x86/spec-ctrl: Wire up the Native-BHI software sequences')
Signed-off-by: Roger Pau Monné 
---
 xen/arch/x86/spec_ctrl.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index dd01e30844a1..29f5248d01d1 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -643,7 +643,7 @@ static void __init print_details(enum ind_thunk thunk)
opt_eager_fpu ? " EAGER_FPU" : "",
opt_verw_hvm  ? " VERW"  : "",
boot_cpu_has(X86_FEATURE_IBPB_ENTRY_HVM)  ? " IBPB-entry": "",
-   cpu_has_bhb_seq   ? " BHB-entry" : "");
+   opt_bhb_entry_hvm ? " BHB-entry" : "");
 
 #endif
 #ifdef CONFIG_PV
@@ -658,7 +658,7 @@ static void __init print_details(enum ind_thunk thunk)
opt_eager_fpu ? " EAGER_FPU" : "",
opt_verw_pv   ? " VERW"  : "",
boot_cpu_has(X86_FEATURE_IBPB_ENTRY_PV)   ? " IBPB-entry": "",
-   cpu_has_bhb_seq   ? " BHB-entry" : "");
+   opt_bhb_entry_pv  ? " BHB-entry" : "");
 
 printk("  XPTI (64-bit PV only): Dom0 %s, DomU %s (with%s PCID)\n",
opt_xpti_hwdom ? "enabled" : "disabled",
-- 
2.44.0




[PATCH] osstest: increase boot timeout for Debian PV guests

2024-04-12 Thread Roger Pau Monne
The current timeout of 40s seems to be too low for AMD boxes (pinots and
rimavas) in the lab after XSA-455, see:

http://logs.test-lab.xenproject.org/osstest/logs/185303/test-amd64-coresched-amd64-xl/info.html

Increase the timeout to 60s.

Signed-off-by: Roger Pau Monné 
---
 ts-debian-di-install | 2 +-
 ts-debian-install| 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/ts-debian-di-install b/ts-debian-di-install
index 06c7e1f46a92..1a187dfeb9cb 100755
--- a/ts-debian-di-install
+++ b/ts-debian-di-install
@@ -76,7 +76,7 @@ sub prep () {
 target_install_packages_norec($ho, qw(lvm2));
 
 $gho= prepareguest($ho, $gn, $guesthost, 22,
-   $disk_mb, 40);
+   $disk_mb, 60);
 
 prepareguest_part_diskimg($ho, $gho, $disk_mb);
 }
diff --git a/ts-debian-install b/ts-debian-install
index 62db487ad15d..ef2954dd3c8e 100755
--- a/ts-debian-install
+++ b/ts-debian-install
@@ -42,7 +42,7 @@ sub prep () {
 
 $gho= prepareguest($ho, $gn, $guesthost, 22,
$swap_mb + $disk_mb + 2,
-   40);
+   60);
 target_cmd_root($ho, "umount $gho->{Lvdev} ||:");
 }
 
-- 
2.44.0




[PATCH] livepatch: refuse to resolve symbols that belong to init sections

2024-04-12 Thread Roger Pau Monne
Livepatch payloads containing symbols that belong to init sections can only
lead to page faults later on, as by the time the livepatch is loaded init
sections have already been freed.

Refuse to resolve such symbols and return an error instead.

Note such resolutions are only relevant for symbols that point to undefined
sections (SHN_UNDEF), as that implies the symbol is not in the current payload
and hence must either be a Xen or a different livepatch payload symbol.

Signed-off-by: Roger Pau Monné 
---
 xen/common/livepatch_elf.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/xen/common/livepatch_elf.c b/xen/common/livepatch_elf.c
index 45d73912a3cd..25b81189c590 100644
--- a/xen/common/livepatch_elf.c
+++ b/xen/common/livepatch_elf.c
@@ -8,6 +8,9 @@
 #include 
 #include 
 
+/* Needed to check we don't resolve against init section symbols. */
+extern char __init_begin[], __init_end[];
+
 const struct livepatch_elf_sec *
 livepatch_elf_sec_by_name(const struct livepatch_elf *elf,
   const char *name)
@@ -310,6 +313,20 @@ int livepatch_elf_resolve_symbols(struct livepatch_elf *elf)
 break;
 }
 }
+/*
+ * Ensure not an init symbol.  Only applicable to Xen symbols, as
+ * livepatch payloads don't have init sections or equivalent.
+ */
+else if ( st_value >= (uintptr_t)&__init_begin &&
+  st_value <= (uintptr_t)&__init_end )
+{
+printk(XENLOG_ERR LIVEPATCH
+   "%s: symbol %s is in init section, not resolving\n",
+   elf->name, elf->sym[i].name);
+rc = -ENXIO;
+break;
+}
+
dprintk(XENLOG_DEBUG, LIVEPATCH "%s: Undefined symbol resolved: %s => %#"PRIxElfAddr"\n",
 elf->name, elf->sym[i].name, st_value);
 break;
-- 
2.44.0




[PATCH v2] altcall: fix __alt_call_maybe_initdata so it's safe for livepatch

2024-04-11 Thread Roger Pau Monne
Setting alternative call variables as __init is not safe for use with
livepatch, as livepatches can rightfully introduce new alternative calls to
structures marked as __alt_call_maybe_initdata (possibly just indirectly due to
replacing existing functions that use those).  Attempting to resolve those
alternative calls then results in page faults as the variable that holds the
function pointer address has been freed.

When livepatch is supported use the __ro_after_init attribute instead of
__initdata for __alt_call_maybe_initdata.

Fixes: f26bb285949b ('xen: Implement xen/alternative-call.h for use in common code')
Signed-off-by: Roger Pau Monné 
---
Changes since v1:
 - Use #ifdef instead of #ifndef.
---
 xen/include/xen/alternative-call.h | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/xen/include/xen/alternative-call.h b/xen/include/xen/alternative-call.h
index 5c6b9a562b92..10f7d7637e1e 100644
--- a/xen/include/xen/alternative-call.h
+++ b/xen/include/xen/alternative-call.h
@@ -50,7 +50,12 @@
 
 #include 
 
-#define __alt_call_maybe_initdata __initdata
+#ifdef CONFIG_LIVEPATCH
+/* Must keep for livepatches to resolve alternative calls. */
+# define __alt_call_maybe_initdata __ro_after_init
+#else
+# define __alt_call_maybe_initdata __initdata
+#endif
 
 #else
 
-- 
2.44.0




[PATCH] altcall: fix __alt_call_maybe_initdata so it's safe for livepatch

2024-04-11 Thread Roger Pau Monne
Setting alternative call variables as __init is not safe for use with
livepatch, as livepatches can rightfully introduce new alternative calls to
structures marked as __alt_call_maybe_initdata (possibly just indirectly due to
replacing existing functions that use those).  Attempting to resolve those
alternative calls then results in page faults as the variable that holds the
function pointer address has been freed.

When livepatch is supported use the __ro_after_init attribute instead of
__initdata for __alt_call_maybe_initdata.

Fixes: f26bb285949b ('xen: Implement xen/alternative-call.h for use in common code')
Signed-off-by: Roger Pau Monné 
---
 xen/include/xen/alternative-call.h | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/xen/include/xen/alternative-call.h b/xen/include/xen/alternative-call.h
index 5c6b9a562b92..1896063e5da5 100644
--- a/xen/include/xen/alternative-call.h
+++ b/xen/include/xen/alternative-call.h
@@ -50,7 +50,12 @@
 
 #include 
 
-#define __alt_call_maybe_initdata __initdata
+#ifndef CONFIG_LIVEPATCH
+# define __alt_call_maybe_initdata __initdata
+#else
+/* Must keep for livepatches to resolve alternative calls against them. */
+# define __alt_call_maybe_initdata __ro_after_init
+#endif
 
 #else
 
-- 
2.44.0




[PATCH] x86/alternatives: fix .init section reference in _apply_alternatives()

2024-04-09 Thread Roger Pau Monne
The code in _apply_alternatives() will unconditionally attempt to read
__initdata_cf_clobber_{start,end} when called as part of applying alternatives
to a livepatch payload.  That leads to a page-fault as
__initdata_cf_clobber_{start,end} living in .init section will have been
unmapped by the time a livepatch gets loaded.

Fix by adding a check that limits the clobbering of endbr64 instructions to
boot time only.

Fixes: 37ed5da851b8 ('x86/altcall: Optimise away endbr64 instruction where possible')
Signed-off-by: Roger Pau Monné 
---
 xen/arch/x86/alternative.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/arch/x86/alternative.c b/xen/arch/x86/alternative.c
index 21af0e825822..2e7ba6e0b833 100644
--- a/xen/arch/x86/alternative.c
+++ b/xen/arch/x86/alternative.c
@@ -326,7 +326,7 @@ static void init_or_livepatch _apply_alternatives(struct alt_instr *start,
  * Clobber endbr64 instructions now that altcall has finished optimising
  * all indirect branches to direct ones.
  */
-if ( force && cpu_has_xen_ibt )
+if ( force && cpu_has_xen_ibt && system_state < SYS_STATE_active )
 {
 void *const *val;
 unsigned int clobbered = 0;
-- 
2.44.0




[PATCH 1/2] x86/video: add boot_video_info offset generation to asm-offsets

2024-03-28 Thread Roger Pau Monne
Currently the offsets into the boot_video_info struct are manually encoded in
video.S, which is fragile.  Generate them in asm-offsets.c and switch the
current code to use those instead.

No functional change intended.

Signed-off-by: Roger Pau Monné 
---
 xen/arch/x86/boot/video.S | 83 ---
 xen/arch/x86/x86_64/asm-offsets.c | 26 ++
 2 files changed, 58 insertions(+), 51 deletions(-)

diff --git a/xen/arch/x86/boot/video.S b/xen/arch/x86/boot/video.S
index 0ae04f270f8c..a4b25a3b34d1 100644
--- a/xen/arch/x86/boot/video.S
+++ b/xen/arch/x86/boot/video.S
@@ -26,32 +26,13 @@
 /* Force 400 scan lines for standard modes (hack to fix bad BIOS behaviour */
 #undef CONFIG_VIDEO_400_HACK
 
-/* Positions of various video parameters passed to the kernel */
-/* (see also include/linux/tty.h) */
-#define PARAM_CURSOR_POS0x00
-#define PARAM_VIDEO_MODE0x02
-#define PARAM_VIDEO_COLS0x03
-#define PARAM_VIDEO_LINES   0x04
-#define PARAM_HAVE_VGA  0x05
-#define PARAM_FONT_POINTS   0x06
-#define PARAM_CAPABILITIES  0x08
-#define PARAM_LFB_LINELENGTH0x0c
-#define PARAM_LFB_WIDTH 0x0e
-#define PARAM_LFB_HEIGHT0x10
-#define PARAM_LFB_DEPTH 0x12
-#define PARAM_LFB_BASE  0x14
-#define PARAM_LFB_SIZE  0x18
-#define PARAM_LFB_COLORS0x1c
-#define PARAM_VESAPM_SEG0x24
-#define PARAM_VESAPM_OFF0x26
-#define PARAM_VESA_ATTRIB   0x28
 #define _param(param) bootsym(boot_vid_info)+(param)
 
 video:  xorw%ax, %ax
 movw%ax, %gs# GS is zero
 cld
 callbasic_detect# Basic adapter type testing (EGA/VGA/MDA/CGA)
-cmpb$0,_param(PARAM_HAVE_VGA)
+cmpb$0,_param(BVI_have_vga)
 je  1f# Bail if there's no VGA
 movwbootsym(boot_vid_mode), %ax # User selected video mode
 cmpw$ASK_VGA, %ax   # Bring up the menu
@@ -69,7 +50,7 @@ vid1:   callstore_edid
 
 # Detect if we have CGA, MDA, EGA or VGA and pass it to the kernel.
 basic_detect:
-movb$0, _param(PARAM_HAVE_VGA)
+movb$0, _param(BVI_have_vga)
 movb$0x12, %ah  # Check EGA/VGA
 movb$0x10, %bl
 int $0x10
@@ -79,7 +60,7 @@ basic_detect:
 int $0x10
 cmpb$0x1a, %al  # 1a means VGA...
 jne basret  # anything else is EGA.
-incb_param(PARAM_HAVE_VGA)  # We've detected a VGA
+incb_param(BVI_have_vga)# We've detected a VGA
 basret: ret
 
 # Store the video mode parameters for later usage by the kernel.
@@ -92,57 +73,57 @@ mode_params:
 movb$0x03, %ah  # Read cursor position
 xorb%bh, %bh
 int $0x10
-movw%dx, _param(PARAM_CURSOR_POS)
+movw%dx, _param(BVI_cursor_pos)
 movb$0x0f, %ah  # Read page/mode/width
 int $0x10
-movw%ax, _param(PARAM_VIDEO_MODE)   # Video mode and screen width
+movw%ax, _param(BVI_video_mode) # Video mode and screen width
 movw%gs:(0x485), %ax# Font size
-movw%ax, _param(PARAM_FONT_POINTS)  # (valid only on EGA/VGA)
+movw%ax, _param(BVI_font_points)# (valid only on EGA/VGA)
 movwbootsym(force_size), %ax# Forced size?
 orw %ax, %ax
 jz  mopar1
 
-movb%ah, _param(PARAM_VIDEO_COLS)
-movb%al, _param(PARAM_VIDEO_LINES)
+movb%ah, _param(BVI_video_cols)
+movb%al, _param(BVI_video_lines)
 ret
 
 mopar1: movb%gs:(0x484), %al# On EGA/VGA, use the EGA+ BIOS
 incb%al # location of max lines.
-mopar2: movb%al, _param(PARAM_VIDEO_LINES)
+mopar2: movb%al, _param(BVI_video_lines)
 ret
 
 # Fetching of VESA frame buffer parameters
 mopar_gr:
 movw$vesa_mode_info, %di
-movb$0x23, _param(PARAM_HAVE_VGA)
+movb$0x23, _param(BVI_have_vga)
 movw16(%di), %ax
-movw%ax, _param(PARAM_LFB_LINELENGTH)
+movw%ax, _param(BVI_lfb_linelength)
 movw18(%di), %ax
-movw%ax, _param(PARAM_LFB_WIDTH)
+movw%ax, _param(BVI_lfb_width)
 movw20(%di), %ax
-movw%ax, _param(PARAM_LFB_HEIGHT)
+movw%ax, _param(BVI_lfb_height)
 movzbw  25(%di), %ax
-movw%ax, _param(PARAM_LFB_DEPTH)
+movw%ax, _param(BVI_lfb_depth)
 movl40(%di), %eax
-movl%eax, _param(PARAM_LFB_BASE)
+movl%eax, _param(BVI_lfb_base)
 movl31(%di), %eax
-movl%eax, _param(PARAM_LFB_COLORS)
+movl%eax, _param(BVI_lfb_colors)
 movl35(%di), %eax
-  

[PATCH 2/2] x86/video: do not assume a video mode to be unconditionally present

2024-03-28 Thread Roger Pau Monne
There's no reason to assume VGA text mode 3 to be unconditionally available.
With the addition of booting Xen itself in PVH mode there's a boot path that
explicitly short-circuits all the real-mode logic, including the VGA detection.

Leave the default user selected mode as text mode 3 in boot_vid_mode, but do
not populate boot_vid_info with any default settings.  It will either be
populated by the real-mode video detection code, or left zeroed in case
real-mode code is skipped.

Note that only PVH skips the real-mode portion of the boot trampoline,
otherwise the only way to skip it is to set `no-real-mode` on the command line,
and the description for the option already notes that VGA would be disabled as
a result of skipping real-mode bootstrap.

This fixes Xen incorrectly reporting:

(XEN) Video information:
(XEN)  VGA is text mode 80x25, font 8x16

When booted as a PVH guest.

Reported-by: Andrew Cooper 
Signed-off-by: Roger Pau Monné 
---
 xen/arch/x86/boot/video.S | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/xen/arch/x86/boot/video.S b/xen/arch/x86/boot/video.S
index a4b25a3b34d1..a51de04a024e 100644
--- a/xen/arch/x86/boot/video.S
+++ b/xen/arch/x86/boot/video.S
@@ -994,12 +994,7 @@ name_bann:  .asciz  "Video adapter: "
 force_size: .word   0   # Use this size instead of the one in BIOS vars
 
 GLOBAL(boot_vid_info)
-.byte   0, 0/* orig_x, orig_y */
-.byte   3   /* text mode 3*/
-.byte   80, 25  /* 80x25  */
-.byte   1   /* isVGA  */
-.word   16  /* 8x16 font  */
-.space  BVI_size - BVI_capabilities
+.space  BVI_size
 GLOBAL(boot_edid_info)
 .fill   128,1,0x13
 GLOBAL(boot_edid_caps)
-- 
2.44.0




[PATCH 0/2] x86/video: improve early video detection

2024-03-28 Thread Roger Pau Monne
Hello,

The current video logic reports wrong values when booted in PVH mode, as
that boot path explicitly skips real-mode logic, and thus avoids any VGA
video detection.

First patch is cleanup of the offset declarations in boot_video_info.
Second patch attempts to fix Xen in PVH mode reporting VGA support.

Roger Pau Monne (2):
  x86/video: add boot_video_info offset generation to asm-offsets
  x86/video: do not assume a video mode to be unconditionally present

 xen/arch/x86/boot/video.S | 88 +++
 xen/arch/x86/x86_64/asm-offsets.c | 26 +
 2 files changed, 58 insertions(+), 56 deletions(-)

-- 
2.44.0




[PATCH] xen/console: add comment about external console lock helper

2024-03-21 Thread Roger Pau Monne
The current console_lock_recursive_irqsave() implementation is not speculation
safe, however it's only used to prevent interleaved output.  Note this in the
function declaration in order for callers to be aware of the limitation.

No functional change.

Signed-off-by: Roger Pau Monné 
---
 xen/include/xen/console.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/xen/include/xen/console.h b/xen/include/xen/console.h
index 68759862e88d..6dfbade3ece3 100644
--- a/xen/include/xen/console.h
+++ b/xen/include/xen/console.h
@@ -20,6 +20,7 @@ void console_init_postirq(void);
 void console_endboot(void);
 int console_has(const char *device);
 
+/* Not speculation safe - only used to prevent interleaving of output. */
 unsigned long console_lock_recursive_irqsave(void);
 void console_unlock_recursive_irqrestore(unsigned long flags);
 void console_force_unlock(void);
-- 
2.44.0




[PATCH] x86/vcpu: relax VCPUOP_initialise restriction for non-PV vCPUs

2024-03-20 Thread Roger Pau Monne
There's no reason to force HVM guests to have a valid vcpu_info area when
initializing a vCPU, as the vCPU can also be brought online using the local
APIC, and on that path there's no requirement for vcpu_info to be setup ahead
of the bring up.  Note an HVM vCPU can operate normally without making use of
vcpu_info.

Restrict the check against dummy_vcpu_info to only apply to PV guests.

Fixes: 192df6f9122d ('x86: allow HVM guests to use hypercalls to bring up vCPUs')
Signed-off-by: Roger Pau Monné 
---
 xen/common/compat/domain.c | 2 +-
 xen/common/domain.c| 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/common/compat/domain.c b/xen/common/compat/domain.c
index 7ff238cc2656..6b4afc823217 100644
--- a/xen/common/compat/domain.c
+++ b/xen/common/compat/domain.c
@@ -49,7 +49,7 @@ int compat_common_vcpu_op(int cmd, struct vcpu *v,
 {
 case VCPUOP_initialise:
 {
-if ( v->vcpu_info_area.map == &dummy_vcpu_info )
+if ( is_pv_domain(d) && v->vcpu_info_area.map == &dummy_vcpu_info )
 return -EINVAL;
 
 #ifdef CONFIG_HVM
diff --git a/xen/common/domain.c b/xen/common/domain.c
index f6f557499660..d956dc09eca0 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -1817,7 +1817,7 @@ long common_vcpu_op(int cmd, struct vcpu *v, XEN_GUEST_HANDLE_PARAM(void) arg)
 switch ( cmd )
 {
 case VCPUOP_initialise:
-if ( v->vcpu_info_area.map == &dummy_vcpu_info )
+if ( is_pv_domain(d) && v->vcpu_info_area.map == &dummy_vcpu_info )
 return -EINVAL;
 
 rc = arch_initialise_vcpu(v, arg);
-- 
2.44.0




[PATCH] x86/time: prefer CMOS over EFI_GET_TIME

2024-03-15 Thread Roger Pau Monne
EFI_GET_TIME doesn't seem to be very reliable:

[ Xen-4.19-unstable  x86_64  debug=y  Tainted:   C]
CPU:0
RIP:e008:[<62ccfa70>] 62ccfa70
[...]
Xen call trace:
   [<62ccfa70>] R 62ccfa70
   [<732e9a3f>] S 732e9a3f
   [] F arch/x86/time.c#get_cmos_time+0x1b3/0x26e
   [] F init_xen_time+0x28/0xa4
   [] F __start_xen+0x1ee7/0x2578
   [] F __high_start+0x94/0xa0

Pagetable walk from 62ccfa70:
 L4[0x000] = 00207ef1c063 
 L3[0x001] = 5d6c0063 
 L2[0x116] = 800062c001e3  (PSE)


Panic on CPU 0:
FATAL PAGE FAULT
[error_code=0011]
Faulting linear address: 62ccfa70


Swap the preference to default to CMOS first, and EFI later, in an attempt to
use EFI_GET_TIME as a last resort option only.  Note that Linux for example
doesn't allow calling the get_time method, and instead provides a dummy handler
that unconditionally returns EFI_UNSUPPORTED on x86-64.

The error checks are moved to the end of the function, in order to have them
grouped together.

Signed-off-by: Roger Pau Monné 
---
 xen/arch/x86/time.c | 38 ++
 1 file changed, 22 insertions(+), 16 deletions(-)

diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 60870047894b..fcce5528e1f0 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -1176,26 +1176,17 @@ static void __get_cmos_time(struct rtc_time *rtc)
 
 static unsigned long get_cmos_time(void)
 {
-unsigned long res, flags;
+unsigned long flags;
 struct rtc_time rtc;
 unsigned int seconds = 60;
 static bool __read_mostly cmos_rtc_probe;
 boolean_param("cmos-rtc-probe", cmos_rtc_probe);
 
-if ( efi_enabled(EFI_RS) )
-{
-res = efi_get_time();
-if ( res )
-return res;
-}
-
 if ( likely(!(acpi_gbl_FADT.boot_flags & ACPI_FADT_NO_CMOS_RTC)) )
 cmos_rtc_probe = false;
-else if ( system_state < SYS_STATE_smp_boot && !cmos_rtc_probe )
-panic("System with no CMOS RTC advertised must be booted from EFI"
-  " (or with command line option \"cmos-rtc-probe\")\n");
 
-for ( ; ; )
+for ( ; !(acpi_gbl_FADT.boot_flags & ACPI_FADT_NO_CMOS_RTC) ||
+cmos_rtc_probe; )
 {
 s_time_t start, t1, t2;
 
@@ -1223,7 +1214,8 @@ static unsigned long get_cmos_time(void)
  rtc.sec >= 60 || rtc.min >= 60 || rtc.hour >= 24 ||
  !rtc.day || rtc.day > 31 ||
  !rtc.mon || rtc.mon > 12 )
-break;
+return mktime(rtc.year, rtc.mon, rtc.day, rtc.hour, rtc.min,
+  rtc.sec);
 
 if ( seconds < 60 )
 {
@@ -1231,6 +1223,8 @@ static unsigned long get_cmos_time(void)
 {
 cmos_rtc_probe = false;
 acpi_gbl_FADT.boot_flags &= ~ACPI_FADT_NO_CMOS_RTC;
+return mktime(rtc.year, rtc.mon, rtc.day, rtc.hour, rtc.min,
+  rtc.sec);
 }
 break;
 }
@@ -1240,10 +1234,22 @@ static unsigned long get_cmos_time(void)
 seconds = rtc.sec;
 }
 
-if ( unlikely(cmos_rtc_probe) )
-panic("No CMOS RTC found - system must be booted from EFI\n");
+if ( efi_enabled(EFI_RS) )
+{
+unsigned long res = efi_get_time();
+
+if ( res )
+return res;
+
+panic("Broken EFI_GET_TIME %s\n",
+  !cmos_rtc_probe ? "try booting with \"cmos-rtc-probe\""
+  : "and no CMOS RTC found");
+}
+if ( !cmos_rtc_probe )
+panic("System with no CMOS RTC advertised must be booted from EFI"
+  " (or with command line option \"cmos-rtc-probe\")\n");
 
-return mktime(rtc.year, rtc.mon, rtc.day, rtc.hour, rtc.min, rtc.sec);
+panic("No CMOS RTC found - system must be booted from EFI\n");
 }
 
 static unsigned int __ro_after_init cmos_alias_mask;
-- 
2.44.0




[PATCH] x86/mm: fix detection of last L1 entry in modify_xen_mappings_lite()

2024-03-11 Thread Roger Pau Monne
The current logic to detect when to switch to the next L1 table is incorrectly
using l2_table_offset() in order to notice when the last entry on the current
L1 table has been reached.

It should instead use l1_table_offset() to check whether the index has wrapped
to point to the first entry, and so the next L1 table should be used.

Fixes: 8676092a0f16 ('x86/livepatch: Fix livepatch application when CET is active')
Signed-off-by: Roger Pau Monné 
---
This fixes the osstest livepatch related crash, we have been lucky so far that
the .text section didn't seem to have hit this.
---
 xen/arch/x86/mm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 2aff6d4b5338..0c6658298de2 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -5959,7 +5959,7 @@ void init_or_livepatch modify_xen_mappings_lite(
 
 v += 1UL << L1_PAGETABLE_SHIFT;
 
-if ( l2_table_offset(v) == 0 )
+if ( l1_table_offset(v) == 0 )
 break;
 }
 
-- 
2.44.0




[PATCH] x86/mm: re-implement get_page_light() using an atomic increment

2024-03-01 Thread Roger Pau Monne
The current usage of a cmpxchg loop to increase the value of page count is not
optimal on amd64, as there's already an instruction to do an atomic add to a
64bit integer.

Switch the code in get_page_light() to use an atomic increment, as that avoids
a loop construct.  This slightly changes the order of the checks, as current
code will crash before modifying the page count_info if the conditions are not
correct, while with the proposed change the crash will happen immediately
after having carried out the counter increase.  Since we are crashing anyway, I
don't believe the re-ordering to have any meaningful impact.

Note that the page must already have a non-zero reference count which prevents
the flags from changing, and the previous usage of the cmpxchg loop didn't
guarantee that the rest of the fields in count_info didn't change while
updating the reference count.

Signed-off-by: Roger Pau Monné 
---
 xen/arch/x86/mm.c | 12 +++-
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 4d6d7bfe4f89..2aff6d4b5338 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -2580,16 +2580,10 @@ bool get_page(struct page_info *page, const struct 
domain *domain)
  */
 static void get_page_light(struct page_info *page)
 {
-unsigned long x, nx, y = page->count_info;
+unsigned long old_pgc = arch_fetch_and_add(&page->count_info, 1);
 
-do {
-x  = y;
-nx = x + 1;
-BUG_ON(!(x & PGC_count_mask)); /* Not allocated? */
-BUG_ON(!(nx & PGC_count_mask)); /* Overflow? */
-y = cmpxchg(&page->count_info, x, nx);
-}
-while ( unlikely(y != x) );
+BUG_ON(!(old_pgc & PGC_count_mask)); /* Not allocated? */
+BUG_ON(!((old_pgc + 1) & PGC_count_mask)); /* Overflow? */
 }
 
 static int validate_page(struct page_info *page, unsigned long type,
-- 
2.44.0




[PATCH] pci: fix locking around vPCI removal in pci_remove_device()

2024-02-29 Thread Roger Pau Monne
Currently vpci_deassign_device() is called without holding the per-domain
pci_lock in pci_remove_device(), which leads to:

Assertion 'rw_is_write_locked(&pdev->domain->pci_lock)' failed at ../drivers/vpci/vpci.c:47
[...]
Xen call trace:
   [] R vpci_deassign_device+0x10d/0x1b9
   [] S pci_remove_device+0x2b1/0x380
   [] F pci_physdev_op+0x197/0x19e
   [] F do_physdev_op+0x342/0x12aa
   [] F pv_hypercall+0x58e/0x62b
   [] F lstar_enter+0x13a/0x140

Move the existing block that removes the device from the domain pdev_list ahead
and also issue the call to vpci_deassign_device() there.  It's fine to remove
the device from the domain list of assigned devices, as further functions only
care that the pdev domain field is correctly set to the owner of the device
about to be removed.

Moving the vpci_deassign_device() past the pci_cleanup_msi() call can be
dangerous, as doing the MSI cleanup ahead of having removed the vPCI handlers
could lead to stale data in vPCI MSI(-X) internal structures.

Fixes: 4f78438b45e2 ('vpci: use per-domain PCI lock to protect vpci structure')
Signed-off-by: Roger Pau Monné 
---
 xen/drivers/passthrough/pci.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index 4c0a836486ec..194701c9137d 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -817,15 +817,15 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
 list_for_each_entry ( pdev, &pseg->alldevs_list, alldevs_list )
 if ( pdev->bus == bus && pdev->devfn == devfn )
 {
-vpci_deassign_device(pdev);
-pci_cleanup_msi(pdev);
-ret = iommu_remove_device(pdev);
 if ( pdev->domain )
 {
 write_lock(&pdev->domain->pci_lock);
+vpci_deassign_device(pdev);
 list_del(&pdev->domain_list);
 write_unlock(&pdev->domain->pci_lock);
 }
+pci_cleanup_msi(pdev);
+ret = iommu_remove_device(pdev);
 printk(XENLOG_DEBUG "PCI remove device %pp\n", &pdev->sbdf);
 free_pdev(pseg, pdev);
 break;
-- 
2.44.0




[PATCH 1/2] README: bump minimum required clang/llvm version

2024-02-29 Thread Roger Pau Monne
We no longer have a way to build with the minimum required clang/llvm version
stated in the README on the gitlab CI loop, since we dropped the Debian Jessie
container that had Clang 3.5.0.

Bump the minimum required Clang/LLVM to the one used in the oldest production
FreeBSD version (13.2 currently), as that's the main reason I care to maintain
Clang/LLVM support, and as far as I know FreeBSD is the only production
deployment of Xen built with Clang/LLVM.

Purge the build jobs for non-supported Clang versions from Gitlab CI.  Note the
.dockerfiles for the respective distros are explicitly not adjusted to drop the
install of the clang packages, or else those jobs would start to fail on older
Xen branches.

Signed-off-by: Roger Pau Monné 
---
I'm willing to consider older versions, but there needs to be a reason (iow:
use-case) for considering those, as maintaining support for older toolchains is
a burden.
---
 README  |  2 +-
 automation/gitlab-ci/build.yaml | 45 -
 2 files changed, 1 insertion(+), 46 deletions(-)

diff --git a/README b/README
index c8a108449e29..5fe52cc7a932 100644
--- a/README
+++ b/README
@@ -41,7 +41,7 @@ provided by your OS distributor:
 - GCC 4.1.2_20070115 or later
 - GNU Binutils 2.16.91.0.5 or later
 or
-- Clang/LLVM 3.5 or later
+- Clang/LLVM 14.0.0 or later
   - For ARM 32-bit:
 - GCC 4.9 or later
 - GNU Binutils 2.24 or later
diff --git a/automation/gitlab-ci/build.yaml b/automation/gitlab-ci/build.yaml
index 6d2cb18b8883..347fe1b5a8db 100644
--- a/automation/gitlab-ci/build.yaml
+++ b/automation/gitlab-ci/build.yaml
@@ -638,21 +638,6 @@ debian-stretch-gcc:
   variables:
 CONTAINER: debian:stretch
 
-debian-stretch-clang:
-  extends: .clang-x86-64-build
-  variables:
-CONTAINER: debian:stretch
-
-debian-stretch-clang-debug:
-  extends: .clang-x86-64-build-debug
-  variables:
-CONTAINER: debian:stretch
-
-debian-stretch-32-clang-debug:
-  extends: .clang-x86-32-build-debug
-  variables:
-CONTAINER: debian:stretch-i386
-
 debian-stretch-32-gcc-debug:
   extends: .gcc-x86-32-build-debug
   variables:
@@ -725,16 +710,6 @@ ubuntu-trusty-gcc-debug:
   variables:
 CONTAINER: ubuntu:trusty
 
-ubuntu-xenial-clang:
-  extends: .clang-x86-64-build
-  variables:
-CONTAINER: ubuntu:xenial
-
-ubuntu-xenial-clang-debug:
-  extends: .clang-x86-64-build-debug
-  variables:
-CONTAINER: ubuntu:xenial
-
 ubuntu-xenial-gcc:
   extends: .gcc-x86-64-build
   variables:
@@ -745,16 +720,6 @@ ubuntu-xenial-gcc-debug:
   variables:
 CONTAINER: ubuntu:xenial
 
-ubuntu-bionic-clang:
-  extends: .clang-x86-64-build
-  variables:
-CONTAINER: ubuntu:bionic
-
-ubuntu-bionic-clang-debug:
-  extends: .clang-x86-64-build-debug
-  variables:
-CONTAINER: ubuntu:bionic
-
 ubuntu-bionic-gcc:
   extends: .gcc-x86-64-build
   variables:
@@ -775,16 +740,6 @@ ubuntu-focal-gcc-debug:
   variables:
 CONTAINER: ubuntu:focal
 
-ubuntu-focal-clang:
-  extends: .clang-x86-64-build
-  variables:
-CONTAINER: ubuntu:focal
-
-ubuntu-focal-clang-debug:
-  extends: .clang-x86-64-build-debug
-  variables:
-CONTAINER: ubuntu:focal
-
 opensuse-leap-clang:
   extends: .clang-x86-64-build
   variables:
-- 
2.44.0




[PATCH 2/2] automation: introduce non debug clang based tests

2024-02-29 Thread Roger Pau Monne
The generated code between the debug and release builds can be quite
different; notably, the issue with 2ce562b2a413 only manifested in non-debug
builds due to the usage of -O2.

Duplicate the clang based test in order to test with both debug and non-debug
Xen builds.

Signed-off-by: Roger Pau Monné 
---
 automation/gitlab-ci/test.yaml | 18 --
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/automation/gitlab-ci/test.yaml b/automation/gitlab-ci/test.yaml
index 8b7b2e4da92d..dedca794b257 100644
--- a/automation/gitlab-ci/test.yaml
+++ b/automation/gitlab-ci/test.yaml
@@ -422,13 +422,20 @@ qemu-smoke-x86-64-gcc:
   needs:
 - debian-stretch-gcc-debug
 
-qemu-smoke-x86-64-clang:
+qemu-smoke-x86-64-clang-debug:
   extends: .qemu-x86-64
   script:
 - ./automation/scripts/qemu-smoke-x86-64.sh pv 2>&1 | tee ${LOGFILE}
   needs:
 - debian-bookworm-clang-debug
 
+qemu-smoke-x86-64-clang:
+  extends: .qemu-x86-64
+  script:
+- ./automation/scripts/qemu-smoke-x86-64.sh pv 2>&1 | tee ${LOGFILE}
+  needs:
+- debian-bookworm-clang
+
 qemu-smoke-x86-64-gcc-pvh:
   extends: .qemu-x86-64
   script:
@@ -436,13 +443,20 @@ qemu-smoke-x86-64-gcc-pvh:
   needs:
 - debian-stretch-gcc-debug
 
-qemu-smoke-x86-64-clang-pvh:
+qemu-smoke-x86-64-clang-debug-pvh:
   extends: .qemu-x86-64
   script:
 - ./automation/scripts/qemu-smoke-x86-64.sh pvh 2>&1 | tee ${LOGFILE}
   needs:
 - debian-bookworm-clang-debug
 
+qemu-smoke-x86-64-clang-pvh:
+  extends: .qemu-x86-64
+  script:
+- ./automation/scripts/qemu-smoke-x86-64.sh pvh 2>&1 | tee ${LOGFILE}
+  needs:
+- debian-bookworm-clang
+
 qemu-smoke-riscv64-gcc:
   extends: .qemu-riscv64
   script:
-- 
2.44.0




[PATCH 0/2] Restrict Clang/LLVM supported versions

2024-02-29 Thread Roger Pau Monne
Hello,

The following series limits the supported versions of Clang to what I
actually care about, as I seem to be the only one that actively uses
Clang/LLVM builds of Xen.

Patch 2 adds non-debug tests for -clang builds.

Thanks, Roger.

Roger Pau Monne (2):
  README: bump minimum required clang/llvm version
  automation: introduce non debug clang based tests

 README  |  2 +-
 automation/gitlab-ci/build.yaml | 45 -
 automation/gitlab-ci/test.yaml  | 18 +++--
 3 files changed, 17 insertions(+), 48 deletions(-)

-- 
2.44.0




[PATCH] x86/altcall: always use a temporary parameter stashing variable

2024-02-28 Thread Roger Pau Monne
The usage in ALT_CALL_ARG() on clang of:

register union {
typeof(arg) e;
const unsigned long r;
} ...

When `arg` is the first argument to alternative_{,v}call() and
const_vlapic_vcpu() is used results in clang 3.5.0 complaining with:

arch/x86/hvm/vlapic.c:141:47: error: non-const static data member must be initialized out of line
 alternative_call(hvm_funcs.test_pir, const_vlapic_vcpu(vlapic), vec) )

Workaround this by pulling `arg1` into a local variable, like it's done for
further arguments (arg2, arg3...)

Originally arg1 wasn't pulled into a variable because, for the a1_ register
local variable, possible clobbering as a result of operators on other
variables doesn't matter:

https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html#Local-Register-Variables

Note clang version 3.8.1 seems to already be fixed and doesn't require the
workaround, but since it's harmless apply it uniformly everywhere.

Reported-by: Andrew Cooper 
Fixes: 2ce562b2a413 ('x86/altcall: use a union as register type for function parameters on clang')
Signed-off-by: Roger Pau Monné 
---
Gitlab CI seems OK on both the 4.17 and staging branches with this applied:

4.17:https://gitlab.com/xen-project/people/royger/xen/-/pipelines/1193801183
staging: https://gitlab.com/xen-project/people/royger/xen/-/pipelines/1193801881
---
 xen/arch/x86/include/asm/alternative.h | 36 +-
 1 file changed, 24 insertions(+), 12 deletions(-)

diff --git a/xen/arch/x86/include/asm/alternative.h b/xen/arch/x86/include/asm/alternative.h
index 3c14db5078ba..0d3697f1de49 100644
--- a/xen/arch/x86/include/asm/alternative.h
+++ b/xen/arch/x86/include/asm/alternative.h
@@ -253,21 +253,24 @@ extern void alternative_branches(void);
 })
 
 #define alternative_vcall1(func, arg) ({   \
-ALT_CALL_ARG(arg, 1);  \
+typeof(arg) v1_ = (arg);   \
+ALT_CALL_ARG(v1_, 1);  \
 ALT_CALL_NO_ARG2;  \
 (void)sizeof(func(arg));   \
 (void)alternative_callN(1, int, func); \
 })
 
 #define alternative_call1(func, arg) ({\
-ALT_CALL_ARG(arg, 1);  \
+typeof(arg) v1_ = (arg);   \
+ALT_CALL_ARG(v1_, 1);  \
 ALT_CALL_NO_ARG2;  \
 alternative_callN(1, typeof(func(arg)), func); \
 })
 
 #define alternative_vcall2(func, arg1, arg2) ({   \
+typeof(arg1) v1_ = (arg1);\
 typeof(arg2) v2_ = (arg2);\
-ALT_CALL_ARG(arg1, 1);\
+ALT_CALL_ARG(v1_, 1); \
 ALT_CALL_ARG(v2_, 2); \
 ALT_CALL_NO_ARG3; \
 (void)sizeof(func(arg1, arg2));   \
@@ -275,17 +278,19 @@ extern void alternative_branches(void);
 })
 
 #define alternative_call2(func, arg1, arg2) ({\
+typeof(arg1) v1_ = (arg1);\
 typeof(arg2) v2_ = (arg2);\
-ALT_CALL_ARG(arg1, 1);\
+ALT_CALL_ARG(v1_, 1); \
 ALT_CALL_ARG(v2_, 2); \
 ALT_CALL_NO_ARG3; \
 alternative_callN(2, typeof(func(arg1, arg2)), func); \
 })
 
 #define alternative_vcall3(func, arg1, arg2, arg3) ({\
+typeof(arg1) v1_ = (arg1);   \
 typeof(arg2) v2_ = (arg2);   \
 typeof(arg3) v3_ = (arg3);   \
-ALT_CALL_ARG(arg1, 1);   \
+ALT_CALL_ARG(v1_, 1);\
 ALT_CALL_ARG(v2_, 2);\
 ALT_CALL_ARG(v3_, 3);\
 ALT_CALL_NO_ARG4;\
@@ -294,9 +299,10 @@ extern void alternative_branches(void);
 })
 
 #define alternative_call3(func, arg1, arg2, arg3) ({ \
+typeof(arg1) v1_ = (arg1);\
 typeof(arg2) v2_ = (arg2);   \
 typeof(arg3) v3_ = (arg3);   \
-ALT_CALL_ARG(arg1, 1);   \
+ALT_CALL_ARG(v1_, 1);\
 ALT_CALL_ARG(v2_, 2);\
 ALT_CALL_ARG(v3_, 3);\
 ALT_CALL_NO_ARG4;\
@@ -305,10 +311,11 @@ extern void alternative_branches(void);
 })
 
 #define alternative_vcall4(func, arg1, arg2, arg3, arg4) ({ \
+typeof(arg1) v1_ = (arg1);  \
 typeof(arg2) v2_ = (arg2);  \
 typeof(arg3) v3_ = (arg3);  \
 

[PATCH v2 4/5] xen/livepatch: properly build the noapply and norevert tests

2024-02-27 Thread Roger Pau Monne
It seems the build variables for those tests were copy-pasted from
xen_action_hooks_marker-objs and not adjusted to use the correct source files.

Fixes: 6047104c3ccc ('livepatch: Add per-function applied/reverted state tracking marker')
Signed-off-by: Roger Pau Monné 
Reviewed-by: Ross Lagerwall 
---
 xen/test/livepatch/Makefile | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/test/livepatch/Makefile b/xen/test/livepatch/Makefile
index c258ab0b5940..d987a8367f15 100644
--- a/xen/test/livepatch/Makefile
+++ b/xen/test/livepatch/Makefile
@@ -118,12 +118,12 @@ xen_action_hooks_marker-objs := xen_action_hooks_marker.o xen_hello_world_func.o
 $(obj)/xen_action_hooks_noapply.o: $(obj)/config.h
 
 extra-y += xen_action_hooks_noapply.livepatch
-xen_action_hooks_noapply-objs := xen_action_hooks_marker.o xen_hello_world_func.o note.o xen_note.o
+xen_action_hooks_noapply-objs := xen_action_hooks_noapply.o xen_hello_world_func.o note.o xen_note.o
 
 $(obj)/xen_action_hooks_norevert.o: $(obj)/config.h
 
 extra-y += xen_action_hooks_norevert.livepatch
-xen_action_hooks_norevert-objs := xen_action_hooks_marker.o xen_hello_world_func.o note.o xen_note.o
+xen_action_hooks_norevert-objs := xen_action_hooks_norevert.o xen_hello_world_func.o note.o xen_note.o
 
 EXPECT_BYTES_COUNT := 8
CODE_GET_EXPECT=$(shell $(OBJDUMP) -d --insn-width=1 $(1) | sed -n -e '/<'$(2)'>:$$/,/^$$/ p' | tail -n +2 | head -n $(EXPECT_BYTES_COUNT) | awk '{$$0=$$2; printf "%s", substr($$0,length-1)}' | sed 's/.\{2\}/0x&,/g' | sed 's/^/{/;s/,$$/}/g')
-- 
2.44.0




[PATCH v2 2/5] xen/livepatch: search for symbols in all loaded payloads

2024-02-27 Thread Roger Pau Monne
When checking if an address belongs to a patch, or when resolving a symbol,
take into account all loaded livepatch payloads, even if not applied.

This is required in order for the pre-apply and post-revert hooks to work
properly, or else Xen won't detect the instruction pointer belonging to those
hooks as being part of the currently active text.

Move the RCU handling to be used for payload_list instead of applied_list, as
now the calls from trap code will iterate over the payload_list.

Fixes: 8313c864fa95 ('livepatch: Implement pre-|post- apply|revert hooks')
Signed-off-by: Roger Pau Monné 
Reviewed-by: Ross Lagerwall 
---
 xen/common/livepatch.c | 49 +++---
 1 file changed, 17 insertions(+), 32 deletions(-)

diff --git a/xen/common/livepatch.c b/xen/common/livepatch.c
index d7f50e101858..14295bae8704 100644
--- a/xen/common/livepatch.c
+++ b/xen/common/livepatch.c
@@ -36,13 +36,14 @@
  * caller in schedule_work.
  */
 static DEFINE_SPINLOCK(payload_lock);
-static LIST_HEAD(payload_list);
-
 /*
- * Patches which have been applied. Need RCU in case we crash (and then
- * traps code would iterate via applied_list) when adding entries on the list.
+ * Need RCU in case we crash (and then traps code would iterate via
+ * payload_list) when adding entries on the list.
  */
-static DEFINE_RCU_READ_LOCK(rcu_applied_lock);
+static DEFINE_RCU_READ_LOCK(rcu_payload_lock);
+static LIST_HEAD(payload_list);
+
+/* Patches which have been applied. Only modified from stop machine context. */
 static LIST_HEAD(applied_list);
 
 static unsigned int payload_cnt;
@@ -111,12 +112,8 @@ bool is_patch(const void *ptr)
 const struct payload *data;
 bool r = false;
 
-/*
- * Only RCU locking since this list is only ever changed during apply
- * or revert context. And in case it dies there we need an safe list.
- */
-rcu_read_lock(&rcu_applied_lock);
-list_for_each_entry_rcu ( data, &applied_list, applied_list )
+rcu_read_lock(&rcu_payload_lock);
+list_for_each_entry_rcu ( data, &payload_list, list )
 {
 if ( (ptr >= data->rw_addr &&
   ptr < (data->rw_addr + data->rw_size)) ||
@@ -130,7 +127,7 @@ bool is_patch(const void *ptr)
 }
 
 }
-rcu_read_unlock(&rcu_applied_lock);
+rcu_read_unlock(&rcu_payload_lock);
 
 return r;
 }
@@ -166,12 +163,8 @@ static const char *cf_check livepatch_symbols_lookup(
 const void *va = (const void *)addr;
 const char *n = NULL;
 
-/*
- * Only RCU locking since this list is only ever changed during apply
- * or revert context. And in case it dies there we need an safe list.
- */
-rcu_read_lock(&rcu_applied_lock);
-list_for_each_entry_rcu ( data, &applied_list, applied_list )
+rcu_read_lock(&rcu_payload_lock);
+list_for_each_entry_rcu ( data, &payload_list, list )
 {
 if ( va < data->text_addr ||
  va >= (data->text_addr + data->text_size) )
@@ -200,7 +193,7 @@ static const char *cf_check livepatch_symbols_lookup(
 n = data->symtab[best].name;
 break;
 }
-rcu_read_unlock(&rcu_applied_lock);
+rcu_read_unlock(&rcu_payload_lock);
 
 return n;
 }
@@ -1072,7 +1065,8 @@ static void free_payload(struct payload *data)
 {
 ASSERT(spin_is_locked(&payload_lock));
 unregister_virtual_region(&data->region);
-list_del(&data->list);
+list_del_rcu(&data->list);
+rcu_barrier();
 payload_cnt--;
 payload_version++;
 free_payload_data(data);
@@ -1172,7 +1166,7 @@ static int livepatch_upload(struct xen_sysctl_livepatch_upload *upload)
 INIT_LIST_HEAD(&data->applied_list);
 
 register_virtual_region(&data->region);
-list_add_tail(&data->list, &payload_list);
+list_add_tail_rcu(&data->list, &payload_list);
 payload_cnt++;
 payload_version++;
 }
@@ -1383,11 +1377,7 @@ static int apply_payload(struct payload *data)
 
 static inline void apply_payload_tail(struct payload *data)
 {
-/*
- * We need RCU variant (which has barriers) in case we crash here.
- * The applied_list is iterated by the trap code.
- */
-list_add_tail_rcu(&data->applied_list, &applied_list);
+list_add_tail(&data->applied_list, &applied_list);
 
 data->state = LIVEPATCH_STATE_APPLIED;
 }
@@ -1427,12 +1417,7 @@ static int revert_payload(struct payload *data)
 
 static inline void revert_payload_tail(struct payload *data)
 {
-
-/*
- * We need RCU variant (which has barriers) in case we crash here.
- * The applied_list is iterated by the trap code.
- */
-list_del_rcu(&data->applied_list);
+list_del(&data->applied_list);
 
 data->reverted = true;
 data->state = LIVEPATCH_STATE_CHECKED;
-- 
2.44.0




[PATCH v2 5/5] xen/livepatch: group and document payload hooks

2024-02-27 Thread Roger Pau Monne
Group the payload hooks between the pre/post handlers, and the apply/revert
replacements.  Also attempt to comment the context in which the hooks are
executed.

No functional change.

Signed-off-by: Roger Pau Monné 
---
Changes since v1:
 - New in this version.
---
 xen/include/xen/livepatch_payload.h | 37 ++---
 1 file changed, 29 insertions(+), 8 deletions(-)

diff --git a/xen/include/xen/livepatch_payload.h b/xen/include/xen/livepatch_payload.h
index b9cd4f209670..472d6a4a63c1 100644
--- a/xen/include/xen/livepatch_payload.h
+++ b/xen/include/xen/livepatch_payload.h
@@ -82,6 +82,8 @@ struct payload {
  * collision.  Since multiple hooks can be registered, the
  * .livepatch.hook.load section is a table of functions that will be
  * executed in series by the livepatch infrastructure at patch load time.
+ *
+ * Note the load hook is executed in quiesced context.
  */
 #define LIVEPATCH_LOAD_HOOK(_fn) \
 livepatch_loadcall_t *__weak \
@@ -96,14 +98,20 @@ struct payload {
  livepatch_unloadcall_t *__weak \
const livepatch_unload_data_##_fn __section(".livepatch.hooks.unload") = _fn;
 
+/*
+ * Pre/Post action hooks.
+ *
+ * These hooks are executed before or after the livepatch application.  Pre
+ * hooks can veto the application/revert of the livepatch.  They are not
+ * executed in quiesced context.  All pre and post hooks are considered
+ * vetoing, and hence filling any of them will block the usage of the REPLACE
+ * action.
+ *
+ * Each of the hooks below can only be set once per livepatch payload.
+ */
 #define LIVEPATCH_PREAPPLY_HOOK(_fn) \
 livepatch_precall_t *__attribute__((weak, used)) \
const livepatch_preapply_data_##_fn __section(".livepatch.hooks.preapply") = _fn;
 
-#define LIVEPATCH_APPLY_HOOK(_fn) \
-livepatch_actioncall_t *__attribute__((weak, used)) \
-const livepatch_apply_data_##_fn __section(".livepatch.hooks.apply") = _fn;
-
 #define LIVEPATCH_POSTAPPLY_HOOK(_fn) \
 livepatch_postcall_t *__attribute__((weak, used)) \
const livepatch_postapply_data_##_fn __section(".livepatch.hooks.postapply") = _fn;
@@ -112,14 +120,27 @@ struct payload {
 livepatch_precall_t *__attribute__((weak, used)) \
const livepatch_prerevert_data_##_fn __section(".livepatch.hooks.prerevert") = _fn;
 
-#define LIVEPATCH_REVERT_HOOK(_fn) \
-livepatch_actioncall_t *__attribute__((weak, used)) \
-const livepatch_revert_data_##_fn __section(".livepatch.hooks.revert") = _fn;
-
 #define LIVEPATCH_POSTREVERT_HOOK(_fn) \
 livepatch_postcall_t *__attribute__((weak, used)) \
const livepatch_postrevert_data_##_fn __section(".livepatch.hooks.postrevert") = _fn;
 
+/*
+ * Action replacement hooks.
+ *
+ * The following hooks replace the hypervisor implementation for the livepatch
+ * application and revert routines.  When filling the hooks below the native
+ * apply and revert routines will not be executed, so the provided hooks need
+ * to make sure the state of the payload after apply or revert is as expected
+ * by the livepatch logic.
+ */
+#define LIVEPATCH_APPLY_HOOK(_fn) \
+livepatch_actioncall_t *__attribute__((weak, used)) \
+const livepatch_apply_data_##_fn __section(".livepatch.hooks.apply") = _fn;
+
+#define LIVEPATCH_REVERT_HOOK(_fn) \
+livepatch_actioncall_t *__attribute__((weak, used)) \
+const livepatch_revert_data_##_fn __section(".livepatch.hooks.revert") = _fn;
+
 #endif /* __XEN_LIVEPATCH_PAYLOAD_H__ */
 
 /*
-- 
2.44.0




[PATCH v2 3/5] xen/livepatch: fix norevert test attempt to open-code revert

2024-02-27 Thread Roger Pau Monne
The purpose of the norevert test is to install a dummy handler that replaces
the internal Xen revert code, and then perform the revert in the post-revert
hook.  For that purpose the usage of the previous common_livepatch_revert() is
not enough, as that just reverts specific functions, but not the whole state of
the payload.

Remove both common_livepatch_{apply,revert}() and instead expose
revert_payload{,_tail}() in order to perform the patch revert from the
post-revert hook.

Fixes: 6047104c3ccc ('livepatch: Add per-function applied/reverted state tracking marker')
Signed-off-by: Roger Pau Monné 
---
Changes since v1:
 - Check return value of revert_payload().
---
 xen/common/livepatch.c| 41 +--
 xen/include/xen/livepatch.h   | 32 ++-
 .../livepatch/xen_action_hooks_norevert.c | 22 +++---
 3 files changed, 46 insertions(+), 49 deletions(-)

diff --git a/xen/common/livepatch.c b/xen/common/livepatch.c
index 14295bae8704..5a7d5b7be0ad 100644
--- a/xen/common/livepatch.c
+++ b/xen/common/livepatch.c
@@ -1366,7 +1366,22 @@ static int apply_payload(struct payload *data)
 ASSERT(!local_irq_is_enabled());
 
 for ( i = 0; i < data->nfuncs; i++ )
-common_livepatch_apply(&data->funcs[i], &data->fstate[i]);
+{
+const struct livepatch_func *func = &data->funcs[i];
+struct livepatch_fstate *state = &data->fstate[i];
+
+/* If the action has been already executed on this function, do nothing. */
+if ( state->applied == LIVEPATCH_FUNC_APPLIED )
+{
+printk(XENLOG_WARNING LIVEPATCH
+   "%s: %s has been already applied before\n",
+   __func__, func->name);
+continue;
+}
+
+arch_livepatch_apply(func, state);
+state->applied = LIVEPATCH_FUNC_APPLIED;
+}
 
 arch_livepatch_revive();
 
@@ -1382,7 +1397,7 @@ static inline void apply_payload_tail(struct payload *data)
 data->state = LIVEPATCH_STATE_APPLIED;
 }
 
-static int revert_payload(struct payload *data)
+int revert_payload(struct payload *data)
 {
 unsigned int i;
 int rc;
@@ -1397,7 +1412,25 @@ static int revert_payload(struct payload *data)
 }
 
 for ( i = 0; i < data->nfuncs; i++ )
-common_livepatch_revert(&data->funcs[i], &data->fstate[i]);
+{
+const struct livepatch_func *func = &data->funcs[i];
+struct livepatch_fstate *state = &data->fstate[i];
+
+/*
+ * If the apply action hasn't been executed on this function, do
+ * nothing.
+ */
+if ( !func->old_addr || state->applied == LIVEPATCH_FUNC_NOT_APPLIED )
+{
+printk(XENLOG_WARNING LIVEPATCH
+   "%s: %s has not been applied before\n",
+   __func__, func->name);
+continue;
+}
+
+arch_livepatch_revert(func, state);
+state->applied = LIVEPATCH_FUNC_NOT_APPLIED;
+}
 
 /*
  * Since we are running with IRQs disabled and the hooks may call common
@@ -1415,7 +1448,7 @@ static int revert_payload(struct payload *data)
 return 0;
 }
 
-static inline void revert_payload_tail(struct payload *data)
+void revert_payload_tail(struct payload *data)
 {
list_del(&data->applied_list);
 
diff --git a/xen/include/xen/livepatch.h b/xen/include/xen/livepatch.h
index ad0eae28bd0d..d074a5bebecc 100644
--- a/xen/include/xen/livepatch.h
+++ b/xen/include/xen/livepatch.h
@@ -138,35 +138,11 @@ void arch_livepatch_post_action(void);
 void arch_livepatch_mask(void);
 void arch_livepatch_unmask(void);
 
-static inline void common_livepatch_apply(const struct livepatch_func *func,
-  struct livepatch_fstate *state)
-{
-/* If the action has been already executed on this function, do nothing. */
-if ( state->applied == LIVEPATCH_FUNC_APPLIED )
-{
-printk(XENLOG_WARNING LIVEPATCH "%s: %s has been already applied before\n",
-__func__, func->name);
-return;
-}
-
-arch_livepatch_apply(func, state);
-state->applied = LIVEPATCH_FUNC_APPLIED;
-}
+/* Only for testing purposes. */
+struct payload;
+int revert_payload(struct payload *data);
+void revert_payload_tail(struct payload *data);
 
-static inline void common_livepatch_revert(const struct livepatch_func *func,
-   struct livepatch_fstate *state)
-{
-/* If the apply action hasn't been executed on this function, do nothing. */
-if ( !func->old_addr || state->applied == LIVEPATCH_FUNC_NOT_APPLIED )
-{
-printk(XENLOG_WARNING LIVEPATCH "%s: %s has not been applied before\n",
-__func__, func->name);
-return;
-}
-
-arch_livepatch_revert(func, state);
-state->applied = LIVEPATCH_FUNC_NOT_APPLIED;
-}
 #else
 
 /*
diff --git a/xen/test/livepatch/xen_action_hooks_norevert.c b/xen/test/livepatch/xen_action_hooks_norevert.c
index c17385519263..c5fbab174680 100644

[PATCH v2 0/5] xen/livepatch: fixes for the pre-apply / post-revert hooks

2024-02-27 Thread Roger Pau Monne
Hello,

The following series contains a mix of fixes, mostly related to the usage of
the pre-apply / post-revert hooks.  The norevert test is also fixed to
work as I think was expected.  Finally, both the no{apply,revert}
tests are fixed to build properly, as the files were previously
completely unhooked from the build system.

I'm unsure how useful the apply and revert hooks really are, as without
calling the internal apply/revert functions the state of the payload
structure is quite likely inconsistent with the code expectations.

Thanks, Roger.

Roger Pau Monne (5):
  xen/livepatch: register livepatch regions when loaded
  xen/livepatch: search for symbols in all loaded payloads
  xen/livepatch: fix norevert test attempt to open-code revert
  xen/livepatch: properly build the noapply and norevert tests
  xen/livepatch: group and document payload hooks

 xen/common/livepatch.c| 94 +++
 xen/common/virtual_region.c   | 42 -
 xen/include/xen/livepatch.h   | 32 +--
 xen/include/xen/livepatch_payload.h   | 37 ++--
 xen/test/livepatch/Makefile   |  4 +-
 .../livepatch/xen_action_hooks_norevert.c | 22 +
 6 files changed, 112 insertions(+), 119 deletions(-)

-- 
2.44.0




[PATCH v2 1/5] xen/livepatch: register livepatch regions when loaded

2024-02-27 Thread Roger Pau Monne
Currently livepatch regions are registered as virtual regions only after the
livepatch has been applied.

This can lead to issues when using the pre-apply or post-revert hooks, as at
that point the livepatch is not in the virtual regions list.  If a livepatch
pre-apply hook contains a WARN() it would trigger a hypervisor crash, as the
code that handles the bug frame won't be able to find the instruction pointer
that triggered the #UD in any of the registered virtual regions.

Fix this by adding the livepatch payloads as virtual regions as soon as loaded,
and only remove them once the payload is unloaded.  This requires some changes
to the virtual regions code, as the removal of the virtual regions is no longer
done in stop machine context, and hence an RCU barrier is added in order to
make sure there are no users of the virtual region after it's been removed from
the list.

Fixes: 8313c864fa95 ('livepatch: Implement pre-|post- apply|revert hooks')
Signed-off-by: Roger Pau Monné 
---
Changes since v1:
 - Add the virtual region in livepatch_upload().
 - Make unregister_virtual_region() depend on CONFIG_LIVEPATCH.
---
 xen/common/livepatch.c  |  4 ++--
 xen/common/virtual_region.c | 42 ++---
 2 files changed, 18 insertions(+), 28 deletions(-)

diff --git a/xen/common/livepatch.c b/xen/common/livepatch.c
index 2c4b84382798..d7f50e101858 100644
--- a/xen/common/livepatch.c
+++ b/xen/common/livepatch.c
@@ -1071,6 +1071,7 @@ static int build_symbol_table(struct payload *payload,
 static void free_payload(struct payload *data)
 {
ASSERT(spin_is_locked(&payload_lock));
+unregister_virtual_region(&data->region);
list_del(&data->list);
 payload_cnt--;
 payload_version++;
@@ -1170,6 +1171,7 @@ static int livepatch_upload(struct xen_sysctl_livepatch_upload *upload)
INIT_LIST_HEAD(&data->list);
INIT_LIST_HEAD(&data->applied_list);

+register_virtual_region(&data->region);
list_add_tail(&data->list, &payload_list);
 payload_cnt++;
 payload_version++;
@@ -1386,7 +1388,6 @@ static inline void apply_payload_tail(struct payload *data)
 * The applied_list is iterated by the trap code.
 */
list_add_tail_rcu(&data->applied_list, &applied_list);
-register_virtual_region(&data->region);
 
 data->state = LIVEPATCH_STATE_APPLIED;
 }
@@ -1432,7 +1433,6 @@ static inline void revert_payload_tail(struct payload *data)
 * The applied_list is iterated by the trap code.
 */
list_del_rcu(&data->applied_list);
-unregister_virtual_region(&data->region);
 
 data->reverted = true;
 data->state = LIVEPATCH_STATE_CHECKED;
diff --git a/xen/common/virtual_region.c b/xen/common/virtual_region.c
index ddac5c9147e5..e3a4dc8540df 100644
--- a/xen/common/virtual_region.c
+++ b/xen/common/virtual_region.c
@@ -23,14 +23,8 @@ static struct virtual_region core_init __initdata = {
 };
 
 /*
- * RCU locking. Additions are done either at startup (when there is only
- * one CPU) or when all CPUs are running without IRQs.
- *
- * Deletions are bit tricky. We do it when Live Patch (all CPUs running
- * without IRQs) or during bootup (when clearing the init).
- *
- * Hence we use list_del_rcu (which sports an memory fence) and a spinlock
- * on deletion.
+ * RCU locking. Modifications to the list must be done in exclusive mode, and
+ * hence need to hold the spinlock.
  *
  * All readers of virtual_region_list MUST use list_for_each_entry_rcu.
  */
@@ -58,41 +52,36 @@ const struct virtual_region *find_text_region(unsigned long addr)
 
 void register_virtual_region(struct virtual_region *r)
 {
-ASSERT(!local_irq_is_enabled());
+unsigned long flags;
 
+spin_lock_irqsave(&virtual_region_lock, flags);
list_add_tail_rcu(&r->list, &virtual_region_list);
+spin_unlock_irqrestore(&virtual_region_lock, flags);
 }
 
-static void remove_virtual_region(struct virtual_region *r)
+/*
+ * Suggest inline so when !CONFIG_LIVEPATCH the function is not left
+ * unreachable after init code is removed.
+ */
+static void inline remove_virtual_region(struct virtual_region *r)
 {
 unsigned long flags;
 
spin_lock_irqsave(&virtual_region_lock, flags);
list_del_rcu(&r->list);
spin_unlock_irqrestore(&virtual_region_lock, flags);
-/*
- * We do not need to invoke call_rcu.
- *
- * This is due to the fact that on the deletion we have made sure
- * to use spinlocks (to guard against somebody else calling
- * unregister_virtual_region) and list_deletion spiced with
- * memory barrier.
- *
- * That protects us from corrupting the list as the readers all
- * use list_for_each_entry_rcu which is safe against concurrent
- * deletions.
- */
 }
 
+#ifdef CONFIG_LIVEPATCH
 void unregister_virtual_region(struct virtual_region *r)
 {
-/* Expected to be called from Live Patch - which has IRQs disabled. */
-ASSERT(!local_irq_is_enabled());
-
 remove_virtual_region(r);
+
+/* Assert that no CPU might be using the removed region. */
+rcu_barrier();
 }
 

[PATCH v3 4/4] x86/spec: do not print thunk option selection if not built-in

2024-02-26 Thread Roger Pau Monne
Now that the thunk built-in enable is printed as part of the "Compiled-in
support:" line, avoid printing anything in "Xen settings:" if the thunk is
disabled at build time.

Note the BTI-Thunk option printing is also adjusted to print a colon in the
same way the other options on the line do.

Requested-by: Jan Beulich 
Signed-off-by: Roger Pau Monné 
---
Changes since v2:
 - New in this version.
---
 xen/arch/x86/spec_ctrl.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index ca82b9e41ccd..e8b0e62adba4 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -504,11 +504,12 @@ static void __init print_details(enum ind_thunk thunk)
"\n");
 
 /* Settings for Xen's protection, irrespective of guests. */
-printk("  Xen settings: BTI-Thunk %s, SPEC_CTRL: %s%s%s%s%s, Other:%s%s%s%s%s%s\n",
-   thunk == THUNK_NONE  ? "N/A" :
-   thunk == THUNK_RETPOLINE ? "RETPOLINE" :
-   thunk == THUNK_LFENCE? "LFENCE" :
-   thunk == THUNK_JMP   ? "JMP" : "?",
+printk("  Xen settings: %s%sSPEC_CTRL: %s%s%s%s%s, Other:%s%s%s%s%s%s\n",
+   thunk != THUNK_NONE  ? "BTI-Thunk: " : "",
+   thunk == THUNK_NONE  ? "" :
+   thunk == THUNK_RETPOLINE ? "RETPOLINE, " :
+   thunk == THUNK_LFENCE? "LFENCE, " :
+   thunk == THUNK_JMP   ? "JMP, " : "?, ",
(!boot_cpu_has(X86_FEATURE_IBRSB) &&
 !boot_cpu_has(X86_FEATURE_IBRS)) ? "No" :
(default_xen_spec_ctrl & SPEC_CTRL_IBRS)  ? "IBRS+" :  "IBRS-",
-- 
2.43.0




[PATCH v3 0/4] x86/spec: improve command line parsing

2024-02-26 Thread Roger Pau Monne
Hello,

A couple of improvements for the spec-ctrl command line parsing.

Thanks, Roger.

Roger Pau Monne (4):
  x86/spec: print the built-in SPECULATIVE_HARDEN_* options
  x86/spec: fix BRANCH_HARDEN option to only be set when build-enabled
  x86/spec: fix INDIRECT_THUNK option to only be set when build-enabled
  x86/spec: do not print thunk option selection if not built-in

 docs/misc/xen-command-line.pandoc | 10 +++
 xen/arch/x86/spec_ctrl.c  | 46 +--
 2 files changed, 42 insertions(+), 14 deletions(-)

-- 
2.43.0




[PATCH v3 2/4] x86/spec: fix BRANCH_HARDEN option to only be set when build-enabled

2024-02-26 Thread Roger Pau Monne
The current logic to handle the BRANCH_HARDEN option will report it as enabled
even when build-time disabled. Fix this by only allowing the option to be set
when support for it is built into Xen.

Fixes: 2d6f36daa086 ('x86/nospec: Introduce CONFIG_SPECULATIVE_HARDEN_BRANCH')
Signed-off-by: Roger Pau Monné 
---
Changes since v2:
 - Use IS_ENABLED() in the parser.

Changes since v1:
 - Use no_config_param().
---
 xen/arch/x86/spec_ctrl.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index 9f5ed8772533..5fae80774519 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -50,7 +50,8 @@ static int8_t __initdata opt_psfd = -1;
 int8_t __ro_after_init opt_ibpb_ctxt_switch = -1;
 int8_t __read_mostly opt_eager_fpu = -1;
 int8_t __read_mostly opt_l1d_flush = -1;
-static bool __initdata opt_branch_harden = true;
+static bool __initdata opt_branch_harden =
+IS_ENABLED(CONFIG_SPECULATIVE_HARDEN_BRANCH);
 
 bool __initdata bsp_delay_spec_ctrl;
 uint8_t __read_mostly default_xen_spec_ctrl;
@@ -268,7 +269,16 @@ static int __init cf_check parse_spec_ctrl(const char *s)
 else if ( (val = parse_boolean("l1d-flush", s, ss)) >= 0 )
 opt_l1d_flush = val;
 else if ( (val = parse_boolean("branch-harden", s, ss)) >= 0 )
-opt_branch_harden = val;
+{
+if ( IS_ENABLED(CONFIG_SPECULATIVE_HARDEN_BRANCH) )
+opt_branch_harden = val;
+else
+{
+no_config_param("SPECULATIVE_HARDEN_BRANCH", "spec-ctrl", s,
+ss);
+rc = -EINVAL;
+}
+}
 else if ( (val = parse_boolean("srb-lock", s, ss)) >= 0 )
 opt_srb_lock = val;
 else if ( (val = parse_boolean("unpriv-mmio", s, ss)) >= 0 )
-- 
2.43.0




[PATCH v3 3/4] x86/spec: fix INDIRECT_THUNK option to only be set when build-enabled

2024-02-26 Thread Roger Pau Monne
Attempt to provide a more helpful error message when the user attempts to set
spec-ctrl=bti-thunk option but the support is build-time disabled.

While there also adjust the command line documentation to mention
CONFIG_INDIRECT_THUNK instead of INDIRECT_THUNK.

Reported-by: Andrew Cooper 
Signed-off-by: Roger Pau Monné 
---
Changes since v2:
 - Adjust documentation.
 - Use IS_ENABLED() instead of #ifdef.

Changes since v1:
 - New in this version.
---
 docs/misc/xen-command-line.pandoc | 10 +-
 xen/arch/x86/spec_ctrl.c  |  7 ++-
 2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index be76be8d5365..02896598df6f 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -2417,11 +2417,11 @@ guests to use.
   performance reasons dom0 is unprotected by default.  If it is necessary to
   protect dom0 too, boot with `spec-ctrl=ibpb-entry`.
 
-If Xen was compiled with INDIRECT_THUNK support, `bti-thunk=` can be used to
-select which of the thunks gets patched into the `__x86_indirect_thunk_%reg`
-locations.  The default thunk is `retpoline` (generally preferred), with the
-alternatives being `jmp` (a `jmp *%reg` gadget, minimal overhead), and
-`lfence` (an `lfence; jmp *%reg` gadget).
+If Xen was compiled with `CONFIG_INDIRECT_THUNK` support, `bti-thunk=` can be
+used to select which of the thunks gets patched into the
+`__x86_indirect_thunk_%reg` locations.  The default thunk is `retpoline`
+(generally preferred), with the alternatives being `jmp` (a `jmp *%reg` gadget,
+minimal overhead), and `lfence` (an `lfence; jmp *%reg` gadget).
 
 On hardware supporting IBRS (Indirect Branch Restricted Speculation), the
 `ibrs=` option can be used to force or prevent Xen using the feature itself.
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index 5fae80774519..ca82b9e41ccd 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -241,7 +241,12 @@ static int __init cf_check parse_spec_ctrl(const char *s)
 {
 s += 10;
 
-if ( !cmdline_strcmp(s, "retpoline") )
+if ( !IS_ENABLED(CONFIG_INDIRECT_THUNK) )
+{
+no_config_param("INDIRECT_THUNK", "spec-ctrl=bti-thunk", s, ss);
+rc = -EINVAL;
+}
+else if ( !cmdline_strcmp(s, "retpoline") )
 opt_thunk = THUNK_RETPOLINE;
 else if ( !cmdline_strcmp(s, "lfence") )
 opt_thunk = THUNK_LFENCE;
-- 
2.43.0




[PATCH v3 1/4] x86/spec: print the built-in SPECULATIVE_HARDEN_* options

2024-02-26 Thread Roger Pau Monne
Just like it's done for INDIRECT_THUNK and SHADOW_PAGING.

Reported-by: Jan Beulich 
Signed-off-by: Roger Pau Monné 
---
Changes since v2:
 - New in this version.
---
 xen/arch/x86/spec_ctrl.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index 421fe3f640df..9f5ed8772533 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -466,13 +466,25 @@ static void __init print_details(enum ind_thunk thunk)
(e21a & cpufeat_mask(X86_FEATURE_SBPB))   ? " SBPB" 
  : "");
 
 /* Compiled-in support which pertains to mitigations. */
-if ( IS_ENABLED(CONFIG_INDIRECT_THUNK) || IS_ENABLED(CONFIG_SHADOW_PAGING) )
+if ( IS_ENABLED(CONFIG_INDIRECT_THUNK) || IS_ENABLED(CONFIG_SHADOW_PAGING) ||
+ IS_ENABLED(CONFIG_SPECULATIVE_HARDEN_ARRAY) ||
+ IS_ENABLED(CONFIG_SPECULATIVE_HARDEN_BRANCH) ||
+ IS_ENABLED(CONFIG_SPECULATIVE_HARDEN_GUEST_ACCESS) )
 printk("  Compiled-in support:"
 #ifdef CONFIG_INDIRECT_THUNK
" INDIRECT_THUNK"
 #endif
 #ifdef CONFIG_SHADOW_PAGING
" SHADOW_PAGING"
+#endif
+#ifdef CONFIG_SPECULATIVE_HARDEN_ARRAY
+   " SPECULATIVE_HARDEN_ARRAY"
+#endif
+#ifdef CONFIG_SPECULATIVE_HARDEN_BRANCH
+   " SPECULATIVE_HARDEN_BRANCH"
+#endif
+#ifdef CONFIG_SPECULATIVE_HARDEN_GUEST_ACCESS
+   " SPECULATIVE_HARDEN_GUEST_ACCESS"
 #endif
"\n");
 
-- 
2.43.0




[PATCH v3] x86/altcall: use an union as register type for function parameters on clang

2024-02-23 Thread Roger Pau Monne
The current code for alternative calls uses the caller parameter types as the
types for the register variables that serve as function parameters:

uint8_t foo;
[...]
alternative_call(myfunc, foo);

Would expand roughly into:

register uint8_t a1_ asm("rdi") = foo;
register unsigned long a2_ asm("rsi");
[...]
asm volatile ("call *%c[addr](%%rip)"...);

However with -O2 clang will generate incorrect code, given the following
example:

unsigned int func(uint8_t t)
{
return t;
}

static void bar(uint8_t b)
{
int ret_;
register uint8_t di asm("rdi") = b;
register unsigned long si asm("rsi");
register unsigned long dx asm("rdx");
register unsigned long cx asm("rcx");
register unsigned long r8 asm("r8");
register unsigned long r9 asm("r9");
register unsigned long r10 asm("r10");
register unsigned long r11 asm("r11");

asm volatile ( "call %c[addr]"
   : "+r" (di), "=r" (si), "=r" (dx),
 "=r" (cx), "=r" (r8), "=r" (r9),
 "=r" (r10), "=r" (r11), "=a" (ret_)
   : [addr] "i" (&(func)), "g" (func)
   : "memory" );
}

void foo(unsigned int a)
{
bar(a);
}

Clang generates the following assembly code:

func:   # @func
movl%edi, %eax
retq
foo:# @foo
callq   func
retq

Note the truncation of the unsigned int parameter 'a' of foo() to uint8_t when
passed into bar() is lost.  clang doesn't zero extend the parameters in the
callee when required, as the psABI mandates.

The above can be worked around by using an union when defining the register
variables, so that `di` becomes:

register union {
uint8_t e;
unsigned long r;
} di asm("rdi") = { .e = b };

Which results in following code generated for `foo()`:

foo:# @foo
movzbl  %dil, %edi
callq   func
retq

So the truncation is no longer lost.  Apply such workaround only when built
with clang.

Reported-by: Matthew Grooms 
Link: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=277200
Link: https://github.com/llvm/llvm-project/issues/12579
Link: https://github.com/llvm/llvm-project/issues/82598
Signed-off-by: Roger Pau Monné 
---
Changes since v2:
 - Expand the code comment.

Changes since v1:
 - Only apply the union workaround with clang.

Seems like all gitlab build tests are OK with this approach.
---
 xen/arch/x86/include/asm/alternative.h | 25 +
 1 file changed, 25 insertions(+)

diff --git a/xen/arch/x86/include/asm/alternative.h b/xen/arch/x86/include/asm/alternative.h
index a1cd6a9fe5b8..3c14db5078ba 100644
--- a/xen/arch/x86/include/asm/alternative.h
+++ b/xen/arch/x86/include/asm/alternative.h
@@ -167,9 +167,34 @@ extern void alternative_branches(void);
 #define ALT_CALL_arg5 "r8"
 #define ALT_CALL_arg6 "r9"
 
+#ifdef CONFIG_CC_IS_CLANG
+/*
+ * Use a union with an unsigned long in order to prevent clang from
+ * skipping a possible truncation of the value.  By using the union any
+ * truncation is carried before the call instruction, in turn covering
+ * for ABI-non-compliance in that the necessary clipping / extension of
+ * the value is supposed to be carried out in the callee.
+ *
+ * Note this behavior is not mandated by the standard, and hence could
+ * stop being a viable workaround, or worse, could cause a different set
+ * of code-generation issues in future clang versions.
+ *
+ * This has been reported upstream:
+ * https://github.com/llvm/llvm-project/issues/12579
+ * https://github.com/llvm/llvm-project/issues/82598
+ */
+#define ALT_CALL_ARG(arg, n)\
+register union {\
+typeof(arg) e;  \
+unsigned long r;\
+} a ## n ## _ asm ( ALT_CALL_arg ## n ) = { \
+.e = ({ BUILD_BUG_ON(sizeof(arg) > sizeof(void *)); (arg); })   \
+}
+#else
 #define ALT_CALL_ARG(arg, n) \
 register typeof(arg) a ## n ## _ asm ( ALT_CALL_arg ## n ) = \
 ({ BUILD_BUG_ON(sizeof(arg) > sizeof(void *)); (arg); })
+#endif
 #define ALT_CALL_NO_ARG(n) \
 register unsigned long a ## n ## _ asm ( ALT_CALL_arg ## n )
 
-- 
2.43.0




[PATCH v2 3/3] x86/spec: fix INDIRECT_THUNK option to only be set when build-enabled

2024-02-23 Thread Roger Pau Monne
Attempt to provide a more helpful error message when the user attempts to set
spec-ctrl=bti-thunk option but the support is build-time disabled.

Reported-by: Andrew Cooper 
Signed-off-by: Roger Pau Monné 
---
Changes since v1:
 - New in this version.
---
 xen/arch/x86/spec_ctrl.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index 4ce8a7a0b8ef..f3432f1a6c80 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -239,6 +239,7 @@ static int __init cf_check parse_spec_ctrl(const char *s)
 /* Xen's speculative sidechannel mitigation settings. */
 else if ( !strncmp(s, "bti-thunk=", 10) )
 {
+#ifdef CONFIG_INDIRECT_THUNK
 s += 10;
 
 if ( !cmdline_strcmp(s, "retpoline") )
@@ -249,6 +250,10 @@ static int __init cf_check parse_spec_ctrl(const char *s)
 opt_thunk = THUNK_JMP;
 else
 rc = -EINVAL;
+#else
+no_config_param("INDIRECT_THUNK", "spec-ctrl", s, ss);
+rc = -EINVAL;
+#endif
 }
 
 /* Bits in MSR_SPEC_CTRL. */
-- 
2.43.0




[PATCH v2 1/3] xen/cmdline: fix printf format specifier in no_config_param()

2024-02-23 Thread Roger Pau Monne
'*' sets the width field, which is the minimum number of characters to output,
but what we want in no_config_param() is the precision instead, which is '.*'
as it imposes a maximum limit on the output.

Fixes: 68d757df8dd2 ('x86/pv: Options to disable and/or compile out 32bit PV 
support')
Signed-off-by: Roger Pau Monné 
---
Changes since v1:
 - New in this version.
---
 xen/include/xen/param.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/include/xen/param.h b/xen/include/xen/param.h
index 9170455cde5c..6442a92aff8e 100644
--- a/xen/include/xen/param.h
+++ b/xen/include/xen/param.h
@@ -191,7 +191,7 @@ static inline void no_config_param(const char *cfg, const char *param,
 {
 int len = e ? ({ ASSERT(e >= s); e - s; }) : strlen(s);
 
-printk(XENLOG_INFO "CONFIG_%s disabled - ignoring '%s=%*s' setting\n",
+printk(XENLOG_INFO "CONFIG_%s disabled - ignoring '%s=%.*s' setting\n",
cfg, param, len, s);
 }
 
-- 
2.43.0




[PATCH v2 2/3] x86/spec: fix BRANCH_HARDEN option to only be set when build-enabled

2024-02-23 Thread Roger Pau Monne
The current logic to handle the BRANCH_HARDEN option will report it as enabled
even when build-time disabled. Fix this by only allowing the option to be set
when support for it is built into Xen.

Fixes: 2d6f36daa086 ('x86/nospec: Introduce CONFIG_SPECULATIVE_HARDEN_BRANCH')
Signed-off-by: Roger Pau Monné 
---
Changes since v1:
 - Use no_config_param().
---
 xen/arch/x86/spec_ctrl.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index 421fe3f640df..4ce8a7a0b8ef 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -50,7 +50,8 @@ static int8_t __initdata opt_psfd = -1;
 int8_t __ro_after_init opt_ibpb_ctxt_switch = -1;
 int8_t __read_mostly opt_eager_fpu = -1;
 int8_t __read_mostly opt_l1d_flush = -1;
-static bool __initdata opt_branch_harden = true;
+static bool __initdata opt_branch_harden =
+IS_ENABLED(CONFIG_SPECULATIVE_HARDEN_BRANCH);
 
 bool __initdata bsp_delay_spec_ctrl;
 uint8_t __read_mostly default_xen_spec_ctrl;
@@ -268,7 +269,14 @@ static int __init cf_check parse_spec_ctrl(const char *s)
 else if ( (val = parse_boolean("l1d-flush", s, ss)) >= 0 )
 opt_l1d_flush = val;
 else if ( (val = parse_boolean("branch-harden", s, ss)) >= 0 )
+{
+#ifdef CONFIG_SPECULATIVE_HARDEN_BRANCH
 opt_branch_harden = val;
+#else
+no_config_param("SPECULATIVE_HARDEN_BRANCH", "spec-ctrl", s, ss);
+rc = -EINVAL;
+#endif
+}
 else if ( (val = parse_boolean("srb-lock", s, ss)) >= 0 )
 opt_srb_lock = val;
 else if ( (val = parse_boolean("unpriv-mmio", s, ss)) >= 0 )
-- 
2.43.0




[PATCH v2 0/3] x86/spec: improve command line parsing

2024-02-23 Thread Roger Pau Monne
Hello,

A couple of improvements for the spec-ctrl command line parsing, and one
bugfix for no_config_param().

Thanks, Roger.

Roger Pau Monne (3):
  xen/cmdline: fix printf format specifier in no_config_param()
  x86/spec: fix BRANCH_HARDEN option to only be set when build-enabled
  x86/spec: fix INDIRECT_THUNK option to only be set when build-enabled

 xen/arch/x86/spec_ctrl.c | 15 ++-
 xen/include/xen/param.h  |  2 +-
 2 files changed, 15 insertions(+), 2 deletions(-)

-- 
2.43.0




[PATCH] x86/spec: fix BRANCH_HARDEN option to only be set when build-enabled

2024-02-23 Thread Roger Pau Monne
The current logic to handle the BRANCH_HARDEN option will report it as enabled
even when build-time disabled. Fix this by only allowing the option to be set
when support for it is built into Xen.

Fixes: 2d6f36daa086 ('x86/nospec: Introduce CONFIG_SPECULATIVE_HARDEN_BRANCH')
Signed-off-by: Roger Pau Monné 
---
 xen/arch/x86/spec_ctrl.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index 421fe3f640df..e634c6b559b4 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -50,7 +50,8 @@ static int8_t __initdata opt_psfd = -1;
 int8_t __ro_after_init opt_ibpb_ctxt_switch = -1;
 int8_t __read_mostly opt_eager_fpu = -1;
 int8_t __read_mostly opt_l1d_flush = -1;
-static bool __initdata opt_branch_harden = true;
+static bool __initdata opt_branch_harden =
+    IS_ENABLED(CONFIG_SPECULATIVE_HARDEN_BRANCH);
 
 bool __initdata bsp_delay_spec_ctrl;
 uint8_t __read_mostly default_xen_spec_ctrl;
@@ -267,7 +268,8 @@ static int __init cf_check parse_spec_ctrl(const char *s)
             opt_eager_fpu = val;
         else if ( (val = parse_boolean("l1d-flush", s, ss)) >= 0 )
             opt_l1d_flush = val;
-        else if ( (val = parse_boolean("branch-harden", s, ss)) >= 0 )
+        else if ( IS_ENABLED(CONFIG_SPECULATIVE_HARDEN_BRANCH) &&
+                  (val = parse_boolean("branch-harden", s, ss)) >= 0 )
             opt_branch_harden = val;
         else if ( (val = parse_boolean("srb-lock", s, ss)) >= 0 )
             opt_srb_lock = val;
-- 
2.43.0




[PATCH v2] x86/altcall: use an union as register type for function parameters

2024-02-22 Thread Roger Pau Monne
The current code for alternative calls uses the caller parameter types as the
types for the register variables that serve as function parameters:

uint8_t foo;
[...]
alternative_call(myfunc, foo);

Would expand roughly into:

register uint8_t a1_ asm("rdi") = foo;
register unsigned long a2_ asm("rsi");
[...]
asm volatile ("call *%c[addr](%%rip)"...);

However under certain circumstances clang >= 16.0.0 with -O2 can generate
incorrect code, given the following example:

unsigned int func(uint8_t t)
{
return t;
}

static void bar(uint8_t b)
{
int ret_;
register uint8_t di asm("rdi") = b;
register unsigned long si asm("rsi");
register unsigned long dx asm("rdx");
register unsigned long cx asm("rcx");
register unsigned long r8 asm("r8");
register unsigned long r9 asm("r9");
register unsigned long r10 asm("r10");
register unsigned long r11 asm("r11");

asm volatile ( "call %c[addr]"
   : "+r" (di), "=r" (si), "=r" (dx),
 "=r" (cx), "=r" (r8), "=r" (r9),
 "=r" (r10), "=r" (r11), "=a" (ret_)
   : [addr] "i" (&(func)), "g" (func)
   : "memory" );
}

void foo(unsigned int a)
{
bar(a);
}

Clang generates the following code:

func:   # @func
movl%edi, %eax
retq
foo:# @foo
callq   func
retq

Note that the truncation of foo()'s unsigned int parameter 'a' to uint8_t when
passed into bar() is lost.

The above can be worked around by using a union when defining the register
variables, so that `di` becomes:

register union {
uint8_t e;
unsigned long r;
} di asm("rdi") = { .e = b };

Which results in following code generated for `foo()`:

foo:# @foo
movzbl  %dil, %edi
callq   func
retq

So the truncation is no longer lost.  Apply the workaround only when built
with clang.

Reported-by: Matthew Grooms 
Link: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=277200
Link: https://github.com/llvm/llvm-project/issues/82598
Signed-off-by: Roger Pau Monné 
---
Changes since v1:
 - Only apply the union workaround with clang.

Seems like all gitlab build tests are OK with this approach.
---
 xen/arch/x86/include/asm/alternative.h | 16 
 1 file changed, 16 insertions(+)

diff --git a/xen/arch/x86/include/asm/alternative.h b/xen/arch/x86/include/asm/alternative.h
index a1cd6a9fe5b8..3fe27ea791bf 100644
--- a/xen/arch/x86/include/asm/alternative.h
+++ b/xen/arch/x86/include/asm/alternative.h
@@ -167,9 +167,25 @@ extern void alternative_branches(void);
 #define ALT_CALL_arg5 "r8"
 #define ALT_CALL_arg6 "r9"
 
+#ifdef CONFIG_CC_IS_CLANG
+/*
+ * Use a union with an unsigned long in order to prevent clang from skipping a
+ * possible truncation of the value.  By using the union any truncation is
+ * carried out before the call instruction.
+ * https://github.com/llvm/llvm-project/issues/82598
+ */
+#define ALT_CALL_ARG(arg, n)                                            \
+    register union {                                                    \
+        typeof(arg) e;                                                  \
+        unsigned long r;                                                \
+    } a ## n ## _ asm ( ALT_CALL_arg ## n ) = {                         \
+        .e = ({ BUILD_BUG_ON(sizeof(arg) > sizeof(void *)); (arg); })   \
+    }
+#else
 #define ALT_CALL_ARG(arg, n) \
     register typeof(arg) a ## n ## _ asm ( ALT_CALL_arg ## n ) = \
         ({ BUILD_BUG_ON(sizeof(arg) > sizeof(void *)); (arg); })
+#endif
 #define ALT_CALL_NO_ARG(n) \
 register unsigned long a ## n ## _ asm ( ALT_CALL_arg ## n )
 
-- 
2.43.0




[PATCH 0/2] xen/x86: cmpxchg cleanup

2024-02-22 Thread Roger Pau Monne
Hello,

Following series replace a couple of cmpxchg loops with an atomic inc.
The usage of such loops probably dates back to 32bit support, which
didn't have an instruction to do an atomic 64bit addition.

Thanks, Roger.

Roger Pau Monne (2):
  x86/memsharing: use an atomic add instead of a cmpxchg loop
  x86/hpet: use an atomic add instead of a cmpxchg loop

 xen/arch/x86/hpet.c   | 6 +-
 xen/arch/x86/mm/mem_sharing.c | 8 +---
 2 files changed, 2 insertions(+), 12 deletions(-)

-- 
2.43.0




[PATCH 2/2] x86/hpet: use an atomic add instead of a cmpxchg loop

2024-02-22 Thread Roger Pau Monne
The usage of a cmpxchg loop in hpet_get_channel() is unnecessary, as the same
can be achieved with an atomic increment, which is both simpler to read and
avoids any need for a loop.

Note there can be a small divergence in the channel returned if next_channel
overflows, but the returned channel will always be in the [0, num_hpets_used)
range, and that's fine for the purpose of balancing HPET channels across CPUs.
Such divergence is also purely theoretical, as there's currently no system with
2^32 CPUs (given next_channel is 32 bits wide).

Signed-off-by: Roger Pau Monné 
---
 xen/arch/x86/hpet.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/xen/arch/x86/hpet.c b/xen/arch/x86/hpet.c
index d982b0f6b2c9..4777dc859d96 100644
--- a/xen/arch/x86/hpet.c
+++ b/xen/arch/x86/hpet.c
@@ -458,11 +458,7 @@ static struct hpet_event_channel *hpet_get_channel(unsigned int cpu)
     if ( num_hpets_used >= nr_cpu_ids )
         return &hpet_events[cpu];
 
-    do {
-        next = next_channel;
-        if ( (i = next + 1) == num_hpets_used )
-            i = 0;
-    } while ( cmpxchg(&next_channel, next, i) != next );
+    next = arch_fetch_and_add(&next_channel, 1) % num_hpets_used;
 
     /* try unused channel first */
     for ( i = next; i < next + num_hpets_used; i++ )
-- 
2.43.0



