On 1/29/26 6:01 PM, Jan Beulich wrote:
On 22.01.2026 17:47, Oleksii Kurochko wrote:
This patch is based on Linux kernel 6.16.0.
Add the consumer side (vcpu_flush_interrupts()) of the lockless pending
interrupt tracking introduced in part 1 (for producers). According, to the
design only one consumer is possible, and it is vCPU itself.
vcpu_flush_interrupts() is expected to be ran (as guests aren't ran now due
to the lack of functionality) before the hypervisor returns control to the
guest.
Producers may set bits in irqs_pending_mask without a lock. Clearing bits in
irqs_pending_mask is performed only by the consumer via xchg() (with aquire &
release semantics). The consumer must not write to irqs_pending and must not
act on bits that are not set in the mask. Otherwise, extra synchronization
should be provided.
The worst thing which could happen with such approach is that a new pending
bit will be set to irqs_pending bitmap during update of hvip variable in
vcpu_flush_interrupt() but it isn't problem as the new pending bit won't
be lost and just be proceded during the next flush.
It is possible a guest could have pending bit not result in the hardware
register without to be marked pending in irq_pending bitmap as:
According to the RISC-V ISA specification:
Bits hip.VSSIP and hie.VSSIE are the interrupt-pending and
interrupt-enable bits for VS-level software interrupts. VSSIP in hip
is an alias (writable) of the same bit in hvip.
Additionally:
When bit 2 of hideleg is zero, vsip.SSIP and vsie.SSIE are read-only
zeros. Else, vsip.SSIP and vsie.SSIE are aliases of hip.VSSIP and
hie.VSSIE.
This means the guest may modify vsip.SSIP, which implicitly updates
hip.VSSIP and the bit being writable with 1 would also trigger an interrupt
as according to the RISC-V spec:
These conditions for an interrupt trap to occur must be evaluated in a
bounded amount of time from when an interrupt becomes, or ceases to be,
pending in sip, and must also be evaluated immediately following the
execution of an SRET instruction or an explicit write to a CSR on which
these interrupt trap conditions expressly depend (including sip, sie and
sstatus).
What means that IRQ_VS_SOFT must be synchronized separately, what is done
in vcpu_sync_interrupts().
And this function is going to be used from where? Exit from guest into the
hypervisor? Whereas vcpu_flush_interrupt() is to be called ahead of re-
entering the guest?
Both of them are called before returning control to a guest (missed to mention
that in the commit message) in do_trap() at the end:
static void check_for_pcpu_work(void)
{
...
vcpu_flush_interrupts(current);
vcpu_sync_interrupts(current);
}
void do_trap(struct cpu_user_regs *cpu_regs)
{
...
if ( cpu_regs->hstatus & HSTATUS_SPV )
check_for_pcpu_work();
}
I ask because vcpu_sync_interrupts() very much looks like a producer to me,
yet the patch here supposedly is the consumer side.
Yes, vcpu_sync_interrupts() should be in producer side, I'll move it to the
prev.
patch.
--- a/xen/arch/riscv/domain.c
+++ b/xen/arch/riscv/domain.c
@@ -171,3 +171,68 @@ int vcpu_unset_interrupt(struct vcpu *v, unsigned int irq)
return 0;
}
+
+static void vcpu_update_hvip(struct vcpu *v)
Pointer-to-const?
+{
+ csr_write(CSR_HVIP, v->arch.hvip);
+}
+
+void vcpu_flush_interrupts(struct vcpu *v)
+{
+ register_t *hvip = &v->arch.hvip;
+
+ unsigned long mask, val;
These are used ...
+ if ( ACCESS_ONCE(v->arch.irqs_pending_mask[0]) )
+ {
+ mask = xchg(&v->arch.irqs_pending_mask[0], 0UL);
+ val = ACCESS_ONCE(v->arch.irqs_pending[0]) & mask;
+
+ *hvip &= ~mask;
+ *hvip |= val;
... solely in this more narrow scope.
I'll declare them inside the if().
+ }
+
+ /*
+ * Flush AIA high interrupts.
+ *
+ * It is necessary to do only for CONFIG_RISCV_32 which isn't supported
+ * now.
+ */
+#ifdef CONFIG_RISCV_32
+# error "Update hviph"
+#endif
+
+ vcpu_update_hvip(v);
Why would bits for which the mask bit wasn't set be written here?
This function inside uses only v->arch.hvip which is updated above according to
the mask.
+void vcpu_sync_interrupts(struct vcpu *v)
+{
+ unsigned long hvip;
+
+ /* Read current HVIP and VSIE CSRs */
+ v->arch.vsie = csr_read(CSR_VSIE);
+
+ /* Sync-up HVIP.VSSIP bit changes does by Guest */
Nit: s/does/done/ ?
+ hvip = csr_read(CSR_HVIP);
+ if ( (v->arch.hvip ^ hvip) & BIT(IRQ_VS_SOFT, UL) )
+ {
+ if ( !test_and_set_bit(IRQ_VS_SOFT,
+ &v->arch.irqs_pending_mask) )
Why two separate, nested if()s?
Do you mean that it could be:
if ( !test_and_set_bit(IRQ_VS_SOFT, &v->arch.irqs_pending_mask) && (hvip &
BIT(IRQ_VS_SOFT, UL))
?
+ {
+ if ( hvip & BIT(IRQ_VS_SOFT, UL) )
+ set_bit(IRQ_VS_SOFT, &v->arch.irqs_pending);
+ else
+ clear_bit(IRQ_VS_SOFT, &v->arch.irqs_pending);
+ }
In the previous patch you set forth strict ordering rules, with a barrier in
the middle. All of this is violated here.
It still respects the rule that the producer (|vcpu_sync_interrupts()| which
should be in the producer path) never clears the mask and only writes to
|irqs_pending| if it is the one that flipped the corresponding mask bit from 0
to 1.
Considering that the consumer cannot be called concurrently in this case
(since|vcpu_flush_interrupts()| and|vcpu_sync_interrupts()| are only invoked
sequentially in|check_for_pcpu_work()|, as mentioned above), nothing can
clear a bit in the mask in between. Therefore, I think it is acceptable to
slightly bend (and it should be explained in the comment above the
function or in the commit message) the rule that the|irqs_pending| bit must
be written first, followed by updating the corresponding bit in
|irqs_pending_mask() specifically for |vcpu_sync_interrupts().
+ }
+
+ /*
+ * Sync-up AIA high interrupts.
+ *
+ * It is necessary to do only for CONFIG_RISCV_32 which isn't supported
+ * now.
+ */
+#ifdef CONFIG_RISCV_32
+# error "Update vsieh"
+#endif
Here you mean the register or the struct vcpu field? It may be helpful to
disambiguate; assuming it's the latter, simply spell out v->arch.vsieh?
(Same then for the similar code in vcpu_flush_interrupts().)
Agree, it would be better.
Thanks.
~ Oleksii