Current code in _clear_irq_vector() will mark the irq as unused before doing the cleanup required when move_in_progress is true.
This can lead to races in create_irq() if the function picks an irq desc that's been marked as unused but has move_in_progress set, as the call to assign_irq_vector() in that function can then fail with -EAGAIN.

Prevent that by only marking irq descs as unused when all the cleanup has been done. While there, also use write_atomic() when setting IRQ_UNUSED in _clear_irq_vector().

The check for move_in_progress cannot be removed from _assign_irq_vector(), as other users (io_apic_set_pci_routing() and ioapic_guest_write()) can still pass active irq descs to assign_irq_vector().

Signed-off-by: Roger Pau Monné <roger....@citrix.com>
---
I've only observed this race when using vPCI with a PVH dom0, so I assume other users of _{clear,assign}_irq_vector() are likely to already be mutually exclusive by means of other external locks.

The path that triggered this race is using allocate_and_map_msi_pirq(), which will sadly drop the returned error code from create_irq() and just return EINVAL; that makes figuring out the issue more complicated, but fixing that error path should be done in a separate commit.
---
 xen/arch/x86/irq.c | 30 +++++++++++++++---------------
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index cd0c8a30a8..15a78a44da 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -220,27 +220,27 @@ static void _clear_irq_vector(struct irq_desc *desc)
         clear_bit(vector, desc->arch.used_vectors);
     }
 
-    desc->arch.used = IRQ_UNUSED;
-
-    trace_irq_mask(TRC_HW_IRQ_CLEAR_VECTOR, irq, vector, tmp_mask);
+    if ( unlikely(desc->arch.move_in_progress) )
+    {
+        /* If we were in motion, also clear desc->arch.old_vector */
+        old_vector = desc->arch.old_vector;
+        cpumask_and(tmp_mask, desc->arch.old_cpu_mask, &cpu_online_map);
 
-    if ( likely(!desc->arch.move_in_progress) )
-        return;
+        for_each_cpu(cpu, tmp_mask)
+        {
+            ASSERT(per_cpu(vector_irq, cpu)[old_vector] == irq);
+            TRACE_3D(TRC_HW_IRQ_MOVE_FINISH, irq, old_vector, cpu);
+            per_cpu(vector_irq, cpu)[old_vector] = ~irq;
+        }
 
-    /* If we were in motion, also clear desc->arch.old_vector */
-    old_vector = desc->arch.old_vector;
-    cpumask_and(tmp_mask, desc->arch.old_cpu_mask, &cpu_online_map);
+        release_old_vec(desc);
 
-    for_each_cpu(cpu, tmp_mask)
-    {
-        ASSERT(per_cpu(vector_irq, cpu)[old_vector] == irq);
-        TRACE_3D(TRC_HW_IRQ_MOVE_FINISH, irq, old_vector, cpu);
-        per_cpu(vector_irq, cpu)[old_vector] = ~irq;
+        desc->arch.move_in_progress = 0;
     }
 
-    release_old_vec(desc);
+    write_atomic(&desc->arch.used, IRQ_UNUSED);
 
-    desc->arch.move_in_progress = 0;
+    trace_irq_mask(TRC_HW_IRQ_CLEAR_VECTOR, irq, vector, tmp_mask);
 }
 
 void __init clear_irq_vector(int irq)
-- 
2.37.3