On 13.06.2024 13:31, Roger Pau Monné wrote:
> On Thu, Jun 13, 2024 at 10:38:35AM +0200, Jan Beulich wrote:
>> On 12.06.2024 17:36, Roger Pau Monné wrote:
>>> On Wed, Jun 12, 2024 at 03:42:58PM +0200, Jan Beulich wrote:
>>>> On 12.06.2024 12:39, Roger Pau Monné wrote:
>>>>> On Tue, Jun 11, 2024 at 03:18:32PM +0200, Jan Beulich wrote:
>>>>>> On 10.06.2024 16:20, Roger Pau Monne wrote:
>>>>>>> Currently there's logic in fixup_irqs() that attempts to prevent
>>>>>>> _assign_irq_vector() from failing, as fixup_irqs() is required to evacuate
>>>>>>> all interrupts from the CPUs not present in the input mask.  The current
>>>>>>> logic in fixup_irqs() is incomplete, as it doesn't deal with interrupts
>>>>>>> that have move_cleanup_count > 0 and a non-empty ->arch.old_cpu_mask field.
>>>>>>>
>>>>>>> Instead of attempting to fixup the interrupt descriptor in fixup_irqs() so
>>>>>>> that _assign_irq_vector() cannot fail, introduce logic in
>>>>>>> _assign_irq_vector() to deal with interrupts that have either
>>>>>>> move_{in_progress,cleanup_count} set and no remaining online CPUs in
>>>>>>> ->arch.cpu_mask.
>>>>>>>
>>>>>>> If _assign_irq_vector() is requested to move an interrupt in the state
>>>>>>> described above, first attempt to see if ->arch.old_cpu_mask contains any
>>>>>>> valid CPUs that could be used as fallback, and if that's the case do move
>>>>>>> the interrupt back to the previous destination.  Note this is easier
>>>>>>> because the vector hasn't been released yet, so there's no need to
>>>>>>> allocate and setup a new vector on the destination.
>>>>>>>
>>>>>>> Due to the logic in fixup_irqs() that clears offline CPUs from
>>>>>>> ->arch.old_cpu_mask (and releases the old vector if the mask becomes
>>>>>>> empty) it shouldn't be possible to get into _assign_irq_vector() with
>>>>>>> ->arch.move_{in_progress,cleanup_count} set but no online CPUs in
>>>>>>> ->arch.old_cpu_mask.
>>>>>>>
>>>>>>> However if ->arch.move_{in_progress,cleanup_count} is set and the
>>>>>>> interrupt has also changed affinity, it's possible the members of
>>>>>>> ->arch.old_cpu_mask are no longer part of the affinity set,
>>>>>>
>>>>>> I'm having trouble relating this (->arch.old_cpu_mask related) to ...
>>>>>>
>>>>>>> move the interrupt to a different CPU part of the provided mask
>>>>>>
>>>>>> ... this (->arch.cpu_mask related).
>>>>>
>>>>> No, the "provided mask" here is the "mask" parameter, not
>>>>> ->arch.cpu_mask.
>>>>
>>>> Oh, so this describes the case of "hitting" the comment at the very bottom
>>>> of the first hunk then? (I probably was misreading this because I was
>>>> expecting it to describe a code change, rather than the case where original
>>>> behavior needs retaining. IOW - all fine here then.)
>>>>
>>>>>>> and keep the current ->arch.old_{cpu_mask,vector} for the
>>>>>>> pending interrupt movement to be completed.
>>>>>>
>>>>>> Right, that's to clean up state from before the initial move. What isn't
>>>>>> clear to me is what's to happen with the state of the intermediate
>>>>>> placement. Description and code changes leave me with the impression that
>>>>>> it's okay to simply abandon, without any cleanup, yet I can't quite figure
>>>>>> why that would be an okay thing to do.
>>>>>
>>>>> There isn't much we can do with the intermediate placement, as the CPU
>>>>> is going offline.  However we can drain any pending interrupts from
>>>>> IRR after the new destination has been set, since setting the
>>>>> destination is done from the CPU that's the current target of the
>>>>> interrupts.  So we can ensure the draining is done strictly after the
>>>>> target has been switched, hence ensuring no further interrupts from
>>>>> this source will be delivered to the current CPU.
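>>>>> Roughly, the ordering relied on here would be (a simplified sketch of
>>>>> the relevant fixup_irqs() tail, not the literal code):
>>>>>
>>>>>     /* Still on the CPU that's the current target: reprogram first. */
>>>>>     desc->handler->set_affinity(desc, affinity);
>>>>>     ...
>>>>>     spin_unlock(&desc->lock);
>>>>>     ...
>>>>>     /*
>>>>>      * Brief window with interrupts enabled, so anything already latched
>>>>>      * in this CPU's IRR gets delivered (drained) before it goes offline.
>>>>>      */
>>>>>     local_irq_enable();
>>>>>     mdelay(1);
>>>>>     local_irq_disable();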
>>>>
>>>> Hmm, I'm afraid I still don't follow: I'm specifically in trouble with
>>>> the ...
>>>>
>>>>>>> --- a/xen/arch/x86/irq.c
>>>>>>> +++ b/xen/arch/x86/irq.c
>>>>>>> @@ -544,7 +544,53 @@ static int _assign_irq_vector(struct irq_desc *desc, const cpumask_t *mask)
>>>>>>>      }
>>>>>>>  
>>>>>>>      if ( desc->arch.move_in_progress || desc->arch.move_cleanup_count )
>>>>>>> -        return -EAGAIN;
>>>>>>> +    {
>>>>>>> +        /*
>>>>>>> +         * If the current destination is online refuse to shuffle.  Retry after
>>>>>>> +         * the in-progress movement has finished.
>>>>>>> +         */
>>>>>>> +        if ( cpumask_intersects(desc->arch.cpu_mask, &cpu_online_map) )
>>>>>>> +            return -EAGAIN;
>>>>>>> +
>>>>>>> +        /*
>>>>>>> +         * Due to the logic in fixup_irqs() that clears offlined CPUs from
>>>>>>> +         * ->arch.old_cpu_mask it shouldn't be possible to get here with
>>>>>>> +         * ->arch.move_{in_progress,cleanup_count} set and no online CPUs in
>>>>>>> +         * ->arch.old_cpu_mask.
>>>>>>> +         */
>>>>>>> +        ASSERT(valid_irq_vector(desc->arch.old_vector));
>>>>>>> +        ASSERT(cpumask_intersects(desc->arch.old_cpu_mask, &cpu_online_map));
>>>>>>> +
>>>>>>> +        if ( cpumask_intersects(desc->arch.old_cpu_mask, mask) )
>>>>>>> +        {
>>>>>>> +            /*
>>>>>>> +             * Fallback to the old destination if moving is in progress and the
>>>>>>> +             * current destination is to be offlined.  This is only possible if
>>>>>>> +             * the CPUs in old_cpu_mask intersect with the affinity mask passed
>>>>>>> +             * in the 'mask' parameter.
>>>>>>> +             */
>>>>>>> +            desc->arch.vector = desc->arch.old_vector;
>>>>>>> +            cpumask_and(desc->arch.cpu_mask, desc->arch.old_cpu_mask, mask);
>>>>
>>>> ... replacing of vector (and associated mask), without any further accounting.
>>>
>>> It's quite likely I'm missing something here, but what further
>>> accounting would you like to do?
>>>
>>> The current target of the interrupt (->arch.cpu_mask prior to the
>>> cpumask_and()) is entirely going offline, so any attempt to preserve it in
>>> ->arch.old_cpu_mask would just result in stale (offline) CPUs getting set
>>> in ->arch.old_cpu_mask, which is what previous patches attempted to
>>> solve.
>>>
>>> Maybe by "further accounting" you meant something else not related to
>>> ->arch.old_{cpu_mask,vector}?
>>
>> Indeed. What I'm thinking of is what normally release_old_vec() would
>> do (of which only desc->arch.used_vectors updating would appear to be
>> relevant, seeing the CPU's going offline). The other one I was thinking
>> of, updating vector_irq[], likely is also unnecessary, again because
>> that's per-CPU data of a CPU going down.
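>> (For reference, a rough sketch of what release_old_vec() currently does,
>> i.e. the accounting I have in mind:
>>
>>     static void release_old_vec(struct irq_desc *desc)
>>     {
>>         unsigned int vector = desc->arch.old_vector;
>>
>>         desc->arch.old_vector = IRQ_VECTOR_UNASSIGNED;
>>         cpumask_clear(desc->arch.old_cpu_mask);
>>
>>         if ( desc->arch.used_vectors )
>>         {
>>             ASSERT(test_bit(vector, desc->arch.used_vectors));
>>             clear_bit(vector, desc->arch.used_vectors);
>>         }
>>     }
>>
>> modulo the validity checks on "vector"; it's the clear_bit() on
>> ->arch.used_vectors that I can't spot an equivalent of on the new path.)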
> 
> I think updating vector_irq[] should be explicitly avoided, as doing
> so would prevent us from correctly draining any pending interrupts
> because the vector -> irq mapping would be broken when the interrupt
> enable window at the bottom of fixup_irqs() is reached.
> 
> For used_vectors: we might clear it; I'm a bit worried however that at
> some point we insert a check in the do_IRQ() path that ensures
> vector_irq[] is in line with desc->arch.used_vectors, which would fail
> for interrupts drained at the bottom of fixup_irqs().  Let me attempt
> to clear the currently used vector from ->arch.used_vectors.
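> To illustrate the dependency (a sketch, not the literal do_IRQ() code):
>
>     /* Interrupt drained during the enable window at the end of fixup_irqs(). */
>     int irq = this_cpu(vector_irq)[vector];
>
>     if ( irq < 0 )
>         /* No irq behind this vector on this CPU -> treated as spurious. */
>
> Zapping the per-CPU vector_irq[] entry before that window would hence turn
> any still-pending interrupt into a spurious one instead of draining it.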

Just to clarify: It may well be that for draining purposes the bit can't be
cleared right here. But it then still needs clearing _somewhere_, or else we
risk ending up with inconsistent state (e.g. triggering an assertion later
on) or leaking vectors. My problem here was that I couldn't locate any such
"somewhere", and the commentary didn't point me there either.

Jan
