On 14.02.22 15:22, Jan Beulich wrote:
> On 14.02.2022 14:13, Oleksandr Andrushchenko wrote:
>>
>> On 14.02.22 14:57, Jan Beulich wrote:
>>> On 14.02.2022 12:37, Oleksandr Andrushchenko wrote:
>>>> On 14.02.22 13:25, Roger Pau Monné wrote:
>>>>> On Mon, Feb 14, 2022 at 11:15:27AM +0000, Oleksandr Andrushchenko wrote:
>>>>>> On 14.02.22 13:11, Roger Pau Monné wrote:
>>>>>>> On Mon, Feb 14, 2022 at 10:53:43AM +0000, Oleksandr Andrushchenko wrote:
>>>>>>>> On 14.02.22 12:34, Roger Pau Monné wrote:
>>>>>>>>> On Mon, Feb 14, 2022 at 09:36:39AM +0000, Oleksandr Andrushchenko
>>>>>>>>> wrote:
>>>>>>>>>> On 11.02.22 13:40, Roger Pau Monné wrote:
>>>>>>>>>>> +
>>>>>>>>>>>>>> for ( i = 0; i < msix->max_entries; i++ )
>>>>>>>>>>>>>> {
>>>>>>>>>>>>>> const struct vpci_msix_entry *entry =
>>>>>>>>>>>>>> &msix->entries[i];
>>>>>>>>>>>>> Since this function is now called with the per-domain rwlock read
>>>>>>>>>>>>> locked it's likely not appropriate to call
>>>>>>>>>>>>> process_pending_softirqs
>>>>>>>>>>>>> while holding such lock (check below).
>>>>>>>>>>>> You are right, as it is possible that:
>>>>>>>>>>>>
>>>>>>>>>>>> process_pending_softirqs -> vpci_process_pending -> read_lock
>>>>>>>>>>>>
>>>>>>>>>>>> Even more, vpci_process_pending may also
>>>>>>>>>>>>
>>>>>>>>>>>> read_unlock -> vpci_remove_device -> write_lock
>>>>>>>>>>>>
>>>>>>>>>>>> in its error path. So, any invocation of process_pending_softirqs
>>>>>>>>>>>> must not hold d->vpci_rwlock at least.
>>>>>>>>>>>>
>>>>>>>>>>>> And also we need to check that pdev->vpci was not removed
>>>>>>>>>>>> in between or *re-created*
>>>>>>>>>>>>> We will likely need to re-iterate over the list of pdevs assigned
>>>>>>>>>>>>> to
>>>>>>>>>>>>> the domain and assert that the pdev is still assigned to the same
>>>>>>>>>>>>> domain.
>>>>>>>>>>>> So, do you mean a pattern like the below should be used at all
>>>>>>>>>>>> places where we need to call process_pending_softirqs?
>>>>>>>>>>>>
>>>>>>>>>>>> read_unlock
>>>>>>>>>>>> process_pending_softirqs
>>>>>>>>>>>> read_lock
>>>>>>>>>>>> pdev = pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.devfn);
>>>>>>>>>>>> if ( pdev && pdev->vpci && is_the_same_vpci(pdev->vpci) )
>>>>>>>>>>>> <continue processing>
>>>>>>>>>>> Something along those lines. You likely need to continue iterate
>>>>>>>>>>> using
>>>>>>>>>>> for_each_pdev.
>>>>>>>>>> How do we tell if pdev->vpci is the same? Jan has already brought
>>>>>>>>>> this question before [1] and I was about to use some ID for that
>>>>>>>>>> purpose:
>>>>>>>>>> pdev->vpci->id = d->vpci_id++ and then we use pdev->vpci->id for
>>>>>>>>>> checks
>>>>>>>>> Given this is a debug message I would be OK with just doing the
>>>>>>>>> minimal checks to prevent Xen from crashing (ie: pdev->vpci exists)
>>>>>>>>> and that the resume MSI entry is not past the current limit. Otherwise
>>>>>>>>> just print a message and move on to the next device.
>>>>>>>> Agree, I see no big issue (probably) if we are not able to print
>>>>>>>>
>>>>>>>> How about this one:
>>>>>>>>
>>>>>>>> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
>>>>>>>> index 809a6b4773e1..50373f04da82 100644
>>>>>>>> --- a/xen/drivers/vpci/header.c
>>>>>>>> +++ b/xen/drivers/vpci/header.c
>>>>>>>> @@ -171,10 +171,31 @@ static int __init apply_map(struct domain *d,
>>>>>>>> const struct pci_dev *pdev,
>>>>>>>> struct rangeset *mem, uint16_t cmd)
>>>>>>>> {
>>>>>>>> struct map_data data = { .d = d, .map = true };
>>>>>>>> + pci_sbdf_t sbdf = pdev->sbdf;
>>>>>>>> int rc;
>>>>>>>>
>>>>>>>> + ASSERT(rw_is_write_locked(&pdev->domain->vpci_rwlock));
>>>>>>>> +
>>>>>>>> while ( (rc = rangeset_consume_ranges(mem, map_range,
>>>>>>>> &data)) == -ERESTART )
>>>>>>>> + {
>>>>>>>> +
>>>>>>>> + /*
>>>>>>>> + * process_pending_softirqs may trigger vpci_process_pending
>>>>>>>> which
>>>>>>>> + * may need to acquire pdev->domain->vpci_rwlock in read mode.
>>>>>>>> + */
>>>>>>>> + write_unlock(&pdev->domain->vpci_rwlock);
>>>>>>>> process_pending_softirqs();
>>>>>>>> + write_lock(&pdev->domain->vpci_rwlock);
>>>>>>>> +
>>>>>>>> + /* Check if pdev still exists and vPCI was not removed or
>>>>>>>> re-created. */
>>>>>>>> + if (pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.devfn)
>>>>>>>> != pdev)
>>>>>>>> + if ( vpci is NOT the same )
>>>>>>>> + {
>>>>>>>> + rc = 0;
>>>>>>>> + break;
>>>>>>>> + }
>>>>>>>> + }
>>>>>>>> +
>>>>>>>> rangeset_destroy(mem);
>>>>>>>> if ( !rc )
>>>>>>>> modify_decoding(pdev, cmd, false);
>>>>>>>>
>>>>>>>> This one also wants process_pending_softirqs to run so it *might*
>>>>>>>> want pdev and vpci checks. But at the same time apply_map runs
>>>>>>>> at ( system_state < SYS_STATE_active ), so defer_map won't be
>>>>>>>> running yet, thus no vpci_process_pending is possible yet (in terms
>>>>>>>> it has something to do yet). So, I think we just need:
>>>>>>>>
>>>>>>>> write_unlock(&pdev->domain->vpci_rwlock);
>>>>>>>> process_pending_softirqs();
>>>>>>>> write_lock(&pdev->domain->vpci_rwlock);
>>>>>>>>
>>>>>>>> and this should be enough
>>>>>>> Given the context apply_map is called from (dom0 specific init code),
>>>>>>> there's no need to check for the pdev to still exits, or whether vpci
>>>>>>> has been recreated, as it's not possible. Just add a comment to
>>>>>>> explicitly note that the context of the function is special, and thus
>>>>>>> there's no possibility of either the device or vpci going away.
>>>>>> Does it really need write_unlock/write_lock given the context?...
>>>>> I think it's bad practice to call process_pending_softirqs while
>>>>> holding any locks. This is a very specific context so it's likely fine
>>>>> to not drop the lock, but would still seem incorrect to me.
>>>> Ok
>>>>>> I think it doesn't as there is no chance defer_map is called, thus
>>>>>> process_pending_softirqs -> vpci_process_pending -> read_lock
>>>>> Indeed, there's no chance of that because process_pending_softirqs
>>>>> will never try to do a scheduling operation that would result in our
>>>>> context being scheduled out.
>>>> while ( (rc = rangeset_consume_ranges(mem, map_range, &data)) ==
>>>> -ERESTART )
>>>> {
>>>> /*
>>>> * FIXME: Given the context apply_map is called from (dom0
>>>> specific
>>>> * init code at system_state < SYS_STATE_active) it is not
>>>> strictly
>>>> * required that pdev->domain->vpci_rwlock is unlocked before
>>>> calling
>>>> * process_pending_softirqs as there is no contention possible
>>>> between
>>>> * this code and vpci_process_pending trying to acquire the
>>>> lock in
>>>> * read mode. But running process_pending_softirqs with any
>>>> lock held
>>>> * doesn't seem to be a good practice, so drop the lock and
>>>> re-acquire
>>>> * it right again.
>>>> */
>>>> write_unlock(&pdev->domain->vpci_rwlock);
>>>> process_pending_softirqs();
>>>> write_lock(&pdev->domain->vpci_rwlock);
>>>> }
>>> I'm afraid that's misleading at best. apply_map() is merely a specific
>>> example where you know the lock is going to be taken. But really any
>>> softirq handler could be acquiring any lock, so requesting to process
>>> softirqs cannot ever be done with any lock held.
>>>
>>> What you instead want to explain is why, after re-acquiring the lock,
>>> no further checking is needed for potentially changed state.
>> How about:
>>
>> /*
>> * FIXME: Given the context apply_map is called from (dom0 specific
>> * init code at system_state < SYS_STATE_active) there is no contention
>> * possible between this code and vpci_process_pending trying to acquire
>> * the lock in read mode and destroy pdev->vpci in its error path.
>> * Neither pdev may be disposed yet, so it is not required to check if the
>> * relevant pdev still exists after re-acquiring the lock.
>> */
> I'm not sure I follow the first sentence; I guess a comma or two may help,
> and or using "as well as" in place of one of the two "and". I also don't
> think you mean contention, but rather a race between the named entities?
/*
* FIXME: Given the context from which apply_map is called (dom0 specific
* init code at system_state < SYS_STATE_active) there is no race condition
* possible between this code and vpci_process_pending which may try to
acquire
* the lock in read mode and also try to destroy pdev->vpci in its error path.
* Neither pdev may be disposed yet, so it is not required to check if the
* relevant pdev still exists after re-acquiring the lock.
*/
>
> Jan
>
Thank you,
Oleksandr