On 10.07.2023 00:41, Volodymyr Babchuk wrote:
> 
> Hi Jan,
> 
> Jan Beulich <[email protected]> writes:
> 
>> On 04.07.2023 23:03, Volodymyr Babchuk wrote:
>>> I am currently implementing your proposal (along with Jan's
>>> suggestions), but I am facing ABBA deadlock with IOMMU's
>>> reassign_device() call, which has this piece of code:
>>>
>>>         list_move(&pdev->domain_list, &target->pdev_list);
>>>
>>> My immediate change was:
>>>
>>>         write_lock(&pdev->domain->pci_lock);
>>>         list_del(&pdev->domain_list);
>>>         write_unlock(&pdev->domain->pci_lock);
>>>
>>>         write_lock(&target->pci_lock);
>>>         list_add(&pdev->domain_list, &target->pdev_list);
>>>         write_unlock(&target->pci_lock);
>>>
>>> But this will not work because reassign_device is called from
>>> pci_release_devices() which iterates over d->pdev_list, so we need to
>>> take a d->pci_lock early.
>>>
>>> Any suggestions on how to fix this? My idea is to remove a device from a
>>> list one at time:
>>>
>>> int pci_release_devices(struct domain *d)
>>> {
>>>     struct pci_dev *pdev;
>>>     u8 bus, devfn;
>>>     int ret;
>>>
>>>     pcidevs_lock();
>>>     write_lock(&d->pci_lock);
>>>     ret = arch_pci_clean_pirqs(d);
>>>     if ( ret )
>>>     {
>>>         pcidevs_unlock();
>>>         write_unlock(&d->pci_lock);
>>>         return ret;
>>>     }
>>>
>>>     while ( !list_empty(&d->pdev_list) )
>>>     {
>>>         pdev = list_entry(&d->pdev_list, struct pci_dev, domain_list);
>>>         bus = pdev->bus;
>>>         devfn = pdev->devfn;
>>>         list_del(&pdev->domain_list);
>>>         write_unlock(&d->pci_lock);
>>>         ret = deassign_device(d, pdev->seg, bus, devfn) ?: ret;
>>>         write_lock(&d->pci_lock);
>>
>> I think it needs doing almost like this, but with two more tweaks and
>> no list_del() right here (first and foremost to avoid needing to
>> figure whether removing early isn't going to subtly break anything;
>> see below for an error case that would end up with changed behavior):
>>
>>     while ( !list_empty(&d->pdev_list) )
>>     {
>>         const struct pci_dev *pdev = list_first_entry(&d->pdev_list, struct 
>> pci_dev, domain_list);
>>         uint16_t seg = pdev->seg;
>>         uint8_t bus = pdev->bus;
>>         uint8_t devfn = pdev->devfn;
>>
>>         write_unlock(&d->pci_lock);
>>         ret = deassign_device(d, seg, bus, devfn) ?: ret;
>>         write_lock(&d->pci_lock);
>>     }
>>
>> One caveat though: The original list_for_each_entry_safe() guarantees
>> the loop to complete; your use of list_del() would guarantee that too,
>> but then the device wouldn't be on the list anymore if deassign_device()
>> failed. Therefore I guess you will need another local list where you
>> (temporarily) put all the devices which deassign_device() left on the
>> list, and which you would then move back to d->pdev_list after the loop
>> has finished.
> 
> Okay, taking into the account all that you wrote, I came with this
> implementation:

Looks plausible at the first glance, but will of course need looking at
in more detail in the context of the full patch. Just one (largely)
cosmetic remark right away:

> int pci_release_devices(struct domain *d)
> {
>     int combined_ret;
>     LIST_HEAD(failed_pdevs);
> 
>     pcidevs_lock();
>     write_lock(&d->pci_lock);
>     combined_ret = arch_pci_clean_pirqs(d);
>     if ( combined_ret )
>     {
>         pcidevs_unlock();
>         write_unlock(&d->pci_lock);
>         return combined_ret;
>     }
> 
>     while ( !list_empty(&d->pdev_list) )
>     {
>         struct pci_dev *pdev = list_first_entry(&d->pdev_list,
>                                                 struct pci_dev,
>                                                 domain_list);
>         uint16_t seg = pdev->seg;
>         uint8_t bus = pdev->bus;
>         uint8_t devfn = pdev->devfn;
>         int ret;
> 
>         write_unlock(&d->pci_lock);
>         ret = deassign_device(d, seg, bus, devfn);
>         write_lock(&d->pci_lock);
>         if ( ret )
>         {
>             bool still_present = false;
>             const struct pci_dev *tmp;
> 
>             for_each_pdev ( d, tmp )
>             {
>                 if ( tmp == (const struct pci_dev*)pdev )

As I'm sure I expressed earlier, casts should be avoided whenever
possible. I see no reason for one here: Comparing pointer-to-
const-<type> with pointer-to-<type> is permitted by the language.

>                 {
>                     still_present = true;
>                     break;
>                 }
>             }
>             if ( still_present )
>                 list_move(&pdev->domain_list, &failed_pdevs);
>             combined_ret = ret;
>         }
>     }
> 
>     list_splice(&failed_pdevs, &d->pdev_list);
>     write_unlock(&d->pci_lock);
>     pcidevs_unlock();
> 
>     return combined_ret;
> }
> 
> 
>> (Whether it is sufficient to inspect the list head to
>> determine whether the pdev is still on the list needs careful checking.)
> 
> I am not sure that this is enough. We dropped d->pci_lock so we can
> expect that the list was permutated in a random way.

Right, if that cannot be excluded for sure, then better stay on the safe
side for now. A code comment would be nice though ahead of the
for_each_pdev().

Jan

Reply via email to