Hi Jan,

Jan Beulich <[email protected]> writes:

> On 04.07.2023 23:03, Volodymyr Babchuk wrote:
>> I am currently implementing your proposal (along with Jan's
>> suggestions), but I am facing ABBA deadlock with IOMMU's
>> reassign_device() call, which has this piece of code:
>> 
>>         list_move(&pdev->domain_list, &target->pdev_list);
>> 
>> My immediate change was:
>> 
>>         write_lock(&pdev->domain->pci_lock);
>>         list_del(&pdev->domain_list);
>>         write_unlock(&pdev->domain->pci_lock);
>> 
>>         write_lock(&target->pci_lock);
>>         list_add(&pdev->domain_list, &target->pdev_list);
>>         write_unlock(&target->pci_lock);
>> 
>> But this will not work because reassign_device is called from
>> pci_release_devices() which iterates over d->pdev_list, so we need to
>> take a d->pci_lock early.
>> 
>> Any suggestions on how to fix this? My idea is to remove a device from a
>> list one at time:
>> 
>> int pci_release_devices(struct domain *d)
>> {
>>     struct pci_dev *pdev;
>>     u8 bus, devfn;
>>     int ret;
>> 
>>     pcidevs_lock();
>>     write_lock(&d->pci_lock);
>>     ret = arch_pci_clean_pirqs(d);
>>     if ( ret )
>>     {
>>         pcidevs_unlock();
>>         write_unlock(&d->pci_lock);
>>         return ret;
>>     }
>> 
>>     while ( !list_empty(&d->pdev_list) )
>>     {
>>         pdev = list_entry(&d->pdev_list, struct pci_dev, domain_list);
>>         bus = pdev->bus;
>>         devfn = pdev->devfn;
>>         list_del(&pdev->domain_list);
>>         write_unlock(&d->pci_lock);
>>         ret = deassign_device(d, pdev->seg, bus, devfn) ?: ret;
>>         write_lock(&d->pci_lock);
>
> I think it needs doing almost like this, but with two more tweaks and
> no list_del() right here (first and foremost to avoid needing to
> figure whether removing early isn't going to subtly break anything;
> see below for an error case that would end up with changed behavior):
>
>     while ( !list_empty(&d->pdev_list) )
>     {
>         const struct pci_dev *pdev = list_first_entry(&d->pdev_list, struct 
> pci_dev, domain_list);
>         uint16_t seg = pdev->seg;
>         uint8_t bus = pdev->bus;
>         uint8_t devfn = pdev->devfn;
>
>         write_unlock(&d->pci_lock);
>         ret = deassign_device(d, seg, bus, devfn) ?: ret;
>         write_lock(&d->pci_lock);
>     }
>
> One caveat though: The original list_for_each_entry_safe() guarantees
> the loop to complete; your use of list_del() would guarantee that too,
> but then the device wouldn't be on the list anymore if deassign_device()
> failed. Therefore I guess you will need another local list where you
> (temporarily) put all the devices which deassign_device() left on the
> list, and which you would then move back to d->pdev_list after the loop
> has finished.

Okay, taking into the account all that you wrote, I came with this
implementation:

int pci_release_devices(struct domain *d)
{
    int combined_ret;
    LIST_HEAD(failed_pdevs);

    pcidevs_lock();
    write_lock(&d->pci_lock);
    combined_ret = arch_pci_clean_pirqs(d);
    if ( combined_ret )
    {
        pcidevs_unlock();
        write_unlock(&d->pci_lock);
        return combined_ret;
    }

    while ( !list_empty(&d->pdev_list) )
    {
        struct pci_dev *pdev = list_first_entry(&d->pdev_list,
                                                struct pci_dev,
                                                domain_list);
        uint16_t seg = pdev->seg;
        uint8_t bus = pdev->bus;
        uint8_t devfn = pdev->devfn;
        int ret;

        write_unlock(&d->pci_lock);
        ret = deassign_device(d, seg, bus, devfn);
        write_lock(&d->pci_lock);
        if ( ret )
        {
            bool still_present = false;
            const struct pci_dev *tmp;

            for_each_pdev ( d, tmp )
            {
                if ( tmp == (const struct pci_dev*)pdev )
                {
                    still_present = true;
                    break;
                }
            }
            if ( still_present )
                list_move(&pdev->domain_list, &failed_pdevs);
            combined_ret = ret;
        }
    }

    list_splice(&failed_pdevs, &d->pdev_list);
    write_unlock(&d->pci_lock);
    pcidevs_unlock();

    return combined_ret;
}


> (Whether it is sufficient to inspect the list head to
> determine whether the pdev is still on the list needs careful checking.)

I am not sure that this is enough. We dropped d->pci_lock so we can
expect that the list was permutated in a random way.


-- 
WBR, Volodymyr

Reply via email to