+Daniel (XSM mention)

On 10/28/24 13:02, Jan Beulich wrote:
> On 18.10.2024 22:39, Stewart Hildebrand wrote:
>> Add links between a VF's struct pci_dev and its associated PF struct
>> pci_dev. Move the calls to pci_get_pdev()/pci_add_device() down to avoid
>> dropping and re-acquiring the pcidevs_lock().
>>
>> During PF removal, unlink VF from PF and mark the VF broken. As before,
>> VFs may exist without a corresponding PF, although now only with
>> pdev->broken = true.
>>
>> The hardware domain is expected to remove the associated VFs before
>> removing the PF. Print a warning in case a PF is removed with associated
>> VFs still present.
>>
>> Signed-off-by: Stewart Hildebrand <[email protected]>
>> ---
>> Candidate for backport to 4.19 (the next patch depends on this one)
>>
>> v5->v6:
>> * move printk() before ASSERT_UNREACHABLE()
>> * warn about PF removal with VFs still present
> 
> Hmm, maybe I didn't make this clear enough when commenting on v5: I wasn't
> just after an adjustment to the commit message. I'm instead actively
> concerned of the resulting behavior. Question is whether we can reasonably
> do something about that.
> 
> Jan

Right. My suggestion then is to go back to roughly how it was done in
v4 [0]:

* Remove the VFs right away during PF removal, so that we don't end up
with stale VFs. Regarding XSM, assume that a domain with permission to
remove the PF is also allowed to remove the VFs. We should probably also
return an error from pci_remove_device in the case of removing the PF
with VFs still present (and still perform the removals despite returning
an error). Subsequent attempts by a domain to remove the VFs would
return an error (as they have already been removed), but that's expected
since we've taken a stance that PF-then-VF removal order is invalid
anyway.

While the above is what I prefer, I just want to mention other options I
considered for the scenario of PF removal with VFs still present:

* Increase the "scariness" of the warning message added in v6.

* Return an error from pci_remove_device (while still removing only the
PF). We would be left with stale VFs in Xen. At least this would
concretely inform dom0 that Xen takes issue with the PF-then-VF removal
order. Subsequent attempts by a domain to remove VFs, however
(un)likely, would succeed.

* Return an error from pci_remove_device and keep the PF and VFs. This
is IMO the worst option because then we would have a stale PF in
addition to stale VFs.

[0] 
https://lore.kernel.org/xen-devel/[email protected]/T/#t

Reply via email to