On Mon, Feb 14, 2022 at 09:36:39AM +0000, Oleksandr Andrushchenko wrote: > > > On 11.02.22 13:40, Roger Pau Monné wrote: > > + > >>>> for ( i = 0; i < msix->max_entries; i++ ) > >>>> { > >>>> const struct vpci_msix_entry *entry = &msix->entries[i]; > >>> Since this function is now called with the per-domain rwlock read > >>> locked it's likely not appropriate to call process_pending_softirqs > >>> while holding such lock (check below). > >> You are right, as it is possible that: > >> > >> process_pending_softirqs -> vpci_process_pending -> read_lock > >> > >> Even more, vpci_process_pending may also > >> > >> read_unlock -> vpci_remove_device -> write_lock > >> > >> in its error path. So, any invocation of process_pending_softirqs > >> must not hold d->vpci_rwlock at least. > >> > >> And also we need to check that pdev->vpci was not removed > >> in between or *re-created* > >>> We will likely need to re-iterate over the list of pdevs assigned to > >>> the domain and assert that the pdev is still assigned to the same > >>> domain. > >> So, do you mean a pattern like the below should be used at all > >> places where we need to call process_pending_softirqs? > >> > >> read_unlock > >> process_pending_softirqs > >> read_lock > >> pdev = pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.devfn); > >> if ( pdev && pdev->vpci && is_the_same_vpci(pdev->vpci) ) > >> <continue processing> > > Something along those lines. You likely need to continue iterate using > > for_each_pdev. > How do we tell if pdev->vpci is the same? Jan has already brought > this question before [1] and I was about to use some ID for that purpose: > pdev->vpci->id = d->vpci_id++ and then we use pdev->vpci->id for checks
Given this is a debug message I would be OK with just doing the minimal checks to prevent Xen from crashing (ie: pdev->vpci exists) and that the resume MSI entry is not past the current limit. Otherwise just print a message and move on to the next device. The recreating of pdev->vpci only occurs as a result of some admin operations, and doing it while also trying to print the current MSI status is not a reliable approach. So dumping an incomplete or incoherent state as a result of ongoing admin operations would be fine. Thanks, Roger.