Re: [PATCH v2 3/4] vpci: use pcidevs locking to protect MMIO handlers
On 09.08.2022 22:33, Volodymyr Babchuk wrote:
> Jan Beulich writes:
>> On 18.07.2022 23:15, Volodymyr Babchuk wrote:
>>> -            if ( !pdev->vpci || !spin_trylock(&pdev->vpci->lock) )
>>> +
>>> +            if ( !pcidevs_read_trylock() )
>>> +                return -EBUSY;
>>> +            pdev = pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.devfn);
>>> +            /*
>>> +             * FIXME: we may find a re-allocated pdev's copy here.
>>> +             * Even occupying the same address as before. Do our best.
>>> +             */
>>> +            if ( !pdev || (pdev != msix->pdev) || !pdev->vpci ||
>>
>> Despite the comment: What guarantees that msix isn't a dangling pointer
>> at this point? At the very least I think you need to check !pdev->vpci
>> first. And I'm afraid I don't view "do our best" as good enough here
>> (considering the patch doesn't carry an RFC tag). And no, I don't have
>> any good suggestion other than "our PCI device locking needs a complete
>> overhaul". Quite likely what we need is a refcounter per device, which
>> - as long as non-zero - prevents removal.
>
> Refcounter itself is a good idea, but I'm not liking where all this
> goes. We already are reworking locking by adding rw-locks with counters,
> adding refcounter on top of this will complicate things even further.

I'm of quite the opposite opinion: A lot of the places will no longer
need to hold the pcidevs lock when instead they hold a reference; the
lock will only be needed to acquire a reference. Therefore refcounting
is likely to simplify things, presumably to the point where at least
recursive locking (and probably also converting to some r/w locking
scheme) won't be necessary. The main complicating factor is that all
places where a reference is needed will have to be located, and (quite
obviously I'm inclined to say) in particular all involved error paths
will need to be covered when it comes to dropping references no longer
needed.

> I'm starting to think that complete PCI device locking rework may be
> simpler solution, actually.
> By any chance, were there any prior
> discussion on how proper locking should look like?

Well, there were prior discussions (you'd need to search the list, as I
have no pointers to hand), but I'm not sure a clear picture had surfaced
of how "proper locking" would look. I guess that's part of the reason
why the currently proposed locking model actually makes things quite a
bit more complicated.

Jan
Re: [PATCH v2 3/4] vpci: use pcidevs locking to protect MMIO handlers
Hello Jan,

Jan Beulich writes:

> On 18.07.2022 23:15, Volodymyr Babchuk wrote:
>> --- a/xen/arch/x86/hvm/vmsi.c
>> +++ b/xen/arch/x86/hvm/vmsi.c
>> @@ -891,10 +891,16 @@ void vpci_msix_arch_init_entry(struct vpci_msix_entry *entry)
>>      entry->arch.pirq = INVALID_PIRQ;
>>  }
>>
>> -int vpci_msix_arch_print(const struct vpci_msix *msix)
>> +int vpci_msix_arch_print(const struct domain *d, const struct vpci_msix *msix)
>
> I don't think the extra parameter is needed:
>
>> @@ -911,11 +917,23 @@ int vpci_msix_arch_print(const struct vpci_msix *msix)
>>          if ( i && !(i % 64) )
>>          {
>>              struct pci_dev *pdev = msix->pdev;
>
> You get hold of pdev here, and hence you can take the domain from pdev.

Yes, makes sense.

>> +            pci_sbdf_t sbdf = pdev->sbdf;
>>
>>              spin_unlock(&msix->pdev->vpci->lock);
>> +            pcidevs_read_unlock();
>> +
>> +            /* NB: we still hold rcu_read_lock(&domlist_read_lock); here. */
>>              process_pending_softirqs();
>> -            /* NB: we assume that pdev cannot go away for an alive domain. */
>
> I think this comment wants retaining, as the new one you add is about
> a different aspect.
>
>> -            if ( !pdev->vpci || !spin_trylock(&pdev->vpci->lock) )
>> +
>> +            if ( !pcidevs_read_trylock() )
>> +                return -EBUSY;
>> +            pdev = pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.devfn);
>> +            /*
>> +             * FIXME: we may find a re-allocated pdev's copy here.
>> +             * Even occupying the same address as before. Do our best.
>> +             */
>> +            if ( !pdev || (pdev != msix->pdev) || !pdev->vpci ||
>
> Despite the comment: What guarantees that msix isn't a dangling pointer
> at this point? At the very least I think you need to check !pdev->vpci
> first. And I'm afraid I don't view "do our best" as good enough here
> (considering the patch doesn't carry an RFC tag). And no, I don't have
> any good suggestion other than "our PCI device locking needs a complete
> overhaul". Quite likely what we need is a refcounter per device, which
> - as long as non-zero - prevents removal.
Refcounter itself is a good idea, but I'm not liking where all this
goes. We already are reworking locking by adding rw-locks with counters,
adding refcounter on top of this will complicate things even further.
I'm starting to think that complete PCI device locking rework may be
simpler solution, actually. By any chance, were there any prior
discussion on how proper locking should look like?

>
>> +                 !spin_trylock(&pdev->vpci->lock) )
>>                  return -EBUSY;
>
> Don't you need to drop the pcidevs lock on this error path?

Yeah, you are right.

>
>> @@ -450,10 +465,15 @@ static int cf_check init_bars(struct pci_dev *pdev)
>>      uint16_t cmd;
>>      uint64_t addr, size;
>>      unsigned int i, num_bars, rom_reg;
>> -    struct vpci_header *header = &pdev->vpci->header;
>> -    struct vpci_bar *bars = header->bars;
>> +    struct vpci_header *header;
>> +    struct vpci_bar *bars;
>>      int rc;
>>
>> +    ASSERT(pcidevs_write_locked());
>> +
>> +    header = &pdev->vpci->header;
>> +    bars = header->bars;
>
> I'm not convinced the code movement here does us any good. (Same
> apparently elsewhere below.)
>
>> @@ -277,6 +282,9 @@ void vpci_dump_msi(void)
>>
>>          printk("vPCI MSI/MSI-X d%d\n", d->domain_id);
>>
>> +        if ( !pcidevs_read_trylock() )
>> +            continue;
>
> Note how this lives ahead of ...
>
>>          for_each_pdev ( d, pdev )
>>          {
>
> ... the loop, while ...
>
>> @@ -310,7 +318,7 @@ void vpci_dump_msi(void)
>>                  printk("  entries: %u maskall: %d enabled: %d\n",
>>                         msix->max_entries, msix->masked, msix->enabled);
>>
>> -                rc = vpci_msix_arch_print(msix);
>> +                rc = vpci_msix_arch_print(d, msix);
>>                  if ( rc )
>>                  {
>>                      /*
>> @@ -318,12 +326,13 @@ void vpci_dump_msi(void)
>>                       * holding the lock.
>>                       */
>>                      printk("unable to print all MSI-X entries: %d\n", rc);
>> -                    process_pending_softirqs();
>> -                    continue;
>> +                    goto pdev_done;
>>                  }
>>              }
>>
>>              spin_unlock(&pdev->vpci->lock);
>> + pdev_done:
>> +            pcidevs_read_unlock();
>
> ... this is still inside the loop body.
>
>> @@ -332,10 +334,14 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
>>          return data;
>>      }
>>
>> +    pcidevs_read_lock();
>>      /* Find the PCI dev matching the address. */
>>      pdev = pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.devfn);
>> -    if ( !pdev )
>> +    if ( !pdev || (pdev && !pdev->vpci) )
>
> Simpler
>
>     if ( !pdev || !pdev->vpci )
>
> ?
>
>> @@ -381,6 +387,7 @@ uint32_t
Re: [PATCH v2 3/4] vpci: use pcidevs locking to protect MMIO handlers
On 18.07.2022 23:15, Volodymyr Babchuk wrote:
> --- a/xen/arch/x86/hvm/vmsi.c
> +++ b/xen/arch/x86/hvm/vmsi.c
> @@ -891,10 +891,16 @@ void vpci_msix_arch_init_entry(struct vpci_msix_entry *entry)
>      entry->arch.pirq = INVALID_PIRQ;
>  }
>
> -int vpci_msix_arch_print(const struct vpci_msix *msix)
> +int vpci_msix_arch_print(const struct domain *d, const struct vpci_msix *msix)

I don't think the extra parameter is needed:

> @@ -911,11 +917,23 @@ int vpci_msix_arch_print(const struct vpci_msix *msix)
>          if ( i && !(i % 64) )
>          {
>              struct pci_dev *pdev = msix->pdev;

You get hold of pdev here, and hence you can take the domain from pdev.

> +            pci_sbdf_t sbdf = pdev->sbdf;
>
>              spin_unlock(&msix->pdev->vpci->lock);
> +            pcidevs_read_unlock();
> +
> +            /* NB: we still hold rcu_read_lock(&domlist_read_lock); here. */
>              process_pending_softirqs();
> -            /* NB: we assume that pdev cannot go away for an alive domain. */

I think this comment wants retaining, as the new one you add is about
a different aspect.

> -            if ( !pdev->vpci || !spin_trylock(&pdev->vpci->lock) )
> +
> +            if ( !pcidevs_read_trylock() )
> +                return -EBUSY;
> +            pdev = pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.devfn);
> +            /*
> +             * FIXME: we may find a re-allocated pdev's copy here.
> +             * Even occupying the same address as before. Do our best.
> +             */
> +            if ( !pdev || (pdev != msix->pdev) || !pdev->vpci ||

Despite the comment: What guarantees that msix isn't a dangling pointer
at this point? At the very least I think you need to check !pdev->vpci
first. And I'm afraid I don't view "do our best" as good enough here
(considering the patch doesn't carry an RFC tag). And no, I don't have
any good suggestion other than "our PCI device locking needs a complete
overhaul". Quite likely what we need is a refcounter per device, which
- as long as non-zero - prevents removal.

> +                 !spin_trylock(&pdev->vpci->lock) )
>                  return -EBUSY;

Don't you need to drop the pcidevs lock on this error path?
> @@ -450,10 +465,15 @@ static int cf_check init_bars(struct pci_dev *pdev)
>      uint16_t cmd;
>      uint64_t addr, size;
>      unsigned int i, num_bars, rom_reg;
> -    struct vpci_header *header = &pdev->vpci->header;
> -    struct vpci_bar *bars = header->bars;
> +    struct vpci_header *header;
> +    struct vpci_bar *bars;
>      int rc;
>
> +    ASSERT(pcidevs_write_locked());
> +
> +    header = &pdev->vpci->header;
> +    bars = header->bars;

I'm not convinced the code movement here does us any good. (Same
apparently elsewhere below.)

> @@ -277,6 +282,9 @@ void vpci_dump_msi(void)
>
>          printk("vPCI MSI/MSI-X d%d\n", d->domain_id);
>
> +        if ( !pcidevs_read_trylock() )
> +            continue;

Note how this lives ahead of ...

>          for_each_pdev ( d, pdev )
>          {

... the loop, while ...

> @@ -310,7 +318,7 @@ void vpci_dump_msi(void)
>                  printk("  entries: %u maskall: %d enabled: %d\n",
>                         msix->max_entries, msix->masked, msix->enabled);
>
> -                rc = vpci_msix_arch_print(msix);
> +                rc = vpci_msix_arch_print(d, msix);
>                  if ( rc )
>                  {
>                      /*
> @@ -318,12 +326,13 @@ void vpci_dump_msi(void)
>                       * holding the lock.
>                       */
>                      printk("unable to print all MSI-X entries: %d\n", rc);
> -                    process_pending_softirqs();
> -                    continue;
> +                    goto pdev_done;
>                  }
>              }
>
>              spin_unlock(&pdev->vpci->lock);
> + pdev_done:
> +            pcidevs_read_unlock();

... this is still inside the loop body.

> @@ -332,10 +334,14 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
>          return data;
>      }
>
> +    pcidevs_read_lock();
>      /* Find the PCI dev matching the address. */
>      pdev = pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.devfn);
> -    if ( !pdev )
> +    if ( !pdev || (pdev && !pdev->vpci) )

Simpler

    if ( !pdev || !pdev->vpci )

?

> @@ -381,6 +387,7 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
>          ASSERT(data_offset < size);
>      }
>      spin_unlock(&pdev->vpci->lock);
> +    pcidevs_read_unlock();

I guess this is too early and wants to come after ...

>      if ( data_offset < size )
>      {

... this if, which - even if it doesn't use pdev - still accesses the
device.
Both comments equally apply to vpci_write().

> --- a/xen/include/xen/pci.h
> +++ b/xen/include/xen/pci.h
> @@ -161,6 +161,7 @@ void pcidevs_unlock(void);
>  bool __must_check pcidevs_locked(void);
>
>  void pcidevs_read_lock(void);
> +int pcidevs_read_trylock(void);

This declaration
[PATCH v2 3/4] vpci: use pcidevs locking to protect MMIO handlers
From: Oleksandr Andrushchenko

vPCI MMIO handlers are accessing pdevs without protecting this access
with pcidevs_{lock|unlock}. This is not a problem as of now, as these
are only used by Dom0. But as vPCI moves towards also being used for
guests, we need to properly protect pdev and pdev->vpci from being
removed while still in use. For that, use the pcidevs_read_{un}lock
helpers.

This patch adds ASSERTs in the code to check that the rwlock is taken
and in the appropriate mode. Some of these checks require changes to
the initialization of local variables which may be accessed before the
ASSERT checks the locking. For example, see init_bars and mask_write.

Signed-off-by: Oleksandr Andrushchenko
Signed-off-by: Volodymyr Babchuk

---
Since v1:
- move pcidevs_read_{lock|unlock} into patch 1
---
 xen/arch/x86/hvm/vmsi.c   | 24 ++---
 xen/drivers/vpci/header.c | 24 +++--
 xen/drivers/vpci/msi.c    | 21 ++-
 xen/drivers/vpci/msix.c   | 55 ++-
 xen/drivers/vpci/vpci.c   | 16 +---
 xen/include/xen/pci.h     |  1 +
 xen/include/xen/vpci.h    |  2 +-
 7 files changed, 121 insertions(+), 22 deletions(-)

diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
index c1ede676d0..3f250f81a4 100644
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -891,10 +891,16 @@ void vpci_msix_arch_init_entry(struct vpci_msix_entry *entry)
     entry->arch.pirq = INVALID_PIRQ;
 }

-int vpci_msix_arch_print(const struct vpci_msix *msix)
+int vpci_msix_arch_print(const struct domain *d, const struct vpci_msix *msix)
 {
     unsigned int i;

+    /*
+     * FIXME: this is not immediately correct, as the lock can be grabbed
+     * by a different CPU. But this is better than nothing.
+     */
+    ASSERT(pcidevs_read_locked());
+
     for ( i = 0; i < msix->max_entries; i++ )
     {
         const struct vpci_msix_entry *entry = &msix->entries[i];
@@ -911,11 +917,23 @@ int vpci_msix_arch_print(const struct vpci_msix *msix)
         if ( i && !(i % 64) )
         {
             struct pci_dev *pdev = msix->pdev;
+            pci_sbdf_t sbdf = pdev->sbdf;

             spin_unlock(&msix->pdev->vpci->lock);
+            pcidevs_read_unlock();
+
+            /* NB: we still hold rcu_read_lock(&domlist_read_lock); here. */
             process_pending_softirqs();
-            /* NB: we assume that pdev cannot go away for an alive domain. */
+
+            if ( !pcidevs_read_trylock() )
+                return -EBUSY;
+            pdev = pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.devfn);
+            /*
+             * FIXME: we may find a re-allocated pdev's copy here.
+             * Even occupying the same address as before. Do our best.
+             */
+            if ( !pdev || (pdev != msix->pdev) || !pdev->vpci ||
+                 !spin_trylock(&pdev->vpci->lock) )
                 return -EBUSY;
             if ( pdev->vpci->msix != msix )
             {
diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index a1c928a0d2..e0461b1139 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -142,16 +142,19 @@ bool vpci_process_pending(struct vcpu *v)
         if ( rc == -ERESTART )
             return true;

+        pcidevs_read_lock();
         spin_lock(&v->vpci.pdev->vpci->lock);
         /* Disable memory decoding unconditionally on failure. */
         modify_decoding(v->vpci.pdev,
                         rc ? v->vpci.cmd & ~PCI_COMMAND_MEMORY : v->vpci.cmd,
                         !rc && v->vpci.rom_only);
         spin_unlock(&v->vpci.pdev->vpci->lock);
+        pcidevs_read_unlock();

         rangeset_destroy(v->vpci.mem);
         v->vpci.mem = NULL;
         if ( rc )
+        {
             /*
              * FIXME: in case of failure remove the device from the domain.
              * Note that there might still be leftover mappings. While this is
@@ -159,7 +162,10 @@ bool vpci_process_pending(struct vcpu *v)
              * killed in order to avoid leaking stale p2m mappings on
              * failure.
              */
+            pcidevs_write_lock();
             vpci_remove_device(v->vpci.pdev);
+            pcidevs_write_unlock();
+        }
     }

     return false;
@@ -172,7 +178,16 @@ static int __init apply_map(struct domain *d, const struct pci_dev *pdev,
     int rc;

     while ( (rc = rangeset_consume_ranges(mem, map_range, &data)) == -ERESTART )
+    {
+        /*
+         * It's safe to drop and re-acquire the lock in this context
+         * without risking pdev disappearing because devices cannot be
+         * removed until the initial domain has been started.
+         */
+        pcidevs_write_unlock();
         process_pending_softirqs();
+        pcidevs_write_lock();
+    }

     rangeset_destroy(mem);
     if ( !rc )
         modify_decoding(pdev, cmd, false);

@@ -450,10 +465,15 @@ static int cf_check init_bars(struct pci_dev *pdev)