On Tue, Jun 13, 2023 at 10:32:26AM +0000, Volodymyr Babchuk wrote:
> From: Oleksandr Andrushchenko <[email protected]>
> 
> Introduce a per-domain read/write lock to check whether vpci is present,
> so we can be sure there are no accesses to the contents of the vpci
> struct if not. This lock can be used (and in a few cases is used right
> away) so that vpci removal can be performed while holding the lock in
> write mode. Previously such removal could race with vpci_read, for
> example.
> 
> 1. The per-domain vpci_rwlock is used to protect the pdev->vpci structure
> from being removed.
> 
> 2. Writing the command register and ROM BAR register may trigger
> modify_bars to run, which in turn may access multiple pdevs while
> checking for the existing BAR's overlap. The overlapping check, if done
> under the read lock, requires vpci->lock to be acquired on both devices
> being compared, which may produce a deadlock. It is not possible to
> upgrade read lock to write lock in such a case. So, in order to prevent
> the deadlock, check which registers are going to be written and acquire
> the lock in the appropriate mode from the beginning.
> 
> All other code, which doesn't lead to pdev->vpci destruction and does not
> access multiple pdevs at the same time, can still use a combination of the
> read lock and pdev->vpci->lock.
> 
> 3. Optimize the detection of whether a ROM BAR write requires the write
> lock by caching the offset of the ROM BAR register in
> vpci->header->rom_reg, which depends on the header's type.
> 
> 4. Reduce locked region in vpci_remove_device as it is now possible
> to set pdev->vpci to NULL early right after the write lock is acquired.
> 
> 5. Reduce locked region in vpci_add_handlers as it is possible to
> initialize many more fields of the struct vpci before assigning it to
> pdev->vpci.
> 
> 6. vpci_{add|remove}_register are required to be called with the write lock
> held, but it is not feasible to add an assert there as that would require
> struct domain to be passed in. So, add a comment about this requirement
> to these and other functions with equivalent constraints.
> 
> 7. Drop const qualifier where the new rwlock is used and this is appropriate.
> 
> 8. Do not call process_pending_softirqs with any locks held. For that,
> unlock prior to the call and re-acquire the locks afterwards. After
> re-acquiring the lock there is no need to check if pdev->vpci exists:
>  - in apply_map because of the context it is called in (no race
>    condition is possible)
>  - for MSI/MSI-X debug code because it is called at the end of
>    pdev->vpci access and no further access to pdev->vpci is made
> 
> 9. Check for !pdev->vpci in vpci_{read|write} after acquiring the lock
> and if so, allow reading or writing the hardware register directly. This is
> acceptable as we only deal with Dom0 as of now. Once DomU support is
> added, writes will need to be ignored and reads will need to return all
> 0's for guests, while Dom0 can still access the registers directly.
> 
> 10. Introduce pcidevs_trylock, so it is possible to try to acquire the
> pcidevs lock without blocking.
> 
> 11. Use the pcidevs lock around for_each_pdev and pci_get_pdev_by_domain
> while accessing pdevs in vpci code.
> 
> 12. This is based on the discussion at [1].
> 
> [1] https://lore.kernel.org/all/[email protected]/
> 
> Suggested-by: Roger Pau Monné <[email protected]>
> Suggested-by: Jan Beulich <[email protected]>
> Signed-off-by: Oleksandr Andrushchenko <[email protected]>

Thanks.

I haven't looked in full detail, but I'm afraid there's an ABBA
deadlock between the per-domain vpci lock and the pcidevs lock in
modify_bars() vs vpci_add_handlers() and vpci_remove_device().

I've made some comments below.

> ---
> This was checked on x86: with and without PVH Dom0.
> ---
>  xen/arch/x86/hvm/vmsi.c       |   7 +++
>  xen/common/domain.c           |   3 +
>  xen/drivers/passthrough/pci.c |   5 ++
>  xen/drivers/vpci/header.c     |  52 +++++++++++++++++
>  xen/drivers/vpci/msi.c        |  25 +++++++-
>  xen/drivers/vpci/msix.c       |  51 +++++++++++++---
>  xen/drivers/vpci/vpci.c       | 107 +++++++++++++++++++++++++---------
>  xen/include/xen/pci.h         |   1 +
>  xen/include/xen/sched.h       |   3 +
>  xen/include/xen/vpci.h        |   6 ++
>  10 files changed, 223 insertions(+), 37 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
> index 3cd4923060..f188450b1b 100644
> --- a/xen/arch/x86/hvm/vmsi.c
> +++ b/xen/arch/x86/hvm/vmsi.c
> @@ -895,6 +895,9 @@ int vpci_msix_arch_print(const struct vpci_msix *msix)
>  {
>      unsigned int i;
>  
> +    ASSERT(rw_is_locked(&msix->pdev->domain->vpci_rwlock));
> +    ASSERT(pcidevs_locked());
> +
>      for ( i = 0; i < msix->max_entries; i++ )
>      {
>          const struct vpci_msix_entry *entry = &msix->entries[i];
> @@ -913,7 +916,11 @@ int vpci_msix_arch_print(const struct vpci_msix *msix)
>              struct pci_dev *pdev = msix->pdev;
>  
>              spin_unlock(&msix->pdev->vpci->lock);
> +            pcidevs_unlock();
> +            read_unlock(&pdev->domain->vpci_rwlock);
>              process_pending_softirqs();
> +            read_lock(&pdev->domain->vpci_rwlock);
> +            pcidevs_lock();

Why do you need the pcidevs_lock here?  Isn't it enough to have the
per-domain rwlock taken in read mode in order to prevent devices from
being de-assigned from the domain?
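
AFAICT (untested sketch, and assuming the per-domain lock in read mode
is indeed enough to keep the pdev from going away) the sequence could
then simply be:

            spin_unlock(&msix->pdev->vpci->lock);
            read_unlock(&pdev->domain->vpci_rwlock);
            process_pending_softirqs();
            read_lock(&pdev->domain->vpci_rwlock);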

>              /* NB: we assume that pdev cannot go away for an alive domain. */
>              if ( !pdev->vpci || !spin_trylock(&pdev->vpci->lock) )
>                  return -EBUSY;
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index 6a440590fe..a4640acb62 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -622,6 +622,9 @@ struct domain *domain_create(domid_t domid,
>  
>  #ifdef CONFIG_HAS_PCI
>      INIT_LIST_HEAD(&d->pdev_list);
> +#ifdef CONFIG_HAS_VPCI
> +    rwlock_init(&d->vpci_rwlock);
> +#endif
>  #endif
>  
>      /* All error paths can depend on the above setup. */
> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
> index 07d1986d33..0afcb59af0 100644
> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -57,6 +57,11 @@ void pcidevs_lock(void)
>      spin_lock_recursive(&_pcidevs_lock);
>  }
>  
> +int pcidevs_trylock(void)
> +{
> +    return spin_trylock_recursive(&_pcidevs_lock);
> +}
> +
>  void pcidevs_unlock(void)
>  {
>      spin_unlock_recursive(&_pcidevs_lock);
> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
> index ec2e978a4e..23b5223adf 100644
> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -152,12 +152,14 @@ bool vpci_process_pending(struct vcpu *v)
>          if ( rc == -ERESTART )
>              return true;
>  
> +        read_lock(&v->domain->vpci_rwlock);
>          spin_lock(&v->vpci.pdev->vpci->lock);
>          /* Disable memory decoding unconditionally on failure. */
>          modify_decoding(v->vpci.pdev,
>                          rc ? v->vpci.cmd & ~PCI_COMMAND_MEMORY : v->vpci.cmd,
>                          !rc && v->vpci.rom_only);
>          spin_unlock(&v->vpci.pdev->vpci->lock);
> +        read_unlock(&v->domain->vpci_rwlock);

I think you likely want to expand the scope of the domain vpci lock in
order to also protect the data in v->vpci, so that vPCI device removal
can clear this data if required without racing with
vpci_process_pending()?  Otherwise you need to pause the domain when
removing vPCI.
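
As a rough sketch of what I mean (assuming the removal path also
clears v->vpci.pdev while holding the lock in write mode, which is not
what the patch currently does):

        read_lock(&v->domain->vpci_rwlock);
        if ( v->vpci.pdev && v->vpci.pdev->vpci )
        {
            spin_lock(&v->vpci.pdev->vpci->lock);
            /* Disable memory decoding unconditionally on failure. */
            modify_decoding(v->vpci.pdev,
                            rc ? v->vpci.cmd & ~PCI_COMMAND_MEMORY : v->vpci.cmd,
                            !rc && v->vpci.rom_only);
            spin_unlock(&v->vpci.pdev->vpci->lock);
        }
        read_unlock(&v->domain->vpci_rwlock);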

>  
>          rangeset_destroy(v->vpci.mem);
>          v->vpci.mem = NULL;
> @@ -181,8 +183,20 @@ static int __init apply_map(struct domain *d, const struct pci_dev *pdev,
>      struct map_data data = { .d = d, .map = true };
>      int rc;
>  
> +    ASSERT(rw_is_write_locked(&d->vpci_rwlock));
> +
>      while ( (rc = rangeset_consume_ranges(mem, map_range, &data)) == -ERESTART )
> +    {
> +        /*
> +         * It's safe to drop and reacquire the lock in this context
> +         * without risking pdev disappearing because devices cannot be
> +         * removed until the initial domain has been started.
> +         */
> +        write_unlock(&d->vpci_rwlock);
>          process_pending_softirqs();
> +        write_lock(&d->vpci_rwlock);
> +    }
> +
>      rangeset_destroy(mem);
>      if ( !rc )
>          modify_decoding(pdev, cmd, false);
> @@ -213,6 +227,7 @@ static void defer_map(struct domain *d, struct pci_dev *pdev,
>      raise_softirq(SCHEDULE_SOFTIRQ);
>  }
>  
> +/* This must hold domain's vpci_rwlock in write mode. */

Why must it be held in write mode?

Is this to avoid ABBA deadlocks as not taking the per-domain lock in
write mode would then force us to take each pdev->vpci->lock in order
to prevent concurrent modifications of remote devices?
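
If I follow that reasoning, the problematic interleaving with the lock
taken in read mode would be something along these lines (illustration
only):

    CPU#0: write to dev A              CPU#1: write to dev B
      read_lock(&d->vpci_rwlock)         read_lock(&d->vpci_rwlock)
      spin_lock(&A->vpci->lock)          spin_lock(&B->vpci->lock)
      modify_bars(A):                    modify_bars(B):
        spin_lock(&B->vpci->lock)          spin_lock(&A->vpci->lock)

With each CPU then waiting on the lock the other one holds.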

>  static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>  {
>      struct vpci_header *header = &pdev->vpci->header;
> @@ -287,6 +302,7 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>       * Check for overlaps with other BARs. Note that only BARs that are
>       * currently mapped (enabled) are checked for overlaps.
>       */
> +    pcidevs_lock();

This can be problematic I'm afraid, as it's a guest-controlled path
that allows applying unrestricted contention on the pcidevs lock.  I'm
unsure whether multiple guests would be able to trigger the watchdog
if given enough devices/vcpus to perform enough simultaneous calls to
modify_bars().

Maybe you could expand the per-domain vpci lock to also protect
modifications of domain->pdev_list?

IOW: kind of a per-domain pdev_lock.
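
I.e. something like the following in the paths that modify
d->pdev_list (hypothetical sketch, exact field names aside):

    write_lock(&d->vpci_rwlock);
    list_del(&pdev->domain_list);
    write_unlock(&d->vpci_rwlock);

modify_bars(), already holding the lock in write mode, could then
safely iterate over d->pdev_list without taking the pcidevs lock.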

>      for_each_pdev ( pdev->domain, tmp )
>      {
>          if ( tmp == pdev )
> @@ -326,10 +342,12 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>                  printk(XENLOG_G_WARNING "Failed to remove [%lx, %lx]: %d\n",
>                         start, end, rc);
>                  rangeset_destroy(mem);
> +                pcidevs_unlock();
>                  return rc;
>              }
>          }
>      }
> +    pcidevs_unlock();
>  
>      ASSERT(dev);
>  
> @@ -482,6 +500,8 @@ static int cf_check init_bars(struct pci_dev *pdev)
>      struct vpci_bar *bars = header->bars;
>      int rc;
>  
> +    ASSERT(rw_is_write_locked(&pdev->domain->vpci_rwlock));
> +
>      switch ( pci_conf_read8(pdev->sbdf, PCI_HEADER_TYPE) & 0x7f )
>      {
>      case PCI_HEADER_TYPE_NORMAL:
> @@ -570,6 +590,8 @@ static int cf_check init_bars(struct pci_dev *pdev)
>          }
>      }
>  
> +    ASSERT(!header->rom_reg);
> +
>      /* Check expansion ROM. */
>      rc = pci_size_mem_bar(pdev->sbdf, rom_reg, &addr, &size, PCI_BAR_ROM);
>      if ( rc > 0 && size )
> @@ -586,12 +608,42 @@ static int cf_check init_bars(struct pci_dev *pdev)
>                                 4, rom);
>          if ( rc )
>              rom->type = VPCI_BAR_EMPTY;
> +
> +        header->rom_reg = rom_reg;
>      }
>  
>      return (cmd & PCI_COMMAND_MEMORY) ? modify_bars(pdev, cmd, false) : 0;
>  }
>  REGISTER_VPCI_INIT(init_bars, VPCI_PRIORITY_MIDDLE);
>  
> +static bool overlap(unsigned int r1_offset, unsigned int r1_size,
> +                    unsigned int r2_offset, unsigned int r2_size)
> +{
> +    /* Return true if there is an overlap. */
> +    return r1_offset < r2_offset + r2_size && r2_offset < r1_offset + r1_size;

Hm, we already have a vpci_register_cmp(), which does a very similar
comparison.  I wonder if it would be possible to somehow use that
here.
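
E.g. (rough idea only; vpci_register_cmp() and struct vpci_register
are currently private to vpci.c, so they would need to be exposed):

    const struct vpci_register r1 = { .offset = start, .size = size };
    const struct vpci_register r2 = { .offset = PCI_COMMAND, .size = 2 };

    if ( !vpci_register_cmp(&r1, &r2) )
        return true;

Given that a return value of 0 already means the two registers
overlap.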

> +}
> +
> +bool vpci_header_need_write_lock(const struct pci_dev *pdev,
> +                                 unsigned int start, unsigned int size)
> +{
> +    /*
> +     * Writing the command register and ROM BAR register may trigger
> +     * modify_bars to run, which in turn may access multiple pdevs while
> +     * checking for the existing BAR's overlap. The overlapping check, if done
> +     * under the read lock, requires vpci->lock to be acquired on both devices
> +     * being compared, which may produce a deadlock. At the same time it is not
> +     * possible to upgrade read lock to write lock in such a case.
> +     * Check which registers are going to be written and return true if lock
> +     * needs to be acquired in write mode.
> +     */
> +    if ( overlap(start, size, PCI_COMMAND, 2) ||
> +         (pdev->vpci->header.rom_reg &&
> +          overlap(start, size, pdev->vpci->header.rom_reg, 4)) )
> +        return true;
> +
> +    return false;
> +}
> +
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c
> index 8f2b59e61a..e7d1f916a0 100644
> --- a/xen/drivers/vpci/msi.c
> +++ b/xen/drivers/vpci/msi.c
> @@ -190,6 +190,8 @@ static int cf_check init_msi(struct pci_dev *pdev)
>      uint16_t control;
>      int ret;
>  
> +    ASSERT(rw_is_write_locked(&pdev->domain->vpci_rwlock));
> +
>      if ( !pos )
>          return 0;
>  
> @@ -265,7 +267,7 @@ REGISTER_VPCI_INIT(init_msi, VPCI_PRIORITY_LOW);
>  
>  void vpci_dump_msi(void)
>  {
> -    const struct domain *d;
> +    struct domain *d;
>  
>      rcu_read_lock(&domlist_read_lock);
>      for_each_domain ( d )
> @@ -277,6 +279,15 @@ void vpci_dump_msi(void)
>  
>          printk("vPCI MSI/MSI-X d%d\n", d->domain_id);
>  
> +        if ( !read_trylock(&d->vpci_rwlock) )
> +            continue;
> +
> +        if ( !pcidevs_trylock() )
> +        {
> +            read_unlock(&d->vpci_rwlock);
> +            continue;
> +        }
> +
>          for_each_pdev ( d, pdev )
>          {
>              const struct vpci_msi *msi;
> @@ -318,14 +329,22 @@ void vpci_dump_msi(void)
>                       * holding the lock.
>                       */
>                      printk("unable to print all MSI-X entries: %d\n", rc);
> -                    process_pending_softirqs();
> -                    continue;
> +                    goto pdev_done;
>                  }
>              }
>  
>              spin_unlock(&pdev->vpci->lock);
> + pdev_done:
> +            pcidevs_unlock();
> +            read_unlock(&d->vpci_rwlock);
> +
>              process_pending_softirqs();
> +
> +            read_lock(&d->vpci_rwlock);
> +            pcidevs_lock();
>          }
> +        pcidevs_unlock();
> +        read_unlock(&d->vpci_rwlock);
>      }
>      rcu_read_unlock(&domlist_read_lock);
>  }
> diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
> index 25bde77586..b5a7dfdf9c 100644
> --- a/xen/drivers/vpci/msix.c
> +++ b/xen/drivers/vpci/msix.c
> @@ -143,6 +143,7 @@ static void cf_check control_write(
>          pci_conf_write16(pdev->sbdf, reg, val);
>  }
>  
> +/* This must hold domain's vpci_rwlock in write mode. */
>  static struct vpci_msix *msix_find(const struct domain *d, unsigned long addr)
>  {
>      struct vpci_msix *msix;
> @@ -163,7 +164,13 @@ static struct vpci_msix *msix_find(const struct domain *d, unsigned long addr)
>  
>  static int cf_check msix_accept(struct vcpu *v, unsigned long addr)
>  {
> -    return !!msix_find(v->domain, addr);
> +    int rc;
> +
> +    read_lock(&v->domain->vpci_rwlock);
> +    rc = !!msix_find(v->domain, addr);
> +    read_unlock(&v->domain->vpci_rwlock);
> +
> +    return rc;
>  }
>  
>  static bool access_allowed(const struct pci_dev *pdev, unsigned long addr,
> @@ -358,21 +365,34 @@ static int adjacent_read(const struct domain *d, const struct vpci_msix *msix,
>  static int cf_check msix_read(
>      struct vcpu *v, unsigned long addr, unsigned int len, unsigned long *data)
>  {
> -    const struct domain *d = v->domain;
> -    struct vpci_msix *msix = msix_find(d, addr);
> +    struct domain *d = v->domain;
> +    struct vpci_msix *msix;
>      const struct vpci_msix_entry *entry;
>      unsigned int offset;
>  
>      *data = ~0ul;
>  
> +    read_lock(&d->vpci_rwlock);
> +
> +    msix = msix_find(d, addr);
>      if ( !msix )
> +    {
> +        read_unlock(&d->vpci_rwlock);
>          return X86EMUL_RETRY;
> +    }
>  
>      if ( adjacent_handle(msix, addr) )
> -        return adjacent_read(d, msix, addr, len, data);
> +    {
> +        int rc = adjacent_read(d, msix, addr, len, data);
> +        read_unlock(&d->vpci_rwlock);
> +        return rc;
> +    }
>  
>      if ( !access_allowed(msix->pdev, addr, len) )
> +    {
> +        read_unlock(&d->vpci_rwlock);
>          return X86EMUL_OKAY;
> +    }
>  
>      spin_lock(&msix->pdev->vpci->lock);
>      entry = get_entry(msix, addr);
> @@ -404,6 +424,7 @@ static int cf_check msix_read(
>          break;
>      }
>      spin_unlock(&msix->pdev->vpci->lock);
> +    read_unlock(&d->vpci_rwlock);
>  
>      return X86EMUL_OKAY;
>  }
> @@ -491,19 +512,32 @@ static int adjacent_write(const struct domain *d, const struct vpci_msix *msix,
>  static int cf_check msix_write(
>      struct vcpu *v, unsigned long addr, unsigned int len, unsigned long data)
>  {
> -    const struct domain *d = v->domain;
> -    struct vpci_msix *msix = msix_find(d, addr);
> +    struct domain *d = v->domain;
> +    struct vpci_msix *msix;
>      struct vpci_msix_entry *entry;
>      unsigned int offset;
>  
> +    read_lock(&d->vpci_rwlock);
> +
> +    msix = msix_find(d, addr);
>      if ( !msix )
> +    {
> +        read_unlock(&d->vpci_rwlock);
>          return X86EMUL_RETRY;
> +    }
>  
>      if ( adjacent_handle(msix, addr) )
> -        return adjacent_write(d, msix, addr, len, data);
> +    {
> +        int rc = adjacent_write(d, msix, addr, len, data);
> +        read_unlock(&d->vpci_rwlock);
> +        return rc;
> +    }
>  
>      if ( !access_allowed(msix->pdev, addr, len) )
> +    {
> +        read_unlock(&d->vpci_rwlock);
>          return X86EMUL_OKAY;
> +    }
>  
>      spin_lock(&msix->pdev->vpci->lock);
>      entry = get_entry(msix, addr);
> @@ -579,6 +613,7 @@ static int cf_check msix_write(
>          break;
>      }
>      spin_unlock(&msix->pdev->vpci->lock);
> +    read_unlock(&d->vpci_rwlock);
>  
>      return X86EMUL_OKAY;
>  }
> @@ -665,6 +700,8 @@ static int cf_check init_msix(struct pci_dev *pdev)
>      struct vpci_msix *msix;
>      int rc;
>  
> +    ASSERT(rw_is_write_locked(&pdev->domain->vpci_rwlock));
> +
>      msix_offset = pci_find_cap_offset(pdev->seg, pdev->bus, slot, func,
>                                        PCI_CAP_ID_MSIX);
>      if ( !msix_offset )
> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
> index 652807a4a4..1270174e78 100644
> --- a/xen/drivers/vpci/vpci.c
> +++ b/xen/drivers/vpci/vpci.c
> @@ -38,20 +38,32 @@ extern vpci_register_init_t *const __end_vpci_array[];
>  
>  void vpci_remove_device(struct pci_dev *pdev)
>  {
> -    if ( !has_vpci(pdev->domain) || !pdev->vpci )
> +    struct vpci *vpci;
> +
> +    if ( !has_vpci(pdev->domain) )
>          return;
>  
> -    spin_lock(&pdev->vpci->lock);
> +    write_lock(&pdev->domain->vpci_rwlock);
> +    if ( !pdev->vpci )
> +    {
> +        write_unlock(&pdev->domain->vpci_rwlock);
> +        return;
> +    }
> +
> +    vpci = pdev->vpci;
> +    pdev->vpci = NULL;
> +    write_unlock(&pdev->domain->vpci_rwlock);
> +
>      while ( !list_empty(&pdev->vpci->handlers) )
>      {
> -        struct vpci_register *r = list_first_entry(&pdev->vpci->handlers,
> +        struct vpci_register *r = list_first_entry(&vpci->handlers,
>                                                     struct vpci_register,
>                                                     node);
>  
>          list_del(&r->node);
>          xfree(r);
>      }
> -    spin_unlock(&pdev->vpci->lock);
> +
>      if ( pdev->vpci->msix )
>      {
>          unsigned int i;
> @@ -61,29 +73,33 @@ void vpci_remove_device(struct pci_dev *pdev)
>              if ( pdev->vpci->msix->table[i] )
>                  iounmap(pdev->vpci->msix->table[i]);
>      }
> -    xfree(pdev->vpci->msix);
> -    xfree(pdev->vpci->msi);
> -    xfree(pdev->vpci);
> -    pdev->vpci = NULL;
> +    xfree(vpci->msix);
> +    xfree(vpci->msi);
> +    xfree(vpci);
>  }
>  
>  int vpci_add_handlers(struct pci_dev *pdev)
>  {
> +    struct vpci *vpci;
>      unsigned int i;
>      int rc = 0;
>  
>      if ( !has_vpci(pdev->domain) )
>          return 0;
>  
> +    vpci = xzalloc(struct vpci);
> +    if ( !vpci )
> +        return -ENOMEM;
> +
> +    INIT_LIST_HEAD(&vpci->handlers);
> +    spin_lock_init(&vpci->lock);
> +
> +    write_lock(&pdev->domain->vpci_rwlock);

I think the usage of the per-domain vpci lock here and in
vpci_remove_device() creates an ABBA deadlock situation with its usage
in modify_bars().

Both vpci_add_handlers() and vpci_remove_device() get called with the
pcidevs lock taken and take the per-domain vpci lock afterwards, while
modify_bars() does the inverse: it gets called with the per-domain
vpci lock taken and then takes the pcidevs lock inside the function.
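
I.e.:

    vpci_add_handlers() /              modify_bars():
    vpci_remove_device():
        pcidevs_lock()                     write_lock(&d->vpci_rwlock)
        write_lock(&d->vpci_rwlock)        pcidevs_lock()

Run concurrently, each side can end up waiting for the lock the other
one already holds.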

> +
>      /* We should not get here twice for the same device. */
>      ASSERT(!pdev->vpci);
>  
> -    pdev->vpci = xzalloc(struct vpci);
> -    if ( !pdev->vpci )
> -        return -ENOMEM;
> -
> -    INIT_LIST_HEAD(&pdev->vpci->handlers);
> -    spin_lock_init(&pdev->vpci->lock);
> +    pdev->vpci = vpci;
>  
>      for ( i = 0; i < NUM_VPCI_INIT; i++ )
>      {
> @@ -91,6 +107,7 @@ int vpci_add_handlers(struct pci_dev *pdev)
>          if ( rc )
>              break;
>      }
> +    write_unlock(&pdev->domain->vpci_rwlock);
>  
>      if ( rc )
>          vpci_remove_device(pdev);
> @@ -139,6 +156,7 @@ uint32_t cf_check vpci_hw_read32(
>      return pci_conf_read32(pdev->sbdf, reg);
>  }
>  
> +/* This must hold domain's vpci_rwlock in write mode. */
>  int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
>                        vpci_write_t *write_handler, unsigned int offset,
>                        unsigned int size, void *data)
> @@ -162,8 +180,6 @@ int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
>      r->offset = offset;
>      r->private = data;
>  
> -    spin_lock(&vpci->lock);

If you remove the lock here we need to assert that the per-domain vpci
lock is taken in write mode.
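
Maybe by passing the pdev (or the domain) to vpci_add_register(), so
that something like the following (just a sketch) could be added at
the top of the function:

    ASSERT(rw_is_write_locked(&pdev->domain->vpci_rwlock));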

> -
>      /* The list of handlers must be kept sorted at all times. */
>      list_for_each ( prev, &vpci->handlers )
>      {
> @@ -175,25 +191,23 @@ int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
>              break;
>          if ( cmp == 0 )
>          {
> -            spin_unlock(&vpci->lock);
>              xfree(r);
>              return -EEXIST;
>          }
>      }
>  
>      list_add_tail(&r->node, prev);
> -    spin_unlock(&vpci->lock);
>  
>      return 0;
>  }
>  
> +/* This must hold domain's vpci_rwlock in write mode. */
>  int vpci_remove_register(struct vpci *vpci, unsigned int offset,
>                           unsigned int size)
>  {
>      const struct vpci_register r = { .offset = offset, .size = size };
>      struct vpci_register *rm;
>  
> -    spin_lock(&vpci->lock);
>      list_for_each_entry ( rm, &vpci->handlers, node )
>      {
>          int cmp = vpci_register_cmp(&r, rm);
> @@ -205,14 +219,12 @@ int vpci_remove_register(struct vpci *vpci, unsigned int offset,
>          if ( !cmp && rm->offset == offset && rm->size == size )
>          {
>              list_del(&rm->node);
> -            spin_unlock(&vpci->lock);
>              xfree(rm);
>              return 0;
>          }
>          if ( cmp <= 0 )
>              break;
>      }
> -    spin_unlock(&vpci->lock);

Same here about the per-domain lock being taken.

>  
>      return -ENOENT;
>  }
> @@ -320,7 +332,7 @@ static uint32_t merge_result(uint32_t data, uint32_t new, unsigned int size,
>  
>  uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
>  {
> -    const struct domain *d = current->domain;
> +    struct domain *d = current->domain;
>      const struct pci_dev *pdev;
>      const struct vpci_register *r;
>      unsigned int data_offset = 0;
> @@ -333,10 +345,18 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
>      }
>  
>      /* Find the PCI dev matching the address. */
> +    pcidevs_lock();
>      pdev = pci_get_pdev(d, sbdf);
> -    if ( !pdev || !pdev->vpci )
> +    pcidevs_unlock();

I think it's worth looking into expanding the per-domain vpci lock to
a pdevs lock or similar, in order to also protect pdev_list if
possible, so that we can avoid taking the pcidevs lock here.
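
With such an expanded lock the lookup could then be done under the
per-domain lock itself, i.e. something like (sketch, assuming the
lock also covers d->pdev_list):

    read_lock(&d->vpci_rwlock);
    pdev = pci_get_pdev(d, sbdf);
    ...

without any extra locking.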

Thanks, Roger.

