Re: [PATCH v3 04/20] PCI/P2PDMA: introduce helpers for dma_map_sg implementations

2021-09-27 Thread Bjorn Helgaas
On Thu, Sep 16, 2021 at 05:40:44PM -0600, Logan Gunthorpe wrote:
> Add pci_p2pdma_map_segment() as a helper for simple dma_map_sg()
> implementations. It takes an scatterlist segment that must point to a
> pci_p2pdma struct page and will map it if the mapping requires a bus
> address.
> 
> The return value indicates whether the mapping required a bus address
> or whether the caller still needs to map the segment normally. If the
> segment should not be mapped, -EREMOTEIO is returned.
> 
> This helper uses a state structure to track the changes to the
> pgmap across calls and avoid needing to lookup into the xarray for
> every page.
> 
> Also add pci_p2pdma_map_bus_segment() which is useful for IOMMU
> dma_map_sg() implementations where the sg segment containing the page
> differs from the sg segment containing the DMA address.
> 
> Signed-off-by: Logan Gunthorpe 

Acked-by: Bjorn Helgaas 

Ditto.
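
For readers following along, the state-caching idea in the commit log
can be sketched in plain userspace C. This is an illustration only:
struct pgmap, struct segment, struct map_state, lookup_map_type() and
map_all() are hypothetical stand-ins, not the kernel's types or
functions, and the lookups counter exists only to show how often the
expensive lookup would run.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
enum map_type {
	MAP_UNKNOWN = 0,
	MAP_NOT_SUPPORTED,
	MAP_BUS_ADDR,
	MAP_THRU_HOST_BRIDGE,
};

struct pgmap {
	enum map_type type;	/* what the expensive lookup would return */
	uint64_t bus_offset;
};

struct segment {
	const struct pgmap *pgmap;
	uint64_t phys;
	uint64_t dma_address;
	int is_bus_addr;
};

/* Declared once outside the loop and zero-initialized, as in the patch. */
struct map_state {
	const struct pgmap *pgmap;
	enum map_type map;
	uint64_t bus_off;
};

static unsigned int lookups;	/* counts simulated xarray lookups */

static enum map_type lookup_map_type(const struct pgmap *pgmap)
{
	lookups++;
	return pgmap->type;
}

/* Redo the lookup only when the pgmap differs from the previous segment. */
static enum map_type map_segment(struct map_state *state, struct segment *sg)
{
	if (state->pgmap != sg->pgmap) {
		state->pgmap = sg->pgmap;
		state->map = lookup_map_type(sg->pgmap);
		state->bus_off = sg->pgmap->bus_offset;
	}

	if (state->map == MAP_BUS_ADDR) {
		sg->dma_address = sg->phys + state->bus_off;
		sg->is_bus_addr = 1;
	}

	return state->map;
}

/* Map n segments; return how many were mapped with a bus address. */
static size_t map_all(struct segment *sgl, size_t n)
{
	struct map_state state = { 0 };
	size_t i, bus = 0;

	for (i = 0; i < n; i++)
		if (map_segment(&state, &sgl[i]) == MAP_BUS_ADDR)
			bus++;

	return bus;
}
```

With three segments where the first two share a pgmap, the lookup in
this sketch runs twice rather than three times.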

> ---
>  drivers/pci/p2pdma.c   | 59 ++
>  include/linux/pci-p2pdma.h | 21 ++
>  2 files changed, 80 insertions(+)
> 
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> index b656d8c801a7..58c34f1f1473 100644
> --- a/drivers/pci/p2pdma.c
> +++ b/drivers/pci/p2pdma.c
> @@ -943,6 +943,65 @@ void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
>  }
>  EXPORT_SYMBOL_GPL(pci_p2pdma_unmap_sg_attrs);
>  
> +/**
> + * pci_p2pdma_map_segment - map an sg segment determining the mapping type
> + * @state: State structure that should be declared outside of the for_each_sg()
> + *   loop and initialized to zero.
> + * @dev: DMA device that's doing the mapping operation
> + * @sg: scatterlist segment to map
> + *
> + * This is a helper to be used by non-iommu dma_map_sg() implementations where
> + * the sg segment is the same for the page_link and the dma_address.

s/non-iommu/non-IOMMU/

> + *
> + * Attempt to map a single segment in an SGL with the PCI bus address.
> + * The segment must point to a PCI P2PDMA page and thus must be
> + * wrapped in a is_pci_p2pdma_page(sg_page(sg)) check.
> + *
> + * Returns the type of mapping used and maps the page if the type is
> + * PCI_P2PDMA_MAP_BUS_ADDR.
> + */
> +enum pci_p2pdma_map_type
> +pci_p2pdma_map_segment(struct pci_p2pdma_map_state *state, struct device *dev,
> +struct scatterlist *sg)
> +{
> + if (state->pgmap != sg_page(sg)->pgmap) {
> + state->pgmap = sg_page(sg)->pgmap;
> + state->map = pci_p2pdma_map_type(state->pgmap, dev);
> + state->bus_off = to_p2p_pgmap(state->pgmap)->bus_offset;
> + }
> +
> + if (state->map == PCI_P2PDMA_MAP_BUS_ADDR) {
> + sg->dma_address = sg_phys(sg) + state->bus_off;
> + sg_dma_len(sg) = sg->length;
> + sg_dma_mark_pci_p2pdma(sg);
> + }
> +
> + return state->map;
> +}
> +
> +/**
> + * pci_p2pdma_map_bus_segment - map an sg segment pre determined to
> + *   be mapped with PCI_P2PDMA_MAP_BUS_ADDR
> + * @pg_sg: scatterlist segment with the page to map
> + * @dma_sg: scatterlist segment to assign a dma address to

s/dma address/DMA address/, also below

> + *
> + * This is a helper for iommu dma_map_sg() implementations when the
> + * segment for the dma address differs from the segment containing the
> + * source page.
> + *
> + * pci_p2pdma_map_type() must have already been called on the pg_sg and
> + * returned PCI_P2PDMA_MAP_BUS_ADDR.
> + */
> +void pci_p2pdma_map_bus_segment(struct scatterlist *pg_sg,
> + struct scatterlist *dma_sg)
> +{
> + struct pci_p2pdma_pagemap *pgmap = to_p2p_pgmap(sg_page(pg_sg)->pgmap);
> +
> + dma_sg->dma_address = sg_phys(pg_sg) + pgmap->bus_offset;
> + sg_dma_len(dma_sg) = pg_sg->length;
> + sg_dma_mark_pci_p2pdma(dma_sg);
> +}
> +
>  /**
>   * pci_p2pdma_enable_store - parse a configfs/sysfs attribute store
>   *   to enable p2pdma
> diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h
> index caac2d023f8f..e5a8d5bc0f51 100644
> --- a/include/linux/pci-p2pdma.h
> +++ b/include/linux/pci-p2pdma.h
> @@ -13,6 +13,12 @@
>  
>  #include 
>  
> +struct pci_p2pdma_map_state {
> + struct dev_pagemap *pgmap;
> + int map;
> + u64 bus_off;
> +};
> +
>  struct block_device;
>  struct scatterlist;
>  
> @@ -70,6 +76,11 @@ int pci_p2pdma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
>   int nents, enum dma_data_direction dir, unsigned long attrs);
>  void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatter

Re: [PATCH v3 13/20] PCI/P2PDMA: remove pci_p2pdma_[un]map_sg()

2021-09-27 Thread Bjorn Helgaas
On Thu, Sep 16, 2021 at 05:40:53PM -0600, Logan Gunthorpe wrote:
> This interface is superseded by support in dma_map_sg() which now supports
> heterogeneous scatterlists. There are no longer any users, so remove it.
> 
> Signed-off-by: Logan Gunthorpe 

Acked-by: Bjorn Helgaas 

Ditto.

> ---
>  drivers/pci/p2pdma.c   | 65 --
>  include/linux/pci-p2pdma.h | 27 
>  2 files changed, 92 deletions(-)
> 
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> index 58c34f1f1473..4478633346bd 100644
> --- a/drivers/pci/p2pdma.c
> +++ b/drivers/pci/p2pdma.c
> @@ -878,71 +878,6 @@ enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
>   return type;
>  }
>  
> -static int __pci_p2pdma_map_sg(struct pci_p2pdma_pagemap *p2p_pgmap,
> - struct device *dev, struct scatterlist *sg, int nents)
> -{
> - struct scatterlist *s;
> - int i;
> -
> - for_each_sg(sg, s, nents, i) {
> - s->dma_address = sg_phys(s) - p2p_pgmap->bus_offset;
> - sg_dma_len(s) = s->length;
> - }
> -
> - return nents;
> -}
> -
> -/**
> - * pci_p2pdma_map_sg_attrs - map a PCI peer-to-peer scatterlist for DMA
> - * @dev: device doing the DMA request
> - * @sg: scatter list to map
> - * @nents: elements in the scatterlist
> - * @dir: DMA direction
> - * @attrs: DMA attributes passed to dma_map_sg() (if called)
> - *
> - * Scatterlists mapped with this function should be unmapped using
> - * pci_p2pdma_unmap_sg_attrs().
> - *
> - * Returns the number of SG entries mapped or 0 on error.
> - */
> -int pci_p2pdma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
> - int nents, enum dma_data_direction dir, unsigned long attrs)
> -{
> - struct pci_p2pdma_pagemap *p2p_pgmap =
> - to_p2p_pgmap(sg_page(sg)->pgmap);
> -
> - switch (pci_p2pdma_map_type(sg_page(sg)->pgmap, dev)) {
> - case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
> - return dma_map_sg_attrs(dev, sg, nents, dir, attrs);
> - case PCI_P2PDMA_MAP_BUS_ADDR:
> - return __pci_p2pdma_map_sg(p2p_pgmap, dev, sg, nents);
> - default:
> - return 0;
> - }
> -}
> -EXPORT_SYMBOL_GPL(pci_p2pdma_map_sg_attrs);
> -
> -/**
> - * pci_p2pdma_unmap_sg_attrs - unmap a PCI peer-to-peer scatterlist that was
> - *   mapped with pci_p2pdma_map_sg()
> - * @dev: device doing the DMA request
> - * @sg: scatter list to map
> - * @nents: number of elements returned by pci_p2pdma_map_sg()
> - * @dir: DMA direction
> - * @attrs: DMA attributes passed to dma_unmap_sg() (if called)
> - */
> -void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
> - int nents, enum dma_data_direction dir, unsigned long attrs)
> -{
> - enum pci_p2pdma_map_type map_type;
> -
> - map_type = pci_p2pdma_map_type(sg_page(sg)->pgmap, dev);
> -
> - if (map_type == PCI_P2PDMA_MAP_THRU_HOST_BRIDGE)
> - dma_unmap_sg_attrs(dev, sg, nents, dir, attrs);
> -}
> -EXPORT_SYMBOL_GPL(pci_p2pdma_unmap_sg_attrs);
> -
>  /**
>   * pci_p2pdma_map_segment - map an sg segment determining the mapping type
>   * @state: State structure that should be declared outside of the for_each_sg()
> diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h
> index e5a8d5bc0f51..0c33a40a86e7 100644
> --- a/include/linux/pci-p2pdma.h
> +++ b/include/linux/pci-p2pdma.h
> @@ -72,10 +72,6 @@ void pci_p2pmem_free_sgl(struct pci_dev *pdev, struct scatterlist *sgl);
>  void pci_p2pmem_publish(struct pci_dev *pdev, bool publish);
>  enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
>struct device *dev);
> -int pci_p2pdma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
> - int nents, enum dma_data_direction dir, unsigned long attrs);
> -void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
> - int nents, enum dma_data_direction dir, unsigned long attrs);
>  enum pci_p2pdma_map_type
>  pci_p2pdma_map_segment(struct pci_p2pdma_map_state *state, struct device *dev,
>  struct scatterlist *sg);
> @@ -135,17 +131,6 @@ pci_p2pdma_map_type(struct dev_pagemap *pgmap, struct device *dev)
>  {
>   return PCI_P2PDMA_MAP_NOT_SUPPORTED;
>  }
> -static inline int pci_p2pdma_map_sg_attrs(struct device *dev,
> - struct scatterlist *sg, int nents, enum dma_data_direction dir,
> - unsigned long attrs)
> -{
> - return 0;
> -}
> -static inline void pci_p2pdma_unmap_sg_attrs(struct

Re: [PATCH v3 02/20] PCI/P2PDMA: attempt to set map_type if it has not been set

2021-09-27 Thread Bjorn Helgaas
On Thu, Sep 16, 2021 at 05:40:42PM -0600, Logan Gunthorpe wrote:
> Attempt to find the mapping type for P2PDMA pages on the first
> DMA map attempt if it has not been done ahead of time.
> 
> Previously, the mapping type was expected to be calculated ahead of
> time, but if pages are to come from userspace then there's no
> way to ensure the path was checked ahead of time.
> 
> With this change it's no longer invalid to call pci_p2pdma_map_sg()
> before the mapping type is calculated so drop the WARN_ON when that
> is the case.
> 
> Signed-off-by: Logan Gunthorpe 

Acked-by: Bjorn Helgaas 

Capitalize subject line.
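
The lazy calculation described in the commit log can be sketched
roughly as follows. This is a userspace illustration, not the kernel
code: the fixed-size cache[] array stands in for the xarray,
calc_map_type() is a hypothetical placeholder for the topology walk,
and storing the result back into the cache is an assumption of this
sketch.

```c
#include <stddef.h>

enum map_type {
	MAP_UNKNOWN = 0,	/* not calculated yet */
	MAP_NOT_SUPPORTED,
	MAP_BUS_ADDR,
	MAP_THRU_HOST_BRIDGE,
};

#define MAX_CLIENTS 8

/* Zero-initialized, so every entry starts out as MAP_UNKNOWN. */
static enum map_type cache[MAX_CLIENTS];
static unsigned int calc_calls;

/* Hypothetical placeholder for the (expensive) topology walk. */
static enum map_type calc_map_type(size_t client)
{
	calc_calls++;
	return (client % 2) ? MAP_THRU_HOST_BRIDGE : MAP_BUS_ADDR;
}

/* Compute the type on first use instead of warning about a missing entry. */
static enum map_type get_map_type(size_t client)
{
	if (client >= MAX_CLIENTS)
		return MAP_NOT_SUPPORTED;

	if (cache[client] == MAP_UNKNOWN)
		cache[client] = calc_map_type(client);

	return cache[client];
}
```

Callers of get_map_type() in this sketch never see MAP_UNKNOWN, which
is the property the dropped WARN_ON used to guard.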

> ---
>  drivers/pci/p2pdma.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> index 50cdde3e9a8b..1192c465ba6d 100644
> --- a/drivers/pci/p2pdma.c
> +++ b/drivers/pci/p2pdma.c
> @@ -848,6 +848,7 @@ static enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
>   struct pci_dev *provider = to_p2p_pgmap(pgmap)->provider;
>   struct pci_dev *client;
>   struct pci_p2pdma *p2pdma;
> + int dist;
>  
>   if (!provider->p2pdma)
>   return PCI_P2PDMA_MAP_NOT_SUPPORTED;
> @@ -864,6 +865,10 @@ static enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
>   type = xa_to_value(xa_load(&p2pdma->map_types,
>  map_types_idx(client)));
>   rcu_read_unlock();
> +
> + if (type == PCI_P2PDMA_MAP_UNKNOWN)
> + return calc_map_type_and_dist(provider, client, &dist, false);
> +
>   return type;
>  }
>  
> @@ -906,7 +911,6 @@ int pci_p2pdma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
>   case PCI_P2PDMA_MAP_BUS_ADDR:
>   return __pci_p2pdma_map_sg(p2p_pgmap, dev, sg, nents);
>   default:
> - WARN_ON_ONCE(1);
>   return 0;
>   }
>  }
> -- 
> 2.30.2
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 19/20] PCI/P2PDMA: introduce pci_mmap_p2pmem()

2021-09-27 Thread Bjorn Helgaas
On Thu, Sep 16, 2021 at 05:40:59PM -0600, Logan Gunthorpe wrote:
> Introduce pci_mmap_p2pmem() which is a helper to allocate and mmap
> a hunk of p2pmem into userspace.
> 
> Pages are allocated from the genalloc in bulk and their reference count
> incremented. They are returned to the genalloc when the page is put.
> 
> The VMA does not take a reference to the pages when they are inserted
> with vmf_insert_mixed() (which is necessary for zone device pages) so
> the backing P2P memory is stored in a structures in vm_private_data.
> 
> A pseudo mount is used to allocate an inode for each PCI device. The
> inode's address_space is used in the file doing the mmap so that all
> VMAs are collected and can be unmapped if the PCI device is unbound.
> After unmapping, the VMAs are iterated through and their pages are
> put so the device can continue to be unbound. An active flag is used
> to signal to VMAs not to allocate any further P2P memory once the
> removal process starts. The flag is synchronized with concurrent
> access with an RCU lock.
> 
> The VMAs and inode will survive after the unbind of the device, but no
> pages will be present in the VMA and a subsequent access will result
> in a SIGBUS error.
> 
> Signed-off-by: Logan Gunthorpe 

Acked-by: Bjorn Helgaas 

I would capitalize "Introduce" in the subject line.

> ---
>  drivers/pci/p2pdma.c   | 263 -
>  include/linux/pci-p2pdma.h |  11 ++
>  include/uapi/linux/magic.h |   1 +
>  3 files changed, 273 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> index 2422af5a529c..a5adf57af53a 100644
> --- a/drivers/pci/p2pdma.c
> +++ b/drivers/pci/p2pdma.c
> @@ -16,14 +16,19 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  struct pci_p2pdma {
>   struct gen_pool *pool;
>   bool p2pmem_published;
>   struct xarray map_types;
> + struct inode *inode;
> + bool active;
>  };
>  
>  struct pci_p2pdma_pagemap {
> @@ -32,6 +37,14 @@ struct pci_p2pdma_pagemap {
>   u64 bus_offset;
>  };
>  
> +struct pci_p2pdma_map {
> + struct kref ref;
> + struct pci_dev *pdev;
> + struct inode *inode;
> + void *kaddr;
> + size_t len;
> +};
> +
>  static struct pci_p2pdma_pagemap *to_p2p_pgmap(struct dev_pagemap *pgmap)
>  {
>   return container_of(pgmap, struct pci_p2pdma_pagemap, pgmap);
> @@ -100,6 +113,26 @@ static const struct attribute_group p2pmem_group = {
>   .name = "p2pmem",
>  };
>  
> +/*
> + * P2PDMA internal mount
> + * Fake an internal VFS mount-point in order to allocate struct address_space
> + * mappings to remove VMAs on unbind events.
> + */
> +static int pci_p2pdma_fs_cnt;
> +static struct vfsmount *pci_p2pdma_fs_mnt;
> +
> +static int pci_p2pdma_fs_init_fs_context(struct fs_context *fc)
> +{
> + return init_pseudo(fc, P2PDMA_MAGIC) ? 0 : -ENOMEM;
> +}
> +
> +static struct file_system_type pci_p2pdma_fs_type = {
> + .name = "p2dma",
> + .owner = THIS_MODULE,
> + .init_fs_context = pci_p2pdma_fs_init_fs_context,
> + .kill_sb = kill_anon_super,
> +};
> +
>  static void p2pdma_page_free(struct page *page)
>  {
>   struct pci_p2pdma_pagemap *pgmap = to_p2p_pgmap(page->pgmap);
> @@ -128,6 +161,9 @@ static void pci_p2pdma_release(void *data)
>   gen_pool_destroy(p2pdma->pool);
>   sysfs_remove_group(&pdev->dev.kobj, &p2pmem_group);
>   xa_destroy(&p2pdma->map_types);
> +
> + iput(p2pdma->inode);
> + simple_release_fs(&pci_p2pdma_fs_mnt, &pci_p2pdma_fs_cnt);
>  }
>  
>  static int pci_p2pdma_setup(struct pci_dev *pdev)
> @@ -145,17 +181,32 @@ static int pci_p2pdma_setup(struct pci_dev *pdev)
>   if (!p2p->pool)
>   goto out;
>  
> - error = devm_add_action_or_reset(&pdev->dev, pci_p2pdma_release, pdev);
> + error = simple_pin_fs(&pci_p2pdma_fs_type, &pci_p2pdma_fs_mnt,
> +   &pci_p2pdma_fs_cnt);
>   if (error)
>   goto out_pool_destroy;
>  
> + p2p->inode = alloc_anon_inode(pci_p2pdma_fs_mnt->mnt_sb);
> + if (IS_ERR(p2p->inode)) {
> + error = -ENOMEM;
> + goto out_unpin_fs;
> + }
> +
> + error = devm_add_action_or_reset(&pdev->dev, pci_p2pdma_release, pdev);
> + if (error)
> + goto out_put_inode;
> +
>   error = sysfs_create_group(&pdev->dev.kobj, &p2pmem_group);
>   if (error)
> - goto out_pool_destroy;
> + goto out_put_inode;
>  
>   rcu_assign_pointer(pdev->p2p

Re: [PATCH v3 03/20] PCI/P2PDMA: make pci_p2pdma_map_type() non-static

2021-09-27 Thread Bjorn Helgaas
On Thu, Sep 16, 2021 at 05:40:43PM -0600, Logan Gunthorpe wrote:
> pci_p2pdma_map_type() will be needed by the dma-iommu map_sg
> implementation because it will need to determine the mapping type
> ahead of actually doing the mapping to create the actual iommu mapping.

I don't expect this to go via the PCI tree, but if it did I would
silently:

  s/PCI/P2PDMA: make pci_p2pdma_map_type() non-static/
PCI/P2PDMA: Expose pci_p2pdma_map_type()/
  s/iommu/IOMMU/

and mention what this patch does in the commit log (in addition to the
subject) and fix a couple minor typos below.

> Signed-off-by: Logan Gunthorpe 

Acked-by: Bjorn Helgaas 
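
As a rough illustration of how a mapping routine might act on the
three returned types, here is a userspace sketch. The names
pick_dma_addr() and normal_addr are hypothetical; the "phys +
bus_offset" arithmetic and the -EREMOTEIO error code are taken from
the 04/20 patch in this series, and treating an UNKNOWN value as an
error is an assumption of the sketch.

```c
#include <errno.h>
#include <stdint.h>

#ifndef EREMOTEIO
#define EREMOTEIO 121	/* Linux value; fallback for other platforms */
#endif

enum map_type {
	MAP_UNKNOWN = 0,
	MAP_NOT_SUPPORTED,
	MAP_BUS_ADDR,
	MAP_THRU_HOST_BRIDGE,
};

/*
 * Pick the address to program into the DMA engine. Returns 0 and fills
 * *dma_addr on success, or -EREMOTEIO if the segment must not be mapped.
 * normal_addr stands for whatever the regular path would produce (a CPU
 * physical address for dma-direct, an IOVA for an IOMMU).
 */
static int pick_dma_addr(enum map_type type, uint64_t phys,
			 uint64_t bus_offset, uint64_t normal_addr,
			 uint64_t *dma_addr)
{
	switch (type) {
	case MAP_BUS_ADDR:
		/* Traffic stays below the switch: use the PCI bus address. */
		*dma_addr = phys + bus_offset;
		return 0;
	case MAP_THRU_HOST_BRIDGE:
		/* Traffic crosses an allowed host bridge: map normally. */
		*dma_addr = normal_addr;
		return 0;
	case MAP_NOT_SUPPORTED:
	case MAP_UNKNOWN:
	default:
		return -EREMOTEIO;
	}
}
```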

> ---
>  drivers/pci/p2pdma.c   | 24 +-
>  include/linux/pci-p2pdma.h | 41 ++
>  2 files changed, 56 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> index 1192c465ba6d..b656d8c801a7 100644
> --- a/drivers/pci/p2pdma.c
> +++ b/drivers/pci/p2pdma.c
> @@ -20,13 +20,6 @@
>  #include 
>  #include 
>  
> -enum pci_p2pdma_map_type {
> - PCI_P2PDMA_MAP_UNKNOWN = 0,
> - PCI_P2PDMA_MAP_NOT_SUPPORTED,
> - PCI_P2PDMA_MAP_BUS_ADDR,
> - PCI_P2PDMA_MAP_THRU_HOST_BRIDGE,
> -};
> -
>  struct pci_p2pdma {
>   struct gen_pool *pool;
>   bool p2pmem_published;
> @@ -841,8 +834,21 @@ void pci_p2pmem_publish(struct pci_dev *pdev, bool publish)
>  }
>  EXPORT_SYMBOL_GPL(pci_p2pmem_publish);
>  
> -static enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
> - struct device *dev)
> +/**
> + * pci_p2pdma_map_type - return the type of mapping that should be used for
> + *   a given device and pgmap
> + * @pgmap: the pagemap of a page to determine the mapping type for
> + * @dev: device that is mapping the page
> + *
> + * Returns one of:
> + *   PCI_P2PDMA_MAP_NOT_SUPPORTED - The mapping should not be done
> + *   PCI_P2PDMA_MAP_BUS_ADDR - The mapping should use the PCI bus address
> + *   PCI_P2PDMA_MAP_THRU_HOST_BRIDGE - The mapping should be done normally
> + *   using the CPU physical address (in dma-direct) or an IOVA
> + *   mapping for the IOMMU.
> + */
> +enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
> +  struct device *dev)
>  {
>   enum pci_p2pdma_map_type type = PCI_P2PDMA_MAP_NOT_SUPPORTED;
>   struct pci_dev *provider = to_p2p_pgmap(pgmap)->provider;
> diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h
> index 8318a97c9c61..caac2d023f8f 100644
> --- a/include/linux/pci-p2pdma.h
> +++ b/include/linux/pci-p2pdma.h
> @@ -16,6 +16,40 @@
>  struct block_device;
>  struct scatterlist;
>  
> +enum pci_p2pdma_map_type {
> + /*
> +  * PCI_P2PDMA_MAP_UNKNOWN: Used internally for indicating the mapping
> +  * type hasn't been calculated yet. Functions that return this enum
> +  * never return this value.
> +  */
> + PCI_P2PDMA_MAP_UNKNOWN = 0,
> +
> + /*
> +  * PCI_P2PDMA_MAP_NOT_SUPPORTED: Indicates the transaction will
> +  * traverse the host bridge and the host bridge is not in the
> +  * whitelist. DMA Mapping routines should return an error when
> +  * this is returned.
> +  */
> + PCI_P2PDMA_MAP_NOT_SUPPORTED,
> +
> + /*
> +  * PCI_P2PDMA_BUS_ADDR: Indicates that two devices can talk to
> +  * eachother directly through a PCI switch and the transaction will
> +  * not traverse the host bridge. Such a mapping should program
> +  * the DMA engine with PCI bus addresses.

s/eachother/each other/

> +  */
> + PCI_P2PDMA_MAP_BUS_ADDR,
> +
> + /*
> +  * PCI_P2PDMA_MAP_THRU_HOST_BRIDGE: Indicates two devices can talk
> +  * to eachother, but the transaction traverses a host bridge on the
> +  * whitelist. In this case, a normal mapping either with CPU physical
> +  * addresses (in the case of dma-direct) or IOVA addresses (in the
> +  * case of IOMMUs) should be used to program the DMA engine.

s/eachother/each other/

> +  */
> + PCI_P2PDMA_MAP_THRU_HOST_BRIDGE,
> +};
> +
>  #ifdef CONFIG_PCI_P2PDMA
>  int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
>   u64 offset);
> @@ -30,6 +64,8 @@ struct scatterlist *pci_p2pmem_alloc_sgl(struct pci_dev *pdev,
>unsigned int *nents, u32 length);
>  void pci_p2pmem_free_sgl(struct pci_dev *pdev, struct scatterlist *sgl);
>  void pci_p2pmem_publish(struct pci_dev *pdev, bool publish);
> +enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,

Re: [PATCH v2 6/9] PCI: Add pci_find_dvsec_capability to find designated VSEC

2021-09-27 Thread Bjorn Helgaas
s/pci_find_dvsec_capability/pci_find_dvsec_capability()/ in subject
and commit log.

On Thu, Sep 23, 2021 at 10:26:44AM -0700, Ben Widawsky wrote:
> Add pci_find_dvsec_capability to locate a Designated Vendor-Specific
> Extended Capability with the specified DVSEC ID.

"specified Vendor ID and Capability ID".

> The Designated Vendor-Specific Extended Capability (DVSEC) allows one or
> more vendor specific capabilities that aren't tied to the vendor ID of
> the PCI component.
> 
> DVSEC is critical for both the Compute Express Link (CXL) driver as well
> as the driver for OpenCAPI coherent accelerator (OCXL).

Strictly speaking, not really relevant for the commit log.

> Cc: David E. Box 
> Cc: Jonathan Cameron 
> Cc: Bjorn Helgaas 
> Cc: Dan Williams 
> Cc: linux-...@vger.kernel.org
> Cc: linuxppc-...@lists.ozlabs.org
> Cc: Andrew Donnellan 
> Cc: Lu Baolu 
> Reviewed-by: Frederic Barrat 
> Signed-off-by: Ben Widawsky 

If you want to merge this with the series,

Acked-by: Bjorn Helgaas 

Or if you want me to merge this on a branch, let me know.
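
To make the "Vendor ID and Capability ID" matching concrete, here is a
userspace sketch that walks a mock config space held in a byte array.
The helper names (cfg_read16(), find_next_ext_cap(), find_dvsec(),
put_dvsec()) are hypothetical; only the layout follows the kernel
definitions as I understand them: extended capabilities start at
0x100, the DVSEC capability ID is 0x23, and the DVSEC vendor and ID
fields sit in the low 16 bits at offsets 0x4 and 0x8.

```c
#include <stdint.h>

#define CFG_SIZE		4096
#define EXT_CAP_START		0x100
#define EXT_CAP_ID_DVSEC	0x23
#define DVSEC_HEADER1		0x4	/* bits 15:0 = DVSEC Vendor ID */
#define DVSEC_HEADER2		0x8	/* bits 15:0 = DVSEC ID */

static uint16_t cfg_read16(const uint8_t *cfg, unsigned int off)
{
	return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

static uint32_t cfg_read32(const uint8_t *cfg, unsigned int off)
{
	return (uint32_t)cfg_read16(cfg, off) |
	       ((uint32_t)cfg_read16(cfg, off + 2) << 16);
}

static void cfg_write32(uint8_t *cfg, unsigned int off, uint32_t val)
{
	cfg[off] = val & 0xff;
	cfg[off + 1] = (val >> 8) & 0xff;
	cfg[off + 2] = (val >> 16) & 0xff;
	cfg[off + 3] = (val >> 24) & 0xff;
}

/* Find the next capability with ID cap after start (0 = from the top). */
static uint16_t find_next_ext_cap(const uint8_t *cfg, uint16_t start,
				  uint16_t cap)
{
	unsigned int pos = start ? start : EXT_CAP_START;
	int ttl = (CFG_SIZE - EXT_CAP_START) / 8;	/* bound bad lists */

	while (ttl-- > 0 && pos >= EXT_CAP_START && pos + 4 <= CFG_SIZE) {
		uint32_t hdr = cfg_read32(cfg, pos);

		if (hdr == 0)
			break;
		if ((hdr & 0xffff) == cap && pos != start)
			return (uint16_t)pos;
		pos = (hdr >> 20) & 0xffc;	/* next-capability offset */
	}

	return 0;
}

/* Return the offset of the DVSEC matching both vendor and dvsec, else 0. */
static uint16_t find_dvsec(const uint8_t *cfg, uint16_t vendor, uint16_t dvsec)
{
	uint16_t pos = find_next_ext_cap(cfg, 0, EXT_CAP_ID_DVSEC);

	while (pos) {
		if (pos + DVSEC_HEADER2 + 2 > CFG_SIZE)
			break;
		if (cfg_read16(cfg, pos + DVSEC_HEADER1) == vendor &&
		    cfg_read16(cfg, pos + DVSEC_HEADER2) == dvsec)
			return pos;
		pos = find_next_ext_cap(cfg, pos, EXT_CAP_ID_DVSEC);
	}

	return 0;
}

/* Test helper: write one DVSEC capability into the mock config space. */
static void put_dvsec(uint8_t *cfg, uint16_t pos, uint16_t next,
		      uint16_t vendor, uint16_t id)
{
	cfg_write32(cfg, pos, EXT_CAP_ID_DVSEC | (1u << 16) |
			      ((uint32_t)next << 20));
	cfg_write32(cfg, pos + DVSEC_HEADER1, vendor);
	cfg_write32(cfg, pos + DVSEC_HEADER2, id);
}
```

A device may carry several DVSEC capabilities, which is why the walk
keeps going until both fields match.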

> ---
>  drivers/pci/pci.c   | 32 
>  include/linux/pci.h |  1 +
>  2 files changed, 33 insertions(+)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index ce2ab62b64cf..94ac86ff28b0 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -732,6 +732,38 @@ u16 pci_find_vsec_capability(struct pci_dev *dev, u16 vendor, int cap)
>  }
>  EXPORT_SYMBOL_GPL(pci_find_vsec_capability);
>  
> +/**
> + * pci_find_dvsec_capability - Find DVSEC for vendor
> + * @dev: PCI device to query
> + * @vendor: Vendor ID to match for the DVSEC
> + * @dvsec: Designated Vendor-specific capability ID
> + *
> + * If DVSEC has Vendor ID @vendor and DVSEC ID @dvsec return the capability
> + * offset in config space; otherwise return 0.
> + */
> +u16 pci_find_dvsec_capability(struct pci_dev *dev, u16 vendor, u16 dvsec)
> +{
> + int pos;
> +
> + pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DVSEC);
> + if (!pos)
> + return 0;
> +
> + while (pos) {
> + u16 v, id;
> +
> + pci_read_config_word(dev, pos + PCI_DVSEC_HEADER1, &v);
> + pci_read_config_word(dev, pos + PCI_DVSEC_HEADER2, &id);
> + if (vendor == v && dvsec == id)
> + return pos;
> +
> + pos = pci_find_next_ext_capability(dev, pos, PCI_EXT_CAP_ID_DVSEC);
> + }
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(pci_find_dvsec_capability);
> +
>  /**
>   * pci_find_parent_resource - return resource region of parent bus of given
>   * region
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index cd8aa6fce204..c93ccfa4571b 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1130,6 +1130,7 @@ u16 pci_find_ext_capability(struct pci_dev *dev, int cap);
>  u16 pci_find_next_ext_capability(struct pci_dev *dev, u16 pos, int cap);
>  struct pci_bus *pci_find_next_bus(const struct pci_bus *from);
>  u16 pci_find_vsec_capability(struct pci_dev *dev, u16 vendor, int cap);
> +u16 pci_find_dvsec_capability(struct pci_dev *dev, u16 vendor, u16 dvsec);
>  
>  u64 pci_get_dsn(struct pci_dev *dev);
>  
> -- 
> 2.33.0
> 


Re: [PATCH v2 2/4] PCI: only build xen-pcifront in PV-enabled environments

2021-09-17 Thread Bjorn Helgaas
s/only/Only/ in subject

On Fri, Sep 17, 2021 at 12:48:03PM +0200, Jan Beulich wrote:
> The driver's module init function, pcifront_init(), invokes
> xen_pv_domain() first thing. That construct produces constant "false"
> when !CONFIG_XEN_PV. Hence there's no point building the driver in
> non-PV configurations.

Thanks for these bread crumbs.  xen_domain_type is set to
XEN_PV_DOMAIN only by xen_start_kernel() in enlighten_pv.c, which is
only built when CONFIG_XEN_PV=y, so even I can verify this :)

> Drop the (now implicit and generally wrong) X86 dependency: At present,
> XEN_PV con only be set when X86 is also enabled. In general an
> architecture supporting Xen PV (and PCI) would want to have this driver
> built.

s/con only/can only/

> Signed-off-by: Jan Beulich 
> Reviewed-by: Stefano Stabellini 

Acked-by: Bjorn Helgaas 

> ---
> v2: Title and description redone.
> 
> --- a/drivers/pci/Kconfig
> +++ b/drivers/pci/Kconfig
> @@ -110,7 +110,7 @@ config PCI_PF_STUB
>  
>  config XEN_PCIDEV_FRONTEND
>   tristate "Xen PCI Frontend"
> - depends on X86 && XEN
> + depends on XEN_PV
>   select PCI_XEN
>   select XEN_XENBUS_FRONTEND
>   default y
> 


[PATCH] iommu/vt-d: Drop "0x" prefix from PCI bus & device addresses

2021-09-03 Thread Bjorn Helgaas
From: Bjorn Helgaas 

719a19335692 ("iommu/vt-d: Tweak the description of a DMA fault") changed
the DMA fault reason from hex to decimal.  It also added "0x" prefixes to
the PCI bus/device, e.g.,

  - DMAR: [INTR-REMAP] Request device [00:00.5]
  + DMAR: [INTR-REMAP] Request device [0x00:0x00.5]

These no longer match dev_printk() and other similar messages in
dmar_match_pci_path() and dmar_acpi_insert_dev_scope().

Drop the "0x" prefixes from the bus and device addresses.

Fixes: 719a19335692 ("iommu/vt-d: Tweak the description of a DMA fault")
Signed-off-by: Bjorn Helgaas 
---
 drivers/iommu/intel/dmar.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index d66f79acd14d..8647a355dad0 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -1944,18 +1944,18 @@ static int dmar_fault_do_one(struct intel_iommu *iommu, int type,
    reason = dmar_get_fault_reason(fault_reason, &fault_type);
 
if (fault_type == INTR_REMAP)
-   pr_err("[INTR-REMAP] Request device [0x%02x:0x%02x.%d] fault index 0x%llx [fault reason 0x%02x] %s\n",
+   pr_err("[INTR-REMAP] Request device [%02x:%02x.%d] fault index 0x%llx [fault reason 0x%02x] %s\n",
   source_id >> 8, PCI_SLOT(source_id & 0xFF),
   PCI_FUNC(source_id & 0xFF), addr >> 48,
   fault_reason, reason);
else if (pasid == INVALID_IOASID)
-   pr_err("[%s NO_PASID] Request device [0x%02x:0x%02x.%d] fault addr 0x%llx [fault reason 0x%02x] %s\n",
+   pr_err("[%s NO_PASID] Request device [%02x:%02x.%d] fault addr 0x%llx [fault reason 0x%02x] %s\n",
   type ? "DMA Read" : "DMA Write",
   source_id >> 8, PCI_SLOT(source_id & 0xFF),
   PCI_FUNC(source_id & 0xFF), addr,
   fault_reason, reason);
else
-   pr_err("[%s PASID 0x%x] Request device [0x%02x:0x%02x.%d] fault addr 0x%llx [fault reason 0x%02x] %s\n",
+   pr_err("[%s PASID 0x%x] Request device [%02x:%02x.%d] fault addr 0x%llx [fault reason 0x%02x] %s\n",
   type ? "DMA Read" : "DMA Write", pasid,
   source_id >> 8, PCI_SLOT(source_id & 0xFF),
   PCI_FUNC(source_id & 0xFF), addr,
-- 
2.25.1
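
The source-ID decoding used by the pr_err() calls above can be tried
in userspace with the sketch below. format_source_id() is a
hypothetical helper, and the PCI_SLOT()/PCI_FUNC() macros are
re-declared locally to match the usual kernel definitions (slot in
devfn bits 7:3, function in bits 2:0). The format string is the
prefix-less one from the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Local copies of the usual devfn helpers; not pulled from kernel headers. */
#define PCI_SLOT(devfn)	(((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn)	((devfn) & 0x07)

/* Format a 16-bit source ID as [bus:device.function] without "0x" prefixes. */
static int format_source_id(char *buf, size_t len, uint16_t source_id)
{
	return snprintf(buf, len, "[%02x:%02x.%d]",
			(unsigned int)(source_id >> 8),
			(unsigned int)PCI_SLOT(source_id & 0xFF),
			(int)PCI_FUNC(source_id & 0xFF));
}
```

A source ID of 0x0005 formats as [00:00.5], the same form as the
"before" line quoted in the commit log.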



Re: [PATCH v4] iommu/of: Fix pci_request_acs() before enumerating PCI devices

2021-09-02 Thread Bjorn Helgaas
[+cc Marek, Anders, Robin]

On Fri, Aug 20, 2021 at 02:57:12PM -0500, Bjorn Helgaas wrote:
> On Fri, May 21, 2021 at 03:03:24AM +, Wang Xingang wrote:
> > From: Xingang Wang 
> > 
> > When booting with devicetree, the pci_request_acs() is called after the
> > enumeration and initialization of PCI devices, thus the ACS is not
> > enabled. And ACS should be enabled when IOMMU is detected for the
> > PCI host bridge, so add check for IOMMU before probe of PCI host and call
> > pci_request_acs() to make sure ACS will be enabled when enumerating PCI
> > devices.
> > 
> > Fixes: 6bf6c24720d33 ("iommu/of: Request ACS from the PCI core when
> > configuring IOMMU linkage")
> > Signed-off-by: Xingang Wang 
> 
> Applied to pci/virtualization for v5.15, thanks!

I dropped this for now, until the problems reported by Marek and
Anders get sorted out.

> > ---
> >  drivers/iommu/of_iommu.c | 1 -
> >  drivers/pci/of.c | 8 +++-
> >  2 files changed, 7 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
> > index a9d2df001149..54a14da242cc 100644
> > --- a/drivers/iommu/of_iommu.c
> > +++ b/drivers/iommu/of_iommu.c
> > @@ -205,7 +205,6 @@ const struct iommu_ops *of_iommu_configure(struct device *dev,
> > .np = master_np,
> > };
> >  
> > -   pci_request_acs();
> > err = pci_for_each_dma_alias(to_pci_dev(dev),
> >  of_pci_iommu_init, );
> > } else {
> > diff --git a/drivers/pci/of.c b/drivers/pci/of.c
> > index da5b414d585a..2313c3f848b0 100644
> > --- a/drivers/pci/of.c
> > +++ b/drivers/pci/of.c
> > @@ -581,9 +581,15 @@ static int pci_parse_request_of_pci_ranges(struct device *dev,
> >  
> >  int devm_of_pci_bridge_init(struct device *dev, struct pci_host_bridge *bridge)
> >  {
> > -   if (!dev->of_node)
> > +   struct device_node *node = dev->of_node;
> > +
> > +   if (!node)
> > return 0;
> >  
> > +   /* Detect IOMMU and make sure ACS will be enabled */
> > +   if (of_property_read_bool(node, "iommu-map"))
> > +   pci_request_acs();
> > +
> > bridge->swizzle_irq = pci_common_swizzle;
> > bridge->map_irq = of_irq_parse_and_map_pci;
> >  
> > -- 
> > 2.19.1
> > 


Re: [PATCH v4] iommu/of: Fix pci_request_acs() before enumerating PCI devices

2021-08-20 Thread Bjorn Helgaas
On Fri, May 21, 2021 at 03:03:24AM +, Wang Xingang wrote:
> From: Xingang Wang 
> 
> When booting with devicetree, the pci_request_acs() is called after the
> enumeration and initialization of PCI devices, thus the ACS is not
> enabled. And ACS should be enabled when IOMMU is detected for the
> PCI host bridge, so add check for IOMMU before probe of PCI host and call
> pci_request_acs() to make sure ACS will be enabled when enumerating PCI
> devices.
> 
> Fixes: 6bf6c24720d33 ("iommu/of: Request ACS from the PCI core when
> configuring IOMMU linkage")
> Signed-off-by: Xingang Wang 

Applied to pci/virtualization for v5.15, thanks!

> ---
>  drivers/iommu/of_iommu.c | 1 -
>  drivers/pci/of.c | 8 +++-
>  2 files changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
> index a9d2df001149..54a14da242cc 100644
> --- a/drivers/iommu/of_iommu.c
> +++ b/drivers/iommu/of_iommu.c
> @@ -205,7 +205,6 @@ const struct iommu_ops *of_iommu_configure(struct device *dev,
>   .np = master_np,
>   };
>  
> - pci_request_acs();
>   err = pci_for_each_dma_alias(to_pci_dev(dev),
>of_pci_iommu_init, );
>   } else {
> diff --git a/drivers/pci/of.c b/drivers/pci/of.c
> index da5b414d585a..2313c3f848b0 100644
> --- a/drivers/pci/of.c
> +++ b/drivers/pci/of.c
> @@ -581,9 +581,15 @@ static int pci_parse_request_of_pci_ranges(struct device *dev,
>  
>  int devm_of_pci_bridge_init(struct device *dev, struct pci_host_bridge *bridge)
>  {
> - if (!dev->of_node)
> + struct device_node *node = dev->of_node;
> +
> + if (!node)
>   return 0;
>  
> + /* Detect IOMMU and make sure ACS will be enabled */
> + if (of_property_read_bool(node, "iommu-map"))
> + pci_request_acs();
> +
>   bridge->swizzle_irq = pci_common_swizzle;
>   bridge->map_irq = of_irq_parse_and_map_pci;
>  
> -- 
> 2.19.1
> 


Re: [PATCH v4] iommu/of: Fix pci_request_acs() before enumerating PCI devices

2021-06-04 Thread Bjorn Helgaas
[+cc John, who tested 6bf6c24720d3]

On Fri, May 21, 2021 at 03:03:24AM +, Wang Xingang wrote:
> From: Xingang Wang 
> 
> When booting with devicetree, the pci_request_acs() is called after the
> enumeration and initialization of PCI devices, thus the ACS is not
> enabled. And ACS should be enabled when IOMMU is detected for the
> PCI host bridge, so add check for IOMMU before probe of PCI host and call
> pci_request_acs() to make sure ACS will be enabled when enumerating PCI
> devices.

I'm happy to apply this, but I'm a little puzzled about 6bf6c24720d3
("iommu/of: Request ACS from the PCI core when configuring IOMMU
linkage").  It was tested and fixed a problem, but I don't understand
how.

6bf6c24720d3 added the call to pci_request_acs() in
of_iommu_configure() so it currently looks like this:

  of_iommu_configure(dev, ...)
  {
if (dev_is_pci(dev))
  pci_request_acs();

pci_request_acs() sets pci_acs_enable, which tells us to enable ACS
when enumerating PCI devices in the future.  But we only call
pci_request_acs() if we already *have* a PCI device.

So maybe 6bf6c24720d3 fixed a problem for *some* PCI devices, but not
all?  E.g., did we call of_iommu_configure() for one PCI device before
enumerating the rest?
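
To make the ordering concern concrete, here is a minimal userspace
sketch (not kernel code). It assumes, as described above, that
pci_request_acs() only sets a global flag and that each device samples
that flag once, when it is enumerated. All model_* names are
hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; stands in for the kernel's pci_acs_enable. */
static bool acs_requested;

struct model_dev {
	bool acs_on;	/* latched once, at enumeration time */
};

/* Stands in for pci_request_acs(): it only sets the global flag. */
static void model_request_acs(void)
{
	acs_requested = true;
}

/* Stands in for enumeration: each device samples the flag exactly once. */
static void model_enumerate(struct model_dev *devs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		devs[i].acs_on = acs_requested;
}

/* Request issued after enumeration, as in of_iommu_configure(). */
static bool model_late_request(void)
{
	struct model_dev dev = { false };

	acs_requested = false;
	model_enumerate(&dev, 1);
	model_request_acs();
	return dev.acs_on;
}

/* Request issued before enumeration, as the v4 patch intends. */
static bool model_early_request(void)
{
	struct model_dev dev = { false };

	acs_requested = false;
	model_request_acs();
	model_enumerate(&dev, 1);
	return dev.acs_on;
}
```

In this model the late request leaves the already-enumerated device
without ACS, which is the behavior the question above is probing.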

> Fixes: 6bf6c24720d33 ("iommu/of: Request ACS from the PCI core when
> configuring IOMMU linkage")
> Signed-off-by: Xingang Wang 
> ---
>  drivers/iommu/of_iommu.c | 1 -
>  drivers/pci/of.c | 8 +++-
>  2 files changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
> index a9d2df001149..54a14da242cc 100644
> --- a/drivers/iommu/of_iommu.c
> +++ b/drivers/iommu/of_iommu.c
> @@ -205,7 +205,6 @@ const struct iommu_ops *of_iommu_configure(struct device 
> *dev,
>   .np = master_np,
>   };
>  
> - pci_request_acs();
>   err = pci_for_each_dma_alias(to_pci_dev(dev),
>of_pci_iommu_init, );
>   } else {
> diff --git a/drivers/pci/of.c b/drivers/pci/of.c
> index da5b414d585a..2313c3f848b0 100644
> --- a/drivers/pci/of.c
> +++ b/drivers/pci/of.c
> @@ -581,9 +581,15 @@ static int pci_parse_request_of_pci_ranges(struct device 
> *dev,
>  
>  int devm_of_pci_bridge_init(struct device *dev, struct pci_host_bridge 
> *bridge)
>  {
> - if (!dev->of_node)
> + struct device_node *node = dev->of_node;
> +
> + if (!node)
>   return 0;
>  
> + /* Detect IOMMU and make sure ACS will be enabled */
> + if (of_property_read_bool(node, "iommu-map"))
> + pci_request_acs();
> +
>   bridge->swizzle_irq = pci_common_swizzle;
>   bridge->map_irq = of_irq_parse_and_map_pci;
>  
> -- 
> 2.19.1
> 


Re: [PATCH 1/1] iommu/of: Fix request and enable ACS for of_iommu_configure

2021-05-07 Thread Bjorn Helgaas
On Fri, May 07, 2021 at 12:49:53PM +, Wang Xingang wrote:
> From: Xingang Wang 
> 
> When request ACS for PCI device in of_iommu_configure, the pci device
> has already been scanned and added with 'pci_acs_enable=0'. So the
> pci_request_acs() in current procedure does not work for enabling ACS.
> Besides, the ACS should be enabled only if there's an IOMMU in system.
> So this fix the call of pci_request_acs() and call pci_enable_acs() to
> make sure ACS is enabled for the pci_device.

For consistency:

  s/of_iommu_configure/of_iommu_configure()/
  s/pci device/PCI device/
  s/pci_device/PCI device/

But I'm confused about what problem this fixes.  On x86, I think we
*do* set pci_acs_enable=1 in this path:

  start_kernel
mm_init
  mem_init
pci_iommu_alloc
  p->detect()
detect_intel_iommu   # IOMMU_INIT_POST(detect_intel_iommu)
  pci_request_acs
pci_acs_enable = 1

before enumerating any PCI devices.

But you mentioned pci_host_common_probe(), which I think is mostly
used on non-x86 architectures, and I'm guessing those arches detect
the IOMMU differently.

So my question is, can we figure out how to detect IOMMUs the same way
across all arches?

> Fixes: 6bf6c24720d33 ("iommu/of: Request ACS from the PCI core when
> configuring IOMMU linkage")
> Signed-off-by: Xingang Wang 
> ---
>  drivers/iommu/of_iommu.c | 10 +-
>  drivers/pci/pci.c|  2 +-
>  include/linux/pci.h  |  1 +
>  3 files changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
> index a9d2df001149..dc621861ae72 100644
> --- a/drivers/iommu/of_iommu.c
> +++ b/drivers/iommu/of_iommu.c
> @@ -205,7 +205,6 @@ const struct iommu_ops *of_iommu_configure(struct device 
> *dev,
>   .np = master_np,
>   };
>  
> - pci_request_acs();
>   err = pci_for_each_dma_alias(to_pci_dev(dev),
>of_pci_iommu_init, );
>   } else {
> @@ -222,6 +221,15 @@ const struct iommu_ops *of_iommu_configure(struct device 
> *dev,
>   /* The fwspec pointer changed, read it again */
>   fwspec = dev_iommu_fwspec_get(dev);
>   ops= fwspec->ops;
> +
> + /*
> +  * If we found an IOMMU and the device is pci,
> +  * make sure we enable ACS.

s/pci/PCI/ for consistency.

> +  */
> + if (dev_is_pci(dev)) {
> + pci_request_acs();
> + pci_enable_acs(to_pci_dev(dev));
> + }
>   }
>   /*
>* If we have reason to believe the IOMMU driver missed the initial
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index b717680377a9..4e4f98ee2870 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -926,7 +926,7 @@ static void pci_std_enable_acs(struct pci_dev *dev)
>   * pci_enable_acs - enable ACS if hardware support it
>   * @dev: the PCI device
>   */
> -static void pci_enable_acs(struct pci_dev *dev)
> +void pci_enable_acs(struct pci_dev *dev)
>  {
>   if (!pci_acs_enable)
>   goto disable_acs_redir;
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index c20211e59a57..e6a8bfbc9c98 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -2223,6 +2223,7 @@ static inline struct pci_dev 
> *pcie_find_root_port(struct pci_dev *dev)
>  }
>  
>  void pci_request_acs(void);
> +void pci_enable_acs(struct pci_dev *dev);
>  bool pci_acs_enabled(struct pci_dev *pdev, u16 acs_flags);
>  bool pci_acs_path_enabled(struct pci_dev *start,
> struct pci_dev *end, u16 acs_flags);
> -- 
> 2.19.1
> 


Re: [PATCH] pci: Rename pci_dev->untrusted to pci_dev->external

2021-04-20 Thread Bjorn Helgaas
On Tue, Apr 20, 2021 at 07:10:06AM +0100, Christoph Hellwig wrote:
> On Mon, Apr 19, 2021 at 05:30:49PM -0700, Rajat Jain wrote:
> > The current flag name "untrusted" is not correct as it is populated
> > using the firmware property "external-facing" for the parent ports. In
> > other words, the firmware only says which ports are external facing, so
> > the field really identifies the devices as external (vs internal).
> > 
> > Only field renaming. No functional change intended.
> 
> I don't think this is a good idea.  First the field should have been
> added to the generic struct device as requested multiple times before.

Fair point.  There isn't anything PCI-specific about this idea.  The
ACPI "ExternalFacingPort" and DT "external-facing" are currently only
defined for PCI devices, but could be applied elsewhere.

> Right now this requires horrible hacks in the IOMMU code to get at the
> pci_dev, and also doesn't scale to various other potential users.

Agreed, this is definitely suboptimal.  Do you have other users in
mind?  Maybe they could help inform the plan.

> Second the untrusted is objectively a better name.  Because untrusted
> is how we treat the device, which is what matters.  External is just
> how we come to that conclusion.

The decision to treat "external" as being "untrusted" is a little bit
of policy that the PCI core really doesn't care about, so I think it
does make some sense to let the places that *do* care decide what to
trust based on "external" and possibly other factors, e.g., whether
the device is a BMC or processes untrusted data, etc.

But I guess it makes sense to wait until we have a better motivation
before renaming it, since we don't gain any functionality here.
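
A rough sketch of the split described above, assuming the bus core
records only the firmware-reported fact and the consumer applies the
trust policy. The struct, the handles_untrusted_data field, and the
function name are hypothetical and exist only for illustration.

```c
#include <stdbool.h>

/* Userspace model; not the kernel's struct pci_dev or struct device. */
struct model_dev {
	bool external;			/* fact reported by firmware */
	bool handles_untrusted_data;	/* hypothetical extra policy input */
};

/* Policy lives with the consumer (e.g. IOMMU code), not the bus core. */
static bool model_treat_as_untrusted(const struct model_dev *d)
{
	return d->external || d->handles_untrusted_data;
}
```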

Bjorn


Re: [RFC PATCH v2 02/11] PCI/P2PDMA: Avoid pci_get_slot() which sleeps

2021-03-12 Thread Bjorn Helgaas
On Thu, Mar 11, 2021 at 04:31:32PM -0700, Logan Gunthorpe wrote:
> In order to use upstream_bridge_distance_warn() from a dma_map function,
> it must not sleep. However, pci_get_slot() takes the pci_bus_sem so it
> might sleep.
> 
> In order to avoid this, try to get the host bridge's device from
> bus->self, and if that is not set just get the first element in the
> list. It should be impossible for the host bridge's device to go away
> while references are held on child devices, so the first element
> should not change and this should be safe.
> 
> Signed-off-by: Logan Gunthorpe 
> ---
>  drivers/pci/p2pdma.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> index bd89437faf06..2135fe69bb07 100644
> --- a/drivers/pci/p2pdma.c
> +++ b/drivers/pci/p2pdma.c
> @@ -311,11 +311,15 @@ static const struct pci_p2pdma_whitelist_entry {
>  static bool __host_bridge_whitelist(struct pci_host_bridge *host,
>   bool same_host_bridge)
>  {
> - struct pci_dev *root = pci_get_slot(host->bus, PCI_DEVFN(0, 0));
>   const struct pci_p2pdma_whitelist_entry *entry;
> + struct pci_dev *root = host->bus->self;
>   unsigned short vendor, device;
>  
>   if (!root)
> + root = list_first_entry_or_null(&host->bus->devices,
> + struct pci_dev, bus_list);

Replacing one ugliness (assuming there is a pci_dev for the host
bridge, and that it is at 00.0) with another (still assuming a pci_dev
and that it is host->bus->self or the first entry).  I can't suggest
anything better, but maybe a little comment in the code would help
future readers.

I wish we had a real way to discover this property without the
whitelist, at least for future devices.  Was there ever any interest
in a _DSM or similar interface for this?

I *am* very glad to remove a pci_get_slot() usage.

> +
> + if (!root || root->devfn)
>   return false;
>  
>   vendor = root->vendor;

Don't you need to also remove the "pci_dev_put(root)" a few lines
below?
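
The reference-count concern can be sketched in userspace as follows.
This assumes pci_get_slot() returns the device with an extra reference
that pci_dev_put() later drops, while reading bus->self borrows the
pointer without taking one. The model_* names are hypothetical.

```c
#include <stddef.h>

/* Userspace model of a refcounted device. */
struct model_dev {
	int refcount;
};

/* Like pci_get_slot(): hands back the device with an extra reference. */
static struct model_dev *model_get(struct model_dev *d)
{
	if (d)
		d->refcount++;
	return d;
}

/* Like pci_dev_put(): drops one reference. */
static void model_put(struct model_dev *d)
{
	if (d)
		d->refcount--;
}

/* Old pattern: get, use, put. The count is unchanged afterwards. */
static int model_lookup_with_get(struct model_dev *d)
{
	struct model_dev *root = model_get(d);

	model_put(root);
	return d->refcount;
}

/* New pattern with a stale put left behind: the count underflows by one. */
static int model_lookup_borrowed_stale_put(struct model_dev *d)
{
	struct model_dev *root = d;	/* borrowed, no reference taken */

	model_put(root);
	return d->refcount;
}
```

If the put stays after the get is removed, each call drops a reference
that was never taken.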
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH v2 01/11] PCI/P2PDMA: Pass gfp_mask flags to upstream_bridge_distance_warn()

2021-03-12 Thread Bjorn Helgaas
On Thu, Mar 11, 2021 at 04:31:31PM -0700, Logan Gunthorpe wrote:
> In order to call this function from a dma_map function, it must not sleep.
> The only reason it does sleep so to allocate the seqbuf to print
> which devices are within the ACS path.

s/this function/upstream_bridge_distance_warn()/ ?
s/so to/is to/

Maybe the subject could say something about the purpose, e.g., allow
calling from atomic context or something?  "Pass gfp_mask flags" sort
of restates what we can read from the patch, but without the
motivation of why this is useful.

> Switch the kmalloc call to use a passed in gfp_mask  and don't print that
> message if the buffer fails to be allocated.
> 
> Signed-off-by: Logan Gunthorpe 

Acked-by: Bjorn Helgaas 

> ---
>  drivers/pci/p2pdma.c | 21 +++--
>  1 file changed, 11 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> index 196382630363..bd89437faf06 100644
> --- a/drivers/pci/p2pdma.c
> +++ b/drivers/pci/p2pdma.c
> @@ -267,7 +267,7 @@ static int pci_bridge_has_acs_redir(struct pci_dev *pdev)
>  
>  static void seq_buf_print_bus_devfn(struct seq_buf *buf, struct pci_dev 
> *pdev)
>  {
> - if (!buf)
> + if (!buf || !buf->buffer)
>   return;
>  
>   seq_buf_printf(buf, "%s;", pci_name(pdev));
> @@ -495,25 +495,26 @@ upstream_bridge_distance(struct pci_dev *provider, 
> struct pci_dev *client,
>  
>  static enum pci_p2pdma_map_type
>  upstream_bridge_distance_warn(struct pci_dev *provider, struct pci_dev 
> *client,
> -   int *dist)
> +   int *dist, gfp_t gfp_mask)
>  {
>   struct seq_buf acs_list;
>   bool acs_redirects;
>   int ret;
>  
> - seq_buf_init(&acs_list, kmalloc(PAGE_SIZE, GFP_KERNEL), PAGE_SIZE);
> - if (!acs_list.buffer)
> - return -ENOMEM;
> + seq_buf_init(&acs_list, kmalloc(PAGE_SIZE, gfp_mask), PAGE_SIZE);
>  
>   ret = upstream_bridge_distance(provider, client, dist, &acs_redirects,
>  &acs_list);
>   if (acs_redirects) {
>   pci_warn(client, "ACS redirect is set between the client and 
> provider (%s)\n",
>pci_name(provider));
> - /* Drop final semicolon */
> - acs_list.buffer[acs_list.len-1] = 0;
> - pci_warn(client, "to disable ACS redirect for this path, add 
> the kernel parameter: pci=disable_acs_redir=%s\n",
> -  acs_list.buffer);
> +
> + if (acs_list.buffer) {
> + /* Drop final semicolon */
> + acs_list.buffer[acs_list.len - 1] = 0;
> + pci_warn(client, "to disable ACS redirect for this 
> path, add the kernel parameter: pci=disable_acs_redir=%s\n",
> +  acs_list.buffer);
> + }
>   }
>  
>   if (ret == PCI_P2PDMA_MAP_NOT_SUPPORTED) {
> @@ -566,7 +567,7 @@ int pci_p2pdma_distance_many(struct pci_dev *provider, 
> struct device **clients,
>  
>   if (verbose)
>   ret = upstream_bridge_distance_warn(provider,
> - pci_client, );
> + pci_client, , GFP_KERNEL);
>   else
>   ret = upstream_bridge_distance(provider, pci_client,
>  , NULL, NULL);
> -- 
> 2.20.1
> 
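
The pattern in this patch, treating the message buffer as optional so
that a failed atomic allocation only costs the detailed hint, can be
sketched in userspace like this. model_warn() and its return
convention are hypothetical; the message text is abbreviated from the
patch.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * 'list' may be NULL when the allocation failed. Writes the messages
 * into 'out' and returns how many were emitted (0, 1 or 2).
 */
static int model_warn(char *out, size_t outsz, char *list, size_t len)
{
	int n = snprintf(out, outsz, "ACS redirect is set\n");

	if (n < 0 || (size_t)n >= outsz)
		return 0;

	/* No buffer: keep the generic warning, skip the detailed hint. */
	if (!list || len == 0)
		return 1;

	/* Drop the final semicolon, as the patch does. */
	list[len - 1] = '\0';
	snprintf(out + n, outsz - (size_t)n,
		 "pci=disable_acs_redir=%s\n", list);
	return 2;
}
```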


Re: [PATCH v2 2/2] PCI: vmd: Disable MSI/X remapping when possible

2021-02-05 Thread Bjorn Helgaas
On Thu, Feb 04, 2021 at 12:09:06PM -0700, Jon Derrick wrote:
> VMD will retransmit child device MSI/X using its own MSI/X table and
> requester-id. This limits the number of MSI/X available to the whole
> child device domain to the number of VMD MSI/X interrupts.
> 
> Some VMD devices have a mode where this remapping can be disabled,
> allowing child device interrupts to bypass processing with the VMD MSI/X
> domain interrupt handler and going straight to the child device interrupt
> handler, allowing for better performance and scaling. The requester-id
> still gets changed to the VMD endpoint's requester-id, and the interrupt
> remapping handlers have been updated to properly set IRTE for child
> device interrupts to the VMD endpoint's context.
> 
> Some VMD platforms have existing production BIOS which rely on MSI/X
> remapping and won't explicitly program the MSI/X remapping bit. This
> re-enables MSI/X remapping on unload.

Trivial comments below.  Would you mind using "MSI-X" instead of
"MSI/X" so it matches the usage in the PCIe specs?  Several mentions
above (including subject) and below.

> Signed-off-by: Jon Derrick 
> ---
>  drivers/pci/controller/vmd.c | 60 
>  1 file changed, 48 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
> index 5e80f28f0119..a319ce49645b 100644
> --- a/drivers/pci/controller/vmd.c
> +++ b/drivers/pci/controller/vmd.c
> @@ -59,6 +59,13 @@ enum vmd_features {
>* be used for MSI remapping
>*/
>   VMD_FEAT_OFFSET_FIRST_VECTOR= (1 << 3),
> +
> + /*
> +  * Device can bypass remapping MSI/X transactions into its MSI/X table,
> +  * avoding the requirement of a VMD MSI domain for child device

s/avoding/avoiding/

> +  * interrupt handling

Maybe a period at the end of the sentence.

> +  */
> + VMD_FEAT_BYPASS_MSI_REMAP   = (1 << 4),
>  };
>  
>  /*
> @@ -306,6 +313,15 @@ static struct msi_domain_info vmd_msi_domain_info = {
>   .chip   = _msi_controller,
>  };
>  
> +static void vmd_enable_msi_remapping(struct vmd_dev *vmd, bool enable)
> +{
> + u16 reg;
> +
> + pci_read_config_word(vmd->dev, PCI_REG_VMCONFIG, &reg);
> + reg = enable ? (reg & ~0x2) : (reg | 0x2);

Would be nice to have a #define for 0x2.

> + pci_write_config_word(vmd->dev, PCI_REG_VMCONFIG, reg);
> +}
> +
>  static int vmd_create_irq_domain(struct vmd_dev *vmd)
>  {
>   struct fwnode_handle *fn;
> @@ -325,6 +341,13 @@ static int vmd_create_irq_domain(struct vmd_dev *vmd)
>  
>  static void vmd_remove_irq_domain(struct vmd_dev *vmd)
>  {
> + /*
> +  * Some production BIOS won't enable remapping between soft reboots.
> +  * Ensure remapping is restored before unloading the driver.
> +  */
> + if (!vmd->msix_count)
> + vmd_enable_msi_remapping(vmd, true);
> +
>   if (vmd->irq_domain) {
>   struct fwnode_handle *fn = vmd->irq_domain->fwnode;
>  
> @@ -679,15 +702,31 @@ static int vmd_enable_domain(struct vmd_dev *vmd, 
> unsigned long features)
>  
>   sd->node = pcibus_to_node(vmd->dev->bus);
>  
> - ret = vmd_create_irq_domain(vmd);
> - if (ret)
> - return ret;
> -
>   /*
> -  * Override the irq domain bus token so the domain can be distinguished
> -  * from a regular PCI/MSI domain.
> +  * Currently MSI remapping must be enabled in guest passthrough mode
> +  * due to some missing interrupt remapping plumbing. This is probably
> +  * acceptable because the guest is usually CPU-limited and MSI
> +  * remapping doesn't become a performance bottleneck.
>*/
> - irq_domain_update_bus_token(vmd->irq_domain, DOMAIN_BUS_VMD_MSI);
> + if (!(features & VMD_FEAT_BYPASS_MSI_REMAP) || offset[0] || offset[1]) {
> + ret = vmd_alloc_irqs(vmd);
> + if (ret)
> + return ret;
> +
> + vmd_enable_msi_remapping(vmd, true);
> +
> + ret = vmd_create_irq_domain(vmd);
> + if (ret)
> + return ret;
> +
> + /*
> +  * Override the irq domain bus token so the domain can be
> +  * distinguished from a regular PCI/MSI domain.
> +  */
> + irq_domain_update_bus_token(vmd->irq_domain, 
> DOMAIN_BUS_VMD_MSI);
> + } else {
> + vmd_enable_msi_remapping(vmd, false);
> + }
>  
>   pci_add_resource(, >resources[0]);
>   pci_add_resource_offset(, >resources[1], offset[0]);
> @@ -753,10 +792,6 @@ static int vmd_probe(struct pci_dev *dev, const struct 
> pci_device_id *id)
>   if (features & VMD_FEAT_OFFSET_FIRST_VECTOR)
>   vmd->first_vec = 1;
>  
> - err = vmd_alloc_irqs(vmd);
> - if (err)
> - return err;
> -
>   spin_lock_init(&vmd->cfg_lock);
>   pci_set_drvdata(dev, vmd);
>   err = vmd_enable_domain(vmd, features);
> @@ 
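
On the "#define for 0x2" comment above, a minimal userspace sketch of
the same register update with the bit named. The macro name is
illustrative only. The polarity follows the quoted
vmd_enable_msi_remapping(): setting the bit bypasses remapping,
clearing it enables remapping.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative name; the review only asks that 0x2 get a #define. */
#define MODEL_VMCONFIG_MSI_REMAP_BYPASS 0x2

/* Returns the new VMCONFIG value for the requested remapping state. */
static uint16_t model_vmconfig_update(uint16_t reg, bool enable_remap)
{
	if (enable_remap)
		return reg & (uint16_t)~MODEL_VMCONFIG_MSI_REMAP_BYPASS;

	return reg | MODEL_VMCONFIG_MSI_REMAP_BYPASS;
}
```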

[PATCH] iommu/vt-d: Fix 'physical' typos

2021-01-26 Thread Bjorn Helgaas
From: Bjorn Helgaas 

Fix misspellings of "physical".

Signed-off-by: Bjorn Helgaas 
---
 include/linux/intel-iommu.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 09c6a0bf3892..3ae86385b222 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -662,7 +662,7 @@ static inline struct dmar_domain *to_dmar_domain(struct 
iommu_domain *dom)
  * 7: super page
  * 8-10: available
  * 11: snoop behavior
- * 12-63: Host physcial address
+ * 12-63: Host physical address
  */
 struct dma_pte {
u64 val;
-- 
2.25.1



Re: [PATCH 0/2] Introduce PCI_FIXUP_IOMMU

2020-12-17 Thread Bjorn Helgaas
On Wed, Dec 16, 2020 at 07:24:30PM +0800, Zhou Wang wrote:
> On 2020/6/23 23:04, Bjorn Helgaas wrote:
> > On Fri, Jun 19, 2020 at 10:26:54AM +0800, Zhangfei Gao wrote:
> >> Have studied _DSM method, two issues we met comparing using quirk.
> >>
> >> 1. Need change definition of either pci_host_bridge or pci_dev, like adding
> >> member can_stall,
> >> while pci system does not know stall now.
> >>
> >> a, pci devices do not have uuid: uuid need be described in dsdt, while pci
> >> devices are not defined in dsdt.
> >> so we have to use host bridge.
> > 
> > PCI devices *can* be described in the DSDT.  IIUC these particular
> > devices are hardwired (not plug-in cards), so platform firmware can
> > know about them and could describe them in the DSDT.
> > 
> > >> b,  Parsing dsdt is in pci subsystem.
> >> Like drivers/acpi/pci_root.c:
> >>obj = acpi_evaluate_dsm(ACPI_HANDLE(bus->bridge), 
> >> _acpi_dsm_guid,
> >> 1,
> >> IGNORE_PCI_BOOT_CONFIG_DSM, NULL);
> >>
> >> After parsing DSM in pci, we need record this info.
> >> Currently, can_stall info is recorded in iommu_fwspec,
> >> which is allocated in iommu_fwspec_init and called by iort_iommu_configure
> >> for uefi.
> > 
> > You can look for a _DSM wherever it is convenient for you.  It could
> > be in an AMBA shim layer.
> > 
> >> 2. Guest kernel also need support sva.
> >> Using quirk, the guest can boot with sva enabled, since quirk is
> >> self-contained by kernel.
> >> If using  _DSM, a specific uefi or dtb has to be provided,
> > >> currently we can use QEMU_EFI.fd from apt install qemu-efi
> > 
> > I don't quite understand what this means, but as I mentioned before, a
> > quirk for a *limited* number of devices is OK, as long as there is a
> > plan that removes the need for a quirk for future devices.
> > 
> > E.g., if the next platform version ships with a DTB or firmware with a
> > _DSM or other mechanism that enables the kernel to discover this
> > information without a kernel change, it's fine to use a quirk to cover
> > the early platform.
> > 
> > The principles are:
> > 
> >   - I don't want to have to update a quirk for every new Device ID
> > that needs this.
> 
> Hi Bjorn and Zhangfei,
> 
> We plan to use ATS/PRI to support SVA in future PCI devices. However, for
> current devices, we need to add a limited number of quirks to let them
> work. The device IDs of the devices that currently need quirks are ZIP
> engine (0xa250, 0xa251), SEC engine (0xa255, 0xa256), HPRE engine (0xa258,
> 0xa259); the revision IDs are 0x21 and 0x30.
> 
> Let's continue to upstream these quirks!

Please post the patches you propose.  I don't think the previous ones
are in my queue.  Please include the lore URL for the previous
posting(s) in the cover letter so we can connect the discussion.

> >   - I don't really want to have to manage non-PCI information in the
> > struct pci_dev.  If this is AMBA- or IOMMU-related, it should be
> > stored in a structure related to AMBA or the IOMMU.
> > .
> > 
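
A bounded quirk of the kind discussed above could be sketched as a
fixed table lookup. This is a userspace model, not a proposed patch.
The device and revision IDs are the ones quoted in this thread.
Treating either revision as valid for every listed device is an
assumption, and the function name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Device IDs quoted in the thread; the list is closed, not open-ended. */
static const uint16_t model_stall_devices[] = {
	0xa250, 0xa251,	/* ZIP engine */
	0xa255, 0xa256,	/* SEC engine */
	0xa258, 0xa259,	/* HPRE engine */
};

/* Assumption: revisions 0x21 and 0x30 apply to every listed device. */
static bool model_needs_stall_quirk(uint16_t device, uint8_t revision)
{
	size_t n = sizeof(model_stall_devices) / sizeof(model_stall_devices[0]);

	if (revision != 0x21 && revision != 0x30)
		return false;

	for (size_t i = 0; i < n; i++)
		if (model_stall_devices[i] == device)
			return true;

	return false;
}
```

The point of keeping the table fixed is that future devices are
expected to be discovered through ATS/PRI or firmware rather than by
growing this list.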


Re: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as broken

2020-12-10 Thread Bjorn Helgaas
On Thu, Dec 10, 2020 at 03:36:36PM +, Deucher, Alexander wrote:
> [AMD Public Use]
> 
> > -Original Message-
> > From: Merger, Edgar [AUTOSOL/MAS/AUGS] 
> > Sent: Thursday, December 10, 2020 5:48 AM
> > To: Deucher, Alexander ; Huang, Ray 
> > ; Kuehling, Felix 
> > Cc: Will Deacon ; linux-ker...@vger.kernel.org; 
> > linux- p...@vger.kernel.org; iommu@lists.linux-foundation.org; Bjorn 
> > Helgaas ; Joerg Roedel ; Zhu, 
> > Changfeng 
> > Subject: RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as 
> > broken
> > 
> > Alright. Done that.
> > This should be it finally I believe.
> > Which will be the initial kernel-version that incorporates that?
> 
> Looks good to me.  Bjorn, can you pick this up for PCI?

Didn't apply cleanly, but I applied it by hand to pci/misc for v5.11.
If all goes well it should appear in v5.11-rc1.

https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/commit/?h=pci/misc=23bb0d9a9fe70a8ff23f53af822f2c6e6f261818

> > -Original Message-
> > From: Deucher, Alexander 
> > Sent: Mittwoch, 9. Dezember 2020 15:24
> > To: Merger, Edgar [AUTOSOL/MAS/AUGS] ; 
> > Huang, Ray ; Kuehling, Felix 
> > 
> > Cc: Will Deacon ; linux-ker...@vger.kernel.org; 
> > linux- p...@vger.kernel.org; iommu@lists.linux-foundation.org; Bjorn 
> > Helgaas ; Joerg Roedel ; Zhu, 
> > Changfeng 
> > Subject: RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as 
> > broken
> > 
> > [AMD Public Use]
> > 
> > > -Original Message-
> > > From: Merger, Edgar [AUTOSOL/MAS/AUGS]
> > 
> > > Sent: Wednesday, December 9, 2020 2:59 AM
> > > To: Deucher, Alexander ; Huang, Ray 
> > > ; Kuehling, Felix 
> > > Cc: Will Deacon ; linux-ker...@vger.kernel.org;
> > > linux- p...@vger.kernel.org; iommu@lists.linux-foundation.org; Bjorn 
> > > Helgaas ; Joerg Roedel ; Zhu, 
> > > Changfeng 
> > > Subject: RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as 
> > > broken
> > >
> > > Alex,
> > >
> > > I had to revise the patch. Please see attachment. It is actually two 
> > > more SSIDs affected to that.
> > 
> > Other than some minor whitespace issues, the patch looks fine to me.
> > Please align the subsystem_device lines and put the closing 
> > parenthesis on the same line as the last check.
> > 
> > Thanks!
> > 
> > Alex
> > 
> > >
> > > Best regards,
> > > Edgar
> > >
> > > -Original Message-
> > > From: Merger, Edgar [AUTOSOL/MAS/AUGS]
> > > Sent: Dienstag, 8. Dezember 2020 09:23
> > > To: 'Deucher, Alexander' ; 'Huang, Ray'
> > > ; 'Kuehling, Felix' 
> > > Cc: 'Will Deacon' ; 'linux-ker...@vger.kernel.org'
> > > ; 'linux-...@vger.kernel.org' ; 'iommu@lists.linux-foundation.org'
> > > ; 'Bjorn Helgaas'
> > > ; 'Joerg Roedel' ; 'Zhu, 
> > > Changfeng' 
> > > Subject: RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as 
> > > broken
> > >
> > > Applied the patch as in attachment. Verified that ATS for GPU-Device 
> > > had been disabled. See attachment "dmesg_ATS.log".
> > >
> > > Was running that build over night successfully.
> > >
> > > -Original Message-
> > > From: Merger, Edgar [AUTOSOL/MAS/AUGS]
> > > Sent: Montag, 7. Dezember 2020 05:53
> > > To: Deucher, Alexander ; Huang, Ray 
> > > ; Kuehling, Felix 
> > > Cc: Will Deacon ; linux-ker...@vger.kernel.org;
> > > linux- p...@vger.kernel.org; iommu@lists.linux-foundation.org; Bjorn 
> > > Helgaas ; Joerg Roedel ; Zhu, 
> > > Changfeng 
> > > Subject: RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as 
> > > broken
> > >
> > > Hi Alex,
> > >
> > > I believe in the patch file, this
> > > + (pdev->subsystem_device == 0x0c19 ||
> > > +      pdev->subsystem_device == 0x0c10))
> > >
> > > Has to be changed to:
> > > + (pdev->subsystem_device == 0xce19 ||
> > > +  pdev->subsystem_device == 0xcc10))
> > >
> > > Because our SSIDs are "ea50:ce19" and "ea50:cc10" respectively and 
> > > another one would "ea50:cc08".
> > >
> > > I will apply that patch and feedback the results soon plus the patch 
> > > file that I actually had applied.
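
For illustration, the subsystem-ID correction quoted above amounts to
a match like the following userspace sketch. The IDs are the ones
given in the thread (ea50:ce19, ea50:cc10, ea50:cc08). The function
name is hypothetical, and this is not the applied patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true for the subsystem IDs reported in the thread. */
static bool model_raven_ats_broken(uint16_t subsys_vendor,
				   uint16_t subsys_device)
{
	if (subsys_vendor != 0xea50)
		return false;

	return subsys_device == 0xce19 ||
	       subsys_device == 0xcc10 ||
	       subsys_device == 0xcc08;
}
```

With the originally posted values (0x0c19, 0x0c10) the check would not
match the reporter's hardware, which is why the correction matters.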

Re: [RFC PATCH 03/15] PCI/P2PDMA: Introduce pci_p2pdma_should_map_bus() and pci_p2pdma_bus_offset()

2020-11-10 Thread Bjorn Helgaas
On Fri, Nov 06, 2020 at 10:00:24AM -0700, Logan Gunthorpe wrote:
> Introduce pci_p2pdma_should_map_bus() which is meant to be called by
> dma map functions to determine how to map a given p2pdma page.

s/dma/DMA/ for consistency (also below in function comment)

> pci_p2pdma_bus_offset() is also added to allow callers to get the bus
> offset if they need to map the bus address.
> 
> Signed-off-by: Logan Gunthorpe 
> ---
>  drivers/pci/p2pdma.c   | 46 ++
>  include/linux/pci-p2pdma.h | 11 +
>  2 files changed, 57 insertions(+)
> 
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> index ea8472278b11..9961e779f430 100644
> --- a/drivers/pci/p2pdma.c
> +++ b/drivers/pci/p2pdma.c
> @@ -930,6 +930,52 @@ void pci_p2pdma_unmap_sg_attrs(struct device *dev, 
> struct scatterlist *sg,
>  }
>  EXPORT_SYMBOL_GPL(pci_p2pdma_unmap_sg_attrs);
>  
> +/**
> + * pci_p2pdma_bus_offset - returns the bus offset for a given page
> + * @page: page to get the offset for
> + *
> + * Must be passed a pci p2pdma page.

s/pci/PCI/

> + */
> +u64 pci_p2pdma_bus_offset(struct page *page)
> +{
> + struct pci_p2pdma_pagemap *p2p_pgmap = to_p2p_pgmap(page->pgmap);
> +
> + WARN_ON(!is_pci_p2pdma_page(page));
> +
> + return p2p_pgmap->bus_offset;
> +}
> +EXPORT_SYMBOL_GPL(pci_p2pdma_bus_offset);
> +
> +/**
> + * pci_p2pdma_should_map_bus - determine if a dma mapping should use the
> + *   bus address
> + * @dev: device doing the DMA request
> + * @pgmap: dev_pagemap structure for the mapping
> + *
> + * Returns 1 if the page should be mapped with a bus address, 0 otherwise
> + * and -1 the device should not be mapping P2PDMA pages.

I think this is missing a word.

I'm not really sure how to interpret the "should" in
pci_p2pdma_should_map_bus().  If this returns -1, does that mean the
pages *cannot* be mapped?  They *could* be mapped, but you really
*shouldn't*?  Something else?

1 means page should be mapped with bus address.  0 means ... what,
exactly?  It should be mapped with some different address?

Sorry these are naive questions because I don't know how all this
works.

> + */
> +int pci_p2pdma_should_map_bus(struct device *dev, struct dev_pagemap *pgmap)
> +{
> + struct pci_p2pdma_pagemap *p2p_pgmap = to_p2p_pgmap(pgmap);
> + struct pci_dev *client;
> +
> + if (!dev_is_pci(dev))
> + return -1;
> +
> + client = to_pci_dev(dev);
> +
> + switch (pci_p2pdma_map_type(p2p_pgmap->provider, client)) {
> + case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
> + return 0;
> + case PCI_P2PDMA_MAP_BUS_ADDR:
> + return 1;
> + default:
> + return -1;
> + }
> +}
> +EXPORT_SYMBOL_GPL(pci_p2pdma_should_map_bus);
> +
>  /**
>   * pci_p2pdma_enable_store - parse a configfs/sysfs attribute store
>   *   to enable p2pdma
> diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h
> index 8318a97c9c61..fc5de47eeac4 100644
> --- a/include/linux/pci-p2pdma.h
> +++ b/include/linux/pci-p2pdma.h
> @@ -34,6 +34,8 @@ int pci_p2pdma_map_sg_attrs(struct device *dev, struct 
> scatterlist *sg,
>   int nents, enum dma_data_direction dir, unsigned long attrs);
>  void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
>   int nents, enum dma_data_direction dir, unsigned long attrs);
> +u64 pci_p2pdma_bus_offset(struct page *page);
> +int pci_p2pdma_should_map_bus(struct device *dev, struct dev_pagemap *pgmap);
>  int pci_p2pdma_enable_store(const char *page, struct pci_dev **p2p_dev,
>   bool *use_p2pdma);
>  ssize_t pci_p2pdma_enable_show(char *page, struct pci_dev *p2p_dev,
> @@ -83,6 +85,15 @@ static inline void pci_p2pmem_free_sgl(struct pci_dev 
> *pdev,
>  static inline void pci_p2pmem_publish(struct pci_dev *pdev, bool publish)
>  {
>  }
> +static inline u64 pci_p2pdma_bus_offset(struct page *page)
> +{
> + return -1;
> +}
> +static inline int pci_p2pdma_should_map_bus(struct device *dev,
> + struct dev_pagemap *pgmap)
> +{
> + return -1;
> +}
>  static inline int pci_p2pdma_map_sg_attrs(struct device *dev,
>   struct scatterlist *sg, int nents, enum dma_data_direction dir,
>   unsigned long attrs)
> -- 
> 2.20.1
> 
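
One way to read the 1/0/-1 return values questioned above is as three
named outcomes. The sketch below is a userspace model with
hypothetical names. It assumes that 0 means "map normally through the
host bridge", which matches the quoted switch statement, and that -1
means the mapping must be rejected.

```c
/* Mirrors the three cases handled by the quoted switch statement. */
enum model_map_type {
	MODEL_MAP_NOT_SUPPORTED,
	MODEL_MAP_BUS_ADDR,
	MODEL_MAP_THRU_HOST_BRIDGE,
};

enum model_map_action {
	MODEL_ACTION_REJECT,		/* the patch returns -1 */
	MODEL_ACTION_USE_BUS_ADDR,	/* the patch returns 1 */
	MODEL_ACTION_MAP_NORMALLY,	/* the patch returns 0 */
};

static enum model_map_action model_decide(int is_pci,
					  enum model_map_type type)
{
	/* Non-PCI clients cannot take part in P2PDMA at all. */
	if (!is_pci)
		return MODEL_ACTION_REJECT;

	switch (type) {
	case MODEL_MAP_THRU_HOST_BRIDGE:
		return MODEL_ACTION_MAP_NORMALLY;
	case MODEL_MAP_BUS_ADDR:
		return MODEL_ACTION_USE_BUS_ADDR;
	default:
		return MODEL_ACTION_REJECT;
	}
}
```

Named outcomes avoid the ambiguity of "should" in the function name,
since each caller has to handle all three cases explicitly.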


Re: [PATCH 4/5] PCI/p2p: cleanup up __pci_p2pdma_map_sg a bit

2020-11-04 Thread Bjorn Helgaas
s|PCI/p2p: cleanup up __pci_p2pdma_map_sg|PCI/P2PDMA: Cleanup up __pci_p2pdma_map_sg|
to match history.

On Wed, Nov 04, 2020 at 10:50:51AM +0100, Christoph Hellwig wrote:
> Remove the pointless paddr variable that was only used once.
> 
> Signed-off-by: Christoph Hellwig 

Acked-by: Bjorn Helgaas 

> ---
>  drivers/pci/p2pdma.c | 5 +
>  1 file changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> index b07018af53876c..afd792cc272832 100644
> --- a/drivers/pci/p2pdma.c
> +++ b/drivers/pci/p2pdma.c
> @@ -825,13 +825,10 @@ static int __pci_p2pdma_map_sg(struct 
> pci_p2pdma_pagemap *p2p_pgmap,
>   struct device *dev, struct scatterlist *sg, int nents)
>  {
>   struct scatterlist *s;
> - phys_addr_t paddr;
>   int i;
>  
>   for_each_sg(sg, s, nents, i) {
> - paddr = sg_phys(s);
> -
> - s->dma_address = paddr - p2p_pgmap->bus_offset;
> + s->dma_address = sg_phys(s) - p2p_pgmap->bus_offset;
>   sg_dma_len(s) = s->length;
>   }
>  
> -- 
> 2.28.0
> 
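
The loop being simplified here computes each segment's bus address as
its physical address minus the provider's bus offset. A minimal
userspace sketch of that arithmetic, with hypothetical struct and
function names standing in for the scatterlist API:

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for one scatterlist segment. */
struct model_seg {
	uint64_t phys;		/* CPU physical address of the segment */
	uint32_t length;	/* segment length in bytes */
	uint64_t dma_addr;	/* filled in: address as seen on the PCI bus */
	uint32_t dma_len;	/* filled in: DMA length */
};

/* Same arithmetic as the quoted loop: dma_address = phys - bus_offset. */
static void model_map_sg(struct model_seg *sg, size_t nents,
			 uint64_t bus_offset)
{
	for (size_t i = 0; i < nents; i++) {
		sg[i].dma_addr = sg[i].phys - bus_offset;
		sg[i].dma_len = sg[i].length;
	}
}
```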


Re: [PATCH 3/5] PCI/p2p: remove the DMA_VIRT_OPS hacks

2020-11-04 Thread Bjorn Helgaas
s|PCI/p2p: remove|PCI/P2PDMA: Remove|
to match history.

On Wed, Nov 04, 2020 at 10:50:50AM +0100, Christoph Hellwig wrote:
> Now that all users of dma_virt_ops are gone we can remove the workaround
> for it in the PCIe peer to peer code.

s/PCIe/PCI/
We went to some trouble to make P2PDMA work on conventional PCI as
well as PCIe.

> Signed-off-by: Christoph Hellwig 

Acked-by: Bjorn Helgaas 

> ---
>  drivers/pci/p2pdma.c | 20 
>  1 file changed, 20 deletions(-)
> 
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> index de1c331dbed43f..b07018af53876c 100644
> --- a/drivers/pci/p2pdma.c
> +++ b/drivers/pci/p2pdma.c
> @@ -556,15 +556,6 @@ int pci_p2pdma_distance_many(struct pci_dev *provider, 
> struct device **clients,
>   return -1;
>  
>   for (i = 0; i < num_clients; i++) {
> -#ifdef CONFIG_DMA_VIRT_OPS
> - if (clients[i]->dma_ops == &dma_virt_ops) {
> - if (verbose)
> - dev_warn(clients[i],
> -  "cannot be used for peer-to-peer DMA 
> because the driver makes use of dma_virt_ops\n");
> - return -1;
> - }
> -#endif
> -
>   pci_client = find_parent_pci_dev(clients[i]);
>   if (!pci_client) {
>   if (verbose)
> @@ -837,17 +828,6 @@ static int __pci_p2pdma_map_sg(struct pci_p2pdma_pagemap 
> *p2p_pgmap,
>   phys_addr_t paddr;
>   int i;
>  
> - /*
> -  * p2pdma mappings are not compatible with devices that use
> -  * dma_virt_ops. If the upper layers do the right thing
> -  * this should never happen because it will be prevented
> -  * by the check in pci_p2pdma_distance_many()
> -  */
> -#ifdef CONFIG_DMA_VIRT_OPS
> - if (WARN_ON_ONCE(dev->dma_ops == &dma_virt_ops))
> - return 0;
> -#endif
> -
>   for_each_sg(sg, s, nents, i) {
>   paddr = sg_phys(s);
>  
> -- 
> 2.28.0
> 


[Bug 209321] DMAR: [DMA Read] Request device [03:00.0] PASID ffffffff fault addr fffd3000 [fault reason 06] PTE Read access is not set

2020-10-07 Thread Bjorn Helgaas
https://bugzilla.kernel.org/show_bug.cgi?id=209321

Not much detail in the bugzilla yet, but apparently this started in
v5.8.0-rc1:

  DMAR: [DMA Read] Request device [03:00.0] PASID ffffffff fault addr fffd3000 [fault reason 06] PTE Read access is not set

Currently assigned to Driver/PCI, but not clear to me yet whether PCI
is the culprit or the victim.


Re: [PATCH v3 5/6] iommu/virtio: Support topology description in config space

2020-09-25 Thread Bjorn Helgaas
On Fri, Sep 25, 2020 at 10:12:43AM +0200, Jean-Philippe Brucker wrote:
> On Thu, Sep 24, 2020 at 10:22:03AM -0500, Bjorn Helgaas wrote:
> > On Fri, Aug 21, 2020 at 03:15:39PM +0200, Jean-Philippe Brucker wrote:

> > > + /* Perform the init sequence before we can read the config */
> > > + ret = viommu_pci_reset(common_cfg);
> > 
> > I guess this is some special device-specific reset, not any kind of
> > standard PCI reset?
> 
> Yes it's the virtio reset - writing 0 to the status register in the BAR.

I wonder if this should be named something like viommu_virtio_reset(),
so there's no confusion with PCI resets and all the timing
restrictions, config space restoration, etc. associated with them.

Bjorn


Re: [PATCH v3 5/6] iommu/virtio: Support topology description in config space

2020-09-24 Thread Bjorn Helgaas
On Fri, Aug 21, 2020 at 03:15:39PM +0200, Jean-Philippe Brucker wrote:
> Platforms without device-tree nor ACPI can provide a topology
> description embedded into the virtio config space. Parse it.
> 
> Use PCI FIXUP to probe the config space early, because we need to
> discover the topology before any DMA configuration takes place, and the
> virtio driver may be loaded much later. Since we discover the topology
> description when probing the PCI hierarchy, the virtual IOMMU cannot
> manage other platform devices discovered earlier.

> +struct viommu_cap_config {
> + u8 bar;
> + u32 length; /* structure size */
> + u32 offset; /* structure offset within the bar */

s/the bar/the BAR/ (to match comment below).

> +static void viommu_pci_parse_topology(struct pci_dev *dev)
> +{
> + int ret;
> + u32 features;
> + void __iomem *regs, *common_regs;
> + struct viommu_cap_config cap = {0};
> + struct virtio_pci_common_cfg __iomem *common_cfg;
> +
> + /*
> +  * The virtio infrastructure might not be loaded at this point. We need
> +  * to access the BARs ourselves.
> +  */
> + ret = viommu_pci_find_capability(dev, VIRTIO_PCI_CAP_COMMON_CFG, &cap);
> + if (!ret) {
> + pci_warn(dev, "common capability not found\n");

Is the lack of this capability really an error, i.e., is this
pci_warn() or pci_info()?  The "device doesn't have topology
description" below is only pci_dbg(), which suggests that we can live
without this.

Maybe a hint about what "common capability" means?

> + return;
> + }
> +
> + if (pci_enable_device_mem(dev))
> + return;
> +
> + common_regs = pci_iomap(dev, cap.bar, 0);
> + if (!common_regs)
> + return;
> +
> + common_cfg = common_regs + cap.offset;
> +
> + /* Perform the init sequence before we can read the config */
> + ret = viommu_pci_reset(common_cfg);

I guess this is some special device-specific reset, not any kind of
standard PCI reset?

> + if (ret < 0) {
> + pci_warn(dev, "unable to reset device\n");
> + goto out_unmap_common;
> + }
> +
> + iowrite8(VIRTIO_CONFIG_S_ACKNOWLEDGE, &common_cfg->device_status);
> + iowrite8(VIRTIO_CONFIG_S_ACKNOWLEDGE | VIRTIO_CONFIG_S_DRIVER,
> +  &common_cfg->device_status);
> +
> + /* Find out if the device supports topology description */
> + iowrite32(0, &common_cfg->device_feature_select);
> + features = ioread32(&common_cfg->device_feature);
> +
> + if (!(features & BIT(VIRTIO_IOMMU_F_TOPOLOGY))) {
> + pci_dbg(dev, "device doesn't have topology description");
> + goto out_reset;
> + }
> +
> + ret = viommu_pci_find_capability(dev, VIRTIO_PCI_CAP_DEVICE_CFG, &cap);
> + if (!ret) {
> + pci_warn(dev, "device config capability not found\n");
> + goto out_reset;
> + }
> +
> + regs = pci_iomap(dev, cap.bar, 0);
> + if (!regs)
> + goto out_reset;
> +
> + pci_info(dev, "parsing virtio-iommu topology\n");
> + ret = viommu_parse_topology(&dev->dev, regs + cap.offset,
> + pci_resource_len(dev, 0) - cap.offset);
> + if (ret)
> + pci_warn(dev, "failed to parse topology: %d\n", ret);
> +
> + pci_iounmap(dev, regs);
> +out_reset:
> + ret = viommu_pci_reset(common_cfg);
> + if (ret)
> + pci_warn(dev, "unable to reset device\n");
> +out_unmap_common:
> + pci_iounmap(dev, common_regs);
> +}
> +
> +/*
> + * Catch a PCI virtio-iommu implementation early to get the topology 
> description
> + * before we start probing other endpoints.
> + */
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_REDHAT_QUMRANET, 0x1040 + 
> VIRTIO_ID_IOMMU,
> + viommu_pci_parse_topology);
> -- 
> 2.28.0
> 


[bugzilla-dae...@bugzilla.kernel.org: [Bug 209149] New: "iommu/vt-d: Enable PCI ACS for platform opt in hint" makes NVMe config space not accessible after S3]

2020-09-23 Thread Bjorn Helgaas
[+cc IOMMU and NVMe folks]

Sorry, I forgot to forward this to linux-pci when it was first
reported.

Apparently this happens with v5.9-rc3, and may be related to
50310600ebda ("iommu/vt-d: Enable PCI ACS for platform opt in hint"),
which appeared in v5.8-rc3.

There are several dmesg logs and proposed patches in the bugzilla, but
no analysis yet of what the problem is.  From the first dmesg
attachment (https://bugzilla.kernel.org/attachment.cgi?id=292327):

  [   50.434945] PM: suspend entry (deep)
  [   50.802086] nvme :01:00.0: saving config space at offset 0x0 (reading 
0x11e0f)
  [   50.842775] ACPI: Preparing to enter system sleep state S3
  [   50.858922] ACPI: Waking up from system sleep state S3
  [   50.883622] nvme :01:00.0: can't change power state from D3hot to D0 
(config space inaccessible)
  [   50.947352] nvme :01:00.0: restoring config space at offset 0x0 (was 
0x, writing 0x11e0f)
  [   50.947816] pcieport :00:1b.0: DPC: containment event, status:0x1f01 
source:0x
  [   50.947817] pcieport :00:1b.0: DPC: unmasked uncorrectable error 
detected
  [   50.947829] pcieport :00:1b.0: PCIe Bus Error: severity=Uncorrected 
(Non-Fatal), type=Transaction Layer, (Receiver ID)
  [   50.947830] pcieport :00:1b.0:   device [8086:06ac] error 
status/mask=0020/0001
  [   50.947831] pcieport :00:1b.0:[21] ACSViol(First)
  [   50.947841] pcieport :00:1b.0: AER: broadcast error_detected message
  [   50.947843] nvme nvme0: frozen state error detected, reset controller

I suspect the nvme "can't change power state" and restore config space
errors are a consequence of the DPC event.  If DPC disables the link,
the device is inaccessible.

I don't know what caused the ACS Violation.  The AER TLP Header Log
might have a clue, but unfortunately we didn't print it.

Tangent:

  The fact that we didn't print the AER TLP Header log looks like
  a bug in itself.  PCIe r5.0, sec 6.2.7, table 6-5, says many
  errors, including ACS Violation, should log the TLP header.  But
  aer_get_device_error_info() only reads the log for error bits in
  AER_LOG_TLP_MASKS, which doesn't include PCI_ERR_UNC_ACSV.

  I don't think there's a "TLP Header Log Valid" bit, and it's ugly to
  have to update AER_LOG_TLP_MASKS if new errors are added.  I think
  maybe we should always print the header log.
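
  To make the tangent concrete, here is a small userspace sketch (not
  kernel code) of the mask test.  The bit values and the contents of
  AER_LOG_TLP_MASKS are written from memory of pci_regs.h and
  drivers/pci/pcie/aer.c around v5.9, so treat them as assumptions;
  should_log_tlp_header() is a hypothetical name used only here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Uncorrectable Error Status bits (assumed to match pci_regs.h) */
#define PCI_ERR_UNC_POISON_TLP	0x00001000
#define PCI_ERR_UNC_COMP_ABORT	0x00008000
#define PCI_ERR_UNC_UNX_COMP	0x00010000
#define PCI_ERR_UNC_MALF_TLP	0x00040000
#define PCI_ERR_UNC_ECRC	0x00080000
#define PCI_ERR_UNC_UNSUP	0x00100000
#define PCI_ERR_UNC_ACSV	0x00200000	/* bit 21, "ACSViol" in the log */

/* Assumed copy of the mask; note that PCI_ERR_UNC_ACSV is absent */
#define AER_LOG_TLP_MASKS	(PCI_ERR_UNC_POISON_TLP |	\
				 PCI_ERR_UNC_ECRC |		\
				 PCI_ERR_UNC_UNSUP |		\
				 PCI_ERR_UNC_COMP_ABORT |	\
				 PCI_ERR_UNC_UNX_COMP |		\
				 PCI_ERR_UNC_MALF_TLP)

/* Hypothetical helper: would the TLP header log be read for this status? */
static bool should_log_tlp_header(uint32_t uncor_status)
{
	return (uncor_status & AER_LOG_TLP_MASKS) != 0;
}
```

  With these assumed values, a status containing only the ACS Violation
  bit does not pass the test, which would explain the missing header
  log in the dmesg above.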

- Forwarded message from bugzilla-dae...@bugzilla.kernel.org -

Date: Fri, 04 Sep 2020 14:31:20 +
From: bugzilla-dae...@bugzilla.kernel.org
To: bj...@helgaas.com
Subject: [Bug 209149] New: "iommu/vt-d: Enable PCI ACS for platform opt in
hint" makes NVMe config space not accessible after S3
Message-ID: 

https://bugzilla.kernel.org/show_bug.cgi?id=209149

Bug ID: 209149
   Summary: "iommu/vt-d: Enable PCI ACS for platform opt in hint"
makes NVMe config space not accessible after S3
   Product: Drivers
   Version: 2.5
Kernel Version: mainline
  Hardware: All
OS: Linux
  Tree: Mainline
Status: NEW
  Severity: normal
  Priority: P1
 Component: PCI
  Assignee: drivers_...@kernel-bugs.osdl.org
  Reporter: kai.heng.f...@canonical.com
Regression: No

Here's the error:
[   50.947816] pcieport :00:1b.0: DPC: containment event, status:0x1f01
source:0x
[   50.947817] pcieport :00:1b.0: DPC: unmasked uncorrectable error
detected
[   50.947829] pcieport :00:1b.0: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Receiver ID)
[   50.947830] pcieport :00:1b.0:   device [8086:06ac] error
status/mask=0020/0001
[   50.947831] pcieport :00:1b.0:[21] ACSViol(First)
[   50.947841] pcieport :00:1b.0: AER: broadcast error_detected message
[   50.947843] nvme nvme0: frozen state error detected, reset controller

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

- End forwarded message -


Re: [PATCH v5] PCI/ACS: Enable PCI_ACS_TB and disable only when needed for ATS

2020-09-16 Thread Bjorn Helgaas
On Tue, Jul 14, 2020 at 01:15:40PM -0700, Rajat Jain wrote:
> The ACS "Translation Blocking" bit blocks the translated addresses from
> the devices. We don't expect such traffic from devices unless ATS is
> enabled on them. A device sending such traffic without ATS enabled,
> indicates malicious intent, and thus should be blocked.
> 
> Enable PCI_ACS_TB by default for all devices, and it stays enabled until
> atleast one of the devices downstream wants to enable ATS. It gets
> disabled to enable ATS on a device downstream it, and then gets enabled
> back on once all the downstream devices don't need ATS.
> 
> Signed-off-by: Rajat Jain 

I applied v4 of this patch instead because I think the complexity of
this one, where we have to walk up the tree and disable TB in upstream
bridges, is too high.  It's always tricky to modify the state of
device Y when we're doing something for device X.
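
To illustrate the kind of bookkeeping I mean, here is a rough userspace
model (not the patch itself) of a per-device dependency count that is
updated on every device between an endpoint and the root, stepping one
level up on each iteration.  struct model_dev and both helpers are
hypothetical names; only the ats_dependencies idea comes from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the parts of struct pci_dev that matter here */
struct model_dev {
	struct model_dev *upstream;	/* NULL at the top of the tree */
	unsigned int ats_dependencies;	/* devices at or below us using ATS */
	bool tb_enabled;		/* models the ACS Translation Blocking bit */
};

/* Called when pdev enables ATS: clear TB on pdev and everything above it */
static void model_disable_tb(struct model_dev *pdev)
{
	struct model_dev *dev;

	for (dev = pdev; dev; dev = dev->upstream) {
		/* Only the first dependent actually changes the bit */
		if (++dev->ats_dependencies == 1)
			dev->tb_enabled = false;
	}
}

/* Called when pdev disables ATS: restore TB where nobody needs ATS anymore */
static void model_enable_tb(struct model_dev *pdev)
{
	struct model_dev *dev;

	for (dev = pdev; dev; dev = dev->upstream) {
		/* Only the last dependent going away sets the bit again */
		if (--dev->ats_dependencies == 0)
			dev->tb_enabled = true;
	}
}
```

Even in this toy form, enabling ATS on one endpoint changes state on
every bridge above it, and the counts must stay balanced across
hotplug, errors, and resets.  That is the part I'd rather avoid.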

> ---
> Note that I'm ignoring the devices that require quirks to enable or
> disable ACS, instead of using the standard way for ACS configuration.
> The reason is that it would require adding yet another quirk table or
> quirk function pointer, that I don't know how to implement for those
> devices, and will neither have the devices to test that code.
> 
> v5: Enable TB and disable ATS for all devices on boot. Disable TB later
> only if needed to enable ATS on downstream devices.
> v4: Add braces to avoid warning from kernel robot
> print warning for only external-facing devices.
> v3: print warning if ACS_TB not supported on external-facing/untrusted ports.
> Minor code comments fixes.
> v2: Commit log change
> 
>  drivers/pci/ats.c   |  5 
>  drivers/pci/pci.c   | 57 +
>  drivers/pci/pci.h   |  2 ++
>  drivers/pci/probe.c |  2 +-
>  include/linux/pci.h |  2 ++
>  5 files changed, 67 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> index b761c1f72f67..e2ea9083f30f 100644
> --- a/drivers/pci/ats.c
> +++ b/drivers/pci/ats.c
> @@ -28,6 +28,9 @@ void pci_ats_init(struct pci_dev *dev)
>   return;
>  
>   dev->ats_cap = pos;
> +
> + dev->ats_enabled = 1; /* To avoid WARN_ON from pci_disable_ats() */
> + pci_disable_ats(dev);
>  }
>  
>  /**
> @@ -82,6 +85,7 @@ int pci_enable_ats(struct pci_dev *dev, int ps)
>   }
>   pci_write_config_word(dev, dev->ats_cap + PCI_ATS_CTRL, ctrl);
>  
> + pci_disable_acs_trans_blocking(dev);
>   dev->ats_enabled = 1;
>   return 0;
>  }
> @@ -102,6 +106,7 @@ void pci_disable_ats(struct pci_dev *dev)
>   ctrl &= ~PCI_ATS_CTRL_ENABLE;
>   pci_write_config_word(dev, dev->ats_cap + PCI_ATS_CTRL, ctrl);
>  
> + pci_enable_acs_trans_blocking(dev);
>   dev->ats_enabled = 0;
>  }
>  EXPORT_SYMBOL_GPL(pci_disable_ats);
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 73a862782214..614e3c1e8c56 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -876,6 +876,9 @@ static void pci_std_enable_acs(struct pci_dev *dev)
>   /* Upstream Forwarding */
>   ctrl |= (cap & PCI_ACS_UF);
>  
> + /* Translation Blocking */
> + ctrl |= (cap & PCI_ACS_TB);
> +
>   pci_write_config_word(dev, pos + PCI_ACS_CTRL, ctrl);
>  }
>  
> @@ -904,6 +907,60 @@ static void pci_enable_acs(struct pci_dev *dev)
>   pci_disable_acs_redir(dev);
>  }
>  
> +void pci_disable_acs_trans_blocking(struct pci_dev *pdev)
> +{
> + u16 cap, ctrl, pos;
> + struct pci_dev *dev;
> +
> + if (!pci_acs_enable)
> + return;
> +
> + for (dev = pdev; dev; dev = pci_upstream_bridge(pdev)) {
> +
> + pos = dev->acs_cap;
> + if (!pos)
> + continue;
> +
> + /*
> +  * Disable translation blocking when first downstream
> +  * device that needs it (for ATS) wants to enable ATS
> +  */
> + if (++dev->ats_dependencies == 1) {
> + pci_read_config_word(dev, pos + PCI_ACS_CAP, &cap);
> + pci_read_config_word(dev, pos + PCI_ACS_CTRL, &ctrl);
> + ctrl &= ~(cap & PCI_ACS_TB);
> + pci_write_config_word(dev, pos + PCI_ACS_CTRL, ctrl);
> + }
> + }
> +}
> +
> +void pci_enable_acs_trans_blocking(struct pci_dev *pdev)
> +{
> + u16 cap, ctrl, pos;
> + struct pci_dev *dev;
> +
> + if (!pci_acs_enable)
> + return;
> +
> + for (dev = pdev; dev; dev = pci_upstream_bridge(pdev)) {
> +
> + pos = dev->acs_cap;
> + if (!pos)
> + continue;
> +
> + /*
> +  * Enable translation blocking when last downstream device
> +  * that depends on it (for ATS), doesn't need ATS anymore
> +  */
> + if (--dev->ats_dependencies == 0) {
> + pci_read_config_word(dev, pos + PCI_ACS_CAP, &cap);
> + pci_read_config_word(dev, pos 

Re: [PATCH v4 4/4] PCI/ACS: Enable PCI_ACS_TB for untrusted/external-facing devices

2020-09-16 Thread Bjorn Helgaas
On Tue, Jul 07, 2020 at 03:46:04PM -0700, Rajat Jain wrote:
> When enabling ACS, enable translation blocking for external facing ports
> and untrusted devices.
> 
> Signed-off-by: Rajat Jain 

Applied (slightly modified) to pci/acs for v5.10, thanks!

I think the warning is superfluous because every external_facing
device is a Root Port or Switch Downstream Port, and if those support
ACS at all, they are required to support Translation Blocking.  So we
should only see the warning if the device is defective, and I don't
think we need to go out of our way to look for those.
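
For reference, without the warning the logic reduces to something like
the userspace sketch below.  This is only my shorthand for the idea,
not necessarily the exact code that was applied; model_acs_ctrl() is a
hypothetical name and the PCI_ACS_* values are assumed to match
pci_regs.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* ACS capability/control bits (assumed to match pci_regs.h) */
#define PCI_ACS_SV	0x0001	/* Source Validation */
#define PCI_ACS_TB	0x0002	/* Translation Blocking */
#define PCI_ACS_RR	0x0004	/* P2P Request Redirect */
#define PCI_ACS_CR	0x0008	/* P2P Completion Redirect */
#define PCI_ACS_UF	0x0010	/* Upstream Forwarding */

/* Hypothetical helper: compute the ACS control value to write back */
static uint16_t model_acs_ctrl(uint16_t cap, uint16_t ctrl,
			       bool external_facing, bool untrusted)
{
	/* Same bits the existing function enables when supported */
	ctrl |= (cap & PCI_ACS_SV);
	ctrl |= (cap & PCI_ACS_RR);
	ctrl |= (cap & PCI_ACS_CR);
	ctrl |= (cap & PCI_ACS_UF);

	/* Translation Blocking only for external-facing/untrusted devices */
	if (external_facing || untrusted)
		ctrl |= (cap & PCI_ACS_TB);

	return ctrl;
}
```

Masking with cap means a device that lacks TB is silently left alone,
which is all we can do for a defective device anyway.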

> ---
> v4: Add braces to avoid warning from kernel robot
> print warning for only external-facing devices.
> v3: print warning if ACS_TB not supported on external-facing/untrusted ports.
> Minor code comments fixes.
> v2: Commit log change
> 
>  drivers/pci/pci.c|  8 
>  drivers/pci/quirks.c | 15 +++
>  2 files changed, 23 insertions(+)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 73a8627822140..a5a6bea7af7ce 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -876,6 +876,14 @@ static void pci_std_enable_acs(struct pci_dev *dev)
>   /* Upstream Forwarding */
>   ctrl |= (cap & PCI_ACS_UF);
>  
> + /* Enable Translation Blocking for external devices */
> + if (dev->external_facing || dev->untrusted) {
> + if (cap & PCI_ACS_TB)
> + ctrl |= PCI_ACS_TB;
> + else if (dev->external_facing)
> + pci_warn(dev, "ACS: No Translation Blocking on 
> external-facing dev\n");
> + }
> +
>   pci_write_config_word(dev, pos + PCI_ACS_CTRL, ctrl);
>  }
>  
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index b341628e47527..bb22b46c1d719 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -4934,6 +4934,13 @@ static void pci_quirk_enable_intel_rp_mpc_acs(struct 
> pci_dev *dev)
>   }
>  }
>  
> +/*
> + * Currently this quirk does the equivalent of
> + * PCI_ACS_SV | PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF
> + *
> + * TODO: This quirk also needs to do equivalent of PCI_ACS_TB,
> + * if dev->external_facing || dev->untrusted
> + */
>  static int pci_quirk_enable_intel_pch_acs(struct pci_dev *dev)
>  {
>   if (!pci_quirk_intel_pch_acs_match(dev))
> @@ -4973,6 +4980,14 @@ static int pci_quirk_enable_intel_spt_pch_acs(struct 
> pci_dev *dev)
>   ctrl |= (cap & PCI_ACS_CR);
>   ctrl |= (cap & PCI_ACS_UF);
>  
> + /* Enable Translation Blocking for external devices */
> + if (dev->external_facing || dev->untrusted) {
> + if (cap & PCI_ACS_TB)
> + ctrl |= PCI_ACS_TB;
> + else if (dev->external_facing)
> + pci_warn(dev, "ACS: No Translation Blocking on 
> external-facing dev\n");
> + }
> +
>   pci_write_config_dword(dev, pos + INTEL_SPT_ACS_CTRL, ctrl);
>  
>   pci_info(dev, "Intel SPT PCH root port ACS workaround enabled\n");
> -- 
> 2.27.0.212.ge8ba1cc988-goog
> 


Re: [PATCH v2] iommu/dma: Fix IOVA reserve dma ranges

2020-09-11 Thread Bjorn Helgaas
On Fri, Sep 11, 2020 at 03:55:34PM +0530, Srinath Mannam wrote:
> Fix IOVA reserve failure in the case when address of first memory region
> listed in dma-ranges is equal to 0x0.
> 
> Fixes: aadad097cd46f ("iommu/dma: Reserve IOVA for PCIe inaccessible DMA 
> address")
> Signed-off-by: Srinath Mannam 
> ---
> Changes from v1:
>Removed unnecessary changes based on Robin's review comments.
> 
>  drivers/iommu/dma-iommu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 5141d49a046b..682068a9aae7 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -217,7 +217,7 @@ static int iova_reserve_pci_windows(struct pci_dev *dev,
>   lo = iova_pfn(iovad, start);
>   hi = iova_pfn(iovad, end);
>   reserve_iova(iovad, lo, hi);
> - } else {
> + } else if (end < start) {
>   /* dma_ranges list should be sorted */
>   dev_err(&dev->dev, "Failed to reserve IOVA\n");

You didn't actually change the error message, but the message would be
way more useful if it included the IOVA address range, e.g., the
format used in pci_register_host_bridge():

  bus address [%#010llx-%#010llx]

Incidentally, the pr_err() in copy_reserved_iova() looks bogus; it
prints iova->pfn_lo twice, when it should probably print the base and
size or (my preference) something like the above:

pr_err("Reserve iova range %lx@%lx failed\n",
   iova->pfn_lo, iova->pfn_lo);
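
A userspace sketch of the range format I have in mind is below.
format_bus_range() is a hypothetical helper used only to show the
output shape; in the kernel this would simply be the dev_err() or
pr_err() format string.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: render a range the way
 * pci_register_host_bridge() prints bus addresses.
 * Returns the snprintf() result.
 *
 * Note: with the C library's printf, the '#' flag only adds the "0x"
 * prefix to nonzero values.
 */
static int format_bus_range(char *buf, size_t len,
			    unsigned long long start,
			    unsigned long long end)
{
	return snprintf(buf, len, "bus address [%#010llx-%#010llx]",
			start, end);
}
```

For start 0x80000000 and end 0xffffffff this yields
"bus address [0x80000000-0xffffffff]", which tells the reader
immediately which window failed.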

>   return -EINVAL;
> -- 
> 2.17.1
> 


Re: [patch V2 34/46] PCI/MSI: Make arch_.*_msi_irq[s] fallbacks selectable

2020-08-27 Thread Bjorn Helgaas
[+cc Rob,
cover https://lore.kernel.org/r/20200826111628.794979...@linutronix.de/
this  https://lore.kernel.org/r/20200826112333.992429...@linutronix.de/]

On Wed, Aug 26, 2020 at 01:17:02PM +0200, Thomas Gleixner wrote:
> From: Thomas Gleixner 
> 
> The arch_.*_msi_irq[s] fallbacks are compiled in whether an architecture
> requires them or not. Architectures which are fully utilizing hierarchical
> irq domains should never call into that code.
> 
> It's not only architectures which depend on that by implementing one or
> more of the weak functions, there is also a bunch of drivers which relies
> on the weak functions which invoke msi_controller::setup_irq[s] and
> msi_controller::teardown_irq.
> 
> Make the architectures and drivers which rely on them select them in Kconfig
> and if not selected replace them by stub functions which emit a warning and
> fail the PCI/MSI interrupt allocation.

Sorry, I really don't understand this, so these are probably stupid
questions.

If CONFIG_PCI_MSI_ARCH_FALLBACKS is defined, we will supply
implementations of:

  arch_setup_msi_irq
  arch_teardown_msi_irq
  arch_setup_msi_irqs
  arch_teardown_msi_irqs
  default_teardown_msi_irqs    # non-weak
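
My mental model of the selectable-fallback pattern is roughly the
userspace sketch below.  struct model_pci_dev, the message text, and
the -ENODEV return value are my assumptions for illustration; the
commit log only says the stubs emit a warning and fail the allocation.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for struct pci_dev */
struct model_pci_dev {
	const char *name;
};

#ifdef CONFIG_PCI_MSI_ARCH_FALLBACKS
/* Placeholder for the real (possibly weak) fallback implementation */
static inline int arch_setup_msi_irqs(struct model_pci_dev *dev,
				      int nvec, int type)
{
	(void)dev;
	(void)nvec;
	(void)type;
	return 0;
}
#else
/* Not selected: warn and fail the MSI allocation */
static inline int arch_setup_msi_irqs(struct model_pci_dev *dev,
				      int nvec, int type)
{
	(void)nvec;
	(void)type;
	fprintf(stderr, "%s: arch MSI fallback called but not selected\n",
		dev->name);
	return -ENODEV;
}
#endif
```

Built without CONFIG_PCI_MSI_ARCH_FALLBACKS defined, any caller gets
the failing stub, so an arch that really needs the fallbacks has to
select the option explicitly.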

You select CONFIG_PCI_MSI_ARCH_FALLBACKS for ia64, mips, powerpc,
s390, sparc, and x86.  I see that all of those arches except x86
implement at least one of the functions above.  x86 doesn't, and I
can't figure out why it needs to select CONFIG_PCI_MSI_ARCH_FALLBACKS.

I assume there's a way to convert these arches to hierarchical irq
domains so they wouldn't need this at all?  Is there a sample
conversion to look at?

And I can't figure out what's special about tegra, rcar, and xilinx
that makes them need it as well.  Is there something I could grep for
to identify them?  Is there a way to convert them so they don't need
it?

> --- a/include/linux/msi.h
> +++ b/include/linux/msi.h
> @@ -193,17 +193,38 @@ void pci_msi_mask_irq(struct irq_data *d
>  void pci_msi_unmask_irq(struct irq_data *data);
>  
>  /*
> - * The arch hooks to setup up msi irqs. Those functions are
> - * implemented as weak symbols so that they /can/ be overriden by
> - * architecture specific code if needed.
> + * The arch hooks to setup up msi irqs. Default functions are implemented

s/msi/MSI/ to match the one below.

> + * as weak symbols so that they /can/ be overriden by architecture specific
> + * code if needed. These hooks must be enabled by the architecture or by
> + * drivers which depend on them via msi_controller based MSI handling.


Re: [patch RFC 34/38] x86/msi: Let pci_msi_prepare() handle non-PCI MSI

2020-08-25 Thread Bjorn Helgaas
On Tue, Aug 25, 2020 at 11:30:41PM +0200, Thomas Gleixner wrote:
> On Tue, Aug 25 2020 at 15:24, Bjorn Helgaas wrote:
> > On Fri, Aug 21, 2020 at 02:24:58AM +0200, Thomas Gleixner wrote:
> >> Rename it to x86_msi_prepare() and handle the allocation type setup
> >> depending on the device type.
> >
> > I see what you're doing, but the subject reads a little strangely
> 
> Yes :(
> 
> > ("pci_msi_prepare() handling non-PCI" stuff) since it doesn't mention
> > the rename.  Maybe not practical or worthwhile to split into a rename
> > + make generic, I dunno.
> 
> What about
> 
> x86/msi: Rename and rework pci_msi_prepare() to cover non-PCI MSI

Perfect!


Re: [patch RFC 30/38] PCI/MSI: Allow to disable arch fallbacks

2020-08-25 Thread Bjorn Helgaas
On Tue, Aug 25, 2020 at 11:28:30PM +0200, Thomas Gleixner wrote:
> On Tue, Aug 25 2020 at 15:07, Bjorn Helgaas wrote:
> >> + * The arch hooks to setup up msi irqs. Default functions are implemented
> >> + * as weak symbols so that they /can/ be overriden by architecture 
> >> specific
> >> + * code if needed.
> >> + *
> >> + * They can be replaced by stubs with warnings via
> >> + * CONFIG_PCI_MSI_DISABLE_ARCH_FALLBACKS when the architecture fully
> >> + * utilizes direct irqdomain based setup.

> > If not, it seems like it'd be nicer to have the burden on the arches
> > that need/want to use arch-specific code instead of on the arches that
> > do things generically.
> 
> Right, but they still share the common code there and some of them
> provide only parts of the weak callbacks. I'm not sure whether it's a
> good idea to copy all of this into each affected architecture.
> 
> Or did you just mean that those architectures should select
> CONFIG_I_WANT_THE CRUFT instead of opting out on the fully irq domain
> based ones?

Yes, that was my real question -- can we confine the cruft in the
crufty arches?  If not, no big deal.

Bjorn


Re: [patch RFC 34/38] x86/msi: Let pci_msi_prepare() handle non-PCI MSI

2020-08-25 Thread Bjorn Helgaas
On Fri, Aug 21, 2020 at 02:24:58AM +0200, Thomas Gleixner wrote:
> Rename it to x86_msi_prepare() and handle the allocation type setup
> depending on the device type.

I see what you're doing, but the subject reads a little strangely
("pci_msi_prepare() handling non-PCI" stuff) since it doesn't mention
the rename.  Maybe not practical or worthwhile to split into a rename
+ make generic, I dunno.

> Add a new arch_msi_prepare define which will be utilized by the upcoming
> device MSI support. Define it to NULL if not provided by an architecture in
> the generic MSI header.
> 
> One arch specific function for MSI support is truly enough.
> 
> Signed-off-by: Thomas Gleixner 
> Cc: linux-...@vger.kernel.org
> Cc: linux-hyp...@vger.kernel.org
> ---
>  arch/x86/include/asm/msi.h  |4 +++-
>  arch/x86/kernel/apic/msi.c  |   27 ---
>  drivers/pci/controller/pci-hyperv.c |2 +-
>  include/linux/msi.h |4 
>  4 files changed, 28 insertions(+), 9 deletions(-)
> 
> --- a/arch/x86/include/asm/msi.h
> +++ b/arch/x86/include/asm/msi.h
> @@ -6,7 +6,9 @@
>  
>  typedef struct irq_alloc_info msi_alloc_info_t;
>  
> -int pci_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec,
> +int x86_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec,
>   msi_alloc_info_t *arg);
>  
> +#define arch_msi_prepare x86_msi_prepare
> +
>  #endif /* _ASM_X86_MSI_H */
> --- a/arch/x86/kernel/apic/msi.c
> +++ b/arch/x86/kernel/apic/msi.c
> @@ -182,26 +182,39 @@ static struct irq_chip pci_msi_controlle
>   .flags  = IRQCHIP_SKIP_SET_WAKE,
>  };
>  
> -int pci_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec,
> - msi_alloc_info_t *arg)
> +static void pci_msi_prepare(struct device *dev, msi_alloc_info_t *arg)
>  {
> - struct pci_dev *pdev = to_pci_dev(dev);
> - struct msi_desc *desc = first_pci_msi_entry(pdev);
> + struct msi_desc *desc = first_msi_entry(dev);
>  
> - init_irq_alloc_info(arg, NULL);
>   if (desc->msi_attrib.is_msix) {
>   arg->type = X86_IRQ_ALLOC_TYPE_PCI_MSIX;
>   } else {
>   arg->type = X86_IRQ_ALLOC_TYPE_PCI_MSI;
>   arg->flags |= X86_IRQ_ALLOC_CONTIGUOUS_VECTORS;
>   }
> +}
> +
> +static void dev_msi_prepare(struct device *dev, msi_alloc_info_t *arg)
> +{
> + arg->type = X86_IRQ_ALLOC_TYPE_DEV_MSI;
> +}
> +
> +int x86_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec,
> + msi_alloc_info_t *arg)
> +{
> + init_irq_alloc_info(arg, NULL);
> +
> + if (dev_is_pci(dev))
> + pci_msi_prepare(dev, arg);
> + else
> + dev_msi_prepare(dev, arg);
>  
>   return 0;
>  }
> -EXPORT_SYMBOL_GPL(pci_msi_prepare);
> +EXPORT_SYMBOL_GPL(x86_msi_prepare);
>  
>  static struct msi_domain_ops pci_msi_domain_ops = {
> - .msi_prepare= pci_msi_prepare,
> + .msi_prepare= x86_msi_prepare,
>  };
>  
>  static struct msi_domain_info pci_msi_domain_info = {
> --- a/drivers/pci/controller/pci-hyperv.c
> +++ b/drivers/pci/controller/pci-hyperv.c
> @@ -1532,7 +1532,7 @@ static struct irq_chip hv_msi_irq_chip =
>  };
>  
>  static struct msi_domain_ops hv_msi_ops = {
> - .msi_prepare= pci_msi_prepare,
> + .msi_prepare= arch_msi_prepare,
>   .msi_free   = hv_msi_free,
>  };
>  
> --- a/include/linux/msi.h
> +++ b/include/linux/msi.h
> @@ -430,4 +430,8 @@ static inline struct irq_domain *pci_msi
>  }
>  #endif /* CONFIG_PCI_MSI_IRQ_DOMAIN */
>  
> +#ifndef arch_msi_prepare
> +# define arch_msi_prepareNULL
> +#endif
> +
>  #endif /* LINUX_MSI_H */
> 


Re: [patch RFC 17/38] x86/pci: Reducde #ifdeffery in PCI init code

2020-08-25 Thread Bjorn Helgaas
s/Reducde/Reduce/ (in subject)

On Fri, Aug 21, 2020 at 02:24:41AM +0200, Thomas Gleixner wrote:
> Adding a function call before the first #ifdef in arch_pci_init() triggers
> a 'mixed declarations and code' warning if PCI_DIRECT is enabled.
> 
> Use stub functions and move the #ifdeffery to the header file where it is
> not in the way.
> 
> Signed-off-by: Thomas Gleixner 
> Cc: linux-...@vger.kernel.org

Nice cleanup, thanks.  Glad to get rid of the useless initializer,
too.

Acked-by: Bjorn Helgaas 

> ---
>  arch/x86/include/asm/pci_x86.h |   11 +++
>  arch/x86/pci/init.c|   10 +++---
>  2 files changed, 14 insertions(+), 7 deletions(-)
> 
> --- a/arch/x86/include/asm/pci_x86.h
> +++ b/arch/x86/include/asm/pci_x86.h
> @@ -114,9 +114,20 @@ extern const struct pci_raw_ops pci_dire
>  extern bool port_cf9_safe;
>  
>  /* arch_initcall level */
> +#ifdef CONFIG_PCI_DIRECT
>  extern int pci_direct_probe(void);
>  extern void pci_direct_init(int type);
> +#else
> +static inline int pci_direct_probe(void) { return -1; }
> +static inline  void pci_direct_init(int type) { }
> +#endif
> +
> +#ifdef CONFIG_PCI_BIOS
>  extern void pci_pcbios_init(void);
> +#else
> +static inline void pci_pcbios_init(void) { }
> +#endif
> +
>  extern void __init dmi_check_pciprobe(void);
>  extern void __init dmi_check_skip_isa_align(void);
>  
> --- a/arch/x86/pci/init.c
> +++ b/arch/x86/pci/init.c
> @@ -8,11 +8,9 @@
> in the right sequence from here. */
>  static __init int pci_arch_init(void)
>  {
> -#ifdef CONFIG_PCI_DIRECT
> - int type = 0;
> + int type;
>  
>   type = pci_direct_probe();
> -#endif
>  
>   if (!(pci_probe & PCI_PROBE_NOEARLY))
>   pci_mmcfg_early_init();
> @@ -20,18 +18,16 @@ static __init int pci_arch_init(void)
>   if (x86_init.pci.arch_init && !x86_init.pci.arch_init())
>   return 0;
>  
> -#ifdef CONFIG_PCI_BIOS
>   pci_pcbios_init();
> -#endif
> +
>   /*
>* don't check for raw_pci_ops here because we want pcbios as last
>* fallback, yet it's needed to run first to set pcibios_last_bus
>* in case legacy PCI probing is used. otherwise detecting peer busses
>* fails.
>*/
> -#ifdef CONFIG_PCI_DIRECT
>   pci_direct_init(type);
> -#endif
> +
>   if (!raw_pci_ops && !raw_pci_ext_ops)
>   printk(KERN_ERR
>   "PCI: Fatal: No config space access function found\n");
> 


Re: [patch RFC 21/38] PCI: MSI: Provide pci_dev_has_special_msi_domain() helper

2020-08-25 Thread Bjorn Helgaas
On Fri, Aug 21, 2020 at 02:24:45AM +0200, Thomas Gleixner wrote:
> Provide a helper function to check whether a PCI device is handled by a
> non-standard PCI/MSI domain. This will be used to exclude such devices
> which hang of a special bus, e.g. VMD, to be excluded from the irq domain
> override in irq remapping.
> 
> Signed-off-by: Thomas Gleixner 
> Cc: Bjorn Helgaas 
> Cc: linux-...@vger.kernel.org

Acked-by: Bjorn Helgaas 

s|PCI: MSI:|PCI/MSI:| in the subject if feasible.

> ---
>  drivers/pci/msi.c   |   22 ++
>  include/linux/msi.h |1 +
>  2 files changed, 23 insertions(+)
> 
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -1553,4 +1553,26 @@ struct irq_domain *pci_msi_get_device_do
>DOMAIN_BUS_PCI_MSI);
>   return dom;
>  }
> +
> +/**
> + * pci_dev_has_special_msi_domain - Check whether the device is handled by
> + *   a non-standard PCI-MSI domain
> + * @pdev:The PCI device to check.
> + *
> + * Returns: True if the device irqdomain or the bus irqdomain is
> + * non-standard PCI/MSI.
> + */
> +bool pci_dev_has_special_msi_domain(struct pci_dev *pdev)
> +{
> + struct irq_domain *dom = dev_get_msi_domain(&pdev->dev);
> +
> + if (!dom)
> + dom = dev_get_msi_domain(&pdev->bus->dev);
> +
> + if (!dom)
> + return true;
> +
> + return dom->bus_token != DOMAIN_BUS_PCI_MSI;
> +}
> +
>  #endif /* CONFIG_PCI_MSI_IRQ_DOMAIN */
> --- a/include/linux/msi.h
> +++ b/include/linux/msi.h
> @@ -374,6 +374,7 @@ int pci_msi_domain_check_cap(struct irq_
>struct msi_domain_info *info, struct device *dev);
>  u32 pci_msi_domain_get_msi_rid(struct irq_domain *domain, struct pci_dev *pdev);
>  struct irq_domain *pci_msi_get_device_domain(struct pci_dev *pdev);
> +bool pci_dev_has_special_msi_domain(struct pci_dev *pdev);
>  #else
>  static inline struct irq_domain *pci_msi_get_device_domain(struct pci_dev *pdev)
>  {
> 
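
The decision order in the helper above can be tried out in userspace. The
following is only a sketch: the struct definitions are simplified stand-ins
invented for illustration (they are not the kernel's), and
has_special_msi_domain() is a hypothetical name for the model of
pci_dev_has_special_msi_domain().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types; not the real definitions. */
enum bus_token { DOMAIN_BUS_PCI_MSI, DOMAIN_BUS_VMD_MSI };

struct irq_domain { enum bus_token bus_token; };
struct device { struct irq_domain *msi_domain; };
struct pci_bus { struct device dev; };
struct pci_dev { struct device dev; struct pci_bus *bus; };

/* Mirrors the decision order of the helper in the quoted patch. */
static bool has_special_msi_domain(const struct pci_dev *pdev)
{
	const struct irq_domain *dom = pdev->dev.msi_domain;

	if (!dom)			/* fall back to the bus domain */
		dom = pdev->bus->dev.msi_domain;

	if (!dom)			/* no domain at all: report "special" */
		return true;

	return dom->bus_token != DOMAIN_BUS_PCI_MSI;
}
```

Note that a device with no MSI domain on either level is reported as
special, so callers that skip the irq domain override for special devices
will also skip it for such devices.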


Re: [patch RFC 30/38] PCI/MSI: Allow to disable arch fallbacks

2020-08-25 Thread Bjorn Helgaas
On Fri, Aug 21, 2020 at 02:24:54AM +0200, Thomas Gleixner wrote:
> If an architecture does not require the MSI setup/teardown fallback
> functions, then allow them to be replaced by stub functions which emit a
> warning.
> 
> Signed-off-by: Thomas Gleixner 
> Cc: Bjorn Helgaas 
> Cc: linux-...@vger.kernel.org

Acked-by: Bjorn Helgaas 

Question/comment below.

> ---
>  drivers/pci/Kconfig |3 +++
>  drivers/pci/msi.c   |3 ++-
>  include/linux/msi.h |   31 ++-
>  3 files changed, 31 insertions(+), 6 deletions(-)
> 
> --- a/drivers/pci/Kconfig
> +++ b/drivers/pci/Kconfig
> @@ -56,6 +56,9 @@ config PCI_MSI_IRQ_DOMAIN
>   depends on PCI_MSI
>   select GENERIC_MSI_IRQ_DOMAIN
>  
> +config PCI_MSI_DISABLE_ARCH_FALLBACKS
> + bool
> +
>  config PCI_QUIRKS
>   default y
>   bool "Enable PCI quirk workarounds" if EXPERT
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -58,8 +58,8 @@ static void pci_msi_teardown_msi_irqs(st
>  #define pci_msi_teardown_msi_irqsarch_teardown_msi_irqs
>  #endif
>  
> +#ifndef CONFIG_PCI_MSI_DISABLE_ARCH_FALLBACKS
>  /* Arch hooks */
> -
>  int __weak arch_setup_msi_irq(struct pci_dev *dev, struct msi_desc *desc)
>  {
>   struct msi_controller *chip = dev->bus->msi;
> @@ -132,6 +132,7 @@ void __weak arch_teardown_msi_irqs(struc
>  {
>   return default_teardown_msi_irqs(dev);
>  }
> +#endif /* !CONFIG_PCI_MSI_DISABLE_ARCH_FALLBACKS */
>  
>  static void default_restore_msi_irq(struct pci_dev *dev, int irq)
>  {
> --- a/include/linux/msi.h
> +++ b/include/linux/msi.h
> @@ -193,17 +193,38 @@ void pci_msi_mask_irq(struct irq_data *d
>  void pci_msi_unmask_irq(struct irq_data *data);
>  
>  /*
> - * The arch hooks to setup up msi irqs. Those functions are
> - * implemented as weak symbols so that they /can/ be overriden by
> - * architecture specific code if needed.
> + * The arch hooks to setup up msi irqs. Default functions are implemented
> + * as weak symbols so that they /can/ be overriden by architecture specific
> + * code if needed.
> + *
> + * They can be replaced by stubs with warnings via
> + * CONFIG_PCI_MSI_DISABLE_ARCH_FALLBACKS when the architecture fully
> + * utilizes direct irqdomain based setup.

Do you expect *all* arches to eventually use direct irqdomain setup?
And in that case, to remove the config option?

If not, it seems like it'd be nicer to have the burden on the arches
that need/want to use arch-specific code instead of on the arches that
do things generically.

>   */
> +#ifndef CONFIG_PCI_MSI_DISABLE_ARCH_FALLBACKS
>  int arch_setup_msi_irq(struct pci_dev *dev, struct msi_desc *desc);
>  void arch_teardown_msi_irq(unsigned int irq);
>  int arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int type);
>  void arch_teardown_msi_irqs(struct pci_dev *dev);
> -void arch_restore_msi_irqs(struct pci_dev *dev);
> -
>  void default_teardown_msi_irqs(struct pci_dev *dev);
> +#else
> +static inline int arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
> +{
> + WARN_ON_ONCE(1);
> + return -ENODEV;
> +}
> +
> +static inline void arch_teardown_msi_irqs(struct pci_dev *dev)
> +{
> + WARN_ON_ONCE(1);
> +}
> +#endif
> +
> +/*
> + * The restore hooks are still available as they are useful even
> + * for fully irq domain based setups. Courtesy to XEN/X86.
> + */
> +void arch_restore_msi_irqs(struct pci_dev *dev);
>  void default_restore_msi_irqs(struct pci_dev *dev);
>  
>  struct msi_controller {
> 


Re: [patch RFC 20/38] PCI: vmd: Mark VMD irqdomain with DOMAIN_BUS_VMD_MSI

2020-08-25 Thread Bjorn Helgaas
On Fri, Aug 21, 2020 at 02:24:44AM +0200, Thomas Gleixner wrote:
> Devices on the VMD bus use their own MSI irq domain, but it is not
> distinguishable from regular PCI/MSI irq domains. This is required
> to exclude VMD devices from getting the irq domain pointer set by
> interrupt remapping.
> 
> Override the default bus token.
> 
> Signed-off-by: Thomas Gleixner 
> Cc: Bjorn Helgaas 
> Cc: Lorenzo Pieralisi 
> Cc: Jonathan Derrick 
> Cc: linux-...@vger.kernel.org

Acked-by: Bjorn Helgaas 

> ---
>  drivers/pci/controller/vmd.c |6 ++
>  1 file changed, 6 insertions(+)
> 
> --- a/drivers/pci/controller/vmd.c
> +++ b/drivers/pci/controller/vmd.c
> @@ -579,6 +579,12 @@ static int vmd_enable_domain(struct vmd_
>   return -ENODEV;
>   }
>  
> + /*
> +  * Override the irq domain bus token so the domain can be distinguished
> +  * from a regular PCI/MSI domain.
> +  */
> + irq_domain_update_bus_token(vmd->irq_domain, DOMAIN_BUS_VMD_MSI);
> +
>   pci_add_resource(&resources, &vmd->resources[0]);
>   pci_add_resource_offset(&resources, &vmd->resources[1], offset[0]);
>   pci_add_resource_offset(&resources, &vmd->resources[2], offset[1]);
> 


Re: [patch RFC 13/38] PCI: MSI: Rework pci_msi_domain_calc_hwirq()

2020-08-25 Thread Bjorn Helgaas
On Fri, Aug 21, 2020 at 02:24:37AM +0200, Thomas Gleixner wrote:
> Retrieve the PCI device from the msi descriptor instead of doing so at the
> call sites.

I'd like it *better* with "PCI/MSI: " in the subject (to match history
and other patches in this series) and "MSI" here in the commit log,
but nice cleanup and:

Acked-by: Bjorn Helgaas 

Minor comments below.

> Signed-off-by: Thomas Gleixner 
> Cc: linux-...@vger.kernel.org
> ---
>  arch/x86/kernel/apic/msi.c |2 +-
>  drivers/pci/msi.c  |   13 ++---
>  include/linux/msi.h|3 +--
>  3 files changed, 8 insertions(+), 10 deletions(-)
> 
> --- a/arch/x86/kernel/apic/msi.c
> +++ b/arch/x86/kernel/apic/msi.c
> @@ -232,7 +232,7 @@ EXPORT_SYMBOL_GPL(pci_msi_prepare);
>  
>  void pci_msi_set_desc(msi_alloc_info_t *arg, struct msi_desc *desc)
>  {
> - arg->msi_hwirq = pci_msi_domain_calc_hwirq(arg->msi_dev, desc);
> + arg->msi_hwirq = pci_msi_domain_calc_hwirq(desc);

I guess it's safe to assume that "arg->msi_dev ==
msi_desc_to_pci_dev(desc)"?  I didn't try to verify that.

>  }
>  EXPORT_SYMBOL_GPL(pci_msi_set_desc);
>  
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -1346,17 +1346,17 @@ void pci_msi_domain_write_msg(struct irq
>  
>  /**
>   * pci_msi_domain_calc_hwirq - Generate a unique ID for an MSI source
> - * @dev: Pointer to the PCI device
>   * @desc:Pointer to the MSI descriptor
>   *
>   * The ID number is only used within the irqdomain.
>   */
> -irq_hw_number_t pci_msi_domain_calc_hwirq(struct pci_dev *dev,
> -   struct msi_desc *desc)
> +irq_hw_number_t pci_msi_domain_calc_hwirq(struct msi_desc *desc)
>  {
> + struct pci_dev *pdev = msi_desc_to_pci_dev(desc);

If you named this "struct pci_dev *dev" (not "pdev"), the diff would
be a little smaller and it would match other usage in the file.

>   return (irq_hw_number_t)desc->msi_attrib.entry_nr |
> - pci_dev_id(dev) << 11 |
> - (pci_domain_nr(dev->bus) & 0xFFFFFFFF) << 27;
> + pci_dev_id(pdev) << 11 |
> + (pci_domain_nr(pdev->bus) & 0xFFFFFFFF) << 27;
>  }
>  
>  static inline bool pci_msi_desc_is_multi_msi(struct msi_desc *desc)
> @@ -1406,8 +1406,7 @@ static void pci_msi_domain_set_desc(msi_
>   struct msi_desc *desc)
>  {
>   arg->desc = desc;
> - arg->hwirq = pci_msi_domain_calc_hwirq(msi_desc_to_pci_dev(desc),
> -desc);
> + arg->hwirq = pci_msi_domain_calc_hwirq(desc);
>  }
>  #else
>  #define pci_msi_domain_set_desc  NULL
> --- a/include/linux/msi.h
> +++ b/include/linux/msi.h
> @@ -369,8 +369,7 @@ void pci_msi_domain_write_msg(struct irq
>  struct irq_domain *pci_msi_create_irq_domain(struct fwnode_handle *fwnode,
>struct msi_domain_info *info,
>struct irq_domain *parent);
> -irq_hw_number_t pci_msi_domain_calc_hwirq(struct pci_dev *dev,
> -   struct msi_desc *desc);
> +irq_hw_number_t pci_msi_domain_calc_hwirq(struct msi_desc *desc);
>  int pci_msi_domain_check_cap(struct irq_domain *domain,
>struct msi_domain_info *info, struct device *dev);
>  u32 pci_msi_domain_get_msi_rid(struct irq_domain *domain, struct pci_dev *pdev);
> 
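
For reference, the ID built by pci_msi_domain_calc_hwirq() packs three
fields into one number: the MSI entry number in the low 11 bits (an MSI-X
table has at most 2048 entries), the 16-bit bus/devfn ID above it, and the
PCI domain number from bit 27 up. A userspace sketch of that layout follows.
The function names are hypothetical, and it assumes a 64-bit result and a
32-bit mask on the domain number.

```c
#include <assert.h>
#include <stdint.h>

/* Same packing as PCI_DEVID(): bus number in bits 15:8, devfn in bits 7:0. */
static uint16_t devid(uint8_t bus, uint8_t devfn)
{
	return (uint16_t)((bus << 8) | devfn);
}

/*
 * Userspace rendering of the hwirq expression in the quoted patch.
 * Assumption: 64-bit arithmetic, domain masked to 32 bits.
 */
static uint64_t calc_hwirq(uint32_t domain, uint8_t bus, uint8_t devfn,
			   uint16_t entry_nr)
{
	return (uint64_t)entry_nr |
	       (uint64_t)devid(bus, devfn) << 11 |
	       ((uint64_t)domain & 0xFFFFFFFF) << 27;
}
```

With domain 0, bus 0x01, devfn 0x08 and entry 3, the device ID is 0x0108,
which shifted left by 11 gives 0x84000, so the result is 0x84003. Moving the
same device to domain 1 adds 1 << 27 and gives 0x8084003.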


Re: [RFC PATCH 00/17] Drop uses of pci_read_config_*() return value

2020-08-02 Thread Bjorn Helgaas
On Sun, Aug 02, 2020 at 08:46:48PM +0200, Borislav Petkov wrote:
> On Sun, Aug 02, 2020 at 07:28:00PM +0200, Saheed Bolarinwa wrote:
> > Because the value ~0 has a meaning to some drivers and only
> 
> No, ~0 means that the PCI read failed. For *every* PCI device I know.

Wait, I'm not convinced yet.  I know that if a PCI read fails, you
normally get ~0 data because the host bridge fabricates it to complete
the CPU load.

But what guarantees that a PCI config register cannot contain ~0?
If there's something about that in the spec I'd love to know where it
is because it would simplify a lot of things.

I don't think we should merge any of these patches as-is.  If we *do*
want to go this direction, we at least need some kind of macro or
function that tests for ~0 so we have a clue about what's happening
and can grep for it.
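
A minimal sketch of what such a helper could look like, written as plain
userspace C. Both names are hypothetical and chosen only for illustration;
this is not an existing kernel interface as far as this thread is concerned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical name: the all-ones pattern a host bridge typically
 * fabricates to complete a failed 32-bit config read.
 */
#define PCI_ALL_ONES_32	UINT32_MAX

/*
 * "maybe" is deliberate: a register could legitimately contain ~0, so
 * a match only says the read is suspect, not that it definitely failed.
 */
static inline bool pci_read_maybe_failed(uint32_t val)
{
	return val == PCI_ALL_ONES_32;
}
```

Funnelling every such check through one named helper keeps the intent
visible at the call site and makes the checks easy to find with grep.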

Bjorn


Re: [PATCH v3 1/1] PCI/ATS: Check PRI supported on the PF device when SRIOV is enabled

2020-07-24 Thread Bjorn Helgaas
On Thu, Jul 23, 2020 at 03:37:29PM -0700, Ashok Raj wrote:
> PASID and PRI capabilities are only enumerated in PF devices. VF devices
> do not enumerate these capabilities. IOMMU drivers also need to enumerate
> them before enabling features in the IOMMU. Extending the same support as
> PASID feature discovery (pci_pasid_features) for PRI.
> 
> Fixes: b16d0cb9e2fc ("iommu/vt-d: Always enable PASID/PRI PCI capabilities before ATS")
> Signed-off-by: Ashok Raj 

Applied with Baolu's reviewed-by and Joerg's ack to pci/virtualization
for v5.9, thanks!

> To: Bjorn Helgaas 
> To: Joerg Roedel 
> To: Lu Baolu 
> Cc: sta...@vger.kernel.org
> Cc: linux-...@vger.kernel.org
> Cc: linux-ker...@vger.kernel.org
> Cc: Ashok Raj 
> Cc: iommu@lists.linux-foundation.org
> ---
> v3: Added Fixes tag
> v2: Fixed build failure reported from lkp when CONFIG_PRI=n
> 
>  drivers/iommu/intel/iommu.c |  2 +-
>  drivers/pci/ats.c   | 13 +
>  include/linux/pci-ats.h |  4 
>  3 files changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index d759e7234e98..276452f5e6a7 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -2560,7 +2560,7 @@ static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu,
>   }
>  
>   if (info->ats_supported && ecap_prs(iommu->ecap) &&
> - pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI))
> + pci_pri_supported(pdev))
>   info->pri_supported = 1;
>   }
>   }
> diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> index b761c1f72f67..2e6cf0c700f7 100644
> --- a/drivers/pci/ats.c
> +++ b/drivers/pci/ats.c
> @@ -325,6 +325,19 @@ int pci_prg_resp_pasid_required(struct pci_dev *pdev)
>  
>   return pdev->pasid_required;
>  }
> +
> +/**
> + * pci_pri_supported - Check if PRI is supported.
> + * @pdev: PCI device structure
> + *
> + * Returns true if PRI capability is present, false otherwise.
> + */
> +bool pci_pri_supported(struct pci_dev *pdev)
> +{
> + /* VFs share the PF PRI configuration */
> + return !!(pci_physfn(pdev)->pri_cap);
> +}
> +EXPORT_SYMBOL_GPL(pci_pri_supported);
>  #endif /* CONFIG_PCI_PRI */
>  
>  #ifdef CONFIG_PCI_PASID
> diff --git a/include/linux/pci-ats.h b/include/linux/pci-ats.h
> index f75c307f346d..df54cd5b15db 100644
> --- a/include/linux/pci-ats.h
> +++ b/include/linux/pci-ats.h
> @@ -28,6 +28,10 @@ int pci_enable_pri(struct pci_dev *pdev, u32 reqs);
>  void pci_disable_pri(struct pci_dev *pdev);
>  int pci_reset_pri(struct pci_dev *pdev);
>  int pci_prg_resp_pasid_required(struct pci_dev *pdev);
> +bool pci_pri_supported(struct pci_dev *pdev);
> +#else
> +static inline bool pci_pri_supported(struct pci_dev *pdev)
> +{ return false; }
>  #endif /* CONFIG_PCI_PRI */
>  
>  #ifdef CONFIG_PCI_PASID
> -- 
> 2.7.4
> 
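
The effect of the patch is that a query on a VF is answered from the PF's
PRI capability. A userspace sketch of that lookup follows. The struct is a
minimal stand-in with only the fields needed here, and physfn_of() and
pri_supported() are hypothetical names modelling pci_physfn() and
pci_pri_supported().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-in for struct pci_dev; only the fields the sketch needs. */
struct pci_dev {
	bool is_virtfn;			/* true for an SR-IOV VF */
	struct pci_dev *physfn;		/* owning PF, valid when is_virtfn */
	uint16_t pri_cap;		/* PRI capability offset, 0 if absent */
};

/* Resolve a VF to its PF; any other device maps to itself. */
static struct pci_dev *physfn_of(struct pci_dev *dev)
{
	return dev->is_virtfn ? dev->physfn : dev;
}

static bool pri_supported(struct pci_dev *pdev)
{
	/* VFs share the PF PRI configuration */
	return physfn_of(pdev)->pri_cap != 0;
}
```

A VF whose own pri_cap is 0 therefore still reports PRI as supported when
its PF has the capability, which a direct capability search on the VF
would miss.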


Re: [PATCH v3 1/1] PCI/ATS: Check PRI supported on the PF device when SRIOV is enabled

2020-07-23 Thread Bjorn Helgaas
On Thu, Jul 23, 2020 at 03:37:29PM -0700, Ashok Raj wrote:
> PASID and PRI capabilities are only enumerated in PF devices. VF devices
> do not enumerate these capabilities. IOMMU drivers also need to enumerate
> them before enabling features in the IOMMU. Extending the same support as
> PASID feature discovery (pci_pasid_features) for PRI.
> 
> Fixes: b16d0cb9e2fc ("iommu/vt-d: Always enable PASID/PRI PCI capabilities before ATS")
> Signed-off-by: Ashok Raj 

This looks right to me, but I would like Joerg's ack before applying
it.

> To: Bjorn Helgaas 
> To: Joerg Roedel 
> To: Lu Baolu 
> Cc: sta...@vger.kernel.org
> Cc: linux-...@vger.kernel.org
> Cc: linux-ker...@vger.kernel.org
> Cc: Ashok Raj 
> Cc: iommu@lists.linux-foundation.org
> ---
> v3: Added Fixes tag
> v2: Fixed build failure reported from lkp when CONFIG_PRI=n
> 
>  drivers/iommu/intel/iommu.c |  2 +-
>  drivers/pci/ats.c   | 13 +
>  include/linux/pci-ats.h |  4 
>  3 files changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index d759e7234e98..276452f5e6a7 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -2560,7 +2560,7 @@ static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu,
>   }
>  
>   if (info->ats_supported && ecap_prs(iommu->ecap) &&
> - pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI))
> + pci_pri_supported(pdev))
>   info->pri_supported = 1;
>   }
>   }
> diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> index b761c1f72f67..2e6cf0c700f7 100644
> --- a/drivers/pci/ats.c
> +++ b/drivers/pci/ats.c
> @@ -325,6 +325,19 @@ int pci_prg_resp_pasid_required(struct pci_dev *pdev)
>  
>   return pdev->pasid_required;
>  }
> +
> +/**
> + * pci_pri_supported - Check if PRI is supported.
> + * @pdev: PCI device structure
> + *
> + * Returns true if PRI capability is present, false otherwise.
> + */
> +bool pci_pri_supported(struct pci_dev *pdev)
> +{
> + /* VFs share the PF PRI configuration */
> + return !!(pci_physfn(pdev)->pri_cap);
> +}
> +EXPORT_SYMBOL_GPL(pci_pri_supported);
>  #endif /* CONFIG_PCI_PRI */
>  
>  #ifdef CONFIG_PCI_PASID
> diff --git a/include/linux/pci-ats.h b/include/linux/pci-ats.h
> index f75c307f346d..df54cd5b15db 100644
> --- a/include/linux/pci-ats.h
> +++ b/include/linux/pci-ats.h
> @@ -28,6 +28,10 @@ int pci_enable_pri(struct pci_dev *pdev, u32 reqs);
>  void pci_disable_pri(struct pci_dev *pdev);
>  int pci_reset_pri(struct pci_dev *pdev);
>  int pci_prg_resp_pasid_required(struct pci_dev *pdev);
> +bool pci_pri_supported(struct pci_dev *pdev);
> +#else
> +static inline bool pci_pri_supported(struct pci_dev *pdev)
> +{ return false; }
>  #endif /* CONFIG_PCI_PRI */
>  
>  #ifdef CONFIG_PCI_PASID
> -- 
> 2.7.4
> 


Re: [PATCH] PCI/ATS: PASID and PRI are only enumerated in PF devices.

2020-07-23 Thread Bjorn Helgaas
On Thu, Jul 23, 2020 at 10:38:19AM -0700, Raj, Ashok wrote:
> Hi Bjorn
> 
> On Tue, Jul 21, 2020 at 09:54:01AM -0500, Bjorn Helgaas wrote:
> > On Mon, Jul 20, 2020 at 09:43:00AM -0700, Ashok Raj wrote:
> > > PASID and PRI capabilities are only enumerated in PF devices. VF devices
> > > > do not enumerate these capabilities. IOMMU drivers also need to enumerate
> > > them before enabling features in the IOMMU. Extending the same support as
> > > PASID feature discovery (pci_pasid_features) for PRI.
> > > 
> > > Signed-off-by: Ashok Raj 
> > 
> > Hi Ashok,
> > 
> > When you update this for the 0-day implicit declaration thing, can you
> > update the subject to say what the patch *does*, as opposed to what it
> > is solving?  Also, no need for a period at the end.
> 
> Yes, will update and resend. Goofed up a couple things, i'll update those
> as well.
> 
> > Does this fix a regression?  Is it associated with a commit that we
> > could add as a "Fixes:" tag so we know how far back to try to apply
> > to stable kernels?
> 
> Yes, 

Does that mean "yes, this fixes a regression"?

> but the iommu files moved location and git fixes tags only generates
> for a few handful of commits and doesn't show the old ones. 

Not sure how to interpret the rest of this.  I'm happy to include the
SHA1 of the original commit that added the regression, even if the
file has moved since then.

Bjorn


Re: [PATCH v2 03/12] ACPI/IORT: Make iort_msi_map_rid() PCI agnostic

2020-07-21 Thread Bjorn Helgaas
On Fri, Jun 19, 2020 at 09:20:04AM +0100, Lorenzo Pieralisi wrote:
> There is nothing PCI specific in iort_msi_map_rid().
> 
> Rename the function using a bus protocol agnostic name,
> iort_msi_map_id(), and convert current callers to it.
> 
> Signed-off-by: Lorenzo Pieralisi 
> Cc: Will Deacon 
> Cc: Hanjun Guo 
> Cc: Bjorn Helgaas 
> Cc: Sudeep Holla 
> Cc: Catalin Marinas 
> Cc: Robin Murphy 
> Cc: "Rafael J. Wysocki" 

Acked-by: Bjorn Helgaas 

Sorry I missed this!

> ---
>  drivers/acpi/arm64/iort.c | 12 ++--
>  drivers/pci/msi.c |  2 +-
>  include/linux/acpi_iort.h |  6 +++---
>  3 files changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
> index 902e2aaca946..53f9ef515089 100644
> --- a/drivers/acpi/arm64/iort.c
> +++ b/drivers/acpi/arm64/iort.c
> @@ -568,22 +568,22 @@ static struct acpi_iort_node *iort_find_dev_node(struct device *dev)
>  }
>  
>  /**
> - * iort_msi_map_rid() - Map a MSI requester ID for a device
> + * iort_msi_map_id() - Map a MSI input ID for a device
>   * @dev: The device for which the mapping is to be done.
> - * @req_id: The device requester ID.
> + * @input_id: The device input ID.
>   *
> - * Returns: mapped MSI RID on success, input requester ID otherwise
> + * Returns: mapped MSI ID on success, input ID otherwise
>   */
> -u32 iort_msi_map_rid(struct device *dev, u32 req_id)
> +u32 iort_msi_map_id(struct device *dev, u32 input_id)
>  {
>   struct acpi_iort_node *node;
>   u32 dev_id;
>  
>   node = iort_find_dev_node(dev);
>   if (!node)
> - return req_id;
> + return input_id;
>  
> - iort_node_map_id(node, req_id, &dev_id, IORT_MSI_TYPE);
> + iort_node_map_id(node, input_id, &dev_id, IORT_MSI_TYPE);
>   return dev_id;
>  }
>  
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index 74a91f52ecc0..77f48b95e277 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -1536,7 +1536,7 @@ u32 pci_msi_domain_get_msi_rid(struct irq_domain *domain, struct pci_dev *pdev)
>  
>   of_node = irq_domain_get_of_node(domain);
>   rid = of_node ? of_msi_map_rid(&pdev->dev, of_node, rid) :
> - iort_msi_map_rid(&pdev->dev, rid);
> + iort_msi_map_id(&pdev->dev, rid);
>  
>   return rid;
>  }
> diff --git a/include/linux/acpi_iort.h b/include/linux/acpi_iort.h
> index 08ec6bd2297f..e51425e083da 100644
> --- a/include/linux/acpi_iort.h
> +++ b/include/linux/acpi_iort.h
> @@ -28,7 +28,7 @@ void iort_deregister_domain_token(int trans_id);
>  struct fwnode_handle *iort_find_domain_token(int trans_id);
>  #ifdef CONFIG_ACPI_IORT
>  void acpi_iort_init(void);
> -u32 iort_msi_map_rid(struct device *dev, u32 req_id);
> +u32 iort_msi_map_id(struct device *dev, u32 id);
>  struct irq_domain *iort_get_device_domain(struct device *dev, u32 id,
> enum irq_domain_bus_token bus_token);
>  void acpi_configure_pmsi_domain(struct device *dev);
> @@ -39,8 +39,8 @@ const struct iommu_ops *iort_iommu_configure(struct device *dev);
>  int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head);
>  #else
>  static inline void acpi_iort_init(void) { }
> -static inline u32 iort_msi_map_rid(struct device *dev, u32 req_id)
> -{ return req_id; }
> +static inline u32 iort_msi_map_id(struct device *dev, u32 id)
> +{ return id; }
>  static inline struct irq_domain *iort_get_device_domain(
>   struct device *dev, u32 id, enum irq_domain_bus_token bus_token)
>  { return NULL; }
> -- 
> 2.26.1
> 


Re: [PATCH] PCI/ATS: PASID and PRI are only enumerated in PF devices.

2020-07-21 Thread Bjorn Helgaas
On Mon, Jul 20, 2020 at 09:43:00AM -0700, Ashok Raj wrote:
> PASID and PRI capabilities are only enumerated in PF devices. VF devices
> do not enumerate these capabilities. IOMMU drivers also need to enumerate
> them before enabling features in the IOMMU. Extending the same support as
> PASID feature discovery (pci_pasid_features) for PRI.
> 
> Signed-off-by: Ashok Raj 

Hi Ashok,

When you update this for the 0-day implicit declaration thing, can you
update the subject to say what the patch *does*, as opposed to what it
is solving?  Also, no need for a period at the end.

Does this fix a regression?  Is it associated with a commit that we
could add as a "Fixes:" tag so we know how far back to try to apply
to stable kernels?

> To: Bjorn Helgaas 
> To: Joerg Roedel 
> To: Lu Baolu 
> Cc: sta...@vger.kernel.org
> Cc: linux-...@vger.kernel.org
> Cc: linux-ker...@vger.kernel.org
> Cc: Ashok Raj 
> Cc: iommu@lists.linux-foundation.org
> ---
>  drivers/iommu/intel/iommu.c |  2 +-
>  drivers/pci/ats.c   | 14 ++
>  include/linux/pci-ats.h |  1 +
>  3 files changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index d759e7234e98..276452f5e6a7 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -2560,7 +2560,7 @@ static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu,
>   }
>  
>   if (info->ats_supported && ecap_prs(iommu->ecap) &&
> - pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI))
> + pci_pri_supported(pdev))
>   info->pri_supported = 1;
>   }
>   }
> diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> index b761c1f72f67..ffb4de8c5a77 100644
> --- a/drivers/pci/ats.c
> +++ b/drivers/pci/ats.c
> @@ -461,6 +461,20 @@ int pci_pasid_features(struct pci_dev *pdev)
>  }
>  EXPORT_SYMBOL_GPL(pci_pasid_features);
>  
> +/**
> + * pci_pri_supported - Check if PRI is supported.
> + * @pdev: PCI device structure
> + *
> + * Returns false when no PRI capability is present.
> + * Returns true if PRI feature is supported and enabled
> + */
> +bool pci_pri_supported(struct pci_dev *pdev)
> +{
> + /* VFs share the PF PRI configuration */
> + return !!(pci_physfn(pdev)->pri_cap);
> +}
> +EXPORT_SYMBOL_GPL(pci_pri_supported);
> +
>  #define PASID_NUMBER_SHIFT   8
>  #define PASID_NUMBER_MASK(0x1f << PASID_NUMBER_SHIFT)
>  /**
> diff --git a/include/linux/pci-ats.h b/include/linux/pci-ats.h
> index f75c307f346d..073d57292445 100644
> --- a/include/linux/pci-ats.h
> +++ b/include/linux/pci-ats.h
> @@ -28,6 +28,7 @@ int pci_enable_pri(struct pci_dev *pdev, u32 reqs);
>  void pci_disable_pri(struct pci_dev *pdev);
>  int pci_reset_pri(struct pci_dev *pdev);
>  int pci_prg_resp_pasid_required(struct pci_dev *pdev);
> +bool pci_pri_supported(struct pci_dev *pdev);
>  #endif /* CONFIG_PCI_PRI */
>  
>  #ifdef CONFIG_PCI_PASID
> -- 
> 2.7.4
> 


Re: [PATCH v4 4/4] PCI/ACS: Enable PCI_ACS_TB for untrusted/external-facing devices

2020-07-11 Thread Bjorn Helgaas
On Sat, Jul 11, 2020 at 05:08:51PM -0700, Rajat Jain wrote:
> On Sat, Jul 11, 2020 at 12:53 PM Bjorn Helgaas  wrote:
> > On Fri, Jul 10, 2020 at 03:53:59PM -0700, Rajat Jain wrote:
> > > On Fri, Jul 10, 2020 at 2:29 PM Raj, Ashok  wrote:
> > > > On Fri, Jul 10, 2020 at 03:29:22PM -0500, Bjorn Helgaas wrote:
> > > > > On Tue, Jul 07, 2020 at 03:46:04PM -0700, Rajat Jain wrote:
> > > > > > When enabling ACS, enable translation blocking for external facing ports
> > > > > > and untrusted devices.
> > > > > >
> > > > > > Signed-off-by: Rajat Jain 
> > > > > > ---
> > > > > > v4: Add braces to avoid warning from kernel robot
> > > > > > print warning for only external-facing devices.
> > > > > > v3: print warning if ACS_TB not supported on external-facing/untrusted ports.
> > > > > > Minor code comments fixes.
> > > > > > v2: Commit log change
> > > > > >
> > > > > >  drivers/pci/pci.c|  8 
> > > > > >  drivers/pci/quirks.c | 15 +++
> > > > > >  2 files changed, 23 insertions(+)
> > > > > >
> > > > > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > > > > > index 73a8627822140..a5a6bea7af7ce 100644
> > > > > > --- a/drivers/pci/pci.c
> > > > > > +++ b/drivers/pci/pci.c
> > > > > > @@ -876,6 +876,14 @@ static void pci_std_enable_acs(struct pci_dev *dev)
> > > > > > /* Upstream Forwarding */
> > > > > > ctrl |= (cap & PCI_ACS_UF);
> > > > > >
> > > > > > +   /* Enable Translation Blocking for external devices */
> > > > > > +   if (dev->external_facing || dev->untrusted) {
> > > > > > +   if (cap & PCI_ACS_TB)
> > > > > > +   ctrl |= PCI_ACS_TB;
> > > > > > +   else if (dev->external_facing)
> > > > > > +   pci_warn(dev, "ACS: No Translation Blocking on external-facing dev\n");
> > > > > > +   }
> > > > >
> > > > > IIUC, this means that external devices can *never* use ATS and
> > > > > can never cache translations.
> > >
> > > Yes, but it already exists today (and this patch doesn't change that):
> > > 521376741b2c2 "PCI/ATS: Only enable ATS for trusted devices"
> > >
> > > IMHO any external device trying to send ATS traffic despite having ATS
> > > disabled should count as a bad intent. And this patch is trying to
> > > plug that loophole, by blocking the AT traffic from devices that we do
> > > not expect to see AT from anyway.
> >
> > Thinking about this some more, I wonder if Linux should:
> >
> >   - Explicitly disable ATS for every device at enumeration-time, e.g.,
> > in pci_init_capabilities(),
> >
> >   - Enable PCI_ACS_TB for every device (not just external-facing or
> > untrusted ones),
> >
> >   - Disable PCI_ACS_TB for the relevant devices along the path only
> > when enabling ATS.
> >
> > One nice thing about doing that is that the "untrusted" test would be
> > only in pci_enable_ats(), and we wouldn't need one in
> > pci_std_enable_acs().
> 
> Yes, this could work.
> 
> I think I had thought about this but I'm blanking out on why I had
> given it up. I think it was because of the possibility that some
> bridges may have "Translation blocking" disabled, even if not all
> their descendents were trusted enough to enable ATS on them. But now
> thinking about this again, as long as we retain the policy of not
> enabling ATS on external devices (and thus enable TB for sure on
> them), this should not be a problem. WDYT?

I think I would feel better if we always enabled Translation Blocking
except when we actually need it for ATS.  But I'm not confident about
how all the pieces of ATS work, so I could be missing something.

> > It's possible BIOS gives us devices with ATS enabled, and this
> > might break them, but that seems like something we'd want to find
> > out about.
> 
> Why would they break? We'd disable ATS on each device as we
> enumerate them, so they'd be functional, just with ATS disabled
> until it is enabled again on internal devices as needed. Which would
> be WAI behavior?

If BIOS handed off with ATS enabled and we somehow relied on it being
already enabled, something might break if we start disabling ATS.
Just a theoretical possibility, doesn't seem likely to me.

Bjorn


Re: [PATCH v4 4/4] PCI/ACS: Enable PCI_ACS_TB for untrusted/external-facing devices

2020-07-11 Thread Bjorn Helgaas
On Fri, Jul 10, 2020 at 03:53:59PM -0700, Rajat Jain wrote:
> On Fri, Jul 10, 2020 at 2:29 PM Raj, Ashok  wrote:
> > On Fri, Jul 10, 2020 at 03:29:22PM -0500, Bjorn Helgaas wrote:
> > > On Tue, Jul 07, 2020 at 03:46:04PM -0700, Rajat Jain wrote:
> > > > When enabling ACS, enable translation blocking for external facing ports
> > > > and untrusted devices.
> > > >
> > > > Signed-off-by: Rajat Jain 
> > > > ---
> > > > v4: Add braces to avoid warning from kernel robot
> > > > print warning for only external-facing devices.
> > > > v3: print warning if ACS_TB not supported on external-facing/untrusted ports.
> > > > Minor code comments fixes.
> > > > v2: Commit log change
> > > >
> > > >  drivers/pci/pci.c|  8 
> > > >  drivers/pci/quirks.c | 15 +++
> > > >  2 files changed, 23 insertions(+)
> > > >
> > > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > > > index 73a8627822140..a5a6bea7af7ce 100644
> > > > --- a/drivers/pci/pci.c
> > > > +++ b/drivers/pci/pci.c
> > > > @@ -876,6 +876,14 @@ static void pci_std_enable_acs(struct pci_dev *dev)
> > > > /* Upstream Forwarding */
> > > > ctrl |= (cap & PCI_ACS_UF);
> > > >
> > > > +   /* Enable Translation Blocking for external devices */
> > > > +   if (dev->external_facing || dev->untrusted) {
> > > > +   if (cap & PCI_ACS_TB)
> > > > +   ctrl |= PCI_ACS_TB;
> > > > +   else if (dev->external_facing)
> > > > +   pci_warn(dev, "ACS: No Translation Blocking on external-facing dev\n");
> > > > +   }
> > >
> > > IIUC, this means that external devices can *never* use ATS and
> > > can never cache translations.
> 
> Yes, but it already exists today (and this patch doesn't change that):
> 521376741b2c2 "PCI/ATS: Only enable ATS for trusted devices"
> 
> IMHO any external device trying to send ATS traffic despite having ATS
> disabled should count as a bad intent. And this patch is trying to
> plug that loophole, by blocking the AT traffic from devices that we do
> not expect to see AT from anyway.

Thinking about this some more, I wonder if Linux should:

  - Explicitly disable ATS for every device at enumeration-time, e.g.,
in pci_init_capabilities(), 

  - Enable PCI_ACS_TB for every device (not just external-facing or
untrusted ones),

  - Disable PCI_ACS_TB for the relevant devices along the path only
when enabling ATS.

One nice thing about doing that is that the "untrusted" test would be
only in pci_enable_ats(), and we wouldn't need one in
pci_std_enable_acs().

It's possible BIOS gives us devices with ATS enabled, and this might
break them, but that seems like something we'd want to find out about.
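
The three steps above can be modelled in a few lines of userspace C. This
is a toy sketch and not kernel code: struct dev_model and both functions are
invented for illustration, and it assumes that "the relevant devices along
the path" means the device itself plus every bridge above it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a device; field names are illustrative, not the kernel's. */
struct dev_model {
	struct dev_model *parent;	/* upstream bridge, NULL at the root */
	bool untrusted;
	bool ats_enabled;
	bool acs_tb;			/* ACS Translation Blocking */
};

/* Enumeration-time default: ATS off, Translation Blocking on. */
static void enumerate(struct dev_model *d)
{
	d->ats_enabled = false;
	d->acs_tb = true;
}

/* The only place that needs the "untrusted" test. */
static int enable_ats(struct dev_model *d)
{
	struct dev_model *p;

	if (d->untrusted)
		return -1;

	/* Assumption: unblock the device and every bridge above it. */
	for (p = d; p; p = p->parent)
		p->acs_tb = false;

	d->ats_enabled = true;
	return 0;
}
```

In this model an untrusted device never leaves the blocked state, while a
bridge shared with a trusted device loses Translation Blocking as soon as
ATS is enabled below it. Whether that is acceptable for a bridge that also
has untrusted descendants is exactly the open question in this thread.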

Bjorn


Re: [PATCH v4 4/4] PCI/ACS: Enable PCI_ACS_TB for untrusted/external-facing devices

2020-07-10 Thread Bjorn Helgaas
On Fri, Jul 10, 2020 at 03:53:59PM -0700, Rajat Jain wrote:
> On Fri, Jul 10, 2020 at 2:29 PM Raj, Ashok  wrote:
> > On Fri, Jul 10, 2020 at 03:29:22PM -0500, Bjorn Helgaas wrote:
> > > On Tue, Jul 07, 2020 at 03:46:04PM -0700, Rajat Jain wrote:
> > > > When enabling ACS, enable translation blocking for external facing ports
> > > > and untrusted devices.
> > > >
> > > > Signed-off-by: Rajat Jain 
> > > > ---
> > > > v4: Add braces to avoid warning from kernel robot
> > > > print warning for only external-facing devices.
> > > > v3: print warning if ACS_TB not supported on external-facing/untrusted ports.
> > > > Minor code comments fixes.
> > > > v2: Commit log change
> > > >
> > > >  drivers/pci/pci.c|  8 
> > > >  drivers/pci/quirks.c | 15 +++
> > > >  2 files changed, 23 insertions(+)
> > > >
> > > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > > > index 73a8627822140..a5a6bea7af7ce 100644
> > > > --- a/drivers/pci/pci.c
> > > > +++ b/drivers/pci/pci.c
> > > > @@ -876,6 +876,14 @@ static void pci_std_enable_acs(struct pci_dev *dev)
> > > > /* Upstream Forwarding */
> > > > ctrl |= (cap & PCI_ACS_UF);
> > > >
> > > > +   /* Enable Translation Blocking for external devices */
> > > > +   if (dev->external_facing || dev->untrusted) {
> > > > +   if (cap & PCI_ACS_TB)
> > > > +   ctrl |= PCI_ACS_TB;
> > > > +   else if (dev->external_facing)
> > > > +   pci_warn(dev, "ACS: No Translation Blocking on external-facing dev\n");
> > > > +   }
> > >
> > > IIUC, this means that external devices can *never* use ATS and can
> > > never cache translations.
> 
> Yes, but it already exists today (and this patch doesn't change that):
> 521376741b2c2 "PCI/ATS: Only enable ATS for trusted devices"

If you get in the habit of using the commit reference style from
Documentation/process/submitting-patches.rst it saves me the trouble
of fixing them.  I use this:

  gsr is aliased to `git --no-pager show -s --abbrev-commit --abbrev=12 --pretty=format:"%h (\"%s\")%n"'

> IMHO any external device trying to send ATS traffic despite having
> ATS disabled should count as a bad intent. And this patch is trying
> to plug that loophole, by blocking the AT traffic from devices that
> we do not expect to see AT from anyway.

That's exactly the sort of assertion I was looking for.  If we can get
something like this explanation into the commit log, and if Ashok and
Alex are OK with this, we'll be much closer.

It sounds like this is just enforcing a restriction we already have,
i.e., enabling PCI_ACS_TB blocks translated requests from devices that
aren't supposed to be generating them.
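
As a rough sketch of what the quoted hunk computes, the control value
can be derived from the capability bits like this.  The bit values
are the ones I believe pci_regs.h uses; toy_acs_ctrl_for() is a
made-up helper for illustration, not a kernel function.

```c
#include <stdbool.h>
#include <stdint.h>

/* ACS capability/control bits, values as in include/uapi/linux/pci_regs.h */
#define PCI_ACS_SV 0x0001 /* Source Validation */
#define PCI_ACS_TB 0x0002 /* Translation Blocking */
#define PCI_ACS_RR 0x0004 /* P2P Request Redirect */
#define PCI_ACS_CR 0x0008 /* P2P Completion Redirect */
#define PCI_ACS_UF 0x0010 /* Upstream Forwarding */

/*
 * Hypothetical helper: return the new ACS control value.  The four
 * usual bits are enabled whenever the hardware supports them; TB is
 * added only for external-facing or untrusted devices.
 */
static uint16_t toy_acs_ctrl_for(uint16_t cap, uint16_t ctrl,
				 bool external_facing, bool untrusted)
{
	ctrl |= cap & (PCI_ACS_SV | PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF);

	if ((external_facing || untrusted) && (cap & PCI_ACS_TB))
		ctrl |= PCI_ACS_TB;

	return ctrl;
}
```

A port that advertises TB but is neither external-facing nor
untrusted keeps TB clear, so translated requests from internal
devices are unaffected.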

> Do you see any case where this is not true?
> 
> > > And (I guess, I'm not an expert) it can
> > > also never use the Page Request Services?
> >
> > Yep, sounds like it.
> 
> Yes, from spec "Address Translation Services" Rev 1.1:
> "...a device that supports ATS need not support PRI, but PRI is
> dependent on ATS’s capabilities."
> (So no ATS = No PRI).
> 
> > > Is this what we want?  Do we have any idea how many external
> > > devices this will affect or how much of a performance impact
> > > they will see?
> > >
> > > Do we need some kind of override or mechanism to authenticate
> > > certain devices so they can use ATS and PRI?
> >
> > Sounds like we would need some form of an allow-list to start with
> > so we can have something in the interim.
> 
> I assume what is being referred to, is an escape hatch to enable ATS
> on certain given "external-facing" ports (and devices downstream on
> that port). Do we really think a *per-port* control for ATS may be
> needed? I can add if there is consensus about this.
> 
> > I suppose a future platform might have a facility to ensure ATS is
> > secure and authenticated we could enable for all of devices in the
> > system, in addition to PCI CMA/IDE.
> >
> > I think having a global override to enable all devices so platform
> > can switch to current behavior, or maybe via a cmdline switch.. as
> > much as we have a billion of those, it still gives an option in
> > case someone needs it.
> 
> Currently:
> 
> pci.noats => No ATS on all PCI devices.
> (Absense 

Re: [PATCH v4 1/4] PCI: Move pci_enable_acs() and its dependencies up in pci.c

2020-07-10 Thread Bjorn Helgaas
On Tue, Jul 07, 2020 at 03:46:01PM -0700, Rajat Jain wrote:
> Move pci_enable_acs() and the functions it depends on, further up in the
> source code to avoid having to forward declare it when we make it static
> in near future (next patch).
> 
> No functional changes intended.
> 
> Signed-off-by: Rajat Jain 

Applied patches 1-3 to pci/enumeration for v5.9, thanks!

I held off on patch 4 (enabling PCI_ACS_TB) until we have a little
more conversation on the impact of it.

> ---
> v4: Same as v3
> v3: Initial version of the patch, created per Bjorn's suggestion
> 
>  drivers/pci/pci.c | 254 +++---
>  1 file changed, 127 insertions(+), 127 deletions(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index ce096272f52b1..eec625f0e594e 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -777,6 +777,133 @@ int pci_wait_for_pending(struct pci_dev *dev, int pos, u16 mask)
>   return 0;
>  }
>  
> +static int pci_acs_enable;
> +
> +/**
> + * pci_request_acs - ask for ACS to be enabled if supported
> + */
> +void pci_request_acs(void)
> +{
> + pci_acs_enable = 1;
> +}
> +
> +static const char *disable_acs_redir_param;
> +
> +/**
> + * pci_disable_acs_redir - disable ACS redirect capabilities
> + * @dev: the PCI device
> + *
> + * For only devices specified in the disable_acs_redir parameter.
> + */
> +static void pci_disable_acs_redir(struct pci_dev *dev)
> +{
> + int ret = 0;
> + const char *p;
> + int pos;
> + u16 ctrl;
> +
> + if (!disable_acs_redir_param)
> + return;
> +
> + p = disable_acs_redir_param;
> + while (*p) {
> + ret = pci_dev_str_match(dev, p, &p);
> + if (ret < 0) {
> + pr_info_once("PCI: Can't parse disable_acs_redir parameter: %s\n",
> +  disable_acs_redir_param);
> +
> + break;
> + } else if (ret == 1) {
> + /* Found a match */
> + break;
> + }
> +
> + if (*p != ';' && *p != ',') {
> + /* End of param or invalid format */
> + break;
> + }
> + p++;
> + }
> +
> + if (ret != 1)
> + return;
> +
> + if (!pci_dev_specific_disable_acs_redir(dev))
> + return;
> +
> + pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ACS);
> + if (!pos) {
> + pci_warn(dev, "cannot disable ACS redirect for this hardware as it does not have ACS capabilities\n");
> + return;
> + }
> +
> + pci_read_config_word(dev, pos + PCI_ACS_CTRL, &ctrl);
> +
> + /* P2P Request & Completion Redirect */
> + ctrl &= ~(PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_EC);
> +
> + pci_write_config_word(dev, pos + PCI_ACS_CTRL, ctrl);
> +
> + pci_info(dev, "disabled ACS redirect\n");
> +}
> +
> +/**
> + * pci_std_enable_acs - enable ACS on devices using standard ACS capabilities
> + * @dev: the PCI device
> + */
> +static void pci_std_enable_acs(struct pci_dev *dev)
> +{
> + int pos;
> + u16 cap;
> + u16 ctrl;
> +
> + pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ACS);
> + if (!pos)
> + return;
> +
> + pci_read_config_word(dev, pos + PCI_ACS_CAP, &cap);
> + pci_read_config_word(dev, pos + PCI_ACS_CTRL, &ctrl);
> +
> + /* Source Validation */
> + ctrl |= (cap & PCI_ACS_SV);
> +
> + /* P2P Request Redirect */
> + ctrl |= (cap & PCI_ACS_RR);
> +
> + /* P2P Completion Redirect */
> + ctrl |= (cap & PCI_ACS_CR);
> +
> + /* Upstream Forwarding */
> + ctrl |= (cap & PCI_ACS_UF);
> +
> + pci_write_config_word(dev, pos + PCI_ACS_CTRL, ctrl);
> +}
> +
> +/**
> + * pci_enable_acs - enable ACS if hardware support it
> + * @dev: the PCI device
> + */
> +void pci_enable_acs(struct pci_dev *dev)
> +{
> + if (!pci_acs_enable)
> + goto disable_acs_redir;
> +
> + if (!pci_dev_specific_enable_acs(dev))
> + goto disable_acs_redir;
> +
> + pci_std_enable_acs(dev);
> +
> +disable_acs_redir:
> + /*
> +  * Note: pci_disable_acs_redir() must be called even if ACS was not
> +  * enabled by the kernel because it may have been enabled by
> +  * platform firmware.  So if we are told to disable it, we should
> +  * always disable it after setting the kernel's default
> +  * preferences.
> +  */
> + pci_disable_acs_redir(dev);
> +}
> +
>  /**
>   * pci_restore_bars - restore a device's BAR values (e.g. after wake-up)
>   * @dev: PCI device to have its BARs restored
> @@ -3230,133 +3357,6 @@ void pci_configure_ari(struct pci_dev *dev)
>   }
>  }
>  
> -static int pci_acs_enable;
> -
> -/**
> - * pci_request_acs - ask for ACS to be enabled if supported
> - */
> -void pci_request_acs(void)
> -{
> - pci_acs_enable = 1;
> -}
> -
> -static const char *disable_acs_redir_param;
> -
> -/**
> - * pci_disable_acs_redir - 

Re: [PATCH v4 4/4] PCI/ACS: Enable PCI_ACS_TB for untrusted/external-facing devices

2020-07-10 Thread Bjorn Helgaas
On Tue, Jul 07, 2020 at 03:46:04PM -0700, Rajat Jain wrote:
> When enabling ACS, enable translation blocking for external facing ports
> and untrusted devices.
> 
> Signed-off-by: Rajat Jain 
> ---
> v4: Add braces to avoid warning from kernel robot
> print warning for only external-facing devices.
> v3: print warning if ACS_TB not supported on external-facing/untrusted ports.
> Minor code comments fixes.
> v2: Commit log change
> 
>  drivers/pci/pci.c|  8 
>  drivers/pci/quirks.c | 15 +++
>  2 files changed, 23 insertions(+)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 73a8627822140..a5a6bea7af7ce 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -876,6 +876,14 @@ static void pci_std_enable_acs(struct pci_dev *dev)
>   /* Upstream Forwarding */
>   ctrl |= (cap & PCI_ACS_UF);
>  
> + /* Enable Translation Blocking for external devices */
> + if (dev->external_facing || dev->untrusted) {
> + if (cap & PCI_ACS_TB)
> + ctrl |= PCI_ACS_TB;
> + else if (dev->external_facing)
> + pci_warn(dev, "ACS: No Translation Blocking on external-facing dev\n");
> + }

IIUC, this means that external devices can *never* use ATS and can
never cache translations.  And (I guess, I'm not an expert) it can
also never use the Page Request Services?

Is this what we want?  Do we have any idea how many external devices
this will affect or how much of a performance impact they will see?

Do we need some kind of override or mechanism to authenticate certain
devices so they can use ATS and PRI?

If we do decide this is the right thing to do, I think we need to
expand the commit log a bit, because this is potentially a significant
user-visible change.

>   pci_write_config_word(dev, pos + PCI_ACS_CTRL, ctrl);
>  }
>  
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index b341628e47527..bb22b46c1d719 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -4934,6 +4934,13 @@ static void pci_quirk_enable_intel_rp_mpc_acs(struct pci_dev *dev)
>   }
>  }
>  
> +/*
> + * Currently this quirk does the equivalent of
> + * PCI_ACS_SV | PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF
> + *
> + * TODO: This quirk also needs to do equivalent of PCI_ACS_TB,
> + * if dev->external_facing || dev->untrusted
> + */
>  static int pci_quirk_enable_intel_pch_acs(struct pci_dev *dev)
>  {
>   if (!pci_quirk_intel_pch_acs_match(dev))
> @@ -4973,6 +4980,14 @@ static int pci_quirk_enable_intel_spt_pch_acs(struct pci_dev *dev)
>   ctrl |= (cap & PCI_ACS_CR);
>   ctrl |= (cap & PCI_ACS_UF);
>  
> + /* Enable Translation Blocking for external devices */
> + if (dev->external_facing || dev->untrusted) {
> + if (cap & PCI_ACS_TB)
> + ctrl |= PCI_ACS_TB;
> + else if (dev->external_facing)
> + pci_warn(dev, "ACS: No Translation Blocking on external-facing dev\n");
> + }
> +
>   pci_write_config_dword(dev, pos + INTEL_SPT_ACS_CTRL, ctrl);
>  
>   pci_info(dev, "Intel SPT PCH root port ACS workaround enabled\n");
> -- 
> 2.27.0.212.ge8ba1cc988-goog
> 


Re: [PATCH RESEND v2] PCI: Add device even if driver attach failed

2020-07-07 Thread Bjorn Helgaas
On Mon, Jul 06, 2020 at 04:32:40PM -0700, Rajat Jain wrote:
> device_attach() returning failure indicates a driver error while trying to
> probe the device. In such a scenario, the PCI device should still be added
> in the system and be visible to the user.
> 
> This patch partially reverts:
> commit ab1a187bba5c ("PCI: Check device_attach() return value always")
> 
> Signed-off-by: Rajat Jain 
> Reviewed-by: Greg Kroah-Hartman 
> ---
> Resending to stable, independent from other patches per Greg's suggestion
> v2: Add Greg's reviewed by, fix commit log

Applied to pci/enumeration for v5.8 with stable tag, thanks!
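
The resulting behavior of the quoted change can be sketched in plain
C as below.  This is an illustrative model only: struct toy_bus_dev
and toy_bus_add_device() are made-up names, and the two flags stand
in for pci_warn() and pci_dev_assign_added().

```c
#include <stdbool.h>

#define EPROBE_DEFER 517 /* kernel-internal errno value */

/* Hypothetical record of what happened to a device. */
struct toy_bus_dev {
	bool warned; /* a warning was logged for a failed attach */
	bool added;  /* device was marked as added to the system */
};

/*
 * After the partial revert: a failed attach is logged, but the
 * device is still marked as added so it stays visible to the user.
 */
static void toy_bus_add_device(struct toy_bus_dev *dev, int attach_ret)
{
	if (attach_ret < 0 && attach_ret != -EPROBE_DEFER)
		dev->warned = true;

	dev->added = true;
}
```

Deferred probing is not treated as an error here, which matches the
condition in the quoted diff.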

>  drivers/pci/bus.c | 6 +-
>  1 file changed, 1 insertion(+), 5 deletions(-)
> 
> diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
> index 8e40b3e6da77d..3cef835b375fd 100644
> --- a/drivers/pci/bus.c
> +++ b/drivers/pci/bus.c
> @@ -322,12 +322,8 @@ void pci_bus_add_device(struct pci_dev *dev)
>  
>   dev->match_driver = true;
>   retval = device_attach(&dev->dev);
> - if (retval < 0 && retval != -EPROBE_DEFER) {
> + if (retval < 0 && retval != -EPROBE_DEFER)
>   pci_warn(dev, "device attach failed (%d)\n", retval);
> - pci_proc_detach_device(dev);
> - pci_remove_sysfs_dev_files(dev);
> - return;
> - }
>  
>   pci_dev_assign_added(dev, true);
>  }
> -- 
> 2.27.0.212.ge8ba1cc988-goog
> 


Re: [PATCH v2 2/7] PCI: Set "untrusted" flag for truly external devices only

2020-07-06 Thread Bjorn Helgaas
On Mon, Jul 06, 2020 at 03:31:47PM -0700, Rajat Jain wrote:
> On Mon, Jul 6, 2020 at 9:38 AM Bjorn Helgaas  wrote:
> > On Mon, Jun 29, 2020 at 09:49:38PM -0700, Rajat Jain wrote:

> > > -static void pci_acpi_set_untrusted(struct pci_dev *dev)
> > > +static void pci_acpi_set_external_facing(struct pci_dev *dev)
> > >  {
> > >   u8 val;
> > >
> > > - if (pci_pcie_type(dev) != PCI_EXP_TYPE_ROOT_PORT)
> > > + if (pci_pcie_type(dev) != PCI_EXP_TYPE_ROOT_PORT &&
> > > + pci_pcie_type(dev) != PCI_EXP_TYPE_DOWNSTREAM)
> >
> > This looks like a change worthy of its own patch.  We used to look for
> > "ExternalFacingPort" only on Root Ports; now we'll also do it for
> > Switch Downstream Ports.
> 
> Can do. (please see below)
> 
> > Can you include DT and ACPI spec references if they exist?  I found
> > this mention:
> > https://docs.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-root-ports
> > which actually says it should only be implemented for Root Ports.
> 
> I actually have no references. It seems to me that the microsoft spec
> assumes that all external ports must be implemented on root ports, but
> I think it would be equally fair for systems with PCIe switches to
> implement one on one of their switch downstream ports. I don't have an
> immediate use of this anyway, so if you think this should rather wait
> unless someone really has this case, this can wait. Let me know.

I agree that it "makes sense" to pay attention to this property no
matter where it appears, but since that Microsoft doc went to the
trouble to restrict it to Root Ports, I think we should leave this
as-is and only look for it in the Root Port.  Otherwise Linux will
accept something Windows will reject, and that seems like a needless
difference.

We can at least include the above link to the Microsoft doc in the
commit log.

> > It also mentions a "DmaProperty" that looks related.  Maybe Linux
> > should also pay attention to this?
> 
> Interesting. Since this is not in use currently by the kernel as well
> as not exposed by (our) BIOS, I don't have an immediate use case for
> this. I'd like to defer this for later (as-the-need-arises).

I agree, you can defer this until you see a need for it.  I just
pointed it out in case it would be useful to you.

> > > + /*
> > > +  * Devices are marked as external-facing using info from platform
> > > +  * (ACPI / devicetree). An external-facing device is still an 
> > > internal
> > > +  * trusted device, but it faces external untrusted devices. Thus any
> > > +  * devices enumerated downstream an external-facing device is marked
> > > +  * as untrusted.
> >
> > This comment has a subject/verb agreement problem.
> 
> I assume you meant s/is/are/ in last sentence. Will do.

Right.  There's also something wrong with "enumerated downstream an".


Re: [PATCH v2 1/7] PCI: Keep the ACS capability offset in device

2020-07-06 Thread Bjorn Helgaas
On Mon, Jul 06, 2020 at 03:16:42PM -0700, Rajat Jain wrote:
> On Mon, Jul 6, 2020 at 8:58 AM Bjorn Helgaas  wrote:
> > On Mon, Jun 29, 2020 at 09:49:37PM -0700, Rajat Jain wrote:

> > > +static void pci_enable_acs(struct pci_dev *dev);
> >
> > I don't think we need this forward declaration, do we?
> 
> We need it unless we move its definition further up in the file:
> 
> drivers/pci/pci.c: In function ‘pci_restore_state’:
> drivers/pci/pci.c:1551:2: error: implicit declaration of function
> ‘pci_enable_acs’; did you mean ‘pci_enable_ats’?
> [-Werror=implicit-function-declaration]
>  1551 |  pci_enable_acs(dev);
> 
> Do you want me to move it up in the file so that we do not need the
> forward declaration?

Yes, please move it.  Maybe a preliminary patch that moves it but
doesn't change anything else.

I think I thought you had renamed the function, in which case you
could tell from the patch itself.  But I was mistaken!
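
To spell out why moving the definition removes the need for the
forward declaration, here is a tiny stand-alone C sketch.  The
function names and bodies are placeholders, not the kernel code.

```c
/*
 * The callee is defined before its first use, so the compiler has
 * already seen its prototype and no separate
 * "static int toy_enable_acs(int dev_id);" line is required.
 */
static int toy_enable_acs(int dev_id)
{
	return dev_id + 1; /* placeholder body */
}

/* Caller placed later in the file, like pci_restore_state(). */
static int toy_restore_state(int dev_id)
{
	return toy_enable_acs(dev_id);
}
```

If the two definitions were swapped, the compiler would report an
implicit declaration, as in the error quoted above.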

> > > @@ -4653,7 +4653,7 @@ static int pci_quirk_intel_spt_pch_acs(struct 
> > > pci_dev *dev, u16 acs_flags)
> > >   if (!pci_quirk_intel_spt_pch_acs_match(dev))
> > >   return -ENOTTY;
> > >
> > > - pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ACS);
> > > + pos = dev->acs_cap;
> >
> > I assume you verified that all these quirks are FINAL quirks, since
> > pci_init_capabilities() is called after HEADER quirks.  I'll
> > double-check before applying this.
> 
> None of these quirks are applied via DECLARE_PCI_FIXUP_*(). All these
> quirks are called (directly or indirectly) from either
> pci_enable_acs() or pci_acs_enabled(),
> 
> EXCEPT
> 
> pci_idt_bus_quirk(). That one is called from
> pci_bus_read_dev_vendor_id() which should be called only after the
> parent bridge has been added and setup correctly.
> 
> So it looks all good to me.

Great, thanks for checking that.

Re: [PATCH v2 3/7] PCI/ACS: Enable PCI_ACS_TB for untrusted/external-facing devices

2020-07-06 Thread Bjorn Helgaas
On Mon, Jun 29, 2020 at 09:49:39PM -0700, Rajat Jain wrote:
> When enabling ACS, enable translation blocking for external facing ports
> and untrusted devices.
> 
> Signed-off-by: Rajat Jain 
> ---
> v2: Commit log change 
> 
>  drivers/pci/pci.c|  4 
>  drivers/pci/quirks.c | 11 +++
>  2 files changed, 15 insertions(+)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index d2ff987585855..79853b52658a2 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -3330,6 +3330,10 @@ static void pci_std_enable_acs(struct pci_dev *dev)
>   /* Upstream Forwarding */
>   ctrl |= (cap & PCI_ACS_UF);
>  
> + if (dev->external_facing || dev->untrusted)
> + /* Translation Blocking */
> + ctrl |= (cap & PCI_ACS_TB);
> +
>   pci_write_config_word(dev, pos + PCI_ACS_CTRL, ctrl);
>  }
>  
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index b341628e47527..6294adeac4049 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -4934,6 +4934,13 @@ static void pci_quirk_enable_intel_rp_mpc_acs(struct pci_dev *dev)
>   }
>  }
>  
> +/*
> + * Currently this quirk does the equivalent of
> + * PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF | PCI_ACS_SV

Nit: Reorder these as in c8de8ed2dcaa ("PCI: Make ACS quirk
implementations more uniform") so they match other similar lists in
the code.

But more to the point: we have a bunch of other quirks for devices
that do not have an ACS capability but *do* provide some ACS-like
features.  Most of them support

  PCI_ACS_SV | PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF

because that's what we usually want.  But I bet some of them also
actually provide the equivalent of PCI_ACS_TB.

REQ_ACS_FLAGS doesn't include PCI_ACS_TB.  Is there anything we need
to do on the pci_acs_enabled() side to check for PCI_ACS_TB, and
consequently, to update any of the quirks for devices that provide it?
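
To make the question concrete: a check of this general shape passes
for the usual four flags but says nothing about Translation Blocking
unless PCI_ACS_TB is added to the requested set.  This is a sketch
under assumptions; toy_acs_flags_enabled() is a made-up helper, and
the REQ_ACS_FLAGS definition is reproduced from memory.

```c
#include <stdbool.h>
#include <stdint.h>

/* ACS bits, values as in include/uapi/linux/pci_regs.h */
#define PCI_ACS_SV 0x0001 /* Source Validation */
#define PCI_ACS_TB 0x0002 /* Translation Blocking */
#define PCI_ACS_RR 0x0004 /* P2P Request Redirect */
#define PCI_ACS_CR 0x0008 /* P2P Completion Redirect */
#define PCI_ACS_UF 0x0010 /* Upstream Forwarding */

/* The usual set; note that PCI_ACS_TB is not part of it. */
#define REQ_ACS_FLAGS (PCI_ACS_SV | PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF)

/* Hypothetical helper: true only if every requested bit is set in ctrl. */
static bool toy_acs_flags_enabled(uint16_t ctrl, uint16_t acs_flags)
{
	return (ctrl & acs_flags) == acs_flags;
}
```

A quirk that reports only the four usual bits would fail a request
that also includes PCI_ACS_TB, which is why the quirks may need an
update if callers start asking for it.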

> + *
> + * Currently missing, it also needs to do equivalent of PCI_ACS_TB,
> + * if dev->external_facing || dev->untrusted
> + */
>  static int pci_quirk_enable_intel_pch_acs(struct pci_dev *dev)
>  {
>   if (!pci_quirk_intel_pch_acs_match(dev))
> @@ -4973,6 +4980,10 @@ static int pci_quirk_enable_intel_spt_pch_acs(struct pci_dev *dev)
>   ctrl |= (cap & PCI_ACS_CR);
>   ctrl |= (cap & PCI_ACS_UF);
>  
> + if (dev->external_facing || dev->untrusted)
> + /* Translation Blocking */
> + ctrl |= (cap & PCI_ACS_TB);
> +
>   pci_write_config_dword(dev, pos + INTEL_SPT_ACS_CTRL, ctrl);
>  
>   pci_info(dev, "Intel SPT PCH root port ACS workaround enabled\n");
> -- 
> 2.27.0.212.ge8ba1cc988-goog
> 


Re: [PATCH v2 3/7] PCI/ACS: Enable PCI_ACS_TB for untrusted/external-facing devices

2020-07-06 Thread Bjorn Helgaas
On Mon, Jun 29, 2020 at 09:49:39PM -0700, Rajat Jain wrote:
> When enabling ACS, enable translation blocking for external facing ports
> and untrusted devices.
> 
> Signed-off-by: Rajat Jain 
> ---
> v2: Commit log change 
> 
>  drivers/pci/pci.c|  4 
>  drivers/pci/quirks.c | 11 +++
>  2 files changed, 15 insertions(+)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index d2ff987585855..79853b52658a2 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -3330,6 +3330,10 @@ static void pci_std_enable_acs(struct pci_dev *dev)
>   /* Upstream Forwarding */
>   ctrl |= (cap & PCI_ACS_UF);
>  
> + if (dev->external_facing || dev->untrusted)
> + /* Translation Blocking */
> + ctrl |= (cap & PCI_ACS_TB);
> +
>   pci_write_config_word(dev, pos + PCI_ACS_CTRL, ctrl);
>  }
>  
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index b341628e47527..6294adeac4049 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -4934,6 +4934,13 @@ static void pci_quirk_enable_intel_rp_mpc_acs(struct pci_dev *dev)
>   }
>  }
>  
> +/*
> + * Currently this quirk does the equivalent of
> + * PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF | PCI_ACS_SV
> + *
> + * Currently missing, it also needs to do equivalent of PCI_ACS_TB,
> + * if dev->external_facing || dev->untrusted

I don't understand this comment.  Is this a "TODO"?  Is there
something more that needs to be done here?

After a patch is applied, a comment should describe the code as it is.

> + */
>  static int pci_quirk_enable_intel_pch_acs(struct pci_dev *dev)
>  {
>   if (!pci_quirk_intel_pch_acs_match(dev))
> @@ -4973,6 +4980,10 @@ static int pci_quirk_enable_intel_spt_pch_acs(struct pci_dev *dev)
>   ctrl |= (cap & PCI_ACS_CR);
>   ctrl |= (cap & PCI_ACS_UF);
>  
> + if (dev->external_facing || dev->untrusted)
> + /* Translation Blocking */
> + ctrl |= (cap & PCI_ACS_TB);
> +
>   pci_write_config_dword(dev, pos + INTEL_SPT_ACS_CTRL, ctrl);
>  
>   pci_info(dev, "Intel SPT PCH root port ACS workaround enabled\n");
> -- 
> 2.27.0.212.ge8ba1cc988-goog
> 


Re: [PATCH v2 2/7] PCI: Set "untrusted" flag for truly external devices only

2020-07-06 Thread Bjorn Helgaas
On Tue, Jun 30, 2020 at 09:55:54AM +0200, Greg Kroah-Hartman wrote:
> On Mon, Jun 29, 2020 at 09:49:38PM -0700, Rajat Jain wrote:
> > The "ExternalFacing" devices (root ports) are still internal devices that
> > sit on the internal system fabric and thus trusted. Currently they were
> > being marked untrusted.
> > 
> > This patch uses the platform flag to identify the external facing devices
> > and then use it to mark any downstream devices as "untrusted". The
> > external-facing devices themselves are left as "trusted". This was
> > discussed here: https://lkml.org/lkml/2020/6/10/1049
> 
> {sigh}
> 
> First off, please use lore.kernel.org links, we don't control lkml.org
> and it often times has been down.
> 
> Also, you need to put all of the information in the changelog, referring
> to another place isn't always the best thing, considering you will be
> looking this up in 20+ years to try to figure out why people came up
> with such a crazy design.
> 
> But, the main point is, no, we did not decide on this.  "trust" is a
> policy decision to make by userspace, it is independent of "location",
> while you are tying it directly here, which is what I explicitly said
> NOT to do.
> 
> So again, no, I will NAK this patch as-is, sorry, you are mixing things
> together in a way that it should not do at this point in time.

What do you see being mixed together here?  I acknowledge that the
name of "pdev->untrusted" is probably a mistake.  But this patch
doesn't change anything there.  It only changes the treatment of the
edge case of the "ExternalFacing" ports.  Previously we treated them
as being external themselves, which does seem wrong.


Re: [PATCH v2 2/7] PCI: Set "untrusted" flag for truly external devices only

2020-07-06 Thread Bjorn Helgaas
On Mon, Jun 29, 2020 at 09:49:38PM -0700, Rajat Jain wrote:
> The "ExternalFacing" devices (root ports) are still internal devices that
> sit on the internal system fabric and thus trusted. Currently they were
> being marked untrusted.
> 
> This patch uses the platform flag to identify the external facing devices
> and then use it to mark any downstream devices as "untrusted". The
> external-facing devices themselves are left as "trusted". This was
> discussed here: https://lkml.org/lkml/2020/6/10/1049

Use the imperative mood in the commit log, as you did for 1/7.  E.g.,
instead of "This patch uses ...", say "Use the platform flag ...".
That helps all the commit logs read nicely together.

I think this patch makes two changes that should be separated:

  - Treat "external-facing" devices as internal.

  - Look for the "external-facing" or "ExternalFacing" property on
Switch Downstream Ports as well as Root Ports.

> Signed-off-by: Rajat Jain 
> ---
> v2: cosmetic changes in commit log
> 
>  drivers/iommu/intel/iommu.c |  2 +-
>  drivers/pci/of.c|  2 +-
>  drivers/pci/pci-acpi.c  | 13 +++--
>  drivers/pci/probe.c |  2 +-
>  include/linux/pci.h |  8 
>  5 files changed, 18 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index d759e7234e982..1ccb224f82496 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -4743,7 +4743,7 @@ static inline bool has_untrusted_dev(void)
>   struct pci_dev *pdev = NULL;
>  
>   for_each_pci_dev(pdev)
> - if (pdev->untrusted)
> + if (pdev->untrusted || pdev->external_facing)

I think checking pdev->external_facing is enough for this case,
because it's impossible to have pdev->untrusted unless a parent has
pdev->external_facing.

IIUC, this usage is asking "might we ever have an external device?"
as opposed to the "pdev->untrusted" uses, which are asking "is *this*
device an external device?"
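
The two questions can be separated as in the sketch below.  It is a
toy model in plain C: struct toy_pdev and both helpers are made-up
names, and an array stands in for for_each_pci_dev().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct pci_dev. */
struct toy_pdev {
	bool external_facing; /* port that faces external devices */
	bool untrusted;       /* device enumerated below such a port */
};

/* "Might we ever have an external device?" */
static bool toy_may_have_external_dev(const struct toy_pdev *devs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (devs[i].external_facing)
			return true;

	return false;
}

/* "Is *this* device an external device?" */
static bool toy_is_external(const struct toy_pdev *dev)
{
	return dev->untrusted;
}
```

Since an untrusted device can only appear below an external-facing
port, the first helper does not need to look at "untrusted" at all.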

>   return true;
>  
>   return false;
> diff --git a/drivers/pci/of.c b/drivers/pci/of.c
> index 27839cd2459f6..22727fc9558df 100644
> --- a/drivers/pci/of.c
> +++ b/drivers/pci/of.c
> @@ -42,7 +42,7 @@ void pci_set_bus_of_node(struct pci_bus *bus)
>   } else {
>   node = of_node_get(bus->self->dev.of_node);
>   if (node && of_property_read_bool(node, "external-facing"))
> - bus->self->untrusted = true;
> + bus->self->external_facing = true;
>   }
>  
>   bus->dev.of_node = node;
> diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
> index 7224b1e5f2a83..492c07805caf8 100644
> --- a/drivers/pci/pci-acpi.c
> +++ b/drivers/pci/pci-acpi.c
> @@ -1213,22 +1213,23 @@ static void pci_acpi_optimize_delay(struct pci_dev *pdev,
>   ACPI_FREE(obj);
>  }
>  
> -static void pci_acpi_set_untrusted(struct pci_dev *dev)
> +static void pci_acpi_set_external_facing(struct pci_dev *dev)
>  {
>   u8 val;
>  
> - if (pci_pcie_type(dev) != PCI_EXP_TYPE_ROOT_PORT)
> + if (pci_pcie_type(dev) != PCI_EXP_TYPE_ROOT_PORT &&
> + pci_pcie_type(dev) != PCI_EXP_TYPE_DOWNSTREAM)

This looks like a change worthy of its own patch.  We used to look for
"ExternalFacingPort" only on Root Ports; now we'll also do it for
Switch Downstream Ports.

Can you include DT and ACPI spec references if they exist?  I found
this mention:
https://docs.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-root-ports
which actually says it should only be implemented for Root Ports.

It also mentions a "DmaProperty" that looks related.  Maybe Linux
should also pay attention to this?

If we do change this, should we use pcie_downstream_port(), which
includes PCI-to-PCIe bridges as well?
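
For comparison, here is a sketch of the two checks side by side.
The port type values are the ones I believe pci_regs.h defines; the
helper names are made up, and the second helper only mirrors what
pcie_downstream_port() is described as covering above.

```c
#include <stdbool.h>

/* PCIe port types, values as in include/uapi/linux/pci_regs.h */
#define PCI_EXP_TYPE_ENDPOINT    0x0
#define PCI_EXP_TYPE_ROOT_PORT   0x4
#define PCI_EXP_TYPE_UPSTREAM    0x5
#define PCI_EXP_TYPE_DOWNSTREAM  0x6
#define PCI_EXP_TYPE_PCIE_BRIDGE 0x8 /* PCI/PCI-X to PCIe bridge */

/* Port types accepted by the check in the quoted patch. */
static bool toy_patch_accepts(int type)
{
	return type == PCI_EXP_TYPE_ROOT_PORT ||
	       type == PCI_EXP_TYPE_DOWNSTREAM;
}

/* Broader check that also accepts PCI-to-PCIe bridges. */
static bool toy_downstream_port(int type)
{
	return type == PCI_EXP_TYPE_ROOT_PORT ||
	       type == PCI_EXP_TYPE_DOWNSTREAM ||
	       type == PCI_EXP_TYPE_PCIE_BRIDGE;
}
```

The only difference between the two is the bridge case; Upstream
Ports and Endpoints are rejected by both.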

>   return;
>   if (device_property_read_u8(&dev->dev, "ExternalFacingPort", &val))
>   return;
>  
>   /*
> -  * These root ports expose PCIe (including DMA) outside of the
> -  * system so make sure we treat them and everything behind as
> +  * These root/down ports expose PCIe (including DMA) outside of the
> +  * system so make sure we treat everything behind them as
>* untrusted.
>*/
>   if (val)
> - dev->untrusted = 1;
> + dev->external_facing = 1;
>  }
>  
>  static void pci_acpi_setup(struct device *dev)
> @@ -1240,7 +1241,7 @@ static void pci_acpi_setup(struct device *dev)
>   return;
>  
>   pci_acpi_optimize_delay(pci_dev, adev->handle);
> - pci_acpi_set_untrusted(pci_dev);
> + pci_acpi_set_external_facing(pci_dev);
>   pci_acpi_add_edr_notifier(pci_dev);
>  
>   pci_acpi_add_pm_notifier(adev, pci_dev);
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 6d87066a5ecc5..8c40c00413e74 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -1552,7 +1552,7 @@ static void set_pcie_untrusted(struct 

Re: [PATCH v2 1/7] PCI: Keep the ACS capability offset in device

2020-07-06 Thread Bjorn Helgaas
On Mon, Jun 29, 2020 at 09:49:37PM -0700, Rajat Jain wrote:
> Currently this is being looked up at a number of places. Read and store it
> once at bootup so that it can be used by all later.

Write the commit log so it is complete even without the subject.
Right now, you have to read the subject to know what "this" refers to.

The subject is like the title; the log is like the body of an article.
The title isn't *part* of the article, so the article has to make
sense all by itself.

> +static void pci_enable_acs(struct pci_dev *dev);

I don't think we need this forward declaration, do we?

> @@ -4653,7 +4653,7 @@ static int pci_quirk_intel_spt_pch_acs(struct pci_dev *dev, u16 acs_flags)
>   if (!pci_quirk_intel_spt_pch_acs_match(dev))
>   return -ENOTTY;
>  
> - pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ACS);
> + pos = dev->acs_cap;

I assume you verified that all these quirks are FINAL quirks, since
pci_init_capabilities() is called after HEADER quirks.  I'll
double-check before applying this.

Bjorn


Re: [PATCH 1/2] pci: Add pci device even if the driver failed to attach

2020-06-26 Thread Bjorn Helgaas
Nit: when you update these patches, can you run "git log --oneline
drivers/pci/bus.c" and make your subject lines match the convention?
E.g.,

  PCI: Add device even if driver attach failed

On Thu, Jun 25, 2020 at 05:27:09PM -0700, Rajat Jain wrote:
> device_attach() returning failure indicates a driver error
> while trying to probe the device. In such a scenario, the PCI
> device should still be added in the system and be visible to
> the user.

Nit: please wrap logs to fill 75 characters.  "git log" adds 4 spaces
at the beginning, so 75+4 still fits nicely in 80 columns without
wrapping.

> This patch partially reverts:
> commit ab1a187bba5c ("PCI: Check device_attach() return value always")
> 
> Signed-off-by: Rajat Jain 
> ---
>  drivers/pci/bus.c | 6 +-
>  1 file changed, 1 insertion(+), 5 deletions(-)
> 
> diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
> index 8e40b3e6da77d..3cef835b375fd 100644
> --- a/drivers/pci/bus.c
> +++ b/drivers/pci/bus.c
> @@ -322,12 +322,8 @@ void pci_bus_add_device(struct pci_dev *dev)
>  
>   dev->match_driver = true;
>   retval = device_attach(&dev->dev);
> - if (retval < 0 && retval != -EPROBE_DEFER) {
> + if (retval < 0 && retval != -EPROBE_DEFER)
>   pci_warn(dev, "device attach failed (%d)\n", retval);
> - pci_proc_detach_device(dev);
> - pci_remove_sysfs_dev_files(dev);

Thanks for catching my bug!

> - return;
> - }
>  
>   pci_dev_assign_added(dev, true);
>  }
> -- 
> 2.27.0.212.ge8ba1cc988-goog
> 
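
For illustration, the behavior after this change can be modelled in
userspace roughly as follows.  This is a sketch under assumptions:
struct mock_pci_dev and mock_bus_add_device() are hypothetical, and the
attach result is passed in rather than coming from device_attach().

```c
#include <stdbool.h>
#include <stdio.h>

#define MOCK_EPROBE_DEFER 517	/* same value as the kernel's EPROBE_DEFER */

/* Hypothetical stand-in for struct pci_dev. */
struct mock_pci_dev {
	bool match_driver;
	bool added;
};

/* attach_result stands in for the return value of device_attach(). */
static void mock_bus_add_device(struct mock_pci_dev *dev, int attach_result)
{
	dev->match_driver = true;

	/* Warn on a real failure, but do not bail out or tear anything down. */
	if (attach_result < 0 && attach_result != -MOCK_EPROBE_DEFER)
		fprintf(stderr, "device attach failed (%d)\n", attach_result);

	/* The device is marked as added regardless of the attach result. */
	dev->added = true;
}
```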


Re: [PATCH 0/2] Introduce PCI_FIXUP_IOMMU

2020-06-23 Thread Bjorn Helgaas
On Fri, Jun 19, 2020 at 10:26:54AM +0800, Zhangfei Gao wrote:
> Have studied _DSM method, two issues we met comparing using quirk.
> 
> 1. Need change definition of either pci_host_bridge or pci_dev, like adding
> member can_stall,
> while pci system does not know stall now.
> 
> a, pci devices do not have uuid: uuid need be described in dsdt, while pci
> devices are not defined in dsdt.
>     so we have to use host bridge.

PCI devices *can* be described in the DSDT.  IIUC these particular
devices are hardwired (not plug-in cards), so platform firmware can
know about them and could describe them in the DSDT.

> b,  Parsing dsdt is in pci subsystem.
> Like drivers/acpi/pci_root.c:
>    obj = acpi_evaluate_dsm(ACPI_HANDLE(bus->bridge), &pci_acpi_dsm_guid, 1,
>     IGNORE_PCI_BOOT_CONFIG_DSM, NULL);
> 
> After parsing DSM in pci, we need record this info.
> Currently, can_stall info is recorded in iommu_fwspec,
> which is allocated in iommu_fwspec_init and called by iort_iommu_configure
> for uefi.

You can look for a _DSM wherever it is convenient for you.  It could
be in an AMBA shim layer.

> 2. Guest kernel also need support sva.
> Using quirk, the guest can boot with sva enabled, since quirk is
> self-contained by kernel.
> If using  _DSM, a specific uefi or dtb has to be provided,
> currently we can use QEMU_EFI.fd from apt install qemu-efi

I don't quite understand what this means, but as I mentioned before, a
quirk for a *limited* number of devices is OK, as long as there is a
plan that removes the need for a quirk for future devices.

E.g., if the next platform version ships with a DTB or firmware with a
_DSM or other mechanism that enables the kernel to discover this
information without a kernel change, it's fine to use a quirk to cover
the early platform.

The principles are:

  - I don't want to have to update a quirk for every new Device ID
that needs this.

  - I don't really want to have to manage non-PCI information in the
struct pci_dev.  If this is AMBA- or IOMMU-related, it should be
stored in a structure related to AMBA or the IOMMU.
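
A minimal sketch of how the two principles could fit together, assuming
firmware can eventually describe the property.  Everything here is
hypothetical: struct mock_dev, struct mock_iommu_info, the fw_* fields
and mock_setup_stall() are illustration only.  The two Device IDs are
the ones from the proposed quirk, and 0x19e5 is the Huawei Vendor ID.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-device IOMMU data, kept out of the PCI structure. */
struct mock_iommu_info {
	bool can_stall;
};

struct mock_dev {
	uint16_t vendor, device;
	bool fw_describes_stall;	/* firmware (e.g. _DSM or DT) gave an answer */
	bool fw_can_stall;		/* the answer firmware gave */
	struct mock_iommu_info iommu;
};

/* Fixed list of early devices whose firmware has no such description. */
static const struct { uint16_t vendor, device; } early_stall_devs[] = {
	{ 0x19e5, 0xa250 },
	{ 0x19e5, 0xa251 },
};

static void mock_setup_stall(struct mock_dev *dev)
{
	size_t i;

	/* Preferred path: firmware says so; new devices need no kernel change. */
	if (dev->fw_describes_stall) {
		dev->iommu.can_stall = dev->fw_can_stall;
		return;
	}

	/* Fallback: a quirk list that is not expected to grow. */
	for (i = 0; i < sizeof(early_stall_devs) / sizeof(early_stall_devs[0]); i++) {
		if (dev->vendor == early_stall_devs[i].vendor &&
		    dev->device == early_stall_devs[i].device) {
			dev->iommu.can_stall = true;
			return;
		}
	}
}
```

A future device with a new Device ID is handled by the firmware path, so
the table only ever covers the early platform.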


Re: [PATCH 0/2] Introduce PCI_FIXUP_IOMMU

2020-06-15 Thread Bjorn Helgaas
On Sat, Jun 13, 2020 at 10:30:56PM +0800, Zhangfei Gao wrote:
> On 2020/6/11 下午9:44, Bjorn Helgaas wrote:
> > +++ b/drivers/iommu/iommu.c
> > > > > > > > > > > @@ -2418,6 +2418,10 @@ int iommu_fwspec_init(struct device *dev, struct
> > > > > > > > > > fwnode_handle *iommu_fwnode,
> > > > > > > > > > fwspec->iommu_fwnode = iommu_fwnode;
> > > > > > > > > > fwspec->ops = ops;
> > > > > > > > > > dev_iommu_fwspec_set(dev, fwspec);
> > > > > > > > > > +
> > > > > > > > > > +   if (dev_is_pci(dev))
> > > > > > > > > > > +   pci_fixup_device(pci_fixup_final, to_pci_dev(dev));
> > > > > > > > > > +
> > > > > > > > > > 
> > > > > > > > > > > Then pci_fixup_final will be called twice, the first in pci_bus_add_device.
> > > > > > > > > > > Here in iommu_fwspec_init is the second time, specifically for iommu_fwspec.
> > > > > > > > > > Will send this when 5.8-rc1 is open.
> > > > > > > > > Wait, this whole fixup approach seems wrong to me.  No matter 
> > > > > > > > > how you
> > > > > > > > > do the fixup, it's still a fixup, which means it requires 
> > > > > > > > > ongoing
> > > > > > > > > maintenance.  Surely we don't want to have to add the 
> > > > > > > > > Vendor/Device ID
> > > > > > > > > for every new AMBA device that comes along, do we?
> > > > > > > > > 
> > > > > > > > Here the fake pci device has standard PCI cfg space, but 
> > > > > > > > physical
> > > > > > > > implementation is base on AMBA
> > > > > > > > They can provide pasid feature.
> > > > > > > > However,
> > > > > > > > 1, does not support tlp since they are not real pci devices.
> > > > > > > > 2. does not support pri, instead support stall (provided by 
> > > > > > > > smmu)
> > > > > > > > And stall is not a pci feature, so it is not described in 
> > > > > > > > struct pci_dev,
> > > > > > > > but in struct iommu_fwspec.
> > > > > > > > So we use this fixup to tell pci system that the devices can 
> > > > > > > > support stall,
> > > > > > > > and hereby support pasid.
> > > > > > > This did not answer my question.  Are you proposing that we 
> > > > > > > update a
> > > > > > > quirk every time a new AMBA device is released?  I don't think 
> > > > > > > that
> > > > > > > would be a good model.
> > > > > > Yes, you are right, but we do not have any better idea yet.
> > > > > > Currently we have three fake pci devices, which support stall and 
> > > > > > pasid.
> > > > > > We have to let pci system know the device can support pasid, 
> > > > > > because of
> > > > > > stall feature, though not support pri.
> > > > > > Do you have any other ideas?
> > > > > It sounds like the best way would be to allocate a PCI capability for 
> > > > > it, so
> > > > > detection can be done through config space, at least in future 
> > > > > devices,
> > > > > or possibly after a firmware update if the config space in your system
> > > > > is controlled by firmware somewhere.  Once there is a proper mechanism
> > > > > to do this, using fixups to detect the early devices that don't use 
> > > > > that
> > > > > should be uncontroversial. I have no idea what the process or timeline
> > > > > is to add new capabilities into the PCIe specification, or if this one
> > > > > would be acceptable to the PCI SIG at all.
> > > > That sounds like a possibility.  The spec already defines a
> > > > Vendor-Specific Extended Capability (PCIe r5.0, sec 7.9.5) that 

Re: [PATCH 0/2] Introduce PCI_FIXUP_IOMMU

2020-06-11 Thread Bjorn Helgaas
On Thu, Jun 11, 2020 at 10:54:45AM +0800, Zhangfei Gao wrote:
> On 2020/6/10 上午12:49, Bjorn Helgaas wrote:
> > On Tue, Jun 09, 2020 at 11:15:06AM +0200, Arnd Bergmann wrote:
> > > On Tue, Jun 9, 2020 at 6:02 AM Zhangfei Gao  
> > > wrote:
> > > > On 2020/6/9 上午12:41, Bjorn Helgaas wrote:
> > > > > On Mon, Jun 08, 2020 at 10:54:15AM +0800, Zhangfei Gao wrote:
> > > > > > On 2020/6/6 上午7:19, Bjorn Helgaas wrote:
> > > > > > > > +++ b/drivers/iommu/iommu.c
> > > > > > > > @@ -2418,6 +2418,10 @@ int iommu_fwspec_init(struct device *dev, struct
> > > > > > > > fwnode_handle *iommu_fwnode,
> > > > > > > >fwspec->iommu_fwnode = iommu_fwnode;
> > > > > > > >fwspec->ops = ops;
> > > > > > > >dev_iommu_fwspec_set(dev, fwspec);
> > > > > > > > +
> > > > > > > > +   if (dev_is_pci(dev))
> > > > > > > > +   pci_fixup_device(pci_fixup_final, to_pci_dev(dev));
> > > > > > > > +
> > > > > > > > 
> > > > > > > > Then pci_fixup_final will be called twice, the first in pci_bus_add_device.
> > > > > > > > Here in iommu_fwspec_init is the second time, specifically for iommu_fwspec.
> > > > > > > > Will send this when 5.8-rc1 is open.
> > > > > > > Wait, this whole fixup approach seems wrong to me.  No matter how 
> > > > > > > you
> > > > > > > do the fixup, it's still a fixup, which means it requires ongoing
> > > > > > > maintenance.  Surely we don't want to have to add the 
> > > > > > > Vendor/Device ID
> > > > > > > for every new AMBA device that comes along, do we?
> > > > > > > 
> > > > > > Here the fake pci device has standard PCI cfg space, but physical
> > > > > > implementation is base on AMBA
> > > > > > They can provide pasid feature.
> > > > > > However,
> > > > > > 1, does not support tlp since they are not real pci devices.
> > > > > > 2. does not support pri, instead support stall (provided by smmu)
> > > > > > And stall is not a pci feature, so it is not described in struct 
> > > > > > pci_dev,
> > > > > > but in struct iommu_fwspec.
> > > > > > So we use this fixup to tell pci system that the devices can 
> > > > > > support stall,
> > > > > > and hereby support pasid.
> > > > > This did not answer my question.  Are you proposing that we update a
> > > > > quirk every time a new AMBA device is released?  I don't think that
> > > > > would be a good model.
> > > > Yes, you are right, but we do not have any better idea yet.
> > > > Currently we have three fake pci devices, which support stall and pasid.
> > > > We have to let pci system know the device can support pasid, because of
> > > > stall feature, though not support pri.
> > > > Do you have any other ideas?
> > > It sounds like the best way would be to allocate a PCI capability for it, 
> > > so
> > > detection can be done through config space, at least in future devices,
> > > or possibly after a firmware update if the config space in your system
> > > is controlled by firmware somewhere.  Once there is a proper mechanism
> > > to do this, using fixups to detect the early devices that don't use that
> > > should be uncontroversial. I have no idea what the process or timeline
> > > is to add new capabilities into the PCIe specification, or if this one
> > > would be acceptable to the PCI SIG at all.
> > That sounds like a possibility.  The spec already defines a
> > Vendor-Specific Extended Capability (PCIe r5.0, sec 7.9.5) that might
> > be a candidate.
> Will investigate this, thanks Bjorn

FWIW, there's also a Vendor-Specific Capability that can appear in the
first 256 bytes of config space (the Vendor-Specific Extended
Capability must appear in the "Extended Configuration Space" from
0x100-0xfff).
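
To make the two locations concrete, here is a userspace sketch that
walks both lists in a 4096-byte copy of config space.  find_cap() and
find_ext_cap() are hypothetical helpers written for this example, not
the kernel's functions.  The IDs (0x09 and 0x000b) and the header layout
follow the PCI/PCIe specs, and the sketch skips the Status register
check for whether a capability list exists at all.

```c
#include <stdint.h>

#define CFG_SPACE_SIZE   4096
#define CAP_PTR_OFFSET   0x34	/* Capabilities Pointer register */
#define CAP_ID_VNDR      0x09	/* Vendor-Specific Capability */
#define EXT_CAP_START    0x100	/* first extended capability header */
#define EXT_CAP_ID_VNDR  0x000b	/* Vendor-Specific Extended Capability */

/* Walk the capability list in the first 256 bytes; return offset or 0. */
static uint16_t find_cap(const uint8_t *cfg, uint8_t id)
{
	uint8_t pos = cfg[CAP_PTR_OFFSET] & 0xfc;
	int ttl = 48;	/* bound the walk in case the list loops */

	while (pos >= 0x40 && ttl--) {
		if (cfg[pos] == id)
			return pos;
		pos = cfg[pos + 1] & 0xfc;	/* byte 1 is the next pointer */
	}
	return 0;
}

/* Walk the extended capability list in 0x100-0xfff; return offset or 0. */
static uint16_t find_ext_cap(const uint8_t *cfg, uint16_t id)
{
	uint16_t pos = EXT_CAP_START;
	int ttl = (CFG_SPACE_SIZE - EXT_CAP_START) / 8;

	while (pos >= EXT_CAP_START && pos <= CFG_SPACE_SIZE - 4 && ttl--) {
		/* Little-endian header: ID[15:0], version[19:16], next[31:20] */
		uint32_t hdr = (uint32_t)cfg[pos] |
			       (uint32_t)cfg[pos + 1] << 8 |
			       (uint32_t)cfg[pos + 2] << 16 |
			       (uint32_t)cfg[pos + 3] << 24;

		if (hdr == 0)
			break;	/* empty header: nothing (more) to walk */
		if ((hdr & 0xffff) == id)
			return pos;
		pos = (hdr >> 20) & 0xffc;
	}
	return 0;
}
```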

> > > If detection cannot be done through PCI config space, the next best
> > > alternative is to pass auxiliary data through firmware. On DT based

Re: [PATCH 0/2] Introduce PCI_FIXUP_IOMMU

2020-06-09 Thread Bjorn Helgaas
On Tue, Jun 09, 2020 at 11:15:06AM +0200, Arnd Bergmann wrote:
> On Tue, Jun 9, 2020 at 6:02 AM Zhangfei Gao  wrote:
> > On 2020/6/9 上午12:41, Bjorn Helgaas wrote:
> > > On Mon, Jun 08, 2020 at 10:54:15AM +0800, Zhangfei Gao wrote:
> > >> On 2020/6/6 上午7:19, Bjorn Helgaas wrote:
> > >>>> +++ b/drivers/iommu/iommu.c
> > >>>> @@ -2418,6 +2418,10 @@ int iommu_fwspec_init(struct device *dev, struct
> > >>>> fwnode_handle *iommu_fwnode,
> > >>>>   fwspec->iommu_fwnode = iommu_fwnode;
> > >>>>   fwspec->ops = ops;
> > >>>>   dev_iommu_fwspec_set(dev, fwspec);
> > >>>> +
> > >>>> +   if (dev_is_pci(dev))
> > >>>> +   pci_fixup_device(pci_fixup_final, to_pci_dev(dev));
> > >>>> +
> > >>>>
> > >>>> Then pci_fixup_final will be called twice, the first in pci_bus_add_device.
> > >>>> Here in iommu_fwspec_init is the second time, specifically for iommu_fwspec.
> > >>>> Will send this when 5.8-rc1 is open.
> > >>> Wait, this whole fixup approach seems wrong to me.  No matter how you
> > >>> do the fixup, it's still a fixup, which means it requires ongoing
> > >>> maintenance.  Surely we don't want to have to add the Vendor/Device ID
> > >>> for every new AMBA device that comes along, do we?
> > >>>
> > >> Here the fake pci device has standard PCI cfg space, but physical
> > >> implementation is base on AMBA
> > >> They can provide pasid feature.
> > >> However,
> > >> 1, does not support tlp since they are not real pci devices.
> > >> 2. does not support pri, instead support stall (provided by smmu)
> > >> And stall is not a pci feature, so it is not described in struct pci_dev,
> > >> but in struct iommu_fwspec.
> > >> So we use this fixup to tell pci system that the devices can support stall,
> > >> and hereby support pasid.
> > > This did not answer my question.  Are you proposing that we update a
> > > quirk every time a new AMBA device is released?  I don't think that
> > > would be a good model.
> >
> > Yes, you are right, but we do not have any better idea yet.
> > Currently we have three fake pci devices, which support stall and pasid.
> > We have to let pci system know the device can support pasid, because of
> > stall feature, though not support pri.
> > Do you have any other ideas?
> 
> It sounds like the best way would be to allocate a PCI capability for it, so
> detection can be done through config space, at least in future devices,
> or possibly after a firmware update if the config space in your system
> is controlled by firmware somewhere.  Once there is a proper mechanism
> to do this, using fixups to detect the early devices that don't use that
> should be uncontroversial. I have no idea what the process or timeline
> is to add new capabilities into the PCIe specification, or if this one
> would be acceptable to the PCI SIG at all.

That sounds like a possibility.  The spec already defines a
Vendor-Specific Extended Capability (PCIe r5.0, sec 7.9.5) that might
be a candidate.

> If detection cannot be done through PCI config space, the next best
> alternative is to pass auxiliary data through firmware. On DT based
> machines, you can list non-hotpluggable PCIe devices and add custom
> properties that could be read during device enumeration. I assume
> ACPI has something similar, but I have not done that.

ACPI has _DSM (ACPI v6.3, sec 9.1.1), which might be a candidate.  I
like this better than a PCI capability because the property you need
to expose is not a PCI property.

Re: [PATCH 0/2] Introduce PCI_FIXUP_IOMMU

2020-06-08 Thread Bjorn Helgaas
On Mon, Jun 08, 2020 at 10:54:15AM +0800, Zhangfei Gao wrote:
> On 2020/6/6 上午7:19, Bjorn Helgaas wrote:
> > On Thu, Jun 04, 2020 at 09:33:07PM +0800, Zhangfei Gao wrote:
> > > On 2020/6/2 上午1:41, Bjorn Helgaas wrote:
> > > > On Thu, May 28, 2020 at 09:33:44AM +0200, Joerg Roedel wrote:
> > > > > On Wed, May 27, 2020 at 01:18:42PM -0500, Bjorn Helgaas wrote:
> > > > > > Is this slowdown significant?  We already iterate over every device
> > > > > > when applying PCI_FIXUP_FINAL quirks, so if we used the existing
> > > > > > PCI_FIXUP_FINAL, we wouldn't be adding a new loop.  We would only be
> > > > > > adding two more iterations to the loop in pci_do_fixups() that tries
> > > > > > to match quirks against the current device.  I doubt that would be a
> > > > > > measurable slowdown.
> > > > > I don't know how significant it is, but I remember people complaining
> > > > > about adding new PCI quirks because it takes too long for them to run
> > > > > them all. That was in the discussion about the quirk disabling ATS on
> > > > > AMD Stoney systems.
> > > > > 
> > > > > So it probably depends on how many PCI devices are in the system whether
> > > > > it causes any measurable slowdown.
> > > > I found this [1] from Paul Menzel, which was a slowdown caused by
> > > > quirk_usb_early_handoff().  I think the real problem is individual
> > > > quirks that take a long time.
> > > > 
> > > > The PCI_FIXUP_IOMMU things we're talking about should be fast, and of
> > > > course, they're only run for matching devices anyway.  So I'd rather
> > > > keep them as PCI_FIXUP_FINAL than add a whole new phase.
> > > > 
> > > Thanks Bjorn for taking time for this.
> > > If so, it would be much simpler.
> > > 
> > > +++ b/drivers/iommu/iommu.c
> > > @@ -2418,6 +2418,10 @@ int iommu_fwspec_init(struct device *dev, struct
> > > fwnode_handle *iommu_fwnode,
> > >      fwspec->iommu_fwnode = iommu_fwnode;
> > >      fwspec->ops = ops;
> > >      dev_iommu_fwspec_set(dev, fwspec);
> > > +
> > > +   if (dev_is_pci(dev))
> > > +   pci_fixup_device(pci_fixup_final, to_pci_dev(dev));
> > > +
> > > 
> > > Then pci_fixup_final will be called twice, the first in pci_bus_add_device.
> > > Here in iommu_fwspec_init is the second time, specifically for iommu_fwspec.
> > > Will send this when 5.8-rc1 is open.
> >
> > Wait, this whole fixup approach seems wrong to me.  No matter how you
> > do the fixup, it's still a fixup, which means it requires ongoing
> > maintenance.  Surely we don't want to have to add the Vendor/Device ID
> > for every new AMBA device that comes along, do we?
> > 
> Here the fake pci device has standard PCI cfg space, but physical
> implementation is base on AMBA
> They can provide pasid feature.
> However,
> 1, does not support tlp since they are not real pci devices.
> 2. does not support pri, instead support stall (provided by smmu)
> And stall is not a pci feature, so it is not described in struct pci_dev,
> but in struct iommu_fwspec.
> So we use this fixup to tell pci system that the devices can support stall,
> and hereby support pasid.

This did not answer my question.  Are you proposing that we update a
quirk every time a new AMBA device is released?  I don't think that
would be a good model.

Re: [PATCH 0/2] Introduce PCI_FIXUP_IOMMU

2020-06-05 Thread Bjorn Helgaas
On Thu, Jun 04, 2020 at 09:33:07PM +0800, Zhangfei Gao wrote:
> On 2020/6/2 上午1:41, Bjorn Helgaas wrote:
> > On Thu, May 28, 2020 at 09:33:44AM +0200, Joerg Roedel wrote:
> > > On Wed, May 27, 2020 at 01:18:42PM -0500, Bjorn Helgaas wrote:
> > > > Is this slowdown significant?  We already iterate over every device
> > > > when applying PCI_FIXUP_FINAL quirks, so if we used the existing
> > > > PCI_FIXUP_FINAL, we wouldn't be adding a new loop.  We would only be
> > > > adding two more iterations to the loop in pci_do_fixups() that tries
> > > > to match quirks against the current device.  I doubt that would be a
> > > > measurable slowdown.
> > > I don't know how significant it is, but I remember people complaining
> > > about adding new PCI quirks because it takes too long for them to run
> > > them all. That was in the discussion about the quirk disabling ATS on
> > > AMD Stoney systems.
> > > 
> > > So it probably depends on how many PCI devices are in the system whether
> > > it causes any measurable slowdown.
> > I found this [1] from Paul Menzel, which was a slowdown caused by
> > quirk_usb_early_handoff().  I think the real problem is individual
> > quirks that take a long time.
> > 
> > The PCI_FIXUP_IOMMU things we're talking about should be fast, and of
> > course, they're only run for matching devices anyway.  So I'd rather
> > keep them as PCI_FIXUP_FINAL than add a whole new phase.
> > 
> Thanks Bjorn for taking time for this.
> If so, it would be much simpler.
> 
> +++ b/drivers/iommu/iommu.c
> @@ -2418,6 +2418,10 @@ int iommu_fwspec_init(struct device *dev, struct
> fwnode_handle *iommu_fwnode,
>     fwspec->iommu_fwnode = iommu_fwnode;
>     fwspec->ops = ops;
>     dev_iommu_fwspec_set(dev, fwspec);
> +
> +   if (dev_is_pci(dev))
> +   pci_fixup_device(pci_fixup_final, to_pci_dev(dev));
> +
> 
> Then pci_fixup_final will be called twice, the first in pci_bus_add_device.
> Here in iommu_fwspec_init is the second time, specifically for iommu_fwspec.
> Will send this when 5.8-rc1 is open.

Wait, this whole fixup approach seems wrong to me.  No matter how you
do the fixup, it's still a fixup, which means it requires ongoing
maintenance.  Surely we don't want to have to add the Vendor/Device ID
for every new AMBA device that comes along, do we?

Bjorn

Re: [PATCH] PCI: Relax ACS requirement for Intel RCiEP devices.

2020-06-01 Thread Bjorn Helgaas
On Mon, Jun 01, 2020 at 03:56:55PM -0600, Alex Williamson wrote:
> On Mon, 1 Jun 2020 14:40:23 -0700
> "Raj, Ashok"  wrote:
> 
> > On Mon, Jun 01, 2020 at 04:25:19PM -0500, Bjorn Helgaas wrote:
> > > On Thu, May 28, 2020 at 01:57:42PM -0700, Ashok Raj wrote:  
> > > > All Intel platforms guarantee that all root complex implementations
> > > > must send transactions up to IOMMU for address translations. Hence for
> > > > RCiEP devices that are Vendor ID Intel, can claim exception for lack of
> > > > ACS support.
> > > > 
> > > > 
> > > > 3.16 Root-Complex Peer to Peer Considerations
> > > > When DMA remapping is enabled, peer-to-peer requests through the
> > > > Root-Complex must be handled
> > > > as follows:
> > > > • The input address in the request is translated (through first-level,
> > > >   second-level or nested translation) to a host physical address (HPA).
> > > >   The address decoding for peer addresses must be done only on the
> > > >   translated HPA. Hardware implementations are free to further limit
> > > >   peer-to-peer accesses to specific host physical address regions
> > > >   (or to completely disallow peer-forwarding of translated requests).
> > > > • Since address translation changes the contents (address field) of
> > > >   the PCI Express Transaction Layer Packet (TLP), for PCI Express
> > > >   peer-to-peer requests with ECRC, the Root-Complex hardware must use
> > > >   the new ECRC (re-computed with the translated address) if it
> > > >   decides to forward the TLP as a peer request.
> > > > • Root-ports, and multi-function root-complex integrated endpoints, may
> > > > >   support additional peer-to-peer control features by supporting PCI Express
> > > > >   Access Control Services (ACS) capability. Refer to ACS capability in
> > > > >   PCI Express specifications for details.
> > > > > 
> > > > > Since Linux didn't give special treatment to allow this exception, certain
> > > > RCiEP MFD devices are getting grouped in a single iommu group. This
> > > > doesn't permit a single device to be assigned to a guest for instance.
> > > > 
> > > > In one vendor system: Device 14.x were grouped in a single IOMMU group.
> > > > 
> > > > /sys/kernel/iommu_groups/5/devices/:00:14.0
> > > > /sys/kernel/iommu_groups/5/devices/:00:14.2
> > > > /sys/kernel/iommu_groups/5/devices/:00:14.3
> > > > 
> > > > After the patch:
> > > > /sys/kernel/iommu_groups/5/devices/:00:14.0
> > > > /sys/kernel/iommu_groups/5/devices/:00:14.2
> > > > /sys/kernel/iommu_groups/6/devices/:00:14.3 <<< new group
> > > > 
> > > > 14.0 and 14.2 are integrated devices, but legacy end points.
> > > > Whereas 14.3 was a PCIe compliant RCiEP.
> > > > 
> > > > 00:14.3 Network controller: Intel Corporation Device 9df0 (rev 30)
> > > > Capabilities: [40] Express (v2) Root Complex Integrated Endpoint, MSI 00
> > > > 
> > > > This permits assigning this device to a guest VM.
> > > > 
> > > > Fixes: f096c061f552 ("iommu: Rework iommu_group_get_for_pci_dev()")
> > > > Signed-off-by: Ashok Raj 
> > > > To: Joerg Roedel 
> > > > To: Bjorn Helgaas 
> > > > Cc: linux-ker...@vger.kernel.org
> > > > Cc: iommu@lists.linux-foundation.org
> > > > Cc: Lu Baolu 
> > > > Cc: Alex Williamson 
> > > > Cc: Darrel Goeddel 
> > > > Cc: Mark Scott ,
> > > > Cc: Romil Sharma 
> > > > Cc: Ashok Raj   
> > > 
> > > Tentatively applied to pci/virtualization for v5.8, thanks!
> > > 
> > > The spec says this handling must apply "when DMA remapping is
> > > enabled".  The patch does not check whether DMA remapping is enabled.
> > > 
> > > Is there any case where DMA remapping is *not* enabled, and we rely on
> > > this patch to tell us whether the device is isolated?  It sounds like
> > > it may give the wrong answer in such a case?
> > > 
> > > Can you confirm that I don't need to worry about this?
> > 
> > I think all of this makes sense only when DMA remapping is enabled.
> > Otherwise there is no enforcement for isolation. 
> 
> Yep, without an IOMMU all devices operate in the same IOVA space and we
> have no isolation.  We only enable ACS when an IOMMU driver requests it
> and it's only used by IOMMU code to determine IOMMU grouping of
> devices.  Thanks,

Thanks, Ashok and Alex.  I wish it were more obvious from the code,
but I am reassured.

I also added a stable tag to help get this backported.

Re: [PATCH] PCI: Relax ACS requirement for Intel RCiEP devices.

2020-06-01 Thread Bjorn Helgaas
On Thu, May 28, 2020 at 01:57:42PM -0700, Ashok Raj wrote:
> All Intel platforms guarantee that all root complex implementations
> must send transactions up to IOMMU for address translations. Hence for
> RCiEP devices that are Vendor ID Intel, can claim exception for lack of
> ACS support.
> 
> 
> 3.16 Root-Complex Peer to Peer Considerations
> When DMA remapping is enabled, peer-to-peer requests through the
> Root-Complex must be handled
> as follows:
> • The input address in the request is translated (through first-level,
>   second-level or nested translation) to a host physical address (HPA).
>   The address decoding for peer addresses must be done only on the
>   translated HPA. Hardware implementations are free to further limit
>   peer-to-peer accesses to specific host physical address regions
>   (or to completely disallow peer-forwarding of translated requests).
> • Since address translation changes the contents (address field) of
>   the PCI Express Transaction Layer Packet (TLP), for PCI Express
>   peer-to-peer requests with ECRC, the Root-Complex hardware must use
>   the new ECRC (re-computed with the translated address) if it
>   decides to forward the TLP as a peer request.
> • Root-ports, and multi-function root-complex integrated endpoints, may
>   support additional peer-to-peer control features by supporting PCI Express
>   Access Control Services (ACS) capability. Refer to ACS capability in
>   PCI Express specifications for details.
> 
> Since Linux didn't give special treatment to allow this exception, certain
> RCiEP MFD devices are getting grouped in a single iommu group. This
> doesn't permit a single device to be assigned to a guest for instance.
> 
> In one vendor system: Device 14.x were grouped in a single IOMMU group.
> 
> /sys/kernel/iommu_groups/5/devices/:00:14.0
> /sys/kernel/iommu_groups/5/devices/:00:14.2
> /sys/kernel/iommu_groups/5/devices/:00:14.3
> 
> After the patch:
> /sys/kernel/iommu_groups/5/devices/:00:14.0
> /sys/kernel/iommu_groups/5/devices/:00:14.2
> /sys/kernel/iommu_groups/6/devices/:00:14.3 <<< new group
> 
> 14.0 and 14.2 are integrated devices, but legacy end points.
> Whereas 14.3 was a PCIe compliant RCiEP.
> 
> 00:14.3 Network controller: Intel Corporation Device 9df0 (rev 30)
> Capabilities: [40] Express (v2) Root Complex Integrated Endpoint, MSI 00
> 
> This permits assigning this device to a guest VM.
> 
> Fixes: f096c061f552 ("iommu: Rework iommu_group_get_for_pci_dev()")
> Signed-off-by: Ashok Raj 
> To: Joerg Roedel 
> To: Bjorn Helgaas 
> Cc: linux-ker...@vger.kernel.org
> Cc: iommu@lists.linux-foundation.org
> Cc: Lu Baolu 
> Cc: Alex Williamson 
> Cc: Darrel Goeddel 
> Cc: Mark Scott ,
> Cc: Romil Sharma 
> Cc: Ashok Raj 

Tentatively applied to pci/virtualization for v5.8, thanks!

The spec says this handling must apply "when DMA remapping is
enabled".  The patch does not check whether DMA remapping is enabled.

Is there any case where DMA remapping is *not* enabled, and we rely on
this patch to tell us whether the device is isolated?  It sounds like
it may give the wrong answer in such a case?

Can you confirm that I don't need to worry about this?  

> ---
> v2: Moved functionality from iommu to pci quirks - Alex Williamson
> 
>  drivers/pci/quirks.c | 15 +++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 28c9a2409c50..63373ca0a3fe 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -4682,6 +4682,20 @@ static int pci_quirk_mf_endpoint_acs(struct pci_dev *dev, u16 acs_flags)
>   PCI_ACS_CR | PCI_ACS_UF | PCI_ACS_DT);
>  }
>  
> +static int pci_quirk_rciep_acs(struct pci_dev *dev, u16 acs_flags)
> +{
> + /*
> +  * RCiEP's are required to allow p2p only on translated addresses.
> +  * Refer to Intel VT-d specification Section 3.16 Root-Complex Peer
> +  * to Peer Considerations
> +  */
> + if (pci_pcie_type(dev) != PCI_EXP_TYPE_RC_END)
> + return -ENOTTY;
> +
> + return pci_acs_ctrl_enabled(acs_flags,
> + PCI_ACS_SV | PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF);
> +}
> +
>  static int pci_quirk_brcm_acs(struct pci_dev *dev, u16 acs_flags)
>  {
>   /*
> @@ -4764,6 +4778,7 @@ static const struct pci_dev_acs_enabled {
>   /* I219 */
>   { PCI_VENDOR_ID_INTEL, 0x15b7, pci_quirk_mf_endpoint_acs },
>   { PCI_VENDOR_ID_INTEL, 0x15b8, pci_quirk_mf_endpoint_acs },
> + { PCI_VENDOR_ID_INTEL, PCI_ANY_ID, pci_quirk_rciep_acs },
>   /* QCOM QDF2xxx root ports */
>   { PCI_VENDOR_ID_QCOM, 0x0400, pci_quirk_qcom_rp_acs },
>   { PCI_VENDOR_ID_QCOM, 0x0401, pci_quirk_qcom_rp_acs },
> -- 
> 2.7.4
> 
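
As a userspace illustration of what the new quirk reports (a sketch, not
the kernel code): struct mock_pci_dev and the mock_/ACS_ names are
hypothetical, while the bit values and the RCiEP type value follow the
PCIe spec.  The quirk answers "are all of the requested ACS controls
effectively in force?" and covers SV, RR, CR and UF only.

```c
#include <errno.h>
#include <stdint.h>

/* ACS control bits and PCIe device type, values as in the PCIe spec. */
#define ACS_SV  0x01	/* Source Validation */
#define ACS_RR  0x04	/* P2P Request Redirect */
#define ACS_CR  0x08	/* P2P Completion Redirect */
#define ACS_UF  0x10	/* Upstream Forwarding */
#define ACS_DT  0x40	/* Direct Translated P2P */
#define PCIE_TYPE_RC_END 0x9	/* Root Complex Integrated Endpoint */

/* Hypothetical stand-in for struct pci_dev. */
struct mock_pci_dev {
	int pcie_type;
};

/* 1 if every requested control is in the set treated as enabled, else 0. */
static int acs_ctrl_enabled(uint16_t requested, uint16_t enabled)
{
	return (requested & enabled) == requested;
}

static int mock_quirk_rciep_acs(const struct mock_pci_dev *dev, uint16_t acs_flags)
{
	if (dev->pcie_type != PCIE_TYPE_RC_END)
		return -ENOTTY;	/* quirk does not apply to this device */

	return acs_ctrl_enabled(acs_flags, ACS_SV | ACS_RR | ACS_CR | ACS_UF);
}
```

A request that also includes ACS_DT returns 0, because DT is not in the
set the quirk vouches for.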

Re: [PATCH 0/2] Introduce PCI_FIXUP_IOMMU

2020-06-01 Thread Bjorn Helgaas
On Thu, May 28, 2020 at 09:33:44AM +0200, Joerg Roedel wrote:
> On Wed, May 27, 2020 at 01:18:42PM -0500, Bjorn Helgaas wrote:
> > Is this slowdown significant?  We already iterate over every device
> > when applying PCI_FIXUP_FINAL quirks, so if we used the existing
> > PCI_FIXUP_FINAL, we wouldn't be adding a new loop.  We would only be
> > adding two more iterations to the loop in pci_do_fixups() that tries
> > to match quirks against the current device.  I doubt that would be a
> > measurable slowdown.
> 
> I don't know how significant it is, but I remember people complaining
> about adding new PCI quirks because it takes too long for them to run
> them all. That was in the discussion about the quirk disabling ATS on
> AMD Stoney systems.
> 
> So it probably depends on how many PCI devices are in the system whether
> it causes any measurable slowdown.

I found this [1] from Paul Menzel, which was a slowdown caused by
quirk_usb_early_handoff().  I think the real problem is individual
quirks that take a long time.

The PCI_FIXUP_IOMMU things we're talking about should be fast, and of
course, they're only run for matching devices anyway.  So I'd rather
keep them as PCI_FIXUP_FINAL than add a whole new phase.

Bjorn

[1] https://lore.kernel.org/linux-pci/b1533fd5-1fae-7256-9597-36d3d5de9...@molgen.mpg.de/


Re: [PATCH 0/2] Introduce PCI_FIXUP_IOMMU

2020-05-27 Thread Bjorn Helgaas
On Tue, May 26, 2020 at 07:49:07PM +0800, Zhangfei Gao wrote:
> Some platform devices appear as PCI but are actually on the AMBA bus,
> and they need fixup in drivers/pci/quirks.c handling iommu_fwnode.
> Here introducing PCI_FIXUP_IOMMU, which is called after iommu_fwnode
> is allocated, instead of reusing PCI_FIXUP_FINAL since it will slow
> down iommu probing as all devices in fixup final list will be
> reprocessed, suggested by Joerg, [1]

Is this slowdown significant?  We already iterate over every device
when applying PCI_FIXUP_FINAL quirks, so if we used the existing
PCI_FIXUP_FINAL, we wouldn't be adding a new loop.  We would only be
adding two more iterations to the loop in pci_do_fixups() that tries
to match quirks against the current device.  I doubt that would be a
measurable slowdown.
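
The cost argument can be sketched in userspace like this.  The types and
mock_do_fixups() are hypothetical and only loosely modelled on the loop
in pci_do_fixups(): each table entry costs one comparison per device,
and a hook runs only when the IDs match.

```c
#include <stddef.h>
#include <stdint.h>

#define MOCK_ANY_ID 0xffff	/* wildcard, in the spirit of PCI_ANY_ID */

/* Hypothetical stand-in for struct pci_dev. */
struct mock_pci_dev {
	uint16_t vendor, device;
	unsigned int hooks_run;
};

struct mock_fixup {
	uint16_t vendor, device;
	void (*hook)(struct mock_pci_dev *dev);
};

static void mock_hook(struct mock_pci_dev *dev)
{
	dev->hooks_run++;
}

/*
 * Compare every table entry with the device and call the hook only on a
 * match.  Returns the number of entries examined.
 */
static size_t mock_do_fixups(struct mock_pci_dev *dev,
			     const struct mock_fixup *f, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if ((f[i].vendor == dev->vendor || f[i].vendor == MOCK_ANY_ID) &&
		    (f[i].device == dev->device || f[i].device == MOCK_ANY_ID))
			f[i].hook(dev);
	}
	return i;
}
```

Adding two entries to the table adds two comparisons for a device that
does not match, and no hook calls.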

> For example:
> Hisilicon platform device need fixup in
> drivers/pci/quirks.c handling fwspec->can_stall, which is introduced in [2]
> 
> +static void quirk_huawei_pcie_sva(struct pci_dev *pdev)
> +{
> +	struct iommu_fwspec *fwspec;
> +
> +	pdev->eetlp_prefix_path = 1;
> +	fwspec = dev_iommu_fwspec_get(&pdev->dev);
> +	if (fwspec)
> +		fwspec->can_stall = 1;
> +}
> +
> +DECLARE_PCI_FIXUP_IOMMU(PCI_VENDOR_ID_HUAWEI, 0xa250, quirk_huawei_pcie_sva);
> +DECLARE_PCI_FIXUP_IOMMU(PCI_VENDOR_ID_HUAWEI, 0xa251, quirk_huawei_pcie_sva);
> 
> [1] https://www.spinics.net/lists/iommu/msg44591.html
> [2] https://www.spinics.net/lists/linux-pci/msg94559.html

If you reference these in the commit logs, please use lore.kernel.org
links instead of spinics.

> Zhangfei Gao (2):
>   PCI: Introduce PCI_FIXUP_IOMMU
>   iommu: calling pci_fixup_iommu in iommu_fwspec_init
> 
>  drivers/iommu/iommu.c | 4 
>  drivers/pci/quirks.c  | 7 +++
>  include/asm-generic/vmlinux.lds.h | 3 +++
>  include/linux/pci.h   | 8 
>  4 files changed, 22 insertions(+)
> 
> -- 
> 2.7.4
> 


Re: [PATCH 02/12] ACPI/IORT: Make iort_get_device_domain IRQ domain agnostic

2020-05-21 Thread Bjorn Helgaas
On Thu, May 21, 2020 at 01:59:58PM +0100, Lorenzo Pieralisi wrote:
> iort_get_device_domain() is PCI specific but it need not be,
> since it can be used to retrieve IRQ domain nexus of any kind
> by adding an irq_domain_bus_token input to it.
> 
> Make it PCI agnostic by also renaming the requestor ID input
> to a more generic ID name.
> 
> Signed-off-by: Lorenzo Pieralisi 
> Cc: Will Deacon 
> Cc: Hanjun Guo 
> Cc: Bjorn Helgaas 
> Cc: Sudeep Holla 
> Cc: Catalin Marinas 
> Cc: Robin Murphy 
> Cc: "Rafael J. Wysocki" 
> Cc: Marc Zyngier 

Acked-by: Bjorn Helgaas  # pci/msi.c

> ---
>  drivers/acpi/arm64/iort.c | 14 +++---
>  drivers/pci/msi.c |  3 ++-
>  include/linux/acpi_iort.h |  7 ---
>  3 files changed, 13 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
> index 7cfd77b5e6e8..8f2a961c1364 100644
> --- a/drivers/acpi/arm64/iort.c
> +++ b/drivers/acpi/arm64/iort.c
> @@ -567,7 +567,6 @@ static struct acpi_iort_node *iort_find_dev_node(struct 
> device *dev)
>   node = iort_get_iort_node(dev->fwnode);
>   if (node)
>   return node;
> -
>   /*
>* if not, then it should be a platform device defined in
>* DSDT/SSDT (with Named Component node in IORT)
> @@ -658,13 +657,13 @@ static int __maybe_unused iort_find_its_base(u32 
> its_id, phys_addr_t *base)
>  /**
>   * iort_dev_find_its_id() - Find the ITS identifier for a device
>   * @dev: The device.
> - * @req_id: Device's requester ID
> + * @id: Device's ID
>   * @idx: Index of the ITS identifier list.
>   * @its_id: ITS identifier.
>   *
>   * Returns: 0 on success, appropriate error value otherwise
>   */
> -static int iort_dev_find_its_id(struct device *dev, u32 req_id,
> +static int iort_dev_find_its_id(struct device *dev, u32 id,
>   unsigned int idx, int *its_id)
>  {
>   struct acpi_iort_its_group *its;
> @@ -674,7 +673,7 @@ static int iort_dev_find_its_id(struct device *dev, u32 
> req_id,
>   if (!node)
>   return -ENXIO;
>  
> - node = iort_node_map_id(node, req_id, NULL, IORT_MSI_TYPE);
> + node = iort_node_map_id(node, id, NULL, IORT_MSI_TYPE);
>   if (!node)
>   return -ENXIO;
>  
> @@ -697,19 +696,20 @@ static int iort_dev_find_its_id(struct device *dev, u32 
> req_id,
>   *
>   * Returns: the MSI domain for this device, NULL otherwise
>   */
> -struct irq_domain *iort_get_device_domain(struct device *dev, u32 req_id)
> +struct irq_domain *iort_get_device_domain(struct device *dev, u32 id,
> +   enum irq_domain_bus_token bus_token)
>  {
>   struct fwnode_handle *handle;
>   int its_id;
>  
> - if (iort_dev_find_its_id(dev, req_id, 0, &its_id))
> + if (iort_dev_find_its_id(dev, id, 0, &its_id))
>   return NULL;
>  
>   handle = iort_find_domain_token(its_id);
>   if (!handle)
>   return NULL;
>  
> - return irq_find_matching_fwnode(handle, DOMAIN_BUS_PCI_MSI);
> + return irq_find_matching_fwnode(handle, bus_token);
>  }
>  
>  static void iort_set_device_domain(struct device *dev,
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index 6b43a5455c7a..74a91f52ecc0 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -1558,7 +1558,8 @@ struct irq_domain *pci_msi_get_device_domain(struct 
> pci_dev *pdev)
>   pci_for_each_dma_alias(pdev, get_msi_id_cb, &rid);
>   dom = of_msi_map_get_device_domain(&pdev->dev, rid);
>   if (!dom)
> - dom = iort_get_device_domain(&pdev->dev, rid);
> + dom = iort_get_device_domain(&pdev->dev, rid,
> +  DOMAIN_BUS_PCI_MSI);
>   return dom;
>  }
>  #endif /* CONFIG_PCI_MSI_IRQ_DOMAIN */
> diff --git a/include/linux/acpi_iort.h b/include/linux/acpi_iort.h
> index 8e7e2ec37f1b..08ec6bd2297f 100644
> --- a/include/linux/acpi_iort.h
> +++ b/include/linux/acpi_iort.h
> @@ -29,7 +29,8 @@ struct fwnode_handle *iort_find_domain_token(int trans_id);
>  #ifdef CONFIG_ACPI_IORT
>  void acpi_iort_init(void);
>  u32 iort_msi_map_rid(struct device *dev, u32 req_id);
> -struct irq_domain *iort_get_device_domain(struct device *dev, u32 req_id);
> +struct irq_domain *iort_get_device_domain(struct device *dev, u32 id,
> +   enum irq_domain_bus_token bus_token);
>  void acpi_configure_pmsi_domain(struct device *dev);
>  int iort_pmsi_get_dev_id(struct device *dev, u32 *dev_id);
>  /* IOMMU interface */
> @@ -40,8 +41,8 @@ int iort_iommu_msi_g

Re: [PATCH 08/12] of/irq: make of_msi_map_get_device_domain() bus agnostic

2020-05-21 Thread Bjorn Helgaas
On Thu, May 21, 2020 at 02:00:04PM +0100, Lorenzo Pieralisi wrote:
> From: Diana Craciun 
> 
> of_msi_map_get_device_domain() is PCI specific but it need not be and
> can be easily changed to be bus agnostic in order to be used by other
> busses by adding an IRQ domain bus token as an input parameter.
> 
> Signed-off-by: Diana Craciun 
> Signed-off-by: Lorenzo Pieralisi 
> Cc: Bjorn Helgaas 
> Cc: Rob Herring 
> Cc: Marc Zyngier 

Acked-by: Bjorn Helgaas  # pci/msi.c

> ---
>  drivers/of/irq.c   | 8 +---
>  drivers/pci/msi.c  | 2 +-
>  include/linux/of_irq.h | 5 +++--
>  3 files changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/of/irq.c b/drivers/of/irq.c
> index a296eaf52a5b..48a40326984f 100644
> --- a/drivers/of/irq.c
> +++ b/drivers/of/irq.c
> @@ -613,18 +613,20 @@ u32 of_msi_map_rid(struct device *dev, struct 
> device_node *msi_np, u32 rid_in)
>   * of_msi_map_get_device_domain - Use msi-map to find the relevant MSI domain
>   * @dev: device for which the mapping is to be done.
>   * @rid: Requester ID for the device.
> + * @bus_token: Bus token
>   *
>   * Walk up the device hierarchy looking for devices with a "msi-map"
>   * property.
>   *
>   * Returns: the MSI domain for this device (or NULL on failure)
>   */
> -struct irq_domain *of_msi_map_get_device_domain(struct device *dev, u32 rid)
> +struct irq_domain *of_msi_map_get_device_domain(struct device *dev, u32 id,
> + u32 bus_token)
>  {
>   struct device_node *np = NULL;
>  
> - __of_msi_map_rid(dev, &np, rid);
> - return irq_find_matching_host(np, DOMAIN_BUS_PCI_MSI);
> + __of_msi_map_rid(dev, &np, id);
> + return irq_find_matching_host(np, bus_token);
>  }
>  
>  /**
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index 74a91f52ecc0..9532e1d12d3f 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -1556,7 +1556,7 @@ struct irq_domain *pci_msi_get_device_domain(struct 
> pci_dev *pdev)
>   u32 rid = pci_dev_id(pdev);
>  
>   pci_for_each_dma_alias(pdev, get_msi_id_cb, &rid);
> - dom = of_msi_map_get_device_domain(&pdev->dev, rid);
> + dom = of_msi_map_get_device_domain(&pdev->dev, rid, DOMAIN_BUS_PCI_MSI);
>   if (!dom)
>   dom = iort_get_device_domain(&pdev->dev, rid,
>DOMAIN_BUS_PCI_MSI);
> diff --git a/include/linux/of_irq.h b/include/linux/of_irq.h
> index 1214cabb2247..7142a3722758 100644
> --- a/include/linux/of_irq.h
> +++ b/include/linux/of_irq.h
> @@ -52,7 +52,8 @@ extern struct irq_domain *of_msi_get_domain(struct device 
> *dev,
>   struct device_node *np,
>   enum irq_domain_bus_token token);
>  extern struct irq_domain *of_msi_map_get_device_domain(struct device *dev,
> -u32 rid);
> + u32 id,
> + u32 bus_token);
>  extern void of_msi_configure(struct device *dev, struct device_node *np);
>  u32 of_msi_map_rid(struct device *dev, struct device_node *msi_np, u32 
> rid_in);
>  #else
> @@ -85,7 +86,7 @@ static inline struct irq_domain *of_msi_get_domain(struct 
> device *dev,
>   return NULL;
>  }
>  static inline struct irq_domain *of_msi_map_get_device_domain(struct device 
> *dev,
> -   u32 rid)
> + u32 id, u32 bus_token)
>  {
>   return NULL;
>  }
> -- 
> 2.26.1
> 


Re: [PATCH 00/15] PCI: brcmstb: enable PCIe for STB chips

2020-05-20 Thread Bjorn Helgaas
On Tue, May 19, 2020 at 04:33:58PM -0400, Jim Quinlan wrote:
> This patchset expands the usefulness of the Broadcom Settop Box PCIe
> controller by building upon the PCIe driver used currently by the
> Raspberry Pi.  Other forms of this patchset were submitted by me years
> ago and not accepted; the major sticking point was the code required
> for the DMA remapping needed for the PCIe driver to work [1].
> 
> There have been many changes to the DMA and OF subsystems since that
> time, making a cleaner and less intrusive patchset possible.  This
> patchset implements a generalization of "dev->dma_pfn_offset", except
> that instead of a single scalar offset it provides for multiple
> offsets via a function which depends upon the "dma-ranges" property of
> the PCIe host controller.  This is required for proper functionality
> of the BrcmSTB PCIe controller and possibly some other devices.
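
The generalization described above, from one scalar offset to several offsets
chosen by address range, can be sketched roughly as follows.  This is an
illustrative user-space model, not the code from the patchset: the struct
model_dma_range layout and model_phys_to_dma() are hypothetical names, and
the table stands in for whatever the "dma-ranges" property would provide.

```c
#include <stddef.h>
#include <stdint.h>

/* One translation window, as a "dma-ranges" entry might describe it. */
struct model_dma_range {
	uint64_t cpu_start;	/* first CPU physical address in the window */
	uint64_t dma_start;	/* bus address that cpu_start maps to */
	uint64_t size;		/* window length in bytes */
};

/*
 * Translate a CPU physical address to a bus address using the window
 * that contains it.  Returns 0 on success, -1 if no window matches.
 */
static int model_phys_to_dma(const struct model_dma_range *map, size_t n,
			     uint64_t phys, uint64_t *dma)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (phys >= map[i].cpu_start &&
		    phys - map[i].cpu_start < map[i].size) {
			*dma = phys - map[i].cpu_start + map[i].dma_start;
			return 0;
		}
	}
	return -1;
}
```

With a single window covering all of memory this reduces to the one-offset
case; with several windows, each address picks up the offset of its own
window.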
> 
> [1] 
> https://lore.kernel.org/linux-arm-kernel/1516058925-46522-5-git-send-email-jim2101...@gmail.com/
> 
> Jim Quinlan (15):
>   PCI: brcmstb: PCIE_BRCMSTB depends on ARCH_BRCMSTB
>   ahci_brcm: fix use of BCM7216 reset controller
>   dt-bindings: PCI: Add bindings for more Brcmstb chips
>   PCI: brcmstb: Add compatibily of other chips
>   PCI: brcmstb: Add suspend and resume pm_ops
>   PCI: brcmstb: Asserting PERST is different for 7278
>   PCI: brcmstb: Add control of rescal reset
>   of: Include a dev param in of_dma_get_range()
>   device core: Add ability to handle multiple dma offsets
>   dma-direct: Invoke dma offset func if needed
>   arm: dma-mapping: Invoke dma offset func if needed
>   PCI: brcmstb: Set internal memory viewport sizes
>   PCI: brcmstb: Accommodate MSI for older chips
>   PCI: brcmstb: Set bus max burst side by chip type
>   PCI: brcmstb: add compatilbe chips to match list

If you have occasion to post a v2 for other reasons,

s/PCIE_BRCMSTB depends on ARCH_BRCMSTB/Allow PCIE_BRCMSTB on ARCH_BRCMSTB also/
s/ahci_brcm: fix use of BCM7216 reset controller/ata: ahci_brcm: Fix .../
s/Add compatibily of other chips/Add bcm7278 register info/
s/Asserting PERST is different for 7278/Add bcm7278 PERST support/
s/Set bus max burst side/Set bus max burst size/
s/add compatilbe chips.*/Add bcm7211, bcm7216, bcm7445, bcm7278 to match list/

Rewrap commit logs to use full 75 character lines (to allow for the 4
spaces added by git log).

In commit logs, s/This commit// (use imperative mood instead).

In "Accommodate MSI for older chips" commit log, s/commont/common/.

>  .../bindings/pci/brcm,stb-pcie.yaml   |  40 +-
>  arch/arm/include/asm/dma-mapping.h|  17 +-
>  drivers/ata/ahci_brcm.c   |  14 +-
>  drivers/of/address.c  |  54 ++-
>  drivers/of/device.c   |   2 +-
>  drivers/of/of_private.h   |   8 +-
>  drivers/pci/controller/Kconfig|   4 +-
>  drivers/pci/controller/pcie-brcmstb.c | 403 +++---
>  include/linux/device.h|   9 +-
>  include/linux/dma-direct.h|  16 +
>  include/linux/dma-mapping.h   |  44 ++
>  kernel/dma/Kconfig|  12 +
>  12 files changed, 542 insertions(+), 81 deletions(-)
> 
> -- 
> 2.17.1
> 


Re: [PATCH 1/4] PCI/ATS: Only enable ATS for trusted devices

2020-05-15 Thread Bjorn Helgaas
On Fri, May 15, 2020 at 12:43:59PM +0200, Jean-Philippe Brucker wrote:
> Add pci_ats_supported(), which checks whether a device has an ATS
> capability, and whether it is trusted.  A device is untrusted if it is
> plugged into an external-facing port such as Thunderbolt and could
> spoof an existing device to exploit weaknesses in the IOMMU
> configuration.  PCIe ATS is one such weakness since it allows
> endpoints to cache IOMMU translations and emit transactions with
> 'Translated' Address Type (10b) that partially bypass the IOMMU
> translation.
> 
> The SMMUv3 and VT-d IOMMU drivers already disallow ATS and transactions
> with 'Translated' Address Type for untrusted devices.  Add the check to
> pci_enable_ats() to let other drivers (AMD IOMMU for now) benefit from
> it.
> 
> By checking ats_cap, the pci_ats_supported() helper also returns whether
> ATS was globally disabled with pci=noats, and could later include more
> things, for example whether the whole PCIe hierarchy down to the
> endpoint supports ATS.
> 
> Signed-off-by: Jean-Philippe Brucker 

Acked-by: Bjorn Helgaas 

> ---
>  include/linux/pci-ats.h |  3 +++
>  drivers/pci/ats.c   | 18 +-
>  2 files changed, 20 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/pci-ats.h b/include/linux/pci-ats.h
> index d08f0869f1213e..f75c307f346de9 100644
> --- a/include/linux/pci-ats.h
> +++ b/include/linux/pci-ats.h
> @@ -6,11 +6,14 @@
>  
>  #ifdef CONFIG_PCI_ATS
>  /* Address Translation Service */
> +bool pci_ats_supported(struct pci_dev *dev);
>  int pci_enable_ats(struct pci_dev *dev, int ps);
>  void pci_disable_ats(struct pci_dev *dev);
>  int pci_ats_queue_depth(struct pci_dev *dev);
>  int pci_ats_page_aligned(struct pci_dev *dev);
>  #else /* CONFIG_PCI_ATS */
> +static inline bool pci_ats_supported(struct pci_dev *d)
> +{ return false; }
>  static inline int pci_enable_ats(struct pci_dev *d, int ps)
>  { return -ENODEV; }
>  static inline void pci_disable_ats(struct pci_dev *d) { }
> diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> index 390e92f2d8d1fc..15fa0c37fd8e44 100644
> --- a/drivers/pci/ats.c
> +++ b/drivers/pci/ats.c
> @@ -30,6 +30,22 @@ void pci_ats_init(struct pci_dev *dev)
>   dev->ats_cap = pos;
>  }
>  
> +/**
> + * pci_ats_supported - check if the device can use ATS
> + * @dev: the PCI device
> + *
> + * Returns true if the device supports ATS and is allowed to use it, false
> + * otherwise.
> + */
> +bool pci_ats_supported(struct pci_dev *dev)
> +{
> + if (!dev->ats_cap)
> + return false;
> +
> + return !dev->untrusted;
> +}
> +EXPORT_SYMBOL_GPL(pci_ats_supported);
> +
>  /**
>   * pci_enable_ats - enable the ATS capability
>   * @dev: the PCI device
> @@ -42,7 +58,7 @@ int pci_enable_ats(struct pci_dev *dev, int ps)
>   u16 ctrl;
>   struct pci_dev *pdev;
>  
> - if (!dev->ats_cap)
> + if (!pci_ats_supported(dev))
>   return -EINVAL;
>  
>   if (WARN_ON(dev->ats_enabled))
> -- 
> 2.26.2
> 


Re: [PATCH v2 2/3] PCI: Add DMA configuration for virtual platforms

2020-03-18 Thread Bjorn Helgaas
On Fri, Feb 28, 2020 at 06:25:37PM +0100, Jean-Philippe Brucker wrote:
> Hardware platforms usually describe the IOMMU topology using either
> device-tree pointers or vendor-specific ACPI tables.  For virtual
> platforms that don't provide a device-tree, the virtio-iommu device
> contains a description of the endpoints it manages.  That information
> allows us to probe endpoints after the IOMMU is probed (possibly as late
> as userspace modprobe), provided it is discovered early enough.
> 
> Add a hook to pci_dma_configure(), which returns -EPROBE_DEFER if the
> endpoint is managed by a vIOMMU that will be loaded later, or 0 in any
> other case to avoid disturbing the normal DMA configuration methods.
> When CONFIG_VIRTIO_IOMMU_TOPOLOGY isn't selected, the call to
> virt_dma_configure() is compiled out.
> 
> As long as the information is consistent, platforms can provide both a
> device-tree and a built-in topology, and the IOMMU infrastructure is
> able to deal with multiple DMA configuration methods.
> 
> Signed-off-by: Jean-Philippe Brucker 

Acked-by: Bjorn Helgaas 

> ---
>  drivers/pci/pci-driver.c | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index 0454ca0e4e3f..69303a814f21 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -18,6 +18,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include "pci.h"
>  #include "pcie/portdrv.h"
>  
> @@ -1602,6 +1603,10 @@ static int pci_dma_configure(struct device *dev)
>   struct device *bridge;
>   int ret = 0;
>  
> + ret = virt_dma_configure(dev);
> + if (ret)
> + return ret;
> +
>   bridge = pci_get_host_bridge_device(to_pci_dev(dev));
>  
>   if (IS_ENABLED(CONFIG_OF) && bridge->parent &&
> -- 
> 2.25.0
> 


Re: [PATCH v2 1/6] PCI/ATS: Export symbols of PASID functions

2020-03-18 Thread Bjorn Helgaas
On Mon, Feb 24, 2020 at 05:58:41PM +0100, Jean-Philippe Brucker wrote:
> The Arm SMMUv3 driver uses pci_{enable,disable}_pasid() and related
> functions.  Export them to allow the driver to be built as a module.
> 
> Signed-off-by: Jean-Philippe Brucker 

Acked-by: Bjorn Helgaas 

> ---
>  drivers/pci/ats.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> index 3ef0bb281e7c..390e92f2d8d1 100644
> --- a/drivers/pci/ats.c
> +++ b/drivers/pci/ats.c
> @@ -366,6 +366,7 @@ int pci_enable_pasid(struct pci_dev *pdev, int features)
>  
>   return 0;
>  }
> +EXPORT_SYMBOL_GPL(pci_enable_pasid);
>  
>  /**
>   * pci_disable_pasid - Disable the PASID capability
> @@ -390,6 +391,7 @@ void pci_disable_pasid(struct pci_dev *pdev)
>  
>   pdev->pasid_enabled = 0;
>  }
> +EXPORT_SYMBOL_GPL(pci_disable_pasid);
>  
>  /**
>   * pci_restore_pasid_state - Restore PASID capabilities
> @@ -441,6 +443,7 @@ int pci_pasid_features(struct pci_dev *pdev)
>  
>   return supported;
>  }
> +EXPORT_SYMBOL_GPL(pci_pasid_features);
>  
>  #define PASID_NUMBER_SHIFT   8
>  #define PASID_NUMBER_MASK(0x1f << PASID_NUMBER_SHIFT)
> @@ -469,4 +472,5 @@ int pci_max_pasids(struct pci_dev *pdev)
>  
>   return (1 << supported);
>  }
> +EXPORT_SYMBOL_GPL(pci_max_pasids);
>  #endif /* CONFIG_PCI_PASID */
> -- 
> 2.25.0
> 


Re: [PATCH v2 02/11] PCI: Add ats_supported host bridge flag

2020-03-12 Thread Bjorn Helgaas
On Wed, Mar 11, 2020 at 01:44:57PM +0100, Jean-Philippe Brucker wrote:
> Each vendor has their own way of describing whether a host bridge
> supports ATS.  The Intel and AMD ACPI tables selectively enable or
> disable ATS per device or sub-tree, while Arm has a single bit for each
> host bridge.  For those that need it, add an ats_supported bit to the
> host bridge structure.

Can you mention the specific ACPI tables here in the commit log?

Maybe elaborate on the "for those that need it" bit?  I'm not sure if
you need it for the cases where DT or ACPI tells us directly for the
host bridge, or if you need it for the more selective cases?

I guess in one sense you *always* need it since you check the cached
bit later.

I don't understand the implications of this, especially the selective
situation.  Given your comment from the first posting, I thought this
was a property of the host bridge, so I don't know what it means to
say some devices support ATS but others don't.

> Signed-off-by: Jean-Philippe Brucker 
> ---
> v1->v2: try to improve the comment
> ---
>  drivers/pci/probe.c | 8 
>  include/linux/pci.h | 1 +
>  2 files changed, 9 insertions(+)
> 
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 512cb4312ddd..b5e36f06b40a 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -598,6 +598,14 @@ static void pci_init_host_bridge(struct pci_host_bridge 
> *bridge)
>   bridge->native_shpc_hotplug = 1;
>   bridge->native_pme = 1;
>   bridge->native_ltr = 1;
> +
> + /*
> +  * Some systems (ACPI IORT, device-tree) declare ATS support at the host
> +  * bridge, and clear this bit when ATS isn't supported. Others (ACPI
> +  * DMAR and IVRS) declare ATS support with a smaller granularity, and
> +  * need this bit set.
> +  */
> + bridge->ats_supported = 1;
>  }
>  
>  struct pci_host_bridge *pci_alloc_host_bridge(size_t priv)
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 3840a541a9de..9fe2e84d74d7 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -511,6 +511,7 @@ struct pci_host_bridge {
>   unsigned intnative_pme:1;   /* OS may use PCIe PME */
>   unsigned intnative_ltr:1;   /* OS may use PCIe LTR */
>   unsigned intpreserve_config:1;  /* Preserve FW resource setup */
> + unsigned intats_supported:1;
>  
>   /* Resource alignment requirements */
>   resource_size_t (*align_resource)(struct pci_dev *dev,
> -- 
> 2.25.1
> 


Re: [PATCH v2 05/11] PCI/ATS: Gather checks into pci_ats_supported()

2020-03-12 Thread Bjorn Helgaas
On Wed, Mar 11, 2020 at 01:45:00PM +0100, Jean-Philippe Brucker wrote:
> IOMMU drivers need to perform several tests when checking if a device
> supports ATS.  Move them all into a new function that returns true when
> a device and its host bridge support ATS.
> 
> Since pci_enable_ats() now calls pci_ats_supported(), the following
> new checks are now common:
> * whether a device is trusted.  Devices plugged into external-facing
>   ports such as Thunderbolt are untrusted.
> * whether the host bridge supports ATS, which defaults to true unless
>   the firmware description states that ATS isn't supported by the host
>   bridge.
> 
> Signed-off-by: Jean-Philippe Brucker 

Acked-by: Bjorn Helgaas 

> ---
>  drivers/pci/ats.c   | 30 +-
>  include/linux/pci-ats.h |  3 +++
>  2 files changed, 32 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> index 390e92f2d8d1..bbfd0d42b8b9 100644
> --- a/drivers/pci/ats.c
> +++ b/drivers/pci/ats.c
> @@ -30,6 +30,34 @@ void pci_ats_init(struct pci_dev *dev)
>   dev->ats_cap = pos;
>  }
>  
> +/**
> + * pci_ats_supported - check if the device can use ATS
> + * @dev: the PCI device
> + *
> + * Returns true if the device supports ATS and is allowed to use it, false
> + * otherwise.
> + */
> +bool pci_ats_supported(struct pci_dev *dev)
> +{
> + struct pci_host_bridge *bridge;
> +
> + if (!dev->ats_cap)
> + return false;
> +
> + if (dev->untrusted)
> + return false;
> +
> + bridge = pci_find_host_bridge(dev->bus);
> + if (!bridge)
> + return false;
> +
> + if (!bridge->ats_supported)
> + return false;
> +
> + return true;

I assume this is the same as

  return bridge->ats_supported;

Only "assuming" because I'm not a C language lawyer, but I assume it
does the obvious conversion from unsigned:1 to bool.
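
For reference, a small stand-alone sketch of that conversion.  The struct and
function names are hypothetical; only the bitfield width matches the patch.
A 1-bit unsigned field can only hold 0 or 1, and converting it to bool in C99
and later yields false for 0 and true for anything else, so returning the
field directly should behave the same as the explicit if/return pair.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant part of struct pci_host_bridge. */
struct model_bridge {
	unsigned int native_ltr:1;
	unsigned int ats_supported:1;
};

static bool model_bridge_ats_supported(const struct model_bridge *bridge)
{
	/* The 1-bit field is converted to bool: 0 -> false, 1 -> true. */
	return bridge->ats_supported;
}
```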

> +}


Re: [PATCH v2 04/11] ACPI/IORT: Check ATS capability in root complex node

2020-03-12 Thread Bjorn Helgaas
On Wed, Mar 11, 2020 at 01:44:59PM +0100, Jean-Philippe Brucker wrote:
> When initializing a PCI root bridge, copy its "ATS supported" attribute
> into the root bridge.
> 
> Acked-by: Hanjun Guo 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  drivers/acpi/arm64/iort.c | 27 +++
>  drivers/acpi/pci_root.c   |  3 +++
>  include/linux/acpi_iort.h |  8 
>  3 files changed, 38 insertions(+)
> 
> diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
> index ed3d2d1a7ae9..d99d7f5b51e1 100644
> --- a/drivers/acpi/arm64/iort.c
> +++ b/drivers/acpi/arm64/iort.c
> @@ -1633,6 +1633,33 @@ static void __init iort_enable_acs(struct 
> acpi_iort_node *iort_node)
>   }
>   }
>  }
> +
> +static acpi_status iort_match_host_bridge_callback(struct acpi_iort_node 
> *node,
> +void *context)
> +{
> + struct acpi_iort_root_complex *pci_rc;
> + struct pci_host_bridge *host_bridge = context;
> +
> + pci_rc = (struct acpi_iort_root_complex *)node->node_data;
> +
> + return pci_domain_nr(host_bridge->bus) == pci_rc->pci_segment_number ?
> + AE_OK : AE_NOT_FOUND;
> +}
> +
> +void iort_pci_host_bridge_setup(struct pci_host_bridge *host_bridge)
> +{
> + struct acpi_iort_node *node;
> + struct acpi_iort_root_complex *pci_rc;
> +
> + node = iort_scan_node(ACPI_IORT_NODE_PCI_ROOT_COMPLEX,
> +   iort_match_host_bridge_callback, host_bridge);
> + if (!node)
> + return;
> +
> + pci_rc = (struct acpi_iort_root_complex *)node->node_data;
> + host_bridge->ats_supported = !!(pci_rc->ats_attribute &
> + ACPI_IORT_ATS_SUPPORTED);
> +}
>  #else
>  static inline void iort_enable_acs(struct acpi_iort_node *iort_node) { }
>  #endif
> diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
> index d1e666ef3fcc..eb2fb8f17c0b 100644
> --- a/drivers/acpi/pci_root.c
> +++ b/drivers/acpi/pci_root.c
> @@ -6,6 +6,7 @@
>   *  Copyright (C) 2001, 2002 Paul Diefenbaugh 
>   */
>  
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -917,6 +918,8 @@ struct pci_bus *acpi_pci_root_create(struct acpi_pci_root 
> *root,
>   if (!(root->osc_control_set & OSC_PCI_EXPRESS_LTR_CONTROL))
>   host_bridge->native_ltr = 0;
>  
> + iort_pci_host_bridge_setup(host_bridge);

Similar comment as on the OF side.

You mentioned at [1] that "it's important that we only enable ATS if
the host bridge supports it".  That should be captured in a commit log
and comment somewhere here.

That suggests to me that we should not set

  bridge->ats_supported = 1;

by default in pci_init_host_bridge(), but rather leave it zero as it
is by default, and then do things like:

  if (iort_pci_host_bridge_ats_supported(bridge))
bridge->ats_supported = 1;

  if (of_pci_host_bridge_ats_supported(bridge))
bridge->ats_supported = 1;

I don't know what you do about IVRS and DMAR, which don't appear in
this series except in the comment.

[1] https://lore.kernel.org/r/20200213165049.508908-1-jean-phili...@linaro.org
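
A minimal sketch of the "default to zero, opt in per firmware source" shape
suggested above, written as a user-space model.  All names here are
hypothetical (struct sketch_fw, struct sketch_host_bridge and the sketch_*
helpers); the two booleans stand in for whatever the IORT node and the DT
property would report, and nothing is implied about IVRS or DMAR.

```c
#include <stdbool.h>

/* Hypothetical summary of what firmware says about one host bridge. */
struct sketch_fw {
	bool iort_says_ats;	/* IORT root complex node has the ATS attribute */
	bool dt_says_ats;	/* DT node carries "ats-supported" */
};

struct sketch_host_bridge {
	unsigned int ats_supported:1;
};

static bool sketch_iort_ats_supported(const struct sketch_fw *fw)
{
	return fw->iort_says_ats;
}

static bool sketch_of_ats_supported(const struct sketch_fw *fw)
{
	return fw->dt_says_ats;
}

static void sketch_init_bridge(struct sketch_host_bridge *bridge,
			       const struct sketch_fw *fw)
{
	/* Default: ATS is not supported until some source says otherwise. */
	bridge->ats_supported = 0;

	if (sketch_iort_ats_supported(fw))
		bridge->ats_supported = 1;

	if (sketch_of_ats_supported(fw))
		bridge->ats_supported = 1;
}
```

With this shape a platform that describes nothing ends up with ATS disabled,
which is the opposite of the default in the posted patch.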

>   /*
>* Evaluate the "PCI Boot Configuration" _DSM Function.  If it
>* exists and returns 0, we must preserve any PCI resource
> diff --git a/include/linux/acpi_iort.h b/include/linux/acpi_iort.h
> index 8e7e2ec37f1b..7b06871cc3aa 100644
> --- a/include/linux/acpi_iort.h
> +++ b/include/linux/acpi_iort.h
> @@ -10,6 +10,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #define IORT_IRQ_MASK(irq)   (irq & 0xffffffffULL)
>  #define IORT_IRQ_TRIGGER_MASK(irq)   ((irq >> 32) & 0xffffffffULL)
> @@ -55,4 +56,11 @@ int iort_iommu_msi_get_resv_regions(struct device *dev, 
> struct list_head *head)
>  { return 0; }
>  #endif
>  
> +#if defined(CONFIG_ACPI_IORT) && defined(CONFIG_PCI)
> +void iort_pci_host_bridge_setup(struct pci_host_bridge *host_bridge);
> +#else
> +static inline
> +void iort_pci_host_bridge_setup(struct pci_host_bridge *host_bridge) { }
> +#endif
> +
>  #endif /* __ACPI_IORT_H__ */
> -- 
> 2.25.1
> 


Re: [PATCH v2 03/11] PCI: OF: Check whether the host bridge supports ATS

2020-03-12 Thread Bjorn Helgaas
On Wed, Mar 11, 2020 at 01:44:58PM +0100, Jean-Philippe Brucker wrote:
> When setting up a generic host on a device-tree based system, copy the
> ats-supported flag into the pci_host_bridge structure.
> 
> Signed-off-by: Jean-Philippe Brucker 
> ---
> v1->v2: keep the helper in pci-host-common.c
> ---
>  drivers/pci/controller/pci-host-common.c | 11 +++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/drivers/pci/controller/pci-host-common.c 
> b/drivers/pci/controller/pci-host-common.c
> index 250a3fc80ec6..2e800bc6ae7a 100644
> --- a/drivers/pci/controller/pci-host-common.c
> +++ b/drivers/pci/controller/pci-host-common.c
> @@ -54,6 +54,16 @@ static struct pci_config_window *gen_pci_init(struct 
> device *dev,
>   return ERR_PTR(err);
>  }
>  
> +static void of_pci_host_check_ats(struct pci_host_bridge *bridge)
> +{
> + struct device_node *np = bridge->bus->dev.of_node;
> +
> + if (!np)
> + return;
> +
> + bridge->ats_supported = of_property_read_bool(np, "ats-supported");
> +}
> +
>  int pci_host_common_probe(struct platform_device *pdev,
> struct pci_ecam_ops *ops)
>  {
> @@ -92,6 +102,7 @@ int pci_host_common_probe(struct platform_device *pdev,
>   return ret;
>   }
>  
> + of_pci_host_check_ats(bridge);

I would prefer to write this as a predicate instead of having the
assignment be a side-effect, e.g.,

  bridge->ats_supported = of_pci_host_ats_supported(bridge);

If that works for you,

Acked-by: Bjorn Helgaas 
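
To illustrate the predicate form, here is a user-space sketch in which the
helper only answers the question and the caller does the assignment.  The
node type and sketch_node_has_prop() are hypothetical stand-ins for
struct device_node and of_property_read_bool(); treating a missing node as
"not supported" is also an assumption of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical node: just a list of boolean property names that are present. */
struct sketch_node {
	const char *const *props;
	size_t nprops;
};

static bool sketch_node_has_prop(const struct sketch_node *np, const char *name)
{
	size_t i;

	for (i = 0; i < np->nprops; i++)
		if (strcmp(np->props[i], name) == 0)
			return true;
	return false;
}

/* Predicate: no side effects, the caller decides what to do with the answer. */
static bool sketch_of_host_ats_supported(const struct sketch_node *np)
{
	return np && sketch_node_has_prop(np, "ats-supported");
}
```

The call site then reads as a plain assignment from the predicate's result,
so the store to the bridge flag is visible where it happens.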

>   platform_set_drvdata(pdev, bridge->bus);
>   return 0;
>  }
> -- 
> 2.25.1
> 


Re: [PATCH v4 24/26] PCI/ATS: Add PRI stubs

2020-02-27 Thread Bjorn Helgaas
On Mon, Feb 24, 2020 at 07:23:59PM +0100, Jean-Philippe Brucker wrote:
> The SMMUv3 driver, which can be built without CONFIG_PCI, will soon gain
> support for PRI.  Partially revert commit c6e9aefbf9db ("PCI/ATS: Remove
> unused PRI and PASID stubs") to re-introduce the PRI stubs, and avoid
> adding more #ifdefs to the SMMU driver.
> 
> Cc: Bjorn Helgaas 
> Signed-off-by: Jean-Philippe Brucker 

Acked-by: Bjorn Helgaas 

> ---
>  include/linux/pci-ats.h | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/include/linux/pci-ats.h b/include/linux/pci-ats.h
> index f75c307f346d..e9e266df9b37 100644
> --- a/include/linux/pci-ats.h
> +++ b/include/linux/pci-ats.h
> @@ -28,6 +28,14 @@ int pci_enable_pri(struct pci_dev *pdev, u32 reqs);
>  void pci_disable_pri(struct pci_dev *pdev);
>  int pci_reset_pri(struct pci_dev *pdev);
>  int pci_prg_resp_pasid_required(struct pci_dev *pdev);
> +#else /* CONFIG_PCI_PRI */
> +static inline int pci_enable_pri(struct pci_dev *pdev, u32 reqs)
> +{ return -ENODEV; }
> +static inline void pci_disable_pri(struct pci_dev *pdev) { }
> +static inline int pci_reset_pri(struct pci_dev *pdev)
> +{ return -ENODEV; }
> +static inline int pci_prg_resp_pasid_required(struct pci_dev *pdev)
> +{ return 0; }
>  #endif /* CONFIG_PCI_PRI */
>  
>  #ifdef CONFIG_PCI_PASID
> -- 
> 2.25.0
> 


Re: [PATCH v4 25/26] PCI/ATS: Export symbols of PRI functions

2020-02-27 Thread Bjorn Helgaas
Subject could be simply "PCI/ATS: Export PRI functions"

On Mon, Feb 24, 2020 at 07:24:00PM +0100, Jean-Philippe Brucker wrote:
> The SMMUv3 driver uses pci_{enable,disable}_pri() and related
> functions. Export those functions to allow the driver to be built as a
> module.
> 
> Cc: Bjorn Helgaas 
> Signed-off-by: Jean-Philippe Brucker 

Acked-by: Bjorn Helgaas 

> ---
>  drivers/pci/ats.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> index bbfd0d42b8b9..fc8fc6fc8bd5 100644
> --- a/drivers/pci/ats.c
> +++ b/drivers/pci/ats.c
> @@ -197,6 +197,7 @@ void pci_pri_init(struct pci_dev *pdev)
>   if (status & PCI_PRI_STATUS_PASID)
>   pdev->pasid_required = 1;
>  }
> +EXPORT_SYMBOL_GPL(pci_pri_init);
>  
>  /**
>   * pci_enable_pri - Enable PRI capability
> @@ -243,6 +244,7 @@ int pci_enable_pri(struct pci_dev *pdev, u32 reqs)
>  
>   return 0;
>  }
> +EXPORT_SYMBOL_GPL(pci_enable_pri);
>  
>  /**
>   * pci_disable_pri - Disable PRI capability
> @@ -322,6 +324,7 @@ int pci_reset_pri(struct pci_dev *pdev)
>  
>   return 0;
>  }
> +EXPORT_SYMBOL_GPL(pci_reset_pri);
>  
>  /**
>   * pci_prg_resp_pasid_required - Return PRG Response PASID Required bit
> @@ -337,6 +340,7 @@ int pci_prg_resp_pasid_required(struct pci_dev *pdev)
>  
>   return pdev->pasid_required;
>  }
> +EXPORT_SYMBOL_GPL(pci_prg_resp_pasid_required);
>  #endif /* CONFIG_PCI_PRI */
>  
>  #ifdef CONFIG_PCI_PASID
> -- 
> 2.25.0
> 


Re: [PATCH 02/11] PCI: Add ats_supported host bridge flag

2020-02-15 Thread Bjorn Helgaas
On Thu, Feb 13, 2020 at 05:50:40PM +0100, Jean-Philippe Brucker wrote:
> Each vendor has their own way of describing whether a host bridge
> supports ATS.  The Intel and AMD ACPI tables selectively enable or
> disable ATS per device or sub-tree, while Arm has a single bit for each
> host bridge.  For those that need it, add an ats_supported bit to the
> host bridge structure.
> 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  drivers/pci/probe.c | 7 +++
>  include/linux/pci.h | 1 +
>  2 files changed, 8 insertions(+)
> 
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 512cb4312ddd..75c0a25af44e 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -598,6 +598,13 @@ static void pci_init_host_bridge(struct pci_host_bridge *bridge)
>   bridge->native_shpc_hotplug = 1;
>   bridge->native_pme = 1;
>   bridge->native_ltr = 1;
> +
> + /*
> +  * Some systems may disable ATS at the host bridge (ACPI IORT,
> +  * device-tree), others filter it with a smaller granularity (ACPI DMAR
> +  * and IVRS).
> +  */
> + bridge->ats_supported = 1;

The cover letter says it's important to enable ATS only if the host
bridge supports it.  From the other patches, it looks like we learn if
the host bridge supports ATS from either a DT "ats-supported" property
or an ACPI IORT table.  If that's the case, shouldn't the default here
be "ATS is *not* supported"?
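
To make the question concrete, here is a minimal userspace sketch of the
opt-in alternative: the flag starts cleared and is set only when firmware
reports ATS support.  The struct and both function names are hypothetical
stand-ins, not the kernel's pci_host_bridge API, and this is not what the
posted patch does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant part of struct pci_host_bridge. */
struct host_bridge {
	unsigned int ats_supported:1;
};

/* Opt-in default: assume no ATS until firmware says otherwise. */
static void host_bridge_init(struct host_bridge *bridge)
{
	assert(bridge);
	bridge->ats_supported = 0;
}

/*
 * Hypothetical hook for whatever parses the DT "ats-supported" property
 * or the ACPI IORT table; fw_reports_ats is the parsed result.
 */
static void host_bridge_apply_firmware(struct host_bridge *bridge,
				       bool fw_reports_ats)
{
	assert(bridge);
	if (fw_reports_ats)
		bridge->ats_supported = 1;
}
```

With this shape, platforms that filter ATS at a finer granularity (DMAR,
IVRS) would have to set the bit explicitly, which is the trade-off against
the default-on approach in the patch.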

>  }
>  
>  struct pci_host_bridge *pci_alloc_host_bridge(size_t priv)
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 3840a541a9de..9fe2e84d74d7 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -511,6 +511,7 @@ struct pci_host_bridge {
>   unsigned int native_pme:1;   /* OS may use PCIe PME */
>   unsigned int native_ltr:1;   /* OS may use PCIe LTR */
>   unsigned int preserve_config:1;  /* Preserve FW resource setup */
> + unsigned int ats_supported:1;
>  
>   /* Resource alignment requirements */
>   resource_size_t (*align_resource)(struct pci_dev *dev,
> -- 
> 2.25.0
> 


Re: [PATCH v5 0/7] Clean up VMD DMA Map Ops

2020-01-24 Thread Bjorn Helgaas
On Tue, Jan 21, 2020 at 06:37:44AM -0700, Jon Derrick wrote:
> v4 Set: 
> https://lore.kernel.org/linux-pci/20200120110220.gb17...@e121166-lin.cambridge.arm.com/T/#t
> v3 Set: 
> https://lore.kernel.org/linux-iommu/20200113181742.ga27...@e121166-lin.cambridge.arm.com/T/#t
> v2 Set: 
> https://lore.kernel.org/linux-iommu/1578580256-3483-1-git-send-email-jonathan.derr...@intel.com/T/#t
> v1 Set: 
> https://lore.kernel.org/linux-iommu/20200107134125.gd30...@8bytes.org/T/#t
> 
> VMD currently works with VT-d enabled by pointing DMA and IOMMU actions at the
> VMD endpoint. The problem with this approach is that the VMD endpoint's
> device-specific attributes, such as the DMA Mask Bits, are used instead of the
> child device's attributes.
> 
> This set cleans up VMD by removing the override that redirects DMA map
> operations to the VMD endpoint. Instead it introduces a new DMA alias
> mechanism into the existing DMA alias infrastructure. This new DMA alias
> mechanism allows an architecture-specific pci_real_dma_dev() function to
> provide a pointer from a pci_dev to its PCI DMA device, where by default
> it returns the original pci_dev.
> 
> In addition, this set removes the sanity check that was added to prevent
> assigning VMD child devices. By using the DMA alias mechanism, all child
> devices are assigned the same IOMMU group as the VMD endpoint. This
> removes the need for restricting VMD child devices from assignment, as the
> whole group would have to be assigned, requiring unbinding the VMD driver
> and removing the child device domain.
> 
> v1 added a pointer in struct pci_dev that pointed to the DMA alias' struct
> pci_dev and did the necessary DMA alias and IOMMU modifications.
> 
> v2 introduced a new weak function to reference the 'Direct DMA Alias', and
> removed the need to add a pointer in struct device or pci_dev. Weak functions
> are generally frowned upon when it's a single architecture implementation,
> so I am open to alternatives.
> 
> v3 referenced the pci_dev rather than the struct device for the PCI
> 'Direct DMA Alias' (pci_direct_dma_alias()). This revision also allowed
> pci_for_each_dma_alias() to call any DMA aliases for the Direct DMA alias
> device, though I don't expect the VMD endpoint to need intra-bus DMA aliases.
> 
> v4 changes the 'Direct DMA Alias' to instead refer to the 'Real DMA Dev',
> which either returns the PCI device itself or the PCI DMA device.
> 
> v5 Fixes a bad call argument to pci_real_dma_dev that would have broken
> bisection. This revision also changes one of the calls to a one-liner, and
> assembles the same on my system.
> 
> 
> Changes from v4:
> Fix pci_real_dma_dev() call in 4/7.
> Change other pci_real_dma_dev() call in 4/7 to one-liner.
> 
> Changes from v3:
> Uses pci_real_dma_dev() instead of pci_direct_dma_alias()
> Split IOMMU enabling, IOMMU VMD sanity check and VMD dma_map_ops cleanup
> into three patches
> 
> Changes from v2:
> Uses struct pci_dev for PCI Device 'Direct DMA aliasing'
> (pci_direct_dma_alias)
> Allows pci_for_each_dma_alias to iterate over the alias mask of the
> 'Direct DMA alias'
> 
> Changes from v1:
> Removed 1/5 & 2/5 misc fix patches that were merged
> Uses Christoph's staging/cleanup patches
> Introduce weak function rather than including pointer in struct device or 
> pci_dev.
> 
> Based on Bjorn's next:
> https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/log/?h=next
> 
> Christoph Hellwig (2):
>   x86/PCI: Add a to_pci_sysdata helper
>   x86/PCI: Remove X86_DEV_DMA_OPS
> 
> Jon Derrick (5):
>   x86/PCI: Expose VMD's PCI Device in pci_sysdata
>   PCI: Introduce pci_real_dma_dev()
>   iommu/vt-d: Use pci_real_dma_dev() for mapping
>   iommu/vt-d: Remove VMD child device sanity check
>   PCI: vmd: Stop overriding dma_map_ops
> 
>  arch/x86/Kconfig   |   3 -
>  arch/x86/include/asm/device.h  |  10 ---
>  arch/x86/include/asm/pci.h |  31 -
>  arch/x86/pci/common.c  |  48 +++--
>  drivers/iommu/intel-iommu.c|  11 ++-
>  drivers/pci/controller/Kconfig |   1 -
>  drivers/pci/controller/vmd.c   | 152 
> +
>  drivers/pci/pci.c  |  19 +-
>  drivers/pci/search.c   |   6 ++
>  include/linux/pci.h|   1 +
>  10 files changed, 54 insertions(+), 228 deletions(-)

Applied with acks/reviewed-by from Lu, Keith, and Lorenzo to
pci/host-vmd for v5.6, thanks!


Re: [PATCH v5 3/7] PCI: Introduce pci_real_dma_dev()

2020-01-22 Thread Bjorn Helgaas
On Tue, Jan 21, 2020 at 06:37:47AM -0700, Jon Derrick wrote:
> The current DMA alias implementation requires the aliased device be on
> the same PCI bus as the requester ID. This introduces an arch-specific
> mechanism to point to another PCI device when doing mapping and
> PCI DMA alias search. The default case returns the actual device.
> 
> CC: Christoph Hellwig 
> Signed-off-by: Jon Derrick 

Acked-by: Bjorn Helgaas 

Looks like a nice cleanup to me.

Lorenzo, let me know if you want me to take this.
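
For readers skimming the thread, the lookup this patch adds can be modelled
in a few lines of userspace C.  This is only a sketch under simplifying
assumptions: the struct layouts are invented, the VMD device hangs directly
off the bus instead of going through is_vmd()/to_pci_sysdata(), and the
dma_alias_mask half of pci_devs_are_dma_aliases() is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct pci_dev;

/* Simplified bus: vmd_dev is non-NULL only for a bus behind a VMD endpoint. */
struct pci_bus {
	struct pci_dev *vmd_dev;
};

struct pci_dev {
	struct pci_bus *bus;
	unsigned int devfn;
};

/* Model of pci_real_dma_dev(): default to the device itself. */
static struct pci_dev *real_dma_dev(struct pci_dev *dev)
{
	assert(dev && dev->bus);
	if (dev->bus->vmd_dev)
		return dev->bus->vmd_dev;
	return dev;
}

/* Only the two comparisons the patch adds; alias-mask checks omitted. */
static bool devs_are_dma_aliases(struct pci_dev *dev1, struct pci_dev *dev2)
{
	return real_dma_dev(dev1) == dev2 || real_dma_dev(dev2) == dev1;
}
```

A device behind VMD then resolves to the VMD endpoint, while every other
device resolves to itself, which matches the "default case returns the
actual device" wording in the commit log.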

> ---
>  arch/x86/pci/common.c | 10 ++
>  drivers/pci/pci.c | 19 ++-
>  drivers/pci/search.c  |  6 ++
>  include/linux/pci.h   |  1 +
>  4 files changed, 35 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
> index 1e59df0..fe21a5c 100644
> --- a/arch/x86/pci/common.c
> +++ b/arch/x86/pci/common.c
> @@ -736,3 +736,13 @@ int pci_ext_cfg_avail(void)
>   else
>   return 0;
>  }
> +
> +#if IS_ENABLED(CONFIG_VMD)
> +struct pci_dev *pci_real_dma_dev(struct pci_dev *dev)
> +{
> + if (is_vmd(dev->bus))
> + return to_pci_sysdata(dev->bus)->vmd_dev;
> +
> + return dev;
> +}
> +#endif
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 581b177..36d24f2 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -6048,7 +6048,9 @@ bool pci_devs_are_dma_aliases(struct pci_dev *dev1, 
> struct pci_dev *dev2)
>   return (dev1->dma_alias_mask &&
>   test_bit(dev2->devfn, dev1->dma_alias_mask)) ||
>  (dev2->dma_alias_mask &&
> - test_bit(dev1->devfn, dev2->dma_alias_mask));
> + test_bit(dev1->devfn, dev2->dma_alias_mask)) ||
> +pci_real_dma_dev(dev1) == dev2 ||
> +pci_real_dma_dev(dev2) == dev1;
>  }
>  
>  bool pci_device_is_present(struct pci_dev *pdev)
> @@ -6072,6 +6074,21 @@ void pci_ignore_hotplug(struct pci_dev *dev)
>  }
>  EXPORT_SYMBOL_GPL(pci_ignore_hotplug);
>  
> +/**
> + * pci_real_dma_dev - Get PCI DMA device for PCI device
> + * @dev: the PCI device that may have a PCI DMA alias
> + *
> + * Permits the platform to provide architecture-specific functionality to
> + * devices needing to alias DMA to another PCI device on another PCI bus. If
> + * the PCI device is on the same bus, it is recommended to use
> + * pci_add_dma_alias(). This is the default implementation. Architecture
> + * implementations can override this.
> + */
> +struct pci_dev __weak *pci_real_dma_dev(struct pci_dev *dev)
> +{
> + return dev;
> +}
> +
>  resource_size_t __weak pcibios_default_alignment(void)
>  {
>   return 0;
> diff --git a/drivers/pci/search.c b/drivers/pci/search.c
> index e4dbdef..2061672 100644
> --- a/drivers/pci/search.c
> +++ b/drivers/pci/search.c
> @@ -32,6 +32,12 @@ int pci_for_each_dma_alias(struct pci_dev *pdev,
>   struct pci_bus *bus;
>   int ret;
>  
> + /*
> +  * The device may have an explicit alias requester ID for DMA where the
> +  * requester is on another PCI bus.
> +  */
> + pdev = pci_real_dma_dev(pdev);
>   ret = fn(pdev, pci_dev_id(pdev), data);
>   if (ret)
>   return ret;
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 930fab2..3840a54 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1202,6 +1202,7 @@ u32 pcie_bandwidth_available(struct pci_dev *dev, 
> struct pci_dev **limiting_dev,
>  int pci_select_bars(struct pci_dev *dev, unsigned long flags);
>  bool pci_device_is_present(struct pci_dev *pdev);
>  void pci_ignore_hotplug(struct pci_dev *dev);
> +struct pci_dev *pci_real_dma_dev(struct pci_dev *dev);
>  
>  int __printf(6, 7) pci_request_irq(struct pci_dev *dev, unsigned int nr,
>   irq_handler_t handler, irq_handler_t thread_fn, void *dev_id,
> -- 
> 1.8.3.1
> 


Re: [RFC PATCH 2/4] PCI: Add "pci=iommu_passthrough=" parameter for iommu passthrough

2020-01-21 Thread Bjorn Helgaas
[+cc linux-pci, thread at 
https://lore.kernel.org/r/20200101052648.14295-1-baolu...@linux.intel.com]

On Wed, Jan 01, 2020 at 01:26:46PM +0800, Lu Baolu wrote:
> The new parameter takes a list of devices separated by a semicolon.
> Each device specified will have its iommu_passthrough bit in struct
> device set. This is very similar to the existing 'disable_acs_redir'
> parameter.

Almost all of this patchset is in drivers/iommu.  Should the parameter
be "iommu ..." instead of "pci=iommu_passthrough=..."?

There is already an "iommu.passthrough=" argument.  Would this fit
better there?  Since the iommu_passthrough bit is generic, it seems
like you anticipate similar situations for non-PCI devices.

> Signed-off-by: Lu Baolu 
> ---
>  .../admin-guide/kernel-parameters.txt |  5 +++
>  drivers/pci/pci.c | 34 +++
>  drivers/pci/pci.h |  1 +
>  drivers/pci/probe.c   |  2 ++
>  4 files changed, 42 insertions(+)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt 
> b/Documentation/admin-guide/kernel-parameters.txt
> index ade4e6ec23e0..d3edc2cb6696 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -3583,6 +3583,11 @@
>   may put more devices in an IOMMU group.
>   force_floating  [S390] Force usage of floating interrupts.
>   nomio   [S390] Do not use MIO instructions.
> + iommu_passthrough=<pci_dev>[; ...]
> + Specify one or more PCI devices (in the format
> + specified above) separated by semicolons.
> + Each device specified will bypass IOMMU DMA
> + translation.
>  
>   pcie_aspm=  [PCIE] Forcibly enable or disable PCIe Active State 
> Power
>   Management.
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 90dbd7c70371..05bf3f4acc36 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -6401,6 +6401,37 @@ void __weak pci_fixup_cardbus(struct pci_bus *bus)
>  }
>  EXPORT_SYMBOL(pci_fixup_cardbus);
>  
> +static const char *iommu_passthrough_param;
> +bool pci_iommu_passthrough_match(struct pci_dev *dev)
> +{
> + int ret = 0;
> + const char *p = iommu_passthrough_param;
> +
> + if (!p)
> + return false;
> +
> + while (*p) {
> + ret = pci_dev_str_match(dev, p, &p);
> + if (ret < 0) {
> + pr_info_once("PCI: Can't parse iommu_passthrough parameter: %s\n",
> +  iommu_passthrough_param);
> +
> + break;
> + } else if (ret == 1) {
> + pci_info(dev, "PCI: IOMMU passthrough\n");
> + return true;
> + }
> +
> + if (*p != ';' && *p != ',') {
> + /* End of param or invalid format */
> + break;
> + }
> + p++;
> + }
> +
> + return false;
> +}
> +
>  static int __init pci_setup(char *str)
>  {
>   while (str) {
> @@ -6462,6 +6493,8 @@ static int __init pci_setup(char *str)
>   pci_add_flags(PCI_SCAN_ALL_PCIE_DEVS);
>   } else if (!strncmp(str, "disable_acs_redir=", 18)) {
>   disable_acs_redir_param = str + 18;
> + } else if (!strncmp(str, "iommu_passthrough=", 18)) {
> + iommu_passthrough_param = str + 18;
>   } else {
>   pr_err("PCI: Unknown option `%s'\n", str);
>   }
> @@ -6486,6 +6519,7 @@ static int __init pci_realloc_setup_params(void)
>   resource_alignment_param = kstrdup(resource_alignment_param,
>  GFP_KERNEL);
>   disable_acs_redir_param = kstrdup(disable_acs_redir_param, GFP_KERNEL);
> + iommu_passthrough_param = kstrdup(iommu_passthrough_param, GFP_KERNEL);
>  
>   return 0;
>  }
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index a0a53bd05a0b..95f6af06aba6 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -288,6 +288,7 @@ void pci_reassigndev_resource_alignment(struct pci_dev 
> *dev);
>  void pci_disable_bridge_window(struct pci_dev *dev);
>  struct pci_bus *pci_bus_get(struct pci_bus *bus);
>  void pci_bus_put(struct pci_bus *bus);
> +bool pci_iommu_passthrough_match(struct pci_dev *dev);
>  
>  /* PCIe link information */
>  #define PCIE_SPEED2STR(speed) \
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 512cb4312ddd..4c571ee75621 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -2404,6 +2404,8 @@ void pci_device_add(struct pci_dev *dev, struct pci_bus 
> *bus)
>  
>   dev->state_saved = false;
>  
> + 

Re: [RFC PATCH 2/4] PCI: Add "pci=iommu_passthrough=" parameter for iommu passthrough

2020-01-17 Thread Bjorn Helgaas
On Wed, Jan 01, 2020 at 01:26:46PM +0800, Lu Baolu wrote:
> The new parameter takes a list of devices separated by a semicolon.
> Each device specified will have its iommu_passthrough bit in struct
> device set. This is very similar to the existing 'disable_acs_redir'
> parameter.
> 
> Signed-off-by: Lu Baolu 
> ---
>  .../admin-guide/kernel-parameters.txt |  5 +++
>  drivers/pci/pci.c | 34 +++
>  drivers/pci/pci.h |  1 +
>  drivers/pci/probe.c   |  2 ++
>  4 files changed, 42 insertions(+)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt 
> b/Documentation/admin-guide/kernel-parameters.txt
> index ade4e6ec23e0..d3edc2cb6696 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -3583,6 +3583,11 @@
>   may put more devices in an IOMMU group.
>   force_floating  [S390] Force usage of floating interrupts.
>   nomio   [S390] Do not use MIO instructions.
> + iommu_passthrough=<pci_dev>[; ...]
> + Specify one or more PCI devices (in the format
> + specified above) separated by semicolons.
> + Each device specified will bypass IOMMU DMA
> + translation.
>  
>   pcie_aspm=  [PCIE] Forcibly enable or disable PCIe Active State 
> Power
>   Management.
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 90dbd7c70371..05bf3f4acc36 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -6401,6 +6401,37 @@ void __weak pci_fixup_cardbus(struct pci_bus *bus)
>  }
>  EXPORT_SYMBOL(pci_fixup_cardbus);
>  
> +static const char *iommu_passthrough_param;
> +bool pci_iommu_passthrough_match(struct pci_dev *dev)
> +{
> + int ret = 0;
> + const char *p = iommu_passthrough_param;
> +
> + if (!p)
> + return false;
> +
> + while (*p) {
> + ret = pci_dev_str_match(dev, p, &p);
> + if (ret < 0) {
> + pr_info_once("PCI: Can't parse iommu_passthrough parameter: %s\n",
> +  iommu_passthrough_param);
> +
> + break;
> + } else if (ret == 1) {
> + pci_info(dev, "PCI: IOMMU passthrough\n");
> + return true;
> + }
> +
> + if (*p != ';' && *p != ',') {
> + /* End of param or invalid format */
> + break;
> + }
> + p++;
> + }
> +
> + return false;
> +}

This duplicates a lot of the code in pci_disable_acs_redir().  That
needs to be factored out somehow so we don't duplicate it.
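
One possible shape for the shared code, sketched in userspace C: a single
list walker that takes the per-entry matcher as a callback, so both
parameters could reuse the loop.  All names here are hypothetical, and
name_match() is a toy stand-in that compares plain strings; it is not
pci_dev_str_match().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Matcher contract (mirrors how the loop above uses its matcher):
 * returns 1 on match, 0 on no match, <0 on parse error, and sets
 * *endptr to the first character after the entry it consumed.
 */
typedef int (*dev_match_fn)(const char *dev_name, const char *p,
			    const char **endptr);

/* Toy matcher: an entry is everything up to the next ';' or ','. */
static int name_match(const char *dev_name, const char *p,
		      const char **endptr)
{
	size_t len = strcspn(p, ";,");

	*endptr = p + len;
	if (len == 0)
		return -1;	/* empty entry: treat as a parse error */
	return strlen(dev_name) == len && strncmp(dev_name, p, len) == 0;
}

/* Shared walker: true if any entry in the list matches the device. */
static bool dev_in_param_list(const char *dev_name, const char *param,
			      dev_match_fn match)
{
	const char *p = param;

	assert(dev_name && match);
	if (!p)
		return false;

	while (*p) {
		int ret = match(dev_name, p, &p);

		if (ret < 0)
			break;		/* unparsable entry: stop walking */
		if (ret == 1)
			return true;
		if (*p != ';' && *p != ',')
			break;		/* end of param or invalid format */
		p++;
	}

	return false;
}
```

Each caller would then keep only its own message and its own action on a
match, with the parsing loop living in one place.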

Bjorn


Re: [PATCH v3 0/5] Clean up VMD DMA Map Ops

2020-01-13 Thread Bjorn Helgaas
On Mon, Jan 13, 2020 at 05:13:38PM +0000, Derrick, Jonathan wrote:
> On Mon, 2020-01-13 at 12:08 +0000, Lorenzo Pieralisi wrote:
> > On Fri, Jan 10, 2020 at 10:21:08AM -0700, Jon Derrick wrote:
> > > v2 Set: 
> > > https://lore.kernel.org/linux-iommu/1578580256-3483-1-git-send-email-jonathan.derr...@intel.com/T/#t
> > > v1 Set: 
> > > https://lore.kernel.org/linux-iommu/20200107134125.gd30...@8bytes.org/T/#t
> > > 
> > > VMD currently works with VT-d enabled by pointing DMA and IOMMU actions 
> > > at the
> > > VMD endpoint. The problem with this approach is that the VMD endpoint's
> > > device-specific attributes, such as the DMA Mask Bits, are used instead.
> > > 
> > > This set cleans up VMD by removing the override that redirects DMA map
> > > operations to the VMD endpoint. Instead it introduces a new DMA alias 
> > > mechanism
> > > into the existing DMA alias infrastructure.
> > > 
> > > v1 added a pointer in struct pci_dev that pointed to the DMA alias' struct
> > > pci_dev and did the necessary DMA alias and IOMMU modifications.
> > > 
> > > v2 introduced a new weak function to reference the 'Direct DMA Alias', and
> > > removed the need to add a pointer in struct device or pci_dev. Weak 
> > > functions
> > > are generally frowned upon when it's a single architecture 
> > > implementation, so I
> > > am open to alternatives.
> > > 
> > > v3 references the pci_dev rather than the struct device for the PCI
> > > 'Direct DMA Alias' (pci_direct_dma_alias()). This revision also allows
> > > pci_for_each_dma_alias() to call any DMA aliases for the Direct DMA alias
> > > device, though I don't expect the VMD endpoint to need intra-bus DMA 
> > > aliases.
> > > 
> > > Changes from v2:
> > > Uses struct pci_dev for PCI Device 'Direct DMA aliasing' 
> > > (pci_direct_dma_alias)
> > > Allows pci_for_each_dma_alias to iterate over the alias mask of the 
> > > 'Direct DMA alias'
> > > 
> > > Changes from v1:
> > > Removed 1/5 & 2/5 misc fix patches that were merged
> > > Uses Christoph's staging/cleanup patches
> > > Introduce weak function rather than including pointer in struct device or 
> > > pci_dev.
> > > 
> > > Based on Joerg's next:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git/
> > > 
> > > Jon Derrick (5):
> > >   x86/pci: Add a to_pci_sysdata helper
> > >   x86/PCI: Expose VMD's PCI Device in pci_sysdata
> > >   PCI: Introduce pci_direct_dma_alias()
> > >   PCI: vmd: Stop overriding dma_map_ops
> > >   x86/pci: Remove X86_DEV_DMA_OPS
> > > 
> > >  arch/x86/Kconfig   |   3 -
> > >  arch/x86/include/asm/device.h  |  10 ---
> > >  arch/x86/include/asm/pci.h |  31 -
> > >  arch/x86/pci/common.c  |  45 ++--
> > >  drivers/iommu/intel-iommu.c|  18 +++--
> > >  drivers/pci/controller/Kconfig |   1 -
> > >  drivers/pci/controller/vmd.c   | 152 
> > > +
> > >  drivers/pci/pci.c  |  19 +-
> > >  drivers/pci/search.c   |   7 ++
> > >  include/linux/pci.h|   1 +
> > >  10 files changed, 61 insertions(+), 226 deletions(-)
> > 
> > Jon, Christoph,
> > 
> > AFAICS this series supersedes/overrides:
> > 
> > https://patchwork.kernel.org/patch/4831/
> > 
> > Please let me know if that's correct, actually I was waiting to
> > see consensus on the patch above but if this series supersedes
> > it I would drop it from the PCI review queue.
> > 
> > Thanks,
> > Lorenzo
> 
> It does supersede it (with Christoph's blessing). By the way, I have
> been basing on Joerg's repo due to the v1/RFC IOMMU modifications. As
> there's more PCI work at this point, should I base it on Bjorn's repo
> instead?

In general if I'm going to apply something, I prefer it based on my
"master" branch unless there's a reason to the contrary.  I think
Lorenzo works pretty much the same way.

Lorenzo will probably handle this series, but I applied it
experimentally to check out the brace thing, and it applied fine to my
"master" branch.  So I think everything's fine as-is.

Bjorn


Re: [PATCH v3 1/5] x86/pci: Add a to_pci_sysdata helper

2020-01-13 Thread Bjorn Helgaas
On Fri, Jan 10, 2020 at 10:21:09AM -0700, Jon Derrick wrote:
> From: Christoph Hellwig 
> 
> Various helpers need the pci_sysdata just to dereference a single field
> in it.  Add a little helper that returns the properly typed sysdata
> pointer to require a little less boilerplate code.
> 
> Signed-off-by: Christoph Hellwig 
> [jonathan.derrick: added un-const cast]
> Signed-off-by: Jon Derrick 
> ---
>  arch/x86/include/asm/pci.h | 28 +---
>  1 file changed, 13 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/x86/include/asm/pci.h b/arch/x86/include/asm/pci.h
> index 90d0731..cf680c5 100644
> --- a/arch/x86/include/asm/pci.h
> +++ b/arch/x86/include/asm/pci.h
> @@ -35,12 +35,15 @@ struct pci_sysdata {
>  
>  #ifdef CONFIG_PCI
>  
> +static inline struct pci_sysdata *to_pci_sysdata(struct pci_bus *bus)
> +{
> + return bus->sysdata;
> +}
> +
>  #ifdef CONFIG_PCI_DOMAINS
>  static inline int pci_domain_nr(struct pci_bus *bus)
>  {
> - struct pci_sysdata *sd = bus->sysdata;
> -
> - return sd->domain;
> + return to_pci_sysdata(bus)->domain;
>  }
>  
>  static inline int pci_proc_domain(struct pci_bus *bus)
> @@ -52,23 +55,20 @@ static inline int pci_proc_domain(struct pci_bus *bus)
>  #ifdef CONFIG_PCI_MSI_IRQ_DOMAIN
>  static inline void *_pci_root_bus_fwnode(struct pci_bus *bus)
>  {
> - struct pci_sysdata *sd = bus->sysdata;
> -
> - return sd->fwnode;
> + return to_pci_sysdata(bus)->fwnode;
>  }
>  
>  #define pci_root_bus_fwnode  _pci_root_bus_fwnode
>  #endif
>  
> +#if IS_ENABLED(CONFIG_VMD)
>  static inline bool is_vmd(struct pci_bus *bus)
>  {
> -#if IS_ENABLED(CONFIG_VMD)
> - struct pci_sysdata *sd = bus->sysdata;
> -
> - return sd->vmd_domain;
> + return to_pci_sysdata(bus)->vmd_domain;
> +}
>  #else
> - return false;
> -#endif
> +#define is_vmd(bus)  false
> +#endif /* CONFIG_VMD */
>  }

I think this patch leaves this stray close brace here (it's cleaned up
in the next patch, but looks like it will break bisection).

Also, when you fix this, can you update the subject lines?  There's a
mix of "x86/PCI" and "x86/pci" (the convention per "git log --oneline"
is "x86/PCI").
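
For the stray brace noted above, the following standalone sketch shows one
layout in which the function body and the #else fallback are each complete
on their own, so no brace is left dangling after the #endif.  It assumes a
plain CONFIG_VMD define in place of IS_ENABLED(CONFIG_VMD) and uses
cut-down stand-ins for struct pci_sysdata and struct pci_bus.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CONFIG_VMD 1	/* assumption: stands in for IS_ENABLED(CONFIG_VMD) */

/* Cut-down stand-ins for the x86 structures. */
struct pci_sysdata {
	bool vmd_domain;
};

struct pci_bus {
	void *sysdata;
};

static inline struct pci_sysdata *to_pci_sysdata(struct pci_bus *bus)
{
	assert(bus != NULL);
	return bus->sysdata;
}

#ifdef CONFIG_VMD
static inline bool is_vmd(struct pci_bus *bus)
{
	return to_pci_sysdata(bus)->vmd_domain;
}
#else
#define is_vmd(bus)	false
#endif /* CONFIG_VMD */
```

Both branches compile independently here, which is the property bisection
needs from every commit in the series.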

>  /* Can be used to override the logic in pci_scan_bus for skipping
> @@ -124,9 +124,7 @@ static inline void early_quirks(void) { }
>  /* Returns the node based on pci bus */
>  static inline int __pcibus_to_node(const struct pci_bus *bus)
>  {
> - const struct pci_sysdata *sd = bus->sysdata;
> -
> - return sd->node;
> + return to_pci_sysdata((struct pci_bus *) bus)->node;
>  }
>  
>  static inline const struct cpumask *
> -- 
> 1.8.3.1
> 


Re: [PATCH v2 3/5] PCI: Introduce direct dma alias

2020-01-09 Thread Bjorn Helgaas
In subject:
s/Introduce direct dma alias/Add pci_direct_dma_alias()/

On Thu, Jan 09, 2020 at 07:30:54AM -0700, Jon Derrick wrote:
> The current dma alias implementation requires the aliased device be on
> the same bus as the dma parent. This introduces an arch-specific
> mechanism to point to an arbitrary struct device when doing mapping and
> pci alias search.

"arbitrary struct device" is a little weird since an arbitrary device
doesn't have to be a PCI device, but these mappings and aliases only
make sense in the PCI domain.

Maybe it has something to do with pci_sysdata.vmd_dev being a
"struct device *" rather than a "struct pci_dev *"?  I don't know why
that is, because it looks like every place you use it, you use
to_pci_dev() to get the pci_dev pointer back anyway.  But I assume you
have some good reason for that.
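
The round trip in question looks roughly like this in a userspace model.
The struct layouts and both accessor names are hypothetical; container_of()
is written out by hand.  Returning the generic struct device forces every
caller through to_pci_dev(), while a typed return hands back the pci_dev
directly.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-ins: a pci_dev embeds its generic struct device. */
struct device {
	int id;
};

struct pci_dev {
	unsigned int devfn;
	struct device dev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Recover the enclosing pci_dev from its embedded struct device. */
static struct pci_dev *to_pci_dev(struct device *dev)
{
	assert(dev != NULL);
	return container_of(dev, struct pci_dev, dev);
}

/* Hypothetical v2-style accessor: hands out the generic device. */
static struct device *direct_dma_alias_generic(struct pci_dev *vmd)
{
	return &vmd->dev;
}

/* Hypothetical typed accessor: no conversion needed at the call site. */
static struct pci_dev *direct_dma_alias_typed(struct pci_dev *vmd)
{
	return vmd;
}
```

Both accessors end up at the same pci_dev; the typed one just skips the
conversion and rules out non-PCI devices at compile time.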

s/dma/DMA/
s/pci/PCI/
(above and also in code comments below)

> Signed-off-by: Jon Derrick 
> ---
>  arch/x86/pci/common.c |  7 +++
>  drivers/pci/pci.c | 17 -
>  drivers/pci/search.c  |  9 +
>  include/linux/pci.h   |  1 +
>  4 files changed, 33 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
> index 1e59df0..565cc17 100644
> --- a/arch/x86/pci/common.c
> +++ b/arch/x86/pci/common.c
> @@ -736,3 +736,10 @@ int pci_ext_cfg_avail(void)
>   else
>   return 0;
>  }
> +
> +#if IS_ENABLED(CONFIG_VMD)
> +struct device *pci_direct_dma_alias(struct pci_dev *dev)
> +{
> + return to_pci_sysdata(dev->bus)->vmd_dev;
> +}
> +#endif
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index ad746d9..e4269e9 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -6034,7 +6034,9 @@ bool pci_devs_are_dma_aliases(struct pci_dev *dev1, 
> struct pci_dev *dev2)
>   return (dev1->dma_alias_mask &&
>   test_bit(dev2->devfn, dev1->dma_alias_mask)) ||
>  (dev2->dma_alias_mask &&
> - test_bit(dev1->devfn, dev2->dma_alias_mask));
> + test_bit(dev1->devfn, dev2->dma_alias_mask)) ||
> +(pci_direct_dma_alias(dev1) == &dev2->dev) ||
> +(pci_direct_dma_alias(dev2) == &dev1->dev);
>  }
>  
>  bool pci_device_is_present(struct pci_dev *pdev)
> @@ -6058,6 +6060,19 @@ void pci_ignore_hotplug(struct pci_dev *dev)
>  }
>  EXPORT_SYMBOL_GPL(pci_ignore_hotplug);
>  
> +/**
> + * pci_direct_dma_alias - Get dma alias for pci device
> + * @dev: the PCI device that may have a dma alias
> + *
> + * Permits the platform to provide architecture-specific functionality to
> + * devices needing to alias dma to another device. This is the default
> + * implementation. Architecture implementations can override this.
> + */
> +struct device __weak *pci_direct_dma_alias(struct pci_dev *dev)
> +{
> + return NULL;
> +}
> +
>  resource_size_t __weak pcibios_default_alignment(void)
>  {
>   return 0;
> diff --git a/drivers/pci/search.c b/drivers/pci/search.c
> index bade140..6d61209 100644
> --- a/drivers/pci/search.c
> +++ b/drivers/pci/search.c
> @@ -32,6 +32,15 @@ int pci_for_each_dma_alias(struct pci_dev *pdev,
>   struct pci_bus *bus;
>   int ret;
>  
> + if (unlikely(pci_direct_dma_alias(pdev))) {
> + struct device *dev = pci_direct_dma_alias(pdev);
> +
> + if (dev_is_pci(dev))
> + pdev = to_pci_dev(dev);
> + return fn(pdev, PCI_DEVID(pdev->bus->number, pdev->devfn),
> +   data);
> + }
> +
>   ret = fn(pdev, pci_dev_id(pdev), data);
>   if (ret)
>   return ret;
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index c393dff..82494d3 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1202,6 +1202,7 @@ u32 pcie_bandwidth_available(struct pci_dev *dev, 
> struct pci_dev **limiting_dev,
>  int pci_select_bars(struct pci_dev *dev, unsigned long flags);
>  bool pci_device_is_present(struct pci_dev *pdev);
>  void pci_ignore_hotplug(struct pci_dev *dev);
> +struct device *pci_direct_dma_alias(struct pci_dev *dev);
>  
>  int __printf(6, 7) pci_request_irq(struct pci_dev *dev, unsigned int nr,
>   irq_handler_t handler, irq_handler_t thread_fn, void *dev_id,
> -- 
> 1.8.3.1
> 


Re: [PATCH v4 03/16] PCI/ATS: Restore EXPORT_SYMBOL_GPL() for pci_{enable,disable}_ats()

2019-12-20 Thread Bjorn Helgaas
On Fri, Dec 20, 2019 at 09:43:03AM +0100, Joerg Roedel wrote:
> Hi Bjorn,
> 
> On Thu, Dec 19, 2019 at 12:03:39PM +0000, Will Deacon wrote:
> > From: Greg Kroah-Hartman 
> > 
> > Commit d355bb209783 ("PCI/ATS: Remove unnecessary EXPORT_SYMBOL_GPL()")
> > unexported a bunch of symbols from the PCI core since the only external
> > users were non-modular IOMMU drivers. Although most of those symbols
> > can remain private for now, 'pci_{enable,disable}_ats()' is required for
> > the ARM SMMUv3 driver to build as a module, otherwise we get a build
> > failure as follows:
> > 
> >   | ERROR: "pci_enable_ats" [drivers/iommu/arm-smmu-v3.ko] undefined!
> >   | ERROR: "pci_disable_ats" [drivers/iommu/arm-smmu-v3.ko] undefined!
> > 
> > Re-export these two functions so that the ARM SMMUv3 driver can be built
> > as a module.
> > 
> > Cc: Bjorn Helgaas 
> > Cc: Joerg Roedel 
> > Signed-off-by: Greg Kroah-Hartman 
> > [will: rewrote commit message]
> > Signed-off-by: Will Deacon 
> > ---
> >  drivers/pci/ats.c | 2 ++
> >  1 file changed, 2 insertions(+)
> 
> Are you fine with this change? I would apply this series to my tree
> then.

Yep, thanks!  You can add my

Acked-by: Bjorn Helgaas 



Re: [PATCH v6 2/3] PCI: Add parameter nr_devfns to pci_add_dma_alias

2019-12-11 Thread Bjorn Helgaas
On Wed, Dec 11, 2019 at 03:37:30PM +0000, James Sewart wrote:
> > On 10 Dec 2019, at 22:37, Bjorn Helgaas  wrote:
> >> -void pci_add_dma_alias(struct pci_dev *dev, u8 devfn)
> >> +void pci_add_dma_alias(struct pci_dev *dev, u8 devfn_from, unsigned 
> >> nr_devfns)
> >> {
> >> +  int devfn_to;
> >> +
> >> +  nr_devfns = min(nr_devfns, (unsigned)MAX_NR_DEVFNS);
> >> +  devfn_to = devfn_from + nr_devfns - 1;
> > 
> > I made this look like:
> > 
> > +   devfn_to = min(devfn_from + nr_devfns - 1,
> > +  (unsigned) MAX_NR_DEVFNS - 1);
> > 
> > so devfn_from=0xf0, nr_devfns=0x20 doesn't cause devfn_to to wrap
> > around.
> > 
> > I did keep Logan's reviewed-by, so let me know if I broke something.
> 
> I think nr_devfns still needs updating, as it is used for bitmap_set().
> Thinking about it now, we should limit the number of devfns to alias to
> at most (MAX_NR_DEVFNS - devfn_from), so that we don't set past the end
> of the bitmap:
> 
>  nr_devfns = min(nr_devfns, (unsigned) MAX_NR_DEVFNS - devfn_from);
> 
> I think with this change we won't need to clip devfn_to.

Indeed, you're right, thanks!  I put this in, so it now looks like
the following.  I dropped Logan's reviewed-by to avoid putting words
in his mouth ;)
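
The arithmetic can be checked in isolation with a small userspace sketch.
It assumes MAX_NR_DEVFNS is 256 (one bit per 8-bit devfn); the helper names
are made up for illustration and are not part of the patch.

```c
#include <assert.h>

#define MAX_NR_DEVFNS 256	/* assumption: one bit per 8-bit devfn */

/* Clamp the requested count so the range never passes devfn 0xff. */
static unsigned int clamp_nr_devfns(unsigned char devfn_from,
				    unsigned int nr_devfns)
{
	unsigned int room = MAX_NR_DEVFNS - devfn_from;

	return nr_devfns < room ? nr_devfns : room;
}

/* Last devfn covered once the count has been clamped. */
static int last_devfn(unsigned char devfn_from, unsigned int nr_devfns)
{
	assert(nr_devfns > 0);	/* an empty range has no last devfn */
	return devfn_from + clamp_nr_devfns(devfn_from, nr_devfns) - 1;
}
```

With devfn_from=0xf0 and nr_devfns=0x20, only 0x10 devfns fit, so the count
is clamped to 0x10 and the range ends at 0xff.  Because the count itself is
clamped, both the bitmap_set() length and devfn_to stay in bounds without a
separate clip on devfn_to.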

commit 5ddcd840395a ("PCI: Add nr_devfns parameter to pci_add_dma_alias()")
Author: James Sewart 
Date:   Tue Dec 10 16:07:30 2019 -0600

PCI: Add nr_devfns parameter to pci_add_dma_alias()

Add a "nr_devfns" parameter to pci_add_dma_alias() so it can be used to
create DMA aliases for a range of devfns.

[bhelgaas: incorporate nr_devfns fix from James, update
quirk_pex_vca_alias() and setup_aliases()]
Signed-off-by: James Sewart 
Signed-off-by: Bjorn Helgaas 

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index bd25674ee4db..7a6c056b9b9c 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -230,11 +230,8 @@ static struct pci_dev *setup_aliases(struct device *dev)
 */
ivrs_alias = amd_iommu_alias_table[pci_dev_id(pdev)];
if (ivrs_alias != pci_dev_id(pdev) &&
-   PCI_BUS_NUM(ivrs_alias) == pdev->bus->number) {
-   pci_add_dma_alias(pdev, ivrs_alias & 0xff);
-   pci_info(pdev, "Added PCI DMA alias %02x.%d\n",
-   PCI_SLOT(ivrs_alias), PCI_FUNC(ivrs_alias));
-   }
+   PCI_BUS_NUM(ivrs_alias) == pdev->bus->number)
+   pci_add_dma_alias(pdev, ivrs_alias & 0xff, 1);
 
clone_aliases(pdev);
 
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 7b5fa2eabe09..951099279192 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5998,7 +5998,8 @@ EXPORT_SYMBOL_GPL(pci_pr3_present);
 /**
  * pci_add_dma_alias - Add a DMA devfn alias for a device
  * @dev: the PCI device for which alias is added
- * @devfn: alias slot and function
+ * @devfn_from: alias slot and function
+ * @nr_devfns: number of subsequent devfns to alias
  *
  * This helper encodes an 8-bit devfn as a bit number in dma_alias_mask
  * which is used to program permissible bus-devfn source addresses for DMA
@@ -6014,8 +6015,13 @@ EXPORT_SYMBOL_GPL(pci_pr3_present);
  * cannot be left as a userspace activity).  DMA aliases should therefore
  * be configured via quirks, such as the PCI fixup header quirk.
  */
-void pci_add_dma_alias(struct pci_dev *dev, u8 devfn)
+void pci_add_dma_alias(struct pci_dev *dev, u8 devfn_from, unsigned nr_devfns)
 {
+   int devfn_to;
+
+   nr_devfns = min(nr_devfns, (unsigned) MAX_NR_DEVFNS - devfn_from);
+   devfn_to = devfn_from + nr_devfns - 1;
+
if (!dev->dma_alias_mask)
dev->dma_alias_mask = bitmap_zalloc(MAX_NR_DEVFNS, GFP_KERNEL);
if (!dev->dma_alias_mask) {
@@ -6023,9 +6029,15 @@ void pci_add_dma_alias(struct pci_dev *dev, u8 devfn)
return;
}
 
-   set_bit(devfn, dev->dma_alias_mask);
-   pci_info(dev, "Enabling fixed DMA alias to %02x.%d\n",
-PCI_SLOT(devfn), PCI_FUNC(devfn));
+   bitmap_set(dev->dma_alias_mask, devfn_from, nr_devfns);
+
+   if (nr_devfns == 1)
+   pci_info(dev, "Enabling fixed DMA alias to %02x.%d\n",
+   PCI_SLOT(devfn_from), PCI_FUNC(devfn_from));
+   else if (nr_devfns > 1)
+   pci_info(dev, "Enabling fixed DMA alias for devfn range from %02x.%d to %02x.%d\n",
+   PCI_SLOT(devfn_from), PCI_FUNC(devfn_from),
+   PCI_SLOT(devfn_to), PCI_FUNC(devfn_to));
 }
 
 bool pci_devs_are_dma_aliases(struct pci_dev *dev1, struct pci_dev *dev2)
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
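
For reference, the difference between the v6 clamp and the one that was applied can be checked in userspace. The sketch below is not kernel code: struct alias_range, clamp_v6() and clamp_applied() are names made up for illustration, and SLOT()/FUNC() only mirror the bit layout of PCI_SLOT()/PCI_FUNC(). With devfn_from=0xf0 and nr_devfns=0x20, the v6 version computes devfn_to=0x10f, which would print as 01.7 and would make bitmap_set() run past the 256-bit mask; the applied version limits the count to 0x10 so the range ends at 1f.7.

```c
#define MAX_NR_DEVFNS 256u	/* devfns run from 00.0 to 1f.7 */

/* Same bit layout as the kernel's PCI_SLOT()/PCI_FUNC() (illustrative copies). */
#define SLOT(devfn)	(((devfn) >> 3) & 0x1f)
#define FUNC(devfn)	((devfn) & 0x07)

/* Hypothetical helper type: the clamped count and the last devfn covered. */
struct alias_range {
	unsigned int nr_devfns;
	int devfn_to;
};

/* v6 as posted: only the count is capped, so the end can pass 0xff. */
static struct alias_range clamp_v6(unsigned char devfn_from,
				   unsigned int nr_devfns)
{
	struct alias_range r;

	r.nr_devfns = nr_devfns < MAX_NR_DEVFNS ? nr_devfns : MAX_NR_DEVFNS;
	r.devfn_to = (int)(devfn_from + r.nr_devfns - 1);
	return r;
}

/* As applied: cap the count by the room left above devfn_from. */
static struct alias_range clamp_applied(unsigned char devfn_from,
					unsigned int nr_devfns)
{
	unsigned int room = MAX_NR_DEVFNS - devfn_from;
	struct alias_range r;

	r.nr_devfns = nr_devfns < room ? nr_devfns : room;
	r.devfn_to = (int)(devfn_from + r.nr_devfns - 1);
	return r;
}
```

Because the count itself is clamped, the same value can be passed to bitmap_set() and used for the log message, which is why devfn_to no longer needs a separate clip.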

Re: [PATCH v6 2/3] PCI: Add parameter nr_devfns to pci_add_dma_alias

2019-12-10 Thread Bjorn Helgaas
[+cc Joerg]

On Tue, Dec 03, 2019 at 03:43:53PM +0000, James Sewart wrote:
> pci_add_dma_alias can now be used to create a dma alias for a range of
> devfns.
> 
> Reviewed-by: Logan Gunthorpe 
> Signed-off-by: James Sewart 
> ---
>  drivers/pci/pci.c| 22 +-
>  drivers/pci/quirks.c | 14 +++---
>  include/linux/pci.h  |  2 +-
>  3 files changed, 25 insertions(+), 13 deletions(-)

Heads up Joerg: I also updated drivers/iommu/amd_iommu.c (this is the
one reported by the kbuild test robot) and removed the printk there
that prints the same thing as the one in pci_add_dma_alias(), and I
updated a PCI quirk that was merged after this patch was posted.

> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index d3c83248f3ce..dbb01aceafda 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -5857,7 +5857,8 @@ int pci_set_vga_state(struct pci_dev *dev, bool decode,
>  /**
>   * pci_add_dma_alias - Add a DMA devfn alias for a device
>   * @dev: the PCI device for which alias is added
> - * @devfn: alias slot and function
> + * @devfn_from: alias slot and function
> + * @nr_devfns: Number of subsequent devfns to alias
>   *
>   * This helper encodes an 8-bit devfn as a bit number in dma_alias_mask
>   * which is used to program permissible bus-devfn source addresses for DMA
> @@ -5873,8 +5874,13 @@ int pci_set_vga_state(struct pci_dev *dev, bool decode,
>   * cannot be left as a userspace activity).  DMA aliases should therefore
>   * be configured via quirks, such as the PCI fixup header quirk.
>   */
> -void pci_add_dma_alias(struct pci_dev *dev, u8 devfn)
> +void pci_add_dma_alias(struct pci_dev *dev, u8 devfn_from, unsigned nr_devfns)
>  {
> + int devfn_to;
> +
> + nr_devfns = min(nr_devfns, (unsigned)MAX_NR_DEVFNS);
> + devfn_to = devfn_from + nr_devfns - 1;

I made this look like:

+   devfn_to = min(devfn_from + nr_devfns - 1,
+  (unsigned) MAX_NR_DEVFNS - 1);

so devfn_from=0xf0, nr_devfns=0x20 doesn't cause devfn_to to wrap
around.

I did keep Logan's reviewed-by, so let me know if I broke something.

>   if (!dev->dma_alias_mask)
>   dev->dma_alias_mask = bitmap_zalloc(MAX_NR_DEVFNS, GFP_KERNEL);
>   if (!dev->dma_alias_mask) {
> @@ -5882,9 +5888,15 @@ void pci_add_dma_alias(struct pci_dev *dev, u8 devfn)
>   return;
>   }
>  
> - set_bit(devfn, dev->dma_alias_mask);
> - pci_info(dev, "Enabling fixed DMA alias to %02x.%d\n",
> -  PCI_SLOT(devfn), PCI_FUNC(devfn));
> + bitmap_set(dev->dma_alias_mask, devfn_from, nr_devfns);
> +
> + if (nr_devfns == 1)
> + pci_info(dev, "Enabling fixed DMA alias to %02x.%d\n",
> + PCI_SLOT(devfn_from), PCI_FUNC(devfn_from));
> + else if(nr_devfns > 1)
> + pci_info(dev, "Enabling fixed DMA alias for devfn range from %02x.%d to %02x.%d\n",
> + PCI_SLOT(devfn_from), PCI_FUNC(devfn_from),
> + PCI_SLOT(devfn_to), PCI_FUNC(devfn_to));
>  }


Re: [PATCH v6 1/3] PCI: Fix off by one in dma_alias_mask allocation size

2019-12-10 Thread Bjorn Helgaas
[+cc Joerg]

On Tue, Dec 03, 2019 at 03:43:22PM +0000, James Sewart wrote:
> The number of possible devfns is 256, add def and correct uses.
> 
> Reviewed-by: Logan Gunthorpe 
> Signed-off-by: James Sewart 

I applied these three patches to pci/virtualization for v5.6, thanks!

I moved the MAX_NR_DEVFNS from include/linux/pci.h to
drivers/pci/pci.h since nobody outside drivers/pci needs it.

> ---
>  drivers/pci/pci.c| 2 +-
>  drivers/pci/search.c | 2 +-
>  include/linux/pci.h  | 2 ++
>  3 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index a97e2571a527..d3c83248f3ce 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -5876,7 +5876,7 @@ int pci_set_vga_state(struct pci_dev *dev, bool decode,
>  void pci_add_dma_alias(struct pci_dev *dev, u8 devfn)
>  {
>   if (!dev->dma_alias_mask)
> - dev->dma_alias_mask = bitmap_zalloc(U8_MAX, GFP_KERNEL);
> + dev->dma_alias_mask = bitmap_zalloc(MAX_NR_DEVFNS, GFP_KERNEL);
>   if (!dev->dma_alias_mask) {
>   pci_warn(dev, "Unable to allocate DMA alias mask\n");
>   return;
> diff --git a/drivers/pci/search.c b/drivers/pci/search.c
> index bade14002fd8..9e4dfae47252 100644
> --- a/drivers/pci/search.c
> +++ b/drivers/pci/search.c
> @@ -43,7 +43,7 @@ int pci_for_each_dma_alias(struct pci_dev *pdev,
>   if (unlikely(pdev->dma_alias_mask)) {
>   u8 devfn;
>  
> - for_each_set_bit(devfn, pdev->dma_alias_mask, U8_MAX) {
> + for_each_set_bit(devfn, pdev->dma_alias_mask, MAX_NR_DEVFNS) {
>   ret = fn(pdev, PCI_DEVID(pdev->bus->number, devfn),
>data);
>   if (ret)
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 1a6cf19eac2d..6481da29d667 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -57,6 +57,8 @@
>  #define PCI_DEVID(bus, devfn)  ((((u16)(bus)) << 8) | (devfn))
>  /* return bus from PCI devid = ((u16)bus_number) << 8) | devfn */
>  #define PCI_BUS_NUM(x) (((x) >> 8) & 0xff)
> +/* Number of possible devfns. devfns can be from 0.0 to 1f.7 inclusive */
> +#define MAX_NR_DEVFNS 256
>  
>  /* pci_slot represents a physical slot */
>  struct pci_slot {
> -- 
> 2.24.0
> 
> 
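
The off-by-one is easy to see outside the kernel. The sketch below is a userspace stand-in, not the kernel bitmap API: demo_set_bit() and demo_count_set_bits() are hypothetical helpers, with the second one visiting bits in [0, limit) the way for_each_set_bit() does. With a limit of U8_MAX (255), an alias on devfn 1f.7 (bit 255) is never visited; with MAX_NR_DEVFNS (256) it is.

```c
#include <limits.h>

#define MAX_NR_DEVFNS	256
#define BITS_PER_LONG	((unsigned int)(sizeof(unsigned long) * CHAR_BIT))
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Set one bit in a bitmap stored as an array of unsigned long. */
static void demo_set_bit(unsigned long *map, unsigned int bit)
{
	map[bit / BITS_PER_LONG] |= 1UL << (bit % BITS_PER_LONG);
}

/* Count set bits below 'limit'; bits at or above 'limit' are not examined. */
static unsigned int demo_count_set_bits(const unsigned long *map,
					unsigned int limit)
{
	unsigned int bit, count = 0;

	for (bit = 0; bit < limit; bit++)
		if (map[bit / BITS_PER_LONG] & (1UL << (bit % BITS_PER_LONG)))
			count++;
	return count;
}
```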


Re: [PATCH v3 12/13] PCI/ATS: Add PASID stubs

2019-12-10 Thread Bjorn Helgaas
On Mon, Dec 09, 2019 at 07:05:13PM +0100, Jean-Philippe Brucker wrote:
> The SMMUv3 driver, which may be built without CONFIG_PCI, will soon gain
> PASID support.  Partially revert commit c6e9aefbf9db ("PCI/ATS: Remove
> unused PRI and PASID stubs") to re-introduce the PASID stubs, and avoid
> adding more #ifdefs to the SMMU driver.
> 
> Signed-off-by: Jean-Philippe Brucker 

Acked-by: Bjorn Helgaas 

> ---
>  include/linux/pci-ats.h | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/include/linux/pci-ats.h b/include/linux/pci-ats.h
> index 5d62e78946a3..d08f0869f121 100644
> --- a/include/linux/pci-ats.h
> +++ b/include/linux/pci-ats.h
> @@ -33,6 +33,9 @@ void pci_disable_pasid(struct pci_dev *pdev);
>  int pci_pasid_features(struct pci_dev *pdev);
>  int pci_max_pasids(struct pci_dev *pdev);
>  #else /* CONFIG_PCI_PASID */
> +static inline int pci_enable_pasid(struct pci_dev *pdev, int features)
> +{ return -EINVAL; }
> +static inline void pci_disable_pasid(struct pci_dev *pdev) { }
>  static inline int pci_pasid_features(struct pci_dev *pdev)
>  { return -EINVAL; }
>  static inline int pci_max_pasids(struct pci_dev *pdev)
> -- 
> 2.24.0
> 
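
The stub pattern restored here can be sketched in a standalone file. This is only an illustration under stated assumptions: struct pci_dev is left opaque, CONFIG_PCI_PASID is assumed to be undefined, and demo_setup_pasid() is a hypothetical caller standing in for a driver such as SMMUv3. The point is that the caller compiles without any #ifdef of its own and simply sees -EINVAL when PASID support is not built in.

```c
#include <errno.h>

struct pci_dev;		/* opaque for this sketch */

#ifdef CONFIG_PCI_PASID
int pci_enable_pasid(struct pci_dev *pdev, int features);
void pci_disable_pasid(struct pci_dev *pdev);
#else /* CONFIG_PCI_PASID */
/* Stubs: same signatures, but they always report that PASID is unavailable. */
static inline int pci_enable_pasid(struct pci_dev *pdev, int features)
{ (void)pdev; (void)features; return -EINVAL; }
static inline void pci_disable_pasid(struct pci_dev *pdev) { (void)pdev; }
#endif

/* Hypothetical caller: no #ifdef needed at the call site. */
static int demo_setup_pasid(struct pci_dev *pdev)
{
	int ret = pci_enable_pasid(pdev, 0);

	if (ret)
		return ret;	/* stub path: carry on without PASID */

	pci_disable_pasid(pdev);
	return 0;
}
```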


Re: [PATCH v4 7/8] linux/log2.h: Fix 64bit calculations in roundup/down_pow_two()

2019-12-05 Thread Bjorn Helgaas
You got the "n" on "down" in the subject, but still missing "of" ;)

On Tue, Dec 03, 2019 at 12:47:40PM +0100, Nicolas Saenz Julienne wrote:
> Some users need to make sure their rounding function accepts and returns
> 64bit long variables regardless of the architecture. Sadly
> roundup/rounddown_pow_two() takes and returns unsigned longs. It turns
> out ilog2() already handles 32/64bit calculations properly, and being
> the building block to the round functions we can rework them as a
> wrapper around it.

Missing "of" in the function names here.
s/a wrapper/wrappers/

IIUC the point of this is that roundup_pow_of_two() returned
"unsigned long", which can be either 32 or 64 bits (worth pointing
out, I think), and many callers need something that returns
"unsigned long long" (always 64 bits).

It's a nice simplification to remove the "__" variants.  Just as a
casual reader of this commit message, I'd like to know why we had both
the roundup and the __roundup versions in the first place, and why we
no longer need both.

> -#define roundup_pow_of_two(n)\
> -(\
> - __builtin_constant_p(n) ? ( \
> - (n == 1) ? 1 :  \
> - (1UL << (ilog2((n) - 1) + 1))   \
> -) :  \
> - __roundup_pow_of_two(n) \
> - )
> +#define roundup_pow_of_two(n)  \
> +(  \
> + (__builtin_constant_p(n) && ((n) == 1)) ? \
> + 1 : (1ULL << (ilog2((n) - 1) + 1))\
> +)

Should the resulting type of this expression always be a ULL, even
when n==1, i.e., should it be this?

  1ULL : (1ULL << (ilog2((n) - 1) + 1))\

Or maybe there's no case where that makes a difference?

Bjorn
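
On the type question: in C, when both result operands of ?: are arithmetic, the expression takes the type given by the usual arithmetic conversions, so "1 : (1ULL << ...)" is already unsigned long long whichever branch is taken. The userspace sketch below illustrates that and the 64-bit rounding itself. It is not the kernel macro: the function names are made up, __builtin_clzll() is a GCC/Clang builtin assumed to be available, and the inputs are assumed to satisfy 0 < n <= 2^63 for the round-up case.

```c
#include <stdint.h>

/* floor(log2(n)) for n > 0; userspace stand-in for the kernel's ilog2(). */
static unsigned int demo_ilog2_u64(uint64_t n)
{
	return 63u - (unsigned int)__builtin_clzll(n);
}

/* Smallest power of two >= n, computed in 64 bits on every architecture. */
static uint64_t demo_roundup_pow_of_two(uint64_t n)
{
	if (n <= 1)
		return 1;
	return 1ULL << (demo_ilog2_u64(n - 1) + 1);
}

/* Largest power of two <= n, for n > 0. */
static uint64_t demo_rounddown_pow_of_two(uint64_t n)
{
	return 1ULL << demo_ilog2_u64(n);
}
```

With 1UL in place of 1ULL, the shift would be done in a 32-bit type wherever unsigned long is 32 bits wide, which is the case the patch is meant to fix.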


Re: [PATCH v4 8/8] linux/log2.h: Use roundup/dow_pow_two() on 64bit calculations

2019-12-05 Thread Bjorn Helgaas
The subject contains a couple typos: it's missing "of" and it's
missing the "n" on "down".

On Tue, Dec 03, 2019 at 12:47:41PM +0100, Nicolas Saenz Julienne wrote:
> The function now is safe to use while expecting a 64bit value. Use it
> where relevant.

Please include the function names ("roundup_pow_of_two()",
"rounddown_pow_of_two()") in the changelog so it is self-contained and
doesn't depend on the subject.

> Signed-off-by: Nicolas Saenz Julienne 

With the nits above and below addressed,

Acked-by: Bjorn Helgaas  # drivers/pci

> ---
>  drivers/acpi/arm64/iort.c| 2 +-
>  drivers/net/ethernet/mellanox/mlx4/en_clock.c| 3 ++-
>  drivers/of/device.c  | 3 ++-
>  drivers/pci/controller/cadence/pcie-cadence-ep.c | 3 ++-
>  drivers/pci/controller/cadence/pcie-cadence.c| 3 ++-
>  drivers/pci/controller/pcie-brcmstb.c| 3 ++-
>  drivers/pci/controller/pcie-rockchip-ep.c| 5 +++--
>  kernel/dma/direct.c  | 2 +-
>  8 files changed, 15 insertions(+), 9 deletions(-)

> --- a/drivers/pci/controller/cadence/pcie-cadence-ep.c
> +++ b/drivers/pci/controller/cadence/pcie-cadence-ep.c
> @@ -10,6 +10,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "pcie-cadence.h"
>  
> @@ -65,7 +66,7 @@ static int cdns_pcie_ep_set_bar(struct pci_epc *epc, u8 fn,
>* roundup_pow_of_two() returns an unsigned long, which is not suited
>* for 64bit values.
>*/

Please remove the comment above since it no longer applies.

> - sz = 1ULL << fls64(sz - 1);
> + sz = roundup_pow_of_two(sz);
>   aperture = ilog2(sz) - 7; /* 128B -> 0, 256B -> 1, 512B -> 2, ... */
>  
>   if ((flags & PCI_BASE_ADDRESS_SPACE) == PCI_BASE_ADDRESS_SPACE_IO) {
> diff --git a/drivers/pci/controller/cadence/pcie-cadence.c b/drivers/pci/controller/cadence/pcie-cadence.c
> index cd795f6fc1e2..b1689f725b41 100644
> --- a/drivers/pci/controller/cadence/pcie-cadence.c
> +++ b/drivers/pci/controller/cadence/pcie-cadence.c
> @@ -4,6 +4,7 @@
>  // Author: Cyrille Pitchen 
>  
>  #include 
> +#include 
>  
>  #include "pcie-cadence.h"
>  
> @@ -15,7 +16,7 @@ void cdns_pcie_set_outbound_region(struct cdns_pcie *pcie, u8 fn,
>* roundup_pow_of_two() returns an unsigned long, which is not suited
>* for 64bit values.
>*/

Same here.

> - u64 sz = 1ULL << fls64(size - 1);
> + u64 sz = roundup_pow_of_two(size);
>   int nbits = ilog2(sz);
>   u32 addr0, addr1, desc0, desc1;
>  
> --- a/drivers/pci/controller/pcie-rockchip-ep.c
> +++ b/drivers/pci/controller/pcie-rockchip-ep.c
> @@ -16,6 +16,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "pcie-rockchip.h"
>  
> @@ -70,7 +71,7 @@ static void rockchip_pcie_prog_ep_ob_atu(struct rockchip_pcie *rockchip, u8 fn,
>u32 r, u32 type, u64 cpu_addr,
>u64 pci_addr, size_t size)
>  {
> - u64 sz = 1ULL << fls64(size - 1);
> + u64 sz = roundup_pow_of_two(size);
>   int num_pass_bits = ilog2(sz);
>   u32 addr0, addr1, desc0, desc1;
>   bool is_nor_msg = (type == AXI_WRAPPER_NOR_MSG);
> @@ -176,7 +177,7 @@ static int rockchip_pcie_ep_set_bar(struct pci_epc *epc, u8 fn,
>* roundup_pow_of_two() returns an unsigned long, which is not suited
>* for 64bit values.
>*/

And here.

> - sz = 1ULL << fls64(sz - 1);
> + sz = roundup_pow_of_two(sz);
>   aperture = ilog2(sz) - 7; /* 128B -> 0, 256B -> 1, 512B -> 2, ... */
>  
>   if ((flags & PCI_BASE_ADDRESS_SPACE) == PCI_BASE_ADDRESS_SPACE_IO) {
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index 6af7ae83c4ad..056886c4efec 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -53,7 +53,7 @@ u64 dma_direct_get_required_mask(struct device *dev)
>  {
>   u64 max_dma = phys_to_dma_direct(dev, (max_pfn - 1) << PAGE_SHIFT);
>  
> - return (1ULL << (fls64(max_dma) - 1)) * 2 - 1;
> + return rounddown_pow_of_two(max_dma) * 2 - 1;

Personally I would probably make this one a separate patch since it's
qualitatively different than the others and it would avoid the slight
awkwardness of the non-greppable "roundup/down_pow_of_two()"
construction in the commit subject.

But it's fine either way.

>  }
>  
>  static gfp_t __dma_direct_optimal_gfp_mask(struct device *dev, u64 dma_mask,
> -- 
> 2.24.0
> 
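
The dma_direct_get_required_mask() hunk is the one place where the series uses the round-down variant, and the two forms can be compared in userspace. The sketch below is illustrative only: demo_fls64() and the two mask functions are made-up names, __builtin_clzll() is a GCC/Clang builtin assumed to be available, and max_dma is assumed to be nonzero.

```c
#include <stdint.h>

/* fls64(): 1-based index of the most significant set bit, 0 for x == 0. */
static unsigned int demo_fls64(uint64_t x)
{
	return x ? 64u - (unsigned int)__builtin_clzll(x) : 0;
}

/* Open-coded form removed by the patch. */
static uint64_t demo_required_mask_old(uint64_t max_dma)
{
	return (1ULL << (demo_fls64(max_dma) - 1)) * 2 - 1;
}

/* Replacement form: rounddown_pow_of_two(max_dma) * 2 - 1. */
static uint64_t demo_required_mask_new(uint64_t max_dma)
{
	uint64_t rounddown = 1ULL << (63u - (unsigned int)__builtin_clzll(max_dma));

	return rounddown * 2 - 1;
}
```

Both forms keep the top set bit of max_dma and fill in every bit below it, so they should agree for any nonzero input.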


Re: [PATCH] Ensure pci transactions coming from PLX NTB are handled when IOMMU is turned on

2019-11-20 Thread Bjorn Helgaas
On Wed, Nov 20, 2019 at 12:30:48PM -0700, Logan Gunthorpe wrote:
> On 2019-11-20 10:48 a.m., Dmitry Safonov wrote:
> > On 11/5/19 12:17 PM, James Sewart wrote:
> >>
> >>> On 24 Oct 2019, at 13:52, James Sewart  wrote:
> >>>
> >>> The PLX PEX NTB forwards DMA transactions using Requester ID's that don't exist as
> >>> PCI devices. The devfn for a transaction is used as an index into a lookup table
> >>> storing the origin of a transaction on the other side of the bridge.
> >>>
> >>> This patch aliases all possible devfn's to the NTB device so that any transaction
> >>> coming in is governed by the mappings for the NTB.
> >>>
> >>> Signed-Off-By: James Sewart 
> >>> ---
> >>> drivers/pci/quirks.c | 22 ++
> >>> 1 file changed, 22 insertions(+)
> >>>
> >>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> >>> index 320255e5e8f8..647f546e427f 100644
> >>> --- a/drivers/pci/quirks.c
> >>> +++ b/drivers/pci/quirks.c
> >>> @@ -5315,6 +5315,28 @@ SWITCHTEC_QUIRK(0x8574);  /* PFXI 64XG3 */
> >>> SWITCHTEC_QUIRK(0x8575);  /* PFXI 80XG3 */
> >>> SWITCHTEC_QUIRK(0x8576);  /* PFXI 96XG3 */
> >>>
> >>> +/*
> >>> + * PLX NTB uses devfn proxy IDs to move TLPs between NT endpoints. These IDs
> >>> + * are used to forward responses to the originator on the other side of the
> >>> + * NTB. Alias all possible IDs to the NTB to permit access when the IOMMU is
> >>> + * turned on.
> >>> + */
> >>> +static void quirk_PLX_NTB_dma_alias(struct pci_dev *pdev)
> >>> +{
> >>> + if (!pdev->dma_alias_mask)
> >>> + pdev->dma_alias_mask = kcalloc(BITS_TO_LONGS(U8_MAX),
> >>> +   sizeof(long), GFP_KERNEL);
> >>> + if (!pdev->dma_alias_mask) {
> >>> + dev_warn(&pdev->dev, "Unable to allocate DMA alias mask\n");
> >>> + return;
> >>> + }
> >>> +
> >>> + // PLX NTB may use all 256 devfns
> >>> + memset(pdev->dma_alias_mask, U8_MAX, (U8_MAX+1)/BITS_PER_BYTE);
> 
> I think it would be better to create a pci_add_dma_alias_range()
> function instead of directly accessing dma_alias_mask. We could then use
> that function to clean up quirk_switchtec_ntb_dma_alias() which is
> essentially doing the same thing.

Great idea!

Bjorn


Re: [PATCH] Ensure pci transactions coming from PLX NTB are handled when IOMMU is turned on

2019-11-20 Thread Bjorn Helgaas
[+cc Alex]

Hi James,

Thanks for the patch, and thanks, Dmitry for the cc!

"scripts/get_maintainer.pl -f drivers/pci/quirks.c" will give you a
list of relevant email addresses to post patches.  It was a good idea
to augment that list with related addresses, e.g., Logan and the iommu
list.

Follow existing style for subject, e.g.,

  PCI: Add DMA alias quirk for Microsemi Switchtec NTB

for a recent similar patch.

On Wed, Nov 20, 2019 at 05:48:45PM +0000, Dmitry Safonov wrote:
> On 11/5/19 12:17 PM, James Sewart wrote:
> >> On 24 Oct 2019, at 13:52, James Sewart  wrote:
> >>
> >> The PLX PEX NTB forwards DMA transactions using Requester ID's that don't exist as
> >> PCI devices. The devfn for a transaction is used as an index into a lookup table
> >> storing the origin of a transaction on the other side of the bridge.
> >>
> >> This patch aliases all possible devfn's to the NTB device so that any transaction
> >> coming in is governed by the mappings for the NTB.
> >>
> >> Signed-Off-By: James Sewart 

Conventionally capitalized as:

  Signed-off-by: James Sewart 

> >> ---
> >> drivers/pci/quirks.c | 22 ++
> >> 1 file changed, 22 insertions(+)
> >>
> >> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> >> index 320255e5e8f8..647f546e427f 100644
> >> --- a/drivers/pci/quirks.c
> >> +++ b/drivers/pci/quirks.c
> >> @@ -5315,6 +5315,28 @@ SWITCHTEC_QUIRK(0x8574);  /* PFXI 64XG3 */
> >> SWITCHTEC_QUIRK(0x8575);  /* PFXI 80XG3 */
> >> SWITCHTEC_QUIRK(0x8576);  /* PFXI 96XG3 */
> >>
> >> +/*
> >> + * PLX NTB uses devfn proxy IDs to move TLPs between NT endpoints. These IDs
> >> + * are used to forward responses to the originator on the other side of the
> >> + * NTB. Alias all possible IDs to the NTB to permit access when the IOMMU is
> >> + * turned on.
> >> + */
> >> +static void quirk_PLX_NTB_dma_alias(struct pci_dev *pdev)

Conventional style is all lower-case (e.g.
quirk_switchtec_ntb_dma_alias()) for function and variable names, and
upper-case in English text.

> >> +{
> >> +  if (!pdev->dma_alias_mask)
> >> +  pdev->dma_alias_mask = kcalloc(BITS_TO_LONGS(U8_MAX),
> >> +sizeof(long), GFP_KERNEL);
> >> +  if (!pdev->dma_alias_mask) {
> >> +  dev_warn(&pdev->dev, "Unable to allocate DMA alias mask\n");

pci_warn()

> >> +  return;
> >> +  }
> >> +
> >> +  // PLX NTB may use all 256 devfns
> >> +  memset(pdev->dma_alias_mask, U8_MAX, (U8_MAX+1)/BITS_PER_BYTE);

Use C (not C++) comment style, as the rest of the file does.

I was about to suggest using pci_add_dma_alias(), but as currently
implemented that would generate 256 messages in dmesg, which seems
like overkill.

But I think it would still be good to allocate the mask the same way
(using bitmap_zalloc()) and to set the bits using bitmap_set().

It would also be nice to have one line in dmesg about these aliases,
as a hint when debugging IOMMU faults.

> >> +}
> >> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_PLX, 0x87b0, quirk_PLX_NTB_dma_alias);
> >> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_PLX, 0x87b1, quirk_PLX_NTB_dma_alias);
> >> +
> >> /*
> >>  * On Lenovo Thinkpad P50 SKUs with a Nvidia Quadro M1000M, the BIOS does
> >>  * not always reset the secondary Nvidia GPU between reboots if the system
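
The suggestion to size the mask in bits and set a range rather than memset() raw bytes can be sketched in userspace. This is not the kernel bitmap API: demo_bitmap_set() and demo_bitmap_weight() are hypothetical stand-ins for bitmap_set() and bitmap_weight(), and MAX_NR_DEVFNS is taken from the later series. Setting the range [0, 256) marks every devfn as an alias in one call, after which a single summary line can be logged instead of one message per devfn.

```c
#include <limits.h>

#define MAX_NR_DEVFNS	256
#define BITS_PER_LONG	((unsigned int)(sizeof(unsigned long) * CHAR_BIT))
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Set 'nbits' consecutive bits starting at 'start'. */
static void demo_bitmap_set(unsigned long *map, unsigned int start,
			    unsigned int nbits)
{
	unsigned int bit;

	for (bit = start; bit < start + nbits; bit++)
		map[bit / BITS_PER_LONG] |= 1UL << (bit % BITS_PER_LONG);
}

/* Number of set bits among the first 'nbits' bits. */
static unsigned int demo_bitmap_weight(const unsigned long *map,
				       unsigned int nbits)
{
	unsigned int bit, weight = 0;

	for (bit = 0; bit < nbits; bit++)
		if (map[bit / BITS_PER_LONG] & (1UL << (bit % BITS_PER_LONG)))
			weight++;
	return weight;
}
```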


Re: [PATCH 3/7] PCI: Export pci_ats_disabled() as a GPL symbol to modules

2019-10-30 Thread Bjorn Helgaas
On Wed, Oct 30, 2019 at 02:51:08PM +, Will Deacon wrote:
> Building drivers for ATS-aware IOMMUs as modules requires access to
> pci_ats_disabled(). Export it as a GPL symbol to get things working.
> 
> Signed-off-by: Will Deacon 

Acked-by: Bjorn Helgaas 

> ---
>  drivers/pci/pci.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index a97e2571a527..4fbe5b576dd8 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -123,6 +123,7 @@ bool pci_ats_disabled(void)
>  {
>   return pcie_ats_disabled;
>  }
> +EXPORT_SYMBOL_GPL(pci_ats_disabled);
>  
>  /* Disable bridge_d3 for all PCIe ports */
>  static bool pci_bridge_d3_disable;
> -- 
> 2.24.0.rc0.303.g954a862665-goog
> 


Re: [PATCH 1/2] iommu/dmar: collect fault statistics

2019-10-16 Thread Bjorn Helgaas
Hi Yuri,

On Tue, Oct 15, 2019 at 05:11:11PM +0200, Yuri Volchkov wrote:
> Currently dmar_fault handler only prints a message in the dmesg. This
> commit introduces counters - how many faults have happened, and
> exposes them via sysfs. Each pci device will have an entry
> 'dmar_faults' reading from which will give user 3 lines
>   remap: xxx
>   read: xxx
>   write: xxx

I think you should have three files instead of putting all these in a
single file.  See https://lore.kernel.org/r/20190621072911.ga21...@kroah.com
They should also be documented in Documentation/ABI/

I'm not sure this should be DMAR-specific.  Couldn't we count similar
events for other IOMMUs as well?

> This functionality is targeted for health monitoring daemons.
> 
> Signed-off-by: Yuri Volchkov 
> ---
>  drivers/iommu/dmar.c| 133 +++-
>  drivers/pci/pci-sysfs.c |  20 ++
>  include/linux/intel-iommu.h |   3 +
>  include/linux/pci.h |  11 +++
>  4 files changed, 150 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> index eecd6a421667..0749873e3e41 100644
> --- a/drivers/iommu/dmar.c
> +++ b/drivers/iommu/dmar.c
> @@ -1107,6 +1107,7 @@ static void free_iommu(struct intel_iommu *iommu)
>   }
>  
>   if (iommu->irq) {
> + destroy_workqueue(iommu->fault_wq);
>   if (iommu->pr_irq) {
>   free_irq(iommu->pr_irq, iommu);
>   dmar_free_hwirq(iommu->pr_irq);
> @@ -1672,9 +1673,46 @@ void dmar_msi_read(int irq, struct msi_msg *msg)
>   raw_spin_unlock_irqrestore(&iommu->register_lock, flag);
>  }
>  
> -static int dmar_fault_do_one(struct intel_iommu *iommu, int type,
> - u8 fault_reason, int pasid, u16 source_id,
> - unsigned long long addr)
> +struct dmar_fault_info {
> + struct work_struct work;
> + struct intel_iommu *iommu;
> + int type;
> + int pasid;
> + u16 source_id;
> + unsigned long long addr;
> + u8 fault_reason;
> +};
> +
> +static struct kmem_cache *dmar_fault_info_cache;
> +int __init dmar_fault_info_cache_init(void)
> +{
> + int ret = 0;
> +
> + dmar_fault_info_cache =
> + kmem_cache_create("dmar_fault_info",
> +   sizeof(struct dmar_fault_info), 0,
> +   SLAB_HWCACHE_ALIGN, NULL);
> + if (!dmar_fault_info_cache) {
> + pr_err("Couldn't create dmar_fault_info cache\n");
> + ret = -ENOMEM;
> + }
> +
> + return ret;
> +}
> +
> +static inline void *alloc_dmar_fault_info(void)
> +{
> + return kmem_cache_alloc(dmar_fault_info_cache, GFP_ATOMIC);
> +}
> +
> +static inline void free_dmar_fault_info(void *vaddr)
> +{
> + kmem_cache_free(dmar_fault_info_cache, vaddr);
> +}
> +
> +static int dmar_fault_dump_one(struct intel_iommu *iommu, int type,
> +u8 fault_reason, int pasid, u16 source_id,
> +unsigned long long addr)
>  {
>   const char *reason;
>   int fault_type;
> @@ -1695,6 +1733,57 @@ static int dmar_fault_do_one(struct intel_iommu *iommu, int type,
>   return 0;
>  }
>  
> +static int dmar_fault_handle_one(struct dmar_fault_info *info)
> +{
> + struct pci_dev *pdev;
> + u8 devfn;
> + atomic_t *pcnt;
> +
> + devfn = PCI_DEVFN(PCI_SLOT(info->source_id), PCI_FUNC(info->source_id));
> + pdev = pci_get_domain_bus_and_slot(info->iommu->segment,
> +PCI_BUS_NUM(info->source_id), devfn);

I'm sure you've considered this already, but it's not completely clear
to me whether these counters should be in the pci_dev (as in this
patch) or in something IOMMU-related.

The pci_dev is nice because you automatically have counters for each
PCI device, and most faults can be tied back to a device.

But on the other hand, it's not the PCI device actually detecting and
reporting the error.  It's the IOMMU reporting the fault, and while
it's *likely* there's a pci_dev corresponding to the
bus/device/function info in the IOMMU error registers, there's no
guarantee: the device may have been hot-removed between reading the
IOMMU registers and doing the pci_get_domain_bus_and_slot(), or (I
suspect) faults could be caused by corrupted or malicious TLPs.

Another possible issue is that if the counts are in the pci_dev,
they're lost if the device is removed, which might happen while
diagnosing faulty hardware.

So I tend to think this is really IOMMU error information and the
IOMMU driver should handle it itself, including logging and exposing
it via sysfs.
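
To make the alternative concrete, here is a userspace sketch of counters kept on the IOMMU side, with one value per counter so each could back its own sysfs file. Everything in it is hypothetical (struct demo_fault_counts, the enum, and both functions); it only shows how a 16-bit source ID splits into bus, slot and function, and how a fault could be counted without looking up a pci_dev first.

```c
#include <stdatomic.h>
#include <stdint.h>

/* One counter per fault class; each would be exposed as a separate file. */
struct demo_fault_counts {
	atomic_uint remap;
	atomic_uint read;
	atomic_uint write;
};

enum demo_fault_kind { DEMO_FAULT_REMAP, DEMO_FAULT_READ, DEMO_FAULT_WRITE };

struct demo_bdf {
	uint8_t bus;
	uint8_t slot;
	uint8_t func;
};

/* Source ID layout: bus in bits 15:8, slot in bits 7:3, function in 2:0. */
static struct demo_bdf demo_decode_source_id(uint16_t source_id)
{
	struct demo_bdf bdf = {
		.bus  = (source_id >> 8) & 0xff,
		.slot = (source_id >> 3) & 0x1f,
		.func = source_id & 0x07,
	};

	return bdf;
}

/* Count a fault; no device lookup is needed, so hot-removal cannot lose it. */
static void demo_count_fault(struct demo_fault_counts *counts,
			     enum demo_fault_kind kind)
{
	switch (kind) {
	case DEMO_FAULT_REMAP:
		atomic_fetch_add(&counts->remap, 1);
		break;
	case DEMO_FAULT_READ:
		atomic_fetch_add(&counts->read, 1);
		break;
	case DEMO_FAULT_WRITE:
		atomic_fetch_add(&counts->write, 1);
		break;
	}
}
```
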

> + if (!pdev)
> + return -ENXIO;
> +
> + if (info->fault_reason == INTR_REMAP)
> + pcnt = &pdev->faults_cnt.remap;
> + else if (info->type)
> + pcnt = &pdev->faults_cnt.read;
> + else
> + pcnt = &pdev->faults_cnt.write;
> +
> + atomic_inc(pcnt);

pci_get_domain_bus_and_slot() increments pdev's 

Re: [PATCH 0/2] iommu/vt-d: Select PCI_PRI for INTEL_IOMMU_SVM

2019-10-15 Thread Bjorn Helgaas
[+cc Jerry]

On Wed, Oct 09, 2019 at 05:45:49PM -0500, Bjorn Helgaas wrote:
> From: Bjorn Helgaas 
> 
> I think intel-iommu.c depends on CONFIG_AMD_IOMMU in an undesirable way:
> 
> When CONFIG_INTEL_IOMMU_SVM=y, iommu_enable_dev_iotlb() calls PRI
> interfaces (pci_reset_pri() and pci_enable_pri()), but those are only
> implemented when CONFIG_PCI_PRI is enabled.  If CONFIG_PCI_PRI is not
> enabled, there are stubs that just return failure.
> 
> The INTEL_IOMMU_SVM Kconfig does nothing with PCI_PRI, but AMD_IOMMU
> selects PCI_PRI.  So if AMD_IOMMU is enabled, intel-iommu.c gets the full
> PRI interfaces.  If AMD_IOMMU is not enabled, it gets the PRI stubs.
> 
> This seems wrong.  The first patch here makes INTEL_IOMMU_SVM select
> PCI_PRI so intel-iommu.c always gets the full PRI interfaces.
> 
> The second patch moves pci_prg_resp_pasid_required(), which simply returns
> a bit from the PCI capability, from #ifdef CONFIG_PCI_PASID to #ifdef
> CONFIG_PCI_PRI.  This is related because INTEL_IOMMU_SVM already *does*
> select PCI_PASID, so it previously always got pci_prg_resp_pasid_required()
> even though it got stubs for other PRI things.
> 
> Since these are related and I have several follow-on ATS-related patches in
> the queue, I'd like to take these both via the PCI tree.
> 
> Bjorn Helgaas (2):
>   iommu/vt-d: Select PCI_PRI for INTEL_IOMMU_SVM
>   PCI/ATS: Move pci_prg_resp_pasid_required() to CONFIG_PCI_PRI
> 
>  drivers/iommu/Kconfig   |  1 +
>  drivers/pci/ats.c   | 55 +++--
>  include/linux/pci-ats.h | 11 -
>  3 files changed, 31 insertions(+), 36 deletions(-)

I applied these to pci/virtualization for v5.5 with Kuppuswamy's
and Joerg's Reviewed-by on both and Jerry's on the first.  Thank you
all for checking this over!

