Re: [PATCH v5 1/2] PCI: ACPI: Support Microsoft's "DmaProperty"

2022-04-15 Thread Bjorn Helgaas
On Thu, Apr 14, 2022 at 04:15:47PM -0700, Rajat Jain via iommu wrote:
> On Thu, Apr 7, 2022 at 12:17 PM Bjorn Helgaas  wrote:
> > On Fri, Mar 25, 2022 at 11:46:08AM -0700, Rajat Jain wrote:

> > > Support the "DmaProperty" with the same semantics. This is useful for
> > > internal PCI devices that do not hang off a PCIe rootport, but offer
> > > an attack surface for DMA attacks (e.g. internal network devices).
> >
> > Same semantics as what?
> 
> Er, I meant the same semantics as the "DmaProperty". Please also see below.

"Support the 'DmaProperty' with the same semantics as 'DmaProperty'"
doesn't help much, so there must be a little more to the story :)

> > The MS description of "ExternalFacingPort" says:
> >
> >   This ACPI object enables the operating system to identify externally
> >   exposed PCIe hierarchies, such as Thunderbolt.
> 
> No, my patch doesn't have to do with this one.

I know, but it's similar, and I'm just hoping we can deal with them
consistently.

> > and "DmaProperty" says:
> >
> >   This ACPI object enables the operating system to identify internal
> >   PCIe hierarchies that are easily accessible by users (such as,
> >   Laptop M.2 PCIe slots accessible by way of a latch) and require
> >   protection by the OS Kernel DMA Protection mechanism.
> 
> Yes, this is the property that my patch uses. Microsoft has agreed to
> update this documentation (in a sideband thread that I also copied you
> on), with the updated semantics that this property can be used to
> identify any PCI devices that require Kernel DMA protection. i.e. the
> property is not restricted to identify "internal PCIe hierarchies"
> (starting at root port), but to "any PCI device".
> 
> > I don't really understand why they called out "laptop M.2 PCIe slots"
> > here.  Is the idea that those are more accessible than a standard
> > internal PCIe slot?  Seems like a pretty small distinction to me.
> >
> > I can understand your example of internal network devices adding an
> > attack surface.  But I don't see how "DmaProperty" helps identify
> > those.  Wouldn't a NIC in a standard internal PCIe slot add the same
> > attack surface?
> 
> Yes it would. The attack surface is the same. They probably only
> thought of devices external to the SoC (starting from a root port)
> when designing this property and thus called out internal M.2 PCI
> slots. But now they have realized that this could be opened to any PCI
> device.

> > > +  * Property also used by Microsoft Windows for same purpose,
> > > +  * (to implement DMA protection from a device, using the IOMMU).
> > > +  */
> > > + if (device_property_read_u8(&dev->dev, "DmaProperty", &val))
> >
> > The MS web page says a _DSD with this property must be implemented in
> > the Root Port device scope, but we don't enforce that here.  We *do*
> > enforce it in pci_acpi_set_untrusted().  Shouldn't we do the same
> > here?
> 
> No, the whole point of doing this (please refer to the discussion on
> the previous versions of this patch) was that we want to have a
> property that is NOT limited to the root ports only. And we have
> reached an agreement with Microsoft about that.

We need to either mention the fact that we're going beyond what the MS
web page says or (ideally, as you are doing) get the page updated with
the semantics you need.

> > But IIUC, device_property_read_u8() works for either ACPI or DT
> > properties, and maybe there is interest in using this for DT systems.
> > None of these appear in any in-tree DTs, but maybe it is important to
> > handle these in DTs?
> >
> > If that's the case, this code would no longer be specific to ACPI and
> > should be moved to somewhere that's compiled even when CONFIG_ACPI
> > isn't set.
> 
> I think unifying ACPI and GPIO systems to use the same code / function
> to read the properties might be more work/investigation, because
> reading the properties for ACPI system happens much later than DT
> systems (For acpi systems, it happens in pci_acpi_setup() which is
> called much later). Given that no one wants to use this for DT
> systems, I'd prefer for this to be ACPI specific for now, and then we
> can solve it for DT once someone needs it.

I think it's OK to make it ACPI-specific for now.

Bjorn
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v8 00/11] Fix BUG_ON in vfio_iommu_group_notifier()

2022-04-08 Thread Bjorn Helgaas
On Fri, Apr 08, 2022 at 05:37:16PM +0200, Joerg Roedel wrote:
> On Fri, Apr 08, 2022 at 11:17:47AM -0300, Jason Gunthorpe wrote:
> > You might consider using a linear tree instead of the topic branches,
> > topics are tricky and I'm not sure it helps a small subsystem so much.
> > Conflicts between topics are a PITA for everyone, and it makes
> > handling conflicts with rc much harder than it needs to be.
> 
> I like the concept of a branch per driver, because with that I can just
> exclude that branch from my next-merge when there are issues with it.
> Conflicts between branches happen too, but they are quite manageable
> when the branches have the same base.

FWIW, I use the same topic branch approach for PCI.  I like the
ability to squash in fixes or drop things without having to clutter
the history with trivial commits and reverts.  I haven't found
conflicts to be a problem.

Bjorn


Re: [PATCH v5 1/2] PCI: ACPI: Support Microsoft's "DmaProperty"

2022-04-07 Thread Bjorn Helgaas
In subject,

  PCI/ACPI: ...

would be consistent with previous history (at least things coming
through the PCI tree :)).

On Fri, Mar 25, 2022 at 11:46:08AM -0700, Rajat Jain wrote:
> The "DmaProperty" is supported and documented by Microsoft here:
> https://docs.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-root-ports

Here's a more specific link (could probably be referenced below to
avoid cluttering the text here):

https://docs.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-root-ports#identifying-internal-pcie-ports-accessible-to-users-and-requiring-dma-protection

> They use this property for DMA protection:
> https://docs.microsoft.com/en-us/windows/security/information-protection/kernel-dma-protection-for-thunderbolt
> 
> Support the "DmaProperty" with the same semantics. This is useful for
> internal PCI devices that do not hang off a PCIe rootport, but offer
> an attack surface for DMA attacks (e.g. internal network devices).

Same semantics as what?

The MS description of "ExternalFacingPort" says:

  This ACPI object enables the operating system to identify externally
  exposed PCIe hierarchies, such as Thunderbolt.

and "DmaProperty" says:

  This ACPI object enables the operating system to identify internal
  PCIe hierarchies that are easily accessible by users (such as,
  Laptop M.2 PCIe slots accessible by way of a latch) and require
  protection by the OS Kernel DMA Protection mechanism.

I don't really understand why they called out "laptop M.2 PCIe slots"
here.  Is the idea that those are more accessible than a standard
internal PCIe slot?  Seems like a pretty small distinction to me.

I can understand your example of internal network devices adding an
attack surface.  But I don't see how "DmaProperty" helps identify
those.  Wouldn't a NIC in a standard internal PCIe slot add the same
attack surface?

> Signed-off-by: Rajat Jain 
> Reviewed-by: Mika Westerberg 
> ---
> v5: * Reorder the patches in the series
> v4: * Add the GUID. 
> * Update the comment and commitlog.
> v3: * Use Microsoft's documented property "DmaProperty"
> * Restrict to ACPI only
> 
>  drivers/acpi/property.c |  3 +++
>  drivers/pci/pci-acpi.c  | 16 
>  2 files changed, 19 insertions(+)
> 
> diff --git a/drivers/acpi/property.c b/drivers/acpi/property.c
> index d0986bda2964..20603cacc28d 100644
> --- a/drivers/acpi/property.c
> +++ b/drivers/acpi/property.c
> @@ -48,6 +48,9 @@ static const guid_t prp_guids[] = {
>   /* Storage device needs D3 GUID: 5025030f-842f-4ab4-a561-99a5189762d0 */
>   GUID_INIT(0x5025030f, 0x842f, 0x4ab4,
> 0xa5, 0x61, 0x99, 0xa5, 0x18, 0x97, 0x62, 0xd0),
> + /* DmaProperty for PCI devices GUID: 70d24161-6dd5-4c9e-8070-705531292865 */
> + GUID_INIT(0x70d24161, 0x6dd5, 0x4c9e,
> +   0x80, 0x70, 0x70, 0x55, 0x31, 0x29, 0x28, 0x65),
>  };
>  
>  /* ACPI _DSD data subnodes GUID: dbb8e3e6-5886-4ba6-8795-1319f52a966b */
> diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
> index 1f15ab7eabf8..378e05096c52 100644
> --- a/drivers/pci/pci-acpi.c
> +++ b/drivers/pci/pci-acpi.c
> @@ -1350,12 +1350,28 @@ static void pci_acpi_set_external_facing(struct pci_dev *dev)
>   dev->external_facing = 1;
>  }
>  
> +static void pci_acpi_check_for_dma_protection(struct pci_dev *dev)

I try to avoid function names like *_check_*() because they don't give
any hint about whether there's a side effect or what direction things
are going.  I prefer things that return a value or make sense when
used as a predicate.  Maybe something like this?

  int pci_dev_has_dma_property(struct pci_dev *dev)

  dev->untrusted |= pci_dev_has_dma_property(pci_dev);
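A small userspace model of that predicate style, as a sketch only: struct model_pci_dev and both model_* functions are hypothetical stand-ins, not kernel code, and treating only the value 1 as "set" is an assumption.

```c
#include <assert.h>

/* Hypothetical stand-in for the struct pci_dev fields this sketch needs;
 * it is NOT the kernel's struct pci_dev. */
struct model_pci_dev {
	int has_dma_property;       /* firmware supplied "DmaProperty" */
	unsigned char dma_value;    /* its u8 value, if supplied */
	unsigned int untrusted;     /* mirrors the kernel's untrusted bit */
};

/* Predicate: no side effects, reads naturally in a condition.
 * Assumption: only the value 1 counts as "set". */
static int model_pci_dev_has_dma_property(const struct model_pci_dev *dev)
{
	return dev->has_dma_property && dev->dma_value == 1;
}

/* The caller owns the side effect.  "|=" never clears a bit that some
 * other source (e.g. an external-facing upstream port) already set. */
static void model_pci_acpi_setup(struct model_pci_dev *dev)
{
	dev->untrusted |= model_pci_dev_has_dma_property(dev);
}
```

With the predicate form, the call site shows both what is tested and what is changed, which a *_check_*() name hides.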

> +{
> + u8 val;
> +
> + /*
> +  * Property also used by Microsoft Windows for same purpose,
> +  * (to implement DMA protection from a device, using the IOMMU).
> +  */
> + if (device_property_read_u8(&dev->dev, "DmaProperty", &val))

The MS web page says a _DSD with this property must be implemented in
the Root Port device scope, but we don't enforce that here.  We *do*
enforce it in pci_acpi_set_untrusted().  Shouldn't we do the same
here?

We currently look at three properties from the same _DSD:

  DmaProperty
  ExternalFacingPort
  HotPlugSupportInD3

For "HotPlugSupportInD3", we check that "value == 1".  For
"ExternalFacingPort", we check that it's non-zero.  The MS doc isn't
explicit about the values, but shows "1" in the sample ASL.  I think
we should handle all three cases the same.
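To make "handle all three the same" concrete, here is a hedged sketch in which one helper applies one rule to every boolean-style property. struct model_dsd_prop and model_dsd_prop_enabled() are hypothetical; real code would go through device_property_read_u8() or acpi_dev_get_property(), and "== 1" is only one of the two candidate rules.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flattened view of one _DSD property package: name/value
 * pairs as firmware might supply them.  Not an ACPICA or kernel type. */
struct model_dsd_prop {
	const char *name;
	unsigned long long value;
};

/* One rule for every boolean-style property: present and equal to 1.
 * Whether the rule should be "== 1" or "!= 0" is the open question;
 * the point is that all three properties share the same one. */
static bool model_dsd_prop_enabled(const struct model_dsd_prop *props,
				   size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(props[i].name, name) == 0)
			return props[i].value == 1;
	return false; /* an absent property means "not set" */
}
```

With the current mixed checks, an ExternalFacingPort value of 2 counts as set while a HotPlugSupportInD3 value of 2 does not; with a single helper, every property gets the same answer.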

The first two use device_property_read_u8(); the last uses
acpi_dev_get_property().  Again, I think they should all be the same.

acpi_dev_get_property() is easier for me to read because there are
slightly fewer layers of abstraction between _DSD and
acpi_dev_get_property().

But IIUC, device_property_read_u8() works for either ACPI or DT
properties, and maybe there is interest in using this for DT systems.
None of these appear in any in-tree 

Re: [PATCH v7 06/11] PCI: portdrv: Set driver_managed_dma

2022-02-28 Thread Bjorn Helgaas
On Mon, Feb 28, 2022 at 08:50:51AM +0800, Lu Baolu wrote:
> If a switch lacks ACS P2P Request Redirect, a device below the switch can
> bypass the IOMMU and DMA directly to other devices below the switch, so
> all the downstream devices must be in the same IOMMU group as the switch
> itself.
> 
> The existing VFIO framework allows the portdrv driver to be bound to the
> bridge while its downstream devices are assigned to user space. The
> pci_dma_configure() marks the IOMMU group as containing only devices
> with kernel drivers that manage DMA. Avoid this default behavior for the
> portdrv driver in order for compatibility with the current VFIO usage.

It would be nice to explicitly say here how we can look at portdrv
(and pci_stub) and conclude that ".driver_managed_dma = true" is safe.

Otherwise I won't know what kind of future change to portdrv might
make it unsafe.
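For reference, the v4 commit log for this patch proposed a conservative checklist: the driver does no DMA (no pci_set_master() or dma_map_* calls) and maps no MMIO. Below is a hedged sketch of that rule as a predicate. struct model_bridge_driver and its fields are hypothetical. Whether portdrv itself passes is exactly the open question, because switch ports use MSI/MSI-X, and those are DMA writes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of a bridge driver's behaviour, filled in by
 * reading the driver source.  Illustrative only. */
struct model_bridge_driver {
	const char *name;
	bool calls_pci_set_master;   /* enables bus mastering */
	bool calls_dma_map;          /* uses a dma_map_* API */
	bool maps_mmio;              /* maps any MMIO region (BAR) */
};

/* Conservative rule: opting out of kernel DMA ownership is acceptable
 * only if the driver does no DMA and maps no MMIO that P2P writes from a
 * userspace-owned peer in the same group could modify behind its back. */
static bool model_may_set_driver_managed_dma(const struct model_bridge_driver *d)
{
	bool does_dma = d->calls_pci_set_master || d->calls_dma_map;

	return !does_dma && !d->maps_mmio;
}
```

Any values fed into such a predicate are descriptions of a specific driver and would need to be justified in the commit log. That justification is what would reveal when a future change makes the flag unsafe.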

> Suggested-by: Jason Gunthorpe 
> Suggested-by: Kevin Tian 
> Signed-off-by: Lu Baolu 
> Reviewed-by: Jason Gunthorpe 

Acked-by: Bjorn Helgaas 

> ---
>  drivers/pci/pcie/portdrv_pci.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
> index 35eca6277a96..6b2adb678c21 100644
> --- a/drivers/pci/pcie/portdrv_pci.c
> +++ b/drivers/pci/pcie/portdrv_pci.c
> @@ -202,6 +202,8 @@ static struct pci_driver pcie_portdriver = {
>  
>   .err_handler = &pcie_portdrv_err_handler,
>  
> + .driver_managed_dma = true,
> +
>   .driver.pm  = PCIE_PORTDRV_PM_OPS,
>  };
>  
> -- 
> 2.25.1
> 


Re: [PATCH v7 05/11] PCI: pci_stub: Set driver_managed_dma

2022-02-28 Thread Bjorn Helgaas
On Mon, Feb 28, 2022 at 08:50:50AM +0800, Lu Baolu wrote:
> The current VFIO implementation allows pci-stub driver to be bound to
> a PCI device with other devices in the same IOMMU group being assigned
> to userspace. The pci-stub driver has no dependencies on DMA or the
> IOVA mapping of the device, but it does prevent the user from having
> direct access to the device, which is useful in some circumstances.
> 
> The pci_dma_configure() marks the iommu_group as containing only devices
> with kernel drivers that manage DMA. For compatibility with the VFIO
> usage, avoid this default behavior for the pci_stub. This allows the
> pci_stub still able to be used by the admin to block driver binding after
> applying the DMA ownership to VFIO.
> 
> Signed-off-by: Lu Baolu 
> Reviewed-by: Jason Gunthorpe 

Acked-by: Bjorn Helgaas 

> ---
>  drivers/pci/pci-stub.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/pci/pci-stub.c b/drivers/pci/pci-stub.c
> index e408099fea52..d1f4c1ce7bd1 100644
> --- a/drivers/pci/pci-stub.c
> +++ b/drivers/pci/pci-stub.c
> @@ -36,6 +36,7 @@ static struct pci_driver stub_driver = {
>   .name   = "pci-stub",
>   .id_table   = NULL, /* only dynamic id's */
>   .probe  = pci_stub_probe,
> + .driver_managed_dma = true,
>  };
>  
>  static int __init pci_stub_init(void)
> -- 
> 2.25.1
> 


Re: [PATCH v5 07/14] PCI: Add driver dma ownership management

2022-02-23 Thread Bjorn Helgaas
In subject,

s/dma/DMA/ to match the other patches

On Tue, Jan 04, 2022 at 09:56:37AM +0800, Lu Baolu wrote:
> Multiple PCI devices may be placed in the same IOMMU group because
> they cannot be isolated from each other. These devices must either be
> entirely under kernel control or userspace control, never a mixture. This
> checks and sets DMA ownership during driver binding, and release the
> ownership during driver unbinding.
> 
> The device driver may set a new flag (no_kernel_api_dma) to skip calling
> iommu_device_use_dma_api() during the binding process. For instance, the
> userspace framework drivers (vfio etc.) which need to manually claim
> their own dma ownership when assigning the device to userspace.

s/vfio/VFIO/ when used as an acronym (occurs in several patches)

> + * @no_kernel_api_dma: Device driver doesn't use kernel DMA API for DMA.
> + *   Drivers which don't require DMA or want to manually claim the
> + *   owner type (e.g. userspace driver frameworks) could set this
> + *   flag.

s/Drivers which/Drivers that/

>  static int pci_dma_configure(struct device *dev)
>  {
> + struct pci_driver *driver = to_pci_driver(dev->driver);
>   struct device *bridge;
>   int ret = 0;
>  
> + if (!driver->no_kernel_api_dma) {

Ugh.  Double negative, totally agree this needs a better name that
reverses the sense.  Every place you use it, you negate it again.
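Here is one way to reverse the sense, as a userspace model. The flag name uses_kernel_dma_api is purely hypothetical and is not what the series uses. model_claims stands in for the iommu_device_use_dma_api()/iommu_device_unuse_dma_api() pair.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical positive-sense flag; illustrative name only. */
struct model_driver {
	bool uses_kernel_dma_api;
};

/* Outstanding kernel-DMA-API claims (stand-in for the IOMMU core). */
static int model_claims;

static int model_dma_configure(const struct model_driver *drv, int setup_err)
{
	if (drv->uses_kernel_dma_api)
		model_claims++;            /* claim: no negation needed */

	if (setup_err && drv->uses_kernel_dma_api)
		model_claims--;            /* undo the claim on failure */

	return setup_err;
}

static void model_dma_cleanup(const struct model_driver *drv)
{
	if (drv->uses_kernel_dma_api)
		model_claims--;
}
```

Every test now reads as a plain condition, and no call site needs "!".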

> + if (ret && !driver->no_kernel_api_dma)
> + iommu_device_unuse_dma_api(dev);

> +static void pci_dma_cleanup(struct device *dev)
> +{
> + struct pci_driver *driver = to_pci_driver(dev->driver);
> +
> + if (!driver->no_kernel_api_dma)
> + iommu_device_unuse_dma_api(dev);


Re: [PATCH v5 09/14] PCI: portdrv: Suppress kernel DMA ownership auto-claiming

2022-01-06 Thread Bjorn Helgaas
On Thu, Jan 06, 2022 at 12:12:35PM +0800, Lu Baolu wrote:
> On 1/5/22 1:06 AM, Bjorn Helgaas wrote:
> > On Tue, Jan 04, 2022 at 09:56:39AM +0800, Lu Baolu wrote:
> > > If a switch lacks ACS P2P Request Redirect, a device below the switch can
> > > bypass the IOMMU and DMA directly to other devices below the switch, so
> > > all the downstream devices must be in the same IOMMU group as the switch
> > > itself.
> > Help me think through what's going on here.  IIUC, we put devices in
> > the same IOMMU group when they can interfere with each other in any
> > way (DMA, config access, etc).
> > 
> > (We said "DMA" above, but I guess this would also apply to config
> > requests, right?)
> 
> I am not sure whether devices could interfere with each other through config
> space access. The IOMMU hardware only protects and isolates DMA
> accesses, so that userspace could control DMA directly. The config
> accesses will always be intercepted by VFIO. Hence, I don't see a
> problem.

I was wondering about config accesses generated by an endpoint, e.g.,
an endpoint doing config writes to a peer or the upstream bridge.

But I think that is prohibited by spec - PCIe r5.0, sec 7.3.3, says
"Propagation of Configuration Requests from Downstream to Upstream as
well as peer-to-peer are not supported" and "Configuration Requests
are initiated only by the Host Bridge, including those passed through
the SFI CAM mechanism."

Bjorn


Re: [PATCH v5 09/14] PCI: portdrv: Suppress kernel DMA ownership auto-claiming

2022-01-04 Thread Bjorn Helgaas
On Tue, Jan 04, 2022 at 03:26:14PM -0400, Jason Gunthorpe wrote:
> On Tue, Jan 04, 2022 at 11:06:31AM -0600, Bjorn Helgaas wrote:
> 
> > > The existing vfio framework allows the portdrv driver to be bound
> > > to the bridge while its downstream devices are assigned to user space.
> > 
> > I.e., the existing VFIO framework allows a switch to be in the same
> > IOMMU group as the devices below it, even though the switch has a
> > kernel driver and the other devices may have userspace drivers?
> 
> Yes, this patch exists to maintain current VFIO behavior which has this
> same check.
> 
> I believe the basis for VFIO doing this is that these devices
> cannot do DMA, so don't care about the DMA API or the group->domain,
> and do not expose MMIO memory so do not care about the P2P attack.

"These devices" means bridges, right?  Not sure why we wouldn't care
about the P2P attack.

PCIe switches use MSI or MSI-X for hotplug, PME, etc., so they do DMA
for that.  Is that not relevant here?

Is there something that *prohibits* a bridge from having
device-specific functionality including DMA?

I know some bridges have device-specific BARs for performance counters
and the like.


Re: [PATCH v5 09/14] PCI: portdrv: Suppress kernel DMA ownership auto-claiming

2022-01-04 Thread Bjorn Helgaas
On Tue, Jan 04, 2022 at 09:56:39AM +0800, Lu Baolu wrote:
> If a switch lacks ACS P2P Request Redirect, a device below the switch can
> bypass the IOMMU and DMA directly to other devices below the switch, so
> all the downstream devices must be in the same IOMMU group as the switch
> itself.

Help me think through what's going on here.  IIUC, we put devices in
the same IOMMU group when they can interfere with each other in any
way (DMA, config access, etc).

(We said "DMA" above, but I guess this would also apply to config
requests, right?)

*This* patch doesn't check for any ACS features.  Can you connect the
dots for me?  I guess the presence or absence of P2P Request Redirect
determines the size of the IOMMU group.  And the following says
something about what is allowed in the group?  And .no_kernel_api_dma
allows an exception to the general rule?

> The existing vfio framework allows the portdrv driver to be bound
> to the bridge while its downstream devices are assigned to user space.

I.e., the existing VFIO framework allows a switch to be in the same
IOMMU group as the devices below it, even though the switch has a
kernel driver and the other devices may have userspace drivers?

Is this a function of VFIO design or of the IOMMU driver?

> The pci_dma_configure() marks the iommu_group as containing only devices
> with kernel drivers that manage DMA. Avoid this default behavior for the
> portdrv driver in order for compatibility with the current vfio policy.

I assume "IOMMU group" means the same as "iommu_group"; maybe we can
use one of them consistently?

> Suggested-by: Jason Gunthorpe 
> Suggested-by: Kevin Tian 
> Signed-off-by: Lu Baolu 
> ---
>  drivers/pci/pcie/portdrv_pci.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
> index 35eca6277a96..2116f821c005 100644
> --- a/drivers/pci/pcie/portdrv_pci.c
> +++ b/drivers/pci/pcie/portdrv_pci.c
> @@ -202,6 +202,8 @@ static struct pci_driver pcie_portdriver = {
>  
>   .err_handler = &pcie_portdrv_err_handler,
>  
> + .no_kernel_api_dma = true,
> +
>   .driver.pm  = PCIE_PORTDRV_PM_OPS,
>  };
>  
> -- 
> 2.25.1
> 


Re: [PATCH v5 01/14] iommu: Add dma ownership management interfaces

2022-01-04 Thread Bjorn Helgaas
On Tue, Jan 04, 2022 at 02:08:00AM -0800, Christoph Hellwig wrote:
> On Tue, Jan 04, 2022 at 09:56:31AM +0800, Lu Baolu wrote:
> > Multiple devices may be placed in the same IOMMU group because they
> > cannot be isolated from each other. These devices must either be
> > entirely under kernel control or userspace control, never a mixture.

I guess the reason is that if a group contained a mixture, userspace
could attack the kernel by programming a device to DMA to a device
owned by the kernel?

> > This adds dma ownership management in iommu core and exposes several
> > interfaces for the device drivers and the device userspace assignment
> > framework (i.e. vfio), so that any conflict between user and kernel
> > controlled DMA could be detected at the beginning.

Maybe I'm missing the point because I don't know what "conflict
between user and kernel controlled DMA" is.  Are you talking about
both userspace and the kernel programming the same device to do DMA?

> > The device driver oriented interfaces are,
> > 
> > int iommu_device_use_dma_api(struct device *dev);
> > void iommu_device_unuse_dma_api(struct device *dev);

Nit, do we care whether it uses the actual DMA API?  Or is it just
that iommu_device_use_dma_api() tells us the driver may program the
device to do DMA?

> > Devices under kernel drivers control must call iommu_device_use_dma_api()
> > before driver probes. The driver binding process must be aborted if it
> > returns failure.

"Devices" don't call functions.  Drivers do, or in this case, it looks
like the bus DMA code (platform, amba, fsl, pci, etc).

These functions are EXPORT_SYMBOL_GPL(), but it looks like all the
callers are built-in, so maybe the export is unnecessary?

You use "iommu"/"IOMMU" and "dma"/"DMA" interchangeably above.  Would
be easier to read if you picked one.

> > The vfio oriented interfaces are,
> > 
> > int iommu_group_set_dma_owner(struct iommu_group *group,
> >   void *owner);
> > void iommu_group_release_dma_owner(struct iommu_group *group);
> > bool iommu_group_dma_owner_claimed(struct iommu_group *group);
> > 
> > The device userspace assignment must be disallowed if the set dma owner
> > interface returns failure.

Can you connect this back to the "never a mixture" from the beginning?
If all you cared about was preventing an IOMMU group from containing
devices with a mixture of kernel drivers and userspace drivers, I
assume you could do that without iommu_device_use_dma_api().  So is
this a way to *allow* a mixture under certain restricted conditions?
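A userspace model of the two fields the patch adds to struct iommu_group (owner_cnt and owner) may help answer that. It is a sketch under assumptions. The model_* functions only mimic the semantics described in the commit log, locking and domain switching are left out, and the error codes are guesses.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Model of the two fields added to struct iommu_group. */
struct model_group {
	unsigned int owner_cnt;     /* number of outstanding claims */
	void *owner;                /* NULL: claims are from kernel drivers */
};

/* Kernel driver bind: refused while a userspace owner holds the group. */
static int model_use_dma_api(struct model_group *g)
{
	if (g->owner_cnt && g->owner)
		return -EBUSY;
	g->owner_cnt++;
	return 0;
}

/* Kernel driver unbind: drop one kernel claim. */
static void model_unuse_dma_api(struct model_group *g)
{
	if (g->owner_cnt && !g->owner)
		g->owner_cnt--;
}

/* Userspace assignment (e.g. VFIO): refused while any other party,
 * including a kernel driver, holds a claim. */
static int model_set_dma_owner(struct model_group *g, void *owner)
{
	if (g->owner_cnt && g->owner != owner)
		return -EPERM;
	g->owner = owner;
	g->owner_cnt++;
	return 0;
}

/* Drop one userspace claim; the last release clears the owner. */
static void model_release_dma_owner(struct model_group *g)
{
	if (g->owner_cnt && --g->owner_cnt == 0)
		g->owner = NULL;
}
```

In this model, the counter lets several kernel-driver binds share one group. It refuses a userspace owner until all of those binds are released, and the reverse also holds. So the counter alone does not allow a mixture; any exception has to come from a driver that skips the use_dma_api claim.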

Another nit below.

> > Signed-off-by: Jason Gunthorpe 
> > Signed-off-by: Kevin Tian 
> > Signed-off-by: Lu Baolu 
> > ---
> >  include/linux/iommu.h |  31 
> >  drivers/iommu/iommu.c | 161 +-
> >  2 files changed, 189 insertions(+), 3 deletions(-)
> > 
> > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > index de0c57a567c8..568f285468cf 100644
> > --- a/include/linux/iommu.h
> > +++ b/include/linux/iommu.h
> > @@ -682,6 +682,13 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev,
> >  void iommu_sva_unbind_device(struct iommu_sva *handle);
> >  u32 iommu_sva_get_pasid(struct iommu_sva *handle);
> >  
> > +int iommu_device_use_dma_api(struct device *dev);
> > +void iommu_device_unuse_dma_api(struct device *dev);
> > +
> > +int iommu_group_set_dma_owner(struct iommu_group *group, void *owner);
> > +void iommu_group_release_dma_owner(struct iommu_group *group);
> > +bool iommu_group_dma_owner_claimed(struct iommu_group *group);
> > +
> >  #else /* CONFIG_IOMMU_API */
> >  
> >  struct iommu_ops {};
> > @@ -1082,6 +1089,30 @@ static inline struct iommu_fwspec *dev_iommu_fwspec_get(struct device *dev)
> >  {
> > return NULL;
> >  }
> > +
> > +static inline int iommu_device_use_dma_api(struct device *dev)
> > +{
> > +   return 0;
> > +}
> > +
> > +static inline void iommu_device_unuse_dma_api(struct device *dev)
> > +{
> > +}
> > +
> > +static inline int
> > +iommu_group_set_dma_owner(struct iommu_group *group, void *owner)
> > +{
> > +   return -ENODEV;
> > +}
> > +
> > +static inline void iommu_group_release_dma_owner(struct iommu_group *group)
> > +{
> > +}
> > +
> > +static inline bool iommu_group_dma_owner_claimed(struct iommu_group *group)
> > +{
> > +   return false;
> > +}
> >  #endif /* CONFIG_IOMMU_API */
> >  
> >  /**
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > index 8b86406b7162..ff0c8c1ad5af 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -48,6 +48,8 @@ struct iommu_group {
> > struct iommu_domain *default_domain;
> > struct iommu_domain *domain;
> > struct list_head entry;
> > +   unsigned int owner_cnt;
> > +   void *owner;
> >  };
> >  
> >  struct group_device {
> > @@ -289,7 +291,12 @@ int iommu_probe_device(struct device *dev)
> > mutex_lock(&group->mutex);
> > iommu_alloc_default_domain(group, 

Re: [PATCH v4 03/13] PCI: pci_stub: Suppress kernel DMA ownership auto-claiming

2021-12-30 Thread Bjorn Helgaas
On Thu, Dec 30, 2021 at 01:34:27PM +0800, Lu Baolu wrote:
> Hi Bjorn,
> 
> On 12/30/21 4:42 AM, Bjorn Helgaas wrote:
> > On Fri, Dec 17, 2021 at 02:36:58PM +0800, Lu Baolu wrote:
> > > The pci_dma_configure() marks the iommu_group as containing only devices
> > > with kernel drivers that manage DMA.
> > 
> > I'm looking at pci_dma_configure(), and I don't see the connection to
> > iommu_groups.
> 
> The 2nd patch "driver core: Set DMA ownership during driver bind/unbind"
> sets all drivers' DMA to be kernel-managed by default except for a few
> that have a driver flag set. So by default, all iommu groups contain
> only devices with kernel drivers managing DMA.

It looks like that happens in device_dma_configure(), not
pci_dma_configure().

> > > Avoid this default behavior for the
> > > pci_stub because it does not program any DMA itself.  This allows the
> > > pci_stub still able to be used by the admin to block driver binding after
> > > applying the DMA ownership to vfio.
> > 
> > > 
> > > Signed-off-by: Lu Baolu 
> > > ---
> > >   drivers/pci/pci-stub.c | 3 +++
> > >   1 file changed, 3 insertions(+)
> > > 
> > > diff --git a/drivers/pci/pci-stub.c b/drivers/pci/pci-stub.c
> > > index e408099fea52..6324c68602b4 100644
> > > --- a/drivers/pci/pci-stub.c
> > > +++ b/drivers/pci/pci-stub.c
> > > @@ -36,6 +36,9 @@ static struct pci_driver stub_driver = {
> > >   .name   = "pci-stub",
> > >   .id_table   = NULL, /* only dynamic id's */
> > >   .probe  = pci_stub_probe,
> > > + .driver = {
> > > + .suppress_auto_claim_dma_owner = true,
> > 
> > The new .suppress_auto_claim_dma_owner controls whether we call
> > iommu_device_set_dma_owner().  I guess you added
> > .suppress_auto_claim_dma_owner because iommu_device_set_dma_owner()
> > must be done *before* we call the driver's .probe() method?
> 
> As explained above, all drivers are set to kernel-managed dma by
> default. For those vfio and vfio-approved drivers,
> suppress_auto_claim_dma_owner is used to tell the driver core that "this
> driver is attached to device for userspace assignment purpose, do not
> claim it for kernel-management dma".
> 
> > Otherwise, we could call some new interface from .probe() instead of
> > adding the flag to struct device_driver.
> 
> Most device drivers are of the kernel-managed DMA type. Only a few vfio
> and vfio-approved drivers need to use this flag. That's the reason why
> we claim kernel-managed DMA by default.

Yes.  But you didn't answer the question of whether this must be done
by a new flag in struct device_driver, or whether it could be done by
having these few VFIO and "VFIO-approved" (whatever that means)
drivers call a new interface.

I was speculating that maybe the DMA ownership claiming must be done
*before* the driver's .probe() method?  If so, that would require a
new flag.  But I don't know whether that's the case.  If DMA
ownership could be claimed by the .probe() method, we wouldn't need
the new flag in struct device_driver.
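The ordering question can be made concrete with a userspace model. Everything here is hypothetical: model_bind() is not the driver core's bind path. The model assumes, as speculated above, that the claim happens before .probe().

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of a driver-core bind path; not really_probe(). */
struct model_drv {
	int suppress_auto_claim;    /* the per-driver flag under discussion */
	int (*probe)(void);         /* the driver's .probe() method */
};

static int model_group_user_owned;  /* 1 if e.g. VFIO owns the group */
static int model_probe_calls;       /* how many .probe() calls ran */

/* Stand-in for claiming kernel DMA ownership of the device's group. */
static int model_claim_kernel_dma(void)
{
	return model_group_user_owned ? -EBUSY : 0;
}

/* The claim happens before .probe(): if it fails, the driver never
 * sees the device, so an opt-out must be visible to the core up front. */
static int model_bind(const struct model_drv *drv)
{
	if (!drv->suppress_auto_claim) {
		int ret = model_claim_kernel_dma();

		if (ret)
			return ret;
	}
	return drv->probe();
}

/* A trivial .probe() that just records that it ran. */
static int model_probe(void)
{
	model_probe_calls++;
	return 0;
}
```

Under that assumption, a driver cannot opt out from inside .probe(), because a refused claim means .probe() is never called. If the claim were instead made by .probe() itself, a driver could simply not make it, and no flag would be needed.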

Bjorn


Re: [PATCH v4 04/13] PCI: portdrv: Suppress kernel DMA ownership auto-claiming

2021-12-29 Thread Bjorn Helgaas
On Fri, Dec 17, 2021 at 02:36:59PM +0800, Lu Baolu wrote:
> IOMMU grouping on PCI necessitates that if we lack isolation on a bridge
> then all of the downstream devices will be part of the same IOMMU group
> as the bridge. The existing vfio framework allows the portdrv driver to
> be bound to the bridge while its downstream devices are assigned to user
> space. The pci_dma_configure() marks the iommu_group as containing only
> devices with kernel drivers that manage DMA. Avoid this default behavior
> for the portdrv driver in order for compatibility with the current vfio
> policy.

A word about the isolation would be useful.  I think you're referring
to some specific ACS controls, probably P2P Request Redirect?

I guess this is just a wording issue, but I think it's actually the
*lack* of some ACS controls that forces us to put several devices in
the same IOMMU group, isn't it?  It's not that we start with "IOMMU
grouping" and that necessitates something else.

Maybe something like this?

  If a switch lacks ACS P2P Request Redirect (and possibly other
  controls?), a device below the switch can bypass the IOMMU and DMA
  directly to other devices below the switch, so all the downstream
  devices must be in the same IOMMU group as the switch itself.
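A toy model of that rule follows. struct model_switch and model_assign_groups() are hypothetical and ignore everything except ACS P2P Request Redirect. Real grouping, e.g. pci_device_group(), also considers other ACS controls, multifunction devices and DMA aliases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_DEVS 8

/* Toy topology: one switch with up to MODEL_MAX_DEVS devices below it. */
struct model_switch {
	bool acs_p2p_rr;             /* ACS P2P Request Redirect enabled? */
	size_t ndevs;                /* devices below the switch */
	int group[MODEL_MAX_DEVS];   /* group ID of each device */
	int switch_group;            /* group ID of the switch itself */
};

/* Without Request Redirect, peer DMA below the switch never reaches the
 * IOMMU, so every device joins the switch's group.  With it, each device
 * gets its own group.  Returns the next unused group ID. */
static int model_assign_groups(struct model_switch *sw, int next_id)
{
	sw->switch_group = next_id++;
	for (size_t i = 0; i < sw->ndevs; i++)
		sw->group[i] = sw->acs_p2p_rr ? next_id++ : sw->switch_group;
	return next_id;
}
```

The model shows only why the group gets large. The later question of which drivers may be bound inside that group is what the flag addresses.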

> The commit 5f096b14d421b ("vfio: Whitelist PCI bridges") extended above
> policy to all kernel drivers of bridge class. This is not always safe.
> For example, The shpchp_core driver relies on the PCI MMIO access for the
> controller functionality. With its downstream devices assigned to the
> userspace, the MMIO might be changed through user initiated P2P accesses
> without any notification. This might break the kernel driver integrity
> and lead to some unpredictable consequences.
> 
> For any bridge driver, in order to avoiding default kernel DMA ownership
> claiming, we should consider:
> 
>  1) Does the bridge driver use DMA? Calling pci_set_master() or
> a dma_map_* API is a sure indicate the driver is doing DMA
> 
>  2) If the bridge driver uses MMIO, is it tolerant to hostile
> userspace also touching the same MMIO registers via P2P DMA
> attacks?
> 
> Conservatively if the driver maps an MMIO region at all, we can say that
> it fails the test.

I'm not sure what all this explanation is telling me.  It says
something done by 5f096b14d421 is not always safe, but this patch
doesn't fix any of those unsafe things.

If it doesn't explain why we need this patch or how this patch works,
I don't think we need it in the commit log.

Maybe this is an explanation for why you didn't set
.suppress_auto_claim_dma_owner for shpc_driver?

Minor typos above:
  s/in order to avoiding default/before avoiding default/
  s/relies on the PCI MMIO access/relies on PCI MMIO access/
  s/For example, The/For example, the/
  s/is a sure indicate the/is a sure indication the/

> Suggested-by: Jason Gunthorpe 
> Suggested-by: Kevin Tian 
> Signed-off-by: Lu Baolu 
> ---
>  drivers/pci/pcie/portdrv_pci.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
> index 35eca6277a96..c48a8734f9c4 100644
> --- a/drivers/pci/pcie/portdrv_pci.c
> +++ b/drivers/pci/pcie/portdrv_pci.c
> @@ -202,7 +202,10 @@ static struct pci_driver pcie_portdriver = {
>  
>   .err_handler= _portdrv_err_handler,
>  
> - .driver.pm  = PCIE_PORTDRV_PM_OPS,
> + .driver = {
> + .pm = PCIE_PORTDRV_PM_OPS,
> + .suppress_auto_claim_dma_owner = true,
> + },
>  };
>  
>  static int __init dmi_pcie_pme_disable_msi(const struct dmi_system_id *d)
> -- 
> 2.25.1
> 
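The hunk above replaces the single `.driver.pm = ...` designator with a nested `.driver = { ... }` initializer. In C both spellings initialize the same nested members. The sketch below uses hypothetical stand-in structs, not the kernel's `struct pci_driver` or `struct device_driver`, to show that the two forms produce identical values.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel structures. */
struct dev_pm_ops_stub { int dummy; };

struct device_driver_stub {
    const struct dev_pm_ops_stub *pm;
    bool suppress_auto_claim_dma_owner;
};

struct pci_driver_stub {
    const char *name;
    struct device_driver_stub driver;
};

static const struct dev_pm_ops_stub portdrv_pm_ops = { 0 };

/* One designator per nested member. */
static const struct pci_driver_stub drv_flat = {
    .name = "pcieport",
    .driver.pm = &portdrv_pm_ops,
    .driver.suppress_auto_claim_dma_owner = true,
};

/* A brace-enclosed initializer for the nested .driver member. */
static const struct pci_driver_stub drv_nested = {
    .name = "pcieport",
    .driver = {
        .pm = &portdrv_pm_ops,
        .suppress_auto_claim_dma_owner = true,
    },
};
```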
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 03/13] PCI: pci_stub: Suppress kernel DMA ownership auto-claiming

2021-12-29 Thread Bjorn Helgaas
On Fri, Dec 17, 2021 at 02:36:58PM +0800, Lu Baolu wrote:
> The pci_dma_configure() marks the iommu_group as containing only devices
> with kernel drivers that manage DMA.

I'm looking at pci_dma_configure(), and I don't see the connection to
iommu_groups.

> Avoid this default behavior for the
> pci_stub because it does not program any DMA itself.  This allows the
> pci_stub still able to be used by the admin to block driver binding after
> applying the DMA ownership to vfio.

> 
> Signed-off-by: Lu Baolu 
> ---
>  drivers/pci/pci-stub.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/pci/pci-stub.c b/drivers/pci/pci-stub.c
> index e408099fea52..6324c68602b4 100644
> --- a/drivers/pci/pci-stub.c
> +++ b/drivers/pci/pci-stub.c
> @@ -36,6 +36,9 @@ static struct pci_driver stub_driver = {
>   .name   = "pci-stub",
>   .id_table   = NULL, /* only dynamic id's */
>   .probe  = pci_stub_probe,
> + .driver = {
> + .suppress_auto_claim_dma_owner = true,

The new .suppress_auto_claim_dma_owner controls whether we call
iommu_device_set_dma_owner().  I guess you added
.suppress_auto_claim_dma_owner because iommu_device_set_dma_owner()
must be done *before* we call the driver's .probe() method?

Otherwise, we could call some new interface from .probe() instead of
adding the flag to struct device_driver.

> + },
>  };
>  
>  static int __init pci_stub_init(void)
> -- 
> 2.25.1
> 
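The ordering question above can be modeled in userspace C. This sketch rests on assumptions: `really_probe_sketch()` is a hypothetical stand-in for the driver-core bind path, and `claim_kernel_dma()`/`release_kernel_dma()` are hypothetical placeholders for `iommu_device_set_dma_owner()` and its release counterpart. It only illustrates a claim that is settled before `.probe()` runs and is skipped when the flag is set.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; not kernel types or functions. */
struct drv_sketch {
    bool suppress_auto_claim_dma_owner;
    int (*probe)(void *dev);
};

static int kernel_dma_claims;  /* number of outstanding kernel DMA claims */

static int claim_kernel_dma(void *dev) { (void)dev; kernel_dma_claims++; return 0; }
static void release_kernel_dma(void *dev) { (void)dev; kernel_dma_claims--; }

static int really_probe_sketch(struct drv_sketch *drv, void *dev)
{
    int ret;

    /* Ownership is claimed before the driver can touch the device. */
    if (!drv->suppress_auto_claim_dma_owner) {
        ret = claim_kernel_dma(dev);
        if (ret)
            return ret;
    }

    ret = drv->probe(dev);
    /* Undo the claim if probe fails. */
    if (ret && !drv->suppress_auto_claim_dma_owner)
        release_kernel_dma(dev);
    return ret;
}

static int ok_probe(void *dev) { (void)dev; return 0; }
static int failing_probe(void *dev) { (void)dev; return -1; }
```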


Re: [patch V2 28/36] PCI/MSI: Use __msi_get_virq() in pci_get_vector()

2021-12-07 Thread Bjorn Helgaas
On Mon, Dec 06, 2021 at 11:39:41PM +0100, Thomas Gleixner wrote:
> Use msi_get_vector() and handle the return value to be compatible.
> 
> No functional change intended.
> 
> Signed-off-by: Thomas Gleixner 

Acked-by: Bjorn Helgaas 

> ---
> V2: Handle the INTx case directly instead of trying to be overly smart - Marc
> ---
>  drivers/pci/msi/msi.c |   25 +
>  1 file changed, 5 insertions(+), 20 deletions(-)
> 
> --- a/drivers/pci/msi/msi.c
> +++ b/drivers/pci/msi/msi.c
> @@ -1032,28 +1032,13 @@ EXPORT_SYMBOL(pci_free_irq_vectors);
>   */
>  int pci_irq_vector(struct pci_dev *dev, unsigned int nr)
>  {
> - if (dev->msix_enabled) {
> - struct msi_desc *entry;
> + unsigned int irq;
>  
> - for_each_pci_msi_entry(entry, dev) {
> - if (entry->msi_index == nr)
> - return entry->irq;
> - }
> - WARN_ON_ONCE(1);
> - return -EINVAL;
> - }
> + if (!dev->msi_enabled && !dev->msix_enabled)
> + return !nr ? dev->irq : -EINVAL;
>  
> - if (dev->msi_enabled) {
> - struct msi_desc *entry = first_pci_msi_entry(dev);
> -
> - if (WARN_ON_ONCE(nr >= entry->nvec_used))
> - return -EINVAL;
> - } else {
> - if (WARN_ON_ONCE(nr > 0))
> - return -EINVAL;
> - }
> -
> - return dev->irq + nr;
> + irq = msi_get_virq(&dev->dev, nr);
> + return irq ? irq : -EINVAL;
>  }
>  EXPORT_SYMBOL(pci_irq_vector);
>  
> 
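The control flow of the rewritten `pci_irq_vector()` can be modeled in userspace C. This sketch assumes a hypothetical `struct pdev_sketch` whose `virq[]` array stands in for the per-index lookup done by `msi_get_virq()`, with 0 meaning "nothing mapped at this index". It shows the two cases the patch distinguishes: INTx accepts only index 0, and MSI/MSI-X returns `-EINVAL` when no interrupt is mapped at `nr`.

```c
#include <errno.h>
#include <stdbool.h>

#define SKETCH_MAX_VEC 8

/* Hypothetical model of the fields the function consults. */
struct pdev_sketch {
    bool msi_enabled;
    bool msix_enabled;
    unsigned int irq;                   /* INTx interrupt */
    unsigned int virq[SKETCH_MAX_VEC];  /* 0 means no mapping at this index */
};

/* Stand-in for msi_get_virq(): returns 0 when nothing is mapped. */
static unsigned int get_virq_sketch(const struct pdev_sketch *d, unsigned int nr)
{
    return nr < SKETCH_MAX_VEC ? d->virq[nr] : 0;
}

static int irq_vector_sketch(const struct pdev_sketch *d, unsigned int nr)
{
    unsigned int irq;

    /* INTx: only vector 0 exists. */
    if (!d->msi_enabled && !d->msix_enabled)
        return !nr ? (int)d->irq : -EINVAL;

    irq = get_virq_sketch(d, nr);
    return irq ? (int)irq : -EINVAL;
}
```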


Re: [patch V2 25/36] PCI/MSI: Provide MSI_FLAG_MSIX_CONTIGUOUS

2021-12-07 Thread Bjorn Helgaas
On Mon, Dec 06, 2021 at 11:39:36PM +0100, Thomas Gleixner wrote:
> Provide a domain info flag which makes the core code check for a contiguous
> MSI-X index on allocation. That's simpler than checking it at some other
> domain callback in architecture code.
> 
> Signed-off-by: Thomas Gleixner 
> Reviewed-by: Greg Kroah-Hartman 
> Reviewed-by: Jason Gunthorpe 

Acked-by: Bjorn Helgaas 

> ---
>  drivers/pci/msi/irqdomain.c |   16 ++--
>  include/linux/msi.h |2 ++
>  2 files changed, 16 insertions(+), 2 deletions(-)
> 
> --- a/drivers/pci/msi/irqdomain.c
> +++ b/drivers/pci/msi/irqdomain.c
> @@ -89,9 +89,21 @@ static int pci_msi_domain_check_cap(stru
>   if (pci_msi_desc_is_multi_msi(desc) &&
>   !(info->flags & MSI_FLAG_MULTI_PCI_MSI))
>   return 1;
> - else if (desc->pci.msi_attrib.is_msix && !(info->flags & MSI_FLAG_PCI_MSIX))
> - return -ENOTSUPP;
>  
> + if (desc->pci.msi_attrib.is_msix) {
> + if (!(info->flags & MSI_FLAG_PCI_MSIX))
> + return -ENOTSUPP;
> +
> + if (info->flags & MSI_FLAG_MSIX_CONTIGUOUS) {
> + unsigned int idx = 0;
> +
> + /* Check for gaps in the entry indices */
> + for_each_msi_entry(desc, dev) {
> + if (desc->msi_index != idx++)
> + return -ENOTSUPP;
> + }
> + }
> + }
>   return 0;
>  }
>  
> --- a/include/linux/msi.h
> +++ b/include/linux/msi.h
> @@ -376,6 +376,8 @@ enum {
>   MSI_FLAG_LEVEL_CAPABLE  = (1 << 6),
>   /* Populate sysfs on alloc() and destroy it on free() */
>   MSI_FLAG_DEV_SYSFS  = (1 << 7),
> + /* MSI-X entries must be contiguous */
> + MSI_FLAG_MSIX_CONTIGUOUS= (1 << 8),
>  };
>  
>  int msi_domain_set_affinity(struct irq_data *data, const struct cpumask *mask,
> 
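The gap check in the hunk above relies only on entries being visited in ascending index order. Here is a minimal sketch over a plain array of indices, which stands in (as an assumption) for the `for_each_msi_entry()` walk:

```c
#include <errno.h>
#include <stddef.h>

/*
 * Return 0 if idx[0..n-1] is exactly 0, 1, ..., n-1, else -ENOTSUP.
 * The kernel hunk returns -ENOTSUPP, a kernel-internal errno value;
 * userspace <errno.h> provides ENOTSUP instead.
 */
static int check_contiguous(const unsigned int *idx, size_t n)
{
    unsigned int expected = 0;

    for (size_t i = 0; i < n; i++) {
        if (idx[i] != expected++)
            return -ENOTSUP;
    }
    return 0;
}
```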


Re: [patch V2 19/36] PCI/MSI: Store properties in device::msi::data

2021-12-07 Thread Bjorn Helgaas
On Mon, Dec 06, 2021 at 11:39:26PM +0100, Thomas Gleixner wrote:
> Store the properties which are interesting for various places so the MSI
> descriptor fiddling can be removed.
> 
> Signed-off-by: Thomas Gleixner 

Acked-by: Bjorn Helgaas 

> ---
> V2: Use the setter function
> ---
>  drivers/pci/msi/msi.c |8 
>  1 file changed, 8 insertions(+)
> 
> --- a/drivers/pci/msi/msi.c
> +++ b/drivers/pci/msi/msi.c
> @@ -244,6 +244,8 @@ static void free_msi_irqs(struct pci_dev
>   iounmap(dev->msix_base);
>   dev->msix_base = NULL;
>   }
> +
> + msi_device_set_properties(&dev->dev, 0);
>  }
>  
>  static void pci_intx_for_msi(struct pci_dev *dev, int enable)
> @@ -341,6 +343,7 @@ msi_setup_entry(struct pci_dev *dev, int
>  {
>   struct irq_affinity_desc *masks = NULL;
>   struct msi_desc *entry;
> + unsigned long prop;
>   u16 control;
>  
>   if (affd)
> @@ -372,6 +375,10 @@ msi_setup_entry(struct pci_dev *dev, int
>   if (entry->pci.msi_attrib.can_mask)
>   pci_read_config_dword(dev, entry->pci.mask_pos, &entry->pci.msi_mask);
>  
> + prop = MSI_PROP_PCI_MSI;
> + if (entry->pci.msi_attrib.is_64)
> + prop |= MSI_PROP_64BIT;
> + msi_device_set_properties(&dev->dev, prop);
>  out:
>   kfree(masks);
>   return entry;
> @@ -514,6 +521,7 @@ static int msix_setup_entries(struct pci
>   if (masks)
>   curmsk++;
>   }
> + msi_device_set_properties(&dev->dev, MSI_PROP_PCI_MSIX | MSI_PROP_64BIT);
>   ret = 0;
>  out:
>   kfree(masks);
> 
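The hunks above summarize each device's MSI mode in a single property word. The sketch below performs the same computation. The bit values are placeholders because the `MSI_PROP_*` definitions are not part of this patch, so the macros are renamed `SKETCH_PROP_*` to make clear they are hypothetical.

```c
#include <stdbool.h>

/* Placeholder bit values; the kernel's MSI_PROP_* values may differ. */
#define SKETCH_PROP_PCI_MSI   (1UL << 0)
#define SKETCH_PROP_PCI_MSIX  (1UL << 1)
#define SKETCH_PROP_64BIT     (1UL << 2)

/* MSI: 64-bit addressing depends on the capability's 64-bit flag. */
static unsigned long msi_props(bool is_64)
{
    unsigned long prop = SKETCH_PROP_PCI_MSI;

    if (is_64)
        prop |= SKETCH_PROP_64BIT;
    return prop;
}

/* MSI-X: the patch always marks MSI-X as 64-bit capable. */
static unsigned long msix_props(void)
{
    return SKETCH_PROP_PCI_MSIX | SKETCH_PROP_64BIT;
}
```

On teardown the hunk in `free_msi_irqs()` resets the word to 0.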


Re: [patch V2 17/36] PCI/MSI: Use msi_desc::msi_index

2021-12-07 Thread Bjorn Helgaas
On Mon, Dec 06, 2021 at 11:39:23PM +0100, Thomas Gleixner wrote:
> The usage of msi_desc::pci::entry_nr is confusing at best. It's the index
> into the MSI[X] descriptor table.
> 
> Use msi_desc::msi_index which is shared between all MSI incarnations
> instead of having a PCI specific storage for no value.
> 
> Signed-off-by: Thomas Gleixner 
> Reviewed-by: Greg Kroah-Hartman 
> Reviewed-by: Jason Gunthorpe 

Acked-by: Bjorn Helgaas 
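Once `msi_index` is the table index, the `pci_msix_desc_addr()` hunk in this patch reduces to plain array arithmetic. Each MSI-X table entry is 16 bytes (`PCI_MSIX_ENTRY_SIZE`). The sketch below uses an integer address in place of the `void __iomem *` table base.

```c
#include <stdint.h>

#define PCI_MSIX_ENTRY_SIZE 16  /* bytes per MSI-X table entry */

/* Address of table entry 'msi_index' given the table's base address. */
static uintptr_t msix_entry_addr(uintptr_t table_base, unsigned int msi_index)
{
    return table_base + (uintptr_t)msi_index * PCI_MSIX_ENTRY_SIZE;
}
```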

> ---
>  arch/powerpc/platforms/pseries/msi.c |4 ++--
>  arch/x86/pci/xen.c   |2 +-
>  drivers/pci/msi/irqdomain.c  |2 +-
>  drivers/pci/msi/msi.c|   20 
>  drivers/pci/xen-pcifront.c   |2 +-
>  include/linux/msi.h  |2 --
>  6 files changed, 13 insertions(+), 19 deletions(-)
> 
> --- a/arch/powerpc/platforms/pseries/msi.c
> +++ b/arch/powerpc/platforms/pseries/msi.c
> @@ -332,7 +332,7 @@ static int check_msix_entries(struct pci
>  
>   expected = 0;
>   for_each_pci_msi_entry(entry, pdev) {
> - if (entry->pci.msi_attrib.entry_nr != expected) {
> + if (entry->msi_index != expected) {
>   pr_debug("rtas_msi: bad MSI-X entries.\n");
>   return -EINVAL;
>   }
> @@ -580,7 +580,7 @@ static int pseries_irq_domain_alloc(stru
>   int hwirq;
>   int i, ret;
>  
> - hwirq = rtas_query_irq_number(pci_get_pdn(pdev), desc->pci.msi_attrib.entry_nr);
> + hwirq = rtas_query_irq_number(pci_get_pdn(pdev), desc->msi_index);
>   if (hwirq < 0) {
>   dev_err(&pdev->dev, "Failed to query HW IRQ: %d\n", hwirq);
>   return hwirq;
> --- a/arch/x86/pci/xen.c
> +++ b/arch/x86/pci/xen.c
> @@ -306,7 +306,7 @@ static int xen_initdom_setup_msi_irqs(st
>   return -EINVAL;
>  
>   map_irq.table_base = pci_resource_start(dev, bir);
> - map_irq.entry_nr = msidesc->pci.msi_attrib.entry_nr;
> + map_irq.entry_nr = msidesc->msi_index;
>   }
>  
>   ret = -EINVAL;
> --- a/drivers/pci/msi/irqdomain.c
> +++ b/drivers/pci/msi/irqdomain.c
> @@ -57,7 +57,7 @@ static irq_hw_number_t pci_msi_domain_ca
>  {
>   struct pci_dev *dev = msi_desc_to_pci_dev(desc);
>  
> - return (irq_hw_number_t)desc->pci.msi_attrib.entry_nr |
> + return (irq_hw_number_t)desc->msi_index |
>   pci_dev_id(dev) << 11 |
>   (pci_domain_nr(dev->bus) & 0x) << 27;
>  }
> --- a/drivers/pci/msi/msi.c
> +++ b/drivers/pci/msi/msi.c
> @@ -44,7 +44,7 @@ static inline void pci_msi_unmask(struct
>  
>  static inline void __iomem *pci_msix_desc_addr(struct msi_desc *desc)
>  {
> - return desc->pci.mask_base + desc->pci.msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE;
> + return desc->pci.mask_base + desc->msi_index * PCI_MSIX_ENTRY_SIZE;
>  }
>  
>  /*
> @@ -356,13 +356,10 @@ msi_setup_entry(struct pci_dev *dev, int
>   if (dev->dev_flags & PCI_DEV_FLAGS_HAS_MSI_MASKING)
>   control |= PCI_MSI_FLAGS_MASKBIT;
>  
> - entry->pci.msi_attrib.is_msix   = 0;
> - entry->pci.msi_attrib.is_64 = !!(control & PCI_MSI_FLAGS_64BIT);
> - entry->pci.msi_attrib.is_virtual= 0;
> - entry->pci.msi_attrib.entry_nr  = 0;
> + entry->pci.msi_attrib.is_64 = !!(control & PCI_MSI_FLAGS_64BIT);
>   entry->pci.msi_attrib.can_mask  = !pci_msi_ignore_mask &&
> !!(control & PCI_MSI_FLAGS_MASKBIT);
> - entry->pci.msi_attrib.default_irq   = dev->irq; /* Save IOAPIC IRQ */
> + entry->pci.msi_attrib.default_irq = dev->irq;
>   entry->pci.msi_attrib.multi_cap = (control & PCI_MSI_FLAGS_QMASK) >> 1;
>   entry->pci.msi_attrib.multiple  = ilog2(__roundup_pow_of_two(nvec));
>  
> @@ -496,12 +493,11 @@ static int msix_setup_entries(struct pci
>   entry->pci.msi_attrib.is_64 = 1;
>  
>   if (entries)
> - entry->pci.msi_attrib.entry_nr = entries[i].entry;
> + entry->msi_index = entries[i].entry;
>   else
> - entry->pci.msi_attrib.entry_nr = i;
> + entry->msi_index = i;
>  
> - entry->pci.msi_attrib.is_virtual =
> - entry->pci.msi_attrib.entry_nr >= vec_count;
> + entry->pci.msi_attrib.is_virtual = entry->msi_index >= 
> vec

Re: [patch V2 08/36] PCI/MSI: Let the irq code handle sysfs groups

2021-12-07 Thread Bjorn Helgaas
On Mon, Dec 06, 2021 at 11:39:09PM +0100, Thomas Gleixner wrote:
> Set the domain info flag which makes the core code handle sysfs groups and
> put an explicit invocation into the legacy code.
> 
> Signed-off-by: Thomas Gleixner 
> Reviewed-by: Greg Kroah-Hartman 
> Reviewed-by: Jason Gunthorpe 

Acked-by: Bjorn Helgaas 

> ---
>  drivers/pci/msi/irqdomain.c |2 +-
>  drivers/pci/msi/legacy.c|6 +-
>  drivers/pci/msi/msi.c   |   23 ---
>  include/linux/pci.h |1 -
>  4 files changed, 6 insertions(+), 26 deletions(-)
> 
> --- a/drivers/pci/msi/irqdomain.c
> +++ b/drivers/pci/msi/irqdomain.c
> @@ -159,7 +159,7 @@ struct irq_domain *pci_msi_create_irq_do
>   if (info->flags & MSI_FLAG_USE_DEF_CHIP_OPS)
>   pci_msi_domain_update_chip_ops(info);
>  
> - info->flags |= MSI_FLAG_ACTIVATE_EARLY;
> + info->flags |= MSI_FLAG_ACTIVATE_EARLY | MSI_FLAG_DEV_SYSFS;
>   if (IS_ENABLED(CONFIG_GENERIC_IRQ_RESERVATION_MODE))
>   info->flags |= MSI_FLAG_MUST_REACTIVATE;
>  
> --- a/drivers/pci/msi/legacy.c
> +++ b/drivers/pci/msi/legacy.c
> @@ -70,10 +70,14 @@ int pci_msi_legacy_setup_msi_irqs(struct
>  {
>   int ret = arch_setup_msi_irqs(dev, nvec, type);
>  
> - return pci_msi_setup_check_result(dev, type, ret);
> + ret = pci_msi_setup_check_result(dev, type, ret);
> + if (!ret)
> + ret = msi_device_populate_sysfs(&dev->dev);
> + return ret;
>  }
>  
>  void pci_msi_legacy_teardown_msi_irqs(struct pci_dev *dev)
>  {
> + msi_device_destroy_sysfs(&dev->dev);
>   arch_teardown_msi_irqs(dev);
>  }
> --- a/drivers/pci/msi/msi.c
> +++ b/drivers/pci/msi/msi.c
> @@ -233,11 +233,6 @@ static void free_msi_irqs(struct pci_dev
>   for (i = 0; i < entry->nvec_used; i++)
>   BUG_ON(irq_has_action(entry->irq + i));
>  
> - if (dev->msi_irq_groups) {
> - msi_destroy_sysfs(&dev->dev, dev->msi_irq_groups);
> - dev->msi_irq_groups = NULL;
> - }
> -
>   pci_msi_teardown_msi_irqs(dev);
>  
>   list_for_each_entry_safe(entry, tmp, msi_list, list) {
> @@ -417,7 +412,6 @@ static int msi_verify_entries(struct pci
>  static int msi_capability_init(struct pci_dev *dev, int nvec,
>  struct irq_affinity *affd)
>  {
> - const struct attribute_group **groups;
>   struct msi_desc *entry;
>   int ret;
>  
> @@ -441,14 +435,6 @@ static int msi_capability_init(struct pc
>   if (ret)
>   goto err;
>  
> - groups = msi_populate_sysfs(&dev->dev);
> - if (IS_ERR(groups)) {
> - ret = PTR_ERR(groups);
> - goto err;
> - }
> -
> - dev->msi_irq_groups = groups;
> -
>   /* Set MSI enabled bits */
>   pci_intx_for_msi(dev, 0);
>   pci_msi_set_enable(dev, 1);
> @@ -576,7 +562,6 @@ static void msix_mask_all(void __iomem *
>  static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
>   int nvec, struct irq_affinity *affd)
>  {
> - const struct attribute_group **groups;
>   void __iomem *base;
>   int ret, tsize;
>   u16 control;
> @@ -618,14 +603,6 @@ static int msix_capability_init(struct p
>  
>   msix_update_entries(dev, entries);
>  
> - groups = msi_populate_sysfs(&dev->dev);
> - if (IS_ERR(groups)) {
> - ret = PTR_ERR(groups);
> - goto out_free;
> - }
> -
> - dev->msi_irq_groups = groups;
> -
>   /* Set MSI-X enabled bits and unmask the function */
>   pci_intx_for_msi(dev, 0);
>   dev->msix_enabled = 1;
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -475,7 +475,6 @@ struct pci_dev {
>  #ifdef CONFIG_PCI_MSI
>   void __iomem*msix_base;
>   raw_spinlock_t  msi_lock;
> - const struct attribute_group **msi_irq_groups;
>  #endif
>   struct pci_vpd  vpd;
>  #ifdef CONFIG_PCIE_DPC
> 


Re: [patch V2 03/36] PCI/MSI: Allocate MSI device data on first use

2021-12-07 Thread Bjorn Helgaas
On Mon, Dec 06, 2021 at 11:39:00PM +0100, Thomas Gleixner wrote:
> Allocate MSI device data on first use, i.e. when a PCI driver invokes one
> of the PCI/MSI enablement functions.
> 
> Signed-off-by: Thomas Gleixner 
> Reviewed-by: Greg Kroah-Hartman 
> Reviewed-by: Jason Gunthorpe 

Acked-by: Bjorn Helgaas 
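The allocate-on-first-use pattern can be sketched in userspace C. This assumes a hypothetical `struct device_sketch` with a `msi_data` pointer. The real `msi_setup_device_data()` is not shown in this patch, so the sketch models only what the callers above rely on: repeated calls are harmless, and a failure is returned before any enablement work starts.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not kernel structures. */
struct msi_data_sketch { int nvec; };

struct device_sketch {
    struct msi_data_sketch *msi_data;  /* NULL until first MSI use */
};

/* Idempotent: allocates once, later calls return 0 immediately. */
static int setup_device_data_sketch(struct device_sketch *dev)
{
    if (dev->msi_data)
        return 0;
    dev->msi_data = calloc(1, sizeof(*dev->msi_data));
    return dev->msi_data ? 0 : -ENOMEM;
}

/* Mirrors the caller shape in pci_enable_msi(): set up, then enable. */
static int enable_msi_sketch(struct device_sketch *dev)
{
    int rc = setup_device_data_sketch(dev);

    if (!rc)
        dev->msi_data->nvec = 1;  /* stand-in for __pci_enable_msi_range() */
    return rc < 0 ? rc : 0;
}
```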

> ---
>  drivers/pci/msi/msi.c |   20 +++-
>  1 file changed, 15 insertions(+), 5 deletions(-)
> 
> --- a/drivers/pci/msi/msi.c
> +++ b/drivers/pci/msi/msi.c
> @@ -889,10 +889,12 @@ static int __pci_enable_msi_range(struct
>  /* deprecated, don't use */
>  int pci_enable_msi(struct pci_dev *dev)
>  {
> - int rc = __pci_enable_msi_range(dev, 1, 1, NULL);
> - if (rc < 0)
> - return rc;
> - return 0;
> + int rc = msi_setup_device_data(&dev->dev);
> +
> + if (!rc)
> + rc = __pci_enable_msi_range(dev, 1, 1, NULL);
> +
> + return rc < 0 ? rc : 0;
>  }
>  EXPORT_SYMBOL(pci_enable_msi);
>  
> @@ -947,7 +949,11 @@ static int __pci_enable_msix_range(struc
>  int pci_enable_msix_range(struct pci_dev *dev, struct msix_entry *entries,
>   int minvec, int maxvec)
>  {
> - return __pci_enable_msix_range(dev, entries, minvec, maxvec, NULL, 0);
> + int ret = msi_setup_device_data(&dev->dev);
> +
> + if (!ret)
> + ret = __pci_enable_msix_range(dev, entries, minvec, maxvec, NULL, 0);
> + return ret;
>  }
>  EXPORT_SYMBOL(pci_enable_msix_range);
>  
> @@ -974,8 +980,12 @@ int pci_alloc_irq_vectors_affinity(struc
>  struct irq_affinity *affd)
>  {
>   struct irq_affinity msi_default_affd = {0};
> + int ret = msi_setup_device_data(&dev->dev);
>   int nvecs = -ENOSPC;
>  
> + if (ret)
> + return ret;
> +
>   if (flags & PCI_IRQ_AFFINITY) {
>   if (!affd)
>   affd = &msi_default_affd;
> 


Re: [PATCH 04/11] PCI: portdrv: Suppress kernel DMA ownership auto-claiming

2021-11-16 Thread Bjorn Helgaas
On Tue, Nov 16, 2021 at 03:24:29PM +0800, Lu Baolu wrote:
> On 2021/11/16 4:44, Bjorn Helgaas wrote:
> > On Mon, Nov 15, 2021 at 10:05:45AM +0800, Lu Baolu wrote:
> > > IOMMU grouping on PCI necessitates that if we lack isolation on a bridge
> > > then all of the downstream devices will be part of the same IOMMU group
> > > as the bridge.
> > 
> > I think this means something like: "If a PCIe Switch Downstream Port
> > lacks , all downstream devices
> > will be part of the same IOMMU group as the switch," right?
> 
> For this patch, yes.
> 
> > If so, can you fill in the details to make it specific and concrete?
> 
> The existing vfio implementation allows a kernel driver to bind with a
> PCI bridge while its downstream devices are assigned to the user space
> though there lacks ACS-like isolation in bridge.
> 
> drivers/vfio/vfio.c:
>  540 static bool vfio_dev_driver_allowed(struct device *dev,
>  541 struct device_driver *drv)
>  542 {
>  543 if (dev_is_pci(dev)) {
>  544 struct pci_dev *pdev = to_pci_dev(dev);
>  545
>  546 if (pdev->hdr_type != PCI_HEADER_TYPE_NORMAL)
>  547 return true;
>  548 }
> 
> We are moving the group viability check to IOMMU core, and trying to
> make it compatible with the current vfio policy. We saw three types of
> bridges:
> 
> #1) PCIe/PCI-to-PCI bridges
> These bridges are configured in the PCI framework, there's no
> dedicated driver for such devices.
> 
> #2) Generic PCIe switch downstream port
> The port driver doesn't map and access any MMIO in the PCI BAR.
> The iommu group is viable to user even this driver is bound.
> 
> #3) Hot Plug Controller
> The controller driver maps and access the device MMIO. The iommu
> group is not viable to user with this driver bound to its device.
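The three cases in the list can be written as a small decision function. This sketch follows the policy as Lu describes it and uses hypothetical names. It is not the quoted `vfio_dev_driver_allowed()`, which allows any device whose header type is not `PCI_HEADER_TYPE_NORMAL`.

```c
#include <stdbool.h>

/* Hypothetical classification of the bridge cases listed above. */
enum bridge_kind {
    BRIDGE_NO_DRIVER,     /* #1: configured by the PCI core, no driver */
    BRIDGE_PORTDRV,       /* #2: switch port, driver maps no BAR MMIO */
    BRIDGE_HOTPLUG_CTRL,  /* #3: driver maps and uses BAR MMIO */
};

/* Does the group stay viable for userspace with this bridge's driver bound? */
static bool group_viable_for_user(enum bridge_kind kind)
{
    switch (kind) {
    case BRIDGE_NO_DRIVER:
    case BRIDGE_PORTDRV:
        return true;
    case BRIDGE_HOTPLUG_CTRL:
        return false;  /* user P2P traffic could change registers the driver uses */
    }
    return false;
}
```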

I *guess* the question here is whether the bridge can or will do DMA?

I think that's orthogonal to the question of whether it implements
BARs, so I'm not sure why the MMIO BARs are part of this discussion.
I assume it's theoretically possible for a driver to use registers in
config space to program a device to do DMA, even if the device has no
BARs.

Bjorn


Re: [PATCH 03/11] PCI: pci_stub: Suppress kernel DMA ownership auto-claiming

2021-11-15 Thread Bjorn Helgaas
On Mon, Nov 15, 2021 at 10:05:44AM +0800, Lu Baolu wrote:
> pci_stub allows the admin to block driver binding on a device and make
> it permanently shared with userspace. Since pci_stub does not do DMA,
> it is safe. However the admin must understand that using pci_stub allows
> userspace to attack whatever device it was bound to.

This commit log doesn't say what the patch does.  I think it tells us
something about what pci-stub *already* does ("allows admin to block
driver binding") and something about why that is safe ("does not do
DMA").

But it doesn't say what this patch changes.  Based on the subject
line, I expected something like:

  As of (""), () marks the iommu_group
  as containing only devices with kernel drivers that manage DMA.

  Avoid this default behavior for pci-stub because it does not program
  any DMA itself.  This allows .

> Signed-off-by: Lu Baolu 
> ---
>  drivers/pci/pci-stub.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/pci/pci-stub.c b/drivers/pci/pci-stub.c
> index e408099fea52..6324c68602b4 100644
> --- a/drivers/pci/pci-stub.c
> +++ b/drivers/pci/pci-stub.c
> @@ -36,6 +36,9 @@ static struct pci_driver stub_driver = {
>   .name   = "pci-stub",
>   .id_table   = NULL, /* only dynamic id's */
>   .probe  = pci_stub_probe,
> + .driver = {
> + .suppress_auto_claim_dma_owner = true,
> + },
>  };
>  
>  static int __init pci_stub_init(void)
> -- 
> 2.25.1
> 


Re: [PATCH 03/11] PCI: pci_stub: Suppress kernel DMA ownership auto-claiming

2021-11-15 Thread Bjorn Helgaas
On Mon, Nov 15, 2021 at 10:05:44AM +0800, Lu Baolu wrote:
> pci_stub allows the admin to block driver binding on a device and make
> it permanently shared with userspace. Since pci_stub does not do DMA,
> it is safe. 

Can you elaborate on what "permanently shared with userspace" means
here?  I assume it's only permanent as long as pci-stub is bound to
the device?

Also, a few words about what "it is safe" means here would be helpful.

> However the admin must understand that using pci_stub allows
> userspace to attack whatever device it was bound to.

The admin isn't going to read this sentence.  Should there be a doc
update related to this?  What sort of attack does this refer to?

> Signed-off-by: Lu Baolu 
> ---
>  drivers/pci/pci-stub.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/pci/pci-stub.c b/drivers/pci/pci-stub.c
> index e408099fea52..6324c68602b4 100644
> --- a/drivers/pci/pci-stub.c
> +++ b/drivers/pci/pci-stub.c
> @@ -36,6 +36,9 @@ static struct pci_driver stub_driver = {
>   .name   = "pci-stub",
>   .id_table   = NULL, /* only dynamic id's */
>   .probe  = pci_stub_probe,
> + .driver = {
> + .suppress_auto_claim_dma_owner = true,
> + },
>  };
>  
>  static int __init pci_stub_init(void)
> -- 
> 2.25.1
> 


Re: [PATCH 04/11] PCI: portdrv: Suppress kernel DMA ownership auto-claiming

2021-11-15 Thread Bjorn Helgaas
On Mon, Nov 15, 2021 at 10:05:45AM +0800, Lu Baolu wrote:
> IOMMU grouping on PCI necessitates that if we lack isolation on a bridge
> then all of the downstream devices will be part of the same IOMMU group
> as the bridge.

I think this means something like: "If a PCIe Switch Downstream Port
lacks , all downstream devices
will be part of the same IOMMU group as the switch," right?

If so, can you fill in the details to make it specific and concrete?

> As long as the bridge kernel driver doesn't map and
> access any PCI mmio bar, it's safe to bind it to the device in a USER-
> owned group. Hence, safe to suppress the kernel DMA ownership auto-
> claiming.

s/mmio/MMIO/ (also below)
s/bar/BAR/ (also below)

I don't understand what "kernel DMA ownership auto-claiming" means.
Presumably that's explained in previous patches and a code comment
near "suppress_auto_claim_dma_owner".

> The commit 5f096b14d421b ("vfio: Whitelist PCI bridges") permitted a
> class of kernel drivers. 

Permitted them to do what?

> This is not always safe. For example, the SHPC
> system design requires that it must be integrated into a PCI-to-PCI
> bridge or a host bridge.

If this SHPC example is important, it would be nice to have a citation
to the spec section that requires this.

> The shpchp_core driver relies on the PCI mmio
> bar access for the controller functionality. Binding it to the device
> belonging to a USER-owned group will allow the user to change the
> controller via p2p transactions which is unknown to the hot-plug driver
> and could lead to some unpredictable consequences.
> 
> Now that we have driver self-declaration of safety we should rely on that.

Can you spell out what drivers are self-declaring?  Are they declaring
that they don't program their devices to do DMA?

> This change may cause regression on some platforms, since all bridges were
> exempted before, but now they have to be manually audited before doing so.
> This is actually the desired outcome anyway.

Please spell out what regression this may cause and how users would
recognize it.  Also, please give a hint about why that is desirable.

> Suggested-by: Jason Gunthorpe 
> Suggested-by: Kevin Tian 
> Signed-off-by: Lu Baolu 
> ---
>  drivers/pci/pcie/portdrv_pci.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
> index 35eca6277a96..1285862a9aa8 100644
> --- a/drivers/pci/pcie/portdrv_pci.c
> +++ b/drivers/pci/pcie/portdrv_pci.c
> @@ -203,6 +203,8 @@ static struct pci_driver pcie_portdriver = {
>   .err_handler= _portdrv_err_handler,
>  
>   .driver.pm  = PCIE_PORTDRV_PM_OPS,
> +
> + .driver.suppress_auto_claim_dma_owner = true,
>  };
>  
>  static int __init dmi_pcie_pme_disable_msi(const struct dmi_system_id *d)
> -- 
> 2.25.1
> 


Re: [PATCH 01/11] iommu: Add device dma ownership set/release interfaces

2021-11-15 Thread Bjorn Helgaas
On Mon, Nov 15, 2021 at 10:05:42AM +0800, Lu Baolu wrote:
> From the perspective of who is initiating the device to do DMA, device
> DMA could be divided into the following types:
> 
> DMA_OWNER_KERNEL: kernel device driver intiates the DMA
> DMA_OWNER_USER: userspace device driver intiates the DMA

s/intiates/initiates/ (twice)

As your first sentence suggests, the driver doesn't actually
*initiate* the DMA in either case.  One of the drivers programs the
device, and the *device* initiates the DMA.

> DMA_OWNER_KERNEL and DMA_OWNER_USER are exclusive for all devices in
> same iommu group as an iommu group is the smallest granularity of device
> isolation and protection that the IOMMU subsystem can guarantee.

I think this basically says DMA_OWNER_KERNEL and DMA_OWNER_USER are
attributes of the iommu_group (not an individual device), and it
applies to all devices in the iommu_group.  Below, you allude to the
fact that the interfaces are per-device.  It's not clear to me why you
made a per-device interface instead of a per-group interface.

> This
> extends the iommu core to enforce this exclusion when devices are
> assigned to userspace.
> 
> Basically two new interfaces are provided:
> 
> int iommu_device_set_dma_owner(struct device *dev,
> enum iommu_dma_owner mode, struct file *user_file);
> void iommu_device_release_dma_owner(struct device *dev,
> enum iommu_dma_owner mode);
> 
> Although above interfaces are per-device, DMA owner is tracked per group
> under the hood. An iommu group cannot have both DMA_OWNER_KERNEL
> and DMA_OWNER_USER set at the same time. Violation of this assumption
> fails iommu_device_set_dma_owner().
> 
> Kernel driver which does DMA have DMA_OWNER_KENREL automatically
> set/released in the driver binding process (see next patch).

s/DMA_OWNER_KENREL/DMA_OWNER_KERNEL/

> Kernel driver which doesn't do DMA should not set the owner type (via a
> new suppress flag in next patch). Device bound to such driver is considered
> same as a driver-less device which is compatible to all owner types.
> 
> Userspace driver framework (e.g. vfio) should set DMA_OWNER_USER for
> a device before the userspace is allowed to access it, plus a fd pointer to
> mark the user identity so a single group cannot be operated by multiple
> users simultaneously. Vice versa, the owner type should be released after
> the user access permission is withdrawn.
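The ownership rule described above can be modeled in userspace C. Per-device calls update per-group state. KERNEL and USER ownership are mutually exclusive, and USER ownership is tied to a single `user_file`. The enum mirrors the names in the message, but the struct layout, the reference count, and the `-EBUSY` error are assumptions for illustration, not the patch's implementation.

```c
#include <errno.h>
#include <stddef.h>

enum dma_owner_sketch { OWNER_NONE, OWNER_KERNEL, OWNER_USER };

/* Hypothetical per-group state; the per-device calls only touch this. */
struct group_sketch {
    enum dma_owner_sketch owner;
    unsigned int refs;      /* devices currently holding the claim */
    const void *user_file;  /* identity of the USER owner */
};

static int set_dma_owner(struct group_sketch *g, enum dma_owner_sketch mode,
                         const void *user_file)
{
    if (g->refs) {
        /* KERNEL and USER cannot coexist in one group. */
        if (g->owner != mode)
            return -EBUSY;
        /* A USER-owned group belongs to exactly one user. */
        if (mode == OWNER_USER && g->user_file != user_file)
            return -EBUSY;
    } else {
        g->owner = mode;
        g->user_file = (mode == OWNER_USER) ? user_file : NULL;
    }
    g->refs++;
    return 0;
}

static void release_dma_owner(struct group_sketch *g)
{
    if (g->refs && --g->refs == 0) {
        g->owner = OWNER_NONE;
        g->user_file = NULL;
    }
}
```

A device bound to a driver that sets the suppress flag never calls `set_dma_owner()` in this model, so it is compatible with either owner type, like the driver-less device the message describes.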


Re: How to reduce PCI initialization from 5 s (1.5 s adding them to IOMMU groups)

2021-11-05 Thread Bjorn Helgaas
On Fri, Nov 05, 2021 at 12:56:09PM +0100, Paul Menzel wrote:
> Dear Linux folks,
> 
> 
> On a PowerEdge T440/021KCD, BIOS 2.11.2 04/22/2021, Linux 5.10.70 takes
> almost five seconds to initialize PCI. According to the timestamps, 1.5 s
> are from assigning the PCI devices to the 142 IOMMU groups.
> 
> ```
> $ lspci | wc -l
> 281
> $ dmesg
> […]
> [2.918411] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
> [2.933841] ACPI: Enabled 5 GPEs in block 00 to 7F
> [2.973739] ACPI: PCI Root Bridge [PC00] (domain  [bus 00-16])
> [2.980398] acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI HPX-Type3]
> [2.989457] acpi PNP0A08:00: _OSC: platform does not support [LTR]
> [2.995451] acpi PNP0A08:00: _OSC: OS now controls [PME PCIeCapability]
> [3.001394] acpi PNP0A08:00: FADT indicates ASPM is unsupported, using BIOS configuration
> [3.010511] PCI host bridge to bus :00
> […]
> [6.233508] system 00:05: [io  0x1000-0x10fe] has been reserved
> [6.239420] system 00:05: Plug and Play ACPI device, IDs PNP0c02 (active)
> [6.239906] pnp: PnP ACPI: found 6 devices

For ~280 PCI devices, (6.24-2.92)/280 = 0.012 s/dev.  On my laptop I
have about (.66-.37)/36 = 0.008 s/dev (on v5.4), so about the same
ballpark.

Faster would always be better, of course.  I assume this is not really
a regression?

> [6.989016] pci :d7:05.0: disabled boot interrupts on device [8086:2034]
> [6.996063] PCI: CLS 0 bytes, default 64
> [7.08] Trying to unpack rootfs image as initramfs...
> [7.065281] Freeing initrd memory: 5136K
> […]
> [7.079098] DMAR: dmar7: Using Queued invalidation
> [7.083983] pci :00:00.0: Adding to iommu group 0
> […]
> [8.537808] pci :d7:17.1: Adding to iommu group 141

I don't have this iommu stuff turned on and don't know what's
happening here.

> Is there anything that could be done to reduce the time?
> 
> 
> Kind regards,
> 
> Paul

Re: [PATCH v3 04/20] PCI/P2PDMA: introduce helpers for dma_map_sg implementations

2021-09-27 Thread Bjorn Helgaas
On Thu, Sep 16, 2021 at 05:40:44PM -0600, Logan Gunthorpe wrote:
> Add pci_p2pdma_map_segment() as a helper for simple dma_map_sg()
> implementations. It takes an scatterlist segment that must point to a
> pci_p2pdma struct page and will map it if the mapping requires a bus
> address.
> 
> The return value indicates whether the mapping required a bus address
> or whether the caller still needs to map the segment normally. If the
> segment should not be mapped, -EREMOTEIO is returned.
> 
> This helper uses a state structure to track the changes to the
> pgmap across calls and avoid needing to lookup into the xarray for
> every page.
> 
> Also add pci_p2pdma_map_bus_segment() which is useful for IOMMU
> dma_map_sg() implementations where the sg segment containing the page
> differs from the sg segment containing the DMA address.
> 
> Signed-off-by: Logan Gunthorpe 

Acked-by: Bjorn Helgaas 

Ditto.
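The state structure described in the commit message caches the result of the last lookup, so consecutive segments from the same `dev_pagemap` skip the expensive classification. The userspace sketch below models that memoization. It assumes hypothetical `pgmap_sketch` objects and uses a counter as a stand-in for the xarray lookup done by `pci_p2pdma_map_type()`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; not the kernel's types or enum values. */
enum map_type_sketch { MAP_UNKNOWN, MAP_BUS_ADDR, MAP_THRU_HOST_BRIDGE };

struct pgmap_sketch {
    enum map_type_sketch type;  /* what the slow lookup would return */
    uint64_t bus_offset;
};

struct map_state_sketch {
    const struct pgmap_sketch *pgmap;  /* zero-initialized before the loop */
    enum map_type_sketch map;
    uint64_t bus_off;
};

static unsigned int slow_lookups;  /* counts stand-in xarray lookups */

static enum map_type_sketch slow_map_type(const struct pgmap_sketch *pgmap)
{
    slow_lookups++;
    return pgmap->type;
}

/* Returns the mapping type; writes *dma_addr only for MAP_BUS_ADDR. */
static enum map_type_sketch map_segment_sketch(struct map_state_sketch *state,
                                               const struct pgmap_sketch *pgmap,
                                               uint64_t phys, uint64_t *dma_addr)
{
    /* Re-classify only when the segment comes from a different pgmap. */
    if (state->pgmap != pgmap) {
        state->pgmap = pgmap;
        state->map = slow_map_type(pgmap);
        state->bus_off = pgmap->bus_offset;
    }

    if (state->map == MAP_BUS_ADDR)
        *dma_addr = phys + state->bus_off;

    return state->map;
}
```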

> ---
>  drivers/pci/p2pdma.c   | 59 ++
>  include/linux/pci-p2pdma.h | 21 ++
>  2 files changed, 80 insertions(+)
> 
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> index b656d8c801a7..58c34f1f1473 100644
> --- a/drivers/pci/p2pdma.c
> +++ b/drivers/pci/p2pdma.c
> @@ -943,6 +943,65 @@ void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
>  }
>  EXPORT_SYMBOL_GPL(pci_p2pdma_unmap_sg_attrs);
>  
> +/**
> + * pci_p2pdma_map_segment - map an sg segment determining the mapping type
> + * @state: State structure that should be declared outside of the for_each_sg()
> + *   loop and initialized to zero.
> + * @dev: DMA device that's doing the mapping operation
> + * @sg: scatterlist segment to map
> + *
> + * This is a helper to be used by non-iommu dma_map_sg() implementations where
> + * the sg segment is the same for the page_link and the dma_address.

s/non-iommu/non-IOMMU/

> + *
> + * Attempt to map a single segment in an SGL with the PCI bus address.
> + * The segment must point to a PCI P2PDMA page and thus must be
> + * wrapped in a is_pci_p2pdma_page(sg_page(sg)) check.
> + *
> + * Returns the type of mapping used and maps the page if the type is
> + * PCI_P2PDMA_MAP_BUS_ADDR.
> + */
> +enum pci_p2pdma_map_type
> +pci_p2pdma_map_segment(struct pci_p2pdma_map_state *state, struct device *dev,
> +struct scatterlist *sg)
> +{
> + if (state->pgmap != sg_page(sg)->pgmap) {
> + state->pgmap = sg_page(sg)->pgmap;
> + state->map = pci_p2pdma_map_type(state->pgmap, dev);
> + state->bus_off = to_p2p_pgmap(state->pgmap)->bus_offset;
> + }
> +
> + if (state->map == PCI_P2PDMA_MAP_BUS_ADDR) {
> + sg->dma_address = sg_phys(sg) + state->bus_off;
> + sg_dma_len(sg) = sg->length;
> + sg_dma_mark_pci_p2pdma(sg);
> + }
> +
> + return state->map;
> +}
> +
> +/**
> + * pci_p2pdma_map_bus_segment - map an sg segment pre determined to
> + *   be mapped with PCI_P2PDMA_MAP_BUS_ADDR
> + * @pg_sg: scatterlist segment with the page to map
> + * @dma_sg: scatterlist segment to assign a dma address to

s/dma address/DMA address/, also below

> + *
> + * This is a helper for iommu dma_map_sg() implementations when the
> + * segment for the dma address differs from the segment containing the
> + * source page.
> + *
> + * pci_p2pdma_map_type() must have already been called on the pg_sg and
> + * returned PCI_P2PDMA_MAP_BUS_ADDR.
> + */
> +void pci_p2pdma_map_bus_segment(struct scatterlist *pg_sg,
> + struct scatterlist *dma_sg)
> +{
> + struct pci_p2pdma_pagemap *pgmap = to_p2p_pgmap(sg_page(pg_sg)->pgmap);
> +
> + dma_sg->dma_address = sg_phys(pg_sg) + pgmap->bus_offset;
> + sg_dma_len(dma_sg) = pg_sg->length;
> + sg_dma_mark_pci_p2pdma(dma_sg);
> +}
> +
>  /**
>   * pci_p2pdma_enable_store - parse a configfs/sysfs attribute store
>   *   to enable p2pdma
> diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h
> index caac2d023f8f..e5a8d5bc0f51 100644
> --- a/include/linux/pci-p2pdma.h
> +++ b/include/linux/pci-p2pdma.h
> @@ -13,6 +13,12 @@
>  
>  #include 
>  
> +struct pci_p2pdma_map_state {
> + struct dev_pagemap *pgmap;
> + int map;
> + u64 bus_off;
> +};
> +
>  struct block_device;
>  struct scatterlist;
>  
> @@ -70,6 +76,11 @@ int pci_p2pdma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
>   int nents, enum dma_data_direction dir, unsigned long attrs);
>  void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatter

Re: [PATCH v3 13/20] PCI/P2PDMA: remove pci_p2pdma_[un]map_sg()

2021-09-27 Thread Bjorn Helgaas
On Thu, Sep 16, 2021 at 05:40:53PM -0600, Logan Gunthorpe wrote:
> This interface is superseded by support in dma_map_sg() which now supports
> heterogeneous scatterlists. There are no longer any users, so remove it.
> 
> Signed-off-by: Logan Gunthorpe 

Acked-by: Bjorn Helgaas 

Ditto.
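To make "heterogeneous scatterlists" concrete, here is a hedged sketch of a dma_map_sg()-style loop that handles ordinary pages and P2PDMA pages in the same list. The demo_ names are hypothetical. The "normal" mapping is modeled as an identity mapping (roughly dma-direct without an IOMMU), and unlike a real implementation the sketch does not unmap earlier segments when a later one fails.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#ifndef EREMOTEIO
#define EREMOTEIO 121	/* Linux value; only defined here if libc lacks it */
#endif

enum demo_map_type {
	DEMO_MAP_UNKNOWN = 0,
	DEMO_MAP_NOT_SUPPORTED,
	DEMO_MAP_BUS_ADDR,
	DEMO_MAP_THRU_HOST_BRIDGE,
};

/* One segment; is_p2p == 0 means ordinary system memory. */
struct demo_seg {
	int is_p2p;
	enum demo_map_type p2p_type;	/* only meaningful when is_p2p != 0 */
	uint64_t bus_offset;		/* assumed: bus address - CPU physical address */
	uint64_t phys;
	uint64_t dma_address;
	int bus_mapped;			/* models sg_dma_mark_pci_p2pdma() */
};

/* Hypothetical "normal" mapping: identity, as with dma-direct and no IOMMU. */
static uint64_t demo_normal_map(uint64_t phys)
{
	return phys;
}

/* Returns n on success, -EREMOTEIO if any P2PDMA page cannot be mapped. */
static int demo_map_sg(struct demo_seg *sgl, int n)
{
	for (int i = 0; i < n; i++) {
		struct demo_seg *s = &sgl[i];

		if (!s->is_p2p) {
			s->dma_address = demo_normal_map(s->phys);
			continue;
		}

		switch (s->p2p_type) {
		case DEMO_MAP_BUS_ADDR:
			/* Peer reachable without the host bridge: use the bus address. */
			s->dma_address = s->phys + s->bus_offset;
			s->bus_mapped = 1;
			break;
		case DEMO_MAP_THRU_HOST_BRIDGE:
			/* Traverses a whitelisted host bridge: map it normally. */
			s->dma_address = demo_normal_map(s->phys);
			break;
		default:
			/* NOT_SUPPORTED: fail the whole list (no cleanup in this sketch). */
			return -EREMOTEIO;
		}
	}
	return n;
}
```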

> ---
>  drivers/pci/p2pdma.c   | 65 --
>  include/linux/pci-p2pdma.h | 27 
>  2 files changed, 92 deletions(-)
> 
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> index 58c34f1f1473..4478633346bd 100644
> --- a/drivers/pci/p2pdma.c
> +++ b/drivers/pci/p2pdma.c
> @@ -878,71 +878,6 @@ enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
>   return type;
>  }
>  
> -static int __pci_p2pdma_map_sg(struct pci_p2pdma_pagemap *p2p_pgmap,
> - struct device *dev, struct scatterlist *sg, int nents)
> -{
> - struct scatterlist *s;
> - int i;
> -
> - for_each_sg(sg, s, nents, i) {
> - s->dma_address = sg_phys(s) - p2p_pgmap->bus_offset;
> - sg_dma_len(s) = s->length;
> - }
> -
> - return nents;
> -}
> -
> -/**
> - * pci_p2pdma_map_sg_attrs - map a PCI peer-to-peer scatterlist for DMA
> - * @dev: device doing the DMA request
> - * @sg: scatter list to map
> - * @nents: elements in the scatterlist
> - * @dir: DMA direction
> - * @attrs: DMA attributes passed to dma_map_sg() (if called)
> - *
> - * Scatterlists mapped with this function should be unmapped using
> - * pci_p2pdma_unmap_sg_attrs().
> - *
> - * Returns the number of SG entries mapped or 0 on error.
> - */
> -int pci_p2pdma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
> - int nents, enum dma_data_direction dir, unsigned long attrs)
> -{
> - struct pci_p2pdma_pagemap *p2p_pgmap =
> - to_p2p_pgmap(sg_page(sg)->pgmap);
> -
> - switch (pci_p2pdma_map_type(sg_page(sg)->pgmap, dev)) {
> - case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
> - return dma_map_sg_attrs(dev, sg, nents, dir, attrs);
> - case PCI_P2PDMA_MAP_BUS_ADDR:
> - return __pci_p2pdma_map_sg(p2p_pgmap, dev, sg, nents);
> - default:
> - return 0;
> - }
> -}
> -EXPORT_SYMBOL_GPL(pci_p2pdma_map_sg_attrs);
> -
> -/**
> - * pci_p2pdma_unmap_sg_attrs - unmap a PCI peer-to-peer scatterlist that was
> - *   mapped with pci_p2pdma_map_sg()
> - * @dev: device doing the DMA request
> - * @sg: scatter list to map
> - * @nents: number of elements returned by pci_p2pdma_map_sg()
> - * @dir: DMA direction
> - * @attrs: DMA attributes passed to dma_unmap_sg() (if called)
> - */
> -void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
> - int nents, enum dma_data_direction dir, unsigned long attrs)
> -{
> - enum pci_p2pdma_map_type map_type;
> -
> - map_type = pci_p2pdma_map_type(sg_page(sg)->pgmap, dev);
> -
> - if (map_type == PCI_P2PDMA_MAP_THRU_HOST_BRIDGE)
> - dma_unmap_sg_attrs(dev, sg, nents, dir, attrs);
> -}
> -EXPORT_SYMBOL_GPL(pci_p2pdma_unmap_sg_attrs);
> -
>  /**
>   * pci_p2pdma_map_segment - map an sg segment determining the mapping type
>   * @state: State structure that should be declared outside of the for_each_sg()
> diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h
> index e5a8d5bc0f51..0c33a40a86e7 100644
> --- a/include/linux/pci-p2pdma.h
> +++ b/include/linux/pci-p2pdma.h
> @@ -72,10 +72,6 @@ void pci_p2pmem_free_sgl(struct pci_dev *pdev, struct scatterlist *sgl);
>  void pci_p2pmem_publish(struct pci_dev *pdev, bool publish);
>  enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
>struct device *dev);
> -int pci_p2pdma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
> - int nents, enum dma_data_direction dir, unsigned long attrs);
> -void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
> - int nents, enum dma_data_direction dir, unsigned long attrs);
>  enum pci_p2pdma_map_type
>  pci_p2pdma_map_segment(struct pci_p2pdma_map_state *state, struct device *dev,
>  struct scatterlist *sg);
> @@ -135,17 +131,6 @@ pci_p2pdma_map_type(struct dev_pagemap *pgmap, struct device *dev)
>  {
>   return PCI_P2PDMA_MAP_NOT_SUPPORTED;
>  }
> -static inline int pci_p2pdma_map_sg_attrs(struct device *dev,
> - struct scatterlist *sg, int nents, enum dma_data_direction dir,
> - unsigned long attrs)
> -{
> - return 0;
> -}
> -static inline void pci_p2pdma_unmap_sg_attrs(struct

Re: [PATCH v3 02/20] PCI/P2PDMA: attempt to set map_type if it has not been set

2021-09-27 Thread Bjorn Helgaas
On Thu, Sep 16, 2021 at 05:40:42PM -0600, Logan Gunthorpe wrote:
> Attempt to find the mapping type for P2PDMA pages on the first
> DMA map attempt if it has not been done ahead of time.
> 
> Previously, the mapping type was expected to be calculated ahead of
> time, but if pages are to come from userspace then there's no
> way to ensure the path was checked ahead of time.
> 
> With this change it's no longer invalid to call pci_p2pdma_map_sg()
> before the mapping type is calculated so drop the WARN_ON when that
> is the case.
> 
> Signed-off-by: Logan Gunthorpe 

Acked-by: Bjorn Helgaas 

Capitalize subject line.
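A small sketch of the lazy lookup this patch adds: a zero (UNKNOWN) cache slot means the type was never computed, so the first map attempt computes it. The demo_ names are hypothetical and the topology walk is a stub. The sketch also stores the computed type so later calls hit the cache; that is an assumption about calc_map_type_and_dist(), not something shown in the diff.

```c
#include <assert.h>

enum demo_map_type {
	DEMO_MAP_UNKNOWN = 0,	/* zero-filled slots mean "not computed yet" */
	DEMO_MAP_NOT_SUPPORTED,
	DEMO_MAP_BUS_ADDR,
	DEMO_MAP_THRU_HOST_BRIDGE,
};

#define DEMO_MAX_CLIENTS 8

/* Hypothetical provider: cached map types indexed by client number. */
struct demo_provider {
	enum demo_map_type cache[DEMO_MAX_CLIENTS];
	int calc_calls;		/* how often the expensive path ran */
};

/* Stub for the topology walk: pretend even-numbered clients share a switch. */
static enum demo_map_type demo_calc_map_type(struct demo_provider *p, int client)
{
	p->calc_calls++;
	return (client % 2 == 0) ? DEMO_MAP_BUS_ADDR : DEMO_MAP_THRU_HOST_BRIDGE;
}

static enum demo_map_type demo_map_type(struct demo_provider *p, int client)
{
	enum demo_map_type type = p->cache[client];

	/* Not computed ahead of time (e.g. pages from userspace): compute now. */
	if (type == DEMO_MAP_UNKNOWN) {
		type = demo_calc_map_type(p, client);
		p->cache[client] = type;	/* remembered for later calls */
	}
	return type;
}
```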

> ---
>  drivers/pci/p2pdma.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> index 50cdde3e9a8b..1192c465ba6d 100644
> --- a/drivers/pci/p2pdma.c
> +++ b/drivers/pci/p2pdma.c
> @@ -848,6 +848,7 @@ static enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
>   struct pci_dev *provider = to_p2p_pgmap(pgmap)->provider;
>   struct pci_dev *client;
>   struct pci_p2pdma *p2pdma;
> + int dist;
>  
>   if (!provider->p2pdma)
>   return PCI_P2PDMA_MAP_NOT_SUPPORTED;
> @@ -864,6 +865,10 @@ static enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
>   type = xa_to_value(xa_load(>map_types,
>  map_types_idx(client)));
>   rcu_read_unlock();
> +
> + if (type == PCI_P2PDMA_MAP_UNKNOWN)
> + return calc_map_type_and_dist(provider, client, , false);
> +
>   return type;
>  }
>  
> @@ -906,7 +911,6 @@ int pci_p2pdma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
>   case PCI_P2PDMA_MAP_BUS_ADDR:
>   return __pci_p2pdma_map_sg(p2p_pgmap, dev, sg, nents);
>   default:
> - WARN_ON_ONCE(1);
>   return 0;
>   }
>  }
> -- 
> 2.30.2
> 


Re: [PATCH v3 19/20] PCI/P2PDMA: introduce pci_mmap_p2pmem()

2021-09-27 Thread Bjorn Helgaas
On Thu, Sep 16, 2021 at 05:40:59PM -0600, Logan Gunthorpe wrote:
> Introduce pci_mmap_p2pmem() which is a helper to allocate and mmap
> a hunk of p2pmem into userspace.
> 
> Pages are allocated from the genalloc in bulk and their reference count
> incremented. They are returned to the genalloc when the page is put.
> 
> The VMA does not take a reference to the pages when they are inserted
> with vmf_insert_mixed() (which is necessary for zone device pages) so
> the backing P2P memory is stored in a structures in vm_private_data.
> 
> A pseudo mount is used to allocate an inode for each PCI device. The
> inode's address_space is used in the file doing the mmap so that all
> VMAs are collected and can be unmapped if the PCI device is unbound.
> After unmapping, the VMAs are iterated through and their pages are
> put so the device can continue to be unbound. An active flag is used
> to signal to VMAs not to allocate any further P2P memory once the
> removal process starts. The flag is synchronized with concurrent
> access with an RCU lock.
> 
> The VMAs and inode will survive after the unbind of the device, but no
> pages will be present in the VMA and a subsequent access will result
> in a SIGBUS error.
> 
> Signed-off-by: Logan Gunthorpe 

Acked-by: Bjorn Helgaas 

I would capitalize "Introduce" in the subject line.
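The lifetime rules in the commit log (pages go back to the pool on the last put, and no new allocations happen once removal starts) can be modeled with a plain reference count and an active flag. This is a single-threaded sketch with hypothetical demo_ names. The patch itself uses struct kref, genalloc and RCU, none of which are modeled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical pool standing in for the genalloc pool of P2P memory. */
struct demo_pool {
	size_t free_chunks;
	bool active;		/* cleared when unbind/removal starts */
};

/* Hypothetical mapping object; "ref" plays the role of struct kref. */
struct demo_map {
	int ref;
	struct demo_pool *pool;
};

/* Take one chunk for a new mapping, unless teardown has begun. */
static struct demo_map *demo_map_alloc(struct demo_pool *pool)
{
	struct demo_map *m;

	if (!pool->active || pool->free_chunks == 0)
		return NULL;	/* caller reports failure (e.g. SIGBUS on fault) */

	m = malloc(sizeof(*m));
	if (!m)
		return NULL;
	pool->free_chunks--;
	m->ref = 1;
	m->pool = pool;
	return m;
}

static void demo_map_get(struct demo_map *m)
{
	m->ref++;
}

/* Drop a reference; the last put returns the chunk to the pool. */
static void demo_map_put(struct demo_map *m)
{
	if (--m->ref == 0) {
		m->pool->free_chunks++;
		free(m);
	}
}
```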

> ---
>  drivers/pci/p2pdma.c   | 263 -
>  include/linux/pci-p2pdma.h |  11 ++
>  include/uapi/linux/magic.h |   1 +
>  3 files changed, 273 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> index 2422af5a529c..a5adf57af53a 100644
> --- a/drivers/pci/p2pdma.c
> +++ b/drivers/pci/p2pdma.c
> @@ -16,14 +16,19 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  struct pci_p2pdma {
>   struct gen_pool *pool;
>   bool p2pmem_published;
>   struct xarray map_types;
> + struct inode *inode;
> + bool active;
>  };
>  
>  struct pci_p2pdma_pagemap {
> @@ -32,6 +37,14 @@ struct pci_p2pdma_pagemap {
>   u64 bus_offset;
>  };
>  
> +struct pci_p2pdma_map {
> + struct kref ref;
> + struct pci_dev *pdev;
> + struct inode *inode;
> + void *kaddr;
> + size_t len;
> +};
> +
>  static struct pci_p2pdma_pagemap *to_p2p_pgmap(struct dev_pagemap *pgmap)
>  {
>   return container_of(pgmap, struct pci_p2pdma_pagemap, pgmap);
> @@ -100,6 +113,26 @@ static const struct attribute_group p2pmem_group = {
>   .name = "p2pmem",
>  };
>  
> +/*
> + * P2PDMA internal mount
> + * Fake an internal VFS mount-point in order to allocate struct address_space
> + * mappings to remove VMAs on unbind events.
> + */
> +static int pci_p2pdma_fs_cnt;
> +static struct vfsmount *pci_p2pdma_fs_mnt;
> +
> +static int pci_p2pdma_fs_init_fs_context(struct fs_context *fc)
> +{
> + return init_pseudo(fc, P2PDMA_MAGIC) ? 0 : -ENOMEM;
> +}
> +
> +static struct file_system_type pci_p2pdma_fs_type = {
> + .name = "p2dma",
> + .owner = THIS_MODULE,
> + .init_fs_context = pci_p2pdma_fs_init_fs_context,
> + .kill_sb = kill_anon_super,
> +};
> +
>  static void p2pdma_page_free(struct page *page)
>  {
>   struct pci_p2pdma_pagemap *pgmap = to_p2p_pgmap(page->pgmap);
> @@ -128,6 +161,9 @@ static void pci_p2pdma_release(void *data)
>   gen_pool_destroy(p2pdma->pool);
>   sysfs_remove_group(>dev.kobj, _group);
>   xa_destroy(>map_types);
> +
> + iput(p2pdma->inode);
> + simple_release_fs(_p2pdma_fs_mnt, _p2pdma_fs_cnt);
>  }
>  
>  static int pci_p2pdma_setup(struct pci_dev *pdev)
> @@ -145,17 +181,32 @@ static int pci_p2pdma_setup(struct pci_dev *pdev)
>   if (!p2p->pool)
>   goto out;
>  
> - error = devm_add_action_or_reset(>dev, pci_p2pdma_release, pdev);
> + error = simple_pin_fs(_p2pdma_fs_type, _p2pdma_fs_mnt,
> +   _p2pdma_fs_cnt);
>   if (error)
>   goto out_pool_destroy;
>  
> + p2p->inode = alloc_anon_inode(pci_p2pdma_fs_mnt->mnt_sb);
> + if (IS_ERR(p2p->inode)) {
> + error = -ENOMEM;
> + goto out_unpin_fs;
> + }
> +
> + error = devm_add_action_or_reset(>dev, pci_p2pdma_release, pdev);
> + if (error)
> + goto out_put_inode;
> +
>   error = sysfs_create_group(>dev.kobj, _group);
>   if (error)
> - goto out_pool_destroy;
> + goto out_put_inode;
>  
>   rcu_assign_pointer(pdev->p2p

Re: [PATCH v3 03/20] PCI/P2PDMA: make pci_p2pdma_map_type() non-static

2021-09-27 Thread Bjorn Helgaas
On Thu, Sep 16, 2021 at 05:40:43PM -0600, Logan Gunthorpe wrote:
> pci_p2pdma_map_type() will be needed by the dma-iommu map_sg
> implementation because it will need to determine the mapping type
> ahead of actually doing the mapping to create the actual iommu mapping.

I don't expect this to go via the PCI tree, but if it did I would
silently:

  s/PCI/P2PDMA: make pci_p2pdma_map_type() non-static/
    PCI/P2PDMA: Expose pci_p2pdma_map_type()/
  s/iommu/IOMMU/

and mention what this patch does in the commit log (in addition to the
subject) and fix a couple minor typos below.

> Signed-off-by: Logan Gunthorpe 

Acked-by: Bjorn Helgaas 

> ---
>  drivers/pci/p2pdma.c   | 24 +-
>  include/linux/pci-p2pdma.h | 41 ++
>  2 files changed, 56 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> index 1192c465ba6d..b656d8c801a7 100644
> --- a/drivers/pci/p2pdma.c
> +++ b/drivers/pci/p2pdma.c
> @@ -20,13 +20,6 @@
>  #include 
>  #include 
>  
> -enum pci_p2pdma_map_type {
> - PCI_P2PDMA_MAP_UNKNOWN = 0,
> - PCI_P2PDMA_MAP_NOT_SUPPORTED,
> - PCI_P2PDMA_MAP_BUS_ADDR,
> - PCI_P2PDMA_MAP_THRU_HOST_BRIDGE,
> -};
> -
>  struct pci_p2pdma {
>   struct gen_pool *pool;
>   bool p2pmem_published;
> @@ -841,8 +834,21 @@ void pci_p2pmem_publish(struct pci_dev *pdev, bool publish)
>  }
>  EXPORT_SYMBOL_GPL(pci_p2pmem_publish);
>  
> -static enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
> - struct device *dev)
> +/**
> + * pci_p2pdma_map_type - return the type of mapping that should be used for
> + *   a given device and pgmap
> + * @pgmap: the pagemap of a page to determine the mapping type for
> + * @dev: device that is mapping the page
> + *
> + * Returns one of:
> + *   PCI_P2PDMA_MAP_NOT_SUPPORTED - The mapping should not be done
> + *   PCI_P2PDMA_MAP_BUS_ADDR - The mapping should use the PCI bus address
> + *   PCI_P2PDMA_MAP_THRU_HOST_BRIDGE - The mapping should be done normally
> + *   using the CPU physical address (in dma-direct) or an IOVA
> + *   mapping for the IOMMU.
> + */
> +enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
> +  struct device *dev)
>  {
>   enum pci_p2pdma_map_type type = PCI_P2PDMA_MAP_NOT_SUPPORTED;
>   struct pci_dev *provider = to_p2p_pgmap(pgmap)->provider;
> diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h
> index 8318a97c9c61..caac2d023f8f 100644
> --- a/include/linux/pci-p2pdma.h
> +++ b/include/linux/pci-p2pdma.h
> @@ -16,6 +16,40 @@
>  struct block_device;
>  struct scatterlist;
>  
> +enum pci_p2pdma_map_type {
> + /*
> +  * PCI_P2PDMA_MAP_UNKNOWN: Used internally for indicating the mapping
> +  * type hasn't been calculated yet. Functions that return this enum
> +  * never return this value.
> +  */
> + PCI_P2PDMA_MAP_UNKNOWN = 0,
> +
> + /*
> +  * PCI_P2PDMA_MAP_NOT_SUPPORTED: Indicates the transaction will
> +  * traverse the host bridge and the host bridge is not in the
> +  * whitelist. DMA Mapping routines should return an error when
> +  * this is returned.
> +  */
> + PCI_P2PDMA_MAP_NOT_SUPPORTED,
> +
> + /*
> +  * PCI_P2PDMA_BUS_ADDR: Indicates that two devices can talk to
> +  * eachother directly through a PCI switch and the transaction will
> +  * not traverse the host bridge. Such a mapping should program
> +  * the DMA engine with PCI bus addresses.

s/eachother/each other/

> +  */
> + PCI_P2PDMA_MAP_BUS_ADDR,
> +
> + /*
> +  * PCI_P2PDMA_MAP_THRU_HOST_BRIDGE: Indicates two devices can talk
> +  * to eachother, but the transaction traverses a host bridge on the
> +  * whitelist. In this case, a normal mapping either with CPU physical
> +  * addresses (in the case of dma-direct) or IOVA addresses (in the
> +  * case of IOMMUs) should be used to program the DMA engine.

s/eachother/each other/

> +  */
> + PCI_P2PDMA_MAP_THRU_HOST_BRIDGE,
> +};
> +
>  #ifdef CONFIG_PCI_P2PDMA
>  int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
>   u64 offset);
> @@ -30,6 +64,8 @@ struct scatterlist *pci_p2pmem_alloc_sgl(struct pci_dev *pdev,
>unsigned int *nents, u32 length);
>  void pci_p2pmem_free_sgl(struct pci_dev *pdev, struct scatterlist *sgl);
>  void pci_p2pmem_publish(struct pci_dev *pdev, bool publish);
> +enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,

Re: [PATCH v2 6/9] PCI: Add pci_find_dvsec_capability to find designated VSEC

2021-09-27 Thread Bjorn Helgaas
s/pci_find_dvsec_capability/pci_find_dvsec_capability()/ in subject
and commit log.

On Thu, Sep 23, 2021 at 10:26:44AM -0700, Ben Widawsky wrote:
> Add pci_find_dvsec_capability to locate a Designated Vendor-Specific
> Extended Capability with the specified DVSEC ID.

"specified Vendor ID and Capability ID".

> The Designated Vendor-Specific Extended Capability (DVSEC) allows one or
> more vendor specific capabilities that aren't tied to the vendor ID of
> the PCI component.
> 
> DVSEC is critical for both the Compute Express Link (CXL) driver as well
> as the driver for OpenCAPI coherent accelerator (OCXL).

Strictly speaking, not really relevant for the commit log.

> Cc: David E. Box 
> Cc: Jonathan Cameron 
> Cc: Bjorn Helgaas 
> Cc: Dan Williams 
> Cc: linux-...@vger.kernel.org
> Cc: linuxppc-...@lists.ozlabs.org
> Cc: Andrew Donnellan 
> Cc: Lu Baolu 
> Reviewed-by: Frederic Barrat 
> Signed-off-by: Ben Widawsky 

If you want to merge this with the series,

Acked-by: Bjorn Helgaas 

Or if you want me to merge this on a branch, let me know.
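For readers unfamiliar with DVSEC, the lookup walks the Extended Capability list and compares two header fields of every DVSEC entry. The sketch below does the same over a simulated 4 KB config space. The helper names are hypothetical; the constants follow the PCIe layout that pci_regs.h encodes (DVSEC capability ID 0x23, Vendor ID in DVSEC Header 1 at offset 0x4, DVSEC ID in DVSEC Header 2 at offset 0x8). The walk is a simplified mirror of the start/next logic in pci_find_next_ext_capability().

```c
#include <assert.h>
#include <stdint.h>

#define CFG_SIZE		4096
#define EXT_CAP_START		0x100
#define EXT_CAP_ID_DVSEC	0x23	/* same value as PCI_EXT_CAP_ID_DVSEC */
#define DVSEC_HEADER1		0x4	/* Vendor ID in bits 15:0 */
#define DVSEC_HEADER2		0x8	/* DVSEC ID in bits 15:0 */

/* Config space is little-endian. */
static uint32_t cfg_read32(const uint8_t *cfg, unsigned int off)
{
	assert(off + 4 <= CFG_SIZE);
	return (uint32_t)cfg[off] | (uint32_t)cfg[off + 1] << 8 |
	       (uint32_t)cfg[off + 2] << 16 | (uint32_t)cfg[off + 3] << 24;
}

static uint16_t cfg_read16(const uint8_t *cfg, unsigned int off)
{
	return (uint16_t)(cfg_read32(cfg, off) & 0xffff);
}

static void cfg_write32(uint8_t *cfg, unsigned int off, uint32_t val)
{
	for (int i = 0; i < 4; i++)
		cfg[off + i] = (uint8_t)(val >> (8 * i));
}

/* Next extended capability with ID cap after start (start == 0: from 0x100). */
static unsigned int find_next_ext_cap(const uint8_t *cfg, unsigned int start,
				      uint16_t cap)
{
	unsigned int pos = start ? start : EXT_CAP_START;
	int ttl = (CFG_SIZE - EXT_CAP_START) / 8;	/* guard against loops */
	uint32_t header = cfg_read32(cfg, pos);

	if (header == 0)
		return 0;

	while (ttl-- > 0) {
		if ((header & 0xffff) == cap && pos != start)
			return pos;
		pos = (header >> 20) & 0xffc;	/* next-capability pointer */
		if (pos < EXT_CAP_START)
			break;
		header = cfg_read32(cfg, pos);
	}
	return 0;
}

/* Walk every DVSEC and match both Vendor ID and DVSEC ID. */
static unsigned int find_dvsec(const uint8_t *cfg, uint16_t vendor, uint16_t dvsec)
{
	unsigned int pos = find_next_ext_cap(cfg, 0, EXT_CAP_ID_DVSEC);

	while (pos) {
		if (cfg_read16(cfg, pos + DVSEC_HEADER1) == vendor &&
		    cfg_read16(cfg, pos + DVSEC_HEADER2) == dvsec)
			return pos;
		pos = find_next_ext_cap(cfg, pos, EXT_CAP_ID_DVSEC);
	}
	return 0;
}
```

With two DVSECs from the same vendor at 0x100 (DVSEC ID 0) and 0x140 (DVSEC ID 3), asking for ID 3 skips the first entry, which is why both IDs have to be compared.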

> ---
>  drivers/pci/pci.c   | 32 
>  include/linux/pci.h |  1 +
>  2 files changed, 33 insertions(+)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index ce2ab62b64cf..94ac86ff28b0 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -732,6 +732,38 @@ u16 pci_find_vsec_capability(struct pci_dev *dev, u16 vendor, int cap)
>  }
>  EXPORT_SYMBOL_GPL(pci_find_vsec_capability);
>  
> +/**
> + * pci_find_dvsec_capability - Find DVSEC for vendor
> + * @dev: PCI device to query
> + * @vendor: Vendor ID to match for the DVSEC
> + * @dvsec: Designated Vendor-specific capability ID
> + *
> + * If DVSEC has Vendor ID @vendor and DVSEC ID @dvsec return the capability
> + * offset in config space; otherwise return 0.
> + */
> +u16 pci_find_dvsec_capability(struct pci_dev *dev, u16 vendor, u16 dvsec)
> +{
> + int pos;
> +
> + pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DVSEC);
> + if (!pos)
> + return 0;
> +
> + while (pos) {
> + u16 v, id;
> +
> + pci_read_config_word(dev, pos + PCI_DVSEC_HEADER1, );
> + pci_read_config_word(dev, pos + PCI_DVSEC_HEADER2, );
> + if (vendor == v && dvsec == id)
> + return pos;
> +
> + pos = pci_find_next_ext_capability(dev, pos, PCI_EXT_CAP_ID_DVSEC);
> + }
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(pci_find_dvsec_capability);
> +
>  /**
>   * pci_find_parent_resource - return resource region of parent bus of given
>   * region
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index cd8aa6fce204..c93ccfa4571b 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1130,6 +1130,7 @@ u16 pci_find_ext_capability(struct pci_dev *dev, int cap);
>  u16 pci_find_next_ext_capability(struct pci_dev *dev, u16 pos, int cap);
>  struct pci_bus *pci_find_next_bus(const struct pci_bus *from);
>  u16 pci_find_vsec_capability(struct pci_dev *dev, u16 vendor, int cap);
> +u16 pci_find_dvsec_capability(struct pci_dev *dev, u16 vendor, u16 dvsec);
>  
>  u64 pci_get_dsn(struct pci_dev *dev);
>  
> -- 
> 2.33.0
> 


Re: [PATCH v2 2/4] PCI: only build xen-pcifront in PV-enabled environments

2021-09-17 Thread Bjorn Helgaas
s/only/Only/ in subject

On Fri, Sep 17, 2021 at 12:48:03PM +0200, Jan Beulich wrote:
> The driver's module init function, pcifront_init(), invokes
> xen_pv_domain() first thing. That construct produces constant "false"
> when !CONFIG_XEN_PV. Hence there's no point building the driver in
> non-PV configurations.

Thanks for these bread crumbs.  xen_domain_type is set to
XEN_PV_DOMAIN only by xen_start_kernel() in enlighten_pv.c, which is
only built when CONFIG_XEN_PV=y, so even I can verify this :)
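Here is a simplified model of the argument. When PV support is configured out, the predicate is a compile-time 0, so the init function can only return -ENODEV and building the driver gains nothing. DEMO_CONFIG_XEN_PV and the demo_ names are hypothetical; the real definitions live in include/xen/xen.h and differ in detail.

```c
#include <assert.h>
#include <errno.h>

/* DEMO_CONFIG_XEN_PV is a hypothetical stand-in for CONFIG_XEN_PV. */
#ifdef DEMO_CONFIG_XEN_PV
enum demo_domain_type { DEMO_NATIVE, DEMO_PV_DOMAIN, DEMO_HVM_DOMAIN };
static enum demo_domain_type demo_domain_type = DEMO_NATIVE;	/* set at boot */
#define demo_xen_pv_domain() (demo_domain_type == DEMO_PV_DOMAIN)
#else
#define demo_xen_pv_domain() 0	/* compile-time constant "false" */
#endif

static int demo_pcifront_init(void)
{
	/* Always taken when PV support is configured out. */
	if (!demo_xen_pv_domain())
		return -ENODEV;

	/* ... frontend registration would go here ... */
	return 0;
}
```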

> Drop the (now implicit and generally wrong) X86 dependency: At present,
> XEN_PV con only be set when X86 is also enabled. In general an
> architecture supporting Xen PV (and PCI) would want to have this driver
> built.

s/con only/can only/

> Signed-off-by: Jan Beulich 
> Reviewed-by: Stefano Stabellini 

Acked-by: Bjorn Helgaas 

> ---
> v2: Title and description redone.
> 
> --- a/drivers/pci/Kconfig
> +++ b/drivers/pci/Kconfig
> @@ -110,7 +110,7 @@ config PCI_PF_STUB
>  
>  config XEN_PCIDEV_FRONTEND
>   tristate "Xen PCI Frontend"
> - depends on X86 && XEN
> + depends on XEN_PV
>   select PCI_XEN
>   select XEN_XENBUS_FRONTEND
>   default y
> 


[PATCH] iommu/vt-d: Drop "0x" prefix from PCI bus & device addresses

2021-09-03 Thread Bjorn Helgaas
From: Bjorn Helgaas 

719a19335692 ("iommu/vt-d: Tweak the description of a DMA fault") changed
the DMA fault reason from hex to decimal.  It also added "0x" prefixes to
the PCI bus/device, e.g.,

  - DMAR: [INTR-REMAP] Request device [00:00.5]
  + DMAR: [INTR-REMAP] Request device [0x00:0x00.5]

These no longer match dev_printk() and other similar messages in
dmar_match_pci_path() and dmar_acpi_insert_dev_scope().

Drop the "0x" prefixes from the bus and device addresses.

Fixes: 719a19335692 ("iommu/vt-d: Tweak the description of a DMA fault")
Signed-off-by: Bjorn Helgaas 
---
 drivers/iommu/intel/dmar.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index d66f79acd14d..8647a355dad0 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -1944,18 +1944,18 @@ static int dmar_fault_do_one(struct intel_iommu *iommu, int type,
reason = dmar_get_fault_reason(fault_reason, _type);
 
if (fault_type == INTR_REMAP)
-   pr_err("[INTR-REMAP] Request device [0x%02x:0x%02x.%d] fault index 0x%llx [fault reason 0x%02x] %s\n",
+   pr_err("[INTR-REMAP] Request device [%02x:%02x.%d] fault index 0x%llx [fault reason 0x%02x] %s\n",
   source_id >> 8, PCI_SLOT(source_id & 0xFF),
   PCI_FUNC(source_id & 0xFF), addr >> 48,
   fault_reason, reason);
else if (pasid == INVALID_IOASID)
-   pr_err("[%s NO_PASID] Request device [0x%02x:0x%02x.%d] fault addr 0x%llx [fault reason 0x%02x] %s\n",
+   pr_err("[%s NO_PASID] Request device [%02x:%02x.%d] fault addr 0x%llx [fault reason 0x%02x] %s\n",
   type ? "DMA Read" : "DMA Write",
   source_id >> 8, PCI_SLOT(source_id & 0xFF),
   PCI_FUNC(source_id & 0xFF), addr,
   fault_reason, reason);
else
-   pr_err("[%s PASID 0x%x] Request device [0x%02x:0x%02x.%d] fault addr 0x%llx [fault reason 0x%02x] %s\n",
+   pr_err("[%s PASID 0x%x] Request device [%02x:%02x.%d] fault addr 0x%llx [fault reason 0x%02x] %s\n",
   type ? "DMA Read" : "DMA Write", pasid,
   source_id >> 8, PCI_SLOT(source_id & 0xFF),
   PCI_FUNC(source_id & 0xFF), addr,
-- 
2.25.1
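A quick way to check the new format is to decode a 16-bit source ID (bus in bits 15:8, devfn in bits 7:0) with the same PCI_SLOT()/PCI_FUNC() definitions the kernel uses and print it with the patched format string. format_source_id() is a hypothetical helper written only for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>	/* strcmp(), for checking the formatted output */

/* Same definitions as the kernel's PCI_SLOT()/PCI_FUNC() macros. */
#define PCI_SLOT(devfn)	(((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn)	((devfn) & 0x07)

/* Format "bus:dev.fn" without "0x" prefixes, as the fixed messages do. */
static void format_source_id(char *buf, size_t len, uint16_t source_id)
{
	snprintf(buf, len, "%02x:%02x.%d",
		 (unsigned int)(source_id >> 8),
		 (unsigned int)PCI_SLOT(source_id & 0xFF),
		 PCI_FUNC(source_id & 0xFF));
}
```

For example, source ID 0x0005 prints as 00:00.5 and 0xd7b9 as d7:17.1.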



Re: [PATCH v4] iommu/of: Fix pci_request_acs() before enumerating PCI devices

2021-09-02 Thread Bjorn Helgaas
[+cc Marek, Anders, Robin]

On Fri, Aug 20, 2021 at 02:57:12PM -0500, Bjorn Helgaas wrote:
> On Fri, May 21, 2021 at 03:03:24AM +, Wang Xingang wrote:
> > From: Xingang Wang 
> > 
> > When booting with devicetree, the pci_request_acs() is called after the
> > enumeration and initialization of PCI devices, thus the ACS is not
> > enabled. And ACS should be enabled when IOMMU is detected for the
> > PCI host bridge, so add check for IOMMU before probe of PCI host and call
> > pci_request_acs() to make sure ACS will be enabled when enumerating PCI
> > devices.
> > 
> > Fixes: 6bf6c24720d33 ("iommu/of: Request ACS from the PCI core when
> > configuring IOMMU linkage")
> > Signed-off-by: Xingang Wang 
> 
> Applied to pci/virtualization for v5.15, thanks!

I dropped this for now, until the problems reported by Marek and
Anders get sorted out.

> > ---
> >  drivers/iommu/of_iommu.c | 1 -
> >  drivers/pci/of.c | 8 +++-
> >  2 files changed, 7 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
> > index a9d2df001149..54a14da242cc 100644
> > --- a/drivers/iommu/of_iommu.c
> > +++ b/drivers/iommu/of_iommu.c
> > @@ -205,7 +205,6 @@ const struct iommu_ops *of_iommu_configure(struct device *dev,
> > .np = master_np,
> > };
> >  
> > -   pci_request_acs();
> > err = pci_for_each_dma_alias(to_pci_dev(dev),
> >  of_pci_iommu_init, );
> > } else {
> > diff --git a/drivers/pci/of.c b/drivers/pci/of.c
> > index da5b414d585a..2313c3f848b0 100644
> > --- a/drivers/pci/of.c
> > +++ b/drivers/pci/of.c
> > @@ -581,9 +581,15 @@ static int pci_parse_request_of_pci_ranges(struct device *dev,
> >  
> >  int devm_of_pci_bridge_init(struct device *dev, struct pci_host_bridge *bridge)
> >  {
> > -   if (!dev->of_node)
> > +   struct device_node *node = dev->of_node;
> > +
> > +   if (!node)
> > return 0;
> >  
> > +   /* Detect IOMMU and make sure ACS will be enabled */
> > +   if (of_property_read_bool(node, "iommu-map"))
> > +   pci_request_acs();
> > +
> > bridge->swizzle_irq = pci_common_swizzle;
> > bridge->map_irq = of_irq_parse_and_map_pci;
> >  
> > -- 
> > 2.19.1
> > 


Re: [PATCH v4] iommu/of: Fix pci_request_acs() before enumerating PCI devices

2021-08-20 Thread Bjorn Helgaas
On Fri, May 21, 2021 at 03:03:24AM +, Wang Xingang wrote:
> From: Xingang Wang 
> 
> When booting with devicetree, the pci_request_acs() is called after the
> enumeration and initialization of PCI devices, thus the ACS is not
> enabled. And ACS should be enabled when IOMMU is detected for the
> PCI host bridge, so add check for IOMMU before probe of PCI host and call
> pci_request_acs() to make sure ACS will be enabled when enumerating PCI
> devices.
> 
> Fixes: 6bf6c24720d33 ("iommu/of: Request ACS from the PCI core when
> configuring IOMMU linkage")
> Signed-off-by: Xingang Wang 

Applied to pci/virtualization for v5.15, thanks!

> ---
>  drivers/iommu/of_iommu.c | 1 -
>  drivers/pci/of.c | 8 +++-
>  2 files changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
> index a9d2df001149..54a14da242cc 100644
> --- a/drivers/iommu/of_iommu.c
> +++ b/drivers/iommu/of_iommu.c
> @@ -205,7 +205,6 @@ const struct iommu_ops *of_iommu_configure(struct device *dev,
>   .np = master_np,
>   };
>  
> - pci_request_acs();
>   err = pci_for_each_dma_alias(to_pci_dev(dev),
>of_pci_iommu_init, );
>   } else {
> diff --git a/drivers/pci/of.c b/drivers/pci/of.c
> index da5b414d585a..2313c3f848b0 100644
> --- a/drivers/pci/of.c
> +++ b/drivers/pci/of.c
> @@ -581,9 +581,15 @@ static int pci_parse_request_of_pci_ranges(struct device *dev,
>  
>  int devm_of_pci_bridge_init(struct device *dev, struct pci_host_bridge *bridge)
>  {
> - if (!dev->of_node)
> + struct device_node *node = dev->of_node;
> +
> + if (!node)
>   return 0;
>  
> + /* Detect IOMMU and make sure ACS will be enabled */
> + if (of_property_read_bool(node, "iommu-map"))
> + pci_request_acs();
> +
>   bridge->swizzle_irq = pci_common_swizzle;
>   bridge->map_irq = of_irq_parse_and_map_pci;
>  
> -- 
> 2.19.1
> 


Re: [PATCH v4] iommu/of: Fix pci_request_acs() before enumerating PCI devices

2021-06-04 Thread Bjorn Helgaas
[+cc John, who tested 6bf6c24720d3]

On Fri, May 21, 2021 at 03:03:24AM +, Wang Xingang wrote:
> From: Xingang Wang 
> 
> When booting with devicetree, the pci_request_acs() is called after the
> enumeration and initialization of PCI devices, thus the ACS is not
> enabled. And ACS should be enabled when IOMMU is detected for the
> PCI host bridge, so add check for IOMMU before probe of PCI host and call
> pci_request_acs() to make sure ACS will be enabled when enumerating PCI
> devices.

I'm happy to apply this, but I'm a little puzzled about 6bf6c24720d3
("iommu/of: Request ACS from the PCI core when configuring IOMMU
linkage").  It was tested and fixed a problem, but I don't understand
how.

6bf6c24720d3 added the call to pci_request_acs() in
of_iommu_configure() so it currently looks like this:

  of_iommu_configure(dev, ...)
  {
if (dev_is_pci(dev))
  pci_request_acs();

pci_request_acs() sets pci_acs_enable, which tells us to enable ACS
when enumerating PCI devices in the future.  But we only call
pci_request_acs() if we already *have* a PCI device.

So maybe 6bf6c24720d3 fixed a problem for *some* PCI devices, but not
all?  E.g., did we call of_iommu_configure() for one PCI device before
enumerating the rest?
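A toy model of the ordering question: pci_request_acs() only records a request, and enumeration consults that flag when each device is added, so a device enumerated before the request keeps ACS disabled. The demo_ names are hypothetical and this is not the kernel code. It only illustrates why, in this model, a request made from of_iommu_configure() for an already-enumerated device would not help that device.

```c
#include <assert.h>
#include <stdbool.h>

static bool demo_acs_enable;	/* plays the role of pci_acs_enable */

struct demo_pci_dev {
	bool acs_enabled;
};

/* Like pci_request_acs(): only records a request for future enumeration. */
static void demo_request_acs(void)
{
	demo_acs_enable = true;
}

/* Enumeration consults the flag once, when the device is added. */
static void demo_enumerate(struct demo_pci_dev *dev)
{
	dev->acs_enabled = demo_acs_enable;
}
```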

> Fixes: 6bf6c24720d33 ("iommu/of: Request ACS from the PCI core when
> configuring IOMMU linkage")
> Signed-off-by: Xingang Wang 
> ---
>  drivers/iommu/of_iommu.c | 1 -
>  drivers/pci/of.c | 8 +++-
>  2 files changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
> index a9d2df001149..54a14da242cc 100644
> --- a/drivers/iommu/of_iommu.c
> +++ b/drivers/iommu/of_iommu.c
> @@ -205,7 +205,6 @@ const struct iommu_ops *of_iommu_configure(struct device *dev,
>   .np = master_np,
>   };
>  
> - pci_request_acs();
>   err = pci_for_each_dma_alias(to_pci_dev(dev),
>of_pci_iommu_init, );
>   } else {
> diff --git a/drivers/pci/of.c b/drivers/pci/of.c
> index da5b414d585a..2313c3f848b0 100644
> --- a/drivers/pci/of.c
> +++ b/drivers/pci/of.c
> @@ -581,9 +581,15 @@ static int pci_parse_request_of_pci_ranges(struct device *dev,
>  
>  int devm_of_pci_bridge_init(struct device *dev, struct pci_host_bridge *bridge)
>  {
> - if (!dev->of_node)
> + struct device_node *node = dev->of_node;
> +
> + if (!node)
>   return 0;
>  
> + /* Detect IOMMU and make sure ACS will be enabled */
> + if (of_property_read_bool(node, "iommu-map"))
> + pci_request_acs();
> +
>   bridge->swizzle_irq = pci_common_swizzle;
>   bridge->map_irq = of_irq_parse_and_map_pci;
>  
> -- 
> 2.19.1
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/1] iommu/of: Fix request and enable ACS for of_iommu_configure

2021-05-07 Thread Bjorn Helgaas
On Fri, May 07, 2021 at 12:49:53PM +, Wang Xingang wrote:
> From: Xingang Wang 
> 
> When request ACS for PCI device in of_iommu_configure, the pci device
> has already been scanned and added with 'pci_acs_enable=0'. So the
> pci_request_acs() in current procedure does not work for enabling ACS.
> Besides, the ACS should be enabled only if there's an IOMMU in system.
> So this fix the call of pci_request_acs() and call pci_enable_acs() to
> make sure ACS is enabled for the pci_device.

For consistency:

  s/of_iommu_configure/of_iommu_configure()/
  s/pci device/PCI device/
  s/pci_device/PCI device/

But I'm confused about what problem this fixes.  On x86, I think we
*do* set pci_acs_enable=1 in this path:

  start_kernel
    mm_init
      mem_init
        pci_iommu_alloc
          p->detect()
            detect_intel_iommu   # IOMMU_INIT_POST(detect_intel_iommu)
              pci_request_acs
                pci_acs_enable = 1

before enumerating any PCI devices.

But you mentioned pci_host_common_probe(), which I think is mostly
used on non-x86 architectures, and I'm guessing those arches detect
the IOMMU differently.

So my question is, can we figure out how to detect IOMMUs the same way
across all arches?

> Fixes: 6bf6c24720d33 ("iommu/of: Request ACS from the PCI core when
> configuring IOMMU linkage")
> Signed-off-by: Xingang Wang 
> ---
>  drivers/iommu/of_iommu.c | 10 +-
>  drivers/pci/pci.c|  2 +-
>  include/linux/pci.h  |  1 +
>  3 files changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
> index a9d2df001149..dc621861ae72 100644
> --- a/drivers/iommu/of_iommu.c
> +++ b/drivers/iommu/of_iommu.c
> @@ -205,7 +205,6 @@ const struct iommu_ops *of_iommu_configure(struct device *dev,
>   .np = master_np,
>   };
>  
> - pci_request_acs();
>   err = pci_for_each_dma_alias(to_pci_dev(dev),
>of_pci_iommu_init, );
>   } else {
> @@ -222,6 +221,15 @@ const struct iommu_ops *of_iommu_configure(struct device *dev,
>   /* The fwspec pointer changed, read it again */
>   fwspec = dev_iommu_fwspec_get(dev);
>   ops= fwspec->ops;
> +
> + /*
> +  * If we found an IOMMU and the device is pci,
> +  * make sure we enable ACS.

s/pci/PCI/ for consistency.

> +  */
> + if (dev_is_pci(dev)) {
> + pci_request_acs();
> + pci_enable_acs(to_pci_dev(dev));
> + }
>   }
>   /*
>* If we have reason to believe the IOMMU driver missed the initial
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index b717680377a9..4e4f98ee2870 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -926,7 +926,7 @@ static void pci_std_enable_acs(struct pci_dev *dev)
>   * pci_enable_acs - enable ACS if hardware support it
>   * @dev: the PCI device
>   */
> -static void pci_enable_acs(struct pci_dev *dev)
> +void pci_enable_acs(struct pci_dev *dev)
>  {
>   if (!pci_acs_enable)
>   goto disable_acs_redir;
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index c20211e59a57..e6a8bfbc9c98 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -2223,6 +2223,7 @@ static inline struct pci_dev *pcie_find_root_port(struct pci_dev *dev)
>  }
>  
>  void pci_request_acs(void);
> +void pci_enable_acs(struct pci_dev *dev);
>  bool pci_acs_enabled(struct pci_dev *pdev, u16 acs_flags);
>  bool pci_acs_path_enabled(struct pci_dev *start,
> struct pci_dev *end, u16 acs_flags);
> -- 
> 2.19.1
> 


Re: [PATCH] pci: Rename pci_dev->untrusted to pci_dev->external

2021-04-20 Thread Bjorn Helgaas
On Tue, Apr 20, 2021 at 07:10:06AM +0100, Christoph Hellwig wrote:
> On Mon, Apr 19, 2021 at 05:30:49PM -0700, Rajat Jain wrote:
> > The current flag name "untrusted" is not correct as it is populated
> > using the firmware property "external-facing" for the parent ports. In
> > other words, the firmware only says which ports are external facing, so
> > the field really identifies the devices as external (vs internal).
> > 
> > Only field renaming. No functional change intended.
> 
> I don't think this is a good idea.  First the field should have been
> added to the generic struct device as requested multiple times before.

Fair point.  There isn't anything PCI-specific about this idea.  The
ACPI "ExternalFacingPort" and DT "external-facing" are currently only
defined for PCI devices, but could be applied elsewhere.

> Right now this requires horrible hacks in the IOMMU code to get at the
> pci_dev, and also doesn't scale to various other potential users.

Agreed, this is definitely suboptimal.  Do you have other users in
mind?  Maybe they could help inform the plan.

> Second the untrusted is objectively a better name.  Because untrusted
> is how we treat the device, which is what mattes.  External is just
> how we come to that conclusion.

The decision to treat "external" as being "untrusted" is a little bit
of policy that the PCI core really doesn't care about, so I think it
does make some sense to let the places that *do* care decide what to
trust based on "external" and possibly other factors, e.g., whether
the device is a BMC or processes untrusted data, etc.
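A minimal sketch of that split, using hypothetical names throughout: the PCI core would only report facts such as "external", and a consumer that cares about DMA protection would derive "untrusted" from them. The BMC and untrusted-data inputs are illustrative policy knobs, not existing fields.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-device facts that firmware or the PCI core can report. */
struct model_dev_facts {
	bool external;			/* e.g. below an "external-facing" port */
	bool is_bmc;			/* hypothetical: device is a BMC */
	bool handles_untrusted_data;	/* hypothetical policy input */
};

/*
 * Hypothetical consumer-side policy: the PCI core only describes the
 * device; the code that cares (e.g. an IOMMU layer) decides what to trust.
 */
static bool model_policy_untrusted(const struct model_dev_facts *f)
{
	return f->external || f->is_bmc || f->handles_untrusted_data;
}
```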

But I guess it makes sense to wait until we have a better motivation
before renaming it, since we don't gain any functionality here.

Bjorn


Re: [RFC PATCH v2 02/11] PCI/P2PDMA: Avoid pci_get_slot() which sleeps

2021-03-12 Thread Bjorn Helgaas
On Thu, Mar 11, 2021 at 04:31:32PM -0700, Logan Gunthorpe wrote:
> In order to use upstream_bridge_distance_warn() from a dma_map function,
> it must not sleep. However, pci_get_slot() takes the pci_bus_sem so it
> might sleep.
> 
> In order to avoid this, try to get the host bridge's device from
> bus->self, and if that is not set just get the first element in the
> list. It should be impossible for the host bridges device to go away
> while references are held on child devices, so the first element
> should not change and this should be safe.
> 
> Signed-off-by: Logan Gunthorpe 
> ---
>  drivers/pci/p2pdma.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> index bd89437faf06..2135fe69bb07 100644
> --- a/drivers/pci/p2pdma.c
> +++ b/drivers/pci/p2pdma.c
> @@ -311,11 +311,15 @@ static const struct pci_p2pdma_whitelist_entry {
>  static bool __host_bridge_whitelist(struct pci_host_bridge *host,
>   bool same_host_bridge)
>  {
> - struct pci_dev *root = pci_get_slot(host->bus, PCI_DEVFN(0, 0));
>   const struct pci_p2pdma_whitelist_entry *entry;
> + struct pci_dev *root = host->bus->self;
>   unsigned short vendor, device;
>  
>   if (!root)
> + root = list_first_entry_or_null(>bus->devices,
> + struct pci_dev, bus_list);

This replaces one ugliness (assuming there is a pci_dev for the host
bridge, and that it is at 00.0) with another (still assuming a pci_dev,
and that it is host->bus->self or the first entry).  I can't suggest
anything better, but maybe a little comment in the code would help
future readers.

I wish we had a real way to discover this property without the
whitelist, at least for future devices.  Was there ever any interest
in a _DSM or similar interface for this?

I *am* very glad to remove a pci_get_slot() usage.

> +
> + if (!root || root->devfn)
>   return false;
>  
>   vendor = root->vendor;

Don't you need to also remove the "pci_dev_put(root)" a few lines
below?
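For illustration only, the lookup order in the patch can be modeled with simplified stand-in types (hypothetical, not struct pci_dev or struct pci_bus). Unlike pci_get_slot(), nothing here takes a reference, which is why a trailing put would be unbalanced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for struct pci_dev/pci_bus. */
struct model_dev {
	unsigned int devfn;
	unsigned short vendor, device;
};

struct model_bus {
	struct model_dev *self;		/* bridge leading to this bus, if any */
	struct model_dev **devices;	/* devices in enumeration order */
	size_t ndevices;
};

/*
 * Mirrors the lookup order in the patch: prefer bus->self, fall back to
 * the first device on the bus, and only accept a device at devfn 0 (00.0).
 * No reference is taken, so there is nothing for the caller to put.
 */
static struct model_dev *model_find_root(const struct model_bus *bus)
{
	struct model_dev *root = bus->self;

	if (!root && bus->ndevices > 0)
		root = bus->devices[0];

	if (!root || root->devfn)
		return NULL;
	return root;
}
```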


Re: [RFC PATCH v2 01/11] PCI/P2PDMA: Pass gfp_mask flags to upstream_bridge_distance_warn()

2021-03-12 Thread Bjorn Helgaas
On Thu, Mar 11, 2021 at 04:31:31PM -0700, Logan Gunthorpe wrote:
> In order to call this function from a dma_map function, it must not sleep.
> The only reason it does sleep so to allocate the seqbuf to print
> which devices are within the ACS path.

s/this function/upstream_bridge_distance_warn()/ ?
s/so to/is to/

Maybe the subject could say something about the purpose, e.g., allow
calling from atomic context or something?  "Pass gfp_mask flags" sort
of restates what we can read from the patch, but without the
motivation of why this is useful.

> Switch the kmalloc call to use a passed in gfp_mask  and don't print that
> message if the buffer fails to be allocated.
> 
> Signed-off-by: Logan Gunthorpe 

Acked-by: Bjorn Helgaas 

> ---
>  drivers/pci/p2pdma.c | 21 +++--
>  1 file changed, 11 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> index 196382630363..bd89437faf06 100644
> --- a/drivers/pci/p2pdma.c
> +++ b/drivers/pci/p2pdma.c
> @@ -267,7 +267,7 @@ static int pci_bridge_has_acs_redir(struct pci_dev *pdev)
>  
>  static void seq_buf_print_bus_devfn(struct seq_buf *buf, struct pci_dev *pdev)
>  {
> - if (!buf)
> + if (!buf || !buf->buffer)
>   return;
>  
>   seq_buf_printf(buf, "%s;", pci_name(pdev));
> @@ -495,25 +495,26 @@ upstream_bridge_distance(struct pci_dev *provider, struct pci_dev *client,
>  
>  static enum pci_p2pdma_map_type
>  upstream_bridge_distance_warn(struct pci_dev *provider, struct pci_dev *client,
> -   int *dist)
> +   int *dist, gfp_t gfp_mask)
>  {
>   struct seq_buf acs_list;
>   bool acs_redirects;
>   int ret;
>  
> - seq_buf_init(_list, kmalloc(PAGE_SIZE, GFP_KERNEL), PAGE_SIZE);
> - if (!acs_list.buffer)
> - return -ENOMEM;
> + seq_buf_init(_list, kmalloc(PAGE_SIZE, gfp_mask), PAGE_SIZE);
>  
>   ret = upstream_bridge_distance(provider, client, dist, _redirects,
>  _list);
>   if (acs_redirects) {
>   pci_warn(client, "ACS redirect is set between the client and provider (%s)\n",
>pci_name(provider));
> - /* Drop final semicolon */
> - acs_list.buffer[acs_list.len-1] = 0;
> - pci_warn(client, "to disable ACS redirect for this path, add the kernel parameter: pci=disable_acs_redir=%s\n",
> -  acs_list.buffer);
> +
> + if (acs_list.buffer) {
> + /* Drop final semicolon */
> + acs_list.buffer[acs_list.len - 1] = 0;
> + pci_warn(client, "to disable ACS redirect for this path, add the kernel parameter: pci=disable_acs_redir=%s\n",
> +  acs_list.buffer);
> + }
>   }
>  
>   if (ret == PCI_P2PDMA_MAP_NOT_SUPPORTED) {
> @@ -566,7 +567,7 @@ int pci_p2pdma_distance_many(struct pci_dev *provider, struct device **clients,
>  
>   if (verbose)
>   ret = upstream_bridge_distance_warn(provider,
> - pci_client, );
> + pci_client, , GFP_KERNEL);
>   else
>   ret = upstream_bridge_distance(provider, pci_client,
>  , NULL, NULL);
> -- 
> 2.20.1
> 
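A userspace sketch of the failure handling this patch aims for, using a hypothetical stand-in for struct seq_buf: if the allocation made with the caller's gfp_mask fails, appends become no-ops and the caller simply skips the "disable_acs_redir" hint instead of failing. The len == 0 guard is an extra precaution in this sketch, not something the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Minimal hypothetical stand-in for the kernel's struct seq_buf. */
struct model_seq_buf {
	char *buffer;	/* may be NULL if the allocation failed */
	size_t size;
	size_t len;
};

/* Like the patched seq_buf_print_bus_devfn(): silently skip a NULL buffer.
 * On overflow this sketch just drops the entry. */
static void model_append_name(struct model_seq_buf *buf, const char *name)
{
	int n;

	if (!buf || !buf->buffer)
		return;
	n = snprintf(buf->buffer + buf->len, buf->size - buf->len, "%s;", name);
	if (n > 0 && (size_t)n < buf->size - buf->len)
		buf->len += n;
}

/* Drop the trailing semicolon, but only when there is something to trim.
 * Returns NULL when the caller should skip printing the hint. */
static const char *model_finish(struct model_seq_buf *buf)
{
	if (!buf->buffer || buf->len == 0)
		return NULL;
	buf->buffer[buf->len - 1] = '\0';
	return buf->buffer;
}
```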


Re: [PATCH v2 2/2] PCI: vmd: Disable MSI/X remapping when possible

2021-02-05 Thread Bjorn Helgaas
On Thu, Feb 04, 2021 at 12:09:06PM -0700, Jon Derrick wrote:
> VMD will retransmit child device MSI/X using its own MSI/X table and
> requester-id. This limits the number of MSI/X available to the whole
> child device domain to the number of VMD MSI/X interrupts.
> 
> Some VMD devices have a mode where this remapping can be disabled,
> allowing child device interrupts to bypass processing with the VMD MSI/X
> domain interrupt handler and going straight the child device interrupt
> handler, allowing for better performance and scaling. The requester-id
> still gets changed to the VMD endpoint's requester-id, and the interrupt
> remapping handlers have been updated to properly set IRTE for child
> device interrupts to the VMD endpoint's context.
> 
> Some VMD platforms have existing production BIOS which rely on MSI/X
> remapping and won't explicitly program the MSI/X remapping bit. This
> re-enables MSI/X remapping on unload.

Trivial comments below.  Would you mind using "MSI-X" instead of
"MSI/X" so it matches the usage in the PCIe specs?  Several mentions
above (including subject) and below.

> Signed-off-by: Jon Derrick 
> ---
>  drivers/pci/controller/vmd.c | 60 
>  1 file changed, 48 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
> index 5e80f28f0119..a319ce49645b 100644
> --- a/drivers/pci/controller/vmd.c
> +++ b/drivers/pci/controller/vmd.c
> @@ -59,6 +59,13 @@ enum vmd_features {
>* be used for MSI remapping
>*/
>   VMD_FEAT_OFFSET_FIRST_VECTOR= (1 << 3),
> +
> + /*
> +  * Device can bypass remapping MSI/X transactions into its MSI/X table,
> +  * avoding the requirement of a VMD MSI domain for child device

s/avoding/avoiding/

> +  * interrupt handling

Maybe a period at the end of the sentence.

> +  */
> + VMD_FEAT_BYPASS_MSI_REMAP   = (1 << 4),
>  };
>  
>  /*
> @@ -306,6 +313,15 @@ static struct msi_domain_info vmd_msi_domain_info = {
>   .chip   = _msi_controller,
>  };
>  
> +static void vmd_enable_msi_remapping(struct vmd_dev *vmd, bool enable)
> +{
> + u16 reg;
> +
> + pci_read_config_word(vmd->dev, PCI_REG_VMCONFIG, );
> + reg = enable ? (reg & ~0x2) : (reg | 0x2);

Would be nice to have a #define for 0x2.
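Something along these lines, where the macro name is only a suggestion (hypothetical) and the semantics follow the patch: setting bit 1 of VMCONFIG bypasses MSI-X remapping, clearing it restores remapping. Splitting out the pure bit computation also makes it easy to check in isolation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name for bit 1 of VMCONFIG; set = MSI-X remapping bypassed. */
#define MODEL_VMCONFIG_MSI_REMAP_DISABLE	0x2

/* Pure helper computing the new register value, as in the patch. */
static uint16_t model_vmconfig_update(uint16_t reg, bool enable_remap)
{
	if (enable_remap)
		reg &= (uint16_t)~MODEL_VMCONFIG_MSI_REMAP_DISABLE;
	else
		reg |= MODEL_VMCONFIG_MSI_REMAP_DISABLE;
	return reg;
}
```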

> + pci_write_config_word(vmd->dev, PCI_REG_VMCONFIG, reg);
> +}
> +
>  static int vmd_create_irq_domain(struct vmd_dev *vmd)
>  {
>   struct fwnode_handle *fn;
> @@ -325,6 +341,13 @@ static int vmd_create_irq_domain(struct vmd_dev *vmd)
>  
>  static void vmd_remove_irq_domain(struct vmd_dev *vmd)
>  {
> + /*
> +  * Some production BIOS won't enable remapping between soft reboots.
> +  * Ensure remapping is restored before unloading the driver.
> +  */
> + if (!vmd->msix_count)
> + vmd_enable_msi_remapping(vmd, true);
> +
>   if (vmd->irq_domain) {
>   struct fwnode_handle *fn = vmd->irq_domain->fwnode;
>  
> @@ -679,15 +702,31 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
>  
>   sd->node = pcibus_to_node(vmd->dev->bus);
>  
> - ret = vmd_create_irq_domain(vmd);
> - if (ret)
> - return ret;
> -
>   /*
> -  * Override the irq domain bus token so the domain can be distinguished
> -  * from a regular PCI/MSI domain.
> +  * Currently MSI remapping must be enabled in guest passthrough mode
> +  * due to some missing interrupt remapping plumbing. This is probably
> +  * acceptable because the guest is usually CPU-limited and MSI
> +  * remapping doesn't become a performance bottleneck.
>*/
> - irq_domain_update_bus_token(vmd->irq_domain, DOMAIN_BUS_VMD_MSI);
> + if (!(features & VMD_FEAT_BYPASS_MSI_REMAP) || offset[0] || offset[1]) {
> + ret = vmd_alloc_irqs(vmd);
> + if (ret)
> + return ret;
> +
> + vmd_enable_msi_remapping(vmd, true);
> +
> + ret = vmd_create_irq_domain(vmd);
> + if (ret)
> + return ret;
> +
> + /*
> +  * Override the irq domain bus token so the domain can be
> +  * distinguished from a regular PCI/MSI domain.
> +  */
> + irq_domain_update_bus_token(vmd->irq_domain, DOMAIN_BUS_VMD_MSI);
> + } else {
> + vmd_enable_msi_remapping(vmd, false);
> + }
>  
>   pci_add_resource(, >resources[0]);
>   pci_add_resource_offset(, >resources[1], offset[0]);
> @@ -753,10 +792,6 @@ static int vmd_probe(struct pci_dev *dev, const struct pci_device_id *id)
>   if (features & VMD_FEAT_OFFSET_FIRST_VECTOR)
>   vmd->first_vec = 1;
>  
> - err = vmd_alloc_irqs(vmd);
> - if (err)
> - return err;
> -
>   spin_lock_init(>cfg_lock);
>   pci_set_drvdata(dev, vmd);
>   err = vmd_enable_domain(vmd, features);
> @@ 

[PATCH] iommu/vt-d: Fix 'physical' typos

2021-01-26 Thread Bjorn Helgaas
From: Bjorn Helgaas 

Fix misspellings of "physical".

Signed-off-by: Bjorn Helgaas 
---
 include/linux/intel-iommu.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 09c6a0bf3892..3ae86385b222 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -662,7 +662,7 @@ static inline struct dmar_domain *to_dmar_domain(struct iommu_domain *dom)
  * 7: super page
  * 8-10: available
  * 11: snoop behavior
- * 12-63: Host physcial address
+ * 12-63: Host physical address
  */
 struct dma_pte {
u64 val;
-- 
2.25.1

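For reference, a small sketch of decoding the fields named in that comment. The macro and helper names are hypothetical; only the bit positions (7, 11, and 12-63) come from the comment itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions from the struct dma_pte comment; names are hypothetical. */
#define MODEL_PTE_SUPER_PAGE	(1ULL << 7)
#define MODEL_PTE_SNOOP		(1ULL << 11)
#define MODEL_PTE_ADDR_MASK	(~0ULL << 12)	/* bits 12-63: host physical address */

/* Strip the low flag bits to recover the host physical address. */
static uint64_t model_pte_host_phys(uint64_t val)
{
	return val & MODEL_PTE_ADDR_MASK;
}

static bool model_pte_is_super(uint64_t val)
{
	return val & MODEL_PTE_SUPER_PAGE;
}
```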


Re: [PATCH 0/2] Introduce PCI_FIXUP_IOMMU

2020-12-17 Thread Bjorn Helgaas
On Wed, Dec 16, 2020 at 07:24:30PM +0800, Zhou Wang wrote:
> On 2020/6/23 23:04, Bjorn Helgaas wrote:
> > On Fri, Jun 19, 2020 at 10:26:54AM +0800, Zhangfei Gao wrote:
> >> Have studied _DSM method, two issues we met comparing using quirk.
> >>
> >> 1. Need change definition of either pci_host_bridge or pci_dev, like adding
> >> member can_stall,
> >> while pci system does not know stall now.
> >>
> >> a, pci devices do not have uuid: uuid need be described in dsdt, while pci
> >> devices are not defined in dsdt.
> >> so we have to use host bridge.
> > 
> > PCI devices *can* be described in the DSDT.  IIUC these particular
> > devices are hardwired (not plug-in cards), so platform firmware can
> > know about them and could describe them in the DSDT.
> > 
> >> b,  Parsing dsdt is in in pci subsystem.
> >> Like drivers/acpi/pci_root.c:
> >>obj = acpi_evaluate_dsm(ACPI_HANDLE(bus->bridge), 
> >> _acpi_dsm_guid,
> >> 1,
> >> IGNORE_PCI_BOOT_CONFIG_DSM, NULL);
> >>
> >> After parsing DSM in pci, we need record this info.
> >> Currently, can_stall info is recorded in iommu_fwspec,
> >> which is allocated in iommu_fwspec_init and called by iort_iommu_configure
> >> for uefi.
> > 
> > You can look for a _DSM wherever it is convenient for you.  It could
> > be in an AMBA shim layer.
> > 
> >> 2. Guest kernel also need support sva.
> >> Using quirk, the guest can boot with sva enabled, since quirk is
> >> self-contained by kernel.
> >> If using  _DSM, a specific uefi or dtb has to be provided,
> >> currently we can useQEMU_EFI.fd from apt install qemu-efi
> > 
> > I don't quite understand what this means, but as I mentioned before, a
> > quirk for a *limited* number of devices is OK, as long as there is a
> > plan that removes the need for a quirk for future devices.
> > 
> > E.g., if the next platform version ships with a DTB or firmware with a
> > _DSM or other mechanism that enables the kernel to discover this
> > information without a kernel change, it's fine to use a quirk to cover
> > the early platform.
> > 
> > The principles are:
> > 
> >   - I don't want to have to update a quirk for every new Device ID
> > that needs this.
> 
> Hi Bjorn and Zhangfei,
> 
> We plan to use ATS/PRI to support SVA in future PCI devices. However, for
> current devices, we need to add limited number of quirk to let them
> work. The device IDs of current quirk needed devices are ZIP engine(0xa250, 
> 0xa251),
> SEC engine(0xa255, 0xa256), HPRE engine(0xa258, 0xa259), revision id are
> 0x21 and 0x30.
> 
> Let's continue to upstream these quirks!

Please post the patches you propose.  I don't think the previous ones
are in my queue.  Please include the lore URL for the previous
posting(s) in the cover letter so we can connect the discussion.

> >   - I don't really want to have to manage non-PCI information in the
> > struct pci_dev.  If this is AMBA- or IOMMU-related, it should be
> > stored in a structure related to AMBA or the IOMMU.
> > .
> > 
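As a sketch of how bounded such a quirk could stay, assuming the revision check applies to every listed ID (the table and helper are hypothetical, not the posted patch, and a real quirk would also match the vendor ID):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Device IDs listed in the message above; table layout is hypothetical. */
static const uint16_t model_sva_quirk_ids[] = {
	0xa250, 0xa251,		/* ZIP engine */
	0xa255, 0xa256,		/* SEC engine */
	0xa258, 0xa259,		/* HPRE engine */
};

static bool model_needs_sva_quirk(uint16_t device, uint8_t revision)
{
	size_t n = sizeof(model_sva_quirk_ids) / sizeof(model_sva_quirk_ids[0]);

	if (revision != 0x21 && revision != 0x30)
		return false;
	for (size_t i = 0; i < n; i++)
		if (model_sva_quirk_ids[i] == device)
			return true;
	return false;
}
```

Keeping it table-driven reflects the principle quoted above: if future devices use ATS/PRI instead, this list should never need to grow.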


Re: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as broken

2020-12-10 Thread Bjorn Helgaas
On Thu, Dec 10, 2020 at 03:36:36PM +, Deucher, Alexander wrote:
> [AMD Public Use]
> 
> > -Original Message-
> > From: Merger, Edgar [AUTOSOL/MAS/AUGS] 
> > Sent: Thursday, December 10, 2020 5:48 AM
> > To: Deucher, Alexander ; Huang, Ray 
> > ; Kuehling, Felix 
> > Cc: Will Deacon ; linux-ker...@vger.kernel.org; 
> > linux- p...@vger.kernel.org; iommu@lists.linux-foundation.org; Bjorn 
> > Helgaas ; Joerg Roedel ; Zhu, 
> > Changfeng 
> > Subject: RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as 
> > broken
> > 
> > Alright. Done that.
> > This should be it finally I believe.
> > Which will be the initial kernel-version that incorporates that?
> 
> Looks good to me.  Bjorn, can you pick this up for PCI?

Didn't apply cleanly, but I applied it by hand to pci/misc for v5.11.
If all goes well it should appear in v5.11-rc1.

https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/commit/?h=pci/misc=23bb0d9a9fe70a8ff23f53af822f2c6e6f261818

> > -Original Message-
> > From: Deucher, Alexander 
> > Sent: Mittwoch, 9. Dezember 2020 15:24
> > To: Merger, Edgar [AUTOSOL/MAS/AUGS] ; 
> > Huang, Ray ; Kuehling, Felix 
> > 
> > Cc: Will Deacon ; linux-ker...@vger.kernel.org; 
> > linux- p...@vger.kernel.org; iommu@lists.linux-foundation.org; Bjorn 
> > Helgaas ; Joerg Roedel ; Zhu, 
> > Changfeng 
> > Subject: RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as 
> > broken
> > 
> > [AMD Public Use]
> > 
> > > -Original Message-
> > > From: Merger, Edgar [AUTOSOL/MAS/AUGS]
> > 
> > > Sent: Wednesday, December 9, 2020 2:59 AM
> > > To: Deucher, Alexander ; Huang, Ray 
> > > ; Kuehling, Felix 
> > > Cc: Will Deacon ; linux-ker...@vger.kernel.org;
> > > linux- p...@vger.kernel.org; iommu@lists.linux-foundation.org; Bjorn 
> > > Helgaas ; Joerg Roedel ; Zhu, 
> > > Changfeng 
> > > Subject: RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as 
> > > broken
> > >
> > > Alex,
> > >
> > > I had to revise the patch. Please see attachment. It is actually two 
> > > more SSIDs affected to that.
> > 
> > Other than some minor whitespace issues, the patch looks fine to me.
> > Please align the subsystem_device lines and put the closing 
> > parenthesis on the same line as the last check.
> > 
> > Thanks!
> > 
> > Alex
> > 
> > >
> > > Best regards,
> > > Edgar
> > >
> > > -Original Message-
> > > From: Merger, Edgar [AUTOSOL/MAS/AUGS]
> > > Sent: Dienstag, 8. Dezember 2020 09:23
> > > To: 'Deucher, Alexander' ; 'Huang, Ray'
> > > ; 'Kuehling, Felix' 
> > > Cc: 'Will Deacon' ; 'linux-ker...@vger.kernel.org'
> > > ; 'linux-...@vger.kernel.org'  > > p...@vger.kernel.org>; 'iommu@lists.linux-foundation.org'
> > > ; 'Bjorn Helgaas'
> > > ; 'Joerg Roedel' ; 'Zhu, 
> > > Changfeng' 
> > > Subject: RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as 
> > > broken
> > >
> > > Applied the patch as in attachment. Verified that ATS for GPU-Device 
> > > had been disabled. See attachment "dmesg_ATS.log".
> > >
> > > Was running that build over night successfully.
> > >
> > > -Original Message-
> > > From: Merger, Edgar [AUTOSOL/MAS/AUGS]
> > > Sent: Montag, 7. Dezember 2020 05:53
> > > To: Deucher, Alexander ; Huang, Ray 
> > > ; Kuehling, Felix 
> > > Cc: Will Deacon ; linux-ker...@vger.kernel.org;
> > > linux- p...@vger.kernel.org; iommu@lists.linux-foundation.org; Bjorn 
> > > Helgaas ; Joerg Roedel ; Zhu, 
> > > Changfeng 
> > > Subject: RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as 
> > > broken
> > >
> > > Hi Alex,
> > >
> > > I believe in the patch file, this
> > > + (pdev->subsystem_device == 0x0c19 ||
> > > +      pdev->subsystem_device == 0x0c10))
> > >
> > > Has to be changed to:
> > > + (pdev->subsystem_device == 0xce19 ||
> > > +  pdev->subsystem_device == 0xcc10))
> > >
> > > Because our SSIDs are "ea50:ce19" and "ea50:cc10" respectively and 
> > > another one would "ea50:cc08".
> > >
> > > I will apply that patch and feedback the results soon plus the patch 
> > > file that I actually had applied.
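A minimal sketch of the subsystem-ID check being discussed, using the IDs Edgar quotes ("ea50:ce19", "ea50:cc10", and "ea50:cc08"). The helper is hypothetical; the snippet above only compares subsystem_device, so the subsystem vendor check here is an assumption, and the exact ID list in the applied commit is not shown in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: does this board's SSID need ATS disabled? */
static bool model_raven_ats_broken(uint16_t ss_vendor, uint16_t ss_device)
{
	return ss_vendor == 0xea50 &&
	       (ss_device == 0xce19 ||
		ss_device == 0xcc10 ||
		ss_device == 0xcc08);
}
```

Note that the originally proposed 0x0c19 would not match the "ea50:ce19" board, which is the correction Edgar points out.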

Re: [RFC PATCH 03/15] PCI/P2PDMA: Introduce pci_p2pdma_should_map_bus() and pci_p2pdma_bus_offset()

2020-11-10 Thread Bjorn Helgaas
On Fri, Nov 06, 2020 at 10:00:24AM -0700, Logan Gunthorpe wrote:
> Introduce pci_p2pdma_should_map_bus() which is meant to be called by
> dma map functions to determine how to map a given p2pdma page.

s/dma/DMA/ for consistency (also below in function comment)

> pci_p2pdma_bus_offset() is also added to allow callers to get the bus
> offset if they need to map the bus address.
> 
> Signed-off-by: Logan Gunthorpe 
> ---
>  drivers/pci/p2pdma.c   | 46 ++
>  include/linux/pci-p2pdma.h | 11 +
>  2 files changed, 57 insertions(+)
> 
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> index ea8472278b11..9961e779f430 100644
> --- a/drivers/pci/p2pdma.c
> +++ b/drivers/pci/p2pdma.c
> @@ -930,6 +930,52 @@ void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
>  }
>  EXPORT_SYMBOL_GPL(pci_p2pdma_unmap_sg_attrs);
>  
> +/**
> + * pci_p2pdma_bus_offset - returns the bus offset for a given page
> + * @page: page to get the offset for
> + *
> + * Must be passed a pci p2pdma page.

s/pci/PCI/

> + */
> +u64 pci_p2pdma_bus_offset(struct page *page)
> +{
> + struct pci_p2pdma_pagemap *p2p_pgmap = to_p2p_pgmap(page->pgmap);
> +
> + WARN_ON(!is_pci_p2pdma_page(page));
> +
> + return p2p_pgmap->bus_offset;
> +}
> +EXPORT_SYMBOL_GPL(pci_p2pdma_bus_offset);
> +
> +/**
> + * pci_p2pdma_should_map_bus - determine if a dma mapping should use the
> + *   bus address
> + * @dev: device doing the DMA request
> + * @pgmap: dev_pagemap structure for the mapping
> + *
> + * Returns 1 if the page should be mapped with a bus address, 0 otherwise
> + * and -1 the device should not be mapping P2PDMA pages.

I think this is missing a word.

I'm not really sure how to interpret the "should" in
pci_p2pdma_should_map_bus().  If this returns -1, does that mean the
pages *cannot* be mapped?  They *could* be mapped, but you really
*shouldn't*?  Something else?

1 means page should be mapped with bus address.  0 means ... what,
exactly?  It should be mapped with some different address?

Sorry these are naive questions because I don't know how all this
works.

> + */
> +int pci_p2pdma_should_map_bus(struct device *dev, struct dev_pagemap *pgmap)
> +{
> + struct pci_p2pdma_pagemap *p2p_pgmap = to_p2p_pgmap(pgmap);
> + struct pci_dev *client;
> +
> + if (!dev_is_pci(dev))
> + return -1;
> +
> + client = to_pci_dev(dev);
> +
> + switch (pci_p2pdma_map_type(p2p_pgmap->provider, client)) {
> + case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
> + return 0;
> + case PCI_P2PDMA_MAP_BUS_ADDR:
> + return 1;
> + default:
> + return -1;
> + }
> +}
> +EXPORT_SYMBOL_GPL(pci_p2pdma_should_map_bus);
> +
>  /**
>   * pci_p2pdma_enable_store - parse a configfs/sysfs attribute store
>   *   to enable p2pdma
> diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h
> index 8318a97c9c61..fc5de47eeac4 100644
> --- a/include/linux/pci-p2pdma.h
> +++ b/include/linux/pci-p2pdma.h
> @@ -34,6 +34,8 @@ int pci_p2pdma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
>   int nents, enum dma_data_direction dir, unsigned long attrs);
>  void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
>   int nents, enum dma_data_direction dir, unsigned long attrs);
> +u64 pci_p2pdma_bus_offset(struct page *page);
> +int pci_p2pdma_should_map_bus(struct device *dev, struct dev_pagemap *pgmap);
>  int pci_p2pdma_enable_store(const char *page, struct pci_dev **p2p_dev,
>   bool *use_p2pdma);
>  ssize_t pci_p2pdma_enable_show(char *page, struct pci_dev *p2p_dev,
> @@ -83,6 +85,15 @@ static inline void pci_p2pmem_free_sgl(struct pci_dev *pdev,
>  static inline void pci_p2pmem_publish(struct pci_dev *pdev, bool publish)
>  {
>  }
> +static inline u64 pci_p2pdma_bus_offset(struct page *page)
> +{
> + return -1;
> +}
> +static inline int pci_p2pdma_should_map_bus(struct device *dev,
> + struct dev_pagemap *pgmap)
> +{
> + return -1;
> +}
>  static inline int pci_p2pdma_map_sg_attrs(struct device *dev,
>   struct scatterlist *sg, int nents, enum dma_data_direction dir,
>   unsigned long attrs)
> -- 
> 2.20.1
> 


Re: [PATCH 4/5] PCI/p2p: cleanup up __pci_p2pdma_map_sg a bit

2020-11-04 Thread Bjorn Helgaas
s|PCI/p2p: cleanup up __pci_p2pdma_map_sg|PCI/P2PDMA: Cleanup up __pci_p2pdma_map_sg|
to match history.

On Wed, Nov 04, 2020 at 10:50:51AM +0100, Christoph Hellwig wrote:
> Remove the pointless paddr variable that was only used once.
> 
> Signed-off-by: Christoph Hellwig 

Acked-by: Bjorn Helgaas 

> ---
>  drivers/pci/p2pdma.c | 5 +
>  1 file changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> index b07018af53876c..afd792cc272832 100644
> --- a/drivers/pci/p2pdma.c
> +++ b/drivers/pci/p2pdma.c
> @@ -825,13 +825,10 @@ static int __pci_p2pdma_map_sg(struct pci_p2pdma_pagemap *p2p_pgmap,
>   struct device *dev, struct scatterlist *sg, int nents)
>  {
>   struct scatterlist *s;
> - phys_addr_t paddr;
>   int i;
>  
>   for_each_sg(sg, s, nents, i) {
> - paddr = sg_phys(s);
> -
> - s->dma_address = paddr - p2p_pgmap->bus_offset;
> + s->dma_address = sg_phys(s) - p2p_pgmap->bus_offset;
>   sg_dma_len(s) = s->length;
>   }
>  
> -- 
> 2.28.0
> 
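The arithmetic left in the loop is simple enough to check in isolation. Below is a hypothetical userspace sketch with a stand-in scatterlist entry; returning nents is an assumption about the part of the function not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a scatterlist entry. */
struct model_sg {
	uint64_t phys;		/* what sg_phys() would return */
	uint32_t length;
	uint64_t dma_address;
	uint32_t dma_length;
};

/* Same arithmetic as the simplified loop: dma = phys - bus_offset. */
static int model_p2pdma_map_sg(struct model_sg *sg, int nents,
			       uint64_t bus_offset)
{
	for (int i = 0; i < nents; i++) {
		sg[i].dma_address = sg[i].phys - bus_offset;
		sg[i].dma_length = sg[i].length;
	}
	return nents;
}
```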


Re: [PATCH 3/5] PCI/p2p: remove the DMA_VIRT_OPS hacks

2020-11-04 Thread Bjorn Helgaas
s|PCI/p2p: remove|PCI/P2PDMA: Remove|
to match history.

On Wed, Nov 04, 2020 at 10:50:50AM +0100, Christoph Hellwig wrote:
> Now that all users of dma_virt_ops are gone we can remove the workaround
> for it in the PCIe peer to peer code.

s/PCIe/PCI/
We went to some trouble to make P2PDMA work on conventional PCI as
well as PCIe.

> Signed-off-by: Christoph Hellwig 

Acked-by: Bjorn Helgaas 

> ---
>  drivers/pci/p2pdma.c | 20 
>  1 file changed, 20 deletions(-)
> 
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> index de1c331dbed43f..b07018af53876c 100644
> --- a/drivers/pci/p2pdma.c
> +++ b/drivers/pci/p2pdma.c
> @@ -556,15 +556,6 @@ int pci_p2pdma_distance_many(struct pci_dev *provider, struct device **clients,
>   return -1;
>  
>   for (i = 0; i < num_clients; i++) {
> -#ifdef CONFIG_DMA_VIRT_OPS
> - if (clients[i]->dma_ops == _virt_ops) {
> - if (verbose)
> - dev_warn(clients[i],
> -  "cannot be used for peer-to-peer DMA because the driver makes use of dma_virt_ops\n");
> - return -1;
> - }
> -#endif
> -
>   pci_client = find_parent_pci_dev(clients[i]);
>   if (!pci_client) {
>   if (verbose)
> @@ -837,17 +828,6 @@ static int __pci_p2pdma_map_sg(struct pci_p2pdma_pagemap *p2p_pgmap,
>   phys_addr_t paddr;
>   int i;
>  
> - /*
> -  * p2pdma mappings are not compatible with devices that use
> -  * dma_virt_ops. If the upper layers do the right thing
> -  * this should never happen because it will be prevented
> -  * by the check in pci_p2pdma_distance_many()
> -  */
> -#ifdef CONFIG_DMA_VIRT_OPS
> - if (WARN_ON_ONCE(dev->dma_ops == _virt_ops))
> - return 0;
> -#endif
> -
>   for_each_sg(sg, s, nents, i) {
>   paddr = sg_phys(s);
>  
> -- 
> 2.28.0
> 


[Bug 209321] DMAR: [DMA Read] Request device [03:00.0] PASID ffffffff fault addr fffd3000 [fault reason 06] PTE Read access is not set

2020-10-07 Thread Bjorn Helgaas
https://bugzilla.kernel.org/show_bug.cgi?id=209321

Not much detail in the bugzilla yet, but apparently this started in
v5.8.0-rc1:

  DMAR: [DMA Read] Request device [03:00.0] PASID ffffffff fault addr fffd3000 [fault reason 06] PTE Read access is not set

Currently assigned to Driver/PCI, but not clear to me yet whether PCI
is the culprit or the victim.


Re: [PATCH v3 5/6] iommu/virtio: Support topology description in config space

2020-09-25 Thread Bjorn Helgaas
On Fri, Sep 25, 2020 at 10:12:43AM +0200, Jean-Philippe Brucker wrote:
> On Thu, Sep 24, 2020 at 10:22:03AM -0500, Bjorn Helgaas wrote:
> > On Fri, Aug 21, 2020 at 03:15:39PM +0200, Jean-Philippe Brucker wrote:

> > > + /* Perform the init sequence before we can read the config */
> > > + ret = viommu_pci_reset(common_cfg);
> > 
> > I guess this is some special device-specific reset, not any kind of
> > standard PCI reset?
> 
> Yes it's the virtio reset - writing 0 to the status register in the BAR.

I wonder if this should be named something like viommu_virtio_reset(),
so there's no confusion with PCI resets and all the timing
restrictions, config space restoration, etc. associated with them.
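To make the distinction concrete, here is a minimal sketch of the virtio-style reset described above: write 0 to the device status register, then wait until it reads back as 0 (the virtio 1.x spec asks drivers to wait for that before reinitializing). The struct and function names are hypothetical, and a plain volatile field stands in for the BAR mapping and iowrite8()/ioread8(), so treat it as an illustration rather than driver code.

```c
#include <stdint.h>

/* Hypothetical stand-in for the mapped virtio common config structure. */
struct fake_common_cfg {
	volatile uint8_t device_status;
};

/*
 * Virtio reset: write 0 to device_status, then poll until the device
 * reports 0.  Returns 0 on success, -1 if it never clears in max_polls
 * reads.  No PCI-level reset (FLR, secondary bus reset) is involved, so
 * none of the PCI reset timing or config space restore rules apply.
 */
static int viommu_virtio_reset_sketch(struct fake_common_cfg *cfg,
				      unsigned int max_polls)
{
	cfg->device_status = 0;
	for (unsigned int i = 0; i < max_polls; i++) {
		if (cfg->device_status == 0)
			return 0;
	}
	return -1;
}
```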

Bjorn


Re: [PATCH v3 5/6] iommu/virtio: Support topology description in config space

2020-09-24 Thread Bjorn Helgaas
On Fri, Aug 21, 2020 at 03:15:39PM +0200, Jean-Philippe Brucker wrote:
> Platforms without device-tree or ACPI can provide a topology
> description embedded into the virtio config space. Parse it.
> 
> Use PCI FIXUP to probe the config space early, because we need to
> discover the topology before any DMA configuration takes place, and the
> virtio driver may be loaded much later. Since we discover the topology
> description when probing the PCI hierarchy, the virtual IOMMU cannot
> manage other platform devices discovered earlier.

> +struct viommu_cap_config {
> + u8 bar;
> + u32 length; /* structure size */
> + u32 offset; /* structure offset within the bar */

s/the bar/the BAR/ (to match comment below).

> +static void viommu_pci_parse_topology(struct pci_dev *dev)
> +{
> + int ret;
> + u32 features;
> + void __iomem *regs, *common_regs;
> + struct viommu_cap_config cap = {0};
> + struct virtio_pci_common_cfg __iomem *common_cfg;
> +
> + /*
> +  * The virtio infrastructure might not be loaded at this point. We need
> +  * to access the BARs ourselves.
> +  */
> + ret = viommu_pci_find_capability(dev, VIRTIO_PCI_CAP_COMMON_CFG, );
> + if (!ret) {
> + pci_warn(dev, "common capability not found\n");

Is the lack of this capability really an error, i.e., should this be
pci_warn() or pci_info()?  The "device doesn't have topology
description" message below is only pci_dbg(), which suggests that we
can live without this.

Maybe a hint about what "common capability" means?

> + return;
> + }
> +
> + if (pci_enable_device_mem(dev))
> + return;
> +
> + common_regs = pci_iomap(dev, cap.bar, 0);
> + if (!common_regs)
> + return;
> +
> + common_cfg = common_regs + cap.offset;
> +
> + /* Perform the init sequence before we can read the config */
> + ret = viommu_pci_reset(common_cfg);

I guess this is some special device-specific reset, not any kind of
standard PCI reset?

> + if (ret < 0) {
> + pci_warn(dev, "unable to reset device\n");
> + goto out_unmap_common;
> + }
> +
> + iowrite8(VIRTIO_CONFIG_S_ACKNOWLEDGE, _cfg->device_status);
> + iowrite8(VIRTIO_CONFIG_S_ACKNOWLEDGE | VIRTIO_CONFIG_S_DRIVER,
> +  _cfg->device_status);
> +
> + /* Find out if the device supports topology description */
> + iowrite32(0, _cfg->device_feature_select);
> + features = ioread32(_cfg->device_feature);
> +
> + if (!(features & BIT(VIRTIO_IOMMU_F_TOPOLOGY))) {
> + pci_dbg(dev, "device doesn't have topology description");
> + goto out_reset;
> + }
> +
> + ret = viommu_pci_find_capability(dev, VIRTIO_PCI_CAP_DEVICE_CFG, );
> + if (!ret) {
> + pci_warn(dev, "device config capability not found\n");
> + goto out_reset;
> + }
> +
> + regs = pci_iomap(dev, cap.bar, 0);
> + if (!regs)
> + goto out_reset;
> +
> + pci_info(dev, "parsing virtio-iommu topology\n");
> + ret = viommu_parse_topology(>dev, regs + cap.offset,
> + pci_resource_len(dev, 0) - cap.offset);
> + if (ret)
> + pci_warn(dev, "failed to parse topology: %d\n", ret);
> +
> + pci_iounmap(dev, regs);
> +out_reset:
> + ret = viommu_pci_reset(common_cfg);
> + if (ret)
> + pci_warn(dev, "unable to reset device\n");
> +out_unmap_common:
> + pci_iounmap(dev, common_regs);
> +}
> +
> +/*
> + * Catch a PCI virtio-iommu implementation early to get the topology description
> + * before we start probing other endpoints.
> + */
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_REDHAT_QUMRANET, 0x1040 + VIRTIO_ID_IOMMU,
> + viommu_pci_parse_topology);
> -- 
> 2.28.0
> 
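The patch reads only the low 32 feature bits, because it writes 0 to device_feature_select before reading device_feature. In virtio PCI, feature bit N lives in 32-bit word N / 32 at position N % 32. The sketch below shows that rule; the read_feature_word callback is a hypothetical stand-in for the select-then-read register sequence, and the backing store is just a 64-bit integer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical accessor: returns the 32-bit word the device would show in
 * device_feature after the driver writes `sel` to device_feature_select. */
typedef uint32_t (*read_feature_word_fn)(void *ctx, uint32_t sel);

static bool virtio_has_feature_sketch(read_feature_word_fn read_word,
				      void *ctx, unsigned int bit)
{
	uint32_t word = read_word(ctx, bit / 32);	/* select the word */

	return (word >> (bit % 32)) & 1u;		/* test the bit */
}

/* Example backing store: a 64-bit feature set split into two words. */
static uint32_t read_from_u64(void *ctx, uint32_t sel)
{
	uint64_t features = *(const uint64_t *)ctx;

	if (sel > 1)		/* only words 0 and 1 exist here */
		return 0;
	return (uint32_t)(features >> (32 * sel));
}
```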


[bugzilla-dae...@bugzilla.kernel.org: [Bug 209149] New: "iommu/vt-d: Enable PCI ACS for platform opt in hint" makes NVMe config space not accessible after S3]

2020-09-23 Thread Bjorn Helgaas
[+cc IOMMU and NVMe folks]

Sorry, I forgot to forward this to linux-pci when it was first
reported.

Apparently this happens with v5.9-rc3, and may be related to
50310600ebda ("iommu/vt-d: Enable PCI ACS for platform opt in hint"),
which appeared in v5.8-rc3.

There are several dmesg logs and proposed patches in the bugzilla, but
no analysis yet of what the problem is.  From the first dmesg
attachment (https://bugzilla.kernel.org/attachment.cgi?id=292327):

  [   50.434945] PM: suspend entry (deep)
  [   50.802086] nvme :01:00.0: saving config space at offset 0x0 (reading 0x11e0f)
  [   50.842775] ACPI: Preparing to enter system sleep state S3
  [   50.858922] ACPI: Waking up from system sleep state S3
  [   50.883622] nvme :01:00.0: can't change power state from D3hot to D0 (config space inaccessible)
  [   50.947352] nvme :01:00.0: restoring config space at offset 0x0 (was 0x, writing 0x11e0f)
  [   50.947816] pcieport :00:1b.0: DPC: containment event, status:0x1f01 source:0x
  [   50.947817] pcieport :00:1b.0: DPC: unmasked uncorrectable error detected
  [   50.947829] pcieport :00:1b.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Receiver ID)
  [   50.947830] pcieport :00:1b.0:   device [8086:06ac] error status/mask=0020/0001
  [   50.947831] pcieport :00:1b.0:[21] ACSViol(First)
  [   50.947841] pcieport :00:1b.0: AER: broadcast error_detected message
  [   50.947843] nvme nvme0: frozen state error detected, reset controller

I suspect the nvme "can't change power state" and restore config space
errors are a consequence of the DPC event.  If DPC disables the link,
the device is inaccessible.
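For context, a config read aimed at a device behind a disabled link typically completes with all ones instead of real data, which is what "config space inaccessible" reflects. A minimal sketch of the usual heuristic; both helper names are hypothetical, and since a real register can legitimately read as all ones, this only means "possibly inaccessible".

```c
#include <stdbool.h>
#include <stdint.h>

/* A 32-bit config read returning all ones usually means no response. */
static bool cfg_read_looks_failed(uint32_t val)
{
	return val == UINT32_MAX;
}

/* Vendor/Device ID dword: 0xffff is never a valid Vendor ID. */
static bool vendor_id_invalid(uint32_t id_dword)
{
	return (id_dword & 0xffff) == 0xffff;
}
```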

I don't know what caused the ACS Violation.  The AER TLP Header Log
might have a clue, but unfortunately we didn't print it.

Tangent:

  The fact that we didn't print the AER TLP Header log looks like
  a bug in itself.  PCIe r5.0, sec 6.2.7, table 6-5, says many
  errors, including ACS Violation, should log the TLP header.  But
  aer_get_device_error_info() only reads the log for error bits in
  AER_LOG_TLP_MASKS, which doesn't include PCI_ERR_UNC_ACSV.

  I don't think there's a "TLP Header Log Valid" bit, and it's ugly to
  have to update AER_LOG_TLP_MASKS if new errors are added.  I think
  maybe we should always print the header log.
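  The gap is easy to see with the uncorrectable error status bits from
  include/uapi/linux/pci_regs.h.  The mask below is reproduced on the
  assumption that it matches AER_LOG_TLP_MASKS in
  drivers/pci/pcie/aer.c (worth double-checking against the tree), and
  tlp_log_needed_current() is a hypothetical name for the test that
  aer_get_device_error_info() effectively performs:

```c
#include <stdbool.h>
#include <stdint.h>

/* Uncorrectable Error Status bits (AER capability). */
#define PCI_ERR_UNC_POISON_TLP	0x00001000
#define PCI_ERR_UNC_COMP_ABORT	0x00008000
#define PCI_ERR_UNC_UNX_COMP	0x00010000
#define PCI_ERR_UNC_MALF_TLP	0x00040000
#define PCI_ERR_UNC_ECRC	0x00080000
#define PCI_ERR_UNC_UNSUP	0x00100000
#define PCI_ERR_UNC_ACSV	0x00200000	/* bit 21, "[21] ACSViol" above */

/* Errors for which the TLP Header Log is read; ACSV is not included. */
#define AER_LOG_TLP_MASKS_SKETCH (PCI_ERR_UNC_POISON_TLP | PCI_ERR_UNC_ECRC | \
				  PCI_ERR_UNC_UNSUP | PCI_ERR_UNC_COMP_ABORT | \
				  PCI_ERR_UNC_UNX_COMP | PCI_ERR_UNC_MALF_TLP)

static bool tlp_log_needed_current(uint32_t status)
{
	return (status & AER_LOG_TLP_MASKS_SKETCH) != 0;
}
```

  With this mask, an ACS Violation alone never triggers reading the
  header log, which matches what the dmesg above shows.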

- Forwarded message from bugzilla-dae...@bugzilla.kernel.org -

Date: Fri, 04 Sep 2020 14:31:20 +
From: bugzilla-dae...@bugzilla.kernel.org
To: bj...@helgaas.com
Subject: [Bug 209149] New: "iommu/vt-d: Enable PCI ACS for platform opt in hint" makes NVMe config space not accessible after S3
Message-ID: 

https://bugzilla.kernel.org/show_bug.cgi?id=209149

        Bug ID: 209149
       Summary: "iommu/vt-d: Enable PCI ACS for platform opt in hint"
                makes NVMe config space not accessible after S3
       Product: Drivers
       Version: 2.5
Kernel Version: mainline
      Hardware: All
            OS: Linux
          Tree: Mainline
        Status: NEW
      Severity: normal
      Priority: P1
     Component: PCI
      Assignee: drivers_...@kernel-bugs.osdl.org
      Reporter: kai.heng.f...@canonical.com
    Regression: No

Here's the error:
[   50.947816] pcieport :00:1b.0: DPC: containment event, status:0x1f01 source:0x
[   50.947817] pcieport :00:1b.0: DPC: unmasked uncorrectable error detected
[   50.947829] pcieport :00:1b.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Receiver ID)
[   50.947830] pcieport :00:1b.0:   device [8086:06ac] error status/mask=0020/0001
[   50.947831] pcieport :00:1b.0:[21] ACSViol(First)
[   50.947841] pcieport :00:1b.0: AER: broadcast error_detected message
[   50.947843] nvme nvme0: frozen state error detected, reset controller

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

- End forwarded message -


Re: [PATCH v5] PCI/ACS: Enable PCI_ACS_TB and disable only when needed for ATS

2020-09-16 Thread Bjorn Helgaas
On Tue, Jul 14, 2020 at 01:15:40PM -0700, Rajat Jain wrote:
> The ACS "Translation Blocking" bit blocks the translated addresses from
> the devices. We don't expect such traffic from devices unless ATS is
> enabled on them. A device sending such traffic without ATS enabled,
> indicates malicious intent, and thus should be blocked.
> 
> Enable PCI_ACS_TB by default for all devices, and it stays enabled until
> at least one of the devices downstream wants to enable ATS. It gets
> disabled to enable ATS on a device downstream it, and then gets enabled
> back on once all the downstream devices don't need ATS.
> 
> Signed-off-by: Rajat Jain 

I applied v4 of this patch instead because I think the complexity of
this one, where we have to walk up the tree and disable TB in upstream
bridges, is too high.  It's always tricky to modify the state of
device Y when we're doing something for device X.
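For reference, the per-bridge reference counting that v5 attempts can be sketched as below. struct node, its fields, and ats_get()/ats_put() are hypothetical stand-ins for struct pci_dev, acs_cap/ats_dependencies, and the ACS control writes. One detail this makes visible: the walk must advance from the node just visited (n = n->parent). The v5 loop advances with pci_upstream_bridge(pdev), which keeps returning the same bridge.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a PCI device or bridge. */
struct node {
	struct node *parent;	/* upstream bridge, NULL at the top */
	bool has_acs;		/* models dev->acs_cap != 0 */
	int ats_users;		/* models dev->ats_dependencies */
	bool tb_enabled;	/* models PCI_ACS_TB in ACS Control */
};

/* ATS enabled below: clear TB at each level on its first user. */
static void ats_get(struct node *dev)
{
	for (struct node *n = dev; n; n = n->parent) {
		if (!n->has_acs)
			continue;	/* still advances to n->parent */
		if (++n->ats_users == 1)
			n->tb_enabled = false;
	}
}

/* ATS disabled below: set TB again once the last user is gone. */
static void ats_put(struct node *dev)
{
	for (struct node *n = dev; n; n = n->parent) {
		if (!n->has_acs)
			continue;
		if (--n->ats_users == 0)
			n->tb_enabled = true;
	}
}
```

Even in this toy form, every enable/disable of ATS on one device mutates state on every ancestor, which is the cross-device coupling that makes the approach hard to reason about.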

> ---
> Note that I'm ignoring the devices that require quirks to enable or
> disable ACS, instead of using the standard way for ACS configuration.
> The reason is that it would require adding yet another quirk table or
> quirk function pointer, that I don't know how to implement for those
> devices, and will neither have the devices to test that code.
> 
> v5: Enable TB and disable ATS for all devices on boot. Disable TB later
> only if needed to enable ATS on downstream devices.
> v4: Add braces to avoid warning from kernel robot
> print warning for only external-facing devices.
> v3: print warning if ACS_TB not supported on external-facing/untrusted ports.
> Minor code comments fixes.
> v2: Commit log change
> 
>  drivers/pci/ats.c   |  5 
>  drivers/pci/pci.c   | 57 +
>  drivers/pci/pci.h   |  2 ++
>  drivers/pci/probe.c |  2 +-
>  include/linux/pci.h |  2 ++
>  5 files changed, 67 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> index b761c1f72f67..e2ea9083f30f 100644
> --- a/drivers/pci/ats.c
> +++ b/drivers/pci/ats.c
> @@ -28,6 +28,9 @@ void pci_ats_init(struct pci_dev *dev)
>   return;
>  
>   dev->ats_cap = pos;
> +
> + dev->ats_enabled = 1; /* To avoid WARN_ON from pci_disable_ats() */
> + pci_disable_ats(dev);
>  }
>  
>  /**
> @@ -82,6 +85,7 @@ int pci_enable_ats(struct pci_dev *dev, int ps)
>   }
>   pci_write_config_word(dev, dev->ats_cap + PCI_ATS_CTRL, ctrl);
>  
> + pci_disable_acs_trans_blocking(dev);
>   dev->ats_enabled = 1;
>   return 0;
>  }
> @@ -102,6 +106,7 @@ void pci_disable_ats(struct pci_dev *dev)
>   ctrl &= ~PCI_ATS_CTRL_ENABLE;
>   pci_write_config_word(dev, dev->ats_cap + PCI_ATS_CTRL, ctrl);
>  
> + pci_enable_acs_trans_blocking(dev);
>   dev->ats_enabled = 0;
>  }
>  EXPORT_SYMBOL_GPL(pci_disable_ats);
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 73a862782214..614e3c1e8c56 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -876,6 +876,9 @@ static void pci_std_enable_acs(struct pci_dev *dev)
>   /* Upstream Forwarding */
>   ctrl |= (cap & PCI_ACS_UF);
>  
> + /* Translation Blocking */
> + ctrl |= (cap & PCI_ACS_TB);
> +
>   pci_write_config_word(dev, pos + PCI_ACS_CTRL, ctrl);
>  }
>  
> @@ -904,6 +907,60 @@ static void pci_enable_acs(struct pci_dev *dev)
>   pci_disable_acs_redir(dev);
>  }
>  
> +void pci_disable_acs_trans_blocking(struct pci_dev *pdev)
> +{
> + u16 cap, ctrl, pos;
> + struct pci_dev *dev;
> +
> + if (!pci_acs_enable)
> + return;
> +
> + for (dev = pdev; dev; dev = pci_upstream_bridge(pdev)) {
> +
> + pos = dev->acs_cap;
> + if (!pos)
> + continue;
> +
> + /*
> +  * Disable translation blocking when first downstream
> +  * device that needs it (for ATS) wants to enable ATS
> +  */
> + if (++dev->ats_dependencies == 1) {
> + pci_read_config_word(dev, pos + PCI_ACS_CAP, );
> + pci_read_config_word(dev, pos + PCI_ACS_CTRL, );
> + ctrl &= ~(cap & PCI_ACS_TB);
> + pci_write_config_word(dev, pos + PCI_ACS_CTRL, ctrl);
> + }
> + }
> +}
> +
> +void pci_enable_acs_trans_blocking(struct pci_dev *pdev)
> +{
> + u16 cap, ctrl, pos;
> + struct pci_dev *dev;
> +
> + if (!pci_acs_enable)
> + return;
> +
> + for (dev = pdev; dev; dev = pci_upstream_bridge(pdev)) {
> +
> + pos = dev->acs_cap;
> + if (!pos)
> + continue;
> +
> + /*
> +  * Enable translation blocking when last downstream device
> +  * that depends on it (for ATS), doesn't need ATS anymore
> +  */
> + if (--dev->ats_dependencies == 0) {
> + pci_read_config_word(dev, pos + PCI_ACS_CAP, );
> + pci_read_config_word(dev, pos 

Re: [PATCH v4 4/4] PCI/ACS: Enable PCI_ACS_TB for untrusted/external-facing devices

2020-09-16 Thread Bjorn Helgaas
On Tue, Jul 07, 2020 at 03:46:04PM -0700, Rajat Jain wrote:
> When enabling ACS, enable translation blocking for external facing ports
> and untrusted devices.
> 
> Signed-off-by: Rajat Jain 

Applied (slightly modified) to pci/acs for v5.10, thanks!

I think the warning is superfluous because every external_facing
device is a Root Port or Switch Downstream Port, and if those support
ACS at all, they are required to support Translation Blocking.  So we
should only see the warning if the device is defective, and I don't
think we need to go out of our way to look for those.

> ---
> v4: Add braces to avoid warning from kernel robot
> print warning for only external-facing devices.
> v3: print warning if ACS_TB not supported on external-facing/untrusted ports.
> Minor code comments fixes.
> v2: Commit log change
> 
>  drivers/pci/pci.c|  8 
>  drivers/pci/quirks.c | 15 +++
>  2 files changed, 23 insertions(+)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 73a8627822140..a5a6bea7af7ce 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -876,6 +876,14 @@ static void pci_std_enable_acs(struct pci_dev *dev)
>   /* Upstream Forwarding */
>   ctrl |= (cap & PCI_ACS_UF);
>  
> + /* Enable Translation Blocking for external devices */
> + if (dev->external_facing || dev->untrusted) {
> + if (cap & PCI_ACS_TB)
> + ctrl |= PCI_ACS_TB;
> + else if (dev->external_facing)
> + pci_warn(dev, "ACS: No Translation Blocking on external-facing dev\n");
> + }
> +
>   pci_write_config_word(dev, pos + PCI_ACS_CTRL, ctrl);
>  }
>  
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index b341628e47527..bb22b46c1d719 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -4934,6 +4934,13 @@ static void pci_quirk_enable_intel_rp_mpc_acs(struct pci_dev *dev)
>   }
>  }
>  
> +/*
> + * Currently this quirk does the equivalent of
> + * PCI_ACS_SV | PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF
> + *
> + * TODO: This quirk also needs to do equivalent of PCI_ACS_TB,
> + * if dev->external_facing || dev->untrusted
> + */
>  static int pci_quirk_enable_intel_pch_acs(struct pci_dev *dev)
>  {
>   if (!pci_quirk_intel_pch_acs_match(dev))
> @@ -4973,6 +4980,14 @@ static int pci_quirk_enable_intel_spt_pch_acs(struct pci_dev *dev)
>   ctrl |= (cap & PCI_ACS_CR);
>   ctrl |= (cap & PCI_ACS_UF);
>  
> + /* Enable Translation Blocking for external devices */
> + if (dev->external_facing || dev->untrusted) {
> + if (cap & PCI_ACS_TB)
> + ctrl |= PCI_ACS_TB;
> + else if (dev->external_facing)
> + pci_warn(dev, "ACS: No Translation Blocking on external-facing dev\n");
> + }
> +
>   pci_write_config_dword(dev, pos + INTEL_SPT_ACS_CTRL, ctrl);
>  
>   pci_info(dev, "Intel SPT PCH root port ACS workaround enabled\n");
> -- 
> 2.27.0.212.ge8ba1cc988-goog
> 


Re: [PATCH v2] iommu/dma: Fix IOVA reserve dma ranges

2020-09-11 Thread Bjorn Helgaas
On Fri, Sep 11, 2020 at 03:55:34PM +0530, Srinath Mannam wrote:
> Fix IOVA reserve failure in the case when address of first memory region
> listed in dma-ranges is equal to 0x0.
> 
> Fixes: aadad097cd46f ("iommu/dma: Reserve IOVA for PCIe inaccessible DMA address")
> Signed-off-by: Srinath Mannam 
> ---
> Changes from v1:
>Removed unnecessary changes based on Robin's review comments.
> 
>  drivers/iommu/dma-iommu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 5141d49a046b..682068a9aae7 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -217,7 +217,7 @@ static int iova_reserve_pci_windows(struct pci_dev *dev,
>   lo = iova_pfn(iovad, start);
>   hi = iova_pfn(iovad, end);
>   reserve_iova(iovad, lo, hi);
> - } else {
> + } else if (end < start) {
>   /* dma_ranges list should be sorted */
>   dev_err(>dev, "Failed to reserve IOVA\n");

You didn't actually change the error message, but the message would be
way more useful if it included the IOVA address range, e.g., the
format used in pci_register_host_bridge():

  bus address [%#010llx-%#010llx]

Incidentally, the pr_err() in copy_reserved_iova() looks bogus; it
prints iova->pfn_lo twice, when it should probably print the base and
size or (my preference) something like the above:

pr_err("Reserve iova range %lx@%lx failed\n",
   iova->pfn_lo, iova->pfn_lo);
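A sketch of the suggested message style, using plain snprintf() with the same "[%#010llx-%#010llx]" format as the bus address messages; format_iova_range() is a hypothetical helper, not an existing kernel function:

```c
#include <stdio.h>

/* Format "<what> [start-end]" like pci_register_host_bridge() does. */
static int format_iova_range(char *buf, size_t len, const char *what,
			     unsigned long long start, unsigned long long end)
{
	return snprintf(buf, len, "%s [%#010llx-%#010llx]", what, start, end);
}
```

e.g., "Failed to reserve IOVA [0x80000000-0xffffffff]".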

>   return -EINVAL;
> -- 
> 2.17.1
> 
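For context, the loop being fixed reserves the holes before and between the sorted dma-ranges windows. A standalone sketch (struct window, reserve_gaps(), and the callback are hypothetical; the real code walks a resource_list and calls reserve_iova()) shows why a first window starting at 0x0 must not be an error: the hole before it is empty (end == start), which is neither a gap to reserve nor evidence of an unsorted list.

```c
#include <stddef.h>

/* Hypothetical DMA window [start, end] in bus address space. */
struct window {
	unsigned long long start, end;
};

typedef void (*reserve_fn)(void *ctx, unsigned long long lo,
			   unsigned long long hi);

/*
 * Reserve the holes before and between windows sorted by start address.
 * The hole above the last window is omitted for brevity.  Returns 0, or
 * -1 if the list is unsorted or overlapping.
 */
static int reserve_gaps(const struct window *w, size_t n,
			reserve_fn reserve, void *ctx)
{
	unsigned long long start = 0;

	for (size_t i = 0; i < n; i++) {
		unsigned long long end = w[i].start;

		if (end > start)
			reserve(ctx, start, end - 1);	/* hole [start, end) */
		else if (end < start)
			return -1;			/* not sorted */
		/* end == start: no hole, e.g. first window at 0x0 */
		start = w[i].end + 1;
	}
	return 0;
}

/* Example callback: count the holes that would be reserved. */
static void count_reservation(void *ctx, unsigned long long lo,
			      unsigned long long hi)
{
	(void)lo;
	(void)hi;
	++*(int *)ctx;
}
```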


Re: [patch V2 34/46] PCI/MSI: Make arch_.*_msi_irq[s] fallbacks selectable

2020-08-27 Thread Bjorn Helgaas
[+cc Rob,
cover https://lore.kernel.org/r/20200826111628.794979...@linutronix.de/
this  https://lore.kernel.org/r/20200826112333.992429...@linutronix.de/]

On Wed, Aug 26, 2020 at 01:17:02PM +0200, Thomas Gleixner wrote:
> From: Thomas Gleixner 
> 
> The arch_.*_msi_irq[s] fallbacks are compiled in whether an architecture
> requires them or not. Architectures which are fully utilizing hierarchical
> irq domains should never call into that code.
> 
> It's not only architectures which depend on that by implementing one or
> more of the weak functions, there is also a bunch of drivers which relies
> on the weak functions which invoke msi_controller::setup_irq[s] and
> msi_controller::teardown_irq.
> 
> Make the architectures and drivers which rely on them select them in Kconfig
> and if not selected replace them by stub functions which emit a warning and
> fail the PCI/MSI interrupt allocation.

Sorry, I really don't understand this, so these are probably stupid
questions.

If CONFIG_PCI_MSI_ARCH_FALLBACKS is defined, we will supply
implementations of:

  arch_setup_msi_irq
  arch_teardown_msi_irq
  arch_setup_msi_irqs
  arch_teardown_msi_irqs
  default_teardown_msi_irqs# non-weak

You select CONFIG_PCI_MSI_ARCH_FALLBACKS for ia64, mips, powerpc,
s390, sparc, and x86.  I see that all of those arches except x86
implement at least one of the functions above, and I can't figure out
why x86 needs to select CONFIG_PCI_MSI_ARCH_FALLBACKS.

I assume there's a way to convert these arches to hierarchical irq
domains so they wouldn't need this at all?  Is there a sample
conversion to look at?

And I can't figure out what's special about tegra, rcar, and xilinx
that makes them need it as well.  Is there something I could grep for
to identify them?  Is there a way to convert them so they don't need
it?
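For anyone following along, the fallback mechanism itself is just a weak default that a strong definition elsewhere replaces at link time. A single-file sketch using GCC/Clang's __attribute__((weak)) (the kernel spells it __weak); the function name and return values are hypothetical:

```c
/*
 * Generic default: used only if no other object file provides a
 * strong definition of the same symbol.  Under this patch, when the
 * fallbacks are not selected, defaults like this are replaced by stubs
 * that warn and fail the PCI/MSI allocation.
 */
__attribute__((weak)) int demo_arch_setup_msi_irqs(int nvec)
{
	return nvec > 0 ? 0 : -1;	/* pretend the generic path works */
}
```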

> --- a/include/linux/msi.h
> +++ b/include/linux/msi.h
> @@ -193,17 +193,38 @@ void pci_msi_mask_irq(struct irq_data *d
>  void pci_msi_unmask_irq(struct irq_data *data);
>  
>  /*
> - * The arch hooks to setup up msi irqs. Those functions are
> - * implemented as weak symbols so that they /can/ be overriden by
> - * architecture specific code if needed.
> + * The arch hooks to setup up msi irqs. Default functions are implemented

s/msi/MSI/ to match the one below.

> + * as weak symbols so that they /can/ be overriden by architecture specific
> + * code if needed. These hooks must be enabled by the architecture or by
> + * drivers which depend on them via msi_controller based MSI handling.


Re: [patch RFC 34/38] x86/msi: Let pci_msi_prepare() handle non-PCI MSI

2020-08-25 Thread Bjorn Helgaas
On Tue, Aug 25, 2020 at 11:30:41PM +0200, Thomas Gleixner wrote:
> On Tue, Aug 25 2020 at 15:24, Bjorn Helgaas wrote:
> > On Fri, Aug 21, 2020 at 02:24:58AM +0200, Thomas Gleixner wrote:
> >> Rename it to x86_msi_prepare() and handle the allocation type setup
> >> depending on the device type.
> >
> > I see what you're doing, but the subject reads a little strangely
> 
> Yes :(
> 
> > ("pci_msi_prepare() handling non-PCI" stuff) since it doesn't mention
> > the rename.  Maybe not practical or worthwhile to split into a rename
> > + make generic, I dunno.
> 
> What about
> 
> x86/msi: Rename and rework pci_msi_prepare() to cover non-PCI MSI

Perfect!


Re: [patch RFC 30/38] PCI/MSI: Allow to disable arch fallbacks

2020-08-25 Thread Bjorn Helgaas
On Tue, Aug 25, 2020 at 11:28:30PM +0200, Thomas Gleixner wrote:
> On Tue, Aug 25 2020 at 15:07, Bjorn Helgaas wrote:
> >> + * The arch hooks to setup up msi irqs. Default functions are implemented
> >> + * as weak symbols so that they /can/ be overriden by architecture specific
> >> + * code if needed.
> >> + *
> >> + * They can be replaced by stubs with warnings via
> >> + * CONFIG_PCI_MSI_DISABLE_ARCH_FALLBACKS when the architecture fully
> >> + * utilizes direct irqdomain based setup.

> > If not, it seems like it'd be nicer to have the burden on the arches
> > that need/want to use arch-specific code instead of on the arches that
> > do things generically.
> 
> Right, but they still share the common code there and some of them
> provide only parts of the weak callbacks. I'm not sure whether it's a
> good idea to copy all of this into each affected architecture.
> 
> Or did you just mean that those architectures should select
> CONFIG_I_WANT_THE_CRUFT instead of opting out on the fully irq domain
> based ones?

Yes, that was my real question -- can we confine the cruft in the
crufty arches?  If not, no big deal.

Bjorn


Re: [patch RFC 34/38] x86/msi: Let pci_msi_prepare() handle non-PCI MSI

2020-08-25 Thread Bjorn Helgaas
On Fri, Aug 21, 2020 at 02:24:58AM +0200, Thomas Gleixner wrote:
> Rename it to x86_msi_prepare() and handle the allocation type setup
> depending on the device type.

I see what you're doing, but the subject reads a little strangely
("pci_msi_prepare() handling non-PCI" stuff) since it doesn't mention
the rename.  Maybe not practical or worthwhile to split into a rename
+ make generic, I dunno.

> Add a new arch_msi_prepare define which will be utilized by the upcoming
> device MSI support. Define it to NULL if not provided by an architecture in
> the generic MSI header.
> 
> One arch specific function for MSI support is truly enough.
> 
> Signed-off-by: Thomas Gleixner 
> Cc: linux-...@vger.kernel.org
> Cc: linux-hyp...@vger.kernel.org
> ---
>  arch/x86/include/asm/msi.h  |4 +++-
>  arch/x86/kernel/apic/msi.c  |   27 ---
>  drivers/pci/controller/pci-hyperv.c |2 +-
>  include/linux/msi.h |4 
>  4 files changed, 28 insertions(+), 9 deletions(-)
> 
> --- a/arch/x86/include/asm/msi.h
> +++ b/arch/x86/include/asm/msi.h
> @@ -6,7 +6,9 @@
>  
>  typedef struct irq_alloc_info msi_alloc_info_t;
>  
> -int pci_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec,
> +int x86_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec,
>   msi_alloc_info_t *arg);
>  
> +#define arch_msi_prepare x86_msi_prepare
> +
>  #endif /* _ASM_X86_MSI_H */
> --- a/arch/x86/kernel/apic/msi.c
> +++ b/arch/x86/kernel/apic/msi.c
> @@ -182,26 +182,39 @@ static struct irq_chip pci_msi_controlle
>   .flags  = IRQCHIP_SKIP_SET_WAKE,
>  };
>  
> -int pci_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec,
> - msi_alloc_info_t *arg)
> +static void pci_msi_prepare(struct device *dev, msi_alloc_info_t *arg)
>  {
> - struct pci_dev *pdev = to_pci_dev(dev);
> - struct msi_desc *desc = first_pci_msi_entry(pdev);
> + struct msi_desc *desc = first_msi_entry(dev);
>  
> - init_irq_alloc_info(arg, NULL);
>   if (desc->msi_attrib.is_msix) {
>   arg->type = X86_IRQ_ALLOC_TYPE_PCI_MSIX;
>   } else {
>   arg->type = X86_IRQ_ALLOC_TYPE_PCI_MSI;
>   arg->flags |= X86_IRQ_ALLOC_CONTIGUOUS_VECTORS;
>   }
> +}
> +
> +static void dev_msi_prepare(struct device *dev, msi_alloc_info_t *arg)
> +{
> + arg->type = X86_IRQ_ALLOC_TYPE_DEV_MSI;
> +}
> +
> +int x86_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec,
> + msi_alloc_info_t *arg)
> +{
> + init_irq_alloc_info(arg, NULL);
> +
> + if (dev_is_pci(dev))
> + pci_msi_prepare(dev, arg);
> + else
> + dev_msi_prepare(dev, arg);
>  
>   return 0;
>  }
> -EXPORT_SYMBOL_GPL(pci_msi_prepare);
> +EXPORT_SYMBOL_GPL(x86_msi_prepare);
>  
>  static struct msi_domain_ops pci_msi_domain_ops = {
> - .msi_prepare= pci_msi_prepare,
> + .msi_prepare= x86_msi_prepare,
>  };
>  
>  static struct msi_domain_info pci_msi_domain_info = {
> --- a/drivers/pci/controller/pci-hyperv.c
> +++ b/drivers/pci/controller/pci-hyperv.c
> @@ -1532,7 +1532,7 @@ static struct irq_chip hv_msi_irq_chip =
>  };
>  
>  static struct msi_domain_ops hv_msi_ops = {
> - .msi_prepare= pci_msi_prepare,
> + .msi_prepare= arch_msi_prepare,
>   .msi_free   = hv_msi_free,
>  };
>  
> --- a/include/linux/msi.h
> +++ b/include/linux/msi.h
> @@ -430,4 +430,8 @@ static inline struct irq_domain *pci_msi
>  }
>  #endif /* CONFIG_PCI_MSI_IRQ_DOMAIN */
>  
> +#ifndef arch_msi_prepare
> +# define arch_msi_prepareNULL
> +#endif
> +
>  #endif /* LINUX_MSI_H */
> 
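The override pattern here, where the arch header #defines arch_msi_prepare and the generic header falls back to NULL, can be shown in isolation. All names below (demo_*, ALLOC_*) are hypothetical; demo_x86_msi_prepare() only mimics the PCI vs. non-PCI dispatch that x86_msi_prepare() does.

```c
#include <stddef.h>

struct demo_dev { int is_pci; };
typedef int (*prepare_fn)(struct demo_dev *dev, int *alloc_type);

enum { ALLOC_PCI_MSI = 1, ALLOC_DEV_MSI = 2 };

/* "Arch" implementation: choose the allocation type by device type. */
static int demo_x86_msi_prepare(struct demo_dev *dev, int *alloc_type)
{
	*alloc_type = dev->is_pci ? ALLOC_PCI_MSI : ALLOC_DEV_MSI;
	return 0;
}
#define arch_msi_prepare demo_x86_msi_prepare	/* from the arch header */

/* Generic header: NULL when the arch provides nothing. */
#ifndef arch_msi_prepare
# define arch_msi_prepare NULL
#endif

struct demo_msi_ops { prepare_fn msi_prepare; };

/* A driver like pci-hyperv can then use the name without arch knowledge. */
static const struct demo_msi_ops demo_ops = { .msi_prepare = arch_msi_prepare };
```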


Re: [patch RFC 17/38] x86/pci: Reducde #ifdeffery in PCI init code

2020-08-25 Thread Bjorn Helgaas
s/Reducde/Reduce/ (in subject)

On Fri, Aug 21, 2020 at 02:24:41AM +0200, Thomas Gleixner wrote:
> Adding a function call before the first #ifdef in pci_arch_init() triggers
> a 'mixed declarations and code' warning if PCI_DIRECT is enabled.
> 
> Use stub functions and move the #ifdeffery to the header file where it is
> not in the way.
> 
> Signed-off-by: Thomas Gleixner 
> Cc: linux-...@vger.kernel.org

Nice cleanup, thanks.  Glad to get rid of the useless initializer,
too.

Acked-by: Bjorn Helgaas 

> ---
>  arch/x86/include/asm/pci_x86.h |   11 +++
>  arch/x86/pci/init.c|   10 +++---
>  2 files changed, 14 insertions(+), 7 deletions(-)
> 
> --- a/arch/x86/include/asm/pci_x86.h
> +++ b/arch/x86/include/asm/pci_x86.h
> @@ -114,9 +114,20 @@ extern const struct pci_raw_ops pci_dire
>  extern bool port_cf9_safe;
>  
>  /* arch_initcall level */
> +#ifdef CONFIG_PCI_DIRECT
>  extern int pci_direct_probe(void);
>  extern void pci_direct_init(int type);
> +#else
> +static inline int pci_direct_probe(void) { return -1; }
> +static inline  void pci_direct_init(int type) { }
> +#endif
> +
> +#ifdef CONFIG_PCI_BIOS
>  extern void pci_pcbios_init(void);
> +#else
> +static inline void pci_pcbios_init(void) { }
> +#endif
> +
>  extern void __init dmi_check_pciprobe(void);
>  extern void __init dmi_check_skip_isa_align(void);
>  
> --- a/arch/x86/pci/init.c
> +++ b/arch/x86/pci/init.c
> @@ -8,11 +8,9 @@
> in the right sequence from here. */
>  static __init int pci_arch_init(void)
>  {
> -#ifdef CONFIG_PCI_DIRECT
> - int type = 0;
> + int type;
>  
>   type = pci_direct_probe();
> -#endif
>  
>   if (!(pci_probe & PCI_PROBE_NOEARLY))
>   pci_mmcfg_early_init();
> @@ -20,18 +18,16 @@ static __init int pci_arch_init(void)
>   if (x86_init.pci.arch_init && !x86_init.pci.arch_init())
>   return 0;
>  
> -#ifdef CONFIG_PCI_BIOS
>   pci_pcbios_init();
> -#endif
> +
>   /*
>* don't check for raw_pci_ops here because we want pcbios as last
>* fallback, yet it's needed to run first to set pcibios_last_bus
>* in case legacy PCI probing is used. otherwise detecting peer busses
>* fails.
>*/
> -#ifdef CONFIG_PCI_DIRECT
>   pci_direct_init(type);
> -#endif
> +
>   if (!raw_pci_ops && !raw_pci_ext_ops)
>   printk(KERN_ERR
>   "PCI: Fatal: No config space access function found\n");
> 
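The same stub technique in a self-contained form: the header supplies static inline stubs when the option is off, so the caller needs no #ifdef and no dummy initializer. DEMO_PCI_DIRECT is a hypothetical stand-in for CONFIG_PCI_DIRECT and is deliberately left undefined here.

```c
#ifdef DEMO_PCI_DIRECT
int demo_direct_probe(void);
void demo_direct_init(int type);
#else
static inline int demo_direct_probe(void) { return -1; }
static inline void demo_direct_init(int type) { (void)type; }
#endif

/* Caller: declarations first, then unconditional calls. */
static int demo_arch_init(void)
{
	int type;

	type = demo_direct_probe();
	demo_direct_init(type);
	return type;
}
```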


Re: [patch RFC 21/38] PCI: MSI: Provide pci_dev_has_special_msi_domain() helper

2020-08-25 Thread Bjorn Helgaas
On Fri, Aug 21, 2020 at 02:24:45AM +0200, Thomas Gleixner wrote:
> Provide a helper function to check whether a PCI device is handled by a
> non-standard PCI/MSI domain. This will be used to exclude devices which
> hang off a special bus, e.g. VMD, from the irq domain override in irq
> remapping.
> 
> Signed-off-by: Thomas Gleixner 
> Cc: Bjorn Helgaas 
> Cc: linux-...@vger.kernel.org

Acked-by: Bjorn Helgaas 

s|PCI: MSI:|PCI/MSI:| in the subject if feasible.

> ---
>  drivers/pci/msi.c   |   22 ++
>  include/linux/msi.h |1 +
>  2 files changed, 23 insertions(+)
> 
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -1553,4 +1553,26 @@ struct irq_domain *pci_msi_get_device_do
>DOMAIN_BUS_PCI_MSI);
>   return dom;
>  }
> +
> +/**
> + * pci_dev_has_special_msi_domain - Check whether the device is handled by
> + *   a non-standard PCI-MSI domain
> + * @pdev:The PCI device to check.
> + *
> + * Returns: True if the device irqdomain or the bus irqdomain is
> + * non-standard PCI/MSI.
> + */
> +bool pci_dev_has_special_msi_domain(struct pci_dev *pdev)
> +{
> + struct irq_domain *dom = dev_get_msi_domain(>dev);
> +
> + if (!dom)
> + dom = dev_get_msi_domain(>bus->dev);
> +
> + if (!dom)
> + return true;
> +
> + return dom->bus_token != DOMAIN_BUS_PCI_MSI;
> +}
> +
>  #endif /* CONFIG_PCI_MSI_IRQ_DOMAIN */
> --- a/include/linux/msi.h
> +++ b/include/linux/msi.h
> @@ -374,6 +374,7 @@ int pci_msi_domain_check_cap(struct irq_
>struct msi_domain_info *info, struct device *dev);
>  u32 pci_msi_domain_get_msi_rid(struct irq_domain *domain, struct pci_dev *pdev);
>  struct irq_domain *pci_msi_get_device_domain(struct pci_dev *pdev);
> +bool pci_dev_has_special_msi_domain(struct pci_dev *pdev);
>  #else
>  static inline struct irq_domain *pci_msi_get_device_domain(struct pci_dev *pdev)
>  {
> 
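The decision the new helper makes can be modeled without the irqdomain machinery. In this sketch the enum values, struct, and function name are hypothetical; only the order of checks follows the patch: device domain first, then the bus domain, "no domain" counts as special, and anything other than the standard PCI/MSI domain is special.

```c
#include <stdbool.h>
#include <stddef.h>

enum demo_bus_token { DEMO_BUS_PCI_MSI, DEMO_BUS_VMD_MSI };

struct demo_domain { enum demo_bus_token bus_token; };

static bool demo_has_special_msi_domain(const struct demo_domain *dev_dom,
					const struct demo_domain *bus_dom)
{
	const struct demo_domain *dom = dev_dom ? dev_dom : bus_dom;

	if (!dom)
		return true;	/* no MSI domain at all */
	return dom->bus_token != DEMO_BUS_PCI_MSI;
}
```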


Re: [patch RFC 30/38] PCI/MSI: Allow to disable arch fallbacks

2020-08-25 Thread Bjorn Helgaas
On Fri, Aug 21, 2020 at 02:24:54AM +0200, Thomas Gleixner wrote:
> If an architecture does not require the MSI setup/teardown fallback
> functions, then allow them to be replaced by stub functions which emit a
> warning.
> 
> Signed-off-by: Thomas Gleixner 
> Cc: Bjorn Helgaas 
> Cc: linux-...@vger.kernel.org

Acked-by: Bjorn Helgaas 

Question/comment below.

> ---
>  drivers/pci/Kconfig |3 +++
>  drivers/pci/msi.c   |3 ++-
>  include/linux/msi.h |   31 ++-
>  3 files changed, 31 insertions(+), 6 deletions(-)
> 
> --- a/drivers/pci/Kconfig
> +++ b/drivers/pci/Kconfig
> @@ -56,6 +56,9 @@ config PCI_MSI_IRQ_DOMAIN
>   depends on PCI_MSI
>   select GENERIC_MSI_IRQ_DOMAIN
>  
> +config PCI_MSI_DISABLE_ARCH_FALLBACKS
> + bool
> +
>  config PCI_QUIRKS
>   default y
>   bool "Enable PCI quirk workarounds" if EXPERT
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -58,8 +58,8 @@ static void pci_msi_teardown_msi_irqs(st
>  #define pci_msi_teardown_msi_irqs	arch_teardown_msi_irqs
>  #endif
>  
> +#ifndef CONFIG_PCI_MSI_DISABLE_ARCH_FALLBACKS
>  /* Arch hooks */
> -
>  int __weak arch_setup_msi_irq(struct pci_dev *dev, struct msi_desc *desc)
>  {
>   struct msi_controller *chip = dev->bus->msi;
> @@ -132,6 +132,7 @@ void __weak arch_teardown_msi_irqs(struc
>  {
>   return default_teardown_msi_irqs(dev);
>  }
> +#endif /* !CONFIG_PCI_MSI_DISABLE_ARCH_FALLBACKS */
>  
>  static void default_restore_msi_irq(struct pci_dev *dev, int irq)
>  {
> --- a/include/linux/msi.h
> +++ b/include/linux/msi.h
> @@ -193,17 +193,38 @@ void pci_msi_mask_irq(struct irq_data *d
>  void pci_msi_unmask_irq(struct irq_data *data);
>  
>  /*
> - * The arch hooks to setup up msi irqs. Those functions are
> - * implemented as weak symbols so that they /can/ be overriden by
> - * architecture specific code if needed.
> + * The arch hooks to setup up msi irqs. Default functions are implemented
> + * as weak symbols so that they /can/ be overridden by architecture specific
> + * code if needed.
> + *
> + * They can be replaced by stubs with warnings via
> + * CONFIG_PCI_MSI_DISABLE_ARCH_FALLBACKS when the architecture fully
> + * utilizes direct irqdomain based setup.

Do you expect *all* arches to eventually use direct irqdomain setup?
And in that case, to remove the config option?

If not, it seems like it'd be nicer to have the burden on the arches
that need/want to use arch-specific code instead of on the arches that
do things generically.

>   */
> +#ifndef CONFIG_PCI_MSI_DISABLE_ARCH_FALLBACKS
>  int arch_setup_msi_irq(struct pci_dev *dev, struct msi_desc *desc);
>  void arch_teardown_msi_irq(unsigned int irq);
>  int arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int type);
>  void arch_teardown_msi_irqs(struct pci_dev *dev);
> -void arch_restore_msi_irqs(struct pci_dev *dev);
> -
>  void default_teardown_msi_irqs(struct pci_dev *dev);
> +#else
> +static inline int arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
> +{
> + WARN_ON_ONCE(1);
> + return -ENODEV;
> +}
> +
> +static inline void arch_teardown_msi_irqs(struct pci_dev *dev)
> +{
> + WARN_ON_ONCE(1);
> +}
> +#endif
> +
> +/*
> + * The restore hooks are still available as they are useful even
> + * for fully irq domain based setups. Courtesy to XEN/X86.
> + */
> +void arch_restore_msi_irqs(struct pci_dev *dev);
>  void default_restore_msi_irqs(struct pci_dev *dev);
>  
>  struct msi_controller {
> 
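The config switch above can be pictured with a small user-space sketch. It assumes a stand-in macro in place of CONFIG_PCI_MSI_DISABLE_ARCH_FALLBACKS and a hand-rolled one-shot warning in place of WARN_ON_ONCE(). All model_* names are hypothetical. With the switch defined, the fallback is replaced by a stub that warns once and returns -ENODEV. Without it, a weak default (placeholder body) is compiled instead, which an architecture could override.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for CONFIG_PCI_MSI_DISABLE_ARCH_FALLBACKS. */
#define MODEL_DISABLE_ARCH_FALLBACKS 1

/* How many times the one-shot warning actually printed. */
static int model_warn_count;

/*
 * Rough analogue of WARN_ON_ONCE(1). Note: one flag shared by every caller
 * of this helper, whereas WARN_ON_ONCE() tracks each call site separately.
 */
static void model_warn_once(const char *what)
{
	static int warned;

	if (!warned) {
		warned = 1;
		model_warn_count++;
		fprintf(stderr, "WARNING: %s reached\n", what);
	}
}

#ifdef MODEL_DISABLE_ARCH_FALLBACKS
/* Stub: the arch says it never needs this path, so reaching it is a bug. */
static int model_arch_setup_msi_irqs(int nvec)
{
	(void)nvec;
	model_warn_once(__func__);
	return -ENODEV;
}
#else
/* Placeholder default body; an architecture could override this weak symbol. */
__attribute__((weak)) int model_arch_setup_msi_irqs(int nvec)
{
	return nvec > 0 ? 0 : -EINVAL;
}
#endif
```

Calling the stub repeatedly keeps returning -ENODEV but only prints the warning the first time.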


Re: [patch RFC 20/38] PCI: vmd: Mark VMD irqdomain with DOMAIN_BUS_VMD_MSI

2020-08-25 Thread Bjorn Helgaas
On Fri, Aug 21, 2020 at 02:24:44AM +0200, Thomas Gleixner wrote:
> Devices on the VMD bus use their own MSI irq domain, but it is not
> distinguishable from regular PCI/MSI irq domains. Telling them apart is
> required to exclude VMD devices from getting the irq domain pointer set by
> interrupt remapping.
> 
> Override the default bus token.
> 
> Signed-off-by: Thomas Gleixner 
> Cc: Bjorn Helgaas 
> Cc: Lorenzo Pieralisi 
> Cc: Jonathan Derrick 
> Cc: linux-...@vger.kernel.org

Acked-by: Bjorn Helgaas 

> ---
>  drivers/pci/controller/vmd.c |6 ++
>  1 file changed, 6 insertions(+)
> 
> --- a/drivers/pci/controller/vmd.c
> +++ b/drivers/pci/controller/vmd.c
> @@ -579,6 +579,12 @@ static int vmd_enable_domain(struct vmd_
>   return -ENODEV;
>   }
>  
> + /*
> +  * Override the irq domain bus token so the domain can be distinguished
> +  * from a regular PCI/MSI domain.
> +  */
> + irq_domain_update_bus_token(vmd->irq_domain, DOMAIN_BUS_VMD_MSI);
> +
>   pci_add_resource(, >resources[0]);
>   pci_add_resource_offset(, >resources[1], offset[0]);
>   pci_add_resource_offset(, >resources[2], offset[1]);
> 


Re: [patch RFC 13/38] PCI: MSI: Rework pci_msi_domain_calc_hwirq()

2020-08-25 Thread Bjorn Helgaas
On Fri, Aug 21, 2020 at 02:24:37AM +0200, Thomas Gleixner wrote:
> Retrieve the PCI device from the msi descriptor instead of doing so at the
> call sites.

I'd like it *better* with "PCI/MSI: " in the subject (to match history
and other patches in this series) and "MSI" here in the commit log,
but nice cleanup and:

Acked-by: Bjorn Helgaas 

Minor comments below.

> Signed-off-by: Thomas Gleixner 
> Cc: linux-...@vger.kernel.org
> ---
>  arch/x86/kernel/apic/msi.c |2 +-
>  drivers/pci/msi.c  |   13 ++---
>  include/linux/msi.h|3 +--
>  3 files changed, 8 insertions(+), 10 deletions(-)
> 
> --- a/arch/x86/kernel/apic/msi.c
> +++ b/arch/x86/kernel/apic/msi.c
> @@ -232,7 +232,7 @@ EXPORT_SYMBOL_GPL(pci_msi_prepare);
>  
>  void pci_msi_set_desc(msi_alloc_info_t *arg, struct msi_desc *desc)
>  {
> - arg->msi_hwirq = pci_msi_domain_calc_hwirq(arg->msi_dev, desc);
> + arg->msi_hwirq = pci_msi_domain_calc_hwirq(desc);

I guess it's safe to assume that "arg->msi_dev ==
msi_desc_to_pci_dev(desc)"?  I didn't try to verify that.

>  }
>  EXPORT_SYMBOL_GPL(pci_msi_set_desc);
>  
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -1346,17 +1346,17 @@ void pci_msi_domain_write_msg(struct irq
>  
>  /**
>   * pci_msi_domain_calc_hwirq - Generate a unique ID for an MSI source
> - * @dev: Pointer to the PCI device
>   * @desc:	Pointer to the MSI descriptor
>   *
>   * The ID number is only used within the irqdomain.
>   */
> -irq_hw_number_t pci_msi_domain_calc_hwirq(struct pci_dev *dev,
> -   struct msi_desc *desc)
> +irq_hw_number_t pci_msi_domain_calc_hwirq(struct msi_desc *desc)
>  {
> + struct pci_dev *pdev = msi_desc_to_pci_dev(desc);

If you named this "struct pci_dev *dev" (not "pdev"), the diff would
be a little smaller and it would match other usage in the file.

>   return (irq_hw_number_t)desc->msi_attrib.entry_nr |
> - pci_dev_id(dev) << 11 |
> - (pci_domain_nr(dev->bus) & 0x) << 27;
> + pci_dev_id(pdev) << 11 |
> + (pci_domain_nr(pdev->bus) & 0x) << 27;
>  }
>  
>  static inline bool pci_msi_desc_is_multi_msi(struct msi_desc *desc)
> @@ -1406,8 +1406,7 @@ static void pci_msi_domain_set_desc(msi_
>   struct msi_desc *desc)
>  {
>   arg->desc = desc;
> - arg->hwirq = pci_msi_domain_calc_hwirq(msi_desc_to_pci_dev(desc),
> -desc);
> + arg->hwirq = pci_msi_domain_calc_hwirq(desc);
>  }
>  #else
>  #define pci_msi_domain_set_desc  NULL
> --- a/include/linux/msi.h
> +++ b/include/linux/msi.h
> @@ -369,8 +369,7 @@ void pci_msi_domain_write_msg(struct irq
>  struct irq_domain *pci_msi_create_irq_domain(struct fwnode_handle *fwnode,
>struct msi_domain_info *info,
>struct irq_domain *parent);
> -irq_hw_number_t pci_msi_domain_calc_hwirq(struct pci_dev *dev,
> -   struct msi_desc *desc);
> +irq_hw_number_t pci_msi_domain_calc_hwirq(struct msi_desc *desc);
>  int pci_msi_domain_check_cap(struct irq_domain *domain,
>struct msi_domain_info *info, struct device *dev);
>  u32 pci_msi_domain_get_msi_rid(struct irq_domain *domain, struct pci_dev *pdev);
> 
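To see what the shifts in pci_msi_domain_calc_hwirq() do, here is a user-space sketch of the same packing: the MSI entry number in the low bits, the 16-bit bus/devfn ID (the value pci_dev_id() returns) starting at bit 11, and the PCI domain number starting at bit 27. The domain mask in the quoted hunk is truncated in this archive, so the sketch simply takes a 32-bit domain value. The model_* helpers are hypothetical.

```c
#include <stdint.h>

/* Field positions read off the shifts in the quoted hunk. */
#define MODEL_DEVID_SHIFT  11
#define MODEL_DOMAIN_SHIFT 27

/* bus/devfn packed the way PCI_DEVID() does: bus in the high byte. */
static uint16_t model_pci_dev_id(uint8_t bus, uint8_t devfn)
{
	return (uint16_t)((bus << 8) | devfn);
}

/*
 * Same expression shape as the patched function. entry_nr is assumed to
 * fit below bit 11, otherwise it would overlap the device ID field.
 */
static uint64_t model_calc_hwirq(uint32_t entry_nr, uint8_t bus,
				 uint8_t devfn, uint32_t domain)
{
	return (uint64_t)entry_nr |
	       (uint64_t)model_pci_dev_id(bus, devfn) << MODEL_DEVID_SHIFT |
	       (uint64_t)domain << MODEL_DOMAIN_SHIFT;
}
```

For example, entry 3 of device 02:01.0 (devfn 0x08) in domain 0 packs to 0x104003, and shifting right by 11 recovers the 0x0208 device ID.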


Re: [RFC PATCH 00/17] Drop uses of pci_read_config_*() return value

2020-08-02 Thread Bjorn Helgaas
On Sun, Aug 02, 2020 at 08:46:48PM +0200, Borislav Petkov wrote:
> On Sun, Aug 02, 2020 at 07:28:00PM +0200, Saheed Bolarinwa wrote:
> > Because the value ~0 has a meaning to some drivers and only
> 
> No, ~0 means that the PCI read failed. For *every* PCI device I know.

Wait, I'm not convinced yet.  I know that if a PCI read fails, you
normally get ~0 data because the host bridge fabricates it to complete
the CPU load.

But what guarantees that a PCI config register cannot contain ~0?
If there's something about that in the spec I'd love to know where it
is because it would simplify a lot of things.

I don't think we should merge any of these patches as-is.  If we *do*
want to go this direction, we at least need some kind of macro or
function that tests for ~0 so we have a clue about what's happening
and can grep for it.

Bjorn
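A sketch of the kind of greppable helper suggested above, assuming a GCC/Clang toolchain for __typeof__. The macro and function names are hypothetical. Note that it can only say "this value *might* be a failed read": as the mail points out, nothing here rules out a register that legitimately holds all ones.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true when @val is all ones for its own type, i.e. it
 * *might* be the ~0 data a host bridge fabricates for a failed config read.
 * It cannot prove the read failed. __typeof__ is a GCC/Clang extension.
 */
#define MODEL_POSSIBLE_CFG_ERROR(val) \
	((val) == (__typeof__(val))~0ULL)

/* Example caller: flag a 32-bit config read that deserves a second look. */
static bool model_cfg_read_suspicious(uint32_t val)
{
	return MODEL_POSSIBLE_CFG_ERROR(val);
}
```

Because the comparison casts ~0 to the argument's own type, the same macro works for 8-, 16- and 32-bit reads, and every such check can be found with a single grep.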


Re: [PATCH v3 1/1] PCI/ATS: Check PRI supported on the PF device when SRIOV is enabled

2020-07-24 Thread Bjorn Helgaas
On Thu, Jul 23, 2020 at 03:37:29PM -0700, Ashok Raj wrote:
> PASID and PRI capabilities are only enumerated in PF devices. VF devices
> do not enumerate these capabilities. IOMMU drivers also need to enumerate
> them before enabling features in the IOMMU. Extend the same support as
> PASID feature discovery (pci_pasid_features) to PRI.
> 
> Fixes: b16d0cb9e2fc ("iommu/vt-d: Always enable PASID/PRI PCI capabilities before ATS")
> Signed-off-by: Ashok Raj 

Applied with Baolu's reviewed-by and Joerg's ack to pci/virtualization
for v5.9, thanks!

> To: Bjorn Helgaas 
> To: Joerg Roedel 
> To: Lu Baolu 
> Cc: sta...@vger.kernel.org
> Cc: linux-...@vger.kernel.org
> Cc: linux-ker...@vger.kernel.org
> Cc: Ashok Raj 
> Cc: iommu@lists.linux-foundation.org
> ---
> v3: Added Fixes tag
> v2: Fixed build failure reported from lkp when CONFIG_PRI=n
> 
>  drivers/iommu/intel/iommu.c |  2 +-
>  drivers/pci/ats.c   | 13 +
>  include/linux/pci-ats.h |  4 
>  3 files changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index d759e7234e98..276452f5e6a7 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -2560,7 +2560,7 @@ static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu,
>   }
>  
>   if (info->ats_supported && ecap_prs(iommu->ecap) &&
> - pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI))
> + pci_pri_supported(pdev))
>   info->pri_supported = 1;
>   }
>   }
> diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> index b761c1f72f67..2e6cf0c700f7 100644
> --- a/drivers/pci/ats.c
> +++ b/drivers/pci/ats.c
> @@ -325,6 +325,19 @@ int pci_prg_resp_pasid_required(struct pci_dev *pdev)
>  
>   return pdev->pasid_required;
>  }
> +
> +/**
> + * pci_pri_supported - Check if PRI is supported.
> + * @pdev: PCI device structure
> + *
> + * Returns true if PRI capability is present, false otherwise.
> + */
> +bool pci_pri_supported(struct pci_dev *pdev)
> +{
> + /* VFs share the PF PRI configuration */
> + return !!(pci_physfn(pdev)->pri_cap);
> +}
> +EXPORT_SYMBOL_GPL(pci_pri_supported);
>  #endif /* CONFIG_PCI_PRI */
>  
>  #ifdef CONFIG_PCI_PASID
> diff --git a/include/linux/pci-ats.h b/include/linux/pci-ats.h
> index f75c307f346d..df54cd5b15db 100644
> --- a/include/linux/pci-ats.h
> +++ b/include/linux/pci-ats.h
> @@ -28,6 +28,10 @@ int pci_enable_pri(struct pci_dev *pdev, u32 reqs);
>  void pci_disable_pri(struct pci_dev *pdev);
>  int pci_reset_pri(struct pci_dev *pdev);
>  int pci_prg_resp_pasid_required(struct pci_dev *pdev);
> +bool pci_pri_supported(struct pci_dev *pdev);
> +#else
> +static inline bool pci_pri_supported(struct pci_dev *pdev)
> +{ return false; }
>  #endif /* CONFIG_PCI_PRI */
>  
>  #ifdef CONFIG_PCI_PASID
> -- 
> 2.7.4
> 
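The point of pci_pri_supported() is that a VF answers the PRI question with its PF's capability, because VFs do not carry their own PRI capability. A minimal user-space model, assuming a simplified device struct: the model_* names are hypothetical, and model_physfn() imitates what pci_physfn() does (a VF resolves to its PF, anything else to itself).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified device; VFs point at their PF. */
struct model_sriov_dev {
	bool is_virtfn;
	struct model_sriov_dev *physfn;	/* only meaningful when is_virtfn */
	uint16_t pri_cap;		/* PRI capability offset, 0 if absent */
};

/* A VF resolves to its PF; a PF or ordinary function resolves to itself. */
static struct model_sriov_dev *model_physfn(struct model_sriov_dev *dev)
{
	return dev->is_virtfn ? dev->physfn : dev;
}

/* VFs share the PF's PRI configuration, so always ask the PF. */
static bool model_pri_supported(struct model_sriov_dev *dev)
{
	return model_physfn(dev)->pri_cap != 0;
}
```

A direct capability lookup on the VF itself would find nothing, which is why the old pci_find_ext_capability() check in the Intel IOMMU driver missed VFs.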


Re: [PATCH v3 1/1] PCI/ATS: Check PRI supported on the PF device when SRIOV is enabled

2020-07-23 Thread Bjorn Helgaas
On Thu, Jul 23, 2020 at 03:37:29PM -0700, Ashok Raj wrote:
> PASID and PRI capabilities are only enumerated in PF devices. VF devices
> do not enumerate these capabilities. IOMMU drivers also need to enumerate
> them before enabling features in the IOMMU. Extend the same support as
> PASID feature discovery (pci_pasid_features) to PRI.
> 
> Fixes: b16d0cb9e2fc ("iommu/vt-d: Always enable PASID/PRI PCI capabilities before ATS")
> Signed-off-by: Ashok Raj 

This looks right to me, but I would like Joerg's ack before applying
it.

> To: Bjorn Helgaas 
> To: Joerg Roedel 
> To: Lu Baolu 
> Cc: sta...@vger.kernel.org
> Cc: linux-...@vger.kernel.org
> Cc: linux-ker...@vger.kernel.org
> Cc: Ashok Raj 
> Cc: iommu@lists.linux-foundation.org
> ---
> v3: Added Fixes tag
> v2: Fixed build failure reported from lkp when CONFIG_PRI=n
> 
>  drivers/iommu/intel/iommu.c |  2 +-
>  drivers/pci/ats.c   | 13 +
>  include/linux/pci-ats.h |  4 
>  3 files changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index d759e7234e98..276452f5e6a7 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -2560,7 +2560,7 @@ static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu,
>   }
>  
>   if (info->ats_supported && ecap_prs(iommu->ecap) &&
> - pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI))
> + pci_pri_supported(pdev))
>   info->pri_supported = 1;
>   }
>   }
> diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> index b761c1f72f67..2e6cf0c700f7 100644
> --- a/drivers/pci/ats.c
> +++ b/drivers/pci/ats.c
> @@ -325,6 +325,19 @@ int pci_prg_resp_pasid_required(struct pci_dev *pdev)
>  
>   return pdev->pasid_required;
>  }
> +
> +/**
> + * pci_pri_supported - Check if PRI is supported.
> + * @pdev: PCI device structure
> + *
> + * Returns true if PRI capability is present, false otherwise.
> + */
> +bool pci_pri_supported(struct pci_dev *pdev)
> +{
> + /* VFs share the PF PRI configuration */
> + return !!(pci_physfn(pdev)->pri_cap);
> +}
> +EXPORT_SYMBOL_GPL(pci_pri_supported);
>  #endif /* CONFIG_PCI_PRI */
>  
>  #ifdef CONFIG_PCI_PASID
> diff --git a/include/linux/pci-ats.h b/include/linux/pci-ats.h
> index f75c307f346d..df54cd5b15db 100644
> --- a/include/linux/pci-ats.h
> +++ b/include/linux/pci-ats.h
> @@ -28,6 +28,10 @@ int pci_enable_pri(struct pci_dev *pdev, u32 reqs);
>  void pci_disable_pri(struct pci_dev *pdev);
>  int pci_reset_pri(struct pci_dev *pdev);
>  int pci_prg_resp_pasid_required(struct pci_dev *pdev);
> +bool pci_pri_supported(struct pci_dev *pdev);
> +#else
> +static inline bool pci_pri_supported(struct pci_dev *pdev)
> +{ return false; }
>  #endif /* CONFIG_PCI_PRI */
>  
>  #ifdef CONFIG_PCI_PASID
> -- 
> 2.7.4
> 


Re: [PATCH] PCI/ATS: PASID and PRI are only enumerated in PF devices.

2020-07-23 Thread Bjorn Helgaas
On Thu, Jul 23, 2020 at 10:38:19AM -0700, Raj, Ashok wrote:
> Hi Bjorn
> 
> On Tue, Jul 21, 2020 at 09:54:01AM -0500, Bjorn Helgaas wrote:
> > On Mon, Jul 20, 2020 at 09:43:00AM -0700, Ashok Raj wrote:
> > > PASID and PRI capabilities are only enumerated in PF devices. VF devices
> > > do not enumerate these capabilities. IOMMU drivers also need to enumerate
> > > them before enabling features in the IOMMU. Extend the same support as
> > > PASID feature discovery (pci_pasid_features) to PRI.
> > > 
> > > Signed-off-by: Ashok Raj 
> > 
> > Hi Ashok,
> > 
> > When you update this for the 0-day implicit declaration thing, can you
> > update the subject to say what the patch *does*, as opposed to what it
> > is solving?  Also, no need for a period at the end.
> 
> Yes, will update and resend. Goofed up a couple things, I'll update those
> as well.
> 
> > Does this fix a regression?  Is it associated with a commit that we
> > could add as a "Fixes:" tag so we know how far back to try to apply
> > to stable kernels?
> 
> Yes, 

Does that mean "yes, this fixes a regression"?

> but the iommu files moved location and git fixes tags only generates
> for a few handful of commits and doesn't show the old ones. 

Not sure how to interpret the rest of this.  I'm happy to include the
SHA1 of the original commit that added the regression, even if the
file has moved since then.

Bjorn


Re: [PATCH v2 03/12] ACPI/IORT: Make iort_msi_map_rid() PCI agnostic

2020-07-21 Thread Bjorn Helgaas
On Fri, Jun 19, 2020 at 09:20:04AM +0100, Lorenzo Pieralisi wrote:
> There is nothing PCI specific in iort_msi_map_rid().
> 
> Rename the function using a bus protocol agnostic name,
> iort_msi_map_id(), and convert current callers to it.
> 
> Signed-off-by: Lorenzo Pieralisi 
> Cc: Will Deacon 
> Cc: Hanjun Guo 
> Cc: Bjorn Helgaas 
> Cc: Sudeep Holla 
> Cc: Catalin Marinas 
> Cc: Robin Murphy 
> Cc: "Rafael J. Wysocki" 

Acked-by: Bjorn Helgaas 

Sorry I missed this!

> ---
>  drivers/acpi/arm64/iort.c | 12 ++--
>  drivers/pci/msi.c |  2 +-
>  include/linux/acpi_iort.h |  6 +++---
>  3 files changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
> index 902e2aaca946..53f9ef515089 100644
> --- a/drivers/acpi/arm64/iort.c
> +++ b/drivers/acpi/arm64/iort.c
> @@ -568,22 +568,22 @@ static struct acpi_iort_node *iort_find_dev_node(struct device *dev)
>  }
>  
>  /**
> - * iort_msi_map_rid() - Map a MSI requester ID for a device
> + * iort_msi_map_id() - Map a MSI input ID for a device
>   * @dev: The device for which the mapping is to be done.
> - * @req_id: The device requester ID.
> + * @input_id: The device input ID.
>   *
> - * Returns: mapped MSI RID on success, input requester ID otherwise
> + * Returns: mapped MSI ID on success, input ID otherwise
>   */
> -u32 iort_msi_map_rid(struct device *dev, u32 req_id)
> +u32 iort_msi_map_id(struct device *dev, u32 input_id)
>  {
>   struct acpi_iort_node *node;
>   u32 dev_id;
>  
>   node = iort_find_dev_node(dev);
>   if (!node)
> - return req_id;
> + return input_id;
>  
> - iort_node_map_id(node, req_id, _id, IORT_MSI_TYPE);
> + iort_node_map_id(node, input_id, _id, IORT_MSI_TYPE);
>   return dev_id;
>  }
>  
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index 74a91f52ecc0..77f48b95e277 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -1536,7 +1536,7 @@ u32 pci_msi_domain_get_msi_rid(struct irq_domain *domain, struct pci_dev *pdev)
>  
>   of_node = irq_domain_get_of_node(domain);
>   rid = of_node ? of_msi_map_rid(>dev, of_node, rid) :
> - iort_msi_map_rid(>dev, rid);
> + iort_msi_map_id(>dev, rid);
>  
>   return rid;
>  }
> diff --git a/include/linux/acpi_iort.h b/include/linux/acpi_iort.h
> index 08ec6bd2297f..e51425e083da 100644
> --- a/include/linux/acpi_iort.h
> +++ b/include/linux/acpi_iort.h
> @@ -28,7 +28,7 @@ void iort_deregister_domain_token(int trans_id);
>  struct fwnode_handle *iort_find_domain_token(int trans_id);
>  #ifdef CONFIG_ACPI_IORT
>  void acpi_iort_init(void);
> -u32 iort_msi_map_rid(struct device *dev, u32 req_id);
> +u32 iort_msi_map_id(struct device *dev, u32 id);
>  struct irq_domain *iort_get_device_domain(struct device *dev, u32 id,
> enum irq_domain_bus_token bus_token);
>  void acpi_configure_pmsi_domain(struct device *dev);
> @@ -39,8 +39,8 @@ const struct iommu_ops *iort_iommu_configure(struct device *dev);
>  int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head);
>  #else
>  static inline void acpi_iort_init(void) { }
> -static inline u32 iort_msi_map_rid(struct device *dev, u32 req_id)
> -{ return req_id; }
> +static inline u32 iort_msi_map_id(struct device *dev, u32 id)
> +{ return id; }
>  static inline struct irq_domain *iort_get_device_domain(
>   struct device *dev, u32 id, enum irq_domain_bus_token bus_token)
>  { return NULL; }
> -- 
> 2.26.1
> 
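As a rough illustration of what "map an input ID" means after the rename, the sketch below translates an ID through a list of (input base, count, output base) ranges. When there is no mapping table at all, it returns the input ID unchanged, matching the `if (!node) return input_id;` path in the quoted code. Falling back to the input ID when no range matches is a simplification made only for this sketch. The struct and function names are hypothetical. Real IORT parsing walks mappings across several nodes and follows the table's exact field definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified ID mapping range. */
struct model_id_map {
	uint32_t input_base;
	uint32_t count;		/* number of IDs covered by this range */
	uint32_t output_base;
};

/*
 * Map @input_id through @maps. With no table (no node found) or, in this
 * simplified sketch, no matching range, return @input_id unchanged.
 */
static uint32_t model_msi_map_id(const struct model_id_map *maps, size_t n,
				 uint32_t input_id)
{
	if (!maps)
		return input_id;

	for (size_t i = 0; i < n; i++) {
		const struct model_id_map *m = &maps[i];

		if (input_id >= m->input_base &&
		    input_id - m->input_base < m->count)
			return m->output_base + (input_id - m->input_base);
	}
	return input_id;
}
```

Nothing in the lookup depends on the ID being a PCI requester ID, which is the argument for the bus-agnostic name.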


Re: [PATCH] PCI/ATS: PASID and PRI are only enumerated in PF devices.

2020-07-21 Thread Bjorn Helgaas
On Mon, Jul 20, 2020 at 09:43:00AM -0700, Ashok Raj wrote:
> PASID and PRI capabilities are only enumerated in PF devices. VF devices
> do not enumerate these capabilities. IOMMU drivers also need to enumerate
> them before enabling features in the IOMMU. Extend the same support as
> PASID feature discovery (pci_pasid_features) to PRI.
> 
> Signed-off-by: Ashok Raj 

Hi Ashok,

When you update this for the 0-day implicit declaration thing, can you
update the subject to say what the patch *does*, as opposed to what it
is solving?  Also, no need for a period at the end.

Does this fix a regression?  Is it associated with a commit that we
could add as a "Fixes:" tag so we know how far back to try to apply
to stable kernels?

> To: Bjorn Helgaas 
> To: Joerg Roedel 
> To: Lu Baolu 
> Cc: sta...@vger.kernel.org
> Cc: linux-...@vger.kernel.org
> Cc: linux-ker...@vger.kernel.org
> Cc: Ashok Raj 
> Cc: iommu@lists.linux-foundation.org
> ---
>  drivers/iommu/intel/iommu.c |  2 +-
>  drivers/pci/ats.c   | 14 ++
>  include/linux/pci-ats.h |  1 +
>  3 files changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index d759e7234e98..276452f5e6a7 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -2560,7 +2560,7 @@ static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu,
>   }
>  
>   if (info->ats_supported && ecap_prs(iommu->ecap) &&
> - pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI))
> + pci_pri_supported(pdev))
>   info->pri_supported = 1;
>   }
>   }
> diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> index b761c1f72f67..ffb4de8c5a77 100644
> --- a/drivers/pci/ats.c
> +++ b/drivers/pci/ats.c
> @@ -461,6 +461,20 @@ int pci_pasid_features(struct pci_dev *pdev)
>  }
>  EXPORT_SYMBOL_GPL(pci_pasid_features);
>  
> +/**
> + * pci_pri_supported - Check if PRI is supported.
> + * @pdev: PCI device structure
> + *
> + * Returns false when no PRI capability is present.
> + * Returns true if PRI feature is supported and enabled
> + */
> +bool pci_pri_supported(struct pci_dev *pdev)
> +{
> + /* VFs share the PF PRI configuration */
> + return !!(pci_physfn(pdev)->pri_cap);
> +}
> +EXPORT_SYMBOL_GPL(pci_pri_supported);
> +
>  #define PASID_NUMBER_SHIFT   8
>  #define PASID_NUMBER_MASK	(0x1f << PASID_NUMBER_SHIFT)
>  /**
> diff --git a/include/linux/pci-ats.h b/include/linux/pci-ats.h
> index f75c307f346d..073d57292445 100644
> --- a/include/linux/pci-ats.h
> +++ b/include/linux/pci-ats.h
> @@ -28,6 +28,7 @@ int pci_enable_pri(struct pci_dev *pdev, u32 reqs);
>  void pci_disable_pri(struct pci_dev *pdev);
>  int pci_reset_pri(struct pci_dev *pdev);
>  int pci_prg_resp_pasid_required(struct pci_dev *pdev);
> +bool pci_pri_supported(struct pci_dev *pdev);
>  #endif /* CONFIG_PCI_PRI */
>  
>  #ifdef CONFIG_PCI_PASID
> -- 
> 2.7.4
> 


Re: [PATCH v4 4/4] PCI/ACS: Enable PCI_ACS_TB for untrusted/external-facing devices

2020-07-11 Thread Bjorn Helgaas
On Sat, Jul 11, 2020 at 05:08:51PM -0700, Rajat Jain wrote:
> On Sat, Jul 11, 2020 at 12:53 PM Bjorn Helgaas  wrote:
> > On Fri, Jul 10, 2020 at 03:53:59PM -0700, Rajat Jain wrote:
> > > On Fri, Jul 10, 2020 at 2:29 PM Raj, Ashok  wrote:
> > > > On Fri, Jul 10, 2020 at 03:29:22PM -0500, Bjorn Helgaas wrote:
> > > > > On Tue, Jul 07, 2020 at 03:46:04PM -0700, Rajat Jain wrote:
> > > > > > When enabling ACS, enable translation blocking for external facing ports
> > > > > > and untrusted devices.
> > > > > >
> > > > > > Signed-off-by: Rajat Jain 
> > > > > > ---
> > > > > > v4: Add braces to avoid warning from kernel robot
> > > > > > print warning for only external-facing devices.
> > > > > > v3: print warning if ACS_TB not supported on external-facing/untrusted ports.
> > > > > > Minor code comments fixes.
> > > > > > v2: Commit log change
> > > > > >
> > > > > >  drivers/pci/pci.c|  8 
> > > > > >  drivers/pci/quirks.c | 15 +++
> > > > > >  2 files changed, 23 insertions(+)
> > > > > >
> > > > > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > > > > > index 73a8627822140..a5a6bea7af7ce 100644
> > > > > > --- a/drivers/pci/pci.c
> > > > > > +++ b/drivers/pci/pci.c
> > > > > > @@ -876,6 +876,14 @@ static void pci_std_enable_acs(struct pci_dev 
> > > > > > *dev)
> > > > > > /* Upstream Forwarding */
> > > > > > ctrl |= (cap & PCI_ACS_UF);
> > > > > >
> > > > > > +   /* Enable Translation Blocking for external devices */
> > > > > > +   if (dev->external_facing || dev->untrusted) {
> > > > > > +   if (cap & PCI_ACS_TB)
> > > > > > +   ctrl |= PCI_ACS_TB;
> > > > > > +   else if (dev->external_facing)
> > > > > > +   pci_warn(dev, "ACS: No Translation Blocking on external-facing dev\n");
> > > > > > +   }
> > > > >
> > > > > IIUC, this means that external devices can *never* use ATS and
> > > > > can never cache translations.
> > >
> > > Yes, but it already exists today (and this patch doesn't change that):
> > > 521376741b2c2 "PCI/ATS: Only enable ATS for trusted devices"
> > >
> > > IMHO any external device trying to send ATS traffic despite having ATS
> > > disabled should count as a bad intent. And this patch is trying to
> > > plug that loophole, by blocking the AT traffic from devices that we do
> > > not expect to see AT from anyway.
> >
> > Thinking about this some more, I wonder if Linux should:
> >
> >   - Explicitly disable ATS for every device at enumeration-time, e.g.,
> > in pci_init_capabilities(),
> >
> >   - Enable PCI_ACS_TB for every device (not just external-facing or
> > untrusted ones),
> >
> >   - Disable PCI_ACS_TB for the relevant devices along the path only
> > when enabling ATS.
> >
> > One nice thing about doing that is that the "untrusted" test would be
> > only in pci_enable_ats(), and we wouldn't need one in
> > pci_std_enable_acs().
> 
> Yes, this could work.
> 
> I think I had thought about this but I'm blanking out on why I had
> given it up. I think it was because of the possibility that some
> bridges may have "Translation blocking" disabled, even if not all
> their descendants were trusted enough to enable ATS on them. But now
> thinking about this again, as long as we retain the policy of not
> enabling ATS on external devices (and thus enable TB for sure on
> them), this should not be a problem. WDYT?

I think I would feel better if we always enabled Translation Blocking
except when we actually need it for ATS.  But I'm not confident about
how all the pieces of ATS work, so I could be missing something.

> > It's possible BIOS gives us devices with ATS enabled, and this
> > might break them, but that seems like something we'd want to find
> > out about.
> 
> Why would they break? We'd disable ATS on each device as we
> enumerate them, so they'd be functional, just with ATS disabled
> until it is enabled again on internal devices as needed. Which would
> be WAI behavior?

If BIOS handed off with ATS enabled and we somehow relied on it being
already enabled, something might break if we start disabling ATS.
Just a theoretical possibility, doesn't seem likely to me.

Bjorn
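The control-word logic in the quoted pci_std_enable_acs() hunk can be isolated into a small user-space function. The sketch assumes the bit values from the kernel's include/uapi/linux/pci_regs.h (copied locally under model_* names). model_acs_ctrl() and its warn out-parameter are hypothetical: *warn stands in for the pci_warn() call. Only bits the port advertises in cap are ever set, and TB is requested only for external-facing or untrusted devices.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the ACS bit values from pci_regs.h. */
#define MODEL_PCI_ACS_TB 0x0002	/* Translation Blocking */
#define MODEL_PCI_ACS_UF 0x0010	/* Upstream Forwarding */

static uint16_t model_acs_ctrl(uint16_t cap, uint16_t ctrl,
			       bool external_facing, bool untrusted,
			       bool *warn)
{
	*warn = false;

	/* Upstream Forwarding, as in the context lines of the hunk. */
	ctrl |= (cap & MODEL_PCI_ACS_UF);

	/* Enable Translation Blocking for external devices. */
	if (external_facing || untrusted) {
		if (cap & MODEL_PCI_ACS_TB)
			ctrl |= MODEL_PCI_ACS_TB;
		else if (external_facing)
			*warn = true;	/* the patch logs this with pci_warn() */
	}
	return ctrl;
}
```

An untrusted device below a port without TB support gets no warning; only the external-facing port itself is reported, which matches the v4 changelog note.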


Re: [PATCH v4 4/4] PCI/ACS: Enable PCI_ACS_TB for untrusted/external-facing devices

2020-07-11 Thread Bjorn Helgaas
On Fri, Jul 10, 2020 at 03:53:59PM -0700, Rajat Jain wrote:
> On Fri, Jul 10, 2020 at 2:29 PM Raj, Ashok  wrote:
> > On Fri, Jul 10, 2020 at 03:29:22PM -0500, Bjorn Helgaas wrote:
> > > On Tue, Jul 07, 2020 at 03:46:04PM -0700, Rajat Jain wrote:
> > > > When enabling ACS, enable translation blocking for external facing ports
> > > > and untrusted devices.
> > > >
> > > > Signed-off-by: Rajat Jain 
> > > > ---
> > > > v4: Add braces to avoid warning from kernel robot
> > > > print warning for only external-facing devices.
> > > > v3: print warning if ACS_TB not supported on external-facing/untrusted ports.
> > > > Minor code comments fixes.
> > > > v2: Commit log change
> > > >
> > > >  drivers/pci/pci.c|  8 
> > > >  drivers/pci/quirks.c | 15 +++
> > > >  2 files changed, 23 insertions(+)
> > > >
> > > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > > > index 73a8627822140..a5a6bea7af7ce 100644
> > > > --- a/drivers/pci/pci.c
> > > > +++ b/drivers/pci/pci.c
> > > > @@ -876,6 +876,14 @@ static void pci_std_enable_acs(struct pci_dev *dev)
> > > > /* Upstream Forwarding */
> > > > ctrl |= (cap & PCI_ACS_UF);
> > > >
> > > > +   /* Enable Translation Blocking for external devices */
> > > > +   if (dev->external_facing || dev->untrusted) {
> > > > +   if (cap & PCI_ACS_TB)
> > > > +   ctrl |= PCI_ACS_TB;
> > > > +   else if (dev->external_facing)
> > > > +   pci_warn(dev, "ACS: No Translation Blocking on external-facing dev\n");
> > > > +   }
> > >
> > > IIUC, this means that external devices can *never* use ATS and
> > > can never cache translations.
> 
> Yes, but it already exists today (and this patch doesn't change that):
> 521376741b2c2 "PCI/ATS: Only enable ATS for trusted devices"
> 
> IMHO any external device trying to send ATS traffic despite having ATS
> disabled should count as a bad intent. And this patch is trying to
> plug that loophole, by blocking the AT traffic from devices that we do
> not expect to see AT from anyway.

Thinking about this some more, I wonder if Linux should:

  - Explicitly disable ATS for every device at enumeration-time, e.g.,
in pci_init_capabilities(), 

  - Enable PCI_ACS_TB for every device (not just external-facing or
untrusted ones),

  - Disable PCI_ACS_TB for the relevant devices along the path only
when enabling ATS.

One nice thing about doing that is that the "untrusted" test would be
only in pci_enable_ats(), and we wouldn't need one in
pci_std_enable_acs().

It's possible BIOS gives us devices with ATS enabled, and this might
break them, but that seems like something we'd want to find out about.

Bjorn
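A toy model of the policy proposed above, assuming a simple parent-pointer topology. ATS starts disabled and Translation Blocking starts enabled everywhere. Enabling ATS on a trusted device clears TB only on that device and the nodes above it. The names are hypothetical and nothing here is existing kernel API. Real ACS controls live on individual ports, which this model collapses into one flag per node. It only illustrates that the "untrusted" test ends up in a single place.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node in a PCI hierarchy; parent is NULL at the root port. */
struct model_node {
	struct model_node *parent;
	bool untrusted;
	bool ats_enabled;
	bool acs_tb;		/* one Translation Blocking flag per node */
};

/* Enumeration time: ATS off and Translation Blocking on, for every node. */
static void model_enumerate(struct model_node *n)
{
	n->ats_enabled = false;
	n->acs_tb = true;
}

/* The only place in this model that checks "untrusted". */
static bool model_enable_ats(struct model_node *dev)
{
	if (dev->untrusted)
		return false;

	/* Clear TB on the device and on every node above it. */
	for (struct model_node *p = dev; p; p = p->parent)
		p->acs_tb = false;

	dev->ats_enabled = true;
	return true;
}
```

An untrusted sibling under the same switch keeps its own TB flag set, though whether a shared switch port can be handled this simply is the kind of detail raised in the follow-up.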


Re: [PATCH v4 4/4] PCI/ACS: Enable PCI_ACS_TB for untrusted/external-facing devices

2020-07-10 Thread Bjorn Helgaas
On Fri, Jul 10, 2020 at 03:53:59PM -0700, Rajat Jain wrote:
> On Fri, Jul 10, 2020 at 2:29 PM Raj, Ashok  wrote:
> > On Fri, Jul 10, 2020 at 03:29:22PM -0500, Bjorn Helgaas wrote:
> > > On Tue, Jul 07, 2020 at 03:46:04PM -0700, Rajat Jain wrote:
> > > > When enabling ACS, enable translation blocking for external facing ports
> > > > and untrusted devices.
> > > >
> > > > Signed-off-by: Rajat Jain 
> > > > ---
> > > > v4: Add braces to avoid warning from kernel robot
> > > > print warning for only external-facing devices.
> > > > v3: print warning if ACS_TB not supported on external-facing/untrusted ports.
> > > > Minor code comments fixes.
> > > > v2: Commit log change
> > > >
> > > >  drivers/pci/pci.c|  8 
> > > >  drivers/pci/quirks.c | 15 +++
> > > >  2 files changed, 23 insertions(+)
> > > >
> > > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > > > index 73a8627822140..a5a6bea7af7ce 100644
> > > > --- a/drivers/pci/pci.c
> > > > +++ b/drivers/pci/pci.c
> > > > @@ -876,6 +876,14 @@ static void pci_std_enable_acs(struct pci_dev *dev)
> > > > /* Upstream Forwarding */
> > > > ctrl |= (cap & PCI_ACS_UF);
> > > >
> > > > +   /* Enable Translation Blocking for external devices */
> > > > +   if (dev->external_facing || dev->untrusted) {
> > > > +   if (cap & PCI_ACS_TB)
> > > > +   ctrl |= PCI_ACS_TB;
> > > > +   else if (dev->external_facing)
> > > > +   pci_warn(dev, "ACS: No Translation Blocking on external-facing dev\n");
> > > > +   }
> > >
> > > IIUC, this means that external devices can *never* use ATS and
> > > can never cache translations.
> 
> Yes, but it already exists today (and this patch doesn't change that):
> 521376741b2c2 "PCI/ATS: Only enable ATS for trusted devices"

If you get in the habit of using the commit reference style from
Documentation/process/submitting-patches.rst it saves me the trouble
of fixing them.  I use this:

  gsr is aliased to `git --no-pager show -s --abbrev-commit --abbrev=12 --pretty=format:"%h (\"%s\")%n"'

> IMHO any external device trying to send ATS traffic despite having
> ATS disabled should count as a bad intent. And this patch is trying
> to plug that loophole, by blocking the AT traffic from devices that
> we do not expect to see AT from anyway.

That's exactly the sort of assertion I was looking for.  If we can get
something like this explanation into the commit log, and if Ashok and
Alex are OK with this, we'll be much closer.

It sounds like this is just enforcing a restriction we already have,
i.e., enabling PCI_ACS_TB blocks translated requests from devices that
aren't supposed to be generating them.
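The rule described above (Translation Blocking only for external-facing
ports and untrusted devices, with a warning when an external-facing port
lacks TB) can be sketched as a pure function. This is a minimal sketch,
not the kernel code: `struct acs_dev` is a hypothetical stand-in for the
`external_facing`/`untrusted` fields of `struct pci_dev`, and the bit
values are the ACS ones from `include/uapi/linux/pci_regs.h`.

```c
#include <stdbool.h>
#include <stdint.h>

/* ACS Capability/Control bits, as in include/uapi/linux/pci_regs.h */
#define PCI_ACS_SV 0x0001 /* Source Validation */
#define PCI_ACS_TB 0x0002 /* Translation Blocking */
#define PCI_ACS_RR 0x0004 /* P2P Request Redirect */
#define PCI_ACS_CR 0x0008 /* P2P Completion Redirect */
#define PCI_ACS_UF 0x0010 /* Upstream Forwarding */

/* Hypothetical stand-in for the relevant struct pci_dev fields. */
struct acs_dev {
	bool external_facing;
	bool untrusted;
};

/*
 * Return the new ACS Control value: SV/RR/CR/UF whenever the hardware
 * supports them, plus TB for external-facing ports and untrusted devices.
 * *warn is set when an external-facing port cannot do TB.
 */
static uint16_t acs_ctrl_from_cap(const struct acs_dev *dev, uint16_t cap,
				  uint16_t ctrl, bool *warn)
{
	*warn = false;
	ctrl |= cap & (PCI_ACS_SV | PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF);

	if (dev->external_facing || dev->untrusted) {
		if (cap & PCI_ACS_TB)
			ctrl |= PCI_ACS_TB;
		else if (dev->external_facing)
			*warn = true;
	}
	return ctrl;
}
```

For example, an external-facing port whose capability register reads 0x1f
ends up with all five bits set, while an internal port with the same
capability gets 0x1d (everything except TB).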

> Do you see any case where this is not true?
> 
> > > And (I guess, I'm not an expert) it can
> > > also never use the Page Request Services?
> >
> > Yep, sounds like it.
> 
> Yes, from spec "Address Translation Services" Rev 1.1:
> "...a device that supports ATS need not support PRI, but PRI is
> dependent on ATS’s capabilities."
> (So no ATS = No PRI).
> 
> > > Is this what we want?  Do we have any idea how many external
> > > devices this will affect or how much of a performance impact
> > > they will see?
> > >
> > > Do we need some kind of override or mechanism to authenticate
> > > certain devices so they can use ATS and PRI?
> >
> > Sounds like we would need some form of an allow-list to start with
> > so we can have something in the interim.
> 
> I assume what is being referred to, is an escape hatch to enable ATS
> on certain given "external-facing" ports (and devices downstream on
> that port). Do we really think a *per-port* control for ATS may be
> needed? I can add if there is consensus about this.
> 
> > I suppose a future platform might have a facility to ensure ATS is
> > secure and authenticated; then we could enable it for all devices in
> > the system, in addition to PCI CMA/IDE.
> >
> > I think we should have a global override to enable all devices so the
> > platform can switch back to the current behavior, maybe via a cmdline
> > switch. As much as we have a billion of those, it still gives an
> > option in case someone needs it.
> 
> Currently:
> 
> pci.noats => No ATS on all PCI devices.
> (Absence

Re: [PATCH v4 1/4] PCI: Move pci_enable_acs() and its dependencies up in pci.c

2020-07-10 Thread Bjorn Helgaas
On Tue, Jul 07, 2020 at 03:46:01PM -0700, Rajat Jain wrote:
> Move pci_enable_acs() and the functions it depends on, further up in the
> source code to avoid having to forward declare it when we make it static
> in near future (next patch).
> 
> No functional changes intended.
> 
> Signed-off-by: Rajat Jain 

Applied patches 1-3 to pci/enumeration for v5.9, thanks!

I held off on patch 4 (enabling PCI_ACS_TB) until we have a little
more conversation on the impact of it.

> ---
> v4: Same as v3
> v3: Initial version of the patch, created per Bjorn's suggestion
> 
>  drivers/pci/pci.c | 254 +++---
>  1 file changed, 127 insertions(+), 127 deletions(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index ce096272f52b1..eec625f0e594e 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -777,6 +777,133 @@ int pci_wait_for_pending(struct pci_dev *dev, int pos, u16 mask)
>   return 0;
>  }
>  
> +static int pci_acs_enable;
> +
> +/**
> + * pci_request_acs - ask for ACS to be enabled if supported
> + */
> +void pci_request_acs(void)
> +{
> + pci_acs_enable = 1;
> +}
> +
> +static const char *disable_acs_redir_param;
> +
> +/**
> + * pci_disable_acs_redir - disable ACS redirect capabilities
> + * @dev: the PCI device
> + *
> + * For only devices specified in the disable_acs_redir parameter.
> + */
> +static void pci_disable_acs_redir(struct pci_dev *dev)
> +{
> + int ret = 0;
> + const char *p;
> + int pos;
> + u16 ctrl;
> +
> + if (!disable_acs_redir_param)
> + return;
> +
> + p = disable_acs_redir_param;
> + while (*p) {
> + ret = pci_dev_str_match(dev, p, &p);
> + if (ret < 0) {
> + pr_info_once("PCI: Can't parse disable_acs_redir parameter: %s\n",
> +  disable_acs_redir_param);
> +
> + break;
> + } else if (ret == 1) {
> + /* Found a match */
> + break;
> + }
> +
> + if (*p != ';' && *p != ',') {
> + /* End of param or invalid format */
> + break;
> + }
> + p++;
> + }
> +
> + if (ret != 1)
> + return;
> +
> + if (!pci_dev_specific_disable_acs_redir(dev))
> + return;
> +
> + pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ACS);
> + if (!pos) {
> + pci_warn(dev, "cannot disable ACS redirect for this hardware as it does not have ACS capabilities\n");
> + return;
> + }
> +
> + pci_read_config_word(dev, pos + PCI_ACS_CTRL, &ctrl);
> +
> + /* P2P Request & Completion Redirect */
> + ctrl &= ~(PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_EC);
> +
> + pci_write_config_word(dev, pos + PCI_ACS_CTRL, ctrl);
> +
> + pci_info(dev, "disabled ACS redirect\n");
> +}
> +
> +/**
> + * pci_std_enable_acs - enable ACS on devices using standard ACS capabilities
> + * @dev: the PCI device
> + */
> +static void pci_std_enable_acs(struct pci_dev *dev)
> +{
> + int pos;
> + u16 cap;
> + u16 ctrl;
> +
> + pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ACS);
> + if (!pos)
> + return;
> +
> + pci_read_config_word(dev, pos + PCI_ACS_CAP, &cap);
> + pci_read_config_word(dev, pos + PCI_ACS_CTRL, &ctrl);
> +
> + /* Source Validation */
> + ctrl |= (cap & PCI_ACS_SV);
> +
> + /* P2P Request Redirect */
> + ctrl |= (cap & PCI_ACS_RR);
> +
> + /* P2P Completion Redirect */
> + ctrl |= (cap & PCI_ACS_CR);
> +
> + /* Upstream Forwarding */
> + ctrl |= (cap & PCI_ACS_UF);
> +
> + pci_write_config_word(dev, pos + PCI_ACS_CTRL, ctrl);
> +}
> +
> +/**
> + * pci_enable_acs - enable ACS if hardware support it
> + * @dev: the PCI device
> + */
> +void pci_enable_acs(struct pci_dev *dev)
> +{
> + if (!pci_acs_enable)
> + goto disable_acs_redir;
> +
> + if (!pci_dev_specific_enable_acs(dev))
> + goto disable_acs_redir;
> +
> + pci_std_enable_acs(dev);
> +
> +disable_acs_redir:
> + /*
> +  * Note: pci_disable_acs_redir() must be called even if ACS was not
> +  * enabled by the kernel because it may have been enabled by
> +  * platform firmware.  So if we are told to disable it, we should
> +  * always disable it after setting the kernel's default
> +  * preferences.
> +  */
> + pci_disable_acs_redir(dev);
> +}
> +
>  /**
>   * pci_restore_bars - restore a device's BAR values (e.g. after wake-up)
>   * @dev: PCI device to have its BARs restored
> @@ -3230,133 +3357,6 @@ void pci_configure_ari(struct pci_dev *dev)
>   }
>  }
>  
> -static int pci_acs_enable;
> -
> -/**
> - * pci_request_acs - ask for ACS to be enabled if supported
> - */
> -void pci_request_acs(void)
> -{
> - pci_acs_enable = 1;
> -}
> -
> -static const char *disable_acs_redir_param;
> -
> -/**
> - * pci_disable_acs_redir - 

Re: [PATCH v4 4/4] PCI/ACS: Enable PCI_ACS_TB for untrusted/external-facing devices

2020-07-10 Thread Bjorn Helgaas
On Tue, Jul 07, 2020 at 03:46:04PM -0700, Rajat Jain wrote:
> When enabling ACS, enable translation blocking for external facing ports
> and untrusted devices.
> 
> Signed-off-by: Rajat Jain 
> ---
> v4: Add braces to avoid warning from kernel robot
> print warning for only external-facing devices.
> v3: print warning if ACS_TB not supported on external-facing/untrusted ports.
> Minor code comments fixes.
> v2: Commit log change
> 
>  drivers/pci/pci.c|  8 
>  drivers/pci/quirks.c | 15 +++
>  2 files changed, 23 insertions(+)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 73a8627822140..a5a6bea7af7ce 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -876,6 +876,14 @@ static void pci_std_enable_acs(struct pci_dev *dev)
>   /* Upstream Forwarding */
>   ctrl |= (cap & PCI_ACS_UF);
>  
> + /* Enable Translation Blocking for external devices */
> + if (dev->external_facing || dev->untrusted) {
> + if (cap & PCI_ACS_TB)
> + ctrl |= PCI_ACS_TB;
> + else if (dev->external_facing)
> + pci_warn(dev, "ACS: No Translation Blocking on external-facing dev\n");
> + }

IIUC, this means that external devices can *never* use ATS and can
never cache translations.  And (I guess, I'm not an expert) it can
also never use the Page Request Services?

Is this what we want?  Do we have any idea how many external devices
this will affect or how much of a performance impact they will see?

Do we need some kind of override or mechanism to authenticate certain
devices so they can use ATS and PRI?

If we do decide this is the right thing to do, I think we need to
expand the commit log a bit, because this is potentially a significant
user-visible change.
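If an override were wanted, one possible shape is an allow-list consulted
before ATS is enabled on an untrusted device, plus a global switch. The
sketch below is hypothetical: `struct ats_dev`, `struct ats_allow`, and
the `global_override` parameter do not exist in the kernel or in this
patch. It also encodes the point that PRI depends on ATS, so blocking ATS
implicitly rules out PRI.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device identity and state; not the kernel's struct pci_dev. */
struct ats_dev {
	uint16_t vendor, device;
	bool untrusted;
	bool ats_enabled;
};

/* Hypothetical allow-list entry: devices permitted ATS despite being external. */
struct ats_allow {
	uint16_t vendor, device;
};

/* Trusted devices always qualify; untrusted ones only via override or list. */
static bool ats_allowed(const struct ats_dev *dev, const struct ats_allow *list,
			size_t n, bool global_override)
{
	if (!dev->untrusted || global_override)
		return true;
	for (size_t i = 0; i < n; i++)
		if (list[i].vendor == dev->vendor && list[i].device == dev->device)
			return true;
	return false;
}

/* PRI is dependent on ATS, so it is only usable once ATS is enabled. */
static bool pri_allowed(const struct ats_dev *dev)
{
	return dev->ats_enabled;
}
```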

>   pci_write_config_word(dev, pos + PCI_ACS_CTRL, ctrl);
>  }
>  
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index b341628e47527..bb22b46c1d719 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -4934,6 +4934,13 @@ static void pci_quirk_enable_intel_rp_mpc_acs(struct pci_dev *dev)
>   }
>  }
>  
> +/*
> + * Currently this quirk does the equivalent of
> + * PCI_ACS_SV | PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF
> + *
> + * TODO: This quirk also needs to do equivalent of PCI_ACS_TB,
> + * if dev->external_facing || dev->untrusted
> + */
>  static int pci_quirk_enable_intel_pch_acs(struct pci_dev *dev)
>  {
>   if (!pci_quirk_intel_pch_acs_match(dev))
> @@ -4973,6 +4980,14 @@ static int pci_quirk_enable_intel_spt_pch_acs(struct pci_dev *dev)
>   ctrl |= (cap & PCI_ACS_CR);
>   ctrl |= (cap & PCI_ACS_UF);
>  
> + /* Enable Translation Blocking for external devices */
> + if (dev->external_facing || dev->untrusted) {
> + if (cap & PCI_ACS_TB)
> + ctrl |= PCI_ACS_TB;
> + else if (dev->external_facing)
> + pci_warn(dev, "ACS: No Translation Blocking on external-facing dev\n");
> + }
> +
>   pci_write_config_dword(dev, pos + INTEL_SPT_ACS_CTRL, ctrl);
>  
>   pci_info(dev, "Intel SPT PCH root port ACS workaround enabled\n");
> -- 
> 2.27.0.212.ge8ba1cc988-goog
> 


Re: [PATCH RESEND v2] PCI: Add device even if driver attach failed

2020-07-07 Thread Bjorn Helgaas
On Mon, Jul 06, 2020 at 04:32:40PM -0700, Rajat Jain wrote:
> device_attach() returning failure indicates a driver error while trying to
> probe the device. In such a scenario, the PCI device should still be added
> in the system and be visible to the user.
> 
> This patch partially reverts:
> commit ab1a187bba5c ("PCI: Check device_attach() return value always")
> 
> Signed-off-by: Rajat Jain 
> Reviewed-by: Greg Kroah-Hartman 
> ---
> Resending to stable, independent from other patches per Greg's suggestion
> v2: Add Greg's reviewed by, fix commit log

Applied to pci/enumeration for v5.8 with stable tag, thanks!
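For reference, the post-patch control flow of pci_bus_add_device() can be
modeled as follows. This is a simplified sketch: `struct model_dev` and
`model_bus_add_device()` are hypothetical, `attach_result` stands in for
the return value of device_attach(), and `EPROBE_DEFER` is defined
locally because it is a kernel-internal errno value (517) not exported to
user space.

```c
#include <stdbool.h>
#include <stdio.h>

/* Kernel-internal errno value, not available from user-space <errno.h>. */
#define EPROBE_DEFER 517

/* Hypothetical model of the state pci_bus_add_device() manipulates. */
struct model_dev {
	bool match_driver;
	bool added;
};

/*
 * A failed driver probe is only warned about; the device is still marked
 * as added, so it stays visible (sysfs, /proc) to the user.
 */
static void model_bus_add_device(struct model_dev *dev, int attach_result)
{
	dev->match_driver = true;
	if (attach_result < 0 && attach_result != -EPROBE_DEFER)
		fprintf(stderr, "device attach failed (%d)\n", attach_result);
	dev->added = true;
}
```

Before the patch, the failure branch also removed the sysfs/proc entries
and returned early, so `added` was never set.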

>  drivers/pci/bus.c | 6 +-
>  1 file changed, 1 insertion(+), 5 deletions(-)
> 
> diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
> index 8e40b3e6da77d..3cef835b375fd 100644
> --- a/drivers/pci/bus.c
> +++ b/drivers/pci/bus.c
> @@ -322,12 +322,8 @@ void pci_bus_add_device(struct pci_dev *dev)
>  
>   dev->match_driver = true;
>   retval = device_attach(&dev->dev);
> - if (retval < 0 && retval != -EPROBE_DEFER) {
> + if (retval < 0 && retval != -EPROBE_DEFER)
>   pci_warn(dev, "device attach failed (%d)\n", retval);
> - pci_proc_detach_device(dev);
> - pci_remove_sysfs_dev_files(dev);
> - return;
> - }
>  
>   pci_dev_assign_added(dev, true);
>  }
> -- 
> 2.27.0.212.ge8ba1cc988-goog
> 


Re: [PATCH v2 2/7] PCI: Set "untrusted" flag for truly external devices only

2020-07-06 Thread Bjorn Helgaas
On Mon, Jul 06, 2020 at 03:31:47PM -0700, Rajat Jain wrote:
> On Mon, Jul 6, 2020 at 9:38 AM Bjorn Helgaas  wrote:
> > On Mon, Jun 29, 2020 at 09:49:38PM -0700, Rajat Jain wrote:

> > > -static void pci_acpi_set_untrusted(struct pci_dev *dev)
> > > +static void pci_acpi_set_external_facing(struct pci_dev *dev)
> > >  {
> > >   u8 val;
> > >
> > > - if (pci_pcie_type(dev) != PCI_EXP_TYPE_ROOT_PORT)
> > > + if (pci_pcie_type(dev) != PCI_EXP_TYPE_ROOT_PORT &&
> > > + pci_pcie_type(dev) != PCI_EXP_TYPE_DOWNSTREAM)
> >
> > This looks like a change worthy of its own patch.  We used to look for
> > "ExternalFacingPort" only on Root Ports; now we'll also do it for
> > Switch Downstream Ports.
> 
> Can do. (please see below)
> 
> > Can you include DT and ACPI spec references if they exist?  I found
> > this mention:
> > https://docs.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-root-ports
> > which actually says it should only be implemented for Root Ports.
> 
> I actually have no references. It seems to me that the microsoft spec
> assumes that all external ports must be implemented on root ports, but
> I think it would be equally fair for systems with PCIe switches to
> implement one on one of their switch downstream ports. I don't have an
> immediate use of this anyway, so if you think this should rather wait
> unless someone really has this case, this can wait. Let me know.

I agree that it "makes sense" to pay attention to this property no
matter where it appears, but since that Microsoft doc went to the
trouble to restrict it to Root Ports, I think we should leave this
as-is and only look for it in the Root Port.  Otherwise Linux will
accept something Windows will reject, and that seems like a needless
difference.

We can at least include the above link to the Microsoft doc in the
commit log.
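Under that decision, the property check amounts to the predicate sketched
below. It is a minimal model, assuming `has_prop`/`val` stand in for a
successful device_property_read_u8() of "ExternalFacingPort"; the type
constants are the PCIe Device/Port Type values from
`include/uapi/linux/pci_regs.h`.

```c
#include <stdbool.h>
#include <stdint.h>

/* PCIe Device/Port Type values, as in include/uapi/linux/pci_regs.h */
#define PCI_EXP_TYPE_ROOT_PORT  0x4
#define PCI_EXP_TYPE_DOWNSTREAM 0x6

/*
 * Only Root Ports may be marked external-facing; the property is ignored
 * anywhere else, so Linux does not accept firmware that Windows rejects.
 */
static bool marks_external_facing(int pcie_type, bool has_prop, uint8_t val)
{
	if (pcie_type != PCI_EXP_TYPE_ROOT_PORT)
		return false;
	return has_prop && val;
}
```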

> > It also mentions a "DmaProperty" that looks related.  Maybe Linux
> > should also pay attention to this?
> 
> Interesting. Since this is not in use currently by the kernel as well
> as not exposed by (our) BIOS, I don't have an immediate use case for
> this. I'd like to defer this for later (as-the-need-arises).

I agree, you can defer this until you see a need for it.  I just
pointed it out in case it would be useful to you.

> > > + /*
> > > +  * Devices are marked as external-facing using info from platform
> > > +  * (ACPI / devicetree). An external-facing device is still an internal
> > > +  * trusted device, but it faces external untrusted devices. Thus any
> > > +  * devices enumerated downstream an external-facing device is marked
> > > +  * as untrusted.
> >
> > This comment has a subject/verb agreement problem.
> 
> I assume you meant s/is/are/ in last sentence. Will do.

Right.  There's also something wrong with "enumerated downstream an".


Re: [PATCH v2 1/7] PCI: Keep the ACS capability offset in device

2020-07-06 Thread Bjorn Helgaas
On Mon, Jul 06, 2020 at 03:16:42PM -0700, Rajat Jain wrote:
> On Mon, Jul 6, 2020 at 8:58 AM Bjorn Helgaas  wrote:
> > On Mon, Jun 29, 2020 at 09:49:37PM -0700, Rajat Jain wrote:

> > > +static void pci_enable_acs(struct pci_dev *dev);
> >
> > I don't think we need this forward declaration, do we?
> 
> We need it unless we move its definition further up in the file:
> 
> drivers/pci/pci.c: In function ‘pci_restore_state’:
> drivers/pci/pci.c:1551:2: error: implicit declaration of function
> ‘pci_enable_acs’; did you mean ‘pci_enable_ats’?
> [-Werror=implicit-function-declaration]
>  1551 |  pci_enable_acs(dev);
> 
> Do you want me to move it up in the file so that we do not need the
> forward declaration?

Yes, please move it.  Maybe a preliminary patch that moves it but
doesn't change anything else.

I think I thought you had renamed the function, in which case you
could tell from the patch itself.  But I was mistaken!
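Moving the definition removes the need for the prototype because C
requires a function to be declared before its first call; with
-Werror=implicit-function-declaration a missing declaration is a hard
error. A minimal sketch with hypothetical names (`enable_feature()`
standing in for pci_enable_acs(), `restore_state()` for
pci_restore_state()):

```c
/* Helper defined before its first caller, so no forward declaration. */
static int enable_feature(int *state)
{
	*state = 1;
	return 0;
}

/* Caller placed after the helper; the definition doubles as declaration. */
static int restore_state(int *state)
{
	return enable_feature(state);
}
```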

> > > @@ -4653,7 +4653,7 @@ static int pci_quirk_intel_spt_pch_acs(struct pci_dev *dev, u16 acs_flags)
> > >   if (!pci_quirk_intel_spt_pch_acs_match(dev))
> > >   return -ENOTTY;
> > >
> > > - pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ACS);
> > > + pos = dev->acs_cap;
> >
> > I assume you verified that all these quirks are FINAL quirks, since
> > pci_init_capabilities() is called after HEADER quirks.  I'll
> > double-check before applying this.
> 
> None of these quirks are applied via DECLARE_PCI_FIXUP_*(). All these
> quirks are called (directly or indirectly) from either
> pci_enable_acs() or pci_acs_enabled(),
> 
> EXCEPT
> 
> pci_idt_bus_quirk(). That one is called from
> pci_bus_read_dev_vendor_id() which should be called only after the
> parent bridge has been added and setup correctly.
> 
> So it looks all good to me.

Great, thanks for checking that.

Re: [PATCH v2 3/7] PCI/ACS: Enable PCI_ACS_TB for untrusted/external-facing devices

2020-07-06 Thread Bjorn Helgaas
On Mon, Jun 29, 2020 at 09:49:39PM -0700, Rajat Jain wrote:
> When enabling ACS, enable translation blocking for external facing ports
> and untrusted devices.
> 
> Signed-off-by: Rajat Jain 
> ---
> v2: Commit log change 
> 
>  drivers/pci/pci.c|  4 
>  drivers/pci/quirks.c | 11 +++
>  2 files changed, 15 insertions(+)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index d2ff987585855..79853b52658a2 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -3330,6 +3330,10 @@ static void pci_std_enable_acs(struct pci_dev *dev)
>   /* Upstream Forwarding */
>   ctrl |= (cap & PCI_ACS_UF);
>  
> + if (dev->external_facing || dev->untrusted)
> + /* Translation Blocking */
> + ctrl |= (cap & PCI_ACS_TB);
> +
>   pci_write_config_word(dev, pos + PCI_ACS_CTRL, ctrl);
>  }
>  
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index b341628e47527..6294adeac4049 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -4934,6 +4934,13 @@ static void pci_quirk_enable_intel_rp_mpc_acs(struct pci_dev *dev)
>   }
>  }
>  
> +/*
> + * Currently this quirk does the equivalent of
> + * PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF | PCI_ACS_SV

Nit: Reorder these as in c8de8ed2dcaa ("PCI: Make ACS quirk
implementations more uniform") so they match other similar lists in
the code.

But more to the point: we have a bunch of other quirks for devices
that do not have an ACS capability but *do* provide some ACS-like
features.  Most of them support

  PCI_ACS_SV | PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF

because that's what we usually want.  But I bet some of them also
actually provide the equivalent of PCI_ACS_TB.

REQ_ACS_FLAGS doesn't include PCI_ACS_TB.  Is there anything we need
to do on the pci_acs_enabled() side to check for PCI_ACS_TB, and
consequently, to update any of the quirks for devices that provide it?

> + *
> + * Currently missing, it also needs to do equivalent of PCI_ACS_TB,
> + * if dev->external_facing || dev->untrusted
> + */
>  static int pci_quirk_enable_intel_pch_acs(struct pci_dev *dev)
>  {
>   if (!pci_quirk_intel_pch_acs_match(dev))
> @@ -4973,6 +4980,10 @@ static int pci_quirk_enable_intel_spt_pch_acs(struct pci_dev *dev)
>   ctrl |= (cap & PCI_ACS_CR);
>   ctrl |= (cap & PCI_ACS_UF);
>  
> + if (dev->external_facing || dev->untrusted)
> + /* Translation Blocking */
> + ctrl |= (cap & PCI_ACS_TB);
> +
>   pci_write_config_dword(dev, pos + INTEL_SPT_ACS_CTRL, ctrl);
>  
>   pci_info(dev, "Intel SPT PCH root port ACS workaround enabled\n");
> -- 
> 2.27.0.212.ge8ba1cc988-goog
> 


Re: [PATCH v2 3/7] PCI/ACS: Enable PCI_ACS_TB for untrusted/external-facing devices

2020-07-06 Thread Bjorn Helgaas
On Mon, Jun 29, 2020 at 09:49:39PM -0700, Rajat Jain wrote:
> When enabling ACS, enable translation blocking for external facing ports
> and untrusted devices.
> 
> Signed-off-by: Rajat Jain 
> ---
> v2: Commit log change 
> 
>  drivers/pci/pci.c|  4 
>  drivers/pci/quirks.c | 11 +++
>  2 files changed, 15 insertions(+)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index d2ff987585855..79853b52658a2 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -3330,6 +3330,10 @@ static void pci_std_enable_acs(struct pci_dev *dev)
>   /* Upstream Forwarding */
>   ctrl |= (cap & PCI_ACS_UF);
>  
> + if (dev->external_facing || dev->untrusted)
> + /* Translation Blocking */
> + ctrl |= (cap & PCI_ACS_TB);
> +
>   pci_write_config_word(dev, pos + PCI_ACS_CTRL, ctrl);
>  }
>  
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index b341628e47527..6294adeac4049 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -4934,6 +4934,13 @@ static void pci_quirk_enable_intel_rp_mpc_acs(struct pci_dev *dev)
>   }
>  }
>  
> +/*
> + * Currently this quirk does the equivalent of
> + * PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF | PCI_ACS_SV
> + *
> + * Currently missing, it also needs to do equivalent of PCI_ACS_TB,
> + * if dev->external_facing || dev->untrusted

I don't understand this comment.  Is this a "TODO"?  Is there
something more that needs to be done here?

After a patch is applied, a comment should describe the code as it is.

> + */
>  static int pci_quirk_enable_intel_pch_acs(struct pci_dev *dev)
>  {
>   if (!pci_quirk_intel_pch_acs_match(dev))
> @@ -4973,6 +4980,10 @@ static int pci_quirk_enable_intel_spt_pch_acs(struct pci_dev *dev)
>   ctrl |= (cap & PCI_ACS_CR);
>   ctrl |= (cap & PCI_ACS_UF);
>  
> + if (dev->external_facing || dev->untrusted)
> + /* Translation Blocking */
> + ctrl |= (cap & PCI_ACS_TB);
> +
>   pci_write_config_dword(dev, pos + INTEL_SPT_ACS_CTRL, ctrl);
>  
>   pci_info(dev, "Intel SPT PCH root port ACS workaround enabled\n");
> -- 
> 2.27.0.212.ge8ba1cc988-goog
> 


Re: [PATCH v2 2/7] PCI: Set "untrusted" flag for truly external devices only

2020-07-06 Thread Bjorn Helgaas
On Tue, Jun 30, 2020 at 09:55:54AM +0200, Greg Kroah-Hartman wrote:
> On Mon, Jun 29, 2020 at 09:49:38PM -0700, Rajat Jain wrote:
> > The "ExternalFacing" devices (root ports) are still internal devices that
> > sit on the internal system fabric and thus trusted. Currently they were
> > being marked untrusted.
> > 
> > This patch uses the platform flag to identify the external facing devices
> > and then use it to mark any downstream devices as "untrusted". The
> > external-facing devices themselves are left as "trusted". This was
> > discussed here: https://lkml.org/lkml/2020/6/10/1049
> 
> {sigh}
> 
> First off, please use lore.kernel.org links, we don't control lkml.org
> and it often times has been down.
> 
> Also, you need to put all of the information in the changelog, referring
> to another place isn't always the best thing, considering you will be
> looking this up in 20+ years to try to figure out why people came up
> with such a crazy design.
> 
> But, the main point is, no, we did not decide on this.  "trust" is a
> policy decision to make by userspace, it is independent of "location",
> while you are tying it directly here, which is what I explicitly said
> NOT to do.
> 
> So again, no, I will NAK this patch as-is, sorry, you are mixing things
> together in a way that it should not do at this point in time.

What do you see being mixed together here?  I acknowledge that the
name of "pdev->untrusted" is probably a mistake.  But this patch
doesn't change anything there.  It only changes the treatment of the
edge case of the "ExternalFacing" ports.  Previously we treated them
as being external themselves, which does seem wrong.
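The post-patch propagation rule can be sketched as below, modeled on
set_pcie_untrusted() as changed by this series. `struct node` is a
hypothetical minimal device with only a parent pointer and the two
flags; devices must be visited parent-first, as enumeration does.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimal device node; parent is the upstream bridge. */
struct node {
	struct node *parent;
	bool external_facing; /* set from firmware (ACPI/DT) on the port itself */
	bool untrusted;       /* derived during enumeration */
};

/*
 * An external-facing port stays trusted; a device is untrusted if its
 * parent is external-facing or is itself untrusted.
 */
static void set_untrusted(struct node *dev)
{
	struct node *parent = dev->parent;

	if (parent && (parent->untrusted || parent->external_facing))
		dev->untrusted = true;
}
```

With a Root Port marked external-facing, the port itself stays trusted
while a Switch and an Endpoint below it both become untrusted.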


Re: [PATCH v2 2/7] PCI: Set "untrusted" flag for truly external devices only

2020-07-06 Thread Bjorn Helgaas
On Mon, Jun 29, 2020 at 09:49:38PM -0700, Rajat Jain wrote:
> The "ExternalFacing" devices (root ports) are still internal devices that
> sit on the internal system fabric and thus trusted. Currently they were
> being marked untrusted.
> 
> This patch uses the platform flag to identify the external facing devices
> and then use it to mark any downstream devices as "untrusted". The
> external-facing devices themselves are left as "trusted". This was
> discussed here: https://lkml.org/lkml/2020/6/10/1049

Use the imperative mood in the commit log, as you did for 1/7.  E.g.,
instead of "This patch uses ...", say "Use the platform flag ...".
That helps all the commit logs read nicely together.

I think this patch makes two changes that should be separated:

  - Treat "external-facing" devices as internal.

  - Look for the "external-facing" or "ExternalFacing" property on
Switch Downstream Ports as well as Root Ports.

> Signed-off-by: Rajat Jain 
> ---
> v2: cosmetic changes in commit log
> 
>  drivers/iommu/intel/iommu.c |  2 +-
>  drivers/pci/of.c|  2 +-
>  drivers/pci/pci-acpi.c  | 13 +++--
>  drivers/pci/probe.c |  2 +-
>  include/linux/pci.h |  8 
>  5 files changed, 18 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index d759e7234e982..1ccb224f82496 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -4743,7 +4743,7 @@ static inline bool has_untrusted_dev(void)
>   struct pci_dev *pdev = NULL;
>  
>   for_each_pci_dev(pdev)
> - if (pdev->untrusted)
> + if (pdev->untrusted || pdev->external_facing)

I think checking pdev->external_facing is enough for this case,
because it's impossible to have pdev->untrusted unless a parent has
pdev->external_facing.

IIUC, this usage is asking "might we ever have an external device?"
as opposed to the "pdev->untrusted" uses, which are asking "is *this*
device an external device?"
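In other words, the system-wide question only needs the external_facing
flag. A minimal sketch, assuming a hypothetical flattened array of
per-device flags in place of for_each_pci_dev():

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flattened view of enumerated devices. */
struct dev_flags {
	bool external_facing;
	bool untrusted;
};

/*
 * "Might we ever have an external device?" Since untrusted is only set
 * below an external-facing port, scanning for external_facing suffices,
 * even before anything is plugged in.
 */
static bool has_external_facing_port(const struct dev_flags *devs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (devs[i].external_facing)
			return true;
	return false;
}
```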

>   return true;
>  
>   return false;
> diff --git a/drivers/pci/of.c b/drivers/pci/of.c
> index 27839cd2459f6..22727fc9558df 100644
> --- a/drivers/pci/of.c
> +++ b/drivers/pci/of.c
> @@ -42,7 +42,7 @@ void pci_set_bus_of_node(struct pci_bus *bus)
>   } else {
>   node = of_node_get(bus->self->dev.of_node);
>   if (node && of_property_read_bool(node, "external-facing"))
> - bus->self->untrusted = true;
> + bus->self->external_facing = true;
>   }
>  
>   bus->dev.of_node = node;
> diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
> index 7224b1e5f2a83..492c07805caf8 100644
> --- a/drivers/pci/pci-acpi.c
> +++ b/drivers/pci/pci-acpi.c
> @@ -1213,22 +1213,23 @@ static void pci_acpi_optimize_delay(struct pci_dev *pdev,
>   ACPI_FREE(obj);
>  }
>  
> -static void pci_acpi_set_untrusted(struct pci_dev *dev)
> +static void pci_acpi_set_external_facing(struct pci_dev *dev)
>  {
>   u8 val;
>  
> - if (pci_pcie_type(dev) != PCI_EXP_TYPE_ROOT_PORT)
> + if (pci_pcie_type(dev) != PCI_EXP_TYPE_ROOT_PORT &&
> + pci_pcie_type(dev) != PCI_EXP_TYPE_DOWNSTREAM)

This looks like a change worthy of its own patch.  We used to look for
"ExternalFacingPort" only on Root Ports; now we'll also do it for
Switch Downstream Ports.

Can you include DT and ACPI spec references if they exist?  I found
this mention:
https://docs.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-root-ports
which actually says it should only be implemented for Root Ports.

It also mentions a "DmaProperty" that looks related.  Maybe Linux
should also pay attention to this?

If we do change this, should we use pcie_downstream_port(), which
includes PCI-to-PCIe bridges as well?
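For comparison, the set of port types that pcie_downstream_port()
accepts can be sketched as follows; the constants are from
`include/uapi/linux/pci_regs.h`, and `is_downstream_port()` is a
hypothetical local copy, not the kernel helper itself.

```c
#include <stdbool.h>

/* PCIe Device/Port Type values, as in include/uapi/linux/pci_regs.h */
#define PCI_EXP_TYPE_ENDPOINT    0x0
#define PCI_EXP_TYPE_ROOT_PORT   0x4
#define PCI_EXP_TYPE_UPSTREAM    0x5
#define PCI_EXP_TYPE_DOWNSTREAM  0x6
#define PCI_EXP_TYPE_PCIE_BRIDGE 0x8 /* PCI/PCI-X to PCIe Bridge */

/* Root Ports, Switch Downstream Ports, and PCI-to-PCIe bridges. */
static bool is_downstream_port(int type)
{
	return type == PCI_EXP_TYPE_ROOT_PORT ||
	       type == PCI_EXP_TYPE_DOWNSTREAM ||
	       type == PCI_EXP_TYPE_PCIE_BRIDGE;
}
```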

>   return;
>   if (device_property_read_u8(&dev->dev, "ExternalFacingPort", &val))
>   return;
>  
>   /*
> -  * These root ports expose PCIe (including DMA) outside of the
> -  * system so make sure we treat them and everything behind as
> +  * These root/down ports expose PCIe (including DMA) outside of the
> +  * system so make sure we treat everything behind them as
>* untrusted.
>*/
>   if (val)
> - dev->untrusted = 1;
> + dev->external_facing = 1;
>  }
>  
>  static void pci_acpi_setup(struct device *dev)
> @@ -1240,7 +1241,7 @@ static void pci_acpi_setup(struct device *dev)
>   return;
>  
>   pci_acpi_optimize_delay(pci_dev, adev->handle);
> - pci_acpi_set_untrusted(pci_dev);
> + pci_acpi_set_external_facing(pci_dev);
>   pci_acpi_add_edr_notifier(pci_dev);
>  
>   pci_acpi_add_pm_notifier(adev, pci_dev);
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 6d87066a5ecc5..8c40c00413e74 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -1552,7 +1552,7 @@ static void set_pcie_untrusted(struct 

Re: [PATCH v2 1/7] PCI: Keep the ACS capability offset in device

2020-07-06 Thread Bjorn Helgaas
On Mon, Jun 29, 2020 at 09:49:37PM -0700, Rajat Jain wrote:
> Currently this is being looked up at a number of places. Read and store it
> once at bootup so that it can be used by all later.

Write the commit log so it is complete even without the subject.
Right now, you have to read the subject to know what "this" refers to.

The subject is like the title; the log is like the body of an article.
The title isn't *part* of the article, so the article has to make
sense all by itself.

> +static void pci_enable_acs(struct pci_dev *dev);

I don't think we need this forward declaration, do we?

> @@ -4653,7 +4653,7 @@ static int pci_quirk_intel_spt_pch_acs(struct pci_dev *dev, u16 acs_flags)
>   if (!pci_quirk_intel_spt_pch_acs_match(dev))
>   return -ENOTTY;
>  
> - pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ACS);
> + pos = dev->acs_cap;

I assume you verified that all these quirks are FINAL quirks, since
pci_init_capabilities() is called after HEADER quirks.  I'll
double-check before applying this.
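Caching the offset trades repeated capability-list walks for a single
lookup at init time, which is also why ordering matters: anything running
before the cache is filled sees 0 ("no ACS"). A minimal sketch with
hypothetical names (`struct cap_dev`, `find_acs()`, `init_acs()`), where
a counter stands in for config-space accesses and `init_acs()` plays the
role of the lookup done from pci_init_capabilities():

```c
#include <stdint.h>

/* Hypothetical device with a counter standing in for config-space walks. */
struct cap_dev {
	uint16_t acs_cap;     /* cached ACS offset; 0 means "no ACS capability" */
	unsigned int lookups; /* how many times the capability list was walked */
	uint16_t real_offset; /* where the ACS capability actually lives, or 0 */
};

/* Stand-in for pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ACS). */
static uint16_t find_acs(struct cap_dev *dev)
{
	dev->lookups++;
	return dev->real_offset;
}

/* Done once, at capability-init time. */
static void init_acs(struct cap_dev *dev)
{
	dev->acs_cap = find_acs(dev);
}

/* Later users, e.g. quirks, read the cached offset instead of searching. */
static uint16_t acs_pos(const struct cap_dev *dev)
{
	return dev->acs_cap;
}
```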

Bjorn


Re: [PATCH 1/2] pci: Add pci device even if the driver failed to attach

2020-06-26 Thread Bjorn Helgaas
Nit: when you update these patches, can you run "git log --oneline
drivers/pci/bus.c" and make your subject lines match the convention?
E.g.,

  PCI: Add device even if driver attach failed

On Thu, Jun 25, 2020 at 05:27:09PM -0700, Rajat Jain wrote:
> device_attach() returning failure indicates a driver error
> while trying to probe the device. In such a scenario, the PCI
> device should still be added in the system and be visible to
> the user.

Nit: please wrap logs to fill 75 characters.  "git log" adds 4 spaces
at the beginning, so 75+4 still fits nicely in 80 columns without
wrapping.

> This patch partially reverts:
> commit ab1a187bba5c ("PCI: Check device_attach() return value always")
> 
> Signed-off-by: Rajat Jain 
> ---
>  drivers/pci/bus.c | 6 +-
>  1 file changed, 1 insertion(+), 5 deletions(-)
> 
> diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
> index 8e40b3e6da77d..3cef835b375fd 100644
> --- a/drivers/pci/bus.c
> +++ b/drivers/pci/bus.c
> @@ -322,12 +322,8 @@ void pci_bus_add_device(struct pci_dev *dev)
>  
>   dev->match_driver = true;
>   retval = device_attach(&dev->dev);
> - if (retval < 0 && retval != -EPROBE_DEFER) {
> + if (retval < 0 && retval != -EPROBE_DEFER)
>   pci_warn(dev, "device attach failed (%d)\n", retval);
> - pci_proc_detach_device(dev);
> - pci_remove_sysfs_dev_files(dev);

Thanks for catching my bug!

> - return;
> - }
>  
>   pci_dev_assign_added(dev, true);
>  }
> -- 
> 2.27.0.212.ge8ba1cc988-goog
> 


Re: [PATCH 0/2] Introduce PCI_FIXUP_IOMMU

2020-06-23 Thread Bjorn Helgaas
On Fri, Jun 19, 2020 at 10:26:54AM +0800, Zhangfei Gao wrote:
> I have studied the _DSM method; we hit two issues with it compared to
> using a quirk.
> 
> 1. We would need to change the definition of either pci_host_bridge or
> pci_dev, e.g. by adding a can_stall member, while the PCI subsystem does
> not know about stall today.
> 
> a. PCI devices do not have a UUID: the UUID needs to be described in the
> DSDT, while PCI devices are not defined in the DSDT,
>     so we have to use the host bridge.

PCI devices *can* be described in the DSDT.  IIUC these particular
devices are hardwired (not plug-in cards), so platform firmware can
know about them and could describe them in the DSDT.

> b. Parsing the DSDT is done in the PCI subsystem,
> like drivers/acpi/pci_root.c:
>    obj = acpi_evaluate_dsm(ACPI_HANDLE(bus->bridge), &pci_acpi_dsm_guid, 1,
>                            IGNORE_PCI_BOOT_CONFIG_DSM, NULL);
> 
> After parsing the _DSM in PCI code, we need to record this info.
> Currently, the can_stall info is recorded in iommu_fwspec,
> which is allocated in iommu_fwspec_init(); that is called by
> iort_iommu_configure() for UEFI.

You can look for a _DSM wherever it is convenient for you.  It could
be in an AMBA shim layer.

> 2. The guest kernel also needs to support SVA.
> Using a quirk, the guest can boot with SVA enabled, since the quirk is
> self-contained in the kernel.
> If using _DSM, a specific UEFI or DTB has to be provided;
> currently we can use QEMU_EFI.fd from "apt install qemu-efi".

I don't quite understand what this means, but as I mentioned before, a
quirk for a *limited* number of devices is OK, as long as there is a
plan that removes the need for a quirk for future devices.

E.g., if the next platform version ships with a DTB or firmware with a
_DSM or other mechanism that enables the kernel to discover this
information without a kernel change, it's fine to use a quirk to cover
the early platform.

The principles are:

  - I don't want to have to update a quirk for every new Device ID
that needs this.

  - I don't really want to have to manage non-PCI information in the
struct pci_dev.  If this is AMBA- or IOMMU-related, it should be
stored in a structure related to AMBA or the IOMMU.


Re: [PATCH 0/2] Introduce PCI_FIXUP_IOMMU

2020-06-15 Thread Bjorn Helgaas
On Sat, Jun 13, 2020 at 10:30:56PM +0800, Zhangfei Gao wrote:
> On 2020/6/11 9:44 PM, Bjorn Helgaas wrote:
> > +++ b/drivers/iommu/iommu.c
> > > > > > > > > > @@ -2418,6 +2418,10 @@ int iommu_fwspec_init(struct device *dev, struct fwnode_handle *iommu_fwnode,
> > > > > > > > > > fwspec->iommu_fwnode = iommu_fwnode;
> > > > > > > > > > fwspec->ops = ops;
> > > > > > > > > > dev_iommu_fwspec_set(dev, fwspec);
> > > > > > > > > > +
> > > > > > > > > > +   if (dev_is_pci(dev))
> > > > > > > > > > +   pci_fixup_device(pci_fixup_final, to_pci_dev(dev));
> > > > > > > > > > +
> > > > > > > > > > 
> > > > > > > > > > Then pci_fixup_final will be called twice, the first in pci_bus_add_device.
> > > > > > > > > > Here in iommu_fwspec_init is the second time, specifically for iommu_fwspec.
> > > > > > > > > > Will send this when 5.8-rc1 is open.
> > > > > > > > > Wait, this whole fixup approach seems wrong to me.  No matter how you
> > > > > > > > > do the fixup, it's still a fixup, which means it requires ongoing
> > > > > > > > > maintenance.  Surely we don't want to have to add the Vendor/Device ID
> > > > > > > > > for every new AMBA device that comes along, do we?
> > > > > > > > > 
> > > > > > > > Here the fake PCI devices have standard PCI config space, but the
> > > > > > > > physical implementation is based on AMBA.
> > > > > > > > They can provide the PASID feature.
> > > > > > > > However, they
> > > > > > > > 1. do not support TLP, since they are not real PCI devices;
> > > > > > > > 2. do not support PRI; instead they support stall (provided by the SMMU).
> > > > > > > > And stall is not a PCI feature, so it is not described in struct pci_dev,
> > > > > > > > but in struct iommu_fwspec.
> > > > > > > > So we use this fixup to tell the PCI subsystem that the devices can
> > > > > > > > support stall, and thereby support PASID.
> > > > > > > This did not answer my question.  Are you proposing that we update a
> > > > > > > quirk every time a new AMBA device is released?  I don't think that
> > > > > > > would be a good model.
> > > > > > Yes, you are right, but we do not have any better idea yet.
> > > > > > Currently we have three fake PCI devices, which support stall and PASID.
> > > > > > We have to let the PCI subsystem know the devices can support PASID
> > > > > > because of the stall feature, even though they do not support PRI.
> > > > > > Do you have any other ideas?
> > > > > It sounds like the best way would be to allocate a PCI capability for it, so
> > > > > detection can be done through config space, at least in future devices,
> > > > > or possibly after a firmware update if the config space in your system
> > > > > is controlled by firmware somewhere.  Once there is a proper mechanism
> > > > > to do this, using fixups to detect the early devices that don't use that
> > > > > should be uncontroversial. I have no idea what the process or timeline
> > > > > is to add new capabilities into the PCIe specification, or if this one
> > > > > would be acceptable to the PCI SIG at all.
> > > > That sounds like a possibility.  The spec already defines a
> > > > Vendor-Specific Extended Capability (PCIe r5.0, sec 7.9.5) that

Re: [PATCH 0/2] Introduce PCI_FIXUP_IOMMU

2020-06-11 Thread Bjorn Helgaas
On Thu, Jun 11, 2020 at 10:54:45AM +0800, Zhangfei Gao wrote:
> On 2020/6/10 12:49 AM, Bjorn Helgaas wrote:
> > On Tue, Jun 09, 2020 at 11:15:06AM +0200, Arnd Bergmann wrote:
> > > On Tue, Jun 9, 2020 at 6:02 AM Zhangfei Gao  wrote:
> > > > On 2020/6/9 12:41 AM, Bjorn Helgaas wrote:
> > > > > On Mon, Jun 08, 2020 at 10:54:15AM +0800, Zhangfei Gao wrote:
> > > > > > On 2020/6/6 7:19 AM, Bjorn Helgaas wrote:
> > > > > > > > +++ b/drivers/iommu/iommu.c
> > > > > > > > @@ -2418,6 +2418,10 @@ int iommu_fwspec_init(struct device *dev, struct fwnode_handle *iommu_fwnode,
> > > > > > > >fwspec->iommu_fwnode = iommu_fwnode;
> > > > > > > >fwspec->ops = ops;
> > > > > > > >dev_iommu_fwspec_set(dev, fwspec);
> > > > > > > > +
> > > > > > > > +   if (dev_is_pci(dev))
> > > > > > > > +   pci_fixup_device(pci_fixup_final, to_pci_dev(dev));
> > > > > > > > +
> > > > > > > > 
> > > > > > > > Then pci_fixup_final will be called twice, the first in pci_bus_add_device.
> > > > > > > > Here in iommu_fwspec_init is the second time, specifically for iommu_fwspec.
> > > > > > > > Will send this when 5.8-rc1 is open.
> > > > > > > Wait, this whole fixup approach seems wrong to me.  No matter how you
> > > > > > > do the fixup, it's still a fixup, which means it requires ongoing
> > > > > > > maintenance.  Surely we don't want to have to add the Vendor/Device ID
> > > > > > > for every new AMBA device that comes along, do we?
> > > > > > > 
> > > > > > Here the fake PCI devices have standard PCI config space, but the
> > > > > > physical implementation is based on AMBA.
> > > > > > They can provide the PASID feature.
> > > > > > However, they
> > > > > > 1. do not support TLP, since they are not real PCI devices;
> > > > > > 2. do not support PRI; instead they support stall (provided by the SMMU).
> > > > > > And stall is not a PCI feature, so it is not described in struct pci_dev,
> > > > > > but in struct iommu_fwspec.
> > > > > > So we use this fixup to tell the PCI subsystem that the devices can
> > > > > > support stall, and thereby support PASID.
> > > > > This did not answer my question.  Are you proposing that we update a
> > > > > quirk every time a new AMBA device is released?  I don't think that
> > > > > would be a good model.
> > > > Yes, you are right, but we do not have any better idea yet.
> > > > Currently we have three fake PCI devices, which support stall and PASID.
> > > > We have to let the PCI subsystem know the devices can support PASID
> > > > because of the stall feature, even though they do not support PRI.
> > > > Do you have any other ideas?
> > > It sounds like the best way would be to allocate a PCI capability for it, so
> > > detection can be done through config space, at least in future devices,
> > > or possibly after a firmware update if the config space in your system
> > > is controlled by firmware somewhere.  Once there is a proper mechanism
> > > to do this, using fixups to detect the early devices that don't use that
> > > should be uncontroversial. I have no idea what the process or timeline
> > > is to add new capabilities into the PCIe specification, or if this one
> > > would be acceptable to the PCI SIG at all.
> > That sounds like a possibility.  The spec already defines a
> > Vendor-Specific Extended Capability (PCIe r5.0, sec 7.9.5) that might
> > be a candidate.
> Will investigate this, thanks Bjorn

FWIW, there's also a Vendor-Specific Capability that can appear in the
first 256 bytes of config space (the Vendor-Specific Extended
Capability must appear in the "Extended Configuration Space" from
0x100-0xfff).

> > > If detection cannot be done through PCI config space, the next best
> > > alternative is to pass auxiliary data through firmware. On DT based

Re: [PATCH 0/2] Introduce PCI_FIXUP_IOMMU

2020-06-09 Thread Bjorn Helgaas
On Tue, Jun 09, 2020 at 11:15:06AM +0200, Arnd Bergmann wrote:
> On Tue, Jun 9, 2020 at 6:02 AM Zhangfei Gao  wrote:
> > On 2020/6/9 12:41 AM, Bjorn Helgaas wrote:
> > > On Mon, Jun 08, 2020 at 10:54:15AM +0800, Zhangfei Gao wrote:
> > >> On 2020/6/6 7:19 AM, Bjorn Helgaas wrote:
> > >>>> +++ b/drivers/iommu/iommu.c
> > >>>> @@ -2418,6 +2418,10 @@ int iommu_fwspec_init(struct device *dev, struct fwnode_handle *iommu_fwnode,
> > >>>>   fwspec->iommu_fwnode = iommu_fwnode;
> > >>>>   fwspec->ops = ops;
> > >>>>   dev_iommu_fwspec_set(dev, fwspec);
> > >>>> +
> > >>>> +   if (dev_is_pci(dev))
> > >>>> +   pci_fixup_device(pci_fixup_final, to_pci_dev(dev));
> > >>>> +
> > >>>>
> > >>>> Then pci_fixup_final will be called twice, the first in pci_bus_add_device.
> > >>>> Here in iommu_fwspec_init is the second time, specifically for iommu_fwspec.
> > >>>> Will send this when 5.8-rc1 is open.
> > >>> Wait, this whole fixup approach seems wrong to me.  No matter how you
> > >>> do the fixup, it's still a fixup, which means it requires ongoing
> > >>> maintenance.  Surely we don't want to have to add the Vendor/Device ID
> > >>> for every new AMBA device that comes along, do we?
> > >>>
> > >> Here the fake PCI devices have standard PCI config space, but the
> > >> physical implementation is based on AMBA.
> > >> They can provide the PASID feature.
> > >> However, they
> > >> 1. do not support TLP, since they are not real PCI devices;
> > >> 2. do not support PRI; instead they support stall (provided by the SMMU).
> > >> And stall is not a PCI feature, so it is not described in struct pci_dev,
> > >> but in struct iommu_fwspec.
> > >> So we use this fixup to tell the PCI subsystem that the devices can
> > >> support stall, and thereby support PASID.
> > > This did not answer my question.  Are you proposing that we update a
> > > quirk every time a new AMBA device is released?  I don't think that
> > > would be a good model.
> >
> > Yes, you are right, but we do not have any better idea yet.
> > Currently we have three fake PCI devices, which support stall and PASID.
> > We have to let the PCI subsystem know the devices can support PASID
> > because of the stall feature, even though they do not support PRI.
> > Do you have any other ideas?
> 
> It sounds like the best way would be to allocate a PCI capability for it, so
> detection can be done through config space, at least in future devices,
> or possibly after a firmware update if the config space in your system
> is controlled by firmware somewhere.  Once there is a proper mechanism
> to do this, using fixups to detect the early devices that don't use that
> should be uncontroversial. I have no idea what the process or timeline
> is to add new capabilities into the PCIe specification, or if this one
> would be acceptable to the PCI SIG at all.

That sounds like a possibility.  The spec already defines a
Vendor-Specific Extended Capability (PCIe r5.0, sec 7.9.5) that might
be a candidate.

> If detection cannot be done through PCI config space, the next best
> alternative is to pass auxiliary data through firmware. On DT based
> machines, you can list non-hotpluggable PCIe devices and add custom
> properties that could be read during device enumeration. I assume
> ACPI has something similar, but I have not done that.

ACPI has _DSM (ACPI v6.3, sec 9.1.1), which might be a candidate.  I
like this better than a PCI capability because the property you need
to expose is not a PCI property.

Re: [PATCH 0/2] Introduce PCI_FIXUP_IOMMU

2020-06-08 Thread Bjorn Helgaas
On Mon, Jun 08, 2020 at 10:54:15AM +0800, Zhangfei Gao wrote:
> On 2020/6/6 7:19 AM, Bjorn Helgaas wrote:
> > On Thu, Jun 04, 2020 at 09:33:07PM +0800, Zhangfei Gao wrote:
> > > On 2020/6/2 1:41 AM, Bjorn Helgaas wrote:
> > > > On Thu, May 28, 2020 at 09:33:44AM +0200, Joerg Roedel wrote:
> > > > > On Wed, May 27, 2020 at 01:18:42PM -0500, Bjorn Helgaas wrote:
> > > > > > Is this slowdown significant?  We already iterate over every device
> > > > > > when applying PCI_FIXUP_FINAL quirks, so if we used the existing
> > > > > > PCI_FIXUP_FINAL, we wouldn't be adding a new loop.  We would only be
> > > > > > adding two more iterations to the loop in pci_do_fixups() that tries
> > > > > > to match quirks against the current device.  I doubt that would be a
> > > > > > measurable slowdown.
> > > > > I don't know how significant it is, but I remember people complaining
> > > > > about adding new PCI quirks because it takes too long for them to run
> > > > > them all. That was in the discussion about the quirk disabling ATS on
> > > > > AMD Stoney systems.
> > > > > 
> > > > > So it probably depends on how many PCI devices are in the system whether
> > > > > it causes any measurable slowdown.
> > > > I found this [1] from Paul Menzel, which was a slowdown caused by
> > > > quirk_usb_early_handoff().  I think the real problem is individual
> > > > quirks that take a long time.
> > > > 
> > > > The PCI_FIXUP_IOMMU things we're talking about should be fast, and of
> > > > course, they're only run for matching devices anyway.  So I'd rather
> > > > keep them as PCI_FIXUP_FINAL than add a whole new phase.
> > > > 
> > > Thanks Bjorn for taking time for this.
> > > If so, it would be much simpler.
> > > 
> > > +++ b/drivers/iommu/iommu.c
> > > @@ -2418,6 +2418,10 @@ int iommu_fwspec_init(struct device *dev, struct fwnode_handle *iommu_fwnode,
> > >      fwspec->iommu_fwnode = iommu_fwnode;
> > >      fwspec->ops = ops;
> > >      dev_iommu_fwspec_set(dev, fwspec);
> > > +
> > > +   if (dev_is_pci(dev))
> > > +   pci_fixup_device(pci_fixup_final, to_pci_dev(dev));
> > > +
> > > 
> > > Then pci_fixup_final will be called twice, the first in pci_bus_add_device.
> > > Here in iommu_fwspec_init is the second time, specifically for iommu_fwspec.
> > > Will send this when 5.8-rc1 is open.
> >
> > Wait, this whole fixup approach seems wrong to me.  No matter how you
> > do the fixup, it's still a fixup, which means it requires ongoing
> > maintenance.  Surely we don't want to have to add the Vendor/Device ID
> > for every new AMBA device that comes along, do we?
> > 
> Here the fake PCI devices have standard PCI config space, but the
> physical implementation is based on AMBA.
> They can provide the PASID feature.
> However, they
> 1. do not support TLP, since they are not real PCI devices;
> 2. do not support PRI; instead they support stall (provided by the SMMU).
> And stall is not a PCI feature, so it is not described in struct pci_dev,
> but in struct iommu_fwspec.
> So we use this fixup to tell the PCI subsystem that the devices can
> support stall, and thereby support PASID.

This did not answer my question.  Are you proposing that we update a
quirk every time a new AMBA device is released?  I don't think that
would be a good model.

Re: [PATCH 0/2] Introduce PCI_FIXUP_IOMMU

2020-06-05 Thread Bjorn Helgaas
On Thu, Jun 04, 2020 at 09:33:07PM +0800, Zhangfei Gao wrote:
> On 2020/6/2 1:41 AM, Bjorn Helgaas wrote:
> > On Thu, May 28, 2020 at 09:33:44AM +0200, Joerg Roedel wrote:
> > > On Wed, May 27, 2020 at 01:18:42PM -0500, Bjorn Helgaas wrote:
> > > > Is this slowdown significant?  We already iterate over every device
> > > > when applying PCI_FIXUP_FINAL quirks, so if we used the existing
> > > > PCI_FIXUP_FINAL, we wouldn't be adding a new loop.  We would only be
> > > > adding two more iterations to the loop in pci_do_fixups() that tries
> > > > to match quirks against the current device.  I doubt that would be a
> > > > measurable slowdown.
> > > I don't know how significant it is, but I remember people complaining
> > > about adding new PCI quirks because it takes too long for them to run
> > > them all. That was in the discussion about the quirk disabling ATS on
> > > AMD Stoney systems.
> > > 
> > > So it probably depends on how many PCI devices are in the system whether
> > > it causes any measurable slowdown.
> > I found this [1] from Paul Menzel, which was a slowdown caused by
> > quirk_usb_early_handoff().  I think the real problem is individual
> > quirks that take a long time.
> > 
> > The PCI_FIXUP_IOMMU things we're talking about should be fast, and of
> > course, they're only run for matching devices anyway.  So I'd rather
> > keep them as PCI_FIXUP_FINAL than add a whole new phase.
> > 
> Thanks Bjorn for taking time for this.
> If so, it would be much simpler.
> 
> +++ b/drivers/iommu/iommu.c
> @@ -2418,6 +2418,10 @@ int iommu_fwspec_init(struct device *dev, struct fwnode_handle *iommu_fwnode,
>     fwspec->iommu_fwnode = iommu_fwnode;
>     fwspec->ops = ops;
>     dev_iommu_fwspec_set(dev, fwspec);
> +
> +   if (dev_is_pci(dev))
> +   pci_fixup_device(pci_fixup_final, to_pci_dev(dev));
> +
> 
> Then pci_fixup_final will be called twice, the first in pci_bus_add_device.
> Here in iommu_fwspec_init is the second time, specifically for iommu_fwspec.
> Will send this when 5.8-rc1 is open.

Wait, this whole fixup approach seems wrong to me.  No matter how you
do the fixup, it's still a fixup, which means it requires ongoing
maintenance.  Surely we don't want to have to add the Vendor/Device ID
for every new AMBA device that comes along, do we?

Bjorn

Re: [PATCH] PCI: Relax ACS requirement for Intel RCiEP devices.

2020-06-01 Thread Bjorn Helgaas
On Mon, Jun 01, 2020 at 03:56:55PM -0600, Alex Williamson wrote:
> On Mon, 1 Jun 2020 14:40:23 -0700
> "Raj, Ashok"  wrote:
> 
> > On Mon, Jun 01, 2020 at 04:25:19PM -0500, Bjorn Helgaas wrote:
> > > On Thu, May 28, 2020 at 01:57:42PM -0700, Ashok Raj wrote:  
> > > > All Intel platforms guarantee that all root complex implementations
> > > > must send transactions up to IOMMU for address translations. Hence
> > > > RCiEP devices with Vendor ID Intel can claim an exception for lack of
> > > > ACS support.
> > > > 
> > > > 
> > > > 3.16 Root-Complex Peer to Peer Considerations
> > > > When DMA remapping is enabled, peer-to-peer requests through the
> > > > Root-Complex must be handled
> > > > as follows:
> > > > • The input address in the request is translated (through first-level,
> > > >   second-level or nested translation) to a host physical address (HPA).
> > > >   The address decoding for peer addresses must be done only on the
> > > >   translated HPA. Hardware implementations are free to further limit
> > > >   peer-to-peer accesses to specific host physical address regions
> > > >   (or to completely disallow peer-forwarding of translated requests).
> > > > • Since address translation changes the contents (address field) of
> > > >   the PCI Express Transaction Layer Packet (TLP), for PCI Express
> > > >   peer-to-peer requests with ECRC, the Root-Complex hardware must use
> > > >   the new ECRC (re-computed with the translated address) if it
> > > >   decides to forward the TLP as a peer request.
> > > > • Root-ports, and multi-function root-complex integrated endpoints, may
> > > >   support additional peer-to-peer control features by supporting PCI Express
> > > >   Access Control Services (ACS) capability. Refer to ACS capability in
> > > >   PCI Express specifications for details.
> > > > 
> > > > Since Linux didn't give special treatment to allow this exception, certain
> > > > RCiEP MFD devices are getting grouped in a single iommu group. This
> > > > doesn't permit a single device to be assigned to a guest for instance.
> > > > 
> > > > In one vendor system: Device 14.x were grouped in a single IOMMU group.
> > > > 
> > > > /sys/kernel/iommu_groups/5/devices/:00:14.0
> > > > /sys/kernel/iommu_groups/5/devices/:00:14.2
> > > > /sys/kernel/iommu_groups/5/devices/:00:14.3
> > > > 
> > > > After the patch:
> > > > /sys/kernel/iommu_groups/5/devices/:00:14.0
> > > > /sys/kernel/iommu_groups/5/devices/:00:14.2
> > > > /sys/kernel/iommu_groups/6/devices/:00:14.3 <<< new group
> > > > 
> > > > 14.0 and 14.2 are integrated devices, but legacy end points.
> > > > Whereas 14.3 was a PCIe compliant RCiEP.
> > > > 
> > > > 00:14.3 Network controller: Intel Corporation Device 9df0 (rev 30)
> > > > Capabilities: [40] Express (v2) Root Complex Integrated Endpoint, MSI 00
> > > > 
> > > > This permits assigning this device to a guest VM.
> > > > 
> > > > Fixes: f096c061f552 ("iommu: Rework iommu_group_get_for_pci_dev()")
> > > > Signed-off-by: Ashok Raj 
> > > > To: Joerg Roedel 
> > > > To: Bjorn Helgaas 
> > > > Cc: linux-ker...@vger.kernel.org
> > > > Cc: iommu@lists.linux-foundation.org
> > > > Cc: Lu Baolu 
> > > > Cc: Alex Williamson 
> > > > Cc: Darrel Goeddel 
> > > > Cc: Mark Scott ,
> > > > Cc: Romil Sharma 
> > > > Cc: Ashok Raj   
> > > 
> > > Tentatively applied to pci/virtualization for v5.8, thanks!
> > > 
> > > The spec says this handling must apply "when DMA remapping is
> > > enabled".  The patch does not check whether DMA remapping is enabled.
> > > 
> > > Is there any case where DMA remapping is *not* enabled, and we rely on
> > > this patch to tell us whether the device is isolated?  It sounds like
> > > it may give the wrong answer in such a case?
> > > 
> > > Can you confirm that I don't need to worry about this?
> > 
> > I think all of this makes sense only when DMA remapping is enabled.
> > Otherwise there is no enforcement for isolation. 
> 
> Yep, without an IOMMU all devices operate in the same IOVA space and we
> have no isolation.  We only enable ACS when an IOMMU driver requests it
> and it's only used by IOMMU code to determine IOMMU grouping of
> devices.  Thanks,

Thanks, Ashok and Alex.  I wish it were more obvious from the code,
but I am reassured.

I also added a stable tag to help get this backported.

Re: [PATCH] PCI: Relax ACS requirement for Intel RCiEP devices.

2020-06-01 Thread Bjorn Helgaas
On Thu, May 28, 2020 at 01:57:42PM -0700, Ashok Raj wrote:
> All Intel platforms guarantee that all root complex implementations
> must send transactions up to IOMMU for address translations. Hence
> RCiEP devices with Vendor ID Intel can claim an exception for lack of
> ACS support.
> 
> 
> 3.16 Root-Complex Peer to Peer Considerations
> When DMA remapping is enabled, peer-to-peer requests through the
> Root-Complex must be handled
> as follows:
> • The input address in the request is translated (through first-level,
>   second-level or nested translation) to a host physical address (HPA).
>   The address decoding for peer addresses must be done only on the
>   translated HPA. Hardware implementations are free to further limit
>   peer-to-peer accesses to specific host physical address regions
>   (or to completely disallow peer-forwarding of translated requests).
> • Since address translation changes the contents (address field) of
>   the PCI Express Transaction Layer Packet (TLP), for PCI Express
>   peer-to-peer requests with ECRC, the Root-Complex hardware must use
>   the new ECRC (re-computed with the translated address) if it
>   decides to forward the TLP as a peer request.
> • Root-ports, and multi-function root-complex integrated endpoints, may
>   support additional peer-to-peer control features by supporting PCI Express
>   Access Control Services (ACS) capability. Refer to ACS capability in
>   PCI Express specifications for details.
> 
> Since Linux didn't give special treatment to allow this exception, certain
> RCiEP MFD devices are getting grouped in a single iommu group. This
> doesn't permit a single device to be assigned to a guest for instance.
> 
> In one vendor system: Device 14.x were grouped in a single IOMMU group.
> 
> /sys/kernel/iommu_groups/5/devices/:00:14.0
> /sys/kernel/iommu_groups/5/devices/:00:14.2
> /sys/kernel/iommu_groups/5/devices/:00:14.3
> 
> After the patch:
> /sys/kernel/iommu_groups/5/devices/:00:14.0
> /sys/kernel/iommu_groups/5/devices/:00:14.2
> /sys/kernel/iommu_groups/6/devices/:00:14.3 <<< new group
> 
> 14.0 and 14.2 are integrated devices, but legacy end points.
> Whereas 14.3 was a PCIe compliant RCiEP.
> 
> 00:14.3 Network controller: Intel Corporation Device 9df0 (rev 30)
> Capabilities: [40] Express (v2) Root Complex Integrated Endpoint, MSI 00
> 
> This permits assigning this device to a guest VM.
> 
> Fixes: f096c061f552 ("iommu: Rework iommu_group_get_for_pci_dev()")
> Signed-off-by: Ashok Raj 
> To: Joerg Roedel 
> To: Bjorn Helgaas 
> Cc: linux-ker...@vger.kernel.org
> Cc: iommu@lists.linux-foundation.org
> Cc: Lu Baolu 
> Cc: Alex Williamson 
> Cc: Darrel Goeddel 
> Cc: Mark Scott ,
> Cc: Romil Sharma 
> Cc: Ashok Raj 

Tentatively applied to pci/virtualization for v5.8, thanks!

The spec says this handling must apply "when DMA remapping is
enabled".  The patch does not check whether DMA remapping is enabled.

Is there any case where DMA remapping is *not* enabled, and we rely on
this patch to tell us whether the device is isolated?  It sounds like
it may give the wrong answer in such a case?

Can you confirm that I don't need to worry about this?  

> ---
> v2: Moved functionality from iommu to pci quirks - Alex Williamson
> 
>  drivers/pci/quirks.c | 15 +++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 28c9a2409c50..63373ca0a3fe 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -4682,6 +4682,20 @@ static int pci_quirk_mf_endpoint_acs(struct pci_dev *dev, u16 acs_flags)
>   PCI_ACS_CR | PCI_ACS_UF | PCI_ACS_DT);
>  }
>  
> +static int pci_quirk_rciep_acs(struct pci_dev *dev, u16 acs_flags)
> +{
> + /*
> +  * RCiEPs are required to allow p2p only on translated addresses.
> +  * Refer to Intel VT-d specification Section 3.16 Root-Complex Peer
> +  * to Peer Considerations
> +  */
> + if (pci_pcie_type(dev) != PCI_EXP_TYPE_RC_END)
> + return -ENOTTY;
> +
> + return pci_acs_ctrl_enabled(acs_flags,
> + PCI_ACS_SV | PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF);
> +}
> +
>  static int pci_quirk_brcm_acs(struct pci_dev *dev, u16 acs_flags)
>  {
>   /*
> @@ -4764,6 +4778,7 @@ static const struct pci_dev_acs_enabled {
>   /* I219 */
>   { PCI_VENDOR_ID_INTEL, 0x15b7, pci_quirk_mf_endpoint_acs },
>   { PCI_VENDOR_ID_INTEL, 0x15b8, pci_quirk_mf_endpoint_acs },
> + { PCI_VENDOR_ID_INTEL, PCI_ANY_ID, pci_quirk_rciep_acs },
>   /* QCOM QDF2xxx root ports */
>   { PCI_VENDOR_ID_QCOM, 0x0400, pci_quirk_qcom_rp_acs },
>   { PCI_VENDOR_ID_QCOM, 0x0401, pci_quirk_qcom_rp_acs },
> -- 
> 2.7.4
> 

Re: [PATCH 0/2] Introduce PCI_FIXUP_IOMMU

2020-06-01 Thread Bjorn Helgaas
On Thu, May 28, 2020 at 09:33:44AM +0200, Joerg Roedel wrote:
> On Wed, May 27, 2020 at 01:18:42PM -0500, Bjorn Helgaas wrote:
> > Is this slowdown significant?  We already iterate over every device
> > when applying PCI_FIXUP_FINAL quirks, so if we used the existing
> > PCI_FIXUP_FINAL, we wouldn't be adding a new loop.  We would only be
> > adding two more iterations to the loop in pci_do_fixups() that tries
> > to match quirks against the current device.  I doubt that would be a
> > measurable slowdown.
> 
> I don't know how significant it is, but I remember people complaining
> about adding new PCI quirks because it takes too long for them to run
> them all. That was in the discussion about the quirk disabling ATS on
> AMD Stoney systems.
> 
> So it probably depends on how many PCI devices are in the system whether
> it causes any measurable slowdown.

I found this [1] from Paul Menzel, which was a slowdown caused by
quirk_usb_early_handoff().  I think the real problem is individual
quirks that take a long time.

The PCI_FIXUP_IOMMU things we're talking about should be fast, and of
course, they're only run for matching devices anyway.  So I'd rather
keep them as PCI_FIXUP_FINAL than add a whole new phase.

Bjorn

[1] https://lore.kernel.org/linux-pci/b1533fd5-1fae-7256-9597-36d3d5de9...@molgen.mpg.de/


Re: [PATCH 0/2] Introduce PCI_FIXUP_IOMMU

2020-05-27 Thread Bjorn Helgaas
On Tue, May 26, 2020 at 07:49:07PM +0800, Zhangfei Gao wrote:
> Some platform devices appear as PCI but are actually on the AMBA bus,
> and they need a fixup in drivers/pci/quirks.c that handles iommu_fwnode.
> This series introduces PCI_FIXUP_IOMMU, which is called after iommu_fwnode
> is allocated, instead of reusing PCI_FIXUP_FINAL, since reusing it would
> slow down IOMMU probing because all devices in the fixup final list would
> be reprocessed, as suggested by Joerg [1].

Is this slowdown significant?  We already iterate over every device
when applying PCI_FIXUP_FINAL quirks, so if we used the existing
PCI_FIXUP_FINAL, we wouldn't be adding a new loop.  We would only be
adding two more iterations to the loop in pci_do_fixups() that tries
to match quirks against the current device.  I doubt that would be a
measurable slowdown.

> For example, the HiSilicon platform devices need a fixup in
> drivers/pci/quirks.c that handles fwspec->can_stall, which is introduced in [2]:
> 
> +static void quirk_huawei_pcie_sva(struct pci_dev *pdev)
> +{
> +	struct iommu_fwspec *fwspec;
> +
> +	pdev->eetlp_prefix_path = 1;
> +	fwspec = dev_iommu_fwspec_get(&pdev->dev);
> +	if (fwspec)
> +		fwspec->can_stall = 1;
> +}
> +
> +DECLARE_PCI_FIXUP_IOMMU(PCI_VENDOR_ID_HUAWEI, 0xa250, quirk_huawei_pcie_sva);
> +DECLARE_PCI_FIXUP_IOMMU(PCI_VENDOR_ID_HUAWEI, 0xa251, quirk_huawei_pcie_sva);
> 
> [1] https://www.spinics.net/lists/iommu/msg44591.html
> [2] https://www.spinics.net/lists/linux-pci/msg94559.html

If you reference these in the commit logs, please use lore.kernel.org
links instead of spinics.

> Zhangfei Gao (2):
>   PCI: Introduce PCI_FIXUP_IOMMU
>   iommu: calling pci_fixup_iommu in iommu_fwspec_init
> 
>  drivers/iommu/iommu.c | 4 
>  drivers/pci/quirks.c  | 7 +++
>  include/asm-generic/vmlinux.lds.h | 3 +++
>  include/linux/pci.h   | 8 
>  4 files changed, 22 insertions(+)
> 
> -- 
> 2.7.4
> 


Re: [PATCH 02/12] ACPI/IORT: Make iort_get_device_domain IRQ domain agnostic

2020-05-21 Thread Bjorn Helgaas
On Thu, May 21, 2020 at 01:59:58PM +0100, Lorenzo Pieralisi wrote:
> iort_get_device_domain() is PCI specific but it need not be,
> since it can be used to retrieve IRQ domain nexus of any kind
> by adding an irq_domain_bus_token input to it.
> 
> Make it PCI agnostic by also renaming the requestor ID input
> to a more generic ID name.
> 
> Signed-off-by: Lorenzo Pieralisi 
> Cc: Will Deacon 
> Cc: Hanjun Guo 
> Cc: Bjorn Helgaas 
> Cc: Sudeep Holla 
> Cc: Catalin Marinas 
> Cc: Robin Murphy 
> Cc: "Rafael J. Wysocki" 
> Cc: Marc Zyngier 

Acked-by: Bjorn Helgaas	# pci/msi.c
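
Here is a minimal sketch of the shape of this change, assuming a flat table of registered domains. Before the change, the lookup hard-codes the PCI MSI bus. After it, the caller passes the bus token it needs. enum bus_token, struct fake_domain and the helper names are hypothetical models of enum irq_domain_bus_token and irq_find_matching_fwnode(), not their real definitions.

```c
#include <stddef.h>

/* Hypothetical stand-ins for a few irq_domain_bus_token values. */
enum bus_token { BUS_PCI_MSI, BUS_PLATFORM_MSI, BUS_FSL_MC_MSI };

struct fake_domain {
	int fwnode_id;		/* stands in for struct fwnode_handle * */
	enum bus_token token;
	const char *name;
};

/* Return the first domain registered for this fwnode and bus token. */
static const struct fake_domain *
find_matching_domain(const struct fake_domain *tbl, size_t n,
		     int fwnode_id, enum bus_token token)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].fwnode_id == fwnode_id && tbl[i].token == token)
			return &tbl[i];
	return NULL;
}

/* Before: the bus is hard-coded, so only PCI callers are served. */
static const struct fake_domain *
get_device_domain_pci(const struct fake_domain *tbl, size_t n, int its_fwnode)
{
	return find_matching_domain(tbl, n, its_fwnode, BUS_PCI_MSI);
}

/* After: the caller states which bus it needs. */
static const struct fake_domain *
get_device_domain(const struct fake_domain *tbl, size_t n, int its_fwnode,
		  enum bus_token token)
{
	return find_matching_domain(tbl, n, its_fwnode, token);
}
```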

> ---
>  drivers/acpi/arm64/iort.c | 14 +++---
>  drivers/pci/msi.c |  3 ++-
>  include/linux/acpi_iort.h |  7 ---
>  3 files changed, 13 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
> index 7cfd77b5e6e8..8f2a961c1364 100644
> --- a/drivers/acpi/arm64/iort.c
> +++ b/drivers/acpi/arm64/iort.c
> @@ -567,7 +567,6 @@ static struct acpi_iort_node *iort_find_dev_node(struct device *dev)
>   node = iort_get_iort_node(dev->fwnode);
>   if (node)
>   return node;
> -
>   /*
>* if not, then it should be a platform device defined in
>* DSDT/SSDT (with Named Component node in IORT)
> @@ -658,13 +657,13 @@ static int __maybe_unused iort_find_its_base(u32 its_id, phys_addr_t *base)
>  /**
>   * iort_dev_find_its_id() - Find the ITS identifier for a device
>   * @dev: The device.
> - * @req_id: Device's requester ID
> + * @id: Device's ID
>   * @idx: Index of the ITS identifier list.
>   * @its_id: ITS identifier.
>   *
>   * Returns: 0 on success, appropriate error value otherwise
>   */
> -static int iort_dev_find_its_id(struct device *dev, u32 req_id,
> +static int iort_dev_find_its_id(struct device *dev, u32 id,
>   unsigned int idx, int *its_id)
>  {
>   struct acpi_iort_its_group *its;
> @@ -674,7 +673,7 @@ static int iort_dev_find_its_id(struct device *dev, u32 req_id,
>   if (!node)
>   return -ENXIO;
>  
> - node = iort_node_map_id(node, req_id, NULL, IORT_MSI_TYPE);
> + node = iort_node_map_id(node, id, NULL, IORT_MSI_TYPE);
>   if (!node)
>   return -ENXIO;
>  
> @@ -697,19 +696,20 @@ static int iort_dev_find_its_id(struct device *dev, u32 req_id,
>   *
>   * Returns: the MSI domain for this device, NULL otherwise
>   */
> -struct irq_domain *iort_get_device_domain(struct device *dev, u32 req_id)
> +struct irq_domain *iort_get_device_domain(struct device *dev, u32 id,
> +   enum irq_domain_bus_token bus_token)
>  {
>   struct fwnode_handle *handle;
>   int its_id;
>  
> - if (iort_dev_find_its_id(dev, req_id, 0, &its_id))
> + if (iort_dev_find_its_id(dev, id, 0, &its_id))
>   return NULL;
>  
>   handle = iort_find_domain_token(its_id);
>   if (!handle)
>   return NULL;
>  
> - return irq_find_matching_fwnode(handle, DOMAIN_BUS_PCI_MSI);
> + return irq_find_matching_fwnode(handle, bus_token);
>  }
>  
>  static void iort_set_device_domain(struct device *dev,
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index 6b43a5455c7a..74a91f52ecc0 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -1558,7 +1558,8 @@ struct irq_domain *pci_msi_get_device_domain(struct pci_dev *pdev)
>   pci_for_each_dma_alias(pdev, get_msi_id_cb, &rid);
>   dom = of_msi_map_get_device_domain(&pdev->dev, rid);
>   if (!dom)
> - dom = iort_get_device_domain(&pdev->dev, rid);
> + dom = iort_get_device_domain(&pdev->dev, rid,
> +  DOMAIN_BUS_PCI_MSI);
>   return dom;
>  }
>  #endif /* CONFIG_PCI_MSI_IRQ_DOMAIN */
> diff --git a/include/linux/acpi_iort.h b/include/linux/acpi_iort.h
> index 8e7e2ec37f1b..08ec6bd2297f 100644
> --- a/include/linux/acpi_iort.h
> +++ b/include/linux/acpi_iort.h
> @@ -29,7 +29,8 @@ struct fwnode_handle *iort_find_domain_token(int trans_id);
>  #ifdef CONFIG_ACPI_IORT
>  void acpi_iort_init(void);
>  u32 iort_msi_map_rid(struct device *dev, u32 req_id);
> -struct irq_domain *iort_get_device_domain(struct device *dev, u32 req_id);
> +struct irq_domain *iort_get_device_domain(struct device *dev, u32 id,
> +   enum irq_domain_bus_token bus_token);
>  void acpi_configure_pmsi_domain(struct device *dev);
>  int iort_pmsi_get_dev_id(struct device *dev, u32 *dev_id);
>  /* IOMMU interface */
> @@ -40,8 +41,8 @@ int iort_iommu_msi_g

Re: [PATCH 08/12] of/irq: make of_msi_map_get_device_domain() bus agnostic

2020-05-21 Thread Bjorn Helgaas
On Thu, May 21, 2020 at 02:00:04PM +0100, Lorenzo Pieralisi wrote:
> From: Diana Craciun 
> 
> of_msi_map_get_device_domain() is PCI specific but it need not be and
> can be easily changed to be bus agnostic in order to be used by other
> busses by adding an IRQ domain bus token as an input parameter.
> 
> Signed-off-by: Diana Craciun 
> Signed-off-by: Lorenzo Pieralisi 
> Cc: Bjorn Helgaas 
> Cc: Rob Herring 
> Cc: Marc Zyngier 

Acked-by: Bjorn Helgaas	# pci/msi.c
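
For reference, here is a simplified single-level model of the msi-map translation described by the generic PCI MSI devicetree binding (msi-map = <rid-base msi-controller msi-base length>, with the optional msi-map-mask applied to the input ID first). It is not __of_msi_map_rid() itself. The struct, the integer controller handle and the function name are assumptions. With this patch, the controller found by such a walk is matched against the caller's bus token rather than always against DOMAIN_BUS_PCI_MSI.

```c
#include <stddef.h>
#include <stdint.h>

/* One msi-map entry: <rid-base msi-controller msi-base length>. */
struct msi_map_entry {
	uint32_t rid_base;
	int controller;		/* stands in for the msi-controller phandle */
	uint32_t msi_base;
	uint32_t length;
};

/*
 * Translate an input ID through msi-map.  Returns 0 and fills *out and
 * *controller on a match, or -1 if no entry covers the masked ID.
 */
static int msi_map_id(const struct msi_map_entry *map, size_t n,
		      uint32_t mask, uint32_t id_in,
		      uint32_t *out, int *controller)
{
	uint32_t id = id_in & mask;

	for (size_t i = 0; i < n; i++) {
		const struct msi_map_entry *e = &map[i];

		if (id < e->rid_base || id - e->rid_base >= e->length)
			continue;
		*out = id - e->rid_base + e->msi_base;
		*controller = e->controller;
		return 0;
	}
	return -1;
}
```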

> ---
>  drivers/of/irq.c   | 8 +---
>  drivers/pci/msi.c  | 2 +-
>  include/linux/of_irq.h | 5 +++--
>  3 files changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/of/irq.c b/drivers/of/irq.c
> index a296eaf52a5b..48a40326984f 100644
> --- a/drivers/of/irq.c
> +++ b/drivers/of/irq.c
> @@ -613,18 +613,20 @@ u32 of_msi_map_rid(struct device *dev, struct device_node *msi_np, u32 rid_in)
>   * of_msi_map_get_device_domain - Use msi-map to find the relevant MSI domain
>   * @dev: device for which the mapping is to be done.
>   * @rid: Requester ID for the device.
> + * @bus_token: Bus token
>   *
>   * Walk up the device hierarchy looking for devices with a "msi-map"
>   * property.
>   *
>   * Returns: the MSI domain for this device (or NULL on failure)
>   */
> -struct irq_domain *of_msi_map_get_device_domain(struct device *dev, u32 rid)
> +struct irq_domain *of_msi_map_get_device_domain(struct device *dev, u32 id,
> + u32 bus_token)
>  {
>   struct device_node *np = NULL;
>  
> - __of_msi_map_rid(dev, &np, rid);
> - return irq_find_matching_host(np, DOMAIN_BUS_PCI_MSI);
> + __of_msi_map_rid(dev, &np, id);
> + return irq_find_matching_host(np, bus_token);
>  }
>  
>  /**
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index 74a91f52ecc0..9532e1d12d3f 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -1556,7 +1556,7 @@ struct irq_domain *pci_msi_get_device_domain(struct pci_dev *pdev)
>   u32 rid = pci_dev_id(pdev);
>  
>   pci_for_each_dma_alias(pdev, get_msi_id_cb, &rid);
> - dom = of_msi_map_get_device_domain(&pdev->dev, rid);
> + dom = of_msi_map_get_device_domain(&pdev->dev, rid, DOMAIN_BUS_PCI_MSI);
>   if (!dom)
>   dom = iort_get_device_domain(&pdev->dev, rid,
>    DOMAIN_BUS_PCI_MSI);
> diff --git a/include/linux/of_irq.h b/include/linux/of_irq.h
> index 1214cabb2247..7142a3722758 100644
> --- a/include/linux/of_irq.h
> +++ b/include/linux/of_irq.h
> @@ -52,7 +52,8 @@ extern struct irq_domain *of_msi_get_domain(struct device *dev,
>   struct device_node *np,
>   enum irq_domain_bus_token token);
>  extern struct irq_domain *of_msi_map_get_device_domain(struct device *dev,
> -u32 rid);
> + u32 id,
> + u32 bus_token);
>  extern void of_msi_configure(struct device *dev, struct device_node *np);
>  u32 of_msi_map_rid(struct device *dev, struct device_node *msi_np, u32 rid_in);
>  #else
> @@ -85,7 +86,7 @@ static inline struct irq_domain *of_msi_get_domain(struct device *dev,
>   return NULL;
>  }
>  static inline struct irq_domain *of_msi_map_get_device_domain(struct device *dev,
> -   u32 rid)
> + u32 id, u32 bus_token)
>  {
>   return NULL;
>  }
> -- 
> 2.26.1
> 


Re: [PATCH 00/15] PCI: brcmstb: enable PCIe for STB chips

2020-05-20 Thread Bjorn Helgaas
On Tue, May 19, 2020 at 04:33:58PM -0400, Jim Quinlan wrote:
> This patchset expands the usefulness of the Broadcom Settop Box PCIe
> controller by building upon the PCIe driver used currently by the
> Raspberry Pi.  Other forms of this patchset were submitted by me years
> ago and not accepted; the major sticking point was the code required
> for the DMA remapping needed for the PCIe driver to work [1].
> 
> There have been many changes to the DMA and OF subsystems since that
> time, making a cleaner and less intrusive patchset possible.  This
> patchset implements a generalization of "dev->dma_pfn_offset", except
> that instead of a single scalar offset it provides for multiple
> offsets via a function which depends upon the "dma-ranges" property of
> the PCIe host controller.  This is required for proper functionality
> of the BrcmSTB PCIe controller and possibly some other devices.
> 
> [1] https://lore.kernel.org/linux-arm-kernel/1516058925-46522-5-git-send-email-jim2101...@gmail.com/
> 
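
Here is a sketch of the difference described above, using assumed names. It contrasts a single scalar offset (as dev->dma_pfn_offset provides, expressed here in bytes rather than pages) with a small table derived from "dma-ranges" and searched per address. struct dma_range, DMA_MAPPING_FAILED and both functions are hypothetical. They are not the interface this series adds.

```c
#include <stddef.h>
#include <stdint.h>

/* One "dma-ranges"-style window: CPU physical range -> bus address range. */
struct dma_range {
	uint64_t cpu_start;
	uint64_t dma_start;
	uint64_t size;
};

#define DMA_MAPPING_FAILED UINT64_MAX	/* hypothetical "no window" marker */

/* Old model: one offset subtracted from every CPU physical address. */
static uint64_t phys_to_dma_single(uint64_t paddr, uint64_t offset)
{
	return paddr - offset;
}

/* Multiple offsets: pick the window that covers paddr, if any. */
static uint64_t phys_to_dma_ranges(const struct dma_range *r, size_t n,
				   uint64_t paddr)
{
	for (size_t i = 0; i < n; i++)
		if (paddr >= r[i].cpu_start &&
		    paddr - r[i].cpu_start < r[i].size)
			return paddr - r[i].cpu_start + r[i].dma_start;
	return DMA_MAPPING_FAILED;
}
```

For instance, if one window maps CPU 0x0 to bus 0x40000000 and another maps CPU 0x100000000 to bus 0x80000000, the first adds 0x40000000 and the second subtracts 0x80000000. No single offset covers both.
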
> Jim Quinlan (15):
>   PCI: brcmstb: PCIE_BRCMSTB depends on ARCH_BRCMSTB
>   ahci_brcm: fix use of BCM7216 reset controller
>   dt-bindings: PCI: Add bindings for more Brcmstb chips
>   PCI: brcmstb: Add compatibily of other chips
>   PCI: brcmstb: Add suspend and resume pm_ops
>   PCI: brcmstb: Asserting PERST is different for 7278
>   PCI: brcmstb: Add control of rescal reset
>   of: Include a dev param in of_dma_get_range()
>   device core: Add ability to handle multiple dma offsets
>   dma-direct: Invoke dma offset func if needed
>   arm: dma-mapping: Invoke dma offset func if needed
>   PCI: brcmstb: Set internal memory viewport sizes
>   PCI: brcmstb: Accommodate MSI for older chips
>   PCI: brcmstb: Set bus max burst side by chip type
>   PCI: brcmstb: add compatilbe chips to match list

If you have occasion to post a v2 for other reasons,

s/PCIE_BRCMSTB depends on ARCH_BRCMSTB/Allow PCIE_BRCMSTB on ARCH_BRCMSTB also/
s/ahci_brcm: fix use of BCM7216 reset controller/ata: ahci_brcm: Fix .../
s/Add compatibily of other chips/Add bcm7278 register info/
s/Asserting PERST is different for 7278/Add bcm7278 PERST support/
s/Set bus max burst side/Set bus max burst size/
s/add compatilbe chips.*/Add bcm7211, bcm7216, bcm7445, bcm7278 to match list/

Rewrap commit logs to use full 75-character lines (to allow for the 4
spaces added by git log).

In commit logs, s/This commit// (use imperative mood instead).

In "Accommodate MSI for older chips" commit log, s/commont/common/.

>  .../bindings/pci/brcm,stb-pcie.yaml   |  40 +-
>  arch/arm/include/asm/dma-mapping.h|  17 +-
>  drivers/ata/ahci_brcm.c   |  14 +-
>  drivers/of/address.c  |  54 ++-
>  drivers/of/device.c   |   2 +-
>  drivers/of/of_private.h   |   8 +-
>  drivers/pci/controller/Kconfig|   4 +-
>  drivers/pci/controller/pcie-brcmstb.c | 403 +++---
>  include/linux/device.h|   9 +-
>  include/linux/dma-direct.h|  16 +
>  include/linux/dma-mapping.h   |  44 ++
>  kernel/dma/Kconfig|  12 +
>  12 files changed, 542 insertions(+), 81 deletions(-)
> 
> -- 
> 2.17.1
> 


Re: [PATCH 1/4] PCI/ATS: Only enable ATS for trusted devices

2020-05-15 Thread Bjorn Helgaas
On Fri, May 15, 2020 at 12:43:59PM +0200, Jean-Philippe Brucker wrote:
> Add pci_ats_supported(), which checks whether a device has an ATS
> capability, and whether it is trusted.  A device is untrusted if it is
> plugged into an external-facing port such as Thunderbolt and could
> spoof an existing device to exploit weaknesses in the IOMMU
> configuration.  PCIe ATS is one such weakness since it allows
> endpoints to cache IOMMU translations and emit transactions with
> 'Translated' Address Type (10b) that partially bypass the IOMMU
> translation.
> 
> The SMMUv3 and VT-d IOMMU drivers already disallow ATS and transactions
> with 'Translated' Address Type for untrusted devices.  Add the check to
> pci_enable_ats() to let other drivers (AMD IOMMU for now) benefit from
> it.
> 
> By checking ats_cap, the pci_ats_supported() helper also returns whether
> ATS was globally disabled with pci=noats, and could later include more
> things, for example whether the whole PCIe hierarchy down to the
> endpoint supports ATS.
> 
> Signed-off-by: Jean-Philippe Brucker 

Acked-by: Bjorn Helgaas 
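
Here is a model of the policy described above. It assumes a simplified IOMMU that decides only on the TLP's Address Type. Untranslated requests (00b) are always translated by the IOMMU. Translation requests (01b) and pre-translated requests (10b) should be honoured only if ATS was enabled for the device. ats_allowed() mirrors the check in pci_ats_supported(). iommu_accepts() and the enum names are hypothetical.

```c
#include <stdbool.h>

/* PCIe TLP Address Type (AT) field encodings. */
enum pcie_at {
	AT_UNTRANSLATED	= 0x0,	/* 00b: the IOMMU translates the address */
	AT_TRANS_REQ	= 0x1,	/* 01b: ATS translation request */
	AT_TRANSLATED	= 0x2,	/* 10b: device claims the address is translated */
	AT_RESERVED	= 0x3,	/* 11b */
};

/* Same logic as pci_ats_supported(): capability present and trusted. */
static bool ats_allowed(bool has_ats_cap, bool untrusted)
{
	return has_ats_cap && !untrusted;
}

/*
 * Hypothetical IOMMU-side policy: ATS traffic (translation requests and
 * pre-translated TLPs) is honoured only when ATS was enabled for the
 * device, which ats_allowed() never permits for untrusted devices.
 */
static bool iommu_accepts(enum pcie_at at, bool ats_enabled)
{
	switch (at) {
	case AT_UNTRANSLATED:
		return true;
	case AT_TRANS_REQ:
	case AT_TRANSLATED:
		return ats_enabled;
	default:
		return false;
	}
}
```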

> ---
>  include/linux/pci-ats.h |  3 +++
>  drivers/pci/ats.c   | 18 +-
>  2 files changed, 20 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/pci-ats.h b/include/linux/pci-ats.h
> index d08f0869f1213e..f75c307f346de9 100644
> --- a/include/linux/pci-ats.h
> +++ b/include/linux/pci-ats.h
> @@ -6,11 +6,14 @@
>  
>  #ifdef CONFIG_PCI_ATS
>  /* Address Translation Service */
> +bool pci_ats_supported(struct pci_dev *dev);
>  int pci_enable_ats(struct pci_dev *dev, int ps);
>  void pci_disable_ats(struct pci_dev *dev);
>  int pci_ats_queue_depth(struct pci_dev *dev);
>  int pci_ats_page_aligned(struct pci_dev *dev);
>  #else /* CONFIG_PCI_ATS */
> +static inline bool pci_ats_supported(struct pci_dev *d)
> +{ return false; }
>  static inline int pci_enable_ats(struct pci_dev *d, int ps)
>  { return -ENODEV; }
>  static inline void pci_disable_ats(struct pci_dev *d) { }
> diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> index 390e92f2d8d1fc..15fa0c37fd8e44 100644
> --- a/drivers/pci/ats.c
> +++ b/drivers/pci/ats.c
> @@ -30,6 +30,22 @@ void pci_ats_init(struct pci_dev *dev)
>   dev->ats_cap = pos;
>  }
>  
> +/**
> + * pci_ats_supported - check if the device can use ATS
> + * @dev: the PCI device
> + *
> + * Returns true if the device supports ATS and is allowed to use it, false
> + * otherwise.
> + */
> +bool pci_ats_supported(struct pci_dev *dev)
> +{
> + if (!dev->ats_cap)
> + return false;
> +
> + return !dev->untrusted;
> +}
> +EXPORT_SYMBOL_GPL(pci_ats_supported);
> +
>  /**
>   * pci_enable_ats - enable the ATS capability
>   * @dev: the PCI device
> @@ -42,7 +58,7 @@ int pci_enable_ats(struct pci_dev *dev, int ps)
>   u16 ctrl;
>   struct pci_dev *pdev;
>  
> - if (!dev->ats_cap)
> + if (!pci_ats_supported(dev))
>   return -EINVAL;
>  
>   if (WARN_ON(dev->ats_enabled))
> -- 
> 2.26.2
> 


Re: [PATCH v2 2/3] PCI: Add DMA configuration for virtual platforms

2020-03-18 Thread Bjorn Helgaas
On Fri, Feb 28, 2020 at 06:25:37PM +0100, Jean-Philippe Brucker wrote:
> Hardware platforms usually describe the IOMMU topology using either
> device-tree pointers or vendor-specific ACPI tables.  For virtual
> platforms that don't provide a device-tree, the virtio-iommu device
> contains a description of the endpoints it manages.  That information
> allows us to probe endpoints after the IOMMU is probed (possibly as late
> as userspace modprobe), provided it is discovered early enough.
> 
> Add a hook to pci_dma_configure(), which returns -EPROBE_DEFER if the
> endpoint is managed by a vIOMMU that will be loaded later, or 0 in any
> other case to avoid disturbing the normal DMA configuration methods.
> When CONFIG_VIRTIO_IOMMU_TOPOLOGY isn't selected, the call to
> virt_dma_configure() is compiled out.
> 
> As long as the information is consistent, platforms can provide both a
> device-tree and a built-in topology, and the IOMMU infrastructure is
> able to deal with multiple DMA configuration methods.
> 
> Signed-off-by: Jean-Philippe Brucker 

Acked-by: Bjorn Helgaas 
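
Here is a userspace sketch of the control flow added to pci_dma_configure(). It assumes that a device either sits behind a vIOMMU described by the built-in topology or does not. The fake_* names are hypothetical. Only the ordering comes from the patch: run the hook first, return on a nonzero result, and otherwise fall through to the usual OF/ACPI path. EPROBE_DEFER is redefined here because it is kernel-internal.

```c
#include <stdbool.h>

/* Kernel-internal errno value, redefined for this userspace sketch. */
#define EPROBE_DEFER 517

struct fake_dev {
	bool behind_viommu;	/* listed in the virtio-iommu topology */
	bool viommu_probed;	/* has that vIOMMU driver bound yet? */
	int fw_configured;	/* counts passes through the OF/ACPI path */
};

/* Stand-in for virt_dma_configure(): defer only when it matters. */
static int fake_virt_dma_configure(struct fake_dev *dev)
{
	if (dev->behind_viommu && !dev->viommu_probed)
		return -EPROBE_DEFER;
	return 0;	/* any other case: leave the normal methods alone */
}

/* Stand-in for pci_dma_configure(): the hook runs first. */
static int fake_pci_dma_configure(struct fake_dev *dev)
{
	int ret = fake_virt_dma_configure(dev);

	if (ret)
		return ret;
	dev->fw_configured++;	/* OF or ACPI configuration would go here */
	return 0;
}
```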

> ---
>  drivers/pci/pci-driver.c | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index 0454ca0e4e3f..69303a814f21 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -18,6 +18,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include "pci.h"
>  #include "pcie/portdrv.h"
>  
> @@ -1602,6 +1603,10 @@ static int pci_dma_configure(struct device *dev)
>   struct device *bridge;
>   int ret = 0;
>  
> + ret = virt_dma_configure(dev);
> + if (ret)
> + return ret;
> +
>   bridge = pci_get_host_bridge_device(to_pci_dev(dev));
>  
>   if (IS_ENABLED(CONFIG_OF) && bridge->parent &&
> -- 
> 2.25.0
> 


Re: [PATCH v2 1/6] PCI/ATS: Export symbols of PASID functions

2020-03-18 Thread Bjorn Helgaas
On Mon, Feb 24, 2020 at 05:58:41PM +0100, Jean-Philippe Brucker wrote:
> The Arm SMMUv3 driver uses pci_{enable,disable}_pasid() and related
> functions.  Export them to allow the driver to be built as a module.
> 
> Signed-off-by: Jean-Philippe Brucker 

Acked-by: Bjorn Helgaas 
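
pci_max_pasids() is one of the newly exported helpers. Here is a standalone sketch of the decoding its final lines perform, reusing the PASID_NUMBER_SHIFT and PASID_NUMBER_MASK definitions from the quoted hunk. The function name, and passing in the raw PASID Capability register value instead of reading config space, are assumptions of the sketch.

```c
#include <stdint.h>

#define PASID_NUMBER_SHIFT	8
#define PASID_NUMBER_MASK	(0x1f << PASID_NUMBER_SHIFT)

/*
 * Decode the Max PASID Width field (bits 12:8) of a PASID Capability
 * register value and return how many PASIDs that width allows.
 * Lower bits (feature flags) are ignored by the mask.
 */
static int max_pasids_from_cap(uint16_t cap)
{
	int width = (cap & PASID_NUMBER_MASK) >> PASID_NUMBER_SHIFT;

	return 1 << width;
}
```

For example, a register value of 0x1406 has a width field of 0x14 (20), so the device supports 1 << 20 PASIDs.
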

> ---
>  drivers/pci/ats.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> index 3ef0bb281e7c..390e92f2d8d1 100644
> --- a/drivers/pci/ats.c
> +++ b/drivers/pci/ats.c
> @@ -366,6 +366,7 @@ int pci_enable_pasid(struct pci_dev *pdev, int features)
>  
>   return 0;
>  }
> +EXPORT_SYMBOL_GPL(pci_enable_pasid);
>  
>  /**
>   * pci_disable_pasid - Disable the PASID capability
> @@ -390,6 +391,7 @@ void pci_disable_pasid(struct pci_dev *pdev)
>  
>   pdev->pasid_enabled = 0;
>  }
> +EXPORT_SYMBOL_GPL(pci_disable_pasid);
>  
>  /**
>   * pci_restore_pasid_state - Restore PASID capabilities
> @@ -441,6 +443,7 @@ int pci_pasid_features(struct pci_dev *pdev)
>  
>   return supported;
>  }
> +EXPORT_SYMBOL_GPL(pci_pasid_features);
>  
>  #define PASID_NUMBER_SHIFT   8
>  #define PASID_NUMBER_MASK	(0x1f << PASID_NUMBER_SHIFT)
> @@ -469,4 +472,5 @@ int pci_max_pasids(struct pci_dev *pdev)
>  
>   return (1 << supported);
>  }
> +EXPORT_SYMBOL_GPL(pci_max_pasids);
>  #endif /* CONFIG_PCI_PASID */
> -- 
> 2.25.0
> 


Re: [PATCH v2 02/11] PCI: Add ats_supported host bridge flag

2020-03-12 Thread Bjorn Helgaas
On Wed, Mar 11, 2020 at 01:44:57PM +0100, Jean-Philippe Brucker wrote:
> Each vendor has their own way of describing whether a host bridge
> supports ATS.  The Intel and AMD ACPI tables selectively enable or
> disable ATS per device or sub-tree, while Arm has a single bit for each
> host bridge.  For those that need it, add an ats_supported bit to the
> host bridge structure.

Can you mention the specific ACPI tables here in the commit log?

Maybe elaborate on the "for those that need it" bit?  I'm not sure
whether you need it for the cases where DT or ACPI tells us directly
about the host bridge, or for the more selective cases.

I guess in one sense you *always* need it since you check the cached
bit later.

I don't understand the implications of this, especially the selective
situation.  Given your comment from the first posting, I thought this
was a property of the host bridge, so I don't know what it means to
say some devices support ATS but others don't.

> Signed-off-by: Jean-Philippe Brucker 
> ---
> v1->v2: try to improve the comment
> ---
>  drivers/pci/probe.c | 8 
>  include/linux/pci.h | 1 +
>  2 files changed, 9 insertions(+)
> 
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 512cb4312ddd..b5e36f06b40a 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -598,6 +598,14 @@ static void pci_init_host_bridge(struct pci_host_bridge *bridge)
>   bridge->native_shpc_hotplug = 1;
>   bridge->native_pme = 1;
>   bridge->native_ltr = 1;
> +
> + /*
> +  * Some systems (ACPI IORT, device-tree) declare ATS support at the host
> +  * bridge, and clear this bit when ATS isn't supported. Others (ACPI
> +  * DMAR and IVRS) declare ATS support with a smaller granularity, and
> +  * need this bit set.
> +  */
> + bridge->ats_supported = 1;
>  }
>  
>  struct pci_host_bridge *pci_alloc_host_bridge(size_t priv)
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 3840a541a9de..9fe2e84d74d7 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -511,6 +511,7 @@ struct pci_host_bridge {
>   unsigned int	native_pme:1;	/* OS may use PCIe PME */
>   unsigned int	native_ltr:1;	/* OS may use PCIe LTR */
>   unsigned int	preserve_config:1;	/* Preserve FW resource setup */
> + unsigned int	ats_supported:1;
>  
>   /* Resource alignment requirements */
>   resource_size_t (*align_resource)(struct pci_dev *dev,
> -- 
> 2.25.1
> 
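
Here is a sketch of one possible reading of the comment in this patch. It assumes the host bridge bit and any finer-grained firmware answer are ANDed, so defaulting the bit to 1 makes it a no-op on DMAR/IVRS systems, while IORT or DT can veto ATS for a whole bridge. The types and the AND rule are assumptions for illustration. They are not taken from the series.

```c
#include <stdbool.h>

struct fake_bridge {
	bool ats_supported;
};

/* Mirrors the default set in pci_init_host_bridge(). */
static void init_bridge(struct fake_bridge *b)
{
	b->ats_supported = true;
}

/*
 * Hypothetical rule: ATS is usable only if the bridge-level bit is set
 * (cleared by IORT/DT when the bridge lacks ATS) and the per-device or
 * per-sub-tree answer (DMAR/IVRS style) also allows it.  Firmware that
 * only describes the bridge would pass fw_device_ok = true.
 */
static bool ats_effective(const struct fake_bridge *b, bool fw_device_ok)
{
	return b->ats_supported && fw_device_ok;
}
```
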

