Re: [PATCH 4/4] iommu/vt-d: Remove lazy allocation of domains

2019-03-05 Thread Lu Baolu

Hi,

On 3/5/19 7:46 PM, James Sewart wrote:

Hey Lu,


On 5 Mar 2019, at 06:59, Lu Baolu  wrote:

Hi,

It's hard for me to understand why we remove the RMRR-related
code in this patch.


The RMRR code removed here requires the lazy allocation of domains to
exist, as it runs before iommu.c would assign IOMMU groups and attach a
domain. Without patch 3, removing this code would mean the RMRR regions
are never mapped for a domain. With it, iommu.c allocates a default domain
for the group that a device is about to be put in, attaches the domain to
the device, and then creates an identity map for each region returned by
get_resv_regions; this is where the RMRRs are set up for the default
domain.


And now we have two places holding a domain for a device: the group and
dev->info. We could consider optimizing the use of the per-device iommu
arch data, but that can be later work anyway.

More comments inline.

On 3/4/19 11:47 PM, James Sewart wrote:

The generic IOMMU code will allocate and attach a dma ops domain to each
device that comes online, replacing any lazily allocated domain. This
removes RMRR application at iommu init time, as we won't have a domain
attached to any device yet; iommu.c will do this after attaching a
device, using create_direct_mappings.

Signed-off-by: James Sewart 
---
  drivers/iommu/intel-iommu.c | 202 ++--
  1 file changed, 8 insertions(+), 194 deletions(-)
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 71cd6bbfec05..282257e2628d 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2595,118 +2595,6 @@ static struct dmar_domain 
*dmar_insert_one_dev_info(struct intel_iommu *iommu,
return domain;
  }
  -static int get_last_alias(struct pci_dev *pdev, u16 alias, void *opaque)
-{
-   *(u16 *)opaque = alias;
-   return 0;
-}
-
-static struct dmar_domain *find_or_alloc_domain(struct device *dev, int gaw)
-{
-   struct device_domain_info *info = NULL;
-   struct dmar_domain *domain = NULL;
-   struct intel_iommu *iommu;
-   u16 dma_alias;
-   unsigned long flags;
-   u8 bus, devfn;
-
-   iommu = device_to_iommu(dev, &bus, &devfn);
-   if (!iommu)
-   return NULL;
-
-   if (dev_is_pci(dev)) {
-   struct pci_dev *pdev = to_pci_dev(dev);
-
-   pci_for_each_dma_alias(pdev, get_last_alias, &dma_alias);
-
-   spin_lock_irqsave(&device_domain_lock, flags);
-   info = dmar_search_domain_by_dev_info(pci_domain_nr(pdev->bus),
- PCI_BUS_NUM(dma_alias),
- dma_alias & 0xff);
-   if (info) {
-   iommu = info->iommu;
-   domain = info->domain;
-   }
-   spin_unlock_irqrestore(&device_domain_lock, flags);
-
-   /* DMA alias already has a domain, use it */
-   if (info)
-   goto out;
-   }
-
-   /* Allocate and initialize new domain for the device */
-   domain = alloc_domain(0);
-   if (!domain)
-   return NULL;
-   if (domain_init(domain, iommu, gaw)) {
-   domain_exit(domain);
-   return NULL;
-   }
-
-out:
-
-   return domain;
-}
-
-static struct dmar_domain *set_domain_for_dev(struct device *dev,
- struct dmar_domain *domain)
-{
-   struct intel_iommu *iommu;
-   struct dmar_domain *tmp;
-   u16 req_id, dma_alias;
-   u8 bus, devfn;
-
-   iommu = device_to_iommu(dev, &bus, &devfn);
-   if (!iommu)
-   return NULL;
-
-   req_id = ((u16)bus << 8) | devfn;
-
-   if (dev_is_pci(dev)) {
-   struct pci_dev *pdev = to_pci_dev(dev);
-
-   pci_for_each_dma_alias(pdev, get_last_alias, &dma_alias);
-
-   /* register PCI DMA alias device */
-   if (req_id != dma_alias) {
-   tmp = dmar_insert_one_dev_info(iommu, 
PCI_BUS_NUM(dma_alias),
-   dma_alias & 0xff, NULL, domain);
-
-   if (!tmp || tmp != domain)
-   return tmp;
-   }
-   }
-
-   tmp = dmar_insert_one_dev_info(iommu, bus, devfn, dev, domain);
-   if (!tmp || tmp != domain)
-   return tmp;
-
-   return domain;
-}
-
-static struct dmar_domain *get_domain_for_dev(struct device *dev, int gaw)
-{
-   struct dmar_domain *domain, *tmp;
-
-   domain = find_domain(dev);
-   if (domain)
-   goto out;
-
-   domain = find_or_alloc_domain(dev, gaw);
-   if (!domain)
-   goto out;
-
-   tmp = set_domain_for_dev(dev, domain);
-   if (!tmp || domain != tmp) {
-   domain_exit(domain);
-   domain = tmp;
-   }
-
-out:
-
-   return domain;
-}
-
  static int iommu_domain_identity_map(struct 

Re: [PATCH 0/4] iommu/vt-d: Fix-up device-domain relationship by refactoring to use iommu group default domain.

2019-03-05 Thread Lu Baolu

Hi,

On 3/5/19 7:14 PM, James Sewart wrote:

Hey Lu,

The motivation for this is the buggy domain <-> IOMMU group relationship
when using find_or_alloc_domain. From what I have read, it should be the
case that an IOMMU domain is shared by all devices in the same group, and
thus the same mappings. This is because the IOMMU group is determined by
things like the ACS settings on PCI switches, which determine whether
devices can talk to each other.

Yes, exactly. This is a known issue.



In find_or_alloc_domain we only determine domain sharing based on devices
returned by pci_for_each_dma_alias, whereas there are many more checks in
pci_device_group (e.g. the ACS settings of intermediary PCI switches),
which is used to determine which IOMMU group a device is in. This
discontinuity means that devices within the same IOMMU group can end up
with different domains.


Yes. Agree again.



We see issues because it is supposed to be safe to assume that devices
within a group are in the same address space, but if each device has its
own domain then any mapping created won't be valid for the other devices,
and could even be mapped differently for each device. This could also
cause issues with a device whose RMRRs are not applied to the domains of
other devices within its group: those regions could be remapped for the
other devices, resulting in unexpected behaviour during peer-to-peer
transfers.


Yes, fair enough.

I asked this question because I am interested to know whether multiple
domains per group due to lack of ACS consideration will cause any issues
that we need to fix in the stable kernels.

Best regards,
Lu Baolu



Cheers,
James



On 5 Mar 2019, at 06:05, Lu Baolu  wrote:

Hi James,

Very glad to see this. Thank you!

On 3/4/19 11:41 PM, James Sewart wrote:

Hey,
This patchset moves IOMMU_DOMAIN_DMA iommu domain management to iommu.c.
This avoids the use of find_or_alloc_domain whose domain assignment is
inconsistent with the iommu grouping as determined by pci_device_group.


Is this a bug fix or an improvement? What's the real issue will it cause
if we go without this patch set?

Best regards,
Lu Baolu


Patch 3 permits domain type IOMMU_DOMAIN_DMA to be allocated via the
iommu_ops api, allowing the default_domain of an iommu group to be set in
iommu.c. This domain will be attached to every device that is brought up
with an iommu group, and the device's reserved regions will be mapped
using regions returned by get_resv_regions.
In intel_iommu_domain_alloc we don't know the IOMMU this domain will be
associated with, so we defer full initialisation until
intel_iommu_attach_device. Currently iommu.c:iommu_group_add_device will
try to map a device's reserved regions before attaching the domain, which
would cause issues if the domain is not fully initialised. This is
addressed in patch 1 by moving the mapping to after attaching.
Patch 2 implements apply_resv_region, used in
iommu_group_create_direct_mappings to mark the reserved regions as
non-mappable for the dma_map_ops api.
Patch 4 removes the domain lazy allocation logic. Before this patch the
lazy allocation logic would not be used, as any domain allocated via
these paths would be replaced when attaching the group default domain.
Default domain allocation has been tested with and without this patch on
4.19.
Cheers,
James.
James Sewart (4):
   iommu: Move iommu_group_create_direct_mappings to after device_attach
   iommu/vt-d: Implement apply_resv_region for reserving IOVA ranges
   iommu/vt-d: Allow IOMMU_DOMAIN_DMA and IOMMU_DOMAIN_IDENTITY to be allocated
   iommu/vt-d: Remove lazy allocation of domains
  drivers/iommu/intel-iommu.c | 329 
  drivers/iommu/iommu.c   |   4 +-
  2 files changed, 108 insertions(+), 225 deletions(-)




___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 02/13] driver core: Remove the link if there is no driver with AUTO flag

2019-03-05 Thread Evan Green
On Wed, Feb 27, 2019 at 6:33 AM Yong Wu  wrote:
>
> On Mon, 2019-02-25 at 15:53 -0800, Evan Green wrote:
> > On Mon, Dec 31, 2018 at 8:52 PM Yong Wu  wrote:
> > >
> > > DL_FLAG_AUTOREMOVE_CONSUMER/SUPPLIER means "Remove the link
> > > automatically on consumer/supplier driver unbind"; that means we should
> > > remove the whole device_link when that driver is gone, no matter what
> > > the ref_count of the link is.
> > >
> > > CC: Greg Kroah-Hartman 
> > > Signed-off-by: Yong Wu 
> > > ---
> > > The ref_count of our device_link is normally over 1. When the consumer
> > > device driver is removed, the whole device_link should be removed.
> > > Thus, I add this patch.
> > > ---
> >
> > I will admit to reading about device links for the first time while
> > reviewing this patch, but I don't really get this. Why use a kref at
> > all if we're just going to ignore its value? For instance, I see that
> > if you call device_link_add() with the same supplier and consumer, it
> > uses the kref to return the same link. That machinery is broken with
> > your change. Although I don't see any uses of it, you might also
> > expect a supplier or consumer could do a kref_get() on the link it got
> > back from device_link_add(), and have a reasonable expectation that
> > the link wouldn't be freed out from under it. This would also be
> > broken.
> >
> > Can you explain why your device_links normally have a reference count > 1,
>
> I use a device link between the smi-larb device and the iommu-consumer
> device. Take an example: smi-larb1 has 4 VDEC ports. From 4/13 in this
> patchset, we use a device_link to link the VDEC device and the smi-larb1
> device in the function mtk_iommu_config(). Since there are 4 ports,
> device_link_add() is called 4 times.
>
> >
> > and why those additional references can't be cleaned up in an
> > orderly fashion?
>
> If the iommu-consumer device (like VDEC above) is removed, it should enter
> device_links_driver_cleanup(), which only does a ref_put one time. I guess
> the whole link should be removed at that time.

It seems like Robin had some suggestions about using
mtk_iommu_add_device() rather than the attach_dev() to set the links
up, and then track them for removal in the corresponding
remove_device() callback. Then you wouldn't need this change, right?


Re: [PATCH 05/13] memory: mtk-smi: Add device-link between smi-larb and smi-common

2019-03-05 Thread Evan Green
On Wed, Feb 27, 2019 at 6:33 AM Yong Wu  wrote:
>
> On Mon, 2019-02-25 at 15:54 -0800, Evan Green wrote:
> > On Mon, Dec 31, 2018 at 8:52 PM Yong Wu  wrote:
> > >
> > > Normally, if the smi-larb HW needs to work, we should enable the
> > > smi-common HW's power and clock first.
> > > This patch adds a device-link between the smi-larb dev and the
> > > smi-common dev; then if pm_runtime_get_sync(smi-larb-dev) is called,
> > > pm_runtime_get_sync(smi-common-dev) will be called automatically.
> > >
> > > Since smi is a built-in driver, like the IOMMU, and is never unbound,
> > > DL_FLAG_AUTOREMOVE_* is not needed.
> > >
> > > CC: Matthias Brugger 
> > > Suggested-by: Tomasz Figa 
> > > Signed-off-by: Yong Wu 
> > > ---
> > >  drivers/memory/mtk-smi.c | 16 +++-
> > >  1 file changed, 7 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/drivers/memory/mtk-smi.c b/drivers/memory/mtk-smi.c
> > > index 9688341..30930e4 100644
> > > --- a/drivers/memory/mtk-smi.c
> > > +++ b/drivers/memory/mtk-smi.c
> > > @@ -271,6 +271,7 @@ static int mtk_smi_larb_probe(struct platform_device 
> > > *pdev)
> > > struct device *dev = &pdev->dev;
> > > struct device_node *smi_node;
> > > struct platform_device *smi_pdev;
> > > +   struct device_link *link;
> > >
> > > larb = devm_kzalloc(dev, sizeof(*larb), GFP_KERNEL);
> > > if (!larb)
> > > @@ -310,6 +311,12 @@ static int mtk_smi_larb_probe(struct platform_device 
> > > *pdev)
> > > if (!platform_get_drvdata(smi_pdev))
> > > return -EPROBE_DEFER;
> > > larb->smi_common_dev = &smi_pdev->dev;
> > > +   link = device_link_add(dev, larb->smi_common_dev,
> > > +  DL_FLAG_PM_RUNTIME);
> >
> > Doesn't this need to be torn down in remove()? You mention that it's
> > built-in and never removed, but it does seem to have a remove()
>
> The MTK IOMMU driver needs to depend on this SMI driver. The IOMMU is a
> builtin driver, thus the SMI should also be a builtin driver.
>
> Technically, if the driver is builtin, then the "remove" function can be
> removed? If yes, I could do it in a new patch.

Yeah, I guess so. It's always sad to see cleanup code getting removed,
but it makes sense to me.


>
> It looks like the macro builtin_platform_driver() only supports one
> driver, but we have two drivers (smi-common and smi-larb) here.
>
> > function that tears down everything else, so it seemed a shame to
> > start leaking now. Maybe the AUTOREMOVE flag would do it.
>
>


Re: [PATCH 11/13] iommu/mediatek: Use builtin_platform_driver

2019-03-05 Thread Evan Green
On Wed, Feb 27, 2019 at 6:33 AM Yong Wu  wrote:
>
> On Mon, 2019-02-25 at 15:56 -0800, Evan Green wrote:
> > On Mon, Dec 31, 2018 at 8:53 PM Yong Wu  wrote:
> > >
> > > The MediaTek IOMMU should wait for the smi larb, which needs to wait
> > > for the power domain (mtk-scpsys.c) and the multimedia CCF, both of
> > > which are module init. Thus, subsys_initcall for the MediaTek IOMMU is
> > > not helpful. Switch to builtin_platform_driver.
> > >
> > > Meanwhile, the ".remove" can be removed. Move its content to
> > > ".shutdown".
> > >
> > > Signed-off-by: Yong Wu 
> > > ---
> > >  drivers/iommu/mtk_iommu.c| 23 ++-
> > >  drivers/iommu/mtk_iommu_v1.c | 16 ++--
> > >  2 files changed, 4 insertions(+), 35 deletions(-)
> > >
> > > diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> > > index 735ae8d..2798b12 100644
> > > --- a/drivers/iommu/mtk_iommu.c
> > > +++ b/drivers/iommu/mtk_iommu.c
> > > @@ -690,7 +690,7 @@ static int mtk_iommu_probe(struct platform_device 
> > > *pdev)
> > > return component_master_add_with_match(dev, &mtk_iommu_com_ops, 
> > > match);
> > >  }
> > >
> > > -static int mtk_iommu_remove(struct platform_device *pdev)
> > > +static void mtk_iommu_shutdown(struct platform_device *pdev)
> > >  {
> > > struct mtk_iommu_data *data = platform_get_drvdata(pdev);
> > >
> > > @@ -703,12 +703,6 @@ static int mtk_iommu_remove(struct platform_device 
> > > *pdev)
> > > clk_disable_unprepare(data->bclk);
> > > devm_free_irq(&pdev->dev, data->irq, data);
> > > component_master_del(&pdev->dev, &mtk_iommu_com_ops);
> > > -   return 0;
> > > -}
> > > -
> > > -static void mtk_iommu_shutdown(struct platform_device *pdev)
> > > -{
> > > -   mtk_iommu_remove(pdev);
> >
> > Is there a reason all these things are happening in shutdown()? Don't
> > we normally just not clean things up and let the machine turn off?
> > Normally I'm a big advocate of proper symmetric teardown, so it hurts
> > a little to ask.
>
> In this patch I change the driver to a builtin driver. I guess the
> "_remove" is not helpful now, thus I removed it.
>
> About the shutdown: I don't have a special reason for it. I only want to
> mute the m4u log when something goes wrong in the shutdown flow.
>

Wouldn't you want to see the logs when something goes wrong in
shutdown? Or is there a lot of noise that gets generated if you don't
do this during a graceful shutdown? I guess if there's some real
reason you're doing all this stuff in shutdown then it's fine, but if
it was just a default shuttling of code from remove to shutdown
because remove was going away then it could be deleted.
-Evan


Re: [PATCH 03/13] iommu/mediatek: Add probe_defer for smi-larb

2019-03-05 Thread Evan Green
On Wed, Feb 27, 2019 at 6:34 AM Yong Wu  wrote:
>
> On Mon, 2019-02-25 at 15:54 -0800, Evan Green wrote:
> > On Mon, Dec 31, 2018 at 8:52 PM Yong Wu  wrote:
> > >
> > > The iommu consumer should use device_link to connect with the
> > > smi-larb (supplier); then the smi-larb should run before the iommu
> > > consumer. Here we delay the iommu driver until the smi driver is
> > > ready, so all the iommu consumers always come after the smi driver.
> > >
> > > Without this patch, if some consumer drivers run before smi-larb,
> > > the supplier's link_status is DL_DEV_NO_DRIVER (0) in
> > > device_link_add, and then device_links_driver_bound will use WARN_ON
> > > to complain that the link_status of the supplier is not right.
> > >
> >
> > I don't quite have enough knowledge of device links to figure out if
> > this is a) the right fix, b) a deficiency in the device_links code, or
> > c) neither.
>
> I can not say more about the device links, but from the MTK iommu point
> of view it looks like the right fix. As the comment above says, we
> should guarantee the probe sequence: SMI-larb should come before the MTK
> IOMMU, which needs to come before the iommu consumers. This patch
> ensures SMI-larb always probes before MTK_IOMMU.
>
> Then, without this patchset, what would happen to MTK_IOMMU?
>
> The MTK_IOMMU is subsys_initcall; all the consumers only need to come
> after the MTK_IOMMU and don't need to wait for the SMI-larb driver. If
> some consumer runs before the SMI-larb driver, that is also OK as long
> as it doesn't call mtk_smi_larb_get in its probe (of course it will
> abort if it calls this while the smi driver hasn't probed yet). The
> consumers and smi-larb are normally module_init, so we can not guarantee
> the sequence; it is a potential issue.
>
> Some consumers know they should wait for SMI-larb, like [1], but it
> doesn't work.
>
> [1]
> https://elixir.bootlin.com/linux/v5.0-rc1/source/drivers/media/platform/mtk-mdp/mtk_mdp_comp.c#L145

Ok, I think I get it now. The device_links that get created are
enforcing a probe order dependency, but in cases where this driver
probes before those device links are even created, then this extra
check defers so that the probe order stays correct. Then the
device_links were correct to WARN because they're basically saying,
"hey, you asked us to enforce a particular dependency, but I can
already see probe is happening in an order that violates that
dependency".

Is that right?

-Evan


[PATCH v2 RFC/RFT] dma-contiguous: Get normal pages for single-page allocations

2019-03-05 Thread Nicolin Chen
The addresses within a single page are always contiguous, so it is not
really necessary to always allocate one whole page from the CMA area.
Since the CMA area has a limited, predefined size, it may run out of
space in heavy use cases, where quite a lot of CMA pages are allocated
for single pages.

However, there is also a concern that a device might care where a page
comes from -- it might expect the page to come from the CMA area and act
differently if it doesn't.

This patch tries to get normal pages for single-page allocations unless
the device has its own CMA area. This saves space in the CMA area for
larger CMA allocations, and also reduces the CMA fragmentation that
results from trivial allocations.

Also, it updates the API and its callers so as to pass gfp flags.

Signed-off-by: Nicolin Chen 
---
Changelog
v1->v2:
 * Removed one ';' in header file that caused build errors when
   CONFIG_DMA_CMA=N.

 arch/arm/mm/dma-mapping.c  |  5 ++---
 arch/arm64/mm/dma-mapping.c|  2 +-
 arch/xtensa/kernel/pci-dma.c   |  2 +-
 drivers/iommu/amd_iommu.c  |  2 +-
 drivers/iommu/intel-iommu.c|  3 +--
 include/linux/dma-contiguous.h |  4 ++--
 kernel/dma/contiguous.c| 23 +++
 7 files changed, 27 insertions(+), 14 deletions(-)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 8a90f298af96..c39fc2d97712 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -588,7 +588,7 @@ static void *__alloc_from_contiguous(struct device *dev, 
size_t size,
struct page *page;
void *ptr = NULL;
 
-   page = dma_alloc_from_contiguous(dev, count, order, gfp & __GFP_NOWARN);
+   page = dma_alloc_from_contiguous(dev, count, order, gfp);
if (!page)
return NULL;
 
@@ -1293,8 +1293,7 @@ static struct page **__iommu_alloc_buffer(struct device 
*dev, size_t size,
unsigned long order = get_order(size);
struct page *page;
 
-   page = dma_alloc_from_contiguous(dev, count, order,
-gfp & __GFP_NOWARN);
+   page = dma_alloc_from_contiguous(dev, count, order, gfp);
if (!page)
goto error;
 
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 78c0a72f822c..660adedaab5d 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -159,7 +159,7 @@ static void *__iommu_alloc_attrs(struct device *dev, size_t 
size,
struct page *page;
 
page = dma_alloc_from_contiguous(dev, size >> PAGE_SHIFT,
-   get_order(size), gfp & __GFP_NOWARN);
+get_order(size), gfp);
if (!page)
return NULL;
 
diff --git a/arch/xtensa/kernel/pci-dma.c b/arch/xtensa/kernel/pci-dma.c
index 9171bff76fc4..e15b893caadb 100644
--- a/arch/xtensa/kernel/pci-dma.c
+++ b/arch/xtensa/kernel/pci-dma.c
@@ -157,7 +157,7 @@ void *arch_dma_alloc(struct device *dev, size_t size, 
dma_addr_t *handle,
 
if (gfpflags_allow_blocking(flag))
page = dma_alloc_from_contiguous(dev, count, get_order(size),
-flag & __GFP_NOWARN);
+flag);
 
if (!page)
page = alloc_pages(flag | __GFP_ZERO, get_order(size));
diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 6b0760dafb3e..c54923a9e31f 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2691,7 +2691,7 @@ static void *alloc_coherent(struct device *dev, size_t 
size,
return NULL;
 
page = dma_alloc_from_contiguous(dev, size >> PAGE_SHIFT,
-   get_order(size), flag & __GFP_NOWARN);
+get_order(size), flag);
if (!page)
return NULL;
}
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 87274b54febd..6b5cf7313db6 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3791,8 +3791,7 @@ static void *intel_alloc_coherent(struct device *dev, 
size_t size,
if (gfpflags_allow_blocking(flags)) {
unsigned int count = size >> PAGE_SHIFT;
 
-   page = dma_alloc_from_contiguous(dev, count, order,
-flags & __GFP_NOWARN);
+   page = dma_alloc_from_contiguous(dev, count, order, flags);
if (page && iommu_no_mapping(dev) &&
page_to_phys(page) + size > dev->coherent_dma_mask) {
dma_release_from_contiguous(dev, page, count);
diff --git a/include/linux/dma-contiguous.h b/include/linux/dma-contiguous.h
index f247e8aa5e3d..10ea106b720f 100644
--- 

Re: [PATCH v4 05/22] iommu: Introduce cache_invalidate API

2019-03-05 Thread Auger Eric
Hi Kevin, Yi,

On 3/5/19 4:28 PM, Jean-Philippe Brucker wrote:
> On 18/02/2019 13:54, Eric Auger wrote:
>> From: "Liu, Yi L" 
>>
>> In any virtualization use case, when the first translation stage
>> is "owned" by the guest OS, the host IOMMU driver has no knowledge
>> of caching structure updates unless the guest invalidation activities
>> are trapped by the virtualizer and passed down to the host.
>>
>> Since the invalidation data are obtained from user space and will be
>> written into physical IOMMU, we must allow security check at various
>> layers. Therefore, generic invalidation data format are proposed here,
>> model specific IOMMU drivers need to convert them into their own format.
>>
>> Signed-off-by: Liu, Yi L 
>> Signed-off-by: Jean-Philippe Brucker 
>> Signed-off-by: Jacob Pan 
>> Signed-off-by: Ashok Raj 
>> Signed-off-by: Eric Auger 
>>
>> ---
>> v3 -> v4:
>> - full reshape of the API following Alex' comments
>>
>> v1 -> v2:
>> - add arch_id field
>> - renamed tlb_invalidate into cache_invalidate as this API allows
>>   to invalidate context caches on top of IOTLBs
>>
>> v1:
>> renamed sva_invalidate into tlb_invalidate and add iommu_ prefix in
>> header. Commit message reworded.
>> ---
>>  drivers/iommu/iommu.c  | 14 
>>  include/linux/iommu.h  | 14 
>>  include/uapi/linux/iommu.h | 71 ++
>>  3 files changed, 99 insertions(+)
>>
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index b3adb77cb14c..bcb8eb15426c 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -1564,6 +1564,20 @@ void iommu_detach_pasid_table(struct iommu_domain 
>> *domain)
>>  }
>>  EXPORT_SYMBOL_GPL(iommu_detach_pasid_table);
>>  
>> +int iommu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
>> +   struct iommu_cache_invalidate_info *inv_info)
>> +{
>> +int ret = 0;
>> +
>> +if (unlikely(!domain->ops->cache_invalidate))
>> +return -ENODEV;
>> +
>> +ret = domain->ops->cache_invalidate(domain, dev, inv_info);
>> +
>> +return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(iommu_cache_invalidate);
>> +
>>  static void __iommu_detach_device(struct iommu_domain *domain,
>>struct device *dev)
>>  {
>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> index 7045e26f3a7d..a3b879d0753c 100644
>> --- a/include/linux/iommu.h
>> +++ b/include/linux/iommu.h
>> @@ -189,6 +189,7 @@ struct iommu_resv_region {
>>   * @pgsize_bitmap: bitmap of all possible supported page sizes
>>   * @attach_pasid_table: attach a pasid table
>>   * @detach_pasid_table: detach the pasid table
>> + * @cache_invalidate: invalidate translation caches
>>   */
>>  struct iommu_ops {
>>  bool (*capable)(enum iommu_cap);
>> @@ -235,6 +236,9 @@ struct iommu_ops {
>>struct iommu_pasid_table_config *cfg);
>>  void (*detach_pasid_table)(struct iommu_domain *domain);
>>  
>> +int (*cache_invalidate)(struct iommu_domain *domain, struct device *dev,
>> +struct iommu_cache_invalidate_info *inv_info);
>> +
>>  unsigned long pgsize_bitmap;
>>  };
>>  
>> @@ -348,6 +352,9 @@ extern void iommu_detach_device(struct iommu_domain 
>> *domain,
>>  extern int iommu_attach_pasid_table(struct iommu_domain *domain,
>>  struct iommu_pasid_table_config *cfg);
>>  extern void iommu_detach_pasid_table(struct iommu_domain *domain);
>> +extern int iommu_cache_invalidate(struct iommu_domain *domain,
>> +  struct device *dev,
>> +  struct iommu_cache_invalidate_info *inv_info);
>>  extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
>>  extern struct iommu_domain *iommu_get_dma_domain(struct device *dev);
>>  extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
>> @@ -798,6 +805,13 @@ void iommu_detach_pasid_table(struct iommu_domain 
>> *domain)
>>  {
>>  return -ENODEV;
>>  }
>> +static inline int
>> +iommu_cache_invalidate(struct iommu_domain *domain,
>> +   struct device *dev,
>> +   struct iommu_cache_invalidate_info *inv_info)
>> +{
>> +return -ENODEV;
>> +}
>>  
>>  #endif /* CONFIG_IOMMU_API */
>>  
>> diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
>> index e9065bfa5b24..ae41385b0a7e 100644
>> --- a/include/uapi/linux/iommu.h
>> +++ b/include/uapi/linux/iommu.h
>> @@ -159,4 +159,75 @@ struct iommu_pasid_table_config {
>>  };
>>  };
>>  
>> +/* defines the granularity of the invalidation */
>> +enum iommu_inv_granularity {
>> +IOMMU_INV_GRANU_DOMAIN, /* domain-selective invalidation */
>> +IOMMU_INV_GRANU_PASID,  /* pasid-selective invalidation */
>> +IOMMU_INV_GRANU_ADDR,   /* page-selective invalidation */
>> +};
>> +
>> +/**
>> + * Address Selective Invalidation Structure
>> + *
>> + * @flags indicates the 

Re: [PATCH v4 04/22] iommu: Introduce attach/detach_pasid_table API

2019-03-05 Thread Auger Eric
Hi Jean,

On 3/5/19 4:23 PM, Jean-Philippe Brucker wrote:
> On 18/02/2019 13:54, Eric Auger wrote:
>> From: Jacob Pan 
>>
>> In virtualization use case, when a guest is assigned
>> a PCI host device, protected by a virtual IOMMU on the guest,
>> the physical IOMMU must be programmed to be consistent with
>> the guest mappings. If the physical IOMMU supports two
>> translation stages it makes sense to program guest mappings
>> onto the first stage/level (ARM/Intel terminology) while the host
>> owns the stage/level 2.
>>
>> In that case, it is mandated to trap on guest configuration
>> settings and pass those to the physical iommu driver.
>>
>> This patch adds a new API to the iommu subsystem that allows
>> to set/unset the pasid table information.
>>
>> A generic iommu_pasid_table_config struct is introduced in
>> a new iommu.h uapi header. This is going to be used by the VFIO
>> user API.
>>
>> Signed-off-by: Jean-Philippe Brucker 
>> Signed-off-by: Liu, Yi L 
>> Signed-off-by: Ashok Raj 
>> Signed-off-by: Jacob Pan 
>> Signed-off-by: Eric Auger 
>>
>> ---
>>
>> This patch generalizes the API introduced by Jacob & co-authors in
>> https://lwn.net/Articles/754331/
>>
>> v3 -> v4:
>> - s/set_pasid_table/attach_pasid_table
>> - restore detach_pasid_table. Detach can be used on unwind path.
>> - add padding
>> - remove @abort
>> - signature used for config and format
>> - add comments for fields in the SMMU struct
>>
>> v2 -> v3:
>> - replace unbind/bind by set_pasid_table
>> - move table pointer and pasid bits in the generic part of the struct
>>
>> v1 -> v2:
>> - restore the original pasid table name
>> - remove the struct device * parameter in the API
>> - reworked iommu_pasid_smmuv3
>> ---
>>  drivers/iommu/iommu.c  | 19 +++
>>  include/linux/iommu.h  | 22 ++
>>  include/uapi/linux/iommu.h | 47 ++
>>  3 files changed, 88 insertions(+)
>>
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index c297cdcf7f89..b3adb77cb14c 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -1545,6 +1545,25 @@ int iommu_attach_device(struct iommu_domain *domain, 
>> struct device *dev)
>>  }
>>  EXPORT_SYMBOL_GPL(iommu_attach_device);
>>  
>> +int iommu_attach_pasid_table(struct iommu_domain *domain,
>> + struct iommu_pasid_table_config *cfg)
>> +{
>> +if (unlikely(!domain->ops->attach_pasid_table))
>> +return -ENODEV;
>> +
>> +return domain->ops->attach_pasid_table(domain, cfg);
>> +}
>> +EXPORT_SYMBOL_GPL(iommu_attach_pasid_table);
>> +
>> +void iommu_detach_pasid_table(struct iommu_domain *domain)
>> +{
>> +if (unlikely(!domain->ops->detach_pasid_table))
>> +return;
>> +
>> +domain->ops->detach_pasid_table(domain);
>> +}
>> +EXPORT_SYMBOL_GPL(iommu_detach_pasid_table);
>> +
>>  static void __iommu_detach_device(struct iommu_domain *domain,
>>struct device *dev)
>>  {
>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> index b38e0c100940..7045e26f3a7d 100644
>> --- a/include/linux/iommu.h
>> +++ b/include/linux/iommu.h
>> @@ -187,6 +187,8 @@ struct iommu_resv_region {
>>   * @domain_window_disable: Disable a particular window for a domain
>>   * @of_xlate: add OF master IDs to iommu grouping
>>   * @pgsize_bitmap: bitmap of all possible supported page sizes
>> + * @attach_pasid_table: attach a pasid table
>> + * @detach_pasid_table: detach the pasid table
> 
> Should go before pgsize_bitmap
sure
> 
>>   */
>>  struct iommu_ops {
>>  bool (*capable)(enum iommu_cap);
>> @@ -229,6 +231,10 @@ struct iommu_ops {
>>  int (*of_xlate)(struct device *dev, struct of_phandle_args *args);
>>  bool (*is_attach_deferred)(struct iommu_domain *domain, struct device 
>> *dev);
>>  
>> +int (*attach_pasid_table)(struct iommu_domain *domain,
>> +  struct iommu_pasid_table_config *cfg);
>> +void (*detach_pasid_table)(struct iommu_domain *domain);
>> +
>>  unsigned long pgsize_bitmap;
>>  };
>>  
>> @@ -339,6 +345,9 @@ extern int iommu_attach_device(struct iommu_domain 
>> *domain,
>> struct device *dev);
>>  extern void iommu_detach_device(struct iommu_domain *domain,
>>  struct device *dev);
>> +extern int iommu_attach_pasid_table(struct iommu_domain *domain,
>> +struct iommu_pasid_table_config *cfg);
>> +extern void iommu_detach_pasid_table(struct iommu_domain *domain);
>>  extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
>>  extern struct iommu_domain *iommu_get_dma_domain(struct device *dev);
>>  extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
>> @@ -777,6 +786,19 @@ const struct iommu_ops *iommu_ops_from_fwnode(struct 
>> fwnode_handle *fwnode)
>>  return NULL;
>>  }
>>  
>> +static inline
>> +int iommu_attach_pasid_table(struct 

Re: [PATCH v4 00/22] SMMUv3 Nested Stage Setup

2019-03-05 Thread Auger Eric
Hi,

On 2/18/19 2:54 PM, Eric Auger wrote:
> This series allows a virtualizer to program the nested stage mode.
> This is useful when both the host and the guest are exposed with
> an SMMUv3 and a PCI device is assigned to the guest using VFIO.
> 
> In this mode, the physical IOMMU must be programmed to translate
> the two stages: the one set up by the guest (IOVA -> GPA) and the
> one set up by the host VFIO driver as part of the assignment process
> (GPA -> HPA).
> 
> On Intel, this is traditionally achieved by combining the 2 stages
> into a single physical stage. However this relies on the capability
> to trap on each guest translation structure update. This is possible
> by using the VTD Caching Mode. Unfortunately the ARM SMMUv3 does
> not offer a similar mechanism.
> 
> However, the ARM SMMUv3 architecture supports 2 physical stages! Those
> were devised exactly with that use case in mind. Assuming the HW
> implements both stages (optional), the guest now can use stage 1
> while the host uses stage 2.
> 
> This assumes the virtualizer has means to propagate guest settings
> to the host SMMUv3 driver. This series brings this VFIO/IOMMU
> infrastructure.  Those services are:
> - bind the guest stage 1 configuration to the stream table entry
> - propagate guest TLB invalidations
> - bind MSI IOVAs
> - propagate faults collected at physical level up to the virtualizer
> 
> This series largely reuses the user API and infrastructure originally
> devised for SVA/SVM and patches submitted by Jacob, Yi Liu, Tianyu in
> [1-2] and Jean-Philippe [3-4].
> 
> Best Regards
> 
> Eric
> 
> This series can be found at:
> https://github.com/eauger/linux/tree/v5.0-rc7-2stage-v4
For those interested in testing this kernel branch, you can use the
following qemu branch:

https://github.com/eauger/qemu/tree/v3.1.0-2stage-prev3-for-patchv4-testing

I will send an official QEMU RFCv3 asap.

Thanks

Eric
> 
> References:
> [1] [PATCH v5 00/23] IOMMU and VT-d driver support for Shared Virtual
> Address (SVA)
> https://lwn.net/Articles/754331/
> [2] [RFC PATCH 0/8] Shared Virtual Memory virtualization for VT-d
> (VFIO part)
> https://lists.linuxfoundation.org/pipermail/iommu/2017-April/021475.html
> [3] [v2,00/40] Shared Virtual Addressing for the IOMMU
> https://patchwork.ozlabs.org/cover/912129/
> [4] [PATCH v3 00/10] Shared Virtual Addressing for the IOMMU
> https://patchwork.kernel.org/cover/10608299/
> 
> History:
> v3 -> v4:
> - took into account Alex, jean-Philippe and Robin's comments on v3
> - rework of the smmuv3 driver integration
> - add tear down ops for msi binding and PASID table binding
> - fix S1 fault propagation
> - put fault reporting patches at the beginning of the series following
>   Jean-Philippe's request
> - update of the cache invalidate and fault API uapis
> - VFIO fault reporting rework with 2 separate regions and one mmappable
>   segment for the fault queue
> - moved to PATCH
> 
> v2 -> v3:
> - When registering the S1 MSI binding we now store the device handle. This
>   addresses Robin's comment about discrimination of devices belonging to
>   different S1 groups and using different physical MSI doorbells.
> - Change the fault reporting API: use VFIO_PCI_DMA_FAULT_IRQ_INDEX to
>   set the eventfd and expose the faults through an mmappable fault region
> 
> v1 -> v2:
> - Added the fault reporting capability
> - asid properly passed on invalidation (fix assignment of multiple
>   devices)
> - see individual change logs for more info
> 
> Eric Auger (13):
>   iommu: Introduce bind/unbind_guest_msi
>   vfio: VFIO_IOMMU_BIND/UNBIND_MSI
>   iommu/smmuv3: Get prepared for nested stage support
>   iommu/smmuv3: Implement attach/detach_pasid_table
>   iommu/smmuv3: Implement cache_invalidate
>   dma-iommu: Implement NESTED_MSI cookie
>   iommu/smmuv3: Implement bind/unbind_guest_msi
>   iommu/smmuv3: Report non recoverable faults
>   vfio-pci: Add a new VFIO_REGION_TYPE_NESTED region type
>   vfio-pci: Register an iommu fault handler
>   vfio_pci: Allow to mmap the fault queue
>   vfio-pci: Add VFIO_PCI_DMA_FAULT_IRQ_INDEX
>   vfio: Document nested stage control
> 
> Jacob Pan (4):
>   driver core: add per device iommu param
>   iommu: introduce device fault data
>   iommu: introduce device fault report API
>   iommu: Introduce attach/detach_pasid_table API
> 
> Jean-Philippe Brucker (2):
>   iommu/arm-smmu-v3: Link domains and devices
>   iommu/arm-smmu-v3: Maintain a SID->device structure
> 
> Liu, Yi L (3):
>   iommu: Introduce cache_invalidate API
>   vfio: VFIO_IOMMU_ATTACH/DETACH_PASID_TABLE
>   vfio: VFIO_IOMMU_CACHE_INVALIDATE
> 
>  Documentation/vfio.txt  |  83 
>  drivers/iommu/arm-smmu-v3.c | 580 ++--
>  drivers/iommu/dma-iommu.c   | 145 ++-
>  drivers/iommu/iommu.c   | 221 ++-
>  drivers/vfio/pci/vfio_pci.c | 214 ++
>  drivers/vfio/pci/vfio_pci_intrs.c   |  19 +
>  

Re: [PATCH v2] iommu: io-pgtable: Add ARM Mali midgard MMU page table format

2019-03-05 Thread Robin Murphy

On 04/03/2019 18:48, Rob Herring wrote:

On Mon, Mar 4, 2019 at 11:31 AM Robin Murphy  wrote:


On 04/03/2019 15:32, Rob Herring wrote:

On Fri, Mar 1, 2019 at 10:28 AM Robin Murphy  wrote:


On 27/02/2019 23:22, Rob Herring wrote:

ARM Mali midgard GPU page tables are similar to standard 64-bit stage 1
page tables, but have a few differences. Add a new format type to
represent the format. The input address size is 48-bits and the output
address size is 40-bits (and possibly less?). Note that the later
bifrost GPUs follow the standard 64-bit stage 1 format.

The differences in the format compared to 64-bit stage 1 format are:

The 3rd level page entry bits are 0x1 instead of 0x3 for page entries.

The access flags are not read-only and unprivileged, but read and write.
This is similar to stage 2 entries, but the memory attributes field matches
stage 1 being an index.

The nG bit is not set by the vendor driver. This one didn't seem to matter,
but we'll keep it aligned to the vendor driver.


[...]


+ cfg->pgsize_bitmap &= (SZ_4K | SZ_2M | SZ_1G);
+ iop = arm_64_lpae_alloc_pgtable_s1(cfg, cookie);
+ if (iop)
+ cfg->arm_lpae_s1_cfg.tcr = 0;


The general design intent is that we return ready-to-go register values
here (much like mmu_get_as_setup() in mali_kbase, it seems), so I think
it's worth adding an arm_mali_cfg to the io_pgtable_cfg union and
transforming the VMSA output into final transtab/transcfg/memattr format
at this point, rather then callers needing to care (e.g. some of those
AS_TRANSTAB_LPAE_* bits look reminiscent of the walk attributes we set
up for a VMSA TCR).


I agree with returning ready-to-go values, but I'm not so sure the
bits are the same. Bits 0-1 are enable/mode bits which are pretty
specific to mali. Bit 2 is 'read inner' for whatever that means.
Perhaps it is read allocate cacheability, but that's a bit different
than RGN bits. Bit 4 is 'share outer'. Maybe it's the same as SH0 if
bit 3 is included, but I have no evidence of that. So I don't think
there's really anything to transform. We just set the bits needed. So
here's what I have in mind:


Right, my Friday-afternoon wording perhaps wasn't the best - by
"transform" I didn't mean to imply trying to reinterpret the default
settings we configure for a VMSA TCR, merely applying some
similarly-appropriate defaults to make a Mali TRANSTAB out of the VMSA TTBR.


iop = arm_64_lpae_alloc_pgtable_s1(cfg, cookie);
if (iop) {
u64 mair, ttbr;

/* Copy values as union fields overlap */
mair = cfg->arm_lpae_s1_cfg.mair[0];
ttbr = cfg->arm_lpae_s1_cfg.ttbr[0];

cfg->arm_mali_lpae_cfg.mair = mair;
cfg->arm_mali_lpae_cfg.ttbr = ttbr;
cfg->arm_mali_lpae_cfg.ttbr |= ARM_MALI_LPAE_TTBR_READ_INNER |
  ARM_MALI_LPAE_TTBR_ADRMODE_TABLE;
}


...pretty much exactly like that (although I'd still prefer to use the
Mali register names for clarity, and presumably you'll still explicitly
initialise TRANSCFG too in the real patch).


No, TRANSCFG is only on Bifrost.


Ah, fair enough - I thought that your "cfg->arm_lpae_s1_cfg.tcr = 0;" 
was deliberately echoing the "setup->transcfg = 0;" in mali_kbase to 
imply that AS_TRANSCFG_ADRMODE_LEGACY was significant, but I suppose 
maybe Midgard *is* the legacy in this case. I'll admit I've not gone 
looking for the actual register-poking to see what's consumed by which 
devices.



While the page table format seems to
be standard stage 1 64-bit, the registers are still different from
anything else. So I guess we'll need yet another format definition
when we get there.


Hmm, at that point I'd be inclined to use standard AArch64 format and 
handle the rest in the driver, similar to how we repack TCR values into 
Context Descriptors for SMMUv3. Those TRANSCFG_PTW_* fields even look 
like they're pretending to be TCR.SH1 and TCR.IRGN1 (albeit with the 
latter having a wonky encoding, and the fact that there's no TTBR1 anyway).


I appreciate that somewhat undermines my argument for having io-pgtable 
fill in complete LPAE-format registers, so if you wanted the driver to 
handle both cases itself for consistency I wouldn't really mind - as 
long as LPAE still has its own init_fn where we can fine-tune the 
relevant constraints and sanity checks I'll be happy.



Also, we may still have to massage the register
values outside of this code. It's not going to know the
'system_coherency' value the kbase driver uses (And I'm not sure how
we want to express that upstream either).


AFAICS there are two possible aspects to coherency. One is I/O coherency 
(i.e. "can the GPU snoop CPU caches"), which is already controlled by 
the "dma-coherent" property. The other is whether the GPU cache itself 
is coherent with the other caches in the system (i.e. "can CPUs snoop 
the GPU cache; how much GPU cache maintenance is necessary") which 
should be something we can infer from the integration-specific 
compatible string, because we should always have an 

Re: [PATCH v4 05/22] iommu: Introduce cache_invalidate API

2019-03-05 Thread Jean-Philippe Brucker
On 18/02/2019 13:54, Eric Auger wrote:
> From: "Liu, Yi L" 
> 
> In any virtualization use case, when the first translation stage
> is "owned" by the guest OS, the host IOMMU driver has no knowledge
> of caching structure updates unless the guest invalidation activities
> are trapped by the virtualizer and passed down to the host.
> 
> Since the invalidation data are obtained from user space and will be
> written into physical IOMMU, we must allow security check at various
> layers. Therefore, generic invalidation data format are proposed here,
> model specific IOMMU drivers need to convert them into their own format.
> 
> Signed-off-by: Liu, Yi L 
> Signed-off-by: Jean-Philippe Brucker 
> Signed-off-by: Jacob Pan 
> Signed-off-by: Ashok Raj 
> Signed-off-by: Eric Auger 
> 
> ---
> v3 -> v4:
> - full reshape of the API following Alex' comments
> 
> v1 -> v2:
> - add arch_id field
> - renamed tlb_invalidate into cache_invalidate as this API allows
>   to invalidate context caches on top of IOTLBs
> 
> v1:
> renamed sva_invalidate into tlb_invalidate and add iommu_ prefix in
> header. Commit message reworded.
> ---
>  drivers/iommu/iommu.c  | 14 
>  include/linux/iommu.h  | 14 
>  include/uapi/linux/iommu.h | 71 ++
>  3 files changed, 99 insertions(+)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index b3adb77cb14c..bcb8eb15426c 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1564,6 +1564,20 @@ void iommu_detach_pasid_table(struct iommu_domain 
> *domain)
>  }
>  EXPORT_SYMBOL_GPL(iommu_detach_pasid_table);
>  
> +int iommu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
> +struct iommu_cache_invalidate_info *inv_info)
> +{
> + int ret = 0;
> +
> + if (unlikely(!domain->ops->cache_invalidate))
> + return -ENODEV;
> +
> + ret = domain->ops->cache_invalidate(domain, dev, inv_info);
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(iommu_cache_invalidate);
> +
>  static void __iommu_detach_device(struct iommu_domain *domain,
> struct device *dev)
>  {
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 7045e26f3a7d..a3b879d0753c 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -189,6 +189,7 @@ struct iommu_resv_region {
>   * @pgsize_bitmap: bitmap of all possible supported page sizes
>   * @attach_pasid_table: attach a pasid table
>   * @detach_pasid_table: detach the pasid table
> + * @cache_invalidate: invalidate translation caches
>   */
>  struct iommu_ops {
>   bool (*capable)(enum iommu_cap);
> @@ -235,6 +236,9 @@ struct iommu_ops {
> struct iommu_pasid_table_config *cfg);
>   void (*detach_pasid_table)(struct iommu_domain *domain);
>  
> + int (*cache_invalidate)(struct iommu_domain *domain, struct device *dev,
> + struct iommu_cache_invalidate_info *inv_info);
> +
>   unsigned long pgsize_bitmap;
>  };
>  
> @@ -348,6 +352,9 @@ extern void iommu_detach_device(struct iommu_domain 
> *domain,
>  extern int iommu_attach_pasid_table(struct iommu_domain *domain,
>   struct iommu_pasid_table_config *cfg);
>  extern void iommu_detach_pasid_table(struct iommu_domain *domain);
> +extern int iommu_cache_invalidate(struct iommu_domain *domain,
> +   struct device *dev,
> +   struct iommu_cache_invalidate_info *inv_info);
>  extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
>  extern struct iommu_domain *iommu_get_dma_domain(struct device *dev);
>  extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
> @@ -798,6 +805,13 @@ void iommu_detach_pasid_table(struct iommu_domain 
> *domain)
>  {
>   return -ENODEV;
>  }
> +static inline int
> +iommu_cache_invalidate(struct iommu_domain *domain,
> +struct device *dev,
> +struct iommu_cache_invalidate_info *inv_info)
> +{
> + return -ENODEV;
> +}
>  
>  #endif /* CONFIG_IOMMU_API */
>  
> diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> index e9065bfa5b24..ae41385b0a7e 100644
> --- a/include/uapi/linux/iommu.h
> +++ b/include/uapi/linux/iommu.h
> @@ -159,4 +159,75 @@ struct iommu_pasid_table_config {
>   };
>  };
>  
> +/* defines the granularity of the invalidation */
> +enum iommu_inv_granularity {
> + IOMMU_INV_GRANU_DOMAIN, /* domain-selective invalidation */
> + IOMMU_INV_GRANU_PASID,  /* pasid-selective invalidation */
> + IOMMU_INV_GRANU_ADDR,   /* page-selective invalidation */
> +};
> +
> +/**
> + * Address Selective Invalidation Structure
> + *
> + * @flags indicates the granularity of the address-selective invalidation
> + * - if PASID bit is set, @pasid field is populated and the invalidation
> + *   relates to cache entries tagged with 

Re: [PATCH v4 04/22] iommu: Introduce attach/detach_pasid_table API

2019-03-05 Thread Jean-Philippe Brucker
On 18/02/2019 13:54, Eric Auger wrote:
> From: Jacob Pan 
> 
> In virtualization use case, when a guest is assigned
> a PCI host device, protected by a virtual IOMMU on the guest,
> the physical IOMMU must be programmed to be consistent with
> the guest mappings. If the physical IOMMU supports two
> translation stages it makes sense to program guest mappings
> onto the first stage/level (ARM/Intel terminology) while the host
> owns the stage/level 2.
> 
> In that case, it is mandated to trap on guest configuration
> settings and pass those to the physical iommu driver.
> 
> This patch adds a new API to the iommu subsystem that allows
> to set/unset the pasid table information.
> 
> A generic iommu_pasid_table_config struct is introduced in
> a new iommu.h uapi header. This is going to be used by the VFIO
> user API.
> 
> Signed-off-by: Jean-Philippe Brucker 
> Signed-off-by: Liu, Yi L 
> Signed-off-by: Ashok Raj 
> Signed-off-by: Jacob Pan 
> Signed-off-by: Eric Auger 
> 
> ---
> 
> This patch generalizes the API introduced by Jacob & co-authors in
> https://lwn.net/Articles/754331/
> 
> v3 -> v4:
> - s/set_pasid_table/attach_pasid_table
> - restore detach_pasid_table. Detach can be used on unwind path.
> - add padding
> - remove @abort
> - signature used for config and format
> - add comments for fields in the SMMU struct
> 
> v2 -> v3:
> - replace unbind/bind by set_pasid_table
> - move table pointer and pasid bits in the generic part of the struct
> 
> v1 -> v2:
> - restore the original pasid table name
> - remove the struct device * parameter in the API
> - reworked iommu_pasid_smmuv3
> ---
>  drivers/iommu/iommu.c  | 19 +++
>  include/linux/iommu.h  | 22 ++
>  include/uapi/linux/iommu.h | 47 ++
>  3 files changed, 88 insertions(+)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index c297cdcf7f89..b3adb77cb14c 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1545,6 +1545,25 @@ int iommu_attach_device(struct iommu_domain *domain, 
> struct device *dev)
>  }
>  EXPORT_SYMBOL_GPL(iommu_attach_device);
>  
> +int iommu_attach_pasid_table(struct iommu_domain *domain,
> +  struct iommu_pasid_table_config *cfg)
> +{
> + if (unlikely(!domain->ops->attach_pasid_table))
> + return -ENODEV;
> +
> + return domain->ops->attach_pasid_table(domain, cfg);
> +}
> +EXPORT_SYMBOL_GPL(iommu_attach_pasid_table);
> +
> +void iommu_detach_pasid_table(struct iommu_domain *domain)
> +{
> + if (unlikely(!domain->ops->detach_pasid_table))
> + return;
> +
> + domain->ops->detach_pasid_table(domain);
> +}
> +EXPORT_SYMBOL_GPL(iommu_detach_pasid_table);
> +
>  static void __iommu_detach_device(struct iommu_domain *domain,
> struct device *dev)
>  {
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index b38e0c100940..7045e26f3a7d 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -187,6 +187,8 @@ struct iommu_resv_region {
>   * @domain_window_disable: Disable a particular window for a domain
>   * @of_xlate: add OF master IDs to iommu grouping
>   * @pgsize_bitmap: bitmap of all possible supported page sizes
> + * @attach_pasid_table: attach a pasid table
> + * @detach_pasid_table: detach the pasid table

Should go before pgsize_bitmap

>   */
>  struct iommu_ops {
>   bool (*capable)(enum iommu_cap);
> @@ -229,6 +231,10 @@ struct iommu_ops {
>   int (*of_xlate)(struct device *dev, struct of_phandle_args *args);
>   bool (*is_attach_deferred)(struct iommu_domain *domain, struct device 
> *dev);
>  
> + int (*attach_pasid_table)(struct iommu_domain *domain,
> +   struct iommu_pasid_table_config *cfg);
> + void (*detach_pasid_table)(struct iommu_domain *domain);
> +
>   unsigned long pgsize_bitmap;
>  };
>  
> @@ -339,6 +345,9 @@ extern int iommu_attach_device(struct iommu_domain 
> *domain,
>  struct device *dev);
>  extern void iommu_detach_device(struct iommu_domain *domain,
>   struct device *dev);
> +extern int iommu_attach_pasid_table(struct iommu_domain *domain,
> + struct iommu_pasid_table_config *cfg);
> +extern void iommu_detach_pasid_table(struct iommu_domain *domain);
>  extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
>  extern struct iommu_domain *iommu_get_dma_domain(struct device *dev);
>  extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
> @@ -777,6 +786,19 @@ const struct iommu_ops *iommu_ops_from_fwnode(struct 
> fwnode_handle *fwnode)
>   return NULL;
>  }
>  
> +static inline
> +int iommu_attach_pasid_table(struct iommu_domain *domain,
> +  struct iommu_pasid_table_config *cfg)
> +{
> + return -ENODEV;
> +}
> +
> +static inline
> +void 

Re: [PATCH v4 03/22] iommu: introduce device fault report API

2019-03-05 Thread Jean-Philippe Brucker
On 18/02/2019 13:54, Eric Auger wrote:
[...]
> +/**
> + * iommu_register_device_fault_handler() - Register a device fault handler
> + * @dev: the device
> + * @handler: the fault handler
> + * @data: private data passed as argument to the handler
> + *
> + * When an IOMMU fault event is received, call this handler with the fault 
> event
> + * and data as argument. The handler should return 0 on success. If the 
> fault is
> + * recoverable (IOMMU_FAULT_PAGE_REQ), the handler can also complete
> + * the fault by calling iommu_page_response() with one of the following
> + * response code:
> + * - IOMMU_PAGE_RESP_SUCCESS: retry the translation
> + * - IOMMU_PAGE_RESP_INVALID: terminate the fault
> + * - IOMMU_PAGE_RESP_FAILURE: terminate the fault and stop reporting
> + *   page faults if possible.

The comment refers to function and values that haven't been defined yet.
Either the page_response() patch should come before, or we need to split
this patch.

Something I missed before: if the handler fails (returns != 0) it should
complete the fault by calling iommu_page_response(), if we're not doing
it in iommu_report_device_fault(). It should be indicated in this
comment. It's safe for the handler to call page_response() since we're
not holding fault_param->lock when calling the handler.

> + *
> + * Return 0 if the fault handler was installed successfully, or an error.
> + */
[...]
> +/**
> + * iommu_report_device_fault() - Report fault event to device
> + * @dev: the device
> + * @evt: fault event data
> + *
> + * Called by IOMMU model specific drivers when fault is detected, typically
> + * in a threaded IRQ handler.
> + *
> + * Return 0 on success, or an error.
> + */
> +int iommu_report_device_fault(struct device *dev, struct iommu_fault_event 
> *evt)
> +{
> + int ret = 0;
> + struct iommu_fault_event *evt_pending;
> + struct iommu_fault_param *fparam;
> +
> + /* iommu_param is allocated when device is added to group */
> + if (!dev->iommu_param | !evt)

Typo: ||

Thanks,
Jean

> + return -EINVAL;
> + /* we only report device fault if there is a handler registered */
> + mutex_lock(&dev->iommu_param->lock);
> + if (!dev->iommu_param->fault_param ||
> + !dev->iommu_param->fault_param->handler) {
> + ret = -EINVAL;
> + goto done_unlock;
> + }
> + fparam = dev->iommu_param->fault_param;
> + if (evt->fault.type == IOMMU_FAULT_PAGE_REQ &&
> + evt->fault.prm.flags & IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE) {
> + evt_pending = kmemdup(evt, sizeof(struct iommu_fault_event),
> + GFP_KERNEL);
> + if (!evt_pending) {
> + ret = -ENOMEM;
> + goto done_unlock;
> + }
> + mutex_lock(&fparam->lock);
> + list_add_tail(&evt_pending->list, &fparam->faults);
> + mutex_unlock(&fparam->lock);
> + }
> + ret = fparam->handler(evt, fparam->data);
> +done_unlock:
> + mutex_unlock(&dev->iommu_param->lock);
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(iommu_report_device_fault);
[...]
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 02/22] iommu: introduce device fault data

2019-03-05 Thread Jean-Philippe Brucker
On 18/02/2019 13:54, Eric Auger wrote:
> From: Jacob Pan 
> 
> Device faults detected by IOMMU can be reported outside the IOMMU
> subsystem for further processing. This patch introduces
> a generic device fault data structure.
> 
> The fault can be either an unrecoverable fault or a page request,
> also referred to as a recoverable fault.
> 
> We only care about non internal faults that are likely to be reported
> to an external subsystem.
> 
> Signed-off-by: Jacob Pan 
> Signed-off-by: Jean-Philippe Brucker 
> Signed-off-by: Liu, Yi L 
> Signed-off-by: Ashok Raj 
> Signed-off-by: Eric Auger 
> 
> ---
> 
> v3 -> v4:
> - use a union containing aither an unrecoverable fault or a page
>   request message. Move the device private data in the page request
>   structure. Reshuffle the fields and use flags.
> - move fault perm attributes to the uapi
> - remove a bunch of iommu_fault_reason enum values that were related
>   to internal errors
> ---
>  include/linux/iommu.h  |  47 +++
>  include/uapi/linux/iommu.h | 115 +
>  2 files changed, 162 insertions(+)
>  create mode 100644 include/uapi/linux/iommu.h
> 
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index e90da6b6f3d1..032d33894723 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -25,6 +25,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #define IOMMU_READ   (1 << 0)
>  #define IOMMU_WRITE  (1 << 1)
> @@ -48,6 +49,7 @@ struct bus_type;
>  struct device;
>  struct iommu_domain;
>  struct notifier_block;
> +struct iommu_fault_event;
>  
>  /* iommu fault flags */
>  #define IOMMU_FAULT_READ 0x0
> @@ -55,6 +57,7 @@ struct notifier_block;
>  
>  typedef int (*iommu_fault_handler_t)(struct iommu_domain *,
>   struct device *, unsigned long, int, void *);
> +typedef int (*iommu_dev_fault_handler_t)(struct iommu_fault_event *, void *);
>  
>  struct iommu_domain_geometry {
>   dma_addr_t aperture_start; /* First address that can be mapped*/
> @@ -243,6 +246,49 @@ struct iommu_device {
>   struct device *dev;
>  };
>  
> +/**
> + * struct iommu_fault_event - Generic per device fault data
> + *
> + * - PCI and non-PCI devices
> + * - Recoverable faults (e.g. page request), information based on PCI ATS
> + *   and PASID spec.

"for example information based on PCI PRI and PASID extensions"? ATS+PRI
have been integrated into the main spec, and only PRI is relevant here.

> + * - Un-recoverable faults of device interest

"of interest to device drivers"?

> + * - DMA remapping and IRQ remapping faults
> + *
> + * @fault: fault descriptor
> + * @iommu_private: used by the IOMMU driver for storing fault-specific
> + * data. Users should not modify this field before
> + * sending the fault response.
> + */
> +struct iommu_fault_event {
> + struct iommu_fault fault;
> + u64 iommu_private;
> +};
> +
> +/**
> + * struct iommu_fault_param - per-device IOMMU fault data
> + * @dev_fault_handler: Callback function to handle IOMMU faults at device 
> level
> + * @data: handler private data
> + *
> + */
> +struct iommu_fault_param {
> + iommu_dev_fault_handler_t handler;
> + void *data;
> +};
> +
> +/**
> + * struct iommu_param - collection of per-device IOMMU data
> + *
> + * @fault_param: IOMMU detected device fault reporting data
> + *
> + * TODO: migrate other per device data pointers under iommu_dev_data, e.g.
> + *   struct iommu_group  *iommu_group;
> + *   struct iommu_fwspec *iommu_fwspec;
> + */
> +struct iommu_param {
> + struct iommu_fault_param *fault_param;
> +};
> +
>  int  iommu_device_register(struct iommu_device *iommu);
>  void iommu_device_unregister(struct iommu_device *iommu);
>  int  iommu_device_sysfs_add(struct iommu_device *iommu,
> @@ -418,6 +464,7 @@ struct iommu_ops {};
>  struct iommu_group {};
>  struct iommu_fwspec {};
>  struct iommu_device {};
> +struct iommu_fault_param {};
>  
>  static inline bool iommu_present(struct bus_type *bus)
>  {
> diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> new file mode 100644
> index ..7ebf23ed6ccb
> --- /dev/null
> +++ b/include/uapi/linux/iommu.h
> @@ -0,0 +1,115 @@
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> +/*
> + * IOMMU user API definitions
> + */
> +
> +#ifndef _UAPI_IOMMU_H
> +#define _UAPI_IOMMU_H
> +
> +#include 
> +
> +/*  Generic fault types, can be expanded IRQ remapping fault */
> +enum iommu_fault_type {
> + IOMMU_FAULT_DMA_UNRECOV = 1,/* unrecoverable fault */
> + IOMMU_FAULT_PAGE_REQ,   /* page request fault */
> +};
> +
> +enum iommu_fault_reason {
> + IOMMU_FAULT_REASON_UNKNOWN = 0,
> +
> + /* Could not access the PASID table (fetch caused external abort) */
> + IOMMU_FAULT_REASON_PASID_FETCH,
> +
> + /* pasid entry is invalid or has configuration errors */
> + IOMMU_FAULT_REASON_BAD_PASID_ENTRY,
> 

Re: [PATCH 4/4] iommu/vt-d: Remove lazy allocation of domains

2019-03-05 Thread James Sewart via iommu
Hey Lu,

> On 5 Mar 2019, at 06:59, Lu Baolu  wrote:
> 
> Hi,
> 
> It's hard for me to understand why we remove the RMRR-related
> code in this patch.

The RMRR code removed here requires the lazy allocation of domains to 
exist, as it is run before iommu.c would assign IOMMU groups and attach a 
domain. Before patch 3, removing this code would mean the RMRR regions 
are never mapped for a domain. Now, iommu.c will allocate a default 
domain for the group that a device is about to be put in and attach that 
domain to the device; then, for each region returned by get_resv_regions, 
it will create an identity map. This is where the RMRRs are set up for 
the default domain.

> 
> And, now we have two places to hold a domain for a device: group and
> dev->info. We can consider optimizing the use of per-device iommu
> arch data. This should be later work anyway.
> 
> More comments inline.
> 
> On 3/4/19 11:47 PM, James Sewart wrote:
>> The generic IOMMU code will allocate and attach a dma ops domain to each
>> device that comes online, replacing any lazy allocated domain. Removes
>> RMRR application at iommu init time as we won't have a domain attached
>> to any device. iommu.c will do this after attaching a device using
>> create_direct_mappings.
>> Signed-off-by: James Sewart 
>> ---
>>  drivers/iommu/intel-iommu.c | 202 ++--
>>  1 file changed, 8 insertions(+), 194 deletions(-)
>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
>> index 71cd6bbfec05..282257e2628d 100644
>> --- a/drivers/iommu/intel-iommu.c
>> +++ b/drivers/iommu/intel-iommu.c
>> @@ -2595,118 +2595,6 @@ static struct dmar_domain 
>> *dmar_insert_one_dev_info(struct intel_iommu *iommu,
>>  return domain;
>>  }
>>  -static int get_last_alias(struct pci_dev *pdev, u16 alias, void *opaque)
>> -{
>> -*(u16 *)opaque = alias;
>> -return 0;
>> -}
>> -
>> -static struct dmar_domain *find_or_alloc_domain(struct device *dev, int gaw)
>> -{
>> -struct device_domain_info *info = NULL;
>> -struct dmar_domain *domain = NULL;
>> -struct intel_iommu *iommu;
>> -u16 dma_alias;
>> -unsigned long flags;
>> -u8 bus, devfn;
>> -
>> -iommu = device_to_iommu(dev, &bus, &devfn);
>> -if (!iommu)
>> -return NULL;
>> -
>> -if (dev_is_pci(dev)) {
>> -struct pci_dev *pdev = to_pci_dev(dev);
>> -
>> -pci_for_each_dma_alias(pdev, get_last_alias, &dma_alias);
>> -
>> -spin_lock_irqsave(&device_domain_lock, flags);
>> -info = dmar_search_domain_by_dev_info(pci_domain_nr(pdev->bus),
>> -  PCI_BUS_NUM(dma_alias),
>> -  dma_alias & 0xff);
>> -if (info) {
>> -iommu = info->iommu;
>> -domain = info->domain;
>> -}
>> -spin_unlock_irqrestore(&device_domain_lock, flags);
>> -
>> -/* DMA alias already has a domain, use it */
>> -if (info)
>> -goto out;
>> -}
>> -
>> -/* Allocate and initialize new domain for the device */
>> -domain = alloc_domain(0);
>> -if (!domain)
>> -return NULL;
>> -if (domain_init(domain, iommu, gaw)) {
>> -domain_exit(domain);
>> -return NULL;
>> -}
>> -
>> -out:
>> -
>> -return domain;
>> -}
>> -
>> -static struct dmar_domain *set_domain_for_dev(struct device *dev,
>> -  struct dmar_domain *domain)
>> -{
>> -struct intel_iommu *iommu;
>> -struct dmar_domain *tmp;
>> -u16 req_id, dma_alias;
>> -u8 bus, devfn;
>> -
>> -iommu = device_to_iommu(dev, &bus, &devfn);
>> -if (!iommu)
>> -return NULL;
>> -
>> -req_id = ((u16)bus << 8) | devfn;
>> -
>> -if (dev_is_pci(dev)) {
>> -struct pci_dev *pdev = to_pci_dev(dev);
>> -
>> -pci_for_each_dma_alias(pdev, get_last_alias, &dma_alias);
>> -
>> -/* register PCI DMA alias device */
>> -if (req_id != dma_alias) {
>> -tmp = dmar_insert_one_dev_info(iommu, 
>> PCI_BUS_NUM(dma_alias),
>> -dma_alias & 0xff, NULL, domain);
>> -
>> -if (!tmp || tmp != domain)
>> -return tmp;
>> -}
>> -}
>> -
>> -tmp = dmar_insert_one_dev_info(iommu, bus, devfn, dev, domain);
>> -if (!tmp || tmp != domain)
>> -return tmp;
>> -
>> -return domain;
>> -}
>> -
>> -static struct dmar_domain *get_domain_for_dev(struct device *dev, int gaw)
>> -{
>> -struct dmar_domain *domain, *tmp;
>> -
>> -domain = find_domain(dev);
>> -if (domain)
>> -goto out;
>> -
>> -domain = find_or_alloc_domain(dev, gaw);
>> -if (!domain)
>> -goto out;
>> -
>> -tmp = set_domain_for_dev(dev, domain);
>> -if (!tmp || domain != tmp) {
>> -

Re: [PATCH 3/4] iommu/vt-d: Allow IOMMU_DOMAIN_DMA and IOMMU_DOMAIN_IDENTITY to be allocated

2019-03-05 Thread James Sewart via iommu
Hey Lu,

> On 5 Mar 2019, at 06:46, Lu Baolu  wrote:
> 
> Hi,
> 
> On 3/4/19 11:46 PM, James Sewart wrote:
>> Allowing IOMMU_DOMAIN_DMA type IOMMU domain to be allocated allows the
>> default_domain of an iommu_group to be set. This delegates device-domain
>> relationships to the generic IOMMU code.
>> Signed-off-by: James Sewart 
>> ---
>>  drivers/iommu/intel-iommu.c | 113 +++-
>>  1 file changed, 84 insertions(+), 29 deletions(-)
>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
>> index 8e0a4e2ff77f..71cd6bbfec05 100644
>> --- a/drivers/iommu/intel-iommu.c
>> +++ b/drivers/iommu/intel-iommu.c
>> @@ -309,6 +309,14 @@ static int hw_pass_through = 1;
>>  /* si_domain contains mulitple devices */
>>  #define DOMAIN_FLAG_STATIC_IDENTITY (1 << 1)
>>  +/* Domain managed externally, don't cleanup if it isn't attached
>> + * to any devices. */
>> +#define DOMAIN_FLAG_MANAGED_EXTERNALLY  (1 << 2)
>> +
>> +/* Set after domain initialisation. Used when allocating dma domains to
>> + * defer domain initialisation until it is attached to a device */
>> +#define DOMAIN_FLAG_INITIALISED (1 << 4)
> 
> Why do you skip bit 3?

This was an oversight, I will update to use bit 3.

> 
>> +
>>  #define for_each_domain_iommu(idx, domain)  \
>>  for (idx = 0; idx < g_num_of_iommus; idx++) \
>>  if (domain->iommu_refcnt[idx])
>> @@ -558,6 +566,16 @@ static inline int domain_type_is_vm_or_si(struct dmar_domain *domain)
>>  DOMAIN_FLAG_STATIC_IDENTITY);
>>  }
>>  +static inline int domain_managed_externally(struct dmar_domain *domain)
>> +{
>> +return domain->flags & DOMAIN_FLAG_MANAGED_EXTERNALLY;
>> +}
>> +
>> +static inline int domain_is_initialised(struct dmar_domain *domain)
>> +{
>> +return domain->flags & DOMAIN_FLAG_INITIALISED;
>> +}
>> +
>>  static inline int domain_pfn_supported(struct dmar_domain *domain,
>> unsigned long pfn)
>>  {
>> @@ -1662,7 +1680,7 @@ static void disable_dmar_iommu(struct intel_iommu *iommu)
>>  __dmar_remove_one_dev_info(info);
>>  -   if (!domain_type_is_vm_or_si(domain)) {
>> +if (!domain_managed_externally(domain)) {
>>  /*
>>   * The domain_exit() function  can't be called under
>>   * device_domain_lock, as it takes this lock itself.
>> @@ -1895,6 +1913,7 @@ static int domain_init(struct dmar_domain *domain, struct intel_iommu *iommu,
>>  domain->pgd = (struct dma_pte *)alloc_pgtable_page(domain->nid);
>>  if (!domain->pgd)
>>  return -ENOMEM;
>> +domain->flags |= DOMAIN_FLAG_INITIALISED;
>>  __iommu_flush_cache(iommu, domain->pgd, PAGE_SIZE);
>>  return 0;
>>  }
>> @@ -2807,14 +2826,10 @@ static inline void iommu_prepare_isa(void)
>>static int md_domain_init(struct dmar_domain *domain, int guest_width);
>>  -static int __init si_domain_init(int hw)
>> +static int __init si_domain_init(struct dmar_domain *si_domain, int hw)
>>  {
>>  int nid, ret = 0;
>>  -   si_domain = alloc_domain(DOMAIN_FLAG_STATIC_IDENTITY);
>> -if (!si_domain)
>> -return -EFAULT;
>> -
>>  if (md_domain_init(si_domain, DEFAULT_DOMAIN_ADDRESS_WIDTH)) {
>>  domain_exit(si_domain);
>>  return -EFAULT;
>> @@ -3417,9 +3432,16 @@ static int __init init_dmars(void)
>>  check_tylersburg_isoch();
>>  if (iommu_identity_mapping) {
>> -ret = si_domain_init(hw_pass_through);
>> -if (ret)
>> +si_domain = alloc_domain(DOMAIN_FLAG_STATIC_IDENTITY);
> 
> Where will this si_domain be used? We still need to keep the global
> si_domain?

I am unsure of the best thing to do here. The si_domain can be initialised 
as a hardware passthrough which means it doesn’t have any mappings applied. 
I think any user allocating a domain here will always want a software 
passthrough domain. I’m not sure if/how to consolidate these two.

> 
>> +if (!si_domain) {
>> +ret = -EFAULT;
>>  goto free_iommu;
>> +}
>> +ret = si_domain_init(si_domain, hw_pass_through);
>> +if (ret) {
>> +domain_exit(si_domain);
>> +goto free_iommu;
>> +}
>>  }
>> @@ -4575,7 +4597,7 @@ static int device_notifier(struct notifier_block *nb,
>>  return 0;
>>  dmar_remove_one_dev_info(domain, dev);
>> -if (!domain_type_is_vm_or_si(domain) && list_empty(&domain->devices))
>> +if (!domain_managed_externally(domain) && list_empty(&domain->devices))
>>  domain_exit(domain);
>>  return 0;
>> @@ -5020,6 +5042,7 @@ static int md_domain_init(struct dmar_domain *domain, int guest_width)
>>  domain->pgd = (struct dma_pte *)alloc_pgtable_page(domain->nid);
>>  if (!domain->pgd)
>>  return 

Re: [PATCH 0/4] iommu/vt-d: Fix-up device-domain relationship by refactoring to use iommu group default domain.

2019-03-05 Thread James Sewart via iommu
Hey Lu,

The motivation for this is the buggy domain <-> IOMMU group relationship 
that results from using find_or_alloc_domain. From what I have read, an 
IOMMU domain should be shared by all devices in the same group, and thus 
so should its mappings. This is because the IOMMU group is determined by 
things like the ACS settings on PCI switches, which determine whether 
devices can talk to each other.

In find_or_alloc_domain we only determine domain sharing based on the 
devices returned by pci_for_each_dma_alias, whereas there are many more 
checks in pci_device_group (e.g. the ACS settings of intermediary PCI 
switches), which is what determines which IOMMU group a device is in. 
This discontinuity means that each device within an IOMMU group can end 
up with a different domain.

We see issues because it is supposed to be safe to assume that devices 
within a group share the same address space, but if each device has its 
own domain then any mapping created won’t be valid for the other 
devices, and the same IOVA could even be mapped differently for each 
device. This could also cause issues with a device whose RMRRs are not 
applied to the domains of the other devices within its group: those 
regions could be remapped for the other devices, resulting in unexpected 
behaviour during peer-to-peer transfers.

Cheers,
James


> On 5 Mar 2019, at 06:05, Lu Baolu  wrote:
> 
> Hi James,
> 
> Very glad to see this. Thank you!
> 
> On 3/4/19 11:41 PM, James Sewart wrote:
>> Hey,
>> This patchset moves IOMMU_DOMAIN_DMA iommu domain management to iommu.c.
>> This avoids the use of find_or_alloc_domain whose domain assignment is
>> inconsistent with the iommu grouping as determined by pci_device_group.
> 
> Is this a bug fix or an improvement? What's the real issue will it cause
> if we go without this patch set?
> 
> Best regards,
> Lu Baolu
> 
>> Patch 3 permits domain type IOMMU_DOMAIN_DMA to be allocated via the
>> iommu_ops api, allowing the default_domain of an iommu group to be set in
>> iommu.c. This domain will be attached to every device that is brought up
>> with an iommu group, and the devices reserved regions will be mapped using
>> regions returned by get_resv_regions.
>> In intel_iommu_domain_alloc we don’t know the IOMMU this domain will be
>> associated with, so we defer full initialisation until
>> intel_iommu_attach_device. Currently iommu.c:iommu_group_add_device will
>> try to map a device’s reserved regions before attaching the domain, which
>> would cause issues if the domain is not fully initialised. This is
>> addressed in patch 1 by moving the mapping to after attaching.
>> Patch 2 implements function apply_resv_region, used in
>> iommu_group_create_direct_mappings to mark the reserved regions as
>> non-mappable for the dma_map_ops API.
>> Patch 4 removes the domain lazy allocation logic. Before this patch the
>> lazy allocation logic would not be used as any domain allocated using
>> these paths would be replaced when attaching the group default domain.
>> Default domain allocation has been tested with and without this patch on
>> 4.19.
>> Cheers,
>> James.
>> James Sewart (4):
>>   iommu: Move iommu_group_create_direct_mappings to after device_attach
>>   iommu/vt-d: Implement apply_resv_region for reserving IOVA ranges
>>   iommu/vt-d: Allow IOMMU_DOMAIN_DMA and IOMMU_DOMAIN_IDENTITY to be allocated
>>   iommu/vt-d: Remove lazy allocation of domains
>>  drivers/iommu/intel-iommu.c | 329 
>>  drivers/iommu/iommu.c   |   4 +-
>>  2 files changed, 108 insertions(+), 225 deletions(-)

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [Xen-devel] [PATCH] Revert "swiotlb: remove SWIOTLB_MAP_ERROR"

2019-03-05 Thread Julien Grall

Hi Arnd,

On 3/5/19 8:16 AM, Arnd Bergmann wrote:

On Tue, Mar 5, 2019 at 12:56 AM Robin Murphy  wrote:

On 2019-03-04 7:59 pm, Arnd Bergmann wrote:

This reverts commit b907e20508d0 ("swiotlb: remove SWIOTLB_MAP_ERROR"), which
introduced an overflow warning in configurations that have a larger
dma_addr_t than phys_addr_t:

In file included from include/linux/dma-direct.h:5,
   from kernel/dma/swiotlb.c:23:
kernel/dma/swiotlb.c: In function 'swiotlb_tbl_map_single':
include/linux/dma-mapping.h:136:28: error: conversion from 'long long unsigned 
int' to 'phys_addr_t' {aka 'unsigned int'} changes value from 
'18446744073709551615' to '4294967295' [-Werror=overflow]
   #define DMA_MAPPING_ERROR  (~(dma_addr_t)0)
  ^
kernel/dma/swiotlb.c:544:9: note: in expansion of macro 'DMA_MAPPING_ERROR'
return DMA_MAPPING_ERROR;

The configuration that caused this is on 32-bit ARM, where the DMA address
space depends on the enabled hardware platforms, while the physical
address space depends on the type of MMU chosen (classic vs LPAE).


Are these real platforms, or random configs? Realistically I don't see a
great deal of need to support DMA_ADDR_T_64BIT for non-LPAE.
Particularly in this case since AFAIK the only selector of SWIOTLB on
Arm is Xen, and that by definition is never going to be useful on
non-LPAE hardware.

...
On Mon, Mar 4, 2019 at 11:00 PM Konrad Rzeszutek Wilk
 wrote:

What about making the phys_addr_t and dma_addr_t have the same
width with some magic #ifdef hackery?


As far as I can tell, only randconfig builds see this problem, in
real systems phys_addr_t is normally the same as dma_addr_t,
and you could reasonably have a machine with a larger phys_addr_t
than dma_addr_t but wouldn't need to bother.


On Xen, dma_addr_t will always be 64-bit while the phys_addr_t will 
depend on the MMU type. So we may have phys_addr_t smaller than 
dma_addr_t from the kernel point of view.


Cheers,

--
Julien Grall


Re: [Xen-devel] [PATCH] Revert "swiotlb: remove SWIOTLB_MAP_ERROR"

2019-03-05 Thread Julien Grall

Hi Robin,

On 3/4/19 11:56 PM, Robin Murphy wrote:

On 2019-03-04 7:59 pm, Arnd Bergmann wrote:
This reverts commit b907e20508d0 ("swiotlb: remove SWIOTLB_MAP_ERROR"), which
introduced an overflow warning in configurations that have a larger
dma_addr_t than phys_addr_t:

In file included from include/linux/dma-direct.h:5,
  from kernel/dma/swiotlb.c:23:
kernel/dma/swiotlb.c: In function 'swiotlb_tbl_map_single':
include/linux/dma-mapping.h:136:28: error: conversion from 'long long 
unsigned int' to 'phys_addr_t' {aka 'unsigned int'} changes value from 
'18446744073709551615' to '4294967295' [-Werror=overflow]

  #define DMA_MAPPING_ERROR  (~(dma_addr_t)0)
 ^
kernel/dma/swiotlb.c:544:9: note: in expansion of macro 'DMA_MAPPING_ERROR'
   return DMA_MAPPING_ERROR;

The configuration that caused this is on 32-bit ARM, where the DMA address
space depends on the enabled hardware platforms, while the physical
address space depends on the type of MMU chosen (classic vs LPAE).


Are these real platforms, or random configs? Realistically I don't see a 
great deal of need to support DMA_ADDR_T_64BIT for non-LPAE. 


This is selected by CONFIG_XEN no matter the type of MMU chosen (see 
more below).


Particularly in this case since AFAIK the only selector of SWIOTLB on 
Arm is Xen, and that by definition is never going to be useful on 
non-LPAE hardware.


While Xen itself requires LPAE, it is still possible to run a non-LPAE 
kernel in the guest. For instance, last time I checked, Debian was 
shipping only non-LPAE kernel for Arm32.


On Arm, swiotlb is only used by the hardware domain (aka Dom0) to allow 
DMA in memory mapped from other guests. So the returned DMA address may 
be 64-bit. Hence why we select DMA_ADDR_T_64BIT above.


Cheers,

--
Julien Grall

Re: [PATCH] Revert "swiotlb: remove SWIOTLB_MAP_ERROR"

2019-03-05 Thread Arnd Bergmann
On Tue, Mar 5, 2019 at 12:56 AM Robin Murphy  wrote:
> On 2019-03-04 7:59 pm, Arnd Bergmann wrote:
> > This reverts commit b907e20508d0 ("swiotlb: remove SWIOTLB_MAP_ERROR"), 
> > which
> > introduced an overflow warning in configurations that have a larger
> > dma_addr_t than phys_addr_t:
> >
> > In file included from include/linux/dma-direct.h:5,
> >   from kernel/dma/swiotlb.c:23:
> > kernel/dma/swiotlb.c: In function 'swiotlb_tbl_map_single':
> > include/linux/dma-mapping.h:136:28: error: conversion from 'long long 
> > unsigned int' to 'phys_addr_t' {aka 'unsigned int'} changes value from 
> > '18446744073709551615' to '4294967295' [-Werror=overflow]
> >   #define DMA_MAPPING_ERROR  (~(dma_addr_t)0)
> >  ^
> > kernel/dma/swiotlb.c:544:9: note: in expansion of macro 'DMA_MAPPING_ERROR'
> >return DMA_MAPPING_ERROR;
> >
> > The configuration that caused this is on 32-bit ARM, where the DMA address
> > space depends on the enabled hardware platforms, while the physical
> > address space depends on the type of MMU chosen (classic vs LPAE).
>
> Are these real platforms, or random configs? Realistically I don't see a
> great deal of need to support DMA_ADDR_T_64BIT for non-LPAE.
> Particularly in this case since AFAIK the only selector of SWIOTLB on
> Arm is Xen, and that by definition is never going to be useful on
> non-LPAE hardware.
...
On Mon, Mar 4, 2019 at 11:00 PM Konrad Rzeszutek Wilk
 wrote:
> What about making the phys_addr_t and dma_addr_t have the same
> width with some magic #ifdef hackery?

As far as I can tell, only randconfig builds see this problem, in
real systems phys_addr_t is normally the same as dma_addr_t,
and you could reasonably have a machine with a larger phys_addr_t
than dma_addr_t but wouldn't need to bother.

The reason I'd like to keep them independent on ARM is that
the difference does occasionally find real driver bugs where some
drivers happily mix phys_addr_t and dma_addr_t in ways that
are broken even when they are the same size.

On Tue, Mar 5, 2019 at 12:56 AM Robin Murphy  wrote:
> Fair enough that we still don't want even randconfigs generating
> warnings, though. As long as this change doesn't let SWIOTLB_MAP_ERROR
> leak out to logic expecting DMA_MAP_ERROR - which does appear to be the
> case - and is also still OK for the opposite weirdness of
> PHYS_ADDR_T_64BIT && !DMA_ADDR_T_64BIT, then I think it's reasonable.

Another option would be to change the return type of swiotlb_tbl_map_single()
to 'u64', which would always be large enough to contain both a phys_addr_t
and DMA_MAP_ERROR. I first tried changing the return type to dma_addr_t,
which solves the build issue, but I couldn't convince myself that this is
correct in all cases, or that this is a sensible type to use here.

   Arnd


Re: [PATCH v4 00/22] SMMUv3 Nested Stage Setup

2019-03-05 Thread Auger Eric
Hi,

On 2/18/19 2:54 PM, Eric Auger wrote:
> This series allows a virtualizer to program the nested stage mode.
> This is useful when both the host and the guest are exposed with
> an SMMUv3 and a PCI device is assigned to the guest using VFIO.
> 
> In this mode, the physical IOMMU must be programmed to translate
> the two stages: the one set up by the guest (IOVA -> GPA) and the
> one set up by the host VFIO driver as part of the assignment process
> (GPA -> HPA).
> 
> On Intel, this is traditionally achieved by combining the 2 stages
> into a single physical stage. However this relies on the capability
> to trap on each guest translation structure update. This is possible
> by using the VTD Caching Mode. Unfortunately the ARM SMMUv3 does
> not offer a similar mechanism.
> 
> However, the ARM SMMUv3 architecture supports 2 physical stages! Those
> were devised exactly with that use case in mind. Assuming the HW
> implements both stages (optional), the guest now can use stage 1
> while the host uses stage 2.
> 
> This assumes the virtualizer has means to propagate guest settings
> to the host SMMUv3 driver. This series brings this VFIO/IOMMU
> infrastructure.  Those services are:
> - bind the guest stage 1 configuration to the stream table entry
> - propagate guest TLB invalidations
> - bind MSI IOVAs
> - propagate faults collected at physical level up to the virtualizer
> 
> This series largely reuses the user API and infrastructure originally
> devised for SVA/SVM and patches submitted by Jacob, Yi Liu, Tianyu in
> [1-2] and Jean-Philippe [3-4].

I am about to respin to fix the fault handler unregistration bug
reported by Vincent. Are there any other comments (iommu uapi changes,
smmuv3 driver changes)?

Thanks

Eric
> 
> Best Regards
> 
> Eric
> 
> This series can be found at:
> https://github.com/eauger/linux/tree/v5.0-rc7-2stage-v4
> 
> References:
> [1] [PATCH v5 00/23] IOMMU and VT-d driver support for Shared Virtual
> Address (SVA)
> https://lwn.net/Articles/754331/
> [2] [RFC PATCH 0/8] Shared Virtual Memory virtualization for VT-d
> (VFIO part)
> https://lists.linuxfoundation.org/pipermail/iommu/2017-April/021475.html
> [3] [v2,00/40] Shared Virtual Addressing for the IOMMU
> https://patchwork.ozlabs.org/cover/912129/
> [4] [PATCH v3 00/10] Shared Virtual Addressing for the IOMMU
> https://patchwork.kernel.org/cover/10608299/
> 
> History:
> v3 -> v4:
> - took into account Alex, jean-Philippe and Robin's comments on v3
> - rework of the smmuv3 driver integration
> - add tear down ops for msi binding and PASID table binding
> - fix S1 fault propagation
> - put fault reporting patches at the beginning of the series following
>   Jean-Philippe's request
> - update of the cache invalidate and fault API uapis
> - VFIO fault reporting rework with 2 separate regions and one mmappable
>   segment for the fault queue
> - moved to PATCH
> 
> v2 -> v3:
> - When registering the S1 MSI binding we now store the device handle. This
>   addresses Robin's comment about discrimination of devices belonging to
>   different S1 groups and using different physical MSI doorbells.
> - Change the fault reporting API: use VFIO_PCI_DMA_FAULT_IRQ_INDEX to
>   set the eventfd and expose the faults through an mmappable fault region
> 
> v1 -> v2:
> - Added the fault reporting capability
> - asid properly passed on invalidation (fix assignment of multiple
>   devices)
> - see individual change logs for more info
> 
> Eric Auger (13):
>   iommu: Introduce bind/unbind_guest_msi
>   vfio: VFIO_IOMMU_BIND/UNBIND_MSI
>   iommu/smmuv3: Get prepared for nested stage support
>   iommu/smmuv3: Implement attach/detach_pasid_table
>   iommu/smmuv3: Implement cache_invalidate
>   dma-iommu: Implement NESTED_MSI cookie
>   iommu/smmuv3: Implement bind/unbind_guest_msi
>   iommu/smmuv3: Report non recoverable faults
>   vfio-pci: Add a new VFIO_REGION_TYPE_NESTED region type
>   vfio-pci: Register an iommu fault handler
>   vfio_pci: Allow to mmap the fault queue
>   vfio-pci: Add VFIO_PCI_DMA_FAULT_IRQ_INDEX
>   vfio: Document nested stage control
> 
> Jacob Pan (4):
>   driver core: add per device iommu param
>   iommu: introduce device fault data
>   iommu: introduce device fault report API
>   iommu: Introduce attach/detach_pasid_table API
> 
> Jean-Philippe Brucker (2):
>   iommu/arm-smmu-v3: Link domains and devices
>   iommu/arm-smmu-v3: Maintain a SID->device structure
> 
> Liu, Yi L (3):
>   iommu: Introduce cache_invalidate API
>   vfio: VFIO_IOMMU_ATTACH/DETACH_PASID_TABLE
>   vfio: VFIO_IOMMU_CACHE_INVALIDATE
> 
>  Documentation/vfio.txt  |  83 
>  drivers/iommu/arm-smmu-v3.c | 580 ++--
>  drivers/iommu/dma-iommu.c   | 145 ++-
>  drivers/iommu/iommu.c   | 221 ++-
>  drivers/vfio/pci/vfio_pci.c | 214 ++
>  drivers/vfio/pci/vfio_pci_intrs.c   |  19 +
>  drivers/vfio/pci/vfio_pci_private.h |  20 +
>