RE: [PATCH v2 06/12] iommu/vt-d: Add second level page table interface

2018-09-12 Thread Tian, Kevin
> From: Raj, Ashok
> Sent: Saturday, September 8, 2018 1:43 AM
> 
> On Fri, Sep 07, 2018 at 10:47:11AM +0800, Lu Baolu wrote:
> >
> > >>+
> > >>+	intel_pasid_clear_entry(dev, pasid);
> > >>+
> > >>+	if (!ecap_coherent(iommu->ecap)) {
> > >>+		pte = intel_pasid_get_entry(dev, pasid);
> > >>+		clflush_cache_range(pte, sizeof(*pte));
> > >>+	}
> > >>+
> > >>+	pasid_based_pasid_cache_invalidation(iommu, did, pasid);
> > >>+	pasid_based_iotlb_cache_invalidation(iommu, did, pasid);
> > >>+
> > >>+	/* Device IOTLB doesn't need to be flushed in caching mode. */
> > >>+	if (!cap_caching_mode(iommu->cap))
> > >>+		pasid_based_dev_iotlb_cache_invalidation(iommu, dev, pasid);
> > >
> > >can you elaborate, or point to any spec reference?
> > >
> >
> > In the driver, the device iotlb doesn't get flushed in caching mode. I just
> > followed what has been done there.
> >
> > It also makes sense to me since only the bare metal host needs to
> > consider whether and how to flush the device iotlb.
> >
> 
> DavidW might remember; I think the idea was to help with the cost of
> virtualization: instead of taking 2 VM exits, the host can handle the
> device IOTLB flush directly when it traps the IOTLB flush.
> 

OK, performance-wise it makes sense, though strictly speaking it
doesn't follow the spec...
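
For reference, the behavior under discussion boils down to the following
(a sketch of the quoted hunk with the virtualization rationale spelled
out, not the actual patch):

/*
 * In caching mode (i.e. when running under a hypervisor), the PASID
 * cache and IOTLB invalidations above already trap to the host, which
 * can flush the device IOTLB on our behalf, so the guest skips the
 * extra invalidation and saves a VM exit.
 */
pasid_based_pasid_cache_invalidation(iommu, did, pasid);
pasid_based_iotlb_cache_invalidation(iommu, did, pasid);

if (!cap_caching_mode(iommu->cap))
	pasid_based_dev_iotlb_cache_invalidation(iommu, dev, pasid);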

Thanks
Kevin


Re: of_dma_request_slave_channel() failed ?

2018-09-12 Thread Kuninori Morimoto


Hi Geert

Thank you for your feedback

> > -
> > commit ac6bbf0cdf4206c517ac9789814c23e372ebce4d
> > Author: Rob Herring 
> > Date:   Mon Jul 9 09:41:52 2018 -0600
> >
> > iommu: Remove IOMMU_OF_DECLARE
> >
> > Now that we use the driver core to stop deferred probe for missing
> > drivers, IOMMU_OF_DECLARE can be removed.
> >
> > This is slightly less optimal than having a list of built-in drivers in
> > that we'll now defer probe twice before giving up. This shouldn't have a
> > significant impact on boot times as past discussions about deferred
> > probe have given no evidence of deferred probe having a substantial
> > impact.
> > ...
> > -
(snip)
> I assume you wrote commit 6c92d5a2744e2761 ("ASoC: rsnd: don't fallback
> to PIO mode when -EPROBE_DEFER") to work around this?

Yes, it is a workaround for it.

> While this got rid of the error messages, and postpones sound initialization
> until the audio DMAC is available, it does mean that the driver will _never_
> fall back to PIO.
> 
> I.e. if CONFIG_RCAR_DMAC=n, audio will fail to probe instead of falling
> back to PIO.

If I focus only on sound here, the purpose of PIO mode is
checking the basic HW connection, clock settings, etc.
Without DMAC support, many sound features can't be used.
And PIO mode supports "SSI" only.

If the DT has SRC/CTU/DVC settings for sound,
it needs DMA support anyway.

&rcar_sound {
	...
	ports {
		rsnd_port0: port@0 {
			rsnd_endpoint0: endpoint {
				...
=>				playback = <  >;
			};
		};
	};
};

Before commit 6c92d5a2744e2761, the driver would forcibly ignore
non-SSI modules and try to use PIO mode.
I'm not sure whether that is "kind support" or "overkill support".

After this patch, it needs DMA; otherwise, probe will fail.
The DT shouldn't have non-SSI modules if you want to use PIO mode.

+ /* use PIO mode */ 
- playback = <  >;
+ playback = <>;
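
A PIO-only endpoint would then keep just the SSI phandle, along these
lines (an illustrative sketch; node and phandle names are hypothetical):

&rcar_sound {
	ports {
		rsnd_port0: port@0 {
			rsnd_endpoint0: endpoint {
				/* SSI only: no SRC/CTU/DVC, so PIO mode can work */
				playback = <&ssi0>;
			};
		};
	};
};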



RE: [PATCH 0/7 v7] Support for fsl-mc bus and its devices in SMMU

2018-09-12 Thread Nipun Gupta



> -Original Message-
> From: Will Deacon [mailto:will.dea...@arm.com]
> Sent: Wednesday, September 12, 2018 10:29 PM
> To: Nipun Gupta 
> Cc: j...@8bytes.org; robin.mur...@arm.com; robh...@kernel.org;
> r...@kernel.org; mark.rutl...@arm.com; catalin.mari...@arm.com;
> gre...@linuxfoundation.org; Laurentiu Tudor ;
> bhelg...@google.com; h...@lst.de; m.szyprow...@samsung.com;
> shawn...@kernel.org; frowand.l...@gmail.com; iommu@lists.linux-
> foundation.org; linux-ker...@vger.kernel.org; devicet...@vger.kernel.org;
> linux-arm-ker...@lists.infradead.org; linuxppc-...@lists.ozlabs.org; linux-
> p...@vger.kernel.org; Bharat Bhushan ;
> stuyo...@gmail.com; Leo Li 
> Subject: Re: [PATCH 0/7 v7] Support for fsl-mc bus and its devices in SMMU
> 
> Hi Nipun,
> 
> On Mon, Sep 10, 2018 at 07:19:14PM +0530, Nipun Gupta wrote:
> > This patchset defines IOMMU DT binding for fsl-mc bus and adds
> > support in SMMU for fsl-mc bus.
> >
> > These patches
> >   - Define property 'iommu-map' for fsl-mc bus (patch 1)
> >   - Integrates the fsl-mc bus with the SMMU using this
> > IOMMU binding (patch 2,3,4)
> >   - Adds the dma configuration support for fsl-mc bus (patch 5, 6)
> >   - Updates the fsl-mc device node with iommu/dma related changes (patch 7)
> 
> It looks like you have all the Acks in place for this series now, so I
> assume it's going to go via Joerg directly. Is that right?

This is what I am expecting :)
 
> Will
> 



Re: [PATCH v5 0/5] add non-strict mode support for arm-smmu-v3

2018-09-12 Thread Leizhen (ThunderTown)



On 2018/9/13 1:12, Robin Murphy wrote:
> On 12/09/18 17:57, Will Deacon wrote:
>> Hi all,
>>
>> On Wed, Aug 15, 2018 at 09:28:25AM +0800, Zhen Lei wrote:
>>> v4 -> v5:
>>> 1. change the type of global variable and struct member named "non_strict"
>>>    from "int" to "bool".
>>> 2. cancel the unnecessary parameter "strict" of __arm_lpae_unmap which was
>>>    added in v4.
>>> 3. change boot option "arm_iommu" to "iommu.non_strict".
>>> 4. convert __iommu_dma_unmap to use iommu_unmap_fast()/iommu_tlb_sync(),
>>>    because non-leaf unmaps still need to be synchronous.
>>>
>>> Thanks for Robin's review comments.
>>
>> Since this is 90% of the way there now, I suggest Robin picks up what's here
>> and incorporates his remaining review comments directly (especially since it
>> sounded like Zhen Lei hasn't got much free time lately). With that, I can
>> queue this lot via my smmu branch, which already has some stuff queued
>> for SMMUv3 and io-pgtable.
>>
>> Please shout if you have any objections, but I'm keen for this not to
>> languish on the lists given how close it is!
> 
> Sure, having got this far I'd like to see it get done too. I'll make some 
> time and give it a go.

Thank you very much.

I've been too busy lately; at least I still have no time this week. I also
think it's been too long.

So sorry and thanks again.

> 
> Robin.
> 
> 

-- 
Thanks!
Best Regards



RE: [RFC PATCH v2 08/10] vfio/type1: Add domain at(de)taching group helpers

2018-09-12 Thread Tian, Kevin
> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
> Sent: Thursday, September 13, 2018 1:54 AM
> 
> On 12/09/2018 06:02, Lu Baolu wrote:
> > Hi,
> >
> > On 09/11/2018 12:23 AM, Jean-Philippe Brucker wrote:
> >> On 30/08/2018 05:09, Lu Baolu wrote:
> >>> +static int vfio_detach_aux_domain(struct device *dev, void *data)
> >>> +{
> > >>> +	struct iommu_domain *domain = data;
> > >>> +
> > >>> +	vfio_mdev_set_aux_domain(dev, NULL);
> > >>> +	iommu_detach_device(domain, dev->parent);
> >>
> >> I think that's only going to work for vt-d, which doesn't use a
> >> default_domain. For other drivers, iommu.c ends up calling
> >> domain->ops->attach_dev(default_domain, dev) here, which isn't what
> we want.
> >
> > This doesn't break any functionality. It takes effect only if the iommu
> > hardware supports finer-granularity translation and advertises it through
> > the API.
> >> That was my concern with reusing attach/detach_dev callbacks for
> PASID
> >> management. The attach_dev code of IOMMU drivers already has to
> deal
> >> with toggling between default and unmanaged domain. Dealing with
> more
> >> state transitions in the same path is going to be difficult.
> >
> > Let's consider it from the API point of view.
> >
> > We have iommu_a(de)ttach_device() APIs to attach or detach a domain
> > to/from a device. We should avoid applying a limitation of "these are
> > only for single domain case, for multiple domains, use another API".
> 
> That's an idealized view of the API, the actual semantics aren't as
> simple. For IOMMU drivers that implement IOMMU_DOMAIN_DMA in their
> domain_alloc operation (Arm SMMU, AMD IOMMU, ...), attach_dev
> attaches
> an unmanaged domain, detach_dev reattaches the default DMA domain,
> and
> the detach_dev IOMMU op is not called. We can't change that now, so it's
> difficult to add more functionality to attach_dev and detach_dev.
> 

Now we have four possible usages of a(de)ttach_device:

1) Normal DMA API path i.e. IOMMU_DOMAIN_DMA, for DMA operations
in host/parent device driver;

2) VFIO passthru path, when the physical device is assigned to user space
or a guest driver

3) mdev passthru path 1, when an mdev is assigned to user space or a guest
driver. Here the mdev is a wrapper around a whole PCI device

4) mdev passthru path 2, when an mdev is assigned to user space or a guest
driver. Here the mdev is at a smaller granularity (e.g. tagged by PASID), as
supported by vt-d scalable mode

1) and 2) are existing usages. What you described (unmanaged vs. default)
falls into this category.

3) is actually the same as 2) in the IOMMU layer: sort of passing the DMA
capability through to the guest. Though there is still a parent driver, the
parent driver itself should not do DMA, i.e. it is unmanaged on the host side.

4) is a new code path introduced in the IOMMU layer, which is what
vfio_a(de)tach_aux_domain is trying to address. In this case the parent device
can still have its own DMA capability, using existing code path 1) (thus the
default domain still applies), and the parent device can have multiple
aux domains (and associated structures), using code path 4).

I don't see how this new code path interferes with the default domain
concept in the existing path. In the future, if ARM or other architectures
plan to support a similar scalable-mode capability, only the new code
path needs to be tweaked.
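
To illustrate, a rough sketch of how 1) and 4) stay separate (conceptual
only, not the exact signatures from this RFC):

	/* 1) existing path: the parent keeps its default DMA domain */
	dma_addr = dma_map_single(parent_dev, buf, size, DMA_TO_DEVICE);

	/* 4) new path: each mdev gets its own aux domain on the parent,
	 * tagged by a PASID, without touching the default domain
	 */
	aux_domain = iommu_domain_alloc(&pci_bus_type);
	vfio_attach_aux_domain(parent_dev, aux_domain);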

Thanks
Kevin


RE: [RFC PATCH v2 00/10] vfio/mdev: IOMMU aware mediated device

2018-09-12 Thread Tian, Kevin
> From: Jean-Philippe Brucker
> Sent: Thursday, September 13, 2018 1:54 AM
> 
> On 12/09/2018 03:42, Lu Baolu wrote:
> > Hi,
> >
> > On 09/11/2018 12:22 AM, Jean-Philippe Brucker wrote:
> >> Hi,
> >>
> >> On 30/08/2018 05:09, Lu Baolu wrote:
> >>> Below APIs are introduced in the IOMMU glue for device drivers to use
> >>> the finer granularity translation.
> >>>
> >>> * iommu_capable(IOMMU_CAP_AUX_DOMAIN)
> >>>- Represents the ability for supporting multiple domains per device
> >>>  (a.k.a. finer granularity translations) of the IOMMU hardware.
> >>
> >> iommu_capable() cannot represent hardware capabilities, we need
> >> something else for systems with multiple IOMMUs that have different
> >> caps. How about iommu_domain_get_attr on the device's domain
> instead?
> >
> > A domain is not a good choice for a per-IOMMU cap query. A domain might be
> > attached to devices belonging to different IOMMUs.
> >
> > How about an API with a device structure as parameter? A device always
> > belongs to a specific iommu. This API is supposed to be used by the
> > device driver.
> 
> Ah right, domain attributes won't work. Your suggestion seems more
> suitable, but maybe users can simply try to enable auxiliary domains
> first, and conclude that the IOMMU doesn't support it if it returns an error.
> 
> >>> * iommu_en(dis)able_aux_domain(struct device *dev)
> >>>- Enable/disable the multiple domains capability for a device
> >>>  referenced by @dev.
> 
> It strikes me now that in the IOMMU driver,
> iommu_enable/disable_aux_domain() will do the same thing as
> iommu_sva_device_init/shutdown()
> (https://www.spinics.net/lists/arm-kernel/msg651896.html). Some IOMMU
> drivers want to enable PASID and allocate PASID tables only when
> requested by users, in the sva_init_device IOMMU op (see Joerg's comment
> last year https://patchwork.kernel.org/patch/9989307/#21025429). Maybe
> we could simply add a flag to iommu_sva_device_init?

We could combine, but definitely 'sva' should be removed :-)

> 
> >>> * iommu_auxiliary_id(struct iommu_domain *domain)
> >>>- Return the index value used for finer-granularity DMA translation.
> >>>  The specific device driver needs to feed the hardware with this
> >>>  value, so that hardware device could issue the DMA transaction with
> >>>  this value tagged.
> >>
> >> This could also reuse iommu_domain_get_attr.
> >>
> >>
> >> More generally I'm having trouble understanding how auxiliary domains
> >> will be used. So VFIO allocates PASIDs like this:
> >
> > As I wrote in the cover letter, "auxiliary domain" is just a name to
> > ease discussion. It actually has no special meaning (we think of a domain
> > as an isolation boundary which could be used by the IOMMU to isolate
> > the DMA transactions out of a PCI device or part of it).
> >
> > So drivers like vfio should see no difference when using an auxiliary
> > domain. The auxiliary domain is not visible outside of the iommu driver.
> 
> For an auxiliary domain, VFIO does need to retrieve the PASID and write
> it to hardware. But being able to reuse
> iommu_map/unmap/iova_to_phys/etc
> on the auxiliary domain is nice.
> 
> >> * iommu_enable_aux_domain(parent_dev)
> >> * iommu_domain_alloc() -> dom1
> >> * iommu_domain_alloc() -> dom2
> >> * iommu_attach_device(dom1, parent_dev)
> >>-> dom1 gets PASID #1
> >> * iommu_attach_device(dom2, parent_dev)
> >>-> dom2 gets PASID #2
> >>
> >> Then I'm not sure about the next steps, when userspace does
> >> VFIO_IOMMU_MAP_DMA or VFIO_IOMMU_BIND on an mdev's
> container. Is the
> >> following use accurate?
> >>
> >> For the single translation level:
> >> * iommu_map(dom1, ...) updates first-level/second-level pgtables for
> >> PASID #1
> >> * iommu_map(dom2, ...) updates first-level/second-level pgtables for
> >> PASID #2
> >>
> >> Nested translation:
> >> * iommu_map(dom1, ...) updates second-level pgtables for PASID #1
> >> * iommu_bind_table(dom1, ...) binds first-level pgtables, provided by
> >> the guest, for PASID #1
> >> * iommu_map(dom2, ...) updates second-level pgtables for PASID #2
> >> * iommu_bind_table(dom2, ...) binds first-level pgtables for PASID #2
> >> I'm trying to understand how to implement this with SMMU and other
> >
> > This is proposed for architectures which support finer granularity
> > second level translation with no impact on architectures which only
> > support Source ID or the similar granularity.
> 
> Just to be clear, in this paragraph you're only referring to the
> Nested/second-level translation for mdev, which is specific to vt-d
> rev3? Other architectures can still do first-level translation with
> PASID, to support some use-cases of IOMMU aware mediated device
> (assigning mdevs to userspace drivers, for example).

Yes. The aux domain concept applies only to vt-d rev3, which introduces
scalable mode. Care is taken to avoid breaking usages on existing
architectures.

One note: assigning mdevs to user space alone doesn't imply IOMMU

Re: [RFC PATCH v2 08/10] vfio/type1: Add domain at(de)taching group helpers

2018-09-12 Thread Jean-Philippe Brucker
On 12/09/2018 06:02, Lu Baolu wrote:
> Hi,
> 
> On 09/11/2018 12:23 AM, Jean-Philippe Brucker wrote:
>> On 30/08/2018 05:09, Lu Baolu wrote:
>>> +static int vfio_detach_aux_domain(struct device *dev, void *data)
>>> +{
>>> +   struct iommu_domain *domain = data;
>>> +
>>> +   vfio_mdev_set_aux_domain(dev, NULL);
>>> +   iommu_detach_device(domain, dev->parent);
>>
>> I think that's only going to work for vt-d, which doesn't use a
>> default_domain. For other drivers, iommu.c ends up calling
>> domain->ops->attach_dev(default_domain, dev) here, which isn't what we want.
> 
> This doesn't break any functionality. It takes effect only if the iommu
> hardware supports finer-granularity translation and advertises it through
> the API.
>> That was my concern with reusing attach/detach_dev callbacks for PASID
>> management. The attach_dev code of IOMMU drivers already has to deal
>> with toggling between default and unmanaged domain. Dealing with more
>> state transitions in the same path is going to be difficult.
> 
> Let's consider it from the API point of view.
> 
> We have iommu_a(de)ttach_device() APIs to attach or detach a domain
> to/from a device. We should avoid applying a limitation of "these are
> only for single domain case, for multiple domains, use another API".

That's an idealized view of the API, the actual semantics aren't as
simple. For IOMMU drivers that implement IOMMU_DOMAIN_DMA in their
domain_alloc operation (Arm SMMU, AMD IOMMU, ...), attach_dev attaches
an unmanaged domain, detach_dev reattaches the default DMA domain, and
the detach_dev IOMMU op is not called. We can't change that now, so it's
difficult to add more functionality to attach_dev and detach_dev.
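
For reference, this is roughly what the core does today (condensed from
__iommu_detach_group() in iommu.c, error handling omitted):

static void __iommu_detach_group(struct iommu_domain *domain,
				 struct iommu_group *group)
{
	if (!group->default_domain) {
		/* e.g. current vt-d: the detach_dev op runs as expected */
		__iommu_group_for_each_dev(group, domain,
					   iommu_group_do_detach_device);
		group->domain = NULL;
		return;
	}

	/* everyone else: "detach" re-attaches the default DMA domain,
	 * and the detach_dev op is never called
	 */
	__iommu_group_for_each_dev(group, group->default_domain,
				   iommu_group_do_attach_device);
	group->domain = group->default_domain;
}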

Thanks,
Jean


Re: [RFC PATCH v2 00/10] vfio/mdev: IOMMU aware mediated device

2018-09-12 Thread Jean-Philippe Brucker
On 12/09/2018 03:42, Lu Baolu wrote:
> Hi,
> 
> On 09/11/2018 12:22 AM, Jean-Philippe Brucker wrote:
>> Hi,
>>
>> On 30/08/2018 05:09, Lu Baolu wrote:
>>> Below APIs are introduced in the IOMMU glue for device drivers to use
>>> the finer granularity translation.
>>>
>>> * iommu_capable(IOMMU_CAP_AUX_DOMAIN)
>>>- Represents the ability for supporting multiple domains per device
>>>  (a.k.a. finer granularity translations) of the IOMMU hardware.
>>
>> iommu_capable() cannot represent hardware capabilities, we need
>> something else for systems with multiple IOMMUs that have different
>> caps. How about iommu_domain_get_attr on the device's domain instead?
> 
> A domain is not a good choice for a per-IOMMU cap query. A domain might be
> attached to devices belonging to different IOMMUs.
>
> How about an API with a device structure as parameter? A device always
> belongs to a specific iommu. This API is supposed to be used by the
> device driver.

Ah right, domain attributes won't work. Your suggestion seems more
suitable, but maybe users can simply try to enable auxiliary domains
first, and conclude that the IOMMU doesn't support it if it returns an error.
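
I.e. in the device driver, something like this (a sketch using the names
proposed in this series):

	/* probe for aux-domain support by simply trying to enable it */
	ret = iommu_enable_aux_domain(dev);
	if (ret)
		return ret;  /* this IOMMU doesn't support it, fall back */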

>>> * iommu_en(dis)able_aux_domain(struct device *dev)
>>>- Enable/disable the multiple domains capability for a device
>>>  referenced by @dev.

It strikes me now that in the IOMMU driver,
iommu_enable/disable_aux_domain() will do the same thing as
iommu_sva_device_init/shutdown()
(https://www.spinics.net/lists/arm-kernel/msg651896.html). Some IOMMU
drivers want to enable PASID and allocate PASID tables only when
requested by users, in the sva_init_device IOMMU op (see Joerg's comment
last year https://patchwork.kernel.org/patch/9989307/#21025429). Maybe
we could simply add a flag to iommu_sva_device_init?

>>> * iommu_auxiliary_id(struct iommu_domain *domain)
>>>- Return the index value used for finer-granularity DMA translation.
>>>  The specific device driver needs to feed the hardware with this
>>>  value, so that hardware device could issue the DMA transaction with
>>>  this value tagged.
>>
>> This could also reuse iommu_domain_get_attr.
>>
>>
>> More generally I'm having trouble understanding how auxiliary domains
>> will be used. So VFIO allocates PASIDs like this:
> 
> As I wrote in the cover letter, "auxiliary domain" is just a name to
> ease discussion. It actually has no special meaning (we think of a domain
> as an isolation boundary which could be used by the IOMMU to isolate
> the DMA transactions out of a PCI device or part of it).
> 
> So drivers like vfio should see no difference when using an auxiliary
> domain. The auxiliary domain is not visible outside of the iommu driver.

For an auxiliary domain, VFIO does need to retrieve the PASID and write
it to hardware. But being able to reuse iommu_map/unmap/iova_to_phys/etc
on the auxiliary domain is nice.
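
Putting those together, the flow for one mdev might look like this (a
sketch; iommu_auxiliary_id() is from the cover letter and
my_hw_set_pasid() is a hypothetical driver helper):

	domain = iommu_domain_alloc(&pci_bus_type);
	iommu_attach_device(domain, parent_dev);  /* allocates a PASID */

	pasid = iommu_auxiliary_id(domain);       /* retrieve it... */
	my_hw_set_pasid(mdev, pasid);             /* ...and program it */

	/* from here on, the domain ops target this PASID's page tables */
	iommu_map(domain, iova, paddr, size, IOMMU_READ | IOMMU_WRITE);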

>> * iommu_enable_aux_domain(parent_dev)
>> * iommu_domain_alloc() -> dom1
>> * iommu_domain_alloc() -> dom2
>> * iommu_attach_device(dom1, parent_dev)
>>-> dom1 gets PASID #1
>> * iommu_attach_device(dom2, parent_dev)
>>-> dom2 gets PASID #2
>>
>> Then I'm not sure about the next steps, when userspace does
>> VFIO_IOMMU_MAP_DMA or VFIO_IOMMU_BIND on an mdev's container. Is the
>> following use accurate?
>>
>> For the single translation level:
>> * iommu_map(dom1, ...) updates first-level/second-level pgtables for
>> PASID #1
>> * iommu_map(dom2, ...) updates first-level/second-level pgtables for
>> PASID #2
>>
>> Nested translation:
>> * iommu_map(dom1, ...) updates second-level pgtables for PASID #1
>> * iommu_bind_table(dom1, ...) binds first-level pgtables, provided by
>> the guest, for PASID #1
>> * iommu_map(dom2, ...) updates second-level pgtables for PASID #2
>> * iommu_bind_table(dom2, ...) binds first-level pgtables for PASID #2
>> I'm trying to understand how to implement this with SMMU and other
> 
> This is proposed for architectures which support finer granularity
> second level translation with no impact on architectures which only
> support Source ID or the similar granularity.

Just to be clear, in this paragraph you're only referring to the
Nested/second-level translation for mdev, which is specific to vt-d
rev3? Other architectures can still do first-level translation with
PASID, to support some use-cases of IOMMU aware mediated device
(assigning mdevs to userspace drivers, for example).

>> IOMMUs. It's not a clean fit since we have a single domain to hold the
>> second-level pgtables. 
> 
> Do you mind explaining why a domain holds multiple second-level
> pgtables? Shouldn't that be multiple domains?

I didn't mean a single domain holding multiple second-level pgtables,
but a single domain holding a single set of second-level pgtables for
all mdevs. But let's ignore that; mdev and second-level isn't realistic
for Arm SMMU.

>> Then again, the nested case probably doesn't
>> matter for us - we might as well assign the parent directly, since 

Re: [PATCH] vfio/pci: Some buggy virtual functions incorrectly report 1 for intx.

2018-09-12 Thread Raj, Ashok
On Thu, Aug 09, 2018 at 01:44:17PM -0600, Alex Williamson wrote:
> On Thu,  9 Aug 2018 12:37:06 -0700
> Ashok Raj  wrote:
> 
> > PCI_INTERRUPT_PIN should always read 0 for SRIOV Virtual Functions.
> > 
> > Some SRIOV devices have bugs in their RTL and VFs end up reading 1
> > instead of 0 for the PIN.
> 
> Hi Ashok,
> 
> One question, can we identify which VFs are known to have this issue so
> that users (and downstreams) can know how to prioritize this patch?

Hi Alex

Sorry it took some time to hunt this down. 

The offending VF has a device ID: 0x270C
The corresponding PF has a device ID: 0x270B.

> Thanks,
> 
> Alex
> 
> > Since this is a spec required value, rather than having a device specific
> > quirk, we could fix it permanently in vfio.
> > 
> > Reworked suggestions from Alex https://lkml.org/lkml/2018/7/16/1052
> > 
> > Reported-by: Gage Eads 
> > Tested-by: Gage Eads 
> > Signed-off-by: Ashok Raj 
> > Cc: k...@vger.kernel.org
> > Cc: linux-ker...@vger.kernel.org
> > Cc: iommu@lists.linux-foundation.org
> > Cc: Joerg Roedel 
> > Cc: Bjorn Helgaas 
> > Cc: Gage Eads 
> > ---
> >  drivers/vfio/pci/vfio_pci.c| 12 +---
> >  drivers/vfio/pci/vfio_pci_config.c |  3 ++-
> >  2 files changed, 11 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> > index b423a30..32943dd 100644
> > --- a/drivers/vfio/pci/vfio_pci.c
> > +++ b/drivers/vfio/pci/vfio_pci.c
> > @@ -433,10 +433,16 @@ static int vfio_pci_get_irq_count(struct vfio_pci_device *vdev, int irq_type)
> >  {
> > if (irq_type == VFIO_PCI_INTX_IRQ_INDEX) {
> > u8 pin;
> > -   pci_read_config_byte(vdev->pdev, PCI_INTERRUPT_PIN, &pin);
> > -   if (IS_ENABLED(CONFIG_VFIO_PCI_INTX) && !vdev->nointx && pin)
> > -   return 1;
> > +   /*
> > +* INTx must be 0 for all VF's. Enforce that for all
> > +* VF's since this is a spec requirement.
> > +*/
> > +   if (!IS_ENABLED(CONFIG_VFIO_PCI_INTX) || vdev->nointx ||
> > +   vdev->pdev->is_virtfn)
> > +   return 0;
> >  
> > +   pci_read_config_byte(vdev->pdev, PCI_INTERRUPT_PIN, &pin);
> > +   return (pin ? 1 : 0);
> > } else if (irq_type == VFIO_PCI_MSI_IRQ_INDEX) {
> > u8 pos;
> > u16 flags;
> > diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
> > index 115a36f..e36b7c3 100644
> > --- a/drivers/vfio/pci/vfio_pci_config.c
> > +++ b/drivers/vfio/pci/vfio_pci_config.c
> > @@ -1676,7 +1676,8 @@ int vfio_config_init(struct vfio_pci_device *vdev)
> > *(__le16 *)&vconfig[PCI_DEVICE_ID] = cpu_to_le16(pdev->device);
> > }
> >  
> > -   if (!IS_ENABLED(CONFIG_VFIO_PCI_INTX) || vdev->nointx)
> > +   if (!IS_ENABLED(CONFIG_VFIO_PCI_INTX) || vdev->nointx ||
> > +   pdev->is_virtfn)
> > vconfig[PCI_INTERRUPT_PIN] = 0;
> >  
> > ret = vfio_cap_init(vdev);
> 


Re: [PATCH v5 0/5] add non-strict mode support for arm-smmu-v3

2018-09-12 Thread Robin Murphy

On 12/09/18 17:57, Will Deacon wrote:

> Hi all,
> 
> On Wed, Aug 15, 2018 at 09:28:25AM +0800, Zhen Lei wrote:
>> v4 -> v5:
>> 1. change the type of global variable and struct member named "non_strict"
>>    from "int" to "bool".
>> 2. cancel the unnecessary parameter "strict" of __arm_lpae_unmap which was
>>    added in v4.
>> 3. change boot option "arm_iommu" to "iommu.non_strict".
>> 4. convert __iommu_dma_unmap to use iommu_unmap_fast()/iommu_tlb_sync(),
>>    because non-leaf unmaps still need to be synchronous.
>>
>> Thanks for Robin's review comments.
> 
> Since this is 90% of the way there now, I suggest Robin picks up what's here
> and incorporates his remaining review comments directly (especially since it
> sounded like Zhen Lei hasn't got much free time lately). With that, I can
> queue this lot via my smmu branch, which already has some stuff queued
> for SMMUv3 and io-pgtable.
> 
> Please shout if you have any objections, but I'm keen for this not to
> languish on the lists given how close it is!

Sure, having got this far I'd like to see it get done too. I'll make
some time and give it a go.


Robin.


Re: [PATCH 0/7 v7] Support for fsl-mc bus and its devices in SMMU

2018-09-12 Thread Will Deacon
Hi Nipun,

On Mon, Sep 10, 2018 at 07:19:14PM +0530, Nipun Gupta wrote:
> This patchset defines IOMMU DT binding for fsl-mc bus and adds
> support in SMMU for fsl-mc bus.
> 
> These patches
>   - Define property 'iommu-map' for fsl-mc bus (patch 1)
>   - Integrates the fsl-mc bus with the SMMU using this
> IOMMU binding (patch 2,3,4)
>   - Adds the dma configuration support for fsl-mc bus (patch 5, 6)
>   - Updates the fsl-mc device node with iommu/dma related changes (patch 7)

It looks like you have all the Acks in place for this series now, so I
assume it's going to go via Joerg directly. Is that right?

Will

> Changes in v2:
>   - use iommu-map property for fsl-mc bus
>   - rebase over patchset https://patchwork.kernel.org/patch/10317337/
> and make corresponding changes for dma configuration of devices on
> fsl-mc bus
> 
> Changes in v3:
>   - move of_map_rid in drivers/of/address.c
> 
> Changes in v4:
>   - move of_map_rid in drivers/of/base.c
> 
> Changes in v5:
>   - break patch 5 in two separate patches (now patch 5/7 and patch 6/7)
>   - add changelog text in patch 3/7 and patch 5/7
>   - typo fix
> 
> Changes in v6:
>   - Updated fsl_mc_device_group() API to be more rational
>   - Added dma-coherent property in the LS2 smmu device node
>   - Minor fixes in the device-tree documentation
> 
> Changes in v7:
>   - Rebased over linux 4.19
> 
> Nipun Gupta (7):
>   Documentation: fsl-mc: add iommu-map device-tree binding for fsl-mc
> bus
>   iommu/of: make of_pci_map_rid() available for other devices too
>   iommu/of: support iommu configuration for fsl-mc devices
>   iommu/arm-smmu: Add support for the fsl-mc bus
>   bus: fsl-mc: support dma configure for devices on fsl-mc bus
>   bus: fsl-mc: set coherent dma mask for devices on fsl-mc bus
>   arm64: dts: ls208xa: comply with the iommu map binding for fsl_mc
> 
>  .../devicetree/bindings/misc/fsl,qoriq-mc.txt  |  39 
>  arch/arm64/boot/dts/freescale/fsl-ls208xa.dtsi |   7 +-
>  drivers/bus/fsl-mc/fsl-mc-bus.c|  16 +++-
>  drivers/iommu/arm-smmu.c   |   7 ++
>  drivers/iommu/iommu.c  |  13 +++
>  drivers/iommu/of_iommu.c   |  25 -
>  drivers/of/base.c  | 102 +
>  drivers/of/irq.c   |   5 +-
>  drivers/pci/of.c   | 101 
>  include/linux/fsl/mc.h |   8 ++
>  include/linux/iommu.h  |   2 +
>  include/linux/of.h |  11 +++
>  include/linux/of_pci.h |  10 --
>  13 files changed, 224 insertions(+), 122 deletions(-)
> 
> -- 
> 1.9.1
> 


Re: [PATCH v5 0/5] add non-strict mode support for arm-smmu-v3

2018-09-12 Thread Will Deacon
Hi all,

On Wed, Aug 15, 2018 at 09:28:25AM +0800, Zhen Lei wrote:
> v4 -> v5:
> 1. change the type of global variable and struct member named "non_strict"
>    from "int" to "bool".
> 2. cancel the unnecessary parameter "strict" of __arm_lpae_unmap which was
>    added in v4.
> 3. change boot option "arm_iommu" to "iommu.non_strict".
> 4. convert __iommu_dma_unmap to use iommu_unmap_fast()/iommu_tlb_sync(),
>    because non-leaf unmaps still need to be synchronous.
> 
> Thanks for Robin's review comments.

Since this is 90% of the way there now, I suggest Robin picks up what's here
and incorporates his remaining review comments directly (especially since it
sounded like Zhen Lei hasn't got much free time lately). With that, I can
queue this lot via my smmu branch, which already has some stuff queued
for SMMUv3 and io-pgtable.

Please shout if you have any objections, but I'm keen for this not to
languish on the lists given how close it is!

Will


[PATCH] media: s5p-mfc: Fix memdev DMA configuration

2018-09-12 Thread Robin Murphy
Having of_reserved_mem_device_init() forcibly reconfigure DMA for all
callers, potentially overriding the work done by a bus-specific
.dma_configure method earlier, is at best a bad idea and at worst
actively harmful. If drivers really need virtual devices to own
dma-coherent memory, they should explicitly configure those devices
based on the appropriate firmware node as they create them.

It looks like the only driver not passing in a proper OF platform device
is s5p-mfc, so move the rogue of_dma_configure() call into that driver
where it logically belongs.

CC: Smitha T Murthy 
CC: Marek Szyprowski 
CC: Rob Herring 
Signed-off-by: Robin Murphy 
---
 drivers/media/platform/s5p-mfc/s5p_mfc.c | 7 +++
 drivers/of/of_reserved_mem.c | 4 
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/media/platform/s5p-mfc/s5p_mfc.c b/drivers/media/platform/s5p-mfc/s5p_mfc.c
index 927a1235408d..77eb4a4511c1 100644
--- a/drivers/media/platform/s5p-mfc/s5p_mfc.c
+++ b/drivers/media/platform/s5p-mfc/s5p_mfc.c
@@ -1094,6 +1094,13 @@ static struct device *s5p_mfc_alloc_memdev(struct device *dev,
child->dma_mask = dev->dma_mask;
child->release = s5p_mfc_memdev_release;
 
+   /*
+* The memdevs are not proper OF platform devices, so in order for them
+* to be treated as valid DMA masters we need a bit of a hack to force
+* them to inherit the MFC node's DMA configuration.
+*/
+   of_dma_configure(child, dev->of_node, true);
+
if (device_add(child) == 0) {
ret = of_reserved_mem_device_init_by_idx(child, dev->of_node,
 idx);
diff --git a/drivers/of/of_reserved_mem.c b/drivers/of/of_reserved_mem.c
index 895c83e0c7b6..4ef6f4485335 100644
--- a/drivers/of/of_reserved_mem.c
+++ b/drivers/of/of_reserved_mem.c
@@ -350,10 +350,6 @@ int of_reserved_mem_device_init_by_idx(struct device *dev,
mutex_lock(&of_rmem_assigned_device_mutex);
list_add(&rd->list, &of_rmem_assigned_device_list);
mutex_unlock(&of_rmem_assigned_device_mutex);
-   /* ensure that dma_ops is set for virtual devices
-* using reserved memory
-*/
-   of_dma_configure(dev, np, true);
 
dev_info(dev, "assigned reserved memory node %s\n", rmem->name);
} else {
-- 
2.19.0.dirty



Re: of_dma_request_slave_channel() failed ?

2018-09-12 Thread Mark Brown
On Tue, Sep 11, 2018 at 11:43:47AM +0200, Geert Uytterhoeven wrote:

> So it seems the audio DMAC is deferred a second time, before the iommu driver
> probed.

Shouldn't there be at least one more round of deferred probe triggers
after the IOMMU probes?



[PATCH v2 1/3] iommu: Add fast hook for getting DMA domains

2018-09-12 Thread Robin Murphy
While iommu_get_domain_for_dev() is the robust way for arbitrary IOMMU
API callers to retrieve the domain pointer, for DMA ops domains it
doesn't scale well for large systems and multi-queue devices, since the
momentary refcount adjustment will lead to exclusive cacheline contention
when multiple CPUs are operating in parallel on different mappings for
the same device.

In the case of DMA ops domains, however, this refcounting is actually
unnecessary, since they already imply that the group exists and is
managed by platform code and IOMMU internals (by virtue of
iommu_group_get_for_dev()) such that a reference will already be held
for the lifetime of the device. Thus we can avoid the bottleneck by
providing a fast lookup specifically for the DMA code to retrieve the
default domain it already knows it has set up - a simple read-only
dereference plays much nicer with cache-coherency protocols.

Signed-off-by: Robin Murphy 
---
 drivers/iommu/iommu.c | 9 +
 include/linux/iommu.h | 1 +
 2 files changed, 10 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 8c15c5980299..9d70344204fe 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1415,6 +1415,15 @@ struct iommu_domain *iommu_get_domain_for_dev(struct device *dev)
 }
 EXPORT_SYMBOL_GPL(iommu_get_domain_for_dev);
 
+/*
+ * For IOMMU_DOMAIN_DMA implementations which already provide their own
+ * guarantees that the group and its default domain are valid and correct.
+ */
+struct iommu_domain *iommu_get_dma_domain(struct device *dev)
+{
+   return dev->iommu_group->default_domain;
+}
+
 /*
  * IOMMU groups are really the natrual working unit of the IOMMU, but
  * the IOMMU API works on domains and devices.  Bridge that gap by
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 87994c265bf5..c783648d4060 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -293,6 +293,7 @@ extern int iommu_attach_device(struct iommu_domain *domain,
 extern void iommu_detach_device(struct iommu_domain *domain,
struct device *dev);
 extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
+extern struct iommu_domain *iommu_get_dma_domain(struct device *dev);
 extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
 phys_addr_t paddr, size_t size, int prot);
 extern size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova,
-- 
2.19.0.dirty



[PATCH v2 2/3] iommu/dma: Use fast DMA domain lookup

2018-09-12 Thread Robin Murphy
Most parts of iommu-dma already assume they are operating on a default
domain set up by iommu_dma_init_domain(), and can be converted straight
over to avoid the refcounting bottleneck. MSI page mappings may be in
an unmanaged domain with an explicit MSI-only cookie, so retain the
non-specific lookup, but that's OK since they're far from a contended
fast path either way.

Signed-off-by: Robin Murphy 
---
 drivers/iommu/dma-iommu.c | 23 ---
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 511ff9a1d6d9..320f9ea82f3f 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -491,7 +491,7 @@ static struct page **__iommu_dma_alloc_pages(unsigned int count,
 void iommu_dma_free(struct device *dev, struct page **pages, size_t size,
dma_addr_t *handle)
 {
-   __iommu_dma_unmap(iommu_get_domain_for_dev(dev), *handle, size);
+   __iommu_dma_unmap(iommu_get_dma_domain(dev), *handle, size);
__iommu_dma_free_pages(pages, PAGE_ALIGN(size) >> PAGE_SHIFT);
*handle = IOMMU_MAPPING_ERROR;
 }
@@ -518,7 +518,7 @@ struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t gfp,
unsigned long attrs, int prot, dma_addr_t *handle,
void (*flush_page)(struct device *, const void *, phys_addr_t))
 {
-   struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+   struct iommu_domain *domain = iommu_get_dma_domain(dev);
struct iommu_dma_cookie *cookie = domain->iova_cookie;
struct iova_domain *iovad = &cookie->iovad;
struct page **pages;
@@ -606,9 +606,8 @@ int iommu_dma_mmap(struct page **pages, size_t size, struct vm_area_struct *vma)
 }
 
 static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
-   size_t size, int prot)
+   size_t size, int prot, struct iommu_domain *domain)
 {
-   struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
struct iommu_dma_cookie *cookie = domain->iova_cookie;
size_t iova_off = 0;
dma_addr_t iova;
@@ -632,13 +631,14 @@ static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
 dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
unsigned long offset, size_t size, int prot)
 {
-   return __iommu_dma_map(dev, page_to_phys(page) + offset, size, prot);
+   return __iommu_dma_map(dev, page_to_phys(page) + offset, size, prot,
+   iommu_get_dma_domain(dev));
 }
 
 void iommu_dma_unmap_page(struct device *dev, dma_addr_t handle, size_t size,
enum dma_data_direction dir, unsigned long attrs)
 {
-   __iommu_dma_unmap(iommu_get_domain_for_dev(dev), handle, size);
+   __iommu_dma_unmap(iommu_get_dma_domain(dev), handle, size);
 }
 
 /*
@@ -726,7 +726,7 @@ static void __invalidate_sg(struct scatterlist *sg, int nents)
 int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
int nents, int prot)
 {
-   struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+   struct iommu_domain *domain = iommu_get_dma_domain(dev);
struct iommu_dma_cookie *cookie = domain->iova_cookie;
struct iova_domain *iovad = &cookie->iovad;
struct scatterlist *s, *prev = NULL;
@@ -811,20 +811,21 @@ void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
sg = tmp;
}
end = sg_dma_address(sg) + sg_dma_len(sg);
-   __iommu_dma_unmap(iommu_get_domain_for_dev(dev), start, end - start);
+   __iommu_dma_unmap(iommu_get_dma_domain(dev), start, end - start);
 }
 
 dma_addr_t iommu_dma_map_resource(struct device *dev, phys_addr_t phys,
size_t size, enum dma_data_direction dir, unsigned long attrs)
 {
return __iommu_dma_map(dev, phys, size,
-   dma_info_to_prot(dir, false, attrs) | IOMMU_MMIO);
+   dma_info_to_prot(dir, false, attrs) | IOMMU_MMIO,
+   iommu_get_dma_domain(dev));
 }
 
 void iommu_dma_unmap_resource(struct device *dev, dma_addr_t handle,
size_t size, enum dma_data_direction dir, unsigned long attrs)
 {
-   __iommu_dma_unmap(iommu_get_domain_for_dev(dev), handle, size);
+   __iommu_dma_unmap(iommu_get_dma_domain(dev), handle, size);
 }
 
 int iommu_dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
@@ -850,7 +851,7 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev,
if (!msi_page)
return NULL;
 
-   iova = __iommu_dma_map(dev, msi_addr, size, prot);
+   iova = __iommu_dma_map(dev, msi_addr, size, prot, domain);
if (iommu_dma_mapping_error(dev, iova))
goto out_free_page;
 
-- 
2.19.0.dirty


[PATCH v2 3/3] arm64/dma-mapping: Mildly optimise non-coherent IOMMU ops

2018-09-12 Thread Robin Murphy
Whilst the symmetry of deferring to the existing sync callback in
__iommu_map_page() is nice, taking a round-trip through
iommu_iova_to_phys() is a pretty heavyweight way to get an address we
can trivially compute from the page we already have. Tweaking it to just
perform the cache maintenance directly when appropriate doesn't really
make the code any more complicated, and the runtime efficiency gain can
only be a benefit.

Furthermore, the sync operations themselves know they can only be
invoked on a managed DMA ops domain, so can use the fast specific domain
lookup to avoid excessive manipulation of the group refcount
(particularly in the scatterlist cases).

Acked-by: Will Deacon 
Signed-off-by: Robin Murphy 
---

v2: Don't be totally broken by forgetting the offset

 arch/arm64/mm/dma-mapping.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 072c51fb07d7..cf017c5bb5e7 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -712,7 +712,7 @@ static void __iommu_sync_single_for_cpu(struct device *dev,
if (is_device_dma_coherent(dev))
return;
 
-   phys = iommu_iova_to_phys(iommu_get_domain_for_dev(dev), dev_addr);
+   phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dev_addr);
__dma_unmap_area(phys_to_virt(phys), size, dir);
 }
 
@@ -725,7 +725,7 @@ static void __iommu_sync_single_for_device(struct device *dev,
if (is_device_dma_coherent(dev))
return;
 
-   phys = iommu_iova_to_phys(iommu_get_domain_for_dev(dev), dev_addr);
+   phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dev_addr);
__dma_map_area(phys_to_virt(phys), size, dir);
 }
 
@@ -738,9 +738,9 @@ static dma_addr_t __iommu_map_page(struct device *dev, struct page *page,
int prot = dma_info_to_prot(dir, coherent, attrs);
dma_addr_t dev_addr = iommu_dma_map_page(dev, page, offset, size, prot);
 
-   if (!iommu_dma_mapping_error(dev, dev_addr) &&
-   (attrs & DMA_ATTR_SKIP_CPU_SYNC) == 0)
-   __iommu_sync_single_for_device(dev, dev_addr, size, dir);
+   if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
+   !iommu_dma_mapping_error(dev, dev_addr))
+   __dma_map_area(page_address(page) + offset, size, dir);
 
return dev_addr;
 }
-- 
2.19.0.dirty



[PATCH v2 0/3] iommu: Avoid DMA ops domain refcount contention

2018-09-12 Thread Robin Murphy
John raised the issue[1] that we have some unnecessary refcount contention
in the DMA ops path which shows scalability problems now that we have more
real high-performance hardware using iommu-dma. The x86 IOMMU drivers are
sidestepping this by stashing domain references in archdata, but since
that's not very nice for architecture-agnostic code, I think it's time to
look at a generic API-level solution.

These are a couple of quick patches based on the idea I had back when
first implementing iommu-dma, but didn't have any way to justify at the
time. However, the reports of 10-25% better networking performance on v1
suggest that it's very worthwhile (and far more significant than I ever
would have guessed).
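
For context, the hot-path change in patches #2 and #3 is mechanical at
each call site, along these lines:

-	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+	struct iommu_domain *domain = iommu_get_dma_domain(dev);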

As far as merging goes, I don't mind at all whether this goes via IOMMU,
or via dma-mapping provided Joerg's happy to ack it.

Robin.


[1] https://lists.linuxfoundation.org/pipermail/iommu/2018-August/029303.html

Robin Murphy (3):
  iommu: Add fast hook for getting DMA domains
  iommu/dma: Use fast DMA domain lookup
  arm64/dma-mapping: Mildly optimise non-coherent IOMMU ops

 arch/arm64/mm/dma-mapping.c | 10 +-
 drivers/iommu/dma-iommu.c   | 23 ---
 drivers/iommu/iommu.c   |  9 +
 include/linux/iommu.h   |  1 +
 4 files changed, 27 insertions(+), 16 deletions(-)

-- 
2.19.0.dirty



Re: [PATCH v2 01/40] iommu: Introduce Shared Virtual Addressing API

2018-09-12 Thread Christian König

On 12.09.2018 at 14:40, Jean-Philippe Brucker wrote:

On 08/09/2018 08:29, Christian König wrote:

Yes, exactly. I just need a PASID which is never used by the OS for a
process and we can easily give that back when the last FD reference is
closed.

Alright, iommu-sva can get its PASID from this external allocator as
well, as long as it has an interface similar to idr. Where would it go,
drivers/base/, mm/, kernel/...?


Good question, my initial instinct was to put it under drivers/pci.

But AFAICS you are now supporting SVA implementations which are not
based on PCI.


So drivers/base sounds like a good place to me.




The process dies, iommu-sva is notified and calls the mm_exit()
function passed by the device driver to iommu_sva_device_init(). In
mm_exit() the device driver needs to clear any reference to the
PASID in hardware and in its own structures. When the device driver
returns from mm_exit(), it effectively tells the core that it has
finished using the PASID, and iommu-sva can reuse the PASID for
another process. mm_exit() is allowed to block, so the device
driver has time to clean up and flush the queues.

If the device driver finishes using the PASID before the process
exits, it just calls unbind().

Exactly; that's what Michal Hocko is probably not going to like at all.

Can we have a different approach where each driver is informed by the
mm_exit(), but needs to explicitly call unbind() before a PASID is
reused?

It's awful from the IOMMU driver perspective. In addition to "enabled"
and "disabled" PASID states, you add "disabled but DMA still running
normally". Between that new state and "disabled", the IOMMU will be
flooded by translation faults (non-recoverable ones), which it needs to
ignore instead of reporting to the kernel. Not all IOMMUs can deal with
this in hardware (SMMU and VT-d can quiesce translation faults
per-PASID, but I don't think AMD IOMMU can.) Some drivers will have to
filter fault events themselves, depending on the PASID state.


Puh, yeah that is probably true.

OK, let us skip that for a moment; we just need to invest more work in
killing DMA operations quickly when the process address space is torn
down.



During that teardown transition it would be ideal if that PASID only
points to a dummy root page directory with only invalid entries.


I guess this can be vendor-specific. In VT-d I plan to mark the PASID
entry not-present and disable fault reporting while draining the remaining
activities.

Sounds good to me.

Point is at least in the case where the process was killed by the OOM
killer we should not block in mm_exit().

Instead operations issued by the process to a device driver which uses
SVA needs to be terminated as soon as possible to make sure that the OOM
killer can advance.

I don't see how we're preventing the OOM killer from advancing, so I'm
looking for a stronger argument that justifies adding this complexity to
IOMMU drivers. Time limit of the release MMU notifier, locking
requirement, a concrete example where things break, even a comment
somewhere in mm/ would do...

In my tests I can't manage to disturb the OOM killer, but I could be
missing some special case. Even if the mm_exit() callback
(unrealistically) sleeps for 60 seconds,


Well you are *COMPLETELY* underestimating this. A compute task with a
huge wave launch can take multiple minutes to tear down.


That's why I'm so concerned about it, but to be honest I think the
hardware just needs to become better, and we need to be able to block
dead tasks from spawning threads again.


Regards,
Christian.


  the OOM killer isn't affected
because oom_reap_task_mm() wipes the victim's address space in another
thread, either before or while the release notifier is running.

Thanks,
Jean



Re: [PATCH v2 01/40] iommu: Introduce Shared Virtual Addressing API

2018-09-12 Thread Jean-Philippe Brucker
On 08/09/2018 08:29, Christian König wrote:
> Yes, exactly. I just need a PASID which is never used by the OS for a 
> process and we can easily give that back when the last FD reference is 
> closed.

Alright, iommu-sva can get its PASID from this external allocator as
well, as long as it has an interface similar to idr. Where would it go,
drivers/base/, mm/, kernel/...?

>>>> The process dies, iommu-sva is notified and calls the mm_exit()
>>>> function passed by the device driver to iommu_sva_device_init(). In
>>>> mm_exit() the device driver needs to clear any reference to the
>>>> PASID in hardware and in its own structures. When the device driver
>>>> returns from mm_exit(), it effectively tells the core that it has
>>>> finished using the PASID, and iommu-sva can reuse the PASID for
>>>> another process. mm_exit() is allowed to block, so the device
>>>> driver has time to clean up and flush the queues.
>>>>
>>>> If the device driver finishes using the PASID before the process
>>>> exits, it just calls unbind().
>>> Exactly; that's what Michal Hocko is probably not going to like at all.
>>>
>>> Can we have a different approach where each driver is informed by the
>>> mm_exit(), but needs to explicitly call unbind() before a PASID is
>>> reused?

It's awful from the IOMMU driver perspective. In addition to "enabled"
and "disabled" PASID states, you add "disabled but DMA still running
normally". Between that new state and "disabled", the IOMMU will be
flooded by translation faults (non-recoverable ones), which it needs to
ignore instead of reporting to the kernel. Not all IOMMUs can deal with
this in hardware (SMMU and VT-d can quiesce translation faults
per-PASID, but I don't think AMD IOMMU can.) Some drivers will have to
filter fault events themselves, depending on the PASID state.
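
In other words, decoupling unbind() from mm_exit() would force IOMMU
drivers to track a third per-PASID state, roughly (names illustrative):

enum iommu_pasid_state {
	PASID_DISABLED,		/* entry cleared, faults are reported */
	PASID_ENABLED,		/* entry valid, translations succeed */
	PASID_STOPPING,		/* mm gone but DMA may still be in flight:
				 * faults must be silently discarded */
};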

>>> During that teardown transition it would be ideal if that PASID only
>>> points to a dummy root page directory with only invalid entries.
>>>
>> I guess this can be vendor-specific. In VT-d I plan to mark the PASID
>> entry not-present and disable fault reporting while draining the remaining
>> activities.
>
> Sounds good to me.
> 
> Point is at least in the case where the process was killed by the OOM 
> killer we should not block in mm_exit().
>
> Instead operations issued by the process to a device driver which uses 
> SVA needs to be terminated as soon as possible to make sure that the OOM 
> killer can advance.

I don't see how we're preventing the OOM killer from advancing, so I'm
looking for a stronger argument that justifies adding this complexity to
IOMMU drivers. Time limit of the release MMU notifier, locking
requirement, a concrete example where things break, even a comment
somewhere in mm/ would do...

In my tests I can't manage to disturb the OOM killer, but I could be
missing some special case. Even if the mm_exit() callback
(unrealistically) sleeps for 60 seconds, the OOM killer isn't affected
because oom_reap_task_mm() wipes the victim's address space in another
thread, either before or while the release notifier is running.

Thanks,
Jean