RE: [PATCH v2] iommu/vt-d: Make DMAR_UNITS_SUPPORTED a config setting

2022-05-23 Thread Tian, Kevin
> From: Steve Wahl
> Sent: Thursday, May 19, 2022 3:58 AM
> 
> On Fri, May 13, 2022 at 10:09:46AM +0800, Baolu Lu wrote:
> > On 2022/5/13 07:12, Steve Wahl wrote:
> > > On Thu, May 12, 2022 at 10:13:09AM -0500, Steve Wahl wrote:
> > > > To support up to 64 sockets with 10 DMAR units each (640), make the
> > > > value of DMAR_UNITS_SUPPORTED adjustable by a config variable,
> > > > CONFIG_DMAR_UNITS_SUPPORTED, and make it's default 1024 when
> MAXSMP is
> > > > set.
> > > >
> > > > If the available hardware exceeds DMAR_UNITS_SUPPORTED
> (previously set
> > > > to MAX_IO_APICS, or 128), it causes these messages: "DMAR: Failed to
> > > > allocate seq_id", "DMAR: Parse DMAR table failure.", and "x2apic: IRQ
> > > > remapping doesn't support X2APIC mode x2apic disabled"; and the
> system
> > > > fails to boot properly.
> > > >
> > > > Signed-off-by: Steve Wahl 
> > >
> > > I've received a report from the kernel test robot ,
> > > that this patch causes an error (shown below) when
> > > CONFIG_IOMMU_SUPPORT is not set.
> > >
> > > In my opinion, this is because include/linux/dmar.h and
> > > include/linux/intel-iommu are being #included when they are not really
> > > being used.
> > >
> > > I've tried placing the contents of intel-iommu.h within an #ifdef
> > > CONFIG_INTEL_IOMMU, and that fixes the problem.
> > >
> > > Two questions:
> > >
> > > A) Is this the desired approach to to the fix?
> >
> > Most part of include/linux/intel-iommu.h is private to Intel IOMMU
> > driver. They should be put in a header like drivers/iommu/intel
> > /iommu.h. Eventually, we should remove include/linux/intel-iommu.h
> > and device drivers interact with iommu subsystem through the IOMMU
> > kAPIs.
> >
> > Best regards,
> > baolu
> 
> Baolu's recent patch to move intel-iommu.h private still allows my
> [PATCH v2] to apply with no changes, and gets rid of the compile
> errors when CONFIG_IOMMU_SUPPORT is not set, so the kernel test robot
> should be happy now.
> 
> Is there another step I should do to bring this patch back into focus?
> 

This looks good to me.

Reviewed-by: Kevin Tian 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 4/6] iommu: Add PASID support for DMA mapping API users

2022-05-23 Thread Jacob Pan
Hi Kevin,

On Mon, 23 May 2022 08:25:33 +, "Tian, Kevin" 
wrote:

> > From: Jacob Pan 
> > Sent: Thursday, May 19, 2022 2:21 AM
> > 
> > DMA mapping API is the de facto standard for in-kernel DMA. It operates
> > on a per device/RID basis which is not PASID-aware.
> > 
> > Some modern devices such as Intel Data Streaming Accelerator, PASID is
> > required for certain work submissions. To allow such devices use DMA
> > mapping API, we need the following functionalities:
> > 1. Provide device a way to retrieve a PASID for work submission within
> > the kernel
> > 2. Enable the kernel PASID on the IOMMU for the device
> > 3. Attach the kernel PASID to the device's default DMA domain, let it
> > be IOVA or physical address in case of pass-through.
> > 
> > This patch introduces a driver facing API that enables DMA API
> > PASID usage. Once enabled, device drivers can continue to use DMA APIs
> > as is. There is no difference in dma_handle between without PASID and
> > with PASID.
> > 
> > Signed-off-by: Jacob Pan 
> > ---
> >  drivers/iommu/dma-iommu.c | 114
> > ++
> >  include/linux/dma-iommu.h |   3 +
> >  2 files changed, 117 insertions(+)
> > 
> > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> > index 1ca85d37eeab..6ad7ba619ef0 100644
> > --- a/drivers/iommu/dma-iommu.c
> > +++ b/drivers/iommu/dma-iommu.c
> > @@ -34,6 +34,8 @@ struct iommu_dma_msi_page {
> > phys_addr_t phys;
> >  };
> > 
> > +static DECLARE_IOASID_SET(iommu_dma_pasid);
> > +
> >  enum iommu_dma_cookie_type {
> > IOMMU_DMA_IOVA_COOKIE,
> > IOMMU_DMA_MSI_COOKIE,
> > @@ -370,6 +372,118 @@ void iommu_put_dma_cookie(struct
> > iommu_domain *domain)
> > domain->iova_cookie = NULL;
> >  }
> > 
> > +/* Protect iommu_domain DMA PASID data */
> > +static DEFINE_MUTEX(dma_pasid_lock);
> > +/**
> > + * iommu_attach_dma_pasid --Attach a PASID for in-kernel DMA. Use the
> > device's
> > + * DMA domain.
> > + * @dev: Device to be enabled
> > + * @pasid: The returned kernel PASID to be used for DMA
> > + *
> > + * DMA request with PASID will be mapped the same way as the legacy
> > DMA.
> > + * If the device is in pass-through, PASID will also pass-through. If
> > the
> > + * device is in IOVA, the PASID will point to the same IOVA page table.
> > + *
> > + * @return err code or 0 on success
> > + */
> > +int iommu_attach_dma_pasid(struct device *dev, ioasid_t *pasid)  
> 
> iommu_attach_dma_domain_pasid? 'dma_pasid' is too broad from
> a API p.o.v.
> 
I agree dma_pasid is too broad, technically it is dma_api_pasid but seems
too long.
My concern with dma_domain_pasid is that the pasid can also be used for
identity domain.

> > +{
> > +   struct iommu_domain *dom;
> > +   ioasid_t id, max;
> > +   int ret = 0;
> > +
> > +   dom = iommu_get_domain_for_dev(dev);
> > +   if (!dom || !dom->ops || !dom->ops->attach_dev_pasid)
> > +   return -ENODEV;
> > +
> > +   /* Only support domain types that DMA API can be used */
> > +   if (dom->type == IOMMU_DOMAIN_UNMANAGED ||
> > +   dom->type == IOMMU_DOMAIN_BLOCKED) {
> > +   dev_warn(dev, "Invalid domain type %d", dom->type);
> > +   return -EPERM;
> > +   }  
> 
> WARN_ON.
> 
> and probably we can just check whether domain is default domain here.
> 
good point, I will just use
struct iommu_domain *def_domain = iommu_get_dma_domain(dev);

> > +
> > +   mutex_lock(_pasid_lock);
> > +   id = dom->dma_pasid;
> > +   if (!id) {
> > +   /*
> > +* First device to use PASID in its DMA domain,
> > allocate
> > +* a single PASID per DMA domain is all we need, it is
> > also
> > +* good for performance when it comes down to IOTLB
> > flush.
> > +*/
> > +   max = 1U << dev->iommu->pasid_bits;
> > +   if (!max) {
> > +   ret = -EINVAL;
> > +   goto done_unlock;
> > +   }
> > +
> > +   id = ioasid_alloc(_dma_pasid, 1, max, dev);
> > +   if (id == INVALID_IOASID) {
> > +   ret = -ENOMEM;
> > +   goto done_unlock;
> > +   }
> > +
> > +   dom->dma_pasid = id;
> > +   atomic_set(>dma_pasid_users, 1);  
> 
> this is always accessed with lock held hence no need to be atomic.
> 
good catch, will fix

> > +   }
> > +
> > +   ret = iommu_attach_device_pasid(dom, dev, id);
> > +   if (!ret) {
> > +   *pasid = id;
> > +   atomic_inc(>dma_pasid_users);
> > +   goto done_unlock;
> > +   }
> > +
> > +   if (atomic_dec_and_test(>dma_pasid_users)) {
> > +   ioasid_free(id);
> > +   dom->dma_pasid = 0;
> > +   }
> > +done_unlock:
> > +   mutex_unlock(_pasid_lock);
> > +   return ret;
> > +}
> > +EXPORT_SYMBOL(iommu_attach_dma_pasid);
> > +
> > +/**
> > + * iommu_detach_dma_pasid --Disable in-kernel DMA request with PASID
> > + * @dev:   Device's PASID DMA to be disabled
> > + *
> > + * It is the device 

Re: [PATCH 0/2] dma-mapping, remoteproc: Fix dma_mem leak after rproc_shutdown

2022-05-23 Thread Mark-PK Tsai via iommu
> >> Sigh.  In theory drivers should never declare coherent memory like
> >> this, and there has been some work to fix remoteproc in that regard.
> >>
> >> But I guess until that is merged we'll need somthing like this fix.
> > 
> > Hi,
> > 
> > Thanks for your comment.
> > As I didn't see other fix of this issue, should we use this patch
> > before the remoteproc work you mentioned is merged?
> 
> TBH I think it would be better "fixed" with a kmemleak_ignore() and a 
> big comment, rather than adding API cruft for something that isn't a 
> real problem. I'm quite sure that no real-world user is unbinding 
> remoteproc drivers frequently enough that leaking a 48-byte allocation 
> each time has any practical significance.
> 

Actually we stop 2 remote processors using the same remoteproc driver
when system suspend or screen off on our arm64 platform.
And the 48-byte slab allocation will be aligned up to 128 bytes on arm64.
So the leak can be significant in our use case especially
in stress test...

We really don't want to ignore it. Maybe there're other way to fix
it without adding APIs?
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH V2 0/6] IOMMU-DMA - support DMA_ATTR_LOW_ADDRESS attribute

2022-05-23 Thread Ajay Kumar
Ping!

On Thu, May 12, 2022 at 9:09 AM Ajay Kumar  wrote:
>
> This patchset is a rebase of original patches from Marek Szyprowski:
> https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg2321261.html
>
> The patches have been rebased on Joro's IOMMU tree "next" branch:
> https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git
>
> This patchset is needed to address the IOVA address dependency issue between
> firmware buffers and other buffers in Samsung s5p-mfc driver.
>
> There have been few discussions in the past on how to find a generic
> soultion for this issue, ranging from adding an entirely new API to choose
> IOVA window[1], to adding a DMA attribute DMA_ATTR_LOW_ADDRESS which handles
> buffer allocation from lower address[2].
> This is a continuation of initial work from Marek for approach [2].
> Patches have been tested with latest version of Samsung s5p-mfc driver.
>
> Changes since V1:
> [PATCH V2 1/6]
> - Rebase on latest tree.
>
> [PATCH V2 2/6]
> - Rebase on latest tree.
>   Added a missing check for iova_pfn in __iova_rcache_get()
>   Discard changes from drivers/iommu/intel/iommu.c which are not necessary
>
> [PATCH V2 3/6]
> - Rebase on latest tree.
>
> [PATCH V2 4/6]
> - Rebase on latest tree
>
> [PATCH V2 5/6]
> - Rebase on latest tree
>
> [PATCH V2 6/6]
> - Rebase on latest tree.
>
> Marek Szyprowski (6):
>   dma-mapping: add DMA_ATTR_LOW_ADDRESS attribute
>   iommu: iova: properly handle 0 as a valid IOVA address
>   iommu: iova: add support for 'first-fit' algorithm
>   iommu: dma-iommu: refactor iommu_dma_alloc_iova()
>   iommu: dma-iommu: add support for DMA_ATTR_LOW_ADDRESS
>   media: platform: s5p-mfc: use DMA_ATTR_LOW_ADDRESS
>
> References:
> [1]
> https://lore.kernel.org/linux-iommu/20200811054912.ga...@infradead.org/
>
> [2]
> https://lore.kernel.org/linux-mm/bff57cbe-2247-05e1-9059-d9c66d64c...@arm.com
>
>  drivers/iommu/dma-iommu.c | 77 +++-
>  drivers/iommu/iova.c  | 91 ++-
>  .../media/platform/samsung/s5p-mfc/s5p_mfc.c  |  8 +-
>  include/linux/dma-mapping.h   |  6 ++
>  include/linux/iova.h  |  3 +
>  5 files changed, 156 insertions(+), 29 deletions(-)
>
>
> base-commit: faf93cfaadfaaff2a5c35d6301b45aa2f6e4ddb2
> --
> 2.17.1
>
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/ipmmu-vmsa: Avoid leak OF node on error

2022-05-23 Thread Robin Murphy

On 2022-05-23 12:54, Johan Hovold wrote:

On Mon, May 23, 2022 at 11:11:45AM +, cgel@gmail.com wrote:

From: Minghao Chi 

The OF node should be put before returning error in ipmmu_init(),
otherwise node's refcount will be leaked.

Reported-by: Zeal Robot 
Signed-off-by: Minghao Chi 
---
  drivers/iommu/ipmmu-vmsa.c | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
index 8fdb84b3642b..f6440b106f46 100644
--- a/drivers/iommu/ipmmu-vmsa.c
+++ b/drivers/iommu/ipmmu-vmsa.c
@@ -1179,11 +1179,10 @@ static int __init ipmmu_init(void)
return 0;
  
  	np = of_find_matching_node(NULL, ipmmu_of_ids);

+   of_node_put(np);
if (!np)
return 0;
  
-	of_node_put(np);

-
ret = platform_driver_register(_driver);
if (ret < 0)
return ret;


NAK


Indeed. How exactly can we hold a refcount on NULL, let alone leak it?

Static checkers are great for flagging up code that *might* have issues, 
but please actually *look* at the code and apply some thought before 
sending a patch.


Robin.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v1] driver core: Extend deferred probe timeout on driver registration

2022-05-23 Thread Nathan Chancellor
On Fri, May 20, 2022 at 05:15:55PM -0700, Saravana Kannan wrote:
> On Fri, May 20, 2022 at 5:04 PM Nathan Chancellor  wrote:
> >
> > On Fri, May 20, 2022 at 04:49:48PM -0700, Saravana Kannan wrote:
> > > On Fri, May 20, 2022 at 4:30 PM Nathan Chancellor  
> > > wrote:
> > > >
> > > > Hi Saravana,
> > > >
> > > > On Fri, Apr 29, 2022 at 03:09:32PM -0700, Saravana Kannan wrote:
> > > > > The deferred probe timer that's used for this currently starts at
> > > > > late_initcall and runs for driver_deferred_probe_timeout seconds. The
> > > > > assumption being that all available drivers would be loaded and
> > > > > registered before the timer expires. This means, the
> > > > > driver_deferred_probe_timeout has to be pretty large for it to cover 
> > > > > the
> > > > > worst case. But if we set the default value for it to cover the worst
> > > > > case, it would significantly slow down the average case. For this
> > > > > reason, the default value is set to 0.
> > > > >
> > > > > Also, with CONFIG_MODULES=y and the current default values of
> > > > > driver_deferred_probe_timeout=0 and fw_devlink=on, devices with 
> > > > > missing
> > > > > drivers will cause their consumer devices to always defer their 
> > > > > probes.
> > > > > This is because device links created by fw_devlink defer the probe 
> > > > > even
> > > > > before the consumer driver's probe() is called.
> > > > >
> > > > > Instead of a fixed timeout, if we extend an unexpired deferred probe
> > > > > timer on every successful driver registration, with the expectation 
> > > > > more
> > > > > modules would be loaded in the near future, then the default value of
> > > > > driver_deferred_probe_timeout only needs to be as long as the worst 
> > > > > case
> > > > > time difference between two consecutive module loads.
> > > > >
> > > > > So let's implement that and set the default value to 10 seconds when
> > > > > CONFIG_MODULES=y.
> > > > >
> > > > > Cc: Greg Kroah-Hartman 
> > > > > Cc: "Rafael J. Wysocki" 
> > > > > Cc: Rob Herring 
> > > > > Cc: Linus Walleij 
> > > > > Cc: Will Deacon 
> > > > > Cc: Ulf Hansson 
> > > > > Cc: Kevin Hilman 
> > > > > Cc: Thierry Reding 
> > > > > Cc: Mark Brown 
> > > > > Cc: Pavel Machek 
> > > > > Cc: Geert Uytterhoeven 
> > > > > Cc: Yoshihiro Shimoda 
> > > > > Cc: Paul Kocialkowski 
> > > > > Cc: linux-g...@vger.kernel.org
> > > > > Cc: linux...@vger.kernel.org
> > > > > Cc: iommu@lists.linux-foundation.org
> > > > > Signed-off-by: Saravana Kannan 
> > > >
> > > > I bisected a boot hang with ARCH=s390 defconfig in QEMU down to this
> > > > change as commit 2b28a1a84a0e ("driver core: Extend deferred probe
> > > > timeout on driver registration") in next-20220520 (bisect log below).
> > > >
> > > > $ make -skj"$(nproc)" ARCH=s390 CROSS_COMPILE=s390x-linux-gnu- 
> > > > defconfig bzImage
> > > >
> > > > $ timeout --foreground 15m stdbuf -oL -eL \
> > > > qemu-system-s390x \
> > > > -initrd ... \
> > > > -M s390-ccw-virtio \
> > > > -display none \
> > > > -kernel arch/s390/boot/bzImage \
> > > > -m 512m \
> > > > -nodefaults \
> > > > -serial mon:stdio
> > > > ...
> > > > [2.077303] In-situ OAM (IOAM) with IPv6
> > > > [2.077639] NET: Registered PF_PACKET protocol family
> > > > [2.078063] bridge: filtering via arp/ip/ip6tables is no longer 
> > > > available by default. Update your scripts to load br_netfilter if you 
> > > > need this.
> > > > [2.078795] Key type dns_resolver registered
> > > > [2.079317] cio: Channel measurement facility initialized using 
> > > > format extended (mode autodetected)
> > > > [2.081494] Discipline DIAG cannot be used without z/VM
> > > > [  260.626363] random: crng init done
> > > > qemu-system-s390x: terminating on signal 15 from pid 3815762 (timeout)
> > > >
> > > > We have a simple rootfs available if necessary:
> > > >
> > > > https://github.com/ClangBuiltLinux/boot-utils/raw/bc0d17785eb67f1edd0ee0a134970a807895f741/images/s390/rootfs.cpio.zst
> > > >
> > > > If there is any other information I can provide, please let me know!
> > >
> > > Hmm... strange. Can you please try the following command line options
> > > and tell me which of these has the issue and which don't?
> >
> > Sure thing!
> >
> > > 1) deferred_probe_timeout=0
> >
> > No issue.
> >
> > > 2) deferred_probe_timeout=1
> > > 3) deferred_probe_timeout=300
> >
> > Both of these appear to hang in the same way, I let each sit for five
> > minutes.
> 
> Strange that a sufficiently large timeout isn't helping. Is it trying
> to boot off a network mount? I'll continue looking into this next
> week.

I don't think so, it seems like doing that requires some extra flags
that we do not have:

https://wiki.qemu.org/Features/S390xNetworkBoot

If you need any additional information or want something tested, please
let me know!

Cheers,
Nathan
___
iommu mailing list
iommu@lists.linux-foundation.org

Re: [PATCH RFC 11/12] iommufd: vfio container FD ioctl compatibility

2022-05-23 Thread Alexey Kardashevskiy




On 4/29/22 00:53, David Gibson wrote:

On Thu, Mar 24, 2022 at 04:04:03PM -0600, Alex Williamson wrote:

On Wed, 23 Mar 2022 21:33:42 -0300
Jason Gunthorpe  wrote:


On Wed, Mar 23, 2022 at 04:51:25PM -0600, Alex Williamson wrote:


My overall question here would be whether we can actually achieve a
compatibility interface that has sufficient feature transparency that we
can dump vfio code in favor of this interface, or will there be enough
niche use cases that we need to keep type1 and vfio containers around
through a deprecation process?


Other than SPAPR, I think we can.


Does this mean #ifdef CONFIG_PPC in vfio core to retain infrastructure
for POWER support?


There are a few different levels to consider for dealing with PPC.
For a suitable long term interface for ppc hosts and guests dropping
this is fine: the ppc specific iommu model was basically an
ill-conceived idea from the beginning, because none of us had
sufficiently understood what things were general and what things where
iommu model/hw specific.

..mostly.  There are several points of divergence for the ppc iommu
model.

1) Limited IOVA windows.  This one turned out to not really be ppc
specific, and is (rightly) handlded generically in the new interface.
No problem here.

2) Costly GUPs.  pseries (the most common ppc machine type) always
expects a (v)IOMMU.  That means that unlike the common x86 model of a
host with IOMMU, but guests with no-vIOMMU, guest initiated
maps/unmaps can be a hot path.  Accounting in that path can be
prohibitive (and on POWER8 in particular it prevented us from
optimizing that path the way we wanted).  We had two solutions for
that, in v1 the explicit ENABLE/DISABLE calls, which preaccounted
based on the IOVA window sizes.  That was improved in the v2 which
used the concept of preregistration.  IIUC iommufd can achieve the
same effect as preregistration using IOAS_COPY, so this one isn't
really a problem either.



I am getting rid of those POWER8-related realmode handlers as POWER9 has 
MMU enabled when hcalls are handled. Costly GUP problem is still there 
though (which base IOAS should solve?).




3) "dynamic DMA windows" (DDW).  The IBM IOMMU hardware allows for 2 IOVA
windows, which aren't contiguous with each other.  The base addresses
of each of these are fixed, but the size of each window, the pagesize
(i.e. granularity) of each window and the number of levels in the
IOMMU pagetable are runtime configurable.  Because it's true in the
hardware, it's also true of the vIOMMU interface defined by the IBM
hypervisor (and adpoted by KVM as well).  So, guests can request
changes in how these windows are handled.  Typical Linux guests will
use the "low" window (IOVA 0..2GiB) dynamically, and the high window
(IOVA 1<<60..???) to map all of RAM.  However, as a hypervisor we
can't count on that; the guest can use them however it wants.



The guest actually does this already. AIX has always been like that, 
Linux is forced to do that for SRIOV VFs as there can be many VFs and 
TCEs (==IOPTEs) are limited resource. The today's pseries IOMMU code 
first tried mapping 1:1 (as it has been for ages) but if there is not 
enough TCEs - it removes the first window (which increases the TCE 
budget), creates a new 64bit window (as big as possible but not 
necessarily enough for 1:1, 64K/2M IOMMU page sizes allowed) and does 
map/unmap as drivers go.



Which means the guest RAM does not need to be all mapped in that base 
IOAS suggested down this thread as that would mean all memory is pinned 
and powervm won't be able to swap it out (yeah, it can do such thing 
now!). Not sure if we really want to support this or stick to a simpler 
design.






(3) still needs a plan for how to fit it into the /dev/iommufd model.
This is a secondary reason that in the past I advocated for the user
requesting specific DMA windows which the kernel would accept or
refuse, rather than having a query function - it connects easily to
the DDW model.  With the query-first model we'd need some sort of
extension here, not really sure what it should look like.



Then, there's handling existing qemu (or other software) that is using
the VFIO SPAPR_TCE interfaces.  First, it's not entirely clear if this
should be a goal or not: as others have noted, working actively to
port qemu to the new interface at the same time as making a
comprehensive in-kernel compat layer is arguably redundant work.

That said, if we did want to handle this in an in-kernel compat layer,
here's roughly what you'd need for SPAPR_TCE v2:

- VFIO_IOMMU_SPAPR_TCE_GET_INFO
 I think this should be fairly straightforward; the information you
 need should be in the now generic IOVA window stuff and would just
 need massaging into the expected format.
- VFIO_IOMMU_SPAPR_REGISTER_MEMORY /
   VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY
 IIUC, these could be traslated into map/unmap operations onto a
 second implicit IOAS which represents the preregistered memory
 

[PATCH] iommu/arm-smmu-qcom: Add debug support for TLB sync timeouts

2022-05-23 Thread Sai Prakash Ranjan
TLB sync timeouts can be due to various reasons such as TBU power down
or pending TCU/TBU invalidation/sync and so on. Debugging these often
require dumping of some implementation defined registers to know the
status of TBU/TCU operations and some of these registers are not
accessible in non-secure world such as from kernel and requires SMC
calls to read them in the secure world. So, add this debug support
to dump implementation defined registers for TLB sync timeout issues.

Signed-off-by: Sai Prakash Ranjan 
---
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 50 ++
 drivers/iommu/arm/arm-smmu/arm-smmu.c  |  2 +
 drivers/iommu/arm/arm-smmu/arm-smmu.h  |  4 ++
 3 files changed, 56 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index 7820711c4560..22e9a0085475 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -5,11 +5,19 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 
 #include "arm-smmu.h"
 
+#define QCOM_DUMMY_VAL -1
+
+/* Implementation Defined Register Space 0 registers */
+#define QCOM_SMMU_STATS_SYNC_INV_TBU_ACK   0x5dc
+#define QCOM_SMMU_TBU_PWR_STATUS   0x2204
+#define QCOM_SMMU_MMU2QSS_AND_SAFE_WAIT_CNTR   0x2670
+
 struct qcom_smmu {
struct arm_smmu_device smmu;
bool bypass_quirk;
@@ -22,6 +30,46 @@ static struct qcom_smmu *to_qcom_smmu(struct arm_smmu_device 
*smmu)
return container_of(smmu, struct qcom_smmu, smmu);
 }
 
+static void qcom_smmu_tlb_sync(struct arm_smmu_device *smmu, int page,
+   int sync, int status)
+{
+   u32 sync_inv_ack, sync_inv_progress, tbu_pwr_status;
+   unsigned int spin_cnt, delay;
+   u32 reg;
+   int ret;
+
+   arm_smmu_writel(smmu, page, sync, QCOM_DUMMY_VAL);
+   for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) {
+   for (spin_cnt = TLB_SPIN_COUNT; spin_cnt > 0; spin_cnt--) {
+   reg = arm_smmu_readl(smmu, page, status);
+   if (!(reg & ARM_SMMU_sTLBGSTATUS_GSACTIVE))
+   return;
+   cpu_relax();
+   }
+   udelay(delay);
+   }
+
+   sync_inv_ack = arm_smmu_readl(smmu, ARM_SMMU_IMPL_DEF0,
+ QCOM_SMMU_STATS_SYNC_INV_TBU_ACK);
+
+   ret = qcom_scm_io_readl(smmu->ioaddr + QCOM_SMMU_TBU_PWR_STATUS,
+   _pwr_status);
+   if (ret)
+   dev_err_ratelimited(smmu->dev,
+   "Failed to read TBU power status: %d\n", 
ret);
+
+   ret = qcom_scm_io_readl(smmu->ioaddr + 
QCOM_SMMU_MMU2QSS_AND_SAFE_WAIT_CNTR,
+   _inv_progress);
+   if (ret)
+   dev_err_ratelimited(smmu->dev,
+   "Failed to read SAFE WAIT counter: %d\n", 
ret);
+
+   dev_err_ratelimited(smmu->dev,
+   "TLB sync timed out -- SMMU may be deadlocked\n"
+   "TBU: sync_inv_ack %#x power_status %#x 
sync_inv_progress %#x\n",
+   sync_inv_ack, tbu_pwr_status, sync_inv_progress);
+}
+
 static void qcom_adreno_smmu_write_sctlr(struct arm_smmu_device *smmu, int idx,
u32 reg)
 {
@@ -374,6 +422,7 @@ static const struct arm_smmu_impl qcom_smmu_impl = {
.def_domain_type = qcom_smmu_def_domain_type,
.reset = qcom_smmu500_reset,
.write_s2cr = qcom_smmu_write_s2cr,
+   .tlb_sync = qcom_smmu_tlb_sync,
 };
 
 static const struct arm_smmu_impl qcom_adreno_smmu_impl = {
@@ -382,6 +431,7 @@ static const struct arm_smmu_impl qcom_adreno_smmu_impl = {
.reset = qcom_smmu500_reset,
.alloc_context_bank = qcom_adreno_smmu_alloc_context_bank,
.write_sctlr = qcom_adreno_smmu_write_sctlr,
+   .tlb_sync = qcom_smmu_tlb_sync,
 };
 
 static struct arm_smmu_device *qcom_smmu_create(struct arm_smmu_device *smmu,
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 2ed3594f384e..4c5b51109835 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -2099,6 +2099,8 @@ static int arm_smmu_device_probe(struct platform_device 
*pdev)
if (IS_ERR(smmu->base))
return PTR_ERR(smmu->base);
ioaddr = res->start;
+   smmu->ioaddr = ioaddr;
+
/*
 * The resource size should effectively match the value of SMMU_TOP;
 * stash that temporarily until we know PAGESIZE to validate it with.
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h 
b/drivers/iommu/arm/arm-smmu/arm-smmu.h
index 2b9b42fb6f30..8cf6567d970f 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
@@ -278,6 +278,7 @@ struct arm_smmu_device {
struct device   *dev;
 
   

Re: [PATCH V2 2/6] iommu: iova: properly handle 0 as a valid IOVA address

2022-05-23 Thread Robin Murphy

On 2022-05-11 13:15, Ajay Kumar wrote:

From: Marek Szyprowski 

Zero is a valid DMA and IOVA address on many architectures, so adjust the
IOVA management code to properly handle it. A new value IOVA_BAD_ADDR
(~0UL) is introduced as a generic value for the error case. Adjust all
callers of the alloc_iova_fast() function for the new return value.


And when does anything actually need this? In fact if you were to stop 
iommu-dma from reserving IOVA 0 - which you don't - it would only show 
how patch #3 is broken.


Also note that it's really nothing to do with architectures either way; 
iommu-dma simply chooses to reserve IOVA 0 for its own convenience, 
mostly because it can. Much the same way that 0 is typically a valid CPU 
VA, but mapping something meaningful there is just asking for a world of 
pain debugging NULL-dereference bugs.


Robin.


Signed-off-by: Marek Szyprowski 
Signed-off-by: Ajay Kumar 
---
  drivers/iommu/dma-iommu.c | 16 +---
  drivers/iommu/iova.c  | 13 +
  include/linux/iova.h  |  1 +
  3 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 1ca85d37eeab..16218d6a0703 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -605,7 +605,7 @@ static dma_addr_t iommu_dma_alloc_iova(struct iommu_domain 
*domain,
  {
struct iommu_dma_cookie *cookie = domain->iova_cookie;
struct iova_domain *iovad = >iovad;
-   unsigned long shift, iova_len, iova = 0;
+   unsigned long shift, iova_len, iova = IOVA_BAD_ADDR;
  
  	if (cookie->type == IOMMU_DMA_MSI_COOKIE) {

cookie->msi_iova += size;
@@ -625,11 +625,13 @@ static dma_addr_t iommu_dma_alloc_iova(struct 
iommu_domain *domain,
iova = alloc_iova_fast(iovad, iova_len,
   DMA_BIT_MASK(32) >> shift, false);
  
-	if (!iova)

+   if (iova == IOVA_BAD_ADDR)
iova = alloc_iova_fast(iovad, iova_len, dma_limit >> shift,
   true);
  
-	return (dma_addr_t)iova << shift;

+   if (iova != IOVA_BAD_ADDR)
+   return (dma_addr_t)iova << shift;
+   return DMA_MAPPING_ERROR;
  }
  
  static void iommu_dma_free_iova(struct iommu_dma_cookie *cookie,

@@ -688,7 +690,7 @@ static dma_addr_t __iommu_dma_map(struct device *dev, 
phys_addr_t phys,
size = iova_align(iovad, size + iova_off);
  
  	iova = iommu_dma_alloc_iova(domain, size, dma_mask, dev);

-   if (!iova)
+   if (iova == DMA_MAPPING_ERROR)
return DMA_MAPPING_ERROR;
  
  	if (iommu_map_atomic(domain, iova, phys - iova_off, size, prot)) {

@@ -799,7 +801,7 @@ static struct page **__iommu_dma_alloc_noncontiguous(struct 
device *dev,
  
  	size = iova_align(iovad, size);

iova = iommu_dma_alloc_iova(domain, size, dev->coherent_dma_mask, dev);
-   if (!iova)
+   if (iova == DMA_MAPPING_ERROR)
goto out_free_pages;
  
  	if (sg_alloc_table_from_pages(sgt, pages, count, 0, size, GFP_KERNEL))

@@ -1204,7 +1206,7 @@ static int iommu_dma_map_sg(struct device *dev, struct 
scatterlist *sg,
}
  
  	iova = iommu_dma_alloc_iova(domain, iova_len, dma_get_mask(dev), dev);

-   if (!iova) {
+   if (iova == DMA_MAPPING_ERROR) {
ret = -ENOMEM;
goto out_restore_sg;
}
@@ -1516,7 +1518,7 @@ static struct iommu_dma_msi_page 
*iommu_dma_get_msi_page(struct device *dev,
return NULL;
  
  	iova = iommu_dma_alloc_iova(domain, size, dma_get_mask(dev), dev);

-   if (!iova)
+   if (iova == DMA_MAPPING_ERROR)
goto out_free_page;
  
  	if (iommu_map(domain, iova, msi_addr, size, prot))

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index db77aa675145..ae0fe0a6714e 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -429,6 +429,8 @@ EXPORT_SYMBOL_GPL(free_iova);
   * This function tries to satisfy an iova allocation from the rcache,
   * and falls back to regular allocation on failure. If regular allocation
   * fails too and the flush_rcache flag is set then the rcache will be flushed.
+ * Returns a pfn the allocated iova starts at or IOVA_BAD_ADDR in the case
+ * of a failure.
  */
  unsigned long
  alloc_iova_fast(struct iova_domain *iovad, unsigned long size,
@@ -447,7 +449,7 @@ alloc_iova_fast(struct iova_domain *iovad, unsigned long 
size,
size = roundup_pow_of_two(size);
  
  	iova_pfn = iova_rcache_get(iovad, size, limit_pfn + 1);

-   if (iova_pfn)
+   if (iova_pfn != IOVA_BAD_ADDR)
return iova_pfn;
  
  retry:

@@ -456,7 +458,7 @@ alloc_iova_fast(struct iova_domain *iovad, unsigned long 
size,
unsigned int cpu;
  
  		if (!flush_rcache)

-   return 0;
+   return IOVA_BAD_ADDR;
  
  		/* Try replenishing IOVAs by flushing rcache. */

flush_rcache = false;
@@ 

Re: [PATCH v4 2/6] iommu: Add a helper to do PASID lookup from domain

2022-05-23 Thread Jacob Pan
Hi Kevin,

On Mon, 23 May 2022 09:14:04 +, "Tian, Kevin" 
wrote:

> > From: Tian, Kevin
> > Sent: Monday, May 23, 2022 3:55 PM
> >   
> > > From: Jacob Pan 
> > > +ioasid_t iommu_get_pasid_from_domain(struct device *dev, struct
> > > iommu_domain *domain)
> > > +{
> > > + struct iommu_domain *tdomain;
> > > + struct iommu_group *group;
> > > + unsigned long index;
> > > + ioasid_t pasid = INVALID_IOASID;
> > > +
> > > + group = iommu_group_get(dev);
> > > + if (!group)
> > > + return pasid;
> > > +
> > > + xa_for_each(>pasid_array, index, tdomain) {
> > > + if (domain == tdomain) {
> > > + pasid = index;
> > > + break;
> > > + }
> > > + }  
> > 
> > Don't we need to acquire the group lock here?
> > 
pasid_array is under RCU read lock so it is protected though may have stale
data. It also used in atomic context for TLB flush, cannot take the
group mutex. If the caller does detach_dev_pasid while doing TLB flush, it
could result in extra flush but harmless.

> > Btw the intention of this function is a bit confusing. Patch01 already
> > stores the pasid under domain hence it's redundant to get it
> > indirectly from xarray index. You could simply introduce a flag bit
> > (e.g. dma_pasid_enabled) in device_domain_info and then directly
> > use domain->dma_pasid once the flag is true.
> >   
> 
> Just saw your discussion with Jason about v3. While it makes sense
> to not specialize DMA domain in iommu driver, the use of this function
> should only be that when the call chain doesn't pass down a pasid
> value e.g. when doing cache invalidation for domain map/unmap. If
> the upper interface already carries a pasid e.g. in detach_dev_pasid()
> iommu driver can simply verify that the corresponding pasid xarray 
> entry points to the specified domain instead of using this function to
> loop xarray and then verify the returned pasid (as done in patch03/04).
Excellent point, I could just use xa_load(pasid) to compare the domain
instead of loop through xa.
I will add another helper.

bool iommu_is_pasid_domain_attached(struct device *dev, struct iommu_domain 
*domain, ioasid_t pasid)
{
struct iommu_group *group;
bool ret = false;

group = iommu_group_get(dev);
if (WARN_ON(!group))
return false;

if (domain == xa_load(>pasid_array, pasid))
ret = true;

iommu_group_put(group);

return ret;
}

Thanks,

Jacob
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v1] driver core: Extend deferred probe timeout on driver registration

2022-05-23 Thread Saravana Kannan via iommu
On Mon, May 23, 2022 at 8:17 AM Nathan Chancellor  wrote:
>
> On Fri, May 20, 2022 at 05:15:55PM -0700, Saravana Kannan wrote:
> > On Fri, May 20, 2022 at 5:04 PM Nathan Chancellor  wrote:
> > >
> > > On Fri, May 20, 2022 at 04:49:48PM -0700, Saravana Kannan wrote:
> > > > On Fri, May 20, 2022 at 4:30 PM Nathan Chancellor  
> > > > wrote:
> > > > >
> > > > > Hi Saravana,
> > > > >
> > > > > On Fri, Apr 29, 2022 at 03:09:32PM -0700, Saravana Kannan wrote:
> > > > > > The deferred probe timer that's used for this currently starts at
> > > > > > late_initcall and runs for driver_deferred_probe_timeout seconds. 
> > > > > > The
> > > > > > assumption being that all available drivers would be loaded and
> > > > > > registered before the timer expires. This means, the
> > > > > > driver_deferred_probe_timeout has to be pretty large for it to 
> > > > > > cover the
> > > > > > worst case. But if we set the default value for it to cover the 
> > > > > > worst
> > > > > > case, it would significantly slow down the average case. For this
> > > > > > reason, the default value is set to 0.
> > > > > >
> > > > > > Also, with CONFIG_MODULES=y and the current default values of
> > > > > > driver_deferred_probe_timeout=0 and fw_devlink=on, devices with 
> > > > > > missing
> > > > > > drivers will cause their consumer devices to always defer their 
> > > > > > probes.
> > > > > > This is because device links created by fw_devlink defer the probe 
> > > > > > even
> > > > > > before the consumer driver's probe() is called.
> > > > > >
> > > > > > Instead of a fixed timeout, if we extend an unexpired deferred probe
> > > > > > timer on every successful driver registration, with the expectation 
> > > > > > more
> > > > > > modules would be loaded in the near future, then the default value 
> > > > > > of
> > > > > > driver_deferred_probe_timeout only needs to be as long as the worst 
> > > > > > case
> > > > > > time difference between two consecutive module loads.
> > > > > >
> > > > > > So let's implement that and set the default value to 10 seconds when
> > > > > > CONFIG_MODULES=y.
> > > > > >
> > > > > > Cc: Greg Kroah-Hartman 
> > > > > > Cc: "Rafael J. Wysocki" 
> > > > > > Cc: Rob Herring 
> > > > > > Cc: Linus Walleij 
> > > > > > Cc: Will Deacon 
> > > > > > Cc: Ulf Hansson 
> > > > > > Cc: Kevin Hilman 
> > > > > > Cc: Thierry Reding 
> > > > > > Cc: Mark Brown 
> > > > > > Cc: Pavel Machek 
> > > > > > Cc: Geert Uytterhoeven 
> > > > > > Cc: Yoshihiro Shimoda 
> > > > > > Cc: Paul Kocialkowski 
> > > > > > Cc: linux-g...@vger.kernel.org
> > > > > > Cc: linux...@vger.kernel.org
> > > > > > Cc: iommu@lists.linux-foundation.org
> > > > > > Signed-off-by: Saravana Kannan 
> > > > >
> > > > > I bisected a boot hang with ARCH=s390 defconfig in QEMU down to this
> > > > > change as commit 2b28a1a84a0e ("driver core: Extend deferred probe
> > > > > timeout on driver registration") in next-20220520 (bisect log below).
> > > > >
> > > > > $ make -skj"$(nproc)" ARCH=s390 CROSS_COMPILE=s390x-linux-gnu- 
> > > > > defconfig bzImage
> > > > >
> > > > > $ timeout --foreground 15m stdbuf -oL -eL \
> > > > > qemu-system-s390x \
> > > > > -initrd ... \
> > > > > -M s390-ccw-virtio \
> > > > > -display none \
> > > > > -kernel arch/s390/boot/bzImage \
> > > > > -m 512m \
> > > > > -nodefaults \
> > > > > -serial mon:stdio
> > > > > ...
> > > > > [2.077303] In-situ OAM (IOAM) with IPv6
> > > > > [2.077639] NET: Registered PF_PACKET protocol family
> > > > > [2.078063] bridge: filtering via arp/ip/ip6tables is no longer 
> > > > > available by default. Update your scripts to load br_netfilter if you 
> > > > > need this.
> > > > > [2.078795] Key type dns_resolver registered
> > > > > [2.079317] cio: Channel measurement facility initialized using 
> > > > > format extended (mode autodetected)
> > > > > [2.081494] Discipline DIAG cannot be used without z/VM
> > > > > [  260.626363] random: crng init done
> > > > > qemu-system-s390x: terminating on signal 15 from pid 3815762 (timeout)
> > > > >
> > > > > We have a simple rootfs available if necessary:
> > > > >
> > > > > https://github.com/ClangBuiltLinux/boot-utils/raw/bc0d17785eb67f1edd0ee0a134970a807895f741/images/s390/rootfs.cpio.zst
> > > > >
> > > > > If there is any other information I can provide, please let me know!
> > > >
> > > > Hmm... strange. Can you please try the following command line options
> > > > and tell me which of these has the issue and which don't?
> > >
> > > Sure thing!
> > >
> > > > 1) deferred_probe_timeout=0
> > >
> > > No issue.
> > >
> > > > 2) deferred_probe_timeout=1
> > > > 3) deferred_probe_timeout=300
> > >
> > > Both of these appear to hang in the same way, I let each sit for five
> > > minutes.
> >
> > Strange that a sufficiently large timeout isn't helping. Is it trying
> > to boot off a network mount? I'll continue looking into this next
> > week.
>
> I don't think so, it seems like doing that requires some extra flags

Re: [PATCH 2/4] dma-iommu: Add iommu_dma_opt_mapping_size()

2022-05-23 Thread John Garry via iommu

On 21/05/2022 00:33, Damien Le Moal wrote:

Hi Damien,


+unsigned long iova_rcache_range(void)

Why not a size_t return type ?


The IOVA code generally uses unsigned long for size/range while 
dam-iommu uses size_t as appropiate, so I'm just sticking to that.





+{
+   return PAGE_SIZE << (IOVA_RANGE_CACHE_MAX_SIZE - 1);
+}
+


Thanks,
John
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v4 2/6] iommu: Add a helper to do PASID lookup from domain

2022-05-23 Thread Tian, Kevin
> From: Jacob Pan 
> Sent: Thursday, May 19, 2022 2:21 AM
> 
> IOMMU group maintains a PASID array which stores the associated IOMMU
> domains. This patch introduces a helper function to do domain to PASID
> look up. It will be used by TLB flush and device-PASID attach verification.
> 
> Signed-off-by: Jacob Pan 
> ---
>  drivers/iommu/iommu.c | 22 ++
>  include/linux/iommu.h |  6 +-
>  2 files changed, 27 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 00d0262a1fe9..22f44833db64 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -3199,3 +3199,25 @@ struct iommu_domain
> *iommu_get_domain_for_iopf(struct device *dev,
> 
>   return domain;
>  }
> +
> +ioasid_t iommu_get_pasid_from_domain(struct device *dev, struct
> iommu_domain *domain)
> +{
> + struct iommu_domain *tdomain;
> + struct iommu_group *group;
> + unsigned long index;
> + ioasid_t pasid = INVALID_IOASID;
> +
> + group = iommu_group_get(dev);
> + if (!group)
> + return pasid;
> +
> + xa_for_each(>pasid_array, index, tdomain) {
> + if (domain == tdomain) {
> + pasid = index;
> + break;
> + }
> + }

Don't we need to acquire the group lock here?

Btw the intention of this function is a bit confusing. Patch01 already
stores the pasid under domain hence it's redundant to get it 
indirectly from xarray index. You could simply introduce a flag bit
(e.g. dma_pasid_enabled) in device_domain_info and then directly
use domain->dma_pasid once the flag is true.

> + iommu_group_put(group);
> +
> + return pasid;
> +}
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 36ad007084cc..c0440a4be699 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -694,7 +694,7 @@ void iommu_detach_device_pasid(struct
> iommu_domain *domain,
>  struct device *dev, ioasid_t pasid);
>  struct iommu_domain *
>  iommu_get_domain_for_iopf(struct device *dev, ioasid_t pasid);
> -
> +ioasid_t iommu_get_pasid_from_domain(struct device *dev, struct
> iommu_domain *domain);
>  #else /* CONFIG_IOMMU_API */
> 
>  struct iommu_ops {};
> @@ -1070,6 +1070,10 @@ iommu_get_domain_for_iopf(struct device *dev,
> ioasid_t pasid)
>  {
>   return NULL;
>  }
> +static ioasid_t iommu_get_pasid_from_domain(struct device *dev, struct
> iommu_domain *domain)
> +{
> + return INVALID_IOASID;
> +}
>  #endif /* CONFIG_IOMMU_API */
> 
>  #ifdef CONFIG_IOMMU_SVA
> --
> 2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v4 4/6] iommu: Add PASID support for DMA mapping API users

2022-05-23 Thread Tian, Kevin
> From: Jacob Pan 
> Sent: Thursday, May 19, 2022 2:21 AM
> 
> DMA mapping API is the de facto standard for in-kernel DMA. It operates
> on a per device/RID basis which is not PASID-aware.
> 
> Some modern devices such as Intel Data Streaming Accelerator, PASID is
> required for certain work submissions. To allow such devices use DMA
> mapping API, we need the following functionalities:
> 1. Provide device a way to retrieve a PASID for work submission within
> the kernel
> 2. Enable the kernel PASID on the IOMMU for the device
> 3. Attach the kernel PASID to the device's default DMA domain, let it
> be IOVA or physical address in case of pass-through.
> 
> This patch introduces a driver facing API that enables DMA API
> PASID usage. Once enabled, device drivers can continue to use DMA APIs as
> is. There is no difference in dma_handle between without PASID and with
> PASID.
> 
> Signed-off-by: Jacob Pan 
> ---
>  drivers/iommu/dma-iommu.c | 114
> ++
>  include/linux/dma-iommu.h |   3 +
>  2 files changed, 117 insertions(+)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 1ca85d37eeab..6ad7ba619ef0 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -34,6 +34,8 @@ struct iommu_dma_msi_page {
>   phys_addr_t phys;
>  };
> 
> +static DECLARE_IOASID_SET(iommu_dma_pasid);
> +
>  enum iommu_dma_cookie_type {
>   IOMMU_DMA_IOVA_COOKIE,
>   IOMMU_DMA_MSI_COOKIE,
> @@ -370,6 +372,118 @@ void iommu_put_dma_cookie(struct
> iommu_domain *domain)
>   domain->iova_cookie = NULL;
>  }
> 
> +/* Protect iommu_domain DMA PASID data */
> +static DEFINE_MUTEX(dma_pasid_lock);
> +/**
> + * iommu_attach_dma_pasid --Attach a PASID for in-kernel DMA. Use the
> device's
> + * DMA domain.
> + * @dev: Device to be enabled
> + * @pasid: The returned kernel PASID to be used for DMA
> + *
> + * DMA request with PASID will be mapped the same way as the legacy DMA.
> + * If the device is in pass-through, PASID will also pass-through. If the
> + * device is in IOVA, the PASID will point to the same IOVA page table.
> + *
> + * @return err code or 0 on success
> + */
> +int iommu_attach_dma_pasid(struct device *dev, ioasid_t *pasid)

iommu_attach_dma_domain_pasid? 'dma_pasid' is too broad from
a API p.o.v.

> +{
> + struct iommu_domain *dom;
> + ioasid_t id, max;
> + int ret = 0;
> +
> + dom = iommu_get_domain_for_dev(dev);
> + if (!dom || !dom->ops || !dom->ops->attach_dev_pasid)
> + return -ENODEV;
> +
> + /* Only support domain types that DMA API can be used */
> + if (dom->type == IOMMU_DOMAIN_UNMANAGED ||
> + dom->type == IOMMU_DOMAIN_BLOCKED) {
> + dev_warn(dev, "Invalid domain type %d", dom->type);
> + return -EPERM;
> + }

WARN_ON.

and probably we can just check whether domain is default domain here.

> +
> + mutex_lock(_pasid_lock);
> + id = dom->dma_pasid;
> + if (!id) {
> + /*
> +  * First device to use PASID in its DMA domain, allocate
> +  * a single PASID per DMA domain is all we need, it is also
> +  * good for performance when it comes down to IOTLB flush.
> +  */
> + max = 1U << dev->iommu->pasid_bits;
> + if (!max) {
> + ret = -EINVAL;
> + goto done_unlock;
> + }
> +
> + id = ioasid_alloc(_dma_pasid, 1, max, dev);
> + if (id == INVALID_IOASID) {
> + ret = -ENOMEM;
> + goto done_unlock;
> + }
> +
> + dom->dma_pasid = id;
> + atomic_set(>dma_pasid_users, 1);

this is always accessed with lock held hence no need to be atomic.

> + }
> +
> + ret = iommu_attach_device_pasid(dom, dev, id);
> + if (!ret) {
> + *pasid = id;
> + atomic_inc(>dma_pasid_users);
> + goto done_unlock;
> + }
> +
> + if (atomic_dec_and_test(>dma_pasid_users)) {
> + ioasid_free(id);
> + dom->dma_pasid = 0;
> + }
> +done_unlock:
> + mutex_unlock(_pasid_lock);
> + return ret;
> +}
> +EXPORT_SYMBOL(iommu_attach_dma_pasid);
> +
> +/**
> + * iommu_detach_dma_pasid --Disable in-kernel DMA request with PASID
> + * @dev: Device's PASID DMA to be disabled
> + *
> + * It is the device driver's responsibility to ensure no more incoming DMA
> + * requests with the kernel PASID before calling this function. IOMMU driver
> + * ensures PASID cache, IOTLBs related to the kernel PASID are cleared and
> + * drained.
> + *
> + */
> +void iommu_detach_dma_pasid(struct device *dev)
> +{
> + struct iommu_domain *dom;
> + ioasid_t pasid;
> +
> + dom = iommu_get_domain_for_dev(dev);
> + if (WARN_ON(!dom || !dom->ops || !dom->ops->detach_dev_pasid))
> + return;
> +
> + /* Only support DMA API managed domain type 

Re: [PATCH 0/2] dma-mapping, remoteproc: Fix dma_mem leak after rproc_shutdown

2022-05-23 Thread Mark-PK Tsai via iommu
> Sigh.  In theory drivers should never declare coherent memory like
> this, and there has been some work to fix remoteproc in that regard.
>
> But I guess until that is merged we'll need somthing like this fix.

Hi,

Thanks for your comment.
As I didn't see other fix of this issue, should we use this patch
before the remoteproc work you mentioned is merged?

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH] iommu/ipmmu-vmsa: Avoid leak OF node on error

2022-05-23 Thread cgel . zte
From: Minghao Chi 

The OF node should be put before returning error in ipmmu_init(),
otherwise node's refcount will be leaked.

Reported-by: Zeal Robot 
Signed-off-by: Minghao Chi 
---
 drivers/iommu/ipmmu-vmsa.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
index 8fdb84b3642b..f6440b106f46 100644
--- a/drivers/iommu/ipmmu-vmsa.c
+++ b/drivers/iommu/ipmmu-vmsa.c
@@ -1179,11 +1179,10 @@ static int __init ipmmu_init(void)
return 0;
 
np = of_find_matching_node(NULL, ipmmu_of_ids);
+   of_node_put(np);
if (!np)
return 0;
 
-   of_node_put(np);
-
ret = platform_driver_register(_driver);
if (ret < 0)
return ret;
-- 
2.25.1


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/ipmmu-vmsa: Avoid leak OF node on error

2022-05-23 Thread Johan Hovold
On Mon, May 23, 2022 at 11:11:45AM +, cgel@gmail.com wrote:
> From: Minghao Chi 
> 
> The OF node should be put before returning error in ipmmu_init(),
> otherwise node's refcount will be leaked.
> 
> Reported-by: Zeal Robot 
> Signed-off-by: Minghao Chi 
> ---
>  drivers/iommu/ipmmu-vmsa.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
> index 8fdb84b3642b..f6440b106f46 100644
> --- a/drivers/iommu/ipmmu-vmsa.c
> +++ b/drivers/iommu/ipmmu-vmsa.c
> @@ -1179,11 +1179,10 @@ static int __init ipmmu_init(void)
>   return 0;
>  
>   np = of_find_matching_node(NULL, ipmmu_of_ids);
> + of_node_put(np);
>   if (!np)
>   return 0;
>  
> - of_node_put(np);
> -
>   ret = platform_driver_register(_driver);
>   if (ret < 0)
>   return ret;

NAK
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 0/4] DMA mapping changes for SCSI core

2022-05-23 Thread John Garry via iommu

On 22/05/2022 23:22, Damien Le Moal wrote:

On 2022/05/22 22:13, Christoph Hellwig wrote:

The whole series looks fine to me.  I'll happily queue it up in the
dma-mapping tree if the SCSI and ATA maintainers are ok with that.



Fine with me. I sent an acked-by for the libata bit.



Thanks, I'm going to have to post a v2 and I figure that with the timing 
that I'll have to wait for v5.20 now.


Cheers,
John
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 3/4] scsi: core: Cap shost max_sectors according to DMA optimum mapping limits

2022-05-23 Thread John Garry via iommu

On 21/05/2022 00:30, Damien Le Moal wrote:

diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index f69b77cbf538..a3ae6345473b 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -225,6 +225,11 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct 
device *dev,
shost->cmd_per_lun = min_t(int, shost->cmd_per_lun,
   shost->can_queue);
  


Hi Damien,


+   if (dma_dev->dma_mask) {
+   shost->max_sectors = min_t(unsigned int, shost->max_sectors,
+   dma_opt_mapping_size(dma_dev) >> SECTOR_SHIFT);
+   }

Nit: you could drop the curly brackets here.


Some people prefer this style - multi-line statements have curly 
brackets, while single-line statements conform to the official coding 
style (and don't use brackets).


I'll just stick with what we have unless there is a consensus to change.

Thanks,
John




+
error = scsi_init_sense_cache(shost);
if (error)
goto fail;


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 2/4] dma-iommu: Add iommu_dma_opt_mapping_size()

2022-05-23 Thread Damien Le Moal via iommu
On 2022/05/20 17:23, John Garry wrote:
> Add the IOMMU callback for DMA mapping API dma_opt_mapping_size(), which
> allows the drivers to know the optimal mapping limit and thus limit the
> requested IOVA lengths.
> 
> This value is based on the IOVA rcache range limit, as IOVAs allocated
> above this limit must always be newly allocated, which may be quite slow.
> 
> Signed-off-by: John Garry 
> ---
>  drivers/iommu/dma-iommu.c | 6 ++
>  drivers/iommu/iova.c  | 5 +
>  include/linux/iova.h  | 2 ++
>  3 files changed, 13 insertions(+)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 09f6e1c0f9c0..f619e41b9172 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -1442,6 +1442,11 @@ static unsigned long 
> iommu_dma_get_merge_boundary(struct device *dev)
>   return (1UL << __ffs(domain->pgsize_bitmap)) - 1;
>  }
>  
> +static size_t iommu_dma_opt_mapping_size(void)
> +{
> + return iova_rcache_range();
> +}
> +
>  static const struct dma_map_ops iommu_dma_ops = {
>   .alloc  = iommu_dma_alloc,
>   .free   = iommu_dma_free,
> @@ -1462,6 +1467,7 @@ static const struct dma_map_ops iommu_dma_ops = {
>   .map_resource   = iommu_dma_map_resource,
>   .unmap_resource = iommu_dma_unmap_resource,
>   .get_merge_boundary = iommu_dma_get_merge_boundary,
> + .opt_mapping_size   = iommu_dma_opt_mapping_size,
>  };
>  
>  /*
> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
> index db77aa675145..9f00b58d546e 100644
> --- a/drivers/iommu/iova.c
> +++ b/drivers/iommu/iova.c
> @@ -26,6 +26,11 @@ static unsigned long iova_rcache_get(struct iova_domain 
> *iovad,
>  static void free_cpu_cached_iovas(unsigned int cpu, struct iova_domain 
> *iovad);
>  static void free_iova_rcaches(struct iova_domain *iovad);
>  
> +unsigned long iova_rcache_range(void)
> +{
> + return PAGE_SIZE << (IOVA_RANGE_CACHE_MAX_SIZE - 1);
> +}
> +
>  static int iova_cpuhp_dead(unsigned int cpu, struct hlist_node *node)
>  {
>   struct iova_domain *iovad;
> diff --git a/include/linux/iova.h b/include/linux/iova.h
> index 320a70e40233..c6ba6d95d79c 100644
> --- a/include/linux/iova.h
> +++ b/include/linux/iova.h
> @@ -79,6 +79,8 @@ static inline unsigned long iova_pfn(struct iova_domain 
> *iovad, dma_addr_t iova)
>  int iova_cache_get(void);
>  void iova_cache_put(void);
>  
> +unsigned long iova_rcache_range(void);
> +
>  void free_iova(struct iova_domain *iovad, unsigned long pfn);
>  void __free_iova(struct iova_domain *iovad, struct iova *iova);
>  struct iova *alloc_iova(struct iova_domain *iovad, unsigned long size,

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 3/4] scsi: core: Cap shost max_sectors according to DMA optimum mapping limits

2022-05-23 Thread Damien Le Moal via iommu
On 2022/05/23 15:53, John Garry wrote:
> On 21/05/2022 00:30, Damien Le Moal wrote:
>>> diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
>>> index f69b77cbf538..a3ae6345473b 100644
>>> --- a/drivers/scsi/hosts.c
>>> +++ b/drivers/scsi/hosts.c
>>> @@ -225,6 +225,11 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, 
>>> struct device *dev,
>>> shost->cmd_per_lun = min_t(int, shost->cmd_per_lun,
>>>shost->can_queue);
>>>   
> 
> Hi Damien,
> 
>>> +   if (dma_dev->dma_mask) {
>>> +   shost->max_sectors = min_t(unsigned int, shost->max_sectors,
>>> +   dma_opt_mapping_size(dma_dev) >> SECTOR_SHIFT);
>>> +   }
>> Nit: you could drop the curly brackets here.
> 
> Some people prefer this style - multi-line statements have curly 
> brackets, while single-line statements conform to the official coding 
> style (and don't use brackets).

OK.

> 
> I'll just stick with what we have unless there is a consensus to change.
> 
> Thanks,
> John
> 
>>
>>> +
>>> error = scsi_init_sense_cache(shost);
>>> if (error)
>>> goto fail;
> 


-- 
Damien Le Moal
Western Digital Research
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v4 2/6] iommu: Add a helper to do PASID lookup from domain

2022-05-23 Thread Tian, Kevin
> From: Tian, Kevin
> Sent: Monday, May 23, 2022 3:55 PM
> 
> > From: Jacob Pan 
> > +ioasid_t iommu_get_pasid_from_domain(struct device *dev, struct
> > iommu_domain *domain)
> > +{
> > +   struct iommu_domain *tdomain;
> > +   struct iommu_group *group;
> > +   unsigned long index;
> > +   ioasid_t pasid = INVALID_IOASID;
> > +
> > +   group = iommu_group_get(dev);
> > +   if (!group)
> > +   return pasid;
> > +
> > +   xa_for_each(>pasid_array, index, tdomain) {
> > +   if (domain == tdomain) {
> > +   pasid = index;
> > +   break;
> > +   }
> > +   }
> 
> Don't we need to acquire the group lock here?
> 
> Btw the intention of this function is a bit confusing. Patch01 already
> stores the pasid under domain hence it's redundant to get it
> indirectly from xarray index. You could simply introduce a flag bit
> (e.g. dma_pasid_enabled) in device_domain_info and then directly
> use domain->dma_pasid once the flag is true.
> 

Just saw your discussion with Jason about v3. While it makes sense
to not specialize DMA domain in iommu driver, the use of this function
should only be that when the call chain doesn't pass down a pasid
value e.g. when doing cache invalidation for domain map/unmap. If
the upper interface already carries a pasid e.g. in detach_dev_pasid()
iommu driver can simply verify that the corresponding pasid xarray 
entry points to the specified domain instead of using this function to
loop xarray and then verify the returned pasid (as done in patch03/04).

Thanks
Kevin
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 3/4] scsi: core: Cap shost max_sectors according to DMA optimum mapping limits

2022-05-23 Thread Dan Carpenter
Hi John,

url:
https://github.com/intel-lab-lkp/linux/commits/John-Garry/DMA-mapping-changes-for-SCSI-core/20220520-163049
base:   https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git next
config: s390-randconfig-m031-20220519 
(https://download.01.org/0day-ci/archive/20220521/202205210545.gks834ds-...@intel.com/config)
compiler: s390-linux-gcc (GCC) 11.3.0

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot 
Reported-by: Dan Carpenter 

smatch warnings:
drivers/scsi/hosts.c:243 scsi_add_host_with_dma() warn: variable dereferenced 
before check 'dma_dev' (see line 228)

vim +/dma_dev +243 drivers/scsi/hosts.c

d139b9bd0e52dd James Bottomley   2009-11-05  209  int 
scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev,
d139b9bd0e52dd James Bottomley   2009-11-05  210   
struct device *dma_dev)
^1da177e4c3f41 Linus Torvalds2005-04-16  211  {
^1da177e4c3f41 Linus Torvalds2005-04-16  212struct 
scsi_host_template *sht = shost->hostt;
^1da177e4c3f41 Linus Torvalds2005-04-16  213int error = -EINVAL;
^1da177e4c3f41 Linus Torvalds2005-04-16  214  
91921e016a2199 Hannes Reinecke   2014-06-25  215shost_printk(KERN_INFO, 
shost, "%s\n",
^1da177e4c3f41 Linus Torvalds2005-04-16  216
sht->info ? sht->info(shost) : sht->name);
^1da177e4c3f41 Linus Torvalds2005-04-16  217  
^1da177e4c3f41 Linus Torvalds2005-04-16  218if (!shost->can_queue) {
91921e016a2199 Hannes Reinecke   2014-06-25  219
shost_printk(KERN_ERR, shost,
91921e016a2199 Hannes Reinecke   2014-06-25  220 
"can_queue = 0 no longer supported\n");
542bd1377a9630 James Bottomley   2008-04-21  221goto fail;
^1da177e4c3f41 Linus Torvalds2005-04-16  222}
^1da177e4c3f41 Linus Torvalds2005-04-16  223  
50b6cb3516365c Dexuan Cui2021-10-07  224/* Use min_t(int, ...) 
in case shost->can_queue exceeds SHRT_MAX */
50b6cb3516365c Dexuan Cui2021-10-07  225shost->cmd_per_lun = 
min_t(int, shost->cmd_per_lun,
ea2f0f77538c50 John Garry2021-05-19  226
   shost->can_queue);
ea2f0f77538c50 John Garry2021-05-19  227  
2ad7ba6ca08593 John Garry2022-05-20 @228if (dma_dev->dma_mask) {
^
The patch adds a new unchecked dereference

2ad7ba6ca08593 John Garry2022-05-20  229
shost->max_sectors = min_t(unsigned int, shost->max_sectors,
2ad7ba6ca08593 John Garry2022-05-20  230
dma_opt_mapping_size(dma_dev) >> SECTOR_SHIFT);
2ad7ba6ca08593 John Garry2022-05-20  231}
2ad7ba6ca08593 John Garry2022-05-20  232  
0a6ac4ee7c2109 Christoph Hellwig 2017-01-03  233error = 
scsi_init_sense_cache(shost);
0a6ac4ee7c2109 Christoph Hellwig 2017-01-03  234if (error)
0a6ac4ee7c2109 Christoph Hellwig 2017-01-03  235goto fail;
0a6ac4ee7c2109 Christoph Hellwig 2017-01-03  236  
d285203cf647d7 Christoph Hellwig 2014-01-17  237error = 
scsi_mq_setup_tags(shost);
542bd1377a9630 James Bottomley   2008-04-21  238if (error)
542bd1377a9630 James Bottomley   2008-04-21  239goto fail;
d285203cf647d7 Christoph Hellwig 2014-01-17  240  
^1da177e4c3f41 Linus Torvalds2005-04-16  241if 
(!shost->shost_gendev.parent)
^1da177e4c3f41 Linus Torvalds2005-04-16  242
shost->shost_gendev.parent = dev ? dev : _bus;
3c8d9a957d0ae6 James Bottomley   2012-05-04 @243if (!dma_dev)

The old code checked for NULL

3c8d9a957d0ae6 James Bottomley   2012-05-04  244dma_dev = 
shost->shost_gendev.parent;
3c8d9a957d0ae6 James Bottomley   2012-05-04  245  
d139b9bd0e52dd James Bottomley   2009-11-05  246shost->dma_dev = 
dma_dev;
^1da177e4c3f41 Linus Torvalds2005-04-16  247  
5c6fab9d558470 Mika Westerberg   2016-02-18  248/*
5c6fab9d558470 Mika Westerberg   2016-02-18  249 * Increase usage count 
temporarily here so that calling
5c6fab9d558470 Mika Westerberg   2016-02-18  250 * 
scsi_autopm_put_host() will trigger runtime idle if there is
5c6fab9d558470 Mika Westerberg   2016-02-18  251 * nothing else 
preventing suspending the device.
5c6fab9d558470 Mika Westerberg   2016-02-18  252 */
5c6fab9d558470 Mika Westerberg   2016-02-18  253
pm_runtime_get_noresume(>shost_gendev);
bc4f24014de58f Alan Stern2010-06-17  254
pm_runtime_set_active(>shost_gendev);
bc4f24014de58f Alan Stern2010-06-17  255
pm_runtime_enable(>shost_gendev);
bc4f24014de58f Alan Stern2010-06-17  256
device_enable_async_suspend(>shost_gendev);
bc4f24014de58f Alan Stern

Re: [PATCH v7 03/10] iommu/sva: Add iommu_sva_domain support

2022-05-23 Thread Baolu Lu

On 2022/5/19 15:20, Lu Baolu wrote:

The iommu_sva_domain represents a hardware pagetable that the IOMMU
hardware could use for SVA translation. This adds some infrastructure
to support SVA domain in the iommu common layer. It includes:

- Add a new struct iommu_sva_domain and new IOMMU_DOMAIN_SVA domain
   type.
- Add a new domain ops pointer in iommu_ops. The IOMMU drivers that
   support SVA should provide the callbacks.
- Add helpers to allocate and free an SVA domain.
- Add helpers to set an SVA domain to a device and the reverse
   operation.

Some buses, like PCI, route packets without considering the PASID value.
Thus a DMA target address with PASID might be treated as P2P if the
address falls into the MMIO BAR of other devices in the group. To make
things simple, the attach/detach interfaces only apply to devices
belonging to the singleton groups, and the singleton is immutable in
fabric i.e. not affected by hotplug.

The iommu_set/block_device_pasid() can be used for other purposes,
such as kernel DMA with pasid, mediation device, etc. Hence, it is put
in the iommu.c.

Suggested-by: Jean-Philippe Brucker
Suggested-by: Jason Gunthorpe
Signed-off-by: Lu Baolu
---
  include/linux/iommu.h | 51 +
  drivers/iommu/iommu-sva-lib.h | 15 
  drivers/iommu/iommu-sva-lib.c | 48 +++
  drivers/iommu/iommu.c | 71 +++
  4 files changed, 185 insertions(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 0c358b7c583b..e8cf82d46ce1 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -64,6 +64,9 @@ struct iommu_domain_geometry {
  #define __IOMMU_DOMAIN_PT (1U << 2)  /* Domain is identity mapped   */
  #define __IOMMU_DOMAIN_DMA_FQ (1U << 3)  /* DMA-API uses flush queue*/
  
+#define __IOMMU_DOMAIN_SHARED	(1U << 4)  /* Page table shared from CPU  */

+#define __IOMMU_DOMAIN_HOST_VA (1U << 5)  /* Host CPU virtual address */
+
  /*
   * This are the possible domain-types
   *
@@ -86,6 +89,8 @@ struct iommu_domain_geometry {
  #define IOMMU_DOMAIN_DMA_FQ   (__IOMMU_DOMAIN_PAGING |\
 __IOMMU_DOMAIN_DMA_API |   \
 __IOMMU_DOMAIN_DMA_FQ)
+#define IOMMU_DOMAIN_SVA   (__IOMMU_DOMAIN_SHARED |\
+__IOMMU_DOMAIN_HOST_VA)
  
  struct iommu_domain {

unsigned type;
@@ -254,6 +259,7 @@ struct iommu_ops {
int (*def_domain_type)(struct device *dev);
  
  	const struct iommu_domain_ops *default_domain_ops;

+   const struct iommu_domain_ops *sva_domain_ops;


Per Joerg's comment in anther thread,

https://lore.kernel.org/linux-iommu/yodvj7ervpidw...@8bytes.org/

adding a sva_domain_ops here is not the right way to go.

If no objection, I will make the sva domain go through the
generic domain_alloc/free() callbacks in the next version.

Best regards,
baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 2/4] dma-iommu: Add iommu_dma_opt_mapping_size()

2022-05-23 Thread Damien Le Moal via iommu
On 2022/05/23 16:01, John Garry wrote:
> On 21/05/2022 00:33, Damien Le Moal wrote:
> 
> Hi Damien,
> 
>>> +unsigned long iova_rcache_range(void)
>> Why not a size_t return type ?
> 
> The IOVA code generally uses unsigned long for size/range while 
> dam-iommu uses size_t as appropiate, so I'm just sticking to that.

OK.

> 
>>
>>> +{
>>> +   return PAGE_SIZE << (IOVA_RANGE_CACHE_MAX_SIZE - 1);
>>> +}
>>> +
> 
> Thanks,
> John


-- 
Damien Le Moal
Western Digital Research
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 0/2] dma-mapping, remoteproc: Fix dma_mem leak after rproc_shutdown

2022-05-23 Thread Robin Murphy

On 2022-05-23 11:15, Mark-PK Tsai wrote:

Sigh.  In theory drivers should never declare coherent memory like
this, and there has been some work to fix remoteproc in that regard.

But I guess until that is merged we'll need somthing like this fix.


Hi,

Thanks for your comment.
As I didn't see other fix of this issue, should we use this patch
before the remoteproc work you mentioned is merged?


TBH I think it would be better "fixed" with a kmemleak_ignore() and a 
big comment, rather than adding API cruft for something that isn't a 
real problem. I'm quite sure that no real-world user is unbinding 
remoteproc drivers frequently enough that leaking a 48-byte allocation 
each time has any practical significance.


Thanks,
Robin.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 3/4] scsi: core: Cap shost max_sectors according to DMA optimum mapping limits

2022-05-23 Thread John Garry via iommu

On 23/05/2022 12:08, Dan Carpenter wrote:

Thanks for the report


50b6cb3516365c Dexuan Cui2021-10-07  224/* Use min_t(int, ...) in 
case shost->can_queue exceeds SHRT_MAX */
50b6cb3516365c Dexuan Cui2021-10-07  225shost->cmd_per_lun = 
min_t(int, shost->cmd_per_lun,
ea2f0f77538c50 John Garry2021-05-19  226   
shost->can_queue);
ea2f0f77538c50 John Garry2021-05-19  227
2ad7ba6ca08593 John Garry2022-05-20 @228if (dma_dev->dma_mask) {
 ^


I knew that we fixed up dma_dev to be non-NULL, but I thought it was 
earlier in this function...



The patch adds a new unchecked dereference

2ad7ba6ca08593 John Garry2022-05-20  229shost->max_sectors 
= min_t(unsigned int, shost->max_sectors,
2ad7ba6ca08593 John Garry2022-05-20  230
dma_opt_mapping_size(dma_dev) >> SECTOR_SHIFT);
2ad7ba6ca08593 John Garry2022-05-20  231}
2ad7ba6ca08593 John Garry2022-05-20  232
0a6ac4ee7c2109 Christoph Hellwig 2017-01-03  233error = 
scsi_init_sense_cache(shost);
0a6ac4ee7c2109 Christoph Hellwig 2017-01-03  234if (error)
0a6ac4ee7c2109 Christoph Hellwig 2017-01-03  235goto fail;
0a6ac4ee7c2109 Christoph Hellwig 2017-01-03  236
d285203cf647d7 Christoph Hellwig 2014-01-17  237error = 
scsi_mq_setup_tags(shost);
542bd1377a9630 James Bottomley   2008-04-21  238if (error)
542bd1377a9630 James Bottomley   2008-04-21  239goto fail;
d285203cf647d7 Christoph Hellwig 2014-01-17  240
^1da177e4c3f41 Linus Torvalds2005-04-16  241if 
(!shost->shost_gendev.parent)
^1da177e4c3f41 Linus Torvalds2005-04-16  242
shost->shost_gendev.parent = dev ? dev : _bus;
3c8d9a957d0ae6 James Bottomley   2012-05-04 @243if (!dma_dev)
 


Cheers,
John
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/arm-smmu-qcom: Add debug support for TLB sync timeouts

2022-05-23 Thread Sai Prakash Ranjan

On 5/23/2022 10:48 PM, Sai Prakash Ranjan wrote:

TLB sync timeouts can be due to various reasons such as TBU power down
or pending TCU/TBU invalidation/sync and so on. Debugging these often
require dumping of some implementation defined registers to know the
status of TBU/TCU operations and some of these registers are not
accessible in non-secure world such as from kernel and requires SMC
calls to read them in the secure world. So, add this debug support
to dump implementation defined registers for TLB sync timeout issues.

Signed-off-by: Sai Prakash Ranjan 
---
  drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 50 ++
  drivers/iommu/arm/arm-smmu/arm-smmu.c  |  2 +
  drivers/iommu/arm/arm-smmu/arm-smmu.h  |  4 ++
  3 files changed, 56 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index 7820711c4560..22e9a0085475 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -5,11 +5,19 @@
  
  #include 

  #include 
+#include 
  #include 
  #include 
  
  #include "arm-smmu.h"
  
+#define QCOM_DUMMY_VAL	-1

+
+/* Implementation Defined Register Space 0 registers */
+#define QCOM_SMMU_STATS_SYNC_INV_TBU_ACK   0x5dc
+#define QCOM_SMMU_TBU_PWR_STATUS   0x2204
+#define QCOM_SMMU_MMU2QSS_AND_SAFE_WAIT_CNTR   0x2670
+
  struct qcom_smmu {
struct arm_smmu_device smmu;
bool bypass_quirk;
@@ -22,6 +30,46 @@ static struct qcom_smmu *to_qcom_smmu(struct arm_smmu_device 
*smmu)
return container_of(smmu, struct qcom_smmu, smmu);
  }
  
+static void qcom_smmu_tlb_sync(struct arm_smmu_device *smmu, int page,

+   int sync, int status)
+{
+   u32 sync_inv_ack, sync_inv_progress, tbu_pwr_status;
+   unsigned int spin_cnt, delay;
+   u32 reg;
+   int ret;
+
+   arm_smmu_writel(smmu, page, sync, QCOM_DUMMY_VAL);
+   for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) {
+   for (spin_cnt = TLB_SPIN_COUNT; spin_cnt > 0; spin_cnt--) {
+   reg = arm_smmu_readl(smmu, page, status);
+   if (!(reg & ARM_SMMU_sTLBGSTATUS_GSACTIVE))
+   return;
+   cpu_relax();
+   }
+   udelay(delay);
+   }
+
+   sync_inv_ack = arm_smmu_readl(smmu, ARM_SMMU_IMPL_DEF0,
+ QCOM_SMMU_STATS_SYNC_INV_TBU_ACK);


Sorry, this doesn't work always, looks like on earlier chipsets this is a 
secure register and
reading it from non-secure world would probably blow. Also this register can be 
in other
implementation defined space for different chipsets. So I think we can use SCM 
call here
and have a device specific data based on already existing compatible for QCOM 
SoCs to
identify IMP_DEF space used.


+   ret = qcom_scm_io_readl(smmu->ioaddr + QCOM_SMMU_TBU_PWR_STATUS,
+   _pwr_status);
+   if (ret)
+   dev_err_ratelimited(smmu->dev,
+   "Failed to read TBU power status: %d\n", 
ret);
+
+   ret = qcom_scm_io_readl(smmu->ioaddr + 
QCOM_SMMU_MMU2QSS_AND_SAFE_WAIT_CNTR,
+   _inv_progress);
+   if (ret)
+   dev_err_ratelimited(smmu->dev,
+   "Failed to read SAFE WAIT counter: %d\n", 
ret);
+
+   dev_err_ratelimited(smmu->dev,
+   "TLB sync timed out -- SMMU may be deadlocked\n"
+   "TBU: sync_inv_ack %#x power_status %#x 
sync_inv_progress %#x\n",
+   sync_inv_ack, tbu_pwr_status, sync_inv_progress);
+}
+
  static void qcom_adreno_smmu_write_sctlr(struct arm_smmu_device *smmu, int 
idx,
u32 reg)
  {
@@ -374,6 +422,7 @@ static const struct arm_smmu_impl qcom_smmu_impl = {
.def_domain_type = qcom_smmu_def_domain_type,
.reset = qcom_smmu500_reset,
.write_s2cr = qcom_smmu_write_s2cr,
+   .tlb_sync = qcom_smmu_tlb_sync,
  };
  
  static const struct arm_smmu_impl qcom_adreno_smmu_impl = {

@@ -382,6 +431,7 @@ static const struct arm_smmu_impl qcom_adreno_smmu_impl = {
.reset = qcom_smmu500_reset,
.alloc_context_bank = qcom_adreno_smmu_alloc_context_bank,
.write_sctlr = qcom_adreno_smmu_write_sctlr,
+   .tlb_sync = qcom_smmu_tlb_sync,
  };
  
  static struct arm_smmu_device *qcom_smmu_create(struct arm_smmu_device *smmu,

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 2ed3594f384e..4c5b51109835 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -2099,6 +2099,8 @@ static int arm_smmu_device_probe(struct platform_device 
*pdev)
if (IS_ERR(smmu->base))
return PTR_ERR(smmu->base);
ioaddr = res->start;
+   smmu->ioaddr = ioaddr;
+
/*
  

Re: [PATCH 0/4] DMA mapping changes for SCSI core

2022-05-23 Thread Martin K. Petersen


Christoph,

> The whole series looks fine to me.  I'll happily queue it up in the
> dma-mapping tree if the SCSI and ATA maintainers are ok with that.

Works for me.

Reviewed-by: Martin K. Petersen 

-- 
Martin K. Petersen  Oracle Linux Engineering
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v1] driver core: Extend deferred probe timeout on driver registration

2022-05-23 Thread Nathan Chancellor
virtio \
> > > > > > -display none \
> > > > > > -kernel arch/s390/boot/bzImage \
> > > > > > -m 512m \
> > > > > > -nodefaults \
> > > > > > -serial mon:stdio
> > > > > > ...
> > > > > > [2.077303] In-situ OAM (IOAM) with IPv6
> > > > > > [2.077639] NET: Registered PF_PACKET protocol family
> > > > > > [2.078063] bridge: filtering via arp/ip/ip6tables is no longer 
> > > > > > available by default. Update your scripts to load br_netfilter if 
> > > > > > you need this.
> > > > > > [2.078795] Key type dns_resolver registered
> > > > > > [2.079317] cio: Channel measurement facility initialized using 
> > > > > > format extended (mode autodetected)
> > > > > > [2.081494] Discipline DIAG cannot be used without z/VM
> > > > > > [  260.626363] random: crng init done
> > > > > > qemu-system-s390x: terminating on signal 15 from pid 3815762 
> > > > > > (timeout)
> > > > > >
> > > > > > We have a simple rootfs available if necessary:
> > > > > >
> > > > > > https://github.com/ClangBuiltLinux/boot-utils/raw/bc0d17785eb67f1edd0ee0a134970a807895f741/images/s390/rootfs.cpio.zst
> > > > > >
> > > > > > If there is any other information I can provide, please let me know!
> > > > >
> > > > > Hmm... strange. Can you please try the following command line options
> > > > > and tell me which of these has the issue and which don't?
> > > >
> > > > Sure thing!
> > > >
> > > > > 1) deferred_probe_timeout=0
> > > >
> > > > No issue.
> > > >
> > > > > 2) deferred_probe_timeout=1
> > > > > 3) deferred_probe_timeout=300
> > > >
> > > > Both of these appear to hang in the same way, I let each sit for five
> > > > minutes.
> > >
> > > Strange that a sufficiently large timeout isn't helping. Is it trying
> > > to boot off a network mount? I'll continue looking into this next
> > > week.
> >
> > I don't think so, it seems like doing that requires some extra flags
> > that we do not have:
> >
> > https://wiki.qemu.org/Features/S390xNetworkBoot
> >
> > If you need any additional information or want something tested, please
> > let me know!
> 
> I'll try to get qemu going on my end, but I'm not too confident I'll
> be able to get to it in a timely fashion. So if you can help figure
> out where this boot process is hanging, that'd be very much
> appreciated.

Sure thing! Information included below, I am more than happy to continue
to test and debug as you need.

> Couple of suggestions for debugging:
> 
> Can you add a log to "wait_for_device_probe()" and see if that's
> getting called right before the boot process hangs? If it does, can
> you get a stacktrace (I just add a WARN_ON(1) when I need a stack
> trace)? It's unlikely this is the case because
> deferred_probe_timeout=1 still causes an issue for you, but I'd be
> good to rule out.

If I add a pr_info() call at the top of wait_for_device_probe(), I see
it right before the process hangs. Adding WARN_ON(1) right below that
reveals dasd_eckd_init() in drivers/s390/block/dasd_eckd.c calls
wait_for_device_probe():

[4.610397] [ cut here ]
[4.610520] WARNING: CPU: 0 PID: 1 at drivers/base/dd.c:742 
wait_for_device_probe+0x28/0x110
[4.611134] Modules linked in:
[4.611593] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 
5.18.0-next-20220523-dirty #1
[4.611830] Hardware name: QEMU 8561 QEMU (KVM/Linux)
[4.612017] Krnl PSW : 0704c0018000 00ce4b3c 
(wait_for_device_probe+0x2c/0x110)
[4.612258]R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 
RI:0 EA:3
[4.612387] Krnl GPRS: 8000f071 0027 000c 
017f91d8
[4.612457]f071 017f9218 01a655a0 
0006
[4.612521]0002 01965810 019d51a0 

[4.612585]02218000 0125bcc8 00ce4b38 
0380bc80
[4.614814] Krnl Code: 00ce4b2c: e3e0f0980024stg 
%r14,152(%r15)
[4.614814]00ce4b32: c0e594cbbrasl   
%r14,00cd74c8
[4.614814]   #00ce4b38: af00mc  0,0
[4.614814]   >00ce4b3c: c0100054d1falarl
%r1,0177ef30
[4.614814]00ce4b42: e3101012lt  
%r1,0(%r1)
[4.614814]00ce4b48: a784002dbrc 
8,00ce4ba2
[4.614814]00ce4b4c: d727f0a0f0a0xc  
160(40,%r15),160(%r15)
[4.614814]00ce4b52: 41b0f0a0la  
%r11,160(%r15)
[4.615698] Call Trace:
[4.616559]  [<00ce4b3c>] wait_for_device_probe+0x2c/0x110
[4.616744] ([<00ce4b38>] wait_for_device_probe+0x28/0x110)
[4.616841]  [<0196593e>] dasd_eckd_init+0x12e/0x178
[4.616913]  [<00100936>] do_one_initcall+0x46/0x1e8
[4.616983]  [<01920706>] do_initcalls+0x126/0x150
[4.617046]  [<0192095e>] kernel_init_freeable+0x1ae/0x1f0
[4.617110]  [<00ce85a6>] kernel_init+0x2e/0x168
[4.617171]  [<00103320>] __ret_from_fork+0x40/0x58
[4.617233]  [<00cf5eaa>] ret_from_fork+0xa/0x40
[4.617352] Last Breaking-Event-Address:
[4.617393]  [<00e0e098>] __s390_indirect_jump_r14+0x0/0xc
[4.617481] ---[ end trace  ]---

> Let's try to rule out if deferred_probe_extend_timeout() is causing
> some issues. So, without my patch, what happens if you set:
> deferred_probe_timeout=1
> deferred_probe_timeout=300

At commit 6ee60e9c9f2f ("MAINTAINERS: add Russ Weight as a firmware
loader maintainer"), both deferred_probe_timeout=1 and
deferred_probe_timeout=300 hang the boot.

> If deferred_probe_timeout=1 causes an issue even without my patch,
> then in addition, can you try commenting out the call to
> fw_devlink_drivers_done() inside deferred_probe_timeout_work_func()
> and try again?

Sure, that does not appear to make a difference with
deferred_probe_timeout=1.

Cheers,
Nathan
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v1] driver core: Extend deferred probe timeout on driver registration

2022-05-23 Thread Saravana Kannan via iommu
> > timeout on driver registration") in next-20220520 (bisect log 
> > > > > > > below).
> > > > > > >
> > > > > > > $ make -skj"$(nproc)" ARCH=s390 CROSS_COMPILE=s390x-linux-gnu- 
> > > > > > > defconfig bzImage
> > > > > > >
> > > > > > > $ timeout --foreground 15m stdbuf -oL -eL \
> > > > > > > qemu-system-s390x \
> > > > > > > -initrd ... \
> > > > > > > -M s390-ccw-virtio \
> > > > > > > -display none \
> > > > > > > -kernel arch/s390/boot/bzImage \
> > > > > > > -m 512m \
> > > > > > > -nodefaults \
> > > > > > > -serial mon:stdio
> > > > > > > ...
> > > > > > > [2.077303] In-situ OAM (IOAM) with IPv6
> > > > > > > [2.077639] NET: Registered PF_PACKET protocol family
> > > > > > > [2.078063] bridge: filtering via arp/ip/ip6tables is no 
> > > > > > > longer available by default. Update your scripts to load 
> > > > > > > br_netfilter if you need this.
> > > > > > > [2.078795] Key type dns_resolver registered
> > > > > > > [2.079317] cio: Channel measurement facility initialized 
> > > > > > > using format extended (mode autodetected)
> > > > > > > [2.081494] Discipline DIAG cannot be used without z/VM
> > > > > > > [  260.626363] random: crng init done
> > > > > > > qemu-system-s390x: terminating on signal 15 from pid 3815762 
> > > > > > > (timeout)
> > > > > > >
> > > > > > > We have a simple rootfs available if necessary:
> > > > > > >
> > > > > > > https://github.com/ClangBuiltLinux/boot-utils/raw/bc0d17785eb67f1edd0ee0a134970a807895f741/images/s390/rootfs.cpio.zst
> > > > > > >
> > > > > > > If there is any other information I can provide, please let me 
> > > > > > > know!
> > > > > >
> > > > > > Hmm... strange. Can you please try the following command line 
> > > > > > options
> > > > > > and tell me which of these has the issue and which don't?
> > > > >
> > > > > Sure thing!
> > > > >
> > > > > > 1) deferred_probe_timeout=0
> > > > >
> > > > > No issue.
> > > > >
> > > > > > 2) deferred_probe_timeout=1
> > > > > > 3) deferred_probe_timeout=300
> > > > >
> > > > > Both of these appear to hang in the same way, I let each sit for five
> > > > > minutes.
> > > >
> > > > Strange that a sufficiently large timeout isn't helping. Is it trying
> > > > to boot off a network mount? I'll continue looking into this next
> > > > week.
> > >
> > > I don't think so, it seems like doing that requires some extra flags
> > > that we do not have:
> > >
> > > https://wiki.qemu.org/Features/S390xNetworkBoot
> > >
> > > If you need any additional information or want something tested, please
> > > let me know!
> >
> > I'll try to get qemu going on my end, but I'm not too confident I'll
> > be able to get to it in a timely fashion. So if you can help figure
> > out where this boot process is hanging, that'd be very much
> > appreciated.
>
> Sure thing! Information included below, I am more than happy to continue
> to test and debug as you need.
>
> > Couple of suggestions for debugging:
> >
> > Can you add a log to "wait_for_device_probe()" and see if that's
> > getting called right before the boot process hangs? If it does, can
> > you get a stacktrace (I just add a WARN_ON(1) when I need a stack
> > trace)? It's unlikely this is the case because
> > deferred_probe_timeout=1 still causes an issue for you, but I'd be
> > good to rule out.
>
> If I add a pr_info() call at the top of wait_for_device_probe(), I see
> it right before the process hangs. Adding WARN_ON(1) right below that
> reveals dasd_eckd_init() in drivers/s390/block/dasd_eckd.c calls
> wait_for_device_probe():
>
> [4.610397] [ cut here ]
> [4.610520] WARNING: CPU: 0 PID: 1 at drivers/base/dd.c:742 
> wait_for_device_probe+0x28/0x110
> [4.611134] Modules linked in:
> [4.611593] CPU: