Re: [PATCH RFC 11/12] iommufd: vfio container FD ioctl compatibility
On Wed, May 11, 2022 at 03:15:22AM +, Tian, Kevin wrote:
> > From: Jason Gunthorpe
> > Sent: Wednesday, May 11, 2022 3:00 AM
> >
> > On Tue, May 10, 2022 at 05:12:04PM +1000, David Gibson wrote:
> > > Ok... here's a revised version of my proposal which I think addresses
> > > your concerns and simplifies things.
> > >
> > > - No new operations, but IOAS_MAP gets some new flags (and IOAS_COPY
> > >   will probably need matching changes)
> > >
> > > - By default the IOVA given to IOAS_MAP is a hint only, and the IOVA
> > >   is chosen by the kernel within the aperture(s). This is closer to
> > >   how mmap() operates, and DPDK and similar shouldn't care about
> > >   having specific IOVAs, even at the individual mapping level.
> > >
> > > - IOAS_MAP gets an IOMAP_FIXED flag, analogous to mmap()'s MAP_FIXED,
> > >   for when you really do want to control the IOVA (qemu, maybe some
> > >   special userspace driver cases)
> >
> > We already did both of these, the flag is called
> > IOMMU_IOAS_MAP_FIXED_IOVA - if it is not specified then the kernel will
> > select the IOVA internally.
> >
> > > - ATTACH will fail if the new device would shrink the aperture to
> > >   exclude any already established mappings (I assume this is already
> > >   the case)
> >
> > Yes
> >
> > > - IOAS_MAP gets an IOMAP_RESERVE flag, which operates a bit like a
> > >   PROT_NONE mmap(). It reserves that IOVA space, so other (non-FIXED)
> > >   MAPs won't use it, but doesn't actually put anything into the IO
> > >   pagetables.
> > >     - Like a regular mapping, ATTACHes that are incompatible with an
> > >       IOMAP_RESERVEd region will fail
> > >     - An IOMAP_RESERVEd area can be overmapped with an IOMAP_FIXED
> > >       mapping
> >
> > Yeah, this seems OK. I'm thinking a new API might make sense because
> > you don't really want mmap replacement semantics but a permanent
> > record of what IOVA must always be valid.
> >
> > IOMMU_IOAS_REQUIRE_IOVA perhaps, similar signature to
> > IOMMUFD_CMD_IOAS_IOVA_RANGES:
> >
> > struct iommu_ioas_require_iova {
> >         __u32 size;
> >         __u32 ioas_id;
> >         __u32 num_iovas;
> >         __u32 __reserved;
> >         struct iommu_required_iovas {
> >                 __aligned_u64 start;
> >                 __aligned_u64 last;
> >         } required_iovas[];
> > };
>
> As a permanent record do we want to enforce that once the required
> range list is set all FIXED and non-FIXED allocations must be within the
> list of ranges?

No, I don't think so. In fact, the way I was envisaging this, non-FIXED
mappings will *never* go into the reserved ranges. This is for the benefit
of any use cases that need both mappings where they don't care about the
IOVA and those which do.

Essentially, reserving a region here is saying to the kernel "I want to
manage this IOVA space; make sure nothing else touches it". That means
both that the kernel must disallow any hw associated changes (like ATTACH)
which would impinge on the reserved region, and also any IOVA allocations
that would take parts away from that space.

Whether we want to restrict FIXED mappings to the reserved regions is an
interesting question. I wasn't thinking that would be necessary (just as
you can use mmap() MAP_FIXED anywhere). However, much as MAP_FIXED is
very dangerous to use if you don't previously reserve address space, I
think IOMAP_FIXED is dangerous if you haven't previously reserved space.
So maybe it would make sense to only allow FIXED mappings within reserved
regions. Strictly dividing the IOVA space into kernel-managed and
user-managed regions does make a certain amount of sense.

> If yes we can take the end of the last range as the max size of the iova
> address space to optimize the page table layout.
>
> otherwise we may need another dedicated hint for that optimization.

Right. With the revised model where reserving windows is optional, not
required, I don't think we can quite re-use this for optimization hints.
Which is a bit unfortunate. I can't immediately see a way to tweak this
which handles both more neatly, but I like the idea if we can figure out
a way.

> > > So, for DPDK the sequence would be:
> > >
> > > 1. Create IOAS
> > > 2. ATTACH devices
> > > 3. IOAS_MAP some stuff
> > > 4. Do DMA with the IOVAs that IOAS_MAP returned
> > >
> > > (Note, not even any need for QUERY in simple cases)
> >
> > Yes, this is done already
> >
> > > For (unoptimized) qemu it would be:
> > >
> > > 1. Create IOAS
> > > 2. IOAS_MAP(IOMAP_FIXED|IOMAP_RESERVE) the valid IOVA regions of the
> > >    guest platform
> > > 3. ATTACH devices (this will fail if they're not compatible with the
> > >    reserved IOVA regions)
> > > 4. Boot the guest
>
> I suppose above is only the sample flow for PPC vIOMMU. For non-PPC
> vIOMMUs regular mappings are required before booting the guest and
> reservation might be done but not mandatory (at least not what current
> Qemu vfio can afford as it simply rep
Re: [PATCH] iommu/vt-d: Try info->iommu in device_to_iommu()
On Fri, May 13, 2022 at 11:32:11AM +0800, Baolu Lu wrote:
> External email: Use caution opening links or attachments
>
> On 2022/5/13 08:32, Nicolin Chen wrote:
> > Local boot test and VFIO sanity test show that info->iommu can be
> > used in device_to_iommu() as a fast path. So this patch adds it.
> >
> > Signed-off-by: Nicolin Chen
> > ---
> >  drivers/iommu/intel/iommu.c | 5 +
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> > index 2990f80c5e08..412fca5ab9cd 100644
> > --- a/drivers/iommu/intel/iommu.c
> > +++ b/drivers/iommu/intel/iommu.c
> > @@ -777,6 +777,7 @@ static bool iommu_is_dummy(struct intel_iommu *iommu, struct device *dev)
> >  struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn)
> >  {
> >  	struct dmar_drhd_unit *drhd = NULL;
> > +	struct device_domain_info *info;
> >  	struct pci_dev *pdev = NULL;
> >  	struct intel_iommu *iommu;
> >  	struct device *tmp;
> > @@ -786,6 +787,10 @@ struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn)
> >  	if (!dev)
> >  		return NULL;
> >
> > +	info = dev_iommu_priv_get(dev);
> > +	if (info)
> > +		return info->iommu;
>
> device_to_iommu() also returns the device source id (@bus, @devfn).
>
> Perhaps, we can make device_to_iommu() only for probe_device() where the
> per-device info data is not initialized yet. After probe_device(), iommu
> and sid are retrieved through other helpers by looking up the device
> info directly?

That should work, I think. I was just not sure when the priv could be
unset. But it seems that you have cleaned up those places other than
probe/release_device() in the recent version :)

Nic
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu/vt-d: Try info->iommu in device_to_iommu()
On 2022/5/13 08:32, Nicolin Chen wrote:
> Local boot test and VFIO sanity test show that info->iommu can be
> used in device_to_iommu() as a fast path. So this patch adds it.
>
> Signed-off-by: Nicolin Chen
> ---
>  drivers/iommu/intel/iommu.c | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 2990f80c5e08..412fca5ab9cd 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -777,6 +777,7 @@ static bool iommu_is_dummy(struct intel_iommu *iommu, struct device *dev)
>  struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn)
>  {
>  	struct dmar_drhd_unit *drhd = NULL;
> +	struct device_domain_info *info;
>  	struct pci_dev *pdev = NULL;
>  	struct intel_iommu *iommu;
>  	struct device *tmp;
> @@ -786,6 +787,10 @@ struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn)
>  	if (!dev)
>  		return NULL;
>
> +	info = dev_iommu_priv_get(dev);
> +	if (info)
> +		return info->iommu;

device_to_iommu() also returns the device source id (@bus, @devfn).

Perhaps, we can make device_to_iommu() only for probe_device() where the
per-device info data is not initialized yet. After probe_device(), iommu
and sid are retrieved through other helpers by looking up the device
info directly?

Best regards,
baolu

> +
>  	if (dev_is_pci(dev)) {
>  		struct pci_dev *pf_pdev;
Re: [PATCH v3 1/2] iommu/io-pgtable-arm-v7s: Add a quirk to allow pgtable PA up to 35bit
Hi,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on arm-perf/for-next/perf]
[also build test WARNING on linus/master v5.18-rc6]
[cannot apply to joro-iommu/next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/yf-wang-mediatek-com/iommu-io-pgtable-arm-v7s-Add-a-quirk-to-allow-pgtable-PA-up-to-35bit/20220512-234603
base:   https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git for-next/perf
config: arm-qcom_defconfig (https://download.01.org/0day-ci/archive/20220513/202205131021.3gskebg2-...@intel.com/config)
compiler: arm-linux-gnueabi-gcc (GCC) 11.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/916a5fc41cbb8ddfe343193598f250d06b09e3fa
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review yf-wang-mediatek-com/iommu-io-pgtable-arm-v7s-Add-a-quirk-to-allow-pgtable-PA-up-to-35bit/20220512-234603
        git checkout 916a5fc41cbb8ddfe343193598f250d06b09e3fa
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.3.0 make.cross W=1 O=build_dir ARCH=arm SHELL=/bin/bash drivers/iommu/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot

All warnings (new ones prefixed by >>):

   In file included from include/linux/ratelimit_types.h:5,
                    from include/linux/ratelimit.h:5,
                    from include/linux/dev_printk.h:16,
                    from include/linux/device.h:15,
                    from include/linux/dma-mapping.h:7,
                    from drivers/iommu/io-pgtable-arm-v7s.c:25:
   drivers/iommu/io-pgtable-arm-v7s.c: In function 'arm_v7s_alloc_pgtable':
   include/linux/bits.h:35:29: warning: left shift count >= width of type [-Wshift-count-overflow]
      35 | (((~UL(0)) - (UL(1) << (l)) + 1) & \
   include/linux/bits.h:38:38: note: in expansion of macro '__GENMASK'
      38 | (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
   drivers/iommu/io-pgtable-arm-v7s.c:154:46: note: in expansion of macro 'GENMASK'
     154 | ((ttbr & ((u32)(~0U << 3))) | ((pa & GENMASK(34, 32)) >> 32))
   drivers/iommu/io-pgtable-arm-v7s.c:886:25: note: in expansion of macro 'ARM_V7S_TTBR_35BIT_PA'
     886 | ARM_V7S_TTBR_35BIT_PA(cfg->arm_v7s_cfg.ttbr, paddr);
   include/linux/bits.h:36:18: warning: right shift count is negative [-Wshift-count-negative]
      36 | (~UL(0) >> (BITS_PER_LONG - 1 - (h
   include/linux/bits.h:38:38: note: in expansion of macro '__GENMASK'
      38 | (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
   drivers/iommu/io-pgtable-arm-v7s.c:154:46: note: in expansion of macro 'GENMASK'
     154 | ((ttbr & ((u32)(~0U << 3))) | ((pa & GENMASK(34, 32)) >> 32))
   drivers/iommu/io-pgtable-arm-v7s.c:886:25: note: in expansion of macro 'ARM_V7S_TTBR_35BIT_PA'
     886 | ARM_V7S_TTBR_35BIT_PA(cfg->arm_v7s_cfg.ttbr, paddr);
>> drivers/iommu/io-pgtable-arm-v7s.c:154:63: warning: right shift count >= width of type [-Wshift-count-overflow]
     154 | ((ttbr & ((u32)(~0U << 3))) | ((pa & GENMASK(34, 32)) >> 32))
   drivers/iommu/io-pgtable-arm-v7s.c:886:25: note: in expansion of macro 'ARM_V7S_TTBR_35BIT_PA'
     886 | ARM_V7S_TTBR_35BIT_PA(cfg->arm_v7s_cfg.ttbr, paddr);

vim +154 drivers/iommu/io-pgtable-arm-v7s.c

   145
   146	#define ARM_V7S_TTBR_S			BIT(1)
   147	#define ARM_V7S_TTBR_NOS		BIT(5)
   148	#define ARM_V7S_TTBR_ORGN_ATTR(attr)	(((attr) & 0x3) << 3)
   149	#define ARM_V7S_TTBR_IRGN_ATTR(attr)					\
   150		((((attr) & 0x1) << 6) | (((attr) & 0x2) >> 1))
Re: [PATCH v3 1/2] iommu/io-pgtable-arm-v7s: Add a quirk to allow pgtable PA up to 35bit
Hi,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on arm-perf/for-next/perf]
[also build test WARNING on linus/master v5.18-rc6 next-20220512]
[cannot apply to joro-iommu/next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/yf-wang-mediatek-com/iommu-io-pgtable-arm-v7s-Add-a-quirk-to-allow-pgtable-PA-up-to-35bit/20220512-234603
base:   https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git for-next/perf
config: hexagon-allmodconfig (https://download.01.org/0day-ci/archive/20220513/202205131016.ati0kpnr-...@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 9519dacab7b8afd537811fc2abaceb4d14f4e16a)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/916a5fc41cbb8ddfe343193598f250d06b09e3fa
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review yf-wang-mediatek-com/iommu-io-pgtable-arm-v7s-Add-a-quirk-to-allow-pgtable-PA-up-to-35bit/20220512-234603
        git checkout 916a5fc41cbb8ddfe343193598f250d06b09e3fa
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=hexagon SHELL=/bin/bash drivers/iommu/ drivers/rtc/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot

All warnings (new ones prefixed by >>):

>> drivers/iommu/io-pgtable-arm-v7s.c:886:4: warning: shift count >= width of type [-Wshift-count-overflow]
           ARM_V7S_TTBR_35BIT_PA(cfg->arm_v7s_cfg.ttbr, paddr);
   drivers/iommu/io-pgtable-arm-v7s.c:154:39: note: expanded from macro 'ARM_V7S_TTBR_35BIT_PA'
           ((ttbr & ((u32)(~0U << 3))) | ((pa & GENMASK(34, 32)) >> 32))
   include/linux/bits.h:38:31: note: expanded from macro 'GENMASK'
           (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
   include/linux/bits.h:35:22: note: expanded from macro '__GENMASK'
           (((~UL(0)) - (UL(1) << (l)) + 1) & \
>> drivers/iommu/io-pgtable-arm-v7s.c:886:4: warning: shift count is negative [-Wshift-count-negative]
           ARM_V7S_TTBR_35BIT_PA(cfg->arm_v7s_cfg.ttbr, paddr);
   drivers/iommu/io-pgtable-arm-v7s.c:154:39: note: expanded from macro 'ARM_V7S_TTBR_35BIT_PA'
           ((ttbr & ((u32)(~0U << 3))) | ((pa & GENMASK(34, 32)) >> 32))
   include/linux/bits.h:38:31: note: expanded from macro 'GENMASK'
           (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
   include/linux/bits.h:36:11: note: expanded from macro '__GENMASK'
           (~UL(0) >> (BITS_PER_LONG - 1 - (h
>> drivers/iommu/io-pgtable-arm-v7s.c:886:4: warning: shift count >= width of type [-Wshift-count-overflow]
           ARM_V7S_TTBR_35BIT_PA(cfg->arm_v7s_cfg.ttbr, paddr);
   drivers/iommu/io-pgtable-arm-v7s.c:154:56: note: expanded from macro 'ARM_V7S_TTBR_35BIT_PA'
           ((ttbr & ((u32)(~0U << 3))) | ((pa & GENMASK(34, 32)) >> 32))

   3 warnings generated.

vim +886 drivers/iommu/io-pgtable-arm-v7s.c

   795
   796	static struct io_pgtable *arm_v7s_alloc_pgtable(struct io_pgtable_cfg *cfg,
   797						void *cookie)
   798	{
   799		slab_flags_t slab_flag = ARM_V7S_TABLE_SLAB_FLAGS;
   800		struct arm_v7s_io_pgtable *data;
   801		phys_addr_t paddr;
   802
   803		if (cfg->ias > (arm_v7s_is_mtk_enabled(cfg) ? 34 : ARM_V7S_ADDR_BITS))
   804			return NULL;
   805
   806		if (cfg->oas > (arm_v7s_is_mtk_enabled(cfg) ? 35 : ARM_V7S_ADDR_BITS))
   807			return NULL;
   808
   809		if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS |
   810
Re: [PATCH v2] iommu/vt-d: Make DMAR_UNITS_SUPPORTED a config setting
On 2022/5/13 07:12, Steve Wahl wrote:
> On Thu, May 12, 2022 at 10:13:09AM -0500, Steve Wahl wrote:
>> To support up to 64 sockets with 10 DMAR units each (640), make the
>> value of DMAR_UNITS_SUPPORTED adjustable by a config variable,
>> CONFIG_DMAR_UNITS_SUPPORTED, and make its default 1024 when MAXSMP is
>> set.
>>
>> If the available hardware exceeds DMAR_UNITS_SUPPORTED (previously set
>> to MAX_IO_APICS, or 128), it causes these messages: "DMAR: Failed to
>> allocate seq_id", "DMAR: Parse DMAR table failure.", and "x2apic: IRQ
>> remapping doesn't support X2APIC mode x2apic disabled"; and the system
>> fails to boot properly.
>>
>> Signed-off-by: Steve Wahl
>
> I've received a report from the kernel test robot that this patch
> causes an error (shown below) when CONFIG_IOMMU_SUPPORT is not set.
>
> In my opinion, this is because include/linux/dmar.h and
> include/linux/intel-iommu.h are being #included when they are not
> really being used.
>
> I've tried placing the contents of intel-iommu.h within an #ifdef
> CONFIG_INTEL_IOMMU, and that fixes the problem.
>
> Two questions:
>
> A) Is this the desired approach to the fix?

Most of include/linux/intel-iommu.h is private to the Intel IOMMU
driver. It should be put in a header like drivers/iommu/intel/iommu.h.
Eventually, we should remove include/linux/intel-iommu.h and have device
drivers interact with the iommu subsystem through the IOMMU kAPIs.

Best regards,
baolu

> B) Should it be a separate patch, or added onto this patch as a v3?
>
> Error message:
> --
> In file included from include/linux/intel-iommu.h:21,
>                  from arch/x86/kvm/x86.c:44:
> include/linux/dmar.h:21:33: error: 'CONFIG_DMAR_UNITS_SUPPORTED' undeclared here (not in a function); did you mean 'DMAR_UNITS_SUPPORTED'?
>       21 | #define DMAR_UNITS_SUPPORTED	CONFIG_DMAR_UNITS_SUPPORTED
> include/linux/intel-iommu.h:531:35: note: in expansion of macro 'DMAR_UNITS_SUPPORTED'
>      531 | unsigned int iommu_refcnt[DMAR_UNITS_SUPPORTED];
>
> vim +21 include/linux/dmar.h
>
>     20
>   > 21	#define DMAR_UNITS_SUPPORTED	CONFIG_DMAR_UNITS_SUPPORTED
>     22
>
> Initial stab at fixing it:
> --
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index 2f9891cb3d00..916fd7b5bcb5 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -10,6 +10,8 @@
>  #ifndef _INTEL_IOMMU_H_
>  #define _INTEL_IOMMU_H_
>
> +#ifdef CONFIG_INTEL_IOMMU
> +
>  #include
>  #include
>  #include
> @@ -831,4 +833,6 @@ static inline const char *decode_prq_descriptor(char *str, size_t size,
>  	return str;
>  }
>
> +#endif /* CONFIG_INTEL_IOMMU */
> +
>  #endif
>
> Thanks.
>
> --> Steve Wahl
Re: [PATCH 3/3] swiotlb: use the right nslabs-derived sizes in swiotlb_init_late
On Wed, 11 May 2022, Christoph Hellwig wrote:
> nslabs can shrink when allocations or the remap don't succeed, so make
> sure to use it for all sizing. For that remove the bytes value that
> can get stale and replace it with local calculations and a boolean to
> indicate if the originally requested size could not be allocated.
>
> Fixes: 6424e31b1c05 ("swiotlb: remove swiotlb_init_with_tbl and swiotlb_init_late_with_tbl")
> Signed-off-by: Christoph Hellwig

Reviewed-by: Stefano Stabellini

> ---
>  kernel/dma/swiotlb.c | 19 +++
>  1 file changed, 11 insertions(+), 8 deletions(-)
>
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 113e1e8aaca37..d6e62a6a42ceb 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -297,9 +297,9 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
>  {
>  	struct io_tlb_mem *mem = &io_tlb_default_mem;
>  	unsigned long nslabs = ALIGN(size >> IO_TLB_SHIFT, IO_TLB_SEGSIZE);
> -	unsigned long bytes;
>  	unsigned char *vstart = NULL;
>  	unsigned int order;
> +	bool retried = false;
>  	int rc = 0;
>
>  	if (swiotlb_force_disable)
> @@ -308,7 +308,6 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
>  retry:
>  	order = get_order(nslabs << IO_TLB_SHIFT);
>  	nslabs = SLABS_PER_PAGE << order;
> -	bytes = nslabs << IO_TLB_SHIFT;
>
>  	while ((SLABS_PER_PAGE << order) > IO_TLB_MIN_SLABS) {
>  		vstart = (void *)__get_free_pages(gfp_mask | __GFP_NOWARN,
> @@ -316,16 +315,13 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
>  		if (vstart)
>  			break;
>  		order--;
> +		nslabs = SLABS_PER_PAGE << order;
> +		retried = true;
>  	}
>
>  	if (!vstart)
>  		return -ENOMEM;
>
> -	if (order != get_order(bytes)) {
> -		pr_warn("only able to allocate %ld MB\n",
> -			(PAGE_SIZE << order) >> 20);
> -		nslabs = SLABS_PER_PAGE << order;
> -	}
>  	if (remap)
>  		rc = remap(vstart, nslabs);
>  	if (rc) {
> @@ -334,9 +330,15 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
>  		nslabs = ALIGN(nslabs >> 1, IO_TLB_SEGSIZE);
>  		if (nslabs < IO_TLB_MIN_SLABS)
>  			return rc;
> +		retried = true;
>  		goto retry;
>  	}
>
> +	if (retried) {
> +		pr_warn("only able to allocate %ld MB\n",
> +			(PAGE_SIZE << order) >> 20);
> +	}
> +
>  	mem->slots = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
>  		get_order(array_size(sizeof(*mem->slots), nslabs)));
>  	if (!mem->slots) {
> @@ -344,7 +346,8 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
>  		return -ENOMEM;
>  	}
>
> -	set_memory_decrypted((unsigned long)vstart, bytes >> PAGE_SHIFT);
> +	set_memory_decrypted((unsigned long)vstart,
> +			     (nslabs << IO_TLB_SHIFT) >> PAGE_SHIFT);
>  	swiotlb_init_io_tlb_mem(mem, virt_to_phys(vstart), nslabs, true);
>
>  	swiotlb_print_info();
> --
> 2.30.2
Re: [PATCH 2/3] swiotlb: use the right nslabs value in swiotlb_init_remap
On Wed, 11 May 2022, Christoph Hellwig wrote:
> default_nslabs should only be used to initialize nslabs, after that we
> need to use the local variable that can shrink when allocations or the
> remap don't succeed.
>
> Fixes: 6424e31b1c05 ("swiotlb: remove swiotlb_init_with_tbl and swiotlb_init_late_with_tbl")
> Signed-off-by: Christoph Hellwig

Reviewed-by: Stefano Stabellini

> ---
>  kernel/dma/swiotlb.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 3e992a308c8a1..113e1e8aaca37 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -234,7 +234,7 @@ void __init swiotlb_init_remap(bool addressing_limit, unsigned int flags,
>  {
>  	struct io_tlb_mem *mem = &io_tlb_default_mem;
>  	unsigned long nslabs = default_nslabs;
> -	size_t alloc_size = PAGE_ALIGN(array_size(sizeof(*mem->slots), nslabs));
> +	size_t alloc_size;
>  	size_t bytes;
>  	void *tlb;
>
> @@ -249,7 +249,7 @@ void __init swiotlb_init_remap(bool addressing_limit, unsigned int flags,
>  	 * memory encryption.
>  	 */
>  retry:
> -	bytes = PAGE_ALIGN(default_nslabs << IO_TLB_SHIFT);
> +	bytes = PAGE_ALIGN(nslabs << IO_TLB_SHIFT);
>  	if (flags & SWIOTLB_ANY)
>  		tlb = memblock_alloc(bytes, PAGE_SIZE);
>  	else
> @@ -269,12 +269,13 @@ void __init swiotlb_init_remap(bool addressing_limit, unsigned int flags,
>  		goto retry;
>  	}
>
> +	alloc_size = PAGE_ALIGN(array_size(sizeof(*mem->slots), nslabs));
>  	mem->slots = memblock_alloc(alloc_size, PAGE_SIZE);
>  	if (!mem->slots)
>  		panic("%s: Failed to allocate %zu bytes align=0x%lx\n",
>  		      __func__, alloc_size, PAGE_SIZE);
>
> -	swiotlb_init_io_tlb_mem(mem, __pa(tlb), default_nslabs, false);
> +	swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, false);
>  	mem->force_bounce = flags & SWIOTLB_FORCE;
>
>  	if (flags & SWIOTLB_VERBOSE)
> --
> 2.30.2
Re: [PATCH 1/3] swiotlb: don't panic when the swiotlb buffer can't be allocated
On Wed, 11 May 2022, Christoph Hellwig wrote:
> For historical reasons the swiotlb code panicked when the metadata could
> not be allocated, but just printed a warning when the actual main
> swiotlb buffer could not be allocated. Restore this somewhat unexpected
> behavior as changing it caused a boot failure on the Microchip RISC-V
> PolarFire SoC Icicle kit.
>
> Fixes: 6424e31b1c05 ("swiotlb: remove swiotlb_init_with_tbl and swiotlb_init_late_with_tbl")
> Reported-by: Conor Dooley
> Signed-off-by: Christoph Hellwig
> Tested-by: Conor Dooley

Reviewed-by: Stefano Stabellini

> ---
>  kernel/dma/swiotlb.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index e2ef0864eb1e5..3e992a308c8a1 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -254,8 +254,10 @@ void __init swiotlb_init_remap(bool addressing_limit, unsigned int flags,
>  		tlb = memblock_alloc(bytes, PAGE_SIZE);
>  	else
>  		tlb = memblock_alloc_low(bytes, PAGE_SIZE);
> -	if (!tlb)
> -		panic("%s: failed to allocate tlb structure\n", __func__);
> +	if (!tlb) {
> +		pr_warn("%s: failed to allocate tlb structure\n", __func__);
> +		return;
> +	}
>
>  	if (remap && remap(tlb, nslabs) < 0) {
>  		memblock_free(tlb, PAGE_ALIGN(bytes));
> --
> 2.30.2
[PATCH] iommu/vt-d: Try info->iommu in device_to_iommu()
Local boot test and VFIO sanity test show that info->iommu can be
used in device_to_iommu() as a fast path. So this patch adds it.

Signed-off-by: Nicolin Chen
---
 drivers/iommu/intel/iommu.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 2990f80c5e08..412fca5ab9cd 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -777,6 +777,7 @@ static bool iommu_is_dummy(struct intel_iommu *iommu, struct device *dev)
 struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn)
 {
 	struct dmar_drhd_unit *drhd = NULL;
+	struct device_domain_info *info;
 	struct pci_dev *pdev = NULL;
 	struct intel_iommu *iommu;
 	struct device *tmp;
@@ -786,6 +787,10 @@ struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn)
 	if (!dev)
 		return NULL;

+	info = dev_iommu_priv_get(dev);
+	if (info)
+		return info->iommu;
+
 	if (dev_is_pci(dev)) {
 		struct pci_dev *pf_pdev;
--
2.17.1
Re: [PATCH v2 01/26] dma-direct: take dma-ranges/offsets into account in resource mapping
On Mon, May 09, 2022 at 08:15:52AM +0200, Christoph Hellwig wrote:
> So I think the big problem pointed out by Robin is that existing DTs
> might not take this into account.

I'd say that the biggest problem isn't in the DT part but in the drivers
using the dma_map_resource() method, since they don't expect the
non-uniform DMA-direct mapping to be available.

> So I think we need to do a little
> research and at least Cc all maintainers and lists for relevant in-tree
> DTs for drivers that use dma_map_resource to discuss this.

Right. I'll send the next patchset revision out with all possibly
interested maintainers and lists in Cc of this patch.

-Sergey
Re: [PATCH v2] iommu/vt-d: Make DMAR_UNITS_SUPPORTED a config setting
On Thu, May 12, 2022 at 10:13:09AM -0500, Steve Wahl wrote:
> To support up to 64 sockets with 10 DMAR units each (640), make the
> value of DMAR_UNITS_SUPPORTED adjustable by a config variable,
> CONFIG_DMAR_UNITS_SUPPORTED, and make its default 1024 when MAXSMP is
> set.
>
> If the available hardware exceeds DMAR_UNITS_SUPPORTED (previously set
> to MAX_IO_APICS, or 128), it causes these messages: "DMAR: Failed to
> allocate seq_id", "DMAR: Parse DMAR table failure.", and "x2apic: IRQ
> remapping doesn't support X2APIC mode x2apic disabled"; and the system
> fails to boot properly.
>
> Signed-off-by: Steve Wahl

I've received a report from the kernel test robot that this patch causes
an error (shown below) when CONFIG_IOMMU_SUPPORT is not set.

In my opinion, this is because include/linux/dmar.h and
include/linux/intel-iommu.h are being #included when they are not really
being used.

I've tried placing the contents of intel-iommu.h within an #ifdef
CONFIG_INTEL_IOMMU, and that fixes the problem.

Two questions:

A) Is this the desired approach to the fix?

B) Should it be a separate patch, or added onto this patch as a v3?

Error message:
--
In file included from include/linux/intel-iommu.h:21,
                 from arch/x86/kvm/x86.c:44:
>> include/linux/dmar.h:21:33: error: 'CONFIG_DMAR_UNITS_SUPPORTED' undeclared here (not in a function); did you mean 'DMAR_UNITS_SUPPORTED'?
      21 | #define DMAR_UNITS_SUPPORTED	CONFIG_DMAR_UNITS_SUPPORTED
include/linux/intel-iommu.h:531:35: note: in expansion of macro 'DMAR_UNITS_SUPPORTED'
     531 | unsigned int iommu_refcnt[DMAR_UNITS_SUPPORTED];

vim +21 include/linux/dmar.h

    20
  > 21	#define DMAR_UNITS_SUPPORTED	CONFIG_DMAR_UNITS_SUPPORTED
    22

Initial stab at fixing it:
--
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 2f9891cb3d00..916fd7b5bcb5 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -10,6 +10,8 @@
 #ifndef _INTEL_IOMMU_H_
 #define _INTEL_IOMMU_H_

+#ifdef CONFIG_INTEL_IOMMU
+
 #include
 #include
 #include
@@ -831,4 +833,6 @@ static inline const char *decode_prq_descriptor(char *str, size_t size,
 	return str;
 }

+#endif /* CONFIG_INTEL_IOMMU */
+
 #endif

Thanks.

--> Steve Wahl

--
Steve Wahl, Hewlett Packard Enterprise
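The config knob described in the commit message would take roughly the following Kconfig shape. This is a sketch only: the symbol name and the MAXSMP default come from the commit message, while the dependency and help text are illustrative assumptions:

```
config DMAR_UNITS_SUPPORTED
	int "Number of DMA Remapping Units supported"
	depends on DMAR_TABLE
	default 1024 if MAXSMP
	default 128
	help
	  Maximum number of DMA Remapping Units (DRHDs) the kernel can
	  handle. Systems with more than 64 sockets (10 DMAR units each)
	  need more than the historical MAX_IO_APICS-derived value of 128.
```

With this in place, `#define DMAR_UNITS_SUPPORTED CONFIG_DMAR_UNITS_SUPPORTED` in dmar.h only works in translation units where the Kconfig symbol is defined, which is why the `#ifdef CONFIG_INTEL_IOMMU` guard above becomes necessary.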
[PATCH v5 5/5] iommu/tegra-smmu: Support managed domains
From: Navneet Kumar

Allow creating identity and DMA API compatible IOMMU domains. When
creating a DMA API compatible domain, make sure to also create the
required cookie.

Signed-off-by: Navneet Kumar
Signed-off-by: Thierry Reding
---
Changes in v5:
- remove DMA cookie initialization that's now no longer needed

 drivers/iommu/tegra-smmu.c | 37 -
 1 file changed, 20 insertions(+), 17 deletions(-)

diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
index 93879c40056c..f8b2b863c111 100644
--- a/drivers/iommu/tegra-smmu.c
+++ b/drivers/iommu/tegra-smmu.c
@@ -16,6 +16,7 @@
 #include
 #include
 #include
+#include
 #include
 #include

@@ -277,7 +278,9 @@ static struct iommu_domain *tegra_smmu_domain_alloc(unsigned type)
 {
 	struct tegra_smmu_as *as;

-	if (type != IOMMU_DOMAIN_UNMANAGED)
+	if (type != IOMMU_DOMAIN_UNMANAGED &&
+	    type != IOMMU_DOMAIN_DMA &&
+	    type != IOMMU_DOMAIN_IDENTITY)
 		return NULL;

 	as = kzalloc(sizeof(*as), GFP_KERNEL);
@@ -287,25 +290,16 @@ static struct iommu_domain *tegra_smmu_domain_alloc(unsigned type)
 	as->attr = SMMU_PD_READABLE | SMMU_PD_WRITABLE | SMMU_PD_NONSECURE;

 	as->pd = alloc_page(GFP_KERNEL | __GFP_DMA | __GFP_ZERO);
-	if (!as->pd) {
-		kfree(as);
-		return NULL;
-	}
+	if (!as->pd)
+		goto free_as;

 	as->count = kcalloc(SMMU_NUM_PDE, sizeof(u32), GFP_KERNEL);
-	if (!as->count) {
-		__free_page(as->pd);
-		kfree(as);
-		return NULL;
-	}
+	if (!as->count)
+		goto free_pd_range;

 	as->pts = kcalloc(SMMU_NUM_PDE, sizeof(*as->pts), GFP_KERNEL);
-	if (!as->pts) {
-		kfree(as->count);
-		__free_page(as->pd);
-		kfree(as);
-		return NULL;
-	}
+	if (!as->pts)
+		goto free_pts;

 	spin_lock_init(&as->lock);

@@ -315,6 +309,15 @@ static struct iommu_domain *tegra_smmu_domain_alloc(unsigned type)
 	as->domain.geometry.force_aperture = true;

 	return &as->domain;
+
+free_pts:
+	kfree(as->pts);
+free_pd_range:
+	__free_page(as->pd);
+free_as:
+	kfree(as);
+
+	return NULL;
 }

 static void tegra_smmu_domain_free(struct iommu_domain *domain)
@@ -1009,7 +1012,7 @@ static const struct iommu_ops tegra_smmu_ops = {
 	.probe_device = tegra_smmu_probe_device,
 	.release_device = tegra_smmu_release_device,
 	.device_group = tegra_smmu_device_group,
-	.get_resv_regions = of_iommu_get_resv_regions,
+	.get_resv_regions = iommu_dma_get_resv_regions,
 	.put_resv_regions = generic_iommu_put_resv_regions,
 	.of_xlate = tegra_smmu_of_xlate,
 	.pgsize_bitmap = SZ_4K,
--
2.36.1
[PATCH v5 4/5] iommu/tegra-smmu: Add support for reserved regions
From: Thierry Reding

The Tegra DRM driver currently uses the IOMMU API explicitly. This means
that it has fine-grained control over when exactly the translation
through the IOMMU is enabled. This currently happens after the driver
probes, so the driver is in a DMA quiesced state when the IOMMU
translation is enabled.

During the transition of the Tegra DRM driver to use the DMA API instead
of the IOMMU API explicitly, it was observed that on certain platforms
the display controllers were still actively fetching from memory. When a
DMA IOMMU domain is created as part of the DMA/IOMMU API setup during
boot, the IOMMU translation for the display controllers can be enabled a
significant amount of time before the driver has had a chance to reset
the hardware into a sane state. This causes the SMMU to detect faults on
the addresses that the display controller is trying to fetch.

To avoid this, and as a byproduct paving the way for seamless transition
of display from the bootloader to the kernel, add support for reserved
regions in the Tegra SMMU driver. This is implemented using the standard
reserved memory device tree bindings, which let us describe regions of
memory which the kernel is forbidden from using for regular allocations.
The Tegra SMMU driver will parse the nodes associated with each device
via the "memory-region" property and return reserved regions that the
IOMMU core will then create direct mappings for prior to attaching the
IOMMU domains to the devices. This ensures that a 1:1 mapping is in
place when IOMMU translation starts and prevents the SMMU from detecting
any faults.
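The "standard reserved memory device tree bindings" referred to above take roughly the following shape. This is an illustrative sketch only: the node names, addresses, and sizes are invented, not taken from a real Tegra device tree:

```
/ {
	reserved-memory {
		#address-cells = <2>;
		#size-cells = <2>;
		ranges;

		/* Framebuffer the bootloader left the display scanning out. */
		fb: framebuffer@92000000 {
			reg = <0x0 0x92000000 0x0 0x00800000>;
			no-map;
		};
	};

	host1x@50000000 {
		dc@54200000 {
			/* Parsed by the SMMU driver to build 1:1 mappings. */
			memory-region = <&fb>;
		};
	};
};
```

The `memory-region` phandle is what the driver walks per device; each referenced region becomes a direct (identity) mapping that the IOMMU core installs before attaching the domain.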
Signed-off-by: Thierry Reding --- drivers/iommu/tegra-smmu.c | 47 -- 1 file changed, 45 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c index 2f2b12033618..93879c40056c 100644 --- a/drivers/iommu/tegra-smmu.c +++ b/drivers/iommu/tegra-smmu.c @@ -10,6 +10,7 @@ #include #include #include +#include #include #include #include @@ -471,6 +472,7 @@ static void tegra_smmu_as_unprepare(struct tegra_smmu *smmu, tegra_smmu_free_asid(smmu, as->id); dma_unmap_page(smmu->dev, as->pd_dma, SMMU_SIZE_PD, DMA_TO_DEVICE); + as->pd_dma = 0; as->smmu = NULL; @@ -534,6 +536,38 @@ static void tegra_smmu_set_pde(struct tegra_smmu_as *as, unsigned long iova, struct tegra_smmu *smmu = as->smmu; u32 *pd = page_address(as->pd); unsigned long offset = pd_index * sizeof(*pd); + bool unmap = false; + + /* +* XXX Move this outside of this function. Perhaps add a struct +* iommu_domain parameter to ->{get,put}_resv_regions() so that +* the mapping can be done there. +* +* The problem here is that as->smmu is only known once we attach +* the domain to a device (because then we look up the right SMMU +* instance via the dev->archdata.iommu pointer). When the direct +* mappings are created for reserved regions, the domain has not +* been attached to a device yet, so we don't know. We currently +* fix that up in ->apply_resv_regions() because that is the first +* time where we have access to a struct device that will be used +* with the IOMMU domain. However, that's asymmetric and doesn't +* take care of the page directory mapping either, so we need to +* come up with something better. 
+*/ + if (WARN_ON_ONCE(as->pd_dma == 0)) { + as->pd_dma = dma_map_page(smmu->dev, as->pd, 0, SMMU_SIZE_PD, + DMA_TO_DEVICE); + if (dma_mapping_error(smmu->dev, as->pd_dma)) + return; + + if (!smmu_dma_addr_valid(smmu, as->pd_dma)) { + dma_unmap_page(smmu->dev, as->pd_dma, SMMU_SIZE_PD, + DMA_TO_DEVICE); + return; + } + + unmap = true; + } /* Set the page directory entry first */ pd[pd_index] = value; @@ -546,6 +580,12 @@ static void tegra_smmu_set_pde(struct tegra_smmu_as *as, unsigned long iova, smmu_flush_ptc(smmu, as->pd_dma, offset); smmu_flush_tlb_section(smmu, as->id, iova); smmu_flush(smmu); + + if (unmap) { + dma_unmap_page(smmu->dev, as->pd_dma, SMMU_SIZE_PD, + DMA_TO_DEVICE); + as->pd_dma = 0; + } } static u32 *tegra_smmu_pte_offset(struct page *pt_page, unsigned long iova) @@ -846,7 +886,6 @@ static struct iommu_device *tegra_smmu_probe_device(struct device *dev) smmu = tegra_smmu_find(args.np); if (smmu) { err = tegra_smmu_configure(smmu, dev, &args); - if (err < 0) { of_node_put(args.np); return
[PATCH v5 3/5] iommu: dma: Use of_iommu_get_resv_regions()
From: Thierry Reding For device tree nodes, use the standard of_iommu_get_resv_regions() implementation to obtain the reserved memory regions associated with a device. Signed-off-by: Thierry Reding --- drivers/iommu/dma-iommu.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 1ca85d37eeab..3a40ae7a450b 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -22,6 +22,7 @@ #include #include #include +#include #include #include #include @@ -386,6 +387,8 @@ void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list) if (!is_of_node(dev_iommu_fwspec_get(dev)->iommu_fwnode)) iort_iommu_msi_get_resv_regions(dev, list); + if (dev->of_node) + of_iommu_get_resv_regions(dev, list); } EXPORT_SYMBOL(iommu_dma_get_resv_regions); -- 2.36.1
[PATCH v5 2/5] iommu: Implement of_iommu_get_resv_regions()
From: Thierry Reding This is an implementation that IOMMU drivers can use to obtain reserved memory regions from a device tree node. It uses the reserved-memory DT bindings to find the regions associated with a given device. If these regions are marked accordingly, identity mappings will be created for them in the IOMMU domain that the devices will be attached to. Signed-off-by: Thierry Reding --- Changes in v5: - update for new "iommu-addresses" device tree bindings Changes in v4: - fix build failure on !CONFIG_OF_ADDRESS Changes in v3: - change "active" property to identity mapping flag that is part of the memory region specifier (as defined by #memory-region-cells) to allow per-reference flags to be used Changes in v2: - use "active" property to determine whether direct mappings are needed drivers/iommu/of_iommu.c | 90 include/linux/of_iommu.h | 8 2 files changed, 98 insertions(+) diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c index 5696314ae69e..9e341b5e307f 100644 --- a/drivers/iommu/of_iommu.c +++ b/drivers/iommu/of_iommu.c @@ -11,12 +11,15 @@ #include #include #include +#include #include #include #include #include #include +#include + #define NO_IOMMU 1 static int of_iommu_xlate(struct device *dev, @@ -172,3 +175,90 @@ const struct iommu_ops *of_iommu_configure(struct device *dev, return ops; } + +/** + * of_iommu_get_resv_regions - reserved region driver helper for device tree + * @dev: device for which to get reserved regions + * @list: reserved region list + * + * IOMMU drivers can use this to implement their .get_resv_regions() callback + * for memory regions attached to a device tree node. 
See the reserved-memory + * device tree bindings on how to use these: + * + * Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt + */ +void of_iommu_get_resv_regions(struct device *dev, struct list_head *list) +{ +#if IS_ENABLED(CONFIG_OF_ADDRESS) + struct of_phandle_iterator it; + int err; + + of_for_each_phandle(&it, err, dev->of_node, "memory-region", NULL, 0) { + struct iommu_resv_region *region; + struct resource res; + const __be32 *maps; + int size; + + memset(&res, 0, sizeof(res)); + + /* +* The "reg" property is optional and can be omitted by reserved-memory regions +* that represent reservations in the IOVA space, which are regions that should +* not be mapped. +*/ + if (of_find_property(it.node, "reg", NULL)) { + err = of_address_to_resource(it.node, 0, &res); + if (err < 0) { + dev_err(dev, "failed to parse memory region %pOF: %d\n", + it.node, err); + continue; + } + } + + maps = of_get_property(it.node, "iommu-addresses", &size); + if (maps) { + const __be32 *end = maps + size / sizeof(__be32); + struct device_node *np; + unsigned int index = 0; + u32 phandle; + int na, ns; + + while (maps < end) { + phys_addr_t start, end; + size_t length; + + phandle = be32_to_cpup(maps++); + np = of_find_node_by_phandle(phandle); + na = of_n_addr_cells(np); + ns = of_n_size_cells(np); + + start = of_translate_dma_address(np, maps); + length = of_read_number(maps + na, ns); + end = start + length - 1; + + if (np == dev->of_node) { + int prot = IOMMU_READ | IOMMU_WRITE; + enum iommu_resv_type type; + + /* +* IOMMU regions without an associated physical region +* cannot be mapped and are simply reservations. +*/ + if (res.end > res.start) + type = IOMMU_RESV_DIRECT_RELAXABLE; + else + type = IOMMU_RESV_RESERVED; + + region = iommu_alloc_resv_region(start, length, prot, type); + if (region) + list_add_tail(®ion->list, list); +
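The helper's classification rule above (a reserved-memory node with a physical "reg" range becomes a direct/identity mapping, a node with only "iommu-addresses" becomes a pure IOVA reservation) can be sketched outside the kernel. This is an illustrative Python model only, not kernel code; the node dictionaries and field names ("iova", "length", "phys") are invented for the sketch:

```python
from dataclasses import dataclass

@dataclass
class ResvRegion:
    start: int    # IOVA start, taken from the "iommu-addresses" specifier
    length: int
    type: str     # models enum iommu_resv_type

def get_resv_regions(nodes):
    """Classify reserved-memory nodes the way of_iommu_get_resv_regions()
    does: a node with a physical backing ("reg") yields a direct mapping
    (IOMMU_RESV_DIRECT_RELAXABLE), one without yields a reservation
    (IOMMU_RESV_RESERVED) that is never mapped."""
    regions = []
    for node in nodes:
        rtype = ("IOMMU_RESV_DIRECT_RELAXABLE" if "phys" in node
                 else "IOMMU_RESV_RESERVED")
        regions.append(ResvRegion(node["iova"], node["length"], rtype))
    return regions
```

Using the framebuffer/ADSP shapes from the cover letter, the framebuffer node (which has a "reg") becomes a direct mapping, while the ADSP carveout (no "reg") becomes a plain reservation.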
[PATCH v5 1/5] dt-bindings: reserved-memory: Document iommu-addresses
From: Thierry Reding This adds the "iommu-addresses" property to reserved-memory nodes, which allows describing the interaction of memory regions with IOMMUs. Two use-cases are supported: 1. Static mappings can be described by pairing the "iommu-addresses" property with a "reg" property. This is mostly useful for adopting firmware-allocated buffers via identity mappings. One common use-case where this is required is if early firmware or bootloaders have set up a bootsplash framebuffer that a display controller is actively scanning out from during the operating system boot process. 2. If an "iommu-addresses" property exists without a "reg" property, the reserved-memory node describes an IOVA reservation. Such memory regions are excluded from the IOVA space available to operating system drivers and can be used for regions that must not be used to map arbitrary buffers. Each mapping or reservation is tied to a specific device via a phandle to the device's device tree node. This allows a reserved-memory region to be reused across multiple devices. Signed-off-by: Thierry Reding --- .../reserved-memory/reserved-memory.txt | 1 - .../reserved-memory/reserved-memory.yaml | 62 +++ include/dt-bindings/reserved-memory.h | 8 +++ 3 files changed, 70 insertions(+), 1 deletion(-) delete mode 100644 Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt create mode 100644 include/dt-bindings/reserved-memory.h diff --git a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt deleted file mode 100644 index 1810701a8509.. --- a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt +++ /dev/null @@ -1 +0,0 @@ -This file has been moved to reserved-memory.yaml.
diff --git a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.yaml b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.yaml index 7a0744052ff6..3a769aa66e1c 100644 --- a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.yaml +++ b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.yaml @@ -52,6 +52,30 @@ properties: Address and Length pairs. Specifies regions of memory that are acceptable to allocate from. + iommu-addresses: +$ref: /schemas/types.yaml#/definitions/phandle-array +description: > + A list of phandle and specifier pairs that describe static IO virtual + address space mappings and carveouts associated with a given reserved + memory region. The phandle in the first cell refers to the device for + which the mapping or carveout is to be created. + + The specifier consists of an address/size pair and denotes the IO + virtual address range of the region for the given device. The exact + format depends on the values of the "#address-cells" and "#size-cells" + properties of the device referenced via the phandle. + + When used in combination with a "reg" property, an IOVA mapping is to + be established for this memory region. One example where this can be + useful is to create an identity mapping for physical memory that the + firmware has configured some hardware to access (such as a bootsplash + framebuffer). + + If no "reg" property is specified, the "iommu-addresses" property + defines carveout regions in the IOVA space for the given device. This + can be useful if a certain memory region should not be mapped through + the IOMMU. + no-map: type: boolean description: > @@ -97,4 +121,42 @@ oneOf: additionalProperties: true +examples: + - | +reserved-memory { + #address-cells = <2>; + #size-cells = <2>; + ranges; + + adsp: reservation-adsp { +/* + * Restrict IOVA mappings for ADSP buffers to the 512 MiB region + * from 0x4000 - 0x5fff. 
Anything outside is reserved by + * the ADSP for I/O memory and private memory allocations. + */ +iommu-addresses = <0x0 0x 0x00 0x4000>, + <0x0 0x6000 0xff 0xa000>; + }; + + fb: framebuffer@9000 { +reg = <0x0 0x9000 0x0 0x0080>; +iommu-addresses = <&dc0 0x0 0x9000 0x0 0x0080>; + }; +}; + +bus@0 { + #address-cells = <2>; + #size-cells = <2>; + + adsp@299 { +reg = <0x0 0x299 0x0 0x2000>; +memory-region = <&adsp>; + }; + + display@1520 { +reg = <0x0 0x1520 0x0 0x1>; +memory-region = <&fb>; + }; +}; + ... diff --git a/include/dt-bindings/reserved-memory.h b/include/dt-bindings/reserved-memory.h new file mode 100644 index ..174ca3448342 --- /dev/null +++ b/include/dt-bindings/reserved-memory.h @@ -0,0 +1,8 @@ +/* SPDX-License-Identif
[PATCH v5 0/5] iommu: Support mappings/reservations in reserved-memory regions
From: Thierry Reding Hi, this is another attempt at solving the problem of passing IOMMU configuration via device tree. It has significantly evolved since the last attempt, based on the discussion that followed. The discussion can be found here: https://lore.kernel.org/all/20210423163234.3651547-1-thierry.red...@gmail.com/ Rather than using a memory-region specifier, this new version introduces a new "iommu-addresses" property for the reserved-memory regions themselves. These are used to describe either a static mapping or reservation that should be created for a given device. If both "reg" and "iommu-addresses" properties are given, a mapping will be created (typically this would be an identity mapping) whereas if only an "iommu-addresses" property is specified, a reservation for the specified range will be installed. An example is included in the DT bindings, but here is an extract of what I've used to test this: reserved-memory { #address-cells = <2>; #size-cells = <2>; ranges; /* * Creates an identity mapping for the framebuffer that * the firmware has setup to scan out a bootsplash from. */ fb: framebuffer@92cb2000 { reg = <0x0 0x92cb2000 0x0 0x0080>; iommu-addresses = <&dc0 0x0 0x92cb2000 0x0 0x0080>; }; /* * Creates a reservation in the IOVA space to prevent * any buffers from being mapped to that region. Note * that on Tegra the range is actually quite different * from this, but it would conflict with the display * driver that I tested this against, so this is just * a dummy region for testing. */ adsp: reservation-adsp { iommu-addresses = <&dc0 0x0 0x9000 0x0 0x0001>; }; }; host1x@5000 { dc@5420 { memory-region = <&fb>, <&adsp>; }; }; This is abbreviated a little to focus on the essentials. Note also that the ADSP reservation is not actually used on this device and the driver for this doesn't exist yet, but I wanted to include this variant for testing, because we'll want to use these bindings for the reservation use-case as well at some point. 
Adding Alyssa and Janne who have in the past tried to make these bindings work on Apple M1. Also adding Sameer from the Tegra audio team to look at the ADSP reservation and double-check that this is suitable for our needs. Thierry Navneet Kumar (1): iommu/tegra-smmu: Support managed domains Thierry Reding (4): dt-bindings: reserved-memory: Document iommu-addresses iommu: Implement of_iommu_get_resv_regions() iommu: dma: Use of_iommu_get_resv_regions() iommu/tegra-smmu: Add support for reserved regions .../reserved-memory/reserved-memory.txt | 1 - .../reserved-memory/reserved-memory.yaml | 62 + drivers/iommu/dma-iommu.c | 3 + drivers/iommu/of_iommu.c | 90 +++ drivers/iommu/tegra-smmu.c| 82 + include/dt-bindings/reserved-memory.h | 8 ++ include/linux/of_iommu.h | 8 ++ 7 files changed, 235 insertions(+), 19 deletions(-) delete mode 100644 Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt create mode 100644 include/dt-bindings/reserved-memory.h -- 2.36.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] vfio: Stop using iommu_present()
On Tue, 5 Apr 2022 13:11:54 +0100 Robin Murphy wrote: > IOMMU groups have been mandatory for some time now, so a device without > one is necessarily a device without any usable IOMMU, therefore the > iommu_present() check is redundant (or at best unhelpful). > > Signed-off-by: Robin Murphy > --- > drivers/vfio/vfio.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c > index a4555014bd1e..7b0a7b85e77e 100644 > --- a/drivers/vfio/vfio.c > +++ b/drivers/vfio/vfio.c > @@ -745,11 +745,11 @@ static struct vfio_group > *vfio_group_find_or_alloc(struct device *dev) > > iommu_group = iommu_group_get(dev); > #ifdef CONFIG_VFIO_NOIOMMU > - if (!iommu_group && noiommu && !iommu_present(dev->bus)) { > + if (!iommu_group && noiommu) { > /* >* With noiommu enabled, create an IOMMU group for devices that > - * don't already have one and don't have an iommu_ops on their > - * bus. Taint the kernel because we're about to give a DMA > + * don't already have one, implying no IOMMU hardware/driver > + * exists. Taint the kernel because we're about to give a DMA >* capable device to a user without IOMMU protection. >*/ > group = vfio_noiommu_group_alloc(dev, VFIO_NO_IOMMU); Applied to vfio next branch for v5.19. Thanks, Alex ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] vfio: Remove VFIO_TYPE1_NESTING_IOMMU
Hi, On 5/10/22 20:13, Jason Gunthorpe wrote: > On Tue, May 10, 2022 at 06:52:06PM +0100, Robin Murphy wrote: >> On 2022-05-10 17:55, Jason Gunthorpe via iommu wrote: >>> This control causes the ARM SMMU drivers to choose a stage 2 >>> implementation for the IO pagetable (vs the stage 1 usual default), >>> however this choice has no visible impact to the VFIO user. Further qemu >>> never implemented this and no other userspace user is known. >>> >>> The original description in commit f5c9ecebaf2a ("vfio/iommu_type1: add >>> new VFIO_TYPE1_NESTING_IOMMU IOMMU type") suggested this was to "provide >>> SMMU translation services to the guest operating system" however the rest >>> of the API to set the guest table pointer for the stage 1 was never >>> completed, or at least never upstreamed, rendering this part useless dead >>> code. >>> >>> Since the current patches to enable nested translation, aka userspace page >>> tables, rely on iommufd and will not use the enable_nesting() >>> iommu_domain_op, remove this infrastructure. However, don't cut too deep >>> into the SMMU drivers for now expecting the iommufd work to pick it up - >>> we still need to create S2 IO page tables. >>> >>> Remove VFIO_TYPE1_NESTING_IOMMU and everything under it including the >>> enable_nesting iommu_domain_op. >>> >>> Just in-case there is some userspace using this continue to treat >>> requesting it as a NOP, but do not advertise support any more. >> I assume the nested translation/guest SVA patches that Eric and Vivek were >> working on pre-IOMMUFD made use of this, and given that they got quite far >> along, I wouldn't be too surprised if some eager cloud vendors might have >> even deployed something based on the patches off the list. thank you Robin for the heads up. > With upstream there is no way to make use of this flag, if someone is > using it they have other out of tree kernel, vfio, kvm and qemu > patches to make it all work. 
> > You can see how much is still needed in Eric's tree: > > https://github.com/eauger/linux/commits/v5.15-rc7-nested-v16 > >> I can't help feeling a little wary about removing this until IOMMUFD >> can actually offer a functional replacement - is it in the way of >> anything upcoming? > From an upstream perspective if someone has a patched kernel to > complete the feature, then they can patch this part in as well, we > should not carry dead code like this in the kernel and in the uapi. On the other hand, the code has been in the kernel for 8 years now; I think we could wait for some additional weeks/months until the iommufd nested integration arrives and gets tested. Thanks Eric > > It is not directly in the way, but this needs to get done at some > point, I'd rather just get it out of the way. > > Thanks, > Jason >
[PATCH v2] iommu/vt-d: Make DMAR_UNITS_SUPPORTED a config setting
To support up to 64 sockets with 10 DMAR units each (640), make the value of DMAR_UNITS_SUPPORTED adjustable by a config variable, CONFIG_DMAR_UNITS_SUPPORTED, and make its default 1024 when MAXSMP is set. If the available hardware exceeds DMAR_UNITS_SUPPORTED (previously set to MAX_IO_APICS, or 128), it causes these messages: "DMAR: Failed to allocate seq_id", "DMAR: Parse DMAR table failure.", and "x2apic: IRQ remapping doesn't support X2APIC mode x2apic disabled"; and the system fails to boot properly. Signed-off-by: Steve Wahl --- Note that we could not find a reason for connecting DMAR_UNITS_SUPPORTED to MAX_IO_APICS as was done previously. Perhaps it seemed like the two would continue to match on earlier processors. There doesn't appear to be kernel code that assumes that the value of one is related to the other. v2: Make this value a config option, rather than a fixed constant. The default values should match previous configuration except in the MAXSMP case. Keeping the value at a power of two was requested by Kevin Tian.
drivers/iommu/intel/Kconfig | 6 ++ include/linux/dmar.h | 6 +- 2 files changed, 7 insertions(+), 5 deletions(-) diff --git a/drivers/iommu/intel/Kconfig b/drivers/iommu/intel/Kconfig index 247d0f2d5fdf..fdbda77ac21e 100644 --- a/drivers/iommu/intel/Kconfig +++ b/drivers/iommu/intel/Kconfig @@ -9,6 +9,12 @@ config DMAR_PERF config DMAR_DEBUG bool +config DMAR_UNITS_SUPPORTED + int "Number of DMA Remapping Units supported" + default 1024 if MAXSMP + default 128 if X86_64 + default 64 + config INTEL_IOMMU bool "Support for Intel IOMMU using DMA Remapping Devices" depends on PCI_MSI && ACPI && (X86 || IA64) diff --git a/include/linux/dmar.h b/include/linux/dmar.h index 45e903d84733..0c03c1845c23 100644 --- a/include/linux/dmar.h +++ b/include/linux/dmar.h @@ -18,11 +18,7 @@ struct acpi_dmar_header; -#ifdef CONFIG_X86 -# define DMAR_UNITS_SUPPORTED MAX_IO_APICS -#else -# define DMAR_UNITS_SUPPORTED 64 -#endif +#define DMAR_UNITS_SUPPORTED CONFIG_DMAR_UNITS_SUPPORTED /* DMAR Flags */ #define DMAR_INTR_REMAP 0x1 -- 2.26.2
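The Kconfig defaults above can be summarized with a small illustrative sketch (Python, not kernel code): the first matching `default` line wins, so MAXSMP takes priority over X86_64, and everything else falls back to 64.

```python
def dmar_units_supported(maxsmp: bool = False, x86_64: bool = False) -> int:
    """Model of the CONFIG_DMAR_UNITS_SUPPORTED defaults:
    1024 if MAXSMP, else 128 on X86_64, else 64."""
    if maxsmp:
        return 1024
    if x86_64:
        return 128
    return 64
```

Note the 64-sockets-times-10-units case from the commit message (640 DMAR units) fits comfortably under the MAXSMP default of 1024, while it overflows the old limit of 128.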
[PATCH v3 2/2] iommu/mediatek: Allow page table PA up to 35bit
From: Yunfei Wang Add support for the IO_PGTABLE_QUIRK_ARM_MTK_TTBR_EXT quirk, so that page table PAs of up to 35 bits are allowed, not only those within ZONE_DMA32. Signed-off-by: Ning Li Signed-off-by: Yunfei Wang --- drivers/iommu/mtk_iommu.c | 29 + 1 file changed, 25 insertions(+), 4 deletions(-) diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c index 6fd75a60abd6..1b9a876ef271 100644 --- a/drivers/iommu/mtk_iommu.c +++ b/drivers/iommu/mtk_iommu.c @@ -33,6 +33,7 @@ #define REG_MMU_PT_BASE_ADDR 0x000 #define MMU_PT_ADDR_MASK GENMASK(31, 7) +#define MMU_PT_ADDR_2_0_MASK GENMASK(2, 0) #define REG_MMU_INVALIDATE 0x020 #define F_ALL_INVLD 0x2 @@ -118,6 +119,7 @@ #define WR_THROT_EN BIT(6) #define HAS_LEGACY_IVRP_PADDR BIT(7) #define IOVA_34_EN BIT(8) +#define PGTABLE_PA_35_EN BIT(9) #define MTK_IOMMU_HAS_FLAG(pdata, _x) \ ((((pdata)->flags) & (_x)) == (_x)) @@ -401,6 +403,9 @@ static int mtk_iommu_domain_finalise(struct mtk_iommu_domain *dom, .iommu_dev = data->dev, }; + if (MTK_IOMMU_HAS_FLAG(data->plat_data, PGTABLE_PA_35_EN)) + dom->cfg.quirks |= IO_PGTABLE_QUIRK_ARM_MTK_TTBR_EXT; + if (MTK_IOMMU_HAS_FLAG(data->plat_data, HAS_4GB_MODE)) dom->cfg.oas = data->enable_4GB ?
33 : 32; else @@ -450,6 +455,7 @@ static int mtk_iommu_attach_device(struct iommu_domain *domain, struct mtk_iommu_domain *dom = to_mtk_domain(domain); struct device *m4udev = data->dev; int ret, domid; + u32 regval; domid = mtk_iommu_get_domain_id(dev, data->plat_data); if (domid < 0) @@ -472,8 +478,14 @@ static int mtk_iommu_attach_device(struct iommu_domain *domain, return ret; } data->m4u_dom = dom; - writel(dom->cfg.arm_v7s_cfg.ttbr & MMU_PT_ADDR_MASK, - data->base + REG_MMU_PT_BASE_ADDR); + + /* Bits[6:3] are invalid for mediatek platform */ + if (MTK_IOMMU_HAS_FLAG(data->plat_data, PGTABLE_PA_35_EN)) + regval = (dom->cfg.arm_v7s_cfg.ttbr & MMU_PT_ADDR_MASK) | +(dom->cfg.arm_v7s_cfg.ttbr & MMU_PT_ADDR_2_0_MASK); + else + regval = dom->cfg.arm_v7s_cfg.ttbr & MMU_PT_ADDR_MASK; + writel(regval, data->base + REG_MMU_PT_BASE_ADDR); pm_runtime_put(m4udev); } @@ -987,6 +999,7 @@ static int __maybe_unused mtk_iommu_runtime_resume(struct device *dev) struct mtk_iommu_suspend_reg *reg = &data->reg; struct mtk_iommu_domain *m4u_dom = data->m4u_dom; void __iomem *base = data->base; + u32 regval; int ret; ret = clk_prepare_enable(data->bclk); @@ -1010,7 +1023,14 @@ static int __maybe_unused mtk_iommu_runtime_resume(struct device *dev) writel_relaxed(reg->int_main_control, base + REG_MMU_INT_MAIN_CONTROL); writel_relaxed(reg->ivrp_paddr, base + REG_MMU_IVRP_PADDR); writel_relaxed(reg->vld_pa_rng, base + REG_MMU_VLD_PA_RNG); - writel(m4u_dom->cfg.arm_v7s_cfg.ttbr & MMU_PT_ADDR_MASK, base + REG_MMU_PT_BASE_ADDR); + + /* Bits[6:3] are invalid for mediatek platform */ + if (MTK_IOMMU_HAS_FLAG(data->plat_data, PGTABLE_PA_35_EN)) + regval = (m4u_dom->cfg.arm_v7s_cfg.ttbr & MMU_PT_ADDR_MASK) | +(m4u_dom->cfg.arm_v7s_cfg.ttbr & MMU_PT_ADDR_2_0_MASK); + else + regval = m4u_dom->cfg.arm_v7s_cfg.ttbr & MMU_PT_ADDR_MASK; + writel(regval, base + REG_MMU_PT_BASE_ADDR); /* * Users may allocate dma buffer before they call pm_runtime_get, @@ -1038,7 +1058,8 @@ static const struct 
mtk_iommu_plat_data mt2712_data = { static const struct mtk_iommu_plat_data mt6779_data = { .m4u_plat = M4U_MT6779, - .flags = HAS_SUB_COMM | OUT_ORDER_WR_EN | WR_THROT_EN, + .flags = HAS_SUB_COMM | OUT_ORDER_WR_EN | WR_THROT_EN | +PGTABLE_PA_35_EN, .inv_sel_reg = REG_MMU_INV_SEL_GEN2, .iova_region = single_domain, .iova_region_nr = ARRAY_SIZE(single_domain), -- 2.18.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
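The REG_MMU_PT_BASE_ADDR computation in the patch above can be modeled in a few lines of Python (illustrative only; the mask constants mirror GENMASK(31, 7) and GENMASK(2, 0)):

```python
MMU_PT_ADDR_MASK = 0xFFFFFF80      # GENMASK(31, 7)
MMU_PT_ADDR_2_0_MASK = 0x00000007  # GENMASK(2, 0)

def pt_base_regval(ttbr: int, pa_35_en: bool) -> int:
    """Model of the register value written to REG_MMU_PT_BASE_ADDR:
    with the 35-bit quirk, ttbr bits [2:0] (carrying PA bits [34:32])
    are kept while bits [6:3], invalid on MediaTek, are dropped."""
    if pa_35_en:
        return (ttbr & MMU_PT_ADDR_MASK) | (ttbr & MMU_PT_ADDR_2_0_MASK)
    return ttbr & MMU_PT_ADDR_MASK
```

The key point is that bits [6:3] are masked out in both cases; only the quirked path preserves bits [2:0].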
[PATCH v3 1/2] iommu/io-pgtable-arm-v7s: Add a quirk to allow pgtable PA up to 35bit
From: Yunfei Wang Calls to kmem_cache_alloc() for level-2 pgtable allocation may run in atomic context, and they sometimes fail when the DMA32 zone runs out of memory. Since the MediaTek IOMMU hardware supports at most 35-bit PAs in the pgtable, add a quirk to allow pgtable PAs up to bit 35. Signed-off-by: Ning Li Signed-off-by: Yunfei Wang --- drivers/iommu/io-pgtable-arm-v7s.c | 56 ++ include/linux/io-pgtable.h | 15 +--- 2 files changed, 52 insertions(+), 19 deletions(-) diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c index be066c1503d3..57455ae052ac 100644 --- a/drivers/iommu/io-pgtable-arm-v7s.c +++ b/drivers/iommu/io-pgtable-arm-v7s.c @@ -149,6 +149,10 @@ #define ARM_V7S_TTBR_IRGN_ATTR(attr) \ ((((attr) & 0x1) << 6) | (((attr) & 0x2) >> 1)) +/* Mediatek extend ttbr bits[2:0] for PA bits[34:32] */ +#define ARM_V7S_TTBR_35BIT_PA(ttbr, pa) \ + ((ttbr & ((u32)(~0U << 3))) | ((pa & GENMASK(34, 32)) >> 32)) + #ifdef CONFIG_ZONE_DMA32 #define ARM_V7S_TABLE_GFP_DMA GFP_DMA32 #define ARM_V7S_TABLE_SLAB_FLAGS SLAB_CACHE_DMA32 @@ -182,14 +186,8 @@ static bool arm_v7s_is_mtk_enabled(struct io_pgtable_cfg *cfg) (cfg->quirks & IO_PGTABLE_QUIRK_ARM_MTK_EXT); } -static arm_v7s_iopte paddr_to_iopte(phys_addr_t paddr, int lvl, - struct io_pgtable_cfg *cfg) +static arm_v7s_iopte to_iopte_mtk(phys_addr_t paddr, arm_v7s_iopte pte) { - arm_v7s_iopte pte = paddr & ARM_V7S_LVL_MASK(lvl); - - if (!arm_v7s_is_mtk_enabled(cfg)) - return pte; - if (paddr & BIT_ULL(32)) pte |= ARM_V7S_ATTR_MTK_PA_BIT32; if (paddr & BIT_ULL(33)) @@ -199,6 +197,17 @@ static arm_v7s_iopte paddr_to_iopte(phys_addr_t paddr, int lvl, return pte; } +static arm_v7s_iopte paddr_to_iopte(phys_addr_t paddr, int lvl, + struct io_pgtable_cfg *cfg) +{ + arm_v7s_iopte pte = paddr & ARM_V7S_LVL_MASK(lvl); + + if (!arm_v7s_is_mtk_enabled(cfg)) + return pte; + + return to_iopte_mtk(paddr, pte); +} + static phys_addr_t iopte_to_paddr(arm_v7s_iopte pte, int lvl, struct io_pgtable_cfg
*cfg) { @@ -234,6 +243,7 @@ static arm_v7s_iopte *iopte_deref(arm_v7s_iopte pte, int lvl, static void *__arm_v7s_alloc_table(int lvl, gfp_t gfp, struct arm_v7s_io_pgtable *data) { + gfp_t gfp_l1 = __GFP_ZERO | ARM_V7S_TABLE_GFP_DMA; struct io_pgtable_cfg *cfg = &data->iop.cfg; struct device *dev = cfg->iommu_dev; phys_addr_t phys; @@ -241,9 +251,11 @@ static void *__arm_v7s_alloc_table(int lvl, gfp_t gfp, size_t size = ARM_V7S_TABLE_SIZE(lvl, cfg); void *table = NULL; + if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_MTK_TTBR_EXT) + gfp_l1 = __GFP_ZERO; + if (lvl == 1) - table = (void *)__get_free_pages( - __GFP_ZERO | ARM_V7S_TABLE_GFP_DMA, get_order(size)); + table = (void *)__get_free_pages(gfp_l1, get_order(size)); else if (lvl == 2) table = kmem_cache_zalloc(data->l2_tables, gfp); @@ -251,7 +263,8 @@ static void *__arm_v7s_alloc_table(int lvl, gfp_t gfp, return NULL; phys = virt_to_phys(table); - if (phys != (arm_v7s_iopte)phys) { + if (phys != (arm_v7s_iopte)phys && + !(cfg->quirks & IO_PGTABLE_QUIRK_ARM_MTK_TTBR_EXT)) { /* Doesn't fit in PTE */ dev_err(dev, "Page table does not fit in PTE: %pa", &phys); goto out_free; @@ -457,9 +470,14 @@ static arm_v7s_iopte arm_v7s_install_table(arm_v7s_iopte *table, arm_v7s_iopte curr, struct io_pgtable_cfg *cfg) { + phys_addr_t phys = virt_to_phys(table); arm_v7s_iopte old, new; - new = virt_to_phys(table) | ARM_V7S_PTE_TYPE_TABLE; + new = phys | ARM_V7S_PTE_TYPE_TABLE; + + if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_MTK_TTBR_EXT) + new = to_iopte_mtk(phys, new); + if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_NS) new |= ARM_V7S_ATTR_NS_TABLE; @@ -778,7 +796,9 @@ static phys_addr_t arm_v7s_iova_to_phys(struct io_pgtable_ops *ops, static struct io_pgtable *arm_v7s_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie) { + slab_flags_t slab_flag = ARM_V7S_TABLE_SLAB_FLAGS; struct arm_v7s_io_pgtable *data; + phys_addr_t paddr; if (cfg->ias > (arm_v7s_is_mtk_enabled(cfg) ? 
34 : ARM_V7S_ADDR_BITS)) return NULL; @@ -788,7 +808,8 @@ static struct io_pgtable *arm_v7s_alloc_pgtable(struct io_pgtable_cfg *cfg, if (cfg->q
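The ARM_V7S_TTBR_35BIT_PA macro in the patch above folds PA bits [34:32] into TTBR bits [2:0] of the 32-bit register. A small Python model of the bit manipulation (illustrative, not kernel code):

```python
def ttbr_35bit_pa(ttbr: int, pa: int) -> int:
    """Model of ARM_V7S_TTBR_35BIT_PA: clear ttbr bits [2:0]
    ((u32)(~0U << 3)), then fill them with PA bits [34:32]
    (GENMASK(34, 32) shifted down by 32)."""
    return (ttbr & 0xFFFFFFF8) | ((pa & 0x7_0000_0000) >> 32)
```

For a pgtable whose physical address has bits 33 and 32 set, the low three TTBR bits end up holding the value 3; any stale low bits in the incoming ttbr are discarded first.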
[PATCH v2 2/2] iommu/mediatek: Allow page table PA up to 35bit
From: Yunfei Wang

Add support for the quirk IO_PGTABLE_QUIRK_ARM_MTK_TTBR_EXT, so that the
page table PA can be up to 35 bits, not restricted to ZONE_DMA32.

Signed-off-by: Ning Li
Signed-off-by: Yunfei Wang
---
 drivers/iommu/mtk_iommu.c | 29 +
 1 file changed, 25 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 6fd75a60abd6..1b9a876ef271 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -33,6 +33,7 @@
 #define REG_MMU_PT_BASE_ADDR		0x000
 #define MMU_PT_ADDR_MASK		GENMASK(31, 7)
+#define MMU_PT_ADDR_2_0_MASK		GENMASK(2, 0)
 #define REG_MMU_INVALIDATE		0x020
 #define F_ALL_INVLD			0x2
@@ -118,6 +119,7 @@
 #define WR_THROT_EN			BIT(6)
 #define HAS_LEGACY_IVRP_PADDR		BIT(7)
 #define IOVA_34_EN			BIT(8)
+#define PGTABLE_PA_35_EN		BIT(9)

 #define MTK_IOMMU_HAS_FLAG(pdata, _x) \
	((((pdata)->flags) & (_x)) == (_x))
@@ -401,6 +403,9 @@ static int mtk_iommu_domain_finalise(struct mtk_iommu_domain *dom,
		.iommu_dev = data->dev,
	};

+	if (MTK_IOMMU_HAS_FLAG(data->plat_data, PGTABLE_PA_35_EN))
+		dom->cfg.quirks |= IO_PGTABLE_QUIRK_ARM_MTK_TTBR_EXT;
+
	if (MTK_IOMMU_HAS_FLAG(data->plat_data, HAS_4GB_MODE))
		dom->cfg.oas = data->enable_4GB ? 33 : 32;
	else
@@ -450,6 +455,7 @@ static int mtk_iommu_attach_device(struct iommu_domain *domain,
	struct mtk_iommu_domain *dom = to_mtk_domain(domain);
	struct device *m4udev = data->dev;
	int ret, domid;
+	u32 regval;

	domid = mtk_iommu_get_domain_id(dev, data->plat_data);
	if (domid < 0)
@@ -472,8 +478,14 @@ static int mtk_iommu_attach_device(struct iommu_domain *domain,
			return ret;
		}
		data->m4u_dom = dom;
-		writel(dom->cfg.arm_v7s_cfg.ttbr & MMU_PT_ADDR_MASK,
-		       data->base + REG_MMU_PT_BASE_ADDR);
+
+		/* Bits[6:3] are invalid for mediatek platform */
+		if (MTK_IOMMU_HAS_FLAG(data->plat_data, PGTABLE_PA_35_EN))
+			regval = (dom->cfg.arm_v7s_cfg.ttbr & MMU_PT_ADDR_MASK) |
+				 (dom->cfg.arm_v7s_cfg.ttbr & MMU_PT_ADDR_2_0_MASK);
+		else
+			regval = dom->cfg.arm_v7s_cfg.ttbr & MMU_PT_ADDR_MASK;
+		writel(regval, data->base + REG_MMU_PT_BASE_ADDR);

		pm_runtime_put(m4udev);
	}
@@ -987,6 +999,7 @@ static int __maybe_unused mtk_iommu_runtime_resume(struct device *dev)
	struct mtk_iommu_suspend_reg *reg = &data->reg;
	struct mtk_iommu_domain *m4u_dom = data->m4u_dom;
	void __iomem *base = data->base;
+	u32 regval;
	int ret;

	ret = clk_prepare_enable(data->bclk);
@@ -1010,7 +1023,14 @@ static int __maybe_unused mtk_iommu_runtime_resume(struct device *dev)
	writel_relaxed(reg->int_main_control, base + REG_MMU_INT_MAIN_CONTROL);
	writel_relaxed(reg->ivrp_paddr, base + REG_MMU_IVRP_PADDR);
	writel_relaxed(reg->vld_pa_rng, base + REG_MMU_VLD_PA_RNG);
-	writel(m4u_dom->cfg.arm_v7s_cfg.ttbr & MMU_PT_ADDR_MASK, base + REG_MMU_PT_BASE_ADDR);
+
+	/* Bits[6:3] are invalid for mediatek platform */
+	if (MTK_IOMMU_HAS_FLAG(data->plat_data, PGTABLE_PA_35_EN))
+		regval = (m4u_dom->cfg.arm_v7s_cfg.ttbr & MMU_PT_ADDR_MASK) |
+			 (m4u_dom->cfg.arm_v7s_cfg.ttbr & MMU_PT_ADDR_2_0_MASK);
+	else
+		regval = m4u_dom->cfg.arm_v7s_cfg.ttbr & MMU_PT_ADDR_MASK;
+	writel(regval, base + REG_MMU_PT_BASE_ADDR);

	/*
	 * Users may allocate dma buffer before they call pm_runtime_get,
@@ -1038,7 +1058,8 @@ static const struct mtk_iommu_plat_data mt2712_data = {

 static const struct mtk_iommu_plat_data mt6779_data = {
	.m4u_plat     = M4U_MT6779,
-	.flags        = HAS_SUB_COMM | OUT_ORDER_WR_EN | WR_THROT_EN,
+	.flags        = HAS_SUB_COMM | OUT_ORDER_WR_EN | WR_THROT_EN |
+			PGTABLE_PA_35_EN,
	.inv_sel_reg  = REG_MMU_INV_SEL_GEN2,
	.iova_region  = single_domain,
	.iova_region_nr = ARRAY_SIZE(single_domain),
--
2.18.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v6 08/12] iommu/sva: Use attach/detach_pasid_dev in SVA interfaces
On 2022/5/12 20:03, Jason Gunthorpe wrote: On Thu, May 12, 2022 at 07:59:41PM +0800, Baolu Lu wrote: On 2022/5/12 19:48, Jason Gunthorpe wrote: On Thu, May 12, 2022 at 01:17:08PM +0800, Baolu Lu wrote: On 2022/5/12 13:01, Tian, Kevin wrote: From: Baolu Lu Sent: Thursday, May 12, 2022 11:03 AM On 2022/5/11 22:53, Jason Gunthorpe wrote: Also, given the current arrangement it might make sense to have a struct iommu_domain_sva given that no driver is wrappering this in something else. Fair enough. How about below wrapper? +struct iommu_sva_domain { + /* +* Common iommu domain header,*must* be put at the top +* of the structure. +*/ + struct iommu_domain domain; + struct mm_struct *mm; + struct iommu_sva bond; +} The refcount is wrapped in bond. I'm still not sure that bond is necessary "bond" is the sva handle that the device drivers get through calling iommu_sva_bind(). 'bond' was required before because we didn't have a domain to wrap the page table at that time. Now we have a domain and it is 1:1 associated to bond. Probably make sense now by just returning the domain as the sva handle instead? It also includes the device information that the domain has been attached. So the sva_unbind() looks like this: /** * iommu_sva_unbind_device() - Remove a bond created with iommu_sva_bind_device * @handle: the handle returned by iommu_sva_bind_device() * * Put reference to a bond between device and address space. The device should * not be issuing any more transaction for this PASID. All outstanding page * requests for this PASID must have been flushed to the IOMMU. */ void iommu_sva_unbind_device(struct iommu_sva *handle) It's fine to replace the iommu_sva with iommu_sva_domain for sva handle, if we can include the device in the unbind() interface. Why would we have a special unbind for SVA? It's about SVA kAPI for device drivers. 
The existing kAPIs include:

struct iommu_sva *iommu_sva_bind_device(struct device *dev,
                                        struct mm_struct *mm,
                                        void *drvdata);
void iommu_sva_unbind_device(struct iommu_sva *handle);
u32 iommu_sva_get_pasid(struct iommu_sva *handle);

This is not what we agreed the API should be. We agreed:

iommu_sva_domain_alloc()
iommu_attach_device_pasid()
iommu_detach_device_pasid()

Again, SVA should not be different from normal domain stuff.

Yes, agreed. I am trying to achieve this in two steps. The first step
focuses on the internal iommu implementation and keeps the driver kAPI
untouched. Then, the second step focuses on the driver APIs.

Best regards,
baolu
Re: [PATCH v6 08/12] iommu/sva: Use attach/detach_pasid_dev in SVA interfaces
On 2022/5/12 19:51, Jason Gunthorpe wrote:
On Thu, May 12, 2022 at 11:02:39AM +0800, Baolu Lu wrote:

+	mutex_lock(&group->mutex);
+	domain = xa_load(&group->pasid_array, pasid);
+	if (domain && domain->type != type)
+		domain = NULL;
+	mutex_unlock(&group->mutex);
+	iommu_group_put(group);
+
+	return domain;

This is bad locking, group->pasid_array values cannot be taken outside
the lock.

It's not iommu core, but SVA (or other feature components) that manage
the life cycle of a domain. The iommu core only provides a place to
store the domain pointer. The feature components are free to fetch their
domain pointers from iommu core as long as they are sure that the domain
is alive during use.

I'm not convinced.

I'm sorry, I may not have explained it clearly. :-)

This helper is safe for uses inside the IOMMU subsystem. We could trust
ourselves that nobody will abuse this helper to obtain domains belonging
to others, as the pasid has been allocated for SVA code. No other code
should be able to set up another type of domain for this pasid. The SVA
code has its own lock mechanism to avoid use-after-free.

Please correct me if I missed anything. :-)

By the way, I can see some similar helpers in the current IOMMU core.
For example:

struct iommu_domain *iommu_get_domain_for_dev(struct device *dev)
{
	struct iommu_domain *domain;
	struct iommu_group *group;

	group = iommu_group_get(dev);
	if (!group)
		return NULL;

	domain = group->domain;
	iommu_group_put(group);

	return domain;
}
EXPORT_SYMBOL_GPL(iommu_get_domain_for_dev);

Best regards,
baolu
Re: [PATCH] vfio: Remove VFIO_TYPE1_NESTING_IOMMU
On 2022/5/12 7:59 PM, Jason Gunthorpe wrote:
On Thu, May 12, 2022 at 07:57:26PM +0800, zhangfei@foxmail.com wrote:
On 2022/5/12 7:32 PM, Jason Gunthorpe via iommu wrote:
On Thu, May 12, 2022 at 03:07:09PM +0800, Zhangfei Gao wrote:

I can't help feeling a little wary about removing this until IOMMUFD can
actually offer a functional replacement - is it in the way of anything
upcoming?

From an upstream perspective if someone has a patched kernel to complete
the feature, then they can patch this part in as well, we should not
carry dead code like this in the kernel and in the uapi.

It is not directly in the way, but this needs to get done at some point,
I'd rather just get it out of the way.

We are using this interface for nested mode.

How are you using it? It doesn't do anything. Do you have out of tree
patches as well?

Yes, there are some patches out of tree, since they have been pending for
almost a year.

By the way, I am trying to test nesting mode based on iommufd, which
still requires iommu_enable_nesting, and that is removed in this patch as
well.

This is temporary.

So can we wait for the alternative patch before removing them?

Do you see a problem with patching this along with all the other patches
you need?

Not yet. Currently I am using two workarounds, but it basically works.

1. Hard code to enable nesting mode, both in kernel and qemu. Will
consider this later.

2. Add a VFIOIOASHwpt *hwpt in VFIOIOMMUFDContainer, and save the hwpt
in vfio_device_attach_container, while currently VFIOIOMMUFDContainer
has an hwpt_list. Have asked Yi in another mail.

Thanks
Re: [PATCH v6 08/12] iommu/sva: Use attach/detach_pasid_dev in SVA interfaces
On Thu, May 12, 2022 at 07:59:41PM +0800, Baolu Lu wrote: > On 2022/5/12 19:48, Jason Gunthorpe wrote: > > On Thu, May 12, 2022 at 01:17:08PM +0800, Baolu Lu wrote: > > > On 2022/5/12 13:01, Tian, Kevin wrote: > > > > > From: Baolu Lu > > > > > Sent: Thursday, May 12, 2022 11:03 AM > > > > > > > > > > On 2022/5/11 22:53, Jason Gunthorpe wrote: > > > > > > > > Also, given the current arrangement it might make sense to have > > > > > > > > a > > > > > > > > struct iommu_domain_sva given that no driver is wrappering this > > > > > > > > in > > > > > > > > something else. > > > > > > > Fair enough. How about below wrapper? > > > > > > > > > > > > > > +struct iommu_sva_domain { > > > > > > > + /* > > > > > > > +* Common iommu domain header,*must* be put at the top > > > > > > > +* of the structure. > > > > > > > +*/ > > > > > > > + struct iommu_domain domain; > > > > > > > + struct mm_struct *mm; > > > > > > > + struct iommu_sva bond; > > > > > > > +} > > > > > > > > > > > > > > The refcount is wrapped in bond. > > > > > > I'm still not sure that bond is necessary > > > > > > > > > > "bond" is the sva handle that the device drivers get through calling > > > > > iommu_sva_bind(). > > > > > > > > > > > > > 'bond' was required before because we didn't have a domain to wrap > > > > the page table at that time. > > > > > > > > Now we have a domain and it is 1:1 associated to bond. Probably > > > > make sense now by just returning the domain as the sva handle > > > > instead? > > > > > > It also includes the device information that the domain has been > > > attached. So the sva_unbind() looks like this: > > > > > > /** > > > * iommu_sva_unbind_device() - Remove a bond created with > > > iommu_sva_bind_device > > > * @handle: the handle returned by iommu_sva_bind_device() > > > * > > > * Put reference to a bond between device and address space. The device > > > should > > > * not be issuing any more transaction for this PASID. 
All outstanding
> > > page
> > > * requests for this PASID must have been flushed to the IOMMU.
> > > */
> > > void iommu_sva_unbind_device(struct iommu_sva *handle)
> > >
> > > It's fine to replace the iommu_sva with iommu_sva_domain for sva handle,
> > > if we can include the device in the unbind() interface.
> >
> > Why would we have a special unbind for SVA?
>
> It's about SVA kAPI for device drivers. The existing kAPIs include:
>
> struct iommu_sva *iommu_sva_bind_device(struct device *dev,
>                                         struct mm_struct *mm,
>                                         void *drvdata);
> void iommu_sva_unbind_device(struct iommu_sva *handle);
> u32 iommu_sva_get_pasid(struct iommu_sva *handle);

This is not what we agreed the API should be. We agreed:

iommu_sva_domain_alloc()
iommu_attach_device_pasid()
iommu_detach_device_pasid()

Again, SVA should not be different from normal domain stuff.

Jason
Re: [PATCH v6 08/12] iommu/sva: Use attach/detach_pasid_dev in SVA interfaces
On 2022/5/12 19:48, Jason Gunthorpe wrote: On Thu, May 12, 2022 at 01:17:08PM +0800, Baolu Lu wrote: On 2022/5/12 13:01, Tian, Kevin wrote: From: Baolu Lu Sent: Thursday, May 12, 2022 11:03 AM On 2022/5/11 22:53, Jason Gunthorpe wrote: Also, given the current arrangement it might make sense to have a struct iommu_domain_sva given that no driver is wrappering this in something else. Fair enough. How about below wrapper? +struct iommu_sva_domain { + /* +* Common iommu domain header,*must* be put at the top +* of the structure. +*/ + struct iommu_domain domain; + struct mm_struct *mm; + struct iommu_sva bond; +} The refcount is wrapped in bond. I'm still not sure that bond is necessary "bond" is the sva handle that the device drivers get through calling iommu_sva_bind(). 'bond' was required before because we didn't have a domain to wrap the page table at that time. Now we have a domain and it is 1:1 associated to bond. Probably make sense now by just returning the domain as the sva handle instead? It also includes the device information that the domain has been attached. So the sva_unbind() looks like this: /** * iommu_sva_unbind_device() - Remove a bond created with iommu_sva_bind_device * @handle: the handle returned by iommu_sva_bind_device() * * Put reference to a bond between device and address space. The device should * not be issuing any more transaction for this PASID. All outstanding page * requests for this PASID must have been flushed to the IOMMU. */ void iommu_sva_unbind_device(struct iommu_sva *handle) It's fine to replace the iommu_sva with iommu_sva_domain for sva handle, if we can include the device in the unbind() interface. Why would we have a special unbind for SVA? It's about SVA kAPI for device drivers. 
The existing kAPIs include:

struct iommu_sva *iommu_sva_bind_device(struct device *dev,
                                        struct mm_struct *mm,
                                        void *drvdata);
void iommu_sva_unbind_device(struct iommu_sva *handle);
u32 iommu_sva_get_pasid(struct iommu_sva *handle);

SVA should not be different from normal domains; it should use the normal
detach flow too.

Best regards,
baolu
Re: [PATCH] vfio: Remove VFIO_TYPE1_NESTING_IOMMU
On Thu, May 12, 2022 at 07:57:26PM +0800, zhangfei@foxmail.com wrote:
>
> On 2022/5/12 7:32 PM, Jason Gunthorpe via iommu wrote:
> > On Thu, May 12, 2022 at 03:07:09PM +0800, Zhangfei Gao wrote:
> > > > I can't help feeling a little wary about removing this until IOMMUFD
> > > > can actually offer a functional replacement - is it in the way of
> > > > anything upcoming?
> > > From an upstream perspective if someone has a patched kernel to
> > > complete the feature, then they can patch this part in as well, we
> > > should not carry dead code like this in the kernel and in the uapi.
> > >
> > > It is not directly in the way, but this needs to get done at some
> > > point, I'd rather just get it out of the way.
> > We are using this interface for nested mode.
> How are you using it? It doesn't do anything. Do you have out of tree
> patches as well?
>
> Yes, there are some patches out of tree, since they have been pending
> for almost a year.
>
> By the way, I am trying to test nesting mode based on iommufd, still
> requires iommu_enable_nesting,
> which is removed in this patch as well.

This is temporary.

> So can we wait for the alternative patch before removing them?

Do you see a problem with patching this along with all the other patches
you need?

Jason
Re: [PATCH] vfio: Remove VFIO_TYPE1_NESTING_IOMMU
On 2022/5/12 7:32 PM, Jason Gunthorpe via iommu wrote:
On Thu, May 12, 2022 at 03:07:09PM +0800, Zhangfei Gao wrote:

I can't help feeling a little wary about removing this until IOMMUFD can
actually offer a functional replacement - is it in the way of anything
upcoming?

From an upstream perspective if someone has a patched kernel to complete
the feature, then they can patch this part in as well, we should not
carry dead code like this in the kernel and in the uapi.

It is not directly in the way, but this needs to get done at some point,
I'd rather just get it out of the way.

We are using this interface for nested mode.

How are you using it? It doesn't do anything. Do you have out of tree
patches as well?

Yes, there are some patches out of tree, since they have been pending for
almost a year.

By the way, I am trying to test nesting mode based on iommufd, which
still requires iommu_enable_nesting, and that is removed in this patch as
well. So can we wait for the alternative patch before removing them?

Thanks
Re: [PATCH v3 1/4] iommu/vt-d: Implement domain ops for attach_dev_pasid
On Thu, May 12, 2022 at 02:22:03PM +0800, Baolu Lu wrote:
> Hi Jason,
>
> On 2022/5/12 01:00, Jason Gunthorpe wrote:
> > > Consolidate pasid programming into dev_set_pasid() then called by both
> > > intel_svm_attach_dev_pasid() and intel_iommu_attach_dev_pasid(), right?
> > I was only suggesting that really dev_attach_pasid() op is misnamed,
> > it should be called set_dev_pasid() and act like a set, not a paired
> > attach/detach - same as the non-PASID ops.
>
> So,
>
> "set_dev_pasid(domain, device, pasid)" equals to dev_attach_pasid()
>
> and
>
> "set_dev_pasid(NULL, device, pasid)" equals to dev_detach_pasid()?
>
> do I understand it right?

blocking_domain should be passed, not null

Jason
Re: [PATCH v6 08/12] iommu/sva: Use attach/detach_pasid_dev in SVA interfaces
On Thu, May 12, 2022 at 11:02:39AM +0800, Baolu Lu wrote:
> > > +	mutex_lock(&group->mutex);
> > > +	domain = xa_load(&group->pasid_array, pasid);
> > > +	if (domain && domain->type != type)
> > > +		domain = NULL;
> > > +	mutex_unlock(&group->mutex);
> > > +	iommu_group_put(group);
> > > +
> > > +	return domain;
> > This is bad locking, group->pasid_array values cannot be taken outside
> > the lock.
>
> It's not iommu core, but SVA (or other feature components) that manage
> the life cycle of a domain. The iommu core only provides a place to
> store the domain pointer. The feature components are free to fetch their
> domain pointers from iommu core as long as they are sure that the domain
> is alive during use.

I'm not convinced.

Jason
Re: [PATCH v6 03/12] iommu: Add attach/detach_dev_pasid domain ops
On Thu, May 12, 2022 at 08:00:59AM +0100, Jean-Philippe Brucker wrote:
> > It is not "missing" it is just renamed to
> > blocking_domain->ops->set_dev_pasid()
> >
> > The implementation of that function would be identical to
> > detach_dev_pasid.
>
> attach(dev, pasid, sva_domain)
> detach(dev, pasid, sva_domain)
>
> versus
>
> set_dev_pasid(dev, pasid, sva_domain)
> set_dev_pasid(dev, pasid, blocking)
>
> we lose the information of the domain previously attached, and the SMMU
> driver has to retrieve it to find the ASID corresponding to the mm.

It would be easy to have the old domain be an input as well - the core
code knows it.

Jason
Re: [PATCH v6 08/12] iommu/sva: Use attach/detach_pasid_dev in SVA interfaces
On Thu, May 12, 2022 at 01:17:08PM +0800, Baolu Lu wrote: > On 2022/5/12 13:01, Tian, Kevin wrote: > > > From: Baolu Lu > > > Sent: Thursday, May 12, 2022 11:03 AM > > > > > > On 2022/5/11 22:53, Jason Gunthorpe wrote: > > > > > > Also, given the current arrangement it might make sense to have a > > > > > > struct iommu_domain_sva given that no driver is wrappering this in > > > > > > something else. > > > > > Fair enough. How about below wrapper? > > > > > > > > > > +struct iommu_sva_domain { > > > > > + /* > > > > > +* Common iommu domain header,*must* be put at the top > > > > > +* of the structure. > > > > > +*/ > > > > > + struct iommu_domain domain; > > > > > + struct mm_struct *mm; > > > > > + struct iommu_sva bond; > > > > > +} > > > > > > > > > > The refcount is wrapped in bond. > > > > I'm still not sure that bond is necessary > > > > > > "bond" is the sva handle that the device drivers get through calling > > > iommu_sva_bind(). > > > > > > > 'bond' was required before because we didn't have a domain to wrap > > the page table at that time. > > > > Now we have a domain and it is 1:1 associated to bond. Probably > > make sense now by just returning the domain as the sva handle > > instead? > > It also includes the device information that the domain has been > attached. So the sva_unbind() looks like this: > > /** > * iommu_sva_unbind_device() - Remove a bond created with > iommu_sva_bind_device > * @handle: the handle returned by iommu_sva_bind_device() > * > * Put reference to a bond between device and address space. The device > should > * not be issuing any more transaction for this PASID. All outstanding page > * requests for this PASID must have been flushed to the IOMMU. > */ > void iommu_sva_unbind_device(struct iommu_sva *handle) > > It's fine to replace the iommu_sva with iommu_sva_domain for sva handle, > if we can include the device in the unbind() interface. Why would we have a special unbind for SVA? 
SVA should not be different from normal domains; it should use the normal
detach flow too.

Jason
Re: [PATCH] vfio: Remove VFIO_TYPE1_NESTING_IOMMU
On Thu, May 12, 2022 at 03:07:09PM +0800, Zhangfei Gao wrote:
> > > I can't help feeling a little wary about removing this until IOMMUFD
> > > can actually offer a functional replacement - is it in the way of
> > > anything upcoming?
> > From an upstream perspective if someone has a patched kernel to
> > complete the feature, then they can patch this part in as well, we
> > should not carry dead code like this in the kernel and in the uapi.
> >
> > It is not directly in the way, but this needs to get done at some
> > point, I'd rather just get it out of the way.
>
> We are using this interface for nested mode.

How are you using it? It doesn't do anything. Do you have out of tree
patches as well?

Jason
Re: [PATCH] vfio: Remove VFIO_TYPE1_NESTING_IOMMU
Hi, Jason On 2022/5/11 上午2:13, Jason Gunthorpe via iommu wrote: On Tue, May 10, 2022 at 06:52:06PM +0100, Robin Murphy wrote: On 2022-05-10 17:55, Jason Gunthorpe via iommu wrote: This control causes the ARM SMMU drivers to choose a stage 2 implementation for the IO pagetable (vs the stage 1 usual default), however this choice has no visible impact to the VFIO user. Further qemu never implemented this and no other userspace user is known. The original description in commit f5c9ecebaf2a ("vfio/iommu_type1: add new VFIO_TYPE1_NESTING_IOMMU IOMMU type") suggested this was to "provide SMMU translation services to the guest operating system" however the rest of the API to set the guest table pointer for the stage 1 was never completed, or at least never upstreamed, rendering this part useless dead code. Since the current patches to enable nested translation, aka userspace page tables, rely on iommufd and will not use the enable_nesting() iommu_domain_op, remove this infrastructure. However, don't cut too deep into the SMMU drivers for now expecting the iommufd work to pick it up - we still need to create S2 IO page tables. Remove VFIO_TYPE1_NESTING_IOMMU and everything under it including the enable_nesting iommu_domain_op. Just in-case there is some userspace using this continue to treat requesting it as a NOP, but do not advertise support any more. I assume the nested translation/guest SVA patches that Eric and Vivek were working on pre-IOMMUFD made use of this, and given that they got quite far along, I wouldn't be too surprised if some eager cloud vendors might have even deployed something based on the patches off the list. With upstream there is no way to make use of this flag, if someone is using it they have other out of tree kernel, vfio, kvm and qemu patches to make it all work. 
You can see how much is still needed in Eric's tree:

https://github.com/eauger/linux/commits/v5.15-rc7-nested-v16

I can't help feeling a little wary about removing this until IOMMUFD can
actually offer a functional replacement - is it in the way of anything
upcoming?

From an upstream perspective if someone has a patched kernel to complete
the feature, then they can patch this part in as well, we should not
carry dead code like this in the kernel and in the uapi.

It is not directly in the way, but this needs to get done at some point,
I'd rather just get it out of the way.

We are using this interface for nested mode.

Is the replacement patch ready? Can we hold off removing this until the
new one is submitted?

Thanks
Re: [PATCH v6 03/12] iommu: Add attach/detach_dev_pasid domain ops
On Wed, May 11, 2022 at 09:02:40AM -0300, Jason Gunthorpe wrote: > On Wed, May 11, 2022 at 08:54:39AM +0100, Jean-Philippe Brucker wrote: > > > > > Then 'detach pasid' is: > > > > > > > > > > iommu_ops->blocking_domain->ops->attach_dev_pasid(domain, dev, > > > > pasid); > > > > > > > > > > And we move away from the notion of 'detach' and in the direction that > > > > > everything continuously has a domain set. PASID would logically > > > > > default to blocking_domain, though we wouldn't track this anywhere. > > > > > > > > I am not sure whether we still need to keep the blocking domain concept > > > > when we are entering the new PASID world. Please allow me to wait and > > > > listen to more opinions. > > > > > > > > > > I'm with Jason on this direction. In concept after a PASID is detached > > > it's > > > essentially blocked. Implementation-wise it doesn't prevent the iommu > > > driver from marking the PASID entry as non-present as doing in this > > > series instead of actually pointing to the empty page table of the block > > > domain. But api-wise it does make the entire semantics more consistent. > > > > This is all internal to IOMMU so I don't think we should be concerned > > about API consistency. I prefer a straighforward detach() operation > > because that way IOMMU drivers don't have to keep track of which domain is > > attached to which PASID. That code can be factored into the IOMMU core. > > Why would a driver need to keep additional tracking? > > > In addition to clearing contexts, detach() also needs to invalidate TLBs, > > and for that the SMMU driver needs to know the old ASID (!= PASID) that > > was used by the context descriptor. We can certainly work around a missing > > detach() to implement this, but it will be convoluted. > > It is not "missing" it is just renamed to > blocking_domain->ops->set_dev_pasid() > > The implementation of that function would be identical to > detach_dev_pasid. 
attach(dev, pasid, sva_domain)
detach(dev, pasid, sva_domain)

versus

set_dev_pasid(dev, pasid, sva_domain)
set_dev_pasid(dev, pasid, blocking)

we lose the information of the domain previously attached, and the SMMU
driver has to retrieve it to find the ASID corresponding to the mm.

Thanks,
Jean