Re: [PATCH v9 16/17] vduse: Introduce VDUSE - vDPA Device in Userspace

2021-07-13 Thread Michael S. Tsirkin
On Wed, Jul 14, 2021 at 01:45:39PM +0800, Jason Wang wrote:
> > +static int vduse_dev_msg_sync(struct vduse_dev *dev,
> > + struct vduse_dev_msg *msg)
> > +{
> > +   int ret;
> > +
> > +   init_waitqueue_head(&msg->waitq);
> > +   spin_lock(&dev->msg_lock);
> > +   msg->req.request_id = dev->msg_unique++;
> > +   vduse_enqueue_msg(&dev->send_list, msg);
> > +   wake_up(&dev->waitq);
> > +   spin_unlock(&dev->msg_lock);
> > +
> > +   wait_event_killable_timeout(msg->waitq, msg->completed,
> > +   VDUSE_REQUEST_TIMEOUT * HZ);
> > +   spin_lock(&dev->msg_lock);
> > +   if (!msg->completed) {
> > +   list_del(&msg->list);
> > +   msg->resp.result = VDUSE_REQ_RESULT_FAILED;
> > +   }
> > +   ret = (msg->resp.result == VDUSE_REQ_RESULT_OK) ? 0 : -EIO;
> 
> 
> I think we should mark the device as malfunctioning when there is a timeout
> and forbid any userspace operations except for destroy afterwards, for safety.

It looks like the behaviour would change completely if one tried to run
gdb on the program, because the kernel wants it to respond within a
specific time. Looks like a recipe for heisenbugs.

Let's not build interfaces with arbitrary timeouts like that.
Interruptible wait exists for this very reason. Let userspace have its
own code to set and use timers. This does mean that userspace will
likely have to change a bit to support this driver; such is life.

-- 
MST
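
For illustration, the untimed killable wait MST suggests could look like the
sketch below. It reuses the vduse_dev_msg fields from the quoted patch and is
only a sketch under those assumptions, not the actual driver code:

/* Hypothetical variant: no kernel-side timer; any timeout policy
 * lives in userspace instead.
 */
static int vduse_dev_msg_sync_killable(struct vduse_dev *dev,
				       struct vduse_dev_msg *msg)
{
	int ret;

	init_waitqueue_head(&msg->waitq);
	spin_lock(&dev->msg_lock);
	msg->req.request_id = dev->msg_unique++;
	vduse_enqueue_msg(&dev->send_list, msg);
	wake_up(&dev->waitq);
	spin_unlock(&dev->msg_lock);

	/* Block until userspace replies or the task gets a fatal signal,
	 * so attaching gdb to the daemon cannot expire a kernel timer.
	 */
	wait_event_killable(msg->waitq, msg->completed);

	spin_lock(&dev->msg_lock);
	if (!msg->completed) {
		/* Interrupted by a signal: withdraw the request. */
		list_del(&msg->list);
		msg->resp.result = VDUSE_REQ_RESULT_FAILED;
	}
	ret = (msg->resp.result == VDUSE_REQ_RESULT_OK) ? 0 : -EIO;
	spin_unlock(&dev->msg_lock);

	return ret;
}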



Re: [PATCH v9 03/17] vdpa: Fix code indentation

2021-07-13 Thread Yongji Xie
On Wed, Jul 14, 2021 at 12:20 PM Joe Perches  wrote:
>
> On Tue, 2021-07-13 at 16:46 +0800, Xie Yongji wrote:
> > Use tabs to indent the code instead of spaces.
>
> There are a lot more of these in this file.
>
> $ ./scripts/checkpatch.pl --fix-inplace --strict include/linux/vdpa.h
>
> and a little typing gives:
> ---
>  include/linux/vdpa.h | 50 +-
>  1 file changed, 25 insertions(+), 25 deletions(-)
>
> diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
> index 3357ac98878d4..14cd4248e59fd 100644
> --- a/include/linux/vdpa.h
> +++ b/include/linux/vdpa.h
> @@ -43,17 +43,17 @@ struct vdpa_vq_state_split {
>   * @last_used_idx: used index
>   */
>  struct vdpa_vq_state_packed {
> -u16last_avail_counter:1;
> -u16last_avail_idx:15;
> -u16last_used_counter:1;
> -u16last_used_idx:15;
> +   u16 last_avail_counter:1;
> +   u16 last_avail_idx:15;
> +   u16 last_used_counter:1;
> +   u16 last_used_idx:15;
>  };
>
>  struct vdpa_vq_state {
> - union {
> -  struct vdpa_vq_state_split split;
> -  struct vdpa_vq_state_packed packed;
> - };
> +   union {
> +   struct vdpa_vq_state_split split;
> +   struct vdpa_vq_state_packed packed;
> +   };
>  };
>
>  struct vdpa_mgmt_dev;
> @@ -131,7 +131,7 @@ struct vdpa_iova_range {
>   * @vdev: vdpa device
>   * @idx: virtqueue index
>   * @state: pointer to returned state 
> (last_avail_idx)
> - * @get_vq_notification:   Get the notification area for a virtqueue
> + * @get_vq_notification:   Get the notification area for a virtqueue
>   * @vdev: vdpa device
>   * @idx: virtqueue index
>   * Returns the notifcation area
> @@ -277,13 +277,13 @@ struct vdpa_device *__vdpa_alloc_device(struct device 
> *parent,
> const struct vdpa_config_ops *config,
> size_t size, const char *name);
>
> -#define vdpa_alloc_device(dev_struct, member, parent, config, name)   \
> - container_of(__vdpa_alloc_device( \
> -  parent, config, \
> -  sizeof(dev_struct) + \
> -  BUILD_BUG_ON_ZERO(offsetof( \
> -  dev_struct, member)), name), \
> -  dev_struct, member)
> +#define vdpa_alloc_device(dev_struct, member, parent, config, name)   \
> +   container_of(__vdpa_alloc_device(parent, config,                   \
> +                                    sizeof(dev_struct) +              \
> +                                    BUILD_BUG_ON_ZERO(offsetof(dev_struct, \
> +                                                               member)), \
> +                                    name),                            \
> +                dev_struct, member)
>
>  int vdpa_register_device(struct vdpa_device *vdev, int nvqs);
>  void vdpa_unregister_device(struct vdpa_device *vdev);
> @@ -308,8 +308,8 @@ struct vdpa_driver {
>  int __vdpa_register_driver(struct vdpa_driver *drv, struct module *owner);
>  void vdpa_unregister_driver(struct vdpa_driver *drv);
>
> -#define module_vdpa_driver(__vdpa_driver) \
> -   module_driver(__vdpa_driver, vdpa_register_driver,  \
> +#define module_vdpa_driver(__vdpa_driver)  \
> +   module_driver(__vdpa_driver, vdpa_register_driver,  \
>   vdpa_unregister_driver)
>
>  static inline struct vdpa_driver *drv_to_vdpa(struct device_driver *driver)
> @@ -339,25 +339,25 @@ static inline struct device *vdpa_get_dma_dev(struct 
> vdpa_device *vdev)
>
>  static inline void vdpa_reset(struct vdpa_device *vdev)
>  {
> -const struct vdpa_config_ops *ops = vdev->config;
> +   const struct vdpa_config_ops *ops = vdev->config;
>
> vdev->features_valid = false;
> -ops->set_status(vdev, 0);
> +   ops->set_status(vdev, 0);
>  }
>
>  static inline int vdpa_set_features(struct vdpa_device *vdev, u64 features)
>  {
> -const struct vdpa_config_ops *ops = vdev->config;
> +   const struct vdpa_config_ops *ops = vdev->config;
>
> vdev->features_valid = true;
> -return ops->set_features(vdev, features);
> +   return ops->set_features(vdev, features);
>  }
>
> -
> -static inline void vdpa_get_config(struct vdpa_device *vdev, unsigned offset,
> +static inline void vdpa_get_config(struct vdpa_device *vdev,
> +  unsigned int offset,
>void *buf, unsigned int len)
>  {
> -const struct vdpa_config_ops *ops = vdev->config;
> +   const struct vdpa_config_ops *ops = vdev->config;

Re: [PATCH v9 16/17] vduse: Introduce VDUSE - vDPA Device in Userspace

2021-07-13 Thread Jason Wang


On 2021/7/13 4:46 PM, Xie Yongji wrote:

This VDUSE driver enables implementing software-emulated vDPA
devices in userspace. The vDPA device is created by
ioctl(VDUSE_CREATE_DEV) on /dev/vduse/control. Then a char device
interface (/dev/vduse/$NAME) is exported to userspace for device
emulation.

In order to make the device emulation more secure, the device's
control path is handled in the kernel. A message mechanism is
introduced to forward some dataplane-related control messages to
userspace.

In the data path, the DMA buffer is mapped into the userspace
address space in different ways depending on the vDPA bus to which
the vDPA device is attached. In the virtio-vdpa case, the MMU-based
IOMMU driver is used to achieve that. In the vhost-vdpa case, the
DMA buffer resides in a userspace memory region which can be shared
with the VDUSE userspace process by transferring the shmfd.

For more details on VDUSE design and usage, please see the follow-on
Documentation commit.

Signed-off-by: Xie Yongji 
---
  Documentation/userspace-api/ioctl/ioctl-number.rst |1 +
  drivers/vdpa/Kconfig   |   10 +
  drivers/vdpa/Makefile  |1 +
  drivers/vdpa/vdpa_user/Makefile|5 +
  drivers/vdpa/vdpa_user/vduse_dev.c | 1502 
  include/uapi/linux/vduse.h |  221 +++
  6 files changed, 1740 insertions(+)
  create mode 100644 drivers/vdpa/vdpa_user/Makefile
  create mode 100644 drivers/vdpa/vdpa_user/vduse_dev.c
  create mode 100644 include/uapi/linux/vduse.h
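
For a rough illustration of the control path described above, userspace
device creation could look like the hypothetical sketch below. The exact
struct vduse_dev_config layout is defined by the uapi header this patch
adds, so treat the field names here as assumptions:

#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vduse.h>

static int create_vduse_device(void)
{
	struct vduse_dev_config config;
	int ctrl_fd = open("/dev/vduse/control", O_RDWR);

	if (ctrl_fd < 0)
		return -1;

	memset(&config, 0, sizeof(config));
	strncpy(config.name, "vduse-blk0", sizeof(config.name) - 1);
	/* ... fill in device_id, features, vq_num, etc. ... */

	if (ioctl(ctrl_fd, VDUSE_CREATE_DEV, &config) < 0)
		return -1;

	/* The per-device char device is then exported for emulation: */
	return open("/dev/vduse/vduse-blk0", O_RDWR);
}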

diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst 
b/Documentation/userspace-api/ioctl/ioctl-number.rst
index 1409e40e6345..293ca3aef358 100644
--- a/Documentation/userspace-api/ioctl/ioctl-number.rst
+++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
@@ -300,6 +300,7 @@ Code  Seq#Include File  
 Comments
  'z'   10-4F  drivers/s390/crypto/zcrypt_api.hconflict!
  '|'   00-7F  linux/media.h
  0x80  00-1F  linux/fb.h
+0x81  00-1F  linux/vduse.h
  0x89  00-06  arch/x86/include/asm/sockios.h
  0x89  0B-DF  linux/sockios.h
  0x89  E0-EF  linux/sockios.h 
SIOCPROTOPRIVATE range
diff --git a/drivers/vdpa/Kconfig b/drivers/vdpa/Kconfig
index a503c1b2bfd9..6e23bce6433a 100644
--- a/drivers/vdpa/Kconfig
+++ b/drivers/vdpa/Kconfig
@@ -33,6 +33,16 @@ config VDPA_SIM_BLOCK
  vDPA block device simulator which terminates IO request in a
  memory buffer.
  
+config VDPA_USER
+   tristate "VDUSE (vDPA Device in Userspace) support"
+   depends on EVENTFD && MMU && HAS_DMA
+   select DMA_OPS
+   select VHOST_IOTLB
+   select IOMMU_IOVA
+   help
+ With VDUSE it is possible to emulate a vDPA Device
+ in a userspace program.
+
  config IFCVF
tristate "Intel IFC VF vDPA driver"
depends on PCI_MSI
diff --git a/drivers/vdpa/Makefile b/drivers/vdpa/Makefile
index 67fe7f3d6943..f02ebed33f19 100644
--- a/drivers/vdpa/Makefile
+++ b/drivers/vdpa/Makefile
@@ -1,6 +1,7 @@
  # SPDX-License-Identifier: GPL-2.0
  obj-$(CONFIG_VDPA) += vdpa.o
  obj-$(CONFIG_VDPA_SIM) += vdpa_sim/
+obj-$(CONFIG_VDPA_USER) += vdpa_user/
  obj-$(CONFIG_IFCVF)     += ifcvf/
  obj-$(CONFIG_MLX5_VDPA) += mlx5/
  obj-$(CONFIG_VP_VDPA)   += virtio_pci/
diff --git a/drivers/vdpa/vdpa_user/Makefile b/drivers/vdpa/vdpa_user/Makefile
new file mode 100644
index ..260e0b26af99
--- /dev/null
+++ b/drivers/vdpa/vdpa_user/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0
+
+vduse-y := vduse_dev.o iova_domain.o
+
+obj-$(CONFIG_VDPA_USER) += vduse.o
diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c 
b/drivers/vdpa/vdpa_user/vduse_dev.c
new file mode 100644
index ..c994a4a4660c
--- /dev/null
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -0,0 +1,1502 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * VDUSE: vDPA Device in Userspace
+ *
+ * Copyright (C) 2020-2021 Bytedance Inc. and/or its affiliates. All rights 
reserved.
+ *
+ * Author: Xie Yongji 
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "iova_domain.h"
+
+#define DRV_AUTHOR   "Yongji Xie "
+#define DRV_DESC "vDPA Device in Userspace"
+#define DRV_LICENSE  "GPL v2"
+
+#define VDUSE_DEV_MAX (1U << MINORBITS)
+#define VDUSE_MAX_BOUNCE_SIZE (64 * 1024 * 1024)
+#define VDUSE_IOVA_SIZE (128 * 1024 * 1024)
+#define VDUSE_REQUEST_TIMEOUT 30
+
+struct vduse_virtqueue {
+   u16 index;
+   u16 num_max;
+   u32 num;
+   u64 desc_addr;
+   u64 driver_addr;
+   u64 device_addr;
+   struct vdpa_vq_state state;
+   bool ready;
+   bool kicked;
+   spinlock_t kick_lock;
+   spinlock_t irq_lock;
+   

Re: [PATCH v9 16/17] vduse: Introduce VDUSE - vDPA Device in Userspace

2021-07-13 Thread Yongji Xie
On Wed, Jul 14, 2021 at 10:54 AM Jason Wang  wrote:
>
>
> On 2021/7/13 9:27 PM, Dan Carpenter wrote:
> > On Tue, Jul 13, 2021 at 04:46:55PM +0800, Xie Yongji wrote:
> >> +static int vduse_dev_init_vdpa(struct vduse_dev *dev, const char *name)
> >> +{
> >> +struct vduse_vdpa *vdev;
> >> +int ret;
> >> +
> >> +if (dev->vdev)
> >> +return -EEXIST;
> >> +
> >> +vdev = vdpa_alloc_device(struct vduse_vdpa, vdpa, dev->dev,
> >> + &vduse_vdpa_config_ops, name, true);
> >> +if (!vdev)
> >> +return -ENOMEM;
> > This should be an IS_ERR() check instead of a NULL check.
>
>
> Yes.
>
>
> >
> > The vdpa_alloc_device() macro is doing something very complicated but
> > I'm not sure what.  It calls container_of() and that looks buggy until
> > you spot the BUILD_BUG_ON_ZERO() compile time assert which ensures that
> > the container_of() is a no-op.
> >
> > Only one of the callers checks for error pointers correctly so maybe
> > it's too complicated or maybe there should be better documentation.
>
>
> We need better documentation for this macro and fix all the buggy callers.
>
> Yong Ji, want to do that?
>

Sure, I will send the fix soon.

Thanks,
Yongji
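
For reference, the calling convention Dan describes would look like the
sketch below (the call mirrors the quoted code; treat it as illustrative):

	vdev = vdpa_alloc_device(struct vduse_vdpa, vdpa, dev->dev,
				 &vduse_vdpa_config_ops, name, true);
	if (IS_ERR(vdev))
		return PTR_ERR(vdev);

The BUILD_BUG_ON_ZERO(offsetof(dev_struct, member)) inside the macro asserts
that the embedded struct vdpa_device is the first member, which is why the
container_of() is a no-op and passing an ERR_PTR value through it still
works with IS_ERR()/PTR_ERR().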

Re: [PATCH v9 13/17] vdpa: factor out vhost_vdpa_pa_map() and vhost_vdpa_pa_unmap()

2021-07-13 Thread Yongji Xie
On Tue, Jul 13, 2021 at 7:31 PM Dan Carpenter  wrote:
>
> On Tue, Jul 13, 2021 at 04:46:52PM +0800, Xie Yongji wrote:
> > @@ -613,37 +618,28 @@ static void vhost_vdpa_unmap(struct vhost_vdpa *v, 
> > u64 iova, u64 size)
> >   }
> >  }
> >
> > -static int vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v,
> > -struct vhost_iotlb_msg *msg)
> > +static int vhost_vdpa_pa_map(struct vhost_vdpa *v,
> > +  u64 iova, u64 size, u64 uaddr, u32 perm)
> >  {
> >   struct vhost_dev *dev = &v->vdev;
> > - struct vhost_iotlb *iotlb = dev->iotlb;
> >   struct page **page_list;
> >   unsigned long list_size = PAGE_SIZE / sizeof(struct page *);
> >   unsigned int gup_flags = FOLL_LONGTERM;
> >   unsigned long npages, cur_base, map_pfn, last_pfn = 0;
> >   unsigned long lock_limit, sz2pin, nchunks, i;
> > - u64 iova = msg->iova;
> > + u64 start = iova;
> >   long pinned;
> >   int ret = 0;
> >
> > - if (msg->iova < v->range.first ||
> > - msg->iova + msg->size - 1 > v->range.last)
> > - return -EINVAL;
>
> This is not related to your patch, but can the "msg->iova + msg->size"
> addition can have an integer overflow.  From looking at the callers it
> seems like it can.  msg comes from:
>   vhost_chr_write_iter()
>   --> dev->msg_handler(dev, &msg);
>   --> vhost_vdpa_process_iotlb_msg()
>  --> vhost_vdpa_process_iotlb_update()
>
> If I'm thinking of the right thing then these are allowed to overflow to
> 0 because of the " - 1" but not further than that.  I believe the check
> needs to be something like:
>
> if (msg->iova < v->range.first ||
> msg->iova - 1 > U64_MAX - msg->size ||
> msg->iova + msg->size - 1 > v->range.last)
>

Make sense.

> But writing integer overflow checks correctly is notoriously difficult.
> Do you think you could send a fix for that which is separate from the
> patchset?  We'd want to backport it to stable.
>

OK, I will send a patch to fix it.

Thanks,
Yongji
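
To make the hazard concrete: with u64 iova and size, "msg->iova + msg->size
- 1" can wrap. For example, iova = 0x1000 with size = U64_MAX - 0xffe makes
iova + size - 1 wrap around to 0, which sails past a plain "> v->range.last"
test. A sketch of the fixed check, using Dan's extra clause:

	if (msg->iova < v->range.first ||
	    msg->iova - 1 > U64_MAX - msg->size ||
	    msg->iova + msg->size - 1 > v->range.last)
		return -EINVAL;

For the example above, the middle clause rejects the message, since
msg->iova - 1 == 0xfff is greater than U64_MAX - msg->size == 0xffe.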


Re: [PATCH] iommu/amd: Enable swiotlb if any device supports iommu v2 and uses identity mapping

2021-07-13 Thread Kai-Heng Feng
On Wed, Jul 14, 2021 at 7:57 AM Konrad Rzeszutek Wilk  wrote:
>
> On Thu, Jul 08, 2021 at 03:43:42PM +0100, Robin Murphy wrote:
> > On 2021-07-08 14:57, Kai-Heng Feng wrote:
> > > On Thu, Jul 8, 2021 at 6:18 PM Robin Murphy  wrote:
> > > >
> > > > On 2021-07-08 10:28, Joerg Roedel wrote:
> > > > > On Thu, Jul 08, 2021 at 03:42:32PM +0800, Kai-Heng Feng wrote:
> > > > > > @@ -344,6 +344,9 @@ static int iommu_init_device(struct device *dev)
> > > > > >
> > > > > >   iommu = amd_iommu_rlookup_table[dev_data->devid];
> > > > > >   dev_data->iommu_v2 = iommu->is_iommu_v2;
> > > > > > +
> > > > > > +if (dev_data->iommu_v2)
> > > > > > +swiotlb = 1;
> > > > >
> > > > > This looks like the big hammer, as it will affect all other systems
> > > > > where the AMD GPUs are in their own group.
> > > > >
> > > > > What is needed here is an explicit check whether a non-iommu-v2 device
> > > > > is direct-mapped because it shares a group with the GPU, and only 
> > > > > enable
> > > > > swiotlb in this case.
> > > >
> > > > Right, it's basically about whether any DMA-limited device might at any
> > > > time end up in an IOMMU_DOMAIN_IDENTITY domain. And given the
> > > > possibility of device hotplug and the user being silly with the sysfs
> > > > interface, I don't think we can categorically determine that at boot 
> > > > time.
> > > >
> > > > Also note that Intel systems are likely to be similarly affected (in
> > > > fact intel-iommu doesn't even have the iommu_default_passthough() check
> > > > so it's probably even easier to blow up).
> > >
> > > swiotlb is enabled by pci_swiotlb_detect_4gb() and intel-iommu doesn't
> > > disable it.
> >
> > Oh, right... I did say I found this dance hard to follow. Clearly I
> > shouldn't have trusted what I thought I remembered from looking at it
> > yesterday :)
> >
> > Also not helped by the fact that it sets iommu_detected which *does* disable
> > SWIOTLB, but only on IA-64.
> >
> > > I wonder if we can take the same approach in amd-iommu?
> >
> > Certainly if there's a precedent for leaving SWIOTLB enabled even if it
> > *might* be redundant, that seems like the easiest option (it's what we do on
> > arm64 too, but then we have system topologies where some devices may not be
> > behind IOMMUs even when others are). More fun would be to try to bring it up
> > at the first sign of IOMMU_DOMAIN_IDENTITY if it was disabled previously,
> > but I don't have the highest hope of that being practical.
>
> 
> It is kind of silly to enable SWIOTLB which will just eat 64MB of memory
> "just in case".
>
> The SWIOTLB does have support to do late initialization (xen-pcifront
> does that for example - so if you add devices that can't do 64-bit it
> will allocate something like 4MB).
>
> Would that be a better choice going forward - that is allocate this
> under those conditions?

But how do we practically do swiotlb late init for 32-bit capable devices?
On the first DMA map requested by the driver?

Kai-Heng

> >
> > Robin.
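
As a side note, a late-init sketch along the lines Konrad mentions might
look like the code below. It assumes the existing
swiotlb_late_init_with_default_size() helper (the one the Xen pcifront path
uses); whether this can really be deferred to the first DMA map, as asked
above, is exactly the open question:

#include <linux/dma-mapping.h>
#include <linux/swiotlb.h>

/* Hypothetical policy: only allocate bounce buffers once a DMA-limited
 * device actually appears. A real version would have to guard against
 * running twice and against atomic context.
 */
static int maybe_init_swiotlb_late(struct device *dev)
{
	if (dma_get_mask(dev) < DMA_BIT_MASK(64))
		return swiotlb_late_init_with_default_size(4 << 20); /* 4 MiB */
	return 0;
}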


[PATCH v6 11/11] arm64: dts: mediatek: Get rid of mediatek,larb for MM nodes

2021-07-13 Thread Yong Wu
After adding device_link between the IOMMU consumer and smi,
the mediatek,larb property is unnecessary now.

CC: Matthias Brugger 
Signed-off-by: Yong Wu 
Reviewed-by: Evan Green 
---
 arch/arm64/boot/dts/mediatek/mt8173.dtsi | 16 
 arch/arm64/boot/dts/mediatek/mt8183.dtsi |  6 --
 2 files changed, 22 deletions(-)

diff --git a/arch/arm64/boot/dts/mediatek/mt8173.dtsi 
b/arch/arm64/boot/dts/mediatek/mt8173.dtsi
index 2f0fc1e317d7..cf5d26db82b8 100644
--- a/arch/arm64/boot/dts/mediatek/mt8173.dtsi
+++ b/arch/arm64/boot/dts/mediatek/mt8173.dtsi
@@ -1009,7 +1009,6 @@
 < CLK_MM_MUTEX_32K>;
power-domains = < MT8173_POWER_DOMAIN_MM>;
iommus = < M4U_PORT_MDP_RDMA0>;
-   mediatek,larb = <>;
};
 
mdp_rdma1: rdma@14002000 {
@@ -1019,7 +1018,6 @@
 < CLK_MM_MUTEX_32K>;
power-domains = < MT8173_POWER_DOMAIN_MM>;
iommus = < M4U_PORT_MDP_RDMA1>;
-   mediatek,larb = <>;
};
 
mdp_rsz0: rsz@14003000 {
@@ -1049,7 +1047,6 @@
clocks = < CLK_MM_MDP_WDMA>;
power-domains = < MT8173_POWER_DOMAIN_MM>;
iommus = < M4U_PORT_MDP_WDMA>;
-   mediatek,larb = <>;
};
 
mdp_wrot0: wrot@14007000 {
@@ -1058,7 +1055,6 @@
clocks = < CLK_MM_MDP_WROT0>;
power-domains = < MT8173_POWER_DOMAIN_MM>;
iommus = < M4U_PORT_MDP_WROT0>;
-   mediatek,larb = <>;
};
 
mdp_wrot1: wrot@14008000 {
@@ -1067,7 +1063,6 @@
clocks = < CLK_MM_MDP_WROT1>;
power-domains = < MT8173_POWER_DOMAIN_MM>;
iommus = < M4U_PORT_MDP_WROT1>;
-   mediatek,larb = <>;
};
 
ovl0: ovl@1400c000 {
@@ -1077,7 +1072,6 @@
power-domains = < MT8173_POWER_DOMAIN_MM>;
clocks = < CLK_MM_DISP_OVL0>;
iommus = < M4U_PORT_DISP_OVL0>;
-   mediatek,larb = <>;
mediatek,gce-client-reg = < SUBSYS_1400 0xc000 
0x1000>;
};
 
@@ -1088,7 +1082,6 @@
power-domains = < MT8173_POWER_DOMAIN_MM>;
clocks = < CLK_MM_DISP_OVL1>;
iommus = < M4U_PORT_DISP_OVL1>;
-   mediatek,larb = <>;
mediatek,gce-client-reg = < SUBSYS_1400 0xd000 
0x1000>;
};
 
@@ -1099,7 +1092,6 @@
power-domains = < MT8173_POWER_DOMAIN_MM>;
clocks = < CLK_MM_DISP_RDMA0>;
iommus = < M4U_PORT_DISP_RDMA0>;
-   mediatek,larb = <>;
mediatek,gce-client-reg = < SUBSYS_1400 0xe000 
0x1000>;
};
 
@@ -1110,7 +1102,6 @@
power-domains = < MT8173_POWER_DOMAIN_MM>;
clocks = < CLK_MM_DISP_RDMA1>;
iommus = < M4U_PORT_DISP_RDMA1>;
-   mediatek,larb = <>;
mediatek,gce-client-reg = < SUBSYS_1400 0xf000 
0x1000>;
};
 
@@ -1121,7 +1112,6 @@
power-domains = < MT8173_POWER_DOMAIN_MM>;
clocks = < CLK_MM_DISP_RDMA2>;
iommus = < M4U_PORT_DISP_RDMA2>;
-   mediatek,larb = <>;
mediatek,gce-client-reg = < SUBSYS_1401 0 
0x1000>;
};
 
@@ -1132,7 +1122,6 @@
power-domains = < MT8173_POWER_DOMAIN_MM>;
clocks = < CLK_MM_DISP_WDMA0>;
iommus = < M4U_PORT_DISP_WDMA0>;
-   mediatek,larb = <>;
mediatek,gce-client-reg = < SUBSYS_1401 0x1000 
0x1000>;
};
 
@@ -1143,7 +1132,6 @@
power-domains = < MT8173_POWER_DOMAIN_MM>;
clocks = < CLK_MM_DISP_WDMA1>;
iommus = < M4U_PORT_DISP_WDMA1>;
-   mediatek,larb = <>;
mediatek,gce-client-reg = < SUBSYS_1401 0x2000 
0x1000>;
};
 
@@ -1394,7 +1382,6 @@
  <0 0x16027800 0 0x800>,   /* VDEC_HWB */
  <0 0x16028400 0 0x400>;   /* VDEC_HWG */
interrupts = ;
-   mediatek,larb = <>;
iommus = < M4U_PORT_HW_VDEC_MC_EXT>,
 < M4U_PORT_HW_VDEC_PP_EXT>,
 < M4U_PORT_HW_VDEC_AVC_MV_EXT>,
@@ -1462,7 +1449,6 @@
  

[PATCH v6 10/11] arm: dts: mediatek: Get rid of mediatek,larb for MM nodes

2021-07-13 Thread Yong Wu
After adding device_link between the IOMMU consumer and smi,
the mediatek,larb property is unnecessary now.

CC: Matthias Brugger 
Signed-off-by: Yong Wu 
Reviewed-by: Evan Green 
---
 arch/arm/boot/dts/mt2701.dtsi  | 2 --
 arch/arm/boot/dts/mt7623n.dtsi | 5 -
 2 files changed, 7 deletions(-)

diff --git a/arch/arm/boot/dts/mt2701.dtsi b/arch/arm/boot/dts/mt2701.dtsi
index 4776f85d6d5b..ef583cfd3baf 100644
--- a/arch/arm/boot/dts/mt2701.dtsi
+++ b/arch/arm/boot/dts/mt2701.dtsi
@@ -564,7 +564,6 @@
clock-names = "jpgdec-smi",
  "jpgdec";
power-domains = < MT2701_POWER_DOMAIN_ISP>;
-   mediatek,larb = <>;
iommus = < MT2701_M4U_PORT_JPGDEC_WDMA>,
 < MT2701_M4U_PORT_JPGDEC_BSDMA>;
};
@@ -577,7 +576,6 @@
clocks =  < CLK_IMG_VENC>;
clock-names = "jpgenc";
power-domains = < MT2701_POWER_DOMAIN_ISP>;
-   mediatek,larb = <>;
iommus = < MT2701_M4U_PORT_JPGENC_RDMA>,
 < MT2701_M4U_PORT_JPGENC_BSDMA>;
};
diff --git a/arch/arm/boot/dts/mt7623n.dtsi b/arch/arm/boot/dts/mt7623n.dtsi
index bcb0846e29fd..3adab5cd1fef 100644
--- a/arch/arm/boot/dts/mt7623n.dtsi
+++ b/arch/arm/boot/dts/mt7623n.dtsi
@@ -121,7 +121,6 @@
clock-names = "jpgdec-smi",
  "jpgdec";
power-domains = < MT2701_POWER_DOMAIN_ISP>;
-   mediatek,larb = <>;
iommus = < MT2701_M4U_PORT_JPGDEC_WDMA>,
 < MT2701_M4U_PORT_JPGDEC_BSDMA>;
};
@@ -144,7 +143,6 @@
interrupts = ;
clocks = < CLK_MM_DISP_OVL>;
iommus = < MT2701_M4U_PORT_DISP_OVL_0>;
-   mediatek,larb = <>;
};
 
rdma0: rdma@14008000 {
@@ -154,7 +152,6 @@
interrupts = ;
clocks = < CLK_MM_DISP_RDMA>;
iommus = < MT2701_M4U_PORT_DISP_RDMA>;
-   mediatek,larb = <>;
};
 
wdma@14009000 {
@@ -164,7 +161,6 @@
interrupts = ;
clocks = < CLK_MM_DISP_WDMA>;
iommus = < MT2701_M4U_PORT_DISP_WDMA>;
-   mediatek,larb = <>;
};
 
bls: pwm@1400a000 {
@@ -215,7 +211,6 @@
interrupts = ;
clocks = < CLK_MM_DISP_RDMA1>;
iommus = < MT2701_M4U_PORT_DISP_RDMA1>;
-   mediatek,larb = <>;
};
 
dpi0: dpi@14014000 {
-- 
2.18.0



[PATCH v6 09/11] memory: mtk-smi: Get rid of mtk_smi_larb_get/put

2021-07-13 Thread Yong Wu
After adding device_link between the iommu consumer and smi-larb,
the pm_runtime_get(_sync) of smi-larb and smi-common will be called
automatically, so we can get rid of mtk_smi_larb_get/put.

CC: Matthias Brugger 
Signed-off-by: Yong Wu 
Reviewed-by: Evan Green 
Acked-by: Krzysztof Kozlowski 
Acked-by: Matthias Brugger 
---
 drivers/memory/mtk-smi.c   | 14 --
 include/soc/mediatek/smi.h | 20 
 2 files changed, 34 deletions(-)

diff --git a/drivers/memory/mtk-smi.c b/drivers/memory/mtk-smi.c
index c5fb51f73b34..7c61c924e220 100644
--- a/drivers/memory/mtk-smi.c
+++ b/drivers/memory/mtk-smi.c
@@ -134,20 +134,6 @@ static void mtk_smi_clk_disable(const struct mtk_smi *smi)
clk_disable_unprepare(smi->clk_apb);
 }
 
-int mtk_smi_larb_get(struct device *larbdev)
-{
-   int ret = pm_runtime_resume_and_get(larbdev);
-
-   return (ret < 0) ? ret : 0;
-}
-EXPORT_SYMBOL_GPL(mtk_smi_larb_get);
-
-void mtk_smi_larb_put(struct device *larbdev)
-{
-   pm_runtime_put_sync(larbdev);
-}
-EXPORT_SYMBOL_GPL(mtk_smi_larb_put);
-
 static int
 mtk_smi_larb_bind(struct device *dev, struct device *master, void *data)
 {
diff --git a/include/soc/mediatek/smi.h b/include/soc/mediatek/smi.h
index 15e3397cec58..11f7d6b59642 100644
--- a/include/soc/mediatek/smi.h
+++ b/include/soc/mediatek/smi.h
@@ -19,26 +19,6 @@ struct mtk_smi_larb_iommu {
unsigned char  bank[32];
 };
 
-/*
- * mtk_smi_larb_get: Enable the power domain and clocks for this local arbiter.
- *   It also initialize some basic setting(like iommu).
- * mtk_smi_larb_put: Disable the power domain and clocks for this local 
arbiter.
- * Both should be called in non-atomic context.
- *
- * Returns 0 if successful, negative on failure.
- */
-int mtk_smi_larb_get(struct device *larbdev);
-void mtk_smi_larb_put(struct device *larbdev);
-
-#else
-
-static inline int mtk_smi_larb_get(struct device *larbdev)
-{
-   return 0;
-}
-
-static inline void mtk_smi_larb_put(struct device *larbdev) { }
-
 #endif
 
 #endif
-- 
2.18.0



[PATCH v6 08/11] media: mtk-vcodec: Get rid of mtk_smi_larb_get/put

2021-07-13 Thread Yong Wu
MediaTek IOMMU has already added the device_link between the consumer
and the smi-larb device. If the vcodec device calls pm_runtime_get_sync,
the smi-larb's pm_runtime_get_sync will also be called automatically.

CC: Tiffany Lin 
CC: Irui Wang 
Signed-off-by: Yong Wu 
Reviewed-by: Evan Green 
Acked-by: Tiffany Lin 
---
 .../platform/mtk-vcodec/mtk_vcodec_dec_pm.c   | 37 +++-
 .../platform/mtk-vcodec/mtk_vcodec_drv.h  |  3 --
 .../platform/mtk-vcodec/mtk_vcodec_enc.c  |  1 -
 .../platform/mtk-vcodec/mtk_vcodec_enc_pm.c   | 44 +++
 4 files changed, 10 insertions(+), 75 deletions(-)

diff --git a/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec_pm.c 
b/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec_pm.c
index 6038db96f71c..d0bf9aa3b29d 100644
--- a/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec_pm.c
+++ b/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec_pm.c
@@ -8,14 +8,12 @@
 #include 
 #include 
 #include 
-#include 
 
 #include "mtk_vcodec_dec_pm.h"
 #include "mtk_vcodec_util.h"
 
 int mtk_vcodec_init_dec_pm(struct mtk_vcodec_dev *mtkdev)
 {
-   struct device_node *node;
struct platform_device *pdev;
struct mtk_vcodec_pm *pm;
struct mtk_vcodec_clk *dec_clk;
@@ -26,18 +24,7 @@ int mtk_vcodec_init_dec_pm(struct mtk_vcodec_dev *mtkdev)
pm = &mtkdev->pm;
pm->mtkdev = mtkdev;
dec_clk = &pm->vdec_clk;
-   node = of_parse_phandle(pdev->dev.of_node, "mediatek,larb", 0);
-   if (!node) {
-   mtk_v4l2_err("of_parse_phandle mediatek,larb fail!");
-   return -1;
-   }
 
-   pdev = of_find_device_by_node(node);
-   of_node_put(node);
-   if (WARN_ON(!pdev)) {
-   return -1;
-   }
-   pm->larbvdec = &pdev->dev;
pdev = mtkdev->plat_dev;
pm->dev = &pdev->dev;
 
@@ -47,14 +34,11 @@ int mtk_vcodec_init_dec_pm(struct mtk_vcodec_dev *mtkdev)
dec_clk->clk_info = devm_kcalloc(&pdev->dev,
dec_clk->clk_num, sizeof(*clk_info),
GFP_KERNEL);
-   if (!dec_clk->clk_info) {
-   ret = -ENOMEM;
-   goto put_device;
-   }
+   if (!dec_clk->clk_info)
+   return -ENOMEM;
} else {
mtk_v4l2_err("Failed to get vdec clock count");
-   ret = -EINVAL;
-   goto put_device;
+   return -EINVAL;
}
 
for (i = 0; i < dec_clk->clk_num; i++) {
@@ -63,29 +47,24 @@ int mtk_vcodec_init_dec_pm(struct mtk_vcodec_dev *mtkdev)
"clock-names", i, _info->clk_name);
if (ret) {
mtk_v4l2_err("Failed to get clock name id = %d", i);
-   goto put_device;
+   return ret;
}
clk_info->vcodec_clk = devm_clk_get(&pdev->dev,
clk_info->clk_name);
if (IS_ERR(clk_info->vcodec_clk)) {
mtk_v4l2_err("devm_clk_get (%d)%s fail", i,
clk_info->clk_name);
-   ret = PTR_ERR(clk_info->vcodec_clk);
-   goto put_device;
+   return PTR_ERR(clk_info->vcodec_clk);
}
}
 
pm_runtime_enable(>dev);
return 0;
-put_device:
-   put_device(pm->larbvdec);
-   return ret;
 }
 
 void mtk_vcodec_release_dec_pm(struct mtk_vcodec_dev *dev)
 {
pm_runtime_disable(dev->pm.dev);
-   put_device(dev->pm.larbvdec);
 }
 
 int mtk_vcodec_dec_pw_on(struct mtk_vcodec_pm *pm)
@@ -122,11 +101,6 @@ void mtk_vcodec_dec_clock_on(struct mtk_vcodec_pm *pm)
}
}
 
-   ret = mtk_smi_larb_get(pm->larbvdec);
-   if (ret) {
-   mtk_v4l2_err("mtk_smi_larb_get larbvdec fail %d", ret);
-   goto error;
-   }
return;
 
 error:
@@ -139,7 +113,6 @@ void mtk_vcodec_dec_clock_off(struct mtk_vcodec_pm *pm)
struct mtk_vcodec_clk *dec_clk = &pm->vdec_clk;
int i = 0;
 
-   mtk_smi_larb_put(pm->larbvdec);
for (i = dec_clk->clk_num - 1; i >= 0; i--)
clk_disable_unprepare(dec_clk->clk_info[i].vcodec_clk);
 }
diff --git a/drivers/media/platform/mtk-vcodec/mtk_vcodec_drv.h 
b/drivers/media/platform/mtk-vcodec/mtk_vcodec_drv.h
index c6c7672fecfb..64b73dd880ce 100644
--- a/drivers/media/platform/mtk-vcodec/mtk_vcodec_drv.h
+++ b/drivers/media/platform/mtk-vcodec/mtk_vcodec_drv.h
@@ -189,10 +189,7 @@ struct mtk_vcodec_clk {
  */
 struct mtk_vcodec_pm {
struct mtk_vcodec_clk   vdec_clk;
-   struct device   *larbvdec;
-
struct mtk_vcodec_clk   venc_clk;
-   struct device   *larbvenc;
struct device   *dev;
struct mtk_vcodec_dev   *mtkdev;
 };
diff --git a/drivers/media/platform/mtk-vcodec/mtk_vcodec_enc.c 
b/drivers/media/platform/mtk-vcodec/mtk_vcodec_enc.c
index 416f356af363..9a1515cf862d 100644

[PATCH v6 07/11] drm/mediatek: Get rid of mtk_smi_larb_get/put

2021-07-13 Thread Yong Wu
MediaTek IOMMU has already added the device_link between the consumer
and the smi-larb device. If the drm device calls pm_runtime_get_sync,
the smi-larb's pm_runtime_get_sync will also be called automatically.

CC: CK Hu 
CC: Philipp Zabel 
Signed-off-by: Yong Wu 
Reviewed-by: Evan Green 
Acked-by: Chun-Kuang Hu 
---
 drivers/gpu/drm/mediatek/mtk_drm_crtc.c |  9 --
 drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.c | 36 ++---
 drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.h |  1 -
 drivers/gpu/drm/mediatek/mtk_drm_drv.c  |  5 +--
 4 files changed, 3 insertions(+), 48 deletions(-)

diff --git a/drivers/gpu/drm/mediatek/mtk_drm_crtc.c 
b/drivers/gpu/drm/mediatek/mtk_drm_crtc.c
index 08e3f352377d..d046abcf66ce 100644
--- a/drivers/gpu/drm/mediatek/mtk_drm_crtc.c
+++ b/drivers/gpu/drm/mediatek/mtk_drm_crtc.c
@@ -10,7 +10,6 @@
 #include 
 
 #include 
-#include 
 
 #include 
 #include 
@@ -551,12 +550,6 @@ static void mtk_drm_crtc_atomic_enable(struct drm_crtc 
*crtc,
 
DRM_DEBUG_DRIVER("%s %d\n", __func__, crtc->base.id);
 
-   ret = mtk_smi_larb_get(comp->larb_dev);
-   if (ret) {
-   DRM_ERROR("Failed to get larb: %d\n", ret);
-   return;
-   }
-
ret = pm_runtime_resume_and_get(comp->dev);
if (ret < 0)
DRM_DEV_ERROR(comp->dev, "Failed to enable power domain: %d\n",
@@ -564,7 +557,6 @@ static void mtk_drm_crtc_atomic_enable(struct drm_crtc 
*crtc,
 
ret = mtk_crtc_ddp_hw_init(mtk_crtc);
if (ret) {
-   mtk_smi_larb_put(comp->larb_dev);
pm_runtime_put(comp->dev);
return;
}
@@ -601,7 +593,6 @@ static void mtk_drm_crtc_atomic_disable(struct drm_crtc 
*crtc,
 
drm_crtc_vblank_off(crtc);
mtk_crtc_ddp_hw_fini(mtk_crtc);
-   mtk_smi_larb_put(comp->larb_dev);
ret = pm_runtime_put(comp->dev);
if (ret < 0)
DRM_DEV_ERROR(comp->dev, "Failed to disable power domain: %d\n",
diff --git a/drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.c 
b/drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.c
index 75bc00e17fc4..7d240218d4c7 100644
--- a/drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.c
+++ b/drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.c
@@ -449,37 +449,15 @@ unsigned int mtk_drm_find_possible_crtc_by_comp(struct 
drm_device *drm,
return ret;
 }
 
-static int mtk_ddp_get_larb_dev(struct device_node *node, struct mtk_ddp_comp 
*comp,
-   struct device *dev)
-{
-   struct device_node *larb_node;
-   struct platform_device *larb_pdev;
-
-   larb_node = of_parse_phandle(node, "mediatek,larb", 0);
-   if (!larb_node) {
-   dev_err(dev, "Missing mediadek,larb phandle in %pOF node\n", 
node);
-   return -EINVAL;
-   }
-
-   larb_pdev = of_find_device_by_node(larb_node);
-   if (!larb_pdev) {
-   dev_warn(dev, "Waiting for larb device %pOF\n", larb_node);
-   of_node_put(larb_node);
-   return -EPROBE_DEFER;
-   }
-   of_node_put(larb_node);
-   comp->larb_dev = &larb_pdev->dev;
-
-   return 0;
-}
-
 int mtk_ddp_comp_init(struct device_node *node, struct mtk_ddp_comp *comp,
  enum mtk_ddp_comp_id comp_id)
 {
struct platform_device *comp_pdev;
enum mtk_ddp_comp_type type;
struct mtk_ddp_comp_dev *priv;
+#if IS_REACHABLE(CONFIG_MTK_CMDQ)
int ret;
+#endif
 
if (comp_id < 0 || comp_id >= DDP_COMPONENT_ID_MAX)
return -EINVAL;
@@ -495,16 +473,6 @@ int mtk_ddp_comp_init(struct device_node *node, struct 
mtk_ddp_comp *comp,
}
comp->dev = &comp_pdev->dev;
 
-   /* Only DMA capable components need the LARB property */
-   if (type == MTK_DISP_OVL ||
-   type == MTK_DISP_OVL_2L ||
-   type == MTK_DISP_RDMA ||
-   type == MTK_DISP_WDMA) {
-   ret = mtk_ddp_get_larb_dev(node, comp, comp->dev);
-   if (ret)
-   return ret;
-   }
-
if (type == MTK_DISP_BLS ||
type == MTK_DISP_CCORR ||
type == MTK_DISP_COLOR ||
diff --git a/drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.h 
b/drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.h
index bb914d976cf5..1b582262b682 100644
--- a/drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.h
+++ b/drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.h
@@ -70,7 +70,6 @@ struct mtk_ddp_comp_funcs {
 struct mtk_ddp_comp {
struct device *dev;
int irq;
-   struct device *larb_dev;
enum mtk_ddp_comp_id id;
const struct mtk_ddp_comp_funcs *funcs;
 };
diff --git a/drivers/gpu/drm/mediatek/mtk_drm_drv.c 
b/drivers/gpu/drm/mediatek/mtk_drm_drv.c
index b46bdb8985da..0d5ef3d8d081 100644
--- a/drivers/gpu/drm/mediatek/mtk_drm_drv.c
+++ b/drivers/gpu/drm/mediatek/mtk_drm_drv.c
@@ -577,11 +577,8 @@ static int mtk_drm_probe(struct platform_device *pdev)
pm_runtime_disable(dev);
 err_node:

[PATCH v6 06/11] drm/mediatek: Add pm runtime support for ovl and rdma

2021-07-13 Thread Yong Wu
From: Yongqiang Niu 

Prepare for smi cleaning up "mediatek,larb".

Display used the dispsys device to call pm_runtime_get_sync before.
This patch adds pm_runtime_xx calls to the ovl and rdma devices, whose
nodes have the "iommus" property; then display can do the pm_runtime_get
for smi via the ovl or rdma device.

CC: CK Hu 
Signed-off-by: Yongqiang Niu 
Signed-off-by: Yong Wu 
(Yong: Use pm_runtime_resume_and_get instead of pm_runtime_get_sync)
Acked-by: Chun-Kuang Hu 
---
 drivers/gpu/drm/mediatek/mtk_disp_ovl.c  |  9 -
 drivers/gpu/drm/mediatek/mtk_disp_rdma.c |  9 -
 drivers/gpu/drm/mediatek/mtk_drm_crtc.c  | 12 +++-
 3 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/mediatek/mtk_disp_ovl.c 
b/drivers/gpu/drm/mediatek/mtk_disp_ovl.c
index fa9d79963cd3..ea5760f856ec 100644
--- a/drivers/gpu/drm/mediatek/mtk_disp_ovl.c
+++ b/drivers/gpu/drm/mediatek/mtk_disp_ovl.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "mtk_disp_drv.h"
@@ -414,15 +415,21 @@ static int mtk_disp_ovl_probe(struct platform_device 
*pdev)
return ret;
}
 
+   pm_runtime_enable(dev);
+
ret = component_add(dev, _disp_ovl_component_ops);
-   if (ret)
+   if (ret) {
+   pm_runtime_disable(dev);
dev_err(dev, "Failed to add component: %d\n", ret);
+   }
 
return ret;
 }
 
 static int mtk_disp_ovl_remove(struct platform_device *pdev)
 {
+   pm_runtime_disable(&pdev->dev);
+
return 0;
 }
 
diff --git a/drivers/gpu/drm/mediatek/mtk_disp_rdma.c 
b/drivers/gpu/drm/mediatek/mtk_disp_rdma.c
index 705f28ceb4dd..0f31d1c8e37c 100644
--- a/drivers/gpu/drm/mediatek/mtk_disp_rdma.c
+++ b/drivers/gpu/drm/mediatek/mtk_disp_rdma.c
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "mtk_disp_drv.h"
@@ -327,9 +328,13 @@ static int mtk_disp_rdma_probe(struct platform_device 
*pdev)
 
platform_set_drvdata(pdev, priv);
 
+   pm_runtime_enable(dev);
+
ret = component_add(dev, _disp_rdma_component_ops);
-   if (ret)
+   if (ret) {
+   pm_runtime_disable(dev);
dev_err(dev, "Failed to add component: %d\n", ret);
+   }
 
return ret;
 }
@@ -338,6 +343,8 @@ static int mtk_disp_rdma_remove(struct platform_device 
*pdev)
 {
component_del(&pdev->dev, &mtk_disp_rdma_component_ops);

+   pm_runtime_disable(&pdev->dev);
+
return 0;
 }
 
diff --git a/drivers/gpu/drm/mediatek/mtk_drm_crtc.c 
b/drivers/gpu/drm/mediatek/mtk_drm_crtc.c
index 474efb844249..08e3f352377d 100644
--- a/drivers/gpu/drm/mediatek/mtk_drm_crtc.c
+++ b/drivers/gpu/drm/mediatek/mtk_drm_crtc.c
@@ -557,9 +557,15 @@ static void mtk_drm_crtc_atomic_enable(struct drm_crtc 
*crtc,
return;
}
 
+   ret = pm_runtime_resume_and_get(comp->dev);
+   if (ret < 0)
+   DRM_DEV_ERROR(comp->dev, "Failed to enable power domain: %d\n",
+ ret);
+
ret = mtk_crtc_ddp_hw_init(mtk_crtc);
if (ret) {
mtk_smi_larb_put(comp->larb_dev);
+   pm_runtime_put(comp->dev);
return;
}
 
@@ -572,7 +578,7 @@ static void mtk_drm_crtc_atomic_disable(struct drm_crtc 
*crtc,
 {
struct mtk_drm_crtc *mtk_crtc = to_mtk_crtc(crtc);
struct mtk_ddp_comp *comp = mtk_crtc->ddp_comp[0];
-   int i;
+   int i, ret;
 
DRM_DEBUG_DRIVER("%s %d\n", __func__, crtc->base.id);
if (!mtk_crtc->enabled)
@@ -596,6 +602,10 @@ static void mtk_drm_crtc_atomic_disable(struct drm_crtc 
*crtc,
drm_crtc_vblank_off(crtc);
mtk_crtc_ddp_hw_fini(mtk_crtc);
mtk_smi_larb_put(comp->larb_dev);
+   ret = pm_runtime_put(comp->dev);
+   if (ret < 0)
+   DRM_DEV_ERROR(comp->dev, "Failed to disable power domain: %d\n",
+ ret);
 
mtk_crtc->enabled = false;
 }
-- 
2.18.0



[PATCH v6 05/11] media: mtk-mdp: Get rid of mtk_smi_larb_get/put

2021-07-13 Thread Yong Wu
MediaTek IOMMU has already added the device_link between the consumer
and the smi-larb device. If the mdp device calls pm_runtime_get_sync,
the smi-larb's pm_runtime_get_sync will also be called automatically.

CC: Minghsiu Tsai 
CC: Houlong Wei 
Signed-off-by: Yong Wu 
Reviewed-by: Evan Green 
Reviewed-by: Houlong Wei 
---
 drivers/media/platform/mtk-mdp/mtk_mdp_comp.c | 46 +--
 drivers/media/platform/mtk-mdp/mtk_mdp_comp.h |  2 -
 drivers/media/platform/mtk-mdp/mtk_mdp_core.c |  1 -
 3 files changed, 1 insertion(+), 48 deletions(-)

diff --git a/drivers/media/platform/mtk-mdp/mtk_mdp_comp.c 
b/drivers/media/platform/mtk-mdp/mtk_mdp_comp.c
index de2d425efdd1..5e0ea83a9f7f 100644
--- a/drivers/media/platform/mtk-mdp/mtk_mdp_comp.c
+++ b/drivers/media/platform/mtk-mdp/mtk_mdp_comp.c
@@ -13,7 +13,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 
 #include "mtk_mdp_comp.h"
@@ -57,13 +56,6 @@ int mtk_mdp_comp_power_on(struct mtk_mdp_comp *comp)
 {
int status, err;
 
-   if (comp->larb_dev) {
-   err = mtk_smi_larb_get(comp->larb_dev);
-   if (err)
-   dev_err(comp->dev,
-   "failed to get larb, err %d.\n", err);
-   }
-
err = pm_runtime_get_sync(comp->dev);
if (err < 0) {
dev_err(comp->dev, "failed to runtime get, err %d.\n", err);
@@ -146,9 +138,6 @@ void mtk_mdp_comp_clock_off(struct mtk_mdp_comp *comp)
continue;
clk_disable_unprepare(comp->clk[i]);
}
-
-   if (comp->larb_dev)
-   mtk_smi_larb_put(comp->larb_dev);
 }
 
 /*
@@ -236,9 +225,6 @@ static const struct component_ops mtk_mdp_component_ops = {
 
 int mtk_mdp_comp_init(struct mtk_mdp_comp *comp, struct device *dev)
 {
-   struct device_node *larb_node;
-   struct platform_device *larb_pdev;
-   int ret;
int i;
struct device_node *node = dev->of_node;
enum mtk_mdp_comp_type comp_type =
@@ -252,8 +238,7 @@ int mtk_mdp_comp_init(struct mtk_mdp_comp *comp, struct 
device *dev)
if (IS_ERR(comp->clk[i])) {
if (PTR_ERR(comp->clk[i]) != -EPROBE_DEFER)
dev_err(dev, "Failed to get clock\n");
-   ret = PTR_ERR(comp->clk[i]);
-   goto err;
+   return PTR_ERR(comp->clk[i]);
}
 
/* Only RDMA needs two clocks */
@@ -261,36 +246,7 @@ int mtk_mdp_comp_init(struct mtk_mdp_comp *comp, struct 
device *dev)
break;
}
 
-   /* Only DMA capable components need the LARB property */
-   comp->larb_dev = NULL;
-   if (comp_type != MTK_MDP_RDMA &&
-   comp_type != MTK_MDP_WDMA &&
-   comp_type != MTK_MDP_WROT)
-   return 0;
-
-   larb_node = of_parse_phandle(node, "mediatek,larb", 0);
-   if (!larb_node) {
-   dev_err(dev,
-   "Missing mediadek,larb phandle in %pOF node\n", node);
-   ret = -EINVAL;
-   goto err;
-   }
-
-   larb_pdev = of_find_device_by_node(larb_node);
-   if (!larb_pdev) {
-   dev_warn(dev, "Waiting for larb device %pOF\n", larb_node);
-   of_node_put(larb_node);
-   ret = -EPROBE_DEFER;
-   goto err;
-   }
-   of_node_put(larb_node);
-
-   comp->larb_dev = &larb_pdev->dev;
-
return 0;
-
-err:
-   return ret;
 }
 
 static int mtk_mdp_comp_probe(struct platform_device *pdev)
diff --git a/drivers/media/platform/mtk-mdp/mtk_mdp_comp.h 
b/drivers/media/platform/mtk-mdp/mtk_mdp_comp.h
index 5201c47f7baa..2bd229cc7eae 100644
--- a/drivers/media/platform/mtk-mdp/mtk_mdp_comp.h
+++ b/drivers/media/platform/mtk-mdp/mtk_mdp_comp.h
@@ -11,13 +11,11 @@
  * struct mtk_mdp_comp - the MDP's function component data
  * @node:  list node to track sibing MDP components
  * @clk:   clocks required for component
- * @larb_dev:  SMI device required for component
  * @dev:   component's device
  */
 struct mtk_mdp_comp {
struct list_headnode;
struct clk  *clk[2];
-   struct device   *larb_dev;
struct device   *dev;
 };
 
diff --git a/drivers/media/platform/mtk-mdp/mtk_mdp_core.c 
b/drivers/media/platform/mtk-mdp/mtk_mdp_core.c
index e1fb39231248..be7d35b3e3ff 100644
--- a/drivers/media/platform/mtk-mdp/mtk_mdp_core.c
+++ b/drivers/media/platform/mtk-mdp/mtk_mdp_core.c
@@ -18,7 +18,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #include "mtk_mdp_comp.h"
 #include "mtk_mdp_core.h"
-- 
2.18.0



[PATCH v6 04/11] media: mtk-jpeg: Get rid of mtk_smi_larb_get/put

2021-07-13 Thread Yong Wu
MediaTek IOMMU has already added device_link between the consumer
and the smi-larb device. If the jpg device calls pm_runtime_get_sync,
the smi-larb's pm_runtime_get_sync will also be called automatically.

After removing the larb_get operations, mtk_jpeg_clk_init is also
unnecessary. Remove it too.

CC: Rick Chang 
CC: Xia Jiang 
Signed-off-by: Yong Wu 
Reviewed-by: Evan Green 
Acked-by: Rick Chang 
---
 .../media/platform/mtk-jpeg/mtk_jpeg_core.c   | 45 +--
 .../media/platform/mtk-jpeg/mtk_jpeg_core.h   |  2 -
 2 files changed, 2 insertions(+), 45 deletions(-)

diff --git a/drivers/media/platform/mtk-jpeg/mtk_jpeg_core.c 
b/drivers/media/platform/mtk-jpeg/mtk_jpeg_core.c
index a89c7b206eef..4fea2c512434 100644
--- a/drivers/media/platform/mtk-jpeg/mtk_jpeg_core.c
+++ b/drivers/media/platform/mtk-jpeg/mtk_jpeg_core.c
@@ -22,7 +22,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #include "mtk_jpeg_enc_hw.h"
 #include "mtk_jpeg_dec_hw.h"
@@ -1055,10 +1054,6 @@ static void mtk_jpeg_clk_on(struct mtk_jpeg_dev *jpeg)
 {
int ret;
 
-   ret = mtk_smi_larb_get(jpeg->larb);
-   if (ret)
-   dev_err(jpeg->dev, "mtk_smi_larb_get larbvdec fail %d\n", ret);
-
ret = clk_bulk_prepare_enable(jpeg->variant->num_clks,
  jpeg->variant->clks);
if (ret)
@@ -1069,7 +1064,6 @@ static void mtk_jpeg_clk_off(struct mtk_jpeg_dev *jpeg)
 {
clk_bulk_disable_unprepare(jpeg->variant->num_clks,
   jpeg->variant->clks);
-   mtk_smi_larb_put(jpeg->larb);
 }
 
 static irqreturn_t mtk_jpeg_enc_done(struct mtk_jpeg_dev *jpeg)
@@ -1284,35 +1278,6 @@ static struct clk_bulk_data mtk_jpeg_clocks[] = {
{ .id = "jpgenc" },
 };
 
-static int mtk_jpeg_clk_init(struct mtk_jpeg_dev *jpeg)
-{
-   struct device_node *node;
-   struct platform_device *pdev;
-   int ret;
-
-   node = of_parse_phandle(jpeg->dev->of_node, "mediatek,larb", 0);
-   if (!node)
-   return -EINVAL;
-   pdev = of_find_device_by_node(node);
-   if (WARN_ON(!pdev)) {
-   of_node_put(node);
-   return -EINVAL;
-   }
-   of_node_put(node);
-
-   jpeg->larb = &pdev->dev;
-
-   ret = devm_clk_bulk_get(jpeg->dev, jpeg->variant->num_clks,
-   jpeg->variant->clks);
-   if (ret) {
-   dev_err(&pdev->dev, "failed to get jpeg clock:%d\n", ret);
-   put_device(&pdev->dev);
-   return ret;
-   }
-
-   return 0;
-}
-
 static void mtk_jpeg_job_timeout_work(struct work_struct *work)
 {
struct mtk_jpeg_dev *jpeg = container_of(work, struct mtk_jpeg_dev,
@@ -1333,11 +1298,6 @@ static void mtk_jpeg_job_timeout_work(struct work_struct 
*work)
v4l2_m2m_job_finish(jpeg->m2m_dev, ctx->fh.m2m_ctx);
 }
 
-static inline void mtk_jpeg_clk_release(struct mtk_jpeg_dev *jpeg)
-{
-   put_device(jpeg->larb);
-}
-
 static int mtk_jpeg_probe(struct platform_device *pdev)
 {
struct mtk_jpeg_dev *jpeg;
@@ -1376,7 +1336,8 @@ static int mtk_jpeg_probe(struct platform_device *pdev)
goto err_req_irq;
}
 
-   ret = mtk_jpeg_clk_init(jpeg);
+   ret = devm_clk_bulk_get(jpeg->dev, jpeg->variant->num_clks,
+   jpeg->variant->clks);
if (ret) {
dev_err(&pdev->dev, "Failed to init clk, err %d\n", ret);
goto err_clk_init;
@@ -1442,7 +1403,6 @@ static int mtk_jpeg_probe(struct platform_device *pdev)
v4l2_device_unregister(>v4l2_dev);
 
 err_dev_register:
-   mtk_jpeg_clk_release(jpeg);
 
 err_clk_init:
 
@@ -1460,7 +1420,6 @@ static int mtk_jpeg_remove(struct platform_device *pdev)
video_device_release(jpeg->vdev);
v4l2_m2m_release(jpeg->m2m_dev);
v4l2_device_unregister(>v4l2_dev);
-   mtk_jpeg_clk_release(jpeg);
 
return 0;
 }
diff --git a/drivers/media/platform/mtk-jpeg/mtk_jpeg_core.h 
b/drivers/media/platform/mtk-jpeg/mtk_jpeg_core.h
index 595f7f10c9fd..3e4811a41ba2 100644
--- a/drivers/media/platform/mtk-jpeg/mtk_jpeg_core.h
+++ b/drivers/media/platform/mtk-jpeg/mtk_jpeg_core.h
@@ -85,7 +85,6 @@ struct mtk_jpeg_variant {
  * @alloc_ctx: videobuf2 memory allocator's context
  * @vdev:  video device node for jpeg mem2mem mode
  * @reg_base:  JPEG registers mapping
- * @larb:  SMI device
  * @job_timeout_work:  IRQ timeout structure
  * @variant:   driver variant to be used
  */
@@ -99,7 +98,6 @@ struct mtk_jpeg_dev {
void*alloc_ctx;
struct video_device *vdev;
void __iomem*reg_base;
-   struct device   *larb;
struct delayed_work job_timeout_work;
const struct mtk_jpeg_variant *variant;
 };
-- 
2.18.0


[PATCH v6 03/11] iommu/mediatek: Add device_link between the consumer and the larb devices

2021-07-13 Thread Yong Wu
MediaTek IOMMU-SMI diagram is like below. All the consumers connect with
smi-larb, then connect with smi-common.

            M4U
             |
         smi-common
             |
     ----------------
     |              |  ...
     |              |
   larb1          larb2
     |              |
    vdec           venc

When the consumer works, it should enable the smi-larb's power, which
also requires enabling the smi-common's power first.

Thus, first of all, use device links to connect the consumer and the
smi-larbs, then add a device link between the smi-larb and smi-common.

This patch adds device_link between the consumer and the larbs.

When calling device_link_add, I add the flag DL_FLAG_STATELESS to avoid
calling pm_runtime_xx, to keep the original status of the clocks. It can
avoid two issues:
1) Display HW shows the fastlogo abnormally, as reported in [1]. At the
beginning, all the clocks are enabled before entering the kernel, but the
clocks for the display HW (always in larb0) will be gated after clk_enable
and clk_disable are called from device_link_add(->pm_runtime_resume) and
rpm_idle. The clock operations happen before the display driver probes.
At that time, the display HW will be abnormal.

2) A deadlock issue reported in [2]. Use DL_FLAG_STATELESS to skip
pm_runtime_xx to avoid the deadlock.

Correspondingly, DL_FLAG_AUTOREMOVE_CONSUMER can't be added, so
device_link_remove should be called explicitly.

[1] https://lore.kernel.org/linux-mediatek/1564213888.22908.4.camel@mhfsdcap03/
[2] https://lore.kernel.org/patchwork/patch/1086569/

Suggested-by: Tomasz Figa 
Signed-off-by: Yong Wu 
---
 drivers/iommu/mtk_iommu.c| 22 ++
 drivers/iommu/mtk_iommu_v1.c | 20 +++-
 2 files changed, 41 insertions(+), 1 deletion(-)
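
In effect, once such a DL_FLAG_PM_RUNTIME link is in place, a consumer's
runtime resume implicitly resumes its suppliers. A sketch of what the
consumer side reduces to (this replaces the explicit mtk_smi_larb_get/put
calls removed later in this series; consumer_dev is a placeholder):

	ret = pm_runtime_resume_and_get(consumer_dev);
	if (ret < 0)
		return ret;
	/* ... use the HW: smi-larb and smi-common are already powered ... */
	pm_runtime_put(consumer_dev);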

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index a02dde094788..ee742900cf4b 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -571,22 +571,44 @@ static struct iommu_device *mtk_iommu_probe_device(struct 
device *dev)
 {
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
struct mtk_iommu_data *data;
+   struct device_link *link;
+   struct device *larbdev;
+   unsigned int larbid;
 
if (!fwspec || fwspec->ops != &mtk_iommu_ops)
	return ERR_PTR(-ENODEV); /* Not a iommu client device */
 
data = dev_iommu_priv_get(dev);
 
+   /*
+    * Link the consumer device with the smi-larb device (supplier).
+    * The device in each larb is an independent HW. Thus only link
+    * one larb here.
+    */
+   larbid = MTK_M4U_TO_LARB(fwspec->ids[0]);
+   larbdev = data->larb_imu[larbid].dev;
+   link = device_link_add(dev, larbdev,
+  DL_FLAG_PM_RUNTIME | DL_FLAG_STATELESS);
+   if (!link)
+   dev_err(dev, "Unable to link %s\n", dev_name(larbdev));
return &data->iommu;
 }
 
 static void mtk_iommu_release_device(struct device *dev)
 {
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
+   struct mtk_iommu_data *data;
+   struct device *larbdev;
+   unsigned int larbid;
 
if (!fwspec || fwspec->ops != &mtk_iommu_ops)
return;
 
+   data = dev_iommu_priv_get(dev);
+   larbid = MTK_M4U_TO_LARB(fwspec->ids[0]);
+   larbdev = data->larb_imu[larbid].dev;
+   device_link_remove(dev, larbdev);
+
iommu_fwspec_free(dev);
 }
 
diff --git a/drivers/iommu/mtk_iommu_v1.c b/drivers/iommu/mtk_iommu_v1.c
index d9365a3d8dc9..d2a7c66b8239 100644
--- a/drivers/iommu/mtk_iommu_v1.c
+++ b/drivers/iommu/mtk_iommu_v1.c
@@ -424,7 +424,9 @@ static struct iommu_device *mtk_iommu_probe_device(struct 
device *dev)
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
struct of_phandle_args iommu_spec;
struct mtk_iommu_data *data;
-   int err, idx = 0;
+   int err, idx = 0, larbid;
+   struct device_link *link;
+   struct device *larbdev;
 
while (!of_parse_phandle_with_args(dev->of_node, "iommus",
   "#iommu-cells",
@@ -445,6 +447,14 @@ static struct iommu_device *mtk_iommu_probe_device(struct 
device *dev)
 
data = dev_iommu_priv_get(dev);
 
+   /* Link the consumer device with the smi-larb device(supplier) */
+   larbid = mt2701_m4u_to_larb(fwspec->ids[0]);
+   larbdev = data->larb_imu[larbid].dev;
+   link = device_link_add(dev, larbdev,
+  DL_FLAG_PM_RUNTIME | DL_FLAG_STATELESS);
+   if (!link)
+   dev_err(dev, "Unable to link %s\n", dev_name(larbdev));
+
return &data->iommu;
 }
 
@@ -465,10 +475,18 @@ static void mtk_iommu_probe_finalize(struct device *dev)
 static void mtk_iommu_release_device(struct device *dev)
 {
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
+   struct mtk_iommu_data *data;
+   struct device *larbdev;
+   unsigned int larbid;
 
if (!fwspec || fwspec->ops != &mtk_iommu_ops)
return;
 
data = dev_iommu_priv_get(dev);

[PATCH v6 02/11] iommu/mediatek: Add probe_defer for smi-larb

2021-07-13 Thread Yong Wu
Prepare for adding device_link.

The iommu consumer should use device_link to connect with the
smi-larb (supplier). Then the smi-larb should run before the iommu
consumer. Here we delay the iommu driver until the smi driver is
ready, so all the iommu consumers always come after the smi driver.

Without this patch, if some consumer drivers run before smi-larb,
the supplier link_status is DL_DEV_NO_DRIVER(0) in device_link_add,
and device_links_driver_bound will then use WARN_ON to complain that
the link_status of the supplier is not right.

Signed-off-by: Yong Wu 
---
Since [1], device_is_bound is not allowed to be exported. That would
affect this driver when built as a module, so still use dev.driver here.

[1] https://lore.kernel.org/patchwork/patch/1334670/
---
 drivers/iommu/mtk_iommu.c| 2 +-
 drivers/iommu/mtk_iommu_v1.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 6f7c69688ce2..a02dde094788 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -855,7 +855,7 @@ static int mtk_iommu_probe(struct platform_device *pdev)
id = i;
 
plarbdev = of_find_device_by_node(larbnode);
-   if (!plarbdev) {
+   if (!plarbdev || !plarbdev->dev.driver) {
of_node_put(larbnode);
return -EPROBE_DEFER;
}
diff --git a/drivers/iommu/mtk_iommu_v1.c b/drivers/iommu/mtk_iommu_v1.c
index 778e66f5f1aa..d9365a3d8dc9 100644
--- a/drivers/iommu/mtk_iommu_v1.c
+++ b/drivers/iommu/mtk_iommu_v1.c
@@ -594,7 +594,7 @@ static int mtk_iommu_probe(struct platform_device *pdev)
}
 
plarbdev = of_find_device_by_node(larbnode);
-   if (!plarbdev) {
+   if (!plarbdev || !plarbdev->dev.driver) {
of_node_put(larbnode);
return -EPROBE_DEFER;
}
-- 
2.18.0



[PATCH v6 01/11] dt-binding: mediatek: Get rid of mediatek,larb for multimedia HW

2021-07-13 Thread Yong Wu
After adding device_link between the consumer and the smi-larbs,
if the consumer calls its own pm_runtime_get(_sync), the
pm_runtime_get(_sync) of smi-larb and smi-common will be called
automatically. Thus, the consumer doesn't need the property.

And the IOMMU also knows which larb this consumer connects with
from the iommu id in the "iommus=" property.

Signed-off-by: Yong Wu 
Reviewed-by: Rob Herring 
Reviewed-by: Evan Green 
---
 .../bindings/display/mediatek/mediatek,disp.txt  | 9 -
 .../devicetree/bindings/media/mediatek-jpeg-decoder.yaml | 9 -
 .../devicetree/bindings/media/mediatek-jpeg-encoder.yaml | 9 -
 Documentation/devicetree/bindings/media/mediatek-mdp.txt | 8 
 .../devicetree/bindings/media/mediatek-vcodec.txt| 4 
 5 files changed, 39 deletions(-)

diff --git 
a/Documentation/devicetree/bindings/display/mediatek/mediatek,disp.txt 
b/Documentation/devicetree/bindings/display/mediatek/mediatek,disp.txt
index fbb59c9ddda6..867bd82e2f03 100644
--- a/Documentation/devicetree/bindings/display/mediatek/mediatek,disp.txt
+++ b/Documentation/devicetree/bindings/display/mediatek/mediatek,disp.txt
@@ -61,8 +61,6 @@ Required properties (DMA function blocks):
"mediatek,-disp-rdma"
"mediatek,-disp-wdma"
   the supported chips are mt2701, mt8167 and mt8173.
-- larb: Should contain a phandle pointing to the local arbiter device as 
defined
-  in 
Documentation/devicetree/bindings/memory-controllers/mediatek,smi-larb.yaml
 - iommus: Should point to the respective IOMMU block with master port as
   argument, see Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml
   for details.
@@ -91,7 +89,6 @@ ovl0: ovl@1400c000 {
power-domains = < MT8173_POWER_DOMAIN_MM>;
clocks = < CLK_MM_DISP_OVL0>;
iommus = < M4U_PORT_DISP_OVL0>;
-   mediatek,larb = <>;
 };
 
 ovl1: ovl@1400d000 {
@@ -101,7 +98,6 @@ ovl1: ovl@1400d000 {
power-domains = < MT8173_POWER_DOMAIN_MM>;
clocks = < CLK_MM_DISP_OVL1>;
iommus = < M4U_PORT_DISP_OVL1>;
-   mediatek,larb = <>;
 };
 
 rdma0: rdma@1400e000 {
@@ -111,7 +107,6 @@ rdma0: rdma@1400e000 {
power-domains = < MT8173_POWER_DOMAIN_MM>;
clocks = < CLK_MM_DISP_RDMA0>;
iommus = < M4U_PORT_DISP_RDMA0>;
-   mediatek,larb = <>;
mediatek,rdma-fifosize = <8192>;
 };
 
@@ -122,7 +117,6 @@ rdma1: rdma@1400f000 {
power-domains = < MT8173_POWER_DOMAIN_MM>;
clocks = < CLK_MM_DISP_RDMA1>;
iommus = < M4U_PORT_DISP_RDMA1>;
-   mediatek,larb = <>;
 };
 
 rdma2: rdma@1401 {
@@ -132,7 +126,6 @@ rdma2: rdma@1401 {
power-domains = < MT8173_POWER_DOMAIN_MM>;
clocks = < CLK_MM_DISP_RDMA2>;
iommus = < M4U_PORT_DISP_RDMA2>;
-   mediatek,larb = <>;
 };
 
 wdma0: wdma@14011000 {
@@ -142,7 +135,6 @@ wdma0: wdma@14011000 {
power-domains = < MT8173_POWER_DOMAIN_MM>;
clocks = < CLK_MM_DISP_WDMA0>;
iommus = < M4U_PORT_DISP_WDMA0>;
-   mediatek,larb = <>;
 };
 
 wdma1: wdma@14012000 {
@@ -152,7 +144,6 @@ wdma1: wdma@14012000 {
power-domains = < MT8173_POWER_DOMAIN_MM>;
clocks = < CLK_MM_DISP_WDMA1>;
iommus = < M4U_PORT_DISP_WDMA1>;
-   mediatek,larb = <>;
 };
 
 color0: color@14013000 {
diff --git a/Documentation/devicetree/bindings/media/mediatek-jpeg-decoder.yaml 
b/Documentation/devicetree/bindings/media/mediatek-jpeg-decoder.yaml
index 9b87f036f178..052e752157b4 100644
--- a/Documentation/devicetree/bindings/media/mediatek-jpeg-decoder.yaml
+++ b/Documentation/devicetree/bindings/media/mediatek-jpeg-decoder.yaml
@@ -42,13 +42,6 @@ properties:
   power-domains:
 maxItems: 1
 
-  mediatek,larb:
-$ref: '/schemas/types.yaml#/definitions/phandle'
-description: |
-  Must contain the local arbiters in the current Socs, see
-  
Documentation/devicetree/bindings/memory-controllers/mediatek,smi-larb.yaml
-  for details.
-
   iommus:
 maxItems: 2
 description: |
@@ -63,7 +56,6 @@ required:
   - clocks
   - clock-names
   - power-domains
-  - mediatek,larb
   - iommus
 
 additionalProperties: false
@@ -83,7 +75,6 @@ examples:
   clock-names = "jpgdec-smi",
 "jpgdec";
   power-domains = < MT2701_POWER_DOMAIN_ISP>;
-  mediatek,larb = <>;
   iommus = < MT2701_M4U_PORT_JPGDEC_WDMA>,
< MT2701_M4U_PORT_JPGDEC_BSDMA>;
 };
diff --git a/Documentation/devicetree/bindings/media/mediatek-jpeg-encoder.yaml 
b/Documentation/devicetree/bindings/media/mediatek-jpeg-encoder.yaml
index fcd9b829e036..8bfdfdfaba59 100644
--- a/Documentation/devicetree/bindings/media/mediatek-jpeg-encoder.yaml
+++ b/Documentation/devicetree/bindings/media/mediatek-jpeg-encoder.yaml
@@ -35,13 +35,6 @@ properties:
   power-domains:
 maxItems: 1
 
-  mediatek,larb:
-$ref: '/schemas/types.yaml#/definitions/phandle'
-description: |
-  Must contain the local arbiters in the current Socs, see
-  

[PATCH v6 00/11] Clean up "mediatek,larb"

2021-07-13 Thread Yong Wu
MediaTek IOMMU block diagram is always like below:

            M4U
             |
         smi-common
             |
     ----------------
     |              |  ...
     |              |
   larb1          larb2
     |              |
    vdec           venc

All the consumers connect with a smi-larb, which then connects with the
smi-common.

When a consumer works, it should enable the smi-larb's power, which in
turn requires enabling the smi-common's power first.

Thus, first use device links to connect the consumers and the smi-larbs,
then add a device link between each smi-larb and the smi-common (a sketch
of the device-link setup follows below).

After adding the device links, the "mediatek,larb" property can be
removed: the iommu consumers no longer need to call mtk_smi_larb_get/put
to enable the power and clock of the smi-larb and smi-common.
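
As a rough sketch of the idea (simplified; the larbdev lookup is
driver-specific and the variable names here are illustrative):

	struct device_link *link;

	/* Consumer <-> smi-larb link: runtime-resuming the consumer now
	 * powers up the larb (and, via a second link, the smi-common). */
	link = device_link_add(consumer_dev, larbdev,
			       DL_FLAG_PM_RUNTIME | DL_FLAG_STATELESS);
	if (!link)
		dev_err(consumer_dev, "Unable to link smi-larb\n");

With DL_FLAG_PM_RUNTIME set, a pm_runtime_get_sync() on the consumer
resumes the larb supplier automatically, which is what lets the explicit
mtk_smi_larb_get/put calls go away.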

About the MM dt-binding/dtsi patches, I guess they should go together, thus
I don't split them per MM module and per SoC.

Based on v5.14-rc1, and a jpeg[1] and mdp[2] patchset.

[1] https://lore.kernel.org/linux-mediatek/20210702102304.3346429-1-hsi...@chromium.org/
[2] https://lore.kernel.org/linux-mediatek/20210709022324.1607884-1-ei...@chromium.org/

Change notes:
v6: 1) rebase on v5.14-rc1.
2) Fix the issue commented in v5 from Dafna and Hsin-Yi.
3) Remove the patches about using pm_runtime_resume_and_get since they have
   already been merged by other patches.

v5: https://lore.kernel.org/linux-mediatek/20210410091128.31823-1-yong...@mediatek.com/
1) Base v5.12-rc2.
2) Remove the patch changing mtk-iommu to module_platform_driver; it has
already become an independent patch.

v4: https://lore.kernel.org/linux-mediatek/1590826218-23653-1-git-send-email-yong...@mediatek.com/
Based on v5.7-rc1.
  1) Move drm PM patch before smi patches.
  2) Change builtin_platform_driver to module_platform_driver since we may need
 build as module.
  3) Rebase many patchset as above.

v3: https://lore.kernel.org/linux-iommu/1567503456-24725-1-git-send-email-yong...@mediatek.com/
1) rebase on v5.3-rc1 and the latest mt8183 patchset.
2) Use device_is_bound to check whether the driver is ready from Matthias.  
  
3) Add DL_FLAG_STATELESS flag when calling device_link_add and explain the
   reason in the commit message[3/14].
4) Add a display patch[12/14] into this series, otherwise it may affect
   the display HW fastlogo even though that doesn't happen on mt8183.
   
v2: https://lore.kernel.org/linux-iommu/1560171313-28299-1-git-send-email-yong...@mediatek.com/
   1) rebase on v5.2-rc1.
   2) Move adding device_link between the consumer and smi-larb into
iommu_add_device from Robin.
   3) add DL_FLAG_AUTOREMOVE_CONSUMER even though the smi is built-in from Evan.
   4) Remove the shutdown callback in iommu.   

v1: https://lore.kernel.org/linux-iommu/1546318276-18993-1-git-send-email-yong...@mediatek.com/

Yong Wu (10):
  dt-binding: mediatek: Get rid of mediatek,larb for multimedia HW
  iommu/mediatek: Add probe_defer for smi-larb
  iommu/mediatek: Add device_link between the consumer and the larb
devices
  media: mtk-jpeg: Get rid of mtk_smi_larb_get/put
  media: mtk-mdp: Get rid of mtk_smi_larb_get/put
  drm/mediatek: Get rid of mtk_smi_larb_get/put
  media: mtk-vcodec: Get rid of mtk_smi_larb_get/put
  memory: mtk-smi: Get rid of mtk_smi_larb_get/put
  arm: dts: mediatek: Get rid of mediatek,larb for MM nodes
  arm64: dts: mediatek: Get rid of mediatek,larb for MM nodes

Yongqiang Niu (1):
  drm/mediatek: Add pm runtime support for ovl and rdma

 .../display/mediatek/mediatek,disp.txt|  9 
 .../bindings/media/mediatek-jpeg-decoder.yaml |  9 
 .../bindings/media/mediatek-jpeg-encoder.yaml |  9 
 .../bindings/media/mediatek-mdp.txt   |  8 
 .../bindings/media/mediatek-vcodec.txt|  4 --
 arch/arm/boot/dts/mt2701.dtsi |  2 -
 arch/arm/boot/dts/mt7623n.dtsi|  5 --
 arch/arm64/boot/dts/mediatek/mt8173.dtsi  | 16 ---
 arch/arm64/boot/dts/mediatek/mt8183.dtsi  |  6 ---
 drivers/gpu/drm/mediatek/mtk_disp_ovl.c   |  9 +++-
 drivers/gpu/drm/mediatek/mtk_disp_rdma.c  |  9 +++-
 drivers/gpu/drm/mediatek/mtk_drm_crtc.c   | 19 
 drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.c   | 36 +--
 drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.h   |  1 -
 drivers/gpu/drm/mediatek/mtk_drm_drv.c|  5 +-
 drivers/iommu/mtk_iommu.c | 24 +-
 drivers/iommu/mtk_iommu_v1.c  | 22 -
 .../media/platform/mtk-jpeg/mtk_jpeg_core.c   | 45 +-
 .../media/platform/mtk-jpeg/mtk_jpeg_core.h   |  2 -
 drivers/media/platform/mtk-mdp/mtk_mdp_comp.c | 46 +--
 drivers/media/platform/mtk-mdp/mtk_mdp_comp.h |  2 -
 drivers/media/platform/mtk-mdp/mtk_mdp_core.c |  1 -
 .../platform/mtk-vcodec/mtk_vcodec_dec_pm.c   | 37 ++-
 .../platform/mtk-vcodec/mtk_vcodec_drv.h  |  3 --
 .../platform/mtk-vcodec/mtk_vcodec_enc.c  |  1 -
 .../platform/mtk-vcodec/mtk_vcodec_enc_pm.c   | 44 ++
 

Re: [PATCH v9 16/17] vduse: Introduce VDUSE - vDPA Device in Userspace

2021-07-13 Thread Jason Wang


On 2021/7/13 9:27 PM, Dan Carpenter wrote:

On Tue, Jul 13, 2021 at 04:46:55PM +0800, Xie Yongji wrote:

+static int vduse_dev_init_vdpa(struct vduse_dev *dev, const char *name)
+{
+   struct vduse_vdpa *vdev;
+   int ret;
+
+   if (dev->vdev)
+   return -EEXIST;
+
+   vdev = vdpa_alloc_device(struct vduse_vdpa, vdpa, dev->dev,
+_vdpa_config_ops, name, true);
+   if (!vdev)
+   return -ENOMEM;

This should be an IS_ERR() check instead of a NULL check.



Yes.




The vdpa_alloc_device() macro is doing something very complicated but
I'm not sure what.  It calls container_of() and that looks buggy until
you spot the BUILD_BUG_ON_ZERO() compile time assert which ensures that
the container_of() is a no-op.

Only one of the callers checks for error pointers correctly so maybe
it's too complicated or maybe there should be better documentation.



We need better documentation for this macro and fix all the buggy callers.

Yong Ji, want to do that?

Thanks




regards,
dan carpenter
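
For reference, a minimal sketch of the corrected call site (the
error-pointer check asked for above; the config-ops symbol name is
assumed from context):

	vdev = vdpa_alloc_device(struct vduse_vdpa, vdpa, dev->dev,
				 &vduse_vdpa_config_ops, name, true);
	if (IS_ERR(vdev))
		return PTR_ERR(vdev);

vdpa_alloc_device() forwards the return value of __vdpa_alloc_device(),
which reports failure as ERR_PTR(), so a NULL check can never trigger.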




Re: [PATCH v9 13/17] vdpa: factor out vhost_vdpa_pa_map() and vhost_vdpa_pa_unmap()

2021-07-13 Thread Jason Wang


On 2021/7/13 7:31 PM, Dan Carpenter wrote:

On Tue, Jul 13, 2021 at 04:46:52PM +0800, Xie Yongji wrote:

@@ -613,37 +618,28 @@ static void vhost_vdpa_unmap(struct vhost_vdpa *v, u64 
iova, u64 size)
}
  }
  
-static int vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v,
-  struct vhost_iotlb_msg *msg)
+static int vhost_vdpa_pa_map(struct vhost_vdpa *v,
+u64 iova, u64 size, u64 uaddr, u32 perm)
  {
struct vhost_dev *dev = >vdev;
-   struct vhost_iotlb *iotlb = dev->iotlb;
struct page **page_list;
unsigned long list_size = PAGE_SIZE / sizeof(struct page *);
unsigned int gup_flags = FOLL_LONGTERM;
unsigned long npages, cur_base, map_pfn, last_pfn = 0;
unsigned long lock_limit, sz2pin, nchunks, i;
-   u64 iova = msg->iova;
+   u64 start = iova;
long pinned;
int ret = 0;
  
-	if (msg->iova < v->range.first ||
-   msg->iova + msg->size - 1 > v->range.last)
-   return -EINVAL;

This is not related to your patch, but the "msg->iova + msg->size"
addition can have an integer overflow.  From looking at the callers it
seems like it can.  msg comes from:
   vhost_chr_write_iter()
   --> dev->msg_handler(dev, );
   --> vhost_vdpa_process_iotlb_msg()
  --> vhost_vdpa_process_iotlb_update()



Yes.




If I'm thinking of the right thing then these are allowed to overflow to
0 because of the " - 1" but not further than that.  I believe the check
needs to be something like:

if (msg->iova < v->range.first ||
msg->iova - 1 > U64_MAX - msg->size ||



I guess we don't need - 1 here?

Thanks



msg->iova + msg->size - 1 > v->range.last)

But writing integer overflow check correctly is notoriously difficult.
Do you think you could send a fix for that which is separate from the
patcheset?  We'd want to backport it to stable.

regards,
dan carpenter
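
As an aside, one overflow-safe form of the range validation under
discussion (a sketch of the idea, not necessarily the committed fix):

	if (!msg->size || msg->iova < v->range.first ||
	    msg->iova > U64_MAX - msg->size + 1 ||
	    msg->iova + msg->size - 1 > v->range.last)
		return -EINVAL;

The "msg->iova > U64_MAX - msg->size + 1" test rejects exactly the cases
where "msg->iova + msg->size - 1" would wrap, while still allowing a
mapping that ends at U64_MAX; the "!msg->size" test keeps the arithmetic
well-defined.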




Re: [PATCH 1/4] iommu/vt-d: Disable superpage for Geminilake igfx

2021-07-13 Thread Lu Baolu

On 7/14/21 4:30 AM, Ville Syrjälä wrote:

On Tue, Jul 13, 2021 at 09:34:09AM +0800, Lu Baolu wrote:

On 7/12/21 11:47 PM, Ville Syrjälä wrote:

On Mon, Jul 12, 2021 at 07:23:07AM +0800, Lu Baolu wrote:

On 7/10/21 12:47 AM, Ville Syrjala wrote:

From: Ville Syrjälä

While running "gem_exec_big --r single" from igt-gpu-tools on
Geminilake as soon as a 2M mapping is made I tend to get a DMAR
write fault. Strangely the faulting address is always a 4K page
and usually very far away from the 2M page that got mapped.
But if no 2M mappings get used I can't reproduce the fault.

I also tried to dump the PTE for the faulting address but it actually
looks correct to me (ie. definitely seems to have the write bit set):
DMAR: DRHD: handling fault status reg 2
DMAR: [DMA Write] Request device [00:02.0] PASID  fault addr 7fa8a78000 [fault reason 05] PTE Write access is not set
DMAR: fault 7fa8a78000 (level=1) PTE = 149efc003

So not really sure what's going on and this might just be full on duct
tape, but it seems to work here. The machine has now survived a whole day
running that test whereas with superpage enabled it fails in less than
a minute usually.

TODO: might be nice to disable superpage only for the igfx iommu
 instead of both iommus

If all these quirks are about igfx dedicated iommu's, I would suggest to
disable superpage only for the igfx ones.

Sure. Unfortunately there's no convenient mechanism to do that in
the iommu driver that I can immediately see. So not something I
can just whip up easily. Since you're actually familiar with the
driver maybe you can come up with a decent solution for that?


How about something like below? [no compile, no test...]

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 1131b8efb050..2d51ef288a9e 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -338,6 +338,7 @@ static int intel_iommu_strict;
   static int intel_iommu_superpage = 1;
   static int iommu_identity_mapping;
   static int iommu_skip_te_disable;
+static int iommu_skip_igfx_superpage;

   #define IDENTMAP_GFX 2
   #define IDENTMAP_AZALIA  4
@@ -652,6 +653,27 @@ static bool domain_update_iommu_snooping(struct intel_iommu *skip)
return ret;
   }

+static bool domain_use_super_page(struct dmar_domain *domain)
+{
+   struct dmar_drhd_unit *drhd;
+   struct intel_iommu *iommu;
+   bool ret = true;
+
+   if (!intel_iommu_superpage)
+   return false;
+
+   rcu_read_lock();
+   for_each_active_iommu(iommu, drhd) {
+   if (drhd->gfx_dedicated && iommu_skip_igfx_superpage) {
+   ret = false;
+   break

  ^
Missing semicolon. Otherwise seems to work great here. Thanks.

Are you going to turn this into a proper patch, or do you
want me to just squash this into my patches and repost?



Please go ahead with a new version.

Best regards,
baolu

Re: [PATCH v2] dma-mapping: use vmalloc_to_page for vmalloc addresses

2021-07-13 Thread Stefano Stabellini
On Tue, 22 Jun 2021, Roman Skakun wrote:
> This commit fixes an incorrect conversion from cpu_addr to a page
> address in cases when we get a virtual address allocated in the
> vmalloc range. As a result, virt_to_page() cannot convert this
> address properly and returns an incorrect page address.
> 
> We need to detect such cases and obtain the page address using
> vmalloc_to_page() instead.
> 
> Signed-off-by: Roman Skakun 
> Reviewed-by: Andrii Anisov 
> ---
> Hey!
> Thanks for suggestions, Christoph!
> I updated the patch according to your advice.
> But I'm surprised that nobody caught this problem
> in the common code before. It looks a bit strange to me.
> 
> 
>  kernel/dma/ops_helpers.c | 12 ++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/dma/ops_helpers.c b/kernel/dma/ops_helpers.c
> index 910ae69cae77..782728d8a393 100644
> --- a/kernel/dma/ops_helpers.c
> +++ b/kernel/dma/ops_helpers.c
> @@ -5,6 +5,14 @@
>   */
>  #include 
>  
> +static struct page *cpu_addr_to_page(void *cpu_addr)
> +{
> + if (is_vmalloc_addr(cpu_addr))
> + return vmalloc_to_page(cpu_addr);
> + else
> + return virt_to_page(cpu_addr);
> +}

We have the same thing in xen_swiotlb_free_coherent. Can we find a way
to call cpu_addr_to_page() from xen_swiotlb_free_coherent? Maybe we can
make cpu_addr_to_page non-static? No problem if it is too much trouble.
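
A minimal sketch of that sharing (the exported name and header placement
here are hypothetical, not an existing kernel API):

	/* kernel/dma/ops_helpers.c -- same body, just no longer static */
	struct page *dma_cpu_addr_to_page(void *cpu_addr)
	{
		if (is_vmalloc_addr(cpu_addr))
			return vmalloc_to_page(cpu_addr);
		return virt_to_page(cpu_addr);
	}

	/* declared in a shared header, e.g. include/linux/dma-map-ops.h,
	 * so that xen_swiotlb_free_coherent() can call it as well */
	struct page *dma_cpu_addr_to_page(void *cpu_addr);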


>  /*
>   * Create scatter-list for the already allocated DMA buffer.
>   */
> @@ -12,7 +20,7 @@ int dma_common_get_sgtable(struct device *dev, struct 
> sg_table *sgt,
>void *cpu_addr, dma_addr_t dma_addr, size_t size,
>unsigned long attrs)
>  {
> - struct page *page = virt_to_page(cpu_addr);
> + struct page *page = cpu_addr_to_page(cpu_addr);
>   int ret;
>  
>   ret = sg_alloc_table(sgt, 1, GFP_KERNEL);
> @@ -43,7 +51,7 @@ int dma_common_mmap(struct device *dev, struct 
> vm_area_struct *vma,
>   return -ENXIO;
>  
>   return remap_pfn_range(vma, vma->vm_start,
> - page_to_pfn(virt_to_page(cpu_addr)) + vma->vm_pgoff,
> + page_to_pfn(cpu_addr_to_page(cpu_addr)) + vma->vm_pgoff,
>   user_count << PAGE_SHIFT, vma->vm_page_prot);
>  #else
>   return -ENXIO;
> -- 
> 2.25.1
> 


Re: [PATCH v2] dma-mapping: use vmalloc_to_page for vmalloc addresses

2021-07-13 Thread Konrad Rzeszutek Wilk
On Tue, Jun 22, 2021 at 04:34:14PM +0300, Roman Skakun wrote:
> This commit fixes an incorrect conversion from cpu_addr to a page
> address in cases when we get a virtual address allocated in the
> vmalloc range. As a result, virt_to_page() cannot convert this
> address properly and returns an incorrect page address.
> 
> We need to detect such cases and obtain the page address using
> vmalloc_to_page() instead.
> 
> Signed-off-by: Roman Skakun 
> Reviewed-by: Andrii Anisov 
> ---
> Hey!
> Thanks for suggestions, Christoph!
> I updated the patch according to your advice.
> But I'm surprised that nobody caught this problem
> in the common code before. It looks a bit strange to me.

This looks like it wasn't picked up? Should it go in rc1?
> 
> 
>  kernel/dma/ops_helpers.c | 12 ++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/dma/ops_helpers.c b/kernel/dma/ops_helpers.c
> index 910ae69cae77..782728d8a393 100644
> --- a/kernel/dma/ops_helpers.c
> +++ b/kernel/dma/ops_helpers.c
> @@ -5,6 +5,14 @@
>   */
>  #include 
>  
> +static struct page *cpu_addr_to_page(void *cpu_addr)
> +{
> + if (is_vmalloc_addr(cpu_addr))
> + return vmalloc_to_page(cpu_addr);
> + else
> + return virt_to_page(cpu_addr);
> +}
> +
>  /*
>   * Create scatter-list for the already allocated DMA buffer.
>   */
> @@ -12,7 +20,7 @@ int dma_common_get_sgtable(struct device *dev, struct 
> sg_table *sgt,
>void *cpu_addr, dma_addr_t dma_addr, size_t size,
>unsigned long attrs)
>  {
> - struct page *page = virt_to_page(cpu_addr);
> + struct page *page = cpu_addr_to_page(cpu_addr);
>   int ret;
>  
>   ret = sg_alloc_table(sgt, 1, GFP_KERNEL);
> @@ -43,7 +51,7 @@ int dma_common_mmap(struct device *dev, struct 
> vm_area_struct *vma,
>   return -ENXIO;
>  
>   return remap_pfn_range(vma, vma->vm_start,
> - page_to_pfn(virt_to_page(cpu_addr)) + vma->vm_pgoff,
> + page_to_pfn(cpu_addr_to_page(cpu_addr)) + vma->vm_pgoff,
>   user_count << PAGE_SHIFT, vma->vm_page_prot);
>  #else
>   return -ENXIO;
> -- 
> 2.25.1
> 


Re: [PATCH v15 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing

2021-07-13 Thread Konrad Rzeszutek Wilk
..snip..
> > > I think the main question I have is how would you like to see patches for
> > > 5.15? i.e. as patches on top of devel/for-linus-5.14 or something else?
> > 
> > Yes that would be perfect. If there are any dependencies on the rc1, I
> > can rebase it on top of that.
> 
> Yes, please, rebasing would be very helpful. The broader rework of
> 'io_tlb_default_mem' is going to conflict quite badly otherwise.

There is a devel/for-linus-5.15 (based on v5.14-rc1) now.

Thank you!
> 
> Cheers,
> 
> Will


Re: [PATCH] iommu/amd: Enable swiotlb if any device supports iommu v2 and uses identity mapping

2021-07-13 Thread Konrad Rzeszutek Wilk
On Thu, Jul 08, 2021 at 03:43:42PM +0100, Robin Murphy wrote:
> On 2021-07-08 14:57, Kai-Heng Feng wrote:
> > On Thu, Jul 8, 2021 at 6:18 PM Robin Murphy  wrote:
> > > 
> > > On 2021-07-08 10:28, Joerg Roedel wrote:
> > > > On Thu, Jul 08, 2021 at 03:42:32PM +0800, Kai-Heng Feng wrote:
> > > > > @@ -344,6 +344,9 @@ static int iommu_init_device(struct device *dev)
> > > > > 
> > > > >   iommu = amd_iommu_rlookup_table[dev_data->devid];
> > > > >   dev_data->iommu_v2 = iommu->is_iommu_v2;
> > > > > +
> > > > > +if (dev_data->iommu_v2)
> > > > > +swiotlb = 1;
> > > > 
> > > > This looks like the big hammer, as it will affect all other systems
> > > > where the AMD GPUs are in their own group.
> > > > 
> > > > What is needed here is an explicit check whether a non-iommu-v2 device
> > > > is direct-mapped because it shares a group with the GPU, and only enable
> > > > swiotlb in this case.
> > > 
> > > Right, it's basically about whether any DMA-limited device might at any
> > > time end up in an IOMMU_DOMAIN_IDENTITY domain. And given the
> > > possibility of device hotplug and the user being silly with the sysfs
> > > interface, I don't think we can categorically determine that at boot time.
> > > 
> > > Also note that Intel systems are likely to be similarly affected (in
> > > fact intel-iommu doesn't even have the iommu_default_passthough() check
> > > so it's probably even easier to blow up).
> > 
> > swiotlb is enabled by pci_swiotlb_detect_4gb() and intel-iommu doesn't
> > disable it.
> 
> Oh, right... I did say I found this dance hard to follow. Clearly I
> shouldn't have trusted what I thought I remembered from looking at it
> yesterday :)
> 
> Also not helped by the fact that it sets iommu_detected which *does* disable
> SWIOTLB, but only on IA-64.
> 
> > I wonder if we can take the same approach in amd-iommu?
> 
> Certainly if there's a precedent for leaving SWIOTLB enabled even if it
> *might* be redundant, that seems like the easiest option (it's what we do on
> arm64 too, but then we have system topologies where some devices may not be
> behind IOMMUs even when others are). More fun would be to try to bring it up
> at the first sign of IOMMU_DOMAIN_IDENTITY if it was disabled previously,
> but I don't have the highest hope of that being practical.


It is kind of silly to enable SWIOTLB which will just eat 64MB of memory
"just in case".

The SWIOTLB does have support to do late initialization (xen-pcifront
does that for example - so if you add devices that can't do 64-bit it
will allocate something like 4MB).

Would that be a better choice going forward - that is allocate this
under those conditions?
> 
> Robin.
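
For context, the late-initialization path mentioned above looks roughly
like this from a driver's perspective (a sketch; the helper is the generic
swiotlb late-init function and the 4MB figure is just the example size
quoted above):

	#include <linux/swiotlb.h>

	/* Called once a DMA-limited device is found behind an
	 * identity-mapped domain, instead of reserving 64MB at boot. */
	static int bring_up_bounce_buffers(void)
	{
		return swiotlb_late_init_with_default_size(4 << 20);
	}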


Re: [PATCH] swiotlb: add overflow checks to swiotlb_bounce

2021-07-13 Thread Konrad Rzeszutek Wilk
On Wed, Jul 07, 2021 at 02:12:54PM +0900, Dominique Martinet wrote:
> This is a follow-up on 5f89468e2f06 ("swiotlb: manipulate orig_addr
> when tlb_addr has offset") which fixed unaligned dma mappings,
> making sure the following overflows are caught:
> 
> - offset of the start of the slot within the device bigger than
> requested address' offset, in other words if the base address
> given in swiotlb_tbl_map_single to create the mapping (orig_addr)
> was after the requested address for the sync (tlb_offset) in the
> same block:
> 
>  |----------------------------| block
>           <------------------> mapped part of the block
>           ^
>           orig_addr
>        ^
>        invalid tlb_addr for sync
> 
> - if the resulting offset was bigger than the allocation size
> this one could happen if the mapping was not until the end. e.g.
> 
>  |----------------------------| block
>  <----------------->            mapped part of the block
>  ^                        ^
>  orig_addr                invalid tlb_addr
> 
> Both should never happen so print a warning and bail out without trying
> to adjust the sizes/offsets: the first one could try to sync from
> orig_addr to whatever is left of the requested size, but the latter
> really has nothing to sync there...
> 
> Signed-off-by: Dominique Martinet 
> Cc: Konrad Rzeszutek Wilk 
> Cc: Bumyong Lee 
> Cc: Chanho Park 
> Cc: Christoph Hellwig 
> ---
> 
> Hi Konrad,
> 
> here's the follow up for the swiotlb/caamjr regression I had promissed.

Awesome!
> It doesn't really change anything, and I confirmed I don't hit either of
> the warnings on our board, but it's probably best to have as either
> could really happen.

:nods:

I put it in the devel/for-linus-5.14 and linux-next. Thank you!
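
A minimal sketch of the two guards described in the commit message
(illustrative only; variable names follow the description, not
necessarily the patch):

	/* tlb_offset: offset of the sync request within the slot;
	 * orig_addr_offset: offset at which the original mapping started. */
	if (tlb_offset < orig_addr_offset) {
		dev_WARN_ONCE(dev, 1, "sync started before mapped region\n");
		return;
	}
	tlb_offset -= orig_addr_offset;
	if (tlb_offset > alloc_size) {
		dev_WARN_ONCE(dev, 1, "sync ends beyond mapped region\n");
		return;
	}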


RE: [RFC v2] /dev/iommu uAPI proposal

2021-07-13 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Wednesday, July 14, 2021 7:23 AM
> 
> On Tue, Jul 13, 2021 at 11:20:12PM +, Tian, Kevin wrote:
> > > From: Jason Gunthorpe 
> > > Sent: Wednesday, July 14, 2021 7:03 AM
> > >
> > > On Tue, Jul 13, 2021 at 10:48:38PM +, Tian, Kevin wrote:
> > >
> > > > We can still bind to the parent with cookie, but with
> > > > iommu_register_ sw_device() IOMMU fd knows that this binding
> doesn't
> > > > need to establish any security context via IOMMU API.
> > >
> > > AFAIK there is no reason to involve the parent PCI or other device in
> > > SW mode. The iommufd doesn't need to be aware of anything there.
> > >
> >
> > Yes. But does it make sense to have a unified model in IOMMU fd
> > which always has a [struct device, cookie] with flags to indicate whether
> > the binding/attaching should be specially handled for sw mdev? Or
> > are you suggesting that the lack of a struct device is actually the
> > indicator for such a trick?
> 
> I think you've veered into such micro implementation details that it
> is better to wait and see how things look.
> 
> The important point here is that whatever physical device is under a
> SW mdev does not need to be passed to the iommufd because there is
> nothing it can do with that information.
> 

Make sense


Re: [PATCH v4 3/3] iommu: dart: Add DART iommu driver

2021-07-13 Thread Robin Murphy
^^ Nit: the subsystem style for the subject format should be 
"iommu/dart: Add..." - similarly on patch #1, which I just realised I 
missed (sorry!)


On 2021-06-27 15:34, Sven Peter wrote:

Apple's new SoCs use iommus for almost all peripherals. These Device
Address Resolution Tables must be setup before these peripherals can
act as DMA masters.

Signed-off-by: Sven Peter 
---
  MAINTAINERS  |1 +
  drivers/iommu/Kconfig|   15 +
  drivers/iommu/Makefile   |1 +
  drivers/iommu/apple-dart-iommu.c | 1058 ++


I'd be inclined to drop "-iommu" from the filename, unless there's some 
other "apple-dart" functionality that might lead to a module name clash 
in future?



  4 files changed, 1075 insertions(+)
  create mode 100644 drivers/iommu/apple-dart-iommu.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 29e5541c8f21..c1ffaa56b5f9 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1245,6 +1245,7 @@ M:	Sven Peter 
  L:	iommu@lists.linux-foundation.org
  S:	Maintained
  F:	Documentation/devicetree/bindings/iommu/apple,dart.yaml
+F:	drivers/iommu/apple-dart-iommu.c
  
  APPLE SMC DRIVER

  M:Henrik Rydberg 
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 1f111b399bca..87882c628b46 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -249,6 +249,21 @@ config SPAPR_TCE_IOMMU
  Enables bits of IOMMU API required by VFIO. The iommu_ops
  is not implemented as it is not necessary for VFIO.
  
+config IOMMU_APPLE_DART

+   tristate "Apple DART IOMMU Support"
+   depends on ARM64 || (COMPILE_TEST && !GENERIC_ATOMIC64)
+   select IOMMU_API
+   select IOMMU_IO_PGTABLE


This is redundant - the individual formats already select it.


+   select IOMMU_IO_PGTABLE_LPAE
+   default ARCH_APPLE
+   help
+ Support for Apple DART (Device Address Resolution Table) IOMMUs
+ found in Apple ARM SoCs like the M1.
+ This IOMMU is required for most peripherals using DMA to access
+ the main memory.
+
+ Say Y here if you are using an Apple SoC with a DART IOMMU.
+
  # ARM IOMMU support
  config ARM_SMMU
tristate "ARM Ltd. System MMU (SMMU) Support"
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index c0fb0ba88143..8c813f0ebc54 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -29,3 +29,4 @@ obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o
  obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
  obj-$(CONFIG_IOMMU_SVA_LIB) += iommu-sva-lib.o io-pgfault.o
  obj-$(CONFIG_SPRD_IOMMU) += sprd-iommu.o
+obj-$(CONFIG_IOMMU_APPLE_DART) += apple-dart-iommu.o
diff --git a/drivers/iommu/apple-dart-iommu.c b/drivers/iommu/apple-dart-iommu.c
new file mode 100644
index ..637ba6e7cef9
--- /dev/null
+++ b/drivers/iommu/apple-dart-iommu.c
@@ -0,0 +1,1058 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Apple DART (Device Address Resolution Table) IOMMU driver
+ *
+ * Copyright (C) 2021 The Asahi Linux Contributors
+ *
+ * Based on arm/arm-smmu/arm-ssmu.c and arm/arm-smmu-v3/arm-smmu-v3.c
+ *  Copyright (C) 2013 ARM Limited
+ *  Copyright (C) 2015 ARM Limited
+ * and on exynos-iommu.c
+ *  Copyright (c) 2011,2016 Samsung Electronics Co., Ltd.
+ */
+
+#include 
+#include 
+#include 


Oh, right, bus_dma_region. Fair enough :)


+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 


Redundant duplicate


+#define DART_MAX_STREAMS 16
+#define DART_MAX_TTBR 4
+
+#define DART_STREAM_ALL 0x
+
+#define DART_PARAMS1 0x00
+#define DART_PARAMS_PAGE_SHIFT GENMASK(27, 24)
+
+#define DART_PARAMS2 0x04
+#define DART_PARAMS_BYPASS_SUPPORT BIT(0)
+
+#define DART_STREAM_COMMAND 0x20
+#define DART_STREAM_COMMAND_BUSY BIT(2)
+#define DART_STREAM_COMMAND_INVALIDATE BIT(20)
+
+#define DART_STREAM_SELECT 0x34
+
+#define DART_ERROR 0x40
+#define DART_ERROR_STREAM GENMASK(27, 24)
+#define DART_ERROR_CODE GENMASK(23, 0)
+#define DART_ERROR_FLAG BIT(31)
+#define DART_ERROR_READ_FAULT BIT(4)
+#define DART_ERROR_WRITE_FAULT BIT(3)
+#define DART_ERROR_NO_PTE BIT(2)
+#define DART_ERROR_NO_PMD BIT(1)
+#define DART_ERROR_NO_TTBR BIT(0)
+
+#define DART_CONFIG 0x60
+#define DART_CONFIG_LOCK BIT(15)
+
+#define DART_STREAM_COMMAND_BUSY_TIMEOUT 100
+
+#define DART_STREAM_REMAP 0x80
+
+#define DART_ERROR_ADDR_HI 0x54
+#define DART_ERROR_ADDR_LO 0x50
+
+#define DART_TCR(sid) (0x100 + 4 * (sid))
+#define DART_TCR_TRANSLATE_ENABLE BIT(7)
+#define DART_TCR_BYPASS0_ENABLE BIT(8)
+#define DART_TCR_BYPASS1_ENABLE BIT(12)
+
+#define DART_TTBR(sid, idx) (0x200 + 16 * (sid) + 4 * (idx))
+#define DART_TTBR_VALID BIT(31)
+#define DART_TTBR_SHIFT 12
+
+/*
+ * Private structure associated with each DART device.
+ *
+ * @dev: device struct
+ * @regs: mapped MMIO region
+ * @irq: interrupt number, can be shared with 

Re: [RFC v2] /dev/iommu uAPI proposal

2021-07-13 Thread Jason Gunthorpe
On Tue, Jul 13, 2021 at 11:20:12PM +, Tian, Kevin wrote:
> > From: Jason Gunthorpe 
> > Sent: Wednesday, July 14, 2021 7:03 AM
> > 
> > On Tue, Jul 13, 2021 at 10:48:38PM +, Tian, Kevin wrote:
> > 
> > > We can still bind to the parent with cookie, but with
> > > iommu_register_ sw_device() IOMMU fd knows that this binding doesn't
> > > need to establish any security context via IOMMU API.
> > 
> > AFAIK there is no reason to involve the parent PCI or other device in
> > SW mode. The iommufd doesn't need to be aware of anything there.
> > 
> 
> Yes. But does it make sense to have a unified model in IOMMU fd
> which always has a [struct device, cookie] with flags to indicate whether
> the binding/attaching should be specially handled for sw mdev? Or
> are you suggesting that the lack of a struct device is actually the
> indicator for such a trick?

I think you've veered into such micro implementation details that it
is better to wait and see how things look.

The important point here is that whatever physical device is under a
SW mdev does not need to be passed to the iommufd because there is
nothing it can do with that information.

Jason


RE: [RFC v2] /dev/iommu uAPI proposal

2021-07-13 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Wednesday, July 14, 2021 7:03 AM
> 
> On Tue, Jul 13, 2021 at 10:48:38PM +, Tian, Kevin wrote:
> 
> > We can still bind to the parent with cookie, but with
> > iommu_register_ sw_device() IOMMU fd knows that this binding doesn't
> > need to establish any security context via IOMMU API.
> 
> AFAIK there is no reason to involve the parent PCI or other device in
> SW mode. The iommufd doesn't need to be aware of anything there.
> 

Yes. But does it make sense to have a unified model in IOMMU fd
which always has a [struct device, cookie] with flags to indicate whether
the binding/attaching should be specially handled for sw mdev? Or
are you suggesting that the lack of a struct device is actually the
indicator for such a trick?

Thanks
Kevin


Re: [RFC v2] /dev/iommu uAPI proposal

2021-07-13 Thread Jason Gunthorpe
On Tue, Jul 13, 2021 at 10:48:38PM +, Tian, Kevin wrote:

> We can still bind to the parent with cookie, but with
> iommu_register_ sw_device() IOMMU fd knows that this binding doesn't
> need to establish any security context via IOMMU API.

AFAIK there is no reason to involve the parent PCI or other device in
SW mode. The iommufd doesn't need to be aware of anything there.

Jason


RE: [RFC v2] /dev/iommu uAPI proposal

2021-07-13 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Wednesday, July 14, 2021 12:33 AM
> 
> On Tue, Jul 13, 2021 at 10:26:07AM -0600, Alex Williamson wrote:
> > Quoting this proposal again:
> >
> > > 1)  A successful binding call for the first device in the group creates
> > > the security context for the entire group, by:
> > >
> > > * Verifying group viability in a similar way as VFIO does;
> > >
> > > * Calling IOMMU-API to move the group into a block-dma state,
> > >   which makes all devices in the group attached to an block-dma
> > >   domain with an empty I/O page table;
> > >
> > > VFIO should not allow the user to mmap the MMIO bar of the bound
> > > device until the binding call succeeds.
> >
> > The attach step is irrelevant to my question, the bind step is where
> > the device/group gets into a secure state for device access.
> 
> Binding is similar to attach, it will need to indicate the drivers
> intention and a SW driver will not attach to the PCI device underneath
> it.

Yes. I need to clarify this part in the next version. In v1 the binding
operation was purely a software operation within the IOMMU fd, thus there
was no intention to differentiate device types in this step. But now with
v2 the binding actually involves calling the IOMMU API for devices other
than sw mdev. Then we do need similar per-type binding wrappers as defined
for the attaching calls.

> 
> > AIUI the operation of VFIO_DEVICE_BIND_IOMMU_FD looks like this:
> >
> > iommu_ctx = iommu_ctx_fdget(iommu_fd);
> >
> > mdev = mdev_from_dev(vdev->dev);
> > dev = mdev ? mdev_parent_dev(mdev) : vdev->dev;
> >
> > iommu_dev = iommu_register_device(iommu_ctx, dev, cookie);
> 
> A default of binding to vdev->dev might turn out to be OK, but this
> needs to be an overridable op in vfio_device and the SW mdevs will
> have to do some 'iommu_register_sw_device()' and not pass in a dev at
> all.
> 

We can still bind to the parent with cookie, but with iommu_register_
sw_device() IOMMU fd knows that this binding doesn't need to
establish any security context via IOMMU API. 

Thanks
Kevin


Re: [PATCH v5 5/7] iommu/amd: Tailored gather logic for AMD

2021-07-13 Thread Nadav Amit



> On Jul 13, 2021, at 11:40 AM, Robin Murphy  wrote:
> 
> On 2021-07-13 10:41, Nadav Amit wrote:
>> From: Nadav Amit 
>> AMD's IOMMU can flush efficiently (i.e., in a single flush) any range.
>> This is in contrast, for instance, to Intel IOMMUs that have a limit on
>> the number of pages that can be flushed in a single flush.  In addition,
>> AMD's IOMMU does not care about the page-size, so changes of the page size
>> do not need to trigger a TLB flush.
>> So in most cases, a TLB flush due to disjoint range is not needed for
>> AMD. Yet, vIOMMUs require the hypervisor to synchronize the virtualized
>> IOMMU's PTEs with the physical ones. This process induces overhead, so
>> it is better not to cause unnecessary flushes, i.e., flushes of PTEs
>> that were not modified.
>> Implement and use amd_iommu_iotlb_gather_add_page() and use it instead
>> of the generic iommu_iotlb_gather_add_page(). Ignore disjoint regions
>> unless "non-present cache" feature is reported by the IOMMU
>> capabilities, as this is an indication we are running on a physical
>> IOMMU. A similar indication is used by VT-d (see "caching mode"). The
>> new logic retains the same flushing behavior that we had before the
>> introduction of page-selective IOTLB flushes for AMD.
>> On virtualized environments, check if the newly flushed region and the
>> gathered one are disjoint and flush if it is.
>> Cc: Joerg Roedel 
>> Cc: Will Deacon 
>> Cc: Jiajun Cao 
>> Cc: Lu Baolu 
>> Cc: iommu@lists.linux-foundation.org
>> Cc: linux-ker...@vger.kernel.org>
>> Cc: Robin Murphy 
>> Signed-off-by: Nadav Amit 
>> ---
>>  drivers/iommu/amd/iommu.c | 23 ++-
>>  1 file changed, 22 insertions(+), 1 deletion(-)
>> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
>> index bfae3928b98f..cc55c4c6a355 100644
>> --- a/drivers/iommu/amd/iommu.c
>> +++ b/drivers/iommu/amd/iommu.c
>> @@ -2048,6 +2048,27 @@ static int amd_iommu_map(struct iommu_domain *dom, unsigned long iova,
>>  return ret;
>>  }
>>  +static void amd_iommu_iotlb_gather_add_page(struct iommu_domain *domain,
>> +struct iommu_iotlb_gather *gather,
>> +unsigned long iova, size_t size)
>> +{
>> +/*
>> + * AMD's IOMMU can flush as many pages as necessary in a single flush.
>> + * Unless we run in a virtual machine, which can be inferred according
>> + * to whether "non-present cache" is on, it is probably best to prefer
>> + * (potentially) too extensive TLB flushing (i.e., more misses) over
>> +* multiple TLB flushes (i.e., more flushes). For virtual machines the
>> + * hypervisor needs to synchronize the host IOMMU PTEs with those of
>> + * the guest, and the trade-off is different: unnecessary TLB flushes
>> + * should be avoided.
>> + */
>> +if (amd_iommu_np_cache && gather->end != 0 &&
> 
> iommu_iotlb_gather_is_disjoint() is also checking "gather->end != 0", so I 
> don't think we need both. Strictly it's only necessary here since the other 
> call from iommu_iotlb_gather_add_page() equivalently asserts that the gather 
> is already non-empty via its gather->pgsize check, but one could argue it 
> either way and I don't have a hugely strong preference.

You are correct (even if the compiler would have eliminated the redundancy).

I will remove the redundant check.

> 
> Otherwise, I love how neat this has all ended up, thanks for persevering!

Thank you for the thorough review!

Regards,
Nadav


Re: [PATCH 1/4] iommu/vt-d: Disable superpage for Geminilake igfx

2021-07-13 Thread Ville Syrjälä
On Tue, Jul 13, 2021 at 09:34:09AM +0800, Lu Baolu wrote:
> On 7/12/21 11:47 PM, Ville Syrjälä wrote:
> > On Mon, Jul 12, 2021 at 07:23:07AM +0800, Lu Baolu wrote:
> >> On 7/10/21 12:47 AM, Ville Syrjala wrote:
> >>> From: Ville Syrjälä
> >>>
> >>> While running "gem_exec_big --r single" from igt-gpu-tools on
> >>> Geminilake as soon as a 2M mapping is made I tend to get a DMAR
> >>> write fault. Strangely the faulting address is always a 4K page
> >>> and usually very far away from the 2M page that got mapped.
> >>> But if no 2M mappings get used I can't reproduce the fault.
> >>>
> >>> I also tried to dump the PTE for the faulting address but it actually
> >>> looks correct to me (ie. definitely seems to have the write bit set):
> >>>DMAR: DRHD: handling fault status reg 2
> >>>DMAR: [DMA Write] Request device [00:02.0] PASID  fault addr 7fa8a78000 [fault reason 05] PTE Write access is not set
> >>>DMAR: fault 7fa8a78000 (level=1) PTE = 149efc003
> >>>
> >>> So not really sure what's going on and this might just be full on duct
> >>> tape, but it seems to work here. The machine has now survived a whole day
> >>> running that test whereas with superpage enabled it fails in less than
> >>> a minute usually.
> >>>
> >>> TODO: might be nice to disable superpage only for the igfx iommu
> >>> instead of both iommus
> >> If all these quirks are about igfx dedicated iommu's, I would suggest to
> >> disable superpage only for the igfx ones.
> > Sure. Unfortunately there's no convenient mechanism to do that in
> > the iommu driver that I can immediately see. So not something I
> > can just whip up easily. Since you're actually familiar with the
> > driver maybe you can come up with a decent solution for that?
> > 
> 
> How about something like below? [no compile, no test...]
> 
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 1131b8efb050..2d51ef288a9e 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -338,6 +338,7 @@ static int intel_iommu_strict;
>   static int intel_iommu_superpage = 1;
>   static int iommu_identity_mapping;
>   static int iommu_skip_te_disable;
> +static int iommu_skip_igfx_superpage;
> 
>   #define IDENTMAP_GFX 2
>   #define IDENTMAP_AZALIA 4
> @@ -652,6 +653,27 @@ static bool domain_update_iommu_snooping(struct intel_iommu *skip)
>   return ret;
>   }
> 
> +static bool domain_use_super_page(struct dmar_domain *domain)
> +{
> + struct dmar_drhd_unit *drhd;
> + struct intel_iommu *iommu;
> + bool ret = true;
> +
> + if (!intel_iommu_superpage)
> + return false;
> +
> + rcu_read_lock();
> + for_each_active_iommu(iommu, drhd) {
> + if (drhd->gfx_dedicated && iommu_skip_igfx_superpage) {
> + ret = false;
> + break
 ^
Missing semicolon. Otherwise seems to work great here. Thanks.

Are you going to turn this into a proper patch, or do you
want me to just squash this into my patches and repost?

> + }
> + }
> + rcu_read_unlock();
> +
> + return ret;
> +}
> +
>   static int domain_update_iommu_superpage(struct dmar_domain *domain,
>struct intel_iommu *skip)
>   {
> @@ -659,7 +681,7 @@ static int domain_update_iommu_superpage(struct 
> dmar_domain *domain,
>   struct intel_iommu *iommu;
>   int mask = 0x3;
> 
> - if (!intel_iommu_superpage)
> + if (!domain_use_super_page(domain))
>   return 0;
> 
>   /* set iommu_superpage to the smallest common denominator */
> @@ -5656,6 +5678,14 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x1632, quirk_iommu_igfx);
>   DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x163A, quirk_iommu_igfx);
>   DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x163D, quirk_iommu_igfx);
> 
> +static void quirk_skip_igfx_superpage(struct pci_dev *dev)
> +{
> + pci_info(dev, "Disabling IOMMU superpage for graphics on this chipset\n");
> + iommu_skip_igfx_superpage = 1;
> +}
> +
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x3184, quirk_skip_igfx_superpage);
> +
>   static void quirk_iommu_rwbf(struct pci_dev *dev)
>   {
>   if (risky_device(dev))
> 
> Best regards,
> baolu

-- 
Ville Syrjälä
Intel


[PATCH] dt-bindings: More dropping redundant minItems/maxItems

2021-07-13 Thread Rob Herring
Another round of removing redundant minItems/maxItems from new schema in
the recent merge window.

If a property has an 'items' list, then a 'minItems' or 'maxItems' with the
same size as the list is redundant and can be dropped. Note that is DT
schema specific behavior and not standard json-schema behavior. The tooling
will fixup the final schema adding any unspecified minItems/maxItems.

This condition is partially checked with the meta-schema already, but
only if both 'minItems' and 'maxItems' are equal to the 'items' length.
An improved meta-schema is pending.

Cc: Stephen Boyd 
Cc: Joerg Roedel 
Cc: Will Deacon 
Cc: Krzysztof Kozlowski 
Cc: Miquel Raynal 
Cc: Richard Weinberger 
Cc: Vignesh Raghavendra 
Cc: Alessandro Zummo 
Cc: Alexandre Belloni 
Cc: Greg Kroah-Hartman 
Cc: Sureshkumar Relli 
Cc: Brian Norris 
Cc: Kamal Dasu 
Cc: Linus Walleij 
Cc: Sebastian Siewior 
Cc: Laurent Pinchart 
Cc: linux-...@vger.kernel.org
Cc: iommu@lists.linux-foundation.org
Cc: linux-...@lists.infradead.org
Cc: linux-...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Signed-off-by: Rob Herring 
---
 .../devicetree/bindings/clock/brcm,iproc-clocks.yaml  | 1 -
 .../devicetree/bindings/iommu/rockchip,iommu.yaml | 2 --
 .../bindings/memory-controllers/arm,pl353-smc.yaml| 1 -
 Documentation/devicetree/bindings/mtd/brcm,brcmnand.yaml  | 8 
 .../devicetree/bindings/rtc/faraday,ftrtc010.yaml | 1 -
 Documentation/devicetree/bindings/usb/nxp,isp1760.yaml| 2 --
 6 files changed, 15 deletions(-)

diff --git a/Documentation/devicetree/bindings/clock/brcm,iproc-clocks.yaml b/Documentation/devicetree/bindings/clock/brcm,iproc-clocks.yaml
index 8dc7b404ee12..1174c9aa9934 100644
--- a/Documentation/devicetree/bindings/clock/brcm,iproc-clocks.yaml
+++ b/Documentation/devicetree/bindings/clock/brcm,iproc-clocks.yaml
@@ -50,7 +50,6 @@ properties:
 
   reg:
 minItems: 1
-maxItems: 3
 items:
   - description: base register
   - description: power register
diff --git a/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml b/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml
index d2e28a9e3545..ba9124f721f1 100644
--- a/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml
+++ b/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml
@@ -28,14 +28,12 @@ properties:
   - description: configuration registers for MMU instance 0
   - description: configuration registers for MMU instance 1
 minItems: 1
-maxItems: 2
 
   interrupts:
 items:
   - description: interruption for MMU instance 0
   - description: interruption for MMU instance 1
 minItems: 1
-maxItems: 2
 
   clocks:
 items:
diff --git a/Documentation/devicetree/bindings/memory-controllers/arm,pl353-smc.yaml b/Documentation/devicetree/bindings/memory-controllers/arm,pl353-smc.yaml
index 7a63c85ef8c5..01c9acf9275d 100644
--- a/Documentation/devicetree/bindings/memory-controllers/arm,pl353-smc.yaml
+++ b/Documentation/devicetree/bindings/memory-controllers/arm,pl353-smc.yaml
@@ -57,7 +57,6 @@ properties:
 
   ranges:
 minItems: 1
-maxItems: 3
 description: |
   Memory bus areas for interacting with the devices. Reflects
   the memory layout with four integer values following:
diff --git a/Documentation/devicetree/bindings/mtd/brcm,brcmnand.yaml b/Documentation/devicetree/bindings/mtd/brcm,brcmnand.yaml
index e5f1a2a5..dd5a64969e37 100644
--- a/Documentation/devicetree/bindings/mtd/brcm,brcmnand.yaml
+++ b/Documentation/devicetree/bindings/mtd/brcm,brcmnand.yaml
@@ -84,7 +84,6 @@ properties:
 
   interrupts:
 minItems: 1
-maxItems: 3
 items:
   - description: NAND CTLRDY interrupt
   - description: FLASH_DMA_DONE if flash DMA is available
@@ -92,7 +91,6 @@ properties:
 
   interrupt-names:
 minItems: 1
-maxItems: 3
 items:
   - const: nand_ctlrdy
   - const: flash_dma_done
@@ -148,8 +146,6 @@ allOf:
 then:
   properties:
 reg-names:
-  minItems: 2
-  maxItems: 2
   items:
 - const: nand
 - const: nand-int-base
@@ -161,8 +157,6 @@ allOf:
 then:
   properties:
 reg-names:
-  minItems: 3
-  maxItems: 3
   items:
 - const: nand
 - const: nand-int-base
@@ -175,8 +169,6 @@ allOf:
 then:
   properties:
 reg-names:
-  minItems: 3
-  maxItems: 3
   items:
 - const: nand
 - const: iproc-idm
diff --git a/Documentation/devicetree/bindings/rtc/faraday,ftrtc010.yaml b/Documentation/devicetree/bindings/rtc/faraday,ftrtc010.yaml
index 657c13b62b67..056d42daae06 100644
--- a/Documentation/devicetree/bindings/rtc/faraday,ftrtc010.yaml
+++ b/Documentation/devicetree/bindings/rtc/faraday,ftrtc010.yaml
@@ -30,7 +30,6 @@ properties:
 maxItems: 1
 
   clocks:
-minItems: 2
 items:
   - description: PCLK clocks
   - 

Re: [PATCH v4 1/3] iommu: io-pgtable: add DART pagetable format

2021-07-13 Thread Robin Murphy

On 2021-06-27 15:34, Sven Peter wrote:

Apple's DART iommu uses a pagetable format that shares some
similarities with the ones already implemented by io-pgtable.c.
Add a new format variant to support the required differences
so that we don't have to duplicate the pagetable handling code.

Signed-off-by: Sven Peter 
---
  drivers/iommu/io-pgtable-arm.c | 62 ++
  drivers/iommu/io-pgtable.c |  1 +
  include/linux/io-pgtable.h |  7 
  3 files changed, 70 insertions(+)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 87def58e79b5..1dd5c45b4b5b 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -127,6 +127,9 @@
  #define ARM_MALI_LPAE_MEMATTR_IMP_DEF 0x88ULL
  #define ARM_MALI_LPAE_MEMATTR_WRITE_ALLOC 0x8DULL
  
+#define APPLE_DART_PTE_PROT_NO_WRITE (1<<7)

+#define APPLE_DART_PTE_PROT_NO_READ (1<<8)
+
  /* IOPTE accessors */
  #define iopte_deref(pte,d) __va(iopte_to_paddr(pte, d))
  
@@ -381,6 +384,15 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,

  {
arm_lpae_iopte pte;
  
+	if (data->iop.fmt == ARM_APPLE_DART) {

+   pte = 0;
+   if (!(prot & IOMMU_WRITE))
+   pte |= APPLE_DART_PTE_PROT_NO_WRITE;
+   if (!(prot & IOMMU_READ))
+   pte |= APPLE_DART_PTE_PROT_NO_READ;
+   return pte;
+   }
+
if (data->iop.fmt == ARM_64_LPAE_S1 ||
data->iop.fmt == ARM_32_LPAE_S1) {
pte = ARM_LPAE_PTE_nG;
@@ -1043,6 +1055,51 @@ arm_mali_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
return NULL;
  }
  
+static struct io_pgtable *

+apple_dart_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
+{
+   struct arm_lpae_io_pgtable *data;
+   int i;
+
+   if (cfg->oas > 36)
+   return NULL;
+
+   data = arm_lpae_alloc_pgtable(cfg);
+   if (!data)
+   return NULL;
+
+   /*
+* Apple's DART always requires three levels with the first level being
+* stored in four MMIO registers. We always concatenate the first and
+* second level so that we only have to setup the MMIO registers once.
+* This results in an effective two level pagetable.
+*/


Nit: I appreciate the effort to document the weirdness, but this comment 
did rather mislead me initially, and now that I (think I) understand how 
things work it seems a bit backwards. Could we say something like:


  "The table format itself always uses two levels, but the total VA
   space is mapped by four separate tables, making the MMIO registers
   an effective "level 1". For simplicity, though, we treat this
   equivalently to LPAE stage 2 concatenation at level 2, with the
   additional TTBRs each just pointing at consecutive pages."

?


+   if (data->start_level < 1)
+   return NULL;
+   if (data->start_level == 1 && data->pgd_bits > 2)
+   return NULL;
+   if (data->start_level > 1)
+   data->pgd_bits = 0;
+   data->start_level = 2;
+   cfg->apple_dart_cfg.n_ttbrs = 1 << data->pgd_bits;
+   data->pgd_bits += data->bits_per_level;
+
+   data->pgd = __arm_lpae_alloc_pages(ARM_LPAE_PGD_SIZE(data), GFP_KERNEL,
+  cfg);
+   if (!data->pgd)
+   goto out_free_data;
+
+   for (i = 0; i < cfg->apple_dart_cfg.n_ttbrs; ++i)
+   cfg->apple_dart_cfg.ttbr[i] =
+   virt_to_phys(data->pgd + i * ARM_LPAE_GRANULE(data));
+
+   return >iop;
+
+out_free_data:
+   kfree(data);
+   return NULL;
+}
+
  struct io_pgtable_init_fns io_pgtable_arm_64_lpae_s1_init_fns = {
.alloc  = arm_64_lpae_alloc_pgtable_s1,
.free   = arm_lpae_free_pgtable,
@@ -1068,6 +1125,11 @@ struct io_pgtable_init_fns io_pgtable_arm_mali_lpae_init_fns = {
.free   = arm_lpae_free_pgtable,
  };
  
+struct io_pgtable_init_fns io_pgtable_apple_dart_init_fns = {

+   .alloc  = apple_dart_alloc_pgtable,
+   .free   = arm_lpae_free_pgtable,
+};
+
  #ifdef CONFIG_IOMMU_IO_PGTABLE_LPAE_SELFTEST
  
  static struct io_pgtable_cfg *cfg_cookie __initdata;

diff --git a/drivers/iommu/io-pgtable.c b/drivers/iommu/io-pgtable.c
index 6e9917ce980f..fd8e6bd6caf9 100644
--- a/drivers/iommu/io-pgtable.c
+++ b/drivers/iommu/io-pgtable.c
@@ -20,6 +20,7 @@ io_pgtable_init_table[IO_PGTABLE_NUM_FMTS] = {
[ARM_64_LPAE_S1] = _pgtable_arm_64_lpae_s1_init_fns,
[ARM_64_LPAE_S2] = _pgtable_arm_64_lpae_s2_init_fns,
[ARM_MALI_LPAE] = _pgtable_arm_mali_lpae_init_fns,
+   [ARM_APPLE_DART] = _pgtable_apple_dart_init_fns,
  #endif
  #ifdef CONFIG_IOMMU_IO_PGTABLE_ARMV7S
[ARM_V7S] = _pgtable_arm_v7s_init_fns,
diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index 4d40dfa75b55..a4bfac7f85f7 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h

Re: [PATCH v5 5/7] iommu/amd: Tailored gather logic for AMD

2021-07-13 Thread Robin Murphy

On 2021-07-13 10:41, Nadav Amit wrote:

From: Nadav Amit 

AMD's IOMMU can flush efficiently (i.e., in a single flush) any range.
This is in contrast, for instance, to Intel IOMMUs that have a limit on
the number of pages that can be flushed in a single flush.  In addition,
AMD's IOMMU does not care about the page-size, so changes of the page size
do not need to trigger a TLB flush.

So in most cases, a TLB flush due to disjoint range is not needed for
AMD. Yet, vIOMMUs require the hypervisor to synchronize the virtualized
IOMMU's PTEs with the physical ones. This process induces overhead, so
it is better not to cause unnecessary flushes, i.e., flushes of PTEs
that were not modified.

Implement and use amd_iommu_iotlb_gather_add_page() and use it instead
of the generic iommu_iotlb_gather_add_page(). Ignore disjoint regions
unless "non-present cache" feature is reported by the IOMMU
capabilities, as this is an indication we are running on a physical
IOMMU. A similar indication is used by VT-d (see "caching mode"). The
new logic retains the same flushing behavior that we had before the
introduction of page-selective IOTLB flushes for AMD.

On virtualized environments, check if the newly flushed region and the
gathered one are disjoint and flush if it is.

Cc: Joerg Roedel 
Cc: Will Deacon 
Cc: Jiajun Cao 
Cc: Lu Baolu 
Cc: iommu@lists.linux-foundation.org
Cc: linux-ker...@vger.kernel.org>
Cc: Robin Murphy 
Signed-off-by: Nadav Amit 
---
  drivers/iommu/amd/iommu.c | 23 ++-
  1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index bfae3928b98f..cc55c4c6a355 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2048,6 +2048,27 @@ static int amd_iommu_map(struct iommu_domain *dom, unsigned long iova,
return ret;
  }
  
+static void amd_iommu_iotlb_gather_add_page(struct iommu_domain *domain,

+   struct iommu_iotlb_gather *gather,
+   unsigned long iova, size_t size)
+{
+   /*
+* AMD's IOMMU can flush as many pages as necessary in a single flush.
+* Unless we run in a virtual machine, which can be inferred according
+* to whether "non-present cache" is on, it is probably best to prefer
+* (potentially) too extensive TLB flushing (i.e., more misses) over
+* multiple TLB flushes (i.e., more flushes). For virtual machines the
+* hypervisor needs to synchronize the host IOMMU PTEs with those of
+* the guest, and the trade-off is different: unnecessary TLB flushes
+* should be avoided.
+*/
+   if (amd_iommu_np_cache && gather->end != 0 &&


iommu_iotlb_gather_is_disjoint() is also checking "gather->end != 0", so 
I don't think we need both. Strictly it's only necessary here since the 
other call from iommu_iotlb_gather_add_page() equivalently asserts that 
the gather is already non-empty via its gather->pgsize check, but one 
could argue it either way and I don't have a hugely strong preference.


Otherwise, I love how neat this has all ended up, thanks for persevering!

Reviewed-by: Robin Murphy 


+   iommu_iotlb_gather_is_disjoint(gather, iova, size))
+   iommu_iotlb_sync(domain, gather);
+
+   iommu_iotlb_gather_add_range(gather, iova, size);
+}
+
  static size_t amd_iommu_unmap(struct iommu_domain *dom, unsigned long iova,
  size_t page_size,
  struct iommu_iotlb_gather *gather)
@@ -2062,7 +2083,7 @@ static size_t amd_iommu_unmap(struct iommu_domain *dom, unsigned long iova,
  
  	r = (ops->unmap) ? ops->unmap(ops, iova, page_size, gather) : 0;
  
-	iommu_iotlb_gather_add_page(dom, gather, iova, page_size);

+   amd_iommu_iotlb_gather_add_page(dom, gather, iova, page_size);
  
  	return r;

  }




Re: [PATCH v5 4/7] iommu: Factor iommu_iotlb_gather_is_disjoint() out

2021-07-13 Thread Robin Murphy

On 2021-07-13 10:41, Nadav Amit wrote:

From: Nadav Amit 

Refactor iommu_iotlb_gather_add_page() and factor out the logic that
detects whether IOTLB gather range and a new range are disjoint. To be
used by the next patch that implements different gathering logic for
AMD.

Note that updating gather->pgsize unconditionally does not affect
correctness as the function had (and has) an invariant, in which
gather->pgsize always represents the flushing granularity of its range.
Arguably, “size" should never be zero, but lets assume for the matter of
discussion that it might.

If "size" equals to "gather->pgsize", then the assignment in question
has no impact.

Otherwise, if "size" is non-zero, then iommu_iotlb_sync() would
initialize the size and range (see iommu_iotlb_gather_init()), and the
invariant is kept.

Otherwise, "size" is zero, and "gather" already holds a range, so
gather->pgsize is non-zero and (gather->pgsize && gather->pgsize !=
size) is true. Therefore, again, iommu_iotlb_sync() would be called and
initialize the size.


With the caveat of one comment on the next patch...

Reviewed-by: Robin Murphy 


Cc: Joerg Roedel 
Cc: Jiajun Cao 
Cc: Robin Murphy 
Cc: Lu Baolu 
Cc: iommu@lists.linux-foundation.org
Cc: linux-ker...@vger.kernel.org>
Acked-by: Will Deacon 
Signed-off-by: Nadav Amit 
---
  include/linux/iommu.h | 34 ++
  1 file changed, 26 insertions(+), 8 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index e554871db46f..979a5ceeea55 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -497,6 +497,28 @@ static inline void iommu_iotlb_sync(struct iommu_domain *domain,
iommu_iotlb_gather_init(iotlb_gather);
  }
  
+/**

+ * iommu_iotlb_gather_is_disjoint - Checks whether a new range is disjoint
+ *
+ * @gather: TLB gather data
+ * @iova: start of page to invalidate
+ * @size: size of page to invalidate
+ *
+ * Helper for IOMMU drivers to check whether a new range and the gathered range
+ * are disjoint. For many IOMMUs, flushing the IOMMU in this case is better
+ * than merging the two, which might lead to unnecessary invalidations.
+ */
+static inline
+bool iommu_iotlb_gather_is_disjoint(struct iommu_iotlb_gather *gather,
+   unsigned long iova, size_t size)
+{
+   unsigned long start = iova, end = start + size - 1;
+
+   return gather->end != 0 &&
+   (end + 1 < gather->start || start > gather->end + 1);
+}
+
+
  /**
   * iommu_iotlb_gather_add_range - Gather for address-based TLB invalidation
   * @gather: TLB gather data
@@ -533,20 +555,16 @@ static inline void iommu_iotlb_gather_add_page(struct iommu_domain *domain,
   struct iommu_iotlb_gather *gather,
   unsigned long iova, size_t size)
  {
-   unsigned long start = iova, end = start + size - 1;
-
/*
 * If the new page is disjoint from the current range or is mapped at
 * a different granularity, then sync the TLB so that the gather
 * structure can be rewritten.
 */
-   if (gather->pgsize != size ||
-   end + 1 < gather->start || start > gather->end + 1) {
-   if (gather->pgsize)
-   iommu_iotlb_sync(domain, gather);
-   gather->pgsize = size;
-   }
+   if ((gather->pgsize && gather->pgsize != size) ||
+   iommu_iotlb_gather_is_disjoint(gather, iova, size))
+   iommu_iotlb_sync(domain, gather);
  
+	gather->pgsize = size;

iommu_iotlb_gather_add_range(gather, iova, size);
  }
  


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v2 0/3] iommu: Enable non-strict DMA on QCom SD/MMC

2021-07-13 Thread Robin Murphy

On 2021-07-08 15:36, Doug Anderson wrote:
[...]

Or document for the users that want performance how to
change the setting, so that they can decide.


Pushing this to the users can make sense for a Linux distribution but
probably less sense for an embedded platform. So I'm happy to make
some way for a user to override this (like via kernel command line),
but I also strongly believe there should be a default that users don't
have to futz with that we think is correct.


FYI I did make progress on the "punt it to userspace" approach. I'm not 
posting it even as an RFC yet because I still need to set up a machine 
to try actually testing any of it (it's almost certainly broken 
somewhere), but in the end it comes out looking surprisingly not too bad 
overall. If you're curious to take a look in the meantime I put it here:


https://gitlab.arm.com/linux-arm/linux-rm/-/commits/iommu/fq

Cheers,
Robin.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v5 7/7] iommu/amd: Use only natural aligned flushes in a VM

2021-07-13 Thread Nadav Amit
From: Nadav Amit 

When running on an AMD vIOMMU, it is better to avoid TLB flushes
of unmodified PTEs. vIOMMUs require the hypervisor to synchronize the
virtualized IOMMU's PTEs with the physical ones. This process induces
overheads.

AMD IOMMU allows us to flush any range that is aligned to a power of
2. So when running on top of a vIOMMU, break the range into sub-ranges
that are naturally aligned, and flush each one separately. This approach
is better when running with a vIOMMU, but on physical IOMMUs, the
penalty of IOTLB misses due to unnecessary flushed entries is likely to
be low.

Repurpose (i.e., keeping the name, changing the logic)
domain_flush_pages() so it is used to choose whether to perform one
flush of the whole range or multiple ones to avoid flushing unnecessary
ranges. Use NpCache, as usual, to infer whether the IOMMU is physical or
virtual.

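To make the splitting concrete, here is a userspace sketch of the same
loop (illustrative only; GCC builtins stand in for the kernel's
__ffs()/__fls()):

#include <stdio.h>

/* Flush [address, address + size) as naturally aligned sub-ranges.
 * E.g., address=0x3000, size=0x9000 yields three flushes:
 * 4 KiB @ 0x3000, 16 KiB @ 0x4000 and 16 KiB @ 0x8000.
 */
static void split_flush(unsigned long address, unsigned long size)
{
	while (size != 0) {
		int addr_align = address ? __builtin_ctzl(address) :
					   (int)(8 * sizeof(long)) - 1;
		int size_align = (int)(8 * sizeof(long)) - 1 -
				 __builtin_clzl(size);
		int min_align = addr_align < size_align ?
				addr_align : size_align;
		unsigned long flush_size = 1UL << min_align;

		printf("flush 0x%lx bytes @ 0x%lx\n", flush_size, address);
		address += flush_size;
		size -= flush_size;
	}
}

int main(void)
{
	split_flush(0x3000, 0x9000);
	return 0;
}
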
Cc: Joerg Roedel 
Cc: Will Deacon 
Cc: Jiajun Cao 
Cc: Lu Baolu 
Cc: iommu@lists.linux-foundation.org
Cc: linux-ker...@vger.kernel.org
Suggested-by: Robin Murphy 
Signed-off-by: Nadav Amit 
---
 drivers/iommu/amd/iommu.c | 47 ++-
 1 file changed, 42 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index c1fcd01b3c40..cb5ea94055e1 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1261,15 +1261,52 @@ static void __domain_flush_pages(struct 
protection_domain *domain,
 }
 
 static void domain_flush_pages(struct protection_domain *domain,
-  u64 address, size_t size)
+  u64 address, size_t size, int pde)
 {
-   __domain_flush_pages(domain, address, size, 0);
+   if (likely(!amd_iommu_np_cache)) {
+   __domain_flush_pages(domain, address, size, pde);
+   return;
+   }
+
+   /*
+* When NpCache is on, we infer that we run in a VM and use a vIOMMU.
+* In such setups it is best to avoid flushes of ranges which are not
+* naturally aligned, since it would lead to flushes of unmodified
+* PTEs. Such flushes would require the hypervisor to do more work than
+* necessary. Therefore, perform repeated flushes of aligned ranges
+* until you cover the range. Each iteration flushes the smaller
+* between the natural alignment of the address that we flush and the
+* greatest naturally aligned region that fits in the range.
+*/
+   while (size != 0) {
+   int addr_alignment = __ffs(address);
+   int size_alignment = __fls(size);
+   int min_alignment;
+   size_t flush_size;
+
+   /*
+* size is always non-zero, but address might be zero, causing
+* addr_alignment to be negative. As the casting of the
+* argument in __ffs(address) to long might trim the high bits
+* of the address on x86-32, cast to long when doing the check.
+*/
+   if (likely((unsigned long)address != 0))
+   min_alignment = min(addr_alignment, size_alignment);
+   else
+   min_alignment = size_alignment;
+
+   flush_size = 1ul << min_alignment;
+
+   __domain_flush_pages(domain, address, flush_size, pde);
+   address += flush_size;
+   size -= flush_size;
+   }
 }
 
 /* Flush the whole IO/TLB for a given protection domain - including PDE */
 void amd_iommu_domain_flush_tlb_pde(struct protection_domain *domain)
 {
-   __domain_flush_pages(domain, 0, CMD_INV_IOMMU_ALL_PAGES_ADDRESS, 1);
+   domain_flush_pages(domain, 0, CMD_INV_IOMMU_ALL_PAGES_ADDRESS, 1);
 }
 
 void amd_iommu_domain_flush_complete(struct protection_domain *domain)
@@ -1296,7 +1333,7 @@ static void domain_flush_np_cache(struct 
protection_domain *domain,
unsigned long flags;
 
spin_lock_irqsave(>lock, flags);
-   domain_flush_pages(domain, iova, size);
+   domain_flush_pages(domain, iova, size, 1);
amd_iommu_domain_flush_complete(domain);
spin_unlock_irqrestore(>lock, flags);
}
@@ -2200,7 +2237,7 @@ static void amd_iommu_iotlb_sync(struct iommu_domain 
*domain,
unsigned long flags;
 
spin_lock_irqsave(>lock, flags);
-   __domain_flush_pages(dom, gather->start, gather->end - gather->start, 
1);
+   domain_flush_pages(dom, gather->start, gather->end - gather->start, 1);
amd_iommu_domain_flush_complete(dom);
spin_unlock_irqrestore(>lock, flags);
 }
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v5 6/7] iommu/amd: Sync once for scatter-gather operations

2021-07-13 Thread Nadav Amit
From: Nadav Amit 

On virtual machines, software must flush the IOTLB after each page table
entry update.

The iommu_map_sg() code iterates through the given scatter-gather list
and invokes iommu_map() for each element in the scatter-gather list,
which calls into the vendor IOMMU driver through the iommu_ops callback. As
a result, a single sg mapping may lead to multiple IOTLB flushes.

Fix this by adding an amd_iommu_iotlb_sync_map() callback and flushing
at this point, after all sg mappings are set.

This commit follows and is inspired by commit 933fcd01e97e2
("iommu/vt-d: Add iotlb_sync_map callback").

Cc: Joerg Roedel 
Cc: Will Deacon 
Cc: Jiajun Cao 
Cc: Robin Murphy 
Cc: Lu Baolu 
Cc: iommu@lists.linux-foundation.org
Cc: linux-ker...@vger.kernel.org
Signed-off-by: Nadav Amit 
---
 drivers/iommu/amd/iommu.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index cc55c4c6a355..c1fcd01b3c40 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2022,6 +2022,16 @@ static int amd_iommu_attach_device(struct iommu_domain 
*dom,
return ret;
 }
 
+static void amd_iommu_iotlb_sync_map(struct iommu_domain *dom,
+unsigned long iova, size_t size)
+{
+   struct protection_domain *domain = to_pdomain(dom);
+   struct io_pgtable_ops *ops = >iop.iop.ops;
+
+   if (ops->map)
+   domain_flush_np_cache(domain, iova, size);
+}
+
 static int amd_iommu_map(struct iommu_domain *dom, unsigned long iova,
 phys_addr_t paddr, size_t page_size, int iommu_prot,
 gfp_t gfp)
@@ -2040,10 +2050,8 @@ static int amd_iommu_map(struct iommu_domain *dom, 
unsigned long iova,
if (iommu_prot & IOMMU_WRITE)
prot |= IOMMU_PROT_IW;
 
-   if (ops->map) {
+   if (ops->map)
ret = ops->map(ops, iova, paddr, page_size, prot, gfp);
-   domain_flush_np_cache(domain, iova, page_size);
-   }
 
return ret;
 }
@@ -2223,6 +2231,7 @@ const struct iommu_ops amd_iommu_ops = {
.attach_dev = amd_iommu_attach_device,
.detach_dev = amd_iommu_detach_device,
.map = amd_iommu_map,
+   .iotlb_sync_map = amd_iommu_iotlb_sync_map,
.unmap = amd_iommu_unmap,
.iova_to_phys = amd_iommu_iova_to_phys,
.probe_device = amd_iommu_probe_device,
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v5 5/7] iommu/amd: Tailored gather logic for AMD

2021-07-13 Thread Nadav Amit
From: Nadav Amit 

AMD's IOMMU can flush efficiently (i.e., in a single flush) any range.
This is in contrast, for instance, to Intel IOMMUs that have a limit on
the number of pages that can be flushed in a single flush.  In addition,
AMD's IOMMU does not care about the page-size, so changes of the page
size do not need to trigger a TLB flush.

So in most cases, a TLB flush due to a disjoint range is not needed for
AMD. Yet, vIOMMUs require the hypervisor to synchronize the virtualized
IOMMU's PTEs with the physical ones. This process induces overheads, so
it is better not to cause unnecessary flushes, i.e., flushes of PTEs
that were not modified.

Implement amd_iommu_iotlb_gather_add_page() and use it instead
of the generic iommu_iotlb_gather_add_page(). Ignore disjoint regions
unless the "non-present cache" feature is reported by the IOMMU
capabilities, as this is an indication that we are running on a physical
IOMMU. A similar indication is used by VT-d (see "caching mode"). The
new logic retains the same flushing behavior that we had before the
introduction of page-selective IOTLB flushes for AMD.

On virtualized environments, check whether the newly flushed region and
the gathered one are disjoint, and flush if they are.

Cc: Joerg Roedel 
Cc: Will Deacon 
Cc: Jiajun Cao 
Cc: Lu Baolu 
Cc: iommu@lists.linux-foundation.org
Cc: linux-ker...@vger.kernel.org
Cc: Robin Murphy 
Signed-off-by: Nadav Amit 
---
 drivers/iommu/amd/iommu.c | 23 ++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index bfae3928b98f..cc55c4c6a355 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2048,6 +2048,27 @@ static int amd_iommu_map(struct iommu_domain *dom, 
unsigned long iova,
return ret;
 }
 
+static void amd_iommu_iotlb_gather_add_page(struct iommu_domain *domain,
+   struct iommu_iotlb_gather *gather,
+   unsigned long iova, size_t size)
+{
+   /*
+* AMD's IOMMU can flush as many pages as necessary in a single flush.
+* Unless we run in a virtual machine, which can be inferred according
+* to whether "non-present cache" is on, it is probably best to prefer
+* (potentially) too extensive TLB flushing (i.e., more misses) over
+* multiple TLB flushes (i.e., more flushes). For virtual machines the
+* hypervisor needs to synchronize the host IOMMU PTEs with those of
+* the guest, and the trade-off is different: unnecessary TLB flushes
+* should be avoided.
+*/
+   if (amd_iommu_np_cache && gather->end != 0 &&
+   iommu_iotlb_gather_is_disjoint(gather, iova, size))
+   iommu_iotlb_sync(domain, gather);
+
+   iommu_iotlb_gather_add_range(gather, iova, size);
+}
+
 static size_t amd_iommu_unmap(struct iommu_domain *dom, unsigned long iova,
  size_t page_size,
  struct iommu_iotlb_gather *gather)
@@ -2062,7 +2083,7 @@ static size_t amd_iommu_unmap(struct iommu_domain *dom, 
unsigned long iova,
 
r = (ops->unmap) ? ops->unmap(ops, iova, page_size, gather) : 0;
 
-   iommu_iotlb_gather_add_page(dom, gather, iova, page_size);
+   amd_iommu_iotlb_gather_add_page(dom, gather, iova, page_size);
 
return r;
 }
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v5 4/7] iommu: Factor iommu_iotlb_gather_is_disjoint() out

2021-07-13 Thread Nadav Amit
From: Nadav Amit 

Refactor iommu_iotlb_gather_add_page() and factor out the logic that
detects whether IOTLB gather range and a new range are disjoint. To be
used by the next patch that implements different gathering logic for
AMD.

Note that updating gather->pgsize unconditionally does not affect
correctness as the function had (and has) an invariant, in which
gather->pgsize always represents the flushing granularity of its range.
Arguably, "size" should never be zero, but let's assume for the sake of
discussion that it might.

If "size" equals "gather->pgsize", then the assignment in question
has no impact.

Otherwise, if "size" is non-zero, then iommu_iotlb_sync() would
initialize the size and range (see iommu_iotlb_gather_init()), and the
invariant is kept.

Otherwise, "size" is zero, and "gather" already holds a range, so
gather->pgsize is non-zero and (gather->pgsize && gather->pgsize !=
size) is true. Therefore, again, iommu_iotlb_sync() would be called and
initialize the size.

Cc: Joerg Roedel 
Cc: Jiajun Cao 
Cc: Robin Murphy 
Cc: Lu Baolu 
Cc: iommu@lists.linux-foundation.org
Cc: linux-ker...@vger.kernel.org
Acked-by: Will Deacon 
Signed-off-by: Nadav Amit 
---
 include/linux/iommu.h | 34 ++
 1 file changed, 26 insertions(+), 8 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index e554871db46f..979a5ceeea55 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -497,6 +497,28 @@ static inline void iommu_iotlb_sync(struct iommu_domain 
*domain,
iommu_iotlb_gather_init(iotlb_gather);
 }
 
+/**
+ * iommu_iotlb_gather_is_disjoint - Checks whether a new range is disjoint
+ *
+ * @gather: TLB gather data
+ * @iova: start of page to invalidate
+ * @size: size of page to invalidate
+ *
+ * Helper for IOMMU drivers to check whether a new range and the gathered range
+ * are disjoint. For many IOMMUs, flushing the IOMMU in this case is better
+ * than merging the two, which might lead to unnecessary invalidations.
+ */
+static inline
+bool iommu_iotlb_gather_is_disjoint(struct iommu_iotlb_gather *gather,
+   unsigned long iova, size_t size)
+{
+   unsigned long start = iova, end = start + size - 1;
+
+   return gather->end != 0 &&
+   (end + 1 < gather->start || start > gather->end + 1);
+}
+
+
 /**
  * iommu_iotlb_gather_add_range - Gather for address-based TLB invalidation
  * @gather: TLB gather data
@@ -533,20 +555,16 @@ static inline void iommu_iotlb_gather_add_page(struct 
iommu_domain *domain,
   struct iommu_iotlb_gather 
*gather,
   unsigned long iova, size_t size)
 {
-   unsigned long start = iova, end = start + size - 1;
-
/*
 * If the new page is disjoint from the current range or is mapped at
 * a different granularity, then sync the TLB so that the gather
 * structure can be rewritten.
 */
-   if (gather->pgsize != size ||
-   end + 1 < gather->start || start > gather->end + 1) {
-   if (gather->pgsize)
-   iommu_iotlb_sync(domain, gather);
-   gather->pgsize = size;
-   }
+   if ((gather->pgsize && gather->pgsize != size) ||
+   iommu_iotlb_gather_is_disjoint(gather, iova, size))
+   iommu_iotlb_sync(domain, gather);
 
+   gather->pgsize = size;
iommu_iotlb_gather_add_range(gather, iova, size);
 }
 
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH v5 3/7] iommu: Improve iommu_iotlb_gather helpers

2021-07-13 Thread Nadav Amit
From: Robin Murphy 

The Mediatek driver is not the only one which might want a basic
address-based gathering behaviour, so although it's arguably simple
enough to open-code, let's factor it out for the sake of cleanliness.
Let's also take this opportunity to document the intent of these
helpers for clarity.

Cc: Joerg Roedel 
Cc: Will Deacon 
Cc: Jiajun Cao 
Cc: Robin Murphy 
Cc: Lu Baolu 
Cc: iommu@lists.linux-foundation.org
Cc: linux-ker...@vger.kernel.org
Signed-off-by: Robin Murphy 
Signed-off-by: Nadav Amit 
---
 drivers/iommu/mtk_iommu.c |  6 +-
 include/linux/iommu.h | 38 +-
 2 files changed, 34 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 6f7c69688ce2..d9939e4af35c 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -520,12 +520,8 @@ static size_t mtk_iommu_unmap(struct iommu_domain *domain,
  struct iommu_iotlb_gather *gather)
 {
struct mtk_iommu_domain *dom = to_mtk_domain(domain);
-   unsigned long end = iova + size - 1;
 
-   if (gather->start > iova)
-   gather->start = iova;
-   if (gather->end < end)
-   gather->end = end;
+   iommu_iotlb_gather_add_range(gather, iova, size);
return dom->iop->unmap(dom->iop, iova, size, gather);
 }
 
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 32d448050bf7..e554871db46f 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -497,6 +497,38 @@ static inline void iommu_iotlb_sync(struct iommu_domain 
*domain,
iommu_iotlb_gather_init(iotlb_gather);
 }
 
+/**
+ * iommu_iotlb_gather_add_range - Gather for address-based TLB invalidation
+ * @gather: TLB gather data
+ * @iova: start of page to invalidate
+ * @size: size of page to invalidate
+ *
+ * Helper for IOMMU drivers to build arbitrarily-sized invalidation commands
+ * where only the address range matters, and simply minimising intermediate
+ * syncs is preferred.
+ */
+static inline void iommu_iotlb_gather_add_range(struct iommu_iotlb_gather 
*gather,
+   unsigned long iova, size_t size)
+{
+   unsigned long end = iova + size - 1;
+
+   if (gather->start > iova)
+   gather->start = iova;
+   if (gather->end < end)
+   gather->end = end;
+}
+
+/**
+ * iommu_iotlb_gather_add_page - Gather for page-based TLB invalidation
+ * @domain: IOMMU domain to be invalidated
+ * @gather: TLB gather data
+ * @iova: start of page to invalidate
+ * @size: size of page to invalidate
+ *
+ * Helper for IOMMU drivers to build invalidation commands based on individual
+ * pages, or with page size/table level hints which cannot be gathered if they
+ * differ.
+ */
 static inline void iommu_iotlb_gather_add_page(struct iommu_domain *domain,
   struct iommu_iotlb_gather 
*gather,
   unsigned long iova, size_t size)
@@ -515,11 +547,7 @@ static inline void iommu_iotlb_gather_add_page(struct 
iommu_domain *domain,
gather->pgsize = size;
}
 
-   if (gather->end < end)
-   gather->end = end;
-
-   if (gather->start > start)
-   gather->start = start;
+   iommu_iotlb_gather_add_range(gather, iova, size);
 }
 
 /* PCI device grouping function */
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v5 2/7] iommu/amd: Do not use flush-queue when NpCache is on

2021-07-13 Thread Nadav Amit
From: Nadav Amit 

Do not use the flush-queue in virtualized environments, where the NpCache
capability of the IOMMU is set. This is required to reduce
virtualization overheads.

This change follows a similar change to Intel's VT-d and a detailed
explanation as for the rationale is described in commit 29b32839725f
("iommu/vt-d: Do not use flush-queue when caching-mode is on").

Cc: Joerg Roedel 
Cc: Will Deacon 
Cc: Jiajun Cao 
Cc: Robin Murphy 
Cc: Lu Baolu 
Cc: iommu@lists.linux-foundation.org
Cc: linux-ker...@vger.kernel.org
Signed-off-by: Nadav Amit 
---
 drivers/iommu/amd/init.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 46280e6e1535..1c7ae7d3c55d 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -1850,8 +1850,13 @@ static int __init iommu_init_pci(struct amd_iommu *iommu)
if (ret)
return ret;
 
-   if (iommu->cap & (1UL << IOMMU_CAP_NPCACHE))
+   if (iommu->cap & (1UL << IOMMU_CAP_NPCACHE)) {
+   if (!amd_iommu_unmap_flush)
+   pr_info("IOMMU batching is disabled due to 
virtualization\n");
+
amd_iommu_np_cache = true;
+   amd_iommu_unmap_flush = true;
+   }
 
init_iommu_perf_ctr(iommu);
 
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v5 1/7] iommu/amd: Selective flush on unmap

2021-07-13 Thread Nadav Amit
From: Nadav Amit 

A recent patch attempted to enable selective page flushes on AMD IOMMU but
neglected to adapt amd_iommu_iotlb_sync() to use the selective flushes.

Adapt amd_iommu_iotlb_sync() to use selective flushes and change
amd_iommu_unmap() to collect the flushes. As a defensive measure, to
avoid potential issues such as those the Intel IOMMU driver encountered
recently, flush the page-walk caches by always setting the "pde"
parameter. This can be removed later.

Cc: Joerg Roedel 
Cc: Will Deacon 
Cc: Jiajun Cao 
Cc: Robin Murphy 
Cc: Lu Baolu 
Cc: iommu@lists.linux-foundation.org
Cc: linux-ker...@vger.kernel.org
Signed-off-by: Nadav Amit 
---
 drivers/iommu/amd/iommu.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 811a49a95d04..bfae3928b98f 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2054,12 +2054,17 @@ static size_t amd_iommu_unmap(struct iommu_domain *dom, 
unsigned long iova,
 {
struct protection_domain *domain = to_pdomain(dom);
struct io_pgtable_ops *ops = >iop.iop.ops;
+   size_t r;
 
if ((amd_iommu_pgtable == AMD_IOMMU_V1) &&
(domain->iop.mode == PAGE_MODE_NONE))
return 0;
 
-   return (ops->unmap) ? ops->unmap(ops, iova, page_size, gather) : 0;
+   r = (ops->unmap) ? ops->unmap(ops, iova, page_size, gather) : 0;
+
+   iommu_iotlb_gather_add_page(dom, gather, iova, page_size);
+
+   return r;
 }
 
 static phys_addr_t amd_iommu_iova_to_phys(struct iommu_domain *dom,
@@ -2162,7 +2167,13 @@ static void amd_iommu_flush_iotlb_all(struct 
iommu_domain *domain)
 static void amd_iommu_iotlb_sync(struct iommu_domain *domain,
 struct iommu_iotlb_gather *gather)
 {
-   amd_iommu_flush_iotlb_all(domain);
+   struct protection_domain *dom = to_pdomain(domain);
+   unsigned long flags;
+
+   spin_lock_irqsave(>lock, flags);
+   __domain_flush_pages(dom, gather->start, gather->end - gather->start, 
1);
+   amd_iommu_domain_flush_complete(dom);
+   spin_unlock_irqrestore(>lock, flags);
 }
 
 static int amd_iommu_def_domain_type(struct device *dev)
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v5 0/7] iommu/amd: Enable page-selective flushes

2021-07-13 Thread Nadav Amit
From: Nadav Amit 

The previous patch, commit 268aa4548277 ("iommu/amd: Page-specific
invalidations for more than one page") was supposed to enable
page-selective IOTLB flushes on AMD.

Besides the bug that was already fixed by commit a017c567915f
("iommu/amd: Fix wrong parentheses on page-specific invalidations")
there are several remaining matters to enable and benefit from
page-selective IOTLB flushes on AMD:

1. Enable selective flushes on unmap (patch 1)
2. Avoid using flush-queue on vIOMMUs (patch 2)
3. Relaxed flushes when gathering, excluding vIOMMUs (patches 3-5)
4. Syncing once on scatter-gather map operations (patch 6)
5. Breaking flushes to naturally aligned ranges on vIOMMU (patch 7)

The main difference in this version is that the logic that flushes
vIOMMU was improved based on Robin's feedback. Batching decisions are
not based on alignment anymore, but instead the flushing range is broken
into naturally aligned regions on sync. Doing so allows us to flush only
the entries that we modified with the minimal number of flushes.

Robin, others: your feedback would be highly appreciated to get these
patches merged.

Cc: Joerg Roedel 
Cc: Will Deacon 
Cc: Jiajun Cao 
Cc: Robin Murphy 
Cc: Lu Baolu 
Cc: iommu@lists.linux-foundation.org
Cc: linux-ker...@vger.kernel.org

---

v4->v5:
* Rebase on v5.14-rc1
* Change pr_warn() to pr_info() [John Garry]

v3->v4:
* Breaking flushes to naturally aligned ranges on vIOMMU [Robin]
* Removing unnecessary stubs; fixing comment [Robin]
* Removing unused variable [Yong]
* Changing pr_warn_once() to pr_warn() [Robin]
* Improving commit log [Will]

v2->v3:
* Rebase on v5.13-rc5
* Refactoring (patches 4-5) [Robin]
* Rework flush logic (patch 5): more relaxed on native
* Syncing once on scatter-gather operations (patch 6)

v1->v2:
* Rebase on v5.13-rc3


Nadav Amit (6):
  iommu/amd: Selective flush on unmap
  iommu/amd: Do not use flush-queue when NpCache is on
  iommu: Factor iommu_iotlb_gather_is_disjoint() out
  iommu/amd: Tailored gather logic for AMD
  iommu/amd: Sync once for scatter-gather operations
  iommu/amd: Use only natural aligned flushes in a VM

Robin Murphy (1):
  iommu: Improve iommu_iotlb_gather helpers

 drivers/iommu/amd/init.c  |  7 ++-
 drivers/iommu/amd/iommu.c | 96 +++
 drivers/iommu/mtk_iommu.c |  6 +--
 include/linux/iommu.h | 72 +++--
 4 files changed, 153 insertions(+), 28 deletions(-)

-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC v2] /dev/iommu uAPI proposal

2021-07-13 Thread Jason Gunthorpe
On Tue, Jul 13, 2021 at 10:26:07AM -0600, Alex Williamson wrote:
> Quoting this proposal again:
> 
> > 1)  A successful binding call for the first device in the group creates 
> > the security context for the entire group, by:
> > 
> > * Verifying group viability in a similar way as VFIO does;
> > 
> > * Calling IOMMU-API to move the group into a block-dma state,
> >   which makes all devices in the group attached to an block-dma
> >   domain with an empty I/O page table;
> > 
> > VFIO should not allow the user to mmap the MMIO bar of the bound
> > device until the binding call succeeds.
> 
> The attach step is irrelevant to my question, the bind step is where
> the device/group gets into a secure state for device access.

Binding is similar to attach; it will need to indicate the driver's
intention, and a SW driver will not attach to the PCI device underneath
it.

> AIUI the operation of VFIO_DEVICE_BIND_IOMMU_FD looks like this:
> 
>   iommu_ctx = iommu_ctx_fdget(iommu_fd);
> 
>   mdev = mdev_from_dev(vdev->dev);
>   dev = mdev ? mdev_parent_dev(mdev) : vdev->dev;
> 
>   iommu_dev = iommu_register_device(iommu_ctx, dev, cookie);

A default of binding to vdev->dev might turn out to be OK, but this
needs to be an overridable op in vfio_device and the SW mdevs will
have to do some 'iommu_register_sw_device()' and not pass in a dev at
all.

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC v2] /dev/iommu uAPI proposal

2021-07-13 Thread Alex Williamson
On Tue, 13 Jul 2021 09:55:03 -0300
Jason Gunthorpe  wrote:

> On Mon, Jul 12, 2021 at 11:56:24PM +, Tian, Kevin wrote:
> 
> > Maybe I misunderstood your question. Are you specifically worried
> > about establishing the security context for a mdev vs. for its
> > parent?  
> 
> The way to think about the cookie, and the device bind/attach in
> general, is as taking control of a portion of the IOMMU routing:
> 
>  - RID
>  - RID + PASID
>  - "software"
> 
> For the first two there can be only one device attachment per value so
> the cookie is unambiguous.
> 
> For "software" the iommu layer has little to do with this - everything
> is constructed outside by the mdev. If the mdev wishes to communicate
> on /dev/iommu using the cookie then it has to do so using some iommufd
> API and we can convey the proper device at that point.
> 
> Kevin didn't show it, but along side the PCI attaches:
> 
> struct iommu_attach_data * iommu_pci_device_attach(
> struct iommu_dev *dev, struct pci_device *pdev,
> u32 ioasid);
> 
> There would also be a software attach for mdev:
> 
> struct iommu_attach_data * iommu_sw_device_attach(
> struct iommu_dev *dev, struct device *pdev, u32 ioasid);
> 
> Which does not connect anything to the iommu layer.
> 
> It would have to return something that allows querying the IO page
> table, and the mdev would use that API instead of vfio_pin_pages().


Quoting this proposal again:

> 1)  A successful binding call for the first device in the group creates 
> the security context for the entire group, by:
> 
> * Verifying group viability in a similar way as VFIO does;
> 
> * Calling IOMMU-API to move the group into a block-dma state,
> >   which makes all devices in the group attached to a block-dma
>   domain with an empty I/O page table;
> 
> VFIO should not allow the user to mmap the MMIO bar of the bound
> device until the binding call succeeds.

The attach step is irrelevant to my question, the bind step is where
the device/group gets into a secure state for device access.

So for IGD we have two scenarios, direct assignment and software mdevs.

AIUI the operation of VFIO_DEVICE_BIND_IOMMU_FD looks like this:

iommu_ctx = iommu_ctx_fdget(iommu_fd);

mdev = mdev_from_dev(vdev->dev);
dev = mdev ? mdev_parent_dev(mdev) : vdev->dev;

iommu_dev = iommu_register_device(iommu_ctx, dev, cookie);

In either case, this last line is either registering the IGD itself
(ie. the struct device representing PCI device :00:02.0) or the
parent of the GVT-g mdev (ie. the struct device representing PCI device
:00:02.0).  They're the same!  AIUI, the cookie is simply an
arbitrary user generated value which they'll use to refer to this
device via the iommu_fd uAPI.

So what magic is iommu_register_device() doing to infer my intentions
as to whether I'm asking for the IGD RID to be isolated or I'm only
creating a software context for an mdev?  Thanks,

Alex

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v9 16/17] vduse: Introduce VDUSE - vDPA Device in Userspace

2021-07-13 Thread Dan Carpenter
On Tue, Jul 13, 2021 at 04:46:55PM +0800, Xie Yongji wrote:
> +static int vduse_dev_init_vdpa(struct vduse_dev *dev, const char *name)
> +{
> + struct vduse_vdpa *vdev;
> + int ret;
> +
> + if (dev->vdev)
> + return -EEXIST;
> +
> + vdev = vdpa_alloc_device(struct vduse_vdpa, vdpa, dev->dev,
> +  _vdpa_config_ops, name, true);
> + if (!vdev)
> + return -ENOMEM;

This should be an IS_ERR() check instead of a NULL check.

The vdpa_alloc_device() macro is doing something very complicated but
I'm not sure what.  It calls container_of() and that looks buggy until
you spot the BUILD_BUG_ON_ZERO() compile time assert which ensures that
the container_of() is a no-op.

Only one of the callers checks for error pointers correctly so maybe
it's too complicated or maybe there should be better documentation.

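With that, the error handling here would look roughly like this (a
sketch only; the ops-struct identifier is mangled in the quoted hunk,
so the name below is a placeholder):

vdev = vdpa_alloc_device(struct vduse_vdpa, vdpa, dev->dev,
			 &vduse_vdpa_config_ops, name, true);
if (IS_ERR(vdev))
	return PTR_ERR(vdev);
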
regards,
dan carpenter
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC v2] /dev/iommu uAPI proposal

2021-07-13 Thread Jason Gunthorpe
On Mon, Jul 12, 2021 at 11:56:24PM +, Tian, Kevin wrote:

> Maybe I misunderstood your question. Are you specifically worried
> about establishing the security context for a mdev vs. for its
> parent?

The way to think about the cookie, and the device bind/attach in
general, is as taking control of a portion of the IOMMU routing:

 - RID
 - RID + PASID
 - "software"

For the first two there can be only one device attachment per value so
the cookie is unambiguous.

For "software" the iommu layer has little to do with this - everything
is constructed outside by the mdev. If the mdev wishes to communicate
on /dev/iommu using the cookie then it has to do so using some iommufd
API and we can convey the proper device at that point.

Kevin didn't show it, but along side the PCI attaches:

struct iommu_attach_data * iommu_pci_device_attach(
struct iommu_dev *dev, struct pci_device *pdev,
u32 ioasid);

There would also be a software attach for mdev:

struct iommu_attach_data * iommu_sw_device_attach(
struct iommu_dev *dev, struct device *pdev, u32 ioasid);

Which does not connect anything to the iommu layer.

It would have to return something that allows querying the IO page
table, and the mdev would use that API instead of vfio_pin_pages().

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v9 13/17] vdpa: factor out vhost_vdpa_pa_map() and vhost_vdpa_pa_unmap()

2021-07-13 Thread Dan Carpenter
On Tue, Jul 13, 2021 at 04:46:52PM +0800, Xie Yongji wrote:
> @@ -613,37 +618,28 @@ static void vhost_vdpa_unmap(struct vhost_vdpa *v, u64 
> iova, u64 size)
>   }
>  }
>  
> -static int vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v,
> -struct vhost_iotlb_msg *msg)
> +static int vhost_vdpa_pa_map(struct vhost_vdpa *v,
> +  u64 iova, u64 size, u64 uaddr, u32 perm)
>  {
>   struct vhost_dev *dev = >vdev;
> - struct vhost_iotlb *iotlb = dev->iotlb;
>   struct page **page_list;
>   unsigned long list_size = PAGE_SIZE / sizeof(struct page *);
>   unsigned int gup_flags = FOLL_LONGTERM;
>   unsigned long npages, cur_base, map_pfn, last_pfn = 0;
>   unsigned long lock_limit, sz2pin, nchunks, i;
> - u64 iova = msg->iova;
> + u64 start = iova;
>   long pinned;
>   int ret = 0;
>  
> - if (msg->iova < v->range.first ||
> - msg->iova + msg->size - 1 > v->range.last)
> - return -EINVAL;

This is not related to your patch, but can the "msg->iova + msg->size"
addition have an integer overflow?  From looking at the callers it
seems like it can.  msg comes from:
  vhost_chr_write_iter()
  --> dev->msg_handler(dev, );
  --> vhost_vdpa_process_iotlb_msg()
 --> vhost_vdpa_process_iotlb_update()

If I'm thinking of the right thing then these are allowed to overflow to
0 because of the " - 1" but not further than that.  I believe the check
needs to be something like:

if (msg->iova < v->range.first ||
msg->iova - 1 > U64_MAX - msg->size ||
msg->iova + msg->size - 1 > v->range.last)

But writing integer overflow checks correctly is notoriously difficult.
Do you think you could send a fix for that which is separate from the
patchset?  We'd want to backport it to stable.

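One overflow-safe shape of the check, as a sketch (it leans on the
kernel's check_add_overflow() helper from <linux/overflow.h>; whether
that fits here is for the actual fix to decide):

u64 last;

/* Reject empty ranges and ranges whose inclusive end wraps around. */
if (msg->size == 0 ||
    check_add_overflow(msg->iova, msg->size - 1, &last))
	return -EINVAL;

if (msg->iova < v->range.first || last > v->range.last)
	return -EINVAL;
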
regards,
dan carpenter
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v9 07/17] virtio: Don't set FAILED status bit on device index allocation failure

2021-07-13 Thread Yongji Xie
On Tue, Jul 13, 2021 at 7:02 PM Dan Carpenter  wrote:
>
> On Tue, Jul 13, 2021 at 04:46:46PM +0800, Xie Yongji wrote:
> > We don't need to set FAILED status bit on device index allocation
> > failure since the device initialization hasn't been started yet.
>
> The commit message should say what the effect of this change is to the
> user.  Is this a bugfix?  Will it have any effect on runtime at all?
>

Thanks for the reminder. Will update the commit message.

> To me, hearing your thoughts on this is valuable even if you have to
> guess.  "I noticed this mistake during review and I don't think it will
> affect runtime."
>

Yes, that's what I thought.

Thanks,
Yongji
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v9 07/17] virtio: Don't set FAILED status bit on device index allocation failure

2021-07-13 Thread Dan Carpenter
On Tue, Jul 13, 2021 at 04:46:46PM +0800, Xie Yongji wrote:
> We don't need to set FAILED status bit on device index allocation
> failure since the device initialization hasn't been started yet.

The commit message should say what the effect of this change is to the
user.  Is this a bugfix?  Will it have any effect on runtime at all?

To me, hearing your thoughts on this is valuable even if you have to
guess.  "I noticed this mistake during review and I don't think it will
affect runtime."

regards,
dan carpenter

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v9 16/17] vduse: Introduce VDUSE - vDPA Device in Userspace

2021-07-13 Thread Xie Yongji
This VDUSE driver enables implementing software-emulated vDPA
devices in userspace. The vDPA device is created by
ioctl(VDUSE_CREATE_DEV) on /dev/vduse/control. Then a char device
interface (/dev/vduse/$NAME) is exported to userspace for device
emulation.

In order to make the device emulation more secure, the device's
control path is handled in the kernel. A message mechanism is introduced
to forward some dataplane related control messages to userspace.

And in the data path, the DMA buffer will be mapped into the userspace
address space in different ways depending on the vDPA bus to which the
vDPA device is attached. In the virtio-vdpa case, the MMU-based IOMMU
driver is used to achieve that. And in the vhost-vdpa case, the DMA
buffer resides in a userspace memory region which can be shared with
the VDUSE userspace process by transferring the shmfd.

For more details on VDUSE design and usage, please see the follow-on
Documentation commit.

Signed-off-by: Xie Yongji 
---
 Documentation/userspace-api/ioctl/ioctl-number.rst |1 +
 drivers/vdpa/Kconfig   |   10 +
 drivers/vdpa/Makefile  |1 +
 drivers/vdpa/vdpa_user/Makefile|5 +
 drivers/vdpa/vdpa_user/vduse_dev.c | 1502 
 include/uapi/linux/vduse.h |  221 +++
 6 files changed, 1740 insertions(+)
 create mode 100644 drivers/vdpa/vdpa_user/Makefile
 create mode 100644 drivers/vdpa/vdpa_user/vduse_dev.c
 create mode 100644 include/uapi/linux/vduse.h

diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst 
b/Documentation/userspace-api/ioctl/ioctl-number.rst
index 1409e40e6345..293ca3aef358 100644
--- a/Documentation/userspace-api/ioctl/ioctl-number.rst
+++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
@@ -300,6 +300,7 @@ Code  Seq#Include File  
 Comments
 'z'   10-4F  drivers/s390/crypto/zcrypt_api.hconflict!
 '|'   00-7F  linux/media.h
 0x80  00-1F  linux/fb.h
+0x81  00-1F  linux/vduse.h
 0x89  00-06  arch/x86/include/asm/sockios.h
 0x89  0B-DF  linux/sockios.h
 0x89  E0-EF  linux/sockios.h 
SIOCPROTOPRIVATE range
diff --git a/drivers/vdpa/Kconfig b/drivers/vdpa/Kconfig
index a503c1b2bfd9..6e23bce6433a 100644
--- a/drivers/vdpa/Kconfig
+++ b/drivers/vdpa/Kconfig
@@ -33,6 +33,16 @@ config VDPA_SIM_BLOCK
  vDPA block device simulator which terminates IO request in a
  memory buffer.
 
+config VDPA_USER
+   tristate "VDUSE (vDPA Device in Userspace) support"
+   depends on EVENTFD && MMU && HAS_DMA
+   select DMA_OPS
+   select VHOST_IOTLB
+   select IOMMU_IOVA
+   help
+ With VDUSE it is possible to emulate a vDPA Device
+ in a userspace program.
+
 config IFCVF
tristate "Intel IFC VF vDPA driver"
depends on PCI_MSI
diff --git a/drivers/vdpa/Makefile b/drivers/vdpa/Makefile
index 67fe7f3d6943..f02ebed33f19 100644
--- a/drivers/vdpa/Makefile
+++ b/drivers/vdpa/Makefile
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0
 obj-$(CONFIG_VDPA) += vdpa.o
 obj-$(CONFIG_VDPA_SIM) += vdpa_sim/
+obj-$(CONFIG_VDPA_USER) += vdpa_user/
 obj-$(CONFIG_IFCVF)+= ifcvf/
 obj-$(CONFIG_MLX5_VDPA) += mlx5/
 obj-$(CONFIG_VP_VDPA)+= virtio_pci/
diff --git a/drivers/vdpa/vdpa_user/Makefile b/drivers/vdpa/vdpa_user/Makefile
new file mode 100644
index ..260e0b26af99
--- /dev/null
+++ b/drivers/vdpa/vdpa_user/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0
+
+vduse-y := vduse_dev.o iova_domain.o
+
+obj-$(CONFIG_VDPA_USER) += vduse.o
diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c 
b/drivers/vdpa/vdpa_user/vduse_dev.c
new file mode 100644
index ..c994a4a4660c
--- /dev/null
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -0,0 +1,1502 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * VDUSE: vDPA Device in Userspace
+ *
+ * Copyright (C) 2020-2021 Bytedance Inc. and/or its affiliates. All rights 
reserved.
+ *
+ * Author: Xie Yongji 
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "iova_domain.h"
+
+#define DRV_AUTHOR   "Yongji Xie "
+#define DRV_DESC "vDPA Device in Userspace"
+#define DRV_LICENSE  "GPL v2"
+
+#define VDUSE_DEV_MAX (1U << MINORBITS)
+#define VDUSE_MAX_BOUNCE_SIZE (64 * 1024 * 1024)
+#define VDUSE_IOVA_SIZE (128 * 1024 * 1024)
+#define VDUSE_REQUEST_TIMEOUT 30
+
+struct vduse_virtqueue {
+   u16 index;
+   u16 num_max;
+   u32 num;
+   u64 desc_addr;
+   u64 driver_addr;
+   u64 device_addr;
+   struct vdpa_vq_state state;
+   bool ready;
+   bool kicked;
+   spinlock_t kick_lock;
+   spinlock_t irq_lock;
+   struct eventfd_ctx *kickfd;
+   struct vdpa_callback cb;
+  

[PATCH v9 17/17] Documentation: Add documentation for VDUSE

2021-07-13 Thread Xie Yongji
VDUSE (vDPA Device in Userspace) is a framework to support
implementing software-emulated vDPA devices in userspace. This
document is intended to clarify the VDUSE design and usage.

Signed-off-by: Xie Yongji 
---
 Documentation/userspace-api/index.rst |   1 +
 Documentation/userspace-api/vduse.rst | 248 ++
 2 files changed, 249 insertions(+)
 create mode 100644 Documentation/userspace-api/vduse.rst

diff --git a/Documentation/userspace-api/index.rst 
b/Documentation/userspace-api/index.rst
index 0b5eefed027e..c432be070f67 100644
--- a/Documentation/userspace-api/index.rst
+++ b/Documentation/userspace-api/index.rst
@@ -27,6 +27,7 @@ place where this information is gathered.
iommu
media/index
sysfs-platform_profile
+   vduse
 
 .. only::  subproject and html
 
diff --git a/Documentation/userspace-api/vduse.rst 
b/Documentation/userspace-api/vduse.rst
new file mode 100644
index ..2c0d56d4b2da
--- /dev/null
+++ b/Documentation/userspace-api/vduse.rst
@@ -0,0 +1,248 @@
+==================================
+VDUSE - "vDPA Device in Userspace"
+==================================
+
+vDPA (virtio data path acceleration) device is a device that uses a
+datapath which complies with the virtio specifications with vendor
+specific control path. vDPA devices can be either physically located on
+the hardware or emulated by software. VDUSE is a framework that makes it
+possible to implement software-emulated vDPA devices in userspace. And
+to make the device emulation more secure, the emulated vDPA device's
+control path is handled in the kernel and only the data path is
+implemented in the userspace.
+
+Note that only the virtio block device is supported by the VDUSE framework
+now, which can reduce security risks when the userspace process that
+implements the data path is run by an unprivileged user. The support for
+other device types can be added after the security issues of the
+corresponding device drivers are clarified or fixed in the future.
+
+Start/Stop VDUSE devices
+------------------------
+
+VDUSE devices are started as follows:
+
+1. Create a new VDUSE instance with ioctl(VDUSE_CREATE_DEV) on
+   /dev/vduse/control.
+
+2. Setup each virtqueue with ioctl(VDUSE_VQ_SETUP) on /dev/vduse/$NAME.
+
+3. Begin processing VDUSE messages from /dev/vduse/$NAME. The first
+   messages will arrive while attaching the VDUSE instance to vDPA bus.
+
+4. Send the VDPA_CMD_DEV_NEW netlink message to attach the VDUSE
+   instance to vDPA bus.
+
+VDUSE devices are stopped as follows:
+
+1. Send the VDPA_CMD_DEV_DEL netlink message to detach the VDUSE
+   instance from vDPA bus.
+
+2. Close the file descriptor referring to /dev/vduse/$NAME.
+
+3. Destroy the VDUSE instance with ioctl(VDUSE_DESTROY_DEV) on
+   /dev/vduse/control.
+
+The netlink messages can be sent via vdpa tool in iproute2 or use the
+below sample codes:
+
+.. code-block:: c
+
+   static int netlink_add_vduse(const char *name, enum vdpa_command cmd)
+   {
+   struct nl_sock *nlsock;
+   struct nl_msg *msg;
+   int famid;
+
+   nlsock = nl_socket_alloc();
+   if (!nlsock)
+   return -ENOMEM;
+
+   if (genl_connect(nlsock))
+   goto free_sock;
+
+   famid = genl_ctrl_resolve(nlsock, VDPA_GENL_NAME);
+   if (famid < 0)
+   goto close_sock;
+
+   msg = nlmsg_alloc();
+   if (!msg)
+   goto close_sock;
+
+   if (!genlmsg_put(msg, NL_AUTO_PORT, NL_AUTO_SEQ, famid, 0, 0, 
cmd, 0))
+   goto nla_put_failure;
+
+   NLA_PUT_STRING(msg, VDPA_ATTR_DEV_NAME, name);
+   if (cmd == VDPA_CMD_DEV_NEW)
+   NLA_PUT_STRING(msg, VDPA_ATTR_MGMTDEV_DEV_NAME, 
"vduse");
+
+   if (nl_send_sync(nlsock, msg))
+   goto close_sock;
+
+   nl_close(nlsock);
+   nl_socket_free(nlsock);
+
+   return 0;
+   nla_put_failure:
+   nlmsg_free(msg);
+   close_sock:
+   nl_close(nlsock);
+   free_sock:
+   nl_socket_free(nlsock);
+   return -1;
+   }
+
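+The same can be done from the shell with the vdpa tool from a recent
+iproute2 (syntax sketched from the tool's usual form; check ``vdpa help``
+on your system)::
+
+   vdpa dev add name $NAME mgmtdev vduse
+   vdpa dev del $NAME
+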
+How VDUSE works
+---------------
+
+As mentioned above, a VDUSE device is created by ioctl(VDUSE_CREATE_DEV) on
+/dev/vduse/control. With this ioctl, userspace can specify some basic
+configuration for this emulated device, such as the device name (which
+uniquely identifies a VDUSE device), virtio features, the virtio
+configuration space and the bounce buffer size. Then a char device
+interface (/dev/vduse/$NAME) is exported to userspace for device
+emulation. Userspace can use the VDUSE_VQ_SETUP ioctl on /dev/vduse/$NAME
+to add per-virtqueue configuration, such as the maximum size of each
+virtqueue, to the device.
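+
+A minimal sketch of this creation step is shown below (illustrative
+only; the authoritative struct layout and ioctl definitions are in
+include/uapi/linux/vduse.h, and error reporting is trimmed):
+
+.. code-block:: c
+
+   static int vduse_create_blk(const char *name, __u64 features,
+                               __u32 config_size, const void *config_space)
+   {
+           struct vduse_dev_config *conf;
+           int ctrl_fd, ret;
+
+           ctrl_fd = open("/dev/vduse/control", O_RDWR);
+           if (ctrl_fd < 0)
+                   return -1;
+
+           conf = calloc(1, sizeof(*conf) + config_size);
+           if (!conf) {
+                   close(ctrl_fd);
+                   return -1;
+           }
+
+           snprintf(conf->name, sizeof(conf->name), "%s", name);
+           conf->device_id = VIRTIO_ID_BLOCK; /* only virtio-blk for now */
+           conf->features = features;
+           conf->vq_num = 1;
+           conf->config_size = config_size;
+           memcpy(conf->config, config_space, config_size);
+
+           ret = ioctl(ctrl_fd, VDUSE_CREATE_DEV, conf);
+           free(conf);
+           close(ctrl_fd);
+           return ret;
+   }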
+
+After the initialization, the VDUSE device can be attached to vDPA bus via
+the VDPA_CMD_DEV_NEW netlink 

[PATCH v9 15/17] vduse: Implement an MMU-based IOMMU driver

2021-07-13 Thread Xie Yongji
This implements an MMU-based IOMMU driver to support mapping
kernel DMA buffers into userspace. The basic idea behind it is
treating MMU (VA->PA) as IOMMU (IOVA->PA). The driver will set
up MMU mappings instead of IOMMU mappings for the DMA transfer so
that the userspace process is able to use its virtual address to
access the DMA buffer in the kernel.

And to avoid security issues, a bounce-buffering mechanism is
introduced to prevent userspace from accessing the original buffer
directly.

Signed-off-by: Xie Yongji 
Acked-by: Jason Wang 
---
 drivers/vdpa/vdpa_user/iova_domain.c | 545 +++
 drivers/vdpa/vdpa_user/iova_domain.h |  73 +
 2 files changed, 618 insertions(+)
 create mode 100644 drivers/vdpa/vdpa_user/iova_domain.c
 create mode 100644 drivers/vdpa/vdpa_user/iova_domain.h

diff --git a/drivers/vdpa/vdpa_user/iova_domain.c 
b/drivers/vdpa/vdpa_user/iova_domain.c
new file mode 100644
index ..ad45026f5423
--- /dev/null
+++ b/drivers/vdpa/vdpa_user/iova_domain.c
@@ -0,0 +1,545 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * MMU-based IOMMU implementation
+ *
+ * Copyright (C) 2020-2021 Bytedance Inc. and/or its affiliates. All rights 
reserved.
+ *
+ * Author: Xie Yongji 
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "iova_domain.h"
+
+static int vduse_iotlb_add_range(struct vduse_iova_domain *domain,
+u64 start, u64 last,
+u64 addr, unsigned int perm,
+struct file *file, u64 offset)
+{
+   struct vdpa_map_file *map_file;
+   int ret;
+
+   map_file = kmalloc(sizeof(*map_file), GFP_ATOMIC);
+   if (!map_file)
+   return -ENOMEM;
+
+   map_file->file = get_file(file);
+   map_file->offset = offset;
+
+   ret = vhost_iotlb_add_range_ctx(domain->iotlb, start, last,
+   addr, perm, map_file);
+   if (ret) {
+   fput(map_file->file);
+   kfree(map_file);
+   return ret;
+   }
+   return 0;
+}
+
+static void vduse_iotlb_del_range(struct vduse_iova_domain *domain,
+ u64 start, u64 last)
+{
+   struct vdpa_map_file *map_file;
+   struct vhost_iotlb_map *map;
+
+   while ((map = vhost_iotlb_itree_first(domain->iotlb, start, last))) {
+   map_file = (struct vdpa_map_file *)map->opaque;
+   fput(map_file->file);
+   kfree(map_file);
+   vhost_iotlb_map_free(domain->iotlb, map);
+   }
+}
+
+int vduse_domain_set_map(struct vduse_iova_domain *domain,
+struct vhost_iotlb *iotlb)
+{
+   struct vdpa_map_file *map_file;
+   struct vhost_iotlb_map *map;
+   u64 start = 0ULL, last = ULLONG_MAX;
+   int ret;
+
+   spin_lock(>iotlb_lock);
+   vduse_iotlb_del_range(domain, start, last);
+
+   for (map = vhost_iotlb_itree_first(iotlb, start, last); map;
+map = vhost_iotlb_itree_next(map, start, last)) {
+   map_file = (struct vdpa_map_file *)map->opaque;
+   ret = vduse_iotlb_add_range(domain, map->start, map->last,
+   map->addr, map->perm,
+   map_file->file,
+   map_file->offset);
+   if (ret)
+   goto err;
+   }
+   spin_unlock(>iotlb_lock);
+
+   return 0;
+err:
+   vduse_iotlb_del_range(domain, start, last);
+   spin_unlock(>iotlb_lock);
+   return ret;
+}
+
+void vduse_domain_clear_map(struct vduse_iova_domain *domain,
+   struct vhost_iotlb *iotlb)
+{
+   struct vhost_iotlb_map *map;
+   u64 start = 0ULL, last = ULLONG_MAX;
+
+   spin_lock(>iotlb_lock);
+   for (map = vhost_iotlb_itree_first(iotlb, start, last); map;
+map = vhost_iotlb_itree_next(map, start, last)) {
+   vduse_iotlb_del_range(domain, map->start, map->last);
+   }
+   spin_unlock(>iotlb_lock);
+}
+
+static int vduse_domain_map_bounce_page(struct vduse_iova_domain *domain,
+u64 iova, u64 size, u64 paddr)
+{
+   struct vduse_bounce_map *map;
+   u64 last = iova + size - 1;
+
+   while (iova <= last) {
+   map = >bounce_maps[iova >> PAGE_SHIFT];
+   if (!map->bounce_page) {
+   map->bounce_page = alloc_page(GFP_ATOMIC);
+   if (!map->bounce_page)
+   return -ENOMEM;
+   }
+   map->orig_phys = paddr;
+   paddr += PAGE_SIZE;
+   iova += PAGE_SIZE;
+   }
+   return 0;
+}
+
+static void vduse_domain_unmap_bounce_page(struct vduse_iova_domain *domain,
+  u64 iova, u64 size)
+{
+   struct 

[PATCH v9 14/17] vdpa: Support transferring virtual addressing during DMA mapping

2021-07-13 Thread Xie Yongji
This patch introduces an attribute for a vDPA device to indicate
whether the virtual address can be used. If the vDPA device driver sets
it, the vhost-vdpa bus driver will not pin user pages and will transfer
the userspace virtual address instead of the physical address during
DMA mapping. The corresponding vma->vm_file and offset will also be
passed as an opaque pointer.

Suggested-by: Jason Wang 
Signed-off-by: Xie Yongji 
Acked-by: Jason Wang 
---
 drivers/vdpa/ifcvf/ifcvf_main.c   |  2 +-
 drivers/vdpa/mlx5/net/mlx5_vnet.c |  2 +-
 drivers/vdpa/vdpa.c   |  9 +++-
 drivers/vdpa/vdpa_sim/vdpa_sim.c  |  2 +-
 drivers/vdpa/virtio_pci/vp_vdpa.c |  2 +-
 drivers/vhost/vdpa.c  | 99 ++-
 include/linux/vdpa.h  | 19 ++--
 7 files changed, 116 insertions(+), 19 deletions(-)

diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c b/drivers/vdpa/ifcvf/ifcvf_main.c
index b4b89a66607f..fc848aafe5a9 100644
--- a/drivers/vdpa/ifcvf/ifcvf_main.c
+++ b/drivers/vdpa/ifcvf/ifcvf_main.c
@@ -492,7 +492,7 @@ static int ifcvf_probe(struct pci_dev *pdev, const struct 
pci_device_id *id)
}
 
adapter = vdpa_alloc_device(struct ifcvf_adapter, vdpa,
-   dev, _vdpa_ops, NULL);
+   dev, _vdpa_ops, NULL, false);
if (adapter == NULL) {
IFCVF_ERR(pdev, "Failed to allocate vDPA structure");
return -ENOMEM;
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 683452c41a36..79cb2f130bf0 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -2035,7 +2035,7 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev 
*v_mdev, const char *name)
max_vqs = min_t(u32, max_vqs, MLX5_MAX_SUPPORTED_VQS);
 
ndev = vdpa_alloc_device(struct mlx5_vdpa_net, mvdev.vdev, 
mdev->device, _vdpa_ops,
-name);
+name, false);
if (IS_ERR(ndev))
return PTR_ERR(ndev);
 
diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
index d77d59811389..41377df674d5 100644
--- a/drivers/vdpa/vdpa.c
+++ b/drivers/vdpa/vdpa.c
@@ -71,6 +71,7 @@ static void vdpa_release_dev(struct device *d)
  * @config: the bus operations that is supported by this device
  * @size: size of the parent structure that contains private data
  * @name: name of the vdpa device; optional.
+ * @use_va: indicate whether virtual address must be used by this device
  *
  * Driver should use vdpa_alloc_device() wrapper macro instead of
  * using this directly.
@@ -80,7 +81,8 @@ static void vdpa_release_dev(struct device *d)
  */
 struct vdpa_device *__vdpa_alloc_device(struct device *parent,
const struct vdpa_config_ops *config,
-   size_t size, const char *name)
+   size_t size, const char *name,
+   bool use_va)
 {
struct vdpa_device *vdev;
int err = -EINVAL;
@@ -91,6 +93,10 @@ struct vdpa_device *__vdpa_alloc_device(struct device 
*parent,
if (!!config->dma_map != !!config->dma_unmap)
goto err;
 
+   /* It should only work for a device that uses an on-chip IOMMU */
+   if (use_va && !(config->dma_map || config->set_map))
+   goto err;
+
err = -ENOMEM;
vdev = kzalloc(size, GFP_KERNEL);
if (!vdev)
@@ -106,6 +112,7 @@ struct vdpa_device *__vdpa_alloc_device(struct device 
*parent,
vdev->index = err;
vdev->config = config;
vdev->features_valid = false;
+   vdev->use_va = use_va;
 
if (name)
err = dev_set_name(>dev, "%s", name);
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index f456f4baf86d..472d573d755c 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -250,7 +250,7 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr 
*dev_attr)
ops = _config_ops;
 
vdpasim = vdpa_alloc_device(struct vdpasim, vdpa, NULL, ops,
-   dev_attr->name);
+   dev_attr->name, false);
if (!vdpasim)
goto err_alloc;
 
diff --git a/drivers/vdpa/virtio_pci/vp_vdpa.c 
b/drivers/vdpa/virtio_pci/vp_vdpa.c
index ee723c645aec..c828b9f9d92a 100644
--- a/drivers/vdpa/virtio_pci/vp_vdpa.c
+++ b/drivers/vdpa/virtio_pci/vp_vdpa.c
@@ -435,7 +435,7 @@ static int vp_vdpa_probe(struct pci_dev *pdev, const struct 
pci_device_id *id)
return ret;
 
vp_vdpa = vdpa_alloc_device(struct vp_vdpa, vdpa,
-   dev, _vdpa_ops, NULL);
+   dev, _vdpa_ops, NULL, false);
if (vp_vdpa == NULL) {
dev_err(dev, "vp_vdpa: Failed to allocate vDPA structure\n");

[PATCH v9 13/17] vdpa: factor out vhost_vdpa_pa_map() and vhost_vdpa_pa_unmap()

2021-07-13 Thread Xie Yongji
The upcoming patch is going to support VA mapping/unmapping.
So let's factor out the logic of PA mapping/unmapping first
to make the code more readable.

Suggested-by: Jason Wang 
Signed-off-by: Xie Yongji 
Acked-by: Jason Wang 
---
 drivers/vhost/vdpa.c | 53 +---
 1 file changed, 34 insertions(+), 19 deletions(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index f60a513dac7c..3e260038beba 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -511,7 +511,7 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
return r;
 }
 
-static void vhost_vdpa_iotlb_unmap(struct vhost_vdpa *v, u64 start, u64 last)
+static void vhost_vdpa_pa_unmap(struct vhost_vdpa *v, u64 start, u64 last)
 {
struct vhost_dev *dev = >vdev;
struct vhost_iotlb *iotlb = dev->iotlb;
@@ -533,6 +533,11 @@ static void vhost_vdpa_iotlb_unmap(struct vhost_vdpa *v, 
u64 start, u64 last)
}
 }
 
+static void vhost_vdpa_iotlb_unmap(struct vhost_vdpa *v, u64 start, u64 last)
+{
+   return vhost_vdpa_pa_unmap(v, start, last);
+}
+
 static void vhost_vdpa_iotlb_free(struct vhost_vdpa *v)
 {
struct vhost_dev *dev = >vdev;
@@ -613,37 +618,28 @@ static void vhost_vdpa_unmap(struct vhost_vdpa *v, u64 
iova, u64 size)
}
 }
 
-static int vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v,
-  struct vhost_iotlb_msg *msg)
+static int vhost_vdpa_pa_map(struct vhost_vdpa *v,
+u64 iova, u64 size, u64 uaddr, u32 perm)
 {
struct vhost_dev *dev = >vdev;
-   struct vhost_iotlb *iotlb = dev->iotlb;
struct page **page_list;
unsigned long list_size = PAGE_SIZE / sizeof(struct page *);
unsigned int gup_flags = FOLL_LONGTERM;
unsigned long npages, cur_base, map_pfn, last_pfn = 0;
unsigned long lock_limit, sz2pin, nchunks, i;
-   u64 iova = msg->iova;
+   u64 start = iova;
long pinned;
int ret = 0;
 
-   if (msg->iova < v->range.first ||
-   msg->iova + msg->size - 1 > v->range.last)
-   return -EINVAL;
-
-   if (vhost_iotlb_itree_first(iotlb, msg->iova,
-   msg->iova + msg->size - 1))
-   return -EEXIST;
-
/* Limit the use of memory for bookkeeping */
page_list = (struct page **) __get_free_page(GFP_KERNEL);
if (!page_list)
return -ENOMEM;
 
-   if (msg->perm & VHOST_ACCESS_WO)
+   if (perm & VHOST_ACCESS_WO)
gup_flags |= FOLL_WRITE;
 
-   npages = PAGE_ALIGN(msg->size + (iova & ~PAGE_MASK)) >> PAGE_SHIFT;
+   npages = PAGE_ALIGN(size + (iova & ~PAGE_MASK)) >> PAGE_SHIFT;
if (!npages) {
ret = -EINVAL;
goto free;
@@ -657,7 +653,7 @@ static int vhost_vdpa_process_iotlb_update(struct 
vhost_vdpa *v,
goto unlock;
}
 
-   cur_base = msg->uaddr & PAGE_MASK;
+   cur_base = uaddr & PAGE_MASK;
iova &= PAGE_MASK;
nchunks = 0;
 
@@ -688,7 +684,7 @@ static int vhost_vdpa_process_iotlb_update(struct 
vhost_vdpa *v,
csize = (last_pfn - map_pfn + 1) << PAGE_SHIFT;
ret = vhost_vdpa_map(v, iova, csize,
 map_pfn << PAGE_SHIFT,
-msg->perm);
+perm);
if (ret) {
/*
 * Unpin the pages that are left 
unmapped
@@ -717,7 +713,7 @@ static int vhost_vdpa_process_iotlb_update(struct 
vhost_vdpa *v,
 
/* Pin the rest chunk */
ret = vhost_vdpa_map(v, iova, (last_pfn - map_pfn + 1) << PAGE_SHIFT,
-map_pfn << PAGE_SHIFT, msg->perm);
+map_pfn << PAGE_SHIFT, perm);
 out:
if (ret) {
if (nchunks) {
@@ -736,13 +732,32 @@ static int vhost_vdpa_process_iotlb_update(struct 
vhost_vdpa *v,
for (pfn = map_pfn; pfn <= last_pfn; pfn++)
unpin_user_page(pfn_to_page(pfn));
}
-   vhost_vdpa_unmap(v, msg->iova, msg->size);
+   vhost_vdpa_unmap(v, start, size);
}
 unlock:
mmap_read_unlock(dev->mm);
 free:
free_page((unsigned long)page_list);
return ret;
+
+}
+
+static int vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v,
+  struct vhost_iotlb_msg *msg)
+{
+   struct vhost_dev *dev = >vdev;
+   struct vhost_iotlb *iotlb = dev->iotlb;
+
+   if (msg->iova < v->range.first ||
+   msg->iova + msg->size - 1 > v->range.last)
+   return -EINVAL;
+
+   if (vhost_iotlb_itree_first(iotlb, msg->iova,
+   

[PATCH v9 12/17] vdpa: Add an opaque pointer for vdpa_config_ops.dma_map()

2021-07-13 Thread Xie Yongji
Add an opaque pointer for DMA mapping.

Suggested-by: Jason Wang 
Signed-off-by: Xie Yongji 
Acked-by: Jason Wang 
---
 drivers/vdpa/vdpa_sim/vdpa_sim.c | 6 +++---
 drivers/vhost/vdpa.c | 2 +-
 include/linux/vdpa.h | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index 72167963421b..f456f4baf86d 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -542,14 +542,14 @@ static int vdpasim_set_map(struct vdpa_device *vdpa,
 }
 
 static int vdpasim_dma_map(struct vdpa_device *vdpa, u64 iova, u64 size,
-  u64 pa, u32 perm)
+  u64 pa, u32 perm, void *opaque)
 {
struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
int ret;
 
	spin_lock(&vdpasim->iommu_lock);
-   ret = vhost_iotlb_add_range(vdpasim->iommu, iova, iova + size - 1, pa,
-   perm);
+   ret = vhost_iotlb_add_range_ctx(vdpasim->iommu, iova, iova + size - 1,
+   pa, perm, opaque);
	spin_unlock(&vdpasim->iommu_lock);
 
return ret;
diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 8615756306ec..f60a513dac7c 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -578,7 +578,7 @@ static int vhost_vdpa_map(struct vhost_vdpa *v,
return r;
 
if (ops->dma_map) {
-   r = ops->dma_map(vdpa, iova, size, pa, perm);
+   r = ops->dma_map(vdpa, iova, size, pa, perm, NULL);
} else if (ops->set_map) {
if (!v->in_batch)
r = ops->set_map(vdpa, dev->iotlb);
diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index 198c30e84b5d..4903e67c690b 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -268,7 +268,7 @@ struct vdpa_config_ops {
/* DMA ops */
int (*set_map)(struct vdpa_device *vdev, struct vhost_iotlb *iotlb);
int (*dma_map)(struct vdpa_device *vdev, u64 iova, u64 size,
-  u64 pa, u32 perm);
+  u64 pa, u32 perm, void *opaque);
int (*dma_unmap)(struct vdpa_device *vdev, u64 iova, u64 size);
 
/* Free device resources */
-- 
2.11.0
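
As a rough illustration of how a parent driver can use the new opaque
argument (a minimal sketch; my_dev and vdpa_to_my_dev() are made-up
names, not part of this series):

static int my_dma_map(struct vdpa_device *vdev, u64 iova, u64 size,
		      u64 pa, u32 perm, void *opaque)
{
	struct my_dev *dev = vdpa_to_my_dev(vdev);	/* hypothetical */

	/* Stash the caller's per-mapping context in the IOTLB entry
	 * itself so it can be retrieved again at unmap time. */
	return vhost_iotlb_add_range_ctx(dev->iommu, iova, iova + size - 1,
					 pa, perm, opaque);
}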



[PATCH v9 09/17] virtio-vdpa: Handle the failure of vdpa_reset()

2021-07-13 Thread Xie Yongji
The vdpa_reset() may fail now. This adds a check to its return
value and fails virtio_vdpa_reset().

Signed-off-by: Xie Yongji 
---
 drivers/virtio/virtio_vdpa.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/virtio/virtio_vdpa.c b/drivers/virtio/virtio_vdpa.c
index 3e666f70e829..ebbd8471bbee 100644
--- a/drivers/virtio/virtio_vdpa.c
+++ b/drivers/virtio/virtio_vdpa.c
@@ -101,9 +101,7 @@ static int virtio_vdpa_reset(struct virtio_device *vdev)
 {
struct vdpa_device *vdpa = vd_get_vdpa(vdev);
 
-   vdpa_reset(vdpa);
-
-   return 0;
+   return vdpa_reset(vdpa);
 }
 
 static bool virtio_vdpa_notify(struct virtqueue *vq)
-- 
2.11.0



[PATCH v9 11/17] vhost-iotlb: Add an opaque pointer for vhost IOTLB

2021-07-13 Thread Xie Yongji
Add an opaque pointer to the vhost IOTLB map, and introduce
vhost_iotlb_add_range_ctx() to accept it.

Suggested-by: Jason Wang 
Signed-off-by: Xie Yongji 
Acked-by: Jason Wang 
---
 drivers/vhost/iotlb.c   | 20 
 include/linux/vhost_iotlb.h |  3 +++
 2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/drivers/vhost/iotlb.c b/drivers/vhost/iotlb.c
index 0582079e4bcc..670d56c879e5 100644
--- a/drivers/vhost/iotlb.c
+++ b/drivers/vhost/iotlb.c
@@ -36,19 +36,21 @@ void vhost_iotlb_map_free(struct vhost_iotlb *iotlb,
 EXPORT_SYMBOL_GPL(vhost_iotlb_map_free);
 
 /**
- * vhost_iotlb_add_range - add a new range to vhost IOTLB
+ * vhost_iotlb_add_range_ctx - add a new range to vhost IOTLB
  * @iotlb: the IOTLB
  * @start: start of the IOVA range
  * @last: last of IOVA range
  * @addr: the address that is mapped to @start
  * @perm: access permission of this range
+ * @opaque: the opaque pointer for the new mapping
  *
 * Returns an error if @last is smaller than @start or if memory
 * allocation fails
  */
-int vhost_iotlb_add_range(struct vhost_iotlb *iotlb,
- u64 start, u64 last,
- u64 addr, unsigned int perm)
+int vhost_iotlb_add_range_ctx(struct vhost_iotlb *iotlb,
+ u64 start, u64 last,
+ u64 addr, unsigned int perm,
+ void *opaque)
 {
struct vhost_iotlb_map *map;
 
@@ -71,6 +73,7 @@ int vhost_iotlb_add_range(struct vhost_iotlb *iotlb,
map->last = last;
map->addr = addr;
map->perm = perm;
+   map->opaque = opaque;
 
iotlb->nmaps++;
vhost_iotlb_itree_insert(map, >root);
@@ -80,6 +83,15 @@ int vhost_iotlb_add_range(struct vhost_iotlb *iotlb,
 
return 0;
 }
+EXPORT_SYMBOL_GPL(vhost_iotlb_add_range_ctx);
+
+int vhost_iotlb_add_range(struct vhost_iotlb *iotlb,
+ u64 start, u64 last,
+ u64 addr, unsigned int perm)
+{
+   return vhost_iotlb_add_range_ctx(iotlb, start, last,
+addr, perm, NULL);
+}
 EXPORT_SYMBOL_GPL(vhost_iotlb_add_range);
 
 /**
diff --git a/include/linux/vhost_iotlb.h b/include/linux/vhost_iotlb.h
index 6b09b786a762..2d0e2f52f938 100644
--- a/include/linux/vhost_iotlb.h
+++ b/include/linux/vhost_iotlb.h
@@ -17,6 +17,7 @@ struct vhost_iotlb_map {
u32 perm;
u32 flags_padding;
u64 __subtree_last;
+   void *opaque;
 };
 
 #define VHOST_IOTLB_FLAG_RETIRE 0x1
@@ -29,6 +30,8 @@ struct vhost_iotlb {
unsigned int flags;
 };
 
+int vhost_iotlb_add_range_ctx(struct vhost_iotlb *iotlb, u64 start, u64 last,
+ u64 addr, unsigned int perm, void *opaque);
 int vhost_iotlb_add_range(struct vhost_iotlb *iotlb, u64 start, u64 last,
  u64 addr, unsigned int perm);
 void vhost_iotlb_del_range(struct vhost_iotlb *iotlb, u64 start, u64 last);
-- 
2.11.0
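
A minimal sketch of the round trip (the cookie and the demo ranges are
placeholders; only the two helpers and map->opaque come from this patch):

#include <linux/vhost_iotlb.h>
#include <uapi/linux/vhost_types.h>	/* VHOST_ACCESS_RW */

static int iotlb_ctx_demo(struct vhost_iotlb *iotlb, void *cookie)
{
	struct vhost_iotlb_map *map;
	int ret;

	/* Attach a caller-owned cookie to the new [0x1000, 0x1fff] range. */
	ret = vhost_iotlb_add_range_ctx(iotlb, 0x1000, 0x1fff, 0x5000,
					VHOST_ACCESS_RW, cookie);
	if (ret)
		return ret;

	/* Look the range up again and read the cookie back. */
	map = vhost_iotlb_itree_first(iotlb, 0x1000, 0x1fff);

	return (map && map->opaque == cookie) ? 0 : -EINVAL;
}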



[PATCH v9 10/17] virtio: Handle device reset failure in register_virtio_device()

2021-07-13 Thread Xie Yongji
The device reset may fail in the virtio-vdpa case now, so add a check
to its return value and fail register_virtio_device().

Signed-off-by: Xie Yongji 
---
 drivers/virtio/virtio.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index a15beb6b593b..8df75425fb43 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -349,7 +349,9 @@ int register_virtio_device(struct virtio_device *dev)
 
/* We always start by resetting the device, in case a previous
 * driver messed it up.  This also tests that code path a little. */
-   dev->config->reset(dev);
+   err = dev->config->reset(dev);
+   if (err)
+   goto err_reset;
 
/* Acknowledge that we've seen the device. */
virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
@@ -362,10 +364,13 @@ int register_virtio_device(struct virtio_device *dev)
 */
err = device_add(>dev);
if (err)
-   ida_simple_remove(&virtio_index_ida, dev->index);
-out:
-   if (err)
-   virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
+   goto err_add;
+
+   return 0;
+err_add:
+   virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
+err_reset:
+   ida_simple_remove(&virtio_index_ida, dev->index);
return err;
 }
 EXPORT_SYMBOL_GPL(register_virtio_device);
-- 
2.11.0



[PATCH v9 08/17] virtio_config: Add a return value to reset function

2021-07-13 Thread Xie Yongji
This adds a return value to the reset function so that we can
handle reset failures later. No functional changes.

Signed-off-by: Xie Yongji 
---
 arch/um/drivers/virtio_uml.c | 4 +++-
 drivers/platform/mellanox/mlxbf-tmfifo.c | 4 +++-
 drivers/remoteproc/remoteproc_virtio.c   | 4 +++-
 drivers/s390/virtio/virtio_ccw.c | 6 --
 drivers/virtio/virtio_pci_legacy.c   | 4 +++-
 drivers/virtio/virtio_pci_modern.c   | 4 +++-
 drivers/virtio/virtio_vdpa.c | 4 +++-
 include/linux/virtio_config.h| 3 ++-
 8 files changed, 24 insertions(+), 9 deletions(-)

diff --git a/arch/um/drivers/virtio_uml.c b/arch/um/drivers/virtio_uml.c
index 4412d6febade..ca02deaf9b32 100644
--- a/arch/um/drivers/virtio_uml.c
+++ b/arch/um/drivers/virtio_uml.c
@@ -828,11 +828,13 @@ static void vu_set_status(struct virtio_device *vdev, u8 status)
vu_dev->status = status;
 }
 
-static void vu_reset(struct virtio_device *vdev)
+static int vu_reset(struct virtio_device *vdev)
 {
struct virtio_uml_device *vu_dev = to_virtio_uml_device(vdev);
 
vu_dev->status = 0;
+
+   return 0;
 }
 
 static void vu_del_vq(struct virtqueue *vq)
diff --git a/drivers/platform/mellanox/mlxbf-tmfifo.c b/drivers/platform/mellanox/mlxbf-tmfifo.c
index 38800e86ed8a..e3c513c2d4fa 100644
--- a/drivers/platform/mellanox/mlxbf-tmfifo.c
+++ b/drivers/platform/mellanox/mlxbf-tmfifo.c
@@ -989,11 +989,13 @@ static void mlxbf_tmfifo_virtio_set_status(struct virtio_device *vdev,
 }
 
 /* Reset the device. Not much here for now. */
-static void mlxbf_tmfifo_virtio_reset(struct virtio_device *vdev)
+static int mlxbf_tmfifo_virtio_reset(struct virtio_device *vdev)
 {
struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
 
tm_vdev->status = 0;
+
+   return 0;
 }
 
 /* Read the value of a configuration field. */
diff --git a/drivers/remoteproc/remoteproc_virtio.c b/drivers/remoteproc/remoteproc_virtio.c
index cf4d54e98e6a..975c845b3187 100644
--- a/drivers/remoteproc/remoteproc_virtio.c
+++ b/drivers/remoteproc/remoteproc_virtio.c
@@ -191,7 +191,7 @@ static void rproc_virtio_set_status(struct virtio_device *vdev, u8 status)
	dev_dbg(&vdev->dev, "status: %d\n", status);
 }
 
-static void rproc_virtio_reset(struct virtio_device *vdev)
+static int rproc_virtio_reset(struct virtio_device *vdev)
 {
struct rproc_vdev *rvdev = vdev_to_rvdev(vdev);
struct fw_rsc_vdev *rsc;
@@ -200,6 +200,8 @@ static void rproc_virtio_reset(struct virtio_device *vdev)
 
rsc->status = 0;
	dev_dbg(&vdev->dev, "reset !\n");
+
+   return 0;
 }
 
 /* provide the vdev features as retrieved from the firmware */
diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
index d35e7a3f7067..5221cdad531d 100644
--- a/drivers/s390/virtio/virtio_ccw.c
+++ b/drivers/s390/virtio/virtio_ccw.c
@@ -710,14 +710,14 @@ static int virtio_ccw_find_vqs(struct virtio_device *vdev, unsigned nvqs,
return ret;
 }
 
-static void virtio_ccw_reset(struct virtio_device *vdev)
+static int virtio_ccw_reset(struct virtio_device *vdev)
 {
struct virtio_ccw_device *vcdev = to_vc_device(vdev);
struct ccw1 *ccw;
 
ccw = ccw_device_dma_zalloc(vcdev->cdev, sizeof(*ccw));
if (!ccw)
-   return;
+   return -ENOMEM;
 
/* Zero status bits. */
vcdev->dma_area->status = 0;
@@ -729,6 +729,8 @@ static void virtio_ccw_reset(struct virtio_device *vdev)
ccw->cda = 0;
ccw_io_helper(vcdev, ccw, VIRTIO_CCW_DOING_RESET);
ccw_device_dma_free(vcdev->cdev, ccw, sizeof(*ccw));
+
+   return 0;
 }
 
 static u64 virtio_ccw_get_features(struct virtio_device *vdev)
diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c
index d62e9835aeec..0b5d95e3efa1 100644
--- a/drivers/virtio/virtio_pci_legacy.c
+++ b/drivers/virtio/virtio_pci_legacy.c
@@ -89,7 +89,7 @@ static void vp_set_status(struct virtio_device *vdev, u8 status)
iowrite8(status, vp_dev->ioaddr + VIRTIO_PCI_STATUS);
 }
 
-static void vp_reset(struct virtio_device *vdev)
+static int vp_reset(struct virtio_device *vdev)
 {
struct virtio_pci_device *vp_dev = to_vp_device(vdev);
/* 0 status means a reset. */
@@ -99,6 +99,8 @@ static void vp_reset(struct virtio_device *vdev)
ioread8(vp_dev->ioaddr + VIRTIO_PCI_STATUS);
/* Flush pending VQ/configuration callbacks. */
vp_synchronize_vectors(vdev);
+
+   return 0;
 }
 
 static u16 vp_config_vector(struct virtio_pci_device *vp_dev, u16 vector)
diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
index 30654d3a0b41..b0cde3b2f0ff 100644
--- a/drivers/virtio/virtio_pci_modern.c
+++ b/drivers/virtio/virtio_pci_modern.c
@@ -158,7 +158,7 @@ static void vp_set_status(struct virtio_device *vdev, u8 status)
	vp_modern_set_status(&vp_dev->mdev, status);
 }
 
-static void vp_reset(struct virtio_device *vdev)

[PATCH v9 06/17] vhost-vdpa: Handle the failure of vdpa_reset()

2021-07-13 Thread Xie Yongji
The vdpa_reset() may fail now. This adds a check to its return
value and fails vhost_vdpa_open().

Signed-off-by: Xie Yongji 
---
 drivers/vhost/vdpa.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 62b6d911c57d..8615756306ec 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -116,12 +116,13 @@ static void vhost_vdpa_unsetup_vq_irq(struct vhost_vdpa *v, u16 qid)
	irq_bypass_unregister_producer(&vq->call_ctx.producer);
 }
 
-static void vhost_vdpa_reset(struct vhost_vdpa *v)
+static int vhost_vdpa_reset(struct vhost_vdpa *v)
 {
struct vdpa_device *vdpa = v->vdpa;
 
-   vdpa_reset(vdpa);
v->in_batch = 0;
+
+   return vdpa_reset(vdpa);
 }
 
 static long vhost_vdpa_get_device_id(struct vhost_vdpa *v, u8 __user *argp)
@@ -871,7 +872,9 @@ static int vhost_vdpa_open(struct inode *inode, struct file *filep)
return -EBUSY;
 
nvqs = v->nvqs;
-   vhost_vdpa_reset(v);
+   r = vhost_vdpa_reset(v);
+   if (r)
+   goto err;
 
vqs = kmalloc_array(nvqs, sizeof(*vqs), GFP_KERNEL);
if (!vqs) {
-- 
2.11.0



[PATCH v9 07/17] virtio: Don't set FAILED status bit on device index allocation failure

2021-07-13 Thread Xie Yongji
We don't need to set the FAILED status bit on device index allocation
failure, since device initialization hasn't started yet.

Signed-off-by: Xie Yongji 
---
 drivers/virtio/virtio.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index 4b15c00c0a0a..a15beb6b593b 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -338,7 +338,7 @@ int register_virtio_device(struct virtio_device *dev)
/* Assign a unique device index and hence name. */
	err = ida_simple_get(&virtio_index_ida, 0, 0, GFP_KERNEL);
if (err < 0)
-   goto out;
+   return err;
 
dev->index = err;
dev_set_name(>dev, "virtio%u", dev->index);
-- 
2.11.0



[PATCH v9 05/17] vhost-vdpa: Fail the vhost_vdpa_set_status() on reset failure

2021-07-13 Thread Xie Yongji
Re-read the device status to ensure it's set to zero during
reset. Otherwise, fail vhost_vdpa_set_status() after a timeout.

Signed-off-by: Xie Yongji 
---
 drivers/vhost/vdpa.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index bb374a801bda..62b6d911c57d 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -157,7 +157,7 @@ static long vhost_vdpa_set_status(struct vhost_vdpa *v, u8 __user *statusp)
struct vdpa_device *vdpa = v->vdpa;
const struct vdpa_config_ops *ops = vdpa->config;
u8 status, status_old;
-   int nvqs = v->nvqs;
+   int timeout = 0, nvqs = v->nvqs;
u16 i;
 
	if (copy_from_user(&status, statusp, sizeof(status)))
@@ -173,6 +173,15 @@ static long vhost_vdpa_set_status(struct vhost_vdpa *v, u8 __user *statusp)
return -EINVAL;
 
ops->set_status(vdpa, status);
+   if (status == 0) {
+   while (ops->get_status(vdpa)) {
+   timeout += 20;
+   if (timeout > VDPA_RESET_TIMEOUT_MS)
+   return -EIO;
+
+   msleep(20);
+   }
+   }
 
	if ((status & VIRTIO_CONFIG_S_DRIVER_OK) && !(status_old & VIRTIO_CONFIG_S_DRIVER_OK))
for (i = 0; i < nvqs; i++)
-- 
2.11.0



[PATCH v9 04/17] vdpa: Fail the vdpa_reset() if fail to set device status to zero

2021-07-13 Thread Xie Yongji
Re-read the device status to ensure it's set to zero during
reset. Otherwise, fail vdpa_reset() after a timeout.

Signed-off-by: Xie Yongji 
---
 include/linux/vdpa.h | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index f822490db584..198c30e84b5d 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -6,6 +6,7 @@
 #include <linux/device.h>
 #include <linux/interrupt.h>
 #include <linux/vhost_iotlb.h>
+#include <linux/delay.h>
 
 /**
  * struct vdpa_calllback - vDPA callback definition.
@@ -340,12 +341,24 @@ static inline struct device *vdpa_get_dma_dev(struct vdpa_device *vdev)
return vdev->dma_dev;
 }
 
-static inline void vdpa_reset(struct vdpa_device *vdev)
+#define VDPA_RESET_TIMEOUT_MS 1000
+
+static inline int vdpa_reset(struct vdpa_device *vdev)
 {
const struct vdpa_config_ops *ops = vdev->config;
+   int timeout = 0;
 
vdev->features_valid = false;
ops->set_status(vdev, 0);
+   while (ops->get_status(vdev)) {
+   timeout += 20;
+   if (timeout > VDPA_RESET_TIMEOUT_MS)
+   return -EIO;
+
+   msleep(20);
+   }
+
+   return 0;
 }
 
 static inline int vdpa_set_features(struct vdpa_device *vdev, u64 features)
-- 
2.11.0
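
The same 20ms polling loop now appears both here and in
vhost_vdpa_set_status() (patch 05); a shared helper could express it
once (a sketch only, the helper name is made up):

static inline int vdpa_wait_for_reset(struct vdpa_device *vdev)
{
	const struct vdpa_config_ops *ops = vdev->config;
	int timeout = 0;

	/* Poll every 20ms until the device clears its status,
	 * giving up after VDPA_RESET_TIMEOUT_MS. */
	while (ops->get_status(vdev)) {
		timeout += 20;
		if (timeout > VDPA_RESET_TIMEOUT_MS)
			return -EIO;
		msleep(20);
	}

	return 0;
}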



[PATCH v9 03/17] vdpa: Fix code indentation

2021-07-13 Thread Xie Yongji
Use tabs to indent the code instead of spaces.

Signed-off-by: Xie Yongji 
---
 include/linux/vdpa.h | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index 7c49bc5a2b71..f822490db584 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -342,25 +342,25 @@ static inline struct device *vdpa_get_dma_dev(struct vdpa_device *vdev)
 
 static inline void vdpa_reset(struct vdpa_device *vdev)
 {
-const struct vdpa_config_ops *ops = vdev->config;
+   const struct vdpa_config_ops *ops = vdev->config;
 
vdev->features_valid = false;
-ops->set_status(vdev, 0);
+   ops->set_status(vdev, 0);
 }
 
 static inline int vdpa_set_features(struct vdpa_device *vdev, u64 features)
 {
-const struct vdpa_config_ops *ops = vdev->config;
+   const struct vdpa_config_ops *ops = vdev->config;
 
vdev->features_valid = true;
-return ops->set_features(vdev, features);
+   return ops->set_features(vdev, features);
 }
 
 
 static inline void vdpa_get_config(struct vdpa_device *vdev, unsigned offset,
   void *buf, unsigned int len)
 {
-const struct vdpa_config_ops *ops = vdev->config;
+   const struct vdpa_config_ops *ops = vdev->config;
 
/*
 * Config accesses aren't supposed to trigger before features are set.
-- 
2.11.0



[PATCH v9 02/17] file: Export receive_fd() to modules

2021-07-13 Thread Xie Yongji
Export receive_fd() so that modules can use it to pass
file descriptors between processes without missing any
security checks.

Signed-off-by: Xie Yongji 
---
 fs/file.c| 6 ++
 include/linux/file.h | 7 +++
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/fs/file.c b/fs/file.c
index 86dc9956af32..210e540672aa 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -1134,6 +1134,12 @@ int receive_fd_replace(int new_fd, struct file *file, unsigned int o_flags)
return new_fd;
 }
 
+int receive_fd(struct file *file, unsigned int o_flags)
+{
+   return __receive_fd(file, NULL, o_flags);
+}
+EXPORT_SYMBOL_GPL(receive_fd);
+
 static int ksys_dup3(unsigned int oldfd, unsigned int newfd, int flags)
 {
int err = -EBADF;
diff --git a/include/linux/file.h b/include/linux/file.h
index 2de2e4613d7b..51e830b4fe3a 100644
--- a/include/linux/file.h
+++ b/include/linux/file.h
@@ -94,6 +94,9 @@ extern void fd_install(unsigned int fd, struct file *file);
 
 extern int __receive_fd(struct file *file, int __user *ufd,
unsigned int o_flags);
+
+extern int receive_fd(struct file *file, unsigned int o_flags);
+
 static inline int receive_fd_user(struct file *file, int __user *ufd,
  unsigned int o_flags)
 {
@@ -101,10 +104,6 @@ static inline int receive_fd_user(struct file *file, int __user *ufd,
return -EFAULT;
return __receive_fd(file, ufd, o_flags);
 }
-static inline int receive_fd(struct file *file, unsigned int o_flags)
-{
-   return __receive_fd(file, NULL, o_flags);
-}
 int receive_fd_replace(int new_fd, struct file *file, unsigned int o_flags);
 
 extern void flush_delayed_fput(void);
-- 
2.11.0
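
For reference, a minimal sketch of module-side usage (the wrapper is a
made-up example; only receive_fd() itself comes from this patch):

#include <linux/file.h>
#include <linux/fcntl.h>

static int install_file_for_current(struct file *file)
{
	/* Runs the LSM file-receive hook, allocates an fd in the
	 * current process and takes its own reference on @file; the
	 * caller's reference is unchanged either way. */
	return receive_fd(file, O_CLOEXEC);
}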



[PATCH v9 01/17] iova: Export alloc_iova_fast() and free_iova_fast()

2021-07-13 Thread Xie Yongji
Export alloc_iova_fast() and free_iova_fast() so that
modules can use them to improve IOVA allocation efficiency.

Signed-off-by: Xie Yongji 
---
 drivers/iommu/iova.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index b6cf5f16123b..3941ed6bb99b 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -521,6 +521,7 @@ alloc_iova_fast(struct iova_domain *iovad, unsigned long size,
 
return new_iova->pfn_lo;
 }
+EXPORT_SYMBOL_GPL(alloc_iova_fast);
 
 /**
  * free_iova_fast - free iova pfn range into rcache
@@ -538,6 +539,7 @@ free_iova_fast(struct iova_domain *iovad, unsigned long pfn, unsigned long size)
 
free_iova(iovad, pfn);
 }
+EXPORT_SYMBOL_GPL(free_iova_fast);
 
 #define fq_ring_for_each(i, fq) \
	for ((i) = (fq)->head; (i) != (fq)->tail; (i) = ((i) + 1) % IOVA_FQ_SIZE)
-- 
2.11.0
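
For reference, a minimal sketch of module-side usage (the domain is
assumed to be initialized elsewhere with init_iova_domain(); sizes are
in pages):

#include <linux/iova.h>

static unsigned long demo_alloc(struct iova_domain *iovad,
				unsigned long npages,
				unsigned long limit_pfn)
{
	/* true: flush the per-CPU rcaches and retry once on failure.
	 * Returns the allocated pfn_lo, or 0 on failure. */
	return alloc_iova_fast(iovad, npages, limit_pfn, true);
}

static void demo_free(struct iova_domain *iovad, unsigned long pfn,
		      unsigned long npages)
{
	/* Puts the range back into the rcache when possible. */
	free_iova_fast(iovad, pfn, npages);
}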



[PATCH v9 00/17] Introduce VDUSE - vDPA Device in Userspace

2021-07-13 Thread Xie Yongji
This series introduces a framework that makes it possible to implement
software-emulated vDPA devices in userspace. To make the device
emulation more secure, the emulated vDPA device's control path is handled
in the kernel and only the data path is implemented in userspace.

Since the emulated vDPA device's control path is handled in the kernel,
a message mechanism is introduced to make userspace aware of data
path related changes. Userspace can use read()/write() to receive and
reply to the control messages.
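
For illustration, the daemon's control loop could look roughly like this
(a userspace sketch; process_request() is a placeholder and the exact
message layout is defined by the uAPI header added in this series):

#include <unistd.h>
#include <linux/vduse.h>

static int handle_control_messages(int dev_fd)
{
	struct vduse_dev_request req;
	struct vduse_dev_response resp;

	for (;;) {
		/* Block until the kernel posts a control message. */
		if (read(dev_fd, &req, sizeof(req)) != sizeof(req))
			return -1;

		resp.request_id = req.request_id;
		resp.result = process_request(&req);	/* device-specific */

		/* Reply so the kernel side can make progress. */
		if (write(dev_fd, &resp, sizeof(resp)) != sizeof(resp))
			return -1;
	}
}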

In the data path, the core maps the DMA buffer into the VDUSE daemon's
address space, which can be implemented in different ways depending on
the vdpa bus to which the vDPA device is attached.

In the virtio-vdpa case, we implement an MMU-based on-chip IOMMU driver
with a bounce-buffering mechanism to achieve that. And in the vhost-vdpa
case, the DMA buffer resides in a userspace memory region which can be
shared with the VDUSE userspace process by transferring the shmfd.
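
The bounce-buffering idea, in miniature (a conceptual sketch, not VDUSE
code): data headed to the device is staged into a bounce page that is
also mapped into the daemon, and device-written data is copied back on
unmap:

#include <linux/highmem.h>
#include <linux/string.h>

static void bounce_sync(struct page *orig_page, struct page *bounce_page,
			size_t len, bool to_device)
{
	void *orig = kmap_local_page(orig_page);
	void *bounce = kmap_local_page(bounce_page);

	if (to_device)
		memcpy(bounce, orig, len);	/* stage data for the daemon */
	else
		memcpy(orig, bounce, len);	/* copy device results back */

	kunmap_local(bounce);
	kunmap_local(orig);
}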

The details and our use case are shown below:

[The cover letter's ASCII architecture diagram is garbled in this
archive. In summary: a container (/dev/vdX) and a QEMU VM
(/dev/vhost-vdpa-X) sit on the kernel's block-device and vhost-device
interfaces; these reach the emulated device through the
virtio-blk/virtio-vdpa and vhost-vdpa drivers on the vdpa bus, where the
vduse driver exports it to the userspace VDUSE daemon. The daemon runs
the vDPA device emulation plus a block driver that talks to remote
storages over TCP/IP via the NIC.]

We make use of it to implement a block device connecting to
our distributed storage, which can be used both in containers and
VMs. Thus, we can have a unified technology stack in these two cases.

To test it with null-blk:

  $ qemu-storage-daemon \
  --chardev socket,id=charmonitor,path=/tmp/qmp.sock,server,nowait \
  --monitor chardev=charmonitor \
  --blockdev driver=host_device,cache.direct=on,aio=native,filename=/dev/nullb0,node-name=disk0 \
  --export 

Re: [PATCH] dt-bindings: arm-smmu: Fix json-schema syntax

2021-07-13 Thread Krzysztof Kozlowski
On Mon, 12 Jul 2021 at 16:14, Rob Herring  wrote:
>
> On Tue, Jun 22, 2021 at 11:56 PM Krzysztof Kozlowski
>  wrote:
> >
> > On Mon, 21 Jun 2021 16:00:36 +0200, Thierry Reding wrote:
> > > Commit 4287861dca9d ("dt-bindings: arm-smmu: Add Tegra186 compatible
> > > string") introduced a jsonschema syntax error as a result of a rebase
> > > gone wrong. Fix it.
> >
> > Applied, thanks!
> >
> > [1/1] dt-bindings: arm-smmu: Fix json-schema syntax
> >   commit: bf3ec9deaa33889630722c47f7bb86ba58872ea7
>
> Applied where? Now Linus's master is broken.

To memory controller drivers tree. Pushed to soc folks some time ago:
https://lore.kernel.org/lkml/20210625073604.13562-1-krzysztof.kozlow...@canonical.com/

Cc: Arnd and Olof,
Any comments on merging these fixes? They should go to current RC.

Best regards,
Krzysztof