Re: [PATCH 0/3] Allow restricted-dma-pool to customize IO_TLB_SEGSIZE

2021-11-23 Thread Hsin-Yi Wang
On Tue, Nov 23, 2021 at 7:58 PM Robin Murphy  wrote:
>
> On 2021-11-23 11:21, Hsin-Yi Wang wrote:
> > Default IO_TLB_SEGSIZE (128) slabs may not be enough for some use cases.
> > This series adds support to customize io_tlb_segsize for each
> > restricted-dma-pool.
> >
> > Example use case:
> >
> > mtk-isp drivers[1] are controlled by mtk-scp[2] and allocate memory through
> > mtk-scp. In order to use the noncontiguous DMA API[3], we need to use
> > the swiotlb pool. mtk-scp needs to allocate memory with 2560 slabs.
> > mtk-isp drivers also need to allocate memory with 200+ slabs. Both are
> > larger than the default IO_TLB_SEGSIZE (128) slabs.
>
> Are drivers really doing streaming DMA mappings that large? If so, that
> seems like it might be worth trying to address in its own right for the
> sake of efficiency - allocating ~5MB of memory twice and copying it back
> and forth doesn't sound like the ideal thing to do.
>
> If it's really about coherent DMA buffer allocation, I thought the plan
> was that devices which expect to use a significant amount and/or size of
> coherent buffers would continue to use a shared-dma-pool for that? It's
> still what the binding implies. My understanding was that
> swiotlb_alloc() is mostly just a fallback for the sake of drivers which
> mostly do streaming DMA but may allocate a handful of pages worth of
> coherent buffers here and there. Certainly looking at the mtk_scp
> driver, that seems like it shouldn't be going anywhere near SWIOTLB at all.
>
mtk_scp on its own can use the shared-dma-pool, which it currently uses.
The reason we switched to restricted-dma-pool is that we want to use
the noncontiguous DMA API for mtk-isp. The noncontiguous DMA API is
designed for devices with an iommu; if a device doesn't have an
iommu, it will fall back to swiotlb. But currently the noncontiguous
DMA API doesn't work with the shared-dma-pool.

vb2_dc_alloc() -> dma_alloc_noncontiguous() -> alloc_single_sgt() ->
__dma_alloc_pages() -> dma_direct_alloc_pages() ->
__dma_direct_alloc_pages() -> swiotlb_alloc().
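
For reference, a minimal sketch of that consumer path (illustrative only; the
function name and error handling are made up, not taken from the mtk-isp
series), following the call chain above to swiotlb_alloc() on a device
without an iommu:

#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>

/*
 * Illustrative only: allocate and vmap a noncontiguous DMA buffer the way a
 * vb2 backend would. On a device without an iommu but with a
 * restricted-dma-pool, the allocation goes through dma-direct and ends up in
 * swiotlb_alloc(), which today is capped by IO_TLB_SEGSIZE slabs.
 */
static void *example_alloc_noncontig(struct device *dev, size_t size,
				     struct sg_table **sgt_out)
{
	struct sg_table *sgt;
	void *vaddr;

	sgt = dma_alloc_noncontiguous(dev, size, DMA_BIDIRECTIONAL,
				      GFP_KERNEL, 0);
	if (!sgt)
		return NULL;

	vaddr = dma_vmap_noncontiguous(dev, size, sgt);
	if (!vaddr) {
		dma_free_noncontiguous(dev, size, sgt, DMA_BIDIRECTIONAL);
		return NULL;
	}

	*sgt_out = sgt;
	return vaddr;
}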


> Robin.
>
> > [1] (not in upstream) 
> > https://patchwork.kernel.org/project/linux-media/cover/20190611035344.29814-1-jungo@mediatek.com/
> > [2] 
> > https://elixir.bootlin.com/linux/latest/source/drivers/remoteproc/mtk_scp.c
> > [3] 
> > https://patchwork.kernel.org/project/linux-media/cover/20210909112430.61243-1-senozhat...@chromium.org/
> >
> > Hsin-Yi Wang (3):
> >dma: swiotlb: Allow restricted-dma-pool to customize IO_TLB_SEGSIZE
> >dt-bindings: Add io-tlb-segsize property for restricted-dma-pool
> >arm64: dts: mt8183: use restricted swiotlb for scp mem
> >
> >   .../reserved-memory/shared-dma-pool.yaml  |  8 +
> >   .../arm64/boot/dts/mediatek/mt8183-kukui.dtsi |  4 +--
> >   include/linux/swiotlb.h   |  1 +
> >   kernel/dma/swiotlb.c  | 34 ++-
> >   4 files changed, 37 insertions(+), 10 deletions(-)
> >


Re: [PATCH 9/9] iommu: Move flush queue data into iommu_dma_cookie

2021-11-23 Thread kernel test robot
Hi Robin,

I love your patch! Perhaps something to improve:

[auto build test WARNING on joro-iommu/next]
[also build test WARNING on v5.16-rc2 next-20211123]
[cannot apply to tegra-drm/drm/tegra/for-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Robin-Murphy/iommu-Refactor-flush-queues-into-iommu-dma/20211123-221220
base:   https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git next
config: arm-defconfig (https://download.01.org/0day-ci/archive/20211124/202111240645.30neuyaq-...@intel.com/config.gz)
compiler: arm-linux-gnueabi-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/0day-ci/linux/commit/d4623bb02366503fa7c3805228fa9534c9490d20
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Robin-Murphy/iommu-Refactor-flush-queues-into-iommu-dma/20211123-221220
git checkout d4623bb02366503fa7c3805228fa9534c9490d20
# save the config file to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross ARCH=arm

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All warnings (new ones prefixed by >>):

   drivers/gpu/drm/tegra/vic.c: In function 'vic_exit':
   drivers/gpu/drm/tegra/vic.c:196:17: error: implicit declaration of function 'dma_unmap_single'; did you mean 'mount_single'? [-Werror=implicit-function-declaration]
     196 | dma_unmap_single(vic->dev, vic->falcon.firmware.phys,
         | ^~~~
         | mount_single
   drivers/gpu/drm/tegra/vic.c:197:61: error: 'DMA_TO_DEVICE' undeclared (first use in this function); did you mean 'MT_DEVICE'?
     197 |  vic->falcon.firmware.size, DMA_TO_DEVICE);
         | ^
         | MT_DEVICE
   drivers/gpu/drm/tegra/vic.c:197:61: note: each undeclared identifier is reported only once for each function it appears in
   drivers/gpu/drm/tegra/vic.c:202:17: error: implicit declaration of function 'dma_free_coherent' [-Werror=implicit-function-declaration]
     202 | dma_free_coherent(vic->dev, vic->falcon.firmware.size,
         | ^
   drivers/gpu/drm/tegra/vic.c: In function 'vic_load_firmware':
   drivers/gpu/drm/tegra/vic.c:234:24: error: implicit declaration of function 'dma_alloc_coherent' [-Werror=implicit-function-declaration]
     234 | virt = dma_alloc_coherent(vic->dev, size, , GFP_KERNEL);
         | ^~
>> drivers/gpu/drm/tegra/vic.c:234:22: warning: assignment to 'void *' from 'int' makes pointer from integer without a cast [-Wint-conversion]
     234 | virt = dma_alloc_coherent(vic->dev, size, , GFP_KERNEL);
         | ^
   drivers/gpu/drm/tegra/vic.c:236:23: error: implicit declaration of function 'dma_mapping_error' [-Werror=implicit-function-declaration]
     236 | err = dma_mapping_error(vic->dev, iova);
         | ^
   drivers/gpu/drm/tegra/vic.c:258:24: error: implicit declaration of function 'dma_map_single' [-Werror=implicit-function-declaration]
     258 | phys = dma_map_single(vic->dev, virt, size, DMA_TO_DEVICE);
         | ^~
   drivers/gpu/drm/tegra/vic.c:258:61: error: 'DMA_TO_DEVICE' undeclared (first use in this function); did you mean 'MT_DEVICE'?
     258 | phys = dma_map_single(vic->dev, virt, size, DMA_TO_DEVICE);
         | ^
         | MT_DEVICE
   drivers/gpu/drm/tegra/vic.c: In function 'vic_probe':
   drivers/gpu/drm/tegra/vic.c:412:15: error: implicit declaration of function 'dma_coerce_mask_and_coherent' [-Werror=implicit-function-declaration]
     412 | err = dma_coerce_mask_and_coherent(dev, *dev->parent->dma_mask);
         | ^~~~
   cc1: some warnings being treated as errors


vim +234 drivers/gpu/drm/tegra/vic.c

0ae797a8ba05a2 Arto Merilainen 2016-12-14  214  
77a0b09dd993c8 Thierry Reding  2019-02-01  215  static int vic_load_firmware(struct vic *vic)
77a0b09dd993c8 Thierry Reding  2019-02-01  216  {
20e7dce255e96a Thierry Reding  2019-10-28  217  struct host1x_client 

RE: [PATCH V2 5/6] net: netvsc: Add Isolation VM support for netvsc driver

2021-11-23 Thread Michael Kelley (LINUX) via iommu
From: Tianyu Lan  Sent: Tuesday, November 23, 2021 6:31 AM
> 
> In Isolation VM, all shared memory with the host needs to be marked visible
> to the host via hvcall. vmbus_establish_gpadl() has already done this for
> the netvsc rx/tx ring buffer. The page buffer used by vmbus_sendpacket_
> pagebuffer() still needs to be handled. Use the DMA API to map/unmap
> this memory when sending/receiving packets, and the Hyper-V swiotlb
> bounce buffer dma address will be returned. The swiotlb bounce buffer
> has been marked visible to the host during boot up.
> 
> Allocate the rx/tx ring buffer via dma_alloc_noncontiguous() in Isolation
> VM. After calling vmbus_establish_gpadl(), which marks these pages visible
> to the host, map these pages into the unencrypted address space via
> dma_vmap_noncontiguous().
> 
> Signed-off-by: Tianyu Lan 
> ---
>  drivers/net/hyperv/hyperv_net.h   |   5 +
>  drivers/net/hyperv/netvsc.c   | 192 +++---
>  drivers/net/hyperv/rndis_filter.c |   2 +
>  include/linux/hyperv.h|   6 +
>  4 files changed, 190 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
> index 315278a7cf88..31c77a00d01e 100644
> --- a/drivers/net/hyperv/hyperv_net.h
> +++ b/drivers/net/hyperv/hyperv_net.h
> @@ -164,6 +164,7 @@ struct hv_netvsc_packet {
>   u32 total_bytes;
>   u32 send_buf_index;
>   u32 total_data_buflen;
> + struct hv_dma_range *dma_range;
>  };
> 
>  #define NETVSC_HASH_KEYLEN 40
> @@ -1074,6 +1075,7 @@ struct netvsc_device {
> 
>   /* Receive buffer allocated by us but manages by NetVSP */
>   void *recv_buf;
> + struct sg_table *recv_sgt;
>   u32 recv_buf_size; /* allocated bytes */
>   struct vmbus_gpadl recv_buf_gpadl_handle;
>   u32 recv_section_cnt;
> @@ -1082,6 +1084,7 @@ struct netvsc_device {
> 
>   /* Send buffer allocated by us */
>   void *send_buf;
> + struct sg_table *send_sgt;
>   u32 send_buf_size;
>   struct vmbus_gpadl send_buf_gpadl_handle;
>   u32 send_section_cnt;
> @@ -1731,4 +1734,6 @@ struct rndis_message {
>  #define RETRY_US_HI  1
>  #define RETRY_MAX2000/* >10 sec */
> 
> +void netvsc_dma_unmap(struct hv_device *hv_dev,
> +   struct hv_netvsc_packet *packet);
>  #endif /* _HYPERV_NET_H */
> diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
> index 396bc1c204e6..9cdc71930830 100644
> --- a/drivers/net/hyperv/netvsc.c
> +++ b/drivers/net/hyperv/netvsc.c
> @@ -20,6 +20,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
> 
>  #include 
>  #include 
> @@ -146,15 +147,39 @@ static struct netvsc_device *alloc_net_device(void)
>   return net_device;
>  }
> 
> +static struct hv_device *netvsc_channel_to_device(struct vmbus_channel 
> *channel)
> +{
> + struct vmbus_channel *primary = channel->primary_channel;
> +
> + return primary ? primary->device_obj : channel->device_obj;
> +}
> +
>  static void free_netvsc_device(struct rcu_head *head)
>  {
>   struct netvsc_device *nvdev
>   = container_of(head, struct netvsc_device, rcu);
> + struct hv_device *dev =
> + netvsc_channel_to_device(nvdev->chan_table[0].channel);
>   int i;
> 
>   kfree(nvdev->extension);
> - vfree(nvdev->recv_buf);
> - vfree(nvdev->send_buf);
> +
> + if (nvdev->recv_sgt) {
> + dma_vunmap_noncontiguous(>device, nvdev->recv_buf);
> + dma_free_noncontiguous(>device, nvdev->recv_buf_size,
> +nvdev->recv_sgt, DMA_FROM_DEVICE);
> + } else {
> + vfree(nvdev->recv_buf);
> + }
> +
> + if (nvdev->send_sgt) {
> + dma_vunmap_noncontiguous(>device, nvdev->send_buf);
> + dma_free_noncontiguous(>device, nvdev->send_buf_size,
> +nvdev->send_sgt, DMA_TO_DEVICE);
> + } else {
> + vfree(nvdev->send_buf);
> + }
> +
>   kfree(nvdev->send_section_map);
> 
>   for (i = 0; i < VRSS_CHANNEL_MAX; i++) {
> @@ -348,7 +373,21 @@ static int netvsc_init_buf(struct hv_device *device,
>   buf_size = min_t(unsigned int, buf_size,
>NETVSC_RECEIVE_BUFFER_SIZE_LEGACY);
> 
> - net_device->recv_buf = vzalloc(buf_size);
> + if (hv_isolation_type_snp()) {
> + net_device->recv_sgt =
> + dma_alloc_noncontiguous(>device, buf_size,
> + DMA_FROM_DEVICE, GFP_KERNEL, 0);
> + if (!net_device->recv_sgt) {
> + pr_err("Fail to allocate recv buffer buf_size %d.\n.", 
> buf_size);
> + ret = -ENOMEM;
> + goto cleanup;
> + }
> +
> + net_device->recv_buf = (void 
> *)net_device->recv_sgt->sgl->dma_address;

Use sg_dma_address() macro.
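
i.e. something roughly like this (illustrative rewrite of the quoted line,
not tested):

	net_device->recv_buf = (void *)sg_dma_address(net_device->recv_sgt->sgl);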

> + } else {
> + net_device->recv_buf = vzalloc(buf_size);
> + }
> +
>   if 

RE: [PATCH V2 4/6] hyperv/IOMMU: Enable swiotlb bounce buffer for Isolation VM

2021-11-23 Thread Michael Kelley (LINUX) via iommu
From: Tianyu Lan  Sent: Tuesday, November 23, 2021 6:31 AM
> 
> hyperv Isolation VM requires bounce buffer support to copy
> data from/to encrypted memory and so enable swiotlb force
> mode to use swiotlb bounce buffer for DMA transaction.
> 
> In Isolation VM with AMD SEV, the bounce buffer needs to be
> accessed via extra address space which is above shared_gpa_boundary
> (E.G 39 bit address line) reported by Hyper-V CPUID ISOLATION_CONFIG.
> The access physical address will be original physical address +
> shared_gpa_boundary. The shared_gpa_boundary in the AMD SEV SNP
> spec is called virtual top of memory(vTOM). Memory addresses below
> vTOM are automatically treated as private while memory above
> vTOM is treated as shared.
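
(A concrete example of that address arithmetic, purely illustrative and not
from the patch: with a 39-bit boundary, shared_gpa_boundary = 1 << 39 =
0x80_0000_0000, so a bounce page at physical address 0x1234_5000 is accessed
at 0x80_0000_0000 + 0x1234_5000 = 0x80_1234_5000.)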
> 
> Hyper-V initializes the swiotlb bounce buffer and the default swiotlb
> needs to be disabled. pci_swiotlb_detect_override() and
> pci_swiotlb_detect_4gb() enable the default one. To override
> the setting, hyperv_swiotlb_detect() needs to run before these
> detect functions, which depend on pci_xen_swiotlb_init(). Make
> pci_xen_swiotlb_init() depend on hyperv_swiotlb_detect() to keep
> the order.
> 
> Swiotlb bounce buffer code calls set_memory_decrypted()
> to mark bounce buffer visible to host and map it in extra
> address space via memremap. Populate the shared_gpa_boundary
> (vTOM) via swiotlb_unencrypted_base variable.
> 
> The map function memremap() can't work as early as
> hyperv_iommu_swiotlb_init(), so call swiotlb_update_mem_attributes()
> in hyperv_iommu_swiotlb_later_init() instead.
> 
> Add Hyper-V dma ops and provide alloc/free and vmap/vunmap noncontiguous
> callbacks to handle requests for allocating and mapping noncontiguous dma
> memory in the vmbus device driver. The netvsc driver will use this. Set
> the dma_ops_bypass flag for hv devices to use dma direct functions when
> mapping/unmapping dma pages.
> 
> Signed-off-by: Tianyu Lan 
> ---
> Change since v1:
>   * Remove hv isolation check in the sev_setup_arch()
> 
>  arch/x86/mm/mem_encrypt.c  |   1 +
>  arch/x86/xen/pci-swiotlb-xen.c |   3 +-
>  drivers/hv/Kconfig |   1 +
>  drivers/hv/vmbus_drv.c |   6 ++
>  drivers/iommu/hyperv-iommu.c   | 164 +
>  include/linux/hyperv.h |  10 ++
>  6 files changed, 184 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> index 35487305d8af..e48c73b3dd41 100644
> --- a/arch/x86/mm/mem_encrypt.c
> +++ b/arch/x86/mm/mem_encrypt.c
> @@ -31,6 +31,7 @@
>  #include 
>  #include 
>  #include 
> +#include 

There is no longer any need to add this #include since code changes to this
file in a previous version of the patch are now gone.

> 
>  #include "mm_internal.h"
> 
> diff --git a/arch/x86/xen/pci-swiotlb-xen.c b/arch/x86/xen/pci-swiotlb-xen.c
> index 46df59aeaa06..30fd0600b008 100644
> --- a/arch/x86/xen/pci-swiotlb-xen.c
> +++ b/arch/x86/xen/pci-swiotlb-xen.c
> @@ -4,6 +4,7 @@
> 
>  #include 
>  #include 
> +#include 
>  #include 
> 
>  #include 
> @@ -91,6 +92,6 @@ int pci_xen_swiotlb_init_late(void)
>  EXPORT_SYMBOL_GPL(pci_xen_swiotlb_init_late);
> 
>  IOMMU_INIT_FINISH(pci_xen_swiotlb_detect,
> -   NULL,
> +   hyperv_swiotlb_detect,
> pci_xen_swiotlb_init,
> NULL);
> diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
> index dd12af20e467..d43b4cd88f57 100644
> --- a/drivers/hv/Kconfig
> +++ b/drivers/hv/Kconfig
> @@ -9,6 +9,7 @@ config HYPERV
>   select PARAVIRT
>   select X86_HV_CALLBACK_VECTOR if X86
>   select VMAP_PFN
> + select DMA_OPS_BYPASS
>   help
> Select this option to run Linux as a Hyper-V client operating
> system.
> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
> index 392c1ac4f819..32dc193e31cd 100644
> --- a/drivers/hv/vmbus_drv.c
> +++ b/drivers/hv/vmbus_drv.c
> @@ -33,6 +33,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include "hyperv_vmbus.h"
> 
> @@ -2078,6 +2079,7 @@ struct hv_device *vmbus_device_create(const guid_t 
> *type,
>   return child_device_obj;
>  }
> 
> +static u64 vmbus_dma_mask = DMA_BIT_MASK(64);
>  /*
>   * vmbus_device_register - Register the child device
>   */
> @@ -2118,6 +2120,10 @@ int vmbus_device_register(struct hv_device 
> *child_device_obj)
>   }
>   hv_debug_add_dev_dir(child_device_obj);
> 
> + child_device_obj->device.dma_ops_bypass = true;
> + child_device_obj->device.dma_ops = _iommu_dma_ops;
> + child_device_obj->device.dma_mask = _dma_mask;
> + child_device_obj->device.dma_parms = _device_obj->dma_parms;
>   return 0;
> 
>  err_kset_unregister:
> diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
> index e285a220c913..ebcb628e7e8f 100644
> --- a/drivers/iommu/hyperv-iommu.c
> +++ b/drivers/iommu/hyperv-iommu.c
> @@ -13,14 +13,21 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
> 
>  #include 

Re: [RFC PATCH 0/3] Use pageblock_order for cma and alloc_contig_range alignment.

2021-11-23 Thread Vlastimil Babka
On 11/23/21 17:35, Zi Yan wrote:
> On 19 Nov 2021, at 10:15, Zi Yan wrote:
 From my understanding, cma required alignment of
 max(MAX_ORDER - 1, pageblock_order), because when MIGRATE_CMA was 
 introduced,
 __free_one_page() does not prevent merging two different pageblocks, when
 MAX_ORDER - 1 > pageblock_order. But current __free_one_page() 
 implementation
 does prevent that.
>>>
>>> But it does prevent that only for isolated pageblock, not CMA, and yout
>>> patchset doesn't seem to expand that to CMA? Or am I missing something.
>>
>> Yeah, you are right. Originally, I thought preventing merging isolated 
>> pageblock
>> with other types of pageblocks is sufficient, since MIGRATE_CMA is always
>> converted from MIGRATE_ISOLATE. But that is not true. I will rework the code.
>> Thanks for pointing this out.
>>
> 
> I find that two pageblocks with different migratetypes, like 
> MIGRATE_RECLAIMABLE
> and MIGRATE_MOVABLE can be merged into a single free page after I checked
> __free_one_page() in detail and printed pageblock information during buddy 
> page
> merging.

Yes, that can happen.

> I am not sure what consequence it will cause. Do you have any idea?

For MIGRATE_RECLAIMABLE or MIGRATE_MOVABLE or even MIGRATE_UNMOVABLE it's
absolutely fine. As long as these pageblocks are fully free (and they are if
it's a single free page spanning 2 pageblocks), they can be of any of these
type, as they can be reused as needed without causing fragmentation.

But in case of MIGRATE_CMA and MIGRATE_ISOLATE, uncontrolled merging would
break the specifics of those types. That's why the code is careful for
MIGRATE_ISOLATE, and MIGRATE_CMA was until now done in MAX_ORDER granularity.
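
(For scale, a worked example with typical x86-64 defaults, illustrative only:
pageblock_order = 9 and MAX_ORDER - 1 = 10 with 4 KiB pages, i.e. 2 MiB
pageblocks inside a 4 MiB maximum buddy, so a fully merged MAX_ORDER - 1 free
page always spans exactly two pageblocks.)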


RE: [PATCH V2 1/6] Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM

2021-11-23 Thread Michael Kelley (LINUX) via iommu
From: Tianyu Lan  Sent: Tuesday, November 23, 2021 6:31 AM
> 
> In Isolation VM with AMD SEV, bounce buffer needs to be accessed via
> extra address space which is above shared_gpa_boundary (E.G 39 bit
> address line) reported by Hyper-V CPUID ISOLATION_CONFIG. The access
> physical address will be original physical address + shared_gpa_boundary.
> The shared_gpa_boundary in the AMD SEV SNP spec is called virtual top of
> memory(vTOM). Memory addresses below vTOM are automatically treated as
> private while memory above vTOM is treated as shared.
> 
> Expose swiotlb_unencrypted_base for platforms to set unencrypted
> memory base offset and platform calls swiotlb_update_mem_attributes()
> to remap swiotlb mem to unencrypted address space. memremap() can
> not be called in the early stage and so put remapping code into
> swiotlb_update_mem_attributes(). Store remap address and use it to copy
> data from/to swiotlb bounce buffer.
> 
> Signed-off-by: Tianyu Lan 
> ---
> Change since v1:
>   * Rework comment in the swiotlb_init_io_tlb_mem()
>   * Make swiotlb_init_io_tlb_mem() back to return void.
> ---
>  include/linux/swiotlb.h |  6 +
>  kernel/dma/swiotlb.c| 53 +
>  2 files changed, 54 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index 569272871375..f6c3638255d5 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -73,6 +73,9 @@ extern enum swiotlb_force swiotlb_force;
>   * @end: The end address of the swiotlb memory pool. Used to do a quick
>   *   range check to see if the memory was in fact allocated by this
>   *   API.
> + * @vaddr:   The vaddr of the swiotlb memory pool. The swiotlb memory pool
> + *   may be remapped in the memory encrypted case and store virtual
> + *   address for bounce buffer operation.
>   * @nslabs:  The number of IO TLB blocks (in groups of 64) between @start and
>   *   @end. For default swiotlb, this is command line adjustable via
>   *   setup_io_tlb_npages.
> @@ -92,6 +95,7 @@ extern enum swiotlb_force swiotlb_force;
>  struct io_tlb_mem {
>   phys_addr_t start;
>   phys_addr_t end;
> + void *vaddr;
>   unsigned long nslabs;
>   unsigned long used;
>   unsigned int index;
> @@ -186,4 +190,6 @@ static inline bool is_swiotlb_for_alloc(struct device 
> *dev)
>  }
>  #endif /* CONFIG_DMA_RESTRICTED_POOL */
> 
> +extern phys_addr_t swiotlb_unencrypted_base;
> +
>  #endif /* __LINUX_SWIOTLB_H */
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 8e840fbbed7c..c303fdeba82f 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -50,6 +50,7 @@
>  #include 
>  #include 
> 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -72,6 +73,8 @@ enum swiotlb_force swiotlb_force;
> 
>  struct io_tlb_mem io_tlb_default_mem;
> 
> +phys_addr_t swiotlb_unencrypted_base;
> +
>  /*
>   * Max segment that we can provide which (if pages are contingous) will
>   * not be bounced (unless SWIOTLB_FORCE is set).
> @@ -155,6 +158,31 @@ static inline unsigned long nr_slots(u64 val)
>   return DIV_ROUND_UP(val, IO_TLB_SIZE);
>  }
> 
> +/*
> + * Remap swioltb memory in the unencrypted physical address space
> + * when swiotlb_unencrypted_base is set. (e.g. for Hyper-V AMD SEV-SNP
> + * Isolation VMs).
> + */
> +void *swiotlb_mem_remap(struct io_tlb_mem *mem, unsigned long bytes)
> +{
> + void *vaddr;
> +
> + if (swiotlb_unencrypted_base) {
> + phys_addr_t paddr = mem->start + swiotlb_unencrypted_base;
> +
> + vaddr = memremap(paddr, bytes, MEMREMAP_WB);
> + if (!vaddr) {
> + pr_err("Failed to map the unencrypted memory %llx size 
> %lx.\n",
> +paddr, bytes);
> + return NULL;
> + }
> +
> + return vaddr;
> + }
> +
> + return phys_to_virt(mem->start);
> +}
> +
>  /*
>   * Early SWIOTLB allocation may be too early to allow an architecture to
>   * perform the desired operations.  This function allows the architecture to
> @@ -172,7 +200,14 @@ void __init swiotlb_update_mem_attributes(void)
>   vaddr = phys_to_virt(mem->start);
>   bytes = PAGE_ALIGN(mem->nslabs << IO_TLB_SHIFT);
>   set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT);
> - memset(vaddr, 0, bytes);
> +
> + mem->vaddr = swiotlb_mem_remap(mem, bytes);
> + if (!mem->vaddr) {
> + pr_err("Fail to remap swiotlb mem.\n");
> + return;
> + }
> +
> + memset(mem->vaddr, 0, bytes);
>  }

In the error case, do you want to leave mem->vaddr as NULL?  Or is it
better to leave it as the virtual address of mem->start?  Your code leaves it
as NULL.
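
For illustration only (one possible reading of that question, not something
the patch does): keeping the direct-mapped address on failure would look
roughly like

	mem->vaddr = swiotlb_mem_remap(mem, bytes);
	if (!mem->vaddr)
		mem->vaddr = phys_to_virt(mem->start);	/* fall back instead of NULL */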

The interaction between swiotlb_update_mem_attributes() and the helper
function swiotlb_mem_remap() seems kind of clunky.  phys_to_virt() gets called
twice, for 

Re: [RFC PATCH 0/3] Use pageblock_order for cma and alloc_contig_range alignment.

2021-11-23 Thread David Hildenbrand
On 17.11.21 04:04, Zi Yan wrote:
> On 16 Nov 2021, at 3:58, David Hildenbrand wrote:
> 
>> On 15.11.21 20:37, Zi Yan wrote:
>>> From: Zi Yan 
>>>
>>> Hi David,
>>
>> Hi,
>>
>> thanks for looking into this.
>>

Hi,

sorry for the delay, I wasn't "actually working" last week, so now I'm
back from holiday :)

>>>
>>> You suggested to make alloc_contig_range() deal with pageblock_order 
>>> instead of
>>> MAX_ORDER - 1 and get rid of MAX_ORDER - 1 dependency in virtio_mem[1]. This
>>> patchset is my attempt to achieve that. Please take a look and let me know 
>>> if
>>> I am doing it correctly or not.
>>>
>>> From my understanding, cma required alignment of
>>> max(MAX_ORDER - 1, pageblock_order), because when MIGRATE_CMA was 
>>> introduced,
>>> __free_one_page() does not prevent merging two different pageblocks, when
>>> MAX_ORDER - 1 > pageblock_order. But current __free_one_page() 
>>> implementation
>>> does prevent that. It should be OK to just align cma to pageblock_order.
>>> alloc_contig_range() relies on MIGRATE_CMA to get free pages, so it can use
>>> pageblock_order as alignment too.
>>
>> I wonder if that's sufficient. Especially the outer_start logic in
>> alloc_contig_range() might be problematic. There are some ugly corner
>> cases with free pages/allocations spanning multiple pageblocks and we
>> only isolated a single pageblock.
> 
> Thank you a lot for writing the list of these corner cases. They are
> very helpful!
> 
>>
>>
>> Regarding CMA, we have to keep the following cases working:
>>
>> a) Different pageblock types (MIGRATE_CMA and !MIGRATE_CMA) in MAX_ORDER
>>- 1 page:
>>[   MAX_ ORDER - 1 ]
>>[ pageblock 0 | pageblock 1]
>>
>> Assume either pageblock 0 is MIGRATE_CMA or pageblock 1 is MIGRATE_CMA,
>> but not both. We have to make sure alloc_contig_range() keeps working
>> correctly. This should be the case even with your change, as we won't
>> merging pages accross differing migratetypes.
> 
> Yes.
> 
>>
>> b) Migrating/freeing a MAX_ ORDER - 1 page while partially isolated:
>>[   MAX_ ORDER - 1 ]
>>[ pageblock 0 | pageblock 1]
>>
>> Assume both are MIGRATE_CMA. Assume we want to either allocate from
>> pageblock 0 or pageblock 1. Especially, assume we want to allocate from
>> pageblock 1. While we would isolate pageblock 1, we wouldn't isolate
>> pageblock 0.
>>
>> What happens if we either have a free page spanning the MAX_ORDER - 1
>> range already OR if we have to migrate a MAX_ORDER - 1 page, resulting
>> in a free MAX_ORDER - 1 page of which only the second pageblock is
>> isolated? We would end up essentially freeing a page that has mixed
>> pageblocks, essentially placing it in !MIGRATE_ISOLATE free lists ... I
>> might be wrong but I have the feeling that this would be problematic.
>>
> 
> This could happen when start_isolate_page_range() stumbles upon a compound
> page with order >= pageblock_order or a free page with order >=
> pageblock_order, but should not. start_isolate_page_range() should check
> the actual page size, either compound page size or free page size, and set
> the migratetype across pageblocks if the page is bigger than pageblock size.
> More precisely set_migratetype_isolate() should do that.

Right -- but then we have to extend the isolation range from within
set_migratetype_isolate() :/ E.g., how should we know what we have to
unisolate later ..

> 
> 
>> c) Concurrent allocations:
>> [   MAX_ ORDER - 1 ]
>> [ pageblock 0 | pageblock 1]
>>
>> Assume b) but we have two concurrent CMA allocations to pageblock 0 and
>> pageblock 1, which would now be possible as start_isolate_page_range()
>> isolate would succeed on both.
> 
> Two isolations will be serialized by the zone lock taken by
> set_migratetype_isolate(), so the concurrent allocation would not be a 
> problem.
> If it is a MAX_ORDER-1 free page, the first comer should split it and only
> isolate one of the pageblock then second one can isolate the other pageblock.

Right, the issue is essentially b) above.

> If it is a MAX_ORDER-1 compound page, the first comer should isolate both
> pageblocks, then the second one would fail. WDYT?

Possibly we could even have two independent CMA areas "colliding" within
a MAX_ ORDER - 1 page. I guess the surprise would be getting an
"-EAGAIN" from isolation, but the caller should properly handle that.

Maybe it's really easier to do something along the lines I proposed
below and always isolate the complete MAX_ORDER-1 range in
alloc_contig_range(). We just have to "fix" isolation as I drafted.

> 
> 
> In sum, it seems to me that the issue is page isolation code only sees
> pageblock without check the actual page. When there are multiple pageblocks
> belonging to one page, the problem appears. This should be fixed.
> 
>>
>>
>> Regarding virtio-mem, we care about the following cases:
>>
>> a) Allocating parts from completely movable MAX_ ORDER - 1 page:
>>[   MAX_ ORDER - 1 ]
>>[ pageblock 

Re: [RFC PATCH 0/3] Use pageblock_order for cma and alloc_contig_range alignment.

2021-11-23 Thread Zi Yan via iommu
On 19 Nov 2021, at 10:15, Zi Yan wrote:

> On 19 Nov 2021, at 7:33, Vlastimil Babka wrote:
>
>> On 11/15/21 20:37, Zi Yan wrote:
>>> From: Zi Yan 
>>>
>>> Hi David,
>>>
>>> You suggested to make alloc_contig_range() deal with pageblock_order 
>>> instead of
>>> MAX_ORDER - 1 and get rid of MAX_ORDER - 1 dependency in virtio_mem[1]. This
>>> patchset is my attempt to achieve that. Please take a look and let me know 
>>> if
>>> I am doing it correctly or not.
>>>
>>> From my understanding, cma required alignment of
>>> max(MAX_ORDER - 1, pageblock_order), because when MIGRATE_CMA was 
>>> introduced,
>>> __free_one_page() does not prevent merging two different pageblocks, when
>>> MAX_ORDER - 1 > pageblock_order. But current __free_one_page() 
>>> implementation
>>> does prevent that.
>>
>> But it does prevent that only for isolated pageblock, not CMA, and yout
>> patchset doesn't seem to expand that to CMA? Or am I missing something.
>
> Yeah, you are right. Originally, I thought preventing merging isolated 
> pageblock
> with other types of pageblocks is sufficient, since MIGRATE_CMA is always
> converted from MIGRATE_ISOLATE. But that is not true. I will rework the code.
> Thanks for pointing this out.
>

I find that two pageblocks with different migratetypes, like MIGRATE_RECLAIMABLE
and MIGRATE_MOVABLE can be merged into a single free page after I checked
__free_one_page() in detail and printed pageblock information during buddy page
merging. I am not sure what consequence it will cause. Do you have any idea?

I will fix it in the next version of this patchset.

>>
>>
>>> It should be OK to just align cma to pageblock_order.
>>> alloc_contig_range() relies on MIGRATE_CMA to get free pages, so it can use
>>> pageblock_order as alignment too.
>>>
>>> In terms of virtio_mem, if I understand correctly, it relies on
>>> alloc_contig_range() to obtain contiguous free pages and offlines them to 
>>> reduce
>>> guest memory size. As the result of alloc_contig_range() alignment change,
>>> virtio_mem should be able to just align PFNs to pageblock_order.
>>>
>>> Thanks.
>>>
>>>
>>> [1] 
>>> https://lore.kernel.org/linux-mm/28b57903-fae6-47ac-7e1b-a1dd41421...@redhat.com/
>>>
>>> Zi Yan (3):
>>>   mm: cma: alloc_contig_range: use pageblock_order as the single
>>> alignment.
>>>   drivers: virtio_mem: use pageblock size as the minimum virtio_mem
>>> size.
>>>   arch: powerpc: adjust fadump alignment to be pageblock aligned.
>>>
>>>  arch/powerpc/include/asm/fadump-internal.h |  4 +---
>>>  drivers/virtio/virtio_mem.c|  6 ++
>>>  include/linux/mmzone.h |  5 +
>>>  kernel/dma/contiguous.c|  2 +-
>>>  mm/cma.c   |  6 ++
>>>  mm/page_alloc.c| 12 +---
>>>  6 files changed, 12 insertions(+), 23 deletions(-)
>>>
>
> --
> Best Regards,
> Yan, Zi


--
Best Regards,
Yan, Zi



Re: [PATCH 2/3] dt-bindings: Add io-tlb-segsize property for restricted-dma-pool

2021-11-23 Thread Rob Herring
On Tue, 23 Nov 2021 19:21:03 +0800, Hsin-Yi Wang wrote:
> Add an io-tlb-segsize property so that each restricted-dma-pool can set its
> own io_tlb_segsize, since some use cases require more slabs than the default
> value (128).
> 
> Signed-off-by: Hsin-Yi Wang 
> ---
>  .../bindings/reserved-memory/shared-dma-pool.yaml | 8 
>  1 file changed, 8 insertions(+)
> 

My bot found errors running 'make DT_CHECKER_FLAGS=-m dt_binding_check'
on your patch (DT_CHECKER_FLAGS is new in v5.13):

yamllint warnings/errors:

dtschema/dtc warnings/errors:
/builds/robherring/linux-dt-review/Documentation/devicetree/bindings/reserved-memory/shared-dma-pool.yaml: properties:io-tlb-segsize:type: 'anyOf' conditional failed, one must be fixed:
'u32' is not one of ['array', 'boolean', 'integer', 'null', 'number', 'object', 'string']
'u32' is not of type 'array'
from schema $id: http://json-schema.org/draft-07/schema#
/builds/robherring/linux-dt-review/Documentation/devicetree/bindings/reserved-memory/shared-dma-pool.yaml: properties:io-tlb-segsize:type: 'u32' is not one of ['boolean', 'object']
from schema $id: http://devicetree.org/meta-schemas/core.yaml#
/builds/robherring/linux-dt-review/Documentation/devicetree/bindings/reserved-memory/shared-dma-pool.yaml: ignoring, error in schema: properties: io-tlb-segsize: type
warning: no schema found in file: ./Documentation/devicetree/bindings/reserved-memory/shared-dma-pool.yaml
Documentation/devicetree/bindings/display/msm/gpu.example.dt.yaml:0:0: /example-1/reserved-memory/gpu@8f20: failed to match any schema with compatible: ['shared-dma-pool']
Documentation/devicetree/bindings/reserved-memory/shared-dma-pool.example.dt.yaml:0:0: /example-0/reserved-memory/linux,cma: failed to match any schema with compatible: ['shared-dma-pool']
Documentation/devicetree/bindings/reserved-memory/shared-dma-pool.example.dt.yaml:0:0: /example-0/reserved-memory/restricted-dma-pool@5000: failed to match any schema with compatible: ['restricted-dma-pool']
Documentation/devicetree/bindings/dsp/fsl,dsp.example.dt.yaml:0:0: /example-1/vdev0buffer@9430: failed to match any schema with compatible: ['shared-dma-pool']
Documentation/devicetree/bindings/remoteproc/ti,omap-remoteproc.example.dt.yaml:0:0: /example-0/reserved-memory/dsp-memory@9800: failed to match any schema with compatible: ['shared-dma-pool']
Documentation/devicetree/bindings/remoteproc/ti,omap-remoteproc.example.dt.yaml:0:0: /example-1/reserved-memory/ipu-memory@9580: failed to match any schema with compatible: ['shared-dma-pool']
Documentation/devicetree/bindings/remoteproc/ti,omap-remoteproc.example.dt.yaml:0:0: /example-2/reserved-memory/dsp1-memory@9900: failed to match any schema with compatible: ['shared-dma-pool']
Documentation/devicetree/bindings/sound/google,cros-ec-codec.example.dt.yaml:0:0: /example-0/reserved-mem@5280: failed to match any schema with compatible: ['shared-dma-pool']

doc reference errors (make refcheckdocs):

See https://patchwork.ozlabs.org/patch/1558503

This check can fail if there are any dependencies. The base for a patch
series is generally the most recent rc1.

If you already ran 'make dt_binding_check' and didn't see the above
error(s), then make sure 'yamllint' is installed and dt-schema is up to
date:

pip3 install dtschema --upgrade

Please check and re-submit.



[PATCH 5/5] iommu/amd: remove useless irq affinity notifier

2021-11-23 Thread Maxim Levitsky
The iommu->intcapxt_notify field is no longer used
since the switch to a separate irq domain.

Fixes: d1adcfbb520c ("iommu/amd: Fix IOMMU interrupt generation in X2APIC mode")
Signed-off-by: Maxim Levitsky 
---
 drivers/iommu/amd/amd_iommu_types.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 867535eb0ce97..ffc89c4fb1205 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -645,8 +645,6 @@ struct amd_iommu {
/* DebugFS Info */
struct dentry *debugfs;
 #endif
-   /* IRQ notifier for IntCapXT interrupt */
-   struct irq_affinity_notify intcapxt_notify;
 };
 
 static inline struct amd_iommu *dev_to_amd_iommu(struct device *dev)
-- 
2.26.3



[PATCH 4/5] iommu/amd: x2apic mode: mask/unmask interrupts on suspend/resume

2021-11-23 Thread Maxim Levitsky
Use IRQCHIP_MASK_ON_SUSPEND to make the core irq code
mask the iommu interrupt on suspend and unmask it on resume.

Since the unmask function now updates the INTX settings,
this will restore them on resume from s3/s4.

Since IRQCHIP_MASK_ON_SUSPEND is only effective for interrupts
which are not wakeup sources, remove the IRQCHIP_SKIP_SET_WAKE flag
and instead implement a dummy .irq_set_wake which doesn't allow
the interrupt to become a wakeup source.

Fixes: 66929812955bb ("iommu/amd: Add support for X2APIC IOMMU interrupts")

Signed-off-by: Maxim Levitsky 
---
 drivers/iommu/amd/init.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 9e895bb8086a6..b94822fc2c9f7 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -2104,6 +2104,11 @@ static int intcapxt_set_affinity(struct irq_data *irqd,
return 0;
 }
 
+static int intcapxt_set_wake(struct irq_data *irqd, unsigned int on)
+{
+   return on ? -EOPNOTSUPP : 0;
+}
+
 static struct irq_chip intcapxt_controller = {
.name   = "IOMMU-MSI",
.irq_unmask = intcapxt_unmask_irq,
@@ -2111,7 +2116,8 @@ static struct irq_chip intcapxt_controller = {
.irq_ack= irq_chip_ack_parent,
.irq_retrigger  = irq_chip_retrigger_hierarchy,
.irq_set_affinity   = intcapxt_set_affinity,
-   .flags  = IRQCHIP_SKIP_SET_WAKE,
+   .irq_set_wake   = intcapxt_set_wake,
+   .flags  = IRQCHIP_MASK_ON_SUSPEND,
 };
 
 static const struct irq_domain_ops intcapxt_domain_ops = {
-- 
2.26.3



[PATCH 3/5] iommu/amd: x2apic mode: setup the INTX registers on mask/unmask

2021-11-23 Thread Maxim Levitsky
This is more logically correct and will also allow us
to use the mask/unmask logic to restore the INTX settings after
resume from s3/s4.

Fixes: 66929812955bb ("iommu/amd: Add support for X2APIC IOMMU interrupts")

Signed-off-by: Maxim Levitsky 
---
 drivers/iommu/amd/init.c | 65 
 1 file changed, 33 insertions(+), 32 deletions(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index b905604f434e1..9e895bb8086a6 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -2015,48 +2015,18 @@ union intcapxt {
};
 } __attribute__ ((packed));
 
-/*
- * There isn't really any need to mask/unmask at the irqchip level because
- * the 64-bit INTCAPXT registers can be updated atomically without tearing
- * when the affinity is being updated.
- */
-static void intcapxt_unmask_irq(struct irq_data *data)
-{
-}
-
-static void intcapxt_mask_irq(struct irq_data *data)
-{
-}
 
 static struct irq_chip intcapxt_controller;
 
 static int intcapxt_irqdomain_activate(struct irq_domain *domain,
   struct irq_data *irqd, bool reserve)
 {
-   struct amd_iommu *iommu = irqd->chip_data;
-   struct irq_cfg *cfg = irqd_cfg(irqd);
-   union intcapxt xt;
-
-   xt.capxt = 0ULL;
-   xt.dest_mode_logical = apic->dest_mode_logical;
-   xt.vector = cfg->vector;
-   xt.destid_0_23 = cfg->dest_apicid & GENMASK(23, 0);
-   xt.destid_24_31 = cfg->dest_apicid >> 24;
-
-   /**
-* Current IOMMU implemtation uses the same IRQ for all
-* 3 IOMMU interrupts.
-*/
-   writeq(xt.capxt, iommu->mmio_base + MMIO_INTCAPXT_EVT_OFFSET);
-   writeq(xt.capxt, iommu->mmio_base + MMIO_INTCAPXT_PPR_OFFSET);
-   writeq(xt.capxt, iommu->mmio_base + MMIO_INTCAPXT_GALOG_OFFSET);
return 0;
 }
 
 static void intcapxt_irqdomain_deactivate(struct irq_domain *domain,
  struct irq_data *irqd)
 {
-   intcapxt_mask_irq(irqd);
 }
 
 
@@ -2090,6 +2060,38 @@ static void intcapxt_irqdomain_free(struct irq_domain 
*domain, unsigned int virq
irq_domain_free_irqs_top(domain, virq, nr_irqs);
 }
 
+
+static void intcapxt_unmask_irq(struct irq_data *irqd)
+{
+   struct amd_iommu *iommu = irqd->chip_data;
+   struct irq_cfg *cfg = irqd_cfg(irqd);
+   union intcapxt xt;
+
+   xt.capxt = 0ULL;
+   xt.dest_mode_logical = apic->dest_mode_logical;
+   xt.vector = cfg->vector;
+   xt.destid_0_23 = cfg->dest_apicid & GENMASK(23, 0);
+   xt.destid_24_31 = cfg->dest_apicid >> 24;
+
+   /**
+* Current IOMMU implementation uses the same IRQ for all
+* 3 IOMMU interrupts.
+*/
+   writeq(xt.capxt, iommu->mmio_base + MMIO_INTCAPXT_EVT_OFFSET);
+   writeq(xt.capxt, iommu->mmio_base + MMIO_INTCAPXT_PPR_OFFSET);
+   writeq(xt.capxt, iommu->mmio_base + MMIO_INTCAPXT_GALOG_OFFSET);
+}
+
+static void intcapxt_mask_irq(struct irq_data *irqd)
+{
+   struct amd_iommu *iommu = irqd->chip_data;
+
+   writeq(0, iommu->mmio_base + MMIO_INTCAPXT_EVT_OFFSET);
+   writeq(0, iommu->mmio_base + MMIO_INTCAPXT_PPR_OFFSET);
+   writeq(0, iommu->mmio_base + MMIO_INTCAPXT_GALOG_OFFSET);
+}
+
+
 static int intcapxt_set_affinity(struct irq_data *irqd,
 const struct cpumask *mask, bool force)
 {
@@ -2099,8 +2101,7 @@ static int intcapxt_set_affinity(struct irq_data *irqd,
ret = parent->chip->irq_set_affinity(parent, mask, force);
if (ret < 0 || ret == IRQ_SET_MASK_OK_DONE)
return ret;
-
-   return intcapxt_irqdomain_activate(irqd->domain, irqd, false);
+   return 0;
 }
 
 static struct irq_chip intcapxt_controller = {
-- 
2.26.3



[PATCH 2/5] iommu/amd: x2apic mode: re-enable after resume

2021-11-23 Thread Maxim Levitsky
Otherwise it is guaranteed not to work after resume...

Fixes: 66929812955bb ("iommu/amd: Add support for X2APIC IOMMU interrupts")

Signed-off-by: Maxim Levitsky 
---
 drivers/iommu/amd/init.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 8dae85fcfc2eb..b905604f434e1 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -2172,7 +2172,6 @@ static int iommu_setup_intcapxt(struct amd_iommu *iommu)
return ret;
}
 
-   iommu_feature_enable(iommu, CONTROL_INTCAPXT_EN);
return 0;
 }
 
@@ -2195,6 +2194,10 @@ static int iommu_init_irq(struct amd_iommu *iommu)
 
iommu->int_enabled = true;
 enable_faults:
+
+   if (amd_iommu_xt_mode == IRQ_REMAP_X2APIC_MODE)
+   iommu_feature_enable(iommu, CONTROL_INTCAPXT_EN);
+
iommu_feature_enable(iommu, CONTROL_EVT_INT_EN);
 
if (iommu->ppr_log != NULL)
-- 
2.26.3



[PATCH 1/5] iommu/amd: restore GA log/tail pointer on host resume

2021-11-23 Thread Maxim Levitsky
This will give the IOMMU GA log a chance to work after resume
from s3/s4.

Fixes: 8bda0cfbdc1a6 ("iommu/amd: Detect and initialize guest vAPIC log")

Signed-off-by: Maxim Levitsky 
---
 drivers/iommu/amd/init.c | 31 +++
 1 file changed, 15 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 1eacd43cb4368..8dae85fcfc2eb 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -806,16 +806,27 @@ static int iommu_ga_log_enable(struct amd_iommu *iommu)
 {
 #ifdef CONFIG_IRQ_REMAP
u32 status, i;
+   u64 entry;
 
if (!iommu->ga_log)
return -EINVAL;
 
-   status = readl(iommu->mmio_base + MMIO_STATUS_OFFSET);
-
/* Check if already running */
-   if (status & (MMIO_STATUS_GALOG_RUN_MASK))
+   status = readl(iommu->mmio_base + MMIO_STATUS_OFFSET);
+   if (WARN_ON(status & (MMIO_STATUS_GALOG_RUN_MASK)))
return 0;
 
+   entry = iommu_virt_to_phys(iommu->ga_log) | GA_LOG_SIZE_512;
+   memcpy_toio(iommu->mmio_base + MMIO_GA_LOG_BASE_OFFSET,
+   , sizeof(entry));
+   entry = (iommu_virt_to_phys(iommu->ga_log_tail) &
+(BIT_ULL(52)-1)) & ~7ULL;
+   memcpy_toio(iommu->mmio_base + MMIO_GA_LOG_TAIL_OFFSET,
+   , sizeof(entry));
+   writel(0x00, iommu->mmio_base + MMIO_GA_HEAD_OFFSET);
+   writel(0x00, iommu->mmio_base + MMIO_GA_TAIL_OFFSET);
+
+
iommu_feature_enable(iommu, CONTROL_GAINT_EN);
iommu_feature_enable(iommu, CONTROL_GALOG_EN);
 
@@ -825,7 +836,7 @@ static int iommu_ga_log_enable(struct amd_iommu *iommu)
break;
}
 
-   if (i >= LOOP_TIMEOUT)
+   if (WARN_ON(i >= LOOP_TIMEOUT))
return -EINVAL;
 #endif /* CONFIG_IRQ_REMAP */
return 0;
@@ -834,8 +845,6 @@ static int iommu_ga_log_enable(struct amd_iommu *iommu)
 static int iommu_init_ga_log(struct amd_iommu *iommu)
 {
 #ifdef CONFIG_IRQ_REMAP
-   u64 entry;
-
if (!AMD_IOMMU_GUEST_IR_VAPIC(amd_iommu_guest_ir))
return 0;
 
@@ -849,16 +858,6 @@ static int iommu_init_ga_log(struct amd_iommu *iommu)
if (!iommu->ga_log_tail)
goto err_out;
 
-   entry = iommu_virt_to_phys(iommu->ga_log) | GA_LOG_SIZE_512;
-   memcpy_toio(iommu->mmio_base + MMIO_GA_LOG_BASE_OFFSET,
-   , sizeof(entry));
-   entry = (iommu_virt_to_phys(iommu->ga_log_tail) &
-(BIT_ULL(52)-1)) & ~7ULL;
-   memcpy_toio(iommu->mmio_base + MMIO_GA_LOG_TAIL_OFFSET,
-   , sizeof(entry));
-   writel(0x00, iommu->mmio_base + MMIO_GA_HEAD_OFFSET);
-   writel(0x00, iommu->mmio_base + MMIO_GA_TAIL_OFFSET);
-
return 0;
 err_out:
free_ga_log(iommu);
-- 
2.26.3



[PATCH 0/5] iommu/amd: fixes for suspend/resume

2021-11-23 Thread Maxim Levitsky
As I sadly found out, an s3 cycle makes AMD's IOMMU stop sending interrupts
until the system is rebooted.

I only noticed it now because the IOMMU otherwise works, and these interrupts
are only used for errors and for the GA log, which I tend not to use since I
make my VMs do mwait/pause/etc in the guest (cpu-pm=on).

There are two issues here that prevent interrupts from being generated after
an s3 cycle:

1. The GA log base address was not restored after resume, and was all zeroed
(by the BIOS or such).

In theory, if the BIOS writes some junk to it, that can even cause memory
corruption.
Patch 2 fixes that.

2. INTX (aka x2apic mode) settings were not restored after resume.
That mode is used regardless of whether the host uses/supports x2apic;
rather, it is used when the IOMMU supports it, and mine does.
Patches 3-4 fix that.

Note that there is still one slight (userspace) bug remaining:
During suspend all but the boot CPU are offlined and then after resume
are onlined again.

The offlining moves all non-affinity managed interrupts to CPU0, and
later when all other CPUs are onlined, there is nothing in the kernel
to spread back the interrupts over the cores.

The userspace 'irqbalance' daemon does fix this, but it seems to ignore
the IOMMU interrupts in INTX mode since they are not attached to any
PCI device, and thus they remain on CPU0 after an s3 cycle,
which is suboptimal when the system has multiple IOMMUs
(mine has 4 of them).

Setting the IRQ affinity manually via /proc/irq/ does work.

This was tested on my 3970X with both INTX and regular MSI mode (the latter
was enabled by patching out INTX detection), by running a guest with AVIC
enabled and with a PCI assigned device (network card), and observing
interrupts from the IOMMU while the guest is mostly idle.

This was also tested on my AMD laptop with 4650U (which has the same issue)
(I tested only INTX mode)

Patch 1 is a small refactoring to remove an unused struct field.

Best regards,
   Maxim Levitsky

Maxim Levitsky (5):
  iommu/amd: restore GA log/tail pointer on host resume
  iommu/amd: x2apic mode: re-enable after resume
  iommu/amd: x2apic mode: setup the INTX registers on mask/unmask
  iommu/amd: x2apic mode: mask/unmask interrupts on suspend/resume
  iommu/amd: remove useless irq affinity notifier

 drivers/iommu/amd/amd_iommu_types.h |   2 -
 drivers/iommu/amd/init.c| 107 +++-
 2 files changed, 58 insertions(+), 51 deletions(-)

-- 
2.26.3




[PATCH v2 4/5] iommu/virtio: Pass end address to viommu_add_mapping()

2021-11-23 Thread Jean-Philippe Brucker
To support identity mappings, the virtio-iommu driver must be able to
represent full 64-bit ranges internally. Pass (start, end) instead of
(start, size) to viommu_add/del_mapping().

Clean up the comments. The one about the returned size was never true: when
sweeping the whole address space, the returned size will most certainly
be smaller than 2^64.

Reviewed-by: Kevin Tian 
Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/virtio-iommu.c | 31 +++
 1 file changed, 15 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index d63ec4d11b00..eceb9281c8c1 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -311,8 +311,8 @@ static int viommu_send_req_sync(struct viommu_dev *viommu, 
void *buf,
  *
  * On success, return the new mapping. Otherwise return NULL.
  */
-static int viommu_add_mapping(struct viommu_domain *vdomain, unsigned long 
iova,
- phys_addr_t paddr, size_t size, u32 flags)
+static int viommu_add_mapping(struct viommu_domain *vdomain, u64 iova, u64 end,
+ phys_addr_t paddr, u32 flags)
 {
unsigned long irqflags;
struct viommu_mapping *mapping;
@@ -323,7 +323,7 @@ static int viommu_add_mapping(struct viommu_domain 
*vdomain, unsigned long iova,
 
mapping->paddr  = paddr;
mapping->iova.start = iova;
-   mapping->iova.last  = iova + size - 1;
+   mapping->iova.last  = end;
mapping->flags  = flags;
 
spin_lock_irqsave(>mappings_lock, irqflags);
@@ -338,26 +338,24 @@ static int viommu_add_mapping(struct viommu_domain 
*vdomain, unsigned long iova,
  *
  * @vdomain: the domain
  * @iova: start of the range
- * @size: size of the range. A size of 0 corresponds to the entire address
- * space.
+ * @end: end of the range
  *
- * On success, returns the number of unmapped bytes (>= size)
+ * On success, returns the number of unmapped bytes
  */
 static size_t viommu_del_mappings(struct viommu_domain *vdomain,
- unsigned long iova, size_t size)
+ u64 iova, u64 end)
 {
size_t unmapped = 0;
unsigned long flags;
-   unsigned long last = iova + size - 1;
struct viommu_mapping *mapping = NULL;
struct interval_tree_node *node, *next;
 
spin_lock_irqsave(>mappings_lock, flags);
-   next = interval_tree_iter_first(>mappings, iova, last);
+   next = interval_tree_iter_first(>mappings, iova, end);
while (next) {
node = next;
mapping = container_of(node, struct viommu_mapping, iova);
-   next = interval_tree_iter_next(node, iova, last);
+   next = interval_tree_iter_next(node, iova, end);
 
/* Trying to split a mapping? */
if (mapping->iova.start < iova)
@@ -656,8 +654,8 @@ static void viommu_domain_free(struct iommu_domain *domain)
 {
struct viommu_domain *vdomain = to_viommu_domain(domain);
 
-   /* Free all remaining mappings (size 2^64) */
-   viommu_del_mappings(vdomain, 0, 0);
+   /* Free all remaining mappings */
+   viommu_del_mappings(vdomain, 0, ULLONG_MAX);
 
if (vdomain->viommu)
ida_free(>viommu->domain_ids, vdomain->id);
@@ -742,6 +740,7 @@ static int viommu_map(struct iommu_domain *domain, unsigned 
long iova,
 {
int ret;
u32 flags;
+   u64 end = iova + size - 1;
struct virtio_iommu_req_map map;
struct viommu_domain *vdomain = to_viommu_domain(domain);
 
@@ -752,7 +751,7 @@ static int viommu_map(struct iommu_domain *domain, unsigned 
long iova,
if (flags & ~vdomain->map_flags)
return -EINVAL;
 
-   ret = viommu_add_mapping(vdomain, iova, paddr, size, flags);
+   ret = viommu_add_mapping(vdomain, iova, end, paddr, flags);
if (ret)
return ret;
 
@@ -761,7 +760,7 @@ static int viommu_map(struct iommu_domain *domain, unsigned 
long iova,
.domain = cpu_to_le32(vdomain->id),
.virt_start = cpu_to_le64(iova),
.phys_start = cpu_to_le64(paddr),
-   .virt_end   = cpu_to_le64(iova + size - 1),
+   .virt_end   = cpu_to_le64(end),
.flags  = cpu_to_le32(flags),
};
 
@@ -770,7 +769,7 @@ static int viommu_map(struct iommu_domain *domain, unsigned 
long iova,
 
ret = viommu_send_req_sync(vdomain->viommu, , sizeof(map));
if (ret)
-   viommu_del_mappings(vdomain, iova, size);
+   viommu_del_mappings(vdomain, iova, end);
 
return ret;
 }
@@ -783,7 +782,7 @@ static size_t viommu_unmap(struct iommu_domain *domain, 
unsigned long iova,
struct virtio_iommu_req_unmap unmap;
struct viommu_domain *vdomain = to_viommu_domain(domain);
 
-   

[PATCH v2 5/5] iommu/virtio: Support identity-mapped domains

2021-11-23 Thread Jean-Philippe Brucker
Support identity domains for devices that do not offer the
VIRTIO_IOMMU_F_BYPASS_CONFIG feature, by creating 1:1 mappings between
the virtual and physical address space. Identity domains created this
way still perform noticeably better than DMA domains, because they don't
have the overhead of setting up and tearing down mappings at runtime.
The performance difference between this and bypass is minimal in
comparison.

It does not matter that the physical addresses in the identity mappings
do not all correspond to memory. By enabling passthrough we are trusting
the device driver and the device itself to only perform DMA to suitable
locations. In some cases it may even be desirable to perform DMA to MMIO
regions.
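
A worked example of the resulting split, with illustrative values not taken
from the patch: given an aperture of [0; 2^39 - 1], a 4 KiB granule and a
single reserved region [0xfee0_0000; 0xfeef_ffff], viommu_domain_map_identity()
creates two identity mappings, [0; 0xfedf_ffff] and [0xfef0_0000; 2^39 - 1],
leaving the reserved hole unmapped.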

Reviewed-by: Kevin Tian 
Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/virtio-iommu.c | 63 +---
 1 file changed, 58 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index eceb9281c8c1..6a8a52b4297b 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -375,6 +375,55 @@ static size_t viommu_del_mappings(struct viommu_domain 
*vdomain,
return unmapped;
 }
 
+/*
+ * Fill the domain with identity mappings, skipping the device's reserved
+ * regions.
+ */
+static int viommu_domain_map_identity(struct viommu_endpoint *vdev,
+ struct viommu_domain *vdomain)
+{
+   int ret;
+   struct iommu_resv_region *resv;
+   u64 iova = vdomain->domain.geometry.aperture_start;
+   u64 limit = vdomain->domain.geometry.aperture_end;
+   u32 flags = VIRTIO_IOMMU_MAP_F_READ | VIRTIO_IOMMU_MAP_F_WRITE;
+   unsigned long granule = 1UL << __ffs(vdomain->domain.pgsize_bitmap);
+
+   iova = ALIGN(iova, granule);
+   limit = ALIGN_DOWN(limit + 1, granule) - 1;
+
+   list_for_each_entry(resv, >resv_regions, list) {
+   u64 resv_start = ALIGN_DOWN(resv->start, granule);
+   u64 resv_end = ALIGN(resv->start + resv->length, granule) - 1;
+
+   if (resv_end < iova || resv_start > limit)
+   /* No overlap */
+   continue;
+
+   if (resv_start > iova) {
+   ret = viommu_add_mapping(vdomain, iova, resv_start - 1,
+(phys_addr_t)iova, flags);
+   if (ret)
+   goto err_unmap;
+   }
+
+   if (resv_end >= limit)
+   return 0;
+
+   iova = resv_end + 1;
+   }
+
+   ret = viommu_add_mapping(vdomain, iova, limit, (phys_addr_t)iova,
+flags);
+   if (ret)
+   goto err_unmap;
+   return 0;
+
+err_unmap:
+   viommu_del_mappings(vdomain, 0, iova);
+   return ret;
+}
+
 /*
  * viommu_replay_mappings - re-send MAP requests
  *
@@ -637,14 +686,18 @@ static int viommu_domain_finalise(struct viommu_endpoint 
*vdev,
vdomain->viommu = viommu;
 
if (domain->type == IOMMU_DOMAIN_IDENTITY) {
-   if (!virtio_has_feature(viommu->vdev,
-   VIRTIO_IOMMU_F_BYPASS_CONFIG)) {
+   if (virtio_has_feature(viommu->vdev,
+  VIRTIO_IOMMU_F_BYPASS_CONFIG)) {
+   vdomain->bypass = true;
+   return 0;
+   }
+
+   ret = viommu_domain_map_identity(vdev, vdomain);
+   if (ret) {
ida_free(>domain_ids, vdomain->id);
-   vdomain->viommu = 0;
+   vdomain->viommu = NULL;
return -EOPNOTSUPP;
}
-
-   vdomain->bypass = true;
}
 
return 0;
-- 
2.33.1



[PATCH v2 3/5] iommu/virtio: Sort reserved regions

2021-11-23 Thread Jean-Philippe Brucker
To ease identity mapping support, keep the list of reserved regions
sorted.

Reviewed-by: Kevin Tian 
Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/virtio-iommu.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index ee8a7afd667b..d63ec4d11b00 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -423,7 +423,7 @@ static int viommu_add_resv_mem(struct viommu_endpoint *vdev,
size_t size;
u64 start64, end64;
phys_addr_t start, end;
-   struct iommu_resv_region *region = NULL;
+   struct iommu_resv_region *region = NULL, *next;
unsigned long prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;
 
start = start64 = le64_to_cpu(mem->start);
@@ -454,7 +454,12 @@ static int viommu_add_resv_mem(struct viommu_endpoint 
*vdev,
if (!region)
return -ENOMEM;
 
-   list_add(>list, >resv_regions);
+   /* Keep the list sorted */
+   list_for_each_entry(next, >resv_regions, list) {
+   if (next->start > region->start)
+   break;
+   }
+   list_add_tail(>list, >list);
return 0;
 }
 
-- 
2.33.1



[PATCH v2 2/5] iommu/virtio: Support bypass domains

2021-11-23 Thread Jean-Philippe Brucker
The VIRTIO_IOMMU_F_BYPASS_CONFIG feature adds a new flag to the ATTACH
request that creates a bypass domain. Use it to enable identity
domains.

When VIRTIO_IOMMU_F_BYPASS_CONFIG is not supported by the device, we
currently fail attaching to an identity domain. Future patches will
instead create identity mappings in this case.

Reviewed-by: Kevin Tian 
Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/virtio-iommu.c | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 80930ce04a16..ee8a7afd667b 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -71,6 +71,7 @@ struct viommu_domain {
struct rb_root_cached   mappings;
 
unsigned long   nr_endpoints;
+   boolbypass;
 };
 
 struct viommu_endpoint {
@@ -587,7 +588,9 @@ static struct iommu_domain *viommu_domain_alloc(unsigned 
type)
 {
struct viommu_domain *vdomain;
 
-   if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA)
+   if (type != IOMMU_DOMAIN_UNMANAGED &&
+   type != IOMMU_DOMAIN_DMA &&
+   type != IOMMU_DOMAIN_IDENTITY)
return NULL;
 
vdomain = kzalloc(sizeof(*vdomain), GFP_KERNEL);
@@ -630,6 +633,17 @@ static int viommu_domain_finalise(struct viommu_endpoint 
*vdev,
vdomain->map_flags  = viommu->map_flags;
vdomain->viommu = viommu;
 
+   if (domain->type == IOMMU_DOMAIN_IDENTITY) {
+   if (!virtio_has_feature(viommu->vdev,
+   VIRTIO_IOMMU_F_BYPASS_CONFIG)) {
+   ida_free(>domain_ids, vdomain->id);
+   vdomain->viommu = 0;
+   return -EOPNOTSUPP;
+   }
+
+   vdomain->bypass = true;
+   }
+
return 0;
 }
 
@@ -691,6 +705,9 @@ static int viommu_attach_dev(struct iommu_domain *domain, 
struct device *dev)
.domain = cpu_to_le32(vdomain->id),
};
 
+   if (vdomain->bypass)
+   req.flags |= cpu_to_le32(VIRTIO_IOMMU_ATTACH_F_BYPASS);
+
for (i = 0; i < fwspec->num_ids; i++) {
req.endpoint = cpu_to_le32(fwspec->ids[i]);
 
@@ -1132,6 +1149,7 @@ static unsigned int features[] = {
VIRTIO_IOMMU_F_DOMAIN_RANGE,
VIRTIO_IOMMU_F_PROBE,
VIRTIO_IOMMU_F_MMIO,
+   VIRTIO_IOMMU_F_BYPASS_CONFIG,
 };
 
 static struct virtio_device_id id_table[] = {
-- 
2.33.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 1/5] iommu/virtio: Add definitions for VIRTIO_IOMMU_F_BYPASS_CONFIG

2021-11-23 Thread Jean-Philippe Brucker
Add definitions for VIRTIO_IOMMU_F_BYPASS_CONFIG, which supersedes
VIRTIO_IOMMU_F_BYPASS.

Reviewed-by: Kevin Tian 
Signed-off-by: Jean-Philippe Brucker 
---
 include/uapi/linux/virtio_iommu.h | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/virtio_iommu.h 
b/include/uapi/linux/virtio_iommu.h
index 237e36a280cb..cafd8cf7febf 100644
--- a/include/uapi/linux/virtio_iommu.h
+++ b/include/uapi/linux/virtio_iommu.h
@@ -16,6 +16,7 @@
 #define VIRTIO_IOMMU_F_BYPASS  3
 #define VIRTIO_IOMMU_F_PROBE   4
 #define VIRTIO_IOMMU_F_MMIO5
+#define VIRTIO_IOMMU_F_BYPASS_CONFIG   6
 
 struct virtio_iommu_range_64 {
__le64  start;
@@ -36,6 +37,8 @@ struct virtio_iommu_config {
struct virtio_iommu_range_32domain_range;
/* Probe buffer size */
__le32  probe_size;
+   __u8bypass;
+   __u8reserved[7];
 };
 
 /* Request types */
@@ -66,11 +69,14 @@ struct virtio_iommu_req_tail {
__u8reserved[3];
 };
 
+#define VIRTIO_IOMMU_ATTACH_F_BYPASS   (1 << 0)
+
 struct virtio_iommu_req_attach {
struct virtio_iommu_req_headhead;
__le32  domain;
__le32  endpoint;
-   __u8reserved[8];
+   __le32  flags;
+   __u8reserved[4];
struct virtio_iommu_req_tailtail;
 };
 
-- 
2.33.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 0/5] iommu/virtio: Add identity domains

2021-11-23 Thread Jean-Philippe Brucker
Support identity domains, allowing IOMMU protection to be enabled for
only a subset of endpoints (those assigned to userspace, for example).
Users may enable identity domains at compile time
(CONFIG_IOMMU_DEFAULT_PASSTHROUGH), boot time (iommu.passthrough=1) or
runtime (/sys/kernel/iommu_groups/*/type = identity).

Since v1 [1] I rebased onto v5.16-rc and added Kevin's review tag.
The specification update for the new feature has now been accepted [2].

Patches 1-2 support identity domains using the optional
VIRTIO_IOMMU_F_BYPASS_CONFIG feature, and patches 3-5 add a fallback to
identity mappings, when the feature is not supported.

QEMU patches are on my virtio-iommu/bypass branch [3], and depend on the
UAPI update.

[1] 
https://lore.kernel.org/linux-iommu/20211013121052.518113-1-jean-phili...@linaro.org/
[2] https://github.com/oasis-tcs/virtio-spec/issues/119
[3] https://jpbrucker.net/git/qemu/log/?h=virtio-iommu/bypass

Jean-Philippe Brucker (5):
  iommu/virtio: Add definitions for VIRTIO_IOMMU_F_BYPASS_CONFIG
  iommu/virtio: Support bypass domains
  iommu/virtio: Sort reserved regions
  iommu/virtio: Pass end address to viommu_add_mapping()
  iommu/virtio: Support identity-mapped domains

 include/uapi/linux/virtio_iommu.h |   8 ++-
 drivers/iommu/virtio-iommu.c  | 113 +-
 2 files changed, 101 insertions(+), 20 deletions(-)

-- 
2.33.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH V2 5/6] net: netvsc: Add Isolation VM support for netvsc driver

2021-11-23 Thread Tianyu Lan
From: Tianyu Lan 

In an Isolation VM, all memory shared with the host needs to be marked
visible to the host via a hypercall. vmbus_establish_gpadl() has
already done that for the netvsc rx/tx ring buffers. The page buffers
used by vmbus_sendpacket_pagebuffer() still need to be handled. Use the
DMA API to map/unmap this memory when sending/receiving packets; the
Hyper-V swiotlb bounce buffer DMA address will be returned. The swiotlb
bounce buffer has been marked visible to the host during boot.

Allocate the rx/tx ring buffers via dma_alloc_noncontiguous() in an
Isolation VM. After calling vmbus_establish_gpadl(), which marks these
pages visible to the host, map them into the unencrypted address space
via dma_vmap_noncontiguous().
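
As a rough sketch of that flow (not the actual netvsc code; the
function name and hv_device pointer are made up for illustration, and
error paths are trimmed):

#include <linux/dma-mapping.h>
#include <linux/hyperv.h>

static void *example_alloc_ring(struct hv_device *hdev, size_t size,
                                struct sg_table **sgt_out)
{
        struct sg_table *sgt;
        void *vaddr;

        /* Allocate a noncontiguous DMA buffer for the ring */
        sgt = dma_alloc_noncontiguous(&hdev->device, size, DMA_FROM_DEVICE,
                                      GFP_KERNEL, 0);
        if (!sgt)
                return NULL;

        /*
         * In the real driver, vmbus_establish_gpadl() runs at this point
         * to make the pages visible to the host.
         */

        /* Remap the pages into the unencrypted address space */
        vaddr = dma_vmap_noncontiguous(&hdev->device, size, sgt);
        if (!vaddr) {
                dma_free_noncontiguous(&hdev->device, size, sgt,
                                       DMA_FROM_DEVICE);
                return NULL;
        }

        *sgt_out = sgt;
        return vaddr;
}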

Signed-off-by: Tianyu Lan 
---
 drivers/net/hyperv/hyperv_net.h   |   5 +
 drivers/net/hyperv/netvsc.c   | 192 +++---
 drivers/net/hyperv/rndis_filter.c |   2 +
 include/linux/hyperv.h|   6 +
 4 files changed, 190 insertions(+), 15 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 315278a7cf88..31c77a00d01e 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -164,6 +164,7 @@ struct hv_netvsc_packet {
u32 total_bytes;
u32 send_buf_index;
u32 total_data_buflen;
+   struct hv_dma_range *dma_range;
 };
 
 #define NETVSC_HASH_KEYLEN 40
@@ -1074,6 +1075,7 @@ struct netvsc_device {
 
/* Receive buffer allocated by us but manages by NetVSP */
void *recv_buf;
+   struct sg_table *recv_sgt;
u32 recv_buf_size; /* allocated bytes */
struct vmbus_gpadl recv_buf_gpadl_handle;
u32 recv_section_cnt;
@@ -1082,6 +1084,7 @@ struct netvsc_device {
 
/* Send buffer allocated by us */
void *send_buf;
+   struct sg_table *send_sgt;
u32 send_buf_size;
struct vmbus_gpadl send_buf_gpadl_handle;
u32 send_section_cnt;
@@ -1731,4 +1734,6 @@ struct rndis_message {
 #define RETRY_US_HI1
 #define RETRY_MAX  2000/* >10 sec */
 
+void netvsc_dma_unmap(struct hv_device *hv_dev,
+ struct hv_netvsc_packet *packet);
 #endif /* _HYPERV_NET_H */
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 396bc1c204e6..9cdc71930830 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -146,15 +147,39 @@ static struct netvsc_device *alloc_net_device(void)
return net_device;
 }
 
+static struct hv_device *netvsc_channel_to_device(struct vmbus_channel 
*channel)
+{
+   struct vmbus_channel *primary = channel->primary_channel;
+
+   return primary ? primary->device_obj : channel->device_obj;
+}
+
 static void free_netvsc_device(struct rcu_head *head)
 {
struct netvsc_device *nvdev
= container_of(head, struct netvsc_device, rcu);
+   struct hv_device *dev =
+   netvsc_channel_to_device(nvdev->chan_table[0].channel);
int i;
 
kfree(nvdev->extension);
-   vfree(nvdev->recv_buf);
-   vfree(nvdev->send_buf);
+
+   if (nvdev->recv_sgt) {
+   dma_vunmap_noncontiguous(>device, nvdev->recv_buf);
+   dma_free_noncontiguous(>device, nvdev->recv_buf_size,
+  nvdev->recv_sgt, DMA_FROM_DEVICE);
+   } else {
+   vfree(nvdev->recv_buf);
+   }
+
+   if (nvdev->send_sgt) {
+   dma_vunmap_noncontiguous(>device, nvdev->send_buf);
+   dma_free_noncontiguous(>device, nvdev->send_buf_size,
+  nvdev->send_sgt, DMA_TO_DEVICE);
+   } else {
+   vfree(nvdev->send_buf);
+   }
+
kfree(nvdev->send_section_map);
 
for (i = 0; i < VRSS_CHANNEL_MAX; i++) {
@@ -348,7 +373,21 @@ static int netvsc_init_buf(struct hv_device *device,
buf_size = min_t(unsigned int, buf_size,
 NETVSC_RECEIVE_BUFFER_SIZE_LEGACY);
 
-   net_device->recv_buf = vzalloc(buf_size);
+   if (hv_isolation_type_snp()) {
+   net_device->recv_sgt =
+   dma_alloc_noncontiguous(>device, buf_size,
+   DMA_FROM_DEVICE, GFP_KERNEL, 0);
+   if (!net_device->recv_sgt) {
+   pr_err("Fail to allocate recv buffer buf_size %d.\n.", 
buf_size);
+   ret = -ENOMEM;
+   goto cleanup;
+   }
+
+   net_device->recv_buf = (void 
*)net_device->recv_sgt->sgl->dma_address;
+   } else {
+   net_device->recv_buf = vzalloc(buf_size);
+   }
+
if (!net_device->recv_buf) {
netdev_err(ndev,
   "unable to allocate receive buffer of size %u\n",
@@ -357,8 +396,6 @@ static int netvsc_init_buf(struct hv_device *device,
  

[PATCH V2 6/6] scsi: storvsc: Add Isolation VM support for storvsc driver

2021-11-23 Thread Tianyu Lan
From: Tianyu Lan 

In an Isolation VM, all memory shared with the host needs to be marked
visible to the host via a hypercall. vmbus_establish_gpadl() has
already done that for the storvsc rx/tx ring buffers. The page buffers
used by vmbus_sendpacket_mpb_desc() still need to be handled. Use the
DMA API (scsi_dma_map/unmap) to map this memory when sending/receiving
packets and return the swiotlb bounce buffer DMA address. In an
Isolation VM, the swiotlb bounce buffer is marked visible to the host
and swiotlb force mode is enabled.

Set the device's DMA min_align_mask to HV_HYP_PAGE_SIZE - 1 in order to
keep the original data offset in the bounce buffer.
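
A minimal sketch of those two pieces (names are placeholders, not the
storvsc functions; where the driver actually sets the mask is outside
this excerpt):

#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>
#include <scsi/scsi.h>
#include <scsi/scsi_cmnd.h>
#include <asm/hyperv-tlfs.h>

/* Once at setup: preserve the intra-page offset across swiotlb bouncing */
static void example_setup_dma(struct device *dev)
{
        dma_set_min_align_mask(dev, HV_HYP_PAGE_SIZE - 1);
}

/* Per command: map through the DMA API so swiotlb can bounce as needed */
static int example_map_sgl(struct scsi_cmnd *scmnd)
{
        struct scatterlist *sg;
        int sg_count, j;

        sg_count = scsi_dma_map(scmnd);
        if (sg_count < 0)
                return SCSI_MLQUEUE_DEVICE_BUSY;

        /* sg_dma_address()/sg_dma_len() now describe host-visible memory */
        for_each_sg(scsi_sglist(scmnd), sg, sg_count, j)
                pr_debug("seg %d: dma %pad len %u\n", j,
                         &sg_dma_address(sg), sg_dma_len(sg));

        return sg_count;
}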

Signed-off-by: Tianyu Lan 
---
 drivers/scsi/storvsc_drv.c | 37 +
 include/linux/hyperv.h |  1 +
 2 files changed, 22 insertions(+), 16 deletions(-)

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 20595c0ba0ae..ae293600d799 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -21,6 +21,8 @@
 #include 
 #include 
 #include 
+#include 
+
 #include 
 #include 
 #include 
@@ -1336,6 +1338,7 @@ static void storvsc_on_channel_callback(void *context)
continue;
}
request = (struct storvsc_cmd_request 
*)scsi_cmd_priv(scmnd);
+   scsi_dma_unmap(scmnd);
}
 
storvsc_on_receive(stor_device, packet, request);
@@ -1749,7 +1752,6 @@ static int storvsc_queuecommand(struct Scsi_Host *host, 
struct scsi_cmnd *scmnd)
struct hv_host_device *host_dev = shost_priv(host);
struct hv_device *dev = host_dev->dev;
struct storvsc_cmd_request *cmd_request = scsi_cmd_priv(scmnd);
-   int i;
struct scatterlist *sgl;
unsigned int sg_count;
struct vmscsi_request *vm_srb;
@@ -1831,10 +1833,11 @@ static int storvsc_queuecommand(struct Scsi_Host *host, 
struct scsi_cmnd *scmnd)
payload_sz = sizeof(cmd_request->mpb);
 
if (sg_count) {
-   unsigned int hvpgoff, hvpfns_to_add;
unsigned long offset_in_hvpg = offset_in_hvpage(sgl->offset);
unsigned int hvpg_count = HVPFN_UP(offset_in_hvpg + length);
-   u64 hvpfn;
+   struct scatterlist *sg;
+   unsigned long hvpfn, hvpfns_to_add;
+   int j, i = 0;
 
if (hvpg_count > MAX_PAGE_BUFFER_COUNT) {
 
@@ -1848,21 +1851,22 @@ static int storvsc_queuecommand(struct Scsi_Host *host, 
struct scsi_cmnd *scmnd)
payload->range.len = length;
payload->range.offset = offset_in_hvpg;
 
+   sg_count = scsi_dma_map(scmnd);
+   if (sg_count < 0)
+   return SCSI_MLQUEUE_DEVICE_BUSY;
 
-   for (i = 0; sgl != NULL; sgl = sg_next(sgl)) {
+   for_each_sg(sgl, sg, sg_count, j) {
/*
-* Init values for the current sgl entry. hvpgoff
-* and hvpfns_to_add are in units of Hyper-V size
-* pages. Handling the PAGE_SIZE != HV_HYP_PAGE_SIZE
-* case also handles values of sgl->offset that are
-* larger than PAGE_SIZE. Such offsets are handled
-* even on other than the first sgl entry, provided
-* they are a multiple of PAGE_SIZE.
+* Init values for the current sgl entry. hvpfns_to_add
+* is in units of Hyper-V size pages. Handling the
+* PAGE_SIZE != HV_HYP_PAGE_SIZE case also handles
+* values of sgl->offset that are larger than PAGE_SIZE.
+* Such offsets are handled even on other than the first
+* sgl entry, provided they are a multiple of PAGE_SIZE.
 */
-   hvpgoff = HVPFN_DOWN(sgl->offset);
-   hvpfn = page_to_hvpfn(sg_page(sgl)) + hvpgoff;
-   hvpfns_to_add = HVPFN_UP(sgl->offset + sgl->length) -
-   hvpgoff;
+   hvpfn = HVPFN_DOWN(sg_dma_address(sg));
+   hvpfns_to_add = HVPFN_UP(sg_dma_address(sg) +
+sg_dma_len(sg)) - hvpfn;
 
/*
 * Fill the next portion of the PFN array with
@@ -1872,7 +1876,7 @@ static int storvsc_queuecommand(struct Scsi_Host *host, 
struct scsi_cmnd *scmnd)
 * the PFN array is filled.
 */
while (hvpfns_to_add--)
-   payload->range.pfn_array[i++] = hvpfn++;
+   payload->range.pfn_array[i++] = hvpfn++;
}
}
 
@@ -2016,6 

[PATCH V2 3/6] x86/hyper-v: Add hyperv Isolation VM check in the cc_platform_has()

2021-11-23 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V provides Isolation VMs, which have memory encryption support.
Add hyperv_cc_platform_has() and return true for the
CC_ATTR_GUEST_MEM_ENCRYPT attribute check.
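
The point of going through cc_platform_has() is that callers stay
platform-agnostic; an illustrative (hypothetical) consumer:

#include <linux/cc_platform.h>

/* Decide on bounce buffering from the attribute alone */
static bool example_needs_bounce_buffers(void)
{
        return cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT);
}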

Signed-off-by: Tianyu Lan 
---
 arch/x86/kernel/cc_platform.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/arch/x86/kernel/cc_platform.c b/arch/x86/kernel/cc_platform.c
index 03bb2f343ddb..f3bb0431f5c5 100644
--- a/arch/x86/kernel/cc_platform.c
+++ b/arch/x86/kernel/cc_platform.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 
+#include 
 #include 
 
 static bool __maybe_unused intel_cc_platform_has(enum cc_attr attr)
@@ -58,9 +59,23 @@ static bool amd_cc_platform_has(enum cc_attr attr)
 #endif
 }
 
+static bool hyperv_cc_platform_has(enum cc_attr attr)
+{
+#ifdef CONFIG_HYPERV
+   if (attr == CC_ATTR_GUEST_MEM_ENCRYPT)
+   return true;
+   else
+   return false;
+#else
+   return false;
+#endif
+}
 
 bool cc_platform_has(enum cc_attr attr)
 {
+   if (hv_is_isolation_supported())
+   return hyperv_cc_platform_has(attr);
+
if (sme_me_mask)
return amd_cc_platform_has(attr);
 
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH V2 2/6] dma-mapping: Add vmap/vunmap_noncontiguous() callback in dma ops

2021-11-23 Thread Tianyu Lan
From: Tianyu Lan 

The Hyper-V netvsc driver needs to allocate noncontiguous DMA memory
and remap it into the unencrypted address space before sharing it with
the host. Add vmap/vunmap_noncontiguous() callbacks and handle the
remap in the Hyper-V DMA ops.
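
For illustration, a provider-side sketch of the two new hooks
(hypothetical names and boundary variable; the actual Hyper-V callbacks
added later in this series remap above shared_gpa_boundary, but not
necessarily with this exact scheme):

#include <linux/dma-map-ops.h>
#include <linux/scatterlist.h>
#include <linux/slab.h>
#include <linux/vmalloc.h>

/* Placeholder; the real value would come from the platform (vTOM) */
static u64 example_shared_gpa_boundary;

static void *example_vmap_noncontiguous(struct device *dev, size_t size,
                                        struct sg_table *sgt)
{
        unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT, i = 0;
        struct sg_page_iter iter;
        unsigned long *pfns;
        void *vaddr;

        pfns = kcalloc(count, sizeof(*pfns), GFP_KERNEL);
        if (!pfns)
                return NULL;

        /* Map each page at its unencrypted (shared) alias */
        for_each_sgtable_page(sgt, &iter, 0)
                pfns[i++] = page_to_pfn(sg_page_iter_page(&iter)) +
                            (example_shared_gpa_boundary >> PAGE_SHIFT);

        vaddr = vmap_pfn(pfns, count, PAGE_KERNEL_IO);
        kfree(pfns);
        return vaddr;
}

static void example_vunmap_noncontiguous(struct device *dev, void *addr)
{
        vunmap(addr);
}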

Signed-off-by: Tianyu Lan 
---
 include/linux/dma-map-ops.h |  3 +++
 kernel/dma/mapping.c| 18 ++
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
index 0d5b06b3a4a6..f7b9958ca20a 100644
--- a/include/linux/dma-map-ops.h
+++ b/include/linux/dma-map-ops.h
@@ -27,6 +27,9 @@ struct dma_map_ops {
unsigned long attrs);
void (*free_noncontiguous)(struct device *dev, size_t size,
struct sg_table *sgt, enum dma_data_direction dir);
+   void *(*vmap_noncontiguous)(struct device *dev, size_t size,
+   struct sg_table *sgt);
+   void (*vunmap_noncontiguous)(struct device *dev, void *addr);
int (*mmap)(struct device *, struct vm_area_struct *,
void *, dma_addr_t, size_t, unsigned long attrs);
 
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 9478eccd1c8e..7fd751d866cc 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -674,8 +674,14 @@ void *dma_vmap_noncontiguous(struct device *dev, size_t 
size,
const struct dma_map_ops *ops = get_dma_ops(dev);
unsigned long count = PAGE_ALIGN(size) >> PAGE_SHIFT;
 
-   if (ops && ops->alloc_noncontiguous)
-   return vmap(sgt_handle(sgt)->pages, count, VM_MAP, PAGE_KERNEL);
+   if (ops) {
+   if (ops->vmap_noncontiguous)
+   return ops->vmap_noncontiguous(dev, size, sgt);
+   else if (ops->alloc_noncontiguous)
+   return vmap(sgt_handle(sgt)->pages, count, VM_MAP,
+   PAGE_KERNEL);
+   }
+
return page_address(sg_page(sgt->sgl));
 }
 EXPORT_SYMBOL_GPL(dma_vmap_noncontiguous);
@@ -684,8 +690,12 @@ void dma_vunmap_noncontiguous(struct device *dev, void 
*vaddr)
 {
const struct dma_map_ops *ops = get_dma_ops(dev);
 
-   if (ops && ops->alloc_noncontiguous)
-   vunmap(vaddr);
+   if (ops) {
+   if (ops->vunmap_noncontiguous)
+   ops->vunmap_noncontiguous(dev, vaddr);
+   else if (ops->alloc_noncontiguous)
+   vunmap(vaddr);
+   }
 }
 EXPORT_SYMBOL_GPL(dma_vunmap_noncontiguous);
 
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH V2 4/6] hyperv/IOMMU: Enable swiotlb bounce buffer for Isolation VM

2021-11-23 Thread Tianyu Lan
From: Tianyu Lan 

A Hyper-V Isolation VM requires bounce buffer support to copy data
from/to encrypted memory, so enable swiotlb force mode to use the
swiotlb bounce buffer for DMA transactions.

In an Isolation VM with AMD SEV, the bounce buffer needs to be accessed
via an extra address space which is above shared_gpa_boundary (e.g.
address bit 39) reported by the Hyper-V CPUID leaf ISOLATION_CONFIG.
The access physical address will be the original physical address +
shared_gpa_boundary. The shared_gpa_boundary in the AMD SEV-SNP spec is
called virtual top of memory (vTOM). Memory addresses below vTOM are
automatically treated as private while memory above vTOM is treated as
shared.

Hyper-V initializes its swiotlb bounce buffer and the default swiotlb
needs to be disabled. pci_swiotlb_detect_override() and
pci_swiotlb_detect_4gb() enable the default one. To override the
setting, hyperv_swiotlb_detect() needs to run before these detect
functions, which depend on pci_xen_swiotlb_init(). Make
pci_xen_swiotlb_init() depend on hyperv_swiotlb_detect() to keep the
order.

The swiotlb bounce buffer code calls set_memory_decrypted() to mark the
bounce buffer visible to the host and maps it into the extra address
space via memremap(). Populate the shared_gpa_boundary (vTOM) via the
swiotlb_unencrypted_base variable.

memremap() can't be used as early as hyperv_iommu_swiotlb_init(), so
call swiotlb_update_mem_attributes() in
hyperv_iommu_swiotlb_later_init().

Add Hyper-V DMA ops and provide alloc/free and vmap/vunmap
noncontiguous callbacks to handle requests to allocate and map
noncontiguous DMA memory in VMBus device drivers. The netvsc driver
will use this. Set the dma_ops_bypass flag for hv devices so that
DMA-direct functions are used when mapping/unmapping DMA pages.
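
A compressed sketch of that ordering (not the real
hyperv_iommu_swiotlb_init()/hyperv_iommu_swiotlb_later_init() pair; it
only shows where swiotlb_unencrypted_base, swiotlb force mode and
swiotlb_update_mem_attributes() come in):

#include <linux/swiotlb.h>
#include <asm/mshyperv.h>

static int __init example_swiotlb_detect(void)
{
        if (!hv_is_isolation_supported())
                return 0;

        /* SNP guests access the bounce buffer through its vTOM alias */
        if (hv_isolation_type_snp())
                swiotlb_unencrypted_base = ms_hyperv.shared_gpa_boundary;

        swiotlb_force = SWIOTLB_FORCE;
        return 1;
}

static void __init example_swiotlb_later_init(void)
{
        /* memremap() works by now, so the bounce buffer can be remapped */
        swiotlb_update_mem_attributes();
}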

Signed-off-by: Tianyu Lan 
---
Change since v1:
* Remove hv isolation check in the sev_setup_arch()

 arch/x86/mm/mem_encrypt.c  |   1 +
 arch/x86/xen/pci-swiotlb-xen.c |   3 +-
 drivers/hv/Kconfig |   1 +
 drivers/hv/vmbus_drv.c |   6 ++
 drivers/iommu/hyperv-iommu.c   | 164 +
 include/linux/hyperv.h |  10 ++
 6 files changed, 184 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 35487305d8af..e48c73b3dd41 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "mm_internal.h"
 
diff --git a/arch/x86/xen/pci-swiotlb-xen.c b/arch/x86/xen/pci-swiotlb-xen.c
index 46df59aeaa06..30fd0600b008 100644
--- a/arch/x86/xen/pci-swiotlb-xen.c
+++ b/arch/x86/xen/pci-swiotlb-xen.c
@@ -4,6 +4,7 @@
 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -91,6 +92,6 @@ int pci_xen_swiotlb_init_late(void)
 EXPORT_SYMBOL_GPL(pci_xen_swiotlb_init_late);
 
 IOMMU_INIT_FINISH(pci_xen_swiotlb_detect,
- NULL,
+ hyperv_swiotlb_detect,
  pci_xen_swiotlb_init,
  NULL);
diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
index dd12af20e467..d43b4cd88f57 100644
--- a/drivers/hv/Kconfig
+++ b/drivers/hv/Kconfig
@@ -9,6 +9,7 @@ config HYPERV
select PARAVIRT
select X86_HV_CALLBACK_VECTOR if X86
select VMAP_PFN
+   select DMA_OPS_BYPASS
help
  Select this option to run Linux as a Hyper-V client operating
  system.
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 392c1ac4f819..32dc193e31cd 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -33,6 +33,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include "hyperv_vmbus.h"
 
@@ -2078,6 +2079,7 @@ struct hv_device *vmbus_device_create(const guid_t *type,
return child_device_obj;
 }
 
+static u64 vmbus_dma_mask = DMA_BIT_MASK(64);
 /*
  * vmbus_device_register - Register the child device
  */
@@ -2118,6 +2120,10 @@ int vmbus_device_register(struct hv_device 
*child_device_obj)
}
hv_debug_add_dev_dir(child_device_obj);
 
+   child_device_obj->device.dma_ops_bypass = true;
+   child_device_obj->device.dma_ops = _iommu_dma_ops;
+   child_device_obj->device.dma_mask = _dma_mask;
+   child_device_obj->device.dma_parms = _device_obj->dma_parms;
return 0;
 
 err_kset_unregister:
diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index e285a220c913..ebcb628e7e8f 100644
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -13,14 +13,21 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include 
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
 
 #include "irq_remapping.h"
 
@@ -337,4 +344,161 @@ static const struct irq_domain_ops 
hyperv_root_ir_domain_ops = {
.free = hyperv_root_irq_remapping_free,
 };
 
+static void __init hyperv_iommu_swiotlb_init(void)
+{
+   unsigned long hyperv_io_tlb_size;
+   

[PATCH V2 1/6] Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM

2021-11-23 Thread Tianyu Lan
From: Tianyu Lan 

In an Isolation VM with AMD SEV, the bounce buffer needs to be accessed
via an extra address space which is above shared_gpa_boundary (e.g.
address bit 39) reported by the Hyper-V CPUID leaf ISOLATION_CONFIG.
The access physical address will be the original physical address +
shared_gpa_boundary. The shared_gpa_boundary in the AMD SEV-SNP spec is
called virtual top of memory (vTOM). Memory addresses below vTOM are
automatically treated as private while memory above vTOM is treated as
shared.

Expose swiotlb_unencrypted_base so that platforms can set the
unencrypted memory base offset, and have the platform call
swiotlb_update_mem_attributes() to remap the swiotlb memory into the
unencrypted address space. memremap() cannot be called in the early
boot stage, so put the remapping code into
swiotlb_update_mem_attributes(). Store the remapped address and use it
to copy data from/to the swiotlb bounce buffer.
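
To illustrate the last point, copies then go through the remapped
virtual address instead of phys_to_virt() of the slot address; a
simplified version of that copy as it would sit in kernel/dma/swiotlb.c
(not the real swiotlb_bounce(), which also handles per-slot orig_addr
bookkeeping and highmem):

static void example_bounce(struct io_tlb_mem *mem, phys_addr_t tlb_addr,
                           void *buf, size_t size,
                           enum dma_data_direction dir)
{
        unsigned char *vaddr = (unsigned char *)mem->vaddr +
                               (tlb_addr - mem->start);

        if (dir == DMA_TO_DEVICE)
                memcpy(vaddr, buf, size);
        else
                memcpy(buf, vaddr, size);
}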

Signed-off-by: Tianyu Lan 
---
Change since v1:
* Rework comment in the swiotlb_init_io_tlb_mem()
* Make swiotlb_init_io_tlb_mem() back to return void.
---
 include/linux/swiotlb.h |  6 +
 kernel/dma/swiotlb.c| 53 +
 2 files changed, 54 insertions(+), 5 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 569272871375..f6c3638255d5 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -73,6 +73,9 @@ extern enum swiotlb_force swiotlb_force;
  * @end:   The end address of the swiotlb memory pool. Used to do a quick
  * range check to see if the memory was in fact allocated by this
  * API.
+ * @vaddr: The vaddr of the swiotlb memory pool. The swiotlb memory pool
+ * may be remapped in the memory encrypted case and store virtual
+ * address for bounce buffer operation.
  * @nslabs:The number of IO TLB blocks (in groups of 64) between @start and
  * @end. For default swiotlb, this is command line adjustable via
  * setup_io_tlb_npages.
@@ -92,6 +95,7 @@ extern enum swiotlb_force swiotlb_force;
 struct io_tlb_mem {
phys_addr_t start;
phys_addr_t end;
+   void *vaddr;
unsigned long nslabs;
unsigned long used;
unsigned int index;
@@ -186,4 +190,6 @@ static inline bool is_swiotlb_for_alloc(struct device *dev)
 }
 #endif /* CONFIG_DMA_RESTRICTED_POOL */
 
+extern phys_addr_t swiotlb_unencrypted_base;
+
 #endif /* __LINUX_SWIOTLB_H */
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 8e840fbbed7c..c303fdeba82f 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -50,6 +50,7 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
 #include 
@@ -72,6 +73,8 @@ enum swiotlb_force swiotlb_force;
 
 struct io_tlb_mem io_tlb_default_mem;
 
+phys_addr_t swiotlb_unencrypted_base;
+
 /*
  * Max segment that we can provide which (if pages are contingous) will
  * not be bounced (unless SWIOTLB_FORCE is set).
@@ -155,6 +158,31 @@ static inline unsigned long nr_slots(u64 val)
return DIV_ROUND_UP(val, IO_TLB_SIZE);
 }
 
+/*
+ * Remap swioltb memory in the unencrypted physical address space
+ * when swiotlb_unencrypted_base is set. (e.g. for Hyper-V AMD SEV-SNP
+ * Isolation VMs).
+ */
+void *swiotlb_mem_remap(struct io_tlb_mem *mem, unsigned long bytes)
+{
+   void *vaddr;
+
+   if (swiotlb_unencrypted_base) {
+   phys_addr_t paddr = mem->start + swiotlb_unencrypted_base;
+
+   vaddr = memremap(paddr, bytes, MEMREMAP_WB);
+   if (!vaddr) {
+   pr_err("Failed to map the unencrypted memory %llx size 
%lx.\n",
+  paddr, bytes);
+   return NULL;
+   }
+
+   return vaddr;
+   }
+
+   return phys_to_virt(mem->start);
+}
+
 /*
  * Early SWIOTLB allocation may be too early to allow an architecture to
  * perform the desired operations.  This function allows the architecture to
@@ -172,7 +200,14 @@ void __init swiotlb_update_mem_attributes(void)
vaddr = phys_to_virt(mem->start);
bytes = PAGE_ALIGN(mem->nslabs << IO_TLB_SHIFT);
set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT);
-   memset(vaddr, 0, bytes);
+
+   mem->vaddr = swiotlb_mem_remap(mem, bytes);
+   if (!mem->vaddr) {
+   pr_err("Fail to remap swiotlb mem.\n");
+   return;
+   }
+
+   memset(mem->vaddr, 0, bytes);
 }
 
 static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
@@ -196,7 +231,18 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem 
*mem, phys_addr_t start,
mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
mem->slots[i].alloc_size = 0;
}
+
+   /*
+* If swiotlb_unencrypted_base is set, the bounce buffer memory will
+* be remapped and cleared in swiotlb_update_mem_attributes.
+*/
+   if (swiotlb_unencrypted_base)
+   

[PATCH V2 0/6] x86/Hyper-V: Add Hyper-V Isolation VM support(Second part)

2021-11-23 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V provides two kinds of Isolation VMs: VBS (Virtualization-Based
Security) and AMD SEV-SNP unenlightened Isolation VMs. This patchset
adds support for these Isolation VMs in Linux.

The memory of these VMs is encrypted and the host can't access guest
memory directly. Hyper-V provides a new host-visibility hypercall, and
the guest needs to call it to mark memory visible to the host before
sharing that memory with the host. For security, network/storage stack
memory should not be shared with the host, hence the need for bounce
buffers.

The VMBus channel ring buffers already play the bounce buffer role
because all data from/to the host is copied between the ring buffer and
IO stack memory anyway. So mark the VMBus channel ring buffers visible.

For an SNP Isolation VM, the guest needs to access the shared memory
via an extra address space which is specified by the Hyper-V CPUID leaf
HYPERV_CPUID_ISOLATION_CONFIG. The access physical address of the
shared memory should be the bounce buffer memory GPA plus the
shared_gpa_boundary reported by CPUID.
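
In other words (illustrative helper only, not a function added by this
series):

#include <asm/mshyperv.h>

/* Host-visible (shared) alias of a guest physical address in an SNP VM */
static inline phys_addr_t example_shared_gpa(phys_addr_t pa)
{
        return pa + ms_hyperv.shared_gpa_boundary;
}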

This patchset enables the swiotlb bounce buffer for netvsc/storvsc in
Isolation VMs. Add Hyper-V DMA ops and provide the
dma_alloc/free_noncontiguous and vmap/vunmap_noncontiguous callbacks.
Allocate the rx/tx rings via dma_alloc_noncontiguous() and map them
into the extra address space via dma_vmap_noncontiguous().

Change since v1:
 * Add Hyper-V Isolation support check in the cc_platform_has()
   and return true for guest memory encrypt attr.
 * Remove hv isolation check in the sev_setup_arch()

Tianyu Lan (6):
  Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM
  dma-mapping: Add vmap/vunmap_noncontiguous() callback in dma ops
  x86/hyper-v: Add hyperv Isolation VM check in the cc_platform_has()
  hyperv/IOMMU: Enable swiotlb bounce buffer for Isolation VM
  net: netvsc: Add Isolation VM support for netvsc driver
  scsi: storvsc: Add Isolation VM support for storvsc driver

 arch/x86/kernel/cc_platform.c |  15 +++
 arch/x86/mm/mem_encrypt.c |   1 +
 arch/x86/xen/pci-swiotlb-xen.c|   3 +-
 drivers/hv/Kconfig|   1 +
 drivers/hv/vmbus_drv.c|   6 +
 drivers/iommu/hyperv-iommu.c  | 164 +
 drivers/net/hyperv/hyperv_net.h   |   5 +
 drivers/net/hyperv/netvsc.c   | 192 +++---
 drivers/net/hyperv/rndis_filter.c |   2 +
 drivers/scsi/storvsc_drv.c|  37 +++---
 include/linux/dma-map-ops.h   |   3 +
 include/linux/hyperv.h|  17 +++
 include/linux/swiotlb.h   |   6 +
 kernel/dma/mapping.c  |  18 ++-
 kernel/dma/swiotlb.c  |  53 -
 15 files changed, 482 insertions(+), 41 deletions(-)

-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 9/9] iommu: Move flush queue data into iommu_dma_cookie

2021-11-23 Thread Robin Murphy
Complete the move into iommu-dma by refactoring the flush queues
themselves to belong to the DMA cookie rather than the IOVA domain.

The refactoring may as well extend to some minor cosmetic aspects
too, to help us stay one step ahead of the style police.

Signed-off-by: Robin Murphy 
---
 drivers/iommu/dma-iommu.c | 171 +-
 drivers/iommu/iova.c  |   2 -
 include/linux/iova.h  |  44 +-
 3 files changed, 95 insertions(+), 122 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index ddf75e7c2ebc..8a1aa980d376 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -9,9 +9,12 @@
  */
 
 #include 
+#include 
+#include 
 #include 
-#include 
+#include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -20,11 +23,10 @@
 #include 
 #include 
 #include 
-#include 
 #include 
+#include 
+#include 
 #include 
-#include 
-#include 
 
 struct iommu_dma_msi_page {
struct list_headlist;
@@ -41,7 +43,19 @@ struct iommu_dma_cookie {
enum iommu_dma_cookie_type  type;
union {
/* Full allocator for IOMMU_DMA_IOVA_COOKIE */
-   struct iova_domain  iovad;
+   struct {
+   struct iova_domain  iovad;
+
+   struct iova_fq __percpu *fq;/* Flush queue */
+   /* Number of TLB flushes that have been started */
+   atomic64_t  fq_flush_start_cnt;
+   /* Number of TLB flushes that have been finished */
+   atomic64_t  fq_flush_finish_cnt;
+   /* Timer to regularily empty the flush queues */
+   struct timer_list   fq_timer;
+   /* 1 when timer is active, 0 when not */
+   atomic_tfq_timer_on;
+   };
/* Trivial linear page allocator for IOMMU_DMA_MSI_COOKIE */
dma_addr_t  msi_iova;
};
@@ -65,6 +79,27 @@ static int __init iommu_dma_forcedac_setup(char *str)
 early_param("iommu.forcedac", iommu_dma_forcedac_setup);
 
 
+/* Number of entries per flush queue */
+#define IOVA_FQ_SIZE   256
+
+/* Timeout (in ms) after which entries are flushed from the queue */
+#define IOVA_FQ_TIMEOUT10
+
+/* Flush queue entry for deferred flushing */
+struct iova_fq_entry {
+   unsigned long iova_pfn;
+   unsigned long pages;
+   struct list_head freelist;
+   u64 counter; /* Flush counter when this entry was added */
+};
+
+/* Per-CPU flush queue structure */
+struct iova_fq {
+   struct iova_fq_entry entries[IOVA_FQ_SIZE];
+   unsigned int head, tail;
+   spinlock_t lock;
+};
+
 #define fq_ring_for_each(i, fq) \
for ((i) = (fq)->head; (i) != (fq)->tail; (i) = ((i) + 1) % 
IOVA_FQ_SIZE)
 
@@ -74,9 +109,9 @@ static inline bool fq_full(struct iova_fq *fq)
return (((fq->tail + 1) % IOVA_FQ_SIZE) == fq->head);
 }
 
-static inline unsigned fq_ring_add(struct iova_fq *fq)
+static inline unsigned int fq_ring_add(struct iova_fq *fq)
 {
-   unsigned idx = fq->tail;
+   unsigned int idx = fq->tail;
 
assert_spin_locked(>lock);
 
@@ -85,10 +120,10 @@ static inline unsigned fq_ring_add(struct iova_fq *fq)
return idx;
 }
 
-static void fq_ring_free(struct iova_domain *iovad, struct iova_fq *fq)
+static void fq_ring_free(struct iommu_dma_cookie *cookie, struct iova_fq *fq)
 {
-   u64 counter = atomic64_read(>fq_flush_finish_cnt);
-   unsigned idx;
+   u64 counter = atomic64_read(>fq_flush_finish_cnt);
+   unsigned int idx;
 
assert_spin_locked(>lock);
 
@@ -98,7 +133,7 @@ static void fq_ring_free(struct iova_domain *iovad, struct 
iova_fq *fq)
break;
 
put_pages_list(>entries[idx].freelist);
-   free_iova_fast(iovad,
+   free_iova_fast(>iovad,
   fq->entries[idx].iova_pfn,
   fq->entries[idx].pages);
 
@@ -106,50 +141,50 @@ static void fq_ring_free(struct iova_domain *iovad, 
struct iova_fq *fq)
}
 }
 
-static void iova_domain_flush(struct iova_domain *iovad)
+static void fq_flush_iotlb(struct iommu_dma_cookie *cookie)
 {
-   atomic64_inc(>fq_flush_start_cnt);
-   iovad->fq_domain->ops->flush_iotlb_all(iovad->fq_domain);
-   atomic64_inc(>fq_flush_finish_cnt);
+   atomic64_inc(>fq_flush_start_cnt);
+   cookie->fq_domain->ops->flush_iotlb_all(cookie->fq_domain);
+   atomic64_inc(>fq_flush_finish_cnt);
 }
 
 static void fq_flush_timeout(struct timer_list *t)
 {
-   struct iova_domain *iovad = from_timer(iovad, t, fq_timer);
+   struct iommu_dma_cookie *cookie = from_timer(cookie, t, fq_timer);
int cpu;
 
-   atomic_set(>fq_timer_on, 0);
-   iova_domain_flush(iovad);
+   atomic_set(>fq_timer_on, 0);
+  

[PATCH 8/9] iommu/iova: Move flush queue code to iommu-dma

2021-11-23 Thread Robin Murphy
Flush queues are specific to DMA ops, which are now handled exclusively
by iommu-dma. As such, now that the historical artefacts from being
shared directly with drivers have been cleaned up, move the flush queue
code into iommu-dma itself to get it out of the way of other IOVA users.

This is pure code movement with no functional change; refactoring to
clean up the headers and definitions will follow.

Signed-off-by: Robin Murphy 
---
 drivers/iommu/dma-iommu.c | 179 +-
 drivers/iommu/iova.c  | 175 -
 2 files changed, 178 insertions(+), 176 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index f139b77caee0..ddf75e7c2ebc 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -64,6 +64,181 @@ static int __init iommu_dma_forcedac_setup(char *str)
 }
 early_param("iommu.forcedac", iommu_dma_forcedac_setup);
 
+
+#define fq_ring_for_each(i, fq) \
+   for ((i) = (fq)->head; (i) != (fq)->tail; (i) = ((i) + 1) % 
IOVA_FQ_SIZE)
+
+static inline bool fq_full(struct iova_fq *fq)
+{
+   assert_spin_locked(>lock);
+   return (((fq->tail + 1) % IOVA_FQ_SIZE) == fq->head);
+}
+
+static inline unsigned fq_ring_add(struct iova_fq *fq)
+{
+   unsigned idx = fq->tail;
+
+   assert_spin_locked(>lock);
+
+   fq->tail = (idx + 1) % IOVA_FQ_SIZE;
+
+   return idx;
+}
+
+static void fq_ring_free(struct iova_domain *iovad, struct iova_fq *fq)
+{
+   u64 counter = atomic64_read(>fq_flush_finish_cnt);
+   unsigned idx;
+
+   assert_spin_locked(>lock);
+
+   fq_ring_for_each(idx, fq) {
+
+   if (fq->entries[idx].counter >= counter)
+   break;
+
+   put_pages_list(>entries[idx].freelist);
+   free_iova_fast(iovad,
+  fq->entries[idx].iova_pfn,
+  fq->entries[idx].pages);
+
+   fq->head = (fq->head + 1) % IOVA_FQ_SIZE;
+   }
+}
+
+static void iova_domain_flush(struct iova_domain *iovad)
+{
+   atomic64_inc(>fq_flush_start_cnt);
+   iovad->fq_domain->ops->flush_iotlb_all(iovad->fq_domain);
+   atomic64_inc(>fq_flush_finish_cnt);
+}
+
+static void fq_flush_timeout(struct timer_list *t)
+{
+   struct iova_domain *iovad = from_timer(iovad, t, fq_timer);
+   int cpu;
+
+   atomic_set(>fq_timer_on, 0);
+   iova_domain_flush(iovad);
+
+   for_each_possible_cpu(cpu) {
+   unsigned long flags;
+   struct iova_fq *fq;
+
+   fq = per_cpu_ptr(iovad->fq, cpu);
+   spin_lock_irqsave(>lock, flags);
+   fq_ring_free(iovad, fq);
+   spin_unlock_irqrestore(>lock, flags);
+   }
+}
+
+void queue_iova(struct iova_domain *iovad,
+   unsigned long pfn, unsigned long pages,
+   struct list_head *freelist)
+{
+   struct iova_fq *fq;
+   unsigned long flags;
+   unsigned idx;
+
+   /*
+* Order against the IOMMU driver's pagetable update from unmapping
+* @pte, to guarantee that iova_domain_flush() observes that if called
+* from a different CPU before we release the lock below. Full barrier
+* so it also pairs with iommu_dma_init_fq() to avoid seeing partially
+* written fq state here.
+*/
+   smp_mb();
+
+   fq = raw_cpu_ptr(iovad->fq);
+   spin_lock_irqsave(>lock, flags);
+
+   /*
+* First remove all entries from the flush queue that have already been
+* flushed out on another CPU. This makes the fq_full() check below less
+* likely to be true.
+*/
+   fq_ring_free(iovad, fq);
+
+   if (fq_full(fq)) {
+   iova_domain_flush(iovad);
+   fq_ring_free(iovad, fq);
+   }
+
+   idx = fq_ring_add(fq);
+
+   fq->entries[idx].iova_pfn = pfn;
+   fq->entries[idx].pages= pages;
+   fq->entries[idx].counter  = atomic64_read(>fq_flush_start_cnt);
+   list_splice(freelist, >entries[idx].freelist);
+
+   spin_unlock_irqrestore(>lock, flags);
+
+   /* Avoid false sharing as much as possible. */
+   if (!atomic_read(>fq_timer_on) &&
+   !atomic_xchg(>fq_timer_on, 1))
+   mod_timer(>fq_timer,
+ jiffies + msecs_to_jiffies(IOVA_FQ_TIMEOUT));
+}
+
+static void free_iova_flush_queue(struct iova_domain *iovad)
+{
+   int cpu, idx;
+
+   if (!iovad->fq)
+   return;
+
+   del_timer(>fq_timer);
+   /*
+* This code runs when the iova_domain is being detroyed, so don't
+* bother to free iovas, just free any remaining pagetable pages.
+*/
+   for_each_possible_cpu(cpu) {
+   struct iova_fq *fq = per_cpu_ptr(iovad->fq, cpu);
+
+   fq_ring_for_each(idx, fq)
+   put_pages_list(>entries[idx].freelist);
+   }
+
+   free_percpu(iovad->fq);

[PATCH 7/9] iommu/iova: Consolidate flush queue code

2021-11-23 Thread Robin Murphy
Squash and simplify some of the freeing code, and move the init
and free routines down into the rest of the flush queue code to
obviate the forward declarations.

Signed-off-by: Robin Murphy 
---
 drivers/iommu/iova.c | 132 +++
 1 file changed, 58 insertions(+), 74 deletions(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index a32007c950e5..159acd34501b 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -24,8 +24,6 @@ static unsigned long iova_rcache_get(struct iova_domain 
*iovad,
 static void init_iova_rcaches(struct iova_domain *iovad);
 static void free_cpu_cached_iovas(unsigned int cpu, struct iova_domain *iovad);
 static void free_iova_rcaches(struct iova_domain *iovad);
-static void fq_destroy_all_entries(struct iova_domain *iovad);
-static void fq_flush_timeout(struct timer_list *t);
 
 static int iova_cpuhp_dead(unsigned int cpu, struct hlist_node *node)
 {
@@ -73,61 +71,6 @@ init_iova_domain(struct iova_domain *iovad, unsigned long 
granule,
 }
 EXPORT_SYMBOL_GPL(init_iova_domain);
 
-static bool has_iova_flush_queue(struct iova_domain *iovad)
-{
-   return !!iovad->fq;
-}
-
-static void free_iova_flush_queue(struct iova_domain *iovad)
-{
-   if (!has_iova_flush_queue(iovad))
-   return;
-
-   if (timer_pending(>fq_timer))
-   del_timer(>fq_timer);
-
-   fq_destroy_all_entries(iovad);
-
-   free_percpu(iovad->fq);
-
-   iovad->fq = NULL;
-   iovad->fq_domain  = NULL;
-}
-
-int init_iova_flush_queue(struct iova_domain *iovad, struct iommu_domain 
*fq_domain)
-{
-   struct iova_fq __percpu *queue;
-   int i, cpu;
-
-   atomic64_set(>fq_flush_start_cnt,  0);
-   atomic64_set(>fq_flush_finish_cnt, 0);
-
-   queue = alloc_percpu(struct iova_fq);
-   if (!queue)
-   return -ENOMEM;
-
-   for_each_possible_cpu(cpu) {
-   struct iova_fq *fq;
-
-   fq = per_cpu_ptr(queue, cpu);
-   fq->head = 0;
-   fq->tail = 0;
-
-   spin_lock_init(>lock);
-
-   for (i = 0; i < IOVA_FQ_SIZE; i++)
-   INIT_LIST_HEAD(>entries[i].freelist);
-   }
-
-   iovad->fq_domain = fq_domain;
-   iovad->fq = queue;
-
-   timer_setup(>fq_timer, fq_flush_timeout, 0);
-   atomic_set(>fq_timer_on, 0);
-
-   return 0;
-}
-
 static struct rb_node *
 __get_cached_rbnode(struct iova_domain *iovad, unsigned long limit_pfn)
 {
@@ -586,23 +529,6 @@ static void iova_domain_flush(struct iova_domain *iovad)
atomic64_inc(>fq_flush_finish_cnt);
 }
 
-static void fq_destroy_all_entries(struct iova_domain *iovad)
-{
-   int cpu;
-
-   /*
-* This code runs when the iova_domain is being detroyed, so don't
-* bother to free iovas, just free any remaining pagetable pages.
-*/
-   for_each_possible_cpu(cpu) {
-   struct iova_fq *fq = per_cpu_ptr(iovad->fq, cpu);
-   int idx;
-
-   fq_ring_for_each(idx, fq)
-   put_pages_list(>entries[idx].freelist);
-   }
-}
-
 static void fq_flush_timeout(struct timer_list *t)
 {
struct iova_domain *iovad = from_timer(iovad, t, fq_timer);
@@ -670,6 +596,64 @@ void queue_iova(struct iova_domain *iovad,
  jiffies + msecs_to_jiffies(IOVA_FQ_TIMEOUT));
 }
 
+static void free_iova_flush_queue(struct iova_domain *iovad)
+{
+   int cpu, idx;
+
+   if (!iovad->fq)
+   return;
+
+   del_timer(>fq_timer);
+   /*
+* This code runs when the iova_domain is being detroyed, so don't
+* bother to free iovas, just free any remaining pagetable pages.
+*/
+   for_each_possible_cpu(cpu) {
+   struct iova_fq *fq = per_cpu_ptr(iovad->fq, cpu);
+
+   fq_ring_for_each(idx, fq)
+   put_pages_list(>entries[idx].freelist);
+   }
+
+   free_percpu(iovad->fq);
+
+   iovad->fq = NULL;
+   iovad->fq_domain = NULL;
+}
+
+int init_iova_flush_queue(struct iova_domain *iovad, struct iommu_domain 
*fq_domain)
+{
+   struct iova_fq __percpu *queue;
+   int i, cpu;
+
+   atomic64_set(>fq_flush_start_cnt,  0);
+   atomic64_set(>fq_flush_finish_cnt, 0);
+
+   queue = alloc_percpu(struct iova_fq);
+   if (!queue)
+   return -ENOMEM;
+
+   for_each_possible_cpu(cpu) {
+   struct iova_fq *fq = per_cpu_ptr(queue, cpu);
+
+   fq->head = 0;
+   fq->tail = 0;
+
+   spin_lock_init(>lock);
+
+   for (i = 0; i < IOVA_FQ_SIZE; i++)
+   INIT_LIST_HEAD(>entries[i].freelist);
+   }
+
+   iovad->fq_domain = fq_domain;
+   iovad->fq = queue;
+
+   timer_setup(>fq_timer, fq_flush_timeout, 0);
+   atomic_set(>fq_timer_on, 0);
+
+   return 0;
+}
+
 /**
  * put_iova_domain - destroys the iova domain
  * @iovad: - 

[PATCH 6/9] iommu/vt-d: Use put_pages_list

2021-11-23 Thread Robin Murphy
From: "Matthew Wilcox (Oracle)" 

page->freelist is for the use of slab.  We already have the ability
to free a list of pages in the core mm, but it requires the use of a
list_head and for the pages to be chained together through page->lru.
Switch the Intel IOMMU and IOVA code over to using put_pages_list().
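
The resulting pattern is simply (sketch only; the two page pointers
stand for whatever pagetable pages a caller has collected):

#include <linux/mm.h>

static void example_collect_and_free(struct page *pg1, struct page *pg2)
{
        LIST_HEAD(freelist);

        /* Chain the pages through page->lru instead of page->freelist */
        list_add_tail(&pg1->lru, &freelist);
        list_add_tail(&pg2->lru, &freelist);

        /* Drop one reference on every page in the list */
        put_pages_list(&freelist);
}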

Signed-off-by: Matthew Wilcox (Oracle) 
[rm: split from original patch, cosmetic tweaks, fix fq entries]
Signed-off-by: Robin Murphy 
---
 drivers/iommu/dma-iommu.c   |  2 +-
 drivers/iommu/intel/iommu.c | 89 +
 drivers/iommu/iova.c| 26 ---
 include/linux/iommu.h   |  3 +-
 include/linux/iova.h|  4 +-
 5 files changed, 45 insertions(+), 79 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index cde887530549..f139b77caee0 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -455,7 +455,7 @@ static void iommu_dma_free_iova(struct iommu_dma_cookie 
*cookie,
else if (gather && gather->queued)
queue_iova(iovad, iova_pfn(iovad, iova),
size >> iova_shift(iovad),
-   gather->freelist);
+   >freelist);
else
free_iova_fast(iovad, iova_pfn(iovad, iova),
size >> iova_shift(iovad));
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 0bde0c8b4126..f65206dac485 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1303,35 +1303,30 @@ static void dma_pte_free_pagetable(struct dmar_domain 
*domain,
know the hardware page-walk will no longer touch them.
The 'pte' argument is the *parent* PTE, pointing to the page that is to
be freed. */
-static struct page *dma_pte_list_pagetables(struct dmar_domain *domain,
-   int level, struct dma_pte *pte,
-   struct page *freelist)
+static void dma_pte_list_pagetables(struct dmar_domain *domain,
+   int level, struct dma_pte *pte,
+   struct list_head *freelist)
 {
struct page *pg;
 
pg = pfn_to_page(dma_pte_addr(pte) >> PAGE_SHIFT);
-   pg->freelist = freelist;
-   freelist = pg;
+   list_add_tail(>lru, freelist);
 
if (level == 1)
-   return freelist;
+   return;
 
pte = page_address(pg);
do {
if (dma_pte_present(pte) && !dma_pte_superpage(pte))
-   freelist = dma_pte_list_pagetables(domain, level - 1,
-  pte, freelist);
+   dma_pte_list_pagetables(domain, level - 1, pte, 
freelist);
pte++;
} while (!first_pte_in_page(pte));
-
-   return freelist;
 }
 
-static struct page *dma_pte_clear_level(struct dmar_domain *domain, int level,
-   struct dma_pte *pte, unsigned long pfn,
-   unsigned long start_pfn,
-   unsigned long last_pfn,
-   struct page *freelist)
+static void dma_pte_clear_level(struct dmar_domain *domain, int level,
+   struct dma_pte *pte, unsigned long pfn,
+   unsigned long start_pfn, unsigned long last_pfn,
+   struct list_head *freelist)
 {
struct dma_pte *first_pte = NULL, *last_pte = NULL;
 
@@ -1352,7 +1347,7 @@ static struct page *dma_pte_clear_level(struct 
dmar_domain *domain, int level,
/* These suborbinate page tables are going away 
entirely. Don't
   bother to clear them; we're just going to *free* 
them. */
if (level > 1 && !dma_pte_superpage(pte))
-   freelist = dma_pte_list_pagetables(domain, 
level - 1, pte, freelist);
+   dma_pte_list_pagetables(domain, level - 1, pte, 
freelist);
 
dma_clear_pte(pte);
if (!first_pte)
@@ -1360,10 +1355,10 @@ static struct page *dma_pte_clear_level(struct 
dmar_domain *domain, int level,
last_pte = pte;
} else if (level > 1) {
/* Recurse down into a level that isn't *entirely* 
obsolete */
-   freelist = dma_pte_clear_level(domain, level - 1,
-  
phys_to_virt(dma_pte_addr(pte)),
-  level_pfn, start_pfn, 
last_pfn,
-  freelist);
+   dma_pte_clear_level(domain, level - 1,
+   phys_to_virt(dma_pte_addr(pte)),
+

[PATCH 5/9] iommu/amd: Use put_pages_list

2021-11-23 Thread Robin Murphy
From: "Matthew Wilcox (Oracle)" 

page->freelist is for the use of slab.  We already have the ability
to free a list of pages in the core mm, but it requires the use of a
list_head and for the pages to be chained together through page->lru.
Switch the AMD IOMMU code over to using put_pages_list().

Signed-off-by: Matthew Wilcox (Oracle) 
[rm: split from original patch, cosmetic tweaks]
Signed-off-by: Robin Murphy 
---
 drivers/iommu/amd/io_pgtable.c | 50 --
 1 file changed, 18 insertions(+), 32 deletions(-)

diff --git a/drivers/iommu/amd/io_pgtable.c b/drivers/iommu/amd/io_pgtable.c
index f92ecb3e21d7..be2eba61b4d3 100644
--- a/drivers/iommu/amd/io_pgtable.c
+++ b/drivers/iommu/amd/io_pgtable.c
@@ -74,26 +74,14 @@ static u64 *first_pte_l7(u64 *pte, unsigned long *page_size,
  *
  /
 
-static void free_page_list(struct page *freelist)
-{
-   while (freelist != NULL) {
-   unsigned long p = (unsigned long)page_address(freelist);
-
-   freelist = freelist->freelist;
-   free_page(p);
-   }
-}
-
-static struct page *free_pt_page(u64 *pt, struct page *freelist)
+static void free_pt_page(u64 *pt, struct list_head *freelist)
 {
struct page *p = virt_to_page(pt);
 
-   p->freelist = freelist;
-
-   return p;
+   list_add_tail(>lru, freelist);
 }
 
-static struct page *free_pt_lvl(u64 *pt, struct page *freelist, int lvl)
+static void free_pt_lvl(u64 *pt, struct list_head *freelist, int lvl)
 {
u64 *p;
int i;
@@ -110,22 +98,22 @@ static struct page *free_pt_lvl(u64 *pt, struct page 
*freelist, int lvl)
 
p = IOMMU_PTE_PAGE(pt[i]);
if (lvl > 2)
-   freelist = free_pt_lvl(p, freelist, lvl - 1);
+   free_pt_lvl(p, freelist, lvl - 1);
else
-   freelist = free_pt_page(p, freelist);
+   free_pt_page(p, freelist);
}
 
-   return free_pt_page(pt, freelist);
+   free_pt_page(pt, freelist);
 }
 
-static struct page *free_sub_pt(u64 *root, int mode, struct page *freelist)
+static void free_sub_pt(u64 *root, int mode, struct list_head *freelist)
 {
switch (mode) {
case PAGE_MODE_NONE:
case PAGE_MODE_7_LEVEL:
break;
case PAGE_MODE_1_LEVEL:
-   freelist = free_pt_page(root, freelist);
+   free_pt_page(root, freelist);
break;
case PAGE_MODE_2_LEVEL:
case PAGE_MODE_3_LEVEL:
@@ -137,8 +125,6 @@ static struct page *free_sub_pt(u64 *root, int mode, struct 
page *freelist)
default:
BUG();
}
-
-   return freelist;
 }
 
 void amd_iommu_domain_set_pgtable(struct protection_domain *domain,
@@ -346,7 +332,7 @@ static u64 *fetch_pte(struct amd_io_pgtable *pgtable,
return pte;
 }
 
-static struct page *free_clear_pte(u64 *pte, u64 pteval, struct page *freelist)
+static void free_clear_pte(u64 *pte, u64 pteval, struct list_head *freelist)
 {
u64 *pt;
int mode;
@@ -357,12 +343,12 @@ static struct page *free_clear_pte(u64 *pte, u64 pteval, 
struct page *freelist)
}
 
if (!IOMMU_PTE_PRESENT(pteval))
-   return freelist;
+   return;
 
pt   = IOMMU_PTE_PAGE(pteval);
mode = IOMMU_PTE_MODE(pteval);
 
-   return free_sub_pt(pt, mode, freelist);
+   free_sub_pt(pt, mode, freelist);
 }
 
 /*
@@ -376,7 +362,7 @@ static int iommu_v1_map_page(struct io_pgtable_ops *ops, 
unsigned long iova,
  phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
 {
struct protection_domain *dom = io_pgtable_ops_to_domain(ops);
-   struct page *freelist = NULL;
+   LIST_HEAD(freelist);
bool updated = false;
u64 __pte, *pte;
int ret, i, count;
@@ -396,9 +382,9 @@ static int iommu_v1_map_page(struct io_pgtable_ops *ops, 
unsigned long iova,
goto out;
 
for (i = 0; i < count; ++i)
-   freelist = free_clear_pte([i], pte[i], freelist);
+   free_clear_pte([i], pte[i], );
 
-   if (freelist != NULL)
+   if (!list_empty())
updated = true;
 
if (count > 1) {
@@ -433,7 +419,7 @@ static int iommu_v1_map_page(struct io_pgtable_ops *ops, 
unsigned long iova,
}
 
/* Everything flushed out, free pages now */
-   free_page_list(freelist);
+   put_pages_list();
 
return ret;
 }
@@ -495,7 +481,7 @@ static void v1_free_pgtable(struct io_pgtable *iop)
 {
struct amd_io_pgtable *pgtable = container_of(iop, struct 
amd_io_pgtable, iop);
struct protection_domain *dom;
-   struct page *freelist = NULL;
+   LIST_HEAD(freelist);
 
if (pgtable->mode == PAGE_MODE_NONE)
return;
@@ -512,9 +498,9 @@ static void v1_free_pgtable(struct 

[PATCH 4/9] iommu/amd: Simplify pagetable freeing

2021-11-23 Thread Robin Murphy
For reasons unclear, pagetable freeing is an effectively recursive
method implemented via an elaborate system of templated functions that
turns out to account for 25% of the object file size. Implementing it
using regular straightforward recursion makes the code simpler, and
seems like a good thing to do before we work on it further. As part of
that, also fix the types to avoid all the needless casting back and
forth which just gets in the way.

Signed-off-by: Robin Murphy 
---
 drivers/iommu/amd/io_pgtable.c | 78 +-
 1 file changed, 30 insertions(+), 48 deletions(-)

diff --git a/drivers/iommu/amd/io_pgtable.c b/drivers/iommu/amd/io_pgtable.c
index 182c93a43efd..f92ecb3e21d7 100644
--- a/drivers/iommu/amd/io_pgtable.c
+++ b/drivers/iommu/amd/io_pgtable.c
@@ -84,49 +84,41 @@ static void free_page_list(struct page *freelist)
}
 }
 
-static struct page *free_pt_page(unsigned long pt, struct page *freelist)
+static struct page *free_pt_page(u64 *pt, struct page *freelist)
 {
-   struct page *p = virt_to_page((void *)pt);
+   struct page *p = virt_to_page(pt);
 
p->freelist = freelist;
 
return p;
 }
 
-#define DEFINE_FREE_PT_FN(LVL, FN) 
\
-static struct page *free_pt_##LVL (unsigned long __pt, struct page *freelist)  
\
-{  
\
-   unsigned long p;
\
-   u64 *pt;
\
-   int i;  
\
-   
\
-   pt = (u64 *)__pt;   
\
-   
\
-   for (i = 0; i < 512; ++i) { 
\
-   /* PTE present? */  
\
-   if (!IOMMU_PTE_PRESENT(pt[i]))  
\
-   continue;   
\
-   
\
-   /* Large PTE? */
\
-   if (PM_PTE_LEVEL(pt[i]) == 0 || 
\
-   PM_PTE_LEVEL(pt[i]) == 7)   
\
-   continue;   
\
-   
\
-   p = (unsigned long)IOMMU_PTE_PAGE(pt[i]);   
\
-   freelist = FN(p, freelist); 
\
-   }   
\
-   
\
-   return free_pt_page((unsigned long)pt, freelist);   
\
+static struct page *free_pt_lvl(u64 *pt, struct page *freelist, int lvl)
+{
+   u64 *p;
+   int i;
+
+   for (i = 0; i < 512; ++i) {
+   /* PTE present? */
+   if (!IOMMU_PTE_PRESENT(pt[i]))
+   continue;
+
+   /* Large PTE? */
+   if (PM_PTE_LEVEL(pt[i]) == 0 ||
+   PM_PTE_LEVEL(pt[i]) == 7)
+   continue;
+
+   p = IOMMU_PTE_PAGE(pt[i]);
+   if (lvl > 2)
+   freelist = free_pt_lvl(p, freelist, lvl - 1);
+   else
+   freelist = free_pt_page(p, freelist);
+   }
+
+   return free_pt_page(pt, freelist);
 }
 
-DEFINE_FREE_PT_FN(l2, free_pt_page)
-DEFINE_FREE_PT_FN(l3, free_pt_l2)
-DEFINE_FREE_PT_FN(l4, free_pt_l3)
-DEFINE_FREE_PT_FN(l5, free_pt_l4)
-DEFINE_FREE_PT_FN(l6, free_pt_l5)
-
-static struct page *free_sub_pt(unsigned long root, int mode,
-   struct page *freelist)
+static struct page *free_sub_pt(u64 *root, int mode, struct page *freelist)
 {
switch (mode) {
case PAGE_MODE_NONE:
@@ -136,19 +128,11 @@ static struct page *free_sub_pt(unsigned long root, int 
mode,
freelist = free_pt_page(root, freelist);
break;
case PAGE_MODE_2_LEVEL:
-   freelist = free_pt_l2(root, freelist);
-   break;
case PAGE_MODE_3_LEVEL:
-   freelist = free_pt_l3(root, freelist);
-   break;
case PAGE_MODE_4_LEVEL:
-   freelist = free_pt_l4(root, freelist);
-   break;
case PAGE_MODE_5_LEVEL:
-   freelist = free_pt_l5(root, freelist);
-   break;
case PAGE_MODE_6_LEVEL:
-   freelist = free_pt_l6(root, 

[PATCH 3/9] iommu/iova: Squash flush_cb abstraction

2021-11-23 Thread Robin Murphy
Once again, with iommu-dma now being the only flush queue user, we no
longer need the extra level of indirection through flush_cb. Squash that
and let the flush queue code call the domain method directly.

Signed-off-by: Robin Murphy 
---
 drivers/iommu/dma-iommu.c | 13 +
 drivers/iommu/iova.c  | 11 +--
 include/linux/iova.h  | 11 +++
 3 files changed, 9 insertions(+), 26 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index fa21b9141b71..cde887530549 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -282,17 +282,6 @@ static int iova_reserve_iommu_regions(struct device *dev,
return ret;
 }
 
-static void iommu_dma_flush_iotlb_all(struct iova_domain *iovad)
-{
-   struct iommu_dma_cookie *cookie;
-   struct iommu_domain *domain;
-
-   cookie = container_of(iovad, struct iommu_dma_cookie, iovad);
-   domain = cookie->fq_domain;
-
-   domain->ops->flush_iotlb_all(domain);
-}
-
 static bool dev_is_untrusted(struct device *dev)
 {
return dev_is_pci(dev) && to_pci_dev(dev)->untrusted;
@@ -312,7 +301,7 @@ int iommu_dma_init_fq(struct iommu_domain *domain)
if (cookie->fq_domain)
return 0;
 
-   ret = init_iova_flush_queue(>iovad, iommu_dma_flush_iotlb_all);
+   ret = init_iova_flush_queue(>iovad, domain);
if (ret) {
pr_warn("iova flush queue initialization failed\n");
return ret;
diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 982e2779b981..7619ccb726cc 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -63,7 +63,7 @@ init_iova_domain(struct iova_domain *iovad, unsigned long 
granule,
iovad->start_pfn = start_pfn;
iovad->dma_32bit_pfn = 1UL << (32 - iova_shift(iovad));
iovad->max32_alloc_size = iovad->dma_32bit_pfn;
-   iovad->flush_cb = NULL;
+   iovad->fq_domain = NULL;
iovad->fq = NULL;
iovad->anchor.pfn_lo = iovad->anchor.pfn_hi = IOVA_ANCHOR;
rb_link_node(>anchor.node, NULL, >rbroot.rb_node);
@@ -91,10 +91,10 @@ static void free_iova_flush_queue(struct iova_domain *iovad)
free_percpu(iovad->fq);
 
iovad->fq = NULL;
-   iovad->flush_cb   = NULL;
+   iovad->fq_domain  = NULL;
 }
 
-int init_iova_flush_queue(struct iova_domain *iovad, iova_flush_cb flush_cb)
+int init_iova_flush_queue(struct iova_domain *iovad, struct iommu_domain 
*fq_domain)
 {
struct iova_fq __percpu *queue;
int cpu;
@@ -106,8 +106,6 @@ int init_iova_flush_queue(struct iova_domain *iovad, 
iova_flush_cb flush_cb)
if (!queue)
return -ENOMEM;
 
-   iovad->flush_cb   = flush_cb;
-
for_each_possible_cpu(cpu) {
struct iova_fq *fq;
 
@@ -118,6 +116,7 @@ int init_iova_flush_queue(struct iova_domain *iovad, 
iova_flush_cb flush_cb)
spin_lock_init(>lock);
}
 
+   iovad->fq_domain = fq_domain;
iovad->fq = queue;
 
timer_setup(>fq_timer, fq_flush_timeout, 0);
@@ -590,7 +589,7 @@ static void fq_ring_free(struct iova_domain *iovad, struct 
iova_fq *fq)
 static void iova_domain_flush(struct iova_domain *iovad)
 {
atomic64_inc(&iovad->fq_flush_start_cnt);
-   iovad->flush_cb(iovad);
+   iovad->fq_domain->ops->flush_iotlb_all(iovad->fq_domain);
atomic64_inc(&iovad->fq_flush_finish_cnt);
 }
 
diff --git a/include/linux/iova.h b/include/linux/iova.h
index e746d8e41449..99be4fcea4f3 100644
--- a/include/linux/iova.h
+++ b/include/linux/iova.h
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /* iova structure */
 struct iova {
@@ -35,11 +36,6 @@ struct iova_rcache {
struct iova_cpu_rcache __percpu *cpu_rcaches;
 };
 
-struct iova_domain;
-
-/* Call-Back from IOVA code into IOMMU drivers */
-typedef void (* iova_flush_cb)(struct iova_domain *domain);
-
 /* Number of entries per Flush Queue */
 #define IOVA_FQ_SIZE   256
 
@@ -82,8 +78,7 @@ struct iova_domain {
struct iova anchor; /* rbtree lookup anchor */
struct iova_rcache rcaches[IOVA_RANGE_CACHE_MAX_SIZE];  /* IOVA range 
caches */
 
-   iova_flush_cb   flush_cb;   /* Call-Back function to flush IOMMU
-  TLBs */
+   struct iommu_domain *fq_domain;
 
struct timer_list fq_timer; /* Timer to regularily empty the
   flush-queues */
@@ -147,7 +142,7 @@ struct iova *reserve_iova(struct iova_domain *iovad, 
unsigned long pfn_lo,
unsigned long pfn_hi);
 void init_iova_domain(struct iova_domain *iovad, unsigned long granule,
unsigned long start_pfn);
-int init_iova_flush_queue(struct iova_domain *iovad, iova_flush_cb flush_cb);
+int init_iova_flush_queue(struct iova_domain *iovad, struct iommu_domain *fq_domain);
 struct iova *find_iova(struct iova_domain *iovad, unsigned long pfn);
 void 

[PATCH 2/9] iommu/iova: Squash entry_dtor abstraction

2021-11-23 Thread Robin Murphy
All flush queues are driven by iommu-dma now, so there is no need to
abstract entry_dtor or its data any more. Squash the now-canonical
implementation directly into the IOVA code to get it out of the way.

Signed-off-by: Robin Murphy 
---
 drivers/iommu/dma-iommu.c | 17 ++---
 drivers/iommu/iova.c  | 28 +++-
 include/linux/iova.h  | 26 +++---
 3 files changed, 20 insertions(+), 51 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index b42e38a0dbe2..fa21b9141b71 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -64,18 +64,6 @@ static int __init iommu_dma_forcedac_setup(char *str)
 }
 early_param("iommu.forcedac", iommu_dma_forcedac_setup);
 
-static void iommu_dma_entry_dtor(unsigned long data)
-{
-   struct page *freelist = (struct page *)data;
-
-   while (freelist) {
-   unsigned long p = (unsigned long)page_address(freelist);
-
-   freelist = freelist->freelist;
-   free_page(p);
-   }
-}
-
 static inline size_t cookie_msi_granule(struct iommu_dma_cookie *cookie)
 {
if (cookie->type == IOMMU_DMA_IOVA_COOKIE)
@@ -324,8 +312,7 @@ int iommu_dma_init_fq(struct iommu_domain *domain)
if (cookie->fq_domain)
return 0;
 
-   ret = init_iova_flush_queue(&cookie->iovad, iommu_dma_flush_iotlb_all,
-   iommu_dma_entry_dtor);
+   ret = init_iova_flush_queue(&cookie->iovad, iommu_dma_flush_iotlb_all);
if (ret) {
pr_warn("iova flush queue initialization failed\n");
return ret;
@@ -479,7 +466,7 @@ static void iommu_dma_free_iova(struct iommu_dma_cookie 
*cookie,
else if (gather && gather->queued)
queue_iova(iovad, iova_pfn(iovad, iova),
size >> iova_shift(iovad),
-   (unsigned long)gather->freelist);
+   gather->freelist);
else
free_iova_fast(iovad, iova_pfn(iovad, iova),
size >> iova_shift(iovad));
diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 9e8bc802ac05..982e2779b981 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -92,11 +92,9 @@ static void free_iova_flush_queue(struct iova_domain *iovad)
 
iovad->fq = NULL;
iovad->flush_cb   = NULL;
-   iovad->entry_dtor = NULL;
 }
 
-int init_iova_flush_queue(struct iova_domain *iovad,
- iova_flush_cb flush_cb, iova_entry_dtor entry_dtor)
+int init_iova_flush_queue(struct iova_domain *iovad, iova_flush_cb flush_cb)
 {
struct iova_fq __percpu *queue;
int cpu;
@@ -109,7 +107,6 @@ int init_iova_flush_queue(struct iova_domain *iovad,
return -ENOMEM;
 
iovad->flush_cb   = flush_cb;
-   iovad->entry_dtor = entry_dtor;
 
for_each_possible_cpu(cpu) {
struct iova_fq *fq;
@@ -539,6 +536,16 @@ free_iova_fast(struct iova_domain *iovad, unsigned long 
pfn, unsigned long size)
 }
 EXPORT_SYMBOL_GPL(free_iova_fast);
 
+static void fq_entry_dtor(struct page *freelist)
+{
+   while (freelist) {
+   unsigned long p = (unsigned long)page_address(freelist);
+
+   freelist = freelist->freelist;
+   free_page(p);
+   }
+}
+
 #define fq_ring_for_each(i, fq) \
for ((i) = (fq)->head; (i) != (fq)->tail; (i) = ((i) + 1) % 
IOVA_FQ_SIZE)
 
@@ -571,9 +578,7 @@ static void fq_ring_free(struct iova_domain *iovad, struct 
iova_fq *fq)
if (fq->entries[idx].counter >= counter)
break;
 
-   if (iovad->entry_dtor)
-   iovad->entry_dtor(fq->entries[idx].data);
-
+   fq_entry_dtor(fq->entries[idx].freelist);
free_iova_fast(iovad,
   fq->entries[idx].iova_pfn,
   fq->entries[idx].pages);
@@ -598,15 +603,12 @@ static void fq_destroy_all_entries(struct iova_domain 
*iovad)
 * bother to free iovas, just call the entry_dtor on all remaining
 * entries.
 */
-   if (!iovad->entry_dtor)
-   return;
-
for_each_possible_cpu(cpu) {
struct iova_fq *fq = per_cpu_ptr(iovad->fq, cpu);
int idx;
 
fq_ring_for_each(idx, fq)
-   iovad->entry_dtor(fq->entries[idx].data);
+   fq_entry_dtor(fq->entries[idx].freelist);
}
 }
 
@@ -631,7 +633,7 @@ static void fq_flush_timeout(struct timer_list *t)
 
 void queue_iova(struct iova_domain *iovad,
unsigned long pfn, unsigned long pages,
-   unsigned long data)
+   struct page *freelist)
 {
struct iova_fq *fq;
unsigned long flags;
@@ -665,7 +667,7 @@ void queue_iova(struct iova_domain *iovad,
 

[PATCH 1/9] gpu: host1x: Add missing DMA API include

2021-11-23 Thread Robin Murphy
Host1x seems to be relying on picking up dma-mapping.h transitively from
iova.h, which has no reason to include it in the first place. Fix the
former issue before we totally break things by fixing the latter one.

CC: Thierry Reding 
CC: Mikko Perttunen 
CC: dri-de...@lists.freedesktop.org
CC: linux-te...@vger.kernel.org
Signed-off-by: Robin Murphy 
---

Feel free to pick this into drm-misc-next or drm-misc-fixes straight
away if that suits - it's only to avoid a build breakage once the rest
of the series gets queued.

Robin.

 drivers/gpu/host1x/bus.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/host1x/bus.c b/drivers/gpu/host1x/bus.c
index 218e3718fd68..881fad5c3307 100644
--- a/drivers/gpu/host1x/bus.c
+++ b/drivers/gpu/host1x/bus.c
@@ -5,6 +5,7 @@
  */
 
 #include 
+#include <linux/dma-mapping.h>
 #include 
 #include 
 #include 
-- 
2.28.0.dirty



[PATCH 0/9] iommu: Refactor flush queues into iommu-dma

2021-11-23 Thread Robin Murphy
Hi all,

As promised, this series cleans up the flush queue code and streamlines
it directly into iommu-dma. Since we no longer have per-driver DMA ops
implementations, a lot of the abstraction is now no longer necessary, so
there's a nice degree of simplification in the process. Un-abstracting
the queued page freeing mechanism is also the perfect opportunity to
revise which struct page fields we use so we can be better-behaved
from the MM point of view, thanks to Matthew.
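
As a rough sketch of that direction - hedged, since it only assumes the
page->lru linkage that put_pages_list() operates on, and the local names
here are hypothetical rather than taken from the patches:

    #include <linux/list.h>
    #include <linux/mm.h>   /* put_pages_list() */

    LIST_HEAD(freelist);
    struct page *pg;        /* a page just unhooked from the pagetable */

    /* pages to be freed are chained on a plain list... */
    list_add_tail(&pg->lru, &freelist);

    /* ...and released in one call once the IOTLB flush has completed */
    put_pages_list(&freelist);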

These changes should also make it viable to start using the gather
freelist in io-pgtable-arm, and eliminate some more synchronous
invalidations from the normal flow there, but that is proving to need a
bit more careful thought than I have time for in this cycle, so I've
parked that again for now and will revisit it in the new year.

For convenience, branch at:
  https://gitlab.arm.com/linux-arm/linux-rm/-/tree/iommu/iova

I've build-tested for x86_64, and boot-tested arm64 to the point of
confirming that put_pages_list() gets passed a valid empty list when
flushing, while everything else still works.

Cheers,
Robin.


Matthew Wilcox (Oracle) (2):
  iommu/amd: Use put_pages_list
  iommu/vt-d: Use put_pages_list

Robin Murphy (7):
  gpu: host1x: Add missing DMA API include
  iommu/iova: Squash entry_dtor abstraction
  iommu/iova: Squash flush_cb abstraction
  iommu/amd: Simplify pagetable freeing
  iommu/iova: Consolidate flush queue code
  iommu/iova: Move flush queue code to iommu-dma
  iommu: Move flush queue data into iommu_dma_cookie

 drivers/gpu/host1x/bus.c   |   1 +
 drivers/iommu/amd/io_pgtable.c | 116 ++
 drivers/iommu/dma-iommu.c  | 266 +++--
 drivers/iommu/intel/iommu.c|  89 ---
 drivers/iommu/iova.c   | 200 -
 include/linux/iommu.h  |   3 +-
 include/linux/iova.h   |  69 +
 7 files changed, 295 insertions(+), 449 deletions(-)

-- 
2.28.0.dirty



Re: [PATCH 0/3] Allow restricted-dma-pool to customize IO_TLB_SEGSIZE

2021-11-23 Thread Robin Murphy

On 2021-11-23 11:21, Hsin-Yi Wang wrote:

Default IO_TLB_SEGSIZE (128) slabs may not be enough for some use cases.
This series adds support to customize io_tlb_segsize for each
restricted-dma-pool.

Example use case:

mtk-isp drivers[1] are controlled by mtk-scp[2] and allocate memory through
mtk-scp. In order to use the noncontiguous DMA API[3], we need to use
the swiotlb pool. mtk-scp needs to allocate memory with 2560 slabs.
mtk-isp drivers also need to allocate memory with 200+ slabs. Both are
larger than the default IO_TLB_SEGSIZE (128) slabs.


Are drivers really doing streaming DMA mappings that large? If so, that 
seems like it might be worth trying to address in its own right for the 
sake of efficiency - allocating ~5MB of memory twice and copying it back 
and forth doesn't sound like the ideal thing to do.


If it's really about coherent DMA buffer allocation, I thought the plan 
was that devices which expect to use a significant amount and/or size of 
coherent buffers would continue to use a shared-dma-pool for that? It's 
still what the binding implies. My understanding was that 
swiotlb_alloc() is mostly just a fallback for the sake of drivers which 
mostly do streaming DMA but may allocate a handful of pages worth of 
coherent buffers here and there. Certainly looking at the mtk_scp 
driver, that seems like it shouldn't be going anywhere near SWIOTLB at all.
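
To make the distinction concrete, a minimal sketch in generic DMA API terms
(dev, buf and size are hypothetical driver-local names, not anything from
this series):

    /* Streaming DMA: map existing kernel memory around each transfer.
     * With a restricted pool this bounces the data through SWIOTLB. */
    dma_addr_t dma = dma_map_single(dev, buf, size, DMA_BIDIRECTIONAL);
    /* ... device works on the buffer ... */
    dma_unmap_single(dev, dma, size, DMA_BIDIRECTIONAL);

    /* Coherent DMA: allocate a long-lived buffer up front. With a
     * shared-dma-pool this comes straight out of the reserved region. */
    void *cpu = dma_alloc_coherent(dev, size, &dma, GFP_KERNEL);

With a restricted-dma-pool both paths can end up in swiotlb_find_slots(),
which is where the per-segment slab limit this series makes configurable
comes into play.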


Robin.


[1] (not in upstream) 
https://patchwork.kernel.org/project/linux-media/cover/20190611035344.29814-1-jungo@mediatek.com/
[2] https://elixir.bootlin.com/linux/latest/source/drivers/remoteproc/mtk_scp.c
[3] 
https://patchwork.kernel.org/project/linux-media/cover/20210909112430.61243-1-senozhat...@chromium.org/

Hsin-Yi Wang (3):
   dma: swiotlb: Allow restricted-dma-pool to customize IO_TLB_SEGSIZE
   dt-bindings: Add io-tlb-segsize property for restricted-dma-pool
   arm64: dts: mt8183: use restricted swiotlb for scp mem

  .../reserved-memory/shared-dma-pool.yaml  |  8 +
  .../arm64/boot/dts/mediatek/mt8183-kukui.dtsi |  4 +--
  include/linux/swiotlb.h   |  1 +
  kernel/dma/swiotlb.c  | 34 ++-
  4 files changed, 37 insertions(+), 10 deletions(-)




[PATCH 2/3] dt-bindings: Add io-tlb-segsize property for restricted-dma-pool

2021-11-23 Thread Hsin-Yi Wang
Add an io-tlb-segsize property so that each restricted-dma-pool can set its
own io_tlb_segsize, since some use cases require more slabs than the default
value (128).

Signed-off-by: Hsin-Yi Wang 
---
 .../bindings/reserved-memory/shared-dma-pool.yaml | 8 
 1 file changed, 8 insertions(+)

diff --git 
a/Documentation/devicetree/bindings/reserved-memory/shared-dma-pool.yaml 
b/Documentation/devicetree/bindings/reserved-memory/shared-dma-pool.yaml
index a4bf757d6881de..6198bf6b76f0b2 100644
--- a/Documentation/devicetree/bindings/reserved-memory/shared-dma-pool.yaml
+++ b/Documentation/devicetree/bindings/reserved-memory/shared-dma-pool.yaml
@@ -56,6 +56,14 @@ properties:
   If this property is present, then Linux will use the region for
   the default pool of the consistent DMA allocator.
 
+  io-tlb-segsize:
+type: u32
+description: >
+  Each restricted-dma-pool can use this property to set its own
+  io_tlb_segsize. If not set, it will use the default value
+  IO_TLB_SEGSIZE defined in include/linux/swiotlb.h. The value has
+  to be a power of 2, otherwise it will fall back to IO_TLB_SEGSIZE.
+
 unevaluatedProperties: false
 
 examples:
-- 
2.34.0.rc2.393.gf8c9666880-goog



[PATCH 3/3] arm64: dts: mt8183: use restricted swiotlb for scp mem

2021-11-23 Thread Hsin-Yi Wang
Use restricted-dma-pool for mtk_scp's reserved memory, and set the
io-tlb-segsize to 4096 (the next power of two above 2560) since the driver
needs at least 2560 slabs to allocate memory.

Signed-off-by: Hsin-Yi Wang 
---
 arch/arm64/boot/dts/mediatek/mt8183-kukui.dtsi | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/boot/dts/mediatek/mt8183-kukui.dtsi 
b/arch/arm64/boot/dts/mediatek/mt8183-kukui.dtsi
index 94c13c45919445..de94b2fd7f33e7 100644
--- a/arch/arm64/boot/dts/mediatek/mt8183-kukui.dtsi
+++ b/arch/arm64/boot/dts/mediatek/mt8183-kukui.dtsi
@@ -109,9 +109,9 @@ reserved_memory: reserved-memory {
ranges;
 
scp_mem_reserved: scp_mem_region {
-   compatible = "shared-dma-pool";
+   compatible = "restricted-dma-pool";
reg = <0 0x5000 0 0x290>;
-   no-map;
+   io-tlb-segsize = <4096>;
};
};
 
-- 
2.34.0.rc2.393.gf8c9666880-goog



[PATCH 1/3] dma: swiotlb: Allow restricted-dma-pool to customize IO_TLB_SEGSIZE

2021-11-23 Thread Hsin-Yi Wang
The default IO_TLB_SEGSIZE is 128, but some use cases require more slabs;
otherwise swiotlb_find_slots() will fail.

This patch allows each memory pool to set its own io-tlb-segsize through a
devicetree property.
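
For scale, assuming the usual 2 KiB IO_TLB_SIZE (1 << IO_TLB_SHIFT):

  128 slabs  * 2 KiB = 256 KiB   (largest mapping with the default segsize)
  2560 slabs * 2 KiB = 5 MiB     (what mtk-scp needs for a single allocation)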

Signed-off-by: Hsin-Yi Wang 
---
 include/linux/swiotlb.h |  1 +
 kernel/dma/swiotlb.c| 34 ++
 2 files changed, 27 insertions(+), 8 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 569272871375c4..73b3312f23e65b 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -95,6 +95,7 @@ struct io_tlb_mem {
unsigned long nslabs;
unsigned long used;
unsigned int index;
+   unsigned int io_tlb_segsize;
spinlock_t lock;
struct dentry *debugfs;
bool late_alloc;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 8e840fbbed7c7a..021eef1844ca4c 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -145,9 +145,10 @@ void swiotlb_print_info(void)
   (mem->nslabs << IO_TLB_SHIFT) >> 20);
 }
 
-static inline unsigned long io_tlb_offset(unsigned long val)
+static inline unsigned long io_tlb_offset(unsigned long val,
+ unsigned long io_tlb_segsize)
 {
-   return val & (IO_TLB_SEGSIZE - 1);
+   return val & (io_tlb_segsize - 1);
 }
 
 static inline unsigned long nr_slots(u64 val)
@@ -186,13 +187,16 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem 
*mem, phys_addr_t start,
mem->end = mem->start + bytes;
mem->index = 0;
mem->late_alloc = late_alloc;
+   if (!mem->io_tlb_segsize)
+   mem->io_tlb_segsize = IO_TLB_SEGSIZE;
 
if (swiotlb_force == SWIOTLB_FORCE)
mem->force_bounce = true;
 
spin_lock_init(&mem->lock);
for (i = 0; i < mem->nslabs; i++) {
-   mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
+   mem->slots[i].list = mem->io_tlb_segsize -
+io_tlb_offset(i, mem->io_tlb_segsize);
mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
mem->slots[i].alloc_size = 0;
}
@@ -523,7 +527,7 @@ static int swiotlb_find_slots(struct device *dev, 
phys_addr_t orig_addr,
alloc_size - (offset + ((i - index) << IO_TLB_SHIFT));
}
for (i = index - 1;
-io_tlb_offset(i) != IO_TLB_SEGSIZE - 1 &&
+io_tlb_offset(i, mem->io_tlb_segsize) != mem->io_tlb_segsize - 1 &&
 mem->slots[i].list; i--)
mem->slots[i].list = ++count;
 
@@ -603,7 +607,7 @@ static void swiotlb_release_slots(struct device *dev, 
phys_addr_t tlb_addr)
 * with slots below and above the pool being returned.
 */
spin_lock_irqsave(&mem->lock, flags);
-   if (index + nslots < ALIGN(index + 1, IO_TLB_SEGSIZE))
+   if (index + nslots < ALIGN(index + 1, mem->io_tlb_segsize))
count = mem->slots[index + nslots].list;
else
count = 0;
@@ -623,8 +627,8 @@ static void swiotlb_release_slots(struct device *dev, 
phys_addr_t tlb_addr)
 * available (non zero)
 */
for (i = index - 1;
-io_tlb_offset(i) != IO_TLB_SEGSIZE - 1 && mem->slots[i].list;
-i--)
+io_tlb_offset(i, mem->io_tlb_segsize) != mem->io_tlb_segsize - 1 &&
+mem->slots[i].list; i--)
mem->slots[i].list = ++count;
mem->used -= nslots;
spin_unlock_irqrestore(&mem->lock, flags);
@@ -701,7 +705,9 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t 
paddr, size_t size,
 
 size_t swiotlb_max_mapping_size(struct device *dev)
 {
-   return ((size_t)IO_TLB_SIZE) * IO_TLB_SEGSIZE;
+   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
+
+   return ((size_t)IO_TLB_SIZE) * mem->io_tlb_segsize;
 }
 
 bool is_swiotlb_active(struct device *dev)
@@ -788,6 +794,7 @@ static int rmem_swiotlb_device_init(struct reserved_mem 
*rmem,
 {
struct io_tlb_mem *mem = rmem->priv;
unsigned long nslabs = rmem->size >> IO_TLB_SHIFT;
+   struct device_node *np;
 
/*
 * Since multiple devices can share the same pool, the private data,
@@ -808,6 +815,17 @@ static int rmem_swiotlb_device_init(struct reserved_mem 
*rmem,
 
set_memory_decrypted((unsigned long)phys_to_virt(rmem->base),
 rmem->size >> PAGE_SHIFT);
+
+   np = of_find_node_by_phandle(rmem->phandle);
+   if (np) {
+   if (!of_property_read_u32(np, "io-tlb-segsize",
+ &mem->io_tlb_segsize)) {
+   if (hweight32(mem->io_tlb_segsize) != 1)
+   mem->io_tlb_segsize = IO_TLB_SEGSIZE;
+   }
+   of_node_put(np);
+   }
+
swiotlb_init_io_tlb_mem(mem, rmem->base, 

[PATCH 0/3] Allow restricted-dma-pool to customize IO_TLB_SEGSIZE

2021-11-23 Thread Hsin-Yi Wang
Default IO_TLB_SEGSIZE (128) slabs may not be enough for some use cases.
This series adds support to customize io_tlb_segsize for each
restricted-dma-pool.

Example use case:

mtk-isp drivers[1] are controlled by mtk-scp[2] and allocate memory through
mtk-scp. In order to use the noncontiguous DMA API[3], we need to use
the swiotlb pool. mtk-scp needs to allocate memory with 2560 slabs.
mtk-isp drivers also need to allocate memory with 200+ slabs. Both are
larger than the default IO_TLB_SEGSIZE (128) slabs.

[1] (not in upstream) 
https://patchwork.kernel.org/project/linux-media/cover/20190611035344.29814-1-jungo@mediatek.com/
[2] https://elixir.bootlin.com/linux/latest/source/drivers/remoteproc/mtk_scp.c
[3] 
https://patchwork.kernel.org/project/linux-media/cover/20210909112430.61243-1-senozhat...@chromium.org/

Hsin-Yi Wang (3):
  dma: swiotlb: Allow restricted-dma-pool to customize IO_TLB_SEGSIZE
  dt-bindings: Add io-tlb-segsize property for restricted-dma-pool
  arm64: dts: mt8183: use restricted swiotlb for scp mem

 .../reserved-memory/shared-dma-pool.yaml  |  8 +
 .../arm64/boot/dts/mediatek/mt8183-kukui.dtsi |  4 +--
 include/linux/swiotlb.h   |  1 +
 kernel/dma/swiotlb.c  | 34 ++-
 4 files changed, 37 insertions(+), 10 deletions(-)

-- 
2.34.0.rc2.393.gf8c9666880-goog



Re: [PATCH 0/2] iommu/vt-d: Fixes for v5.16-rc3

2021-11-23 Thread Joerg Roedel
On Mon, Nov 22, 2021 at 11:24:56AM +0800, Lu Baolu wrote:
> Alex Williamson (1):
>   iommu/vt-d: Fix unmap_pages support
> 
> Christophe JAILLET (1):
>   iommu/vt-d: Fix an unbalanced rcu_read_lock/rcu_read_unlock()

Queued for v5.16, thanks.


Re: [PATCH 1/2] iommu/vt-d: Remove unused PASID_DISABLED

2021-11-23 Thread Lu Baolu

On 2021/11/23 18:55, Joerg Roedel wrote:

From: Joerg Roedel 

The macro is unused after commit 00ecd5401349a so it can be removed.

Reported-by: Linus Torvalds 
Fixes: 00ecd5401349a ("iommu/vt-d: Clean up unused PASID updating functions")


Reviewed-by: Lu Baolu 

Best regards,
baolu


Signed-off-by: Joerg Roedel 
---
  arch/x86/include/asm/fpu/api.h | 6 --
  1 file changed, 6 deletions(-)

diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
index 6053674f9132..c2767a6a387e 100644
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -102,12 +102,6 @@ extern void switch_fpu_return(void);
   */
  extern int cpu_has_xfeatures(u64 xfeatures_mask, const char **feature_name);
  
-/*

- * Tasks that are not using SVA have mm->pasid set to zero to note that they
- * will not have the valid bit set in MSR_IA32_PASID while they are running.
- */
-#define PASID_DISABLED 0
-
  /* Trap handling */
  extern int  fpu__exception_code(struct fpu *fpu, int trap_nr);
  extern void fpu_sync_fpstate(struct fpu *fpu);




[PATCH 0/2] iommu: Two minor cleanups for v5.16

2021-11-23 Thread Joerg Roedel
From: Joerg Roedel 

Hi,

here are two minor cleanups in the IOMMU code for v5.16. If there are
no objections I will send them upstream this week.

Regards,

Joerg

Joerg Roedel (2):
  iommu/vt-d: Remove unused PASID_DISABLED
  iommu/amd: Clarify AMD IOMMUv2 initialization messages

 arch/x86/include/asm/fpu/api.h | 6 --
 drivers/iommu/amd/iommu_v2.c   | 6 +++---
 2 files changed, 3 insertions(+), 9 deletions(-)

-- 
2.33.1



[PATCH 2/2] iommu/amd: Clarify AMD IOMMUv2 initialization messages

2021-11-23 Thread Joerg Roedel
From: Joerg Roedel 

The messages printed on the initialization of the AMD IOMMUv2 driver
have caused some confusion in the past. Clarify the messages to avoid
such confusion in the future.

Cc: sta...@vger.kernel.org
Signed-off-by: Joerg Roedel 
---
 drivers/iommu/amd/iommu_v2.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/amd/iommu_v2.c b/drivers/iommu/amd/iommu_v2.c
index 13cbeb997cc1..58da08cc3d01 100644
--- a/drivers/iommu/amd/iommu_v2.c
+++ b/drivers/iommu/amd/iommu_v2.c
@@ -929,10 +929,8 @@ static int __init amd_iommu_v2_init(void)
 {
int ret;
 
-   pr_info("AMD IOMMUv2 driver by Joerg Roedel \n");
-
if (!amd_iommu_v2_supported()) {
-   pr_info("AMD IOMMUv2 functionality not available on this 
system\n");
+   pr_info("AMD IOMMUv2 functionality not available on this system 
- This is not a bug.\n");
/*
 * Load anyway to provide the symbols to other modules
 * which may use AMD IOMMUv2 optionally.
@@ -947,6 +945,8 @@ static int __init amd_iommu_v2_init(void)
 
amd_iommu_register_ppr_notifier(&ppr_nb);
 
+   pr_info("AMD IOMMUv2 loaded and initialized\n");
+
return 0;
 
 out:
-- 
2.33.1



[PATCH 1/2] iommu/vt-d: Remove unused PASID_DISABLED

2021-11-23 Thread Joerg Roedel
From: Joerg Roedel 

The macro is unused after commit 00ecd5401349a so it can be removed.

Reported-by: Linus Torvalds 
Fixes: 00ecd5401349a ("iommu/vt-d: Clean up unused PASID updating functions")
Signed-off-by: Joerg Roedel 
---
 arch/x86/include/asm/fpu/api.h | 6 --
 1 file changed, 6 deletions(-)

diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
index 6053674f9132..c2767a6a387e 100644
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -102,12 +102,6 @@ extern void switch_fpu_return(void);
  */
 extern int cpu_has_xfeatures(u64 xfeatures_mask, const char **feature_name);
 
-/*
- * Tasks that are not using SVA have mm->pasid set to zero to note that they
- * will not have the valid bit set in MSR_IA32_PASID while they are running.
- */
-#define PASID_DISABLED 0
-
 /* Trap handling */
 extern int  fpu__exception_code(struct fpu *fpu, int trap_nr);
 extern void fpu_sync_fpstate(struct fpu *fpu);
-- 
2.33.1
