Re: [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache

2021-07-28 Thread Sai Prakash Ranjan

Hi Georgi,

On 2021-07-28 19:30, Georgi Djakov wrote:

On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash Ranjan wrote:

commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag")
removed the unused IOMMU_SYS_CACHE_ONLY prot flag, and along with it went
the memory type setting required for non-coherent masters to use the
system cache. Now that system cache support for the GPU has been added,
we need to set the right PTE attribute for GPU buffers to be system
cached. Without this, system cache lines are not allocated for the GPU.

So the patches in this series introduce a new prot flag, IOMMU_LLC,
rename IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to IO_PGTABLE_QUIRK_PTW_LLC,
and make the GPU the user of this protection flag.
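
For illustration, a minimal sketch of a caller of the new flag (IOMMU_LLC
comes from patch 2 of this series; the helper itself is hypothetical):

	/* Map a GPU buffer non-coherently, but allocating in the LLC */
	static int msm_map_llc(struct iommu_domain *domain, unsigned long iova,
			       phys_addr_t paddr, size_t size)
	{
		int prot = IOMMU_READ | IOMMU_WRITE | IOMMU_LLC;

		return iommu_map(domain, iova, paddr, size, prot);
	}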


Hi Sai,

Thank you for the patchset! Are you planning to refresh it, as it does
not apply anymore?



I was waiting on Will's reply [1]. If there are no changes needed, then
I can repost the patch.

[1] 
https://lore.kernel.org/lkml/21239ba603d0bdc4e4c696588a905...@codeaurora.org/


Thanks,
Sai





The series slightly depends on the following two patches posted earlier
and is based on the msm-next branch:
 * https://lore.kernel.org/patchwork/patch/1363008/
 * https://lore.kernel.org/patchwork/patch/1363010/

Sai Prakash Ranjan (3):
  iommu/io-pgtable: Rename last-level cache quirk to
IO_PGTABLE_QUIRK_PTW_LLC
  iommu/io-pgtable-arm: Add IOMMU_LLC page protection flag
  drm/msm: Use IOMMU_LLC page protection flag to map gpu buffers

 drivers/gpu/drm/msm/adreno/a6xx_gpu.c   | 3 +++
 drivers/gpu/drm/msm/adreno/adreno_gpu.c | 2 +-
 drivers/gpu/drm/msm/msm_iommu.c | 3 +++
 drivers/gpu/drm/msm/msm_mmu.h   | 4 
 drivers/iommu/io-pgtable-arm.c  | 9 ++---
 include/linux/io-pgtable.h  | 6 +++---
 include/linux/iommu.h   | 6 ++
 7 files changed, 26 insertions(+), 7 deletions(-)


base-commit: 00fd44a1a4700718d5d962432b55c09820f7e709
--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation


Re: [powerpc][next-20210727] Boot failure - kernel BUG at arch/powerpc/kernel/interrupt.c:98!

2021-07-28 Thread Nicholas Piggin
Excerpts from Nathan Chancellor's message of July 29, 2021 3:35 am:
> On Wed, Jul 28, 2021 at 01:31:06PM +0530, Sachin Sant wrote:
>> linux-next fails to boot on Power server (POWER8/POWER9). Following traces
>> are seen during boot
>> 
>> [0.010799] software IO TLB: tearing down default memory pool
>> [0.010805] [ cut here ]
>> [0.010808] kernel BUG at arch/powerpc/kernel/interrupt.c:98!
>> [0.010812] Oops: Exception in kernel mode, sig: 5 [#1]
>> [0.010816] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
>> [0.010820] Modules linked in:
>> [0.010824] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 
>> 5.14.0-rc3-next-20210727 #1
>> [0.010830] NIP:  c0032cfc LR: c000c764 CTR: 
>> c000c670
>> [0.010834] REGS: c3603b10 TRAP: 0700   Not tainted  
>> (5.14.0-rc3-next-20210727)
>> [0.010838] MSR:  80029033   CR: 28000222  
>> XER: 0002
>> [0.010848] CFAR: c000c760 IRQMASK: 3 
>> [0.010848] GPR00: c000c764 c3603db0 c29bd000 
>> 0001 
>> [0.010848] GPR04: 0a68 0400 c3603868 
>>  
>> [0.010848] GPR08:    
>> 0003 
>> [0.010848] GPR12:  c0001ec9ee80 c0012a28 
>>  
>> [0.010848] GPR16:    
>>  
>> [0.010848] GPR20:    
>>  
>> [0.010848] GPR24: f134   
>> c3603868 
>> [0.010848] GPR28: 0400 0a68 c202e9c0 
>> c3603e80 
>> [0.010896] NIP [c0032cfc] system_call_exception+0x8c/0x2e0
>> [0.010901] LR [c000c764] system_call_common+0xf4/0x258
>> [0.010907] Call Trace:
>> [0.010909] [c3603db0] [c016a6dc] 
>> calculate_sigpending+0x4c/0xe0 (unreliable)
>> [0.010915] [c3603e10] [c000c764] 
>> system_call_common+0xf4/0x258
>> [0.010921] --- interrupt: c00 at kvm_template_end+0x4/0x8
>> [0.010926] NIP:  c0092dec LR: c0114fc8 CTR: 
>> 
>> [0.010930] REGS: c3603e80 TRAP: 0c00   Not tainted  
>> (5.14.0-rc3-next-20210727)
>> [0.010934] MSR:  80009033   CR: 28000222  
>> XER: 
>> [0.010943] IRQMASK: 0 
>> [0.010943] GPR00: c202e9c0 c3603b00 c29bd000 
>> f134 
>> [0.010943] GPR04: 0a68 0400 c3603868 
>>  
>> [0.010943] GPR08:    
>>  
>> [0.010943] GPR12:  c0001ec9ee80 c0012a28 
>>  
>> [0.010943] GPR16:    
>>  
>> [0.010943] GPR20:    
>>  
>> [0.010943] GPR24: c20033c4 c110afc0 c2081950 
>> c3277d40 
>> [0.010943] GPR28:  ca68 0400 
>> 000d 
>> [0.010989] NIP [c0092dec] kvm_template_end+0x4/0x8
>> [0.010993] LR [c0114fc8] set_memory_encrypted+0x38/0x60
>> [0.010999] --- interrupt: c00
>> [0.011001] [c3603b00] [c000c764] 
>> system_call_common+0xf4/0x258 (unreliable)
>> [0.011008] Instruction dump:
>> [0.011011] 694a0003 312a 7d495110 0b0a 6000 6000 
>> e87f0108 68690002 
>> [0.011019] 7929ffe2 0b09 68634000 786397e2 <0b03> e93f0138 
>> 792907e0 0b09 
>> [0.011029] ---[ end trace a20ad55589efcb10 ]---
>> [0.012297] 
>> [1.012304] Kernel panic - not syncing: Fatal exception
>> 
>> next-20210723 was good. The boot failure seems to have been introduced with 
>> next-20210726.
>> 
>> I have attached the boot log.
> 
> I noticed this with OpenSUSE's ppc64le config [1] and my bisect landed on
> commit ad6c00283163 ("swiotlb: Free tbl memory in swiotlb_exit()"). That
> series just keeps on giving... Adding some people from that thread to
> this one. Original thread:
> https://lore.kernel.org/r/1905cd70-7656-42ae-99e2-a31fc3812...@linux.vnet.ibm.com/

This is because powerpc's set_memory_encrypted() makes an ultracall, but
the ultravisor does not exist on that processor.

x86's set_memory_encrypted/decrypted have

	/* Nothing to do if memory encryption is not active */
	if (!mem_encrypt_active())
		return 0;

Probably powerpc should just do that too.
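
For reference, a sketch of that check applied to powerpc's
set_memory_encrypted() in arch/powerpc/platforms/pseries/svm.c (untested;
assumes powerpc's mem_encrypt_active() reports false on non-secure guests):

	int set_memory_encrypted(unsigned long addr, int numpages)
	{
		/* Nothing to do if this is not a secure (SVM) guest */
		if (!mem_encrypt_active())
			return 0;

		if (!PAGE_ALIGNED(addr))
			return -EINVAL;

		uv_unshare_page(PHYS_PFN(__pa(addr)), numpages);

		return 0;
	}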

Thanks,
Nick


Re: [PATCH v2 00/24] iommu: Refactor DMA domain strictness

2021-07-28 Thread chenxiang (M)

Hi Robin,


On 2021/7/28 23:58, Robin Murphy wrote:

Hi all,

Here's v2 where things start to look more realistic, hence the expanded
CC list. The patches are now based on the current iommu/core branch to
take John's iommu_set_dma_strict() cleanup into account.

The series remains in two (or possibly three) logical parts - for people
CC'd on cookie cleanup patches, the later parts should not affect you
since your drivers don't implement non-strict mode anyway; the cleanup
is all pretty straightforward, but please do yell at me if I've managed
to let a silly mistake slip through and broken your driver.

This time I have also build-tested x86 as well as arm64 :)


I have tested this patchset on ARM64 with SMMUv3, and the testcases are
as follows:

- Boot with iommu.strict=0, running fio and it works well;
- Boot with iommu.strict=1, running fio and it works well;
- Change strict mode to lazy mode when building, the change takes effect;
- Boot without iommu.strict (default strict mode), change the sysfs
interface type from DMA to DMA-FQ dynamically while running fio, and it
works well;
- Boot without iommu.strict (default strict mode), change the sysfs
interface type from DMA-FQ to DMA dynamically; this is not allowed and
prints "Device or resource busy".
(I know this is by design, and we can change non-strict mode to strict by
unbinding the driver -> changing the sysfs interface (type) -> binding the
driver (tested this and it works well), but I have a small question: is it
also possible to change from DMA-FQ to DMA dynamically?)


Anyway, please feel free to add:
Tested-by: Xiang Chen 



Changes in v2:

- Add iommu_is_dma_domain() helper to abstract flag check (and help
   avoid silly typos like the one in v1).
- Tweak a few commit messages for spelling and (hopefully) clarity.
- Move the iommu_create_device_direct_mappings() update to patch #14
   where it should have been.
- Rewrite patch #20 as a conversion of the now-existing option.
- Clean up the ops->flush_iotlb_all check which is also made redundant
   by the new domain type.
- Add patch #24, which is arguably tangential, but it was something I
   spotted during the rebase, so...

Once again, the whole lot is available on a branch here:

https://gitlab.arm.com/linux-arm/linux-rm/-/tree/iommu/fq

Thanks,
Robin.


CC: Marek Szyprowski 
CC: Yoshihiro Shimoda 
CC: Geert Uytterhoeven 
CC: Yong Wu 
CC: Heiko Stuebner 
CC: Chunyan Zhang 
CC: Chunyan Zhang 
CC: Maxime Ripard 
CC: Jean-Philippe Brucker 

Robin Murphy (24):
   iommu: Pull IOVA cookie management into the core
   iommu/amd: Drop IOVA cookie management
   iommu/arm-smmu: Drop IOVA cookie management
   iommu/vt-d: Drop IOVA cookie management
   iommu/exynos: Drop IOVA cookie management
   iommu/ipmmu-vmsa: Drop IOVA cookie management
   iommu/mtk: Drop IOVA cookie management
   iommu/rockchip: Drop IOVA cookie management
   iommu/sprd: Drop IOVA cookie management
   iommu/sun50i: Drop IOVA cookie management
   iommu/virtio: Drop IOVA cookie management
   iommu/dma: Unexport IOVA cookie management
   iommu/dma: Remove redundant "!dev" checks
   iommu: Introduce explicit type for non-strict DMA domains
   iommu/amd: Prepare for multiple DMA domain types
   iommu/arm-smmu: Prepare for multiple DMA domain types
   iommu/vt-d: Prepare for multiple DMA domain types
   iommu: Express DMA strictness via the domain type
   iommu: Expose DMA domain strictness via sysfs
   iommu: Merge strictness and domain type configs
   iommu/dma: Factor out flush queue init
   iommu: Allow enabling non-strict mode dynamically
   iommu/arm-smmu: Allow non-strict in pgtable_quirks interface
   iommu: Only log strictness for DMA domains

  .../ABI/testing/sysfs-kernel-iommu_groups |  2 +
  drivers/iommu/Kconfig | 80 +--
  drivers/iommu/amd/iommu.c | 21 +
  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 25 --
  drivers/iommu/arm/arm-smmu/arm-smmu.c | 29 ---
  drivers/iommu/arm/arm-smmu/qcom_iommu.c   |  8 --
  drivers/iommu/dma-iommu.c | 44 +-
  drivers/iommu/exynos-iommu.c  | 18 +
  drivers/iommu/intel/iommu.c   | 23 ++
  drivers/iommu/iommu.c | 53 +++-
  drivers/iommu/ipmmu-vmsa.c| 27 +--
  drivers/iommu/mtk_iommu.c |  6 --
  drivers/iommu/rockchip-iommu.c| 11 +--
  drivers/iommu/sprd-iommu.c|  6 --
  drivers/iommu/sun50i-iommu.c  | 12 +--
  drivers/iommu/virtio-iommu.c  |  8 --
  include/linux/dma-iommu.h |  9 ++-
  include/linux/iommu.h | 15 +++-
  18 files changed, 171 insertions(+), 226 deletions(-)





Re: [PATCH v2] iommu/amd: Use report_iommu_fault()

2021-07-28 Thread Suthikulpanit, Suravee via iommu

Lennert,

On 7/26/2021 11:31 AM, Lennert Buytenhek wrote:

This patch makes iommu/amd call report_iommu_fault() when an I/O page
fault occurs, which has two effects:

1) It allows device drivers to register a callback to be notified of
I/O page faults, via the iommu_set_fault_handler() API.

2) It triggers the io_page_fault tracepoint in report_iommu_fault()
when an I/O page fault occurs.

I'm mainly interested in (2).  We have a daemon with some rasdaemon-like
functionality for handling platform errors, and being able to be notified
of I/O page faults for initiating corrective action is very useful -- and
receiving such events via event tracing is a lot nicer than having to
scrape them from kmsg.

A number of other IOMMU drivers already use report_iommu_fault(), and
I/O page faults on those IOMMUs therefore already seem to trigger this
tracepoint -- but this isn't (yet) the case for AMD-Vi and Intel DMAR.

I copied the logic from the other callers of report_iommu_fault(), where
if that function returns zero, the driver will have handled the fault,
in which case we avoid logging information about the fault to the printk
buffer from the IOMMU driver.

With this patch I see io_page_fault event tracing entries as expected:

irq/24-AMD-Vi-48[002]    978.554289: io_page_fault: IOMMU:[drvname] 
:05:00.0 iova=0x91482640 flags=0x
irq/24-AMD-Vi-48[002]    978.554294: io_page_fault: IOMMU:[drvname] 
:05:00.0 iova=0x91482650 flags=0x
irq/24-AMD-Vi-48[002]    978.554299: io_page_fault: IOMMU:[drvname] 
:05:00.0 iova=0x91482660 flags=0x
irq/24-AMD-Vi-48[002]    978.554305: io_page_fault: IOMMU:[drvname] 
:05:00.0 iova=0x91482670 flags=0x
irq/24-AMD-Vi-48[002]    978.554310: io_page_fault: IOMMU:[drvname] 
:05:00.0 iova=0x91482680 flags=0x
irq/24-AMD-Vi-48[002]    978.554315: io_page_fault: IOMMU:[drvname] 
:05:00.0 iova=0x914826a0 flags=0x

For determining IOMMU_FAULT_{READ,WRITE}, I followed the AMD IOMMU
spec, but I haven't tested that bit of the code, as the page faults I
encounter are all to non-present (!EVENT_FLAG_PR) mappings, in which
case EVENT_FLAG_RW doesn't make sense.

Signed-off-by: Lennert Buytenhek 
---
Changes since v1 RFC:

- Don't call report_iommu_fault() for IRQ remapping faults.
   (Suggested by Joerg Roedel.)

  drivers/iommu/amd/amd_iommu_types.h |  4 
  drivers/iommu/amd/iommu.c   | 29 +
  2 files changed, 33 insertions(+)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 94c1a7a9876d..2f2c6630c24c 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -138,6 +138,10 @@
  #define EVENT_DOMID_MASK_HI   0xf
  #define EVENT_FLAGS_MASK  0xfff
  #define EVENT_FLAGS_SHIFT 0x10
+#define EVENT_FLAG_TR  0x100
+#define EVENT_FLAG_RW  0x020
+#define EVENT_FLAG_PR  0x010
+#define EVENT_FLAG_I   0x008
  
  /* feature control bits */

  #define CONTROL_IOMMU_EN        0x00ULL
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index a7d6d78147b7..d9fb2c22d44a 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c


What if we introduce:

+/*
+ * AMD I/O Virtualization Technology (IOMMU) Specification,
+ * revision 3.00, section 2.5.3 ("IO_PAGE_FAULT Event") says
+ * that the RW ("read-write") bit is only valid if the I/O
+ * page fault was caused by a memory transaction request
+ * referencing a page that was marked present.
+ */
+#define IO_PAGE_FAULT_MEM_MASK \
+   (EVENT_FLAG_TR | EVENT_FLAG_PR | EVENT_FLAG_I)
+#define IS_IOMMU_MEM_TRANSACTION(x)\
+   ((x & IO_PAGE_FAULT_MEM_MASK) == EVENT_FLAG_PR)

Note that this already checks that EVENT_FLAG_I == 0 as well.
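
For illustration, the open-coded check in the hunk quoted below could then
collapse to something like this (a sketch only: variable names are taken
from the quoted patch, and the 'out' label releasing the pci_dev reference
is illustrative):

	if (dev_data && IS_IOMMU_MEM_TRANSACTION(flags)) {
		int report_flags = (flags & EVENT_FLAG_RW) ?
				   IOMMU_FAULT_WRITE : IOMMU_FAULT_READ;

		/* A handler returning 0 suppresses the printk path below */
		if (!report_iommu_fault(&dev_data->domain->domain, &pdev->dev,
					address, report_flags))
			goto out;
	}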



@@ -484,6 +484,34 @@ static void amd_iommu_report_page_fault(u16 devid, u16 
domain_id,
if (pdev)
dev_data = dev_iommu_priv_get(&pdev->dev);
  
+	/*
+	 * If this is a DMA fault (for which the I(nterrupt) bit will
+	 * be unset), allow report_iommu_fault() to prevent logging it.
+	 */
+   if (dev_data && ((flags & EVENT_FLAG_I) == 0)) {
+   int report_flags;
+
+   /*
+* AMD I/O Virtualization Technology (IOMMU) Specification,
+* revision 3.00, section 2.5.3 ("IO_PAGE_FAULT Event") says
+* that the RW ("read-write") bit is only valid if the I/O
+* page fault was caused by a memory transaction request
+* referencing a page that was marked present.
+*/
+   report_flags = 0;
+   if ((flags & (EVENT_FLAG_TR | EVENT_FLAG_PR)) ==
+   EVENT_FLAG_PR) {
+   if (flags & EVENT_FLAG_RW)
+   

Re: [powerpc][next-20210727] Boot failure - kernel BUG at arch/powerpc/kernel/interrupt.c:98!

2021-07-28 Thread Nathan Chancellor
On Wed, Jul 28, 2021 at 01:31:06PM +0530, Sachin Sant wrote:
> linux-next fails to boot on Power server (POWER8/POWER9). Following traces
> are seen during boot
> 
> [0.010799] software IO TLB: tearing down default memory pool
> [0.010805] [ cut here ]
> [0.010808] kernel BUG at arch/powerpc/kernel/interrupt.c:98!
> [0.010812] Oops: Exception in kernel mode, sig: 5 [#1]
> [0.010816] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
> [0.010820] Modules linked in:
> [0.010824] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 
> 5.14.0-rc3-next-20210727 #1
> [0.010830] NIP:  c0032cfc LR: c000c764 CTR: 
> c000c670
> [0.010834] REGS: c3603b10 TRAP: 0700   Not tainted  
> (5.14.0-rc3-next-20210727)
> [0.010838] MSR:  80029033   CR: 28000222  
> XER: 0002
> [0.010848] CFAR: c000c760 IRQMASK: 3 
> [0.010848] GPR00: c000c764 c3603db0 c29bd000 
> 0001 
> [0.010848] GPR04: 0a68 0400 c3603868 
>  
> [0.010848] GPR08:    
> 0003 
> [0.010848] GPR12:  c0001ec9ee80 c0012a28 
>  
> [0.010848] GPR16:    
>  
> [0.010848] GPR20:    
>  
> [0.010848] GPR24: f134   
> c3603868 
> [0.010848] GPR28: 0400 0a68 c202e9c0 
> c3603e80 
> [0.010896] NIP [c0032cfc] system_call_exception+0x8c/0x2e0
> [0.010901] LR [c000c764] system_call_common+0xf4/0x258
> [0.010907] Call Trace:
> [0.010909] [c3603db0] [c016a6dc] 
> calculate_sigpending+0x4c/0xe0 (unreliable)
> [0.010915] [c3603e10] [c000c764] 
> system_call_common+0xf4/0x258
> [0.010921] --- interrupt: c00 at kvm_template_end+0x4/0x8
> [0.010926] NIP:  c0092dec LR: c0114fc8 CTR: 
> 
> [0.010930] REGS: c3603e80 TRAP: 0c00   Not tainted  
> (5.14.0-rc3-next-20210727)
> [0.010934] MSR:  80009033   CR: 28000222  
> XER: 
> [0.010943] IRQMASK: 0 
> [0.010943] GPR00: c202e9c0 c3603b00 c29bd000 
> f134 
> [0.010943] GPR04: 0a68 0400 c3603868 
>  
> [0.010943] GPR08:    
>  
> [0.010943] GPR12:  c0001ec9ee80 c0012a28 
>  
> [0.010943] GPR16:    
>  
> [0.010943] GPR20:    
>  
> [0.010943] GPR24: c20033c4 c110afc0 c2081950 
> c3277d40 
> [0.010943] GPR28:  ca68 0400 
> 000d 
> [0.010989] NIP [c0092dec] kvm_template_end+0x4/0x8
> [0.010993] LR [c0114fc8] set_memory_encrypted+0x38/0x60
> [0.010999] --- interrupt: c00
> [0.011001] [c3603b00] [c000c764] 
> system_call_common+0xf4/0x258 (unreliable)
> [0.011008] Instruction dump:
> [0.011011] 694a0003 312a 7d495110 0b0a 6000 6000 e87f0108 
> 68690002 
> [0.011019] 7929ffe2 0b09 68634000 786397e2 <0b03> e93f0138 
> 792907e0 0b09 
> [0.011029] ---[ end trace a20ad55589efcb10 ]---
> [0.012297] 
> [1.012304] Kernel panic - not syncing: Fatal exception
> 
> next-20210723 was good. The boot failure seems to have been introduced with 
> next-20210726.
> 
> I have attached the boot log.

I noticed this with OpenSUSE's ppc64le config [1] and my bisect landed on
commit ad6c00283163 ("swiotlb: Free tbl memory in swiotlb_exit()"). That
series just keeps on giving... Adding some people from that thread to
this one. Original thread:
https://lore.kernel.org/r/1905cd70-7656-42ae-99e2-a31fc3812...@linux.vnet.ibm.com/

[1]: https://github.com/openSUSE/kernel-source/raw/master/config/ppc64le/default

Cheers,
Nathan


Re: [PATCH 03/13] x86/HV: Add new hvcall guest address host visibility support

2021-07-28 Thread Dave Hansen
On 7/28/21 7:52 AM, Tianyu Lan wrote:
> @@ -1986,7 +1988,9 @@ static int __set_memory_enc_dec(unsigned long addr, int 
> numpages, bool enc)
>   int ret;
>  
>   /* Nothing to do if memory encryption is not active */
> - if (!mem_encrypt_active())
> + if (hv_is_isolation_supported())
> + return hv_set_mem_enc(addr, numpages, enc);
> + else if (!mem_encrypt_active())
>   return 0;

One more thing.  If you're going to be patching generic code, please
start using feature checks that can get optimized away at runtime.
hv_is_isolation_supported() doesn't look like the world's cheapest
check.  It can't be inlined and costs at least a function call.

These checks could, with basically no effort be wrapped in a header like
this:

static inline bool hv_is_isolation_supported(void)
{
	if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
		return 0;

	// out of line function call:
	return __hv_is_isolation_supported();
}

I don't think it would be the end of the world to add an
X86_FEATURE_HYPERV_GUEST, either.  There are plenty of bits allocated
for Xen and VMWare.
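
For context, word 8 of cpufeatures.h already carries such software-defined
bits, e.g. (values as of this era; an X86_FEATURE_HYPERV_GUEST entry would
be a new, hypothetical addition alongside them):

	#define X86_FEATURE_XENPV		( 8*32+16) /* Xen paravirtual guest */
	#define X86_FEATURE_VMW_VMMCALL		( 8*32+19) /* VMware prefers VMMCALL */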


Re: [PATCH 01/11] mm: Introduce a function to check for virtualization protection features

2021-07-28 Thread Borislav Petkov
On Wed, Jul 28, 2021 at 02:17:27PM +0100, Christoph Hellwig wrote:
> So common checks obviously make sense, but I really hate the stupid
> multiplexer.  Having one well-documented helper per feature is much
> easier to follow.

We had that in x86 - it was called cpu_has_xxx, where xxx is the
feature bit. It didn't scale with the sheer number of feature bits that
kept getting added, so we do cpu_feature_enabled(X86_FEATURE_XXX) now.

The idea behind this is very similar - those protected guest flags
will only grow in the couple of tens range - at least - so having a
multiplexer is a lot simpler, I'd say, than having a couple of tens of
helpers. And those PATTR flags should have good, readable names, btw.
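
A sketch of that multiplexer style, for illustration (the prot_guest_has()
name and PATTR_* flags follow the series under discussion; the dispatch
body here is purely illustrative):

	/* One well-known entry point, dispatching on readable PATTR_* flags */
	bool prot_guest_has(unsigned int attr)
	{
		switch (attr) {
		case PATTR_MEM_ENCRYPT:		/* any active memory encryption */
			return mem_encrypt_active();
		case PATTR_HOST_MEM_ENCRYPT:	/* host-side, i.e. SME */
			return sme_active();
		case PATTR_GUEST_MEM_ENCRYPT:	/* guest-side, i.e. SEV */
			return sev_active();
		default:
			return false;
		}
	}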

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


[PATCH v2 24/24] iommu: Only log strictness for DMA domains

2021-07-28 Thread Robin Murphy
When passthrough is enabled, the default strictness policy becomes
irrelevant, since any subsequent runtime override to a DMA domain type
now embodies an explicit choice of strictness as well. Save on noise by
only logging the default policy when it is meaningfully in effect.

Signed-off-by: Robin Murphy 
---
 drivers/iommu/iommu.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index be399d630953..87d7b299436e 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -144,10 +144,11 @@ static int __init iommu_subsys_init(void)
(iommu_cmd_line & IOMMU_CMD_LINE_DMA_API) ?
"(set via kernel command line)" : "");
 
-   pr_info("DMA domain TLB invalidation policy: %s mode %s\n",
-   iommu_dma_strict ? "strict" : "lazy",
-   (iommu_cmd_line & IOMMU_CMD_LINE_STRICT) ?
-   "(set via kernel command line)" : "");
+   if (!iommu_default_passthrough())
+   pr_info("DMA domain TLB invalidation policy: %s mode %s\n",
+   iommu_dma_strict ? "strict" : "lazy",
+   (iommu_cmd_line & IOMMU_CMD_LINE_STRICT) ?
+   "(set via kernel command line)" : "");
 
return 0;
 }
-- 
2.25.1



[PATCH v2 23/24] iommu/arm-smmu: Allow non-strict in pgtable_quirks interface

2021-07-28 Thread Robin Murphy
To make io-pgtable aware of a flush queue being dynamically enabled,
allow IO_PGTABLE_QUIRK_NON_STRICT to be set even after a domain has been
attached to, and hook up the final piece of the puzzle in iommu-dma.

Signed-off-by: Robin Murphy 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 15 +++
 drivers/iommu/arm/arm-smmu/arm-smmu.c   | 11 +++
 drivers/iommu/dma-iommu.c   |  3 +++
 3 files changed, 29 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 19400826eba7..40fa9cb382c3 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2711,6 +2711,20 @@ static int arm_smmu_enable_nesting(struct iommu_domain 
*domain)
return ret;
 }
 
+static int arm_smmu_set_pgtable_quirks(struct iommu_domain *domain,
+   unsigned long quirks)
+{
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+
+   if (quirks == IO_PGTABLE_QUIRK_NON_STRICT && smmu_domain->pgtbl_ops) {
+   struct io_pgtable *iop = 
io_pgtable_ops_to_pgtable(smmu_domain->pgtbl_ops);
+
+   iop->cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;
+   return 0;
+   }
+   return -EINVAL;
+}
+
 static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
 {
return iommu_fwspec_add_ids(dev, args->args, 1);
@@ -2825,6 +2839,7 @@ static struct iommu_ops arm_smmu_ops = {
.release_device = arm_smmu_release_device,
.device_group   = arm_smmu_device_group,
.enable_nesting = arm_smmu_enable_nesting,
+   .set_pgtable_quirks = arm_smmu_set_pgtable_quirks,
.of_xlate   = arm_smmu_of_xlate,
.get_resv_regions   = arm_smmu_get_resv_regions,
.put_resv_regions   = generic_iommu_put_resv_regions,
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 109e4723f9f5..f18684f308b9 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -1518,6 +1518,17 @@ static int arm_smmu_set_pgtable_quirks(struct 
iommu_domain *domain,
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
int ret = 0;
 
+   if (quirks == IO_PGTABLE_QUIRK_NON_STRICT) {
+   struct io_pgtable *iop;
+
+   if (!smmu_domain->pgtbl_ops)
+   return -EINVAL;
+
+   iop = io_pgtable_ops_to_pgtable(smmu_domain->pgtbl_ops);
+   iop->cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;
+   return 0;
+   }
+
mutex_lock(&smmu_domain->init_mutex);
if (smmu_domain->smmu)
ret = -EPERM;
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 304a3ec71223..6e3eca778267 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -326,6 +327,8 @@ int iommu_dma_init_fq(struct iommu_domain *domain)
return -ENODEV;
}
cookie->fq_domain = domain;
+   if (domain->ops->set_pgtable_quirks)
+   domain->ops->set_pgtable_quirks(domain, 
IO_PGTABLE_QUIRK_NON_STRICT);
return 0;
 }
 
-- 
2.25.1



[PATCH v2 21/24] iommu/dma: Factor out flush queue init

2021-07-28 Thread Robin Murphy
Factor out flush queue setup from the initial domain init so that we
can potentially trigger it from sysfs later on in a domain's lifetime.

Signed-off-by: Robin Murphy 
---
 drivers/iommu/dma-iommu.c | 30 --
 include/linux/dma-iommu.h |  9 ++---
 2 files changed, 26 insertions(+), 13 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 7f3968865387..304a3ec71223 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -310,6 +310,25 @@ static bool dev_is_untrusted(struct device *dev)
return dev_is_pci(dev) && to_pci_dev(dev)->untrusted;
 }
 
+int iommu_dma_init_fq(struct iommu_domain *domain)
+{
+   struct iommu_dma_cookie *cookie = domain->iova_cookie;
+
+   if (domain->type != IOMMU_DOMAIN_DMA_FQ)
+   return -EINVAL;
+   if (cookie->fq_domain)
+   return 0;
+
+   if (init_iova_flush_queue(&cookie->iovad, iommu_dma_flush_iotlb_all,
+ iommu_dma_entry_dtor)) {
+   pr_warn("iova flush queue initialization failed\n");
+   domain->type = IOMMU_DOMAIN_DMA;
+   return -ENODEV;
+   }
+   cookie->fq_domain = domain;
+   return 0;
+}
+
 /**
  * iommu_dma_init_domain - Initialise a DMA mapping domain
  * @domain: IOMMU domain previously prepared by iommu_get_dma_cookie()
@@ -362,16 +381,7 @@ static int iommu_dma_init_domain(struct iommu_domain 
*domain, dma_addr_t base,
}
 
init_iova_domain(iovad, 1UL << order, base_pfn);
-
-   if (domain->type == IOMMU_DOMAIN_DMA_FQ && !cookie->fq_domain) {
-   if (init_iova_flush_queue(iovad, iommu_dma_flush_iotlb_all,
- iommu_dma_entry_dtor)) {
-   pr_warn("iova flush queue initialization failed\n");
-   domain->type = IOMMU_DOMAIN_DMA;
-   } else {
-   cookie->fq_domain = domain;
-   }
-   }
+   iommu_dma_init_fq(domain);
 
return iova_reserve_iommu_regions(dev, domain);
 }
diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
index 758ca4694257..81ab647f1618 100644
--- a/include/linux/dma-iommu.h
+++ b/include/linux/dma-iommu.h
@@ -20,6 +20,7 @@ void iommu_put_dma_cookie(struct iommu_domain *domain);
 
 /* Setup call for arch DMA mapping code */
 void iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 dma_limit);
+int iommu_dma_init_fq(struct iommu_domain *domain);
 
 /* The DMA API isn't _quite_ the whole story, though... */
 /*
@@ -37,9 +38,6 @@ void iommu_dma_compose_msi_msg(struct msi_desc *desc,
 
 void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list);
 
-void iommu_dma_free_cpu_cached_iovas(unsigned int cpu,
-   struct iommu_domain *domain);
-
 extern bool iommu_dma_forcedac;
 
 #else /* CONFIG_IOMMU_DMA */
@@ -54,6 +52,11 @@ static inline void iommu_setup_dma_ops(struct device *dev, 
u64 dma_base,
 {
 }
 
+static inline int iommu_dma_init_fq(struct iommu_domain *domain)
+{
+   return -EINVAL;
+}
+
 static inline int iommu_get_dma_cookie(struct iommu_domain *domain)
 {
return -ENODEV;
-- 
2.25.1



[PATCH v2 22/24] iommu: Allow enabling non-strict mode dynamically

2021-07-28 Thread Robin Murphy
Allocating and enabling a flush queue is in fact something we can
reasonably do while a DMA domain is active, without having to rebuild it
from scratch. Thus we can allow a strict -> non-strict transition from
sysfs without having to unbind the device's driver, which is of
particular interest to users who want to make selective relaxations to
critical devices like the one serving their root filesystem.

Disabling and draining a queue also seems technically possible to
achieve without rebuilding the whole domain, but would certainly be more
involved. Furthermore there's not such a clear use-case for tightening
up security *after* the device may already have done whatever it is that
you don't trust it not to do, so we only consider the relaxation case.

Signed-off-by: Robin Murphy 
---
 drivers/iommu/iommu.c | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 25c1adc1ec67..be399d630953 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -3200,6 +3200,13 @@ static int iommu_change_dev_def_domain(struct 
iommu_group *group,
goto out;
}
 
+   /* We can bring up a flush queue without tearing down the domain */
+   if (type == IOMMU_DOMAIN_DMA_FQ && prev_dom->type == IOMMU_DOMAIN_DMA) {
+   prev_dom->type = IOMMU_DOMAIN_DMA_FQ;
+   ret = iommu_dma_init_fq(prev_dom);
+   goto out;
+   }
+
/* Sets group->default_domain to the newly allocated domain */
ret = iommu_group_alloc_default_domain(dev->bus, group, type);
if (ret)
@@ -3240,9 +3247,9 @@ static int iommu_change_dev_def_domain(struct iommu_group 
*group,
 }
 
 /*
- * Changing the default domain through sysfs requires the users to ubind the
- * drivers from the devices in the iommu group. Return failure if this doesn't
- * meet.
+ * Changing the default domain through sysfs requires the users to unbind the
+ * drivers from the devices in the iommu group, except for a DMA -> DMA-FQ
+ * transition. Return failure if this isn't met.
  *
  * We need to consider the race between this and the device release path.
  * device_lock(dev) is used here to guarantee that the device release path
@@ -3318,7 +3325,8 @@ static ssize_t iommu_group_store_type(struct iommu_group 
*group,
 
/* Check if the device in the group still has a driver bound to it */
device_lock(dev);
-   if (device_is_bound(dev)) {
+   if (device_is_bound(dev) && !(req_type == IOMMU_DOMAIN_DMA_FQ &&
+   group->default_domain->type == IOMMU_DOMAIN_DMA)) {
pr_err_ratelimited("Device is still bound to driver\n");
ret = -EBUSY;
goto out;
-- 
2.25.1



[PATCH v2 20/24] iommu: Merge strictness and domain type configs

2021-07-28 Thread Robin Murphy
To parallel the sysfs behaviour, merge the new build-time option
for DMA domain strictness into the default domain type choice.

Suggested-by: Joerg Roedel 
Signed-off-by: Robin Murphy 
---
 drivers/iommu/Kconfig | 80 +--
 drivers/iommu/iommu.c |  2 +-
 2 files changed, 41 insertions(+), 41 deletions(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index c84da8205be7..6e06f876d75a 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -79,55 +79,55 @@ config IOMMU_DEBUGFS
  debug/iommu directory, and then populate a subdirectory with
  entries as required.
 
-config IOMMU_DEFAULT_PASSTHROUGH
-   bool "IOMMU passthrough by default"
-   depends on IOMMU_API
-   help
- Enable passthrough by default, removing the need to pass in
- iommu.passthrough=on or iommu=pt through command line. If this
- is enabled, you can still disable with iommu.passthrough=off
- or iommu=nopt depending on the architecture.
-
- If unsure, say N here.
-
 choice
-   prompt "IOMMU default DMA IOTLB invalidation mode"
-   depends on IOMMU_DMA
-
-   default IOMMU_DEFAULT_LAZY if (AMD_IOMMU || INTEL_IOMMU)
-   default IOMMU_DEFAULT_STRICT
+   prompt "IOMMU default domain type"
+   depends on IOMMU_API
+   default IOMMU_DEFAULT_DMA_LAZY if AMD_IOMMU || INTEL_IOMMU
+   default IOMMU_DEFAULT_DMA_STRICT
help
- This option allows an IOMMU DMA IOTLB invalidation mode to be
- chosen at build time, to override the default mode of each ARCH,
- removing the need to pass in kernel parameters through command line.
- It is still possible to provide common boot params to override this
- config.
+ Choose the type of IOMMU domain used to manage DMA API usage by
+ device drivers. The options here typically represent different
+ levels of tradeoff between robustness/security and performance,
+ depending on the IOMMU driver. Not all IOMMUs support all options.
+ This choice can be overridden at boot via the command line, and for
+ some devices also at runtime via sysfs.
 
  If unsure, keep the default.
 
-config IOMMU_DEFAULT_STRICT
-   bool "strict"
+config IOMMU_DEFAULT_DMA_STRICT
+   bool "Translated - Strict"
help
- For every IOMMU DMA unmap operation, the flush operation of IOTLB and
- the free operation of IOVA are guaranteed to be done in the unmap
- function.
+ Trusted devices use translation to restrict their access to only
+ DMA-mapped pages, with strict TLB invalidation on unmap. Equivalent
+ to passing "iommu.passthrough=0 iommu.strict=1" on the command line.
 
-config IOMMU_DEFAULT_LAZY
-   bool "lazy"
+ Untrusted devices always use this mode, with an additional layer of
+ bounce-buffering such that they cannot gain access to any unrelated
+ data within a mapped page.
+
+config IOMMU_DEFAULT_DMA_LAZY
+   bool "Translated - Lazy"
help
- Support lazy mode, where for every IOMMU DMA unmap operation, the
- flush operation of IOTLB and the free operation of IOVA are deferred.
- They are only guaranteed to be done before the related IOVA will be
- reused.
+ Trusted devices use translation to restrict their access to only
+ DMA-mapped pages, but with "lazy" batched TLB invalidation. This
+ mode allows higher performance with some IOMMUs due to reduced TLB
+ flushing, but at the cost of reduced isolation since devices may be
+ able to access memory for some time after it has been unmapped.
+ Equivalent to passing "iommu.passthrough=0 iommu.strict=0" on the
+ command line.
 
- The isolation provided in this mode is not as secure as STRICT mode,
- such that a vulnerable time window may be created between the DMA
- unmap and the mappings cached in the IOMMU IOTLB or device TLB
- finally being invalidated, where the device could still access the
- memory which has already been unmapped by the device driver.
- However this mode may provide better performance in high throughput
- scenarios, and is still considerably more secure than passthrough
- mode or no IOMMU.
+ If this mode is not supported by the IOMMU driver, the effective
+ runtime default will fall back to IOMMU_DEFAULT_DMA_STRICT.
+
+config IOMMU_DEFAULT_PASSTHROUGH
+   bool "Passthrough"
+   help
+ Trusted devices are identity-mapped, giving them unrestricted access
+ to memory with minimal performance overhead. Equivalent to passing
+ "iommu.passthrough=1" (historically "iommu=pt") on the command line.
+
+ If this mode is not supported by the IOMMU driver, the effective
+ runtime default will fall back to 

[PATCH v2 18/24] iommu: Express DMA strictness via the domain type

2021-07-28 Thread Robin Murphy
Eliminate the iommu_get_dma_strict() indirection and pipe the
information through the domain type from the beginning. Besides
the flow simplification this also has several nice side-effects:

 - Automatically implies strict mode for untrusted devices by
   virtue of their IOMMU_DOMAIN_DMA override.
 - Ensures that we only end up using flush queues for drivers
   which are aware of them and can actually benefit.
 - Allows us to handle flush queue init failure by falling back
   to strict mode instead of leaving it to possibly blow up later.

Signed-off-by: Robin Murphy 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  2 +-
 drivers/iommu/arm/arm-smmu/arm-smmu.c   |  2 +-
 drivers/iommu/dma-iommu.c   |  9 +
 drivers/iommu/iommu.c   | 12 +++-
 include/linux/iommu.h   |  1 -
 5 files changed, 10 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index a1f0d83d1eb5..19400826eba7 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2175,7 +2175,7 @@ static int arm_smmu_domain_finalise(struct iommu_domain 
*domain,
.iommu_dev  = smmu->dev,
};
 
-   if (!iommu_get_dma_strict(domain))
+   if (domain->type == IOMMU_DOMAIN_DMA_FQ)
pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;
 
pgtbl_ops = alloc_io_pgtable_ops(fmt, &pgtbl_cfg, smmu_domain);
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 936c5e9d5e82..109e4723f9f5 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -765,7 +765,7 @@ static int arm_smmu_init_domain_context(struct iommu_domain 
*domain,
.iommu_dev  = smmu->dev,
};
 
-   if (!iommu_get_dma_strict(domain))
+   if (domain->type == IOMMU_DOMAIN_DMA_FQ)
pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;
 
if (smmu->impl && smmu->impl->init_context) {
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 8b3545c01077..7f3968865387 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -363,13 +363,14 @@ static int iommu_dma_init_domain(struct iommu_domain 
*domain, dma_addr_t base,
 
init_iova_domain(iovad, 1UL << order, base_pfn);
 
-   if (!cookie->fq_domain && !dev_is_untrusted(dev) &&
-   domain->ops->flush_iotlb_all && !iommu_get_dma_strict(domain)) {
+   if (domain->type == IOMMU_DOMAIN_DMA_FQ && !cookie->fq_domain) {
if (init_iova_flush_queue(iovad, iommu_dma_flush_iotlb_all,
- iommu_dma_entry_dtor))
+ iommu_dma_entry_dtor)) {
pr_warn("iova flush queue initialization failed\n");
-   else
+   domain->type = IOMMU_DOMAIN_DMA;
+   } else {
cookie->fq_domain = domain;
+   }
}
 
return iova_reserve_iommu_regions(dev, domain);
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 982545234cf3..eecb5657de69 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -136,6 +136,9 @@ static int __init iommu_subsys_init(void)
}
}
 
+   if (!iommu_default_passthrough() && !iommu_dma_strict)
+   iommu_def_domain_type = IOMMU_DOMAIN_DMA_FQ;
+
pr_info("Default domain type: %s %s\n",
iommu_domain_type_str(iommu_def_domain_type),
(iommu_cmd_line & IOMMU_CMD_LINE_DMA_API) ?
@@ -357,15 +360,6 @@ void iommu_set_dma_strict(void)
iommu_dma_strict = true;
 }
 
-bool iommu_get_dma_strict(struct iommu_domain *domain)
-{
-   /* only allow lazy flushing for DMA domains */
-   if (domain->type == IOMMU_DOMAIN_DMA)
-   return iommu_dma_strict;
-   return true;
-}
-EXPORT_SYMBOL_GPL(iommu_get_dma_strict);
-
 static ssize_t iommu_group_attr_show(struct kobject *kobj,
 struct attribute *__attr, char *buf)
 {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 046ba4d54cd2..edfe2fdb8368 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -498,7 +498,6 @@ int iommu_set_pgtable_quirks(struct iommu_domain *domain,
unsigned long quirks);
 
 void iommu_set_dma_strict(void);
-bool iommu_get_dma_strict(struct iommu_domain *domain);
 
 extern int report_iommu_fault(struct iommu_domain *domain, struct device *dev,
  unsigned long iova, int flags);
-- 
2.25.1



[PATCH v2 19/24] iommu: Expose DMA domain strictness via sysfs

2021-07-28 Thread Robin Murphy
The sysfs interface for default domain types exists primarily so users
can choose the performance/security tradeoff relevant to their own
workload. As such, the choice between the policies for DMA domains fits
perfectly as an additional point on that scale - downgrading a
particular device from a strict default to non-strict may be enough to
let it reach the desired level of performance, while still retaining
more peace of mind than with a wide-open identity domain. Now that we've
abstracted non-strict mode as a distinct type of DMA domain, allow it to
be chosen through the user interface as well.

Signed-off-by: Robin Murphy 
---
 Documentation/ABI/testing/sysfs-kernel-iommu_groups | 2 ++
 drivers/iommu/iommu.c   | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-kernel-iommu_groups 
b/Documentation/ABI/testing/sysfs-kernel-iommu_groups
index eae2f1c1e11e..43ba764ba5b7 100644
--- a/Documentation/ABI/testing/sysfs-kernel-iommu_groups
+++ b/Documentation/ABI/testing/sysfs-kernel-iommu_groups
@@ -42,6 +42,8 @@ Description:  /sys/kernel/iommu_groups/<grp_id>/type shows 
the type of default
  ==
DMA   All the DMA transactions from the device in this group
  are translated by the iommu.
+   DMA-FQ    As above, but using batched invalidation to lazily
+             remove translations after use.
identity  All the DMA transactions from the device in this group
  are not translated by the iommu.
auto  Change to the type the device was booted with.
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index eecb5657de69..5a08e0806cbb 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -3265,6 +3265,8 @@ static ssize_t iommu_group_store_type(struct iommu_group 
*group,
req_type = IOMMU_DOMAIN_IDENTITY;
else if (sysfs_streq(buf, "DMA"))
req_type = IOMMU_DOMAIN_DMA;
+   else if (sysfs_streq(buf, "DMA-FQ"))
+   req_type = IOMMU_DOMAIN_DMA_FQ;
else if (sysfs_streq(buf, "auto"))
req_type = 0;
else
-- 
2.25.1



[PATCH v2 17/24] iommu/vt-d: Prepare for multiple DMA domain types

2021-07-28 Thread Robin Murphy
In preparation for the strict vs. non-strict decision for DMA domains
to be expressed in the domain type, make sure we expose our flush queue
awareness by accepting the new domain type, and test the specific
feature flag where we want to identify DMA domains in general. The DMA
ops reset/setup can simply be made unconditional, since iommu-dma
already knows only to touch DMA domains.

Signed-off-by: Robin Murphy 
---
 drivers/iommu/intel/iommu.c | 15 ++-
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 7e168634c433..8fc46c9d6b96 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -582,7 +582,7 @@ struct intel_iommu *domain_get_iommu(struct dmar_domain 
*domain)
int iommu_id;
 
/* si_domain and vm domain should not get here. */
-   if (WARN_ON(domain->domain.type != IOMMU_DOMAIN_DMA))
+   if (WARN_ON(!iommu_is_dma_domain(&domain->domain)))
return NULL;
 
for_each_domain_iommu(iommu_id, domain)
@@ -1034,7 +1034,7 @@ static struct dma_pte *pfn_to_dma_pte(struct dmar_domain 
*domain,
pteval = ((uint64_t)virt_to_dma_pfn(tmp_page) << 
VTD_PAGE_SHIFT) | DMA_PTE_READ | DMA_PTE_WRITE;
if (domain_use_first_level(domain)) {
pteval |= DMA_FL_PTE_XD | DMA_FL_PTE_US;
-   if (domain->domain.type == IOMMU_DOMAIN_DMA)
+   if (iommu_is_dma_domain(&domain->domain))
pteval |= DMA_FL_PTE_ACCESS;
}
if (cmpxchg64(&pte->val, 0ULL, pteval))
@@ -2345,7 +2345,7 @@ __domain_mapping(struct dmar_domain *domain, unsigned 
long iov_pfn,
if (domain_use_first_level(domain)) {
attr |= DMA_FL_PTE_XD | DMA_FL_PTE_US;
 
-   if (domain->domain.type == IOMMU_DOMAIN_DMA) {
+   if (iommu_is_dma_domain(&domain->domain)) {
attr |= DMA_FL_PTE_ACCESS;
if (prot & DMA_PTE_WRITE)
attr |= DMA_FL_PTE_DIRTY;
@@ -4528,6 +4528,7 @@ static struct iommu_domain 
*intel_iommu_domain_alloc(unsigned type)
 
switch (type) {
case IOMMU_DOMAIN_DMA:
+   case IOMMU_DOMAIN_DMA_FQ:
case IOMMU_DOMAIN_UNMANAGED:
dmar_domain = alloc_domain(0);
if (!dmar_domain) {
@@ -5197,12 +5198,8 @@ static void intel_iommu_release_device(struct device 
*dev)
 
 static void intel_iommu_probe_finalize(struct device *dev)
 {
-   struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
-
-   if (domain && domain->type == IOMMU_DOMAIN_DMA)
-   iommu_setup_dma_ops(dev, 0, U64_MAX);
-   else
-   set_dma_ops(dev, NULL);
+   set_dma_ops(dev, NULL);
+   iommu_setup_dma_ops(dev, 0, U64_MAX);
 }
 
 static void intel_iommu_get_resv_regions(struct device *device,
-- 
2.25.1



[PATCH v2 16/24] iommu/arm-smmu: Prepare for multiple DMA domain types

2021-07-28 Thread Robin Murphy
In preparation for the strict vs. non-strict decision for DMA domains to
be expressed in the domain type, make sure we expose our flush queue
awareness by accepting the new domain type.

Signed-off-by: Robin Murphy 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 1 +
 drivers/iommu/arm/arm-smmu/arm-smmu.c   | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 4c648da447bf..a1f0d83d1eb5 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1972,6 +1972,7 @@ static struct iommu_domain 
*arm_smmu_domain_alloc(unsigned type)
 
if (type != IOMMU_DOMAIN_UNMANAGED &&
type != IOMMU_DOMAIN_DMA &&
+   type != IOMMU_DOMAIN_DMA_FQ &&
type != IOMMU_DOMAIN_IDENTITY)
return NULL;
 
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 970d9e4dcd69..936c5e9d5e82 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -869,7 +869,8 @@ static struct iommu_domain *arm_smmu_domain_alloc(unsigned 
type)
struct arm_smmu_domain *smmu_domain;
 
if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_IDENTITY) {
-   if (using_legacy_binding || type != IOMMU_DOMAIN_DMA)
+   if (using_legacy_binding ||
+   (type != IOMMU_DOMAIN_DMA && type != IOMMU_DOMAIN_DMA_FQ))
return NULL;
}
/*
-- 
2.25.1



[PATCH v2 15/24] iommu/amd: Prepare for multiple DMA domain types

2021-07-28 Thread Robin Murphy
The DMA ops reset/setup can simply be unconditional, since
iommu-dma already knows only to touch DMA domains.

Signed-off-by: Robin Murphy 
---
 drivers/iommu/amd/iommu.c | 9 ++---
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 0fd98d35d73b..02f9b4fffe90 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1707,14 +1707,9 @@ static struct iommu_device 
*amd_iommu_probe_device(struct device *dev)
 
 static void amd_iommu_probe_finalize(struct device *dev)
 {
-   struct iommu_domain *domain;
-
/* Domains are initialized for this device - have a look what we ended 
up with */
-   domain = iommu_get_domain_for_dev(dev);
-   if (domain->type == IOMMU_DOMAIN_DMA)
-   iommu_setup_dma_ops(dev, 0, U64_MAX);
-   else
-   set_dma_ops(dev, NULL);
+   set_dma_ops(dev, NULL);
+   iommu_setup_dma_ops(dev, 0, U64_MAX);
 }
 
 static void amd_iommu_release_device(struct device *dev)
-- 
2.25.1



[PATCH v2 13/24] iommu/dma: Remove redundant "!dev" checks

2021-07-28 Thread Robin Murphy
iommu_dma_init_domain() is now only called from iommu_setup_dma_ops(),
which has already assumed dev to be non-NULL.

Reviewed-by: John Garry 
Signed-off-by: Robin Murphy 
---
 drivers/iommu/dma-iommu.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 10067fbc4309..e28396cea6eb 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -363,7 +363,7 @@ static int iommu_dma_init_domain(struct iommu_domain 
*domain, dma_addr_t base,
 
init_iova_domain(iovad, 1UL << order, base_pfn);
 
-   if (!cookie->fq_domain && (!dev || !dev_is_untrusted(dev)) &&
+   if (!cookie->fq_domain && !dev_is_untrusted(dev) &&
domain->ops->flush_iotlb_all && !iommu_get_dma_strict(domain)) {
if (init_iova_flush_queue(iovad, iommu_dma_flush_iotlb_all,
  iommu_dma_entry_dtor))
@@ -372,9 +372,6 @@ static int iommu_dma_init_domain(struct iommu_domain 
*domain, dma_addr_t base,
cookie->fq_domain = domain;
}
 
-   if (!dev)
-   return 0;
-
return iova_reserve_iommu_regions(dev, domain);
 }
 
-- 
2.25.1



[PATCH v2 14/24] iommu: Introduce explicit type for non-strict DMA domains

2021-07-28 Thread Robin Murphy
Promote the difference between strict and non-strict DMA domains from an
internal detail to a distinct domain feature and type, to pave the road
for exposing it through the sysfs default domain interface.

Signed-off-by: Robin Murphy 
---
 drivers/iommu/dma-iommu.c |  2 +-
 drivers/iommu/iommu.c |  8 ++--
 include/linux/iommu.h | 11 +++
 3 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index e28396cea6eb..8b3545c01077 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1311,7 +1311,7 @@ void iommu_setup_dma_ops(struct device *dev, u64 
dma_base, u64 dma_limit)
 * The IOMMU core code allocates the default DMA domain, which the
 * underlying IOMMU driver needs to support via the dma-iommu layer.
 */
-   if (domain->type == IOMMU_DOMAIN_DMA) {
+   if (iommu_is_dma_domain(domain)) {
if (iommu_dma_init_domain(domain, dma_base, dma_limit, dev))
goto out_err;
dev->dma_ops = _dma_ops;
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index fa8109369f74..982545234cf3 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -115,6 +115,7 @@ static const char *iommu_domain_type_str(unsigned int t)
case IOMMU_DOMAIN_UNMANAGED:
return "Unmanaged";
case IOMMU_DOMAIN_DMA:
+   case IOMMU_DOMAIN_DMA_FQ:
return "Translated";
default:
return "Unknown";
@@ -552,6 +553,9 @@ static ssize_t iommu_group_show_type(struct iommu_group 
*group,
case IOMMU_DOMAIN_DMA:
type = "DMA\n";
break;
+   case IOMMU_DOMAIN_DMA_FQ:
+   type = "DMA-FQ\n";
+   break;
}
}
mutex_unlock(&group->mutex);
@@ -765,7 +769,7 @@ static int iommu_create_device_direct_mappings(struct 
iommu_group *group,
unsigned long pg_size;
int ret = 0;
 
-   if (!domain || domain->type != IOMMU_DOMAIN_DMA)
+   if (!domain || !iommu_is_dma_domain(domain))
return 0;
 
BUG_ON(!domain->pgsize_bitmap);
@@ -1947,7 +1951,7 @@ static struct iommu_domain *__iommu_domain_alloc(struct 
bus_type *bus,
/* Assume all sizes by default; the driver may override this later */
domain->pgsize_bitmap  = bus->iommu_ops->pgsize_bitmap;
 
-   if (type == IOMMU_DOMAIN_DMA && iommu_get_dma_cookie(domain)) {
+   if (iommu_is_dma_domain(domain) && iommu_get_dma_cookie(domain)) {
iommu_domain_free(domain);
domain = NULL;
}
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 141779d76035..046ba4d54cd2 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -61,6 +61,7 @@ struct iommu_domain_geometry {
 #define __IOMMU_DOMAIN_DMA_API (1U << 1)  /* Domain for use in DMA-API
  implementation  */
 #define __IOMMU_DOMAIN_PT  (1U << 2)  /* Domain is identity mapped   */
+#define __IOMMU_DOMAIN_DMA_FQ  (1U << 3)  /* DMA-API uses flush queue*/
 
 /*
  * This are the possible domain-types
@@ -73,12 +74,17 @@ struct iommu_domain_geometry {
  * IOMMU_DOMAIN_DMA- Internally used for DMA-API implementations.
  *   This flag allows IOMMU drivers to implement
  *   certain optimizations for these domains
+ * IOMMU_DOMAIN_DMA_FQ - As above, but definitely using batched TLB
+ *   invalidation.
  */
 #define IOMMU_DOMAIN_BLOCKED   (0U)
 #define IOMMU_DOMAIN_IDENTITY  (__IOMMU_DOMAIN_PT)
 #define IOMMU_DOMAIN_UNMANAGED (__IOMMU_DOMAIN_PAGING)
 #define IOMMU_DOMAIN_DMA   (__IOMMU_DOMAIN_PAGING |\
 __IOMMU_DOMAIN_DMA_API)
+#define IOMMU_DOMAIN_DMA_FQ(__IOMMU_DOMAIN_PAGING |\
+__IOMMU_DOMAIN_DMA_API |   \
+__IOMMU_DOMAIN_DMA_FQ)
 
 struct iommu_domain {
unsigned type;
@@ -90,6 +96,11 @@ struct iommu_domain {
struct iommu_dma_cookie *iova_cookie;
 };
 
+static inline bool iommu_is_dma_domain(struct iommu_domain *domain)
+{
+   return domain->type & __IOMMU_DOMAIN_DMA_API;
+}
+
 enum iommu_cap {
IOMMU_CAP_CACHE_COHERENCY,  /* IOMMU can enforce cache coherent DMA
   transactions */
-- 
2.25.1



[PATCH v2 12/24] iommu/dma: Unexport IOVA cookie management

2021-07-28 Thread Robin Murphy
IOVA cookies are now got and put by core code, so we no longer need to
export these to modular drivers. The export for getting MSI cookies
stays, since VFIO can still be a module, but it was already relying on
someone else putting them, so that aspect is unaffected.

Signed-off-by: Robin Murphy 
---
 drivers/iommu/dma-iommu.c | 7 ---
 drivers/iommu/iommu.c | 3 +--
 2 files changed, 1 insertion(+), 9 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 98ba927aee1a..10067fbc4309 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -98,9 +98,6 @@ static struct iommu_dma_cookie *cookie_alloc(enum 
iommu_dma_cookie_type type)
 /**
  * iommu_get_dma_cookie - Acquire DMA-API resources for a domain
  * @domain: IOMMU domain to prepare for DMA-API usage
- *
- * IOMMU drivers should normally call this from their domain_alloc
- * callback when domain->type == IOMMU_DOMAIN_DMA.
  */
 int iommu_get_dma_cookie(struct iommu_domain *domain)
 {
@@ -113,7 +110,6 @@ int iommu_get_dma_cookie(struct iommu_domain *domain)
 
return 0;
 }
-EXPORT_SYMBOL(iommu_get_dma_cookie);
 
 /**
  * iommu_get_msi_cookie - Acquire just MSI remapping resources
@@ -151,8 +147,6 @@ EXPORT_SYMBOL(iommu_get_msi_cookie);
  * iommu_put_dma_cookie - Release a domain's DMA mapping resources
  * @domain: IOMMU domain previously prepared by iommu_get_dma_cookie() or
  *  iommu_get_msi_cookie()
- *
- * IOMMU drivers should normally call this from their domain_free callback.
  */
 void iommu_put_dma_cookie(struct iommu_domain *domain)
 {
@@ -172,7 +166,6 @@ void iommu_put_dma_cookie(struct iommu_domain *domain)
kfree(cookie);
domain->iova_cookie = NULL;
 }
-EXPORT_SYMBOL(iommu_put_dma_cookie);
 
 /**
  * iommu_dma_get_resv_regions - Reserved region driver helper
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index ea5a9ea8d431..fa8109369f74 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1947,8 +1947,7 @@ static struct iommu_domain *__iommu_domain_alloc(struct 
bus_type *bus,
/* Assume all sizes by default; the driver may override this later */
domain->pgsize_bitmap  = bus->iommu_ops->pgsize_bitmap;
 
-   /* Temporarily ignore -EEXIST while drivers still get their own cookies */
-   if (type == IOMMU_DOMAIN_DMA && iommu_get_dma_cookie(domain) == -ENOMEM) {
+   if (type == IOMMU_DOMAIN_DMA && iommu_get_dma_cookie(domain)) {
iommu_domain_free(domain);
domain = NULL;
}
-- 
2.25.1



[PATCH v2 11/24] iommu/virtio: Drop IOVA cookie management

2021-07-28 Thread Robin Murphy
The core code bakes its own cookies now.

CC: Jean-Philippe Brucker 
Signed-off-by: Robin Murphy 
---
 drivers/iommu/virtio-iommu.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 6abdcab7273b..80930ce04a16 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -598,12 +598,6 @@ static struct iommu_domain *viommu_domain_alloc(unsigned 
type)
spin_lock_init(>mappings_lock);
vdomain->mappings = RB_ROOT_CACHED;
 
-   if (type == IOMMU_DOMAIN_DMA &&
-   iommu_get_dma_cookie(&vdomain->domain)) {
-   kfree(vdomain);
-   return NULL;
-   }
-
return >domain;
 }
 
@@ -643,8 +637,6 @@ static void viommu_domain_free(struct iommu_domain *domain)
 {
struct viommu_domain *vdomain = to_viommu_domain(domain);
 
-   iommu_put_dma_cookie(domain);
-
/* Free all remaining mappings (size 2^64) */
viommu_del_mappings(vdomain, 0, 0);
 
-- 
2.25.1



[PATCH v2 10/24] iommu/sun50i: Drop IOVA cookie management

2021-07-28 Thread Robin Murphy
The core code bakes its own cookies now.

CC: Maxime Ripard 
Signed-off-by: Robin Murphy 
---
 drivers/iommu/sun50i-iommu.c | 12 +---
 1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/drivers/iommu/sun50i-iommu.c b/drivers/iommu/sun50i-iommu.c
index 181bb1c3437c..c349a95ec7bd 100644
--- a/drivers/iommu/sun50i-iommu.c
+++ b/drivers/iommu/sun50i-iommu.c
@@ -610,14 +610,10 @@ static struct iommu_domain *sun50i_iommu_domain_alloc(unsigned type)
	if (!sun50i_domain)
		return NULL;
 
-	if (type == IOMMU_DOMAIN_DMA &&
-	    iommu_get_dma_cookie(&sun50i_domain->domain))
-   goto err_free_domain;
-
sun50i_domain->dt = (u32 *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
get_order(DT_SIZE));
if (!sun50i_domain->dt)
-   goto err_put_cookie;
+   goto err_free_domain;
 
	refcount_set(&sun50i_domain->refcnt, 1);
 
@@ -627,10 +623,6 @@ static struct iommu_domain *sun50i_iommu_domain_alloc(unsigned type)
 
	return &sun50i_domain->domain;
 
-err_put_cookie:
-	if (type == IOMMU_DOMAIN_DMA)
-		iommu_put_dma_cookie(&sun50i_domain->domain);
-
 err_free_domain:
kfree(sun50i_domain);
 
@@ -644,8 +636,6 @@ static void sun50i_iommu_domain_free(struct iommu_domain *domain)
free_pages((unsigned long)sun50i_domain->dt, get_order(DT_SIZE));
sun50i_domain->dt = NULL;
 
-   iommu_put_dma_cookie(domain);
-
kfree(sun50i_domain);
 }
 
-- 
2.25.1



[PATCH v2 09/24] iommu/sprd: Drop IOVA cookie management

2021-07-28 Thread Robin Murphy
The core code bakes its own cookies now.

CC: Chunyan Zhang 
Signed-off-by: Robin Murphy 
---
 drivers/iommu/sprd-iommu.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/drivers/iommu/sprd-iommu.c b/drivers/iommu/sprd-iommu.c
index 73dfd9946312..2bc1de6e823d 100644
--- a/drivers/iommu/sprd-iommu.c
+++ b/drivers/iommu/sprd-iommu.c
@@ -144,11 +144,6 @@ static struct iommu_domain *sprd_iommu_domain_alloc(unsigned int domain_type)
if (!dom)
return NULL;
 
-	if (iommu_get_dma_cookie(&dom->domain)) {
-		kfree(dom);
-		return NULL;
-	}
-
	spin_lock_init(&dom->pgtlock);
 
dom->domain.geometry.aperture_start = 0;
@@ -161,7 +156,6 @@ static void sprd_iommu_domain_free(struct iommu_domain *domain)
 {
struct sprd_iommu_domain *dom = to_sprd_domain(domain);
 
-   iommu_put_dma_cookie(domain);
kfree(dom);
 }
 
-- 
2.25.1



[PATCH v2 08/24] iommu/rockchip: Drop IOVA cookie management

2021-07-28 Thread Robin Murphy
The core code bakes its own cookies now.

CC: Heiko Stuebner 
Signed-off-by: Robin Murphy 
---
 drivers/iommu/rockchip-iommu.c | 11 +--
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
index 9febfb7f3025..c24561f54f32 100644
--- a/drivers/iommu/rockchip-iommu.c
+++ b/drivers/iommu/rockchip-iommu.c
@@ -1074,10 +1074,6 @@ static struct iommu_domain *rk_iommu_domain_alloc(unsigned type)
	if (!rk_domain)
		return NULL;
 
-	if (type == IOMMU_DOMAIN_DMA &&
-	    iommu_get_dma_cookie(&rk_domain->domain))
-   goto err_free_domain;
-
/*
 * rk32xx iommus use a 2 level pagetable.
 * Each level1 (dt) and level2 (pt) table has 1024 4-byte entries.
@@ -1085,7 +1081,7 @@ static struct iommu_domain *rk_iommu_domain_alloc(unsigned type)
 */
rk_domain->dt = (u32 *)get_zeroed_page(GFP_KERNEL | GFP_DMA32);
if (!rk_domain->dt)
-   goto err_put_cookie;
+   goto err_free_domain;
 
rk_domain->dt_dma = dma_map_single(dma_dev, rk_domain->dt,
   SPAGE_SIZE, DMA_TO_DEVICE);
@@ -1106,9 +1102,6 @@ static struct iommu_domain *rk_iommu_domain_alloc(unsigned type)
 
 err_free_dt:
free_page((unsigned long)rk_domain->dt);
-err_put_cookie:
-   if (type == IOMMU_DOMAIN_DMA)
-		iommu_put_dma_cookie(&rk_domain->domain);
 err_free_domain:
kfree(rk_domain);
 
@@ -1137,8 +1130,6 @@ static void rk_iommu_domain_free(struct iommu_domain *domain)
 SPAGE_SIZE, DMA_TO_DEVICE);
free_page((unsigned long)rk_domain->dt);
 
-   if (domain->type == IOMMU_DOMAIN_DMA)
-		iommu_put_dma_cookie(&rk_domain->domain);
kfree(rk_domain);
 }
 
-- 
2.25.1



[PATCH v2 07/24] iommu/mtk: Drop IOVA cookie management

2021-07-28 Thread Robin Murphy
The core code bakes its own cookies now.

CC: Yong Wu 
Signed-off-by: Robin Murphy 
---
 drivers/iommu/mtk_iommu.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 6f7c69688ce2..e39a6d1da28d 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -441,17 +441,11 @@ static struct iommu_domain *mtk_iommu_domain_alloc(unsigned type)
if (!dom)
return NULL;
 
-	if (iommu_get_dma_cookie(&dom->domain)) {
-		kfree(dom);
-		return NULL;
-	}
-
	return &dom->domain;
 }
 
 static void mtk_iommu_domain_free(struct iommu_domain *domain)
 {
-   iommu_put_dma_cookie(domain);
kfree(to_mtk_domain(domain));
 }
 
-- 
2.25.1



[PATCH v2 06/24] iommu/ipmmu-vmsa: Drop IOVA cookie management

2021-07-28 Thread Robin Murphy
The core code bakes its own cookies now.

CC: Yoshihiro Shimoda 
CC: Geert Uytterhoeven 
Signed-off-by: Robin Murphy 
---
 drivers/iommu/ipmmu-vmsa.c | 27 ---
 1 file changed, 4 insertions(+), 23 deletions(-)

diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
index 51ea6f00db2f..31252268f0d0 100644
--- a/drivers/iommu/ipmmu-vmsa.c
+++ b/drivers/iommu/ipmmu-vmsa.c
@@ -564,10 +564,13 @@ static irqreturn_t ipmmu_irq(int irq, void *dev)
  * IOMMU Operations
  */
 
-static struct iommu_domain *__ipmmu_domain_alloc(unsigned type)
+static struct iommu_domain *ipmmu_domain_alloc(unsigned type)
 {
struct ipmmu_vmsa_domain *domain;
 
+   if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA)
+   return NULL;
+
domain = kzalloc(sizeof(*domain), GFP_KERNEL);
if (!domain)
return NULL;
@@ -577,27 +580,6 @@ static struct iommu_domain *__ipmmu_domain_alloc(unsigned type)
	return &domain->io_domain;
 }
 
-static struct iommu_domain *ipmmu_domain_alloc(unsigned type)
-{
-   struct iommu_domain *io_domain = NULL;
-
-   switch (type) {
-   case IOMMU_DOMAIN_UNMANAGED:
-   io_domain = __ipmmu_domain_alloc(type);
-   break;
-
-   case IOMMU_DOMAIN_DMA:
-   io_domain = __ipmmu_domain_alloc(type);
-   if (io_domain && iommu_get_dma_cookie(io_domain)) {
-   kfree(io_domain);
-   io_domain = NULL;
-   }
-   break;
-   }
-
-   return io_domain;
-}
-
 static void ipmmu_domain_free(struct iommu_domain *io_domain)
 {
struct ipmmu_vmsa_domain *domain = to_vmsa_domain(io_domain);
@@ -606,7 +588,6 @@ static void ipmmu_domain_free(struct iommu_domain *io_domain)
 * Free the domain resources. We assume that all devices have already
 * been detached.
 */
-   iommu_put_dma_cookie(io_domain);
ipmmu_domain_destroy_context(domain);
free_io_pgtable_ops(domain->iop);
kfree(domain);
-- 
2.25.1



[PATCH v2 05/24] iommu/exynos: Drop IOVA cookie management

2021-07-28 Thread Robin Murphy
The core code bakes its own cookies now.

CC: Marek Szyprowski 
Signed-off-by: Robin Murphy 
---
 drivers/iommu/exynos-iommu.c | 18 --
 1 file changed, 4 insertions(+), 14 deletions(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index d0fbf1d10e18..34085d069cda 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -735,20 +735,16 @@ static struct iommu_domain *exynos_iommu_domain_alloc(unsigned type)
/* Check if correct PTE offsets are initialized */
BUG_ON(PG_ENT_SHIFT < 0 || !dma_dev);
 
+   if (type != IOMMU_DOMAIN_DMA && type != IOMMU_DOMAIN_UNMANAGED)
+   return NULL;
+
domain = kzalloc(sizeof(*domain), GFP_KERNEL);
if (!domain)
return NULL;
 
-   if (type == IOMMU_DOMAIN_DMA) {
-		if (iommu_get_dma_cookie(&domain->domain) != 0)
-   goto err_pgtable;
-   } else if (type != IOMMU_DOMAIN_UNMANAGED) {
-   goto err_pgtable;
-   }
-
domain->pgtable = (sysmmu_pte_t *)__get_free_pages(GFP_KERNEL, 2);
if (!domain->pgtable)
-   goto err_dma_cookie;
+   goto err_pgtable;
 
	domain->lv2entcnt = (short *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, 1);
if (!domain->lv2entcnt)
@@ -779,9 +775,6 @@ static struct iommu_domain *exynos_iommu_domain_alloc(unsigned type)
free_pages((unsigned long)domain->lv2entcnt, 1);
 err_counter:
free_pages((unsigned long)domain->pgtable, 2);
-err_dma_cookie:
-   if (type == IOMMU_DOMAIN_DMA)
-		iommu_put_dma_cookie(&domain->domain);
 err_pgtable:
kfree(domain);
return NULL;
@@ -809,9 +802,6 @@ static void exynos_iommu_domain_free(struct iommu_domain *iommu_domain)
 
	spin_unlock_irqrestore(&domain->lock, flags);
 
-   if (iommu_domain->type == IOMMU_DOMAIN_DMA)
-   iommu_put_dma_cookie(iommu_domain);
-
dma_unmap_single(dma_dev, virt_to_phys(domain->pgtable), LV1TABLE_SIZE,
 DMA_TO_DEVICE);
 
-- 
2.25.1



[PATCH v2 04/24] iommu/vt-d: Drop IOVA cookie management

2021-07-28 Thread Robin Murphy
The core code bakes its own cookies now.

Signed-off-by: Robin Murphy 
---
 drivers/iommu/intel/iommu.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index c12cc955389a..7e168634c433 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1979,10 +1979,6 @@ static void domain_exit(struct dmar_domain *domain)
/* Remove associated devices and clear attached or cached domains */
domain_remove_dev_info(domain);
 
-   /* destroy iovas */
-   if (domain->domain.type == IOMMU_DOMAIN_DMA)
-		iommu_put_dma_cookie(&domain->domain);
-
if (domain->pgd) {
struct page *freelist;
 
@@ -4544,10 +4540,6 @@ static struct iommu_domain *intel_iommu_domain_alloc(unsigned type)
return NULL;
}
 
-	if (type == IOMMU_DOMAIN_DMA &&
-	    iommu_get_dma_cookie(&dmar_domain->domain))
-		return NULL;
-
	domain = &dmar_domain->domain;
domain->geometry.aperture_start = 0;
domain->geometry.aperture_end   =
-- 
2.25.1



[PATCH v2 02/24] iommu/amd: Drop IOVA cookie management

2021-07-28 Thread Robin Murphy
The core code bakes its own cookies now.

Signed-off-by: Robin Murphy 
---
 drivers/iommu/amd/iommu.c | 12 
 1 file changed, 12 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 52fe2326042a..0fd98d35d73b 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1918,16 +1918,7 @@ static struct iommu_domain *amd_iommu_domain_alloc(unsigned type)
domain->domain.geometry.aperture_end   = ~0ULL;
domain->domain.geometry.force_aperture = true;
 
-	if (type == IOMMU_DOMAIN_DMA &&
-	    iommu_get_dma_cookie(&domain->domain) == -ENOMEM)
-		goto free_domain;
-
	return &domain->domain;
-
-free_domain:
-   protection_domain_free(domain);
-
-   return NULL;
 }
 
 static void amd_iommu_domain_free(struct iommu_domain *dom)
@@ -1944,9 +1935,6 @@ static void amd_iommu_domain_free(struct iommu_domain *dom)
if (!dom)
return;
 
-   if (dom->type == IOMMU_DOMAIN_DMA)
-		iommu_put_dma_cookie(&domain->domain);
-
if (domain->flags & PD_IOMMUV2_MASK)
free_gcr3_table(domain);
 
-- 
2.25.1



[PATCH v2 03/24] iommu/arm-smmu: Drop IOVA cookie management

2021-07-28 Thread Robin Murphy
The core code bakes its own cookies now.

Signed-off-by: Robin Murphy 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  7 ---
 drivers/iommu/arm/arm-smmu/arm-smmu.c   | 15 ---
 drivers/iommu/arm/arm-smmu/qcom_iommu.c |  8 
 3 files changed, 4 insertions(+), 26 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 6346f21726f4..4c648da447bf 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1984,12 +1984,6 @@ static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
if (!smmu_domain)
return NULL;
 
-	if (type == IOMMU_DOMAIN_DMA &&
-	    iommu_get_dma_cookie(&smmu_domain->domain)) {
-		kfree(smmu_domain);
-		return NULL;
-	}
-
	mutex_init(&smmu_domain->init_mutex);
	INIT_LIST_HEAD(&smmu_domain->devices);
	spin_lock_init(&smmu_domain->devices_lock);
@@ -2021,7 +2015,6 @@ static void arm_smmu_domain_free(struct iommu_domain *domain)
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
struct arm_smmu_device *smmu = smmu_domain->smmu;
 
-   iommu_put_dma_cookie(domain);
free_io_pgtable_ops(smmu_domain->pgtbl_ops);
 
/* Free the CD and ASID, if we allocated them */
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index ac21170fa208..970d9e4dcd69 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -868,10 +868,10 @@ static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
 {
struct arm_smmu_domain *smmu_domain;
 
-   if (type != IOMMU_DOMAIN_UNMANAGED &&
-   type != IOMMU_DOMAIN_DMA &&
-   type != IOMMU_DOMAIN_IDENTITY)
-   return NULL;
+   if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_IDENTITY) {
+   if (using_legacy_binding || type != IOMMU_DOMAIN_DMA)
+   return NULL;
+   }
/*
 * Allocate the domain and initialise some of its data structures.
 * We can't really do anything meaningful until we've added a
@@ -881,12 +881,6 @@ static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
if (!smmu_domain)
return NULL;
 
-	if (type == IOMMU_DOMAIN_DMA && (using_legacy_binding ||
-	    iommu_get_dma_cookie(&smmu_domain->domain))) {
-		kfree(smmu_domain);
-		return NULL;
-	}
-
	mutex_init(&smmu_domain->init_mutex);
	spin_lock_init(&smmu_domain->cb_lock);
 
@@ -901,7 +895,6 @@ static void arm_smmu_domain_free(struct iommu_domain *domain)
 * Free the domain resources. We assume that all devices have
 * already been detached.
 */
-   iommu_put_dma_cookie(domain);
arm_smmu_destroy_domain_context(domain);
kfree(smmu_domain);
 }
diff --git a/drivers/iommu/arm/arm-smmu/qcom_iommu.c 
b/drivers/iommu/arm/arm-smmu/qcom_iommu.c
index 021cf8f65ffc..4b7eca5f5148 100644
--- a/drivers/iommu/arm/arm-smmu/qcom_iommu.c
+++ b/drivers/iommu/arm/arm-smmu/qcom_iommu.c
@@ -335,12 +335,6 @@ static struct iommu_domain *qcom_iommu_domain_alloc(unsigned type)
if (!qcom_domain)
return NULL;
 
-	if (type == IOMMU_DOMAIN_DMA &&
-	    iommu_get_dma_cookie(&qcom_domain->domain)) {
-		kfree(qcom_domain);
-		return NULL;
-	}
-
	mutex_init(&qcom_domain->init_mutex);
	spin_lock_init(&qcom_domain->pgtbl_lock);
 
@@ -351,8 +345,6 @@ static void qcom_iommu_domain_free(struct iommu_domain *domain)
 {
struct qcom_iommu_domain *qcom_domain = to_qcom_iommu_domain(domain);
 
-   iommu_put_dma_cookie(domain);
-
if (qcom_domain->iommu) {
/*
 * NOTE: unmap can be called after client device is powered
-- 
2.25.1



[PATCH v2 01/24] iommu: Pull IOVA cookie management into the core

2021-07-28 Thread Robin Murphy
Now that everyone has converged on iommu-dma for IOMMU_DOMAIN_DMA
support, we can abandon the notion of drivers being responsible for the
cookie type, and consolidate all the management into the core code.
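
For illustration, a typical driver's domain_alloc then shrinks to
something like this (a sketch with made-up foo_* names rather than an
excerpt from any one driver):

static struct iommu_domain *foo_domain_alloc(unsigned type)
{
	struct foo_domain *fd;

	if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA)
		return NULL;

	fd = kzalloc(sizeof(*fd), GFP_KERNEL);
	if (!fd)
		return NULL;

	/* No iommu_get_dma_cookie() here: the core allocates the cookie,
	 * and iommu_domain_free() puts it. */
	return &fd->domain;
}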

CC: Marek Szyprowski 
CC: Yoshihiro Shimoda 
CC: Geert Uytterhoeven 
CC: Yong Wu 
CC: Heiko Stuebner 
CC: Chunyan Zhang 
CC: Chunyan Zhang 
CC: Maxime Ripard 
CC: Jean-Philippe Brucker 
Signed-off-by: Robin Murphy 
---
 drivers/iommu/iommu.c | 7 +++
 include/linux/iommu.h | 3 ++-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index f2cda9950bd5..ea5a9ea8d431 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -7,6 +7,7 @@
 #define pr_fmt(fmt)"iommu: " fmt
 
 #include 
+#include <linux/dma-iommu.h>
 #include 
 #include 
 #include 
@@ -1946,6 +1947,11 @@ static struct iommu_domain *__iommu_domain_alloc(struct bus_type *bus,
/* Assume all sizes by default; the driver may override this later */
domain->pgsize_bitmap  = bus->iommu_ops->pgsize_bitmap;
 
+	/* Temporarily ignore -EEXIST while drivers still get their own cookies */
+	if (type == IOMMU_DOMAIN_DMA && iommu_get_dma_cookie(domain) == -ENOMEM) {
+   iommu_domain_free(domain);
+   domain = NULL;
+   }
return domain;
 }
 
@@ -1957,6 +1963,7 @@ EXPORT_SYMBOL_GPL(iommu_domain_alloc);
 
 void iommu_domain_free(struct iommu_domain *domain)
 {
+   iommu_put_dma_cookie(domain);
domain->ops->domain_free(domain);
 }
 EXPORT_SYMBOL_GPL(iommu_domain_free);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 4997c78e2670..141779d76035 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -40,6 +40,7 @@ struct iommu_domain;
 struct notifier_block;
 struct iommu_sva;
 struct iommu_fault_event;
+struct iommu_dma_cookie;
 
 /* iommu fault flags */
 #define IOMMU_FAULT_READ   0x0
@@ -86,7 +87,7 @@ struct iommu_domain {
iommu_fault_handler_t handler;
void *handler_token;
struct iommu_domain_geometry geometry;
-   void *iova_cookie;
+   struct iommu_dma_cookie *iova_cookie;
 };
 
 enum iommu_cap {
-- 
2.25.1



[PATCH v2 00/24] iommu: Refactor DMA domain strictness

2021-07-28 Thread Robin Murphy
Hi all,

Here's v2 where things start to look more realistic, hence the expanded
CC list. The patches are now based on the current iommu/core branch to
take John's iommu_set_dma_strict() cleanup into account.

The series remains in two (or possibly three) logical parts - for people
CC'd on the cookie cleanup patches, the later parts should not affect you
since your drivers don't implement non-strict mode anyway; the cleanup
is all pretty straightforward, but please do yell at me if I've managed
to let a silly mistake slip through and break your driver.

This time I have also build-tested x86 as well as arm64 :)

Changes in v2:

- Add iommu_is_dma_domain() helper to abstract flag check (and help
  avoid silly typos like the one in v1).
- Tweak a few commit messages for spelling and (hopefully) clarity.
- Move the iommu_create_device_direct_mappings() update to patch #14
  where it should have been.
- Rewrite patch #20 as a conversion of the now-existing option.
- Clean up the ops->flush_iotlb_all check which is also made redundant
  by the new domain type
- Add patch #24, which is arguably tangential, but it was something I
  spotted during the rebase, so...

Once again, the whole lot is available on a branch here:

https://gitlab.arm.com/linux-arm/linux-rm/-/tree/iommu/fq

Thanks,
Robin.


CC: Marek Szyprowski 
CC: Yoshihiro Shimoda 
CC: Geert Uytterhoeven 
CC: Yong Wu 
CC: Heiko Stuebner 
CC: Chunyan Zhang 
CC: Chunyan Zhang 
CC: Maxime Ripard 
CC: Jean-Philippe Brucker 

Robin Murphy (24):
  iommu: Pull IOVA cookie management into the core
  iommu/amd: Drop IOVA cookie management
  iommu/arm-smmu: Drop IOVA cookie management
  iommu/vt-d: Drop IOVA cookie management
  iommu/exynos: Drop IOVA cookie management
  iommu/ipmmu-vmsa: Drop IOVA cookie management
  iommu/mtk: Drop IOVA cookie management
  iommu/rockchip: Drop IOVA cookie management
  iommu/sprd: Drop IOVA cookie management
  iommu/sun50i: Drop IOVA cookie management
  iommu/virtio: Drop IOVA cookie management
  iommu/dma: Unexport IOVA cookie management
  iommu/dma: Remove redundant "!dev" checks
  iommu: Introduce explicit type for non-strict DMA domains
  iommu/amd: Prepare for multiple DMA domain types
  iommu/arm-smmu: Prepare for multiple DMA domain types
  iommu/vt-d: Prepare for multiple DMA domain types
  iommu: Express DMA strictness via the domain type
  iommu: Expose DMA domain strictness via sysfs
  iommu: Merge strictness and domain type configs
  iommu/dma: Factor out flush queue init
  iommu: Allow enabling non-strict mode dynamically
  iommu/arm-smmu: Allow non-strict in pgtable_quirks interface
  iommu: Only log strictness for DMA domains

 .../ABI/testing/sysfs-kernel-iommu_groups |  2 +
 drivers/iommu/Kconfig | 80 +--
 drivers/iommu/amd/iommu.c | 21 +
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 25 --
 drivers/iommu/arm/arm-smmu/arm-smmu.c | 29 ---
 drivers/iommu/arm/arm-smmu/qcom_iommu.c   |  8 --
 drivers/iommu/dma-iommu.c | 44 +-
 drivers/iommu/exynos-iommu.c  | 18 +
 drivers/iommu/intel/iommu.c   | 23 ++
 drivers/iommu/iommu.c | 53 +++-
 drivers/iommu/ipmmu-vmsa.c| 27 +--
 drivers/iommu/mtk_iommu.c |  6 --
 drivers/iommu/rockchip-iommu.c| 11 +--
 drivers/iommu/sprd-iommu.c|  6 --
 drivers/iommu/sun50i-iommu.c  | 12 +--
 drivers/iommu/virtio-iommu.c  |  8 --
 include/linux/dma-iommu.h |  9 ++-
 include/linux/iommu.h | 15 +++-
 18 files changed, 171 insertions(+), 226 deletions(-)

-- 
2.25.1



Re: [bug report] iommu_dma_unmap_sg() is very slow then running IO from remote numa node

2021-07-28 Thread Robin Murphy

On 2021-07-28 16:17, Ming Lei wrote:

On Wed, Jul 28, 2021 at 11:38:18AM +0100, John Garry wrote:

On 28/07/2021 02:32, Ming Lei wrote:

On Mon, Jul 26, 2021 at 3:51 PM John Garry  wrote:

On 23/07/2021 11:21, Ming Lei wrote:

Thanks, I was also going to suggest the latter, since it's what
arm_smmu_cmdq_issue_cmdlist() does with IRQs masked that should be most
indicative of where the slowness most likely stems from.

The improvement from 'iommu.strict=0' is very small:


Have you tried turning off the IOMMU to ensure that this is really just
an IOMMU problem?

You can try setting CONFIG_ARM_SMMU_V3=n in the defconfig or passing
cmdline param iommu.passthrough=1 to bypass the the SMMU (equivalent to
disabling for kernel drivers).

Bypassing SMMU via iommu.passthrough=1 basically doesn't make a difference
on this issue.


A ~90% throughput drop still seems to me to be too high to be a software
issue. More so since I don't see similar on my system. And that throughput
drop does not lead to a total CPU usage drop, from the fio log.


Indeed, it now sounds like $SUBJECT has been a complete red herring, and 
although the SMMU may be reflecting the underlying slowness it is not in 
fact a significant contributor to it. Presumably perf shows any 
difference in CPU time moving elsewhere once iommu_dma_unmap_sg() is out 
of the picture?



Do you know if anyone has run memory benchmark tests on this board to find
out NUMA effect? I think lmbench or stream could be used for this.


https://lore.kernel.org/lkml/YOhbc5C47IzC893B@T590/


Hmm, a ~4x discrepancy in CPU<->memory bandwidth is pretty significant, 
but it's still not the ~10x discrepancy in NVMe throughput. Possibly 
CPU<->PCIe and/or PCIe<->memory bandwidth is even further impacted 
between sockets, or perhaps all the individual latencies just add up - 
that level of detailed performance analysis is beyond my expertise. 
Either way I guess it's probably time to take it up with the system 
vendor to see if there's anything which can be tuned in hardware/firmware.


Robin.


Re: [PATCH 03/13] x86/HV: Add new hvcall guest address host visibility support

2021-07-28 Thread Dave Hansen
On 7/28/21 7:52 AM, Tianyu Lan wrote:
> @@ -1986,7 +1988,9 @@ static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
>   int ret;
>  
>   /* Nothing to do if memory encryption is not active */
> - if (!mem_encrypt_active())
> + if (hv_is_isolation_supported())
> + return hv_set_mem_enc(addr, numpages, enc);
> + else if (!mem_encrypt_active())
>   return 0;

__set_memory_enc_dec() is turning into a real mess.  SEV, TDX and now
Hyper-V are messing around in here.

It doesn't help that these additions are totally uncommented.  Even
worse is that hv_set_mem_enc() was intentionally named "enc" when it
presumably has nothing to do with encryption.

This needs to be refactored.  The current __set_memory_enc_dec() can
become __set_memory_enc_pgtable().  It gets used for the hypervisors
that get informed about "encryption" status via page tables: SEV and TDX.

Then, rename hv_set_mem_enc() to hv_set_visible_hcall().  You'll end up
with:

int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
{
if (hv_is_isolation_supported())
return hv_set_visible_hcall(...);

if (mem_encrypt_active() || ...)
return __set_memory_enc_pgtable();

/* Nothing to do */
return 0;
}

That tells the story pretty effectively, in code.

> +int hv_set_mem_enc(unsigned long addr, int numpages, bool enc)
> +{
> + return hv_set_mem_host_visibility((void *)addr,
> + numpages * HV_HYP_PAGE_SIZE,
> + enc ? VMBUS_PAGE_NOT_VISIBLE
> + : VMBUS_PAGE_VISIBLE_READ_WRITE);
> +}

I know this is off in Hyper-V code, but this just makes my eyes bleed.
I'd much rather see something which is less compact but readable.

> +/* Hyper-V GPA map flags */
> +#define  VMBUS_PAGE_NOT_VISIBLE  0
> +#define  VMBUS_PAGE_VISIBLE_READ_ONLY1
> +#define  VMBUS_PAGE_VISIBLE_READ_WRITE   3

That looks suspiciously like an enum.
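
Something like this, perhaps (just a sketch of that suggestion; the
enum tag is made up):

enum hv_vmbus_page_visibility {
	VMBUS_PAGE_NOT_VISIBLE		= 0,
	VMBUS_PAGE_VISIBLE_READ_ONLY	= 1,
	VMBUS_PAGE_VISIBLE_READ_WRITE	= 3,
};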


Re: [bug report] iommu_dma_unmap_sg() is very slow then running IO from remote numa node

2021-07-28 Thread Ming Lei
On Wed, Jul 28, 2021 at 11:38:18AM +0100, John Garry wrote:
> On 28/07/2021 02:32, Ming Lei wrote:
> > On Mon, Jul 26, 2021 at 3:51 PM John Garry  wrote:
> > > On 23/07/2021 11:21, Ming Lei wrote:
> > > > > Thanks, I was also going to suggest the latter, since it's what
> > > > > arm_smmu_cmdq_issue_cmdlist() does with IRQs masked that should be 
> > > > > most
> > > > > indicative of where the slowness most likely stems from.
> > > > The improvement from 'iommu.strict=0' is very small:
> > > > 
> > > Have you tried turning off the IOMMU to ensure that this is really just
> > > an IOMMU problem?
> > > 
> > > You can try setting CONFIG_ARM_SMMU_V3=n in the defconfig or passing
> > > cmdline param iommu.passthrough=1 to bypass the the SMMU (equivalent to
> > > disabling for kernel drivers).
> > Bypassing SMMU via iommu.passthrough=1 basically doesn't make a difference
> > on this issue.
> 
> A ~90% throughput drop still seems to me to be too high to be a software
> issue. More so since I don't see similar on my system. And that throughput
> drop does not lead to a total CPU usage drop, from the fio log.
> 
> Do you know if anyone has run memory benchmark tests on this board to find
> out NUMA effect? I think lmbench or stream could be used for this.

https://lore.kernel.org/lkml/YOhbc5C47IzC893B@T590/

-- 
Ming



[PATCH 12/13] HV/Netvsc: Add Isolation VM support for netvsc driver

2021-07-28 Thread Tianyu Lan
From: Tianyu Lan 

In Isolation VM, all memory shared with the host must be marked
visible to the host via a hvcall. vmbus_establish_gpadl() has already
done this for the netvsc rx/tx ring buffers, but the page buffers used
by vmbus_sendpacket_pagebuffer() still need to be handled. Use the DMA
API to map/unmap this memory when sending/receiving packets; the
Hyper-V DMA ops callback will use the swiotlb functions to allocate a
bounce buffer and copy data from/to it.
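
The shape of the mapping step is roughly the following (a sketch of the
approach only, reusing the hv_dma_range fields this patch adds; it is
not a verbatim excerpt from the diff below):

	/* One DMA mapping per Hyper-V page backing the packet. */
	for (i = 0; i < page_count; i++) {
		dma_addr_t dma = dma_map_page(&hv_dev->device,
					      pfn_to_page(pb[i].pfn),
					      pb[i].offset, pb[i].len,
					      DMA_TO_DEVICE);

		if (dma_mapping_error(&hv_dev->device, dma))
			goto unmap;	/* unwind whatever was mapped so far */

		packet->dma_range[i].dma = dma;
		packet->dma_range[i].mapping_size = pb[i].len;
		pb[i].pfn = dma >> HV_HYP_PAGE_SHIFT; /* host sees the bounce page */
	}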

Signed-off-by: Tianyu Lan 
---
 drivers/net/hyperv/hyperv_net.h   |   6 ++
 drivers/net/hyperv/netvsc.c   | 144 +-
 drivers/net/hyperv/rndis_filter.c |   2 +
 include/linux/hyperv.h|   5 ++
 4 files changed, 154 insertions(+), 3 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index bc48855dff10..862419912bfb 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -164,6 +164,7 @@ struct hv_netvsc_packet {
u32 total_bytes;
u32 send_buf_index;
u32 total_data_buflen;
+   struct hv_dma_range *dma_range;
 };
 
 #define NETVSC_HASH_KEYLEN 40
@@ -1074,6 +1075,7 @@ struct netvsc_device {
 
/* Receive buffer allocated by us but manages by NetVSP */
void *recv_buf;
+   void *recv_original_buf;
u32 recv_buf_size; /* allocated bytes */
u32 recv_buf_gpadl_handle;
u32 recv_section_cnt;
@@ -1082,6 +1084,8 @@ struct netvsc_device {
 
/* Send buffer allocated by us */
void *send_buf;
+   void *send_original_buf;
+   u32 send_buf_size;
u32 send_buf_gpadl_handle;
u32 send_section_cnt;
u32 send_section_size;
@@ -1730,4 +1734,6 @@ struct rndis_message {
#define RETRY_US_HI	10000
#define RETRY_MAX	2000	/* >10 sec */
 
+void netvsc_dma_unmap(struct hv_device *hv_dev,
+ struct hv_netvsc_packet *packet);
 #endif /* _HYPERV_NET_H */
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 7bd935412853..fc312e5db4d5 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -153,8 +153,21 @@ static void free_netvsc_device(struct rcu_head *head)
int i;
 
kfree(nvdev->extension);
-   vfree(nvdev->recv_buf);
-   vfree(nvdev->send_buf);
+
+   if (nvdev->recv_original_buf) {
+   vunmap(nvdev->recv_buf);
+   vfree(nvdev->recv_original_buf);
+   } else {
+   vfree(nvdev->recv_buf);
+   }
+
+   if (nvdev->send_original_buf) {
+   vunmap(nvdev->send_buf);
+   vfree(nvdev->send_original_buf);
+   } else {
+   vfree(nvdev->send_buf);
+   }
+
kfree(nvdev->send_section_map);
 
for (i = 0; i < VRSS_CHANNEL_MAX; i++) {
@@ -330,6 +343,27 @@ int netvsc_alloc_recv_comp_ring(struct netvsc_device *net_device, u32 q_idx)
return nvchan->mrc.slots ? 0 : -ENOMEM;
 }
 
+static void *netvsc_remap_buf(void *buf, unsigned long size)
+{
+   unsigned long *pfns;
+   void *vaddr;
+   int i;
+
+   pfns = kcalloc(size / HV_HYP_PAGE_SIZE, sizeof(unsigned long),
+  GFP_KERNEL);
+   if (!pfns)
+   return NULL;
+
+   for (i = 0; i < size / HV_HYP_PAGE_SIZE; i++)
+   pfns[i] = virt_to_hvpfn(buf + i * HV_HYP_PAGE_SIZE)
+   + (ms_hyperv.shared_gpa_boundary >> HV_HYP_PAGE_SHIFT);
+
+   vaddr = vmap_pfn(pfns, size / HV_HYP_PAGE_SIZE, PAGE_KERNEL_IO);
+   kfree(pfns);
+
+   return vaddr;
+}
+
 static int netvsc_init_buf(struct hv_device *device,
   struct netvsc_device *net_device,
   const struct netvsc_device_info *device_info)
@@ -340,6 +374,7 @@ static int netvsc_init_buf(struct hv_device *device,
unsigned int buf_size;
size_t map_words;
int i, ret = 0;
+   void *vaddr;
 
/* Get receive buffer area. */
buf_size = device_info->recv_sections * device_info->recv_section_size;
@@ -375,6 +410,15 @@ static int netvsc_init_buf(struct hv_device *device,
goto cleanup;
}
 
+   if (hv_isolation_type_snp()) {
+   vaddr = netvsc_remap_buf(net_device->recv_buf, buf_size);
+   if (!vaddr)
+   goto cleanup;
+
+   net_device->recv_original_buf = net_device->recv_buf;
+   net_device->recv_buf = vaddr;
+   }
+
/* Notify the NetVsp of the gpadl handle */
	init_packet = &net_device->channel_init_pkt;
memset(init_packet, 0, sizeof(struct nvsp_message));
@@ -477,6 +521,15 @@ static int netvsc_init_buf(struct hv_device *device,
goto cleanup;
}
 
+   if (hv_isolation_type_snp()) {
+   vaddr = netvsc_remap_buf(net_device->send_buf, buf_size);
+   if (!vaddr)
+   goto cleanup;
+
+   net_device->send_original_buf = 

[PATCH 13/13] HV/Storvsc: Add Isolation VM support for storvsc driver

2021-07-28 Thread Tianyu Lan
From: Tianyu Lan 

In Isolation VM, all memory shared with the host must be marked
visible to the host via a hvcall. vmbus_establish_gpadl() has already
done this for the storvsc rx/tx ring buffers, but the page buffers used
by vmbus_sendpacket_mpb_desc() still need to be handled. Use the DMA
API to map/unmap this memory when sending/receiving packets; the
Hyper-V DMA ops callback will use the swiotlb functions to allocate a
bounce buffer and copy data from/to it.

Signed-off-by: Tianyu Lan 
---
 drivers/scsi/storvsc_drv.c | 68 +++---
 1 file changed, 63 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 328bb961c281..78320719bdd8 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -21,6 +21,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -427,6 +429,8 @@ struct storvsc_cmd_request {
u32 payload_sz;
 
struct vstor_packet vstor_packet;
+   u32 hvpg_count;
+   struct hv_dma_range *dma_range;
 };
 
 
@@ -509,6 +513,14 @@ struct storvsc_scan_work {
u8 tgt_id;
 };
 
+#define storvsc_dma_map(dev, page, offset, size, dir) \
+   dma_map_page(dev, page, offset, size, dir)
+
+#define storvsc_dma_unmap(dev, dma_range, dir) \
+   dma_unmap_page(dev, dma_range.dma,  \
+  dma_range.mapping_size,  \
+  dir ? DMA_FROM_DEVICE : DMA_TO_DEVICE)
+
 static void storvsc_device_scan(struct work_struct *work)
 {
struct storvsc_scan_work *wrk;
@@ -1260,6 +1272,7 @@ static void storvsc_on_channel_callback(void *context)
struct hv_device *device;
struct storvsc_device *stor_device;
struct Scsi_Host *shost;
+   int i;
 
if (channel->primary_channel != NULL)
device = channel->primary_channel->device_obj;
@@ -1314,6 +1327,15 @@ static void storvsc_on_channel_callback(void *context)
			request = (struct storvsc_cmd_request *)scsi_cmd_priv(scmnd);
}
 
+   if (request->dma_range) {
+   for (i = 0; i < request->hvpg_count; i++)
+				storvsc_dma_unmap(&device->device,
+						  request->dma_range[i],
+						  request->vstor_packet.vm_srb.data_in == READ_TYPE);
+
+   kfree(request->dma_range);
+   }
+
storvsc_on_receive(stor_device, packet, request);
continue;
}
@@ -1810,7 +1832,9 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
unsigned int hvpgoff, hvpfns_to_add;
unsigned long offset_in_hvpg = offset_in_hvpage(sgl->offset);
unsigned int hvpg_count = HVPFN_UP(offset_in_hvpg + length);
+   dma_addr_t dma;
u64 hvpfn;
+   u32 size;
 
if (hvpg_count > MAX_PAGE_BUFFER_COUNT) {
 
@@ -1824,6 +1848,13 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
payload->range.len = length;
payload->range.offset = offset_in_hvpg;
 
+   cmd_request->dma_range = kcalloc(hvpg_count,
+sizeof(*cmd_request->dma_range),
+GFP_ATOMIC);
+   if (!cmd_request->dma_range) {
+   ret = -ENOMEM;
+   goto free_payload;
+   }
 
for (i = 0; sgl != NULL; sgl = sg_next(sgl)) {
/*
@@ -1847,9 +1878,29 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
 * last sgl should be reached at the same time that
 * the PFN array is filled.
 */
-   while (hvpfns_to_add--)
-   payload->range.pfn_array[i++] = hvpfn++;
+   while (hvpfns_to_add--) {
+   size = min(HV_HYP_PAGE_SIZE - offset_in_hvpg,
+  (unsigned long)length);
+			dma = storvsc_dma_map(&device->device, pfn_to_page(hvpfn++),
+ offset_in_hvpg, size,
+ scmnd->sc_data_direction);
+			if (dma_mapping_error(&device->device, dma)) {
+   ret = -ENOMEM;
+   goto free_dma_range;
+   }
+
+   if (offset_in_hvpg) {
+				payload->range.offset = dma & ~HV_HYP_PAGE_MASK;
+   

[PATCH 11/13] HV/IOMMU: Enable swiotlb bounce buffer for Isolation VM

2021-07-28 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V Isolation VM requires bounce buffer support to copy
data from/to encrypted memory, so enable swiotlb force
mode to use the swiotlb bounce buffer for DMA transactions.

In an Isolation VM with AMD SEV, the bounce buffer needs to be
accessed via an extra address space which is above shared_gpa_boundary
(e.g. the 39-bit address line) reported by the Hyper-V CPUID
ISOLATION_CONFIG leaf. The access physical address is the original
physical address + shared_gpa_boundary. In the AMD SEV SNP spec,
shared_gpa_boundary is called the virtual top of memory (vTOM).
Memory addresses below vTOM are automatically treated as private while
memory above vTOM is treated as shared.

The swiotlb bounce buffer code calls dma_map_decrypted() to mark the
bounce buffer visible to the host and map it in the extra address
space. Populate the dma memory decrypted ops with the hv map/unmap
functions.

Hyper-V initializes the swiotlb bounce buffer itself, so the default
swiotlb needs to be disabled. pci_swiotlb_detect_override() and
pci_swiotlb_detect_4gb() enable the default one. To override the
setting, hyperv_swiotlb_detect() needs to run before these detect
functions, which depend on pci_xen_swiotlb_init(). Make
pci_xen_swiotlb_init() depend on hyperv_swiotlb_detect() to keep the
order.

The map function vmap_pfn() can't work as early as
hyperv_iommu_swiotlb_init(), so initialize the swiotlb bounce
buffer in hyperv_iommu_swiotlb_later_init().
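
Concretely, the addressing rule above boils down to (a sketch;
hv_shared_pa() is a made-up helper name):

	/* Physical address the host must use for a page shared above vTOM. */
	static inline phys_addr_t hv_shared_pa(phys_addr_t pa)
	{
		return pa + ms_hyperv.shared_gpa_boundary;
	}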

Signed-off-by: Tianyu Lan 
---
 arch/x86/hyperv/ivm.c   | 28 ++
 arch/x86/include/asm/mshyperv.h |  2 +
 arch/x86/xen/pci-swiotlb-xen.c  |  3 +-
 drivers/hv/vmbus_drv.c  |  3 ++
 drivers/iommu/hyperv-iommu.c| 65 +
 include/linux/hyperv.h  |  1 +
 6 files changed, 101 insertions(+), 1 deletion(-)

diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 13bab7f07085..9fbb5cbf3321 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -266,3 +266,31 @@ int hv_set_mem_enc(unsigned long addr, int numpages, bool enc)
enc ? VMBUS_PAGE_NOT_VISIBLE
: VMBUS_PAGE_VISIBLE_READ_WRITE);
 }
+
+/*
+ * hv_map_memory - map memory to extra space in the AMD SEV-SNP Isolation VM.
+ */
+void *hv_map_memory(void *addr, unsigned long size)
+{
+   unsigned long *pfns = kcalloc(size / HV_HYP_PAGE_SIZE,
+ sizeof(unsigned long), GFP_KERNEL);
+   void *vaddr;
+   int i;
+
+   if (!pfns)
+		return NULL;
+
+   for (i = 0; i < size / HV_HYP_PAGE_SIZE; i++)
+   pfns[i] = virt_to_hvpfn(addr + i * HV_HYP_PAGE_SIZE) +
+   (ms_hyperv.shared_gpa_boundary >> HV_HYP_PAGE_SHIFT);
+
+   vaddr = vmap_pfn(pfns, size / HV_HYP_PAGE_SIZE, PAGE_KERNEL_IO);
+   kfree(pfns);
+
+   return vaddr;
+}
+
+void hv_unmap_memory(void *addr)
+{
+   vunmap(addr);
+}
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 8bf26e6e7055..b815ec0bc36d 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -249,6 +249,8 @@ int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
 int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry);
 int hv_mark_gpa_visibility(u16 count, const u64 pfn[], u32 visibility);
 int hv_set_mem_enc(unsigned long addr, int numpages, bool enc);
+void *hv_map_memory(void *addr, unsigned long size);
+void hv_unmap_memory(void *addr);
 void hv_sint_wrmsrl_ghcb(u64 msr, u64 value);
 void hv_sint_rdmsrl_ghcb(u64 msr, u64 *value);
 void hv_signal_eom_ghcb(void);
diff --git a/arch/x86/xen/pci-swiotlb-xen.c b/arch/x86/xen/pci-swiotlb-xen.c
index 54f9aa7e8457..43bd031aa332 100644
--- a/arch/x86/xen/pci-swiotlb-xen.c
+++ b/arch/x86/xen/pci-swiotlb-xen.c
@@ -4,6 +4,7 @@
 
 #include 
 #include 
+#include <linux/hyperv.h>
 #include 
 
 #include 
@@ -91,6 +92,6 @@ int pci_xen_swiotlb_init_late(void)
 EXPORT_SYMBOL_GPL(pci_xen_swiotlb_init_late);
 
 IOMMU_INIT_FINISH(pci_xen_swiotlb_detect,
- NULL,
+ hyperv_swiotlb_detect,
  pci_xen_swiotlb_init,
  NULL);
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 57bbbaa4e8f7..f068e22a5636 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 
+#include <linux/dma-mapping.h>
 #include 
 #include 
 #include 
@@ -2081,6 +2082,7 @@ struct hv_device *vmbus_device_create(const guid_t *type,
return child_device_obj;
 }
 
+static u64 vmbus_dma_mask = DMA_BIT_MASK(64);
 /*
  * vmbus_device_register - Register the child device
  */
@@ -2121,6 +2123,7 @@ int vmbus_device_register(struct hv_device *child_device_obj)
}
hv_debug_add_dev_dir(child_device_obj);
 
+	child_device_obj->device.dma_mask = &vmbus_dma_mask;
return 0;
 
 err_kset_unregister:
diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index 

[PATCH 10/13] x86/Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM

2021-07-28 Thread Tianyu Lan
From: Tianyu Lan 

In Isolation VM with AMD SEV, the bounce buffer needs to be accessed
via an extra address space which is above shared_gpa_boundary
(e.g. the 39-bit address line) reported by the Hyper-V CPUID
ISOLATION_CONFIG leaf. The access physical address is the original
physical address + shared_gpa_boundary. In the AMD SEV SNP spec,
shared_gpa_boundary is called the virtual top of memory (vTOM).
Memory addresses below vTOM are automatically treated as private while
memory above vTOM is treated as shared.

Use dma_map_decrypted() in the swiotlb code, store the remap address
it returns, and use that remap address to copy data from/to the
swiotlb bounce buffer.

Signed-off-by: Tianyu Lan 
---
 include/linux/swiotlb.h |  4 
 kernel/dma/swiotlb.c| 11 ---
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index f507e3eacbea..584560ecaa8e 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -72,6 +72,9 @@ extern enum swiotlb_force swiotlb_force;
  * @end:   The end address of the swiotlb memory pool. Used to do a quick
  * range check to see if the memory was in fact allocated by this
  * API.
+ * @vaddr:	The vaddr of the swiotlb memory pool. The swiotlb
+ *		memory pool may be remapped in the memory encrypted case and store
+ *		virtual address for bounce buffer operation.
  * @nslabs:The number of IO TLB blocks (in groups of 64) between @start and
  * @end. For default swiotlb, this is command line adjustable via
  * setup_io_tlb_npages.
@@ -89,6 +92,7 @@ extern enum swiotlb_force swiotlb_force;
 struct io_tlb_mem {
phys_addr_t start;
phys_addr_t end;
+   void *vaddr;
unsigned long nslabs;
unsigned long used;
unsigned int index;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 1fa81c096c1d..6866e5784b53 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -194,8 +194,13 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
mem->slots[i].alloc_size = 0;
}
 
-   set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT);
-   memset(vaddr, 0, bytes);
+   mem->vaddr = dma_map_decrypted(vaddr, bytes);
+   if (!mem->vaddr) {
+   pr_err("Failed to decrypt memory.\n");
+   return;
+   }
+
+   memset(mem->vaddr, 0, bytes);
 }
 
 int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
@@ -360,7 +365,7 @@ static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t size
phys_addr_t orig_addr = mem->slots[index].orig_addr;
size_t alloc_size = mem->slots[index].alloc_size;
unsigned long pfn = PFN_DOWN(orig_addr);
-   unsigned char *vaddr = phys_to_virt(tlb_addr);
+   unsigned char *vaddr = mem->vaddr + tlb_addr - mem->start;
unsigned int tlb_offset;
 
if (orig_addr == INVALID_PHYS_ADDR)
-- 
2.25.1



[PATCH 09/13] DMA: Add dma_map_decrypted/dma_unmap_encrypted() function

2021-07-28 Thread Tianyu Lan
From: Tianyu Lan 

In Hyper-V Isolation VM with AMD SEV, the swiotlb bounce buffer
needs to be mapped into the address space above vTOM, so
introduce dma_map_decrypted()/dma_unmap_encrypted() to map/unmap
bounce buffer memory. The platform can populate the map/unmap callbacks
in the dma memory decrypted ops.
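
Usage then looks roughly like this (a sketch; hv_map_memory() and
hv_unmap_memory() are the Hyper-V callbacks added elsewhere in this
series):

	/* Platform init: hook up the decrypted-memory remap callbacks. */
	dma_memory_generic_decrypted_ops.map = hv_map_memory;
	dma_memory_generic_decrypted_ops.unmap = hv_unmap_memory;

	/* swiotlb setup: decrypt the pool and get back a usable mapping. */
	mem->vaddr = dma_map_decrypted(phys_to_virt(mem->start), bytes);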

Signed-off-by: Tianyu Lan 
---
 include/linux/dma-map-ops.h |  9 +
 kernel/dma/mapping.c| 22 ++
 2 files changed, 31 insertions(+)

diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
index 0d53a96a3d64..01d60a024e45 100644
--- a/include/linux/dma-map-ops.h
+++ b/include/linux/dma-map-ops.h
@@ -71,6 +71,11 @@ struct dma_map_ops {
unsigned long (*get_merge_boundary)(struct device *dev);
 };
 
+struct dma_memory_decrypted_ops {
+   void *(*map)(void *addr, unsigned long size);
+   void (*unmap)(void *addr);
+};
+
 #ifdef CONFIG_DMA_OPS
 #include 
 
@@ -374,6 +379,10 @@ static inline void debug_dma_dump_mappings(struct device *dev)
 }
 #endif /* CONFIG_DMA_API_DEBUG */
 
+void *dma_map_decrypted(void *addr, unsigned long size);
+int dma_unmap_encrypted(void *addr, unsigned long size);
+
 extern const struct dma_map_ops dma_dummy_ops;
+extern struct dma_memory_decrypted_ops dma_memory_generic_decrypted_ops;
 
 #endif /* _LINUX_DMA_MAP_OPS_H */
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 2b06a809d0b9..6fb150dc1750 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -13,11 +13,13 @@
 #include 
 #include 
 #include 
+#include <linux/set_memory.h>
 #include "debug.h"
 #include "direct.h"
 
 bool dma_default_coherent;
 
+struct dma_memory_decrypted_ops dma_memory_generic_decrypted_ops;
 /*
  * Managed DMA API
  */
@@ -736,3 +738,23 @@ unsigned long dma_get_merge_boundary(struct device *dev)
return ops->get_merge_boundary(dev);
 }
 EXPORT_SYMBOL_GPL(dma_get_merge_boundary);
+
+void *dma_map_decrypted(void *addr, unsigned long size)
+{
+   if (set_memory_decrypted((unsigned long)addr,
+size / PAGE_SIZE))
+   return NULL;
+
+   if (dma_memory_generic_decrypted_ops.map)
+   return dma_memory_generic_decrypted_ops.map(addr, size);
+   else
+   return addr;
+}
+
+int dma_unmap_encrypted(void *addr, unsigned long size)
+{
+   if (dma_memory_generic_decrypted_ops.unmap)
+   dma_memory_generic_decrypted_ops.unmap(addr);
+
+   return set_memory_encrypted((unsigned long)addr, size / PAGE_SIZE);
+}
-- 
2.25.1



[PATCH 08/13] HV/Vmbus: Initialize VMbus ring buffer for Isolation VM

2021-07-28 Thread Tianyu Lan
From: Tianyu Lan 

The VMbus ring buffers are shared with the host and need to
be accessed via the extra address space of an Isolation VM with
SNP support. This patch maps the ring buffer
addresses into the extra address space via vmap_pfn(). The HV host
visibility hvcall smears data in the ring buffer, and
so the ring buffer memory is reset to zero after making the
visibility hvcall.

Signed-off-by: Tianyu Lan 
---
 drivers/hv/Kconfig|  1 +
 drivers/hv/channel.c  | 10 +
 drivers/hv/hyperv_vmbus.h |  2 +
 drivers/hv/ring_buffer.c  | 84 ++-
 4 files changed, 79 insertions(+), 18 deletions(-)

diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
index 66c794d92391..a8386998be40 100644
--- a/drivers/hv/Kconfig
+++ b/drivers/hv/Kconfig
@@ -7,6 +7,7 @@ config HYPERV
depends on X86 && ACPI && X86_LOCAL_APIC && HYPERVISOR_GUEST
select PARAVIRT
select X86_HV_CALLBACK_VECTOR
+   select VMAP_PFN
help
  Select this option to run Linux as a Hyper-V client operating
  system.
diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index 01048bb07082..7350da9dbe97 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -707,6 +707,16 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
if (err)
goto error_clean_ring;
 
+	err = hv_ringbuffer_post_init(&newchannel->outbound,
+				      page, send_pages);
+   if (err)
+   goto error_free_gpadl;
+
+	err = hv_ringbuffer_post_init(&newchannel->inbound,
+				      &page[send_pages], recv_pages);
+   if (err)
+   goto error_free_gpadl;
+
/* Create and init the channel open message */
open_info = kzalloc(sizeof(*open_info) +
   sizeof(struct vmbus_channel_open_channel),
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 40bc0eff6665..15cd23a561f3 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -172,6 +172,8 @@ extern int hv_synic_cleanup(unsigned int cpu);
 /* Interface */
 
 void hv_ringbuffer_pre_init(struct vmbus_channel *channel);
+int hv_ringbuffer_post_init(struct hv_ring_buffer_info *ring_info,
+   struct page *pages, u32 page_cnt);
 
 int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
   struct page *pages, u32 pagecnt, u32 max_pkt_size);
diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
index 2aee356840a2..d4f93fca1108 100644
--- a/drivers/hv/ring_buffer.c
+++ b/drivers/hv/ring_buffer.c
@@ -17,6 +17,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include "hyperv_vmbus.h"
 
@@ -179,43 +181,89 @@ void hv_ringbuffer_pre_init(struct vmbus_channel *channel)
	mutex_init(&channel->outbound.ring_buffer_mutex);
 }
 
-/* Initialize the ring buffer. */
-int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
-  struct page *pages, u32 page_cnt, u32 max_pkt_size)
+int hv_ringbuffer_post_init(struct hv_ring_buffer_info *ring_info,
+  struct page *pages, u32 page_cnt)
 {
+   u64 physic_addr = page_to_pfn(pages) << PAGE_SHIFT;
+   unsigned long *pfns_wraparound;
+   void *vaddr;
int i;
-   struct page **pages_wraparound;
 
-   BUILD_BUG_ON((sizeof(struct hv_ring_buffer) != PAGE_SIZE));
+   if (!hv_isolation_type_snp())
+   return 0;
+
+   physic_addr += ms_hyperv.shared_gpa_boundary;
 
/*
 * First page holds struct hv_ring_buffer, do wraparound mapping for
 * the rest.
 */
-   pages_wraparound = kcalloc(page_cnt * 2 - 1, sizeof(struct page *),
+   pfns_wraparound = kcalloc(page_cnt * 2 - 1, sizeof(unsigned long),
   GFP_KERNEL);
-   if (!pages_wraparound)
+   if (!pfns_wraparound)
return -ENOMEM;
 
-   pages_wraparound[0] = pages;
+   pfns_wraparound[0] = physic_addr >> PAGE_SHIFT;
for (i = 0; i < 2 * (page_cnt - 1); i++)
-		pages_wraparound[i + 1] = &pages[i % (page_cnt - 1) + 1];
-
-   ring_info->ring_buffer = (struct hv_ring_buffer *)
-   vmap(pages_wraparound, page_cnt * 2 - 1, VM_MAP, PAGE_KERNEL);
-
-   kfree(pages_wraparound);
+   pfns_wraparound[i + 1] = (physic_addr >> PAGE_SHIFT) +
+   i % (page_cnt - 1) + 1;
 
-
-   if (!ring_info->ring_buffer)
+   vaddr = vmap_pfn(pfns_wraparound, page_cnt * 2 - 1, PAGE_KERNEL_IO);
+   kfree(pfns_wraparound);
+   if (!vaddr)
return -ENOMEM;
 
-   ring_info->ring_buffer->read_index =
-   ring_info->ring_buffer->write_index = 0;
+   /* Clean memory after setting host visibility. */
+   memset((void *)vaddr, 0x00, page_cnt * PAGE_SIZE);
+
+   ring_info->ring_buffer = (struct hv_ring_buffer *)vaddr;
+   ring_info->ring_buffer->read_index = 0;
+   

[PATCH 07/13] HV/Vmbus: Add SNP support for VMbus channel initiate message

2021-07-28 Thread Tianyu Lan
From: Tianyu Lan 

The monitor pages in the CHANNELMSG_INITIATE_CONTACT msg are shared
with the host in an Isolation VM, so it's necessary to use a hvcall to
make them visible to the host. In an Isolation VM with AMD SEV SNP, the
access address should be in the extra space which is above the shared
gpa boundary, so remap these pages at the extra address (pa +
shared_gpa_boundary).

Signed-off-by: Tianyu Lan 
---
 drivers/hv/connection.c   | 65 +++
 drivers/hv/hyperv_vmbus.h |  1 +
 2 files changed, 66 insertions(+)

diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 6d315c1465e0..e6a7bae036a8 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "hyperv_vmbus.h"
@@ -104,6 +105,12 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo, u32 version)
 
msg->monitor_page1 = virt_to_phys(vmbus_connection.monitor_pages[0]);
msg->monitor_page2 = virt_to_phys(vmbus_connection.monitor_pages[1]);
+
+   if (hv_is_isolation_supported()) {
+   msg->monitor_page1 += ms_hyperv.shared_gpa_boundary;
+   msg->monitor_page2 += ms_hyperv.shared_gpa_boundary;
+   }
+
msg->target_vcpu = hv_cpu_number_to_vp_number(VMBUS_CONNECT_CPU);
 
/*
@@ -148,6 +155,31 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo, u32 version)
return -ECONNREFUSED;
}
 
+   if (hv_is_isolation_supported()) {
+   vmbus_connection.monitor_pages_va[0]
+   = vmbus_connection.monitor_pages[0];
+   vmbus_connection.monitor_pages[0]
+   = memremap(msg->monitor_page1, HV_HYP_PAGE_SIZE,
+  MEMREMAP_WB);
+   if (!vmbus_connection.monitor_pages[0])
+   return -ENOMEM;
+
+   vmbus_connection.monitor_pages_va[1]
+   = vmbus_connection.monitor_pages[1];
+   vmbus_connection.monitor_pages[1]
+   = memremap(msg->monitor_page2, HV_HYP_PAGE_SIZE,
+  MEMREMAP_WB);
+   if (!vmbus_connection.monitor_pages[1]) {
+   memunmap(vmbus_connection.monitor_pages[0]);
+   return -ENOMEM;
+   }
+
+   memset(vmbus_connection.monitor_pages[0], 0x00,
+  HV_HYP_PAGE_SIZE);
+   memset(vmbus_connection.monitor_pages[1], 0x00,
+  HV_HYP_PAGE_SIZE);
+   }
+
return ret;
 }
 
@@ -159,6 +191,7 @@ int vmbus_connect(void)
struct vmbus_channel_msginfo *msginfo = NULL;
int i, ret = 0;
__u32 version;
+   u64 pfn[2];
 
/* Initialize the vmbus connection */
vmbus_connection.conn_state = CONNECTING;
@@ -216,6 +249,16 @@ int vmbus_connect(void)
goto cleanup;
}
 
+   if (hv_is_isolation_supported()) {
+   pfn[0] = virt_to_hvpfn(vmbus_connection.monitor_pages[0]);
+   pfn[1] = virt_to_hvpfn(vmbus_connection.monitor_pages[1]);
+   if (hv_mark_gpa_visibility(2, pfn,
+   VMBUS_PAGE_VISIBLE_READ_WRITE)) {
+   ret = -EFAULT;
+   goto cleanup;
+   }
+   }
+
msginfo = kzalloc(sizeof(*msginfo) +
  sizeof(struct vmbus_channel_initiate_contact),
  GFP_KERNEL);
@@ -284,6 +327,8 @@ int vmbus_connect(void)
 
 void vmbus_disconnect(void)
 {
+   u64 pfn[2];
+
/*
 * First send the unload request to the host.
 */
@@ -303,6 +348,26 @@ void vmbus_disconnect(void)
vmbus_connection.int_page = NULL;
}
 
+   if (hv_is_isolation_supported()) {
+   if (vmbus_connection.monitor_pages_va[0]) {
+   memunmap(vmbus_connection.monitor_pages[0]);
+   vmbus_connection.monitor_pages[0]
+   = vmbus_connection.monitor_pages_va[0];
+   vmbus_connection.monitor_pages_va[0] = NULL;
+   }
+
+   if (vmbus_connection.monitor_pages_va[1]) {
+   memunmap(vmbus_connection.monitor_pages[1]);
+   vmbus_connection.monitor_pages[1]
+   = vmbus_connection.monitor_pages_va[1];
+   vmbus_connection.monitor_pages_va[1] = NULL;
+   }
+
+   pfn[0] = virt_to_hvpfn(vmbus_connection.monitor_pages[0]);
+   pfn[1] = virt_to_hvpfn(vmbus_connection.monitor_pages[1]);
+   hv_mark_gpa_visibility(2, pfn, VMBUS_PAGE_NOT_VISIBLE);
+   }
+
hv_free_hyperv_page((unsigned long)vmbus_connection.monitor_pages[0]);
hv_free_hyperv_page((unsigned long)vmbus_connection.monitor_pages[1]);
   

[PATCH 06/13] HV: Add ghcb hvcall support for SNP VM

2021-07-28 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V provides a ghcb hvcall to handle the VMBus
HVCALL_SIGNAL_EVENT and HVCALL_POST_MESSAGE
messages in an SNP Isolation VM. Add such support.

Signed-off-by: Tianyu Lan 
---
 arch/x86/hyperv/ivm.c   | 42 +
 arch/x86/include/asm/mshyperv.h |  1 +
 drivers/hv/connection.c |  6 -
 drivers/hv/hv.c |  8 ++-
 include/asm-generic/mshyperv.h  | 29 +++
 5 files changed, 84 insertions(+), 2 deletions(-)

diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 9c30d5bb7b64..13bab7f07085 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -15,6 +15,48 @@
 #include 
 #include 
 
+u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size)
+{
+   union hv_ghcb *hv_ghcb;
+   void **ghcb_base;
+   unsigned long flags;
+
+   if (!ms_hyperv.ghcb_base)
+   return -EFAULT;
+
+   WARN_ON(in_nmi());
+
+   local_irq_save(flags);
+   ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+   hv_ghcb = (union hv_ghcb *)*ghcb_base;
+   if (!hv_ghcb) {
+   local_irq_restore(flags);
+   return -EFAULT;
+   }
+
+   memset(hv_ghcb, 0x00, HV_HYP_PAGE_SIZE);
+   hv_ghcb->ghcb.protocol_version = 1;
+   hv_ghcb->ghcb.ghcb_usage = 1;
+
+   hv_ghcb->hypercall.outputgpa = (u64)output;
+   hv_ghcb->hypercall.hypercallinput.asuint64 = 0;
+   hv_ghcb->hypercall.hypercallinput.callcode = control;
+
+   if (input_size)
+   memcpy(hv_ghcb->hypercall.hypercalldata, input, input_size);
+
+   VMGEXIT();
+
+	hv_ghcb->ghcb.ghcb_usage = 0xffffffff;
+   memset(hv_ghcb->ghcb.save.valid_bitmap, 0,
+  sizeof(hv_ghcb->ghcb.save.valid_bitmap));
+
+   local_irq_restore(flags);
+
+   return hv_ghcb->hypercall.hypercalloutput.callstatus;
+}
+EXPORT_SYMBOL_GPL(hv_ghcb_hypercall);
+
 void hv_ghcb_msr_write(u64 msr, u64 value)
 {
union hv_ghcb *hv_ghcb;
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 3c0cafdf7309..8bf26e6e7055 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -254,6 +254,7 @@ void hv_sint_rdmsrl_ghcb(u64 msr, u64 *value);
 void hv_signal_eom_ghcb(void);
 void hv_ghcb_msr_write(u64 msr, u64 value);
 void hv_ghcb_msr_read(u64 msr, u64 *value);
+u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size);
 
 #define hv_get_synint_state_ghcb(int_num, val) \
hv_sint_rdmsrl_ghcb(HV_X64_MSR_SINT0 + int_num, val)
diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 5e479d54918c..6d315c1465e0 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -447,6 +447,10 @@ void vmbus_set_event(struct vmbus_channel *channel)
 
++channel->sig_events;
 
-   hv_do_fast_hypercall8(HVCALL_SIGNAL_EVENT, channel->sig_event);
+   if (hv_isolation_type_snp())
+   hv_ghcb_hypercall(HVCALL_SIGNAL_EVENT, >sig_event,
+   NULL, sizeof(u64));
+   else
+   hv_do_fast_hypercall8(HVCALL_SIGNAL_EVENT, channel->sig_event);
 }
 EXPORT_SYMBOL_GPL(vmbus_set_event);
diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index 59f7173c4d9f..e5c9fc467893 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -98,7 +98,13 @@ int hv_post_message(union hv_connection_id connection_id,
aligned_msg->payload_size = payload_size;
memcpy((void *)aligned_msg->payload, payload, payload_size);
 
-   status = hv_do_hypercall(HVCALL_POST_MESSAGE, aligned_msg, NULL);
+   if (hv_isolation_type_snp())
+   status = hv_ghcb_hypercall(HVCALL_POST_MESSAGE,
+   (void *)aligned_msg, NULL,
+   sizeof(struct hv_input_post_message));
+   else
+   status = hv_do_hypercall(HVCALL_POST_MESSAGE,
+   aligned_msg, NULL);
 
/* Preemption must remain disabled until after the hypercall
 * so some other thread can't get scheduled onto this cpu and
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index b0cfc25dffaa..317d2a8d9700 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -31,6 +31,35 @@
 
 union hv_ghcb {
struct ghcb ghcb;
+   struct {
+   u64 hypercalldata[509];
+   u64 outputgpa;
+   union {
+   union {
+   struct {
+   u32 callcode: 16;
+   u32 isfast  : 1;
+   u32 reserved1   : 14;
+   u32 isnested: 1;
+   u32 countofelements : 12;
+   u32 reserved2   : 4;
+  

[PATCH 05/13] HV: Add Write/Read MSR registers via ghcb page

2021-07-28 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V provides a GHCB protocol to write Synthetic Interrupt
Controller MSR registers in Isolation VMs with AMD SEV-SNP,
and these registers are emulated by the hypervisor directly.
Hyper-V requires the SINTx MSR registers to be written twice:
first via the GHCB page to communicate with the hypervisor,
and then via the wrmsr instruction to talk with the paravisor,
which runs in VMPL0. The Guest OS ID MSR also needs to be set
via the GHCB.
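
Condensed, the double-write pattern implemented below is (a sketch;
'sint' stands in for a concrete SINTx index):

	/* 1. Write the MSR via the GHCB page so the hypervisor sees it. */
	hv_ghcb_msr_write(HV_X64_MSR_SINT0 + sint, val);

	/* 2. Write it again via wrmsrl with the proxy bit set, for the
	 * VMPL0 paravisor.
	 */
	wrmsrl(HV_X64_MSR_SINT0 + sint, val | 1 << 20);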

Signed-off-by: Tianyu Lan 
---
 arch/x86/hyperv/hv_init.c   |  16 +
 arch/x86/hyperv/ivm.c   | 114 ++
 arch/x86/include/asm/mshyperv.h |  78 +++-
 arch/x86/include/asm/sev.h  |   4 ++
 arch/x86/kernel/cpu/mshyperv.c  |   3 +
 arch/x86/kernel/sev-shared.c|  21 --
 drivers/hv/hv.c | 121 ++--
 include/asm-generic/mshyperv.h  |  12 +++-
 8 files changed, 307 insertions(+), 62 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index ee449c076ef4..b99f6b3930b7 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -392,7 +392,7 @@ void __init hyperv_init(void)
goto clean_guest_os_id;
 
if (hv_isolation_type_snp()) {
-   ms_hyperv.ghcb_base = alloc_percpu(void *);
+   ms_hyperv.ghcb_base = alloc_percpu(union hv_ghcb __percpu *);
if (!ms_hyperv.ghcb_base)
goto clean_guest_os_id;
 
@@ -485,6 +485,7 @@ void hyperv_cleanup(void)
 
/* Reset our OS id */
wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
+   hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, 0);
 
/*
 * Reset hypercall page reference before reset the page,
@@ -558,16 +559,3 @@ bool hv_is_hyperv_initialized(void)
return hypercall_msr.enable;
 }
 EXPORT_SYMBOL_GPL(hv_is_hyperv_initialized);
-
-enum hv_isolation_type hv_get_isolation_type(void)
-{
-   if (!(ms_hyperv.priv_high & HV_ISOLATION))
-   return HV_ISOLATION_TYPE_NONE;
-   return FIELD_GET(HV_ISOLATION_TYPE, ms_hyperv.isolation_config_b);
-}
-EXPORT_SYMBOL_GPL(hv_get_isolation_type);
-
-bool hv_is_isolation_supported(void)
-{
-   return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
-}
diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 24a58795abd8..9c30d5bb7b64 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -6,6 +6,8 @@
  *  Tianyu Lan 
  */
 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -13,6 +15,118 @@
 #include 
 #include 
 
+void hv_ghcb_msr_write(u64 msr, u64 value)
+{
+   union hv_ghcb *hv_ghcb;
+   void **ghcb_base;
+   unsigned long flags;
+
+   if (!ms_hyperv.ghcb_base)
+   return;
+
+   WARN_ON(in_nmi());
+
+   local_irq_save(flags);
+   ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+   hv_ghcb = (union hv_ghcb *)*ghcb_base;
+   if (!hv_ghcb) {
+   local_irq_restore(flags);
+   return;
+   }
+
+   memset(hv_ghcb, 0x00, HV_HYP_PAGE_SIZE);
+
+   ghcb_set_rcx(&hv_ghcb->ghcb, msr);
+   ghcb_set_rax(&hv_ghcb->ghcb, lower_32_bits(value));
+   ghcb_set_rdx(&hv_ghcb->ghcb, value >> 32);
+
+   if (sev_es_ghcb_hv_call(&hv_ghcb->ghcb, NULL, SVM_EXIT_MSR, 1, 0))
+   pr_warn("Failed to write msr via ghcb %llx.\n", msr);
+
+   local_irq_restore(flags);
+}
+
+void hv_ghcb_msr_read(u64 msr, u64 *value)
+{
+   union hv_ghcb *hv_ghcb;
+   void **ghcb_base;
+   unsigned long flags;
+
+   if (!ms_hyperv.ghcb_base)
+   return;
+
+   WARN_ON(in_nmi());
+
+   local_irq_save(flags);
+   ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+   hv_ghcb = (union hv_ghcb *)*ghcb_base;
+   if (!hv_ghcb) {
+   local_irq_restore(flags);
+   return;
+   }
+
+   memset(hv_ghcb, 0x00, HV_HYP_PAGE_SIZE);
+
+   ghcb_set_rcx(&hv_ghcb->ghcb, msr);
+   if (sev_es_ghcb_hv_call(&hv_ghcb->ghcb, NULL, SVM_EXIT_MSR, 0, 0))
+   pr_warn("Failed to read msr via ghcb %llx.\n", msr);
+   else
+   *value = (u64)lower_32_bits(hv_ghcb->ghcb.save.rax)
+   | ((u64)lower_32_bits(hv_ghcb->ghcb.save.rdx) << 32);
+   local_irq_restore(flags);
+}
+
+void hv_sint_rdmsrl_ghcb(u64 msr, u64 *value)
+{
+   hv_ghcb_msr_read(msr, value);
+}
+EXPORT_SYMBOL_GPL(hv_sint_rdmsrl_ghcb);
+
+void hv_sint_wrmsrl_ghcb(u64 msr, u64 value)
+{
+   hv_ghcb_msr_write(msr, value);
+
+   /* Write proxy bit via wrmsrl instruction. */
+   if (msr >= HV_X64_MSR_SINT0 && msr <= HV_X64_MSR_SINT15)
+   wrmsrl(msr, value | 1 << 20);
+}
+EXPORT_SYMBOL_GPL(hv_sint_wrmsrl_ghcb);
+
+void hv_signal_eom_ghcb(void)
+{
+   hv_sint_wrmsrl_ghcb(HV_X64_MSR_EOM, 0);
+}
+EXPORT_SYMBOL_GPL(hv_signal_eom_ghcb);
+
+enum hv_isolation_type hv_get_isolation_type(void)
+{
+   if (!(ms_hyperv.priv_high & HV_ISOLATION))
+   return HV_ISOLATION_TYPE_NONE;

[PATCH 04/13] HV: Mark vmbus ring buffer visible to host in Isolation VM

2021-07-28 Thread Tianyu Lan
From: Tianyu Lan 

Mark the vmbus ring buffer visible to the host with
set_memory_decrypted() when establishing the GPADL handle.

Signed-off-by: Tianyu Lan 
---
 drivers/hv/channel.c   | 38 --
 include/linux/hyperv.h | 10 ++
 2 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index f3761c73b074..01048bb07082 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -465,7 +466,7 @@ static int __vmbus_establish_gpadl(struct vmbus_channel *channel,
struct list_head *curr;
u32 next_gpadl_handle;
unsigned long flags;
-   int ret = 0;
+   int ret = 0, index;
 
next_gpadl_handle =
(atomic_inc_return(&vmbus_connection.next_gpadl_handle) - 1);
@@ -474,6 +475,13 @@ static int __vmbus_establish_gpadl(struct vmbus_channel *channel,
if (ret)
return ret;
 
+   ret = set_memory_decrypted((unsigned long)kbuffer,
+  HVPFN_UP(size));
+   if (ret) {
+   pr_warn("Failed to set host visibility.\n");
+   return ret;
+   }
+
init_completion(&msginfo->waitevent);
msginfo->waiting_channel = channel;
 
@@ -539,6 +547,15 @@ static int __vmbus_establish_gpadl(struct vmbus_channel *channel,
/* At this point, we received the gpadl created msg */
*gpadl_handle = gpadlmsg->gpadl;
 
+   if (type == HV_GPADL_BUFFER)
+   index = 0;
+   else
+   index = channel->gpadl_range[1].gpadlhandle ? 2 : 1;
+
+   channel->gpadl_range[index].size = size;
+   channel->gpadl_range[index].buffer = kbuffer;
+   channel->gpadl_range[index].gpadlhandle = *gpadl_handle;
+
 cleanup:
spin_lock_irqsave(&vmbus_connection.channelmsg_lock, flags);
list_del(&msginfo->msglistentry);
@@ -549,6 +566,11 @@ static int __vmbus_establish_gpadl(struct vmbus_channel *channel,
}
 
kfree(msginfo);
+
+   if (ret)
+   set_memory_encrypted((unsigned long)kbuffer,
+HVPFN_UP(size));
+
return ret;
 }
 
@@ -811,7 +833,7 @@ int vmbus_teardown_gpadl(struct vmbus_channel *channel, u32 gpadl_handle)
struct vmbus_channel_gpadl_teardown *msg;
struct vmbus_channel_msginfo *info;
unsigned long flags;
-   int ret;
+   int ret, i;
 
info = kzalloc(sizeof(*info) +
   sizeof(struct vmbus_channel_gpadl_teardown), GFP_KERNEL);
@@ -859,6 +881,18 @@ int vmbus_teardown_gpadl(struct vmbus_channel *channel, u32 gpadl_handle)
spin_unlock_irqrestore(&vmbus_connection.channelmsg_lock, flags);
 
kfree(info);
+
+   /* Find gpadl buffer virtual address and size. */
+   for (i = 0; i < VMBUS_GPADL_RANGE_COUNT; i++)
+   if (channel->gpadl_range[i].gpadlhandle == gpadl_handle)
+   break;
+
+   if (set_memory_encrypted((unsigned long)channel->gpadl_range[i].buffer,
+   HVPFN_UP(channel->gpadl_range[i].size)))
+   pr_warn("Fail to set mem host visibility.\n");
+
+   channel->gpadl_range[i].gpadlhandle = 0;
+
return ret;
 }
 EXPORT_SYMBOL_GPL(vmbus_teardown_gpadl);
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 2e859d2f9609..06eccaba10c5 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -809,6 +809,14 @@ struct vmbus_device {
 
 #define VMBUS_DEFAULT_MAX_PKT_SIZE 4096
 
+struct vmbus_gpadl_range {
+   u32 gpadlhandle;
+   u32 size;
+   void *buffer;
+};
+
#define VMBUS_GPADL_RANGE_COUNT 3
+
 struct vmbus_channel {
struct list_head listentry;
 
@@ -829,6 +837,8 @@ struct vmbus_channel {
struct completion rescind_event;
 
u32 ringbuffer_gpadlhandle;
+   /* GPADL_RING and Send/Receive GPADL_BUFFER. */
+   struct vmbus_gpadl_range gpadl_range[VMBUS_GPADL_RANGE_COUNT];
 
/* Allocated memory for ring buffer */
struct page *ringbuffer_page;
-- 
2.25.1



[PATCH 03/13] x86/HV: Add new hvcall guest address host visibility support

2021-07-28 Thread Tianyu Lan
From: Tianyu Lan 

Add support for the new hvcall that sets guest address host
visibility, used to mark memory visible to the host. Call it inside
set_memory_decrypted()/set_memory_encrypted().
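
The set_memory.c hunk is not quoted in full below; the expected shape
of the hook is roughly (a sketch only, the exact hunk may differ):

	/* In __set_memory_enc_dec(), ahead of the SEV/SME handling: */
	if (hv_is_isolation_supported())
		return hv_set_mem_enc(addr, numpages, enc);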

Signed-off-by: Tianyu Lan 
---
 arch/x86/hyperv/Makefile   |   2 +-
 arch/x86/hyperv/ivm.c  | 112 +
 arch/x86/include/asm/hyperv-tlfs.h |  18 +
 arch/x86/include/asm/mshyperv.h|   3 +-
 arch/x86/mm/pat/set_memory.c   |   6 +-
 include/asm-generic/hyperv-tlfs.h  |   1 +
 6 files changed, 139 insertions(+), 3 deletions(-)
 create mode 100644 arch/x86/hyperv/ivm.c

diff --git a/arch/x86/hyperv/Makefile b/arch/x86/hyperv/Makefile
index 48e2c51464e8..5d2de10809ae 100644
--- a/arch/x86/hyperv/Makefile
+++ b/arch/x86/hyperv/Makefile
@@ -1,5 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0-only
-obj-y  := hv_init.o mmu.o nested.o irqdomain.o
+obj-y  := hv_init.o mmu.o nested.o irqdomain.o ivm.o
 obj-$(CONFIG_X86_64)   += hv_apic.o hv_proc.o
 
 ifdef CONFIG_X86_64
diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
new file mode 100644
index ..24a58795abd8
--- /dev/null
+++ b/arch/x86/hyperv/ivm.c
@@ -0,0 +1,112 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Hyper-V Isolation VM interface with paravisor and hypervisor
+ *
+ * Author:
+ *  Tianyu Lan 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * hv_mark_gpa_visibility - Set pages visible to host via hvcall.
+ *
+ * In Isolation VM, all guest memory is encrypted from host and guest
+ * needs to set memory visible to host via hvcall before sharing memory
+ * with host.
+ */
+int hv_mark_gpa_visibility(u16 count, const u64 pfn[], u32 visibility)
+{
+   struct hv_gpa_range_for_visibility **input_pcpu, *input;
+   u16 pages_processed;
+   u64 hv_status;
+   unsigned long flags;
+
+   /* no-op if partition isolation is not enabled */
+   if (!hv_is_isolation_supported())
+   return 0;
+
+   if (count > HV_MAX_MODIFY_GPA_REP_COUNT) {
+   pr_err("Hyper-V: GPA count:%d exceeds supported:%lu\n", count,
+   HV_MAX_MODIFY_GPA_REP_COUNT);
+   return -EINVAL;
+   }
+
+   local_irq_save(flags);
+   input_pcpu = (struct hv_gpa_range_for_visibility **)
+   this_cpu_ptr(hyperv_pcpu_input_arg);
+   input = *input_pcpu;
+   if (unlikely(!input)) {
+   local_irq_restore(flags);
+   return -EINVAL;
+   }
+
+   input->partition_id = HV_PARTITION_ID_SELF;
+   input->host_visibility = visibility;
+   input->reserved0 = 0;
+   input->reserved1 = 0;
+   memcpy((void *)input->gpa_page_list, pfn, count * sizeof(*pfn));
+   hv_status = hv_do_rep_hypercall(
+   HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY, count,
+   0, input, &pages_processed);
+   local_irq_restore(flags);
+
+   if (!(hv_status & HV_HYPERCALL_RESULT_MASK))
+   return 0;
+
+   return hv_status & HV_HYPERCALL_RESULT_MASK;
+}
+EXPORT_SYMBOL(hv_mark_gpa_visibility);
+
+/*
+ * hv_set_mem_host_visibility - Set specified memory visible to host.
+ *
+ * In Isolation VM, all guest memory is encrypted from host and guest
+ * needs to set memory visible to host via hvcall before sharing memory
+ * with host. This function works as wrap of hv_mark_gpa_visibility()
+ * with memory base and size.
+ */
+static int hv_set_mem_host_visibility(void *kbuffer, size_t size, u32 visibility)
+{
+   int pagecount = size >> HV_HYP_PAGE_SHIFT;
+   u64 *pfn_array;
+   int ret = 0;
+   int i, pfn;
+
+   if (!hv_is_isolation_supported() || !ms_hyperv.ghcb_base)
+   return 0;
+
+   pfn_array = kzalloc(HV_HYP_PAGE_SIZE, GFP_KERNEL);
+   if (!pfn_array)
+   return -ENOMEM;
+
+   for (i = 0, pfn = 0; i < pagecount; i++) {
+   pfn_array[pfn] = virt_to_hvpfn(kbuffer + i * HV_HYP_PAGE_SIZE);
+   pfn++;
+
+   if (pfn == HV_MAX_MODIFY_GPA_REP_COUNT || i == pagecount - 1) {
+   ret |= hv_mark_gpa_visibility(pfn, pfn_array, visibility);
+   pfn = 0;
+
+   if (ret)
+   goto err_free_pfn_array;
+   }
+   }
+
+ err_free_pfn_array:
+   kfree(pfn_array);
+   return ret;
+}
+
+int hv_set_mem_enc(unsigned long addr, int numpages, bool enc)
+{
+   return hv_set_mem_host_visibility((void *)addr,
+   numpages * HV_HYP_PAGE_SIZE,
+   enc ? VMBUS_PAGE_NOT_VISIBLE
+   : VMBUS_PAGE_VISIBLE_READ_WRITE);
+}
diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
index f1366ce609e3..f027b5bf6076 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -276,6 +276,11 @@ enum hv_isolation_type {
 #define HV_X64_MSR_TIME_REF_COUNT   

[PATCH 02/13] x86/HV: Initialize shared memory boundary in the Isolation VM.

2021-07-28 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V exposes the shared memory boundary via the cpuid leaf
HYPERV_CPUID_ISOLATION_CONFIG and stores it in the
shared_gpa_boundary field of the ms_hyperv struct. This prepares
for sharing memory with the host in SNP guests.
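
With the boundary stored, the host-visible address of a shared page is
simply offset above it, e.g. (a sketch; 'va' is an assumed guest
virtual address of already-decrypted memory):

	u64 shared_gpa = virt_to_hvpfn(va) * HV_HYP_PAGE_SIZE +
			 ms_hyperv.shared_gpa_boundary;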

Signed-off-by: Tianyu Lan 
---
 arch/x86/kernel/cpu/mshyperv.c |  2 ++
 include/asm-generic/mshyperv.h | 12 +++-
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index dcfbd2770d7f..773e84e134b3 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -327,6 +327,8 @@ static void __init ms_hyperv_init_platform(void)
if (ms_hyperv.priv_high & HV_ISOLATION) {
ms_hyperv.isolation_config_a = cpuid_eax(HYPERV_CPUID_ISOLATION_CONFIG);
ms_hyperv.isolation_config_b = cpuid_ebx(HYPERV_CPUID_ISOLATION_CONFIG);
+   ms_hyperv.shared_gpa_boundary =
+   (u64)1 << ms_hyperv.shared_gpa_boundary_bits;
 
pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 
0x%x\n",
ms_hyperv.isolation_config_a, 
ms_hyperv.isolation_config_b);
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index 4269f3174e58..aa26d24a5ca9 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -35,8 +35,18 @@ struct ms_hyperv_info {
u32 max_vp_index;
u32 max_lp_index;
u32 isolation_config_a;
-   u32 isolation_config_b;
+   union {
+   u32 isolation_config_b;
+   struct {
+   u32 cvm_type : 4;
+   u32 Reserved11 : 1;
+   u32 shared_gpa_boundary_active : 1;
+   u32 shared_gpa_boundary_bits : 6;
+   u32 Reserved12 : 20;
+   };
+   };
void  __percpu **ghcb_base;
+   u64 shared_gpa_boundary;
 };
 extern struct ms_hyperv_info ms_hyperv;
 
-- 
2.25.1



[PATCH 01/13] x86/HV: Initialize GHCB page in Isolation VM

2021-07-28 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V exposes the GHCB page via the SEV-ES GHCB MSR for the SNP
guest to communicate with the hypervisor. Map the GHCB page for all
CPUs so they can read/write MSR registers and submit hvcall requests
via GHCB.

Signed-off-by: Tianyu Lan 
---
 arch/x86/hyperv/hv_init.c   | 73 +++--
 arch/x86/include/asm/mshyperv.h |  2 +
 include/asm-generic/mshyperv.h  |  2 +
 3 files changed, 73 insertions(+), 4 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 4a643a85d570..ee449c076ef4 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -42,6 +43,26 @@ static void *hv_hypercall_pg_saved;
 struct hv_vp_assist_page **hv_vp_assist_page;
 EXPORT_SYMBOL_GPL(hv_vp_assist_page);
 
+static int hyperv_init_ghcb(void)
+{
+   u64 ghcb_gpa;
+   void *ghcb_va;
+   void **ghcb_base;
+
+   if (!ms_hyperv.ghcb_base)
+   return -EINVAL;
+
+   rdmsrl(MSR_AMD64_SEV_ES_GHCB, ghcb_gpa);
+   ghcb_va = memremap(ghcb_gpa, HV_HYP_PAGE_SIZE, MEMREMAP_WB);
+   if (!ghcb_va)
+   return -ENOMEM;
+
+   ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+   *ghcb_base = ghcb_va;
+
+   return 0;
+}
+
 static int hv_cpu_init(unsigned int cpu)
 {
struct hv_vp_assist_page **hvp = &hv_vp_assist_page[smp_processor_id()];
@@ -75,6 +96,8 @@ static int hv_cpu_init(unsigned int cpu)
wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, val);
}
 
+   hyperv_init_ghcb();
+
return 0;
 }
 
@@ -167,6 +190,31 @@ static int hv_cpu_die(unsigned int cpu)
 {
struct hv_reenlightenment_control re_ctrl;
unsigned int new_cpu;
+   unsigned long flags;
+   void **input_arg;
+   void *pg;
+   void **ghcb_va = NULL;
+
+   local_irq_save(flags);
+   input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
+   pg = *input_arg;
+   *input_arg = NULL;
+
+   if (hv_root_partition) {
+   void **output_arg;
+
+   output_arg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
+   *output_arg = NULL;
+   }
+
+   if (ms_hyperv.ghcb_base) {
+   ghcb_va = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+   if (*ghcb_va)
+   memunmap(*ghcb_va);
+   *ghcb_va = NULL;
+   }
+
+   local_irq_restore(flags);
 
hv_common_cpu_die(cpu);
 
@@ -340,9 +388,22 @@ void __init hyperv_init(void)
VMALLOC_END, GFP_KERNEL, PAGE_KERNEL_ROX,
VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
__builtin_return_address(0));
-   if (hv_hypercall_pg == NULL) {
-   wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
-   goto remove_cpuhp_state;
+   if (hv_hypercall_pg == NULL)
+   goto clean_guest_os_id;
+
+   if (hv_isolation_type_snp()) {
+   ms_hyperv.ghcb_base = alloc_percpu(void *);
+   if (!ms_hyperv.ghcb_base)
+   goto clean_guest_os_id;
+
+   if (hyperv_init_ghcb()) {
+   free_percpu(ms_hyperv.ghcb_base);
+   ms_hyperv.ghcb_base = NULL;
+   goto clean_guest_os_id;
+   }
+
+   /* Hyper-V requires writing the guest os id via ghcb in SNP IVM. */
+   hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, guest_id);
}
 
rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
@@ -403,7 +464,8 @@ void __init hyperv_init(void)
hv_query_ext_cap(0);
return;
 
-remove_cpuhp_state:
+clean_guest_os_id:
+   wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
cpuhp_remove_state(cpuhp);
 free_vp_assist_page:
kfree(hv_vp_assist_page);
@@ -431,6 +493,9 @@ void hyperv_cleanup(void)
 */
hv_hypercall_pg = NULL;
 
+   if (ms_hyperv.ghcb_base)
+   free_percpu(ms_hyperv.ghcb_base);
+
/* Reset the hypercall page */
hypercall_msr.as_uint64 = 0;
wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index adccbc209169..6627cfd2bfba 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -11,6 +11,8 @@
 #include 
 #include 
 
+DECLARE_STATIC_KEY_FALSE(isolation_type_snp);
+
 typedef int (*hyperv_fill_flush_list_func)(
struct hv_guest_mapping_flush_list *flush,
void *data);
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index c1ab6a6e72b5..4269f3174e58 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -36,6 +36,7 @@ struct ms_hyperv_info {
u32 max_lp_index;
u32 isolation_config_a;
u32 isolation_config_b;
+   void  __percpu **ghcb_base;
 };
 extern struct ms_hyperv_info ms_hyperv;

[PATCH 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support

2021-07-28 Thread Tianyu Lan
From: Tianyu Lan 


Hyper-V provides two kinds of Isolation VMs: VBS (Virtualization-based
security) and AMD SEV-SNP unenlightened Isolation VMs. This patchset
adds support for these Isolation VMs in Linux.

The memory of these VMs is encrypted and the host can't access guest
memory directly. Hyper-V provides a new host visibility hvcall, and
the guest needs to call it to mark memory visible to the host before
sharing that memory with the host. For security, network/storage
stack memory should not be shared with the host, hence the need for
bounce buffers.

The Vmbus channel ring buffer already plays the bounce buffer role,
because all data from/to the host is copied between the ring buffer
and IO stack memory. So mark the vmbus channel ring buffer visible.

There are two exceptions - packets sent by vmbus_sendpacket_
pagebuffer() and vmbus_sendpacket_mpb_desc(). These packets
contain IO stack memory addresses that the host will access.
So add bounce buffer allocation support in vmbus for these packets.

For an SNP Isolation VM, the guest needs to access the shared memory
via an extra address space, which is specified by the Hyper-V CPUID
leaf HYPERV_CPUID_ISOLATION_CONFIG. The physical address used to
access the shared memory is the bounce buffer memory GPA plus the
shared_gpa_boundary reported by CPUID.
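
A sketch of that addressing rule as applied to the swiotlb bounce
buffer ('pfns', 'vaddr' and 'pagecount' are assumed locals, not names
taken from the patches):

	for (i = 0; i < pagecount; i++)
		pfns[i] = virt_to_hvpfn(vaddr + i * HV_HYP_PAGE_SIZE) +
			  (ms_hyperv.shared_gpa_boundary >> HV_HYP_PAGE_SHIFT);
	/* Remap the decrypted pages above the shared GPA boundary. */
	va = vmap_pfn(pfns, pagecount, PAGE_KERNEL_IO);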

Changes since RFC v4:
   - Introduce dma map decrypted function to remap bounce buffer
 and provide dma map decrypted ops for platform to hook callback.   
 
   - Split swiotlb and dma map decrypted change into two patches
   - Replace vstart with vaddr in swiotlb changes.

Changes since RFC v3:
   - Add interface set_memory_decrypted_map() to decrypt memory and
 map bounce buffer in extra address space 
   - Remove swiotlb remap function and store the remap address
 returned by set_memory_decrypted_map() in swiotlb mem data structure.
   - Introduce hv_set_mem_enc() to make code more readable in the __set_memory_enc_dec().

Changes since RFC v2:
   - Remove not UIO driver in Isolation VM patch
   - Use vmap_pfn() to replace the ioremap_page_range function in
   order to avoid exposing the symbol ioremap_page_range()
   - Call the hv set mem host visibility hvcall in
 set_memory_encrypted/decrypted()
   - Enable swiotlb force mode instead of adding Hyper-V dma map/unmap hook
   - Fix code style


Tianyu Lan (13):
  x86/HV: Initialize GHCB page in Isolation VM
  x86/HV: Initialize shared memory boundary in the Isolation VM.
  x86/HV: Add new hvcall guest address host visibility support
  HV: Mark vmbus ring buffer visible to host in Isolation VM
  HV: Add Write/Read MSR registers via ghcb page
  HV: Add ghcb hvcall support for SNP VM
  HV/Vmbus: Add SNP support for VMbus channel initiate message
  HV/Vmbus: Initialize VMbus ring buffer for Isolation VM
  DMA: Add dma_map_decrypted/dma_unmap_encrypted() function
  x86/Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM
  HV/IOMMU: Enable swiotlb bounce buffer for Isolation VM
  HV/Netvsc: Add Isolation VM support for netvsc driver
  HV/Storvsc: Add Isolation VM support for storvsc driver

 arch/x86/hyperv/Makefile   |   2 +-
 arch/x86/hyperv/hv_init.c  |  87 +++--
 arch/x86/hyperv/ivm.c  | 296 +
 arch/x86/include/asm/hyperv-tlfs.h |  18 ++
 arch/x86/include/asm/mshyperv.h|  86 -
 arch/x86/include/asm/sev.h |   4 +
 arch/x86/kernel/cpu/mshyperv.c |   5 +
 arch/x86/kernel/sev-shared.c   |  21 +-
 arch/x86/mm/pat/set_memory.c   |   6 +-
 arch/x86/xen/pci-swiotlb-xen.c |   3 +-
 drivers/hv/Kconfig |   1 +
 drivers/hv/channel.c   |  48 -
 drivers/hv/connection.c|  71 ++-
 drivers/hv/hv.c| 129 +
 drivers/hv/hyperv_vmbus.h  |   3 +
 drivers/hv/ring_buffer.c   |  84 ++--
 drivers/hv/vmbus_drv.c |   3 +
 drivers/iommu/hyperv-iommu.c   |  65 +++
 drivers/net/hyperv/hyperv_net.h|   6 +
 drivers/net/hyperv/netvsc.c| 144 +-
 drivers/net/hyperv/rndis_filter.c  |   2 +
 drivers/scsi/storvsc_drv.c |  68 ++-
 include/asm-generic/hyperv-tlfs.h  |   1 +
 include/asm-generic/mshyperv.h |  53 +-
 include/linux/dma-map-ops.h|   9 +
 include/linux/hyperv.h |  16 ++
 include/linux/swiotlb.h|   4 +
 kernel/dma/mapping.c   |  22 +++
 kernel/dma/swiotlb.c   |  11 +-
 29 files changed, 1166 insertions(+), 102 deletions(-)
 create mode 100644 arch/x86/hyperv/ivm.c

-- 
2.25.1



Re: [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache

2021-07-28 Thread Georgi Djakov
On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash Ranjan wrote:
> commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag")
> removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went
> the memory type setting required for the non-coherent masters to use
> system cache. Now that system cache support for GPU is added, we will
> need to set the right PTE attribute for GPU buffers to be sys cached.
> Without this, the system cache lines are not allocated for GPU.
> 
> So the patches in this series introduces a new prot flag IOMMU_LLC,
> renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to IO_PGTABLE_QUIRK_PTW_LLC
> and makes GPU the user of this protection flag.

Hi Sai,

Thank you for the patchset! Are you planning to refresh it, as it does
not apply anymore?

Thanks,
Georgi

> 
> The series slightly depends on following 2 patches posted earlier and
> is based on msm-next branch:
>  * https://lore.kernel.org/patchwork/patch/1363008/
>  * https://lore.kernel.org/patchwork/patch/1363010/
> 
> Sai Prakash Ranjan (3):
>   iommu/io-pgtable: Rename last-level cache quirk to
> IO_PGTABLE_QUIRK_PTW_LLC
>   iommu/io-pgtable-arm: Add IOMMU_LLC page protection flag
>   drm/msm: Use IOMMU_LLC page protection flag to map gpu buffers
> 
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c   | 3 +++
>  drivers/gpu/drm/msm/adreno/adreno_gpu.c | 2 +-
>  drivers/gpu/drm/msm/msm_iommu.c | 3 +++
>  drivers/gpu/drm/msm/msm_mmu.h   | 4 
>  drivers/iommu/io-pgtable-arm.c  | 9 ++---
>  include/linux/io-pgtable.h  | 6 +++---
>  include/linux/iommu.h   | 6 ++
>  7 files changed, 26 insertions(+), 7 deletions(-)
> 
> 
> base-commit: 00fd44a1a4700718d5d962432b55c09820f7e709
> -- 
> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
> of Code Aurora Forum, hosted by The Linux Foundation
> 


Re: [PATCH 02/11] x86/sev: Add an x86 version of prot_guest_has()

2021-07-28 Thread Christoph Hellwig
On Tue, Jul 27, 2021 at 05:26:05PM -0500, Tom Lendacky via iommu wrote:
> Introduce an x86 version of the prot_guest_has() function. This will be
> used in the more generic x86 code to replace vendor specific calls like
> sev_active(), etc.
> 
> While the name suggests this is intended mainly for guests, it will
> also be used for host memory encryption checks in place of sme_active().
> 
> The amd_prot_guest_has() function does not use EXPORT_SYMBOL_GPL for the
> same reasons previously stated when changing sme_active(), sev_active and

None of that applies here as none of the callers get pulled into
random macros.  The only case of that is sme_me_mask through
sme_mask, but that's not something this series replaces as far as I can
tell.


Re: [PATCH 01/11] mm: Introduce a function to check for virtualization protection features

2021-07-28 Thread Christoph Hellwig
On Tue, Jul 27, 2021 at 05:26:04PM -0500, Tom Lendacky via iommu wrote:
> In prep for other protected virtualization technologies, introduce a
> generic helper function, prot_guest_has(), that can be used to check
> for specific protection attributes, like memory encryption. This is
> intended to eliminate having to add multiple technology-specific checks
> to the code (e.g. if (sev_active() || tdx_active())).

So common checks obviously make sense, but I really hate the stupid
multiplexer.  Having one well-documented helper per feature is much
easier to follow.
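
Something like (a sketch with an invented helper name, just to show
the shape):

	/* One self-documenting helper per feature: */
	static inline bool guest_memory_encrypt_active(void)
	{
		return sev_active();	/* or the TDX equivalent */
	}

which callers can then use directly instead of the multiplexed
prot_guest_has(PATTR_GUEST_MEM_ENCRYPT).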

> +#define PATTR_MEM_ENCRYPT   0   /* Encrypted memory */
> +#define PATTR_HOST_MEM_ENCRYPT   1   /* Host encrypted memory */
> +#define PATTR_GUEST_MEM_ENCRYPT  2   /* Guest encrypted memory */
> +#define PATTR_GUEST_PROT_STATE   3   /* Guest encrypted state */

The kerneldoc comments on these individual helpers will give you plenty
of space to properly document what they indicate and what a (potential)
caller should do based on them.  Something the above comments completely
fail to do.


Re: [PATCH 00/11] Implement generic prot_guest_has() helper function

2021-07-28 Thread Christian König

On 28.07.21 at 00:26, Tom Lendacky wrote:

This patch series provides a generic helper function, prot_guest_has(),
to replace the sme_active(), sev_active(), sev_es_active() and
mem_encrypt_active() functions.

It is expected that as new protected virtualization technologies are
added to the kernel, they can all be covered by a single function call
instead of a collection of specific function calls all called from the
same locations.

The powerpc and s390 patches have been compile tested only. Can the
folks copied on this series verify that nothing breaks for them.


As a GPU driver dev I'm only one end user of this, but at least from the
high-level point of view this makes total sense to me.


Feel free to add an Acked-by: Christian König .

We could run that through the AMD GPU unit tests, but I fear we actually 
don't test on a system with SEV/SME active.


Going to raise that on our team call today.

Regards,
Christian.



Cc: Andi Kleen 
Cc: Andy Lutomirski 
Cc: Ard Biesheuvel 
Cc: Baoquan He 
Cc: Benjamin Herrenschmidt 
Cc: Borislav Petkov 
Cc: Christian Borntraeger 
Cc: Daniel Vetter 
Cc: Dave Hansen 
Cc: Dave Young 
Cc: David Airlie 
Cc: Heiko Carstens 
Cc: Ingo Molnar 
Cc: Joerg Roedel 
Cc: Maarten Lankhorst 
Cc: Maxime Ripard 
Cc: Michael Ellerman 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Thomas Zimmermann 
Cc: Vasily Gorbik 
Cc: VMware Graphics 
Cc: Will Deacon 

---

Patches based on:
   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git master
   commit 79e920060fa7 ("Merge branch 'WIP/fixes'")

Tom Lendacky (11):
   mm: Introduce a function to check for virtualization protection
 features
   x86/sev: Add an x86 version of prot_guest_has()
   powerpc/pseries/svm: Add a powerpc version of prot_guest_has()
   x86/sme: Replace occurrences of sme_active() with prot_guest_has()
   x86/sev: Replace occurrences of sev_active() with prot_guest_has()
   x86/sev: Replace occurrences of sev_es_active() with prot_guest_has()
   treewide: Replace the use of mem_encrypt_active() with
 prot_guest_has()
   mm: Remove the now unused mem_encrypt_active() function
   x86/sev: Remove the now unused mem_encrypt_active() function
   powerpc/pseries/svm: Remove the now unused mem_encrypt_active()
 function
   s390/mm: Remove the now unused mem_encrypt_active() function

  arch/Kconfig   |  3 ++
  arch/powerpc/include/asm/mem_encrypt.h |  5 --
  arch/powerpc/include/asm/protected_guest.h | 30 +++
  arch/powerpc/platforms/pseries/Kconfig |  1 +
  arch/s390/include/asm/mem_encrypt.h|  2 -
  arch/x86/Kconfig   |  1 +
  arch/x86/include/asm/kexec.h   |  2 +-
  arch/x86/include/asm/mem_encrypt.h | 13 +
  arch/x86/include/asm/protected_guest.h | 27 ++
  arch/x86/kernel/crash_dump_64.c|  4 +-
  arch/x86/kernel/head64.c   |  4 +-
  arch/x86/kernel/kvm.c  |  3 +-
  arch/x86/kernel/kvmclock.c |  4 +-
  arch/x86/kernel/machine_kexec_64.c | 19 +++
  arch/x86/kernel/pci-swiotlb.c  |  9 ++--
  arch/x86/kernel/relocate_kernel_64.S   |  2 +-
  arch/x86/kernel/sev.c  |  6 +--
  arch/x86/kvm/svm/svm.c |  3 +-
  arch/x86/mm/ioremap.c  | 16 +++---
  arch/x86/mm/mem_encrypt.c  | 60 +++---
  arch/x86/mm/mem_encrypt_identity.c |  3 +-
  arch/x86/mm/pat/set_memory.c   |  3 +-
  arch/x86/platform/efi/efi_64.c |  9 ++--
  arch/x86/realmode/init.c   |  8 +--
  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c|  4 +-
  drivers/gpu/drm/drm_cache.c|  4 +-
  drivers/gpu/drm/vmwgfx/vmwgfx_drv.c|  4 +-
  drivers/gpu/drm/vmwgfx/vmwgfx_msg.c|  6 +--
  drivers/iommu/amd/init.c   |  7 +--
  drivers/iommu/amd/iommu.c  |  3 +-
  drivers/iommu/amd/iommu_v2.c   |  3 +-
  drivers/iommu/iommu.c  |  3 +-
  fs/proc/vmcore.c   |  6 +--
  include/linux/mem_encrypt.h|  4 --
  include/linux/protected_guest.h| 37 +
  kernel/dma/swiotlb.c   |  4 +-
  36 files changed, 218 insertions(+), 104 deletions(-)
  create mode 100644 arch/powerpc/include/asm/protected_guest.h
  create mode 100644 arch/x86/include/asm/protected_guest.h
  create mode 100644 include/linux/protected_guest.h




Re: [PATCH] iommu: check if group is NULL before remove device

2021-07-28 Thread Frank Wunderlich
Hi Joerg,

Sorry for the late reply; somehow I marked the message as read without answering it.

Am 15. Juli 2021 09:20:04 MESZ schrieb Joerg Roedel :
>On Thu, Jul 15, 2021 at 09:11:50AM +0200, Frank Wunderlich wrote:
>> From: Frank Wunderlich 
>> 
>> if probe is failing, iommu_group may be not initialized,
>
>Sentences start with capital letters.
>
>IOMMU patch subjects too, after the 'iommu:' prefix.

Will fix these in v2

>> so freeing it will result in NULL pointer access
>
>Please describe in more detail how this NULL-ptr dereference is
>triggered.

I had this while testing this series:
https://patchwork.kernel.org/project/linux-mediatek/list/?series=515129

Initialization in the mtk driver failed (I guess the iommu group was not yet
created), cleanup was started, and so this function was called with a NULL group
pointer. I can try to find my debug trace if you need a kind of backtrace.
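
For v2 the guard would be along these lines (a sketch of the idea, not
the final patch):

	void iommu_group_remove_device(struct device *dev)
	{
		struct iommu_group *group = dev->iommu_group;

		if (!group)
			return;

		/* existing removal path continues here */
	}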

regards Frank


Re: [bug report] iommu_dma_unmap_sg() is very slow then running IO from remote numa node

2021-07-28 Thread John Garry

On 28/07/2021 02:32, Ming Lei wrote:

On Mon, Jul 26, 2021 at 3:51 PM John Garry  wrote:

On 23/07/2021 11:21, Ming Lei wrote:

Thanks, I was also going to suggest the latter, since it's what
arm_smmu_cmdq_issue_cmdlist() does with IRQs masked that should be most
indicative of where the slowness most likely stems from.

The improvement from 'iommu.strict=0' is very small:


Have you tried turning off the IOMMU to ensure that this is really just
an IOMMU problem?

You can try setting CONFIG_ARM_SMMU_V3=n in the defconfig or passing
cmdline param iommu.passthrough=1 to bypass the the SMMU (equivalent to
disabling for kernel drivers).

Bypassing SMMU via iommu.passthrough=1 basically doesn't make a difference
on this issue.


A ~90% throughput drop still seems to me to be too high to be a software
issue. More so since I don't see anything similar on my system. And that
throughput drop does not come with a corresponding total CPU usage drop,
judging from the fio log.


Do you know if anyone has run memory benchmark tests on this board to
find out the NUMA effect? I think lmbench or stream could be used for this.


Testing network performance in an equivalent fashion to storage could 
also be an idea.


Thanks,
John



And from fio log, submission latency is good, but completion latency
is pretty bad,
and maybe it is something that writing to PCI memory isn't committed to HW in
time?




[PATCH v2 0/7] sections: Unify kernel sections range check and use

2021-07-28 Thread Kefeng Wang
There are three header files (kallsyms.h, kernel.h and sections.h) which
include kernel section range checks; let's do some cleanup and
unify them.

1. cleanup arch specific text/data check and fix address boundary check
   in kallsyms.h
2. make all the basic/core kernel range check function into sections.h
3. update all the callers, and use the helper in sections.h to simplify
   the code

After this series, we have 5 APIs about kernel sections range check in
sections.h

 * is_kernel_core_data()--- come from core_kernel_data() in kernel.h
 * is_kernel_rodata()   --- already in sections.h
 * is_kernel_text() --- come from kallsyms.h
 * is_kernel_inittext() --- come from kernel.h and kallsyms.h
 * is_kernel()  --- come from kallsyms.h
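
For reference, the text checks end up as plain section-range tests
against the standard linker symbols (a sketch; the exact bodies are in
the patches):

	static inline bool is_kernel_text(unsigned long addr)
	{
		return addr >= (unsigned long)_stext &&
		       addr < (unsigned long)_etext;
	}

	static inline bool is_kernel_inittext(unsigned long addr)
	{
		return addr >= (unsigned long)_sinittext &&
		       addr < (unsigned long)_einittext;
	}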


Cc: linuxppc-...@lists.ozlabs.org
Cc: linux-s...@vger.kernel.org
Cc: linux-a...@vger.kernel.org 
Cc: iommu@lists.linux-foundation.org
Cc: b...@vger.kernel.org 

v2:
- add ACK/RW to patch2, and drop inappropriate fix tag
- keep 'core' to check kernel data, as suggested by Steven Rostedt,
  and rename is_kernel_data() to is_kernel_core_data()
- drop patch8 which is merged
- drop patch9 which is resend independently

v1:
https://lore.kernel.org/linux-arch/20210626073439.150586-1-wangkefeng.w...@huawei.com

Kefeng Wang (7):
  kallsyms: Remove arch specific text and data check
  kallsyms: Fix address-checks for kernel related range
  sections: Move and rename core_kernel_data() to is_kernel_core_data()
  sections: Move is_kernel_inittext() into sections.h
  kallsyms: Rename is_kernel() and is_kernel_text()
  sections: Add new is_kernel() and is_kernel_text()
  powerpc/mm: Use is_kernel_text() and is_kernel_inittext() helper

 arch/powerpc/mm/pgtable_32.c   |  7 +---
 arch/x86/kernel/unwind_orc.c   |  2 +-
 arch/x86/net/bpf_jit_comp.c|  2 +-
 include/asm-generic/sections.h | 71 ++
 include/linux/kallsyms.h   | 21 +++---
 include/linux/kernel.h |  2 -
 kernel/cfi.c   |  2 +-
 kernel/extable.c   | 33 ++--
 kernel/locking/lockdep.c   |  3 --
 kernel/trace/ftrace.c  |  2 +-
 mm/kasan/report.c  |  2 +-
 net/sysctl_net.c   |  2 +-
 12 files changed, 72 insertions(+), 77 deletions(-)

-- 
2.26.2
