Re: [RFC PATCH 0/4] Use 1st-level for DMA remapping in guest

2019-09-23 Thread Lu Baolu

Hi,

On 9/24/19 4:25 AM, Raj, Ashok wrote:

Hi Jacob

On Mon, Sep 23, 2019 at 12:27:15PM -0700, Jacob Pan wrote:


In VT-d 3.0, scalable mode is introduced, which offers two levels of
translation page tables and a nested translation mode. With regard to
GIOVA support, it can be simplified by 1) moving the GIOVA support
over the 1st-level page table to store the GIOVA->GPA mapping in the
vIOMMU, 2) binding the vIOMMU 1st-level page table to the pIOMMU,
3) using the pIOMMU second level for GPA->HPA translation, and
4) enabling nested (a.k.a. dual-stage) translation in the host.
Compared with the current shadow GIOVA support, the new approach is
more secure, and the software is simplified as we only need to flush
the pIOMMU IOTLB and possibly the device-IOTLB when an IOVA mapping
in the vIOMMU is torn down.

  .-----------.
  |  vIOMMU   |
  |-----------|                 .-----------.
  |           |IOTLB flush trap |   QEMU    |
  .-----------.     (unmap)     |-----------|
  | GVA->GPA  |---------------->|           |
  '-----------'                 '-----------'
  |           |                       |
  '-----------'                       |
        <-----------------------------'
        |  VFIO/IOMMU
        |  cache invalidation and
        |  guest gpd bind interfaces
        v

For vSVA, the guest PGD bind interface will mark the PASID as a guest
PASID and will inject page requests into the guest. In the FL gIOVA
case, I guess we are assuming there is no page fault for GIOVA. I will
need to add a flag in the gpgd bind such that any PRS will be
auto-responded with an invalid response.


Is there a real need to enforce this? I'm not sure if there is any
limitation in the spec, and if so, can the guest check that instead?


In the FL gIOVA case, gPASID is always 0. If a physical device is
passed through, hPASID is also 0; if it is an mdev device (representing
an ADI) instead, hPASID would be the PASID corresponding to the ADI.
The simulation software (e.g. QEMU) maintains a map between gPASID and
hPASID.
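
As a toy illustration of that map (an editorial sketch, not actual QEMU
code; the identity fallback covers the passed-through physical device):

#include <stddef.h>
#include <stdint.h>

struct pasid_map {
	uint32_t gpasid;
	uint32_t hpasid;
};

static uint32_t xlate_pasid(const struct pasid_map *tbl, size_t n,
			    uint32_t gpasid)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].gpasid == gpasid)
			return tbl[i].hpasid;
	return gpasid;	/* physical device passed through: identity */
}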

I second Ashok's idea. We don't need to distinguish these two cases in
the API; the guest can handle a page request interrupt as an
unrecoverable fault.



Also, I believe the idea is to overcommit PASID#0 for such uses. I
thought we had a capability to expose this to the vIOMMU as well. I'm
not sure if this is already documented; if not, it should be in the
next rev.




Also, is native use of the IOVA FL map not to be supported? I.e., will
the IOMMU API and DMA API for native usage continue to be SL only?

  .-----------.
  |  pIOMMU   |
  |-----------|
  .-----------.
  | GVA->GPA  |<---First level
  '-----------'
  | GPA->HPA  |<---Scond level
  '-----------'


s/Scond/Second


Yes. Thanks!

Best regards,
Baolu


Re: [RFC PATCH 0/4] Use 1st-level for DMA remapping in guest

2019-09-23 Thread Lu Baolu

Hi Jacob,

On 9/24/19 3:27 AM, Jacob Pan wrote:

Hi Baolu,

On Mon, 23 Sep 2019 20:24:50 +0800
Lu Baolu  wrote:


This patchset aims to move IOVA (I/O Virtual Address) translation to
the 1st-level page table under scalable mode. The major purpose of
this effort is to make guest IOVA support more efficient.

As the Intel VT-d architecture offers caching mode, guest IOVA (GIOVA)
support is currently implemented in a shadow page manner. The device
simulation software, like QEMU, has to figure out the GIOVA->GPA
mapping and write it to a shadow page table, which is then used by the
pIOMMU. Each time mappings are created or destroyed in the vIOMMU, the
simulation software intervenes: the GIOVA->GPA change is shadowed to
the host, and the pIOMMU is updated via VFIO/IOMMU interfaces.


  .-----------.
  |  vIOMMU   |
  |-----------|                 .--------------------.
  |           |IOTLB flush trap |        QEMU        |
  .-----------.   (map/unmap)   |--------------------|
  | GVA->GPA  |---------------->|    .----------.    |
  '-----------'                 |    | GPA->HPA |    |
  |           |                 |    '----------'    |
  '-----------'                 |                    |
                                '--------------------'
                                     |
                      <--------------'
                      |
                      v VFIO/IOMMU API
  .-----------.
  |  pIOMMU   |
  |-----------|
  |           |
  .-----------.
  | GVA->HPA  |
  '-----------'
  |           |
  '-----------'

In VT-d 3.0, scalable mode is introduced, which offers two levels of
translation page tables and a nested translation mode. With regard to
GIOVA support, it can be simplified by 1) moving the GIOVA support
over the 1st-level page table to store the GIOVA->GPA mapping in the
vIOMMU, 2) binding the vIOMMU 1st-level page table to the pIOMMU,
3) using the pIOMMU second level for GPA->HPA translation, and
4) enabling nested (a.k.a. dual-stage) translation in the host.
Compared with the current shadow GIOVA support, the new approach is
more secure, and the software is simplified as we only need to flush
the pIOMMU IOTLB and possibly the device-IOTLB when an IOVA mapping
in the vIOMMU is torn down.

  .-----------.
  |  vIOMMU   |
  |-----------|                 .-----------.
  |           |IOTLB flush trap |   QEMU    |
  .-----------.     (unmap)     |-----------|
  | GVA->GPA  |---------------->|           |
  '-----------'                 '-----------'
  |           |                       |
  '-----------'                       |
        <-----------------------------'
        |  VFIO/IOMMU
        |  cache invalidation and
        |  guest gpd bind interfaces
        v

For vSVA, the guest PGD bind interface will mark the PASID as a guest
PASID and will inject page requests into the guest. In the FL gIOVA
case, I guess we are assuming there is no page fault for GIOVA. I will
need to add a flag in the gpgd bind such that any PRS will be
auto-responded with an invalid response.


There should be no page fault. The pages should have been pinned.



Also, is native use of the IOVA FL map not to be supported? I.e., will
the IOMMU API and DMA API for native usage continue to be SL only?


Yes. There isn't such a use case as far as I can see.

Best regards,
Baolu


  .-----------.
  |  pIOMMU   |
  |-----------|
  .-----------.
  | GVA->GPA  |<---First level
  '-----------'
  | GPA->HPA  |<---Scond level
  '-----------'
  |           |
  '-----------'

This patch series only aims to achieve the first goal, i.e. using
first-level translation for IOVA mappings in the vIOMMU. I am sending
it out for your comments. Any comments, suggestions and concerns are
welcome.





Based-on-idea-by: Ashok Raj 
Based-on-idea-by: Kevin Tian 
Based-on-idea-by: Liu Yi L 
Based-on-idea-by: Lu Baolu 
Based-on-idea-by: Sanjay Kumar 

Lu Baolu (4):
   iommu/vt-d: Move domain_flush_cache helper into header
   iommu/vt-d: Add first level page table interfaces
   iommu/vt-d: Map/unmap domain with mmmap/mmunmap
   iommu/vt-d: Identify domains using first level page table

  drivers/iommu/Makefile             |   2 +-
  drivers/iommu/intel-iommu.c        | 142 ++--
  drivers/iommu/intel-pgtable.c      | 342 +
  include/linux/intel-iommu.h        |  31 ++-
  include/trace/events/intel_iommu.h |  60 +
  5 files changed, 553 insertions(+), 24 deletions(-)
  create mode 100644 drivers/iommu/intel-pgtable.c



[Jacob Pan]




Re: [RFC PATCH 2/4] iommu/vt-d: Add first level page table interfaces

2019-09-23 Thread Lu Baolu

Hi Ashok,

On 9/24/19 4:31 AM, Raj, Ashok wrote:

On Mon, Sep 23, 2019 at 08:24:52PM +0800, Lu Baolu wrote:

This adds functions to manipulate first level page tables
which could be used by a scalale mode capable IOMMU unit.


s/scalale/scalable


Yes.





intel_mmmap_range(domain, addr, end, phys_addr, prot)


Maybe think of a different name..? mmmap seems a bit weird :-)


Yes. I don't like it either. I've thought about it and haven't
figured out a satisfying one. Do you have any suggestions?

Best regards,
Baolu




  - Map an iova range of [addr, end) to the physical memory
starting at @phys_addr with the @prot permissions.

intel_mmunmap_range(domain, addr, end)
  - Tear down the map of an iova range [addr, end). A page
list will be returned which will be freed after iotlb
flushing.

Cc: Ashok Raj 
Cc: Jacob Pan 
Cc: Kevin Tian 
Cc: Liu Yi L 
Cc: Yi Sun 
Signed-off-by: Lu Baolu 





Re: [RFC PATCH 2/4] iommu/vt-d: Add first level page table interfaces

2019-09-23 Thread Raj, Ashok
On Mon, Sep 23, 2019 at 08:24:52PM +0800, Lu Baolu wrote:
> This adds functions to manipulate first level page tables
> which could be used by a scalale mode capable IOMMU unit.

s/scalale/scalable

> 
> intel_mmmap_range(domain, addr, end, phys_addr, prot)

Maybe think of a different name..? mmmap seems a bit weird :-)

>  - Map an iova range of [addr, end) to the physical memory
>starting at @phys_addr with the @prot permissions.
> 
> intel_mmunmap_range(domain, addr, end)
>  - Tear down the map of an iova range [addr, end). A page
>list will be returned which will be freed after iotlb
>flushing.
> 
> Cc: Ashok Raj 
> Cc: Jacob Pan 
> Cc: Kevin Tian 
> Cc: Liu Yi L 
> Cc: Yi Sun 
> Signed-off-by: Lu Baolu 


Re: [RFC PATCH 0/4] Use 1st-level for DMA remapping in guest

2019-09-23 Thread Jacob Pan
Hi Baolu,

On Mon, 23 Sep 2019 20:24:50 +0800
Lu Baolu  wrote:

> This patchset aims to move IOVA (I/O Virtual Address) translation to
> the 1st-level page table under scalable mode. The major purpose of
> this effort is to make guest IOVA support more efficient.
> 
> As the Intel VT-d architecture offers caching mode, guest IOVA (GIOVA)
> support is currently implemented in a shadow page manner. The device
> simulation software, like QEMU, has to figure out the GIOVA->GPA
> mapping and write it to a shadow page table, which is then used by the
> pIOMMU. Each time mappings are created or destroyed in the vIOMMU, the
> simulation software intervenes: the GIOVA->GPA change is shadowed to
> the host, and the pIOMMU is updated via VFIO/IOMMU interfaces.
> 
> 
>  .---.
>  |  vIOMMU   |
>  |---| ..
>  |   |IOTLB flush trap |QEMU|
>  .---. (map/unmap) ||
>  | GVA->GPA  |>|  .--.  |
>  '---' |  | GPA->HPA |  |
>  |   | |  '--'  |
>  '---' ||
>||
>''
> |
> <
> |
> v VFIO/IOMMU API
>   .---.
>   |  pIOMMU   |
>   |---|
>   |   |
>   .---.
>   | GVA->HPA  |
>   '---'
>   |   |
>   '---'
> 
> In VT-d 3.0, scalable mode is introduced, which offers two levels of
> translation page tables and a nested translation mode. With regard to
> GIOVA support, it can be simplified by 1) moving the GIOVA support
> over the 1st-level page table to store the GIOVA->GPA mapping in the
> vIOMMU, 2) binding the vIOMMU 1st-level page table to the pIOMMU,
> 3) using the pIOMMU second level for GPA->HPA translation, and
> 4) enabling nested (a.k.a. dual-stage) translation in the host.
> Compared with the current shadow GIOVA support, the new approach is
> more secure, and the software is simplified as we only need to flush
> the pIOMMU IOTLB and possibly the device-IOTLB when an IOVA mapping
> in the vIOMMU is torn down.
> 
>  .-----------.
>  |  vIOMMU   |
>  |-----------|                 .-----------.
>  |           |IOTLB flush trap |   QEMU    |
>  .-----------.     (unmap)     |-----------|
>  | GVA->GPA  |---------------->|           |
>  '-----------'                 '-----------'
>  |           |                       |
>  '-----------'                       |
>        <-----------------------------'
>        |  VFIO/IOMMU
>        |  cache invalidation and
>        |  guest gpd bind interfaces
>        v
For vSVA, the guest PGD bind interface will mark the PASID as a guest
PASID and will inject page requests into the guest. In the FL gIOVA
case, I guess we are assuming there is no page fault for GIOVA. I will
need to add a flag in the gpgd bind such that any PRS will be
auto-responded with an invalid response.

Also, is native use of the IOVA FL map not to be supported? I.e., will
the IOMMU API and DMA API for native usage continue to be SL only?
>  .-----------.
>  |  pIOMMU   |
>  |-----------|
>  .-----------.
>  | GVA->GPA  |<---First level
>  '-----------'
>  | GPA->HPA  |<---Scond level
>  '-----------'
>  |           |
>  '-----------'
> 
> This patch series only aims to achieve the first goal, i.e. using
> first-level translation for IOVA mappings in the vIOMMU. I am sending
> it out for your comments. Any comments, suggestions and concerns are
> welcome.
> 


> Based-on-idea-by: Ashok Raj 
> Based-on-idea-by: Kevin Tian 
> Based-on-idea-by: Liu Yi L 
> Based-on-idea-by: Lu Baolu 
> Based-on-idea-by: Sanjay Kumar 
> 
> Lu Baolu (4):
>   iommu/vt-d: Move domain_flush_cache helper into header
>   iommu/vt-d: Add first level page table interfaces
>   iommu/vt-d: Map/unmap domain with mmmap/mmunmap
>   iommu/vt-d: Identify domains using first level page table
> 
>  drivers/iommu/Makefile             |   2 +-
>  drivers/iommu/intel-iommu.c        | 142 ++--
>  drivers/iommu/intel-pgtable.c      | 342 +
>  include/linux/intel-iommu.h        |  31 ++-
>  include/trace/events/intel_iommu.h |  60 +
>  5 files changed, 553 insertions(+), 24 deletions(-)
>  create mode 100644 drivers/iommu/intel-pgtable.c
> 

[Jacob Pan]


[PATCH trivial 3/3] treewide: arch: Fix Kconfig indentation

2019-09-23 Thread Krzysztof Kozlowski
Adjust indentation from spaces to tabs (+ optional two spaces) as per
the coding style, with a command like:
$ sed -e 's/^        /\t/' -i */Kconfig

Signed-off-by: Krzysztof Kozlowski 
---
 arch/Kconfig   |  4 ++--
 arch/alpha/Kconfig |  2 +-
 arch/arm/Kconfig.debug |  4 ++--
 arch/arm/mach-ep93xx/Kconfig   |  8 
 arch/arm/mach-hisi/Kconfig |  2 +-
 arch/arm/mach-ixp4xx/Kconfig   | 16 
 arch/arm/mach-mmp/Kconfig  |  2 +-
 arch/arm/mach-omap1/Kconfig| 14 +++---
 arch/arm/mach-prima2/Kconfig   |  6 +++---
 arch/arm/mach-s3c24xx/Kconfig  |  4 ++--
 arch/arm/mach-s3c64xx/Kconfig  |  6 +++---
 arch/arm/plat-samsung/Kconfig  |  2 +-
 arch/arm64/Kconfig |  6 +++---
 arch/arm64/Kconfig.debug   |  2 +-
 arch/h8300/Kconfig |  4 ++--
 arch/h8300/Kconfig.cpu |  4 ++--
 arch/m68k/Kconfig.bus  |  2 +-
 arch/m68k/Kconfig.debug| 16 
 arch/m68k/Kconfig.machine  |  8 
 arch/nds32/Kconfig.cpu | 18 +-
 arch/openrisc/Kconfig  | 26 +-
 arch/powerpc/Kconfig.debug | 18 +-
 arch/powerpc/platforms/Kconfig.cputype |  2 +-
 arch/riscv/Kconfig.socs|  2 +-
 arch/sh/boards/Kconfig |  2 +-
 arch/sh/mm/Kconfig |  2 +-
 arch/um/Kconfig|  2 +-
 arch/x86/Kconfig   | 18 +-
 28 files changed, 101 insertions(+), 101 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 5f8a5d84dbbe..8d4f77bbed29 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -76,7 +76,7 @@ config JUMP_LABEL
depends on HAVE_ARCH_JUMP_LABEL
depends on CC_HAS_ASM_GOTO
help
- This option enables a transparent branch optimization that
+This option enables a transparent branch optimization that
 makes certain almost-always-true or almost-always-false branch
 conditions even cheaper to execute within the kernel.
 
@@ -84,7 +84,7 @@ config JUMP_LABEL
 scheduler functionality, networking code and KVM have such
 branches and include support for this optimization technique.
 
- If it is detected that the compiler has support for "asm goto",
+If it is detected that the compiler has support for "asm goto",
 the kernel will compile such branches with just a nop
 instruction. When the condition flag is toggled to true, the
 nop will be converted to a jump instruction to execute the
diff --git a/arch/alpha/Kconfig b/arch/alpha/Kconfig
index ef179033a7c2..30a6291355cb 100644
--- a/arch/alpha/Kconfig
+++ b/arch/alpha/Kconfig
@@ -545,7 +545,7 @@ config NR_CPUS
default "4" if !ALPHA_GENERIC && !ALPHA_MARVEL
help
  MARVEL support can handle a maximum of 32 CPUs, all the others
-  with working support have a maximum of 4 CPUs.
+ with working support have a maximum of 4 CPUs.
 
 config ARCH_DISCONTIGMEM_ENABLE
bool "Discontiguous Memory Support"
diff --git a/arch/arm/Kconfig.debug b/arch/arm/Kconfig.debug
index 8bcbd0cd739b..0e5d52fbddbd 100644
--- a/arch/arm/Kconfig.debug
+++ b/arch/arm/Kconfig.debug
@@ -274,7 +274,7 @@ choice
select DEBUG_UART_8250
help
  Say Y here if you want the debug print routines to direct
-  their output to the CNS3xxx UART0.
+ their output to the CNS3xxx UART0.
 
config DEBUG_DAVINCI_DA8XX_UART1
bool "Kernel low-level debugging on DaVinci DA8XX using UART1"
@@ -828,7 +828,7 @@ choice
select DEBUG_UART_8250
help
  Say Y here if you want kernel low-level debugging support
-  on Rockchip RV1108 based platforms.
+ on Rockchip RV1108 based platforms.
 
config DEBUG_RV1108_UART1
	bool "Kernel low-level debugging messages via Rockchip RV1108 UART1"
diff --git a/arch/arm/mach-ep93xx/Kconfig b/arch/arm/mach-ep93xx/Kconfig
index f2db5fd38145..bf81dfab7f1b 100644
--- a/arch/arm/mach-ep93xx/Kconfig
+++ b/arch/arm/mach-ep93xx/Kconfig
@@ -126,10 +126,10 @@ config MACH_MICRO9S
  Contec Micro9-Slim board.
 
 config MACH_SIM_ONE
-bool "Support Simplemachines Sim.One board"
-help
-  Say 'Y' here if you want your kernel to support the
-  Simplemachines Sim.One board.
+   bool "Support Simplemachines Sim.One board"
+   help
+ Say 'Y' here if you want your kernel to support the
+ Simplemachines Sim.One board.
 
 config MACH_SNAPPER_CL15
bool "Support Bluewater Systems Snapper CL15 Module"
diff --git a/arch/arm/mach-hisi/Kconfig 

[PATCH trivial 2/3] treewide: Fix Kconfig indentation

2019-09-23 Thread Krzysztof Kozlowski
Adjust indentation from spaces to tabs (+ optional two spaces) as per
the coding style, with a command like:
$ sed -e 's/^        /\t/' -i */Kconfig

Signed-off-by: Krzysztof Kozlowski 
---
 certs/Kconfig  | 14 ++---
 init/Kconfig   | 28 +-
 kernel/trace/Kconfig   |  8 
 lib/Kconfig|  2 +-
 lib/Kconfig.debug  | 36 +-
 lib/Kconfig.kgdb   |  8 
 mm/Kconfig | 28 +-
 samples/Kconfig|  2 +-
 security/apparmor/Kconfig  |  2 +-
 security/integrity/Kconfig | 24 +++
 security/integrity/ima/Kconfig | 12 ++--
 security/safesetid/Kconfig | 24 +++
 12 files changed, 94 insertions(+), 94 deletions(-)

diff --git a/certs/Kconfig b/certs/Kconfig
index c94e93d8bccf..0358c66d3d7c 100644
--- a/certs/Kconfig
+++ b/certs/Kconfig
@@ -6,14 +6,14 @@ config MODULE_SIG_KEY
default "certs/signing_key.pem"
depends on MODULE_SIG
help
- Provide the file name of a private key/certificate in PEM format,
- or a PKCS#11 URI according to RFC7512. The file should contain, or
- the URI should identify, both the certificate and its corresponding
- private key.
+Provide the file name of a private key/certificate in PEM format,
+or a PKCS#11 URI according to RFC7512. The file should contain, or
+the URI should identify, both the certificate and its corresponding
+private key.
 
- If this option is unchanged from its default "certs/signing_key.pem",
- then the kernel will automatically generate the private key and
- certificate as described in Documentation/admin-guide/module-signing.rst
+If this option is unchanged from its default "certs/signing_key.pem",
+then the kernel will automatically generate the private key and
+certificate as described in Documentation/admin-guide/module-signing.rst
 
 config SYSTEM_TRUSTED_KEYRING
bool "Provide system-wide ring of trusted keys"
diff --git a/init/Kconfig b/init/Kconfig
index 6d4db887f696..f59c854839d2 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -169,10 +169,10 @@ config BUILD_SALT
string "Build ID Salt"
default ""
help
-  The build ID is used to link binaries and their debug info. Setting
-  this option will use the value in the calculation of the build id.
-  This is mostly useful for distributions which want to ensure the
-  build is unique between builds. It's safe to leave the default.
+ The build ID is used to link binaries and their debug info. Setting
+ this option will use the value in the calculation of the build id.
+ This is mostly useful for distributions which want to ensure the
+ build is unique between builds. It's safe to leave the default.
 
 config HAVE_KERNEL_GZIP
bool
@@ -1327,9 +1327,9 @@ menuconfig EXPERT
select DEBUG_KERNEL
help
  This option allows certain base kernel options and settings
-  to be disabled or tweaked. This is for specialized
-  environments which can tolerate a "non-standard" kernel.
-  Only use this if you really know what you are doing.
+ to be disabled or tweaked. This is for specialized
+ environments which can tolerate a "non-standard" kernel.
+ Only use this if you really know what you are doing.
 
 config UID16
bool "Enable 16-bit UID system calls" if EXPERT
@@ -1439,11 +1439,11 @@ config BUG
bool "BUG() support" if EXPERT
default y
help
-  Disabling this option eliminates support for BUG and WARN, reducing
-  the size of your kernel image and potentially quietly ignoring
-  numerous fatal conditions. You should only consider disabling this
-  option for embedded systems with no facilities for reporting errors.
-  Just say Y.
+ Disabling this option eliminates support for BUG and WARN, reducing
+ the size of your kernel image and potentially quietly ignoring
+ numerous fatal conditions. You should only consider disabling this
+ option for embedded systems with no facilities for reporting errors.
+ Just say Y.
 
 config ELF_CORE
depends on COREDUMP
@@ -1459,8 +1459,8 @@ config PCSPKR_PLATFORM
select I8253_LOCK
default y
help
-  This option allows to disable the internal PC-Speaker
-  support, saving some memory.
+ This option allows to disable the internal PC-Speaker
+ support, saving some memory.
 
 config BASE_FULL
default y
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index e08527f50d2a..0393003f102f 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -76,7 +76,7 

[PATCH trivial 1/3] treewide: drivers: Fix Kconfig indentation

2019-09-23 Thread Krzysztof Kozlowski
Adjust indentation from spaces to tabs (+ optional two spaces) as per
the coding style, with a command like:
$ sed -e 's/^        /\t/' -i */Kconfig

Signed-off-by: Krzysztof Kozlowski 
---
 drivers/acpi/Kconfig  |  8 +-
 drivers/ata/Kconfig   | 12 +--
 drivers/auxdisplay/Kconfig| 14 +--
 drivers/base/firmware_loader/Kconfig  |  2 +-
 drivers/block/Kconfig | 28 +++---
 drivers/block/mtip32xx/Kconfig|  2 +-
 drivers/char/Kconfig  |  6 +-
 drivers/char/agp/Kconfig  |  2 +-
 drivers/char/hw_random/Kconfig| 10 +-
 drivers/char/ipmi/Kconfig | 20 ++--
 drivers/clk/Kconfig   |  2 +-
 drivers/clk/mediatek/Kconfig  | 10 +-
 drivers/clk/versatile/Kconfig |  2 +-
 drivers/clocksource/Kconfig   | 20 ++--
 drivers/cpufreq/Kconfig.x86   |  6 +-
 drivers/cpuidle/Kconfig   |  8 +-
 drivers/cpuidle/Kconfig.arm   | 16 ++--
 drivers/crypto/Kconfig|  4 +-
 drivers/crypto/caam/Kconfig   | 14 +--
 drivers/crypto/chelsio/Kconfig| 30 +++---
 drivers/crypto/stm32/Kconfig  |  6 +-
 drivers/crypto/ux500/Kconfig  | 16 ++--
 drivers/devfreq/Kconfig   |  6 +-
 drivers/dma/Kconfig   | 46 -
 drivers/edac/Kconfig  |  2 +-
 drivers/firmware/Kconfig  |  4 +-
 drivers/firmware/efi/Kconfig  |  2 +-
 drivers/hid/Kconfig   |  2 +-
 drivers/hwmon/Kconfig | 14 +--
 drivers/i2c/busses/Kconfig| 16 ++--
 drivers/i2c/muxes/Kconfig | 18 ++--
 drivers/iio/gyro/Kconfig  |  8 +-
 drivers/infiniband/hw/bnxt_re/Kconfig | 12 +--
 drivers/input/keyboard/Kconfig|  8 +-
 drivers/input/mouse/Kconfig   |  6 +-
 drivers/input/tablet/Kconfig  | 20 ++--
 drivers/input/touchscreen/Kconfig |  2 +-
 drivers/iommu/Kconfig |  2 +-
 drivers/irqchip/Kconfig   | 10 +-
 drivers/isdn/hardware/mISDN/Kconfig   |  2 +-
 drivers/macintosh/Kconfig |  6 +-
 drivers/md/Kconfig| 54 +--
 drivers/media/Kconfig |  6 +-
 drivers/media/radio/si470x/Kconfig|  4 +-
 drivers/memstick/core/Kconfig | 18 ++--
 drivers/memstick/host/Kconfig |  4 +-
 drivers/misc/Kconfig  | 16 ++--
 drivers/mtd/nand/onenand/Kconfig  | 12 +--
 drivers/nfc/nfcmrvl/Kconfig   |  2 +-
 drivers/pci/Kconfig   | 24 ++---
 drivers/pci/controller/dwc/Kconfig|  6 +-
 drivers/pci/hotplug/Kconfig   |  2 +-
 drivers/perf/Kconfig  | 14 +--
 drivers/phy/hisilicon/Kconfig |  6 +-
 drivers/pinctrl/Kconfig   | 18 ++--
 drivers/pinctrl/freescale/Kconfig | 12 +--
 drivers/pinctrl/qcom/Kconfig  | 34 +++
 drivers/platform/chrome/Kconfig   |  6 +-
 drivers/platform/mellanox/Kconfig |  4 +-
 drivers/platform/x86/Kconfig  | 48 +-
 drivers/power/avs/Kconfig | 12 +--
 drivers/power/supply/Kconfig  | 30 +++---
 drivers/regulator/Kconfig |  8 +-
 drivers/rpmsg/Kconfig |  2 +-
 drivers/rtc/Kconfig   |  6 +-
 drivers/scsi/Kconfig  | 22 ++---
 drivers/scsi/aic7xxx/Kconfig.aic7xxx  | 14 +--
 drivers/scsi/pcmcia/Kconfig   |  2 +-
 drivers/scsi/qedf/Kconfig |  4 +-
 drivers/scsi/smartpqi/Kconfig |  8 +-
 drivers/soc/fsl/Kconfig   |  8 +-
 drivers/soc/qcom/Kconfig  | 22 ++---
 drivers/soc/rockchip/Kconfig  | 18 ++--
 drivers/spi/Kconfig   | 18 ++--
 drivers/staging/fbtft/Kconfig | 12 +--
 drivers/staging/fwserial/Kconfig  |  6 +-
 drivers/staging/most/Kconfig  |  8 +-
 drivers/staging/nvec/Kconfig  | 10 +-
 drivers/staging/pi433/Kconfig | 24 ++---
 drivers/staging/uwb/Kconfig   | 42 
 .../vc04_services/bcm2835-audio/Kconfig   | 12 +--
 drivers/staging/wusbcore/Kconfig  |  2 +-
 drivers/tty/Kconfig   | 26 ++---
 drivers/tty/hvc/Kconfig   |  4 +-
 drivers/tty/serial/8250/Kconfig   |  2 +-
 drivers/tty/serial/Kconfig   

Re: [RFC PATCH 1/3] dma-mapping: make overriding GFP_* flags arch customizable

2019-09-23 Thread Christoph Hellwig
On Mon, Sep 23, 2019 at 02:34:16PM +0200, Halil Pasic wrote:
> Before commit 57bf5a8963f8 ("dma-mapping: clear harmful GFP_* flags in
> common code"), tweaking the client-code-supplied GFP_* flags used to be
> an issue handled in the architecture specific code. The commit message
> suggests that fixing the client code would actually be a better way
> of dealing with this.
> 
> On s390 common I/O devices are generally capable of using the full 64
> bit address space for DMA I/O, but some chunks of the DMA memory need to
> be 31 bit addressable (in physical address space) because the
> instructions involved mandate it. Before switching to the DMA API this
> used to be a non-issue; we used to allocate those chunks from ZONE_DMA.
> Currently our only option with the DMA API is to restrict the devices
> (via dma_mask and dma_mask_coherent) to 31 bit, which is sub-optimal.
> 
> Thus s390 would benefit from having control over which flags are
> dropped.

No way, sorry.  You need to express that using a dma mask instead of
overloading the GFP flags.
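
For comparison, expressing the constraint through the mask would look
roughly like this hypothetical call site (a sketch, not code from the
series):

	/* Constrain the device (or a dedicated child device used only
	 * for the 31-bit structures) to 31-bit DMA addresses. */
	if (dma_set_mask_and_coherent(dev, DMA_BIT_MASK(31)))
		dev_warn(dev, "no suitable DMA mask available\n");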


[RFC PATCH 2/3] s390/virtio: fix virtio-ccw DMA without PV

2019-09-23 Thread Halil Pasic
Commit 37db8985b211 ("s390/cio: add basic protected virtualization
support") breaks virtio-ccw devices with VIRTIO_F_IOMMU_PLATFORM for non
Protected Virtualization (PV) guests. The problem is that the dma_mask
of the ccw device, which is used by the virtio core, gets changed from
64 to 31 bit, because some of the DMA allocations do require 31 bit
addressable memory. For PV the only drawback is that some of the virtio
structures must end up in ZONE_DMA, because we have to bounce the
buffers mapped via the DMA API anyway.

But for non PV guests we have a problem: because of the 31 bit mask,
guests bigger than 2G are likely to try bouncing buffers. The swiotlb
however is only initialized for PV guests, because we don't want to
bounce anything for non PV guests. The first such map kills the guest.

Let us go back to differentiating between allocations that need to be
from ZONE_DMA and the ones that don't, using the GFP_DMA flag. For that
we need to make sure dma_override_gfp_flags() won't clear away GFP_DMA
like it does by default. Then we can fix the dma_mask. CCW devices are
perfectly capable of DMA-ing data to/from any address; it is just that
certain things need to be 31 bit addressable because of the 31 bit
heritage.

Signed-off-by: Halil Pasic 
Reported-by: Marc Hartmayer 
Fixes: 37db8985b211 ("s390/cio: add basic protected virtualization support")
---

I was conservative about preserving old behavior for PCI. Could we just
not clear any flags in dma_override_gfp_flags()?
---
 arch/s390/Kconfig |  1 +
 arch/s390/include/asm/cio.h   |  5 +++--
 arch/s390/mm/init.c   | 20 
 drivers/s390/cio/css.c| 16 +---
 drivers/s390/cio/device.c |  5 +++--
 drivers/s390/cio/device_ops.c |  3 ++-
 6 files changed, 38 insertions(+), 12 deletions(-)

diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index f933a473b128..e61351b61ce7 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -60,6 +60,7 @@ config S390
def_bool y
select ARCH_BINFMT_ELF_STATE
select ARCH_HAS_DEVMEM_IS_ALLOWED
+   select ARCH_HAS_DMA_OVERRIDE_GFP_FLAGS
select ARCH_HAS_ELF_RANDOMIZE
select ARCH_HAS_FORTIFY_SOURCE
select ARCH_HAS_GCOV_PROFILE_ALL
diff --git a/arch/s390/include/asm/cio.h b/arch/s390/include/asm/cio.h
index b5bfb3123cb1..32041ec48170 100644
--- a/arch/s390/include/asm/cio.h
+++ b/arch/s390/include/asm/cio.h
@@ -364,10 +364,11 @@ extern void cio_dma_free(void *cpu_addr, size_t size);
 extern struct device *cio_get_dma_css_dev(void);
 
 void *cio_gp_dma_zalloc(struct gen_pool *gp_dma, struct device *dma_dev,
-   size_t size);
+   size_t size, gfp_t flags);
 void cio_gp_dma_free(struct gen_pool *gp_dma, void *cpu_addr, size_t size);
 void cio_gp_dma_destroy(struct gen_pool *gp_dma, struct device *dma_dev);
-struct gen_pool *cio_gp_dma_create(struct device *dma_dev, int nr_pages);
+struct gen_pool *cio_gp_dma_create(struct device *dma_dev, int nr_pages,
+  gfp_t flags);
 
 /* Function from drivers/s390/cio/chsc.c */
 int chsc_sstpc(void *page, unsigned int op, u16 ctrl, u64 *clock_delta);
diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
index a124f19f7b3c..757e2cc60a1a 100644
--- a/arch/s390/mm/init.c
+++ b/arch/s390/mm/init.c
@@ -161,6 +161,26 @@ bool force_dma_unencrypted(struct device *dev)
return is_prot_virt_guest();
 }
 
+
+gfp_t dma_override_gfp_flags(struct device *dev, gfp_t flags)
+{
+   gfp_t  taboo_mask;
+   const char *taboo_msg;
+
+   if (dma_is_direct(dev->dma_ops)) {
+   /* cio: we have to mix in some allocations from ZONE_DMA */
+   taboo_mask = __GFP_DMA32 | __GFP_HIGHMEM;
+   taboo_msg = "__GFP_DMA32, __GFP_HIGHMEM";
+   } else {
+   /* pci */
+   taboo_mask = __GFP_DMA | __GFP_DMA32 | __GFP_HIGHMEM;
+   taboo_msg = " __GFP_DMA, __GFP_DMA32, __GFP_HIGHMEM,";
+   }
+   dev_WARN_ONCE(dev, flags & taboo_mask,
+ "fixme: don't dma_alloc with any of: %s\n", taboo_msg);
+   return flags & ~taboo_mask;
+}
+
 /* protected virtualization */
 static void pv_init(void)
 {
diff --git a/drivers/s390/cio/css.c b/drivers/s390/cio/css.c
index 22c55816100b..3115602384d7 100644
--- a/drivers/s390/cio/css.c
+++ b/drivers/s390/cio/css.c
@@ -231,7 +231,7 @@ struct subchannel *css_alloc_subchannel(struct subchannel_id schid,
 * The physical addresses of some the dma structures that can
 * belong to a subchannel need to fit 31 bit width (e.g. ccw).
 */
-   sch->dev.coherent_dma_mask = DMA_BIT_MASK(31);
+   sch->dev.coherent_dma_mask = DMA_BIT_MASK(64);
	sch->dev.dma_mask = &sch->dev.coherent_dma_mask;
return sch;
 
@@ -1091,7 +1091,8 @@ struct device *cio_get_dma_css_dev(void)
	return &channel_subsystems[0]->device;
 }
 
-struct gen_pool *cio_gp_dma_create(struct device *dma_dev, int nr_pages)

[RFC PATCH 0/3] fix dma_mask for CCW devices

2019-09-23 Thread Halil Pasic
Commit 37db8985b211 ("s390/cio: add basic protected virtualization
support") breaks virtio-ccw devices with VIRTIO_F_IOMMU_PLATFORM for non
Protected Virtualization (PV) guests. The problem is that the dma_mask
of the CCW device, which is used by the virtio core, gets changed from
64 to 31 bit. This is done because some of the DMA allocations do
require 31 bit addressable memory, but it has unfavorable side effects.

For PV the only drawback is that some of the virtio structures must end
up in ZONE_DMA (with PV we have to bounce the buffers mapped via the
DMA API anyway).

But for non PV guests we have a problem: because of the 31 bit mask,
guests bigger than 2G are likely to try bouncing buffers. The swiotlb
however is only initialized for PV guests (because we don't want to
bounce anything for non PV guests). The first map of a buffer with
an address beyond 0x7fffffff kills the guest.

This series sets out to fix this problem by first making the GFP_DMA
flag count for DMA API allocations -- on s390 at least. Then we set the
dma_mask to 64 bit and do the allocations for the memory that needs to
be 31 bit addressable with the GFP_DMA flag.
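
For illustration, a hypothetical cio call site under the new scheme (a
sketch, not a hunk from this series):

static void *alloc_ccw_area(struct device *dma_dev, size_t size,
			    dma_addr_t *dma_handle)
{
	/* GFP_DMA now survives into the DMA API on s390, forcing a
	 * 31-bit addressable allocation while dma_mask stays 64 bit. */
	return dma_alloc_coherent(dma_dev, size, dma_handle,
				  GFP_KERNEL | GFP_DMA);
}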

For CCW devices we could probably just not clear any GFP flags at
all, but I decided to be conservative and change only what really needs
to be changed.

I'm not perfectly satisfied with this solution, but I believe it is good
enough, and I can't think of anything better at the moment. Ideas
welcome.

Halil Pasic (3):
  dma-mapping: make overriding GFP_* flags arch customizable
  s390/virtio: fix virtio-ccw DMA without PV
  dma-mapping: warn on harmful GFP_* flags

 arch/s390/Kconfig |  1 +
 arch/s390/include/asm/cio.h   |  5 +++--
 arch/s390/mm/init.c   | 20 
 drivers/s390/cio/css.c| 16 +---
 drivers/s390/cio/device.c |  5 +++--
 drivers/s390/cio/device_ops.c |  3 ++-
 include/linux/dma-mapping.h   | 13 +
 kernel/dma/Kconfig|  6 ++
 kernel/dma/mapping.c  |  4 +---
 9 files changed, 58 insertions(+), 15 deletions(-)

-- 
2.17.1



[RFC PATCH 3/3] dma-mapping: warn on harmful GFP_* flags

2019-09-23 Thread Halil Pasic
The commit message of commit 57bf5a8963f8 ("dma-mapping: clear harmful
GFP_* flags in common code") says that we should probably warn when we
encounter harmful GFP_* flags that we clear -- because the client code
is, best case, silly if not buggy. I concur with that.

Let's warn once when we encounter silly GFP_* flags. The guys caring
about the respective client code will hopefully fix these soon.

Signed-off-by: Halil Pasic 
---

I'm not too happy with my warning message. Suggestions welcome!
---
 include/linux/dma-mapping.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 5024bc863fa7..299f36ac8668 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -823,6 +823,9 @@ extern gfp_t dma_override_gfp_flags(struct device *dev, gfp_t flags);
 static inline gfp_t dma_override_gfp_flags(struct device *dev, gfp_t flags)
 {
/* let the implementation decide on the zone to allocate from: */
+   dev_WARN_ONCE(dev,
+ flags & (__GFP_DMA32 | __GFP_DMA | __GFP_HIGHMEM),
+ "fixme: don't dma_alloc with any of: __GFP_DMA32, __GFP_DMA, __GFP_HIGHMEM\n");
return flags & ~(__GFP_DMA | __GFP_DMA32 | __GFP_HIGHMEM);
 }
 #endif
-- 
2.17.1



[RFC PATCH 1/3] dma-mapping: make overriding GFP_* flags arch customizable

2019-09-23 Thread Halil Pasic
Before commit 57bf5a8963f8 ("dma-mapping: clear harmful GFP_* flags in
common code"), tweaking the client-code-supplied GFP_* flags used to be
an issue handled in the architecture specific code. The commit message
suggests that fixing the client code would actually be a better way
of dealing with this.

On s390 common I/O devices are generally capable of using the full 64
bit address space for DMA I/O, but some chunks of the DMA memory need to
be 31 bit addressable (in physical address space) because the
instructions involved mandate it. Before switching to the DMA API this
used to be a non-issue; we used to allocate those chunks from ZONE_DMA.
Currently our only option with the DMA API is to restrict the devices
(via dma_mask and dma_mask_coherent) to 31 bit, which is sub-optimal.

Thus s390 would benefit from having control over which flags are
dropped.

Signed-off-by: Halil Pasic 
---
 include/linux/dma-mapping.h | 10 ++
 kernel/dma/Kconfig  |  6 ++
 kernel/dma/mapping.c|  4 +---
 3 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 4a1c4fca475a..5024bc863fa7 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -817,4 +817,14 @@ static inline int dma_mmap_wc(struct device *dev,
 #define dma_unmap_len_set(PTR, LEN_NAME, VAL)	do { } while (0)
 #endif
 
+#ifdef CONFIG_ARCH_HAS_DMA_OVERRIDE_GFP_FLAGS
+extern gfp_t dma_override_gfp_flags(struct device *dev, gfp_t flags);
+#else
+static inline gfp_t dma_override_gfp_flags(struct device *dev, gfp_t flags)
+{
+   /* let the implementation decide on the zone to allocate from: */
+   return flags & ~(__GFP_DMA | __GFP_DMA32 | __GFP_HIGHMEM);
+}
+#endif
+
 #endif
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index 73c5c2b8e824..4756c75047e3 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -54,6 +54,12 @@ config ARCH_HAS_DMA_PREP_COHERENT
 config ARCH_HAS_DMA_COHERENT_TO_PFN
bool
 
+config ARCH_HAS_DMA_MMAP_PGPROT
+   bool
+
+config ARCH_HAS_DMA_OVERRIDE_GFP_FLAGS
+   bool
+
 config ARCH_HAS_FORCE_DMA_UNENCRYPTED
bool
 
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index d9334f31a5af..535b809548e2 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -303,9 +303,7 @@ void *dma_alloc_attrs(struct device *dev, size_t size, dma_addr_t *dma_handle,
	if (dma_alloc_from_dev_coherent(dev, size, dma_handle, &cpu_addr))
return cpu_addr;
 
-   /* let the implementation decide on the zone to allocate from: */
-   flag &= ~(__GFP_DMA | __GFP_DMA32 | __GFP_HIGHMEM);
-
+   flag = dma_override_gfp_flags(dev, flag);
if (dma_is_direct(ops))
cpu_addr = dma_direct_alloc(dev, size, dma_handle, flag, attrs);
else if (ops->alloc)
-- 
2.17.1



[RFC PATCH 3/4] iommu/vt-d: Map/unmap domain with mmmap/mmunmap

2019-09-23 Thread Lu Baolu
If a dmar domain has DOMAIN_FLAG_FIRST_LEVEL_TRANS bit set
in its flags, IOMMU will use the first level page table for
translation. Hence, we need to map or unmap addresses in the
first level page table.

Cc: Ashok Raj 
Cc: Jacob Pan 
Cc: Kevin Tian 
Cc: Liu Yi L 
Cc: Yi Sun 
Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel-iommu.c | 94 -
 1 file changed, 82 insertions(+), 12 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 9cfe8098d993..103480016010 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -168,6 +168,11 @@ static inline unsigned long virt_to_dma_pfn(void *p)
return page_to_dma_pfn(virt_to_page(p));
 }
 
+static inline unsigned long dma_pfn_to_addr(unsigned long pfn)
+{
+   return pfn << VTD_PAGE_SHIFT;
+}
+
 /* global iommu list, set NULL for ignored DMAR units */
 static struct intel_iommu **g_iommus;
 
@@ -307,6 +312,9 @@ static int hw_pass_through = 1;
  */
 #define DOMAIN_FLAG_LOSE_CHILDREN  BIT(1)
 
+/* Domain uses first level translation for DMA remapping. */
+#define DOMAIN_FLAG_FIRST_LEVEL_TRANS  BIT(2)
+
 #define for_each_domain_iommu(idx, domain) \
for (idx = 0; idx < g_num_of_iommus; idx++) \
if (domain->iommu_refcnt[idx])
@@ -552,6 +560,11 @@ static inline int domain_type_is_si(struct dmar_domain *domain)
return domain->flags & DOMAIN_FLAG_STATIC_IDENTITY;
 }
 
+static inline int domain_type_is_flt(struct dmar_domain *domain)
+{
+   return domain->flags & DOMAIN_FLAG_FIRST_LEVEL_TRANS;
+}
+
 static inline int domain_pfn_supported(struct dmar_domain *domain,
   unsigned long pfn)
 {
@@ -1147,8 +1160,15 @@ static struct page *domain_unmap(struct dmar_domain *domain,
BUG_ON(start_pfn > last_pfn);
 
/* we don't need lock here; nobody else touches the iova range */
-   freelist = dma_pte_clear_level(domain, agaw_to_level(domain->agaw),
-  domain->pgd, 0, start_pfn, last_pfn, NULL);
+   if (domain_type_is_flt(domain))
+   freelist = intel_mmunmap_range(domain,
+  dma_pfn_to_addr(start_pfn),
+  dma_pfn_to_addr(last_pfn + 1));
+   else
+   freelist = dma_pte_clear_level(domain,
+  agaw_to_level(domain->agaw),
+  domain->pgd, 0, start_pfn,
+  last_pfn, NULL);
 
/* free pgd */
if (start_pfn == 0 && last_pfn == DOMAIN_MAX_PFN(domain->gaw)) {
@@ -2213,9 +2233,10 @@ static inline int hardware_largepage_caps(struct dmar_domain *domain,
return level;
 }
 
-static int __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
-   struct scatterlist *sg, unsigned long phys_pfn,
-   unsigned long nr_pages, int prot)
+static int
+__domain_mapping_dma(struct dmar_domain *domain, unsigned long iov_pfn,
+struct scatterlist *sg, unsigned long phys_pfn,
+unsigned long nr_pages, int prot)
 {
struct dma_pte *first_pte = NULL, *pte = NULL;
phys_addr_t uninitialized_var(pteval);
@@ -2223,13 +2244,6 @@ static int __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
unsigned int largepage_lvl = 0;
unsigned long lvl_pages = 0;
 
-   BUG_ON(!domain_pfn_supported(domain, iov_pfn + nr_pages - 1));
-
-   if ((prot & (DMA_PTE_READ|DMA_PTE_WRITE)) == 0)
-   return -EINVAL;
-
-   prot &= DMA_PTE_READ | DMA_PTE_WRITE | DMA_PTE_SNP;
-
if (!sg) {
sg_res = nr_pages;
pteval = ((phys_addr_t)phys_pfn << VTD_PAGE_SHIFT) | prot;
@@ -2328,6 +2342,62 @@ static int __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
return 0;
 }
 
+static int
+__domain_mapping_mm(struct dmar_domain *domain, unsigned long iov_pfn,
+   struct scatterlist *sg, unsigned long phys_pfn,
+   unsigned long nr_pages, int prot)
+{
+   int ret = 0;
+
+   if (!sg)
+   return intel_mmmap_range(domain, dma_pfn_to_addr(iov_pfn),
+dma_pfn_to_addr(iov_pfn + nr_pages),
+dma_pfn_to_addr(phys_pfn), prot);
+
+   while (nr_pages > 0) {
+   unsigned long sg_pages, phys;
+   unsigned long pgoff = sg->offset & ~PAGE_MASK;
+
+   sg_pages = aligned_nrpages(sg->offset, sg->length);
+   phys = sg_phys(sg) - pgoff;
+
+   ret = intel_mmmap_range(domain, dma_pfn_to_addr(iov_pfn),
+   dma_pfn_to_addr(iov_pfn + sg_pages),
+   

[RFC PATCH 4/4] iommu/vt-d: Identify domains using first level page table

2019-09-23 Thread Lu Baolu
This checks whether a domain should use first level page table
for map/unmap. And if so, we should attach the domain to the
device in first level translation mode.

Cc: Ashok Raj 
Cc: Jacob Pan 
Cc: Kevin Tian 
Cc: Liu Yi L 
Cc: Yi Sun 
Cc: Sanjay Kumar 
Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel-iommu.c | 41 ++---
 1 file changed, 38 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 103480016010..d539e6a6c3dd 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1722,6 +1722,26 @@ static void free_dmar_iommu(struct intel_iommu *iommu)
 #endif
 }
 
+/*
+ * Check and return whether first level is used by default for
+ * DMA translation.
+ */
+static bool first_level_by_default(void)
+{
+   struct dmar_drhd_unit *drhd;
+   struct intel_iommu *iommu;
+
+   rcu_read_lock();
+   for_each_active_iommu(iommu, drhd)
+   if (!sm_supported(iommu) ||
+   !ecap_flts(iommu->ecap) ||
+   !cap_caching_mode(iommu->cap))
+   return false;
+   rcu_read_unlock();
+
+   return true;
+}
+
 static struct dmar_domain *alloc_domain(int flags)
 {
struct dmar_domain *domain;
@@ -1736,6 +1756,9 @@ static struct dmar_domain *alloc_domain(int flags)
domain->has_iotlb_device = false;
	INIT_LIST_HEAD(&domain->devices);
 
+   if (first_level_by_default())
+   domain->flags |= DOMAIN_FLAG_FIRST_LEVEL_TRANS;
+
return domain;
 }
 
@@ -2625,6 +2648,11 @@ static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu,
if (hw_pass_through && domain_type_is_si(domain))
ret = intel_pasid_setup_pass_through(iommu, domain,
dev, PASID_RID2PASID);
+   else if (domain_type_is_flt(domain))
+   ret = intel_pasid_setup_first_level(iommu, dev,
+   domain->pgd, PASID_RID2PASID,
+   domain->iommu_did[iommu->seq_id],
+   PASID_FLAG_SUPERVISOR_MODE);
else
ret = intel_pasid_setup_second_level(iommu, domain,
dev, PASID_RID2PASID);
@@ -5349,8 +5377,14 @@ static int aux_domain_add_dev(struct dmar_domain *domain,
goto attach_failed;
 
/* Setup the PASID entry for mediated devices: */
-   ret = intel_pasid_setup_second_level(iommu, domain, dev,
-domain->default_pasid);
+   if (domain_type_is_flt(domain))
+   ret = intel_pasid_setup_first_level(iommu, dev,
+   domain->pgd, domain->default_pasid,
+   domain->iommu_did[iommu->seq_id],
+   PASID_FLAG_SUPERVISOR_MODE);
+   else
+   ret = intel_pasid_setup_second_level(iommu, domain, dev,
+domain->default_pasid);
if (ret)
goto table_failed;
	spin_unlock(&iommu->lock);
@@ -5583,7 +5617,8 @@ static phys_addr_t intel_iommu_iova_to_phys(struct iommu_domain *domain,
int level = 0;
u64 phys = 0;
 
-   if (dmar_domain->flags & DOMAIN_FLAG_LOSE_CHILDREN)
+   if ((dmar_domain->flags & DOMAIN_FLAG_LOSE_CHILDREN) ||
+   (dmar_domain->flags & DOMAIN_FLAG_FIRST_LEVEL_TRANS))
return 0;
 
	pte = pfn_to_dma_pte(dmar_domain, iova >> VTD_PAGE_SHIFT, &level);
-- 
2.17.1



[RFC PATCH 0/4] Use 1st-level for DMA remapping in guest

2019-09-23 Thread Lu Baolu
This patchset aims to move IOVA (I/O Virtual Address) translation to
the 1st-level page table under scalable mode. The major purpose of
this effort is to make guest IOVA support more efficient.

As the Intel VT-d architecture offers caching mode, guest IOVA (GIOVA)
support is currently implemented in a shadow page manner. The device
simulation software, like QEMU, has to figure out the GIOVA->GPA
mapping and write it to a shadow page table, which is then used by the
pIOMMU. Each time mappings are created or destroyed in the vIOMMU, the
simulation software intervenes: the GIOVA->GPA change is shadowed to
the host, and the pIOMMU is updated via VFIO/IOMMU interfaces.


 .-----------.
 |  vIOMMU   |
 |-----------|                 .--------------------.
 |           |IOTLB flush trap |        QEMU        |
 .-----------.   (map/unmap)   |--------------------|
 | GVA->GPA  |---------------->|    .----------.    |
 '-----------'                 |    | GPA->HPA |    |
 |           |                 |    '----------'    |
 '-----------'                 |                    |
                               '--------------------'
                                    |
                     <--------------'
                     |
                     v VFIO/IOMMU API
 .-----------.
 |  pIOMMU   |
 |-----------|
 |           |
 .-----------.
 | GVA->HPA  |
 '-----------'
 |           |
 '-----------'
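
For a concrete picture, the shadow path boils down to a VFIO map call
roughly like the sketch below (an illustration only; error handling is
elided and gpa_to_hva() is an assumed helper of the emulator):

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

extern uint64_t gpa_to_hva(uint64_t gpa);	/* assumed helper */

static int shadow_map(int container_fd, uint64_t giova, uint64_t gpa,
		      uint64_t size)
{
	struct vfio_iommu_type1_dma_map map = {
		.argsz = sizeof(map),
		.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
		.vaddr = gpa_to_hva(gpa),
		.iova  = giova,
		.size  = size,
	};

	/* Install the shadowed GIOVA->HVA(->HPA) mapping in the pIOMMU. */
	return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
}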

In VT-d 3.0, scalable mode is introduced, which offers two levels of
translation page tables and a nested translation mode. With regard to
GIOVA support, it can be simplified by 1) moving the GIOVA support
over the 1st-level page table to store the GIOVA->GPA mapping in the
vIOMMU, 2) binding the vIOMMU 1st-level page table to the pIOMMU,
3) using the pIOMMU second level for GPA->HPA translation, and
4) enabling nested (a.k.a. dual-stage) translation in the host.
Compared with the current shadow GIOVA support, the new approach is
more secure, and the software is simplified as we only need to flush
the pIOMMU IOTLB and possibly the device-IOTLB when an IOVA mapping
in the vIOMMU is torn down.

 .-----------.
 |  vIOMMU   |
 |-----------|                 .-----------.
 |           |IOTLB flush trap |   QEMU    |
 .-----------.     (unmap)     |-----------|
 | GVA->GPA  |---------------->|           |
 '-----------'                 '-----------'
 |           |                       |
 '-----------'                       |
       <-----------------------------'
       |  VFIO/IOMMU
       |  cache invalidation and
       |  guest gpd bind interfaces
       v
 .-----------.
 |  pIOMMU   |
 |-----------|
 .-----------.
 | GVA->GPA  |<---First level
 '-----------'
 | GPA->HPA  |<---Scond level
 '-----------'
 |           |
 '-----------'
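
To make the cost difference concrete: with nesting enabled, a guest
unmap only requires the host to invalidate the IOTLB (and the
device-TLB for ATS-capable devices) -- no shadow page table has to be
rewritten. A minimal sketch of such a handler, using the existing
queued invalidation helper (the handler itself is hypothetical):

static void flush_giova_range(struct intel_iommu *iommu, u16 did,
			      u64 giova, unsigned int size_order)
{
	/* Page-selective IOTLB invalidation for the torn-down range. */
	qi_flush_iotlb(iommu, did, giova, size_order, DMA_TLB_PSI_FLUSH);
}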

This patch series only aims to achieve the first goal, i.e. using
first-level translation for IOVA mappings in the vIOMMU. I am sending
it out for your comments. Any comments, suggestions and concerns are
welcome.

Based-on-idea-by: Ashok Raj 
Based-on-idea-by: Kevin Tian 
Based-on-idea-by: Liu Yi L 
Based-on-idea-by: Lu Baolu 
Based-on-idea-by: Sanjay Kumar 

Lu Baolu (4):
  iommu/vt-d: Move domain_flush_cache helper into header
  iommu/vt-d: Add first level page table interfaces
  iommu/vt-d: Map/unmap domain with mmmap/mmunmap
  iommu/vt-d: Identify domains using first level page table

 drivers/iommu/Makefile             |   2 +-
 drivers/iommu/intel-iommu.c        | 142 ++--
 drivers/iommu/intel-pgtable.c      | 342 +
 include/linux/intel-iommu.h        |  31 ++-
 include/trace/events/intel_iommu.h |  60 +
 5 files changed, 553 insertions(+), 24 deletions(-)
 create mode 100644 drivers/iommu/intel-pgtable.c

-- 
2.17.1



[RFC PATCH 1/4] iommu/vt-d: Move domain_flush_cache helper into header

2019-09-23 Thread Lu Baolu
So that it can be used in other source files as well.

Cc: Ashok Raj 
Cc: Jacob Pan 
Cc: Kevin Tian 
Cc: Liu Yi L 
Cc: Yi Sun 
Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel-iommu.c | 7 ---
 include/linux/intel-iommu.h | 7 +++
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 5aa68a094efd..9cfe8098d993 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -828,13 +828,6 @@ static struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn)
return iommu;
 }
 
-static void domain_flush_cache(struct dmar_domain *domain,
-  void *addr, int size)
-{
-   if (!domain->iommu_coherency)
-   clflush_cache_range(addr, size);
-}
-
 static int device_context_mapped(struct intel_iommu *iommu, u8 bus, u8 devfn)
 {
struct context_entry *context;
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index ed11ef594378..3ee694d4f361 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -629,6 +629,13 @@ static inline int first_pte_in_page(struct dma_pte *pte)
return !((unsigned long)pte & ~VTD_PAGE_MASK);
 }
 
+static inline void
+domain_flush_cache(struct dmar_domain *domain, void *addr, int size)
+{
+   if (!domain->iommu_coherency)
+   clflush_cache_range(addr, size);
+}
+
 extern struct dmar_drhd_unit * dmar_find_matched_drhd_unit(struct pci_dev *dev);
 extern int dmar_find_matched_atsr_unit(struct pci_dev *dev);
 
-- 
2.17.1



[RFC PATCH 2/4] iommu/vt-d: Add first level page table interfaces

2019-09-23 Thread Lu Baolu
This adds functions to manipulate first level page tables
which could be used by a scalale mode capable IOMMU unit.

intel_mmmap_range(domain, addr, end, phys_addr, prot)
 - Map an iova range of [addr, end) to the physical memory
   starting at @phys_addr with the @prot permissions.

intel_mmunmap_range(domain, addr, end)
 - Tear down the map of an iova range [addr, end). A page
   list will be returned which will be freed after iotlb
   flushing.
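
For illustration, the intended calling convention looks roughly like
the sketch below (an editorial example based on the description above,
not part of the patch; the IOTLB flush is only indicated by a comment):

static int example_map_unmap(struct dmar_domain *domain,
			     unsigned long iova, phys_addr_t phys)
{
	struct page *freelist;
	int ret;

	/* Map [iova, iova + SZ_2M) read/write in the first level table. */
	ret = intel_mmmap_range(domain, iova, iova + SZ_2M, phys,
				DMA_PTE_READ | DMA_PTE_WRITE);
	if (ret)
		return ret;

	/* ... DMA happens ... */

	/* Tear the range down; the returned page list may only be freed
	 * after the IOTLB (and any device-TLB) has been flushed. */
	freelist = intel_mmunmap_range(domain, iova, iova + SZ_2M);
	/* ... flush IOTLB for [iova, iova + SZ_2M), then: ... */
	dma_free_pagelist(freelist);
	return 0;
}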

Cc: Ashok Raj 
Cc: Jacob Pan 
Cc: Kevin Tian 
Cc: Liu Yi L 
Cc: Yi Sun 
Signed-off-by: Lu Baolu 
---
 drivers/iommu/Makefile |   2 +-
 drivers/iommu/intel-pgtable.c  | 342 +
 include/linux/intel-iommu.h|  24 +-
 include/trace/events/intel_iommu.h |  60 +
 4 files changed, 426 insertions(+), 2 deletions(-)
 create mode 100644 drivers/iommu/intel-pgtable.c

diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 4f405f926e73..dc550e14cc58 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -17,7 +17,7 @@ obj-$(CONFIG_ARM_SMMU) += arm-smmu.o arm-smmu-impl.o
 obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
 obj-$(CONFIG_DMAR_TABLE) += dmar.o
 obj-$(CONFIG_INTEL_IOMMU) += intel-iommu.o intel-pasid.o
-obj-$(CONFIG_INTEL_IOMMU) += intel-trace.o
+obj-$(CONFIG_INTEL_IOMMU) += intel-trace.o intel-pgtable.o
 obj-$(CONFIG_INTEL_IOMMU_DEBUGFS) += intel-iommu-debugfs.o
 obj-$(CONFIG_INTEL_IOMMU_SVM) += intel-svm.o
 obj-$(CONFIG_IPMMU_VMSA) += ipmmu-vmsa.o
diff --git a/drivers/iommu/intel-pgtable.c b/drivers/iommu/intel-pgtable.c
new file mode 100644
index ..8e95978cd381
--- /dev/null
+++ b/drivers/iommu/intel-pgtable.c
@@ -0,0 +1,342 @@
+// SPDX-License-Identifier: GPL-2.0
+/**
+ * intel-pgtable.c - Intel IOMMU page table manipulation library
+ *
+ * Copyright (C) 2019 Intel Corporation
+ *
+ * Author: Lu Baolu 
+ */
+
+#define pr_fmt(fmt) "DMAR: " fmt
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#ifdef CONFIG_X86
+/*
+ * mmmap: Map a range of IO virtual address to physical addresses.
+ */
+#define pgtable_populate(domain, nm)   \
+do {   \
+   void *__new = alloc_pgtable_page(domain->nid);  \
+   if (!__new) \
+   return -ENOMEM; \
+   smp_wmb();  \
+   spin_lock(&(domain)->page_table_lock);  \
+   if (nm ## _present(*nm)) {  \
+   free_pgtable_page(__new);   \
+   } else {\
+   set_##nm(nm, __##nm(__pa(__new) | _PAGE_TABLE));\
+   domain_flush_cache(domain, nm, sizeof(nm##_t)); \
+   }   \
+   spin_unlock(&(domain)->page_table_lock);\
+} while(0);
+
+static int
+mmmap_pte_range(struct dmar_domain *domain, pmd_t *pmd, unsigned long addr,
+   unsigned long end, phys_addr_t phys_addr, pgprot_t prot)
+{
+   pte_t *pte, *first_pte;
+   u64 pfn;
+
+   pfn = phys_addr >> PAGE_SHIFT;
+   if (unlikely(pmd_none(*pmd)))
+   pgtable_populate(domain, pmd);
+
+   first_pte = pte = pte_offset_kernel(pmd, addr);
+
+   do {
+   set_pte(pte, pfn_pte(pfn, prot));
+   pfn++;
+   } while (pte++, addr += PAGE_SIZE, addr != end);
+
+   domain_flush_cache(domain, first_pte, (void *)pte - (void *)first_pte);
+
+   return 0;
+}
+
+static int
+mmmap_pmd_range(struct dmar_domain *domain, pud_t *pud, unsigned long addr,
+   unsigned long end, phys_addr_t phys_addr, pgprot_t prot)
+{
+   unsigned long next;
+   pmd_t *pmd;
+
+   if (unlikely(pud_none(*pud)))
+   pgtable_populate(domain, pud);
+   pmd = pmd_offset(pud, addr);
+
+   phys_addr -= addr;
+   do {
+   next = pmd_addr_end(addr, end);
+   if (mmmap_pte_range(domain, pmd, addr, next,
+   phys_addr + addr, prot))
+   return -ENOMEM;
+   } while (pmd++, addr = next, addr != end);
+
+   return 0;
+}
+
+static int
+mmmap_pud_range(struct dmar_domain *domain, p4d_t *p4d, unsigned long addr,
+   unsigned long end, phys_addr_t phys_addr, pgprot_t prot)
+{
+   unsigned long next;
+   pud_t *pud;
+
+   if (unlikely(p4d_none(*p4d)))
+   pgtable_populate(domain, p4d);
+
+   pud = pud_offset(p4d, addr);
+
+   phys_addr -= addr;
+   do {
+   next = pud_addr_end(addr, end);
+   if (mmmap_pmd_range(domain, pud, addr, next,
+   

Re: [PATCH 0/3] iommu/io-pgtable-arm: Mali LPAE improvements

2019-09-23 Thread Tomeu Vizoso
On Thu, 19 Sep 2019 at 10:31, Will Deacon  wrote:
>
> On Wed, Sep 11, 2019 at 06:19:40PM +0100, Robin Murphy wrote:
> > On 2019-09-11 5:20 pm, Will Deacon wrote:
> > > On Wed, Sep 11, 2019 at 06:19:04PM +0200, Neil Armstrong wrote:
> > > > On 11/09/2019 16:42, Robin Murphy wrote:
> > > > > Here's the eagerly-awaited fix to unblock T720/T820, plus a couple of
> > > > > other bits that I've collected so far. I'm not considering this as
> > > > > 5.3 fixes material, but it would be nice if there's any chance still
> > > > > to sneak it into 5.4.
> > > > >
> > > > > Robin.
> > > > >
> > > > >
> > > > > Robin Murphy (3):
> > > > >iommu/io-pgtable-arm: Correct Mali attributes
> > > > >iommu/io-pgtable-arm: Support more Mali configurations
> > > > >iommu/io-pgtable-arm: Allow coherent walks for Mali
> > > > >
> > > > >   drivers/iommu/io-pgtable-arm.c | 61 ++
> > > > >   1 file changed, 48 insertions(+), 13 deletions(-)
> > > > >
> > > >
> > > > Tested-by: Neil Armstrong 
> > > >
> > > > On Khadas VIM2 (Amlogic S912) with T820 Mali GPU
> > > >
> > > > I hope this will be part of v5.4 so we can run panfrost on vanilla v5.4!
> > >
> > > Not a chance -- the merge window opens on Monday and -next isn't being
> > > rolled out at the moment due to LPC. Let's shoot for 5.5 and get this
> > > queued up in a few weeks.
> >
> > Fair enough, that was certainly more extreme optimism than realistic
> > expectation on my part :)
> >
> > There is some argument for taking #1 and #2 as 5.4 fixes, though - the
> > upcoming Mesa 19.2 release will enable T820 support on the userspace side -
> > so let's pick that discussion up again in a few weeks.
>
> Ok, I'll include those two in my fixes pull to Joerg at -rc1.

Hi Will,

Looks like this didn't end up happening?

Thanks,

Tomeu