Re: [PATCH 05/13] x86: Add early TPM1.2/TPM2.0 interface support for Secure Launch

2020-09-24 Thread Jarkko Sakkinen
On Thu, Sep 24, 2020 at 10:58:33AM -0400, Ross Philipson wrote:
> From: "Daniel P. Smith" 
> 
> This commit introduces an abstraction for TPM1.2 and TPM2.0 devices
> above the TPM hardware interface.
> 
> Signed-off-by: Daniel P. Smith 
> Signed-off-by: Ross Philipson 

This is way, way too PoC. I wonder why there is no RFC tag.

Please also read section 2 of

https://www.kernel.org/doc/html/v5.8/process/submitting-patches.html

You should leverage existing TPM code in a way or another. Refine it so
that it scales for your purpose and then compile it into your thing
(just include the necesary C-files with relative paths).

How it is now is never going to fly.

/Jarkko
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 00/13] x86: Trenchboot secure dynamic launch Linux kernel support

2020-09-24 Thread Jarkko Sakkinen
On Thu, Sep 24, 2020 at 10:58:28AM -0400, Ross Philipson wrote:
> The Trenchboot project focus on boot security has led to the enabling of
> the Linux kernel to be directly invocable by the x86 Dynamic Launch
> instruction(s) for establishing a Dynamic Root of Trust for Measurement
> (DRTM). The dynamic launch will be initiated by a boot loader with

What is "the dynamic launch"?

> associated support added to it, for example the first targeted boot
> loader will be GRUB2. An integral part of establishing the DRTM involves
> measuring everything that is intended to be run (kernel image, initrd,
> etc) and everything that will configure that kernel to run (command
> line, boot params, etc) into specific PCRs, the DRTM PCRs (17-22), in
> the TPM. Another key aspect is the dynamic launch is rooted in hardware,
> that is to say the hardware (CPU) is what takes the first measurement
> for the chain of integrity measurements. On Intel this is done using
> the GETSEC instruction provided by Intel's TXT and the SKINIT
> instruction provided by AMD's AMD-V. Information on these technologies
> can be readily found online. This patchset introduces Intel TXT support.

Why not both Intel and AMD? You should explain this in the cover letter.

I'd be more motivated to review and test a full all encompassing x86
solution. It would increase the patch set size but would also give it
a better test coverage, which I think would be a huge plus in such a
complex patch set.

> To enable the kernel to be launched by GETSEC, a stub must be built
> into the setup section of the compressed kernel to handle the specific
> state that the dynamic launch process leaves the BSP in. This is
> analogous to the EFI stub that is found in the same area. Also this stub

How is it analogous?

> must measure everything that is going to be used as early as possible.
> This stub code and subsequent code must also deal with the specific
> state that the dynamic launch leaves the APs in.

What is "the specific state"?

> A quick note on terminology. The larger open source project itself is
> called Trenchboot, which is hosted on Github (links below). The kernel
> feature enabling the use of the x86 technology is referred to as "Secure
> Launch" within the kernel code. As such the prefixes sl_/SL_ or
> slaunch/SLAUNCH will be seen in the code. The stub code discussed above
> is referred to as the SL stub.

Is this only for Trenchboot? I'm a bit lost. What is it anyway?

> The basic flow is:
> 
>  - Entry from the dynamic launch jumps to the SL stub
>  - SL stub fixes up the world on the BSP

What is "SL"?

>  - For TXT, SL stub wakes the APs, fixes up their worlds
>  - For TXT, APs are left halted waiting for an NMI to wake them
>  - SL stub jumps to startup_32
>  - SL main runs to measure configuration and module information into the
>DRTM PCRs. It also locates the TPM event log.
>  - Kernel boot proceeds normally from this point.
>  - During early setup, slaunch_setup() runs to finish some validation
>and setup tasks.

What are "some" validation and setup tasks?

>  - The SMP bringup code is modified to wake the waiting APs. APs vector
>to rmpiggy and start up normally from that point.
>  - Kernel boot finishes booting normally
>  - SL securityfs module is present to allow reading and writing of the
>TPM event log.

What is SL securityfs module? Why is it needed? We already have
securityfs file for the event log. Why it needs to be writable?

>  - SEXIT support to leave SMX mode is present on the kexec path and
>the various reboot paths (poweroff, reset, halt).

What SEXIT do and why it is required on the kexec path?

/Jarkko
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: a saner API for allocating DMA addressable pages v3

2020-09-24 Thread Christoph Hellwig
This is in dma-mapping for-next now.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/4] ARM/omap1: switch to use dma_direct_set_offset for lbus DMA offsets

2020-09-24 Thread Christoph Hellwig
On Mon, Sep 21, 2020 at 09:44:18AM +0300, Tony Lindgren wrote:
> > > Looks nice to me :) I still can't test this probably for few more weeks
> > > though but hopefully Aaro or Janusz (Added to Cc) can test it.
> > 
> > Works for me on Amstrad Delta (tested with a USB ethernet adapter).
> > 
> > Tested-by: Janusz Krzysztofik 
> 
> Great, good to hear! And thanks for testing it.
> 
> Christoph, feel free to queue this along with the other patches:
> 
> Acked-by: Tony Lindgren 
> 
> Or let me know if you want me to pick it up.

I've pulled it in now that the other patches are unlikely to be
tested in time for 5.10.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 3/3] dma-mapping: better document dma_addr_t and DMA_MAPPING_ERROR

2020-09-24 Thread 'Christoph Hellwig'
On Tue, Sep 22, 2020 at 01:56:46PM +, David Laight wrote:
> > +/*
> > + * A dma_addr_t can hold any valid DMA or bus address for the platform.  
> > It can
> > + * be given to a device to use as a DMA source or target.  A CPU cannot
> > + * reference a dma_addr_t directly because there may be translation 
> > between its
> > + * physical address space and the bus address space.
> 
> It can't access it 'directly' because it isn't a virtual address
> 
> > + *
> > + * DMA_MAPPING_ERROR is the magic error code if a mapping failed.  It 
> > should not
> > + * be used directly in drivers, but checked for using dma_mapping_error()
> > + * instead.
> > + */
> 
> I think it might be worth adding:
> 
> A dma_addr_t value may be device dependant and differ from the
> 'physical address' of the memory.

This is what I've committed:

 * A dma_addr_t can hold any valid DMA or bus address for the platform.  It can
 * be given to a device to use as a DMA source or target.  It is specific to a
 * given device and there may be a translation between the CPU physical address
 * space and the bus address space.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 01/13] x86: Secure Launch Kconfig

2020-09-24 Thread Randy Dunlap
On 9/24/20 7:58 AM, Ross Philipson wrote:
> Initial bits to bring in Secure Launch functionality. Add Kconfig
> options for compiling in/out the Secure Launch code.
> 
> Signed-off-by: Ross Philipson 

Hi,
from Documentation/process/coding-style.rst:

Lines under a ``config`` definition
are indented with one tab, while help text is indented an additional two
spaces.

> ---
>  arch/x86/Kconfig | 36 
>  1 file changed, 36 insertions(+)
> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 7101ac6..8957981 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1968,6 +1968,42 @@ config EFI_MIXED
>  
>  If unsure, say N.
>  
> +config SECURE_LAUNCH
> + bool "Secure Launch support"
> + default n
> + depends on X86_64
> + help
> +The Secure Launch feature allows a kernel to be loaded
> +directly through an Intel TXT measured launch. Intel TXT
> +establishes a Dynamic Root of Trust for Measurement (DRTM)
> +where the CPU measures the kernel image. This feature then
> +continues the measurement chain over kernel configuration
> +information and init images.
> +
> +choice
> + prompt "Select Secure Launch Algorithm for TPM2"
> + depends on SECURE_LAUNCH
> +
> +config SECURE_LAUNCH_SHA1
> + bool "Secure Launch TPM1 SHA1"
> + help
> +When using Secure Launch and TPM1 is present, use SHA1 hash
> +algorithm for measurements.
> +
> +config SECURE_LAUNCH_SHA256
> + bool "Secure Launch TPM2 SHA256"
> + help
> +When using Secure Launch and TPM2 is present, use SHA256 hash
> +algorithm for measurements.
> +
> +config SECURE_LAUNCH_SHA512
> + bool "Secure Launch TPM2 SHA512"
> + help
> +When using Secure Launch and TPM2 is present, use SHA512 hash
> +algorithm for measurements.
> +
> +endchoice
> +


thanks.
-- 
~Randy

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/vt-d: gracefully handle DMAR units with no supported address widths

2020-09-24 Thread Lu Baolu



On 9/24/20 10:08 PM, David Woodhouse wrote:

From: David Woodhouse 

Instead of bailing out completely, such a unit can still be used for
interrupt remapping.


Reviewed-by: Lu Baolu 

Best regards,
baolu



Signed-off-by: David Woodhouse 
---
  drivers/iommu/intel/dmar.c | 46 +-
  1 file changed, 31 insertions(+), 15 deletions(-)

diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index 93e6345f3414..4420a759f095 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -1024,8 +1024,8 @@ static int alloc_iommu(struct dmar_drhd_unit *drhd)
  {
struct intel_iommu *iommu;
u32 ver, sts;
-   int agaw = 0;
-   int msagaw = 0;
+   int agaw = -1;
+   int msagaw = -1;
int err;
  
  	if (!drhd->reg_base_addr) {

@@ -1050,17 +1050,28 @@ static int alloc_iommu(struct dmar_drhd_unit *drhd)
}
  
  	err = -EINVAL;

-   agaw = iommu_calculate_agaw(iommu);
-   if (agaw < 0) {
-   pr_err("Cannot get a valid agaw for iommu (seq_id = %d)\n",
-   iommu->seq_id);
-   goto err_unmap;
-   }
-   msagaw = iommu_calculate_max_sagaw(iommu);
-   if (msagaw < 0) {
-   pr_err("Cannot get a valid max agaw for iommu (seq_id = %d)\n",
-   iommu->seq_id);
-   goto err_unmap;
+   if (cap_sagaw(iommu->cap) == 0) {
+   pr_info("%s: No supported address widths. Not attempting DMA 
translation.\n",
+   iommu->name);
+   drhd->ignored = 1;
+   }
+
+   if (!drhd->ignored) {
+   agaw = iommu_calculate_agaw(iommu);
+   if (agaw < 0) {
+   pr_err("Cannot get a valid agaw for iommu (seq_id = 
%d)\n",
+  iommu->seq_id);
+   drhd->ignored = 1;
+   }
+   }
+   if (!drhd->ignored) {
+   msagaw = iommu_calculate_max_sagaw(iommu);
+   if (msagaw < 0) {
+   pr_err("Cannot get a valid max agaw for iommu (seq_id = 
%d)\n",
+  iommu->seq_id);
+   drhd->ignored = 1;
+   agaw = -1;
+   }
}
iommu->agaw = agaw;
iommu->msagaw = msagaw;
@@ -1087,7 +1098,12 @@ static int alloc_iommu(struct dmar_drhd_unit *drhd)
  
  	raw_spin_lock_init(>register_lock);
  
-	if (intel_iommu_enabled) {

+   /*
+* This is only for hotplug; at boot time intel_iommu_enabled won't
+* be set yet. When intel_iommu_init() runs, it registers the units
+* present at boot time, then sets intel_iommu_enabled.
+*/
+   if (intel_iommu_enabled && !drhd->ignored) {
err = iommu_device_sysfs_add(>iommu, NULL,
 intel_iommu_groups,
 "%s", iommu->name);
@@ -1117,7 +1133,7 @@ static int alloc_iommu(struct dmar_drhd_unit *drhd)
  
  static void free_iommu(struct intel_iommu *iommu)

  {
-   if (intel_iommu_enabled) {
+   if (intel_iommu_enabled && iommu->iommu.ops) {
iommu_device_unregister(>iommu);
iommu_device_sysfs_remove(>iommu);
}


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 0/5] iommu aux-domain APIs extensions

2020-09-24 Thread Lu Baolu

Hi Joerg,

On 9/24/20 5:55 PM, Joerg Roedel wrote:

On Tue, Sep 22, 2020 at 02:10:37PM +0800, Lu Baolu wrote:

Hi Jorge and Alex,

A description of this patch series could be found here.

https://lore.kernel.org/linux-iommu/20200901033422.22249-1-baolu...@linux.intel.com/


Hmm, I am wondering if we can avoid all this hassle and special APIs by
making the mdev framework more visible outside of the vfio code. There
is an underlying bus implementation for mdevs, so is there a reason
those can't use the standard iommu-core code to setup IOMMU mappings?


The original purpose of this series is to enable the device driver to
retrieve the aux-domain through iommu core after iommu
ops.aux_attach_dev().

The domain was allocated in vfio/mdev, but it's also needed by the
device driver in mediated callbacks. The idea of this patch series is to
extend the aux-API so that the domain could be saved in group->domain
and get by the mediated driver through the existing
iommu_get_domain_for_dev().

Back when we were developing the aux-domain, I proposed to keep the
domain in vfio/mdev.

https://lore.kernel.org/linux-iommu/20181105073408.21815-7-baolu...@linux.intel.com/

It wasn't discussed at that time due to the lack of real consumer. Intel
is now adding aux-domain support in idxd (DMA streaming accelerator)
driver which becomes the first real consumer. So this problem is brought
back to the table.



What speaks against doing:

- IOMMU drivers capable of handling mdevs register iommu-ops
  for the mdev_bus.

- iommu_domain_alloc() takes bus_type as parameter, so there can
  be special domains be allocated for mdevs.

- Group creation and domain allocation will happen
  automatically in the iommu-core when a new mdev is registered
  through device-driver core code.

- There should be no need for special iommu_aux_* APIs, as one
  can attach a domain directly to >dev with
  iommu_attach_device(domain, >dev).

Doing it this way will probably also keep the mdev-special code in VFIO
small.


Fully understand now. Thanks for guide.



Regards,

Joerg



Best regards,
baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [bugzilla-dae...@bugzilla.kernel.org: [Bug 209149] New: "iommu/vt-d: Enable PCI ACS for platform opt in hint" makes NVMe config space not accessible after S3]

2020-09-24 Thread Raj, Ashok
On Wed, Sep 23, 2020 at 12:45:11PM -0700, Rajat Jain wrote:
> On Wed, Sep 23, 2020 at 9:19 AM Raj, Ashok  wrote:
> >
> > Hi Bjorn
> >
> >
> > On Wed, Sep 23, 2020 at 11:03:27AM -0500, Bjorn Helgaas wrote:
> > > [+cc IOMMU and NVMe folks]
> > >
> > > Sorry, I forgot to forward this to linux-pci when it was first
> > > reported.
> > >
> > > Apparently this happens with v5.9-rc3, and may be related to
> > > 50310600ebda ("iommu/vt-d: Enable PCI ACS for platform opt in hint"),
> > > which appeared in v5.8-rc3.
> > >
> > > There are several dmesg logs and proposed patches in the bugzilla, but
> > > no analysis yet of what the problem is.  From the first dmesg
> > > attachment (https://bugzilla.kernel.org/attachment.cgi?id=292327):
> >
> > We have been investigating this internally as well. It appears maybe the
> > specupdate for Cometlake is missing the errata documention. The offsets
> > were wrong in some of them, and if its the same issue its likely cause.
> 
> Can you please also confirm if errata applies to Tigerlake ?
> 

We confirmed ICL/TGL isn't affected.


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [bugzilla-dae...@bugzilla.kernel.org: [Bug 209149] New: "iommu/vt-d: Enable PCI ACS for platform opt in hint" makes NVMe config space not accessible after S3]

2020-09-24 Thread Raj, Ashok
Hi Alex

> > Apparently it also requires to disable RR, and I'm not able to confirm if
> > CML requires that as well. 
> > 
> > pci_quirk_disable_intel_spt_pch_acs_redir() also seems to consult the same
> > table, so i'm not sure why we need the other patch in bugzilla is required.
> 
> If we're talking about the Intel bug where PCH root ports implement
> the ACS capability and control registers as dword rather than word
> registers, then how is ACS getting enabled in order to generate an ACS
> violation?  The standard ACS code would write to the control register
> word at offset 6, which is still the read-only capability register on
> those devices.  Thanks,


Right... Maybe we need header log to figure out what exatly is happening. 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [bugzilla-dae...@bugzilla.kernel.org: [Bug 209149] New: "iommu/vt-d: Enable PCI ACS for platform opt in hint" makes NVMe config space not accessible after S3]

2020-09-24 Thread Alex Williamson
On Thu, 24 Sep 2020 11:09:05 -0700
"Raj, Ashok"  wrote:

> Hi Kai
> 
> + Alex, since he had some of the early quirks authored.
> 
> On Thu, Sep 24, 2020 at 12:31:53AM +0800, Kai-Heng Feng wrote:
> > [+Cc Christoph]
> >   
> > > On Sep 24, 2020, at 00:03, Bjorn Helgaas  wrote:
> > > 
> > > [+cc IOMMU and NVMe folks]
> > > 
> > > Sorry, I forgot to forward this to linux-pci when it was first
> > > reported.
> > > 
> > > Apparently this happens with v5.9-rc3, and may be related to
> > > 50310600ebda ("iommu/vt-d: Enable PCI ACS for platform opt in hint"),
> > > which appeared in v5.8-rc3.
> > > 
> > > There are several dmesg logs and proposed patches in the bugzilla, but
> > > no analysis yet of what the problem is.  From the first dmesg
> > > attachment (https://bugzilla.kernel.org/attachment.cgi?id=292327):  
> > 
> > AFAIK Intel is working on it internally.
> > Comet Lake probably needs ACS quirk like older generation chips.  
> 
> I have confirmed with Internal documentation that the problem exists on
> Comet Lake. But its fixed ICL and TGL generations.
> 
> Unfortunately I do not see if the public specupdate documents are for these
> generation chipsets to makes sure all root port id's can be captured.
> 
> There is also another entry in bugzilla that was forwarded that referred to
> Request Redirect Capability to be always disabled as well. This same
> workaround also seems to be turning off RR for the root port. I believe it
> should fix it as well. But i saw another patch attached.
> 
> Can you tell how you reproduce this? just doing a
> 
> #echo mem > /sys/power/state
> 
> is sufficient with an attached NVMe drive? 
> 
> >   
> > > 
> > >  [   50.434945] PM: suspend entry (deep)
> > >  [   50.802086] nvme :01:00.0: saving config space at offset 0x0 
> > > (reading 0x11e0f)
> > >  [   50.842775] ACPI: Preparing to enter system sleep state S3
> > >  [   50.858922] ACPI: Waking up from system sleep state S3
> > >  [   50.883622] nvme :01:00.0: can't change power state from D3hot to 
> > > D0 (config space inaccessible)
> > >  [   50.947352] nvme :01:00.0: restoring config space at offset 0x0 
> > > (was 0x, writing 0x11e0f)
> > >  [   50.947816] pcieport :00:1b.0: DPC: containment event, 
> > > status:0x1f01 source:0x
> > >  [   50.947817] pcieport :00:1b.0: DPC: unmasked uncorrectable error 
> > > detected
> > >  [   50.947829] pcieport :00:1b.0: PCIe Bus Error: 
> > > severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Receiver ID)
> > >  [   50.947830] pcieport :00:1b.0:   device [8086:06ac] error 
> > > status/mask=0020/0001
> > >  [   50.947831] pcieport :00:1b.0:[21] ACSViol
> > > (First)
> > >  [   50.947841] pcieport :00:1b.0: AER: broadcast error_detected 
> > > message
> > >  [   50.947843] nvme nvme0: frozen state error detected, reset controller
> > > 
> > > I suspect the nvme "can't change power state" and restore config space
> > > errors are a consequence of the DPC event.  If DPC disables the link,
> > > the device is inaccessible.
> > > 
> > > I don't know what caused the ACS Violation.  The AER TLP Header Log
> > > might have a clue, but unfortunately we didn't print it.
> > >   
> 
> Apparently it also requires to disable RR, and I'm not able to confirm if
> CML requires that as well. 
> 
> pci_quirk_disable_intel_spt_pch_acs_redir() also seems to consult the same
> table, so i'm not sure why we need the other patch in bugzilla is required.

If we're talking about the Intel bug where PCH root ports implement
the ACS capability and control registers as dword rather than word
registers, then how is ACS getting enabled in order to generate an ACS
violation?  The standard ACS code would write to the control register
word at offset 6, which is still the read-only capability register on
those devices.  Thanks,

Alex

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v11 6/6] iommu/vt-d: Check UAPI data processed by IOMMU core

2020-09-24 Thread Jacob Pan
IOMMU generic layer already does sanity checks on UAPI data for version
match and argsz range based on generic information.

This patch adjusts the following data checking responsibilities:
- removes the redundant version check from VT-d driver
- removes the check for vendor specific data size
- adds check for the use of reserved/undefined flags

Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel/iommu.c |  3 +--
 drivers/iommu/intel/svm.c   | 11 +--
 include/uapi/linux/iommu.h  |  1 +
 3 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 461f3a6864d4..18ed3b3c70d7 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -5408,8 +5408,7 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, 
struct device *dev,
int ret = 0;
u64 size = 0;
 
-   if (!inv_info || !dmar_domain ||
-   inv_info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
+   if (!inv_info || !dmar_domain)
return -EINVAL;
 
if (!dev || !dev_is_pci(dev))
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 99353d6468fa..0cb9a15f1112 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -284,8 +284,15 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain, 
struct device *dev,
if (WARN_ON(!iommu) || !data)
return -EINVAL;
 
-   if (data->version != IOMMU_GPASID_BIND_VERSION_1 ||
-   data->format != IOMMU_PASID_FORMAT_INTEL_VTD)
+   if (data->format != IOMMU_PASID_FORMAT_INTEL_VTD)
+   return -EINVAL;
+
+   /* IOMMU core ensures argsz is more than the start of the union */
+   if (data->argsz < offsetofend(struct iommu_gpasid_bind_data, 
vendor.vtd))
+   return -EINVAL;
+
+   /* Make sure no undefined flags are used in vendor data */
+   if (data->vendor.vtd.flags & ~(IOMMU_SVA_VTD_GPASID_LAST - 1))
return -EINVAL;
 
if (!dev_is_pci(dev))
diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
index 66d4ca40b40f..e1d9e75f2c94 100644
--- a/include/uapi/linux/iommu.h
+++ b/include/uapi/linux/iommu.h
@@ -288,6 +288,7 @@ struct iommu_gpasid_bind_data_vtd {
 #define IOMMU_SVA_VTD_GPASID_PWT   (1 << 3) /* page-level write through */
 #define IOMMU_SVA_VTD_GPASID_EMTE  (1 << 4) /* extended mem type enable */
 #define IOMMU_SVA_VTD_GPASID_CD(1 << 5) /* PASID-level cache 
disable */
+#define IOMMU_SVA_VTD_GPASID_LAST  (1 << 6)
__u64 flags;
__u32 pat;
__u32 emt;
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v11 5/6] iommu/uapi: Handle data and argsz filled by users

2020-09-24 Thread Jacob Pan
IOMMU user APIs are responsible for processing user data. This patch
changes the interface such that user pointers can be passed into IOMMU
code directly. Separate kernel APIs without user pointers are introduced
for in-kernel users of the UAPI functionality.

IOMMU UAPI data has a user filled argsz field which indicates the data
length of the structure. User data is not trusted, argsz must be
validated based on the current kernel data size, mandatory data size,
and feature flags.

User data may also be extended, resulting in possible argsz increase.
Backward compatibility is ensured based on size and flags (or
the functional equivalent fields) checking.

This patch adds sanity checks in the IOMMU layer. In addition to argsz,
reserved/unused fields in padding, flags, and version are also checked.
Details are documented in Documentation/userspace-api/iommu.rst

Signed-off-by: Liu Yi L 
Signed-off-by: Jacob Pan 
---
 drivers/iommu/iommu.c  | 199 +++--
 include/linux/iommu.h  |  28 +--
 include/uapi/linux/iommu.h |   1 +
 3 files changed, 212 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 4ae02291ccc2..5c1b7ae48aae 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1961,34 +1961,219 @@ int iommu_attach_device(struct iommu_domain *domain, 
struct device *dev)
 }
 EXPORT_SYMBOL_GPL(iommu_attach_device);
 
+/*
+ * Check flags and other user provided data for valid combinations. We also
+ * make sure no reserved fields or unused flags are set. This is to ensure
+ * not breaking userspace in the future when these fields or flags are used.
+ */
+static int iommu_check_cache_invl_data(struct iommu_cache_invalidate_info 
*info)
+{
+   u32 mask;
+   int i;
+
+   if (info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
+   return -EINVAL;
+
+   mask = (1 << IOMMU_CACHE_INV_TYPE_NR) - 1;
+   if (info->cache & ~mask)
+   return -EINVAL;
+
+   if (info->granularity >= IOMMU_INV_GRANU_NR)
+   return -EINVAL;
+
+   switch (info->granularity) {
+   case IOMMU_INV_GRANU_ADDR:
+   if (info->cache & IOMMU_CACHE_INV_TYPE_PASID)
+   return -EINVAL;
+
+   mask = IOMMU_INV_ADDR_FLAGS_PASID |
+   IOMMU_INV_ADDR_FLAGS_ARCHID |
+   IOMMU_INV_ADDR_FLAGS_LEAF;
+
+   if (info->granu.addr_info.flags & ~mask)
+   return -EINVAL;
+   break;
+   case IOMMU_INV_GRANU_PASID:
+   mask = IOMMU_INV_PASID_FLAGS_PASID |
+   IOMMU_INV_PASID_FLAGS_ARCHID;
+   if (info->granu.pasid_info.flags & ~mask)
+   return -EINVAL;
+
+   break;
+   case IOMMU_INV_GRANU_DOMAIN:
+   if (info->cache & IOMMU_CACHE_INV_TYPE_DEV_IOTLB)
+   return -EINVAL;
+   break;
+   default:
+   return -EINVAL;
+   }
+
+   /* Check reserved padding fields */
+   for (i = 0; i < sizeof(info->padding); i++) {
+   if (info->padding[i])
+   return -EINVAL;
+   }
+
+   return 0;
+}
+
 int iommu_uapi_cache_invalidate(struct iommu_domain *domain, struct device 
*dev,
-   struct iommu_cache_invalidate_info *inv_info)
+   void __user *uinfo)
 {
+   struct iommu_cache_invalidate_info inv_info = { 0 };
+   u32 minsz;
+   int ret = 0;
+
if (unlikely(!domain->ops->cache_invalidate))
return -ENODEV;
 
-   return domain->ops->cache_invalidate(domain, dev, inv_info);
+   /*
+* No new spaces can be added before the variable sized union, the
+* minimum size is the offset to the union.
+*/
+   minsz = offsetof(struct iommu_cache_invalidate_info, granu);
+
+   /* Copy minsz from user to get flags and argsz */
+   if (copy_from_user(_info, uinfo, minsz))
+   return -EFAULT;
+
+   /* Fields before variable size union is mandatory */
+   if (inv_info.argsz < minsz)
+   return -EINVAL;
+
+   /* PASID and address granu require additional info beyond minsz */
+   if (inv_info.argsz == minsz &&
+   ((inv_info.granularity == IOMMU_INV_GRANU_PASID) ||
+   (inv_info.granularity == IOMMU_INV_GRANU_ADDR)))
+   return -EINVAL;
+
+   if (inv_info.granularity == IOMMU_INV_GRANU_PASID &&
+   inv_info.argsz < offsetofend(struct iommu_cache_invalidate_info, 
granu.pasid_info))
+   return -EINVAL;
+
+   if (inv_info.granularity == IOMMU_INV_GRANU_ADDR &&
+   inv_info.argsz < offsetofend(struct iommu_cache_invalidate_info, 
granu.addr_info))
+   return -EINVAL;
+
+   /*
+* User might be using a newer UAPI header which has a larger data
+* 

[PATCH v11 0/6] IOMMU user API enhancement

2020-09-24 Thread Jacob Pan
IOMMU user API header was introduced to support nested DMA translation and
related fault handling. The current UAPI data structures consist of three
areas that cover the interactions between host kernel and guest:
 - fault handling
 - cache invalidation
 - bind guest page tables, i.e. guest PASID

Future extensions are likely to support more architectures and vIOMMU features.

In the previous discussion, using user-filled data size and feature flags is
made a preferred approach over a unified version number.
https://lkml.org/lkml/2020/1/29/45

In addition to introduce argsz field to data structures, this patchset is also
trying to document the UAPI design, usage, and extension rules. VT-d driver
changes to utilize the new argsz field is included, VFIO usage is to follow.

This set is available at:
https://github.com/jacobpan/linux.git vsva_v5.9_uapi_v11

Thanks,

Jacob


Changelog:
v11
- Use #define instead of enum in PASID data format, squashed change
  into "iommu/uapi: Handle data and argsz filled by users"
- Remove alloc/free from documentation per Yi's comment. IOMMU UAPI
  does not perform IOASID alloc/free.
v10
- Documentation grammar fixes based on Randy's review
v9
- Directly pass PASID value to iommu_sva_unbind_gpasid() without
  the superfluous data in struct iommu_gpasid_bind_data.
v8
- Rebased to v5.9-rc2
- Addressed review comments from Eric Auger
  1. added a check for the unused vendor flags
  2. commit message improvements
v7
- Added PASID data format enum for range checking
- Tidy up based on reviews from Alex W.
- Removed doc section for vIOMMU fault handling
v6
- Renamed all UAPI functions with iommu_uapi_ prefix
- Replaced argsz maxsz checking with flag specific size checks
- Documentation improvements based on suggestions by Eric Auger
  Replaced example code with a pointer to the actual code
- Added more checks for illegal flags combinations
- Added doc file to MAINTAINERS
v5
- Addjusted paddings in UAPI data to be 8 byte aligned
- Do not clobber argsz in IOMMU core before passing on to vendor driver
- Removed pr_warn_ for invalid UAPI data check, just return -EINVAL
- Clarified VFIO responsibility in UAPI data handling
- Use iommu_uapi prefix to differentiate APIs has in-kernel caller
- Added comment for unchecked flags of invalidation granularity
- Added example in doc to show vendor data checking

v4
- Added checks of UAPI data for reserved fields, version, and flags.
- Removed version check from vendor driver (vt-d)
- Relaxed argsz check to match the UAPI struct size instead of variable
  union size
- Updated documentation

v3:
- Rewrote backward compatibility rule to support existing code
  re-compiled with newer kernel UAPI header that runs on older
  kernel. Based on review comment from Alex W.
  https://lore.kernel.org/linux-iommu/20200611094741.6d118...@w520.home/
- Take user pointer directly in UAPI functions. Perform argsz check
  and copy_from_user() in IOMMU driver. Eliminate the need for
  VFIO or other upper layer to parse IOMMU data.
- Create wrapper function for in-kernel users of UAPI functions
v2:
- Removed unified API version and helper
- Introduced argsz for each UAPI data
- Introduced UAPI doc

Jacob Pan (6):
  docs: IOMMU user API
  iommu/uapi: Add argsz for user filled data
  iommu/uapi: Use named union for user data
  iommu/uapi: Rename uapi functions
  iommu/uapi: Handle data and argsz filled by users
  iommu/vt-d: Check UAPI data processed by IOMMU core

 Documentation/userspace-api/iommu.rst | 209 ++
 MAINTAINERS   |   1 +
 drivers/iommu/intel/iommu.c   |  25 ++--
 drivers/iommu/intel/svm.c |  13 ++-
 drivers/iommu/iommu.c | 201 ++--
 include/linux/iommu.h |  35 --
 include/uapi/linux/iommu.h|  18 ++-
 7 files changed, 461 insertions(+), 41 deletions(-)
 create mode 100644 Documentation/userspace-api/iommu.rst

-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v11 4/6] iommu/uapi: Rename uapi functions

2020-09-24 Thread Jacob Pan
User APIs such as iommu_sva_unbind_gpasid() may also be used by the
kernel. Since we introduced user pointer to the UAPI functions,
in-kernel callers cannot share the same APIs. In-kernel callers are also
trusted, there is no need to validate the data.

We plan to have two flavors of the same API functions, one called
through ioctls, carrying a user pointer and one called directly with
valid IOMMU UAPI structs. To differentiate both, let's rename existing
functions with an iommu_uapi_ prefix.

Suggested-by: Alex Williamson 
Reviewed-by: Eric Auger 
Signed-off-by: Jacob Pan 
---
 drivers/iommu/iommu.c | 18 +-
 include/linux/iommu.h | 31 ---
 2 files changed, 25 insertions(+), 24 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 609bd25bf154..4ae02291ccc2 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1961,35 +1961,35 @@ int iommu_attach_device(struct iommu_domain *domain, 
struct device *dev)
 }
 EXPORT_SYMBOL_GPL(iommu_attach_device);
 
-int iommu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
-  struct iommu_cache_invalidate_info *inv_info)
+int iommu_uapi_cache_invalidate(struct iommu_domain *domain, struct device 
*dev,
+   struct iommu_cache_invalidate_info *inv_info)
 {
if (unlikely(!domain->ops->cache_invalidate))
return -ENODEV;
 
return domain->ops->cache_invalidate(domain, dev, inv_info);
 }
-EXPORT_SYMBOL_GPL(iommu_cache_invalidate);
+EXPORT_SYMBOL_GPL(iommu_uapi_cache_invalidate);
 
-int iommu_sva_bind_gpasid(struct iommu_domain *domain,
-  struct device *dev, struct iommu_gpasid_bind_data 
*data)
+int iommu_uapi_sva_bind_gpasid(struct iommu_domain *domain,
+  struct device *dev, struct 
iommu_gpasid_bind_data *data)
 {
if (unlikely(!domain->ops->sva_bind_gpasid))
return -ENODEV;
 
return domain->ops->sva_bind_gpasid(domain, dev, data);
 }
-EXPORT_SYMBOL_GPL(iommu_sva_bind_gpasid);
+EXPORT_SYMBOL_GPL(iommu_uapi_sva_bind_gpasid);
 
-int iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct device *dev,
-ioasid_t pasid)
+int iommu_uapi_sva_unbind_gpasid(struct iommu_domain *domain, struct device 
*dev,
+ioasid_t pasid)
 {
if (unlikely(!domain->ops->sva_unbind_gpasid))
return -ENODEV;
 
return domain->ops->sva_unbind_gpasid(dev, pasid);
 }
-EXPORT_SYMBOL_GPL(iommu_sva_unbind_gpasid);
+EXPORT_SYMBOL_GPL(iommu_uapi_sva_unbind_gpasid);
 
 static void __iommu_detach_device(struct iommu_domain *domain,
  struct device *dev)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index fee209efb756..710d5d2691eb 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -424,13 +424,13 @@ extern int iommu_attach_device(struct iommu_domain 
*domain,
   struct device *dev);
 extern void iommu_detach_device(struct iommu_domain *domain,
struct device *dev);
-extern int iommu_cache_invalidate(struct iommu_domain *domain,
- struct device *dev,
- struct iommu_cache_invalidate_info *inv_info);
-extern int iommu_sva_bind_gpasid(struct iommu_domain *domain,
-   struct device *dev, struct iommu_gpasid_bind_data *data);
-extern int iommu_sva_unbind_gpasid(struct iommu_domain *domain,
-   struct device *dev, ioasid_t pasid);
+extern int iommu_uapi_cache_invalidate(struct iommu_domain *domain,
+  struct device *dev,
+  struct iommu_cache_invalidate_info 
*inv_info);
+extern int iommu_uapi_sva_bind_gpasid(struct iommu_domain *domain,
+ struct device *dev, struct 
iommu_gpasid_bind_data *data);
+extern int iommu_uapi_sva_unbind_gpasid(struct iommu_domain *domain,
+   struct device *dev, ioasid_t pasid);
 extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
 extern struct iommu_domain *iommu_get_dma_domain(struct device *dev);
 extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
@@ -1032,21 +1032,22 @@ static inline int iommu_sva_get_pasid(struct iommu_sva 
*handle)
return IOMMU_PASID_INVALID;
 }
 
-static inline int
-iommu_cache_invalidate(struct iommu_domain *domain,
-  struct device *dev,
-  struct iommu_cache_invalidate_info *inv_info)
+static inline int iommu_uapi_cache_invalidate(struct iommu_domain *domain,
+ struct device *dev,
+ struct 
iommu_cache_invalidate_info *inv_info)
 {
return -ENODEV;
 }
-static inline int 

[PATCH v11 1/6] docs: IOMMU user API

2020-09-24 Thread Jacob Pan
IOMMU UAPI is newly introduced to support communications between guest
virtual IOMMU and host IOMMU. There has been lots of discussions on how
it should work with VFIO UAPI and userspace in general.

This document is intended to clarify the UAPI design and usage. The
mechanics of how future extensions should be achieved are also covered
in this documentation.

Cc: linux-...@vger.kernel.org
Cc: Jonathan Corbet 
Cc: linux-...@vger.kernel.org
Reviewed-by: Eric Auger 
Signed-off-by: Liu Yi L 
Signed-off-by: Jacob Pan 
---
 Documentation/userspace-api/iommu.rst | 209 ++
 MAINTAINERS   |   1 +
 2 files changed, 210 insertions(+)
 create mode 100644 Documentation/userspace-api/iommu.rst

diff --git a/Documentation/userspace-api/iommu.rst 
b/Documentation/userspace-api/iommu.rst
new file mode 100644
index ..d3108c1519d5
--- /dev/null
+++ b/Documentation/userspace-api/iommu.rst
@@ -0,0 +1,209 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. iommu:
+
+=
+IOMMU Userspace API
+=
+
+IOMMU UAPI is used for virtualization cases where communications are
+needed between physical and virtual IOMMU drivers. For baremetal
+usage, the IOMMU is a system device which does not need to communicate
+with userspace directly.
+
+The primary use cases are guest Shared Virtual Address (SVA) and
+guest IO virtual address (IOVA), wherein the vIOMMU implementation
+relies on the physical IOMMU and for this reason requires interactions
+with the host driver.
+
+.. contents:: :local:
+
+Functionalities
+===
+Communications of user and kernel involve both directions. The
+supported user-kernel APIs are as follows:
+
+1. Bind/Unbind guest PASID (e.g. Intel VT-d)
+2. Bind/Unbind guest PASID table (e.g. ARM SMMU)
+3. Invalidate IOMMU caches upon guest requests
+4. Report errors to the guest and serve page requests
+
+Requirements
+
+The IOMMU UAPIs are generic and extensible to meet the following
+requirements:
+
+1. Emulated and para-virtualised vIOMMUs
+2. Multiple vendors (Intel VT-d, ARM SMMU, etc.)
+3. Extensions to the UAPI shall not break existing userspace
+
+Interfaces
+==
+Although the data structures defined in IOMMU UAPI are self-contained,
+there are no user API functions introduced. Instead, IOMMU UAPI is
+designed to work with existing user driver frameworks such as VFIO.
+
+Extension Rules & Precautions
+-
+When IOMMU UAPI gets extended, the data structures can *only* be
+modified in two ways:
+
+1. Adding new fields by re-purposing the padding[] field. No size change.
+2. Adding new union members at the end. May increase the structure sizes.
+
+No new fields can be added *after* the variable sized union in that it
+will break backward compatibility when offset moves. A new flag must
+be introduced whenever a change affects the structure using either
+method. The IOMMU driver processes the data based on flags which
+ensures backward compatibility.
+
+Version field is only reserved for the unlikely event of UAPI upgrade
+at its entirety.
+
+It's *always* the caller's responsibility to indicate the size of the
+structure passed by setting argsz appropriately.
+Though at the same time, argsz is user provided data which is not
+trusted. The argsz field allows the user app to indicate how much data
+it is providing; it's still the kernel's responsibility to validate
+whether it's correct and sufficient for the requested operation.
+
+Compatibility Checking
+--
+When IOMMU UAPI extension results in some structure size increase,
+IOMMU UAPI code shall handle the following cases:
+
+1. User and kernel has exact size match
+2. An older user with older kernel header (smaller UAPI size) running on a
+   newer kernel (larger UAPI size)
+3. A newer user with newer kernel header (larger UAPI size) running
+   on an older kernel.
+4. A malicious/misbehaving user passing illegal/invalid size but within
+   range. The data may contain garbage.
+
+Feature Checking
+
+While launching a guest with vIOMMU, it is strongly advised to check
+the compatibility upfront, as some subsequent errors happening during
+vIOMMU operation, such as cache invalidation failures cannot be nicely
+escalated to the guest due to IOMMU specifications. This can lead to
+catastrophic failures for the users.
+
+User applications such as QEMU are expected to import kernel UAPI
+headers. Backward compatibility is supported per feature flags.
+For example, an older QEMU (with older kernel header) can run on newer
+kernel. Newer QEMU (with new kernel header) may refuse to initialize
+on an older kernel if new feature flags are not supported by older
+kernel. Simply recompiling existing code with newer kernel header should
+not be an issue in that only existing flags are used.
+
+IOMMU vendor driver should report the below features to IOMMU UAPI

[PATCH v11 2/6] iommu/uapi: Add argsz for user filled data

2020-09-24 Thread Jacob Pan
As IOMMU UAPI gets extended, user data size may increase. To support
backward compatibiliy, this patch introduces a size field to each UAPI
data structures. It is *always* the responsibility for the user to fill in
the correct size. Padding fields are adjusted to ensure 8 byte alignment.

Specific scenarios for user data handling are documented in:
Documentation/userspace-api/iommu.rst

As there is no current users of the API, struct version is not
incremented.

Reviewed-by: Eric Auger 
Signed-off-by: Liu Yi L 
Signed-off-by: Jacob Pan 
---
 include/uapi/linux/iommu.h | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
index c2b2caf9ed41..b42acc8fe007 100644
--- a/include/uapi/linux/iommu.h
+++ b/include/uapi/linux/iommu.h
@@ -139,6 +139,7 @@ enum iommu_page_response_code {
 
 /**
  * struct iommu_page_response - Generic page response information
+ * @argsz: User filled size of this data
  * @version: API version of this structure
  * @flags: encodes whether the corresponding fields are valid
  * (IOMMU_FAULT_PAGE_RESPONSE_* values)
@@ -147,6 +148,7 @@ enum iommu_page_response_code {
  * @code: response code from  iommu_page_response_code
  */
 struct iommu_page_response {
+   __u32   argsz;
 #define IOMMU_PAGE_RESP_VERSION_1  1
__u32   version;
 #define IOMMU_PAGE_RESP_PASID_VALID(1 << 0)
@@ -222,6 +224,7 @@ struct iommu_inv_pasid_info {
 /**
  * struct iommu_cache_invalidate_info - First level/stage invalidation
  * information
+ * @argsz: User filled size of this data
  * @version: API version of this structure
  * @cache: bitfield that allows to select which caches to invalidate
  * @granularity: defines the lowest granularity used for the invalidation:
@@ -250,6 +253,7 @@ struct iommu_inv_pasid_info {
  * must support the used granularity.
  */
 struct iommu_cache_invalidate_info {
+   __u32   argsz;
 #define IOMMU_CACHE_INVALIDATE_INFO_VERSION_1 1
__u32   version;
 /* IOMMU paging structure cache */
@@ -259,7 +263,7 @@ struct iommu_cache_invalidate_info {
 #define IOMMU_CACHE_INV_TYPE_NR(3)
__u8cache;
__u8granularity;
-   __u8padding[2];
+   __u8padding[6];
union {
struct iommu_inv_pasid_info pasid_info;
struct iommu_inv_addr_info addr_info;
@@ -296,6 +300,7 @@ struct iommu_gpasid_bind_data_vtd {
 
 /**
  * struct iommu_gpasid_bind_data - Information about device and guest PASID 
binding
+ * @argsz: User filled size of this data
  * @version:   Version of this data structure
  * @format:PASID table entry format
  * @flags: Additional information on guest bind request
@@ -313,17 +318,18 @@ struct iommu_gpasid_bind_data_vtd {
  * PASID to host PASID based on this bind data.
  */
 struct iommu_gpasid_bind_data {
+   __u32 argsz;
 #define IOMMU_GPASID_BIND_VERSION_11
__u32 version;
 #define IOMMU_PASID_FORMAT_INTEL_VTD   1
__u32 format;
+   __u32 addr_width;
 #define IOMMU_SVA_GPASID_VAL   (1 << 0) /* guest PASID valid */
__u64 flags;
__u64 gpgd;
__u64 hpasid;
__u64 gpasid;
-   __u32 addr_width;
-   __u8  padding[12];
+   __u8  padding[8];
/* Vendor specific data */
union {
struct iommu_gpasid_bind_data_vtd vtd;
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v11 3/6] iommu/uapi: Use named union for user data

2020-09-24 Thread Jacob Pan
IOMMU UAPI data size is filled by the user space which must be validated
by the kernel. To ensure backward compatibility, user data can only be
extended by either re-purpose padding bytes or extend the variable sized
union at the end. No size change is allowed before the union. Therefore,
the minimum size is the offset of the union.

To use offsetof() on the union, we must make it named.

Link: https://lore.kernel.org/linux-iommu/20200611145518.0c281...@x1.home/
Signed-off-by: Jacob Pan 
Reviewed-by: Lu Baolu 
Reviewed-by: Eric Auger 
---
 drivers/iommu/intel/iommu.c | 22 +++---
 drivers/iommu/intel/svm.c   |  2 +-
 include/uapi/linux/iommu.h  |  4 ++--
 3 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 87b17bac04c2..461f3a6864d4 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -5434,8 +5434,8 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, 
struct device *dev,
 
/* Size is only valid in address selective invalidation */
if (inv_info->granularity == IOMMU_INV_GRANU_ADDR)
-   size = to_vtd_size(inv_info->addr_info.granule_size,
-  inv_info->addr_info.nb_granules);
+   size = to_vtd_size(inv_info->granu.addr_info.granule_size,
+  inv_info->granu.addr_info.nb_granules);
 
for_each_set_bit(cache_type,
 (unsigned long *)_info->cache,
@@ -5456,20 +5456,20 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, 
struct device *dev,
 * granularity.
 */
if (inv_info->granularity == IOMMU_INV_GRANU_PASID &&
-   (inv_info->pasid_info.flags & IOMMU_INV_PASID_FLAGS_PASID))
-   pasid = inv_info->pasid_info.pasid;
+   (inv_info->granu.pasid_info.flags & 
IOMMU_INV_PASID_FLAGS_PASID))
+   pasid = inv_info->granu.pasid_info.pasid;
else if (inv_info->granularity == IOMMU_INV_GRANU_ADDR &&
-(inv_info->addr_info.flags & 
IOMMU_INV_ADDR_FLAGS_PASID))
-   pasid = inv_info->addr_info.pasid;
+(inv_info->granu.addr_info.flags & 
IOMMU_INV_ADDR_FLAGS_PASID))
+   pasid = inv_info->granu.addr_info.pasid;
 
switch (BIT(cache_type)) {
case IOMMU_CACHE_INV_TYPE_IOTLB:
/* HW will ignore LSB bits based on address mask */
if (inv_info->granularity == IOMMU_INV_GRANU_ADDR &&
size &&
-   (inv_info->addr_info.addr & ((BIT(VTD_PAGE_SHIFT + 
size)) - 1))) {
+   (inv_info->granu.addr_info.addr & 
((BIT(VTD_PAGE_SHIFT + size)) - 1))) {
pr_err_ratelimited("User address not aligned, 
0x%llx, size order %llu\n",
-  inv_info->addr_info.addr, 
size);
+  
inv_info->granu.addr_info.addr, size);
}
 
/*
@@ -5477,9 +5477,9 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, 
struct device *dev,
 * We use npages = -1 to indicate that.
 */
qi_flush_piotlb(iommu, did, pasid,
-   mm_to_dma_pfn(inv_info->addr_info.addr),
+   
mm_to_dma_pfn(inv_info->granu.addr_info.addr),
(granu == QI_GRAN_NONG_PASID) ? -1 : 1 
<< size,
-   inv_info->addr_info.flags & 
IOMMU_INV_ADDR_FLAGS_LEAF);
+   inv_info->granu.addr_info.flags & 
IOMMU_INV_ADDR_FLAGS_LEAF);
 
if (!info->ats_enabled)
break;
@@ -5502,7 +5502,7 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, 
struct device *dev,
size = 64 - VTD_PAGE_SHIFT;
addr = 0;
} else if (inv_info->granularity == 
IOMMU_INV_GRANU_ADDR) {
-   addr = inv_info->addr_info.addr;
+   addr = inv_info->granu.addr_info.addr;
}
 
if (info->ats_enabled)
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 95c3164a2302..99353d6468fa 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -370,7 +370,7 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain, 
struct device *dev,
spin_lock(>lock);
ret = intel_pasid_setup_nested(iommu, dev,
   (pgd_t *)(uintptr_t)data->gpgd,
-  data->hpasid, >vtd, 

RE: [PATCH v2 4/9] iommu/ioasid: Add reference couting functions

2020-09-24 Thread Shameerali Kolothum Thodi
Hi Jacob,

> -Original Message-
> From: iommu [mailto:iommu-boun...@lists.linux-foundation.org] On Behalf Of
> Jacob Pan
> Sent: 22 August 2020 05:35
> To: iommu@lists.linux-foundation.org; LKML ;
> Jean-Philippe Brucker ; Lu Baolu
> ; Joerg Roedel ; David
> Woodhouse 
> Cc: Tian, Kevin ; Raj Ashok ; Wu
> Hao 
> Subject: [PATCH v2 4/9] iommu/ioasid: Add reference couting functions
> 
> There can be multiple users of an IOASID, each user could have hardware
> contexts associated with the IOASID. In order to align lifecycles,
> reference counting is introduced in this patch. It is expected that when
> an IOASID is being freed, each user will drop a reference only after its
> context is cleared.
> 
> Signed-off-by: Jacob Pan 
> ---
>  drivers/iommu/ioasid.c | 113
> +
>  include/linux/ioasid.h |   4 ++
>  2 files changed, 117 insertions(+)
> 
> diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
> index f73b3dbfc37a..5f31d63c75b1 100644
> --- a/drivers/iommu/ioasid.c
> +++ b/drivers/iommu/ioasid.c
> @@ -717,6 +717,119 @@ int ioasid_set_for_each_ioasid(struct ioasid_set
> *set,
>  EXPORT_SYMBOL_GPL(ioasid_set_for_each_ioasid);
> 
>  /**
> + * IOASID refcounting rules
> + * - ioasid_alloc() set initial refcount to 1
> + *
> + * - ioasid_free() decrement and test refcount.
> + * If refcount is 0, ioasid will be freed. Deleted from the system-wide
> + * xarray as well as per set xarray. The IOASID will be returned to the
> + * pool and available for new allocations.
> + *
> + * If recount is non-zero, mark IOASID as
> IOASID_STATE_FREE_PENDING.
> + * No new reference can be added. The IOASID is not returned to the
> pool
> + * for reuse.
> + * After free, ioasid_get() will return error but ioasid_find() and other
> + * non refcount adding APIs will continue to work until the last 
> reference
> + * is dropped
> + *
> + * - ioasid_get() get a reference on an active IOASID
> + *
> + * - ioasid_put() decrement and test refcount of the IOASID.
> + * If refcount is 0, ioasid will be freed. Deleted from the system-wide
> + * xarray as well as per set xarray. The IOASID will be returned to the
> + * pool and available for new allocations.
> + * Do nothing if refcount is non-zero.
> + *

Is it better to have a return for this based on whether ioasid is freed or not? 

I was going through Jean's SMMUv3 SVA patches[1] and that one returns true
if ioasid was freed. And that info is subsequently used to reset the pasid 
associated
with a mm. Though, not sure that is still relevant or not.

Thanks,
Shameer
1. 
https://lore.kernel.org/linux-iommu/20200918101852.582559-3-jean-phili...@linaro.org/

> + * - ioasid_find() does not take reference, caller must hold reference
> + *
> + * ioasid_free() can be called multiple times without error until all refs 
> are
> + * dropped.
> + */
> +
> +int ioasid_get_locked(struct ioasid_set *set, ioasid_t ioasid)
> +{
> + struct ioasid_data *data;
> +
> + data = xa_load(_allocator->xa, ioasid);
> + if (!data) {
> + pr_err("Trying to get unknown IOASID %u\n", ioasid);
> + return -EINVAL;
> + }
> + if (data->state == IOASID_STATE_FREE_PENDING) {
> + pr_err("Trying to get IOASID being freed%u\n", ioasid);
> + return -EBUSY;
> + }
> +
> + if (set && data->set != set) {
> + pr_err("Trying to get IOASID not in set%u\n", ioasid);
> + /* data found but does not belong to the set */
> + return -EACCES;
> + }
> + refcount_inc(>users);
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(ioasid_get_locked);
> +
> +/**
> + * ioasid_get - Obtain a reference of an ioasid
> + * @set
> + * @ioasid
> + *
> + * Check set ownership if @set is non-null.
> + */
> +int ioasid_get(struct ioasid_set *set, ioasid_t ioasid)
> +{
> + int ret = 0;
> +
> + spin_lock(_allocator_lock);
> + ret = ioasid_get_locked(set, ioasid);
> + spin_unlock(_allocator_lock);
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(ioasid_get);
> +
> +void ioasid_put_locked(struct ioasid_set *set, ioasid_t ioasid)
> +{
> + struct ioasid_data *data;
> +
> + data = xa_load(_allocator->xa, ioasid);
> + if (!data) {
> + pr_err("Trying to put unknown IOASID %u\n", ioasid);
> + return;
> + }
> +
> + if (set && data->set != set) {
> + pr_err("Trying to drop IOASID not in the set %u\n", ioasid);
> + return;
> + }
> +
> + if (!refcount_dec_and_test(>users)) {
> + pr_debug("%s: IOASID %d has %d remainning users\n",
> + __func__, ioasid, refcount_read(>users));
> + return;
> + }
> + ioasid_do_free(data);
> +}
> +EXPORT_SYMBOL_GPL(ioasid_put_locked);
> +
> +/**
> + * ioasid_put - Drop a reference of an ioasid
> + * @set
> + * @ioasid
> + *
> + * Check set ownership if @set is non-null.
> 

Re: [PATCH v9 3/7] iommu/uapi: Introduce enum type for PASID data format

2020-09-24 Thread Jacob Pan
Hi Joerg,

On Thu, 24 Sep 2020 10:40:16 +0200, Joerg Roedel  wrote:

> > On Fri, 18 Sep 2020 11:44:50 +0200, Joerg Roedel 
> > wrote: 
> > > On Fri, Sep 11, 2020 at 02:57:52PM -0700, Jacob Pan wrote:  
> > > > There can be multiple vendor-specific PASID data formats used in
> > > > UAPI structures. This patch adds enum type with a last entry which
> > > > makes range checking much easier.
> > > 
> > > But it also makes it much easier to screw up the numbers (which are
> > > ABI) by inserting a new value into the middle. I prefer defines here,
> > > or alternativly BUILD_BUG_ON() checks for the numbers.
> > >   
> > I am not following, the purpose of IOMMU_PASID_FORMAT_LAST *is* for
> > preparing the future insertion of new value into the middle.
> > The checking against IOMMU_PASID_FORMAT_LAST is to protect ABI
> > compatibility by making sure that out of range format are rejected in
> > all versions of the ABI.  
> 
> But with the enum you could have:
> 
> enum {
>   VTD_FOO,
>   SMMU_FOO,
>   LAST,
> };
> 
> which makes VTD_FOO==0 and SMMU_FOO==1, and when in the next version
> someone adds:
> 
> enum {
>   VTD_FOO,
>   VTD_BAR,
>   SMMU_FOO,
>   LAST,
> };
> 
> then SMMU_FOO will become 2 and break ABI. So I'd like to have this
> checked somewhere.
Got your point, will change to defines.

Thanks,

Jacob
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [bugzilla-dae...@bugzilla.kernel.org: [Bug 209149] New: "iommu/vt-d: Enable PCI ACS for platform opt in hint" makes NVMe config space not accessible after S3]

2020-09-24 Thread Raj, Ashok
Hi Kai

+ Alex, since he had some of the early quirks authored.

On Thu, Sep 24, 2020 at 12:31:53AM +0800, Kai-Heng Feng wrote:
> [+Cc Christoph]
> 
> > On Sep 24, 2020, at 00:03, Bjorn Helgaas  wrote:
> > 
> > [+cc IOMMU and NVMe folks]
> > 
> > Sorry, I forgot to forward this to linux-pci when it was first
> > reported.
> > 
> > Apparently this happens with v5.9-rc3, and may be related to
> > 50310600ebda ("iommu/vt-d: Enable PCI ACS for platform opt in hint"),
> > which appeared in v5.8-rc3.
> > 
> > There are several dmesg logs and proposed patches in the bugzilla, but
> > no analysis yet of what the problem is.  From the first dmesg
> > attachment (https://bugzilla.kernel.org/attachment.cgi?id=292327):
> 
> AFAIK Intel is working on it internally.
> Comet Lake probably needs ACS quirk like older generation chips.

I have confirmed with Internal documentation that the problem exists on
Comet Lake. But its fixed ICL and TGL generations.

Unfortunately I do not see if the public specupdate documents are for these
generation chipsets to makes sure all root port id's can be captured.

There is also another entry in bugzilla that was forwarded that referred to
Request Redirect Capability to be always disabled as well. This same
workaround also seems to be turning off RR for the root port. I believe it
should fix it as well. But i saw another patch attached.

Can you tell how you reproduce this? just doing a

#echo mem > /sys/power/state

is sufficient with an attached NVMe drive? 

> 
> > 
> >  [   50.434945] PM: suspend entry (deep)
> >  [   50.802086] nvme :01:00.0: saving config space at offset 0x0 
> > (reading 0x11e0f)
> >  [   50.842775] ACPI: Preparing to enter system sleep state S3
> >  [   50.858922] ACPI: Waking up from system sleep state S3
> >  [   50.883622] nvme :01:00.0: can't change power state from D3hot to 
> > D0 (config space inaccessible)
> >  [   50.947352] nvme :01:00.0: restoring config space at offset 0x0 
> > (was 0x, writing 0x11e0f)
> >  [   50.947816] pcieport :00:1b.0: DPC: containment event, 
> > status:0x1f01 source:0x
> >  [   50.947817] pcieport :00:1b.0: DPC: unmasked uncorrectable error 
> > detected
> >  [   50.947829] pcieport :00:1b.0: PCIe Bus Error: severity=Uncorrected 
> > (Non-Fatal), type=Transaction Layer, (Receiver ID)
> >  [   50.947830] pcieport :00:1b.0:   device [8086:06ac] error 
> > status/mask=0020/0001
> >  [   50.947831] pcieport :00:1b.0:[21] ACSViol
> > (First)
> >  [   50.947841] pcieport :00:1b.0: AER: broadcast error_detected message
> >  [   50.947843] nvme nvme0: frozen state error detected, reset controller
> > 
> > I suspect the nvme "can't change power state" and restore config space
> > errors are a consequence of the DPC event.  If DPC disables the link,
> > the device is inaccessible.
> > 
> > I don't know what caused the ACS Violation.  The AER TLP Header Log
> > might have a clue, but unfortunately we didn't print it.
> > 

Apparently it also requires to disable RR, and I'm not able to confirm if
CML requires that as well. 

pci_quirk_disable_intel_spt_pch_acs_redir() also seems to consult the same
table, so i'm not sure why we need the other patch in bugzilla is required.


Cheers,
Ashok
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 07/13] x86: Secure Launch kernel early boot stub

2020-09-24 Thread Arvind Sankar
On Thu, Sep 24, 2020 at 10:58:35AM -0400, Ross Philipson wrote:
> The Secure Launch (SL) stub provides the entry point for Intel TXT (and
> later AMD SKINIT) to vector to during the late launch. The symbol
> sl_stub_entry is that entry point and its offset into the kernel is
> conveyed to the launching code using the MLE (Measured Launch
> Environment) header in the structure named mle_header. The offset of the
> MLE header is set in the kernel_info. The routine sl_stub contains the
> very early late launch setup code responsible for setting up the basic
> environment to allow the normal kernel startup_32 code to proceed. It is
> also responsible for properly waking and handling the APs on Intel
> platforms. The routine sl_main which runs after entering 64b mode is
> responsible for measuring configuration and module information before
> it is used like the boot params, the kernel command line, the TXT heap,
> an external initramfs, etc.
> 
> Signed-off-by: Ross Philipson 

Which version of the kernel is this based on?

> diff --git a/arch/x86/boot/compressed/head_64.S 
> b/arch/x86/boot/compressed/head_64.S
> index 97d37f0..42043bf 100644
> --- a/arch/x86/boot/compressed/head_64.S
> +++ b/arch/x86/boot/compressed/head_64.S
> @@ -279,6 +279,21 @@ SYM_INNER_LABEL(efi32_pe_stub_entry, SYM_L_LOCAL)
>  SYM_FUNC_END(efi32_stub_entry)
>  #endif
>  
> +#ifdef CONFIG_SECURE_LAUNCH
> +SYM_FUNC_START(sl_stub_entry)
> + /*
> +  * On entry, %ebx has the entry abs offset to sl_stub_entry. To
> +  * find the beginning of where we are loaded, sub off from the
> +  * beginning.
> +  */

This requirement should be added to the documentation. Is it necessary
or can this stub just figure out the address the same way as the other
32-bit entry points, using the scratch space in bootparams as a little
stack?

> + leal(startup_32 - sl_stub_entry)(%ebx), %ebx
> +
> + /* More room to work in sl_stub in the text section */
> + jmp sl_stub
> +
> +SYM_FUNC_END(sl_stub_entry)
> +#endif
> +
>   .code64
>   .org 0x200
>  SYM_CODE_START(startup_64)
> @@ -537,6 +552,25 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
>   shrq$3, %rcx
>   rep stosq
>  
> +#ifdef CONFIG_SECURE_LAUNCH
> + /*
> +  * Have to do the final early sl stub work in 64b area.
> +  *
> +  * *** NOTE ***
> +  *
> +  * Several boot params get used before we get a chance to measure
> +  * them in this call. This is a known issue and we currently don't
> +  * have a solution. The scratch field doesn't matter and loadflags
> +  * have KEEP_SEGMENTS set by the stub code. There is no obvious way
> +  * to do anything about the use of kernel_alignment or init_size
> +  * though these seem low risk.
> +  */

There are various fields in bootparams that depend on where the
kernel/initrd and cmdline are loaded in memory. If the entire bootparams
page is getting measured, does that mean they all have to be at fixed
addresses on every boot?

Also KEEP_SEGMENTS support is gone from the kernel since v5.7, since it
was unused. startup_32 now always loads a GDT and then the segment
registers. I think this should be ok for you as the only thing the flag
used to do in the 64-bit kernel was to stop startup_32 from blindly
loading __BOOT_DS into the segment registers before it had setup its own
GDT.

For the 32-bit assembler code that's being added, tip/master now has
changes that prevent the compressed kernel from having any runtime
relocations.  You'll need to revise some of the code and the data
structures initial values to avoid creating relocations.

Thanks.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH V7 0/3] iommu: Add support to change default domain of an iommu group

2020-09-24 Thread Raj, Ashok
Hi Joerg,


On Mon, Sep 07, 2020 at 08:54:44PM -0700, Prakhya, Sai Praneeth wrote:
> Presently, the default domain of an iommu group is allocated during boot time
> and it cannot be changed later. So, the device would typically be either in
> identity (pass_through) mode or the device would be in DMA mode as long as the
> system is up and running. There is no way to change the default domain type
> dynamically i.e. after booting, a device cannot switch between identity mode 
> and
> DMA mode.
> 
> Assume a use case wherein the privileged user would want to use the device in
> pass-through mode when the device is used for host so that it would be high
> performing. Presently, this is not supported. Hence add support to change the
> default domain of an iommu group dynamically.
> 
> Support this by writing to a sysfs file, namely
> "/sys/kernel/iommu_groups//type".
> 
> Testing:
> 
> Tested by dynamically changing storage device (nvme) from
> 1. identity mode to DMA and making sure file transfer works
> 2. DMA mode to identity mode and making sure file transfer works
> Tested only for intel_iommu/vt-d. Would appreciate if someone could test on 
> AMD
> and ARM based machines.
> 

Just trying to followup on this series.

Sai has moved out of Intel, hence I'm trying to followup on his behalf.

Let me know if you have queued this for the next release.

> Based on iommu maintainer's 'next' branch.
> 
> Changes from V6:
> 
> 1. None except for version bump.
> 

Cheers,
Ashok
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 1/4] dt-bindings: reserved-memory: Document "active" property

2020-09-24 Thread Dmitry Osipenko
24.09.2020 17:01, Thierry Reding пишет:
> On Thu, Sep 24, 2020 at 04:23:59PM +0300, Dmitry Osipenko wrote:
>> 04.09.2020 15:59, Thierry Reding пишет:
>>> From: Thierry Reding 
>>>
>>> Reserved memory regions can be marked as "active" if hardware is
>>> expected to access the regions during boot and before the operating
>>> system can take control. One example where this is useful is for the
>>> operating system to infer whether the region needs to be identity-
>>> mapped through an IOMMU.
>>>
>>> Signed-off-by: Thierry Reding 
>>> ---
>>>  .../bindings/reserved-memory/reserved-memory.txt   | 7 +++
>>>  1 file changed, 7 insertions(+)
>>>
>>> diff --git 
>>> a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt 
>>> b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
>>> index 4dd20de6977f..163d2927e4fc 100644
>>> --- a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
>>> +++ b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
>>> @@ -63,6 +63,13 @@ reusable (optional) - empty property
>>>able to reclaim it back. Typically that means that the operating
>>>system can use that region to store volatile or cached data that
>>>can be otherwise regenerated or migrated elsewhere.
>>> +active (optional) - empty property
>>> +- If this property is set for a reserved memory region, it indicates
>>> +  that some piece of hardware may be actively accessing this region.
>>> +  Should the operating system want to enable IOMMU protection for a
>>> +  device, all active memory regions must have been identity-mapped
>>> +  in order to ensure that non-quiescent hardware during boot can
>>> +  continue to access the memory.
>>>  
>>>  Linux implementation note:
>>>  - If a "linux,cma-default" property is present, then Linux will use the
>>>
>>
>> Hi,
>>
>> Could you please explain what devices need this quirk? I see that you're
>> targeting Tegra SMMU driver, which means that it should be some pre-T186
>> device.
> 
> Primarily I'm looking at Tegra210 and later, because on earlier devices
> the bootloader doesn't consistently initialize display. I know that it
> does on some devices, but not all of them.

AFAIK, all tablet devices starting with Tegra20 that have display panel
are initializing display at a boot time for showing splash screen. This
includes all T20/T30/T114 tablets that are already supported by upstream
kernel.

> This same code should also
> work on Tegra186 and later (with an ARM SMMU) although the situation is
> slightly more complicated there because IOMMU translations will fault by
> default long before these identity mappings can be established.
> 
>> Is this reservation needed for some device that has display
>> hardwired to a very specific IOMMU domain at the boot time?
> 
> No, this is only used to convey information about the active framebuffer
> to the kernel. In practice the DMA/IOMMU code will use this information
> to establish a 1:1 mapping on whatever IOMMU domain that was picked for
> display.
> 
>> If you're targeting devices that don't have IOMMU enabled by default at
>> the boot time, then this approach won't work for the existing devices
>> which won't ever get an updated bootloader.
> 
> If the devices don't use an IOMMU, then there should be no problem. The
> extra reserved-memory nodes would still be necessary to ensure that the
> kernel doesn't reuse the framebuffer memory for the slab allocator, but
> if no IOMMU is used, then the display controller accessing the memory
> isn't going to cause problems other than perhaps scanning out data that
> is no longer a framebuffer.
> 
> There should also be no problem for devices with an old bootloader
> because this code is triggered by the presence of a reserved-memory node
> referenced via the memory-region property. Devices with an old
> bootloader should continue to work as they did before. Although I
> suppose they would start faulting once we enable DMA/IOMMU integration
> for Tegra SMMU if they have a bootloader that does initialize display to
> actively scan out during boot.
> 
>> I think Robin Murphy already suggested that we should simply create
>> a dummy "identity" IOMMU domain by default for the DRM/VDE devices and
>> then replace it with an explicitly created domain within the drivers.
> 
> I don't recall reading about that suggestion. So does this mean that for
> certain devices we'd want to basically passthrough by default and then
> at some point during boot take over with a properly managed IOMMU
> domain?

Yes, my understanding that this is what Robin suggested here:

https://lore.kernel.org/linux-iommu/cb12808b-7316-19db-7413-b7f852a6f...@arm.com/

> The primary goal here is to move towards using the DMA API rather than
> the IOMMU API directly, so we don't really have the option of replacing
> with an explicitly created domain. Unless we have code in the DMA/IOMMU
> code that does this 

Re: [PATCH v3 6/8] iommu/arm-smmu: Add impl hook for inherit boot mappings

2020-09-24 Thread Bjorn Andersson
On Mon 21 Sep 16:08 CDT 2020, Will Deacon wrote:

> On Sat, Sep 12, 2020 at 10:25:59PM -0500, Bjorn Andersson wrote:
> > On Fri 11 Sep 12:13 CDT 2020, Robin Murphy wrote:
> > > On 2020-09-04 16:55, Bjorn Andersson wrote:
> > > > Add a new operation to allow platform implementations to inherit any
> > > > stream mappings from the boot loader.
> > > 
> > > Is there a reason we need an explicit step for this? The aim of the
> > > cfg_probe hook is that the SMMU software state should all be set up by 
> > > then,
> > > and you can mess about with it however you like before arm_smmu_reset()
> > > actually commits anything to hardware. I would have thought you could
> > > permanently steal a context bank, configure it as your bypass hole, read 
> > > out
> > > the previous SME configuration and tweak smmu->smrs and smmu->s2crs
> > > appropriately all together "invisibly" at that point.
> > 
> > I did this because as of 6a79a5a3842b ("iommu/arm-smmu: Call
> > configuration impl hook before consuming features") we no longer have
> > setup pgsize_bitmap as we hit cfg_probe, which means that I need to
> > replicate this logic to set up the iommu_domain.
> > 
> > If I avoid setting up an iommu_domain for the identity context, as you
> > request in patch 8, this shouldn't be needed anymore.
> > 
> > > If that can't work, I'm very curious as to what I've overlooked.
> > > 
> > 
> > I believe that will work, I will rework the patches and try it out.
> 
> Did you get a chance to rework this?
> 

Unfortunately not, I hope to get to this shortly.

Thanks,
Bjorn
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 5/6] iommu/virtio: Support topology description in config space

2020-09-24 Thread Bjorn Helgaas
On Fri, Aug 21, 2020 at 03:15:39PM +0200, Jean-Philippe Brucker wrote:
> Platforms without device-tree nor ACPI can provide a topology
> description embedded into the virtio config space. Parse it.
> 
> Use PCI FIXUP to probe the config space early, because we need to
> discover the topology before any DMA configuration takes place, and the
> virtio driver may be loaded much later. Since we discover the topology
> description when probing the PCI hierarchy, the virtual IOMMU cannot
> manage other platform devices discovered earlier.

> +struct viommu_cap_config {
> + u8 bar;
> + u32 length; /* structure size */
> + u32 offset; /* structure offset within the bar */

s/the bar/the BAR/ (to match comment below).

> +static void viommu_pci_parse_topology(struct pci_dev *dev)
> +{
> + int ret;
> + u32 features;
> + void __iomem *regs, *common_regs;
> + struct viommu_cap_config cap = {0};
> + struct virtio_pci_common_cfg __iomem *common_cfg;
> +
> + /*
> +  * The virtio infrastructure might not be loaded at this point. We need
> +  * to access the BARs ourselves.
> +  */
> + ret = viommu_pci_find_capability(dev, VIRTIO_PCI_CAP_COMMON_CFG, );
> + if (!ret) {
> + pci_warn(dev, "common capability not found\n");

Is the lack of this capability really an error, i.e., is this
pci_warn() or pci_info()?  The "device doesn't have topology
description" below is only pci_dbg(), which suggests that we can live
without this.

Maybe a hint about what "common capability" means?

> + return;
> + }
> +
> + if (pci_enable_device_mem(dev))
> + return;
> +
> + common_regs = pci_iomap(dev, cap.bar, 0);
> + if (!common_regs)
> + return;
> +
> + common_cfg = common_regs + cap.offset;
> +
> + /* Perform the init sequence before we can read the config */
> + ret = viommu_pci_reset(common_cfg);

I guess this is some special device-specific reset, not any kind of
standard PCI reset?

> + if (ret < 0) {
> + pci_warn(dev, "unable to reset device\n");
> + goto out_unmap_common;
> + }
> +
> + iowrite8(VIRTIO_CONFIG_S_ACKNOWLEDGE, _cfg->device_status);
> + iowrite8(VIRTIO_CONFIG_S_ACKNOWLEDGE | VIRTIO_CONFIG_S_DRIVER,
> +  _cfg->device_status);
> +
> + /* Find out if the device supports topology description */
> + iowrite32(0, _cfg->device_feature_select);
> + features = ioread32(_cfg->device_feature);
> +
> + if (!(features & BIT(VIRTIO_IOMMU_F_TOPOLOGY))) {
> + pci_dbg(dev, "device doesn't have topology description");
> + goto out_reset;
> + }
> +
> + ret = viommu_pci_find_capability(dev, VIRTIO_PCI_CAP_DEVICE_CFG, );
> + if (!ret) {
> + pci_warn(dev, "device config capability not found\n");
> + goto out_reset;
> + }
> +
> + regs = pci_iomap(dev, cap.bar, 0);
> + if (!regs)
> + goto out_reset;
> +
> + pci_info(dev, "parsing virtio-iommu topology\n");
> + ret = viommu_parse_topology(>dev, regs + cap.offset,
> + pci_resource_len(dev, 0) - cap.offset);
> + if (ret)
> + pci_warn(dev, "failed to parse topology: %d\n", ret);
> +
> + pci_iounmap(dev, regs);
> +out_reset:
> + ret = viommu_pci_reset(common_cfg);
> + if (ret)
> + pci_warn(dev, "unable to reset device\n");
> +out_unmap_common:
> + pci_iounmap(dev, common_regs);
> +}
> +
> +/*
> + * Catch a PCI virtio-iommu implementation early to get the topology 
> description
> + * before we start probing other endpoints.
> + */
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_REDHAT_QUMRANET, 0x1040 + 
> VIRTIO_ID_IOMMU,
> + viommu_pci_parse_topology);
> -- 
> 2.28.0
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 08/13] x86: Secure Launch kernel late boot stub

2020-09-24 Thread Ross Philipson
The routine slaunch_setup is called out of the x86 specific setup_arch
routine during early kernel boot. After determining what platform is
present, various operations specific to that platform occur. This
includes finalizing setting for the platform late launch and verifying
that memory protections are in place.

For TXT, this code also reserves the original compressed kernel setup
area where the APs were left looping so that this memory cannot be used.

Signed-off-by: Ross Philipson 
---
 arch/x86/kernel/Makefile   |   1 +
 arch/x86/kernel/setup.c|   3 +
 arch/x86/kernel/slaunch.c  | 495 +
 drivers/iommu/intel/dmar.c |   4 +
 4 files changed, 503 insertions(+)
 create mode 100644 arch/x86/kernel/slaunch.c

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index e77261d..318366f 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -76,6 +76,7 @@ obj-$(CONFIG_X86_32)  += tls.o
 obj-$(CONFIG_IA32_EMULATION)   += tls.o
 obj-y  += step.o
 obj-$(CONFIG_INTEL_TXT)+= tboot.o
+obj-$(CONFIG_SECURE_LAUNCH)+= slaunch.o
 obj-$(CONFIG_ISA_DMA_API)  += i8237.o
 obj-$(CONFIG_STACKTRACE)   += stacktrace.o
 obj-y  += cpu/
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 3511736..cae9476 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -1009,6 +1010,8 @@ void __init setup_arch(char **cmdline_p)
early_gart_iommu_check();
 #endif
 
+   slaunch_setup();
+
/*
 * partially used pages are not usable - thus
 * we are rounding upwards:
diff --git a/arch/x86/kernel/slaunch.c b/arch/x86/kernel/slaunch.c
new file mode 100644
index 000..e040e32
--- /dev/null
+++ b/arch/x86/kernel/slaunch.c
@@ -0,0 +1,495 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Secure Launch late validation/setup, securityfs exposure and
+ * finalization support.
+ *
+ * Copyright (c) 2020, Oracle and/or its affiliates.
+ * Copyright (c) 2020 Apertus Solutions, LLC
+ *
+ * Author(s):
+ * Daniel P. Smith 
+ *
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static u32 sl_flags;
+static struct sl_ap_wake_info ap_wake_info;
+static u64 evtlog_addr;
+static u32 evtlog_size;
+static u64 vtd_pmr_lo_size;
+
+/* This should be plenty of room */
+static u8 txt_dmar[PAGE_SIZE] __aligned(16);
+
+u32 slaunch_get_flags(void)
+{
+   return sl_flags;
+}
+EXPORT_SYMBOL(slaunch_get_flags);
+
+struct sl_ap_wake_info *slaunch_get_ap_wake_info(void)
+{
+   return _wake_info;
+}
+
+struct acpi_table_header *slaunch_get_dmar_table(struct acpi_table_header 
*dmar)
+{
+   /* The DMAR is only stashed and provided via TXT on Intel systems */
+   if (memcmp(txt_dmar, "DMAR", 4))
+   return dmar;
+
+   return (struct acpi_table_header *)(_dmar[0]);
+}
+
+static void __init __noreturn slaunch_txt_reset(void __iomem *txt,
+   const char *msg, u64 error)
+{
+   u64 one = 1, val;
+
+   pr_err("%s", msg);
+
+   /*
+* This performs a TXT reset with a sticky error code. The reads of
+* TXT_CR_E2STS act as barriers.
+*/
+   memcpy_toio(txt + TXT_CR_ERRORCODE, , sizeof(u64));
+   memcpy_fromio(, txt + TXT_CR_E2STS, sizeof(u64));
+   memcpy_toio(txt + TXT_CR_CMD_NO_SECRETS, , sizeof(u64));
+   memcpy_fromio(, txt + TXT_CR_E2STS, sizeof(u64));
+   memcpy_toio(txt + TXT_CR_CMD_UNLOCK_MEM_CONFIG, , sizeof(u64));
+   memcpy_fromio(, txt + TXT_CR_E2STS, sizeof(u64));
+   memcpy_toio(txt + TXT_CR_CMD_RESET, , sizeof(u64));
+
+   for ( ; ; )
+   asm volatile ("hlt");
+
+   unreachable();
+}
+
+/*
+ * The TXT heap is too big to map all at once with early_ioremap
+ * so it is done a table at a time.
+ */
+static void __init *txt_early_get_heap_table(void __iomem *txt, u32 type,
+u32 bytes)
+{
+   void *heap;
+   u64 base, size, offset = 0;
+   int i;
+
+   if (type > TXT_SINIT_MLE_DATA_TABLE)
+   slaunch_txt_reset(txt,
+   "Error invalid table type for early heap walk\n",
+   SL_ERROR_HEAP_WALK);
+
+   memcpy_fromio(, txt + TXT_CR_HEAP_BASE, sizeof(u64));
+   memcpy_fromio(, txt + TXT_CR_HEAP_SIZE, sizeof(u64));
+
+   /* Iterate over heap tables looking for table of "type" */
+   for (i = 0; i < type; i++) {
+   base += offset;
+   heap = early_memremap(base, sizeof(u64));
+   if (!heap)
+   slaunch_txt_reset(txt,

[PATCH 04/13] x86: Add early TPM TIS/CRB interface support for Secure Launch

2020-09-24 Thread Ross Philipson
From: "Daniel P. Smith" 

The Secure Launch capability that is part of the compressed kernel
requires the ability to send measurements it takes to the TPM. This
commit introduces the necessary code to communicate with the hardware
interface of TPM devices.

Signed-off-by: Daniel P. Smith 
Signed-off-by: Ross Philipson 
---
 arch/x86/boot/compressed/Makefile |   2 +
 arch/x86/boot/compressed/tpm/crb.c| 304 ++
 arch/x86/boot/compressed/tpm/crb.h|  20 ++
 arch/x86/boot/compressed/tpm/tis.c| 215 +
 arch/x86/boot/compressed/tpm/tis.h|  46 +
 arch/x86/boot/compressed/tpm/tpm.h|  48 +
 arch/x86/boot/compressed/tpm/tpm_buff.c   | 121 
 arch/x86/boot/compressed/tpm/tpm_common.h | 127 +
 arch/x86/boot/compressed/tpm/tpmbuff.h|  34 
 arch/x86/boot/compressed/tpm/tpmio.c  |  51 +
 10 files changed, 968 insertions(+)
 create mode 100644 arch/x86/boot/compressed/tpm/crb.c
 create mode 100644 arch/x86/boot/compressed/tpm/crb.h
 create mode 100644 arch/x86/boot/compressed/tpm/tis.c
 create mode 100644 arch/x86/boot/compressed/tpm/tis.h
 create mode 100644 arch/x86/boot/compressed/tpm/tpm.h
 create mode 100644 arch/x86/boot/compressed/tpm/tpm_buff.c
 create mode 100644 arch/x86/boot/compressed/tpm/tpm_common.h
 create mode 100644 arch/x86/boot/compressed/tpm/tpmbuff.h
 create mode 100644 arch/x86/boot/compressed/tpm/tpmio.c

diff --git a/arch/x86/boot/compressed/Makefile 
b/arch/x86/boot/compressed/Makefile
index 0fd84b9..5515afa 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -99,6 +99,8 @@ efi-obj-$(CONFIG_EFI_STUB) = 
$(objtree)/drivers/firmware/efi/libstub/lib.a
 vmlinux-objs-$(CONFIG_SECURE_LAUNCH) += $(obj)/early_sha1.o
 vmlinux-objs-$(CONFIG_SECURE_LAUNCH_SHA256) += $(obj)/early_sha256.o
 vmlinux-objs-$(CONFIG_SECURE_LAUNCH_SHA512) += $(obj)/early_sha512.o
+vmlinux-objs-$(CONFIG_SECURE_LAUNCH) += $(obj)/tpm/tpmio.o 
$(obj)/tpm/tpm_buff.o \
+   $(obj)/tpm/tis.o $(obj)/tpm/crb.o
 
 # The compressed kernel is built with -fPIC/-fPIE so that a boot loader
 # can place it anywhere in memory and it will still run. However, since
diff --git a/arch/x86/boot/compressed/tpm/crb.c 
b/arch/x86/boot/compressed/tpm/crb.c
new file mode 100644
index 000..05c4038
--- /dev/null
+++ b/arch/x86/boot/compressed/tpm/crb.c
@@ -0,0 +1,304 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2020 Apertus Solutions, LLC
+ *
+ * Author(s):
+ *  Daniel P. Smith 
+ */
+
+#include 
+#include "tpm.h"
+#include "tpmbuff.h"
+#include "crb.h"
+#include "tpm_common.h"
+
+#define TPM_LOC_STATE  0x
+#define TPM_LOC_CTRL   0x0008
+#define TPM_LOC_STS0x000C
+#define TPM_CRB_INTF_ID0x0030
+#define TPM_CRB_CTRL_EXT   0x0038
+#define TPM_CRB_CTRL_REQ   0x0040
+#define TPM_CRB_CTRL_STS   0x0044
+#define TPM_CRB_CTRL_CANCEL0x0048
+#define TPM_CRB_CTRL_START 0x004C
+#define TPM_CRB_INT_ENABLE 0x0050
+#define TPM_CRB_INT_STS0x0054
+#define TPM_CRB_CTRL_CMD_SIZE  0x0058
+#define TPM_CRB_CTRL_CMD_LADDR 0x005C
+#define TPM_CRB_CTRL_CMD_HADDR 0x0060
+#define TPM_CRB_CTRL_RSP_SIZE  0x0064
+#define TPM_CRB_CTRL_RSP_ADDR  0x0068
+#define TPM_CRB_DATA_BUFFER0x0080
+
+#define REGISTER(l, r) (((l) << 12) | (r))
+
+static u8 locality = TPM_NO_LOCALITY;
+
+struct tpm_loc_state {
+   union {
+   u8 val;
+   struct {
+   u8 tpm_established:1;
+   u8 loc_assigned:1;
+   u8 active_locality:3;
+   u8 _reserved:2;
+   u8 tpm_reg_valid_sts:1;
+   };
+   };
+} __packed;
+
+struct tpm_loc_ctrl {
+   union {
+   u32 val;
+   struct {
+   u32 request_access:1;
+   u32 relinquish:1;
+   u32 seize:1;
+   u32 reset_establishment_bit:1;
+   u32 _reserved:28;
+   };
+   };
+} __packed;
+
+struct tpm_loc_sts {
+   union {
+   u32 val;
+   struct {
+   u32 granted:1;
+   u32 beenSeized:1;
+   u32 _reserved:30;
+   };
+   };
+} __packed;
+
+struct tpm_crb_ctrl_req {
+   union {
+   u32 val;
+   struct {
+   u32 cmd_ready:1;
+   u32 go_idle:1;
+   u32 _reserved:30;
+   };
+   };
+} __packed;
+
+struct tpm_crb_ctrl_sts {
+   union {
+   u32 val;
+   struct {
+   u32 tpm_sts:1;
+   u32 tpm_idle:1;
+   u32 _reserved:30;
+   };
+   };
+} __packed;
+
+struct tpm_crb_intf_id_ext {
+   union {
+

[PATCH 00/13] x86: Trenchboot secure dynamic launch Linux kernel support

2020-09-24 Thread Ross Philipson
The Trenchboot project focus on boot security has led to the enabling of
the Linux kernel to be directly invocable by the x86 Dynamic Launch
instruction(s) for establishing a Dynamic Root of Trust for Measurement
(DRTM). The dynamic launch will be initiated by a boot loader with
associated support added to it, for example the first targeted boot
loader will be GRUB2. An integral part of establishing the DRTM involves
measuring everything that is intended to be run (kernel image, initrd,
etc) and everything that will configure that kernel to run (command
line, boot params, etc) into specific PCRs, the DRTM PCRs (17-22), in
the TPM. Another key aspect is the dynamic launch is rooted in hardware,
that is to say the hardware (CPU) is what takes the first measurement
for the chain of integrity measurements. On Intel this is done using
the GETSEC instruction provided by Intel's TXT and the SKINIT
instruction provided by AMD's AMD-V. Information on these technologies
can be readily found online. This patchset introduces Intel TXT support.

To enable the kernel to be launched by GETSEC, a stub must be built
into the setup section of the compressed kernel to handle the specific
state that the dynamic launch process leaves the BSP in. This is
analogous to the EFI stub that is found in the same area. Also this stub
must measure everything that is going to be used as early as possible.
This stub code and subsequent code must also deal with the specific
state that the dynamic launch leaves the APs in.

A quick note on terminology. The larger open source project itself is
called Trenchboot, which is hosted on Github (links below). The kernel
feature enabling the use of the x86 technology is referred to as "Secure
Launch" within the kernel code. As such the prefixes sl_/SL_ or
slaunch/SLAUNCH will be seen in the code. The stub code discussed above
is referred to as the SL stub.

The basic flow is:

 - Entry from the dynamic launch jumps to the SL stub
 - SL stub fixes up the world on the BSP
 - For TXT, SL stub wakes the APs, fixes up their worlds
 - For TXT, APs are left halted waiting for an NMI to wake them
 - SL stub jumps to startup_32
 - SL main runs to measure configuration and module information into the
   DRTM PCRs. It also locates the TPM event log.
 - Kernel boot proceeds normally from this point.
 - During early setup, slaunch_setup() runs to finish some validation
   and setup tasks.
 - The SMP bringup code is modified to wake the waiting APs. APs vector
   to rmpiggy and start up normally from that point.
 - Kernel boot finishes booting normally
 - SL securityfs module is present to allow reading and writing of the
   TPM event log.
 - SEXIT support to leave SMX mode is present on the kexec path and
   the various reboot paths (poweroff, reset, halt).

Links:

The Trenchboot project including documentation:

https://github.com/trenchboot

Intel TXT is documented in its own specification and in the SDM Instruction Set 
volume:

https://www.intel.com/content/dam/www/public/us/en/documents/guides/intel-txt-software-development-guide.pdf
https://software.intel.com/en-us/articles/intel-sdm

AMD SKINIT is documented in the System Programming manual:

https://www.amd.com/system/files/TechDocs/24593.pdf

GRUB2 pre-launch support patchset (WIP):

https://lists.gnu.org/archive/html/grub-devel/2020-05/msg00011.html

Thanks
Ross Philipson and Daniel P. Smith

Daniel P. Smith (4):
  x86: Add early TPM TIS/CRB interface support for Secure Launch
  x86: Add early TPM1.2/TPM2.0 interface support for Secure Launch
  x86: Add early general TPM interface support for Secure Launch
  x86: Secure Launch adding event log securityfs

Ross Philipson (9):
  x86: Secure Launch Kconfig
  x86: Secure Launch main header file
  x86: Add early SHA support for Secure Launch early measurements
  x86: Secure Launch kernel early boot stub
  x86: Secure Launch kernel late boot stub
  x86: Secure Launch SMP bringup support
  kexec: Secure Launch kexec SEXIT support
  reboot: Secure Launch SEXIT support on reboot paths
  tpm: Allow locality 2 to be set when initializing the TPM for Secure
Launch

 Documentation/x86/boot.rst|   9 +
 arch/x86/Kconfig  |  36 ++
 arch/x86/boot/compressed/Makefile |   8 +
 arch/x86/boot/compressed/early_sha1.c | 104 
 arch/x86/boot/compressed/early_sha1.h |  17 +
 arch/x86/boot/compressed/early_sha256.c   |   6 +
 arch/x86/boot/compressed/early_sha512.c   |   6 +
 arch/x86/boot/compressed/head_64.S|  34 +
 arch/x86/boot/compressed/kernel_info.S|   7 +
 arch/x86/boot/compressed/sl_main.c| 390 
 arch/x86/boot/compressed/sl_stub.S| 606 ++
 arch/x86/boot/compressed/tpm/crb.c| 304 +
 arch/x86/boot/compressed/tpm/crb.h|  20 +
 arch/x86/boot/compressed/tpm/tis.c| 215 +++
 arch/x86/boot/compressed/tpm/tis.h 

[PATCH 09/13] x86: Secure Launch SMP bringup support

2020-09-24 Thread Ross Philipson
On Intel, the APs are left in a well documented state after TXT performs
the late launch. Specifically they cannot have #INIT asserted on them so
a standard startup via INIT/SIPI/SIPI cannot be performed. Instead the
early SL stub code parked the APs in a pause/jmp loop waiting for an NMI.
The modified SMP boot code is called for the Secure Launch case. The
jump address for the RM piggy entry point is fixed up in the jump where
the APs are waiting and an NMI IPI is sent to the AP. The AP vectors to
the Secure Launch entry point in the RM piggy which mimics what the real
mode code would do then jumps the the standard RM piggy protected mode
entry point.

Signed-off-by: Ross Philipson 
---
 arch/x86/include/asm/realmode.h  |  3 ++
 arch/x86/kernel/smpboot.c| 86 
 arch/x86/realmode/rm/header.S|  3 ++
 arch/x86/realmode/rm/trampoline_64.S | 37 
 4 files changed, 129 insertions(+)

diff --git a/arch/x86/include/asm/realmode.h b/arch/x86/include/asm/realmode.h
index b35030e..20bc283 100644
--- a/arch/x86/include/asm/realmode.h
+++ b/arch/x86/include/asm/realmode.h
@@ -34,6 +34,9 @@ struct real_mode_header {
 #ifdef CONFIG_X86_64
u32 machine_real_restart_seg;
 #endif
+#ifdef CONFIG_SECURE_LAUNCH
+   u32 sl_trampoline_start32;
+#endif
 };
 
 /* This must match data at realmode/rm/trampoline_{32,64}.S */
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index f5ef689..0ca0b07 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -56,6 +56,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -1017,6 +1018,83 @@ int common_cpu_up(unsigned int cpu, struct task_struct 
*idle)
return 0;
 }
 
+#ifdef CONFIG_SECURE_LAUNCH
+
+static atomic_t first_ap_only = {1};
+
+/*
+ * Called to fix the long jump address for the waiting APs to vector to
+ * the correct startup location in the Secure Launch stub in the rmpiggy.
+ */
+static int
+slaunch_fixup_jump_vector(void)
+{
+   struct sl_ap_wake_info *ap_wake_info;
+   unsigned int *ap_jmp_ptr = 0;
+
+   if (!atomic_dec_and_test(_ap_only))
+   return 0;
+
+   ap_wake_info = slaunch_get_ap_wake_info();
+
+   ap_jmp_ptr = (unsigned int *)__va(ap_wake_info->ap_wake_block +
+ ap_wake_info->ap_jmp_offset);
+
+   *ap_jmp_ptr = real_mode_header->sl_trampoline_start32;
+
+   pr_info("TXT AP long jump address updated\n");
+
+   return 0;
+}
+
+/*
+ * TXT AP startup is quite different than normal. The APs cannot have #INIT
+ * asserted on them or receive SIPIs. The early Secure Launch code has parked
+ * the APs in a pause loop waiting to receive an NMI. This will wake the APs
+ * and have them jump to the protected mode code in the rmpiggy where the rest
+ * of the SMP boot of the AP will proceed normally.
+ */
+static int
+slaunch_wakeup_cpu_from_txt(int cpu, int apicid)
+{
+   unsigned long send_status = 0, accept_status = 0;
+
+   /* Only done once */
+   if (slaunch_fixup_jump_vector())
+   return -1;
+
+   /* Send NMI IPI to idling AP and wake it up */
+   apic_icr_write(APIC_DM_NMI, apicid);
+
+   if (init_udelay == 0)
+   udelay(10);
+   else
+   udelay(300);
+
+   send_status = safe_apic_wait_icr_idle();
+
+   if (init_udelay == 0)
+   udelay(10);
+   else
+   udelay(300);
+
+   accept_status = (apic_read(APIC_ESR) & 0xEF);
+
+   if (send_status)
+   pr_err("Secure Launch IPI never delivered???\n");
+   if (accept_status)
+   pr_err("Secure Launch IPI delivery error (%lx)\n",
+   accept_status);
+
+   return (send_status | accept_status);
+}
+
+#else
+
+#define slaunch_wakeup_cpu_from_txt(cpu, apicid)   0
+
+#endif  /* !CONFIG_SECURE_LAUNCH */
+
 /*
  * NOTE - on most systems this is a PHYSICAL apic ID, but on multiquad
  * (ie clustered apic addressing mode), this is a LOGICAL apic ID.
@@ -1071,6 +1149,12 @@ static int do_boot_cpu(int apicid, int cpu, struct 
task_struct *idle,
cpumask_clear_cpu(cpu, cpu_initialized_mask);
smp_mb();
 
+   /* With Intel TXT, the AP startup is totally different */
+   if (slaunch_get_flags() & (SL_FLAG_ACTIVE|SL_FLAG_ARCH_TXT)) {
+   boot_error = slaunch_wakeup_cpu_from_txt(cpu, apicid);
+   goto txt_wake;
+   }
+
/*
 * Wake up a CPU in difference cases:
 * - Use the method in the APIC driver if it's defined
@@ -1083,6 +1167,8 @@ static int do_boot_cpu(int apicid, int cpu, struct 
task_struct *idle,
boot_error = wakeup_cpu_via_init_nmi(cpu, start_ip, apicid,
 cpu0_nmi_registered);
 
+txt_wake:
+
if (!boot_error) {
/*
 * Wait 10s total for first sign of life from AP

[PATCH 10/13] x86: Secure Launch adding event log securityfs

2020-09-24 Thread Ross Philipson
From: "Daniel P. Smith" 

The late init functionality registers securityfs nodes to allow access
to TXT register fields on Intel along with the fetching of and writing
events to the late launch TPM log.

Signed-off-by: Daniel P. Smith 
Signed-off-by: Ross Philipson 
Signed-off-by: garnetgrimm 
---
 arch/x86/kernel/slaunch.c | 293 +-
 1 file changed, 292 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/slaunch.c b/arch/x86/kernel/slaunch.c
index e040e32..7bdb89e 100644
--- a/arch/x86/kernel/slaunch.c
+++ b/arch/x86/kernel/slaunch.c
@@ -8,7 +8,7 @@
  *
  * Author(s):
  * Daniel P. Smith 
- *
+ * Garnet T. Grimm 
  */
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
@@ -493,3 +493,294 @@ void __init slaunch_setup(void)
vendor[3] == INTEL_CPUID_MFGID_EDX)
slaunch_setup_intel();
 }
+
+#define SL_FS_ENTRIES  10
+/* root directory node must be last */
+#define SL_ROOT_DIR_ENTRY  (SL_FS_ENTRIES - 1)
+#define SL_TXT_DIR_ENTRY   (SL_FS_ENTRIES - 2)
+#define SL_TXT_FILE_FIRST  (SL_TXT_DIR_ENTRY - 1)
+#define SL_TXT_ENTRY_COUNT 7
+
+#define DECLARE_TXT_PUB_READ_U(size, fmt, msg_size)\
+static ssize_t txt_pub_read_u##size(unsigned int offset,   \
+   loff_t *read_offset,\
+   size_t read_len,\
+   char __user *buf)   \
+{  \
+   void __iomem *txt;  \
+   char msg_buffer[msg_size];  \
+   u##size reg_value = 0;  \
+   txt = ioremap(TXT_PUB_CONFIG_REGS_BASE, \
+   TXT_NR_CONFIG_PAGES * PAGE_SIZE);   \
+   if (IS_ERR(txt))\
+   return PTR_ERR(txt);\
+   memcpy_fromio(_value, txt + offset, sizeof(u##size));   \
+   iounmap(txt);   \
+   snprintf(msg_buffer, msg_size, fmt, reg_value); \
+   return simple_read_from_buffer(buf, read_len, read_offset,  \
+   _buffer, msg_size); \
+}
+
+DECLARE_TXT_PUB_READ_U(8, "%#04x\n", 6);
+DECLARE_TXT_PUB_READ_U(32, "%#010x\n", 12);
+DECLARE_TXT_PUB_READ_U(64, "%#018llx\n", 20);
+
+#define DECLARE_TXT_FOPS(reg_name, reg_offset, reg_size)   \
+static ssize_t txt_##reg_name##_read(struct file *flip,
\
+   char __user *buf, size_t read_len, loff_t *read_offset) \
+{  \
+   return txt_pub_read_u##reg_size(reg_offset, read_offset,\
+   read_len, buf); \
+}  \
+static const struct file_operations reg_name##_ops = { \
+   .read = txt_##reg_name##_read,  \
+}
+
+DECLARE_TXT_FOPS(sts, TXT_CR_STS, 64);
+DECLARE_TXT_FOPS(ests, TXT_CR_ESTS, 8);
+DECLARE_TXT_FOPS(errorcode, TXT_CR_ERRORCODE, 32);
+DECLARE_TXT_FOPS(didvid, TXT_CR_DIDVID, 64);
+DECLARE_TXT_FOPS(e2sts, TXT_CR_E2STS, 64);
+DECLARE_TXT_FOPS(ver_emif, TXT_CR_VER_EMIF, 32);
+DECLARE_TXT_FOPS(scratchpad, TXT_CR_SCRATCHPAD, 64);
+
+/*
+ * Securityfs exposure
+ */
+struct memfile {
+   char *name;
+   void *addr;
+   size_t size;
+};
+
+static struct memfile sl_evtlog = {"eventlog", 0, 0};
+static void *txt_heap;
+static struct txt_heap_event_log_pointer2_1_element __iomem *evtlog20;
+static DEFINE_MUTEX(sl_evt_log_mutex);
+
+static ssize_t sl_evtlog_read(struct file *file, char __user *buf,
+ size_t count, loff_t *pos)
+{
+   ssize_t size;
+
+   if (!sl_evtlog.addr)
+   return 0;
+
+   mutex_lock(_evt_log_mutex);
+   size = simple_read_from_buffer(buf, count, pos, sl_evtlog.addr,
+  sl_evtlog.size);
+   mutex_unlock(_evt_log_mutex);
+
+   return size;
+}
+
+static ssize_t sl_evtlog_write(struct file *file, const char __user *buf,
+   size_t datalen, loff_t *ppos)
+{
+   char *data;
+   ssize_t result;
+
+   if (!sl_evtlog.addr)
+   return 0;
+
+   /* No partial writes. */
+   result = -EINVAL;
+   if (*ppos != 0)
+   goto out;
+
+   data = memdup_user(buf, datalen);
+   if (IS_ERR(data)) {
+   result = PTR_ERR(data);
+   goto out;
+   }
+
+   mutex_lock(_evt_log_mutex);
+   if (evtlog20)
+   result = tpm20_log_event(evtlog20, sl_evtlog.addr,

[PATCH 07/13] x86: Secure Launch kernel early boot stub

2020-09-24 Thread Ross Philipson
The Secure Launch (SL) stub provides the entry point for Intel TXT (and
later AMD SKINIT) to vector to during the late launch. The symbol
sl_stub_entry is that entry point and its offset into the kernel is
conveyed to the launching code using the MLE (Measured Launch
Environment) header in the structure named mle_header. The offset of the
MLE header is set in the kernel_info. The routine sl_stub contains the
very early late launch setup code responsible for setting up the basic
environment to allow the normal kernel startup_32 code to proceed. It is
also responsible for properly waking and handling the APs on Intel
platforms. The routine sl_main which runs after entering 64b mode is
responsible for measuring configuration and module information before
it is used like the boot params, the kernel command line, the TXT heap,
an external initramfs, etc.

Signed-off-by: Ross Philipson 
---
 Documentation/x86/boot.rst |   9 +
 arch/x86/boot/compressed/Makefile  |   1 +
 arch/x86/boot/compressed/head_64.S |  34 ++
 arch/x86/boot/compressed/kernel_info.S |   7 +
 arch/x86/boot/compressed/sl_main.c | 390 +
 arch/x86/boot/compressed/sl_stub.S | 606 +
 arch/x86/kernel/asm-offsets.c  |  16 +
 7 files changed, 1063 insertions(+)
 create mode 100644 arch/x86/boot/compressed/sl_main.c
 create mode 100644 arch/x86/boot/compressed/sl_stub.S

diff --git a/Documentation/x86/boot.rst b/Documentation/x86/boot.rst
index 7fafc7a..7232801 100644
--- a/Documentation/x86/boot.rst
+++ b/Documentation/x86/boot.rst
@@ -1026,6 +1026,15 @@ Offset/size: 0x000c/4
 
   This field contains maximal allowed type for setup_data and setup_indirect 
structs.
 
+   =
+Field name:mle_header_offset
+Offset/size:   0x0010/4
+   =
+
+  This field contains the offset to the Secure Launch Measured Launch 
Environment
+  (MLE) header. This offset is used to locate information needed during a 
secure
+  late launch using Intel TXT and AMD SKINIT.
+
 
 The Image Checksum
 ==
diff --git a/arch/x86/boot/compressed/Makefile 
b/arch/x86/boot/compressed/Makefile
index 35947b9..d881ff7 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -102,6 +102,7 @@ vmlinux-objs-$(CONFIG_SECURE_LAUNCH_SHA512) += 
$(obj)/early_sha512.o
 vmlinux-objs-$(CONFIG_SECURE_LAUNCH) += $(obj)/tpm/tpmio.o 
$(obj)/tpm/tpm_buff.o \
$(obj)/tpm/tis.o $(obj)/tpm/crb.o $(obj)/tpm/tpm1_cmds.o \
$(obj)/tpm/tpm2_cmds.o $(obj)/tpm/tpm2_auth.o $(obj)/tpm/tpm.o
+vmlinux-objs-$(CONFIG_SECURE_LAUNCH) += $(obj)/sl_main.o $(obj)/sl_stub.o
 
 # The compressed kernel is built with -fPIC/-fPIE so that a boot loader
 # can place it anywhere in memory and it will still run. However, since
diff --git a/arch/x86/boot/compressed/head_64.S 
b/arch/x86/boot/compressed/head_64.S
index 97d37f0..42043bf 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -279,6 +279,21 @@ SYM_INNER_LABEL(efi32_pe_stub_entry, SYM_L_LOCAL)
 SYM_FUNC_END(efi32_stub_entry)
 #endif
 
+#ifdef CONFIG_SECURE_LAUNCH
+SYM_FUNC_START(sl_stub_entry)
+   /*
+* On entry, %ebx has the entry abs offset to sl_stub_entry. To
+* find the beginning of where we are loaded, sub off from the
+* beginning.
+*/
+   leal(startup_32 - sl_stub_entry)(%ebx), %ebx
+
+   /* More room to work in sl_stub in the text section */
+   jmp sl_stub
+
+SYM_FUNC_END(sl_stub_entry)
+#endif
+
.code64
.org 0x200
 SYM_CODE_START(startup_64)
@@ -537,6 +552,25 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
shrq$3, %rcx
rep stosq
 
+#ifdef CONFIG_SECURE_LAUNCH
+   /*
+* Have to do the final early sl stub work in 64b area.
+*
+* *** NOTE ***
+*
+* Several boot params get used before we get a chance to measure
+* them in this call. This is a known issue and we currently don't
+* have a solution. The scratch field doesn't matter and loadflags
+* have KEEP_SEGMENTS set by the stub code. There is no obvious way
+* to do anything about the use of kernel_alignment or init_size
+* though these seem low risk.
+*/
+   pushq   %rsi
+   movq%rsi, %rdi
+   callq   sl_main
+   popq%rsi
+#endif
+
 /*
  * Do the extraction, and jump to the new kernel..
  */
diff --git a/arch/x86/boot/compressed/kernel_info.S 
b/arch/x86/boot/compressed/kernel_info.S
index f818ee8..192d557 100644
--- a/arch/x86/boot/compressed/kernel_info.S
+++ b/arch/x86/boot/compressed/kernel_info.S
@@ -17,6 +17,13 @@ kernel_info:
/* Maximal allowed type for setup_data and setup_indirect structs. */
.long   SETUP_TYPE_MAX
 
+   /* Offset to the MLE header structure */
+#ifdef CONFIG_SECURE_LAUNCH
+   .long   mle_header

[PATCH 03/13] x86: Add early SHA support for Secure Launch early measurements

2020-09-24 Thread Ross Philipson
The SHA algorithms are necessary to measure configuration information into
the TPM as early as possible before using the values. This implementation
uses the established approach of #including the SHA libraries directly in
the code since the compressed kernel is not uncompressed at this point.

The SHA code here has its origins in the code from the main kernel. That
code could not be pulled directly into the setup portion of the compressed
kernel because of other dependencies it pulls in. The result is this is a
modified copy of that code that still leverages the core SHA algorithms.

Signed-off-by: Daniel P. Smith 
Signed-off-by: Ross Philipson 
---
 arch/x86/boot/compressed/Makefile   |   4 +
 arch/x86/boot/compressed/early_sha1.c   | 104 
 arch/x86/boot/compressed/early_sha1.h   |  17 +++
 arch/x86/boot/compressed/early_sha256.c |   6 +
 arch/x86/boot/compressed/early_sha512.c |   6 +
 include/linux/sha512.h  |  21 
 lib/sha1.c  |   4 +
 lib/sha512.c| 209 
 8 files changed, 371 insertions(+)
 create mode 100644 arch/x86/boot/compressed/early_sha1.c
 create mode 100644 arch/x86/boot/compressed/early_sha1.h
 create mode 100644 arch/x86/boot/compressed/early_sha256.c
 create mode 100644 arch/x86/boot/compressed/early_sha512.c
 create mode 100644 include/linux/sha512.h
 create mode 100644 lib/sha512.c

diff --git a/arch/x86/boot/compressed/Makefile 
b/arch/x86/boot/compressed/Makefile
index ff7894f..0fd84b9 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -96,6 +96,10 @@ vmlinux-objs-$(CONFIG_ACPI) += $(obj)/acpi.o
 vmlinux-objs-$(CONFIG_EFI_MIXED) += $(obj)/efi_thunk_$(BITS).o
 efi-obj-$(CONFIG_EFI_STUB) = $(objtree)/drivers/firmware/efi/libstub/lib.a
 
+vmlinux-objs-$(CONFIG_SECURE_LAUNCH) += $(obj)/early_sha1.o
+vmlinux-objs-$(CONFIG_SECURE_LAUNCH_SHA256) += $(obj)/early_sha256.o
+vmlinux-objs-$(CONFIG_SECURE_LAUNCH_SHA512) += $(obj)/early_sha512.o
+
 # The compressed kernel is built with -fPIC/-fPIE so that a boot loader
 # can place it anywhere in memory and it will still run. However, since
 # it is executed as-is without any ELF relocation processing performed
diff --git a/arch/x86/boot/compressed/early_sha1.c 
b/arch/x86/boot/compressed/early_sha1.c
new file mode 100644
index 000..198c46d
--- /dev/null
+++ b/arch/x86/boot/compressed/early_sha1.c
@@ -0,0 +1,104 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2020, Oracle and/or its affiliates.
+ * Copyright (c) 2020 Apertus Solutions, LLC.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "early_sha1.h"
+
+#define SHA1_DISABLE_EXPORT
+#include "../../../../lib/sha1.c"
+
+/* The SHA1 implementation in lib/sha1.c was written to get the workspace
+ * buffer as a parameter. This wrapper function provides a container
+ * around a temporary workspace that is cleared after the transform completes.
+ */
+static void __sha_transform(u32 *digest, const char *data)
+{
+   u32 ws[SHA1_WORKSPACE_WORDS];
+
+   sha1_transform(digest, data, ws);
+
+   memset(ws, 0, sizeof(ws));
+   /*
+* As this is cryptographic code, prevent the memset 0 from being
+* optimized out potentially leaving secrets in memory.
+*/
+   wmb();
+
+}
+
+void early_sha1_init(struct sha1_state *sctx)
+{
+   sha1_init(sctx->state);
+   sctx->count = 0;
+}
+
+void early_sha1_update(struct sha1_state *sctx,
+  const u8 *data,
+  unsigned int len)
+{
+   unsigned int partial = sctx->count % SHA1_BLOCK_SIZE;
+
+   sctx->count += len;
+
+   if (likely((partial + len) >= SHA1_BLOCK_SIZE)) {
+   int blocks;
+
+   if (partial) {
+   int p = SHA1_BLOCK_SIZE - partial;
+
+   memcpy(sctx->buffer + partial, data, p);
+   data += p;
+   len -= p;
+
+   __sha_transform(sctx->state, sctx->buffer);
+   }
+
+   blocks = len / SHA1_BLOCK_SIZE;
+   len %= SHA1_BLOCK_SIZE;
+
+   if (blocks) {
+   while (blocks--) {
+   __sha_transform(sctx->state, data);
+   data += SHA1_BLOCK_SIZE;
+   }
+   }
+   partial = 0;
+   }
+
+   if (len)
+   memcpy(sctx->buffer + partial, data, len);
+}
+
+void early_sha1_final(struct sha1_state *sctx, u8 *out)
+{
+   const int bit_offset = SHA1_BLOCK_SIZE - sizeof(__be64);
+   __be64 *bits = (__be64 *)(sctx->buffer + bit_offset);
+   __be32 *digest = (__be32 *)out;
+   unsigned int partial = sctx->count % SHA1_BLOCK_SIZE;
+   int i;
+
+   sctx->buffer[partial++] = 0x80;
+   if (partial > bit_offset) {
+   memset(sctx->buffer + 

[PATCH 05/13] x86: Add early TPM1.2/TPM2.0 interface support for Secure Launch

2020-09-24 Thread Ross Philipson
From: "Daniel P. Smith" 

This commit introduces an abstraction for TPM1.2 and TPM2.0 devices
above the TPM hardware interface.

Signed-off-by: Daniel P. Smith 
Signed-off-by: Ross Philipson 
---
 arch/x86/boot/compressed/Makefile |   3 +-
 arch/x86/boot/compressed/tpm/tpm1.h   | 112 
 arch/x86/boot/compressed/tpm/tpm1_cmds.c  |  99 ++
 arch/x86/boot/compressed/tpm/tpm2.h   |  89 
 arch/x86/boot/compressed/tpm/tpm2_auth.c  |  44 
 arch/x86/boot/compressed/tpm/tpm2_auth.h  |  21 
 arch/x86/boot/compressed/tpm/tpm2_cmds.c  | 145 ++
 arch/x86/boot/compressed/tpm/tpm2_constants.h |  66 
 8 files changed, 578 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/boot/compressed/tpm/tpm1.h
 create mode 100644 arch/x86/boot/compressed/tpm/tpm1_cmds.c
 create mode 100644 arch/x86/boot/compressed/tpm/tpm2.h
 create mode 100644 arch/x86/boot/compressed/tpm/tpm2_auth.c
 create mode 100644 arch/x86/boot/compressed/tpm/tpm2_auth.h
 create mode 100644 arch/x86/boot/compressed/tpm/tpm2_cmds.c
 create mode 100644 arch/x86/boot/compressed/tpm/tpm2_constants.h

diff --git a/arch/x86/boot/compressed/Makefile 
b/arch/x86/boot/compressed/Makefile
index 5515afa..a4308d5 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -100,7 +100,8 @@ vmlinux-objs-$(CONFIG_SECURE_LAUNCH) += $(obj)/early_sha1.o
 vmlinux-objs-$(CONFIG_SECURE_LAUNCH_SHA256) += $(obj)/early_sha256.o
 vmlinux-objs-$(CONFIG_SECURE_LAUNCH_SHA512) += $(obj)/early_sha512.o
 vmlinux-objs-$(CONFIG_SECURE_LAUNCH) += $(obj)/tpm/tpmio.o 
$(obj)/tpm/tpm_buff.o \
-   $(obj)/tpm/tis.o $(obj)/tpm/crb.o
+   $(obj)/tpm/tis.o $(obj)/tpm/crb.o $(obj)/tpm/tpm1_cmds.o \
+   $(obj)/tpm/tpm2_cmds.o $(obj)/tpm/tpm2_auth.o
 
 # The compressed kernel is built with -fPIC/-fPIE so that a boot loader
 # can place it anywhere in memory and it will still run. However, since
diff --git a/arch/x86/boot/compressed/tpm/tpm1.h 
b/arch/x86/boot/compressed/tpm/tpm1.h
new file mode 100644
index 000..498fa45
--- /dev/null
+++ b/arch/x86/boot/compressed/tpm/tpm1.h
@@ -0,0 +1,112 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2020 Apertus Solutions, LLC
+ *
+ * Author(s):
+ *  Daniel P. Smith 
+ *
+ * The definitions in this header are extracted from the Trusted Computing
+ * Group's "TPM Main Specification", Parts 1-3.
+ *
+ */
+
+#ifndef _TPM1_H
+#define _TPM1_H
+
+#include "tpm.h"
+
+/* Section 2.2.3 */
+#define TPM_AUTH_DATA_USAGE u8
+#define TPM_PAYLOAD_TYPE u8
+#define TPM_VERSION_BYTE u8
+#define TPM_TAG u16
+#define TPM_PROTOCOL_ID u16
+#define TPM_STARTUP_TYPE u16
+#define TPM_ENC_SCHEME u16
+#define TPM_SIG_SCHEME u16
+#define TPM_MIGRATE_SCHEME u16
+#define TPM_PHYSICAL_PRESENCE u16
+#define TPM_ENTITY_TYPE u16
+#define TPM_KEY_USAGE u16
+#define TPM_EK_TYPE u16
+#define TPM_STRUCTURE_TAG u16
+#define TPM_PLATFORM_SPECIFIC u16
+#define TPM_COMMAND_CODE u32
+#define TPM_CAPABILITY_AREA u32
+#define TPM_KEY_FLAGS u32
+#define TPM_ALGORITHM_ID u32
+#define TPM_MODIFIER_INDICATOR u32
+#define TPM_ACTUAL_COUNT u32
+#define TPM_TRANSPORT_ATTRIBUTES u32
+#define TPM_AUTHHANDLE u32
+#define TPM_DIRINDEX u32
+#define TPM_KEY_HANDLE u32
+#define TPM_PCRINDEX u32
+#define TPM_RESULT u32
+#define TPM_RESOURCE_TYPE u32
+#define TPM_KEY_CONTROL u32
+#define TPM_NV_INDEX u32 The
+#define TPM_FAMILY_ID u32
+#define TPM_FAMILY_VERIFICATION u32
+#define TPM_STARTUP_EFFECTS u32
+#define TPM_SYM_MODE u32
+#define TPM_FAMILY_FLAGS u32
+#define TPM_DELEGATE_INDEX u32
+#define TPM_CMK_DELEGATE u32
+#define TPM_COUNT_ID u32
+#define TPM_REDIT_COMMAND u32
+#define TPM_TRANSHANDLE u32
+#define TPM_HANDLE u32
+#define TPM_FAMILY_OPERATION u32
+
+/* Section 6 */
+#define TPM_TAG_RQU_COMMAND0x00C1
+#define TPM_TAG_RQU_AUTH1_COMMAND  0x00C2
+#define TPM_TAG_RQU_AUTH2_COMMAND  0x00C3
+#define TPM_TAG_RSP_COMMAND0x00C4
+#define TPM_TAG_RSP_AUTH1_COMMAND  0x00C5
+#define TPM_TAG_RSP_AUTH2_COMMAND  0x00C6
+
+/* Section 16 */
+#define TPM_SUCCESS 0x0
+
+/* Section 17 */
+#define TPM_ORD_EXTEND 0x0014
+
+#define SHA1_DIGEST_SIZE 20
+
+/* Section 5.4 */
+struct tpm_sha1_digest {
+   u8 digest[SHA1_DIGEST_SIZE];
+};
+struct tpm_digest {
+   TPM_PCRINDEX pcr;
+   union {
+   struct tpm_sha1_digest sha1;
+   } digest;
+};
+
+#define TPM_DIGEST struct tpm_sha1_digest
+#define TPM_CHOSENID_HASH  TPM_DIGEST
+#define TPM_COMPOSITE_HASH TPM_DIGEST
+#define TPM_DIRVALUE   TPM_DIGEST
+#define TPM_HMAC   TPM_DIGEST
+#define TPM_PCRVALUE   TPM_DIGEST
+#define TPM_AUDITDIGESTTPM_DIGEST
+#define TPM_DAA_TPM_SEED   TPM_DIGEST
+#define TPM_DAA_CONTEXT_SEED   TPM_DIGEST
+
+struct tpm_extend_cmd {
+   TPM_PCRINDEX pcr_num;
+   TPM_DIGEST digest;
+};
+
+struct 

[PATCH 02/13] x86: Secure Launch main header file

2020-09-24 Thread Ross Philipson
Introduce the main Secure Launch header file used in the early SL stub
and the early setup code.

Signed-off-by: Ross Philipson 
---
 include/linux/slaunch.h | 544 
 1 file changed, 544 insertions(+)
 create mode 100644 include/linux/slaunch.h

diff --git a/include/linux/slaunch.h b/include/linux/slaunch.h
new file mode 100644
index 000..88adc69
--- /dev/null
+++ b/include/linux/slaunch.h
@@ -0,0 +1,544 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Main Secure Launch header file.
+ *
+ * Copyright (c) 2020, Oracle and/or its affiliates.
+ */
+
+#ifndef _LINUX_SLAUNCH_H
+#define _LINUX_SLAUNCH_H
+
+/*
+ * Secure Launch Defined State Flags
+ */
+#define SL_FLAG_ACTIVE 0x0001
+#define SL_FLAG_ARCH_SKINIT0x0002
+#define SL_FLAG_ARCH_TXT   0x0004
+
+#ifdef CONFIG_SECURE_LAUNCH
+
+/*
+ * Secure Launch main definitions file.
+ *
+ * Copyright (c) 2019 Oracle and/or its affiliates. All rights reserved.
+ */
+
+#define __SL32_CS  0x0008
+#define __SL32_DS  0x0010
+
+#define SL_CPU_AMD 1
+#define SL_CPU_INTEL   2
+
+#define INTEL_CPUID_MFGID_EBX  0x756e6547 /* Genu */
+#define INTEL_CPUID_MFGID_EDX  0x49656e69 /* ineI */
+#define INTEL_CPUID_MFGID_ECX  0x6c65746e /* ntel */
+
+#define AMD_CPUID_MFGID_EBX0x68747541 /* Auth */
+#define AMD_CPUID_MFGID_EDX0x69746e65 /* enti */
+#define AMD_CPUID_MFGID_ECX0x444d4163 /* cAMD */
+
+/*
+ * Intel Safer Mode Extensions (SMX)
+ *
+ * Intel SMX provides a programming interface to establish a Measured Launched
+ * Environment (MLE). The measurement and protection mechanisms supported by 
the
+ * capabilities of an Intel Trusted Execution Technology (TXT) platform. SMX is
+ * the processor’s programming interface in an Intel TXT platform.
+ *
+ * See Intel SDM Volume 2 - 6.1 "Safer Mode Extensions Reference"
+ */
+
+/*
+ * SMX GETSEC Leaf Functions
+ */
+#define SMX_X86_GETSEC_SEXIT   5
+#define SMX_X86_GETSEC_SMCTRL  7
+#define SMX_X86_GETSEC_WAKEUP  8
+
+/*
+ * Intel Trusted Execution Technology MMIO Registers Banks
+ */
+#define TXT_PUB_CONFIG_REGS_BASE   0xfed3
+#define TXT_PRIV_CONFIG_REGS_BASE  0xfed2
+#define TXT_NR_CONFIG_PAGES ((TXT_PUB_CONFIG_REGS_BASE - \
+ TXT_PRIV_CONFIG_REGS_BASE) >> PAGE_SHIFT)
+
+/*
+ * Intel Trusted Execution Technology (TXT) Registers
+ */
+#define TXT_CR_STS 0x
+#define TXT_CR_ESTS0x0008
+#define TXT_CR_ERRORCODE   0x0030
+#define TXT_CR_CMD_RESET   0x0038
+#define TXT_CR_CMD_CLOSE_PRIVATE   0x0048
+#define TXT_CR_DIDVID  0x0110
+#define TXT_CR_VER_EMIF0x0200
+#define TXT_CR_CMD_UNLOCK_MEM_CONFIG   0x0218
+#define TXT_CR_SINIT_BASE  0x0270
+#define TXT_CR_SINIT_SIZE  0x0278
+#define TXT_CR_MLE_JOIN0x0290
+#define TXT_CR_HEAP_BASE   0x0300
+#define TXT_CR_HEAP_SIZE   0x0308
+#define TXT_CR_SCRATCHPAD  0x0378
+#define TXT_CR_CMD_OPEN_LOCALITY1  0x0380
+#define TXT_CR_CMD_CLOSE_LOCALITY1 0x0388
+#define TXT_CR_CMD_OPEN_LOCALITY2  0x0390
+#define TXT_CR_CMD_CLOSE_LOCALITY2 0x0398
+#define TXT_CR_CMD_SECRETS 0x08e0
+#define TXT_CR_CMD_NO_SECRETS  0x08e8
+#define TXT_CR_E2STS   0x08f0
+
+/* TXTCR_STS status bits */
+#define TXT_SENTER_DONE_STS(1<<0)
+#define TXT_SEXIT_DONE_STS (1<<1)
+
+/*
+ * SINIT/MLE Capabilities Field Bit Definitions
+ */
+#define TXT_SINIT_MLE_CAP_WAKE_GETSEC  0
+#define TXT_SINIT_MLE_CAP_WAKE_MONITOR 1
+
+/*
+ * OS/MLE Secure Launch Specific Definitions
+ */
+#define TXT_OS_MLE_STRUCT_VERSION  1
+#define TXT_OS_MLE_MAX_VARIABLE_MTRRS  32
+
+/*
+ * TXT Heap Table Enumeration
+ */
+#define TXT_BIOS_DATA_TABLE1
+#define TXT_OS_MLE_DATA_TABLE  2
+#define TXT_OS_SINIT_DATA_TABLE3
+#define TXT_SINIT_MLE_DATA_TABLE   4
+
+/*
+ * Secure Launch Defined Error Codes used in MLE-initiated TXT resets.
+ *
+ * TXT Specification
+ * Appendix I ACM Error Codes
+ */
+#define SL_ERROR_GENERIC   0xc0008001
+#define SL_ERROR_TPM_INIT  0xc0008002
+#define SL_ERROR_TPM_INVALID_LOG20 0xc0008003
+#define SL_ERROR_TPM_LOGGING_FAILED0xc0008004
+#define SL_ERROR_TPM_GET_LOC   0xc0008005
+#define SL_ERROR_TPM_EXTEND0xc0008006
+#define SL_ERROR_MTRR_INV_VCNT 0xc0008007
+#define SL_ERROR_MTRR_INV_DEF_TYPE 0xc0008008
+#define SL_ERROR_MTRR_INV_BASE 0xc0008009
+#define SL_ERROR_MTRR_INV_MASK 0xc000800a
+#define SL_ERROR_MSR_INV_MISC_EN   0xc000800b
+#define SL_ERROR_INV_AP_INTERRUPT  0xc000800c
+#define SL_ERROR_RESERVE_AP_WAKE   0xc000800d
+#define SL_ERROR_HEAP_WALK 0xc000800e
+#define SL_ERROR_HEAP_MAP  0xc000800f
+#define SL_ERROR_HEAP_MDR_VALS 0xc0008010
+#define 

[PATCH 01/13] x86: Secure Launch Kconfig

2020-09-24 Thread Ross Philipson
Initial bits to bring in Secure Launch functionality. Add Kconfig
options for compiling in/out the Secure Launch code.

Signed-off-by: Ross Philipson 
---
 arch/x86/Kconfig | 36 
 1 file changed, 36 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 7101ac6..8957981 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1968,6 +1968,42 @@ config EFI_MIXED
 
   If unsure, say N.
 
+config SECURE_LAUNCH
+   bool "Secure Launch support"
+   default n
+   depends on X86_64
+   help
+  The Secure Launch feature allows a kernel to be loaded
+  directly through an Intel TXT measured launch. Intel TXT
+  establishes a Dynamic Root of Trust for Measurement (DRTM)
+  where the CPU measures the kernel image. This feature then
+  continues the measurement chain over kernel configuration
+  information and init images.
+
+choice
+   prompt "Select Secure Launch Algorithm for TPM2"
+   depends on SECURE_LAUNCH
+
+config SECURE_LAUNCH_SHA1
+   bool "Secure Launch TPM1 SHA1"
+   help
+  When using Secure Launch and TPM1 is present, use SHA1 hash
+  algorithm for measurements.
+
+config SECURE_LAUNCH_SHA256
+   bool "Secure Launch TPM2 SHA256"
+   help
+  When using Secure Launch and TPM2 is present, use SHA256 hash
+  algorithm for measurements.
+
+config SECURE_LAUNCH_SHA512
+   bool "Secure Launch TPM2 SHA512"
+   help
+  When using Secure Launch and TPM2 is present, use SHA512 hash
+  algorithm for measurements.
+
+endchoice
+
 config SECCOMP
def_bool y
prompt "Enable seccomp to safely compute untrusted bytecode"
-- 
1.8.3.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 12/13] reboot: Secure Launch SEXIT support on reboot paths

2020-09-24 Thread Ross Philipson
If the MLE kernel is being powered off, rebooted or halted,
then SEXIT must be called. Note that the SEXIT GETSEC leaf
can only be called after a machine_shutdown() has been done on
these paths. The machine_shutdown() is not called on a few paths
like when poweroff action does not have a poweroff callback (into
ACPI code) or when an emergency reset is done. In these cases,
just the TXT registers are finalized but SEXIT is skipped.

Signed-off-by: Ross Philipson 
---
 arch/x86/kernel/reboot.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index a515e2d..90d0647 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -732,6 +733,7 @@ static void native_machine_restart(char *__unused)
 
if (!reboot_force)
machine_shutdown();
+   slaunch_finalize(!reboot_force);
__machine_emergency_restart(0);
 }
 
@@ -742,6 +744,9 @@ static void native_machine_halt(void)
 
tboot_shutdown(TB_SHUTDOWN_HALT);
 
+   /* SEXIT done after machine_shutdown() to meet TXT requirements */
+   slaunch_finalize(1);
+
stop_this_cpu(NULL);
 }
 
@@ -750,8 +755,12 @@ static void native_machine_power_off(void)
if (pm_power_off) {
if (!reboot_force)
machine_shutdown();
+   slaunch_finalize(!reboot_force);
pm_power_off();
+   } else {
+   slaunch_finalize(0);
}
+
/* A fallback in case there is no PM info available */
tboot_shutdown(TB_SHUTDOWN_HALT);
 }
@@ -779,6 +788,7 @@ void machine_shutdown(void)
 
 void machine_emergency_restart(void)
 {
+   slaunch_finalize(0);
__machine_emergency_restart(1);
 }
 
-- 
1.8.3.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 06/13] x86: Add early general TPM interface support for Secure Launch

2020-09-24 Thread Ross Philipson
From: "Daniel P. Smith" 

This commit exposes a minimal general interface for the compressed
kernel to request the required TPM operations to send measurements to
a TPM.

Signed-off-by: Daniel P. Smith 
Signed-off-by: Ross Philipson 
---
 arch/x86/boot/compressed/Makefile  |   2 +-
 arch/x86/boot/compressed/tpm/tpm.c | 145 +
 2 files changed, 146 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/boot/compressed/tpm/tpm.c

diff --git a/arch/x86/boot/compressed/Makefile 
b/arch/x86/boot/compressed/Makefile
index a4308d5..35947b9 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -101,7 +101,7 @@ vmlinux-objs-$(CONFIG_SECURE_LAUNCH_SHA256) += 
$(obj)/early_sha256.o
 vmlinux-objs-$(CONFIG_SECURE_LAUNCH_SHA512) += $(obj)/early_sha512.o
 vmlinux-objs-$(CONFIG_SECURE_LAUNCH) += $(obj)/tpm/tpmio.o 
$(obj)/tpm/tpm_buff.o \
$(obj)/tpm/tis.o $(obj)/tpm/crb.o $(obj)/tpm/tpm1_cmds.o \
-   $(obj)/tpm/tpm2_cmds.o $(obj)/tpm/tpm2_auth.o
+   $(obj)/tpm/tpm2_cmds.o $(obj)/tpm/tpm2_auth.o $(obj)/tpm/tpm.o
 
 # The compressed kernel is built with -fPIC/-fPIE so that a boot loader
 # can place it anywhere in memory and it will still run. However, since
diff --git a/arch/x86/boot/compressed/tpm/tpm.c 
b/arch/x86/boot/compressed/tpm/tpm.c
new file mode 100644
index 000..0fe62d2
--- /dev/null
+++ b/arch/x86/boot/compressed/tpm/tpm.c
@@ -0,0 +1,145 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2020 Apertus Solutions, LLC
+ *
+ * Author(s):
+ *  Daniel P. Smith 
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include "tpm.h"
+#include "tpmbuff.h"
+#include "tis.h"
+#include "crb.h"
+#include "tpm_common.h"
+#include "tpm1.h"
+#include "tpm2.h"
+#include "tpm2_constants.h"
+
+static struct tpm tpm;
+
+static void find_interface_and_family(struct tpm *t)
+{
+   struct tpm_interface_id intf_id;
+   struct tpm_intf_capability intf_cap;
+
+   /* Sort out whether if it is 1.2 */
+   intf_cap.val = tpm_read32(TPM_INTF_CAPABILITY_0);
+   if ((intf_cap.interface_version == TPM12_TIS_INTF_12) ||
+   (intf_cap.interface_version == TPM12_TIS_INTF_13)) {
+   t->family = TPM12;
+   t->intf = TPM_TIS;
+   return;
+   }
+
+   /* Assume that it is 2.0 and TIS */
+   t->family = TPM20;
+   t->intf = TPM_TIS;
+
+   /* Check if the interface is CRB */
+   intf_id.val = tpm_read32(TPM_INTERFACE_ID_0);
+   if (intf_id.interface_type == TPM_CRB_INTF_ACTIVE)
+   t->intf = TPM_CRB;
+}
+
+struct tpm *enable_tpm(void)
+{
+   struct tpm *t = 
+
+   find_interface_and_family(t);
+
+   switch (t->intf) {
+   case TPM_TIS:
+   if (!tis_init(t))
+   return NULL;
+   break;
+   case TPM_CRB:
+   if (!crb_init(t))
+   return NULL;
+   break;
+   }
+
+   return t;
+}
+
+u8 tpm_request_locality(struct tpm *t, u8 l)
+{
+   u8 ret = TPM_NO_LOCALITY;
+
+   ret = t->ops.request_locality(l);
+
+   if (ret < TPM_MAX_LOCALITY)
+   t->buff = alloc_tpmbuff(t->intf, ret);
+
+   return ret;
+}
+
+void tpm_relinquish_locality(struct tpm *t)
+{
+   t->ops.relinquish_locality();
+
+   free_tpmbuff(t->buff, t->intf);
+}
+
+#define MAX_TPM_EXTEND_SIZE 70 /* TPM2 SHA512 is the largest */
+int tpm_extend_pcr(struct tpm *t, u32 pcr, u16 algo,
+   u8 *digest)
+{
+   int ret = 0;
+
+   if (t->buff == NULL)
+   return -EINVAL;
+
+   if (t->family == TPM12) {
+   struct tpm_digest d;
+
+   if (algo != TPM_ALG_SHA1)
+   return -EINVAL;
+
+   d.pcr = pcr;
+   memcpy((void *)d.digest.sha1.digest,
+   digest, SHA1_DIGEST_SIZE);
+
+   ret = tpm1_pcr_extend(t, );
+   } else if (t->family == TPM20) {
+   struct tpml_digest_values *d;
+   u8 buf[MAX_TPM_EXTEND_SIZE];
+
+   d = (struct tpml_digest_values *) buf;
+   d->count = 1;
+   d->digests->alg = algo;
+   switch (algo) {
+   case TPM_ALG_SHA1:
+   memcpy(d->digests->digest, digest, SHA1_SIZE);
+   break;
+   case TPM_ALG_SHA256:
+   memcpy(d->digests->digest, digest, SHA256_SIZE);
+   break;
+   case TPM_ALG_SHA384:
+   memcpy(d->digests->digest, digest, SHA384_SIZE);
+   break;
+   case TPM_ALG_SHA512:
+   memcpy(d->digests->digest, digest, SHA512_SIZE);
+   break;
+   case TPM_ALG_SM3_256:
+   memcpy(d->digests->digest, digest, SM3256_SIZE);
+   break;
+   default:
+   

[PATCH 11/13] kexec: Secure Launch kexec SEXIT support

2020-09-24 Thread Ross Philipson
Prior to running the next kernel via kexec, the Secure Launch code
closes down private SMX resources and does an SEXIT. This allows the
next kernel to start normally without any issues starting the APs etc.

Signed-off-by: Ross Philipson 
---
 arch/x86/kernel/slaunch.c | 70 +++
 kernel/kexec_core.c   |  4 +++
 2 files changed, 74 insertions(+)

diff --git a/arch/x86/kernel/slaunch.c b/arch/x86/kernel/slaunch.c
index 7bdb89e..b2221d8 100644
--- a/arch/x86/kernel/slaunch.c
+++ b/arch/x86/kernel/slaunch.c
@@ -784,3 +784,73 @@ static void __exit slaunch_exit(void)
 late_initcall(slaunch_late_init);
 
 __exitcall(slaunch_exit);
+
+static inline void smx_getsec_sexit(void)
+{
+   asm volatile (".byte 0x0f,0x37\n"
+ : : "a" (SMX_X86_GETSEC_SEXIT));
+}
+
+void slaunch_finalize(int do_sexit)
+{
+   void __iomem *config;
+   u64 one = 1, val;
+
+   if (!(slaunch_get_flags() & (SL_FLAG_ACTIVE|SL_FLAG_ARCH_TXT)))
+   return;
+
+   config = ioremap(TXT_PRIV_CONFIG_REGS_BASE, TXT_NR_CONFIG_PAGES *
+PAGE_SIZE);
+   if (!config) {
+   pr_emerg("Error SEXIT failed to ioremap TXT private reqs\n");
+   return;
+   }
+
+   /* Clear secrets bit for SEXIT */
+   memcpy_toio(config + TXT_CR_CMD_NO_SECRETS, , sizeof(u64));
+   memcpy_fromio(, config + TXT_CR_E2STS, sizeof(u64));
+
+   /* Unlock memory configurations */
+   memcpy_toio(config + TXT_CR_CMD_UNLOCK_MEM_CONFIG, , sizeof(u64));
+   memcpy_fromio(, config + TXT_CR_E2STS, sizeof(u64));
+
+   /* Close the TXT private register space */
+   memcpy_fromio(, config + TXT_CR_E2STS, sizeof(u64));
+   memcpy_toio(config + TXT_CR_CMD_CLOSE_PRIVATE, , sizeof(u64));
+
+   /*
+* Calls to iounmap are not being done because of the state of the
+* system this late in the kexec process. Local IRQs are disabled and
+* iounmap causes a TLB flush which in turn causes a warning. Leaving
+* thse mappings is not an issue since the next kernel is going to
+* completely re-setup memory management.
+*/
+
+   /* Map public registers and do a final read fence */
+   config = ioremap(TXT_PUB_CONFIG_REGS_BASE, TXT_NR_CONFIG_PAGES *
+PAGE_SIZE);
+   if (!config) {
+   pr_emerg("Error SEXIT failed to ioremap TXT public reqs\n");
+   return;
+   }
+
+   memcpy_fromio(, config + TXT_CR_E2STS, sizeof(u64));
+
+   pr_emerg("TXT clear secrets bit and unlock memory complete.");
+
+   if (!do_sexit)
+   return;
+
+   if (smp_processor_id() != 0) {
+   pr_emerg("Error TXT SEXIT must be called on CPU 0\n");
+   return;
+   }
+
+   /* Disable SMX mode */
+   cr4_set_bits(X86_CR4_SMXE);
+
+   /* Do the SEXIT SMX operation */
+   smx_getsec_sexit();
+
+   pr_emerg("TXT SEXIT complete.");
+}
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index c19c0da..6b9ac11 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -37,6 +37,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -1179,6 +1180,9 @@ int kernel_kexec(void)
cpu_hotplug_enable();
pr_notice("Starting new kernel\n");
machine_shutdown();
+
+   /* Finalize TXT registers and do SEXIT */
+   slaunch_finalize(1);
}
 
machine_kexec(kexec_image);
-- 
1.8.3.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 13/13] tpm: Allow locality 2 to be set when initializing the TPM for Secure Launch

2020-09-24 Thread Ross Philipson
The Secure Launch MLE environment uses PCRs that are only accessible from
the DRTM locality 2. By default the TPM drivers always initialize the
locality to 0. When a Secure Launch is in progress, initialize the
locality to 2.

Signed-off-by: Ross Philipson 
---
 drivers/char/tpm/tpm-chip.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c
index ddaeceb..f35faab 100644
--- a/drivers/char/tpm/tpm-chip.c
+++ b/drivers/char/tpm/tpm-chip.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "tpm.h"
 
 DEFINE_IDR(dev_nums_idr);
@@ -34,12 +35,20 @@
 
 static int tpm_request_locality(struct tpm_chip *chip)
 {
-   int rc;
+   int rc, locality;
 
if (!chip->ops->request_locality)
return 0;
 
-   rc = chip->ops->request_locality(chip, 0);
+   if (slaunch_get_flags() & (SL_FLAG_ACTIVE|SL_FLAG_ARCH_TXT)) {
+   dev_dbg(>dev, "setting TPM locality to 2 for MLE\n");
+   locality = 2;
+   } else {
+   dev_dbg(>dev, "setting TPM locality to 0\n");
+   locality = 0;
+   }
+
+   rc = chip->ops->request_locality(chip, locality);
if (rc < 0)
return rc;
 
-- 
1.8.3.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: IOVA allocation dependency between firmware buffer and remaining buffers

2020-09-24 Thread Thierry Reding
On Thu, Sep 24, 2020 at 12:41:29PM +0200, Marek Szyprowski wrote:
> Hi Thierry,
> 
> On 24.09.2020 12:16, Thierry Reding wrote:
> > On Thu, Sep 24, 2020 at 10:46:46AM +0200, Marek Szyprowski wrote:
> >> On 24.09.2020 10:28, Joerg Roedel wrote:
> >>> On Wed, Sep 23, 2020 at 08:48:26AM +0200, Marek Szyprowski wrote:
>  It allows to remap given buffer at the specific IOVA address, although
>  it doesn't guarantee that those specific addresses won't be later used
>  by the IOVA allocator. Probably it would make sense to add an API for
>  generic IOMMU-DMA framework to mark the given IOVA range as
>  reserved/unused to protect them.
> >>> There is an API for that, the IOMMU driver can return IOVA reserved
> >>> regions per device and the IOMMU core code will take care of mapping
> >>> these regions and reserving them in the IOVA allocator, so that
> >>> DMA-IOMMU code will not use it for allocations.
> >>>
> >>> Have a look at the iommu_ops->get_resv_regions() and
> >>> iommu_ops->put_resv_regions().
> >> I know about the reserved regions IOMMU API, but the main problem here,
> >> in case of Exynos, is that those reserved regions won't be created by
> >> the IOMMU driver but by the IOMMU client device. It is just a result how
> >> the media drivers manages their IOVA space. They simply have to load
> >> firmware at the IOVA address lower than the any address of the used
> >> buffers.
> > I've been working on adding a way to automatically add direct mappings
> > using reserved-memory regions parsed from device tree, see:
> >
> >  
> > https://lore.kernel.org/lkml/2020090413.691933-1-thierry.red...@gmail.com/
> >
> > Perhaps this can be of use? With that you should be able to add a
> > reserved-memory region somewhere in the lower range that you need for
> > firmware images and have that automatically added as a direct mapping
> > so that it won't be reused later on for dynamic allocations.
> 
> Frankly, using that would be even bigger hack than what I've proposed in 
> my workaround. I see no point polluting DT with such artificial regions 
> just to ensure specific IOVA space layout.

I think I misunderstood the requirements that you have. Sounds like
there are no actual restrictions for where exactly the memory resides
for the firmware, it just has to be lower than any of the buffer
allocations. I agree, in that case using reserved memory regions does
not make sense at all.

Thierry


signature.asc
Description: PGP signature
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: IOVA allocation dependency between firmware buffer and remaining buffers

2020-09-24 Thread Shaik Ameer Basha
Hi Robin and Marek,


On Thu, Sep 24, 2020 at 4:36 PM Robin Murphy  wrote:
>
> On 2020-09-24 11:47, Marek Szyprowski wrote:
> > Hi Robin,
> >
> > On 24.09.2020 12:40, Robin Murphy wrote:
> >> On 2020-09-24 11:16, Thierry Reding wrote:
> >>> On Thu, Sep 24, 2020 at 10:46:46AM +0200, Marek Szyprowski wrote:
>  On 24.09.2020 10:28, Joerg Roedel wrote:
> > On Wed, Sep 23, 2020 at 08:48:26AM +0200, Marek Szyprowski wrote:
> >> It allows to remap given buffer at the specific IOVA address,
> >> although
> >> it doesn't guarantee that those specific addresses won't be later
> >> used
> >> by the IOVA allocator. Probably it would make sense to add an API for
> >> generic IOMMU-DMA framework to mark the given IOVA range as
> >> reserved/unused to protect them.
> > There is an API for that, the IOMMU driver can return IOVA reserved
> > regions per device and the IOMMU core code will take care of mapping
> > these regions and reserving them in the IOVA allocator, so that
> > DMA-IOMMU code will not use it for allocations.
> >
> > Have a look at the iommu_ops->get_resv_regions() and
> > iommu_ops->put_resv_regions().
> 
>  I know about the reserved regions IOMMU API, but the main problem here,
>  in case of Exynos, is that those reserved regions won't be created by
>  the IOMMU driver but by the IOMMU client device. It is just a result
>  how
>  the media drivers manages their IOVA space. They simply have to load
>  firmware at the IOVA address lower than the any address of the used
>  buffers.
> >>>
> >>> I've been working on adding a way to automatically add direct mappings
> >>> using reserved-memory regions parsed from device tree, see:
> >>>
> >>> https://lore.kernel.org/lkml/2020090413.691933-1-thierry.red...@gmail.com/
> >>>
> >>> Perhaps this can be of use? With that you should be able to add a
> >>> reserved-memory region somewhere in the lower range that you need for
> >>> firmware images and have that automatically added as a direct mapping
> >>> so that it won't be reused later on for dynamic allocations.
> >>
> >> It can't easily be a *direct* mapping though - if the driver has to
> >> use the DMA masks to ensure that everything stays within the
> >> addressable range, then (as far as I'm aware) there's no physical
> >> memory that low down to equal the DMA addresses.
> >>
> >> TBH I'm not convinced that this is a sufficiently common concern to
> >> justify new APIs, or even to try to make overly generic. I think just
> >> implementing a new DMA attribute to say "please allocate/map this
> >> particular request at the lowest DMA address possible" would be good
> >> enough. Such a thing could also serve PCI drivers that actually care
> >> about SAC/DAC to give us more of a chance of removing the "try a
> >> 32-bit mask first" trick from everyone's hotpath...
> >
> > Hmm, I like the idea of such DMA attribute! It should make things really
> > simple, especially in the drivers. Thanks for the great idea! I will try
> > to implement it then instead of the workarounds I've proposed in
> > s5p-mfc/exynos4-is drivers.
>
> Right, I think it's fair to draw a line and say that anyone who wants a
> *specific* address needs to manage their own IOMMU domain.
>
> In the backend I suspect it's going to be cleanest to implement a
> dedicated iova_alloc_low() (or similar) function in the IOVA API that
> sidesteps all of the existing allocation paths and goes straight to the
> rbtree.

This is the place we started with..

But our solution was to provide an API which limits the allocation
range per device (dynamically) based on the driver request..
Something like,
limit_iova_alloc_range(dev, low_iova, high_iova); /* via helpers */

When multiple devices use the same IOVA space, how we can handle api's
like " iova_alloc_low()" ?
And providing APIs like " limit_iova_alloc_range()" may cater similar
future requirements from drivers instead of worrying about
high/low/mid etc.

Again, flexibility should be there with user drivers to request the
range they want at any point of time...

Please let us know your inputs.

>
> Robin.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH] iommu/vt-d: gracefully handle DMAR units with no supported address widths

2020-09-24 Thread David Woodhouse
From: David Woodhouse 

Instead of bailing out completely, such a unit can still be used for
interrupt remapping.

Signed-off-by: David Woodhouse 
---
 drivers/iommu/intel/dmar.c | 46 +-
 1 file changed, 31 insertions(+), 15 deletions(-)

diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index 93e6345f3414..4420a759f095 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -1024,8 +1024,8 @@ static int alloc_iommu(struct dmar_drhd_unit *drhd)
 {
struct intel_iommu *iommu;
u32 ver, sts;
-   int agaw = 0;
-   int msagaw = 0;
+   int agaw = -1;
+   int msagaw = -1;
int err;
 
if (!drhd->reg_base_addr) {
@@ -1050,17 +1050,28 @@ static int alloc_iommu(struct dmar_drhd_unit *drhd)
}
 
err = -EINVAL;
-   agaw = iommu_calculate_agaw(iommu);
-   if (agaw < 0) {
-   pr_err("Cannot get a valid agaw for iommu (seq_id = %d)\n",
-   iommu->seq_id);
-   goto err_unmap;
-   }
-   msagaw = iommu_calculate_max_sagaw(iommu);
-   if (msagaw < 0) {
-   pr_err("Cannot get a valid max agaw for iommu (seq_id = %d)\n",
-   iommu->seq_id);
-   goto err_unmap;
+   if (cap_sagaw(iommu->cap) == 0) {
+   pr_info("%s: No supported address widths. Not attempting DMA 
translation.\n",
+   iommu->name);
+   drhd->ignored = 1;
+   }
+
+   if (!drhd->ignored) {
+   agaw = iommu_calculate_agaw(iommu);
+   if (agaw < 0) {
+   pr_err("Cannot get a valid agaw for iommu (seq_id = 
%d)\n",
+  iommu->seq_id);
+   drhd->ignored = 1;
+   }
+   }
+   if (!drhd->ignored) {
+   msagaw = iommu_calculate_max_sagaw(iommu);
+   if (msagaw < 0) {
+   pr_err("Cannot get a valid max agaw for iommu (seq_id = 
%d)\n",
+  iommu->seq_id);
+   drhd->ignored = 1;
+   agaw = -1;
+   }
}
iommu->agaw = agaw;
iommu->msagaw = msagaw;
@@ -1087,7 +1098,12 @@ static int alloc_iommu(struct dmar_drhd_unit *drhd)
 
raw_spin_lock_init(>register_lock);
 
-   if (intel_iommu_enabled) {
+   /*
+* This is only for hotplug; at boot time intel_iommu_enabled won't
+* be set yet. When intel_iommu_init() runs, it registers the units
+* present at boot time, then sets intel_iommu_enabled.
+*/
+   if (intel_iommu_enabled && !drhd->ignored) {
err = iommu_device_sysfs_add(>iommu, NULL,
 intel_iommu_groups,
 "%s", iommu->name);
@@ -1117,7 +1133,7 @@ static int alloc_iommu(struct dmar_drhd_unit *drhd)
 
 static void free_iommu(struct intel_iommu *iommu)
 {
-   if (intel_iommu_enabled) {
+   if (intel_iommu_enabled && iommu->iommu.ops) {
iommu_device_unregister(>iommu);
iommu_device_sysfs_remove(>iommu);
}
-- 
2.17.1



smime.p7s
Description: S/MIME cryptographic signature
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v2 1/4] dt-bindings: reserved-memory: Document "active" property

2020-09-24 Thread Thierry Reding
On Thu, Sep 24, 2020 at 04:23:59PM +0300, Dmitry Osipenko wrote:
> 04.09.2020 15:59, Thierry Reding пишет:
> > From: Thierry Reding 
> > 
> > Reserved memory regions can be marked as "active" if hardware is
> > expected to access the regions during boot and before the operating
> > system can take control. One example where this is useful is for the
> > operating system to infer whether the region needs to be identity-
> > mapped through an IOMMU.
> > 
> > Signed-off-by: Thierry Reding 
> > ---
> >  .../bindings/reserved-memory/reserved-memory.txt   | 7 +++
> >  1 file changed, 7 insertions(+)
> > 
> > diff --git 
> > a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt 
> > b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> > index 4dd20de6977f..163d2927e4fc 100644
> > --- a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> > +++ b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> > @@ -63,6 +63,13 @@ reusable (optional) - empty property
> >able to reclaim it back. Typically that means that the operating
> >system can use that region to store volatile or cached data that
> >can be otherwise regenerated or migrated elsewhere.
> > +active (optional) - empty property
> > +- If this property is set for a reserved memory region, it indicates
> > +  that some piece of hardware may be actively accessing this region.
> > +  Should the operating system want to enable IOMMU protection for a
> > +  device, all active memory regions must have been identity-mapped
> > +  in order to ensure that non-quiescent hardware during boot can
> > +  continue to access the memory.
> >  
> >  Linux implementation note:
> >  - If a "linux,cma-default" property is present, then Linux will use the
> > 
> 
> Hi,
> 
> Could you please explain what devices need this quirk? I see that you're
> targeting Tegra SMMU driver, which means that it should be some pre-T186
> device.

Primarily I'm looking at Tegra210 and later, because on earlier devices
the bootloader doesn't consistently initialize display. I know that it
does on some devices, but not all of them. This same code should also
work on Tegra186 and later (with an ARM SMMU) although the situation is
slightly more complicated there because IOMMU translations will fault by
default long before these identity mappings can be established.

> Is this reservation needed for some device that has display
> hardwired to a very specific IOMMU domain at the boot time?

No, this is only used to convey information about the active framebuffer
to the kernel. In practice the DMA/IOMMU code will use this information
to establish a 1:1 mapping on whatever IOMMU domain that was picked for
display.

> If you're targeting devices that don't have IOMMU enabled by default at
> the boot time, then this approach won't work for the existing devices
> which won't ever get an updated bootloader.

If the devices don't use an IOMMU, then there should be no problem. The
extra reserved-memory nodes would still be necessary to ensure that the
kernel doesn't reuse the framebuffer memory for the slab allocator, but
if no IOMMU is used, then the display controller accessing the memory
isn't going to cause problems other than perhaps scanning out data that
is no longer a framebuffer.

There should also be no problem for devices with an old bootloader
because this code is triggered by the presence of a reserved-memory node
referenced via the memory-region property. Devices with an old
bootloader should continue to work as they did before. Although I
suppose they would start faulting once we enable DMA/IOMMU integration
for Tegra SMMU if they have a bootloader that does initialize display to
actively scan out during boot.

> I think Robin Murphy already suggested that we should simply create
> a dummy "identity" IOMMU domain by default for the DRM/VDE devices and
> then replace it with an explicitly created domain within the drivers.

I don't recall reading about that suggestion. So does this mean that for
certain devices we'd want to basically passthrough by default and then
at some point during boot take over with a properly managed IOMMU
domain?

The primary goal here is to move towards using the DMA API rather than
the IOMMU API directly, so we don't really have the option of replacing
with an explicitly created domain. Unless we have code in the DMA/IOMMU
code that does this somehow.

But I'm not sure what would be a good way to mark certain devices as
needing an identity domain by default. Do we still use the reserved-
memory node for that? That would still require some sort of flag to
specify which reserved-memory regions would need this identity mapping
because, as was pointed out in earlier review, some devices may have
reserved-memory regions that are not meant to be identity mapped.

> Secondly, all NVIDIA bootloaders are passing tegra_fbmem=... via
> kernel's 

Re: [PATCH v10 11/11] vfio: Document nested stage control

2020-09-24 Thread Zenghui Yu

Hi Eric,

On 2020/3/21 0:19, Eric Auger wrote:

The VFIO API was enhanced to support nested stage control: a bunch of
new iotcls, one DMA FAULT region and an associated specific IRQ.

Let's document the process to follow to set up nested mode.

Signed-off-by: Eric Auger 


[...]


+The userspace must be prepared to receive faults. The VFIO-PCI device
+exposes one dedicated DMA FAULT region: it contains a ring buffer and
+its header that allows to manage the head/tail indices. The region is
+identified by the following index/subindex:
+- VFIO_REGION_TYPE_NESTED/VFIO_REGION_SUBTYPE_NESTED_DMA_FAULT
+
+The DMA FAULT region exposes a VFIO_REGION_INFO_CAP_PRODUCER_FAULT
+region capability that allows the userspace to retrieve the ABI version
+of the fault records filled by the host.


Nit: I don't see this capability in the code.


Thanks,
Zenghui
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 1/4] dt-bindings: reserved-memory: Document "active" property

2020-09-24 Thread Dmitry Osipenko
04.09.2020 15:59, Thierry Reding пишет:
> From: Thierry Reding 
> 
> Reserved memory regions can be marked as "active" if hardware is
> expected to access the regions during boot and before the operating
> system can take control. One example where this is useful is for the
> operating system to infer whether the region needs to be identity-
> mapped through an IOMMU.
> 
> Signed-off-by: Thierry Reding 
> ---
>  .../bindings/reserved-memory/reserved-memory.txt   | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git 
> a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt 
> b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> index 4dd20de6977f..163d2927e4fc 100644
> --- a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> +++ b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> @@ -63,6 +63,13 @@ reusable (optional) - empty property
>able to reclaim it back. Typically that means that the operating
>system can use that region to store volatile or cached data that
>can be otherwise regenerated or migrated elsewhere.
> +active (optional) - empty property
> +- If this property is set for a reserved memory region, it indicates
> +  that some piece of hardware may be actively accessing this region.
> +  Should the operating system want to enable IOMMU protection for a
> +  device, all active memory regions must have been identity-mapped
> +  in order to ensure that non-quiescent hardware during boot can
> +  continue to access the memory.
>  
>  Linux implementation note:
>  - If a "linux,cma-default" property is present, then Linux will use the
> 

Hi,

Could you please explain what devices need this quirk? I see that you're
targeting Tegra SMMU driver, which means that it should be some pre-T186
device. Is this reservation needed for some device that has display
hardwired to a very specific IOMMU domain at the boot time?

If you're targeting devices that don't have IOMMU enabled by default at
the boot time, then this approach won't work for the existing devices
which won't ever get an updated bootloader. I think Robin Murphy already
suggested that we should simply create a dummy "identity" IOMMU domain
by default for the DRM/VDE devices and then replace it with an
explicitly created domain within the drivers.

Secondly, all NVIDIA bootloaders are passing tegra_fbmem=... via
kernel's cmdline with the physical location of the framebuffer in
memory. Maybe we could support this option?
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v3 0/6] Add virtio-iommu built-in topology

2020-09-24 Thread Joerg Roedel
On Thu, Sep 24, 2020 at 08:41:21AM -0400, Michael S. Tsirkin wrote:
> But this has nothing to do with Linux.  There is also no guarantee that
> the two committees will decide to use exactly the same format. Once one
> of them sets the format in stone, we can add support for that format to
> linux. If another one is playing nice and uses the same format, we can
> use the same parsers. If it doesn't linux will have to follow suit.

Or Linux decides to support only one of the formats, which would then be
ACPI.

Regards,

Joerg
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 0/6] Add virtio-iommu built-in topology

2020-09-24 Thread Michael S. Tsirkin
On Thu, Sep 24, 2020 at 12:02:55PM +0200, Joerg Roedel wrote:
> On Thu, Sep 24, 2020 at 05:38:13AM -0400, Michael S. Tsirkin wrote:
> > On Thu, Sep 24, 2020 at 11:21:29AM +0200, Joerg Roedel wrote:
> > > On Thu, Sep 24, 2020 at 05:00:35AM -0400, Michael S. Tsirkin wrote:
> > > > OK so this looks good. Can you pls repost with the minor tweak
> > > > suggested and all acks included, and I will queue this?
> > > 
> > > My NACK still stands, as long as a few questions are open:
> > > 
> > >   1) The format used here will be the same as in the ACPI table? I
> > >  think the answer to this questions must be Yes, so this leads
> > >  to the real question:
> > 
> > I am not sure it's a must.
> 
> It is, having only one parser for the ACPI and MMIO descriptions was one
> of the selling points for MMIO in past discussions and I think it makes
> sense to keep them in sync.
> 
> > We can always tweak the parser if there are slight differences
> > between ACPI and virtio formats.
> 
> There is no guarantee that there only need to be "tweaks" until the
> ACPI table format is stablized.
> 
> Regards,
> 
>   Joerg

But this has nothing to do with Linux.  There is also no guarantee that
the two committees will decide to use exactly the same format. Once one
of them sets the format in stone, we can add support for that format to
linux. If another one is playing nice and uses the same format, we can
use the same parsers. If it doesn't linux will have to follow suit.

-- 
MST

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 02/13] iommu: amd: Prepare for generic IO page table framework

2020-09-24 Thread Robin Murphy

On 2020-09-23 11:14, Suravee Suthikulpanit wrote:

Add initial hook up code to implement generic IO page table framework.

Signed-off-by: Suravee Suthikulpanit 
---
  drivers/iommu/amd/Kconfig   |  1 +
  drivers/iommu/amd/Makefile  |  2 +-
  drivers/iommu/amd/amd_iommu_types.h | 32 +++
  drivers/iommu/amd/io_pgtable.c  | 89 +
  drivers/iommu/amd/iommu.c   | 10 
  drivers/iommu/io-pgtable.c  |  3 +
  include/linux/io-pgtable.h  |  2 +
  7 files changed, 128 insertions(+), 11 deletions(-)
  create mode 100644 drivers/iommu/amd/io_pgtable.c

diff --git a/drivers/iommu/amd/Kconfig b/drivers/iommu/amd/Kconfig
index 626b97d0dd21..a3cbafb603f5 100644
--- a/drivers/iommu/amd/Kconfig
+++ b/drivers/iommu/amd/Kconfig
@@ -10,6 +10,7 @@ config AMD_IOMMU
select IOMMU_API
select IOMMU_IOVA
select IOMMU_DMA
+   select IOMMU_IO_PGTABLE
depends on X86_64 && PCI && ACPI && HAVE_CMPXCHG_DOUBLE
help
  With this option you can enable support for AMD IOMMU hardware in
diff --git a/drivers/iommu/amd/Makefile b/drivers/iommu/amd/Makefile
index dc5a2fa4fd37..a935f8f4b974 100644
--- a/drivers/iommu/amd/Makefile
+++ b/drivers/iommu/amd/Makefile
@@ -1,4 +1,4 @@
  # SPDX-License-Identifier: GPL-2.0-only
-obj-$(CONFIG_AMD_IOMMU) += iommu.o init.o quirks.o
+obj-$(CONFIG_AMD_IOMMU) += iommu.o init.o quirks.o io_pgtable.o
  obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += debugfs.o
  obj-$(CONFIG_AMD_IOMMU_V2) += iommu_v2.o
diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index f696ac7c5f89..77cd8d966fbc 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -15,6 +15,7 @@
  #include 
  #include 
  #include 
+#include 
  
  /*

   * Maximum number of IOMMUs supported
@@ -252,6 +253,19 @@
  
  #define GA_GUEST_NR		0x1
  
+#define IOMMU_IN_ADDR_BIT_SIZE  52

+#define IOMMU_OUT_ADDR_BIT_SIZE 52
+
+/*
+ * This bitmap is used to advertise the page sizes our hardware support
+ * to the IOMMU core, which will then use this information to split
+ * physically contiguous memory regions it is mapping into page sizes
+ * that we support.
+ *
+ * 512GB Pages are not supported due to a hardware bug
+ */
+#define AMD_IOMMU_PGSIZES  ((~0xFFFUL) & ~(2ULL << 38))
+
  /* Bit value definition for dte irq remapping fields*/
  #define DTE_IRQ_PHYS_ADDR_MASK(((1ULL << 45)-1) << 6)
  #define DTE_IRQ_REMAP_INTCTL_MASK (0x3ULL << 60)
@@ -461,6 +475,23 @@ struct amd_irte_ops;
  
  #define AMD_IOMMU_FLAG_TRANS_PRE_ENABLED  (1 << 0)
  
+#define io_pgtable_to_data(x) \

+   container_of((x), struct amd_io_pgtable, iop)
+
+#define io_pgtable_ops_to_data(x) \
+   io_pgtable_to_data(io_pgtable_ops_to_pgtable(x))
+
+#define io_pgtable_ops_to_domain(x) \
+   container_of(io_pgtable_ops_to_data(x), \
+struct protection_domain, iop)
+
+struct amd_io_pgtable {
+   struct io_pgtable_cfg   pgtbl_cfg;
+   struct io_pgtable   iop;
+   int mode;
+   u64 *root;
+};
+
  /*
   * This structure contains generic data for  IOMMU protection domains
   * independent of their use.
@@ -469,6 +500,7 @@ struct protection_domain {
struct list_head dev_list; /* List of all devices in this domain */
struct iommu_domain domain; /* generic domain handle used by
   iommu core code */
+   struct amd_io_pgtable iop;
spinlock_t lock;/* mostly used to lock the page table*/
u16 id; /* the domain id written to the device table */
atomic64_t pt_root; /* pgtable root and pgtable mode */
diff --git a/drivers/iommu/amd/io_pgtable.c b/drivers/iommu/amd/io_pgtable.c
new file mode 100644
index ..452cad26a2b3
--- /dev/null
+++ b/drivers/iommu/amd/io_pgtable.c
@@ -0,0 +1,89 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * CPU-agnostic AMD IO page table allocator.
+ *
+ * Copyright (C) 2020 Advanced Micro Devices, Inc.
+ * Author: Suravee Suthikulpanit 
+ */
+
+#define pr_fmt(fmt) "AMD-Vi: " fmt
+#define dev_fmt(fmt)pr_fmt(fmt)
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#include "amd_iommu_types.h"
+#include "amd_iommu.h"
+
+static void v1_tlb_flush_all(void *cookie)
+{
+}
+
+static void v1_tlb_flush_walk(unsigned long iova, size_t size,
+ size_t granule, void *cookie)
+{
+}
+
+static void v1_tlb_flush_leaf(unsigned long iova, size_t size,
+ size_t granule, void *cookie)
+{
+}
+
+static void v1_tlb_add_page(struct iommu_iotlb_gather *gather,
+unsigned long iova, size_t granule,
+void *cookie)
+{
+   struct protection_domain *pdom = cookie;
+   struct iommu_domain 

Re: [PATCH v3 0/6] Add virtio-iommu built-in topology

2020-09-24 Thread Joerg Roedel
On Thu, Sep 24, 2020 at 12:29:53PM +0200, Jean-Philippe Brucker wrote:
> It's not possible to use exactly the same code for parsing.

The access methods can be separate and don't affect the parsing logic.

> The access methods are different (need to deal with port-IO for
> built-in description on PCI, for example) and more importantly, the
> structure is different as well. The ACPI table needs nodes for
> virtio-iommu while the built-in description is contained in the
> virtio-iommu itself. So the endpoint nodes point to virtio-iommu node
> on ACPI, while they don't need a pointer on the built-in desc. I kept
> as much as possible common in structures and implementation, but in
> the end we still need about 200 unique lines on each side.

Will it hurt the non-ACPI version to have the not-needed pointers anyway
to keep the parsers the same?


Joerg
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 1/4] dt-bindings: reserved-memory: Document "active" property

2020-09-24 Thread Thierry Reding
On Tue, Sep 15, 2020 at 02:36:48PM +0200, Thierry Reding wrote:
> On Mon, Sep 14, 2020 at 04:08:29PM -0600, Rob Herring wrote:
> > On Fri, Sep 04, 2020 at 02:59:57PM +0200, Thierry Reding wrote:
> > > From: Thierry Reding 
> > > 
> > > Reserved memory regions can be marked as "active" if hardware is
> > > expected to access the regions during boot and before the operating
> > > system can take control. One example where this is useful is for the
> > > operating system to infer whether the region needs to be identity-
> > > mapped through an IOMMU.
> > 
> > I like simple solutions, but this hardly seems adequate to solve the 
> > problem of passing IOMMU setup from bootloader/firmware to the OS. Like 
> > what is the IOVA that's supposed to be used if identity mapping is not 
> > used?
> 
> The assumption here is that if the region is not active there is no need
> for the IOVA to be specified because the kernel will allocate memory and
> assign any IOVA of its choosing.
> 
> Also, note that this is not meant as a way of passing IOMMU setup from
> the bootloader or firmware to the OS. The purpose of this is to specify
> that some region of memory is actively being accessed during boot. The
> particular case that I'm looking at is where the bootloader set up a
> splash screen and keeps it on during boot. The bootloader has not set up
> an IOMMU mapping and the identity mapping serves as a way of keeping the
> accesses by the display hardware working during the transitional period
> after the IOMMU translations have been enabled by the kernel but before
> the kernel display driver has had a chance to set up its own IOMMU
> mappings.
> 
> > If you know enough about the regions to assume identity mapping, then 
> > can't you know if active or not?
> 
> We could alternatively add some property that describes the region as
> requiring an identity mapping. But note that we can't make any
> assumptions here about the usage of these regions because the IOMMU
> driver simply has no way of knowing what they are being used for.
> 
> Some additional information is required in device tree for the IOMMU
> driver to be able to make that decision.

Rob, can you provide any hints on exactly how you want to move this
forward? I don't know in what direction you'd like to proceed.

Thierry


signature.asc
Description: PGP signature
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

RE: [PATCH v10 10/13] iommu/arm-smmu-v3: Check for SVA features

2020-09-24 Thread Shameerali Kolothum Thodi



> -Original Message-
> From: Jean-Philippe Brucker [mailto:jean-phili...@linaro.org]
> Sent: 24 September 2020 11:14
> To: Shameerali Kolothum Thodi 
> Cc: iommu@lists.linux-foundation.org; linux-arm-ker...@lists.infradead.org;
> linux...@kvack.org; fenghua...@intel.com; catalin.mari...@arm.com;
> Suzuki K Poulose ; robin.mur...@arm.com;
> zhangfei@linaro.org; w...@kernel.org
> Subject: Re: [PATCH v10 10/13] iommu/arm-smmu-v3: Check for SVA features
> 
> Hi Shameer,
> 
> On Mon, Sep 21, 2020 at 08:59:39AM +, Shameerali Kolothum Thodi
> wrote:
> > > +bool arm_smmu_sva_supported(struct arm_smmu_device *smmu)
> > > +{
> > > + unsigned long reg, fld;
> > > + unsigned long oas;
> > > + unsigned long asid_bits;
> > > + u32 feat_mask = ARM_SMMU_FEAT_BTM |
> > > ARM_SMMU_FEAT_COHERENCY;
> >
> > Why is BTM mandated for SVA? I couldn't find this requirement in SMMU spec
> > (Sorry if I missed it or this got discussed earlier). But if performance is 
> > the
> only concern here,
> > is it better just to allow it with a warning rather than limiting SMMUs 
> > without
> BTM?
> 
> It's a performance concern and requires to support multiple
> configurations, but the spec allows it. Are there SMMUs without BTM that
> need it?

Ok. Thanks for clarifying. May be better to add a comment here. Our platforms
do support BTM, but I had a strange case where the UEFI didn't enable DVM
but SMMU reported BTM and was causing random failures due to lack of
explicit tlbi on mm invalidation. Anyway that doesn't count here :)

Thanks,
Shameer

> Thanks,
> Jean
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: IOVA allocation dependency between firmware buffer and remaining buffers

2020-09-24 Thread Robin Murphy

On 2020-09-24 11:47, Marek Szyprowski wrote:

Hi Robin,

On 24.09.2020 12:40, Robin Murphy wrote:

On 2020-09-24 11:16, Thierry Reding wrote:

On Thu, Sep 24, 2020 at 10:46:46AM +0200, Marek Szyprowski wrote:

On 24.09.2020 10:28, Joerg Roedel wrote:

On Wed, Sep 23, 2020 at 08:48:26AM +0200, Marek Szyprowski wrote:

It allows to remap given buffer at the specific IOVA address,
although
it doesn't guarantee that those specific addresses won't be later
used
by the IOVA allocator. Probably it would make sense to add an API for
generic IOMMU-DMA framework to mark the given IOVA range as
reserved/unused to protect them.

There is an API for that, the IOMMU driver can return IOVA reserved
regions per device and the IOMMU core code will take care of mapping
these regions and reserving them in the IOVA allocator, so that
DMA-IOMMU code will not use it for allocations.

Have a look at the iommu_ops->get_resv_regions() and
iommu_ops->put_resv_regions().


I know about the reserved regions IOMMU API, but the main problem here,
in case of Exynos, is that those reserved regions won't be created by
the IOMMU driver but by the IOMMU client device. It is just a result
how
the media drivers manages their IOVA space. They simply have to load
firmware at the IOVA address lower than the any address of the used
buffers.


I've been working on adding a way to automatically add direct mappings
using reserved-memory regions parsed from device tree, see:

https://lore.kernel.org/lkml/2020090413.691933-1-thierry.red...@gmail.com/

Perhaps this can be of use? With that you should be able to add a
reserved-memory region somewhere in the lower range that you need for
firmware images and have that automatically added as a direct mapping
so that it won't be reused later on for dynamic allocations.


It can't easily be a *direct* mapping though - if the driver has to
use the DMA masks to ensure that everything stays within the
addressable range, then (as far as I'm aware) there's no physical
memory that low down to equal the DMA addresses.

TBH I'm not convinced that this is a sufficiently common concern to
justify new APIs, or even to try to make overly generic. I think just
implementing a new DMA attribute to say "please allocate/map this
particular request at the lowest DMA address possible" would be good
enough. Such a thing could also serve PCI drivers that actually care
about SAC/DAC to give us more of a chance of removing the "try a
32-bit mask first" trick from everyone's hotpath...


Hmm, I like the idea of such DMA attribute! It should make things really
simple, especially in the drivers. Thanks for the great idea! I will try
to implement it then instead of the workarounds I've proposed in
s5p-mfc/exynos4-is drivers.


Right, I think it's fair to draw a line and say that anyone who wants a 
*specific* address needs to manage their own IOMMU domain.


In the backend I suspect it's going to be cleanest to implement a 
dedicated iova_alloc_low() (or similar) function in the IOVA API that 
sidesteps all of the existing allocation paths and goes straight to the 
rbtree.


Robin.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 0/3] amd : iommu : Initial IOMMU support for SNP

2020-09-24 Thread Joerg Roedel
On Wed, Sep 23, 2020 at 12:13:44PM +, Suravee Suthikulpanit wrote:
> Suravee Suthikulpanit (3):
>   iommu: amd: Use 4K page for completion wait write-back semaphore
>   iommu: amd: Add support for RMP_PAGE_FAULT and RMP_HW_ERR
>   iommu: amd: Re-purpose Exclusion range registers to support SNP CWWB

Applied, thanks. I am slightly concerned about the re-purposing of the
exclusion-range registers based on a feature bit being set. This makes
the hardware incompatible to older IOMMU drivers which do not check the
FEATURE_SNP bit.

It will probably work in this case, as the firmware on systems with
IOMMU-SNP support will not declare exclusion ranges at all and
exclusion-ranges in the IOMMU hardware have been a bad idea since
forever, but it would have been nicer if hardware actually
provided a bit to enable this behavior.

Regards,

Joerg
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 00/13] iommu: amd: Add Generic IO Page Table Framework Support

2020-09-24 Thread Suravee Suthikulpanit




On 9/24/20 5:34 PM, Joerg Roedel wrote:

Hi Suravee,

On Wed, Sep 23, 2020 at 10:14:29AM +, Suravee Suthikulpanit wrote:

The framework allows callable implementation of IO page table.
This allows AMD IOMMU driver to switch between different types
of AMD IOMMU page tables (e.g. v1 vs. v2).


Is there a reason you created your own framework, there is already an
io-pgtable framework for ARM, maybe that can be reused?



Actually, this is the same framework used by ARM codes.
Sorry if the description is not clear.

Suravee

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: IOVA allocation dependency between firmware buffer and remaining buffers

2020-09-24 Thread Marek Szyprowski
Hi Robin,

On 24.09.2020 12:40, Robin Murphy wrote:
> On 2020-09-24 11:16, Thierry Reding wrote:
>> On Thu, Sep 24, 2020 at 10:46:46AM +0200, Marek Szyprowski wrote:
>>> On 24.09.2020 10:28, Joerg Roedel wrote:
 On Wed, Sep 23, 2020 at 08:48:26AM +0200, Marek Szyprowski wrote:
> It allows to remap given buffer at the specific IOVA address, 
> although
> it doesn't guarantee that those specific addresses won't be later 
> used
> by the IOVA allocator. Probably it would make sense to add an API for
> generic IOMMU-DMA framework to mark the given IOVA range as
> reserved/unused to protect them.
 There is an API for that, the IOMMU driver can return IOVA reserved
 regions per device and the IOMMU core code will take care of mapping
 these regions and reserving them in the IOVA allocator, so that
 DMA-IOMMU code will not use it for allocations.

 Have a look at the iommu_ops->get_resv_regions() and
 iommu_ops->put_resv_regions().
>>>
>>> I know about the reserved regions IOMMU API, but the main problem here,
>>> in case of Exynos, is that those reserved regions won't be created by
>>> the IOMMU driver but by the IOMMU client device. It is just a result 
>>> how
>>> the media drivers manages their IOVA space. They simply have to load
>>> firmware at the IOVA address lower than the any address of the used
>>> buffers.
>>
>> I've been working on adding a way to automatically add direct mappings
>> using reserved-memory regions parsed from device tree, see:
>>
>> https://lore.kernel.org/lkml/2020090413.691933-1-thierry.red...@gmail.com/
>>
>> Perhaps this can be of use? With that you should be able to add a
>> reserved-memory region somewhere in the lower range that you need for
>> firmware images and have that automatically added as a direct mapping
>> so that it won't be reused later on for dynamic allocations.
>
> It can't easily be a *direct* mapping though - if the driver has to 
> use the DMA masks to ensure that everything stays within the 
> addressable range, then (as far as I'm aware) there's no physical 
> memory that low down to equal the DMA addresses.
>
> TBH I'm not convinced that this is a sufficiently common concern to 
> justify new APIs, or even to try to make overly generic. I think just 
> implementing a new DMA attribute to say "please allocate/map this 
> particular request at the lowest DMA address possible" would be good 
> enough. Such a thing could also serve PCI drivers that actually care 
> about SAC/DAC to give us more of a chance of removing the "try a 
> 32-bit mask first" trick from everyone's hotpath...

Hmm, I like the idea of such DMA attribute! It should make things really 
simple, especially in the drivers. Thanks for the great idea! I will try 
to implement it then instead of the workarounds I've proposed in 
s5p-mfc/exynos4-is drivers.

Best regards
-- 
Marek Szyprowski, PhD
Samsung R Institute Poland

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: IOVA allocation dependency between firmware buffer and remaining buffers

2020-09-24 Thread Marek Szyprowski
Hi Thierry,

On 24.09.2020 12:16, Thierry Reding wrote:
> On Thu, Sep 24, 2020 at 10:46:46AM +0200, Marek Szyprowski wrote:
>> On 24.09.2020 10:28, Joerg Roedel wrote:
>>> On Wed, Sep 23, 2020 at 08:48:26AM +0200, Marek Szyprowski wrote:
 It allows to remap given buffer at the specific IOVA address, although
 it doesn't guarantee that those specific addresses won't be later used
 by the IOVA allocator. Probably it would make sense to add an API for
 generic IOMMU-DMA framework to mark the given IOVA range as
 reserved/unused to protect them.
>>> There is an API for that, the IOMMU driver can return IOVA reserved
>>> regions per device and the IOMMU core code will take care of mapping
>>> these regions and reserving them in the IOVA allocator, so that
>>> DMA-IOMMU code will not use it for allocations.
>>>
>>> Have a look at the iommu_ops->get_resv_regions() and
>>> iommu_ops->put_resv_regions().
>> I know about the reserved regions IOMMU API, but the main problem here,
>> in case of Exynos, is that those reserved regions won't be created by
>> the IOMMU driver but by the IOMMU client device. It is just a result how
>> the media drivers manages their IOVA space. They simply have to load
>> firmware at the IOVA address lower than the any address of the used
>> buffers.
> I've been working on adding a way to automatically add direct mappings
> using reserved-memory regions parsed from device tree, see:
>
>  
> https://lore.kernel.org/lkml/2020090413.691933-1-thierry.red...@gmail.com/
>
> Perhaps this can be of use? With that you should be able to add a
> reserved-memory region somewhere in the lower range that you need for
> firmware images and have that automatically added as a direct mapping
> so that it won't be reused later on for dynamic allocations.

Frankly, using that would be even bigger hack than what I've proposed in 
my workaround. I see no point polluting DT with such artificial regions 
just to ensure specific IOVA space layout.

I just looking for a nice way of reusing most of the IOMMU DMA-mapping 
code with a small addition of manual IOVA space management (just to 
remap the firmware buffer at the specific IOVA address).

Best regards
-- 
Marek Szyprowski, PhD
Samsung R Institute Poland

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: IOVA allocation dependency between firmware buffer and remaining buffers

2020-09-24 Thread Robin Murphy

On 2020-09-24 11:16, Thierry Reding wrote:

On Thu, Sep 24, 2020 at 10:46:46AM +0200, Marek Szyprowski wrote:

Hi Joerg,

On 24.09.2020 10:28, Joerg Roedel wrote:

On Wed, Sep 23, 2020 at 08:48:26AM +0200, Marek Szyprowski wrote:

It allows to remap given buffer at the specific IOVA address, although
it doesn't guarantee that those specific addresses won't be later used
by the IOVA allocator. Probably it would make sense to add an API for
generic IOMMU-DMA framework to mark the given IOVA range as
reserved/unused to protect them.

There is an API for that, the IOMMU driver can return IOVA reserved
regions per device and the IOMMU core code will take care of mapping
these regions and reserving them in the IOVA allocator, so that
DMA-IOMMU code will not use it for allocations.

Have a look at the iommu_ops->get_resv_regions() and
iommu_ops->put_resv_regions().


I know about the reserved regions IOMMU API, but the main problem here,
in case of Exynos, is that those reserved regions won't be created by
the IOMMU driver but by the IOMMU client device. It is just a result how
the media drivers manages their IOVA space. They simply have to load
firmware at the IOVA address lower than the any address of the used
buffers.


I've been working on adding a way to automatically add direct mappings
using reserved-memory regions parsed from device tree, see:

 
https://lore.kernel.org/lkml/2020090413.691933-1-thierry.red...@gmail.com/

Perhaps this can be of use? With that you should be able to add a
reserved-memory region somewhere in the lower range that you need for
firmware images and have that automatically added as a direct mapping
so that it won't be reused later on for dynamic allocations.


It can't easily be a *direct* mapping though - if the driver has to use 
the DMA masks to ensure that everything stays within the addressable 
range, then (as far as I'm aware) there's no physical memory that low 
down to equal the DMA addresses.


TBH I'm not convinced that this is a sufficiently common concern to 
justify new APIs, or even to try to make overly generic. I think just 
implementing a new DMA attribute to say "please allocate/map this 
particular request at the lowest DMA address possible" would be good 
enough. Such a thing could also serve PCI drivers that actually care 
about SAC/DAC to give us more of a chance of removing the "try a 32-bit 
mask first" trick from everyone's hotpath...


Robin.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 00/13] iommu: amd: Add Generic IO Page Table Framework Support

2020-09-24 Thread Joerg Roedel
Hi Suravee,

On Wed, Sep 23, 2020 at 10:14:29AM +, Suravee Suthikulpanit wrote:
> The framework allows callable implementation of IO page table.
> This allows AMD IOMMU driver to switch between different types
> of AMD IOMMU page tables (e.g. v1 vs. v2).

Is there a reason you created your own framework, there is already an
io-pgtable framework for ARM, maybe that can be reused?

Regards,

Joerg

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 0/3] iommu/tegra-smmu: Some small fixes

2020-09-24 Thread Joerg Roedel
On Fri, Sep 11, 2020 at 12:16:40AM -0700, Nicolin Chen wrote:
> These are a series of small fixes for tegra-smmu driver.
> They might not be critial bugs as current mainline does
> not enable a lot of clients, but be nicer to have since
> we are going to enable the DMA domain for other clients
> in the near future.
> 
> Nicolin Chen (3):
>   iommu/tegra-smmu: Do not use PAGE_SHIFT and PAGE_MASK
>   iommu/tegra-smmu: Fix iova->phys translation
>   iommu/tegra-smmu: Allow to group clients in same swgroup
> 
>  drivers/iommu/tegra-smmu.c | 25 +
>  1 file changed, 17 insertions(+), 8 deletions(-)

Applied, thanks.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 0/6] Add virtio-iommu built-in topology

2020-09-24 Thread Jean-Philippe Brucker
On Thu, Sep 24, 2020 at 12:02:55PM +0200, Joerg Roedel wrote:
> On Thu, Sep 24, 2020 at 05:38:13AM -0400, Michael S. Tsirkin wrote:
> > On Thu, Sep 24, 2020 at 11:21:29AM +0200, Joerg Roedel wrote:
> > > On Thu, Sep 24, 2020 at 05:00:35AM -0400, Michael S. Tsirkin wrote:
> > > > OK so this looks good. Can you pls repost with the minor tweak
> > > > suggested and all acks included, and I will queue this?
> > > 
> > > My NACK still stands, as long as a few questions are open:
> > > 
> > >   1) The format used here will be the same as in the ACPI table? I
> > >  think the answer to this questions must be Yes, so this leads
> > >  to the real question:
> > 
> > I am not sure it's a must.
> 
> It is, having only one parser for the ACPI and MMIO descriptions was one
> of the selling points for MMIO in past discussions and I think it makes
> sense to keep them in sync.

It's not possible to use exactly the same code for parsing. The access
methods are different (need to deal with port-IO for built-in description
on PCI, for example) and more importantly, the structure is different as
well. The ACPI table needs nodes for virtio-iommu while the built-in
description is contained in the virtio-iommu itself. So the endpoint nodes
point to virtio-iommu node on ACPI, while they don't need a pointer on the
built-in desc. I kept as much as possible common in structures and
implementation, but in the end we still need about 200 unique lines on
each side.

Thanks,
Jean

> 
> > We can always tweak the parser if there are slight differences
> > between ACPI and virtio formats.
> 
> There is no guarantee that there only need to be "tweaks" until the
> ACPI table format is stablized.
> 
> Regards,
> 
>   Joerg
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 3/3] iommu/tegra-smmu: Allow to group clients in same swgroup

2020-09-24 Thread Thierry Reding
On Fri, Sep 11, 2020 at 12:16:43AM -0700, Nicolin Chen wrote:
> There can be clients using the same swgroup in DT, for example i2c0
> and i2c1. The current driver will add them to separate IOMMU groups,
> though it has implemented device_group() callback which is to group
> devices using different swgroups like DC and DCB.
> 
> All clients having the same swgroup should be also added to the same
> IOMMU group so as to share an asid. Otherwise, the asid register may
> get overwritten every time a new device is attached.
> 
> Signed-off-by: Nicolin Chen 
> ---
>  drivers/iommu/tegra-smmu.c | 11 +++
>  1 file changed, 7 insertions(+), 4 deletions(-)

Makes sense:

Acked-by: Thierry Reding 


signature.asc
Description: PGP signature
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v3 0/6] Add virtio-iommu built-in topology

2020-09-24 Thread Gerd Hoffmann
On Thu, Sep 24, 2020 at 12:02:55PM +0200, Joerg Roedel wrote:
> On Thu, Sep 24, 2020 at 05:38:13AM -0400, Michael S. Tsirkin wrote:
> > On Thu, Sep 24, 2020 at 11:21:29AM +0200, Joerg Roedel wrote:
> > > On Thu, Sep 24, 2020 at 05:00:35AM -0400, Michael S. Tsirkin wrote:
> > > > OK so this looks good. Can you pls repost with the minor tweak
> > > > suggested and all acks included, and I will queue this?
> > > 
> > > My NACK still stands, as long as a few questions are open:
> > > 
> > >   1) The format used here will be the same as in the ACPI table? I
> > >  think the answer to this questions must be Yes, so this leads
> > >  to the real question:
> > 
> > I am not sure it's a must.
> 
> It is, having only one parser for the ACPI and MMIO descriptions was one
> of the selling points for MMIO in past discussions and I think it makes
> sense to keep them in sync.

So that requirement basically kills the "we have something to play with
while the acpi table spec is in progress" argument.  Also note that qemu
microvm got acpi support meanwhile.

Are there other cases where neither ACPI nor DT are available?

take care,
  Gerd

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 2/3] iommu/tegra-smmu: Fix iova->phys translation

2020-09-24 Thread Thierry Reding
On Fri, Sep 11, 2020 at 12:16:42AM -0700, Nicolin Chen wrote:
> IOVA might not be always 4KB aligned. So tegra_smmu_iova_to_phys
> function needs to add on the lower 12-bit offset from input iova.
> 
> Signed-off-by: Nicolin Chen 
> ---
>  drivers/iommu/tegra-smmu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
> index 789d21c01b77..50b962b0647e 100644
> --- a/drivers/iommu/tegra-smmu.c
> +++ b/drivers/iommu/tegra-smmu.c
> @@ -795,7 +795,7 @@ static phys_addr_t tegra_smmu_iova_to_phys(struct 
> iommu_domain *domain,
>  
>   pfn = *pte & as->smmu->pfn_mask;
>  
> - return SMMU_PFN_PHYS(pfn);
> + return SMMU_PFN_PHYS(pfn) + SMMU_OFFSET_IN_PAGE(iova);
>  }
>  
>  static struct tegra_smmu *tegra_smmu_find(struct device_node *np)

Acked-by: Thierry Reding 


signature.asc
Description: PGP signature
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 1/3] iommu/tegra-smmu: Do not use PAGE_SHIFT and PAGE_MASK

2020-09-24 Thread Thierry Reding
On Fri, Sep 11, 2020 at 12:16:41AM -0700, Nicolin Chen wrote:
> PAGE_SHIFT and PAGE_MASK are defined corresponding to the page size
> for CPU virtual addresses, which means PAGE_SHIFT could be a number
> other than 12, but tegra-smmu maintains fixed 4KB IOVA pages and has
> fixed [21:12] bit range for PTE entries.
> 
> So this patch replaces all PAGE_SHIFT/PAGE_MASK references with the
> macros defined with SMMU_PTE_SHIFT.
> 
> Signed-off-by: Nicolin Chen 
> ---
>  drivers/iommu/tegra-smmu.c | 14 ++
>  1 file changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
> index 046add7acb61..789d21c01b77 100644
> --- a/drivers/iommu/tegra-smmu.c
> +++ b/drivers/iommu/tegra-smmu.c
> @@ -130,6 +130,11 @@ static inline u32 smmu_readl(struct tegra_smmu *smmu, 
> unsigned long offset)
>  #define SMMU_PDE_SHIFT 22
>  #define SMMU_PTE_SHIFT 12
>  
> +#define SMMU_PAGE_MASK   (~(SMMU_SIZE_PT-1))
> +#define SMMU_OFFSET_IN_PAGE(x)   ((unsigned long)(x) & ~SMMU_PAGE_MASK)
> +#define SMMU_PFN_PHYS(x) ((phys_addr_t)(x) << SMMU_PTE_SHIFT)
> +#define SMMU_PHYS_PFN(x) ((unsigned long)((x) >> SMMU_PTE_SHIFT))
> +
>  #define SMMU_PD_READABLE (1 << 31)
>  #define SMMU_PD_WRITABLE (1 << 30)
>  #define SMMU_PD_NONSECURE(1 << 29)
> @@ -644,7 +649,7 @@ static void tegra_smmu_set_pte(struct tegra_smmu_as *as, 
> unsigned long iova,
>  u32 *pte, dma_addr_t pte_dma, u32 val)
>  {
>   struct tegra_smmu *smmu = as->smmu;
> - unsigned long offset = offset_in_page(pte);
> + unsigned long offset = SMMU_OFFSET_IN_PAGE(pte);
>  
>   *pte = val;
>  
> @@ -726,7 +731,7 @@ __tegra_smmu_map(struct iommu_domain *domain, unsigned 
> long iova,
>   pte_attrs |= SMMU_PTE_WRITABLE;
>  
>   tegra_smmu_set_pte(as, iova, pte, pte_dma,
> -__phys_to_pfn(paddr) | pte_attrs);
> +SMMU_PHYS_PFN(paddr) | pte_attrs);
>  
>   return 0;
>  }
> @@ -790,7 +795,7 @@ static phys_addr_t tegra_smmu_iova_to_phys(struct 
> iommu_domain *domain,
>  
>   pfn = *pte & as->smmu->pfn_mask;
>  
> - return PFN_PHYS(pfn);
> + return SMMU_PFN_PHYS(pfn);
>  }
>  
>  static struct tegra_smmu *tegra_smmu_find(struct device_node *np)
> @@ -1108,7 +1113,8 @@ struct tegra_smmu *tegra_smmu_probe(struct device *dev,
>   smmu->dev = dev;
>   smmu->mc = mc;
>  
> - smmu->pfn_mask = BIT_MASK(mc->soc->num_address_bits - PAGE_SHIFT) - 1;
> + smmu->pfn_mask =
> + BIT_MASK(mc->soc->num_address_bits - SMMU_PTE_SHIFT) - 1;

checkpatch no longer warns about lines longer than 80 characters. The
new limit is 100, so you can fit this all on one line.

But either way:

Acked-by: Thierry Reding 


signature.asc
Description: PGP signature
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: IOVA allocation dependency between firmware buffer and remaining buffers

2020-09-24 Thread Thierry Reding
On Thu, Sep 24, 2020 at 10:46:46AM +0200, Marek Szyprowski wrote:
> Hi Joerg,
> 
> On 24.09.2020 10:28, Joerg Roedel wrote:
> > On Wed, Sep 23, 2020 at 08:48:26AM +0200, Marek Szyprowski wrote:
> >> It allows to remap given buffer at the specific IOVA address, although
> >> it doesn't guarantee that those specific addresses won't be later used
> >> by the IOVA allocator. Probably it would make sense to add an API for
> >> generic IOMMU-DMA framework to mark the given IOVA range as
> >> reserved/unused to protect them.
> > There is an API for that, the IOMMU driver can return IOVA reserved
> > regions per device and the IOMMU core code will take care of mapping
> > these regions and reserving them in the IOVA allocator, so that
> > DMA-IOMMU code will not use it for allocations.
> >
> > Have a look at the iommu_ops->get_resv_regions() and
> > iommu_ops->put_resv_regions().
> 
> I know about the reserved regions IOMMU API, but the main problem here, 
> in case of Exynos, is that those reserved regions won't be created by 
> the IOMMU driver but by the IOMMU client device. It is just a result how 
> the media drivers manages their IOVA space. They simply have to load 
> firmware at the IOVA address lower than the any address of the used 
> buffers.

I've been working on adding a way to automatically add direct mappings
using reserved-memory regions parsed from device tree, see:


https://lore.kernel.org/lkml/2020090413.691933-1-thierry.red...@gmail.com/

Perhaps this can be of use? With that you should be able to add a
reserved-memory region somewhere in the lower range that you need for
firmware images and have that automatically added as a direct mapping
so that it won't be reused later on for dynamic allocations.

Thierry


signature.asc
Description: PGP signature
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v10 10/13] iommu/arm-smmu-v3: Check for SVA features

2020-09-24 Thread Jean-Philippe Brucker
Hi Shameer,

On Mon, Sep 21, 2020 at 08:59:39AM +, Shameerali Kolothum Thodi wrote:
> > +bool arm_smmu_sva_supported(struct arm_smmu_device *smmu)
> > +{
> > +   unsigned long reg, fld;
> > +   unsigned long oas;
> > +   unsigned long asid_bits;
> > +   u32 feat_mask = ARM_SMMU_FEAT_BTM |
> > ARM_SMMU_FEAT_COHERENCY;
> 
> Why is BTM mandated for SVA? I couldn't find this requirement in SMMU spec
> (Sorry if I missed it or this got discussed earlier). But if performance is 
> the only concern here,
> is it better just to allow it with a warning rather than limiting SMMUs 
> without BTM?

It's a performance concern and requires to support multiple
configurations, but the spec allows it. Are there SMMUs without BTM that
need it?

Thanks,
Jean
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 0/6] Add virtio-iommu built-in topology

2020-09-24 Thread Joerg Roedel
On Thu, Sep 24, 2020 at 05:38:13AM -0400, Michael S. Tsirkin wrote:
> On Thu, Sep 24, 2020 at 11:21:29AM +0200, Joerg Roedel wrote:
> > On Thu, Sep 24, 2020 at 05:00:35AM -0400, Michael S. Tsirkin wrote:
> > > OK so this looks good. Can you pls repost with the minor tweak
> > > suggested and all acks included, and I will queue this?
> > 
> > My NACK still stands, as long as a few questions are open:
> > 
> > 1) The format used here will be the same as in the ACPI table? I
> >think the answer to this questions must be Yes, so this leads
> >to the real question:
> 
> I am not sure it's a must.

It is, having only one parser for the ACPI and MMIO descriptions was one
of the selling points for MMIO in past discussions and I think it makes
sense to keep them in sync.

> We can always tweak the parser if there are slight differences
> between ACPI and virtio formats.

There is no guarantee that there only need to be "tweaks" until the
ACPI table format is stablized.

Regards,

Joerg

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 0/5] iommu aux-domain APIs extensions

2020-09-24 Thread Joerg Roedel
On Tue, Sep 22, 2020 at 02:10:37PM +0800, Lu Baolu wrote:
> Hi Jorge and Alex,
> 
> A description of this patch series could be found here.
> 
> https://lore.kernel.org/linux-iommu/20200901033422.22249-1-baolu...@linux.intel.com/

Hmm, I am wondering if we can avoid all this hassle and special APIs by
making the mdev framework more visible outside of the vfio code. There
is an underlying bus implementation for mdevs, so is there a reason
those can't use the standard iommu-core code to setup IOMMU mappings?

What speaks against doing:

- IOMMU drivers capable of handling mdevs register iommu-ops
  for the mdev_bus.

- iommu_domain_alloc() takes bus_type as parameter, so there can
  be special domains be allocated for mdevs.

- Group creation and domain allocation will happen
  automatically in the iommu-core when a new mdev is registered
  through device-driver core code.

- There should be no need for special iommu_aux_* APIs, as one
  can attach a domain directly to >dev with
  iommu_attach_device(domain, >dev).

Doing it this way will probably also keep the mdev-special code in VFIO
small.

Regards,

Joerg
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: arm-smmu 5000000.iommu: Cannot accommodate DMA offset for IOMMU page tables

2020-09-24 Thread Joerg Roedel
On Thu, Sep 24, 2020 at 10:36:47AM +0100, Robin Murphy wrote:
> Yes, the issue was introduced by one of the changes in "dma-mapping:
> introduce DMA range map, supplanting dma_pfn_offset", so it only existed in
> the dma-mapping/for-next branch anyway.

Okay, alright then.

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 0/6] Add virtio-iommu built-in topology

2020-09-24 Thread Auger Eric
Hi,

Adding Al in the loop

On 9/24/20 11:38 AM, Michael S. Tsirkin wrote:
> On Thu, Sep 24, 2020 at 11:21:29AM +0200, Joerg Roedel wrote:
>> On Thu, Sep 24, 2020 at 05:00:35AM -0400, Michael S. Tsirkin wrote:
>>> OK so this looks good. Can you pls repost with the minor tweak
>>> suggested and all acks included, and I will queue this?
>>
>> My NACK still stands, as long as a few questions are open:
>>
>>  1) The format used here will be the same as in the ACPI table? I
>> think the answer to this questions must be Yes, so this leads
>> to the real question:
> 
> I am not sure it's a must.
> We can always tweak the parser if there are slight differences
> between ACPI and virtio formats.
> 
> But we do want the virtio format used here to be approved by the virtio
> TC, so it won't change.
> 
> Eric, Jean-Philippe, does one of you intend to create a github issue
> and request a ballot for the TC? It's been posted end of August with no
> changes ...
Jean-Philippe, would you?
> 
>>  2) Has the ACPI table format stabalized already? If and only if
>> the answer is Yes I will Ack these patches. We don't need to
>> wait until the ACPI table format is published in a
>> specification update, but at least some certainty that it
>> will not change in incompatible ways anymore is needed.
>>

Al, do you have any news about the the VIOT definition submission to
the UEFI ASWG?

Thank you in advance

Best Regards

Eric


> 
> Not that I know, but I don't see why it's a must.
> 
>> So what progress has been made with the ACPI table specification, is it
>> just a matter of time to get it approved or are there concerns?
>>
>> Regards,
>>
>>  Joerg
> 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 0/6] Add virtio-iommu built-in topology

2020-09-24 Thread Michael S. Tsirkin
On Thu, Sep 24, 2020 at 11:21:29AM +0200, Joerg Roedel wrote:
> On Thu, Sep 24, 2020 at 05:00:35AM -0400, Michael S. Tsirkin wrote:
> > OK so this looks good. Can you pls repost with the minor tweak
> > suggested and all acks included, and I will queue this?
> 
> My NACK still stands, as long as a few questions are open:
> 
>   1) The format used here will be the same as in the ACPI table? I
>  think the answer to this questions must be Yes, so this leads
>  to the real question:

I am not sure it's a must.
We can always tweak the parser if there are slight differences
between ACPI and virtio formats.

But we do want the virtio format used here to be approved by the virtio
TC, so it won't change.

Eric, Jean-Philippe, does one of you intend to create a github issue
and request a ballot for the TC? It's been posted end of August with no
changes ...

>   2) Has the ACPI table format stabalized already? If and only if
>  the answer is Yes I will Ack these patches. We don't need to
>  wait until the ACPI table format is published in a
>  specification update, but at least some certainty that it
>  will not change in incompatible ways anymore is needed.
> 

Not that I know, but I don't see why it's a must.

> So what progress has been made with the ACPI table specification, is it
> just a matter of time to get it approved or are there concerns?
> 
> Regards,
> 
>   Joerg

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: arm-smmu 5000000.iommu: Cannot accommodate DMA offset for IOMMU page tables

2020-09-24 Thread Robin Murphy

On 2020-09-24 10:25, Joerg Roedel wrote:

Hi Robin,

On Thu, Sep 24, 2020 at 10:08:46AM +0100, Robin Murphy wrote:

This should be fixed by 
https://lore.kernel.org/linux-iommu/daedc9364a19dc07487e4d07b8768b1e5934abd4.1600700881.git.robin.mur...@arm.com/T/#u
(already in linux-next).


Thanks! The question remains why this goes through the dma-mapping tree,
was it caused by a patch there?


Yes, the issue was introduced by one of the changes in "dma-mapping: 
introduce DMA range map, supplanting dma_pfn_offset", so it only existed 
in the dma-mapping/for-next branch anyway.


Robin.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: arm-smmu 5000000.iommu: Cannot accommodate DMA offset for IOMMU page tables

2020-09-24 Thread Joerg Roedel
Hi Robin,

On Thu, Sep 24, 2020 at 10:08:46AM +0100, Robin Murphy wrote:
> This should be fixed by 
> https://lore.kernel.org/linux-iommu/daedc9364a19dc07487e4d07b8768b1e5934abd4.1600700881.git.robin.mur...@arm.com/T/#u
> (already in linux-next).

Thanks! The question remains why this goes through the dma-mapping tree,
was it caused by a patch there?

Regards,

Joerg
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: kdump boot failing with IVRS checksum failure

2020-09-24 Thread Joerg Roedel
Hi Jerry,

On Mon, Sep 21, 2020 at 11:56:42AM -0700, Jerry Snitselaar wrote:
> We are seeing a kdump kernel boot failure in test on an HP DL325 Gen10
> and it was tracked down to 387caf0b759a ("iommu/amd: Treat per-device
> exclusion ranges as r/w unity-mapped regions"). Reproduced on 5.9-rc5
> and goes away with revert of the commit. There is a follow on commit
> that depends on this that was reverted as well 2ca6b6dc8512 ("iommu/amd:
> Remove unused variable"). I'm working on getting system access and want
> to see what the IVRS table looks like, but thought I'd give you heads
> up.

Thanks for looking into this, we really need to find the root-cause to
avoid the revert.

Thanks,

Joerg
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 0/6] Add virtio-iommu built-in topology

2020-09-24 Thread Joerg Roedel
On Thu, Sep 24, 2020 at 05:00:35AM -0400, Michael S. Tsirkin wrote:
> OK so this looks good. Can you pls repost with the minor tweak
> suggested and all acks included, and I will queue this?

My NACK still stands, as long as a few questions are open:

1) The format used here will be the same as in the ACPI table? I
   think the answer to this questions must be Yes, so this leads
   to the real question:
   
2) Has the ACPI table format stabalized already? If and only if
   the answer is Yes I will Ack these patches. We don't need to
   wait until the ACPI table format is published in a
   specification update, but at least some certainty that it
   will not change in incompatible ways anymore is needed.

So what progress has been made with the ACPI table specification, is it
just a matter of time to get it approved or are there concerns?

Regards,

Joerg
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: arm-smmu 5000000.iommu: Cannot accommodate DMA offset for IOMMU page tables

2020-09-24 Thread Robin Murphy

Hi Joerg,

On 2020-09-24 10:03, Joerg Roedel wrote:

Adding Will and Robin.


This should be fixed by 
https://lore.kernel.org/linux-iommu/daedc9364a19dc07487e4d07b8768b1e5934abd4.1600700881.git.robin.mur...@arm.com/T/#u 
(already in linux-next).


Thanks,
Robin.


On Mon, Sep 21, 2020 at 06:50:40PM +0530, Naresh Kamboju wrote:

arm64  Freescale Layerscape 2088A RDB Board boot failed with linux-next
5.9.0-rc5-next-20200921 kernel tag version. The kernel warning and then
kernel panic happened.

Reported-by: Naresh Kamboju 

metadata:
   git branch: master
   git repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next
   git commit: b10b8ad862118bf42c28a98b0f067619aadcfb23
   git describe: next-20200921
   make_kernelversion: 5.9.0-rc5
   kernel-config:
https://builds.tuxbuild.com/GxPuM0SSznSoSYYG8deYpQ/kernel.config


crash log,

[1.811830] nand: device found, Manufacturer ID: 0x2c, Chip ID: 0x48
[1.818202] nand: Micron MT29F16G08ABACAWP
[1.822314] nand: 2048 MiB, SLC, erase size: 512 KiB, page size:
4096, OOB size: 224
[1.830078] [ cut here ]
[1.834703] Driver must set ecc.strength when using hardware ECC
[1.840739] WARNING: CPU: 1 PID: 1 at
drivers/mtd/nand/raw/nand_base.c:5671 nand_scan_with_ids+0x110c/0x1498
[1.850568] Modules linked in:
[1.853621] CPU: 1 PID: 1 Comm: swapper/0 Not tainted
5.9.0-rc5-next-20200921 #1
[1.861015] Hardware name: Freescale Layerscape 2088A RDB Board (DT)
[1.867368] pstate: 4005 (nZcv daif -PAN -UAO -TCO BTYPE=--)
[1.873373] pc : nand_scan_with_ids+0x110c/0x1498
[1.878073] lr : nand_scan_with_ids+0x110c/0x1498
[1.882774] sp : 80001005ba50
[1.886083] x29: 80001005ba50 x28: 
[1.891395] x27: 0082edf98638 x26: 0048
[1.896706] x25: 002c x24: 0082edf98578
[1.902018] x23: 0001 x22: 0082ee6b
[1.907329] x21: 0082edf98840 x20: 
[1.912640] x19: 0082edf98080 x18: 0010
[1.917951] x17: 0010 x16: 833b5ff2
[1.923262] x15: 0082ee6b0480 x14: 
[1.928572] x13: 80009005b737 x12: 80001005b73f
[1.933883] x11: 80001005ba50 x10: 80001005ba50
[1.939194] x9 : dc379157bfbc x8 : 657274732e636365
[1.944504] x7 : 2074657320747375 x6 : dc37937ba000
[1.949815] x5 : dc37937baa58 x4 : 80001005b840
[1.955125] x3 :  x2 : 0082ee6b
[1.960436] x1 : 4732f0d38a403700 x0 : 
[1.965748] Call trace:
[1.968189]  nand_scan_with_ids+0x110c/0x1498
[1.972542]  fsl_ifc_nand_probe+0x474/0x6e0
[1.976723]  platform_drv_probe+0x5c/0xb0
[1.980729]  really_probe+0xf0/0x4d8
[1.984300]  driver_probe_device+0xfc/0x168
[1.988480]  device_driver_attach+0x7c/0x88
[1.992659]  __driver_attach+0xac/0x178
[1.996490]  bus_for_each_dev+0x78/0xc8
[2.000321]  driver_attach+0x2c/0x38
[2.003893]  bus_add_driver+0x14c/0x230
[2.007724]  driver_register+0x6c/0x128
[2.011555]  __platform_driver_register+0x50/0x60
[2.016258]  fsl_ifc_nand_driver_init+0x24/0x30
[2.020786]  do_one_initcall+0x4c/0x2d0
[2.024560] ata1: SATA link down (SStatus 0 SControl 300)
[2.024618]  kernel_init_freeable+0x214/0x280
[2.024624]  kernel_init+0x1c/0x120
[2.037849]  ret_from_fork+0x10/0x30
[2.041420] CPU: 1 PID: 1 Comm: swapper/0 Not tainted
5.9.0-rc5-next-20200921 #1
[2.048815] Hardware name: Freescale Layerscape 2088A RDB Board (DT)
[2.055166] Call trace:
[2.057606]  dump_backtrace+0x0/0x1e0
[2.061263]  show_stack+0x20/0x30
[2.064574]  dump_stack+0xf8/0x168
[2.067972]  __warn+0xfc/0x178
[2.071023]  report_bug+0xfc/0x170
[2.074419]  bug_handler+0x28/0x70
[2.077816]  call_break_hook+0x70/0x88
[2.080559] ata2: SATA link down (SStatus 0 SControl 300)
[2.081560]  brk_handler+0x24/0x68
[2.081566]  do_debug_exception+0xb8/0x130
[2.094442]  el1_sync_handler+0xd8/0x120
[2.098360]  el1_sync+0x80/0x100
[2.101583]  nand_scan_with_ids+0x110c/0x1498
[2.105935]  fsl_ifc_nand_probe+0x474/0x6e0
[2.110115]  platform_drv_probe+0x5c/0xb0
[2.114120]  really_probe+0xf0/0x4d8
[2.117691]  driver_probe_device+0xfc/0x168
[2.121871]  device_driver_attach+0x7c/0x88
[2.126050]  __driver_attach+0xac/0x178
[2.129882]  bus_for_each_dev+0x78/0xc8
[2.133714]  driver_attach+0x2c/0x38
[2.137284]  bus_add_driver+0x14c/0x230
[2.141116]  driver_register+0x6c/0x128
[2.144946]  __platform_driver_register+0x50/0x60
[2.149647]  fsl_ifc_nand_driver_init+0x24/0x30
[2.154173]  do_one_initcall+0x4c/0x2d0
[2.158004]  kernel_init_freeable+0x214/0x280
[2.162358]  kernel_init+0x1c/0x120
[2.165841]  ret_from_fork+0x10/0x30
[2.169415] ---[ end trace d051012f465b08eb ]---
[2.174073] fsl,ifc-nand: probe of 53000.nand failed with error -22
[

Re: [PATCH] Revert "iommu/amd: Treat per-device exclusion ranges as r/w unity-mapped regions"

2020-09-24 Thread Joerg Roedel
On Wed, Sep 23, 2020 at 10:26:55AM +0800, Baoquan He wrote:
> A regression failure of kdump kernel boot was reported on a HPE system.
> Bisect points at commit 387caf0b759ac43 ("iommu/amd: Treat per-device
> exclusion ranges as r/w unity-mapped regions") as criminal. Reverting it
> fix the failure.
> 
> With the commit, kdump kernel will always print below error message, then
> naturally AMD iommu can't function normally during kdump kernel bootup.
> 
>   ~
>   AMD-Vi: [Firmware Bug]: IVRS invalid checksum
> 
> Why commit 387caf0b759ac43 causing it haven't been made clear.

I think this should be debugged further, in future IOMMUs the exclusion
range feature will not be available anymore (mmio-fields get re-used for
SNP). So starting to use them again is not going to work.

Regards,

Joerg

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: arm-smmu 5000000.iommu: Cannot accommodate DMA offset for IOMMU page tables

2020-09-24 Thread Joerg Roedel
Adding Will and Robin.

On Mon, Sep 21, 2020 at 06:50:40PM +0530, Naresh Kamboju wrote:
> arm64  Freescale Layerscape 2088A RDB Board boot failed with linux-next
> 5.9.0-rc5-next-20200921 kernel tag version. The kernel warning and then
> kernel panic happened.
> 
> Reported-by: Naresh Kamboju 
> 
> metadata:
>   git branch: master
>   git repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next
>   git commit: b10b8ad862118bf42c28a98b0f067619aadcfb23
>   git describe: next-20200921
>   make_kernelversion: 5.9.0-rc5
>   kernel-config:
> https://builds.tuxbuild.com/GxPuM0SSznSoSYYG8deYpQ/kernel.config
> 
> 
> crash log,
> 
> [1.811830] nand: device found, Manufacturer ID: 0x2c, Chip ID: 0x48
> [1.818202] nand: Micron MT29F16G08ABACAWP
> [1.822314] nand: 2048 MiB, SLC, erase size: 512 KiB, page size:
> 4096, OOB size: 224
> [1.830078] [ cut here ]
> [1.834703] Driver must set ecc.strength when using hardware ECC
> [1.840739] WARNING: CPU: 1 PID: 1 at
> drivers/mtd/nand/raw/nand_base.c:5671 nand_scan_with_ids+0x110c/0x1498
> [1.850568] Modules linked in:
> [1.853621] CPU: 1 PID: 1 Comm: swapper/0 Not tainted
> 5.9.0-rc5-next-20200921 #1
> [1.861015] Hardware name: Freescale Layerscape 2088A RDB Board (DT)
> [1.867368] pstate: 4005 (nZcv daif -PAN -UAO -TCO BTYPE=--)
> [1.873373] pc : nand_scan_with_ids+0x110c/0x1498
> [1.878073] lr : nand_scan_with_ids+0x110c/0x1498
> [1.882774] sp : 80001005ba50
> [1.886083] x29: 80001005ba50 x28: 
> [1.891395] x27: 0082edf98638 x26: 0048
> [1.896706] x25: 002c x24: 0082edf98578
> [1.902018] x23: 0001 x22: 0082ee6b
> [1.907329] x21: 0082edf98840 x20: 
> [1.912640] x19: 0082edf98080 x18: 0010
> [1.917951] x17: 0010 x16: 833b5ff2
> [1.923262] x15: 0082ee6b0480 x14: 
> [1.928572] x13: 80009005b737 x12: 80001005b73f
> [1.933883] x11: 80001005ba50 x10: 80001005ba50
> [1.939194] x9 : dc379157bfbc x8 : 657274732e636365
> [1.944504] x7 : 2074657320747375 x6 : dc37937ba000
> [1.949815] x5 : dc37937baa58 x4 : 80001005b840
> [1.955125] x3 :  x2 : 0082ee6b
> [1.960436] x1 : 4732f0d38a403700 x0 : 
> [1.965748] Call trace:
> [1.968189]  nand_scan_with_ids+0x110c/0x1498
> [1.972542]  fsl_ifc_nand_probe+0x474/0x6e0
> [1.976723]  platform_drv_probe+0x5c/0xb0
> [1.980729]  really_probe+0xf0/0x4d8
> [1.984300]  driver_probe_device+0xfc/0x168
> [1.988480]  device_driver_attach+0x7c/0x88
> [1.992659]  __driver_attach+0xac/0x178
> [1.996490]  bus_for_each_dev+0x78/0xc8
> [2.000321]  driver_attach+0x2c/0x38
> [2.003893]  bus_add_driver+0x14c/0x230
> [2.007724]  driver_register+0x6c/0x128
> [2.011555]  __platform_driver_register+0x50/0x60
> [2.016258]  fsl_ifc_nand_driver_init+0x24/0x30
> [2.020786]  do_one_initcall+0x4c/0x2d0
> [2.024560] ata1: SATA link down (SStatus 0 SControl 300)
> [2.024618]  kernel_init_freeable+0x214/0x280
> [2.024624]  kernel_init+0x1c/0x120
> [2.037849]  ret_from_fork+0x10/0x30
> [2.041420] CPU: 1 PID: 1 Comm: swapper/0 Not tainted
> 5.9.0-rc5-next-20200921 #1
> [2.048815] Hardware name: Freescale Layerscape 2088A RDB Board (DT)
> [2.055166] Call trace:
> [2.057606]  dump_backtrace+0x0/0x1e0
> [2.061263]  show_stack+0x20/0x30
> [2.064574]  dump_stack+0xf8/0x168
> [2.067972]  __warn+0xfc/0x178
> [2.071023]  report_bug+0xfc/0x170
> [2.074419]  bug_handler+0x28/0x70
> [2.077816]  call_break_hook+0x70/0x88
> [2.080559] ata2: SATA link down (SStatus 0 SControl 300)
> [2.081560]  brk_handler+0x24/0x68
> [2.081566]  do_debug_exception+0xb8/0x130
> [2.094442]  el1_sync_handler+0xd8/0x120
> [2.098360]  el1_sync+0x80/0x100
> [2.101583]  nand_scan_with_ids+0x110c/0x1498
> [2.105935]  fsl_ifc_nand_probe+0x474/0x6e0
> [2.110115]  platform_drv_probe+0x5c/0xb0
> [2.114120]  really_probe+0xf0/0x4d8
> [2.117691]  driver_probe_device+0xfc/0x168
> [2.121871]  device_driver_attach+0x7c/0x88
> [2.126050]  __driver_attach+0xac/0x178
> [2.129882]  bus_for_each_dev+0x78/0xc8
> [2.133714]  driver_attach+0x2c/0x38
> [2.137284]  bus_add_driver+0x14c/0x230
> [2.141116]  driver_register+0x6c/0x128
> [2.144946]  __platform_driver_register+0x50/0x60
> [2.149647]  fsl_ifc_nand_driver_init+0x24/0x30
> [2.154173]  do_one_initcall+0x4c/0x2d0
> [2.158004]  kernel_init_freeable+0x214/0x280
> [2.162358]  kernel_init+0x1c/0x120
> [2.165841]  ret_from_fork+0x10/0x30
> [2.169415] ---[ end trace d051012f465b08eb ]---
> [2.174073] fsl,ifc-nand: probe of 53000.nand failed with error -22
> [2.181882] spi-nor spi0.0: unrecognized JEDEC id 

Re: [PATCH v3 0/6] Add virtio-iommu built-in topology

2020-09-24 Thread Michael S. Tsirkin
On Fri, Sep 04, 2020 at 06:24:12PM +0200, Auger Eric wrote:
> Hi,
> 
> On 8/21/20 3:15 PM, Jean-Philippe Brucker wrote:
> > Add a topology description to the virtio-iommu driver and enable x86
> > platforms.
> > 
> > Since [v2] we have made some progress on adding ACPI support for
> > virtio-iommu, which is the preferred boot method on x86. It will be a
> > new vendor-agnostic table describing para-virtual topologies in a
> > minimal format. However some platforms don't use either ACPI or DT for
> > booting (for example microvm), and will need the alternative topology
> > description method proposed here. In addition, since the process to get
> > a new ACPI table will take a long time, this provides a boot method even
> > to ACPI-based platforms, if only temporarily for testing and
> > development.
> > 
> > v3:
> > * Add patch 1 that moves virtio-iommu to a subfolder.
> > * Split the rest:
> >   * Patch 2 adds topology-helper.c, which will be shared with the ACPI
> > support.
> >   * Patch 4 adds definitions.
> >   * Patch 5 adds parser in topology.c.
> > * Address other comments.
> > 
> > Linux and QEMU patches available at:
> > https://jpbrucker.net/git/linux virtio-iommu/devel
> > https://jpbrucker.net/git/qemu virtio-iommu/devel
> I have tested that series with above QEMU branch on ARM with virtio-net
> and virtio-blk translated devices in non DT mode.
> 
> It works for me:
> Tested-by: Eric Auger 
> 
> Thanks
> 
> Eric

OK so this looks good. Can you pls repost with the minor tweak
suggested and all acks included, and I will queue this?

Thanks!

> > 
> > [spec] https://lists.oasis-open.org/archives/virtio-dev/202008/msg00067.html
> > [v2] 
> > https://lore.kernel.org/linux-iommu/20200228172537.377327-1-jean-phili...@linaro.org/
> > [v1] 
> > https://lore.kernel.org/linux-iommu/20200214160413.1475396-1-jean-phili...@linaro.org/
> > [rfc] 
> > https://lore.kernel.org/linux-iommu/20191122105000.800410-1-jean-phili...@linaro.org/
> > 
> > Jean-Philippe Brucker (6):
> >   iommu/virtio: Move to drivers/iommu/virtio/
> >   iommu/virtio: Add topology helpers
> >   PCI: Add DMA configuration for virtual platforms
> >   iommu/virtio: Add topology definitions
> >   iommu/virtio: Support topology description in config space
> >   iommu/virtio: Enable x86 support
> > 
> >  drivers/iommu/Kconfig |  18 +-
> >  drivers/iommu/Makefile|   3 +-
> >  drivers/iommu/virtio/Makefile |   4 +
> >  drivers/iommu/virtio/topology-helpers.h   |  50 +
> >  include/linux/virt_iommu.h|  15 ++
> >  include/uapi/linux/virtio_iommu.h |  44 
> >  drivers/iommu/virtio/topology-helpers.c   | 196 
> >  drivers/iommu/virtio/topology.c   | 259 ++
> >  drivers/iommu/{ => virtio}/virtio-iommu.c |   4 +
> >  drivers/pci/pci-driver.c  |   5 +
> >  MAINTAINERS   |   3 +-
> >  11 files changed, 597 insertions(+), 4 deletions(-)
> >  create mode 100644 drivers/iommu/virtio/Makefile
> >  create mode 100644 drivers/iommu/virtio/topology-helpers.h
> >  create mode 100644 include/linux/virt_iommu.h
> >  create mode 100644 drivers/iommu/virtio/topology-helpers.c
> >  create mode 100644 drivers/iommu/virtio/topology.c
> >  rename drivers/iommu/{ => virtio}/virtio-iommu.c (99%)
> > 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 0/1] [PULL REQUEST] iommu/vt-d: patches for v5.10

2020-09-24 Thread Joerg Roedel
On Tue, Sep 22, 2020 at 02:08:42PM +0800, Lu Baolu wrote:
> Below patch is ready for v5.10. It includes:
> 
>  - Use device numa domain if RHSA is missing
> 
> Can you please consider them for v5.10?
> 
> Best regards,
> Lu Baolu
> 
> Lu Baolu (1):
>   iommu/vt-d: Use device numa domain if RHSA is missing
> 
>  drivers/iommu/intel/iommu.c | 37 +++--
>  1 file changed, 35 insertions(+), 2 deletions(-)

Applied, thanks.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v10 05/11] vfio/pci: Register an iommu fault handler

2020-09-24 Thread Zenghui Yu

Hi Eric,

On 2020/3/21 0:19, Eric Auger wrote:

Register an IOMMU fault handler which records faults in
the DMA FAULT region ring buffer. In a subsequent patch, we
will add the signaling of a specific eventfd to allow the
userspace to be notified whenever a new fault as shown up.

Signed-off-by: Eric Auger 


[...]


diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 586b89debed5..69595c240baf 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -27,6 +27,7 @@
  #include 
  #include 
  #include 
+#include 
  
  #include "vfio_pci_private.h"
  
@@ -283,6 +284,38 @@ static const struct vfio_pci_regops vfio_pci_dma_fault_regops = {

.add_capability = vfio_pci_dma_fault_add_capability,
  };
  
+int vfio_pci_iommu_dev_fault_handler(struct iommu_fault *fault, void *data)

+{
+   struct vfio_pci_device *vdev = (struct vfio_pci_device *)data;
+   struct vfio_region_dma_fault *reg =
+   (struct vfio_region_dma_fault *)vdev->fault_pages;
+   struct iommu_fault *new =
+   (struct iommu_fault *)(vdev->fault_pages + reg->offset +
+   reg->head * reg->entry_size);


Shouldn't 'reg->head' be protected under the fault_queue_lock? Otherwise
things may change behind our backs...

We shouldn't take any assumption about how IOMMU driver would report the
fault (serially or in parallel), I think.


+   int head, tail, size;
+   int ret = 0;
+
+   if (fault->type != IOMMU_FAULT_DMA_UNRECOV)
+   return -ENOENT;
+
+   mutex_lock(>fault_queue_lock);
+
+   head = reg->head;
+   tail = reg->tail;
+   size = reg->nb_entries;
+
+   if (CIRC_SPACE(head, tail, size) < 1) {
+   ret = -ENOSPC;
+   goto unlock;
+   }
+
+   *new = *fault;
+   reg->head = (head + 1) % size;
+unlock:
+   mutex_unlock(>fault_queue_lock);
+   return ret;
+}
+
  #define DMA_FAULT_RING_LENGTH 512
  
  static int vfio_pci_init_dma_fault_region(struct vfio_pci_device *vdev)

@@ -317,6 +350,13 @@ static int vfio_pci_init_dma_fault_region(struct 
vfio_pci_device *vdev)
header->entry_size = sizeof(struct iommu_fault);
header->nb_entries = DMA_FAULT_RING_LENGTH;
header->offset = sizeof(struct vfio_region_dma_fault);
+
+   ret = iommu_register_device_fault_handler(>pdev->dev,
+   vfio_pci_iommu_dev_fault_handler,
+   vdev);
+   if (ret)
+   goto out;
+
return 0;
  out:
kfree(vdev->fault_pages);



Thanks,
Zenghui
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/exynos: add missing put_device() call in exynos_iommu_of_xlate()

2020-09-24 Thread Joerg Roedel
On Fri, Sep 18, 2020 at 05:27:59PM +0200, Marek Szyprowski wrote:
> Hi
> 
> On 18.09.2020 03:13, Yu Kuai wrote:
> > if of_find_device_by_node() succeed, exynos_iommu_of_xlate() doesn't have
> > a corresponding put_device(). Thus add put_device() to fix the exception
> > handling for this function implementation.
> >
> > Fixes: aa759fd376fb ("iommu/exynos: Add callback for initializing devices 
> > from device tree")
> > Signed-off-by: Yu Kuai 
> 
> Thanks for the fix!
> 
> Acked-by: Marek Szyprowski 

Applied for v5.9, thanks.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: IOVA allocation dependency between firmware buffer and remaining buffers

2020-09-24 Thread Marek Szyprowski
Hi Joerg,

On 24.09.2020 10:28, Joerg Roedel wrote:
> On Wed, Sep 23, 2020 at 08:48:26AM +0200, Marek Szyprowski wrote:
>> It allows to remap given buffer at the specific IOVA address, although
>> it doesn't guarantee that those specific addresses won't be later used
>> by the IOVA allocator. Probably it would make sense to add an API for
>> generic IOMMU-DMA framework to mark the given IOVA range as
>> reserved/unused to protect them.
> There is an API for that, the IOMMU driver can return IOVA reserved
> regions per device and the IOMMU core code will take care of mapping
> these regions and reserving them in the IOVA allocator, so that
> DMA-IOMMU code will not use it for allocations.
>
> Have a look at the iommu_ops->get_resv_regions() and
> iommu_ops->put_resv_regions().

I know about the reserved regions IOMMU API, but the main problem here, 
in case of Exynos, is that those reserved regions won't be created by 
the IOMMU driver but by the IOMMU client device. It is just a result how 
the media drivers manages their IOVA space. They simply have to load 
firmware at the IOVA address lower than the any address of the used 
buffers.

Best regards
-- 
Marek Szyprowski, PhD
Samsung R Institute Poland

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v9 3/7] iommu/uapi: Introduce enum type for PASID data format

2020-09-24 Thread Joerg Roedel
Hi Jacob,

On Fri, Sep 18, 2020 at 10:11:08AM -0700, Jacob Pan wrote:
> On Fri, 18 Sep 2020 11:44:50 +0200, Joerg Roedel  wrote:
> 
> > On Fri, Sep 11, 2020 at 02:57:52PM -0700, Jacob Pan wrote:
> > > There can be multiple vendor-specific PASID data formats used in UAPI
> > > structures. This patch adds enum type with a last entry which makes
> > > range checking much easier.  
> > 
> > But it also makes it much easier to screw up the numbers (which are ABI)
> > by inserting a new value into the middle. I prefer defines here, or
> > alternativly BUILD_BUG_ON() checks for the numbers.
> > 
> I am not following, the purpose of IOMMU_PASID_FORMAT_LAST *is* for
> preparing the future insertion of new value into the middle.
> The checking against IOMMU_PASID_FORMAT_LAST is to protect ABI
> compatibility by making sure that out of range format are rejected in all
> versions of the ABI.

But with the enum you could have:

enum {
VTD_FOO,
SMMU_FOO,
LAST,
};

which makes VTD_FOO==0 and SMMU_FOO==1, and when in the next version
someone adds:

enum {
VTD_FOO,
VTD_BAR,
SMMU_FOO,
LAST,
};

then SMMU_FOO will become 2 and break ABI. So I'd like to have this
checked somewhere.

Regards,

Joerg
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 5/6] iommu/virtio: Support topology description in config space

2020-09-24 Thread Jean-Philippe Brucker
On Fri, Sep 04, 2020 at 06:05:35PM +0200, Auger Eric wrote:
> > +static inline int viommu_pci_find_capability(struct pci_dev *dev, u8 
> > cfg_type,
> > +struct viommu_cap_config *cap)
> not sure the inline is useful here

No, I'll drop it

Thanks,
Jean
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 2/6] iommu/virtio: Add topology helpers

2020-09-24 Thread Jean-Philippe Brucker
On Fri, Sep 04, 2020 at 06:22:12PM +0200, Auger Eric wrote:
> > +/**
> > + * virt_dma_configure - Configure DMA of virtualized devices
> > + * @dev: the endpoint
> > + *
> > + * Setup the DMA and IOMMU ops of a virtual device, for platforms without 
> > DT or
> > + * ACPI.
> > + *
> > + * Return: -EPROBE_DEFER if the device is managed by an IOMMU that hasn't 
> > been
> > + *   probed yet, 0 otherwise
> > + */
> > +int virt_dma_configure(struct device *dev)
> > +{
> > +   const struct iommu_ops *iommu_ops;
> > +
> > +   iommu_ops = virt_iommu_setup(dev);
> > +   if (IS_ERR_OR_NULL(iommu_ops)) {
> > +   int ret = PTR_ERR(iommu_ops);
> > +
> > +   if (ret == -EPROBE_DEFER || ret == 0)
> > +   return ret;
> > +   dev_err(dev, "error %d while setting up virt IOMMU\n", ret);
> > +   return 0;
> why do we return 0 here?

So we can fall back to another method (ACPI or DT) if available

> Besides
> 
> Reviewed-by: Eric Auger 

Thanks!
Jean
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: IOVA allocation dependency between firmware buffer and remaining buffers

2020-09-24 Thread Joerg Roedel
On Wed, Sep 23, 2020 at 08:48:26AM +0200, Marek Szyprowski wrote:
> It allows to remap given buffer at the specific IOVA address, although 
> it doesn't guarantee that those specific addresses won't be later used 
> by the IOVA allocator. Probably it would make sense to add an API for 
> generic IOMMU-DMA framework to mark the given IOVA range as 
> reserved/unused to protect them.

There is an API for that, the IOMMU driver can return IOVA reserved
regions per device and the IOMMU core code will take care of mapping
these regions and reserving them in the IOVA allocator, so that
DMA-IOMMU code will not use it for allocations.

Have a look at the iommu_ops->get_resv_regions() and
iommu_ops->put_resv_regions().

Regards,

Joerg
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v10 04/11] vfio/pci: Add VFIO_REGION_TYPE_NESTED region type

2020-09-24 Thread Zenghui Yu

Hi Eric,

On 2020/3/21 0:19, Eric Auger wrote:

Add a new specific DMA_FAULT region aiming to exposed nested mode
translation faults.

The region has a ring buffer that contains the actual fault
records plus a header allowing to handle it (tail/head indices,
max capacity, entry size). At the moment the region is dimensionned
for 512 fault records.

Signed-off-by: Eric Auger 


[...]


diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 379a02c36e37..586b89debed5 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -260,6 +260,69 @@ int vfio_pci_set_power_state(struct vfio_pci_device *vdev, 
pci_power_t state)
return ret;
  }
  
+static void vfio_pci_dma_fault_release(struct vfio_pci_device *vdev,

+  struct vfio_pci_region *region)
+{
+}
+
+static int vfio_pci_dma_fault_add_capability(struct vfio_pci_device *vdev,
+struct vfio_pci_region *region,
+struct vfio_info_cap *caps)
+{
+   struct vfio_region_info_cap_fault cap = {
+   .header.id = VFIO_REGION_INFO_CAP_DMA_FAULT,
+   .header.version = 1,
+   .version = 1,
+   };
+   return vfio_info_add_capability(caps, , sizeof(cap));
+}
+
+static const struct vfio_pci_regops vfio_pci_dma_fault_regops = {
+   .rw = vfio_pci_dma_fault_rw,
+   .release= vfio_pci_dma_fault_release,
+   .add_capability = vfio_pci_dma_fault_add_capability,
+};
+
+#define DMA_FAULT_RING_LENGTH 512
+
+static int vfio_pci_init_dma_fault_region(struct vfio_pci_device *vdev)
+{
+   struct vfio_region_dma_fault *header;
+   size_t size;
+   int ret;
+
+   mutex_init(>fault_queue_lock);
+
+   /*
+* We provision 1 page for the header and space for
+* DMA_FAULT_RING_LENGTH fault records in the ring buffer.
+*/
+   size = ALIGN(sizeof(struct iommu_fault) *
+DMA_FAULT_RING_LENGTH, PAGE_SIZE) + PAGE_SIZE;
+
+   vdev->fault_pages = kzalloc(size, GFP_KERNEL);
+   if (!vdev->fault_pages)
+   return -ENOMEM;
+
+   ret = vfio_pci_register_dev_region(vdev,
+   VFIO_REGION_TYPE_NESTED,
+   VFIO_REGION_SUBTYPE_NESTED_DMA_FAULT,
+   _pci_dma_fault_regops, size,
+   VFIO_REGION_INFO_FLAG_READ | VFIO_REGION_INFO_FLAG_WRITE,
+   vdev->fault_pages);
+   if (ret)
+   goto out;
+
+   header = (struct vfio_region_dma_fault *)vdev->fault_pages;
+   header->entry_size = sizeof(struct iommu_fault);
+   header->nb_entries = DMA_FAULT_RING_LENGTH;
+   header->offset = sizeof(struct vfio_region_dma_fault);
+   return 0;
+out:
+   kfree(vdev->fault_pages);
+   return ret;
+}
+
  static int vfio_pci_enable(struct vfio_pci_device *vdev)
  {
struct pci_dev *pdev = vdev->pdev;
@@ -358,6 +421,10 @@ static int vfio_pci_enable(struct vfio_pci_device *vdev)
}
}
  
+	ret = vfio_pci_init_dma_fault_region(vdev);

+   if (ret)
+   goto disable_exit;
+
vfio_pci_probe_mmaps(vdev);
  
  	return 0;

@@ -1383,6 +1450,7 @@ static void vfio_pci_remove(struct pci_dev *pdev)
  
  	vfio_iommu_group_put(pdev->dev.iommu_group, >dev);

kfree(vdev->region);
+   kfree(vdev->fault_pages);
mutex_destroy(>ioeventfds_lock);
  
  	if (!disable_idle_d3)

diff --git a/drivers/vfio/pci/vfio_pci_private.h 
b/drivers/vfio/pci/vfio_pci_private.h
index 8a2c7607d513..a392f50e3a99 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -119,6 +119,8 @@ struct vfio_pci_device {
int ioeventfds_nr;
struct eventfd_ctx  *err_trigger;
struct eventfd_ctx  *req_trigger;
+   u8  *fault_pages;


What's the reason to use 'u8'? It doesn't match the type of header, nor
the type of ring buffer.


+   struct mutexfault_queue_lock;
struct list_headdummy_resources_list;
struct mutexioeventfds_lock;
struct list_headioeventfds_list;
@@ -150,6 +152,14 @@ extern ssize_t vfio_pci_vga_rw(struct vfio_pci_device 
*vdev, char __user *buf,
  extern long vfio_pci_ioeventfd(struct vfio_pci_device *vdev, loff_t offset,
   uint64_t data, int count, int fd);
  
+struct vfio_pci_fault_abi {

+   u32 entry_size;
+};


This is not used by this patch (and the whole series).


+
+extern size_t vfio_pci_dma_fault_rw(struct vfio_pci_device *vdev,
+   char __user *buf, size_t count,
+   loff_t *ppos, bool iswrite);
+
  extern int vfio_pci_init_perm_bits(void);
  extern void vfio_pci_uninit_perm_bits(void);
  
diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c

index