Re: [PATCH v6 09/22] vfio: VFIO_IOMMU_BIND/UNBIND_MSI

2019-03-22 Thread Alex Williamson
On Fri, 22 Mar 2019 10:30:02 +0100
Auger Eric  wrote:

> Hi Alex,
> On 3/22/19 12:01 AM, Alex Williamson wrote:
> > On Sun, 17 Mar 2019 18:22:19 +0100
> > Eric Auger  wrote:
> >   
> >> This patch adds the VFIO_IOMMU_BIND/UNBIND_MSI ioctl which aim
> >> to pass/withdraw the guest MSI binding to/from the host.
> >>
> >> Signed-off-by: Eric Auger 
> >>
> >> ---
> >> v3 -> v4:
> >> - add UNBIND
> >> - unwind on BIND error
> >>
> >> v2 -> v3:
> >> - adapt to new proto of bind_guest_msi
> >> - directly use vfio_iommu_for_each_dev
> >>
> >> v1 -> v2:
> >> - s/vfio_iommu_type1_guest_msi_binding/vfio_iommu_type1_bind_guest_msi
> >> ---
> >>  drivers/vfio/vfio_iommu_type1.c | 58 +
> >>  include/uapi/linux/vfio.h   | 29 +
> >>  2 files changed, 87 insertions(+)
> >>
> >> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> >> index 12a40b9db6aa..66513679081b 100644
> >> --- a/drivers/vfio/vfio_iommu_type1.c
> >> +++ b/drivers/vfio/vfio_iommu_type1.c
> >> @@ -1710,6 +1710,25 @@ static int vfio_cache_inv_fn(struct device *dev, void *data)
> >>	return iommu_cache_invalidate(d, dev, &ustruct->info);
> >>  }
> >>  
> >> +static int vfio_bind_msi_fn(struct device *dev, void *data)
> >> +{
> >> +  struct vfio_iommu_type1_bind_msi *ustruct =
> >> +  (struct vfio_iommu_type1_bind_msi *)data;
> >> +  struct iommu_domain *d = iommu_get_domain_for_dev(dev);
> >> +
> >> +  return iommu_bind_guest_msi(d, dev, ustruct->iova,
> >> +  ustruct->gpa, ustruct->size);
> >> +}
> >> +
> >> +static int vfio_unbind_msi_fn(struct device *dev, void *data)
> >> +{
> >> +  dma_addr_t *iova = (dma_addr_t *)data;
> >> +  struct iommu_domain *d = iommu_get_domain_for_dev(dev);  
> > 
> > Same as previous, we can encapsulate domain in our own struct to avoid
> > a lookup.
> >   
> >> +
> >> +  iommu_unbind_guest_msi(d, dev, *iova);  
> > 
> > Is it strange that iommu-core is exposing these interfaces at a device
> > level if every one of them requires us to walk all the devices?  Thanks,  
> 
> Hum this per device API was devised in response of Robin's comments on
> 
> [RFC v2 12/20] dma-iommu: Implement NESTED_MSI cookie.
> 
> "
> But that then seems to reveal a somewhat bigger problem - if the callers
> are simply registering IPAs, and relying on the ITS driver to grab an
> entry and fill in a PA later, then how does either one know *which* PA
> is supposed to belong to a given IPA in the case where you have multiple
> devices with different ITS targets assigned to the same guest? (and if
> it's possible to assume a guest will use per-device stage 1 mappings and
> present it with a single vITS backed by multiple pITSes, I think things
> start breaking even harder.)
> "
> 
> However looking back into the problem I wonder if there was an issue
> with the iommu_domain based API.
> 
> If my understanding is correct, when assigned devices are protected by a
> vIOMMU then they necessarily end up in separate host iommu domains even
> if they belong to the same iommu_domain on the guest. And there can only
> be a single device in this iommu_domain.

Don't forget that a container represents the IOMMU context in a vfio
environment, groups are associated with containers and a group may
contain one or more devices.  When a vIOMMU comes into play, we still
only have an IOMMU context per container.  If we have multiple devices
in a group, we run into problems with vIOMMU.  We can resolve this by
requiring that the user ignore all but one device in the group,
or making sure that the devices in the group have the same IOMMU
context.  The latter we could do in QEMU if PCIe-to-PCI bridges there
masked the per-device address space as it does on real hardware (ie.
there is no requester ID on conventional PCI, all transactions appear to
the IOMMU with the bridge requester ID).  So I raise this question
because vfio's minimum domain granularity is a group.

> If this is confirmed, there is an unambiguous association between 1
> physical iommu_domain, 1 device, 1 S1 mapping and 1 physical MSI
> controller.
> 
> I added the device handle to disambiguate those associations. The
> gIOVA ->gDB mapping is associated with a device handle. Then when the
> host needs a stage 1 mapping for this device, to build the nested
> mapping towards the physical DB it can easily grab the gIOVA->gDB stage
> 1 mapping registered for this device.
> 
> The correctness looks more obvious to me, at least.

Except all devices within all groups within the same container
necessarily share the same IOMMU context, so from that perspective, it
appears to impose non-trivial redundancy on the caller.  Thanks,

Alex
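
A minimal sketch of the encapsulation Alex suggests above; the wrapper
struct and its field names are hypothetical, only iommu_unbind_guest_msi()
comes from this series:

	/* Resolve the iommu_domain once in the caller and carry it with
	 * the payload, instead of looking it up in every per-device
	 * callback. */
	struct vfio_iommu_walk_arg {
		struct iommu_domain *domain;	/* looked up once */
		void *payload;			/* e.g. the iova to unbind */
	};

	static int vfio_unbind_msi_fn(struct device *dev, void *data)
	{
		struct vfio_iommu_walk_arg *arg = data;
		dma_addr_t *iova = arg->payload;

		iommu_unbind_guest_msi(arg->domain, dev, *iova);
		return 0;
	}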


Re: [git pull] IOMMU Fixes for Linux v5.1-rc1

2019-03-22 Thread pr-tracker-bot
The pull request you sent on Fri, 22 Mar 2019 12:22:29 +0100:

> git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git 
> tags/iommu-fixes-v5.1-rc2

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/070c95d457267eefecd70f5dd434740201d5083c

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


Re: [PATCH v2 RFC/RFT] dma-contiguous: Get normal pages for single-page allocations

2019-03-22 Thread Nicolin Chen
Hi Catalin,

On Fri, Mar 22, 2019 at 10:57:13AM +, Catalin Marinas wrote:
> > > Do you have any numbers to back this up? You don't seem to address
> > > dma_direct_alloc() either but, as I said above, it's not trivial since
> > > some platforms expect certain physical range for DMA allocations.
> > 
> > What's the dma_direct_alloc() here about? Mind elaborating?
> 
> I just did a grep for dma_alloc_from_contiguous() in the 5.1-rc1 kernel
> and came up with __dma_direct_alloc_pages(). Should your patch cover
> this as well?

I don't get the meaning of "cover this" here. What missing part
do you refer to?


Re: [PATCH v3 03/10] NTB: Introduce helper functions to calculate logical port number

2019-03-22 Thread Bjorn Helgaas
On Thu, Mar 21, 2019 at 06:06:38PM -0600, Logan Gunthorpe wrote:
> This patch introduces the "Logical Port Number" which is similar to the
> "Port Number" in that it enumerates the ports in the system.
> 
> The original (or Physical) "Port Number" can be any number used by the
> hardware to uniquley identify a port in the system. The "Logical Port

s/uniquley/uniquely/

> Number" enumerates all ports in the system from 0 to the number of
> ports minus one.
> 
> For example a system with 5 ports might have the following port numbers
> which would be enumareted thusly:

s/enumareted/enumerated/

> + * ntb_logical_port_number() - get the logical port number of the local port
> + * @ntb: NTB device context.
> + *
> + * The Logical Port Number is defined to be a unique number for each
> + * port starting from zero through to the number of ports minus one.
> + * This is in contrast to the Port Number where each port can be assigned
> + * any unqique physical number by the hardware.

s/unqique/unique/

> + * ntb_peer_logical_port_number() - get the logical peer port by given index
> + * @ntb: NTB device context.
> + * @pidx: Peer port index.
> + *
> + * The Logical Port Number is defined to be a unique number for each
> + * port starting from zero through to the number of ports minus one.
> + * This is in contrast to the Port Number where each port can be assigned
> + * any unqique physical number by the hardware.

s/unqique/unique/

Bjorn


Re: [PATCH v3 09/10] NTB: Add MSI interrupt support to ntb_transport

2019-03-22 Thread Bjorn Helgaas
Hi Logan,

Drive-by nits:

On Thu, Mar 21, 2019 at 06:06:44PM -0600, Logan Gunthorpe wrote:
> Introduce the module parameter 'use_msi' which, when set uses

s/set/set,/

> MSI interrupts instead of doorbells for each queue pair (QP). T
> he parameter is only available if NTB MSI support is configured into

Spurious newline in the middle of "The".

> the kernel. We also require there to be more than one memory window
> (MW) so that an extra one is available to forward the APIC region.
> 
> To use MSIs, we request one interrupt per QP and forward the MSI address
> and data to the peer using scratch pad registers (SPADS) above the MW
> spads. (If there are not enough SPADS the MSI interrupt will not be used.)

/spads/SPADS/ for consistency

> Once registered, we simply use ntb_msi_peer_trigger and the recieving

s/recieving/receiving/

> ISR simply queues up the rxc_db_work for the queue.
> 
> This addition can significantly improve performance of ntb_transport.
> In a simple, untuned, apples-to-apples comparison using ntb_netdev
> and iperf with switchtec hardware, I see 3.88Gb/s without MSI
> interrupts and 14.1Gb/s which is a more than 3x improvement.

s/which is/with MSI, which is/

Bjorn
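
As an illustration of the scratchpad handshake Logan describes (the SPAD
index layout here is an assumption, not the actual ntb_transport layout;
ntb_spad_write() is the standard NTB API):

	/* Publish one queue pair's MSI address/data through scratchpads
	 * placed above those used for memory-window setup; the peer reads
	 * them back and later raises the interrupt with
	 * ntb_msi_peer_trigger(). */
	static void example_publish_msi(struct ntb_dev *ntb, int spad_base,
					int qp, u64 msi_addr, u32 msi_data)
	{
		ntb_spad_write(ntb, spad_base + qp * 3 + 0,
			       lower_32_bits(msi_addr));
		ntb_spad_write(ntb, spad_base + qp * 3 + 1,
			       upper_32_bits(msi_addr));
		ntb_spad_write(ntb, spad_base + qp * 3 + 2, msi_data);
	}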


[PATCH] iommu: Don't print warning when IOMMU driver only supports unmanaged domains

2019-03-22 Thread Joerg Roedel
From: Joerg Roedel 

Print the warning about the fall-back to IOMMU_DOMAIN_DMA in
iommu_group_get_for_dev() only when such a domain was
actually allocated.

Otherwise the user will get misleading warnings in the
kernel log when the iommu driver used doesn't support
IOMMU_DOMAIN_DMA and IOMMU_DOMAIN_IDENTITY.

Fixes: fccb4e3b8ab09 ('iommu: Allow default domain type to be set on the kernel command line')
Signed-off-by: Joerg Roedel 
---
 drivers/iommu/iommu.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 33a982e33716..109de67d5d72 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1105,10 +1105,12 @@ struct iommu_group *iommu_group_get_for_dev(struct device *dev)
 
	dom = __iommu_domain_alloc(dev->bus, iommu_def_domain_type);
	if (!dom && iommu_def_domain_type != IOMMU_DOMAIN_DMA) {
-		dev_warn(dev,
-			 "failed to allocate default IOMMU domain of type %u; falling back to IOMMU_DOMAIN_DMA",
-			 iommu_def_domain_type);
		dom = __iommu_domain_alloc(dev->bus, IOMMU_DOMAIN_DMA);
+		if (dom) {
+			dev_warn(dev,
+				 "failed to allocate default IOMMU domain of type %u; falling back to IOMMU_DOMAIN_DMA",
+				 iommu_def_domain_type);
+		}
	}
 
group->default_domain = dom;
-- 
2.16.4



Re: [PATCH 1/1] iommu: Add config option to set lazy mode as default

2019-03-22 Thread John Garry

On 22/03/2019 14:11, Zhen Lei wrote:


This allows the default behaviour to be controlled by a kernel config
option instead of changing the command line for the kernel to include
"iommu.strict=0" on ARM64 where this is desired.

This is similar to CONFIG_IOMMU_DEFAULT_PASSTHROUGH

Note: At present, intel_iommu, amd_iommu and s390_iommu use lazy mode as
defalut, so there is no need to add code for them.


/s/defalut/default/



Signed-off-by: Zhen Lei 
---
 drivers/iommu/Kconfig | 14 ++
 drivers/iommu/iommu.c |  5 +


Do we need to update kernel-parameters.txt for iommu.strict?


 2 files changed, 19 insertions(+)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 6f07f3b21816c64..5ec9780f564eaf8 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -85,6 +85,20 @@ config IOMMU_DEFAULT_PASSTHROUGH

  If unsure, say N here.

+config IOMMU_LAZY_MODE


maybe should add "DMA" to the name, and even "DEFAULT"


+   bool "IOMMU use lazy mode to flush IOTLB and free IOVA"
+   depends on IOMMU_API
+   help
+ For every IOMMU unmap operation, the flush operation of IOTLB and the free
+ operation of IOVA are deferred.


This is a bit unclear, as there is no context. I think that you need to 
say something like, "Support lazy mode, where for every IOMMU DMA unmap 
operation, the flush operation of IOTLB and the free operation of IOVA 
are deferred. "



They are only guaranteed to be done before

+ the related IOVA will be reused. Removing the need to pass in iommu.strict=0
+ through command line on ARM64 (Now, intel_iommu, amd_iommu, s390_iommu use
+ lazy mode as default).


prone to going out-of-date

 If this is enabled, you can still disable with kernel

+ parameters, such as iommu.strict=1, intel_iommu=strict, amd_iommu=fullflush
+ or s390_iommu=strict depending on the architecture.
+
+ If unsure, say N here.
+
 config OF_IOMMU
def_bool y
depends on OF && IOMMU_API
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 33a982e33716369..e307d70d1578b3b 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -43,7 +43,12 @@
 #else
 static unsigned int iommu_def_domain_type = IOMMU_DOMAIN_DMA;
 #endif
+
+#ifdef CONFIG_IOMMU_LAZY_MODE
+static bool iommu_dma_strict __read_mostly;
+#else
 static bool iommu_dma_strict __read_mostly = true;
+#endif

 struct iommu_callback_data {
const struct iommu_ops *ops;
--
1.8.3





Cheers





[PATCH 1/1] iommu: Add config option to set lazy mode as default

2019-03-22 Thread Zhen Lei
This allows the default behaviour to be controlled by a kernel config
option instead of changing the command line for the kernel to include
"iommu.strict=0" on ARM64 where this is desired.

This is similar to CONFIG_IOMMU_DEFAULT_PASSTHROUGH.

Note: At present, intel_iommu, amd_iommu and s390_iommu use lazy mode as
defalut, so there is no need to add code for them.

Signed-off-by: Zhen Lei 
---
 drivers/iommu/Kconfig | 14 ++
 drivers/iommu/iommu.c |  5 +
 2 files changed, 19 insertions(+)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 6f07f3b21816c64..5ec9780f564eaf8 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -85,6 +85,20 @@ config IOMMU_DEFAULT_PASSTHROUGH

  If unsure, say N here.

+config IOMMU_LAZY_MODE
+   bool "IOMMU use lazy mode to flush IOTLB and free IOVA"
+   depends on IOMMU_API
+   help
+ For every IOMMU unmap operation, the flush operation of IOTLB and the free
+ operation of IOVA are deferred. They are only guaranteed to be done before
+ the related IOVA will be reused. Removing the need to pass in iommu.strict=0
+ through command line on ARM64 (Now, intel_iommu, amd_iommu, s390_iommu use
+ lazy mode as default). If this is enabled, you can still disable with kernel
+ parameters, such as iommu.strict=1, intel_iommu=strict, amd_iommu=fullflush
+ or s390_iommu=strict depending on the architecture.
+
+ If unsure, say N here.
+
 config OF_IOMMU
def_bool y
depends on OF && IOMMU_API
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 33a982e33716369..e307d70d1578b3b 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -43,7 +43,12 @@
 #else
 static unsigned int iommu_def_domain_type = IOMMU_DOMAIN_DMA;
 #endif
+
+#ifdef CONFIG_IOMMU_LAZY_MODE
+static bool iommu_dma_strict __read_mostly;
+#else
 static bool iommu_dma_strict __read_mostly = true;
+#endif

 struct iommu_callback_data {
const struct iommu_ops *ops;
--
1.8.3
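
For context, a simplified sketch of what iommu_dma_strict gates on the
unmap path (illustrative only; queue_iova_for_deferred_free() is a
made-up stand-in for the real deferred-flush machinery in the IOVA code):

	static void example_unmap(struct iommu_domain *domain,
				  unsigned long iova, size_t size)
	{
		iommu_unmap_fast(domain, iova, size);

		if (iommu_dma_strict) {
			/* strict: flush the IOTLB before the IOVA can
			 * possibly be reused */
			iommu_tlb_sync(domain);
		} else {
			/* lazy: batch the flush; the IOVA is freed only
			 * after the flush has actually happened */
			queue_iova_for_deferred_free(iova, size);
		}
	}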




Re: [PATCH v6 00/22] SMMUv3 Nested Stage Setup

2019-03-22 Thread Auger Eric
Hi,

On 3/17/19 6:22 PM, Eric Auger wrote:
> This series allows a virtualizer to program the nested stage mode.
> This is useful when both the host and the guest are exposed with
> an SMMUv3 and a PCI device is assigned to the guest using VFIO.
> 
> In this mode, the physical IOMMU must be programmed to translate
> the two stages: the one set up by the guest (IOVA -> GPA) and the
> one set up by the host VFIO driver as part of the assignment process
> (GPA -> HPA).
> 
> On Intel, this is traditionally achieved by combining the 2 stages
> into a single physical stage. However this relies on the capability
> to trap on each guest translation structure update. This is possible
> by using the VTD Caching Mode. Unfortunately the ARM SMMUv3 does
> not offer a similar mechanism.
> 
> However, the ARM SMMUv3 architecture supports 2 physical stages! Those
> were devised exactly with that use case in mind. Assuming the HW
> implements both stages (optional), the guest now can use stage 1
> while the host uses stage 2.
> 
> This assumes the virtualizer has means to propagate guest settings
> to the host SMMUv3 driver. This series brings this VFIO/IOMMU
> infrastructure.  Those services are:
> - bind the guest stage 1 configuration to the stream table entry
> - propagate guest TLB invalidations
> - bind MSI IOVAs
> - propagate faults collected at physical level up to the virtualizer
> 
> This series largely reuses the user API and infrastructure originally
> devised for SVA/SVM and patches submitted by Jacob, Yi Liu, Tianyu in
> [1-2] and Jean-Philippe [3-4].
> 
> Best Regards
> 
> Eric
> 
> This series can be found at:
> https://github.com/eauger/linux/tree/v5.0-2stage-v6
For those who would like to test this series with qemu, please use the
following qemu branch:

https://github.com/eauger/qemu/tree/v4.0-2stage-unformal-for-v6-testing

until I release a formal respin.

Thanks

Eric
> 
> References:
> [1] [PATCH v5 00/23] IOMMU and VT-d driver support for Shared Virtual
> Address (SVA)
> https://lwn.net/Articles/754331/
> [2] [RFC PATCH 0/8] Shared Virtual Memory virtualization for VT-d
> (VFIO part)
> https://lists.linuxfoundation.org/pipermail/iommu/2017-April/021475.html
> [3] [v2,00/40] Shared Virtual Addressing for the IOMMU
> https://patchwork.ozlabs.org/cover/912129/
> [4] [PATCH v3 00/10] Shared Virtual Addressing for the IOMMU
> https://patchwork.kernel.org/cover/10608299/
> 
> History:
> v5 -> v6:
> - Fix compilation issue when CONFIG_IOMMU_API is unset
> 
> v4 -> v5:
> - fix bug reported by Vincent: fault handler unregistration now happens in
>   vfio_pci_release
> - IOMMU_FAULT_PERM_* moved outside of struct definition + small
>   uapi changes suggested by Jean-Philippe (except fetch_addr)
> - iommu: introduce device fault report API: removed the PRI part.
> - see individual logs for more details
> - reset the ste abort flag on detach
> 
> v3 -> v4:
> - took into account Alex, jean-Philippe and Robin's comments on v3
> - rework of the smmuv3 driver integration
> - add tear down ops for msi binding and PASID table binding
> - fix S1 fault propagation
> - put fault reporting patches at the beginning of the series following
>   Jean-Philippe's request
> - update of the cache invalidate and fault API uapis
> - VFIO fault reporting rework with 2 separate regions and one mmappable
>   segment for the fault queue
> - moved to PATCH
> 
> v2 -> v3:
> - When registering the S1 MSI binding we now store the device handle. This
>   addresses Robin's comment about discrimination of devices belonging to
>   different S1 groups and using different physical MSI doorbells.
> - Change the fault reporting API: use VFIO_PCI_DMA_FAULT_IRQ_INDEX to
>   set the eventfd and expose the faults through an mmappable fault region
> 
> v1 -> v2:
> - Added the fault reporting capability
> - asid properly passed on invalidation (fix assignment of multiple
>   devices)
> - see individual change logs for more info
> 
> 
> Eric Auger (13):
>   iommu: Introduce bind/unbind_guest_msi
>   vfio: VFIO_IOMMU_BIND/UNBIND_MSI
>   iommu/smmuv3: Get prepared for nested stage support
>   iommu/smmuv3: Implement attach/detach_pasid_table
>   iommu/smmuv3: Implement cache_invalidate
>   dma-iommu: Implement NESTED_MSI cookie
>   iommu/smmuv3: Implement bind/unbind_guest_msi
>   iommu/smmuv3: Report non recoverable faults
>   vfio-pci: Add a new VFIO_REGION_TYPE_NESTED region type
>   vfio-pci: Register an iommu fault handler
>   vfio_pci: Allow to mmap the fault queue
>   vfio-pci: Add VFIO_PCI_DMA_FAULT_IRQ_INDEX
>   vfio: Document nested stage control
> 
> Jacob Pan (4):
>   driver core: add per device iommu param
>   iommu: introduce device fault data
>   iommu: introduce device fault report API
>   iommu: Introduce attach/detach_pasid_table API
> 
> Jean-Philippe Brucker (2):
>   iommu/arm-smmu-v3: Link domains and devices
>   iommu/arm-smmu-v3: Maintain a SID->device structure
> 
> Liu, Yi L (3):
>   iommu: Introduce 
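
As a mental model for the nested scheme this cover letter describes
(stage1_walk()/stage2_walk() are illustrative names, not kernel APIs):

	/* Nested mode composes two translations: the guest owns stage 1
	 * (IOVA -> GPA), the host owns stage 2 (GPA -> HPA). The SMMUv3
	 * hardware performs both walks itself, so no shadowing of guest
	 * page tables is needed. */
	static phys_addr_t nested_translate(struct device *dev,
					    dma_addr_t iova)
	{
		u64 gpa = stage1_walk(dev, iova);	/* guest-programmed */

		return stage2_walk(dev, gpa);		/* host/VFIO-programmed */
	}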

[git pull] IOMMU Fixes for Linux v5.1-rc1

2019-03-22 Thread Joerg Roedel
Hi Linus,

The following changes since commit 9e98c678c2d6ae3a17cb2de55d17f69dddaa231b:

  Linux 5.1-rc1 (2019-03-17 14:22:26 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git 
tags/iommu-fixes-v5.1-rc2

for you to fetch changes up to 84c11e4df5aa4955acaa441f0cf1cb2e50daf64b:

  iommu/vt-d: Save the right domain ID used by hardware (2019-03-22 12:02:27 +0100)


IOMMU Fixes for Linux v5.1-rc2

Including:

- AMD IOMMU fix for sg-mapping with sg->offset > PAGE_SIZE

- Fix for IOVA code to trigger the slow-path less often

- Two fixes for Intel VT-d to avoid writing to read-only registers and
  to flush the right domain id for the default domains in scalable
  mode


Lu Baolu (2):
  iommu/vt-d: Check capability before disabling protected memory
  iommu/vt-d: Save the right domain ID used by hardware

Robert Richter (1):
  iommu/iova: Fix tracking of recently failed iova address

Stanislaw Gruszka (1):
  iommu/amd: fix sg->dma_address for sg->offset bigger than PAGE_SIZE

 drivers/iommu/amd_iommu.c   | 7 ++-
 drivers/iommu/intel-iommu.c | 5 -
 drivers/iommu/iova.c| 5 +++--
 3 files changed, 13 insertions(+), 4 deletions(-)

Please pull.

Thanks,

Joerg



Re: [PATCH v2 RFC/RFT] dma-contiguous: Get normal pages for single-page allocations

2019-03-22 Thread Catalin Marinas
Hi Nicolin,

On Thu, Mar 21, 2019 at 04:32:49PM -0700, Nicolin Chen wrote:
> On Tue, Mar 19, 2019 at 02:43:01PM +, Catalin Marinas wrote:
> > On Tue, Mar 05, 2019 at 10:32:02AM -0800, Nicolin Chen wrote:
> > > The addresses within a single page are always contiguous, so it's
> > > not so necessary to always allocate one single page from CMA area.
> > > Since the CMA area has a limited predefined size of space, it may
> > > run out of space in heavy use cases, where there might be quite a
> > > lot CMA pages being allocated for single pages.
> > > 
> > > However, there is also a concern that a device might care where a
> > > page comes from -- it might expect the page from CMA area and act
> > > differently if the page doesn't.
> > > 
> > > This patch tries to get normal pages for single-page allocations
> > > unless the device has its own CMA area. This would save resources
> > > from the CMA area for more CMA allocations. And it'd also reduce
> > > CMA fragmentations resulted from trivial allocations.
> > 
> > This is not sufficient. Some architectures/platforms declare limits on
> > the CMA range so that DMA is possible with all expected devices. For
> > example, on arm64 we keep the CMA in the lower 4GB of the address range,
> > though with this patch you only covered the iommu ops allocation.
> 
> I will follow the way of v1 by adding alloc_page()/free_page()
> function to those callers who don't have fallback allocations.
> In this way, archs may use different callbacks to alloc pages.
> 
> > Do you have any numbers to back this up? You don't seem to address
> > dma_direct_alloc() either but, as I said above, it's not trivial since
> > some platforms expect certain physical range for DMA allocations.
> 
> What's the dma_direct_alloc() here about? Mind elaborating?

I just did a grep for dma_alloc_from_contiguous() in the 5.1-rc1 kernel
and came up with __dma_direct_alloc_pages(). Should your patch cover
this as well?

-- 
Catalin
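
A sketch of the single-page policy under discussion, assuming
CONFIG_DMA_CMA so that dev->cma_area exists (the posted patch instead
adjusts the callers of dma_alloc_from_contiguous()):

	static struct page *example_alloc_single(struct device *dev,
						 gfp_t gfp)
	{
		/* a device-private CMA area may carry placement
		 * expectations, so keep honouring it */
		if (dev && dev->cma_area)
			return cma_alloc(dev->cma_area, 1, 0,
					 gfp & __GFP_NOWARN);

		/* addresses within one page are trivially contiguous */
		return alloc_page(gfp);
	}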


Re: [PATCH] dma: select GENERIC_ALLOCATOR for DMA_REMAP

2019-03-22 Thread Clément Leger via iommu


----- Original Message -----
From: "Clément Leger" 
To: "linux-kernel" 
Sent: Thursday, 21 March 2019, 11:20:29
Subject: [PATCH] dma: select GENERIC_ALLOCATOR for DMA_REMAP

When DMA_REMAP is enabled, the code in remap.c needs the generic allocator.
It has only worked so far because the few architectures that use it
(arm64, csky) both select GENERIC_ALLOCATOR.

Signed-off-by: Clement Leger 
---
 kernel/dma/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index a06ba3013b3b..6a43c4ae77f9 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -58,6 +58,7 @@ config SWIOTLB
 config DMA_REMAP
depends on MMU
bool
+   select GENERIC_ALLOCATOR
 
 config DMA_DIRECT_REMAP
bool
-- 
2.15.0.276.g89ea799

Re: [PATCH v2 7/7] iommu/vt-d: Remove lazy allocation of domains

2019-03-22 Thread James Sewart via iommu
Hey Jacob,

> On 14 Mar 2019, at 23:35, Jacob Pan  wrote:
> 
> On Thu, 14 Mar 2019 11:59:36 +
> James Sewart  wrote:
> 
>> 
>> -domain = get_valid_domain_for_dev(dev);
>> +domain = find_domain(dev);
>>  if (!domain)
>>  return DMA_MAPPING_ERROR;
>> 
>> @@ -3914,7 +3624,7 @@ static int intel_map_sg(struct device *dev, struct scatterlist *sglist, int nele
>> 	if (iommu_no_mapping(dev))
>> 		return intel_nontranslate_map_sg(dev, sglist, nelems, dir);
>> -domain = get_valid_domain_for_dev(dev);
>> +domain = find_domain(dev);
> This patchset looks like a very good clean up, I am wondering why we
> can't use the generic iommu_get_domain_for_dev() here, since VT-d has a
> default DMA domain after your patch.

This should be possible, only downside is we get an iommu_domain from 
iommu_get_domain_for_dev and will have to check it's not null before 
getting the dmar_domain from it. We will be able to remove find_domain 
though.

Cheers,
James.
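
Concretely, the NULL check mentioned above would look something like this
(sketch; to_dmar_domain() is the existing container_of() helper in
intel-iommu.c):

	struct iommu_domain *dom = iommu_get_domain_for_dev(dev);
	struct dmar_domain *domain;

	if (!dom)
		return DMA_MAPPING_ERROR;
	domain = to_dmar_domain(dom);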


Re: [PATCH v2 0/7] iommu/vt-d: Fix-up device-domain relationship by refactoring to use iommu group default domain.

2019-03-22 Thread James Sewart via iommu
Hey Lu,

> On 20 Mar 2019, at 01:26, Lu Baolu  wrote:
> 
> Hi James,
> 
> On 3/19/19 9:35 PM, James Sewart wrote:
>> Hey Lu,
>>> On 15 Mar 2019, at 03:13, Lu Baolu  wrote:
>>> 
>>> Hi James,
>>> 
>>> On 3/14/19 7:56 PM, James Sewart wrote:
 Patches 1 and 2 are the same as v1.
 v1-v2:
   Refactored ISA direct mappings to be returned by iommu_get_resv_regions.
   Integrated patch by Lu to defer turning on DMAR until iommu.c has mapped
 reserved regions.
   Integrated patches by Lu to remove more unused code in cleanup.
 Lu: I didn't integrate your patch to set the default domain type as it
 isn't directly related to the aim of this patchset. Instead patch 4
>>> 
>>> Without those patches, user experience will be affected and some devices
>>> will not work on Intel platforms anymore.
>>> 
>>> For a long time, Intel IOMMU driver has its own logic to determine
>>> whether a device requires an identity domain. For example, when user
>>> specifies "iommu=pt" in kernel parameter, all device will be attached
>>> with the identity domain. Further more, some quirky devices require
>>> an identity domain to be used before enabling DMA remapping, otherwise,
>>> it will not work. This was done by adding quirk bits in Intel IOMMU
>>> driver.
>>> 
>>> So from my point of view, one way is porting all those quirks and kernel
>>> parameters into IOMMU generic layer, or opening a door for vendor IOMMU
>>> driver to determine the default domain type by their own. I prefer the
>>> latter option since it will not impact any behaviors on other
>>> architectures.
>> I see your point. I’m not confident that using the proposed door to set a
>> groups default domain has the desired behaviour. As discussed before the
>> default domain type will be set based on the desired type for only the
>> first device attached to a group. I think to change the default domain
>> type you would need a slightly different door that wasn’t conditioned on
>> device.
> 
> I think this as another problem. Just a summarize for the ease of
> discussion. We saw two problems:
> 
> 1. When allocating a new group for a device, how should we determine the
> type of the default domain? This is what my proposal patches trying to
> address.

This will work as expected only if all devices within a group get the same 
result from is_identity_map. Otherwise we see issue 2.

> 
> 2. If we need to put a device into an existing group which uses a
> different type of domain from what the device desires to use, we might
> break the functionality of the device. For this problem I'd second your
> proposal below if I get your point correctly.
> 
>> For situations where individual devices require an identity domain because
>> of quirks then maybe calling is_identity_map per device in
>> iommu_group_get_for_dev is a better solution than the one I proposed.
> 
> Do you mean if we see a quirky device requires a different domain type
> other than the default domain type of the group, we will assign a new
> group to it? That looks good to me as far as I can see. I suppose this
> should be done in vt-d's ops callback.

I have thought about this a bit and I think the cleanest approach is to 
put devices that require an identity domain into their own group, this can 
be done in the device_group callback, avoiding any situation where we have 
to deal with creating a new group based on domain type in 
iommu_group_get_for_dev. This way we shouldn’t ever be in a situation with 
multiple different domain types per group. This way your patches will work 
as expected. See below for a possible implementation.

> 
> Best regards,
> Lu Baolu

Cheers,
James.

Devices that require an identity map because of quirks or other reasons
should be put in their own IOMMU group so as to not end up with multiple
different domains per group.

Signed-off-by: James Sewart 

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 3cb8b36abf50..0f5a121d31a0 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5421,6 +5421,22 @@ struct intel_iommu *intel_svm_device_to_iommu(struct device *dev)
 }
 #endif /* CONFIG_INTEL_IOMMU_SVM */

+static struct iommu_group *intel_identity_group;
+
+struct iommu_group *intel_iommu_pci_device_group(struct device *dev)
+{
+   if (iommu_no_mapping(dev)) {
+   if (!intel_identity_group) {
+   intel_identity_group = iommu_group_alloc();
+   if (IS_ERR(intel_identity_group))
+   return NULL;
+   }
+   return intel_identity_group;
+   }
+
+   return pci_device_group(dev);
+}
+
 const struct iommu_ops intel_iommu_ops = {
.capable= intel_iommu_capable,
.domain_alloc   = intel_iommu_domain_alloc,
@@ -5435,7 +5451,7 @@ const struct iommu_ops intel_iommu_ops = {
.get_resv_regions   = intel_iommu_get_resv_regions,
.put_resv_regions 

Re: [PATCH v2 3/7] iommu/vt-d: Expose ISA direct mapping region via iommu_get_resv_regions

2019-03-22 Thread James Sewart via iommu
Hey Lu,

> On 15 Mar 2019, at 02:19, Lu Baolu  wrote:
> 
> Hi James,
> 
> On 3/14/19 7:58 PM, James Sewart wrote:
>> To support mapping ISA region via iommu_group_create_direct_mappings,
>> make sure it's exposed by iommu_get_resv_regions. This allows
>> deduplication of reserved region mappings
>> Signed-off-by: James Sewart 
>> ---
>>  drivers/iommu/intel-iommu.c | 42 +
>>  1 file changed, 33 insertions(+), 9 deletions(-)
>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
>> index 8e0a4e2ff77f..2e00e8708f06 100644
>> --- a/drivers/iommu/intel-iommu.c
>> +++ b/drivers/iommu/intel-iommu.c
>> @@ -337,6 +337,8 @@ static LIST_HEAD(dmar_rmrr_units);
>>  #define for_each_rmrr_units(rmrr) \
>>  	list_for_each_entry(rmrr, &dmar_rmrr_units, list)
>>  +static struct iommu_resv_region *isa_resv_region;
>> +
>>  /* bitmap for indexing intel_iommus */
>>  static int g_num_of_iommus;
>> @@ -2780,26 +2782,34 @@ static inline int iommu_prepare_rmrr_dev(struct dmar_rmrr_unit *rmrr,
>>  				  rmrr->end_address);
>>  }
>>  +static inline struct iommu_resv_region *iommu_get_isa_resv_region(void)
>> +{
>> +if (!isa_resv_region)
>> +isa_resv_region = iommu_alloc_resv_region(0,
>> +16*1024*1024,
>> +0, IOMMU_RESV_DIRECT);
>> +
>> +return isa_resv_region;
>> +}
>> +
>>  #ifdef CONFIG_INTEL_IOMMU_FLOPPY_WA
>> -static inline void iommu_prepare_isa(void)
>> +static inline void iommu_prepare_isa(struct pci_dev *pdev)
>>  {
>> -struct pci_dev *pdev;
>>  int ret;
>> +struct iommu_resv_region *reg = iommu_get_isa_resv_region();
>>  -   pdev = pci_get_class(PCI_CLASS_BRIDGE_ISA << 8, NULL);
>> -if (!pdev)
>> +if (!reg)
>>  return;
>>  pr_info("Prepare 0-16MiB unity mapping for LPC\n");
>> -	ret = iommu_prepare_identity_map(&pdev->dev, 0, 16*1024*1024 - 1);
>> +	ret = iommu_prepare_identity_map(&pdev->dev, reg->start,
>> +					 reg->start + reg->length - 1);
>>  	if (ret)
>>  		pr_err("Failed to create 0-16MiB identity map - floppy might not work\n");
>> -
>> -pci_dev_put(pdev);
>>  }
>>  #else
>> -static inline void iommu_prepare_isa(void)
>> +static inline void iommu_prepare_isa(struct pci_dev *pdev)
>>  {
>>  return;
>>  }
>> @@ -3289,6 +3299,7 @@ static int __init init_dmars(void)
>>  struct dmar_rmrr_unit *rmrr;
>>  bool copied_tables = false;
>>  struct device *dev;
>> +struct pci_dev *pdev;
>>  struct intel_iommu *iommu;
>>  int i, ret;
>>  @@ -3469,7 +3480,11 @@ static int __init init_dmars(void)
>>  }
>>  }
>>  -   iommu_prepare_isa();
>> +pdev = pci_get_class(PCI_CLASS_BRIDGE_ISA << 8, NULL);
>> +if (pdev) {
>> +iommu_prepare_isa(pdev);
>> +pci_dev_put(pdev);
>> +}
>>domains_done:
>> @@ -5266,6 +5281,7 @@ static void intel_iommu_get_resv_regions(struct device *device,
>>  struct iommu_resv_region *reg;
>>  struct dmar_rmrr_unit *rmrr;
>>  struct device *i_dev;
>> +struct pci_dev *pdev;
>>  int i;
>>  rcu_read_lock();
>> @@ -5280,6 +5296,14 @@ static void intel_iommu_get_resv_regions(struct device *device,
>>  }
>>  rcu_read_unlock();
>>  +   if (dev_is_pci(device)) {
>> +pdev = to_pci_dev(device);
>> +if ((pdev->class >> 8) == PCI_CLASS_BRIDGE_ISA) {
>> +reg = iommu_get_isa_resv_region();
>> +		list_add_tail(&reg->list, head);
>> +}
>> +}
>> +
> 
> Just wondering why not just
> 
> +#ifdef CONFIG_INTEL_IOMMU_FLOPPY_WA
> + if (dev_is_pci(device)) {
> + pdev = to_pci_dev(device);
> + if ((pdev->class >> 8) == PCI_CLASS_BRIDGE_ISA) {
> + reg = iommu_alloc_resv_region(0,
> + 16*1024*1024,
> + 0, IOMMU_RESV_DIRECT);
> + if (reg)
> + 			list_add_tail(&reg->list, head);
> + }
> + }
> +#endif
> 
> and, remove all other related code?

At this point in the patchset if we remove iommu_prepare_isa then the ISA 
region won’t be mapped to the device. Only once the dma domain is allocable 
will the reserved regions be mapped by iommu_group_create_direct_mappings.

There's an issue that if we choose to alloc a new resv_region with type 
IOMMU_RESV_DIRECT, we will need to refactor intel_iommu_put_resv_regions 
to free this entry type which means refactoring the rmrr regions in 
get_resv_regions. Should this work be in this patchset?

> 
> Best regards,
> Lu Baolu

Cheers,
James.
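
The refactor mentioned above could be as small as freeing the dynamically
allocated entries on the put side (sketch, assuming every
IOMMU_RESV_DIRECT entry on the list came from iommu_alloc_resv_region()):

	static void example_put_resv_regions(struct device *dev,
					     struct list_head *head)
	{
		struct iommu_resv_region *entry, *next;

		list_for_each_entry_safe(entry, next, head, list) {
			if (entry->type == IOMMU_RESV_DIRECT)
				kfree(entry);
		}
	}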


Re: [PATCH 1/2] iommu/vt-d: Check capability before disabling protected memory

2019-03-22 Thread Joerg Roedel
On Wed, Mar 20, 2019 at 09:58:33AM +0800, Lu Baolu wrote:
> The spec states in 10.4.16 that the Protected Memory Enable
> Register should be treated as read-only for implementations
> not supporting protected memory regions (PLMR and PHMR fields
> reported as Clear in the Capability register).
> 
> Cc: Jacob Pan 
> Cc: mark gross 
> Suggested-by: Ashok Raj 
> Fixes: f8bab73515ca5 ("intel-iommu: PMEN support")
> Signed-off-by: Lu Baolu 
> ---
>  drivers/iommu/intel-iommu.c | 3 +++
>  1 file changed, 3 insertions(+)

Applied both, thanks.


Re: [PATCH v2] iommu/iova: Fix tracking of recently failed iova address

2019-03-22 Thread Joerg Roedel
On Wed, Mar 20, 2019 at 06:57:23PM +, Robert Richter wrote:
> From: Robert Richter 
> Subject: [PATCH v2] iommu/iova: Fix tracking of recently failed iova address size
> 
> If a 32 bit allocation request is too big to possibly succeed, it
> early exits with a failure and then should never update max32_alloc_size.
> This patch fixes current code, now the size is only updated if
> the slow path failed while walking the tree. Without the fix the
> allocation may enter the slow path again even if there was a failure
> before of a request with the same or a smaller size.
> 
> Cc:  # 4.20+
> Fixes: bee60e94a1e2 ("iommu/iova: Optimise attempts to allocate iova from 
> 32bit address range")
> Signed-off-by: Robert Richter 
> Reviewed-by: Robin Murphy 
> Signed-off-by: Robert Richter 
> ---
>  drivers/iommu/iova.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)

Applied, thanks.


Re: [PATCH v6 09/22] vfio: VFIO_IOMMU_BIND/UNBIND_MSI

2019-03-22 Thread Auger Eric
Hi Alex,
On 3/22/19 12:01 AM, Alex Williamson wrote:
> On Sun, 17 Mar 2019 18:22:19 +0100
> Eric Auger  wrote:
> 
>> This patch adds the VFIO_IOMMU_BIND/UNBIND_MSI ioctl which aim
>> to pass/withdraw the guest MSI binding to/from the host.
>>
>> Signed-off-by: Eric Auger 
>>
>> ---
>> v3 -> v4:
>> - add UNBIND
>> - unwind on BIND error
>>
>> v2 -> v3:
>> - adapt to new proto of bind_guest_msi
>> - directly use vfio_iommu_for_each_dev
>>
>> v1 -> v2:
>> - s/vfio_iommu_type1_guest_msi_binding/vfio_iommu_type1_bind_guest_msi
>> ---
>>  drivers/vfio/vfio_iommu_type1.c | 58 +
>>  include/uapi/linux/vfio.h   | 29 +
>>  2 files changed, 87 insertions(+)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index 12a40b9db6aa..66513679081b 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -1710,6 +1710,25 @@ static int vfio_cache_inv_fn(struct device *dev, void *data)
>>  	return iommu_cache_invalidate(d, dev, &ustruct->info);
>>  }
>>  
>> +static int vfio_bind_msi_fn(struct device *dev, void *data)
>> +{
>> +struct vfio_iommu_type1_bind_msi *ustruct =
>> +(struct vfio_iommu_type1_bind_msi *)data;
>> +struct iommu_domain *d = iommu_get_domain_for_dev(dev);
>> +
>> +return iommu_bind_guest_msi(d, dev, ustruct->iova,
>> +ustruct->gpa, ustruct->size);
>> +}
>> +
>> +static int vfio_unbind_msi_fn(struct device *dev, void *data)
>> +{
>> +dma_addr_t *iova = (dma_addr_t *)data;
>> +struct iommu_domain *d = iommu_get_domain_for_dev(dev);
> 
> Same as previous, we can encapsulate domain in our own struct to avoid
> a lookup.
> 
>> +
>> +iommu_unbind_guest_msi(d, dev, *iova);
> 
> Is it strange that iommu-core is exposing these interfaces at a device
> level if every one of them requires us to walk all the devices?  Thanks,

Hum this per device API was devised in response of Robin's comments on

[RFC v2 12/20] dma-iommu: Implement NESTED_MSI cookie.

"
But that then seems to reveal a somewhat bigger problem - if the callers
are simply registering IPAs, and relying on the ITS driver to grab an
entry and fill in a PA later, then how does either one know *which* PA
is supposed to belong to a given IPA in the case where you have multiple
devices with different ITS targets assigned to the same guest? (and if
it's possible to assume a guest will use per-device stage 1 mappings and
present it with a single vITS backed by multiple pITSes, I think things
start breaking even harder.)
"

However looking back into the problem I wonder if there was an issue
with the iommu_domain based API.

If my understanding is correct, when assigned devices are protected by a
vIOMMU then they necessarily end up in separate host iommu domains even
if they belong to the same iommu_domain on the guest. And there can only
be a single device in this iommu_domain.

If this is confirmed, there is an unambiguous association between 1
physical iommu_domain, 1 device, 1 S1 mapping and 1 physical MSI
controller.

I added the device handle to disambiguate those associations. The
gIOVA ->gDB mapping is associated with a device handle. Then when the
host needs a stage 1 mapping for this device, to build the nested
mapping towards the physical DB it can easily grab the gIOVA->gDB stage
1 mapping registered for this device.

The correctness looks more obvious to me, at least.

Thanks

Eric

> 
> Alex
> 
>> +return 0;
>> +}
>> +
>>  static long vfio_iommu_type1_ioctl(void *iommu_data,
>> unsigned int cmd, unsigned long arg)
>>  {
>> @@ -1814,6 +1833,45 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>>vfio_cache_inv_fn);
>>  	mutex_unlock(&iommu->lock);
>>  return ret;
>> +} else if (cmd == VFIO_IOMMU_BIND_MSI) {
>> +struct vfio_iommu_type1_bind_msi ustruct;
>> +int ret;
>> +
>> +minsz = offsetofend(struct vfio_iommu_type1_bind_msi,
>> +size);
>> +
>> +	if (copy_from_user(&ustruct, (void __user *)arg, minsz))
>> +return -EFAULT;
>> +
>> +if (ustruct.argsz < minsz || ustruct.flags)
>> +return -EINVAL;
>> +
>> +	mutex_lock(&iommu->lock);
>> +	ret = vfio_iommu_for_each_dev(iommu, &ustruct,
>> +				      vfio_bind_msi_fn);
>> +	if (ret)
>> +		vfio_iommu_for_each_dev(iommu, &ustruct.iova,
>> +					vfio_unbind_msi_fn);
>> +	mutex_unlock(&iommu->lock);
>> +return ret;
>> +} else if (cmd == VFIO_IOMMU_UNBIND_MSI) {
>> +struct vfio_iommu_type1_unbind_msi ustruct;
>> +int ret;
>> +
>> +minsz = offsetofend(struct vfio_iommu_type1_unbind_msi,
>> +iova);
>> +
>> +

Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr

2019-03-22 Thread Dave Young
Hi Pingfan, 

Thanks for the effort,
On 03/01/19 at 11:19am, Pingfan Liu wrote:
> On Fri, Mar 1, 2019 at 11:04 AM Pingfan Liu  wrote:
> >
> > Hi Borislav,
> >
> > Do you think the following patch is good at present?
> > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> > index 81f9d23..9213073 100644
> > --- a/arch/x86/kernel/setup.c
> > +++ b/arch/x86/kernel/setup.c
> > @@ -460,7 +460,7 @@ static void __init
> > memblock_x86_reserve_range_setup_data(void)
> >  # define CRASH_ADDR_LOW_MAX(512 << 20)
> >  # define CRASH_ADDR_HIGH_MAX   (512 << 20)
> >  #else
> > -# define CRASH_ADDR_LOW_MAX(896UL << 20)
> > +# define CRASH_ADDR_LOW_MAX(1 << 32)
> >  # define CRASH_ADDR_HIGH_MAX   MAXMEM
> >  #endif
> >
> Or patch lools like:
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 3d872a5..ed0def5 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -459,7 +459,7 @@ static void __init
> memblock_x86_reserve_range_setup_data(void)
>  # define CRASH_ADDR_LOW_MAX(512 << 20)
>  # define CRASH_ADDR_HIGH_MAX   (512 << 20)
>  #else
> -# define CRASH_ADDR_LOW_MAX(896UL << 20)
> +# define CRASH_ADDR_LOW_MAX(1 << 32)
>  # define CRASH_ADDR_HIGH_MAX   MAXMEM
>  #endif
> 
> @@ -551,6 +551,15 @@ static void __init reserve_crashkernel(void)
> high ? CRASH_ADDR_HIGH_MAX
>  : CRASH_ADDR_LOW_MAX,
> crash_size, CRASH_ALIGN);
> +#ifdef CONFIG_X86_64
> +   /*
> +* crashkernel=X reserve below 4G fails? Try MAXMEM
> +*/
> +   if (!high && !crash_base)
> +   crash_base = memblock_find_in_range(CRASH_ALIGN,
> +   CRASH_ADDR_HIGH_MAX,
> +   crash_size, CRASH_ALIGN);
> +#endif
> 
> which tries 0-4G first, then falls back to the range above 4G

This way looks good to me. I will do some testing with old kexec-tools;
once testing is done I can take this up again and repost later with some
documentation update. I will also split it into two patches: one to drop
the old limitation, another for the fallback.

Thanks
Dave


Re: [PATCH v6 02/22] iommu: introduce device fault data

2019-03-22 Thread Auger Eric
Hi Jacob,

On 3/21/19 11:04 PM, Jacob Pan wrote:
> On Sun, 17 Mar 2019 18:22:12 +0100
> Eric Auger  wrote:
> 
>> From: Jacob Pan 
>>
>> Device faults detected by IOMMU can be reported outside the IOMMU
>> subsystem for further processing. This patch introduces
>> a generic device fault data structure.
>>
>> The fault can be either an unrecoverable fault or a page request,
>> also referred to as a recoverable fault.
>>
>> We only care about non internal faults that are likely to be reported
>> to an external subsystem.
>>
>> Signed-off-by: Jacob Pan 
>> Signed-off-by: Jean-Philippe Brucker 
>> Signed-off-by: Liu, Yi L 
>> Signed-off-by: Ashok Raj 
>> Signed-off-by: Eric Auger 
>>
>> ---
>> v4 -> v5:
>> - simplified struct iommu_fault_event comment
>> - Moved IOMMU_FAULT_PERM outside of the struct
>> - Removed IOMMU_FAULT_PERM_INST
>> - s/IOMMU_FAULT_PAGE_REQUEST_PASID_PRESENT/
>>   IOMMU_FAULT_PAGE_REQUEST_PASID_VALID
>>
>> v3 -> v4:
>> - use a union containing aither an unrecoverable fault or a page
>>   request message. Move the device private data in the page request
>>   structure. Reshuffle the fields and use flags.
>> - move fault perm attributes to the uapi
>> - remove a bunch of iommu_fault_reason enum values that were related
>>   to internal errors
>> ---
>>  include/linux/iommu.h  |  44 ++
>>  include/uapi/linux/iommu.h | 115
>> + 2 files changed, 159
>> insertions(+) create mode 100644 include/uapi/linux/iommu.h
>>
>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> index ffbbc7e39cee..c6f398f7e6e0 100644
>> --- a/include/linux/iommu.h
>> +++ b/include/linux/iommu.h
>> @@ -25,6 +25,7 @@
>>  #include <linux/errno.h>
>>  #include <linux/err.h>
>>  #include <linux/of.h>
>> +#include <uapi/linux/iommu.h>
>>  
>>  #define IOMMU_READ  (1 << 0)
>>  #define IOMMU_WRITE (1 << 1)
>> @@ -48,6 +49,7 @@ struct bus_type;
>>  struct device;
>>  struct iommu_domain;
>>  struct notifier_block;
>> +struct iommu_fault_event;
>>  
>>  /* iommu fault flags */
>>  #define IOMMU_FAULT_READ	0x0
>> @@ -55,6 +57,7 @@ struct notifier_block;
>>  
>>  typedef int (*iommu_fault_handler_t)(struct iommu_domain *,
>>  struct device *, unsigned long, int, void *);
>> +typedef int (*iommu_dev_fault_handler_t)(struct iommu_fault_event *, void *);
>>  struct iommu_domain_geometry {
>>  	dma_addr_t aperture_start; /* First address that can be mapped */
>> @@ -247,6 +250,46 @@ struct iommu_device {
>>  struct device *dev;
>>  };
>>  
>> +/**
>> + * struct iommu_fault_event - Generic fault event
>> + *
>> + * Can represent recoverable faults such as a page requests or
>> + * unrecoverable faults such as DMA or IRQ remapping faults.
>> + *
>> + * @fault: fault descriptor
>> + * @iommu_private: used by the IOMMU driver for storing fault-specific
>> + * data. Users should not modify this field before
>> + * sending the fault response.
>> + */
>> +struct iommu_fault_event {
>> +struct iommu_fault fault;
>> +u64 iommu_private;
>> +};
>> +
>> +/**
>> + * struct iommu_fault_param - per-device IOMMU fault data
>> + * @dev_fault_handler: Callback function to handle IOMMU faults at device level
>> + * @data: handler private data
>> + *
>> + */
>> +struct iommu_fault_param {
>> +iommu_dev_fault_handler_t handler;
>> +void *data;
>> +};
>> +
>> +/**
>> + * struct iommu_param - collection of per-device IOMMU data
>> + *
>> + * @fault_param: IOMMU detected device fault reporting data
>> + *
>> + * TODO: migrate other per device data pointers under iommu_dev_data, e.g.
>> + *  struct iommu_group  *iommu_group;
>> + *  struct iommu_fwspec *iommu_fwspec;
>> + */
>> +struct iommu_param {
>> +struct iommu_fault_param *fault_param;
>> +};
>> +
>>  int  iommu_device_register(struct iommu_device *iommu);
>>  void iommu_device_unregister(struct iommu_device *iommu);
>>  int  iommu_device_sysfs_add(struct iommu_device *iommu,
>> @@ -422,6 +465,7 @@ struct iommu_ops {};
>>  struct iommu_group {};
>>  struct iommu_fwspec {};
>>  struct iommu_device {};
>> +struct iommu_fault_param {};
>>  
>>  static inline bool iommu_present(struct bus_type *bus)
>>  {
>> diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
>> new file mode 100644
>> index ..edcc0dda7993
>> --- /dev/null
>> +++ b/include/uapi/linux/iommu.h
>> @@ -0,0 +1,115 @@
>> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
>> +/*
>> + * IOMMU user API definitions
>> + */
>> +
>> +#ifndef _UAPI_IOMMU_H
>> +#define _UAPI_IOMMU_H
>> +
>> +#include <linux/types.h>
>> +
>> +#define IOMMU_FAULT_PERM_WRITE  (1 << 0) /* write */
>> +#define IOMMU_FAULT_PERM_EXEC   (1 << 1) /* exec */
>> +#define IOMMU_FAULT_PERM_PRIV   (1 << 2) /* privileged */
>> +
>> +/*  Generic fault types, can be expanded IRQ remapping fault */
>> +enum iommu_fault_type {
>> +IOMMU_FAULT_DMA_UNRECOV = 1,/* unrecoverable fault */
>> +IOMMU_FAULT_PAGE_REQ,   /* page request 
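
A minimal consumer of these structures might look as follows (sketch;
the registration call and the exact layout of struct iommu_fault are
defined elsewhere in the series, so the 'type' access is an assumption):

	/* Matches the iommu_dev_fault_handler_t typedef above. */
	static int example_fault_handler(struct iommu_fault_event *evt,
					 void *data)
	{
		struct device *dev = data;

		if (evt->fault.type == IOMMU_FAULT_DMA_UNRECOV)
			dev_err(dev, "unrecoverable IOMMU fault\n");

		return 0;
	}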

Re: [PATCH v6 05/22] iommu: Introduce cache_invalidate API

2019-03-22 Thread Auger Eric
Hi Jacob,

On 3/21/19 11:10 PM, Jacob Pan wrote:
> On Thu, 21 Mar 2019 15:32:45 +0100
> Auger Eric  wrote:
> 
>> Hi jean, Jacob,
>>
>> On 3/21/19 3:13 PM, Jean-Philippe Brucker wrote:
>>> On 21/03/2019 13:54, Auger Eric wrote:  
 Hi Jacob, Jean-Philippe,

 On 3/20/19 5:50 PM, Jean-Philippe Brucker wrote:  
> On 20/03/2019 16:37, Jacob Pan wrote:
> [...]  
>>> +struct iommu_inv_addr_info {
>>> +#define IOMMU_INV_ADDR_FLAGS_PASID (1 << 0)
>>> +#define IOMMU_INV_ADDR_FLAGS_ARCHID(1 << 1)
>>> +#define IOMMU_INV_ADDR_FLAGS_LEAF  (1 << 2)
>>> +   __u32   flags;
>>> +   __u32   archid;
>>> +   __u64   pasid;
>>> +   __u64   addr;
>>> +   __u64   granule_size;
>>> +   __u64   nb_granules;
>>> +};
>>> +
>>> +/**
>>> + * First level/stage invalidation information
>>> + * @cache: bitfield that allows to select which caches to invalidate
>>> + * @granularity: defines the lowest granularity used for the invalidation:
>>> + *     domain > pasid > addr
>>> + *
>>> + * Not all the combinations of cache/granularity make sense:
>>> + *
>>> + * type        |   DEV_IOTLB   |     IOTLB     |      PASID      |
>>> + * granularity |               |               |      cache      |
>>> + * ------------+---------------+---------------+-----------------+
>>> + * DOMAIN      |      N/A      |       Y       |        Y        |
>>> + * PASID       |       Y       |       Y       |        Y        |
>>> + * ADDR        |       Y       |       Y       |       N/A       |
>>> + */
>>> +struct iommu_cache_invalidate_info {
>>> +#define IOMMU_CACHE_INVALIDATE_INFO_VERSION_1 1
>>> +   __u32   version;
>>> +/* IOMMU paging structure cache */
>>> +#define IOMMU_CACHE_INV_TYPE_IOTLB	(1 << 0) /* IOMMU IOTLB */
>>> +#define IOMMU_CACHE_INV_TYPE_DEV_IOTLB	(1 << 1) /* Device IOTLB */
>>> +#define IOMMU_CACHE_INV_TYPE_PASID	(1 << 2) /* PASID cache */
>> Just a clarification, this used to be an enum. You do intend to
>> issue a single invalidation request on multiple cache types?
>> Perhaps for virtio-IOMMU? I only see a single cache type in your
>> patch #14. For VT-d we plan to issue one cache type at a time
>> for now. So this format works for us.  
>
> Yes for virtio-iommu I'd like as little overhead as possible,
> which means a single invalidation message to hit both IOTLB and
> ATC at once, and the ability to specify multiple pages with
> @nb_granules.  
 The original request/explanation from Jean-Philippe can be found
 here: https://lkml.org/lkml/2019/1/28/1497
  
>  
>> However, if multiple cache types are issued in a single
>> invalidation. They must share a single granularity, not all
>> combinations are valid. e.g. dev IOTLB does not support domain
>> granularity. Just a reminder, not an issue. Driver could filter
>> out invalid combinations.  
 Sure I will add a comment about this restriction.  
>
> Agreed. Even the core could filter out invalid combinations based
> on the table above: IOTLB and domain granularity are N/A.  
 I don't get this sentence. What about vtd IOTLB domain-selective
 invalidation:  
>>>
>>> My mistake: I meant dev-IOTLB and domain granularity are N/A  
>>
>> Ah OK, no worries.
>>
>> How do we proceed further with those user APIs? Besides the comment to
>> be added above and previous suggestion from Jean ("Invalidations by
>> @granularity use field ...), have we reached a consensus now on:
>>
>> - attach/detach_pasid_table
>> - cache_invalidate
>> - fault data and fault report API?
>>
> These APIs are sufficient for today's VT-d use and leave enough space
> for extension. E.g. new fault reasons.
> 
> I have cherry picked the above APIs from your patchset to enable VT-d
> nested translation. Going forward, I will reuse these until they get
> merged.

OK thanks!

Eric
> 
>> If not, please let me know.
>>
>> Thanks
>>
>> Eric
>>
>>
>>>
>>> Thanks,
>>> Jean
>>>   
 "
 • IOTLB entries caching mappings associated with the specified
 domain-id are invalidated.
 • Paging-structure-cache entries caching mappings associated with
 the specified domain-id are invalidated.
 "

 Thanks

 Eric
  
>
> Thanks,
> Jean
>  
>>  
>>> +   __u8cache;
>>> +   __u8granularity;
>>> +   __u8padding[2];
>>> +   union {
>>> +   __u64   pasid;
>>> +   struct iommu_inv_addr_info addr_info;
>>> +   };
>>> +};
>>> +
>>> +
>>>  #endif /* _UAPI_IOMMU_H */  
>>
>> [Jacob Pan]
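
To make the granularity discussion concrete, a PASID-selective IOTLB
invalidation would be described roughly like this (sketch; the
IOMMU_INV_GRANU_PASID constant name is an assumption based on later
revisions of the series):

	struct iommu_cache_invalidate_info inv = {
		.version     = IOMMU_CACHE_INVALIDATE_INFO_VERSION_1,
		.cache       = IOMMU_CACHE_INV_TYPE_IOTLB,
		.granularity = IOMMU_INV_GRANU_PASID,
		.pasid       = pasid,
	};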

Re: [PATCH v6 07/22] vfio: VFIO_IOMMU_ATTACH/DETACH_PASID_TABLE

2019-03-22 Thread Auger Eric
Hi Alex,

On 3/21/19 11:19 PM, Alex Williamson wrote:
> On Sun, 17 Mar 2019 18:22:17 +0100
> Eric Auger  wrote:
> 
>> From: "Liu, Yi L" 
>>
>> This patch adds VFIO_IOMMU_ATTACH/DETACH_PASID_TABLE ioctl
>> which aims to pass/withdraw the virtual iommu guest configuration
>> to/from the VFIO driver downto to the iommu subsystem.
>>
>> Signed-off-by: Jacob Pan 
>> Signed-off-by: Liu, Yi L 
>> Signed-off-by: Eric Auger 
>>
>> ---
>> v3 -> v4:
>> - restore ATTACH/DETACH
>> - add unwind on failure
>>
>> v2 -> v3:
>> - s/BIND_PASID_TABLE/SET_PASID_TABLE
>>
>> v1 -> v2:
>> - s/BIND_GUEST_STAGE/BIND_PASID_TABLE
>> - remove the struct device arg
>> ---
>>  drivers/vfio/vfio_iommu_type1.c | 53 +
>>  include/uapi/linux/vfio.h   | 17 +++
>>  2 files changed, 70 insertions(+)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index 73652e21efec..222e9199edbf 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -1644,6 +1644,43 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
>>  return ret;
>>  }
>>  
>> +static void
>> +vfio_detach_pasid_table(struct vfio_iommu *iommu)
>> +{
>> +struct vfio_domain *d;
>> +
>> +	mutex_lock(&iommu->lock);
>> +
>> +	list_for_each_entry(d, &iommu->domain_list, next) {
>> +		iommu_detach_pasid_table(d->domain);
>> +	}
>> +	mutex_unlock(&iommu->lock);
>> +}
>> +
>> +static int
>> +vfio_attach_pasid_table(struct vfio_iommu *iommu,
>> +struct vfio_iommu_type1_attach_pasid_table *ustruct)
>> +{
>> +struct vfio_domain *d;
>> +int ret = 0;
>> +
>> +	mutex_lock(&iommu->lock);
>> +
>> +	list_for_each_entry(d, &iommu->domain_list, next) {
>> +		ret = iommu_attach_pasid_table(d->domain, &ustruct->config);
>> +if (ret)
>> +goto unwind;
>> +}
>> +goto unlock;
>> +unwind:
>> +	list_for_each_entry_continue_reverse(d, &iommu->domain_list, next) {
>> +iommu_detach_pasid_table(d->domain);
>> +}
>> +unlock:
>> +	mutex_unlock(&iommu->lock);
>> +return ret;
>> +}
>> +
>>  static long vfio_iommu_type1_ioctl(void *iommu_data,
>> unsigned int cmd, unsigned long arg)
>>  {
>> @@ -1714,6 +1751,22 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>>  
>>  	return copy_to_user((void __user *)arg, &unmap, minsz) ?
>>  -EFAULT : 0;
>> +} else if (cmd == VFIO_IOMMU_ATTACH_PASID_TABLE) {
>> +struct vfio_iommu_type1_attach_pasid_table ustruct;
>> +
>> +minsz = offsetofend(struct vfio_iommu_type1_attach_pasid_table,
>> +config);
>> +
>> +	if (copy_from_user(&ustruct, (void __user *)arg, minsz))
>> +return -EFAULT;
>> +
>> +if (ustruct.argsz < minsz || ustruct.flags)
>> +return -EINVAL;
> 
> Who is responsible for validating the ustruct.config?
> arm_smmu_attach_pasid_table() only looks at the format, ignoring the
> version field :-\  In fact, where is struct iommu_pasid_smmuv3 ever used
> from the config?

This is something missing and to be fixed in smmuv3
arm_smmu_attach_pasid_table(). At the moment the virtual SMMUv3 only
supports a single context descriptor hence the shortcut.

> Should the padding fields be forced to zero?  We
> don't have flags to incorporate use of them with future extensions, so
> maybe we don't care?

My guess is if we were to add new fields in iommu_pasid_smmuv3, we would
both increment iommu_pasid_smmuv3.version and
iommu_pasid_table_config.version. I don't think padding fields are meant
to be filled here (ie. no flag needed).

> 
>> +
>> +	return vfio_attach_pasid_table(iommu, &ustruct);
>> +} else if (cmd == VFIO_IOMMU_DETACH_PASID_TABLE) {
>> +vfio_detach_pasid_table(iommu);
>> +return 0;
>>  }
>>  
>>  return -ENOTTY;
>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>> index 02bb7ad6e986..329d378565d9 100644
>> --- a/include/uapi/linux/vfio.h
>> +++ b/include/uapi/linux/vfio.h
>> @@ -14,6 +14,7 @@
>>  
>>  #include 
>>  #include 
>> +#include 
>>  
>>  #define VFIO_API_VERSION0
>>  
>> @@ -759,6 +760,22 @@ struct vfio_iommu_type1_dma_unmap {
>>  #define VFIO_IOMMU_ENABLE   _IO(VFIO_TYPE, VFIO_BASE + 15)
>>  #define VFIO_IOMMU_DISABLE  _IO(VFIO_TYPE, VFIO_BASE + 16)
>>  
>> +/**
>> + * VFIO_IOMMU_ATTACH_PASID_TABLE - _IOWR(VFIO_TYPE, VFIO_BASE + 22,
>> + *  struct vfio_iommu_type1_attach_pasid_table)
>> + *
>> + * Passes the PASID table to the host. Calling ATTACH_PASID_TABLE
>> + * while a table is already installed is allowed: it replaces the old
>> + * table. DETACH does a comprehensive tear down of the nested mode.
>> + */
>> +struct vfio_iommu_type1_attach_pasid_table {
>> +__u32   argsz;
>> +__u32   flags;
>> +struct iommu_pasid_table_config config;
>> +};
>> +#define
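
From userspace, the new ioctl follows the usual VFIO argsz pattern
(sketch; the embedded config carries the SMMUv3-specific PASID table
description and is filled in by the VMM):

	struct vfio_iommu_type1_attach_pasid_table attach = {
		.argsz = sizeof(attach),
		.flags = 0,
		/* .config: format, version and guest PASID table pointer */
	};

	if (ioctl(container_fd, VFIO_IOMMU_ATTACH_PASID_TABLE, &attach))
		perror("VFIO_IOMMU_ATTACH_PASID_TABLE");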