Re: [Qemu-devel] [RFC PATCH 09/20] Memory: introduce iommu_ops->record_device

2017-05-18 Thread Liu, Yi L
Hi Alex,

What's your opinion on Tianyu's question? Is it acceptable
to use the VFIO API in the intel_iommu emulator?

Thanks,
Yi L
On Fri, Apr 28, 2017 at 02:46:16PM +0800, Lan Tianyu wrote:
> On 2017年04月26日 18:06, Liu, Yi L wrote:
> > With a vIOMMU exposed to the guest, the vIOMMU emulator needs to translate
> > between host and guest, e.g. for a device-selective TLB flush, the vIOMMU
> > emulator needs to replace the guest SID with the host SID so as to limit
> > the invalidation. This patch introduces a new callback
> > iommu_ops->record_device() to notify the vIOMMU emulator to record necessary
> > information about the assigned device.
> 
> This patch prepares for translating a guest sbdf to a host sbdf.
> 
> Alex:
>   Could we add a new VFIO API to do such translation? This would be more
> straightforward than storing the host sbdf in the vIOMMU device model.
> 
> > 
> > Signed-off-by: Liu, Yi L 
> > ---
> >  include/exec/memory.h | 11 +++
> >  memory.c  | 12 
> >  2 files changed, 23 insertions(+)
> > 
> > diff --git a/include/exec/memory.h b/include/exec/memory.h
> > index 7bd13ab..49087ef 100644
> > --- a/include/exec/memory.h
> > +++ b/include/exec/memory.h
> > @@ -203,6 +203,8 @@ struct MemoryRegionIOMMUOps {
> >  IOMMUNotifierFlag new_flags);
> >  /* Set this up to provide customized IOMMU replay function */
> >  void (*replay)(MemoryRegion *iommu, IOMMUNotifier *notifier);
> > +void (*record_device)(MemoryRegion *iommu,
> > +  void *device_info);
> >  };
> >  
> >  typedef struct CoalescedMemoryRange CoalescedMemoryRange;
> > @@ -708,6 +710,15 @@ void memory_region_notify_iommu(MemoryRegion *mr,
> >  void memory_region_notify_one(IOMMUNotifier *notifier,
> >IOMMUTLBEntry *entry);
> >  
> > +/*
> > + * memory_region_notify_device_record: notify the IOMMU to record an
> > + * assigned device.
> > + * @mr: the memory region to notify
> > + * @info: device information
> > + */
> > +void memory_region_notify_device_record(MemoryRegion *mr,
> > +void *info);
> > +
> >  /**
> >   * memory_region_register_iommu_notifier: register a notifier for changes to
> >   * IOMMU translation entries.
> > diff --git a/memory.c b/memory.c
> > index 0728e62..45ef069 100644
> > --- a/memory.c
> > +++ b/memory.c
> > @@ -1600,6 +1600,18 @@ static void memory_region_update_iommu_notify_flags(MemoryRegion *mr)
> >  mr->iommu_notify_flags = flags;
> >  }
> >  
> > +void memory_region_notify_device_record(MemoryRegion *mr,
> > +void *info)
> > +{
> > +assert(memory_region_is_iommu(mr));
> > +
> > +if (mr->iommu_ops->record_device) {
> > +mr->iommu_ops->record_device(mr, info);
> > +}
> > +
> > +return;
> > +}
> > +
> >  void memory_region_register_iommu_notifier(MemoryRegion *mr,
> > IOMMUNotifier *n)
> >  {
> > 
> 
> 
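For context, a minimal sketch of how the intel_iommu emulator side might
consume this callback. Only the record_device() hook comes from the patch;
the record type and helper below are hypothetical, since the patch leaves
the device_info payload undefined:

/* Hypothetical payload passed as the opaque device_info */
typedef struct VTDDeviceRecord {
    uint16_t guest_sid;  /* bus:devfn as exposed to the guest */
    uint16_t host_sid;   /* bus:devfn of the assigned host device */
} VTDDeviceRecord;

static void vtd_record_device(MemoryRegion *iommu, void *device_info)
{
    VTDDeviceRecord *rec = device_info;

    /* Remember the mapping so a guest device-selective TLB flush can
     * later be rewritten to target only the matching host SID. */
    vtd_sid_table_insert(rec->guest_sid, rec->host_sid); /* hypothetical */
}

static MemoryRegionIOMMUOps vtd_iommu_ops = {
    /* ... translate, notify_flag_changed, replay ... */
    .record_device = vtd_record_device,
};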

Re: [PATCH v6 1/3] of/pci/dma: fix DMA configuration for PCI masters

2017-05-18 Thread Oza Oza via iommu
On Thu, May 18, 2017 at 12:43 AM, Arnd Bergmann  wrote:
> On Tue, May 16, 2017 at 7:22 AM, Oza Pawandeep  wrote:
>> current device framework and OF framework integration assumes
>> dma-ranges in a way where memory-mapped devices define their
>> dma-ranges. (child-bus-address, parent-bus-address, length).
>>
>> of_dma_configure is specifically written to take care of memory
>> mapped devices. but no implementation exists for pci to take
>> care of pcie based memory ranges.
>
> Hi Oza,
>
> I'm trying to make sense of this, but am still rather puzzled. I have
> no idea what the distinction between memory-mapped devices and
> pcie based devices is in your description, as PCIe is usually memory
> mapped, and Linux doesn't actually support other kinds of PCIe
> devices on most architectures.
>
There are 2 problems which I am trying to address here.

Problem 1:
Let me explain our PCI RC's limitations first.

IOVA allocation honours the device's coherent_dma_mask/dma_mask.
In the PCI case, the current code honours the DMA mask set by the EP;
there is no concept of a PCI host bridge dma-mask, which should exist
and would truly reflect the limitation of the PCI host bridge.

Having said that, we have
dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;
which means we can only address 512 GB.

Now, because of the broken of_dma_get_range, we end up getting a 64-bit dma_mask.
Please check the code in of_dma_configure():
if (ret < 0) {
dma_addr = offset = 0;
size = max(dev->coherent_dma_mask, dev->coherent_dma_mask + 1);

In this process I figured out the problems in of_dma_get_range, hence the fix:
1) it returns a wrong size of 0 because of a parsing problem.
2) it does not handle the absence of dma-ranges, which is valid for a PCI master.
3) it does not handle multiple inbound windows.
4) in order to get the largest possible dma_mask, this patch also returns
the largest possible size based on dma-ranges.

Please have a look at
[PATCH v6 3/3] PCI/of fix of_dma_get_range; get PCI specific dma-ranges
I just made it bus-specific, leaving the original of_dma_get_range unmodified
and defining a new PCI handler, of_bus_pci_get_dma_ranges.

Also, when I say memory-mapped and PCI device, I only mean it with respect
to the dma-ranges format (of course PCI is memory mapped as well).
Probably my commit description is misleading, sorry about that.

So Problem 1 is just a bug fix, nothing else.
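For illustration, a minimal sketch (mine, not the posted patch) of what a
correctly returned size buys the caller, assuming of_dma_get_range()'s
existing int/u64* signature and a power-of-two inbound window:

u64 dma_addr, paddr, size;
int ret;

ret = of_dma_get_range(dev->of_node, &dma_addr, &paddr, &size);
if (!ret && size) {
	/* a 512 GB inbound window (size = 0x80_0000_0000) gives
	 * coherent_dma_mask = 0x7f_ffff_ffff instead of a 64-bit mask */
	dev->coherent_dma_mask = DMA_BIT_MASK(ilog2(size));
}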


Problem2: [PATCH v6 2/3] iommu/pci: reserve IOVA for PCI masters

We have memory banks:

<0x 0x8000 0x0 0x8000>, /* 2G @ 2G */
<0x0008 0x8000 0x3 0x8000>, /* 14G @ 34G */
<0x0090 0x 0x4 0x>, /* 16G @ 576G */
<0x00a0 0x 0x4 0x>; /* 16G @ 640G */

When I run SPDK (user space), it internally uses VFIO to access the PCI
endpoint directly.
VFIO uses huge pages, which could come from 640G/0x00a0,
and the way VFIO maps the hugepage to user space and generates an IOVA is
different from the way the kernel allocates an IOVA.

VFIO just maps the IOVA one-to-one to the physical address;
it directly calls remap_pfn_range.

So the way the kernel allocates an IOVA (where it honours the device dma_mask)
and the way userspace gets an IOVA are totally different.
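A userspace sketch of that contrast, using the existing VFIO type1 uAPI:
the caller, not the kernel's IOVA allocator, chooses the IOVA, so nothing
checks it against the host bridge's reach. The container/hugepage setup
is assumed to have been done elsewhere:

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

static int map_one_to_one(int container_fd, void *buf, size_t len,
                          uint64_t phys)
{
	struct vfio_iommu_type1_dma_map map = {
		.argsz = sizeof(map),
		.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
		.vaddr = (uintptr_t)buf,   /* hugepage-backed buffer */
		.iova  = phys,             /* 1:1, can be 640G+, beyond 512 GB */
		.size  = len,
	};

	return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
}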

So dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;
will not work.
Instead we have to go for scattered dma-ranges, leaving holes.

Having said that, we have to reserve the IOVA allocations for inbound memory.
I am in the process of addressing Robin Murphy's comments on that and rebasing
my patch on rc12.

This problem statement is more important to us,
because it makes both kernel and user space IOVA allocations work when the
IOMMU is enabled.
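A minimal sketch of that reservation (my illustration, not the actual patch):
reserve_iova() and iova_pfn() are the existing IOVA-allocator helpers, while
the sorted 'windows' array stands in for however the host bridge driver
exposes its inbound ranges:

#include <linux/iova.h>

static void reserve_inbound_holes(struct iova_domain *iovad,
                                  const struct resource *windows,
                                  int nr_windows, u64 limit)
{
	u64 next = 0;
	int i;

	/* Reserve every gap between consecutive inbound windows... */
	for (i = 0; i < nr_windows; i++) {
		if (windows[i].start > next)
			reserve_iova(iovad, iova_pfn(iovad, next),
				     iova_pfn(iovad, windows[i].start - 1));
		next = windows[i].end + 1;
	}

	/* ...and the hole above the last window, up to the DMA limit. */
	if (next < limit)
		reserve_iova(iovad, iova_pfn(iovad, next),
			     iova_pfn(iovad, limit - 1));
}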

Probably things might be confusing because I am clubbing my patches to
address both problems. Going forward I should just try to first send out the
patch for problem 2 alone (not sure),
because my next patch-set would bring some changes in pci/probe.c as well.

>> for e.g. iproc based SOCs and other SOCs(suc as rcar) have PCI
>> world dma-ranges.
>> dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;
>>
>> this patch serves following:
>>
>> 1) exposes interface to the pci host driver for their
>> inbound memory ranges
>>
>> 2) provide an interface to callers such as of_dma_get_ranges.
>> so then the returned size get best possible (largest) dma_mask.
>> because PCI RC drivers do not call APIs such as
>> dma_set_coherent_mask() and hence rather it shows its addressing
>> capabilities based on dma-ranges.
>>
>> for e.g.
>> dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;
>> we should get dev->coherent_dma_mask=0x7f.
>
> do you mean the coherent_dma_mask of the PCI host bridge
> or an attached device here?
>
> If you require PCI devices to come up with an initial
> coherent_dma_mask other than 0xff, there are other
> problems involved. In particular, you will need to use
> swiotlb, which is not supported on arm32 at the moment,
> and the dma_set_mask()/dma_set_coherent_mask()
> functions need to be modified.

even without this 

Re: [PATCH] iommu/amd: flush IOTLB for specific domains only

2017-05-18 Thread Michel Dänzer
On 07/04/17 07:20 PM, Joerg Roedel wrote:
> On Mon, Mar 27, 2017 at 11:47:07AM +0530, arindam.n...@amd.com wrote:
>> From: Arindam Nath 
>>
>> The idea behind flush queues is to defer the IOTLB flushing
>> for domains for which the mappings are no longer valid. We
>> add such domains in queue_add(), and when the queue size
>> reaches FLUSH_QUEUE_SIZE, we perform __queue_flush().
>>
>> Since we have already taken lock before __queue_flush()
>> is called, we need to make sure the IOTLB flushing is
>> performed as quickly as possible.
>>
>> In the current implementation, we perform IOTLB flushing
>> for all domains irrespective of which ones were actually
>> added in the flush queue initially. This can be quite
>> expensive especially for domains for which unmapping is
>> not required at this point of time.
>>
>> This patch makes use of domain information in
>> 'struct flush_queue_entry' to make sure we only flush
>> IOTLBs for domains who need it, skipping others.
>>
>> Signed-off-by: Arindam Nath 
>> ---
>>  drivers/iommu/amd_iommu.c | 15 ---
>>  1 file changed, 8 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
>> index 98940d1..6a9a048 100644
>> --- a/drivers/iommu/amd_iommu.c
>> +++ b/drivers/iommu/amd_iommu.c
>> @@ -2227,15 +2227,16 @@ static struct iommu_group *amd_iommu_device_group(struct device *dev)
>>  
>>  static void __queue_flush(struct flush_queue *queue)
>>  {
>> -struct protection_domain *domain;
>> -unsigned long flags;
>>  int idx;
>>  
>> -/* First flush TLB of all known domains */
>> -spin_lock_irqsave(&amd_iommu_pd_lock, flags);
>> -list_for_each_entry(domain, &amd_iommu_pd_list, list)
>> -domain_flush_tlb(domain);
>> -spin_unlock_irqrestore(&amd_iommu_pd_lock, flags);
>> +/* First flush TLB of all domains which were added to flush queue */
>> +for (idx = 0; idx < queue->next; ++idx) {
>> +struct flush_queue_entry *entry;
>> +
>> +entry = queue->entries + idx;
>> +
>> +domain_flush_tlb(&entry->dma_dom->domain);
>> +}
> 
> With this we will flush a domain every time we find one of its
> iova-addresses in the flush queue, so potentially we flush a domain
> multiple times per __queue_flush() call.
> 
> It's better to either add a flush-flag to the domains and evaluate that
> in __queue_flush or keep a list of domains to flush to make the flushing
> really more efficient.
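A sketch of the flush-flag variant Joerg describes (illustration only:
'flushed' is a hypothetical member of struct protection_domain, while
domain_flush_tlb() and the flush_queue structures are the existing ones;
only the flush portion is shown, the iova-free loop would follow unchanged):

static void queue_flush_domains(struct flush_queue *queue)
{
	int idx;

	/* Flush each distinct domain at most once per call */
	for (idx = 0; idx < queue->next; ++idx) {
		struct protection_domain *domain =
			&queue->entries[idx].dma_dom->domain;

		if (domain->flushed)		/* hypothetical flag */
			continue;

		domain_flush_tlb(domain);
		domain->flushed = true;
	}

	/* Reset the hypothetical flags for the next round */
	for (idx = 0; idx < queue->next; ++idx)
		queue->entries[idx].dma_dom->domain.flushed = false;
}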

Arindam, can you incorporate Joerg's feedback?

FWIW, looks like Carrizo systems are affected by this as well (see e.g.
https://bugs.freedesktop.org/show_bug.cgi?id=101029#c21), so it would be
good to land this fix in some form ASAP.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer

Re: [PATCH v6 1/3] of/pci/dma: fix DMA configuration for PCI masters

2017-05-18 Thread Oza Oza via iommu
On Wed, May 17, 2017 at 10:40 PM, Bjorn Helgaas  wrote:
> On Tue, May 16, 2017 at 10:52:05AM +0530, Oza Pawandeep wrote:
>> current device framework and OF framework integration assumes
>
> s/current/The current/
>
>> dma-ranges in a way where memory-mapped devices define their
>> dma-ranges. (child-bus-address, parent-bus-address, length).
>>
>> of_dma_configure is specifically written to take care of memory
>> mapped devices. but no implementation exists for pci to take
>> care of pcie based memory ranges.
>
> s/pci/PCI/  (also other occurrences below)
> s/pcie/PCIe/
>
> I don't see how PCIe is relevant here.  The bridge might support PCIe,
> but I don't think anything here is actually specific to PCIe.  If
> that's the case, I think it's confusing to mention PCIe.

It attempts to fix of_dma_get_range for PCI masters,
because it currently returns *size as 0 (to the caller, of_dma_configure),
resulting in the largest dma_mask, which would be a 64-bit mask on armv8.
That has usually worked so far, because other SoCs' PCI RCs do not have
the limitations of the Broadcom iproc-based PCI RC.
Our RC will drop 64-bit IOVAs, because it is not capable of addressing the
entire 64-bit range.

In fact there are 2 real problems; please allow me to explain.
Please refer to my next mail, in reply to Arnd Bergmann.

>
>> for e.g. iproc based SOCs and other SOCs(suc as rcar) have PCI
>> world dma-ranges.
>> dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;
>>
>> this patch serves following:
>>
>> 1) exposes interface to the pci host driver for their
>> inbound memory ranges
>>
>> 2) provide an interface to callers such as of_dma_get_ranges.
>> so then the returned size get best possible (largest) dma_mask.
>> because PCI RC drivers do not call APIs such as
>> dma_set_coherent_mask() and hence rather it shows its addressing
>> capabilities based on dma-ranges.
>> for e.g.
>> dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>;
>> we should get dev->coherent_dma_mask=0x7f.
>>
>> 3) this patch handles multiple inbound windows and dma-ranges.
>> it is left to the caller, how it wants to use them.
>> the new function returns the resources in a standard and unform way
>>
>> 4) this way the callers of for e.g. of_dma_get_ranges
>> does not need to change.
>
> Please start sentences with a capital letter.

will take care of your comments.
Thanks,
Oza.


Re: [PATCH v6 2/3] iommu/pci: reserve IOVA for PCI masters

2017-05-18 Thread Oza Oza via iommu
On Wed, May 17, 2017 at 10:41 PM, Bjorn Helgaas  wrote:
> On Tue, May 16, 2017 at 10:52:06AM +0530, Oza Pawandeep wrote:
>> this patch reserves the IOVA for PCI masters.
>> ARM64 based SOCs may have scattered memory banks.
>> such as iproc based SOC has
>>
>> <0x 0x8000 0x0 0x8000>, /* 2G @ 2G */
>> <0x0008 0x8000 0x3 0x8000>, /* 14G @ 34G */
>> <0x0090 0x 0x4 0x>, /* 16G @ 576G */
>> <0x00a0 0x 0x4 0x>; /* 16G @ 640G */
>>
>> but incoming PCI transcation addressing capability is limited
>
> s/transcation/transaction/
>
>> by host bridge, for example if max incoming window capability
>> is 512 GB, then 0x0090 and 0x00a0 will fall beyond it.
>>
>> to address this problem, iommu has to avoid allocating IOVA which
>
> s/iommu/IOMMU/
>
>> are reserved. which inturn does not allocate IOVA if it falls into hole.
>
> s/inturn/in turn/

Hi Bjorn,

Thank you for the comments.
Will take care of all your comments.

Regards,
Oza.


Re: [PATCH v5 17/32] x86/mm: Add support to access boot related data in the clear

2017-05-18 Thread Matt Fleming
On Mon, 15 May, at 08:35:17PM, Borislav Petkov wrote:
> On Tue, Apr 18, 2017 at 04:19:21PM -0500, Tom Lendacky wrote:
>
> > +   paddr = boot_params.efi_info.efi_memmap_hi;
> > +   paddr <<= 32;
> > +   paddr |= boot_params.efi_info.efi_memmap;
> > +   if (phys_addr == paddr)
> > +   return true;
> > +
> > +   paddr = boot_params.efi_info.efi_systab_hi;
> > +   paddr <<= 32;
> > +   paddr |= boot_params.efi_info.efi_systab;
> 
> So those two above look like could be two global vars which are
> initialized somewhere in the EFI init path:
> 
> efi_memmap_phys and efi_systab_phys or so.
> 
> Matt ?
> 
> And then you won't need to create that paddr each time on the fly. I
> mean, it's not a lot of instructions but still...
 
We should already have the physical memmap address available in
'efi.memmap.phys_map'.

And the physical address of the system table should be in
'efi_phys.systab'. See efi_init().
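In other words, a minimal sketch of the comparison using those fields;
note that efi.memmap.phys_map is globally visible, while efi_phys is static
to arch/x86/platform/efi/efi.c and would need an accessor or export first:

/* efi.memmap.phys_map already records the memmap's physical address */
if (phys_addr == (u64)efi.memmap.phys_map)
	return true;

/* efi_phys.systab holds the system table's physical address, but is
 * currently private to efi.c, so this line assumes it gets exposed */
if (phys_addr == (u64)(unsigned long)efi_phys.systab)
	return true;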


Re: [PATCH v5 31/32] x86: Add sysfs support for Secure Memory Encryption

2017-05-18 Thread Borislav Petkov
On Tue, Apr 18, 2017 at 04:22:12PM -0500, Tom Lendacky wrote:
> Add sysfs support for SME so that user-space utilities (kdump, etc.) can
> determine if SME is active.

But why do user-space tools need to know that?

I mean, when we load the kdump kernel, we do it with the first kernel,
with the kexec_load() syscall, AFAICT. And that code does a lot of
things during that init, like machine_kexec_prepare()->init_pgtable() to
prepare the ident mapping of the second kernel, for example.

What I'm aiming at is that the first kernel knows *exactly* whether SME
is enabled or not and doesn't need to tell the second one through some
sysfs entries - it can do that during loading.

So I don't think we need any userspace things at all...

Or?

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH V4 2/4] iommu: of: Ignore all errors except EPROBE_DEFER

2017-05-18 Thread Laurent Pinchart
Hi Sricharan,

Thank you for the patch.

On Thursday 18 May 2017 20:24:15 Sricharan R wrote:
> While deferring the probe of IOMMU masters, xlate and
> add_device callbacks called from of_iommu_configure
> can pass back error values like -ENODEV, which means
> the IOMMU cannot be connected with that master for real
> reasons. Before the IOMMU probe deferral, all such errors
> were ignored. Now all those errors are propagated back,
> killing the master's probe for such errors. Instead ignore
> all the errors except EPROBE_DEFER, which is the only one
> of concern and let the master work without IOMMU, thus
> restoring the old behavior.
> 
> Fixes: 7b07cbefb68d ("iommu: of: Handle IOMMU lookup failure with deferred probing or error")
> Reported-by: Geert Uytterhoeven 
> Tested-by: Magnus Damn 
> Signed-off-by: Sricharan R 

Reviewed-by: Laurent Pinchart 

> ---
> [V4] Reworded commit log and changed dev_info to dev_dbg
> 
>  drivers/iommu/of_iommu.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
> index e6e9bec..19779b8 100644
> --- a/drivers/iommu/of_iommu.c
> +++ b/drivers/iommu/of_iommu.c
> @@ -237,6 +237,12 @@ const struct iommu_ops *of_iommu_configure(struct
> device *dev, ops = ERR_PTR(err);
>   }
> 
> + /* Ignore all other errors apart from EPROBE_DEFER */
> + if (IS_ERR(ops) && (PTR_ERR(ops) != -EPROBE_DEFER)) {
> + dev_dbg(dev, "Adding to IOMMU failed: %ld\n", PTR_ERR(ops));
> + ops = NULL;
> + }
> +
>   return ops;
>  }
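Viewed from the caller's side, the net effect is roughly this (a sketch,
assuming of_dma_configure() propagates the error as in the deferral series):

const struct iommu_ops *ops = of_iommu_configure(dev, dev->of_node);

if (IS_ERR(ops))		/* now only -EPROBE_DEFER gets this far */
	return PTR_ERR(ops);	/* the master's probe is retried later */

/* ops may be NULL here: the master simply runs without an IOMMU,
 * as it did before the probe-deferral series. */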

-- 
Regards,

Laurent Pinchart



[PATCH V4 3/4] ACPI/IORT: Ignore all errors except EPROBE_DEFER

2017-05-18 Thread Sricharan R
While deferring the probe of IOMMU masters, xlate and
add_device callbacks called from iort_iommu_configure
can pass back error values like -ENODEV, which means
the IOMMU cannot be connected with that master for real
reasons. Before the IOMMU probe deferral, all such errors
were ignored. Now all those errors are propagated back,
killing the master's probe for such errors. Instead ignore
all the errors except EPROBE_DEFER, which is the only one
of concern and let the master work without IOMMU, thus
restoring the old behavior.

Fixes: 5a1bb638d567 ("drivers: acpi: Handle IOMMU lookup failure with deferred 
probing or error")
Signed-off-by: Sricharan R 
---
[V4] Newly added in this version.

 drivers/acpi/arm64/iort.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index c5fecf9..16e101f 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -782,6 +782,12 @@ const struct iommu_ops *iort_iommu_configure(struct device *dev)
if (err)
ops = ERR_PTR(err);
 
+   /* Ignore all other errors apart from EPROBE_DEFER */
+   if (IS_ERR(ops) && (PTR_ERR(ops) != -EPROBE_DEFER)) {
+   dev_dbg(dev, "Adding to IOMMU failed: %ld\n", PTR_ERR(ops));
+   ops = NULL;
+   }
+
return ops;
 }
 
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
Code Aurora Forum, hosted by The Linux Foundation



[PATCH V4 4/4] ARM: dma-mapping: Don't tear third-party mappings

2017-05-18 Thread Sricharan R
From: Laurent Pinchart 

arch_setup_dma_ops() is used in device probe code paths to create an
IOMMU mapping and attach it to the device. The function assumes that the
device is attached to a device-specific IOMMU instance (or at least a
device-specific TLB in a shared IOMMU instance) and thus creates a
separate mapping for every device.

On several systems (Renesas R-Car Gen2 being one of them), that
assumption is not true, and IOMMU mappings must be shared between
multiple devices. In those cases the IOMMU driver knows better than the
generic ARM dma-mapping layer and attaches mapping to devices manually
with arm_iommu_attach_device(), which sets the DMA ops for the device.

The arch_setup_dma_ops() function takes this into account and bails out
immediately if the device already has DMA ops assigned. However, the
corresponding arch_teardown_dma_ops() function, called from driver
unbind code paths (including probe deferral), will tear the mapping down
regardless of who created it. When the device is reprobed
arch_setup_dma_ops() will be called again but won't perform any
operation as the DMA ops will still be set.

We need to reset the DMA ops in arch_teardown_dma_ops() to fix this.
However, we can't do so unconditionally, as then a new mapping would be
created by arch_setup_dma_ops() when the device is reprobed, regardless
of whether the device needs to share a mapping or not. We must thus keep
track of whether arch_setup_dma_ops() created the mapping, and only in
that case tear it down in arch_teardown_dma_ops().

Keep track of that information in the dev_archdata structure. As the
structure is embedded in all instances of struct device let's not grow
it, but turn the existing dma_coherent bool field into a bitfield that
can be used for other purposes.

Fixes: 09515ef5ddad ("of/acpi: Configure dma operations at probe time for 
platform/amba/pci bus devices")
Reviewed-by: Robin Murphy 
Signed-off-by: Laurent Pinchart 
---
 arch/arm/include/asm/device.h | 3 ++-
 arch/arm/mm/dma-mapping.c | 4 
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/arm/include/asm/device.h b/arch/arm/include/asm/device.h
index 36ec9c8..3234fe9 100644
--- a/arch/arm/include/asm/device.h
+++ b/arch/arm/include/asm/device.h
@@ -19,7 +19,8 @@ struct dev_archdata {
 #ifdef CONFIG_XEN
const struct dma_map_ops *dev_dma_ops;
 #endif
-   bool dma_coherent;
+   unsigned int dma_coherent:1;
+   unsigned int dma_ops_setup:1;
 };
 
 struct omap_device;
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index c742dfd..b48998f 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -2430,9 +2430,13 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
dev->dma_ops = xen_dma_ops;
}
 #endif
+   dev->archdata.dma_ops_setup = true;
 }
 
 void arch_teardown_dma_ops(struct device *dev)
 {
+   if (!dev->archdata.dma_ops_setup)
+   return;
+
arm_teardown_iommu_dma_ops(dev);
 }
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
Code Aurora Forum, hosted by The Linux Foundation



[PATCH V4 1/4] iommu: of: Fix check for returning EPROBE_DEFER

2017-05-18 Thread Sricharan R
Now with IOMMU probe deferral, we return -EPROBE_DEFER
for masters that are connected to an IOMMU which is not
probed yet but is going to get probed, so that we can attach
the correct dma_ops. So while trying to defer the probe of
the master, check whether the of_iommu node that it is connected
to is marked in DT as 'status=disabled'; in that case the IOMMU is
never going to get probed, so simply return NULL and let the master
work without an IOMMU.

Fixes: 7b07cbefb68d ("iommu: of: Handle IOMMU lookup failure with deferred 
probing or error")
Signed-off-by: Sricharan R 
Reported-by: Geert Uytterhoeven 
Reviewed-by: Laurent Pinchart 
Tested-by: Will Deacon 
Tested-by: Magnus Damn 
Acked-by: Will Deacon 
---
 drivers/iommu/of_iommu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index 9f44ee8..e6e9bec 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -118,6 +118,7 @@ static bool of_iommu_driver_present(struct device_node *np)
 
ops = iommu_ops_from_fwnode(fwnode);
if ((ops && !ops->of_xlate) ||
+   !of_device_is_available(iommu_spec->np) ||
(!ops && !of_iommu_driver_present(iommu_spec->np)))
return NULL;
 
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
Code Aurora Forum, hosted by The Linux Foundation



[PATCH V4 2/4] iommu: of: Ignore all errors except EPROBE_DEFER

2017-05-18 Thread Sricharan R
While deferring the probe of IOMMU masters, xlate and
add_device callbacks called from of_iommu_configure
can pass back error values like -ENODEV, which means
the IOMMU cannot be connected with that master for real
reasons. Before the IOMMU probe deferral, all such errors
were ignored. Now all those errors are propagated back,
killing the master's probe for such errors. Instead ignore
all the errors except EPROBE_DEFER, which is the only one
of concern and let the master work without IOMMU, thus
restoring the old behavior.

Fixes: 7b07cbefb68d ("iommu: of: Handle IOMMU lookup failure with deferred 
probing or error")
Reported-by: Geert Uytterhoeven 
Tested-by: Magnus Damn 
Signed-off-by: Sricharan R 
---
[V4] Reworded commit log and changed dev_info to dev_dbg

 drivers/iommu/of_iommu.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index e6e9bec..19779b8 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -237,6 +237,12 @@ const struct iommu_ops *of_iommu_configure(struct device *dev,
ops = ERR_PTR(err);
}
 
+   /* Ignore all other errors apart from EPROBE_DEFER */
+   if (IS_ERR(ops) && (PTR_ERR(ops) != -EPROBE_DEFER)) {
+   dev_dbg(dev, "Adding to IOMMU failed: %ld\n", PTR_ERR(ops));
+   ops = NULL;
+   }
+
return ops;
 }
 
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
Code Aurora Forum, hosted by The Linux Foundation



Re: [PATCH V2 2/3] iommu: of: Ignore all errors except EPROBE_DEFER

2017-05-18 Thread Sricharan R
Hi Laurent,

On 5/18/2017 7:13 PM, Laurent Pinchart wrote:
> Hi Sricharan
> 
> On Thursday 18 May 2017 19:08:12 Sricharan R wrote:
>> On 5/18/2017 6:00 PM, Laurent Pinchart wrote:
>>> On Thursday 18 May 2017 17:26:14 Sricharan R wrote:
 On 5/18/2017 4:09 PM, Laurent Pinchart wrote:
> On Thursday 18 May 2017 15:37:09 Sricharan R wrote:
>> While deferring the probe of IOMMU masters,
>> xlate and add_device callback can pass back error values
>> like -ENODEV, which means IOMMU cannot be connected
>> with that master for real reasons. So rather than
>> killing the master's probe for such errors, just
>> ignore the errors and let the master work without
>> an IOMMU.
>
> I don't think this is a good idea. Why should we allow IOMMU drivers to
> return an error if we're always going to ignore the error value ? That
> will lead to drivers implementing slightly different behaviours, which
> will be painful the day we'll need to start acting based on the error
> value.

 The of_iommu_configure interface, before this series, was returning
 either correct 'iommu_ops' or NULL. Also there was no return value from
 of_dma_configure which calls of_iommu_configure. This means that if we
 block only -ENODEV now and let the other errors, the probe of the master
 devices can be killed for reasons apart from deferring. This would be a
 change in behavior introduced. All of xlate, add_device, of_pci_map_rid
 and others can return values apart from -ENODEV. So was thinking that
 restoring the old behavior, except for returning EPROBE_DEFER was the
 better thing to do ?
>>>
>>> We went from a situation where of_iommu_configure() could return either
>>> valid operations in the case the device was to be handled by the IOMMU or
>>> NULL otherwise, to a situation where we needed a third option for probe
>>> deferral. The way we've done this, through error pointers, allows lots of
>>> other errors to be returned as well from the of_xlate and add_device
>>> operations.
>>
>> right, this was difference in the behavior now.
>>
>>> There is currently no use for returning error codes other than
>>> -EPROBE_DEFFER from of_iommu_configure(), so your proposal is to block
>>> errors returned from the of_xlate and add_device operations inside
>>> of_iommu_configure(). My point is that, by doing so, we allow IOMMU
>>> drivers to return random error codes that are then ignored. I believe
>>> this can cause problems in the future when we will need to extend the API
>>> and standardize more error codes, as by then IOMMU drivers will return
>>> random errors (they actually do so already to some extent).
>>>
>>> For of_xlate I agree with you to some extent. v4.11 just checked whether
>>> of_xlate succeeded or not, and just didn't use the IOMMU in case it
>>> failed. The exact error value was ignored, and drivers already return
>>> random errors. Going back to the v4.11 behaviour is what we need to do in
>>> the short-term, even if I believe we should standardize the error values
>>> returned from of_xlate after v4.12.
>>>
>>> For add_device, however, the situation is a bit different. The add_device
>>> operation is called from the IOMMU bus notifier, and the -ENODEV error is
>>> ignored by add_iommu_group(). Any other error will cause bus_set_iommu()
>>> to fail, which makes IOMMU probing fail for the drivers that check the
>>> return value of bus_set_iommu() (some of them don't :-/).
>>>
>>> Fixing all this properly requires standardizing the error codes, and going
>>> through the existing IOMMU drivers to comply with the standardized
>>> behaviour.
>>
>> I understand your concern on standardizing the error codes from xlate,
>> add_device, others and handling them properly. As you said there are quite
>> some errors returned from them today. Also another thing is standardizing
>> the behavior of of_iommu_configure itself. So that API serves to connect a
>> device to its correct iommu_ops. When that's not possible, what should be
>> the output and how should that be handled by the caller. The current
>> behavior is to either 1) connect to correct ops or 2) wait for it or 3)
>> progress further with plain/default dma_ops. Anyways as you said
>> standardizing the iommu api ops, would make the of_iommu_configure handling
>> more specific. Having said that i think similar fix needs to be done for
>> acpi's iort_iommu_configure as well.
> 
> I'm less knowledgeable about ACPI but I think you're right. Would you like to 
> tackle this for v4.13 ? :-)

Will add the fix for ACPI now in this series. OK, will really see if I can
address the standardizing part for 4.13.

> 
>>> While this shouldn't be very difficult, it's likely not material for a
>>> v4.12- rc fix. We will thus likely need to merge this patch (or something
>>> very similar to it), but I'd really like to see this fixed properly for
>>> v4.13.
>>
>> When you say "merge this patch (or something similar)", is that about
>> documenting the error values for of_xlate and add_device that you showed
>> down below, or about the patch in discussion?

Re: [PATCH V2 2/3] iommu: of: Ignore all errors except EPROBE_DEFER

2017-05-18 Thread Laurent Pinchart
Hi Sricharan

On Thursday 18 May 2017 19:08:12 Sricharan R wrote:
> On 5/18/2017 6:00 PM, Laurent Pinchart wrote:
> > On Thursday 18 May 2017 17:26:14 Sricharan R wrote:
> >> On 5/18/2017 4:09 PM, Laurent Pinchart wrote:
> >>> On Thursday 18 May 2017 15:37:09 Sricharan R wrote:
>  While deferring the probe of IOMMU masters,
>  xlate and add_device callback can pass back error values
>  like -ENODEV, which means IOMMU cannot be connected
>  with that master for real reasons. So rather than
>  killing the master's probe for such errors, just
>  ignore the errors and let the master work without
>  an IOMMU.
> >>> 
> >>> I don't think this is a good idea. Why should we allow IOMMU drivers to
> >>> return an error if we're always going to ignore the error value ? That
> >>> will lead to drivers implementing slightly different behaviours, which
> >>> will be painful the day we'll need to start acting based on the error
> >>> value.
> >> 
> >> The of_iommu_configure interface, before this series, was returning
> >> either correct 'iommu_ops' or NULL. Also there was no return value from
> >> of_dma_configure which calls of_iommu_configure. This means that if we
> >> block only -ENODEV now and let the other errors, the probe of the master
> >> devices can be killed for reasons apart from deferring. This would be a
> >> change in behavior introduced. All of xlate, add_device, of_pci_map_rid
> >> and others can return values apart from -ENODEV. So was thinking that
> >> restoring the old behavior, except for returning EPROBE_DEFER was the
> >> better thing to do ?
> > 
> > We went from a situation where of_iommu_configure() could return either
> > valid operations in the case the device was to be handled by the IOMMU or
> > NULL otherwise, to a situation where we needed a third option for probe
> > deferral. The way we've done this, through error pointers, allows lots of
> > other errors to be returned as well from the of_xlate and add_device
> > operations.
>
> right, this was difference in the behavior now.
> 
> > There is currently no use for returning error codes other than
> > -EPROBE_DEFFER from of_iommu_configure(), so your proposal is to block
> > errors returned from the of_xlate and add_device operations inside
> > of_iommu_configure(). My point is that, by doing so, we allow IOMMU
> > drivers to return random error codes that are then ignored. I believe
> > this can cause problems in the future when we will need to extend the API
> > and standardize more error codes, as by then IOMMU drivers will return
> > random errors (they actually do so already to some extent).
> > 
> > For of_xlate I agree with you to some extent. v4.11 just checked whether
> > of_xlate succeeded or not, and just didn't use the IOMMU in case it
> > failed. The exact error value was ignored, and drivers already return
> > random errors. Going back to the v4.11 behaviour is what we need to do in
> > the short-term, even if I believe we should standardize the error values
> > returned from of_xlate after v4.12.
> > 
> > For add_device, however, the situation is a bit different. The add_device
> > operation is called from the IOMMU bus notifier, and the -ENODEV error is
> > ignored by add_iommu_group(). Any other error will cause bus_set_iommu()
> > to fail, which makes IOMMU probing fail for the drivers that check the
> > return value of bus_set_iommu() (some of them don't :-/).
> > 
> > Fixing all this properly requires standardizing the error codes, and going
> > through the existing IOMMU drivers to comply with the standardized
> > behaviour.
>
> I understand your concern on standardizing the error codes from xlate,
> add_device, others and handling them properly. As you said there are quite
> some errors returned from them today. Also another thing is standardizing
> the behavior of of_iommu_configure itself. So that API serves to connect a
> device to its correct iommu_ops. When that's not possible, what should be
> the output and how should that be handled by the caller. The current
> behavior is to either 1) connect to correct ops or 2) wait for it or 3)
> progress further with plain/default dma_ops. Anyways as you said
> standardizing the iommu api ops, would make the of_iommu_configure handling
> more specific. Having said that i think similar fix needs to be done for
> acpi's iort_iommu_configure as well.

I'm less knowledgeable about ACPI but I think you're right. Would you like to 
tackle this for v4.13 ? :-)

> > While this shouldn't be very difficult, it's likely not material for a
> > v4.12- rc fix. We will thus likely need to merge this patch (or something
> > very similar to it), but I'd really like to see this fixed properly for
> > v4.13.
>
> When you say "merge this patch (or something similar)", is that about
> documenting the error values for of_xlate and add_device that you showed
> down below  (or) about the patch in discussion ?

I meant the patch we're discussing, "[PATCH V2 2/3] iommu: of: Ignore all
errors except EPROBE_DEFER".

Re: [v6 2/3] iommu/arm-smmu-v3: Add workaround for Cavium ThunderX2 erratum #74

2017-05-18 Thread Geetha Akula
On Tue, May 16, 2017 at 5:45 AM, Rob Herring  wrote:
> DT changes should go to DT list.
>
> On Fri, May 12, 2017 at 7:41 AM, Geetha sowjanya
>  wrote:
>> From: Linu Cherian 
>>
>> Cavium ThunderX2 SMMU implementation doesn't support page 1 register space
>> and PAGE0_REGS_ONLY option is enabled as an errata workaround.
>> This option when turned on, replaces all page 1 offsets used for
>> EVTQ_PROD/CONS, PRIQ_PROD/CONS register access with page 0 offsets.
>>
>> SMMU resource size checks are now based on SMMU option PAGE0_REGS_ONLY,
>> since resource size can be either 64k/128k.
>> For this, arm_smmu_device_dt_probe/acpi_probe has been moved before
>> platform_get_resource call, so that SMMU options are set beforehand.
>>
>> Signed-off-by: Linu Cherian 
>> Signed-off-by: Geetha Sowjanya 
>> ---
>>  Documentation/arm64/silicon-errata.txt |  1 +
>>  .../devicetree/bindings/iommu/arm,smmu-v3.txt  |  6 ++
>>  drivers/iommu/arm-smmu-v3.c| 64 +-
>>  3 files changed, 56 insertions(+), 15 deletions(-)
>>
>> diff --git a/Documentation/arm64/silicon-errata.txt b/Documentation/arm64/silicon-errata.txt
>> index 10f2ddd..4693a32 100644
>> --- a/Documentation/arm64/silicon-errata.txt
>> +++ b/Documentation/arm64/silicon-errata.txt
>> @@ -62,6 +62,7 @@ stable kernels.
>>  | Cavium | ThunderX GICv3  | #23154  | CAVIUM_ERRATUM_23154 |
>>  | Cavium | ThunderX Core   | #27456  | CAVIUM_ERRATUM_27456 |
>>  | Cavium | ThunderX SMMUv2 | #27704  | N/A |
>> +| Cavium | ThunderX2 SMMUv3| #74 | N/A |
>>  || | | |
>>  | Freescale/NXP  | LS2080A/LS1043A | A-008585| FSL_ERRATUM_A008585 |
>>  || | | |
>> diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt b/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
>> index be57550..e6da62b 100644
>> --- a/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
>> +++ b/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
>> @@ -49,6 +49,12 @@ the PCIe specification.
>>  - hisilicon,broken-prefetch-cmd
>>  : Avoid sending CMD_PREFETCH_* commands to the SMMU.
>>
>> +- cavium-cn99xx,broken-page1-regspace
>
> "cavium-cn99xx" is not a vendor.
>
> I'm sure you have an SoC specific compatible string, so use that to
> enable any errata work-arounds.
>
> Rob

Hi Rob,

The "cavium-cn99xx" indeed vendor specific. "cavium" is the vendor and "cn99xx"
is the chip model number. We can't use just the vendor name because,
in future their
might be other model chips from the same vendor.


Thank you,
Geetha.
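For illustration, Rob's suggestion would look roughly like this, keyed off
an SoC-specific compatible rather than a new property. The compatible
string is made up here, and ARM_SMMU_OPT_PAGE0_REGS_ONLY stands for
whatever option flag the series defines for the workaround:

static void arm_smmu_apply_soc_errata(struct arm_smmu_device *smmu)
{
	/* Enable the page-0-only register layout for this SoC's SMMU */
	if (of_device_is_compatible(smmu->dev->of_node,
				    "cavium,cn9900-smmu-v3"))
		smmu->options |= ARM_SMMU_OPT_PAGE0_REGS_ONLY;
}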


Re: [PATCH V2 2/3] iommu: of: Ignore all errors except EPROBE_DEFER

2017-05-18 Thread Sricharan R
Hi Laurent,

On 5/18/2017 6:00 PM, Laurent Pinchart wrote:
> Hi Sricharan,
> 
> On Thursday 18 May 2017 17:26:14 Sricharan R wrote:
>> On 5/18/2017 4:09 PM, Laurent Pinchart wrote:
>>> On Thursday 18 May 2017 15:37:09 Sricharan R wrote:
 While deferring the probe of IOMMU masters,
 xlate and add_device callback can pass back error values
 like -ENODEV, which means IOMMU cannot be connected
 with that master for real reasons. So rather than
 killing the master's probe for such errors, just
 ignore the errors and let the master work without
 an IOMMU.
>>>
>>> I don't think this is a good idea. Why should we allow IOMMU drivers to
>>> return an error if we're always going to ignore the error value ? That
>>> will lead to drivers implementing slightly different behaviours, which
>>> will be painful the day we'll need to start acting based on the error
>>> value.
>>
>> The of_iommu_configure interface, before this series, was returning either
>> correct 'iommu_ops' or NULL. Also there was no return value from
>> of_dma_configure which calls of_iommu_configure. This means that if we block
>> only -ENODEV now and let the other errors, the probe of the master devices
>> can be killed for reasons apart from deferring. This would be a change in
>> behavior introduced. All of xlate, add_device, of_pci_map_rid and others
>> can return values apart from -ENODEV. So was thinking that restoring the
>> old behavior, except for returning EPROBE_DEFER was the better thing to do
>> ?
> 
> We went from a situation where of_iommu_configure() could return either valid 
> operations in the case the device was to be handled by the IOMMU or NULL 
> otherwise, to a situation where we needed a third option for probe deferral. 
> The way we've done this, through error pointers, allows lots of other errors 
> to be returned as well from the of_xlate and add_device operations.
> 

right, this was the difference in behavior now.

> There is currently no use for returning error codes other than -EPROBE_DEFFER 
> from of_iommu_configure(), so your proposal is to block errors returned from 
> the of_xlate and add_device operations inside of_iommu_configure(). My point 
> is that, by doing so, we allow IOMMU drivers to return random error codes 
> that 
> are then ignored. I believe this can cause problems in the future when we 
> will 
> need to extend the API and standardize more error codes, as by then IOMMU 
> drivers will return random errors (they actually do so already to some 
> extent).
> 
> For of_xlate I agree with you to some extent. v4.11 just checked whether 
> of_xlate succeeded or not, and just didn't use the IOMMU in case it failed. 
> The exact error value was ignored, and drivers already return random errors. 
> Going back to the v4.11 behaviour is what we need to do in the short-term, 
> even if I believe we should standardize the error values returned from 
> of_xlate after v4.12.
> 
> For add_device, however, the situation is a bit different. The add_device  
> operation is called from the IOMMU bus notifier, and the -ENODEV error is 
> ignored by add_iommu_group(). Any other error will cause bus_set_iommu() to 
> fail, which makes IOMMU probing fail for the drivers that check the return 
> value of bus_set_iommu() (some of them don't :-/).
> 
> Fixing all this properly requires standardizing the error codes, and going 
> through the existing IOMMU drivers to comply with the standardized behaviour. 

I understand your concern on standardizing the error codes from xlate,
add_device, others and handling them properly. As you said there are quite some
errors returned from them today. Also another thing is standardizing the
behavior of of_iommu_configure itself. So that API serves to connect a device
to its correct iommu_ops. When that's not possible, what should be the output
and how should that be handled by the caller. The current behavior is to
either 1) connect to correct ops or 2) wait for it or 3) progress further
with plain/default dma_ops. Anyway, as you said, standardizing the iommu API
ops would make the of_iommu_configure handling more specific. Having said that,
I think a similar fix needs to be done for ACPI's iort_iommu_configure as
well.

> While this shouldn't be very difficult, it's likely not material for a v4.12-
> rc fix. We will thus likely need to merge this patch (or something very 
> similar to it), but I'd really like to see this fixed properly for v4.13.
> 

When you say "merge this patch (or something similar)", is that about
documenting the error values for of_xlate and add_device that you showed
down below  (or) about the patch in discussion ?

Regards,
 Sricharan

>>> At the very least, if you want to give a specific meaning to -ENODEV, you
>>> should check for that value specifically and not ignore all errors other
>>> than -EPROBE_DEFER. You also need to document the meaning of the error
>>> value. This can be done in the documentation of the of_xlate operation in

Re: [PATCH v5 29/32] x86/mm: Add support to encrypt the kernel in-place

2017-05-18 Thread Borislav Petkov
On Tue, Apr 18, 2017 at 04:21:49PM -0500, Tom Lendacky wrote:
> Add the support to encrypt the kernel in-place. This is done by creating
> new page mappings for the kernel - a decrypted write-protected mapping
> and an encrypted mapping. The kernel is encrypted by copying it through
> a temporary buffer.
> 
> Signed-off-by: Tom Lendacky 
> ---
>  arch/x86/include/asm/mem_encrypt.h |6 +
>  arch/x86/mm/Makefile   |2 
>  arch/x86/mm/mem_encrypt.c  |  262 
>  arch/x86/mm/mem_encrypt_boot.S |  151 +
>  4 files changed, 421 insertions(+)
>  create mode 100644 arch/x86/mm/mem_encrypt_boot.S
> 
> diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
> index b406df2..8f6f9b4 100644
> --- a/arch/x86/include/asm/mem_encrypt.h
> +++ b/arch/x86/include/asm/mem_encrypt.h
> @@ -31,6 +31,12 @@ static inline u64 sme_dma_mask(void)
>   return ((u64)sme_me_mask << 1) - 1;
>  }
>  
> +void sme_encrypt_execute(unsigned long encrypted_kernel_vaddr,
> +  unsigned long decrypted_kernel_vaddr,
> +  unsigned long kernel_len,
> +  unsigned long encryption_wa,
> +  unsigned long encryption_pgd);
> +
>  void __init sme_early_encrypt(resource_size_t paddr,
> unsigned long size);
>  void __init sme_early_decrypt(resource_size_t paddr,
> diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
> index 9e13841..0633142 100644
> --- a/arch/x86/mm/Makefile
> +++ b/arch/x86/mm/Makefile
> @@ -38,3 +38,5 @@ obj-$(CONFIG_NUMA_EMU)  += numa_emulation.o
>  obj-$(CONFIG_X86_INTEL_MPX)  += mpx.o
>  obj-$(CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) += pkeys.o
>  obj-$(CONFIG_RANDOMIZE_MEMORY) += kaslr.o
> +
> +obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt_boot.o
> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> index 30b07a3..0ff41a4 100644
> --- a/arch/x86/mm/mem_encrypt.c
> +++ b/arch/x86/mm/mem_encrypt.c
> @@ -24,6 +24,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  /*
>   * Since SME related variables are set early in the boot process they must
> @@ -216,8 +217,269 @@ void swiotlb_set_mem_attributes(void *vaddr, unsigned long size)
>   set_memory_decrypted((unsigned long)vaddr, size >> PAGE_SHIFT);
>  }
>  
> +void __init sme_clear_pgd(pgd_t *pgd_base, unsigned long start,

static

> +   unsigned long end)
> +{
> + unsigned long addr = start;
> + pgdval_t *pgd_p;
> +
> + while (addr < end) {
> + unsigned long pgd_end;
> +
> + pgd_end = (addr & PGDIR_MASK) + PGDIR_SIZE;
> + if (pgd_end > end)
> + pgd_end = end;
> +
> + pgd_p = (pgdval_t *)pgd_base + pgd_index(addr);
> + *pgd_p = 0;

Hmm, so this is a contiguous range from [start:end] which translates to
8-byte PGD pointers in the PGD page so you can simply memset that range,
no?

Instead of iterating over each one?
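A minimal sketch of that memset variant (same semantics as the loop above,
on the assumption that the PGD slots covering a contiguous VA range are
themselves contiguous):

static void __init sme_clear_pgd(pgd_t *pgd_base, unsigned long start,
				 unsigned long end)
{
	pgd_t *first = pgd_base + pgd_index(start);
	pgd_t *last  = pgd_base + pgd_index(end - 1);

	/* Zero the whole run of PGD entries in one go */
	memset(first, 0, (last - first + 1) * sizeof(*first));
}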

> +
> + addr = pgd_end;
> + }
> +}
> +
> +#define PGD_FLAGS	_KERNPG_TABLE_NOENC
> +#define PUD_FLAGS	_KERNPG_TABLE_NOENC
> +#define PMD_FLAGS	(__PAGE_KERNEL_LARGE_EXEC & ~_PAGE_GLOBAL)
> +
> +static void __init *sme_populate_pgd(pgd_t *pgd_base, void *pgtable_area,
> +  unsigned long vaddr, pmdval_t pmd_val)
> +{
> + pgdval_t pgd, *pgd_p;
> + pudval_t pud, *pud_p;
> + pmdval_t pmd, *pmd_p;

You should use the enclosing type, not the underlying one. I.e.,

pgd_t *pgd;
pud_t *pud;
...

and then the macros native_p*d_val(), p*d_offset() and so on. I say
native_* because we don't want to have any paravirt nastiness here.
I believe your previous version was using the proper interfaces.

And the kernel has gotten 5-level pagetables support in
the meantime, so this'll need to start at p4d AFAICT.
arch/x86/mm/fault.c::dump_pagetable() looks like a good example to stare
at.
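As an illustration of that style, the start of the function might read as
follows (a fragment only, keeping the identity-mapped early-boot assumption
of the original, and with the new p4d level still elided):

	pgd_t *pgd = pgd_base + pgd_index(vaddr);
	pud_t *pud_p;

	if (native_pgd_val(*pgd)) {
		pud_p = (pud_t *)(native_pgd_val(*pgd) & ~PTE_FLAGS_MASK);
	} else {
		pud_p = pgtable_area;
		memset(pud_p, 0, sizeof(*pud_p) * PTRS_PER_PUD);
		pgtable_area += sizeof(*pud_p) * PTRS_PER_PUD;

		native_set_pgd(pgd,
			       native_make_pgd((pgdval_t)pud_p + PGD_FLAGS));
	}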

> + pgd_p = (pgdval_t *)pgd_base + pgd_index(vaddr);
> + pgd = *pgd_p;
> + if (pgd) {
> + pud_p = (pudval_t *)(pgd & ~PTE_FLAGS_MASK);
> + } else {
> + pud_p = pgtable_area;
> + memset(pud_p, 0, sizeof(*pud_p) * PTRS_PER_PUD);
> + pgtable_area += sizeof(*pud_p) * PTRS_PER_PUD;
> +
> + *pgd_p = (pgdval_t)pud_p + PGD_FLAGS;
> + }
> +
> + pud_p += pud_index(vaddr);
> + pud = *pud_p;
> + if (pud) {
> + if (pud & _PAGE_PSE)
> + goto out;
> +
> + pmd_p = (pmdval_t *)(pud & ~PTE_FLAGS_MASK);
> + } else {
> + pmd_p = pgtable_area;
> + memset(pmd_p, 0, sizeof(*pmd_p) * PTRS_PER_PMD);
> + pgtable_area += sizeof(*pmd_p) * PTRS_PER_PMD;
> +
> + *pud_p = (pudval_t)pmd_p + PUD_FLAGS;
> + }
> +
> + pmd_p += pmd_index(vaddr);
> + 

Re: [PATCH V2 2/3] iommu: of: Ignore all errors except EPROBE_DEFER

2017-05-18 Thread Laurent Pinchart
Hi Sricharan,

On Thursday 18 May 2017 17:26:14 Sricharan R wrote:
> On 5/18/2017 4:09 PM, Laurent Pinchart wrote:
> > On Thursday 18 May 2017 15:37:09 Sricharan R wrote:
> >> While deferring the probe of IOMMU masters,
> >> xlate and add_device callback can pass back error values
> >> like -ENODEV, which means IOMMU cannot be connected
> >> with that master for real reasons. So rather than
> >> killing the master's probe for such errors, just
> >> ignore the errors and let the master work without
> >> an IOMMU.
> > 
> > I don't think this is a good idea. Why should we allow IOMMU drivers to
> > return an error if we're always going to ignore the error value ? That
> > will lead to drivers implementing slightly different behaviours, which
> > will be painful the day we'll need to start acting based on the error
> > value.
> 
> The of_iommu_configure interface, before this series, was returning either
> correct 'iommu_ops' or NULL. Also there was no return value from
> of_dma_configure which calls of_iommu_configure. This means that if we block
> only -ENODEV now and let the other errors, the probe of the master devices
> can be killed for reasons apart from deferring. This would be a change in
> behavior introduced. All of xlate, add_device, of_pci_map_rid and others
> can return values apart from -ENODEV. So was thinking that restoring the
> old behavior, except for returning EPROBE_DEFER was the better thing to do
> ?

We went from a situation where of_iommu_configure() could return either valid 
operations in the case the device was to be handled by the IOMMU or NULL 
otherwise, to a situation where we needed a third option for probe deferral. 
The way we've done this, through error pointers, allows lots of other errors 
to be returned as well from the of_xlate and add_device operations.

There is currently no use for returning error codes other than -EPROBE_DEFER 
from of_iommu_configure(), so your proposal is to block errors returned from 
the of_xlate and add_device operations inside of_iommu_configure(). My point 
is that, by doing so, we allow IOMMU drivers to return random error codes that 
are then ignored. I believe this can cause problems in the future when we will 
need to extend the API and standardize more error codes, as by then IOMMU 
drivers will return random errors (they actually do so already to some 
extent).

For of_xlate I agree with you to some extent. v4.11 just checked whether 
of_xlate succeeded or not, and just didn't use the IOMMU in case it failed. 
The exact error value was ignored, and drivers already return random errors. 
Going back to the v4.11 behaviour is what we need to do in the short-term, 
even if I believe we should standardize the error values returned from 
of_xlate after v4.12.

For add_device, however, the situation is a bit different. The add_device 
operation is called from the IOMMU bus notifier, and the -ENODEV error is 
ignored by add_iommu_group(). Any other error will cause bus_set_iommu() to 
fail, which makes IOMMU probing fail for the drivers that check the return 
value of bus_set_iommu() (some of them don't :-/).

Fixing all this properly requires standardizing the error codes, and going 
through the existing IOMMU drivers to comply with the standardized behaviour. 
While this shouldn't be very difficult, it's likely not material for a v4.12-
rc fix. We will thus likely need to merge this patch (or something very 
similar to it), but I'd really like to see this fixed properly for v4.13.

> > At the very least, if you want to give a specific meaning to -ENODEV, you
> > should check for that value specifically and not ignore all errors other
> > than -EPROBE_DEFER. You also need to document the meaning of the error
> > value. This can be done in the documentation of the of_xlate operation in
> > include/linux/iommu.h:
> > 
> > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > index 2cb54adc4a33..6ba553e7384a 100644
> > --- a/include/linux/iommu.h
> > +++ b/include/linux/iommu.h
> > @@ -181,7 +181,6 @@ struct iommu_resv_region {
> >   * @domain_window_disable: Disable a particular window for a domain
> >   * @domain_set_windows: Set the number of windows for a domain
> >   * @domain_get_windows: Return the number of windows for a domain
> > - * @of_xlate: add OF master IDs to iommu grouping
> >   * @pgsize_bitmap: bitmap of all possible supported page sizes
> >   */
> >  
> >  struct iommu_ops {
> > @@ -224,6 +223,11 @@ struct iommu_ops {
> > /* Get the number of windows per domain */
> > u32 (*domain_get_windows)(struct iommu_domain *domain);
> > 
> > +   /**
> > +* @of_xlate:
> > +*
> > +* Add OF master IDs to iommu grouping.
> > +*/
> > int (*of_xlate)(struct device *dev, struct of_phandle_args *args);
> > 
> > unsigned long pgsize_bitmap;
> > 
> > And add documentation for the error codes there.
> > 
> > If you want to ignore some errors returned from the add_device operation
> > you should 

Re: [PATCH V2 2/3] iommu: of: Ignore all errors except EPROBE_DEFER

2017-05-18 Thread Sricharan R
Hi Laurent,

On 5/18/2017 4:09 PM, Laurent Pinchart wrote:
> Hi Sricharan,
> 
> Thank you for the patch.
> 
> On Thursday 18 May 2017 15:37:09 Sricharan R wrote:
>> While deferring the probe of IOMMU masters,
>> xlate and add_device callback can pass back error values
>> like -ENODEV, which means IOMMU cannot be connected
>> with that master for real reasons. So rather than
>> killing the master's probe for such errors, just
>> ignore the errors and let the master work without
>> an IOMMU.
> 
> I don't think this is a good idea. Why should we allow IOMMU drivers to 
> return 
> an error if we're always going to ignore the error value ? That will lead to 
> drivers implementing slightly different behaviours, which will be painful the 
> day we'll need to start acting based on the error value.
> 

The of_iommu_configure interface, before this series, was returning either
correct 'iommu_ops' or NULL. Also there was no return value from
of_dma_configure which calls of_iommu_configure. This means that if we block
only -ENODEV now and let the other errors, the probe of the master devices
can be killed for reasons apart from deferring. This would be a change in
behavior introduced. All of xlate, add_device, of_pci_map_rid and others
can return values apart from -ENODEV. So was thinking that restoring the
old behavior, except for returning EPROBE_DEFER was the better thing to do ? 

Regards,
 Sricharan

> At the very least, if you want to give a specific meaning to -ENODEV, you 
> should check for that value specifically and not ignore all errors other than 
> -EPROBE_DEFER. You also need to document the meaning of the error value. This 
> can be done in the documentation of the of_xlate operation in 
> include/linux/iommu.h:
> 
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 2cb54adc4a33..6ba553e7384a 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -181,7 +181,6 @@ struct iommu_resv_region {
>   * @domain_window_disable: Disable a particular window for a domain
>   * @domain_set_windows: Set the number of windows for a domain
>   * @domain_get_windows: Return the number of windows for a domain
> - * @of_xlate: add OF master IDs to iommu grouping
>   * @pgsize_bitmap: bitmap of all possible supported page sizes
>   */
>  struct iommu_ops {
> @@ -224,6 +223,11 @@ struct iommu_ops {
>   /* Get the number of windows per domain */
>   u32 (*domain_get_windows)(struct iommu_domain *domain);
>  
> + /**
> +  * @of_xlate:
> +  *
> +  * Add OF master IDs to iommu grouping.
> +  */
>   int (*of_xlate)(struct device *dev, struct of_phandle_args *args);
>  
>   unsigned long pgsize_bitmap;
> 
> 
> And add documentation for the error codes there.
> 
> If you want to ignore some errors returned from the add_device operation you 
> should document it similarly, and in particular document which error check(s) 
> need to be performed by of_xlate and which are the responsibility of 
> add_device.
> 
>> Fixes: 7b07cbefb68d ("iommu: of: Handle IOMMU lookup failure with deferred
>> probing or error")
>> Reported-by: Geert Uytterhoeven 
>> Tested-by: Magnus Damm 
>> Signed-off-by: Sricharan R 
>> ---
>> [V2] Corrected spelling/case in commit log
>>
>>  drivers/iommu/of_iommu.c | 6 ++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
>> index e6e9bec..f0d22c0 100644
>> --- a/drivers/iommu/of_iommu.c
>> +++ b/drivers/iommu/of_iommu.c
>> @@ -237,6 +237,12 @@ const struct iommu_ops *of_iommu_configure(struct device *dev,
>>  		ops = ERR_PTR(err);
>>  	}
>>
>> +/* Ignore all other errors apart from EPROBE_DEFER */
>> +if (IS_ERR(ops) && (PTR_ERR(ops) != -EPROBE_DEFER)) {
>> +dev_info(dev, "Adding to IOMMU failed: %ld\n", PTR_ERR(ops));
>> +ops = NULL;
>> +}
>> +
>>  return ops;
>>  }
> 

-- 
"QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
Code Aurora Forum, hosted by The Linux Foundation


Re: [Qemu-devel] [RFC PATCH 5/8] VFIO: Add new IOCTL for PASID Table bind propagation

2017-05-18 Thread Jean-Philippe Brucker
On 17/05/17 11:27, Liu, Yi L wrote:
> On Fri, May 12, 2017 at 03:58:51PM -0600, Alex Williamson wrote:
>> On Wed, 26 Apr 2017 18:12:02 +0800
>> "Liu, Yi L"  wrote:
>>>  
>>> +/* IOCTL for Shared Virtual Memory Bind */
>>> +struct vfio_device_svm {
>>> +   __u32   argsz;
>>> +#define VFIO_SVM_BIND_PASIDTBL (1 << 0) /* Bind PASID Table */
>>> +#define VFIO_SVM_BIND_PASID	(1 << 1) /* Bind PASID from userspace driver */
>>> +#define VFIO_SVM_BIND_PGTABLE  (1 << 2) /* Bind guest mmu page table */
>>> +   __u32   flags;
>>> +   __u32   length;
>>> +   __u8data[];
>>
>> In the case of VFIO_SVM_BIND_PASIDTBL this is clearly struct
>> pasid_table_info?  So at a minimum this is a union including struct
>> pasid_table_info.  Furthermore how does a user learn what the opaque
>> data in struct pasid_table_info is without looking at the code?  A user
>> API needs to be clear and documented, not opaque and variable.  We
>> should also have references to the hardware spec for an Intel or ARM
>> PASID table in uapi.  flags should be defined as they're used, let's
>> not reserve them with the expectation of future use.
>>
> 
> Agree, would add description accordingly. For the flags, I would remove
> the last two as I wouldn't use them. I think Jean would add them in his/her
> patchset. Anyhow, one of us needs to merge the flags.

Yes, I can add the VFIO_SVM_BIND_PASID (or rather _TASK) flag as (1 << 1)
in my series if it helps the merge. The PGTABLE flag is for another series
which I don't plan to send out anytime soon, since there already is enough
pending work on this.

Thanks,
Jean




Re: [PATCH V2 2/3] iommu: of: Ignore all errors except EPROBE_DEFER

2017-05-18 Thread Laurent Pinchart
Hi Sricharan,

Thank you for the patch.

On Thursday 18 May 2017 15:37:09 Sricharan R wrote:
> While deferring the probe of IOMMU masters, the xlate and
> add_device callbacks can pass back error values like -ENODEV,
> which means the IOMMU genuinely cannot be connected with that
> master. So rather than killing the master's probe for such
> errors, just ignore the errors and let the master work without
> an IOMMU.

I don't think this is a good idea. Why should we allow IOMMU drivers to return 
an error if we're always going to ignore the error value? That will lead to 
drivers implementing slightly different behaviours, which will be painful the 
day we'll need to start acting based on the error value.

At the very least, if you want to give a specific meaning to -ENODEV, you 
should check for that value specifically and not ignore all errors other than 
-EPROBE_DEFER. You also need to document the meaning of the error value. This 
can be done in the documentation of the of_xlate operation in 
include/linux/iommu.h:

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 2cb54adc4a33..6ba553e7384a 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -181,7 +181,6 @@ struct iommu_resv_region {
  * @domain_window_disable: Disable a particular window for a domain
  * @domain_set_windows: Set the number of windows for a domain
  * @domain_get_windows: Return the number of windows for a domain
- * @of_xlate: add OF master IDs to iommu grouping
  * @pgsize_bitmap: bitmap of all possible supported page sizes
  */
 struct iommu_ops {
@@ -224,6 +223,11 @@ struct iommu_ops {
/* Get the number of windows per domain */
u32 (*domain_get_windows)(struct iommu_domain *domain);
 
+   /**
+* @of_xlate:
+*
+* Add OF master IDs to iommu grouping.
+*/
int (*of_xlate)(struct device *dev, struct of_phandle_args *args);
 
unsigned long pgsize_bitmap;


And add documentation for the error codes there.

If you want to ignore some errors returned from the add_device operation you 
should document it similarly, and in particular document which error check(s) 
need to be performed by of_xlate and which are the responsibility of 
add_device.
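
As a rough illustration, such documentation could look like the following;
the error-code semantics here are hypothetical and only meant to show the
shape:

	/**
	 * @of_xlate:
	 *
	 * Add OF master IDs to iommu grouping.
	 *
	 * Return: 0 on success, -ENODEV if this IOMMU cannot translate for
	 * the master (the caller may then fall back to no IOMMU), or
	 * -EPROBE_DEFER if the IOMMU driver is not probed yet.
	 */
	int (*of_xlate)(struct device *dev, struct of_phandle_args *args);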

> Fixes: 7b07cbefb68d ("iommu: of: Handle IOMMU lookup failure with deferred
> probing or error")
> Reported-by: Geert Uytterhoeven 
> Tested-by: Magnus Damm 
> Signed-off-by: Sricharan R 
> ---
> [V2] Corrected spelling/case in commit log
> 
>  drivers/iommu/of_iommu.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
> index e6e9bec..f0d22c0 100644
> --- a/drivers/iommu/of_iommu.c
> +++ b/drivers/iommu/of_iommu.c
> @@ -237,6 +237,12 @@ const struct iommu_ops *of_iommu_configure(struct device *dev,
> 		ops = ERR_PTR(err);
> 	}
> 
> + /* Ignore all other errors apart from EPROBE_DEFER */
> + if (IS_ERR(ops) && (PTR_ERR(ops) != -EPROBE_DEFER)) {
> + dev_info(dev, "Adding to IOMMU failed: %ld\n", PTR_ERR(ops));
> + ops = NULL;
> + }
> +
>   return ops;
>  }

-- 
Regards,

Laurent Pinchart



Re: [PATCH V2 1/3] iommu: of: Fix check for returning EPROBE_DEFER

2017-05-18 Thread Laurent Pinchart
Hi Sricharan,

Thank you for the patch.

On Thursday 18 May 2017 15:37:08 Sricharan R wrote:
> Now with IOMMU probe deferral, we return -EPROBE_DEFER
> for masters that are connected to an IOMMU which is not
> probed yet but is going to get probed, so that we can attach
> the correct dma_ops. So while trying to defer the probe of
> the master, check if the of_iommu node that it is connected
> to is marked in DT as 'status=disabled'; in that case the
> IOMMU is never going to get probed, so simply return NULL
> and let the master work without an IOMMU.
> 
> Fixes: 7b07cbefb68d ("iommu: of: Handle IOMMU lookup failure with deferred
> probing or error") Signed-off-by: Sricharan R 
> Reported-by: Geert Uytterhoeven 
> Tested-by: Will Deacon 
> Tested-by: Magnus Damm 
> Acked-by: Will Deacon 

Reviewed-by: Laurent Pinchart 

> ---
> [V2] Corrected spelling/case in commit log
> 
>  drivers/iommu/of_iommu.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
> index 9f44ee8..e6e9bec 100644
> --- a/drivers/iommu/of_iommu.c
> +++ b/drivers/iommu/of_iommu.c
> @@ -118,6 +118,7 @@ static bool of_iommu_driver_present(struct device_node *np)
> 
>   ops = iommu_ops_from_fwnode(fwnode);
>   if ((ops && !ops->of_xlate) ||
> + !of_device_is_available(iommu_spec->np) ||
>   (!ops && !of_iommu_driver_present(iommu_spec->np)))
>   return NULL;

-- 
Regards,

Laurent Pinchart



Re: [Qemu-devel] [RFC PATCH 5/8] VFIO: Add new IOCTL for PASID Table bind propagation

2017-05-18 Thread Liu, Yi L
On Fri, May 12, 2017 at 03:58:51PM -0600, Alex Williamson wrote:
> On Wed, 26 Apr 2017 18:12:02 +0800
> "Liu, Yi L"  wrote:
> 
> > From: "Liu, Yi L" 
> > 
> > This patch adds VFIO_IOMMU_SVM_BIND_TASK for potential PASID table
> > binding requests.
> > 
> > On VT-d, this IOCTL cmd would be used to link the guest PASID page table
> > to host. While for other vendors, it may also be used to support other
> > kind of SVM bind request. Previously, there was a discussion about it with
> > an ARM engineer; it can be found via the link below. This IOCTL cmd may
> > support an SVM PASID bind request from a userspace driver, or a page
> > table (cr3) bind request from a guest. These SVM bind requests would be
> > supported by adding different flags. e.g. VFIO_SVM_BIND_PASID is added
> > to support PASID bind from a userspace driver, VFIO_SVM_BIND_PGTABLE is
> > added to support page table bind from a guest.
> > 
> > https://patchwork.kernel.org/patch/9594231/
> > 
> > Signed-off-by: Liu, Yi L 
> > ---
> >  include/uapi/linux/vfio.h | 17 +
> >  1 file changed, 17 insertions(+)
> > 
> > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > index 519eff3..6b97987 100644
> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -547,6 +547,23 @@ struct vfio_iommu_type1_dma_unmap {
> >  #define VFIO_IOMMU_ENABLE  _IO(VFIO_TYPE, VFIO_BASE + 15)
> >  #define VFIO_IOMMU_DISABLE _IO(VFIO_TYPE, VFIO_BASE + 16)
> >  
> > +/* IOCTL for Shared Virtual Memory Bind */
> > +struct vfio_device_svm {
> > +   __u32   argsz;
> > +#define VFIO_SVM_BIND_PASIDTBL (1 << 0) /* Bind PASID Table */
> > +#define VFIO_SVM_BIND_PASID	(1 << 1) /* Bind PASID from userspace driver */
> > +#define VFIO_SVM_BIND_PGTABLE  (1 << 2) /* Bind guest mmu page table */
> > +   __u32   flags;
> > +   __u32   length;
> > +   __u8data[];
> 
> In the case of VFIO_SVM_BIND_PASIDTBL this is clearly struct
> pasid_table_info?  So at a minimum this is a union including struct
> pasid_table_info.  Furthermore how does a user learn what the opaque
> data in struct pasid_table_info is without looking at the code?  A user
> API needs to be clear and documented, not opaque and variable.  We
> should also have references to the hardware spec for an Intel or ARM
> PASID table in uapi.  flags should be defined as they're used, let's
> not reserve them with the expectation of future use.
> 

Agree, would add description accordingly. For the flags, I would remove
the last two as I wouldn't use them. I think Jean would add them in his/her
patchset. Anyhow, one of us needs to merge the flags.
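
For the record, a hypothetical sketch of the union layout Alex asks for
(field and type names are illustrative, not a settled uapi):

struct vfio_device_svm {
	__u32	argsz;
	__u32	flags;
#define VFIO_SVM_BIND_PASIDTBL	(1 << 0)	/* Bind PASID Table */
	union {
		struct pasid_table_info pasid_table;
		/* further bind types would extend this union */
	} data;
};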

Thanks,
Yi L

> > +};
> > +
> > +#define VFIO_SVM_TYPE_MASK (VFIO_SVM_BIND_PASIDTBL | \
> > +   VFIO_SVM_BIND_PASID | \
> > +   VFIO_SVM_BIND_PGTABLE)
> > +
> > +#define VFIO_IOMMU_SVM_BIND_TASK   _IO(VFIO_TYPE, VFIO_BASE + 22)
> > +
> >  /*  Additional API for SPAPR TCE (Server POWERPC) IOMMU  */
> >  
> >  /*
> 
> 


Re: [RFC PATCH 4/8] iommu/vt-d: Add iommu do invalidate function

2017-05-18 Thread Liu, Yi L
On Fri, May 12, 2017 at 03:59:18PM -0600, Alex Williamson wrote:
> On Wed, 26 Apr 2017 18:12:01 +0800
> "Liu, Yi L"  wrote:
> 
> > From: Jacob Pan 
> > 
> > This patch adds Intel VT-d specific function to implement
> > iommu_do_invalidate API.
> > 
> > The use case is for supporting caching structure invalidation
> > of assigned SVM capable devices. Emulated IOMMU exposes queue
> > invalidation capability and passes down all descriptors from the guest
> > to the physical IOMMU.
> > 
> > The assumption is that the guest to host device ID mapping should be
> > resolved prior to calling the IOMMU driver. Based on the device handle,
> > the host IOMMU driver can replace certain fields before submitting to
> > the invalidation queue.
> > 
> > Signed-off-by: Liu, Yi L 
> > Signed-off-by: Jacob Pan 
> > ---
> >  drivers/iommu/intel-iommu.c | 43 
> > +++
> >  include/linux/intel-iommu.h | 11 +++
> >  2 files changed, 54 insertions(+)
> > 
> > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> > index 6d5b939..0b098ad 100644
> > --- a/drivers/iommu/intel-iommu.c
> > +++ b/drivers/iommu/intel-iommu.c
> > @@ -5042,6 +5042,48 @@ static void intel_iommu_detach_device(struct iommu_domain *domain,
> > 	dmar_remove_one_dev_info(to_dmar_domain(domain), dev);
> >  }
> >  
> > +static int intel_iommu_do_invalidate(struct iommu_domain *domain,
> > +   struct device *dev, struct tlb_invalidate_info *inv_info)
> > +{
> > +   int ret = 0;
> > +   struct intel_iommu *iommu;
> > +   struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> > +   struct intel_invalidate_data *inv_data;
> > +   struct qi_desc *qi;
> > +   u16 did;
> > +   u8 bus, devfn;
> > +
> > +   if (!inv_info || !dmar_domain || (inv_info->model != INTEL_IOMMU))
> > +   return -EINVAL;
> > +
> > +   iommu = device_to_iommu(dev, &bus, &devfn);
> > +   if (!iommu)
> > +   return -ENODEV;
> > +
> > +   inv_data = (struct intel_invalidate_data *)&inv_info->opaque;
> > +
> > +   /* check SID */
> > +   if (PCI_DEVID(bus, devfn) != inv_data->sid)
> > +   return 0;
> > +
> > +   qi = &inv_data->inv_desc;
> > +
> > +   switch (qi->low & QI_TYPE_MASK) {
> > +   case QI_DIOTLB_TYPE:
> > +   case QI_DEIOTLB_TYPE:
> > +   /* for device IOTLB, we just let it pass through */
> > +   break;
> > +   default:
> > +   did = dmar_domain->iommu_did[iommu->seq_id];
> > +   set_mask_bits(&qi->low, QI_DID_MASK, QI_DID(did));
> > +   break;
> > +   }
> > +
> > +   ret = qi_submit_sync(qi, iommu);
> > +
> > +   return ret;
> 
> nit, ret variable is unnecessary.

yes, would remove it.
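
(i.e. the tail of the function would then simply be:)

	return qi_submit_sync(qi, iommu);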
 
> > +}
> > +
> >  static int intel_iommu_map(struct iommu_domain *domain,
> >unsigned long iova, phys_addr_t hpa,
> >size_t size, int iommu_prot)
> > @@ -5416,6 +5458,7 @@ static int intel_iommu_unbind_pasid_table(struct iommu_domain *domain,
> >  #ifdef CONFIG_INTEL_IOMMU_SVM
> > .bind_pasid_table   = intel_iommu_bind_pasid_table,
> > .unbind_pasid_table = intel_iommu_unbind_pasid_table,
> > +   .do_invalidate  = intel_iommu_do_invalidate,
> >  #endif
> > .map= intel_iommu_map,
> > .unmap  = intel_iommu_unmap,
> > diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> > index ac04f28..9d6562c 100644
> > --- a/include/linux/intel-iommu.h
> > +++ b/include/linux/intel-iommu.h
> > @@ -29,6 +29,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >  
> > @@ -271,6 +272,10 @@ enum {
> >  #define QI_PGRP_RESP_TYPE  0x9
> >  #define QI_PSTRM_RESP_TYPE 0xa
> >  
> > +#define QI_DID(did)	(((u64)did & 0xffff) << 16)
> > +#define QI_DID_MASK	GENMASK(31, 16)
> > +#define QI_TYPE_MASK	GENMASK(3, 0)
> > +
> >  #define QI_IEC_SELECTIVE   (((u64)1) << 4)
> >  #define QI_IEC_IIDEX(idx)  (((u64)(idx & 0xffff) << 32))
> >  #define QI_IEC_IM(m)   (((u64)(m & 0x1f) << 27))
> > @@ -529,6 +534,12 @@ struct intel_svm {
> >  extern struct intel_iommu *intel_svm_device_to_iommu(struct device *dev);
> >  #endif
> >  
> > +struct intel_invalidate_data {
> > +   u16 sid;
> > +   u32 pasid;
> > +   struct qi_desc inv_desc;
> > +};
> 
> This needs to be uapi since the vfio user is expected to create it, so
> we need a uapi version of qi_desc too.
>

yes, would do it.
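
(A hypothetical sketch of such a uapi mirror, assuming the kernel's
qi_desc stays two 64-bit words; the struct name is illustrative:)

struct vfio_qi_desc {
	__u64	low;
	__u64	high;
};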

Thx,
Yi L
 
> > +
> >  extern const struct attribute_group *intel_iommu_groups[];
> >  extern void intel_iommu_debugfs_init(void);
> >  extern struct context_entry *iommu_context_addr(struct intel_iommu *iommu,
> 


[PATCH V2 2/3] iommu: of: Ignore all errors except EPROBE_DEFER

2017-05-18 Thread Sricharan R
While deferring the probe of IOMMU masters, the xlate and
add_device callbacks can pass back error values like -ENODEV,
which means the IOMMU genuinely cannot be connected with that
master. So rather than killing the master's probe for such
errors, just ignore the errors and let the master work without
an IOMMU.

Fixes: 7b07cbefb68d ("iommu: of: Handle IOMMU lookup failure with deferred 
probing or error")
Reported-by: Geert Uytterhoeven 
Tested-by: Magnus Damm 
Signed-off-by: Sricharan R 
---
[V2] Corrected spelling/case in commit log

 drivers/iommu/of_iommu.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index e6e9bec..f0d22c0 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -237,6 +237,12 @@ const struct iommu_ops *of_iommu_configure(struct device *dev,
		ops = ERR_PTR(err);
	}
 
+   /* Ignore all other errors apart from EPROBE_DEFER */
+   if (IS_ERR(ops) && (PTR_ERR(ops) != -EPROBE_DEFER)) {
+   dev_info(dev, "Adding to IOMMU failed: %ld\n", PTR_ERR(ops));
+   ops = NULL;
+   }
+
return ops;
 }
 
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
Code Aurora Forum, hosted by The Linux Foundation



[PATCH V2 1/3] iommu: of: Fix check for returning EPROBE_DEFER

2017-05-18 Thread Sricharan R
Now with IOMMU probe deferral, we return -EPROBE_DEFER
for masters that are connected to an IOMMU which is not
probed yet but is going to get probed, so that we can attach
the correct dma_ops. So while trying to defer the probe of
the master, check if the of_iommu node that it is connected
to is marked in DT as 'status=disabled'; in that case the
IOMMU is never going to get probed, so simply return NULL
and let the master work without an IOMMU.

Fixes: 7b07cbefb68d ("iommu: of: Handle IOMMU lookup failure with deferred 
probing or error")
Signed-off-by: Sricharan R 
Reported-by: Geert Uytterhoeven 
Tested-by: Will Deacon 
Tested-by: Magnus Damm 
Acked-by: Will Deacon 
---
[V2] Corrected spelling/case in commit log

 drivers/iommu/of_iommu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index 9f44ee8..e6e9bec 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -118,6 +118,7 @@ static bool of_iommu_driver_present(struct device_node *np)
 
ops = iommu_ops_from_fwnode(fwnode);
if ((ops && !ops->of_xlate) ||
+   !of_device_is_available(iommu_spec->np) ||
(!ops && !of_iommu_driver_present(iommu_spec->np)))
return NULL;
 
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
Code Aurora Forum, hosted by The Linux Foundation



[PATCH V2 3/3] ARM: dma-mapping: Don't tear down third-party mappings

2017-05-18 Thread Sricharan R
From: Laurent Pinchart 

arch_setup_dma_ops() is used in device probe code paths to create an
IOMMU mapping and attach it to the device. The function assumes that the
device is attached to a device-specific IOMMU instance (or at least a
device-specific TLB in a shared IOMMU instance) and thus creates a
separate mapping for every device.

On several systems (Renesas R-Car Gen2 being one of them), that
assumption is not true, and IOMMU mappings must be shared between
multiple devices. In those cases the IOMMU driver knows better than the
generic ARM dma-mapping layer and attaches mapping to devices manually
with arm_iommu_attach_device(), which sets the DMA ops for the device.
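
For context, a rough sketch of that manual path (not taken from any
specific driver; the function name, bus and sizes are illustrative):

static struct dma_iommu_mapping *mapping;

static int example_add_device(struct device *dev)
{
	/* create one shared mapping lazily, attach every master to it */
	if (!mapping)
		mapping = arm_iommu_create_mapping(&platform_bus_type,
						   SZ_1G, SZ_2G);
	if (IS_ERR(mapping))
		return PTR_ERR(mapping);

	return arm_iommu_attach_device(dev, mapping);
}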

The arch_setup_dma_ops() function takes this into account and bails out
immediately if the device already has DMA ops assigned. However, the
corresponding arch_teardown_dma_ops() function, called from driver
unbind code paths (including probe deferral), will tear the mapping down
regardless of who created it. When the device is reprobed
arch_setup_dma_ops() will be called again but won't perform any
operation as the DMA ops will still be set.

We need to reset the DMA ops in arch_teardown_dma_ops() to fix this.
However, we can't do so unconditionally, as then a new mapping would be
created by arch_setup_dma_ops() when the device is reprobed, regardless
of whether the device needs to share a mapping or not. We must thus keep
track of whether arch_setup_dma_ops() created the mapping, and only in
that case tear it down in arch_teardown_dma_ops().

Keep track of that information in the dev_archdata structure. As the
structure is embedded in all instances of struct device let's not grow
it, but turn the existing dma_coherent bool field into a bitfield that
can be used for other purposes.

Fixes: 09515ef5ddad ("of/acpi: Configure dma operations at probe time for 
platform/amba/pci bus devices")
Reviewed-by: Robin Murphy 
Signed-off-by: Laurent Pinchart 
---
 arch/arm/include/asm/device.h | 3 ++-
 arch/arm/mm/dma-mapping.c | 4 
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/arm/include/asm/device.h b/arch/arm/include/asm/device.h
index 36ec9c8..3234fe9 100644
--- a/arch/arm/include/asm/device.h
+++ b/arch/arm/include/asm/device.h
@@ -19,7 +19,8 @@ struct dev_archdata {
 #ifdef CONFIG_XEN
const struct dma_map_ops *dev_dma_ops;
 #endif
-   bool dma_coherent;
+   unsigned int dma_coherent:1;
+   unsigned int dma_ops_setup:1;
 };
 
 struct omap_device;
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index c742dfd..b48998f 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -2430,9 +2430,13 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
dev->dma_ops = xen_dma_ops;
}
 #endif
+   dev->archdata.dma_ops_setup = true;
 }
 
 void arch_teardown_dma_ops(struct device *dev)
 {
+   if (!dev->archdata.dma_ops_setup)
+   return;
+
arm_teardown_iommu_dma_ops(dev);
 }
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
Code Aurora Forum, hosted by The Linux Foundation



Re: [PATCH 1/3] iommu: of: Fix check for returning EPROBE_DEFER

2017-05-18 Thread Sricharan R
Hi Bjorn,

On 5/17/2017 10:34 PM, Bjorn Helgaas wrote:
> On Wed, May 17, 2017 at 05:00:07PM +0530, Sricharan R wrote:
>> Now with iommu probe deferral, we return -EPROBE_DEFER
>> for master's that are connected to an iommu which is not
> 
> s/master's/masters/
> 
> s/iommu/IOMMU/ in your English text (changelogs and comments).  That seems
> to be the convention, based on "git log drivers/iommu/of_iommu.c"
> 

ok, will correct those.

Regards,
 Sricharan


>> probed yet, but going to get probed, so that we can attach
>> the correct dma_ops. So while trying to defer the probe of
>> the master, check if the of_iommu node that it is connected
>> to is marked in DT as 'status=disabled', then the iommu is never
>> is going to get probed. So simply return NULL and let the master
>> work without an iommu.
>>
>> Fixes: 7b07cbefb68d ("iommu: of: Handle IOMMU lookup failure with deferred 
>> probing or error")
>> Signed-off-by: Sricharan R 
>> Reported-by: Geert Uytterhoeven 
>> Tested-by: Will Deacon 
>> Tested-by: Magnus Damm 
>> Acked-by: Will Deacon 
>> ---
>>  drivers/iommu/of_iommu.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
>> index 9f44ee8..e6e9bec 100644
>> --- a/drivers/iommu/of_iommu.c
>> +++ b/drivers/iommu/of_iommu.c
>> @@ -118,6 +118,7 @@ static bool of_iommu_driver_present(struct device_node *np)
>>  
>>  ops = iommu_ops_from_fwnode(fwnode);
>>  if ((ops && !ops->of_xlate) ||
>> +!of_device_is_available(iommu_spec->np) ||
>>  (!ops && !of_iommu_driver_present(iommu_spec->np)))
>>  return NULL;
>>  
>> -- 
>> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
>> Code Aurora Forum, hosted by The Linux Foundation

-- 
"QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH v5 18/32] x86, mpparse: Use memremap to map the mpf and mpc data

2017-05-18 Thread Borislav Petkov
On Wed, May 17, 2017 at 03:26:58PM -0500, Tom Lendacky wrote:
> > Also, simplify that test:
> > 
> > if (mpf->feature1)
> > ...
> 
> Ok, I can do that but I hope no one says anything about it being
> unrelated to the patch. :)

Bah, that's minor.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH v5 17/32] x86/mm: Add support to access boot related data in the clear

2017-05-18 Thread Borislav Petkov
On Wed, May 17, 2017 at 01:54:39PM -0500, Tom Lendacky wrote:
> I was worried what the compiler might do when CONFIG_EFI is not set,
> but it appears to take care of it. I'll double check though.

There's a efi_enabled() !CONFIG_EFI version too, so should be fine.

> I may introduce a length variable to capture data->len right after
> paddr_next is set and then have just a single memunmap() call before
> the if check.

Yap.
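
A minimal sketch of what Tom describes, with hypothetical variable names
shaped like the usual setup_data walker:

	paddr_next = data->next;
	len = data->len;
	memunmap(data);		/* single unmap covers both exit paths */

	if (phys_addr > paddr && phys_addr < (paddr + len))
		return true;

	paddr = paddr_next;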

> I tried that, but calling an "__init" function (early_memremap()) from
> a non "__init" function generated warnings. I suppose I can pass in a
> function for the map and unmap but that looks worse to me (also the
> unmap functions take different arguments).

No, the other way around: the __init function should call the non-init
one and you need the non-init one anyway for memremap_is_setup_data().

> This is like the chicken and the egg scenario. In order to determine if
> an address is setup data I have to explicitly map the setup data chain
> as decrypted. In order to do that I have to supply a flag to explicitly
> map the data decrypted, otherwise I wind up back in the
> memremap_is_setup_data() function again and again and again...

Oh, fun.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


[PATCH v3 5/6] iommu/iova: move the calculation of pad mask out of loop

2017-05-18 Thread Zhen Lei
I'm not sure whether the compiler can optimize it, but moving it out will
be better. At least, it then does not require lock protection.

Signed-off-by: Zhen Lei 
---
 drivers/iommu/iova.c | 22 ++
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 711b10a..338930b 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -155,23 +155,16 @@ iova_insert_rbtree(struct rb_root *root, struct iova *iova,
	rb_insert_color(&iova->node, root);
 }
 
-/*
- * Computes the padding size required, to make the start address
- * naturally aligned on the power-of-two order of its size
- */
-static unsigned int
-iova_get_pad_size(unsigned int size, unsigned int limit_pfn)
-{
-   return (limit_pfn + 1 - size) & (__roundup_pow_of_two(size) - 1);
-}
-
 static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
unsigned long size, unsigned long limit_pfn,
struct iova *new, bool size_aligned)
 {
struct rb_node *prev, *curr;
unsigned long flags;
-   unsigned int pad_size = 0;
+   unsigned long pad_mask, pad_size = 0;
+
+   if (size_aligned)
+   pad_mask = __roundup_pow_of_two(size) - 1;
 
/* Walk the tree backwards */
	spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
@@ -185,8 +178,13 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
else if (limit_pfn < curr_iova->pfn_hi)
goto adjust_limit_pfn;
else {
+   /*
+* Computes the padding size required, to make the start
+* address naturally aligned on the power-of-two order
+* of its size
+*/
if (size_aligned)
-   pad_size = iova_get_pad_size(size, limit_pfn);
+   pad_size = (limit_pfn + 1 - size) & pad_mask;
if ((curr_iova->pfn_hi + size + pad_size) <= limit_pfn)
break;  /* found a free slot */
}
-- 
2.5.0




[PATCH v3 0/6] iommu/iova: improve the allocation performance of dma64

2017-05-18 Thread Zhen Lei
v2 -> v3:
It's been a long time. I have not received any advice except Robin Murphy's.
So the major change is just that I deleted an old patch ("iommu/iova: fix
incorrect variable types") and merged it into patch 5 of this version.

v1 -> v2:
Because of a problem with my email server, all patches sent to Joerg Roedel
failed. So I repost all these patches again; there are no changes.

v1:
64-bit devices are very common now. But currently we only define a
cached32_node to optimize the allocation performance of dma32, and I saw
some dma64 drivers choose to allocate iova from dma32 space first, maybe
because of the current dma64 performance problem or some other reasons.

For example: (in drivers/iommu/amd_iommu.c)
static unsigned long dma_ops_alloc_iova(..
{
	..
	if (dma_mask > DMA_BIT_MASK(32))
		pfn = alloc_iova_fast(&dma_dom->iovad, pages,
				      IOVA_PFN(DMA_BIT_MASK(32)));
	if (!pfn)
		pfn = alloc_iova_fast(&dma_dom->iovad, pages,
				      IOVA_PFN(dma_mask));

For the details of why dma64 iova allocation performance is very bad,
please refer to the description of patch 5.

In this patch series, I added a cached64_node to manage the dma64 iova
space (iova >= 4G); it takes the same effect as cached32_node (iova < 4G).
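
Concretely, the cached-node selection in __get_cached_rbnode() becomes
(simplified from patch 4):

	if (*limit_pfn <= iovad->dma_32bit_pfn)
		cached_node = iovad->cached32_node;
	else
		cached_node = iovad->cached64_node;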

Below is the performance data before and after my patch series:
(before)$ iperf -s

Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)

[  4] local 192.168.1.106 port 5001 connected with 192.168.1.198 port 35898
[ ID] Interval   Transfer Bandwidth
[  4]  0.0-10.2 sec  7.88 MBytes  6.48 Mbits/sec
[  5] local 192.168.1.106 port 5001 connected with 192.168.1.198 port 35900
[  5]  0.0-10.3 sec  7.88 MBytes  6.43 Mbits/sec
[  4] local 192.168.1.106 port 5001 connected with 192.168.1.198 port 35902
[  4]  0.0-10.3 sec  7.88 MBytes  6.43 Mbits/sec

(after)$ iperf -s

Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)

[  4] local 192.168.1.106 port 5001 connected with 192.168.1.198 port 36330
[ ID] Interval   Transfer Bandwidth
[  4]  0.0-10.0 sec  1.09 GBytes   933 Mbits/sec
[  5] local 192.168.1.106 port 5001 connected with 192.168.1.198 port 36332
[  5]  0.0-10.0 sec  1.10 GBytes   939 Mbits/sec
[  4] local 192.168.1.106 port 5001 connected with 192.168.1.198 port 36334
[  4]  0.0-10.0 sec  1.10 GBytes   938 Mbits/sec

Zhen Lei (6):
  iommu/iova: cut down judgement times
  iommu/iova: insert start_pfn boundary of dma32
  iommu/iova: adjust __cached_rbnode_insert_update
  iommu/iova: to optimize the allocation performance of dma64
  iommu/iova: move the calculation of pad mask out of loop
  iommu/iova: fix iovad->dma_32bit_pfn as the last pfn of dma32

 drivers/iommu/amd_iommu.c|   7 +-
 drivers/iommu/dma-iommu.c|  21 ++
 drivers/iommu/intel-iommu.c  |  11 +--
 drivers/iommu/iova.c | 143 +--
 drivers/misc/mic/scif/scif_rma.c |   3 +-
 include/linux/iova.h |   7 +-
 6 files changed, 93 insertions(+), 99 deletions(-)

-- 
2.5.0




[PATCH v3 2/6] iommu/iova: insert start_pfn boundary of dma32

2017-05-18 Thread Zhen Lei
Reserve the first granule of memory (starting at start_pfn) as a boundary
iova, to make sure that iovad->cached32_node cannot be NULL in the future.
Meanwhile, change the assignment of iovad->cached32_node from rb_next to
rb_prev of &free->node in function __cached_rbnode_delete_update.

Signed-off-by: Zhen Lei 
---
 drivers/iommu/iova.c | 63 ++--
 1 file changed, 37 insertions(+), 26 deletions(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 333a9cc..d0c19ec 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -32,6 +32,17 @@ static unsigned long iova_rcache_get(struct iova_domain *iovad,
 static void init_iova_rcaches(struct iova_domain *iovad);
 static void free_iova_rcaches(struct iova_domain *iovad);
 
+static void
+insert_iova_boundary(struct iova_domain *iovad)
+{
+   struct iova *iova;
+   unsigned long start_pfn_32bit = iovad->start_pfn;
+
+   iova = reserve_iova(iovad, start_pfn_32bit, start_pfn_32bit);
+   BUG_ON(!iova);
+   iovad->cached32_node = &iova->node;
+}
+
 void
 init_iova_domain(struct iova_domain *iovad, unsigned long granule,
unsigned long start_pfn, unsigned long pfn_32bit)
@@ -45,27 +56,38 @@ init_iova_domain(struct iova_domain *iovad, unsigned long granule,
 
	spin_lock_init(&iovad->iova_rbtree_lock);
iovad->rbroot = RB_ROOT;
-   iovad->cached32_node = NULL;
iovad->granule = granule;
iovad->start_pfn = start_pfn;
iovad->dma_32bit_pfn = pfn_32bit;
init_iova_rcaches(iovad);
+
+   /*
+* Insert boundary nodes for dma32. So cached32_node can not be NULL in
+* future.
+*/
+   insert_iova_boundary(iovad);
 }
 EXPORT_SYMBOL_GPL(init_iova_domain);
 
 static struct rb_node *
 __get_cached_rbnode(struct iova_domain *iovad, unsigned long *limit_pfn)
 {
-   if ((*limit_pfn > iovad->dma_32bit_pfn) ||
-   (iovad->cached32_node == NULL))
+   struct rb_node *cached_node;
+   struct rb_node *next_node;
+
+   if (*limit_pfn > iovad->dma_32bit_pfn)
		return rb_last(&iovad->rbroot);
-   else {
-   struct rb_node *prev_node = rb_prev(iovad->cached32_node);
-   struct iova *curr_iova =
-   rb_entry(iovad->cached32_node, struct iova, node);
-   *limit_pfn = curr_iova->pfn_lo - 1;
-   return prev_node;
+   else
+   cached_node = iovad->cached32_node;
+
+   next_node = rb_next(cached_node);
+   if (next_node) {
+   struct iova *next_iova = rb_entry(next_node, struct iova, node);
+
+   *limit_pfn = min(*limit_pfn, next_iova->pfn_lo - 1);
}
+
+   return cached_node;
 }
 
 static void
@@ -83,20 +105,13 @@ __cached_rbnode_delete_update(struct iova_domain *iovad, struct iova *free)
struct iova *cached_iova;
struct rb_node *curr;
 
-   if (!iovad->cached32_node)
-   return;
curr = iovad->cached32_node;
cached_iova = rb_entry(curr, struct iova, node);
 
if (free->pfn_lo >= cached_iova->pfn_lo) {
-   struct rb_node *node = rb_next(&free->node);
-   struct iova *iova = rb_entry(node, struct iova, node);
-
/* only cache if it's below 32bit pfn */
-   if (node && iova->pfn_lo < iovad->dma_32bit_pfn)
-   iovad->cached32_node = node;
-   else
-   iovad->cached32_node = NULL;
+   if (free->pfn_hi <= iovad->dma_32bit_pfn)
+   iovad->cached32_node = rb_prev(&free->node);
}
 }
 
@@ -142,7 +157,7 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
unsigned long size, unsigned long limit_pfn,
struct iova *new, bool size_aligned)
 {
-   struct rb_node *prev, *curr = NULL;
+   struct rb_node *prev, *curr;
unsigned long flags;
unsigned long saved_pfn;
unsigned int pad_size = 0;
@@ -172,13 +187,9 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
curr = rb_prev(curr);
}
 
-   if (!curr) {
-   if (size_aligned)
-   pad_size = iova_get_pad_size(size, limit_pfn);
-   if ((iovad->start_pfn + size + pad_size) > limit_pfn) {
-   spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
-   return -ENOMEM;
-   }
+   if (unlikely(!curr)) {
+   spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
+   return -ENOMEM;
}
 
/* pfn_lo will point to size aligned address if size_aligned is set */
-- 
2.5.0




[PATCH v3 6/6] iommu/iova: fix iovad->dma_32bit_pfn as the last pfn of dma32

2017-05-18 Thread Zhen Lei
This makes sure iovad->cached32_node and iovad->cached64_node exactly
control the dma32 and dma64 areas. It also helps us to remove the
parameter pfn_32bit of init_iova_domain.

Signed-off-by: Zhen Lei 
---
 drivers/iommu/amd_iommu.c|  7 ++-
 drivers/iommu/dma-iommu.c| 21 -
 drivers/iommu/intel-iommu.c  | 11 +++
 drivers/iommu/iova.c |  4 ++--
 drivers/misc/mic/scif/scif_rma.c |  3 +--
 include/linux/iova.h |  2 +-
 6 files changed, 13 insertions(+), 35 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 63cacf5..9aebfa6 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -61,7 +61,6 @@
 /* IO virtual address start page frame number */
 #define IOVA_START_PFN (1)
 #define IOVA_PFN(addr) ((addr) >> PAGE_SHIFT)
-#define DMA_32BIT_PFN  IOVA_PFN(DMA_BIT_MASK(32))
 
 /* Reserved IOVA ranges */
 #define MSI_RANGE_START(0xfee0)
@@ -1776,8 +1775,7 @@ static struct dma_ops_domain *dma_ops_domain_alloc(void)
if (!dma_dom->domain.pt_root)
goto free_dma_dom;
 
-	init_iova_domain(&dma_dom->iovad, PAGE_SIZE,
-			 IOVA_START_PFN, DMA_32BIT_PFN);
+	init_iova_domain(&dma_dom->iovad, PAGE_SIZE, IOVA_START_PFN);
 
/* Initialize reserved ranges */
	copy_reserved_iova(&reserved_iova_ranges, &dma_dom->iovad);
@@ -2747,8 +2745,7 @@ static int init_reserved_iova_ranges(void)
struct pci_dev *pdev = NULL;
struct iova *val;
 
-	init_iova_domain(&reserved_iova_ranges, PAGE_SIZE,
-			 IOVA_START_PFN, DMA_32BIT_PFN);
+	init_iova_domain(&reserved_iova_ranges, PAGE_SIZE, IOVA_START_PFN);
 
	lockdep_set_class(&reserved_iova_ranges.iova_rbtree_lock,
			  &reserved_rbtree_key);
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 8348f366..b3455d4 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -290,18 +290,7 @@ int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
/* ...then finally give it a kicking to make sure it fits */
base_pfn = max_t(unsigned long, base_pfn,
domain->geometry.aperture_start >> order);
-   end_pfn = min_t(unsigned long, end_pfn,
-   domain->geometry.aperture_end >> order);
}
-   /*
-* PCI devices may have larger DMA masks, but still prefer allocating
-* within a 32-bit mask to avoid DAC addressing. Such limitations don't
-* apply to the typical platform device, so for those we may as well
-* leave the cache limit at the top of their range to save an rb_last()
-* traversal on every allocation.
-*/
-   if (dev && dev_is_pci(dev))
-   end_pfn &= DMA_BIT_MASK(32) >> order;
 
/* start_pfn is always nonzero for an already-initialised domain */
if (iovad->start_pfn) {
@@ -310,19 +299,17 @@ int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
pr_warn("Incompatible range for DMA domain\n");
return -EFAULT;
}
-   /*
-* If we have devices with different DMA masks, move the free
-* area cache limit down for the benefit of the smaller one.
-*/
-   iovad->dma_32bit_pfn = min(end_pfn, iovad->dma_32bit_pfn);
 
return 0;
}
 
-   init_iova_domain(iovad, 1UL << order, base_pfn, end_pfn);
+   init_iova_domain(iovad, 1UL << order, base_pfn);
if (!dev)
return 0;
 
+   if (end_pfn < iovad->dma_32bit_pfn)
+   dev_dbg(dev, "ancient device or dma range missed some bits?");
+
return iova_reserve_iommu_regions(dev, domain);
 }
 EXPORT_SYMBOL(iommu_dma_init_domain);
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 90ab011..266b96b 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -82,8 +82,6 @@
 #define IOVA_START_PFN (1)
 
 #define IOVA_PFN(addr) ((addr) >> PAGE_SHIFT)
-#define DMA_32BIT_PFN  IOVA_PFN(DMA_BIT_MASK(32))
-#define DMA_64BIT_PFN  IOVA_PFN(DMA_BIT_MASK(64))
 
 /* page table handling */
 #define LEVEL_STRIDE   (9)
@@ -1874,8 +1872,7 @@ static int dmar_init_reserved_ranges(void)
struct iova *iova;
int i;
 
-	init_iova_domain(&reserved_iova_list, VTD_PAGE_SIZE, IOVA_START_PFN,
-			DMA_32BIT_PFN);
+	init_iova_domain(&reserved_iova_list, VTD_PAGE_SIZE, IOVA_START_PFN);
 
	lockdep_set_class(&reserved_iova_list.iova_rbtree_lock,
			&reserved_rbtree_key);
@@ -1933,8 +1930,7 @@ static int domain_init(struct dmar_domain *domain, struct intel_iommu *iommu,
int adjust_width, agaw;
unsigned long sagaw;
 
-   

[PATCH v3 4/6] iommu/iova: to optimize the allocation performance of dma64

2017-05-18 Thread Zhen Lei
Currently we always search free iova space for dma64 beginning at the last
node of the iovad rb-tree. In the worst case, there may be too many nodes
existing at the tail, so that we have to traverse many times in the first
loop of __alloc_and_insert_iova_range. As we traced, more than 10K times
for the case of iperf.

__alloc_and_insert_iova_range:
	..
	curr = __get_cached_rbnode(iovad, &limit_pfn);
		//--> return rb_last(&iovad->rbroot);
	while (curr) {
		..
		curr = rb_prev(curr);
	}

So add cached64_node to take the same effect as cached32_node, and add
the start_pfn boundary of dma64, to prevent an iova from crossing both the
dma32 and dma64 areas.

|---------------------------|---------------------------|
|<--cached32_node-->        |<--cached64_node-->        |
|                           |
start_pfn                   dma_32bit_pfn + 1

Signed-off-by: Zhen Lei 
---
 drivers/iommu/iova.c | 46 +++---
 include/linux/iova.h |  5 +++--
 2 files changed, 30 insertions(+), 21 deletions(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 1b8e136..711b10a 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -37,10 +37,15 @@ insert_iova_boundary(struct iova_domain *iovad)
 {
struct iova *iova;
unsigned long start_pfn_32bit = iovad->start_pfn;
+   unsigned long start_pfn_64bit = iovad->dma_32bit_pfn + 1;
 
iova = reserve_iova(iovad, start_pfn_32bit, start_pfn_32bit);
BUG_ON(!iova);
+   iovad->cached32_node = &iova->node;
+
+   iova = reserve_iova(iovad, start_pfn_64bit, start_pfn_64bit);
+   BUG_ON(!iova);
+   iovad->cached64_node = &iova->node;
 }
 
 void
@@ -62,8 +67,8 @@ init_iova_domain(struct iova_domain *iovad, unsigned long granule,
init_iova_rcaches(iovad);
 
/*
-* Insert boundary nodes for dma32. So cached32_node can not be NULL in
-* future.
+* Insert boundary nodes for dma32 and dma64. So cached32_node and
+* cached64_node can not be NULL in future.
 */
insert_iova_boundary(iovad);
 }
@@ -75,10 +80,10 @@ __get_cached_rbnode(struct iova_domain *iovad, unsigned long *limit_pfn)
struct rb_node *cached_node;
struct rb_node *next_node;
 
-   if (*limit_pfn > iovad->dma_32bit_pfn)
-   return rb_last(&iovad->rbroot);
-   else
+   if (*limit_pfn <= iovad->dma_32bit_pfn)
cached_node = iovad->cached32_node;
+   else
+   cached_node = iovad->cached64_node;
 
next_node = rb_next(cached_node);
if (next_node) {
@@ -94,29 +99,32 @@ static void
 __cached_rbnode_insert_update(struct iova_domain *iovad, struct iova *new)
 {
struct iova *cached_iova;
+   struct rb_node **cached_node;
 
-   if (new->pfn_hi > iovad->dma_32bit_pfn)
-   return;
+   if (new->pfn_hi <= iovad->dma_32bit_pfn)
+   cached_node = &iovad->cached32_node;
+   else
+   cached_node = &iovad->cached64_node;
 
-   cached_iova = rb_entry(iovad->cached32_node, struct iova, node);
+   cached_iova = rb_entry(*cached_node, struct iova, node);
if (new->pfn_lo <= cached_iova->pfn_lo)
-   iovad->cached32_node = rb_prev(&new->node);
+   *cached_node = rb_prev(&new->node);
 }
 
 static void
 __cached_rbnode_delete_update(struct iova_domain *iovad, struct iova *free)
 {
struct iova *cached_iova;
-   struct rb_node *curr;
+   struct rb_node **cached_node;
 
-   curr = iovad->cached32_node;
-   cached_iova = rb_entry(curr, struct iova, node);
+   if (free->pfn_hi <= iovad->dma_32bit_pfn)
+   cached_node = &iovad->cached32_node;
+   else
+   cached_node = &iovad->cached64_node;
 
-   if (free->pfn_lo >= cached_iova->pfn_lo) {
-   /* only cache if it's below 32bit pfn */
-   if (free->pfn_hi <= iovad->dma_32bit_pfn)
-   iovad->cached32_node = rb_prev(&free->node);
-   }
+   cached_iova = rb_entry(*cached_node, struct iova, node);
+   if (free->pfn_lo >= cached_iova->pfn_lo)
+   *cached_node = rb_prev(&free->node);
 }
 
 /* Insert the iova into domain rbtree by holding writer lock */
@@ -262,7 +270,7 @@ EXPORT_SYMBOL_GPL(iova_cache_put);
  * alloc_iova - allocates an iova
  * @iovad: - iova domain in question
  * @size: - size of page frames to allocate
- * @limit_pfn: - max limit address
+ * @limit_pfn: - max limit address(included)
  * @size_aligned: - set if size_aligned address range is required
  * This function allocates an iova in the range iovad->start_pfn to limit_pfn,
  * searching top-down from limit_pfn to iovad->start_pfn. If the size_aligned
@@ -381,7 +389,7 @@ EXPORT_SYMBOL_GPL(free_iova);
  * alloc_iova_fast - allocates an iova from rcache
  * @iovad: - iova domain in question
  * @size: - size of page frames to allocate
- * @limit_pfn: - max limit address
+ * 

[PATCH v3 3/6] iommu/iova: adjust __cached_rbnode_insert_update

2017-05-18 Thread Zhen Lei
For cases 2 and 3, adjust cached32_node to the new place; for case 1,
keep it unchanged.

For example:
case1: (the right part was allocated)
|------------------------------------------|
|<-------free------->|<-----new_iova------>|
|
|
cached32_node

case2: (all was allocated)
|------------------------------------------|
|<---------------new_iova----------------->|
|
|
cached32_node

case3:
|----------|........|-----------------------|
|..free..|<-----new_iova----->|
|                             |
|                             |
cached32_node(new)       cached32_node(old)

Signed-off-by: Zhen Lei 
---
 drivers/iommu/iova.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index d0c19ec..1b8e136 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -91,12 +91,16 @@ __get_cached_rbnode(struct iova_domain *iovad, unsigned long *limit_pfn)
 }
 
 static void
-__cached_rbnode_insert_update(struct iova_domain *iovad,
-   unsigned long limit_pfn, struct iova *new)
+__cached_rbnode_insert_update(struct iova_domain *iovad, struct iova *new)
 {
-   if (limit_pfn != iovad->dma_32bit_pfn)
+   struct iova *cached_iova;
+
+   if (new->pfn_hi > iovad->dma_32bit_pfn)
return;
-   iovad->cached32_node = &new->node;
+
+   cached_iova = rb_entry(iovad->cached32_node, struct iova, node);
+   if (new->pfn_lo <= cached_iova->pfn_lo)
+   iovad->cached32_node = rb_prev(&new->node);
 }
 
 static void
@@ -159,12 +163,10 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
 {
struct rb_node *prev, *curr;
unsigned long flags;
-   unsigned long saved_pfn;
unsigned int pad_size = 0;
 
/* Walk the tree backwards */
	spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
-   saved_pfn = limit_pfn;
	curr = __get_cached_rbnode(iovad, &limit_pfn);
prev = curr;
while (curr) {
@@ -198,11 +200,10 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
 
/* If we have 'prev', it's a valid place to start the insertion. */
iova_insert_rbtree(>rbroot, new, prev);
-   __cached_rbnode_insert_update(iovad, saved_pfn, new);
+   __cached_rbnode_insert_update(iovad, new);
 
	spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
 
-
return 0;
 }
 
-- 
2.5.0

