Re: [PATCH v7 2/3] PCI: Add support for PCI inbound windows resources

2017-05-30 Thread Bjorn Helgaas via iommu
On Mon, May 22, 2017 at 11:39 AM, Oza Pawandeep  wrote:
> This patch adds support for inbound memory windows
> for PCI RC drivers.
>
> It defines a new function, pci_create_root_bus2, which
> takes inbound resources as an argument and records them
> in the PCI internal host bridge structure
> as inbound_windows.
>
> Legacy RC drivers can continue to use pci_create_root_bus,
> but any RC driver that wants to reserve IOVAs for its
> inbound memory holes should use the new API pci_create_root_bus2.
>
> Signed-off-by: Oza Pawandeep 
> ...

> +struct pci_bus *pci_create_root_bus2(struct device *parent, int bus,
> +   struct pci_ops *ops, void *sysdata, struct list_head *resources,
> +   struct list_head *in_res)
> +{
> +   return pci_create_root_bus_msi(parent, bus, ops, sysdata,
> +  resources, in_res, NULL);
> +}
> +EXPORT_SYMBOL_GPL(pci_create_root_bus2);

Based on your response to Lorenzo's "[RFC/RFT PATCH 03/18] PCI:
Introduce pci_scan_root_bus_bridge()", I'm hoping you can avoid adding
yet another variant of pci_create_root_bus().

So I think I can wait for that to settle out and look for a v8?

Bjorn
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: Device address specific mapping of arm,mmu-500

2017-05-30 Thread Ray Jui via iommu
Hi Marc/Robin/Will,

On 5/30/17 10:27 AM, Marc Zyngier wrote:
> On 30/05/17 18:16, Ray Jui wrote:
>> Hi Marc,
>>
>> On 5/30/17 9:59 AM, Marc Zyngier wrote:
>>> On 30/05/17 17:49, Ray Jui wrote:
 Hi Will,

 On 5/30/17 8:14 AM, Will Deacon wrote:
> On Mon, May 29, 2017 at 06:18:45PM -0700, Ray Jui wrote:
>> I'm writing to check with you to see if the latest arm-smmu.c driver in
>> v4.12-rc Linux for smmu-500 can support a mapping that is specific only to
>> a particular physical address range while leaving the rest to be
>> handled by the client device. I believe this can already be supported by
>> the device tree binding of the generic IOMMU framework; however, it is
>> not clear to me whether or not the arm-smmu.c driver can support it.
>>
>> To give you some background information:
>>
>> We have a SoC whose PCIe root complex has a built-in logic block
>> to forward MSI writes to the ARM GICv3 ITS. Unfortunately, this logic block
>> has a HW bug that causes MSI writes to be parsed improperly and can
>> potentially corrupt data in the internal FIFO. A workaround is to have the
>> ARM MMU-500 take care of all inbound transactions. I found that this works
>> after hooking up our PCIe root complex to the MMU-500; however, even
>> with this optimized arm-smmu driver in v4.12, I'm still seeing a
>> significant Ethernet throughput drop in both the TX and RX directions.
>> The throughput drop is very significant at around 50% (but is already
>> much improved compared to prior kernel versions, where it was 70~90%).
>
> Did Robin's experiments help at all with this?
>
> http://www.linux-arm.org/git?p=linux-rm.git;a=shortlog;h=refs/heads/iommu/perf
>

 It looks like these are new optimizations that have not yet been merged
 in v4.12? I'm going to give it a try.

>> One alternative is to only use MMU-500 for MSI writes towards
>> GITS_TRANSLATER register in the GICv3, i.e., if I can define a specific
>> region of physical address that I want MMU-500 to act on and leave the
>> rest of inbound transactions to be handled directly by our PCIe
>> controller, it can potentially work around the HW bug we have and at the
>> same time achieve optimal throughput.
>
> I don't think you can bypass the SMMU for MSIs unless you give them their
> own StreamIDs, which is likely to break things horribly in the kernel. You
> could try to create an identity mapping, but you'll still have the
> translation overhead and you'd probably end up having to supply your own
> DMA ops to manage the address space. I'm assuming that you need to prevent the
> physical address of the ITS from being allocated as an IOVA?

 Will, is that a HW limitation, i.e., that the SMMU cannot be used only for
 MSI writes? In our case the physical address range is very specific in
 our ASIC and falls in the device memory region (e.g., below 0x8000).

 In fact, what I need in this case is a static mapping in the IOMMU for the
 physical address of the GITS_TRANSLATER of the GICv3 ITS, which is the
 address that MSI writes go to. This is to bypass the MSI forwarding
 logic in our PCIe controller. At the same time, I can leave the rest of
 the inbound transactions to be handled by our PCIe controller without going
 through the MMU.
>>>
>>> How is that going to work for DMA? I imagine your network interfaces do
>>> have to access memory, don't they? How can the transactions be
>>> terminated in the PCIe controller?
>>
>> Sorry, I may not phrase this properly. These inbound transactions (DMA
>> write to DDR, from endpoint) do not terminate in the PCIe controller.
>> They are taken by the PCIe controller as PCIe transactions and will be
>> carried towards the designated memory on the host.
> 
> So what is the StreamID used for these transactions? Is that a different
> StreamID from that of the DMAing device? If you want to avoid the SMMU
> effect on the transaction, you must make sure it doesn't match anything
> there.
> 
> Thanks,
> 
>   M.
> 

Thanks for the reply. I'm checking with our ASIC team, but from my
understanding, the stream ID in our ASIC is constructed from some
custom fields that a developer can program plus some standard PCIe BDF
fields. That is, I don't think we can make the stream ID from the same
PF different between MSI writes and DMA writes, as you have already
predicted.

It sounds like I do not have much option here...

Thanks,

Ray


Re: [PATCH 0/2] Save and restore pci properties to support FLR

2017-05-30 Thread Bjorn Helgaas
On Tue, May 30, 2017 at 09:25:47AM -0700, Ashok Raj wrote:
> Resending Jean's patch so it can be included earlier than his large
> SVM commits. Original patch https://patchwork.kernel.org/patch/9593891
> was ack'ed by Bjorn. Let's commit these separately since we need the
> functionality earlier.
> 
> Resending this series as requested by Jean.
> 
> CQ Tang (1):
>   PCI: Save properties required to handle FLR for replay purposes.
> 
> Jean-Philippe Brucker (1):
>   PCI: Cache PRI and PASID bits in pci_dev
> 
>  drivers/pci/ats.c   | 88 -
>  drivers/pci/pci.c   |  3 ++
>  include/linux/pci-ats.h | 10 ++
>  include/linux/pci.h |  8 +
>  4 files changed, 94 insertions(+), 15 deletions(-)

Applied to pci/virtualization for v4.13.  See response to 2/2 for minor
changes I made there.


Re: [PATCH 2/2] PCI: Save properties required to handle FLR for replay purposes.

2017-05-30 Thread Bjorn Helgaas
On Tue, May 30, 2017 at 09:25:49AM -0700, Ashok Raj wrote:
> From: CQ Tang 
> 
> Requires: https://patchwork.kernel.org/patch/9593891
> 
> 
> After an FLR, PCI state needs to be restored. This patch caches the PASID
> features and PRI allocation requests so they can be restored.
> 
> To: Bjorn Helgaas 
> To: Joerg Roedel 
> To: linux-...@vger.kernel.org
> To: linux-ker...@vger.kernel.org
> Cc: Jean-Phillipe Brucker 
> Cc: David Woodhouse 
> Cc: iommu@lists.linux-foundation.org
> 
> Signed-off-by: CQ Tang 
> Signed-off-by: Ashok Raj 
> ---
>  drivers/pci/ats.c   | 65 +
>  drivers/pci/pci.c   |  3 +++
>  include/linux/pci-ats.h | 10 
>  include/linux/pci.h |  6 +
>  4 files changed, 69 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> index 2126497..a769955 100644
> --- a/drivers/pci/ats.c
> +++ b/drivers/pci/ats.c
> @@ -160,17 +160,16 @@ int pci_enable_pri(struct pci_dev *pdev, u32 reqs)
>   if (!pos)
>   return -EINVAL;
>  
> - pci_read_config_word(pdev, pos + PCI_PRI_CTRL, &control);
>   pci_read_config_word(pdev, pos + PCI_PRI_STATUS, &status);
> - if ((control & PCI_PRI_CTRL_ENABLE) ||
> - !(status & PCI_PRI_STATUS_STOPPED))
> + if (!(status & PCI_PRI_STATUS_STOPPED))
>   return -EBUSY;
>  
>   pci_read_config_dword(pdev, pos + PCI_PRI_MAX_REQ, &max_requests);
>   reqs = min(max_requests, reqs);
> + pdev->pri_reqs_alloc = reqs;
>   pci_write_config_dword(pdev, pos + PCI_PRI_ALLOC_REQ, reqs);
>  
> - control |= PCI_PRI_CTRL_ENABLE;
> + control = PCI_PRI_CTRL_ENABLE;
>   pci_write_config_word(pdev, pos + PCI_PRI_CTRL, control);
>  
>   pdev->pri_enabled = 1;
> @@ -206,6 +205,29 @@ void pci_disable_pri(struct pci_dev *pdev)
>  EXPORT_SYMBOL_GPL(pci_disable_pri);
>  
>  /**
> + * pci_restore_pri_state - Restore PRI
> + * @pdev: PCI device structure
> + *
> + */
> +void pci_restore_pri_state(struct pci_dev *pdev)
> +{
> +   u16 control = PCI_PRI_CTRL_ENABLE;
> +   u32 reqs = pdev->pri_reqs_alloc;
> +   int pos;
> +
> +   pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
> +   if (!pos)
> +   return;
> +
> +   if (!pdev->pri_enabled)
> +   return;

I propose swapping the order of these tests, so that if PRI is not
enabled, we don't have to search for the capability.  Similarly for
PASID below.

I made these changes and re-indented these functions on my branch.  No
action required unless you object to these changes.

> +
> +   pci_write_config_dword(pdev, pos + PCI_PRI_ALLOC_REQ, reqs);
> +   pci_write_config_word(pdev, pos + PCI_PRI_CTRL, control);
> +}
> +EXPORT_SYMBOL_GPL(pci_restore_pri_state);
> +
> +/**
>   * pci_reset_pri - Resets device's PRI state
>   * @pdev: PCI device structure
>   *
> @@ -224,12 +246,7 @@ int pci_reset_pri(struct pci_dev *pdev)
>   if (!pos)
>   return -EINVAL;
>  
> - pci_read_config_word(pdev, pos + PCI_PRI_CTRL, &control);
> - if (control & PCI_PRI_CTRL_ENABLE)
> - return -EBUSY;
> -
> - control |= PCI_PRI_CTRL_RESET;
> -
> + control = PCI_PRI_CTRL_RESET;
>   pci_write_config_word(pdev, pos + PCI_PRI_CTRL, control);
>  
>   return 0;
> @@ -259,12 +276,7 @@ int pci_enable_pasid(struct pci_dev *pdev, int features)
>   if (!pos)
>   return -EINVAL;
>  
> - pci_read_config_word(pdev, pos + PCI_PASID_CTRL, &control);
>   pci_read_config_word(pdev, pos + PCI_PASID_CAP, &supported);
> -
> - if (control & PCI_PASID_CTRL_ENABLE)
> - return -EINVAL;
> -
>   supported &= PCI_PASID_CAP_EXEC | PCI_PASID_CAP_PRIV;
>  
>   /* User wants to enable anything unsupported? */
> @@ -272,6 +284,7 @@ int pci_enable_pasid(struct pci_dev *pdev, int features)
>   return -EINVAL;
>  
>   control = PCI_PASID_CTRL_ENABLE | features;
> + pdev->pasid_features = features;
>  
>   pci_write_config_word(pdev, pos + PCI_PASID_CTRL, control);
>  
> @@ -305,6 +318,28 @@ void pci_disable_pasid(struct pci_dev *pdev)
>  EXPORT_SYMBOL_GPL(pci_disable_pasid);
>  
>  /**
> + * pci_restore_pasid_state - Restore PASID capabilities.
> + * @pdev: PCI device structure
> + *
> + */
> +void pci_restore_pasid_state(struct pci_dev *pdev)
> +{
> +   u16 control;
> +   int pos;
> +
> +   pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PASID);
> +   if (!pos)
> +   return;
> +
> +   if (!pdev->pasid_enabled)
> +   return;
> +
> +   control = PCI_PASID_CTRL_ENABLE | pdev->pasid_features;
> +   pci_write_config_word(pdev, pos + PCI_PASID_CTRL, control);
> +}
> +EXPORT_SYMBOL_GPL(pci_restore_pasid_state);
> +
> +/**
>   * pci_pasid_features - Check which PASID features are supported
>   * @pdev: PCI device structure
>   *
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 7904d02..c9a6510 100644
> --- 

Re: [PATCH v5 26/32] x86, drm, fbdev: Do not specify encrypted memory for video mappings

2017-05-30 Thread Tom Lendacky

On 5/16/2017 12:35 PM, Borislav Petkov wrote:

On Tue, Apr 18, 2017 at 04:20:56PM -0500, Tom Lendacky wrote:

Since video memory needs to be accessed decrypted, be sure that the
memory encryption mask is not set for the video ranges.

Signed-off-by: Tom Lendacky 
---
  arch/x86/include/asm/vga.h   |   13 +
  arch/x86/mm/pageattr.c   |2 ++
  drivers/gpu/drm/drm_gem.c|2 ++
  drivers/gpu/drm/drm_vm.c |4 
  drivers/gpu/drm/ttm/ttm_bo_vm.c  |7 +--
  drivers/gpu/drm/udl/udl_fb.c |4 
  drivers/video/fbdev/core/fbmem.c |   12 
  7 files changed, 42 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/vga.h b/arch/x86/include/asm/vga.h
index c4b9dc2..5c7567a 100644
--- a/arch/x86/include/asm/vga.h
+++ b/arch/x86/include/asm/vga.h
@@ -7,12 +7,25 @@
  #ifndef _ASM_X86_VGA_H
  #define _ASM_X86_VGA_H
  
+#include 

+
  /*
    * On the PC, we can just recalculate addresses and then
    * access the videoram directly without any black magic.
+ * To support memory encryption however, we need to access
+ * the videoram as decrypted memory.
   */
  
+#ifdef CONFIG_AMD_MEM_ENCRYPT

+#define VGA_MAP_MEM(x, s)  \
+({ \
+   unsigned long start = (unsigned long)phys_to_virt(x);   \
+   set_memory_decrypted(start, (s) >> PAGE_SHIFT);   \
+   start;  \
+})
+#else
  #define VGA_MAP_MEM(x, s) (unsigned long)phys_to_virt(x)
+#endif


Can we push the check in and save us the ifdeffery?

#define VGA_MAP_MEM(x, s)   \
({  \
 unsigned long start = (unsigned long)phys_to_virt(x);   \
 \
 if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT)) \
 set_memory_decrypted(start, (s) >> PAGE_SHIFT); \
 \
 start;  \
})

It does build here. :)



That works for me and it's a lot cleaner.  I'll make the change.

Thanks,
Tom


Re: [PATCH 2/2] PCI: Save properties required to handle FLR for replay purposes.

2017-05-30 Thread Raj, Ashok
On Tue, May 30, 2017 at 02:50:33PM -0500, Bjorn Helgaas wrote:
> On Tue, May 30, 2017 at 09:25:49AM -0700, Ashok Raj wrote:
> > From: CQ Tang 
> > 
> > Requires: https://patchwork.kernel.org/patch/9593891
> 
> The above patch (9593891) is not in my tree or Linus' tree, so I can't
> do anything with this yet.

I resent the patch as part of this series... maybe I should have massaged the
commit message... my bad :-(. It's the first patch in this series.

Jean mentioned it might take him a while before the SVM patches stabilize.
Since this patch is pretty much standalone, he asked if I could resend it
along with the one I need.

Sorry for any confusion.

Cheers,
Ashok
> 
> > After an FLR, PCI state needs to be restored. This patch caches the PASID
> > features and PRI allocation requests so they can be restored.
> > 
> > To: Bjorn Helgaas 
> > To: Joerg Roedel 
> > To: linux-...@vger.kernel.org
> > To: linux-ker...@vger.kernel.org
> > Cc: Jean-Phillipe Brucker 
> > Cc: David Woodhouse 
> > Cc: iommu@lists.linux-foundation.org
> > 
> > Signed-off-by: CQ Tang 
> > Signed-off-by: Ashok Raj 
> > ---
> >  drivers/pci/ats.c   | 65 +
> >  drivers/pci/pci.c   |  3 +++
> >  include/linux/pci-ats.h | 10 
> >  include/linux/pci.h |  6 +
> >  4 files changed, 69 insertions(+), 15 deletions(-)
> > 
> > diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> > index 2126497..a769955 100644
> > --- a/drivers/pci/ats.c
> > +++ b/drivers/pci/ats.c
> > @@ -160,17 +160,16 @@ int pci_enable_pri(struct pci_dev *pdev, u32 reqs)
> > if (!pos)
> > return -EINVAL;
> >  
> > -   pci_read_config_word(pdev, pos + PCI_PRI_CTRL, &control);
> > pci_read_config_word(pdev, pos + PCI_PRI_STATUS, &status);
> > -   if ((control & PCI_PRI_CTRL_ENABLE) ||
> > -   !(status & PCI_PRI_STATUS_STOPPED))
> > +   if (!(status & PCI_PRI_STATUS_STOPPED))
> > return -EBUSY;
> >  
> > pci_read_config_dword(pdev, pos + PCI_PRI_MAX_REQ, &max_requests);
> > reqs = min(max_requests, reqs);
> > +   pdev->pri_reqs_alloc = reqs;
> > pci_write_config_dword(pdev, pos + PCI_PRI_ALLOC_REQ, reqs);
> >  
> > -   control |= PCI_PRI_CTRL_ENABLE;
> > +   control = PCI_PRI_CTRL_ENABLE;
> > pci_write_config_word(pdev, pos + PCI_PRI_CTRL, control);
> >  
> > pdev->pri_enabled = 1;
> > @@ -206,6 +205,29 @@ void pci_disable_pri(struct pci_dev *pdev)
> >  EXPORT_SYMBOL_GPL(pci_disable_pri);
> >  
> >  /**
> > + * pci_restore_pri_state - Restore PRI
> > + * @pdev: PCI device structure
> > + *
> > + */
> > +void pci_restore_pri_state(struct pci_dev *pdev)
> > +{
> > +   u16 control = PCI_PRI_CTRL_ENABLE;
> > +   u32 reqs = pdev->pri_reqs_alloc;
> > +   int pos;
> > +
> > +   pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
> > +   if (!pos)
> > +   return;
> > +
> > +   if (!pdev->pri_enabled)
> > +   return;
> > +
> > +   pci_write_config_dword(pdev, pos + PCI_PRI_ALLOC_REQ, reqs);
> > +   pci_write_config_word(pdev, pos + PCI_PRI_CTRL, control);
> > +}
> > +EXPORT_SYMBOL_GPL(pci_restore_pri_state);
> > +
> > +/**
> >   * pci_reset_pri - Resets device's PRI state
> >   * @pdev: PCI device structure
> >   *
> > @@ -224,12 +246,7 @@ int pci_reset_pri(struct pci_dev *pdev)
> > if (!pos)
> > return -EINVAL;
> >  
> > -   pci_read_config_word(pdev, pos + PCI_PRI_CTRL, &control);
> > -   if (control & PCI_PRI_CTRL_ENABLE)
> > -   return -EBUSY;
> > -
> > -   control |= PCI_PRI_CTRL_RESET;
> > -
> > +   control = PCI_PRI_CTRL_RESET;
> > pci_write_config_word(pdev, pos + PCI_PRI_CTRL, control);
> >  
> > return 0;
> > @@ -259,12 +276,7 @@ int pci_enable_pasid(struct pci_dev *pdev, int features)
> > if (!pos)
> > return -EINVAL;
> >  
> > -   pci_read_config_word(pdev, pos + PCI_PASID_CTRL, &control);
> > pci_read_config_word(pdev, pos + PCI_PASID_CAP, &supported);
> > -
> > -   if (control & PCI_PASID_CTRL_ENABLE)
> > -   return -EINVAL;
> > -
> > supported &= PCI_PASID_CAP_EXEC | PCI_PASID_CAP_PRIV;
> >  
> > /* User wants to enable anything unsupported? */
> > @@ -272,6 +284,7 @@ int pci_enable_pasid(struct pci_dev *pdev, int features)
> > return -EINVAL;
> >  
> > control = PCI_PASID_CTRL_ENABLE | features;
> > +   pdev->pasid_features = features;
> >  
> > pci_write_config_word(pdev, pos + PCI_PASID_CTRL, control);
> >  
> > @@ -305,6 +318,28 @@ void pci_disable_pasid(struct pci_dev *pdev)
> >  EXPORT_SYMBOL_GPL(pci_disable_pasid);
> >  
> >  /**
> > + * pci_restore_pasid_state - Restore PASID capabilities.
> > + * @pdev: PCI device structure
> > + *
> > + */
> > +void pci_restore_pasid_state(struct pci_dev *pdev)
> > +{
> > +   u16 control;
> > +   int pos;
> > +
> > +   pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PASID);
> > +   if (!pos)
> > +   return;
> > +
> > +   if 

Re: [PATCH v5 17/32] x86/mm: Add support to access boot related data in the clear

2017-05-30 Thread Tom Lendacky

On 5/26/2017 11:35 AM, Borislav Petkov wrote:

On Fri, May 26, 2017 at 11:22:36AM -0500, Tom Lendacky wrote:

In addition to the same issue as efi.memmap.phys_map, efi_phys has
the __initdata attribute, so it will be released/freed, which will cause
problems in checks performed afterwards.


Sounds to me like we should drop the __initdata attr and prepare them
much earlier for use by the SME code.


Probably something we can look at for a follow-on patch.

Thanks,
Tom






Re: [PATCH v5 28/32] x86/mm, kexec: Allow kexec to be used with SME

2017-05-30 Thread Tom Lendacky

On 5/25/2017 11:17 PM, Xunlei Pang wrote:

On 04/19/2017 at 05:21 AM, Tom Lendacky wrote:

Provide support so that kexec can be used to boot a kernel when SME is
enabled.

Support is needed to allocate pages for kexec without encryption. This
is needed in order to be able to reboot into the new kernel in the same
manner as it was originally booted.


Hi Tom,

Looks like kdump will break; I didn't see similar handling for the kdump
cases. See, in the kernel, kimage_alloc_crash_control_pages(),
kimage_load_crash_segment(), etc.

We need to support kdump with SME; the kdump
kernel/initramfs/purgatory/elfcorehdr/etc. are all loaded into the
reserved memory (see crashkernel=X) by the userspace kexec-tools. I think
a straightforward way would be to mark the whole reserved memory range as
unencrypted before loading all the kexec segments for kdump; I guess we
can handle this easily in arch_kexec_unprotect_crashkres().


Yes, that would work.



Moreover, now that "elfcorehdr=X" is left as decrypted, it needs to be
remapped to the encrypted data.


This is an area that I'm not familiar with, so I don't completely
understand the flow in regards to where/when/how the ELF headers are
copied and what needs to be done.

Can you elaborate a bit on this?

Thanks,
Tom



Regards,
Xunlei



Additionally, when shutting down all of the CPUs we need to be sure to
flush the caches and then halt. This is needed when booting from a state
where SME was not active into a state where SME is active (or vice-versa).
Without these steps, it is possible for cache lines to exist for the same
physical location but tagged both with and without the encryption bit. This
can cause random memory corruption when caches are flushed depending on
which cacheline is written last.

Signed-off-by: Tom Lendacky 
---
  arch/x86/include/asm/init.h  |1 +
  arch/x86/include/asm/irqflags.h  |5 +
  arch/x86/include/asm/kexec.h |8 
  arch/x86/include/asm/pgtable_types.h |1 +
  arch/x86/kernel/machine_kexec_64.c   |   35 +-
  arch/x86/kernel/process.c|   26 +++--
  arch/x86/mm/ident_map.c  |   11 +++
  include/linux/kexec.h|   14 ++
  kernel/kexec_core.c  |7 +++
  9 files changed, 101 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/init.h b/arch/x86/include/asm/init.h
index 737da62..b2ec511 100644
--- a/arch/x86/include/asm/init.h
+++ b/arch/x86/include/asm/init.h
@@ -6,6 +6,7 @@ struct x86_mapping_info {
void *context;   /* context for alloc_pgt_page */
unsigned long pmd_flag;  /* page flag for PMD entry */
unsigned long offset;/* ident mapping offset */
+   unsigned long kernpg_flag;   /* kernel pagetable flag override */
  };
  
  int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,

diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
index ac7692d..38b5920 100644
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -58,6 +58,11 @@ static inline __cpuidle void native_halt(void)
asm volatile("hlt": : :"memory");
  }
  
+static inline __cpuidle void native_wbinvd_halt(void)

+{
+   asm volatile("wbinvd; hlt" : : : "memory");
+}
+
  #endif
  
  #ifdef CONFIG_PARAVIRT

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 70ef205..e8183ac 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -207,6 +207,14 @@ struct kexec_entry64_regs {
uint64_t r15;
uint64_t rip;
  };
+
+extern int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages,
+  gfp_t gfp);
+#define arch_kexec_post_alloc_pages arch_kexec_post_alloc_pages
+
+extern void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages);
+#define arch_kexec_pre_free_pages arch_kexec_pre_free_pages
+
  #endif
  
  typedef void crash_vmclear_fn(void);

diff --git a/arch/x86/include/asm/pgtable_types.h 
b/arch/x86/include/asm/pgtable_types.h
index ce8cb1c..0f326f4 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -213,6 +213,7 @@ enum page_cache_mode {
  #define PAGE_KERNEL   __pgprot(__PAGE_KERNEL | _PAGE_ENC)
  #define PAGE_KERNEL_RO__pgprot(__PAGE_KERNEL_RO | _PAGE_ENC)
  #define PAGE_KERNEL_EXEC  __pgprot(__PAGE_KERNEL_EXEC | _PAGE_ENC)
+#define PAGE_KERNEL_EXEC_NOENC __pgprot(__PAGE_KERNEL_EXEC)
  #define PAGE_KERNEL_RX__pgprot(__PAGE_KERNEL_RX | _PAGE_ENC)
  #define PAGE_KERNEL_NOCACHE   __pgprot(__PAGE_KERNEL_NOCACHE | _PAGE_ENC)
  #define PAGE_KERNEL_LARGE __pgprot(__PAGE_KERNEL_LARGE | _PAGE_ENC)
diff --git a/arch/x86/kernel/machine_kexec_64.c 
b/arch/x86/kernel/machine_kexec_64.c
index 085c3b3..11c0ca9 100644
--- 

Re: [PATCH 1/2] PCI: Save properties required to handle FLR for replay purposes.

2017-05-30 Thread Raj, Ashok
On Thu, May 11, 2017 at 11:50:24AM +0100, Jean-Philippe Brucker wrote:
> Hi,
> 
> On 10/05/17 19:39, Ashok Raj wrote:
> > From: CQ Tang 
> > 
> > Requires: https://patchwork.kernel.org/patch/9593891
> 
> Since your series is likely to go in much earlier than my SVM mess, maybe
> you could carry that PCI patch along with it? Or I could resend it on its
> own if you prefer.

I have sent the two patches to the list, so we can get them in before the long
SVM series gets in.



> 
> I'm planning to resend the SVM series in a few weeks but it still won't
> make it into mainline since it hasn't run on hardware.
> 
> Thanks,
> Jean-Philippe
> 
Cheers,
Ashok


[PATCH 0/2] Save and restore pci properties to support FLR

2017-05-30 Thread Ashok Raj
Resending Jean's patch so it can be included earlier than his large
SVM commits. Original patch https://patchwork.kernel.org/patch/9593891
was ack'ed by Bjorn. Let's commit these separately since we need the
functionality earlier.

Resending this series as requested by Jean.

CQ Tang (1):
  PCI: Save properties required to handle FLR for replay purposes.

Jean-Philippe Brucker (1):
  PCI: Cache PRI and PASID bits in pci_dev

 drivers/pci/ats.c   | 88 -
 drivers/pci/pci.c   |  3 ++
 include/linux/pci-ats.h | 10 ++
 include/linux/pci.h |  8 +
 4 files changed, 94 insertions(+), 15 deletions(-)

-- 
2.7.4



[PATCH 2/2] PCI: Save properties required to handle FLR for replay purposes.

2017-05-30 Thread Ashok Raj
From: CQ Tang 

Requires: https://patchwork.kernel.org/patch/9593891


After an FLR, PCI state needs to be restored. This patch caches the PASID
features and PRI allocation requests so they can be restored.

To: Bjorn Helgaas 
To: Joerg Roedel 
To: linux-...@vger.kernel.org
To: linux-ker...@vger.kernel.org
Cc: Jean-Phillipe Brucker 
Cc: David Woodhouse 
Cc: iommu@lists.linux-foundation.org

Signed-off-by: CQ Tang 
Signed-off-by: Ashok Raj 
---
 drivers/pci/ats.c   | 65 +
 drivers/pci/pci.c   |  3 +++
 include/linux/pci-ats.h | 10 
 include/linux/pci.h |  6 +
 4 files changed, 69 insertions(+), 15 deletions(-)

diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
index 2126497..a769955 100644
--- a/drivers/pci/ats.c
+++ b/drivers/pci/ats.c
@@ -160,17 +160,16 @@ int pci_enable_pri(struct pci_dev *pdev, u32 reqs)
if (!pos)
return -EINVAL;
 
-   pci_read_config_word(pdev, pos + PCI_PRI_CTRL, &control);
pci_read_config_word(pdev, pos + PCI_PRI_STATUS, &status);
-   if ((control & PCI_PRI_CTRL_ENABLE) ||
-   !(status & PCI_PRI_STATUS_STOPPED))
+   if (!(status & PCI_PRI_STATUS_STOPPED))
return -EBUSY;
 
pci_read_config_dword(pdev, pos + PCI_PRI_MAX_REQ, &max_requests);
reqs = min(max_requests, reqs);
+   pdev->pri_reqs_alloc = reqs;
pci_write_config_dword(pdev, pos + PCI_PRI_ALLOC_REQ, reqs);
 
-   control |= PCI_PRI_CTRL_ENABLE;
+   control = PCI_PRI_CTRL_ENABLE;
pci_write_config_word(pdev, pos + PCI_PRI_CTRL, control);
 
pdev->pri_enabled = 1;
@@ -206,6 +205,29 @@ void pci_disable_pri(struct pci_dev *pdev)
 EXPORT_SYMBOL_GPL(pci_disable_pri);
 
 /**
+ * pci_restore_pri_state - Restore PRI
+ * @pdev: PCI device structure
+ *
+ */
+void pci_restore_pri_state(struct pci_dev *pdev)
+{
+   u16 control = PCI_PRI_CTRL_ENABLE;
+   u32 reqs = pdev->pri_reqs_alloc;
+   int pos;
+
+   pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
+   if (!pos)
+   return;
+
+   if (!pdev->pri_enabled)
+   return;
+
+   pci_write_config_dword(pdev, pos + PCI_PRI_ALLOC_REQ, reqs);
+   pci_write_config_word(pdev, pos + PCI_PRI_CTRL, control);
+}
+EXPORT_SYMBOL_GPL(pci_restore_pri_state);
+
+/**
  * pci_reset_pri - Resets device's PRI state
  * @pdev: PCI device structure
  *
@@ -224,12 +246,7 @@ int pci_reset_pri(struct pci_dev *pdev)
if (!pos)
return -EINVAL;
 
-   pci_read_config_word(pdev, pos + PCI_PRI_CTRL, &control);
-   if (control & PCI_PRI_CTRL_ENABLE)
-   return -EBUSY;
-
-   control |= PCI_PRI_CTRL_RESET;
-
+   control = PCI_PRI_CTRL_RESET;
pci_write_config_word(pdev, pos + PCI_PRI_CTRL, control);
 
return 0;
@@ -259,12 +276,7 @@ int pci_enable_pasid(struct pci_dev *pdev, int features)
if (!pos)
return -EINVAL;
 
-   pci_read_config_word(pdev, pos + PCI_PASID_CTRL, &control);
pci_read_config_word(pdev, pos + PCI_PASID_CAP, &supported);
-
-   if (control & PCI_PASID_CTRL_ENABLE)
-   return -EINVAL;
-
supported &= PCI_PASID_CAP_EXEC | PCI_PASID_CAP_PRIV;
 
/* User wants to enable anything unsupported? */
@@ -272,6 +284,7 @@ int pci_enable_pasid(struct pci_dev *pdev, int features)
return -EINVAL;
 
control = PCI_PASID_CTRL_ENABLE | features;
+   pdev->pasid_features = features;
 
pci_write_config_word(pdev, pos + PCI_PASID_CTRL, control);
 
@@ -305,6 +318,28 @@ void pci_disable_pasid(struct pci_dev *pdev)
 EXPORT_SYMBOL_GPL(pci_disable_pasid);
 
 /**
+ * pci_restore_pasid_state - Restore PASID capabilities.
+ * @pdev: PCI device structure
+ *
+ */
+void pci_restore_pasid_state(struct pci_dev *pdev)
+{
+   u16 control;
+   int pos;
+
+   pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PASID);
+   if (!pos)
+   return;
+
+   if (!pdev->pasid_enabled)
+   return;
+
+   control = PCI_PASID_CTRL_ENABLE | pdev->pasid_features;
+   pci_write_config_word(pdev, pos + PCI_PASID_CTRL, control);
+}
+EXPORT_SYMBOL_GPL(pci_restore_pasid_state);
+
+/**
  * pci_pasid_features - Check which PASID features are supported
  * @pdev: PCI device structure
  *
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 7904d02..c9a6510 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1171,6 +1172,8 @@ void pci_restore_state(struct pci_dev *dev)
 
/* PCI Express register must be restored first */
pci_restore_pcie_state(dev);
+   pci_restore_pasid_state(dev);
+   pci_restore_pri_state(dev);
pci_restore_ats_state(dev);
pci_restore_vc_state(dev);
 
diff --git a/include/linux/pci-ats.h b/include/linux/pci-ats.h
index 57e0b82..782fb8e 100644
--- a/include/linux/pci-ats.h

Re: [PATCH] iommu/amd: Propagate IOVA allocation failure

2017-05-30 Thread Robin Murphy
On 30/05/17 10:48, Joerg Roedel wrote:
> On Fri, May 26, 2017 at 07:31:26PM +0100, Robin Murphy wrote:
>> Unlike the old allocator, alloc_iova_fast() will return 0 if it failed
>> to allocate a PFN. Since the callers of dma_ops_alloc_iova() would end
>> up treating that as a valid address, translate it to the DMA_ERROR_CODE
>> that they would expect.
>>
>> Fixes: 256e4621c21a ("iommu/amd: Make use of the generic IOVA allocator")
>> Signed-off-by: Robin Murphy 
>> ---
>>
>> Just something I spotted whilst comparing dma_map_page() callchains...
>>
>>  drivers/iommu/amd_iommu.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
>> index 63cacf5d6cf2..489dc302899e 100644
>> --- a/drivers/iommu/amd_iommu.c
>> +++ b/drivers/iommu/amd_iommu.c
>> @@ -1555,6 +1555,9 @@ static unsigned long dma_ops_alloc_iova(struct device *dev,
>>  if (!pfn)
>>  pfn = alloc_iova_fast(&dma_dom->iovad, pages, IOVA_PFN(dma_mask));
>>  
>> +if (!pfn)
>> +return DMA_ERROR_CODE;
>> +
> 
> That shouldn't make a difference on x86 because the DMA_ERROR_CODE is 0
> as well.

Ha, now that I did manage to overlook. Oh well, never mind then :)

Thanks,
Robin.

> 
> 
>   Joerg
> 



Re: Device address specific mapping of arm,mmu-500

2017-05-30 Thread Marc Zyngier
On 30/05/17 18:16, Ray Jui wrote:
> Hi Marc,
> 
> On 5/30/17 9:59 AM, Marc Zyngier wrote:
>> On 30/05/17 17:49, Ray Jui wrote:
>>> Hi Will,
>>>
>>> On 5/30/17 8:14 AM, Will Deacon wrote:
 On Mon, May 29, 2017 at 06:18:45PM -0700, Ray Jui wrote:
> I'm writing to check with you to see if the latest arm-smmu.c driver in
> v4.12-rc Linux for smmu-500 can support a mapping that is specific only to
> a particular physical address range while leaving the rest to be
> handled by the client device. I believe this can already be supported by
> the device tree binding of the generic IOMMU framework; however, it is
> not clear to me whether or not the arm-smmu.c driver can support it.
>
> To give you some background information:
>
> We have a SoC that has PCIe root complex that has a build-in logic block
> to forward MSI writes to ARM GICv3 ITS. Unfortunately, this logic block
> has a HW bug that causes the MSI writes not parsed properly and can
> potentially corrupt data in the internal FIFO. A workaround is to have
> ARM MMU-500 takes care of all inbound transactions. I found that is
> working after hooking up our PCIe root complex to MMU-500; however, even
> with this optimized arm-smmu driver in v4.12, I'm still seeing a
> significant Ethernet throughput drop in both the TX and RX directions.
> The throughput drop is very significant at around 50% (but is already
> much improved compared to other prior kernel versions at 70~90%).

 Did Robin's experiments help at all with this?

 http://www.linux-arm.org/git?p=linux-rm.git;a=shortlog;h=refs/heads/iommu/perf

>>>
>>> It looks like these are new optimizations that have not yet been merged
>>> in v4.12? I'm going to give it a try.
>>>
> One alternative is to only use MMU-500 for MSI writes towards
> GITS_TRANSLATER register in the GICv3, i.e., if I can define a specific
> region of physical address that I want MMU-500 to act on and leave the
> rest of inbound transactions to be handled directly by our PCIe
> controller, it can potentially work around the HW bug we have and at the
> same time achieve optimal throughput.

 I don't think you can bypass the SMMU for MSIs unless you give them their
 own StreamIDs, which is likely to break things horribly in the kernel. You
 could try to create an identity mapping, but you'll still have the
 translation overhead and you'd probably end up having to supply your own 
 DMA
 ops to manage the address space. I'm assuming that you need to prevent the
 physical address of the ITS from being allocated as an IOVA?
>>>
>>> Will, is that a HW limitation that the SMMU cannot be used, only for MSI
>>> writes, in which case, the physical address range is very specific in
>>> our ASIC that falls in the device memory region (e.g., below 0x8000)?
>>>
>>> In fact, what I need in this case is a static mapping from IOMMU on the
>>> physical address of the GITS_TRANSLATER of the GICv3 ITS, which is the
>>> address that MSI writes go to. This is to bypass the MSI forwarding
>>> logic in our PCIe controller. At the same time, I can leave the rest of
>>> inbound transactions to be handled by our PCIe controller without going
>>> through the MMU.
>>
>> How is that going to work for DMA? I imagine your network interfaces do
>> have to access memory, don't they? How can the transactions be
>> terminated in the PCIe controller?
> 
> Sorry, I may not phrase this properly. These inbound transactions (DMA
> write to DDR, from endpoint) do not terminate in the PCIe controller.
> They are taken by the PCIe controller as PCIe transactions and will be
> carried towards the designated memory on the host.

So what is the StreamID used for these transactions? Is that a different
StreamID from that of the DMAing device? If you want to avoid the SMMU
having any effect on the transaction, you must make sure it doesn't match
anything there.

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


Re: Device address specific mapping of arm,mmu-500

2017-05-30 Thread Robin Murphy
On 30/05/17 17:49, Ray Jui wrote:
> Hi Will,
> 
> On 5/30/17 8:14 AM, Will Deacon wrote:
>> On Mon, May 29, 2017 at 06:18:45PM -0700, Ray Jui wrote:
>>> I'm writing to check with you to see if the latest arm-smmu.c driver in
>>> v4.12-rc Linux for smmu-500 can support mapping that is only specific to
>>> a particular physical address range while leave the rest still to be
>>> handled by the client device. I believe this can already be supported by
>>> the device tree binding of the generic IOMMU framework; however, it is
>>> not clear to me whether or not the arm-smmu.c driver can support it.
>>>
>>> To give you some background information:
>>>
>>> We have a SoC that has PCIe root complex that has a build-in logic block
>>> to forward MSI writes to ARM GICv3 ITS. Unfortunately, this logic block
>>> has a HW bug that causes the MSI writes not parsed properly and can
>>> potentially corrupt data in the internal FIFO. A workaround is to have
>>> ARM MMU-500 takes care of all inbound transactions. I found that is
>>> working after hooking up our PCIe root complex to MMU-500; however, even
>>> with this optimized arm-smmu driver in v4.12, I'm still seeing a
>>> significant Ethernet throughput drop in both the TX and RX directions.
>>> The throughput drop is very significant at around 50% (but is already
>>> much improved compared to other prior kernel versions at 70~90%).
>>
>> Did Robin's experiments help at all with this?
>>
>> http://www.linux-arm.org/git?p=linux-rm.git;a=shortlog;h=refs/heads/iommu/perf
>>
> 
> It looks like these are new optimizations that have not yet been merged
> in v4.12? I'm going to give it a try.

Actually, most of the stuff there did land in 4.12 - only the
iommu/pgtable part is experimental stuff which hasn't been on the list
yet (but hopefully should be soon).

>>> One alternative is to only use MMU-500 for MSI writes towards
>>> GITS_TRANSLATER register in the GICv3, i.e., if I can define a specific
>>> region of physical address that I want MMU-500 to act on and leave the
>>> rest of inbound transactions to be handled directly by our PCIe
>>> controller, it can potentially work around the HW bug we have and at the
>>> same time achieve optimal throughput.
>>
>> I don't think you can bypass the SMMU for MSIs unless you give them their
>> own StreamIDs, which is likely to break things horribly in the kernel. You
>> could try to create an identity mapping, but you'll still have the
>> translation overhead and you'd probably end up having to supply your own DMA
>> ops to manage the address space. I'm assuming that you need to prevent the
>> physical address of the ITS from being allocated as an IOVA?
> 
> Will, is that a HW limitation that the SMMU cannot be used, only for MSI
> writes, in which case, the physical address range is very specific in
> our ASIC that falls in the device memory region (e.g., below 0x8000)?

Yes, either translation is enabled or it isn't - we don't have
GART-style apertures. To segregate by address the best you can do is set
up the page tables to identity-map all of the "untranslated" address
space. As Will mentioned, if MSI writes could be distinguished from DMA
writes by Stream ID, rather than by address, then there would be more
options, but in the PCI case at least that's not generally possible.

Robin.

> In fact, what I need in this case is a static mapping from IOMMU on the
> physical address of the GITS_TRANSLATER of the GICv3 ITS, which is the
> address that MSI writes go to. This is to bypass the MSI forwarding
> logic in our PCIe controller. At the same time, I can leave the rest of
> inbound transactions to be handled by our PCIe controller without going
> through the MMU.
> 
>>
>>> Any feedback from you is greatly appreciated!
>>
>> Fix the hardware ;)
> 
> Indeed that has to happen with the next revision of the ASIC. But as you
> can see I'm getting quite desperate here trying to find an interim solution.
> 
>>
>> Will
>>
> 
> Thanks for the help!
> 
> Ray
> 



Re: Device address specific mapping of arm,mmu-500

2017-05-30 Thread Ray Jui via iommu
Hi Marc,

On 5/30/17 9:59 AM, Marc Zyngier wrote:
> On 30/05/17 17:49, Ray Jui wrote:
>> Hi Will,
>>
>> On 5/30/17 8:14 AM, Will Deacon wrote:
>>> On Mon, May 29, 2017 at 06:18:45PM -0700, Ray Jui wrote:
 I'm writing to check with you to see if the latest arm-smmu.c driver in
 v4.12-rc Linux for smmu-500 can support mapping that is only specific to
 a particular physical address range while leave the rest still to be
 handled by the client device. I believe this can already be supported by
 the device tree binding of the generic IOMMU framework; however, it is
 not clear to me whether or not the arm-smmu.c driver can support it.

 To give you some background information:

 We have a SoC that has PCIe root complex that has a build-in logic block
 to forward MSI writes to ARM GICv3 ITS. Unfortunately, this logic block
 has a HW bug that causes the MSI writes not parsed properly and can
 potentially corrupt data in the internal FIFO. A workaround is to have
 ARM MMU-500 takes care of all inbound transactions. I found that is
 working after hooking up our PCIe root complex to MMU-500; however, even
 with this optimized arm-smmu driver in v4.12, I'm still seeing a
 significant Ethernet throughput drop in both the TX and RX directions.
 The throughput drop is very significant at around 50% (but is already
 much improved compared to other prior kernel versions at 70~90%).
>>>
>>> Did Robin's experiments help at all with this?
>>>
>>> http://www.linux-arm.org/git?p=linux-rm.git;a=shortlog;h=refs/heads/iommu/perf
>>>
>>
>> It looks like these are new optimizations that have not yet been merged
>> in v4.12? I'm going to give it a try.
>>
 One alternative is to only use MMU-500 for MSI writes towards
 GITS_TRANSLATER register in the GICv3, i.e., if I can define a specific
 region of physical address that I want MMU-500 to act on and leave the
 rest of inbound transactions to be handled directly by our PCIe
 controller, it can potentially work around the HW bug we have and at the
 same time achieve optimal throughput.
>>>
>>> I don't think you can bypass the SMMU for MSIs unless you give them their
>>> own StreamIDs, which is likely to break things horribly in the kernel. You
>>> could try to create an identity mapping, but you'll still have the
>>> translation overhead and you'd probably end up having to supply your own DMA
>>> ops to manage the address space. I'm assuming that you need to prevent the
>>> physical address of the ITS from being allocated as an IOVA?
>>
>> Will, is that a HW limitation that the SMMU cannot be used, only for MSI
>> writes, in which case, the physical address range is very specific in
>> our ASIC that falls in the device memory region (e.g., below 0x8000)?
>>
>> In fact, what I need in this case is a static mapping from IOMMU on the
>> physical address of the GITS_TRANSLATER of the GICv3 ITS, which is the
>> address that MSI writes go to. This is to bypass the MSI forwarding
>> logic in our PCIe controller. At the same time, I can leave the rest of
>> inbound transactions to be handled by our PCIe controller without going
>> through the MMU.
> 
> How is that going to work for DMA? I imagine your network interfaces do
> have to access memory, don't they? How can the transactions be
> terminated in the PCIe controller?

Sorry, I may not have phrased this properly. These inbound transactions (DMA
write to DDR, from endpoint) do not terminate in the PCIe controller.
They are taken by the PCIe controller as PCIe transactions and will be
carried towards the designated memory on the host.

> 
> Thanks,
> 
>   M.
> 


Re: Device address specific mapping of arm,mmu-500

2017-05-30 Thread Marc Zyngier
On 30/05/17 17:49, Ray Jui wrote:
> Hi Will,
> 
> On 5/30/17 8:14 AM, Will Deacon wrote:
>> On Mon, May 29, 2017 at 06:18:45PM -0700, Ray Jui wrote:
>>> I'm writing to check with you to see if the latest arm-smmu.c driver in
>>> v4.12-rc Linux for smmu-500 can support mapping that is only specific to
>>> a particular physical address range while leave the rest still to be
>>> handled by the client device. I believe this can already be supported by
>>> the device tree binding of the generic IOMMU framework; however, it is
>>> not clear to me whether or not the arm-smmu.c driver can support it.
>>>
>>> To give you some background information:
>>>
>>> We have a SoC that has PCIe root complex that has a build-in logic block
>>> to forward MSI writes to ARM GICv3 ITS. Unfortunately, this logic block
>>> has a HW bug that causes the MSI writes not parsed properly and can
>>> potentially corrupt data in the internal FIFO. A workaround is to have
>>> ARM MMU-500 takes care of all inbound transactions. I found that is
>>> working after hooking up our PCIe root complex to MMU-500; however, even
>>> with this optimized arm-smmu driver in v4.12, I'm still seeing a
>>> significant Ethernet throughput drop in both the TX and RX directions.
>>> The throughput drop is very significant at around 50% (but is already
>>> much improved compared to other prior kernel versions at 70~90%).
>>
>> Did Robin's experiments help at all with this?
>>
>> http://www.linux-arm.org/git?p=linux-rm.git;a=shortlog;h=refs/heads/iommu/perf
>>
> 
> It looks like these are new optimizations that have not yet been merged
> in v4.12? I'm going to give it a try.
> 
>>> One alternative is to only use MMU-500 for MSI writes towards
>>> GITS_TRANSLATER register in the GICv3, i.e., if I can define a specific
>>> region of physical address that I want MMU-500 to act on and leave the
>>> rest of inbound transactions to be handled directly by our PCIe
>>> controller, it can potentially work around the HW bug we have and at the
>>> same time achieve optimal throughput.
>>
>> I don't think you can bypass the SMMU for MSIs unless you give them their
>> own StreamIDs, which is likely to break things horribly in the kernel. You
>> could try to create an identity mapping, but you'll still have the
>> translation overhead and you'd probably end up having to supply your own DMA
>> ops to manage the address space. I'm assuming that you need to prevent the
>> physical address of the ITS from being allocated as an IOVA?
> 
> Will, is that a HW limitation that the SMMU cannot be used, only for MSI
> writes, in which case, the physical address range is very specific in
> our ASIC that falls in the device memory region (e.g., below 0x8000)?
> 
> In fact, what I need in this case is a static mapping from IOMMU on the
> physical address of the GITS_TRANSLATER of the GICv3 ITS, which is the
> address that MSI writes go to. This is to bypass the MSI forwarding
> logic in our PCIe controller. At the same time, I can leave the rest of
> inbound transactions to be handled by our PCIe controller without going
> through the MMU.

How is that going to work for DMA? I imagine your network interfaces do
have to access memory, don't they? How can the transactions be
terminated in the PCIe controller?

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


Re: Device address specific mapping of arm,mmu-500

2017-05-30 Thread Ray Jui via iommu
Hi Will,

On 5/30/17 8:14 AM, Will Deacon wrote:
> On Mon, May 29, 2017 at 06:18:45PM -0700, Ray Jui wrote:
>> I'm writing to check with you to see if the latest arm-smmu.c driver in
>> v4.12-rc Linux for smmu-500 can support mapping that is only specific to
>> a particular physical address range while leave the rest still to be
>> handled by the client device. I believe this can already be supported by
>> the device tree binding of the generic IOMMU framework; however, it is
>> not clear to me whether or not the arm-smmu.c driver can support it.
>>
>> To give you some background information:
>>
>> We have a SoC that has PCIe root complex that has a build-in logic block
>> to forward MSI writes to ARM GICv3 ITS. Unfortunately, this logic block
>> has a HW bug that causes the MSI writes not parsed properly and can
>> potentially corrupt data in the internal FIFO. A workaround is to have
>> ARM MMU-500 takes care of all inbound transactions. I found that is
>> working after hooking up our PCIe root complex to MMU-500; however, even
>> with this optimized arm-smmu driver in v4.12, I'm still seeing a
>> significant Ethernet throughput drop in both the TX and RX directions.
>> The throughput drop is very significant at around 50% (but is already
>> much improved compared to other prior kernel versions at 70~90%).
> 
> Did Robin's experiments help at all with this?
> 
> http://www.linux-arm.org/git?p=linux-rm.git;a=shortlog;h=refs/heads/iommu/perf
> 

It looks like these are new optimizations that have not yet been merged
in v4.12? I'm going to give it a try.

>> One alternative is to only use MMU-500 for MSI writes towards
>> GITS_TRANSLATER register in the GICv3, i.e., if I can define a specific
>> region of physical address that I want MMU-500 to act on and leave the
>> rest of inbound transactions to be handled directly by our PCIe
>> controller, it can potentially work around the HW bug we have and at the
>> same time achieve optimal throughput.
> 
> I don't think you can bypass the SMMU for MSIs unless you give them their
> own StreamIDs, which is likely to break things horribly in the kernel. You
> could try to create an identity mapping, but you'll still have the
> translation overhead and you'd probably end up having to supply your own DMA
> ops to manage the address space. I'm assuming that you need to prevent the
> physical address of the ITS from being allocated as an IOVA?

Will, is it a HW limitation that the SMMU cannot be used only for MSI
writes? In our case the physical address range involved is very specific
and falls in the ASIC's device memory region (e.g., below 0x8000).

In fact, what I need in this case is a static mapping from IOMMU on the
physical address of the GITS_TRANSLATER of the GICv3 ITS, which is the
address that MSI writes go to. This is to bypass the MSI forwarding
logic in our PCIe controller. At the same time, I can leave the rest of
inbound transactions to be handled by our PCIe controller without going
through the MMU.

> 
>> Any feedback from you is greatly appreciated!
> 
> Fix the hardware ;)

Indeed that has to happen with the next revision of the ASIC. But as you
can see I'm getting quite desperate here trying to find an interim solution.

> 
> Will
> 

Thanks for the help!

Ray


Re: [PATCH v5 17/32] x86/mm: Add support to access boot related data in the clear

2017-05-30 Thread Tom Lendacky

On 5/21/2017 2:16 AM, Borislav Petkov wrote:

On Fri, May 19, 2017 at 03:50:32PM -0500, Tom Lendacky wrote:

The "worker" function would be doing the loop through the setup data,
but since the setup data is mapped inside the loop I can't do the __init
calling the non-init function and still hope to consolidate the code.
Maybe I'm missing something here...


Hmm, I see what you mean. But the below change ontop doesn't fire any
warnings here. Maybe your .config has something set which I don't...


Check if you have CONFIG_DEBUG_SECTION_MISMATCH=y

Thanks,
Tom



---
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 55317ba3b6dc..199c983192ae 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -515,71 +515,50 @@ static bool memremap_is_efi_data(resource_size_t phys_addr,
   * Examine the physical address to determine if it is boot data by checking
   * it against the boot params setup_data chain.
   */
-static bool memremap_is_setup_data(resource_size_t phys_addr,
-  unsigned long size)
+static bool
+__memremap_is_setup_data(resource_size_t phys_addr, unsigned long size, bool early)
  {
struct setup_data *data;
u64 paddr, paddr_next;
+   u32 len;
  
  	paddr = boot_params.hdr.setup_data;

while (paddr) {
-   bool is_setup_data = false;
  
  		if (phys_addr == paddr)

return true;
  
-		data = memremap(paddr, sizeof(*data),
-				MEMREMAP_WB | MEMREMAP_DEC);
+   if (early)
+   data = early_memremap_decrypted(paddr, sizeof(*data));
+   else
+   data = memremap(paddr, sizeof(*data), MEMREMAP_WB | MEMREMAP_DEC);
  
  		paddr_next = data->next;

+   len = data->len;
  
-		if ((phys_addr > paddr) && (phys_addr < (paddr + data->len)))

-   is_setup_data = true;
+   if (early)
+   early_memunmap(data, sizeof(*data));
+   else
+   memunmap(data);
  
-		memunmap(data);
  
-		if (is_setup_data)

+   if ((phys_addr > paddr) && (phys_addr < (paddr + data->len)))
return true;
  
  		paddr = paddr_next;

}
-
return false;
  }
  
-/*

- * Examine the physical address to determine if it is boot data by checking
- * it against the boot params setup_data chain (early boot version).
- */
  static bool __init early_memremap_is_setup_data(resource_size_t phys_addr,
unsigned long size)
  {
-   struct setup_data *data;
-   u64 paddr, paddr_next;
-
-   paddr = boot_params.hdr.setup_data;
-   while (paddr) {
-   bool is_setup_data = false;
-
-   if (phys_addr == paddr)
-   return true;
-
-   data = early_memremap_decrypted(paddr, sizeof(*data));
-
-   paddr_next = data->next;
-
-   if ((phys_addr > paddr) && (phys_addr < (paddr + data->len)))
-   is_setup_data = true;
-
-   early_memunmap(data, sizeof(*data));
-
-   if (is_setup_data)
-   return true;
-
-   paddr = paddr_next;
-   }
+   return __memremap_is_setup_data(phys_addr, size, true);
+}
  
-	return false;

+static bool memremap_is_setup_data(resource_size_t phys_addr, unsigned long size)
+{
+   return __memremap_is_setup_data(phys_addr, size, false);
  }
  
  /*





Re: [PATCH v5 29/32] x86/mm: Add support to encrypt the kernel in-place

2017-05-30 Thread Tom Lendacky

On 5/26/2017 11:25 AM, Borislav Petkov wrote:

On Thu, May 25, 2017 at 05:24:27PM -0500, Tom Lendacky wrote:

I guess I could do that, but this will probably only end up clearing a
single PGD entry anyway since it's highly doubtful the address range
would cross a 512GB boundary.


Or you can compute how many 512G-covering, i.e., PGD entries there are
and clear just the right amount. :^)


I can change the name. As for the use of ENTRY... without the
ENTRY/ENDPROC combination I was receiving a warning about a return
instruction outside of a callable function. It looks like I can just
define the "sme_enc_routine:" label with the ENDPROC and the warning
goes away and the global is avoided. It doesn't like the local labels
(.L...) so I'll use the new name.


Is that warning from objtool or where does it come from?


Yes, it's from objtool:

arch/x86/mm/mem_encrypt_boot.o: warning: objtool: .text+0xd2: return instruction outside of a callable function




How do I trigger it locally


I think having CONFIG_STACK_VALIDATION=y will trigger it.




The hardware will try to optimize rep movsb into large chunks assuming
things are aligned, sizes are large enough, etc. so we don't have to
explicitly specify and setup for a rep movsq.


I thought the hw does that for movsq too?


It does.

Thanks,
Tom






Re: [PATCH v5 32/32] x86/mm: Add support to make use of Secure Memory Encryption

2017-05-30 Thread Tom Lendacky

On 5/19/2017 3:16 PM, Josh Poimboeuf wrote:

On Fri, May 19, 2017 at 01:30:05PM +0200, Borislav Petkov wrote:

it is called so early. I can get past it by adding:

CFLAGS_mem_encrypt.o := $(nostackp)

in the arch/x86/mm/Makefile, but that obviously eliminates the support
for the whole file.  Would it be better to split out the sme_enable()
and other boot routines into a separate file or just apply the
$(nostackp) to the whole file?


Josh might have a better idea here... CCed.


I'm the stack validation guy, not the stack protection guy :-)

But there is a way to disable compiler options on a per-function basis
with the gcc __optimize__ function attribute.  For example:

   __attribute__((__optimize__("no-stack-protector")))



I'll look at doing that instead of removing the support for the whole
file.

Thanks,
Tom


Re: [PATCH v5 32/32] x86/mm: Add support to make use of Secure Memory Encryption

2017-05-30 Thread Tom Lendacky

On 5/19/2017 6:30 AM, Borislav Petkov wrote:

On Fri, Apr 21, 2017 at 01:56:13PM -0500, Tom Lendacky wrote:

On 4/18/2017 4:22 PM, Tom Lendacky wrote:

Add support to check if SME has been enabled and if memory encryption
should be activated (checking of command line option based on the
configuration of the default state).  If memory encryption is to be
activated, then the encryption mask is set and the kernel is encrypted
"in place."

Signed-off-by: Tom Lendacky 
---
  arch/x86/kernel/head_64.S |1 +
  arch/x86/mm/mem_encrypt.c |   83 +++--
  2 files changed, 80 insertions(+), 4 deletions(-)



...



-unsigned long __init sme_enable(void)
+unsigned long __init sme_enable(struct boot_params *bp)
  {
+   const char *cmdline_ptr, *cmdline_arg, *cmdline_on, *cmdline_off;
+   unsigned int eax, ebx, ecx, edx;
+   unsigned long me_mask;
+   bool active_by_default;
+   char buffer[16];


So it turns out that when KASLR is enabled (CONFIG_RAMDOMIZE_BASE=y)
the stack-protector support causes issues with this function because


What issues?


The stack protection support makes use of the gs segment register, and
at this point not everything is set up properly to allow it to work,
so it segfaults.

Thanks,
Tom




it is called so early. I can get past it by adding:

CFLAGS_mem_encrypt.o := $(nostackp)

in the arch/x86/mm/Makefile, but that obviously eliminates the support
for the whole file.  Would it be better to split out the sme_enable()
and other boot routines into a separate file or just apply the
$(nostackp) to the whole file?


Josh might have a better idea here... CCed.




Re: [PATCH v5 32/32] x86/mm: Add support to make use of Secure Memory Encryption

2017-05-30 Thread Tom Lendacky
On 5/30/2017 9:55 AM, Borislav Petkov wrote:
> On Tue, May 30, 2017 at 09:38:36AM -0500, Tom Lendacky wrote:
>> In this case we're running identity mapped and the "on" constant ends up
>> as kernel address (0x81...) which results in a segfault.
> 
> Would
> 
>   static const char *__on_str = "on";
> 
>   ...
> 
>   if (!strncmp(buffer, __pa_nodebug(__on_str), 2))
>   ...
> 
> work?
> 
> __phys_addr_nodebug() seems to pay attention to phys_base and
> PAGE_OFFSET and so on...

Except that phys_base hasn't been adjusted yet so that doesn't work
either.

> 
> I'd like to avoid that rip-relative address finding in inline asm which
> looks fragile to me.

I can define the command line option and the "on" and "off" values as
character buffers in the function and initialize them on a per character
basis (using a static string causes the same issues as referencing a
string constant), i.e.:

char cmdline_arg[] = {'m', 'e', 'm', '_', 'e', 'n', 'c', 'r', 'y', 'p', 't', '\0'};
char cmdline_off[] = {'o', 'f', 'f', '\0'};
char cmdline_on[] = {'o', 'n', '\0'};

It doesn't look the greatest, but it works and removes the need for the
rip-relative addressing.

Thanks,
Tom

> 
> Thanks.
> 


Re: [PATCH v5 32/32] x86/mm: Add support to make use of Secure Memory Encryption

2017-05-30 Thread Borislav Petkov
On Tue, May 30, 2017 at 09:38:36AM -0500, Tom Lendacky wrote:
> In this case we're running identity mapped and the "on" constant ends up
> as kernel address (0x81...) which results in a segfault.

Would

static const char *__on_str = "on";

...

if (!strncmp(buffer, __pa_nodebug(__on_str), 2))
...

work?

__phys_addr_nodebug() seems to pay attention to phys_base and
PAGE_OFFSET and so on...

I'd like to avoid that rip-relative address finding in inline asm which
looks fragile to me.

Thanks.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH v5 32/32] x86/mm: Add support to make use of Secure Memory Encryption

2017-05-30 Thread Tom Lendacky

On 5/19/2017 6:27 AM, Borislav Petkov wrote:

On Tue, Apr 18, 2017 at 04:22:23PM -0500, Tom Lendacky wrote:

Add support to check if SME has been enabled and if memory encryption
should be activated (checking of command line option based on the
configuration of the default state).  If memory encryption is to be
activated, then the encryption mask is set and the kernel is encrypted
"in place."

Signed-off-by: Tom Lendacky 
---
  arch/x86/kernel/head_64.S |1 +
  arch/x86/mm/mem_encrypt.c |   83 +++--
  2 files changed, 80 insertions(+), 4 deletions(-)


...


+unsigned long __init sme_enable(struct boot_params *bp)
  {
+   const char *cmdline_ptr, *cmdline_arg, *cmdline_on, *cmdline_off;
+   unsigned int eax, ebx, ecx, edx;
+   unsigned long me_mask;
+   bool active_by_default;
+   char buffer[16];
+   u64 msr;
+
+   /* Check for the SME support leaf */
+   eax = 0x80000000;
+   ecx = 0;
+   native_cpuid(&eax, &ebx, &ecx, &edx);
+   if (eax < 0x8000001f)
+   goto out;
+
+   /*
+* Check for the SME feature:
+*   CPUID Fn8000_001F[EAX] - Bit 0
+* Secure Memory Encryption support
+*   CPUID Fn8000_001F[EBX] - Bits 5:0
+* Pagetable bit position used to indicate encryption
+*/
+   eax = 0x8000001f;
+   ecx = 0;
+   native_cpuid(&eax, &ebx, &ecx, &edx);
+   if (!(eax & 1))
+   goto out;


< newline here.


+   me_mask = 1UL << (ebx & 0x3f);
+
+   /* Check if SME is enabled */
+   msr = __rdmsr(MSR_K8_SYSCFG);
+   if (!(msr & MSR_K8_SYSCFG_MEM_ENCRYPT))
+   goto out;
+
+   /*
+* Fixups have not been applied to phys_base yet, so we must obtain
+* the address to the SME command line option data in the following
+* way.
+*/
+   asm ("lea sme_cmdline_arg(%%rip), %0"
+: "=r" (cmdline_arg)
+: "p" (sme_cmdline_arg));
+   asm ("lea sme_cmdline_on(%%rip), %0"
+: "=r" (cmdline_on)
+: "p" (sme_cmdline_on));
+   asm ("lea sme_cmdline_off(%%rip), %0"
+: "=r" (cmdline_off)
+: "p" (sme_cmdline_off));
+
+   if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT))
+   active_by_default = true;
+   else
+   active_by_default = false;
+
+   cmdline_ptr = (const char *)((u64)bp->hdr.cmd_line_ptr |
+((u64)bp->ext_cmd_line_ptr << 32));
+
+   cmdline_find_option(cmdline_ptr, cmdline_arg, buffer, sizeof(buffer));
+
+   if (strncmp(buffer, cmdline_on, sizeof(buffer)) == 0)
+   sme_me_mask = me_mask;


Why doesn't simply

if (!strncmp(buffer, "on", 2))
...

work?


In this case we're running identity mapped, and the "on" constant ends up
as a kernel address (0x81...), which results in a segfault.

Thanks,
Tom






[PATCH v7 3/3] iommu/arm-smmu-v3: Add workaround for Cavium ThunderX2 erratum #126

2017-05-30 Thread Geetha sowjanya
From: Geetha Sowjanya 

Cavium ThunderX2 SMMU doesn't support MSI and also doesn't have unique irq
lines for gerror, eventq and cmdq-sync.

This patch addresses the issue by checking whether any interrupt sources
use the same irq number; if so, they are registered as shared irqs.

Signed-off-by: Geetha Sowjanya 
---
 Documentation/arm64/silicon-errata.txt |1 +
 drivers/iommu/arm-smmu-v3.c|   29 +
 2 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/Documentation/arm64/silicon-errata.txt b/Documentation/arm64/silicon-errata.txt
index 4693a32..42422f6 100644
--- a/Documentation/arm64/silicon-errata.txt
+++ b/Documentation/arm64/silicon-errata.txt
@@ -63,6 +63,7 @@ stable kernels.
 | Cavium         | ThunderX Core   | #27456          | CAVIUM_ERRATUM_27456        |
 | Cavium         | ThunderX SMMUv2 | #27704          | N/A                         |
 | Cavium         | ThunderX2 SMMUv3| #74             | N/A                         |
+| Cavium         | ThunderX2 SMMUv3| #126            | N/A                         |
 |                |                 |                 |                             |
 | Freescale/NXP  | LS2080A/LS1043A | A-008585        | FSL_ERRATUM_A008585         |
 |                |                 |                 |                             |
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 4e80205..d2db01f 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -2232,6 +2232,25 @@ static void arm_smmu_setup_msis(struct arm_smmu_device *smmu)
devm_add_action(dev, arm_smmu_free_msis, dev);
 }
 
+static int get_irq_flags(struct arm_smmu_device *smmu, int irq)
+{
+   int match_count = 0;
+
+   if (irq == smmu->evtq.q.irq)
+   match_count++;
+   if (irq == smmu->cmdq.q.irq)
+   match_count++;
+   if (irq == smmu->gerr_irq)
+   match_count++;
+   if (irq == smmu->priq.q.irq)
+   match_count++;
+
+   if (match_count > 1)
+   return IRQF_SHARED | IRQF_ONESHOT;
+
+   return IRQF_ONESHOT;
+}
+
 static int arm_smmu_setup_irqs(struct arm_smmu_device *smmu)
 {
int ret, irq;
@@ -2252,7 +2271,7 @@ static int arm_smmu_setup_irqs(struct arm_smmu_device *smmu)
if (irq) {
ret = devm_request_threaded_irq(smmu->dev, irq, NULL,
arm_smmu_evtq_thread,
-   IRQF_ONESHOT,
+   get_irq_flags(smmu, irq),
"arm-smmu-v3-evtq", smmu);
if (ret < 0)
dev_warn(smmu->dev, "failed to enable evtq irq\n");
@@ -2261,7 +2280,8 @@ static int arm_smmu_setup_irqs(struct arm_smmu_device *smmu)
irq = smmu->cmdq.q.irq;
if (irq) {
ret = devm_request_irq(smmu->dev, irq,
-  arm_smmu_cmdq_sync_handler, 0,
+  arm_smmu_cmdq_sync_handler,
+  get_irq_flags(smmu, irq),
   "arm-smmu-v3-cmdq-sync", smmu);
if (ret < 0)
dev_warn(smmu->dev, "failed to enable cmdq-sync irq\n");
@@ -2270,7 +2290,8 @@ static int arm_smmu_setup_irqs(struct arm_smmu_device *smmu)
irq = smmu->gerr_irq;
if (irq) {
ret = devm_request_irq(smmu->dev, irq, arm_smmu_gerror_handler,
-  0, "arm-smmu-v3-gerror", smmu);
+   get_irq_flags(smmu, irq),
+   "arm-smmu-v3-gerror", smmu);
if (ret < 0)
dev_warn(smmu->dev, "failed to enable gerror irq\n");
}
@@ -2280,7 +2301,7 @@ static int arm_smmu_setup_irqs(struct arm_smmu_device *smmu)
if (irq) {
ret = devm_request_threaded_irq(smmu->dev, irq, NULL,
arm_smmu_priq_thread,
-   IRQF_ONESHOT,
+   get_irq_flags(smmu, irq),
"arm-smmu-v3-priq",
smmu);
if (ret < 0)
-- 
1.7.1



[PATCH v7 2/3] iommu/arm-smmu-v3: Add workaround for Cavium ThunderX2 erratum #74

2017-05-30 Thread Geetha sowjanya
From: Linu Cherian 

Cavium ThunderX2 SMMU implementation doesn't support page 1 register space
and the PAGE0_REGS_ONLY option is enabled as an errata workaround.
This option, when turned on, replaces all page 1 offsets used for
EVTQ_PROD/CONS, PRIQ_PROD/CONS register access with page 0 offsets.

SMMU resource size checks are now based on the SMMU option PAGE0_REGS_ONLY,
since the resource size can be either 64k or 128k.
For this, arm_smmu_device_dt_probe/acpi_probe have been moved before the
platform_get_resource call, so that SMMU options are set beforehand.

Signed-off-by: Linu Cherian 
Signed-off-by: Geetha Sowjanya 
---
 Documentation/arm64/silicon-errata.txt |1 +
 .../devicetree/bindings/iommu/arm,smmu-v3.txt  |6 ++
 drivers/iommu/arm-smmu-v3.c|   64 +++-
 3 files changed, 56 insertions(+), 15 deletions(-)

diff --git a/Documentation/arm64/silicon-errata.txt b/Documentation/arm64/silicon-errata.txt
index 10f2ddd..4693a32 100644
--- a/Documentation/arm64/silicon-errata.txt
+++ b/Documentation/arm64/silicon-errata.txt
@@ -62,6 +62,7 @@ stable kernels.
 | Cavium | ThunderX GICv3  | #23154  | CAVIUM_ERRATUM_23154|
 | Cavium | ThunderX Core   | #27456  | CAVIUM_ERRATUM_27456|
 | Cavium | ThunderX SMMUv2 | #27704  | N/A |
+| Cavium | ThunderX2 SMMUv3| #74 | N/A |
 || | | |
 | Freescale/NXP  | LS2080A/LS1043A | A-008585| FSL_ERRATUM_A008585 |
 || | | |
diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt b/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
index be57550..607e270 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt
@@ -49,6 +49,12 @@ the PCIe specification.
 - hisilicon,broken-prefetch-cmd
 : Avoid sending CMD_PREFETCH_* commands to the SMMU.
 
+- cavium,cn9900-broken-page1-regspace
+: Replaces all page 1 offsets used for EVTQ_PROD/CONS,
+   PRIQ_PROD/CONS register access with page 0 offsets.
+   Set for Cavium ThunderX2 silicon that doesn't support
+   SMMU page1 register space.
+
 ** Example
 
 smmu@2b40 {
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 380969a..4e80205 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -412,6 +412,9 @@
 #define MSI_IOVA_BASE  0x800
 #define MSI_IOVA_LENGTH0x10
 
+#define ARM_SMMU_PAGE0_REGS_ONLY(smmu) \
+   ((smmu)->options & ARM_SMMU_OPT_PAGE0_REGS_ONLY)
+
 static bool disable_bypass;
 module_param_named(disable_bypass, disable_bypass, bool, S_IRUGO);
 MODULE_PARM_DESC(disable_bypass,
@@ -597,6 +600,7 @@ struct arm_smmu_device {
u32 features;
 
 #define ARM_SMMU_OPT_SKIP_PREFETCH (1 << 0)
+#define ARM_SMMU_OPT_PAGE0_REGS_ONLY   (1 << 1)
u32 options;
 
struct arm_smmu_cmdqcmdq;
@@ -663,9 +667,19 @@ struct arm_smmu_option_prop {
 
 static struct arm_smmu_option_prop arm_smmu_options[] = {
{ ARM_SMMU_OPT_SKIP_PREFETCH, "hisilicon,broken-prefetch-cmd" },
+   { ARM_SMMU_OPT_PAGE0_REGS_ONLY, "cavium,cn9900-broken-page1-regspace"},
{ 0, NULL},
 };
 
+static inline void __iomem *arm_smmu_page1_fixup(unsigned long offset,
+struct arm_smmu_device *smmu)
+{
+   if (offset > SZ_64K && ARM_SMMU_PAGE0_REGS_ONLY(smmu))
+   offset -= SZ_64K;
+
+   return smmu->base + offset;
+}
+
 static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
 {
return container_of(dom, struct arm_smmu_domain, domain);
@@ -1961,8 +1975,8 @@ static int arm_smmu_init_one_queue(struct arm_smmu_device *smmu,
return -ENOMEM;
}
 
-   q->prod_reg = smmu->base + prod_off;
-   q->cons_reg = smmu->base + cons_off;
+   q->prod_reg = arm_smmu_page1_fixup(prod_off, smmu);
+   q->cons_reg = arm_smmu_page1_fixup(cons_off, smmu);
q->ent_dwords   = dwords;
 
q->q_base  = Q_BASE_RWA;
@@ -2363,8 +2377,10 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
 
/* Event queue */
writeq_relaxed(smmu->evtq.q.q_base, smmu->base + ARM_SMMU_EVTQ_BASE);
-   writel_relaxed(smmu->evtq.q.prod, smmu->base + ARM_SMMU_EVTQ_PROD);
-   writel_relaxed(smmu->evtq.q.cons, smmu->base + 

[PATCH v7 1/3] ACPI/IORT: Fixup SMMUv3 resource size for Cavium ThunderX2 SMMUv3 model

2017-05-30 Thread Geetha sowjanya
From: Linu Cherian 

Cavium ThunderX2 implementation doesn't support the second page in the SMMU
register space. Hence, the resource size is set to 64k for this model.

Signed-off-by: Linu Cherian 
Signed-off-by: Geetha Sowjanya 
---
 drivers/acpi/arm64/iort.c |   10 +-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index c5fecf9..bba2b59 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -833,12 +833,20 @@ static void __init arm_smmu_v3_init_resources(struct resource *res,
 {
struct acpi_iort_smmu_v3 *smmu;
int num_res = 0;
+   unsigned long size = SZ_128K;
 
/* Retrieve SMMUv3 specific data */
smmu = (struct acpi_iort_smmu_v3 *)node->node_data;
 
+   /*
+* Override the size, for Cavium ThunderX2 implementation
+* which doesn't support the page 1 SMMU register space.
+*/
+   if (smmu->model == ACPI_IORT_SMMU_CAVIUM_CN99XX)
+   size = SZ_64K;
+
res[num_res].start = smmu->base_address;
-   res[num_res].end = smmu->base_address + SZ_128K - 1;
+   res[num_res].end = smmu->base_address + size - 1;
res[num_res].flags = IORESOURCE_MEM;
 
num_res++;
-- 
1.7.1



[PATCH v7 0/3] Cavium ThunderX2 SMMUv3 errata workarounds

2017-05-30 Thread Geetha sowjanya
Cavium ThunderX2 SMMUv3 implementation has two silicon errata:
1. Erratum #74
   SMMU register alias Page 1 is not implemented
2. Erratum #126
   SMMU doesn't support unique IRQ lines or MSI for gerror,
   eventq and cmdq-sync

The following patchset implements software workarounds for these two errata.

This series is based on the patchset:
https://www.spinics.net/lists/arm-kernel/msg578443.html

Changes since v6:
   - Changed device tree compatible string to vendor specific.
   - Rebased on Robin's latest "Update SMMU models for IORT rev. C" v2 patch.
 https://www.spinics.net/lists/arm-kernel/msg582809.html

Changes since v5:
  - Rebased on Robin's "Update SMMU models for IORT rev. C" patch.
 https://www.spinics.net/lists/arm-kernel/msg580728.html
  - Replaced ACPI_IORT_SMMU_V3_CAVIUM_CN99XX macro with ACPI_IORT_SMMU_CAVIUM_CN99XX

Changes since v4:
 - Replaced all page1 offset macros ARM_SMMU_EVTQ/PRIQ_PROD/CONS with
arm_smmu_page1_fixup(ARM_SMMU_EVTQ/PRIQ_PROD/CONS, smmu)

Changes since v3:
 - Merged patches 1, 2 and 4 of Version 3.
 - Modified the page1_offset_adjust() and get_irq_flags() implementation as
   suggested by Robin.

Changes since v2:
 - Updated "Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt" document
   with new SMMU option used to enable errata workaround.

Changes since v1:
 - Since the use of MIDR register is rejected and SMMU_IIDR is broken on this
   silicon, as suggested by Will Deacon modified the patches to use ThunderX2
   SMMUv3 IORT model number to enable errata workaround.

Geetha Sowjanya (1):
  iommu/arm-smmu-v3: Add workaround for Cavium ThunderX2 erratum #126

Linu Cherian (2):
  ACPI/IORT: Fixup SMMUv3 resource size for Cavium ThunderX2 SMMUv3 model
  iommu/arm-smmu-v3: Add workaround for Cavium ThunderX2 erratum #74

 Documentation/arm64/silicon-errata.txt |2 +
 .../devicetree/bindings/iommu/arm,smmu-v3.txt  |6 ++
 drivers/acpi/arm64/iort.c  |   10 ++-
 drivers/iommu/arm-smmu-v3.c|   93 
 4 files changed, 91 insertions(+), 20 deletions(-)



Re: [PATCH v2] iommu/arm-smmu-v3: Increase CMDQ drain timeout value

2017-05-30 Thread Will Deacon
Hi Sunil,

On Mon, May 29, 2017 at 02:41:59PM +0530, Sunil Kovvuri wrote:
> On Fri, May 5, 2017 at 4:47 PM,   wrote:
> > From: Sunil Goutham 
> >
> > Processing queue full of TLB invalidation commands might
> > take more time on some platforms than current timeout
> > of 100us. So increased drain timeout value.
> >
> > Also now udelay time is increased exponentially for each poll.
> >
> > Signed-off-by: Sunil Goutham 

[...]

> Gentle reminder for patch review.

Sorry, I forgot to push this out. Should be on my iommu/devel branch now.

Will


Re: [PATCH 6/7] iommu/arm-smmu-v3: Add support for PCI ATS

2017-05-30 Thread Jean-Philippe Brucker
On 30/05/17 11:28, Joerg Roedel wrote:
> On Wed, May 24, 2017 at 07:01:42PM +0100, Jean-Philippe Brucker wrote:
>> * TLB invalidation by range is batched and committed with a single sync.
>>   Batching ATC invalidation is inconvenient, endpoints limit the number of
>>   inflight invalidations. We'd have to count the number of invalidations
>>   queued and send a sync periodically. In addition, I suspect we always
>>   need a sync between TLB and ATC invalidation for the same page.
> 
> This sounds like the number of outstanding ATS invalidations is not
> managed by the SMMU hardware, is that right?

Yes, the hardware doesn't know about ATS queue depth, it simply forwards
invalidations to the root complex. Doing a sync on the SMMU command queue
waits for all completions of outstanding ATS invalidations, but it is up
to the driver to limit ATS invalidations according to queue depth.

Thanks,
Jean



Re: [PATCH 2/7] dt-bindings: PCI: Describe ATS property for root complex nodes

2017-05-30 Thread Jean-Philippe Brucker
On 30/05/17 11:01, Joerg Roedel wrote:
> On Wed, May 24, 2017 at 07:01:38PM +0100, Jean-Philippe Brucker wrote:
>> +- ats-supported: if present, the root complex supports the Address
>> +  Translation Service (ATS). It is able to interpret the AT field in PCIe
>> +  Transaction Layer Packets, and forward Translation Completions or
>> +  Invalidation Requests to endpoints.
>> +
>> +  Device drivers should not enable ATS in endpoints unless this property
>> +  is present.
> 
> Device drivers should never enable ATS, this is the job of the IOMMU
> driver. This should be clarified in the text.

Right, I will change this.

Thanks,
Jean



Re: [PATCH 6/7] iommu/arm-smmu-v3: Add support for PCI ATS

2017-05-30 Thread Joerg Roedel
On Wed, May 24, 2017 at 07:01:42PM +0100, Jean-Philippe Brucker wrote:
> * TLB invalidation by range is batched and committed with a single sync.
>   Batching ATC invalidation is inconvenient, endpoints limit the number of
>   inflight invalidations. We'd have to count the number of invalidations
>   queued and send a sync periodically. In addition, I suspect we always
>   need a sync between TLB and ATC invalidation for the same page.

This sounds like the number of outstanding ATS invalidations is not
managed by the SMMU hardware, is that right?


Joerg

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 2/7] dt-bindings: PCI: Describe ATS property for root complex nodes

2017-05-30 Thread Joerg Roedel
On Wed, May 24, 2017 at 07:01:38PM +0100, Jean-Philippe Brucker wrote:
> +- ats-supported: if present, the root complex supports the Address
> +  Translation Service (ATS). It is able to interpret the AT field in PCIe
> +  Transaction Layer Packets, and forward Translation Completions or
> +  Invalidation Requests to endpoints.
> +
> +  Device drivers should not enable ATS in endpoints unless this property
> +  is present.

Device drivers should never enable ATS, this is the job of the IOMMU
driver. This should be clarified in the text.



Re: [PATCH] iommu/amd: Propagate IOVA allocation failure

2017-05-30 Thread Joerg Roedel
On Fri, May 26, 2017 at 07:31:26PM +0100, Robin Murphy wrote:
> Unlike the old allocator, alloc_iova_fast() will return 0 if it failed
> to allocate a PFN. Since the callers of dma_ops_alloc_iova() would end
> up treating that as a valid address, translate it to the DMA_ERROR_CODE
> that they would expect.
> 
> Fixes: 256e4621c21a ("iommu/amd: Make use of the generic IOVA allocator")
> Signed-off-by: Robin Murphy 
> ---
> 
> Just something I spotted whilst comparing dma_map_page() callchains...
> 
>  drivers/iommu/amd_iommu.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> index 63cacf5d6cf2..489dc302899e 100644
> --- a/drivers/iommu/amd_iommu.c
> +++ b/drivers/iommu/amd_iommu.c
> @@ -1555,6 +1555,9 @@ static unsigned long dma_ops_alloc_iova(struct device *dev,
>   if (!pfn)
>   pfn = alloc_iova_fast(&dma_dom->iovad, pages,
> IOVA_PFN(dma_mask));
>  
> + if (!pfn)
> + return DMA_ERROR_CODE;
> +

That shouldn't make a difference on x86 because the DMA_ERROR_CODE is 0
as well.


Joerg



Re: [PATCH] iommu/vt-d: unwrap __get_valid_domain_for_dev()

2017-05-30 Thread Joerg Roedel
On Mon, May 22, 2017 at 06:28:51PM +0800, Peter Xu wrote:
> We do find_domain() in __get_valid_domain_for_dev(), while we do the
> same thing in get_valid_domain_for_dev().  No need to do it twice.
> 
> Signed-off-by: Peter Xu 
> ---
>  drivers/iommu/intel-iommu.c | 16 ++--
>  1 file changed, 2 insertions(+), 14 deletions(-)


Applied, thanks.



Re: [PATCH v6 1/6] iommu: of: Fix check for returning EPROBE_DEFER

2017-05-30 Thread Joerg Roedel
On Sat, May 27, 2017 at 07:17:40PM +0530, Sricharan R wrote:
> Now with IOMMU probe deferral, we return -EPROBE_DEFER
> for masters that are connected to an IOMMU which is not
> probed yet, but going to get probed, so that we can attach
> the correct dma_ops. So while trying to defer the probe of
> the master, check if the of_iommu node that it is connected
> to is marked in DT as 'status=disabled', then the IOMMU is never
> going to get probed. So simply return NULL and let the master
> work without an IOMMU.
> 
> Fixes: 7b07cbefb68d ("iommu: of: Handle IOMMU lookup failure with deferred 
> probing or error")
> Reported-by: Geert Uytterhoeven 
> Reviewed-by: Laurent Pinchart 
> Tested-by: Will Deacon 
> Tested-by: Magnus Damn 
> Acked-by: Will Deacon 
> Signed-off-by: Sricharan R 
> ---
>  drivers/iommu/of_iommu.c | 1 +
>  1 file changed, 1 insertion(+)

Sorry for the confusion, I meant to apply this version of the patch-set
instead of v5.



Re: [PATCH v5 1/5] iommu: of: Fix check for returning EPROBE_DEFER

2017-05-30 Thread Joerg Roedel
On Tue, May 23, 2017 at 06:31:29PM +0530, Sricharan R wrote:
> Now with IOMMU probe deferral, we return -EPROBE_DEFER
> for masters that are connected to an IOMMU which is not
> probed yet, but going to get probed, so that we can attach
> the correct dma_ops. So while trying to defer the probe of
> the master, check if the of_iommu node that it is connected
> to is marked in DT as 'status=disabled', then the IOMMU is never
> going to get probed. So simply return NULL and let the master
> work without an IOMMU.
> 
> Fixes: 7b07cbefb68d ("iommu: of: Handle IOMMU lookup failure with deferred 
> probing or error")
> Signed-off-by: Sricharan R 
> Reported-by: Geert Uytterhoeven 
> Reviewed-by: Laurent Pinchart 
> Tested-by: Will Deacon 
> Tested-by: Magnus Damn 
> Acked-by: Will Deacon 

Applied all to iommu/fixes, thanks.



Re: [PATCH v2 1/2] ACPICA: IORT: Update SMMU models for IORT rev. C

2017-05-30 Thread Joerg Roedel
On Mon, May 22, 2017 at 04:06:37PM +0100, Robin Murphy wrote:
> IORT revision C has been published with a number of new SMMU
> implementation identifiers. Since IORT doesn't have any way of falling
> back to a more generic model code, we really need Linux to know about
> these before vendors start updating their firmware tables to use them.
> 
> CC: Rafael J. Wysocki 
> CC: Robert Moore 
> CC: Lv Zheng 
> Acked-by: Robert Richter 
> Tested-by: Robert Richter 
> Signed-off-by: Robin Murphy 
> ---
> 
> v2: Update more comments, add Robert's tags.

I generally prefer 'Fixes'-tags, can you please add them too?



Re: [PATCH v6 3/6] ACPI/IORT: Ignore all errors except EPROBE_DEFER

2017-05-30 Thread Lorenzo Pieralisi
On Mon, May 29, 2017 at 10:36:42AM +0530, Sricharan R wrote:
> Hi Rafael,
> 
> On 5/28/2017 12:48 AM, Rafael J. Wysocki wrote:
> > On Saturday, May 27, 2017 07:17:42 PM Sricharan R wrote:
> >> While deferring the probe of IOMMU masters, xlate and
> >> add_device callbacks called from iort_iommu_configure
> >> can pass back error values like -ENODEV, which means
> >> the IOMMU cannot be connected with that master for real
> >> reasons. Before the IOMMU probe deferral, all such errors
> >> were ignored. Now all those errors are propagated back,
> >> killing the master's probe for such errors. Instead ignore
> >> all the errors except EPROBE_DEFER, which is the only one
> >> of concern and let the master work without IOMMU, thus
> >> restoring the old behavior. Also make explicit that
> >> acpi_dma_configure handles only -EPROBE_DEFER from
> >> iort_iommu_configure.
> >>
> >> Fixes: 5a1bb638d567 ("drivers: acpi: Handle IOMMU lookup failure with 
> >> deferred probing or error")
> >> Signed-off-by: Sricharan R 
> >> ---
> >>  drivers/acpi/arm64/iort.c | 6 ++
> >>  drivers/acpi/scan.c   | 4 ++--
> >>  2 files changed, 8 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
> >> index c5fecf9..16e101f 100644
> >> --- a/drivers/acpi/arm64/iort.c
> >> +++ b/drivers/acpi/arm64/iort.c
> >> @@ -782,6 +782,12 @@ const struct iommu_ops *iort_iommu_configure(struct 
> >> device *dev)
> >>if (err)
> >>ops = ERR_PTR(err);
> >>  
> >> +  /* Ignore all other errors apart from EPROBE_DEFER */
> >> +  if (IS_ERR(ops) && (PTR_ERR(ops) != -EPROBE_DEFER)) {
> >> +  dev_dbg(dev, "Adding to IOMMU failed: %ld\n", PTR_ERR(ops));
> >> +  ops = NULL;
> >> +  }
> >> +
> >>return ops;
> >>  }
> >>  
> >> diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
> >> index e39ec7b..3a10d757 100644
> >> --- a/drivers/acpi/scan.c
> >> +++ b/drivers/acpi/scan.c
> >> @@ -1371,8 +1371,8 @@ int acpi_dma_configure(struct device *dev, enum 
> >> dev_dma_attr attr)
> >>iort_set_dma_mask(dev);
> >>  
> >>iommu = iort_iommu_configure(dev);
> >> -  if (IS_ERR(iommu))
> >> -  return PTR_ERR(iommu);
> >> +  if (IS_ERR(iommu) && PTR_ERR(iommu) == -EPROBE_DEFER)
> >> +  return -EPROBE_DEFER;
> >>  
> >>size = max(dev->coherent_dma_mask, dev->coherent_dma_mask + 1);
> >>/*
> >>
> > 
> > ACK for the scan.c change and I'm assuming this to go in via ARM64.
> > 
> 
> Thanks for the ACK, should go through the IOMMU tree, since this fixes the 
> IOMMU probe deferral
> that got merged through it.

Yes and it would be good to get them merged for -rc4 given that there
are bug fixes there, I agree the IOMMU tree is the way they should
go upstream.

Lorenzo


Device address specific mapping of arm,mmu-500

2017-05-30 Thread Ray Jui via iommu
Hi All,

I'm writing to check with you to see if the latest arm-smmu.c driver in
v4.12-rc Linux for smmu-500 can support a mapping that is specific to
a particular physical address range while leaving the rest to be
handled directly by the client device. I believe this can already be
supported by the device tree binding of the generic IOMMU framework;
however, it is not clear to me whether the arm-smmu.c driver can support it.

To give you some background information:

We have a SoC with a PCIe root complex that has a built-in logic block
to forward MSI writes to the ARM GICv3 ITS. Unfortunately, this logic block
has a HW bug that causes MSI writes not to be parsed properly and can
potentially corrupt data in the internal FIFO. A workaround is to have the
ARM MMU-500 take care of all inbound transactions. I found that works
after hooking up our PCIe root complex to the MMU-500; however, even
with the optimized arm-smmu driver in v4.12, I'm still seeing a
significant Ethernet throughput drop in both the TX and RX directions.
The throughput drop is very significant at around 50% (but is already
much improved compared to prior kernel versions, where it was 70~90%).

One alternative is to only use the MMU-500 for MSI writes towards the
GITS_TRANSLATER register in the GICv3, i.e., if I can define a specific
region of physical address space that I want the MMU-500 to act on and leave
the rest of the inbound transactions to be handled directly by our PCIe
controller, it can potentially work around the HW bug we have and at the
same time achieve optimal throughput.
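
For reference, my understanding is that the generic IOMMU binding scopes
translation by stream/requester IDs (e.g. iommu-map on a PCIe host bridge)
rather than by physical address range. A purely hypothetical fragment
(the compatible string, unit address, and IDs are made up for illustration):

```dts
pcie@50000000 {
	compatible = "vendor,example-pcie";	/* made-up compatible */
	/* route requester IDs 0x0000-0xffff through SMMU stream IDs */
	iommu-map = <0x0 &smmu 0x0 0x10000>;
	iommu-map-mask = <0xffff>;
};
```

So the binding can select which masters the SMMU translates, but limiting
translation to a physical address window would need driver-level support.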

Any feedback from you is greatly appreciated!

Best regards,

Ray


RE: [PATCH] iommu/amd: flush IOTLB for specific domains only (v3)

2017-05-30 Thread Nath, Arindam
>-Original Message-
>From: Joerg Roedel [mailto:j...@8bytes.org]
>Sent: Monday, May 29, 2017 8:09 PM
>To: Nath, Arindam ; Lendacky, Thomas
>
>Cc: iommu@lists.linux-foundation.org; amd-...@lists.freedesktop.org;
>Deucher, Alexander ; Bridgman, John
>; dr...@endlessm.com; Suthikulpanit, Suravee
>; li...@endlessm.com; Craig Stein
>; mic...@daenzer.net; Kuehling, Felix
>; sta...@vger.kernel.org
>Subject: Re: [PATCH] iommu/amd: flush IOTLB for specific domains only (v3)
>
>Hi Arindam,
>
>I met Tom Lendacky last week in Nuremberg last week and he told me he is
>working on the same area of the code that this patch is for. His reason
>for touching this code was to solve some locking problems. Maybe you two
>can work together on a joint approach to improve this?

Sure Joerg, I will work with Tom.

Thanks.

>
>On Mon, May 22, 2017 at 01:18:01PM +0530, arindam.n...@amd.com wrote:
>> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
>> index 63cacf5..1edeebec 100644
>> --- a/drivers/iommu/amd_iommu.c
>> +++ b/drivers/iommu/amd_iommu.c
>> @@ -2227,15 +2227,26 @@ static struct iommu_group *amd_iommu_device_group(struct device *dev)
>>
>>  static void __queue_flush(struct flush_queue *queue)
>>  {
>> -struct protection_domain *domain;
>> -unsigned long flags;
>>  int idx;
>>
>> -/* First flush TLB of all known domains */
>> -spin_lock_irqsave(&amd_iommu_pd_lock, flags);
>> -list_for_each_entry(domain, &amd_iommu_pd_list, list)
>> -domain_flush_tlb(domain);
>> -spin_unlock_irqrestore(&amd_iommu_pd_lock, flags);
>> +/* First flush TLB of all domains which were added to flush queue */
>> +for (idx = 0; idx < queue->next; ++idx) {
>> +struct flush_queue_entry *entry;
>> +
>> +entry = queue->entries + idx;
>> +
>> +/*
>> + * There might be cases where multiple IOVA entries for the
>> + * same domain are queued in the flush queue. To avoid
>> + * flushing the same domain again, we check whether the
>> + * flag is set or not. This improves the efficiency of
>> + * flush operation.
>> + */
>> +if (!entry->dma_dom->domain.already_flushed) {
>> +entry->dma_dom->domain.already_flushed = true;
>> +domain_flush_tlb(&entry->dma_dom->domain);
>> +}
>> +}
>
>There is also a race condition here I have to look into. It is not
>introduced by your patch, but needs fixing anyway. I'll look into this
>too.
>
>
>Regards,
>
>   Joerg
