Re: [PATCH v7 2/3] PCI: Add support for PCI inbound windows resources
On Mon, May 22, 2017 at 11:39 AM, Oza Pawandeep wrote: > This patch adds support for inbound memory window > for PCI RC drivers. > > It defines new function pci_create_root_bus2 which > takes inbound resources as an argument and fills in the > memory resource to PCI internal host bridge structure > as inbound_windows. > > Legacy RC driver could continue to use pci_create_root_bus, > but any RC driver who wants to reserve IOVAs for their > inbound memory holes, should use new API pci_create_root_bus2. > > Signed-off-by: Oza Pawandeep > ... > +struct pci_bus *pci_create_root_bus2(struct device *parent, int bus, > + struct pci_ops *ops, void *sysdata, struct list_head > *resources, > + struct list_head *in_res) > +{ > + return pci_create_root_bus_msi(parent, bus, ops, sysdata, > + resources, in_res, NULL); > +} > +EXPORT_SYMBOL_GPL(pci_create_root_bus2); Based on your response to Lorenzo's "[RFC/RFT PATCH 03/18] PCI: Introduce pci_scan_root_bus_bridge()", I'm hoping you can avoid adding yet another variant of pci_create_root_bus(). So I think I can wait for that to settle out and look for a v8? Bjorn ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 0/2] Save and restore pci properties to support FLR
On Tue, May 30, 2017 at 09:25:47AM -0700, Ashok Raj wrote: > Resending Jean's patch so it can be included earlier than his large > SVM commits. Original patch https://patchwork.kernel.org/patch/9593891 > was ack'ed by Bjorn. Let's commit these separately since we need > functionality earlier. > > Resending this series as requested by Jean. > > CQ Tang (1): > PCI: Save properties required to handle FLR for replay purposes. > > Jean-Philippe Brucker (1): > PCI: Cache PRI and PASID bits in pci_dev > > drivers/pci/ats.c | 88 > - > drivers/pci/pci.c | 3 ++ > include/linux/pci-ats.h | 10 ++ > include/linux/pci.h | 8 + > 4 files changed, 94 insertions(+), 15 deletions(-) Applied to pci/virtualization for v4.13. See response to 2/2 for minor changes I made there.
Re: [PATCH 2/2] PCI: Save properties required to handle FLR for replay purposes.
On Tue, May 30, 2017 at 09:25:49AM -0700, Ashok Raj wrote: > From: CQ Tang > > Requires: https://patchwork.kernel.org/patch/9593891 > > > After a FLR, pci-states need to be restored. This patch saves PASID features > and PRI reqs cached. > > To: Bjorn Helgaas > To: Joerg Roedel > To: linux-...@vger.kernel.org > To: linux-ker...@vger.kernel.org > Cc: Jean-Phillipe Brucker > Cc: David Woodhouse > Cc: iommu@lists.linux-foundation.org > > Signed-off-by: CQ Tang > Signed-off-by: Ashok Raj > --- > drivers/pci/ats.c | 65 > + > drivers/pci/pci.c | 3 +++ > include/linux/pci-ats.h | 10 > include/linux/pci.h | 6 + > 4 files changed, 69 insertions(+), 15 deletions(-) > > diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c > index 2126497..a769955 100644 > --- a/drivers/pci/ats.c > +++ b/drivers/pci/ats.c > @@ -160,17 +160,16 @@ int pci_enable_pri(struct pci_dev *pdev, u32 reqs) > if (!pos) > return -EINVAL; > > - pci_read_config_word(pdev, pos + PCI_PRI_CTRL, &control); > pci_read_config_word(pdev, pos + PCI_PRI_STATUS, &status); > - if ((control & PCI_PRI_CTRL_ENABLE) || > - !(status & PCI_PRI_STATUS_STOPPED)) > + if (!(status & PCI_PRI_STATUS_STOPPED)) > return -EBUSY; > > pci_read_config_dword(pdev, pos + PCI_PRI_MAX_REQ, &max_requests); > reqs = min(max_requests, reqs); > + pdev->pri_reqs_alloc = reqs; > pci_write_config_dword(pdev, pos + PCI_PRI_ALLOC_REQ, reqs); > > - control |= PCI_PRI_CTRL_ENABLE; > + control = PCI_PRI_CTRL_ENABLE; > pci_write_config_word(pdev, pos + PCI_PRI_CTRL, control); > > pdev->pri_enabled = 1; > @@ -206,6 +205,29 @@ void pci_disable_pri(struct pci_dev *pdev) > EXPORT_SYMBOL_GPL(pci_disable_pri); > > /** > + * pci_restore_pri_state - Restore PRI > + * @pdev: PCI device structure > + * > + */ > +void pci_restore_pri_state(struct pci_dev *pdev) > +{ > + u16 control = PCI_PRI_CTRL_ENABLE; > + u32 reqs = pdev->pri_reqs_alloc; > + int pos; > + > + pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI); > + if (!pos) > + return; > + > + if (!pdev->pri_enabled) > + return; 
I propose swapping the order of these tests, so that if PRI is not enabled, we don't have to search for the capability. Similarly for PASID below. I made these changes and re-indented these functions on my branch. No action required unless you object to these changes. > + > + pci_write_config_dword(pdev, pos + PCI_PRI_ALLOC_REQ, reqs); > + pci_write_config_word(pdev, pos + PCI_PRI_CTRL, control); > +} > +EXPORT_SYMBOL_GPL(pci_restore_pri_state); > + > +/** > * pci_reset_pri - Resets device's PRI state > * @pdev: PCI device structure > * > @@ -224,12 +246,7 @@ int pci_reset_pri(struct pci_dev *pdev) > if (!pos) > return -EINVAL; > > - pci_read_config_word(pdev, pos + PCI_PRI_CTRL, &control); > - if (control & PCI_PRI_CTRL_ENABLE) > - return -EBUSY; > - > - control |= PCI_PRI_CTRL_RESET; > - > + control = PCI_PRI_CTRL_RESET; > pci_write_config_word(pdev, pos + PCI_PRI_CTRL, control); > > return 0; > @@ -259,12 +276,7 @@ int pci_enable_pasid(struct pci_dev *pdev, int features) > if (!pos) > return -EINVAL; > > - pci_read_config_word(pdev, pos + PCI_PASID_CTRL, &control); > pci_read_config_word(pdev, pos + PCI_PASID_CAP, &supported); > - > - if (control & PCI_PASID_CTRL_ENABLE) > - return -EINVAL; > - > supported &= PCI_PASID_CAP_EXEC | PCI_PASID_CAP_PRIV; > > /* User wants to enable anything unsupported? */ > @@ -272,6 +284,7 @@ int pci_enable_pasid(struct pci_dev *pdev, int features) > return -EINVAL; > > control = PCI_PASID_CTRL_ENABLE | features; > + pdev->pasid_features = features; > > pci_write_config_word(pdev, pos + PCI_PASID_CTRL, control); > > @@ -305,6 +318,28 @@ void pci_disable_pasid(struct pci_dev *pdev) > EXPORT_SYMBOL_GPL(pci_disable_pasid); > > /** > + * pci_restore_pasid_state - Restore PASID capabilities. 
> + * @pdev: PCI device structure > + * > + */ > +void pci_restore_pasid_state(struct pci_dev *pdev) > +{ > + u16 control; > + int pos; > + > + pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PASID); > + if (!pos) > + return; > + > + if (!pdev->pasid_enabled) > + return; > + > + control = PCI_PASID_CTRL_ENABLE | pdev->pasid_features; > + pci_write_config_word(pdev, pos + PCI_PASID_CTRL, control); > +} > +EXPORT_SYMBOL_GPL(pci_restore_pasid_state); > + > +/** > * pci_pasid_features - Check which PASID features are supported > * @pdev: PCI device structure > * > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > index 7904d02..c9a6510 100644 > ---
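Bjorn's proposed reordering, checking the cached enable flag before walking config space, can be sketched in miniature as follows. This is a toy userspace model, not the real PCI core: the mock struct, the call counter, and find_pri_cap() are all invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures discussed in the thread. */
struct pci_dev_mock {
    int pri_enabled;    /* cached software state, cheap to test */
    int pri_cap_pos;    /* 0 = PRI extended capability absent */
};

static int cap_searches; /* counts how often we "walk" config space */

static int find_pri_cap(struct pci_dev_mock *pdev)
{
    cap_searches++;
    return pdev->pri_cap_pos;
}

/* Bjorn's ordering: a device that never enabled PRI skips the
 * config-space capability walk entirely. */
static void restore_pri_state(struct pci_dev_mock *pdev)
{
    int pos;

    if (!pdev->pri_enabled)
        return;                  /* cheap in-memory check first */

    pos = find_pri_cap(pdev);    /* only now touch config space */
    if (!pos)
        return;

    /* ...here the real code rewrites PCI_PRI_ALLOC_REQ and PCI_PRI_CTRL... */
}
```

The same reasoning applies to the PASID restore path: the capability search is the expensive step, so it should be guarded by the flag, not the other way around.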
Re: [PATCH v5 26/32] x86, drm, fbdev: Do not specify encrypted memory for video mappings
On 5/16/2017 12:35 PM, Borislav Petkov wrote: On Tue, Apr 18, 2017 at 04:20:56PM -0500, Tom Lendacky wrote: Since video memory needs to be accessed decrypted, be sure that the memory encryption mask is not set for the video ranges. Signed-off-by: Tom Lendacky --- arch/x86/include/asm/vga.h | 13 + arch/x86/mm/pageattr.c |2 ++ drivers/gpu/drm/drm_gem.c|2 ++ drivers/gpu/drm/drm_vm.c |4 drivers/gpu/drm/ttm/ttm_bo_vm.c |7 +-- drivers/gpu/drm/udl/udl_fb.c |4 drivers/video/fbdev/core/fbmem.c | 12 7 files changed, 42 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/vga.h b/arch/x86/include/asm/vga.h index c4b9dc2..5c7567a 100644 --- a/arch/x86/include/asm/vga.h +++ b/arch/x86/include/asm/vga.h @@ -7,12 +7,25 @@ #ifndef _ASM_X86_VGA_H #define _ASM_X86_VGA_H +#include + /* * On the PC, we can just recalculate addresses and then * access the videoram directly without any black magic. + * To support memory encryption however, we need to access + * the videoram as decrypted memory. */ +#ifdef CONFIG_AMD_MEM_ENCRYPT +#define VGA_MAP_MEM(x, s) \ +({ \ + unsigned long start = (unsigned long)phys_to_virt(x); \ + set_memory_decrypted(start, (s) >> PAGE_SHIFT); \ + start; \ +}) +#else #define VGA_MAP_MEM(x, s) (unsigned long)phys_to_virt(x) +#endif Can we push the check in and save us the ifdeffery? #define VGA_MAP_MEM(x, s) \ ({ \ unsigned long start = (unsigned long)phys_to_virt(x); \ \ if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT)) \ set_memory_decrypted(start, (s) >> PAGE_SHIFT); \ \ start; \ }) It does build here. :) That works for me and it's a lot cleaner. I'll make the change. Thanks, Tom
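The IS_ENABLED() pattern Borislav suggests works because the macro expands to a compile-time 0/1, so the disabled branch is eliminated by the compiler while the code stays visible to (and type-checked by) it in both configurations. The sketch below models that with a simplified stand-in; the macro definition, the fake phys_to_virt offset, and the mock set_memory_decrypted() are all invented for illustration and are not the kernel's real implementations.

```c
#include <assert.h>

/* Simplified model: the kernel's IS_ENABLED() evaluates config options
 * to a constant 0 or 1; here we hard-wire it. Flip the value to 0 to
 * model a build without memory encryption support. */
#define CONFIG_AMD_MEM_ENCRYPT_ON 1
#define IS_ENABLED(opt) (opt)

static int decrypt_calls;

static void set_memory_decrypted(unsigned long start, unsigned long npages)
{
    (void)start;
    (void)npages;
    decrypt_calls++;   /* record that the range was marked decrypted */
}

static unsigned long vga_map_mem(unsigned long phys, unsigned long size)
{
    /* fake phys_to_virt(): direct-map offset chosen arbitrarily */
    unsigned long start = phys + 0x40000000UL;

    if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ON))
        set_memory_decrypted(start, size >> 12);  /* 4K pages */

    return start;
}
```

With the option off, the `if (0)` branch compiles away entirely, which is why no #ifdef is needed.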
Re: [PATCH 2/2] PCI: Save properties required to handle FLR for replay purposes.
On Tue, May 30, 2017 at 02:50:33PM -0500, Bjorn Helgaas wrote: > On Tue, May 30, 2017 at 09:25:49AM -0700, Ashok Raj wrote: > > From: CQ Tang > > > > Requires: https://patchwork.kernel.org/patch/9593891 > > The above patch (9593891) is not in my tree or Linus' tree, so I can't > do anything with this yet. I resent the patch as part of this series.. maybe should have massaged the commit message.. my bad :-(. It's the first patch in this series. Jean mentioned it might take him a while before the SVM patches stabilize. Since this patch is pretty much stand alone, he asked if i could resend it along with the one i need. Sorry for any confusion. Cheers, Ashok > > > After a FLR, pci-states need to be restored. This patch saves PASID features > > and PRI reqs cached. > > > > To: Bjorn Helgaas > > To: Joerg Roedel > > To: linux-...@vger.kernel.org > > To: linux-ker...@vger.kernel.org > > Cc: Jean-Phillipe Brucker > > Cc: David Woodhouse > > Cc: iommu@lists.linux-foundation.org > > > > Signed-off-by: CQ Tang > > Signed-off-by: Ashok Raj > > --- > > drivers/pci/ats.c | 65 > > + > > drivers/pci/pci.c | 3 +++ > > include/linux/pci-ats.h | 10 > > include/linux/pci.h | 6 + > > 4 files changed, 69 insertions(+), 15 deletions(-) > > > > diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c > > index 2126497..a769955 100644 > > --- a/drivers/pci/ats.c > > +++ b/drivers/pci/ats.c > > @@ -160,17 +160,16 @@ int pci_enable_pri(struct pci_dev *pdev, u32 reqs) > > if (!pos) > > return -EINVAL; > > > > - pci_read_config_word(pdev, pos + PCI_PRI_CTRL, &control); > > pci_read_config_word(pdev, pos + PCI_PRI_STATUS, &status); > > - if ((control & PCI_PRI_CTRL_ENABLE) || > > - !(status & PCI_PRI_STATUS_STOPPED)) > > + if (!(status & PCI_PRI_STATUS_STOPPED)) > > return -EBUSY; > > > > pci_read_config_dword(pdev, pos + PCI_PRI_MAX_REQ, &max_requests); > > reqs = min(max_requests, reqs); > > + pdev->pri_reqs_alloc = reqs; > > pci_write_config_dword(pdev, pos + PCI_PRI_ALLOC_REQ, reqs); > > > > - control |= PCI_PRI_CTRL_ENABLE; > > + control = PCI_PRI_CTRL_ENABLE; > > pci_write_config_word(pdev, pos + PCI_PRI_CTRL, control); > > > > pdev->pri_enabled = 1; > > @@ -206,6 +205,29 @@ void pci_disable_pri(struct pci_dev *pdev) > > EXPORT_SYMBOL_GPL(pci_disable_pri); > > > > /** > > + * pci_restore_pri_state - Restore PRI > > + * @pdev: PCI device structure > > + * > > + */ > > +void pci_restore_pri_state(struct pci_dev *pdev) > > +{ > > + u16 control = PCI_PRI_CTRL_ENABLE; > > + u32 reqs = pdev->pri_reqs_alloc; > > + int pos; > > + > > + pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI); > > + if (!pos) > > + return; > > + > > + if (!pdev->pri_enabled) > > + return; > > + > > + pci_write_config_dword(pdev, pos + PCI_PRI_ALLOC_REQ, reqs); > > + pci_write_config_word(pdev, pos + PCI_PRI_CTRL, control); > > +} > > +EXPORT_SYMBOL_GPL(pci_restore_pri_state); > > + > > +/** > > * pci_reset_pri - Resets device's PRI state > > * @pdev: PCI device structure > > * > > @@ -224,12 +246,7 @@ int pci_reset_pri(struct pci_dev *pdev) > > if (!pos) > > return -EINVAL; > > > > - pci_read_config_word(pdev, pos + PCI_PRI_CTRL, &control); > > - if (control & PCI_PRI_CTRL_ENABLE) > > - return -EBUSY; > > - > > - control |= PCI_PRI_CTRL_RESET; > > - > > + control = PCI_PRI_CTRL_RESET; > > pci_write_config_word(pdev, pos + PCI_PRI_CTRL, control); > > > > return 0; > > @@ -259,12 +276,7 @@ int pci_enable_pasid(struct pci_dev *pdev, int > > features) > > if (!pos) > > return -EINVAL; > > > > - pci_read_config_word(pdev, pos + PCI_PASID_CTRL, &control); > > pci_read_config_word(pdev, pos + PCI_PASID_CAP, &supported); > > - > > - if (control & PCI_PASID_CTRL_ENABLE) > > - return -EINVAL; > > - > > supported &= PCI_PASID_CAP_EXEC | PCI_PASID_CAP_PRIV; > > > > /* User wants to enable anything unsupported? 
*/ > > @@ -272,6 +284,7 @@ int pci_enable_pasid(struct pci_dev *pdev, int features) > > return -EINVAL; > > > > control = PCI_PASID_CTRL_ENABLE | features; > > + pdev->pasid_features = features; > > > > pci_write_config_word(pdev, pos + PCI_PASID_CTRL, control); > > > > @@ -305,6 +318,28 @@ void pci_disable_pasid(struct pci_dev *pdev) > > EXPORT_SYMBOL_GPL(pci_disable_pasid); > > > > /** > > + * pci_restore_pasid_state - Restore PASID capabilities. > > + * @pdev: PCI device structure > > + * > > + */ > > +void pci_restore_pasid_state(struct pci_dev *pdev) > > +{ > > + u16 control; > > + int pos; > > + > > + pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PASID); > > + if (!pos) > > + return; > > + > > + if
Re: [PATCH v5 17/32] x86/mm: Add support to access boot related data in the clear
On 5/26/2017 11:35 AM, Borislav Petkov wrote: On Fri, May 26, 2017 at 11:22:36AM -0500, Tom Lendacky wrote: In addition to the same issue as efi.memmap.phys_map, efi_phys has the __initdata attribute so it will be released/freed which will cause problems in checks performed afterwards. Sounds to me like we should drop the __initdata attr and prepare them much earlier for use by the SME code. Probably something we can look at for a follow-on patch. Thanks, Tom
Re: [PATCH v5 28/32] x86/mm, kexec: Allow kexec to be used with SME
On 5/25/2017 11:17 PM, Xunlei Pang wrote: On 04/19/2017 at 05:21 AM, Tom Lendacky wrote: Provide support so that kexec can be used to boot a kernel when SME is enabled. Support is needed to allocate pages for kexec without encryption. This is needed in order to be able to reboot in the kernel in the same manner as originally booted. Hi Tom, Looks like kdump will break, I didn't see the similar handling for kdump cases, see kernel: kimage_alloc_crash_control_pages(), kimage_load_crash_segment(), etc. We need to support kdump with SME, kdump kernel/initramfs/purgatory/elfcorehdr/etc are all loaded into the reserved memory(see crashkernel=X) by userspace kexec-tools. I think a straightforward way would be to mark the whole reserved memory range without encryption before loading all the kexec segments for kdump, I guess we can handle this easily in arch_kexec_unprotect_crashkres(). Yes, that would work. Moreover, now that "elfcorehdr=X" is left as decrypted, it needs to be remapped to the encrypted data. This is an area that I'm not familiar with, so I don't completely understand the flow in regards to where/when/how the ELF headers are copied and what needs to be done. Can you elaborate a bit on this? Thanks, Tom Regards, Xunlei Additionally, when shutting down all of the CPUs we need to be sure to flush the caches and then halt. This is needed when booting from a state where SME was not active into a state where SME is active (or vice-versa). Without these steps, it is possible for cache lines to exist for the same physical location but tagged both with and without the encryption bit. This can cause random memory corruption when caches are flushed depending on which cacheline is written last. 
Signed-off-by: Tom Lendacky--- arch/x86/include/asm/init.h |1 + arch/x86/include/asm/irqflags.h |5 + arch/x86/include/asm/kexec.h |8 arch/x86/include/asm/pgtable_types.h |1 + arch/x86/kernel/machine_kexec_64.c | 35 +- arch/x86/kernel/process.c| 26 +++-- arch/x86/mm/ident_map.c | 11 +++ include/linux/kexec.h| 14 ++ kernel/kexec_core.c |7 +++ 9 files changed, 101 insertions(+), 7 deletions(-) diff --git a/arch/x86/include/asm/init.h b/arch/x86/include/asm/init.h index 737da62..b2ec511 100644 --- a/arch/x86/include/asm/init.h +++ b/arch/x86/include/asm/init.h @@ -6,6 +6,7 @@ struct x86_mapping_info { void *context; /* context for alloc_pgt_page */ unsigned long pmd_flag; /* page flag for PMD entry */ unsigned long offset;/* ident mapping offset */ + unsigned long kernpg_flag; /* kernel pagetable flag override */ }; int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page, diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h index ac7692d..38b5920 100644 --- a/arch/x86/include/asm/irqflags.h +++ b/arch/x86/include/asm/irqflags.h @@ -58,6 +58,11 @@ static inline __cpuidle void native_halt(void) asm volatile("hlt": : :"memory"); } +static inline __cpuidle void native_wbinvd_halt(void) +{ + asm volatile("wbinvd; hlt" : : : "memory"); +} + #endif #ifdef CONFIG_PARAVIRT diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h index 70ef205..e8183ac 100644 --- a/arch/x86/include/asm/kexec.h +++ b/arch/x86/include/asm/kexec.h @@ -207,6 +207,14 @@ struct kexec_entry64_regs { uint64_t r15; uint64_t rip; }; + +extern int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages, + gfp_t gfp); +#define arch_kexec_post_alloc_pages arch_kexec_post_alloc_pages + +extern void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages); +#define arch_kexec_pre_free_pages arch_kexec_pre_free_pages + #endif typedef void crash_vmclear_fn(void); diff --git a/arch/x86/include/asm/pgtable_types.h 
b/arch/x86/include/asm/pgtable_types.h index ce8cb1c..0f326f4 100644 --- a/arch/x86/include/asm/pgtable_types.h +++ b/arch/x86/include/asm/pgtable_types.h @@ -213,6 +213,7 @@ enum page_cache_mode { #define PAGE_KERNEL __pgprot(__PAGE_KERNEL | _PAGE_ENC) #define PAGE_KERNEL_RO__pgprot(__PAGE_KERNEL_RO | _PAGE_ENC) #define PAGE_KERNEL_EXEC __pgprot(__PAGE_KERNEL_EXEC | _PAGE_ENC) +#define PAGE_KERNEL_EXEC_NOENC __pgprot(__PAGE_KERNEL_EXEC) #define PAGE_KERNEL_RX__pgprot(__PAGE_KERNEL_RX | _PAGE_ENC) #define PAGE_KERNEL_NOCACHE __pgprot(__PAGE_KERNEL_NOCACHE | _PAGE_ENC) #define PAGE_KERNEL_LARGE __pgprot(__PAGE_KERNEL_LARGE | _PAGE_ENC) diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c index 085c3b3..11c0ca9 100644 ---
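The core idea behind the kexec/SME patch, and the kdump suggestion above, is that both the old and new kernel must agree on the encryption attribute of a physical page, so pages handed to kexec are flipped to decrypted before anything is written into them. A conceptual userspace model follows; the flag array stands in for page-table attributes, and all names (FLAG_ENC, kexec_alloc_pages) are invented, not kernel API.

```c
#include <assert.h>
#include <string.h>

/* Toy model of SME page attributes: one flag byte per page, where
 * FLAG_ENC plays the role of the C-bit in a real page-table entry. */
#define NPAGES   16
#define FLAG_ENC 0x1

static unsigned char pte_flags[NPAGES];

static void mark_range_decrypted(int first, int count)
{
    for (int i = first; i < first + count; i++)
        pte_flags[i] &= (unsigned char)~FLAG_ENC;
}

/* Models the role of arch_kexec_post_alloc_pages() in the patch: every
 * page allocated for kexec control data is made decrypted up front, so
 * no cache line for it is ever tagged with the encryption bit. */
static void kexec_alloc_pages(int first, int count)
{
    mark_range_decrypted(first, count);
}
```

For kdump, the suggestion in the thread amounts to calling the equivalent of mark_range_decrypted() over the whole crashkernel reservation in arch_kexec_unprotect_crashkres() before the segments are loaded.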
Re: [PATCH 1/2] PCI: Save properties required to handle FLR for replay purposes.
On Thu, May 11, 2017 at 11:50:24AM +0100, Jean-Philippe Brucker wrote: > Hi, > > On 10/05/17 19:39, Ashok Raj wrote: > > From: CQ Tang > > > > Requires: https://patchwork.kernel.org/patch/9593891 > > Since your series is likely to go in much earlier than my SVM mess, maybe > you could carry that PCI patch along with it? Or I could resend it on its > own if you prefer. I have sent the 2 patches to the list, so we can get them in before the long SVM series gets in. > > I'm planning to resend the SVM series in a few weeks but it still won't > make it into mainline since it hasn't run on hardware. > > Thanks, > Jean-Philippe > Cheers, Ashok
[PATCH 0/2] Save and restore pci properties to support FLR
Resending Jean's patch so it can be included earlier than his large SVM commits. Original patch https://patchwork.kernel.org/patch/9593891 was ack'ed by Bjorn. Let's commit these separately since we need functionality earlier. Resending this series as requested by Jean. CQ Tang (1): PCI: Save properties required to handle FLR for replay purposes. Jean-Philippe Brucker (1): PCI: Cache PRI and PASID bits in pci_dev drivers/pci/ats.c | 88 - drivers/pci/pci.c | 3 ++ include/linux/pci-ats.h | 10 ++ include/linux/pci.h | 8 + 4 files changed, 94 insertions(+), 15 deletions(-) -- 2.7.4
[PATCH 2/2] PCI: Save properties required to handle FLR for replay purposes.
From: CQ Tang Requires: https://patchwork.kernel.org/patch/9593891 After a FLR, pci-states need to be restored. This patch saves PASID features and PRI reqs cached. To: Bjorn Helgaas To: Joerg Roedel To: linux-...@vger.kernel.org To: linux-ker...@vger.kernel.org Cc: Jean-Phillipe Brucker Cc: David Woodhouse Cc: iommu@lists.linux-foundation.org Signed-off-by: CQ Tang Signed-off-by: Ashok Raj --- drivers/pci/ats.c | 65 + drivers/pci/pci.c | 3 +++ include/linux/pci-ats.h | 10 include/linux/pci.h | 6 + 4 files changed, 69 insertions(+), 15 deletions(-) diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c index 2126497..a769955 100644 --- a/drivers/pci/ats.c +++ b/drivers/pci/ats.c @@ -160,17 +160,16 @@ int pci_enable_pri(struct pci_dev *pdev, u32 reqs) if (!pos) return -EINVAL; - pci_read_config_word(pdev, pos + PCI_PRI_CTRL, &control); pci_read_config_word(pdev, pos + PCI_PRI_STATUS, &status); - if ((control & PCI_PRI_CTRL_ENABLE) || - !(status & PCI_PRI_STATUS_STOPPED)) + if (!(status & PCI_PRI_STATUS_STOPPED)) return -EBUSY; pci_read_config_dword(pdev, pos + PCI_PRI_MAX_REQ, &max_requests); reqs = min(max_requests, reqs); + pdev->pri_reqs_alloc = reqs; pci_write_config_dword(pdev, pos + PCI_PRI_ALLOC_REQ, reqs); - control |= PCI_PRI_CTRL_ENABLE; + control = PCI_PRI_CTRL_ENABLE; pci_write_config_word(pdev, pos + PCI_PRI_CTRL, control); pdev->pri_enabled = 1; @@ -206,6 +205,29 @@ void pci_disable_pri(struct pci_dev *pdev) EXPORT_SYMBOL_GPL(pci_disable_pri); /** + * pci_restore_pri_state - Restore PRI + * @pdev: PCI device structure + * + */ +void pci_restore_pri_state(struct pci_dev *pdev) +{ + u16 control = PCI_PRI_CTRL_ENABLE; + u32 reqs = pdev->pri_reqs_alloc; + int pos; + + pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI); + if (!pos) + return; + + if (!pdev->pri_enabled) + return; + + pci_write_config_dword(pdev, pos + PCI_PRI_ALLOC_REQ, reqs); + pci_write_config_word(pdev, pos + PCI_PRI_CTRL, control); +} +EXPORT_SYMBOL_GPL(pci_restore_pri_state); + +/** * pci_reset_pri - Resets device's PRI state * @pdev: PCI device structure * @@ -224,12 +246,7 @@ int pci_reset_pri(struct pci_dev *pdev) if (!pos) return -EINVAL; - pci_read_config_word(pdev, pos + PCI_PRI_CTRL, &control); - if (control & PCI_PRI_CTRL_ENABLE) - return -EBUSY; - - control |= PCI_PRI_CTRL_RESET; - + control = PCI_PRI_CTRL_RESET; pci_write_config_word(pdev, pos + PCI_PRI_CTRL, control); return 0; @@ -259,12 +276,7 @@ int pci_enable_pasid(struct pci_dev *pdev, int features) if (!pos) return -EINVAL; - pci_read_config_word(pdev, pos + PCI_PASID_CTRL, &control); pci_read_config_word(pdev, pos + PCI_PASID_CAP, &supported); - - if (control & PCI_PASID_CTRL_ENABLE) - return -EINVAL; - supported &= PCI_PASID_CAP_EXEC | PCI_PASID_CAP_PRIV; /* User wants to enable anything unsupported? */ @@ -272,6 +284,7 @@ int pci_enable_pasid(struct pci_dev *pdev, int features) return -EINVAL; control = PCI_PASID_CTRL_ENABLE | features; + pdev->pasid_features = features; pci_write_config_word(pdev, pos + PCI_PASID_CTRL, control); @@ -305,6 +318,28 @@ void pci_disable_pasid(struct pci_dev *pdev) EXPORT_SYMBOL_GPL(pci_disable_pasid); /** + * pci_restore_pasid_state - Restore PASID capabilities. 
+ * @pdev: PCI device structure + * + */ +void pci_restore_pasid_state(struct pci_dev *pdev) +{ + u16 control; + int pos; + + pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PASID); + if (!pos) + return; + + if (!pdev->pasid_enabled) + return; + + control = PCI_PASID_CTRL_ENABLE | pdev->pasid_features; + pci_write_config_word(pdev, pos + PCI_PASID_CTRL, control); +} +EXPORT_SYMBOL_GPL(pci_restore_pasid_state); + +/** * pci_pasid_features - Check which PASID features are supported * @pdev: PCI device structure * diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 7904d02..c9a6510 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -28,6 +28,7 @@ #include #include #include +#include #include #include #include @@ -1171,6 +1172,8 @@ void pci_restore_state(struct pci_dev *dev) /* PCI Express register must be restored first */ pci_restore_pcie_state(dev); + pci_restore_pasid_state(dev); + pci_restore_pri_state(dev); pci_restore_ats_state(dev); pci_restore_vc_state(dev); diff --git a/include/linux/pci-ats.h b/include/linux/pci-ats.h index 57e0b82..782fb8e 100644 --- a/include/linux/pci-ats.h
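The hunk in pci_restore_state() above establishes a specific restore order after an FLR: PCIe capability state first, then PASID, then PRI, then ATS. The sketch below just pins down that ordering with a toy logging mock; the function names in comments mirror the patch, but the bodies are invented stand-ins, not the kernel code.

```c
#include <assert.h>
#include <string.h>

static char order[64];  /* records the sequence of restore steps */

static void restore(const char *what)
{
    strcat(order, what);
    strcat(order, " ");
}

/* Mirrors the ordering the patch creates in pci_restore_state():
 * PASID and PRI are reprogrammed before the ATS state, matching the
 * placement of the two new calls in the diff above. */
static void pci_restore_state_mock(void)
{
    restore("pcie");   /* pci_restore_pcie_state()  - must come first */
    restore("pasid");  /* pci_restore_pasid_state() - added by patch  */
    restore("pri");    /* pci_restore_pri_state()   - added by patch  */
    restore("ats");    /* pci_restore_ats_state()                     */
}
```

Since the device lost all of this state in the FLR, replaying it from the cached pri_reqs_alloc/pasid_features fields is what lets SVM-capable devices resume without renegotiating with the IOMMU driver.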
Re: [PATCH] iommu/amd: Propagate IOVA allocation failure
On 30/05/17 10:48, Joerg Roedel wrote: > On Fri, May 26, 2017 at 07:31:26PM +0100, Robin Murphy wrote: >> Unlike the old allocator, alloc_iova_fast() will return 0 if it failed >> to allocate a PFN. Since the callers of dma_ops_alloc_iova() would end >> up treating that as a valid address, translate it to the DMA_ERROR_CODE >> that they would expect. >> >> Fixes: 256e4621c21a ("iommu/amd: Make use of the generic IOVA allocator") >> Signed-off-by: Robin Murphy >> --- >> >> Just something I spotted whilst comparing dma_map_page() callchains... >> >> drivers/iommu/amd_iommu.c | 3 +++ >> 1 file changed, 3 insertions(+) >> >> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c >> index 63cacf5d6cf2..489dc302899e 100644 >> --- a/drivers/iommu/amd_iommu.c >> +++ b/drivers/iommu/amd_iommu.c >> @@ -1555,6 +1555,9 @@ static unsigned long dma_ops_alloc_iova(struct device >> *dev, >> if (!pfn) >> pfn = alloc_iova_fast(&dma_dom->iovad, pages, >> IOVA_PFN(dma_mask)); >> >> + if (!pfn) >> + return DMA_ERROR_CODE; >> + > > That shouldn't make a difference on x86 because the DMA_ERROR_CODE is 0 > as well. Ha, now that I did manage to overlook. Oh well, never mind then :) Thanks, Robin. > > > Joerg >
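Joerg's observation can be made concrete: alloc_iova_fast() signals failure with 0, and on x86 DMA_ERROR_CODE happens to also be 0, so callers comparing against DMA_ERROR_CODE already catch the failure without the extra check; the translation would only matter under an error code that is not 0. A toy model, with all values invented for illustration:

```c
#include <assert.h>

/* Two hypothetical error-code conventions: x86's (0) and an imagined
 * architecture where the error sentinel is all-ones. */
#define DMA_ERROR_CODE_X86   0UL
#define DMA_ERROR_CODE_OTHER (~0UL)

/* Mock of alloc_iova_fast(): returns 0 on allocation failure,
 * otherwise some nonzero PFN. */
static unsigned long alloc_iova_fast_mock(int fail)
{
    return fail ? 0UL : 0x1000UL;
}

/* Robin's proposed translation: convert the allocator's 0-on-failure
 * into whatever DMA_ERROR_CODE the callers test against. */
static unsigned long dma_ops_alloc_iova(int fail, unsigned long error_code)
{
    unsigned long pfn = alloc_iova_fast_mock(fail);

    if (!pfn)
        return error_code;

    return pfn;
}
```

When error_code is 0, the added branch returns exactly what the allocator already returned, which is why the patch is a no-op on x86.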
Re: Device address specific mapping of arm,mmu-500
On 30/05/17 18:16, Ray Jui wrote: > Hi Marc, > > On 5/30/17 9:59 AM, Marc Zyngier wrote: >> On 30/05/17 17:49, Ray Jui wrote: >>> Hi Will, >>> >>> On 5/30/17 8:14 AM, Will Deacon wrote: On Mon, May 29, 2017 at 06:18:45PM -0700, Ray Jui wrote: > I'm writing to check with you to see if the latest arm-smmu.c driver in > v4.12-rc Linux for smmu-500 can support mapping that is only specific to > a particular physical address range while leave the rest still to be > handled by the client device. I believe this can already be supported by > the device tree binding of the generic IOMMU framework; however, it is > not clear to me whether or not the arm-smmu.c driver can support it. > > To give you some background information: > > We have a SoC that has PCIe root complex that has a build-in logic block > to forward MSI writes to ARM GICv3 ITS. Unfortunately, this logic block > has a HW bug that causes the MSI writes not parsed properly and can > potentially corrupt data in the internal FIFO. A workaround is to have > ARM MMU-500 takes care of all inbound transactions. I found that is > working after hooking up our PCIe root complex to MMU-500; however, even > with this optimized arm-smmu driver in v4.12, I'm still seeing a > significant Ethernet throughput drop in both the TX and RX directions. > The throughput drop is very significant at around 50% (but is already > much improved compared to other prior kernel versions at 70~90%). Did Robin's experiments help at all with this? http://www.linux-arm.org/git?p=linux-rm.git;a=shortlog;h=refs/heads/iommu/perf >>> >>> It looks like these are new optimizations that have not yet been merged >>> in v4.12? I'm going to give it a try. 
>>> > One alternative is to only use MMU-500 for MSI writes towards > GITS_TRANSLATER register in the GICv3, i.e., if I can define a specific > region of physical address that I want MMU-500 to act on and leave the > rest of inbound transactions to be handled directly by our PCIe > controller, it can potentially work around the HW bug we have and at the > same time achieve optimal throughput. I don't think you can bypass the SMMU for MSIs unless you give them their own StreamIDs, which is likely to break things horribly in the kernel. You could try to create an identity mapping, but you'll still have the translation overhead and you'd probably end up having to supply your own DMA ops to manage the address space. I'm assuming that you need to prevent the physical address of the ITS from being allocated as an IOVA? >>> >>> Will, is that a HW limitation that the SMMU cannot be used, only for MSI >>> writes, in which case, the physical address range is very specific in >>> our ASIC that falls in the device memory region (e.g., below 0x8000)? >>> >>> In fact, what I need in this case is a static mapping from IOMMU on the >>> physical address of the GITS_TRANSLATER of the GICv3 ITS, which is the >>> address that MSI writes go to. This is to bypass the MSI forwarding >>> logic in our PCIe controller. At the same time, I can leave the rest of >>> inbound transactions to be handled by our PCIe controller without going >>> through the MMU. >> >> How is that going to work for DMA? I imagine your network interfaces do >> have to access memory, don't they? How can the transactions be >> terminated in the PCIe controller? > > Sorry, I may not phrase this properly. These inbound transactions (DMA > write to DDR, from endpoint) do not terminate in the PCIe controller. > They are taken by the PCIe controller as PCIe transactions and will be > carried towards the designated memory on the host. So what is the StreamID used for these transactions? 
Is that a different StreamID from that of the DMAing device? If you want to avoid the SMMU effect on the transaction, you must make sure it doesn't match anything there. Thanks, M. -- Jazz is not dead. It just smells funny...
Re: Device address specific mapping of arm,mmu-500
On 30/05/17 17:49, Ray Jui wrote: > Hi Will, > > On 5/30/17 8:14 AM, Will Deacon wrote: >> On Mon, May 29, 2017 at 06:18:45PM -0700, Ray Jui wrote: >>> I'm writing to check with you to see if the latest arm-smmu.c driver in >>> v4.12-rc Linux for smmu-500 can support mapping that is only specific to >>> a particular physical address range while leave the rest still to be >>> handled by the client device. I believe this can already be supported by >>> the device tree binding of the generic IOMMU framework; however, it is >>> not clear to me whether or not the arm-smmu.c driver can support it. >>> >>> To give you some background information: >>> >>> We have a SoC that has PCIe root complex that has a build-in logic block >>> to forward MSI writes to ARM GICv3 ITS. Unfortunately, this logic block >>> has a HW bug that causes the MSI writes not parsed properly and can >>> potentially corrupt data in the internal FIFO. A workaround is to have >>> ARM MMU-500 takes care of all inbound transactions. I found that is >>> working after hooking up our PCIe root complex to MMU-500; however, even >>> with this optimized arm-smmu driver in v4.12, I'm still seeing a >>> significant Ethernet throughput drop in both the TX and RX directions. >>> The throughput drop is very significant at around 50% (but is already >>> much improved compared to other prior kernel versions at 70~90%). >> >> Did Robin's experiments help at all with this? >> >> http://www.linux-arm.org/git?p=linux-rm.git;a=shortlog;h=refs/heads/iommu/perf >> > > It looks like these are new optimizations that have not yet been merged > in v4.12? I'm going to give it a try. Actually, most of the stuff there did land in 4.12 - only the iommu/pgtable part is experimental stuff which hasn't been on the list yet (but hopefully should be soon). 
>>> One alternative is to only use MMU-500 for MSI writes towards >>> GITS_TRANSLATER register in the GICv3, i.e., if I can define a specific >>> region of physical address that I want MMU-500 to act on and leave the >>> rest of inbound transactions to be handled directly by our PCIe >>> controller, it can potentially work around the HW bug we have and at the >>> same time achieve optimal throughput. >> >> I don't think you can bypass the SMMU for MSIs unless you give them their >> own StreamIDs, which is likely to break things horribly in the kernel. You >> could try to create an identity mapping, but you'll still have the >> translation overhead and you'd probably end up having to supply your own DMA >> ops to manage the address space. I'm assuming that you need to prevent the >> physical address of the ITS from being allocated as an IOVA? > > Will, is that a HW limitation that the SMMU cannot be used, only for MSI > writes, in which case, the physical address range is very specific in > our ASIC that falls in the device memory region (e.g., below 0x8000)? Yes, either translation is enabled or it isn't - we don't have GART-style apertures. To segregate by address the best you can do is set up the page tables to identity-map all of the "untranslated" address space. As Will mentioned, if MSI writes could be distinguished from DMA writes by Stream ID, rather than by address, then there would be more options, but in the PCI case at least that's not generally possible. Robin. > In fact, what I need in this case is a static mapping from IOMMU on the > physical address of the GITS_TRANSLATER of the GICv3 ITS, which is the > address that MSI writes go to. This is to bypass the MSI forwarding > logic in our PCIe controller. At the same time, I can leave the rest of > inbound transactions to be handled by our PCIe controller without going > through the MMU. > >> >>> Any feedback from you is greatly appreciated! 
>> >> Fix the hardware ;) > > Indeed that has to happen with the next revision of the ASIC. But as you > can see I'm getting quite desperate here trying to find an interim solution. > >> >> Will >> > > Thanks for the help! > > Ray > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
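[Editorial note: Robin's suggestion above — keeping the ITS doorbell physical address out of the IOVA space — can be illustrated with a small user-space sketch of the reservation check. This is only an illustration of the logic, not kernel code; the doorbell base and size constants below are invented for the example (the real GITS_TRANSLATER address comes from the GIC ITS node).]

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical ITS doorbell window; real values come from the GIC ITS
 * node, these constants are made up for the example. */
#define ITS_DOORBELL_BASE 0x08020000ULL
#define ITS_DOORBELL_SIZE 0x00010000ULL

/* Return the lowest IOVA at or above 'start' such that
 * [iova, iova + size) does not overlap the reserved doorbell window. */
uint64_t alloc_iova(uint64_t start, uint64_t size)
{
    if (start < ITS_DOORBELL_BASE + ITS_DOORBELL_SIZE &&
        start + size > ITS_DOORBELL_BASE)
        return ITS_DOORBELL_BASE + ITS_DOORBELL_SIZE; /* skip past it */
    return start;
}
```

An allocator built this way never hands out an IOVA that shadows the doorbell, which is the property Will's "prevent the physical address of the ITS from being allocated as an IOVA" remark is after.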
Re: Device address specific mapping of arm,mmu-500
Hi Marc, On 5/30/17 9:59 AM, Marc Zyngier wrote: > On 30/05/17 17:49, Ray Jui wrote: >> Hi Will, >> >> On 5/30/17 8:14 AM, Will Deacon wrote: >>> On Mon, May 29, 2017 at 06:18:45PM -0700, Ray Jui wrote: I'm writing to check with you to see if the latest arm-smmu.c driver in v4.12-rc Linux for smmu-500 can support mapping that is only specific to a particular physical address range while leave the rest still to be handled by the client device. I believe this can already be supported by the device tree binding of the generic IOMMU framework; however, it is not clear to me whether or not the arm-smmu.c driver can support it. To give you some background information: We have a SoC that has PCIe root complex that has a build-in logic block to forward MSI writes to ARM GICv3 ITS. Unfortunately, this logic block has a HW bug that causes the MSI writes not parsed properly and can potentially corrupt data in the internal FIFO. A workaround is to have ARM MMU-500 takes care of all inbound transactions. I found that is working after hooking up our PCIe root complex to MMU-500; however, even with this optimized arm-smmu driver in v4.12, I'm still seeing a significant Ethernet throughput drop in both the TX and RX directions. The throughput drop is very significant at around 50% (but is already much improved compared to other prior kernel versions at 70~90%). >>> >>> Did Robin's experiments help at all with this? >>> >>> http://www.linux-arm.org/git?p=linux-rm.git;a=shortlog;h=refs/heads/iommu/perf >>> >> >> It looks like these are new optimizations that have not yet been merged >> in v4.12? I'm going to give it a try. 
>> One alternative is to only use MMU-500 for MSI writes towards GITS_TRANSLATER register in the GICv3, i.e., if I can define a specific region of physical address that I want MMU-500 to act on and leave the rest of inbound transactions to be handled directly by our PCIe controller, it can potentially work around the HW bug we have and at the same time achieve optimal throughput. >>> >>> I don't think you can bypass the SMMU for MSIs unless you give them their >>> own StreamIDs, which is likely to break things horribly in the kernel. You >>> could try to create an identity mapping, but you'll still have the >>> translation overhead and you'd probably end up having to supply your own DMA >>> ops to manage the address space. I'm assuming that you need to prevent the >>> physical address of the ITS from being allocated as an IOVA? >> >> Will, is that a HW limitation that the SMMU cannot be used, only for MSI >> writes, in which case, the physical address range is very specific in >> our ASIC that falls in the device memory region (e.g., below 0x8000)? >> >> In fact, what I need in this case is a static mapping from IOMMU on the >> physical address of the GITS_TRANSLATER of the GICv3 ITS, which is the >> address that MSI writes go to. This is to bypass the MSI forwarding >> logic in our PCIe controller. At the same time, I can leave the rest of >> inbound transactions to be handled by our PCIe controller without going >> through the MMU. > > How is that going to work for DMA? I imagine your network interfaces do > have to access memory, don't they? How can the transactions be > terminated in the PCIe controller? Sorry, I may not phrase this properly. These inbound transactions (DMA write to DDR, from endpoint) do not terminate in the PCIe controller. They are taken by the PCIe controller as PCIe transactions and will be carried towards the designated memory on the host. > > Thanks, > > M. 
Re: Device address specific mapping of arm,mmu-500
On 30/05/17 17:49, Ray Jui wrote: > Hi Will, > > On 5/30/17 8:14 AM, Will Deacon wrote: >> On Mon, May 29, 2017 at 06:18:45PM -0700, Ray Jui wrote: >>> I'm writing to check with you to see if the latest arm-smmu.c driver in >>> v4.12-rc Linux for smmu-500 can support mapping that is only specific to >>> a particular physical address range while leave the rest still to be >>> handled by the client device. I believe this can already be supported by >>> the device tree binding of the generic IOMMU framework; however, it is >>> not clear to me whether or not the arm-smmu.c driver can support it. >>> >>> To give you some background information: >>> >>> We have a SoC that has PCIe root complex that has a build-in logic block >>> to forward MSI writes to ARM GICv3 ITS. Unfortunately, this logic block >>> has a HW bug that causes the MSI writes not parsed properly and can >>> potentially corrupt data in the internal FIFO. A workaround is to have >>> ARM MMU-500 takes care of all inbound transactions. I found that is >>> working after hooking up our PCIe root complex to MMU-500; however, even >>> with this optimized arm-smmu driver in v4.12, I'm still seeing a >>> significant Ethernet throughput drop in both the TX and RX directions. >>> The throughput drop is very significant at around 50% (but is already >>> much improved compared to other prior kernel versions at 70~90%). >> >> Did Robin's experiments help at all with this? >> >> http://www.linux-arm.org/git?p=linux-rm.git;a=shortlog;h=refs/heads/iommu/perf >> > > It looks like these are new optimizations that have not yet been merged > in v4.12? I'm going to give it a try. 
> >>> One alternative is to only use MMU-500 for MSI writes towards >>> GITS_TRANSLATER register in the GICv3, i.e., if I can define a specific >>> region of physical address that I want MMU-500 to act on and leave the >>> rest of inbound transactions to be handled directly by our PCIe >>> controller, it can potentially work around the HW bug we have and at the >>> same time achieve optimal throughput. >> >> I don't think you can bypass the SMMU for MSIs unless you give them their >> own StreamIDs, which is likely to break things horribly in the kernel. You >> could try to create an identity mapping, but you'll still have the >> translation overhead and you'd probably end up having to supply your own DMA >> ops to manage the address space. I'm assuming that you need to prevent the >> physical address of the ITS from being allocated as an IOVA? > > Will, is that a HW limitation that the SMMU cannot be used, only for MSI > writes, in which case, the physical address range is very specific in > our ASIC that falls in the device memory region (e.g., below 0x8000)? > > In fact, what I need in this case is a static mapping from IOMMU on the > physical address of the GITS_TRANSLATER of the GICv3 ITS, which is the > address that MSI writes go to. This is to bypass the MSI forwarding > logic in our PCIe controller. At the same time, I can leave the rest of > inbound transactions to be handled by our PCIe controller without going > through the MMU. How is that going to work for DMA? I imagine your network interfaces do have to access memory, don't they? How can the transactions be terminated in the PCIe controller? Thanks, M. -- Jazz is not dead. It just smells funny... ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: Device address specific mapping of arm,mmu-500
Hi Will, On 5/30/17 8:14 AM, Will Deacon wrote: > On Mon, May 29, 2017 at 06:18:45PM -0700, Ray Jui wrote: >> I'm writing to check with you to see if the latest arm-smmu.c driver in >> v4.12-rc Linux for smmu-500 can support mapping that is only specific to >> a particular physical address range while leave the rest still to be >> handled by the client device. I believe this can already be supported by >> the device tree binding of the generic IOMMU framework; however, it is >> not clear to me whether or not the arm-smmu.c driver can support it. >> >> To give you some background information: >> >> We have a SoC that has PCIe root complex that has a build-in logic block >> to forward MSI writes to ARM GICv3 ITS. Unfortunately, this logic block >> has a HW bug that causes the MSI writes not parsed properly and can >> potentially corrupt data in the internal FIFO. A workaround is to have >> ARM MMU-500 takes care of all inbound transactions. I found that is >> working after hooking up our PCIe root complex to MMU-500; however, even >> with this optimized arm-smmu driver in v4.12, I'm still seeing a >> significant Ethernet throughput drop in both the TX and RX directions. >> The throughput drop is very significant at around 50% (but is already >> much improved compared to other prior kernel versions at 70~90%). > > Did Robin's experiments help at all with this? > > http://www.linux-arm.org/git?p=linux-rm.git;a=shortlog;h=refs/heads/iommu/perf > It looks like these are new optimizations that have not yet been merged in v4.12? I'm going to give it a try. >> One alternative is to only use MMU-500 for MSI writes towards >> GITS_TRANSLATER register in the GICv3, i.e., if I can define a specific >> region of physical address that I want MMU-500 to act on and leave the >> rest of inbound transactions to be handled directly by our PCIe >> controller, it can potentially work around the HW bug we have and at the >> same time achieve optimal throughput. 
> > I don't think you can bypass the SMMU for MSIs unless you give them their > own StreamIDs, which is likely to break things horribly in the kernel. You > could try to create an identity mapping, but you'll still have the > translation overhead and you'd probably end up having to supply your own DMA > ops to manage the address space. I'm assuming that you need to prevent the > physical address of the ITS from being allocated as an IOVA? Will, is that a HW limitation that the SMMU cannot be used, only for MSI writes, in which case, the physical address range is very specific in our ASIC that falls in the device memory region (e.g., below 0x8000)? In fact, what I need in this case is a static mapping from IOMMU on the physical address of the GITS_TRANSLATER of the GICv3 ITS, which is the address that MSI writes go to. This is to bypass the MSI forwarding logic in our PCIe controller. At the same time, I can leave the rest of inbound transactions to be handled by our PCIe controller without going through the MMU. > >> Any feedback from you is greatly appreciated! > > Fix the hardware ;) Indeed that has to happen with the next revision of the ASIC. But as you can see I'm getting quite desperate here trying to find an interim solution. > > Will > Thanks for the help! Ray ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v5 17/32] x86/mm: Add support to access boot related data in the clear
On 5/21/2017 2:16 AM, Borislav Petkov wrote: On Fri, May 19, 2017 at 03:50:32PM -0500, Tom Lendacky wrote: The "worker" function would be doing the loop through the setup data, but since the setup data is mapped inside the loop I can't do the __init calling the non-init function and still hope to consolidate the code. Maybe I'm missing something here... Hmm, I see what you mean. But the below change ontop doesn't fire any warnings here. Maybe your .config has something set which I don't... Check if you have CONFIG_DEBUG_SECTION_MISMATCH=y Thanks, Tom --- diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c index 55317ba3b6dc..199c983192ae 100644 --- a/arch/x86/mm/ioremap.c +++ b/arch/x86/mm/ioremap.c @@ -515,71 +515,50 @@ static bool memremap_is_efi_data(resource_size_t phys_addr, * Examine the physical address to determine if it is boot data by checking * it against the boot params setup_data chain. */ -static bool memremap_is_setup_data(resource_size_t phys_addr, - unsigned long size) +static bool +__memremap_is_setup_data(resource_size_t phys_addr, unsigned long size, bool early) { struct setup_data *data; u64 paddr, paddr_next; + u32 len; paddr = boot_params.hdr.setup_data; while (paddr) { - bool is_setup_data = false; if (phys_addr == paddr) return true; - data = memremap(paddr, sizeof(*data), - MEMREMAP_WB | MEMREMAP_DEC); + if (early) + data = early_memremap_decrypted(paddr, sizeof(*data)); + else + data = memremap(paddr, sizeof(*data), MEMREMAP_WB | MEMREMAP_DEC); paddr_next = data->next; + len = data->len; - if ((phys_addr > paddr) && (phys_addr < (paddr + data->len))) - is_setup_data = true; + if (early) + early_memunmap(data, sizeof(*data)); + else + memunmap(data); - memunmap(data); - if (is_setup_data) + if ((phys_addr > paddr) && (phys_addr < (paddr + data->len))) return true; paddr = paddr_next; } - return false; } -/* - * Examine the physical address to determine if it is boot data by checking - * it against the boot params setup_data chain 
(early boot version). - */ static bool __init early_memremap_is_setup_data(resource_size_t phys_addr, unsigned long size) { - struct setup_data *data; - u64 paddr, paddr_next; - - paddr = boot_params.hdr.setup_data; - while (paddr) { - bool is_setup_data = false; - - if (phys_addr == paddr) - return true; - - data = early_memremap_decrypted(paddr, sizeof(*data)); - - paddr_next = data->next; - - if ((phys_addr > paddr) && (phys_addr < (paddr + data->len))) - is_setup_data = true; - - early_memunmap(data, sizeof(*data)); - - if (is_setup_data) - return true; - - paddr = paddr_next; - } + return __memremap_is_setup_data(phys_addr, size, true); +} - return false; +static bool memremap_is_setup_data(resource_size_t phys_addr, unsigned long size) +{ + return __memremap_is_setup_data(phys_addr, size, false); } /* ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
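[Editorial note: the consolidated chain walk in Tom's diff above can be modelled in user space. The sketch below stands a byte buffer in for physical memory and mirrors the same containment check; the record layout and the offsets 0x100/0x200 are invented for the example.]

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct setup_data_hdr {
    uint64_t next;   /* "physical" offset of the next record; 0 ends the chain */
    uint32_t type;
    uint32_t len;    /* length of the payload following the header */
};

static uint8_t fake_phys_mem[4096];

/* Walk the chain like the consolidated helper: a hit is either the
 * header address itself or any address strictly inside the record. */
int is_setup_data(uint64_t head, uint64_t phys_addr)
{
    uint64_t paddr = head;

    while (paddr) {
        struct setup_data_hdr d;

        if (phys_addr == paddr)
            return 1;

        memcpy(&d, fake_phys_mem + paddr, sizeof(d));
        if (phys_addr > paddr && phys_addr < paddr + d.len)
            return 1;

        paddr = d.next;
    }
    return 0;
}

/* Build a two-record chain at invented offsets 0x100 and 0x200. */
int demo_is_setup_data(uint64_t phys_addr)
{
    struct setup_data_hdr a = { 0x200, 1, 0x40 };
    struct setup_data_hdr b = { 0,     2, 0x10 };

    memcpy(fake_phys_mem + 0x100, &a, sizeof(a));
    memcpy(fake_phys_mem + 0x200, &b, sizeof(b));
    return is_setup_data(0x100, phys_addr);
}
```

The only difference between the early and late kernel variants is then how the header gets mapped (early_memremap_decrypted() vs memremap()), which is exactly what the bool parameter in the diff selects.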
Re: [PATCH v5 29/32] x86/mm: Add support to encrypt the kernel in-place
On 5/26/2017 11:25 AM, Borislav Petkov wrote:
> On Thu, May 25, 2017 at 05:24:27PM -0500, Tom Lendacky wrote:
>> I guess I could do that, but this will probably only end up clearing a
>> single PGD entry anyway since it's highly doubtful the address range
>> would cross a 512GB boundary.
>
> Or you can compute how many 512G-covering, i.e., PGD entries there are
> and clear just the right amount. :^)
>
>> I can change the name. As for the use of ENTRY... without the
>> ENTRY/ENDPROC combination I was receiving a warning about a return
>> instruction outside of a callable function. It looks like I can just
>> define the "sme_enc_routine:" label with the ENDPROC and the warning
>> goes away and the global is avoided. It doesn't like the local labels
>> (.L...) so I'll use the new name.
>
> Is that warning from objtool or where does it come from?

Yes, it's from objtool:

  arch/x86/mm/mem_encrypt_boot.o: warning: objtool: .text+0xd2: return instruction outside of a callable function

> How do I trigger it locally?

I think having CONFIG_STACK_VALIDATION=y will trigger it.

>> The hardware will try to optimize rep movsb into large chunks assuming
>> things are aligned, sizes are large enough, etc. so we don't have to
>> explicitly specify and setup for a rep movsq.
>
> I thought the hw does that for movsq too?

It does.

Thanks,
Tom
Re: [PATCH v5 32/32] x86/mm: Add support to make use of Secure Memory Encryption
On 5/19/2017 3:16 PM, Josh Poimboeuf wrote: On Fri, May 19, 2017 at 01:30:05PM +0200, Borislav Petkov wrote: it is called so early. I can get past it by adding: CFLAGS_mem_encrypt.o := $(nostackp) in the arch/x86/mm/Makefile, but that obviously eliminates the support for the whole file. Would it be better to split out the sme_enable() and other boot routines into a separate file or just apply the $(nostackp) to the whole file? Josh might have a better idea here... CCed. I'm the stack validation guy, not the stack protection guy :-) But there is a way to disable compiler options on a per-function basis with the gcc __optimize__ function attribute. For example: __attribute__((__optimize__("no-stack-protector"))) I'll look at doing that instead of removing the support for the whole file. Thanks, Tom ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
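[Editorial note: the per-function attribute Josh suggests can be exercised in a trivial user-space translation unit. A minimal sketch, assuming GCC: the attribute turns -fstack-protector off for just this function (Clang warns that it ignores the optimize attribute, which is harmless here). The function name is invented for the example.]

```c
#include <assert.h>

/* Drop stack-protector instrumentation for just this function while the
 * rest of the translation unit keeps whatever -fstack-protector setting
 * the build uses. */
__attribute__((__optimize__("no-stack-protector")))
int early_helper(int x)
{
    char buf[16] = { 0 };  /* a local array normally triggers a canary */

    buf[0] = (char)x;
    return buf[0];
}
```

Compiling with -fstack-protector-strong and inspecting the assembly shows no %gs canary load in early_helper(), which is the property needed before the gs base is set up in early boot.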
Re: [PATCH v5 32/32] x86/mm: Add support to make use of Secure Memory Encryption
On 5/19/2017 6:30 AM, Borislav Petkov wrote: On Fri, Apr 21, 2017 at 01:56:13PM -0500, Tom Lendacky wrote: On 4/18/2017 4:22 PM, Tom Lendacky wrote: Add support to check if SME has been enabled and if memory encryption should be activated (checking of command line option based on the configuration of the default state). If memory encryption is to be activated, then the encryption mask is set and the kernel is encrypted "in place." Signed-off-by: Tom Lendacky--- arch/x86/kernel/head_64.S |1 + arch/x86/mm/mem_encrypt.c | 83 +++-- 2 files changed, 80 insertions(+), 4 deletions(-) ... -unsigned long __init sme_enable(void) +unsigned long __init sme_enable(struct boot_params *bp) { + const char *cmdline_ptr, *cmdline_arg, *cmdline_on, *cmdline_off; + unsigned int eax, ebx, ecx, edx; + unsigned long me_mask; + bool active_by_default; + char buffer[16]; So it turns out that when KASLR is enabled (CONFIG_RAMDOMIZE_BASE=y) the stack-protector support causes issues with this function because What issues? The stack protection support makes use of the gs segment register and at this point not everything is setup properly to allow it to work, so it segfaults. Thanks, Tom it is called so early. I can get past it by adding: CFLAGS_mem_encrypt.o := $(nostackp) in the arch/x86/mm/Makefile, but that obviously eliminates the support for the whole file. Would it be better to split out the sme_enable() and other boot routines into a separate file or just apply the $(nostackp) to the whole file? Josh might have a better idea here... CCed. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v5 32/32] x86/mm: Add support to make use of Secure Memory Encryption
On 5/30/2017 9:55 AM, Borislav Petkov wrote: > On Tue, May 30, 2017 at 09:38:36AM -0500, Tom Lendacky wrote: >> In this case we're running identity mapped and the "on" constant ends up >> as kernel address (0x81...) which results in a segfault. > > Would > > static const char *__on_str = "on"; > > ... > > if (!strncmp(buffer, __pa_nodebug(__on_str), 2)) > ... > > work? > > __phys_addr_nodebug() seems to pay attention to phys_base and > PAGE_OFFSET and so on... Except that phys_base hasn't been adjusted yet so that doesn't work either. > > I'd like to avoid that rip-relative address finding in inline asm which > looks fragile to me. I can define the command line option and the "on" and "off" values as character buffers in the function and initialize them on a per character basis (using a static string causes the same issues as referencing a string constant), i.e.: char cmdline_arg[] = {'m', 'e', 'm', '_', 'e', 'n', 'c', 'r', 'y', 'p', 't', '\0'}; char cmdline_off[] = {'o', 'f', 'f', '\0'}; char cmdline_on[] = {'o', 'n', '\0'}; It doesn't look the greatest, but it works and removes the need for the rip-relative addressing. Thanks, Tom > > Thanks. > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
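[Editorial note: Tom's per-character buffer trick can be sketched in user space. The point is that initializing short char arrays element by element lets the compiler build them with immediate stores on the stack, so no pointer into .rodata — a non-identity-mapped kernel address at this point of boot — is ever formed. The function name and return convention below are invented for the example.]

```c
#include <assert.h>
#include <string.h>

/* Parse an "on"/"off" value without taking the address of a string
 * literal: the strings are materialised on the stack one character at
 * a time. Returns 1 for "on", 0 for "off", -1 if unrecognised. */
int cmdline_says_on(const char *buffer)
{
    char cmdline_on[]  = { 'o', 'n', '\0' };
    char cmdline_off[] = { 'o', 'f', 'f', '\0' };

    if (!strncmp(buffer, cmdline_on, sizeof(cmdline_on)))
        return 1;
    if (!strncmp(buffer, cmdline_off, sizeof(cmdline_off)))
        return 0;
    return -1;
}
```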
Re: [PATCH v5 32/32] x86/mm: Add support to make use of Secure Memory Encryption
On Tue, May 30, 2017 at 09:38:36AM -0500, Tom Lendacky wrote: > In this case we're running identity mapped and the "on" constant ends up > as kernel address (0x81...) which results in a segfault. Would static const char *__on_str = "on"; ... if (!strncmp(buffer, __pa_nodebug(__on_str), 2)) ... work? __phys_addr_nodebug() seems to pay attention to phys_base and PAGE_OFFSET and so on... I'd like to avoid that rip-relative address finding in inline asm which looks fragile to me. Thanks. -- Regards/Gruss, Boris. Good mailing practices for 400: avoid top-posting and trim the reply. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v5 32/32] x86/mm: Add support to make use of Secure Memory Encryption
On 5/19/2017 6:27 AM, Borislav Petkov wrote: On Tue, Apr 18, 2017 at 04:22:23PM -0500, Tom Lendacky wrote: Add support to check if SME has been enabled and if memory encryption should be activated (checking of command line option based on the configuration of the default state). If memory encryption is to be activated, then the encryption mask is set and the kernel is encrypted "in place." Signed-off-by: Tom Lendacky--- arch/x86/kernel/head_64.S |1 + arch/x86/mm/mem_encrypt.c | 83 +++-- 2 files changed, 80 insertions(+), 4 deletions(-) ... +unsigned long __init sme_enable(struct boot_params *bp) { + const char *cmdline_ptr, *cmdline_arg, *cmdline_on, *cmdline_off; + unsigned int eax, ebx, ecx, edx; + unsigned long me_mask; + bool active_by_default; + char buffer[16]; + u64 msr; + + /* Check for the SME support leaf */ + eax = 0x8000; + ecx = 0; + native_cpuid(, , , ); + if (eax < 0x801f) + goto out; + + /* +* Check for the SME feature: +* CPUID Fn8000_001F[EAX] - Bit 0 +* Secure Memory Encryption support +* CPUID Fn8000_001F[EBX] - Bits 5:0 +* Pagetable bit position used to indicate encryption +*/ + eax = 0x801f; + ecx = 0; + native_cpuid(, , , ); + if (!(eax & 1)) + goto out; < newline here. + me_mask = 1UL << (ebx & 0x3f); + + /* Check if SME is enabled */ + msr = __rdmsr(MSR_K8_SYSCFG); + if (!(msr & MSR_K8_SYSCFG_MEM_ENCRYPT)) + goto out; + + /* +* Fixups have not been applied to phys_base yet, so we must obtain +* the address to the SME command line option data in the following +* way. 
+*/ + asm ("lea sme_cmdline_arg(%%rip), %0" +: "=r" (cmdline_arg) +: "p" (sme_cmdline_arg)); + asm ("lea sme_cmdline_on(%%rip), %0" +: "=r" (cmdline_on) +: "p" (sme_cmdline_on)); + asm ("lea sme_cmdline_off(%%rip), %0" +: "=r" (cmdline_off) +: "p" (sme_cmdline_off)); + + if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT)) + active_by_default = true; + else + active_by_default = false; + + cmdline_ptr = (const char *)((u64)bp->hdr.cmd_line_ptr | +((u64)bp->ext_cmd_line_ptr << 32)); + + cmdline_find_option(cmdline_ptr, cmdline_arg, buffer, sizeof(buffer)); + + if (strncmp(buffer, cmdline_on, sizeof(buffer)) == 0) + sme_me_mask = me_mask; Why doesn't simply if (!strncmp(buffer, "on", 2)) ... work? In this case we're running identity mapped and the "on" constant ends up as kernel address (0x81...) which results in a segfault. Thanks, Tom ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v7 3/3] iommu/arm-smmu-v3: Add workaround for Cavium ThunderX2 erratum #126
From: Geetha SowjanyaCavium ThunderX2 SMMU doesn't support MSI and also doesn't have unique irq lines for gerror, eventq and cmdq-sync. This patch addresses the issue by checking if any interrupt sources are using same irq number, then they are registered as shared irqs. Signed-off-by: Geetha Sowjanya --- Documentation/arm64/silicon-errata.txt |1 + drivers/iommu/arm-smmu-v3.c| 29 + 2 files changed, 26 insertions(+), 4 deletions(-) diff --git a/Documentation/arm64/silicon-errata.txt b/Documentation/arm64/silicon-errata.txt index 4693a32..42422f6 100644 --- a/Documentation/arm64/silicon-errata.txt +++ b/Documentation/arm64/silicon-errata.txt @@ -63,6 +63,7 @@ stable kernels. | Cavium | ThunderX Core | #27456 | CAVIUM_ERRATUM_27456 | | Cavium | ThunderX SMMUv2 | #27704 | N/A | | Cavium | ThunderX2 SMMUv3| #74 | N/A | +| Cavium | ThunderX2 SMMUv3| #126| N/A | || | | | | Freescale/NXP | LS2080A/LS1043A | A-008585| FSL_ERRATUM_A008585 | || | | | diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c index 4e80205..d2db01f 100644 --- a/drivers/iommu/arm-smmu-v3.c +++ b/drivers/iommu/arm-smmu-v3.c @@ -2232,6 +2232,25 @@ static void arm_smmu_setup_msis(struct arm_smmu_device *smmu) devm_add_action(dev, arm_smmu_free_msis, dev); } +static int get_irq_flags(struct arm_smmu_device *smmu, int irq) +{ + int match_count = 0; + + if (irq == smmu->evtq.q.irq) + match_count++; + if (irq == smmu->cmdq.q.irq) + match_count++; + if (irq == smmu->gerr_irq) + match_count++; + if (irq == smmu->priq.q.irq) + match_count++; + + if (match_count > 1) + return IRQF_SHARED | IRQF_ONESHOT; + + return IRQF_ONESHOT; +} + static int arm_smmu_setup_irqs(struct arm_smmu_device *smmu) { int ret, irq; @@ -2252,7 +2271,7 @@ static int arm_smmu_setup_irqs(struct arm_smmu_device *smmu) if (irq) { ret = devm_request_threaded_irq(smmu->dev, irq, NULL, arm_smmu_evtq_thread, - IRQF_ONESHOT, + get_irq_flags(smmu, irq), "arm-smmu-v3-evtq", smmu); if (ret < 0) dev_warn(smmu->dev, "failed to 
enable evtq irq\n"); @@ -2261,7 +2280,8 @@ static int arm_smmu_setup_irqs(struct arm_smmu_device *smmu) irq = smmu->cmdq.q.irq; if (irq) { ret = devm_request_irq(smmu->dev, irq, - arm_smmu_cmdq_sync_handler, 0, + arm_smmu_cmdq_sync_handler, + get_irq_flags(smmu, irq), "arm-smmu-v3-cmdq-sync", smmu); if (ret < 0) dev_warn(smmu->dev, "failed to enable cmdq-sync irq\n"); @@ -2270,7 +2290,8 @@ static int arm_smmu_setup_irqs(struct arm_smmu_device *smmu) irq = smmu->gerr_irq; if (irq) { ret = devm_request_irq(smmu->dev, irq, arm_smmu_gerror_handler, - 0, "arm-smmu-v3-gerror", smmu); + get_irq_flags(smmu, irq), + "arm-smmu-v3-gerror", smmu); if (ret < 0) dev_warn(smmu->dev, "failed to enable gerror irq\n"); } @@ -2280,7 +2301,7 @@ static int arm_smmu_setup_irqs(struct arm_smmu_device *smmu) if (irq) { ret = devm_request_threaded_irq(smmu->dev, irq, NULL, arm_smmu_priq_thread, - IRQF_ONESHOT, + get_irq_flags(smmu, irq), "arm-smmu-v3-priq", smmu); if (ret < 0) -- 1.7.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
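[Editorial note: the get_irq_flags() helper in the patch above is a pure function of the four irq numbers, so its behaviour is easy to check in isolation. The sketch below mirrors its logic with the struct replaced by scalar parameters; the IRQF values are copied from include/linux/interrupt.h.]

```c
#include <assert.h>

#define IRQF_SHARED  0x00000080  /* values mirror include/linux/interrupt.h */
#define IRQF_ONESHOT 0x00002000

/* If one irq number feeds more than one of the four SMMU interrupt
 * sources, it must be requested as a shared interrupt. */
int smmu_irq_flags(int irq, int evtq_irq, int cmdq_irq,
                   int gerr_irq, int priq_irq)
{
    int match_count = 0;

    match_count += (irq == evtq_irq);
    match_count += (irq == cmdq_irq);
    match_count += (irq == gerr_irq);
    match_count += (irq == priq_irq);

    if (match_count > 1)
        return IRQF_SHARED | IRQF_ONESHOT;
    return IRQF_ONESHOT;
}
```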
[PATCH v7 2/3] iommu/arm-smmu-v3: Add workaround for Cavium ThunderX2 erratum #74
From: Linu CherianCavium ThunderX2 SMMU implementation doesn't support page 1 register space and PAGE0_REGS_ONLY option is enabled as an errata workaround. This option when turned on, replaces all page 1 offsets used for EVTQ_PROD/CONS, PRIQ_PROD/CONS register access with page 0 offsets. SMMU resource size checks are now based on SMMU option PAGE0_REGS_ONLY, since resource size can be either 64k/128k. For this, arm_smmu_device_dt_probe/acpi_probe has been moved before platform_get_resource call, so that SMMU options are set beforehand. Signed-off-by: Linu Cherian Signed-off-by: Geetha Sowjanya --- Documentation/arm64/silicon-errata.txt |1 + .../devicetree/bindings/iommu/arm,smmu-v3.txt |6 ++ drivers/iommu/arm-smmu-v3.c| 64 +++- 3 files changed, 56 insertions(+), 15 deletions(-) diff --git a/Documentation/arm64/silicon-errata.txt b/Documentation/arm64/silicon-errata.txt index 10f2ddd..4693a32 100644 --- a/Documentation/arm64/silicon-errata.txt +++ b/Documentation/arm64/silicon-errata.txt @@ -62,6 +62,7 @@ stable kernels. | Cavium | ThunderX GICv3 | #23154 | CAVIUM_ERRATUM_23154 | | Cavium | ThunderX Core | #27456 | CAVIUM_ERRATUM_27456 | | Cavium | ThunderX SMMUv2 | #27704 | N/A | +| Cavium | ThunderX2 SMMUv3| #74 | N/A | || | | | | Freescale/NXP | LS2080A/LS1043A | A-008585| FSL_ERRATUM_A008585 | || | | | diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt b/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt index be57550..607e270 100644 --- a/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt +++ b/Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt @@ -49,6 +49,12 @@ the PCIe specification. - hisilicon,broken-prefetch-cmd : Avoid sending CMD_PREFETCH_* commands to the SMMU. +- cavium,cn9900-broken-page1-regspace +: Replaces all page 1 offsets used for EVTQ_PROD/CONS, + PRIQ_PROD/CONS register access with page 0 offsets. + Set for Caviun ThunderX2 silicon that doesn't support + SMMU page1 register space. 
+ ** Example smmu@2b40 { diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c index 380969a..4e80205 100644 --- a/drivers/iommu/arm-smmu-v3.c +++ b/drivers/iommu/arm-smmu-v3.c @@ -412,6 +412,9 @@ #define MSI_IOVA_BASE 0x800 #define MSI_IOVA_LENGTH0x10 +#define ARM_SMMU_PAGE0_REGS_ONLY(smmu) \ + ((smmu)->options & ARM_SMMU_OPT_PAGE0_REGS_ONLY) + static bool disable_bypass; module_param_named(disable_bypass, disable_bypass, bool, S_IRUGO); MODULE_PARM_DESC(disable_bypass, @@ -597,6 +600,7 @@ struct arm_smmu_device { u32 features; #define ARM_SMMU_OPT_SKIP_PREFETCH (1 << 0) +#define ARM_SMMU_OPT_PAGE0_REGS_ONLY(1 << 1) u32 options; struct arm_smmu_cmdqcmdq; @@ -663,9 +667,19 @@ struct arm_smmu_option_prop { static struct arm_smmu_option_prop arm_smmu_options[] = { { ARM_SMMU_OPT_SKIP_PREFETCH, "hisilicon,broken-prefetch-cmd" }, + { ARM_SMMU_OPT_PAGE0_REGS_ONLY, "cavium,cn9900-broken-page1-regspace"}, { 0, NULL}, }; +static inline void __iomem *arm_smmu_page1_fixup(unsigned long offset, +struct arm_smmu_device *smmu) +{ + if (offset > SZ_64K && ARM_SMMU_PAGE0_REGS_ONLY(smmu)) + offset -= SZ_64K; + + return smmu->base + offset; +} + static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom) { return container_of(dom, struct arm_smmu_domain, domain); @@ -1961,8 +1975,8 @@ static int arm_smmu_init_one_queue(struct arm_smmu_device *smmu, return -ENOMEM; } - q->prod_reg = smmu->base + prod_off; - q->cons_reg = smmu->base + cons_off; + q->prod_reg = arm_smmu_page1_fixup(prod_off, smmu); + q->cons_reg = arm_smmu_page1_fixup(cons_off, smmu); q->ent_dwords = dwords; q->q_base = Q_BASE_RWA; @@ -2363,8 +2377,10 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass) /* Event queue */ writeq_relaxed(smmu->evtq.q.q_base, smmu->base + ARM_SMMU_EVTQ_BASE); - writel_relaxed(smmu->evtq.q.prod, smmu->base + ARM_SMMU_EVTQ_PROD); - writel_relaxed(smmu->evtq.q.cons, smmu->base +
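[Editorial note: the arm_smmu_page1_fixup() helper added by this patch reduces to a pure function on register offsets, shown below without the smmu->base addition. 0x100a8 is the EVTQ_PROD offset used here as an example of a page-1 register.]

```c
#include <assert.h>

#define SZ_64K 0x10000UL

/* When only register page 0 exists (the ThunderX2 erratum), fold
 * page-1 offsets back into page 0. */
unsigned long page1_fixup(unsigned long offset, int page0_regs_only)
{
    if (offset > SZ_64K && page0_regs_only)
        offset -= SZ_64K;
    return offset;
}
```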
[PATCH v7 1/3] ACPI/IORT: Fixup SMMUv3 resource size for Cavium ThunderX2 SMMUv3 model
From: Linu Cherian

Cavium ThunderX2 implementation doesn't support second page in SMMU register space. Hence, resource size is set as 64k for this model.

Signed-off-by: Linu Cherian
Signed-off-by: Geetha Sowjanya
---
 drivers/acpi/arm64/iort.c | 10 +++++++++-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index c5fecf9..bba2b59 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -833,12 +833,20 @@ static void __init arm_smmu_v3_init_resources(struct resource *res,
 {
 	struct acpi_iort_smmu_v3 *smmu;
 	int num_res = 0;
+	unsigned long size = SZ_128K;
 
 	/* Retrieve SMMUv3 specific data */
 	smmu = (struct acpi_iort_smmu_v3 *)node->node_data;
 
+	/*
+	 * Override the size, for Cavium ThunderX2 implementation
+	 * which doesn't support the page 1 SMMU register space.
+	 */
+	if (smmu->model == ACPI_IORT_SMMU_CAVIUM_CN99XX)
+		size = SZ_64K;
+
 	res[num_res].start = smmu->base_address;
-	res[num_res].end = smmu->base_address + SZ_128K - 1;
+	res[num_res].end = smmu->base_address + size - 1;
 	res[num_res].flags = IORESOURCE_MEM;
 	num_res++;
-- 
1.7.1
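[Editorial note: the size override in this patch is a one-branch decision that can be stated as a pure function. In the sketch below the model constant's numeric value is illustrative only; the real definition lives in the ACPI IORT headers.]

```c
#include <assert.h>

#define SZ_64K  0x10000UL
#define SZ_128K 0x20000UL
/* Illustrative value; the real constant is defined in the IORT headers. */
#define ACPI_IORT_SMMU_CAVIUM_CN99XX 0x2

/* CN99xx has no page 1, so only the first 64K of SMMUv3 register
 * space is mapped; everything else gets the architectural 128K. */
unsigned long smmu_v3_resource_size(unsigned int model)
{
    if (model == ACPI_IORT_SMMU_CAVIUM_CN99XX)
        return SZ_64K;
    return SZ_128K;
}
```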
[PATCH v7 0/3] Cavium ThunderX2 SMMUv3 errata workarounds
Cavium ThunderX2 SMMUv3 implementation has two silicon errata:

1. Erratum #74: SMMU register alias Page 1 is not implemented
2. Erratum #126: SMMU doesn't support unique IRQ lines and also doesn't support MSI for gerror, eventq and cmdq-sync

The following patchset implements software workarounds for these two errata. This series is based on the patchset at https://www.spinics.net/lists/arm-kernel/msg578443.html

Changes since v6:
- Changed device tree compatible string to vendor specific.
- Rebased on Robin's latest "Update SMMU models for IORT rev. C" v2 patch. https://www.spinics.net/lists/arm-kernel/msg582809.html

Changes since v5:
- Rebased on Robin's "Update SMMU models for IORT rev. C" patch. https://www.spinics.net/lists/arm-kernel/msg580728.html
- Replaced ACPI_IORT_SMMU_V3_CAVIUM_CN99XX macro with ACPI_IORT_SMMU_CAVIUM_CN99XX

Changes since v4:
- Replaced all page1 offset macros ARM_SMMU_EVTQ/PRIQ_PROD/CONS with arm_smmu_page1_fixup(ARM_SMMU_EVTQ/PRIQ_PROD/CONS, smmu)

Changes since v3:
- Merged patches 1, 2 and 4 of version 3.
- Modified the page1_offset_adjust() and get_irq_flags() implementation as suggested by Robin.

Changes since v2:
- Updated the "Documentation/devicetree/bindings/iommu/arm,smmu-v3.txt" document with the new SMMU option used to enable the errata workaround.

Changes since v1:
- Since the use of the MIDR register was rejected and SMMU_IIDR is broken on this silicon, as suggested by Will Deacon, modified the patches to use the ThunderX2 SMMUv3 IORT model number to enable the errata workaround.
Geetha Sowjanya (1):
  iommu/arm-smmu-v3: Add workaround for Cavium ThunderX2 erratum #126

Linu Cherian (2):
  ACPI/IORT: Fixup SMMUv3 resource size for Cavium ThunderX2 SMMUv3 model
  iommu/arm-smmu-v3: Add workaround for Cavium ThunderX2 erratum #74

 Documentation/arm64/silicon-errata.txt        |  2 +
 .../devicetree/bindings/iommu/arm,smmu-v3.txt |  6 ++
 drivers/acpi/arm64/iort.c                     | 10 ++-
 drivers/iommu/arm-smmu-v3.c                   | 93
 4 files changed, 91 insertions(+), 20 deletions(-)
Re: [PATCH v2] iommu/arm-smmu-v3: Increase CMDQ drain timeout value
Hi Sunil,

On Mon, May 29, 2017 at 02:41:59PM +0530, Sunil Kovvuri wrote:
> On Fri, May 5, 2017 at 4:47 PM, wrote:
> > From: Sunil Goutham
> >
> > Processing queue full of TLB invalidation commands might
> > take more time on some platforms than current timeout
> > of 100us. So increased drain timeout value.
> >
> > Also now udelay time is increased exponentially for each poll.
> >
> > Signed-off-by: Sunil Goutham
[...]
> Gentle reminder for patch review.

Sorry, I forgot to push this out. Should be on my iommu/devel branch now.

Will
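The exponential back-off Sunil describes can be sketched as follows. This is an illustrative C model, not the actual arm-smmu-v3 driver code; the function name and the 1us/100us delay constants are assumptions chosen for the example:

```c
#include <assert.h>

#define DELAY_INIT_US 1UL
#define DELAY_MAX_US  100UL

/*
 * Model of polling with exponentially increasing delay: each retry
 * doubles the sleep, saturating at a cap, so a slow queue drain costs
 * far fewer poll iterations than fixed-interval polling. Returns the
 * total microseconds "slept"; the queue is assumed to drain after
 * 'drained_after' polls.
 */
static unsigned long poll_until_drained(int drained_after,
					unsigned long timeout_us)
{
	unsigned long slept = 0, delay = DELAY_INIT_US;
	int polls = 0;

	while (polls++ < drained_after) {	/* stand-in for !queue_empty() */
		if (slept >= timeout_us)
			break;			/* timed out */
		slept += delay;			/* udelay(delay) in a driver */
		delay *= 2;
		if (delay > DELAY_MAX_US)
			delay = DELAY_MAX_US;
	}
	return slept;
}
```

With a 1us initial delay, three polls cost 1+2+4 = 7us, and the per-poll delay saturates at the 100us cap rather than growing unbounded.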
Re: [PATCH 6/7] iommu/arm-smmu-v3: Add support for PCI ATS
On 30/05/17 11:28, Joerg Roedel wrote:
> On Wed, May 24, 2017 at 07:01:42PM +0100, Jean-Philippe Brucker wrote:
>> * TLB invalidation by range is batched and committed with a single sync.
>> Batching ATC invalidation is inconvenient, endpoints limit the number of
>> inflight invalidations. We'd have to count the number of invalidations
>> queued and send a sync periodically. In addition, I suspect we always
>> need a sync between TLB and ATC invalidation for the same page.
>
> This sounds like the number of outstanding ATS invalidations is not
> managed by the SMMU hardware, is that right?

Yes, the hardware doesn't know about ATS queue depth, it simply forwards
invalidations to the root complex. Doing a sync on the SMMU command queue
waits for all completions of outstanding ATS invalidations, but it is up
to the driver to limit ATS invalidations according to queue depth.

Thanks,
Jean
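The scheme Jean describes — counting queued ATC invalidations and issuing a sync whenever the endpoint's invalidate queue depth is reached — can be sketched like this. All names and the structure layout are illustrative assumptions, not the real SMMUv3 driver:

```c
#include <assert.h>

struct atc_batch {
	unsigned int queued;	/* invalidations since the last sync */
	unsigned int depth;	/* endpoint's invalidate queue depth */
	unsigned int syncs;	/* syncs issued so far */
};

/*
 * In a real driver this would be a CMD_SYNC on the SMMU command queue,
 * which waits for completion of all outstanding ATS invalidations.
 */
static void atc_issue_sync(struct atc_batch *b)
{
	b->syncs++;
	b->queued = 0;
}

/*
 * Queue one ATC invalidation per page. Because the SMMU does not track
 * ATS queue occupancy itself, the driver must bound the number of
 * in-flight invalidations and sync periodically.
 */
static void atc_invalidate_range(struct atc_batch *b, unsigned int pages)
{
	unsigned int i;

	for (i = 0; i < pages; i++) {
		/* build and submit one ATC_INV command here (elided) */
		if (++b->queued == b->depth)
			atc_issue_sync(b);
	}
}
```

A final sync would still be needed before the invalidated IOVAs are reused, mirroring the TLB-then-ATC ordering concern raised in the quoted mail.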
Re: [PATCH 2/7] dt-bindings: PCI: Describe ATS property for root complex nodes
On 30/05/17 11:01, Joerg Roedel wrote: > On Wed, May 24, 2017 at 07:01:38PM +0100, Jean-Philippe Brucker wrote: >> +- ats-supported: if present, the root complex supports the Address >> + Translation Service (ATS). It is able to interpret the AT field in PCIe >> + Transaction Layer Packets, and forward Translation Completions or >> + Invalidation Requests to endpoints. >> + >> + Device drivers should not enable ATS in endpoints unless this property >> + is present. > > Device drivers should never enable ATS, this is the job of the IOMMU > driver. This should be clarified in the text. Right, I will change this. Thanks, Jean ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 6/7] iommu/arm-smmu-v3: Add support for PCI ATS
On Wed, May 24, 2017 at 07:01:42PM +0100, Jean-Philippe Brucker wrote: > * TLB invalidation by range is batched and committed with a single sync. > Batching ATC invalidation is inconvenient, endpoints limit the number of > inflight invalidations. We'd have to count the number of invalidations > queued and send a sync periodically. In addition, I suspect we always > need a sync between TLB and ATC invalidation for the same page. This sounds like the number of outstanding ATS invalidations is not managed by the SMMU hardware, is that right? Joerg ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu/amd: Propagate IOVA allocation failure
On Fri, May 26, 2017 at 07:31:26PM +0100, Robin Murphy wrote:
> Unlike the old allocator, alloc_iova_fast() will return 0 if it failed
> to allocate a PFN. Since the callers of dma_ops_alloc_iova() would end
> up treating that as a valid address, translate it to the DMA_ERROR_CODE
> that they would expect.
>
> Fixes: 256e4621c21a ("iommu/amd: Make use of the generic IOVA allocator")
> Signed-off-by: Robin Murphy
> ---
>
> Just something I spotted whilst comparing dma_map_page() callchains...
>
> drivers/iommu/amd_iommu.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> index 63cacf5d6cf2..489dc302899e 100644
> --- a/drivers/iommu/amd_iommu.c
> +++ b/drivers/iommu/amd_iommu.c
> @@ -1555,6 +1555,9 @@ static unsigned long dma_ops_alloc_iova(struct device *dev,
> 	if (!pfn)
> 		pfn = alloc_iova_fast(&dma_dom->iovad, pages,
> 				      IOVA_PFN(dma_mask));
>
> +	if (!pfn)
> +		return DMA_ERROR_CODE;
> +

That shouldn't make a difference on x86 because the DMA_ERROR_CODE is 0
as well.

	Joerg
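Joerg's point can be seen in a toy model: with DMA_ERROR_CODE defined as 0 (its x86 value at the time of this thread), translating a failed PFN of 0 into DMA_ERROR_CODE returns the same value, so on that architecture the patch documents intent rather than changing behaviour. The names below are illustrative, not the real amd_iommu code:

```c
#include <assert.h>

#define DMA_ERROR_CODE 0UL	/* x86 value at the time of this thread */

/*
 * Model of the patched dma_ops_alloc_iova(): a 0 PFN from the fast
 * allocator is mapped to the explicit error code callers check for.
 */
static unsigned long alloc_iova_model(unsigned long pfn)
{
	if (!pfn)
		return DMA_ERROR_CODE;
	return pfn;
}
```

On architectures where DMA_ERROR_CODE is nonzero, this explicit translation is what keeps callers from treating 0 as a valid address.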
Re: [PATCH] iommu/vt-d: unwrap __get_valid_domain_for_dev()
On Mon, May 22, 2017 at 06:28:51PM +0800, Peter Xu wrote:
> We do find_domain() in __get_valid_domain_for_dev(), while we do the
> same thing in get_valid_domain_for_dev(). No need to do it twice.
>
> Signed-off-by: Peter Xu
> ---
> drivers/iommu/intel-iommu.c | 16 ++--
> 1 file changed, 2 insertions(+), 14 deletions(-)

Applied, thanks.
Re: [PATCH v6 1/6] iommu: of: Fix check for returning EPROBE_DEFER
On Sat, May 27, 2017 at 07:17:40PM +0530, Sricharan R wrote:
> Now with IOMMU probe deferral, we return -EPROBE_DEFER
> for masters that are connected to an IOMMU which is not
> probed yet, but going to get probed, so that we can attach
> the correct dma_ops. So while trying to defer the probe of
> the master, check if the of_iommu node that it is connected
> to is marked in DT as 'status=disabled', then the IOMMU is never
> going to get probed. So simply return NULL and let the master
> work without an IOMMU.
>
> Fixes: 7b07cbefb68d ("iommu: of: Handle IOMMU lookup failure with deferred probing or error")
> Reported-by: Geert Uytterhoeven
> Reviewed-by: Laurent Pinchart
> Tested-by: Will Deacon
> Tested-by: Magnus Damm
> Acked-by: Will Deacon
> Signed-off-by: Sricharan R
> ---
> drivers/iommu/of_iommu.c | 1 +
> 1 file changed, 1 insertion(+)

Sorry for the confusion, I meant to apply this version of the patch-set
instead of v5.
Re: [PATCH v5 1/5] iommu: of: Fix check for returning EPROBE_DEFER
On Tue, May 23, 2017 at 06:31:29PM +0530, Sricharan R wrote:
> Now with IOMMU probe deferral, we return -EPROBE_DEFER
> for masters that are connected to an IOMMU which is not
> probed yet, but going to get probed, so that we can attach
> the correct dma_ops. So while trying to defer the probe of
> the master, check if the of_iommu node that it is connected
> to is marked in DT as 'status=disabled', then the IOMMU is never
> going to get probed. So simply return NULL and let the master
> work without an IOMMU.
>
> Fixes: 7b07cbefb68d ("iommu: of: Handle IOMMU lookup failure with deferred probing or error")
> Signed-off-by: Sricharan R
> Reported-by: Geert Uytterhoeven
> Reviewed-by: Laurent Pinchart
> Tested-by: Will Deacon
> Tested-by: Magnus Damm
> Acked-by: Will Deacon

Applied all to iommu/fixes, thanks.
Re: [PATCH v2 1/2] ACPICA: IORT: Update SMMU models for IORT rev. C
On Mon, May 22, 2017 at 04:06:37PM +0100, Robin Murphy wrote:
> IORT revision C has been published with a number of new SMMU
> implementation identifiers. Since IORT doesn't have any way of falling
> back to a more generic model code, we really need Linux to know about
> these before vendors start updating their firmware tables to use them.
>
> CC: Rafael J. Wysocki
> CC: Robert Moore
> CC: Lv Zheng
> Acked-by: Robert Richter
> Tested-by: Robert Richter
> Signed-off-by: Robin Murphy
> ---
>
> v2: Update more comments, add Robert's tags.

I generally prefer 'Fixes' tags, can you please add them too?
Re: [PATCH v6 3/6] ACPI/IORT: Ignore all errors except EPROBE_DEFER
On Mon, May 29, 2017 at 10:36:42AM +0530, Sricharan R wrote: > Hi Rafael, > > On 5/28/2017 12:48 AM, Rafael J. Wysocki wrote: > > On Saturday, May 27, 2017 07:17:42 PM Sricharan R wrote: > >> While deferring the probe of IOMMU masters, xlate and > >> add_device callbacks called from iort_iommu_configure > >> can pass back error values like -ENODEV, which means > >> the IOMMU cannot be connected with that master for real > >> reasons. Before the IOMMU probe deferral, all such errors > >> were ignored. Now all those errors are propagated back, > >> killing the master's probe for such errors. Instead ignore > >> all the errors except EPROBE_DEFER, which is the only one > >> of concern and let the master work without IOMMU, thus > >> restoring the old behavior. Also make explicit that > >> acpi_dma_configure handles only -EPROBE_DEFER from > >> iort_iommu_configure. > >> > >> Fixes: 5a1bb638d567 ("drivers: acpi: Handle IOMMU lookup failure with > >> deferred probing or error") > >> Signed-off-by: Sricharan R> >> --- > >> drivers/acpi/arm64/iort.c | 6 ++ > >> drivers/acpi/scan.c | 4 ++-- > >> 2 files changed, 8 insertions(+), 2 deletions(-) > >> > >> diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c > >> index c5fecf9..16e101f 100644 > >> --- a/drivers/acpi/arm64/iort.c > >> +++ b/drivers/acpi/arm64/iort.c > >> @@ -782,6 +782,12 @@ const struct iommu_ops *iort_iommu_configure(struct > >> device *dev) > >>if (err) > >>ops = ERR_PTR(err); > >> > >> + /* Ignore all other errors apart from EPROBE_DEFER */ > >> + if (IS_ERR(ops) && (PTR_ERR(ops) != -EPROBE_DEFER)) { > >> + dev_dbg(dev, "Adding to IOMMU failed: %ld\n", PTR_ERR(ops)); > >> + ops = NULL; > >> + } > >> + > >>return ops; > >> } > >> > >> diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c > >> index e39ec7b..3a10d757 100644 > >> --- a/drivers/acpi/scan.c > >> +++ b/drivers/acpi/scan.c > >> @@ -1371,8 +1371,8 @@ int acpi_dma_configure(struct device *dev, enum > >> dev_dma_attr attr) > 
>>iort_set_dma_mask(dev); > >> > >>iommu = iort_iommu_configure(dev); > >> - if (IS_ERR(iommu)) > >> - return PTR_ERR(iommu); > >> + if (IS_ERR(iommu) && PTR_ERR(iommu) == -EPROBE_DEFER) > >> + return -EPROBE_DEFER; > >> > >>size = max(dev->coherent_dma_mask, dev->coherent_dma_mask + 1); > >>/* > >> > > > > ACK for the scan.c change and I'm assuming this to go in via ARM64. > > > > Thanks for the ACK, should go through the IOMMU tree, since this fixes the > IOMMU probe deferral > that got merged through it. Yes and it would be good to get them merged for -rc4 given that there are bug fixes there, I agree the IOMMU tree is the way they should go upstream. Lorenzo ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Device address specific mapping of arm,mmu-500
Hi All,

I'm writing to check whether the latest arm-smmu.c driver in v4.12-rc Linux for MMU-500 can support a mapping that applies only to a particular physical address range, while leaving the rest to be handled directly by the client device. I believe this can already be described by the device tree binding of the generic IOMMU framework; however, it is not clear to me whether the arm-smmu.c driver supports it.

To give you some background: we have a SoC whose PCIe root complex has a built-in logic block to forward MSI writes to the ARM GICv3 ITS. Unfortunately, this logic block has a HW bug that causes MSI writes not to be parsed properly and can potentially corrupt data in the internal FIFO. A workaround is to have the ARM MMU-500 take care of all inbound transactions. I found that this works after hooking up our PCIe root complex to the MMU-500; however, even with the optimized arm-smmu driver in v4.12, I'm still seeing a significant Ethernet throughput drop in both the TX and RX directions. The drop is very significant at around 50% (but that is already much improved compared to prior kernel versions, which dropped 70~90%).

One alternative is to use the MMU-500 only for MSI writes towards the GITS_TRANSLATER register in the GICv3, i.e., if I can define a specific region of physical addresses that I want the MMU-500 to act on and leave the rest of the inbound transactions to be handled directly by our PCIe controller, it can potentially work around the HW bug we have and at the same time achieve optimal throughput.

Any feedback from you is greatly appreciated!

Best regards,
Ray
RE: [PATCH] iommu/amd: flush IOTLB for specific domains only (v3)
>-----Original Message-----
>From: Joerg Roedel [mailto:j...@8bytes.org]
>Sent: Monday, May 29, 2017 8:09 PM
>To: Nath, Arindam; Lendacky, Thomas
>Cc: iommu@lists.linux-foundation.org; amd-...@lists.freedesktop.org; Deucher, Alexander; Bridgman, John; dr...@endlessm.com; Suthikulpanit, Suravee; li...@endlessm.com; Craig Stein; mic...@daenzer.net; Kuehling, Felix; sta...@vger.kernel.org
>Subject: Re: [PATCH] iommu/amd: flush IOTLB for specific domains only (v3)
>
>Hi Arindam,
>
>I met Tom Lendacky last week in Nuremberg and he told me he is
>working on the same area of the code that this patch is for. His reason
>for touching this code was to solve some locking problems. Maybe you two
>can work together on a joint approach to improve this?

Sure Joerg, I will work with Tom. Thanks.

>
>On Mon, May 22, 2017 at 01:18:01PM +0530, arindam.n...@amd.com wrote:
>> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
>> index 63cacf5..1edeebec 100644
>> --- a/drivers/iommu/amd_iommu.c
>> +++ b/drivers/iommu/amd_iommu.c
>> @@ -2227,15 +2227,26 @@ static struct iommu_group *amd_iommu_device_group(struct device *dev)
>>
>> static void __queue_flush(struct flush_queue *queue)
>> {
>> -	struct protection_domain *domain;
>> -	unsigned long flags;
>> 	int idx;
>>
>> -	/* First flush TLB of all known domains */
>> -	spin_lock_irqsave(&amd_iommu_pd_lock, flags);
>> -	list_for_each_entry(domain, &amd_iommu_pd_list, list)
>> -		domain_flush_tlb(domain);
>> -	spin_unlock_irqrestore(&amd_iommu_pd_lock, flags);
>> +	/* First flush TLB of all domains which were added to flush queue */
>> +	for (idx = 0; idx < queue->next; ++idx) {
>> +		struct flush_queue_entry *entry;
>> +
>> +		entry = queue->entries + idx;
>> +
>> +		/*
>> +		 * There might be cases where multiple IOVA entries for the
>> +		 * same domain are queued in the flush queue. To avoid
>> +		 * flushing the same domain again, we check whether the
>> +		 * flag is set or not. This improves the efficiency of
>> +		 * flush operation.
>> +		 */
>> +		if (!entry->dma_dom->domain.already_flushed) {
>> +			entry->dma_dom->domain.already_flushed = true;
>> +			domain_flush_tlb(&entry->dma_dom->domain);
>> +		}
>> +	}
>
>There is also a race condition here I have to look into. It is not
>introduced by your patch, but needs fixing anyway. I'll look into this
>too.
>
>
>Regards,
>
>	Joerg
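The already_flushed dedup discussed in this thread can be modelled compactly. This is illustrative C with simplified names, not the actual amd_iommu code, and it resets the flags at the end of the drain for self-containment, which the real patch handles elsewhere:

```c
#include <assert.h>
#include <stdbool.h>

struct domain {
	bool already_flushed;
	int flush_count;	/* stands in for domain_flush_tlb() calls */
};

/*
 * Walk the flush queue and flush each domain's TLB at most once per
 * drain: duplicate IOVA entries for the same domain are skipped via
 * the already_flushed flag, which is cleared again for the next drain.
 */
static void queue_flush(struct domain **entries, int n)
{
	int i;

	for (i = 0; i < n; i++) {
		if (!entries[i]->already_flushed) {
			entries[i]->already_flushed = true;
			entries[i]->flush_count++;
		}
	}
	for (i = 0; i < n; i++)		/* reset for the next drain */
		entries[i]->already_flushed = false;
}
```

With a queue of five entries spread over two domains, each domain's TLB is flushed exactly once instead of five times.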