Re: [PATCH v2] iommu/iova: put free_iova_mem() outside of spinlock iova_rbtree_lock
On 2021/4/15 17:49, John Garry wrote:
> On 15/04/2021 04:52, chenxiang wrote:
>> From: Xiang Chen
>>
>> It is not necessary to put free_iova_mem() inside of the
>> spinlock/unlock of iova_rbtree_lock, which only leads to more
>> contention on the spinlock. The change gives a small performance
>> improvement. Also rename private_free_iova() to remove_iova(),
>> because the function no longer frees the iova after this change.
>>
>> Signed-off-by: Xiang Chen
>> ---
>>  drivers/iommu/iova.c | 13 +++++++------
>>  1 file changed, 7 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
>> index b7ecd5b..c10af23 100644
>> --- a/drivers/iommu/iova.c
>> +++ b/drivers/iommu/iova.c
>> @@ -412,12 +412,11 @@ private_find_iova(struct iova_domain *iovad, unsigned long pfn)
>>  	return NULL;
>>  }
>>
>> -static void private_free_iova(struct iova_domain *iovad, struct iova *iova)
>> +static void remove_iova(struct iova_domain *iovad, struct iova *iova)
>>  {
>>  	assert_spin_locked(&iovad->iova_rbtree_lock);
>>  	__cached_rbnode_delete_update(iovad, iova);
>>  	rb_erase(&iova->node, &iovad->rbroot);
>> -	free_iova_mem(iova);
>>  }
>>
>>  /**
>> @@ -452,8 +451,9 @@ __free_iova(struct iova_domain *iovad, struct iova *iova)
>>  	unsigned long flags;
>>
>>  	spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
>> -	private_free_iova(iovad, iova);
>> +	remove_iova(iovad, iova);
>>  	spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
>> +	free_iova_mem(iova);
>>  }
>>  EXPORT_SYMBOL_GPL(__free_iova);
>>
>> @@ -473,9 +473,9 @@ free_iova(struct iova_domain *iovad, unsigned long pfn)
>>  	spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
>>  	iova = private_find_iova(iovad, pfn);
>>  	if (iova)
>> -		private_free_iova(iovad, iova);
>> +		remove_iova(iovad, iova);
>>  	spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
>> -
>> +	free_iova_mem(iova);
>
> you need to make this:
>
> 	if (iova)
> 		free_iova_mem(iova);
>
> as free_iova_mem(iova) dereferences iova:
>
> 	if (iova->pfn_lo != IOVA_ANCHOR)
> 		kmem_cache_free(iova_cache, iova);
>
> So if iova were NULL, we crash. Or you can make free_iova_mem() NULL safe.

Right, it has the issue. What about changing it as below?
@@ -472,10 +472,13 @@ free_iova(struct iova_domain *iovad, unsigned long pfn)
 	spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
 	iova = private_find_iova(iovad, pfn);
-	if (iova)
-		private_free_iova(iovad, iova);
+	if (!iova) {
+		spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
+		return;
+	}
+	remove_iova(iovad, iova);
 	spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
-
+	free_iova_mem(iova);
 }
 EXPORT_SYMBOL_GPL(free_iova);

@@ -825,7 +825,8 @@ iova_magazine_free_pfns(struct iova_magazine *mag, struct iova_domain *iovad)
 		if (WARN_ON(!iova))
 			continue;

-		private_free_iova(iovad, iova);
+		remove_iova(iovad, iova);
+		free_iova_mem(iova);
 	}
 	spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
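The pattern under review — unlink the object while holding the lock, return early on lookup failure, and only free the memory after dropping the lock — can be sketched in standalone userspace C. This is a minimal illustrative model only: a pthread mutex and a singly linked list stand in for the spinlock and the rbtree, and all names (`free_node`, `remove_node`, etc.) are made up for the example, not the kernel's.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Illustrative stand-ins for struct iova and the domain's rbtree. */
struct node { unsigned long pfn; struct node *next; };
struct domain { pthread_mutex_t lock; struct node *head; };

static struct node *find_locked(struct domain *d, unsigned long pfn)
{
	for (struct node *n = d->head; n; n = n->next)
		if (n->pfn == pfn)
			return n;
	return NULL;
}

/* Unlink only -- like remove_iova(), it no longer frees the node. */
static void remove_node(struct domain *d, struct node *victim)
{
	struct node **pp = &d->head;

	while (*pp && *pp != victim)
		pp = &(*pp)->next;
	if (*pp)
		*pp = victim->next;
}

static void free_node(struct domain *d, unsigned long pfn)
{
	pthread_mutex_lock(&d->lock);
	struct node *n = find_locked(d, pfn);
	if (!n) {
		/* Early return: never reach the free with a NULL pointer. */
		pthread_mutex_unlock(&d->lock);
		return;
	}
	remove_node(d, n);
	pthread_mutex_unlock(&d->lock);
	free(n);	/* the free now happens outside the critical section */
}
```

The early `!n` return is what John asked for: the alternative (an unconditional free after the unlock) would dereference NULL when the lookup fails.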
Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs
On Thu, Apr 15, 2021 at 03:11:19PM +0200, Auger Eric wrote:
> Hi Jason,
>
> On 4/1/21 6:03 PM, Jason Gunthorpe wrote:
> > On Thu, Apr 01, 2021 at 02:08:17PM +, Liu, Yi L wrote:
> >
> >> DMA page faults are delivered to root-complex via page request message and
> >> it is per-device according to PCIe spec. Page request handling flow is:
> >>
> >> 1) iommu driver receives a page request from device
> >> 2) iommu driver parses the page request message. Get the RID, PASID, faulted
> >>    page and requested permissions etc.
> >> 3) iommu driver triggers fault handler registered by device driver with
> >>    iommu_report_device_fault()
> >
> > This seems confused.
> >
> > The PASID should define how to handle the page fault, not the driver.
>
> In my series I don't use PASID at all. I am just enabling nested stage
> and the guest uses a single context. I don't allocate any user PASID at
> any point.
>
> When there is a fault at physical level (a stage 1 fault that concerns
> the guest), the latter needs to be reported and injected into the
> guest. The vfio-pci driver registers a fault handler with the iommu layer
> and in that fault handler it fills a circular buffer and triggers an eventfd
> that is listened to by the VFIO-PCI QEMU device. The latter retrieves
> the fault from the mmapped circular buffer, knows which vIOMMU it is
> attached to, and passes the fault to the vIOMMU.
> Then the vIOMMU triggers an IRQ in the guest.
>
> We are reusing the existing concepts from VFIO, region, IRQ to do that.
>
> For that use case, would you also use /dev/ioasid?

/dev/ioasid could do all the things you described vfio-pci as doing, it
can even do them the same way you just described.

Stated another way, do you plan to duplicate all of this code someday
for vfio-cxl? What about for vfio-platform? ARM SMMU can be hooked to
platform devices, right?
I feel what you guys are struggling with is some choice in the iommu
kernel APIs that causes the events to be delivered to the pci_device
owner, not the PASID owner. That feels solvable.

Jason
Re: [PATCH v2 2/2] iommu/sva: Remove mm parameter from SVA bind API
Hi Christoph,

On Thu, 15 Apr 2021 07:44:59 +0100, Christoph Hellwig wrote:
> > *
> > * Returns 0 on success and < 0 on error.
> > @@ -28,6 +28,9 @@ int iommu_sva_alloc_pasid(struct mm_struct *mm,
> > ioasid_t min, ioasid_t max) int ret = 0;
> > 	ioasid_t pasid;
> >
> > +	if (mm != current->mm)
> > +		return -EINVAL;
> > +
>
> Why not remove the parameter entirely?

It was removed in my v1, but I thought it would be cleaner if we treat
iommu_sva_alloc_pasid() as a leaf function of iommu_sva_bind_device().
Then we don't have to do get_task_mm() every time. But to your point
below, it is better to let the low-level driver handle it.

> > @@ -2989,8 +2990,11 @@ iommu_sva_bind_device(struct device *dev, struct
> > mm_struct *mm, unsigned int fla return ERR_PTR(-ENODEV);
> >
> > 	/* Supervisor SVA does not need the current mm */
> > -	if ((flags & IOMMU_SVA_BIND_SUPERVISOR) && mm)
> > -		return ERR_PTR(-EINVAL);
> > +	if (!(flags & IOMMU_SVA_BIND_SUPERVISOR)) {
> > +		mm = get_task_mm(current);
> > +		if (!mm)
> > +			return ERR_PTR(-EINVAL);
> > +	}
>
> I don't see why we need the reference. I think we should just stop
> passing the mm to ->sva_bind and let the low-level driver deal with
> any reference to current->mm where needed.

The mm_users reference is just a precaution, in case a low-level driver
uses a kthread etc. I agree it is cleaner to just remove mm here and let
the low-level driver deal with it. Let me give it a spin.

Thanks,
Jacob
Re: [Resend RFC PATCH V2 09/12] swiotlb: Add bounce buffer remap address setting function
On Wed, Apr 14, 2021 at 10:49:42AM -0400, Tianyu Lan wrote:
> From: Tianyu Lan
>
> For a Hyper-V isolation VM with AMD SEV SNP, the bounce buffer (shared
> memory) needs to be accessed via an extra address space (e.g. address
> above bit 39). Hyper-V code may remap the extra address space outside
> of swiotlb. swiotlb_bounce()

May? Isn't this a MUST in this case?

> needs to use the remapped virtual address to copy data from/to the
> bounce buffer. Add a new interface swiotlb_set_bounce_remap() to do
> that.

I am a bit lost - why can't you use swiotlb_init and pass in your DMA
pool that is already above bit 39?

>
> Signed-off-by: Tianyu Lan
> ---
>  include/linux/swiotlb.h |  5 +
>  kernel/dma/swiotlb.c    | 13 -
>  2 files changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index d9c9fc9ca5d2..3ccd08116683 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -82,8 +82,13 @@ unsigned int swiotlb_max_segment(void);
>  size_t swiotlb_max_mapping_size(struct device *dev);
>  bool is_swiotlb_active(void);
>  void __init swiotlb_adjust_size(unsigned long new_size);
> +void swiotlb_set_bounce_remap(unsigned char *vaddr);
>  #else
>  #define swiotlb_force SWIOTLB_NO_FORCE
> +static inline void swiotlb_set_bounce_remap(unsigned char *vaddr)
> +{
> +}
> +
>  static inline bool is_swiotlb_buffer(phys_addr_t paddr)
>  {
>  	return false;
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 7c42df6e6100..5fd2db6aa149 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -94,6 +94,7 @@ static unsigned int io_tlb_index;
>  * not be bounced (unless SWIOTLB_FORCE is set).
>  */
>  static unsigned int max_segment;
> +static unsigned char *swiotlb_bounce_remap_addr;
>
>  /*
>  * We need to save away the original address corresponding to a mapped entry
> @@ -421,6 +422,11 @@ void __init swiotlb_exit(void)
>  	swiotlb_cleanup();
>  }
>
> +void swiotlb_set_bounce_remap(unsigned char *vaddr)
> +{
> +	swiotlb_bounce_remap_addr = vaddr;
> +}
> +
>  /*
>  * Bounce: copy the swiotlb buffer from or back to the original dma location
>  */
> @@ -428,7 +434,12 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr,
>  	size_t size, enum dma_data_direction dir)
>  {
>  	unsigned long pfn = PFN_DOWN(orig_addr);
> -	unsigned char *vaddr = phys_to_virt(tlb_addr);
> +	unsigned char *vaddr;
> +
> +	if (swiotlb_bounce_remap_addr)
> +		vaddr = swiotlb_bounce_remap_addr + tlb_addr - io_tlb_start;
> +	else
> +		vaddr = phys_to_virt(tlb_addr);
>
> 	if (PageHighMem(pfn_to_page(pfn))) {
> 		/* The buffer does not have a mapping. Map it in and copy */
> --
> 2.25.1
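The core of the patch is the virtual-address selection in swiotlb_bounce(): when a remap base has been registered, the bounce address is the remap base plus the offset of `tlb_addr` within the swiotlb pool; otherwise the direct map is used. A standalone userspace model of that arithmetic (all names mirror the patch, but the "physical" addresses and the direct map here are ordinary integers and heap buffers — an illustrative sketch, not swiotlb itself):

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the patch's globals. */
static unsigned char *swiotlb_bounce_remap_addr; /* NULL = no remap set */
static uint64_t io_tlb_start;                    /* "physical" pool base */
static unsigned char *direct_map_base;           /* models phys_to_virt() */

static unsigned char *phys_to_virt_model(uint64_t paddr)
{
	return direct_map_base + (paddr - io_tlb_start);
}

/* Mirrors the vaddr selection the patch adds to swiotlb_bounce(). */
static unsigned char *bounce_vaddr(uint64_t tlb_addr)
{
	if (swiotlb_bounce_remap_addr)
		return swiotlb_bounce_remap_addr + (tlb_addr - io_tlb_start);
	return phys_to_virt_model(tlb_addr);
}
```

The point of Konrad's "May?" question stands out in this form: once the remap base is set, every bounce goes through it, so the remap is effectively mandatory for the isolation-VM case.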
Re: [Resend RFC PATCH V2 07/12] HV/Vmbus: Initialize VMbus ring buffer for Isolation VM
On Wed, Apr 14, 2021 at 10:49:40AM -0400, Tianyu Lan wrote: > From: Tianyu Lan > > VMbus ring buffer are shared with host and it's need to > be accessed via extra address space of Isolation VM with > SNP support. This patch is to map the ring buffer > address in extra address space via ioremap(). HV host Why do you need to use ioremap()? Why not just use vmap? > visibility hvcall smears data in the ring buffer and > so reset the ring buffer memory to zero after calling > visibility hvcall. So you are exposing these two: EXPORT_SYMBOL_GPL(get_vm_area); EXPORT_SYMBOL_GPL(ioremap_page_range); But if you used vmap wouldn't you get the same thing for free? > > Signed-off-by: Tianyu Lan > --- > drivers/hv/channel.c | 10 + > drivers/hv/hyperv_vmbus.h | 2 + > drivers/hv/ring_buffer.c | 83 +-- > mm/ioremap.c | 1 + > mm/vmalloc.c | 1 + > 5 files changed, 76 insertions(+), 21 deletions(-) > > diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c > index 407b74d72f3f..4a9fb7ad4c72 100644 > --- a/drivers/hv/channel.c > +++ b/drivers/hv/channel.c > @@ -634,6 +634,16 @@ static int __vmbus_open(struct vmbus_channel *newchannel, > if (err) > goto error_clean_ring; > > + err = hv_ringbuffer_post_init(>outbound, > + page, send_pages); > + if (err) > + goto error_free_gpadl; > + > + err = hv_ringbuffer_post_init(>inbound, > + [send_pages], recv_pages); > + if (err) > + goto error_free_gpadl; > + > /* Create and init the channel open message */ > open_info = kzalloc(sizeof(*open_info) + > sizeof(struct vmbus_channel_open_channel), > diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h > index 0778add21a9c..d78a04ad5490 100644 > --- a/drivers/hv/hyperv_vmbus.h > +++ b/drivers/hv/hyperv_vmbus.h > @@ -172,6 +172,8 @@ extern int hv_synic_cleanup(unsigned int cpu); > /* Interface */ > > void hv_ringbuffer_pre_init(struct vmbus_channel *channel); > +int hv_ringbuffer_post_init(struct hv_ring_buffer_info *ring_info, > + struct page *pages, u32 page_cnt); > > int 
hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info, > struct page *pages, u32 pagecnt); > diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c > index 35833d4d1a1d..c8b0f7b45158 100644 > --- a/drivers/hv/ring_buffer.c > +++ b/drivers/hv/ring_buffer.c > @@ -17,6 +17,8 @@ > #include > #include > #include > +#include > +#include > > #include "hyperv_vmbus.h" > > @@ -188,6 +190,44 @@ void hv_ringbuffer_pre_init(struct vmbus_channel > *channel) > mutex_init(>outbound.ring_buffer_mutex); > } > > +int hv_ringbuffer_post_init(struct hv_ring_buffer_info *ring_info, > +struct page *pages, u32 page_cnt) > +{ > + struct vm_struct *area; > + u64 physic_addr = page_to_pfn(pages) << PAGE_SHIFT; > + unsigned long vaddr; > + int err = 0; > + > + if (!hv_isolation_type_snp()) > + return 0; > + > + physic_addr += ms_hyperv.shared_gpa_boundary; > + area = get_vm_area((2 * page_cnt - 1) * PAGE_SIZE, VM_IOREMAP); > + if (!area || !area->addr) > + return -EFAULT; > + > + vaddr = (unsigned long)area->addr; > + err = ioremap_page_range(vaddr, vaddr + page_cnt * PAGE_SIZE, > +physic_addr, PAGE_KERNEL_IO); > + err |= ioremap_page_range(vaddr + page_cnt * PAGE_SIZE, > + vaddr + (2 * page_cnt - 1) * PAGE_SIZE, > + physic_addr + PAGE_SIZE, PAGE_KERNEL_IO); > + if (err) { > + vunmap((void *)vaddr); > + return -EFAULT; > + } > + > + /* Clean memory after setting host visibility. */ > + memset((void *)vaddr, 0x00, page_cnt * PAGE_SIZE); > + > + ring_info->ring_buffer = (struct hv_ring_buffer *)vaddr; > + ring_info->ring_buffer->read_index = 0; > + ring_info->ring_buffer->write_index = 0; > + ring_info->ring_buffer->feature_bits.value = 1; > + > + return 0; > +} > + > /* Initialize the ring buffer. 
*/ > int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info, > struct page *pages, u32 page_cnt) > @@ -197,33 +237,34 @@ int hv_ringbuffer_init(struct hv_ring_buffer_info > *ring_info, > > BUILD_BUG_ON((sizeof(struct hv_ring_buffer) != PAGE_SIZE)); > > - /* > - * First page holds struct hv_ring_buffer, do wraparound mapping for > - * the rest. > - */ > - pages_wraparound = kcalloc(page_cnt * 2 - 1, sizeof(struct page *), > -GFP_KERNEL); > - if (!pages_wraparound) > - return -ENOMEM; > - > - pages_wraparound[0] = pages; > - for (i = 0; i < 2 * (page_cnt - 1); i++) > - pages_wraparound[i + 1] = [i %
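Both the vmap-based code being replaced and the ioremap_page_range() replacement build the same "double mapping": page 0 holds the ring-buffer header, and the data pages 1..n-1 are mapped a second time immediately after themselves, so a copy that crosses the end of the data area stays linear in virtual address space. The index arithmetic can be modeled on its own (the quoted hunk is truncated above, so the modulo form here is inferred from the surrounding code and should be treated as illustrative):

```c
/*
 * Model of the VMbus ring-buffer wraparound mapping: for a buffer of n
 * pages, virtual page 0 is the header (mapped once) and virtual pages
 * 1..2*(n-1) cycle through the n-1 data pages (each mapped twice).
 * Pure index math -- no real page tables involved.
 */
static int phys_page_for(int virt_page, int n)
{
	if (virt_page == 0)
		return 0;                        /* header page */
	return (virt_page - 1) % (n - 1) + 1;    /* data pages, repeated */
}
```

For n = 4, virtual pages 0..6 map to physical pages 0, 1, 2, 3, 1, 2, 3 — writing past physical page 3 lands back on page 1 without any explicit wraparound handling in the copy path.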
Re: [Resend RFC PATCH V2 06/12] HV/Vmbus: Add SNP support for VMbus channel initiate message
On Wed, Apr 14, 2021 at 10:49:39AM -0400, Tianyu Lan wrote: > From: Tianyu Lan > > The physical address of monitor pages in the CHANNELMSG_INITIATE_CONTACT > msg should be in the extra address space for SNP support and these What is this 'extra address space'? Is that just normal virtual address space of the Linux kernel? > pages also should be accessed via the extra address space inside Linux > guest and remap the extra address by ioremap function. OK, why do you need to use ioremap on them? Why not use vmap for example? What is it that makes ioremap the right candidate? > > Signed-off-by: Tianyu Lan > --- > drivers/hv/connection.c | 62 +++ > drivers/hv/hyperv_vmbus.h | 1 + > 2 files changed, 63 insertions(+) > > diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c > index 79bca653dce9..a0be9c11d737 100644 > --- a/drivers/hv/connection.c > +++ b/drivers/hv/connection.c > @@ -101,6 +101,12 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo > *msginfo, u32 version) > > msg->monitor_page1 = virt_to_phys(vmbus_connection.monitor_pages[0]); > msg->monitor_page2 = virt_to_phys(vmbus_connection.monitor_pages[1]); > + > + if (hv_isolation_type_snp()) { > + msg->monitor_page1 += ms_hyperv.shared_gpa_boundary; > + msg->monitor_page2 += ms_hyperv.shared_gpa_boundary; > + } > + > msg->target_vcpu = hv_cpu_number_to_vp_number(VMBUS_CONNECT_CPU); > > /* > @@ -145,6 +151,29 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo > *msginfo, u32 version) > return -ECONNREFUSED; > } > > + if (hv_isolation_type_snp()) { > + vmbus_connection.monitor_pages_va[0] > + = vmbus_connection.monitor_pages[0]; > + vmbus_connection.monitor_pages[0] > + = ioremap_cache(msg->monitor_page1, HV_HYP_PAGE_SIZE); > + if (!vmbus_connection.monitor_pages[0]) > + return -ENOMEM; > + > + vmbus_connection.monitor_pages_va[1] > + = vmbus_connection.monitor_pages[1]; > + vmbus_connection.monitor_pages[1] > + = ioremap_cache(msg->monitor_page2, HV_HYP_PAGE_SIZE); > + if 
(!vmbus_connection.monitor_pages[1]) { > + vunmap(vmbus_connection.monitor_pages[0]); > + return -ENOMEM; > + } > + > + memset(vmbus_connection.monitor_pages[0], 0x00, > +HV_HYP_PAGE_SIZE); > + memset(vmbus_connection.monitor_pages[1], 0x00, > +HV_HYP_PAGE_SIZE); > + } > + > return ret; > } > > @@ -156,6 +185,7 @@ int vmbus_connect(void) > struct vmbus_channel_msginfo *msginfo = NULL; > int i, ret = 0; > __u32 version; > + u64 pfn[2]; > > /* Initialize the vmbus connection */ > vmbus_connection.conn_state = CONNECTING; > @@ -213,6 +243,16 @@ int vmbus_connect(void) > goto cleanup; > } > > + if (hv_isolation_type_snp()) { > + pfn[0] = virt_to_hvpfn(vmbus_connection.monitor_pages[0]); > + pfn[1] = virt_to_hvpfn(vmbus_connection.monitor_pages[1]); > + if (hv_mark_gpa_visibility(2, pfn, > + VMBUS_PAGE_VISIBLE_READ_WRITE)) { > + ret = -EFAULT; > + goto cleanup; > + } > + } > + > msginfo = kzalloc(sizeof(*msginfo) + > sizeof(struct vmbus_channel_initiate_contact), > GFP_KERNEL); > @@ -279,6 +319,8 @@ int vmbus_connect(void) > > void vmbus_disconnect(void) > { > + u64 pfn[2]; > + > /* >* First send the unload request to the host. >*/ > @@ -298,6 +340,26 @@ void vmbus_disconnect(void) > vmbus_connection.int_page = NULL; > } > > + if (hv_isolation_type_snp()) { > + if (vmbus_connection.monitor_pages_va[0]) { > + vunmap(vmbus_connection.monitor_pages[0]); > + vmbus_connection.monitor_pages[0] > + = vmbus_connection.monitor_pages_va[0]; > + vmbus_connection.monitor_pages_va[0] = NULL; > + } > + > + if (vmbus_connection.monitor_pages_va[1]) { > + vunmap(vmbus_connection.monitor_pages[1]); > + vmbus_connection.monitor_pages[1] > + = vmbus_connection.monitor_pages_va[1]; > + vmbus_connection.monitor_pages_va[1] = NULL; > + } > + > + pfn[0] = virt_to_hvpfn(vmbus_connection.monitor_pages[0]); > + pfn[1] = virt_to_hvpfn(vmbus_connection.monitor_pages[1]); > + hv_mark_gpa_visibility(2, pfn, VMBUS_PAGE_NOT_VISIBLE); > + } > + > hv_free_hyperv_page((unsigned
Re: [PATCH v2 1/2] iommu/sva: Tighten SVA bind API with explicit flags
Hi Christoph,

Thanks for the review.

On Thu, 15 Apr 2021 07:40:33 +0100, Christoph Hellwig wrote:
> On Wed, Apr 14, 2021 at 08:27:56AM -0700, Jacob Pan wrote:
> >  static int idxd_enable_system_pasid(struct idxd_device *idxd)
> >  {
> > -	int flags;
> > +	unsigned int flags;
> >  	unsigned int pasid;
> >  	struct iommu_sva *sva;
> >
> > -	flags = SVM_FLAG_SUPERVISOR_MODE;
> > +	flags = IOMMU_SVA_BIND_SUPERVISOR;
> >
> > -	sva = iommu_sva_bind_device(&idxd->pdev->dev, NULL, &flags);
> > +	sva = iommu_sva_bind_device(&idxd->pdev->dev, NULL, flags);
>
> Please also remove the now pointless flags variable.

Good catch.

> > +iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, unsigned int flags)
>
> Please avoid the pointless overly long line.
>
> > -#define SVM_FLAG_GUEST_PASID (1<<3)
> > +#define SVM_FLAG_GUEST_PASID (1<<2)
>
> This flag is entirely unused, please just remove it in a prep patch
> rather than renumbering it.

You are right. The flag was set and intended to be used by the guest IO
page request patches by Baolu. As you might be aware, we are
restructuring the guest SVA uAPI according to Jason's proposal; can we
wait until we have a clear solution? We may refactor lots of code.

> >  static inline struct iommu_sva *
> > -iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void *drvdata)
> > +iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, unsigned int flags)
>
> Same overly long line here.

This is temporary as the mm parameter will be removed in the next patch.

Thanks,
Jacob
Re: [Resend RFC PATCH V2 04/12] HV: Add Write/Read MSR registers via ghcb
On Wed, Apr 14, 2021 at 10:49:37AM -0400, Tianyu Lan wrote: > From: Tianyu Lan > > Hyper-V provides GHCB protocol to write Synthetic Interrupt > Controller MSR registers and these registers are emulated by > Hypervisor rather than paravisor. What is paravisor? Is that the VMPL0 to borrow AMD speak from https://www.amd.com/system/files/TechDocs/SEV-SNP-strengthening-vm-isolation-with-integrity-protection-and-more.pdf > > Hyper-V requests to write SINTx MSR registers twice(once via > GHCB and once via wrmsr instruction including the proxy bit 21) Why? And what does 'proxy bit 21' mean? Isn't it just setting a bit on the MSR? Oh. Are you writting to it twice because you need to let the hypervisor know (Hyper-V) which is done via the WRMSR. And then the inner hypervisor (VMPL0) via the SINITx MSR? > Guest OS ID MSR also needs to be set via GHCB. > > Signed-off-by: Tianyu Lan > --- > arch/x86/hyperv/hv_init.c | 18 + > arch/x86/hyperv/ivm.c | 130 > arch/x86/include/asm/mshyperv.h | 87 + > arch/x86/kernel/cpu/mshyperv.c | 3 + > drivers/hv/hv.c | 65 +++- > include/asm-generic/mshyperv.h | 4 +- > 6 files changed, 261 insertions(+), 46 deletions(-) > > diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c > index 90e65fbf4c58..87b1dd9c84d6 100644 > --- a/arch/x86/hyperv/hv_init.c > +++ b/arch/x86/hyperv/hv_init.c > @@ -475,6 +475,9 @@ void __init hyperv_init(void) > > ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base); > *ghcb_base = ghcb_va; > + > + /* Hyper-V requires to write guest os id via ghcb in SNP IVM. 
*/ > + hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, guest_id); > } > > rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64); > @@ -561,6 +564,7 @@ void hyperv_cleanup(void) > > /* Reset our OS id */ > wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0); > + hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, 0); > > /* >* Reset hypercall page reference before reset the page, > @@ -668,17 +672,3 @@ bool hv_is_hibernation_supported(void) > return !hv_root_partition && acpi_sleep_state_supported(ACPI_STATE_S4); > } > EXPORT_SYMBOL_GPL(hv_is_hibernation_supported); > - > -enum hv_isolation_type hv_get_isolation_type(void) > -{ > - if (!(ms_hyperv.features_b & HV_ISOLATION)) > - return HV_ISOLATION_TYPE_NONE; > - return FIELD_GET(HV_ISOLATION_TYPE, ms_hyperv.isolation_config_b); > -} > -EXPORT_SYMBOL_GPL(hv_get_isolation_type); > - > -bool hv_is_isolation_supported(void) > -{ > - return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE; > -} > -EXPORT_SYMBOL_GPL(hv_is_isolation_supported); > diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c > index a5950b7a9214..2ec64b367aaf 100644 > --- a/arch/x86/hyperv/ivm.c > +++ b/arch/x86/hyperv/ivm.c > @@ -6,12 +6,139 @@ > * Tianyu Lan > */ > > +#include > +#include > #include > #include > #include > #include > +#include > +#include > #include > > +union hv_ghcb { > + struct ghcb ghcb; > +} __packed __aligned(PAGE_SIZE); > + > +void hv_ghcb_msr_write(u64 msr, u64 value) > +{ > + union hv_ghcb *hv_ghcb; > + void **ghcb_base; > + unsigned long flags; > + > + if (!ms_hyperv.ghcb_base) > + return; > + > + local_irq_save(flags); > + ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base); > + hv_ghcb = (union hv_ghcb *)*ghcb_base; > + if (!hv_ghcb) { > + local_irq_restore(flags); > + return; > + } > + > + memset(hv_ghcb, 0x00, HV_HYP_PAGE_SIZE); > + > + hv_ghcb->ghcb.protocol_version = 1; > + hv_ghcb->ghcb.ghcb_usage = 0; > + > + ghcb_set_sw_exit_code(_ghcb->ghcb, SVM_EXIT_MSR); > + ghcb_set_rcx(_ghcb->ghcb, msr); > + ghcb_set_rax(_ghcb->ghcb, 
lower_32_bits(value)); > + ghcb_set_rdx(_ghcb->ghcb, value >> 32); > + ghcb_set_sw_exit_info_1(_ghcb->ghcb, 1); > + ghcb_set_sw_exit_info_2(_ghcb->ghcb, 0); > + > + VMGEXIT(); > + > + if ((hv_ghcb->ghcb.save.sw_exit_info_1 & 0x) == 1) > + pr_warn("Fail to write msr via ghcb.\n."); > + > + local_irq_restore(flags); > +} > +EXPORT_SYMBOL_GPL(hv_ghcb_msr_write); > + > +void hv_ghcb_msr_read(u64 msr, u64 *value) > +{ > + union hv_ghcb *hv_ghcb; > + void **ghcb_base; > + unsigned long flags; > + > + if (!ms_hyperv.ghcb_base) > + return; > + > + local_irq_save(flags); > + ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base); > + hv_ghcb = (union hv_ghcb *)*ghcb_base; > + if (!hv_ghcb) { > + local_irq_restore(flags); > + return; > + } > + > + memset(hv_ghcb, 0x00, PAGE_SIZE); > + hv_ghcb->ghcb.protocol_version = 1; > + hv_ghcb->ghcb.ghcb_usage = 0; > + > + ghcb_set_sw_exit_code(_ghcb->ghcb, SVM_EXIT_MSR); > + ghcb_set_rcx(_ghcb->ghcb,
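In hv_ghcb_msr_write() the 64-bit MSR value is split across the GHCB's RAX slot (low 32 bits, via `lower_32_bits(value)`) and RDX slot (high 32 bits, via `value >> 32`), matching the x86 WRMSR register convention used by the SVM_EXIT_MSR exit. The split and its inverse are simple enough to check in isolation (helper names here are made up for the demo):

```c
#include <stdint.h>

/* Mirrors ghcb_set_rax(lower_32_bits(value)) / ghcb_set_rdx(value >> 32). */
static uint32_t lower_32_bits_model(uint64_t v) { return (uint32_t)v; }
static uint32_t upper_32_bits_model(uint64_t v) { return (uint32_t)(v >> 32); }

/* What the hypervisor side conceptually does with RDX:RAX on WRMSR. */
static uint64_t reassemble(uint32_t rax, uint32_t rdx)
{
	return (uint64_t)rdx << 32 | rax;
}
```

The read path (hv_ghcb_msr_read()) performs the reverse composition from the RAX/RDX values returned in the GHCB.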
Re: [PATCH 2/2] iommu/amd: Remove performance counter pre-initialization test
Hi Suravee! On 15/04/2021 10:28, Suthikulpanit, Suravee wrote: David, On 4/14/2021 10:33 PM, David Coe wrote: Hi Suravee! I've re-run your revert+update patch on Ubuntu's latest kernel 5.11.0-14 partly to check my mailer's 'mangling' hadn't also reached the code! There are 3 sets of results in the attachment, all for the Ryzen 2400G. The as-distributed kernel already incorporates your IOMMU RFCv3 patch. A. As-distributed kernel (cold boot) >5 retries, so no IOMMU read/write capability, no amd_iommu events. B. As-distributed kernel (warm boot) <5 retries, amd_iommu running stats show large numbers as before. C. Revert+Update kernel amd_iommu events listed and also show large hit/miss numbers. In due course, I'll load the new (revert+update) kernel on the 4700G but won't overload your mail-box unless something unusual turns up. Best regards, For the Ryzen 2400G, could you please try with: - 1 event at a time - Not more than 8 events (On your system, it has 2 banks x 4 counters/bank. I am trying to see if this issue might be related to the counters multiplexing). Thanks, Herewith similar one-at-a-time perf stat results for the Ryzen 4700U. As before, more than 8 events displays the third (%) column and (sometimes) 'silly' numbers in the first column. 
-- David $ sudo ./iommu_list.sh Performance counter stats for 'system wide': 20 amd_iommu_0/cmd_processed/ 10.003010666 seconds time elapsed Performance counter stats for 'system wide': 5 amd_iommu_0/cmd_processed_inv/ 10.002349464 seconds time elapsed Performance counter stats for 'system wide': 0 amd_iommu_0/ign_rd_wr_mmio_1ff8h/ 10.002386129 seconds time elapsed Performance counter stats for 'system wide': 325 amd_iommu_0/int_dte_hit/ 10.002346630 seconds time elapsed Performance counter stats for 'system wide': 2 amd_iommu_0/int_dte_mis/ 10.002365656 seconds time elapsed Performance counter stats for 'system wide': 356 amd_iommu_0/mem_dte_hit/ 10.002426866 seconds time elapsed Performance counter stats for 'system wide': 3,955 amd_iommu_0/mem_dte_mis/ 10.002729058 seconds time elapsed Performance counter stats for 'system wide': 4 amd_iommu_0/mem_iommu_tlb_pde_hit/ 10.002422610 seconds time elapsed Performance counter stats for 'system wide': 1,326 amd_iommu_0/mem_iommu_tlb_pde_mis/ 10.002397045 seconds time elapsed Performance counter stats for 'system wide': 1,009 amd_iommu_0/mem_iommu_tlb_pte_hit/ 10.002445347 seconds time elapsed Performance counter stats for 'system wide': 1,072 amd_iommu_0/mem_iommu_tlb_pte_mis/ 10.002414734 seconds time elapsed Performance counter stats for 'system wide': 0 amd_iommu_0/mem_pass_excl/ 10.002435482 seconds time elapsed Performance counter stats for 'system wide': 0 amd_iommu_0/mem_pass_pretrans/ 10.002409956 seconds time elapsed Performance counter stats for 'system wide': 3,405 amd_iommu_0/mem_pass_untrans/ 10.002563812 seconds time elapsed Performance counter stats for 'system wide': 0 amd_iommu_0/mem_target_abort/ 10.002473657 seconds time elapsed Performance counter stats for 'system wide': 1,311 amd_iommu_0/mem_trans_total/ 10.002471787 seconds time elapsed Performance counter stats for 'system wide': 0 amd_iommu_0/page_tbl_read_gst/ 10.002216033 seconds time elapsed Performance counter stats for 'system wide': 
276,609 amd_iommu_0/page_tbl_read_nst/ 10.002029261 seconds time elapsed Performance counter stats for 'system wide': 126,161 amd_iommu_0/page_tbl_read_tot/ 10.003569029 seconds time elapsed Performance counter stats for 'system wide': 0 amd_iommu_0/smi_blk/ 10.001871818 seconds time elapsed Performance counter stats for 'system wide': 0 amd_iommu_0/smi_recv/ 10.002212891 seconds time elapsed Performance counter stats for 'system wide': 0 amd_iommu_0/tlb_inv/ 10.002396606 seconds time elapsed Performance counter stats for 'system wide': 0 amd_iommu_0/vapic_int_guest/ 10.002435308 seconds time elapsed Performance counter stats for 'system wide': 340 amd_iommu_0/vapic_int_non_guest/ 10.002405868 seconds time elapsed $ sudo perf stat -e 'amd_iommu_0/mem_dte_hit/, amd_iommu_0/mem_dte_mis/, amd_iommu_0/mem_iommu_tlb_pde_hit/, amd_iommu_0/mem_iommu_tlb_pde_mis/, amd_iommu_0/mem_iommu_tlb_pte_hit/,
Re: [PATCH 3/3] iommu/virtio: Enable x86 support
On Thu, Mar 18, 2021 at 02:28:02PM -0400, Michael S. Tsirkin wrote:
> On Tue, Mar 16, 2021 at 08:16:54PM +0100, Jean-Philippe Brucker wrote:
> > With the VIOT support in place, x86 platforms can now use the
> > virtio-iommu.
> >
> > The arm64 Kconfig selects IOMMU_DMA, while x86 IOMMU drivers select it
> > themselves.
> >
> > Signed-off-by: Jean-Philippe Brucker
>
> Acked-by: Michael S. Tsirkin
>
> > ---
> >  drivers/iommu/Kconfig | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> > index 2819b5c8ec30..ccca83ef2f06 100644
> > --- a/drivers/iommu/Kconfig
> > +++ b/drivers/iommu/Kconfig
> > @@ -400,8 +400,9 @@ config HYPERV_IOMMU
> >  config VIRTIO_IOMMU
> >  	tristate "Virtio IOMMU driver"
> >  	depends on VIRTIO
> > -	depends on ARM64
> > +	depends on (ARM64 || X86)
> >  	select IOMMU_API
> > +	select IOMMU_DMA if X86
>
> Would it hurt to just select unconditionally? Seems a bit cleaner...

Yes, I think I'll do this for the moment.

Thanks,
Jean

> >  	select INTERVAL_TREE
> >  	select ACPI_VIOT if ACPI
> >  	help
> > --
> > 2.30.2
Re: [PATCH 3/3] iommu/virtio: Enable x86 support
On Thu, Mar 18, 2021 at 11:43:38AM +, Robin Murphy wrote:
> On 2021-03-16 19:16, Jean-Philippe Brucker wrote:
> > With the VIOT support in place, x86 platforms can now use the
> > virtio-iommu.
> >
> > The arm64 Kconfig selects IOMMU_DMA, while x86 IOMMU drivers select it
> > themselves.
>
> Actually, now that both AMD and Intel are converted over, maybe it's
> finally time to punt that to x86 arch code to match arm64?

x86 also has CONFIG_HYPERV_IOMMU that doesn't use IOMMU_DMA, and might
not want to pull in dma-iommu.o + iova.o (don't know if they care about
guest size). There also is the old gart driver, but that doesn't select
IOMMU_SUPPORT.

Thanks,
Jean
Re: [PATCH] iommu/fsl-pamu: Fix uninitialized variable warning
On Thu, Apr 15, 2021 at 04:44:42PM +0200, Joerg Roedel wrote:
> From: Joerg Roedel
>
> The variable 'i' in the function update_liodn_stash() is not
> initialized and only used in a debug printk(). So it has no
> meaning at all, remove it.
>
> Reported-by: kernel test robot
> Signed-off-by: Joerg Roedel

Sorry for introducing this. The fix looks good:

Reviewed-by: Christoph Hellwig
Re: [PATCH 2/3] ACPI: Add driver for the VIOT table
On Fri, Mar 19, 2021 at 11:44:26AM +0100, Auger Eric wrote:
> add some kernel-doc comments matching the explanation in the commit
> message?

Yes, I'll add some comments. I got rid of the other issues you pointed
out while reworking the driver; it should be more straightforward now.

Thanks,
Jean
[PATCH] iommu/fsl-pamu: Fix uninitialized variable warning
From: Joerg Roedel

The variable 'i' in the function update_liodn_stash() is not
initialized and only used in a debug printk(). So it has no
meaning at all, remove it.

Reported-by: kernel test robot
Signed-off-by: Joerg Roedel
---
 drivers/iommu/fsl_pamu_domain.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/fsl_pamu_domain.c b/drivers/iommu/fsl_pamu_domain.c
index 32944d67baa4..0ac781186dbd 100644
--- a/drivers/iommu/fsl_pamu_domain.c
+++ b/drivers/iommu/fsl_pamu_domain.c
@@ -57,14 +57,13 @@ static int __init iommu_init_mempool(void)
 static int update_liodn_stash(int liodn, struct fsl_dma_domain *dma_domain,
 			      u32 val)
 {
-	int ret = 0, i;
+	int ret = 0;
 	unsigned long flags;

 	spin_lock_irqsave(&iommu_lock, flags);
 	ret = pamu_update_paace_stash(liodn, val);
 	if (ret) {
-		pr_debug("Failed to update SPAACE %d field for liodn %d\n ",
-			 i, liodn);
+		pr_debug("Failed to update SPAACE for liodn %d\n ", liodn);
 		spin_unlock_irqrestore(&iommu_lock, flags);
 		return ret;
 	}
--
2.30.2
Re: [PATCH 2/2] iommu/amd: Remove performance counter pre-initialization test
I think you've put your finger on it, Suravee! On 15/04/2021 10:28, Suthikulpanit, Suravee wrote: David, On 4/14/2021 10:33 PM, David Coe wrote: Hi Suravee! I've re-run your revert+update patch on Ubuntu's latest kernel 5.11.0-14 partly to check my mailer's 'mangling' hadn't also reached the code! There are 3 sets of results in the attachment, all for the Ryzen 2400G. The as-distributed kernel already incorporates your IOMMU RFCv3 patch. A. As-distributed kernel (cold boot) >5 retries, so no IOMMU read/write capability, no amd_iommu events. B. As-distributed kernel (warm boot) <5 retries, amd_iommu running stats show large numbers as before. C. Revert+Update kernel amd_iommu events listed and also show large hit/miss numbers. In due course, I'll load the new (revert+update) kernel on the 4700G but won't overload your mail-box unless something unusual turns up. Best regards, For the Ryzen 2400G, could you please try with: - 1 event at a time - Not more than 8 events (On your system, it has 2 banks x 4 counters/bank. I am trying to see if this issue might be related to the counters multiplexing). Thanks, Attached are the results you requested for the 2400G along with a tiny shell-script. One event at a time and various batches of less than 8 events produce unexceptionable data. One final batch of 10 events and (hoopla) up go the counter stats. Will you be doing something in mitigation or does this just go with the patch? Is there anything further you need from me? I'll run the script on the 4700U but I don't expect surprises :-). 
All most appreciated, -- David iommu_list.sh Description: application/shellscript $ sudo ./iommu_list.sh Performance counter stats for 'system wide': 12 amd_iommu_0/cmd_processed/ 10.001266851 seconds time elapsed Performance counter stats for 'system wide': 11 amd_iommu_0/cmd_processed_inv/ 10.001259049 seconds time elapsed Performance counter stats for 'system wide': 0 amd_iommu_0/ign_rd_wr_mmio_1ff8h/ 10.000791810 seconds time elapsed Performance counter stats for 'system wide': 350 amd_iommu_0/int_dte_hit/ 10.000848437 seconds time elapsed Performance counter stats for 'system wide': 16 amd_iommu_0/int_dte_mis/ 10.001271989 seconds time elapsed Performance counter stats for 'system wide': 348 amd_iommu_0/mem_dte_hit/ 10.000808074 seconds time elapsed Performance counter stats for 'system wide': 211,925 amd_iommu_0/mem_dte_mis/ 10.000915362 seconds time elapsed Performance counter stats for 'system wide': 30 amd_iommu_0/mem_iommu_tlb_pde_hit/ 10.001520597 seconds time elapsed Performance counter stats for 'system wide': 450 amd_iommu_0/mem_iommu_tlb_pde_mis/ 10.000877493 seconds time elapsed Performance counter stats for 'system wide': 10,953 amd_iommu_0/mem_iommu_tlb_pte_hit/ 10.000831802 seconds time elapsed Performance counter stats for 'system wide': 13,235 amd_iommu_0/mem_iommu_tlb_pte_mis/ 10.001292003 seconds time elapsed Performance counter stats for 'system wide': 0 amd_iommu_0/mem_pass_excl/ 10.000836000 seconds time elapsed Performance counter stats for 'system wide': 0 amd_iommu_0/mem_pass_pretrans/ 10.000799887 seconds time elapsed Performance counter stats for 'system wide': 12,283 amd_iommu_0/mem_pass_untrans/ 10.000815339 seconds time elapsed Performance counter stats for 'system wide': 0 amd_iommu_0/mem_target_abort/ 10.001205168 seconds time elapsed Performance counter stats for 'system wide': 1,333 amd_iommu_0/mem_trans_total/ 10.000915359 seconds time elapsed Performance counter stats for 'system wide': 0 amd_iommu_0/page_tbl_read_gst/ 
10.001248235 seconds time elapsed Performance counter stats for 'system wide': 65 amd_iommu_0/page_tbl_read_nst/ 10.001266411 seconds time elapsed Performance counter stats for 'system wide': 78 amd_iommu_0/page_tbl_read_tot/ 10.001272406 seconds time elapsed Performance counter stats for 'system wide': 0 amd_iommu_0/smi_blk/ 10.001282912 seconds time elapsed Performance counter stats for 'system wide': 0 amd_iommu_0/smi_recv/ 10.001223193 seconds time elapsed Performance counter stats for 'system wide': 0 amd_iommu_0/tlb_inv/ 10.001234853 seconds time elapsed Performance counter stats for 'system wide': 0 amd_iommu_0/vapic_int_guest/
Re: [PATCH 1/3] ACPICA: iASL: Add definitions for the VIOT table
On Thu, Mar 18, 2021 at 06:52:44PM +0100, Auger Eric wrote: > Besides > Reviewed-by: Eric Auger Thanks, though this patch comes from ACPICA and has now been merged with the other ACPICA updates: https://lore.kernel.org/linux-acpi/20210406213028.718796-1-erik.kan...@intel.com/ Thanks, Jean
Re: [PATCH 2/3] ACPI: Add driver for the VIOT table
On Thu, Mar 18, 2021 at 07:36:50PM +, Robin Murphy wrote: > On 2021-03-16 19:16, Jean-Philippe Brucker wrote: > > The ACPI Virtual I/O Translation Table describes topology of > > para-virtual platforms. For now it describes the relation between > > virtio-iommu and the endpoints it manages. Supporting that requires > > three steps: > > > > (1) acpi_viot_init(): parse the VIOT table, build a list of endpoints > > and vIOMMUs. > > > > (2) acpi_viot_set_iommu_ops(): when the vIOMMU driver is loaded and the > > device probed, register it to the VIOT driver. This step is required > > because unlike similar drivers, VIOT doesn't create the vIOMMU > > device. > > Note that you're basically the same as the DT case in this regard, so I'd > expect things to be closer to that pattern than to that of IORT. > > [...] > > @@ -1506,12 +1507,17 @@ int acpi_dma_configure_id(struct device *dev, enum > > dev_dma_attr attr, > > { > > const struct iommu_ops *iommu; > > u64 dma_addr = 0, size = 0; > > + int ret; > > if (attr == DEV_DMA_NOT_SUPPORTED) { > > set_dma_ops(dev, &dma_dummy_ops); > > return 0; > > } > > + ret = acpi_viot_dma_setup(dev, attr); > > + if (ret) > > + return ret > 0 ? 0 : ret; > > I think things could do with a fair bit of refactoring here. Ideally we want > to process a possible _DMA method (acpi_dma_get_range()) regardless of which > flavour of IOMMU table might be present, and the amount of duplication we > fork into at this point is unfortunate. > > > + > > iort_dma_setup(dev, &dma_addr, &size); > > For starters I think most of that should be dragged out to this level here - > it's really only the {rc,nc}_dma_get_range() bit that deserves to be the > IORT-specific call. Makes sense, though I'll move it to drivers/acpi/arm64/dma.c instead of here, because it has only ever run on CONFIG_ARM64.
I don't want to accidentally break some x86 platform with an invalid _DMA method (same reason 7ad426398082 and 18b709beb503 kept this code in IORT) > > > iommu = iort_iommu_configure_id(dev, input_id); > > Similarly, it feels like it's only the table scan part in the middle of that > that needs dispatching between IORT/VIOT, and its head and tail pulled out > into a common path. Agreed > > [...] > > +static const struct iommu_ops *viot_iommu_setup(struct device *dev) > > +{ > > + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev); > > + struct viot_iommu *viommu = NULL; > > + struct viot_endpoint *ep; > > + u32 epid; > > + int ret; > > + > > + /* Already translated? */ > > + if (fwspec && fwspec->ops) > > + return NULL; > > + > > + mutex_lock(&viot_lock); > > + list_for_each_entry(ep, &viot_endpoints, list) { > > + if (viot_device_match(dev, &ep->dev_id, &epid)) { > > + epid += ep->endpoint_id; > > + viommu = ep->viommu; > > + break; > > + } > > + } > > + mutex_unlock(&viot_lock); > > + if (!viommu) > > + return NULL; > > + > > + /* We're not translating ourself */ > > + if (viot_device_match(dev, &viommu->dev_id, &epid)) > > + return NULL; > > + > > + /* > > +* If we found a PCI range managed by the viommu, we're the one that has > > +* to request ACS. > > +*/ > > + if (dev_is_pci(dev)) > > + pci_request_acs(); > > + > > + if (!viommu->ops || WARN_ON(!viommu->dev)) > > + return ERR_PTR(-EPROBE_DEFER); > > Can you create (or look up) a viommu->fwnode when initially parsing the VIOT > to represent the IOMMU devices to wait for, such that the > viot_device_match() lookup can resolve to that and let you fall into the > standard iommu_ops_from_fwnode() path?
> That's what I mean about following the DT pattern - I guess it might need a bit of trickery to rewrite things > if iommu_device_register() eventually turns up with a new fwnode, so I doubt > we can get away without *some* kind of private interface between > virtio-iommu and VIOT, but it would be nice for the common(ish) DMA paths to > stay as unaware of the specifics as possible. Yes I can reuse iommu_ops_from_fwnode(). Turns out it's really easy: if we move the VIOT initialization after acpi_scan_init(), we can use pci_get_domain_bus_and_slot() directly and create missing fwnodes. That gets rid of any need for a private interface between virtio-iommu and VIOT. > > > + > > + ret = iommu_fwspec_init(dev, viommu->dev->fwnode, viommu->ops); > > + if (ret) > > + return ERR_PTR(ret); > > + > > + iommu_fwspec_add_ids(dev, &epid, 1); > > + > > /* > > +* If we have reason to believe the IOMMU driver missed the initial > > +* add_device callback for dev, replay it to get things in order. > > +*/ > > + if (dev->bus && !device_iommu_mapped(dev)) > > + iommu_probe_device(dev); > > + > > + return viommu->ops; > > +} > > + > > +/** > > + * acpi_viot_dma_setup - Configure DMA for an endpoint described in
Re: [PATCH v2] iommu/vt-d: Force to flush iotlb before creating superpage
On Thu, Apr 15, 2021 at 08:46:28AM +0800, Longpeng(Mike) wrote: > Fixes: 6491d4d02893 ("intel-iommu: Free old page tables before creating > superpage") > Cc: # v3.0+ > Link: > https://lore.kernel.org/linux-iommu/670baaf8-4ff8-4e84-4be3-030b95ab5...@huawei.com/ > Suggested-by: Lu Baolu > Signed-off-by: Longpeng(Mike) > --- > v1 -> v2: > - add Joerg > - reconstruct the solution base on the Baolu's suggestion > --- > drivers/iommu/intel/iommu.c | 52 > + > 1 file changed, 38 insertions(+), 14 deletions(-) Applied, thanks.
Re: [PATCH] iommu/amd: Put newline after closing bracket in warning
On Mon, Apr 12, 2021 at 08:01:41PM +0200, Paul Menzel wrote: > drivers/iommu/amd/init.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) Applied, thanks.
Re: [PATCH 1/2] iommu/mediatek-v1: Avoid build fail when build as module
On Mon, Apr 12, 2021 at 02:48:42PM +0800, Yong Wu wrote: > When this driver build as module, It build fail like: > > ERROR: modpost: "of_phandle_iterator_args" > [drivers/iommu/mtk_iommu_v1.ko] undefined! > > This patch remove this interface to avoid this build fail. > > Reported-by: Valdis Kletnieks > Signed-off-by: Yong Wu > --- > Currently below patch is only in linux-next-20210409. This fixes tag may be > not needed. we can add this if it is need. > Fixes: 8de000cf0265 ("iommu/mediatek-v1: Allow building as module") > --- > drivers/iommu/mtk_iommu_v1.c | 62 > 1 file changed, 28 insertions(+), 34 deletions(-) Applied both, thanks.
Re: [PATCH] iommu/vt-d: Fix an error handling path in 'intel_prepare_irq_remapping()'
On Sun, Apr 11, 2021 at 09:08:17AM +0200, Christophe JAILLET wrote: > drivers/iommu/intel/irq_remapping.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) Applied, thanks.
Re: [PATCH 1/1] iommu/vt-d: Fix build error of pasid_enable_wpe() with !X86
On Sun, Apr 11, 2021 at 02:23:12PM +0800, Lu Baolu wrote: > drivers/iommu/intel/pasid.c | 2 ++ > 1 file changed, 2 insertions(+) Applied, thanks.
Re: [PATCH 0/2] iommu/amd: Revert and remove failing PMC test
On Fri, Apr 09, 2021 at 03:58:46AM -0500, Suravee Suthikulpanit wrote: > Paul Menzel (1): > Revert "iommu/amd: Fix performance counter initialization" > > Suravee Suthikulpanit (1): > iommu/amd: Remove performance counter pre-initialization test Applied, thanks Paul and Suravee.
Re: [PATCH RESEND] iommu/amd: Remove duplicate check of devid
On Fri, Apr 09, 2021 at 11:30:40AM +0800, Shaokun Zhang wrote: > drivers/iommu/amd/iommu.c | 9 + > 1 file changed, 1 insertion(+), 8 deletions(-) Applied, thanks.
Re: [PATCH] iommu: exynos: remove unneeded local variable initialization
On Thu, Apr 08, 2021 at 10:16:22PM +0200, Krzysztof Kozlowski wrote: > drivers/iommu/exynos-iommu.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) Applied, thanks.
Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs
Hi Jason, On 4/1/21 6:03 PM, Jason Gunthorpe wrote: > On Thu, Apr 01, 2021 at 02:08:17PM +, Liu, Yi L wrote: > >> DMA page faults are delivered to root-complex via page request message and >> it is per-device according to PCIe spec. Page request handling flow is: >> >> 1) iommu driver receives a page request from device >> 2) iommu driver parses the page request message. Get the RID,PASID, faulted >>page and requested permissions etc. >> 3) iommu driver triggers fault handler registered by device driver with >>iommu_report_device_fault() > > This seems confused. > > The PASID should define how to handle the page fault, not the driver. In my series I don't use PASID at all. I am just enabling nested stage and the guest uses a single context. I don't allocate any user PASID at any point. When there is a fault at physical level (a stage 1 fault that concerns the guest), the latter needs to be reported and injected into the guest. The vfio pci driver registers a fault handler to the iommu layer and in that fault handler it fills a circ buffer and triggers an eventfd that is listened to by the VFIO-PCI QEMU device. The latter retrieves the fault from the mmapped circ buffer, it knows which vIOMMU it is attached to, and passes the fault to the vIOMMU. Then the vIOMMU triggers an IRQ in the guest. We are reusing the existing concepts from VFIO, region, IRQ to do that. For that use case, would you also use /dev/ioasid? Thanks Eric > > I don't remember any device specific actions in ATS, so what is the > driver supposed to do? > >> 4) device driver's fault handler signals an event FD to notify userspace to >>fetch the information about the page fault. If it's VM case, inject the >>page fault to VM and let guest to solve it. > > If the PASID is set to 'report page fault to userspace' then some > event should come out of /dev/ioasid, or be reported to a linked > eventfd, or whatever.
> > If the PASID is set to 'SVM' then the fault should be passed to > handle_mm_fault > > And so on. > > Userspace chooses what happens based on how they configure the PASID > through /dev/ioasid. > > Why would a device driver get involved here? > >> Eric has sent below series for the page fault reporting for VM with passthru >> device. >> https://lore.kernel.org/kvm/20210223210625.604517-5-eric.au...@redhat.com/ > > It certainly should not be in vfio pci. Everything using a PASID needs > this infrastructure, VDPA, mdev, PCI, CXL, etc. > > Jason
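The reporting path Eric describes above (the host fault handler appends records to a circular buffer shared with userspace, then kicks an eventfd so the QEMU device can drain it) can be sketched roughly as follows. This is an illustrative mock, not the real VFIO/QEMU code: the record layout, the ring size, and the `signalled` flag standing in for `eventfd_signal()` are all made up for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RING_SLOTS 8

/* Hypothetical fault record; the real layout lives in the VFIO series. */
struct fault_record {
    uint32_t sid;      /* requester ID */
    uint32_t pasid;
    uint64_t addr;     /* faulting IOVA */
};

struct fault_ring {
    struct fault_record slots[RING_SLOTS];
    unsigned head;     /* producer index (host fault handler) */
    unsigned tail;     /* consumer index (userspace/QEMU)     */
    int signalled;     /* stand-in for eventfd_signal()       */
};

/* Producer side: called from the registered IOMMU fault handler. */
static int ring_push(struct fault_ring *r, const struct fault_record *rec)
{
    if (r->head - r->tail == RING_SLOTS)
        return -1;                      /* ring full, fault dropped */
    r->slots[r->head % RING_SLOTS] = *rec;
    r->head++;
    r->signalled = 1;                   /* "kick" the eventfd listener */
    return 0;
}

/* Consumer side: QEMU drains the ring after the eventfd fires,
 * then injects the fault into the vIOMMU. */
static int ring_pop(struct fault_ring *r, struct fault_record *out)
{
    if (r->tail == r->head)
        return -1;                      /* empty */
    *out = r->slots[r->tail % RING_SLOTS];
    r->tail++;
    return 0;
}
```

The single-producer/single-consumer indices mirror why an mmapped ring plus an eventfd needs no extra locking between host and guest-side consumer.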
Re: [Resend RFC PATCH V2 08/12] UIO/Hyper-V: Not load UIO HV driver in the isolation VM.
On 4/14/2021 11:45 PM, Greg KH wrote: On Wed, Apr 14, 2021 at 10:49:41AM -0400, Tianyu Lan wrote: From: Tianyu Lan UIO HV driver should not load in the isolation VM for security reason. Return ENOTSUPP in the hv_uio_probe() in the isolation VM. Signed-off-by: Tianyu Lan --- drivers/uio/uio_hv_generic.c | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/uio/uio_hv_generic.c b/drivers/uio/uio_hv_generic.c index 0330ba99730e..678b021d66f8 100644 --- a/drivers/uio/uio_hv_generic.c +++ b/drivers/uio/uio_hv_generic.c @@ -29,6 +29,7 @@ #include #include #include +#include #include "../hv/hyperv_vmbus.h" @@ -241,6 +242,10 @@ hv_uio_probe(struct hv_device *dev, void *ring_buffer; int ret; + /* UIO driver should not be loaded in the isolation VM.*/ + if (hv_is_isolation_supported()) + return -ENOTSUPP; + /* Communicating with host has to be via shared memory not hypercall */ if (!channel->offermsg.monitor_allocated) { dev_err(&dev->device, "vmbus channel requires hypercall\n"); -- 2.25.1 Again you send out known-wrong patches? :( Sorry for noise. Will fix this next version and I think we should make sure user space driver to check data from host. This patch will be removed.
Re: [Resend RFC PATCH V2 08/12] UIO/Hyper-V: Not load UIO HV driver in the isolation VM.
Hi Stephen: Thanks for your review. On 4/15/2021 12:17 AM, Stephen Hemminger wrote: On Wed, 14 Apr 2021 17:45:51 +0200 Greg KH wrote: On Wed, Apr 14, 2021 at 10:49:41AM -0400, Tianyu Lan wrote: From: Tianyu Lan UIO HV driver should not load in the isolation VM for security reason. Return ENOTSUPP in the hv_uio_probe() in the isolation VM. Signed-off-by: Tianyu Lan This is debatable, in isolation VM's shouldn't userspace take responsibility to validate host communication. If that is an issue please participate with the DPDK community (main user of this) to make sure netvsc userspace driver has the required checks. Agree. Will report back to the security team and request the change be added to the userspace netvsc driver. Thanks for the advice.
Re: [PATCH 2/2] iommu/sva: Remove mm parameter from SVA bind API
On Thu, Apr 15, 2021 at 01:33:24PM +0800, Lu Baolu wrote: > Hi Jason, > > On 4/14/21 7:26 PM, Jason Gunthorpe wrote: > > On Wed, Apr 14, 2021 at 02:22:09PM +0800, Lu Baolu wrote: > > > > > I still worry about supervisor pasid allocation. > > > > > > If we use iommu_sva_alloc_pasid() to allocate a supervisor pasid, which > > > mm should the pasid be set? I've ever thought about passing &init_mm to > > > iommu_sva_alloc_pasid(). But if you add "mm != current->mm", this seems > > > not to work. Or do you prefer a separated interface for supervisor pasid > > > allocation/free? > > > > Without a mm_struct it is not SVA, so don't use SVA APIs for whatever > > a 'supervisor pasid' is > > The supervisor PASID has its mm_struct. The only difference is that the > device will set priv=1 in its DMA transactions with the PASID. Something like init_mm should not be used with 'SVA' Jason
RE: [RFC PATCH v2 0/8] ACPI/IORT: Support for IORT RMR node
[+Lorenzo] > -Original Message- > From: Auger Eric [mailto:eric.au...@redhat.com] > Sent: 15 April 2021 10:49 > To: Shameerali Kolothum Thodi ; > linux-arm-ker...@lists.infradead.org; linux-a...@vger.kernel.org; > iommu@lists.linux-foundation.org; de...@acpica.org > Cc: Linuxarm ; steven.pr...@arm.com; Guohanjun > (Hanjun Guo) ; sami.muja...@arm.com; > robin.mur...@arm.com; wanghuiqiang ; > Jean-Philippe Brucker > Subject: Re: [RFC PATCH v2 0/8] ACPI/IORT: Support for IORT RMR node > > Hi Shameer, > > + Jean-Philippe > > > On 11/19/20 1:11 PM, Shameer Kolothum wrote: > > RFC v1 --> v2: > > - Added a generic interface for IOMMU drivers to retrieve all the > > RMR info associated with a given IOMMU. > > - SMMUv3 driver gets the RMR list during probe() and installs > > bypass STEs for all the SIDs in the RMR list. This is to keep > > the ongoing traffic alive(if any) during SMMUv3 reset. This is > >based on the suggestions received for v1 to take care of the > >EFI framebuffer use case. Only sanity tested for now. > > - During the probe/attach device, SMMUv3 driver reserves any > > RMR region associated with the device such that there is a unity > > mapping for them in SMMU. > > --- > > > > The series adds support to IORT RMR nodes specified in IORT > > Revision E -ARM DEN 0049E[0]. RMR nodes are used to describe memory > > ranges that are used by endpoints and require a unity mapping > > in SMMU. > > > > We have faced issues with 3408iMR RAID controller cards which > > fail to boot when SMMU is enabled. This is because these controllers > > make use of host memory for various caching related purposes and when > > SMMU is enabled the iMR firmware fails to access these memory regions > > as there is no mapping for them. IORT RMR provides a way for UEFI to > > describe and report these memory regions so that the kernel can make > > a unity mapping for these in SMMU. > > > > RFC because, Patch #1 is to update the actbl2.h and should be done > > through acpica update. 
I have sent out a pull request[1] for that. > > > > Tests: > > > > With a UEFI, that reports the RMR for the dev, > > > > [16F0h 5872 1] Type : 06 > > [16F1h 5873 2] Length : 007C > > [16F3h 5875 1] Revision : 00 > > [1038h 0056 2] Reserved : > > [1038h 0056 2] Identifier : > > [16F8h 5880 4]Mapping Count : 0001 > > [16FCh 5884 4] Mapping Offset : 0040 > > > > [1700h 5888 4]Number of RMR Descriptors : 0002 > > [1704h 5892 4]RMR Descriptor Offset : 0018 > > > > [1708h 5896 8] Base Address of RMR : E640 > > [1710h 5904 8]Length of RMR : 0010 > > [1718h 5912 4] Reserved : > > > > [171Ch 5916 8] Base Address of RMR : 27B0 > > [1724h 5924 8]Length of RMR : 00C0 > > [172Ch 5932 4] Reserved : > > > > [1730h 5936 4] Input base : > > [1734h 5940 4] ID Count : 0001 > > [1738h 5944 4] Output Base : 0003 > > [173Ch 5948 4] Output Reference : 0064 > > [1740h 5952 4]Flags (decoded below) : 0001 > >Single Mapping : 1 > > Following Jean-Philippe's suggestion I have used your series for nested > stage SMMUv3 integration, ie. to simplify the MSI nested stage mapping. > > Host allocates hIOVA -> physical doorbell (pDB) as it normally does for > VFIO device passthrough. IOVA Range is 0x800 - 0x810. > > I expose this MSI IOVA range to the guest as an RMR and as a result > guest has a flat mapping for this range. As the physical device is > programmed with hIOVA we have the following mapping: > > IOVAIPA PA > hIOVA -> hIOVA -> pDB > S1 s2 > > This works. > > The only weird thing is that I need to expose 256 RMRs due to the > 'Single Mapping' mandatory flag. I need to have 1 RMR per potential SID > on the bus. Please see the discussion here, https://op-lists.linaro.org/pipermail/linaro-open-discussions/2021-April/000150.html Hi Lorenzo, Maybe this is a use case for relaxing that single mapping requirement. Thanks, Shameer > > I will post a new version of SMMUv3 nested stage soon for people to test > & compare.
Obviously this removes a bunch of code on both SMMU/VFIO and > QEMU code so I think this solution looks better overall. > > Thanks > > Eric > > ... > > > > Without the series the RAID controller initialization fails as > > below, > > > > ... > > [ 12.631117] megaraid_sas :03:00.0: FW supports sync > cache: Yes > > [ 12.637360] megaraid_sas :03:00.0: megasas_disable_intr_fusion is > called outbound_intr_mask:0x4009 > > [
RE: [RFC PATCH v2 2/8] ACPI/IORT: Add support for RMR node parsing
Hi Eric, > -Original Message- > From: Auger Eric [mailto:eric.au...@redhat.com] > Sent: 15 April 2021 10:39 > To: Shameerali Kolothum Thodi ; > linux-arm-ker...@lists.infradead.org; linux-a...@vger.kernel.org; > iommu@lists.linux-foundation.org; de...@acpica.org > Cc: Linuxarm ; steven.pr...@arm.com; Guohanjun > (Hanjun Guo) ; sami.muja...@arm.com; > robin.mur...@arm.com; wanghuiqiang > Subject: Re: [RFC PATCH v2 2/8] ACPI/IORT: Add support for RMR node parsing > > Hi Shameer, > On 11/19/20 1:11 PM, Shameer Kolothum wrote: > > Add support for parsing RMR node information from ACPI. > > Find associated stream ids and smmu node info from the > > RMR node and populate a linked list with RMR memory > > descriptors. > > > > Signed-off-by: Shameer Kolothum > > --- > > drivers/acpi/arm64/iort.c | 122 > +- > > 1 file changed, 121 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c > > index 9929ff50c0c0..a9705aa35028 100644 > > --- a/drivers/acpi/arm64/iort.c > > +++ b/drivers/acpi/arm64/iort.c > > @@ -40,6 +40,25 @@ struct iort_fwnode { > > static LIST_HEAD(iort_fwnode_list); > > static DEFINE_SPINLOCK(iort_fwnode_lock); > > > > +struct iort_rmr_id { > > + u32 sid; > > + struct acpi_iort_node *smmu; > > +}; > > + > > +/* > > + * One entry for IORT RMR. 
> > + */ > > +struct iort_rmr_entry { > > + struct list_head list; > > + > > + unsigned int rmr_ids_num; > > + struct iort_rmr_id *rmr_ids; > > + > > + struct acpi_iort_rmr_desc *rmr_desc; > > +}; > > + > > +static LIST_HEAD(iort_rmr_list); /* list of RMR regions from ACPI > */ > > + > > /** > > * iort_set_fwnode() - Create iort_fwnode and use it to register > > *iommu data in the iort_fwnode_list > > @@ -393,7 +412,8 @@ static struct acpi_iort_node > *iort_node_get_id(struct acpi_iort_node *node, > > if (node->type == ACPI_IORT_NODE_NAMED_COMPONENT || > > node->type == ACPI_IORT_NODE_PCI_ROOT_COMPLEX || > > node->type == ACPI_IORT_NODE_SMMU_V3 || > > - node->type == ACPI_IORT_NODE_PMCG) { > > + node->type == ACPI_IORT_NODE_PMCG || > > + node->type == ACPI_IORT_NODE_RMR) { > > *id_out = map->output_base; > > return parent; > > } > > @@ -1647,6 +1667,103 @@ static void __init iort_enable_acs(struct > acpi_iort_node *iort_node) > > #else > > static inline void iort_enable_acs(struct acpi_iort_node *iort_node) { } > > #endif > > +static int iort_rmr_desc_valid(struct acpi_iort_rmr_desc *desc) > > +{ > > + struct iort_rmr_entry *e; > > + u64 end, start = desc->base_address, length = desc->length; > > + > > + if (!IS_ALIGNED(start, SZ_64K) || !IS_ALIGNED(length, SZ_64K)) > > + return -EINVAL; > > + > > + end = start + length - 1; > > + > > + /* Check for address overlap */ > I don't get this check. What is the problem if you attach the same range > to different stream ids. Shouldn't you check there is no overlap for the > same sid? That’s right. The check should be for memory descriptors within an RMR. I got confused by the wordings in the IORT spec and that is now clarified with Lorenzo here, https://op-lists.linaro.org/pipermail/linaro-open-discussions/2021-April/000150.html I will change this. 
> > + list_for_each_entry(e, &iort_rmr_list, list) { > > + u64 e_start = e->rmr_desc->base_address; > > + u64 e_end = e_start + e->rmr_desc->length - 1; > > + > > + if (start <= e_end && end >= e_start) > > + return -EINVAL; > > + } > > + > > + return 0; > > +} > > + > > +static int __init iort_parse_rmr(struct acpi_iort_node *iort_node) > > +{ > > + struct iort_rmr_id *rmr_ids, *ids; > > + struct iort_rmr_entry *e; > > + struct acpi_iort_rmr *rmr; > > + struct acpi_iort_rmr_desc *rmr_desc; > > + u32 map_count = iort_node->mapping_count; > > + int i, ret = 0, desc_count = 0; > > + > > + if (iort_node->type != ACPI_IORT_NODE_RMR) > > + return 0; > > + > > + if (!iort_node->mapping_offset || !map_count) { > > + pr_err(FW_BUG "Invalid ID mapping, skipping RMR node %p\n", > > + iort_node); > > + return -EINVAL; > > + } > > + > > + rmr_ids = kmalloc(sizeof(*rmr_ids) * map_count, GFP_KERNEL); > > + if (!rmr_ids) > > + return -ENOMEM; > > + > > + /* Retrieve associated smmu and stream id */ > > + ids = rmr_ids; > nit: do you need both rmr_ids and ids? Not really, as the spec says it is M:1 mapping. So we will only have a single id here (also map_count for the node must be set to 1 as well). > > + for (i = 0; i < map_count; i++, ids++) { > > + ids->smmu = iort_node_get_id(iort_node, &ids->sid, i); > > + if (!ids->smmu) { > > + pr_err(FW_BUG "Invalid SMMU reference, skipping RMR > node %p\n", > > +
Re: [PATCH v3 01/12] iommu: Introduce dirty log tracking framework
Hi, On 2021/4/15 15:43, Keqian Zhu wrote: design it as not switchable. I will modify the commit message of patch#12, thanks! I am not sure that I fully get your point. But I can't see any gaps of using iommu_dev_enable/disable_feature() to switch dirty log on and off. Perhaps I missed something. IOMMU_DEV_FEAT_HWDBM just tells the user whether the underlying IOMMU driver supports dirty tracking; it is not used to manage the status of dirty log tracking. The feature reporting is per device, but the status management is per iommu_domain. Only when all devices in a domain support HWDBM can we start dirty log for the domain. So why not for_each_device_attached_domain() iommu_dev_enable_feature(IOMMU_DEV_FEAT_HWDBM) ? And I think we'd better not mix the feature reporting and status management. Thoughts? I am worrying about having two sets of APIs for a single purpose. From the vendor iommu driver's point of view, this feature is per device. Hence, it still needs to do the same thing. Best regards, baolu
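The split being debated above — per-device *capability* reporting versus per-domain tracking *state* — can be mocked up in a few lines. None of the types or helpers below are real IOMMU-core APIs; they are stand-ins that only illustrate why the domain must check every attached device before dirty logging can be enabled:

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_DEVS 4

/* Per-device: does the hardware support dirty-bit management (HWDBM)?
 * Mirrors what IOMMU_DEV_FEAT_HWDBM reports, but as a plain flag. */
struct mock_dev {
    bool hwdbm;
};

/* Per-domain: the tracking state lives here, not on the device. */
struct mock_domain {
    struct mock_dev *devs[MAX_DEVS];
    int ndevs;
    bool dirty_log_on;
};

/* Analogue of "for_each_device_attached_domain(): all must support
 * HWDBM before the domain may start tracking". */
static int domain_start_dirty_log(struct mock_domain *d)
{
    for (int i = 0; i < d->ndevs; i++)
        if (!d->devs[i]->hwdbm)
            return -1;          /* one legacy device vetoes tracking */
    d->dirty_log_on = true;
    return 0;
}
```

The mock makes Baolu's point concrete: the capability check is inherently per device, while the on/off switch is a property of the whole domain, so the two are related but distinct operations.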
Re: [PATCH v2] iommu/iova: put free_iova_mem() outside of spinlock iova_rbtree_lock
On 15/04/2021 04:52, chenxiang wrote: From: Xiang Chen It is not necessary to put free_iova_mem() inside of spinlock/unlock iova_rbtree_lock which only leads to more completion for the spinlock. It has a small promote on the performance after the change. And also rename private_free_iova() as remove_iova() because the function will not free iova after that change. Signed-off-by: Xiang Chen --- drivers/iommu/iova.c | 13 +++-- 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c index b7ecd5b..c10af23 100644 --- a/drivers/iommu/iova.c +++ b/drivers/iommu/iova.c @@ -412,12 +412,11 @@ private_find_iova(struct iova_domain *iovad, unsigned long pfn) return NULL; } -static void private_free_iova(struct iova_domain *iovad, struct iova *iova) +static void remove_iova(struct iova_domain *iovad, struct iova *iova) { assert_spin_locked(&iovad->iova_rbtree_lock); __cached_rbnode_delete_update(iovad, iova); rb_erase(&iova->node, &iovad->rbroot); - free_iova_mem(iova); } /** @@ -452,8 +451,9 @@ __free_iova(struct iova_domain *iovad, struct iova *iova) unsigned long flags; spin_lock_irqsave(&iovad->iova_rbtree_lock, flags); - private_free_iova(iovad, iova); + remove_iova(iovad, iova); spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags); + free_iova_mem(iova); } EXPORT_SYMBOL_GPL(__free_iova); @@ -473,9 +473,9 @@ free_iova(struct iova_domain *iovad, unsigned long pfn) spin_lock_irqsave(&iovad->iova_rbtree_lock, flags); iova = private_find_iova(iovad, pfn); if (iova) - private_free_iova(iovad, iova); + remove_iova(iovad, iova); spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags); - + free_iova_mem(iova); you need to make this: if (iova) free_iova_mem(iova); as free_iova_mem(iova) dereferences iova: if (iova->pfn_lo != IOVA_ANCHOR) kmem_cache_free(iova_cache, iova) So if iova were NULL, we crash. Or you can make free_iova_mem() NULL safe.
} EXPORT_SYMBOL_GPL(free_iova); @@ -825,7 +825,8 @@ iova_magazine_free_pfns(struct iova_magazine *mag, struct iova_domain *iovad) if (WARN_ON(!iova)) continue; - private_free_iova(iovad, iova); + remove_iova(iovad, iova); + free_iova_mem(iova); } spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
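The locking pattern under review — detach the node while holding the lock, release the memory only after dropping it, and keep the free NULL-safe for the lookup-miss case John points out in free_iova() — can be sketched in plain C. The structures below are simplified stand-ins, not the real <linux/iova.h> types, and the lock is reduced to comments:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct iova {
    unsigned long pfn_lo;   /* the real free also skips the IOVA_ANCHOR node */
    int in_tree;            /* stand-in for rb-tree membership */
};

/* Stand-in for kmem_cache_free(); NULL-safe, mirroring John's
 * alternative suggestion of making free_iova_mem() tolerate NULL. */
static void free_iova_mem(struct iova *iova)
{
    if (!iova)              /* lookup in free_iova(pfn) may find nothing */
        return;
    free(iova);
}

/* Like the renamed remove_iova(): detach only, caller holds the lock. */
static void remove_iova(struct iova *iova)
{
    /* assert_spin_locked(&iovad->iova_rbtree_lock); */
    iova->in_tree = 0;      /* stands in for rb_erase() */
}

/* Mirrors the fixed free_iova() flow: remove under the lock,
 * free the memory outside it. */
static void free_iova(struct iova *found)
{
    /* spin_lock_irqsave(&iovad->iova_rbtree_lock, flags); */
    if (found)
        remove_iova(found);
    /* spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags); */
    free_iova_mem(found);   /* safe even when the lookup returned NULL */
}
```

Making the free NULL-safe keeps the unlock path branch-free, while the `if (iova) free_iova_mem(iova)` variant keeps free_iova_mem() minimal; either resolves the crash the review flags.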
Re: [RFC PATCH v2 0/8] ACPI/IORT: Support for IORT RMR node
Hi Shameer, + Jean-Philippe On 11/19/20 1:11 PM, Shameer Kolothum wrote: > RFC v1 --> v2: > - Added a generic interface for IOMMU drivers to retrieve all the > RMR info associated with a given IOMMU. > - SMMUv3 driver gets the RMR list during probe() and installs > bypass STEs for all the SIDs in the RMR list. This is to keep > the ongoing traffic alive(if any) during SMMUv3 reset. This is >based on the suggestions received for v1 to take care of the >EFI framebuffer use case. Only sanity tested for now. > - During the probe/attach device, SMMUv3 driver reserves any > RMR region associated with the device such that there is a unity > mapping for them in SMMU. > --- > > The series adds support to IORT RMR nodes specified in IORT > Revision E -ARM DEN 0049E[0]. RMR nodes are used to describe memory > ranges that are used by endpoints and require a unity mapping > in SMMU. > > We have faced issues with 3408iMR RAID controller cards which > fail to boot when SMMU is enabled. This is because these controllers > make use of host memory for various caching related purposes and when > SMMU is enabled the iMR firmware fails to access these memory regions > as there is no mapping for them. IORT RMR provides a way for UEFI to > describe and report these memory regions so that the kernel can make > a unity mapping for these in SMMU. > > RFC because, Patch #1 is to update the actbl2.h and should be done > through acpica update. I have send out a pull request[1] for that. 
> > Tests: > > With a UEFI, that reports the RMR for the dev, > > [16F0h 5872 1] Type : 06 > [16F1h 5873 2] Length : 007C > [16F3h 5875 1] Revision : 00 > [1038h 0056 2] Reserved : > [1038h 0056 2] Identifier : > [16F8h 5880 4]Mapping Count : 0001 > [16FCh 5884 4] Mapping Offset : 0040 > > [1700h 5888 4]Number of RMR Descriptors : 0002 > [1704h 5892 4]RMR Descriptor Offset : 0018 > > [1708h 5896 8] Base Address of RMR : E640 > [1710h 5904 8]Length of RMR : 0010 > [1718h 5912 4] Reserved : > > [171Ch 5916 8] Base Address of RMR : 27B0 > [1724h 5924 8]Length of RMR : 00C0 > [172Ch 5932 4] Reserved : > > [1730h 5936 4] Input base : > [1734h 5940 4] ID Count : 0001 > [1738h 5944 4] Output Base : 0003 > [173Ch 5948 4] Output Reference : 0064 > [1740h 5952 4]Flags (decoded below) : 0001 > Single Mapping : 1 Following Jean-Philippe's suggestion I have used your series for nested stage SMMUv3 integration, i.e. to simplify the MSI nested stage mapping. Host allocates hIOVA -> physical doorbell (pDB) as it normally does for VFIO device passthrough. IOVA Range is 0x800 - 0x810. I expose this MSI IOVA range to the guest as an RMR and as a result the guest has a flat mapping for this range. As the physical device is programmed with hIOVA we have the following mapping: hIOVA (IOVA) -[stage 1]-> hIOVA (IPA) -[stage 2]-> pDB (PA). This works. The only weird thing is that I need to expose 256 RMRs due to the 'Single Mapping' mandatory flag. I need to have 1 RMR per potential SID on the bus. I will post a new version of SMMUv3 nested stage soon for people to test & compare. Obviously this removes a bunch of code in both the SMMU/VFIO and QEMU code, so I think this solution looks better overall. Thanks Eric > ... > > Without the series the RAID controller initialization fails as > below, > > ...
> [ 12.631117] megaraid_sas :03:00.0: FW supports sync cache: Yes > > [ 12.637360] megaraid_sas :03:00.0: megasas_disable_intr_fusion is > called outbound_intr_mask:0x4009 > > [ 18.776377] megaraid_sas :03:00.0: Init cmd return status FAILED for > SCSI host 0 > > [ 23.019383] megaraid_sas :03:00.0: Waiting for FW to come to ready > state > [ 106.684281] megaraid_sas :03:00.0: FW in FAULT state, Fault > code:0x1 subcode:0x0 func:megasas_transition_to_ready > > [ 106.695186] megaraid_sas :03:00.0: System Register set: > > [ 106.889787] megaraid_sas :03:00.0: Failed to transition controller to > ready for scsi0. > > [ 106.910475] megaraid_sas :03:00.0: Failed from megasas_init_fw 6407 > > estuary:/$ > > With the series, now the kernel has direct mapping for the dev as > below, > >
Re: [RFC PATCH v2 2/8] ACPI/IORT: Add support for RMR node parsing
Hi Shameer, On 11/19/20 1:11 PM, Shameer Kolothum wrote: > Add support for parsing RMR node information from ACPI. > Find associated stream ids and smmu node info from the > RMR node and populate a linked list with RMR memory > descriptors. > > Signed-off-by: Shameer Kolothum > --- > drivers/acpi/arm64/iort.c | 122 +- > 1 file changed, 121 insertions(+), 1 deletion(-) > > diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c > index 9929ff50c0c0..a9705aa35028 100644 > --- a/drivers/acpi/arm64/iort.c > +++ b/drivers/acpi/arm64/iort.c > @@ -40,6 +40,25 @@ struct iort_fwnode { > static LIST_HEAD(iort_fwnode_list); > static DEFINE_SPINLOCK(iort_fwnode_lock); > > +struct iort_rmr_id { > + u32 sid; > + struct acpi_iort_node *smmu; > +}; > + > +/* > + * One entry for IORT RMR. > + */ > +struct iort_rmr_entry { > + struct list_head list; > + > + unsigned int rmr_ids_num; > + struct iort_rmr_id *rmr_ids; > + > + struct acpi_iort_rmr_desc *rmr_desc; > +}; > + > +static LIST_HEAD(iort_rmr_list); /* list of RMR regions from ACPI */ > + > /** > * iort_set_fwnode() - Create iort_fwnode and use it to register > * iommu data in the iort_fwnode_list > @@ -393,7 +412,8 @@ static struct acpi_iort_node *iort_node_get_id(struct > acpi_iort_node *node, > if (node->type == ACPI_IORT_NODE_NAMED_COMPONENT || > node->type == ACPI_IORT_NODE_PCI_ROOT_COMPLEX || > node->type == ACPI_IORT_NODE_SMMU_V3 || > - node->type == ACPI_IORT_NODE_PMCG) { > + node->type == ACPI_IORT_NODE_PMCG || > + node->type == ACPI_IORT_NODE_RMR) { > *id_out = map->output_base; > return parent; > } > @@ -1647,6 +1667,103 @@ static void __init iort_enable_acs(struct > acpi_iort_node *iort_node) > #else > static inline void iort_enable_acs(struct acpi_iort_node *iort_node) { } > #endif > +static int iort_rmr_desc_valid(struct acpi_iort_rmr_desc *desc) > +{ > + struct iort_rmr_entry *e; > + u64 end, start = desc->base_address, length = desc->length; > + > + if (!IS_ALIGNED(start, SZ_64K) || 
!IS_ALIGNED(length, SZ_64K)) > + return -EINVAL; > + > + end = start + length - 1; > + > + /* Check for address overlap */ I don't get this check. What is the problem if you attach the same range to different stream ids? Shouldn't you check there is no overlap for the same sid? > + list_for_each_entry(e, &iort_rmr_list, list) { > + u64 e_start = e->rmr_desc->base_address; > + u64 e_end = e_start + e->rmr_desc->length - 1; > + > + if (start <= e_end && end >= e_start) > + return -EINVAL; > + } > + > + return 0; > +} > + > +static int __init iort_parse_rmr(struct acpi_iort_node *iort_node) > +{ > + struct iort_rmr_id *rmr_ids, *ids; > + struct iort_rmr_entry *e; > + struct acpi_iort_rmr *rmr; > + struct acpi_iort_rmr_desc *rmr_desc; > + u32 map_count = iort_node->mapping_count; > + int i, ret = 0, desc_count = 0; > + > + if (iort_node->type != ACPI_IORT_NODE_RMR) > + return 0; > + > + if (!iort_node->mapping_offset || !map_count) { > + pr_err(FW_BUG "Invalid ID mapping, skipping RMR node %p\n", > +iort_node); > + return -EINVAL; > + } > + > + rmr_ids = kmalloc(sizeof(*rmr_ids) * map_count, GFP_KERNEL); > + if (!rmr_ids) > + return -ENOMEM; > + > + /* Retrieve associated smmu and stream id */ > + ids = rmr_ids; nit: do you need both rmr_ids and ids? 
> + for (i = 0; i < map_count; i++, ids++) { > + ids->smmu = iort_node_get_id(iort_node, &ids->sid, i); > + if (!ids->smmu) { > + pr_err(FW_BUG "Invalid SMMU reference, skipping RMR > node %p\n", > +iort_node); > + ret = -EINVAL; > + goto out; > + } > + } > + > + /* Retrieve RMR data */ > + rmr = (struct acpi_iort_rmr *)iort_node->node_data; > + if (!rmr->rmr_offset || !rmr->rmr_count) { > + pr_err(FW_BUG "Invalid RMR descriptor array, skipping RMR node > %p\n", > +iort_node); > + ret = -EINVAL; > + goto out; > + } > + > + rmr_desc = ACPI_ADD_PTR(struct acpi_iort_rmr_desc, iort_node, > + rmr->rmr_offset); > + > + for (i = 0; i < rmr->rmr_count; i++, rmr_desc++) { > + ret = iort_rmr_desc_valid(rmr_desc); > + if (ret) { > + pr_err(FW_BUG "Invalid RMR descriptor[%d] for node %p, > skipping...\n", > +i, iort_node); > + goto out; so I understand you skip the whole node and not just that rmr desc, otherwise you would continue. so in
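The two validity checks that iort_rmr_desc_valid() performs — 64K alignment of base and length, and closed-interval overlap against already-registered descriptors — can be illustrated in isolation. This is a user-space sketch; SZ_64K and the function names are local stand-ins, not the kernel macros:

```c
#include <assert.h>
#include <stdint.h>

#define SZ_64K (64 * 1024ULL)

/* Kernel-style alignment check: v is 64K-aligned iff its low 16 bits are 0. */
static int aligned_64k(uint64_t v)
{
    return (v & (SZ_64K - 1)) == 0;
}

/* Closed-interval overlap test used by the patch: [start, end] intersects
 * [e_start, e_end] iff start <= e_end && end >= e_start, where
 * end = base + length - 1 on each side. */
static int ranges_overlap(uint64_t start, uint64_t end,
                          uint64_t e_start, uint64_t e_end)
{
    return start <= e_end && end >= e_start;
}
```

Note the intervals are inclusive on both ends (the `- 1` when computing `end`), which is why both comparisons are non-strict.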
Re: [PATCH 2/2] iommu/amd: Remove performance counter pre-initialization test
David, On 4/14/2021 10:33 PM, David Coe wrote: Hi Suravee! I've re-run your revert+update patch on Ubuntu's latest kernel 5.11.0-14 partly to check my mailer's 'mangling' hadn't also reached the code! There are 3 sets of results in the attachment, all for the Ryzen 2400G. The as-distributed kernel already incorporates your IOMMU RFCv3 patch. A. As-distributed kernel (cold boot) >5 retries, so no IOMMU read/write capability, no amd_iommu events. B. As-distributed kernel (warm boot) <5 retries, amd_iommu running stats show large numbers as before. C. Revert+Update kernel amd_iommu events listed and also show large hit/miss numbers. In due course, I'll load the new (revert+update) kernel on the 4700G but won't overload your mail-box unless something unusual turns up. Best regards, For the Ryzen 2400G, could you please try with: - 1 event at a time - Not more than 8 events (On your system, it has 2 banks x 4 counters/bank. I am trying to see if this issue might be related to the counters multiplexing). Thanks, Suravee
Re: [Resend RFC PATCH V2 11/12] HV/Netvsc: Add Isolation VM support for netvsc driver
On 4/14/2021 11:50 PM, Christoph Hellwig wrote: +struct dma_range { + dma_addr_t dma; + u32 mapping_size; +}; That's a rather generic name that is bound to create a conflict sooner or later. Good point. Will update. #include "hyperv_net.h" #include "netvsc_trace.h" +#include "../../hv/hyperv_vmbus.h" Please move public interfaces out of the private header rather than doing this. OK. Will update. + if (hv_isolation_type_snp()) { + area = get_vm_area(buf_size, VM_IOREMAP); Err, no. get_vm_area is private for a reason. + if (!area) + goto cleanup; + + vaddr = (unsigned long)area->addr; + for (i = 0; i < buf_size / HV_HYP_PAGE_SIZE; i++) { + extra_phys = (virt_to_hvpfn(net_device->recv_buf + i * HV_HYP_PAGE_SIZE) + << HV_HYP_PAGE_SHIFT) + ms_hyperv.shared_gpa_boundary; + ret |= ioremap_page_range(vaddr + i * HV_HYP_PAGE_SIZE, + vaddr + (i + 1) * HV_HYP_PAGE_SIZE, + extra_phys, PAGE_KERNEL_IO); + } + + if (ret) + goto cleanup; And this is not something a driver should ever do. I think you are badly reimplementing functionality that should be in the dma coherent allocator here. OK. I will try hiding these in the Hyper-V dma ops callback. Thanks.
Re: [Resend RFC PATCH V2 04/12] HV: Add Write/Read MSR registers via ghcb
On 4/14/2021 11:41 PM, Christoph Hellwig wrote: +EXPORT_SYMBOL_GPL(hv_ghcb_msr_write); Just curious, who is going to use all these exports? These seem like extremely low-level functionality. Isn't there a way to build a more useful higher level API? Yes, will remove it.
Re: [Resend RFC PATCH V2 03/12] x86/Hyper-V: Add new hvcall guest address host visibility support
Hi Christoph: Thanks for your review. On 4/14/2021 11:40 PM, Christoph Hellwig wrote: +/* + * hv_set_mem_host_visibility - Set host visibility for specified memory. + */ I don't think this comment really clarifies anything over the function name. What is 'host visibility'? OK. Will update the comment. +int hv_set_mem_host_visibility(void *kbuffer, u32 size, u32 visibility) Should size be a size_t? Should visibility be an enum of some kind? Will update. +int hv_mark_gpa_visibility(u16 count, const u64 pfn[], u32 visibility) Not sure what this does either. Will add a comment. + local_irq_save(flags); + input_pcpu = (struct hv_input_modify_sparse_gpa_page_host_visibility **) Is there a chance we could find a shorter but still descriptive name for this variable? Why do we need the cast? Sure. The cast is to avoid a build error due to "incompatible-pointer-types" +#define VMBUS_PAGE_VISIBLE_READ_ONLY HV_MAP_GPA_READABLE +#define VMBUS_PAGE_VISIBLE_READ_WRITE (HV_MAP_GPA_READABLE|HV_MAP_GPA_WRITABLE) pointlessly overlong line.
Re: [PATCH v3 01/12] iommu: Introduce dirty log tracking framework
On 2021/4/15 15:03, Lu Baolu wrote: > On 4/15/21 2:18 PM, Keqian Zhu wrote: >> Hi Baolu, >> >> Thanks for the review! >> >> On 2021/4/14 15:00, Lu Baolu wrote: >>> Hi Keqian, >>> >>> On 4/13/21 4:54 PM, Keqian Zhu wrote: Some types of IOMMU are capable of tracking DMA dirty log, such as ARM SMMU with HTTU or Intel IOMMU with SLADE. This introduces the dirty log tracking framework in the IOMMU base layer. Three new essential interfaces are added, and we maintain the status of dirty log tracking in iommu_domain. 1. iommu_switch_dirty_log: Perform actions to start|stop dirty log tracking 2. iommu_sync_dirty_log: Sync dirty log from IOMMU into a dirty bitmap 3. iommu_clear_dirty_log: Clear dirty log of IOMMU by a mask bitmap A new dev feature is added to indicate whether a specific type of iommu hardware supports dirty log tracking and whether its driver realizes these interfaces. Signed-off-by: Keqian Zhu Signed-off-by: Kunkun Jiang --- drivers/iommu/iommu.c | 150 ++ include/linux/iommu.h | 53 +++ 2 files changed, 203 insertions(+) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index d0b0a15dba84..667b2d6d2fc0 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -1922,6 +1922,7 @@ static struct iommu_domain *__iommu_domain_alloc(struct bus_type *bus, domain->type = type; /* Assume all sizes by default; the driver may override this later */ domain->pgsize_bitmap = bus->iommu_ops->pgsize_bitmap; +mutex_init(&domain->switch_log_lock); return domain; } @@ -2720,6 +2721,155 @@ int iommu_domain_set_attr(struct iommu_domain *domain, } EXPORT_SYMBOL_GPL(iommu_domain_set_attr); +int iommu_switch_dirty_log(struct iommu_domain *domain, bool enable, + unsigned long iova, size_t size, int prot) +{ +const struct iommu_ops *ops = domain->ops; +int ret; + +if (unlikely(!ops || !ops->switch_dirty_log)) +return -ENODEV; + +mutex_lock(&domain->switch_log_lock); +if (enable && domain->dirty_log_tracking) { +ret = -EBUSY; +goto out; +} else if (!enable && !domain->dirty_log_tracking) { +ret = -EINVAL; +goto out; +}
+ +ret = ops->switch_dirty_log(domain, enable, iova, size, prot); +if (ret) +goto out; + +domain->dirty_log_tracking = enable; +out: +mutex_unlock(&domain->switch_log_lock); +return ret; +} +EXPORT_SYMBOL_GPL(iommu_switch_dirty_log); >>> >>> Since you also added IOMMU_DEV_FEAT_HWDBM, I am wondering what's the >>> difference between >>> >>> iommu_switch_dirty_log(on) vs. >>> iommu_dev_enable_feature(IOMMU_DEV_FEAT_HWDBM) >>> >>> iommu_switch_dirty_log(off) vs. >>> iommu_dev_disable_feature(IOMMU_DEV_FEAT_HWDBM) >> Indeed. As I can see, IOMMU_DEV_FEAT_AUX is not switchable, so enable/disable >> are not applicable for it. IOMMU_DEV_FEAT_SVA is switchable, so we can use >> these >> interfaces for it. >> >> IOMMU_DEV_FEAT_HWDBM is used to indicate whether hardware supports HWDBM, so >> we should > > Previously we had iommu_dev_has_feature() and it was then cleaned up due to > lack of real users. If you have a real case for it, why not bring it > back? Yep, good suggestion. > >> design it as not switchable. I will modify the commit message of patch#12, >> thanks! > > I am not sure that I fully get your point. But I can't see any gaps of > using iommu_dev_enable/disable_feature() to switch dirty log on and off. > Probably I missed something. IOMMU_DEV_FEAT_HWDBM just tells the user whether the underlying IOMMU driver supports dirty tracking; it is not used to manage the status of dirty log tracking. The feature reporting is per device, but the status management is per iommu_domain. Only when all devices in a domain support HWDBM can we start dirty logging for the domain. And I think we'd better not mix feature reporting and status management. Thoughts? 
> >> >>> + +int iommu_sync_dirty_log(struct iommu_domain *domain, unsigned long iova, + size_t size, unsigned long *bitmap, + unsigned long base_iova, unsigned long bitmap_pgshift) +{ +const struct iommu_ops *ops = domain->ops; +unsigned int min_pagesz; +size_t pgsize; +int ret = 0; + +if (unlikely(!ops || !ops->sync_dirty_log)) +return -ENODEV; + +min_pagesz = 1 << __ffs(domain->pgsize_bitmap); +if (!IS_ALIGNED(iova | size, min_pagesz)) { +pr_err("unaligned: iova 0x%lx size 0x%zx min_pagesz 0x%x\n", + iova, size, min_pagesz); +return -EINVAL; +} + +
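The -EBUSY/-EINVAL guard at the top of iommu_switch_dirty_log() is a small two-state machine: enabling an already-enabled domain is busy, disabling an already-disabled one is invalid. A minimal user-space sketch of just that guard (illustrative only — the real function also invokes the driver op under domain->switch_log_lock):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

static bool tracking;   /* stand-in for domain->dirty_log_tracking */

static int switch_dirty_log(bool enable)
{
    if (enable && tracking)
        return -EBUSY;     /* already enabled: caller raced or is confused */
    if (!enable && !tracking)
        return -EINVAL;    /* already disabled: nothing to stop */

    /* The real code calls ops->switch_dirty_log() here and only commits
     * the state change if the driver op succeeded. */
    tracking = enable;
    return 0;
}
```

Distinguishing the two error codes lets the caller tell "double enable" apart from "disable while off", which matters for migration tooling that retries.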
Re: [RFC PATCH v2 1/8] ACPICA: IORT: Update for revision E
Hi Shameer, On 11/19/20 1:11 PM, Shameer Kolothum wrote: > IORT revision E contains a few additions like, > -Added an identifier field in the node descriptors to aid table > cross-referencing. > -Introduced the Reserved Memory Range(RMR) node. This is used > to describe memory ranges that are used by endpoints and requires > a unity mapping in SMMU. > -Introduced a flag in the RC node to express support for PRI. > > Signed-off-by: Shameer Kolothum > --- > include/acpi/actbl2.h | 25 +++-- > 1 file changed, 19 insertions(+), 6 deletions(-) > > diff --git a/include/acpi/actbl2.h b/include/acpi/actbl2.h > index ec66779cb193..274fce7b5c01 100644 > --- a/include/acpi/actbl2.h > +++ b/include/acpi/actbl2.h > @@ -68,7 +68,7 @@ > * IORT - IO Remapping Table > * > * Conforms to "IO Remapping Table System Software on ARM Platforms", > - * Document number: ARM DEN 0049D, March 2018 > + * Document number: ARM DEN 0049E, June 2020 > * > > **/ > > @@ -86,7 +86,8 @@ struct acpi_iort_node { > u8 type; > u16 length; > u8 revision; > - u32 reserved; > + u16 reserved; > + u16 identifier; > u32 mapping_count; > u32 mapping_offset; > char node_data[1]; > @@ -100,7 +101,8 @@ enum acpi_iort_node_type { > ACPI_IORT_NODE_PCI_ROOT_COMPLEX = 0x02, > ACPI_IORT_NODE_SMMU = 0x03, > ACPI_IORT_NODE_SMMU_V3 = 0x04, > - ACPI_IORT_NODE_PMCG = 0x05 > + ACPI_IORT_NODE_PMCG = 0x05, > + ACPI_IORT_NODE_RMR = 0x06, > }; > > struct acpi_iort_id_mapping { > @@ -167,10 +169,10 @@ struct acpi_iort_root_complex { > u8 reserved[3]; /* Reserved, must be zero */ > }; > > -/* Values for ats_attribute field above */ > +/* Masks for ats_attribute field above */ > > -#define ACPI_IORT_ATS_SUPPORTED 0x0001 /* The root complex > supports ATS */ > -#define ACPI_IORT_ATS_UNSUPPORTED 0x /* The root complex > doesn't support ATS */ > +#define ACPI_IORT_ATS_SUPPORTED (1) /* The root complex supports > ATS */ > +#define ACPI_IORT_PRI_SUPPORTED (1<<1) /* The root complex > supports PRI */ > > struct acpi_iort_smmu { > u64 
base_address; /* SMMU base address */ > @@ -241,6 +243,17 @@ struct acpi_iort_pmcg { > u64 page1_base_address; > }; > > +struct acpi_iort_rmr { so indeed in E.b there is a new field here. u32 flags > + u32 rmr_count; > + u32 rmr_offset; > +}; > + > +struct acpi_iort_rmr_desc { > + u64 base_address; > + u64 length; > + u32 reserved; > +}; > + > > /*** > * > * IVRS - I/O Virtualization Reporting Structure > Thanks Eric
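The revision E change turns ats_attribute from an all-or-nothing value (0x0001/0x0000) into a bit mask, so it is now decoded bit by bit. A tiny user-space sketch of that decoding, with the two mask values copied from the patch (helper names are illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Masks for the root complex ats_attribute field, per the patch. */
#define ACPI_IORT_ATS_SUPPORTED (1)       /* bit 0: root complex supports ATS */
#define ACPI_IORT_PRI_SUPPORTED (1 << 1)  /* bit 1: root complex supports PRI */

static int supports_ats(uint32_t attr)
{
    return !!(attr & ACPI_IORT_ATS_SUPPORTED);
}

static int supports_pri(uint32_t attr)
{
    return !!(attr & ACPI_IORT_PRI_SUPPORTED);
}
```

Because the old encoding used the whole field as a boolean, existing tables with ats_attribute == 0x0001 still decode correctly under the new bit-mask reading.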
Re: [PATCH v3 01/12] iommu: Introduce dirty log tracking framework
On 4/15/21 2:18 PM, Keqian Zhu wrote: Hi Baolu, Thanks for the review! On 2021/4/14 15:00, Lu Baolu wrote: Hi Keqian, On 4/13/21 4:54 PM, Keqian Zhu wrote: Some types of IOMMU are capable of tracking DMA dirty log, such as ARM SMMU with HTTU or Intel IOMMU with SLADE. This introduces the dirty log tracking framework in the IOMMU base layer. Three new essential interfaces are added, and we maintain the status of dirty log tracking in iommu_domain. 1. iommu_switch_dirty_log: Perform actions to start|stop dirty log tracking 2. iommu_sync_dirty_log: Sync dirty log from IOMMU into a dirty bitmap 3. iommu_clear_dirty_log: Clear dirty log of IOMMU by a mask bitmap A new dev feature is added to indicate whether a specific type of iommu hardware supports dirty log tracking and whether its driver realizes these interfaces. Signed-off-by: Keqian Zhu Signed-off-by: Kunkun Jiang --- drivers/iommu/iommu.c | 150 ++ include/linux/iommu.h | 53 +++ 2 files changed, 203 insertions(+) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index d0b0a15dba84..667b2d6d2fc0 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -1922,6 +1922,7 @@ static struct iommu_domain *__iommu_domain_alloc(struct bus_type *bus, domain->type = type; /* Assume all sizes by default; the driver may override this later */ domain->pgsize_bitmap = bus->iommu_ops->pgsize_bitmap; +mutex_init(&domain->switch_log_lock); return domain; } @@ -2720,6 +2721,155 @@ int iommu_domain_set_attr(struct iommu_domain *domain, } EXPORT_SYMBOL_GPL(iommu_domain_set_attr); +int iommu_switch_dirty_log(struct iommu_domain *domain, bool enable, + unsigned long iova, size_t size, int prot) +{ +const struct iommu_ops *ops = domain->ops; +int ret; + +if (unlikely(!ops || !ops->switch_dirty_log)) +return -ENODEV; + +mutex_lock(&domain->switch_log_lock); +if (enable && domain->dirty_log_tracking) { +ret = -EBUSY; +goto out; +} else if (!enable && !domain->dirty_log_tracking) { +ret = -EINVAL; +goto out; +} + +ret = ops->switch_dirty_log(domain, enable, iova, size, 
prot); +if (ret) +goto out; + +domain->dirty_log_tracking = enable; +out: +mutex_unlock(&domain->switch_log_lock); +return ret; +} +EXPORT_SYMBOL_GPL(iommu_switch_dirty_log); Since you also added IOMMU_DEV_FEAT_HWDBM, I am wondering what's the difference between iommu_switch_dirty_log(on) vs. iommu_dev_enable_feature(IOMMU_DEV_FEAT_HWDBM) iommu_switch_dirty_log(off) vs. iommu_dev_disable_feature(IOMMU_DEV_FEAT_HWDBM) Indeed. As I can see, IOMMU_DEV_FEAT_AUX is not switchable, so enable/disable are not applicable for it. IOMMU_DEV_FEAT_SVA is switchable, so we can use these interfaces for it. IOMMU_DEV_FEAT_HWDBM is used to indicate whether hardware supports HWDBM, so we should Previously we had iommu_dev_has_feature() and it was then cleaned up due to lack of real users. If you have a real case for it, why not bring it back? design it as not switchable. I will modify the commit message of patch#12, thanks! I am not sure that I fully get your point. But I can't see any gaps of using iommu_dev_enable/disable_feature() to switch dirty log on and off. Probably I missed something. + +int iommu_sync_dirty_log(struct iommu_domain *domain, unsigned long iova, + size_t size, unsigned long *bitmap, + unsigned long base_iova, unsigned long bitmap_pgshift) +{ +const struct iommu_ops *ops = domain->ops; +unsigned int min_pagesz; +size_t pgsize; +int ret = 0; + +if (unlikely(!ops || !ops->sync_dirty_log)) +return -ENODEV; + +min_pagesz = 1 << __ffs(domain->pgsize_bitmap); +if (!IS_ALIGNED(iova | size, min_pagesz)) { +pr_err("unaligned: iova 0x%lx size 0x%zx min_pagesz 0x%x\n", + iova, size, min_pagesz); +return -EINVAL; +} + +mutex_lock(&domain->switch_log_lock); +if (!domain->dirty_log_tracking) { +ret = -EINVAL; +goto out; +} + +while (size) { +pgsize = iommu_pgsize(domain, iova, size); + +ret = ops->sync_dirty_log(domain, iova, pgsize, + bitmap, base_iova, bitmap_pgshift); Any reason why you want to do this in a per-4K page manner? 
This can lead to a lot of indirect calls and bad performance. How about a sync_dirty_pages()? The function name of iommu_pgsize() is a bit puzzling. Actually it will try to compute the max page size that fits into the remaining size, so the pgsize can be a large page size even if the underlying mapping is 4K. The __iommu_unmap() also has similar logic. This series has some improvements to the iommu_pgsize() helper. https://lore.kernel.org/linux-iommu/20210405191112.28192-1-isa...@codeaurora.org/ Best regards, baolu
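Keqian's point — iommu_pgsize() returns the largest supported page size that both fits the remaining size and matches the IOVA's alignment, so each loop iteration often covers far more than 4K — can be sketched in user-space C. This is illustrative only (the kernel helper differs in details); the fls/ffs-style bit tricks use GCC/Clang builtins:

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the iommu_pgsize() heuristic: pick the largest page size that
 * (a) the hardware supports per pgsize_bitmap, (b) fits within the remaining
 * 'size', and (c) matches the alignment of 'iova'. Assumes size > 0 and a
 * 64-bit unsigned long. */
static size_t pick_pgsize(unsigned long pgsize_bitmap,
                          unsigned long iova, size_t size)
{
    unsigned int fit_idx = 63 - __builtin_clzl(size);      /* fls(size) - 1 */

    if (iova) {
        unsigned int align_idx = __builtin_ctzl(iova);     /* ffs(iova) - 1 */
        if (align_idx < fit_idx)
            fit_idx = align_idx;                           /* alignment caps it */
    }

    /* Keep only supported sizes no larger than 2^fit_idx... */
    unsigned long mask = ((1UL << fit_idx << 1) - 1) & pgsize_bitmap;

    /* ...and take the largest one left. */
    return 1UL << (63 - __builtin_clzl(mask));
}
```

With a bitmap advertising 4K/2M/1G, a 2M-aligned IOVA and 4M remaining size yield a 2M step, so the sync loop makes one driver call per 2M rather than per 4K — which is why the per-4K reading of the loop was mistaken.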
Re: [PATCH v2 2/2] iommu/sva: Remove mm parameter from SVA bind API
> * > * Returns 0 on success and < 0 on error. > @@ -28,6 +28,9 @@ int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t > min, ioasid_t max) > int ret = 0; > ioasid_t pasid; > > + if (mm != current->mm) > + return -EINVAL; > + Why not remove the parameter entirely? > @@ -2989,8 +2990,11 @@ iommu_sva_bind_device(struct device *dev, struct > mm_struct *mm, unsigned int flags) > return ERR_PTR(-ENODEV); > > /* Supervisor SVA does not need the current mm */ > - if ((flags & IOMMU_SVA_BIND_SUPERVISOR) && mm) > - return ERR_PTR(-EINVAL); > + if (!(flags & IOMMU_SVA_BIND_SUPERVISOR)) { > + mm = get_task_mm(current); > + if (!mm) > + return ERR_PTR(-EINVAL); > + } I don't see why we need the reference. I think we should just stop passing the mm to ->sva_bind and let the low-level driver deal with any reference to current->mm where needed.
Re: [PATCH v2 1/2] iommu/sva: Tighten SVA bind API with explicit flags
On Wed, Apr 14, 2021 at 08:27:56AM -0700, Jacob Pan wrote: > static int idxd_enable_system_pasid(struct idxd_device *idxd) > { > - int flags; > + unsigned int flags; > unsigned int pasid; > struct iommu_sva *sva; > > - flags = SVM_FLAG_SUPERVISOR_MODE; > + flags = IOMMU_SVA_BIND_SUPERVISOR; > > - sva = iommu_sva_bind_device(&idxd->pdev->dev, NULL, &flags); > + sva = iommu_sva_bind_device(&idxd->pdev->dev, NULL, flags); Please also remove the now pointless flags variable. > +iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, unsigned int > flags) Please avoid the pointless overly long line. > -#define SVM_FLAG_GUEST_PASID (1<<3) > +#define SVM_FLAG_GUEST_PASID (1<<2) This flag is entirely unused, please just remove it in a prep patch rather than renumbering it. > static inline struct iommu_sva * > -iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void > *drvdata) > +iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, unsigned int > flags) Same overly long line here.
Re: [PATCH v3 01/12] iommu: Introduce dirty log tracking framework
Hi Baolu, Thanks for the review! On 2021/4/14 15:00, Lu Baolu wrote: > Hi Keqian, > > On 4/13/21 4:54 PM, Keqian Zhu wrote: >> Some types of IOMMU are capable of tracking DMA dirty log, such as >> ARM SMMU with HTTU or Intel IOMMU with SLADE. This introduces the >> dirty log tracking framework in the IOMMU base layer. >> >> Three new essential interfaces are added, and we maintain the status >> of dirty log tracking in iommu_domain. >> 1. iommu_switch_dirty_log: Perform actions to start|stop dirty log tracking >> 2. iommu_sync_dirty_log: Sync dirty log from IOMMU into a dirty bitmap >> 3. iommu_clear_dirty_log: Clear dirty log of IOMMU by a mask bitmap >> >> A new dev feature is added to indicate whether a specific type of >> iommu hardware supports dirty log tracking and whether its driver realizes these interfaces. >> >> Signed-off-by: Keqian Zhu >> Signed-off-by: Kunkun Jiang >> --- >> drivers/iommu/iommu.c | 150 ++ >> include/linux/iommu.h | 53 +++ >> 2 files changed, 203 insertions(+) >> >> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c >> index d0b0a15dba84..667b2d6d2fc0 100644 >> --- a/drivers/iommu/iommu.c >> +++ b/drivers/iommu/iommu.c >> @@ -1922,6 +1922,7 @@ static struct iommu_domain >> *__iommu_domain_alloc(struct bus_type *bus, >> domain->type = type; >> /* Assume all sizes by default; the driver may override this later */ >> domain->pgsize_bitmap = bus->iommu_ops->pgsize_bitmap; >> +mutex_init(&domain->switch_log_lock); >> return domain; >> } >> @@ -2720,6 +2721,155 @@ int iommu_domain_set_attr(struct iommu_domain >> *domain, >> } >> EXPORT_SYMBOL_GPL(iommu_domain_set_attr); >> +int iommu_switch_dirty_log(struct iommu_domain *domain, bool enable, >> + unsigned long iova, size_t size, int prot) >> +{ >> +const struct iommu_ops *ops = domain->ops; >> +int ret; >> + >> +if (unlikely(!ops || !ops->switch_dirty_log)) >> +return -ENODEV; >> + >> +mutex_lock(&domain->switch_log_lock); >> +if (enable && domain->dirty_log_tracking) { >> +ret = -EBUSY; >> +goto out; >> +} else if (!enable && 
!domain->dirty_log_tracking) { >> +ret = -EINVAL; >> +goto out; >> +} >> + >> +ret = ops->switch_dirty_log(domain, enable, iova, size, prot); >> +if (ret) >> +goto out; >> + >> +domain->dirty_log_tracking = enable; >> +out: >> +mutex_unlock(&domain->switch_log_lock); >> +return ret; >> +} >> +EXPORT_SYMBOL_GPL(iommu_switch_dirty_log); > > Since you also added IOMMU_DEV_FEAT_HWDBM, I am wondering what's the > difference between > > iommu_switch_dirty_log(on) vs. iommu_dev_enable_feature(IOMMU_DEV_FEAT_HWDBM) > > iommu_switch_dirty_log(off) vs. > iommu_dev_disable_feature(IOMMU_DEV_FEAT_HWDBM) Indeed. As I can see, IOMMU_DEV_FEAT_AUX is not switchable, so enable/disable are not applicable for it. IOMMU_DEV_FEAT_SVA is switchable, so we can use these interfaces for it. IOMMU_DEV_FEAT_HWDBM is used to indicate whether hardware supports HWDBM, so we should design it as not switchable. I will modify the commit message of patch#12, thanks! > >> + >> +int iommu_sync_dirty_log(struct iommu_domain *domain, unsigned long iova, >> + size_t size, unsigned long *bitmap, >> + unsigned long base_iova, unsigned long bitmap_pgshift) >> +{ >> +const struct iommu_ops *ops = domain->ops; >> +unsigned int min_pagesz; >> +size_t pgsize; >> +int ret = 0; >> + >> +if (unlikely(!ops || !ops->sync_dirty_log)) >> +return -ENODEV; >> + >> +min_pagesz = 1 << __ffs(domain->pgsize_bitmap); >> +if (!IS_ALIGNED(iova | size, min_pagesz)) { >> +pr_err("unaligned: iova 0x%lx size 0x%zx min_pagesz 0x%x\n", >> + iova, size, min_pagesz); >> +return -EINVAL; >> +} >> + >> +mutex_lock(&domain->switch_log_lock); >> +if (!domain->dirty_log_tracking) { >> +ret = -EINVAL; >> +goto out; >> +} >> + >> +while (size) { >> +pgsize = iommu_pgsize(domain, iova, size); >> + >> +ret = ops->sync_dirty_log(domain, iova, pgsize, >> + bitmap, base_iova, bitmap_pgshift); > > Any reason why you want to do this in a per-4K page manner? This can > lead to a lot of indirect calls and bad performance. > > How about a sync_dirty_pages()? 
The function name of iommu_pgsize() is a bit puzzling. Actually it will try to compute the max page size that fits into the remaining size, so the pgsize can be a large page size even if the underlying mapping is 4K. The __iommu_unmap() also has similar logic. BRs, Keqian