Re: [PATCH v2] iommu/iova: put free_iova_mem() outside of spinlock iova_rbtree_lock

2021-04-15 Thread chenxiang (M)



在 2021/4/15 17:49, John Garry 写道:

On 15/04/2021 04:52, chenxiang wrote:

From: Xiang Chen 

It is not necessary to put free_iova_mem() inside of spinlock/unlock
iova_rbtree_lock which only leads to more completion for the spinlock.
It has a small promote on the performance after the change. And also
rename private_free_iova() as remove_iova() because the function will 
not

free iova after that change.

Signed-off-by: Xiang Chen 
---
  drivers/iommu/iova.c | 13 +++--
  1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index b7ecd5b..c10af23 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -412,12 +412,11 @@ private_find_iova(struct iova_domain *iovad, 
unsigned long pfn)

  return NULL;
  }
  -static void private_free_iova(struct iova_domain *iovad, struct 
iova *iova)

+static void remove_iova(struct iova_domain *iovad, struct iova *iova)
  {
  assert_spin_locked(>iova_rbtree_lock);
  __cached_rbnode_delete_update(iovad, iova);
  rb_erase(>node, >rbroot);
-free_iova_mem(iova);
  }
/**
@@ -452,8 +451,9 @@ __free_iova(struct iova_domain *iovad, struct 
iova *iova)

  unsigned long flags;
spin_lock_irqsave(>iova_rbtree_lock, flags);
-private_free_iova(iovad, iova);
+remove_iova(iovad, iova);
  spin_unlock_irqrestore(>iova_rbtree_lock, flags);
+free_iova_mem(iova);
  }
  EXPORT_SYMBOL_GPL(__free_iova);
  @@ -473,9 +473,9 @@ free_iova(struct iova_domain *iovad, unsigned 
long pfn)

  spin_lock_irqsave(>iova_rbtree_lock, flags);
  iova = private_find_iova(iovad, pfn);
  if (iova)
-private_free_iova(iovad, iova);
+remove_iova(iovad, iova);
  spin_unlock_irqrestore(>iova_rbtree_lock, flags);
-
+free_iova_mem(iova);


you need to make this:
if (iova)
free_iova_mem(iova);

as free_iova_mem(iova) dereferences iova:

if (iova->pfn_lo != IOVA_ANCHOR)
kmem_cache_free(iova_cache, iova)

So if iova were NULL, we crash.

Or you can make free_iova_mem() NULL safe.


Right, it has the issue. What about changing it as below?

@@ -472,10 +472,13 @@ free_iova(struct iova_domain *iovad, unsigned long 
pfn)


spin_lock_irqsave(>iova_rbtree_lock, flags);
iova = private_find_iova(iovad, pfn);
-   if (iova)
-   private_free_iova(iovad, iova);
+   if (!iova) {
+ spin_unlock_irqrestore(>iova_rbtree_lock, flags);
+   return;
+   }
+   remove_iova(iovad, iova);
spin_unlock_irqrestore(>iova_rbtree_lock, flags);
-
+   free_iova_mem(iova);
 }
 EXPORT_SYMBOL_GPL(free_iova);




  }
  EXPORT_SYMBOL_GPL(free_iova);
  @@ -825,7 +825,8 @@ iova_magazine_free_pfns(struct iova_magazine 
*mag, struct iova_domain *iovad)

  if (WARN_ON(!iova))
  continue;
  -private_free_iova(iovad, iova);
+remove_iova(iovad, iova);
+free_iova_mem(iova);
  }
spin_unlock_irqrestore(>iova_rbtree_lock, flags);




.




___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-04-15 Thread Jason Gunthorpe
On Thu, Apr 15, 2021 at 03:11:19PM +0200, Auger Eric wrote:
> Hi Jason,
> 
> On 4/1/21 6:03 PM, Jason Gunthorpe wrote:
> > On Thu, Apr 01, 2021 at 02:08:17PM +, Liu, Yi L wrote:
> > 
> >> DMA page faults are delivered to root-complex via page request message and
> >> it is per-device according to PCIe spec. Page request handling flow is:
> >>
> >> 1) iommu driver receives a page request from device
> >> 2) iommu driver parses the page request message. Get the RID,PASID, faulted
> >>page and requested permissions etc.
> >> 3) iommu driver triggers fault handler registered by device driver with
> >>iommu_report_device_fault()
> > 
> > This seems confused.
> > 
> > The PASID should define how to handle the page fault, not the driver.
> 
> In my series I don't use PASID at all. I am just enabling nested stage
> and the guest uses a single context. I don't allocate any user PASID at
> any point.
> 
> When there is a fault at physical level (a stage 1 fault that concerns
> the guest), this latter needs to be reported and injected into the
> guest. The vfio pci driver registers a fault handler to the iommu layer
> and in that fault handler it fills a circ bugger and triggers an eventfd
> that is listened to by the VFIO-PCI QEMU device. this latter retrives
> the faault from the mmapped circ buffer, it knowns which vIOMMU it is
> attached to, and passes the fault to the vIOMMU.
> Then the vIOMMU triggers and IRQ in the guest.
> 
> We are reusing the existing concepts from VFIO, region, IRQ to do that.
> 
> For that use case, would you also use /dev/ioasid?

/dev/ioasid could do all the things you described vfio-pci as doing,
it can even do them the same way you just described.

Stated another way, do you plan to duplicate all of this code someday
for vfio-cxl? What about for vfio-platform? ARM SMMU can be hooked to
platform devices, right?

I feel what you guys are struggling with is some choice in the iommu
kernel APIs that cause the events to be delivered to the pci_device
owner, not the PASID owner.

That feels solvable.

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 2/2] iommu/sva: Remove mm parameter from SVA bind API

2021-04-15 Thread Jacob Pan
Hi Christoph,

On Thu, 15 Apr 2021 07:44:59 +0100, Christoph Hellwig 
wrote:

> >   *
> >   * Returns 0 on success and < 0 on error.
> > @@ -28,6 +28,9 @@ int iommu_sva_alloc_pasid(struct mm_struct *mm,
> > ioasid_t min, ioasid_t max) int ret = 0;
> > ioasid_t pasid;
> >  
> > +   if (mm != current->mm)
> > +   return -EINVAL;
> > +  
> 
> Why not remove the parameter entirely?
It was removed in my v1 but thought it would be cleaner if we treat
iommu_sva_alloc_pasid() as a leaf function of iommu_sva_bind_device(). Then
we don't have to do get_task_mm() every time. But to your point below, it
is better to get low-level driver handle it.
> 
> > @@ -2989,8 +2990,11 @@ iommu_sva_bind_device(struct device *dev, struct
> > mm_struct *mm, unsigned int fla return ERR_PTR(-ENODEV);
> >  
> > /* Supervisor SVA does not need the current mm */
> > -   if ((flags & IOMMU_SVA_BIND_SUPERVISOR) && mm)
> > -   return ERR_PTR(-EINVAL);
> > +   if (!(flags & IOMMU_SVA_BIND_SUPERVISOR)) {
> > +   mm = get_task_mm(current);
> > +   if (!mm)
> > +   return ERR_PTR(-EINVAL);
> > +   }  
> 
> I don't see why we need the reference.  I think we should just stop
> passing the mm to ->sva_bind and let the low-level driver deal with
> any reference to current->mm where needed.
The mm users reference is just for precaution, in case low level driver use
kthread etc.
I agree it is cleaner to just remove mm here, let the low-level driver deal
with it.
Let me give it a spin.

Thanks,

Jacob
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [Resend RFC PATCH V2 09/12] swiotlb: Add bounce buffer remap address setting function

2021-04-15 Thread Konrad Rzeszutek Wilk
On Wed, Apr 14, 2021 at 10:49:42AM -0400, Tianyu Lan wrote:
> From: Tianyu Lan 
> 
> For Hyper-V isolation VM with AMD SEV SNP, the bounce buffer(shared memory)
> needs to be accessed via extra address space(e.g address above bit39).
> Hyper-V code may remap extra address space outside of swiotlb. 
> swiotlb_bounce()

May? Isn't this a MUST in this case?

> needs to use remap virtual address to copy data from/to bounce buffer. Add
> new interface swiotlb_set_bounce_remap() to do that.

I am bit lost - why can't you use the swiotlb_init and pass in your
DMA pool that is already above bit 39?

> 
> Signed-off-by: Tianyu Lan 
> ---
>  include/linux/swiotlb.h |  5 +
>  kernel/dma/swiotlb.c| 13 -
>  2 files changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index d9c9fc9ca5d2..3ccd08116683 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -82,8 +82,13 @@ unsigned int swiotlb_max_segment(void);
>  size_t swiotlb_max_mapping_size(struct device *dev);
>  bool is_swiotlb_active(void);
>  void __init swiotlb_adjust_size(unsigned long new_size);
> +void swiotlb_set_bounce_remap(unsigned char *vaddr);
>  #else
>  #define swiotlb_force SWIOTLB_NO_FORCE
> +static inline void swiotlb_set_bounce_remap(unsigned char *vaddr)
> +{
> +}
> +
>  static inline bool is_swiotlb_buffer(phys_addr_t paddr)
>  {
>   return false;
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 7c42df6e6100..5fd2db6aa149 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -94,6 +94,7 @@ static unsigned int io_tlb_index;
>   * not be bounced (unless SWIOTLB_FORCE is set).
>   */
>  static unsigned int max_segment;
> +static unsigned char *swiotlb_bounce_remap_addr;
>  
>  /*
>   * We need to save away the original address corresponding to a mapped entry
> @@ -421,6 +422,11 @@ void __init swiotlb_exit(void)
>   swiotlb_cleanup();
>  }
>  
> +void swiotlb_set_bounce_remap(unsigned char *vaddr)
> +{
> + swiotlb_bounce_remap_addr = vaddr;
> +}
> +
>  /*
>   * Bounce: copy the swiotlb buffer from or back to the original dma location
>   */
> @@ -428,7 +434,12 @@ static void swiotlb_bounce(phys_addr_t orig_addr, 
> phys_addr_t tlb_addr,
>  size_t size, enum dma_data_direction dir)
>  {
>   unsigned long pfn = PFN_DOWN(orig_addr);
> - unsigned char *vaddr = phys_to_virt(tlb_addr);
> + unsigned char *vaddr;
> +
> + if (swiotlb_bounce_remap_addr)
> + vaddr = swiotlb_bounce_remap_addr + tlb_addr - io_tlb_start;
> + else
> + vaddr = phys_to_virt(tlb_addr);
>  
>   if (PageHighMem(pfn_to_page(pfn))) {
>   /* The buffer does not have a mapping.  Map it in and copy */
> -- 
> 2.25.1
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [Resend RFC PATCH V2 07/12] HV/Vmbus: Initialize VMbus ring buffer for Isolation VM

2021-04-15 Thread Konrad Rzeszutek Wilk
On Wed, Apr 14, 2021 at 10:49:40AM -0400, Tianyu Lan wrote:
> From: Tianyu Lan 
> 
> VMbus ring buffer are shared with host and it's need to
> be accessed via extra address space of Isolation VM with
> SNP support. This patch is to map the ring buffer
> address in extra address space via ioremap(). HV host

Why do you need to use ioremap()? Why not just use vmap?


> visibility hvcall smears data in the ring buffer and
> so reset the ring buffer memory to zero after calling
> visibility hvcall.

So you are exposing these two:
 EXPORT_SYMBOL_GPL(get_vm_area);
 EXPORT_SYMBOL_GPL(ioremap_page_range);

But if you used vmap wouldn't you get the same thing for free?

> 
> Signed-off-by: Tianyu Lan 
> ---
>  drivers/hv/channel.c  | 10 +
>  drivers/hv/hyperv_vmbus.h |  2 +
>  drivers/hv/ring_buffer.c  | 83 +--
>  mm/ioremap.c  |  1 +
>  mm/vmalloc.c  |  1 +
>  5 files changed, 76 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
> index 407b74d72f3f..4a9fb7ad4c72 100644
> --- a/drivers/hv/channel.c
> +++ b/drivers/hv/channel.c
> @@ -634,6 +634,16 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
>   if (err)
>   goto error_clean_ring;
>  
> + err = hv_ringbuffer_post_init(>outbound,
> +   page, send_pages);
> + if (err)
> + goto error_free_gpadl;
> +
> + err = hv_ringbuffer_post_init(>inbound,
> +   [send_pages], recv_pages);
> + if (err)
> + goto error_free_gpadl;
> +
>   /* Create and init the channel open message */
>   open_info = kzalloc(sizeof(*open_info) +
>  sizeof(struct vmbus_channel_open_channel),
> diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
> index 0778add21a9c..d78a04ad5490 100644
> --- a/drivers/hv/hyperv_vmbus.h
> +++ b/drivers/hv/hyperv_vmbus.h
> @@ -172,6 +172,8 @@ extern int hv_synic_cleanup(unsigned int cpu);
>  /* Interface */
>  
>  void hv_ringbuffer_pre_init(struct vmbus_channel *channel);
> +int hv_ringbuffer_post_init(struct hv_ring_buffer_info *ring_info,
> + struct page *pages, u32 page_cnt);
>  
>  int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
>  struct page *pages, u32 pagecnt);
> diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
> index 35833d4d1a1d..c8b0f7b45158 100644
> --- a/drivers/hv/ring_buffer.c
> +++ b/drivers/hv/ring_buffer.c
> @@ -17,6 +17,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  
>  #include "hyperv_vmbus.h"
>  
> @@ -188,6 +190,44 @@ void hv_ringbuffer_pre_init(struct vmbus_channel 
> *channel)
>   mutex_init(>outbound.ring_buffer_mutex);
>  }
>  
> +int hv_ringbuffer_post_init(struct hv_ring_buffer_info *ring_info,
> +struct page *pages, u32 page_cnt)
> +{
> + struct vm_struct *area;
> + u64 physic_addr = page_to_pfn(pages) << PAGE_SHIFT;
> + unsigned long vaddr;
> + int err = 0;
> +
> + if (!hv_isolation_type_snp())
> + return 0;
> +
> + physic_addr += ms_hyperv.shared_gpa_boundary;
> + area = get_vm_area((2 * page_cnt - 1) * PAGE_SIZE, VM_IOREMAP);
> + if (!area || !area->addr)
> + return -EFAULT;
> +
> + vaddr = (unsigned long)area->addr;
> + err = ioremap_page_range(vaddr, vaddr + page_cnt * PAGE_SIZE,
> +physic_addr, PAGE_KERNEL_IO);
> + err |= ioremap_page_range(vaddr + page_cnt * PAGE_SIZE,
> +   vaddr + (2 * page_cnt - 1) * PAGE_SIZE,
> +   physic_addr + PAGE_SIZE, PAGE_KERNEL_IO);
> + if (err) {
> + vunmap((void *)vaddr);
> + return -EFAULT;
> + }
> +
> + /* Clean memory after setting host visibility. */
> + memset((void *)vaddr, 0x00, page_cnt * PAGE_SIZE);
> +
> + ring_info->ring_buffer = (struct hv_ring_buffer *)vaddr;
> + ring_info->ring_buffer->read_index = 0;
> + ring_info->ring_buffer->write_index = 0;
> + ring_info->ring_buffer->feature_bits.value = 1;
> +
> + return 0;
> +}
> +
>  /* Initialize the ring buffer. */
>  int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
>  struct page *pages, u32 page_cnt)
> @@ -197,33 +237,34 @@ int hv_ringbuffer_init(struct hv_ring_buffer_info 
> *ring_info,
>  
>   BUILD_BUG_ON((sizeof(struct hv_ring_buffer) != PAGE_SIZE));
>  
> - /*
> -  * First page holds struct hv_ring_buffer, do wraparound mapping for
> -  * the rest.
> -  */
> - pages_wraparound = kcalloc(page_cnt * 2 - 1, sizeof(struct page *),
> -GFP_KERNEL);
> - if (!pages_wraparound)
> - return -ENOMEM;
> -
> - pages_wraparound[0] = pages;
> - for (i = 0; i < 2 * (page_cnt - 1); i++)
> - pages_wraparound[i + 1] = [i % 

Re: [Resend RFC PATCH V2 06/12] HV/Vmbus: Add SNP support for VMbus channel initiate message

2021-04-15 Thread Konrad Rzeszutek Wilk
On Wed, Apr 14, 2021 at 10:49:39AM -0400, Tianyu Lan wrote:
> From: Tianyu Lan 
> 
> The physical address of monitor pages in the CHANNELMSG_INITIATE_CONTACT
> msg should be in the extra address space for SNP support and these

What is this 'extra address space'? Is that just normal virtual address
space of the Linux kernel?

> pages also should be accessed via the extra address space inside Linux
> guest and remap the extra address by ioremap function.

OK, why do you need to use ioremap on them? Why not use vmap for
example? What is it that makes ioremap the right candidate?





> 
> Signed-off-by: Tianyu Lan 
> ---
>  drivers/hv/connection.c   | 62 +++
>  drivers/hv/hyperv_vmbus.h |  1 +
>  2 files changed, 63 insertions(+)
> 
> diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
> index 79bca653dce9..a0be9c11d737 100644
> --- a/drivers/hv/connection.c
> +++ b/drivers/hv/connection.c
> @@ -101,6 +101,12 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo 
> *msginfo, u32 version)
>  
>   msg->monitor_page1 = virt_to_phys(vmbus_connection.monitor_pages[0]);
>   msg->monitor_page2 = virt_to_phys(vmbus_connection.monitor_pages[1]);
> +
> + if (hv_isolation_type_snp()) {
> + msg->monitor_page1 += ms_hyperv.shared_gpa_boundary;
> + msg->monitor_page2 += ms_hyperv.shared_gpa_boundary;
> + }
> +
>   msg->target_vcpu = hv_cpu_number_to_vp_number(VMBUS_CONNECT_CPU);
>  
>   /*
> @@ -145,6 +151,29 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo 
> *msginfo, u32 version)
>   return -ECONNREFUSED;
>   }
>  
> + if (hv_isolation_type_snp()) {
> + vmbus_connection.monitor_pages_va[0]
> + = vmbus_connection.monitor_pages[0];
> + vmbus_connection.monitor_pages[0]
> + = ioremap_cache(msg->monitor_page1, HV_HYP_PAGE_SIZE);
> + if (!vmbus_connection.monitor_pages[0])
> + return -ENOMEM;
> +
> + vmbus_connection.monitor_pages_va[1]
> + = vmbus_connection.monitor_pages[1];
> + vmbus_connection.monitor_pages[1]
> + = ioremap_cache(msg->monitor_page2, HV_HYP_PAGE_SIZE);
> + if (!vmbus_connection.monitor_pages[1]) {
> + vunmap(vmbus_connection.monitor_pages[0]);
> + return -ENOMEM;
> + }
> +
> + memset(vmbus_connection.monitor_pages[0], 0x00,
> +HV_HYP_PAGE_SIZE);
> + memset(vmbus_connection.monitor_pages[1], 0x00,
> +HV_HYP_PAGE_SIZE);
> + }
> +
>   return ret;
>  }
>  
> @@ -156,6 +185,7 @@ int vmbus_connect(void)
>   struct vmbus_channel_msginfo *msginfo = NULL;
>   int i, ret = 0;
>   __u32 version;
> + u64 pfn[2];
>  
>   /* Initialize the vmbus connection */
>   vmbus_connection.conn_state = CONNECTING;
> @@ -213,6 +243,16 @@ int vmbus_connect(void)
>   goto cleanup;
>   }
>  
> + if (hv_isolation_type_snp()) {
> + pfn[0] = virt_to_hvpfn(vmbus_connection.monitor_pages[0]);
> + pfn[1] = virt_to_hvpfn(vmbus_connection.monitor_pages[1]);
> + if (hv_mark_gpa_visibility(2, pfn,
> + VMBUS_PAGE_VISIBLE_READ_WRITE)) {
> + ret = -EFAULT;
> + goto cleanup;
> + }
> + }
> +
>   msginfo = kzalloc(sizeof(*msginfo) +
> sizeof(struct vmbus_channel_initiate_contact),
> GFP_KERNEL);
> @@ -279,6 +319,8 @@ int vmbus_connect(void)
>  
>  void vmbus_disconnect(void)
>  {
> + u64 pfn[2];
> +
>   /*
>* First send the unload request to the host.
>*/
> @@ -298,6 +340,26 @@ void vmbus_disconnect(void)
>   vmbus_connection.int_page = NULL;
>   }
>  
> + if (hv_isolation_type_snp()) {
> + if (vmbus_connection.monitor_pages_va[0]) {
> + vunmap(vmbus_connection.monitor_pages[0]);
> + vmbus_connection.monitor_pages[0]
> + = vmbus_connection.monitor_pages_va[0];
> + vmbus_connection.monitor_pages_va[0] = NULL;
> + }
> +
> + if (vmbus_connection.monitor_pages_va[1]) {
> + vunmap(vmbus_connection.monitor_pages[1]);
> + vmbus_connection.monitor_pages[1]
> + = vmbus_connection.monitor_pages_va[1];
> + vmbus_connection.monitor_pages_va[1] = NULL;
> + }
> +
> + pfn[0] = virt_to_hvpfn(vmbus_connection.monitor_pages[0]);
> + pfn[1] = virt_to_hvpfn(vmbus_connection.monitor_pages[1]);
> + hv_mark_gpa_visibility(2, pfn, VMBUS_PAGE_NOT_VISIBLE);
> + }
> +
>   hv_free_hyperv_page((unsigned 

Re: [PATCH v2 1/2] iommu/sva: Tighten SVA bind API with explicit flags

2021-04-15 Thread Jacob Pan
Hi Christoph,

Thanks for the review.

On Thu, 15 Apr 2021 07:40:33 +0100, Christoph Hellwig 
wrote:

> On Wed, Apr 14, 2021 at 08:27:56AM -0700, Jacob Pan wrote:
> >  static int idxd_enable_system_pasid(struct idxd_device *idxd)
> >  {
> > -   int flags;
> > +   unsigned int flags;
> > unsigned int pasid;
> > struct iommu_sva *sva;
> >  
> > -   flags = SVM_FLAG_SUPERVISOR_MODE;
> > +   flags = IOMMU_SVA_BIND_SUPERVISOR;
> >  
> > -   sva = iommu_sva_bind_device(>pdev->dev, NULL, );
> > +   sva = iommu_sva_bind_device(>pdev->dev, NULL, flags);  
> 
> Please also remove the now pointless flags variable.
> 
Good catch.

> > +iommu_sva_bind_device(struct device *dev, struct mm_struct *mm,
> > unsigned int flags)  
> 
> Pleae avoid the pointless overly long line.
> 
> > -#define SVM_FLAG_GUEST_PASID   (1<<3)
> > +#define SVM_FLAG_GUEST_PASID   (1<<2)  
> 
> This flag is entirely unused, please just remove it in a prep patch
> rather than renumbering it.
> 
You are right. The flag was set and intended to be used by the guest IO
page request patches by Baolu.

As you might be aware, we are restructuring the guest SVA uAPI according to
Jason's proposal, can we wait until we have a clear solution? We may
refactor lots of code.

> >  static inline struct iommu_sva *
> > -iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void
> > *drvdata) +iommu_sva_bind_device(struct device *dev, struct mm_struct
> > *mm, unsigned int flags)  
> 
> Same overy long line here.
This is temporary as the mm parameter will be removed in the next patch.

Thanks,

Jacob
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [Resend RFC PATCH V2 04/12] HV: Add Write/Read MSR registers via ghcb

2021-04-15 Thread Konrad Rzeszutek Wilk
On Wed, Apr 14, 2021 at 10:49:37AM -0400, Tianyu Lan wrote:
> From: Tianyu Lan 
> 
> Hyper-V provides GHCB protocol to write Synthetic Interrupt
> Controller MSR registers and these registers are emulated by
> Hypervisor rather than paravisor.

What is paravisor? Is that the VMPL0 to borrow AMD speak from
https://www.amd.com/system/files/TechDocs/SEV-SNP-strengthening-vm-isolation-with-integrity-protection-and-more.pdf
 
> 
> Hyper-V requests to write SINTx MSR registers twice(once via
> GHCB and once via wrmsr instruction including the proxy bit 21)

Why? And what does 'proxy bit 21' mean? Isn't it just setting a bit
on the MSR?

Oh. Are you writting to it twice because you need to let the hypervisor
know (Hyper-V) which is done via the WRMSR. And then the inner
hypervisor (VMPL0) via the SINITx MSR?


> Guest OS ID MSR also needs to be set via GHCB.
> 
> Signed-off-by: Tianyu Lan 
> ---
>  arch/x86/hyperv/hv_init.c   |  18 +
>  arch/x86/hyperv/ivm.c   | 130 
>  arch/x86/include/asm/mshyperv.h |  87 +
>  arch/x86/kernel/cpu/mshyperv.c  |   3 +
>  drivers/hv/hv.c |  65 +++-
>  include/asm-generic/mshyperv.h  |   4 +-
>  6 files changed, 261 insertions(+), 46 deletions(-)
> 
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 90e65fbf4c58..87b1dd9c84d6 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -475,6 +475,9 @@ void __init hyperv_init(void)
>  
>   ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
>   *ghcb_base = ghcb_va;
> +
> + /* Hyper-V requires to write guest os id via ghcb in SNP IVM. */
> + hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, guest_id);
>   }
>  
>   rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
> @@ -561,6 +564,7 @@ void hyperv_cleanup(void)
>  
>   /* Reset our OS id */
>   wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
> + hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, 0);
>  
>   /*
>* Reset hypercall page reference before reset the page,
> @@ -668,17 +672,3 @@ bool hv_is_hibernation_supported(void)
>   return !hv_root_partition && acpi_sleep_state_supported(ACPI_STATE_S4);
>  }
>  EXPORT_SYMBOL_GPL(hv_is_hibernation_supported);
> -
> -enum hv_isolation_type hv_get_isolation_type(void)
> -{
> - if (!(ms_hyperv.features_b & HV_ISOLATION))
> - return HV_ISOLATION_TYPE_NONE;
> - return FIELD_GET(HV_ISOLATION_TYPE, ms_hyperv.isolation_config_b);
> -}
> -EXPORT_SYMBOL_GPL(hv_get_isolation_type);
> -
> -bool hv_is_isolation_supported(void)
> -{
> - return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
> -}
> -EXPORT_SYMBOL_GPL(hv_is_isolation_supported);
> diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
> index a5950b7a9214..2ec64b367aaf 100644
> --- a/arch/x86/hyperv/ivm.c
> +++ b/arch/x86/hyperv/ivm.c
> @@ -6,12 +6,139 @@
>   *  Tianyu Lan 
>   */
>  
> +#include 
> +#include 
>  #include 
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  #include 
>  
> +union hv_ghcb {
> + struct ghcb ghcb;
> +} __packed __aligned(PAGE_SIZE);
> +
> +void hv_ghcb_msr_write(u64 msr, u64 value)
> +{
> + union hv_ghcb *hv_ghcb;
> + void **ghcb_base;
> + unsigned long flags;
> +
> + if (!ms_hyperv.ghcb_base)
> + return;
> +
> + local_irq_save(flags);
> + ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
> + hv_ghcb = (union hv_ghcb *)*ghcb_base;
> + if (!hv_ghcb) {
> + local_irq_restore(flags);
> + return;
> + }
> +
> + memset(hv_ghcb, 0x00, HV_HYP_PAGE_SIZE);
> +
> + hv_ghcb->ghcb.protocol_version = 1;
> + hv_ghcb->ghcb.ghcb_usage = 0;
> +
> + ghcb_set_sw_exit_code(_ghcb->ghcb, SVM_EXIT_MSR);
> + ghcb_set_rcx(_ghcb->ghcb, msr);
> + ghcb_set_rax(_ghcb->ghcb, lower_32_bits(value));
> + ghcb_set_rdx(_ghcb->ghcb, value >> 32);
> + ghcb_set_sw_exit_info_1(_ghcb->ghcb, 1);
> + ghcb_set_sw_exit_info_2(_ghcb->ghcb, 0);
> +
> + VMGEXIT();
> +
> + if ((hv_ghcb->ghcb.save.sw_exit_info_1 & 0x) == 1)
> + pr_warn("Fail to write msr via ghcb.\n.");
> +
> + local_irq_restore(flags);
> +}
> +EXPORT_SYMBOL_GPL(hv_ghcb_msr_write);
> +
> +void hv_ghcb_msr_read(u64 msr, u64 *value)
> +{
> + union hv_ghcb *hv_ghcb;
> + void **ghcb_base;
> + unsigned long flags;
> +
> + if (!ms_hyperv.ghcb_base)
> + return;
> +
> + local_irq_save(flags);
> + ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
> + hv_ghcb = (union hv_ghcb *)*ghcb_base;
> + if (!hv_ghcb) {
> + local_irq_restore(flags);
> + return;
> + }
> +
> + memset(hv_ghcb, 0x00, PAGE_SIZE);
> + hv_ghcb->ghcb.protocol_version = 1;
> + hv_ghcb->ghcb.ghcb_usage = 0;
> +
> + ghcb_set_sw_exit_code(_ghcb->ghcb, SVM_EXIT_MSR);
> + ghcb_set_rcx(_ghcb->ghcb, 

Re: [PATCH 2/2] iommu/amd: Remove performance counter pre-initialization test

2021-04-15 Thread David Coe

Hi Suravee!

On 15/04/2021 10:28, Suthikulpanit, Suravee wrote:

David,

On 4/14/2021 10:33 PM, David Coe wrote:

Hi Suravee!

I've re-run your revert+update patch on Ubuntu's latest kernel 
5.11.0-14 partly to check my mailer's 'mangling' hadn't also reached 
the code!


There are 3 sets of results in the attachment, all for the Ryzen 
2400G. The as-distributed kernel already incorporates your IOMMU RFCv3 
patch.


A. As-distributed kernel (cold boot)
    >5 retries, so no IOMMU read/write capability, no amd_iommu events.

B. As-distributed kernel (warm boot)
    <5 retries, amd_iommu running stats show large numbers as before.

C. Revert+Update kernel
    amd_iommu events listed and also show large hit/miss numbers.

In due course, I'll load the new (revert+update) kernel on the 4700G 
but won't overload your mail-box unless something unusual turns up.


Best regards,



For the Ryzen 2400G, could you please try with:
- 1 event at a time
- Not more than 8 events (On your system, it has 2 banks x 4 counters/bank.
I am trying to see if this issue might be related to the counters 
multiplexing).


Thanks,


Herewith similar one-at-a-time perf stat results for the Ryzen 4700U. As 
before, more than 8 events displays the third (%) column and (sometimes) 
'silly' numbers in the first column.


--
David
$ sudo ./iommu_list.sh

 Performance counter stats for 'system wide':

20  amd_iommu_0/cmd_processed/

  10.003010666 seconds time elapsed


 Performance counter stats for 'system wide':

 5   amd_iommu_0/cmd_processed_inv/

  10.002349464 seconds time elapsed


 Performance counter stats for 'system wide':

 0   amd_iommu_0/ign_rd_wr_mmio_1ff8h/

  10.002386129 seconds time elapsed


 Performance counter stats for 'system wide':

   325   amd_iommu_0/int_dte_hit/

  10.002346630 seconds time elapsed


 Performance counter stats for 'system wide':

 2   amd_iommu_0/int_dte_mis/

  10.002365656 seconds time elapsed


 Performance counter stats for 'system wide':

   356   amd_iommu_0/mem_dte_hit/

  10.002426866 seconds time elapsed


 Performance counter stats for 'system wide':

 3,955   amd_iommu_0/mem_dte_mis/

  10.002729058 seconds time elapsed


 Performance counter stats for 'system wide':

 4   amd_iommu_0/mem_iommu_tlb_pde_hit/

  10.002422610 seconds time elapsed


 Performance counter stats for 'system wide':

 1,326   amd_iommu_0/mem_iommu_tlb_pde_mis/

  10.002397045 seconds time elapsed


 Performance counter stats for 'system wide':

 1,009   amd_iommu_0/mem_iommu_tlb_pte_hit/

  10.002445347 seconds time elapsed


 Performance counter stats for 'system wide':

 1,072   amd_iommu_0/mem_iommu_tlb_pte_mis/

  10.002414734 seconds time elapsed


 Performance counter stats for 'system wide':

 0   amd_iommu_0/mem_pass_excl/

  10.002435482 seconds time elapsed


 Performance counter stats for 'system wide':

 0   amd_iommu_0/mem_pass_pretrans/

  10.002409956 seconds time elapsed


 Performance counter stats for 'system wide':

 3,405   amd_iommu_0/mem_pass_untrans/

  10.002563812 seconds time elapsed


 Performance counter stats for 'system wide':

 0   amd_iommu_0/mem_target_abort/

  10.002473657 seconds time elapsed


 Performance counter stats for 'system wide':

 1,311   amd_iommu_0/mem_trans_total/

  10.002471787 seconds time elapsed


 Performance counter stats for 'system wide':

 0   amd_iommu_0/page_tbl_read_gst/

  10.002216033 seconds time elapsed


 Performance counter stats for 'system wide':

   276,609   amd_iommu_0/page_tbl_read_nst/

  10.002029261 seconds time elapsed


 Performance counter stats for 'system wide':

   126,161   amd_iommu_0/page_tbl_read_tot/

  10.003569029 seconds time elapsed


 Performance counter stats for 'system wide':

 0   amd_iommu_0/smi_blk/

  10.001871818 seconds time elapsed


 Performance counter stats for 'system wide':

 0   amd_iommu_0/smi_recv/

  10.002212891 seconds time elapsed


 Performance counter stats for 'system wide':

 0   amd_iommu_0/tlb_inv/

  10.002396606 seconds time elapsed


 Performance counter stats for 'system wide':

 0   amd_iommu_0/vapic_int_guest/

  10.002435308 seconds time elapsed


 Performance counter stats for 'system wide':

   340   amd_iommu_0/vapic_int_non_guest/

  10.002405868 seconds time elapsed


$ sudo perf stat -e 'amd_iommu_0/mem_dte_hit/, amd_iommu_0/mem_dte_mis/, 
amd_iommu_0/mem_iommu_tlb_pde_hit/, amd_iommu_0/mem_iommu_tlb_pde_mis/, 
amd_iommu_0/mem_iommu_tlb_pte_hit/, 

Re: [PATCH 3/3] iommu/virtio: Enable x86 support

2021-04-15 Thread Jean-Philippe Brucker
On Thu, Mar 18, 2021 at 02:28:02PM -0400, Michael S. Tsirkin wrote:
> On Tue, Mar 16, 2021 at 08:16:54PM +0100, Jean-Philippe Brucker wrote:
> > With the VIOT support in place, x86 platforms can now use the
> > virtio-iommu.
> > 
> > The arm64 Kconfig selects IOMMU_DMA, while x86 IOMMU drivers select it
> > themselves.
> > 
> > Signed-off-by: Jean-Philippe Brucker 
> 
> Acked-by: Michael S. Tsirkin 
> 
> > ---
> >  drivers/iommu/Kconfig | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> > index 2819b5c8ec30..ccca83ef2f06 100644
> > --- a/drivers/iommu/Kconfig
> > +++ b/drivers/iommu/Kconfig
> > @@ -400,8 +400,9 @@ config HYPERV_IOMMU
> >  config VIRTIO_IOMMU
> > tristate "Virtio IOMMU driver"
> > depends on VIRTIO
> > -   depends on ARM64
> > +   depends on (ARM64 || X86)
> > select IOMMU_API
> > +   select IOMMU_DMA if X86
> 
> Would it hurt to just select unconditionally? Seems a bit cleaner
> ...

Yes I think I'll do this for the moment

Thanks,
Jean

> 
> > select INTERVAL_TREE
> > select ACPI_VIOT if ACPI
> > help
> > -- 
> > 2.30.2
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 3/3] iommu/virtio: Enable x86 support

2021-04-15 Thread Jean-Philippe Brucker
On Thu, Mar 18, 2021 at 11:43:38AM +, Robin Murphy wrote:
> On 2021-03-16 19:16, Jean-Philippe Brucker wrote:
> > With the VIOT support in place, x86 platforms can now use the
> > virtio-iommu.
> > 
> > The arm64 Kconfig selects IOMMU_DMA, while x86 IOMMU drivers select it
> > themselves.
> 
> Actually, now that both AMD and Intel are converted over, maybe it's finally
> time to punt that to x86 arch code to match arm64?

x86 also has CONFIG_HYPERV_IOMMU that doesn't use IOMMU_DMA, and might not
want to pull in dma-iommu.o + iova.o (don't know if they care about guest
size). There also is the old gart driver, but that doesn't select
IOMMU_SUPPORT. 

Thanks,
Jean
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/fsl-pamu: Fix uninitialized variable warning

2021-04-15 Thread Christoph Hellwig
On Thu, Apr 15, 2021 at 04:44:42PM +0200, Joerg Roedel wrote:
> From: Joerg Roedel 
> 
> The variable 'i' in the function update_liodn_stash() is not
> initialized and only used in a debug printk(). So it has no
> meaning at all, remove it.
> 
> Reported-by: kernel test robot 
> Signed-off-by: Joerg Roedel 

Sorry for introducing this.  The fix looks good:

Reviewed-by: Christoph Hellwig 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 2/3] ACPI: Add driver for the VIOT table

2021-04-15 Thread Jean-Philippe Brucker
On Fri, Mar 19, 2021 at 11:44:26AM +0100, Auger Eric wrote:
> add some kernel-doc comments matching the explanation in the commit message?

Yes, I'll add some comments. I got rid of the other issues you pointed out
while reworking the driver, should be more straightforward now

Thanks,
Jean

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH] iommu/fsl-pamu: Fix uninitialized variable warning

2021-04-15 Thread Joerg Roedel
From: Joerg Roedel 

The variable 'i' in the function update_liodn_stash() is not
initialized and only used in a debug printk(). So it has no
meaning at all, remove it.

Reported-by: kernel test robot 
Signed-off-by: Joerg Roedel 
---
 drivers/iommu/fsl_pamu_domain.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/fsl_pamu_domain.c b/drivers/iommu/fsl_pamu_domain.c
index 32944d67baa4..0ac781186dbd 100644
--- a/drivers/iommu/fsl_pamu_domain.c
+++ b/drivers/iommu/fsl_pamu_domain.c
@@ -57,14 +57,13 @@ static int __init iommu_init_mempool(void)
 static int update_liodn_stash(int liodn, struct fsl_dma_domain *dma_domain,
  u32 val)
 {
-   int ret = 0, i;
+   int ret = 0;
unsigned long flags;
 
spin_lock_irqsave(_lock, flags);
ret = pamu_update_paace_stash(liodn, val);
if (ret) {
-   pr_debug("Failed to update SPAACE %d field for liodn %d\n ",
-i, liodn);
+   pr_debug("Failed to update SPAACE for liodn %d\n ", liodn);
spin_unlock_irqrestore(_lock, flags);
return ret;
}
-- 
2.30.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 2/2] iommu/amd: Remove performance counter pre-initialization test

2021-04-15 Thread David Coe

I think you've put your finger on it, Suravee!

On 15/04/2021 10:28, Suthikulpanit, Suravee wrote:

David,

On 4/14/2021 10:33 PM, David Coe wrote:

Hi Suravee!

I've re-run your revert+update patch on Ubuntu's latest kernel 
5.11.0-14 partly to check my mailer's 'mangling' hadn't also reached 
the code!


There are 3 sets of results in the attachment, all for the Ryzen 
2400G. The as-distributed kernel already incorporates your IOMMU RFCv3 
patch.


A. As-distributed kernel (cold boot)
    >5 retries, so no IOMMU read/write capability, no amd_iommu events.

B. As-distributed kernel (warm boot)
    <5 retries, amd_iommu running stats show large numbers as before.

C. Revert+Update kernel
    amd_iommu events listed and also show large hit/miss numbers.

In due course, I'll load the new (revert+update) kernel on the 4700G 
but won't overload your mail-box unless something unusual turns up.


Best regards,



For the Ryzen 2400G, could you please try with:
- 1 event at a time
- Not more than 8 events (On your system, it has 2 banks x 4 counters/bank.
I am trying to see if this issue might be related to the counters 
multiplexing).


Thanks,


Attached are the results you requested for the 2400G along with a tiny 
shell-script.


One event at a time and various batches of less than 8 events produce 
unexceptionable data. One final batch of 10 events and (hoopla) up go 
the counter stats.


Will you be doing something in mitigation or does this just go with the 
patch? Is there anything further you need from me? I'll run the script 
on the 4700U but I don't expect surprises :-).


All most appreciated,

--
David


iommu_list.sh
Description: application/shellscript
$ sudo ./iommu_list.sh

 Performance counter stats for 'system wide':

12  amd_iommu_0/cmd_processed/

  10.001266851 seconds time elapsed


 Performance counter stats for 'system wide':

11   amd_iommu_0/cmd_processed_inv/

  10.001259049 seconds time elapsed


 Performance counter stats for 'system wide':

 0   amd_iommu_0/ign_rd_wr_mmio_1ff8h/

  10.000791810 seconds time elapsed


 Performance counter stats for 'system wide':

   350   amd_iommu_0/int_dte_hit/

  10.000848437 seconds time elapsed


 Performance counter stats for 'system wide':

16   amd_iommu_0/int_dte_mis/

  10.001271989 seconds time elapsed


 Performance counter stats for 'system wide':

   348   amd_iommu_0/mem_dte_hit/

  10.000808074 seconds time elapsed


 Performance counter stats for 'system wide':

   211,925   amd_iommu_0/mem_dte_mis/

  10.000915362 seconds time elapsed


 Performance counter stats for 'system wide':

30   amd_iommu_0/mem_iommu_tlb_pde_hit/

  10.001520597 seconds time elapsed


 Performance counter stats for 'system wide':

   450   amd_iommu_0/mem_iommu_tlb_pde_mis/

  10.000877493 seconds time elapsed


 Performance counter stats for 'system wide':

10,953   amd_iommu_0/mem_iommu_tlb_pte_hit/

  10.000831802 seconds time elapsed


 Performance counter stats for 'system wide':

13,235   amd_iommu_0/mem_iommu_tlb_pte_mis/

  10.001292003 seconds time elapsed


 Performance counter stats for 'system wide':

 0   amd_iommu_0/mem_pass_excl/

  10.000836000 seconds time elapsed


 Performance counter stats for 'system wide':

 0   amd_iommu_0/mem_pass_pretrans/

  10.000799887 seconds time elapsed


 Performance counter stats for 'system wide':

12,283   amd_iommu_0/mem_pass_untrans/

  10.000815339 seconds time elapsed


 Performance counter stats for 'system wide':

 0   amd_iommu_0/mem_target_abort/

  10.001205168 seconds time elapsed


 Performance counter stats for 'system wide':

 1,333   amd_iommu_0/mem_trans_total/

  10.000915359 seconds time elapsed


 Performance counter stats for 'system wide':

 0   amd_iommu_0/page_tbl_read_gst/

  10.001248235 seconds time elapsed


 Performance counter stats for 'system wide':

65   amd_iommu_0/page_tbl_read_nst/

  10.001266411 seconds time elapsed


 Performance counter stats for 'system wide':

78   amd_iommu_0/page_tbl_read_tot/

  10.001272406 seconds time elapsed


 Performance counter stats for 'system wide':

 0   amd_iommu_0/smi_blk/

  10.001282912 seconds time elapsed


 Performance counter stats for 'system wide':

 0   amd_iommu_0/smi_recv/

  10.001223193 seconds time elapsed


 Performance counter stats for 'system wide':

 0   amd_iommu_0/tlb_inv/

  10.001234853 seconds time elapsed


 Performance counter stats for 'system wide':

 0   amd_iommu_0/vapic_int_guest/

  

Re: [PATCH 1/3] ACPICA: iASL: Add definitions for the VIOT table

2021-04-15 Thread Jean-Philippe Brucker
On Thu, Mar 18, 2021 at 06:52:44PM +0100, Auger Eric wrote:
> Besides
> Reviewed-by: Eric Auger 

Thanks, though this patch comes from ACPICA and has now been merged with
the other ACPICA updates:
https://lore.kernel.org/linux-acpi/20210406213028.718796-1-erik.kan...@intel.com/

Thanks,
Jean
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 2/3] ACPI: Add driver for the VIOT table

2021-04-15 Thread Jean-Philippe Brucker
On Thu, Mar 18, 2021 at 07:36:50PM +, Robin Murphy wrote:
> On 2021-03-16 19:16, Jean-Philippe Brucker wrote:
> > The ACPI Virtual I/O Translation Table describes topology of
> > para-virtual platforms. For now it describes the relation between
> > virtio-iommu and the endpoints it manages. Supporting that requires
> > three steps:
> > 
> > (1) acpi_viot_init(): parse the VIOT table, build a list of endpoints
> >  and vIOMMUs.
> > 
> > (2) acpi_viot_set_iommu_ops(): when the vIOMMU driver is loaded and the
> >  device probed, register it to the VIOT driver. This step is required
> >  because unlike similar drivers, VIOT doesn't create the vIOMMU
> >  device.
> 
> Note that you're basically the same as the DT case in this regard, so I'd
> expect things to be closer to that pattern than to that of IORT.
> 
> [...]
> > @@ -1506,12 +1507,17 @@ int acpi_dma_configure_id(struct device *dev, enum 
> > dev_dma_attr attr,
> >   {
> > const struct iommu_ops *iommu;
> > u64 dma_addr = 0, size = 0;
> > +   int ret;
> > if (attr == DEV_DMA_NOT_SUPPORTED) {
> > set_dma_ops(dev, _dummy_ops);
> > return 0;
> > }
> > +   ret = acpi_viot_dma_setup(dev, attr);
> > +   if (ret)
> > +   return ret > 0 ? 0 : ret;
> 
> I think things could do with a fair bit of refactoring here. Ideally we want
> to process a possible _DMA method (acpi_dma_get_range()) regardless of which
> flavour of IOMMU table might be present, and the amount of duplication we
> fork into at this point is unfortunate.
> 
> > +
> > iort_dma_setup(dev, _addr, );
> 
> For starters I think most of that should be dragged out to this level here -
> it's really only the {rc,nc}_dma_get_range() bit that deserves to be the
> IORT-specific call.

Makes sense, though I'll move it to drivers/acpi/arm64/dma.c instead of
here, because it has only ever run on CONFIG_ARM64. I don't want to
accidentally break some x86 platform with an invalid _DMA method (same
reason 7ad426398082 and 18b709beb503 kept this code in IORT)

> 
> > iommu = iort_iommu_configure_id(dev, input_id);
> 
> Similarly, it feels like it's only the table scan part in the middle of that
> that needs dispatching between IORT/VIOT, and its head and tail pulled out
> into a common path.

Agreed

> 
> [...]
> > +static const struct iommu_ops *viot_iommu_setup(struct device *dev)
> > +{
> > +   struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
> > +   struct viot_iommu *viommu = NULL;
> > +   struct viot_endpoint *ep;
> > +   u32 epid;
> > +   int ret;
> > +
> > +   /* Already translated? */
> > +   if (fwspec && fwspec->ops)
> > +   return NULL;
> > +
> > +   mutex_lock(_lock);
> > +   list_for_each_entry(ep, _endpoints, list) {
> > +   if (viot_device_match(dev, >dev_id, )) {
> > +   epid += ep->endpoint_id;
> > +   viommu = ep->viommu;
> > +   break;
> > +   }
> > +   }
> > +   mutex_unlock(_lock);
> > +   if (!viommu)
> > +   return NULL;
> > +
> > +   /* We're not translating ourself */
> > +   if (viot_device_match(dev, >dev_id, ))
> > +   return NULL;
> > +
> > +   /*
> > +* If we found a PCI range managed by the viommu, we're the one that has
> > +* to request ACS.
> > +*/
> > +   if (dev_is_pci(dev))
> > +   pci_request_acs();
> > +
> > +   if (!viommu->ops || WARN_ON(!viommu->dev))
> > +   return ERR_PTR(-EPROBE_DEFER);
> 
> Can you create (or look up) a viommu->fwnode when initially parsing the VIOT
> to represent the IOMMU devices to wait for, such that the
> viot_device_match() lookup can resolve to that and let you fall into the
> standard iommu_ops_from_fwnode() path? That's what I mean about following
> the DT pattern - I guess it might need a bit of trickery to rewrite things
> if iommu_device_register() eventually turns up with a new fwnode, so I doubt
> we can get away without *some* kind of private interface between
> virtio-iommu and VIOT, but it would be nice for the common(ish) DMA paths to
> stay as unaware of the specifics as possible.

Yes I can reuse iommu_ops_from_fwnode(). Turns out it's really easy: if we
move the VIOT initialization after acpi_scan_init(), we can use
pci_get_domain_bus_and_slot() directly and create missing fwnodes. That
gets rid of any need for a private interface between virtio-iommu and
VIOT.

> 
> > +
> > +   ret = iommu_fwspec_init(dev, viommu->dev->fwnode, viommu->ops);
> > +   if (ret)
> > +   return ERR_PTR(ret);
> > +
> > +   iommu_fwspec_add_ids(dev, , 1);
> > +
> > +   /*
> > +* If we have reason to believe the IOMMU driver missed the initial
> > +* add_device callback for dev, replay it to get things in order.
> > +*/
> > +   if (dev->bus && !device_iommu_mapped(dev))
> > +   iommu_probe_device(dev);
> > +
> > +   return viommu->ops;
> > +}
> > +
> > +/**
> > + * acpi_viot_dma_setup - Configure DMA for an endpoint described in 

Re: [PATCH v2] iommu/vt-d: Force to flush iotlb before creating superpage

2021-04-15 Thread Joerg Roedel
On Thu, Apr 15, 2021 at 08:46:28AM +0800, Longpeng(Mike) wrote:
> Fixes: 6491d4d02893 ("intel-iommu: Free old page tables before creating 
> superpage")
> Cc:  # v3.0+
> Link: 
> https://lore.kernel.org/linux-iommu/670baaf8-4ff8-4e84-4be3-030b95ab5...@huawei.com/
> Suggested-by: Lu Baolu 
> Signed-off-by: Longpeng(Mike) 
> ---
> v1 -> v2:
>   - add Joerg
>   - reconstruct the solution base on the Baolu's suggestion
> ---
>  drivers/iommu/intel/iommu.c | 52 
> +
>  1 file changed, 38 insertions(+), 14 deletions(-)

Applied, thanks.

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/amd: Put newline after closing bracket in warning

2021-04-15 Thread Jörg Rödel
On Mon, Apr 12, 2021 at 08:01:41PM +0200, Paul Menzel wrote:
>  drivers/iommu/amd/init.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Applied, thanks.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/2] iommu/mediatek-v1: Avoid build fail when build as module

2021-04-15 Thread Joerg Roedel
On Mon, Apr 12, 2021 at 02:48:42PM +0800, Yong Wu wrote:
> When this driver build as module, It build fail like:
> 
> ERROR: modpost: "of_phandle_iterator_args"
> [drivers/iommu/mtk_iommu_v1.ko] undefined!
> 
> This patch remove this interface to avoid this build fail.
> 
> Reported-by: Valdis Kletnieks 
> Signed-off-by: Yong Wu 
> ---
> Currently below patch is only in linux-next-20210409. This fixes tag may be
> not needed. we can add this if it is need.
> Fixes: 8de000cf0265 ("iommu/mediatek-v1: Allow building as module")
> ---
>  drivers/iommu/mtk_iommu_v1.c | 62 
>  1 file changed, 28 insertions(+), 34 deletions(-)

Applied both, thanks.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/vt-d: Fix an error handling path in 'intel_prepare_irq_remapping()'

2021-04-15 Thread Joerg Roedel
On Sun, Apr 11, 2021 at 09:08:17AM +0200, Christophe JAILLET wrote:
>  drivers/iommu/intel/irq_remapping.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Applied, thanks.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/1] iommu/vt-d: Fix build error of pasid_enable_wpe() with !X86

2021-04-15 Thread Joerg Roedel
On Sun, Apr 11, 2021 at 02:23:12PM +0800, Lu Baolu wrote:
>  drivers/iommu/intel/pasid.c | 2 ++
>  1 file changed, 2 insertions(+)

Applied, thanks.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 0/2] iommu/amd: Revert and remove failing PMC test

2021-04-15 Thread Joerg Roedel
On Fri, Apr 09, 2021 at 03:58:46AM -0500, Suravee Suthikulpanit wrote:
> Paul Menzel (1):
>   Revert "iommu/amd: Fix performance counter initialization"
> 
> Suravee Suthikulpanit (1):
>   iommu/amd: Remove performance counter pre-initialization test

Applied, thanks Paul and Suravee.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH RESEND] iommu/amd: Remove duplicate check of devid

2021-04-15 Thread Joerg Roedel
On Fri, Apr 09, 2021 at 11:30:40AM +0800, Shaokun Zhang wrote:
>  drivers/iommu/amd/iommu.c | 9 +
>  1 file changed, 1 insertion(+), 8 deletions(-)

Applied, thanks.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu: exynos: remove unneeded local variable initialization

2021-04-15 Thread Joerg Roedel
On Thu, Apr 08, 2021 at 10:16:22PM +0200, Krzysztof Kozlowski wrote:
>  drivers/iommu/exynos-iommu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Applied, thanks.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-04-15 Thread Auger Eric
Hi Jason,

On 4/1/21 6:03 PM, Jason Gunthorpe wrote:
> On Thu, Apr 01, 2021 at 02:08:17PM +, Liu, Yi L wrote:
> 
>> DMA page faults are delivered to root-complex via page request message and
>> it is per-device according to PCIe spec. Page request handling flow is:
>>
>> 1) iommu driver receives a page request from device
>> 2) iommu driver parses the page request message. Get the RID,PASID, faulted
>>page and requested permissions etc.
>> 3) iommu driver triggers fault handler registered by device driver with
>>iommu_report_device_fault()
> 
> This seems confused.
> 
> The PASID should define how to handle the page fault, not the driver.

In my series I don't use PASID at all. I am just enabling nested stage
and the guest uses a single context. I don't allocate any user PASID at
any point.

When there is a fault at physical level (a stage 1 fault that concerns
the guest), this latter needs to be reported and injected into the
guest. The vfio pci driver registers a fault handler to the iommu layer
and in that fault handler it fills a circ bugger and triggers an eventfd
that is listened to by the VFIO-PCI QEMU device. this latter retrives
the faault from the mmapped circ buffer, it knowns which vIOMMU it is
attached to, and passes the fault to the vIOMMU.
Then the vIOMMU triggers and IRQ in the guest.

We are reusing the existing concepts from VFIO, region, IRQ to do that.

For that use case, would you also use /dev/ioasid?

Thanks

Eric
> 
> I don't remember any device specific actions in ATS, so what is the
> driver supposed to do?
> 
>> 4) device driver's fault handler signals an event FD to notify userspace to
>>fetch the information about the page fault. If it's VM case, inject the
>>page fault to VM and let guest to solve it.
> 
> If the PASID is set to 'report page fault to userspace' then some
> event should come out of /dev/ioasid, or be reported to a linked
> eventfd, or whatever.
> 
> If the PASID is set to 'SVM' then the fault should be passed to
> handle_mm_fault
> 
> And so on.
> 
> Userspace chooses what happens based on how they configure the PASID
> through /dev/ioasid.
> 
> Why would a device driver get involved here?
> 
>> Eric has sent below series for the page fault reporting for VM with passthru
>> device.
>> https://lore.kernel.org/kvm/20210223210625.604517-5-eric.au...@redhat.com/
> 
> It certainly should not be in vfio pci. Everything using a PASID needs
> this infrastructure, VDPA, mdev, PCI, CXL, etc.
> 
> Jason
> 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [Resend RFC PATCH V2 08/12] UIO/Hyper-V: Not load UIO HV driver in the isolation VM.

2021-04-15 Thread Tianyu Lan




On 4/14/2021 11:45 PM, Greg KH wrote:

On Wed, Apr 14, 2021 at 10:49:41AM -0400, Tianyu Lan wrote:

From: Tianyu Lan 

UIO HV driver should not load in the isolation VM for security reason.
Return ENOTSUPP in the hv_uio_probe() in the isolation VM.

Signed-off-by: Tianyu Lan 
---
  drivers/uio/uio_hv_generic.c | 5 +
  1 file changed, 5 insertions(+)

diff --git a/drivers/uio/uio_hv_generic.c b/drivers/uio/uio_hv_generic.c
index 0330ba99730e..678b021d66f8 100644
--- a/drivers/uio/uio_hv_generic.c
+++ b/drivers/uio/uio_hv_generic.c
@@ -29,6 +29,7 @@
  #include 
  #include 
  #include 
+#include 
  
  #include "../hv/hyperv_vmbus.h"
  
@@ -241,6 +242,10 @@ hv_uio_probe(struct hv_device *dev,

void *ring_buffer;
int ret;
  
+	/* UIO driver should not be loaded in the isolation VM.*/

+   if (hv_is_isolation_supported())
+   return -ENOTSUPP;
+   
/* Communicating with host has to be via shared memory not hypercall */
if (!channel->offermsg.monitor_allocated) {
dev_err(>device, "vmbus channel requires hypercall\n");
--
2.25.1



Again you send out known-wrong patches?

:(

Sorry for noise. Will fix this next version and I think we should make 
sure user space driver to check data from host. This patch will be removed.

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [Resend RFC PATCH V2 08/12] UIO/Hyper-V: Not load UIO HV driver in the isolation VM.

2021-04-15 Thread Tianyu Lan

Hi Stephen:
Thanks for your review.


On 4/15/2021 12:17 AM, Stephen Hemminger wrote:

On Wed, 14 Apr 2021 17:45:51 +0200
Greg KH  wrote:


On Wed, Apr 14, 2021 at 10:49:41AM -0400, Tianyu Lan wrote:

From: Tianyu Lan 

UIO HV driver should not load in the isolation VM for security reason.
Return ENOTSUPP in the hv_uio_probe() in the isolation VM.

Signed-off-by: Tianyu Lan 


This is debatable, in isolation VM's shouldn't userspace take responsibility
to validate host communication. If that is an issue please participate with
the DPDK community (main user of this) to make sure netvsc userspace driver
has the required checks.



Agree. Will report back to secure team and apply request to add change 
in userspace netvsc driver. Thanks for advise.

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 2/2] iommu/sva: Remove mm parameter from SVA bind API

2021-04-15 Thread Jason Gunthorpe
On Thu, Apr 15, 2021 at 01:33:24PM +0800, Lu Baolu wrote:
> Hi Jason,
> 
> On 4/14/21 7:26 PM, Jason Gunthorpe wrote:
> > On Wed, Apr 14, 2021 at 02:22:09PM +0800, Lu Baolu wrote:
> > 
> > > I still worry about supervisor pasid allocation.
> > > 
> > > If we use iommu_sva_alloc_pasid() to allocate a supervisor pasid, which
> > > mm should the pasid be set? I've ever thought about passing _mm to
> > > iommu_sva_alloc_pasid(). But if you add "mm != current->mm", this seems
> > > not to work. Or do you prefer a separated interface for supervisor pasid
> > > allocation/free?
> > 
> > Without a mm_struct it is not SVA, so don't use SVA APIs for whatever
> > a 'supervisor pasid' is
> 
> The supervisor PASID has its mm_struct. The only difference is that the
> device will set priv=1 in its DMA transactions with the PASID.

Soemthing like init_mm should not be used with 'SVA'

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [RFC PATCH v2 0/8] ACPI/IORT: Support for IORT RMR node

2021-04-15 Thread Shameerali Kolothum Thodi
[+Lorenzo]

> -Original Message-
> From: Auger Eric [mailto:eric.au...@redhat.com]
> Sent: 15 April 2021 10:49
> To: Shameerali Kolothum Thodi ;
> linux-arm-ker...@lists.infradead.org; linux-a...@vger.kernel.org;
> iommu@lists.linux-foundation.org; de...@acpica.org
> Cc: Linuxarm ; steven.pr...@arm.com; Guohanjun
> (Hanjun Guo) ; sami.muja...@arm.com;
> robin.mur...@arm.com; wanghuiqiang ;
> Jean-Philippe Brucker 
> Subject: Re: [RFC PATCH v2 0/8] ACPI/IORT: Support for IORT RMR node
> 
> Hi Shameer,
> 
> + Jean-Philippe
> 
> 
> On 11/19/20 1:11 PM, Shameer Kolothum wrote:
> > RFC v1 --> v2:
> >  - Added a generic interface for IOMMU drivers to retrieve all the
> >    RMR info associated with a given IOMMU.
> >  - SMMUv3 driver gets the RMR list during probe() and installs
> >    bypass STEs for all the SIDs in the RMR list. This is to keep
> >    the ongoing traffic alive(if any) during SMMUv3 reset. This is
> >based on the suggestions received for v1 to take care of the
> >EFI framebuffer use case. Only sanity tested for now.
> >  - During the probe/attach device, SMMUv3 driver reserves any
> >    RMR region associated with the device such that there is a unity
> >    mapping for them in SMMU.
> > ---
> >
> > The series adds support to IORT RMR nodes specified in IORT
> > Revision E -ARM DEN 0049E[0]. RMR nodes are used to describe memory
> > ranges that are used by endpoints and require a unity mapping
> > in SMMU.
> >
> > We have faced issues with 3408iMR RAID controller cards which
> > fail to boot when SMMU is enabled. This is because these controllers
> > make use of host memory for various caching related purposes and when
> > SMMU is enabled the iMR firmware fails to access these memory regions
> > as there is no mapping for them. IORT RMR provides a way for UEFI to
> > describe and report these memory regions so that the kernel can make
> > a unity mapping for these in SMMU.
> >
> > RFC because, Patch #1 is to update the actbl2.h and should be done
> > through acpica update. I have send out a pull request[1] for that.
> >
> > Tests:
> >
> > With a UEFI, that reports the RMR for the dev,
> > 
> > [16F0h 5872   1] Type : 06
> > [16F1h 5873   2]   Length : 007C
> > [16F3h 5875   1] Revision : 00
> > [1038h 0056   2] Reserved : 
> > [1038h 0056   2]   Identifier : 
> > [16F8h 5880   4]Mapping Count : 0001
> > [16FCh 5884   4]   Mapping Offset : 0040
> >
> > [1700h 5888   4]Number of RMR Descriptors : 0002
> > [1704h 5892   4]RMR Descriptor Offset : 0018
> >
> > [1708h 5896   8]  Base Address of RMR : E640
> > [1710h 5904   8]Length of RMR : 0010
> > [1718h 5912   4] Reserved : 
> >
> > [171Ch 5916   8]  Base Address of RMR : 27B0
> > [1724h 5924   8]Length of RMR : 00C0
> > [172Ch 5932   4] Reserved : 
> >
> > [1730h 5936   4]   Input base : 
> > [1734h 5940   4] ID Count : 0001
> > [1738h 5944   4]  Output Base : 0003
> > [173Ch 5948   4] Output Reference : 0064
> > [1740h 5952   4]Flags (decoded below) : 0001
> >Single Mapping : 1
> 
> Following Jean-Philippe's suggestion I have used your series for nested
> stage SMMUv3 integration, ie. to simplify the MSI nested stage mapping.
> 
> Host allocates hIOVA -> physical doorbell (pDB) as it normally does for
> VFIO device passthrough. IOVA Range is 0x800 - 0x810.
> 
> I expose this MIS IOVA range to the guest as an RMR and as a result
> guest has a flat mapping for this range. As the physical device is
> programmed with hIOVA we have the following mapping:
> 
> IOVAIPA  PA
> hIOVA   -> hIOVA ->  pDB
> S1   s2
> 
> This works.
> 
> The only weird thing is that I need to expose 256 RMRs due to the
> 'Single Mapping' mandatory flag. I need to have 1 RMR per potential SID
> on the bus.

Please see the discussion here,
https://op-lists.linaro.org/pipermail/linaro-open-discussions/2021-April/000150.html

Hi Lorenzo,

May be this is a use case for relaxing that single mapping requirement.

Thanks,
Shameer

> 
> I will post a new version of SMMUv3 nested stage soon for people to test
> & compare. Obviously this removes a bunch of code on both SMMU/VFIO and
> QEMU code so I think this solution looks better overall.
> 
> Thanks
> 
> Eric
> > ...
> >
> > Without the series the RAID controller initialization fails as
> > below,
> >
> > ...
> > [   12.631117] megaraid_sas :03:00.0: FW supports sync
> cache: Yes
> > [   12.637360] megaraid_sas :03:00.0: megasas_disable_intr_fusion is
> called outbound_intr_mask:0x4009
> > [   

RE: [RFC PATCH v2 2/8] ACPI/IORT: Add support for RMR node parsing

2021-04-15 Thread Shameerali Kolothum Thodi
Hi Eric,

> -Original Message-
> From: Auger Eric [mailto:eric.au...@redhat.com]
> Sent: 15 April 2021 10:39
> To: Shameerali Kolothum Thodi ;
> linux-arm-ker...@lists.infradead.org; linux-a...@vger.kernel.org;
> iommu@lists.linux-foundation.org; de...@acpica.org
> Cc: Linuxarm ; steven.pr...@arm.com; Guohanjun
> (Hanjun Guo) ; sami.muja...@arm.com;
> robin.mur...@arm.com; wanghuiqiang 
> Subject: Re: [RFC PATCH v2 2/8] ACPI/IORT: Add support for RMR node parsing
> 
> Hi Shameer,
> On 11/19/20 1:11 PM, Shameer Kolothum wrote:
> > Add support for parsing RMR node information from ACPI.
> > Find associated stream ids and smmu node info from the
> > RMR node and populate a linked list with RMR memory
> > descriptors.
> >
> > Signed-off-by: Shameer Kolothum 
> > ---
> >  drivers/acpi/arm64/iort.c | 122
> +-
> >  1 file changed, 121 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
> > index 9929ff50c0c0..a9705aa35028 100644
> > --- a/drivers/acpi/arm64/iort.c
> > +++ b/drivers/acpi/arm64/iort.c
> > @@ -40,6 +40,25 @@ struct iort_fwnode {
> >  static LIST_HEAD(iort_fwnode_list);
> >  static DEFINE_SPINLOCK(iort_fwnode_lock);
> >
> > +struct iort_rmr_id {
> > +   u32  sid;
> > +   struct acpi_iort_node *smmu;
> > +};
> > +
> > +/*
> > + * One entry for IORT RMR.
> > + */
> > +struct iort_rmr_entry {
> > +   struct list_head list;
> > +
> > +   unsigned int rmr_ids_num;
> > +   struct iort_rmr_id *rmr_ids;
> > +
> > +   struct acpi_iort_rmr_desc *rmr_desc;
> > +};
> > +
> > +static LIST_HEAD(iort_rmr_list); /* list of RMR regions from ACPI
> */
> > +
> >  /**
> >   * iort_set_fwnode() - Create iort_fwnode and use it to register
> >   *iommu data in the iort_fwnode_list
> > @@ -393,7 +412,8 @@ static struct acpi_iort_node
> *iort_node_get_id(struct acpi_iort_node *node,
> > if (node->type == ACPI_IORT_NODE_NAMED_COMPONENT ||
> > node->type == ACPI_IORT_NODE_PCI_ROOT_COMPLEX ||
> > node->type == ACPI_IORT_NODE_SMMU_V3 ||
> > -   node->type == ACPI_IORT_NODE_PMCG) {
> > +   node->type == ACPI_IORT_NODE_PMCG ||
> > +   node->type == ACPI_IORT_NODE_RMR) {
> > *id_out = map->output_base;
> > return parent;
> > }
> > @@ -1647,6 +1667,103 @@ static void __init iort_enable_acs(struct
> acpi_iort_node *iort_node)
> >  #else
> >  static inline void iort_enable_acs(struct acpi_iort_node *iort_node) { }
> >  #endif
> > +static int iort_rmr_desc_valid(struct acpi_iort_rmr_desc *desc)
> > +{
> > +   struct iort_rmr_entry *e;
> > +   u64 end, start = desc->base_address, length = desc->length;
> > +
> > +   if (!IS_ALIGNED(start, SZ_64K) || !IS_ALIGNED(length, SZ_64K))
> > +   return -EINVAL;
> > +
> > +   end = start + length - 1;
> > +
> > +   /* Check for address overlap */
> I don't get this check. What is the problem if you attach the same range
> to different stream ids. Shouldn't you check there is no overlap for the
> same sid?

That’s right. The check should be for memory descriptors within an RMR.
I got confused by the wordings in the IORT spec and that is now clarified
with Lorenzo here,

https://op-lists.linaro.org/pipermail/linaro-open-discussions/2021-April/000150.html

I will change this.
> 
> 
> > +   list_for_each_entry(e, _rmr_list, list) {
> > +   u64 e_start = e->rmr_desc->base_address;
> > +   u64 e_end = e_start + e->rmr_desc->length - 1;
> > +
> > +   if (start <= e_end && end >= e_start)
> > +   return -EINVAL;
> > +   }
> > +
> > +   return 0;
> > +}
> > +
> > +static int __init iort_parse_rmr(struct acpi_iort_node *iort_node)
> > +{
> > +   struct iort_rmr_id *rmr_ids, *ids;
> > +   struct iort_rmr_entry *e;
> > +   struct acpi_iort_rmr *rmr;
> > +   struct acpi_iort_rmr_desc *rmr_desc;
> > +   u32 map_count = iort_node->mapping_count;
> > +   int i, ret = 0, desc_count = 0;
> > +
> > +   if (iort_node->type != ACPI_IORT_NODE_RMR)
> > +   return 0;
> > +
> > +   if (!iort_node->mapping_offset || !map_count) {
> > +   pr_err(FW_BUG "Invalid ID mapping, skipping RMR node %p\n",
> > +  iort_node);
> > +   return -EINVAL;
> > +   }
> > +
> > +   rmr_ids = kmalloc(sizeof(*rmr_ids) * map_count, GFP_KERNEL);
> > +   if (!rmr_ids)
> > +   return -ENOMEM;
> > +
> > +   /* Retrieve associated smmu and stream id */
> > +   ids = rmr_ids;
> nit: do you need both rmr_ids and ids?

Not really as the spec says it is M:1 mapping. So we only will have one 
single id here(also map_count for the node must be set to 1 as well).

> > +   for (i = 0; i < map_count; i++, ids++) {
> > +   ids->smmu = iort_node_get_id(iort_node, >sid, i);
> > +   if (!ids->smmu) {
> > +   pr_err(FW_BUG "Invalid SMMU reference, skipping RMR
> node %p\n",
> > +

Re: [PATCH v3 01/12] iommu: Introduce dirty log tracking framework

2021-04-15 Thread Lu Baolu

Hi,

On 2021/4/15 15:43, Keqian Zhu wrote:

design it as not switchable. I will modify the commit message of patch#12, 
thanks!

I am not sure that I fully get your point. But I can't see any gaps of
using iommu_dev_enable/disable_feature() to switch dirty log on and off.
Probably I missed anything.

IOMMU_DEV_FEAT_HWDBM just tells user whether underlying IOMMU driver supports
dirty tracking, it is not used to management the status of dirty log tracking.

The feature reporting is per device, but the status management is per 
iommu_domain.
Only when all devices in a domain support HWDBM, we can start dirty log for the 
domain.


So why not

for_each_device_attached_domain()
iommu_dev_enable_feature(IOMMU_DEV_FEAT_HWDBM)

?


And I think we'd better not mix the feature reporting and status management. 
Thoughts?



I am worrying about having two sets of APIs for single purpose. From
vendor iommu driver's point of view, this feature is per device. Hence,
it still needs to do the same thing.

Best regards,
baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2] iommu/iova: put free_iova_mem() outside of spinlock iova_rbtree_lock

2021-04-15 Thread John Garry

On 15/04/2021 04:52, chenxiang wrote:

From: Xiang Chen 

It is not necessary to put free_iova_mem() inside of spinlock/unlock
iova_rbtree_lock which only leads to more completion for the spinlock.
It has a small promote on the performance after the change. And also
rename private_free_iova() as remove_iova() because the function will not
free iova after that change.

Signed-off-by: Xiang Chen 
---
  drivers/iommu/iova.c | 13 +++--
  1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index b7ecd5b..c10af23 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -412,12 +412,11 @@ private_find_iova(struct iova_domain *iovad, unsigned 
long pfn)
return NULL;
  }
  
-static void private_free_iova(struct iova_domain *iovad, struct iova *iova)

+static void remove_iova(struct iova_domain *iovad, struct iova *iova)
  {
assert_spin_locked(>iova_rbtree_lock);
__cached_rbnode_delete_update(iovad, iova);
rb_erase(>node, >rbroot);
-   free_iova_mem(iova);
  }
  
  /**

@@ -452,8 +451,9 @@ __free_iova(struct iova_domain *iovad, struct iova *iova)
unsigned long flags;
  
  	spin_lock_irqsave(>iova_rbtree_lock, flags);

-   private_free_iova(iovad, iova);
+   remove_iova(iovad, iova);
spin_unlock_irqrestore(>iova_rbtree_lock, flags);
+   free_iova_mem(iova);
  }
  EXPORT_SYMBOL_GPL(__free_iova);
  
@@ -473,9 +473,9 @@ free_iova(struct iova_domain *iovad, unsigned long pfn)

spin_lock_irqsave(>iova_rbtree_lock, flags);
iova = private_find_iova(iovad, pfn);
if (iova)
-   private_free_iova(iovad, iova);
+   remove_iova(iovad, iova);
spin_unlock_irqrestore(>iova_rbtree_lock, flags);
-
+   free_iova_mem(iova);


you need to make this:
if (iova)
free_iova_mem(iova);

as free_iova_mem(iova) dereferences iova:

if (iova->pfn_lo != IOVA_ANCHOR)
kmem_cache_free(iova_cache, iova)

So if iova were NULL, we crash.

Or you can make free_iova_mem() NULL safe.


  }
  EXPORT_SYMBOL_GPL(free_iova);
  
@@ -825,7 +825,8 @@ iova_magazine_free_pfns(struct iova_magazine *mag, struct iova_domain *iovad)

if (WARN_ON(!iova))
continue;
  
-		private_free_iova(iovad, iova);

+   remove_iova(iovad, iova);
+   free_iova_mem(iova);
}
  
  	spin_unlock_irqrestore(>iova_rbtree_lock, flags);




___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH v2 0/8] ACPI/IORT: Support for IORT RMR node

2021-04-15 Thread Auger Eric
Hi Shameer,

+ Jean-Philippe


On 11/19/20 1:11 PM, Shameer Kolothum wrote:
> RFC v1 --> v2:
>  - Added a generic interface for IOMMU drivers to retrieve all the 
>    RMR info associated with a given IOMMU.
>  - SMMUv3 driver gets the RMR list during probe() and installs
>    bypass STEs for all the SIDs in the RMR list. This is to keep
>    the ongoing traffic alive(if any) during SMMUv3 reset. This is
>based on the suggestions received for v1 to take care of the
>EFI framebuffer use case. Only sanity tested for now.
>  - During the probe/attach device, SMMUv3 driver reserves any
>    RMR region associated with the device such that there is a unity
>    mapping for them in SMMU.
> ---    
> 
> The series adds support to IORT RMR nodes specified in IORT
> Revision E -ARM DEN 0049E[0]. RMR nodes are used to describe memory
> ranges that are used by endpoints and require a unity mapping
> in SMMU.
> 
> We have faced issues with 3408iMR RAID controller cards which
> fail to boot when SMMU is enabled. This is because these controllers
> make use of host memory for various caching related purposes and when
> SMMU is enabled the iMR firmware fails to access these memory regions
> as there is no mapping for them. IORT RMR provides a way for UEFI to
> describe and report these memory regions so that the kernel can make
> a unity mapping for these in SMMU.
> 
> RFC because, Patch #1 is to update the actbl2.h and should be done
> through acpica update. I have send out a pull request[1] for that.
> 
> Tests:
> 
> With a UEFI, that reports the RMR for the dev,
> 
> [16F0h 5872   1] Type : 06
> [16F1h 5873   2]   Length : 007C
> [16F3h 5875   1] Revision : 00
> [1038h 0056   2] Reserved : 
> [1038h 0056   2]   Identifier : 
> [16F8h 5880   4]Mapping Count : 0001
> [16FCh 5884   4]   Mapping Offset : 0040
> 
> [1700h 5888   4]Number of RMR Descriptors : 0002
> [1704h 5892   4]RMR Descriptor Offset : 0018
> 
> [1708h 5896   8]  Base Address of RMR : E640
> [1710h 5904   8]Length of RMR : 0010
> [1718h 5912   4] Reserved : 
> 
> [171Ch 5916   8]  Base Address of RMR : 27B0
> [1724h 5924   8]Length of RMR : 00C0
> [172Ch 5932   4] Reserved : 
> 
> [1730h 5936   4]   Input base : 
> [1734h 5940   4] ID Count : 0001
> [1738h 5944   4]  Output Base : 0003
> [173Ch 5948   4] Output Reference : 0064
> [1740h 5952   4]Flags (decoded below) : 0001
>Single Mapping : 1

Following Jean-Philippe's suggestion I have used your series for nested
stage SMMUv3 integration, ie. to simplify the MSI nested stage mapping.

Host allocates hIOVA -> physical doorbell (pDB) as it normally does for
VFIO device passthrough. IOVA Range is 0x800 - 0x810.

I expose this MIS IOVA range to the guest as an RMR and as a result
guest has a flat mapping for this range. As the physical device is
programmed with hIOVA we have the following mapping:

IOVAIPA  PA
hIOVA   -> hIOVA ->  pDB
S1   s2

This works.

The only weird thing is that I need to expose 256 RMRs due to the
'Single Mapping' mandatory flag. I need to have 1 RMR per potential SID
on the bus.

I will post a new version of SMMUv3 nested stage soon for people to test
& compare. Obviously this removes a bunch of code on both SMMU/VFIO and
QEMU code so I think this solution looks better overall.

Thanks

Eric
> ...
> 
> Without the series the RAID controller initialization fails as
> below,
> 
> ...
> [   12.631117] megaraid_sas :03:00.0: FW supports sync cache: Yes 
>   
> [   12.637360] megaraid_sas :03:00.0: megasas_disable_intr_fusion is 
> called outbound_intr_mask:0x4009  
>  
> [   18.776377] megaraid_sas :03:00.0: Init cmd return status FAILED for 
> SCSI host 0   
>   
> [   23.019383] megaraid_sas :03:00.0: Waiting for FW to come to ready 
> state 
> [  106.684281] megaraid_sas :03:00.0: FW in FAULT state, Fault 
> code:0x1 subcode:0x0 func:megasas_transition_to_ready 
>
> [  106.695186] megaraid_sas :03:00.0: System Register set:
>   
> [  106.889787] megaraid_sas :03:00.0: Failed to transition controller to 
> ready for scsi0.  
>  
> [  106.910475] megaraid_sas :03:00.0: Failed from megasas_init_fw 6407
>   
> estuary:/$
> 
> With the series, now the kernel has direct mapping for the dev as
> below,
> 
> 

Re: [RFC PATCH v2 2/8] ACPI/IORT: Add support for RMR node parsing

2021-04-15 Thread Auger Eric
Hi Shameer,
On 11/19/20 1:11 PM, Shameer Kolothum wrote:
> Add support for parsing RMR node information from ACPI.
> Find associated stream ids and smmu node info from the
> RMR node and populate a linked list with RMR memory
> descriptors.
> 
> Signed-off-by: Shameer Kolothum 
> ---
>  drivers/acpi/arm64/iort.c | 122 +-
>  1 file changed, 121 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
> index 9929ff50c0c0..a9705aa35028 100644
> --- a/drivers/acpi/arm64/iort.c
> +++ b/drivers/acpi/arm64/iort.c
> @@ -40,6 +40,25 @@ struct iort_fwnode {
>  static LIST_HEAD(iort_fwnode_list);
>  static DEFINE_SPINLOCK(iort_fwnode_lock);
>  
> +struct iort_rmr_id {
> + u32  sid;
> + struct acpi_iort_node *smmu;
> +};
> +
> +/*
> + * One entry for IORT RMR.
> + */
> +struct iort_rmr_entry {
> + struct list_head list;
> +
> + unsigned int rmr_ids_num;
> + struct iort_rmr_id *rmr_ids;
> +
> + struct acpi_iort_rmr_desc *rmr_desc;
> +};
> +
> +static LIST_HEAD(iort_rmr_list); /* list of RMR regions from ACPI */
> +
>  /**
>   * iort_set_fwnode() - Create iort_fwnode and use it to register
>   *  iommu data in the iort_fwnode_list
> @@ -393,7 +412,8 @@ static struct acpi_iort_node *iort_node_get_id(struct 
> acpi_iort_node *node,
>   if (node->type == ACPI_IORT_NODE_NAMED_COMPONENT ||
>   node->type == ACPI_IORT_NODE_PCI_ROOT_COMPLEX ||
>   node->type == ACPI_IORT_NODE_SMMU_V3 ||
> - node->type == ACPI_IORT_NODE_PMCG) {
> + node->type == ACPI_IORT_NODE_PMCG ||
> + node->type == ACPI_IORT_NODE_RMR) {
>   *id_out = map->output_base;
>   return parent;
>   }
> @@ -1647,6 +1667,103 @@ static void __init iort_enable_acs(struct 
> acpi_iort_node *iort_node)
>  #else
>  static inline void iort_enable_acs(struct acpi_iort_node *iort_node) { }
>  #endif
> +static int iort_rmr_desc_valid(struct acpi_iort_rmr_desc *desc)
> +{
> + struct iort_rmr_entry *e;
> + u64 end, start = desc->base_address, length = desc->length;
> +
> + if (!IS_ALIGNED(start, SZ_64K) || !IS_ALIGNED(length, SZ_64K))
> + return -EINVAL;
> +
> + end = start + length - 1;
> +
> + /* Check for address overlap */
I don't get this check. What is the problem if you attach the same range
to different stream ids. Shouldn't you check there is no overlap for the
same sid?


> + list_for_each_entry(e, _rmr_list, list) {
> + u64 e_start = e->rmr_desc->base_address;
> + u64 e_end = e_start + e->rmr_desc->length - 1;
> +
> + if (start <= e_end && end >= e_start)
> + return -EINVAL;
> + }
> +
> + return 0;
> +}
> +
> +static int __init iort_parse_rmr(struct acpi_iort_node *iort_node)
> +{
> + struct iort_rmr_id *rmr_ids, *ids;
> + struct iort_rmr_entry *e;
> + struct acpi_iort_rmr *rmr;
> + struct acpi_iort_rmr_desc *rmr_desc;
> + u32 map_count = iort_node->mapping_count;
> + int i, ret = 0, desc_count = 0;
> +
> + if (iort_node->type != ACPI_IORT_NODE_RMR)
> + return 0;
> +
> + if (!iort_node->mapping_offset || !map_count) {
> + pr_err(FW_BUG "Invalid ID mapping, skipping RMR node %p\n",
> +iort_node);
> + return -EINVAL;
> + }
> +
> + rmr_ids = kmalloc(sizeof(*rmr_ids) * map_count, GFP_KERNEL);
> + if (!rmr_ids)
> + return -ENOMEM;
> +
> + /* Retrieve associated smmu and stream id */
> + ids = rmr_ids;
nit: do you need both rmr_ids and ids?
> + for (i = 0; i < map_count; i++, ids++) {
> + ids->smmu = iort_node_get_id(iort_node, >sid, i);
> + if (!ids->smmu) {
> + pr_err(FW_BUG "Invalid SMMU reference, skipping RMR 
> node %p\n",
> +iort_node);
> + ret = -EINVAL;
> + goto out;
> + }
> + }
> +
> + /* Retrieve RMR data */
> + rmr = (struct acpi_iort_rmr *)iort_node->node_data;
> + if (!rmr->rmr_offset || !rmr->rmr_count) {
> + pr_err(FW_BUG "Invalid RMR descriptor array, skipping RMR node 
> %p\n",
> +iort_node);
> + ret = -EINVAL;
> + goto out;
> + }
> +
> + rmr_desc = ACPI_ADD_PTR(struct acpi_iort_rmr_desc, iort_node,
> + rmr->rmr_offset);
> +
> + for (i = 0; i < rmr->rmr_count; i++, rmr_desc++) {
> + ret = iort_rmr_desc_valid(rmr_desc);
> + if (ret) {
> + pr_err(FW_BUG "Invalid RMR descriptor[%d] for node %p, 
> skipping...\n",
> +i, iort_node);
> + goto out;
so I understand you skip the whole node and not just that rmr desc,
otherwise you would continue. so in 

Re: [PATCH 2/2] iommu/amd: Remove performance counter pre-initialization test

2021-04-15 Thread Suthikulpanit, Suravee

David,

On 4/14/2021 10:33 PM, David Coe wrote:

Hi Suravee!

I've re-run your revert+update patch on Ubuntu's latest kernel 5.11.0-14 partly 
to check my mailer's 'mangling' hadn't also reached the code!

There are 3 sets of results in the attachment, all for the Ryzen 2400G. The 
as-distributed kernel already incorporates your IOMMU RFCv3 patch.

A. As-distributed kernel (cold boot)
    >5 retries, so no IOMMU read/write capability, no amd_iommu events.

B. As-distributed kernel (warm boot)
    <5 retries, amd_iommu running stats show large numbers as before.

C. Revert+Update kernel
    amd_iommu events listed and also show large hit/miss numbers.

In due course, I'll load the new (revert+update) kernel on the 4700G but won't 
overload your mail-box unless something unusual turns up.

Best regards,



For the Ryzen 2400G, could you please try with:
- 1 event at a time
- Not more than 8 events (On your system, it has 2 banks x 4 counters/bank.
I am trying to see if this issue might be related to the counters multiplexing).

Thanks,
Suravee
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [Resend RFC PATCH V2 11/12] HV/Netvsc: Add Isolation VM support for netvsc driver

2021-04-15 Thread Tianyu Lan




On 4/14/2021 11:50 PM, Christoph Hellwig wrote:

+struct dma_range {
+   dma_addr_t dma;
+   u32 mapping_size;
+};


That's a rather generic name that is bound to create a conflict sooner
or later.


Good point. Will update.




  #include "hyperv_net.h"
  #include "netvsc_trace.h"
+#include "../../hv/hyperv_vmbus.h"


Please move public interfaces out of the private header rather than doing
this.


OK. Will update.




+   if (hv_isolation_type_snp()) {
+   area = get_vm_area(buf_size, VM_IOREMAP);


Err, no.  get_vm_area is private a for a reason.


+   if (!area)
+   goto cleanup;
+
+   vaddr = (unsigned long)area->addr;
+   for (i = 0; i < buf_size / HV_HYP_PAGE_SIZE; i++) {
+   extra_phys = (virt_to_hvpfn(net_device->recv_buf + i * 
HV_HYP_PAGE_SIZE)
+   << HV_HYP_PAGE_SHIFT) + 
ms_hyperv.shared_gpa_boundary;
+   ret |= ioremap_page_range(vaddr + i * HV_HYP_PAGE_SIZE,
+  vaddr + (i + 1) * HV_HYP_PAGE_SIZE,
+  extra_phys, PAGE_KERNEL_IO);
+   }
+
+   if (ret)
+   goto cleanup;


And this is not something a driver should ever do.  I think you are badly
reimplementing functionality that should be in the dma coherent allocator
here.


OK. I will try hiding these in the Hyper-V dma ops callback. Thanks.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [Resend RFC PATCH V2 04/12] HV: Add Write/Read MSR registers via ghcb

2021-04-15 Thread Tianyu Lan

On 4/14/2021 11:41 PM, Christoph Hellwig wrote:

+EXPORT_SYMBOL_GPL(hv_ghcb_msr_write);


Just curious, who is going to use all these exports?  These seems like
extremely low-level functionality.  Isn't there a way to build a more
useful higher level API?



Yes, will remove it.

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [Resend RFC PATCH V2 03/12] x86/Hyper-V: Add new hvcall guest address host visibility support

2021-04-15 Thread Tianyu Lan

Hi Christoph:
Thanks for your review.

On 4/14/2021 11:40 PM, Christoph Hellwig wrote:

+/*
+ * hv_set_mem_host_visibility - Set host visibility for specified memory.
+ */


I don't think this comment really clarifies anything over the function
name.  What is 'host visibility'


OK. Will update the comment.




+int hv_set_mem_host_visibility(void *kbuffer, u32 size, u32 visibility)


Should size be a size_t?
Should visibility be an enum of some kind?



Will update.


+int hv_mark_gpa_visibility(u16 count, const u64 pfn[], u32 visibility)


Not sure what this does either.


Will add a comment.




+   local_irq_save(flags);
+   input_pcpu = (struct hv_input_modify_sparse_gpa_page_host_visibility **)


Is there a chance we could find a shorter but still descriptive
name for this variable?  Why do we need the cast?


Sure. The cast is to avoid build error due to "incompatible-pointer-types"



+#define VMBUS_PAGE_VISIBLE_READ_ONLY HV_MAP_GPA_READABLE
+#define VMBUS_PAGE_VISIBLE_READ_WRITE (HV_MAP_GPA_READABLE|HV_MAP_GPA_WRITABLE)


pointlessly overlong line.


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 01/12] iommu: Introduce dirty log tracking framework

2021-04-15 Thread Keqian Zhu



On 2021/4/15 15:03, Lu Baolu wrote:
> On 4/15/21 2:18 PM, Keqian Zhu wrote:
>> Hi Baolu,
>>
>> Thanks for the review!
>>
>> On 2021/4/14 15:00, Lu Baolu wrote:
>>> Hi Keqian,
>>>
>>> On 4/13/21 4:54 PM, Keqian Zhu wrote:
 Some types of IOMMU are capable of tracking DMA dirty log, such as
 ARM SMMU with HTTU or Intel IOMMU with SLADE. This introduces the
 dirty log tracking framework in the IOMMU base layer.

 Three new essential interfaces are added, and we maintaince the status
 of dirty log tracking in iommu_domain.
 1. iommu_switch_dirty_log: Perform actions to start|stop dirty log tracking
 2. iommu_sync_dirty_log: Sync dirty log from IOMMU into a dirty bitmap
 3. iommu_clear_dirty_log: Clear dirty log of IOMMU by a mask bitmap

 A new dev feature are added to indicate whether a specific type of
 iommu hardware supports and its driver realizes them.

 Signed-off-by: Keqian Zhu 
 Signed-off-by: Kunkun Jiang 
 ---
drivers/iommu/iommu.c | 150 ++
include/linux/iommu.h |  53 +++
2 files changed, 203 insertions(+)

 diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
 index d0b0a15dba84..667b2d6d2fc0 100644
 --- a/drivers/iommu/iommu.c
 +++ b/drivers/iommu/iommu.c
 @@ -1922,6 +1922,7 @@ static struct iommu_domain 
 *__iommu_domain_alloc(struct bus_type *bus,
domain->type = type;
/* Assume all sizes by default; the driver may override this later 
 */
domain->pgsize_bitmap  = bus->iommu_ops->pgsize_bitmap;
 +mutex_init(>switch_log_lock);
  return domain;
}
 @@ -2720,6 +2721,155 @@ int iommu_domain_set_attr(struct iommu_domain 
 *domain,
}
EXPORT_SYMBOL_GPL(iommu_domain_set_attr);
+int iommu_switch_dirty_log(struct iommu_domain *domain, bool enable,
 +   unsigned long iova, size_t size, int prot)
 +{
 +const struct iommu_ops *ops = domain->ops;
 +int ret;
 +
 +if (unlikely(!ops || !ops->switch_dirty_log))
 +return -ENODEV;
 +
 +mutex_lock(>switch_log_lock);
 +if (enable && domain->dirty_log_tracking) {
 +ret = -EBUSY;
 +goto out;
 +} else if (!enable && !domain->dirty_log_tracking) {
 +ret = -EINVAL;
 +goto out;
 +}
 +
 +ret = ops->switch_dirty_log(domain, enable, iova, size, prot);
 +if (ret)
 +goto out;
 +
 +domain->dirty_log_tracking = enable;
 +out:
 +mutex_unlock(>switch_log_lock);
 +return ret;
 +}
 +EXPORT_SYMBOL_GPL(iommu_switch_dirty_log);
>>>
>>> Since you also added IOMMU_DEV_FEAT_HWDBM, I am wondering what's the
>>> difference between
>>>
>>> iommu_switch_dirty_log(on) vs. 
>>> iommu_dev_enable_feature(IOMMU_DEV_FEAT_HWDBM)
>>>
>>> iommu_switch_dirty_log(off) vs. 
>>> iommu_dev_disable_feature(IOMMU_DEV_FEAT_HWDBM)
>> Indeed. As I can see, IOMMU_DEV_FEAT_AUX is not switchable, so enable/disable
>> are not applicable for it. IOMMU_DEV_FEAT_SVA is switchable, so we can use 
>> these
>> interfaces for it.
>>
>> IOMMU_DEV_FEAT_HWDBM is used to indicate whether hardware supports HWDBM, so 
>> we should
> 
> Previously we had iommu_dev_has_feature() and then was cleaned up due to
> lack of real users. If you have a real case for it, why not bringing it
> back?
Yep, good suggestion.

> 
>> design it as not switchable. I will modify the commit message of patch#12, 
>> thanks!
> 
> I am not sure that I fully get your point. But I can't see any gaps of
> using iommu_dev_enable/disable_feature() to switch dirty log on and off.
> Probably I missed anything.
IOMMU_DEV_FEAT_HWDBM just tells user whether underlying IOMMU driver supports
dirty tracking, it is not used to management the status of dirty log tracking.

The feature reporting is per device, but the status management is per 
iommu_domain.
Only when all devices in a domain support HWDBM, we can start dirty log for the 
domain.

And I think we'd better not mix the feature reporting and status management. 
Thoughts?

> 
>>
>>>
 +
 +int iommu_sync_dirty_log(struct iommu_domain *domain, unsigned long iova,
 + size_t size, unsigned long *bitmap,
 + unsigned long base_iova, unsigned long bitmap_pgshift)
 +{
 +const struct iommu_ops *ops = domain->ops;
 +unsigned int min_pagesz;
 +size_t pgsize;
 +int ret = 0;
 +
 +if (unlikely(!ops || !ops->sync_dirty_log))
 +return -ENODEV;
 +
 +min_pagesz = 1 << __ffs(domain->pgsize_bitmap);
 +if (!IS_ALIGNED(iova | size, min_pagesz)) {
 +pr_err("unaligned: iova 0x%lx size 0x%zx min_pagesz 0x%x\n",
 +   iova, size, min_pagesz);
 +return -EINVAL;
 +}
 +
 +

Re: [RFC PATCH v2 1/8] ACPICA: IORT: Update for revision E

2021-04-15 Thread Auger Eric
Hi Shameer,

On 11/19/20 1:11 PM, Shameer Kolothum wrote:
> IORT revision E contains a few additions like,
>     -Added an identifier field in the node descriptors to aid table
>      cross-referencing.
>     -Introduced the Reserved Memory Range(RMR) node. This is used
>      to describe memory ranges that are used by endpoints and requires
>      a unity mapping in SMMU.
> -Introduced a flag in the RC node to express support for PRI.
> 
> Signed-off-by: Shameer Kolothum 
> ---
>  include/acpi/actbl2.h | 25 +++--
>  1 file changed, 19 insertions(+), 6 deletions(-)
> 
> diff --git a/include/acpi/actbl2.h b/include/acpi/actbl2.h
> index ec66779cb193..274fce7b5c01 100644
> --- a/include/acpi/actbl2.h
> +++ b/include/acpi/actbl2.h
> @@ -68,7 +68,7 @@
>   * IORT - IO Remapping Table
>   *
>   * Conforms to "IO Remapping Table System Software on ARM Platforms",
> - * Document number: ARM DEN 0049D, March 2018
> + * Document number: ARM DEN 0049E, June 2020
>   *
>   
> **/
>  
> @@ -86,7 +86,8 @@ struct acpi_iort_node {
>   u8 type;
>   u16 length;
>   u8 revision;
> - u32 reserved;
> + u16 reserved;
> + u16 identifier;
>   u32 mapping_count;
>   u32 mapping_offset;
>   char node_data[1];
> @@ -100,7 +101,8 @@ enum acpi_iort_node_type {
>   ACPI_IORT_NODE_PCI_ROOT_COMPLEX = 0x02,
>   ACPI_IORT_NODE_SMMU = 0x03,
>   ACPI_IORT_NODE_SMMU_V3 = 0x04,
> - ACPI_IORT_NODE_PMCG = 0x05
> + ACPI_IORT_NODE_PMCG = 0x05,
> + ACPI_IORT_NODE_RMR = 0x06,
>  };
>  
>  struct acpi_iort_id_mapping {
> @@ -167,10 +169,10 @@ struct acpi_iort_root_complex {
>   u8 reserved[3]; /* Reserved, must be zero */
>  };
>  
> -/* Values for ats_attribute field above */
> +/* Masks for ats_attribute field above */
>  
> -#define ACPI_IORT_ATS_SUPPORTED 0x0001   /* The root complex 
> supports ATS */
> -#define ACPI_IORT_ATS_UNSUPPORTED   0x   /* The root complex 
> doesn't support ATS */
> +#define ACPI_IORT_ATS_SUPPORTED (1)  /* The root complex supports 
> ATS */
> +#define ACPI_IORT_PRI_SUPPORTED (1<<1)   /* The root complex 
> supports PRI */
>  
>  struct acpi_iort_smmu {
>   u64 base_address;   /* SMMU base address */
> @@ -241,6 +243,17 @@ struct acpi_iort_pmcg {
>   u64 page1_base_address;
>  };
>  
> +struct acpi_iort_rmr {
so indeed in E.b there is a new field here.
u32 flags
> + u32 rmr_count;
> + u32 rmr_offset;
> +};
> +
> +struct acpi_iort_rmr_desc {
> + u64 base_address;
> + u64 length;
> + u32 reserved;
> +};
> +
>  
> /***
>   *
>   * IVRS - I/O Virtualization Reporting Structure
> 
Thanks

Eric

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v3 01/12] iommu: Introduce dirty log tracking framework

2021-04-15 Thread Lu Baolu

On 4/15/21 2:18 PM, Keqian Zhu wrote:

Hi Baolu,

Thanks for the review!

On 2021/4/14 15:00, Lu Baolu wrote:

Hi Keqian,

On 4/13/21 4:54 PM, Keqian Zhu wrote:

Some types of IOMMU are capable of tracking DMA dirty log, such as
ARM SMMU with HTTU or Intel IOMMU with SLADE. This introduces the
dirty log tracking framework in the IOMMU base layer.

Three new essential interfaces are added, and we maintaince the status
of dirty log tracking in iommu_domain.
1. iommu_switch_dirty_log: Perform actions to start|stop dirty log tracking
2. iommu_sync_dirty_log: Sync dirty log from IOMMU into a dirty bitmap
3. iommu_clear_dirty_log: Clear dirty log of IOMMU by a mask bitmap

A new dev feature are added to indicate whether a specific type of
iommu hardware supports and its driver realizes them.

Signed-off-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
   drivers/iommu/iommu.c | 150 ++
   include/linux/iommu.h |  53 +++
   2 files changed, 203 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index d0b0a15dba84..667b2d6d2fc0 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1922,6 +1922,7 @@ static struct iommu_domain *__iommu_domain_alloc(struct 
bus_type *bus,
   domain->type = type;
   /* Assume all sizes by default; the driver may override this later */
   domain->pgsize_bitmap  = bus->iommu_ops->pgsize_bitmap;
+mutex_init(>switch_log_lock);
 return domain;
   }
@@ -2720,6 +2721,155 @@ int iommu_domain_set_attr(struct iommu_domain *domain,
   }
   EXPORT_SYMBOL_GPL(iommu_domain_set_attr);
   +int iommu_switch_dirty_log(struct iommu_domain *domain, bool enable,
+   unsigned long iova, size_t size, int prot)
+{
+const struct iommu_ops *ops = domain->ops;
+int ret;
+
+if (unlikely(!ops || !ops->switch_dirty_log))
+return -ENODEV;
+
+mutex_lock(>switch_log_lock);
+if (enable && domain->dirty_log_tracking) {
+ret = -EBUSY;
+goto out;
+} else if (!enable && !domain->dirty_log_tracking) {
+ret = -EINVAL;
+goto out;
+}
+
+ret = ops->switch_dirty_log(domain, enable, iova, size, prot);
+if (ret)
+goto out;
+
+domain->dirty_log_tracking = enable;
+out:
+mutex_unlock(>switch_log_lock);
+return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_switch_dirty_log);


Since you also added IOMMU_DEV_FEAT_HWDBM, I am wondering what's the
difference between

iommu_switch_dirty_log(on) vs. iommu_dev_enable_feature(IOMMU_DEV_FEAT_HWDBM)

iommu_switch_dirty_log(off) vs. iommu_dev_disable_feature(IOMMU_DEV_FEAT_HWDBM)

Indeed. As I can see, IOMMU_DEV_FEAT_AUX is not switchable, so enable/disable
are not applicable for it. IOMMU_DEV_FEAT_SVA is switchable, so we can use these
interfaces for it.

IOMMU_DEV_FEAT_HWDBM is used to indicate whether hardware supports HWDBM, so we 
should


Previously we had iommu_dev_has_feature() and then was cleaned up due to
lack of real users. If you have a real case for it, why not bringing it
back?


design it as not switchable. I will modify the commit message of patch#12, 
thanks!


I am not sure that I fully get your point. But I can't see any gaps of
using iommu_dev_enable/disable_feature() to switch dirty log on and off.
Probably I missed anything.






+
+int iommu_sync_dirty_log(struct iommu_domain *domain, unsigned long iova,
+ size_t size, unsigned long *bitmap,
+ unsigned long base_iova, unsigned long bitmap_pgshift)
+{
+const struct iommu_ops *ops = domain->ops;
+unsigned int min_pagesz;
+size_t pgsize;
+int ret = 0;
+
+if (unlikely(!ops || !ops->sync_dirty_log))
+return -ENODEV;
+
+min_pagesz = 1 << __ffs(domain->pgsize_bitmap);
+if (!IS_ALIGNED(iova | size, min_pagesz)) {
+pr_err("unaligned: iova 0x%lx size 0x%zx min_pagesz 0x%x\n",
+   iova, size, min_pagesz);
+return -EINVAL;
+}
+
+mutex_lock(>switch_log_lock);
+if (!domain->dirty_log_tracking) {
+ret = -EINVAL;
+goto out;
+}
+
+while (size) {
+pgsize = iommu_pgsize(domain, iova, size);
+
+ret = ops->sync_dirty_log(domain, iova, pgsize,
+  bitmap, base_iova, bitmap_pgshift);


Any reason why do you want to do this in a per-4K page manner? This can
lead to a lot of indirect calls and bad performance.

How about a sync_dirty_pages()?

The function name of iommu_pgsize() is a bit puzzling. Actually it will try to
compute the max size that fit into size, so the pgsize can be a large page size
even if the underlying mapping is 4K. The __iommu_unmap() also has a similar 
logic.


This series has some improvement on the iommu_pgsize() helper.

https://lore.kernel.org/linux-iommu/20210405191112.28192-1-isa...@codeaurora.org/

Best regards,
baolu
___
iommu mailing list
iommu@lists.linux-foundation.org

Re: [PATCH v2 2/2] iommu/sva: Remove mm parameter from SVA bind API

2021-04-15 Thread Christoph Hellwig
>   *
>   * Returns 0 on success and < 0 on error.
> @@ -28,6 +28,9 @@ int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t 
> min, ioasid_t max)
>   int ret = 0;
>   ioasid_t pasid;
>  
> + if (mm != current->mm)
> + return -EINVAL;
> +

Why not remove the parameter entirely?

> @@ -2989,8 +2990,11 @@ iommu_sva_bind_device(struct device *dev, struct 
> mm_struct *mm, unsigned int fla
>   return ERR_PTR(-ENODEV);
>  
>   /* Supervisor SVA does not need the current mm */
> - if ((flags & IOMMU_SVA_BIND_SUPERVISOR) && mm)
> - return ERR_PTR(-EINVAL);
> + if (!(flags & IOMMU_SVA_BIND_SUPERVISOR)) {
> + mm = get_task_mm(current);
> + if (!mm)
> + return ERR_PTR(-EINVAL);
> + }

I don't see why we need the reference.  I think we should just stop
passing the mm to ->sva_bind and let the low-level driver deal with
any reference to current->mm where needed.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 1/2] iommu/sva: Tighten SVA bind API with explicit flags

2021-04-15 Thread Christoph Hellwig
On Wed, Apr 14, 2021 at 08:27:56AM -0700, Jacob Pan wrote:
>  static int idxd_enable_system_pasid(struct idxd_device *idxd)
>  {
> - int flags;
> + unsigned int flags;
>   unsigned int pasid;
>   struct iommu_sva *sva;
>  
> - flags = SVM_FLAG_SUPERVISOR_MODE;
> + flags = IOMMU_SVA_BIND_SUPERVISOR;
>  
> - sva = iommu_sva_bind_device(>pdev->dev, NULL, );
> + sva = iommu_sva_bind_device(>pdev->dev, NULL, flags);

Please also remove the now pointless flags variable.

> +iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, unsigned int 
> flags)

Pleae avoid the pointless overly long line.

> -#define SVM_FLAG_GUEST_PASID (1<<3)
> +#define SVM_FLAG_GUEST_PASID (1<<2)

This flag is entirely unused, please just remove it in a prep patch
rather than renumbering it.

>  static inline struct iommu_sva *
> -iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void 
> *drvdata)
> +iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, unsigned int 
> flags)

Same overy long line here.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 01/12] iommu: Introduce dirty log tracking framework

2021-04-15 Thread Keqian Zhu
Hi Baolu,

Thanks for the review!

On 2021/4/14 15:00, Lu Baolu wrote:
> Hi Keqian,
> 
> On 4/13/21 4:54 PM, Keqian Zhu wrote:
>> Some types of IOMMU are capable of tracking DMA dirty log, such as
>> ARM SMMU with HTTU or Intel IOMMU with SLADE. This introduces the
>> dirty log tracking framework in the IOMMU base layer.
>>
>> Three new essential interfaces are added, and we maintaince the status
>> of dirty log tracking in iommu_domain.
>> 1. iommu_switch_dirty_log: Perform actions to start|stop dirty log tracking
>> 2. iommu_sync_dirty_log: Sync dirty log from IOMMU into a dirty bitmap
>> 3. iommu_clear_dirty_log: Clear dirty log of IOMMU by a mask bitmap
>>
>> A new dev feature are added to indicate whether a specific type of
>> iommu hardware supports and its driver realizes them.
>>
>> Signed-off-by: Keqian Zhu 
>> Signed-off-by: Kunkun Jiang 
>> ---
>>   drivers/iommu/iommu.c | 150 ++
>>   include/linux/iommu.h |  53 +++
>>   2 files changed, 203 insertions(+)
>>
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index d0b0a15dba84..667b2d6d2fc0 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -1922,6 +1922,7 @@ static struct iommu_domain 
>> *__iommu_domain_alloc(struct bus_type *bus,
>>   domain->type = type;
>>   /* Assume all sizes by default; the driver may override this later */
>>   domain->pgsize_bitmap  = bus->iommu_ops->pgsize_bitmap;
>> +mutex_init(>switch_log_lock);
>> return domain;
>>   }
>> @@ -2720,6 +2721,155 @@ int iommu_domain_set_attr(struct iommu_domain 
>> *domain,
>>   }
>>   EXPORT_SYMBOL_GPL(iommu_domain_set_attr);
>>   +int iommu_switch_dirty_log(struct iommu_domain *domain, bool enable,
>> +   unsigned long iova, size_t size, int prot)
>> +{
>> +const struct iommu_ops *ops = domain->ops;
>> +int ret;
>> +
>> +if (unlikely(!ops || !ops->switch_dirty_log))
>> +return -ENODEV;
>> +
>> +mutex_lock(>switch_log_lock);
>> +if (enable && domain->dirty_log_tracking) {
>> +ret = -EBUSY;
>> +goto out;
>> +} else if (!enable && !domain->dirty_log_tracking) {
>> +ret = -EINVAL;
>> +goto out;
>> +}
>> +
>> +ret = ops->switch_dirty_log(domain, enable, iova, size, prot);
>> +if (ret)
>> +goto out;
>> +
>> +domain->dirty_log_tracking = enable;
>> +out:
>> +mutex_unlock(>switch_log_lock);
>> +return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(iommu_switch_dirty_log);
> 
> Since you also added IOMMU_DEV_FEAT_HWDBM, I am wondering what's the
> difference between
> 
> iommu_switch_dirty_log(on) vs. iommu_dev_enable_feature(IOMMU_DEV_FEAT_HWDBM)
> 
> iommu_switch_dirty_log(off) vs. 
> iommu_dev_disable_feature(IOMMU_DEV_FEAT_HWDBM)
Indeed. As I can see, IOMMU_DEV_FEAT_AUX is not switchable, so enable/disable
are not applicable for it. IOMMU_DEV_FEAT_SVA is switchable, so we can use these
interfaces for it.

IOMMU_DEV_FEAT_HWDBM is used to indicate whether hardware supports HWDBM, so we 
should
design it as not switchable. I will modify the commit message of patch#12, 
thanks!

> 
>> +
>> +int iommu_sync_dirty_log(struct iommu_domain *domain, unsigned long iova,
>> + size_t size, unsigned long *bitmap,
>> + unsigned long base_iova, unsigned long bitmap_pgshift)
>> +{
>> +const struct iommu_ops *ops = domain->ops;
>> +unsigned int min_pagesz;
>> +size_t pgsize;
>> +int ret = 0;
>> +
>> +if (unlikely(!ops || !ops->sync_dirty_log))
>> +return -ENODEV;
>> +
>> +min_pagesz = 1 << __ffs(domain->pgsize_bitmap);
>> +if (!IS_ALIGNED(iova | size, min_pagesz)) {
>> +pr_err("unaligned: iova 0x%lx size 0x%zx min_pagesz 0x%x\n",
>> +   iova, size, min_pagesz);
>> +return -EINVAL;
>> +}
>> +
>> +mutex_lock(>switch_log_lock);
>> +if (!domain->dirty_log_tracking) {
>> +ret = -EINVAL;
>> +goto out;
>> +}
>> +
>> +while (size) {
>> +pgsize = iommu_pgsize(domain, iova, size);
>> +
>> +ret = ops->sync_dirty_log(domain, iova, pgsize,
>> +  bitmap, base_iova, bitmap_pgshift);
> 
> Any reason why do you want to do this in a per-4K page manner? This can
> lead to a lot of indirect calls and bad performance.
> 
> How about a sync_dirty_pages()?
The function name of iommu_pgsize() is a bit puzzling. Actually it will try to
compute the max size that fit into size, so the pgsize can be a large page size
even if the underlying mapping is 4K. The __iommu_unmap() also has a similar 
logic.

BRs,
Keqian
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu