Re: [RFC PATCH 3/3] pseries/iommu: Enable DDW for VFIO TCE create

2024-03-25 Thread Shivaprasad G Bhat

Hi Michael,

On 3/13/24 18:23, Michael Ellerman wrote:

Hi Shivaprasad,

Shivaprasad G Bhat  writes:

The commit 9d67c9433509 ("powerpc/iommu: Add \"borrowing\"
iommu_table_group_ops") implemented the "borrow" mechanism for
the pSeries SPAPR TCE. It did implement this support partially
that it left out creating the DDW if not present already.

The patch here attempts to fix the missing gaps.
  - Expose the DDW info to user by collecting it during probe.
  - Create the window and the iommu table if not present during
VFIO_SPAPR_TCE_CREATE.
  - Remove and recreate the window if the pageshift and window sizes
do not match.
  - Restore the original window in enable_ddw() if the user had
created/modified the DDW. As there is preference for DIRECT mapping
on the host driver side, the user created window is removed.

The changes work only for the non-SRIOV-VF scenarios for PEs having
2 DMA windows.

This crashes on powernv.



Thanks for pointing this out.  I will take care of this in v2 of this RFC.


Regards,

Shivaprasad



Re: [RFC PATCH 3/3] pseries/iommu: Enable DDW for VFIO TCE create

2024-03-13 Thread Michael Ellerman
Hi Shivaprasad,

Shivaprasad G Bhat  writes:
> The commit 9d67c9433509 ("powerpc/iommu: Add \"borrowing\"
> iommu_table_group_ops") implemented the "borrow" mechanism for
> the pSeries SPAPR TCE. It did implement this support partially
> that it left out creating the DDW if not present already.
>
> The patch here attempts to fix the missing gaps.
>  - Expose the DDW info to user by collecting it during probe.
>  - Create the window and the iommu table if not present during
>VFIO_SPAPR_TCE_CREATE.
>  - Remove and recreate the window if the pageshift and window sizes
>do not match.
>  - Restore the original window in enable_ddw() if the user had
>created/modified the DDW. As there is preference for DIRECT mapping
>on the host driver side, the user created window is removed.
>
> The changes work only for the non-SRIOV-VF scenarios for PEs having
> 2 DMA windows.

This crashes on powernv.

Full log at 
https://github.com/linuxppc/linux-snowpatch/actions/runs/8253875566/job/22577897225.

[0.958561][T1] pci_bus 0002:01: Configuring PE for bus
[0.959699][T1] pci 0002:01 : [PE# fd] Secondary bus 
0x0001 associated with PE#fd
[0.961692][T1] pci 0002:01:00.0: Configured PE#fd
[0.962424][T1] pci 0002:01 : [PE# fd] Setting up 32-bit TCE table 
at 0..8000
[0.966424][T1] IOMMU table initialized, virtual merging enabled
[0.967544][T1] pci 0002:01 : [PE# fd] Setting up window#0 
0.. pg=1
[0.969362][T1] pci 0002:01 : [PE# fd] Enabling 64-bit DMA bypass
[0.971386][T1] pci 0002:01:00.0: Adding to iommu group 0
[0.973481][T1] BUG: Unable to handle kernel instruction fetch (NULL 
pointer?)
[0.974388][T1] Faulting instruction address: 0x
[0.975578][T1] Oops: Kernel access of bad area, sig: 11 [#1]
[0.976476][T1] LE PAGE_SIZE=64K MMU=Hash SMP ERROR: Error: saw 
oops/warning etc. while expecting NR_CPUS=2048 NUMA PowerNV
[0.97][T1] Modules linked in:
[0.978570][T1] CPU: 1 PID: 1 Comm: swapper/1 Not tainted 
6.8.0-rc6-g80dcb4e6d0aa #1
[0.979766][T1] Hardware name: IBM PowerNV (emulated by qemu) POWER8 
0x4d0200 opal:v6.8-104-g820d43c0 PowerNV
[0.981197][T1] NIP:   LR: c005653c CTR: 

[0.982221][T1] REGS: c3687420 TRAP: 0480   Not tainted  
(6.8.0-rc6-g80dcb4e6d0aa)
[0.983400][T1] MSR:  90009033   CR: 
44004422  XER: 
[0.984742][T1] CFAR: c0056538 IRQMASK: 0 
[0.984742][T1] GPR00: c0056520 c36876c0 
c15b9800 c363ae58 
[0.984742][T1] GPR04: c352f0a0 c26d4748 
0001  
[0.984742][T1] GPR08:  c2716668 
0003 8000 
[0.984742][T1] GPR12:  c2be 
c00110cc  
[0.984742][T1] GPR16:   
  
[0.984742][T1] GPR20:   
 0001 
[0.984742][T1] GPR24: c14681d8  
c3068a00 0001 
[0.984742][T1] GPR28: c3068a00  
c363ae58 c352f0a0 
[0.994647][T1] NIP [] 0x0
[0.995699][T1] LR [c005653c] 
spapr_tce_platform_iommu_attach_dev+0x74/0xc8
[0.997399][T1] Call Trace:
[0.997897][T1] [c36876c0] [c0056514] 
spapr_tce_platform_iommu_attach_dev+0x4c/0xc8 (unreliable)
[0.999383][T1] [c3687700] [c0b383dc] 
__iommu_attach_device+0x44/0xfc
[1.000476][T1] [c3687730] [c0b38574] 
__iommu_device_set_domain+0xe0/0x170
[1.001728][T1] [c36877c0] [c0b3869c] 
__iommu_group_set_domain_internal+0x98/0x1c0
[1.003014][T1] [c3687820] [c0b3bb10] 
iommu_setup_default_domain+0x544/0x650
[1.004306][T1] [c36878e0] [c0b3d3b4] 
__iommu_probe_device+0x5b0/0x604
[1.005500][T1] [c3687950] [c0b3d454] 
iommu_probe_device+0x4c/0xb0
[1.006563][T1] [c3687990] [c005648c] 
iommu_add_device+0x3c/0x78
[1.007590][T1] [c36879b0] [c00db920] 
pnv_pci_ioda_dma_dev_setup+0x168/0x73c
[1.008918][T1] [c3687a60] [c00729f4] 
pcibios_bus_add_device+0x80/0x328
[1.010077][T1] [c3687ac0] [c0a49fa0] 
pci_bus_add_device+0x30/0x11c
[1.011169][T1] [c3687b30] [c0a4a0e4] 
pci_bus_add_devices+0x58/0xb4
[1.012230][T1] [c3687b70] [c0a4a118] 
pci_bus_add_devices+0x8c/0xb4
[1.013301][T1] [c3687bb0] [c201a3c8] 
pcibios_init+0xd8/0x140
[1.014314][T1] [c3687c30] [c0010d58] 
do_one_initcall+0x80/0x2f8
[

[RFC PATCH 3/3] pseries/iommu: Enable DDW for VFIO TCE create

2024-03-12 Thread Shivaprasad G Bhat
The commit 9d67c9433509 ("powerpc/iommu: Add \"borrowing\"
iommu_table_group_ops") implemented the "borrow" mechanism for
the pSeries SPAPR TCE. It did implement this support partially
that it left out creating the DDW if not present already.

The patch here attempts to fix the missing gaps.
 - Expose the DDW info to user by collecting it during probe.
 - Create the window and the iommu table if not present during
   VFIO_SPAPR_TCE_CREATE.
 - Remove and recreate the window if the pageshift and window sizes
   do not match.
 - Restore the original window in enable_ddw() if the user had
   created/modified the DDW. As there is preference for DIRECT mapping
   on the host driver side, the user created window is removed.

The changes work only for the non-SRIOV-VF scenarios for PEs having
2 DMA windows.

Signed-off-by: Shivaprasad G Bhat 
---
 arch/powerpc/include/asm/iommu.h   |3 
 arch/powerpc/kernel/iommu.c|7 -
 arch/powerpc/platforms/pseries/iommu.c |  362 +++-
 3 files changed, 360 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 744cc5fc22d3..fde174122844 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -110,6 +110,7 @@ struct iommu_table {
unsigned long  it_page_shift;/* table iommu page size */
struct list_head it_group_list;/* List of iommu_table_group_link */
__be64 *it_userspace; /* userspace view of the table */
+   bool reset_ddw;
struct iommu_table_ops *it_ops;
struct krefit_kref;
int it_nid;
@@ -169,6 +170,8 @@ struct iommu_table_group_ops {
__u32 page_shift,
__u64 window_size,
__u32 levels);
+   void (*init_group)(struct iommu_table_group *table_group,
+   struct device *dev);
long (*create_table)(struct iommu_table_group *table_group,
int num,
__u32 page_shift,
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index aa11b2acf24f..1cce2b8b8f2c 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -740,6 +740,7 @@ struct iommu_table *iommu_init_table(struct iommu_table 
*tbl, int nid,
return NULL;
}
 
+   tbl->it_nid = nid;
iommu_table_reserve_pages(tbl, res_start, res_end);
 
/* We only split the IOMMU table if we have 1GB or more of space */
@@ -1141,7 +1142,10 @@ spapr_tce_platform_iommu_attach_dev(struct iommu_domain 
*platform_domain,
 {
struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
struct iommu_group *grp = iommu_group_get(dev);
-   struct iommu_table_group *table_group;
+   struct iommu_table_group *table_group = iommu_group_get_iommudata(grp);
+
+   /* This should have been in spapr_tce_iommu_probe_device() ?*/
+   table_group->ops->init_group(table_group, dev);
 
/* At first attach the ownership is already set */
if (!domain) {
@@ -1149,7 +1153,6 @@ spapr_tce_platform_iommu_attach_dev(struct iommu_domain 
*platform_domain,
return 0;
}
 
-   table_group = iommu_group_get_iommudata(grp);
/*
 * The domain being set to PLATFORM from earlier
 * BLOCKED. The table_group ownership has to be released.
diff --git a/arch/powerpc/platforms/pseries/iommu.c 
b/arch/powerpc/platforms/pseries/iommu.c
index 3d9865dadf73..7224107a0f60 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -630,6 +630,62 @@ static void iommu_table_setparms(struct pci_controller 
*phb,
phb->dma_window_base_cur += phb->dma_window_size;
 }
 
+static int iommu_table_reset(struct iommu_table *tbl, unsigned long busno,
+  unsigned long liobn, unsigned long win_addr,
+  unsigned long window_size, unsigned long 
page_shift,
+  void *base, struct iommu_table_ops 
*table_ops)
+{
+   unsigned long sz = BITS_TO_LONGS(tbl->it_size) * sizeof(unsigned long);
+   unsigned int i, oldsize = tbl->it_size;
+   struct iommu_pool *p;
+
+   WARN_ON(!tbl->it_ops);
+
+   if (oldsize != (window_size >> page_shift)) {
+   vfree(tbl->it_map);
+
+   tbl->it_map = vzalloc_node(sz, tbl->it_nid);
+   if (!tbl->it_map)
+   return -ENOMEM;
+
+   tbl->it_size = window_size >> page_shift;
+   if (oldsize < (window_size >> page_shift))
+   iommu_table_clear(tbl);
+   }
+   tbl->it_busno = busno;
+   tbl->it_index = liobn;
+   tbl->it_offset = win_addr >> page_shift;
+   tbl->it_blocksize = 16;
+   tbl->it_type = TCE_PCI;
+   tbl->it_ops = table_ops;
+   tbl->it_page_shift = page_shift;
+