[PATCH kernel v7 31/31] vfio: powerpc/spapr: Support Dynamic DMA windows
This adds create/remove window ioctls to create and remove DMA windows. sPAPR defines a Dynamic DMA windows capability which allows para-virtualized guests to create additional DMA windows on a PCI bus. The existing linux kernels use this new window to map the entire guest memory and switch to the direct DMA operations saving time on map/unmap requests which would normally happen in a big amounts. This adds 2 ioctl handlers - VFIO_IOMMU_SPAPR_TCE_CREATE and VFIO_IOMMU_SPAPR_TCE_REMOVE - to create and remove windows. Up to 2 windows are supported now by the hardware and by this driver. This changes VFIO_IOMMU_SPAPR_TCE_GET_INFO handler to return additional information such as a number of supported windows and maximum number levels of TCE tables. DDW is added as a capability, not as a SPAPR TCE IOMMU v2 unique feature as we still want to support v2 on platforms which cannot do DDW for the sake of TCE acceleration in KVM (coming soon). Signed-off-by: Alexey Kardashevskiy --- Changes: v7: * s/VFIO_IOMMU_INFO_DDW/VFIO_IOMMU_SPAPR_INFO_DDW/ * fixed typos in and updated vfio.txt * fixed VFIO_IOMMU_SPAPR_TCE_GET_INFO handler * moved ddw properties to vfio_iommu_spapr_tce_ddw_info v6: * added explicit VFIO_IOMMU_INFO_DDW flag to vfio_iommu_spapr_tce_info, it used to be page mask flags from platform code * added explicit pgsizes field * added cleanup if tce_iommu_create_window() failed in a middle * added checks for callbacks in tce_iommu_create_window and remove those from tce_iommu_remove_window when it is too late to test anyway * spapr_tce_find_free_table returns sensible error code now * updated description of VFIO_IOMMU_SPAPR_TCE_CREATE/ VFIO_IOMMU_SPAPR_TCE_REMOVE v4: * moved code to tce_iommu_create_window()/tce_iommu_remove_window() helpers * added docs --- Documentation/vfio.txt | 19 arch/powerpc/include/asm/iommu.h| 2 +- drivers/vfio/vfio_iommu_spapr_tce.c | 196 +++- include/uapi/linux/vfio.h | 61 ++- 4 files changed, 273 insertions(+), 5 deletions(-) diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt index 7dcf2b5..8b1ec51 100644 --- a/Documentation/vfio.txt +++ b/Documentation/vfio.txt @@ -452,6 +452,25 @@ address is from pre-registered range. This separation helps in optimizing DMA for guests. +6) sPAPR specification allows guests to have an additional DMA window(s) on +a PCI bus with a variable page size. Two ioctls have been added to support +this: VFIO_IOMMU_SPAPR_TCE_CREATE and VFIO_IOMMU_SPAPR_TCE_REMOVE. +The platform has to support the functionality or error will be returned to +the userspace. The existing hardware supports up to 2 DMA windows, one is +2GB long, uses 4K pages and called "default 32bit window"; the other can +be as big as entire RAM, use different page size, it is optional - guests +create those in run-time if the guest driver supports 64bit DMA. + +VFIO_IOMMU_SPAPR_TCE_CREATE receives a page shift, a DMA window size and +a number of TCE table levels (if a TCE table is going to be big enough and +the kernel may not be able to allocate enough of physically contiguous memory). +It creates a new window in the available slot and returns the bus address where +the new window starts. Due to hardware limitation, the user space cannot choose +the location of DMA windows. + +VFIO_IOMMU_SPAPR_TCE_REMOVE receives the bus start address of the window +and removes it. + --- [1] VFIO was originally an acronym for "Virtual Function I/O" in its diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h index 9027b9e..1db774c0 100644 --- a/arch/powerpc/include/asm/iommu.h +++ b/arch/powerpc/include/asm/iommu.h @@ -147,7 +147,7 @@ extern void iommu_free_table(struct iommu_table *tbl, const char *node_name); extern struct iommu_table *iommu_init_table(struct iommu_table * tbl, int nid); -#define IOMMU_TABLE_GROUP_MAX_TABLES 1 +#define IOMMU_TABLE_GROUP_MAX_TABLES 2 struct iommu_table_group; diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c index 8cbd239..225af37 100644 --- a/drivers/vfio/vfio_iommu_spapr_tce.c +++ b/drivers/vfio/vfio_iommu_spapr_tce.c @@ -535,6 +535,20 @@ static long tce_iommu_build_v2(struct tce_container *container, return ret; } +static int spapr_tce_find_free_table(struct tce_container *container) +{ + int i; + + for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) { + struct iommu_table *tbl = >tables[i]; + + if (!tbl->it_size) + return i; + } + + return -ENOSPC; +} + static long tce_iommu_create_table(struct iommu_table_group *table_group, int num, __u32 page_shift, @@ -573,11 +587,114 @@ static void tce_iommu_free_table(struct iommu_table *tbl)
[PATCH kernel v7 31/31] vfio: powerpc/spapr: Support Dynamic DMA windows
This adds create/remove window ioctls to create and remove DMA windows. sPAPR defines a Dynamic DMA windows capability which allows para-virtualized guests to create additional DMA windows on a PCI bus. The existing linux kernels use this new window to map the entire guest memory and switch to the direct DMA operations saving time on map/unmap requests which would normally happen in a big amounts. This adds 2 ioctl handlers - VFIO_IOMMU_SPAPR_TCE_CREATE and VFIO_IOMMU_SPAPR_TCE_REMOVE - to create and remove windows. Up to 2 windows are supported now by the hardware and by this driver. This changes VFIO_IOMMU_SPAPR_TCE_GET_INFO handler to return additional information such as a number of supported windows and maximum number levels of TCE tables. DDW is added as a capability, not as a SPAPR TCE IOMMU v2 unique feature as we still want to support v2 on platforms which cannot do DDW for the sake of TCE acceleration in KVM (coming soon). Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru --- Changes: v7: * s/VFIO_IOMMU_INFO_DDW/VFIO_IOMMU_SPAPR_INFO_DDW/ * fixed typos in and updated vfio.txt * fixed VFIO_IOMMU_SPAPR_TCE_GET_INFO handler * moved ddw properties to vfio_iommu_spapr_tce_ddw_info v6: * added explicit VFIO_IOMMU_INFO_DDW flag to vfio_iommu_spapr_tce_info, it used to be page mask flags from platform code * added explicit pgsizes field * added cleanup if tce_iommu_create_window() failed in a middle * added checks for callbacks in tce_iommu_create_window and remove those from tce_iommu_remove_window when it is too late to test anyway * spapr_tce_find_free_table returns sensible error code now * updated description of VFIO_IOMMU_SPAPR_TCE_CREATE/ VFIO_IOMMU_SPAPR_TCE_REMOVE v4: * moved code to tce_iommu_create_window()/tce_iommu_remove_window() helpers * added docs --- Documentation/vfio.txt | 19 arch/powerpc/include/asm/iommu.h| 2 +- drivers/vfio/vfio_iommu_spapr_tce.c | 196 +++- include/uapi/linux/vfio.h | 61 ++- 4 files changed, 273 insertions(+), 5 deletions(-) diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt index 7dcf2b5..8b1ec51 100644 --- a/Documentation/vfio.txt +++ b/Documentation/vfio.txt @@ -452,6 +452,25 @@ address is from pre-registered range. This separation helps in optimizing DMA for guests. +6) sPAPR specification allows guests to have an additional DMA window(s) on +a PCI bus with a variable page size. Two ioctls have been added to support +this: VFIO_IOMMU_SPAPR_TCE_CREATE and VFIO_IOMMU_SPAPR_TCE_REMOVE. +The platform has to support the functionality or error will be returned to +the userspace. The existing hardware supports up to 2 DMA windows, one is +2GB long, uses 4K pages and called default 32bit window; the other can +be as big as entire RAM, use different page size, it is optional - guests +create those in run-time if the guest driver supports 64bit DMA. + +VFIO_IOMMU_SPAPR_TCE_CREATE receives a page shift, a DMA window size and +a number of TCE table levels (if a TCE table is going to be big enough and +the kernel may not be able to allocate enough of physically contiguous memory). +It creates a new window in the available slot and returns the bus address where +the new window starts. Due to hardware limitation, the user space cannot choose +the location of DMA windows. + +VFIO_IOMMU_SPAPR_TCE_REMOVE receives the bus start address of the window +and removes it. + --- [1] VFIO was originally an acronym for Virtual Function I/O in its diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h index 9027b9e..1db774c0 100644 --- a/arch/powerpc/include/asm/iommu.h +++ b/arch/powerpc/include/asm/iommu.h @@ -147,7 +147,7 @@ extern void iommu_free_table(struct iommu_table *tbl, const char *node_name); extern struct iommu_table *iommu_init_table(struct iommu_table * tbl, int nid); -#define IOMMU_TABLE_GROUP_MAX_TABLES 1 +#define IOMMU_TABLE_GROUP_MAX_TABLES 2 struct iommu_table_group; diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c index 8cbd239..225af37 100644 --- a/drivers/vfio/vfio_iommu_spapr_tce.c +++ b/drivers/vfio/vfio_iommu_spapr_tce.c @@ -535,6 +535,20 @@ static long tce_iommu_build_v2(struct tce_container *container, return ret; } +static int spapr_tce_find_free_table(struct tce_container *container) +{ + int i; + + for (i = 0; i IOMMU_TABLE_GROUP_MAX_TABLES; ++i) { + struct iommu_table *tbl = container-tables[i]; + + if (!tbl-it_size) + return i; + } + + return -ENOSPC; +} + static long tce_iommu_create_table(struct iommu_table_group *table_group, int num, __u32 page_shift, @@ -573,11 +587,114 @@ static void tce_iommu_free_table(struct