Re: [PATCH v6 00/11] DDW + Indirect Mapping
On Tue, 2021-08-31 at 13:39 -0700, David Christensen wrote:
> > This series allows Indirect DMA using DDW when available, which
> > usually means bigger pagesizes and more TCEs, and so more DMA space.
>
> How is the mapping method selected?  LPAR creation via the HMC, Linux
> kernel load parameter, or some other method?

At device/bus probe, if there is enough DMA space available for Direct
DMA, then it's used. If not, it uses Indirect DMA.

> The hcall overhead doesn't seem too worrisome when mapping 1GB pages so
> the Indirect DMA method might be best in my situation (DPDK).

Well, it depends on usage. I mean, the recommended use of the IOMMU is
to map, transmit and then unmap, but this will vary with the
implementation of the driver.

If, for example, there is some reuse of the DMA mapping, as in a
previous patchset I sent (IOMMU Pagecache), then the hcall overhead can
be reduced drastically.

> Dave

Best regards,
Leonardo
Re: [PATCH v6 00/11] DDW + Indirect Mapping
On 8/31/21 1:18 PM, Leonardo Brás wrote:
> Hello David,
>
> Sorry for the delay, I did not get your mail because I was not CC'd in
> your reply (you sent the mail just to the mailing list).
>
> Replies below:
>
> On Mon, 2021-08-30 at 10:48 -0700, David Christensen wrote:
> > On 8/16/21 11:39 PM, Leonardo Bras wrote:
> > > So far it's assumed possible to map the guest RAM 1:1 to the bus,
> > > which works with a small number of devices. SRIOV changes it as the
> > > user can configure hundreds VFs and since phyp preallocates TCEs
> > > and does not allow IOMMU pages bigger than 64K, it has to limit the
> > > number of TCEs per a PE to limit waste of physical pages.
> > >
> > > As of today, if the assumed direct mapping is not possible, DDW
> > > creation is skipped and the default DMA window "ibm,dma-window" is
> > > used instead.
> > >
> > > Using the DDW instead of the default DMA window may allow to expand
> > > the amount of memory that can be DMA-mapped, given the number of
> > > pages (TCEs) may stay the same (or increase) and the default DMA
> > > window offers only 4k-pages while DDW may offer larger pages (4k,
> > > 64k, 16M ...).
> >
> > So if I'm reading this correctly, VFIO applications requiring
> > hugepage DMA mappings (e.g. 16M or 2GB) can be supported on an LPAR
> > or DLPAR after this change, is that correct?
>
> Different DDW IOMMU page sizes were already supported in Linux (4k,
> 64k, 16M) for a while now, and the remaining page sizes in LoPAR were
> enabled in the following patch:
> http://patchwork.ozlabs.org/project/linuxppc-dev/patch/20210408201915.174217-1-leobra...@gmail.com/
> (commit 472724111f0f72042deb6a9dcee9578e5398a1a1)
>
> The thing is there are two ways of using DMA:
>
> - Direct DMA, mapping the whole memory space of the host, which
> requires a lot of DMA space if the guest memory is huge. This already
> supports DDW and allows using the bigger pagesizes. This happens on
> device/bus probe.
>
> - Indirect DMA with IOMMU, mapping memory regions on demand, and
> un-mapping after use. This requires much less DMA space, but causes an
> overhead because an hcall is necessary for mapping and un-mapping.
>
> Before this series, Indirect DMA was only possible with the 'default
> DMA window', which allows using only 4k pages.
>
> This series allows Indirect DMA using DDW when available, which
> usually means bigger pagesizes and more TCEs, and so more DMA space.

How is the mapping method selected?  LPAR creation via the HMC, Linux
kernel load parameter, or some other method?

The hcall overhead doesn't seem too worrisome when mapping 1GB pages so
the Indirect DMA method might be best in my situation (DPDK).

Dave
Re: [PATCH v6 00/11] DDW + Indirect Mapping
Hello David,

Sorry for the delay, I did not get your mail because I was not CC'd in
your reply (you sent the mail just to the mailing list).

Replies below:

On Mon, 2021-08-30 at 10:48 -0700, David Christensen wrote:
> On 8/16/21 11:39 PM, Leonardo Bras wrote:
> > So far it's assumed possible to map the guest RAM 1:1 to the bus,
> > which works with a small number of devices. SRIOV changes it as the
> > user can configure hundreds VFs and since phyp preallocates TCEs and
> > does not allow IOMMU pages bigger than 64K, it has to limit the
> > number of TCEs per a PE to limit waste of physical pages.
> >
> > As of today, if the assumed direct mapping is not possible, DDW
> > creation is skipped and the default DMA window "ibm,dma-window" is
> > used instead.
> >
> > Using the DDW instead of the default DMA window may allow to expand
> > the amount of memory that can be DMA-mapped, given the number of
> > pages (TCEs) may stay the same (or increase) and the default DMA
> > window offers only 4k-pages while DDW may offer larger pages (4k,
> > 64k, 16M ...).
>
> So if I'm reading this correctly, VFIO applications requiring hugepage
> DMA mappings (e.g. 16M or 2GB) can be supported on an LPAR or DLPAR
> after this change, is that correct?

Different DDW IOMMU page sizes were already supported in Linux (4k, 64k,
16M) for a while now, and the remaining page sizes in LoPAR were enabled
in the following patch:
http://patchwork.ozlabs.org/project/linuxppc-dev/patch/20210408201915.174217-1-leobra...@gmail.com/
(commit 472724111f0f72042deb6a9dcee9578e5398a1a1)

The thing is there are two ways of using DMA:

- Direct DMA, mapping the whole memory space of the host, which requires
a lot of DMA space if the guest memory is huge. This already supports
DDW and allows using the bigger pagesizes. This happens on device/bus
probe.

- Indirect DMA with IOMMU, mapping memory regions on demand, and
un-mapping after use. This requires much less DMA space, but causes an
overhead because an hcall is necessary for mapping and un-mapping.

Before this series, Indirect DMA was only possible with the 'default DMA
window', which allows using only 4k pages.

This series allows Indirect DMA using DDW when available, which usually
means bigger pagesizes and more TCEs, and so more DMA space.

tl;dr: this patchset means you can have more DMA space in Indirect DMA,
because you are using DDW instead of the default DMA window.

> Any limitations based on processor or pHyp revision levels?

The IOMMU page size will be limited by the sizes offered by the
processor and the hypervisor. They are announced in the "IO Page Sizes"
output of ibm,query-pe-dma-window, and the biggest pagesize is now
selected automatically since commit
472724111f0f72042deb6a9dcee9578e5398a1a1, mentioned above.

Hope this helps, please let me know if there is any remaining question.

Best regards,
Leonardo
Re: [PATCH v6 00/11] DDW + Indirect Mapping
On Tue, 17 Aug 2021 03:39:18 -0300, Leonardo Bras wrote:
> So far it's assumed possible to map the guest RAM 1:1 to the bus, which
> works with a small number of devices. SRIOV changes it as the user can
> configure hundreds VFs and since phyp preallocates TCEs and does not
> allow IOMMU pages bigger than 64K, it has to limit the number of TCEs
> per a PE to limit waste of physical pages.
>
> As of today, if the assumed direct mapping is not possible, DDW creation
> is skipped and the default DMA window "ibm,dma-window" is used instead.
>
> [...]

Applied to powerpc/next.

[01/11] powerpc/pseries/iommu: Replace hard-coded page shift
        https://git.kernel.org/powerpc/c/0c634bafe3bbee7a36dca7f1277057e05bf14d91
[02/11] powerpc/kernel/iommu: Add new iommu_table_in_use() helper
        https://git.kernel.org/powerpc/c/3c33066a21903076722a2881556a92aa3cd7d359
[03/11] powerpc/pseries/iommu: Add iommu_pseries_alloc_table() helper
        https://git.kernel.org/powerpc/c/4ff8677a0b192a58d998d1d34fc5168203041a24
[04/11] powerpc/pseries/iommu: Add ddw_list_new_entry() helper
        https://git.kernel.org/powerpc/c/92a23219299cedde52e3298788484f4875d5ce0f
[05/11] powerpc/pseries/iommu: Allow DDW windows starting at 0x00
        https://git.kernel.org/powerpc/c/2ca73c54ce24489518a56d816331b774044c2445
[06/11] powerpc/pseries/iommu: Add ddw_property_create() and refactor enable_ddw()
        https://git.kernel.org/powerpc/c/7ed2ed2db2685a285cb09ab330dc4efea0b64022
[07/11] powerpc/pseries/iommu: Reorganize iommu_table_setparms*() with new helper
        https://git.kernel.org/powerpc/c/fc8cba8f989fb98e496b33a78476861e246c42a0
[08/11] powerpc/pseries/iommu: Update remove_dma_window() to accept property name
        https://git.kernel.org/powerpc/c/a5fd95120c653962a9e75e260a35436b96d2c991
[09/11] powerpc/pseries/iommu: Find existing DDW with given property name
        https://git.kernel.org/powerpc/c/8599395d34f2dd7b77bef42da1d99798e7a3d58f
[10/11] powerpc/pseries/iommu: Make use of DDW for indirect mapping
        https://git.kernel.org/powerpc/c/381ceda88c4c4c8345cad1cffa6328892f15dca6
[11/11] powerpc/pseries/iommu: Rename "direct window" to "dma window"
        https://git.kernel.org/powerpc/c/57dbbe590f152e5e8a3ff8bf5ba163df34eeae0b

cheers
Re: [PATCH v6 00/11] DDW + Indirect Mapping
On 8/16/21 11:39 PM, Leonardo Bras wrote:
> So far it's assumed possible to map the guest RAM 1:1 to the bus, which
> works with a small number of devices. SRIOV changes it as the user can
> configure hundreds VFs and since phyp preallocates TCEs and does not
> allow IOMMU pages bigger than 64K, it has to limit the number of TCEs
> per a PE to limit waste of physical pages.
>
> As of today, if the assumed direct mapping is not possible, DDW creation
> is skipped and the default DMA window "ibm,dma-window" is used instead.
>
> Using the DDW instead of the default DMA window may allow to expand the
> amount of memory that can be DMA-mapped, given the number of pages
> (TCEs) may stay the same (or increase) and the default DMA window offers
> only 4k-pages while DDW may offer larger pages (4k, 64k, 16M ...).

So if I'm reading this correctly, VFIO applications requiring hugepage
DMA mappings (e.g. 16M or 2GB) can be supported on an LPAR or DLPAR
after this change, is that correct?  Any limitations based on processor
or pHyp revision levels?

Dave
[PATCH v6 00/11] DDW + Indirect Mapping
So far it's assumed possible to map the guest RAM 1:1 to the bus, which
works with a small number of devices. SRIOV changes it as the user can
configure hundreds VFs and since phyp preallocates TCEs and does not
allow IOMMU pages bigger than 64K, it has to limit the number of TCEs
per a PE to limit waste of physical pages.

As of today, if the assumed direct mapping is not possible, DDW creation
is skipped and the default DMA window "ibm,dma-window" is used instead.

Using the DDW instead of the default DMA window may allow to expand the
amount of memory that can be DMA-mapped, given the number of pages
(TCEs) may stay the same (or increase) and the default DMA window offers
only 4k-pages while DDW may offer larger pages (4k, 64k, 16M ...).

Patch #1 replaces the hard-coded 4K page size with a variable containing
the correct page size for the window.

Patch #2 introduces iommu_table_in_use(), and replaces manual bit-field
checking where it's used. It will be used for aborting enable_ddw() if
there is any current iommu allocation and we are trying single-window
indirect mapping.

Patch #3 introduces iommu_pseries_alloc_table(), which will be helpful
when indirect mapping needs to replace the iommu_table.

Patch #4 adds helpers for adding DDWs to the list.

Patch #5 refactors enable_ddw() so it returns whether direct mapping is
possible, instead of the DMA offset. It helps the next patches on
indirect DMA mapping and also allows DMA windows starting at 0x00.

Patch #6 brings a new helper to simplify enable_ddw(), allowing some
reorganization for introducing indirect mapping DDW.

Patch #7 adds a new helper _iommu_table_setparms() and uses it in the
other *setparms*() to fill the iommu_table. It will also be used for
creating a new iommu_table for indirect mapping.

Patch #8 updates remove_dma_window() to accept different property names,
so we can introduce a new property for indirect mapping.

Patch #9 extracts find_existing_ddw_windows() into
find_existing_ddw_windows_named(), and calls it by its property name.
This will be useful when the property for indirect mapping is created,
so we can search the device-tree for both properties.

Patch #10: Instead of destroying the created DDW if it doesn't map the
whole partition, make use of it instead of the default DMA window, as it
improves performance. Also, update the iommu_table and re-generate the
pools. It introduces a new property name for DDW with indirect DMA
mapping.

Patch #11: Does some renaming of 'direct window' to 'dma window', given
the DDW created can now also be used in indirect mapping if direct
mapping is not available.

All patches were tested in an LPAR with a virtio-net interface that
allows the default DMA window and DDW to coexist.

Changes since v5:
- Reviews from Frederic Barrat
- 02/11 : memset bitmap only if tbl not in use
- 06/11 : remove_ddw() is not used in enable_ddw() error path anymore.
  New helpers were created for that.
- 10/11 : There was a typo, but it got replaced due to the 06/11 fix.
v5 Link: http://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=253799&state=%2A&archive=both

Changes since v4:
- Solve conflicts with new upstream versions
- Avoid unnecessary code moving by doing variable declaration before
  definition
- Rename _iommu_table_setparms to iommu_table_setparms_common and change
  the base parameter from unsigned long to void* in order to avoid
  unnecessary casting
- Fix breaking case for existing direct mapping
- Fix IORESOURCE_MEM bound issue
- Move new tbl to pci->table_group->tables[1] instead of replacing [0]
v4 Link: https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=241597&state=%2A&archive=both

Changes since v3:
- Fixed inverted free order at ddw_property_create()
- Updated goto tag naming
v3 Link: https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=240287&state=%2A&archive=both

Changes since v2:
- Some patches got removed from the series and sent by themselves
- New tbl created for DDW + indirect mapping reserves MMIO32 space
- Improved reserved area algorithm
- Improved commit messages
- Removed define for default DMA window prop name
- Avoided some unnecessary renaming
- Removed some unnecessary empty lines
- Changed some code moving to forward declarations
v2 Link: http://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=201210&state=%2A&archive=both

Leonardo Bras (11):
  powerpc/pseries/iommu: Replace hard-coded page shift
  powerpc/kernel/iommu: Add new iommu_table_in_use() helper
  powerpc/pseries/iommu: Add iommu_pseries_alloc_table() helper
  powerpc/pseries/iommu: Add ddw_list_new_entry() helper
  powerpc/pseries/iommu: Allow DDW windows starting at 0x00
  powerpc/pseries/iommu: Add ddw_property_create() and refactor enable_ddw()
  powerpc/pseries/iommu: Reorganize iommu_table_setparms*() with new helper
  powerpc/pseries/iommu: Update remove_dma_window() to accept property name
  powerpc/pseries/iommu: Find existing DDW with given property name
  powerpc/pseries/iommu: Ma