Re: [PATCH v4 1/3] dt-bindings: memory: mediatek: Convert SMI to DT
On Mon, 2 Nov 2020 at 06:31, Yong Wu wrote:
>
> On Sat, 2020-10-31 at 12:36 +0100, Krzysztof Kozlowski wrote:
> > On Fri, Oct 30, 2020 at 05:12:52PM +0800, Yong Wu wrote:
> > > Convert MediaTek SMI to DT schema.
> > >
> > > CC: Fabien Parent
> > > CC: Ming-Fan Chen
> > > CC: Matthias Brugger
> > > Signed-off-by: Yong Wu
> > > ---
> > >  .../mediatek,smi-common.txt                   |  50 ---
> > >  .../mediatek,smi-common.yaml                  | 140 ++
> > >  .../memory-controllers/mediatek,smi-larb.txt  |  50 ---
> > >  .../memory-controllers/mediatek,smi-larb.yaml | 129
> > >  4 files changed, 269 insertions(+), 100 deletions(-)
> > >  delete mode 100644 Documentation/devicetree/bindings/memory-controllers/mediatek,smi-common.txt
> >
> > +Cc Honghui Zhang,
>
> As comment [1], Honghui's address is not valid now. I will act for him.
>
> >
> > Your Ack is needed as you contributed descriptions to the bindings and
> > work is being relicensed to GPL-2.0-only OR BSD-2-Clause.
>
> "GPL-2.0-only OR BSD-2-Clause" is required when we run check-patch.
>
> If I still use "GPL-2.0-only", then the contributors' Ack/SoB is not
> needed, right?

That would be one solution but I was thinking to proceed with only your
agreement. You were the main contributor to these files. Honghui added a
few descriptions. Other developers added only compatibles. Since we
cannot reach Honghui, I would assume that your agreement (Sign-off) is
enough.

Best regards,
Krzysztof
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v5 2/2] firmware: QCOM_SCM: Allow qcom_scm driver to be loadable as a permanent module
+ ath10k list

John Stultz writes:

> Allow the qcom_scm driver to be loadable as a permanent module.
>
> This still uses the "depends on QCOM_SCM || !QCOM_SCM" bit to
> ensure that drivers that call into the qcom_scm driver are
> also built as modules. While not ideal in some cases, it's the
> only safe way I can find to avoid build errors without having
> those drivers select QCOM_SCM and have to force it on (as
> QCOM_SCM=n can be valid for those drivers).
>
> Cc: Catalin Marinas
> Cc: Will Deacon
> Cc: Andy Gross
> Cc: Bjorn Andersson
> Cc: Joerg Roedel
> Cc: Thomas Gleixner
> Cc: Jason Cooper
> Cc: Marc Zyngier
> Cc: Linus Walleij
> Cc: Vinod Koul
> Cc: Kalle Valo
> Cc: Maulik Shah
> Cc: Lina Iyer
> Cc: Saravana Kannan
> Cc: Todd Kjos
> Cc: Greg Kroah-Hartman
> Cc: linux-arm-...@vger.kernel.org
> Cc: iommu@lists.linux-foundation.org
> Cc: linux-g...@vger.kernel.org
> Acked-by: Greg Kroah-Hartman
> Signed-off-by: John Stultz
> ---
> v3:
> * Fix __arm_smccc_smc build issue reported by
>   kernel test robot
> v4:
> * Add "depends on QCOM_SCM || !QCOM_SCM" bit to ath10k
>   config that requires it.
> v5:
> * Fix QCOM_QCM typo in Kconfig, it should be QCOM_SCM
> ---
>  drivers/firmware/Kconfig                | 4 ++--
>  drivers/firmware/Makefile               | 3 ++-
>  drivers/firmware/qcom_scm.c             | 4
>  drivers/iommu/Kconfig                   | 2 ++
>  drivers/net/wireless/ath/ath10k/Kconfig | 1 +
>  5 files changed, 11 insertions(+), 3 deletions(-)

For ath10k part:

Acked-by: Kalle Valo

--
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
Re: [GIT PULL] dma-mapping fix for 5.10
On Sat, Oct 31, 2020 at 12:50:44PM -0700, Linus Torvalds wrote:
> So this is just a stylistic nit, and has no impact on this pull (which
> I've done). But looking at the patch, it triggers one of my "this is
> wrong" patterns.

Adding the author and maintainer of that code so that they can sort it
out.

>
> In particular, this:
>
>         u64 dma_start = 0;
>         ...
>         for (dma_start = ~0ULL; r->size; r++) {
>
> is actually completely bogus in theory, and it's a horribly horribly
> bad pattern to have.
>
> The thing that I hate about that pattern is "~0ULL", which is simply wrong.
>
> The correct pattern for "all bits set" is ~0. NOTHING ELSE. No extra
> letters at the end.
>
> Why? Because using an unsigned type is wrong, and will not extend the
> bits up to a potentially bigger size.
>
> So adding that "ULL" is not just three extra characters to type, it
> actually _detracts_ from the code and makes it more fragile and
> potentially wrong.
>
> It so happens, that yes, in the kernel, "ull" is 64-bit, and you get
> the right results. So the code _works_. But it's wrong, and it now
> requires that the types match exactly (ie it would be broken if
> somebody ever were to say "I want to use 128-bit dma addresses and
> u128").
>
> Another example is using "~0ul", which would give different results on
> a 32-bit kernel and a 64-bit kernel. Again: DON'T DO THAT.
>
> I repeat: the right thing to do for "all bits set" is just a plain ~0
> or -1. Either of those are fine (technically assumes a 2's complement
> machine, but let's just be honest: that's a perfectly fine assumption,
> and -1 might be preferred by some because it makes that sign extension
> behavior of the integer constant more obvious).
>
> Don't try to do anything clever or anything else, because it's going
> to be strictly worse.
>
> The old code that that patch removed was "technically correct", but
> just pointless, and actually shows the problem:
>
>         for (dma_start = ~(dma_addr_t)0; r->size; r++) {
>
> the above is indeed a correct way to say "I want all bits set in a
> dma_addr_t", but while correct, it is - once again - strictly inferior
> to just using "~0".
>
> Why? Because "~0" works regardless of type. IOW, exactly *because*
> people used the wrong pattern for "all bits set", that patch was now
> (a) bigger than necessary and (b) much more likely to cause bugs (ie I
> could have imagined people changing just the type of the variable
> without changing the initialization).
>
> So in that tiny three-line patch there were actually several examples
> of why "~0" is the right pattern to use for "all bits set". Because it
> JustWorks(tm) in ways other patterns do not.
>
> And if you have a compiler that complains about assigning -1 or ~0 to
> an unsigned variable, get rid of that piece of garbage. You're almost
> certainly either using some warning flag that you shouldn't be using,
> or the compiler writer didn't know what they were doing.
>
>              Linus
---end quoted text---
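The conversion behaviour described in the quoted mail can be seen in a small userspace sketch. The helper names below are hypothetical and exist only for illustration. The `~0U` case stands in for the general problem of an unsigned constant that is narrower than the variable it initializes, and its result assumes a platform where `unsigned int` is 32 bits wide.

```c
#include <stdint.h>

/*
 * Plain ~0 has type int and the value -1. Converting -1 to any unsigned
 * type is defined modulo 2^N, so every bit of the destination is set.
 */
static inline uint64_t all_ones_plain(void)
{
	uint64_t v = ~0;	/* all 64 bits set */
	return v;
}

/*
 * ~0U has type unsigned int. It is zero-extended when stored in a wider
 * type, so the upper bits stay clear (assumes 32-bit unsigned int).
 */
static inline uint64_t all_ones_unsigned_suffix(void)
{
	uint64_t v = ~0U;	/* only the low 32 bits set */
	return v;
}

/* The same plain ~0 also fills a narrower type with no change to the initializer. */
static inline uint16_t all_ones_narrow(void)
{
	uint16_t v = ~0;	/* 0xffff */
	return v;
}
```

The point of the sketch is that the initializer `~0` stays correct when the variable's type changes, whereas a suffixed constant only works while its width happens to match.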
Re: [PATCH v4 1/3] dt-bindings: memory: mediatek: Convert SMI to DT
On Sat, 2020-10-31 at 12:36 +0100, Krzysztof Kozlowski wrote:
> On Fri, Oct 30, 2020 at 05:12:52PM +0800, Yong Wu wrote:
> > Convert MediaTek SMI to DT schema.
> >
> > CC: Fabien Parent
> > CC: Ming-Fan Chen
> > CC: Matthias Brugger
> > Signed-off-by: Yong Wu
> > ---
> >  .../mediatek,smi-common.txt                   |  50 ---
> >  .../mediatek,smi-common.yaml                  | 140 ++
> >  .../memory-controllers/mediatek,smi-larb.txt  |  50 ---
> >  .../memory-controllers/mediatek,smi-larb.yaml | 129
> >  4 files changed, 269 insertions(+), 100 deletions(-)
> >  delete mode 100644 Documentation/devicetree/bindings/memory-controllers/mediatek,smi-common.txt
>
> +Cc Honghui Zhang,

As comment [1], Honghui's address is not valid now. I will act for him.

>
> Your Ack is needed as you contributed descriptions to the bindings and
> work is being relicensed to GPL-2.0-only OR BSD-2-Clause.

"GPL-2.0-only OR BSD-2-Clause" is required when we run check-patch.

If I still use "GPL-2.0-only", then the contributors' Ack/SoB is not
needed, right?

[1]
https://lore.kernel.org/linux-iommu/1604051256.26323.100.camel@mhfsdcap03/T/#u

>
>
> Best regards,
> Krzysztof
>
>
> >  create mode 100644 Documentation/devicetree/bindings/memory-controllers/mediatek,smi-common.yaml
> >  delete mode 100644 Documentation/devicetree/bindings/memory-controllers/mediatek,smi-larb.txt
> >  create mode 100644 Documentation/devicetree/bindings/memory-controllers/mediatek,smi-larb.yaml
> >
> > diff --git a/Documentation/devicetree/bindings/memory-controllers/mediatek,smi-common.txt b/Documentation/devicetree/bindings/memory-controllers/mediatek,smi-common.txt
> > deleted file mode 100644
> > index dbafffe3f41e..
> > --- a/Documentation/devicetree/bindings/memory-controllers/mediatek,smi-common.txt
> > +++ /dev/null
> > @@ -1,50 +0,0 @@
> > -SMI (Smart Multimedia Interface) Common
> > -
> > -The hardware block diagram please check bindings/iommu/mediatek,iommu.txt
> > -
> > -Mediatek SMI have two generations of HW architecture, here is the list
> > -which generation the SoCs use:
> > -generation 1: mt2701 and mt7623.
> > -generation 2: mt2712, mt6779, mt8167, mt8173 and mt8183.
> > -
> > -There's slight differences between the two SMI, for generation 2, the
> > -register which control the iommu port is at each larb's register base. But
> > -for generation 1, the register is at smi ao base(smi always on register
> > -base). Besides that, the smi async clock should be prepared and enabled for
> > -SMI generation 1 to transform the smi clock into emi clock domain, but that is
> > -not needed for SMI generation 2.
> > -
> > -Required properties:
> > -- compatible : must be one of :
> > -	"mediatek,mt2701-smi-common"
> > -	"mediatek,mt2712-smi-common"
> > -	"mediatek,mt6779-smi-common"
> > -	"mediatek,mt7623-smi-common", "mediatek,mt2701-smi-common"
> > -	"mediatek,mt8167-smi-common"
> > -	"mediatek,mt8173-smi-common"
> > -	"mediatek,mt8183-smi-common"
> > -- reg : the register and size of the SMI block.
> > -- power-domains : a phandle to the power domain of this local arbiter.
> > -- clocks : Must contain an entry for each entry in clock-names.
> > -- clock-names : must contain 3 entries for generation 1 smi HW and 2 entries
> > -  for generation 2 smi HW as follows:
> > -  - "apb" : Advanced Peripheral Bus clock, It's the clock for setting
> > -	    the register.
> > -  - "smi" : It's the clock for transfer data and command.
> > -	    They may be the same if both source clocks are the same.
> > -  - "async" : asynchronous clock, it help transform the smi clock into the emi
> > -	      clock domain, this clock is only needed by generation 1 smi HW.
> > -  and these 2 option clocks for generation 2 smi HW:
> > -  - "gals0": the path0 clock of GALS(Global Async Local Sync).
> > -  - "gals1": the path1 clock of GALS(Global Async Local Sync).
> > -  Here is the list which has this GALS: mt6779 and mt8183.
> > -
> > -Example:
> > -	smi_common: smi@14022000 {
> > -		compatible = "mediatek,mt8173-smi-common";
> > -		reg = <0 0x14022000 0 0x1000>;
> > -		power-domains = <&scpsys MT8173_POWER_DOMAIN_MM>;
> > -		clocks = <&mmsys CLK_MM_SMI_COMMON>,
> > -			 <&mmsys CLK_MM_SMI_COMMON>;
> > -		clock-names = "apb", "smi";
> > -	};
> > diff --git a/Documentation/devicetree/bindings/memory-controllers/mediatek,smi-common.yaml b/Documentation/devicetree/bindings/memory-controllers/mediatek,smi-common.yaml
> > new file mode 100644
> > index ..e050a0c2aed6
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/memory-controllers/mediatek,smi-common.yaml
> > @@ -0,0 +1,140 @@
> > +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> > +# Copyright (c) 2020 MediaTek Inc.
> > +%YAML 1.2
> > +---
> > +$id: http
Re: [PATCH v3 00/14] iommu/amd: Add Generic IO Page Table Framework Support
Joerg,

You mentioned to remind you to pull this in to linux-next.

Thanks,
Suravee

On 10/4/20 8:45 AM, Suravee Suthikulpanit wrote:
The framework allows callable implementation of IO page table.
This allows AMD IOMMU driver to switch between different types
of AMD IOMMU page tables (e.g. v1 vs. v2).

This series refactors the current implementation of AMD IOMMU v1 page table
to adopt the framework. There should be no functional change.
Subsequent series will introduce support for the AMD IOMMU v2 page table.

Thanks,
Suravee

Change from V2 (https://lore.kernel.org/lkml/835c0d46-ed96-9fbe-856a-777dcffac...@amd.com/T/#t)
  - Patch 2/14: Introduce helper function io_pgtable_cfg_to_data.
  - Patch 13/14: Put back the struct iommu_flush_ops since patch v2
    would run into NULL pointer bug when calling free_io_pgtable_ops
    if not defined.

Change from V1 (https://lkml.org/lkml/2020/9/23/251)
  - Do not specify struct io_pgtable_cfg.coherent_walk, since it is
    not currently used. (per Robin)
  - Remove unused struct iommu_flush_ops. (patch 2/13)
  - Move amd_iommu_setup_io_pgtable_ops to iommu.c instead of io_pgtable.c
    (patch 13/13)

Suravee Suthikulpanit (14):
  iommu/amd: Re-define amd_iommu_domain_encode_pgtable as inline
  iommu/amd: Prepare for generic IO page table framework
  iommu/amd: Move pt_root to struct amd_io_pgtable
  iommu/amd: Convert to using amd_io_pgtable
  iommu/amd: Declare functions as extern
  iommu/amd: Move IO page table related functions
  iommu/amd: Restructure code for freeing page table
  iommu/amd: Remove amd_iommu_domain_get_pgtable
  iommu/amd: Rename variables to be consistent with struct
    io_pgtable_ops
  iommu/amd: Refactor fetch_pte to use struct amd_io_pgtable
  iommu/amd: Introduce iommu_v1_iova_to_phys
  iommu/amd: Introduce iommu_v1_map_page and iommu_v1_unmap_page
  iommu/amd: Introduce IOMMU flush callbacks
  iommu/amd: Adopt IO page table framework

 drivers/iommu/amd/Kconfig           |   1 +
 drivers/iommu/amd/Makefile          |   2 +-
 drivers/iommu/amd/amd_iommu.h       |  22 +
 drivers/iommu/amd/amd_iommu_types.h |  43 +-
 drivers/iommu/amd/io_pgtable.c      | 564
 drivers/iommu/amd/iommu.c           | 646 +++-
 drivers/iommu/io-pgtable.c          |   3 +
 include/linux/io-pgtable.h          |   2 +
 8 files changed, 691 insertions(+), 592 deletions(-)
 create mode 100644 drivers/iommu/amd/io_pgtable.c
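The idea of a "callable implementation" of a page table, where a driver picks a format and then goes through a table of function pointers, can be illustrated with a toy userspace sketch. Everything below is hypothetical: the `toy_` types and functions are a simplified stand-in for the general shape of an ops-table design, not the kernel's `struct io_pgtable_ops` or the AMD driver code, and only a "v1" format is filled in.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT	12
#define TOY_PAGE_MASK	((uint64_t)0xfff)
#define TOY_V1_ENTRIES	16
#define TOY_PTE_PRESENT	((uint64_t)1)

/* Hypothetical ops table: callers only see these two entry points. */
struct toy_pgtable_ops {
	int (*map)(void *cookie, uint64_t iova, uint64_t paddr);
	uint64_t (*iova_to_phys)(void *cookie, uint64_t iova);
};

enum toy_fmt { TOY_FMT_V1, TOY_FMT_V2 };

/* Toy "v1" format: one flat array of entries indexed by page number. */
struct toy_v1_table {
	uint64_t pte[TOY_V1_ENTRIES];
};

static int toy_v1_map(void *cookie, uint64_t iova, uint64_t paddr)
{
	struct toy_v1_table *t = cookie;
	uint64_t idx = iova >> TOY_PAGE_SHIFT;

	if (idx >= TOY_V1_ENTRIES)
		return -1;
	/* Store the page frame address and mark the entry present. */
	t->pte[idx] = (paddr & ~TOY_PAGE_MASK) | TOY_PTE_PRESENT;
	return 0;
}

static uint64_t toy_v1_iova_to_phys(void *cookie, uint64_t iova)
{
	struct toy_v1_table *t = cookie;
	uint64_t idx = iova >> TOY_PAGE_SHIFT;

	if (idx >= TOY_V1_ENTRIES || !(t->pte[idx] & TOY_PTE_PRESENT))
		return 0;
	/* Page frame from the entry, offset within the page from the iova. */
	return (t->pte[idx] & ~TOY_PAGE_MASK) | (iova & TOY_PAGE_MASK);
}

static const struct toy_pgtable_ops toy_v1_ops = {
	.map = toy_v1_map,
	.iova_to_phys = toy_v1_iova_to_phys,
};

/* Format selection: returns NULL for formats this sketch does not implement. */
static const struct toy_pgtable_ops *toy_get_ops(enum toy_fmt fmt)
{
	return fmt == TOY_FMT_V1 ? &toy_v1_ops : NULL;
}
```

A caller that only holds the ops pointer does not need to change when a second format is added, which is the property the series relies on for a later v2 page table.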
Re: [PATCH v4 0/7] Convert the intel iommu driver to the dma-iommu api
Hi Tvrtko,

On 10/12/20 4:44 PM, Tvrtko Ursulin wrote:

On 29/09/2020 01:11, Lu Baolu wrote:

Hi Tvrtko,

On 9/28/20 5:44 PM, Tvrtko Ursulin wrote:

On 27/09/2020 07:34, Lu Baolu wrote:

Hi,

The previous post of this series could be found here.

https://lore.kernel.org/linux-iommu/20200912032200.11489-1-baolu...@linux.intel.com/

This version introduces a new patch [4/7] to fix an issue reported here.

https://lore.kernel.org/linux-iommu/51a1baec-48d1-c0ac-181b-1fba92aa4...@linux.intel.com/

There aren't any other changes.

Please help to test and review.

Best regards,
baolu

Lu Baolu (3):
  iommu: Add quirk for Intel graphic devices in map_sg

Since I do have patches to fix i915 to handle this, do we want to
co-ordinate the two and avoid having to add this quirk and then later
remove it? Or you want to go the staged approach?

I have no preference. It depends on which patch goes first. Let the
maintainers help here.

FYI we have merged the required i915 patches to our tree last week or
so. I *think* this means they will go into 5.11. So the i915 specific
workaround patch will not be needed in Intel IOMMU.

Do you mind telling me what's the status of this fix patch? I tried this
series on v5.10-rc1 with the graphic quirk patch dropped. I am still
seeing dma faults from graphic device.

Best regards,
baolu

Regards,

Tvrtko
Re: [PATCH v2 1/2] dma-mapping: add benchmark support for streaming DMA APIs
Hi Barry,

I love your patch! Yet something to improve:

[auto build test ERROR on kselftest/next]
[also build test ERROR on linus/master v5.10-rc1]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Barry-Song/dma-mapping-provide-a-benchmark-for-streaming-DMA-mapping/20201101-182009
base:   https://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git next
config: h8300-allyesconfig (attached as .config)
compiler: h8300-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/b9abda38be7f32b9420c27b6c24eff2e69defa87
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Barry-Song/dma-mapping-provide-a-benchmark-for-streaming-DMA-mapping/20201101-182009
        git checkout b9abda38be7f32b9420c27b6c24eff2e69defa87
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=h8300

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot

All errors (new ones prefixed by >>):

   h8300-linux-ld: kernel/dma/map_benchmark.o: in function `.L28':
   map_benchmark.c:(.text+0x283): undefined reference to `__udivdi3'
>> h8300-linux-ld: map_benchmark.c:(.text+0x2c1): undefined reference to `__udivdi3'
   h8300-linux-ld: map_benchmark.c:(.text+0x327): undefined reference to `__udivdi3'
   h8300-linux-ld: kernel/dma/map_benchmark.o: in function `.L26':
   map_benchmark.c:(.text+0x3d7): undefined reference to `__udivdi3'
   h8300-linux-ld: kernel/dma/map_benchmark.o: in function `.L44':
   map_benchmark.c:(.text+0x799): undefined reference to `__divdi3'
   h8300-linux-ld: kernel/dma/map_benchmark.o: in function `.L45':
   map_benchmark.c:(.text+0x7f5): undefined reference to `__divdi3'

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org

.config.gz
Description: application/gzip
Re: [PATCH v2 1/2] dma-mapping: add benchmark support for streaming DMA APIs
Hi Barry,

I love your patch! Yet something to improve:

[auto build test ERROR on kselftest/next]
[also build test ERROR on linus/master v5.10-rc1 next-20201030]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Barry-Song/dma-mapping-provide-a-benchmark-for-streaming-DMA-mapping/20201101-182009
base:   https://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git next
config: mips-allyesconfig (attached as .config)
compiler: mips-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/b9abda38be7f32b9420c27b6c24eff2e69defa87
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Barry-Song/dma-mapping-provide-a-benchmark-for-streaming-DMA-mapping/20201101-182009
        git checkout b9abda38be7f32b9420c27b6c24eff2e69defa87
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=mips

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot

All errors (new ones prefixed by >>):

   arch/mips/kernel/head.o: in function `dtb_found':
   (.ref.text+0xe0): relocation truncated to fit: R_MIPS_26 against `start_kernel'
   init/main.o: in function `set_reset_devices':
   main.c:(.init.text+0x20): relocation truncated to fit: R_MIPS_26 against `_mcount'
   main.c:(.init.text+0x30): relocation truncated to fit: R_MIPS_26 against `__sanitizer_cov_trace_pc'
   init/main.o: in function `debug_kernel':
   main.c:(.init.text+0x9c): relocation truncated to fit: R_MIPS_26 against `_mcount'
   main.c:(.init.text+0xac): relocation truncated to fit: R_MIPS_26 against `__sanitizer_cov_trace_pc'
   init/main.o: in function `quiet_kernel':
   main.c:(.init.text+0x118): relocation truncated to fit: R_MIPS_26 against `_mcount'
   main.c:(.init.text+0x128): relocation truncated to fit: R_MIPS_26 against `__sanitizer_cov_trace_pc'
   init/main.o: in function `init_setup':
   main.c:(.init.text+0x1a4): relocation truncated to fit: R_MIPS_26 against `_mcount'
   main.c:(.init.text+0x1c8): relocation truncated to fit: R_MIPS_26 against `__sanitizer_cov_trace_pc'
   main.c:(.init.text+0x1e8): relocation truncated to fit: R_MIPS_26 against `__sanitizer_cov_trace_pc'
   main.c:(.init.text+0x1fc): additional relocation overflows omitted from the output
   mips-linux-ld: kernel/dma/map_benchmark.o: in function `map_benchmark_thread':
>> map_benchmark.c:(.text.map_benchmark_thread+0x1f4): undefined reference to `__divdi3'
>> mips-linux-ld: map_benchmark.c:(.text.map_benchmark_thread+0x218): undefined reference to `__divdi3'
   mips-linux-ld: kernel/dma/map_benchmark.o: in function `do_map_benchmark':
>> map_benchmark.c:(.text.do_map_benchmark+0x260): undefined reference to `__udivdi3'
>> mips-linux-ld: map_benchmark.c:(.text.do_map_benchmark+0x284): undefined reference to `__udivdi3'
   mips-linux-ld: map_benchmark.c:(.text.do_map_benchmark+0x2b4): undefined reference to `__udivdi3'
   mips-linux-ld: map_benchmark.c:(.text.do_map_benchmark+0x300): undefined reference to `__udivdi3'

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org

.config.gz
Description: application/gzip
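For context on the `__udivdi3` and `__divdi3` errors: on 32-bit targets GCC implements the `/` operator on 64-bit operands as a call into libgcc, which the kernel does not link against. Kernel code normally goes through the helpers declared in `<linux/math64.h>` (for example `div64_u64()` and `div64_s64()`) instead. The userspace sketch below is only an illustration of what such a helper has to do. The names `sketch_div64_u64` and `sketch_average` are hypothetical, and the loop is a plain shift-and-subtract division, not the kernel's implementation.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for a 64-bit division helper: computes
 * dividend / divisor without using the '/' operator on 64-bit operands.
 * Returns 0 when divisor is 0 (a choice made for this sketch only).
 */
static uint64_t sketch_div64_u64(uint64_t dividend, uint64_t divisor)
{
	uint64_t quotient = 0, remainder = 0;
	int bit;

	if (divisor == 0)
		return 0;

	for (bit = 63; bit >= 0; bit--) {
		/*
		 * Before each shift, remainder is at most the value of the
		 * dividend bits consumed so far (at most 63 of them), so
		 * shifting left by one cannot overflow.
		 */
		remainder = (remainder << 1) | ((dividend >> bit) & 1);
		if (remainder >= divisor) {
			remainder -= divisor;
			quotient |= (uint64_t)1 << bit;
		}
	}
	return quotient;
}

/* Average latency from an accumulated sum and a loop count. */
static uint64_t sketch_average(uint64_t sum, uint64_t loops)
{
	return sketch_div64_u64(sum, loops);
}
```

In the benchmark itself the equivalent change would be to replace each 64-bit `/` with the matching math64.h helper. Which helper fits each call site depends on the signedness and width of the operands there.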
[PATCH v2 1/2] dma-mapping: add benchmark support for streaming DMA APIs
Nowadays, there are increasing requirements to benchmark the performance
of dma_map and dma_unmap, particularly while the device is attached to an
IOMMU.

This patch enables the support. Users can run a specified number of
threads to do dma_map_page and dma_unmap_page on a specific NUMA node for
the specified duration. Then dma_map_benchmark will calculate the average
latency for map and unmap.

A difficulty for this benchmark is that dma_map/unmap APIs must run on
a particular device. Each device might have a different backend of IOMMU
or non-IOMMU.

So we use the driver_override to bind dma_map_benchmark to a particular
device by:
For platform devices:
echo dma_map_benchmark > /sys/bus/platform/devices/xxx/driver_override
echo xxx > /sys/bus/platform/drivers/xxx/unbind
echo xxx > /sys/bus/platform/drivers/dma_map_benchmark/bind

For PCI devices:
echo dma_map_benchmark > /sys/bus/pci/devices/:00:01.0/driver_override
echo :00:01.0 > /sys/bus/pci/drivers/xxx/unbind
echo :00:01.0 > /sys/bus/pci/drivers/dma_map_benchmark/bind

Cc: Joerg Roedel
Cc: Will Deacon
Cc: Shuah Khan
Cc: Christoph Hellwig
Cc: Marek Szyprowski
Cc: Robin Murphy
Signed-off-by: Barry Song
---
-v2:
  * add PCI support; v1 supported platform devices only
  * replace ssleep by msleep_interruptible() to permit users to exit
    benchmark before it is completed
  * many changes according to Robin's suggestions, thanks! Robin
    - add standard deviation output to reflect the worst case
    - check users' parameters strictly like the number of threads
    - make cache dirty before dma_map
    - fix unpaired dma_map_page and dma_unmap_single;
    - remove redundant "long long" before ktime_to_ns();
    - use devm_add_action();
    - wakeup all threads together after they are ready

 kernel/dma/Kconfig         |   8 +
 kernel/dma/Makefile        |   1 +
 kernel/dma/map_benchmark.c | 295 +
 3 files changed, 304 insertions(+)
 create mode 100644 kernel/dma/map_benchmark.c

diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index c99de4a21458..949c53da5991 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -225,3 +225,11 @@ config DMA_API_DEBUG_SG
 	  is technically out-of-spec.

 	  If unsure, say N.
+
+config DMA_MAP_BENCHMARK
+	bool "Enable benchmarking of streaming DMA mapping"
+	help
+	  Provides /sys/kernel/debug/dma_map_benchmark that helps with testing
+	  performance of dma_(un)map_page.
+
+	  See tools/testing/selftests/dma/dma_map_benchmark.c
diff --git a/kernel/dma/Makefile b/kernel/dma/Makefile
index dc755ab68aab..7aa6b26b1348 100644
--- a/kernel/dma/Makefile
+++ b/kernel/dma/Makefile
@@ -10,3 +10,4 @@ obj-$(CONFIG_DMA_API_DEBUG)		+= debug.o
 obj-$(CONFIG_SWIOTLB)			+= swiotlb.o
 obj-$(CONFIG_DMA_COHERENT_POOL)	+= pool.o
 obj-$(CONFIG_DMA_REMAP)		+= remap.o
+obj-$(CONFIG_DMA_MAP_BENCHMARK)	+= map_benchmark.o
diff --git a/kernel/dma/map_benchmark.c b/kernel/dma/map_benchmark.c
new file mode 100644
index ..ac397758087b
--- /dev/null
+++ b/kernel/dma/map_benchmark.c
@@ -0,0 +1,295 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2020 Hisilicon Limited.
+ */
+
+#define pr_fmt(fmt)	KBUILD_MODNAME ": " fmt
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#define DMA_MAP_BENCHMARK	_IOWR('d', 1, struct map_benchmark)
+#define DMA_MAP_MAX_THREADS	1024
+#define DMA_MAP_MAX_SECONDS	300
+
+struct map_benchmark {
+	__u64 avg_map_100ns; /* average map latency in 100ns */
+	__u64 map_stddev; /* standard deviation of map latency */
+	__u64 avg_unmap_100ns; /* as above */
+	__u64 unmap_stddev;
+	__u32 threads; /* how many threads will do map/unmap in parallel */
+	__u32 seconds; /* how long the test will last */
+	int node; /* which numa node this benchmark will run on */
+	__u64 expansion[10];	/* For future use */
+};
+
+struct map_benchmark_data {
+	struct map_benchmark bparam;
+	struct device *dev;
+	struct dentry  *debugfs;
+	atomic64_t sum_map_100ns;
+	atomic64_t sum_unmap_100ns;
+	atomic64_t sum_square_map;
+	atomic64_t sum_square_unmap;
+	atomic64_t loops;
+};
+
+static int map_benchmark_thread(void *data)
+{
+	void *buf;
+	dma_addr_t dma_addr;
+	struct map_benchmark_data *map = data;
+	int ret = 0;
+
+	buf = (void *)__get_free_page(GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+
+	while (!kthread_should_stop())  {
+		__u64 map_100ns, unmap_100ns, map_square, unmap_square;
+		ktime_t map_stime, map_etime, unmap_stime, unmap_etime;
+
+		/*
+		 * for a non-coherent device, if we don't stain them in the cache,
+		 * this will give an underes
[PATCH v2 0/2] dma-mapping: provide a benchmark for streaming DMA mapping
Nowadays, there are increasing requirements to benchmark the performance
of dma_map and dma_unmap, particularly while the device is attached to an
IOMMU.

This patchset provides the benchmark infrastructure for streaming DMA
mapping. The architecture of the code is very similar to the GUP
benchmark:
* mm/gup_benchmark.c provides kernel interface;
* tools/testing/selftests/vm/gup_benchmark.c provides user program to
call the interface provided by mm/gup_benchmark.c.

In our case, kernel/dma/map_benchmark.c is like mm/gup_benchmark.c;
tools/testing/selftests/dma/dma_map_benchmark.c is like
tools/testing/selftests/vm/gup_benchmark.c

A major difference from the GUP benchmark is that the DMA_MAP benchmark
needs to run on a device. Consider one board with the devices and IOMMUs
below:
device A --- IOMMU 1
device B --- IOMMU 2
device C --- non-IOMMU

Different devices might attach to different IOMMU or non-IOMMU. To make
the benchmark run, we can either
* create a virtual device and hack the kernel code to attach the virtual
device to IOMMU1, IOMMU2 or non-IOMMU.
* use the existing driver_override mechanism, unbind device A, B or C
from their original driver and bind it to the dma_map_benchmark platform
driver or pci driver for benchmarking.

In this patchset, I prefer to use the driver_override and avoid the ugly
hack in kernel. We can dynamically switch devices behind different IOMMUs
to get the performance of IOMMU or non-IOMMU.

-v2:
  * add PCI support; v1 supported platform devices only
  * replace ssleep by msleep_interruptible() to permit users to exit
    benchmark before it is completed
  * many changes according to Robin's suggestions, thanks! Robin
    - add standard deviation output to reflect the worst case
    - check users' parameters strictly like the number of threads
    - make cache dirty before dma_map
    - fix unpaired dma_map_page and dma_unmap_single;
    - remove redundant "long long" before ktime_to_ns();
    - use devm_add_action();
    - wakeup all threads together after they are ready

Barry Song (2):
  dma-mapping: add benchmark support for streaming DMA APIs
  selftests/dma: add test application for DMA_MAP_BENCHMARK

 MAINTAINERS                                   |   6 +
 kernel/dma/Kconfig                            |   8 +
 kernel/dma/Makefile                           |   1 +
 kernel/dma/map_benchmark.c                    | 295 ++
 tools/testing/selftests/dma/Makefile          |   6 +
 tools/testing/selftests/dma/config            |   1 +
 .../testing/selftests/dma/dma_map_benchmark.c |  87 ++
 7 files changed, 404 insertions(+)
 create mode 100644 kernel/dma/map_benchmark.c
 create mode 100644 tools/testing/selftests/dma/Makefile
 create mode 100644 tools/testing/selftests/dma/config
 create mode 100644 tools/testing/selftests/dma/dma_map_benchmark.c

--
2.25.1
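The changelog mentions standard deviation output, and the benchmark accumulates a sum, a sum of squares and a loop count. One common way to turn those three totals into an average and a standard deviation is the identity variance = E[x^2] - (E[x])^2. The userspace sketch below assumes that approach. The names `sketch_isqrt64`, `sketch_stats` and `sketch_summarize` are hypothetical, and this is not presented as the patch's actual code.

```c
#include <stdint.h>

/* Integer square root: the largest r such that r * r <= x. */
static uint64_t sketch_isqrt64(uint64_t x)
{
	uint64_t r = 0, bit = (uint64_t)1 << 62;

	/* Start from the highest power of four not greater than x. */
	while (bit > x)
		bit >>= 2;

	while (bit) {
		if (x >= r + bit) {
			x -= r + bit;
			r = (r >> 1) + bit;
		} else {
			r >>= 1;
		}
		bit >>= 2;
	}
	return r;
}

struct sketch_stats {
	uint64_t avg;		/* mean latency, same unit as the samples */
	uint64_t stddev;	/* standard deviation, rounded down */
};

/*
 * Derive mean and standard deviation from running totals:
 * sum of samples, sum of squared samples, and number of samples.
 */
static struct sketch_stats sketch_summarize(uint64_t sum, uint64_t sum_sq,
					    uint64_t loops)
{
	struct sketch_stats s = { 0, 0 };
	uint64_t mean_sq, variance;

	if (!loops)
		return s;

	s.avg = sum / loops;
	mean_sq = sum_sq / loops;
	/* Integer rounding can make the difference negative; clamp at zero. */
	variance = mean_sq > s.avg * s.avg ? mean_sq - s.avg * s.avg : 0;
	s.stddev = sketch_isqrt64(variance);
	return s;
}
```

For three samples of 10, 20 and 30 (sum 60, sum of squares 1400), this yields an average of 20 and a standard deviation of 8, against an exact value of about 8.16. The integer truncation is the price of avoiding floating point, which kernel code generally cannot use.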
[PATCH v2 2/2] selftests/dma: add test application for DMA_MAP_BENCHMARK
This patch provides the test application for DMA_MAP_BENCHMARK.

Before running the test application, we need to bind a device to the
dma_map_benchmark driver. For example, unbind "xxx" from its original
driver and bind to dma_map_benchmark:

echo dma_map_benchmark > /sys/bus/platform/devices/xxx/driver_override
echo xxx > /sys/bus/platform/drivers/xxx/unbind
echo xxx > /sys/bus/platform/drivers/dma_map_benchmark/bind

Another example for PCI devices:
echo dma_map_benchmark > /sys/bus/pci/devices/:00:01.0/driver_override
echo :00:01.0 > /sys/bus/pci/drivers/xxx/unbind
echo :00:01.0 > /sys/bus/pci/drivers/dma_map_benchmark/bind

The below command will run 16 threads on numa node 0 for 10 seconds on
the device bound to dma_map_benchmark platform_driver or pci_driver:
./dma_map_benchmark -t 16 -s 10 -n 0
dma mapping benchmark: threads:16 seconds:10
average map latency(us):1.1 standard deviation:1.9
average unmap latency(us):0.5 standard deviation:0.8

Cc: Joerg Roedel
Cc: Will Deacon
Cc: Shuah Khan
Cc: Christoph Hellwig
Cc: Marek Szyprowski
Cc: Robin Murphy
Signed-off-by: Barry Song
---
-v2:
  * check parameters like threads, seconds strictly
  * print standard deviation for latencies

 MAINTAINERS                                   |  6 ++
 tools/testing/selftests/dma/Makefile          |  6 ++
 tools/testing/selftests/dma/config            |  1 +
 .../testing/selftests/dma/dma_map_benchmark.c | 87 +++
 4 files changed, 100 insertions(+)
 create mode 100644 tools/testing/selftests/dma/Makefile
 create mode 100644 tools/testing/selftests/dma/config
 create mode 100644 tools/testing/selftests/dma/dma_map_benchmark.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 608fc8484c02..a1e38d5e14f6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5247,6 +5247,12 @@ F:	include/linux/dma-mapping.h
 F:	include/linux/dma-map-ops.h
 F:	kernel/dma/

+DMA MAPPING BENCHMARK
+M:	Barry Song
+L:	iommu@lists.linux-foundation.org
+F:	kernel/dma/map_benchmark.c
+F:	tools/testing/selftests/dma/
+
 DMA-BUF HEAPS FRAMEWORK
 M:	Sumit Semwal
 R:	Benjamin Gaignard
diff --git a/tools/testing/selftests/dma/Makefile b/tools/testing/selftests/dma/Makefile
new file mode 100644
index ..aa8e8b5b3864
--- /dev/null
+++ b/tools/testing/selftests/dma/Makefile
@@ -0,0 +1,6 @@
+# SPDX-License-Identifier: GPL-2.0
+CFLAGS += -I../../../../usr/include/
+
+TEST_GEN_PROGS := dma_map_benchmark
+
+include ../lib.mk
diff --git a/tools/testing/selftests/dma/config b/tools/testing/selftests/dma/config
new file mode 100644
index ..6102ee3c43cd
--- /dev/null
+++ b/tools/testing/selftests/dma/config
@@ -0,0 +1 @@
+CONFIG_DMA_MAP_BENCHMARK=y
diff --git a/tools/testing/selftests/dma/dma_map_benchmark.c b/tools/testing/selftests/dma/dma_map_benchmark.c
new file mode 100644
index ..4778df0c458f
--- /dev/null
+++ b/tools/testing/selftests/dma/dma_map_benchmark.c
@@ -0,0 +1,87 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2020 Hisilicon Limited.
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#define DMA_MAP_BENCHMARK	_IOWR('d', 1, struct map_benchmark)
+#define DMA_MAP_MAX_THREADS	1024
+#define DMA_MAP_MAX_SECONDS     300
+
+struct map_benchmark {
+	__u64 avg_map_100ns; /* average map latency in 100ns */
+	__u64 map_stddev; /* standard deviation of map latency */
+	__u64 avg_unmap_100ns; /* as above */
+	__u64 unmap_stddev;
+	__u32 threads; /* how many threads will do map/unmap in parallel */
+	__u32 seconds; /* how long the test will last */
+	int node; /* which numa node this benchmark will run on */
+	__u64 expansion[10];	/* For future use */
+};
+
+int main(int argc, char **argv)
+{
+	struct map_benchmark map;
+	int fd, opt;
+	/* default single thread, run 20 seconds on NUMA_NO_NODE */
+	int threads = 1, seconds = 20, node = -1;
+	int cmd = DMA_MAP_BENCHMARK;
+	char *p;
+
+	while ((opt = getopt(argc, argv, "t:s:n:")) != -1) {
+		switch (opt) {
+		case 't':
+			threads = atoi(optarg);
+			break;
+		case 's':
+			seconds = atoi(optarg);
+			break;
+		case 'n':
+			node = atoi(optarg);
+			break;
+		default:
+			return -1;
+		}
+	}
+
+	if (threads <= 0 || threads > DMA_MAP_MAX_THREADS) {
+		fprintf(stderr, "invalid number of threads, must be in 1-%d\n",
+			DMA_MAP_MAX_THREADS);
+		exit(1);
+	}
+
+	if (seconds <= 0 || seconds > DMA_MAP_MAX_SECONDS) {
+		fprintf(stderr, "invalid number of seconds, must be in 1-%d\n",
+			DMA_MAP_MAX_SECONDS);
+		exit(1);
+	}
+
+	fd =