** Description changed:

SRU Justification:

[Impact]

 * With the introduction of c76c067e488c "s390/pci: Use dma-iommu layer"
   (upstream since kernel v6.7-rc1) there was a move (on s390x only)
   to a different dma-iommu implementation.

 * And with 92bce97f0c34 "s390/pci: Fix reset of IOMMU software counters"
   (again upstream since v6.7-rc1) the IOMMU_DEFAULT_DMA_LAZY kernel config
   option should now be set to 'yes' by default for s390x.

 * Since CONFIG_IOMMU_DEFAULT_DMA_STRICT and IOMMU_DEFAULT_DMA_LAZY
   are related to each other, CONFIG_IOMMU_DEFAULT_DMA_STRICT needs to be
   set to 'no' by default, which was done upstream by b2b97a62f055
   'Revert "s390: update defconfigs"'.

 * These changes are all upstream, but were not picked up by the Ubuntu
   kernel config.

 * Not having these config options set properly causes significant
   PCI-related network throughput degradation (up to -72%).

 * This shows for almost all workloads and numbers of connections,
   deteriorating as the number of connections increases.

 * The drop is especially drastic for a high number of parallel connections
   (50 and 250) and for small and medium-size transactional workloads.
   However, the degradation is also clearly visible for streaming-type
   workloads (up to 48%).

[Fix]

 * The (upstream accepted) fix is to set
     IOMMU_DEFAULT_DMA_STRICT=no
   and
     IOMMU_DEFAULT_DMA_LAZY=y
   (which is needed for the changed DMA IOMMU implementation since v6.7).

[Test Case]

 * Set up two Ubuntu Server 24.04 LPARs (with kernel 6.8)
   (one acting as server, one as client)
   that have (PCIe-attached) RoCE Express devices
   and that are connected to each other.
 * Sample workload rr1c-200x1000-250 with rr1c-200x1000-250.xml:
   <?xml version="1.0"?>
   <profile name="TCP_RR">
     <group nprocs="250">
       <transaction iterations="1">
         <flowop type="connect" options="remotehost=<remote IP> protocol=tcp tcp_nodelay" />
       </transaction>
       <transaction duration="300">
         <flowop type="write" options="size=200"/>
         <flowop type="read" options="size=1000"/>
       </transaction>
       <transaction iterations="1">
         <flowop type="disconnect" />
       </transaction>
     </group>
   </profile>

 * Install uperf on both systems, client and server.

 * Start uperf at the server: uperf -s

 * Start uperf at the client: uperf -vai 5 -m uperf-profile.xml

 * Switch from strict to lazy mode,
   either by using the new kernel (or the test build below)
   or by using the kernel command-line parameter iommu.strict=0.

 * Restart uperf on server and client, like before.

 * Verification will be performed by IBM.

[Regression Potential]

 * There is a certain regression potential, since the behavior with
   the two modified kernel config options will change significantly.

 * This may solve the (network) throughput issue with PCI devices,
   but may also come with side effects on other PCIe-based devices
   (the old compression adapters or the new NVMe carrier cards).

[Other]

 * CCW devices are not affected.

 * This is s390x-specific only, hence will not affect any other
   architecture.
__________

Symptom:
Comparing Ubuntu 24.04 (kernel version 6.8.0-31-generic) against Ubuntu
22.04, all of our PCI-related network measurements on LPAR show massive
throughput degradations (up to -72%). This shows for almost all workloads
and numbers of connections, deteriorating as the number of connections
increases. Especially drastic is the drop for a high number of parallel
connections (50 and 250) and for small and medium-size transactional
workloads. However, the degradation is also clearly visible for
streaming-type workloads (up to 48%).

Problem:
With the kernel config setting CONFIG_IOMMU_DEFAULT_DMA_STRICT=y, the
IOMMU DMA mode changed from lazy to strict, causing these massive
degradations. The behavior can also be changed with a kernel command-line
parameter (iommu.strict) for easy verification.
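As a quick cross-check (a minimal sketch, not from the original report),
the build-time defaults and the effective per-device behavior can be
inspected on a running system. The sysfs 'type' attribute under
/sys/kernel/iommu_groups is the generic IOMMU-groups interface on recent
kernels; exact paths and outputs below are only illustrative:

  # Kernel build defaults (lazy mode expected: STRICT not set, LAZY=y)
  grep -E 'CONFIG_IOMMU_DEFAULT_DMA_(STRICT|LAZY)' /boot/config-$(uname -r)

  # Whether the boot command line overrides the default
  grep -o 'iommu.strict=[01]' /proc/cmdline

  # Per-group default domain type:
  # "DMA" means strict invalidation, "DMA-FQ" means lazy (flush-queue) mode
  cat /sys/kernel/iommu_groups/*/type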
The issue is known and was quickly fixed upstream in December 2023, after
being present for a little less than two weeks.

Upstream fix:
https://github.com/torvalds/linux/commit/b2b97a62f055dd638f7f02087331a8380d8f139a

Repro:
rr1c-200x1000-250 with rr1c-200x1000-250.xml:
<?xml version="1.0"?>
<profile name="TCP_RR">
  <group nprocs="250">
    <transaction iterations="1">
      <flowop type="connect" options="remotehost=<remote IP> protocol=tcp tcp_nodelay" />
    </transaction>
    <transaction duration="300">
      <flowop type="write" options="size=200"/>
      <flowop type="read" options="size=1000"/>
    </transaction>
    <transaction iterations="1">
      <flowop type="disconnect" />
    </transaction>
  </group>
</profile>

0) Install uperf on both systems, client and server.
1) Start uperf at the server: uperf -s
2) Start uperf at the client: uperf -vai 5 -m uperf-profile.xml
3) Switch from strict to lazy mode using the kernel command-line parameter
   iommu.strict=0.
4) Repeat steps 1) and 2).

Example:
For the following example, we chose the workload named above
(rr1c-200x1000-250):
iommu.strict=1 (strict): 233464.914 TPS
iommu.strict=0 (lazy):   835123.193 TPS
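For step 3), a minimal sketch of how the parameter could be set,
assuming the LPAR boots through zipl with its configuration in
/etc/zipl.conf (the usual layout on Ubuntu Server for s390x); the file
layout and the profile name uperf-profile.xml are assumptions, adjust to
the actual setup:

  # 1) Append iommu.strict=0 to the 'parameters' line of the active boot
  #    section (keep it inside the existing quotes, if any), then rewrite
  #    the boot record and reboot:
  sudoedit /etc/zipl.conf
  sudo zipl
  sudo reboot

  # 2) After reboot, repeat the measurement:
  #    on the server LPAR
  uperf -s &
  #    on the client LPAR (uperf-profile.xml = rr1c-200x1000-250.xml above)
  uperf -vai 5 -m uperf-profile.xml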
https://bugs.launchpad.net/bugs/2071471

Title:
  [UBUNTU 24.04] IOMMU DMA mode changed in kernel config causes massive
  throughput degradation for PCI-related network workloads
