** Description changed: - Can you please cherry-pick those 2 important commits from upstream? + BugLink: https://bugs.launchpad.net/bugs/2093871 - 1. net: introduce helper sendpages_ok() - (https://github.com/torvalds/linux/commit/23a55f44) + [Impact] - Network drivers are using sendpage_ok() to check the first page of an - iterator in order to disable MSG_SPLICE_PAGES. The iterator can - represent list of contiguous pages. + Currently the nvme-tcp and drbg subsystems try to enable the MSG_SPLICE_PAGES + flag on pages to be written, and when MSG_SPLICE_PAGES is set, eventually it + calls skb_splice_from_iter(), which then checks all pages with sendpage_ok() + to see if all the pages are sendable. - When MSG_SPLICE_PAGES is enabled skb_splice_from_iter() is being used, - it requires all pages in the iterator to be sendable. Therefore it needs - to check that each page is sendable. + At the moment, both subsystems only check the first page in a potentially + contiguous block of pages, if they are sendpage_ok(), and if the first page is, + then it just assumes all the rest are sendpage_ok() too, and sends the I/O off + to eventually be found out by skb_splice_from_iter(). If one or more of the + pages in the contiguous block is not sendpage_ok(), then we get a warn printed, + data transfer is aborted. In the nvme-tcp case, IO then hangs. - The patch introduces a helper sendpages_ok(), it returns true if all the - contiguous pages are sendable. + This patchset introduces sendpages_ok() which iterates over each page in a + contiguous block, checks if it is sendpage_ok(), and only returns true if all + of them are. - Drivers who want to send contiguous pages with MSG_SPLICE_PAGES may use - this helper to check whether the page list is OK. If the helper does not - return true, the driver should remove MSG_SPLICE_PAGES flag. + This resolves the whole MSG_SPLICE_PAGES flag situation, since you can now + depend on the result of sendpages_ok(), instead of just assuming everything is + okay. + This issue is what caused bug 2075110 [0] to be discovered in the first place, + since it was responsible for contigious blocks of pages where the first was + sendpage_ok(), but pages further into the block were not. - 2. nvme-tcp: use sendpages_ok() instead of sendpage_ok() (https://github.com/torvalds/linux/commit/6af7331a) + [0] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2075110 - Currently nvme_tcp_try_send_data() use sendpage_ok() in order to disable - MSG_SPLICE_PAGES, it check the first page of the iterator, the iterator - may represent contiguous pages. + Even with "md/md-bitmap: fix writing non bitmap pages" applied, the issue can + still happen, e.g. with merged IO pages, so this fix is still needed to + eliminate the issue. - MSG_SPLICE_PAGES enables skb_splice_from_iter() which checks all the - pages it sends with sendpage_ok(). + [Fix] - When nvme_tcp_try_send_data() sends an iterator that the first page is - sendable, but one of the other pages isn't skb_splice_from_iter() warns - and aborts the data transfer. + The fixes landed in mainline 6.12-rc1: - Using the new helper sendpages_ok() in order to disable MSG_SPLICE_PAGES - solves the issue. + commit 23a55f4492fcf868d068da31a2cd30c15f46207d + Author: Ofir Gal <[email protected]> + Date: Thu Jul 18 11:45:12 2024 +0300 + Subject: net: introduce helper sendpages_ok() + Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=23a55f4492fcf868d068da31a2cd30c15f46207d + + commit 6af7331a70b4888df43ec1d7e1803ae2c43b6981 + Author: Ofir Gal <[email protected]> + Date: Thu Jul 18 11:45:13 2024 +0300 + Subject: nvme-tcp: use sendpages_ok() instead of sendpage_ok() + Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6af7331a70b4888df43ec1d7e1803ae2c43b6981 + + commit 7960af373ade3b39e10106ef415e43a1d2aa48c6 + Author: Ofir Gal <[email protected]> + Date: Thu Jul 18 11:45:14 2024 +0300 + Subject: drbd: use sendpages_ok() instead of sendpage_ok() + Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7960af373ade3b39e10106ef415e43a1d2aa48c6 + + They are needed for noble and oracular. + + [Testcase] + + This is the same testcase as the original bug 2075110 [0], as the fix is + designed to prevent it or similar other bugs from happening again. + + [0] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2075110 + + Because of this, the fix: + + commit ab99a87542f194f28e2364a42afbf9fb48b1c724 + Author: Ofir Gal <[email protected]> + Date: Fri Jun 7 10:27:44 2024 +0300 + Subject: md/md-bitmap: fix writing non bitmap pages + Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ab99a87542f194f28e2364a42afbf9fb48b1c724 + + needs to be reverted during your test runs, or you won't see the issue + reproduce. + + You can use this ppa for updated kernels with the revert to trigger the + issue: + + https://launchpad.net/~mruffell/+archive/ubuntu/sf404844-revert + + This can be reproduced by running blktests md/001 [1], which the author + of the fix created to act as a regression test for this issue. + + [1] + https://github.com/osandov/blktests/commit/a24a7b462816fbad7dc6c175e53fcc764ad0a822 + + Deploy a fresh Noble VM, that has a scratch NVME disk. + + $ sudo apt install build-essential fio + $ git clone https://github.com/osandov/blktests.git + $ cd blktests + $ make + $ echo "TEST_DEVS=(/dev/nvme0n1)" > config + $ sudo ./check md/001 + + The md/001 test will hang an affected system, and the above oops message + will be visible in dmesg. + + A test kernel is available in the following ppa: + + https://launchpad.net/~mruffell/+archive/ubuntu/sf404844-test + + This has both the fixes for this bug, and also bug 2075110. The issue will not + reproduce. + + There is also a test kernel available with the fix for this bug present, and the + fix for bug 2075110 reverted, so you can see the impact of these patches only: + + https://launchpad.net/~mruffell/+archive/ubuntu/sf404844-repro + + This will also not reproduce the issue anymore. + + [Where problems could occur] + + What we are changing is rather simple. Instead of checking the first page and + assuming all the rest in the contiguous block are sendpage_ok(), we now + check each page in the contiguous block to see if all of them are sendpage_ok(). + + If any aren't, then we abort the write to the driver, and try again later. This + saves us time. + + However, it does take longer to call sendpage_ok() on each of the pages in the + contiguous block, so there will be a minor performance hit. + + Small performance hit for correctness should be okay. + + Currently we are only applying to nvme-tcp and drbg subsystems. If a regression + were to occur, it would affect users of those subsystems only. + + [Other info] + + Upstream mailing list: + https://lore.kernel.org/all/[email protected]/T/#u
-- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. https://bugs.launchpad.net/bugs/2093871 Title: Introduce and use sendpages_ok() instead of sendpage_ok() in nvme-tcp and drbg To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2093871/+subscriptions -- ubuntu-bugs mailing list [email protected] https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
