[Touch-packages] [Bug 2036467] Re: Resizing cloud-images occasionally fails due to superblock checksum mismatch in resize2fs
Attached is a V2 patch of e2fsprogs for Noble.

** Patch added: "Debdiff for e2fsprogs on noble V2"
https://bugs.launchpad.net/ubuntu/+source/e2fsprogs/+bug/2036467/+attachment/5772258/+files/lp2036467_noble_V2.debdiff

** Patch removed: "Debdiff for e2fsprogs on noble"
https://bugs.launchpad.net/ubuntu/+source/e2fsprogs/+bug/2036467/+attachment/5738302/+files/lp2036467_noble.debdiff

** Patch removed: "Debdiff for e2fsprogs on lunar"
https://bugs.launchpad.net/ubuntu/+source/e2fsprogs/+bug/2036467/+attachment/5707894/+files/lp2036467_lunar.debdiff

--
You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to e2fsprogs in Ubuntu.
https://bugs.launchpad.net/bugs/2036467

Title:
Resizing cloud-images occasionally fails due to superblock checksum mismatch in resize2fs

Status in cloud-images: New
Status in e2fsprogs package in Ubuntu: In Progress
Status in e2fsprogs source package in Trusty: Won't Fix
Status in e2fsprogs source package in Xenial: Won't Fix
Status in e2fsprogs source package in Bionic: Won't Fix
Status in e2fsprogs source package in Focal: In Progress
Status in e2fsprogs source package in Jammy: In Progress
Status in e2fsprogs source package in Lunar: Won't Fix
Status in e2fsprogs source package in Mantic: In Progress
Status in e2fsprogs source package in Noble: In Progress

Bug description:

[Impact]

This is a long-running bug plaguing cloud-images, where on rare occasions resize2fs would fail and the image would not be resized to fit the entire disk.

Online resizes would fail due to a superblock checksum mismatch, where the superblock in memory differs from what is currently on disk due to changes made to the image.

$ resize2fs /dev/nvme1n1p1
resize2fs 1.47.0 (5-Feb-2023)
resize2fs: Superblock checksum does not match superblock while trying to open /dev/nvme1n1p1
Couldn't find valid filesystem superblock.

Changing the read of the superblock to Direct I/O solves the issue.
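The sketch below is a bash illustration of the difference between a buffered read and a Direct I/O read of the superblock. It is not the upstream C change. It assumes GNU dd, whose `iflag=direct` opens the input with O_DIRECT so the read bypasses the page cache. It also assumes the standard ext4 on-disk layout: the primary superblock starts at byte 1024, `s_magic` (0xEF53) sits 0x38 bytes into it, and `s_checksum` occupies its last four bytes (offset 0x3FC). The helper name `read_sb_hex` and the `SB_*_OFF` variables are hypothetical. The helper only extracts raw fields and does not recompute the crc32c checksum.

```bash
#!/usr/bin/bash
# Hypothetical helper, not part of e2fsprogs: print LEN bytes starting at
# byte OFFSET of DEV as a hex number, decoding them as little-endian (the
# byte order ext4 uses on disk). MODE is "direct" (default) or "buffered".
read_sb_hex() {
    local dev=$1 off=$2 len=$3 mode=${4:-direct}
    local iflag="" hex="" byte

    # iflag=direct makes GNU dd open DEV with O_DIRECT, so the data comes
    # from the device instead of a page-cache copy. O_DIRECT needs aligned
    # I/O, so always read one whole 4096-byte block from offset 0; it
    # covers the primary superblock at bytes 1024-2047.
    if [ "$mode" = direct ]; then
        iflag="iflag=direct"
    fi

    # od prints the requested bytes in on-disk order; prepending each one
    # reverses them into a little-endian value. $iflag is left unquoted on
    # purpose so that an empty value disappears from the dd command line.
    for byte in $(dd if="$dev" bs=4096 count=1 $iflag status=none |
                  od -An -v -tx1 -j "$off" -N "$len"); do
        hex=$byte$hex
    done
    printf '%s\n' "$hex"
}

# Absolute byte offsets of two superblock fields (superblock base is 1024).
SB_MAGIC_OFF=$((1024 + 0x38))   # s_magic, 2 bytes, expected value ef53
SB_CSUM_OFF=$((1024 + 0x3FC))   # s_checksum, 4 bytes
```

As root, `read_sb_hex /dev/nvme1n1p1 "$SB_MAGIC_OFF" 2` should print `ef53` for the reproducer's scratch filesystem. The helper always reads a full aligned block and picks fields out of it afterwards because O_DIRECT rejects unaligned reads.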
[Testcase]

Start a c5.large instance on AWS, and attach a 60 GB gp3 volume for use as a scratch disk. Run the following script, courtesy of Krister Johansen and his team:

#!/usr/bin/bash
set -euxo pipefail

while true
do
    parted /dev/nvme1n1 mklabel gpt mkpart primary 2048s 2099200s
    sleep .5
    mkfs.ext4 /dev/nvme1n1p1
    mount -t ext4 /dev/nvme1n1p1 /mnt
    stress-ng --temp-path /mnt -D 4 &
    STRESS_PID=$!
    sleep 1
    growpart /dev/nvme1n1 1
    resize2fs /dev/nvme1n1p1
    kill $STRESS_PID
    wait $STRESS_PID
    umount /mnt
    wipefs -a /dev/nvme1n1p1
    wipefs -a /dev/nvme1n1
done

Test packages are available in the following PPA:
https://launchpad.net/~mruffell/+archive/ubuntu/lp2036467-test

If you install the test packages, the race no longer occurs.

[Where problems could occur]

We are changing how resize2fs reads the superblock from underlying disks. If a regression were to occur, resize2fs could fail to resize offline or online volumes. All cloud-images are resized online during their initial boot, so a regression could have a large impact on public and private clouds.

[Other info]

Upstream mailing list discussion:
https://lore.kernel.org/linux-ext4/20230605225221.ga5...@templeofstupid.com/
https://lore.kernel.org/linux-ext4/20230609042239.ga1436...@mit.edu/

This was fixed upstream in the following commit:

commit 43a498e938887956f393b5e45ea6ac79cc5f4b84
Author: Theodore Ts'o
Date: Thu, 15 Jun 2023 00:17:01 -0400
Subject: resize2fs: use Direct I/O when reading the superblock for online resizes
Link: https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/commit/?id=43a498e938887956f393b5e45ea6ac79cc5f4b84

The commit has not yet been included in a tagged release.

All supported Ubuntu releases require this fix, and it needs to be published in the standard non-ESM archives to be picked up in cloud images.
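When running the reproducer script from [Testcase], it can help to distinguish the checksum-mismatch failure from other resize2fs errors and to record how long the loop survived, since this thread compares time-to-failure across releases. The following is a minimal bash sketch. It assumes that the error text shown in [Impact] stays the same across e2fsprogs versions. `resize_outcome` is a hypothetical name and is not part of the original script.

```bash
#!/usr/bin/bash
# Hypothetical wrapper, not part of the original reproducer: run a command
# (normally "resize2fs DEV") and print "ok", "checksum-mismatch" or
# "other-error". It always returns 0, so the caller decides what to do.
resize_outcome() {
    local out rc=0
    # Capture stdout and stderr together so the error text is caught
    # whichever stream it is printed on. "|| rc=$?" keeps set -e from
    # aborting the caller when the command fails.
    out=$("$@" 2>&1) || rc=$?
    if [ "$rc" -eq 0 ]; then
        echo ok
        return 0
    fi
    case "$out" in
        *'Superblock checksum does not match superblock'*)
            echo checksum-mismatch ;;
        *)
            echo other-error ;;
    esac
}
```

In the reproducer, the `resize2fs /dev/nvme1n1p1` line could become `outcome=$(resize_outcome resize2fs /dev/nvme1n1p1)` followed by `[ "$outcome" = ok ] || { echo "$outcome after ${SECONDS}s"; exit 1; }`. The wrapper itself always succeeds, so this explicit check takes over the job that `set -e` did before. Bash's `SECONDS` variable reports how long the script had been running when the failure occurred.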
To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-images/+bug/2036467/+subscriptions

--
Mailing list: https://launchpad.net/~touch-packages
Post to     : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp
[Touch-packages] [Bug 2036467] Re: Resizing cloud-images occasionally fails due to superblock checksum mismatch in resize2fs
Hi Krister,

Thanks for the heads up about 1.47.1 upstream. It does indeed look like a release is coming soon. It seems Debian unstable already has 1.47.1-rc1:

https://packages.debian.org/sid/e2fsprogs

When the Ubuntu archive opens for OO, we will merge 1.47.1~rc1-1 from Debian unstable, and then submit the patches for SRU to Noble, Mantic, Jammy and Focal. It should take a few days.

Thanks,
Matthew
[Touch-packages] [Bug 2036467] Re: Resizing cloud-images occasionally fails due to superblock checksum mismatch in resize2fs
@Matthew I took a look at your debdiffs (I hope they are up to date), and they look good in general. I checked the debdiffs for Focal, Jammy, Mantic and Noble.

The Noble debdiff requires a rebase. Noble now has version 1.47.0-2.4~exp1ubuntu4, so we want version 1.47.0-2.4~exp1ubuntu4.1 with your changes (at this point it will be an SRU for Noble as well). This will also need to be fixed in the next development release (the OO series) to avoid any future regression, but the archive is not yet open for that.

Please fix that, and someone can sponsor the uploads targeting all supported releases at once. I am unsubscribing ~ubuntu-sponsors. Once you address the comment above, please subscribe it again and someone will take a look.
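The version requirement in the Noble review follows the usual Ubuntu SRU convention. An upload to a stable release appends `.1` to the currently published version, or bumps an existing `.N` suffix, so that it sorts above what is already in the archive. The following hypothetical bash helper derives that string under this assumption. It deliberately handles only versions that end in `ubuntuN` or `ubuntuN.M` and rejects anything else.

```bash
#!/usr/bin/bash
# Hypothetical helper: derive the next SRU version from the version that is
# currently published, assuming it ends in "ubuntuN" or "ubuntuN.M".
next_sru_version() {
    local current=$1
    case "$current" in
        *ubuntu[0-9]*.[0-9]*)
            # Already an SRU (e.g. ...ubuntu4.1): bump the last component.
            local prefix=${current%.*} minor=${current##*.}
            printf '%s.%s\n' "$prefix" $((minor + 1))
            ;;
        *ubuntu[0-9]*)
            # First SRU on top of ...ubuntuN: append ".1".
            printf '%s.1\n' "$current"
            ;;
        *)
            echo "unsupported version: $current" >&2
            return 1
            ;;
    esac
}
```

For example, `next_sru_version 1.47.0-2.4~exp1ubuntu4` yields `1.47.0-2.4~exp1ubuntu4.1`. Where dpkg is installed, `dpkg --compare-versions 1.47.0-2.4~exp1ubuntu4.1 gt 1.47.0-2.4~exp1ubuntu4` exits with status 0, confirming that the new version sorts above the published one.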
[Touch-packages] [Bug 2036467] Re: Resizing cloud-images occasionally fails due to superblock checksum mismatch in resize2fs
Hi Matthew,

It's been a couple of months. We'd really love to get the fix for Focal, Jammy, and Noble. Any chance this could get sponsored and approved soon?

I also checked upstream, and it appears that they're preparing a 1.47.1 release of e2fsprogs that should include this fix. It hasn't been tagged yet, but they're starting the process:

https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/commit/?id=3fcbc9ffbeaa0df3dd06113b61f9b3bed4efb92e
[Touch-packages] [Bug 2036467] Re: Resizing cloud-images occasionally fails due to superblock checksum mismatch in resize2fs
** Also affects: e2fsprogs (Ubuntu Noble)
   Importance: Critical
   Assignee: Matthew Ruffell (mruffell)
   Status: In Progress
[Touch-packages] [Bug 2036467] Re: Resizing cloud-images occasionally fails due to superblock checksum mismatch in resize2fs
Thanks for the update, Matthew. We're looking forward to getting this fix from Ubuntu.

My team patched our version of the e2fsprogs package from Focal about a year ago, before we submitted the fix upstream. Since that patch, we haven't had any recurrences of the problem. It used to show up about 4-5 times a day for us.

Glad that the fix is working in your tests as well.
[Touch-packages] [Bug 2036467] Re: Resizing cloud-images occasionally fails due to superblock checksum mismatch in resize2fs
I have been running the test packages on AWS with the reproducer for 20 days now, and they are still running great. The change to Direct I/O really does fix this issue, and my testing has removed any and all concerns about causing a regression. Previously, Focal wouldn't last more than 20 minutes, and Jammy onward wouldn't last more than a week.

I will get these patches sponsored now. Sorry for the delay, Krister.
[Touch-packages] [Bug 2036467] Re: Resizing cloud-images occasionally fails due to superblock checksum mismatch in resize2fs
Hi Krister, Fascinating. I'm in New Zealand, so I use ap-southeast-2 in Sydney, Australia for all my instances, and I never gave it any thought that this could depend on how busy EBS is on the availability zone. I'll move my instances to us-west-2. Thanks, Matthew -- You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to e2fsprogs in Ubuntu. https://bugs.launchpad.net/bugs/2036467 Title: Resizing cloud-images occasionally fails due to superblock checksum mismatch in resize2fs Status in cloud-images: New Status in e2fsprogs package in Ubuntu: In Progress Status in e2fsprogs source package in Trusty: Won't Fix Status in e2fsprogs source package in Xenial: Won't Fix Status in e2fsprogs source package in Bionic: Won't Fix Status in e2fsprogs source package in Focal: In Progress Status in e2fsprogs source package in Jammy: In Progress Status in e2fsprogs source package in Lunar: Won't Fix Status in e2fsprogs source package in Mantic: In Progress Bug description: [Impact] This is a long running bug plaguing cloud-images, where on a rare occasion resize2fs would fail and the image would not resize to fit the entire disk. Online resizes would fail due to a superblock checksum mismatch, where the superblock in memory differs from what is currently on disk due to changes made to the image. $ resize2fs /dev/nvme1n1p1 resize2fs 1.47.0 (5-Feb-2023) resize2fs: Superblock checksum does not match superblock while trying to open /dev/nvme1n1p1 Couldn't find valid filesystem superblock. Changing the read of the superblock to Direct I/O solves the issue. [Testcase] Start an c5.large instance on AWS, and attach a 60gb gp3 volume for use as a scratch disk. 
Run the following script, courtesy of Krister Johansen and his team: #!/usr/bin/bash set -euxo pipefail while true do parted /dev/nvme1n1 mklabel gpt mkpart primary 2048s 2099200s sleep .5 mkfs.ext4 /dev/nvme1n1p1 mount -t ext4 /dev/nvme1n1p1 /mnt stress-ng --temp-path /mnt -D 4 & STRESS_PID=$! sleep 1 growpart /dev/nvme1n1 1 resize2fs /dev/nvme1n1p1 kill $STRESS_PID wait $STRESS_PID umount /mnt wipefs -a /dev/nvme1n1p1 wipefs -a /dev/nvme1n1 done Test packages are available in the following ppa: https://launchpad.net/~mruffell/+archive/ubuntu/lp2036467-test If you install the test packages, the race no longer occurs. [Where problems could occur] We are changing how resize2fs reads the superblock from underlying disks. If a regression were to occur, resize2fs could fail to resize offline or online volumes. As all cloud-images are online resized during their initial boot, this could have a large impact to public and private clouds should a regression occur. [Other info] Upstream mailing list discussion: https://lore.kernel.org/linux-ext4/20230605225221.ga5...@templeofstupid.com/ https://lore.kernel.org/linux-ext4/20230609042239.ga1436...@mit.edu/ This was fixed in the below commit upstream: commit 43a498e938887956f393b5e45ea6ac79cc5f4b84 Author: Theodore Ts'o Date: Thu, 15 Jun 2023 00:17:01 -0400 Subject: resize2fs: use Direct I/O when reading the superblock for online resizes Link: https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/commit/?id=43a498e938887956f393b5e45ea6ac79cc5f4b84 The commit has not been tagged to any release. All supported Ubuntu releases require this fix, and need to be published in standard non- ESM archives to be picked up in cloud images. 
To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-images/+bug/2036467/+subscriptions

--
Mailing list: https://launchpad.net/~touch-packages
Post to     : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp
[Touch-packages] [Bug 2036467] Re: Resizing cloud-images occasionally fails due to superblock checksum mismatch in resize2fs
Hi Matthew,

Thanks for the update. I'm glad this finally reproduced in your environment. I don't have a great explanation for why it took so much longer there. I did observe that it seemed more likely to occur in us-west-2 during the 9am-5pm window in the local timezone. The timing may be subtly affected by overall EBS utilization. Just a guess, though.

Thanks again,
-K
[Touch-packages] [Bug 2036467] Re: Resizing cloud-images occasionally fails due to superblock checksum mismatch in resize2fs
Hi Krister,

I have finally seen this occur in real life with my own two eyes! You are absolutely correct: the 4-retry doesn't always seem to be sufficient.

The reproducer triggers on Focal and earlier in about 20 minutes, so it's easy to see the issue on Focal. But Focal and earlier releases don't retry at all.

On Jammy, Mantic and Noble, it took about a week of continuous running, but I managed to trigger it on each of them.

Start: Tue Jan 16 01:57:20 UTC 2024   Tue Jan 16 02:18:53 UTC 2024
End:   Tue Jan 23 20:12:28 UTC 2024   Tue Jan 23 14:32:08 UTC 2024

The 4-retry does help, and really helps quite a lot.

Anyway, I have upgraded my test environment to the test packages, and I will leave them running for a week. If things look good then, I'll get these patches sponsored for SRU.

Sorry for the delay, but I really wanted to see it fail on Jammy, Mantic and Noble before we patch them.

Thanks,
Matthew
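As a quick check on "about a week", the elapsed time for each Start/End pair can be computed. This is a minimal sketch assuming GNU `date` (which parses its own default output format via `-d`) and assuming each End belongs to the Start in the same column; the message does not say which release each column corresponds to. `elapsed` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: elapsed time between two timestamps, assuming GNU date.
set -euo pipefail

# elapsed START END -> prints "<days>d <hh>h <mm>m <ss>s"
elapsed() {
    local start end secs
    start=$(date -u -d "$1" +%s)   # seconds since the epoch
    end=$(date -u -d "$2" +%s)
    secs=$(( end - start ))
    printf '%dd %02dh %02dm %02ds\n' \
        $(( secs / 86400 )) $(( secs % 86400 / 3600 )) \
        $(( secs % 3600 / 60 )) $(( secs % 60 ))
}

# Pairing Start/End by column (an assumption, see above):
elapsed "Tue Jan 16 01:57:20 UTC 2024" "Tue Jan 23 20:12:28 UTC 2024"  # 7d 18h 15m 08s
elapsed "Tue Jan 16 02:18:53 UTC 2024" "Tue Jan 23 14:32:08 UTC 2024"  # 7d 12h 13m 15s
```

Both runs come to between seven and a half and eight days, in line with "about a week".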
[Touch-packages] [Bug 2036467] Re: Resizing cloud-images occasionally fails due to superblock checksum mismatch in resize2fs
Ubuntu 23.04 (Lunar Lobster) has reached end of life, so this bug will not be fixed for that specific release.

** Changed in: e2fsprogs (Ubuntu Lunar)
       Status: In Progress => Won't Fix
[Touch-packages] [Bug 2036467] Re: Resizing cloud-images occasionally fails due to superblock checksum mismatch in resize2fs
Hi Matthew,

Thanks for the update. I went ahead and tested your updated packages on Focal, Jammy, and Noble images in EC2 this evening. With the latest packages installed, I was unable to reproduce the problem on any of the three installs.

I'm not sure which builds were inconsistent about triggering the problem for you, but it may be worth noting that the versions of the package after Focal received an additional partial fix for the superblock checksum mismatch. In those versions, it retries the read of the block up to 3 times before returning a failure. In my previous testing, this increased the amount of time before one hits the problem, but did not eliminate it entirely.

Thanks again for your help with getting these patches in. It's much appreciated!

-K
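Why retrying delays the failure without eliminating it can be illustrated with an idealized model. Purely as an assumption, suppose each superblock read independently hits the race with some small probability p > 0, and read "up to 3 times" as r = 3 retries after the first read (4 reads in total, matching the "4-retry" wording used elsewhere in this thread). A resize then fails only if every read fails, and the number of resizes N until the first failure is geometric:

```latex
P_{\text{fail}}(r) = p^{\,r+1}, \qquad
P_{\text{fail}}(0) = p, \qquad
P_{\text{fail}}(3) = p^{4}, \qquad
\mathbb{E}[N] = \frac{1}{P_{\text{fail}}(r)} = p^{-(r+1)}
```

For any p > 0 the failure probability shrinks but never reaches zero, so a long enough stress loop will still trip it. Back-to-back retries during a busy resize are probably not independent, so the real improvement is likely smaller than this model suggests.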
[Touch-packages] [Bug 2036467] Re: Resizing cloud-images occasionally fails due to superblock checksum mismatch in resize2fs
Hi Krister,

I apologise for the delay. The main issue I have had with testing is that the problem reproduces significantly faster on some releases than on others, and I still haven't managed to reproduce it even once on some releases.

I'll set up some fresh reproducers now and leave them running.

If you want to help test, there are test packages for all releases in:

https://launchpad.net/~mruffell/+archive/ubuntu/lp2036467-test

Regardless, I'll try to move this forward.

Thanks,
Matthew
[Touch-packages] [Bug 2036467] Re: Resizing cloud-images occasionally fails due to superblock checksum mismatch in resize2fs
Attached is a patch for Noble that solves this issue.

** Patch added: "Debdiff for e2fsprogs on noble"
   https://bugs.launchpad.net/ubuntu/+source/e2fsprogs/+bug/2036467/+attachment/5738302/+files/lp2036467_noble.debdiff
[Touch-packages] [Bug 2036467] Re: Resizing cloud-images occasionally fails due to superblock checksum mismatch in resize2fs
Attached is a V2 patch for Mantic with a different version number, since Mantic is no longer the development release.

** Patch removed: "Debdiff for e2fsprogs on mantic"
   https://bugs.launchpad.net/ubuntu/+source/e2fsprogs/+bug/2036467/+attachment/5707893/+files/lp2036467_mantic.debdiff

** Patch added: "Debdiff for e2fsprogs on mantic V2"
   https://bugs.launchpad.net/ubuntu/+source/e2fsprogs/+bug/2036467/+attachment/5738301/+files/lp2036467_mantic_v2.debdiff
[Touch-packages] [Bug 2036467] Re: Resizing cloud-images occasionally fails due to superblock checksum mismatch in resize2fs
@mruffell just wanted to check back to see whether the instructions in the report worked to reproduce the problem for you. If so, do you have an estimate of when packages with the patch will be made available?

Thanks!
[Touch-packages] [Bug 2036467] Re: Resizing cloud-images occasionally fails due to superblock checksum mismatch in resize2fs
Hi,

Just wanted to check back to see whether the reproducer and fix worked in your testing environments. I was also curious whether you could share any plans for when an update containing this fix might be released.

Thanks again.
[Touch-packages] [Bug 2036467] Re: Resizing cloud-images occasionally fails due to superblock checksum mismatch in resize2fs
** Description changed:

  [Impact]

  This is a long running bug plaguing cloud-images, where on a rare
  occasion resize2fs would fail and the image would not resize to fit the
  entire disk.

  Online resizes would fail due to a superblock checksum mismatch, where
  the superblock in memory differs from what is currently on disk due to
  changes made to the image.

+ $ resize2fs /dev/nvme1n1p1
+ resize2fs 1.47.0 (5-Feb-2023)
+ resize2fs: Superblock checksum does not match superblock while trying to open /dev/nvme1n1p1
+ Couldn't find valid filesystem superblock.
+
  Changing the read of the superblock to Direct I/O solves the issue.

  [Testcase]

  Start an c5.large instance on AWS, and attach a 60gb gp3 volume for use
  as a scratch disk.

  Run the following script, courtesy of Krister Johansen and his team:

-#!/usr/bin/bash
-set -euxo pipefail
+ #!/usr/bin/bash
+ set -euxo pipefail

-while true
-do
-parted /dev/nvme1n1 mklabel gpt mkpart primary 2048s 2099200s
-sleep .5
-mkfs.ext4 /dev/nvme1n1p1
-mount -t ext4 /dev/nvme1n1p1 /mnt
-stress-ng --temp-path /mnt -D 4 &
-STRESS_PID=$!
-sleep 1
-growpart /dev/nvme1n1 1
-resize2fs /dev/nvme1n1p1
-kill $STRESS_PID
-wait $STRESS_PID
-umount /mnt
-wipefs -a /dev/nvme1n1p1
-wipefs -a /dev/nvme1n1
-done
+ while true
+ do
+ parted /dev/nvme1n1 mklabel gpt mkpart primary 2048s 2099200s
+ sleep .5
+ mkfs.ext4 /dev/nvme1n1p1
+ mount -t ext4 /dev/nvme1n1p1 /mnt
+ stress-ng --temp-path /mnt -D 4 &
+ STRESS_PID=$!
+ sleep 1
+ growpart /dev/nvme1n1 1
+ resize2fs /dev/nvme1n1p1
+ kill $STRESS_PID
+ wait $STRESS_PID
+ umount /mnt
+ wipefs -a /dev/nvme1n1p1
+ wipefs -a /dev/nvme1n1
+ done

  Test packages are available in the following ppa:

  https://launchpad.net/~mruffell/+archive/ubuntu/lp2036467-test

  If you install the test packages, the race no longer occurs.

  [Where problems could occur]

  We are changing how resize2fs reads the superblock from underlying
  disks.

  If a regression were to occur, resize2fs could fail to resize offline or
  online volumes. As all cloud-images are online resized during their
  initial boot, this could have a large impact to public and private
  clouds should a regression occur.

  [Other info]

- Upstream mailing list discussion:
+ Upstream mailing list discussion:

  https://lore.kernel.org/linux-ext4/20230605225221.ga5...@templeofstupid.com/
  https://lore.kernel.org/linux-ext4/20230609042239.ga1436...@mit.edu/

  This was fixed in the below commit upstream:

  commit 43a498e938887956f393b5e45ea6ac79cc5f4b84
  Author: Theodore Ts'o
  Date: Thu, 15 Jun 2023 00:17:01 -0400
  Subject: resize2fs: use Direct I/O when reading the superblock for
- online resizes
+ online resizes
  Link: https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/commit/?id=43a498e938887956f393b5e45ea6ac79cc5f4b84

  The commit has not been tagged to any release. All supported Ubuntu
  releases require this fix, and need to be published in standard non-ESM
  archives to be picked up in cloud images.

** Changed in: e2fsprogs (Ubuntu Bionic)
       Status: In Progress => Won't Fix
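The error quoted in the description ("Superblock checksum does not match superblock") is reported when the checksum stored in the superblock disagrees with one recomputed over the superblock bytes that were actually read. As a rough, hypothetical illustration (not e2fsprogs code), the POSIX shell functions below recompute that checksum from user space. The on-disk offsets and the raw CRC-32C convention are assumptions based on the ext4 on-disk format rather than details from this bug, and the names `le_field`, `crc32c_raw` and `sb_csum_check` are made up for this sketch.

```shell
#!/bin/sh
# Sketch only: recompute the primary ext2/3/4 superblock checksum of a
# device or image and compare it with the stored value.
# Assumed on-disk layout (not taken from this bug report):
#   primary superblock at byte 1024, 1024 bytes long
#   s_magic             at +0x38  (2 bytes, expected 0xEF53)
#   s_feature_ro_compat at +0x64  (4 bytes, metadata_csum bit = 0x400)
#   s_checksum          at +0x3FC (4 bytes): raw CRC-32C (seed 0xFFFFFFFF,
#                       no final inversion) over the 1020 bytes before it

# le_field FILE OFFSET SIZE: print, in decimal, the little-endian unsigned
# integer stored in SIZE bytes at absolute byte OFFSET of FILE.
le_field() {
  v=0 s=0
  for b in $(od -An -v -tu1 -j "$2" -N "$3" "$1"); do
    v=$(( v | (b << s) ))
    s=$(( s + 8 ))
  done
  echo "$v"
}

# crc32c_raw [OD-ARGS...]: bitwise CRC-32C (reflected polynomial 0x82F63B78)
# of the bytes od selects (stdin when no file is given), seeded with
# 0xFFFFFFFF and without the final XOR; printed as 8 hex digits.
crc32c_raw() {
  crc=$(( 0xFFFFFFFF ))
  for b in $(od -An -v -tu1 "$@"); do
    crc=$(( crc ^ b ))
    i=0
    while [ "$i" -lt 8 ]; do
      if [ $(( crc & 1 )) -eq 1 ]; then
        crc=$(( (crc >> 1) ^ 0x82F63B78 ))
      else
        crc=$(( crc >> 1 ))
      fi
      i=$(( i + 1 ))
    done
  done
  printf '%08x\n' "$crc"
}

# sb_csum_check DEVICE-OR-IMAGE
# returns 0 if the checksum matches (or the fs has no superblock checksum),
# 1 on a mismatch, 2 if no ext2/3/4 magic number is found.
sb_csum_check() {
  if [ "$(le_field "$1" 1080 2)" -ne $(( 0xEF53 )) ]; then   # 1024 + 0x38
    echo "no ext2/3/4 superblock found" >&2
    return 2
  fi
  if [ $(( $(le_field "$1" 1124 4) & 0x400 )) -eq 0 ]; then  # 1024 + 0x64
    echo "metadata_csum not enabled: superblock carries no checksum"
    return 0
  fi
  stored=$(printf '%08x' "$(le_field "$1" 2044 4)")            # 1024 + 0x3FC
  computed=$(crc32c_raw -j 1024 -N 1020 "$1")
  if [ "$stored" = "$computed" ]; then
    echo "superblock checksum OK ($stored)"
  else
    echo "superblock checksum mismatch: stored $stored, computed $computed"
    return 1
  fi
}
```

After sourcing the functions, `sb_csum_check /dev/nvme1n1p1` (the device from the test case) reports whether the stored and recomputed checksums agree. Note that `od` reads through the page cache, i.e. ordinary buffered I/O, which is the kind of read the upstream fix moved resize2fs away from.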
[Touch-packages] [Bug 2036467] Re: Resizing cloud-images occasionally fails due to superblock checksum mismatch in resize2fs
@juliank I'm just doing a little more testing for the moment, as I really want to make sure this isn't going to cause any issues in the cloud images. It would be nice to have this bug fixed, though: I have seen a few cases related to it over the years. I'll ask my SEG colleagues for help with sponsoring in a day or two.
[Touch-packages] [Bug 2036467] Re: Resizing cloud-images occasionally fails due to superblock checksum mismatch in resize2fs
Trusty and Xenial no longer receive bug updates via the main archive, only via Pro. You'll have to get SEG to add bug tasks for Pro and prepare +esm updates with them.

** Changed in: e2fsprogs (Ubuntu Trusty)
       Status: In Progress => Won't Fix

** Changed in: e2fsprogs (Ubuntu Xenial)
       Status: In Progress => Won't Fix
[Touch-packages] [Bug 2036467] Re: Resizing cloud-images occasionally fails due to superblock checksum mismatch in resize2fs
@mruffell, did you mean to get sponsoring for the patches? If so, you might want to subscribe ~ubuntu-sponsors so this can be merged by the patch pilots.
[Touch-packages] [Bug 2036467] Re: Resizing cloud-images occasionally fails due to superblock checksum mismatch in resize2fs
** Summary changed:

- superblock checksum mismatch in resize2fs
+ Resizing cloud-images occasionally fails due to superblock checksum mismatch in resize2fs

** Description changed:

- Hi,
- We run ext4 on EBS volumes on EC2. During provisioning, cloud-init will occasionally report that resize2fs has failed due to a superblock checksum mismatch. We debugged this internally, and were able to come up with the following reproducer:
+ [Impact]
+
+ This is a long running bug plaguing cloud-images, where on a rare
+ occasion resize2fs would fail and the image would not resize to fit the
+ entire disk.
+
+ Online resizes would fail due to a superblock checksum mismatch, where
+ the superblock in memory differs from what is currently on disk due to
+ changes made to the image.
+
+ Changing the read of the superblock to Direct I/O solves the issue.
+
+ [Testcase]
+
+ Start an c5.large instance on AWS, and attach a 60gb gp3 volume for use
+ as a scratch disk.
+
+ Run the following script, courtesy of Krister Johansen and his team:

  #!/usr/bin/bash
  set -euxo pipefail

  while true
  do
  parted /dev/nvme1n1 mklabel gpt mkpart primary 2048s 2099200s
  sleep .5
  mkfs.ext4 /dev/nvme1n1p1
  mount -t ext4 /dev/nvme1n1p1 /mnt
  stress-ng --temp-path /mnt -D 4 &
  STRESS_PID=$!
  sleep 1
  growpart /dev/nvme1n1 1
  resize2fs /dev/nvme1n1p1
  kill $STRESS_PID
  wait $STRESS_PID
  umount /mnt
  wipefs -a /dev/nvme1n1p1
  wipefs -a /dev/nvme1n1
  done

- (This was on a 60gb gp3 volume attached to a c5.4xlarge)
+ Test packages are available in the following ppa:

- We were able to find a fix that works and get the patch accepted
- upstream. The short explanation is that by switching the superblock
- read to direct io, we no longer see the problem.
+ https://launchpad.net/~mruffell/+archive/ubuntu/lp2036467-test

- The patch is available here, but hasn't been published in a released
- version of e2fsprogs:
+ If you install the test packages, the race no longer occurs.

- https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/commit/?id=43a498e938887956f393b5e45ea6ac79cc5f4b84
+ [Where problems could occur]

- A longer thread with the maintainer is available here:
+ We are changing how resize2fs reads the superblock from underlying
+ disks.

+ If a regression were to occur, resize2fs could fail to resize offline or
+ online volumes. As all cloud-images are online resized during their
+ initial boot, this could have a large impact to public and private
+ clouds should a regression occur.
+
+ [Other info]
+
+ Upstream mailing list discussion:
+
  https://lore.kernel.org/linux-ext4/20230605225221.ga5...@templeofstupid.com/
  https://lore.kernel.org/linux-ext4/20230609042239.ga1436...@mit.edu/

- This bug report is to request that Ubuntu backport this patch to the
- versions of e2fsprogs that are in releases that are available in images
- on AWS, preferably Focal and Jammy.
+ This was fixed in the below commit upstream:
+
+ commit 43a498e938887956f393b5e45ea6ac79cc5f4b84
+ Author: Theodore Ts'o
+ Date: Thu, 15 Jun 2023 00:17:01 -0400
+ Subject: resize2fs: use Direct I/O when reading the superblock for
+ online resizes
+ Link: https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/commit/?id=43a498e938887956f393b5e45ea6ac79cc5f4b84
+
+ The commit has not been tagged to any release. All supported Ubuntu
+ releases require this fix, and need to be published in standard non-ESM
+ archives to be picked up in cloud images.

** Tags added: sts
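The original report explains that switching the superblock read to direct I/O made the problem disappear, and the upstream commit makes resize2fs use Direct I/O for that read during online resizes. A loose user-space analogue is sketched below under stated assumptions: it reads the 4 KiB block holding the superblock once through the page cache and once with O_DIRECT (GNU `dd iflag=direct`) and compares the superblock bytes. The helper names are hypothetical, this is not resize2fs's code path, and whether the two reads differ on a busy, mounted filesystem depends on timing; on an idle device they would normally agree.

```shell
#!/bin/sh
# Sketch only: compare a buffered and an O_DIRECT read of the primary
# ext2/3/4 superblock (bytes 1024..2047 of the device or image).
# Assumptions: GNU dd (iflag=direct is a GNU extension), and that a
# 4096-byte read at offset 0 satisfies the device's O_DIRECT alignment.

# read_first_block DEVICE OUTFILE [direct]: copy the first 4 KiB of DEVICE
# into OUTFILE, bypassing the page cache when "direct" is given.
read_first_block() {
  if [ "${3:-}" = direct ]; then
    dd if="$1" of="$2" bs=4096 count=1 iflag=direct 2>/dev/null
  else
    dd if="$1" of="$2" bs=4096 count=1 2>/dev/null
  fi
}

# sb_hex FILE: hex dump of the superblock bytes (offset 1024, length 1024).
sb_hex() {
  od -An -v -tx1 -j 1024 -N 1024 "$1"
}

# compare_sb_reads DEVICE
# returns 0 if the buffered and direct reads agree, 1 if they differ,
# 2 if either read failed (e.g. O_DIRECT unsupported for this file).
compare_sb_reads() {
  dir=$(mktemp -d) || return 2
  if read_first_block "$1" "$dir/buffered" &&
     read_first_block "$1" "$dir/direct" direct; then
    if [ "$(sb_hex "$dir/buffered")" = "$(sb_hex "$dir/direct")" ]; then
      rc=0
    else
      echo "buffered and O_DIRECT superblock reads differ" >&2
      rc=1
    fi
  else
    rc=2
  fi
  rm -rf "$dir"
  return "$rc"
}
```

Running `compare_sb_reads /dev/nvme1n1p1` in a loop while the test case's stress-ng workload is active is one way to look for the divergence; O_DIRECT is used here because it avoids the page cache, which is the property the fix relies on.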