** Description changed:

+ [Impact]
+
+ With mmap()ed files on ext4's data journaling it's possible to change
+ a mapped page's buffers contents during their jbd2 transaction commit
+ (as currently nothing prevents/blocks the write access at that time.)
+
+ This might happen between the buffers checksum calculation and actual
+ write to journal, so the (old) checksum is invalid for the (new) data.
+
+ If the system crashes after that, but before such journal entry makes
+ it to the filesystem, the journal replay on the next mount just fails,
+ and the filesystem now requires fsck. (apparently curtin might set up
+ /etc/fstab with passno=0, requiring manual intervention.)
+
+     [39751.096455] EXT4-fs: Warning: mounting with data=journal disables delayed allocation and O_DIRECT support!
+     [39751.114435] JBD2: Invalid checksum recovering block 87305 in log
+     [39751.146133] JBD2: Invalid checksum recovering block 88039 in log
+     [39751.195950] JBD2: Invalid checksum recovering block 49633 in log
+     [39751.265158] JBD2: recovery failed
+     [39751.265163] EXT4-fs (vdc): error loading journal
+
+ [Fix]
+
+ The fix is to write-protect the pages during journal transaction commit,
+ so that writes to mapped pages hit a page fault, then ext4's page_mkwrite
+ hook can block until the commit finishes and the buffers can be modified.
+
+ In order to do that, add jbd2 journal callbacks that the filesystems can
+ customize, called before/after the critical region in transaction commit,
+ then have ext4 in data journaling mode to write-protect the pages whose
+ buffers are being committed (and handle cases that need pages redirtied.)
+
+ The changes are restricted to the data journaling mode and page_mkwrite
+ hook, and other modes/paths use the same code/behavior in the callbacks.
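
The ordering problem described under [Impact] can be illustrated with a
userspace analogy. This is only a sketch: it uses ordinary files and
cksum(1), not jbd2 code, and the file names "buffer" and "journal_block"
are made up for the illustration. The checksum is taken first, the data
changes afterwards, and the changed data is what gets written out, so a
later verification against the stored checksum fails.

```shell
#!/bin/sh
# Userspace analogy only: plain files stand in for the page buffer and
# the journal block. None of this is kernel or jbd2 code.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# The buffer as it looks when the commit starts.
printf 'old data\n' > "$tmp/buffer"

# Step 1: the commit computes the checksum of the buffer.
sum_at_commit=$(cksum < "$tmp/buffer")

# Step 2: a writer changes the buffer before it is written out
# (in the bug, a store through the mmap()ed page).
printf 'new data\n' > "$tmp/buffer"

# Step 3: the (new) data is written to the journal with the (old) checksum.
cp "$tmp/buffer" "$tmp/journal_block"

# Replay: recompute the checksum of the journal block and compare.
sum_at_replay=$(cksum < "$tmp/journal_block")
if [ "$sum_at_commit" = "$sum_at_replay" ]; then
    race_result=valid
else
    race_result=invalid
fi
echo "checksum at replay: $race_result"
```

In this analogy, blocking step 2 until step 3 has finished is what the
write-protection in [Fix] achieves for the mapped pages.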
+
+ [Test Case]
+
+ Set up an ext4 filesystem in data journaling mode, and run stress-ng's
+ mmap file test on it, then crash the system after a bit; check whether
+ the filesystem can mount again or not (i.e., with jbd2 checksum errors.)
+
+     # mkfs.ext4 $DEV
+     # mount -o data=journal $DEV $DIR
+     # cd $DIR
+     # stress-ng --mmap $((4*$(nproc))) --mmap-file &
+     # sleep 60
+     # echo c >/proc/sysrq-trigger
+     ...
+     # mount -o data=journal $DEV $DIR # PASS/FAIL.
+     # dmesg | tail
+
+ [Regression Potential]
+
+ Regressions would likely manifest in ext4 data journaling mode (which
+ is not the default mode, 'ordered') with memory mapped access, as the
+ other modes/paths are largely unaffected by the changes/same behavior.
+
+ This has been tested with (x)fstests, that showed no regressions on
+ data=ordered and data=journal on both Bionic and Focal (with kernel
+ versions 4.15.0-156-generic and 5.4.0-84-generic) w/in 10 runs each.
+ And the stress-ng test-case as well. (Numbers/details in the LP bug.)
+
+ [Other info]
+
+ The patchset is applied on 5.10, so Hirsute (5.11) is already fixed;
+ only Focal and Bionic need it.
+
+ There are little changes in the patches between Focal and Bionic
+ (mostly minor backport adjustments, mainly due to no vm_fault_t)
+ but unfortunately that needs separate versions for most patches.
+
+ ...
+
+
+ [Original Bug Description]
+
  [Impact]
  In the event of a loss of power, ext4 filesystems mounted w/
  data=journal,journal_checksum are subject to a corruption issue that
  requires a fsck to recover. This is exacerbated by installations by
  curtin that set passno=0 in /etc/fstab, preventing fsck from running
  automatically and thus requiring a manual recovery. And *that* is
  further exacerbated because initramfs-tools is smart enough to not
  include fsck.ext4 when passno=0 is detected in /etc/fstab, requiring
  the user to boot from recovery media.

  [Test Case]
  Forcibly power cycle a system running 'stress-ng --dir 0'.
  I've created a package to automate the reproduction:
    https://git.launchpad.net/~dannf/+git/dgx2-ext4-csum-repro?h=master

  [Fix]

  [Regression Risk]
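
The PASS/FAIL step of the test case could be scripted roughly as follows.
This is a minimal sketch, not part of the reproducer package: the function
name check_journal_log is hypothetical, and the patterns are taken only
from the log excerpt quoted in the description, so other kernel versions
may word these messages differently.

```shell
#!/bin/sh
# Hypothetical helper: reads kernel log text on stdin and prints FAIL
# (returning 1) if it contains any of the journal recovery errors quoted
# in the bug description, otherwise prints PASS.
check_journal_log() {
    # grep reads the function's stdin; -q suppresses the matching lines.
    if grep -Eq 'JBD2: (Invalid checksum recovering block|recovery failed)|EXT4-fs .*error loading journal'; then
        echo FAIL
        return 1
    fi
    echo PASS
}
```

After the remount attempt, it might be used as
"dmesg | tail -n 50 | check_journal_log"; the line count of 50 is an
arbitrary choice for the example.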
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1847340

Title:
  ext4 journal recovery fails w/ data=journal + journal_checksum + mmap
