Bug#984472: e2fsprogs: resize2fs on inconsistent filesystem (in a way that e2fsck doesn't detect) completely corrupts MMP data (in a way e2fsck can't recover from)
On Sat, Mar 06, 2021 at 11:07:53PM -0500, Theodore Ts'o wrote:
> 1. e2image -r rbd13.e2i.qcow2 /tmp/rbd13
> 2. truncate -s 10T /tmp/rbd13
> 3. e2fsck -fy /tmp/rbd13
> 4. resize2fs /tmp/rbd13
> 5. e2fsck -fy /tmp/rbd13
>

(FTR: the report specified these instructions almost exactly (the first
fsck step is noted as optional, since the corruption happened regardless
thereof))

> The bug report was also incorrect by saying that resizing the file
> system to 4T was sufficient; that is not true.

I must've conflated the results during testing somehow, apologies.

> I suppose we can add some "are you sure? ARE YOU REALLY SURE?"
> question to "tune2fs -f -E clear_mmp", [...]
> Another workaround is to simply do an online resize (that is, on the
> node where the file system is mounted, run resize2fs on the mounted
> file system).

This was all performed on a snapshot, which by definition is not mounted
on any node, even if it had been when snapped.

Indeed, there was no possibility of another node in my local non-ceph
testing, purely because it was a local, unexported zvol, and I'm rather
sure it wasn't mounted at any point when e2fsck/resize2fs were running,
hence why I obliged to clearing the MMP in the first place.

I've tried the patches and they appear to work perfectly, and I can no
longer trigger this in any configuration in either environment.

Thanks!
наб
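For readers following the 4T versus 10T distinction: Ted's explanation elsewhere in the thread is that the bug only triggers once the block group descriptor table can no longer grow in place. The arithmetic behind the size of that table can be sketched roughly as below. This is only an illustration under assumptions that the thread does not state: 4 KiB blocks, 32768 blocks per group, and 64-byte descriptors by default (pass 32 as the second argument for a file system without the 64bit feature). The helper names gdt_blocks and blocks_for_tib are made up for this sketch.

```shell
#!/bin/sh
# Sketch: how many blocks the block group descriptor table occupies at a
# given file system size.
# ASSUMPTIONS (not stated in the thread): 4 KiB blocks, 32768 blocks per
# group, 64-byte group descriptors unless another size is passed in.
BLOCK_SIZE=4096
BLOCKS_PER_GROUP=32768

# gdt_blocks TOTAL_BLOCKS [DESC_SIZE]
# Prints the number of descriptor blocks, rounding up at each step.
gdt_blocks() {
    total_blocks=$1
    desc_size=${2:-64}
    groups=$(( (total_blocks + BLOCKS_PER_GROUP - 1) / BLOCKS_PER_GROUP ))
    descs_per_block=$(( BLOCK_SIZE / desc_size ))
    echo $(( (groups + descs_per_block - 1) / descs_per_block ))
}

# blocks_for_tib N
# Prints the number of BLOCK_SIZE blocks in N TiB.
blocks_for_tib() {
    echo $(( $1 * 1024 * 1024 * 1024 * 1024 / BLOCK_SIZE ))
}

for tib in 1 4 10; do
    printf '%2d TiB: %s descriptor blocks\n' \
        "$tib" "$(gdt_blocks "$(blocks_for_tib "$tib")")"
done
```

Under these assumptions the table needs 128 blocks at 1 TiB (which matches the 268435456 blocks e2fsck reports for the image elsewhere in the thread), 512 at 4 TiB and 1280 at 10 TiB. Whether a given target still fits in the space set aside by the resize inode depends on the reserved descriptor block count recorded in the superblock, which the thread does not quote, so the script deliberately stops short of predicting where the threshold lies.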
Bug#984472: e2fsprogs: resize2fs on inconsistent filesystem (in a way that e2fsck doesn't detect) completely corrupts MMP data (in a way e2fsck can't recover from)
On Fri, Mar 05, 2021 at 09:48:50PM -0500, Theodore Ts'o wrote:
> I can't reproduce the problem given your file system image.  Given
> your description, this is almost certainly operator error.

OK, I was finally able to reproduce the problem, but not using your
reproduction instructions.  I reproduced it via:

1. e2image -r rbd13.e2i.qcow2 /tmp/rbd13
2. truncate -s 10T /tmp/rbd13
3. e2fsck -fy /tmp/rbd13
4. resize2fs /tmp/rbd13
5. e2fsck -fy /tmp/rbd13

The bug report was also incorrect by saying that resizing the file
system to 4T was sufficient; that is not true.  It can only be
reproduced when a file system is resized sufficiently large that there
is no longer enough room to grow the block group descriptors without
moving the allocation bitmaps and/or inode table out of the way in
order to create room for the block group descriptors.  (As in the above
reproduction recipe.)

% e2image -r rbd13.e2i.qcow2 /tmp/rbd13
e2image 1.46.2 (28-Feb-2021)
% truncate -s 4T /tmp/rbd13
% resize2fs /tmp/rbd13
resize2fs 1.46.2 (28-Feb-2021)
Please run 'e2fsck -f /tmp/rbd13' first.

% e2fsck -f /tmp/rbd13
e2fsck 1.46.2 (28-Feb-2021)
e2fsck: MMP: e2fsck being run while checking MMP block
MMP check failed: If you are sure the filesystem is not in use on any node, run:
'tune2fs -f -E clear_mmp /tmp/rbd13'

MMP_block:
    mmp_magic: 0x4d4d50
    mmp_check_interval: 10
    mmp_sequence: e24d4d50
    mmp_update_date: Sat Mar  6 22:47:25 2021
    mmp_update_time: 1615088845
    mmp_node_name: cwcc
    mmp_device_name: /tmp/rbd13

/tmp/rbd13: ** WARNING: Filesystem still has errors **

This is a separate bug.  The issue here is that resize2fs is exiting
after printing the "Please run 'e2fsck -f /tmp/rbd13' first." message
without cleanly stopping (resetting) the MMP protection.
And this doesn't lead to file system corruption, as we can see here:

% tune2fs -f -E clear_mmp /tmp/rbd13
tune2fs 1.46.2 (28-Feb-2021)
% e2fsck -fy /tmp/rbd13
e2fsck 1.46.2 (28-Feb-2021)
Clearing orphaned inode 45617124 (uid=107, gid=115, mode=0100600, size=16777216)
Clearing orphaned inode 15073744 (uid=0, gid=0, mode=0100644, size=593696)
Clearing orphaned inode 15073743 (uid=0, gid=0, mode=0100644, size=3031904)
Clearing orphaned inode 50331709 (uid=0, gid=0, mode=0100644, size=149704)
Clearing orphaned inode 50332495 (uid=0, gid=0, mode=0100755, size=231560)
Clearing orphaned inode 50332319 (uid=0, gid=0, mode=0100644, size=2670992)
Clearing orphaned inode 50332271 (uid=0, gid=0, mode=0100644, size=651472)
Clearing orphaned inode 50332251 (uid=0, gid=0, mode=0100644, size=282752)
Clearing orphaned inode 13 (uid=0, gid=0, mode=0100600, size=0)
Pass 1: Checking inodes, blocks, and sizes
Inode 46530577 extent tree (at level 1) could be shorter.  Optimize? yes

Inode 46530714 extent tree (at level 1) could be shorter.  Optimize? yes

Pass 1E: Optimizing extent trees
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

/tmp/rbd13: * FILE SYSTEM WAS MODIFIED *
/tmp/rbd13: 225913/67108864 files (8.6% non-contiguous), 185961407/268435456 blocks

The workaround to the first bug is to clear the MMP feature, do the
offline resize, and then enable the MMP feature again.  Of course,
while the MMP feature is disabled, you won't be protected from another
node trying to modify the file system.  But it will allow you to grow
the file system.

Another workaround is to simply do an online resize (that is, on the
node where the file system is mounted, run resize2fs on the mounted
file system).  However, depending on the kernel version, it won't allow
you to resize past the limits of the block group descriptors reserved
by the resize inode.

					- Ted
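The first workaround above (drop MMP, resize offline, turn MMP back on) might look roughly like the following. This is a sketch, not something taken from the thread: the helper names run and offline_resize_without_mmp are invented here, the e2fsck step is included only because resize2fs asks for it in the transcript above, and the script defaults to a dry run that merely prints the commands. It should only be considered when the device is certain to be unmounted everywhere, since nothing guards the file system while MMP is off.

```shell
#!/bin/sh
# Sketch of the "disable MMP, resize offline, re-enable MMP" workaround.
# DRY_RUN defaults to 1, in which case commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}

# run CMD [ARGS...]: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# offline_resize_without_mmp DEVICE   (hypothetical helper name)
# Stops at the first failing step, so MMP is not re-enabled after a
# failed resize without the operator looking at what went wrong.
offline_resize_without_mmp() {
    dev=$1
    run tune2fs -O '^mmp' "$dev" &&  # switch the MMP feature off
    run e2fsck -f "$dev" &&          # resize2fs wants a fresh forced check
    run resize2fs "$dev" &&          # grow to fill the underlying device
    run tune2fs -O mmp "$dev"        # switch the MMP feature back on
}

offline_resize_without_mmp /tmp/rbd13
```

Setting DRY_RUN=0 in the environment makes the same function execute the four commands instead of printing them.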
Bug#984472: e2fsprogs: resize2fs on inconsistent filesystem (in a way that e2fsck doesn't detect) completely corrupts MMP data (in a way e2fsck can't recover from)
I can't reproduce the problem given your file system image.  Given
your description, this is almost certainly operator error.

> $ e2fsck -f /dev/zvol/filling/store/nabijaczleweli/e2test
> e2fsck 1.46.2 (28-Feb-2021)
> e2fsck: MMP: e2fsck being run while checking MMP block
> MMP check failed: If you are sure the filesystem is not in use on any node,
> run:
> 'tune2fs -f -E clear_mmp /dev/zvol/filling/store/nabijaczleweli/e2test'
>
> MMP_block:
>     mmp_magic: 0x4d4d50
>     mmp_check_interval: 10
>     mmp_sequence: e24d4d50
>     mmp_update_date: Wed Mar  3 14:51:38 2021
>     mmp_update_time: 1614779498
>     mmp_node_name: tarta
>     mmp_device_name: /dev/zvol/filling/store/nabijacz

What this message means is that some *other* node was trying to run
fsck on the node at the same time as your e2fsck run.  The key in this
message is

    MMP check failed: If you are sure the filesystem is not in use on any node, run:

When you then run

    'tune2fs -f -E clear_mmp /dev/zvol/filling/store/nabijaczleweli/e2test'

it forcibly clears the MMP block.  The MMP protects the file system
from simultaneous modification by more than one system.  Given that you
had this file system on some kind of remote block device (which
presumably is why you were using the multi-mount protection feature in
the first place), forcibly overriding the MMP protection is a bad
thing.  It's the functional equivalent of turning off the gun's safety,
and then aiming the gun at your foot, and pulling the trigger.

>
> $ tune2fs -f -E clear_mmp /dev/zvol/filling/store/nabijaczleweli/e2test
> tune2fs 1.46.2 (28-Feb-2021)
>
> $ e2fsck -fy /dev/zvol/filling/store/nabijaczleweli/e2test
> e2fsck 1.46.2 (28-Feb-2021)
> ext2fs_open2: Superblock checksum does not match superblock
> e2fsck: Superblock invalid, trying backup blocks...
> Superblock has invalid MMP magic.  Fix? yes

The bad checksum and the invalid MMP magic mean that some other system
was also modifying the file system while tune2fs was clearing the MMP
block.
If I run the same set of commands in your logs after unpacking your
image:

% unzstd rbd13.e2i.qcow2.zst
% e2image -r rbd13.e2i.qcow2 rbd13
% truncate -s 4T rbd13      # This "expands" the file to 4TB

... and then running the same set of commands using "rbd13" instead of
"/dev/zvol/filling/store/...", it works just fine.  But on the local
file, obviously there won't be any other system or node modifying the
file system, and there is no corruption.

I do see some problems in how resize2fs handles MMP devices.  In
particular, resize2fs doesn't check the MMP block.  So if there is some
other process messing with the file system, the result can be file
system corruption.  But you really shouldn't have been trying to resize
the file system if someone else is trying to use the file system.  And
even if you do this, if you run "tune2fs -f -E clear_mmp ..." and
something else is using the file system, you're going to be doomed,
anyway.

I suppose we can add some "are you sure? ARE YOU REALLY SURE?"
question to "tune2fs -f -E clear_mmp", and maybe even force the user to
type the string "I AGREE TO ASSUME RESPONSIBILITY FOR FILE SYSTEM
DESTRUCTION IF OTHER NODES ARE USING, MOUNTING, OR MODIFYING THE FILE
SYSTEM", but at the end of the day, there is only so much we can do to
protect against PEBCAK failures.

					- Ted
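A note on reading the MMP_block dump quoted in this message: the mmp_sequence field distinguishes a cleanly released file system from one that a checking tool has marked as in use. The sketch below classifies a sequence value. It relies on the constants as I recall them from the e2fsprogs headers (0xff4d4d50 for "clean", 0xe24d4d50 for "fsck in progress"), so treat those two values as an assumption to verify against your own copy of the headers; the function names are invented for this example, and any other value is simply reported as a live sequence number without further validation.

```shell
#!/bin/sh
# Sketch: interpret the mmp_sequence field from an MMP_block dump.
# ASSUMPTION: ff4d4d50 marks a clean MMP block and e24d4d50 marks one
# claimed by e2fsck or a similar tool, as recalled from the e2fsprogs
# headers. Anything else is treated here as a live sequence number.

# mmp_seq_from_dump: read a dump on stdin, print the mmp_sequence value.
mmp_seq_from_dump() {
    sed -n 's/^[[:space:]]*mmp_sequence:[[:space:]]*//p'
}

# classify_mmp_seq HEX: print "clean", "fsck" or "active".
classify_mmp_seq() {
    seq=$(printf '%s' "$1" | tr 'A-F' 'a-f')  # lowercase the hex digits
    seq=${seq#0[xX]}                           # drop an optional 0x prefix
    case "$seq" in
        ff4d4d50) echo "clean" ;;
        e24d4d50) echo "fsck" ;;
        *)        echo "active" ;;
    esac
}

# Example input: two fields copied from the dump quoted in this message.
dump='MMP_block:
    mmp_check_interval: 10
    mmp_sequence: e24d4d50'

classify_mmp_seq "$(printf '%s\n' "$dump" | mmp_seq_from_dump)"
```

A result of "fsck" only says that some tool set the marker and nobody has reset it since; it cannot tell a tool that is still running on another node from one that exited without cleaning up, which is the case Ted describes for resize2fs in his follow-up.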