Bug#984472: e2fsprogs: resize2fs on inconsistent filesystem (in a way that e2fsck doesn't detect) completely corrupts MMP data (in a way e2fsck can't recover from)

2021-03-09 Thread наб
On Sat, Mar 06, 2021 at 11:07:53PM -0500, Theodore Ts'o wrote:
> 1.  e2image -r rbd13.e2i.qcow2 /tmp/rbd13
> 2.  truncate -s 10T /tmp/rbd13
> 3.  e2fsck -fy /tmp/rbd13
> 4.  resize2fs /tmp/rbd13
> 5.  e2fsck -fy /tmp/rbd13
>   
(FTR: the report specified these instructions almost exactly
  (the first fsck step is noted as optional,
   since the corruption happened regardless thereof))

> The bug report was also incorrect by saying that resizing the file
> system to 4T was sufficient; that is not true.
I must've conflated the results during testing somehow, apologies.

> I suppose we can add some "are you sure?  ARE YOU REALLY SURE?"
> question to "tune2fs -f -E clear_mmp", [...]

> Another workaround is to simply do an online resize (that is, on the
> node where the file system is mounted, run resize2fs on the mounted
> file system).
This was all performed on a snapshot, which by definition is not mounted
on any node, even if it had been when snapped.

Indeed, there was no possibility of another node in my local non-ceph
testing, purely because it was a local, unexported zvol,
and I'm rather sure it wasn't mounted at any point when e2fsck/resize2fs
were running, which is why I went ahead and cleared the MMP in the first place.

I've tried the patches and they appear to work perfectly,
and I can no longer trigger this in any configuration
in either environment.

Thanks!
наб




Bug#984472: e2fsprogs: resize2fs on inconsistent filesystem (in a way that e2fsck doesn't detect) completely corrupts MMP data (in a way e2fsck can't recover from)

2021-03-06 Thread Theodore Ts'o
On Fri, Mar 05, 2021 at 09:48:50PM -0500, Theodore Ts'o wrote:
> I can't reproduce the problem given your file system image.  Given
> your description, this is almost certainly operator error.

OK, I was finally able to reproduce the problem, but not using your
reproduction instructions.  I reproduced it via:

1.  e2image -r rbd13.e2i.qcow2 /tmp/rbd13
2.  truncate -s 10T /tmp/rbd13
3.  e2fsck -fy /tmp/rbd13
4.  resize2fs /tmp/rbd13
5.  e2fsck -fy /tmp/rbd13
  

The bug report was also incorrect by saying that resizing the file
system to 4T was sufficient; that is not true.  It can only be
reproduced when a file system is resized to be sufficiently large that
there is no longer enough room to grow the block group descriptors
without moving the allocation bitmaps and/or inode table out of the
way in order to create room for the block group descriptors.  (As in
the above reproduction recipe.)
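
To get a feel for why 10T behaves differently from 4T, here is a rough
sketch of how many blocks the group descriptor table needs at each size.
The geometry is an assumption, not something stated in this thread:
4096-byte blocks, 32768 blocks per group, and 64-byte descriptors.  The
starting block count is the one e2fsck reports for the image in this
message.  How many descriptor blocks the resize inode actually reserves
on this image is not given here, so this only shows how the demand grows;
gdt_blocks is a made-up helper name.

```shell
# gdt_blocks FS_BLOCKS: blocks needed by the group descriptor table.
# Assumed geometry (not stated in the thread): 4096-byte blocks,
# 32768 blocks per group, 64-byte group descriptors.
gdt_blocks() {
    fs_blocks=$1
    block_size=4096
    blocks_per_group=32768
    desc_size=64
    # Round up: a partial last group still needs a descriptor.
    groups=$(( (fs_blocks + blocks_per_group - 1) / blocks_per_group ))
    desc_per_block=$(( block_size / desc_size ))
    # Round up again: a partially filled descriptor block is still a block.
    echo $(( (groups + desc_per_block - 1) / desc_per_block ))
}

tib_blocks=268435456   # block count from the e2fsck output (1 TiB at 4 KiB)
for tib in 1 4 10; do
    echo "${tib}T: $(gdt_blocks $(( tib * tib_blocks ))) descriptor blocks"
done
```

Under those assumptions this prints 128, 512 and 1280 descriptor blocks
for 1T, 4T and 10T respectively.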

% e2image -r rbd13.e2i.qcow2 /tmp/rbd13
e2image 1.46.2 (28-Feb-2021)
% truncate -s 4T /tmp/rbd13
% resize2fs  /tmp/rbd13
resize2fs 1.46.2 (28-Feb-2021)
Please run 'e2fsck -f /tmp/rbd13' first.
% e2fsck -f /tmp/rbd13
e2fsck 1.46.2 (28-Feb-2021)
e2fsck: MMP: e2fsck being run while checking MMP block
MMP check failed: If you are sure the filesystem is not in use on any node, run:
'tune2fs -f -E clear_mmp /tmp/rbd13'
MMP_block:
mmp_magic: 0x4d4d50
mmp_check_interval: 10
mmp_sequence: e24d4d50
mmp_update_date: Sat Mar  6 22:47:25 2021
mmp_update_time: 1615088845
mmp_node_name: cwcc
mmp_device_name: /tmp/rbd13

/tmp/rbd13: ** WARNING: Filesystem still has errors **
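
For reference, the mmp_sequence value in the dump above is one of the
special values defined in the ext4 headers: 0xe24d4d50 is
EXT4_MMP_SEQ_FSCK (a tool has the file system open), and 0xff4d4d50 is
EXT4_MMP_SEQ_CLEAN (cleanly released).  The magic 0x4d4d50 is just "MMP"
in ASCII.  A small sketch for classifying the value follows;
mmp_seq_state is a hypothetical helper, not part of e2fsprogs, and
anything other than the two special values is lumped together as "other".

```shell
# mmp_seq_state HEX: classify an mmp_sequence value as printed in the dump.
# Hypothetical helper for illustration; not an e2fsprogs command.
mmp_seq_state() {
    # Lowercase the hex digits and drop an optional 0x/0X prefix.
    seq=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    seq=${seq#0[xX]}
    case "$seq" in
        ff4d4d50) echo clean ;;   # EXT4_MMP_SEQ_CLEAN
        e24d4d50) echo fsck ;;    # EXT4_MMP_SEQ_FSCK
        *)        echo other ;;   # ordinary sequence number or invalid
    esac
}

mmp_seq_state e24d4d50
```

For the value in the dump this reports "fsck", which matches the
"e2fsck being run" wording of the error message.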

This is a separate bug.  The issue here is that resize2fs is exiting
after printing the "Please run 'e2fsck -f /tmp/rbd13' first." without
cleanly stopping (resetting) the MMP protection.  And this doesn't
lead to file system corruption, as we can see here:

% tune2fs -f -E clear_mmp /tmp/rbd13
tune2fs 1.46.2 (28-Feb-2021)
% e2fsck -fy /tmp/rbd13
e2fsck 1.46.2 (28-Feb-2021)
Clearing orphaned inode 45617124 (uid=107, gid=115, mode=0100600, size=16777216)
Clearing orphaned inode 15073744 (uid=0, gid=0, mode=0100644, size=593696)
Clearing orphaned inode 15073743 (uid=0, gid=0, mode=0100644, size=3031904)
Clearing orphaned inode 50331709 (uid=0, gid=0, mode=0100644, size=149704)
Clearing orphaned inode 50332495 (uid=0, gid=0, mode=0100755, size=231560)
Clearing orphaned inode 50332319 (uid=0, gid=0, mode=0100644, size=2670992)
Clearing orphaned inode 50332271 (uid=0, gid=0, mode=0100644, size=651472)
Clearing orphaned inode 50332251 (uid=0, gid=0, mode=0100644, size=282752)
Clearing orphaned inode 13 (uid=0, gid=0, mode=0100600, size=0)
Pass 1: Checking inodes, blocks, and sizes
Inode 46530577 extent tree (at level 1) could be shorter.  Optimize? yes

Inode 46530714 extent tree (at level 1) could be shorter.  Optimize? yes

Pass 1E: Optimizing extent trees
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

/tmp/rbd13: * FILE SYSTEM WAS MODIFIED *
/tmp/rbd13: 225913/67108864 files (8.6% non-contiguous), 185961407/268435456 blocks


The workaround to the first bug is to clear the MMP feature, do the
offline resize, and then enable the MMP feature again.  Of course,
while the MMP feature is disabled you won't be protected from another
node trying to modify the file system.  But it will allow you to grow
the file system.
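
The workaround above could be sketched roughly as follows.  The function
only prints the commands rather than running them, so nothing is touched
until you have confirmed that no other node is using the device.  The
function name is hypothetical, and the e2fsck step is an assumption that
mirrors the reproduction recipe (resize2fs asks for it anyway); mmp is
toggled with tune2fs's -O option.

```shell
# print_offline_resize_plan DEVICE: print (do not run) the workaround steps.
# Hypothetical helper; review the output before running any of it.
print_offline_resize_plan() {
    dev=$1
    printf '%s\n' \
        "tune2fs -O ^mmp $dev" \
        "e2fsck -f $dev" \
        "resize2fs $dev" \
        "tune2fs -O mmp $dev"
}

print_offline_resize_plan /tmp/rbd13
```

Between the first and last command the file system has no multi-mount
protection at all, so the whole sequence is only safe on a device that
is not reachable from any other node.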

Another workaround is to simply do an online resize (that is, on the
node where the file system is mounted, run resize2fs on the mounted
file system).  However, depending on the kernel version, it won't
allow you to resize past the limit of the block group descriptors
reserved by the resize inode.
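
An online resize only makes sense if the file system really is mounted on
the node where resize2fs runs.  A minimal sketch of that check, assuming a
/proc/mounts-style table and that the device shows up there under the same
path you pass in (symlinked or device-mapper names may differ);
is_mounted is a hypothetical helper name.

```shell
# is_mounted DEVICE [MOUNTS_FILE]: succeed if DEVICE is listed as a mount
# source.  MOUNTS_FILE defaults to /proc/mounts.  Hypothetical helper.
is_mounted() {
    dev=$1
    mounts=${2:-/proc/mounts}
    # The first field of each line is the mount source.
    while read -r src _rest; do
        [ "$src" = "$dev" ] && return 0
    done < "$mounts"
    return 1
}

dev=/dev/example   # placeholder device path
if is_mounted "$dev"; then
    echo "would run: resize2fs $dev"
else
    echo "$dev is not mounted here; an online resize is not possible"
fi
```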

- Ted



Bug#984472: e2fsprogs: resize2fs on inconsistent filesystem (in a way that e2fsck doesn't detect) completely corrupts MMP data (in a way e2fsck can't recover from)

2021-03-05 Thread Theodore Ts'o
I can't reproduce the problem given your file system image.  Given
your description, this is almost certainly operator error.

> $ e2fsck -f   /dev/zvol/filling/store/nabijaczleweli/e2test
> e2fsck 1.46.2 (28-Feb-2021)
> e2fsck: MMP: e2fsck being run while checking MMP block
> MMP check failed: If you are sure the filesystem is not in use on any node, run:
> 'tune2fs -f -E clear_mmp /dev/zvol/filling/store/nabijaczleweli/e2test'
> MMP_block:
> mmp_magic: 0x4d4d50
> mmp_check_interval: 10
> mmp_sequence: e24d4d50
> mmp_update_date: Wed Mar  3 14:51:38 2021
> mmp_update_time: 1614779498
> mmp_node_name: tarta
> mmp_device_name: /dev/zvol/filling/store/nabijacz

What this message means is that some *other* node was trying to run
fsck on the file system at the same time as your e2fsck run.

The key in this message is 

MMP check failed: If you are sure the filesystem is not in use on any node, run:
  

When you then run 

'tune2fs -f -E clear_mmp /dev/zvol/filling/store/nabijaczleweli/e2test'

It forcibly clears the MMP block.  The MMP protects the file system
from simultaneous modification by more than one system.  Given that
you had this file system on some kind of remote block device
(which presumably is why you were using the multi-mount protection
feature in the first place), forcibly overriding the MMP protection is
a bad thing.  It's the functional equivalent of turning off the gun's
safety, and then aiming the gun at your foot, and pulling the trigger.

> 
> $ tune2fs -f -E clear_mmp /dev/zvol/filling/store/nabijaczleweli/e2test
> tune2fs 1.46.2 (28-Feb-2021)
> 
> $ e2fsck -fy   /dev/zvol/filling/store/nabijaczleweli/e2test
> e2fsck 1.46.2 (28-Feb-2021)
> ext2fs_open2: Superblock checksum does not match superblock
> e2fsck: Superblock invalid, trying backup blocks...
> Superblock has invalid MMP magic.  Fix? yes

The bad checksum and the invalid MMP magic mean that some other
system was also modifying the file system while tune2fs was clearing
the MMP block.

If I run the same set of commands in your logs after unpacking your image:

% unzstd rbd13.e2i.qcow2.zst
% e2image -r rbd13.e2i.qcow2 rbd13
% truncate -s 4T rbd13  # This "expands" the file to 4TB

... and then run the same set of commands using "rbd13" instead of
"/dev/zvol/filling/store/...", it works just fine.  But on the local
file, obviously there won't be any other system or node modifying the
file system, and there is no corruption.


I do see some problems in how resize2fs handles MMP devices.  In
particular resize2fs doesn't check the MMP block.  So if there is some
other process messing with the file system, the result can be file
system corruption.  But you really shouldn't have been trying to
resize the file system if someone else is trying to use the file
system.

And even if you do this, if you run "tune2fs -f -E clear_mmp ..." and
something else is using the file system, you're going to be doomed,
anyway.

I suppose we can add some "are you sure?  ARE YOU REALLY SURE?"
question to "tune2fs -f -E clear_mmp", and maybe even force the user
to type the string "I AGREE TO ASSUME RESPONSIBILITY FOR FILE SYSTEM
DESTRUCTION IF OTHER NODES ARE USING, MOUNTING, OR MODIFYING THE FILE
SYSTEM", but at the end of the day, there is only so much we can do
to protect against PEBCAK failures.

- Ted