C source to reproduce the problem as fast as possible.
The -r option will remove the files.
** Description changed:
+ == SRU, Bionic, Focal, Groovy, Hirsute, Impish ==
+
+ [Impact]
+
+ Creating millions of files on ext4 partition with large_dir support by
+ touching them will eventually trip an ext4 leaf node issue in the index
+ hash. This occurs more frequently when also using smaller block sizes
+ and ends up with either an EEXIST or EUCLEAN failure.
+
+ This occurs on the restart condition when performing do_split.
+
+ [ Fix ]
+
+ The fix protects do_split() from the restart condition, making it safe
+ from both current and future ordering of goto statements in earlier
+ sections of the code.
+
+ The fix is from a patch sent upstream and cc'd to Ted Tso but didn't
+ appear on the ext4 mailing list presumably because it got marked as
+ SPAM.
+
+ [ Test Case ]
+
+ Without the fix, touching tens of thousands of empty files will trip the
+ issue. It seems to occur more frequently with memory pressure and
+ smaller block sizes, e.g.:
+
+ sudo mkdir -p /mnt/tmpfs /mnt/storage
+ sudo mount -t tmpfs -o size=9000M tmpfs /mnt/tmpfs
+ sudo dd if=/dev/urandom of=/mnt/tmpfs/ext4.img bs=1M
+ sudo mkfs.ext4 -O large_dir -N 21000000 -O dir_index /mnt/tmpfs/ext4.img -b 1024 -F
+ sudo mount /mnt/tmpfs/ext4.img /mnt/storage
+
+ and compile and run the attached C program that quickly populates
+ /mnt/storage with empty files. Without the fix this will terminate with
+ an -EEXIST or -EUCLEAN error on the file creation after several tens of
+ thousands of files.
+
+ [Where problems could occur]
+
+ This changes the behaviour of the directory indexing hashing so there is
+ a regression potential that this may introduce subsequent index hashing
+ issues when needed (or not) to do a split. This patch seems to cover
+ all the necessary cases, so I believe this risk is relatively low. I
+ have also tested this on all the kernel series in the SRU with
+ 21,000,000 files so I am confident we have enough test coverage to show
+ the fix is OK.
+
+ ----------------------------------------------------------
+
I believe I have found a bug in ext4 in recent kernel versions.
I stumbled across this while I was trying to restore a backup to a new VM.
How to reproduce this bug:
1. Use a virtual/physical machine with "Ubuntu 18.04.5 LTS" and kernel
version 4.15.0-144-generic.
2. add a secondary disk to hold the test files.
3. prepare and mount the filesystem with enabled 'large_dir' flag:
mkfs.ext4 -m0 /dev/sdb1;
tune2fs -O large_dir /dev/sdb1;
mkdir /mnt/storage;
mount /dev/sdb1 /mnt/storage;
4. change to the directory and create approx. 16 million files
cd /mnt/storage;
i=0;
while (( $i < 20000000 )); do
- i=$(( $i + 1 ));
- (( $i % 1000 == 0 )) && echo $i;
- touch file_$i.dat || break;
+ i=$(( $i + 1 ));
+ (( $i % 1000 == 0 )) && echo $i;
+ touch file_$i.dat || break;
done
-
Expected behaviour:
- 20 million files should be created without error
What happened instead:
- The loop aborts with an error message:
# 16263100
# touch: cannot touch 'file_16263173.dat': Structure needs cleaning
- dmesg gives a little more details:
# [Mon Jun 21 03:15:18 2021] EXT4-fs error (device sdb): dx_probe:855: inode #2: block 146221: comm touch: directory leaf block found instead of index block
-
Additional notes:
- This occurs on kernel version 4.15.0-144-generic
- Not sure, but I believe one test was run on 4.15.0-143-generic and failed
too.
- Did not check against 4.15.0-142-generic
- On 4.15.0-141-generic, the problem does not exist. Behaviour is as expected.
** Also affects: linux (Ubuntu Impish)
Importance: Undecided
Status: New
** Also affects: linux (Ubuntu Focal)
Importance: Undecided
Status: New
** Also affects: linux (Ubuntu Groovy)
Importance: Undecided
Status: New
** Also affects: linux (Ubuntu Hirsute)
Importance: Undecided
Status: New
** Attachment added: "C source to touch millions of files"
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1933074/+attachment/5509402/+files/touch.c
** Description changed:
== SRU, Bionic, Focal, Groovy, Hirsute, Impish ==
[Impact]
Creating millions of files on ext4 partition with large_dir support by
touching them will eventually trip an ext4 leaf node issue in the index
hash. This occurs more frequently when also using smaller block sizes
and ends up with either an EEXIST or EUCLEAN failure.
This occurs on the restart condition when performing do_split.
[ Fix ]
The fix protects do_split() from the restart condition, making it safe
from both current and future ordering of goto statements in earlier
sections of the code.
The fix is from a patch sent upstream and cc'd to Ted Tso but didn't
appear on the ext4 mailing list presumably because it got marked as
SPAM.
[ Test Case ]
Without the fix, touching tens of thousands of empty files will trip the
issue. It seems to occur more frequently with memory pressure and
smaller block sizes, e.g.:
sudo mkdir -p /mnt/tmpfs /mnt/storage
sudo mount -t tmpfs -o size=9000M tmpfs /mnt/tmpfs
sudo dd if=/dev/urandom of=/mnt/tmpfs/ext4.img bs=1M
sudo mkfs.ext4 -O large_dir -N 21000000 -O dir_index /mnt/tmpfs/ext4.img -b 1024 -F
sudo mount /mnt/tmpfs/ext4.img /mnt/storage
- and compile and run the attached C program that quickly populates
- /mnt/storage with empty files. Without the fix this will terminate with
- an -EEXIST or -EUCLEAN error on the file creation after several tens of
- thousands of files.
+ and compile and run the attached C program (see
+ https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1933074/+attachment/5509402/+files/touch.c)
+ that quickly populates /mnt/storage with empty files. Without the fix
+ this will terminate with an -EEXIST or -EUCLEAN error on the file
+ creation after several tens of thousands of files.
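
For readers without access to the attachment, the idea behind the test program can be sketched as below. This is NOT the attached touch.c; it is a minimal stand-in under assumptions: the function names populate() and cleanup(), the file_<n>.dat naming (borrowed from the shell loop in the original report) and the 0644 mode are all hypothetical. cleanup() corresponds roughly to what the -r option is described as doing.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for the attached test program (not the real touch.c).
 * Creates `count` empty files named file_<n>.dat under `dir`.
 * Returns the number of files created. On the first failure it stops and
 * stores the errno value in *err; *err is 0 if every file was created.
 */
long populate(const char *dir, long count, int *err)
{
    char path[4096];
    long i;

    *err = 0;
    for (i = 1; i <= count; i++) {
        int fd;

        snprintf(path, sizeof(path), "%s/file_%ld.dat", dir, i);
        /* Like touch(1): create the file if missing, do not truncate. */
        fd = open(path, O_CREAT | O_WRONLY, 0644);
        if (fd < 0) {
            *err = errno;
            fprintf(stderr, "create %s failed: %s\n", path, strerror(*err));
            return i - 1;
        }
        close(fd);
    }
    return count;
}

/*
 * Removes the files populate() would have created under `dir`.
 * Files that do not exist are skipped. Returns the number removed.
 */
long cleanup(const char *dir, long count)
{
    char path[4096];
    long i, removed = 0;

    for (i = 1; i <= count; i++) {
        snprintf(path, sizeof(path), "%s/file_%ld.dat", dir, i);
        if (unlink(path) == 0)
            removed++;
    }
    return removed;
}
```

Calling populate("/mnt/storage", 21000000, &err) from a small main() would exercise the same path as the test case; on an affected kernel the expectation is that it returns early with err set to EEXIST or EUCLEAN.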
[Where problems could occur]
This changes the behaviour of the directory indexing hashing so there is
a regression potential that this may introduce subsequent index hashing
issues when needed (or not) to do a split. This patch seems to cover
all the necessary cases, so I believe this risk is relatively low. I
have also tested this on all the kernel series in the SRU with
21,000,000 files so I am confident we have enough test coverage to show
the fix is OK.
----------------------------------------------------------
I believe I have found a bug in ext4 in recent kernel versions.
I stumbled across this while I was trying to restore a backup to a new VM.
How to reproduce this bug:
1. Use a virtual/physical machine with "Ubuntu 18.04.5 LTS" and kernel
version 4.15.0-144-generic.
2. add a secondary disk to hold the test files.
3. prepare and mount the filesystem with enabled 'large_dir' flag:
mkfs.ext4 -m0 /dev/sdb1;
tune2fs -O large_dir /dev/sdb1;
mkdir /mnt/storage;
mount /dev/sdb1 /mnt/storage;
4. change to the directory and create approx. 16 million files
cd /mnt/storage;
i=0;
while (( $i < 20000000 )); do
i=$(( $i + 1 ));
(( $i % 1000 == 0 )) && echo $i;
touch file_$i.dat || break;
done
Expected behaviour:
- 20 million files should be created without error
What happened instead:
- The loop aborts with an error message:
# 16263100
# touch: cannot touch 'file_16263173.dat': Structure needs cleaning
- dmesg gives a little more details:
# [Mon Jun 21 03:15:18 2021] EXT4-fs error (device sdb): dx_probe:855: inode #2: block 146221: comm touch: directory leaf block found instead of index block
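
The "Structure needs cleaning" text printed by touch is the message for errno EUCLEAN. When scripting the reproducer it can help to tell the failures attributed to this bug apart from simply running out of space or inodes. A minimal sketch, assuming Linux errno values; the function name and the returned tags are hypothetical, and the ENOSPC case is an addition for illustration, not something reported in this bug:

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper: map the errno from a failed file creation to a short
 * tag. EUCLEAN and EEXIST are the two failures this bug report describes;
 * ENOSPC (out of blocks or inodes) is included only as a contrast case.
 */
const char *classify_create_error(int err)
{
    switch (err) {
    case 0:
        return "ok";
#ifdef EUCLEAN /* Linux-specific errno */
    case EUCLEAN:
        return "corrupt-index";
#endif
    case EEXIST:
        return "duplicate-entry";
    case ENOSPC:
        return "no-space";
    default:
        return "other";
    }
}
```

A result of "no-space" would point at the mkfs parameters (for example the -N inode count) rather than at the directory index.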
Additional notes:
- This occurs on kernel version 4.15.0-144-generic
- Not sure, but I believe one test was run on 4.15.0-143-generic and failed
too.
- Did not check against 4.15.0-142-generic
- On 4.15.0-141-generic, the problem does not exist. Behaviour is as expected.
https://bugs.launchpad.net/bugs/1933074
Title:
large_dir in ext4 broken