On Wed, 2023-12-06 at 09:18 -0500, Edward Zuniga wrote:
Thanks everyone for the feedback! I've learned so much from reading the 
discussions.

For our application, we will have a LAN with a single server (1TB RAID1 array 
for OS, 200TB RAID5 array for data) and up to 16 workstations (1TB RAID1 array 
for OS). Our IT department is more familiar with Rocky Linux 8, which I assume 
will perform the same as AlmaLinux 8. Some of our MRI processing can take weeks 
to finish, so we need a system that is very reliable. We also work with 
individual files in the hundreds of gigabytes.

While reading the Red Hat 8 manual
(https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/managing_file_systems/overview-of-available-file-systems_managing-file-systems),
I found a few possible issues regarding XFS. I'm curious to see if anyone has
experienced these as well.

1. Metadata error behavior
In ext4, you can configure the behavior when the file system encounters 
metadata errors. The default behavior is to simply continue the operation. When 
XFS encounters an unrecoverable metadata error, it shuts down the file system 
and returns the EFSCORRUPTED error.
This could be problematic for processing that takes several weeks.

In the rare cases where I've hit XFS metadata issues, xfs_repair has always been
able to save me. I've needed it maybe a dozen times in the last 20 years. The
repairs have been very fast in my experience - about a minute on an 8 TB volume.
Repair time seems to scale at roughly O(ln(n)) based on my research.
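
For what it's worth, here is a rough sketch of how you would check or set the
ext4 error behavior and do a no-modify XFS repair pass. The device names are
just placeholders for illustration:

  # ext4: check and set what happens on a metadata error
  # (continue, remount-ro, or panic); /dev/sdb1 is a placeholder
  tune2fs -l /dev/sdb1 | grep -i 'errors behavior'
  tune2fs -e remount-ro /dev/sdb1

  # XFS has no equivalent knob: on an unrecoverable metadata error it
  # shuts the filesystem down. After unmounting, a no-modify dry run
  # shows what xfs_repair would do without touching the disk:
  xfs_repair -n /dev/sdc1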

2. Inode numbers

The ext4 file system does not support more than 2^32 inodes.

XFS dynamically allocates inodes. An XFS file system cannot run out of inodes 
as long as there is free space on the file system.

Certain applications cannot properly handle inode numbers larger than 2^32 on an
XFS file system. These applications might cause 32-bit stat calls to fail with
the EOVERFLOW return value. Inode numbers exceed 2^32 under the following
conditions:

  *   The file system is larger than 1 TiB with 256-byte inodes.
  *   The file system is larger than 2 TiB with 512-byte inodes.

If your application fails with large inode numbers, mount the XFS file system 
with the -o inode32 option to enforce inode numbers below 2^32. Note that using
inode32 does not affect inodes that are already allocated with 64-bit numbers.

To be honest, I've only ever hit issues with 64-bit inodes on 32-bit kernels.
I'm not sure I've truly stressed the space, but I've got some pretty decent
volumes (30 TB) with a whole lot of files and haven't hit any issues. Your
mileage may vary.
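
If you do want to check for or guard against this, something along these lines
should work. The device and mount point are placeholders for illustration:

  # mount (or add to /etc/fstab) with inode32 to keep new inode
  # numbers below 2^32; /dev/sdc1 and /data are placeholders
  mount -o inode32 /dev/sdc1 /data
  # fstab equivalent:
  #   /dev/sdc1  /data  xfs  defaults,inode32  0 0

  # list any existing files whose inode number already exceeds 2^32 - 1
  find /data -xdev -inum +4294967295 -print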

Has anyone encountered this issue?

3. File system repair

The Red Hat 8 manual also warns that using xfs_repair -L might cause
significant file system damage and data loss and should only be used as a last 
resort. The manual does not mention a similar warning about using e2fsck to 
repair an ext4 file system. Has anyone experienced issues repairing a corrupt 
XFS file system?


xfs_repair -L is fairly scary as it zeros out the transaction log.  I'd reach 
out to the XFS folks before running it.  I've only needed a normal xfs_repair 
in the past, and that pretty infrequently.
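
The usual order of operations, as I understand it, goes roughly like this (the
device name is a placeholder):

  # 1. try a normal mount first - mounting replays the XFS log and
  #    often clears the problem on its own
  mount /dev/sdc1 /mnt && umount /mnt

  # 2. no-modify dry run to see what a repair would change
  xfs_repair -n /dev/sdc1

  # 3. normal repair (filesystem must be unmounted)
  xfs_repair /dev/sdc1

  # 4. last resort only, ideally after checking with the XFS list:
  #    -L zeroes the log and can lose the most recent transactions
  xfs_repair -L /dev/sdc1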


On Tue, Dec 5, 2023 at 8:46 PM Konstantin Olchanski <[email protected]> wrote:
On Mon, Dec 04, 2023 at 03:03:46PM -0500, Edward Zuniga wrote:
>
> We are upgrading our MRI Lab servers and workstations to AlmaLinux 8. We
> have used ext4 for the past 10 years, however we are considering using XFS
> for its better performance with larger files. Which file system do you use
> for your lab?
>

Historical background.

The XFS filesystem, with its companion XLV logical volume manager (aka
"partitioning tool"), came to Linux from SGI IRIX, where it was developed circa
the late 1990s. XFS was copied to Linux essentially verbatim (initially with
shims and kludges, later fully integrated). XLV was reimplemented as LVM.

The ext series of filesystems was developed together with the Linux kernel (the
first ext filesystem was written to replace the MINIX filesystem; look it up).
As improvements were made - journaling, no need to fsck after a crash, online
grow/shrink, etc. - they were renamed ext2/ext3/ext4, and they are still largely
compatible with one another.

For many purposes, both filesystems are obsoleted by ZFS, which added:

- metadata and data checksums - to detect silent bit rot on current-generation
  HDDs and SSDs
- online filesystem check - for broken data, it gives you a list of filenames
  instead of inode numbers
- "built-in" mirroring - together with checksums, online fsck (zfs scrub) and a
  monthly zfs scrub cron job, this allows automatic healing of bit rot
- "built-in" RAID-5 and RAID-6 - again, together with checksums and online fsck,
  this allows automatic healing and robust operation in the presence of disk bad
  sectors, I/O errors, corruption, and single-disk failure
- other goodies like snapshots, a large RAM cache, dedup, and online compression
  that are taken for granted in current-generation filesystems

On current-generation HDDs and SSDs, use of bare XFS or ext4 is dangerous: an
SSD failure or an HDD growing bad sectors will destroy your data completely.

On current-generation HDDs, use of mirrored XFS or ext4 (using mdadm or LVM
mirroring) is also dangerous: (a) bit rot inevitably causes differences between
the data on the two disks, and lacking checksums, mdadm and LVM mirroring cannot
decide which of the two copies is the correct one; (b) after a crash, a mirror
rebuild will fail if both disks happen to have bad sectors (or throw random I/O
errors).

Ditto for RAID5 and RAID6: the probability of a RAID rebuild failing because
multiple disks have bad sectors and I/O errors goes up with the number of disks.
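
You can see the mirror problem directly on a plain md array: a consistency check
will count mismatched blocks, but md has no way to tell which side holds the
good copy. A rough illustration (md0 is a placeholder):

  # trigger a consistency check on an md mirror; md0 is a placeholder
  echo check > /sys/block/md0/md/sync_action
  cat /proc/mdstat

  # after the check finishes, a non-zero count means the two halves of
  # the mirror disagree - md cannot tell which copy is the correct one
  cat /sys/block/md0/md/mismatch_cnt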

ZFS was invented to resolve all these problems. (BTRFS was invented as a NIH
ersatz ZFS and is still incomplete with respect to RAID5/RAID6.)

Bottom line: if you can, use ZFS. The current Ubuntu installer has an "install
on ZFS" button - use it!

