On Wed, 26 Mar 2008, Jon Peatfield wrote:
On Wed, 26 Mar 2008, Jan Schulze wrote:
Hi all,
I have a disk array with about 4.5 TB and would like to use it as one large
logical volume with an ext3 file system. When mounting the logical volume,
I get an "Input/Output error: can't read superblock".
Do you get any interesting kernel messages in the output of dmesg (or
/var/log/messages etc)? Which exact kernel is this (uname -r) and what arch
(i386/x86_64 etc; uname -m)?
And what driver/hardware?
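Something along these lines, run right after the failed mount, would tell us
most of it:

  uname -r; uname -m
  dmesg | tail -n 50
  tail -n 100 /var/log/messages | grep -i -e error -e sd
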
I'm using SL 4.2 with kernel 2.6 and this is what I did so far:
- used parted to create a gpt disk label (mklabel gpt) and one large
partition (mkpart primary ext3 0s -1s)
- used parted to enable LVM flag on device (set 1 LVM on)
I know it would be slow but can you test that you can read/write to all of
/dev/sda1?
Using dd's "seek" parameter, this should not take too much time. But if
creating the GPT label & partition was successful, chances are the whole
device is accessible.
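For example (the offsets below are purely illustrative for a ~4.5 TB device,
and the second command overwrites whatever is at that spot, so only do this
before running mkfs):

  # read 1 MiB from near the end of the device ('skip' positions the input)
  dd if=/dev/sda1 of=/dev/null bs=1M count=1 skip=4000000
  # write 1 MiB of zeros just past the 2 TiB mark ('seek' positions the output)
  dd if=/dev/zero of=/dev/sda1 bs=1M count=1 seek=2097300
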
- created one physical volume, one volume group and one logical volume
(pvcreate /dev/sda1, vgcreate raid6 /dev/sda1, lvcreate -l 1189706 -n
vol1 raid6)
- created an ext3 filesystem and explicitly specified a 4K blocksize, as
this should allow a filesystem size of up to 16 TB (mkfs.ext3 -m 0 -b
4096 /dev/raid6/vol1)
For some reason my EL4 notes tell me that we also specify -N (number of
inodes), as well as -E (set RAID stride), -J size= (set journal size) and -O
sparse_super,dir_index,filetype, though most of that is probably the default
these days...
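For what it's worth, such an invocation might look roughly like this - all of
the numbers are made up and would need adjusting to the actual array:

  mkfs.ext3 -m 0 -b 4096 -N 20000000 -J size=400 -E stride=64 \
            -O sparse_super,dir_index,filetype /dev/raid6/vol1
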
Specifying the stripe width is also supposed to be a good idea, as is
aligning the start of the partition to a stripe boundary (although that's
more likely to be useful without LVM on top).
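The stride is just the RAID chunk size divided by the filesystem block size;
for example (numbers purely illustrative):

  # 256 KiB chunk size, 4 KiB fs blocks  ->  -E stride=64
  # 8+2 RAID6, i.e. 8 data disks         ->  full stripe = 64 * 8 = 512 blocks

Newer e2fsprogs can also be told the full stripe width, but I don't think the
EL4 version knows about that.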
However, mounting (mount /dev/raid6/vol1 /raid) gives the superblock error
mentioned above.
Everything works as expected when using an ext2 filesystem (with LVM) or an
ext3 filesystem (without LVM). Using a smaller volume (< 2 TB) works with
ext3+LVM as well. Only the combination of > 2 TB + ext3 + LVM gives me
trouble.
Any ideas or suggestions?
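One thing worth checking before digging deeper is whether the kernel, LVM and
the filesystem all agree on the sizes involved - a quick sketch using the
names from above:

  cat /proc/partitions                    # does sda1 look like ~4.5 TB? (sizes in 1 KiB blocks)
  pvdisplay /dev/sda1 | grep -i size
  lvdisplay /dev/raid6/vol1 | grep -i 'lv size'
  dumpe2fs -h /dev/raid6/vol1 | grep -i 'block count'
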
We found that in at least some combinations of kernel/hardware (drivers
really, I expect), support for >2TB block devices was still rather flaky
(at least in the early versions of EL4).
We ended up getting our RAID boxes to present multiple LUNs, each under 2TB,
which we could then set up as PVs and join back together into a single VG,
still giving us an LV bigger than 2TB. I'm rather conservative in such
things, so we still avoid big block devices at the moment.
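In case it's useful, the rough shape of that setup (device names and sizes are
made up; here the array presents three LUNs of ~1.5TB each):

  pvcreate /dev/sdb1 /dev/sdc1 /dev/sdd1
  vgcreate bigvg /dev/sdb1 /dev/sdc1 /dev/sdd1
  vgdisplay bigvg | grep Free                # note the number of free extents
  lvcreate -l <free-extents> -n vol1 bigvg   # one LV spanning all three PVs, >2TB in total
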
[ obviously, with single disk sizes growing at the rate they are, the >2TB
block-device code is going to get a LOT more testing! ]
We're successfully using devices up to 7 TB with a single XFS
filesystem on them, under SL4/5 (but I think we started doing this with
4.3, not 4.2). I have no hope of being able to check (xfs_repair) those
should this ever become necessary, though - from what I've read it would
require more RAM than fits into a server today.
However, some of the tools (e.g. ext2/3 fsck) still seemed to fail at about
3.5TB, so we ended up needing to build the 'very latest' tools to be able to
run fsck properly (the ones included in EL4 - and EL5, I think - get into an
infinite loop at some point while scanning the inode tables).
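Building them is simple enough if you need to do the same (the version number
and device name here are just placeholders):

  tar xzf e2fsprogs-<version>.tar.gz
  cd e2fsprogs-<version>
  ./configure && make
  # run the freshly built fsck straight out of the build tree,
  # without replacing the system-installed one:
  ./e2fsck/e2fsck -f /dev/<vg>/<lv>
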
Currently we try to avoid 'big' ext3 LVs; the one where we discovered the
fsck problems was originally ~6.8TB, but we ended up splitting it into
several smaller LVs since even with working tools it still took ~2 days to
fsck... (and longer to dump/copy/restore it all!)
Some of my co-workers swear by XFS for 'big' volumes, but then we do have SGI
boxes where XFS (well, CXFS) is the expected default fs. I've not done much
testing with XFS on SL, mainly because TUV don't like XFS much...
I think it's still the best choice for large (> 2 TB) filesystems. The xfs
available in SL4 contrib has done very well here. There are some
interesting effects when such a filesystem runs full: you have to
remount it with the "inode64" option in order to be able to create new
files, and then you discover that quite a few applications are not ready
for 64-bit inode numbers. But that aside, it has done very well. No other
headaches. We're now beginning to deploy large (>10TB) XFS filesystems
under SL5.
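In practice that just means adding "inode64" to the mount options, e.g.
(device and mount point are made up):

  # in /etc/fstab:
  /dev/bigvg/xfs1  /data  xfs  defaults,inode64  0 0
  # or by hand:
  umount /data && mount -o inode64 /dev/bigvg/xfs1 /data
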
This being said, we now also have lustre OSTs (using a modified ext3)
7.5 TB in size. No problems so far, but then none of them has run full or
required an fsck yet.
--
Stephan Wiesand
DESY - DV -
Platanenallee 6
15738 Zeuthen, Germany