On Thu, 27 Mar 2008, Stephen John Smoogen wrote:
> On Wed, Mar 26, 2008 at 7:34 PM, Michael Hannon <[EMAIL PROTECTED]> wrote:
>> Greetings. We have lately had a lot of trouble with relatively large
>> (order of 1TB) file systems mounted on RAID 5 or RAID 6 volumes. The
>> file systems in question are based on ext3.
>>
>> In a typical scenario, we have a drive go bad in a RAID array. We then
>> remove it from the array, if it isn't already, add a new hard drive
>> (i.e., by hand, not from a hot spare), and add it back to the RAID
>> array. The RAID operations are all done using mdadm.
>>
>> After the RAID array has completed its rebuild, we run fsck on the RAID
>> device. When we do that, fsck seems to run forever, i.e., for days at a
>> time, occasionally spitting out messages about files with recognizable
>> names, but never completing satisfactorily.
> fsck of 1TB is going to take days due to the linear nature of it
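For reference, the replacement procedure described in the quoted message usually maps onto mdadm roughly like this (the device names /dev/md0 and /dev/sdb1 are placeholders, not taken from the original report):

```shell
# Mark the failed member as faulty and remove it from the array
# (skip --fail if the kernel has already kicked the disk out).
mdadm /dev/md0 --fail /dev/sdb1
mdadm /dev/md0 --remove /dev/sdb1

# After physically swapping the disk and partitioning it,
# add it back; the array then rebuilds onto the new member.
mdadm /dev/md0 --add /dev/sdb1

# Watch the rebuild progress.
cat /proc/mdstat
```

This is only a sketch of the usual sequence; it needs root and real block devices, so adjust names to your setup.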
Hmm, we successfully fsck'd ext3 filesystems 1.4 TB in size frequently a
couple of years ago, under 2.4 (back then, it was SuSE 8.2 + a vanilla
kernel). This took no more than a few hours (maybe 2, 3, or 4). It was
hardware RAID, not too reliable (hence "frequently"), and not too fast (<
100 MB/s). A contemporary linux server with software RAID should complete
an fsck *much* faster, or something is wrong.
And I still wonder why fsck at all just because a broken disk was
replaced in a redundant array?
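One way to decide whether a full check is even needed is to look at the filesystem state first. A minimal sketch, using a throwaway image file in place of the md device so it runs unprivileged (the /tmp path and sizes are made up for illustration):

```shell
# Build a small scratch ext3 filesystem in a plain file
# (stands in for /dev/mdX; no root needed).
dd if=/dev/zero of=/tmp/test-ext3.img bs=1M count=64 2>/dev/null
mke2fs -q -j -F /tmp/test-ext3.img

# Inspect the recorded state instead of blindly running fsck:
# "clean" means the filesystem was unmounted properly; swapping a
# RAID member does not by itself dirty the filesystem.
tune2fs -l /tmp/test-ext3.img | grep 'Filesystem state'

# If a check is still wanted, a read-only pass with a progress
# indicator at least shows whether it is making headway.
e2fsck -n -C 0 /tmp/test-ext3.img
```

The same `tune2fs -l` / `e2fsck -n` combination on the real md device would show whether the long-running checks were actually finding damage or just walking a very large, healthy filesystem.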
--
Stephan Wiesand
DESY - DV -
Platanenallee 6
15738 Zeuthen, Germany