On 04/21/2011 06:43 AM, Josep Guerrero wrote:
> I have a cluster with 8 nodes, all of them running Debian Lenny (plus some
> additions so that multipath and Infiniband work), which share an array of 48
> 1TB disks. Those disks form 22 pairs of hardware RAID1, plus 4 spares. The
> first 21 pairs are organized into two striped LVM logical volumes, of 16 and
> 3 TB, both formatted with ocfs2. The kernel is the version supplied with the
> distribution (2.6.26-2-amd64).
>
> I wanted to run an fsck on both volumes because of some errors I was getting
> (probably unrelated to the filesystems, but I wanted to check). On the 3 TB
> volume (around 10% full) the check worked perfectly and finished in less than
> an hour (this was run with the fsck.ocfs2 provided by the Lenny ocfs2-tools,
> version 1.4.1):
> <snip>
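(Side note: the striped LVM layout described above can be confirmed read-only with LVM's own tools; a minimal sketch, using the /dev/hidrahome/lvol0 path that appears in the commands below:

# lvdisplay -m /dev/hidrahome/lvol0

The -m/--maps output lists each segment's type, stripe count, stripe size and the physical volumes it sits on.)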
> but the check for the second filesystem (around 40% full) did this:
>
> ============
> hidra0:/usr/local/src# fsck.ocfs2 -f /dev/hidrahome/lvol0
> Checking OCFS2 filesystem in /dev/hidrahome/lvol0:
>   label:              <NONE>
>   uuid:               6a a9 0e aa cf 33 45 4c b4 72 3a b6 7c 3b 8d 57
>   number of blocks:   4168098816
>   bytes per block:    4096
>   number of clusters: 4168098816
>   bytes per cluster:  4096
>   max slots:          8
>
> /dev/hidrahome/lvol0 was run with -f, check forced.
> Pass 0a: Checking cluster allocation chains
> =============
>
> and stayed there for 8 hours (all the time keeping one core around 100% CPU
> usage and with a light load on the disks; this was consistent with the same
> step in the previous run, although of course that one didn't take so long).
> I thought that maybe I had run into some bug, so I interrupted the process,
> downloaded the ocfs2-tools 1.4.4 sources, compiled them, and tried with that
> fsck, obtaining similar results: it has been running for almost 7 hours like
> this:
>
> =============
> hidra0:/usr/local/src/ocfs2-tools-1.4.4/fsck.ocfs2# ./fsck.ocfs2 -f /dev/hidrahome/lvol0
> fsck.ocfs2 1.4.4
> Checking OCFS2 filesystem in /dev/hidrahome/lvol0:
>   Label:              <NONE>
>   UUID:               6AA90EAACF33454CB4723AB67C3B8D57
>   Number of blocks:   4168098816
>   Block size:         4096
>   Number of clusters: 4168098816
>   Cluster size:       4096
>   Number of slots:    8
>
> /dev/hidrahome/lvol0 was run with -f, check forced.
> Pass 0a: Checking cluster allocation chains
> =============
>
> and with one core at 100% CPU.
>
> Could someone tell me if this is normal? I've been searching the web and
> checking manuals for information on how long these checks should take, and
> apart from one message on this list mentioning that 3 days on an 8 TB
> filesystem with 300 GB used was too long, I haven't been able to find
> anything.
>
> If this is normal, is there any way to estimate, taking into account that the
> first filesystem uses exactly the same disks and took less than an hour to
> check, how long it should take for this other filesystem?

Do:

# debugfs.ocfs2 -R "stat //global_bitmap" /dev/hidrahome/lvol0

Does this hang too? Redirect the output to a file. That will give us some clues.
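For example, a minimal way to run it and keep the output (the file name is arbitrary, and the time prefix is optional, just to see whether the command itself stalls):

# time debugfs.ocfs2 -R "stat //global_bitmap" /dev/hidrahome/lvol0 > global_bitmap.txt 2>&1

If it finishes quickly, post global_bitmap.txt; if it hangs the same way fsck does, that is useful to know as well.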