Re: [Lustre-discuss] Recovery from Hardware Failure

2011-02-11 Thread Joe Digilio
Cliff, thank you for your help so far.

Unfortunately, the initial read-only e2fsck runs (-n) on both the MDT and
the OSTs did not come back clean.  With -p, the OSTs cleaned up nicely (in
fact, most OST problems went away once the journal was recovered).  The
MDT, however, had many files dumped to lost+found.
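
For what it's worth, one way to see what landed in the MDT's lost+found
(a sketch only -- a read-only ldiskfs mount, with /mnt/mdt as an
arbitrary scratch mount point):

mount -t ldiskfs -o ro /dev/$MDTDEV /mnt/mdt   # read-only, nothing is modified
ls -l /mnt/mdt/lost+found | head               # recovered entries are typically named #<inode number>
umount /mnt/mdt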

When I run lfsck with -d, it never seems to delete the orphans;
subsequent runs report exactly the same orphans.  There are many lines
like this for every OST:
[0] zero-length orphan objid 1
[0] zero-length orphan objid 960
[0] zero-length orphan objid 992
lfsck: [0]: pass3 orphan found objid 1207392, 6234112 bytes
lfsck: [0]: pass3 orphan found objid 1207360, 6234112 bytes

Shouldn't those be deleted when using -d?  Or am I misunderstanding
the documentation?
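
For reference, a sketch of the orphan-deletion run as I read the man
page (-d to delete the orphaned objects, without the report-only -n;
the db paths are the ones generated by the e2fsck runs):

lfsck -d -v --mdsdb /tmp/mdsdb \
  --ostdb /tmp/ost1db,/tmp/ost2db,...,/tmp/ostNdb /lustre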

Thanks again!
-Joe


On Mon, Feb 7, 2011 at 17:00, Cliff White cli...@whamcloud.com wrote:
 You should not have to run lfsck if the initial fscks come back clean.
 cliffw

 On Mon, Feb 7, 2011 at 1:16 PM, Joe Digilio jgd-lus...@metajoe.com wrote:

 Last week we experienced a major hardware failure (disk controller)
 that brought down our system hard.  Now that I have the replacement
 controller, I want to make sure I recover correctly.  Below is the
 procedure I plan to follow based on what I've gathered from the
 Operations Manual.

 Any comments?
 Do I need to create the mds/ost DBs AFTER ll_recover_lost_found_objs?

 Thanks!
 -Joe


 ### MDT Recovery
 # Capture fs state before doing anything
 e2fsck -vfn /dev/$MDTDEV
 # safe repair
 e2fsck -vfp /dev/$MDTDEV
 # Verify no more problems and generate mdsdb
 e2fsck -vfn --mdsdb /tmp/mdsdb /dev/$MDTDEV

 ### OST Recovery
 foreach OST
    # Capture fs state before doing anything
    e2fsck -vfn /dev/$OSTDEV
    # safe repair
    e2fsck -vfp /dev/$OSTDEV
    # Verify no more problems
    e2fsck -vfn --mdsdb /tmp/mdsdb --ostdb /tmp/ostXdb /dev/$OSTDEV

 ### Recover lost+found Objects
 foreach OST
    mount -t ldiskfs /dev/$OSTDEV /mnt/ost
    ll_recover_lost_found_objs -v -d /mnt/ost/lost+found

 ### Coherency Check
 lfsck -n -v --mdsdb /tmp/mdsdb \
     --ostdb /tmp/ost1db,/tmp/ost2db,...,/tmp/ostNdb /lustre
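
 For what it's worth, the same per-OST steps written out as a plain
 shell loop (a sketch only; the OSTDEVS list and the /tmp/ostNdb naming
 are placeholders for the real device names):

 OSTDEVS="ost1dev ost2dev"   # placeholder: real OST block device names
 i=1
 for OSTDEV in $OSTDEVS; do
    # capture state read-only, safe repair, then verify and build per-OST db
    e2fsck -vfn /dev/$OSTDEV
    e2fsck -vfp /dev/$OSTDEV
    e2fsck -vfn --mdsdb /tmp/mdsdb --ostdb /tmp/ost${i}db /dev/$OSTDEV
    i=$((i + 1))
 done
 for OSTDEV in $OSTDEVS; do
    # put back the objects that e2fsck moved into lost+found
    mount -t ldiskfs /dev/$OSTDEV /mnt/ost
    ll_recover_lost_found_objs -v -d /mnt/ost/lost+found
    umount /mnt/ost
 done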