> I had a power loss and on reboot I am stopping and being told to run fsck
> manually without the -i or -p option and to enter root password to go into
> maintenance mode at this point I  am at a line by line mode. this has
> happened five times before and I have had to reinstall to get it back is
> this the only was out ...I have got to back this in raid form I think.
>   Any input would be appreciated.

I use software RAID on one of my machines, and had the same thing
(power-outage) occur.  I have a very similarly configured machine
without RAID.  They both went down with the power blackout.
Interestingly, the non-RAID machine re-booted very cleanly.  It passed
the automatic fsck on re-boot, and went merrily on its way without a
hiccup, and without any lost files appearing in the lost+found
directories.

The RAID machine, on the other hand, put me through the "boot into
single-user root mode" process, and I had to run manual fsck, which
found and corrected oodles of things, and left quite a few things in
lost+found directories.

Now, for the interesting things I learned:

(1) RAID was not the cause of the problems!  X was running at the time;
specifically, the KDE desktop, as well as an open Konqueror browser
window.  Virtually all of the lost files were KDE and Konqueror related
files.  I was able to delete them without ill effect, except for the
loss of my bookmarks.  Oops.
(Note:  I had to use special tools to look at the contents of the
lost+found files and delete them; an ordinary "cat" or "rm" did not work
on them.  It has been a long time and I don't recall the specific
commands, except that the man pages had "use at your own risk" warnings
all over the place.)

(2) I was not able to fsck the RAID (md) devices directly (or, more
accurately, did not know how to do that properly in the broken state the
machine was in); however, individually fscking the matching RAID
partitions on each disk resulted in fixing the disk problems, as I was
using ONLY MIRRORING...  the disks were identical.  Yay.  I have since
learned that was a bad/dangerous thing to do; I simply got lucky.  I
think if you tried that with any type of RAID partition other than pure
mirrored, you would completely destroy it.

(3) The reason I could not fsck the md devices was that I could not
get far enough through boot to get the arrays started.  I ended up having to
boot into "rescue" mode from the original installation disk.  This
allowed me to unmount the "/" partition, and still have a functional OS.
I ran fsck from there.  If I had been smarter, I might have found a way
to load raidtools in this mode, and thus start the RAID devices, and
fsck them properly.  However, the default "linux rescue" boot did not
include raidtools.  (I wish it did).

(4) The reason I could not boot was remarkably stupid on my part...  It
seems that, between the previous boot and the power outage, I had gone
through "cleaning" unused things from my hard drives.  In my zeal and
inexperience, I deleted the /initrd directory.  ("Hey, it was completely
empty!")  As a result, the system simply would not boot.  Once I
re-created that directory, while logged in via "linux rescue,"
everything was fine once again.  I was able to re-run fsck on the
operational RAID partitions (and found no errors, see "lucky" discussion
above).  Note that software RAID launched processes in the background,
to re-synch the RAID arrays; these can take a long time to run...  If
you ever get into this situation, remember to check those processes, to
be sure they are done, before re-booting again.  It may save you some
headaches.  (I think it is the 2nd or 3rd console that shows the
activity.)
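
For the re-sync check in (4), here is a minimal sketch in Python.  It
assumes the kernel reports array status in /proc/mdstat and that the
progress lines look roughly like the made-up sample below; the exact
layout differs between kernel versions.  The function name and the sample
text are my own illustration, not output from the machine described here.

```python
import re

# Illustrative /proc/mdstat contents (invented sample, not from a real machine).
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md0 : active raid1 hdc1[1] hda1[0]
      104320 blocks [2/2] [UU]

md1 : active raid1 hdc2[1] hda2[0]
      8385856 blocks [2/2] [UU]
      [==>..................]  resync = 12.6% (1056768/8385856) finish=4.2min speed=28672K/sec

unused devices: <none>
"""


def arrays_still_syncing(mdstat_text):
    """Return {array_name: percent_done} for arrays that are not yet in sync.

    percent_done is None when the kernel reports the resync as DELAYED
    (some kernels print "resync=DELAYED" while another array is syncing).
    """
    syncing = {}
    current = None
    for line in mdstat_text.splitlines():
        # A line such as "md1 : active raid1 ..." starts a new array section.
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # Progress lines carry "resync = 12.6%" or "recovery = 3.0%".
        progress = re.search(r"(?:resync|recovery)\s*=\s*(DELAYED|[\d.]+)", line)
        if progress and current:
            value = progress.group(1)
            syncing[current] = None if value == "DELAYED" else float(value)
    return syncing
```

On a live system you would pass in the text read from /proc/mdstat
instead of the sample; an empty result suggests nothing is re-syncing, so
a re-boot should be safe as far as the arrays are concerned.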

Lessons to be learned:

(a) "fsck" is a great tool.

(b) "linux rescue" is preferable to re-install.

(c) never fsck the underlying devices (i.e. the individual pieces) of a
RAID partition; fsck the full md device instead (especially if it's not
a pure mirror partition).

(d) never willy-nilly delete even empty directories, unless you KNOW
what they are for (he says, with red face).

(e) whenever you change things (especially top-level directories or
their contents), do a re-boot right then and there, so you can fix
anything that goes wrong, with what you just did fresh in your head.
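
Lesson (c) can be turned into a small guard.  The sketch below, in
Python, looks up whether a partition is listed as a member of a running md
array and, if so, hands back the md device to check instead.  It assumes
the arrays are started and visible in /proc/mdstat; in the broken state I
described, they would not be listed and this check would not help.  The
sample text, device names, and function names are hypothetical.

```python
import re

# Illustrative /proc/mdstat contents (invented sample, not from a real machine).
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md0 : active raid1 hdc1[1] hda1[0]
      104320 blocks [2/2] [UU]

md1 : active raid1 hdc2[1] hda2[0]
      8385856 blocks [2/2] [UU]

unused devices: <none>
"""


def member_to_array(mdstat_text):
    """Map each member partition name (e.g. 'hda1') to its array (e.g. 'md0')."""
    mapping = {}
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:\s*(.*)$", line)
        if not header:
            continue
        array, rest = header.groups()
        # Members appear as 'hda1[0]'; a flag such as '(F)' may follow the bracket.
        for member in re.findall(r"(\S+?)\[\d+\]", rest):
            mapping[member] = array
    return mapping


def fsck_target(device, mdstat_text):
    """Return the md device if `device` is an array member, else `device` itself."""
    name = device.rsplit("/", 1)[-1]
    array = member_to_array(mdstat_text).get(name)
    return "/dev/" + array if array else device
```

With the sample above, fsck_target("/dev/hda1", SAMPLE_MDSTAT) gives
"/dev/md0", while a partition that belongs to no array comes back
unchanged.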

I apologize for the long-winded posting, but I hope my mis-adventures
will save even one other person from the bruised forehead I got from
banging my head on the wall.  At least there was a Happy Ending!

Regards,
Jim



_______________________________________________
Seawolf-list mailing list
[EMAIL PROTECTED]
https://listman.redhat.com/mailman/listinfo/seawolf-list
