On 28 Aug, Matthew Palmer wrote:

> > 3) Can you tell Ubuntu to use Lilo instead of Grub as the boot loader?
>  
>  Yes, you can, but I can't imagine a single reason on the planet why you
>  would want to.  Grub takes a bit of learning, but actually having a flexible
>  and semi-sensible boot loader is more than enough reward.

Not wanting to start a Lilo vs Grub thread, I'll just say that I'll
keep trying to get Grub and raid working happily together before I give up.

From yours and Jeff's answers, and other indications, it can certainly
be done.

> > I do think though that for grub and raid (unlike lilo and raid), the
> > best you can do is double up the stanza for each kernel you want to
> > boot and manually choose the working drive when a drive in the raid
> > mirror fails?
>  
>  Depends.  If you're using PATA, which has explicit device names depending on
>  where you plugged the drive in, then yes, but I can't imagine how you'd get
>  around that.  For SCSI or SATA, which tend to just allocate starting from
>  'a', you should be OK with identical configuration and dual-installing grub.
>  
> > I've been trying to retrofit raid mirroring on a system after installing
> > Ubuntu 6.
>  
>  That would be 6.06LTS, yes?

Probably: the isos are ubuntu-6.06-desktop-i386.iso and 
ubuntu-6.06-alternate-i386.iso

> > I'm thinking I should install Ubuntu again from scratch, and redo the
> > 10GB of extra package installation and all the configuration for mail
> > etc. again. :-(
>  
>  10GB of packages?  Holy moley that's a lot of software.

Yes, I was surprised how it all added up.

> > I gather I do this by installing from the Alternate CD, which is less
> > beautiful but gives you more control over the installation?
>  
>  Ayup.  Also put LVM in there while you're at it.  You'll thank me sooner or
>  later.  <grin>

Good point, thanks.

> > Is there a way to note the list of all the packages I installed, so I
> > can avoid spending another 4 hours selecting the packages again?
>  
>  On the old system:
>  
>  dpkg --get-selections > /tmp/package-list
>  
>  Copy /tmp/package-list to somewhere safe, then when you're ready to go
>  again, copy it onto the new system and run:
>  
>  dpkg --set-selections < /tmp/package-list

Hmm, I don't think I can do that unless I get the system running again.
The single-user mode doesn't even have vi, so I'd be surprised if it
has dpkg.

But if I can recreate the installation from the copy I saved to a spare
partition, then I won't need to try to dig the package info out from
the config files. (Fingers crossed.)
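If the saved copy is a full root filesystem, it may be possible to pull
the package list out of it without booting it, by chrooting in from the
rescue shell.  A sketch, assuming the copy lives on a hypothetical
/dev/hda9 (adjust device and mount point to your layout):

```shell
# Mount the spare partition holding the saved copy of the old root fs.
# /dev/hda9 and /mnt/oldroot are assumptions; adjust to your layout.
mount /dev/hda9 /mnt/oldroot

# Run the old system's own dpkg against its own package database.
chroot /mnt/oldroot dpkg --get-selections > /tmp/package-list

# Later, on the freshly installed system:
dpkg --set-selections < /tmp/package-list
apt-get dselect-upgrade    # actually fetch and install the selections
```

The `apt-get dselect-upgrade` step matters: `--set-selections` only
records what you want; it doesn't install anything by itself.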

> > A bit of google searching on ubuntu and raid strongly suggests that
> > grub just doesn't work properly with a mirrored boot and/or root.
>  
>  Google is lying.  It works brilliantly.
>  
> > (1: Failed drives cause devices to change name.
>  
>  That's a "failure" of the device naming system, but it shouldn't matter
>  because md scans disks to find chunks of RAID to stitch together.

True, but I confess I don't understand how grub knows to boot up from
hda7 when sda7 fails, and vice versa, before it loads the initramfs
with raid support.

This is why many grub/raid setup pages I found said you needed to make
an entry like:

        title Linux (hda7)
              root (hd0,6)
              kernel /boot/vmlinuz... root=/dev/md0
              etc.

        title Linux (sda7)
              root (hd1,6)
              kernel /boot/vmlinuz... root=/dev/md0
              etc.

But I gather that you (and Jeff) haven't had to do that, and somehow
the right thing happens.
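For what it's worth, the usual trick I've seen for this is to remap each
disk to (hd0) in turn while installing grub, so the stage1 written to
each MBR looks for /boot/grub on "its own" disk; whichever disk the BIOS
actually boots then sees itself as hd0, and a single menu.lst stanza with
"root (hd0,6)" works either way.  A sketch from the grub shell, assuming
partition 7 on each disk as in this thread:

```shell
# Dual-install grub so either disk can boot on its own.  The "device"
# command remaps (hd0), so each MBR's stage1 points at its own disk.
grub --batch <<'EOF'
device (hd0) /dev/hda
root (hd0,6)
setup (hd0)
device (hd0) /dev/sda
root (hd0,6)
setup (hd0)
EOF
```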

Interestingly for me, I've just noticed that on my old Ubuntu (on my
not yet raided /dev/hda6 partition):

   o    only one raid device is mentioned in /etc/fstab
        (/dev/md2 for /home)

   o    /etc/mdadm/mdadm.conf has:
        DEVICE /dev/hda* /dev/sda*
        ARRAY /dev/md0 devices=/dev/hda7,/dev/sda7
        ARRAY /dev/md2 devices=/dev/hda8,/dev/sda8

   o    /proc/mdstat reports that *three* raid arrays are up and
        running: md0, md1 and md2!

   o    mdadm -D reports that md1 and md2 have the same UUID and are
        built from the same devices (sda8, hda8) and /proc/mdstat says
        md1 and md2 are the same size (/proc/mdstat says the md2 raid
        device has strange names: dm-5[1] and dm-4[0], cf what it says
        for md1: sda8[0] and hda8[1]).

Maybe mdadm abhors a vacuum? :-)

(I had intended to use md0 for hda6/sda6, md1 for hda7/sda7, and md2 for
hda8/sda8 - but it looks like I set md0 = hda7/sda7 by mistake, though
at least I left out md1 because I wasn't going to build that until after
I'd got everything working on one /home and / pair, first.)

But at least now I know why /dev/hda7 is busy and can't be mounted -
it's somehow active inside a raid array that isn't mounted because its
fsck failed.
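A sketch of the commands I'd use to trace that sort of "busy" partition,
to confirm the md driver has claimed it (device names as in this thread):

```shell
# Work out why a partition is "busy": see whether md has claimed it.
cat /proc/mdstat                  # active arrays and their member devices
mdadm --examine /dev/hda7         # RAID superblock (if any) on the partition
mdadm --detail /dev/md0           # which devices this array is built from
mdadm --stop /dev/md0             # release the members if the array is unwanted
```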

> > 2: Even after installing
> > grub to both devices in the mirror, you still have to have double
> > stanzas in menu.lst for each raw device so you *manually* choose to
> > boot off the other device in the event of failure.)
>  
>  I can't imagine why you'd need that, except in the above-mentioned scenario.
>
> > I gather that in contrast, lilo stores the actual locations for the
> > kernel images on both devices and *also* knows to try each device in
> > event of failure.
>  
>  I seriously doubt that LILO has that much smarts.  My guess is that it works
>  because LILO just goes "blocks [list] on the first disk has my kernel".

True, but I saw a reference to lilo trying each device in the event of
failure.  Certainly I've had a drive drop out of a raid set under lilo
and the system booted up just fine.

> > Can you choose to use Lilo with Ubuntu?
>  
>  Yes.

Thanks for the info, Matt.

Jeff wrote:

> The way I use grub to boot a RAID 1 root filesystem:
> 
>  * Consider that /boot/grub is on both disks, and that each disk looks like
>    a normal enough filesystem to grub.
>
>  * Set up grub to use the RAID device (it might be md or lvm) as root.

Yep, I *did* pay attention to your earlier advice and did that (noting
that it's the "root=/dev/md0" root, not the "root (hd...)" root).

>  * Install grub (using the grub command line) to the master boot record of
>    the block devices that make up the RAID device.

Did that.  Though when I booted I hit a problem, which I eventually
traced to messing up the construction of the md: you can't avoid wiping
the data on the partitions being put together, because you *must* run
mke2fs on the md device, otherwise you get the superblock mismatch
error.

>  * Now test that grub boots correctly with either device removed. The BIOS
>    should boot from either, because grub is on the MBR of both. Once it's
>    booted, you're already in RAIDsville: grub and Linux are using the single
>    md or lvm device.
> 
>  * Don't skip the previous step. Always test. :-)

Hmm, you must have been reading my mind.  :-)  I was thinking "Do I
really want the hassle of undoing the chassis and disconnecting drives
to test the raid+grub is working okay...?"  You've convinced me I
should.  Maybe I can disable it temporarily in the BIOS to save me
opening the case and unplugging each drive in turn.

Thanks for the advice, guys.  And really, I felt any misunderstandings
were my fault.  I found all the advice very helpful.  And now I'm about
to try to recreate the arrays and see if I can get past the fsck
failures because I didn't mke2fs the raid devices.

Thanks again Jeff, and Matt.

Regards,

luke

--------
PS:

I've just learned that while you can say:

    mdadm --stop /dev/md0

you can't say:

    mdadm --run /dev/md0

Instead, you have to:

    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/hda7 /dev/sda7

Then I waited for it to sync before I ran mke2fs -j on it, and now I'm
copying the 10GB of files back.  Then I can try fsck-ing it and see if
it's happy, and *then* I can try rebooting from it!
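For the record, I believe a stopped array whose superblocks are intact
can usually be restarted without --create (which rewrites the
superblocks and forces a full resync): --assemble is the gentler path.
A sketch, with this thread's device names:

```shell
# Reassemble an existing array from its members, reusing the on-disk
# superblocks instead of recreating them.
mdadm --assemble /dev/md0 /dev/hda7 /dev/sda7

# Or let mdadm find the members via the DEVICE/ARRAY lines in
# /etc/mdadm/mdadm.conf:
mdadm --assemble --scan
```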

-- 
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html
