Three questions:
1) To set up raid at install time, do you have to install Ubuntu from
the Alternate CD?
2) Has anyone here got Grub booting a system from a raid1 / (slash) root
filesystem? (Google for grub and raid and you find lots of problems and
a few people saying it can be done.)
3) Can you tell Ubuntu to use Lilo instead of Grub as the boot loader?
Gory details below, mainly for posterity in case some other poor sap
makes the same mistakes I did.
Last minute thought - I should be able to destroy the /dev/md0 on "/"
and recreate it properly, and then copy the files back on. Yippee!
I do think though that for grub and raid (unlike lilo and raid), the
best you can do is double up the stanza for each kernel you want to
boot and manually choose the working drive when a drive in the raid
mirror fails?
--- gory details -----
I've been trying to retrofit raid mirroring on a system after installing
Ubuntu 6. This seemed to work out okay until I got to the point where
the fsck failed during boot because /dev/md0's superblock claimed a
different size to the partition table entry:
fsck 1.38 (30-Jun-2005)
/dev/md0 is mounted. fsck 1.38 (30-Jun-2005)
fsck 1.38 (30-Jun-2005)
e2fsck 1.38 (30-Jun-2005)
The filesystem size (according to the superblock) is 3146724 blocks
The physical size of the device is 3146704 blocks
Either the superblock or the partition table is likely to be corrupt!
Abort? yes
(A Google search led me to
https://launchpad.net/distros/ubuntu/+source/debian-installer/+bug/13076
which included the following comment. It interested me because when I
created my raid I also found the devices all had the same UUID, which I
had to correct:
I had the exact same problem as described in comment #2. It actually
appears to be related to the mdadm 1.81 UUID bug, but I can't seem to
find the reference to that bug right now.
Only one of my RAID-1 arrays (/) started up, because mdadm 1.81 created
all of my arrays with the same UUID.
)
Running fsck on /dev/md0 manually was unable to fix the problem.
Running fsck manually on /dev/hda7 and /dev/sda7 did seem to repair all
problems (the same problems), except that it made no difference to the
correctness of /dev/md0 that was made from them.
In short, I can't boot from it.
I just found why fsck reported the superblock error. Yes, this is
exactly what I did wrong - I mke2fs'd the individual raid components and
added them together, since I was turning an existing ext3 slash into a
mirror. Damn:
http://www.linuxjournal.com/node/5653/print says:
8. Create an ext2 filesystem on /dev/md0 using the command mke2fs
/dev/md0. Do not mke2fs on the RAID-1 component partitions,
in this case /dev/hda2 and /dev/hdc2. If you do not create an
ext2 filesystem on /dev/md0, then e2fsck /dev/md0 will return
an error message, something like this:
The filesystem size (according to the superblock) is
2104483 blocks. The physical size of the device is 2104384 blocks.
Either the superblock or the partition table is likely to be corrupt.
This is because mkraid writes the RAID superblock near the
end of the component partitions. e2fsck does not recognize
the RAID superblock that has caused the physical size to be
smaller. You can mount /dev/md0 at this point, and even use
/usr, but the ext2 filesystem superblock contains incorrect
information. You may not notice problems but you should not use
the filesystem in this state. You will not be able to boot and
mount /dev/md0 unless you turn off the filesystem checking by
making the appropriate entry in fstab (e.g., /dev/md0 /usr ext2
defaults 1 0). The 0 at the end of the line causes e2fsck to be
skipped. Do not do this unless you have to fix your RAID. Make
/dev/md0 an ext2 filesystem.
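The numbers in my fsck error fit that explanation. Here is a rough check
in Python. It assumes that the 0.90 superblock format (mdadm -D below
reports Version 00.90.03) reserves the last 64 KiB-aligned 64 KiB of each
component, and that my ext3 block size is 4 KiB. Neither figure comes
from the article, and the function name is just mine for illustration:

```python
# Sketch: usable size of a raid component under md's 0.90 superblock format.
# Assumption: the component size is rounded down to a 64 KiB multiple and
# a further 64 KiB is reserved at the end for the raid superblock.

MD_RESERVED_KIB = 64  # assumed reserved area for the 0.90 superblock
FS_BLOCK_KIB = 4      # assumed ext3 block size


def md090_usable_kib(component_kib: int) -> int:
    """Return the space (in KiB) left for data on one component."""
    aligned = (component_kib // MD_RESERVED_KIB) * MD_RESERVED_KIB
    return aligned - MD_RESERVED_KIB


part_kib = 12586896  # /dev/hda7 and /dev/sda7, from the fdisk output
array_kib = md090_usable_kib(part_kib)

print(array_kib)                  # 12586816, the Array Size mdadm -D reports
print(part_kib // FS_BLOCK_KIB)   # 3146724, the size in the ext3 superblock
print(array_kib // FS_BLOCK_KIB)  # 3146704, the "physical size" fsck sees
```

So the filesystem I made on the bare partitions is 20 blocks (80 KiB)
bigger than the md device built from them, which is exactly the mismatch
fsck complains about.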
Rebooting to an older Ubuntu 6 on a different partition, I was unable to
mount either device in the raid array in question.
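If I tried to mount a device in the array, mount reported that it was
already mounted. If I "forced" the mount with -f, it didn't know the
filesystem type. If I told it ext3 as well, the filesystem appeared to
mount but was empty. (Note that mount's -f option means "fake", not
"force": it records the mount in /etc/mtab without actually mounting
anything, which would explain the empty directory.)
When I unmounted the device (/dev/hda7), umount reported that it
wasn't mounted. (?!)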
I thought the data might be lost, but by booting up as far as I can
from the semi-hosed system, I can at least see the slash filesystem and
mount other areas, and copy everything off. (Interesting to see a
handful of errors in the copy, that match what fsck reported as
problems.)
I'm thinking I should install Ubuntu again from scratch, and redo the
10GB of extra package installation and all the configuration for mail
etc. again. :-(
I gather I do this by installing from the Alternate CD, which is less
beautiful but gives you more control over the installation?
Is there a way to note the list of all the packages I installed, so I
can avoid spending another 4 hours selecting the packages again?
A bit of google searching on ubuntu and raid strongly suggests that
grub just doesn't work properly with a mirrored boot and/or root.
(1: Failed drives cause devices to change name. 2: Even after installing
grub to both devices in the mirror, you still have to have double
stanzas in menu.lst for each raw device so you *manually* choose to
boot off the other device in the event of failure.)
To contradict myself, this page indicates someone doing it happily with
Ubuntu 6.06:
http://users.piuha.net/martti/comp/ubuntu/raid.html
I gather that, in contrast, lilo stores the actual locations of the
kernel images on both devices and *also* knows to try each device in
the event of failure.
Does anyone here have any other tips on installing Ubuntu onto a
software raid mirror?
Can you choose to use Lilo with Ubuntu?
Some config details below.
luke
----------------- fdisk /dev/hda ----------------------------
Disk /dev/hda: 200.0 GB, 200049647616 bytes
255 heads, 63 sectors/track, 24321 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device   Boot       Start         End      Blocks   Id  System
/dev/hda1   *           1        5737    46082421    7  HPFS/NTFS
/dev/hda2            5738        5992     2048287+   b  W95 FAT32
/dev/hda3            5993       24321   147227692+   5  Extended
/dev/hda5            5993        6057      522081   82  Linux swap / Solaris
/dev/hda6            6058        7624    12586896   83  Linux
/dev/hda7            7625        9191    12586896   fd  Linux raid autodetect
/dev/hda8            9192       24321   121531693+  fd  Linux raid autodetect
----------------- fdisk /dev/sda ----------------------------
Disk /dev/sda: 250.0 GB, 250059350016 bytes
255 heads, 63 sectors/track, 30401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device   Boot       Start         End      Blocks   Id  System
/dev/sda1               1        5737    46082421    7  HPFS/NTFS
/dev/sda2            5738        5992     2048287+   b  W95 FAT32
/dev/sda3            5993       30401   196065292+   5  Extended
/dev/sda5            5993        6057      522081   82  Linux swap / Solaris
/dev/sda6            6058        7624    12586896   83  Linux
/dev/sda7            7625        9191    12586896   fd  Linux raid autodetect
/dev/sda8            9192       24321   121531693+  fd  Linux raid autodetect
/dev/sda9           24322       30401    48837568+  83  Linux
----------------- mount ----------------------------
/dev/hda7 on / type ext3 (rw,errors=remount-ro)
proc on /proc type proc (rw)
/sys on /sys type sysfs (rw)
procbususb on /proc/bus/usb type usbfs (rw)
udev on /dev type tmpfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
-------------- After booting up in the old /dev/hda6 Ubuntu ----------
# mount
/dev/hda6 on / type ext3 (rw,errors=remount-ro)
proc on /proc type proc (rw)
/sys on /sys type sysfs (rw)
varrun on /var/run type tmpfs (rw)
varlock on /var/lock type tmpfs (rw)
procbususb on /proc/bus/usb type usbfs (rw)
udev on /dev type tmpfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
devshm on /dev/shm type tmpfs (rw)
lrm on /lib/modules/2.6.15-25-386/volatile type tmpfs (rw)
/dev/hda1 on /C type ntfs (rw,nls=utf8,umask=007,gid=46)
/dev/hda2 on /D type vfat (rw,utf8,umask=007,gid=46)
/dev/md2 on /home type ext3 (rw)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
----------------- /etc/mdadm/mdadm.conf ----------------------------
DEVICE /dev/hda* /dev/sda*
ARRAY /dev/md0 devices=/dev/hda7,/dev/sda7
ARRAY /dev/md2 devices=/dev/hda8,/dev/sda8
----------------- mdadm -E /dev/md0 ----------------------------
# mdadm -E /dev/md0
mdadm: No super block found on /dev/md0 (Expected magic a92b4efc, got 5c5b8cb7)
----------------- mdadm -D /dev/md0 ----------------------------
/dev/md0:
Version : 00.90.03
Creation Time : Sun Aug 6 17:49:47 2006
Raid Level : raid1
Array Size : 12586816 (12.00 GiB 12.89 GB)
Device Size : 12586816 (12.00 GiB 12.89 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Sun Aug 27 22:17:05 2006
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
UUID : 198d19c1:54e1b9b4:06d2b37b:ad06ae48
Events : 0.1298
Number   Major   Minor   RaidDevice   State
   0       8       7         0        active sync   /dev/sda7
   1       3       7         1        active sync   /dev/hda7
----------------- mdadm -D /dev/md1 ----------------------------
/dev/md1:
Version : 00.90.03
Creation Time : Sun Aug 6 15:03:29 2006
Raid Level : raid1
Array Size : 121531584 (115.90 GiB 124.45 GB)
Device Size : 121531584 (115.90 GiB 124.45 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 1
Persistence : Superblock is persistent
Update Time : Sat Aug 12 09:24:07 2006
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
UUID : 4da55295:9574e998:08b9b3cb:7a82a70b
Events : 0.51059
Number   Major   Minor   RaidDevice   State
   0       8       8         0        active sync   /dev/sda8
   1       3       8         1        active sync   /dev/hda8
-------------- Weird stuff --------------------------------------
After mount -f -t ext3 /dev/hda7 /mnt/tmp
-------------------- mount --------------------------------------
/dev/hda6 on / type ext3 (rw,errors=remount-ro)
proc on /proc type proc (rw)
/sys on /sys type sysfs (rw)
varrun on /var/run type tmpfs (rw)
varlock on /var/lock type tmpfs (rw)
procbususb on /proc/bus/usb type usbfs (rw)
udev on /dev type tmpfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
devshm on /dev/shm type tmpfs (rw)
lrm on /lib/modules/2.6.15-25-386/volatile type tmpfs (rw)
/dev/hda1 on /C type ntfs (rw,nls=utf8,umask=007,gid=46)
/dev/hda2 on /D type vfat (rw,utf8,umask=007,gid=46)
/dev/md2 on /home type ext3 (rw)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
/dev/hda7 on /mnt/tmp type ext3 (rw)
/dev/sdb1 on /media/PD 12X type vfat (rw,nosuid,nodev,quiet,shortname=mixed,uid=1000,gid=1000,umask=077,iocharset=utf8)
-------------------- umount --------------------------------------
# umount /dev/hda7
umount: /dev/hda7: not mounted
umount: /dev/hda7: not mounted
--
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html