Bug#286276: kernel-image-2.6.8-9-amd64-k8: Unable to mount md devices
On Wed, Dec 22, 2004 at 03:39:32PM +0900, Horms wrote:
> On Tue, Dec 21, 2004 at 10:15:54AM -0800, Marc Singer wrote:
> > On Mon, Dec 20, 2004 at 02:52:50AM +0100, Goswin von Brederlow wrote:
> > > Marc Singer [EMAIL PROTECTED] writes:
> > > > On running mdadm --assemble /dev/md0
> > > >
> > > >   mdadm[4247]: segfault at 002c rip 0804b19e rsp db80 error 4
> > > >   mdadm[4253]: segfault at 002c rip 0804b19e rsp db80 error 4
> > >
> > > That doesn't mean much. Unaligned access gets reported as such as
> > > well, for example. Could you paste a gdb stack backtrace,
> > > preferably after rebuilding mdadm with debug info?
> >
> > I've had a look at the mdadm side of this problem. It looks like it
> > is crashing because there is no configuration file. From mdadm.c:
> >
> >         break;
> >     case ASSEMBLE:
> >         if (devs_found == 1 && ident.uuid_set == 0 &&
> >             ident.super_minor == UnSet && !scan ) {
> >             /* Only a device has been given, so get details from config file */
> >             mddev_ident_t array_ident = conf_get_ident(configfile, devlist->devname);
> >             mdfd = open_mddev(devlist->devname, array_ident->autof);
> >
> > conf_get_ident() is returning 0, which is then dereferenced
> > (array_ident->autof) in the open_mddev() call, hence the segfault.
> > configfile is NULL because none was given. AFAICT, configfile never
> > defaults. And even if it did, there is no config file on my machine.
> >
> > I haven't yet looked into the code within the initrd to see what it
> > does. It is possible that the only way that my setup can work is if
> > the md driver is compiled in. Still, I wonder about the ioctl errors
> > that I see with fdisk.
>
> I took a brief look at initrd and for reference have included how it
> handles mdadm below. I think that the important bit is that it passes
> not only the md device but its component devices to mdadm -A
> (--assemble). It seems to me that as your invocation does not do
> that, it is going into a code path in mdadm that has a bug in it and
> thus segfaults. In a nutshell, I don't think it is a kernel problem.
>
> --
> Horms
>
> getraid_mdadm() {
>         mdadm=$(mdadm -D $device) || {
>                 echo "$PROG: mdadm -D $device failed" >&2
>                 exit 1
>         }
>         eval "$(
>                 echo "$mdadm" | awk '
>                         $1 == "Raid" && $2 == "Level" { print "echo " $4; next }
>                         $1 == "Number" && $2 == "Major" { start = 1; next }
>                         $1 == "UUID" { print "uuid=" $3; start = 0; next }
>                         !start { next }
>                         $2 == 0 && $3 == 0 { next }
>                         { devices = devices " " $NF }
>                         END { print "devices='\''" devices "'\''" }
>                 '
>         )"
>         printf '%s\n' $devices
>         getroot
>         echo mdadm -A /devfs/md/$minor -R -u $uuid $devices \
>                 > md$minor-script
>         echo /sbin/mdadm >&6
> }

So, if I understand correctly, if I were to add the UUID of the devices
then I'd probably have a working solution? I can certainly try that.
Bug#286276: kernel-image-2.6.8-9-amd64-k8: Unable to mount md devices
On Wed, Dec 22, 2004 at 03:39:32PM +0900, Horms wrote:
> I took a brief look at initrd and for reference have included how it
> handles mdadm below. I think that the important bit is that it passes
> not only the md device but its component devices to mdadm -A
> (--assemble). It seems to me that as your invocation does not do
> that, it is going into a code path in mdadm that has a bug in it and
> thus segfaults. In a nutshell, I don't think it is a kernel problem.

Adding the UUID to the mdadm command line allows MD to assemble the
raid. So, that answers the question of why mdadm doesn't assemble the
array once the machine is booted.

Now, the question is this: with the md and raid modules added to the
initrd, why doesn't initrd find and start the raid array?

> --
> Horms
>
> getraid_mdadm() {
>         mdadm=$(mdadm -D $device) || {
>                 echo "$PROG: mdadm -D $device failed" >&2
>                 exit 1
>         }
>         eval "$(
>                 echo "$mdadm" | awk '
>                         $1 == "Raid" && $2 == "Level" { print "echo " $4; next }
>                         $1 == "Number" && $2 == "Major" { start = 1; next }
>                         $1 == "UUID" { print "uuid=" $3; start = 0; next }
>                         !start { next }
>                         $2 == 0 && $3 == 0 { next }
>                         { devices = devices " " $NF }
>                         END { print "devices='\''" devices "'\''" }
>                 '
>         )"
>         printf '%s\n' $devices
>         getroot
>         echo mdadm -A /devfs/md/$minor -R -u $uuid $devices \
>                 > md$minor-script
>         echo /sbin/mdadm >&6
> }
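For anyone wanting to reproduce by hand what the quoted initrd function does (pull the UUID and the member devices out of the mdadm -D output, then assemble with both given explicitly), here is a rough standalone sketch. build_assemble_cmd is a made-up helper name, not something from initrd-tools or mdadm, and it only prints the command instead of running it.

```sh
#!/bin/sh
# build_assemble_cmd MD_DEVICE   (hypothetical helper, for illustration only)
# Reads "mdadm --detail" style text on stdin and prints an explicit
# "mdadm --assemble" command naming the UUID and the member devices.
build_assemble_cmd() {
    awk -v md="$1" '
        $1 == "UUID" && $2 == ":" { uuid = $3; in_table = 0; next }
        $1 == "Number" && $2 == "Major" { in_table = 1; next }
        !in_table { next }
        $2 == 0 && $3 == 0 { next }      # skip empty or removed slots
        $NF ~ /^\// { devices = devices " " $NF }
        END {
            # Fail rather than print a half-formed command.
            if (uuid == "" || devices == "") exit 1
            printf "mdadm --assemble %s --run --uuid=%s%s\n", md, uuid, devices
        }
    '
}

# Usage sketch (needs a running array to query, so it is left commented out):
#   mdadm --detail /dev/md0 | build_assemble_cmd /dev/md0
```

Printing the command first makes it easy to check the UUID and device list before anything touches the array.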
Re: Bug#286276: kernel-image-2.6.8-9-amd64-k8: Unable to mount md devices
On Wed, 2004-12-22 at 01:23 -0800, Marc Singer wrote:
> On Wed, Dec 22, 2004 at 03:39:32PM +0900, Horms wrote:
> > I took a brief look at initrd and for reference have included how it
> > handles mdadm below. I think that the important bit is that it
> > passes not only the md device but its component devices to mdadm -A
> > (--assemble). It seems to me that as your invocation does not do
> > that, it is going into a code path in mdadm that has a bug in it and
> > thus segfaults. In a nutshell, I don't think it is a kernel problem.
>
> Adding the UUID to the mdadm command line allows MD to assemble the
> raid. So, that answers the question of why mdadm doesn't assemble the
> array once the machine is booted.
>
> Now, the question is this: with the md and raid modules added to the
> initrd, why doesn't initrd find and start the raid array?

I have an outstanding bug with this same issue. Except that I used
udevd, and it wasn't creating the /dev nodes soon enough. I had to
revert to a non-udevd /dev. This solved the mounting problems after the
initrd mounted the root fs; the filesystems are all md (mirrored). With
udev, mdadm could not find anything except /.

The following is what I have working beautifully.

[EMAIL PROTECTED]:~$ df
Filesystem  Size  Used Avail Use% Mounted on
/dev/md1    1.9G   60M  1.9G   4% /
tmpfs       475M     0  475M   0% /dev/shm
/dev/md0    139M  4.6M  134M   4% /boot
/dev/md4    9.6G   18M  9.6G   1% /home
/dev/md3    972M  656K  972M   1% /tmp
/dev/md5     15G  179M   15G   2% /usr
/dev/md2    9.6G  239M  9.3G   3% /var

Dunno if it is related.

--
greg, [EMAIL PROTECTED]
The technology that is Stronger, better, faster: Linux
Bug#286276: kernel-image-2.6.8-9-amd64-k8: Unable to mount md devices
On Mon, Dec 20, 2004 at 02:52:50AM +0100, Goswin von Brederlow wrote:
> Marc Singer [EMAIL PROTECTED] writes:
> > On running mdadm --assemble /dev/md0
> >
> >   mdadm[4247]: segfault at 002c rip 0804b19e rsp db80 error 4
> >   mdadm[4253]: segfault at 002c rip 0804b19e rsp db80 error 4
>
> That doesn't mean much. Unaligned access gets reported as such as
> well, for example. Could you paste a gdb stack backtrace, preferably
> after rebuilding mdadm with debug info?

I've had a look at the mdadm side of this problem. It looks like it is
crashing because there is no configuration file. From mdadm.c:

        break;
    case ASSEMBLE:
        if (devs_found == 1 && ident.uuid_set == 0 &&
            ident.super_minor == UnSet && !scan ) {
            /* Only a device has been given, so get details from config file */
            mddev_ident_t array_ident = conf_get_ident(configfile, devlist->devname);
            mdfd = open_mddev(devlist->devname, array_ident->autof);

conf_get_ident() is returning 0, which is then dereferenced
(array_ident->autof) in the open_mddev() call, hence the segfault.
configfile is NULL because none was given. AFAICT, configfile never
defaults. And even if it did, there is no config file on my machine.

I haven't yet looked into the code within the initrd to see what it
does. It is possible that the only way that my setup can work is if the
md driver is compiled in. Still, I wonder about the ioctl errors that I
see with fdisk.
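As a stopgap on the calling side, one could check for a matching ARRAY line before invoking mdadm with only the md device, so the code path above is never reached without a config entry. This is a minimal sketch under assumptions: has_array_entry is a hypothetical helper, the config path in the usage note is a guess that may differ per system, and the matching is simplified to the exact keyword "ARRAY" and the exact device name.

```sh
#!/bin/sh
# has_array_entry CONF MD_DEVICE   (hypothetical helper, for illustration only)
# Succeeds only if CONF is readable and contains a line of the form
# "ARRAY <MD_DEVICE> ...". Simplified matching: exact keyword, exact name.
has_array_entry() {
    [ -r "$1" ] || return 1
    awk -v md="$2" '
        $1 == "ARRAY" && $2 == md { found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# Usage sketch (the config path is an assumption; adjust for your system):
#   if has_array_entry /etc/mdadm/mdadm.conf /dev/md0; then
#       mdadm --assemble /dev/md0
#   else
#       echo "no ARRAY entry for /dev/md0; pass --uuid or the member devices" >&2
#   fi
```

This only avoids the crash from the outside; it does not replace a NULL check in mdadm itself.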
Bug#286276: kernel-image-2.6.8-9-amd64-k8: Unable to mount md devices
On Mon, Dec 20, 2004 at 02:52:50AM +0100, Goswin von Brederlow wrote:
> Marc Singer [EMAIL PROTECTED] writes:
> > On running mdadm --assemble /dev/md0
> >
> >   mdadm[4247]: segfault at 002c rip 0804b19e rsp db80 error 4
> >   mdadm[4253]: segfault at 002c rip 0804b19e rsp db80 error 4
>
> That doesn't mean much. Unaligned access gets reported as such as
> well, for example. Could you paste a gdb stack backtrace, preferably
> after rebuilding mdadm with debug info?

I'll see what I can do.
Bug#286276: kernel-image-2.6.8-9-amd64-k8: Unable to mount md devices
Hello,

On Sat, Dec 18, 2004 at 04:35:37PM -0800, Marc Singer wrote:
> I'm testing this kernel on a machine that presently boots to 2.6.7
> with no initrd and, therefore, all drivers for boot are compiled into
> the kernel. There are four drives in the array, all with type 0xfd.
> The 2.6.7 kernel finds them automatically. The 2.6.8 package does
> not, but that isn't the issue as this device isn't needed at
> boot-time.

Did you add the needed md and raid modules to your initrd?

> Once the kernel boots, I load the raid5 module and look at
> /proc/mdstat. It looks OK. mdadm --assemble /dev/md0, however,
> crashes. I haven't saved error messages, though I could write them to
> a file if need be.

You need to put the modules into /etc/mkinitrd/modules, and recreate
your initrd with

  mkinitrd -o /boot/initrd.img-2.6.8-9-amd64-k8 2.6.8-9-amd64-k8

and rerun lilo, if you use it.

This should fix the problem.

--
ENOSIG
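The modules-file step above could be scripted along these lines. This is a sketch only: add_module is a hypothetical helper name, and the module names in the usage note are the ones Marc reports adding elsewhere in this thread.

```sh
#!/bin/sh
# add_module FILE MODULE   (hypothetical helper, for illustration only)
# Appends MODULE to FILE unless a line with exactly that name is already
# there, so running it twice does not duplicate entries.
add_module() {
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Usage sketch (run as root, then rebuild the initrd as described above):
#   for m in md raid1 raid5; do add_module /etc/mkinitrd/modules "$m"; done
#   mkinitrd -o /boot/initrd.img-2.6.8-9-amd64-k8 2.6.8-9-amd64-k8
```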
Bug#286276: kernel-image-2.6.8-9-amd64-k8: Unable to mount md devices
On Sun, Dec 19, 2004 at 03:00:13PM +0100, Frederik Schueler wrote:
> Hello,
>
> On Sat, Dec 18, 2004 at 04:35:37PM -0800, Marc Singer wrote:
> > I'm testing this kernel on a machine that presently boots to 2.6.7
> > with no initrd and, therefore, all drivers for boot are compiled
> > into the kernel. There are four drives in the array, all with type
> > 0xfd. The 2.6.7 kernel finds them automatically. The 2.6.8 package
> > does not, but that isn't the issue as this device isn't needed at
> > boot-time.
>
> Did you add the needed md and raid modules to your initrd?

Since I don't care about it at boot-up, no. The root partition as well
as /usr and /sbin is on an IDE drive.

> > Once the kernel boots, I load the raid5 module and look at
> > /proc/mdstat. It looks OK. mdadm --assemble /dev/md0, however,
> > crashes. I haven't saved error messages, though I could write them
> > to a file if need be.
>
> You need to put the modules into /etc/mkinitrd/modules, and recreate
> your initrd with
>
>   mkinitrd -o /boot/initrd.img-2.6.8-9-amd64-k8 2.6.8-9-amd64-k8
>
> and rerun lilo, if you use it.
>
> This should fix the problem.

This isn't a boot-up problem. The MD driver was crashing when I loaded
the modules by hand after the system booted. Of course, I'll try it
anyway just to see if it makes a difference.

Thanks for responding.
Bug#286276: kernel-image-2.6.8-9-amd64-k8: Unable to mount md devices
Hi,

On Sun, Dec 19, 2004 at 10:08:08AM -0800, Marc Singer wrote:
> > > Once the kernel boots, I load the raid5 module and look at
> > > /proc/mdstat. It looks OK. mdadm --assemble /dev/md0, however,
> > > crashes. I haven't saved error messages, though I could write them
> > > to a file if need be.

Can you please post this error message?

Kind regards
Frederik Schueler

--
ENOSIG
Bug#286276: kernel-image-2.6.8-9-amd64-k8: Unable to mount md devices
Marc Singer [EMAIL PROTECTED] writes:
> On Sun, Dec 19, 2004 at 03:00:13PM +0100, Frederik Schueler wrote:
> > Hello,
> >
> > On Sat, Dec 18, 2004 at 04:35:37PM -0800, Marc Singer wrote:
> > > I'm testing this kernel on a machine that presently boots to 2.6.7
> > > with no initrd and, therefore, all drivers for boot are compiled
> > > into the kernel. There are four drives in the array, all with type
> > > 0xfd. The 2.6.7 kernel finds them automatically. The 2.6.8 package
> > > does not, but that isn't the issue as this device isn't needed at
> > > boot-time.
> >
> > Did you add the needed md and raid modules to your initrd?
>
> Since I don't care about it at boot-up, no. The root partition as
> well as /usr and /sbin is on an IDE drive.

Having raid built in enables the autodetection. Raid as modules needs
to get started manually.

> > > Once the kernel boots, I load the raid5 module and look at
> > > /proc/mdstat. It looks OK. mdadm --assemble /dev/md0, however,
> > > crashes. I haven't saved error messages, though I could write them
> > > to a file if need be.
> >
> > You need to put the modules into /etc/mkinitrd/modules, and recreate
> > your initrd with
> >
> >   mkinitrd -o /boot/initrd.img-2.6.8-9-amd64-k8 2.6.8-9-amd64-k8
> >
> > and rerun lilo, if you use it.
> >
> > This should fix the problem.
>
> This isn't a boot-up problem. The MD driver was crashing when I loaded
> the modules by hand after the system booted. Of course, I'll try it
> anyway just to see if it makes a difference.
>
> Thanks for responding.

If you can't get the modules to work or track down the bug, compile a
kernel with built-in raid support. I'm using a standard 2.6.8 compiled
with gcc 3.3.4 (1:3.3.4-11) and raid built in without problems.

MfG
Goswin
Bug#286276: kernel-image-2.6.8-9-amd64-k8: Unable to mount md devices
On Sun, Dec 19, 2004 at 08:40:51PM +0100, Goswin von Brederlow wrote:
> Marc Singer [EMAIL PROTECTED] writes:
> > On Sun, Dec 19, 2004 at 03:00:13PM +0100, Frederik Schueler wrote:
> > > Did you add the needed md and raid modules to your initrd?
> >
> > Since I don't care about it at boot-up, no. The root partition as
> > well as /usr and /sbin is on an IDE drive.
>
> Having raid built in enables the autodetection. Raid as modules needs
> to get started manually.

I got that part.

> If you can't get the modules to work or track down the bug, compile a
> kernel with built-in raid support. I'm using a standard 2.6.8 compiled
> with gcc 3.3.4 (1:3.3.4-11) and raid built in without problems.

The idea, here, is to test the debian kernel-image package. Compiling a
kernel kinda defeats that.
Bug#286276: kernel-image-2.6.8-9-amd64-k8: Unable to mount md devices
On Sun, Dec 19, 2004 at 03:00:13PM +0100, Frederik Schueler wrote:
> Hello,
>
> On Sat, Dec 18, 2004 at 04:35:37PM -0800, Marc Singer wrote:
> > I'm testing this kernel on a machine that presently boots to 2.6.7
> > with no initrd and, therefore, all drivers for boot are compiled
> > into the kernel. There are four drives in the array, all with type
> > 0xfd. The 2.6.7 kernel finds them automatically. The 2.6.8 package
> > does not, but that isn't the issue as this device isn't needed at
> > boot-time.
>
> Did you add the needed md and raid modules to your initrd?
>
> > Once the kernel boots, I load the raid5 module and look at
> > /proc/mdstat. It looks OK. mdadm --assemble /dev/md0, however,
> > crashes. I haven't saved error messages, though I could write them
> > to a file if need be.
>
> You need to put the modules into /etc/mkinitrd/modules, and recreate
> your initrd with
>
>   mkinitrd -o /boot/initrd.img-2.6.8-9-amd64-k8 2.6.8-9-amd64-k8

Added raid1 raid5 md to the modules file and booted with the updated
initrd. Indeed, the modules are loaded. The error persists.

> and rerun lilo, if you use it.

Using grub.

> This should fix the problem.

Here're the messages from the dmesg log:

On running fdisk -l:

  ioctl32(fdisk:4187): Unknown cmd fd(5) cmd(80081272){00} arg(dab8) on /dev/hda
  ioctl32(fdisk:4187): Unknown cmd fd(6) cmd(80081272){00} arg(dab8) on /dev/hdb
  ioctl32(fdisk:4187): Unknown cmd fd(7) cmd(80081272){00} arg(dab8) on /dev/hdd
  ioctl32(fdisk:4187): Unknown cmd fd(8) cmd(80081272){00} arg(dab8) on /dev/sda
  ioctl32(fdisk:4187): Unknown cmd fd(9) cmd(80081272){00} arg(dab8) on /dev/sdb
  ioctl32(fdisk:4187): Unknown cmd fd(10) cmd(80081272){00} arg(dab8) on /dev/sdc
  ioctl32(fdisk:4187): Unknown cmd fd(11) cmd(80081272){00} arg(dab8) on /dev/sdd

On running mdadm --assemble /dev/md0

  mdadm[4247]: segfault at 002c rip 0804b19e rsp db80 error 4
  mdadm[4253]: segfault at 002c rip 0804b19e rsp db80 error 4
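The cmd value in the ioctl32 lines can be taken apart by hand. A small sketch, assuming the usual Linux x86 ioctl number layout (2 bits direction, 14 bits argument size, 8 bits type, 8 bits number); decode_ioctl is a hypothetical helper name.

```sh
#!/bin/sh
# decode_ioctl HEX   (hypothetical helper, for illustration only)
# Splits an ioctl command number (hex digits without the 0x prefix) into
# its fields, assuming the generic layout:
#   bits 31-30 direction, 29-16 size, 15-8 type, 7-0 number
decode_ioctl() {
    cmd=$((0x$1))
    printf 'dir=%d size=%d type=0x%02x nr=%d\n' \
        $(( (cmd >> 30) & 3 )) \
        $(( (cmd >> 16) & 0x3fff )) \
        $(( (cmd >> 8) & 0xff )) \
        $(( cmd & 0xff ))
}

decode_ioctl 80081272
# prints: dir=2 size=8 type=0x12 nr=114
```

Direction 2 is a read, so this is a read-style ioctl of type 0x12, number 114, with an 8-byte argument. That appears to correspond to BLKGETSIZE64 as issued by a 32-bit fdisk, which would fit the amd64-kernel-with-i386-userland setup reported here; that reading is an interpretation, not something established in this thread.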
Bug#286276: kernel-image-2.6.8-9-amd64-k8: Unable to mount md devices
A couple of other data points:

  o The kernel-image 2.6.8-1-686 is also unable to mount the md device.
    There are no kernel messages when it fails. mdadm fails with a
    segmentation fault.

  o The 2.6.7 kernel is saying the following when it boots. The odd
    thing is that there is no /dev/md1.

      md: md1 stopped.
      md: could not lock sda.
      md: md_import_device returned -16
      md: could not lock sdd.
      md: md_import_device returned -16
      md: md1 stopped.
      md: could not lock sda.
      md: md_import_device returned -16
      md: could not lock sdd.
      md: md_import_device returned -16

FYI:

[EMAIL PROTECTED] ~ sudo mdadm --misc --detail /dev/md0
/dev/md0:
        Version : 00.90.01
  Creation Time : Thu May  8 15:16:00 2003
     Raid Level : raid5
     Array Size : 107739264 (102.75 GiB 110.33 GB)
    Device Size : 35913088 (34.25 GiB 36.78 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Sun Dec 19 13:34:03 2004
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 128K

           UUID : 9192b7da:86bf82ba:3d9b7289:5cae6f5c
         Events : 0.5778796

    Number   Major   Minor   RaidDevice State
       0       8       33        0      active sync   /dev/sdc1
       1       8       49        1      active sync   /dev/sdd1
       2       8        1        2      active sync   /dev/sda1
       3       8       17        3      active sync   /dev/sdb1
Bug#286276: kernel-image-2.6.8-9-amd64-k8: Unable to mount md devices
Marc Singer [EMAIL PROTECTED] writes:
> On running mdadm --assemble /dev/md0
>
>   mdadm[4247]: segfault at 002c rip 0804b19e rsp db80 error 4
>   mdadm[4253]: segfault at 002c rip 0804b19e rsp db80 error 4

That doesn't mean much. Unaligned access gets reported as such as well,
for example. Could you paste a gdb stack backtrace, preferably after
rebuilding mdadm with debug info?

MfG
Goswin
Bug#286276: kernel-image-2.6.8-9-amd64-k8: Unable to mount md devices
Package: kernel-image-2.6.8-9-amd64-k8
Version: 2.6.8-5
Severity: important

I'm testing this kernel on a machine that presently boots to 2.6.7 with
no initrd and, therefore, all drivers for boot are compiled into the
kernel. There are four drives in the array, all with type 0xfd. The
2.6.7 kernel finds them automatically. The 2.6.8 package does not, but
that isn't the issue as this device isn't needed at boot-time.

Once the kernel boots, I load the raid5 module and look at
/proc/mdstat. It looks OK. mdadm --assemble /dev/md0, however, crashes.
I haven't saved error messages, though I could write them to a file if
need be.

One of the differences here is that the 2.6.7 kernel is built for x86.
The 2.6.8 kernel was built for amd64, so it is possible that this is an
issue with 64bitness.

-- System Information:
Debian Release: 3.1
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: i386 (i686)
Kernel: Linux 2.6.7
Locale: LANG=en_US, LC_CTYPE=en_US (charmap=ISO-8859-1)

Versions of packages kernel-image-2.6.8-9-amd64-k8 depends on:
ii  coreutils [fileutils]  5.2.1-2    The GNU core utilities
ii  initrd-tools           0.1.74     tools to create initrd image for p
ii  module-init-tools      3.1-rel-2  tools for managing Linux kernel mo

-- no debconf information