Bug#286276: kernel-image-2.6.8-9-amd64-k8: Unable to mount md devices

2004-12-22 Marc Singer
On Wed, Dec 22, 2004 at 03:39:32PM +0900, Horms wrote:
 On Tue, Dec 21, 2004 at 10:15:54AM -0800, Marc Singer wrote:
  On Mon, Dec 20, 2004 at 02:52:50AM +0100, Goswin von Brederlow wrote:
   Marc Singer [EMAIL PROTECTED] writes:
   
On running mdadm --assemble /dev/md0
   
  mdadm[4247]: segfault at 002c rip 0804b19e rsp db80 error 4
  mdadm[4253]: segfault at 002c rip 0804b19e rsp db80 error 4
   
   That doesn't mean much. Unaligned access gets reported as such as
   well for example.
   
   Could you paste a gdb stack backtrace, preferably after rebuilding
   mdadm with debug info?
  
  I've had a look at the mdadm side of this problem.  It looks like it
  is crashing because there is no configuration file.  From mdadm.c
  
        break;
    case ASSEMBLE:
        if (devs_found == 1 && ident.uuid_set == 0 &&
            ident.super_minor == UnSet && !scan ) {
            /* Only a device has been given, so get details from config file */
            mddev_ident_t array_ident = conf_get_ident(configfile, devlist->devname);
            mdfd = open_mddev(devlist->devname, array_ident->autof);
  
  conf_get_ident() is returning NULL, which is then dereferenced in the
  arguments to the open_mddev() call, causing the segfault.  configfile
  is NULL because none was given.  AFAICT, configfile never defaults.
  And even if it did, there is no config file on my machine.
  
  I haven't yet looked into the code within the initrd to see what it
  does.  It is possible that the only way that my setup can work is if
  the md driver is compiled in.
  
  Still, I wonder about the ioctl errors that I see with fdisk.
 
 I took a brief look at initrd and for reference have included
 how it handles mdadm below. I think that the important bit
 is that it passes not only the md device but its component
 devices to mdadm -A (--assemble). It seems to me that as your
 invocation does not do that, it is going into a code path
 in mdadm that has a bug in it and thus segfaults. In a nutshell,
 I don't think it is a kernel problem.
 
 -- 
 Horms
 
 getraid_mdadm() {
     mdadm=$(mdadm -D "$device") || {
         echo "$PROG: mdadm -D $device failed" >&2
         exit 1
     }
     eval "$(
         echo "$mdadm" | awk '
             $1 == "Raid" && $2 == "Level" { print "echo " $4; next }
             $1 == "Number" && $2 == "Major" { start = 1; next }
             $1 == "UUID" { print "uuid=" $3; start = 0; next }
             !start { next }
             $2 == 0 && $3 == 0 { next }
             { devices = devices " " $NF }
             END { print "devices='\''" devices "'\''" }
         '
     )"

     printf '%s\n' $devices > getroot
     echo "mdadm -A /devfs/md/$minor -R -u $uuid $devices" \
         > md$minor-script
     echo "/sbin/mdadm" >&6
 }

So, if I understand correctly, if I were to add the UUID of the
devices then I'd probably have a working solution?  I can certainly
try that.





Bug#286276: kernel-image-2.6.8-9-amd64-k8: Unable to mount md devices

2004-12-22 Marc Singer
On Wed, Dec 22, 2004 at 03:39:32PM +0900, Horms wrote:
 I took a brief look at initrd and for reference have included
 how it handles mdadm below. I think that the important bit
 is that it passes not only the md device but its component
 devices to mdadm -A (--assemble). It seems to me that as your
 invocation does not do that, it is going into a code path
 in mdadm that has a bug in it and thus segfaults. In a nutshell,
 I don't think it is a kernel problem.

Adding the UUID to the mdadm command line allows MD to assemble the
raid.  So that answers the question of why mdadm doesn't assemble the
array once the machine is booted.

Now, the question is this: with the md and raid modules added to the
initrd, why doesn't initrd find and start the raid array?

 




Re: Bug#286276: kernel-image-2.6.8-9-amd64-k8: Unable to mount md devices

2004-12-22 Greg Folkert
On Wed, 2004-12-22 at 01:23 -0800, Marc Singer wrote:
 On Wed, Dec 22, 2004 at 03:39:32PM +0900, Horms wrote:
  I took a brief look at initrd and for reference have included
  how it handles mdadm below. I think that the important bit
  is that it passes not only the md device but its component
  devices to mdadm -A (--assemble). It seems to me that as your
  invocation does not do that, it is going into a code path
  in mdadm that has a bug in it and thus segfaults. In a nutshell,
  I don't think it is a kernel problem.
 
 Adding the UUID to the mdadm command line allows MD to assemble the
 raid.  So that answers the question of why mdadm doesn't assemble the
 array once the machine is booted.
 
 Now, the question is this: with the md and raid modules added to the
 initrd, why doesn't initrd find and start the raid array?

I have an outstanding bug with this same issue, except that I was using
udevd and it wasn't creating the /dev nodes soon enough. I had to revert
to a non-udevd /dev. This solved the mounting problems after the initrd
mounted the root fs; they are all md (mirrored).

With udev, mdadm could not find anything except /. The following is what
I have working beautifully.

[EMAIL PROTECTED]:~$ df
Filesystem            Size  Used Avail Use% Mounted on
/dev/md1              1.9G   60M  1.9G   4% /
tmpfs                 475M     0  475M   0% /dev/shm
/dev/md0              139M  4.6M  134M   4% /boot
/dev/md4              9.6G   18M  9.6G   1% /home
/dev/md3              972M  656K  972M   1% /tmp
/dev/md5               15G  179M   15G   2% /usr
/dev/md2              9.6G  239M  9.3G   3% /var


Dunno if it is related.
-- 
greg, [EMAIL PROTECTED]

The technology that is
Stronger, better, faster: Linux




Bug#286276: kernel-image-2.6.8-9-amd64-k8: Unable to mount md devices

2004-12-21 Marc Singer
On Mon, Dec 20, 2004 at 02:52:50AM +0100, Goswin von Brederlow wrote:
 Marc Singer [EMAIL PROTECTED] writes:
 
  On running mdadm --assemble /dev/md0
 
   mdadm[4247]: segfault at 002c rip 0804b19e rsp db80 error 4
   mdadm[4253]: segfault at 002c rip 0804b19e rsp db80 error 4
 
 That doesn't mean much. Unaligned access gets reported as such as
 well for example.
 
 Could you paste a gdb stack backtrace, preferably after rebuilding
 mdadm with debug info?

I've had a look at the mdadm side of this problem.  It looks like it
is crashing because there is no configuration file.  From mdadm.c

      break;
    case ASSEMBLE:
      if (devs_found == 1 && ident.uuid_set == 0 &&
          ident.super_minor == UnSet && !scan ) {
        /* Only a device has been given, so get details from config file */
        mddev_ident_t array_ident = conf_get_ident(configfile, devlist->devname);
        mdfd = open_mddev(devlist->devname, array_ident->autof);

conf_get_ident() is returning NULL, which is then dereferenced in the
arguments to the open_mddev() call, causing the segfault.  configfile
is NULL because none was given.  AFAICT, configfile never defaults.
And even if it did, there is no config file on my machine.

I haven't yet looked into the code within the initrd to see what it
does.  It is possible that the only way that my setup can work is if
the md driver is compiled in.

Still, I wonder about the ioctl errors that I see with fdisk.






Bug#286276: kernel-image-2.6.8-9-amd64-k8: Unable to mount md devices

2004-12-20 Marc Singer
On Mon, Dec 20, 2004 at 02:52:50AM +0100, Goswin von Brederlow wrote:
 Marc Singer [EMAIL PROTECTED] writes:
 
  On running mdadm --assemble /dev/md0
 
   mdadm[4247]: segfault at 002c rip 0804b19e rsp db80 error 4
   mdadm[4253]: segfault at 002c rip 0804b19e rsp db80 error 4
 
 That doesn't mean much. Unaligned access gets reported as such as
 well for example.
 
 Could you paste a gdb stack backtrace, preferably after rebuilding
 mdadm with debug info?

I'll see what I can do.




Bug#286276: kernel-image-2.6.8-9-amd64-k8: Unable to mount md devices

2004-12-19 Frederik Schueler
Hello,

On Sat, Dec 18, 2004 at 04:35:37PM -0800, Marc Singer wrote:
 I'm testing this kernel on a machine that presently boots to 2.6.7 with
 no initrd and, therefore, all drivers for boot are compiled into the
 kernel.  There are four drives in the array, all with type 0xfd.  The
 2.6.7 kernel finds them automatically.  The 2.6.8 package does not,
 but that isn't the issue as this device isn't needed at boot-time.
 
Did you add the needed md and raid modules to your initrd?

 Once the kernel boots, I load the raid5 module and look at
 /proc/mdstat.  It looks OK.  mdadm --assemble /dev/md0, however,
 crashes.  I haven't saved error messages, though I could write them to
 a file if need be.

you need to put the modules into /etc/mkinitrd/modules, and recreate your
initrd with 

mkinitrd -o /boot/initrd.img-2.6.8-9-amd64-k8 2.6.8-9-amd64-k8

and rerun lilo, if you use it.

This should fix the problem.
 
-- 
ENOSIG




Bug#286276: kernel-image-2.6.8-9-amd64-k8: Unable to mount md devices

2004-12-19 Marc Singer
On Sun, Dec 19, 2004 at 03:00:13PM +0100, Frederik Schueler wrote:
 Hello,
 
 On Sat, Dec 18, 2004 at 04:35:37PM -0800, Marc Singer wrote:
  I'm testing this kernel on a machine that presently boots to 2.6.7 with
  no initrd and, therefore, all drivers for boot are compiled into the
  kernel.  There are four drives in the array, all with type 0xfd.  The
  2.6.7 kernel finds them automatically.  The 2.6.8 package does not,
  but that isn't the issue as this device isn't needed at boot-time.
  
 Did you add the needed md and raid modules to your initrd?

Since I don't care about it at boot-up, no.  The root partition, as well
as /usr and /sbin, is on an IDE drive.

  Once the kernel boots, I load the raid5 module and look at
  /proc/mdstat.  It looks OK.  mdadm --assemble /dev/md0, however,
  crashes.  I haven't saved error messages, though I could write them to
  a file if need be.
 
 you need to put the modules into /etc/mkinitrd/modules, and recreate your
 initrd with 
 
 mkinitrd -o /boot/initrd.img-2.6.8-9-amd64-k8 2.6.8-9-amd64-k8
 
 and rerun lilo, if you use it.
 
 This should fix the problem.

This isn't a boot-up problem.  The MD driver was crashing when I
loaded the modules by hand after the system booted.

Of course, I'll try it anyway just to see if it makes a difference.
Thanks for responding.





Bug#286276: kernel-image-2.6.8-9-amd64-k8: Unable to mount md devices

2004-12-19 Frederik Schueler
Hi,

On Sun, Dec 19, 2004 at 10:08:08AM -0800, Marc Singer wrote:
   Once the kernel boots, I load the raid5 module and look at
   /proc/mdstat.  It looks OK.  mdadm --assemble /dev/md0, however,
   crashes.  I haven't saved error messages, though I could write them to
   a file if need be.
 
Can you please post this error message? 

Kind regards
Frederik Schueler

-- 
ENOSIG




Bug#286276: kernel-image-2.6.8-9-amd64-k8: Unable to mount md devices

2004-12-19 Goswin von Brederlow
Marc Singer [EMAIL PROTECTED] writes:

 On Sun, Dec 19, 2004 at 03:00:13PM +0100, Frederik Schueler wrote:
 Hello,
 
 On Sat, Dec 18, 2004 at 04:35:37PM -0800, Marc Singer wrote:
  I'm testing this kernel on a machine that presently boots to 2.6.7 with
  no initrd and, therefore, all drivers for boot are compiled into the
  kernel.  There are four drives in the array, all with type 0xfd.  The
  2.6.7 kernel finds them automatically.  The 2.6.8 package does not,
  but that isn't the issue as this device isn't needed at boot-time.
  
 Did you add the needed md and raid modules to your initrd?

 Since I don't care about it at boot-up, no.  The root partition, as well
 as /usr and /sbin, is on an IDE drive.

Having raid built in enables autodetection. Raid built as modules needs
to be started manually.

  Once the kernel boots, I load the raid5 module and look at
  /proc/mdstat.  It looks OK.  mdadm --assemble /dev/md0, however,
  crashes.  I haven't saved error messages, though I could write them to
  a file if need be.
 
 you need to put the modules into /etc/mkinitrd/modules, and recreate your
 initrd with 
 
 mkinitrd -o /boot/initrd.img-2.6.8-9-amd64-k8 2.6.8-9-amd64-k8
 
 and rerun lilo, if you use it.
 
 This should fix the problem.

 This isn't a boot-up problem.  The MD driver was crashing when I
 loaded the modules by hand after the system booted.

 Of course, I'll try it anyway just to see if it makes a difference.
 Thanks for responding.

If you can't get the modules to work or track down the bug, compile a
kernel with built-in raid support. I'm using a standard 2.6.8 compiled
with gcc 3.3.4 (1:3.3.4-11) with raid built in, without problems.

Regards,
Goswin




Bug#286276: kernel-image-2.6.8-9-amd64-k8: Unable to mount md devices

2004-12-19 Marc Singer
On Sun, Dec 19, 2004 at 08:40:51PM +0100, Goswin von Brederlow wrote:
 Marc Singer [EMAIL PROTECTED] writes:
 
  On Sun, Dec 19, 2004 at 03:00:13PM +0100, Frederik Schueler wrote:
  Did you add the needed md and raid modules to your initrd?
 
  Since I don't care about it at boot-up, no.  The root partition, as well
  as /usr and /sbin, is on an IDE drive.
 
 Having raid built in enables autodetection. Raid built as modules needs
 to be started manually.

I got that part.

 If you can't get the modules to work or track down the bug, compile a
 kernel with built-in raid support. I'm using a standard 2.6.8 compiled
 with gcc 3.3.4 (1:3.3.4-11) with raid built in, without problems.
 

The idea here is to test the Debian kernel-image package.  Compiling
a kernel kinda defeats that.




Bug#286276: kernel-image-2.6.8-9-amd64-k8: Unable to mount md devices

2004-12-19 Marc Singer
On Sun, Dec 19, 2004 at 03:00:13PM +0100, Frederik Schueler wrote:
 Hello,
 
 On Sat, Dec 18, 2004 at 04:35:37PM -0800, Marc Singer wrote:
  I'm testing this kernel on a machine that presently boots to 2.6.7 with
  no initrd and, therefore, all drivers for boot are compiled into the
  kernel.  There are four drives in the array, all with type 0xfd.  The
  2.6.7 kernel finds them automatically.  The 2.6.8 package does not,
  but that isn't the issue as this device isn't needed at boot-time.
  
 Did you add the needed md and raid modules to your initrd?
 
  Once the kernel boots, I load the raid5 module and look at
  /proc/mdstat.  It looks OK.  mdadm --assemble /dev/md0, however,
  crashes.  I haven't saved error messages, though I could write them to
  a file if need be.
 
 you need to put the modules into /etc/mkinitrd/modules, and recreate your
 initrd with 
 
 mkinitrd -o /boot/initrd.img-2.6.8-9-amd64-k8 2.6.8-9-amd64-k8

Added 

  raid1
  raid5
  md

to the modules file and booted with the updated initrd.  Indeed, the
modules are loaded.  The error persists.

 
 and rerun lilo, if you use it.

Using grub.

 This should fix the problem.

Here're the messages from the dmesg log:

On running fdisk -l:

  ioctl32(fdisk:4187): Unknown cmd fd(5) cmd(80081272){00} arg(dab8) on /dev/hda
  ioctl32(fdisk:4187): Unknown cmd fd(6) cmd(80081272){00} arg(dab8) on /dev/hdb
  ioctl32(fdisk:4187): Unknown cmd fd(7) cmd(80081272){00} arg(dab8) on /dev/hdd
  ioctl32(fdisk:4187): Unknown cmd fd(8) cmd(80081272){00} arg(dab8) on /dev/sda
  ioctl32(fdisk:4187): Unknown cmd fd(9) cmd(80081272){00} arg(dab8) on /dev/sdb
  ioctl32(fdisk:4187): Unknown cmd fd(10) cmd(80081272){00} arg(dab8) on /dev/sdc
  ioctl32(fdisk:4187): Unknown cmd fd(11) cmd(80081272){00} arg(dab8) on /dev/sdd

On running mdadm --assemble /dev/md0

  mdadm[4247]: segfault at 002c rip 0804b19e rsp db80 error 4
  mdadm[4253]: segfault at 002c rip 0804b19e rsp db80 error 4





Bug#286276: kernel-image-2.6.8-9-amd64-k8: Unable to mount md devices

2004-12-19 Marc Singer
A couple of other data points:

  o The kernel-image 2.6.8-1-686 is also unable to mount the md
    device.  There are no kernel messages when it fails.  mdadm fails
    with a segmentation fault.
  o The 2.6.7 kernel says the following when it boots.  The odd
    thing is that there is no /dev/md1:
md: md1 stopped.
md: could not lock sda.
md: md_import_device returned -16
md: could not lock sdd.
md: md_import_device returned -16
md: md1 stopped.
md: could not lock sda.
md: md_import_device returned -16
md: could not lock sdd.
md: md_import_device returned -16


FYI:

[EMAIL PROTECTED] ~  sudo mdadm --misc --detail /dev/md0
/dev/md0:
        Version : 00.90.01
  Creation Time : Thu May  8 15:16:00 2003
     Raid Level : raid5
     Array Size : 107739264 (102.75 GiB 110.33 GB)
    Device Size : 35913088 (34.25 GiB 36.78 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Sun Dec 19 13:34:03 2004
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 128K

           UUID : 9192b7da:86bf82ba:3d9b7289:5cae6f5c
         Events : 0.5778796

    Number   Major   Minor   RaidDevice State
       0       8       33        0      active sync   /dev/sdc1
       1       8       49        1      active sync   /dev/sdd1
       2       8        1        2      active sync   /dev/sda1
       3       8       17        3      active sync   /dev/sdb1
 




Bug#286276: kernel-image-2.6.8-9-amd64-k8: Unable to mount md devices

2004-12-19 Goswin von Brederlow
Marc Singer [EMAIL PROTECTED] writes:

 On running mdadm --assemble /dev/md0

   mdadm[4247]: segfault at 002c rip 0804b19e rsp db80 error 4
   mdadm[4253]: segfault at 002c rip 0804b19e rsp db80 error 4

That doesn't mean much. Unaligned access gets reported as such as
well for example.

Could you paste a gdb stack backtrace, preferably after rebuilding
mdadm with debug info?

Regards,
Goswin






Bug#286276: kernel-image-2.6.8-9-amd64-k8: Unable to mount md devices

2004-12-18 Marc Singer
Package: kernel-image-2.6.8-9-amd64-k8
Version: 2.6.8-5
Severity: important

I'm testing this kernel on a machine that presently boots to 2.6.7 with
no initrd and, therefore, all drivers for boot are compiled into the
kernel.  There are four drives in the array, all with type 0xfd.  The
2.6.7 kernel finds them automatically.  The 2.6.8 package does not,
but that isn't the issue as this device isn't needed at boot-time.

Once the kernel boots, I load the raid5 module and look at
/proc/mdstat.  It looks OK.  mdadm --assemble /dev/md0, however,
crashes.  I haven't saved error messages, though I could write them to
a file if need be.

One of the differences here is that the 2.6.7 kernel is built for
x86.  The 2.6.8 kernel was built for amd64, so it is possible that
this is an issue with 64bitness. 


-- System Information:
Debian Release: 3.1
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: i386 (i686)
Kernel: Linux 2.6.7
Locale: LANG=en_US, LC_CTYPE=en_US (charmap=ISO-8859-1)

Versions of packages kernel-image-2.6.8-9-amd64-k8 depends on:
ii  coreutils [fileutils]  5.2.1-2     The GNU core utilities
ii  initrd-tools           0.1.74      tools to create initrd image for p
ii  module-init-tools      3.1-rel-2   tools for managing Linux kernel mo

-- no debconf information