Bug#422217: linux-image-2.6.20-1-686: SCSI disks initialised too late for mdadm

2007-05-11 Thread Simon A. Boggis
I've done my experiment with initramfs-tools - putting a 'sleep 10'
before mount_root lets my machine boot the 2.6.20 kernel, as I suspected in my
original email:

# diff -u /usr/share/initramfs-tools/init{.orig,}
--- /usr/share/initramfs-tools/init.orig 2007-03-07 22:30:42.0 +
+++ /usr/share/initramfs-tools/init 2007-05-11 14:33:55.0 +0100
@@ -145,6 +145,12 @@
 run_scripts /scripts/init-premount
 [ $quiet != y ] && log_end_msg

+#SAB
+log_begin_msg SAB: slow SCSI disk discovery workaround: sleeping for 10 seconds
+/bin/sleep 10
+log_end_msg
+#SAB
+
 maybe_break mount
 log_begin_msg Mounting root file system...
 . /scripts/${BOOT}

# update-initramfs -k 2.6.20-1-686 -d

# update-initramfs -k 2.6.20-1-686 -c

# update-grub

# shutdown -r now

Boot log captured from the serial-over-LAN console (please excuse the strange characters):

Begin: Running /scripts/init-premount
ACPI: Processor [CPU2] (supports 8 throttling states)
usbcore: registered newdriver 3.04.03
Copyright (c) 1999-2007 LSI Logic Corporation
e1000: :04:04.0: e1000_probe: (PCI-X:100MHz:64-bit) 00:04:23:c5:10:d6
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
e1000: eth0: e1hdd: Slimtype COMBO SOSC-2483K, ATAPI CD/DVD-ROM drive
ide1 at 0x170-0x177,0x376 on irq 15
ACPioc0: 53C1030: Capabilities={Initiator}
scsi0 : ioc0: LSI53C1030, FwRev=01032700h, Ports=1, MaxQ=222, IRQ=24
ACPI: PCI Interrupt :02:05.1[B] - GSI 25 (level, low) - IRQ 25
mptbase: Initiating ioc1 bringup
ioc1: 53C1030: Capabilities={Initiator}
scsi 0:0:0:0: Direct-Access SEAGATE  ST336754LC 0005 PQ: 0 ANSI: 3
 target0:0:0: Beginning Domain Validation
scsi1 : ioc1: LSI53C1030, FwRev=01032700h, Ports=1, MaxQ=222, IRQ=25
 target0:0:0: Ending Domain Validation
 target0:0:0: FAST-160 WIDE SCSI 320.0 MB/s DT IU QAS RTI WRFLOW PCOMP (6.25 ns, offset 63)
scsi 0:0:1:0: Direct-Access SEAGATE  ST336754LC 0005 PQ: 0 ANSI: 3
 target0:0:1: Beginning Domain Validation
 target0:0:1: Ending Domain Validation
 target0:0:1: FAST-160 WIDE SCSI 320.0 MB/s DT IU QAS RTI WRFLOW PCOMP (6.25 ns, offset 63)
scsi 0:0:2:0: Direct-Acces:2: Beginning Domain Validation
 target0:0:2: Ending Domain Validation
 target0:0:2: FAST-160   SEAGATE  ST336807LC   0C01 PQ: 0 ANSI: 3
 target0:0:3: Beginning Domain Validation
ACPI: PCI Interrupt :03:04.0[A] - GSI 24 (level, low) - IRQ 26
e100: eth4: e100_probe: addr 0xdecfe000, irq 26, MAC addr 00:02:B3:B4:3C:15
ACPI: PCI Interrupt :03:05.0[A] - GSI 27 (level, low) - IRQ 27
e100: eth5: e100_probe: addr 0xdecff000, irq 27, MAC addr 00:02:B3:B4:3C:1rive, 2048kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.20
Done.
Begin: SAB: slow SCSIvery workaround: sleeping for 10 seconds ...
 target0:0:3: Ending Domain Validation
 target0:0:3: FAST-160 WIDE SCSI 320.0 MB/s DT IU QAS RTI WRFLOW PCOMP (6.25 ns, offset 63)
scsi 0:0:4:0: Direct-Access SEAGATE  ST336754LC 0005 PQ: 0 ANSI: 3
 target0:0:4: Beginning Domain Validation
 target0:0:4: Ending Domain Validation
 target0:0:4: FAST-160 WIDE SCSI 320.0 MB/s DT IU QAS RTI WRFLOW PCOMP (6.25 ns, offset 63)
scsi 0:0:5:0: Direct-Access SEAGATE  ST336754LC 0005 PQ: 0 ANSI: 3
 target0:0:5: Beginning Domain Validation
 target0:0:5: Ending Domain Validation
 target0:0:5: FAST-160 WIDE SCSI 320.0 MB/s DT IU QAS RTI WRocessor  ESG-SHV  SCA HSBP M29 1.06 PQ: 0 ANSI: 2
 target0:0:6: Beginning Domain ValSCSI device sda: 71687372 512-byte hdwr sectors (36704 MB)
sda: Write Protect is off
SCSI device sda: write cache: enabled, read cache: enabled, supports DPO and FUA
SCSI device sda: 71687372 512-byte hdwr sectors (36704 MB)
sda: Write Protect is off
SCSI device sda: write cache: enabled, read cache: enabled, supports DPO and FUA
 sda: sda1 sda2 sda3 < sda5 sda6 sda7 >
sd 0:0:0:0: Attached scsi disk sda
SCSI device sdb: 71687372 512-byte hdwr sectors (36704 MB)
sdb: Write Protect is off
SCSI device sdb: write cache: enabled, read cache: enabled, supports DPO and FUA
SCSIis off
SCSI device sdb: write cache: enabled, read cache: enabled, supports DPO and FUA
 sdb:  71687372 512-byte hdwr sectors (36704 MB)
sdc: Write Protect is off
SCSI device sdc: write ca device sdc: write cache: enabled, read cache: enabled, supports DPO and FUA
 sdc: unknown part Write Protect is off
SCSI device sdd: write cache: enabled, read cache: enabled, supports DPO
ed, supports DPO and FUA
 sdd: sdd4
 sdd4: bsd:bad subpartition - ignored
bad subpartition -0:3:0: Attached scsi disk sdd
SCSI device sde: 71687372 512-byte hdwr sectors (36704 MB)
sde: 12-byte hdwr sectors (36704 MB)
sde: Write Protect is off
SCSI device sde: write cache: enable2 512-byte hdwr sectors (36704 MB)
sdf: Write Protect is off
SCSI device sdf: write cache: enate cache: enabled, read cache: enabled, supports DPO and FUA
 sdf: unknown partition table
sd Done.
Begin: Mounting root file system... ...
Begin: Running /scripts/local-top ...
Begin: Loading Mmd: 

Bug#422217: linux-image-2.6.20-1-686: SCSI disks initialised too late for mdadm

2007-05-11 Thread Simon A. Boggis
Stephen Gran wrote:
 This one time, at band camp, Simon A. Boggis said:
 I've done my experiment with initramfs-tools - putting a 'sleep 10'
 before mount_root makes my machine boot the kernel, as I suspected in my
 original email:

 [diff snipped]
 
 Not that I'm involved in this in any real way, but things like hardcoded
 sleep timeouts always make me uncomfortable - they introduce delays for
 people who don't need them, and they are racy at best and can still fail
 for the people who do need them.  Is there some way to use udevsettle or
 something instead?  If not, some method of sleep until $disk seems
 better than hardcoding it, to me at least.

I completely agree with you - it's totally the wrong thing to do, since
another SCSI card (or more cards, or slower devices) could take even longer.
The only reason I did it was to prove (as opposed to guess) that the
problem really is a race between SCSI becoming ready and mount_root.
This has now been shown to be the case, so the next questions are what
is the cause and can it be fixed properly?
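A minimal sketch of the "sleep until $disk" idea from Stephen's reply, written in POSIX sh to match the initramfs init script: poll once per second until every named device node exists, and give up after a timeout instead of always sleeping a fixed 10 seconds. The function name `wait_for_devices` and the device names /dev/sda and /dev/sdb are illustrative assumptions, not part of initramfs-tools.

```shell
#!/bin/sh
# Hypothetical helper, not part of initramfs-tools.
# wait_for_devices TIMEOUT DEV...: poll once per second until every DEV
# exists; return 0 as soon as they all do, or 1 after TIMEOUT seconds.
wait_for_devices() {
	timeout=$1
	shift
	waited=0
	while :; do
		missing=0
		for dev in "$@"; do
			[ -e "$dev" ] || missing=1
		done
		[ "$missing" -eq 0 ] && return 0
		[ "$waited" -ge "$timeout" ] && return 1
		sleep 1
		waited=$((waited + 1))
	done
}

# Example (device names are placeholders for the two RAID 0 members),
# placed where the diff puts 'sleep 10', i.e. before /scripts/local-top
# assembles the array:
#   wait_for_devices 30 /dev/sda /dev/sdb || echo "SCSI disks not found" >&2
```

Unlike the fixed `sleep 10`, this returns as soon as the disks appear, so machines with fast discovery pay no extra delay. It still has to be told which devices to wait for, and the timeout is an arbitrary upper bound rather than a guarantee.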

Ideally one would like something like (in pseudo-code):

if has_scsi:
  start_scsi_in_blocking_mode
mount_root

or, if it won't block:

if has_scsi:
  start_scsi_in_non-blocking_mode
  wait_until_scsi_ready
mount_root
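One hedged way to fill in `wait_until_scsi_ready` from the non-blocking variant, without knowing the disk names in advance, is to treat discovery as finished once the number of `sd*` entries under /sys/block has stopped changing for a few seconds. This is a heuristic assumption, not something the kernel guarantees. `has_scsi`, `count_scsi_disks` and `wait_until_scsi_settled` are hypothetical helpers, and the SYSBLOCK and SYSCLASS overrides exist only to make the sketch testable.

```shell
#!/bin/sh
# Hypothetical sketch, not initramfs-tools code. SYSBLOCK and SYSCLASS
# default to the real sysfs paths and are overridable only for testing.

# has_scsi: succeed if at least one SCSI host adapter is registered.
has_scsi() {
	for h in "${SYSCLASS:-/sys/class}"/scsi_host/host*; do
		[ -e "$h" ] && return 0
	done
	return 1
}

# count_scsi_disks: print how many sd* block devices currently exist.
count_scsi_disks() {
	n=0
	for d in "${SYSBLOCK:-/sys/block}"/sd*; do
		[ -e "$d" ] && n=$((n + 1))
	done
	echo "$n"
}

# wait_until_scsi_settled STABLE TIMEOUT: poll once per second; succeed once
# at least one disk exists and the count has not changed for STABLE
# consecutive polls; fail if TIMEOUT seconds pass first.
wait_until_scsi_settled() {
	stable_needed=$1
	timeout=$2
	last=$(count_scsi_disks)
	stable=0
	waited=0
	while [ "$waited" -lt "$timeout" ]; do
		sleep 1
		waited=$((waited + 1))
		now=$(count_scsi_disks)
		if [ "$now" -gt 0 ] && [ "$now" -eq "$last" ]; then
			stable=$((stable + 1))
			[ "$stable" -ge "$stable_needed" ] && return 0
		else
			stable=0
			last=$now
		fi
	done
	return 1
}
```

Usage, before the RAID members are assembled: `if has_scsi; then wait_until_scsi_settled 3 60; fi`. The stability window is still a guess: an adapter that registers late, such as the second LSI53C1030 (ioc1) in the logs here, could leave the count unchanged for a while, so this narrows the race Stephen describes rather than eliminating it.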

It is interesting that the behaviour differs between 2.6.18 and
2.6.20 - this implies either that SCSI initialisation blocked in 2.6.18 or that we were
just lucky and SCSI initialisation won the race. I haven't had time to
work out what might have changed in 2.6.20 yet.

Best wishes,

Simon





Bug#422217: linux-image-2.6.20-1-686: SCSI disks initialised too late for mdadm

2007-05-04 Thread Simon A. Boggis
Package: linux-image-2.6.20-1-686
Version: 2.6.20-3
Severity: critical
Justification: breaks the whole system


Hi,

I've filed this bug under this package since, although one could
argue that initramfs-tools has the problem, the difference appears to be
the kernel version.

My machine is configured with a software RAID 0 (mdadm) root
filesystem composed of two SCSI drives.

If I run the stock Debian etch linux-image-2.6.18-4-686 
(2.6.18.dfsg.1-12etch1) kernel, everything works as expected. If I 
attempt to boot the linux-image-2.6.20-1-686 (2.6.20-3) kernel from 
unstable, my system hangs on boot.

Examining captures of the boot process shows that on 2.6.18-4-686 we 
see (excuse slight hiccups in formatting - imperfect capture from 
serial over LAN console):

Begin: Running /scripts/init-premount ...
ACPI: Processor [usbcore: registered new driver usbfs
usbcore: registered new driver hub
SCSI subsystem initiale1000: :04:04.0: e1000_probe: (PCI-X:100MHz:64-bit) 00:04:23:c5:10:d6
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; overrid 1999-2005 LSI Logic Corporation
USB Universal Host Controller Interface driver v3.0
e1000: ett :00:1d.0[A] - GSI 16 (level, low) - IRQ 169
uhci_hcd :00:1d.0: UHCI Host Controllerhdd: Slimtype COMBO SOSC-2483K, ATAPI CD/DVD-ROM drive
scsi0 : ioc0: LSI53C1030, FwRev=01032700Vendor: SEAGATE   Model: ST336754LC Rev: 0005
Type:   Direct-Access  ANSI SCSI revision: 03
 target0:0:0: Beginning Domain Validation
 target0:0:0: Ending Domain Validation
 target0:0:0: FAST-160 WIDE SCSI 320.0 MB/s DT IU QAS RTI WRFLOW PCOMP (6.25 ns, offset 63)
Vendor: SEAGATE   Model: ST336754LC Rev: 0005
Type:   Direct-Access  ANSI SCSI revision: 03
 target0:0:1: Beginning Domain Validation
 target0:0:1: Ending Domain Validation
 target0:0:1: FAST-160 WIDE SCSI 320.0 MB/s DT IU QAS RTI WRFLOW PCOMP (6.25 ns, offset 63)
Vendor: SEAGATE   Model: ST336807LC Rev: 0C01
Type:   Direct-AccessGW9   ANSI SCSI revision: 03
 target0:0:3: Beginning Domain Validation
 target0:0:3: Ending DomaiFLOW PCOMP (6.25 ns, offset 63)
Vendor: ESG-SHV   Model: SCA HSBP M29  Rev: 1.06
Type:   Processor  ANSI SCSI revision: 02
 target0:0:6: Beginning Domain Validation
 target0:0:6: Ending Domain Validation
 target0:0:6: asynchronous
ACPI: PCI Interrupt :02:05.1[B] - GSI 25 (level, low) - IRQ 66
mptbase: Initiating ioc1 bringup
ioc1: 53C1030: Capabilities={Initiator}
scsi1 : ioc1: LSI53C1030, FwRev=01032700h, Ports=1, MaxQ=222, IRQ=66
SCSI device sda: 71687372 512-byte hdwr sectors (36704 MB)
hdd: ATAPI5sda: Write Protect is off
SCSI device sda: drive cache: write back w/ FUA
SCSI device sda: 71687372 512-byte hdwr sectors (36704 MB)
sda: Write Protect is off
SCSI device sda: drive cache: write back w/ FUA
 sda: sda1 sda2 sda3 < sda5 sda6 sda7 >
 24X5sd 0:0:0:0: Attached scsi disk sda
SCSI device sdb: 71687372 512-byte hdwr sectors (36704 MB)
sdb: Write Protect is off
SCSI device sdb: drive cache: write back w/ FUA
SCSI device sdb: 71687372 512-byte hdwr sectors (36704 MB)
sdb: Write Protect is off
SCSI device sdb: drive c Write Protect is off
SCSI device sdc: drive cache: write back w/ FUA
SCSI device sdc: 7168737vision: 3.20
 sdc: sdc4
 sdc4: bsd:bad subpartition - ignored
bad subpartition - ignored
bac
Done.
Begin: Mounting root file system... ...
[BOOT CONTINUES]

whereas on linux-image-2.6.20-1-686 we see:

Begin: Running /scripts/init-premount ...
ACPIling states)
ACPI: Processor [CPU2] (supports 8 throttling states)
usbcore: registered new intporation.
ACPI: PCI Interrupt :04:04.0[A] - GSI 54 (level, low) - IRQ 17
ICH5: IDE contrbe irqs later
  ide0: BM-DMA at 0xfc00-0xfc07, BIOS settings: hda:pio, hdb:pio
  ide1: BM-ystem initialized
USB Universal Host Controller Interface driver v3.0
e1000: eth0: e1000_probee COMBO SOSC-2483K, ATAPI CD/DVD-ROM drive
ide1 at 0x170-0x177,0x376 on irq 15
ACPI: PCI Interscsi0 : ioc0: LSI53C1030, FwRev=01032700h, Ports=1, MaxQ=222, IRQ=25
ACPI: PCI Interrupt :02:05.1[B] - GSI 25 (level, low) - IRQ 27
mptbase: Initiating ioc1 bringup
ioc1: 53C1030: Capabilities={Initiator}
scsi 0:0:0:0: Direct-Access SEAGATE  ST336754LC 0005 PQ: 0 ANSI: 3
 target0:0:0: Beginning Domain Validation
scsi1 : ioc1: LSI53C1030, FwRev=01032700h, Ports=1, MaxQ=222, IRQ=27
 target0:0:0: Ending Domain Validation
 target0:0:0: FAST-160 WIDE SCSI 320.0 MB/s DT IU QAS RTI WRFLOW PCOMP (6.25 ns, offset 63)
scsi 0:0:1:0: Direct-Access SEAGATE  ST336754LC 0005 PQ: 0 ANSI: 3
 target0:0:1: Beginning Domain Validation
 target0:0:1: Ending Domain Validation
 target0:0:1: FAST-160 WIDE SCSI 320.0 MB/s DT IU QAS RTI WRFLOW PCOMP (6.25 ns, offset 63)
hdd: ATAPI 24X DVD-ROM CD-R/RW drive, 2048kB Cache, UDMA(33)
Uniform CD-ROM