Bug#434138: Boot failure from software-RAID1 in etch during upgrade to 2.6.24-etchnhalf.1-686

2008-08-12 Thread Fabrice LORRAIN
Hello,

While upgrading our servers, I got bitten several times by this bug.
Here is a summary:
- most of these problems occurred while upgrading to 2.6.24-etchnhalf.1
- all the servers are Dell PowerEdges, with / built on
/dev/md0=/dev/sd[a,b]2, the disks being SCSI, SAS or SATA2.
- Using rootdelay=1 solved the problem in all cases (including the case in my
previous report).
- All the upgrade paths have been:
 - upgrade to an up-to-date 2.6.18 (works in all cases except 1 out of ~30
servers)
 - then upgrade to 2.6.24 on some servers

Didn't boot without the fix (PE = PowerEdge):
- 2 * PE-1750
- 1 * PE-2800
- 2 * PE-2850, one of which didn't boot after the
linux-image-2.6.18-6-686-bigmem upgrade from 2.6.18.dfsg.1-18etch6 to
2.6.18.dfsg.1-22 (it did boot without -bigmem, and with 2.6.18-bigmem or
2.6.24 once rootdelay=1 was set)
- 3 * PE-2950

Booted 2.6.24 without the fix:
- 3 * PE-860 (SATA2 for those servers)
- 1 * PE-2650

All the servers failed at the same stage in the initrd with the following
error:
...
Failure: failed to start /dev/md0

Attached :
- /etc/mdadm/mdadm.conf from /boot/initrd.img-2.6.24-etchnhalf.1-686
- /conf/md.conf from /boot/initrd.img-2.6.24-etchnhalf.1-686
- the bootlog from the PE-2800. In the initrd used,
/scripts/init-premount/udev was edited to echo "-- in premount"
and "-- out premount".

Getting this fixed for lenny would be appreciated.
Documenting the problem in
http://wiki.debian.org/EtchAndAHalf?highlight=%28EtchAndAHalf%29 might
help others.

As a side note, I would have appreciated a quick fix with the hard-coded
sleep [1|2] instead of the undocumented rootdelay approach for etch.
Getting bitten by a 2-year-old bug [#366175] in Debian stable is annoying.
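
As an illustration of the kind of wait being asked for here, a bounded polling loop is sketched below. It is not the initramfs-tools or mdadm fix; the function name wait_for_devices and the choice of a one-second poll are assumptions made for this example only.

```shell
# Hypothetical helper, not taken from initramfs-tools or mdadm.
# wait_for_devices TIMEOUT DEV...
# Poll once per second until every DEV exists, giving up after TIMEOUT seconds.
# Returns 0 when all devices are present, 1 on timeout.
wait_for_devices() {
    remaining=$1
    shift
    while :; do
        missing=0
        for dev in "$@"; do
            if [ ! -e "$dev" ]; then
                missing=1
            fi
        done
        if [ "$missing" -eq 0 ]; then
            return 0
        fi
        if [ "$remaining" -le 0 ]; then
            return 1
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
}

# Possible use before assembling the array (device names as in mdadm.conf):
#   wait_for_devices 10 /dev/sda2 /dev/sdb2 || echo "member disks still missing"
```

Compared with a fixed sleep, a loop like this returns as soon as the member disks show up and only costs the full timeout on machines where they never appear.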

@+,
Fab
DEVICE partitions
ARRAY /dev/md0 level=raid1 num-devices=2 UUID=be7f596f:e7ec06aa:9493fbbb:7d3be05a devices=/dev/sda2,/dev/sdb2
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=d8569b32:d1f063b0:b2cbc7fc:60944e9b devices=/dev/sda5,/dev/sdb5
ARRAY /dev/md2 level=raid1 num-devices=2 UUID=c7758f0c:5c84bf52:8d369270:bf854ddc devices=/dev/sda6,/dev/sdb6
MD_HOMEHOST='proxy'
MD_DEVPAIRS='/dev/md0:raid1 /dev/md1:raid1 /dev/md2:raid1'
MD_LEVELS='raid1'
MD_DEVS='/dev/md0'
MD_MODULES='raid1'
Initializing cgroup subsys cpuset
Linux version 2.6.24-etchnhalf.1-686 (Debian 2.6.24-6~etchnhalf.4) ([EMAIL PROTECTED]) (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 SMP Mon Jul 21 11:17:43 UTC 2008
BIOS-provided physical RAM map:
 BIOS-e820:  - 000a (usable)
 BIOS-e820: 0010 - dffc (usable)
 BIOS-e820: dffc - dffcfc00 (ACPI data)
 BIOS-e820: dffcfc00 - d000 (reserved)
 BIOS-e820: e000 - f000 (reserved)
 BIOS-e820: fec0 - fec9 (reserved)
 BIOS-e820: fed0 - fed00400 (reserved)
 BIOS-e820: fee0 - fee1 (reserved)
 BIOS-e820: ffb0 - 0001 (reserved)
 BIOS-e820: 0001 - 00012000 (usable)
Warning only 4GB will be used.
Use a HIGHMEM64G enabled kernel.
3200MB HIGHMEM available.
896MB LOWMEM available.
found SMP MP-table at 000fe710
Zone PFN ranges:
  DMA 0 - 4096
  Normal   4096 -   229376
  HighMem229376 -  1048576
Movable zone start PFN for each node
early_node_map[1] active PFN ranges
0:0 -  1048576
DMI 2.3 present.
ACPI: RSDP 000FD5B0, 0014 (r0 DELL  )
ACPI: RSDT 000FD5C4, 0038 (r1 DELL   PE BKC  1 MSFT  10A)
ACPI: FACP 000FD620, 0074 (r1 DELL   PE BKC  1 MSFT  10A)
ACPI: DSDT DFFC, 3CCD (r1 DELL   PE BKC  1 MSFT  10E)
ACPI: FACS DFFCFC00, 0040
ACPI: APIC 000FD694, 00E0 (r1 DELL   PE BKC  1 MSFT  10A)
ACPI: SPCR 000FD774, 0050 (r1 DELL   PE BKC  1 MSFT  10A)
ACPI: HPET 000FD7C4, 0038 (r1 DELL   PE BKC  1 MSFT  10A)
ACPI: MCFG 000FD7FC, 003C (r1 DELL   PE BKC  1 MSFT  10A)
ACPI: PM-Timer IO Port: 0x808
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Processor #0 15:4 APIC version 20
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x06] enabled)
Processor #6 15:4 APIC version 20
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x01] enabled)
Processor #1 15:4 APIC version 20
ACPI: LAPIC (acpi_id[0x04] lapic_id[0x07] enabled)
Processor #7 15:4 APIC version 20
ACPI: LAPIC (acpi_id[0x05] lapic_id[0x02] disabled)
ACPI: LAPIC (acpi_id[0x06] lapic_id[0x04] disabled)
ACPI: LAPIC (acpi_id[0x07] lapic_id[0x03] disabled)
ACPI: LAPIC (acpi_id[0x08] lapic_id[0x05] disabled)
ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x03] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x04] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x05] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x06] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x07] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x08] high edge lint[0x1])
ACPI: IOAPIC (id[0x08] address[0xfec0] gsi_base[0])
IOAPIC[0]: 

Bug#434138: Boot failure from software-RAID1 in etch during upgrade to 2.6.24-etchnhalf.1-686

2008-08-12 Thread Fabrice LORRAIN
Fabrice LORRAIN wrote:
> Hello,
> ...
> ...
> Failure: failed to start /dev/md0


Hmm, part of my cut and paste was missing, sorry. It should read:
...
Begin: Assembling MD array /dev/md0 ...
mdadm: no devices found for /dev/md0
Failure: failed to start /dev/md0
Done.


@+,
Fab



--
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Boot failure from software-RAID1 in etch during upgrade to 2.6.24-etchnhalf.1-686

2008-08-12 Thread Fabrice LORRAIN
Hi all,

As reported in #434138, in some cases upgrading to etchnhalf on software
RAID doesn't work. The quick fix is to pass rootdelay=n as a kernel
parameter.

This information might be a helpful addition to [1].
Also, it seems that [1] and [2] haven't been updated since etchnhalf was
released; removing all the "will" and "may" wording would help readers
understand where we stand.

Thanks,

@+,

Fab

[1] http://wiki.debian.org/EtchAndAHalf?highlight=%28etchand%29.
[2] http://wiki.debian.org/EtchAndAHalf/ReleaseNotes





Bug#409271: Status of this bug ?

2008-08-10 Thread Fabrice Lorrain

Hello,

I've just read through BTS #409271 ("initramfs-tools: NFSv4 not
supported for root fs")

and I'm interested in knowing the status of this bug for lenny.

Thanks,

@+,
Fab

PS : ccing nfs-aware people.






Bug#494036: hwclock freeze mac-mini too

2008-08-10 Thread Fabrice Lorrain

Hello,

I have had the same problem as Gonéri on a Core 2 Duo mac-mini since my
upgrade to 2.6.26-1-686:
- Erratic freezes at boot time while the clock is being set. The last
messages shown on the console are:


Setting the system clock.
Checking root file system etc...
Setting the system clock.

I have always had two "Setting the system clock." lines, so I guess the crash
happens in S11hwclock.sh.


I have less success reproducing the crash with "while true; do hwclock
; sleep 1 ; done", but "while true; do hwclock ; sleep1 ; done" triggers it
every time...
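
The reproduction loop above can be wrapped in a bounded helper that prints a counter, so that the last line on the console shows how many calls succeeded before a freeze. This is only a sketch: the name repeat_cmd and its arguments are made up for this example, and hwclock itself normally needs root.

```shell
# Hypothetical helper for stress-testing a command such as hwclock.
# repeat_cmd COUNT DELAY CMD [ARGS...]
# Run CMD COUNT times, printing the run number first and sleeping
# DELAY seconds after each call.
repeat_cmd() {
    count=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$count" ]; do
        echo "run $i"
        "$@" > /dev/null
        sleep "$delay"
        i=$((i + 1))
    done
}

# On the affected machine, as root:
#   repeat_cmd 1000 1 hwclock     # one call per second
#   repeat_cmd 1000 0 hwclock     # back-to-back calls
```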


Running with hpet=disable seems to fix the problem... so far.

Gonéri, thanks for your thorough bug report, it helped.

@+,
Fab






Bug#434138: Boot failure from software-RAID1 in etch during upgrade to 2.6.24-etchnhalf.1-686

2008-08-06 Thread Fabrice LORRAIN
Hello,

This bug is still present in etch:

I just upgraded one of our servers from kernel 2.6.18-6-686 to
2.6.24-etchnhalf.1-686 and I got bitten by this bug.

Passing rootdelay=2 to the kernel at boot time seems to fix it.
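
To make the parameter survive reboots and kernel upgrades, one option is to add it to the boot loader configuration. The sketch below assumes GRUB legacy with a /boot/grub/menu.lst managed by update-grub, where default kernel options live on the "# kopt=" comment line; the helper name add_kopt is made up for this example, and it relies on GNU sed for the -i option.

```shell
# Hypothetical helper: append a kernel parameter to the "# kopt=" line of a
# GRUB legacy menu.lst, unless the parameter is already there.
# add_kopt FILE PARAM
add_kopt() {
    file=$1
    param=$2
    # Current default options, i.e. whatever follows "# kopt="
    opts=$(sed -n 's/^# kopt=//p' "$file")
    case " $opts " in
        *" $param "*) return 0 ;;   # already present, nothing to do
    esac
    # "&" re-inserts the matched line, so PARAM is appended to it
    sed -i "s|^# kopt=.*|& $param|" "$file"
}

# Demonstration on a scratch file rather than the real menu.lst
demo=$(mktemp)
printf '%s\n' '# kopt=root=/dev/md0 ro' > "$demo"
add_kopt "$demo" rootdelay=2
cat "$demo"     # prints: # kopt=root=/dev/md0 ro rootdelay=2
rm -f "$demo"

# On a real system (as root), the equivalent would be:
#   add_kopt /boot/grub/menu.lst rootdelay=2 && update-grub
```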

The configuration is Dell PowerEdge 1750  + 2 SCSI hard drives in
software raid1.

@+,

Fab






Bug#478765: BUG: soft lockup - CPU#0 stuck for 11s! [modprobe:1389]

2008-05-24 Thread Fabrice Lorrain

maximilian attems wrote:
> On Wed, 30 Apr 2008, Fabrice Lorrain wrote:
>
>> Package: linux-image-2.6.24-1-686
>> Version: 2.6.24-6
>> Severity: normal
>>
>> Hello,
>>
>> I observe from time to time the following at cold or warm reboot on a
>> mac-mini (core 2 duo, cf. http://www.apple.com/macmini/specs.html).
>>
>> I don't think the trouble is with the hardware (same box working fine
>> with OSX and Xubuntu-hardy in live mode)
>
> can you reproduce with 2.6.25 from unstable?
> installs just fine in testing please report back, thanks



Hello Maximilian,

This bug can be closed: I haven't been able to reproduce the crash since my
upgrade to 2.6.25, a couple of weeks ago.


Thanks.

@+,
Fab






Bug#478765: BUG: soft lockup - CPU#0 stuck for 11s! [modprobe:1389]

2008-04-30 Thread Fabrice Lorrain
Package: linux-image-2.6.24-1-686
Version: 2.6.24-6
Severity: normal

Hello,

I observe from time to time the following at cold or warm reboot on a
mac-mini (core 2 duo, cf. http://www.apple.com/macmini/specs.html).

I don't think the trouble is with the hardware (same box working fine
with OSX and Xubuntu-hardy in live mode)

Maybe related to #478278 and #464387.

I didn't see these problems with 2.6.18-5-686 and 2.6.22-3-686.
I am upgrading to linux-image-2.6.25-1-686 as suggested in #464387.

@+,
Fab

hda: PIONEER DVD-RW DVR-K06, ATAPI CD/DVD-ROM drive
hda: host max PIO4 wanted PIO255(auto-tune) selected PIO4
usb 4-1: new full speed USB device using uhci_hcd and address 2
usb 4-1: configuration #1 chosen from 1 choice
hda: set_drive_speed_status: status=0xd0 { Busy }
hda: UDMA/33 mode selected
BUG: soft lockup - CPU#0 stuck for 11s! [modprobe:1389]

Pid: 1389, comm: modprobe Not tainted (2.6.24-1-686 #1)
EIP: 0060:[f89e6dfb] EFLAGS: 0212 CPU: 0
EIP is at ide_inb+0x3/0x7 [ide_core]
EAX: 01d0 EBX: 0282 ECX:  EDX: 01f7
ESI: f89f9f78 EDI: f89f9f20 EBP: fffef992 ESP: f7673d68
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
CR0: 8005003b CR2: f8c120e4 CR3: 3770c000 CR4: 06d0
DR0:  DR1:  DR2:  DR3: 
DR6: 0ff0 DR7: 0400
 [f89e7262] __ide_wait_stat+0x91/0xee [ide_core]
 [f89e7a4f] ide_config_drive_speed+0xef/0x2ba [ide_core]
 [f89e86fe] ide_set_dma_mode+0x42/0x55 [ide_core]
 [f89ec5d7] ide_set_dma+0xe4/0x11a [ide_core]
 [f89e9717] probe_hwif+0x62b/0x690 [ide_core]
 [f89e9fda] ide_device_add+0x2e/0x9c [ide_core]
 [c011d47e] __wake_up_common+0x32/0x5c
 [f89ebc9e] ide_setup_pci_device+0x38/0x42 [ide_core]
 [c01eb7c7] pci_device_probe+0x36/0x57
 [c0241820] driver_probe_device+0xde/0x15c
 [c02bb9e5] klist_next+0x4b/0x6c
 [c0241930] __driver_attach+0x0/0x79
 [c0241976] __driver_attach+0x46/0x79
 [c0240dfa] bus_for_each_dev+0x37/0x59
 [c0241687] driver_attach+0x16/0x18
 [c0241930] __driver_attach+0x0/0x79
 [c02410e0] bus_add_driver+0x6d/0x197
 [c01eb904] __pci_register_driver+0x48/0x74
 [f8ac009c] piix_ide_init+0x9c/0xa0 [piix]
 [c0138ef2] blocking_notifier_call_chain+0x17/0x1a
 [c0143721] sys_init_module+0x15db/0x16f3
 [c016406e] vma_prio_tree_insert+0x17/0x2a
 [f897b000] alsa_card_azx_init+0x0/0x14 [snd_hda_intel]
 [c01e6c92] pci_bus_read_config_byte+0x0/0x61
 [c0103e5e] sysenter_past_esp+0x6b/0xa1
 ===
hda: set_drive_speed_status: status=0xd0 { Busy }
hda: host max PIO4 wanted PIO255(auto-tune) selected PIO4
BUG: soft lockup - CPU#0 stuck for 11s! [modprobe:1389]

Pid: 1389, comm: modprobe Not tainted (2.6.24-1-686 #1)
EIP: 0060:[f89e6dfb] EFLAGS: 0202 CPU: 0
EIP is at ide_inb+0x3/0x7 [ide_core]
EAX: 01d0 EBX: 0282 ECX:  EDX: 01f7
ESI: f89f9f78 EDI: f89f9f20 EBP: 0357 ESP: f7673d68
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
CR0: 8005003b CR2: f8c120e4 CR3: 3770c000 CR4: 06d0
DR0:  DR1:  DR2:  DR3: 
DR6: 0ff0 DR7: 0400
 [f89e7262] __ide_wait_stat+0x91/0xee [ide_core]
 [f89e7a4f] ide_config_drive_speed+0xef/0x2ba [ide_core]
 [f89e8769] ide_set_pio_mode+0x58/0x6e [ide_core]
 [f89ec5f0] ide_set_dma+0xfd/0x11a [ide_core]
 [f89e9717] probe_hwif+0x62b/0x690 [ide_core]
 [f89e9fda] ide_device_add+0x2e/0x9c [ide_core]
 [c011d47e] __wake_up_common+0x32/0x5c
 [f89ebc9e] ide_setup_pci_device+0x38/0x42 [ide_core]
 [c01eb7c7] pci_device_probe+0x36/0x57
 [c0241820] driver_probe_device+0xde/0x15c
 [c02bb9e5] klist_next+0x4b/0x6c
 [c0241930] __driver_attach+0x0/0x79
 [c0241976] __driver_attach+0x46/0x79
 [c0240dfa] bus_for_each_dev+0x37/0x59
 [c0241687] driver_attach+0x16/0x18
 [c0241930] __driver_attach+0x0/0x79
 [c02410e0] bus_add_driver+0x6d/0x197
 [c01eb904] __pci_register_driver+0x48/0x74
 [f8ac009c] piix_ide_init+0x9c/0xa0 [piix]
 [c0138ef2] blocking_notifier_call_chain+0x17/0x1a
 [c0143721] sys_init_module+0x15db/0x16f3
 [c016406e] vma_prio_tree_insert+0x17/0x2a
 [f897b000] alsa_card_azx_init+0x0/0x14 [snd_hda_intel]
 [c01e6c92] pci_bus_read_config_byte+0x0/0x61
 [c0103e5e] sysenter_past_esp+0x6b/0xa1
 ===
hda: set_drive_speed_status: status=0xd0 { Busy }
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Probing IDE interface ide1...
usb 4-2: new full speed USB device using uhci_hcd and address 3
usb 4-2: configuration #1 chosen from 1 choice
ACPI: PCI Interrupt :00:1b.0[A] - GSI 22 (level, low) - IRQ 21
PCI: Setting latency timer of device :00:1b.0 to 64
usb 3-2: new low speed USB device using uhci_hcd and address 4
hda_codec: STAC922x, Apple subsys_id=106b0800
usb 3-2: configuration #1 chosen from 1 choice
...usb and input initialisation...
hda: status timeout: status=0xd0 { Busy }
ide: failed opcode was: unknown
hda: drive not ready for command
hda: ATAPI CD-ROM drive, 0kB Cache
Uniform CD-ROM driver Revision: 3.20
hda: status timeout: status=0xd0 { Busy }
ide: failed opcode was: unknown
hda: drive not ready for command
BUG: soft lockup - CPU#0 stuck 

Bug#391867: linux-image-2.6.18-3-686: confirmed : problem fixed

2007-01-29 Thread Fabrice Lorrain
Package: linux-image-2.6.18-3-686
Followup-For: Bug #391867


Hello,

I had the same problem as described on several Dell Optiplex GX270 with
an up-to-date etch + linux-image-2.6.18-3-686.

Using linux-image-2.6.18-4-686 from sid fixed the
"ata1: port is slow to respond, please be patient" message and the 90 s of
waiting at each boot.

Targeting this kernel for etch would be appreciated.

Thanks.

@+,
Fab

-- System Information:
Debian Release: 4.0
  APT prefers unstable
  APT policy: (500, 'unstable'), (500, 'testing')
Architecture: i386 (i686)
Shell:  /bin/sh linked to /bin/bash
Kernel: Linux 2.6.18-4-686
Locale: LANG=fr_FR.UTF-8, LC_CTYPE=fr_FR.UTF-8 (charmap=UTF-8)

Versions of packages linux-image-2.6.18-3-686 depends on:
ii  coreutils 5.97-5 The GNU core utilities
ii  debconf [debconf-2.0] 1.5.11 Debian configuration management sy
ii  initramfs-tools [linux-initra 0.85e  tools for generating an initramfs
ii  module-init-tools 3.3-pre3-1 tools for managing Linux kernel mo

Versions of packages linux-image-2.6.18-3-686 recommends:
ii  libc6-i686   2.3.6.ds1-8 GNU C Library: Shared libraries [i

-- debconf information excluded





Bug#402751: linux-image-2.6.18-3-686: linux-image-2.6.18-4-686 in sid fixed identical reports in #391867

2007-01-29 Thread Fabrice Lorrain
Package: linux-image-2.6.18-3-686
Version: 2.6.18-7
Followup-For: Bug #402751

Hello Peter,

I had the same issue as you on several Dell Optiplex GX270.
Using the latest kernel in sid (linux-image-2.6.18-4-686) fixed the issue for me.

Maybe you could give it a try and report back to this bug report.

Have a look at bug #391867, it seems very close to your problem.

@+,
Fab

-- System Information:
Debian Release: 4.0
  APT prefers unstable
  APT policy: (500, 'unstable'), (500, 'testing')
Architecture: i386 (i686)
Shell:  /bin/sh linked to /bin/bash
Kernel: Linux 2.6.18-4-686
Locale: LANG=fr_FR.UTF-8, LC_CTYPE=fr_FR.UTF-8 (charmap=UTF-8)

Versions of packages linux-image-2.6.18-3-686 depends on:
ii  coreutils 5.97-5 The GNU core utilities
ii  debconf [debconf-2.0] 1.5.11 Debian configuration management sy
ii  initramfs-tools [linux-initra 0.85e  tools for generating an initramfs
ii  module-init-tools 3.3-pre3-1 tools for managing Linux kernel mo

Versions of packages linux-image-2.6.18-3-686 recommends:
ii  libc6-i686   2.3.6.ds1-8 GNU C Library: Shared libraries [i

-- debconf information excluded





Bug#399812: linux-image-2.6.17-2-686: Wrong nic ordering with d-i-RC1 kernel on new Dell poweredge

2006-11-21 Thread Fabrice Lorrain
Package: linux-image-2.6.17-2-686
Version: 2.6.17-9
Severity: normal
Tags: patch


Hello,

I did a bunch of d-i-RC1 installations on a Dell PowerEdge 2950 last
night (using both the i386 and amd64 netinstall ISOs) and the kernel finds the
embedded NICs in the wrong order.

As I've seen someone using a 2.6.18.2 kernel complaining about the same problem
on a French list, I did some digging around. Here are my findings:

http://linux.dell.com/files/whitepapers/nic-enum-whitepaper-v2.pdf, dated
October 2006, explains that current 2.6 kernels get the ordering of
embedded NICs wrong on the whole 9th generation (the servers currently on
sale) of Dell PowerEdges.

A custom sarge installer grabbed from the net (I don't have the URL right
now but it has been floating around on the linux-poweredge list) didn't
have this problem. It was using linux-2.6.19-RC3.

Grepping through
http://www.kernel.org/pub/linux/kernel/v2.6/testing/ChangeLog-2.6.19-rc3
shows the following:

commit 6b4b78fed47e7380dfe9280b154e8b9bfcd4c86c
Author: Matt Domsch [EMAIL PROTECTED]
Date:   Fri Sep 29 15:23:23 2006 -0500

PCI: optionally sort device lists breadth-first

...
Feedback appreciated.  Patch tested on a Dell PowerEdge 1955
   blade with 2.6.18.

Which is the fix. 

I would suggest including this in the next d-i target kernel, because it
will save a lot of time for a _lot_ of people.

@+,
Fab


Full changelog of the commit :

commit 6b4b78fed47e7380dfe9280b154e8b9bfcd4c86c
Author: Matt Domsch [EMAIL PROTECTED]
Date:   Fri Sep 29 15:23:23 2006 -0500

PCI: optionally sort device lists breadth-first

Problem:
New Dell PowerEdge servers have 2 embedded ethernet ports, which are
labeled NIC1 and NIC2 on the chassis, in the BIOS setup screens, and
in the printed documentation.  Assuming no other add-in ethernet ports
in the system, Linux 2.4 kernels name these eth0 and eth1
respectively.  Many people have come to expect this naming.  Linux 2.6
kernels name these eth1 and eth0 respectively (backwards from
expectations).  I also have reports that various Sun and HP servers
have similar behavior.


Root cause:
Linux 2.4 kernels walk the pci_devices list, which happens to be
sorted in breadth-first order (or pcbios_find_device order on i386,
which most often is breadth-first also).  2.6 kernels have both the
pci_devices list and the pci_bus_type.klist_devices list, the latter
is what is walked at driver load time to match the pci_id tables; this
klist happens to be in depth-first order.

On systems where, for physical routing reasons, NIC1 appears on a
lower bus number than NIC2, but NIC2's bridge is discovered first in
the depth-first ordering, NIC2 will be discovered before NIC1.  If the
list were sorted breadth-first, NIC1 would be discovered before
NIC2.

A PowerEdge 1955 system has the following topology which easily
exhibits the difference between depth-first and breadth-first device
lists.

-[:00]-+-00.0  Intel Corporation 5000P Chipset Memory Controller Hub
       +-02.0-[:03-08]--+-00.0-[:04-07]--+-00.0-[:05-06]00.0-[:06]00.0  Broadcom Corporation NetXtreme II BCM5708S Gigabit Ethernet (labeled NIC2, 2.4 kernel name eth1, 2.6 kernel name eth0)
       +-1c.0-[:01-02]00.0-[:02]00.0  Broadcom Corporation NetXtreme II BCM5708S Gigabit Ethernet (labeled NIC1, 2.4 kernel name eth0, 2.6 kernel name eth1)
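
The effect of the two orderings on interface names can be reproduced with a toy example. The sketch below is not kernel code: it hard-codes the two NICs from the topology above by bus number (06 for NIC2, 02 for NIC1), lists them in the depth-first order described, and assumes that sorting by bus number stands in for the breadth-first order, which holds for this topology. The function names are made up for this example.

```shell
# The two NICs in the order a depth-first walk reaches them
# ("bus:slot.func label"; the PCI domain is left out).
depth_first_order() {
    cat <<'EOF'
06:00.0 NIC2
02:00.0 NIC1
EOF
}

# Assumption for this sketch: ordering by bus number, then slot.function,
# gives the breadth-first order for this particular topology.
breadth_first_order() {
    depth_first_order | LC_ALL=C sort
}

# Hand out eth0, eth1, ... in the order the devices arrive on stdin
name_nics() {
    awk '{ printf "eth%d %s\n", NR - 1, $2 }'
}

echo "depth-first (2.6 behaviour described above):"
depth_first_order | name_nics      # eth0 NIC2, eth1 NIC1
echo "breadth-first (2.4 behaviour described above):"
breadth_first_order | name_nics    # eth0 NIC1, eth1 NIC2
```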


Other factors, such as device driver load order and the presence of
PCI slots at various points in the bus hierarchy further complicate
this problem; I'm not trying to solve those here, just restore the
device order, and thus basic behavior, that 2.4 kernels had.


Solution:

The solution can come in multiple steps.

Suggested fix #1: kernel
Patch below optionally sorts the two device lists into breadth-first
ordering to maintain compatibility with 2.4 kernels.  It adds two new
command line options:
  pci=bfsort
  pci=nobfsort
to force the sort order, or not, as you wish.  It also adds DMI checks
for the specific Dell systems which exhibit backwards ordering, to
make them right.


Suggested fix #2: udev rules from userland
Many people also have the expectation that embedded NICs are always
discovered before add-in NICs (which this patch does not try to do).
Using the PCI IRQ Routing Table provided by system BIOS, it's easy to
determine which PCI devices are embedded, or if add-in, which PCI slot
they're in.  I'm working on a tool that would allow udev to name
ethernet devices in ascending embedded, slot 1 .. slot N order,
subsort by PCI bus/dev/fn breadth-first.  It'll be possible to use it
independent of udev as well for those distributions that don't use
udev in their installers.

Suggested fix #3: system board routing rules

Bug#267553: kernel does not install from nfsrooted environment

2004-08-23 Thread Fabrice LORRAIN
Package: kernel-image-2.6.7-1-686
Version: 2.6.7-2
Severity: normal
Hello,

I am trying to install a new i686 box under sarge with our custom nfsroot
environment. The installation of the kernel failed with the following:

# apt-get install kernel-image-2.6.7-1-686
Reading Package Lists... Done
Building Dependency Tree... Done
kernel-image-2.6.7-1-686 is already the newest version.
0 upgraded, 0 newly installed, 0 to remove and 1 not upgraded.
3 not fully installed or removed.
Need to get 0B of archives.
After unpacking 0B of additional disk space will be used.
Setting up kernel-image-2.4.26-1-686 (2.4.26-4) ...
/usr/sbin/mkinitrd: Cannot determine root device
Failed to create initrd image.
dpkg: error processing kernel-image-2.4.26-1-686 (--configure):
 subprocess post-installation script returned error exit status 9
Setting up kernel-image-2.6.7-1-686 (2.6.7-2) ...
/usr/sbin/mkinitrd: Cannot determine root device
Failed to create initrd image.
dpkg: error processing kernel-image-2.6.7-1-686 (--configure):
 subprocess post-installation script returned error exit status 9
dpkg: dependency problems prevent configuration of kernel-image-2.6-686:
 kernel-image-2.6-686 depends on kernel-image-2.6.7-1-686; however:
  Package kernel-image-2.6.7-1-686 is not configured yet.
dpkg: error processing kernel-image-2.6-686 (--configure):
 dependency problems - leaving unconfigured
Errors were encountered while processing:
 kernel-image-2.4.26-1-686
 kernel-image-2.6.7-1-686
 kernel-image-2.6-686
E: Sub-process /usr/bin/dpkg returned an error code (1)
Same problem with kernel-image-2.4.26-1-686.

The exact sequence of steps was the following:
- Netbooting (PXE) the box,
- partitioning
- tar xzf sarge.tgz -C /mnt/out (home-built sarge-base tarball)
- chroot /mnt/out
- mount /proc
- apt-get {update,upgrade}
- install some packages
- apt-get install kernel-image-2.6.7-1-686

Right now it seems I cannot install sarge from an nfsroot environment without
building my own kernel... I do feel the severity should be higher.
I leave it to your judgement because of the freeze.

@+,
Fab