Bug#426705: cciss: kernel BUG at drivers/block/cciss.c:2479
On Tue, Jun 10, 2008 at 02:21:32PM -0600, dann frazier wrote:
> On Tue, Jun 10, 2008 at 04:24:16PM +0200, Leo Weppelman wrote:
> > On Tue, Jun 03, 2008 at 01:01:55AM +0200, maximilian attems wrote:
> > > On Wed, 30 May 2007, Leo Weppelman wrote:
> > > > Package: linux-image-2.6.21-1-686
> > > > Version: 2.6.21-4
> > > >
> > > > The trace written to the console:
> > > > =
> > > > kernel BUG at drivers/block/cciss.c:2479!
> > > > invalid opcode: [#1] SMP
> > >
> > > hmm i see.
> > >
> > > > How to reproduce:
> > > > =
> > > > I have an ML-350-G5 with an E200i raid controller. There are 2 logical drives defined that map 1-1 on a physical drive. Those disks are part of a software RAID-1 array. When initializing an oracle database on the system, the system panics.
> > > >
> > > > Leo.
> > >
> > > can you still reproduce the error with an up to date kernel aka at least 2.6.24? better 2.6.25 as this one is still upstream supported?
> > >
> > > thanks for coming back and sorry for late ping?!
> > >
> > > greetings
> >
> > I tried it with the 2.6.25 (linux-image-2.6.25-2-686_2.6.25-4_i386.deb) today and I can no longer reproduce the bug as I could with 2.6.21.
> >
> > If you want some additional tests, let me know. I'll recycle the installed configuration somewhere next week for some other work.
>
> Well, since you asked :) Can you test the etchnhalf kernel? Latest one is:
> http://http.us.debian.org/debian/pool/main/l/linux-2.6.24/linux-image-2.6.24-etchnhalf.1-686_2.6.24-6~etchnhalf.2_i386.deb

You shouldn't have asked ;-) This kernel crashes with the following info:

[ cut here ]
kernel BUG at drivers/block/cciss.c:2577!
invalid opcode: [#1] SMP
Modules linked in: sg ipmi_devintf ipmi_watchdog ipmi_poweroff 8021q nfs nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs ipv6 bnx2 raw dm_snapshot joydev usbhid hid evdev ehci_hcd iTCO_wdt container uhci_hcd ipmi_si ipmi_msghandler button psmouse i5000_edac shpchp pci_hotplug usbcore pcspkr serio_raw edac_core ext3 jbd mbcache dm_mod ide_generic raid1 md_mod ide_cd piix generic ide_core firmware_class cciss thermal processor fan sr_mod cdrom mptscsih mptbase aic7xxx sym53c8xx scsi_transport_spi BusLogic scsi_mod
Pid: 1154, comm: md0_raid1 Not tainted (2.6.24-etchnhalf.1-686 #1)
EIP: 0060:[f88fe9df] EFLAGS: 00010012 CPU: 1
EIP is at do_cciss_request+0x3c/0x3b6 [cciss]
EAX: f76b9cec EBX: df8d5928 ECX: EDX:
ESI: f76b9cec EDI: 0001fe00 EBP: df96 ESP: df8dbbc4
DS: 007b ES: 007b FS: 00d8 GS: SS: 0068
Process md0_raid1 (pid: 1154, ti=df8da000 task=f7d627d0 task.ti=df8da000)
Stack: df8d5520 f7fef600 f7cb75c0 c01dccb3 0001 df8d5928 0c00 02a641a0 0001 df8dbc10 c16ba0a2 0c00 0200
Call Trace:
 [c01dccb3] __cfq_slice_expired+0x57/0x62
 [c01de14d] cfq_set_request+0x250/0x2af
 [c01dd3ad] cfq_add_rq_rb+0x5c/0x6b
 [c01dd3e6] cfq_insert_request+0x2a/0x38d
 [c012c765] lock_timer_base+0x19/0x35
 [c012ca14] del_timer+0x48/0x4e
 [c01d7106] blk_remove_plug+0x57/0x63
 [c01d712f] __generic_unplug_device+0x1d/0x1f
 [c01d863c] __make_request+0x497/0x4ea
 [c01d5c12] generic_make_request+0x3b2/0x3e0
 [c01dedbc] __next_cpu+0x12/0x21
 [c012c765] lock_timer_base+0x19/0x35
 [c012ca14] del_timer+0x48/0x4e
 [c01d7106] blk_remove_plug+0x57/0x63
 [f88b4f88] raid1d+0x9f/0xcf1 [raid1]
 [c0103046] __switch_to+0x9d/0x11f
 [c02bcbda] schedule+0x588/0x5ec
 [c02bcde2] schedule_timeout+0x13/0x8d
 [c0104988] apic_timer_interrupt+0x28/0x30
 [f8938857] md_thread+0xb9/0xcf [md_mod]
 [c0135489] autoremove_wake_function+0x0/0x35
 [f893879e] md_thread+0x0/0xcf [md_mod]
 [c01353c2] kthread+0x38/0x5e
 [c013538a] kthread+0x0/0x5e
 [c0104b17] kernel_thread_helper+0x7/0x10
===
Code: 01 00 00 8b 80 30 01 00 00 84 c0 0f 88 82 03 00 00 8b 44 24 14 e8 0f 56 8d c7 85 c0 89 c6 0f 84 6f 03 00 00 66 83 78 68 1f 76 04 0f 0b eb fe ba 01 00 00 00 89 e8 e8 43 c9 ff ff 85 c0 89 c3 0f
EIP: [f88fe9df] do_cciss_request+0x3c/0x3b6 [cciss] SS:ESP 0068:df8dbbc4
---[ end trace fba57bb3d3d2d56f ]---

I use fai to install the machine, so it's pretty easy to do a test install as long as the kernel doesn't need packages outside the current etch set...

Leo.

-- To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]
Bug#426705: cciss: kernel BUG at drivers/block/cciss.c:2479
On Tue, Jun 03, 2008 at 01:01:55AM +0200, maximilian attems wrote:
> On Wed, 30 May 2007, Leo Weppelman wrote:
> > Package: linux-image-2.6.21-1-686
> > Version: 2.6.21-4
> >
> > The trace written to the console:
> > =
> > kernel BUG at drivers/block/cciss.c:2479!
> > invalid opcode: [#1] SMP
>
> hmm i see.
>
> > How to reproduce:
> > =
> > I have an ML-350-G5 with an E200i raid controller. There are 2 logical drives defined that map 1-1 on a physical drive. Those disks are part of a software RAID-1 array. When initializing an oracle database on the system, the system panics.
> >
> > Leo.
>
> can you still reproduce the error with an up to date kernel aka at least 2.6.24? better 2.6.25 as this one is still upstream supported?
>
> thanks for coming back and sorry for late ping?!
>
> greetings
>
> -- maks

I tried it with the 2.6.25 (linux-image-2.6.25-2-686_2.6.25-4_i386.deb) today and I can no longer reproduce the bug as I could with 2.6.21.

If you want some additional tests, let me know. I'll recycle the installed configuration somewhere next week for some other work.

Leo.
Bug#426705: cciss: kernel BUG at drivers/block/cciss.c:2479
On Tue, Jun 10, 2008 at 04:24:16PM +0200, Leo Weppelman wrote:
> On Tue, Jun 03, 2008 at 01:01:55AM +0200, maximilian attems wrote:
> > On Wed, 30 May 2007, Leo Weppelman wrote:
> > > Package: linux-image-2.6.21-1-686
> > > Version: 2.6.21-4
> > >
> > > The trace written to the console:
> > > =
> > > kernel BUG at drivers/block/cciss.c:2479!
> > > invalid opcode: [#1] SMP
> >
> > hmm i see.
> >
> > > How to reproduce:
> > > =
> > > I have an ML-350-G5 with an E200i raid controller. There are 2 logical drives defined that map 1-1 on a physical drive. Those disks are part of a software RAID-1 array. When initializing an oracle database on the system, the system panics.
> > >
> > > Leo.
> >
> > can you still reproduce the error with an up to date kernel aka at least 2.6.24? better 2.6.25 as this one is still upstream supported?
> >
> > thanks for coming back and sorry for late ping?!
> >
> > greetings
>
> I tried it with the 2.6.25 (linux-image-2.6.25-2-686_2.6.25-4_i386.deb) today and I can no longer reproduce the bug as I could with 2.6.21.
>
> If you want some additional tests, let me know. I'll recycle the installed configuration somewhere next week for some other work.

Well, since you asked :) Can you test the etchnhalf kernel? Latest one is:
http://http.us.debian.org/debian/pool/main/l/linux-2.6.24/linux-image-2.6.24-etchnhalf.1-686_2.6.24-6~etchnhalf.2_i386.deb

-- dann frazier
Bug#426705: cciss: kernel BUG at drivers/block/cciss.c:2479
On Wed, 30 May 2007, Leo Weppelman wrote:
> Package: linux-image-2.6.21-1-686
> Version: 2.6.21-4
>
> The trace written to the console:
> =
> kernel BUG at drivers/block/cciss.c:2479!
> invalid opcode: [#1] SMP

hmm i see.

> How to reproduce:
> =
> I have an ML-350-G5 with an E200i raid controller. There are 2 logical drives defined that map 1-1 on a physical drive. Those disks are part of a software RAID-1 array. When initializing an oracle database on the system, the system panics.
>
> Leo.

can you still reproduce the error with an up to date kernel aka at least 2.6.24? better 2.6.25 as this one is still upstream supported?

thanks for coming back and sorry for late ping?!

greetings

-- maks
Bug#426705: cciss: kernel BUG at drivers/block/cciss.c:2479
I was asked to look at http://bugzilla.kernel.org/show_bug.cgi?id=7763 since that bug looked to be related. It does indeed look like the raid1 module is violating the queue limitations of the cciss module, since the panic happens at the line:

        BUG_ON(creq->nr_phys_segments > MAXSGENTRIES);

in the cciss.c:do_cciss_request() function.

I tried the patch suggested in the other bug thread:

http://www.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/dm-merge-max_hw_sector.patch

But it did not help. It still crashes at exactly the same spot.
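For readers following along, the check discussed above can be modelled in userspace. This is only a sketch under stated assumptions: `struct fake_request`, `fits_sg_limit()` and `stacked_segment_limit()` are hypothetical names, not kernel API, and the value 31 for MAXSGENTRIES is an assumption (chosen because the Code: bytes in the traces compare a 16-bit field against 0x1f). It shows the condition the BUG_ON guards and why a stacking layer such as raid1 would need to advertise the smallest segment limit of its components.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption for illustration: the thread never states this value. */
#define MAXSGENTRIES 31

/* Hypothetical stand-in for the one field of struct request that matters here. */
struct fake_request {
    unsigned short nr_phys_segments;
};

/* Mirrors the condition guarded by the BUG_ON: nonzero means the request
 * fits in the controller's scatter-gather list. */
static int fits_sg_limit(const struct fake_request *rq)
{
    return rq->nr_phys_segments <= MAXSGENTRIES;
}

/* A stacked device can only safely advertise the smallest segment limit
 * among its components; anything larger may produce requests that a
 * component driver cannot map. */
static unsigned short stacked_segment_limit(const unsigned short *limits, size_t n)
{
    assert(n > 0);
    unsigned short lowest = limits[0];
    for (size_t i = 1; i < n; i++) {
        if (limits[i] < lowest)
            lowest = limits[i];
    }
    return lowest;
}
```

In this model a request with 32 segments fails `fits_sg_limit()`, which is the situation in which the real driver hits the BUG.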
Bug#426705: cciss: kernel BUG at drivers/block/cciss.c:2479
Package: linux-image-2.6.21-1-686
Version: 2.6.21-4

The trace written to the console:
=
kernel BUG at drivers/block/cciss.c:2479!
invalid opcode: [#1] SMP
Modules linked in: mptctl sg nfsd exportfs lockd nfs_acl sunrpc ipv6 8021q raw dm_snapshot shpchp pci_hotplug psmouse serio_raw pcspkr ext3 jbd mbcache raid1 md_mod dm_mod ide_generic ide_cd usbhid hid piix tg3 cciss bnx2 generic ehci_hcd ide_core uhci_hcd usbcore thermal processor fan sr_mod cdrom mptscsih mptbase aic7xxx sym53c8xx scsi_transport_spi BusLogic scsi_mod
CPU: 0
EIP: 0060:[f894c119] Not tainted VLI
EFLAGS: 00010012 (2.6.21-1-686 #1)
EIP is at do_cciss_request+0x44/0x349 [cciss]
eax: f693b350 ebx: dfb7dbac ecx: edx:
esi: 0800 edi: f5ff99c0 ebp: f693b350 esp: dfcafbec
ds: 007b es: 007b fs: 00d8 gs: ss: 0068
Process md0_raid1 (pid: 6108, ti=dfcae000 task=df981a90 task.ti=dfcae000)
Stack: dfd6004c dfb7dbac 0c00 dfd1 0001 0001 0001 dfcafc2c c17430e0 0c00 0200 c16bad80 c195a24c dfebaadc 0040 0040 dfcafc9c dfd60250 dfc97bec
Call Trace:
 [c01bd25b] elv_next_request+0x10d/0x11c
 [f894a105] start_io+0x7b/0xe1 [cciss]
 [f894c413] do_cciss_request+0x33e/0x349 [cciss]
 [c016438a] cache_alloc_refill+0x58/0x466
 [c01c6dfc] cfq_set_request+0x299/0x315
 [c01bd54a] elv_rb_add+0x65/0x6d
 [c01c63e8] cfq_add_rq_rb+0x5c/0x6b
 [c01c6421] cfq_insert_request+0x2a/0x3ae
 [c0129e5f] lock_timer_base+0x15/0x2f
 [c012a15d] del_timer+0x48/0x4e
 [c01bfa97] blk_remove_plug+0x57/0x63
 [c0129f73] __mod_timer+0x9c/0xa6
 [c01bfac0] __generic_unplug_device+0x1d/0x1f
 [c01c0c02] __make_request+0x34c/0x46c
 [c01bedfa] generic_make_request+0x1a9/0x1b9
 [c012a15d] del_timer+0x48/0x4e
 [c01bfa97] blk_remove_plug+0x57/0x63
 [f8961ba9] raid1d+0xbf/0xd0e [raid1]
 [c0102ff1] __switch_to+0xfe/0x131
 [c011b0fe] __activate_task+0x1c/0x29
 [c029d6d9] schedule_timeout+0x13/0x8d
 [c0124b9c] do_exit+0x6c2/0x6c6
 [f89a1a71] md_thread+0xc6/0xdd [md_mod]
 [c01328e5] autoremove_wake_function+0x0/0x35
 [f89a19ab] md_thread+0x0/0xdd [md_mod]
 [c013281a] kthread+0xb2/0xdc
 [c0132768] kthread+0x0/0xdc
 [c01049a7] kernel_thread_helper+0x7/0x10
===
Code: 44 24 10 8b 82 dc 00 00 00 84 c0 0f 88 0b 03 00 00 8b 44 24 08 e8 46 10 87 c7 85 c0 89 c5 0f 84 f8 02 00 00 66 83 78 68 1f 76 04 0f 0b eb fe 8b 44 24 10 ba 01 00 00 00 e8 a2 f9 ff ff 85 c0 89
EIP: [f894c119] do_cciss_request+0x44/0x349 [cciss] SS:ESP 0068:dfcafbec

How to reproduce:
=
I have an ML-350-G5 with an E200i raid controller. There are 2 logical drives defined that map 1-1 on a physical drive. Those disks are part of a software RAID-1 array. When initializing an oracle database on the system, the system panics.

Leo.
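As a side note on reading the trace: the Code: bytes of both oopses contain the sequence `66 83 78 68 1f 76 04 0f 0b`. On 32-bit x86 this decodes as `cmpw $0x1f,0x68(%eax)`, then `jbe` over the next two bytes, then `ud2`, which is the shape a BUG_ON comparing a 16-bit field against 31 would compile to. The sketch below scans a byte buffer for that pattern and extracts the displacement and the immediate. `find_bug_check()` is a hypothetical helper written for this note, and reading offset 0x68 as the `nr_phys_segments` field is an inference from the source line, not something the trace states.

```c
#include <stddef.h>

/* Scan for the byte pattern
 *   66 83 78 <disp8> <imm8> 76 <rel8> 0f 0b
 * which is: cmpw $imm8, disp8(%eax) ; jbe <rel8> ; ud2
 * Returns the offset of the match, or -1 if it is not present.
 * On success *disp and *imm receive the displacement and immediate. */
static int find_bug_check(const unsigned char *code, size_t len,
                          unsigned char *disp, unsigned char *imm)
{
    for (size_t i = 0; i + 9 <= len; i++) {
        if (code[i] == 0x66 &&      /* operand-size prefix: 16-bit compare */
            code[i + 1] == 0x83 &&  /* group-1 opcode with an 8-bit immediate */
            code[i + 2] == 0x78 &&  /* modrm: /7 (cmp), disp8(%eax) */
            code[i + 5] == 0x76 &&  /* jbe rel8: skip the trap when in range */
            code[i + 7] == 0x0f &&  /* ud2, first byte */
            code[i + 8] == 0x0b) {  /* ud2, second byte */
            *disp = code[i + 3];
            *imm = code[i + 4];
            return (int)i;
        }
    }
    return -1;
}
```

Fed the bytes from either trace, the helper reports a displacement of 0x68 and an immediate of 0x1f, so both kernels are stopping on the same segment-count check even though the source line number moved from 2479 to 2577.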