Bug#426705: cciss: kernel BUG at drivers/block/cciss.c:2479

2008-06-11 Thread Leo Weppelman
On Tue, Jun 10, 2008 at 02:21:32PM -0600, dann frazier wrote:
> On Tue, Jun 10, 2008 at 04:24:16PM +0200, Leo Weppelman wrote:
> > On Tue, Jun 03, 2008 at 01:01:55AM +0200, maximilian attems wrote:
> > > On Wed, 30 May 2007, Leo Weppelman wrote:
> > >
> > > > Package: linux-image-2.6.21-1-686
> > > > Version: 2.6.21-4
> > > >
> > > > The trace written to the console:
> > > > =
> > > >
> > > > kernel BUG at drivers/block/cciss.c:2479!
> > > > invalid opcode:  [#1]
> > > > SMP
> > >
> > > hmm i see.
> > > >
> > > > How to reproduce:
> > > > =
> > > >
> > > > I have an ML-350-G5 with an E200i raid controller. There are 2 logical drives
> > > > defined that map 1-1 on a physical drive. Those disks are part of a software
> > > > RAID-1 array. When initializing an oracle database on the system, the
> > > > system panics.
> > > >
> > > >
> > > > Leo.
> > >
> > > can you still reproduce the error with an up to date kernel aka at
> > > least 2.6.24? better 2.6.25 as this one is still upstream supported?
> > >
> > > thanks for coming back and sorry for late ping?!
> > >
> > > greetings
> > >
> >
> > I tried it with the 2.6.25 (linux-image-2.6.25-2-686_2.6.25-4_i386.deb) today
> > and I can no longer reproduce the bug as I could with 2.6.21.
> > If you want some additional tests, let me know. I'll recycle the installed
> > configuration somewhere next week for some other work.
>
> Well, since you asked :) Can you test the etchnhalf kernel? Latest one is:
>
> http://http.us.debian.org/debian/pool/main/l/linux-2.6.24/linux-image-2.6.24-etchnhalf.1-686_2.6.24-6~etchnhalf.2_i386.deb

You shouldn't have asked ;-) This kernel crashes with the following info:

[ cut here ]
kernel BUG at drivers/block/cciss.c:2577!
invalid opcode:  [#1] SMP
Modules linked in: sg ipmi_devintf ipmi_watchdog ipmi_poweroff 8021q nfs nfsd 
lockd nfs_acl auth_rpcgss sunrpc exportfs ipv6 bnx2 raw dm_snapshot joydev 
usbhid hid evdev ehci_hcd iTCO_wdt container uhci_hcd ipmi_si ipmi_msghandler 
button psmouse i5000_edac shpchp pci_hotplug usbcore pcspkr serio_raw edac_core 
ext3 jbd mbcache dm_mod ide_generic raid1 md_mod ide_cd piix generic ide_core 
firmware_class cciss thermal processor fan sr_mod cdrom mptscsih mptbase 
aic7xxx sym53c8xx scsi_transport_spi BusLogic scsi_mod

Pid: 1154, comm: md0_raid1 Not tainted (2.6.24-etchnhalf.1-686 #1)
EIP: 0060:[f88fe9df] EFLAGS: 00010012 CPU: 1
EIP is at do_cciss_request+0x3c/0x3b6 [cciss]
EAX: f76b9cec EBX: df8d5928 ECX:  EDX: 
ESI: f76b9cec EDI: 0001fe00 EBP: df96 ESP: df8dbbc4
 DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
Process md0_raid1 (pid: 1154, ti=df8da000 task=f7d627d0 task.ti=df8da000)
Stack: df8d5520 f7fef600 f7cb75c0 c01dccb3 0001 df8d5928 0c00 02a641a0
    0001  df8dbc10 c16ba0a2 0c00  0200
          
Call Trace:
 [c01dccb3] __cfq_slice_expired+0x57/0x62
 [c01de14d] cfq_set_request+0x250/0x2af
 [c01dd3ad] cfq_add_rq_rb+0x5c/0x6b
 [c01dd3e6] cfq_insert_request+0x2a/0x38d
 [c012c765] lock_timer_base+0x19/0x35
 [c012ca14] del_timer+0x48/0x4e
 [c01d7106] blk_remove_plug+0x57/0x63
 [c01d712f] __generic_unplug_device+0x1d/0x1f
 [c01d863c] __make_request+0x497/0x4ea
 [c01d5c12] generic_make_request+0x3b2/0x3e0
 [c01dedbc] __next_cpu+0x12/0x21
 [c012c765] lock_timer_base+0x19/0x35
 [c012ca14] del_timer+0x48/0x4e
 [c01d7106] blk_remove_plug+0x57/0x63
 [f88b4f88] raid1d+0x9f/0xcf1 [raid1]
 [c0103046] __switch_to+0x9d/0x11f
 [c02bcbda] schedule+0x588/0x5ec
 [c02bcde2] schedule_timeout+0x13/0x8d
 [c0104988] apic_timer_interrupt+0x28/0x30
 [f8938857] md_thread+0xb9/0xcf [md_mod]
 [c0135489] autoremove_wake_function+0x0/0x35
 [f893879e] md_thread+0x0/0xcf [md_mod]
 [c01353c2] kthread+0x38/0x5e
 [c013538a] kthread+0x0/0x5e
 [c0104b17] kernel_thread_helper+0x7/0x10
 ===
Code: 01 00 00 8b 80 30 01 00 00 84 c0 0f 88 82 03 00 00 8b 44 24 14 e8 0f 56 
8d c7 85 c0 89 c6 0f 84 6f 03 00 00 66 83 78 68 1f 76 04 0f 0b eb fe ba 01 00 
00 00 89 e8 e8 43 c9 ff ff 85 c0 89 c3 0f 
EIP: [f88fe9df] do_cciss_request+0x3c/0x3b6 [cciss] SS:ESP 0068:df8dbbc4
---[ end trace fba57bb3d3d2d56f ]---

I use fai to install the machine, so it's pretty easy to do a test install
as long as the kernel doesn't need packages outside the current etch set...

Leo.



-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Bug#426705: cciss: kernel BUG at drivers/block/cciss.c:2479

2008-06-10 Thread Leo Weppelman
On Tue, Jun 03, 2008 at 01:01:55AM +0200, maximilian attems wrote:
> On Wed, 30 May 2007, Leo Weppelman wrote:
>
> > Package: linux-image-2.6.21-1-686
> > Version: 2.6.21-4
> >
> > The trace written to the console:
> > =
> >
> > kernel BUG at drivers/block/cciss.c:2479!
> > invalid opcode:  [#1]
> > SMP
>
> hmm i see.
> >
> > How to reproduce:
> > =
> >
> > I have an ML-350-G5 with an E200i raid controller. There are 2 logical drives
> > defined that map 1-1 on a physical drive. Those disks are part of a software
> > RAID-1 array. When initializing an oracle database on the system, the
> > system panics.
> >
> >
> > Leo.
>
> can you still reproduce the error with an up to date kernel aka at
> least 2.6.24? better 2.6.25 as this one is still upstream supported?
>
> thanks for coming back and sorry for late ping?!
>
> greetings
>
> --
> maks

I tried it with the 2.6.25 (linux-image-2.6.25-2-686_2.6.25-4_i386.deb) today
and I can no longer reproduce the bug as I could with 2.6.21.
If you want some additional tests, let me know. I'll recycle the installed
configuration somewhere next week for some other work.

Leo.






Bug#426705: cciss: kernel BUG at drivers/block/cciss.c:2479

2008-06-10 Thread dann frazier
On Tue, Jun 10, 2008 at 04:24:16PM +0200, Leo Weppelman wrote:
> On Tue, Jun 03, 2008 at 01:01:55AM +0200, maximilian attems wrote:
> > On Wed, 30 May 2007, Leo Weppelman wrote:
> >
> > > Package: linux-image-2.6.21-1-686
> > > Version: 2.6.21-4
> > >
> > > The trace written to the console:
> > > =
> > >
> > > kernel BUG at drivers/block/cciss.c:2479!
> > > invalid opcode:  [#1]
> > > SMP
> >
> > hmm i see.
> > >
> > > How to reproduce:
> > > =
> > >
> > > I have an ML-350-G5 with an E200i raid controller. There are 2 logical drives
> > > defined that map 1-1 on a physical drive. Those disks are part of a software
> > > RAID-1 array. When initializing an oracle database on the system, the
> > > system panics.
> > >
> > >
> > > Leo.
> >
> > can you still reproduce the error with an up to date kernel aka at
> > least 2.6.24? better 2.6.25 as this one is still upstream supported?
> >
> > thanks for coming back and sorry for late ping?!
> >
> > greetings
> >
>
> I tried it with the 2.6.25 (linux-image-2.6.25-2-686_2.6.25-4_i386.deb) today
> and I can no longer reproduce the bug as I could with 2.6.21.
> If you want some additional tests, let me know. I'll recycle the installed
> configuration somewhere next week for some other work.

Well, since you asked :) Can you test the etchnhalf kernel? Latest one is:
  
http://http.us.debian.org/debian/pool/main/l/linux-2.6.24/linux-image-2.6.24-etchnhalf.1-686_2.6.24-6~etchnhalf.2_i386.deb

-- 
dann frazier







Bug#426705: cciss: kernel BUG at drivers/block/cciss.c:2479

2008-06-02 Thread maximilian attems
On Wed, 30 May 2007, Leo Weppelman wrote:

> Package: linux-image-2.6.21-1-686
> Version: 2.6.21-4
>
> The trace written to the console:
> =
>
> kernel BUG at drivers/block/cciss.c:2479!
> invalid opcode:  [#1]
> SMP

hmm i see.
>
> How to reproduce:
> =
>
> I have an ML-350-G5 with an E200i raid controller. There are 2 logical drives
> defined that map 1-1 on a physical drive. Those disks are part of a software
> RAID-1 array. When initializing an oracle database on the system, the
> system panics.
>
>
> Leo.

can you still reproduce the error with an up to date kernel aka at
least 2.6.24? better 2.6.25 as this one is still upstream supported?

thanks for coming back and sorry for late ping?!

greetings

-- 
maks






Bug#426705: cciss: kernel BUG at drivers/block/cciss.c:2479

2007-06-21 Thread Leo Weppelman
I was asked to look at http://bugzilla.kernel.org/show_bug.cgi?id=7763 since
that bug looked to be related.
It does indeed look like the raid1 module is violating the queue limits of the
cciss module, since the panic happens at the line:
  BUG_ON(creq->nr_phys_segments > MAXSGENTRIES);
in the cciss.c:do_cciss_request() function.

I tried the patch suggested in the other bug-thread:
http://www.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/dm-merge-max_hw_sector.patch

But it did not help. It still crashes at exactly the same spot.
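
For illustration only, here is a small user-space sketch of the check that
fires. It is not the driver code. The struct and function names are made up
for this note, and the limit of 31 is an assumption read off the "Code:" bytes
in the oops (66 83 78 68 1f 76 04 0f 0b decodes to a 16-bit compare against
0x1f followed by a jump-if-below-or-equal over ud2), so the real definition in
the cciss headers should be checked before relying on it.

```c
/* Assumption: 31 (0x1f) is taken from the compare immediate visible in the
 * oops "Code:" bytes, not copied from the cciss headers. */
#define MAXSGENTRIES 31

/* Hypothetical stand-in for the single struct request field involved. */
struct fake_request {
    unsigned short nr_phys_segments; /* scatter-gather segments in the request */
};

/* Mirrors the BUG_ON condition: returns 1 when the request fits into the
 * driver's scatter-gather list, 0 when the real driver would hit BUG_ON. */
static int cciss_would_accept(const struct fake_request *creq)
{
    return creq->nr_phys_segments <= MAXSGENTRIES;
}
```

With these assumptions a request carrying 31 segments passes and one carrying
32 trips the check, which is what an upper layer ignoring the queue's segment
limit would produce.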




Bug#426705: cciss: kernel BUG at drivers/block/cciss.c:2479

2007-05-30 Thread Leo Weppelman
Package: linux-image-2.6.21-1-686
Version: 2.6.21-4

The trace written to the console:
=

kernel BUG at drivers/block/cciss.c:2479!
invalid opcode:  [#1]
SMP 
Modules linked in: mptctl sg nfsd exportfs lockd nfs_acl sunrpc ipv6 8021q raw 
dm_snapshot shpchp pci_hotplug psmouse serio_raw pcspkr ext3 jbd mbcache raid1 
md_mod dm_mod ide_generic ide_cd usbhid hid piix tg3 cciss bnx2 generic 
ehci_hcd ide_core uhci_hcd usbcore thermal processor fan sr_mod cdrom mptscsih 
mptbase aic7xxx sym53c8xx scsi_transport_spi BusLogic scsi_mod
CPU:0
EIP:0060:[f894c119]Not tainted VLI
EFLAGS: 00010012   (2.6.21-1-686 #1)
EIP is at do_cciss_request+0x44/0x349 [cciss]
eax: f693b350   ebx: dfb7dbac   ecx:    edx: 
esi: 0800   edi: f5ff99c0   ebp: f693b350   esp: dfcafbec
ds: 007b   es: 007b   fs: 00d8  gs:   ss: 0068
Process md0_raid1 (pid: 6108, ti=dfcae000 task=df981a90 task.ti=dfcae000)
Stack:  dfd6004c dfb7dbac 0c00 dfd1 0001 0001 0001 
   dfcafc2c c17430e0 0c00  0200 c16bad80   
   c195a24c dfebaadc  0040 0040 dfcafc9c dfd60250 dfc97bec 
Call Trace:
 [c01bd25b] elv_next_request+0x10d/0x11c
 [f894a105] start_io+0x7b/0xe1 [cciss]
 [f894c413] do_cciss_request+0x33e/0x349 [cciss]
 [c016438a] cache_alloc_refill+0x58/0x466
 [c01c6dfc] cfq_set_request+0x299/0x315
 [c01bd54a] elv_rb_add+0x65/0x6d
 [c01c63e8] cfq_add_rq_rb+0x5c/0x6b
 [c01c6421] cfq_insert_request+0x2a/0x3ae
 [c0129e5f] lock_timer_base+0x15/0x2f
 [c012a15d] del_timer+0x48/0x4e
 [c01bfa97] blk_remove_plug+0x57/0x63
 [c0129f73] __mod_timer+0x9c/0xa6
 [c01bfac0] __generic_unplug_device+0x1d/0x1f
 [c01c0c02] __make_request+0x34c/0x46c
 [c01bedfa] generic_make_request+0x1a9/0x1b9
 [c012a15d] del_timer+0x48/0x4e
 [c01bfa97] blk_remove_plug+0x57/0x63
 [f8961ba9] raid1d+0xbf/0xd0e [raid1]
 [c0102ff1] __switch_to+0xfe/0x131
 [c011b0fe] __activate_task+0x1c/0x29
 [c029d6d9] schedule_timeout+0x13/0x8d
 [c0124b9c] do_exit+0x6c2/0x6c6
 [f89a1a71] md_thread+0xc6/0xdd [md_mod]
 [c01328e5] autoremove_wake_function+0x0/0x35
 [f89a19ab] md_thread+0x0/0xdd [md_mod]
 [c013281a] kthread+0xb2/0xdc
 [c0132768] kthread+0x0/0xdc
 [c01049a7] kernel_thread_helper+0x7/0x10
 ===
Code: 44 24 10 8b 82 dc 00 00 00 84 c0 0f 88 0b 03 00 00 8b 44 24 08 e8 46 10 
87 c7 85 c0 89 c5 0f 84 f8 02 00 00 66 83 78 68 1f 76 04 0f 0b eb fe 8b 44 24 
10 ba 01 00 00 00 e8 a2 f9 ff ff 85 c0 89 
EIP: [f894c119] do_cciss_request+0x44/0x349 [cciss] SS:ESP 0068:dfcafbec


How to reproduce:
=

I have an ML-350-G5 with an E200i raid controller. There are 2 logical drives
defined that map 1-1 on a physical drive. Those disks are part of a software
RAID-1 array. When initializing an oracle database on the system, the
system panics.


Leo.

