Hi there.  I'm getting kernel panics, and I don't know why.

I have a 6-drive SCSI multipack connected to an LSI Logic / Symbios Logic 53c875
(using the ncrs driver).  The box itself is an older Dell 1600SC with 1.GB RAM
(32-bit Xeon).  The box, SCSI card, and multipack have been rock solid for the
past 7 years.

I installed OpenSolaris 2008.05 (snv_86) and created a ZFS pool (RAID 1+0)
across the 6 drives.  When I copy files across the network to it, the machine
eventually panics (anywhere between 5 minutes and 2 hours in).

Interestingly, I have another card of the same model, another SCSI disk pack,
and another machine (PowerEdge SC440, Core 2 Duo), also running OpenSolaris
2008.05.  I get identical panics on that box, whether I use the 64-bit (glm?)
driver or the 32-bit ncrs driver.

I upgraded the Dell 1600SC to snv_91 in the hope that the problem would 
magically go away.  It didn't :-(

I added "set kmem_flags=0xf" to /etc/system, and here's the most recent panic:

Jun 26 21:31:03 barcelona genunix: [ID 478202 kern.notice] kernel memory allocator:
Jun 26 21:31:03 barcelona genunix: [ID 432124 kern.notice] buffer freed to wrong cache
Jun 26 21:31:03 barcelona genunix: [ID 815666 kern.notice] buffer was allocated from kmem_alloc_320,
Jun 26 21:31:03 barcelona genunix: [ID 530907 kern.notice] caller attempting free to kmem_alloc_8.
Jun 26 21:31:03 barcelona genunix: [ID 563406 kern.notice] buffer=e52c7400  bufctl=e5279200  cache: kmem_alloc_8
Jun 26 21:31:03 barcelona genunix: [ID 341866 kern.notice] previous transaction on buffer e52c7400:
Jun 26 21:31:03 barcelona genunix: [ID 991227 kern.notice] thread=e12e7ce0  time=T-0.013422618  slab=e509c088  cache: kmem_alloc_320
Jun 26 21:31:03 barcelona genunix: [ID 851371 kern.notice] kmem_cache_alloc_debug+258
Jun 26 21:31:03 barcelona genunix: [ID 851371 kern.notice] kmem_cache_alloc+8d
Jun 26 21:31:03 barcelona genunix: [ID 851371 kern.notice] kmem_zalloc+4b
Jun 26 21:31:03 barcelona genunix: [ID 851371 kern.notice] glm_pkt_alloc_extern+83
Jun 26 21:31:03 barcelona genunix: [ID 851371 kern.notice] glm_scsi_init_pkt+129
Jun 26 21:31:03 barcelona genunix: [ID 851371 kern.notice] scsi_init_pkt+48
Jun 26 21:31:03 barcelona genunix: [ID 851371 kern.notice] sd_initpkt_for_uscsi+9e
Jun 26 21:31:03 barcelona genunix: [ID 851371 kern.notice] sd_start_cmds+15f
Jun 26 21:31:03 barcelona genunix: [ID 851371 kern.notice] sd_core_iostart+158
Jun 26 21:31:03 barcelona genunix: [ID 851371 kern.notice] sd_uscsi_strategy+108
Jun 26 21:31:03 barcelona genunix: [ID 851371 kern.notice] default_physio+31b
Jun 26 21:31:03 barcelona genunix: [ID 851371 kern.notice] physio+1d
Jun 26 21:31:03 barcelona genunix: [ID 851371 kern.notice] scsi_uscsi_handle_cmd+16d
Jun 26 21:31:03 barcelona genunix: [ID 851371 kern.notice] sd_send_scsi_cmd+13f
Jun 26 21:31:03 barcelona genunix: [ID 851371 kern.notice] sdioctl+c86
Jun 26 21:31:03 barcelona unix: [ID 836849 kern.notice]
Jun 26 21:31:03 barcelona ^Mpanic[cpu0]/thread=d391cde0:
Jun 26 21:31:03 barcelona genunix: [ID 812275 kern.notice] kernel heap corruption detected
Jun 26 21:31:03 barcelona unix: [ID 100000 kern.notice]
Jun 26 21:31:03 barcelona genunix: [ID 353471 kern.notice] d391cc20 genunix:kmem_error+421 (6, d1024398, e52c74)
Jun 26 21:31:03 barcelona genunix: [ID 353471 kern.notice] d391cc5c genunix:kmem_free+bf (e52c7400, 8)
Jun 26 21:31:03 barcelona genunix: [ID 353471 kern.notice] d391cc78 ncrs:glm_pkt_destroy_extern+60 (d7a77600, e9767388)
Jun 26 21:31:03 barcelona genunix: [ID 353471 kern.notice] d391cc90 ncrs:glm_scsi_destroy_pkt+42 (e97674a8, e97674a4)
Jun 26 21:31:03 barcelona genunix: [ID 353471 kern.notice] d391cca8 scsi:scsi_destroy_pkt+16 (e97674a4)
Jun 26 21:31:03 barcelona genunix: [ID 353471 kern.notice] d391ccc8 sd:sd_destroypkt_for_uscsi+89 (d9365de0)
Jun 26 21:31:03 barcelona genunix: [ID 353471 kern.notice] d391ccf4 sd:sd_return_command+124 (d4106a80, d9365de0)
Jun 26 21:31:03 barcelona genunix: [ID 353471 kern.notice] d391cd28 sd:sdintr+499 (e97674a4)
Jun 26 21:31:03 barcelona genunix: [ID 353471 kern.notice] d391cd4c ncrs:glm_doneq_empty+3b (d7a77600)
Jun 26 21:31:03 barcelona genunix: [ID 353471 kern.notice] d391cd60 ncrs:glm_intr+75 (d7a77600, 0)
Jun 26 21:31:03 barcelona genunix: [ID 353471 kern.notice] d391cdac unix:av_dispatch_autovect+69 (14)
Jun 26 21:31:03 barcelona genunix: [ID 353471 kern.notice] d391cdcc unix:dispatch_hardint+1a (14, 0)

jwa@barcelona:/var/crash/barcelona# mdb -k unix.8 vmcore.8
Loading modules: [ unix genunix specfs dtrace cpu.generic uppc pcplusmp scsi_vhci zfs mpt sd ip hook neti sctp arp usba fctl md lofs random sppp crypto ptm nfs fcip fcp cpc logindmux nsctl ii sdbc ufs rdc nsmb sv ]
> ::status
debugging crash dump vmcore.8 (32-bit) from barcelona
operating system: 5.11 snv_91 (i86pc)
panic message: kernel heap corruption detected
dump content: kernel pages only
> ::panicinfo
             cpu        0
          thread d391cde0
         message kernel heap corruption detected
              gs fec301b0
              fs fec30000
              es fec30160
              ds fec30160
             edi        f
             esi e5279200
             ebp d391cbd4
             esp d391cbc4
             ebx e5279264
             edx        0
             ecx        f
             eax d391cbe0
          trapno        0
             err        0
             eip fe838350
              cs fec30158
          eflags      282
            uesp        0
              ss fec30160
             gdt fe7fe00002cf
             idt fe7fd00007ff
             ldt        0
            task      150
             cr0 8005003b
             cr2 cfe23174
             cr3  24c0000
             cr4      6d8
> $C
d391cbd4 vpanic(fea67a08)
d391cc20 kmem_error+0x421(6, d1024398, e52c7400)
d391cc5c kmem_free+0xbf(e52c7400, 8)
d391cc78 glm_pkt_destroy_extern+0x60(d7a77600, e9767388)
d391cc90 glm_scsi_destroy_pkt+0x42(e97674a8, e97674a4)
d391cca8 scsi_destroy_pkt+0x16(e97674a4)
d391ccc8 sd_destroypkt_for_uscsi+0x89(d9365de0)
d391ccf4 sd_return_command+0x124(d4106a80, d9365de0)
d391cd28 sdintr+0x499(e97674a4)
d391cd4c glm_doneq_empty+0x3b(d7a77600)
d391cd60 glm_intr+0x75(d7a77600, 0)
d391cdac av_dispatch_autovect+0x69(14)
d391cdcc dispatch_hardint+0x1a(14, 0)
d918bc6c switch_sp_and_call+0xf(d391cddc, fe8196c4, 14, 0)
d918bca8 do_interrupt+0x7c(d918bcb8, f6c57c80)
d918bcb8 _interrupt+0x59()
d918bd38 bcopy+0x13(d42e8b68)
d918bd60 zio_done+0x2a(d42e8b68)
d918bd78 zio_execute+0x66()
d918bdc8 taskq_thread+0x176(d547e388, 0)
d918bdd8 thread_start+8()

jwa@barcelona:/var/crash/barcelona# modinfo | grep ncrs
163 f8c1c000   abb4  75   1  ncrs (NCRS SCSI HBA Driver 1.25)
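
For anyone reading the dump: my understanding is that kmem_free() takes the buffer plus the size the caller says it has, and uses that size to pick which kmem_alloc_N cache the buffer goes back to. With kmem_flags debugging on, kmem notices that this buffer actually came from kmem_alloc_320 (allocated by kmem_zalloc() under glm_pkt_alloc_extern), while glm_pkt_destroy_extern called kmem_free() with a size of 8, and it panics. Here is a toy user-space model of that check, just to illustrate the idea. Everything in it (toy_zalloc, toy_free, the size-class table, the per-buffer header) is hypothetical; it is not the real kmem code.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical size classes, loosely echoing cache names such as
 * kmem_alloc_8 and kmem_alloc_320. The real kmem table is different. */
static const size_t class_sizes[] = {8, 16, 32, 64, 128, 320, 512};
#define NCLASSES (sizeof(class_sizes) / sizeof(class_sizes[0]))

/* Per-buffer header remembering which class the buffer came from
 * (a stand-in for the bookkeeping kmem keeps when debugging is on). */
typedef struct {
    size_t cls;
} toy_hdr_t;

/* Map a requested size to the smallest class that fits, or -1. */
static int size_to_class(size_t size)
{
    for (size_t i = 0; i < NCLASSES; i++) {
        if (size <= class_sizes[i])
            return (int)i;
    }
    return -1;
}

/* Zeroed allocation from the class chosen by the requested size. */
void *toy_zalloc(size_t size)
{
    int cls = size_to_class(size);
    if (cls < 0)
        return NULL;
    toy_hdr_t *h = calloc(1, sizeof(*h) + class_sizes[cls]);
    if (h == NULL)
        return NULL;
    h->cls = (size_t)cls;
    return h + 1;
}

/* Free using the caller-supplied size, like kmem_free(buf, size).
 * If that size maps to a different class than the buffer's real one,
 * report it and return -1 without freeing (the kernel panics instead). */
int toy_free(void *buf, size_t size)
{
    toy_hdr_t *h = (toy_hdr_t *)buf - 1;
    int cls = size_to_class(size);

    if (cls < 0 || (size_t)cls != h->cls) {
        fprintf(stderr,
                "buffer freed to wrong cache: allocated from toy_alloc_%zu, "
                "caller attempting free with size %zu\n",
                class_sizes[h->cls], size);
        return -1;
    }
    free(h);
    return 0;
}
```

So toy_zalloc(300) lands in the 320-byte class, toy_free(buf, 8) reports the mismatch and returns -1, and toy_free(buf, 300) releases it normally. That's the same shape as the panic above: the allocation and the free disagree about the buffer's size.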


I've also booted from the 2008.05 CD and tried doing I/O (mostly tar runs and
copying large files around); it panics there, too.  So it's not something odd
I've done to /etc/system or a /kernel/drv/*.conf file.


Because this affects two different machines with two separate cards of the same
model, I'm tempted to point the finger at the SCSI driver... but about two years
ago I put one of these SCSI cards in an older x86 box running Solaris 10 (01/06,
I believe), as well as in an Ultra 10 running Solaris 10 06/06, and it worked
without panicking.

Another tidbit: sometimes it panics when I run the 'format' command.

Any suggestions?

thanks,
James
This message posted from opensolaris.org