Re: [storage-discuss] scsi driver + moderate I/O == panic: kernel heap corruption detected

Chris Horne Sat, 28 Jun 2008 21:11:22 -0700

It looks to me like both the glm and ncrs drivers are missing

+       cmd->cmd_cdblen         = (uchar_t)cdblen;
        cmd->cmd_scblen         = (uchar_t)statuslen;
        cmd->cmd_privlen        = (uchar_t)tgtlen;


in their glm_pkt_alloc_extern() implementation.  It looks like this error has
been lurking for a long time (since glm_pkt_alloc_extern was introduced).
You may be seeing this now because you are now using a disk that requires use
of >12 byte CDBs.
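
To illustrate the failure mode, here is a minimal user-space sketch. It is
not the glm/ncrs source: struct toy_cmd, toy_zalloc(), toy_free() and the
other toy_* names are hypothetical stand-ins, and the stale length of 8 is an
arbitrary assumption. The toy allocator remembers the size it handed out and
reports a mismatch at free time, roughly the way kmem debugging catches a
kmem_free() with the wrong size. If the alloc path never records cdblen, the
destroy path frees with whatever length was left in the command structure.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, cut-down stand-in for the driver's per-command structure. */
struct toy_cmd {
    unsigned char cmd_cdblen;   /* length later used to free the CDB */
    unsigned char cmd_scblen;
    unsigned char cmd_privlen;
    void *cdbp;                 /* externally allocated CDB buffer */
};

/* Toy allocator: a small header remembers the size that was requested. */
struct toy_hdr { size_t size; };

static void *toy_zalloc(size_t size)
{
    struct toy_hdr *h = calloc(1, sizeof (*h) + size);
    if (h == NULL)
        return NULL;
    h->size = size;
    return h + 1;
}

/* Returns 0 if the caller's size matches the allocation, -1 otherwise.
 * A debug kernel allocator would panic instead of returning -1. */
static int toy_free(void *buf, size_t size)
{
    struct toy_hdr *h = (struct toy_hdr *)buf - 1;
    int rc = (h->size == size) ? 0 : -1;
    free(h);
    return rc;
}

/* record_cdblen == 0 models the suspected bug (assignment missing). */
static int toy_pkt_alloc_extern(struct toy_cmd *cmd, int cdblen,
    int statuslen, int tgtlen, int record_cdblen)
{
    cmd->cdbp = toy_zalloc((size_t)cdblen);
    if (cmd->cdbp == NULL)
        return -1;
    if (record_cdblen)
        cmd->cmd_cdblen = (unsigned char)cdblen;
    cmd->cmd_scblen = (unsigned char)statuslen;
    cmd->cmd_privlen = (unsigned char)tgtlen;
    return 0;
}

/* Frees the CDB using the recorded length, as a destroy path would. */
static int toy_pkt_destroy_extern(struct toy_cmd *cmd)
{
    int rc = toy_free(cmd->cdbp, cmd->cmd_cdblen);
    cmd->cdbp = NULL;
    return rc;
}

/* Allocate and destroy one packet; cmd_cdblen starts at a stale 8. */
int toy_roundtrip(int cdblen, int record_cdblen)
{
    struct toy_cmd cmd = { 8, 0, 0, NULL };
    if (toy_pkt_alloc_extern(&cmd, cdblen, 0, 0, record_cdblen) != 0)
        return -2;
    return toy_pkt_destroy_extern(&cmd);
}
```

In this sketch toy_roundtrip(16, 0) reports a size mismatch while
toy_roundtrip(16, 1) frees cleanly, which is the difference the one-line
assignment above is meant to make.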

-Chris

jwa wrote:
> [ cross posted from help: 
> http://opensolaris.org/jive/thread.jspa?threadID=65070 ]
> 
> 
> Hi there. I'm getting kernel panics, and I don't know why.
> 
> I have a 6 drive SCSI multipack connected to a LSI Logic / Symbios Logic 
> 53c875 (using the ncrs driver). The box itself is an older Dell 1600SC with 
> 1 GB RAM (32-bit Xeon). The box, SCSI card, and multipack have been rock 
> solid for the past 7 years.
> 
> I installed opensolaris 2008.05 (snv_86) and created a ZFS volume (raid 1+0) 
> across the 6 drives. When I copy files across the network to the volume, the 
> machine will eventually (anywhere between 5 minutes and 2 hours) panic.
> 
> Interestingly, I have the same model card, another SCSI disk pack, and 
> another machine (PowerEdge SC440, core2 duo). On this box, I'm also running 
> opensolaris 2008.05. I get identical panics, whether using the 64 bit (glm?) 
> driver or the 32 bit ncrs driver.
> 
> I upgraded the Dell 1600SC to snv_91 in the hope that the problem would 
> magically go away. It didn't :-(
> 
> I added "set kmem_flags=0xf" to /etc/system & here's the most recent panic:
> 
> Jun 26 21:31:03 barcelona genunix: [ID 478202 kern.notice] kernel memory 
> allocator:
> Jun 26 21:31:03 barcelona genunix: [ID 432124 kern.notice] buffer freed to 
> wrong cache
> Jun 26 21:31:03 barcelona genunix: [ID 815666 kern.notice] buffer was 
> allocated from kmem_alloc_320,
> Jun 26 21:31:03 barcelona genunix: [ID 530907 kern.notice] caller attempting 
> free to kmem_alloc_8.
> Jun 26 21:31:03 barcelona genunix: [ID 563406 kern.notice] buffer=e52c7400 
> bufctl=e5279200 cache: kmem_alloc_8
> Jun 26 21:31:03 barcelona genunix: [ID 341866 kern.notice] previous 
> transaction on buffer e52c7400:
> Jun 26 21:31:03 barcelona genunix: [ID 991227 kern.notice] thread=e12e7ce0 
> time=T-0.013422618 slab=e509c088 cache: kmem_alloc_320
> Jun 26 21:31:03 barcelona genunix: [ID 851371 kern.notice] 
> kmem_cache_alloc_debug+258
> Jun 26 21:31:03 barcelona genunix: [ID 851371 kern.notice] kmem_cache_alloc+8d
> Jun 26 21:31:03 barcelona genunix: [ID 851371 kern.notice] kmem_zalloc+4b
> Jun 26 21:31:03 barcelona genunix: [ID 851371 kern.notice] 
> glm_pkt_alloc_extern+83
> Jun 26 21:31:03 barcelona genunix: [ID 851371 kern.notice] 
> glm_scsi_init_pkt+129
> Jun 26 21:31:03 barcelona genunix: [ID 851371 kern.notice] scsi_init_pkt+48
> Jun 26 21:31:03 barcelona genunix: [ID 851371 kern.notice] 
> sd_initpkt_for_uscsi+9e
> Jun 26 21:31:03 barcelona genunix: [ID 851371 kern.notice] sd_start_cmds+15f
> Jun 26 21:31:03 barcelona genunix: [ID 851371 kern.notice] sd_core_iostart+158
> Jun 26 21:31:03 barcelona genunix: [ID 851371 kern.notice] 
> sd_uscsi_strategy+108
> Jun 26 21:31:03 barcelona genunix: [ID 851371 kern.notice] default_physio+31b
> Jun 26 21:31:03 barcelona genunix: [ID 851371 kern.notice] physio+1d
> Jun 26 21:31:03 barcelona genunix: [ID 851371 kern.notice] 
> scsi_uscsi_handle_cmd+16d
> Jun 26 21:31:03 barcelona genunix: [ID 851371 kern.notice] 
> sd_send_scsi_cmd+13f
> Jun 26 21:31:03 barcelona genunix: [ID 851371 kern.notice] sdioctl+c86
> Jun 26 21:31:03 barcelona unix: [ID 836849 kern.notice]
> Jun 26 21:31:03 barcelona ^Mpanic[cpu0]/thread=d391cde0:
> Jun 26 21:31:03 barcelona genunix: [ID 812275 kern.notice] kernel heap 
> corruption detected
> Jun 26 21:31:03 barcelona unix: [ID 100000 kern.notice]
> Jun 26 21:31:03 barcelona genunix: [ID 353471 kern.notice] d391cc20 
> genunix:kmem_error+421 (6, d1024398, e52c74)
> Jun 26 21:31:03 barcelona genunix: [ID 353471 kern.notice] d391cc5c 
> genunix:kmem_free+bf (e52c7400, 8)
> Jun 26 21:31:03 barcelona genunix: [ID 353471 kern.notice] d391cc78 
> ncrs:glm_pkt_destroy_extern+60 (d7a77600, e9767388)
> Jun 26 21:31:03 barcelona genunix: [ID 353471 kern.notice] d391cc90 
> ncrs:glm_scsi_destroy_pkt+42 (e97674a8, e97674a4)
> Jun 26 21:31:03 barcelona genunix: [ID 353471 kern.notice] d391cca8 
> scsi:scsi_destroy_pkt+16 (e97674a4)
> Jun 26 21:31:03 barcelona genunix: [ID 353471 kern.notice] d391ccc8 
> sd:sd_destroypkt_for_uscsi+89 (d9365de0)
> Jun 26 21:31:03 barcelona genunix: [ID 353471 kern.notice] d391ccf4 
> sd:sd_return_command+124 (d4106a80, d9365de0)
> Jun 26 21:31:03 barcelona genunix: [ID 353471 kern.notice] d391cd28 
> sd:sdintr+499 (e97674a4)
> Jun 26 21:31:03 barcelona genunix: [ID 353471 kern.notice] d391cd4c 
> ncrs:glm_doneq_empty+3b (d7a77600)
> Jun 26 21:31:03 barcelona genunix: [ID 353471 kern.notice] d391cd60 
> ncrs:glm_intr+75 (d7a77600, 0)
> Jun 26 21:31:03 barcelona genunix: [ID 353471 kern.notice] d391cdac 
> unix:av_dispatch_autovect+69 (14)
> Jun 26 21:31:03 barcelona genunix: [ID 353471 kern.notice] d391cdcc 
> unix:dispatch_hardint+1a (14, 0)
> 
> 
> 
> 
> [EMAIL PROTECTED]:/var/crash/barcelona# mdb -k unix.8 vmcore.8
> Loading modules: [ unix genunix specfs dtrace cpu.generic uppc pcplusmp 
> scsi_vhci zfs mpt sd ip hook neti sctp arp usba fctl md lofs random sppp 
> crypto ptm nfs fcip fcp cpc logindmux nsctl ii sdbc ufs rdc nsmb sv ]
> 
>>::status
> 
> debugging crash dump vmcore.8 (32-bit) from barcelona
> operating system: 5.11 snv_91 (i86pc)
> panic message: kernel heap corruption detected
> dump content: kernel pages only
> 
>>::panicinfo
> 
> cpu 0
> thread d391cde0
> message kernel heap corruption detected
> gs fec301b0
> fs fec30000
> es fec30160
> ds fec30160
> edi f
> esi e5279200
> ebp d391cbd4
> esp d391cbc4
> ebx e5279264
> edx 0
> ecx f
> eax d391cbe0
> trapno 0
> err 0
> eip fe838350
> cs fec30158
> eflags 282
> uesp 0
> ss fec30160
> gdt fe7fe00002cf
> idt fe7fd00007ff
> ldt 0
> task 150
> cr0 8005003b
> cr2 cfe23174
> cr3 24c0000
> cr4 6d8
> 
>>$C
> 
> d391cbd4 vpanic(fea67a08)
> d391cc20 kmem_error+0x421(6, d1024398, e52c7400)
> d391cc5c kmem_free+0xbf(e52c7400, 8)
> d391cc78 glm_pkt_destroy_extern+0x60(d7a77600, e9767388)
> d391cc90 glm_scsi_destroy_pkt+0x42(e97674a8, e97674a4)
> d391cca8 scsi_destroy_pkt+0x16(e97674a4)
> d391ccc8 sd_destroypkt_for_uscsi+0x89(d9365de0)
> d391ccf4 sd_return_command+0x124(d4106a80, d9365de0)
> d391cd28 sdintr+0x499(e97674a4)
> d391cd4c glm_doneq_empty+0x3b(d7a77600)
> d391cd60 glm_intr+0x75(d7a77600, 0)
> d391cdac av_dispatch_autovect+0x69(14)
> d391cdcc dispatch_hardint+0x1a(14, 0)
> d918bc6c switch_sp_and_call+0xf(d391cddc, fe8196c4, 14, 0)
> d918bca8 do_interrupt+0x7c(d918bcb8, f6c57c80)
> d918bcb8 _interrupt+0x59()
> d918bd38 bcopy+0x13(d42e8b68)
> d918bd60 zio_done+0x2a(d42e8b68)
> d918bd78 zio_execute+0x66()
> d918bdc8 taskq_thread+0x176(d547e388, 0)
> d918bdd8 thread_start+8()
> 
> [EMAIL PROTECTED]:/var/crash/barcelona# modinfo | grep ncrs
> 163 f8c1c000 abb4 75 1 ncrs (NCRS SCSI HBA Driver 1.25)
> 
> 
> I've also booted off of the 2008.05 CD and tried to do I/O (mostly tars & 
> copying large files around); it panics from there, too. So it's not some 
> funny thing I've done to /etc/system or a /kernel/drv/*.conf file.
> 
> 
> Because this is affecting two different machines, each with its own SCSI 
> card of the same model, I'm tempted to point the finger at the SCSI driver... 
> but about two years ago, I put one of these SCSI cards in an older x86 box 
> running Solaris 10 (01/06 I believe) as well as an Ultra 10 running 06/06 and 
> it worked w/o panicking.
> 
> Another tidbit: sometimes it panics when I run the 'format' command.
> 
> Any suggestions?
> 
> thanks,
> James
>  
>  
> This message posted from opensolaris.org
> _______________________________________________
> storage-discuss mailing list
> [email protected]
> http://mail.opensolaris.org/mailman/listinfo/storage-discuss

