[ cross posted from help: http://opensolaris.org/jive/thread.jspa?threadID=65070 ]
Hi there. I'm getting kernel panics, and I don't know why.

I have a 6-drive SCSI multipack connected to an LSI Logic / Symbios Logic 53c875 (using the ncrs driver). The box itself is an older Dell 1600SC with 1.GB RAM (32-bit Xeon). The box, SCSI card, and multipack have been rock solid for the past 7 years.

I installed OpenSolaris 2008.05 (snv_86) and created a ZFS volume (RAID 1+0) across the 6 drives. When I copy files across the network to the volume, the machine eventually panics (anywhere between 5 minutes and 2 hours in).

Interestingly, I have the same model card, another SCSI disk pack, and another machine (PowerEdge SC440, Core 2 Duo). On this box, I'm also running OpenSolaris 2008.05. I get identical panics, whether using the 64-bit (glm?) driver or the 32-bit ncrs driver.

I upgraded the Dell 1600SC to snv_91 in the hope that the problem would magically go away. It didn't :-(

I added "set kmem_flags=0xf" to /etc/system, and here's the most recent panic:

Jun 26 21:31:03 barcelona genunix: [ID 478202 kern.notice] kernel memory allocator:
Jun 26 21:31:03 barcelona genunix: [ID 432124 kern.notice] buffer freed to wrong cache
Jun 26 21:31:03 barcelona genunix: [ID 815666 kern.notice] buffer was allocated from kmem_alloc_320,
Jun 26 21:31:03 barcelona genunix: [ID 530907 kern.notice] caller attempting free to kmem_alloc_8.
Jun 26 21:31:03 barcelona genunix: [ID 563406 kern.notice] buffer=e52c7400 bufctl=e5279200 cache: kmem_alloc_8
Jun 26 21:31:03 barcelona genunix: [ID 341866 kern.notice] previous transaction on buffer e52c7400:
Jun 26 21:31:03 barcelona genunix: [ID 991227 kern.notice] thread=e12e7ce0 time=T-0.013422618 slab=e509c088 cache: kmem_alloc_320
Jun 26 21:31:03 barcelona genunix: [ID 851371 kern.notice] kmem_cache_alloc_debug+258
Jun 26 21:31:03 barcelona genunix: [ID 851371 kern.notice] kmem_cache_alloc+8d
Jun 26 21:31:03 barcelona genunix: [ID 851371 kern.notice] kmem_zalloc+4b
Jun 26 21:31:03 barcelona genunix: [ID 851371 kern.notice] glm_pkt_alloc_extern+83
Jun 26 21:31:03 barcelona genunix: [ID 851371 kern.notice] glm_scsi_init_pkt+129
Jun 26 21:31:03 barcelona genunix: [ID 851371 kern.notice] scsi_init_pkt+48
Jun 26 21:31:03 barcelona genunix: [ID 851371 kern.notice] sd_initpkt_for_uscsi+9e
Jun 26 21:31:03 barcelona genunix: [ID 851371 kern.notice] sd_start_cmds+15f
Jun 26 21:31:03 barcelona genunix: [ID 851371 kern.notice] sd_core_iostart+158
Jun 26 21:31:03 barcelona genunix: [ID 851371 kern.notice] sd_uscsi_strategy+108
Jun 26 21:31:03 barcelona genunix: [ID 851371 kern.notice] default_physio+31b
Jun 26 21:31:03 barcelona genunix: [ID 851371 kern.notice] physio+1d
Jun 26 21:31:03 barcelona genunix: [ID 851371 kern.notice] scsi_uscsi_handle_cmd+16d
Jun 26 21:31:03 barcelona genunix: [ID 851371 kern.notice] sd_send_scsi_cmd+13f
Jun 26 21:31:03 barcelona genunix: [ID 851371 kern.notice] sdioctl+c86
Jun 26 21:31:03 barcelona unix: [ID 836849 kern.notice]
Jun 26 21:31:03 barcelona panic[cpu0]/thread=d391cde0:
Jun 26 21:31:03 barcelona genunix: [ID 812275 kern.notice] kernel heap corruption detected
Jun 26 21:31:03 barcelona unix: [ID 100000 kern.notice]
Jun 26 21:31:03 barcelona genunix: [ID 353471 kern.notice] d391cc20 genunix:kmem_error+421 (6, d1024398, e52c74)
Jun 26 21:31:03 barcelona genunix: [ID 353471 kern.notice] d391cc5c genunix:kmem_free+bf (e52c7400, 8)
Jun 26 21:31:03 barcelona genunix: [ID 353471 kern.notice] d391cc78 ncrs:glm_pkt_destroy_extern+60 (d7a77600, e9767388)
Jun 26 21:31:03 barcelona genunix: [ID 353471 kern.notice] d391cc90 ncrs:glm_scsi_destroy_pkt+42 (e97674a8, e97674a4)
Jun 26 21:31:03 barcelona genunix: [ID 353471 kern.notice] d391cca8 scsi:scsi_destroy_pkt+16 (e97674a4)
Jun 26 21:31:03 barcelona genunix: [ID 353471 kern.notice] d391ccc8 sd:sd_destroypkt_for_uscsi+89 (d9365de0)
Jun 26 21:31:03 barcelona genunix: [ID 353471 kern.notice] d391ccf4 sd:sd_return_command+124 (d4106a80, d9365de0)
Jun 26 21:31:03 barcelona genunix: [ID 353471 kern.notice] d391cd28 sd:sdintr+499 (e97674a4)
Jun 26 21:31:03 barcelona genunix: [ID 353471 kern.notice] d391cd4c ncrs:glm_doneq_empty+3b (d7a77600)
Jun 26 21:31:03 barcelona genunix: [ID 353471 kern.notice] d391cd60 ncrs:glm_intr+75 (d7a77600, 0)
Jun 26 21:31:03 barcelona genunix: [ID 353471 kern.notice] d391cdac unix:av_dispatch_autovect+69 (14)
Jun 26 21:31:03 barcelona genunix: [ID 353471 kern.notice] d391cdcc unix:dispatch_hardint+1a (14, 0)

[EMAIL PROTECTED]:/var/crash/barcelona# mdb -k unix.8 vmcore.8
Loading modules: [ unix genunix specfs dtrace cpu.generic uppc pcplusmp scsi_vhci zfs mpt sd ip hook neti sctp arp usba fctl md lofs random sppp crypto ptm nfs fcip fcp cpc logindmux nsctl ii sdbc ufs rdc nsmb sv ]
> ::status
debugging crash dump vmcore.8 (32-bit) from barcelona
operating system: 5.11 snv_91 (i86pc)
panic message: kernel heap corruption detected
dump content: kernel pages only
> ::panicinfo
     cpu 0
  thread d391cde0
 message kernel heap corruption detected
      gs fec301b0
      fs fec30000
      es fec30160
      ds fec30160
     edi f
     esi e5279200
     ebp d391cbd4
     esp d391cbc4
     ebx e5279264
     edx 0
     ecx f
     eax d391cbe0
  trapno 0
     err 0
     eip fe838350
      cs fec30158
  eflags 282
    uesp 0
      ss fec30160
     gdt fe7fe00002cf
     idt fe7fd00007ff
     ldt 0
    task 150
     cr0 8005003b
     cr2 cfe23174
     cr3 24c0000
     cr4 6d8
> $C
d391cbd4 vpanic(fea67a08)
d391cc20 kmem_error+0x421(6, d1024398, e52c7400)
d391cc5c kmem_free+0xbf(e52c7400, 8)
d391cc78 glm_pkt_destroy_extern+0x60(d7a77600, e9767388)
d391cc90 glm_scsi_destroy_pkt+0x42(e97674a8, e97674a4)
d391cca8 scsi_destroy_pkt+0x16(e97674a4)
d391ccc8 sd_destroypkt_for_uscsi+0x89(d9365de0)
d391ccf4 sd_return_command+0x124(d4106a80, d9365de0)
d391cd28 sdintr+0x499(e97674a4)
d391cd4c glm_doneq_empty+0x3b(d7a77600)
d391cd60 glm_intr+0x75(d7a77600, 0)
d391cdac av_dispatch_autovect+0x69(14)
d391cdcc dispatch_hardint+0x1a(14, 0)
d918bc6c switch_sp_and_call+0xf(d391cddc, fe8196c4, 14, 0)
d918bca8 do_interrupt+0x7c(d918bcb8, f6c57c80)
d918bcb8 _interrupt+0x59()
d918bd38 bcopy+0x13(d42e8b68)
d918bd60 zio_done+0x2a(d42e8b68)
d918bd78 zio_execute+0x66()
d918bdc8 taskq_thread+0x176(d547e388, 0)
d918bdd8 thread_start+8()

[EMAIL PROTECTED]:/var/crash/barcelona# modinfo | grep ncrs
163 f8c1c000 abb4 75 1 ncrs (NCRS SCSI HBA Driver 1.25)

I've also booted off the 2008.05 CD and tried to do I/O (mostly tars and copying large files around); it panics from there, too. So it's not some funny thing I've done to /etc/system or a /kernel/drv/*.conf file.

Because this affects two different machines, each with its own SCSI card of the same model, I'm tempted to point the finger at the SCSI driver. But about two years ago, I put one of these SCSI cards in an older x86 box running Solaris 10 (01/06, I believe) as well as in an Ultra 10 running 06/06, and it worked without panicking.

Another tidbit: sometimes it panics when I run the 'format' command.

Any suggestions?

thanks,
James
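
As a side note for anyone comparing several of these dumps: the two cache names in the "buffer freed to wrong cache" message can be pulled out of the log mechanically. The following is only a minimal Python sketch that assumes the message wording shown in the panic above; the function name cache_mismatch and the regular expressions are illustrative and are not part of any Solaris tool.

```python
import re

# Sample messages copied from the panic above (syslog prefix trimmed).
PANIC_LINES = [
    "kernel memory allocator:",
    "buffer freed to wrong cache",
    "buffer was allocated from kmem_alloc_320,",
    "caller attempting free to kmem_alloc_8.",
]

# Assumed message formats, taken from the log text in this post.
ALLOC_RE = re.compile(r"buffer was allocated from kmem_alloc_(\d+)")
FREE_RE = re.compile(r"caller attempting free to kmem_alloc_(\d+)")


def cache_mismatch(lines):
    """Return (allocated_cache_size, freed_to_cache_size), or None if either is missing."""
    alloc = freed = None
    for line in lines:
        m = ALLOC_RE.search(line)
        if m:
            alloc = int(m.group(1))
        m = FREE_RE.search(line)
        if m:
            freed = int(m.group(1))
    if alloc is None or freed is None:
        return None
    return alloc, freed


if __name__ == "__main__":
    result = cache_mismatch(PANIC_LINES)
    if result is not None:
        alloc, freed = result
        print(f"allocated from kmem_alloc_{alloc}, freed to kmem_alloc_{freed}")
```

This only restates what the allocator already printed: the buffer came from the 320-byte cache, while the kmem_free frame in the stack was called with a size argument of 8, which is presumably why the free was directed at kmem_alloc_8.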
