Hi Jeff,

Just a suggestion. The stack trace shows an incoming ethernet packet which is (attempted to be) redirected to outbound.
Therefore I would suspect IP or the ethernet device driver instead of ZFS/iSCSI. To rule that out, you might be able to use a different type of ethernet interface?

HTH,

Joost

On Jul 3, 2008, at 8:27 PM, Jeff Brown wrote:

> Sorry if this has already been posted -- I've been searching and
> haven't found anything.
>
> For the past 2 months I've been experiencing a system crash that at
> first appeared random until I figured out it was when I was running
> a backup via bacula. I spent a month looking at bacula to see why
> it was causing a system crash only to realize it only occurred when
> I was writing to a zfs file system located on an iscsi target. The
> configuration worked fine with Indiana DP2 but since going to the
> current 05.2008 release (and continues even after patching to snv_91
> with pkg) the crash started. I was able to eliminate bacula as the
> cause by simply trying to copy a file to the iscsi/zfs filesystem --
> instant system crash.
>
> Finding the core dump in /var/crash/rebel, I get the following:
>
> -bash-3.2# echo '::status' | mdb -k 11
> debugging crash dump vmcore.11 (32-bit) from rebel
> operating system: 5.11 snv_91 (i86pc)
> panic message:
> BAD TRAP: type=e (#pf Page fault) rp=d3c826c8 addr=10 occurred in module "ip"
> due to a NULL pointer dereference
> dump content: kernel pages only
>
> -bash-3.2# echo '$C' | mdb -k 11
> d3c827ac tcp_send+0x704(d9a18238, d9fe3780, 5b4, 28, 14, 0)
> d3c82840 tcp_wput_data+0x6d1(d9fe3780, 0, 0)
> d3c8293c tcp_rput_data+0x2930(d9fe3640, d8b11180, d4efef40)
> d3c82978 squeue_enter_chain+0xea(d4efef40, d8b11180, d8b11180, 1, 1)
> d3c82a0c ip_input+0x944(d6088214, d7eba020, 0, d3c82a30)
> d3c82a80 i_dls_link_rx+0x250(d93a4dd0, d7eba020, d8b11180)
> d3c82ab8 mac_do_rx+0x9f(d93a5e20, d7eba020, d8b11180, 0)
> d3c82ad4 mac_rx+0x16(d93a5e20, d7eba020, d8b11180)
> d3c82af4 softmac_rput+0x37(d8ebcce0, d8b11180)
> d3c82b34 putnext+0x1bc(d8ebca10, d8b11180)
> d3c82b5c gld_passon+0x173(d92dc238, d8b11180, d3c82bdc, fe84f7d0)
> d3c82bac gld_sendup+0x101(d897b800, d3c82bdc, d8b11180, f9ce4cb8)
> d3c82c6c gld_recv_tagged+0x26c(d897b800, d8b11180, 0)
> d3c82c84 gld_recv+0x13(d897b800, d8b11180, bc0, 8d78)
> d3c82cf4 gem_receive+0x4ae(d61e0000, d9002020, 10080, 1)
> d3c82d24 bfe_interrupt+0x2ce(d61e0000, 17582, d3c82d54, f99c5d44)
> d3c82d54 gem_gld_intr+0x2b(d897b800)
> d3c82d60 gld_intr+0x1f(d897b800, 0)
> d3c82dac av_dispatch_autovect+0x69(13)
> d3c82dcc dispatch_hardint+0x1a(13, 0)
> d378bcfc switch_sp_and_call+0xf(d3c82ddc, fe8196c4, 13, 0)
> d378bd34 do_interrupt+0x7c(d378bd44, fec3c8a4)
> d378bd44 _interrupt+0x59()
> d378bd9c mach_cpu_idle+0x17()
> d378bdb0 cpu_idle+0xe8()
> d378bdc8 idle+0x3f(0, 0)
> d378bdd8 thread_start+8()
>
>
> My initiator is configured and working:
>
> -bash-3.2# iscsiadm list initiator-node
> Initiator node name: iqn.1986-03.com.sun:01:0000000068ea.4822f60e
> Initiator node alias: opensolaris
>         Login Parameters (Default/Configured):
>                 Header Digest: NONE/-
>                 Data Digest: NONE/-
>         Authentication Type: CHAP
>                 CHAP Name: J_Brown
>         RADIUS Server: NONE
>         RADIUS Access: disabled
>         Configured Sessions: 1
> -bash-3.2# iscsiadm list target
> Target: iqn.1992-05.com.emc:apm000450062640038-9
>         Alias: J_Brown
>         TPGT: 1
>         ISID: 4000002a0000
>         Connections: 1
> -bash-3.2# iscsiadm list static-config
> Static Configuration Target: iqn.1992-05.com.emc:apm000450062640038-9,10.8.239.50:3260
>
>
> I've been able to create a zpool on the iscsi device but attempts to
> write to it result in the crash. However, when I destroy the zpool
> and just use format to access it, it appears to be permitting
> writing without crashing. (currently doing a format of the 100GB
> lun):
> -bash-3.2# format
> Searching for disks...done
>
>
> AVAILABLE DISK SELECTIONS:
>        0. c4d0 <DEFAULT cyl 4862 alt 2 hd 255 sec 63>
>           /[EMAIL PROTECTED],0/[EMAIL PROTECTED],1/[EMAIL PROTECTED]/[EMAIL PROTECTED],0
>        1. c6t1d0 <EMC-Celerra iSCSI-0001-97.66GB>
>           /iscsi/[EMAIL PROTECTED]%3Aapm000450062640038-9FFFF,1
> Specify disk (enter its number): 1
> selecting c6t1d0
> [disk formatted]
> ...
> format> format
> Ready to format. Formatting cannot be interrupted
> and takes 1600 minutes (estimated). Continue? y
> Beginning format. The current time is Thu Jul 3 11:09:53 2008
>
> Formatting...
> done
>
> Verifying media...
>         pass 0 - pattern = 0xc6dec6de
>
>
>
> The problem only appears so far when I have zfs on the lun. I've
> tried both zfs version 8 and 10 with the same results.
>
> Any suggestions on what I might try to prevent the crash?
>
>
> This message posted from opensolaris.org
> _______________________________________________
> storage-discuss mailing list
> [email protected]
> http://mail.opensolaris.org/mailman/listinfo/storage-discuss

--
Joost Mulders             + email:  [EMAIL PROTECTED]
Technical Specialist      + phone:  +31-33-45-15701
Client Solutions          + fax:    +31-33-45-15734
Sun Microsystems          + mobile: +31-6-5198-7268
-= Anything not done right, has to be done again =-
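
For anyone who wants to see the "incoming packet turned around to outbound" pattern without reading the trace by hand, here is a rough sketch in Python. It only parses the text of the `$C` output quoted above and labels each frame by looking at tokens in the function name. The keyword lists and the helper names (`direction`, `classify`) are my own assumptions for illustration, not anything mdb provides, and a name-based guess like this is no substitute for looking at the source of the functions involved.

```python
import re

# Frames copied from the `$C` output quoted above, innermost frame first,
# from the faulting function down to the NIC interrupt handler.
TRACE = """\
d3c827ac tcp_send+0x704(d9a18238, d9fe3780, 5b4, 28, 14, 0)
d3c82840 tcp_wput_data+0x6d1(d9fe3780, 0, 0)
d3c8293c tcp_rput_data+0x2930(d9fe3640, d8b11180, d4efef40)
d3c82978 squeue_enter_chain+0xea(d4efef40, d8b11180, d8b11180, 1, 1)
d3c82a0c ip_input+0x944(d6088214, d7eba020, 0, d3c82a30)
d3c82a80 i_dls_link_rx+0x250(d93a4dd0, d7eba020, d8b11180)
d3c82ab8 mac_do_rx+0x9f(d93a5e20, d7eba020, d8b11180, 0)
d3c82ad4 mac_rx+0x16(d93a5e20, d7eba020, d8b11180)
d3c82af4 softmac_rput+0x37(d8ebcce0, d8b11180)
d3c82b34 putnext+0x1bc(d8ebca10, d8b11180)
d3c82b5c gld_passon+0x173(d92dc238, d8b11180, d3c82bdc, fe84f7d0)
d3c82bac gld_sendup+0x101(d897b800, d3c82bdc, d8b11180, f9ce4cb8)
d3c82c6c gld_recv_tagged+0x26c(d897b800, d8b11180, 0)
d3c82c84 gld_recv+0x13(d897b800, d8b11180, bc0, 8d78)
d3c82cf4 gem_receive+0x4ae(d61e0000, d9002020, 10080, 1)
d3c82d24 bfe_interrupt+0x2ce(d61e0000, 17582, d3c82d54, f99c5d44)
"""

# One frame: frame pointer, function name, offset, then the argument list.
FRAME_RE = re.compile(r"^([0-9a-f]+)\s+(\w+)\+(0x[0-9a-f]+|\d+)\(")

# Assumed naming heuristics (hypothetical, chosen to fit this trace only).
OUTBOUND_TOKENS = {"send", "wput"}
INBOUND_TOKENS = {"recv", "receive", "rput", "input", "rx", "interrupt", "intr"}


def direction(func):
    """Guess the data direction from underscore-separated name tokens."""
    tokens = set(func.strip("_").split("_"))
    if tokens & OUTBOUND_TOKENS:
        return "outbound"
    if tokens & INBOUND_TOKENS:
        return "inbound"
    return "other"


def classify(trace):
    """Return (function, direction) pairs in the order they appear."""
    frames = []
    for line in trace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            func = match.group(2)
            frames.append((func, direction(func)))
    return frames


if __name__ == "__main__":
    for func, where in classify(TRACE):
        print(f"{where:9s} {func}")
```

With these assumed keywords, only the two innermost frames (`tcp_send`, `tcp_wput_data`) come out as outbound, and everything from `tcp_rput_data` down to `bfe_interrupt` that gets a label is inbound, which is the shape described at the top of this reply.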
