Sorry if this has already been posted -- I've been searching and haven't found 
anything.

For the past two months I've been experiencing a system crash that at first
appeared random, until I figured out that it happened only while I was running
a backup via Bacula.  I spent a month looking at Bacula to see why it was
causing a system crash, only to realize that the crash occurred only when I was
writing to a ZFS file system located on an iSCSI target.  The configuration
worked fine with Indiana DP2, but the crash started after I moved to the current
2008.05 release, and it continues even after updating to snv_91 with pkg.  I was
able to eliminate Bacula as the cause by simply copying a file to the iSCSI/ZFS
file system: instant system crash.

Looking at the crash dump in /var/crash/rebel, I get the following:

-bash-3.2# echo '::status' | mdb -k 11
debugging crash dump vmcore.11 (32-bit) from rebel
operating system: 5.11 snv_91 (i86pc)
panic message: 
BAD TRAP: type=e (#pf Page fault) rp=d3c826c8 addr=10 occurred in module "ip" due to a NULL pointer dereference
dump content: kernel pages only

-bash-3.2# echo '$C' | mdb -k 11
d3c827ac tcp_send+0x704(d9a18238, d9fe3780, 5b4, 28, 14, 0)
d3c82840 tcp_wput_data+0x6d1(d9fe3780, 0, 0)
d3c8293c tcp_rput_data+0x2930(d9fe3640, d8b11180, d4efef40)
d3c82978 squeue_enter_chain+0xea(d4efef40, d8b11180, d8b11180, 1, 1)
d3c82a0c ip_input+0x944(d6088214, d7eba020, 0, d3c82a30)
d3c82a80 i_dls_link_rx+0x250(d93a4dd0, d7eba020, d8b11180)
d3c82ab8 mac_do_rx+0x9f(d93a5e20, d7eba020, d8b11180, 0)
d3c82ad4 mac_rx+0x16(d93a5e20, d7eba020, d8b11180)
d3c82af4 softmac_rput+0x37(d8ebcce0, d8b11180)
d3c82b34 putnext+0x1bc(d8ebca10, d8b11180)
d3c82b5c gld_passon+0x173(d92dc238, d8b11180, d3c82bdc, fe84f7d0)
d3c82bac gld_sendup+0x101(d897b800, d3c82bdc, d8b11180, f9ce4cb8)
d3c82c6c gld_recv_tagged+0x26c(d897b800, d8b11180, 0)
d3c82c84 gld_recv+0x13(d897b800, d8b11180, bc0, 8d78)
d3c82cf4 gem_receive+0x4ae(d61e0000, d9002020, 10080, 1)
d3c82d24 bfe_interrupt+0x2ce(d61e0000, 17582, d3c82d54, f99c5d44)
d3c82d54 gem_gld_intr+0x2b(d897b800)
d3c82d60 gld_intr+0x1f(d897b800, 0)
d3c82dac av_dispatch_autovect+0x69(13)
d3c82dcc dispatch_hardint+0x1a(13, 0)
d378bcfc switch_sp_and_call+0xf(d3c82ddc, fe8196c4, 13, 0)
d378bd34 do_interrupt+0x7c(d378bd44, fec3c8a4)
d378bd44 _interrupt+0x59()
d378bd9c mach_cpu_idle+0x17()
d378bdb0 cpu_idle+0xe8()
d378bdc8 idle+0x3f(0, 0)
d378bdd8 thread_start+8()


My initiator is configured and working:

-bash-3.2# iscsiadm list initiator-node
Initiator node name: iqn.1986-03.com.sun:01:0000000068ea.4822f60e
Initiator node alias: opensolaris
        Login Parameters (Default/Configured):
                Header Digest: NONE/-
                Data Digest: NONE/-
        Authentication Type: CHAP
                CHAP Name: J_Brown
        RADIUS Server: NONE
        RADIUS Access: disabled
        Configured Sessions: 1
-bash-3.2# iscsiadm list target
Target: iqn.1992-05.com.emc:apm000450062640038-9
        Alias: J_Brown
        TPGT: 1
        ISID: 4000002a0000
        Connections: 1
-bash-3.2# iscsiadm list static-config
Static Configuration Target: iqn.1992-05.com.emc:apm000450062640038-9,10.8.239.50:3260
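For anyone trying to reproduce this setup, a static-discovery configuration like the one listed above could be created with commands along these lines. This is a sketch, not a record of the exact commands used: the target name, portal and CHAP name are copied from the output above, and `run` is a hypothetical dry-run helper that only prints each command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the command by default,
# execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# Values copied from the iscsiadm output above.
TARGET="iqn.1992-05.com.emc:apm000450062640038-9"
PORTAL="10.8.239.50:3260"

run iscsiadm add static-config "$TARGET,$PORTAL"
run iscsiadm modify discovery --static enable
run iscsiadm modify initiator-node --authentication CHAP
run iscsiadm modify initiator-node --CHAP-name J_Brown
run iscsiadm modify initiator-node --CHAP-secret   # prompts for the secret when DRY_RUN=0
run devfsadm -i iscsi                              # create device nodes for the new LUN
```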


I can create a zpool on the iSCSI device, but any attempt to write to it
crashes the system.  However, when I destroy the zpool and access the LUN
directly with format, writes appear to succeed without a crash (I'm currently
formatting the 100GB LUN):
-bash-3.2# format
Searching for disks...done


AVAILABLE DISK SELECTIONS:
       0. c4d0 <DEFAULT cyl 4862 alt 2 hd 255 sec 63>
          /[EMAIL PROTECTED],0/[EMAIL PROTECTED],1/[EMAIL PROTECTED]/[EMAIL PROTECTED],0
       1. c6t1d0 <EMC-Celerra iSCSI-0001-97.66GB>
          /iscsi/[EMAIL PROTECTED],1
Specify disk (enter its number): 1
selecting c6t1d0
[disk formatted]
...
format> format
Ready to format.  Formatting cannot be interrupted
and takes 1600 minutes (estimated). Continue? y
Beginning format. The current time is Thu Jul  3 11:09:53 2008

Formatting...
done

Verifying media...
        pass 0 - pattern = 0xc6dec6de



So far, the problem appears only when the LUN has ZFS on it.  I've tried both
zpool versions 8 and 10 with the same results.
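The failing and non-failing cases can be sketched as the script below, assuming the LUN is still c6t1d0 and using a hypothetical pool name `iscsitest` and /etc/release as an arbitrary test file. Both cases destroy data on the LUN, and case 1 is expected to panic the host per the report above, so the hypothetical `run` helper only prints the commands unless DRY_RUN=0 is set. The raw write uses dd on /dev/rdsk/c6t1d0p0 (the whole-disk device on x86) as a quicker stand-in for format's surface analysis; that substitution is an assumption, not what was actually run.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the command by default,
# execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# Assumed names; override via the environment if they differ.
DISK="${DISK:-c6t1d0}"
POOL="${POOL:-iscsitest}"

# Case 1: write through ZFS (the case reported to panic).
# The pool is mounted at /$POOL by default.
run zpool create "$POOL" "$DISK"
run cp /etc/release "/$POOL/"
run sync

# Case 2: remove the pool, then write 1 GiB straight to the raw device.
run zpool destroy "$POOL"
run dd if=/dev/zero of="/dev/rdsk/${DISK}p0" bs=1048576 count=1024
```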

Any suggestions on what I might try to prevent the crash?
 
 
This message posted from opensolaris.org
_______________________________________________
storage-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/storage-discuss
