Re: [zfs-discuss] [storage-discuss] iscsi target problems on snv_97

2008-09-17 Thread tim szeto
Moore, Joe wrote:
 I believe the problem you're seeing might be related to a deadlock
 condition (CR 6745310). If you run pstack on the iscsi target
 daemon you might find a bunch of zombie threads. The fix was
 putback into snv-99; give snv-99 a try.
 

 Yes, a pstack of the core I've generated from iscsitgtd does have a number of 
 zombie threads.

 I'm afraid I can't make heads or tails of the bug report at 
 http://bugs.opensolaris.org/view_bug.do?bug_id=6658836, nor its duplicate-of 
 6745310, nor any of the related bugs (all are unavailable except for 
 6676298, and the stack trace reported in that bug doesn't look anything like 
 mine).

 As far as I can tell snv-98 is the latest build, from Sep 10 according to 
 http://dlc.sun.com/osol/on/downloads/.  So snv-99 should be out next week, 
 correct?
   
snv-99 should be out next week.
 Anything I can do in the meantime?  Do I need to BFU to the latest nightly 
 build?  Or would just taking the iscsitgtd from that build suffice?
   
You could try snv-98.  You don't need to bfu, just get the latest iscsitgtd.
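Something along these lines should do it (the archive path and the SMF
FMRI are my guesses; adjust for your setup):

  # Swap in only the iscsitgtd binary from the newer build
  svcadm disable -t svc:/system/iscsitgt:default
  cp -p /usr/sbin/iscsitgtd /usr/sbin/iscsitgtd.orig       # keep a fallback
  cp -p /path/to/new/build/usr/sbin/iscsitgtd /usr/sbin/iscsitgtd
  svcadm enable svc:/system/iscsitgt:default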

-Tim

 --Joe
   



Re: [zfs-discuss] [storage-discuss] iscsi target problems on snv_97

2008-09-16 Thread tim szeto
Moore, Joe wrote:
 I've recently upgraded my x4500 to Nevada build 97, and am having problems 
 with the iscsi target.

 Background: this box is used to serve NFS underlying a VMware ESX environment 
 (zfs filesystem-type datasets) and presents iSCSI targets (zfs zvol datasets) 
 for a Windows host and as zoneroots for Solaris 10 hosts.  For optimal 
 random-read performance, I've configured a single zfs pool of mirrored VDEVs 
 of all 44 disks (+2 boot disks, +2 spares = 48).
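 A sketch of how such a layout gets built, in case it matters (the
 controller/target numbers here are made up, not my actual devices):

   # 22 two-way mirrors from the 44 data disks, plus the 2 hot spares
   zpool create tank \
       mirror c0t0d0 c1t0d0 \
       mirror c0t1d0 c1t1d0 \
       mirror c0t2d0 c1t2d0    # ...same pattern for the remaining pairs
   zpool add tank spare c4t0d0 c4t1d0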

 Before the upgrade, the box was flaky under load: all I/Os to the ZFS pool 
 would stop occasionally.

 Since the upgrade, that hasn't happened, and the NFS clients are quite happy. 
  The iSCSI initiators are not.

 The windows initiator is running the Microsoft iSCSI initiator v2.0.6 on 
 Windows 2003 SP2 x64 Enterprise Edition.  When the system reboots, it is not 
 able to connect to its iscsi targets.  No devices are found until I restart 
 the iscsitgt process on the x4500, at which point the initiator will 
 reconnect and find everything.  I notice that on the x4500, it maintains an 
 active TCP connection (according to netstat -an | grep 3260) to the Windows 
 box through the reboot and for a long time afterwards.  The initiator starts 
 a second connection, but it seems that the target doesn't let go of the old 
 one.  Or something.  At this point, every time I reboot the Windows system I 
 have to `pkill iscsitgtd`.
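 A cleaner way to confirm and clear the stale session than pkill, assuming
 the target runs under its usual SMF service (the FMRI is from memory, so
 double-check it):

   # Show any leftover established session on the iSCSI port
   netstat -an | grep 3260
   # Restart the target service instead of killing the daemon by hand
   svcadm restart svc:/system/iscsitgt:default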
   
 The Solaris system is running S10 Update 4.  Every once in a while (twice 
 today, and not correlated with the pkills above) the system reports that all 
 of the iscsi disks are unavailable.  Nothing I've tried short of a reboot of 
 the whole host brings them back.  All of the zones on the system remount 
 their zoneroots read-only (and give I/O errors when read or zlogin'd to).

 There are a set of TCP connections from the zonehost to the x4500 that remain 
 even through disabling the iscsi_initiator service.  There's no process 
 holding them as far as pfiles can tell.
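 For what it's worth, here's roughly how I checked (a sketch, not the exact
 commands from my history):

   # Sweep every pid with pfiles; if nothing prints, the connections
   # are likely orphaned in the kernel rather than held by a process
   for pid in /proc/[0-9]*; do
       pfiles "${pid##*/}" 2>/dev/null | grep 'port: 3260' >/dev/null &&
           echo "pid ${pid##*/} holds a port-3260 socket"
   done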

 Does this sound familiar to anyone?  Any suggestions on what I can do to 
 troubleshoot further?  I have a kernel dump from the zonehost and a snoop 
 capture of the wire for the Windows host (but it's big).
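 If the capture would help, snoop can re-filter it down to just the iSCSI
 traffic first (file names below are placeholders):

   # Read the existing capture, write out only the port-3260 packets
   snoop -i /var/tmp/windows-host.cap -o /var/tmp/iscsi-only.cap port 3260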
   
I believe the problem you're seeing might be related to a deadlock 
condition (CR 6745310). If you run pstack on the iscsi target 
daemon you might find a bunch of zombie threads.  The fix was 
putback into snv-99; give snv-99 a try.
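Something like this will show the symptom on the running daemon (the
frame names are just what blocked threads typically show in pstack
output, not an exact signature of this bug):

  # Dump all thread stacks from the running target daemon
  pstack `pgrep -x iscsitgtd` > /tmp/iscsitgtd.pstack
  # With the deadlock, many threads sit parked in the same wait frames
  egrep -c 'cond_wait|lwp_park' /tmp/iscsitgtd.pstack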

-Tim

 I'll be opening a bug too.

 Thanks,
 --Joe
   



Re: iSCSI target not coming back up (was Fwd: [zfs-discuss] Re: snv63: kernel panic on import)

2007-05-16 Thread tim szeto

Nigel,

   Was the iSCSI target daemon running but the targets gone, or did 
the daemon core dump repeatedly?


  How did you create the targets?
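For reference, the two usual ways (the dataset and target names below
are just examples):

  # 1. Let ZFS export a zvol as a target directly:
  zfs set shareiscsi=on tank/iscsivol
  # 2. Or create one by hand against a backing store:
  iscsitadm create target -b /dev/zvol/rdsk/tank/iscsivol mytarget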

-tim

eric kustarz wrote:

Hi Tim,

Is the iSCSI target not coming back up after a reboot a known problem?

Can you take a look?

eric

Begin forwarded message:


From: eric kustarz [EMAIL PROTECTED]
Date: May 16, 2007 8:56:44 AM PDT
To: Nigel Smith [EMAIL PROTECTED]
Cc: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] Re: snv63: kernel panic on import


On May 15, 2007, at 4:49 PM, Nigel Smith wrote:


I seem to have got the same core dump, in a different way.
I had a zpool set up on an iscsi 'disk'.  For details see:
http://mail.opensolaris.org/pipermail/storage-discuss/2007-May/001162.html 

But after a reboot the iscsi target was no longer available, so the 
iscsi initiator could not provide the disk that the zpool was based on.
I did a 'zpool status', but the PC just rebooted, rather than 
handling it in a graceful way.
After the reboot I discovered a core dump had been created - details 
below:


ZFS panic'ing on a failed write in a non-redundant pool is known and 
is being worked on.  Why the iSCSI device didn't come up is also a 
bug.  I'll ask the iSCSI people to take a look...


eric



# cat /etc/release
Solaris Nevada snv_60 X86
   Copyright 2007 Sun Microsystems, Inc.  All Rights Reserved.
Use is subject to license terms.
 Assembled 12 March 2007
#
# cd /var/crash/solaris
# mdb -k 1
Loading modules: [ unix genunix specfs dtrace uppc pcplusmp scsi_vhci
ufs ip hook neti sctp arp usba uhci qlc fctl nca lofs zfs random md
cpc crypto fcip fcp logindmux ptm sppp emlxs ipc ]

> ::status

debugging crash dump vmcore.1 (64-bit) from solaris
operating system: 5.11 snv_60 (i86pc)
panic message:
ZFS: I/O failure (write on unknown off 0: zio fffec38cf340 [L0
packed nvlist] 4000L/600P DVA[0]=0:160225800:600 DVA[1]=0:9800:600
fletcher4 lzjb LE contiguous birth=192896 fill=1 cksum=6b28
dump content: kernel pages only

> *panic_thread::findstack -v

stack pointer for thread ff00025b2c80: ff00025b28f0
  ff00025b29e0 panic+0x9c()
  ff00025b2a40 zio_done+0x17c(fffec38cf340)
  ff00025b2a60 zio_next_stage+0xb3(fffec38cf340)
  ff00025b2ab0 zio_wait_for_children+0x5d(fffec38cf340, 11, fffec38cf598)
  ff00025b2ad0 zio_wait_children_done+0x20(fffec38cf340)
  ff00025b2af0 zio_next_stage+0xb3(fffec38cf340)
  ff00025b2b40 zio_vdev_io_assess+0x129(fffec38cf340)
  ff00025b2b60 zio_next_stage+0xb3(fffec38cf340)
  ff00025b2bb0 vdev_mirror_io_done+0x2af(fffec38cf340)
  ff00025b2bd0 zio_vdev_io_done+0x26(fffec38cf340)
  ff00025b2c60 taskq_thread+0x1a7(fffec154f018)
  ff00025b2c70 thread_start+8()

> ::cpuinfo -v
 ID ADDR         FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD       PROC
  0 fbc31f80      1b    2    0  99   no    no t-0    ff00025b2c80 sched
                   |    |
        RUNNING <--+    +-->  PRI THREAD       PROC
          READY               60  ff00022c9c80 sched
         EXISTS               60  ff00020e9c80 sched
         ENABLE

 ID ADDR         FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD       PROC
  1 fffec11ad000  1f    3    0  59  yes    no t-0    fffec3dcbbc0 syslogd
                   |    |
        RUNNING <--+    +-->  PRI THREAD       PROC
          READY               60  ff000212bc80 sched
       QUIESCED               59  fffec1e51360 syslogd
         EXISTS               59  fffec1ec2180 syslogd
         ENABLE


> ::quit







