Moore, Joe wrote:
> I've recently upgraded my x4500 to Nevada build 97, and am having problems
> with the iscsi target.
>
> Background: this box is used to serve NFS underlying a VMware ESX environment
> (zfs filesystem-type datasets) and presents iSCSI targets (zfs zvol datasets)
> for a Windows host and to act as zoneroots for Solaris 10 hosts. For optimal
> random-read performance, I've configured a single zfs pool of mirrored VDEVs
> of all 44 disks (+2 boot disks, +2 spares = 48)
>
> Before the upgrade, the box was flaky under load: all I/Os to the ZFS pool
> would stop occasionally.
>
> Since the upgrade, that hasn't happened, and the NFS clients are quite happy.
> The iSCSI initiators are not.
>
> The windows initiator is running the Microsoft iSCSI initiator v2.0.6 on
> Windows 2003 SP2 x64 Enterprise Edition. When the system reboots, it is not
> able to connect to its iscsi targets. No devices are found until I restart
> the iscsitgt process on the x4500, at which point the initiator will
> reconnect and find everything. I notice that on the x4500, it maintains an
> active TCP connection (according to netstat -an | grep 3260) to the Windows
> box through the reboot and for a long time afterwards. The initiator starts
> a second connection, but it seems that the target doesn't let go of the old
> one. Or something. At this point, every time I reboot the Windows system I
> have to `pkill iscsitgtd`
>
> The Solaris system is running S10 Update 4. Every once in a while (twice
> today, and not correlated with the pkill's above) the system reports that all
> of the iscsi disks are unavailable. Nothing I've tried short of a reboot of
> the whole host brings them back. All of the zones on the system remount
> their zoneroots read-only (and give I/O errors when read or zlogin'd to)
>
> There are a set of TCP connections from the zonehost to the x4500 that remain
> even through disabling the iscsi_initiator service. There's no process
> holding them as far as pfiles can tell.
>
> Does this sound familiar to anyone? Any suggestions on what I can do to
> troubleshoot further? I have a kernel dump from the zonehost and a snoop
> capture of the wire for the Windows host (but it's big).
>

I believe the problem you're seeing might be related to a deadlock
condition (CR 6745310). If you run pstack on the iSCSI target daemon, you
might find a bunch of zombie threads. The fix was put back into snv-99, so
give snv-99 a try.
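
For anyone wanting to check this on their own target before upgrading, here is a rough sketch of the two checks mentioned in this thread. The helper name `count_iscsi_conns` is made up for illustration, and it assumes that the TCP state is the last column of each `netstat -an` line and that the iSCSI port is the default 3260.

```shell
#!/bin/sh
# Hypothetical helper: summarise connections on the iSCSI port (3260) by
# TCP state, reading `netstat -an` style text on stdin.
# Assumptions: an address column ends in ".3260" (Solaris style) or
# ":3260" (Linux style), and the last field on the line is the TCP state.
count_iscsi_conns() {
    awk '
        {
            # Look at every column except the last (the state column).
            for (i = 1; i < NF; i++) {
                if ($i ~ /[.:]3260$/) { n[$NF]++; break }
            }
        }
        END { for (s in n) print s, n[s] }
    ' | sort
}

# Possible usage on the x4500 (as root):
#   netstat -an | count_iscsi_conns
#   pstack "$(pgrep -x iscsitgtd)"    # assumes exactly one daemon process
```

If the ESTABLISHED count stays the same across an initiator reboot, the target is probably still holding the stale session, which would fit the symptoms described above. The pstack output still has to be read by hand.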
-Tim

> I'll be opening a bug too.
>
> Thanks,
> --Joe
> _______________________________________________
> storage-discuss mailing list
> [email protected]
> http://mail.opensolaris.org/mailman/listinfo/storage-discuss
