Chris
Many thanks for emailing the snoop capture file to me.
The reason it was so large was that you had
inadvertently captured a lot of SSH data.
Once I had filtered that out, the capture file
was a much more sensible size. You have captured
the full iSCSI session, so that is good.
I have made the capture file available here:
http://www.nwsmith.net/solaris/AppInit-SunTgt.cap

When I looked at the capture with Ethereal,
I was surprised by how much negotiation the
GlobalSAN initiator was doing compared to what 
we normally see with other initiators.
There are numerous repeated SCSI Inquiries and Mode Senses.
As I don't think these are relevant to the cause of the problem,
I've edited them out of this post and put them into
a separate file, for those that are interested in
looking at them, available here:
http://www.nwsmith.net/solaris/AppInit-SunTgt-Decode.txt

Ok, so here's my analysis of the capture:

GlobalSAN Initiator is 192.168.1.5    
Solaris Target is 192.168.1.1

### Initiator Login (packet #6)

SessionType=Normal
InitialR2T=No
HeaderDigest=None
DataDigest=None
MaxConnections=1
ImmediateData=Yes
MaxOutstandingR2T=4
DataPDUInOrder=Yes
DataSequenceInOrder=Yes
ErrorRecoveryLevel=0
TargetName=iqn.1986-03.com.sun:02:15665ce1-a3a0-e5bd-e9dd-d78783dec173.media
InitiatorName=iqn.2005-03.com.studionetworksolutions:mac-1093623522

### Target Response - Success (packet #8+9+10)

InitialR2T=Yes
HeaderDigest=None
DataDigest=None
MaxConnections=1
ImmediateData=Yes
MaxOutstandingR2T=4
DataPDUInOrder=Yes
DataSequenceInOrder=Yes
ErrorRecoveryLevel=0
TargetAlias=media
TargetPortalGroupTag=1
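
For anyone who wants to compare the two key lists by script rather than by eye, here is a minimal Python sketch. It relies on the fact that iSCSI login parameters travel in the PDU data segment as key=value text, with each pair terminated by a NUL byte (RFC 3720). The helper names `parse_login_text` and `build_segment` are my own, and the byte strings are rebuilt from the keys quoted above, not read from the capture file.

```python
def parse_login_text(segment):
    """Split a NUL-delimited iSCSI text data segment into a dict."""
    pairs = {}
    for item in segment.split(b"\x00"):
        if not item:
            continue  # empty pieces come from the trailing NUL or padding
        key, _, value = item.decode("ascii").partition("=")
        pairs[key] = value
    return pairs


def build_segment(lines):
    """Helper for this example only: terminate each key=value with a NUL."""
    return b"".join(line.encode("ascii") + b"\x00" for line in lines)


# Keys copied from packet #6 (initiator) as quoted in this post.
initiator = parse_login_text(build_segment([
    "SessionType=Normal", "InitialR2T=No", "HeaderDigest=None",
    "DataDigest=None", "MaxConnections=1", "ImmediateData=Yes",
    "MaxOutstandingR2T=4", "DataPDUInOrder=Yes", "DataSequenceInOrder=Yes",
    "ErrorRecoveryLevel=0",
    "TargetName=iqn.1986-03.com.sun:02:15665ce1-a3a0-e5bd-e9dd-d78783dec173.media",
    "InitiatorName=iqn.2005-03.com.studionetworksolutions:mac-1093623522",
]))

# Keys copied from packets #8-10 (target) as quoted in this post.
target = parse_login_text(build_segment([
    "InitialR2T=Yes", "HeaderDigest=None", "DataDigest=None",
    "MaxConnections=1", "ImmediateData=Yes", "MaxOutstandingR2T=4",
    "DataPDUInOrder=Yes", "DataSequenceInOrder=Yes", "ErrorRecoveryLevel=0",
    "TargetAlias=media", "TargetPortalGroupTag=1",
]))

# Keys present on both sides whose values differ.
differences = [
    (key, initiator[key], target[key])
    for key in initiator
    if key in target and initiator[key] != target[key]
]
for key, offered, answered in differences:
    print(f"{key}: initiator offered {offered}, target answered {answered}")
```

With the values quoted here, the only key that differs is InitialR2T (No offered, Yes answered). RFC 3720 defines the result for InitialR2T as the OR of the two values, so a Yes answer is a legal outcome and not an error in itself.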

Init->Tgt (packet#14)    SCSI: Test Unit Ready LUN: 0x00
Tgt->Init (packet#16)    SCSI: Response LUN: 0x00 (Good)

<--snip-->

Init->Tgt (packet#185)    SCSI: Read Capacity(10) LUN: 0x00
Tgt->Init (packet#186-187)SCSI: Data In LUN: 0x00 (Read Capacity(10) Response)

<--snip-->

Init->Tgt (packet#237)    SCSI: Read(10) LUN: 0x00 (LBA: 0x00000000, Len: 48)

The Target ACKs this packet at the TCP level, then does nothing further.
After 4 seconds, the Initiator tries:

Init->Tgt (packet#239)    SCSI: Test Unit Ready LUN: 0x00

Again, the Target ACKs this packet, then does nothing further.
In packet#241 the Initiator sends a FIN to close the TCP session.

Basically, as soon as the initiator tries to do an actual
READ from the storage, the target stops responding at the
iSCSI level.

This is exactly the same behaviour as we saw with the gPXE initiator here:
http://mail.opensolaris.org/pipermail/storage-discuss/2007-October/003552.html
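
The stall in the trace above can also be picked out mechanically. The Python sketch below is a simplified illustration only: it lists just the packets quoted in this post (the snipped exchanges are left out) and pairs each target reply with the oldest outstanding command. A real decoder would pair commands and responses by the iSCSI Initiator Task Tag, which is not shown in this post. The function name `unanswered_commands` is my own.

```python
from collections import deque

# (packet number, direction, description), taken from the trace in this post.
events = [
    (14, "Init->Tgt", "Test Unit Ready"),
    (16, "Tgt->Init", "Response (Good)"),
    (185, "Init->Tgt", "Read Capacity(10)"),
    (186, "Tgt->Init", "Data In (Read Capacity(10) Response)"),
    (237, "Init->Tgt", "Read(10)"),
    (239, "Init->Tgt", "Test Unit Ready"),
]


def unanswered_commands(events):
    """Return the commands that never received a reply from the target.

    Assumption for this sketch: replies arrive in order, so each target
    packet answers the oldest command still outstanding.
    """
    outstanding = deque()
    for packet, direction, description in events:
        if direction == "Init->Tgt":
            outstanding.append((packet, description))
        elif outstanding:
            outstanding.popleft()
    return list(outstanding)


for packet, description in unanswered_commands(events):
    print(f"packet#{packet}: {description} was never answered")
```

For the packets listed, this reports packet#237 (Read(10)) and packet#239 (Test Unit Ready) as unanswered, which matches what I saw in Ethereal.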

So I think there is a good chance that the GlobalSAN initiator
has the same issue. The Solaris iSCSI target expects to see
certain key/value pairs from the initiator during login. If
they are not present, it assumes default values, and those
defaults seem to be inappropriate.
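
To see which negotiable keys the initiator left out, here is a rough Python sketch. It compares the keys quoted for packet #6 against a subset of the operational keys and default values defined in RFC 3720. Two assumptions to be clear about: the key list quoted in this post is taken as complete, and the RFC defaults are used only as a reference point. I do not know what values the Solaris target actually falls back to internally, or which missing key triggers the problem.

```python
# Default values from RFC 3720, section 12 (a subset of the operational keys).
RFC3720_DEFAULTS = {
    "HeaderDigest": "None",
    "DataDigest": "None",
    "MaxConnections": "1",
    "InitialR2T": "Yes",
    "ImmediateData": "Yes",
    "MaxRecvDataSegmentLength": "8192",
    "MaxBurstLength": "262144",
    "FirstBurstLength": "65536",
    "DefaultTime2Wait": "2",
    "DefaultTime2Retain": "20",
    "MaxOutstandingR2T": "1",
    "DataPDUInOrder": "Yes",
    "DataSequenceInOrder": "Yes",
    "ErrorRecoveryLevel": "0",
}

# Keys the GlobalSAN initiator sent in packet #6, as quoted in this post.
sent_by_initiator = {
    "SessionType", "InitialR2T", "HeaderDigest", "DataDigest",
    "MaxConnections", "ImmediateData", "MaxOutstandingR2T",
    "DataPDUInOrder", "DataSequenceInOrder", "ErrorRecoveryLevel",
    "TargetName", "InitiatorName",
}

# Keys that were not offered, with the value the RFC says applies by default.
missing = {
    key: default
    for key, default in RFC3720_DEFAULTS.items()
    if key not in sent_by_initiator
}
for key, default in missing.items():
    print(f"{key} not sent; RFC 3720 default is {default}")
```

On the quoted key list this flags MaxRecvDataSegmentLength, MaxBurstLength, FirstBurstLength, DefaultTime2Wait and DefaultTime2Retain as not sent, so those are the places where the target would be filling in its own values.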

I know Jim Dunham and his team are looking at this problem,
and hopefully working on a solution.
Bug ID 6619812 should track progress.
http://bugs.opensolaris.org/view_bug.do?bug_id=6619812

Jim and his team like to take a very rigorous approach
to understanding, fixing and testing problems, so don't
expect a quick fix.

For a quick fix, your best hope is that Rick McNeal
and Andrew Hettinger can make their 'fixed' version
of the Solaris iSCSI target available to you.

There is more background on these issues here:
http://www.opensolaris.org/jive/thread.jspa?messageID=167293
http://www.opensolaris.org/jive/thread.jspa?messageID=165390

Chris, could you just confirm which build of OpenSolaris Nevada
you are using?

Regards
Nigel Smith
 
 
This message posted from opensolaris.org