Ok, I've taken a look at Jim Barker's snoop trace file with Ethereal.

Right at the start of the trace you can see the Solaris iscsi target
send a NOP-In, and the Microsoft iscsi initiator immediately
reply with a NOP-Out.
(This exchange is to verify that the connection/session is still active
 & that all components are operational, in what otherwise would be an
idle period).
Thirty seconds later this sequence is repeated.

In each of the NOP-In packets, the LUN field is zero,
the InitiatorTaskTag field (ITT) is set to the reserved value 0xffffffff,
and the StatSN field is being incremented.
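
To make those fields concrete, here is a rough Python sketch that pulls them
out of a 48-byte NOP-In header. It is my own illustration, not anything taken
from the trace or from the Solaris code: the byte offsets follow my reading of
the NOP-In layout in RFC-3720 section 10.19, and the names build_nop_in and
parse_nop_in are made up for the example.

```python
import struct

NOP_IN_OPCODE = 0x20        # NOP-In opcode (RFC-3720 section 10.19)
RESERVED_TAG = 0xFFFFFFFF   # reserved Initiator Task Tag value


def build_nop_in(stat_sn, itt=RESERVED_TAG, ttt=1, lun=0):
    """Hypothetical helper: build a bare NOP-In header for testing."""
    bhs = bytearray(48)                 # Basic Header Segment is 48 bytes
    bhs[0] = NOP_IN_OPCODE
    bhs[1] = 0x80                       # final bit set
    # LUN at bytes 8-15, ITT at 16-19, TTT at 20-23, StatSN at 24-27
    struct.pack_into(">QIII", bhs, 8, lun, itt, ttt, stat_sn)
    return bytes(bhs)


def parse_nop_in(bhs):
    """Hypothetical helper: extract the fields discussed in the text."""
    if len(bhs) < 48:
        raise ValueError("a Basic Header Segment is 48 bytes")
    if bhs[0] & 0x3F != NOP_IN_OPCODE:  # low six bits carry the opcode
        raise ValueError("not a NOP-In PDU")
    data_len = int.from_bytes(bhs[5:8], "big")
    lun, itt, ttt, stat_sn = struct.unpack_from(">QIII", bhs, 8)
    return {
        "lun": lun,
        "itt": itt,
        "ttt": ttt,
        "stat_sn": stat_sn,
        "data_segment_length": data_len,
        # ITT of 0xffffffff means the target sent this NOP-In unprompted
        "unsolicited": itt == RESERVED_TAG,
    }
```

Feeding it the 48 header bytes of each NOP-In from the capture should show the
same LUN and ITT every time, with only StatSN changing.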

Let's take a look at RFC-3720.
The relevant section is 10.19.
http://www3.tools.ietf.org/html/rfc3720
It says:

     Otherwise, when a target sends a NOP-In that is not a response to a
     Nop-Out received from the initiator, the Initiator Task Tag MUST be
     set to 0xffffffff and the Data Segment MUST NOT contain any data
     (DataSegmentLength MUST be 0).

  10.19.2. StatSN
  
     The StatSN field will always contain the next StatSN.  However, when
     the Initiator Task Tag is set to 0xffffffff, StatSN for the
     connection is not advanced after this PDU is sent.

Ok, so there you have it. When the ITT is 0xffffffff, StatSN should NOT be
advanced after the NOP-In is sent.
And so the iscsi target is violating the iscsi protocol.
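
Here is how I read that rule, as a small Python model of the target side of
one connection. This is only a sketch under my own assumptions: the class
TargetConnection and its "buggy" switch are hypothetical, and it bears no
relation to how iscsitgtd is actually structured.

```python
RESERVED_TAG = 0xFFFFFFFF   # reserved Initiator Task Tag value


class TargetConnection:
    """Toy model of the StatSN counter on one iscsi connection."""

    def __init__(self, stat_sn=0, buggy=False):
        self.stat_sn = stat_sn
        self.buggy = buggy      # True mimics the behaviour seen in the trace

    def _advance(self):
        self.stat_sn = (self.stat_sn + 1) & 0xFFFFFFFF  # 32-bit wraparound

    def send_nop_in(self, itt=RESERVED_TAG):
        """Return the StatSN carried in the NOP-In PDU."""
        sent = self.stat_sn     # the field always holds the next StatSN
        # Section 10.19.2: with ITT 0xffffffff, StatSN is not advanced.
        if itt != RESERVED_TAG or self.buggy:
            self._advance()
        return sent

    def send_scsi_response(self):
        """Return the StatSN carried in a SCSI response; always advances."""
        sent = self.stat_sn
        self._advance()
        return sent
```

With buggy=False, two NOP-Ins in a row carry the same StatSN and the next
SCSI response carries it as well. With buggy=True each NOP-In bumps the
counter, which matches what the trace shows.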

Personally, I find RFC-3720 very difficult to read, and I don't envy the job
of those people who need to write the software that supports the iscsi protocol!

Anyway, let's get back to analysing the trace.
The Microsoft iscsi initiator soon realises something is wrong when it
sends the next proper scsi command, in this case "Test Unit Ready".
The Solaris iscsi target replies, but the initiator sees that the
StatSN field has the wrong value. StatSN is too high because it has
been incremented when it should not have been.
The Microsoft iscsi initiator sends a RST packet to tear down
the TCP connection. This is immediately followed by a login
sequence, and soon everything is up & running again, until the next glitch.

But all this is just covering old ground.
I spotted this bug back in January 2007.
And Rick McNeal fixed it.
Probably one of the last bugs that he fixed before leaving Sun for pastures new.

Here is the bug - #6521425
"When sending a NOP-In message the statsn value should not be incremented."
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6521425

This was fixed in Nevada snv_60.
And here we can see the putback that fixes this bug (& a few others):
http://mail.opensolaris.org/pipermail/onnv-notify/2007-February/011229.html

You can see the fix here:
http://src.opensolaris.org/source/diff/onnv/onnv-gate/usr/src/cmd/iscsi/iscsitgtd/iscsi_conn.c?r2=3719&r1=3452

I wonder if those other Nevada fixes from February 2007 are missing
 from Solaris 10u4?
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6482080
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6521041
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6522093
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6523439

I'm not sure what the process is, or who is involved at Sun, in maintaining
the iscsi target in Solaris 10.
Maybe someone at Sun, reading this, can clarify the situation for us.

Regards
Nigel Smith
 
 
This message posted from opensolaris.org