Ok, I've taken a look at Jim Barker's snoop trace file with Ethereal. Right at the start of the trace you can see the Solaris iscsi target send a NOP-In and the Microsoft iscsi initiator immediately reply with a NOP-Out. (This exchange verifies that the connection/session is still active & that all components are operational, in what would otherwise be an idle period.) Thirty seconds later this sequence is repeated.
In each of the NOP-In packets, the LUN field is zero, the InitiatorTaskTag field (ITT) is set to the reserved value 0xffffffff, and the StatSN field is being incremented.

Let's take a look at RFC-3720. The relevant section is 10.19.
http://www3.tools.ietf.org/html/rfc3720

It says:

"Otherwise, when a target sends a NOP-In that is not a response to a Nop-Out received from the initiator, the Initiator Task Tag MUST be set to 0xffffffff and the Data Segment MUST NOT contain any data (DataSegmentLength MUST be 0).

10.19.2. StatSN

The StatSN field will always contain the next StatSN. However, when the Initiator Task Tag is set to 0xffffffff, StatSN for the connection is not advanced after this PDU is sent."

Ok, so there you have it. The StatSN field should NOT be incremented after these NOP-Ins, and so the iscsi target is violating the iscsi protocol.

Personally, I find RFC-3720 very difficult to read, and I don't envy the job of those people who need to write the software that supports the iscsi protocol!

Anyway, let's get back to analysing the trace. The Microsoft iscsi initiator soon realises something is wrong when it sends the next proper scsi command, in this case "Test Unit Ready". The Solaris iscsi target replies, but the initiator sees that the StatSN field has the wrong value. StatSN is too high because it has been incremented when it should not have been. The Microsoft iscsi initiator sends a RST packet to tear down the TCP connection. This is immediately followed by a login sequence, and soon everything is up & running again, until the next glitch.

But all this is just covering old ground. I spotted this bug back in January 2007, and Rick McNeal fixed it. It was probably one of the last bugs that he fixed before leaving Sun for pastures new.

Here is the bug - #6521425 "When sending a NOP-In message the statsn value should not be incremented."
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6521425

This was fixed in Nevada snv_60.
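To make the StatSN rule concrete, here is a small Python sketch of the sequence seen in the trace. It is only a toy model under my own assumptions, not the iscsitgtd code or the Microsoft initiator's logic: the class names, the `buggy` flag and the `run` helper are all made up for illustration, and the initiator check is simplified to "the StatSN on a SCSI response must equal the value I expect next".

```python
RESERVED_TAG = 0xFFFFFFFF  # Initiator Task Tag value for an unsolicited NOP-In


class TargetConnection:
    """Toy model of the target side of one connection (hypothetical names)."""

    def __init__(self, buggy=False):
        self.stat_sn = 0
        self.buggy = buggy  # True models the behaviour described in bug 6521425

    def _advance(self):
        self.stat_sn = (self.stat_sn + 1) & 0xFFFFFFFF

    def send_nop_in(self, itt=RESERVED_TAG):
        carried = self.stat_sn  # the PDU always carries the next StatSN
        # RFC 3720 10.19.2: with ITT 0xffffffff, StatSN is not advanced.
        if itt != RESERVED_TAG or self.buggy:
            self._advance()
        return carried

    def send_scsi_response(self):
        carried = self.stat_sn
        self._advance()  # a normal response consumes a StatSN
        return carried


class InitiatorConnection:
    """Toy model of the initiator's expected-StatSN tracking (simplified)."""

    def __init__(self):
        self.exp_stat_sn = 0

    def receive_nop_in(self, stat_sn, itt=RESERVED_TAG):
        # An unsolicited NOP-In must not move the expected StatSN.
        if itt != RESERVED_TAG:
            self.exp_stat_sn = (stat_sn + 1) & 0xFFFFFFFF

    def receive_scsi_response(self, stat_sn):
        ok = stat_sn == self.exp_stat_sn
        if ok:
            self.exp_stat_sn = (self.exp_stat_sn + 1) & 0xFFFFFFFF
        return ok  # False stands in for the initiator dropping the connection


def run(buggy, pings=2):
    """Send some idle NOP-Ins, then one SCSI response (e.g. Test Unit Ready)."""
    target = TargetConnection(buggy=buggy)
    initiator = InitiatorConnection()
    for _ in range(pings):
        initiator.receive_nop_in(target.send_nop_in())
    return initiator.receive_scsi_response(target.send_scsi_response())


if __name__ == "__main__":
    print("compliant target accepted:", run(buggy=False))
    print("buggy target accepted:", run(buggy=True))
```

With the compliant target the response carries the StatSN the initiator expects. With the buggy one, each idle NOP-In has pushed StatSN ahead, so the response to the next real command no longer matches.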
And here we can see the putback that fixes this bug (& a few others):
http://mail.opensolaris.org/pipermail/onnv-notify/2007-February/011229.html

You can see the fix here:
http://src.opensolaris.org/source/diff/onnv/onnv-gate/usr/src/cmd/iscsi/iscsitgtd/iscsi_conn.c?r2=3719&r1=3452

I wonder if those other Nevada fixes from February 2007 are missing from Solaris 10u4?
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6482080
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6521041
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6522093
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6523439

I'm not sure what the process is, or who is involved at Sun, in maintaining the iscsi target in Solaris 10. Maybe someone at Sun, reading this, can clarify the situation for us.

Regards
Nigel Smith

This message posted from opensolaris.org
_______________________________________________
storage-discuss mailing list
http://mail.opensolaris.org/mailman/listinfo/storage-discuss
