Ok, I think I MAY have figured out what's going wrong.

Let's start with the dapptrace output, and clear out some of the 'noise'.
I've removed the lines containing these strings:
'queue_'
'util_title'
'_to_str'

And this is what we are left with:
(And I've numbered the lines from 01 to 42)

01 - 1009/22:  -> conn_poller(0x4931D0, 0x0, 0x0)
02 - 1009/22:    -> conn_state(0x4931D0, 0x1, 0xFFFFFFFF8717FCA0)
03 - 1009/22:    <- conn_state = 221
04 - 1009/22:    -> iscsi_handle_login_pkt(0x4931D0, 0x0, 0x0)
05 - 1009/22:      -> session_alloc(0x4931D0, 0xFFFFFD7FFE1FDF18, 0x43)
06 - 1009/22:      <- session_alloc = 327
07 - 1009/22:      -> connection_parameters_default(0x4931D0, 0x1, 0x0)
08 - 1009/22:      <- connection_parameters_default = 10
09 - 1009/22:      -> parse_text(0x4931D0, 0x98, 0xFFFFFD7FFE1FDE80)
10 - 1009/22:        -> find_target_node(0x47EDC9, 0x47EDC9, 0x0)
11 - 1009/22:        <- find_target_node = 123
12 - 1009/22:        -> check_access(0x46A930, 0x465D90, 0x0)
13 - 1009/22:        <- check_access = 305
14 - 1009/22:        -> add_text(0xFFFFFD7FFE1FDE80, 0xFFFFFD7FFE1FDE88, 
0x42BD84)
15 - 1009/22:        <- add_text = 184
16 - 1009/22:        -> add_text(0xFFFFFD7FFE1FDE80, 0xFFFFFD7FFE1FDE88, 
0x42BD90)
17 - 1009/22:        <- add_text = 184
18 - 1009/22:        -> add_text(0x493520, 0x493528, 0x47EE18)
19 - 1009/22:        <- add_text = 184
20 - 1009/22:      <- parse_text = 504
21 - 1009/22:      -> add_text(0xFFFFFD7FFE1FDE80, 0xFFFFFD7FFE1FDE88, 0x42B2CF)
22 - 1009/22:      <- add_text = 184
23 - 1009/22:      -> session_validate(0x49D010, 0x42B2CE, 0x1)
24 - 1009/22:      <- session_validate = 327
25 - 1009/22:      -> send_iscsi_pkt(0x4931D0, 0x4686D0, 0x479970)
26 - 1009/22:      <- send_iscsi_pkt = 117
27 - 1009/22:    <- iscsi_handle_login_pkt = 168

28 - 1009/22:    -> iscsi_handle_login_pkt(0x4931D0, 0x0, 0x0)
29 - 1009/22:      -> session_alloc(0x4931D0, 0xFFFFFD7FFE1FDF18, 0x43)
30 - 1009/22:      <- session_alloc = 51
31 - 1009/22:      -> connection_parameters_default(0x4931D0, 
0xFFFFFD7FFE1FDF18, 0x43)
32 - 1009/22:      <- connection_parameters_default = 10
33 - 1009/22:      -> conn_state(0x4931D0, 0x4, 0x0)
34 - 1009/22:      <- conn_state = 221
35 - 1009/22:    <- iscsi_handle_login_pkt = 316
36 - 1009/22:  <- conn_poller = 227
37 - 1009/21:  -> port_conn_remove(0x4931D0, 0x1, 0x0)
38 - 1009/21:  <- port_conn_remove = 115
39 - 1009/21:  <- conn_process = 422
40 - 1009/21:  -> conn_process(0x4931D0, 0x0, 0x0)
41 - 1009/21:  -> conn_state(0x4931D0, 0x0, 0x0)
42 - 1009/21:  <- conn_state = 221

In line 04 we enter function "iscsi_handle_login_pkt()"
You can see the code for this here:
    
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/cmd/iscsi/iscsitgtd/iscsi_login.c?r=3452
I'm using revision 3452 from early 2007, as that looks
to be around the time of the Solaris 10u4 version of
the target, which is what we are dealing with here.

If you follow through the code:
In code line 166, we call "session_alloc()"
In code line 171, we call "connection_parameters_default()"
In code line 239, we call "parse_text()"
In code line 268, we call "add_text()"
In code line 599, we call "session_validate()"
In code line 601, we call "send_iscsi_pkt()"
and then we exit out of "iscsi_handle_login_pkt()"

Now in line 28 we enter "iscsi_handle_login_pkt()" again,
presumably for the second login packet.
In code line 166, we call "session_alloc()"
In code line 171, we call "connection_parameters_default()"
And in code line 188, we call "conn_state()"

This means that in line 185, this statement must have been TRUE:
    185         if (ntohs(lh.tsid) != 0) {

Just above that line, and just after are these comments:
    181         /*
    182          * Is this a new session or an attempt to add a connection to
    183          * an existing session.
    184          */
    187                 /* Multiple connections per session not handled right 
now */

Variable 'lh' is defined as "iscsi_login_hdr_t  lh;".
And if we look in this file:
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/sys/iscsi_protocol.h
... we find:
    467 /* Login Header */
    468 typedef struct _iscsi_login_hdr {
    469         uint8_t opcode;
    470         uint8_t flags;
    471         uint8_t max_version;    /* Max. version supported */
    472         uint8_t min_version;    /* Min. version supported */
    473         uint8_t hlength;
    474         uint8_t dlength[3];
    475         uint8_t isid[6];        /* Initiator Session ID */
    476         uint16_t tsid;  /* Target Session ID */
    477         uint32_t itt;   /* Initiator Task Tag */
    478         uint16_t cid;
    479         uint16_t rsvd3;
    480         uint32_t cmdsn;
    481         uint32_t expstatsn;
    482         uint8_t rsvd5[16];
    483 } iscsi_login_hdr_t;

So 'lh.tsid' is referring to "Target Session ID"

Let's take a look in RFC-3720:
http://www3.tools.ietf.org/html/rfc3720
I quote:
"   - TSIH (Target Session Identifying Handle): A target assigned tag for
     a session with a specific named initiator.  The target generates it
     during session establishment.  Its internal format and content are
     not defined by this protocol, except for the value 0 that is
     reserved and used by the initiator to indicate a new session.  It
     is given to the target during additional connection establishment
     for the same session."

What the Solaris iscsi target calls TSID, is, I think, the same as what
the RFC calls TSIH.

In the trace that James Nord has provided, the first login packet
has a TSIH of 0, but the second login packet has a TSIH of 0x000c,
which is NOT ZERO, so I think that's why the Solaris iscsi target FIN's
the connection.
The value 0x000c is in fact the TSIH in the response from the target
to the first login packet.
But looking at my traces for various other initiator software, where there
is a two stage login, even in the second login packet the TSIH
is still zero.
I believe the initiator should use TSIH of zero, until the login
has been successfully completed.

Ok, that's my theory. I've rushed through documenting this theory
so others can pull it apart. I may have made some wrong assumptions
in the above argument, so I am open to comments.

If only we could get the debugging to work in the Sol10 target
it might prove or disprove the above theory!
Thanks
Nigel Smith
 
 
This message posted from opensolaris.org
_______________________________________________
storage-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/storage-discuss

Reply via email to