> On Nov 4, 2020, at 11:40 PM, Gulam Mohamed <gulam.moha...@oracle.com> wrote:
> 
> Description
> =========
> 1. This Kernel panic could be due to a timing issue when there is a race 
> between the sync thread and the initiator was processing of a login response 
> from the target. The session re-open can be invoked from two places
>          a.   Sessions sync thread when the iscsid restart
>          b.   From iscsid through iscsi error handler
> 2. The session reopen sequence is as follows in user-space 
> (iscsi-initiator-utils)
>          a.   Disconnect the connection
>          b.   Then send the stop connection request to the kernel which 
> releases the connection (releases the socket)
>          c.   Queues the reopen for 2 seconds delay
>         d.    Once the delay expires, create the TCP connection again by 
> calling the connect() call
>         e.    Poll for the connection
>          f.   When poll is successful i.e when the TCP connection is 
> established, it performs
>       i. Creation of session and connection data structures
>       ii. Bind the connection to the session. This is the place where we 
> assign the sock to tcp_sw_conn->sock
>       iii. Sets the parameters like target name, persistent address etc .
>       iv. Creates the login pdu
>       v. Sends the login pdu to kernel
>       vi. Returns to the main loop to process further events. The kernel then 
> sends the login request over to the target node
>       g. Once login response with success is received, it enters into full 
> feature phase and sets the negotiable parameters like max_recv_data_length, 
> max_transmit_length, data_digest etc .
> 3. While setting the negotiable parameters by calling 
> "iscsi_session_set_neg_params()", kernel panicked as sock was NULL
> 
> What happened here is
> --------------------------------
> 1.    Before initiator received the login response mentioned in above point 
> 2.f.v, another reopen request was sent from the error handler/sync session 
> for the same session, as the initiator utils was in main loop to process 
> further events (as 
>       mentioned in point 2.f.vi above). 
> 2.    While processing this reopen, it stopped the connection which released 
> the socket and queued this connection and at this point of time the login 
> response was received for the earlier one


To make sure I got this you are saying before iscsi_sw_tcp_conn_stop calls 
iscsi_sw_tcp_release_conn that the recv path has called iscsi_recv_pdu right?


> 3.    The kernel passed over this response to user-space which then sent the 
> set_neg_params request to kernel
> 4.    As the connection was stopped, the sock was NULL and hence while the 
> kernel was processing the set param request from user-space, it panicked
> 
> Fix
> ----
> 1.    While setting the set_param in kernel, we need to check if sock is NULL
> 2.    If the sock is NULL, then return EPERM (operation not permitted)
> 3.    Due to this error handler will be invoked in user-space again to 
> recover the session
> 
> Signed-off-by: Gulam Mohamed <gulam.moha...@oracle.com>
> Reviewed-by: Junxiao Bi <junxiao...@oracle.com>
> ---
> diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c
> index df47557a02a3..fd668a194053 100644
> --- a/drivers/scsi/iscsi_tcp.c
> +++ b/drivers/scsi/iscsi_tcp.c
> @@ -711,6 +711,12 @@ static int iscsi_sw_tcp_conn_set_param(struct 
> iscsi_cls_conn *cls_conn,
>        struct iscsi_tcp_conn *tcp_conn = conn->dd_data;
>        struct iscsi_sw_tcp_conn *tcp_sw_conn = tcp_conn->dd_data;
> 
> +       if (!tcp_sw_conn->sock) {
> +               iscsi_conn_printk(KERN_ERR, conn,
> +                               "Cannot set param as sock is NULL\n");
> +               return -ENOTCONN;
> +       }
> +

I think this might have 2 issues:

1. It only fixes iscsi_tcp. What about other drivers that free/null 
resources/fields in ep_disconnect that we layer need to access in the set_param 
callback (the cxgbi drivers)?

2. What about drivers that do not free/null fields (be2iscsi, bnx2i, ql4xxx and 
qedi) so the set_param calls end up just working. What state will userspace be 
in where we have logged in, but iscsid also thinks we are in the middle of 
trying to login?

I think we could do:

1. On ep_disconnect and stop_conn set some iscsi_cls_conn bit that indicates 
set_param calls for connection level settings should fail. scsi_transport_iscsi 
could set and check this bit for all drivers.
2. On bind_conn clear the bit.

-- 
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to open-iscsi+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/open-iscsi/05277786-2E1F-432D-AE73-F39565C6BEA4%40oracle.com.

Reply via email to