Re: [PATCH] iscsi: Do Not set param when sock is NULL

2021-01-18 Thread Mike Christie
On 1/7/21 9:48 AM, Gulam Mohamed wrote:
> Hi Michael,
> 
>  As per your suggestions in below mail, I have completed the 
> suggested changes and tested them. Can you please review and let me know your 
> comments? Here is the patch:
> 
> Description
> ===
> 1. This Kernel panic could be due to a timing issue when there is a race 
> between the sync thread and the initiator was processing of a login response 
> from the target. The session re-open can be invoked from two places
>   a. Sessions sync thread when the iscsid restart
>   b. From iscsid through iscsi error handler 2. The session reopen sequence 
> is as follows in user-space (iscsi-initiator-utils)
>a. Disconnect the connection
>b. Then send the stop connection request to the kernel which releases the 
> connection (releases the socket)
>c. Queues the reopen for 2 seconds delay
>d. Once the delay expires, create the TCP connection again by calling the 
> connect() call
>e. Poll for the connection
>f. When poll is successful i.e when the TCP connection is established, it 
> performs
>   i. Creation of session and connection data structures
>   ii. Bind the connection to the session. This is the place where we 
> assign the sock to tcp_sw_conn->sock
>   iii. Sets the parameters like target name, persistent address etc .
>   iv. Creates the login pdu
>v. Sends the login pdu to kernel
>   vi. Returns to the main loop to process further events. The kernel then 
> sends the login request over to the target node
>g. Once login response with success is received, it enters into full 
> feature phase and sets the negotiable parameters like max_recv_data_length, 
> max_transmit_length, data_digest etc .
> 3. While setting the negotiable parameters by calling 
> "iscsi_session_set_neg_params()", kernel panicked as sock was NULL
> 
> What happened here is
> -
> 1. Before initiator received the login response mentioned in above point 
> 2.f.v, another reopen request was sent from the error handler/sync session 
> for the same session, as the initiator utils was in main loop to process 
> further events (as mentioned in point 2.f.vi above).
> 2. While processing this reopen, it stopped the connection which released the 
> socket and queued this connection and at this point of time the login 
> response was received for the earlier on
> 
> Fix
> ---
> 
> 1. Create a flag "set_param_fail" in iscsi_cls_conn structure 2. On 
> ep_disconnect and stop_conn set this flag to indicate set_param calls for 
> connection level settings should fail 3. This way, scsi_transport_iscsi can 
> set and check this bit for all drivers 2. On bind_conn clear the bit
> 
> Signed-off-by: Gulam Mohamed 
> ---
>  drivers/scsi/scsi_transport_iscsi.c | 6 ++  
> include/scsi/scsi_transport_iscsi.h | 3 +++
>  2 files changed, 9 insertions(+)
> 
> diff --git a/drivers/scsi/scsi_transport_iscsi.c 
> b/drivers/scsi/scsi_transport_iscsi.c
> index 2e68c0a87698..15c5a7223a40 100644
> --- a/drivers/scsi/scsi_transport_iscsi.c
> +++ b/drivers/scsi/scsi_transport_iscsi.c
> @@ -2473,6 +2473,8 @@ static void iscsi_if_stop_conn(struct iscsi_cls_conn 
> *conn, int flag)
>* it works.
>*/
>   mutex_lock(_mutex);
> + if (!test_bit(ISCSI_SET_PARAM_FAIL_BIT, >set_param_fail))
> + set_bit(ISCSI_SET_PARAM_FAIL_BIT, >set_param_fail);

You can just do a test_and_set_bit.

>   conn->transport->stop_conn(conn, flag);
>   mutex_unlock(_mutex);
>  
> @@ -2895,6 +2897,8 @@ iscsi_set_param(struct iscsi_transport *transport, 
> struct iscsi_uevent *ev)
>   session->recovery_tmo = value;
>   break;
>   default:
> + if (test_bit(ISCSI_SET_PARAM_FAIL_BIT, >set_param_fail))
> + return -ENOTCONN;
>   err = transport->set_param(conn, ev->u.set_param.param,
>  data, ev->u.set_param.len);
>   }
> @@ -2956,6 +2960,7 @@ static int iscsi_if_ep_disconnect(struct 
> iscsi_transport *transport,
>   mutex_lock(>ep_mutex);
>   conn->ep = NULL;
>   mutex_unlock(>ep_mutex);
> + set_bit(ISCSI_SET_PARAM_FAIL_BIT, >set_param_fail);
>   }
>  
>   transport->ep_disconnect(ep);
> @@ -3716,6 +3721,7 @@ iscsi_if_recv_msg(struct sk_buff *skb, struct nlmsghdr 
> *nlh, uint32_t *group)
>   ev->r.retcode = transport->bind_conn(session, conn,
>   ev->u.b_conn.transport_eph,
>   ev->u.b_conn.is_leading);
> + clear_bit(ISCSI_SET_PARAM_FAIL_BIT, >set_param_fail);

You should check retcode and only clear if it indicates success.


>   mutex_unlock(_mutex);
>  
>   if (ev->r.retcode || !transport->ep_connect) diff --git 
> a/include/scsi/scsi_transport_iscsi.h b/include/scsi/scsi_transport_iscsi.h
> 

RE: [PATCH] iscsi: Do Not set param when sock is NULL

2021-01-07 Thread Gulam Mohamed
 
+#define ISCSI_SET_PARAM_FAIL_BIT   1
+
 /**
  * struct iscsi_transport - iSCSI Transport template
  *
@@ -206,6 +208,7 @@ struct iscsi_cls_conn {
 
struct device dev;  /* sysfs transport/container device */
enum iscsi_connection_state state;
+   unsigned long set_param_fail; /* set_param for connection should fail 
+*/
 };
 
 #define iscsi_dev_to_conn(_dev) \
--
2.27.0

Regards,
Gulam Mohamed.

-Original Message-
From: Michael Christie 
Sent: Tuesday, November 24, 2020 4:37 AM
To: Gulam Mohamed 
Cc: Lee Duncan ; Chris Leech ; James E.J. 
Bottomley ; Martin K. Petersen 
; open-iscsi@googlegroups.com; 
linux-s...@vger.kernel.org; linux-ker...@vger.kernel.org; Junxiao Bi 

Subject: Re: [PATCH] iscsi: Do Not set param when sock is NULL



> On Nov 4, 2020, at 11:40 PM, Gulam Mohamed  wrote:
> 
> Description
> =
> 1. This Kernel panic could be due to a timing issue when there is a race 
> between the sync thread and the initiator was processing of a login response 
> from the target. The session re-open can be invoked from two places
>  a.   Sessions sync thread when the iscsid restart
>  b.   From iscsid through iscsi error handler
> 2. The session reopen sequence is as follows in user-space 
> (iscsi-initiator-utils)
>  a.   Disconnect the connection
>  b.   Then send the stop connection request to the kernel which 
> releases the connection (releases the socket)
>  c.   Queues the reopen for 2 seconds delay
> d.Once the delay expires, create the TCP connection again by 
> calling the connect() call
> e.Poll for the connection
>  f.   When poll is successful i.e when the TCP connection is 
> established, it performs
>   i. Creation of session and connection data structures
>   ii. Bind the connection to the session. This is the place where we 
> assign the sock to tcp_sw_conn->sock
>   iii. Sets the parameters like target name, persistent address etc .
>   iv. Creates the login pdu
>   v. Sends the login pdu to kernel
>   vi. Returns to the main loop to process further events. The kernel then 
> sends the login request over to the target node
>   g. Once login response with success is received, it enters into full 
> feature phase and sets the negotiable parameters like max_recv_data_length, 
> max_transmit_length, data_digest etc .
> 3. While setting the negotiable parameters by calling 
> "iscsi_session_set_neg_params()", kernel panicked as sock was NULL
> 
> What happened here is
> 
> 1.Before initiator received the login response mentioned in above point 
> 2.f.v, another reopen request was sent from the error handler/sync session 
> for the same session, as the initiator utils was in main loop to process 
> further events (as 
>   mentioned in point 2.f.vi above). 
> 2.While processing this reopen, it stopped the connection which released 
> the socket and queued this connection and at this point of time the login 
> response was received for the earlier one


To make sure I got this you are saying before iscsi_sw_tcp_conn_stop calls 
iscsi_sw_tcp_release_conn that the recv path has called iscsi_recv_pdu right?


> 3.The kernel passed over this response to user-space which then sent the 
> set_neg_params request to kernel
> 4.As the connection was stopped, the sock was NULL and hence while the 
> kernel was processing the set param request from user-space, it panicked
> 
> Fix
> 
> 1.While setting the set_param in kernel, we need to check if sock is NULL
> 2.If the sock is NULL, then return EPERM (operation not permitted)
> 3.Due to this error handler will be invoked in user-space again to 
> recover the session
> 
> Signed-off-by: Gulam Mohamed 
> Reviewed-by: Junxiao Bi 
> ---
> diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c index 
> df47557a02a3..fd668a194053 100644
> --- a/drivers/scsi/iscsi_tcp.c
> +++ b/drivers/scsi/iscsi_tcp.c
> @@ -711,6 +711,12 @@ static int iscsi_sw_tcp_conn_set_param(struct 
> iscsi_cls_conn *cls_conn,
>struct iscsi_tcp_conn *tcp_conn = conn->dd_data;
>struct iscsi_sw_tcp_conn *tcp_sw_conn = tcp_conn->dd_data;
> 
> +   if (!tcp_sw_conn->sock) {
> +   iscsi_conn_printk(KERN_ERR, conn,
> +   "Cannot set param as sock is NULL\n");
> +   return -ENOTCONN;
> +   }
> +

I think this might have 2 issues:

1. It only fixes iscsi_tcp. What about other drivers that free/null 
resources/fields in ep_disconnect that we layer need to access in the set_param 
callback (the cxgbi drivers)?

2. What about drivers that do 

Re: [PATCH] iscsi: Do Not set param when sock is NULL

2020-11-23 Thread Michael Christie



> On Nov 4, 2020, at 11:40 PM, Gulam Mohamed  wrote:
> 
> Description
> =
> 1. This Kernel panic could be due to a timing issue when there is a race 
> between the sync thread and the initiator was processing of a login response 
> from the target. The session re-open can be invoked from two places
>  a.   Sessions sync thread when the iscsid restart
>  b.   From iscsid through iscsi error handler
> 2. The session reopen sequence is as follows in user-space 
> (iscsi-initiator-utils)
>  a.   Disconnect the connection
>  b.   Then send the stop connection request to the kernel which 
> releases the connection (releases the socket)
>  c.   Queues the reopen for 2 seconds delay
> d.Once the delay expires, create the TCP connection again by 
> calling the connect() call
> e.Poll for the connection
>  f.   When poll is successful i.e when the TCP connection is 
> established, it performs
>   i. Creation of session and connection data structures
>   ii. Bind the connection to the session. This is the place where we 
> assign the sock to tcp_sw_conn->sock
>   iii. Sets the parameters like target name, persistent address etc .
>   iv. Creates the login pdu
>   v. Sends the login pdu to kernel
>   vi. Returns to the main loop to process further events. The kernel then 
> sends the login request over to the target node
>   g. Once login response with success is received, it enters into full 
> feature phase and sets the negotiable parameters like max_recv_data_length, 
> max_transmit_length, data_digest etc .
> 3. While setting the negotiable parameters by calling 
> "iscsi_session_set_neg_params()", kernel panicked as sock was NULL
> 
> What happened here is
> 
> 1.Before initiator received the login response mentioned in above point 
> 2.f.v, another reopen request was sent from the error handler/sync session 
> for the same session, as the initiator utils was in main loop to process 
> further events (as 
>   mentioned in point 2.f.vi above). 
> 2.While processing this reopen, it stopped the connection which released 
> the socket and queued this connection and at this point of time the login 
> response was received for the earlier one


To make sure I got this you are saying before iscsi_sw_tcp_conn_stop calls 
iscsi_sw_tcp_release_conn that the recv path has called iscsi_recv_pdu right?


> 3.The kernel passed over this response to user-space which then sent the 
> set_neg_params request to kernel
> 4.As the connection was stopped, the sock was NULL and hence while the 
> kernel was processing the set param request from user-space, it panicked
> 
> Fix
> 
> 1.While setting the set_param in kernel, we need to check if sock is NULL
> 2.If the sock is NULL, then return EPERM (operation not permitted)
> 3.Due to this error handler will be invoked in user-space again to 
> recover the session
> 
> Signed-off-by: Gulam Mohamed 
> Reviewed-by: Junxiao Bi 
> ---
> diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c
> index df47557a02a3..fd668a194053 100644
> --- a/drivers/scsi/iscsi_tcp.c
> +++ b/drivers/scsi/iscsi_tcp.c
> @@ -711,6 +711,12 @@ static int iscsi_sw_tcp_conn_set_param(struct 
> iscsi_cls_conn *cls_conn,
>struct iscsi_tcp_conn *tcp_conn = conn->dd_data;
>struct iscsi_sw_tcp_conn *tcp_sw_conn = tcp_conn->dd_data;
> 
> +   if (!tcp_sw_conn->sock) {
> +   iscsi_conn_printk(KERN_ERR, conn,
> +   "Cannot set param as sock is NULL\n");
> +   return -ENOTCONN;
> +   }
> +

I think this might have 2 issues:

1. It only fixes iscsi_tcp. What about other drivers that free/null 
resources/fields in ep_disconnect that we layer need to access in the set_param 
callback (the cxgbi drivers)?

2. What about drivers that do not free/null fields (be2iscsi, bnx2i, ql4xxx and 
qedi) so the set_param calls end up just working. What state will userspace be 
in where we have logged in, but iscsid also thinks we are in the middle of 
trying to login?

I think we could do:

1. On ep_disconnect and stop_conn set some iscsi_cls_conn bit that indicates 
set_param calls for connection level settings should fail. scsi_transport_iscsi 
could set and check this bit for all drivers.
2. On bind_conn clear the bit.

-- 
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to open-iscsi+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/open-iscsi/05277786-2E1F-432D-AE73-F39565C6BEA4%40oracle.com.


Re: [PATCH] iscsi: Do Not set param when sock is NULL

2020-11-21 Thread The Lee-Man
The patch concept is good. I have a couple of issues though.

First, you say you will return EPERM, but you return ENOTCONN. Update the 
text to match to code?

Also, the indentation seems messed up? It's hard to tell using the web 
interface. If so, please fix.

On Monday, November 16, 2020 at 9:08:59 PM UTC-8 Gulam Mohamed wrote:

> Gentle reminder.
>
> Regards,
> Gulam Mohamed.
>
> -Original Message-
> From: Gulam Mohamed 
> Sent: Thursday, November 5, 2020 11:11 AM
> To: Lee Duncan ; Chris Leech ; James 
> E.J. Bottomley ; Martin K. Petersen <
> martin@oracle.com>; open-...@googlegroups.com; 
> linux...@vger.kernel.org; linux-...@vger.kernel.org
> Cc: Junxiao Bi 
> Subject: [PATCH] iscsi: Do Not set param when sock is NULL
>
> Description
> =
> 1. This Kernel panic could be due to a timing issue when there is a race 
> between the sync thread and the initiator was processing of a login 
> response from the target. The session re-open can be invoked from two places
> a. Sessions sync thread when the iscsid restart
> b. From iscsid through iscsi error handler
> 2. The session reopen sequence is as follows in user-space 
> (iscsi-initiator-utils)
> a. Disconnect the connection
> b. Then send the stop connection request to the kernel which releases the 
> connection (releases the socket)
> c. Queues the reopen for 2 seconds delay
> d. Once the delay expires, create the TCP connection again by calling the 
> connect() call
> e. Poll for the connection
> f. When poll is successful i.e when the TCP connection is established, it 
> performs
> i. Creation of session and connection data structures
> ii. Bind the connection to the session. This is the place where we assign 
> the sock to tcp_sw_conn->sock
> iii. Sets the parameters like target name, persistent address etc .
> iv. Creates the login pdu
> v. Sends the login pdu to kernel
> vi. Returns to the main loop to process further events. The kernel then 
> sends the login request over to the target node
> g. Once login response with success is received, it enters into full 
> feature phase and sets the negotiable parameters like max_recv_data_length, 
> max_transmit_length, data_digest etc . 3. While setting the negotiable 
> parameters by calling "iscsi_session_set_neg_params()", kernel panicked as 
> sock was NULL
>
> What happened here is
> 
> 1. Before initiator received the login response mentioned in above point 
> 2.f.v, another reopen request was sent from the error handler/sync session 
> for the same session, as the initiator utils was in main loop to process 
> further events (as 
> mentioned in point 2.f.vi above). 
> 2. While processing this reopen, it stopped the connection which released 
> the socket and queued this connection and at this point of time the login 
> response was received for the earlier one
> 3. The kernel passed over this response to user-space which then sent the 
> set_neg_params request to kernel
> 4. As the connection was stopped, the sock was NULL and hence while the 
> kernel was processing the set param request from user-space, it panicked
>
> Fix
> 
> 1. While setting the set_param in kernel, we need to check if sock is NULL
> 2. If the sock is NULL, then return EPERM (operation not permitted)
> 3. Due to this error handler will be invoked in user-space again to 
> recover the session
>
> Signed-off-by: Gulam Mohamed 
> Reviewed-by: Junxiao Bi 
> ---
> diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c index 
> df47557a02a3..fd668a194053 100644
> --- a/drivers/scsi/iscsi_tcp.c
> +++ b/drivers/scsi/iscsi_tcp.c
> @@ -711,6 +711,12 @@ static int iscsi_sw_tcp_conn_set_param(struct 
> iscsi_cls_conn *cls_conn,
> struct iscsi_tcp_conn *tcp_conn = conn->dd_data;
> struct iscsi_sw_tcp_conn *tcp_sw_conn = tcp_conn->dd_data;
>
> + if (!tcp_sw_conn->sock) {
> + iscsi_conn_printk(KERN_ERR, conn,
> + "Cannot set param as sock is NULL\n");
> + return -ENOTCONN;
> + }
> +
> switch(param) {
> case ISCSI_PARAM_HDRDGST_EN:
> iscsi_set_param(cls_conn, param, buf, buflen);
> --
> 2.18.4
>

-- 
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to open-iscsi+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/open-iscsi/e7accc7d-26e7-4d89-9f0d-77c299326aeen%40googlegroups.com.


RE: [PATCH] iscsi: Do Not set param when sock is NULL

2020-11-16 Thread Gulam Mohamed
Gentle reminder.

Regards,
Gulam Mohamed.

-Original Message-
From: Gulam Mohamed 
Sent: Thursday, November 5, 2020 11:11 AM
To: Lee Duncan ; Chris Leech ; James E.J. 
Bottomley ; Martin K. Petersen 
; open-iscsi@googlegroups.com; 
linux-s...@vger.kernel.org; linux-ker...@vger.kernel.org
Cc: Junxiao Bi 
Subject: [PATCH] iscsi: Do Not set param when sock is NULL

Description
=
1. This Kernel panic could be due to a timing issue when there is a race 
between the sync thread and the initiator was processing of a login response 
from the target. The session re-open can be invoked from two places
  a.Sessions sync thread when the iscsid restart
  b.From iscsid through iscsi error handler
2. The session reopen sequence is as follows in user-space 
(iscsi-initiator-utils)
  a.Disconnect the connection
  b.Then send the stop connection request to the kernel which 
releases the connection (releases the socket)
  c.Queues the reopen for 2 seconds delay
 d. Once the delay expires, create the TCP connection again by 
calling the connect() call
 e. Poll for the connection
  f.When poll is successful i.e when the TCP connection is 
established, it performs
i. Creation of session and connection data structures
ii. Bind the connection to the session. This is the place where we 
assign the sock to tcp_sw_conn->sock
iii. Sets the parameters like target name, persistent address etc .
iv. Creates the login pdu
v. Sends the login pdu to kernel
vi. Returns to the main loop to process further events. The kernel then 
sends the login request over to the target node
g. Once login response with success is received, it enters into full 
feature phase and sets the negotiable parameters like max_recv_data_length, 
max_transmit_length, data_digest etc . 3. While setting the negotiable 
parameters by calling "iscsi_session_set_neg_params()", kernel panicked as sock 
was NULL

What happened here is

1.  Before initiator received the login response mentioned in above point 
2.f.v, another reopen request was sent from the error handler/sync session for 
the same session, as the initiator utils was in main loop to process further 
events (as 
mentioned in point 2.f.vi above). 
2.  While processing this reopen, it stopped the connection which released 
the socket and queued this connection and at this point of time the login 
response was received for the earlier one
3.  The kernel passed over this response to user-space which then sent the 
set_neg_params request to kernel
4.  As the connection was stopped, the sock was NULL and hence while the 
kernel was processing the set param request from user-space, it panicked

Fix

1.  While setting the set_param in kernel, we need to check if sock is NULL
2.  If the sock is NULL, then return EPERM (operation not permitted)
3.  Due to this error handler will be invoked in user-space again to 
recover the session

Signed-off-by: Gulam Mohamed 
Reviewed-by: Junxiao Bi 
---
diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c index 
df47557a02a3..fd668a194053 100644
--- a/drivers/scsi/iscsi_tcp.c
+++ b/drivers/scsi/iscsi_tcp.c
@@ -711,6 +711,12 @@ static int iscsi_sw_tcp_conn_set_param(struct 
iscsi_cls_conn *cls_conn,
struct iscsi_tcp_conn *tcp_conn = conn->dd_data;
struct iscsi_sw_tcp_conn *tcp_sw_conn = tcp_conn->dd_data;

+   if (!tcp_sw_conn->sock) {
+   iscsi_conn_printk(KERN_ERR, conn,
+   "Cannot set param as sock is NULL\n");
+   return -ENOTCONN;
+   }
+
switch(param) {
case ISCSI_PARAM_HDRDGST_EN:
iscsi_set_param(cls_conn, param, buf, buflen);
--
2.18.4

-- 
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to open-iscsi+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/open-iscsi/48883f4d-edfa-4a8d-9d45-aa5225f82e57%40default.