On Jan 24, 2020, at 10:19 PM, Gabriel Krisman Bertazi <[email protected]> 
wrote:
> 
> From: Bharath Ravi <[email protected]>
> 
> Hi Lee,
> 
> Martin asked for you to re-review this patch before he applies it, since
> there was a small change from v3 after you acked it.  The change is that
> we started to protect the list_empty() verification with the spin lock
> on session destruction.
> 
> For that reason, I dropped your reviewed-by.  Can you please take
> another look so we can have this merged?
> 
> Thanks,

I’m sorry if it didn’t get through, but I sent a Reviewed-by update at the end 
of last week.

I looked over the updates and said that they look good to me, so please 
re-add my:

Reviewed-by: Lee Duncan <[email protected]>


> 
> -- >8 -- 
> 
> Connection failure processing depends on a daemon being present to (at
> least) stop the connection and start recovery.  This is a problem on a
> multipath scenario, where if the daemon failed for whatever reason, the
> SCSI path is never marked as down, multipath won't perform the
> failover and IO to the device will be forever waiting for that
> connection to come back.
> 
> This patch performs the connection failure entirely inside the kernel.
> This way, the failover can happen and pending IO can continue even if
> the daemon is dead. Once the daemon comes alive again, it can execute
> recovery procedures if applicable.
> 
> Changes since v3:
>  - Protect list_empty with connlock on session destroy
> 
> Changes since v2:
>  - Don't hold rx_mutex for too long at once
> 
> Changes since v1:
>  - Remove module parameter.
>  - Always do kernel-side stop work.
>  - Block recovery timeout handler if system is dying.
>  - Send a CONN_TERM stop if the system is dying.
> 
> Cc: Mike Christie <[email protected]>
> Cc: Lee Duncan <[email protected]>
> Cc: Bart Van Assche <[email protected]>
> Co-developed-by: Dave Clausen <[email protected]>
> Signed-off-by: Dave Clausen <[email protected]>
> Co-developed-by: Nick Black <[email protected]>
> Signed-off-by: Nick Black <[email protected]>
> Co-developed-by: Vaibhav Nagarnaik <[email protected]>
> Signed-off-by: Vaibhav Nagarnaik <[email protected]>
> Co-developed-by: Anatol Pomazau <[email protected]>
> Signed-off-by: Anatol Pomazau <[email protected]>
> Co-developed-by: Tahsin Erdogan <[email protected]>
> Signed-off-by: Tahsin Erdogan <[email protected]>
> Co-developed-by: Frank Mayhar <[email protected]>
> Signed-off-by: Frank Mayhar <[email protected]>
> Co-developed-by: Junho Ryu <[email protected]>
> Signed-off-by: Junho Ryu <[email protected]>
> Co-developed-by: Khazhismel Kumykov <[email protected]>
> Signed-off-by: Khazhismel Kumykov <[email protected]>
> Reviewed-by: Khazhismel Kumykov <[email protected]>
> Signed-off-by: Bharath Ravi <[email protected]>
> Co-developed-by: Gabriel Krisman Bertazi <[email protected]>
> Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
> ---
> drivers/scsi/scsi_transport_iscsi.c | 68 +++++++++++++++++++++++++++++
> include/scsi/scsi_transport_iscsi.h |  1 +
> 2 files changed, 69 insertions(+)
> 
> diff --git a/drivers/scsi/scsi_transport_iscsi.c b/drivers/scsi/scsi_transport_iscsi.c
> index 271afea654e2..ba6cfaf71aef 100644
> --- a/drivers/scsi/scsi_transport_iscsi.c
> +++ b/drivers/scsi/scsi_transport_iscsi.c
> @@ -86,6 +86,12 @@ struct iscsi_internal {
>       struct transport_container session_cont;
> };
> 
> +/* Worker to perform connection failure on unresponsive connections
> + * completely in kernel space.
> + */
> +static void stop_conn_work_fn(struct work_struct *work);
> +static DECLARE_WORK(stop_conn_work, stop_conn_work_fn);
> +
> static atomic_t iscsi_session_nr; /* sysfs session id for next new session */
> static struct workqueue_struct *iscsi_eh_timer_workq;
> 
> @@ -1611,6 +1617,7 @@ static DEFINE_MUTEX(rx_queue_mutex);
> static LIST_HEAD(sesslist);
> static DEFINE_SPINLOCK(sesslock);
> static LIST_HEAD(connlist);
> +static LIST_HEAD(connlist_err);
> static DEFINE_SPINLOCK(connlock);
> 
> static uint32_t iscsi_conn_get_sid(struct iscsi_cls_conn *conn)
> @@ -2247,6 +2254,7 @@ iscsi_create_conn(struct iscsi_cls_session *session, int dd_size, uint32_t cid)
> 
>       mutex_init(&conn->ep_mutex);
>       INIT_LIST_HEAD(&conn->conn_list);
> +     INIT_LIST_HEAD(&conn->conn_list_err);
>       conn->transport = transport;
>       conn->cid = cid;
> 
> @@ -2293,6 +2301,7 @@ int iscsi_destroy_conn(struct iscsi_cls_conn *conn)
> 
>       spin_lock_irqsave(&connlock, flags);
>       list_del(&conn->conn_list);
> +     list_del(&conn->conn_list_err);
>       spin_unlock_irqrestore(&connlock, flags);
> 
>       transport_unregister_device(&conn->dev);
> @@ -2407,6 +2416,51 @@ int iscsi_offload_mesg(struct Scsi_Host *shost,
> }
> EXPORT_SYMBOL_GPL(iscsi_offload_mesg);
> 
> +static void stop_conn_work_fn(struct work_struct *work)
> +{
> +     struct iscsi_cls_conn *conn, *tmp;
> +     unsigned long flags;
> +     LIST_HEAD(recovery_list);
> +
> +     spin_lock_irqsave(&connlock, flags);
> +     if (list_empty(&connlist_err)) {
> +             spin_unlock_irqrestore(&connlock, flags);
> +             return;
> +     }
> +     list_splice_init(&connlist_err, &recovery_list);
> +     spin_unlock_irqrestore(&connlock, flags);
> +
> +     list_for_each_entry_safe(conn, tmp, &recovery_list, conn_list_err) {
> +             uint32_t sid = iscsi_conn_get_sid(conn);
> +             struct iscsi_cls_session *session;
> +
> +             mutex_lock(&rx_queue_mutex);
> +
> +             session = iscsi_session_lookup(sid);
> +             if (session) {
> +                     if (system_state != SYSTEM_RUNNING) {
> +                             session->recovery_tmo = 0;
> +                             conn->transport->stop_conn(conn,
> +                                                        STOP_CONN_TERM);
> +                     } else {
> +                             conn->transport->stop_conn(conn,
> +                                                        STOP_CONN_RECOVER);
> +                     }
> +             }
> +
> +             list_del_init(&conn->conn_list_err);
> +
> +             mutex_unlock(&rx_queue_mutex);
> +
> +             /* we don't want to hold rx_queue_mutex for too long,
> +              * for instance if many conns failed at the same time,
>               * since this stalls other iscsi maintenance operations.
> +              * Give other users a chance to proceed.
> +              */
> +             cond_resched();
> +     }
> +}
> +
> void iscsi_conn_error_event(struct iscsi_cls_conn *conn, enum iscsi_err error)
> {
>       struct nlmsghdr *nlh;
> @@ -2414,6 +2468,12 @@ void iscsi_conn_error_event(struct iscsi_cls_conn *conn, enum iscsi_err error)
>       struct iscsi_uevent *ev;
>       struct iscsi_internal *priv;
>       int len = nlmsg_total_size(sizeof(*ev));
> +     unsigned long flags;
> +
> +     spin_lock_irqsave(&connlock, flags);
> +     list_add(&conn->conn_list_err, &connlist_err);
> +     spin_unlock_irqrestore(&connlock, flags);
> +     queue_work(system_unbound_wq, &stop_conn_work);
> 
>       priv = iscsi_if_transport_lookup(conn->transport);
>       if (!priv)
> @@ -2743,11 +2803,19 @@ static int
> iscsi_if_destroy_conn(struct iscsi_transport *transport, struct iscsi_uevent *ev)
> {
>       struct iscsi_cls_conn *conn;
> +     unsigned long flags;
> 
>       conn = iscsi_conn_lookup(ev->u.d_conn.sid, ev->u.d_conn.cid);
>       if (!conn)
>               return -EINVAL;
> 
> +     spin_lock_irqsave(&connlock, flags);
> +     if (!list_empty(&conn->conn_list_err)) {
> +             spin_unlock_irqrestore(&connlock, flags);
> +             return -EAGAIN;
> +     }
> +     spin_unlock_irqrestore(&connlock, flags);
> +
>       ISCSI_DBG_TRANS_CONN(conn, "Destroying transport conn\n");
>       if (transport->destroy_conn)
>               transport->destroy_conn(conn);
> diff --git a/include/scsi/scsi_transport_iscsi.h b/include/scsi/scsi_transport_iscsi.h
> index 325ae731d9ad..2129dc9e2dec 100644
> --- a/include/scsi/scsi_transport_iscsi.h
> +++ b/include/scsi/scsi_transport_iscsi.h
> @@ -190,6 +190,7 @@ extern void iscsi_ping_comp_event(uint32_t host_no,
> 
> struct iscsi_cls_conn {
>       struct list_head conn_list;     /* item in connlist */
> +     struct list_head conn_list_err; /* item in connlist_err */
>       void *dd_data;                  /* LLD private data */
>       struct iscsi_transport *transport;
>       uint32_t cid;                   /* connection id */
> -- 
> 2.25.0.rc2
> 
> -- 
> You received this message because you are subscribed to a topic in the Google 
> Groups "open-iscsi" group.
> To unsubscribe from this topic, visit 
> https://groups.google.com/d/topic/open-iscsi/SNd7Di-ZRao/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to 
> [email protected].
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/open-iscsi/20200125061925.191601-1-krisman%40collabora.com.
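
For anyone following the thread without the kernel tree at hand, the locking
pattern in the quoted patch (queue a failed connection under a lock, splice
the whole error list onto a private list in the worker, refuse to destroy a
connection that is still queued) can be sketched in userspace C. This is only
an illustration under assumptions, not the kernel code: a pthread mutex stands
in for the connlock spinlock, the list helpers are simplified stand-ins for
<linux/list.h>, and every name below (struct conn, conn_report_error(),
conn_has_pending_error(), process_failed_conns()) is hypothetical. The
duplicate-report guard and taking the lock around the final unlink are
additions of this sketch and are not in the patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Minimal doubly linked list, modelled loosely on the kernel's list_head. */
struct list_node {
	struct list_node *next, *prev;
};

static void list_init(struct list_node *n) { n->next = n; n->prev = n; }
static int list_is_empty(const struct list_node *h) { return h->next == h; }

static void list_add_head(struct list_node *n, struct list_node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Unlink and re-initialise, so list_is_empty(n) is true afterwards. */
static void list_del_reinit(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

/* Move every entry of src onto the empty list dst and leave src empty. */
static void list_splice_to_empty(struct list_node *src, struct list_node *dst)
{
	assert(list_is_empty(dst));
	if (list_is_empty(src))
		return;
	dst->next = src->next;
	dst->prev = src->prev;
	dst->next->prev = dst;
	dst->prev->next = dst;
	list_init(src);
}

/* Hypothetical stand-in for struct iscsi_cls_conn. */
struct conn {
	int cid;
	int stopped;                /* set once the worker has "stopped" it */
	struct list_node err_node;  /* plays the role of conn_list_err */
};

static struct list_node err_list = { &err_list, &err_list };
static pthread_mutex_t err_lock = PTHREAD_MUTEX_INITIALIZER;

void conn_init(struct conn *c, int cid)
{
	c->cid = cid;
	c->stopped = 0;
	list_init(&c->err_node);
}

/* Error path: queue the connection for the worker. */
void conn_report_error(struct conn *c)
{
	pthread_mutex_lock(&err_lock);
	if (list_is_empty(&c->err_node))  /* guard added by this sketch only */
		list_add_head(&c->err_node, &err_list);
	pthread_mutex_unlock(&err_lock);
}

/* Destroy path: check, under the lock, whether the worker still owns c. */
int conn_has_pending_error(struct conn *c)
{
	int pending;

	pthread_mutex_lock(&err_lock);
	pending = !list_is_empty(&c->err_node);
	pthread_mutex_unlock(&err_lock);
	return pending;
}

/* Worker: take the whole error list in one step, then process it unlocked. */
int process_failed_conns(void)
{
	struct list_node local;
	int handled = 0;

	list_init(&local);
	pthread_mutex_lock(&err_lock);
	list_splice_to_empty(&err_list, &local);
	pthread_mutex_unlock(&err_lock);

	while (!list_is_empty(&local)) {
		struct list_node *n = local.next;
		struct conn *c = (struct conn *)((char *)n -
						 offsetof(struct conn, err_node));

		c->stopped = 1;  /* stand-in for transport->stop_conn() */
		pthread_mutex_lock(&err_lock);
		list_del_reinit(n);
		pthread_mutex_unlock(&err_lock);
		handled++;
	}
	return handled;
}
```

Splicing onto a private list keeps the lock held only for a few pointer
updates, however many connections failed at once. A destroy request that
finds conn_has_pending_error() true would back off and retry, which is the
role the -EAGAIN return plays in the quoted iscsi_if_destroy_conn() hunk.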
