The kernel's call to unblock devices only sets devices back online if they
are in the transport-offline state. It does nothing if scsi-eh set them
offline. The user space onlining code that your patch removes handles the
scsi-eh case.

I have hit the hang below, and it should be fixed by this patch set:

https://lore.kernel.org/all/[email protected]/



________________________________
From: Uday Shankar <[email protected]>
Sent: Tuesday, August 23, 2022 2:32 PM
To: [email protected] <[email protected]>
Cc: Lee Duncan <[email protected]>; Chris Leech <[email protected]>; Michael Christie <[email protected]>
Subject: Re: [PATCH] recovery: remove onlining of devices via sysfs

Bump and CC maintainers.

On Thu, Aug 11, 2022 at 05:40:30PM -0600, Uday Shankar wrote:
> In setup_full_feature_phase, iscsid calls into the kernel via
> start_conn, then sets all the relevant device states to "running" via
> session_online_devs. This second step is redundant since start_conn will
> set the device states to running. Moreover, it can cause tasks to hang
> forever: between start_conn and session_online_devs, the kernel could
> detect another conn error and block the session again, which quiesces
> the device queues. Setting the device state to "running" via sysfs kicks
> off a rescan, and if the device queue is quiesced, the rescan will hang.
> The iscsid kernel stacktrace looks like the following:
>
> [<0>] blk_execute_rq+0x11c/0x170
> [<0>] __scsi_execute+0x108/0x270
> [<0>] scsi_vpd_inquiry+0x6d/0xc0
> [<0>] scsi_get_vpd_size+0x33/0x70
> [<0>] scsi_get_vpd_buf+0x25/0xb0
> [<0>] scsi_attach_vpd+0x33/0x1a0
> [<0>] scsi_rescan_device+0x2a/0x90
> [<0>] store_state_field+0x1b0/0x250
> [<0>] kernfs_fop_write_iter+0x130/0x1c0
> [<0>] new_sync_write+0x10c/0x190
> [<0>] vfs_write+0x218/0x2a0
> [<0>] ksys_write+0x59/0xd0
> [<0>] do_syscall_64+0x3a/0x80
> [<0>] entry_SYSCALL_64_after_hwframe+0x46/0xb0
>
> Since iscsid is responsible for recovery from the second conn error but
> it is stuck, the relevant device queues will remain quiesced forever.
> Tasks attempting I/O on these queues will thus also get stuck.
>
> For these two reasons, remove the call to session_online_devs in
> setup_full_feature_phase.
>
> Signed-off-by: Uday Shankar <[email protected]>
> ---
>  usr/initiator.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/usr/initiator.c b/usr/initiator.c
> index 56bf38b..6cbdcba 100644
> --- a/usr/initiator.c
> +++ b/usr/initiator.c
> @@ -1068,7 +1068,6 @@ setup_full_feature_phase(iscsi_conn_t *conn)
>        } else {
>                session->notify_qtask = NULL;
>
> -             session_online_devs(session->hostno, session->id);
>                mgmt_ipc_write_rsp(c->qtask, ISCSI_SUCCESS);
>                log_warning("connection%d:%d is operational after recovery "
>                            "(%d attempts)", session->id, conn->id,
> --
> 2.25.1
>

-- 
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/open-iscsi/DM5PR10MB14666AEF8ED8B55B35310917F1739%40DM5PR10MB1466.namprd10.prod.outlook.com.
