The kernel's call to unblock devices only sets devices back online if they are in the transport-offline state. It does nothing if scsi-eh set them to offline. The user-space onlining code that your patch removes handles the scsi-eh case.
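To make the distinction concrete, here is a toy model in C. The enum and function names are made up for illustration; this is not the kernel's or iscsid's actual code, just a sketch of the behavior described above.

```c
/* Toy device states; names are illustrative, not the kernel's enum. */
enum dev_state {
	DEV_RUNNING,
	DEV_OFFLINE,		/* set by scsi-eh */
	DEV_TRANSPORT_OFFLINE,	/* set by the transport layer */
};

/*
 * Model of the kernel-side unblock: only a transport-offline device
 * is moved back to running; any other state is left untouched.
 */
static enum dev_state kernel_unblock(enum dev_state s)
{
	return s == DEV_TRANSPORT_OFFLINE ? DEV_RUNNING : s;
}

/*
 * Model of user space writing "running" to the sysfs state file:
 * the device is onlined no matter who offlined it.
 */
static enum dev_state userspace_online(enum dev_state s)
{
	(void)s;
	return DEV_RUNNING;
}
```

In this model, a device that scsi-eh left in DEV_OFFLINE stays offline after kernel_unblock() and only comes back through userspace_online().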
I hit the hang below and it should be fixed in this set:

https://lore.kernel.org/all/[email protected]/

________________________________
From: Uday Shankar <[email protected]>
Sent: Tuesday, August 23, 2022 2:32 PM
To: [email protected] <[email protected]>
Cc: Lee Duncan <[email protected]>; Chris Leech <[email protected]>; Michael Christie <[email protected]>
Subject: Re: [PATCH] recovery: remove onlining of devices via sysfs

Bump and CC maintainers.

On Thu, Aug 11, 2022 at 05:40:30PM -0600, Uday Shankar wrote:
> In setup_full_feature_phase, iscsid calls into the kernel via
> start_conn, then sets all the relevant device states to "running" via
> session_online_devs. This second step is redundant since start_conn will
> set the device states to running. Moreover, it can cause tasks to hang
> forever: between start_conn and session_online_devs, the kernel could
> detect another conn error and block the session again, which quiesces
> the device queues. Setting the device state to "running" via sysfs kicks
> off a rescan, and if the device queue is quiesced, the rescan will hang.
> The iscsid kernel stacktrace looks like the following:
>
> [<0>] blk_execute_rq+0x11c/0x170
> [<0>] __scsi_execute+0x108/0x270
> [<0>] scsi_vpd_inquiry+0x6d/0xc0
> [<0>] scsi_get_vpd_size+0x33/0x70
> [<0>] scsi_get_vpd_buf+0x25/0xb0
> [<0>] scsi_attach_vpd+0x33/0x1a0
> [<0>] scsi_rescan_device+0x2a/0x90
> [<0>] store_state_field+0x1b0/0x250
> [<0>] kernfs_fop_write_iter+0x130/0x1c0
> [<0>] new_sync_write+0x10c/0x190
> [<0>] vfs_write+0x218/0x2a0
> [<0>] ksys_write+0x59/0xd0
> [<0>] do_syscall_64+0x3a/0x80
> [<0>] entry_SYSCALL_64_after_hwframe+0x46/0xb0
>
> Since iscsid is responsible for recovery from the second conn error but
> it is stuck, the relevant device queues will remain quiesced forever.
> Tasks attempting I/O on these queues will thus also get stuck.
>
> For these two reasons, remove the call to session_online_devs in
> setup_full_feature_phase.
>
> Signed-off-by: Uday Shankar <[email protected]>
> ---
>  usr/initiator.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/usr/initiator.c b/usr/initiator.c
> index 56bf38b..6cbdcba 100644
> --- a/usr/initiator.c
> +++ b/usr/initiator.c
> @@ -1068,7 +1068,6 @@ setup_full_feature_phase(iscsi_conn_t *conn)
>  	} else {
>  		session->notify_qtask = NULL;
>
> -		session_online_devs(session->hostno, session->id);
>  		mgmt_ipc_write_rsp(c->qtask, ISCSI_SUCCESS);
>  		log_warning("connection%d:%d is operational after recovery "
>  			    "(%d attempts)", session->id, conn->id,
> --
> 2.25.1
>
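For anyone following along, the user-space onlining step discussed above comes down to a sysfs write. A rough sketch in C, assuming the usual /sys/class/scsi_device/<h:c:t:l>/device/state attribute; the helper name is hypothetical and this is not the actual open-iscsi implementation:

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not the open-iscsi code): write "running" to a
 * SCSI device's sysfs state attribute, e.g.
 * /sys/class/scsi_device/<h:c:t:l>/device/state.
 * Returns 0 on success, -1 on failure.
 */
static int online_dev_via_sysfs(const char *state_path)
{
	static const char val[] = "running";
	ssize_t n;
	int fd, rc;

	fd = open(state_path, O_WRONLY);
	if (fd < 0)
		return -1;

	/*
	 * On a real sysfs attribute this write is where the trace above
	 * blocks: the store handler kicks off a rescan, which issues I/O
	 * and waits if the device queue is quiesced.
	 */
	n = write(fd, val, strlen(val));
	rc = (n == (ssize_t)strlen(val)) ? 0 : -1;

	if (close(fd) < 0)
		rc = -1;
	return rc;
}
```

Because the write is synchronous, the calling thread cannot return to handle a second conn error until the rescan completes, which matches the stuck iscsid described in the patch.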
