Re: Kernel panic: Hung task on unbind session

The Lee-Man Fri, 19 Feb 2021 12:03:02 -0800

Yes, you bring up a good point in my opinion. I do not know this code well, 
but it seems like UNBIND_SESSION could never work.


Mike Christie? Chris Leech?

On Thursday, January 21, 2021 at 6:52:27 AM UTC-8 [email protected] wrote:

> Hi Folks,
>
> I am looking at a kernel panic due to a hung task and could use some help 
> understanding whether this is a known issue.  Kernel version is 4.14.63.
>
> Here is a complete stack trace of the hung kworker task.
>
> crash> bt 106700
> PID: 106700  TASK: ffff885eb22ebe80  CPU: 8   COMMAND: "kworker/u32:0"
>  #0 [ffffc900550ebab8] __schedule at ffffffff815f0b78
>  #1 [ffffc900550ebb50] schedule at ffffffff815f1248
>  #2 [ffffc900550ebb58] schedule_timeout at ffffffff815f4fe6
>  #3 [ffffc900550ebbf8] wait_for_completion at ffffffff815f1cf0
>  #4 [ffffc900550ebc48] flush_workqueue at ffffffff8108ec66
>  #5 [ffffc900550ebce8] drain_workqueue at ffffffff8108ef84
>  #6 [ffffc900550ebd10] destroy_workqueue at ffffffff81091ce5
>  #7 [ffffc900550ebd30] scsi_host_dev_release at ffffffffa0095ced [scsi_mod]
>  #8 [ffffc900550ebd48] device_release at ffffffff81453c90
>  #9 [ffffc900550ebd68] kobject_put at ffffffff815d8130
> #10 [ffffc900550ebd88] iscsi_session_release at ffffffffa0aebf88 [scsi_transport_iscsi]
> #11 [ffffc900550ebda8] device_release at ffffffff81453c90
> #12 [ffffc900550ebdc8] kobject_put at ffffffff815d8130
> #13 [ffffc900550ebde8] device_release at ffffffff81453c90
> #14 [ffffc900550ebe08] kobject_put at ffffffff815d8130
> #15 [ffffc900550ebe28] scsi_remove_target at ffffffffa00a3e92 [scsi_mod]
> #16 [ffffc900550ebe70] __iscsi_unbind_session at ffffffffa0aecd8d [scsi_transport_iscsi]
> #17 [ffffc900550ebe98] process_one_work at ffffffff8108f62a
> #18 [ffffc900550ebed8] worker_thread at ffffffff8108f84b
> #19 [ffffc900550ebf10] kthread at ffffffff8109536a
> #20 [ffffc900550ebf50] ret_from_fork at ffffffff816001ef
>
> After poking around in the kdump, I've discovered that the worker thread 
> that called __iscsi_unbind_session did so for a work item that came from 
> the same workqueue that is being destroyed at the top of the stack. My 
> understanding of work queues is that this isn't allowed and will result in 
> a hung task.   
>
> Here we can see where the __iscsi_unbind_session work is queued to a SCSI 
> work queue
>
> static int
> iscsi_if_recv_msg(struct sk_buff *skb, struct nlmsghdr *nlh, uint32_t *group)
> {
>     .
>     .
>     .
>     case ISCSI_UEVENT_UNBIND_SESSION:
>         session = iscsi_session_lookup(ev->u.d_session.sid);
>         if (session)
>             /* unbind work queued to scsi work queue */
>             scsi_queue_work(iscsi_session_to_shost(session),
>                             &session->unbind_work);
>         else
>             err = -EINVAL;
>         break;
>
> Here we can see that this puts the work item onto Scsi_Host->work_q 
>
> int scsi_queue_work(struct Scsi_Host *shost, struct work_struct *work)
> {
>     if (unlikely(!shost->work_q)) {
>         shost_printk(KERN_ERR, shost,
>             "ERROR: Scsi host '%s' attempted to queue scsi-work, "
>             "when no workqueue created.\n", shost->hostt->name);
>         dump_stack();
>
>         return -EINVAL;
>     }
>
>     /* Work item goes into Scsi_Host->work_q */
>     return queue_work(shost->work_q, work);
> }
>
> Here we can see the scsi_host_dev_release routine destroying the 
> Scsi_Host->work_q
>
> static void scsi_host_dev_release(struct device *dev)
> {
>     struct Scsi_Host *shost = dev_to_shost(dev);
>     struct device *parent = dev->parent;
>
>     scsi_proc_hostdir_rm(shost->hostt);
>
>     /* Wait for functions invoked through call_rcu(&shost->rcu, ...) */
>     rcu_barrier();
>
>     if (shost->tmf_work_q)
>         destroy_workqueue(shost->tmf_work_q);
>     if (shost->ehandler)
>         kthread_stop(shost->ehandler);
>     if (shost->work_q)
>         /* Destroying Scsi_Host->work_q */
>         destroy_workqueue(shost->work_q);
>
> I did some searching and couldn't locate a similar stack trace. Does 
> anyone know if this is a known issue? 
>
> If not a known issue, any ideas as to what would normally keep the 
> Scsi_Host device from being removed inline in this call stack? This 
> happened on two hosts within minutes of each other after starting to 
> disconnect from 2 targets. I believe the unbind session was kicked off from 
> an iscsiadm command to terminate the session but other than that nothing 
> out of the ordinary was going on. 
>
> Thanks in advance, 
> Adam
>
