Yes, you bring up a good point in my opinion. I do not know this code well, but it seems like UNBIND_SESSION could never work.
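
To make the concern concrete, here is a small userspace sketch of why I think the drain can never finish. It is only a model under my own assumptions, not kernel code: every name in it (model_wq, model_run_work, model_drain_would_hang, model_unbind_then_destroy) is hypothetical and none of them are kernel APIs. The idea is that a drain waits for the in-flight count of a queue to reach zero, and a work item that starts the drain on its own queue is itself still counted as in flight.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; these are NOT kernel structures or APIs. */
struct model_wq {
    int in_flight;  /* work items currently running on this queue */
};

/* Queue whose work item is executing right now, or NULL outside any item. */
static struct model_wq *model_current_wq;

typedef bool (*model_work_fn)(struct model_wq *target);

/*
 * A drain only finishes once in_flight drops to 0. If the caller is itself
 * an in-flight item on the same queue, that can never happen while it waits.
 */
static bool model_drain_would_hang(const struct model_wq *wq)
{
    return model_current_wq == wq && wq->in_flight > 0;
}

/* Run one work item on 'wq'; the item then tries to destroy 'target'. */
static bool model_run_work(struct model_wq *wq, model_work_fn fn,
                           struct model_wq *target)
{
    struct model_wq *prev = model_current_wq;
    bool hung;

    wq->in_flight++;
    model_current_wq = wq;
    hung = fn(target);          /* result reported by the work item */
    model_current_wq = prev;
    wq->in_flight--;
    return hung;
}

/* Stand-in for the unbind work dropping the last reference to the host. */
static bool model_unbind_then_destroy(struct model_wq *target)
{
    return model_drain_would_hang(target);
}
```

In this model, running model_unbind_then_destroy on the same queue it targets reports a hang, while running it from a different queue does not. That matches my reading of the trace, where the final put lands inside the unbind work item.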
Mike Christie? Chris Leech?

On Thursday, January 21, 2021 at 6:52:27 AM UTC-8 [email protected] wrote:

> Hi Folks,
>
> I am looking at a kernel panic due to a hung task and could use some help
> understanding whether this is a known issue. Kernel version is 4.14.63.
>
> Here is a complete stack trace of the hung kworker task.
>
> crash> bt 106700
> PID: 106700 TASK: ffff885eb22ebe80 CPU: 8 COMMAND: "kworker/u32:0"
>  #0 [ffffc900550ebab8] __schedule at ffffffff815f0b78
>  #1 [ffffc900550ebb50] schedule at ffffffff815f1248
>  #2 [ffffc900550ebb58] schedule_timeout at ffffffff815f4fe6
>  #3 [ffffc900550ebbf8] wait_for_completion at ffffffff815f1cf0
>  #4 [ffffc900550ebc48] flush_workqueue at ffffffff8108ec66
>  #5 [ffffc900550ebce8] drain_workqueue at ffffffff8108ef84
>  #6 [ffffc900550ebd10] destroy_workqueue at ffffffff81091ce5
>  #7 [ffffc900550ebd30] scsi_host_dev_release at ffffffffa0095ced [scsi_mod]
>  #8 [ffffc900550ebd48] device_release at ffffffff81453c90
>  #9 [ffffc900550ebd68] kobject_put at ffffffff815d8130
> #10 [ffffc900550ebd88] iscsi_session_release at ffffffffa0aebf88 [scsi_transport_iscsi]
> #11 [ffffc900550ebda8] device_release at ffffffff81453c90
> #12 [ffffc900550ebdc8] kobject_put at ffffffff815d8130
> #13 [ffffc900550ebde8] device_release at ffffffff81453c90
> #14 [ffffc900550ebe08] kobject_put at ffffffff815d8130
> #15 [ffffc900550ebe28] scsi_remove_target at ffffffffa00a3e92 [scsi_mod]
> #16 [ffffc900550ebe70] __iscsi_unbind_session at ffffffffa0aecd8d [scsi_transport_iscsi]
> #17 [ffffc900550ebe98] process_one_work at ffffffff8108f62a
> #18 [ffffc900550ebed8] worker_thread at ffffffff8108f84b
> #19 [ffffc900550ebf10] kthread at ffffffff8109536a
> #20 [ffffc900550ebf50] ret_from_fork at ffffffff816001ef
>
> After poking around in the kdump, I've discovered that the worker thread
> that called __iscsi_unbind_session did so for a work item that came from
> the same workqueue that is being destroyed at the top of the stack. My
> understanding of work queues is that this isn't allowed and will result in
> a hung task.
>
> Here we can see where the __iscsi_unbind_session work is queued to a SCSI
> work queue
>
> static int
> iscsi_if_recv_msg(struct sk_buff *skb, struct nlmsghdr *nlh, uint32_t *group)
> {
>         .
>         .
>         .
>         case ISCSI_UEVENT_UNBIND_SESSION:
>                 session = iscsi_session_lookup(ev->u.d_session.sid);
>                 if (session)
>                         scsi_queue_work(iscsi_session_to_shost(session),   <--- unbind work queued to scsi work queue
>                                         &session->unbind_work);
>                 else
>                         err = -EINVAL;
>                 break;
>
> Here we can see that this puts the work item onto Scsi_Host->work_q
>
> int scsi_queue_work(struct Scsi_Host *shost, struct work_struct *work)
> {
>         if (unlikely(!shost->work_q)) {
>                 shost_printk(KERN_ERR, shost,
>                         "ERROR: Scsi host '%s' attempted to queue scsi-work, "
>                         "when no workqueue created.\n", shost->hostt->name);
>                 dump_stack();
>
>                 return -EINVAL;
>         }
>
>         return queue_work(shost->work_q, work);   <--- Work item goes into Scsi_Host->work_q
> }
>
> Here we can see the scsi_host_dev_release routine destroying the
> Scsi_Host->work_q
>
> static void scsi_host_dev_release(struct device *dev)
> {
>         struct Scsi_Host *shost = dev_to_shost(dev);
>         struct device *parent = dev->parent;
>
>         scsi_proc_hostdir_rm(shost->hostt);
>
>         /* Wait for functions invoked through call_rcu(&shost->rcu, ...) */
>         rcu_barrier();
>
>         if (shost->tmf_work_q)
>                 destroy_workqueue(shost->tmf_work_q);
>         if (shost->ehandler)
>                 kthread_stop(shost->ehandler);
>         if (shost->work_q)
>                 destroy_workqueue(shost->work_q);   <--- Destroying Scsi_Host->work_q
>
> I did some searching and couldn't locate a similar stack trace. Does
> anyone know if this is a known issue?
>
> If not a known issue, any ideas as to what would normally keep the
> Scsi_Host device from being removed inline in this call stack? This
> happened on two hosts within minutes of each other after starting to
> disconnect from 2 targets. I believe the unbind session was kicked off from
> an iscsiadm command to terminate the session but other than that nothing
> out of the ordinary was going on.
>
> Thanks in advance,
> Adam
