Re: please fix FUSION (Was: [v3.13][v3.14][Regression]kthread:makekthread_create()killable)
On 03/22, James Bottomley wrote: OK, the fix from the SCSI point of view is to make the mpt teardown path actually work. The whole thing looks to be a complete mess because there's another place where it will do the wrong thing in mptscsih_remove(). You always have to call mpt_detach() otherwise the device doesn't get removed from the lists. In theory this patch fixes both bugs in the driver. Yes, I obviously thought about the same change ;) But note that mpt_detach() doesn't look correct too. This is almost off-topic, but still... At least it needs s/cancel_delayed_work/cancel_delayed_work_sync/ to sync with mpt_fault_reset_work(). And it is not clear if it is safe to simply destroy -fw_event_q, perhaps it can't have the pending works if the caller is teardown path, and otherwise we rely on mptsas_cleanup_fw_event_q(), I do not know... but mptsas_cleanup_fw_event_q() needs _sync too I guess, and obviously mptsas_free_fw_event() should be called unconditionally. OTOH, mpt_attach() doesn't verify that -fw_event_q was created succesfully. And, if mpt_do_ioc_recovery() fails it destroys -reset_work_q but not -fw_event_q. Looks like this driver needs some cleanups. --- a/drivers/message/fusion/mptscsih.c +++ b/drivers/message/fusion/mptscsih.c @@ -1176,10 +1176,14 @@ mptscsih_remove(struct pci_dev *pdev) MPT_SCSI_HOST *hd; int sz1; + if (!host) + /* not brought up far enough to do scsi_host_attach() */ scsi_host_alloc ;) + goto out; + Yes, I think this is correct. mpt_attach() does kzalloc(MPT_ADAPTER) so we can probably rely on -sh == NULL if _alloc failed. Oleg. -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: please fix FUSION (Was:[v3.13][v3.14][Regression]kthread:makekthread_create()killable)
On Sun, 23 Mar 2014, Tetsuo Handa wrote: Thomas Gleixner wrote: But then systemd/udev mutters: You migh be able to work around the timeout with udev rules and OPTIONS+=event_timeout=120, but that code was maybe never used or tested, so it might not work correctly. [1] AFAICT from the ubuntu bug system [2] nobody bothered even to try that. And if the udev/systemd event_timeout option is broken it's way better to fix that one instead of hacking random heuristics into the kernel. I haven't tried the event_timeout= option but I think it will not work. The timeout is hard coded as shown below and there will be no chance for taking the event_timeout= option into account. -- systemd-204/src/udev/udevd.c start -- (...snipped...) /* check for hanging events */ udev_list_node_foreach(loop, worker_list) { struct worker *worker = node_to_worker(loop); if (worker-state != WORKER_RUNNING) continue; if ((now(CLOCK_MONOTONIC) - worker-event_start_usec) 30 * 1000 * 1000) { And because systemd has an immutable hardcoded random timeout we add another hardcoded random timeout into kthread_create() to work around that. How broken is that? And it seems other people have solved it: http://www.redhat.com/archives/lvm-devel/2013-September/msg00036.html Thanks, tglx -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: please fix FUSION (Was:[v3.13][v3.14][Regression]kthread:makekthread_create()killable)
On Sun, 2014-03-23 at 09:04 +0100, Thomas Gleixner wrote: On Sun, 23 Mar 2014, Tetsuo Handa wrote: Thomas Gleixner wrote: But then systemd/udev mutters: You migh be able to work around the timeout with udev rules and OPTIONS+=event_timeout=120, but that code was maybe never used or tested, so it might not work correctly. [1] AFAICT from the ubuntu bug system [2] nobody bothered even to try that. And if the udev/systemd event_timeout option is broken it's way better to fix that one instead of hacking random heuristics into the kernel. I haven't tried the event_timeout= option but I think it will not work. The timeout is hard coded as shown below and there will be no chance for taking the event_timeout= option into account. -- systemd-204/src/udev/udevd.c start -- (...snipped...) /* check for hanging events */ udev_list_node_foreach(loop, worker_list) { struct worker *worker = node_to_worker(loop); if (worker-state != WORKER_RUNNING) continue; if ((now(CLOCK_MONOTONIC) - worker-event_start_usec) 30 * 1000 * 1000) { And because systemd has an immutable hardcoded random timeout we add another hardcoded random timeout into kthread_create() to work around that. How broken is that? And it seems other people have solved it: http://www.redhat.com/archives/lvm-devel/2013-September/msg00036.html I agree with Thomas. A hardcoded timeout is a systemd bug. However, could I get confirmation, while you can use this bug to do it, that the patch back in this thread actually fixes the crash when scsi_alloc_host fails, that's the serious SCSI bug, in my view? Thanks, James -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: please fix FUSION (Was:[v3.13][v3.14][Regression]kthread:makekthread_create()killable)
On Sun, 23 Mar 2014, James Bottomley wrote: On Sun, 2014-03-23 at 09:04 +0100, Thomas Gleixner wrote: On Sun, 23 Mar 2014, Tetsuo Handa wrote: Thomas Gleixner wrote: But then systemd/udev mutters: You migh be able to work around the timeout with udev rules and OPTIONS+=event_timeout=120, but that code was maybe never used or tested, so it might not work correctly. [1] AFAICT from the ubuntu bug system [2] nobody bothered even to try that. And if the udev/systemd event_timeout option is broken it's way better to fix that one instead of hacking random heuristics into the kernel. I haven't tried the event_timeout= option but I think it will not work. The timeout is hard coded as shown below and there will be no chance for taking the event_timeout= option into account. -- systemd-204/src/udev/udevd.c start -- (...snipped...) /* check for hanging events */ udev_list_node_foreach(loop, worker_list) { struct worker *worker = node_to_worker(loop); if (worker-state != WORKER_RUNNING) continue; if ((now(CLOCK_MONOTONIC) - worker-event_start_usec) 30 * 1000 * 1000) { And because systemd has an immutable hardcoded random timeout we add another hardcoded random timeout into kthread_create() to work around that. How broken is that? And it seems other people have solved it: http://www.redhat.com/archives/lvm-devel/2013-September/msg00036.html I agree with Thomas. A hardcoded timeout is a systemd bug. However, could I get confirmation, while you can use this bug to do it, that the patch back in this thread actually fixes the crash when scsi_alloc_host fails, that's the serious SCSI bug, in my view? Which patch, the one to kthread_create() or the one to SCSI? Thanks, tglx -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: please fix FUSION (Was:[v3.13][v3.14][Regression]kthread:makekthread_create()killable)
On Sun, 2014-03-23 at 15:28 +0100, Thomas Gleixner wrote: On Sun, 23 Mar 2014, James Bottomley wrote: On Sun, 2014-03-23 at 09:04 +0100, Thomas Gleixner wrote: On Sun, 23 Mar 2014, Tetsuo Handa wrote: Thomas Gleixner wrote: But then systemd/udev mutters: You migh be able to work around the timeout with udev rules and OPTIONS+=event_timeout=120, but that code was maybe never used or tested, so it might not work correctly. [1] AFAICT from the ubuntu bug system [2] nobody bothered even to try that. And if the udev/systemd event_timeout option is broken it's way better to fix that one instead of hacking random heuristics into the kernel. I haven't tried the event_timeout= option but I think it will not work. The timeout is hard coded as shown below and there will be no chance for taking the event_timeout= option into account. -- systemd-204/src/udev/udevd.c start -- (...snipped...) /* check for hanging events */ udev_list_node_foreach(loop, worker_list) { struct worker *worker = node_to_worker(loop); if (worker-state != WORKER_RUNNING) continue; if ((now(CLOCK_MONOTONIC) - worker-event_start_usec) 30 * 1000 * 1000) { And because systemd has an immutable hardcoded random timeout we add another hardcoded random timeout into kthread_create() to work around that. How broken is that? And it seems other people have solved it: http://www.redhat.com/archives/lvm-devel/2013-September/msg00036.html I agree with Thomas. A hardcoded timeout is a systemd bug. However, could I get confirmation, while you can use this bug to do it, that the patch back in this thread actually fixes the crash when scsi_alloc_host fails, that's the serious SCSI bug, in my view? Which patch, the one to kthread_create() or the one to SCSI? The one to SCSI ... I'm only really interested in the oops when scsi_host_alloc fails. James -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: please fix FUSION (Was: [v3.13][v3.14][Regression]kthread:makekthread_create()killable)
Oleg Nesterov wrote: Tetsuo, what do you think? I don't like blocking SIGKILL while doing operations that depend on memory allocation by other processes. If the OOM killer is triggered and it chose the process blocking SIGKILL in mptsas_init() (I know it unlikely happens), it generates the OOM killer deadlock. My preference is to fix kthread_create() to handle SIGKILL gracefully. kthread_create() did not return upon SIGKILL before commit 786235ee. Since commit 786235ee, there is imbalance that kmalloc(GFP_KERNEL) in kthread_create_on_node() ignores SIGKILL unless TIF_MEMDIE is set but wait_for_completion_killable() in kthread_create_on_node() does not ignore SIGKILL even if TIF_MEMDIE is not set. Many kernel operations (e.g. mutex_lock() wait_event() wait_for_completion()) ignore SIGKILL and the current users depend on SIGKILL being ignored. Thus, commit 786235ee sounds like a kernel API breakage. -- From 731f1f6dec7efaa132f751c432079b9b1fdb78d2 Mon Sep 17 00:00:00 2001 From: Tetsuo Handa penguin-ker...@i-love.sakura.ne.jp Date: Sat, 22 Mar 2014 15:16:50 +0900 Subject: [PATCH] kthread: Handle SIGKILL gracefully in kthread_create(). Commit 786235ee kthread: make kthread_create() killable changed to leave kthread_create() as soon as receiving SIGKILL. But this change caused boot failures because systemd-udevd receives SIGKILL due to timeout while loading SCSI controller drivers using finit_module() [1]. Therefore, this patch changes kthread_create() from wait forever in killable state to wait for 10 seconds in unkillable state, check for the OOM killer every second. This patch also changes the return value of timeout case from -ENOMEM to -EINTR because -ENOMEM could make sense for only TIF_MEMDIE case. [1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1276705 Signed-off-by: Tetsuo Handa penguin-ker...@i-love.sakura.ne.jp --- kernel/kthread.c | 37 + 1 files changed, 21 insertions(+), 16 deletions(-) diff --git a/kernel/kthread.c b/kernel/kthread.c index b5ae3ee..6a25a9f 100644 --- a/kernel/kthread.c +++ b/kernel/kthread.c @@ -269,6 +269,7 @@ struct task_struct *kthread_create_on_node(int (*threadfn)(void *data), const char namefmt[], ...) { + int i = 0; DECLARE_COMPLETION_ONSTACK(done); struct task_struct *task; struct kthread_create_info *create = kmalloc(sizeof(*create), @@ -287,24 +288,28 @@ struct task_struct *kthread_create_on_node(int (*threadfn)(void *data), wake_up_process(kthreadd_task); /* -* Wait for completion in killable state, for I might be chosen by -* the OOM killer while kthreadd is trying to allocate memory for -* new kernel thread. +* Wait for completion with 10 seconds timeout. Unless the kthreadd is +* blocked for direct memory reclaim path, the kthreadd will be able to +* complete the request within 10 seconds. Also, check every second if +* I was chosen by the OOM killer in order to avoid the OOM killer +* deadlock. */ - if (unlikely(wait_for_completion_killable(done))) { - /* -* If I was SIGKILLed before kthreadd (or new kernel thread) -* calls complete(), leave the cleanup of this structure to -* that thread. -*/ - if (xchg(create-done, NULL)) - return ERR_PTR(-ENOMEM); - /* -* kthreadd (or new kernel thread) will call complete() -* shortly. -*/ - wait_for_completion(done); + do { + if (likely(wait_for_completion_timeout(done, HZ))) + goto ready; + } while (i++ 10 !test_thread_flag(TIF_MEMDIE)); + /* +* The kthreadd was unable to complete the request within 10 seconds +* (or I was chosen by the OOM killer). Thus, give up and leave the +* cleanup of this structure to the kthreadd (or new kernel thread). +*/ + if (xchg(create-done, NULL)) { + WARN(1, Gave up waiting for kthreadd.\n); + return ERR_PTR(-EINTR); } + /* kthreadd (or new kernel thread) will call complete() shortly. */ + wait_for_completion(done); +ready: task = create-result; if (!IS_ERR(task)) { static const struct sched_param param = { .sched_priority = 0 }; -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: please fix FUSION (Was: [v3.13][v3.14][Regression]kthread:makekthread_create()killable)
On 03/22, Tetsuo Handa wrote: Oleg Nesterov wrote: Tetsuo, what do you think? I don't like blocking SIGKILL while doing operations that depend on memory allocation by other processes. If the OOM killer is triggered and it chose the process blocking SIGKILL in mptsas_init() (I know it unlikely happens), it generates the OOM killer deadlock. It seems that we do not understand each other. I do not like this hack too. And it is even wrong, you can't really block SIGKILL. And it is unnecessary in a sense that (I think) it is fine that module_init() reacts to SIGKILL and aborts, just the fact it crashes the kernel in the error paths is not fine. The driver should be fixed anyway. As for timeout, either userspace/systemd should be changed to not send SIGKILL after 30 secs, or (better) the driver should be changed to not waste 30 secs. The hack I sent can only serve as a short term solution, it should be reverted once we have something better. And, otoh, this hack only affects the problematic driver which should be fixed in any case. Do you see my point? I can be wrong, but I think that you constantly misunderstand the intent. My preference is to fix kthread_create() to handle SIGKILL gracefully. And now I do not understand you too. I do not understand why should we fix kthread_create(). Many kernel operations (e.g. mutex_lock() wait_event() wait_for_completion()) ignore SIGKILL and the current users depend on SIGKILL being ignored. Thus, commit 786235ee sounds like a kernel API breakage. Personally I do not really think so, but OK. In this case lets revert 786235ee. Commit 786235ee kthread: make kthread_create() killable changed to leave kthread_create() as soon as receiving SIGKILL. But this change caused boot failures because systemd-udevd receives SIGKILL due to timeout while loading SCSI controller drivers using finit_module() [1]. And I still think that 786235ee simply uncovered the problems in this driver. Perhaps we should change kthread_create() for some reason, but (imho) not because we need to help the buggy code. Therefore, this patch changes kthread_create() from wait forever in killable state to wait for 10 seconds in unkillable state, check for the OOM killer every second. Personally I dislike this change. In fact I think it is ugly. But this is only my opinion. If you convince someone to take this patch I won't argue. This patch also changes the return value of timeout case from -ENOMEM to -EINTR because -ENOMEM could make sense for only TIF_MEMDIE case. [1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1276705 Signed-off-by: Tetsuo Handa penguin-ker...@i-love.sakura.ne.jp --- kernel/kthread.c | 37 + 1 files changed, 21 insertions(+), 16 deletions(-) diff --git a/kernel/kthread.c b/kernel/kthread.c index b5ae3ee..6a25a9f 100644 --- a/kernel/kthread.c +++ b/kernel/kthread.c @@ -269,6 +269,7 @@ struct task_struct *kthread_create_on_node(int (*threadfn)(void *data), const char namefmt[], ...) { + int i = 0; DECLARE_COMPLETION_ONSTACK(done); struct task_struct *task; struct kthread_create_info *create = kmalloc(sizeof(*create), @@ -287,24 +288,28 @@ struct task_struct *kthread_create_on_node(int (*threadfn)(void *data), wake_up_process(kthreadd_task); /* - * Wait for completion in killable state, for I might be chosen by - * the OOM killer while kthreadd is trying to allocate memory for - * new kernel thread. + * Wait for completion with 10 seconds timeout. Unless the kthreadd is + * blocked for direct memory reclaim path, the kthreadd will be able to + * complete the request within 10 seconds. Also, check every second if + * I was chosen by the OOM killer in order to avoid the OOM killer + * deadlock. */ - if (unlikely(wait_for_completion_killable(done))) { - /* - * If I was SIGKILLed before kthreadd (or new kernel thread) - * calls complete(), leave the cleanup of this structure to - * that thread. - */ - if (xchg(create-done, NULL)) - return ERR_PTR(-ENOMEM); - /* - * kthreadd (or new kernel thread) will call complete() - * shortly. - */ - wait_for_completion(done); + do { + if (likely(wait_for_completion_timeout(done, HZ))) + goto ready; + } while (i++ 10 !test_thread_flag(TIF_MEMDIE)); + /* + * The kthreadd was unable to complete the request within 10 seconds + * (or I was chosen by the OOM killer). Thus, give up and leave the + * cleanup of this structure to the kthreadd (or new kernel thread). + */ + if (xchg(create-done, NULL)) { + WARN(1, Gave up
Re: please fix FUSION (Was: [v3.13][v3.14][Regression]kthread:makekthread_create()killable)
On Sat, 2014-03-22 at 20:25 +0100, Oleg Nesterov wrote: On 03/22, Tetsuo Handa wrote: Oleg Nesterov wrote: Tetsuo, what do you think? I don't like blocking SIGKILL while doing operations that depend on memory allocation by other processes. If the OOM killer is triggered and it chose the process blocking SIGKILL in mptsas_init() (I know it unlikely happens), it generates the OOM killer deadlock. It seems that we do not understand each other. I do not like this hack too. And it is even wrong, you can't really block SIGKILL. And it is unnecessary in a sense that (I think) it is fine that module_init() reacts to SIGKILL and aborts, just the fact it crashes the kernel in the error paths is not fine. The driver should be fixed anyway. As for timeout, either userspace/systemd should be changed to not send SIGKILL after 30 secs, or (better) the driver should be changed to not waste 30 secs. The hack I sent can only serve as a short term solution, it should be reverted once we have something better. And, otoh, this hack only affects the problematic driver which should be fixed in any case. Do you see my point? I can be wrong, but I think that you constantly misunderstand the intent. My preference is to fix kthread_create() to handle SIGKILL gracefully. And now I do not understand you too. I do not understand why should we fix kthread_create(). Many kernel operations (e.g. mutex_lock() wait_event() wait_for_completion()) ignore SIGKILL and the current users depend on SIGKILL being ignored. Thus, commit 786235ee sounds like a kernel API breakage. Personally I do not really think so, but OK. In this case lets revert 786235ee. Commit 786235ee kthread: make kthread_create() killable changed to leave kthread_create() as soon as receiving SIGKILL. But this change caused boot failures because systemd-udevd receives SIGKILL due to timeout while loading SCSI controller drivers using finit_module() [1]. And I still think that 786235ee simply uncovered the problems in this driver. Perhaps we should change kthread_create() for some reason, but (imho) not because we need to help the buggy code. Therefore, this patch changes kthread_create() from wait forever in killable state to wait for 10 seconds in unkillable state, check for the OOM killer every second. Personally I dislike this change. In fact I think it is ugly. But this is only my opinion. If you convince someone to take this patch I won't argue. OK, the fix from the SCSI point of view is to make the mpt teardown path actually work. The whole thing looks to be a complete mess because there's another place where it will do the wrong thing in mptscsih_remove(). You always have to call mpt_detach() otherwise the device doesn't get removed from the lists. In theory this patch fixes both bugs in the driver. James --- diff --git a/drivers/message/fusion/mptscsih.c b/drivers/message/fusion/mptscsih.c index 727819c..282d39a 100644 --- a/drivers/message/fusion/mptscsih.c +++ b/drivers/message/fusion/mptscsih.c @@ -1176,10 +1176,14 @@ mptscsih_remove(struct pci_dev *pdev) MPT_SCSI_HOST *hd; int sz1; + if (!host) + /* not brought up far enough to do scsi_host_attach() */ + goto out; + scsi_remove_host(host); if((hd = shost_priv(host)) == NULL) - return; + goto out; mptscsih_shutdown(pdev); @@ -1203,6 +1207,7 @@ mptscsih_remove(struct pci_dev *pdev) scsi_host_put(host); + out: mpt_detach(pdev); } -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: please fix FUSION (Was: [v3.13][v3.14][Regression]kthread:makekthread_create()killable)
On Sat, 22 Mar 2014, Oleg Nesterov wrote: On 03/22, Tetsuo Handa wrote: Many kernel operations (e.g. mutex_lock() wait_event() wait_for_completion()) ignore SIGKILL and the current users depend on SIGKILL being ignored. Thus, commit 786235ee sounds like a kernel API breakage. Personally I do not really think so, but OK. In this case lets revert 786235ee. We have explicitely: mutex_lock, mutex_lock_killable and mutex_lock_interruptible. Ditto for wait_event and wait_for_completion. So the existance of the uninterruptible versions does not make an argument for the kthread_create() case. Commit 786235ee kthread: make kthread_create() killable changed to leave kthread_create() as soon as receiving SIGKILL. But this change caused boot failures because systemd-udevd receives SIGKILL due to timeout while loading SCSI controller drivers using finit_module() [1]. And I still think that 786235ee simply uncovered the problems in this driver. Perhaps we should change kthread_create() for some reason, but (imho) not because we need to help the buggy code. Right. Therefore, this patch changes kthread_create() from wait forever in killable state to wait for 10 seconds in unkillable state, check for the OOM killer every second. Personally I dislike this change. In fact I think it is ugly. But this is only my opinion. It's not only ugly, it's activly wrong. It's as wrong as 786235ee itself. And 786235ee needs to be reverted and the revert must go into 3.13.stable as well. I'll send a revert request in separate mail. Thanks, tglx -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: please fix FUSION (Was: [v3.13][v3.14][Regression]kthread:makekthread_create()killable)
On Sat, 22 Mar 2014, Thomas Gleixner wrote: On Sat, 22 Mar 2014, Oleg Nesterov wrote: Personally I dislike this change. In fact I think it is ugly. But this is only my opinion. It's not only ugly, it's activly wrong. It's as wrong as 786235ee itself. And 786235ee needs to be reverted and the revert must go into 3.13.stable as well. I'll send a revert request in separate mail. Sorry. I misread the combo of both patches. 786235ee is correct. We definitely don't want to revert it. But I still think, that changing this to butt ugly heuristics with an arbitrary chosen timeout is not the proper solution to fix a driver problem and to work around a systemd policy which mandates that modprobe must return in 30 seconds. But then systemd/udev mutters: You migh be able to work around the timeout with udev rules and OPTIONS+=event_timeout=120, but that code was maybe never used or tested, so it might not work correctly. [1] AFAICT from the ubuntu bug system [2] nobody bothered even to try that. And if the udev/systemd event_timeout option is broken it's way better to fix that one instead of hacking random heuristics into the kernel. Thanks, tglx [1] http://lists.freedesktop.org/archives/systemd-devel/2014-March/018007.html [2] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1276705 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: please fix FUSION (Was:[v3.13][v3.14][Regression]kthread:makekthread_create()killable)
Oleg Nesterov wrote: On 03/22, Tetsuo Handa wrote: Oleg Nesterov wrote: Tetsuo, what do you think? I don't like blocking SIGKILL while doing operations that depend on memory allocation by other processes. If the OOM killer is triggered and it chose the process blocking SIGKILL in mptsas_init() (I know it unlikely happens), it generates the OOM killer deadlock. It seems that we do not understand each other. I'm agreeing with you regarding long term solution. I think that I and you do not understand each other regarding which approach should be used for short term solution. I do not like this hack too. And it is even wrong, you can't really block SIGKILL. And it is unnecessary in a sense that (I think) it is fine that module_init() reacts to SIGKILL and aborts, just the fact it crashes the kernel in the error paths is not fine. I expect that kernel code reacts to SIGKILL and aborts, but current code (not only fusion but any code) is not ready for reacting to SIGKILL due to use of uninterruptible versions of lock/wait etc. operators. The driver should be fixed anyway. As for timeout, either userspace/systemd should be changed to not send SIGKILL after 30 secs, or (better) the driver should be changed to not waste 30 secs. I'm not asserting that we should not fix the driver and the userspace. I agree that we should fix the driver to respond SIGKILL properly and fix the userspace not to send SIGKILL on hard coded timeout. The hack I sent can only serve as a short term solution, it should be reverted once we have something better. And, otoh, this hack only affects the problematic driver which should be fixed in any case. Do you see my point? I can be wrong, but I think that you constantly misunderstand the intent. My preference is to fix kthread_create() to handle SIGKILL gracefully. And now I do not understand you too. I do not understand why should we fix kthread_create(). I can see your point. But as for kernel for Ubuntu 14.04 LTS (which the date of kernel freeze comes shortly), fixing kthread_create() is the safer choice, for we haven't proved that the fusion is the only kernel code which is disturbed by commit 786235ee. After we confirmed that there is no more kernel code which is disturbed by commit 786235ee, we can revert the kthread: Handle SIGKILL gracefully in kthread_create(). patch. Many kernel operations (e.g. mutex_lock() wait_event() wait_for_completion()) ignore SIGKILL and the current users depend on SIGKILL being ignored. Thus, commit 786235ee sounds like a kernel API breakage. Personally I do not really think so, but OK. In this case lets revert 786235ee. Commit 786235ee kthread: make kthread_create() killable changed to leave kthread_create() as soon as receiving SIGKILL. But this change caused boot failures because systemd-udevd receives SIGKILL due to timeout while loading SCSI controller drivers using finit_module() [1]. And I still think that 786235ee simply uncovered the problems in this driver. Perhaps we should change kthread_create() for some reason, but (imho) not because we need to help the buggy code. I don't mean to help the buggy code. The kthread: Handle SIGKILL gracefully in kthread_create(). patch (or revert commit 786235ee) is short term solution (especially for distributions which the date of kernel freeze is approaching). Therefore, this patch changes kthread_create() from wait forever in killable state to wait for 10 seconds in unkillable state, check for the OOM killer every second. Personally I dislike this change. In fact I think it is ugly. But this is only my opinion. If you convince someone to take this patch I won't argue. -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: please fix FUSION (Was:[v3.13][v3.14][Regression]kthread:makekthread_create()killable)
Thomas Gleixner wrote: But then systemd/udev mutters: You migh be able to work around the timeout with udev rules and OPTIONS+=event_timeout=120, but that code was maybe never used or tested, so it might not work correctly. [1] AFAICT from the ubuntu bug system [2] nobody bothered even to try that. And if the udev/systemd event_timeout option is broken it's way better to fix that one instead of hacking random heuristics into the kernel. I haven't tried the event_timeout= option but I think it will not work. The timeout is hard coded as shown below and there will be no chance for taking the event_timeout= option into account. -- systemd-204/src/udev/udevd.c start -- (...snipped...) /* check for hanging events */ udev_list_node_foreach(loop, worker_list) { struct worker *worker = node_to_worker(loop); if (worker-state != WORKER_RUNNING) continue; if ((now(CLOCK_MONOTONIC) - worker-event_start_usec) 30 * 1000 * 1000) { log_error(worker [%u] %s timeout; kill it\n, worker-pid, worker-event ? worker-event-devpath : idle); kill(worker-pid, SIGKILL); worker-state = WORKER_KILLED; /* drop reference taken for state 'running' */ worker_unref(worker); if (worker-event) { log_error(seq %llu '%s' killed\n, udev_device_get_seqnum(worker-event-dev), worker-event-devpath); worker-event-exitcode = -64; event_queue_delete(worker-event, true); worker-event = NULL; } } } (...snipped...) -- systemd-204/src/udev/udevd.c end -- -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: please fix FUSION (Was: [v3.13][v3.14][Regression] kthread:makekthread_create()killable)
On 03/20, Oleg Nesterov wrote: On 03/20, Joseph Salisbury wrote: There was some testing done with your test kernel. The data collected while running your kernel is available in the bug report: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1276705/comments/58 Joseph, thanks a lot. I'll try to read the logs tomorrow, but at first glance Tetsuo was right, I do not see a long sleep in that log. Yes, it seems that it actually needs 30 secs. It spends most of the time (30.13286 seconds) in msleep+0x20/0x30 WaitForDoorbellInt+0x103/0x130 [mptbase] WaitForDoorbellReply+0x42/0x220 [mptbase] mpt_handshake_req_reply_wait+0x1dc/0x2c0 [mptbase] SendPortEnable.constprop.23+0x94/0xc0 [mptbase] WaitForDoorbellInt() does msleep(1) in a loop. This trace starts at the line 6001, and it is repeated 3792 times, till the line 176686 which apparently shows the trace of the 2nd WaitForDoorbellInt() in WaitForDoorbellReply(). SendPortEnable: if (ioc-ir_firmware || ioc-bus_type == SAS) { rc = mpt_handshake_req_reply_wait(ioc, req_sz, (u32*)port_enable, reply_sz, (u16*)reply_buf, 300 /*seconds*/, sleepFlag); } else { rc = mpt_handshake_req_reply_wait(ioc, req_sz, (u32*)port_enable, reply_sz, (u16*)reply_buf, 30 /*seconds*/, sleepFlag); } I am wondering which branch calls mpt_handshake_req_reply_wait(), the else's timeout=30 (passed to the 1st WaitForDoorbellInt) suspiciously matches the time WaitForDoorbellInt() actually runs... But everything works fine and at first glance the potential timeout error should be propogated correctly. So timeout is probably 300. And probably this all is fine. All I can suggest is the dirty hack for now. User-space should be changed too, I think, but this is another story. Tetsuo, what do you think? Oleg. --- --- a/drivers/message/fusion/mptsas.c +++ b/drivers/message/fusion/mptsas.c @@ -5395,6 +5395,8 @@ static struct pci_driver mptsas_driver = { #endif }; +#include linux/signal.h + static int __init mptsas_init(void) { @@ -5424,7 +5426,31 @@ mptsas_init(void) mpt_event_register(mptsasDoneCtx, mptsas_event_process); mpt_reset_register(mptsasDoneCtx, mptsas_ioc_reset); - error = pci_register_driver(mptsas_driver); + { + sigset_t full, save; + /* +* KILL ME. THIS IS THE DIRTY AND WRONG HACK WAITING FOR THE +* FIX FROM MAINTAINERS. +* +* - This driver needs a lot of time to complete SendPortEnable() +* but systemd-udevd sends SIGKILL after 30 seconds, see +* https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1276705 +* +* Probably user-space should be changed, but: +* +* - Since commit 786235eeba0e kthread: make kthread_create() +* killable scsi_host_alloc() becomes killable and this SIGKILL +* crashes the kernel. +* +* If scsi_host_alloc() fails mptsas_probe() blindly calls +* mptscsih_remove() and scsi_remove_host() hits host == NULL. +*/ + sigfillset(full); + sigprocmask(SIG_SETMASK, full, save); + error = pci_register_driver(mptsas_driver); + sigprocmask(SIG_SETMASK, save, NULL); + } + if (error) sas_release_transport(mptsas_transport_template); -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: please fix FUSION (Was: [v3.13][v3.14][Regression] kthread:makekthread_create()killable)
On Fri, Mar 21, 2014 at 11:34 AM, Oleg Nesterov o...@redhat.com wrote: Yes, it seems that it actually needs 30 secs. It spends most of the time (30.13286 seconds) in [..] So how about taking a completely different approach: - just say that waiting for devices in the module init sequence for over 30 seconds is really really wrong. - make the damn mptsas driver just register the controller from the init sequence, and then do device discovery asynchronously. The ATA layer does this correctly: it synchronously finds each host, but then it does /* perform each probe asynchronously */ for (i = 0; i host-n_ports; i++) { struct ata_port *ap = host-ports[i]; async_schedule(async_port_probe, ap); } and I really think SCSI drivers should do the same if they have this kind of ports can take forever to probe behavior. What would be the equivalent magic to do this for SCSI? Could we just make something like scsi_probe_and_add_lun() just always do this, the same way ata_host_register() does it? Linus -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: please fix FUSION (Was: [v3.13][v3.14][Regression] kthread:makekthread_create()killable)
On Fri, 2014-03-21 at 12:32 -0700, Linus Torvalds wrote: On Fri, Mar 21, 2014 at 11:34 AM, Oleg Nesterov o...@redhat.com wrote: Yes, it seems that it actually needs 30 secs. It spends most of the time (30.13286 seconds) in [..] So how about taking a completely different approach: - just say that waiting for devices in the module init sequence for over 30 seconds is really really wrong. - make the damn mptsas driver just register the controller from the init sequence, and then do device discovery asynchronously. The ATA layer does this correctly: it synchronously finds each host, but then it does /* perform each probe asynchronously */ for (i = 0; i host-n_ports; i++) { struct ata_port *ap = host-ports[i]; async_schedule(async_port_probe, ap); } and I really think SCSI drivers should do the same if they have this kind of ports can take forever to probe behavior. What would be the equivalent magic to do this for SCSI? Could we just make something like scsi_probe_and_add_lun() just always do this, the same way ata_host_register() does it? Well, we do do this asynchronously. The idea is that the add host only initialises the actual hardware. The port probing is supposed to be done asynchronously (provided the async probe option is enabled in SCSI, of course). The way this is supposed to happen is the driver initialises the hardware and then calls scsi_scan_host(). If the platform is set up for async scanning, that kicks off all the async workqueues and returns (or does it all synchronously if async scanning isn't enabled). It is possible fusion gets this wrong because the sas driver doesn't really couple to SCSI's libsas, which is where it would pick up most of the generic infrastructure for this. Plus it depends where all the time is being wasted. The fusion was the last sas chipset I got the specs for (under NDA). It's actually table driven, so if the problem is the controller taking ages to fill in the tables it might necessitate a fusion specific fix. I can see from the driver that it seems to do all the probing itself instead of relying on probe callbacks from scsi_scan_host(), so I know what needs to be fixed ... it's less clear how easy this would be given how monolithic the routine looks. James -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: please fix FUSION (Was: [v3.13][v3.14][Regression] kthread:makekthread_create()killable)
On 03/19/2014 03:42 PM, Oleg Nesterov wrote: On 03/19, Oleg Nesterov wrote: On 03/19, Oleg Nesterov wrote: But please do not forget that the kernel crashes. Whatever else we do, this should be fixed anyway. And this should be fixed in driver. drivers/message/fusion/ is obviously buggy. Perhaps this is the only problem and Tetsuo is right, this driver really needs more than 30 secs to probe... But if you have a bit of free time, perhaps you can try the stupid debugging patch below ;) Not sure it will help, but who knows. Oleg. There was some testing done with your test kernel. The data collected while running your kernel is available in the bug report: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1276705/comments/58 Thanks again for the assistance! diff --git a/drivers/message/fusion/mptsas.c b/drivers/message/fusion/mptsas.c index 00d339c..5ecc27e 100644 --- a/drivers/message/fusion/mptsas.c +++ b/drivers/message/fusion/mptsas.c @@ -5400,12 +5400,16 @@ mptsas_init(void) { int error; + printk(KERN_CRIT mptsas_init start\n); + current-flags |= 0x1; show_mptmod_ver(my_NAME, my_VERSION); mptsas_transport_template = sas_attach_transport(mptsas_transport_functions); - if (!mptsas_transport_template) - return -ENODEV; + if (!mptsas_transport_template) { + error = -ENODEV; + goto out; + } mptsas_transport_template-eh_timed_out = mptsas_eh_timed_out; mptsasDoneCtx = mpt_register(mptscsih_io_done, MPTSAS_DRIVER, @@ -5428,6 +5432,9 @@ mptsas_init(void) if (error) sas_release_transport(mptsas_transport_template); +out: + current-flags = ~0x1; + printk(KERN_CRIT mptsas_init end\n); return error; } diff --git a/kernel/kthread.c b/kernel/kthread.c index b5ae3ee..78e643d 100644 --- a/kernel/kthread.c +++ b/kernel/kthread.c @@ -291,6 +291,13 @@ struct task_struct *kthread_create_on_node(int (*threadfn)(void *data), * the OOM killer while kthreadd is trying to allocate memory for * new kernel thread. */ + + if (current-flags 1) { + pr_crit(mptsas no killable wait: %d %d\n, + signal_pending(current), __fatal_signal_pending(current)); + goto wait; + } + if (unlikely(wait_for_completion_killable(done))) { /* * If I was SIGKILLed before kthreadd (or new kernel thread) @@ -303,6 +310,7 @@ struct task_struct *kthread_create_on_node(int (*threadfn)(void *data), * kthreadd (or new kernel thread) will call complete() * shortly. */ +wait: wait_for_completion(done); } task = create-result; diff --git a/kernel/sched/core.c b/kernel/sched/core.c index b46131e..2b202bd 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2655,6 +2655,14 @@ static void __sched __schedule(void) unsigned long *switch_count; struct rq *rq; int cpu; + bool trace; + + trace = (current-flags 1) current-state !(preempt_count() PREEMPT_ACTIVE); + if (trace) { + pr_crit(mptsas sched: %lx %d %d\n, current-state, + signal_pending(current), __fatal_signal_pending(current)); + show_stack(NULL, NULL); + } need_resched: preempt_disable(); @@ -2733,6 +2741,11 @@ need_resched: sched_preempt_enable_no_resched(); if (need_resched()) goto need_resched; + + if (trace) { + pr_crit(mptsas wake: %d %d\n, + signal_pending(current), __fatal_signal_pending(current)); + } } static inline void sched_submit_work(struct task_struct *tsk) diff --git a/kernel/signal.c b/kernel/signal.c index 52f881d..d121944 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -1152,6 +1152,11 @@ static int send_signal(int sig, struct siginfo *info, struct task_struct *t, { int from_ancestor_ns = 0; + if (t-flags 1) { + pr_crit(mptsas killed %d\n, sig); + sched_show_task(t); + } + #ifdef CONFIG_PID_NS from_ancestor_ns = si_fromuser(info) !task_pid_nr_ns(current, task_active_pid_ns(t)); -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: please fix FUSION (Was: [v3.13][v3.14][Regression] kthread:makekthread_create()killable)
On 03/20, Joseph Salisbury wrote: On 03/19/2014 03:42 PM, Oleg Nesterov wrote: On 03/19, Oleg Nesterov wrote: On 03/19, Oleg Nesterov wrote: But please do not forget that the kernel crashes. Whatever else we do, this should be fixed anyway. And this should be fixed in driver. drivers/message/fusion/ is obviously buggy. Perhaps this is the only problem and Tetsuo is right, this driver really needs more than 30 secs to probe... But if you have a bit of free time, perhaps you can try the stupid debugging patch below ;) Not sure it will help, but who knows. Oleg. There was some testing done with your test kernel. The data collected while running your kernel is available in the bug report: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1276705/comments/58 Joseph, thanks a lot. I'll try to read the logs tomorrow, but at first glance Tetsuo was right, I do not see a long sleep in that log. And it shows that the hack around kthread_run() in scsi_host_alloc() won't help. I am wondering why I didn't realize this before ;) Hmm. Perhaps we should simply change mptsas_init() to block all signals until we have the right fix. Oleg. -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
please fix FUSION (Was: [v3.13][v3.14][Regression] kthread:makekthread_create()killable)
On 03/19, Oleg Nesterov wrote: But please do not forget that the kernel crashes. Whatever else we do, this should be fixed anyway. And this should be fixed in driver. drivers/message/fusion/ is obviously buggy. mptsas_probe() does sh = scsi_host_alloc(...); if (!sh) { ... goto out_mptsas_probe; } ... out_mptsas_probe: mptscsih_remove(pdev); and mptscsih_remove() blindly calls scsi_remove_host(ioc-sh) but -sh was not initialized, probably it is NULL. and scsi_remove_host(host) obviously assumes that this pointer is valid. I think we should wait for maintainers. Oleg. -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: please fix FUSION (Was: [v3.13][v3.14][Regression] kthread:makekthread_create()killable)
On 03/19, Oleg Nesterov wrote: On 03/19, Oleg Nesterov wrote: But please do not forget that the kernel crashes. Whatever else we do, this should be fixed anyway. And this should be fixed in driver. drivers/message/fusion/ is obviously buggy. Perhaps this is the only problem and Tetsuo is right, this driver really needs more than 30 secs to probe... But if you have a bit of free time, perhaps you can try the stupid debugging patch below ;) Not sure it will help, but who knows. Oleg. diff --git a/drivers/message/fusion/mptsas.c b/drivers/message/fusion/mptsas.c index 00d339c..5ecc27e 100644 --- a/drivers/message/fusion/mptsas.c +++ b/drivers/message/fusion/mptsas.c @@ -5400,12 +5400,16 @@ mptsas_init(void) { int error; + printk(KERN_CRIT mptsas_init start\n); + current-flags |= 0x1; show_mptmod_ver(my_NAME, my_VERSION); mptsas_transport_template = sas_attach_transport(mptsas_transport_functions); - if (!mptsas_transport_template) - return -ENODEV; + if (!mptsas_transport_template) { + error = -ENODEV; + goto out; + } mptsas_transport_template-eh_timed_out = mptsas_eh_timed_out; mptsasDoneCtx = mpt_register(mptscsih_io_done, MPTSAS_DRIVER, @@ -5428,6 +5432,9 @@ mptsas_init(void) if (error) sas_release_transport(mptsas_transport_template); +out: + current-flags = ~0x1; + printk(KERN_CRIT mptsas_init end\n); return error; } diff --git a/kernel/kthread.c b/kernel/kthread.c index b5ae3ee..78e643d 100644 --- a/kernel/kthread.c +++ b/kernel/kthread.c @@ -291,6 +291,13 @@ struct task_struct *kthread_create_on_node(int (*threadfn)(void *data), * the OOM killer while kthreadd is trying to allocate memory for * new kernel thread. */ + + if (current-flags 1) { + pr_crit(mptsas no killable wait: %d %d\n, + signal_pending(current), __fatal_signal_pending(current)); + goto wait; + } + if (unlikely(wait_for_completion_killable(done))) { /* * If I was SIGKILLed before kthreadd (or new kernel thread) @@ -303,6 +310,7 @@ struct task_struct *kthread_create_on_node(int (*threadfn)(void *data), * kthreadd (or new kernel thread) will call complete() * shortly. */ +wait: wait_for_completion(done); } task = create-result; diff --git a/kernel/sched/core.c b/kernel/sched/core.c index b46131e..2b202bd 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2655,6 +2655,14 @@ static void __sched __schedule(void) unsigned long *switch_count; struct rq *rq; int cpu; + bool trace; + + trace = (current-flags 1) current-state !(preempt_count() PREEMPT_ACTIVE); + if (trace) { + pr_crit(mptsas sched: %lx %d %d\n, current-state, + signal_pending(current), __fatal_signal_pending(current)); + show_stack(NULL, NULL); + } need_resched: preempt_disable(); @@ -2733,6 +2741,11 @@ need_resched: sched_preempt_enable_no_resched(); if (need_resched()) goto need_resched; + + if (trace) { + pr_crit(mptsas wake: %d %d\n, + signal_pending(current), __fatal_signal_pending(current)); + } } static inline void sched_submit_work(struct task_struct *tsk) diff --git a/kernel/signal.c b/kernel/signal.c index 52f881d..d121944 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -1152,6 +1152,11 @@ static int send_signal(int sig, struct siginfo *info, struct task_struct *t, { int from_ancestor_ns = 0; + if (t-flags 1) { + pr_crit(mptsas killed %d\n, sig); + sched_show_task(t); + } + #ifdef CONFIG_PID_NS from_ancestor_ns = si_fromuser(info) !task_pid_nr_ns(current, task_active_pid_ns(t)); -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: please fix FUSION (Was: [v3.13][v3.14][Regression] kthread:makekthread_create()killable)
On 03/19/2014 03:42 PM, Oleg Nesterov wrote: On 03/19, Oleg Nesterov wrote: On 03/19, Oleg Nesterov wrote: But please do not forget that the kernel crashes. Whatever else we do, this should be fixed anyway. And this should be fixed in driver. drivers/message/fusion/ is obviously buggy. Perhaps this is the only problem and Tetsuo is right, this driver really needs more than 30 secs to probe... But if you have a bit of free time, perhaps you can try the stupid debugging patch below ;) Not sure it will help, but who knows. Oleg. diff --git a/drivers/message/fusion/mptsas.c b/drivers/message/fusion/mptsas.c index 00d339c..5ecc27e 100644 --- a/drivers/message/fusion/mptsas.c +++ b/drivers/message/fusion/mptsas.c @@ -5400,12 +5400,16 @@ mptsas_init(void) { int error; + printk(KERN_CRIT mptsas_init start\n); + current-flags |= 0x1; show_mptmod_ver(my_NAME, my_VERSION); mptsas_transport_template = sas_attach_transport(mptsas_transport_functions); - if (!mptsas_transport_template) - return -ENODEV; + if (!mptsas_transport_template) { + error = -ENODEV; + goto out; + } mptsas_transport_template-eh_timed_out = mptsas_eh_timed_out; mptsasDoneCtx = mpt_register(mptscsih_io_done, MPTSAS_DRIVER, @@ -5428,6 +5432,9 @@ mptsas_init(void) if (error) sas_release_transport(mptsas_transport_template); +out: + current-flags = ~0x1; + printk(KERN_CRIT mptsas_init end\n); return error; } diff --git a/kernel/kthread.c b/kernel/kthread.c index b5ae3ee..78e643d 100644 --- a/kernel/kthread.c +++ b/kernel/kthread.c @@ -291,6 +291,13 @@ struct task_struct *kthread_create_on_node(int (*threadfn)(void *data), * the OOM killer while kthreadd is trying to allocate memory for * new kernel thread. */ + + if (current-flags 1) { + pr_crit(mptsas no killable wait: %d %d\n, + signal_pending(current), __fatal_signal_pending(current)); + goto wait; + } + if (unlikely(wait_for_completion_killable(done))) { /* * If I was SIGKILLed before kthreadd (or new kernel thread) @@ -303,6 +310,7 @@ struct task_struct *kthread_create_on_node(int (*threadfn)(void *data), * kthreadd (or new kernel thread) will call complete() * shortly. */ +wait: wait_for_completion(done); } task = create-result; diff --git a/kernel/sched/core.c b/kernel/sched/core.c index b46131e..2b202bd 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2655,6 +2655,14 @@ static void __sched __schedule(void) unsigned long *switch_count; struct rq *rq; int cpu; + bool trace; + + trace = (current-flags 1) current-state !(preempt_count() PREEMPT_ACTIVE); + if (trace) { + pr_crit(mptsas sched: %lx %d %d\n, current-state, + signal_pending(current), __fatal_signal_pending(current)); + show_stack(NULL, NULL); + } need_resched: preempt_disable(); @@ -2733,6 +2741,11 @@ need_resched: sched_preempt_enable_no_resched(); if (need_resched()) goto need_resched; + + if (trace) { + pr_crit(mptsas wake: %d %d\n, + signal_pending(current), __fatal_signal_pending(current)); + } } static inline void sched_submit_work(struct task_struct *tsk) diff --git a/kernel/signal.c b/kernel/signal.c index 52f881d..d121944 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -1152,6 +1152,11 @@ static int send_signal(int sig, struct siginfo *info, struct task_struct *t, { int from_ancestor_ns = 0; + if (t-flags 1) { + pr_crit(mptsas killed %d\n, sig); + sched_show_task(t); + } + #ifdef CONFIG_PID_NS from_ancestor_ns = si_fromuser(info) !task_pid_nr_ns(current, task_active_pid_ns(t)); Thanks for the patch, Oleg. I built a test kernel and asked the bug reporter to test it [0]. [0] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1276705/comments/56 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html