Re: [PATCH v3 1/3] init / kthread: add module_long_probe_init() and module_long_probe_exit()
Luis R. Rodriguez wrote: Tetsuo bisected and found that commit 786235ee \kthread: make kthread_create() killable\ modified kthread_create() to bail as soon as SIGKILL is received. I just wrote commit 786235ee. It is not Tetsuo who bisected it. @@ -128,4 +129,38 @@ bool queue_kthread_work(struct kthread_worker *worker, void flush_kthread_work(struct kthread_work *work); void flush_kthread_worker(struct kthread_worker *worker); +#ifndef MODULE + +#define module_long_probe_init(x) __initcall(x); +#define module_long_probe_exit(x) __exitcall(x); + +#else +/* To be used by modules which can take over 30 seconds at probe */ +#define module_long_probe_init(initfn)\\ + static struct task_struct *__init_thread; \\ + static int _long_probe_##initfn(void *arg) \\ + { \\ + return initfn();\\ + } \\ + static inline __init int __long_probe_##initfn(void) \\ + { \\ + __init_thread = kthread_run(_long_probe_##initfn,\\ + NULL, \\ + #initfn); \\ + if (IS_ERR(__init_thread)) \\ + return PTR_ERR(__init_thread); \\ + return 0; \\ + } \\ + module_init(__long_probe_##initfn); +/* To be used by modules that require module_long_probe_init() */ +#define module_long_probe_exit(exitfn)\\ + static inline void __long_probe_##exitfn(void) \\ + { \\ + exitfn(); \\ exitfn() must not be called if initfn() failed or has not completed yet. You need a bool variable for indicating that we are ready to call exitfn(). Also, subsequent userspace operations may fail if we return to userspace before initfn() completes (e.g. device nodes are not created yet). + if (__init_thread)\\ + kthread_stop(__init_thread); \\ We can\'t use kthread_stop() here because we have to wait for initfn() to succeed before exitfn() is called. + } \\ + module_exit(__long_probe_##exitfn); +#endif /* MODULE */ + #endif /* _LINUX_KTHREAD_H */ -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 2/4] driver core: enable drivers to use deferred probefrom init
Greg KH wrote: Why doesn't it work? Doesn't modprobe come right back and the init sequence still takes a while to run? What exactly fails? I guess ... @@ -5429,9 +5429,19 @@ mptsas_init(void) return error; } +static struct task_struct *init_thread; + +static int __init +mptsas_init(void) +{ + init_thread = kthread_run(mptsas_real_init, NULL, mptsas_init); + return 0; +} + static void __exit mptsas_exit(void) { + kthread_stop(init_thread); pci_unregister_driver(mptsas_driver); sas_release_transport(mptsas_transport_template); kthread_run() can fail. sas_attach_transport() and/or pci_register_driver() in mptsas_real_init() can fail. Caller process may fail to continue if sas_attach_transport() and pci_register_driver() in mptsas_real_init() has not completed yet. kthread_stop() must not be called when kthread_run() failed. pci_unregister_driver() and/or sas_release_transport() must not be called when mptsas_real_init() did not return 0 (or has not returned 0 yet). -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 2/4] driver core: enable drivers to use deferred probefrom init
Luis R. Rodriguez wrote: On Mon, Jul 28, 2014 at 5:35 PM, Greg KH gre...@linuxfoundation.org wrote: On Mon, Jul 28, 2014 at 05:26:34PM -0700, Luis R. Rodriguez wrote: To ignore SIGKILL ? Sorry, I thought this was a userspace change that caused this. As it's a kernel change, well, maybe that patch should be reverted... That's certainly viable. Oleg? I don't want to revert that patch. I'm trying to reduce use of blocking operations that wait in unkillable state, for the OOM killer is optimistic enough to wait forever even if the OOM-killed process cannot be terminated due to having dependency on other threads that are waiting the OOM-killed process to terminate and release memory. Linux kernel is too optimistic about memory reclaim; memory allocation/reclaim deadlock is not detectable. If its reverted we won't know which drivers are hitting over the new 30 second limit requirement imposed by userspace, which the culprit commit forces failure on probe now. This series at least would allow us to annotate which drivers are brake the 30 second init limit, and enable a work around alternative if we wish to not revert the commit. This series essentially should be considered an alternative solution to what was proposed initially by Joseph Salisbury, it may not be perfect but its a proposal. I welcome others. (...snipped...) This also just addresses this *one* Ethernet driver, there was that SCSI driver that created the original report -- Canonical merged Joseph's fix as a work around, there is no general solution for this yet, and again with that work around you won't find which drivers run into this issue. There may be others found later so this is why I resorted to the deferred workqueue as a solution for now and to enable us to annotate which drivers need fixing as I expect getting the fix done / everyone happy can take time. If you want to know which drivers are hitting over the new 30 second limit requirement but don't want to break the boot, I think we can do what Ubuntu chose as a work around + a warning message like below. diff --git a/kernel/kthread.c b/kernel/kthread.c index c2390f4..43da2b1 100644 --- a/kernel/kthread.c +++ b/kernel/kthread.c @@ -292,6 +292,26 @@ struct task_struct *kthread_create_on_node(int (*threadfn)(void *data), * new kernel thread. */ if (unlikely(wait_for_completion_killable(done))) { + int i = 0; + + /* +* I got SIGKILL, but wait for 10 more seconds for completion +* unless chosen by the OOM killer. This delay is there as a +* workaround for boot failure caused by SIGKILL upon device +* driver initialization timeout. +*/ + if (!test_tsk_thread_flag(current, TIF_MEMDIE)) { + pr_warn(I already got SIGKILL but ignoring it up to + 10 seconds, in case the caller can't survive + fail-immediately-upon-SIGKILL event. + Please make sure that the caller can survive + this event, for this delay will be removed + in the future.\n); + dump_stack(); + } + while (i++ 10 !test_tsk_thread_flag(current, TIF_MEMDIE)) + if (wait_for_completion_timeout(done, HZ)) + goto ready; /* * If I was SIGKILLed before kthreadd (or new kernel thread) * calls complete(), leave the cleanup of this structure to @@ -305,6 +325,7 @@ struct task_struct *kthread_create_on_node(int (*threadfn)(void *data), */ wait_for_completion(done); } +ready: task = create-result; if (!IS_ERR(task)) { static const struct sched_param param = { .sched_priority = 0 }; Hannes Reinecke wrote: Well ... from my POV there are two issues here: The one is that systemd essentially nailed its internal worker timeout to 30 seconds. And there is no way of modifying that short of recompiling. Which should be fixed, so that at least one can modify this timeout. The other one is that systemd killing a worker by sending SIGKILL, which will kill modprobe terminally. Which definitely needs to be fixed. But if we have both issues resolved (eg by allowing udevd to use a longer timeout and revert the latter patch) we can identify the offending drivers _and_ get the system to boot by simply adding a kernel commandline parameter. Which is _far_ preferrable from a maintenance perspective. Users tend to become annoyed if their system doesn't boot ... The proposal which allows longer timeout was expired. https://bugs.launchpad.net/bugs/1297248 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to
Re: [PATCH v2 2/4] driver core: enable drivers to use deferred probefrom init
Luis R. Rodriguez wrote: Tetsuo is it possible / desirable to allow tasks to not kill unless the reason is OOM ? Its unclear if this was discussed before, sorry if it was, have just been a bit busy today to review the archive / discussions on this. Are we aware that the 10 seconds timeout after SIGKILL is not the duration between the beginning of module loading and the end of kthread_create() but the duration to wait for kthreadd to create a new kernel thread? If the kthreadd is unable to create a new kernel thread within 10 seconds, something very bad is happening. For example, memory allocation deadlock sequence shown below might be happening. (1) process1 holds a mutex using mutex_lock(). (2) process1 calls kthread_create() and enters into killable wait state at wait_for_completion_killable(). (3) kthreadd calls kernel_thread() and enters into oom-killable busy loop due to out of memory at alloc_pages_nodemask(). (4) process2 is chosen by the OOM killer, but process2 is unable to terminate because process2 is waiting in unkillable state at mutex_lock() which was held by process1 at (1). (5) kthreadd continues busy loop because process2 does not release memory and the OOM killer does not kill more processes. (6) process1 continues waiting in oom-killable state because process1 is not chosen by the OOM killer. See? The system will remain unresponding unless somebody releases memory that is enough for kthreadd to complete. We cannot teach process1 that process1 needs to give up waiting for kthreadd and call mutex_unlock() in order to allow process2 to terminate. Also, we cannot teach the OOM killer that process1 needs to be oom-killed after process2 is oom-killed. Making the 10 seconds timeout after SIGKILL longer is safe. Changing it to no-timeout-unless-oom-killed is unsafe. -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: please fix FUSION (Was: [v3.13][v3.14][Regression]kthread:makekthread_create()killable)
Oleg Nesterov wrote: Tetsuo, what do you think? I don't like blocking SIGKILL while doing operations that depend on memory allocation by other processes. If the OOM killer is triggered and it chose the process blocking SIGKILL in mptsas_init() (I know it unlikely happens), it generates the OOM killer deadlock. My preference is to fix kthread_create() to handle SIGKILL gracefully. kthread_create() did not return upon SIGKILL before commit 786235ee. Since commit 786235ee, there is imbalance that kmalloc(GFP_KERNEL) in kthread_create_on_node() ignores SIGKILL unless TIF_MEMDIE is set but wait_for_completion_killable() in kthread_create_on_node() does not ignore SIGKILL even if TIF_MEMDIE is not set. Many kernel operations (e.g. mutex_lock() wait_event() wait_for_completion()) ignore SIGKILL and the current users depend on SIGKILL being ignored. Thus, commit 786235ee sounds like a kernel API breakage. -- From 731f1f6dec7efaa132f751c432079b9b1fdb78d2 Mon Sep 17 00:00:00 2001 From: Tetsuo Handa penguin-ker...@i-love.sakura.ne.jp Date: Sat, 22 Mar 2014 15:16:50 +0900 Subject: [PATCH] kthread: Handle SIGKILL gracefully in kthread_create(). Commit 786235ee kthread: make kthread_create() killable changed to leave kthread_create() as soon as receiving SIGKILL. But this change caused boot failures because systemd-udevd receives SIGKILL due to timeout while loading SCSI controller drivers using finit_module() [1]. Therefore, this patch changes kthread_create() from wait forever in killable state to wait for 10 seconds in unkillable state, check for the OOM killer every second. This patch also changes the return value of timeout case from -ENOMEM to -EINTR because -ENOMEM could make sense for only TIF_MEMDIE case. [1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1276705 Signed-off-by: Tetsuo Handa penguin-ker...@i-love.sakura.ne.jp --- kernel/kthread.c | 37 + 1 files changed, 21 insertions(+), 16 deletions(-) diff --git a/kernel/kthread.c b/kernel/kthread.c index b5ae3ee..6a25a9f 100644 --- a/kernel/kthread.c +++ b/kernel/kthread.c @@ -269,6 +269,7 @@ struct task_struct *kthread_create_on_node(int (*threadfn)(void *data), const char namefmt[], ...) { + int i = 0; DECLARE_COMPLETION_ONSTACK(done); struct task_struct *task; struct kthread_create_info *create = kmalloc(sizeof(*create), @@ -287,24 +288,28 @@ struct task_struct *kthread_create_on_node(int (*threadfn)(void *data), wake_up_process(kthreadd_task); /* -* Wait for completion in killable state, for I might be chosen by -* the OOM killer while kthreadd is trying to allocate memory for -* new kernel thread. +* Wait for completion with 10 seconds timeout. Unless the kthreadd is +* blocked for direct memory reclaim path, the kthreadd will be able to +* complete the request within 10 seconds. Also, check every second if +* I was chosen by the OOM killer in order to avoid the OOM killer +* deadlock. */ - if (unlikely(wait_for_completion_killable(done))) { - /* -* If I was SIGKILLed before kthreadd (or new kernel thread) -* calls complete(), leave the cleanup of this structure to -* that thread. -*/ - if (xchg(create-done, NULL)) - return ERR_PTR(-ENOMEM); - /* -* kthreadd (or new kernel thread) will call complete() -* shortly. -*/ - wait_for_completion(done); + do { + if (likely(wait_for_completion_timeout(done, HZ))) + goto ready; + } while (i++ 10 !test_thread_flag(TIF_MEMDIE)); + /* +* The kthreadd was unable to complete the request within 10 seconds +* (or I was chosen by the OOM killer). Thus, give up and leave the +* cleanup of this structure to the kthreadd (or new kernel thread). +*/ + if (xchg(create-done, NULL)) { + WARN(1, Gave up waiting for kthreadd.\n); + return ERR_PTR(-EINTR); } + /* kthreadd (or new kernel thread) will call complete() shortly. */ + wait_for_completion(done); +ready: task = create-result; if (!IS_ERR(task)) { static const struct sched_param param = { .sched_priority = 0 }; -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: please fix FUSION (Was:[v3.13][v3.14][Regression]kthread:makekthread_create()killable)
Oleg Nesterov wrote: On 03/22, Tetsuo Handa wrote: Oleg Nesterov wrote: Tetsuo, what do you think? I don't like blocking SIGKILL while doing operations that depend on memory allocation by other processes. If the OOM killer is triggered and it chose the process blocking SIGKILL in mptsas_init() (I know it unlikely happens), it generates the OOM killer deadlock. It seems that we do not understand each other. I'm agreeing with you regarding long term solution. I think that I and you do not understand each other regarding which approach should be used for short term solution. I do not like this hack too. And it is even wrong, you can't really block SIGKILL. And it is unnecessary in a sense that (I think) it is fine that module_init() reacts to SIGKILL and aborts, just the fact it crashes the kernel in the error paths is not fine. I expect that kernel code reacts to SIGKILL and aborts, but current code (not only fusion but any code) is not ready for reacting to SIGKILL due to use of uninterruptible versions of lock/wait etc. operators. The driver should be fixed anyway. As for timeout, either userspace/systemd should be changed to not send SIGKILL after 30 secs, or (better) the driver should be changed to not waste 30 secs. I'm not asserting that we should not fix the driver and the userspace. I agree that we should fix the driver to respond SIGKILL properly and fix the userspace not to send SIGKILL on hard coded timeout. The hack I sent can only serve as a short term solution, it should be reverted once we have something better. And, otoh, this hack only affects the problematic driver which should be fixed in any case. Do you see my point? I can be wrong, but I think that you constantly misunderstand the intent. My preference is to fix kthread_create() to handle SIGKILL gracefully. And now I do not understand you too. I do not understand why should we fix kthread_create(). I can see your point. But as for kernel for Ubuntu 14.04 LTS (which the date of kernel freeze comes shortly), fixing kthread_create() is the safer choice, for we haven't proved that the fusion is the only kernel code which is disturbed by commit 786235ee. After we confirmed that there is no more kernel code which is disturbed by commit 786235ee, we can revert the kthread: Handle SIGKILL gracefully in kthread_create(). patch. Many kernel operations (e.g. mutex_lock() wait_event() wait_for_completion()) ignore SIGKILL and the current users depend on SIGKILL being ignored. Thus, commit 786235ee sounds like a kernel API breakage. Personally I do not really think so, but OK. In this case lets revert 786235ee. Commit 786235ee kthread: make kthread_create() killable changed to leave kthread_create() as soon as receiving SIGKILL. But this change caused boot failures because systemd-udevd receives SIGKILL due to timeout while loading SCSI controller drivers using finit_module() [1]. And I still think that 786235ee simply uncovered the problems in this driver. Perhaps we should change kthread_create() for some reason, but (imho) not because we need to help the buggy code. I don't mean to help the buggy code. The kthread: Handle SIGKILL gracefully in kthread_create(). patch (or revert commit 786235ee) is short term solution (especially for distributions which the date of kernel freeze is approaching). Therefore, this patch changes kthread_create() from wait forever in killable state to wait for 10 seconds in unkillable state, check for the OOM killer every second. Personally I dislike this change. In fact I think it is ugly. But this is only my opinion. If you convince someone to take this patch I won't argue. -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: please fix FUSION (Was:[v3.13][v3.14][Regression]kthread:makekthread_create()killable)
Thomas Gleixner wrote: But then systemd/udev mutters: You migh be able to work around the timeout with udev rules and OPTIONS+=event_timeout=120, but that code was maybe never used or tested, so it might not work correctly. [1] AFAICT from the ubuntu bug system [2] nobody bothered even to try that. And if the udev/systemd event_timeout option is broken it's way better to fix that one instead of hacking random heuristics into the kernel. I haven't tried the event_timeout= option but I think it will not work. The timeout is hard coded as shown below and there will be no chance for taking the event_timeout= option into account. -- systemd-204/src/udev/udevd.c start -- (...snipped...) /* check for hanging events */ udev_list_node_foreach(loop, worker_list) { struct worker *worker = node_to_worker(loop); if (worker-state != WORKER_RUNNING) continue; if ((now(CLOCK_MONOTONIC) - worker-event_start_usec) 30 * 1000 * 1000) { log_error(worker [%u] %s timeout; kill it\n, worker-pid, worker-event ? worker-event-devpath : idle); kill(worker-pid, SIGKILL); worker-state = WORKER_KILLED; /* drop reference taken for state 'running' */ worker_unref(worker); if (worker-event) { log_error(seq %llu '%s' killed\n, udev_device_get_seqnum(worker-event-dev), worker-event-devpath); worker-event-exitcode = -64; event_queue_delete(worker-event, true); worker-event = NULL; } } } (...snipped...) -- systemd-204/src/udev/udevd.c end -- -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [v3.13][v3.14][Regression] kthread:makekthread_create()killable
Oleg Nesterov wrote: If we need the urgent hack to fix the regression, then I suggest to change scsi_host_alloc() temporary until mptsas (or whatever) is fixed. Device initialization taking longer than 30 seconds is possible and is not a hang up. It is systemd which needs to be fixed. Perhaps systemd needs the fix too, I do not know. But this is irrelevant, I think. Or at least this should be discussed separately. I confirmed that this problem goes away if systemd-udevd supports longer timeout. kthread_run() can fail anyway, mptsas_probe() should not crash the kernel. Right. But mptsas_probe() triggering an OOPS is irrelevant to kthread_run() ( comment #27 ). And btw, it is not clear to me if in this case device initialization really needs more than 30 seconds... My understanding is probably wrong, so please correct me. But it seems that before your make kthread_create() killable - probe hangs - SIGKILL wakes it up - so I assume that the probe was interrupted and didn't finish correctly ??? - initialization continues, does scsi_host_alloc(), etc, and everything works fine even if probe was interrupted? I confirmed that device initialization really took more than 30 seconds ( comments #51 and #52 ). So perhaps that probe should not hang and this should be fixed too ? Do you know where exactly it hangs? And where it is woken up by SIGKILL ? Or I totally misunderstood ? The probe did not hang. SIGKILL affected only wait_for_completion_killable() in kthread_create_on_node() called by mptsas_probe() via scsi_host_alloc(). Thus, the probe was interrupted because kthread_run() returned an error. I think we need a bit different version, in order to take TIF_MEMDIE flag into account at the caller of kthread_create(), for the purpose of commit 786235ee is try to die as soon as possible if chosen by the OOM killer. for (;;) { shost-ehandler = kthread_run(scsi_error_handler, shost, scsi_eh_%d, shost-host_no); if (PTR_ERR(shost-ehandler) != -EINTR || test_thread_flag(TIF_MEMDIE)) Well, personally I do not care about TIF_MEMDIE/oom at all. We need the temporary hack (unless we have the right fix right now) which should be reverted later. I do seriously care about TIF_MEMDIE/oom. Last week I respond to a trouble which hit kernel: request_module() OOM local DoS (RHBZ #853474) without any malice. Not sure I understand... Yes, wait_for_completion_killable() can return immediately if TIF_SIGPENDING will be set again for any reason. Say, another signal. But the next iteration will clear TIF_SIGPENDING ? As I think it is difficult to prove that kmalloc(GFP_KERNEL) never sets TIF_SIGPENDING flag Ah, I see, you mean that kmalloc() can do this every time. No, this should not happen or we have another problem. Then, what happens if somebody does while (1) kill(pid, SIGKILL); where pid is the process calling kthread_run() from the for (;;) loop in scsi_host_alloc()? Theoretically, it will form an infinite retry loop. Clearing TIF_SIGPENDING does not guarantee that next wait_for_completion_killable() does not return immediately. Doing retry decision at scsi_host_alloc() will make things worse than doing it at kthread_create_on_node(). Anyway. I agree with any hack in scsi_host_alloc/etc, this is up to maintainers. I still think that your change uncovered the problems in drivers/message/fusion/, these problems should be fixed somehow. Dear maintainers, we need your help. Right. We found that we can fix this problem by updating systemd-udevd to support longer timeout ( comment #53 ). Joseph, would you consult systemd maintainers? -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [v3.13][v3.14][Regression] kthread: makekthread_create()killable
Oleg Nesterov wrote: If we need the urgent hack to fix the regression, then I suggest to change scsi_host_alloc() temporary until mptsas (or whatever) is fixed. Device initialization taking longer than 30 seconds is possible and is not a hang up. It is systemd which needs to be fixed. --- x/drivers/scsi/hosts.c +++ x/drivers/scsi/hosts.c @@ -447,8 +447,18 @@ struct Scsi_Host *scsi_host_alloc(struct dev_set_name(shost-shost_dev, host%d, shost-host_no); shost-shost_dev.groups = scsi_sysfs_shost_attr_groups; - shost-ehandler = kthread_run(scsi_error_handler, shost, - scsi_eh_%d, shost-host_no); + /* + * HUGE COMMENT. and kthread_create() needs s/ENOMEM/EINTR/. + */ + for (;;) { + shost-ehandler = kthread_run(scsi_error_handler, shost, + scsi_eh_%d, shost-host_no); + if (!IS_ERR(shost-ehandler) || PTR_ERR(shost-ehandler) != -EINTR) + break; + clear_thread_flag(TIF_SIGPENDING); + } + recalc_sigpending(); + if (IS_ERR(shost-ehandler)) { printk(KERN_WARNING scsi%d: error handler thread failed to spawn, error = %ld\n, shost-host_no, PTR_ERR(shost-ehandler)); I think we need a bit different version, in order to take TIF_MEMDIE flag into account at the caller of kthread_create(), for the purpose of commit 786235ee is try to die as soon as possible if chosen by the OOM killer. for (;;) { shost-ehandler = kthread_run(scsi_error_handler, shost, scsi_eh_%d, shost-host_no); if (PTR_ERR(shost-ehandler) != -EINTR || test_thread_flag(TIF_MEMDIE)) break; clear_thread_flag(TIF_SIGPENDING); } recalc_sigpending(); But I have two worrying points. (1) Changing return code from -ENOMEM to -EINTR may not be sufficient. If kmalloc(GFP_KERNEL) in kthread_create_on_node() does something that calls recalc_sigpending(), TIF_SIGPENDING will be set on the second call to kthread_run(). This will make wait_for_completion_killable() return -EINTR immediately because the second call to kthread_run() happens only when current thread already received SIGKILL (by other than the OOM killer). This may form an infinite busy loop. As I think it is difficult to prove that kmalloc(GFP_KERNEL) never sets TIF_SIGPENDING flag, we would need to call clear_thread_flag(TIF_SIGPENDING) immediately before wait_for_completion_killable() and call recalc_sigpending() immediately after wait_for_completion_killable(). Is this better than taking care of SIGKILL (by other than the OOM killer) on the first call to kthread_run() ? (2) I don't like scattering around test_thread_flag(TIF_MEMDIE), for there might be other drivers who receive SIGKILL by systemd's 30 seconds timeout. -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] SCSI: buslogic: Added check for DMA mapping errors (was Re:[BusLogic] DMA-API: device driver failed to check map error)
Khalid Aziz wrote: Added check for DMA mapping errors for request sense data buffer. Checking for mapping error can avoid potential wild writes. This patch was prompted by the warning from dma_unmap when kernel is compiled with CONFIG_DMA_API_DEBUG. This patch fixes CONFIG_DMA_API_DEBUG warning. But excuse me, is this error path correct? @@ -309,16 +309,17 @@ static struct blogic_ccb *blogic_alloc_ccb(struct blogic_adapter *adapter) blogic_dealloc_ccb deallocates a CCB, returning it to the Host Adapter's free list. The Host Adapter's Lock should already have been acquired by the caller. */ -static void blogic_dealloc_ccb(struct blogic_ccb *ccb) +static void blogic_dealloc_ccb(struct blogic_ccb *ccb, int dma_unmap) { struct blogic_adapter *adapter = ccb-adapter; scsi_dma_unmap(ccb-command); blogic_dealloc_ccb() uses ccb-command. But - pci_unmap_single(adapter-pci_device, ccb-sensedata, + if (dma_unmap) + pci_unmap_single(adapter-pci_device, ccb-sensedata, ccb-sense_datalen, PCI_DMA_FROMDEVICE); ccb-command = NULL; ccb-status = BLOGIC_CCB_FREE; ccb-next = adapter-free_ccbs; @@ -3177,13 +3179,21 @@ static int blogic_qcmd_lck(struct scsi_cmnd *command, ccb-legacy_tag = queuetag; } } memcpy(ccb-cdb, cdb, cdblen); ccb-sense_datalen = SCSI_SENSE_BUFFERSIZE; - ccb-sensedata = pci_map_single(adapter-pci_device, + sense_buf = pci_map_single(adapter-pci_device, command-sense_buffer, ccb-sense_datalen, PCI_DMA_FROMDEVICE); + if (dma_mapping_error(adapter-pci_device-dev, sense_buf)) { + blogic_err(DMA mapping for sense data buffer failed\n, + adapter); + scsi_dma_map(command); + blogic_dealloc_ccb(ccb, 0); when was ccb-command = command; called before calling blogic_dealloc_ccb()? + return SCSI_MLQUEUE_HOST_BUSY; + } + ccb-sensedata = sense_buf; ccb-command = command; command-scsi_done = comp_cb; if (blogic_multimaster_type(adapter)) { /* Place the CCB in an Outgoing Mailbox. The higher levels Also, what happens if scsi_dma_map(command); returned -ENOMEM ? If you are calling scsi_dma_map() because blogic_dealloc_ccb() calls scsi_dma_unmap(), why can't we do like { struct blogic_adapter *adapter = ccb-adapter; ccb-command = NULL; ccb-status = BLOGIC_CCB_FREE; ccb-next = adapter-free_ccbs; adapter-free_ccbs = ccb; } instead of scsi_dma_map(command); blogic_dealloc_ccb(ccb); ? -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] SCSI: buslogic: Added check for DMA mapping errors (wasRe:[BusLogic] DMA-API: device driver failed to check map error)
Khalid Aziz wrote: Added check for DMA mapping errors for request sense data buffer. Checking for mapping error can avoid potential wild writes. This patch was prompted by the warning from dma_unmap when kernel is compiled with CONFIG_DMA_API_DEBUG. This patch looks OK and works OK to me. Thank you. -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html