Re: [PATCH v3 1/3] init / kthread: add module_long_probe_init() and module_long_probe_exit()

2014-08-12 Thread Tetsuo Handa
Luis R. Rodriguez wrote:
 Tetsuo bisected and found that commit 786235ee \kthread: make
 kthread_create() killable\ modified kthread_create() to bail as
 soon as SIGKILL is received.

I just wrote commit 786235ee. It is not Tetsuo who bisected it.

 @@ -128,4 +129,38 @@ bool queue_kthread_work(struct kthread_worker *worker,
 void flush_kthread_work(struct kthread_work *work);
 void flush_kthread_worker(struct kthread_worker *worker);
 
 +#ifndef MODULE
 +
 +#define module_long_probe_init(x)  __initcall(x);
 +#define module_long_probe_exit(x)  __exitcall(x);
 +
 +#else
 +/* To be used by modules which can take over 30 seconds at probe */
 +#define module_long_probe_init(initfn)\\
 + static struct task_struct *__init_thread;  \\
 + static int _long_probe_##initfn(void *arg)  \\
 + {   \\
 +  return initfn();\\
 + }   \\
 + static inline __init int __long_probe_##initfn(void) \\
 + {   \\
 +  __init_thread = kthread_run(_long_probe_##initfn,\\
 + NULL,  \\
 + #initfn);  \\
 +  if (IS_ERR(__init_thread))   \\
 +   return PTR_ERR(__init_thread);  \\
 +  return 0; \\
 + }   \\
 + module_init(__long_probe_##initfn);
 +/* To be used by modules that require module_long_probe_init() */
 +#define module_long_probe_exit(exitfn)\\
 + static inline void __long_probe_##exitfn(void)  \\
 + {   \\
 +  exitfn(); \\

exitfn() must not be called if initfn() failed or has not
completed yet. You need a bool variable for indicating that
we are ready to call exitfn().

Also, subsequent userspace operations may fail if
we return to userspace before initfn() completes
(e.g. device nodes are not created yet).

 +  if (__init_thread)\\
 +   kthread_stop(__init_thread);  \\

We can\'t use kthread_stop() here because we have to
wait for initfn() to succeed before exitfn() is called.

 + }   \\
 + module_exit(__long_probe_##exitfn);
 +#endif /* MODULE */
 +
 #endif /* _LINUX_KTHREAD_H */
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/4] driver core: enable drivers to use deferred probefrom init

2014-08-10 Thread Tetsuo Handa
Greg KH wrote:
 Why doesn't it work?  Doesn't modprobe come right back and the init
 sequence still takes a while to run?  What exactly fails?

I guess ...

 @@ -5429,9 +5429,19 @@ mptsas_init(void)
   return error;
  }
  
 +static struct task_struct *init_thread;
 +
 +static int __init
 +mptsas_init(void)
 +{
 + init_thread = kthread_run(mptsas_real_init, NULL, mptsas_init);
 + return 0;
 +}
 +
  static void __exit
  mptsas_exit(void)
  {
 + kthread_stop(init_thread);
   pci_unregister_driver(mptsas_driver);
   sas_release_transport(mptsas_transport_template);
  
 

kthread_run() can fail. sas_attach_transport() and/or pci_register_driver()
in mptsas_real_init() can fail. Caller process may fail to continue if
sas_attach_transport() and pci_register_driver() in mptsas_real_init() has
not completed yet.

kthread_stop() must not be called when kthread_run() failed.
pci_unregister_driver() and/or sas_release_transport() must not be called
when mptsas_real_init() did not return 0 (or has not returned 0 yet).
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/4] driver core: enable drivers to use deferred probefrom init

2014-07-29 Thread Tetsuo Handa
Luis R. Rodriguez wrote:
 On Mon, Jul 28, 2014 at 5:35 PM, Greg KH gre...@linuxfoundation.org wrote:
  On Mon, Jul 28, 2014 at 05:26:34PM -0700, Luis R. Rodriguez wrote:
  To ignore SIGKILL ?
 
  Sorry, I thought this was a userspace change that caused this.
 
  As it's a kernel change, well, maybe that patch should be reverted...
 
 That's certainly viable. Oleg?

I don't want to revert that patch.

I'm trying to reduce use of blocking operations that wait in unkillable state,
for the OOM killer is optimistic enough to wait forever even if the OOM-killed
process cannot be terminated due to having dependency on other threads that are
waiting the OOM-killed process to terminate and release memory. Linux kernel is
too optimistic about memory reclaim; memory allocation/reclaim deadlock is not
detectable.

 
 If its reverted we won't know which drivers are hitting over the new
 30 second limit requirement imposed by userspace, which the culprit
 commit forces failure on probe now. This series at least would allow
 us to annotate which drivers are brake the 30 second init limit, and
 enable a work around alternative if we wish to not revert the commit.
 This  series essentially should be considered an alternative solution
 to what was proposed initially by Joseph Salisbury, it may not be
 perfect but its a proposal. I welcome others.
(...snipped...)
 This also just addresses this *one* Ethernet driver, there was that
 SCSI driver that created the original report -- Canonical merged
 Joseph's fix as a work around, there is no general solution for this
 yet, and again with that work around you won't find which drivers run
 into this issue. There may be others found later so this is why I
 resorted to the deferred workqueue as a solution for now and to enable
 us to annotate which drivers need fixing as I expect getting the fix
 done / everyone happy can take time.

If you want to know which drivers are hitting over the new 30 second
limit requirement but don't want to break the boot, I think we can do
what Ubuntu chose as a work around + a warning message like below.

diff --git a/kernel/kthread.c b/kernel/kthread.c
index c2390f4..43da2b1 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -292,6 +292,26 @@ struct task_struct *kthread_create_on_node(int 
(*threadfn)(void *data),
 * new kernel thread.
 */
if (unlikely(wait_for_completion_killable(done))) {
+   int i = 0;
+
+   /*
+* I got SIGKILL, but wait for 10 more seconds for completion
+* unless chosen by the OOM killer. This delay is there as a
+* workaround for boot failure caused by SIGKILL upon device
+* driver initialization timeout.
+*/
+   if (!test_tsk_thread_flag(current, TIF_MEMDIE)) {
+   pr_warn(I already got SIGKILL but ignoring it up to 
+   10 seconds, in case the caller can't survive 
+   fail-immediately-upon-SIGKILL event. 
+   Please make sure that the caller can survive 
+   this event, for this delay will be removed 
+   in the future.\n);
+   dump_stack();
+   }
+   while (i++  10  !test_tsk_thread_flag(current, TIF_MEMDIE))
+   if (wait_for_completion_timeout(done, HZ))
+   goto ready;
/*
 * If I was SIGKILLed before kthreadd (or new kernel thread)
 * calls complete(), leave the cleanup of this structure to
@@ -305,6 +325,7 @@ struct task_struct *kthread_create_on_node(int 
(*threadfn)(void *data),
 */
wait_for_completion(done);
}
+ready:
task = create-result;
if (!IS_ERR(task)) {
static const struct sched_param param = { .sched_priority = 0 };

Hannes Reinecke wrote:
 Well ... from my POV there are two issues here:
 
 The one is that systemd essentially nailed its internal worker 
 timeout to 30 seconds. And there is no way of modifying that short 
 of recompiling. Which should be fixed, so that at least one can
 modify this timeout.
 
 The other one is that systemd killing a worker by sending SIGKILL, 
 which will kill modprobe terminally.
 Which definitely needs to be fixed.
 
 But if we have both issues resolved (eg by allowing udevd to use a 
 longer timeout and revert the latter patch) we can identify the 
 offending drivers _and_ get the system to boot by simply adding a 
 kernel commandline parameter.
 Which is _far_ preferrable from a maintenance perspective.
 Users tend to become annoyed if their system doesn't boot ...

The proposal which allows longer timeout was expired.
https://bugs.launchpad.net/bugs/1297248
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to 

Re: [PATCH v2 2/4] driver core: enable drivers to use deferred probefrom init

2014-07-29 Thread Tetsuo Handa
Luis R. Rodriguez wrote:
 Tetsuo is it possible / desirable to allow tasks to not kill unless the
 reason is OOM ? Its unclear if this was discussed before, sorry if it was,
 have just been a bit busy today to review the archive / discussions on this.

Are we aware that the 10 seconds timeout after SIGKILL is not the duration
between the beginning of module loading and the end of kthread_create() but
the duration to wait for kthreadd to create a new kernel thread?

If the kthreadd is unable to create a new kernel thread within 10 seconds,
something very bad is happening. For example, memory allocation deadlock
sequence shown below might be happening.

 (1) process1 holds a mutex using mutex_lock().
 (2) process1 calls kthread_create() and enters into killable wait state
 at wait_for_completion_killable().
 (3) kthreadd calls kernel_thread() and enters into oom-killable busy loop
 due to out of memory at alloc_pages_nodemask().
 (4) process2 is chosen by the OOM killer, but process2 is unable to
 terminate because process2 is waiting in unkillable state at
 mutex_lock() which was held by process1 at (1).
 (5) kthreadd continues busy loop because process2 does not release memory
 and the OOM killer does not kill more processes.
 (6) process1 continues waiting in oom-killable state because process1 is
 not chosen by the OOM killer.

See? The system will remain unresponding unless somebody releases memory
that is enough for kthreadd to complete. We cannot teach process1 that
process1 needs to give up waiting for kthreadd and call mutex_unlock()
in order to allow process2 to terminate. Also, we cannot teach the OOM
killer that process1 needs to be oom-killed after process2 is oom-killed.

Making the 10 seconds timeout after SIGKILL longer is safe.
Changing it to no-timeout-unless-oom-killed is unsafe.
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: please fix FUSION (Was: [v3.13][v3.14][Regression]kthread:makekthread_create()killable)

2014-03-22 Thread Tetsuo Handa
Oleg Nesterov wrote:
 Tetsuo, what do you think?

I don't like blocking SIGKILL while doing operations that depend on memory
allocation by other processes. If the OOM killer is triggered and it chose
the process blocking SIGKILL in mptsas_init() (I know it unlikely happens),
it generates the OOM killer deadlock.

My preference is to fix kthread_create() to handle SIGKILL gracefully.
kthread_create() did not return upon SIGKILL before commit 786235ee.
Since commit 786235ee, there is imbalance that kmalloc(GFP_KERNEL) in
kthread_create_on_node() ignores SIGKILL unless TIF_MEMDIE is set but
wait_for_completion_killable() in kthread_create_on_node() does not ignore
SIGKILL even if TIF_MEMDIE is not set.

Many kernel operations (e.g. mutex_lock() wait_event() wait_for_completion())
ignore SIGKILL and the current users depend on SIGKILL being ignored. Thus,
commit 786235ee sounds like a kernel API breakage.
--
From 731f1f6dec7efaa132f751c432079b9b1fdb78d2 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa penguin-ker...@i-love.sakura.ne.jp
Date: Sat, 22 Mar 2014 15:16:50 +0900
Subject: [PATCH] kthread: Handle SIGKILL gracefully in kthread_create().

Commit 786235ee kthread: make kthread_create() killable changed to
leave kthread_create() as soon as receiving SIGKILL. But this change
caused boot failures because systemd-udevd receives SIGKILL due to timeout
while loading SCSI controller drivers using finit_module() [1].

Therefore, this patch changes kthread_create() from wait forever in
killable state to wait for 10 seconds in unkillable state, check for
the OOM killer every second.

This patch also changes the return value of timeout case from -ENOMEM
to -EINTR because -ENOMEM could make sense for only TIF_MEMDIE case.

  [1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1276705

Signed-off-by: Tetsuo Handa penguin-ker...@i-love.sakura.ne.jp
---
 kernel/kthread.c |   37 +
 1 files changed, 21 insertions(+), 16 deletions(-)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index b5ae3ee..6a25a9f 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -269,6 +269,7 @@ struct task_struct *kthread_create_on_node(int 
(*threadfn)(void *data),
   const char namefmt[],
   ...)
 {
+   int i = 0;
DECLARE_COMPLETION_ONSTACK(done);
struct task_struct *task;
struct kthread_create_info *create = kmalloc(sizeof(*create),
@@ -287,24 +288,28 @@ struct task_struct *kthread_create_on_node(int 
(*threadfn)(void *data),
 
wake_up_process(kthreadd_task);
/*
-* Wait for completion in killable state, for I might be chosen by
-* the OOM killer while kthreadd is trying to allocate memory for
-* new kernel thread.
+* Wait for completion with 10 seconds timeout. Unless the kthreadd is
+* blocked for direct memory reclaim path, the kthreadd will be able to
+* complete the request within 10 seconds. Also, check every second if
+* I was chosen by the OOM killer in order to avoid the OOM killer
+* deadlock.
 */
-   if (unlikely(wait_for_completion_killable(done))) {
-   /*
-* If I was SIGKILLed before kthreadd (or new kernel thread)
-* calls complete(), leave the cleanup of this structure to
-* that thread.
-*/
-   if (xchg(create-done, NULL))
-   return ERR_PTR(-ENOMEM);
-   /*
-* kthreadd (or new kernel thread) will call complete()
-* shortly.
-*/
-   wait_for_completion(done);
+   do {
+   if (likely(wait_for_completion_timeout(done, HZ)))
+   goto ready;
+   } while (i++  10  !test_thread_flag(TIF_MEMDIE));
+   /*
+* The kthreadd was unable to complete the request within 10 seconds
+* (or I was chosen by the OOM killer). Thus, give up and leave the
+* cleanup of this structure to the kthreadd (or new kernel thread).
+*/
+   if (xchg(create-done, NULL)) {
+   WARN(1, Gave up waiting for kthreadd.\n);
+   return ERR_PTR(-EINTR);
}
+   /* kthreadd (or new kernel thread) will call complete() shortly. */
+   wait_for_completion(done);
+ready:
task = create-result;
if (!IS_ERR(task)) {
static const struct sched_param param = { .sched_priority = 0 };
-- 
1.7.1
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: please fix FUSION (Was:[v3.13][v3.14][Regression]kthread:makekthread_create()killable)

2014-03-22 Thread Tetsuo Handa
Oleg Nesterov wrote:
 On 03/22, Tetsuo Handa wrote:
 
  Oleg Nesterov wrote:
   Tetsuo, what do you think?
 
  I don't like blocking SIGKILL while doing operations that depend on memory
  allocation by other processes. If the OOM killer is triggered and it chose
  the process blocking SIGKILL in mptsas_init() (I know it unlikely happens),
  it generates the OOM killer deadlock.
 
 It seems that we do not understand each other.
 

I'm agreeing with you regarding long term solution. I think that I and you
do not understand each other regarding which approach should be used for short
term solution.

 I do not like this hack too. And it is even wrong, you can't really block
 SIGKILL. And it is unnecessary in a sense that (I think) it is fine that
 module_init() reacts to SIGKILL and aborts, just the fact it crashes the
 kernel in the error paths is not fine.

I expect that kernel code reacts to SIGKILL and aborts, but current code
(not only fusion but any code) is not ready for reacting to SIGKILL due to
use of uninterruptible versions of lock/wait etc. operators.

 The driver should be fixed anyway. As for timeout, either userspace/systemd
 should be changed to not send SIGKILL after 30 secs, or (better) the driver
 should be changed to not waste 30 secs.

I'm not asserting that we should not fix the driver and the userspace.
I agree that we should fix the driver to respond SIGKILL properly and
fix the userspace not to send SIGKILL on hard coded timeout.

 
 The hack I sent can only serve as a short term solution, it should be
 reverted once we have something better. And, otoh, this hack only affects
 the problematic driver which should be fixed in any case.
 
 Do you see my point? I can be wrong, but I think that you constantly
 misunderstand the intent.
 
  My preference is to fix kthread_create() to handle SIGKILL gracefully.
 
 And now I do not understand you too. I do not understand why should we
 fix kthread_create().

I can see your point. But as for kernel for Ubuntu 14.04 LTS (which the date
of kernel freeze comes shortly), fixing kthread_create() is the safer choice,
for we haven't proved that the fusion is the only kernel code which is
disturbed by commit 786235ee.

After we confirmed that there is no more kernel code which is disturbed by
commit 786235ee, we can revert the kthread: Handle SIGKILL gracefully in
kthread_create(). patch.

 
  Many kernel operations (e.g. mutex_lock() wait_event() 
  wait_for_completion())
  ignore SIGKILL and the current users depend on SIGKILL being ignored. Thus,
  commit 786235ee sounds like a kernel API breakage.
 
 Personally I do not really think so, but OK. In this case lets revert
 786235ee.
 
  Commit 786235ee kthread: make kthread_create() killable changed to
  leave kthread_create() as soon as receiving SIGKILL. But this change
  caused boot failures because systemd-udevd receives SIGKILL due to timeout
  while loading SCSI controller drivers using finit_module() [1].
 
 And I still think that 786235ee simply uncovered the problems in this
 driver. Perhaps we should change kthread_create() for some reason, but
 (imho) not because we need to help the buggy code.
 

I don't mean to help the buggy code. The kthread: Handle SIGKILL gracefully
in kthread_create(). patch (or revert commit 786235ee) is short term solution
(especially for distributions which the date of kernel freeze is approaching).

  Therefore, this patch changes kthread_create() from wait forever in
  killable state to wait for 10 seconds in unkillable state, check for
  the OOM killer every second.
 
 Personally I dislike this change. In fact I think it is ugly. But this
 is only my opinion.
 
 If you convince someone to take this patch I won't argue.
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: please fix FUSION (Was:[v3.13][v3.14][Regression]kthread:makekthread_create()killable)

2014-03-22 Thread Tetsuo Handa
Thomas Gleixner wrote:
 But then systemd/udev mutters:
 
You migh be able to work around the timeout with udev rules and
 OPTIONS+=event_timeout=120, but that code was maybe never used
 or tested, so it might not work correctly. [1]
 
 AFAICT from the ubuntu bug system [2] nobody bothered even to try that.
 
 And if the udev/systemd event_timeout option is broken it's way better
 to fix that one instead of hacking random heuristics into the kernel.

I haven't tried the event_timeout= option but I think it will not work.
The timeout is hard coded as shown below and there will be no chance for taking
the event_timeout= option into account.

-- systemd-204/src/udev/udevd.c start --
(...snipped...)
/* check for hanging events */
udev_list_node_foreach(loop, worker_list) {
struct worker *worker = node_to_worker(loop);

if (worker-state != WORKER_RUNNING)
continue;

if ((now(CLOCK_MONOTONIC) - 
worker-event_start_usec)  30 * 1000 * 1000) {
log_error(worker [%u] %s timeout; kill 
it\n, worker-pid,
worker-event ? 
worker-event-devpath : idle);
kill(worker-pid, SIGKILL);
worker-state = WORKER_KILLED;
/* drop reference taken for state 
'running' */
worker_unref(worker);
if (worker-event) {
log_error(seq %llu '%s' 
killed\n,
  
udev_device_get_seqnum(worker-event-dev), worker-event-devpath);
worker-event-exitcode = -64;

event_queue_delete(worker-event, true);
worker-event = NULL;
}
}
}
(...snipped...)
-- systemd-204/src/udev/udevd.c end --
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [v3.13][v3.14][Regression] kthread:makekthread_create()killable

2014-03-19 Thread Tetsuo Handa
Oleg Nesterov wrote:
   If we need the urgent hack to fix the regression, then I suggest to change
   scsi_host_alloc() temporary until mptsas (or whatever) is fixed.
 
  Device initialization taking longer than 30 seconds is possible and is not a
  hang up. It is systemd which needs to be fixed.
 
 Perhaps systemd needs the fix too, I do not know. But this is irrelevant,
 I think. Or at least this should be discussed separately.

I confirmed that this problem goes away if systemd-udevd supports longer
timeout.

 
 kthread_run() can fail anyway, mptsas_probe() should not crash the kernel.

Right. But mptsas_probe() triggering an OOPS is irrelevant to kthread_run()
( comment #27 ).

 
 And btw, it is not clear to me if in this case device initialization really
 needs more than 30 seconds... My understanding is probably wrong, so please
 correct me. But it seems that before your make kthread_create() killable
 
   - probe hangs
 
   - SIGKILL wakes it up
 
   - so I assume that the probe was interrupted and didn't finish
 correctly ???
 
   - initialization continues, does scsi_host_alloc(), etc, and
 everything works fine even if probe was interrupted?
 

I confirmed that device initialization really took more than 30 seconds
( comments #51 and #52 ).

 So perhaps that probe should not hang and this should be fixed too ?
 Do you know where exactly it hangs? And where it is woken up by SIGKILL ?
 Or I totally misunderstood ?

The probe did not hang. SIGKILL affected only wait_for_completion_killable()
in kthread_create_on_node() called by mptsas_probe() via scsi_host_alloc().
Thus, the probe was interrupted because kthread_run() returned an error.

  I think we need a bit different version, in order to take TIF_MEMDIE flag 
  into
  account at the caller of kthread_create(), for the purpose of commit 
  786235ee
  is try to die as soon as possible if chosen by the OOM killer.
 
  for (;;) {
  shost-ehandler = kthread_run(scsi_error_handler, shost,
scsi_eh_%d, shost-host_no);
  if (PTR_ERR(shost-ehandler) != -EINTR ||
  test_thread_flag(TIF_MEMDIE))
 
 Well, personally I do not care about TIF_MEMDIE/oom at all. We need the
 temporary hack (unless we have the right fix right now) which should be
 reverted later.

I do seriously care about TIF_MEMDIE/oom. Last week I respond to a trouble
which hit kernel: request_module() OOM local DoS (RHBZ #853474) without
any malice.

 Not sure I understand... Yes, wait_for_completion_killable() can return
 immediately if TIF_SIGPENDING will be set again for any reason. Say, another
 signal. But the next iteration will clear TIF_SIGPENDING ?
 
   As I think it is difficult to prove that kmalloc(GFP_KERNEL) never sets
   TIF_SIGPENDING flag
 
 Ah, I see, you mean that kmalloc() can do this every time. No, this should
 not happen or we have another problem.

Then, what happens if somebody does

  while (1)
  kill(pid, SIGKILL);

where pid is the process calling kthread_run() from the for (;;) loop in
scsi_host_alloc()? Theoretically, it will form an infinite retry loop.
Clearing TIF_SIGPENDING does not guarantee that next
wait_for_completion_killable() does not return immediately.
Doing retry decision at scsi_host_alloc() will make things worse than
doing it at kthread_create_on_node().

 Anyway. I agree with any hack in scsi_host_alloc/etc, this is up to
 maintainers. I still think that your change uncovered the problems in
 drivers/message/fusion/, these problems should be fixed somehow.
 
 Dear maintainers, we need your help.
 

Right. We found that we can fix this problem by updating systemd-udevd to
support longer timeout ( comment #53 ). Joseph, would you consult systemd
maintainers?
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [v3.13][v3.14][Regression] kthread: makekthread_create()killable

2014-03-18 Thread Tetsuo Handa
Oleg Nesterov wrote:
 If we need the urgent hack to fix the regression, then I suggest to change
 scsi_host_alloc() temporary until mptsas (or whatever) is fixed.

Device initialization taking longer than 30 seconds is possible and is not a
hang up. It is systemd which needs to be fixed.

 --- x/drivers/scsi/hosts.c
 +++ x/drivers/scsi/hosts.c
 @@ -447,8 +447,18 @@ struct Scsi_Host *scsi_host_alloc(struct
   dev_set_name(shost-shost_dev, host%d, shost-host_no);
   shost-shost_dev.groups = scsi_sysfs_shost_attr_groups;
  
 - shost-ehandler = kthread_run(scsi_error_handler, shost,
 - scsi_eh_%d, shost-host_no);
 + /*
 +  * HUGE COMMENT. and kthread_create() needs s/ENOMEM/EINTR/.
 +  */
 + for (;;) {
 + shost-ehandler = kthread_run(scsi_error_handler, shost,
 + scsi_eh_%d, shost-host_no);
 + if (!IS_ERR(shost-ehandler) || PTR_ERR(shost-ehandler) != 
 -EINTR)
 + break;
 + clear_thread_flag(TIF_SIGPENDING);
 + }
 + recalc_sigpending();
 +
   if (IS_ERR(shost-ehandler)) {
   printk(KERN_WARNING scsi%d: error handler thread failed to 
 spawn, error = %ld\n,
   shost-host_no, PTR_ERR(shost-ehandler));
 
 

I think we need a bit different version, in order to take TIF_MEMDIE flag into
account at the caller of kthread_create(), for the purpose of commit 786235ee
is try to die as soon as possible if chosen by the OOM killer.

for (;;) {
shost-ehandler = kthread_run(scsi_error_handler, shost,
  scsi_eh_%d, shost-host_no);
if (PTR_ERR(shost-ehandler) != -EINTR ||
test_thread_flag(TIF_MEMDIE))
break;
clear_thread_flag(TIF_SIGPENDING);
}
recalc_sigpending();

But I have two worrying points.

  (1) Changing return code from -ENOMEM to -EINTR may not be sufficient.

  If kmalloc(GFP_KERNEL) in kthread_create_on_node() does something that
  calls recalc_sigpending(), TIF_SIGPENDING will be set on the second call
  to kthread_run(). This will make wait_for_completion_killable() return
  -EINTR immediately because the second call to kthread_run() happens only
  when current thread already received SIGKILL (by other than the OOM
  killer). This may form an infinite busy loop.

  As I think it is difficult to prove that kmalloc(GFP_KERNEL) never sets
  TIF_SIGPENDING flag, we would need to call
  clear_thread_flag(TIF_SIGPENDING) immediately before
  wait_for_completion_killable() and call recalc_sigpending() immediately
  after wait_for_completion_killable(). Is this better than taking care of
  SIGKILL (by other than the OOM killer) on the first call to
  kthread_run() ?

  (2) I don't like scattering around test_thread_flag(TIF_MEMDIE), for there
  might be other drivers who receive SIGKILL by systemd's 30 seconds
  timeout.
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] SCSI: buslogic: Added check for DMA mapping errors (was Re:[BusLogic] DMA-API: device driver failed to check map error)

2013-09-13 Thread Tetsuo Handa
Khalid Aziz wrote:
 Added check for DMA mapping errors for request sense data
 buffer. Checking for mapping error can avoid potential wild
 writes. This patch was prompted by the warning from
 dma_unmap when kernel is compiled with CONFIG_DMA_API_DEBUG.

This patch fixes CONFIG_DMA_API_DEBUG warning.
But excuse me, is this error path correct?

 @@ -309,16 +309,17 @@ static struct blogic_ccb *blogic_alloc_ccb(struct 
 blogic_adapter *adapter)
blogic_dealloc_ccb deallocates a CCB, returning it to the Host Adapter's
free list.  The Host Adapter's Lock should already have been acquired by 
 the
caller.
  */
 
 -static void blogic_dealloc_ccb(struct blogic_ccb *ccb)
 +static void blogic_dealloc_ccb(struct blogic_ccb *ccb, int dma_unmap)
  {
 struct blogic_adapter *adapter = ccb-adapter;
 
 scsi_dma_unmap(ccb-command);

blogic_dealloc_ccb() uses ccb-command. But

 -   pci_unmap_single(adapter-pci_device, ccb-sensedata,
 +   if (dma_unmap)
 +   pci_unmap_single(adapter-pci_device, ccb-sensedata,
  ccb-sense_datalen, PCI_DMA_FROMDEVICE);
 
 ccb-command = NULL;
 ccb-status = BLOGIC_CCB_FREE;
 ccb-next = adapter-free_ccbs;
 @@ -3177,13 +3179,21 @@ static int blogic_qcmd_lck(struct scsi_cmnd *command,
 ccb-legacy_tag = queuetag;
 }
 }
 memcpy(ccb-cdb, cdb, cdblen);
 ccb-sense_datalen = SCSI_SENSE_BUFFERSIZE;
 -   ccb-sensedata = pci_map_single(adapter-pci_device,
 +   sense_buf = pci_map_single(adapter-pci_device,
 command-sense_buffer, ccb-sense_datalen,
 PCI_DMA_FROMDEVICE);
 +   if (dma_mapping_error(adapter-pci_device-dev, sense_buf)) {
 +   blogic_err(DMA mapping for sense data buffer failed\n,
 +   adapter);
 +   scsi_dma_map(command);
 +   blogic_dealloc_ccb(ccb, 0);

when was ccb-command = command; called before calling blogic_dealloc_ccb()?

 +   return SCSI_MLQUEUE_HOST_BUSY;
 +   }
 +   ccb-sensedata = sense_buf;
 ccb-command = command;
 command-scsi_done = comp_cb;
 if (blogic_multimaster_type(adapter)) {
 /*
Place the CCB in an Outgoing Mailbox. The higher levels

Also, what happens if scsi_dma_map(command); returned -ENOMEM ?
If you are calling scsi_dma_map() because blogic_dealloc_ccb() calls
scsi_dma_unmap(), why can't we do like

  {
  struct blogic_adapter *adapter = ccb-adapter;
  ccb-command = NULL;
  ccb-status = BLOGIC_CCB_FREE;
  ccb-next = adapter-free_ccbs;
  adapter-free_ccbs = ccb;
  }

instead of

  scsi_dma_map(command);
  blogic_dealloc_ccb(ccb);

?
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] SCSI: buslogic: Added check for DMA mapping errors (wasRe:[BusLogic] DMA-API: device driver failed to check map error)

2013-09-13 Thread Tetsuo Handa
Khalid Aziz wrote:
 Added check for DMA mapping errors for request sense data
 buffer. Checking for mapping error can avoid potential wild
 writes. This patch was prompted by the warning from
 dma_unmap when kernel is compiled with CONFIG_DMA_API_DEBUG.

This patch looks OK and works OK to me.

Thank you.
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html