Re: [PATCH] sched: Avoid that __wait_on_bit_lock() hangs

2016-08-17 Thread Oleg Nesterov
On 08/16, Bart Van Assche wrote: > > On 08/16/2016 06:06 AM, Oleg Nesterov wrote: >> If only I could reproduce. Or at least understand what are you doing to >> hit this bug ;) > > Hello Oleg, > > What I'm doing to hit this bug is to run the test script that is > available at

Re: [PATCH] sched: Avoid that __wait_on_bit_lock() hangs

2016-08-16 Thread Bart Van Assche
On 08/16/2016 06:06 AM, Oleg Nesterov wrote: If only I could reproduce. Or at least understand what are you doing to hit this bug ;) Hello Oleg, What I'm doing to hit this bug is to run the test script that is available at https://github.com/bvanassche/srp-test on a setup that is equipped

Re: [PATCH] sched: Avoid that __wait_on_bit_lock() hangs

2016-08-16 Thread Oleg Nesterov
On 08/15, Bart Van Assche wrote: > > On 08/13/2016 09:32 AM, Oleg Nesterov wrote: >> On 08/12, Bart Van Assche wrote: >>> before I started testing. It took some time >>> before I could reproduce the hang in truncate_inode_pages_range(). >> >> all I can say this contradicts with the previous

Re: [PATCH] sched: Avoid that __wait_on_bit_lock() hangs

2016-08-15 Thread Bart Van Assche
On 08/13/2016 09:32 AM, Oleg Nesterov wrote: On 08/12, Bart Van Assche wrote: before I started testing. It took some time before I could reproduce the hang in truncate_inode_pages_range(). all I can say this contradicts with the previous testing results with my previous patch or with your

Re: [PATCH] sched: Avoid that __wait_on_bit_lock() hangs

2016-08-13 Thread Oleg Nesterov
Forgot to mention... On 08/12, Bart Van Assche wrote: > > --- a/mm/filemap.c > +++ b/mm/filemap.c > @@ -1643,7 +1643,12 @@ find_page: >* wait_on_page_locked is used to avoid unnecessarily >* serialisations and why it's safe. >

Re: [PATCH] sched: Avoid that __wait_on_bit_lock() hangs

2016-08-13 Thread Oleg Nesterov
On 08/12, Bart Van Assche wrote: > > On 08/12/2016 09:16 AM, Oleg Nesterov wrote: > > Please drop two patches I sent before and try the new one below. > > Hello Oleg, > > Thanks for the patch. In addition to your patch I also applied the > attached two patches And I guess you did this because you

Re: [PATCH] sched: Avoid that __wait_on_bit_lock() hangs

2016-08-12 Thread Bart Van Assche
On 08/12/2016 09:16 AM, Oleg Nesterov wrote: > Please drop two patches I sent before and try the new one below. Hello Oleg, Thanks for the patch. In addition to your patch I also applied the attached two patches before I started testing. It took some time before I could reproduce the hang in

Re: [PATCH] sched: Avoid that __wait_on_bit_lock() hangs

2016-08-12 Thread Bart Van Assche
On 08/12/2016 09:16 AM, Oleg Nesterov wrote: On 08/11, Oleg Nesterov wrote: Please drop two patches I sent before and try the new one below. Thanks, will do. Which kernel version do you use? Kernel v4.7 with a few ib_srp and dm-mpath backports from kernel v4.8-rc1 and also a few SCSI

Re: [PATCH] sched: Avoid that __wait_on_bit_lock() hangs

2016-08-12 Thread Oleg Nesterov
On 08/11, Oleg Nesterov wrote: > > I'll send another debugging patch tomorrow, I was a bit busy today. The next > step is obvious, we need to know the caller. Please drop two patches I sent before and try the new one below. Which kernel version do you use? Oleg. --- diff --git

Re: [PATCH] sched: Avoid that __wait_on_bit_lock() hangs

2016-08-11 Thread Oleg Nesterov
Hi Bart, On 08/10, Bart Van Assche wrote: > > That's an excellent catch. With your previous patch and this patch applied I > can't reproduce the hang in truncate_inode_pages_range() anymore. Great, thanks. I'll send another debugging patch tomorrow, I was a bit busy today. The next step is

Re: [PATCH] sched: Avoid that __wait_on_bit_lock() hangs

2016-08-10 Thread Oleg Nesterov
On 08/09, Bart Van Assche wrote: > > Hello Oleg, > > Something that puzzles me is that removing the "else" keyword from > abort_exclusive_wait() is sufficient to avoid the hang. Yes, we need to understand this. > If there would > be code that clears PG_locked without calling wake_up() this hang

Re: [PATCH] sched: Avoid that __wait_on_bit_lock() hangs

2016-08-10 Thread Bart Van Assche
On 08/10/2016 03:46 AM, Oleg Nesterov wrote: OK. Could you try another debugging patch below? diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index e5a3244..9d5f892 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -711,6 +711,15 @@ static inline

Re: [PATCH] sched: Avoid that __wait_on_bit_lock() hangs

2016-08-10 Thread Peter Zijlstra
On Wed, Aug 10, 2016 at 12:57:25PM +0200, Oleg Nesterov wrote: > This condition is fine, and the trace is clear. This means that > lock_page_killable() > was interrupted and wake_bit_function() was not called. We do not need > another wakeup > in this case but somehow it helps. Again, I think

Re: [PATCH] sched: Avoid that __wait_on_bit_lock() hangs

2016-08-10 Thread Oleg Nesterov
On 08/09, Bart Van Assche wrote: > > On 08/09/2016 10:15 AM, Oleg Nesterov wrote: > > > > --- x/kernel/sched/wait.c > > +++ x/kernel/sched/wait.c > > @@ -283,7 +283,7 @@ void abort_exclusive_wait(wait_queue_hea > > if (!list_empty(&wait->task_list)) > > list_del_init(&wait->task_list); > >

Re: [PATCH] sched: Avoid that __wait_on_bit_lock() hangs

2016-08-10 Thread Oleg Nesterov
On 08/10, Bart Van Assche wrote: > > On 08/10/2016 03:46 AM, Oleg Nesterov wrote: > > OK. Could you try another debugging patch below? > > > > Oleg. > > --- > > > > diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h > > index e5a3244..9d5f892 100644 > > ---

Re: [PATCH] sched: Avoid that __wait_on_bit_lock() hangs

2016-08-10 Thread Bart Van Assche
On 08/10/2016 03:46 AM, Oleg Nesterov wrote: > OK. Could you try another debugging patch below? > > Oleg. > --- > > diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h > index e5a3244..9d5f892 100644 > --- a/include/linux/page-flags.h > +++ b/include/linux/page-flags.h > @@

Re: [PATCH] sched: Avoid that __wait_on_bit_lock() hangs

2016-08-09 Thread Bart Van Assche
On 08/08/2016 09:20 AM, Oleg Nesterov wrote: > So far _I think_ that the bug is somewhere else... Say, someone clears > PG_locked without wake_up(). Then SIGKILL sent to the task sleeping in > sys_read() "adds" the necessary wakeup... Hello Oleg, Something that puzzles me is that removing the

Re: [PATCH] sched: Avoid that __wait_on_bit_lock() hangs

2016-08-09 Thread Bart Van Assche
On 08/09/2016 11:48 AM, Bart Van Assche wrote: [ 1548.018115] sysrq: SysRq : Show Blocked State [ 1548.018210] taskPC stack pid father [ 1548.018677] systemd-udevd D 8803a9f13be8 0 29908483 0x [ 1548.018792] 8803a9f13be8 82584bd0

Re: [PATCH] sched: Avoid that __wait_on_bit_lock() hangs

2016-08-09 Thread Bart Van Assche
On 08/09/2016 10:15 AM, Oleg Nesterov wrote: > On 08/08, Bart Van Assche wrote: >> >> No external modules were loaded when I triggered the lockup > > Heh. Could you test the patch below? > > Oleg. > > --- x/kernel/sched/wait.c > +++ x/kernel/sched/wait.c > @@ -283,7 +283,7 @@ void

Re: [PATCH] sched: Avoid that __wait_on_bit_lock() hangs

2016-08-09 Thread Oleg Nesterov
On 08/08, Bart Van Assche wrote: > > No external modules were loaded when I triggered the lockup Heh. Could you test the patch below? Oleg. --- x/kernel/sched/wait.c +++ x/kernel/sched/wait.c @@ -283,7 +283,7 @@ void abort_exclusive_wait(wait_queue_hea if (!list_empty(&wait->task_list))

Re: [PATCH] sched: Avoid that __wait_on_bit_lock() hangs

2016-08-08 Thread Bart Van Assche
On 08/08/2016 09:20 AM, Oleg Nesterov wrote: Do you use external modules during the testing? Hello Oleg, No external modules were loaded when I triggered the lockup I mentioned in the patch description. Although the SRP test software I referred to earlier can be run against the SCST SRP

Re: [PATCH] sched: Avoid that __wait_on_bit_lock() hangs

2016-08-08 Thread Oleg Nesterov
On 08/08, Bart Van Assche wrote: > > This is the sequence of which I think that it leads to the missed wakeup: > > Task 1Task 2Task 3 > Task 4 > > lock_page() > ... > lock_page_killable() >

Re: [PATCH] sched: Avoid that __wait_on_bit_lock() hangs

2016-08-08 Thread Bart Van Assche
On 08/08/16 03:22, Peter Zijlstra wrote: > That would be the exact scenario I drew a picture of, no? I'm still > failing to see the hole there. > > Please draw a picture like that and illustrate the hole. Hi Peter, This is the sequence of which I think that it leads to the missed wakeup: Task

Re: [PATCH] sched: Avoid that __wait_on_bit_lock() hangs

2016-08-08 Thread Peter Zijlstra
On Fri, Aug 05, 2016 at 10:41:33AM -0700, Bart Van Assche wrote: > On 08/04/2016 07:09 AM, Peter Zijlstra wrote: > >But I'd still like to understand where we loose the wakeup. > > My assumption is that __wake_up_common() and signal delivery happen > concurrently, that __wake_up_common() wakes up

Re: [PATCH] sched: Avoid that __wait_on_bit_lock() hangs

2016-08-05 Thread Bart Van Assche
On 08/04/2016 07:09 AM, Peter Zijlstra wrote: On Wed, Aug 03, 2016 at 02:51:23PM -0700, Bart Van Assche wrote: So I started testing the patch below that should fix the same hang but without triggering any wait list corruption. diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c index

Re: [PATCH] sched: Avoid that __wait_on_bit_lock() hangs

2016-08-04 Thread Bart Van Assche
On 08/04/16 07:09, Peter Zijlstra wrote: But I'd still like to understand where we loose the wakeup. What are you doing to reproduce this issue? Hello Peter, The test I run is as follows: * Configure the ib_srpt driver to export a RAM disk through the SRP protocol. The ib_srpt driver is a

Re: [PATCH] sched: Avoid that __wait_on_bit_lock() hangs

2016-08-04 Thread Peter Zijlstra
On Wed, Aug 03, 2016 at 02:51:23PM -0700, Bart Van Assche wrote: > So I started testing the patch below that should fix the same hang but > without triggering any wait list corruption. > > diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c > index f15d6b6..4e3f651 100644 > ---

Re: [PATCH] sched: Avoid that __wait_on_bit_lock() hangs

2016-08-03 Thread Bart Van Assche
On 08/03/2016 02:30 PM, Oleg Nesterov wrote: I too can't understand the problem. Perhaps you missed the fact that abort_exclusive_wait() does everything under wait_queue_head_t->lock ? [ ... ] But we do not care if we race with another try_to_wake_up(), or even with another exclusive

Re: [PATCH] sched: Avoid that __wait_on_bit_lock() hangs

2016-08-03 Thread Bart Van Assche
On 08/03/2016 02:30 PM, Oleg Nesterov wrote: On 08/03, Bart Van Assche wrote: try_to_wake_up() locks task_struct.pi_lock but abort_exclusive_wait() not. My assumption is that the following sequence of events leads to the lockup that I had mentioned in the description of my patch: *

Re: [PATCH] sched: Avoid that __wait_on_bit_lock() hangs

2016-08-03 Thread Oleg Nesterov
Hi Bart, I too can't understand the problem. Perhaps you missed the fact that abort_exclusive_wait() does everything under wait_queue_head_t->lock ? On 08/03, Bart Van Assche wrote: > > try_to_wake_up() locks task_struct.pi_lock but abort_exclusive_wait() not. > My assumption is that the

Re: [PATCH] sched: Avoid that __wait_on_bit_lock() hangs

2016-08-03 Thread Peter Zijlstra
On Wed, Aug 03, 2016 at 09:35:03AM -0700, Bart Van Assche wrote: > If try_to_wakeup() reads the task state before abort_exclusive_wait() > sets the task state and if autoremove_wake_function() is called after > abort_exclusive_wait() has removed a task from a wait list then the > cascading

Re: [PATCH] sched: Avoid that __wait_on_bit_lock() hangs

2016-08-03 Thread Bart Van Assche
On 08/03/2016 11:11 AM, Peter Zijlstra wrote: That seems to do the right thing, so clearly I misunderstand. Please clarify. Hello Peter, try_to_wake_up() locks task_struct.pi_lock but abort_exclusive_wait() not. My assumption is that the following sequence of events leads to the lockup that

[PATCH] sched: Avoid that __wait_on_bit_lock() hangs

2016-08-03 Thread Bart Van Assche
If try_to_wakeup() reads the task state before abort_exclusive_wait() sets the task state and if autoremove_wake_function() is called after abort_exclusive_wait() has removed a task from a wait list then the cascading mechanism for exclusive wakeups in abort_exclusive_wait() won't be triggered.
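
For readers unfamiliar with the "cascading mechanism for exclusive wakeups" that the patch description mentions, here is a minimal userspace sketch of the idea. It is a toy model and not kernel code: the types and the helpers enqueue(), dequeue(), wake_one(), abort_wait() and second_waiter_woken() are hypothetical names invented for this illustration, and the model has no locking or concurrency. It only shows why a waiter that was already woken (and so already removed from the queue) has to pass the wakeup on when it gives up, and what happens to the next waiter if it does not. It does not settle whether the race described above is the real cause of the hang, which the replies in this thread dispute.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_WAITERS 8

/* Toy stand-ins for a wait queue entry and a wait queue head. */
struct waiter {
    bool queued;   /* still linked on the wait queue */
    bool woken;    /* has received a wakeup */
};

struct wait_queue {
    struct waiter *w[MAX_WAITERS];
    int n;
};

/* Append a waiter to the queue (caller keeps n below MAX_WAITERS). */
static void enqueue(struct wait_queue *q, struct waiter *w)
{
    w->queued = true;
    w->woken = false;
    q->w[q->n++] = w;
}

/* Unlink a waiter from the queue if it is still on it. */
static void dequeue(struct wait_queue *q, struct waiter *w)
{
    for (int i = 0; i < q->n; i++) {
        if (q->w[i] != w)
            continue;
        for (int j = i; j + 1 < q->n; j++)
            q->w[j] = q->w[j + 1];
        q->n--;
        w->queued = false;
        return;
    }
}

/*
 * Exclusive wakeup in this model: wake only the first waiter and
 * unlink it, the way an auto-removing wake function would.
 */
static bool wake_one(struct wait_queue *q)
{
    if (q->n == 0)
        return false;
    struct waiter *w = q->w[0];
    dequeue(q, w);
    w->woken = true;
    return true;
}

/*
 * A waiter gives up (for example after a fatal signal).
 * If it is still queued it was never woken, so it simply leaves.
 * If it is no longer queued it has consumed an exclusive wakeup it
 * will not use; with cascade enabled it hands that wakeup on.
 */
static void abort_wait(struct wait_queue *q, struct waiter *w, bool cascade)
{
    if (w->queued)
        dequeue(q, w);
    else if (cascade)
        wake_one(q);
}

/* Two waiters, one wakeup, first waiter aborts: is the second woken? */
static bool second_waiter_woken(bool cascade)
{
    struct wait_queue q = { .n = 0 };
    struct waiter a = { false, false };
    struct waiter b = { false, false };

    enqueue(&q, &a);
    enqueue(&q, &b);
    wake_one(&q);                 /* unlock wakes only the first waiter */
    abort_wait(&q, &a, cascade);  /* first waiter gives up instead */
    return b.woken;
}
```

In this model, second_waiter_woken(true) reports that the second waiter receives the forwarded wakeup, while second_waiter_woken(false) leaves it queued with nobody left to wake it, which is the lost-wakeup shape the patch description is concerned about.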
