On 08/16, Bart Van Assche wrote:
>
> On 08/16/2016 06:06 AM, Oleg Nesterov wrote:
>> If only I could reproduce. Or at least understand what are you doing to
>> hit this bug ;)
>
> Hello Oleg,
>
> What I'm doing to hit this bug is to run the test script that is
> available at
On 08/16/2016 06:06 AM, Oleg Nesterov wrote:
If only I could reproduce. Or at least understand what are you doing to
hit this bug ;)
Hello Oleg,
What I'm doing to hit this bug is to run the test script that is
available at https://github.com/bvanassche/srp-test on a setup that is
equipped
On 08/15, Bart Van Assche wrote:
>
> On 08/13/2016 09:32 AM, Oleg Nesterov wrote:
>> On 08/12, Bart Van Assche wrote:
>>> before I started testing. It took some time
>>> before I could reproduce the hang in truncate_inode_pages_range().
>>
>> all I can say is that this contradicts the previous
On 08/13/2016 09:32 AM, Oleg Nesterov wrote:
On 08/12, Bart Van Assche wrote:
before I started testing. It took some time
before I could reproduce the hang in truncate_inode_pages_range().
all I can say is that this contradicts the previous testing results with
my previous patch or with your
Forgot to mention...
On 08/12, Bart Van Assche wrote:
>
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -1643,7 +1643,12 @@ find_page:
>  * wait_on_page_locked is used to avoid unnecessarily
>  * serialisations and why it's safe.
>
On 08/12, Bart Van Assche wrote:
>
> On 08/12/2016 09:16 AM, Oleg Nesterov wrote:
> > Please drop two patches I sent before and try the new one below.
>
> Hello Oleg,
>
> Thanks for the patch. In addition to your patch I also applied the
> attached two patches
And I guess you did this because you
On 08/12/2016 09:16 AM, Oleg Nesterov wrote:
> Please drop two patches I sent before and try the new one below.
Hello Oleg,
Thanks for the patch. In addition to your patch I also applied the
attached two patches before I started testing. It took some time
before I could reproduce the hang in
On 08/12/2016 09:16 AM, Oleg Nesterov wrote:
On 08/11, Oleg Nesterov wrote:
Please drop two patches I sent before and try the new one below.
Thanks, will do.
Which kernel version do you use?
Kernel v4.7 with a few ib_srp and dm-mpath backports from kernel
v4.8-rc1 and also a few SCSI
On 08/11, Oleg Nesterov wrote:
>
> I'll send another debugging patch tomorrow, I was a bit busy today. The next
> step is obvious, we need to know the caller.
Please drop two patches I sent before and try the new one below.
Which kernel version do you use?
Oleg.
---
diff --git
Hi Bart,
On 08/10, Bart Van Assche wrote:
>
> That's an excellent catch. With your previous patch and this patch applied I
> can't reproduce the hang in truncate_inode_pages_range() anymore.
Great, thanks.
I'll send another debugging patch tomorrow, I was a bit busy today. The next
step is
On 08/09, Bart Van Assche wrote:
>
> Hello Oleg,
>
> Something that puzzles me is that removing the "else" keyword from
> abort_exclusive_wait() is sufficient to avoid the hang.
Yes, we need to understand this.
> If there were
> code that clears PG_locked without calling wake_up(), this hang
On 08/10/2016 03:46 AM, Oleg Nesterov wrote:
OK. Could you try another debugging patch below?
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index e5a3244..9d5f892 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -711,6 +711,15 @@ static inline
On Wed, Aug 10, 2016 at 12:57:25PM +0200, Oleg Nesterov wrote:
> This condition is fine, and the trace is clear. This means that
> lock_page_killable()
> was interrupted and wake_bit_function() was not called. We do not need
> another wakeup
> in this case but somehow it helps. Again, I think
On 08/09, Bart Van Assche wrote:
>
> On 08/09/2016 10:15 AM, Oleg Nesterov wrote:
> >
> > --- x/kernel/sched/wait.c
> > +++ x/kernel/sched/wait.c
> > @@ -283,7 +283,7 @@ void abort_exclusive_wait(wait_queue_hea
> > 	if (!list_empty(&wait->task_list))
> > 		list_del_init(&wait->task_list);
> >
On 08/10, Bart Van Assche wrote:
>
> On 08/10/2016 03:46 AM, Oleg Nesterov wrote:
> > OK. Could you try another debugging patch below?
> >
> > Oleg.
> > ---
> >
> > diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> > index e5a3244..9d5f892 100644
> > ---
On 08/10/2016 03:46 AM, Oleg Nesterov wrote:
> OK. Could you try another debugging patch below?
>
> Oleg.
> ---
>
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index e5a3244..9d5f892 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@
On 08/08/2016 09:20 AM, Oleg Nesterov wrote:
> So far _I think_ that the bug is somewhere else... Say, someone clears
> PG_locked without wake_up(). Then SIGKILL sent to the task sleeping in
> sys_read() "adds" the necessary wakeup...
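The hypothesis above can be illustrated with a small single-threaded userspace model, assuming one lock bit that stands in for PG_locked and sleepers that only re-check the bit when something explicitly wakes them. All names here (struct model, model_unlock_without_wakeup(), model_kill_waiter() and so on) are hypothetical, and model_kill_waiter() assumes the patched abort path that wakes the next waiter whenever the queue is non-empty. This is a sketch of the reasoning, not evidence of where the real bug is.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the "missing wake_up()" hypothesis.
 * One bit stands in for PG_locked; sleepers only re-check it when woken,
 * roughly like a task sleeping in lock_page_killable(). Not kernel code. */

#define MAX_WAITERS 8

struct model {
    bool locked;               /* stands in for PG_locked */
    int  queue[MAX_WAITERS];   /* ids of sleeping exclusive waiters (FIFO) */
    int  len;
    int  owner;                /* id of the bit holder, -1 if none */
};

static void model_sleep_on(struct model *m, int id)
{
    assert(m->len < MAX_WAITERS);
    m->queue[m->len++] = id;
}

/* Wake the first sleeper: if the bit is clear it takes the bit and leaves
 * the queue; if the bit is still set it simply stays queued. */
static void model_wake_one(struct model *m)
{
    if (m->len == 0 || m->locked)
        return;
    int id = m->queue[0];
    for (int i = 1; i < m->len; i++)
        m->queue[i - 1] = m->queue[i];
    m->len--;
    m->locked = true;
    m->owner = id;
}

/* Correct unlock: clear the bit, then wake one waiter. */
static void model_unlock(struct model *m)
{
    m->locked = false;
    m->owner = -1;
    model_wake_one(m);
}

/* Hypothetical buggy path: clears the bit but forgets the wakeup. */
static void model_unlock_without_wakeup(struct model *m)
{
    m->locked = false;
    m->owner = -1;
}

/* A fatal signal makes waiter `id` leave the queue. Like an abort path
 * without the "else", it then wakes the next waiter if any remain. */
static void model_kill_waiter(struct model *m, int id)
{
    for (int i = 0; i < m->len; i++) {
        if (m->queue[i] == id) {
            for (int j = i + 1; j < m->len; j++)
                m->queue[j - 1] = m->queue[j];
            m->len--;
            break;
        }
    }
    model_wake_one(m);
}
```

In this model an unlock that skips the wakeup leaves both sleepers blocked although the bit is already clear, and only the extra wakeup issued on behalf of the killed task lets the remaining waiter proceed. That is consistent with the observation that removing the "else" keyword makes the hang go away.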
Hello Oleg,
Something that puzzles me is that removing the
On 08/09/2016 11:48 AM, Bart Van Assche wrote:
[ 1548.018115] sysrq: SysRq : Show Blocked State
[ 1548.018210]   task                        PC stack   pid father
[ 1548.018677] systemd-udevd D 8803a9f13be8 0 29908483 0x
[ 1548.018792] 8803a9f13be8 82584bd0
On 08/09/2016 10:15 AM, Oleg Nesterov wrote:
> On 08/08, Bart Van Assche wrote:
>>
>> No external modules were loaded when I triggered the lockup
>
> Heh. Could you test the patch below?
>
> Oleg.
>
> --- x/kernel/sched/wait.c
> +++ x/kernel/sched/wait.c
> @@ -283,7 +283,7 @@ void
On 08/08, Bart Van Assche wrote:
>
> No external modules were loaded when I triggered the lockup
Heh. Could you test the patch below?
Oleg.
--- x/kernel/sched/wait.c
+++ x/kernel/sched/wait.c
@@ -283,7 +283,7 @@ void abort_exclusive_wait(wait_queue_hea
	if (!list_empty(&wait->task_list))
On 08/08/2016 09:20 AM, Oleg Nesterov wrote:
Do you use external modules during the testing?
Hello Oleg,
No external modules were loaded when I triggered the lockup I mentioned
in the patch description. Although the SRP test software I referred to
earlier can be run against the SCST SRP
On 08/08, Bart Van Assche wrote:
>
> This is the sequence that I think leads to the missed wakeup:
>
> Task 1          Task 2          Task 3          Task 4
>
> lock_page()
> ...
> lock_page_killable()
>
On 08/08/16 03:22, Peter Zijlstra wrote:
> That would be the exact scenario I drew a picture of, no? I'm still
> failing to see the hole there.
>
> Please draw a picture like that and illustrate the hole.
Hi Peter,
This is the sequence that I think leads to the missed wakeup:
Task
On Fri, Aug 05, 2016 at 10:41:33AM -0700, Bart Van Assche wrote:
> On 08/04/2016 07:09 AM, Peter Zijlstra wrote:
> >But I'd still like to understand where we lose the wakeup.
>
> My assumption is that __wake_up_common() and signal delivery happen
> concurrently, that __wake_up_common() wakes up
On 08/04/2016 07:09 AM, Peter Zijlstra wrote:
On Wed, Aug 03, 2016 at 02:51:23PM -0700, Bart Van Assche wrote:
So I started testing the patch below that should fix the same hang but
without triggering any wait list corruption.
diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c
index
On 08/04/16 07:09, Peter Zijlstra wrote:
But I'd still like to understand where we lose the wakeup. What are you
doing to reproduce this issue?
Hello Peter,
The test I run is as follows:
* Configure the ib_srpt driver to export a RAM disk through the SRP
protocol. The ib_srpt driver is a
On Wed, Aug 03, 2016 at 02:51:23PM -0700, Bart Van Assche wrote:
> So I started testing the patch below that should fix the same hang but
> without triggering any wait list corruption.
>
> diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c
> index f15d6b6..4e3f651 100644
> ---
On 08/03/2016 02:30 PM, Oleg Nesterov wrote:
I too can't understand the problem. Perhaps you missed the fact that
abort_exclusive_wait() does everything under wait_queue_head_t->lock ?
[ ... ]
But we do not care if we race with another try_to_wake_up(), or even with
another exclusive
On 08/03/2016 02:30 PM, Oleg Nesterov wrote:
On 08/03, Bart Van Assche wrote:
try_to_wake_up() locks task_struct.pi_lock but abort_exclusive_wait() does not.
My assumption is that the following sequence of events leads to the lockup
that I had mentioned in the description of my patch:
*
Hi Bart,
I too can't understand the problem. Perhaps you missed the fact that
abort_exclusive_wait() does everything under wait_queue_head_t->lock ?
On 08/03, Bart Van Assche wrote:
>
> try_to_wake_up() locks task_struct.pi_lock but abort_exclusive_wait() does not.
> My assumption is that the
On Wed, Aug 03, 2016 at 09:35:03AM -0700, Bart Van Assche wrote:
> If try_to_wake_up() reads the task state before abort_exclusive_wait()
> sets the task state and if autoremove_wake_function() is called after
> abort_exclusive_wait() has removed a task from a wait list then the
> cascading
On 08/03/2016 11:11 AM, Peter Zijlstra wrote:
That seems to do the right thing, so clearly I misunderstand. Please
clarify.
Hello Peter,
try_to_wake_up() locks task_struct.pi_lock but abort_exclusive_wait()
does not. My assumption is that the following sequence of events leads to the
lockup that
If try_to_wake_up() reads the task state before abort_exclusive_wait()
sets the task state and if autoremove_wake_function() is called after
abort_exclusive_wait() has removed a task from a wait list then the
cascading mechanism for exclusive wakeups in abort_exclusive_wait()
won't be triggered.
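To make the cascading rule referred to above concrete, here is a minimal single-threaded userspace model, assuming the v4.7 structure of abort_exclusive_wait(): remove the entry if it is still queued, else wake the next exclusive waiter. The types and model_* functions are hypothetical stand-ins, not kernel APIs. Because the model has no concurrency it cannot reproduce the suspected try_to_wake_up() race; it only shows what the cascade is meant to guarantee.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded model of an exclusive wait queue.
 * Not kernel code: it mirrors only the list bookkeeping that the
 * v4.7 abort_exclusive_wait() does under wait_queue_head_t->lock. */

#define MAX_WAITERS 8

struct waiter {
    const char *name;
    bool queued;   /* models !list_empty(&wait->task_list) */
    bool woken;    /* this waiter has consumed a wakeup */
};

struct waitq {
    struct waiter *list[MAX_WAITERS];  /* FIFO of queued exclusive waiters */
    int len;
};

static void model_enqueue(struct waitq *q, struct waiter *w)
{
    assert(q->len < MAX_WAITERS);
    w->queued = true;
    w->woken = false;
    q->list[q->len++] = w;
}

static void model_dequeue(struct waitq *q, struct waiter *w)
{
    for (int i = 0; i < q->len; i++) {
        if (q->list[i] == w) {
            for (int j = i; j < q->len - 1; j++)
                q->list[j] = q->list[j + 1];
            q->len--;
            w->queued = false;
            return;
        }
    }
}

/* Exclusive wakeup with an autoremove-style wake function:
 * the first waiter is taken off the list and marked woken. */
static void model_wake_up_one(struct waitq *q)
{
    if (q->len == 0)
        return;
    struct waiter *w = q->list[0];
    model_dequeue(q, w);
    w->woken = true;
}

/* Called when a waiter gives up (e.g. on a fatal signal). Still queued
 * means it never consumed a wakeup, so it is simply removed. Otherwise
 * it consumed one, and the "else" branch passes it to the next waiter. */
static void model_abort_exclusive_wait(struct waitq *q, struct waiter *w)
{
    if (w->queued)
        model_dequeue(q, w);
    else if (q->len > 0)
        model_wake_up_one(q);
}
```

For example, with waiters A, B and C queued, one wakeup removes A; if A is then killed, model_abort_exclusive_wait() hands the wakeup to B, so it is not lost. A waiter killed before any wakeup is only removed and does not wake anybody.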