On (12/28/18 16:03), Daniel Wang wrote:
> Thanks. I was able to confirm that commit c7c3f05e341a9a2bd alone
> fixed the problem for me. As expected, all 16 CPUs' stacktrace was
> printed, before a final panic stack dump and a successful reboot.
Cool, thanks!
-ss
Thanks. I was able to confirm that commit c7c3f05e341a9a2bd alone
fixed the problem for me. As expected, all 16 CPUs' stacktrace was
printed, before a final panic stack dump and a successful reboot.
[ 24.035044] Hogging a CPU now
[ 48.200258] watchdog: BUG: soft lockup - CPU#3 stuck for 22s!
On Fri, Dec 28, 2018 at 09:16:51AM +0900, Sergey Senozhatsky wrote:
> On (12/12/18 17:10), Sergey Senozhatsky wrote:
> > And there will be another -stable backport request in a week or so.
>
> The remaining one:
>
> commit c7c3f05e341a9a2bd
Now queued up, thanks.
greg k-h
On (12/12/18 17:10), Sergey Senozhatsky wrote:
> And there will be another -stable backport request in a week or so.
The remaining one:
commit c7c3f05e341a9a2bd
-ss
On Thu, Dec 13, 2018 at 10:59:31AM +0100, Petr Mladek wrote:
On Wed 2018-12-12 18:39:42, Daniel Wang wrote:
> Additionally, for dbdda842fe96f to work as expected we really
> need fd5f7cde1b85d4c. Otherwise printk() can schedule under
> console_sem and console_owner, which will deactivate the "load
On Wed 2018-12-12 18:39:42, Daniel Wang wrote:
> > Additionally, for dbdda842fe96f to work as expected we really
> > need fd5f7cde1b85d4c. Otherwise printk() can schedule under
> > console_sem and console_owner, which will deactivate the "load
> > balance" logic.
>
> It looks like fd5f7cde1b85d4c got
> Additionally, for dbdda842fe96f to work as expected we really
> need fd5f7cde1b85d4c. Otherwise printk() can schedule under
> console_sem and console_owner, which will deactivate the "load
> balance" logic.
It looks like fd5f7cde1b85d4c got into 4.14.82 that was released last month.
On Wed, Dec 12,
On (12/12/18 16:40), Daniel Wang wrote:
> In case this was buried in previous messages, the commit I'd like to
> get backported to 4.14 is dbdda842fe96f: printk: Add console owner and
> waiter logic to load balance console writes. But another followup
> patch that fixes a bug in that patch is also
On (12/12/18 16:52), Sasha Levin wrote:
> On Wed, Dec 12, 2018 at 01:49:25PM -0800, Daniel Wang wrote:
> > Thanks for the clarification. So I guess I don't need to start another
> > thread for it? What are the next steps?
>
> Nothing here, I'll queue it once Sergey or Petr clarify if they wanted
In case this was buried in previous messages, the commit I'd like to
get backported to 4.14 is dbdda842fe96f: printk: Add console owner and
waiter logic to load balance console writes. But another followup
patch that fixes a bug in that patch is also required. That is
c14376de3a1b: printk: Wake
Thank you!
On Wed, Dec 12, 2018 at 1:52 PM Sasha Levin wrote:
>
> On Wed, Dec 12, 2018 at 01:49:25PM -0800, Daniel Wang wrote:
> >Thanks for the clarification. So I guess I don't need to start another
> >thread for it? What are the next steps?
>
> Nothing here, I'll queue it once Sergey or Petr
On Wed, Dec 12, 2018 at 01:49:25PM -0800, Daniel Wang wrote:
Thanks for the clarification. So I guess I don't need to start another
thread for it? What are the next steps?
Nothing here, I'll queue it once Sergey or Petr clarify if they wanted
additional information in the -stable commit
Thanks for the clarification. So I guess I don't need to start another
thread for it? What are the next steps?
On Wed, Dec 12, 2018 at 1:43 PM Sasha Levin wrote:
>
> On Wed, Dec 12, 2018 at 12:11:29PM -0800, Daniel Wang wrote:
> >On Wed, Dec 12, 2018 at 9:43 AM Sasha Levin wrote:
> >>
> >> On
On Wed, Dec 12, 2018 at 12:11:29PM -0800, Daniel Wang wrote:
On Wed, Dec 12, 2018 at 9:43 AM Sasha Levin wrote:
On Wed, Dec 12, 2018 at 10:59:39PM +0900, Sergey Senozhatsky wrote:
>On (12/12/18 14:36), Petr Mladek wrote:
>> > OK, really didn't know that! I wasn't Cc-ed on that AUTOSEL email,
On Wed, Dec 12, 2018 at 9:43 AM Sasha Levin wrote:
>
> On Wed, Dec 12, 2018 at 10:59:39PM +0900, Sergey Senozhatsky wrote:
> >On (12/12/18 14:36), Petr Mladek wrote:
> >> > OK, really didn't know that! I wasn't Cc-ed on that AUTOSEL email,
> >> > and I wasn't Cc-ed on this whole discussion and
On Wed, Dec 12, 2018 at 10:59:39PM +0900, Sergey Senozhatsky wrote:
On (12/12/18 14:36), Petr Mladek wrote:
> OK, really didn't know that! I wasn't Cc-ed on that AUTOSEL email,
> and I wasn't Cc-ed on this whole discussion and found it purely
> accidentally while browsing linux-mm list.
I am
On (12/12/18 14:36), Petr Mladek wrote:
> > OK, really didn't know that! I wasn't Cc-ed on that AUTOSEL email,
> > and I wasn't Cc-ed on this whole discussion and found it purely
> > accidentally while browsing linux-mm list.
>
> I am sorry that I did not CC you. There were so many people in CC.
On Wed 2018-12-12 17:10:34, Sergey Senozhatsky wrote:
> On (12/12/18 01:48), Sasha Levin wrote:
> > > > > I guess we still don't have a really clear understanding of what exactly
> > > > > is going in your system
> > > >
> > > > I would also like to get to the bottom of it. Unfortunately
On (12/12/18 01:48), Sasha Levin wrote:
> > > > I guess we still don't have a really clear understanding of what exactly
> > > > is going in your system
> > >
> > > I would also like to get to the bottom of it. Unfortunately I haven't
> > > got the expertise in this area nor the time to do it yet.
On Wed, Dec 12, 2018 at 03:28:41PM +0900, Sergey Senozhatsky wrote:
On (12/11/18 22:08), Daniel Wang wrote:
I've been meaning to try it but kept getting distracted by other
things. I'll try to find some time for it this week or next. Right now
my intent is to get Steven's patch into 4.14
On (12/11/18 22:08), Daniel Wang wrote:
>
> I've been meaning to try it but kept getting distracted by other
> things. I'll try to find some time for it this week or next. Right now
> my intent is to get Steven's patch into 4.14 stable as it evidently
> fixed the particular issue I was seeing, and
> So... did my patch address the deadlock you are seeing or it didn't?
I've been meaning to try it but kept getting distracted by other
things. I'll try to find some time for it this week or next. Right now
my intent is to get Steven's patch into 4.14 stable as it evidently
fixed the particular
On (12/11/18 17:16), Daniel Wang wrote:
> > Let's first figure out if it works.
>
> I would still like to try applying your patches that went into
> printk.git, but for now I wonder if we can get Steven's patch into
> 4.14 first, for at least we know it mitigated the issue if not
> fundamentally
> Let's first figure out if it works.
I would still like to try applying your patches that went into
printk.git, but for now I wonder if we can get Steven's patch into
4.14 first, for at least we know it mitigated the issue if not
fundamentally addressed it, and we've agreed it's an innocuous
On (11/01/18 09:05), Daniel Wang wrote:
> > Another deadlock scenario could be the following one:
> >
> > printk()
> >  console_trylock()
> >   down_trylock()
> >    raw_spin_lock_irqsave(&sem->lock, flags)
> >
> > panic()
> >
On Mon, Oct 22, 2018 at 3:10 AM Sergey Senozhatsky wrote:
> Another deadlock scenario could be the following one:
>
> printk()
>  console_trylock()
>   down_trylock()
>    raw_spin_lock_irqsave(&sem->lock, flags)
>
> panic()
>
On (10/21/18 11:09), Daniel Wang wrote:
>
> Just got back from vacation. Thanks for the continued discussion. Just so
> I understand the current state. Looks like we've got a pretty good explanation
> of what's going on (though not completely sure), and backporting Steven's
> patches is still the
On Sun 2018-10-21 11:09:22, Daniel Wang wrote:
> Just got back from vacation. Thanks for the continued discussion. Just so
> I understand the current state. Looks like we've got a pretty good explanation
> of what's going on (though not completely sure), and backporting Steven's
> patches is still
Just got back from vacation. Thanks for the continued discussion. Just so
I understand the current state. Looks like we've got a pretty good explanation
of what's going on (though not completely sure), and backporting Steven's
patches is still the way to go? I see that Sergey had sent an RFC
On (10/04/18 10:36), Petr Mladek wrote:
>
> This looks like a reasonable explanation of what is happening here.
> It also explains why the console owner logic helped.
Well, I'm still a bit puzzled, frankly speaking. I've two theories.
Theory #1 [most likely]
Steven is a wizard and his code
On Thu 2018-10-04 16:44:42, Sergey Senozhatsky wrote:
> On (10/03/18 11:37), Daniel Wang wrote:
> > When `softlockup_panic` is set (which is what my original repro had and
> > what we use in production), without the backport patch, the expected panic
> > would hit a seemingly deadlock. So even
On (10/04/18 16:44), Sergey Senozhatsky wrote:
> So... Just an idea. Can you try a very dirty hack? Forcibly increase
> oops_in_progress in panic() before console_flush_on_panic(), so 8250
> serial8250_console_write() will use spin_trylock_irqsave() and maybe
> avoid deadlock.
E.g. something like
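Sergey's actual patch is cut off in this listing and is not reproduced here. As a separate illustration only, the following is a small user-space C model of the locking choice his message describes: a console writer that only tries the port lock when oops_in_progress is set, and otherwise takes the blocking path. Every name in it (model_lock, model_trylock, model_console_write, the write_result values) is hypothetical and is not kernel API; the real driver code may differ in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a spinlock; this does not model real kernel locking. */
struct model_lock {
    bool held;
};

enum write_result {
    WROTE_WITH_LOCK,     /* lock was free and was taken for the write */
    WROTE_WITHOUT_LOCK,  /* trylock failed, output was emitted anyway */
    WOULD_SPIN_FOREVER   /* blocking path on a lock nobody will release */
};

/* Non-blocking acquire: succeeds only if the lock is currently free. */
static bool model_trylock(struct model_lock *l)
{
    if (l->held)
        return false;
    l->held = true;
    return true;
}

static void model_unlock(struct model_lock *l)
{
    l->held = false;
}

/*
 * Hypothetical model of the writer. With oops_in_progress set it only
 * tries the lock and carries on if that fails. Without it, the blocking
 * path is taken, which cannot make progress if the holder never runs
 * again (the assumption here being that the holder was stopped by the
 * panic path).
 */
static enum write_result model_console_write(struct model_lock *port_lock,
                                             int oops_in_progress)
{
    assert(port_lock != NULL);

    if (oops_in_progress) {
        bool locked = model_trylock(port_lock);
        /* ... characters would be pushed to the UART here ... */
        if (locked) {
            model_unlock(port_lock);
            return WROTE_WITH_LOCK;
        }
        return WROTE_WITHOUT_LOCK;
    }

    if (port_lock->held)
        return WOULD_SPIN_FOREVER; /* stands in for a spin that never ends */

    port_lock->held = true;
    /* ... characters would be pushed to the UART here ... */
    model_unlock(port_lock);
    return WROTE_WITH_LOCK;
}
```

In this model, a lock left held by a stopped CPU makes the blocking path report WOULD_SPIN_FOREVER, while the oops_in_progress path still produces output at the cost of writing without the lock. That trade-off is presumably why the message calls the idea "a very dirty hack".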
On Wed 2018-10-03 13:37:04, Steven Rostedt wrote:
> On Wed, 3 Oct 2018 10:16:08 -0700
> Daniel Wang wrote:
>
> > On Wed, Oct 3, 2018 at 2:14 AM Petr Mladek wrote:
> > >
> > > On Tue 2018-10-02 21:23:27, Steven Rostedt wrote:
> > > > I don't see the big deal of backporting this. The biggest
On (10/03/18 11:37), Daniel Wang wrote:
> When `softlockup_panic` is set (which is what my original repro had and
> what we use in production), without the backport patch, the expected panic
> would hit a seemingly deadlock. So even when the machine is configured
> to reboot immediately after the
I wanted to let you know that I am leaving for a two-week vacation. So
if you don't hear from me during that period assume bad network
connectivity and not lack of enthusiasm. :) Feel free to go with the
backports if we reach an agreement here. Otherwise I'll do it when I get
back. Thank you all!
On Wed, Oct 3, 2018 at 10:37 AM Steven Rostedt wrote:
> Just so I understand correctly. Does the panic hit with and without the
> suggested backport patch? The only difference is that you get the full
> output with the patch and limited output without it?
When `softlockup_panic` is set (which is
On Wed, 3 Oct 2018 10:16:08 -0700
Daniel Wang wrote:
> On Wed, Oct 3, 2018 at 2:14 AM Petr Mladek wrote:
> >
> > On Tue 2018-10-02 21:23:27, Steven Rostedt wrote:
> > > I don't see the big deal of backporting this. The biggest complaints
> > > about backports are from fixes that were added to
On Wed, Oct 3, 2018 at 2:14 AM Petr Mladek wrote:
>
> On Tue 2018-10-02 21:23:27, Steven Rostedt wrote:
> > I don't see the big deal of backporting this. The biggest complaints
> > about backports are from fixes that were added to late -rc releases
> > where the fixes didn't get much testing.
On Tue 2018-10-02 21:23:27, Steven Rostedt wrote:
> On Tue, 2 Oct 2018 17:15:17 -0700
> Daniel Wang wrote:
>
> > On Tue, Oct 2, 2018 at 1:42 AM Petr Mladek wrote:
> > >
> > > Well, I still wonder why it helped and why you do not see it with 4.4.
> > > I have a feeling that the console owner
On Tue, 2 Oct 2018 17:15:17 -0700
Daniel Wang wrote:
> On Tue, Oct 2, 2018 at 1:42 AM Petr Mladek wrote:
> >
> > Well, I still wonder why it helped and why you do not see it with 4.4.
> > I have a feeling that the console owner switch helped only by chance.
> > In fact, you might be affected by
On Tue, Oct 2, 2018 at 1:42 AM Petr Mladek wrote:
>
> Well, I still wonder why it helped and why you do not see it with 4.4.
> I have a feeling that the console owner switch helped only by chance.
> In fact, you might be affected by a race in
> printk_safe_flush_on_panic() that was fixed by the
On Tue, Oct 2, 2018 at 1:42 AM Petr Mladek wrote:
> Well, I still wonder why it helped and why you do not see it with 4.4.
> I have a feeling that the console owner switch helped only by chance.
So do I. I don't think Steven had the deadlock in mind when working on
that patch, but with that
On Mon 2018-10-01 13:37:30, Daniel Wang wrote:
> On Mon, Oct 1, 2018 at 12:23 PM Steven Rostedt wrote:
> >
> > > Serial console logs leading up to the deadlock. As can be seen the stack
> > > trace
> > > was incomplete because the printing path hit a timeout.
> >
> > I'm fine with having this
On (09/27/18 12:46), Daniel Wang wrote:
> Prior to this change, the combination of `softlockup_panic=1` and
> `softlockup_all_cpu_stacktrace=1` may result in a deadlock when the reboot
> path
> is trying to grab the console lock that is held by the stack trace printing
> path. What seems to be
On Mon, Oct 01, 2018 at 01:37:30PM -0700, Daniel Wang wrote:
>On Mon, Oct 1, 2018 at 12:23 PM Steven Rostedt wrote:
>>
>> > Serial console logs leading up to the deadlock. As can be seen the stack
>> > trace
>> > was incomplete because the printing path hit a timeout.
>>
>> I'm fine with having
On Mon, Oct 1, 2018 at 1:23 PM Vlastimil Babka wrote:
>
> On 10/1/18 10:13 PM, Pavel Machek wrote:
> >
> > Dunno. Is the patch perhaps a bit too complex? This is not exactly
> > trivial bugfix.
> >
> > pavel@duo:/data/l/clean-cg$ git show dbdda842fe96f | diffstat
> > printk.c | 108
> >
On Mon, Oct 1, 2018 at 12:23 PM Steven Rostedt wrote:
>
> > Serial console logs leading up to the deadlock. As can be seen the stack
> > trace
> > was incomplete because the printing path hit a timeout.
>
> I'm fine with having this backported.
Thanks. I can send the cherrypicks your way. Do
On Mon, 1 Oct 2018 22:13:10 +0200
Pavel Machek wrote:
> > > [1]
> > > https://lore.kernel.org/lkml/20180409081535.dq7p5bfnpvd3x...@pathway.suse.cz/T/#u
> > >
> > > Serial console logs leading up to the deadlock. As can be seen the stack
> > > trace
> > > was incomplete because the printing
On 10/1/18 10:13 PM, Pavel Machek wrote:
>
> Dunno. Is the patch perhaps a bit too complex? This is not exactly
> trivial bugfix.
>
> pavel@duo:/data/l/clean-cg$ git show dbdda842fe96f | diffstat
> printk.c | 108
> ++-
>
> I see
On Mon 2018-10-01 15:23:24, Steven Rostedt wrote:
> On Thu, 27 Sep 2018 12:46:01 -0700
> Daniel Wang wrote:
>
> > Prior to this change, the combination of `softlockup_panic=1` and
> > `softlockup_all_cpu_stacktrace=1` may result in a deadlock when the reboot
> > path
> > is trying to grab the
On Thu, 27 Sep 2018 12:46:01 -0700
Daniel Wang wrote:
> Prior to this change, the combination of `softlockup_panic=1` and
> `softlockup_all_cpu_stacktrace=1` may result in a deadlock when the reboot
> path
> is trying to grab the console lock that is held by the stack trace printing
> path.