On (09/06/19 17:32), Petr Mladek wrote:
> > [..]
> > > I mean, really, do we need to keep calling wake up if it
> > > probably never even executed?
> >
> > I guess ratelimiting you are talking about ("if it probably never even
> > executed") would be to check if we have already called wake up on
On 2019/09/04 17:25, Michal Hocko wrote:
> On Wed 04-09-19 16:00:42, Sergey Senozhatsky wrote:
>> On (09/04/19 15:41), Sergey Senozhatsky wrote:
>>> But the thing is different in case of dump_stack() + show_mem() +
>>> some other output. Because now we ratelimit not a single printk() line,
>>> but
On Fri, 2019-09-06 at 13:32 +0900, Sergey Senozhatsky wrote:
> On (09/05/19 12:03), Qian Cai wrote:
> > > ---
> > > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > > index cd51aa7d08a9..89cb47882254 100644
> > > --- a/kernel/printk/printk.c
> > > +++ b/kernel/printk/printk.c
> > >
On (09/06/19 16:55), Petr Mladek wrote:
> > I think we can queue significantly much less irq_work-s from printk().
> >
> > Petr, Steven, what do you think?
> >
> > Something like this. Call wake_up_interruptible(), switch to
> > wake_up_klogd() only when called from sched code.
>
> Replacing
On Fri 2019-09-06 12:39:00, Sergey Senozhatsky wrote:
> On (09/05/19 13:23), Steven Rostedt wrote:
> > > I think we can queue significantly much less irq_work-s from printk().
> > >
> > > Petr, Steven, what do you think?
>
> [..]
> > I mean, really, do we need to keep calling wake up if it
> >
On Thu 2019-09-05 20:32:08, Sergey Senozhatsky wrote:
> On (09/04/19 16:42), Qian Cai wrote:
> > > Let me think more.
> >
> > To summary, those look to me are all good long-term improvement that would
> > reduce the likelihood of this kind of livelock in general especially for
> > other
> >
On (09/05/19 12:03), Qian Cai wrote:
> > ---
> > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > index cd51aa7d08a9..89cb47882254 100644
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -2027,8 +2027,11 @@ asmlinkage int vprintk_emit(int facility, int
On (09/05/19 13:23), Steven Rostedt wrote:
> > I think we can queue significantly much less irq_work-s from printk().
> >
> > Petr, Steven, what do you think?
[..]
> I mean, really, do we need to keep calling wake up if it
> probably never even executed?
I guess ratelimiting you are talking
On (09/05/19 13:14), Steven Rostedt wrote:
> > Hmm, from the article,
> >
> > https://en.wikipedia.org/wiki/Universal_asynchronous_receiver-transmitter
> >
> > "Since transmission of a single or multiple characters may take a long time
> > relative to CPU speeds, a UART maintains a flag showing
On Thu, 5 Sep 2019 20:32:08 +0900
Sergey Senozhatsky wrote:
> I think we can queue significantly much less irq_work-s from printk().
>
> Petr, Steven, what do you think?
What if we just rate limit the wake ups of klogd? I mean, really, do we
need to keep calling wake up if it probably never
On Thu, 05 Sep 2019 12:03:13 -0400
Qian Cai wrote:
> > > and could deal with console hardware that involve irq_exit() anyway.
> >
> > printk->console_driver->write() does not involve irq.
>
> Hmm, from the article,
>
>
On Thu, 2019-09-05 at 20:32 +0900, Sergey Senozhatsky wrote:
> On (09/04/19 16:42), Qian Cai wrote:
> > > Let me think more.
> >
> > To summary, those look to me are all good long-term improvement that would
> > reduce the likelihood of this kind of livelock in general especially for
> > other
>
On 9/5/19 4:09 PM, Qian Cai wrote:
> Instead of repeatedly make generalize statements, could you enlighten me with
> some concrete examples that have the similar properties which would trigger a
> livelock,
>
> - guaranteed GFP_ATOMIC allocations when processing softirq batches.
> - the
On 9/5/19 4:09 PM, Qian Cai wrote:
>
> I feel like you may not follow the thread closely. There are more details
> uncovered in the last few days and narrowed down to the culprits.
>
I have followed the thread closely, thank you very much.
I am happy that the problem is addressed as I
On Thu, 2019-09-05 at 10:32 +0200, Eric Dumazet wrote:
>
> On 9/4/19 10:42 PM, Qian Cai wrote:
>
> > To summary, those look to me are all good long-term improvement that would
> > reduce the likelihood of this kind of livelock in general especially for
> > other
> > unknown allocations that
On (09/04/19 16:42), Qian Cai wrote:
> > Let me think more.
>
> To summary, those look to me are all good long-term improvement that would
> reduce the likelihood of this kind of livelock in general especially for other
> unknown allocations that happen while processing softirqs, but it is still
On 9/4/19 10:42 PM, Qian Cai wrote:
> To summary, those look to me are all good long-term improvement that would
> reduce the likelihood of this kind of livelock in general especially for other
> unknown allocations that happen while processing softirqs, but it is still up
> to
> the air if
On Wed, 2019-09-04 at 23:48 +0900, Sergey Senozhatsky wrote:
> On (09/04/19 08:14), Qian Cai wrote:
> > > Plus one more check - waitqueue_active(&log_wait). printk() adds
> > > pending irq_work only if there is a user-space process sleeping on
> > > log_wait and irq_work is not already scheduled. If
On Wed, 2019-09-04 at 23:48 +0900, Sergey Senozhatsky wrote:
> On (09/04/19 08:14), Qian Cai wrote:
> > > Plus one more check - waitqueue_active(&log_wait). printk() adds
> > > pending irq_work only if there is a user-space process sleeping on
> > > log_wait and irq_work is not already scheduled. If
On (09/04/19 08:14), Qian Cai wrote:
> > Plus one more check - waitqueue_active(&log_wait). printk() adds
> > pending irq_work only if there is a user-space process sleeping on
> > log_wait and irq_work is not already scheduled. If the syslog is
> > active or there is no one to wake up then we don't
On Wed, 2019-09-04 at 14:07 +0200, Michal Hocko wrote:
> On Wed 04-09-19 07:59:17, Qian Cai wrote:
> > On Wed, 2019-09-04 at 10:25 +0200, Michal Hocko wrote:
> > > On Wed 04-09-19 16:00:42, Sergey Senozhatsky wrote:
> > > > On (09/04/19 15:41), Sergey Senozhatsky wrote:
> > > > > But the thing is
On Wed, 2019-09-04 at 16:43 +0900, Sergey Senozhatsky wrote:
> On (09/04/19 16:19), Sergey Senozhatsky wrote:
> > Hmm. I need to look at this more... wake_up_klogd() queues work only once
> > on particular CPU: irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> >
> > bool irq_work_queue()
> > {
> >
On Wed 04-09-19 07:59:17, Qian Cai wrote:
> On Wed, 2019-09-04 at 10:25 +0200, Michal Hocko wrote:
> > On Wed 04-09-19 16:00:42, Sergey Senozhatsky wrote:
> > > On (09/04/19 15:41), Sergey Senozhatsky wrote:
> > > > But the thing is different in case of dump_stack() + show_mem() +
> > > > some
On Wed, 2019-09-04 at 10:25 +0200, Michal Hocko wrote:
> On Wed 04-09-19 16:00:42, Sergey Senozhatsky wrote:
> > On (09/04/19 15:41), Sergey Senozhatsky wrote:
> > > But the thing is different in case of dump_stack() + show_mem() +
> > > some other output. Because now we ratelimit not a single
On Wed 04-09-19 16:00:42, Sergey Senozhatsky wrote:
> On (09/04/19 15:41), Sergey Senozhatsky wrote:
> > But the thing is different in case of dump_stack() + show_mem() +
> > some other output. Because now we ratelimit not a single printk() line,
> > but hundreds of them. The ratelimit becomes -
On (09/04/19 16:19), Sergey Senozhatsky wrote:
> Hmm. I need to look at this more... wake_up_klogd() queues work only once
> on particular CPU: irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
>
> bool irq_work_queue()
> {
> /* Only queue if not already pending */
> if
On (09/04/19 08:54), Michal Hocko wrote:
> I am sorry, I could have been more explicit when CCing you.
Oh, sorry! My bad!
> Sure the ratelimit is part of the problem. But I was more interested
> in the potential livelock (infinite loop) mentioned by Qian Cai. It
> is not important whether we
On (09/04/19 15:41), Sergey Senozhatsky wrote:
> But the thing is different in case of dump_stack() + show_mem() +
> some other output. Because now we ratelimit not a single printk() line,
> but hundreds of them. The ratelimit becomes - 10 * $$$ lines in 5 seconds
> (IOW, now we talk about
On Wed 04-09-19 15:41:44, Sergey Senozhatsky wrote:
> On (09/04/19 08:15), Michal Hocko wrote:
> > > If you look at the original report, the failed allocation dump_stack() is,
> > >
> > >
> > > warn_alloc.cold.43+0x8a/0x148
> > > __alloc_pages_nodemask+0x1a5c/0x1bb0
> > >
On (09/04/19 08:15), Michal Hocko wrote:
> > If you look at the original report, the failed allocation dump_stack() is,
> >
> >
> > warn_alloc.cold.43+0x8a/0x148
> > __alloc_pages_nodemask+0x1a5c/0x1bb0
> > alloc_pages_current+0x9c/0x110
> > allocate_slab+0x34a/0x11f0
> >
> On Tue, 2019-09-03 at 20:53 +0200, Michal Hocko wrote:
> > On Tue 03-09-19 11:42:22, Qian Cai wrote:
> > > On Tue, 2019-09-03 at 15:22 +0200, Michal Hocko wrote:
> > > > On Fri 30-08-19 18:15:22, Eric Dumazet wrote:
> > > > > If there is a risk of flooding the syslog, we should fix this
> > > >
Cc printk maintainers
On Tue 03-09-19 17:42:28, Qian Cai wrote:
> > > I suppose what happens is those skb_build() allocations are from softirq,
> > > and
> > > once one of them failed, it calls printk() which generates more
> > > interrupts.
> > > Hence, the infinite loop.
> >
> > Please
On Tue, 2019-09-03 at 20:53 +0200, Michal Hocko wrote:
> On Tue 03-09-19 11:42:22, Qian Cai wrote:
> > On Tue, 2019-09-03 at 15:22 +0200, Michal Hocko wrote:
> > > On Fri 30-08-19 18:15:22, Eric Dumazet wrote:
> > > > If there is a risk of flooding the syslog, we should fix this
> > > >
On Tue 03-09-19 11:42:22, Qian Cai wrote:
> On Tue, 2019-09-03 at 15:22 +0200, Michal Hocko wrote:
> > On Fri 30-08-19 18:15:22, Eric Dumazet wrote:
> > > If there is a risk of flooding the syslog, we should fix this generically
> > > in mm layer, not adding hundred of __GFP_NOWARN all over the
On Tue, 2019-09-03 at 15:22 +0200, Michal Hocko wrote:
> On Fri 30-08-19 18:15:22, Eric Dumazet wrote:
> > If there is a risk of flooding the syslog, we should fix this generically
> > in mm layer, not adding hundred of __GFP_NOWARN all over the places.
>
> We do already ratelimit in warn_alloc.
On Fri 30-08-19 18:15:22, Eric Dumazet wrote:
> If there is a risk of flooding the syslog, we should fix this generically
> in mm layer, not adding hundred of __GFP_NOWARN all over the places.
We do already ratelimit in warn_alloc. If it isn't sufficient then we
can think of a different
On 8/30/19 5:25 PM, Qian Cai wrote:
> On Fri, 2019-08-30 at 17:11 +0200, Eric Dumazet wrote:
>>
>> On 8/30/19 4:57 PM, Qian Cai wrote:
>>> When running heavy memory pressure workloads, the system is throwing
>>> endless warnings below due to the allocation could fail from
>>> __build_skb(), and
On Fri, 2019-08-30 at 18:15 +0200, Eric Dumazet wrote:
>
> On 8/30/19 5:25 PM, Qian Cai wrote:
> > On Fri, 2019-08-30 at 17:11 +0200, Eric Dumazet wrote:
> > >
> > > On 8/30/19 4:57 PM, Qian Cai wrote:
> > > > When running heavy memory pressure workloads, the system is throwing
> > > > endless
On 8/30/19 5:25 PM, Qian Cai wrote:
> On Fri, 2019-08-30 at 17:11 +0200, Eric Dumazet wrote:
>>
>> On 8/30/19 4:57 PM, Qian Cai wrote:
>>> When running heavy memory pressure workloads, the system is throwing
>>> endless warnings below due to the allocation could fail from
>>> __build_skb(), and
On Fri, 2019-08-30 at 17:11 +0200, Eric Dumazet wrote:
>
> On 8/30/19 4:57 PM, Qian Cai wrote:
> > When running heavy memory pressure workloads, the system is throwing
> > endless warnings below due to the allocation could fail from
> > __build_skb(), and the volume of this call could be huge
On 8/30/19 4:57 PM, Qian Cai wrote:
> When running heavy memory pressure workloads, the system is throwing
> endless warnings below due to the allocation could fail from
> __build_skb(), and the volume of this call could be huge which may
> generate a lot of serial console output and consumes
When running heavy memory pressure workloads, the system is throwing
endless warnings below due to the allocation could fail from
__build_skb(), and the volume of this call could be huge which may
generate a lot of serial console output and consumes all CPUs as
warn_alloc() could be expensive by