Re: [RFC] printk: allow increasing the ring buffer depending on the number of CPUs

2014-06-16 Thread Chris Metcalf

On 6/12/2014 2:45 PM, Joe Perches wrote:

(adding Chris Metcalf for arch/tile,
  I think this change might impact that arch)


Thanks for the Cc.  I've been following the discussion and I think it's on the 
right track for arch/tile.

--
Chris Metcalf, Tilera Corp.
http://www.tilera.com

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] printk: allow increasing the ring buffer depending on the number of CPUs

2014-06-16 Thread Chris Metcalf

On 6/12/2014 2:45 PM, Joe Perches wrote:

(adding Chris Metcalf for arch/tile,
  I think this change might impact that arch)


Thanks for the Cc.  I've been following the discussion and I think it's on the 
right track for arch/tile.

--
Chris Metcalf, Tilera Corp.
http://www.tilera.com

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] printk: allow increasing the ring buffer depending on the number of CPUs

2014-06-12 Thread Luis R. Rodriguez
On Thu, Jun 12, 2014 at 11:45 AM, Joe Perches  wrote:
>
> trivia: s/loosing/losing/

Amended, thanks.

  Luis
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] printk: allow increasing the ring buffer depending on the number of CPUs

2014-06-12 Thread Luis R. Rodriguez
On Thu, Jun 12, 2014 at 11:01 AM, Davidlohr Bueso  wrote:
> On Wed, 2014-06-11 at 11:34 +0200, Petr Mládek wrote:
>> On Tue 2014-06-10 18:04:45, Luis R. Rodriguez wrote:
>> > From: "Luis R. Rodriguez" 
>> >
>> > The default size of the ring buffer is too small for machines
>> > with a large amount of CPUs under heavy load. What ends up
>> > happening when debugging is the ring buffer overlaps and chews
>> > up old messages making debugging impossible unless the size is
>> > passed as a kernel parameter. An idle system upon boot up will
>> > on average spew out only about one or two extra lines but where
>> > this really matters is on heavy load and that will vary widely
>> > depending on the system and environment.
>>
>> Thanks for looking at this. It is a pity to lose stracktrace when a huge
>> machine Oopses just because the default ring buffer is too small.
>
> Agreed, I would very much welcome something like this.

Great!

>> > There are mechanisms to help increase the kernel ring buffer
>> > for tracing through debugfs, and those interfaces even allow growing
>> > the kernel ring buffer per CPU. We also have a static value which
>> > can be passed upon boot. Relying on debugfs however is not ideal
>> > for production, and relying on the value passed upon bootup is
>> > can only used *after* an issue has creeped up. Instead of being
>> > reactive this adds a proactive measure which lets you scale the
>> > amount of contributions you'd expect to the kernel ring buffer
>> > under load by each CPU in the worst case scenerio.
>> >
>> > We use num_possible_cpus() to avoid complexities which could be
>> > introduced by dynamically changing the ring buffer size at run
>> > time, num_possible_cpus() lets us use the upper limit on possible
>> > number of CPUs therefore avoiding having to deal with hotplugging
>> > CPUs on and off. This option is diabled by default, and if used
>> > the kernel ring buffer size then can be computed as follows:
>> >
>> > size = __LOG_BUF_LEN + (num_possible_cpus() - 1 ) *  __LOG_CPU_BUF_LEN
>> >
>> > Cc: Michal Hocko 
>> > Cc: Petr Mladek 
>> > Cc: Andrew Morton 
>> > Cc: Joe Perches 
>> > Cc: Arun KS 
>> > Cc: Kees Cook 
>> > Cc: linux-kernel@vger.kernel.org
>> > Signed-off-by: Luis R. Rodriguez 
>> > ---
>> >  init/Kconfig   | 28 
>> >  kernel/printk/printk.c |  6 --
>> >  2 files changed, 32 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/init/Kconfig b/init/Kconfig
>> > index 9d3585b..1814436 100644
>> > --- a/init/Kconfig
>> > +++ b/init/Kconfig
>> > @@ -806,6 +806,34 @@ config LOG_BUF_SHIFT
>> >  13 =>  8 KB
>> >  12 =>  4 KB
>> >
>> > +config LOG_CPU_BUF_SHIFT
>> > +   int "CPU kernel log buffer size contribution (13 => 8 KB, 17 => 128KB)"
>> > +   range 0 21
>> > +   default 0
>> > +   help
>> > + The kernel ring buffer will get additional data logged onto it
>> > + when multiple CPUs are supported. Typically the contributions is a
>> > + few lines when idle however under under load this can vary and in the
>> > + worst case it can mean loosing logging information. You can use this
>> > + to set the maximum expected mount of amount of logging contribution
>> > + under load by each CPU in the worst case scenerio. Select a size as
>> > + a power of 2. For example if LOG_BUF_SHIFT is 18 and if your
>> > + LOG_CPU_BUF_SHIFT is 12 your kernel ring buffer size will be as
>> > + follows having 16 CPUs as possible.
>> > +
>> > +((1 << 18) + ((16 - 1) * (1 << 12))) / 1024 = 316 KB
>>
>> It might be better to use the CPU_NUM-specific value as a minimum of
>> the needed space. Linux distributions might want to distribute kernel
>> with non-zero value and still use the static "__log_buf" on reasonable
>> small systems.
>
> It should also depend on SMP and !BASE_SMALL.
> I was wondering about disabling this by default as it would defeat the
> purpose of being a proactive feature. Similarly, I worry about distros
> choosing a correct default value on their own.

True, it seems Petr's recommendations would address these concerns for
systems under a certain amount of limit of number of CPUs, as is right
now we require the contribution by CPU in worst case scenario to be >
1/2 of the default kernel ring buffer size that's > 64 number of CPUs.

>> > + Where as typically you'd only end up with 256 KB. This is disabled
>> > + by default with a value of 0.
>>
>> I would add:
>>
>>   This value is ignored when "log_buf_len" commandline parameter
>>   is used. It forces the exact size of the ring buffer.
>
> ... and update Documentation/kernel-parameters.txt to be more
> descriptive about this new functionality.

Will do!

>> > + Examples:
>> > +17 => 128 KB
>> > +16 => 64 KB
>> > +15 => 32 KB
>> > +14 => 16 KB
>> > +13 =>  8 KB
>> > +12 =>  4 KB
>>
>> I think 

Re: [RFC] printk: allow increasing the ring buffer depending on the number of CPUs

2014-06-12 Thread Luis R. Rodriguez
On Thu, Jun 12, 2014 at 03:05:32PM +0200, Petr Mládek wrote:
> On Wed 2014-06-11 23:47:41, Luis R. Rodriguez wrote:
> > On Wed, Jun 11, 2014 at 11:34:47AM +0200, Petr Mládek wrote:
> > > On Tue 2014-06-10 18:04:45, Luis R. Rodriguez wrote:
> > > > From: "Luis R. Rodriguez" 
> > > > diff --git a/init/Kconfig b/init/Kconfig
> > > > index 9d3585b..1814436 100644
> > > > --- a/init/Kconfig
> > > > +++ b/init/Kconfig
> > > > @@ -806,6 +806,34 @@ config LOG_BUF_SHIFT
> > > >  13 =>  8 KB
> > > >  12 =>  4 KB
> > > >  
> > > > +config LOG_CPU_BUF_SHIFT
> > > > +   int "CPU kernel log buffer size contribution (13 => 8 KB, 17 => 
> > > > 128KB)"
> > > > +   range 0 21
> > > > +   default 0
> > > > +   help
> > > > + The kernel ring buffer will get additional data logged onto it
> > > > + when multiple CPUs are supported. Typically the contributions 
> > > > is a
> > > > + few lines when idle however under under load this can vary 
> > > > and in the
> > > > + worst case it can mean loosing logging information. You can 
> > > > use this
> > > > + to set the maximum expected mount of amount of logging 
> > > > contribution
> > > > + under load by each CPU in the worst case scenerio. Select a 
> > > > size as
> > > > + a power of 2. For example if LOG_BUF_SHIFT is 18 and if your
> > > > + LOG_CPU_BUF_SHIFT is 12 your kernel ring buffer size will be 
> > > > as
> > > > + follows having 16 CPUs as possible.
> > > > +
> > > > +((1 << 18) + ((16 - 1) * (1 << 12))) / 1024 = 316 KB
> > > 
> > > It might be better to use the CPU_NUM-specific value as a minimum of
> > > the needed space. Linux distributions might want to distribute kernel
> > > with non-zero value and still use the static "__log_buf" on reasonable
> > > small systems.
> > 
> > Not sure if I follow what you mean by CPU_NUM-specific, can you
> > elaborate?
> 
> I wanted to say that the space requested by LOG_CPU_BUF_SHIFT depends
> on the number of CPUs. If LOG_CPU_BUF_SHIFT is not zero, your
> patch always allocates new ringbuffer and leave the static "__log_buf"
> unused. I think that this is not necessary for machines with small
> amount of CPUs and probably also with small amount of memory.

True, which is why I disabled it by default if we want to only leave
this disabled for < certain amount of num CPU systems, what is that
number, I see below a recommendation and I do like it.

> I would rename the variable to LOG_CPU_BUF_MIN_SHIFT or so. It would
> represent minimal size that is needed to print CPU-specific
> messages. If they take only "small" part of the default ring buffer
> size, we could still use the default rind buffer.

True, and will rename this, that still leaves open the question of a
number of CPUs that is sensible to keep but you resolve that below.

> For example, if we left 50% of the default buffer for CPU-specific
> messages, the code might look like:
> 
>   #define __LOG_CPU_MIN_BUF_LEN (1 << CONFIG_LOG_CPU_MIN_BUF_SHIFT)
> 
>   int cpu_extra = (num_possible_cpus() - 1) * __LOG_CPU_MIN_BUF_LEN;
> 
>   if (!new_log_buf_len && (cpu_extra > __LOG_BUF_LEN / 2))
>   new_log_buf_len = __LOG_BUF_LEN + cpu_extra;
> 
>   if (!new_log_buf_len)
>   return;
> 
>   allocate the new ring buffer...

Yeah I like these heuristics a lot, will fold them in and send a v2 now in 
patch form.
To be clear with this CONFIG_LOG_CPU_MIN_BUF_SHIFT could actually be left now to
something other than non zeo by default and only if that contribution is seen to
go above 1/2 of __LOG_BUF_LEN will we allocate more for the ring buffer. With
default values of LOG_BUF_SHIFT at 18 and say a default value of 12 for
LOG_CPU_MIN_BUF_SHIFT this would mean we'd need if we remove the -1 we'd require
64 CPUs in order to trigger an allocation for more memory. If we keep the -1 
we'd
require anything over 64 number of CPUs. Do we want to keep the -1 and the > 64
CPU requirement as default? Is the LOG_CPU_MIN_BUF_SHIFT default of 12 resonable
to start with (assumes 4KB in the worst case before the kernel ring buffer 
flips over).

> > The default in this patch is to ignore this, do you mean that upstream
> > should probably default to a non-zero value here and then let distributions
> > select 0 for some kernel builds ?
> 
> If the change has effect only for huge systems, the default value
> might be non-zero everywhere.

Sure.

> > If so then perhaps adding a sysctl override value might be good to
> > allow only small systems to override this to 0?
> 
> I think that it won't help to lover the value using sysctl because the
> huge buffer would be already allocated during boot. If I did not miss 
> anything.
> 
> [...]

Yeah true, a sensible default would be best, with the systctl we'd also have
to handle dynamic re-allocations and while the tracing code already added
code to make this easier I'd 

Re: [RFC] printk: allow increasing the ring buffer depending on the number of CPUs

2014-06-12 Thread Joe Perches
(adding Chris Metcalf for arch/tile,
 I think this change might impact that arch)

On Thu, 2014-06-12 at 11:01 -0700, Davidlohr Bueso wrote:
> On Wed, 2014-06-11 at 11:34 +0200, Petr Mládek wrote:
> > On Tue 2014-06-10 18:04:45, Luis R. Rodriguez wrote:
> > > From: "Luis R. Rodriguez" 
> > > 
> > > The default size of the ring buffer is too small for machines
> > > with a large amount of CPUs under heavy load. What ends up
> > > happening when debugging is the ring buffer overlaps and chews
> > > up old messages making debugging impossible unless the size is
> > > passed as a kernel parameter. An idle system upon boot up will
> > > on average spew out only about one or two extra lines but where
> > > this really matters is on heavy load and that will vary widely
> > > depending on the system and environment.
> > 
> > Thanks for looking at this. It is a pity to lose stracktrace when a huge
> > machine Oopses just because the default ring buffer is too small.
> 
> Agreed, I would very much welcome something like this.
> 
> > > There are mechanisms to help increase the kernel ring buffer
> > > for tracing through debugfs, and those interfaces even allow growing
> > > the kernel ring buffer per CPU. We also have a static value which
> > > can be passed upon boot. Relying on debugfs however is not ideal
> > > for production, and relying on the value passed upon bootup is
> > > can only used *after* an issue has creeped up. Instead of being
> > > reactive this adds a proactive measure which lets you scale the
> > > amount of contributions you'd expect to the kernel ring buffer
> > > under load by each CPU in the worst case scenerio.
> > > 
> > > We use num_possible_cpus() to avoid complexities which could be
> > > introduced by dynamically changing the ring buffer size at run
> > > time, num_possible_cpus() lets us use the upper limit on possible
> > > number of CPUs therefore avoiding having to deal with hotplugging
> > > CPUs on and off. This option is diabled by default, and if used
> > > the kernel ring buffer size then can be computed as follows:
> > > 
> > > size = __LOG_BUF_LEN + (num_possible_cpus() - 1 ) *  __LOG_CPU_BUF_LEN
> > > 
> > > Cc: Michal Hocko 
> > > Cc: Petr Mladek 
> > > Cc: Andrew Morton 
> > > Cc: Joe Perches 
> > > Cc: Arun KS 
> > > Cc: Kees Cook 
> > > Cc: linux-kernel@vger.kernel.org
> > > Signed-off-by: Luis R. Rodriguez 
> > > ---
> > >  init/Kconfig   | 28 
> > >  kernel/printk/printk.c |  6 --
> > >  2 files changed, 32 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/init/Kconfig b/init/Kconfig
> > > index 9d3585b..1814436 100644
> > > --- a/init/Kconfig
> > > +++ b/init/Kconfig
> > > @@ -806,6 +806,34 @@ config LOG_BUF_SHIFT
> > >13 =>  8 KB
> > >12 =>  4 KB
> > >  
> > > +config LOG_CPU_BUF_SHIFT
> > > + int "CPU kernel log buffer size contribution (13 => 8 KB, 17 => 128KB)"
> > > + range 0 21
> > > + default 0
> > > + help
> > > +   The kernel ring buffer will get additional data logged onto it
> > > +   when multiple CPUs are supported. Typically the contributions is a
> > > +   few lines when idle however under under load this can vary and in the
> > > +   worst case it can mean loosing logging information. You can use this

trivia: s/loosing/losing/

> > > +   to set the maximum expected mount of amount of logging contribution
> > > +   under load by each CPU in the worst case scenerio. Select a size as
> > > +   a power of 2. For example if LOG_BUF_SHIFT is 18 and if your
> > > +   LOG_CPU_BUF_SHIFT is 12 your kernel ring buffer size will be as
> > > +   follows having 16 CPUs as possible.
> > > +
> > > +  ((1 << 18) + ((16 - 1) * (1 << 12))) / 1024 = 316 KB
> > 
> > It might be better to use the CPU_NUM-specific value as a minimum of
> > the needed space. Linux distributions might want to distribute kernel
> > with non-zero value and still use the static "__log_buf" on reasonable
> > small systems.
> 
> It should also depend on SMP and !BASE_SMALL.
> I was wondering about disabling this by default as it would defeat the
> purpose of being a proactive feature. Similarly, I worry about distros
> choosing a correct default value on their own.
> 
> > > +   Where as typically you'd only end up with 256 KB. This is disabled
> > > +   by default with a value of 0.
> > 
> > I would add:
> > 
> > This value is ignored when "log_buf_len" commandline parameter
> > is used. It forces the exact size of the ring buffer.
> 
> ... and update Documentation/kernel-parameters.txt to be more
> descriptive about this new functionality.
> 
> > > +   Examples:
> > > +  17 => 128 KB
> > > +  16 => 64 KB
> > > +  15 => 32 KB
> > > +  14 => 16 KB
> > > +  13 =>  8 KB
> > > +  12 =>  4 KB
> > 
> > I think that we should make it more cleat that it is per-CPU here,
> > for example:
> > 
> > 17 => 128 KB for each CPU
> > 

Re: [RFC] printk: allow increasing the ring buffer depending on the number of CPUs

2014-06-12 Thread Davidlohr Bueso

On Wed, 2014-06-11 at 11:34 +0200, Petr Mládek wrote:
> On Tue 2014-06-10 18:04:45, Luis R. Rodriguez wrote:
> > From: "Luis R. Rodriguez" 
> > 
> > The default size of the ring buffer is too small for machines
> > with a large amount of CPUs under heavy load. What ends up
> > happening when debugging is the ring buffer overlaps and chews
> > up old messages making debugging impossible unless the size is
> > passed as a kernel parameter. An idle system upon boot up will
> > on average spew out only about one or two extra lines but where
> > this really matters is on heavy load and that will vary widely
> > depending on the system and environment.
> 
> Thanks for looking at this. It is a pity to lose stracktrace when a huge
> machine Oopses just because the default ring buffer is too small.

Agreed, I would very much welcome something like this.

> > There are mechanisms to help increase the kernel ring buffer
> > for tracing through debugfs, and those interfaces even allow growing
> > the kernel ring buffer per CPU. We also have a static value which
> > can be passed upon boot. Relying on debugfs however is not ideal
> > for production, and relying on the value passed upon bootup is
> > can only used *after* an issue has creeped up. Instead of being
> > reactive this adds a proactive measure which lets you scale the
> > amount of contributions you'd expect to the kernel ring buffer
> > under load by each CPU in the worst case scenerio.
> > 
> > We use num_possible_cpus() to avoid complexities which could be
> > introduced by dynamically changing the ring buffer size at run
> > time, num_possible_cpus() lets us use the upper limit on possible
> > number of CPUs therefore avoiding having to deal with hotplugging
> > CPUs on and off. This option is diabled by default, and if used
> > the kernel ring buffer size then can be computed as follows:
> > 
> > size = __LOG_BUF_LEN + (num_possible_cpus() - 1 ) *  __LOG_CPU_BUF_LEN
> > 
> > Cc: Michal Hocko 
> > Cc: Petr Mladek 
> > Cc: Andrew Morton 
> > Cc: Joe Perches 
> > Cc: Arun KS 
> > Cc: Kees Cook 
> > Cc: linux-kernel@vger.kernel.org
> > Signed-off-by: Luis R. Rodriguez 
> > ---
> >  init/Kconfig   | 28 
> >  kernel/printk/printk.c |  6 --
> >  2 files changed, 32 insertions(+), 2 deletions(-)
> > 
> > diff --git a/init/Kconfig b/init/Kconfig
> > index 9d3585b..1814436 100644
> > --- a/init/Kconfig
> > +++ b/init/Kconfig
> > @@ -806,6 +806,34 @@ config LOG_BUF_SHIFT
> >  13 =>  8 KB
> >  12 =>  4 KB
> >  
> > +config LOG_CPU_BUF_SHIFT
> > +   int "CPU kernel log buffer size contribution (13 => 8 KB, 17 => 128KB)"
> > +   range 0 21
> > +   default 0
> > +   help
> > + The kernel ring buffer will get additional data logged onto it
> > + when multiple CPUs are supported. Typically the contributions is a
> > + few lines when idle however under under load this can vary and in the
> > + worst case it can mean loosing logging information. You can use this
> > + to set the maximum expected mount of amount of logging contribution
> > + under load by each CPU in the worst case scenerio. Select a size as
> > + a power of 2. For example if LOG_BUF_SHIFT is 18 and if your
> > + LOG_CPU_BUF_SHIFT is 12 your kernel ring buffer size will be as
> > + follows having 16 CPUs as possible.
> > +
> > +((1 << 18) + ((16 - 1) * (1 << 12))) / 1024 = 316 KB
> 
> It might be better to use the CPU_NUM-specific value as a minimum of
> the needed space. Linux distributions might want to distribute kernel
> with non-zero value and still use the static "__log_buf" on reasonable
> small systems.

It should also depend on SMP and !BASE_SMALL.
I was wondering about disabling this by default as it would defeat the
purpose of being a proactive feature. Similarly, I worry about distros
choosing a correct default value on their own.

> > + Where as typically you'd only end up with 256 KB. This is disabled
> > + by default with a value of 0.
> 
> I would add:
> 
>   This value is ignored when "log_buf_len" commandline parameter
>   is used. It forces the exact size of the ring buffer.

... and update Documentation/kernel-parameters.txt to be more
descriptive about this new functionality.

> > + Examples:
> > +17 => 128 KB
> > +16 => 64 KB
> > +15 => 32 KB
> > +14 => 16 KB
> > +13 =>  8 KB
> > +12 =>  4 KB
> 
> I think that we should make it more cleat that it is per-CPU here,
> for example:
> 
>   17 => 128 KB for each CPU
>   16 =>  64 KB for each CPU
>   15 =>  32 KB for each CPU
>   14 =>  16 KB for each CPU
>   13 =>   8 KB for each CPU
>   12 =>   4 KB for each CPU
> 

Agreed.

> >  #
> >  # Architectures with an unreliable sched_clock() should select this:
> >  #
> > diff --git 

Re: [RFC] printk: allow increasing the ring buffer depending on the number of CPUs

2014-06-12 Thread Petr Mládek
On Wed 2014-06-11 23:47:41, Luis R. Rodriguez wrote:
> On Wed, Jun 11, 2014 at 11:34:47AM +0200, Petr Mládek wrote:
> > On Tue 2014-06-10 18:04:45, Luis R. Rodriguez wrote:
> > > From: "Luis R. Rodriguez" 
> > > diff --git a/init/Kconfig b/init/Kconfig
> > > index 9d3585b..1814436 100644
> > > --- a/init/Kconfig
> > > +++ b/init/Kconfig
> > > @@ -806,6 +806,34 @@ config LOG_BUF_SHIFT
> > >13 =>  8 KB
> > >12 =>  4 KB
> > >  
> > > +config LOG_CPU_BUF_SHIFT
> > > + int "CPU kernel log buffer size contribution (13 => 8 KB, 17 => 128KB)"
> > > + range 0 21
> > > + default 0
> > > + help
> > > +   The kernel ring buffer will get additional data logged onto it
> > > +   when multiple CPUs are supported. Typically the contributions is a
> > > +   few lines when idle however under under load this can vary and in the
> > > +   worst case it can mean loosing logging information. You can use this
> > > +   to set the maximum expected mount of amount of logging contribution
> > > +   under load by each CPU in the worst case scenerio. Select a size as
> > > +   a power of 2. For example if LOG_BUF_SHIFT is 18 and if your
> > > +   LOG_CPU_BUF_SHIFT is 12 your kernel ring buffer size will be as
> > > +   follows having 16 CPUs as possible.
> > > +
> > > +  ((1 << 18) + ((16 - 1) * (1 << 12))) / 1024 = 316 KB
> > 
> > It might be better to use the CPU_NUM-specific value as a minimum of
> > the needed space. Linux distributions might want to distribute kernel
> > with non-zero value and still use the static "__log_buf" on reasonable
> > small systems.
> 
> Not sure if I follow what you mean by CPU_NUM-specific, can you
> elaborate?

I wanted to say that the space requested by LOG_CPU_BUF_SHIFT depends
on the number of CPUs. If LOG_CPU_BUF_SHIFT is not zero, your
patch always allocates new ringbuffer and leave the static "__log_buf"
unused. I think that this is not necessary for machines with small
amount of CPUs and probably also with small amount of memory.

I would rename the variable to LOG_CPU_BUF_MIN_SHIFT or so. It would
represent minimal size that is needed to print CPU-specific
messages. If they take only "small" part of the default ring buffer
size, we could still use the default rind buffer.

For example, if we left 50% of the default buffer for CPU-specific
messages, the code might look like:

#define __LOG_CPU_MIN_BUF_LEN (1 << CONFIG_LOG_CPU_MIN_BUF_SHIFT)

int cpu_extra = (num_possible_cpus() - 1) * __LOG_CPU_MIN_BUF_LEN;

if (!new_log_buf_len && (cpu_extra > __LOG_BUF_LEN / 2))
new_log_buf_len = __LOG_BUF_LEN + cpu_extra;

if (!new_log_buf_len)
return;

allocate the new ring buffer...


> The default in this patch is to ignore this, do you mean that upstream
> should probably default to a non-zero value here and then let distributions
> select 0 for some kernel builds ?

If the change has effect only for huge systems, the default value
might be non-zero everywhere.

> If so then perhaps adding a sysctl override value might be good to
> allow only small systems to override this to 0?

I think that it won't help to lover the value using sysctl because the
huge buffer would be already allocated during boot. If I did not miss anything.

[...]
> 
> > > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > > index 7228258..2023424 100644
> > > --- a/kernel/printk/printk.c
> > > +++ b/kernel/printk/printk.c
> > > @@ -246,6 +246,7 @@ static u32 clear_idx;
> > >  #define LOG_ALIGN __alignof__(struct printk_log)
> > >  #endif
> > >  #define __LOG_BUF_LEN (1 << CONFIG_LOG_BUF_SHIFT)
> > > +#define __LOG_CPU_BUF_LEN (1 << CONFIG_LOG_CPU_BUF_SHIFT)
> > >  static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
> > >  static char *log_buf = __log_buf;
> > >  static u32 log_buf_len = __LOG_BUF_LEN;
> > > @@ -752,9 +753,10 @@ void __init setup_log_buf(int early)
> > >   unsigned long flags;
> > >   char *new_log_buf;
> > >   int free;
> > > + int cpu_extra = (num_possible_cpus() - 1) * __LOG_CPU_BUF_LEN;
> > >  
> > > - if (!new_log_buf_len)
> > > - return;
> > > + if (!new_log_buf_len && cpu_extra > 1)
> > > + new_log_buf_len = __LOG_BUF_LEN + cpu_extra;
> > 
> > We still should return when both new_log_buf_len and cpu_extra are
> > zero and call here:
> > 
> > if (!new_log_buf_len)
> > return;
> 
> The check for cpu_extra > 1 does that -- the default in the patch was 0
> and 1 << 0 is 1, so if in the case that the default is used we'd bail
> just like before. Or did I perhaps miss what you were saying here?

The problem is that we do not bail out because you removed the "return".
If "new_log_buf_len=0" and "cpu_extra=1" then we keep
"new_log_buf_len" as is. Then we continue, try to allocate zero memory
and print error: "log_buf_len: 0 bytes not available". Do I get it right?


> > Also I would feel more comfortable if we somehow limit the maximum
> > size of 

Re: [RFC] printk: allow increasing the ring buffer depending on the number of CPUs

2014-06-12 Thread Petr Mládek
On Wed 2014-06-11 23:47:41, Luis R. Rodriguez wrote:
 On Wed, Jun 11, 2014 at 11:34:47AM +0200, Petr Mládek wrote:
  On Tue 2014-06-10 18:04:45, Luis R. Rodriguez wrote:
   From: Luis R. Rodriguez mcg...@suse.com
   diff --git a/init/Kconfig b/init/Kconfig
   index 9d3585b..1814436 100644
   --- a/init/Kconfig
   +++ b/init/Kconfig
   @@ -806,6 +806,34 @@ config LOG_BUF_SHIFT
  13 =  8 KB
  12 =  4 KB

   +config LOG_CPU_BUF_SHIFT
   + int CPU kernel log buffer size contribution (13 = 8 KB, 17 = 128KB)
   + range 0 21
   + default 0
   + help
   +   The kernel ring buffer will get additional data logged onto it
   +   when multiple CPUs are supported. Typically the contributions is a
   +   few lines when idle however under under load this can vary and in the
   +   worst case it can mean loosing logging information. You can use this
   +   to set the maximum expected mount of amount of logging contribution
   +   under load by each CPU in the worst case scenerio. Select a size as
   +   a power of 2. For example if LOG_BUF_SHIFT is 18 and if your
   +   LOG_CPU_BUF_SHIFT is 12 your kernel ring buffer size will be as
   +   follows having 16 CPUs as possible.
   +
   +  ((1  18) + ((16 - 1) * (1  12))) / 1024 = 316 KB
  
  It might be better to use the CPU_NUM-specific value as a minimum of
  the needed space. Linux distributions might want to distribute kernel
  with non-zero value and still use the static __log_buf on reasonable
  small systems.
 
 Not sure if I follow what you mean by CPU_NUM-specific, can you
 elaborate?

I wanted to say that the space requested by LOG_CPU_BUF_SHIFT depends
on the number of CPUs. If LOG_CPU_BUF_SHIFT is not zero, your
patch always allocates new ringbuffer and leave the static __log_buf
unused. I think that this is not necessary for machines with small
amount of CPUs and probably also with small amount of memory.

I would rename the variable to LOG_CPU_BUF_MIN_SHIFT or so. It would
represent minimal size that is needed to print CPU-specific
messages. If they take only small part of the default ring buffer
size, we could still use the default rind buffer.

For example, if we left 50% of the default buffer for CPU-specific
messages, the code might look like:

#define __LOG_CPU_MIN_BUF_LEN (1  CONFIG_LOG_CPU_MIN_BUF_SHIFT)

int cpu_extra = (num_possible_cpus() - 1) * __LOG_CPU_MIN_BUF_LEN;

if (!new_log_buf_len  (cpu_extra  __LOG_BUF_LEN / 2))
new_log_buf_len = __LOG_BUF_LEN + cpu_extra;

if (!new_log_buf_len)
return;

allocate the new ring buffer...


 The default in this patch is to ignore this, do you mean that upstream
 should probably default to a non-zero value here and then let distributions
 select 0 for some kernel builds ?

If the change has effect only for huge systems, the default value
might be non-zero everywhere.

 If so then perhaps adding a sysctl override value might be good to
 allow only small systems to override this to 0?

I think that it won't help to lover the value using sysctl because the
huge buffer would be already allocated during boot. If I did not miss anything.

[...]
 
   diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
   index 7228258..2023424 100644
   --- a/kernel/printk/printk.c
   +++ b/kernel/printk/printk.c
   @@ -246,6 +246,7 @@ static u32 clear_idx;
#define LOG_ALIGN __alignof__(struct printk_log)
#endif
#define __LOG_BUF_LEN (1  CONFIG_LOG_BUF_SHIFT)
   +#define __LOG_CPU_BUF_LEN (1  CONFIG_LOG_CPU_BUF_SHIFT)
static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
static char *log_buf = __log_buf;
static u32 log_buf_len = __LOG_BUF_LEN;
   @@ -752,9 +753,10 @@ void __init setup_log_buf(int early)
 unsigned long flags;
 char *new_log_buf;
 int free;
   + int cpu_extra = (num_possible_cpus() - 1) * __LOG_CPU_BUF_LEN;

   - if (!new_log_buf_len)
   - return;
   + if (!new_log_buf_len  cpu_extra  1)
   + new_log_buf_len = __LOG_BUF_LEN + cpu_extra;
  
  We still should return when both new_log_buf_len and cpu_extra are
  zero and call here:
  
  if (!new_log_buf_len)
  return;
 
 The check for cpu_extra  1 does that -- the default in the patch was 0
 and 1  0 is 1, so if in the case that the default is used we'd bail
 just like before. Or did I perhaps miss what you were saying here?

The problem is that we do not bail out because you removed the return.
If new_log_buf_len=0 and cpu_extra=1 then we keep
new_log_buf_len as is. Then we continue, try to allocate zero memory
and print error: log_buf_len: 0 bytes not available. Do I get it right?


  Also I would feel more comfortable if we somehow limit the maximum
  size of cpu_extra.
 
 Michal had similar concerns and I thought up to limit it to 1024 max
 CPUs, but after my second implementation I did some math on the values
 that would be used if say LOG_CPU_BUF_SHIFT was 12, it turns out 

Re: [RFC] printk: allow increasing the ring buffer depending on the number of CPUs

2014-06-12 Thread Davidlohr Bueso

On Wed, 2014-06-11 at 11:34 +0200, Petr Mládek wrote:
 On Tue 2014-06-10 18:04:45, Luis R. Rodriguez wrote:
  From: Luis R. Rodriguez mcg...@suse.com
  
  The default size of the ring buffer is too small for machines
  with a large amount of CPUs under heavy load. What ends up
  happening when debugging is the ring buffer overlaps and chews
  up old messages making debugging impossible unless the size is
  passed as a kernel parameter. An idle system upon boot up will
  on average spew out only about one or two extra lines but where
  this really matters is on heavy load and that will vary widely
  depending on the system and environment.
 
 Thanks for looking at this. It is a pity to lose stracktrace when a huge
 machine Oopses just because the default ring buffer is too small.

Agreed, I would very much welcome something like this.

  There are mechanisms to help increase the kernel ring buffer
  for tracing through debugfs, and those interfaces even allow growing
  the kernel ring buffer per CPU. We also have a static value which
  can be passed upon boot. Relying on debugfs however is not ideal
  for production, and relying on the value passed upon bootup is
  can only used *after* an issue has creeped up. Instead of being
  reactive this adds a proactive measure which lets you scale the
  amount of contributions you'd expect to the kernel ring buffer
  under load by each CPU in the worst case scenerio.
  
  We use num_possible_cpus() to avoid complexities which could be
  introduced by dynamically changing the ring buffer size at run
  time, num_possible_cpus() lets us use the upper limit on possible
  number of CPUs therefore avoiding having to deal with hotplugging
  CPUs on and off. This option is diabled by default, and if used
  the kernel ring buffer size then can be computed as follows:
  
  size = __LOG_BUF_LEN + (num_possible_cpus() - 1 ) *  __LOG_CPU_BUF_LEN
  
  Cc: Michal Hocko mho...@suse.cz
  Cc: Petr Mladek pmla...@suse.cz
  Cc: Andrew Morton a...@linux-foundation.org
  Cc: Joe Perches j...@perches.com
  Cc: Arun KS arunks.li...@gmail.com
  Cc: Kees Cook keesc...@chromium.org
  Cc: linux-kernel@vger.kernel.org
  Signed-off-by: Luis R. Rodriguez mcg...@suse.com
  ---
   init/Kconfig   | 28 
   kernel/printk/printk.c |  6 --
   2 files changed, 32 insertions(+), 2 deletions(-)
  
  diff --git a/init/Kconfig b/init/Kconfig
  index 9d3585b..1814436 100644
  --- a/init/Kconfig
  +++ b/init/Kconfig
  @@ -806,6 +806,34 @@ config LOG_BUF_SHIFT
   13 =  8 KB
   12 =  4 KB
   
  +config LOG_CPU_BUF_SHIFT
  +   int CPU kernel log buffer size contribution (13 = 8 KB, 17 = 128KB)
  +   range 0 21
  +   default 0
  +   help
  + The kernel ring buffer will get additional data logged onto it
  + when multiple CPUs are supported. Typically the contributions is a
  + few lines when idle however under under load this can vary and in the
  + worst case it can mean loosing logging information. You can use this
  + to set the maximum expected mount of amount of logging contribution
  + under load by each CPU in the worst case scenerio. Select a size as
  + a power of 2. For example if LOG_BUF_SHIFT is 18 and if your
  + LOG_CPU_BUF_SHIFT is 12 your kernel ring buffer size will be as
  + follows having 16 CPUs as possible.
  +
  +((1  18) + ((16 - 1) * (1  12))) / 1024 = 316 KB
 
 It might be better to use the CPU_NUM-specific value as a minimum of
 the needed space. Linux distributions might want to distribute kernel
 with non-zero value and still use the static __log_buf on reasonable
 small systems.

It should also depend on SMP and !BASE_SMALL.
I was wondering about disabling this by default as it would defeat the
purpose of being a proactive feature. Similarly, I worry about distros
choosing a correct default value on their own.

  + Where as typically you'd only end up with 256 KB. This is disabled
  + by default with a value of 0.
 
 I would add:
 
   This value is ignored when log_buf_len commandline parameter
   is used. It forces the exact size of the ring buffer.

... and update Documentation/kernel-parameters.txt to be more
descriptive about this new functionality.

  + Examples:
  +17 = 128 KB
  +16 = 64 KB
  +15 = 32 KB
  +14 = 16 KB
  +13 =  8 KB
  +12 =  4 KB
 
 I think that we should make it more cleat that it is per-CPU here,
 for example:
 
   17 = 128 KB for each CPU
   16 =  64 KB for each CPU
   15 =  32 KB for each CPU
   14 =  16 KB for each CPU
   13 =   8 KB for each CPU
   12 =   4 KB for each CPU
 

Agreed.

   #
   # Architectures with an unreliable sched_clock() should select this:
   #
  diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
  index 7228258..2023424 100644

Re: [RFC] printk: allow increasing the ring buffer depending on the number of CPUs

2014-06-12 Thread Joe Perches
(adding Chris Metcalf for arch/tile,
 I think this change might impact that arch)

On Thu, 2014-06-12 at 11:01 -0700, Davidlohr Bueso wrote:
 On Wed, 2014-06-11 at 11:34 +0200, Petr Mládek wrote:
  On Tue 2014-06-10 18:04:45, Luis R. Rodriguez wrote:
   From: Luis R. Rodriguez mcg...@suse.com
   
   The default size of the ring buffer is too small for machines
   with a large amount of CPUs under heavy load. What ends up
   happening when debugging is the ring buffer overlaps and chews
   up old messages making debugging impossible unless the size is
   passed as a kernel parameter. An idle system upon boot up will
   on average spew out only about one or two extra lines but where
   this really matters is on heavy load and that will vary widely
   depending on the system and environment.
  
  Thanks for looking at this. It is a pity to lose stracktrace when a huge
  machine Oopses just because the default ring buffer is too small.
 
 Agreed, I would very much welcome something like this.
 
   There are mechanisms to help increase the kernel ring buffer
   for tracing through debugfs, and those interfaces even allow growing
   the kernel ring buffer per CPU. We also have a static value which
   can be passed upon boot. Relying on debugfs however is not ideal
   for production, and relying on the value passed upon bootup is
   can only used *after* an issue has creeped up. Instead of being
   reactive this adds a proactive measure which lets you scale the
   amount of contributions you'd expect to the kernel ring buffer
   under load by each CPU in the worst case scenerio.
   
   We use num_possible_cpus() to avoid complexities which could be
   introduced by dynamically changing the ring buffer size at run
   time, num_possible_cpus() lets us use the upper limit on possible
   number of CPUs therefore avoiding having to deal with hotplugging
   CPUs on and off. This option is diabled by default, and if used
   the kernel ring buffer size then can be computed as follows:
   
   size = __LOG_BUF_LEN + (num_possible_cpus() - 1 ) *  __LOG_CPU_BUF_LEN
   
   Cc: Michal Hocko mho...@suse.cz
   Cc: Petr Mladek pmla...@suse.cz
   Cc: Andrew Morton a...@linux-foundation.org
   Cc: Joe Perches j...@perches.com
   Cc: Arun KS arunks.li...@gmail.com
   Cc: Kees Cook keesc...@chromium.org
   Cc: linux-kernel@vger.kernel.org
   Signed-off-by: Luis R. Rodriguez mcg...@suse.com
   ---
init/Kconfig   | 28 
kernel/printk/printk.c |  6 --
2 files changed, 32 insertions(+), 2 deletions(-)
   
   diff --git a/init/Kconfig b/init/Kconfig
   index 9d3585b..1814436 100644
   --- a/init/Kconfig
   +++ b/init/Kconfig
   @@ -806,6 +806,34 @@ config LOG_BUF_SHIFT
  13 =  8 KB
  12 =  4 KB

   +config LOG_CPU_BUF_SHIFT
   + int CPU kernel log buffer size contribution (13 = 8 KB, 17 = 128KB)
   + range 0 21
   + default 0
   + help
   +   The kernel ring buffer will get additional data logged onto it
   +   when multiple CPUs are supported. Typically the contributions is a
   +   few lines when idle however under under load this can vary and in the
   +   worst case it can mean loosing logging information. You can use this

trivia: s/loosing/losing/

   +   to set the maximum expected mount of amount of logging contribution
   +   under load by each CPU in the worst case scenerio. Select a size as
   +   a power of 2. For example if LOG_BUF_SHIFT is 18 and if your
   +   LOG_CPU_BUF_SHIFT is 12 your kernel ring buffer size will be as
   +   follows having 16 CPUs as possible.
   +
   +  ((1  18) + ((16 - 1) * (1  12))) / 1024 = 316 KB
  
  It might be better to use the CPU_NUM-specific value as a minimum of
  the needed space. Linux distributions might want to distribute kernel
  with non-zero value and still use the static __log_buf on reasonable
  small systems.
 
 It should also depend on SMP and !BASE_SMALL.
 I was wondering about disabling this by default as it would defeat the
 purpose of being a proactive feature. Similarly, I worry about distros
 choosing a correct default value on their own.
 
   +   Where as typically you'd only end up with 256 KB. This is disabled
   +   by default with a value of 0.
  
  I would add:
  
  This value is ignored when log_buf_len commandline parameter
  is used. It forces the exact size of the ring buffer.
 
 ... and update Documentation/kernel-parameters.txt to be more
 descriptive about this new functionality.
 
   +   Examples:
   +  17 = 128 KB
   +  16 = 64 KB
   +  15 = 32 KB
   +  14 = 16 KB
   +  13 =  8 KB
   +  12 =  4 KB
  
  I think that we should make it more cleat that it is per-CPU here,
  for example:
  
  17 = 128 KB for each CPU
  16 =  64 KB for each CPU
  15 =  32 KB for each CPU
  14 =  16 KB for each CPU
  13 =   8 KB for each CPU
 

Re: [RFC] printk: allow increasing the ring buffer depending on the number of CPUs

2014-06-12 Thread Luis R. Rodriguez
On Thu, Jun 12, 2014 at 03:05:32PM +0200, Petr Mládek wrote:
 On Wed 2014-06-11 23:47:41, Luis R. Rodriguez wrote:
  On Wed, Jun 11, 2014 at 11:34:47AM +0200, Petr Mládek wrote:
   On Tue 2014-06-10 18:04:45, Luis R. Rodriguez wrote:
From: Luis R. Rodriguez mcg...@suse.com
diff --git a/init/Kconfig b/init/Kconfig
index 9d3585b..1814436 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -806,6 +806,34 @@ config LOG_BUF_SHIFT
 13 =  8 KB
 12 =  4 KB
 
+config LOG_CPU_BUF_SHIFT
+   int CPU kernel log buffer size contribution (13 = 8 KB, 17 = 
128KB)
+   range 0 21
+   default 0
+   help
+ The kernel ring buffer will get additional data logged onto it
+ when multiple CPUs are supported. Typically the contributions 
is a
+ few lines when idle however under under load this can vary 
and in the
+ worst case it can mean loosing logging information. You can 
use this
+ to set the maximum expected mount of amount of logging 
contribution
+ under load by each CPU in the worst case scenerio. Select a 
size as
+ a power of 2. For example if LOG_BUF_SHIFT is 18 and if your
+ LOG_CPU_BUF_SHIFT is 12 your kernel ring buffer size will be 
as
+ follows having 16 CPUs as possible.
+
+((1  18) + ((16 - 1) * (1  12))) / 1024 = 316 KB
   
   It might be better to use the CPU_NUM-specific value as a minimum of
   the needed space. Linux distributions might want to distribute kernel
   with non-zero value and still use the static __log_buf on reasonable
   small systems.
  
  Not sure if I follow what you mean by CPU_NUM-specific, can you
  elaborate?
 
 I wanted to say that the space requested by LOG_CPU_BUF_SHIFT depends
 on the number of CPUs. If LOG_CPU_BUF_SHIFT is not zero, your
 patch always allocates new ringbuffer and leave the static __log_buf
 unused. I think that this is not necessary for machines with small
 amount of CPUs and probably also with small amount of memory.

True, which is why I disabled it by default if we want to only leave
this disabled for  certain amount of num CPU systems, what is that
number, I see below a recommendation and I do like it.

 I would rename the variable to LOG_CPU_BUF_MIN_SHIFT or so. It would
 represent minimal size that is needed to print CPU-specific
 messages. If they take only small part of the default ring buffer
 size, we could still use the default rind buffer.

True, and will rename this, that still leaves open the question of a
number of CPUs that is sensible to keep but you resolve that below.

 For example, if we left 50% of the default buffer for CPU-specific
 messages, the code might look like:
 
   #define __LOG_CPU_MIN_BUF_LEN (1  CONFIG_LOG_CPU_MIN_BUF_SHIFT)
 
   int cpu_extra = (num_possible_cpus() - 1) * __LOG_CPU_MIN_BUF_LEN;
 
   if (!new_log_buf_len  (cpu_extra  __LOG_BUF_LEN / 2))
   new_log_buf_len = __LOG_BUF_LEN + cpu_extra;
 
   if (!new_log_buf_len)
   return;
 
   allocate the new ring buffer...

Yeah I like these heuristics a lot, will fold them in and send a v2 now in 
patch form.
To be clear with this CONFIG_LOG_CPU_MIN_BUF_SHIFT could actually be left now to
something other than non zeo by default and only if that contribution is seen to
go above 1/2 of __LOG_BUF_LEN will we allocate more for the ring buffer. With
default values of LOG_BUF_SHIFT at 18 and say a default value of 12 for
LOG_CPU_MIN_BUF_SHIFT this would mean we'd need if we remove the -1 we'd require
64 CPUs in order to trigger an allocation for more memory. If we keep the -1 
we'd
require anything over 64 number of CPUs. Do we want to keep the -1 and the  64
CPU requirement as default? Is the LOG_CPU_MIN_BUF_SHIFT default of 12 resonable
to start with (assumes 4KB in the worst case before the kernel ring buffer 
flips over).

  The default in this patch is to ignore this, do you mean that upstream
  should probably default to a non-zero value here and then let distributions
  select 0 for some kernel builds ?
 
 If the change has effect only for huge systems, the default value
 might be non-zero everywhere.

Sure.

  If so then perhaps adding a sysctl override value might be good to
  allow only small systems to override this to 0?
 
 I think that it won't help to lover the value using sysctl because the
 huge buffer would be already allocated during boot. If I did not miss 
 anything.
 
 [...]

Yeah true, a sensible default would be best, with the systctl we'd also have
to handle dynamic re-allocations and while the tracing code already added
code to make this easier I'd prefer we don't make this a popular path.

  
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 7228258..2023424 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
   

Re: [RFC] printk: allow increasing the ring buffer depending on the number of CPUs

2014-06-12 Thread Luis R. Rodriguez
On Thu, Jun 12, 2014 at 11:01 AM, Davidlohr Bueso davidl...@hp.com wrote:
 On Wed, 2014-06-11 at 11:34 +0200, Petr Mládek wrote:
 On Tue 2014-06-10 18:04:45, Luis R. Rodriguez wrote:
  From: Luis R. Rodriguez mcg...@suse.com
 
  The default size of the ring buffer is too small for machines
  with a large amount of CPUs under heavy load. What ends up
  happening when debugging is the ring buffer overlaps and chews
  up old messages making debugging impossible unless the size is
  passed as a kernel parameter. An idle system upon boot up will
  on average spew out only about one or two extra lines but where
  this really matters is on heavy load and that will vary widely
  depending on the system and environment.

 Thanks for looking at this. It is a pity to lose stracktrace when a huge
 machine Oopses just because the default ring buffer is too small.

 Agreed, I would very much welcome something like this.

Great!

  There are mechanisms to help increase the kernel ring buffer
  for tracing through debugfs, and those interfaces even allow growing
  the kernel ring buffer per CPU. We also have a static value which
  can be passed upon boot. Relying on debugfs however is not ideal
  for production, and relying on the value passed upon bootup is
  can only used *after* an issue has creeped up. Instead of being
  reactive this adds a proactive measure which lets you scale the
  amount of contributions you'd expect to the kernel ring buffer
  under load by each CPU in the worst case scenerio.
 
  We use num_possible_cpus() to avoid complexities which could be
  introduced by dynamically changing the ring buffer size at run
  time, num_possible_cpus() lets us use the upper limit on possible
  number of CPUs therefore avoiding having to deal with hotplugging
  CPUs on and off. This option is diabled by default, and if used
  the kernel ring buffer size then can be computed as follows:
 
  size = __LOG_BUF_LEN + (num_possible_cpus() - 1 ) *  __LOG_CPU_BUF_LEN
 
  Cc: Michal Hocko mho...@suse.cz
  Cc: Petr Mladek pmla...@suse.cz
  Cc: Andrew Morton a...@linux-foundation.org
  Cc: Joe Perches j...@perches.com
  Cc: Arun KS arunks.li...@gmail.com
  Cc: Kees Cook keesc...@chromium.org
  Cc: linux-kernel@vger.kernel.org
  Signed-off-by: Luis R. Rodriguez mcg...@suse.com
  ---
   init/Kconfig   | 28 
   kernel/printk/printk.c |  6 --
   2 files changed, 32 insertions(+), 2 deletions(-)
 
  diff --git a/init/Kconfig b/init/Kconfig
  index 9d3585b..1814436 100644
  --- a/init/Kconfig
  +++ b/init/Kconfig
  @@ -806,6 +806,34 @@ config LOG_BUF_SHIFT
   13 =  8 KB
   12 =  4 KB
 
  +config LOG_CPU_BUF_SHIFT
  +   int CPU kernel log buffer size contribution (13 = 8 KB, 17 = 128KB)
  +   range 0 21
  +   default 0
  +   help
  + The kernel ring buffer will get additional data logged onto it
  + when multiple CPUs are supported. Typically the contributions is a
  + few lines when idle however under under load this can vary and in the
  + worst case it can mean loosing logging information. You can use this
  + to set the maximum expected mount of amount of logging contribution
  + under load by each CPU in the worst case scenerio. Select a size as
  + a power of 2. For example if LOG_BUF_SHIFT is 18 and if your
  + LOG_CPU_BUF_SHIFT is 12 your kernel ring buffer size will be as
  + follows having 16 CPUs as possible.
  +
  +((1  18) + ((16 - 1) * (1  12))) / 1024 = 316 KB

 It might be better to use the CPU_NUM-specific value as a minimum of
 the needed space. Linux distributions might want to distribute kernel
 with non-zero value and still use the static __log_buf on reasonable
 small systems.

 It should also depend on SMP and !BASE_SMALL.
 I was wondering about disabling this by default as it would defeat the
 purpose of being a proactive feature. Similarly, I worry about distros
 choosing a correct default value on their own.

True, it seems Petr's recommendations would address these concerns for
systems under a certain amount of limit of number of CPUs, as is right
now we require the contribution by CPU in worst case scenario to be 
1/2 of the default kernel ring buffer size that's  64 number of CPUs.

  + Where as typically you'd only end up with 256 KB. This is disabled
  + by default with a value of 0.

 I would add:

   This value is ignored when log_buf_len commandline parameter
   is used. It forces the exact size of the ring buffer.

 ... and update Documentation/kernel-parameters.txt to be more
 descriptive about this new functionality.

Will do!

  + Examples:
  +17 = 128 KB
  +16 = 64 KB
  +15 = 32 KB
  +14 = 16 KB
  +13 =  8 KB
  +12 =  4 KB

 I think that we should make it more cleat that it is per-CPU here,
 for example:

   17 = 128 KB for each CPU
   16 =  

Re: [RFC] printk: allow increasing the ring buffer depending on the number of CPUs

2014-06-12 Thread Luis R. Rodriguez
On Thu, Jun 12, 2014 at 11:45 AM, Joe Perches j...@perches.com wrote:

 trivia: s/loosing/losing/

Amended, thanks.

  Luis
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] printk: allow increasing the ring buffer depending on the number of CPUs

2014-06-11 Thread Luis R. Rodriguez
On Wed, Jun 11, 2014 at 11:34:47AM +0200, Petr Mládek wrote:
> On Tue 2014-06-10 18:04:45, Luis R. Rodriguez wrote:
> > From: "Luis R. Rodriguez" 
> > diff --git a/init/Kconfig b/init/Kconfig
> > index 9d3585b..1814436 100644
> > --- a/init/Kconfig
> > +++ b/init/Kconfig
> > @@ -806,6 +806,34 @@ config LOG_BUF_SHIFT
> >  13 =>  8 KB
> >  12 =>  4 KB
> >  
> > +config LOG_CPU_BUF_SHIFT
> > +   int "CPU kernel log buffer size contribution (13 => 8 KB, 17 => 128KB)"
> > +   range 0 21
> > +   default 0
> > +   help
> > + The kernel ring buffer will get additional data logged onto it
> > + when multiple CPUs are supported. Typically the contributions is a
> > + few lines when idle however under under load this can vary and in the
> > + worst case it can mean loosing logging information. You can use this
> > + to set the maximum expected mount of amount of logging contribution
> > + under load by each CPU in the worst case scenerio. Select a size as
> > + a power of 2. For example if LOG_BUF_SHIFT is 18 and if your
> > + LOG_CPU_BUF_SHIFT is 12 your kernel ring buffer size will be as
> > + follows having 16 CPUs as possible.
> > +
> > +((1 << 18) + ((16 - 1) * (1 << 12))) / 1024 = 316 KB
> 
> It might be better to use the CPU_NUM-specific value as a minimum of
> the needed space. Linux distributions might want to distribute kernel
> with non-zero value and still use the static "__log_buf" on reasonable
> small systems.

Not sure if I follow what you mean by CPU_NUM-specific, can you elaborate?
The default in this patch is to ignore this, do you mean that upstream
should probably default to a non-zero value here and then let distributions
select 0 for some kernel builds ? If so then perhaps adding a sysctl
override value might be good to allow only small systems to override
this to 0?

> > + Where as typically you'd only end up with 256 KB. This is disabled
> > + by default with a value of 0.
> 
> I would add:
> 
>   This value is ignored when "log_buf_len" commandline parameter
>   is used. It forces the exact size of the ring buffer.

Good point, I've amended this in.

> > + Examples:
> > +17 => 128 KB
> > +16 => 64 KB
> > +15 => 32 KB
> > +14 => 16 KB
> > +13 =>  8 KB
> > +12 =>  4 KB
> 
> I think that we should make it more cleat that it is per-CPU here,
> for example:
> 
>   17 => 128 KB for each CPU
>   16 =>  64 KB for each CPU
>   15 =>  32 KB for each CPU
>   14 =>  16 KB for each CPU
>   13 =>   8 KB for each CPU
>   12 =>   4 KB for each CPU

Thanks, amended as well.

> > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > index 7228258..2023424 100644
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -246,6 +246,7 @@ static u32 clear_idx;
> >  #define LOG_ALIGN __alignof__(struct printk_log)
> >  #endif
> >  #define __LOG_BUF_LEN (1 << CONFIG_LOG_BUF_SHIFT)
> > +#define __LOG_CPU_BUF_LEN (1 << CONFIG_LOG_CPU_BUF_SHIFT)
> >  static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
> >  static char *log_buf = __log_buf;
> >  static u32 log_buf_len = __LOG_BUF_LEN;
> > @@ -752,9 +753,10 @@ void __init setup_log_buf(int early)
> > unsigned long flags;
> > char *new_log_buf;
> > int free;
> > +   int cpu_extra = (num_possible_cpus() - 1) * __LOG_CPU_BUF_LEN;
> >  
> > -   if (!new_log_buf_len)
> > -   return;
> > +   if (!new_log_buf_len && cpu_extra > 1)
> > +   new_log_buf_len = __LOG_BUF_LEN + cpu_extra;
> 
> We still should return when both new_log_buf_len and cpu_extra are
> zero and call here:
> 
>   if (!new_log_buf_len)
>   return;

The check for cpu_extra > 1 does that -- the default in the patch was 0
and 1 << 0 is 1, so if in the case that the default is used we'd bail
just like before. Or did I perhaps miss what you were saying here?

> Also I would feel more comfortable if we somehow limit the maximum
> size of cpu_extra.

Michal had similar concerns and I thought up to limit it to 1024 max
CPUs, but after my second implementation I did some math on the values
that would be used if say LOG_CPU_BUF_SHIFT was 12, it turns out to not
be *that* bad for even huge num_possible_cpus(). For example for 4096
num_possible_cpus() this comes out to with LOG_BUF_SHIFT of 18:


((1 << 18) + ((4096 - 1) * (1 << 12))) / 1024 = 16636 KB

~16 MB doesn't seem that bad for such a monster box which I'd presume
would have an insane amount of memory. If this logic however does
seems unreasonable and we should cap it -- then by all means lets
pick a sensible number, its just not clear to me what that number
should be. Another reason why I stayed away from capping this was
that we'd then likely end up capping this in the future, and I was
trying to find a solution that would 

Re: [RFC] printk: allow increasing the ring buffer depending on the number of CPUs

2014-06-11 Thread Petr Mládek
On Tue 2014-06-10 18:04:45, Luis R. Rodriguez wrote:
> From: "Luis R. Rodriguez" 
> 
> The default size of the ring buffer is too small for machines
> with a large amount of CPUs under heavy load. What ends up
> happening when debugging is the ring buffer overlaps and chews
> up old messages making debugging impossible unless the size is
> passed as a kernel parameter. An idle system upon boot up will
> on average spew out only about one or two extra lines but where
> this really matters is on heavy load and that will vary widely
> depending on the system and environment.

Thanks for looking at this. It is a pity to lose stracktrace when a huge
machine Oopses just because the default ring buffer is too small.


> There are mechanisms to help increase the kernel ring buffer
> for tracing through debugfs, and those interfaces even allow growing
> the kernel ring buffer per CPU. We also have a static value which
> can be passed upon boot. Relying on debugfs however is not ideal
> for production, and relying on the value passed upon bootup is
> can only used *after* an issue has creeped up. Instead of being
> reactive this adds a proactive measure which lets you scale the
> amount of contributions you'd expect to the kernel ring buffer
> under load by each CPU in the worst case scenerio.
> 
> We use num_possible_cpus() to avoid complexities which could be
> introduced by dynamically changing the ring buffer size at run
> time, num_possible_cpus() lets us use the upper limit on possible
> number of CPUs therefore avoiding having to deal with hotplugging
> CPUs on and off. This option is diabled by default, and if used
> the kernel ring buffer size then can be computed as follows:
> 
> size = __LOG_BUF_LEN + (num_possible_cpus() - 1 ) *  __LOG_CPU_BUF_LEN
> 
> Cc: Michal Hocko 
> Cc: Petr Mladek 
> Cc: Andrew Morton 
> Cc: Joe Perches 
> Cc: Arun KS 
> Cc: Kees Cook 
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Luis R. Rodriguez 
> ---
>  init/Kconfig   | 28 
>  kernel/printk/printk.c |  6 --
>  2 files changed, 32 insertions(+), 2 deletions(-)
> 
> diff --git a/init/Kconfig b/init/Kconfig
> index 9d3585b..1814436 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -806,6 +806,34 @@ config LOG_BUF_SHIFT
>13 =>  8 KB
>12 =>  4 KB
>  
> +config LOG_CPU_BUF_SHIFT
> + int "CPU kernel log buffer size contribution (13 => 8 KB, 17 => 128KB)"
> + range 0 21
> + default 0
> + help
> +   The kernel ring buffer will get additional data logged onto it
> +   when multiple CPUs are supported. Typically the contributions is a
> +   few lines when idle however under under load this can vary and in the
> +   worst case it can mean loosing logging information. You can use this
> +   to set the maximum expected mount of amount of logging contribution
> +   under load by each CPU in the worst case scenerio. Select a size as
> +   a power of 2. For example if LOG_BUF_SHIFT is 18 and if your
> +   LOG_CPU_BUF_SHIFT is 12 your kernel ring buffer size will be as
> +   follows having 16 CPUs as possible.
> +
> +  ((1 << 18) + ((16 - 1) * (1 << 12))) / 1024 = 316 KB

It might be better to use the CPU_NUM-specific value as a minimum of
the needed space. Linux distributions might want to distribute kernel
with non-zero value and still use the static "__log_buf" on reasonable
small systems.

> +   Where as typically you'd only end up with 256 KB. This is disabled
> +   by default with a value of 0.

I would add:

This value is ignored when "log_buf_len" commandline parameter
is used. It forces the exact size of the ring buffer.

> +   Examples:
> +  17 => 128 KB
> +  16 => 64 KB
> +  15 => 32 KB
> +  14 => 16 KB
> +  13 =>  8 KB
> +  12 =>  4 KB

I think that we should make it more cleat that it is per-CPU here,
for example:

17 => 128 KB for each CPU
16 =>  64 KB for each CPU
15 =>  32 KB for each CPU
14 =>  16 KB for each CPU
13 =>   8 KB for each CPU
12 =>   4 KB for each CPU


>  #
>  # Architectures with an unreliable sched_clock() should select this:
>  #
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 7228258..2023424 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -246,6 +246,7 @@ static u32 clear_idx;
>  #define LOG_ALIGN __alignof__(struct printk_log)
>  #endif
>  #define __LOG_BUF_LEN (1 << CONFIG_LOG_BUF_SHIFT)
> +#define __LOG_CPU_BUF_LEN (1 << CONFIG_LOG_CPU_BUF_SHIFT)
>  static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
>  static char *log_buf = __log_buf;
>  static u32 log_buf_len = __LOG_BUF_LEN;
> @@ -752,9 +753,10 @@ void __init setup_log_buf(int early)
>   unsigned long flags;
>   char 

Re: [RFC] printk: allow increasing the ring buffer depending on the number of CPUs

2014-06-11 Thread Petr Mládek
On Tue 2014-06-10 18:04:45, Luis R. Rodriguez wrote:
 From: Luis R. Rodriguez mcg...@suse.com
 
 The default size of the ring buffer is too small for machines
 with a large amount of CPUs under heavy load. What ends up
 happening when debugging is the ring buffer overlaps and chews
 up old messages making debugging impossible unless the size is
 passed as a kernel parameter. An idle system upon boot up will
 on average spew out only about one or two extra lines but where
 this really matters is on heavy load and that will vary widely
 depending on the system and environment.

Thanks for looking at this. It is a pity to lose stracktrace when a huge
machine Oopses just because the default ring buffer is too small.


 There are mechanisms to help increase the kernel ring buffer
 for tracing through debugfs, and those interfaces even allow growing
 the kernel ring buffer per CPU. We also have a static value which
 can be passed upon boot. Relying on debugfs however is not ideal
 for production, and relying on the value passed upon bootup is
 can only used *after* an issue has creeped up. Instead of being
 reactive this adds a proactive measure which lets you scale the
 amount of contributions you'd expect to the kernel ring buffer
 under load by each CPU in the worst case scenerio.
 
 We use num_possible_cpus() to avoid complexities which could be
 introduced by dynamically changing the ring buffer size at run
 time, num_possible_cpus() lets us use the upper limit on possible
 number of CPUs therefore avoiding having to deal with hotplugging
 CPUs on and off. This option is diabled by default, and if used
 the kernel ring buffer size then can be computed as follows:
 
 size = __LOG_BUF_LEN + (num_possible_cpus() - 1 ) *  __LOG_CPU_BUF_LEN
 
 Cc: Michal Hocko mho...@suse.cz
 Cc: Petr Mladek pmla...@suse.cz
 Cc: Andrew Morton a...@linux-foundation.org
 Cc: Joe Perches j...@perches.com
 Cc: Arun KS arunks.li...@gmail.com
 Cc: Kees Cook keesc...@chromium.org
 Cc: linux-kernel@vger.kernel.org
 Signed-off-by: Luis R. Rodriguez mcg...@suse.com
 ---
  init/Kconfig   | 28 
  kernel/printk/printk.c |  6 --
  2 files changed, 32 insertions(+), 2 deletions(-)
 
 diff --git a/init/Kconfig b/init/Kconfig
 index 9d3585b..1814436 100644
 --- a/init/Kconfig
 +++ b/init/Kconfig
 @@ -806,6 +806,34 @@ config LOG_BUF_SHIFT
13 =  8 KB
12 =  4 KB
  
 +config LOG_CPU_BUF_SHIFT
 + int CPU kernel log buffer size contribution (13 = 8 KB, 17 = 128KB)
 + range 0 21
 + default 0
 + help
 +   The kernel ring buffer will get additional data logged onto it
 +   when multiple CPUs are supported. Typically the contributions is a
 +   few lines when idle however under under load this can vary and in the
 +   worst case it can mean loosing logging information. You can use this
 +   to set the maximum expected mount of amount of logging contribution
 +   under load by each CPU in the worst case scenerio. Select a size as
 +   a power of 2. For example if LOG_BUF_SHIFT is 18 and if your
 +   LOG_CPU_BUF_SHIFT is 12 your kernel ring buffer size will be as
 +   follows having 16 CPUs as possible.
 +
 +  ((1  18) + ((16 - 1) * (1  12))) / 1024 = 316 KB

It might be better to use the CPU_NUM-specific value as a minimum of
the needed space. Linux distributions might want to distribute kernel
with non-zero value and still use the static __log_buf on reasonable
small systems.

 +   Where as typically you'd only end up with 256 KB. This is disabled
 +   by default with a value of 0.

I would add:

This value is ignored when log_buf_len commandline parameter
is used. It forces the exact size of the ring buffer.

 +   Examples:
 +  17 = 128 KB
 +  16 = 64 KB
 +  15 = 32 KB
 +  14 = 16 KB
 +  13 =  8 KB
 +  12 =  4 KB

I think that we should make it more cleat that it is per-CPU here,
for example:

17 = 128 KB for each CPU
16 =  64 KB for each CPU
15 =  32 KB for each CPU
14 =  16 KB for each CPU
13 =   8 KB for each CPU
12 =   4 KB for each CPU


  #
  # Architectures with an unreliable sched_clock() should select this:
  #
 diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
 index 7228258..2023424 100644
 --- a/kernel/printk/printk.c
 +++ b/kernel/printk/printk.c
 @@ -246,6 +246,7 @@ static u32 clear_idx;
  #define LOG_ALIGN __alignof__(struct printk_log)
  #endif
  #define __LOG_BUF_LEN (1  CONFIG_LOG_BUF_SHIFT)
 +#define __LOG_CPU_BUF_LEN (1  CONFIG_LOG_CPU_BUF_SHIFT)
  static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
  static char *log_buf = __log_buf;
  static u32 log_buf_len = __LOG_BUF_LEN;
 @@ -752,9 +753,10 @@ void __init setup_log_buf(int early)
   unsigned long flags;
  

Re: [RFC] printk: allow increasing the ring buffer depending on the number of CPUs

2014-06-11 Thread Luis R. Rodriguez
On Wed, Jun 11, 2014 at 11:34:47AM +0200, Petr Mládek wrote:
 On Tue 2014-06-10 18:04:45, Luis R. Rodriguez wrote:
  From: Luis R. Rodriguez mcg...@suse.com
  diff --git a/init/Kconfig b/init/Kconfig
  index 9d3585b..1814436 100644
  --- a/init/Kconfig
  +++ b/init/Kconfig
  @@ -806,6 +806,34 @@ config LOG_BUF_SHIFT
   13 =  8 KB
   12 =  4 KB
   
  +config LOG_CPU_BUF_SHIFT
  +   int CPU kernel log buffer size contribution (13 = 8 KB, 17 = 128KB)
  +   range 0 21
  +   default 0
  +   help
  + The kernel ring buffer will get additional data logged onto it
  + when multiple CPUs are supported. Typically the contributions is a
  + few lines when idle however under under load this can vary and in the
  + worst case it can mean loosing logging information. You can use this
  + to set the maximum expected mount of amount of logging contribution
  + under load by each CPU in the worst case scenerio. Select a size as
  + a power of 2. For example if LOG_BUF_SHIFT is 18 and if your
  + LOG_CPU_BUF_SHIFT is 12 your kernel ring buffer size will be as
  + follows having 16 CPUs as possible.
  +
  +((1  18) + ((16 - 1) * (1  12))) / 1024 = 316 KB
 
 It might be better to use the CPU_NUM-specific value as a minimum of
 the needed space. Linux distributions might want to distribute kernel
 with non-zero value and still use the static __log_buf on reasonable
 small systems.

Not sure if I follow what you mean by CPU_NUM-specific, can you elaborate?
The default in this patch is to ignore this, do you mean that upstream
should probably default to a non-zero value here and then let distributions
select 0 for some kernel builds ? If so then perhaps adding a sysctl
override value might be good to allow only small systems to override
this to 0?

  + Where as typically you'd only end up with 256 KB. This is disabled
  + by default with a value of 0.
 
 I would add:
 
   This value is ignored when log_buf_len commandline parameter
   is used. It forces the exact size of the ring buffer.

Good point, I've amended this in.

  + Examples:
  +17 = 128 KB
  +16 = 64 KB
  +15 = 32 KB
  +14 = 16 KB
  +13 =  8 KB
  +12 =  4 KB
 
 I think that we should make it more cleat that it is per-CPU here,
 for example:
 
   17 = 128 KB for each CPU
   16 =  64 KB for each CPU
   15 =  32 KB for each CPU
   14 =  16 KB for each CPU
   13 =   8 KB for each CPU
   12 =   4 KB for each CPU

Thanks, amended as well.

  diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
  index 7228258..2023424 100644
  --- a/kernel/printk/printk.c
  +++ b/kernel/printk/printk.c
  @@ -246,6 +246,7 @@ static u32 clear_idx;
   #define LOG_ALIGN __alignof__(struct printk_log)
   #endif
   #define __LOG_BUF_LEN (1  CONFIG_LOG_BUF_SHIFT)
  +#define __LOG_CPU_BUF_LEN (1  CONFIG_LOG_CPU_BUF_SHIFT)
   static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
   static char *log_buf = __log_buf;
   static u32 log_buf_len = __LOG_BUF_LEN;
  @@ -752,9 +753,10 @@ void __init setup_log_buf(int early)
  unsigned long flags;
  char *new_log_buf;
  int free;
  +   int cpu_extra = (num_possible_cpus() - 1) * __LOG_CPU_BUF_LEN;
   
  -   if (!new_log_buf_len)
  -   return;
  +   if (!new_log_buf_len  cpu_extra  1)
  +   new_log_buf_len = __LOG_BUF_LEN + cpu_extra;
 
 We still should return when both new_log_buf_len and cpu_extra are
 zero and call here:
 
   if (!new_log_buf_len)
   return;

The check for cpu_extra  1 does that -- the default in the patch was 0
and 1  0 is 1, so if in the case that the default is used we'd bail
just like before. Or did I perhaps miss what you were saying here?

 Also I would feel more comfortable if we somehow limit the maximum
 size of cpu_extra.

Michal had similar concerns and I thought up to limit it to 1024 max
CPUs, but after my second implementation I did some math on the values
that would be used if say LOG_CPU_BUF_SHIFT was 12, it turns out to not
be *that* bad for even huge num_possible_cpus(). For example for 4096
num_possible_cpus() this comes out to with LOG_BUF_SHIFT of 18:


((1  18) + ((4096 - 1) * (1  12))) / 1024 = 16636 KB

~16 MB doesn't seem that bad for such a monster box which I'd presume
would have an insane amount of memory. If this logic however does
seems unreasonable and we should cap it -- then by all means lets
pick a sensible number, its just not clear to me what that number
should be. Another reason why I stayed away from capping this was
that we'd then likely end up capping this in the future, and I was
trying to find a solution that would not require mucking as
technology evolves. The reasoning above is also why I had opted to
make the default to 0, only distributions would have a good sense
of what might 

[RFC] printk: allow increasing the ring buffer depending on the number of CPUs

2014-06-10 Thread Luis R. Rodriguez
From: "Luis R. Rodriguez" 

The default size of the ring buffer is too small for machines
with a large amount of CPUs under heavy load. What ends up
happening when debugging is the ring buffer overlaps and chews
up old messages making debugging impossible unless the size is
passed as a kernel parameter. An idle system upon boot up will
on average spew out only about one or two extra lines but where
this really matters is on heavy load and that will vary widely
depending on the system and environment.

There are mechanisms to help increase the kernel ring buffer
for tracing through debugfs, and those interfaces even allow growing
the kernel ring buffer per CPU. We also have a static value which
can be passed upon boot. Relying on debugfs however is not ideal
for production, and relying on the value passed upon bootup is
can only used *after* an issue has creeped up. Instead of being
reactive this adds a proactive measure which lets you scale the
amount of contributions you'd expect to the kernel ring buffer
under load by each CPU in the worst case scenerio.

We use num_possible_cpus() to avoid complexities which could be
introduced by dynamically changing the ring buffer size at run
time, num_possible_cpus() lets us use the upper limit on possible
number of CPUs therefore avoiding having to deal with hotplugging
CPUs on and off. This option is diabled by default, and if used
the kernel ring buffer size then can be computed as follows:

size = __LOG_BUF_LEN + (num_possible_cpus() - 1 ) *  __LOG_CPU_BUF_LEN

Cc: Michal Hocko 
Cc: Petr Mladek 
Cc: Andrew Morton 
Cc: Joe Perches 
Cc: Arun KS 
Cc: Kees Cook 
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez 
---
 init/Kconfig   | 28 
 kernel/printk/printk.c |  6 --
 2 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/init/Kconfig b/init/Kconfig
index 9d3585b..1814436 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -806,6 +806,34 @@ config LOG_BUF_SHIFT
 13 =>  8 KB
 12 =>  4 KB
 
+config LOG_CPU_BUF_SHIFT
+   int "CPU kernel log buffer size contribution (13 => 8 KB, 17 => 128KB)"
+   range 0 21
+   default 0
+   help
+ The kernel ring buffer will get additional data logged onto it
+ when multiple CPUs are supported. Typically the contributions is a
+ few lines when idle however under under load this can vary and in the
+ worst case it can mean loosing logging information. You can use this
+ to set the maximum expected mount of amount of logging contribution
+ under load by each CPU in the worst case scenerio. Select a size as
+ a power of 2. For example if LOG_BUF_SHIFT is 18 and if your
+ LOG_CPU_BUF_SHIFT is 12 your kernel ring buffer size will be as
+ follows having 16 CPUs as possible.
+
+((1 << 18) + ((16 - 1) * (1 << 12))) / 1024 = 316 KB
+
+ Where as typically you'd only end up with 256 KB. This is disabled
+ by default with a value of 0.
+
+ Examples:
+17 => 128 KB
+16 => 64 KB
+15 => 32 KB
+14 => 16 KB
+13 =>  8 KB
+12 =>  4 KB
+
 #
 # Architectures with an unreliable sched_clock() should select this:
 #
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 7228258..2023424 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -246,6 +246,7 @@ static u32 clear_idx;
 #define LOG_ALIGN __alignof__(struct printk_log)
 #endif
 #define __LOG_BUF_LEN (1 << CONFIG_LOG_BUF_SHIFT)
+#define __LOG_CPU_BUF_LEN (1 << CONFIG_LOG_CPU_BUF_SHIFT)
 static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
 static char *log_buf = __log_buf;
 static u32 log_buf_len = __LOG_BUF_LEN;
@@ -752,9 +753,10 @@ void __init setup_log_buf(int early)
unsigned long flags;
char *new_log_buf;
int free;
+   int cpu_extra = (num_possible_cpus() - 1) * __LOG_CPU_BUF_LEN;
 
-   if (!new_log_buf_len)
-   return;
+   if (!new_log_buf_len && cpu_extra > 1)
+   new_log_buf_len = __LOG_BUF_LEN + cpu_extra;
 
if (early) {
new_log_buf =
-- 
2.0.0.rc3.18.g00a5b79

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RFC] printk: allow increasing the ring buffer depending on the number of CPUs

2014-06-10 Thread Luis R. Rodriguez
From: Luis R. Rodriguez mcg...@suse.com

The default size of the ring buffer is too small for machines
with a large amount of CPUs under heavy load. What ends up
happening when debugging is the ring buffer overlaps and chews
up old messages making debugging impossible unless the size is
passed as a kernel parameter. An idle system upon boot up will
on average spew out only about one or two extra lines but where
this really matters is on heavy load and that will vary widely
depending on the system and environment.

There are mechanisms to help increase the kernel ring buffer
for tracing through debugfs, and those interfaces even allow growing
the kernel ring buffer per CPU. We also have a static value which
can be passed upon boot. Relying on debugfs however is not ideal
for production, and relying on the value passed upon bootup is
can only used *after* an issue has creeped up. Instead of being
reactive this adds a proactive measure which lets you scale the
amount of contributions you'd expect to the kernel ring buffer
under load by each CPU in the worst case scenerio.

We use num_possible_cpus() to avoid complexities which could be
introduced by dynamically changing the ring buffer size at run
time, num_possible_cpus() lets us use the upper limit on possible
number of CPUs therefore avoiding having to deal with hotplugging
CPUs on and off. This option is diabled by default, and if used
the kernel ring buffer size then can be computed as follows:

size = __LOG_BUF_LEN + (num_possible_cpus() - 1 ) *  __LOG_CPU_BUF_LEN

Cc: Michal Hocko mho...@suse.cz
Cc: Petr Mladek pmla...@suse.cz
Cc: Andrew Morton a...@linux-foundation.org
Cc: Joe Perches j...@perches.com
Cc: Arun KS arunks.li...@gmail.com
Cc: Kees Cook keesc...@chromium.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez mcg...@suse.com
---
 init/Kconfig   | 28 
 kernel/printk/printk.c |  6 --
 2 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/init/Kconfig b/init/Kconfig
index 9d3585b..1814436 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -806,6 +806,34 @@ config LOG_BUF_SHIFT
 13 =  8 KB
 12 =  4 KB
 
+config LOG_CPU_BUF_SHIFT
+   int CPU kernel log buffer size contribution (13 = 8 KB, 17 = 128KB)
+   range 0 21
+   default 0
+   help
+ The kernel ring buffer will get additional data logged onto it
+ when multiple CPUs are supported. Typically the contributions is a
+ few lines when idle however under under load this can vary and in the
+ worst case it can mean loosing logging information. You can use this
+ to set the maximum expected mount of amount of logging contribution
+ under load by each CPU in the worst case scenerio. Select a size as
+ a power of 2. For example if LOG_BUF_SHIFT is 18 and if your
+ LOG_CPU_BUF_SHIFT is 12 your kernel ring buffer size will be as
+ follows having 16 CPUs as possible.
+
+((1  18) + ((16 - 1) * (1  12))) / 1024 = 316 KB
+
+ Where as typically you'd only end up with 256 KB. This is disabled
+ by default with a value of 0.
+
+ Examples:
+17 = 128 KB
+16 = 64 KB
+15 = 32 KB
+14 = 16 KB
+13 =  8 KB
+12 =  4 KB
+
 #
 # Architectures with an unreliable sched_clock() should select this:
 #
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 7228258..2023424 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -246,6 +246,7 @@ static u32 clear_idx;
 #define LOG_ALIGN __alignof__(struct printk_log)
 #endif
 #define __LOG_BUF_LEN (1  CONFIG_LOG_BUF_SHIFT)
+#define __LOG_CPU_BUF_LEN (1  CONFIG_LOG_CPU_BUF_SHIFT)
 static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
 static char *log_buf = __log_buf;
 static u32 log_buf_len = __LOG_BUF_LEN;
@@ -752,9 +753,10 @@ void __init setup_log_buf(int early)
unsigned long flags;
char *new_log_buf;
int free;
+   int cpu_extra = (num_possible_cpus() - 1) * __LOG_CPU_BUF_LEN;
 
-   if (!new_log_buf_len)
-   return;
+   if (!new_log_buf_len  cpu_extra  1)
+   new_log_buf_len = __LOG_BUF_LEN + cpu_extra;
 
if (early) {
new_log_buf =
-- 
2.0.0.rc3.18.g00a5b79

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/