Re: [PATCH] printk: Userspace format enumeration support

2021-02-07 Thread Chris Down

kernel test robot writes:

All errors (new ones prefixed by >>):

  /usr/bin/ld: arch/um/drivers/xterm.o: in function `xterm_open':

xterm.c:(.text+0x16b): undefined reference to `printk'
/usr/bin/ld: xterm.c:(.text+0x1a8): undefined reference to `printk'

  /usr/bin/ld: xterm.c:(.text+0x1f4): undefined reference to `printk'
  /usr/bin/ld: xterm.c:(.text+0x25e): undefined reference to `printk'
  /usr/bin/ld: xterm.c:(.text+0x2b5): undefined reference to `printk'
  /usr/bin/ld: arch/um/drivers/xterm.o:xterm.c:(.text+0x31c): more undefined 
references to `printk' follow
  collect2: error: ld returned 1 exit status


Ah yes, since it just uses it as extern. I'll fix it for v2.


Re: [PATCH] printk: Userspace format enumeration support

2021-02-07 Thread kernel test robot
Hi Chris,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on jeyu/modules-next]
[also build test ERROR on linux/master soc/for-next openrisc/for-next 
powerpc/next asm-generic/master linus/master v5.11-rc6 next-20210125]
[cannot apply to tip/x86/core]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Chris-Down/printk-Userspace-format-enumeration-support/20210204-234240
base:   https://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux.git 
modules-next
config: um-x86_64_defconfig (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
reproduce (this is a W=1 build):
# 
https://github.com/0day-ci/linux/commit/08a6039db35c32e39de288d58d0362013fdfb387
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review 
Chris-Down/printk-Userspace-format-enumeration-support/20210204-234240
git checkout 08a6039db35c32e39de288d58d0362013fdfb387
# save the attached .config to linux build tree
make W=1 ARCH=um SUBARCH=x86_64

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

   /usr/bin/ld: arch/um/drivers/xterm.o: in function `xterm_open':
>> xterm.c:(.text+0x16b): undefined reference to `printk'
>> /usr/bin/ld: xterm.c:(.text+0x1a8): undefined reference to `printk'
   /usr/bin/ld: xterm.c:(.text+0x1f4): undefined reference to `printk'
   /usr/bin/ld: xterm.c:(.text+0x25e): undefined reference to `printk'
   /usr/bin/ld: xterm.c:(.text+0x2b5): undefined reference to `printk'
   /usr/bin/ld: arch/um/drivers/xterm.o:xterm.c:(.text+0x31c): more undefined 
references to `printk' follow
   collect2: error: ld returned 1 exit status

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


.config.gz
Description: application/gzip


Re: [PATCH] printk: Userspace format enumeration support

2021-02-07 Thread Chris Down

Chris Down writes:

3. `KERN_SOH + level' can appear in other places than just printk strings

KERN_SOH is just ASCII '\001' -- it's not distinctive or unique, even when
paired with a check for something that looks like a level after it. For this
reason, your proposed patch results in a non-trivial amount of non-printk
related garbage in its output. For example:

 % binutils/strings -k /tmp/vmlinux | head -5
 3L)s
 3L)s
 c,[]A\
 c(L)c
 d$pL)d$`u4

Fundamentally, one cannot use a tool which just determines whether something is
printable to determine semantic intent.


$ kernel_strings --kernel --section ".rodata" vmlinux

I got exactly 0.


"It works on my computer" is not a valid testing methodology, 
especially for something as complex as the Linux kernel. It's 
especially not a valid rebuttal to someone demonstrating that it 
clearly doesn't work on theirs.


Even filtering to the .rodata section, there's plenty of garbage just 
in the first five cases:


   % binutils/strings --kernel --section ".rodata" /tmp/vmlinux | head -5
   3*** Your BIOS seems to not contain a fix for K8 errata #93
   1>pBC)
   dTRAC
   6Run %s as init process
   7calling  %pS @ %i

Clearly there are cases that you are not considering. My kernel config 
is attached if you want to try and replicate, but regardless, it's 
really not valid to say "it works for me" in response to someone 
showing that it doesn't.


Attached.
#
# Automatically generated file; DO NOT EDIT.
# Linux/x86 5.11.0-rc4-mm1 Kernel Configuration
#
CONFIG_CC_VERSION_TEXT="gcc (GCC) 10.2.0"
CONFIG_CC_IS_GCC=y
CONFIG_GCC_VERSION=100200
CONFIG_LD_VERSION=23501
CONFIG_CLANG_VERSION=0
CONFIG_LLD_VERSION=0
CONFIG_CC_CAN_LINK=y
CONFIG_CC_CAN_LINK_STATIC=y
CONFIG_CC_HAS_ASM_GOTO=y
CONFIG_CC_HAS_ASM_INLINE=y
CONFIG_CONSTRUCTORS=y
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_TABLE_SORT=y
CONFIG_THREAD_INFO_IN_TASK=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_BUILD_SALT=""
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
CONFIG_HAVE_KERNEL_ZSTD=y
CONFIG_KERNEL_GZIP=y
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
# CONFIG_KERNEL_ZSTD is not set
CONFIG_DEFAULT_INIT=""
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
# CONFIG_POSIX_MQUEUE is not set
# CONFIG_WATCH_QUEUE is not set
# CONFIG_CROSS_MEMORY_ATTACH is not set
# CONFIG_USELIB is not set
# CONFIG_AUDIT is not set
CONFIG_HAVE_ARCH_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_GENERIC_IRQ_MIGRATION=y
CONFIG_HARDIRQS_SW_RESEND=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_DOMAIN_HIERARCHY=y
CONFIG_GENERIC_MSI_IRQ=y
CONFIG_GENERIC_MSI_IRQ_DOMAIN=y
CONFIG_GENERIC_IRQ_MATRIX_ALLOCATOR=y
CONFIG_GENERIC_IRQ_RESERVATION_MODE=y
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
# CONFIG_GENERIC_IRQ_DEBUGFS is not set
# end of IRQ subsystem

CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_INIT=y
CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_HAVE_POSIX_CPU_TIMERS_TASK_WORK=y
CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ_FULL is not set
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
# end of Timers subsystem

# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
CONFIG_PREEMPT_COUNT=y

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
# CONFIG_VIRT_CPU_ACCOUNTING_GEN is not set
# CONFIG_IRQ_TIME_ACCOUNTING is not set
# CONFIG_BSD_PROCESS_ACCT is not set
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
CONFIG_TASK_XACCT=y
CONFIG_TASK_IO_ACCOUNTING=y
CONFIG_PSI=y
# CONFIG_PSI_DEFAULT_DISABLED is not set
# end of CPU/Task time and stats accounting

# CONFIG_CPU_ISOLATION is not set

#
# RCU Subsystem
#
CONFIG_TREE_RCU=y
# CONFIG_RCU_EXPERT is not set
CONFIG_SRCU=y
CONFIG_TREE_SRCU=y
CONFIG_TASKS_RCU_GENERIC=y
CONFIG_TASKS_TRACE_RCU=y
CONFIG_RCU_STALL_COMMON=y
CONFIG_RCU_NEED_SEGCBLIST=y
# end of RCU Subsystem

CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
# CONFIG_IKHEADERS is not set
CONFIG_LOG_BUF_SHIFT=17
CONFIG_LOG_CPU_MAX_BUF_SHIFT=12
CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=13
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y

#
# Scheduler features
#
# CONFIG_UCLAMP_TASK is not set
# end of Scheduler features

CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH=y
CONFIG_CC_HAS_INT128=y
CONFIG_ARCH_SUPPORTS_INT128=y

Re: [PATCH] printk: Userspace format enumeration support

2021-02-07 Thread Chris Down

Joe Perches writes:

There are several issues with your proposed approach that make it unsuitable
for use as part of a reliable production environment:

1. It misses printk() formats without KERN_SOH

printk() formats without KERN_SOH are legal and use MESSAGE_LOGLEVEL_DEFAULT.
On my test kernel, your proposed patch loses >5% of printk formats -- over 200
messages -- due to this, including critical ones like those about hardware or
other errors.


There are _very_ few of those printks without KERN_ and those
very few are not generally being changed.


I already specified how many are lost: 5%. That's not "very few". That's a huge 
proportion of the coverage afforded by this patch, including several important 
cases.


Relying on "they generally don't change" is not a recipe for reliability or 
success (and they do change, more data on that below).



2. Users don't always have the kernel image available

Many of our machines and many of the machines of others like us do not boot
using local storage, but instead use PXE or other technologies where the kernel
may not be stored during runtime.

As is described in the changelog, it is necessary to be able to vary
remediations not only based on what is already in /dev/kmsg, but also to be
able to make decisions about our methodology based on what's _supported_ in the
running kernel at runtime, and your proposed approach makes this not viable.


Indirection would alway work.

You could load a separate file with output strings along with your
kernel image.


You're moving the goalposts quite quickly here, which makes it harder to reply 
to your points. Now you're proposing an entirely separate distribution path, 
compared to interfaces that we already have precedent for in the kernel (eg.  
trace_printk). That requires a strong justification, and I'm not seeing one 
here.



3. `KERN_SOH + level' can appear in other places than just printk strings

KERN_SOH is just ASCII '\001' -- it's not distinctive or unique, even when
paired with a check for something that looks like a level after it. For this
reason, your proposed patch results in a non-trivial amount of non-printk
related garbage in its output. For example:

 % binutils/strings -k /tmp/vmlinux | head -5
 3L)s
 3L)s
 c,[]A\
 c(L)c
 d$pL)d$`u4

Fundamentally, one cannot use a tool which just determines whether something is
printable to determine semantic intent.


$ kernel_strings --kernel --section ".rodata" vmlinux

I got exactly 0.


"It works on my computer" is not a valid testing methodology, especially for 
something as complex as the Linux kernel. It's especially not a valid rebuttal 
to someone demonstrating that it clearly doesn't work on theirs.


Even filtering to the .rodata section, there's plenty of garbage just in the 
first five cases:


% binutils/strings --kernel --section ".rodata" /tmp/vmlinux | head -5
3*** Your BIOS seems to not contain a fix for K8 errata #93
1>pBC)
dTRAC
6Run %s as init process
7calling  %pS @ %i

Clearly there are cases that you are not considering. My kernel config is 
attached if you want to try and replicate, but regardless, it's really not 
valid to say "it works for me" in response to someone showing that it doesn't.



4. strings(1) output cannot differentiate embedded newlines and new formats

The following has exactly the same output from strings(1), but will manifest
completely differently at printk() time:

 printk(KERN_ERR "line one\nline two\nline three\n");
 printk("line four\n");


This is not the preferred output style and is only done in old and
unchanging code.

Your use case in your commit log is looking for _changed_ formats.


Joe, it's fine to present alternatives to people's patches, but please do your 
research before spouting things like this. It's a waste of everyone's time to 
refute things which are so easily demonstrated to be false.


Here are a bunch of recent changes to printk I found just from literally 2 
minutes of looking through `git log`:


- ea34f78f3df6: 2020, printk site deleted (which of course we also need to 
know.)
- a0f6d924cada: 2020, new callsite. the level is printed dynamically, so your 
proposed patch would not match.
- bf13718bc57a: 2020, existing printk changed. 
- 994388f228c6: 2020, printk site changed to au0828_isocdbg, reworded entirely.

- a8b62fd08505: 2020, new callsite, dynamic level.

I could find literally pages and pages of these just from the last few years.  
Your belief that these printks are only in "unchanging" code does not match 
reality.



On Thu, 2021-02-04 at 15:37 +, Chris Down wrote:

This patch provides a solution to the issue of silently changed or
deleted printks:


Exactly _how_ many of these use cases do you think exist?

The generally preferred style for the example above would be:

pr_err("line one\n");
pr_err("line two\n");
pr_err("line three\n");
pr_err("line four\n");


I have no idea 

Re: [PATCH] printk: Userspace format enumeration support

2021-02-07 Thread Joe Perches
On Sun, 2021-02-07 at 14:13 +, Chris Down wrote:
> Joe Perches writes:
> > > There are certainly printks which can't be trivially monitored using the 
> > > printk
> > > format alone, but the vast majority of the ones that are monitored _do_ 
> > > have
> > > meaningful formats and can be monitored over time. No solution to this is 
> > > going
> > > to catch every single case, especially when so much of the information 
> > > can be
> > > generated dyamically, but this patchset still goes a long way to making 
> > > printk
> > > monitoring more tractable for use cases like the one described in the
> > > changelog.
> > 
> > For the _vast_ majority of printk strings, this can easily be found
> > and compared using a trivial modification to strings.
> 
> There are several issues with your proposed approach that make it unsuitable 
> for use as part of a reliable production environment:
> 
> 1. It misses printk() formats without KERN_SOH
> 
> printk() formats without KERN_SOH are legal and use MESSAGE_LOGLEVEL_DEFAULT. 
>  
> On my test kernel, your proposed patch loses >5% of printk formats -- over 
> 200 
> messages -- due to this, including critical ones like those about hardware or 
> other errors.

There are _very_ few of those printks without KERN_ and those
very few are not generally being changed.

> 2. Users don't always have the kernel image available
> 
> Many of our machines and many of the machines of others like us do not boot 
> using local storage, but instead use PXE or other technologies where the 
> kernel 
> may not be stored during runtime.
> 
> As is described in the changelog, it is necessary to be able to vary 
> remediations not only based on what is already in /dev/kmsg, but also to be 
> able to make decisions about our methodology based on what's _supported_ in 
> the 
> running kernel at runtime, and your proposed approach makes this not viable.

Indirection would alway work.

You could load a separate file with output strings along with your
kernel image.

> 3. `KERN_SOH + level' can appear in other places than just printk strings
> 
> KERN_SOH is just ASCII '\001' -- it's not distinctive or unique, even when 
> paired with a check for something that looks like a level after it. For this 
> reason, your proposed patch results in a non-trivial amount of non-printk 
> related garbage in its output. For example:
> 
>  % binutils/strings -k /tmp/vmlinux | head -5
>  3L)s
>  3L)s
>  c,[]A\
>  c(L)c
>  d$pL)d$`u4
> 
> Fundamentally, one cannot use a tool which just determines whether something 
> is 
> printable to determine semantic intent.

$ kernel_strings --kernel --section ".rodata" vmlinux

I got exactly 0.

> 4. strings(1) output cannot differentiate embedded newlines and new formats
> 
> The following has exactly the same output from strings(1), but will manifest 
> completely differently at printk() time:
> 
>  printk(KERN_ERR "line one\nline two\nline three\n");
>  printk("line four\n");

This is not the preferred output style and is only done in old and
unchanging code.

Your use case in your commit log is looking for _changed_ formats.

On Thu, 2021-02-04 at 15:37 +, Chris Down wrote:
> This patch provides a solution to the issue of silently changed or
> deleted printks:

Exactly _how_ many of these use cases do you think exist?

The generally preferred style for the example above would be:

pr_err("line one\n");
pr_err("line two\n");
pr_err("line three\n");
pr_err("line four\n");

> The originally posted patch _does_ differentiate between these cases, using 
> \0 
> as a reliable separator. Its outputs are, respectively:
> 
>  \0013line one\nline two\nline three\0\nline four\n\0
>  \0013line one\nline two\n\0line three\nline four\n\0
> 
> This isn't just a theoretical concern -- there are plenty of places which use 
> multiline printks, and we must be able to distinguish between that and 
> _multiple_ printks.

Just like there are many places that use buffered printks as the
example I gave earlier.  None of which your proposed solution would find.

> 5. strings(1) is not contextually aware, and cannot be made to act as if it is
> 
> strings has no idea about what it is reading, which is why it is more than 
> happy to output the kind of meaningless output shown in #3. There are plenty 
> of 
> places across the kernel where there might be a sequence of bytes which the 
> strings utility happens to interpret as being semantically meaningful, but in 
> reality just happens to be an unrelated sequence of coincidentally printable 
> bytes that just happens to contain a \001.
> 
> I appreciate your willingness to propose other solutions, but for these 
> reasons, the proposed strings(1) patch would not suffice as an interface for 
> printk enumeration.

I think you are on a path to try to make printk output immutable.
I think that's a _very_ bad path.

I also think this is adding needless 

Re: [PATCH] printk: Userspace format enumeration support

2021-02-07 Thread Chris Down

Joe Perches writes:

There are certainly printks which can't be trivially monitored using the printk
format alone, but the vast majority of the ones that are monitored _do_ have
meaningful formats and can be monitored over time. No solution to this is going
to catch every single case, especially when so much of the information can be
generated dyamically, but this patchset still goes a long way to making printk
monitoring more tractable for use cases like the one described in the
changelog.


For the _vast_ majority of printk strings, this can easily be found
and compared using a trivial modification to strings.


There are several issues with your proposed approach that make it unsuitable 
for use as part of a reliable production environment:


1. It misses printk() formats without KERN_SOH

printk() formats without KERN_SOH are legal and use MESSAGE_LOGLEVEL_DEFAULT.  
On my test kernel, your proposed patch loses >5% of printk formats -- over 200 
messages -- due to this, including critical ones like those about hardware or 
other errors.


2. Users don't always have the kernel image available

Many of our machines and many of the machines of others like us do not boot 
using local storage, but instead use PXE or other technologies where the kernel 
may not be stored during runtime.


As is described in the changelog, it is necessary to be able to vary 
remediations not only based on what is already in /dev/kmsg, but also to be 
able to make decisions about our methodology based on what's _supported_ in the 
running kernel at runtime, and your proposed approach makes this not viable.


3. `KERN_SOH + level' can appear in other places than just printk strings

KERN_SOH is just ASCII '\001' -- it's not distinctive or unique, even when 
paired with a check for something that looks like a level after it. For this 
reason, your proposed patch results in a non-trivial amount of non-printk 
related garbage in its output. For example:


% binutils/strings -k /tmp/vmlinux | head -5
3L)s
3L)s
c,[]A\
c(L)c
d$pL)d$`u4

Fundamentally, one cannot use a tool which just determines whether something is 
printable to determine semantic intent.


4. strings(1) output cannot differentiate embedded newlines and new formats

The following has exactly the same output from strings(1), but will manifest 
completely differently at printk() time:


printk(KERN_ERR "line one\nline two\nline three\n");
printk("line four\n");

With strings, the hypothetical output would be:*

3line one\nline two\nline three\nline four\n

* "line four\n" would also be missing with your current -k check.

But this makes it impossible to distinguish between this, compared to:

printk(KERN_ERR "line one\nline two\n");
printk("line three\n");
printk("line four\n");

The originally posted patch _does_ differentiate between these cases, using \0 
as a reliable separator. Its outputs are, respectively:


\0013line one\nline two\nline three\0\nline four\n\0
\0013line one\nline two\n\0line three\nline four\n\0

This isn't just a theoretical concern -- there are plenty of places which use 
multiline printks, and we must be able to distinguish between that and 
_multiple_ printks. Not being able to differentiate cases like these would 
dramatically reduce the effectiveness of printk enumeration, as we can no 
longer ascertain which formats will always be used together (for example, in 
the case of sequences of printks guarded by conditionals, which are all over 
the place).


5. strings(1) is not contextually aware, and cannot be made to act as if it is

strings has no idea about what it is reading, which is why it is more than 
happy to output the kind of meaningless output shown in #3. There are plenty of 
places across the kernel where there might be a sequence of bytes which the 
strings utility happens to interpret as being semantically meaningful, but in 
reality just happens to be an unrelated sequence of coincidentally printable 
bytes that just happens to contain a \001.


I appreciate your willingness to propose other solutions, but for these 
reasons, the proposed strings(1) patch would not suffice as an interface for 
printk enumeration.


Re: [PATCH] printk: Userspace format enumeration support

2021-02-06 Thread Joe Perches
On Sat, 2021-02-06 at 21:21 +, Chris Down wrote:
> Joe Perches writes:
> > On Fri, 2021-02-05 at 22:25 +, Chris Down wrote:
> > > Petr Mladek writes:
> > > >   +  is already optinaly added by pr_fmt() to the printed 
> > > > strings
> > > > as:  pr_fmt(): ...
> > > 
> > > pr_fmts are not consistently used across the kernel, and sometimes differ 
> > > from
> > > the module itself. Many modules don't use it at all, and we also don't 
> > > have it
> > > for pr_cont. Just picking some random examples:
> > > 
> > >  % grep -av vmlinux /proc/printk_formats | shuf -n 10
> > >  mac80211,6%s: mesh STA %pM switches to channel requiring DFS (%d 
> > > MHz, width:%d, CF1/2: %d/%d MHz), aborting
> > >  thinkpad_acpi,c N/Athinkpad_acpi,c %dthinkpad_acpi,5thinkpad_acpi: 
> > > temperatures (Celsius):thinkpad_acpi,3thinkpad_acpi: Out of memory for 
> > > LED data
> > 
> > I don't understand this format.
> > 
> > "Out of memory for LED data" is a single printk ending with a '\n' newline
> > I expected this to be broken up into multiple lines, one for each printk
> > that endsd in a newline.
> 
> Hmm, that's just a manifestation of directly using `shuf` without doing the 
> transformation of trailing nulls to newlines shown in the changelog. They are 
> still distinct and separated by nulls.
> 
> > And what would happen if the function was refactored removing the pr_cont
> > uses like the below: (basically, any output that uses a mechanism that
> > aggregates a buffer then emits it, and there are a _lot_ of those)
> > 
> > printk("%s\n", buffer);
> 
> There are certainly printks which can't be trivially monitored using the 
> printk 
> format alone, but the vast majority of the ones that are monitored _do_ have 
> meaningful formats and can be monitored over time. No solution to this is 
> going 
> to catch every single case, especially when so much of the information can be 
> generated dyamically, but this patchset still goes a long way to making 
> printk 
> monitoring more tractable for use cases like the one described in the 
> changelog.

For the _vast_ majority of printk strings, this can easily be found
and compared using a trivial modification to strings.

Module specific formats are stored in the .ko files and could be
examined separately.

Here's the possible patch to strings:

---
 binutils/strings.c | 98 +-
 1 file changed, 82 insertions(+), 16 deletions(-)

diff --git a/binutils/strings.c b/binutils/strings.c
index 3444fbc7222..d4abeb0cdbb 100644
--- a/binutils/strings.c
+++ b/binutils/strings.c
@@ -26,6 +26,10 @@
--data
-d  Scan only the initialized data section(s) of object files.
 
+   --section=
+   -S  Scan only the specific section(s)
+   --kernel
+   -k  Scan only linux-kernel KERN_SOH strings
--print-file-name
-f  Print the name of the file before each string.
 
@@ -108,6 +112,14 @@ static bfd_boolean print_filenames;
 /* TRUE means for object files scan only the data section.  */
 static bfd_boolean datasection_only;
 
+/* TRUE means for object files scan only the specified sections.  */
+static bfd_boolean specified_sections_only;
+static int specified_sections_count;
+static char **specified_sections;
+
+/* TRUE means scan only linux-kernel printk strings with KERN_SOH.  */
+static bfd_boolean linux_kernel_soh;
+
 /* The BFD object file format.  */
 static char *target;
 
@@ -122,6 +134,8 @@ static struct option long_options[] =
 {
   {"all", no_argument, NULL, 'a'},
   {"data", no_argument, NULL, 'd'},
+  {"section", required_argument, NULL, 'S'},
+  {"kernel", no_argument, NULL, 'k'},
   {"print-file-name", no_argument, NULL, 'f'},
   {"bytes", required_argument, NULL, 'n'},
   {"radix", required_argument, NULL, 't'},
@@ -173,7 +187,7 @@ main (int argc, char **argv)
   encoding = 's';
   output_separator = NULL;
 
-  while ((optc = getopt_long (argc, argv, "adfhHn:wot:e:T:s:Vv0123456789",
+  while ((optc = getopt_long (argc, argv, "adS:s:kfhHn:wot:e:T:s:Vv0123456789",
  long_options, (int *) 0)) != EOF)
 {
   switch (optc)
@@ -186,6 +200,17 @@ main (int argc, char **argv)
  datasection_only = TRUE;
  break;
 
+   case 'S':
+ specified_sections_only = TRUE;
+ specified_sections = xrealloc(specified_sections,
+   (specified_sections_count + 1) * 
sizeof(const char *));
+ specified_sections[specified_sections_count++] = optarg;
+ break;
+
+   case 'k':
+ linux_kernel_soh = TRUE;
+ break;
+
case 'f':
  print_filenames = TRUE;
  break;
@@ -318,6 +343,19 @@ main (int argc, char **argv)
   return (exit_status);
 }
 
+static bfd_boolean
+section_is_specified_section (asection *sect)
+{
+  int i;
+  for (i = 0; i < specified_sections_count; i++)
+{
+  if (strcmp(specified_sections[i], sect->name) == 0) {
+   

Re: [PATCH] printk: Userspace format enumeration support

2021-02-06 Thread Chris Down

Joe Perches writes:

On Fri, 2021-02-05 at 22:25 +, Chris Down wrote:

Petr Mladek writes:
>   +  is already optinaly added by pr_fmt() to the printed strings
> as:  pr_fmt(): ...

pr_fmts are not consistently used across the kernel, and sometimes differ from
the module itself. Many modules don't use it at all, and we also don't have it
for pr_cont. Just picking some random examples:

 % grep -av vmlinux /proc/printk_formats | shuf -n 10
 mac80211,6%s: mesh STA %pM switches to channel requiring DFS (%d MHz, 
width:%d, CF1/2: %d/%d MHz), aborting
 thinkpad_acpi,c N/Athinkpad_acpi,c %dthinkpad_acpi,5thinkpad_acpi: 
temperatures (Celsius):thinkpad_acpi,3thinkpad_acpi: Out of memory for LED data


I don't understand this format.

"Out of memory for LED data" is a single printk ending with a '\n' newline
I expected this to be broken up into multiple lines, one for each printk
that endsd in a newline.


Hmm, that's just a manifestation of directly using `shuf` without doing the 
transformation of trailing nulls to newlines shown in the changelog. They are 
still distinct and separated by nulls.



And what would happen if the function was refactored removing the pr_cont
uses like the below: (basically, any output that uses a mechanism that
aggregates a buffer then emits it, and there are a _lot_ of those)

printk("%s\n", buffer);


There are certainly printks which can't be trivially monitored using the printk 
format alone, but the vast majority of the ones that are monitored _do_ have 
meaningful formats and can be monitored over time. No solution to this is going 
to catch every single case, especially when so much of the information can be 
generated dyamically, but this patchset still goes a long way to making printk 
monitoring more tractable for use cases like the one described in the 
changelog.


Re: [PATCH] printk: Userspace format enumeration support

2021-02-06 Thread Joe Perches
On Fri, 2021-02-05 at 22:25 +, Chris Down wrote:
> Petr Mladek writes:
> >   +  is already optinaly added by pr_fmt() to the printed strings
> > as:  pr_fmt(): ...
> 
> pr_fmts are not consistently used across the kernel, and sometimes differ 
> from 
> the module itself. Many modules don't use it at all, and we also don't have 
> it 
> for pr_cont. Just picking some random examples:
> 
>  % grep -av vmlinux /proc/printk_formats | shuf -n 10
>  mac80211,6%s: mesh STA %pM switches to channel requiring DFS (%d MHz, 
> width:%d, CF1/2: %d/%d MHz), aborting
>  thinkpad_acpi,c N/Athinkpad_acpi,c %dthinkpad_acpi,5thinkpad_acpi: 
> temperatures (Celsius):thinkpad_acpi,3thinkpad_acpi: Out of memory for LED 
> data

I don't understand this format.

"Out of memory for LED data" is a single printk ending with a '\n' newline 
I expected this to be broken up into multiple lines, one for each printk
that endsd in a newline.

And what would happen if the function was refactored removing the pr_cont
uses like the below: (basically, any output that uses a mechanism that
aggregates a buffer then emits it, and there are a _lot_ of those)

printk("%s\n", buffer);

And there is already a relatively trivial way to do this using a modified
version of strings that looks for KERN_SOH[0-6], and if dynamic_debug is
enabled, look in the dynamic_debug section, either __verbose or __dyndbg
depending on the kernel version.

---
 drivers/platform/x86/thinkpad_acpi.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/platform/x86/thinkpad_acpi.c 
b/drivers/platform/x86/thinkpad_acpi.c
index 18b390153e7f..ff1c09c600f8 100644
--- a/drivers/platform/x86/thinkpad_acpi.c
+++ b/drivers/platform/x86/thinkpad_acpi.c
@@ -6353,21 +6353,26 @@ static void thermal_dump_all_sensors(void)
 {
int n, i;
struct ibm_thermal_sensors_struct t;
+   char output[256];
+   int len = 0;
 
n = thermal_get_sensors();
if (n <= 0)
return;
 
-   pr_notice("temperatures (Celsius):");
+   len += scnprintf(output + len, sizeof(output) - len,
+  "temperatures (Celsius):");
 
for (i = 0; i < n; i++) {
-   if (t.temp[i] != TPACPI_THERMAL_SENSOR_NA)
-   pr_cont(" %d", (int)(t.temp[i] / 1000));
+   if (t.temp[i] == TPACPI_THERMAL_SENSOR_NA)
+   len += scnprintf(output + len, sizeof(output) - len,
+" N/A");
else
-   pr_cont(" N/A");
+   len += scnprintf(output + len, sizeof(output) - len,
+" %d", t.temp[i] / 1000);
}
 
-   pr_cont("\n");
+   pr_notice("%s\n", output);
 }
 
 /* sysfs temp##_input -- */



Re: [PATCH] printk: Userspace format enumeration support

2021-02-06 Thread Chris Down

Greg Kroah-Hartman writes:

> I'm not against the idea. I don't think it belongs in /proc. Perhaps
> debugfs is a better place to put it.

Any location is fine with me, as long as it gets to userspace. How does
/printk/formats or /printk/formats/ sound to you?


That's fine with me, but I'd like to see the patch with this in it first
before approving it :)


Thanks! I'll send v2 soon then with the aforementioned changes, and a move to 
debugfs.


Re: [PATCH] printk: Userspace format enumeration support

2021-02-05 Thread Greg Kroah-Hartman
On Fri, Feb 05, 2021 at 10:45:19PM +, Chris Down wrote:
> Hi Steven,
> 
> Steven Rostedt writes:
> > Interesting, because when I was looking at the original patch (looked at
> > the lore link before reading your reply), I thought to myself "this looks
> > exactly like what I did for trace_printk formats", which the above file is
> > where it is shown. I'm curious if this work was inspired by that?
> 
> The double __builtin_constant_p() trick was suggested by Johannes based on
> prior art in trace_puts() just prior to patch submission. Other than that,
> it seems we came up with basically the same solution independently. :-)
> 
> > > Anyway, there is something wrong at the moment. The output looks fine
> > > with cat. But "less" says that it is a binary format and the output
> > > is a bit messy:
> > 
> > Hmm, that's usually the case when lseek gets messed up. Not sure how that
> > happened.
> 
> It looks as intended to me -- none of the newlines, nulls, or other control
> sequences are escaped currently, since I didn't immediately see a reason to
> do that. If that's a blocker though, I'm happy to change it.
> 
> > > $> less /proc/printk_formats
> > > "/proc/printk_formats" may be a binary file.  See it anyway?
> > > vmlinux,^A3Warning: unable to open an initial console.
> > > ^@vmlinux,^A3Failed to execute %s (error %d)
> > > ^@vmlinux,^A6Kernel memory protection disabled.
> > > ^@vmlinux,^A3Starting init: %s exists but couldn't execute it (error %d)
> > > 
> > > 
> > > That is for now. I still have to think about it. And I am also curious
> > > about what others thing about this idea.
> > > 
> > 
> > I'm not against the idea. I don't think it belongs in /proc. Perhaps
> > debugfs is a better place to put it.
> 
> Any location is fine with me, as long as it gets to userspace. How does
> /printk/formats or /printk/formats/ sound to you?

That's fine with me, but I'd like to see the patch with this in it first
before approving it :)

thanks,

greg k-h


Re: [PATCH] printk: Userspace format enumeration support

2021-02-05 Thread Steven Rostedt
On Fri, 5 Feb 2021 22:45:19 +
Chris Down  wrote:

> >I'm not against the idea. I don't think it belongs in /proc. Perhaps
> >debugfs is a better place to put it.  
> 
> Any location is fine with me, as long as it gets to userspace. How does 
> /printk/formats or /printk/formats/ sound to you?

I'm fine with it, if others are.

Something like this will probably need approval from others on the Cc list
here.

-- Steve


Re: [PATCH] printk: Userspace format enumeration support

2021-02-05 Thread Chris Down

Hi Petr,

Thanks for looking over the patch. :-)

Petr Mladek writes:

Most production issues come from unexpected phenomena, and as such
usually the code in question doesn't have easily usable tracepoints or
other counters available for the specific problem being mitigated. We
have a number of lines of monitoring defence against problems in
production (host metrics, process metrics, service metrics, etc), and
where it's not feasible to reliably monitor at another level, this kind
of pragmatic netconsole monitoring is essential.

As you'd expect, monitoring using printk is rather brittle for a number
of reasons -- most notably that the message might disappear entirely in
a new version of the kernel, or that the message may change in some way
that the regex or other classification methods start to silently fail.


Another is that printk() is not reliable on its own. Messages might
get lost. The size of the log buffer is limited. Deamon reading
/dev/kmsg need not be scheduled in time or often enough. Console
might be slow. The messages are filtered on the console by console_loglevel.


This is of course true. We don't use kmsg as the last line of defence for 
monitoring or remediation, of course, but it would be unwise to not have 
infrastructure capable of monitoring it. We often need to act quickly when 
production incidents happen, and often kmsg is the place where those 
"unexpected" issues are surfaced. It's often much more likely that there is 
some kmsg log which we can act on in those scenarios than anything else, and 
even if it's not ideal, in reality, it's typically reliable enough to at least 
mitigate the problem when dealing with a large fleet of machines :-)



# Format: ,\0
$ perl -p -e 's/\n/\\n/g;s/\0/\n/g' /proc/printk_formats | shuf -n 5
vmlinux,6Disabling APIC timer\n
intel_rapl_common,3intel_rapl_common: Cannot find matching power limit for 
constraint %d\n
dm_crypt,3device-mapper: crypt: %s: INTEGRITY AEAD ERROR, sector %llu\n
mac80211,6%s: AP bug: HT capability missing from AssocResp\n
vmlinux,3zpool: couldn't create zpool - out of memory\n


The facility and log level are not well separated from the format string.

Also this is yet another style how the format is displayed. We already have

+ console/syslog: formated by record_print_text()
+ /dev/kmsg: formatted by info_print_ext_header(),  
msg_print_ext_body().
+ /sys/kernel/debug/dynamic_debug/control
+ /sys/kernel/debug/tracing/printk_formats

We should get some inspiration from the existing interfaces.


Sure, I'm not super bound to the format, as long as we have something that can 
aid those maintaining these systems which monitor printk in identifying that a 
format was mutated or removed. The module is more or less optional -- it's just 
intended as a hint about where to look.



But we first should decide what information might be useful:

  + 'facility' should not be needed. All messages should be from
 kernel.


That's fair enough, it can be omitted. I just didn't want to stray too far from 
the netconsole format, since we already mostly have it in this format there.


My intention is to _not_ deviate from existing interfaces, really, so I'll be 
happy with any suggested format that will achieve this patch's stated goals, 
since this kind of data is sorely needed :-)



  +  is already optinaly added by pr_fmt() to the printed strings
as:  pr_fmt(): ...


pr_fmts are not consistently used across the kernel, and sometimes differ from 
the module itself. Many modules don't use it at all, and we also don't have it 
for pr_cont. Just picking some random examples:


% grep -av vmlinux /proc/printk_formats | shuf -n 10
mac80211,6%s: mesh STA %pM switches to channel requiring DFS (%d MHz, 
width:%d, CF1/2: %d/%d MHz), aborting
thinkpad_acpi,c N/Athinkpad_acpi,c %dthinkpad_acpi,5thinkpad_acpi: 
temperatures (Celsius):thinkpad_acpi,3thinkpad_acpi: Out of memory for LED data
i915,6drm/i915 developers can then reassign to the right component if it's 
not a kernel issue.
video,4[Firmware Bug]: _BCQ is used instead of _BQC
i915,3gvt: requesting SMI service
are MMIO SPTEs.
i915,3gvt: invalid tiling mode: %x
video,3ACPI: Create sysfs link
cec,6cec-%s: duplicate logical address type
soundwire_bus,3%s: %s: inconsistent state state %d


+static int proc_pf_show(struct seq_file *s, void *v)
+{
+   const struct printk_fmt_sec *ps = NULL;
+   const char **fptr = NULL;
+
+   mutex_lock(_fmts_mutex);
+
+   list_for_each_entry(ps, _fmts_list, list) {
+   const char *mod_name = ps_get_module_name(ps);
+
+   for (fptr = ps->start; fptr < ps->end; fptr++) {
+   seq_puts(s, mod_name);
+   seq_putc(s, ',');
+   seq_puts(s, *fptr);
+   seq_putc(s, '\0');
+   }


You probably should get inspiration from t_show() 

Re: [PATCH] printk: Userspace format enumeration support

2021-02-05 Thread Chris Down

Hi Steven,

Steven Rostedt writes:

Interesting, because when I was looking at the original patch (looked at
the lore link before reading your reply), I thought to myself "this looks
exactly like what I did for trace_printk formats", which the above file is
where it is shown. I'm curious if this work was inspired by that?


The double __builtin_constant_p() trick was suggested by Johannes based on 
prior art in trace_puts() just prior to patch submission. Other than that, it 
seems we came up with basically the same solution independently. :-)



Anyway, there is something wrong at the moment. The output looks fine
with cat. But "less" says that it is a binary format and the output
is a bit messy:


Hmm, that's usually the case when lseek gets messed up. Not sure how that
happened.


It looks as intended to me -- none of the newlines, nulls, or other control 
sequences are escaped currently, since I didn't immediately see a reason to do 
that. If that's a blocker though, I'm happy to change it.



$> less /proc/printk_formats
"/proc/printk_formats" may be a binary file.  See it anyway?
vmlinux,^A3Warning: unable to open an initial console.
^@vmlinux,^A3Failed to execute %s (error %d)
^@vmlinux,^A6Kernel memory protection disabled.
^@vmlinux,^A3Starting init: %s exists but couldn't execute it (error %d)


That is for now. I still have to think about it. And I am also curious
about what others thing about this idea.



I'm not against the idea. I don't think it belongs in /proc. Perhaps
debugfs is a better place to put it.


Any location is fine with me, as long as it gets to userspace. How does 
/printk/formats or /printk/formats/ sound to you?


Thanks,

Chris


Re: [PATCH] printk: Userspace format enumeration support

2021-02-05 Thread Chris Down

Oh, and one more I just spotted and will fix in v2 after other feedback is in:

Chris Down writes:

+static void store_printk_fmt_sec(const struct module *mod, const char **start,
+const char **end)
+{
+   struct printk_fmt_sec *ps = NULL;
+   const char **fptr = NULL;
+   size_t size = 0;
+
+   ps = kmalloc(sizeof(struct printk_fmt_sec), GFP_KERNEL);
+   if (!ps)
+   return;
+
+   ps->module = mod;
+   ps->start = start;
+   ps->end = end;
+
+   for (fptr = ps->start; fptr < ps->end; fptr++)
+   size += strlen(*fptr) + 1;


This still works, but is out of date and doesn't account for the module and 
comma, so results in one more unnecessary round trip doing seq_buf_alloc.


Re: [PATCH] printk: Userspace format enumeration support

2021-02-05 Thread Steven Rostedt
On Fri, 5 Feb 2021 17:42:55 +0100
Petr Mladek  wrote:

> Hi,
> 
> I would like to hear opinion from a bigger audience. It is an
> userspace interface that we might need to maintain forewer.
> Adding few more people in to CC:
> 
> Steven Rostedt : printk co-maintainer

Thanks for Cc'ing me.

> Alexey Dobriyan : fs/proc maintainer
> Greg Kroah-Hartman : sysfs maintainer
> Jason Baron : dynamic_debug maintainer
> Kees Cook : security POV
> linux-...@vger.kernel.org: Linux API mailing list
> 
> Of course, we should also ask if this is the right approach
> for the think that you want to achieve.
> 
> The motivation for this patch is that the strings printed by kernels
> are not reliable and you want a simple way to compare differences
> bethween versions. Do I get it right?
> 
> See more comments below.
> 
> 


> Also this is yet another style how the format is displayed. We already have
> 
>   + console/syslog: formated by record_print_text()
>   + /dev/kmsg: formatted by info_print_ext_header(),  
> msg_print_ext_body().
>   + /sys/kernel/debug/dynamic_debug/control


>   + /sys/kernel/debug/tracing/printk_formats
> 
> We should get some inspiration from the existing interfaces.

Interesting, because when I was looking at the original patch (looked at
the lore link before reading your reply), I thought to myself "this looks
exactly like what I did for trace_printk formats", which the above file is
where it is shown. I'm curious if this work was inspired by that?



> > diff --git a/include/asm-generic/vmlinux.lds.h 
> > b/include/asm-generic/vmlinux.lds.h
> > index 34b7e0d2346c..0ca6e28e05d6 100644
> > --- a/include/asm-generic/vmlinux.lds.h
> > +++ b/include/asm-generic/vmlinux.lds.h
> > @@ -309,6 +309,17 @@
> >  #define ACPI_PROBE_TABLE(name)
> >  #endif
> >  
> > +#ifdef CONFIG_PRINTK_ENUMERATION
> > +#define PRINTK_FMTS
> > \
> > +   .printk_fmts : AT(ADDR(.printk_fmts) - LOAD_OFFSET) {   \
> > +   __start_printk_fmts = .;\
> > +   *(.printk_fmts) \
> > +   __stop_printk_fmts = .; \
> > +   }
> > +#else
> > +#define PRINTK_FMTS
> > +#endif  
> 
> It should be defined after #define TRACEDATA to follow the existing
> style.
> 
> But honestly I am not much familiar with the sections definitions.
> I am curious why TRACE_PRINTKS() and __dyndbg are defined
> a bit different way.
> 

I'm not sure what difference you mean.

> > +static int proc_pf_show(struct seq_file *s, void *v)
> > +{
> > +   const struct printk_fmt_sec *ps = NULL;
> > +   const char **fptr = NULL;
> > +
> > +   mutex_lock(_fmts_mutex);
> > +
> > +   list_for_each_entry(ps, _fmts_list, list) {
> > +   const char *mod_name = ps_get_module_name(ps);
> > +
> > +   for (fptr = ps->start; fptr < ps->end; fptr++) {
> > +   seq_puts(s, mod_name);
> > +   seq_putc(s, ',');
> > +   seq_puts(s, *fptr);
> > +   seq_putc(s, '\0');
> > +   }  
> 
> You probably should get inspiration from t_show() in trace_printk.c.
> It handles newlines, ...
> 
> Or by ddebug_proc_show(). It uses seq_escape().
> 
> Anyway, there is something wrong at the moment. The output looks fine
> with cat. But "less" says that it is a binary format and the output
> is a bit messy:

Hmm, that's usually the case when lseek gets messed up. Not sure how that
happened.

> 
> $> less /proc/printk_formats   
> "/proc/printk_formats" may be a binary file.  See it anyway? 
> vmlinux,^A3Warning: unable to open an initial console.
> ^@vmlinux,^A3Failed to execute %s (error %d)
> ^@vmlinux,^A6Kernel memory protection disabled.
> ^@vmlinux,^A3Starting init: %s exists but couldn't execute it (error %d)
> 
> 
> That is for now. I still have to think about it. And I am also curious
> about what others thing about this idea.
> 

I'm not against the idea. I don't think it belongs in /proc. Perhaps
debugfs is a better place to put it.

-- Steve


Re: [PATCH] printk: Userspace format enumeration support

2021-02-05 Thread Petr Mladek
Hi,

I would like to hear opinion from a bigger audience. It is an
userspace interface that we might need to maintain forewer.
Adding few more people in to CC:

Steven Rostedt : printk co-maintainer
Alexey Dobriyan : fs/proc maintainer
Greg Kroah-Hartman : sysfs maintainer
Jason Baron : dynamic_debug maintainer
Kees Cook : security POV
linux-...@vger.kernel.org: Linux API mailing list

Of course, we should also ask if this is the right approach
for the think that you want to achieve.

The motivation for this patch is that the strings printed by kernels
are not reliable and you want a simple way to compare differences
bethween versions. Do I get it right?

See more comments below.



On Thu 2021-02-04 15:37:52, Chris Down wrote:
> We have a number of systems industry-wide that have a subset of their
> functionality that works as follows:
 
> 1. Receive a message from local kmsg, serial console, or netconsole;
> 2. Apply a set of rules to classify the message;
> 3. Do something based on this classification (like scheduling a
>remediation for the machine), rinse, and repeat.
> 
> As a couple of examples of places we have this implemented just inside
> Facebook, although this isn't a Facebook-specific problem, we have this
> inside our netconsole processing (for alarm classification), and as part
> of our machine health checking. We use these messages to determine
> fairly important metrics around production health, and it's important
> that we get them right.
> 
> While for some kinds of issues we have counters, tracepoints, or metrics
> with a stable interface which can reliably indicate the issue, in order
> to react to production issues quickly we need to work with the interface
> which most kernel developers naturally use when developing: printk.
> 
> Most production issues come from unexpected phenomena, and as such
> usually the code in question doesn't have easily usable tracepoints or
> other counters available for the specific problem being mitigated. We
> have a number of lines of monitoring defence against problems in
> production (host metrics, process metrics, service metrics, etc), and
> where it's not feasible to reliably monitor at another level, this kind
> of pragmatic netconsole monitoring is essential.
> 
> As you'd expect, monitoring using printk is rather brittle for a number
> of reasons -- most notably that the message might disappear entirely in
> a new version of the kernel, or that the message may change in some way
> that the regex or other classification methods start to silently fail.

Another is that printk() is not reliable on its own. Messages might
get lost. The size of the log buffer is limited. Deamon reading
/dev/kmsg need not be scheduled in time or often enough. Console
might be slow. The messages are filtered on the console by console_loglevel.


> One factor that makes this even harder is that, under normal operation,
> many of these messages are never expected to be hit. For example, there
> may be some rare hardware bug which you want to detect if it was to ever
> happen again, but its recurrence is not likely or anticipated. This
> precludes using something like checking whether the printk in question
> was printed somewhere fleetwide recently to determine whether the
> message in question is still present or not, since we don't anticipate
> that it should be printed anywhere, but still need to monitor for its
> future presence in the long-term.
> 
> This class of issue has happened on a number of occasions, causing
> unhealthy machines with hardware issues to remain in production for
> longer than ideal. As a recent example, some monitoring around
> blk_update_request fell out of date and caused semi-broken machines to
> remain in production for longer than would be desirable.
> 
> Searching through the codebase to find the message is also extremely
> fragile, because many of the messages are further constructed beyond
> their callsite (eg. btrfs_printk and other module-specific wrappers,
> each with their own functionality). Even if they aren't, guessing the
> format and formulation of the underlying message based on the aesthetics
> of the message emitted is not a recipe for success at scale, and our
> previous issues with fleetwide machine health checking demonstrate as
> much.
> 
> This patch provides a solution to the issue of silently changed or
> deleted printks: we record pointers to all printk format strings known
> at compile time into a new .printk_fmts section, both in vmlinux and
> modules. At runtime, this can then be iterated by looking at
> /proc/printk_formats, which emits the same format as `printk` itself,
> which we already export elsewhere (for example, in netconsole):
> 
> # Format: ,\0
> $ perl -p -e 's/\n/\\n/g;s/\0/\n/g' /proc/printk_formats | shuf -n 5
> vmlinux,6Disabling APIC timer\n
> intel_rapl_common,3intel_rapl_common: Cannot find matching power limit 
> for constraint %d\n
> dm_crypt,3device-mapper: crypt: 

Re: [PATCH] printk: Userspace format enumeration support

2021-02-04 Thread Chris Down

Some improvements I noticed that I'll include when sending v2.

Chris Down writes:

+static void remove_printk_fmt_sec(const struct module *mod)
+{
+   struct printk_fmt_sec *tmp = NULL, *ps = NULL;
+   const char **fptr = NULL;


fptr can be scoped to list_for_each_entry_safe, I'll move it.


+static int proc_pf_show(struct seq_file *s, void *v)
+{
+   const struct printk_fmt_sec *ps = NULL;
+   const char **fptr = NULL;


Ditto for list_for_each_entry.


+
+   mutex_lock(_fmts_mutex);
+
+   list_for_each_entry(ps, _fmts_list, list) {
+   const char *mod_name = ps_get_module_name(ps);
+
+   for (fptr = ps->start; fptr < ps->end; fptr++) {
+   seq_puts(s, mod_name);
+   seq_putc(s, ',');
+   seq_puts(s, *fptr);
+   seq_putc(s, '\0');
+   }
+   }


Not sure why I didn't just use seq_printf here. I'll do that.

I'll wait a few days for feedback then will send v2 either way. :-)


[PATCH] printk: Userspace format enumeration support

2021-02-04 Thread Chris Down
We have a number of systems industry-wide that have a subset of their
functionality that works as follows:

1. Receive a message from local kmsg, serial console, or netconsole;
2. Apply a set of rules to classify the message;
3. Do something based on this classification (like scheduling a
   remediation for the machine), rinse, and repeat.

As a couple of examples of places we have this implemented just inside
Facebook, although this isn't a Facebook-specific problem, we have this
inside our netconsole processing (for alarm classification), and as part
of our machine health checking. We use these messages to determine
fairly important metrics around production health, and it's important
that we get them right.

While for some kinds of issues we have counters, tracepoints, or metrics
with a stable interface which can reliably indicate the issue, in order
to react to production issues quickly we need to work with the interface
which most kernel developers naturally use when developing: printk.

Most production issues come from unexpected phenomena, and as such
usually the code in question doesn't have easily usable tracepoints or
other counters available for the specific problem being mitigated. We
have a number of lines of monitoring defence against problems in
production (host metrics, process metrics, service metrics, etc), and
where it's not feasible to reliably monitor at another level, this kind
of pragmatic netconsole monitoring is essential.

As you'd expect, monitoring using printk is rather brittle for a number
of reasons -- most notably that the message might disappear entirely in
a new version of the kernel, or that the message may change in some way
that the regex or other classification methods start to silently fail.

One factor that makes this even harder is that, under normal operation,
many of these messages are never expected to be hit. For example, there
may be some rare hardware bug which you want to detect if it was to ever
happen again, but its recurrence is not likely or anticipated. This
precludes using something like checking whether the printk in question
was printed somewhere fleetwide recently to determine whether the
message in question is still present or not, since we don't anticipate
that it should be printed anywhere, but still need to monitor for its
future presence in the long-term.

This class of issue has happened on a number of occasions, causing
unhealthy machines with hardware issues to remain in production for
longer than ideal. As a recent example, some monitoring around
blk_update_request fell out of date and caused semi-broken machines to
remain in production for longer than would be desirable.

Searching through the codebase to find the message is also extremely
fragile, because many of the messages are further constructed beyond
their callsite (eg. btrfs_printk and other module-specific wrappers,
each with their own functionality). Even if they aren't, guessing the
format and formulation of the underlying message based on the aesthetics
of the message emitted is not a recipe for success at scale, and our
previous issues with fleetwide machine health checking demonstrate as
much.

This patch provides a solution to the issue of silently changed or
deleted printks: we record pointers to all printk format strings known
at compile time into a new .printk_fmts section, both in vmlinux and
modules. At runtime, this can then be iterated by looking at
/proc/printk_formats, which emits the same format as `printk` itself,
which we already export elsewhere (for example, in netconsole):

# Format: ,\0
$ perl -p -e 's/\n/\\n/g;s/\0/\n/g' /proc/printk_formats | shuf -n 5
vmlinux,6Disabling APIC timer\n
intel_rapl_common,3intel_rapl_common: Cannot find matching power limit for 
constraint %d\n
dm_crypt,3device-mapper: crypt: %s: INTEGRITY AEAD ERROR, sector %llu\n
mac80211,6%s: AP bug: HT capability missing from AssocResp\n
vmlinux,3zpool: couldn't create zpool - out of memory\n

This mitigates the majority of cases where we have a highly-specific
printk which we want to match on, as we can now enumerate and check
whether the format changed or the printk callsite disappeared entirely
in userspace. This allows us to catch changes to printks we monitor
earlier and decide what to do about it before it becomes problematic.

There is no additional runtime cost for printk callers or printk itself,
and the assembly generated is exactly the same.

Signed-off-by: Chris Down 
Cc: Petr Mladek 
Cc: Sergey Senozhatsky 
Cc: John Ogness 
Cc: Johannes Weiner 
Cc: Andrew Morton 
---
 arch/arm/kernel/entry-v7m.S  |   2 +-
 arch/arm/lib/backtrace-clang.S   |   2 +-
 arch/arm/lib/backtrace.S |   2 +-
 arch/arm/mach-rpc/io-acorn.S |   2 +-
 arch/arm/vfp/vfphw.S |   6 +-
 arch/openrisc/kernel/entry.S |   6 +-
 arch/powerpc/kernel/head_fsl_booke.S |   2 +-
 arch/x86/kernel/head_32.S|   2 +-