Re: misc details: Re: [PATCH 2/2] printk: use the lockless ringbuffer
On Tue 2020-02-25 21:11:31, John Ogness wrote: > >> --- a/kernel/printk/printk.c > >> +++ b/kernel/printk/printk.c > >> - * Every record carries the monotonic timestamp in microseconds, as well > >> as > >> - * the standard userspace syslog level and syslog facility. The usual > >> + * Every record meta-data carries the monotonic timestamp in > >> microseconds, as > > > > I am afraid that we could not guarantee monotonic timestamp because > > the writers are not synchronized. I hope that it will not create > > real problems and we could just remove the word "monotonic" ;-) > > I removed "monotonic". I hope userspace doesn't require the ringbuffer > to be chronologically sorted. That would explain why the safe buffers > use bogus timestamps. :-/ The timestamp was not stored into the safe buffers to keep the code simple. And people request to add the proper timestamps from time to time. IMHO, the precise timestamps are more important than ordering. So people should love the lockless ringbuffer from this POV ;-) > >> @@ -1974,9 +1966,9 @@ asmlinkage int vprintk_emit(int facility, int level, > >> > >>/* This stops the holder of console_sem just where we want him */ > >>logbuf_lock_irqsave(flags); > >> - curr_log_seq = log_next_seq; > >> + pending_output = !prb_read_valid(prb, console_seq, NULL); > >>printed_len = vprintk_store(facility, level, dict, dictlen, fmt, args); > >> - pending_output = (curr_log_seq != log_next_seq); > >> + pending_output &= prb_read_valid(prb, console_seq, NULL); > > > > The original code checked whether vprintk_store() stored the text > > into the main log buffer or only into the cont buffer. > > > > The new code checks whether console is behind which is something > > different. > > I would argue that they are the same thing in this context. Keep in mind > that we are under the logbuf_lock. If there was previously nothing > pending and now there is, this context is the only one that could have > added it. Right. > This logic will change significantly when we remove the locks (and it > will disappear once we go to kthreads). But we aren't that far at this > stage and I'd like to keep the general logic somewhat close to the > current mainline implementation for now. OK, it is not a big deal from my POV. It is just an optimization. It can be removed or improved later. It caught my eyes primary because prb_read_valid() was relatively complex function. I was not sure if it was worth the effort. But I am fine with keeping your code for now. It will help to reduce unrelated behavior changes. Best Regards, Petr ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: misc details: Re: [PATCH 2/2] printk: use the lockless ringbuffer
>> - Record meta-data is now stored in a separate array of descriptors. >> This is an additional 72 * (2 ^ ((CONFIG_LOG_BUF_SHIFT - 6))) bytes >> for the static array and 72 * (2 ^ ((log_buf_len - 6))) bytes for >> the dynamic array. > > It might help to show some examples. I mean to mention the sizes > when CONFIG_LOG_BUF_SHIFT is 12 or so. OK. >> --- a/kernel/printk/printk.c >> +++ b/kernel/printk/printk.c >> - * Every record carries the monotonic timestamp in microseconds, as well as >> - * the standard userspace syslog level and syslog facility. The usual >> + * Every record meta-data carries the monotonic timestamp in microseconds, >> as > > I am afraid that we could not guarantee monotonic timestamp because > the writers are not synchronized. I hope that it will not create > real problems and we could just remove the word "monotonic" ;-) I removed "monotonic". I hope userspace doesn't require the ringbuffer to be chronologically sorted. That would explain why the safe buffers use bogus timestamps. :-/ >> +/* >> + * Define the average message size. This only affects the number of >> + * descriptors that will be available. Underestimating is better than >> + * overestimating (too many available descriptors is better than not >> enough). >> + * The dictionary buffer will be the same size as the text buffer. >> + */ >> +#define PRB_AVGBITS 6 > > Do I get it correctly that '6' means 2^6 = 64 characters? Correct. > Some ugly counting on my test systems shows the average 49 chars: > > $> dmesg | cut -d ']' -f 2- | wc -c > 30172 > $> dmesg | cut -d ']' -f 2- | wc -l > 612 > $> echo $((30172 / 612)) > 49 > > If I get it correctly then lower number is the more safe side. > So, a more safe default should be 5? For v2 the value will be lowered to 5. >> -if (log_make_free_space(size)) { >> +if (!prb_reserve(, prb, )) { >> /* truncate the message if it is too long for empty buffer */ >> -size = truncate_msg(_len, _msg_len, >> -_len, _len); >> +truncate_msg(_len, _msg_len, _len); >> +r.text_buf_size = text_len + trunc_msg_len; Note that the additional space for the trunc_msg_len is being reserved. >> +r.dict_buf_size = dict_len; >> /* survive when the log buffer is too small for trunc_msg */ >> -if (log_make_free_space(size)) >> +if (!prb_reserve(, prb, )) >> return 0; >> } >> >> -if (log_next_idx + size + sizeof(struct printk_log) > log_buf_len) { >> -/* >> - * This message + an additional empty header does not fit >> - * at the end of the buffer. Add an empty header with len == 0 >> - * to signify a wrap around. >> - */ >> -memset(log_buf + log_next_idx, 0, sizeof(struct printk_log)); >> -log_next_idx = 0; >> -} >> - >> /* fill message */ >> -msg = (struct printk_log *)(log_buf + log_next_idx); >> -memcpy(log_text(msg), text, text_len); >> -msg->text_len = text_len; >> -if (trunc_msg_len) { >> -memcpy(log_text(msg) + text_len, trunc_msg, trunc_msg_len); >> -msg->text_len += trunc_msg_len; > > Note that the old code updates msg->text_len. msg->text_len is equivalent to r.info->text_len, which was already set by the prb_reserve() (and already includes the trunc_msg_len). >> -} >> -memcpy(log_dict(msg), dict, dict_len); >> -msg->dict_len = dict_len; >> -msg->facility = facility; >> -msg->level = level & 7; >> -msg->flags = flags & 0x1f; >> +memcpy(_buf[0], text, text_len); >> +if (trunc_msg_len) >> +memcpy(_buf[text_len], trunc_msg, trunc_msg_len); > > The new one just appends the string. That is all it needs to do here. >> +if (r.dict_buf) >> +memcpy(_buf[0], dict, dict_len); >> +r.info->facility = facility; >> +r.info->level = level & 7; >> +r.info->flags = flags & 0x1f; >> if (ts_nsec > 0) >> -msg->ts_nsec = ts_nsec; >> +r.info->ts_nsec = ts_nsec; >> else >> -msg->ts_nsec = local_clock(); >> -#ifdef CONFIG_PRINTK_CALLER >> -msg->caller_id = caller_id; >> -#endif >> -memset(log_dict(msg) + dict_len, 0, pad_len); >> -msg->len = size; >> +r.info->ts_nsec = local_clock(); >> +r.info->caller_id = caller_id; >> >> /* insert message */ >> -log_next_idx += msg->len; >> -log_next_seq++; >> +prb_commit(); >> >> -return msg->text_len; >> +return text_len; > > So, this should be text_len + trunc_msg_len. Good catch! Yes. Fixed for v2. Thank you. (Note that simply returning r.info->text_len is not allowed because the writer must not access that data after calling prb_commit()). >> @@ -1974,9 +1966,9 @@ asmlinkage int vprintk_emit(int facility, int level, >> >> /* This stops the holder of console_sem just where we want him */
Re: [PATCH 2/2] printk: use the lockless ringbuffer
在 2020年01月29日 00:19, John Ogness 写道: > Replace the existing ringbuffer usage and implementation with > lockless ringbuffer usage. Even though the new ringbuffer does not > require locking, all existing locking is left in place. Therefore, > this change is purely replacing the underlining ringbuffer. > > Changes that exist due to the ringbuffer replacement: > > - The VMCOREINFO has been updated for the new structures. > > - Dictionary data is now stored in a separate data buffer from the > human-readable messages. The dictionary data buffer is set to the > same size as the message buffer. Therefore, the total reserved > memory for messages is 2 * (2 ^ CONFIG_LOG_BUF_SHIFT) for the > initial static buffer and 2x the specified size in the log_buf_len > kernel parameter. > > - Record meta-data is now stored in a separate array of descriptors. > This is an additional 72 * (2 ^ ((CONFIG_LOG_BUF_SHIFT - 6))) bytes > for the static array and 72 * (2 ^ ((log_buf_len - 6))) bytes for > the dynamic array. > > Signed-off-by: John Ogness > --- > include/linux/kmsg_dump.h | 2 - > kernel/printk/Makefile| 1 + > kernel/printk/printk.c| 836 +++--- > 3 files changed, 416 insertions(+), 423 deletions(-) > > diff --git a/include/linux/kmsg_dump.h b/include/linux/kmsg_dump.h > index 2e7a1e032c71..ae6265033e31 100644 > --- a/include/linux/kmsg_dump.h > +++ b/include/linux/kmsg_dump.h > @@ -46,8 +46,6 @@ struct kmsg_dumper { > bool registered; > > /* private state of the kmsg iterator */ > - u32 cur_idx; > - u32 next_idx; > u64 cur_seq; > u64 next_seq; > }; > diff --git a/kernel/printk/Makefile b/kernel/printk/Makefile > index 4d052fc6bcde..eee3dc9b60a9 100644 > --- a/kernel/printk/Makefile > +++ b/kernel/printk/Makefile > @@ -2,3 +2,4 @@ > obj-y= printk.o > obj-$(CONFIG_PRINTK) += printk_safe.o > obj-$(CONFIG_A11Y_BRAILLE_CONSOLE) += braille.o > +obj-$(CONFIG_PRINTK) += printk_ringbuffer.o > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c > index 1ef6f75d92f1..d0d24ee1d1f4 100644 > --- a/kernel/printk/printk.c > +++ b/kernel/printk/printk.c > @@ -56,6 +56,7 @@ > #define CREATE_TRACE_POINTS > #include > > +#include "printk_ringbuffer.h" > #include "console_cmdline.h" > #include "braille.h" > #include "internal.h" > @@ -294,30 +295,22 @@ enum con_msg_format_flags { > static int console_msg_format = MSG_FORMAT_DEFAULT; > > /* > - * The printk log buffer consists of a chain of concatenated variable > - * length records. Every record starts with a record header, containing > - * the overall length of the record. > + * The printk log buffer consists of a sequenced collection of records, each > + * containing variable length message and dictionary text. Every record > + * also contains its own meta-data (@info). > * > - * The heads to the first and last entry in the buffer, as well as the > - * sequence numbers of these entries are maintained when messages are > - * stored. > - * > - * If the heads indicate available messages, the length in the header > - * tells the start next message. A length == 0 for the next message > - * indicates a wrap-around to the beginning of the buffer. > - * > - * Every record carries the monotonic timestamp in microseconds, as well as > - * the standard userspace syslog level and syslog facility. The usual > + * Every record meta-data carries the monotonic timestamp in microseconds, as > + * well as the standard userspace syslog level and syslog facility. The usual > * kernel messages use LOG_KERN; userspace-injected messages always carry > * a matching syslog facility, by default LOG_USER. The origin of every > * message can be reliably determined that way. > * > - * The human readable log message directly follows the message header. The > - * length of the message text is stored in the header, the stored message > - * is not terminated. > + * The human readable log message of a record is available in @text, the > length > + * of the message text in @text_len. The stored message is not terminated. > * > - * Optionally, a message can carry a dictionary of properties (key/value > pairs), > - * to provide userspace with a machine-readable message context. > + * Optionally, a record can carry a dictionary of properties (key/value > pairs), > + * to provide userspace with a machine-readable message context. The length > of > + * the dictionary is available in @dict_len. The dictionary is not > terminated. > * > * Examples for well-defined, commonly used property names are: > * DEVICE=b12:8 device identifier > @@ -331,21 +324,19 @@ static int console_msg_format = MSG_FORMAT_DEFAULT; > * follows directly after a '=' character. Every property is terminated by > * a '\0' character. The last property is not terminated. > * > - * Example of a message structure: > - * ff 8f 00 00 00 00 00 00 monotonic time in nsec >
Re: crashdump: Re: [PATCH 2/2] printk: use the lockless ringbuffer
On 2020-02-17, Petr Mladek wrote: >>> Should the "prb"(printk tb static) symbol be exported into >>> vmcoreinfo? Otherwise, do you happen to know how to walk through >>> the log_buf and get all kernel logs from vmcore? >> >> You are correct. This will need to be exported as well so that the >> descriptors can be accessed. (log_buf is only the pure human-readable >> text.) I am currently hacking the crash tool to see exactly what >> needs to be made available in order to access all the data of the >> ringbuffer. > > I am not sure which parts you are working on. Are you going to provide > also patch for makedumpfile, please? I'm working on crash first. makedumpfile is on my list as well. > I get the following failure when creating the crashdump using: > > echo c >/proc/sysrq-trigger > > > The kernel version is not supported. > The makedumpfile operation may be incomplete. > dump_dmesg: Can't find variable-length record symbols > makedumpfile Failed. > Running makedumpfile --dump-dmesg /proc/vmcore failed (1). Yes, the symbols have changed (and some are missing). I will get this sorted out for v2. And I will provide some heavily hacked code for crash and makedumpfile to show that the necessary symbols are there and it works. John Ogness ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
crashdump: Re: [PATCH 2/2] printk: use the lockless ringbuffer
On Fri 2020-02-14 14:50:02, John Ogness wrote: > Hi Lianbo, > > On 2020-02-14, lijiang wrote: > >> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c > >> index 1ef6f75d92f1..d0d24ee1d1f4 100644 > >> --- a/kernel/printk/printk.c > >> +++ b/kernel/printk/printk.c > >> @@ -1062,21 +928,16 @@ void log_buf_vmcoreinfo_setup(void) > >> { > >>VMCOREINFO_SYMBOL(log_buf); > >>VMCOREINFO_SYMBOL(log_buf_len); > > > > I notice that the "prb"(printk tb static) symbol is not exported into > > vmcoreinfo as follows: > > > > + VMCOREINFO_SYMBOL(prb); > > > > Should the "prb"(printk tb static) symbol be exported into vmcoreinfo? > > Otherwise, do you happen to know how to walk through the log_buf and > > get all kernel logs from vmcore? > > You are correct. This will need to be exported as well so that the > descriptors can be accessed. (log_buf is only the pure human-readable > text.) I am currently hacking the crash tool to see exactly what needs > to be made available in order to access all the data of the ringbuffer. I am not sure which parts you are working on. Are you going to provide also patch for makedumpfile, please? I get the following failure when creating the crashdump using: echo c >/proc/sysrq-trigger The kernel version is not supported. The makedumpfile operation may be incomplete. dump_dmesg: Can't find variable-length record symbols makedumpfile Failed. Running makedumpfile --dump-dmesg /proc/vmcore failed (1). Best Regards, Petr ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
misc details: Re: [PATCH 2/2] printk: use the lockless ringbuffer
On Tue 2020-01-28 17:25:48, John Ogness wrote: > Replace the existing ringbuffer usage and implementation with > lockless ringbuffer usage. Even though the new ringbuffer does not > require locking, all existing locking is left in place. Therefore, > this change is purely replacing the underlining ringbuffer. > > - Record meta-data is now stored in a separate array of descriptors. > This is an additional 72 * (2 ^ ((CONFIG_LOG_BUF_SHIFT - 6))) bytes > for the static array and 72 * (2 ^ ((log_buf_len - 6))) bytes for > the dynamic array. It might help to show some examples. I mean to mention the sizes when CONFIG_LOG_BUF_SHIFT is 12 or so. > --- a/kernel/printk/printk.c > +++ b/kernel/printk/printk.c > @@ -294,30 +295,22 @@ enum con_msg_format_flags { > static int console_msg_format = MSG_FORMAT_DEFAULT; > > /* > - * The printk log buffer consists of a chain of concatenated variable > - * length records. Every record starts with a record header, containing > - * the overall length of the record. > + * The printk log buffer consists of a sequenced collection of records, each > + * containing variable length message and dictionary text. Every record > + * also contains its own meta-data (@info). > * > - * The heads to the first and last entry in the buffer, as well as the > - * sequence numbers of these entries are maintained when messages are > - * stored. > - * > - * If the heads indicate available messages, the length in the header > - * tells the start next message. A length == 0 for the next message > - * indicates a wrap-around to the beginning of the buffer. > - * > - * Every record carries the monotonic timestamp in microseconds, as well as > - * the standard userspace syslog level and syslog facility. The usual > + * Every record meta-data carries the monotonic timestamp in microseconds, as I am afraid that we could not guarantee monotonic timestamp because the writers are not synchronized. I hope that it will not create real problems and we could just remove the word "monotonic" ;-) > /* record buffer */ > -#define LOG_ALIGN __alignof__(struct printk_log) > +#define LOG_ALIGN __alignof__(unsigned long) > #define __LOG_BUF_LEN (1 << CONFIG_LOG_BUF_SHIFT) > #define LOG_BUF_LEN_MAX (u32)(1 << 31) > static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN); > static char *log_buf = __log_buf; > static u32 log_buf_len = __LOG_BUF_LEN; > > +/* > + * Define the average message size. This only affects the number of > + * descriptors that will be available. Underestimating is better than > + * overestimating (too many available descriptors is better than not enough). > + * The dictionary buffer will be the same size as the text buffer. > + */ > +#define PRB_AVGBITS 6 Do I get it correctly that '6' means 2^6 = 64 characters? Some ugly counting on my test systems shows the average 49 chars: $> dmesg | cut -d ']' -f 2- | wc -c 30172 $> dmesg | cut -d ']' -f 2- | wc -l 612 $> echo $((30172 / 612)) 49 If I get it correctly then lower number is the more safe side. So, a more safe default should be 5? > + > +_DECLARE_PRINTKRB(printk_rb_static, CONFIG_LOG_BUF_SHIFT - PRB_AVGBITS, > + PRB_AVGBITS, PRB_AVGBITS, &__log_buf[0]); > + > @@ -606,60 +488,42 @@ static int log_store(u32 caller_id, int facility, int > level, >const char *dict, u16 dict_len, >const char *text, u16 text_len) > { > - struct printk_log *msg; > - u32 size, pad_len; > + struct prb_reserved_entry e; > + struct printk_record r; > u16 trunc_msg_len = 0; > > - /* number of '\0' padding bytes to next message */ > - size = msg_used_size(text_len, dict_len, _len); > + r.text_buf_size = text_len; > + r.dict_buf_size = dict_len; > > - if (log_make_free_space(size)) { > + if (!prb_reserve(, prb, )) { > /* truncate the message if it is too long for empty buffer */ > - size = truncate_msg(_len, _msg_len, > - _len, _len); > + truncate_msg(_len, _msg_len, _len); > + r.text_buf_size = text_len + trunc_msg_len; > + r.dict_buf_size = dict_len; > /* survive when the log buffer is too small for trunc_msg */ > - if (log_make_free_space(size)) > + if (!prb_reserve(, prb, )) > return 0; > } > > - if (log_next_idx + size + sizeof(struct printk_log) > log_buf_len) { > - /* > - * This message + an additional empty header does not fit > - * at the end of the buffer. Add an empty header with len == 0 > - * to signify a wrap around. > - */ > - memset(log_buf + log_next_idx, 0, sizeof(struct printk_log)); > - log_next_idx = 0; > - } > - > /* fill message */ > - msg = (struct printk_log *)(log_buf + log_next_idx); > - memcpy(log_text(msg), text, text_len); > -
Re: [PATCH 2/2] printk: use the lockless ringbuffer
在 2020年02月14日 21:50, John Ogness 写道: > Hi Lianbo, > > On 2020-02-14, lijiang wrote: >>> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c >>> index 1ef6f75d92f1..d0d24ee1d1f4 100644 >>> --- a/kernel/printk/printk.c >>> +++ b/kernel/printk/printk.c >>> @@ -1062,21 +928,16 @@ void log_buf_vmcoreinfo_setup(void) >>> { >>> VMCOREINFO_SYMBOL(log_buf); >>> VMCOREINFO_SYMBOL(log_buf_len); >> >> I notice that the "prb"(printk tb static) symbol is not exported into >> vmcoreinfo as follows: >> >> +VMCOREINFO_SYMBOL(prb); >> >> Should the "prb"(printk tb static) symbol be exported into vmcoreinfo? >> Otherwise, do you happen to know how to walk through the log_buf and >> get all kernel logs from vmcore? > > You are correct. This will need to be exported as well so that the > descriptors can be accessed. (log_buf is only the pure human-readable Really agree, and I guess that there may be more structures and their offsets to be exported, for example: struct prb_desc_ring, struct prb_data_ring, and struct prb_desc, etc. This makes sure that tools(such as makedumpfile and crash) can appropriately access them. > text.) I am currently hacking the crash tool to see exactly what needs > to be made available in order to access all the data of the ringbuffer. > It makes sense and avoids exporting unnecessary symbols and offsets. Thanks. Lianbo > John Ogness > ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH 2/2] printk: use the lockless ringbuffer
Hi Lianbo, On 2020-02-14, lijiang wrote: >> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c >> index 1ef6f75d92f1..d0d24ee1d1f4 100644 >> --- a/kernel/printk/printk.c >> +++ b/kernel/printk/printk.c >> @@ -1062,21 +928,16 @@ void log_buf_vmcoreinfo_setup(void) >> { >> VMCOREINFO_SYMBOL(log_buf); >> VMCOREINFO_SYMBOL(log_buf_len); > > I notice that the "prb"(printk tb static) symbol is not exported into > vmcoreinfo as follows: > > + VMCOREINFO_SYMBOL(prb); > > Should the "prb"(printk tb static) symbol be exported into vmcoreinfo? > Otherwise, do you happen to know how to walk through the log_buf and > get all kernel logs from vmcore? You are correct. This will need to be exported as well so that the descriptors can be accessed. (log_buf is only the pure human-readable text.) I am currently hacking the crash tool to see exactly what needs to be made available in order to access all the data of the ringbuffer. John Ogness ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH 2/2] printk: use the lockless ringbuffer
On 2020-02-14, Sergey Senozhatsky wrote: >>> cat /dev/kmsg >>> cat: /dev/kmsg: Broken pipe >> >> In mainline you can have this "problem" as well. Once the ringbuffer >> has wrapped, any read to a newly opened /dev/kmsg when a new message >> arrived will result in an EPIPE. This happens quite easily once the >> ringbuffer has wrapped because each new message is overwriting the >> oldest message. > > Hmm. Something doesn't add up. > > Looking at the numbers, both r->info->seq and prb_first_seq(prb) > do increase, so there are new messages in the ring buffer > >u->seqr->seqprb_first_seq > [..] > cat: devkmsg_read() error 1981080 1982633 1981080 > cat: devkmsg_read() error 1981080 1982633 1981080 > cat: devkmsg_read() error 1981095 1982652 1981095 > cat: devkmsg_read() error 1981095 1982652 1981095 > cat: devkmsg_read() error 1981095 1982652 1981095 > [..] > > but 'cat' still wouldn't read anything from the logbuf - EPIPE. > > NOTE: I don't run 'cat /dev/kmsg' during the test. I run the test > first, then I run 'cat /dev/kmsg', after the test, when > printk-pressure is gone. Sure. The problem is not the printk-pressure. The problem is you have data-less records in your ringbuffer (from your previous printk-pressure). If you used your own program that continued to read after EPIPE, then you would see the sequence numbers jumping over the data-less records. > I can't reproduce it with current logbuf. 'cat' reads from /dev/kmsg > after heavy printk-pressure test. So chances are some loggers can also > experience problems. This might be a regression. Mainline doesn't have data-less records. In mainline such failed printk's are silently ignored (after attepting truncation). So for mainline you can only reproduce the overflow case. 1. Boot 5.6.0-rc1 (without any console= slowing down printk) 2. Fill the ringbuffer and let it overflow with: $ while true; do echo filling buffer > /dev/kmsg; done & 3. Once you can see the ringbuffer has overflowed (and continues to overflow), try to read from /dev/kmsg $ strace head /dev/kmsg In most cases you will see: read(3, 0x7f7307ac1000, 4096) = -1 EPIPE (Broken pipe) Current readers need to be able to handle EPIPE. cat(1) does not and so (unfortunately) is not a good candidate for reading the ringbuffer. >>> ... >>> systemd-journal: devkmsg_read() error 1979281 1982465 1980933 >>> systemd-journal: corrected seq 1982465 1982465 >>> cat: devkmsg_read() error 1980987 1982531 1980987 >>> cat: corrected seq 1982531 1982531 >>> cat: devkmsg_read() error 1981015 1982563 1981015 >>> cat: corrected seq 1982563 1982563 >> >> The situation with a data-less record is the same as when the ringbuffer >> wraps: cat is hitting that EPIPE. But re-opening the file descriptor is >> not going to help because it will not be able to get past that data-less >> record. > > So maybe this is the case with broken 'cat' on my system? I think it is appropriate for an application to close the descriptor after an EPIPE. /dev/kmsg is special because the reader should continue reading anyway. >> We could implement it such that devkmsg_read() will skip over data-less >> records instead of issuing an EPIPE. (That is what dmesg does.) But then >> do we need EPIPE at all? The reader can see that is has missed records >> by tracking the sequence number, so could we just get rid of EPIPE? Then >> cat(1) would be a great tool to view the raw ringbuffer. Please share >> your thoughts on this. > > Looking at systemd/src/journal/journald-kmsg.c : > server_read_dev_kmsg() -EPIPE is just one of the erronos they handle, > nothing special. Yes, but what does systemd-journald do when the EPIPE is _not_ returned and instead there is a jump in the sequence number? Looking at dev_kmsg_record(), systemd actually does it the way I would hope. It tracks the sequence number correctly. /* Did we lose any? */ if (serial > *s->kernel_seqnum) server_driver_message(s, 0, "MESSAGE_ID=" SD_MESSAGE_JOURNAL_MISSED_STR, LOG_MESSAGE("Missed %"PRIu64" kernel messages", serial - *s->kernel_seqnum), NULL); > Could it be the case that some other loggers would have special > handling for EPIPE? I'm not sure, let's look around. > > I'd say that EPIPE removal looks OK to me. But before we do that, I'm > not sure that we have clear understanding of 'cat /dev/kmsg' behaviour > change. In mainline, with regard to /dev/kmsg, sequence numbers will never jump. If there would be a jump (due to lost messages) then EPIPE is issued. The reader can either: 1. continue reading and see the jump 2. reopen the file descriptor, possibly having missed a ton more messages due to reopening, and then start from the oldest available message With my series, #2 is no longer an option because the lost
Re: [PATCH 2/2] printk: use the lockless ringbuffer
On (20/02/14 10:41), Sergey Senozhatsky wrote: > On (20/02/13 23:36), John Ogness wrote: [..] > > We could implement it such that devkmsg_read() will skip over data-less > > records instead of issuing an EPIPE. (That is what dmesg does.) But then > > do we need EPIPE at all? The reader can see that is has missed records > > by tracking the sequence number, so could we just get rid of EPIPE? Then > > cat(1) would be a great tool to view the raw ringbuffer. Please share > > your thoughts on this. > > Looking at systemd/src/journal/journald-kmsg.c : server_read_dev_kmsg() > -EPIPE is just one of the erronos they handle, nothing special. Could it > be the case that some other loggers would have special handling for EPIPE? > I'm not sure, let's look around. rsyslog static void readkmsg(void) { int i; uchar pRcv[8192+1]; char errmsg[2048]; for (;;) { dbgprintf("imkmsg waiting for kernel log line\n"); /* every read() from the opened device node receives one record of the printk buffer */ i = read(fklog, pRcv, 8192); if (i > 0) { /* successful read of message of nonzero length */ pRcv[i] = '\0'; } else if (i == -EPIPE) { imkmsgLogIntMsg(LOG_WARNING, "imkmsg: some messages in circular buffer got overwritten"); continue; } else { /* something went wrong - error or zero length message */ if (i < 0 && errno != EINTR && errno != EAGAIN) { /* error occured */ imkmsgLogIntMsg(LOG_ERR, "imkmsg: error reading kernel log - shutting down: %s", rs_strerror_r(errno, errmsg, sizeof(errmsg))); fklog = -1; } break; } submitSyslog(pRcv); } } So EPIPE errno better stay around. -ss ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH 2/2] printk: use the lockless ringbuffer
On (20/02/13 23:36), John Ogness wrote: > >> Here prb_read_valid() was successful, so a record _was_ read. The > >> kerneldoc for the prb_read_valid() says: > > > > Hmm, yeah. That's true. > > > > OK, something weird... > > > > I ran some random printk-pressure test (mostly printks from IRQs; > > + some NMI printk-s, but they are routed through nmi printk-safe > > buffers; + some limited number of printk-safe printk-s, routed > > via printk-safe buffer (so, once again, IRQ); + user-space > > journalctl -f syslog reader), and after the test 'cat /dev/kmsg' > > is terminally broken > > > > [..] > > cat /dev/kmsg > > cat: /dev/kmsg: Broken pipe > > In mainline you can have this "problem" as well. Once the ringbuffer has > wrapped, any read to a newly opened /dev/kmsg when a new message arrived > will result in an EPIPE. This happens quite easily once the ringbuffer > has wrapped because each new message is overwriting the oldest message. Hmm. Something doesn't add up. Looking at the numbers, both r->info->seq and prb_first_seq(prb) do increase, so there are new messages in the ring buffer u->seqr->seqprb_first_seq [..] cat: devkmsg_read() error 1981080 1982633 1981080 cat: devkmsg_read() error 1981080 1982633 1981080 cat: devkmsg_read() error 1981095 1982652 1981095 cat: devkmsg_read() error 1981095 1982652 1981095 cat: devkmsg_read() error 1981095 1982652 1981095 [..] but 'cat' still wouldn't read anything from the logbuf - EPIPE. NOTE: I don't run 'cat /dev/kmsg' during the test. I run the test first, then I run 'cat /dev/kmsg', after the test, when printk-pressure is gone. I can't reproduce it with current logbuf. 'cat' reads from /dev/kmsg after heavy printk-pressure test. So chances are some loggers can also experience problems. This might be a regression. > > ... > > systemd-journal: devkmsg_read() error 1979281 1982465 1980933 > > systemd-journal: corrected seq 1982465 1982465 > > cat: devkmsg_read() error 1980987 1982531 1980987 > > cat: corrected seq 1982531 1982531 > > cat: devkmsg_read() error 1981015 1982563 1981015 > > cat: corrected seq 1982563 1982563 > > The situation with a data-less record is the same as when the ringbuffer > wraps: cat is hitting that EPIPE. But re-opening the file descriptor is > not going to help because it will not be able to get past that data-less > record. So maybe this is the case with broken 'cat' on my system? > We could implement it such that devkmsg_read() will skip over data-less > records instead of issuing an EPIPE. (That is what dmesg does.) But then > do we need EPIPE at all? The reader can see that is has missed records > by tracking the sequence number, so could we just get rid of EPIPE? Then > cat(1) would be a great tool to view the raw ringbuffer. Please share > your thoughts on this. Looking at systemd/src/journal/journald-kmsg.c : server_read_dev_kmsg() -EPIPE is just one of the erronos they handle, nothing special. Could it be the case that some other loggers would have special handling for EPIPE? I'm not sure, let's look around. I'd say that EPIPE removal looks OK to me. But before we do that, I'm not sure that we have clear understanding of 'cat /dev/kmsg' behaviour change. > On a side note (but related to data-less records): I hacked the > ringbuffer code to inject data-less records at various times in order to > verify your report. And I stumbled upon a bug in the ringbuffer, which > can lead to an infinite loop in console_unlock(). The problem occurs at: > > retry = prb_read_valid(prb, console_seq, NULL); > > which will erroneously return true if console_seq is pointing to a > data-less record but there are no valid records after it. The following > patch fixes the bug. And yes, for v2 I have added comments to the > desc_read_committed() code. That's great to know! -ss ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH 2/2] printk: use the lockless ringbuffer
On 2020-02-13, Sergey Senozhatsky wrote: @@ -890,30 +758,26 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf, logbuf_unlock_irq(); ret = wait_event_interruptible(log_wait, - user->seq != log_next_seq); + prb_read_valid(prb, user->seq, r)); if (ret) goto out; logbuf_lock_irq(); } - if (user->seq < log_first_seq) { - /* our last seen message is gone, return error and reset */ - user->idx = log_first_idx; - user->seq = log_first_seq; + if (user->seq < r->info->seq) { + /* the expected message is gone, return error and reset */ + user->seq = r->info->seq; ret = -EPIPE; logbuf_unlock_irq(); goto out; } >>> >>> Sorry, why doesn't this do something like >>> >>> if (user->seq < prb_first_seq(prb)) { >>> /* the expected message is gone, return error and reset */ >>> user->seq = prb_first_seq(prb); >>> ret = -EPIPE; >>> ... >>> } >> >> Here prb_read_valid() was successful, so a record _was_ read. The >> kerneldoc for the prb_read_valid() says: > > Hmm, yeah. That's true. > > OK, something weird... > > I ran some random printk-pressure test (mostly printks from IRQs; > + some NMI printk-s, but they are routed through nmi printk-safe > buffers; + some limited number of printk-safe printk-s, routed > via printk-safe buffer (so, once again, IRQ); + user-space > journalctl -f syslog reader), and after the test 'cat /dev/kmsg' > is terminally broken > > [..] > cat /dev/kmsg > cat: /dev/kmsg: Broken pipe In mainline you can have this "problem" as well. Once the ringbuffer has wrapped, any read to a newly opened /dev/kmsg when a new message arrived will result in an EPIPE. This happens quite easily once the ringbuffer has wrapped because each new message is overwriting the oldest message. Although it can be convenient, cat(1) is actually a poor tool for viewing the ringbuffer for this reason. Unfortunately dmesg(1) is sub-optimal as well because it does not show the sequence numbers. So with dmesg(1) you cannot see if a message was dropped. :-/ > So I printed seq numbers from devksmg read to a seq buffer and dumped > it via procfs, just seq numbers before we adjust user->seq (set to > r->seq) and after > > + offt += snprintf(BUF + offt, > + sizeof(BUF) - offt, > + "%s: devkmsg_read() error %llu %llu > %llu\n", > + current->comm, > + user->seq, > + r->info->seq, > + prb_first_seq(prb)); > > > ... > systemd-journal: devkmsg_read() error 1979281 1982465 1980933 > systemd-journal: corrected seq 1982465 1982465 > cat: devkmsg_read() error 1980987 1982531 1980987 > cat: corrected seq 1982531 1982531 > cat: devkmsg_read() error 1981015 1982563 1981015 > cat: corrected seq 1982563 1982563 The situation with a data-less record is the same as when the ringbuffer wraps: cat is hitting that EPIPE. But re-opening the file descriptor is not going to help because it will not be able to get past that data-less record. We could implement it such that devkmsg_read() will skip over data-less records instead of issuing an EPIPE. (That is what dmesg does.) But then do we need EPIPE at all? The reader can see that is has missed records by tracking the sequence number, so could we just get rid of EPIPE? Then cat(1) would be a great tool to view the raw ringbuffer. Please share your thoughts on this. On a side note (but related to data-less records): I hacked the ringbuffer code to inject data-less records at various times in order to verify your report. And I stumbled upon a bug in the ringbuffer, which can lead to an infinite loop in console_unlock(). The problem occurs at: retry = prb_read_valid(prb, console_seq, NULL); which will erroneously return true if console_seq is pointing to a data-less record but there are no valid records after it. The following patch fixes the bug. And yes, for v2 I have added comments to the desc_read_committed() code. I now have 2 bugfixes queued up for v2. The first one is here[0]. [0] https://lkml.kernel.org/r/87wo919grz@linutronix.de John Ogness diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c index 796257f226ee..31893051ad6b 100644 --- a/kernel/printk/printk_ringbuffer.c +++ b/kernel/printk/printk_ringbuffer.c @@ -1074,6 +1071,7 @@ static int desc_read_committed(struct prb_desc_ring *desc_ring, unsigned long id, u64 seq, struct
Re: [PATCH 2/2] printk: use the lockless ringbuffer
On (20/02/13 10:42), John Ogness wrote: > On 2020-02-13, Sergey Senozhatsky wrote: > >> - while (user->seq == log_next_seq) { > >> + if (!prb_read_valid(prb, user->seq, r)) { > >>if (file->f_flags & O_NONBLOCK) { > >>ret = -EAGAIN; > >>logbuf_unlock_irq(); > >> @@ -890,30 +758,26 @@ static ssize_t devkmsg_read(struct file *file, char > >> __user *buf, > >> > >>logbuf_unlock_irq(); > >>ret = wait_event_interruptible(log_wait, > >> - user->seq != log_next_seq); > >> + prb_read_valid(prb, user->seq, r)); > >>if (ret) > >>goto out; > >>logbuf_lock_irq(); > >>} > >> > >> - if (user->seq < log_first_seq) { > >> - /* our last seen message is gone, return error and reset */ > >> - user->idx = log_first_idx; > >> - user->seq = log_first_seq; > >> + if (user->seq < r->info->seq) { > >> + /* the expected message is gone, return error and reset */ > >> + user->seq = r->info->seq; > >>ret = -EPIPE; > >>logbuf_unlock_irq(); > >>goto out; > >>} > > > > Sorry, why doesn't this do something like > > > > if (user->seq < prb_first_seq(prb)) { > > /* the expected message is gone, return error and reset */ > > user->seq = prb_first_seq(prb); > > ret = -EPIPE; > > ... > > } > > Here prb_read_valid() was successful, so a record _was_ read. The > kerneldoc for the prb_read_valid() says: Hmm, yeah. That's true. OK, something weird... I ran some random printk-pressure test (mostly printks from IRQs; + some NMI printk-s, but they are routed through nmi printk-safe buffers; + some limited number of printk-safe printk-s, routed via printk-safe buffer (so, once again, IRQ); + user-space journalctl -f syslog reader), and after the test 'cat /dev/kmsg' is terminally broken [..] cat /dev/kmsg cat: /dev/kmsg: Broken pipe cat /dev/kmsg cat: /dev/kmsg: Broken pipe cat /dev/kmsg cat: /dev/kmsg: Broken pipe cat /dev/kmsg cat: /dev/kmsg: Broken pipe cat /dev/kmsg cat: /dev/kmsg: Broken pipe cat /dev/kmsg cat: /dev/kmsg: Broken pipe cat /dev/kmsg cat: /dev/kmsg: Broken pipe cat /dev/kmsg cat: /dev/kmsg: Broken pipe cat /dev/kmsg cat: /dev/kmsg: Broken pipe cat /dev/kmsg cat: /dev/kmsg: Broken pipe cat /dev/kmsg cat: /dev/kmsg: Broken pipe cat /dev/kmsg cat: /dev/kmsg: Broken pipe cat /dev/kmsg cat: /dev/kmsg: Broken pipe cat /dev/kmsg cat: /dev/kmsg: Broken pipe cat /dev/kmsg cat: /dev/kmsg: Broken pipe cat /dev/kmsg cat: /dev/kmsg: Broken pipe cat /dev/kmsg cat: /dev/kmsg: Broken pipe cat /dev/kmsg cat: /dev/kmsg: Broken pipe cat /dev/kmsg cat: /dev/kmsg: Broken pipe cat /dev/kmsg cat: /dev/kmsg: Broken pipe cat /dev/kmsg cat: /dev/kmsg: Broken pipe cat /dev/kmsg cat: /dev/kmsg: Broken pipe cat /dev/kmsg cat: /dev/kmsg: Broken pipe cat /dev/kmsg cat: /dev/kmsg: Broken pipe cat /dev/kmsg cat: /dev/kmsg: Broken pipe cat /dev/kmsg cat: /dev/kmsg: Broken pipe cat /dev/kmsg cat: /dev/kmsg: Broken pipe cat /dev/kmsg cat: /dev/kmsg: Broken pipe cat /dev/kmsg cat: /dev/kmsg: Broken pipe cat /dev/kmsg cat: /dev/kmsg: Broken pipe cat /dev/kmsg cat: /dev/kmsg: Broken pipe cat /dev/kmsg cat: /dev/kmsg: Broken pipe cat /dev/kmsg cat: /dev/kmsg: Broken pipe cat /dev/kmsg cat: /dev/kmsg: Broken pipe cat /dev/kmsg cat: /dev/kmsg: Broken pipe cat /dev/kmsg cat: /dev/kmsg: Broken pipe [..] dmesg works. Reading from /dev/kmsg - doesn't; it did work, however, before the test. So I printed seq numbers from devksmg read to a seq buffer and dumped it via procfs, just seq numbers before we adjust user->seq (set to r->seq) and after + offt += snprintf(BUF + offt, + sizeof(BUF) - offt, + "%s: devkmsg_read() error %llu %llu %llu\n", + current->comm, + user->seq, + r->info->seq, + prb_first_seq(prb)); ... systemd-journal: devkmsg_read() error 1979235 1979236 1979236 systemd-journal: corrected seq 1979236 1979236 systemd-journal: devkmsg_read() error 1979237 1979243 1979243 systemd-journal: corrected seq 1979243 1979243 systemd-journal: devkmsg_read() error 1979244 1979250 1979250 systemd-journal: corrected seq 1979250 1979250 systemd-journal: devkmsg_read() error 1979251 1979257 1979257 systemd-journal: corrected seq 1979257 1979257 systemd-journal: devkmsg_read() error 1979258 1979265 1979265 systemd-journal: corrected seq 1979265 1979265 systemd-journal: devkmsg_read() error 1979266 1979272 1979272 systemd-journal: corrected seq 1979272 1979272 systemd-journal: devkmsg_read() error 1979272 1979273 1979273 systemd-journal: corrected seq 1979273 1979273 systemd-journal:
Re: [PATCH 2/2] printk: use the lockless ringbuffer
On 2020-02-13, Sergey Senozhatsky wrote: >> -while (user->seq == log_next_seq) { >> +if (!prb_read_valid(prb, user->seq, r)) { >> if (file->f_flags & O_NONBLOCK) { >> ret = -EAGAIN; >> logbuf_unlock_irq(); >> @@ -890,30 +758,26 @@ static ssize_t devkmsg_read(struct file *file, char >> __user *buf, >> >> logbuf_unlock_irq(); >> ret = wait_event_interruptible(log_wait, >> - user->seq != log_next_seq); >> +prb_read_valid(prb, user->seq, r)); >> if (ret) >> goto out; >> logbuf_lock_irq(); >> } >> >> -if (user->seq < log_first_seq) { >> -/* our last seen message is gone, return error and reset */ >> -user->idx = log_first_idx; >> -user->seq = log_first_seq; >> +if (user->seq < r->info->seq) { >> +/* the expected message is gone, return error and reset */ >> +user->seq = r->info->seq; >> ret = -EPIPE; >> logbuf_unlock_irq(); >> goto out; >> } > > Sorry, why doesn't this do something like > > if (user->seq < prb_first_seq(prb)) { > /* the expected message is gone, return error and reset */ > user->seq = prb_first_seq(prb); > ret = -EPIPE; > ... > } Here prb_read_valid() was successful, so a record _was_ read. The kerneldoc for the prb_read_valid() says: * On success, the reader must check r->info.seq to see which record was * actually read. The value will either be the requested user->seq or some higher value because user->seq is not available. There are 2 reasons why user->seq is not available (and a later record _is_ available): 1. The ringbuffer overtook user->seq. In this case, comparing and then setting using prb_first_seq() could be appropriate. And r->info->seq might even already be what prb_first_seq() would return. (More on this below.) 2. The record with user->seq has no data because the writer failed to allocate dataring space. In this case, resetting back to prb_first_seq() would be incorrect. And since r->info->seq is the next valid record, it is appropriate that the next devkmsg_read() starts there. Rather than checking these cases separately, it is enough just to check for the 2nd case. For the 1st case, prb_first_seq() could be less than r->info->seq if all the preceeding records have no data. But this just means the whole set of records with missing data are skipped, which matches existing behavior. (For example, currently when devkmsg is behind 10 messages, there are not 10 -EPIPE returns. Instead it immediately catches up to the next available record.) Perhaps the new comment should be: /* * The expected message is gone, return error and * reset to the next available message. */ John Ogness ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH 2/2] printk: use the lockless ringbuffer
On (20/01/28 17:25), John Ogness wrote: [..] > - while (user->seq == log_next_seq) { > + if (!prb_read_valid(prb, user->seq, r)) { > if (file->f_flags & O_NONBLOCK) { > ret = -EAGAIN; > logbuf_unlock_irq(); > @@ -890,30 +758,26 @@ static ssize_t devkmsg_read(struct file *file, char > __user *buf, > > logbuf_unlock_irq(); > ret = wait_event_interruptible(log_wait, > -user->seq != log_next_seq); > + prb_read_valid(prb, user->seq, r)); > if (ret) > goto out; > logbuf_lock_irq(); > } > > - if (user->seq < log_first_seq) { > - /* our last seen message is gone, return error and reset */ > - user->idx = log_first_idx; > - user->seq = log_first_seq; > + if (user->seq < r->info->seq) { > + /* the expected message is gone, return error and reset */ > + user->seq = r->info->seq; > ret = -EPIPE; > logbuf_unlock_irq(); > goto out; > } Sorry, why doesn't this do something like if (user->seq < prb_first_seq(prb)) { /* the expected message is gone, return error and reset */ user->seq = prb_first_seq(prb); ret = -EPIPE; ... } ? -ss ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
[PATCH 2/2] printk: use the lockless ringbuffer
Replace the existing ringbuffer usage and implementation with lockless ringbuffer usage. Even though the new ringbuffer does not require locking, all existing locking is left in place. Therefore, this change is purely replacing the underlining ringbuffer. Changes that exist due to the ringbuffer replacement: - The VMCOREINFO has been updated for the new structures. - Dictionary data is now stored in a separate data buffer from the human-readable messages. The dictionary data buffer is set to the same size as the message buffer. Therefore, the total reserved memory for messages is 2 * (2 ^ CONFIG_LOG_BUF_SHIFT) for the initial static buffer and 2x the specified size in the log_buf_len kernel parameter. - Record meta-data is now stored in a separate array of descriptors. This is an additional 72 * (2 ^ ((CONFIG_LOG_BUF_SHIFT - 6))) bytes for the static array and 72 * (2 ^ ((log_buf_len - 6))) bytes for the dynamic array. Signed-off-by: John Ogness --- include/linux/kmsg_dump.h | 2 - kernel/printk/Makefile| 1 + kernel/printk/printk.c| 836 +++--- 3 files changed, 416 insertions(+), 423 deletions(-) diff --git a/include/linux/kmsg_dump.h b/include/linux/kmsg_dump.h index 2e7a1e032c71..ae6265033e31 100644 --- a/include/linux/kmsg_dump.h +++ b/include/linux/kmsg_dump.h @@ -46,8 +46,6 @@ struct kmsg_dumper { bool registered; /* private state of the kmsg iterator */ - u32 cur_idx; - u32 next_idx; u64 cur_seq; u64 next_seq; }; diff --git a/kernel/printk/Makefile b/kernel/printk/Makefile index 4d052fc6bcde..eee3dc9b60a9 100644 --- a/kernel/printk/Makefile +++ b/kernel/printk/Makefile @@ -2,3 +2,4 @@ obj-y = printk.o obj-$(CONFIG_PRINTK) += printk_safe.o obj-$(CONFIG_A11Y_BRAILLE_CONSOLE) += braille.o +obj-$(CONFIG_PRINTK) += printk_ringbuffer.o diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 1ef6f75d92f1..d0d24ee1d1f4 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -56,6 +56,7 @@ #define CREATE_TRACE_POINTS #include +#include "printk_ringbuffer.h" #include "console_cmdline.h" #include "braille.h" #include "internal.h" @@ -294,30 +295,22 @@ enum con_msg_format_flags { static int console_msg_format = MSG_FORMAT_DEFAULT; /* - * The printk log buffer consists of a chain of concatenated variable - * length records. Every record starts with a record header, containing - * the overall length of the record. + * The printk log buffer consists of a sequenced collection of records, each + * containing variable length message and dictionary text. Every record + * also contains its own meta-data (@info). * - * The heads to the first and last entry in the buffer, as well as the - * sequence numbers of these entries are maintained when messages are - * stored. - * - * If the heads indicate available messages, the length in the header - * tells the start next message. A length == 0 for the next message - * indicates a wrap-around to the beginning of the buffer. - * - * Every record carries the monotonic timestamp in microseconds, as well as - * the standard userspace syslog level and syslog facility. The usual + * Every record meta-data carries the monotonic timestamp in microseconds, as + * well as the standard userspace syslog level and syslog facility. The usual * kernel messages use LOG_KERN; userspace-injected messages always carry * a matching syslog facility, by default LOG_USER. The origin of every * message can be reliably determined that way. * - * The human readable log message directly follows the message header. The - * length of the message text is stored in the header, the stored message - * is not terminated. + * The human readable log message of a record is available in @text, the length + * of the message text in @text_len. The stored message is not terminated. * - * Optionally, a message can carry a dictionary of properties (key/value pairs), - * to provide userspace with a machine-readable message context. + * Optionally, a record can carry a dictionary of properties (key/value pairs), + * to provide userspace with a machine-readable message context. The length of + * the dictionary is available in @dict_len. The dictionary is not terminated. * * Examples for well-defined, commonly used property names are: * DEVICE=b12:8 device identifier @@ -331,21 +324,19 @@ static int console_msg_format = MSG_FORMAT_DEFAULT; * follows directly after a '=' character. Every property is terminated by * a '\0' character. The last property is not terminated. * - * Example of a message structure: - * ff 8f 00 00 00 00 00 00 monotonic time in nsec - * 0008 34 00record is 52 bytes long - * 000a0b 00 text is 11 bytes long - * 000c 1f 00dictionary is 23 bytes long - * 000e03 00 LOG_KERN (facility)