Re: [PATCH v3 2/3] printk: add lockless ringbuffer

2020-07-02 Thread Petr Mladek
On Thu 2020-06-18 16:55:18, John Ogness wrote:
> Introduce a multi-reader multi-writer lockless ringbuffer for storing
> the kernel log messages. Readers and writers may use their API from
> any context (including scheduler and NMI). This ringbuffer will make
> it possible to decouple printk() callers from any context, locking,
> or console constraints. It also makes it possible for readers to have
> full access to the ringbuffer contents at any time and context (for
> example from any panic situation).

Please, find few comments below. They are primary about improving
the comments. There are also 3 ideas how to clean up the code
a bit[*].

Everything might be solved by followup patches. Well, I would prefer
to update the comments in the next respin if you do it because
of the third patch.[**]

The small code changes can be done later. I do not want to introduce
eventual problems at this stage.

Feel free to use:

Reviewed-by: Petr Mladek 


[*] I have pretty good feeling about the code. And I really like the extensive
comments. The remaining ideas are mostly from the nits department.

[**] We are getting close to push this into linux-next. I think that
 it makes sense only when also the 3rd patch is ready.

 Also we should have a solution for the crashdump/crash tools at hands.
 But I think that this is already the case.


> --- /dev/null
> +++ b/kernel/printk/printk_ringbuffer.c
> + * Data Rings
> + * ~~
> + * The two data rings (text and dictionary) function identically. They exist
> + * separately so that their buffer sizes can be individually set and they do
> + * not affect one another.
> + *
> + * Data rings are byte arrays composed of data blocks. Data blocks are
> + * referenced by blk_lpos structs that point to the logical position of the
> + * beginning of a data block and the beginning of the next adjacent data
> + * block. Logical positions are mapped directly to index values of the byte
> + * array ringbuffer.
> + *
> + * Each data block consists of an ID followed by the raw data. The ID is the
> + * identifier of a descriptor that is associated with the data block. A data
> + * block is considered valid if all of the following conditions are met:
> + *
> + *   1) The descriptor associated with the data block is in the committed
> + *  or reusable queried state.

This sounds strange. The information in the descriptor is valid when
the descriptor is in the committed or reusable state. But the data
in the data block are valid only when the related descriptor is in
the committed state.

The data block might get reused faster than the descriptor that owned
it earlier.

> + *   2) The blk_lpos struct within the descriptor associated with the data
> + *  block references back to the same data block.

This is true. But this cross check is needed only when the assocoated
descriptor is found via the ID stored in the data block. It is not
needed when the data block is found via id/seq number stored in
the descriptor that is in commited state.


> + *   3) The data block is within the head/tail logical position range.

Also this is true. But it works by design. It does not need to be
checked when validating the data.

I would like to make this clear. So, I would write something like:


  The raw data stored in a data block can be considered valid when the
  the associated descriptor is in the committed state.

  The above condition is enough when the data block is found via
  information from the descriptor that was chosen by its ID.

  One more check is needed when the data block is found via
  blk_lpos.next stored in the descriptor for the previous data block.
  The data block might point to a descriptor in a committed state just
  by chance. It is the right one only when blk_lpos.begin points back
  to the data block.

I do not know. It might be to complicated. I actually did not mind
about the original text in previous versions. But I realize now that
it might have been the reason why the checks in data_make_reusable()
were too complicated in the previous versions.



> + * Sample reader code::
> + *
> + *   struct printk_info info;
> + *   struct printk_record r;
> + *   char text_buf[32];
> + *   char dict_buf[32];
> + *   u64 seq;
> + *
> + *   prb_rec_init_rd(, , _buf[0], sizeof(text_buf),
> + *   _buf[0], sizeof(dict_buf));
> + *
> + *   prb_for_each_record(0, _rb, , ) {
> + *   if (info.seq != seq)
> + *   pr_warn("lost %llu records\n", info.seq - seq);
> + *
> + *   if (info.text_len > r.text_buf_size) {
> + *   pr_warn("record %llu text truncated\n", info.seq);
> + *   text_buf[sizeof(text_buf) - 1] = 0;
> + *   }

AFAIK, the trailing '\0' is missing even the the text is not
truncated. We need to always add it here. Otherwise, we could not
print the text using %s below.

Also it means that it is not safe to copy the text using
memcpy(new_buf, _buf[0], r.text_len) because 

Re: [PATCH v3 2/3] printk: add lockless ringbuffer

2020-06-30 Thread Paul E. McKenney
On Thu, Jun 18, 2020 at 04:55:18PM +0206, John Ogness wrote:
> Introduce a multi-reader multi-writer lockless ringbuffer for storing
> the kernel log messages. Readers and writers may use their API from
> any context (including scheduler and NMI). This ringbuffer will make
> it possible to decouple printk() callers from any context, locking,
> or console constraints. It also makes it possible for readers to have
> full access to the ringbuffer contents at any time and context (for
> example from any panic situation).
> 
> The printk_ringbuffer is made up of 3 internal ringbuffers:
> 
> desc_ring:
> A ring of descriptors. A descriptor contains all record meta data
> (sequence number, timestamp, loglevel, etc.) as well as internal state
> information about the record and logical positions specifying where in
> the other ringbuffers the text and dictionary strings are located.
> 
> text_data_ring:
> A ring of data blocks. A data block consists of an unsigned long
> integer (ID) that maps to a desc_ring index followed by the text
> string of the record.
> 
> dict_data_ring:
> A ring of data blocks. A data block consists of an unsigned long
> integer (ID) that maps to a desc_ring index followed by the dictionary
> string of the record.
> 
> The internal state information of a descriptor is the key element to
> allow readers and writers to locklessly synchronize access to the data.
> 
> Co-developed-by: Petr Mladek 
> Signed-off-by: John Ogness 

The orderings match the comments, although a number could (later!)
be weakened to the easier-to-read smp_load_acquire() and/or
smp_store_release().  So, from a memory-ordering perspective:

Reviewed-by: Paul E. McKenney 

> ---
>  kernel/printk/Makefile|1 +
>  kernel/printk/printk_ringbuffer.c | 1674 +
>  kernel/printk/printk_ringbuffer.h |  352 ++
>  3 files changed, 2027 insertions(+)
>  create mode 100644 kernel/printk/printk_ringbuffer.c
>  create mode 100644 kernel/printk/printk_ringbuffer.h
> 
> diff --git a/kernel/printk/Makefile b/kernel/printk/Makefile
> index 4d052fc6bcde..eee3dc9b60a9 100644
> --- a/kernel/printk/Makefile
> +++ b/kernel/printk/Makefile
> @@ -2,3 +2,4 @@
>  obj-y= printk.o
>  obj-$(CONFIG_PRINTK) += printk_safe.o
>  obj-$(CONFIG_A11Y_BRAILLE_CONSOLE)   += braille.o
> +obj-$(CONFIG_PRINTK) += printk_ringbuffer.o
> diff --git a/kernel/printk/printk_ringbuffer.c 
> b/kernel/printk/printk_ringbuffer.c
> new file mode 100644
> index ..75d056436cc5
> --- /dev/null
> +++ b/kernel/printk/printk_ringbuffer.c
> @@ -0,0 +1,1674 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include "printk_ringbuffer.h"
> +
> +/**
> + * DOC: printk_ringbuffer overview
> + *
> + * Data Structure
> + * --
> + * The printk_ringbuffer is made up of 3 internal ringbuffers:
> + *
> + *   desc_ring
> + * A ring of descriptors. A descriptor contains all record meta data
> + * (sequence number, timestamp, loglevel, etc.) as well as internal state
> + * information about the record and logical positions specifying where in
> + * the other ringbuffers the text and dictionary strings are located.
> + *
> + *   text_data_ring
> + * A ring of data blocks. A data block consists of an unsigned long
> + * integer (ID) that maps to a desc_ring index followed by the text
> + * string of the record.
> + *
> + *   dict_data_ring
> + * A ring of data blocks. A data block consists of an unsigned long
> + * integer (ID) that maps to a desc_ring index followed by the dictionary
> + * string of the record.
> + *
> + * The internal state information of a descriptor is the key element to allow
> + * readers and writers to locklessly synchronize access to the data.
> + *
> + * Implementation
> + * --
> + *
> + * Descriptor Ring
> + * ~~~
> + * The descriptor ring is an array of descriptors. A descriptor contains all
> + * the meta data of a printk record as well as blk_lpos structs pointing to
> + * associated text and dictionary data blocks (see "Data Rings" below). Each
> + * descriptor is assigned an ID that maps directly to index values of the
> + * descriptor array and has a state. The ID and the state are bitwise 
> combined
> + * into a single descriptor field named @state_var, allowing ID and state to
> + * be synchronously and atomically updated.
> + *
> + * Descriptors have three states:
> + *
> + *   reserved
> + * A writer is modifying the record.
> + *
> + *   committed
> + * The record and all its data are complete and available for reading.
> + *
> + *   reusable
> + * The record exists, but its text and/or dictionary data may no longer
> + * be available.
> + *
> + * Querying the @state_var of a record requires providing the ID of the
> + * descriptor to query. This can yield a possible fourth (pseudo) state:
> + *
> + *   miss
> + * The descriptor being 

[PATCH v3 2/3] printk: add lockless ringbuffer

2020-06-23 Thread John Ogness
Introduce a multi-reader multi-writer lockless ringbuffer for storing
the kernel log messages. Readers and writers may use their API from
any context (including scheduler and NMI). This ringbuffer will make
it possible to decouple printk() callers from any context, locking,
or console constraints. It also makes it possible for readers to have
full access to the ringbuffer contents at any time and context (for
example from any panic situation).

The printk_ringbuffer is made up of 3 internal ringbuffers:

desc_ring:
A ring of descriptors. A descriptor contains all record meta data
(sequence number, timestamp, loglevel, etc.) as well as internal state
information about the record and logical positions specifying where in
the other ringbuffers the text and dictionary strings are located.

text_data_ring:
A ring of data blocks. A data block consists of an unsigned long
integer (ID) that maps to a desc_ring index followed by the text
string of the record.

dict_data_ring:
A ring of data blocks. A data block consists of an unsigned long
integer (ID) that maps to a desc_ring index followed by the dictionary
string of the record.

The internal state information of a descriptor is the key element to
allow readers and writers to locklessly synchronize access to the data.

Co-developed-by: Petr Mladek 
Signed-off-by: John Ogness 
---
 kernel/printk/Makefile|1 +
 kernel/printk/printk_ringbuffer.c | 1674 +
 kernel/printk/printk_ringbuffer.h |  352 ++
 3 files changed, 2027 insertions(+)
 create mode 100644 kernel/printk/printk_ringbuffer.c
 create mode 100644 kernel/printk/printk_ringbuffer.h

diff --git a/kernel/printk/Makefile b/kernel/printk/Makefile
index 4d052fc6bcde..eee3dc9b60a9 100644
--- a/kernel/printk/Makefile
+++ b/kernel/printk/Makefile
@@ -2,3 +2,4 @@
 obj-y  = printk.o
 obj-$(CONFIG_PRINTK)   += printk_safe.o
 obj-$(CONFIG_A11Y_BRAILLE_CONSOLE) += braille.o
+obj-$(CONFIG_PRINTK)   += printk_ringbuffer.o
diff --git a/kernel/printk/printk_ringbuffer.c 
b/kernel/printk/printk_ringbuffer.c
new file mode 100644
index ..75d056436cc5
--- /dev/null
+++ b/kernel/printk/printk_ringbuffer.c
@@ -0,0 +1,1674 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "printk_ringbuffer.h"
+
+/**
+ * DOC: printk_ringbuffer overview
+ *
+ * Data Structure
+ * --
+ * The printk_ringbuffer is made up of 3 internal ringbuffers:
+ *
+ *   desc_ring
+ * A ring of descriptors. A descriptor contains all record meta data
+ * (sequence number, timestamp, loglevel, etc.) as well as internal state
+ * information about the record and logical positions specifying where in
+ * the other ringbuffers the text and dictionary strings are located.
+ *
+ *   text_data_ring
+ * A ring of data blocks. A data block consists of an unsigned long
+ * integer (ID) that maps to a desc_ring index followed by the text
+ * string of the record.
+ *
+ *   dict_data_ring
+ * A ring of data blocks. A data block consists of an unsigned long
+ * integer (ID) that maps to a desc_ring index followed by the dictionary
+ * string of the record.
+ *
+ * The internal state information of a descriptor is the key element to allow
+ * readers and writers to locklessly synchronize access to the data.
+ *
+ * Implementation
+ * --
+ *
+ * Descriptor Ring
+ * ~~~
+ * The descriptor ring is an array of descriptors. A descriptor contains all
+ * the meta data of a printk record as well as blk_lpos structs pointing to
+ * associated text and dictionary data blocks (see "Data Rings" below). Each
+ * descriptor is assigned an ID that maps directly to index values of the
+ * descriptor array and has a state. The ID and the state are bitwise combined
+ * into a single descriptor field named @state_var, allowing ID and state to
+ * be synchronously and atomically updated.
+ *
+ * Descriptors have three states:
+ *
+ *   reserved
+ * A writer is modifying the record.
+ *
+ *   committed
+ * The record and all its data are complete and available for reading.
+ *
+ *   reusable
+ * The record exists, but its text and/or dictionary data may no longer
+ * be available.
+ *
+ * Querying the @state_var of a record requires providing the ID of the
+ * descriptor to query. This can yield a possible fourth (pseudo) state:
+ *
+ *   miss
+ * The descriptor being queried has an unexpected ID.
+ *
+ * The descriptor ring has a @tail_id that contains the ID of the oldest
+ * descriptor and @head_id that contains the ID of the newest descriptor.
+ *
+ * When a new descriptor should be created (and the ring is full), the tail
+ * descriptor is invalidated by first transitioning to the reusable state and
+ * then invalidating all tail data blocks up to and including the data blocks
+ * associated with the tail descriptor (for text and dictionary rings). Then
+ * @tail_id is advanced,