Re: makedumpfile: a feature question about filtering

2020-09-14 Thread piliu


On 09/14/2020 05:15 PM, HAGIO KAZUHITO(萩尾 一仁) wrote:
> -----Original Message-----
>> On 09/14/2020 04:15 PM, HAGIO KAZUHITO(萩尾 一仁) wrote:
>>> -----Original Message-----
>>>> On 09/11/2020 04:53 PM, HAGIO KAZUHITO(萩尾 一仁) wrote:
>>>>> Hi Pingfan,
>>>>>
>>>>> -----Original Message-----
>>>>>> Hello,
>>>>>>
>>>>>> There is an appeal which only wants to save some user page including env
>>>>>> and args pages, and discards the other user space pages.
>>>>>
>>>>> I understand that it's helpful to get them even with -d 31 for crash's
>>>>> "ps -a" option.
>>>>>
>>>>>
>>>>>> To achieve this feature, mm_struct's members "arg_start, arg_end,
>>>>>> env_start, env_end;" should be accessed. So we need to export mm_struct
>>>>>> and init_mm through vmcore.
>>>>>
>>>>> How many offsets/sizes will be required to walk all tasks?
>>>> At present, I think only the info "arg_start, arg_end, env_start,
>>>> env_end" in mm_struct are required.
>>>
>>> ah what I wanted to ask mainly was the number of the offsets/sizes used to
>>> walk through all (user) tasks in a system, because makedumpfile cannot get
>>> to a task's arg_start only with OFFSET(mm_struct.arg_start).  Is it easy
>>> enough to do it only with several vmcoreinfo entries?
>> Yes, it is. Iterating over tasks requires exposing
>> OFFSET(mm_struct.mmlist) and &init_mm. Then, for each mm_struct, we need
>> access to "arg_start, arg_end, env_start, env_end".
> 
> Hmm, but a Fedora 32 machine has an empty init_mm.mmlist.
> (because of no used swap?)
Aha, sorry, I made a mistake: mmlist is no longer used to organize all
the mm_structs.

In order to access every mm_struct in the system, the init_task.tasks
linked list should be exposed. For each task we can then reach its
mm_struct via OFFSET(task_struct.mm), and from there OFFSET(mm_struct.arg_start).
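
The walk described above can be sketched in userspace C. This is only an
illustration under stated assumptions: the structs are simplified stand-ins
for the kernel's task_struct and mm_struct, and collect_arg_env() and
struct range are hypothetical names. A real makedumpfile implementation
would read these fields out of the vmcore using exported OFFSET() entries
rather than dereferencing pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for kernel types; the real layouts differ. */
struct list_head { struct list_head *next, *prev; };

struct mm_struct {
    unsigned long arg_start, arg_end, env_start, env_end;
};

struct task_struct {
    struct list_head tasks;  /* links every task, anchored at init_task */
    struct mm_struct *mm;    /* NULL for kernel threads */
};

/* Hypothetical output type: one address range to keep in the dump. */
struct range { unsigned long start, end; };

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/*
 * Walk the circular list anchored at init_task->tasks and record the
 * arg and env ranges of every task that has an mm. Returns the number
 * of ranges written, never more than max.
 */
size_t collect_arg_env(struct task_struct *init_task,
                       struct range *out, size_t max)
{
    size_t n = 0;
    struct list_head *pos;

    for (pos = init_task->tasks.next; pos != &init_task->tasks;
         pos = pos->next) {
        struct task_struct *t = container_of(pos, struct task_struct, tasks);

        if (!t->mm)          /* kernel thread: no user pages to keep */
            continue;
        if (n + 2 > max)     /* not enough room for both ranges */
            break;
        out[n++] = (struct range){ t->mm->arg_start, t->mm->arg_end };
        out[n++] = (struct range){ t->mm->env_start, t->mm->env_end };
    }
    return n;
}
```

Under these assumptions the vmcoreinfo additions would be the init_task
symbol plus the offsets of task_struct.tasks, task_struct.mm, and the four
mm_struct members.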
> 
> crash> p init_mm.mmlist
> $1 = {
>   next = 0x826ee200 , 
>   prev = 0x826ee200 
> }
> crash> swap
> SWAP_INFO_STRUCT  TYPE       SIZE      USED  PCT  PRI  FILENAME
> 8badb5385a00      PARTITION  4153340k  0k    0%   -2   /dev/dm-1
> 
> I might be still missing something.
You are right. Can you foresee any problem with my new approach?

Thanks,
Pingfan


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v2 2/7] kernel/resource: move and rename IORESOURCE_MEM_DRIVER_MANAGED

2020-09-14 Thread Wei Yang
On Tue, Sep 08, 2020 at 10:10:07PM +0200, David Hildenbrand wrote:
>IORESOURCE_MEM_DRIVER_MANAGED currently uses an unused PnP bit, which is
>always set to 0 by hardware. This is far from beautiful (and confusing),
>and the bit only applies to SYSRAM. So let's move it out of the
>bus-specific (PnP) defined bits.
>
>We'll add another SYSRAM specific bit soon. If we ever need more bits for
>other purposes, we can steal some from "desc", or reshuffle/regroup what we
>have.

I think you make this definition because we use IORESOURCE_SYSRAM_RAM for
hotplugged memory? So we put them all in the IORESOURCE_SYSRAM_XXX family?
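
For reference, a minimal sketch of how a consumer such as the kexec hole
search tests the renamed bit. The flag values mirror include/linux/ioport.h
as of this series, and struct resource_sketch and usable_for_kexec_sketch()
are hypothetical stand-ins, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values as defined in include/linux/ioport.h with this patch applied. */
#define IORESOURCE_MEM                    0x00000200UL
#define IORESOURCE_SYSRAM                 0x01000000UL
#define IORESOURCE_SYSRAM_DRIVER_MANAGED  0x02000000UL
#define IORESOURCE_BUSY                   0x80000000UL
#define IORESOURCE_SYSTEM_RAM             (IORESOURCE_MEM | IORESOURCE_SYSRAM)

/* Hypothetical stand-in for struct resource; only flags matter here. */
struct resource_sketch { unsigned long flags; };

/*
 * Return true if the range is System RAM that a kexec image may be
 * placed on, i.e. it is not flagged as detected and managed by a driver.
 */
bool usable_for_kexec_sketch(const struct resource_sketch *res)
{
    if ((res->flags & IORESOURCE_SYSTEM_RAM) != IORESOURCE_SYSTEM_RAM)
        return false;
    return !(res->flags & IORESOURCE_SYSRAM_DRIVER_MANAGED);
}
```

The new bit lives next to IORESOURCE_SYSRAM instead of in the PnP-specific
low bits, which is what makes the SYSRAM_ prefix fit.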

>
>Cc: Andrew Morton 
>Cc: Michal Hocko 
>Cc: Dan Williams 
>Cc: Jason Gunthorpe 
>Cc: Kees Cook 
>Cc: Ard Biesheuvel 
>Cc: Pankaj Gupta 
>Cc: Baoquan He 
>Cc: Wei Yang 
>Cc: Eric Biederman 
>Cc: Thomas Gleixner 
>Cc: Greg Kroah-Hartman 
>Cc: kexec@lists.infradead.org
>Signed-off-by: David Hildenbrand 
>---
> include/linux/ioport.h | 4 +++-
> kernel/kexec_file.c| 2 +-
> mm/memory_hotplug.c| 4 ++--
> 3 files changed, 6 insertions(+), 4 deletions(-)
>
>diff --git a/include/linux/ioport.h b/include/linux/ioport.h
>index 52a91f5fa1a36..d7620d7c941a0 100644
>--- a/include/linux/ioport.h
>+++ b/include/linux/ioport.h
>@@ -58,6 +58,9 @@ struct resource {
> #define IORESOURCE_EXT_TYPE_BITS 0x01000000   /* Resource extended types */
> #define IORESOURCE_SYSRAM 0x01000000  /* System RAM (modifier) */
> 
>+/* IORESOURCE_SYSRAM specific bits. */
>+#define IORESOURCE_SYSRAM_DRIVER_MANAGED  0x02000000 /* Always detected via a driver. */
>+
> #define IORESOURCE_EXCLUSIVE  0x08000000  /* Userland may not map this resource */
> 
> #define IORESOURCE_DISABLED   0x10000000
>@@ -103,7 +106,6 @@ struct resource {
> #define IORESOURCE_MEM_32BIT  (3<<3)
> #define IORESOURCE_MEM_SHADOWABLE (1<<5)  /* dup: IORESOURCE_SHADOWABLE */
> #define IORESOURCE_MEM_EXPANSIONROM   (1<<6)
>-#define IORESOURCE_MEM_DRIVER_MANAGED (1<<7)
> 
> /* PnP I/O specific bits (IORESOURCE_BITS) */
> #define IORESOURCE_IO_16BIT_ADDR  (1<<0)
>diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
>index ca40bef75a616..dfeeed1aed084 100644
>--- a/kernel/kexec_file.c
>+++ b/kernel/kexec_file.c
>@@ -520,7 +520,7 @@ static int locate_mem_hole_callback(struct resource *res, void *arg)
>   /* Returning 0 will take to next memory range */
> 
>   /* Don't use memory that will be detected and handled by a driver. */
>-  if (res->flags & IORESOURCE_MEM_DRIVER_MANAGED)
>+  if (res->flags & IORESOURCE_SYSRAM_DRIVER_MANAGED)
>   return 0;
> 
>   if (sz < kbuf->memsz)
>diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>index 4c47b68a9f4b5..8e1cd18b5cf14 100644
>--- a/mm/memory_hotplug.c
>+++ b/mm/memory_hotplug.c
>@@ -105,7 +105,7 @@ static struct resource *register_memory_resource(u64 start, u64 size,
>   unsigned long flags =  IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
> 
>   if (strcmp(resource_name, "System RAM"))
>-  flags |= IORESOURCE_MEM_DRIVER_MANAGED;
>+  flags |= IORESOURCE_SYSRAM_DRIVER_MANAGED;
> 
>   /*
>* Make sure value parsed from 'mem=' only restricts memory adding
>@@ -1160,7 +1160,7 @@ EXPORT_SYMBOL_GPL(add_memory);
>  *
>  * For this memory, no entries in /sys/firmware/memmap ("raw firmware-provided
>  * memory map") are created. Also, the created memory resource is flagged
>- * with IORESOURCE_MEM_DRIVER_MANAGED, so in-kernel users can special-case
>+ * with IORESOURCE_SYSRAM_DRIVER_MANAGED, so in-kernel users can special-case
>  * this memory as well (esp., not place kexec images onto it).
>  *
>  * The resource_name (visible via /proc/iomem) has to have the format
>-- 
>2.26.2

-- 
Wei Yang
Help you, Help me



Re: [BUG RT] dump-capture kernel not executed for panic in interrupt context

2020-09-14 Thread Eric W. Biederman


Adding the kexec list as well.

Joerg Vehlow  writes:

> Hi Eric,
>> What is this patch supposed to be doing?
>>
>> What bug is it fixing?
> This information is part of the first message of this mail thread.
> The patch was intended for the active discussion in this thread,
> not for a broad review.

> A short summary: In the rt kernel, a panic in an interrupt context does
> not start the dump-capture kernel, because there is a mutex_trylock in
> __crash_kexec. If this is called in interrupt context, it always fails.
> In the non-rt kernel, calling mutex_trylock in interrupt context is not
> allowed according to the comment of the function, but it still works.

Thanks.  For whatever reason I did not see the rest of this thread
when I was replying to your patch.

I get the feeling the rt kernel is breaking this case deliberately.
I don't know of any reason why a trylock couldn't work.

That said I won't propose fixing up the locks that way.

>> A BUG_ON that triggers inside of BUG_ONs seems not just suspect but
>> outright impossible to make use of.
> I am not entirely sure what would happen here. But even if it gets into
> some kind of endless loop, I guess this is ok, because it allows finding
> the problem. A piece of code in the function that ensures the precondition
> is a lot better than relying on only a comment.
> If this was in mutex_trylock, the bug described above wouldn't have sneaked
> in 12 years ago...

BUG_ON's are more likely to hide a problem than to show it.
Sometimes they are appropriate but they should be avoided as much as
possible.


>> I get the feeling skimming this that it is time to sort out and simplify
>> the locking here, rather than make it more complex, and more likely to
>> fail.
> I would very much like that, but sadly it looks like it is not possible.
> Either it would require blocking locks, that may fail, or not locking at
> all, that may also fail. Using a different kind of lock (like spinlock)
> is also not possible, because spinlock_trylock again uses mutex_trylock
> in the rt kernel.

I think it is possible but the locking needs to be relooked at.

>> I get the feeling that over the years somehow the assumption that the
>> rest of the kernel is broken and that we need to get out of the broken
>> kernel as fast and as simply as possible has been lost.
> Yes, I also have the feeling that the mutexes need fixing, but I wouldn't
> want to post any patch for that. At the moment, given the interface of the
> mutex, this is clearly a bug in kexec, even if it works in the non-rt kernel.

Cleanups that break the code. Sigh.

The code was written correctly for this case and was fine until
8c5a1cf0ad3a ("kexec: use a mutex for locking rather than xchg()").

Mostly because I didn't trust locks, given their comparatively high level
of abstraction, and what do you know, that turned out to be correct in
this case.

It definitely looks like it is time to see how the locking can be improved
on the kexec on panic code path.
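
To make the xchg() idea concrete, here is a minimal userspace sketch using
C11 atomics. It is an illustration only, not the code that existed before
8c5a1cf0ad3a: the names kexec_lock_sketch, kexec_trylock_sketch() and
kexec_unlock_sketch() are hypothetical, and atomic_exchange() stands in for
the kernel's xchg().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* 0 = free, 1 = held. Hypothetical stand-in for a kexec lock word. */
static atomic_int kexec_lock_sketch;

/*
 * Try to take the lock without sleeping or spinning. A single atomic
 * exchange either claims the lock (old value 0) or observes that it is
 * already held (old value 1), so it is usable from any context.
 */
bool kexec_trylock_sketch(void)
{
    return atomic_exchange(&kexec_lock_sketch, 1) == 0;
}

/* Release the lock; only the holder may call this. */
void kexec_unlock_sketch(void)
{
    atomic_store(&kexec_lock_sketch, 0);
}
```

Nothing here depends on the calling context or on how a lock primitive is
implemented for a given kernel configuration, which is the property the
panic path needs.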

Eric



[PATCH printk v5 0/6] printk: reimplement LOG_CONT handling

2020-09-14 Thread John Ogness
Hello,

Here is v5 for the second series to rework the printk subsystem.
(The v4 is here [0].) This series implements a new ringbuffer
feature that allows the last record to be extended. Petr Mladek
provided the initial proof of concept [1] for this.

Using the record extension feature, LOG_CONT is re-implemented
in a way that exactly preserves its behavior, but avoids the
need for an extra buffer. In particular, it avoids the need for
any synchronization that such a buffer requires.

This series deviates from the agreements [2] made at the meeting
during LPC2019 in Lisbon. The test results of the v1 series,
which implemented LOG_CONT as agreed upon, showed that the
effects on existing userspace tools using /dev/kmsg (journalctl,
dmesg) were not acceptable [3].

Patch 5 introduces *four* new memory barrier pairs. Two of them
are insignificant additions (data_realloc:A/desc_read:D and
data_realloc:A/data_push_tail:B) because they are alternate path
memory barriers that exactly match the purpose and context of
the two existing memory barrier pairs they provide an alternate
path for. The other two new memory barrier pairs are significant
additions:

desc_reopen_last:A / _prb_commit:B - When reopening a descriptor,
ensure the state transitions back to desc_reserved before
fully trusting the descriptor data.

_prb_commit:B / desc_reserve:D - When committing a descriptor,
ensure the state transitions to desc_committed before checking
the head ID to see if the descriptor needs to be finalized.

The test module used to test the ringbuffer is available
here [4].

The series is based on the printk-rework branch of the printk git
tree:

e60768311af8 ("scripts/gdb: update for lockless printk ringbuffer")

The list of changes since v4:

printk_ringbuffer
=================

- desc_read(): revert setting @state_var when inconsistent (a
  separate series [5] is addressing this bug)

- desc_reserve(): use DESC_SV() when setting reserved

- data_realloc(): also do nothing if the size is the same

- prb_reserve_in_last(): adjust dataless checks/warnings to match
  the non-dataless case

- prb_reserve_in_last(): fix length modifier in warnings

- change comments about "state flags" to just talk about "states"

John Ogness

[0] https://lkml.kernel.org/r/20200908202859.2736-1-john.ogn...@linutronix.de
[1] https://lkml.kernel.org/r/20200812163908.GH12903@alley
[2] https://lkml.kernel.org/r/87k1acz5rx@linutronix.de
[3] https://lkml.kernel.org/r/20200811160551.GC12903@alley
[4] https://github.com/Linutronix/prb-test.git
[5] https://lkml.kernel.org/r/20200914094803.27365-1-john.ogn...@linutronix.de

John Ogness (6):
  printk: ringbuffer: relocate get_data()
  printk: ringbuffer: add BLK_DATALESS() macro
  printk: ringbuffer: clear initial reserved fields
  printk: ringbuffer: change representation of states
  printk: ringbuffer: add finalization/extension support
  printk: reimplement log_cont using record extension

 Documentation/admin-guide/kdump/gdbmacros.txt |  13 +-
 kernel/printk/printk.c| 110 +--
 kernel/printk/printk_ringbuffer.c | 683 ++
 kernel/printk/printk_ringbuffer.h |  35 +-
 scripts/gdb/linux/dmesg.py|  12 +-
 5 files changed, 615 insertions(+), 238 deletions(-)

-- 
2.20.1




[PATCH printk v5 4/6] printk: ringbuffer: change representation of states

2020-09-14 Thread John Ogness
Rather than deriving the state by evaluating bits within the flags
area of the state variable, assign the states explicit values and
set those values in the flags area. Introduce macros to make it
simple to read and write state values for the state variable.

Although the functionality is preserved, the binary representation
for the states is changed.
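
As an illustration of the new representation, the following userspace
sketch packs a state into the top two bits of the state variable and reads
it back. The shift and mask follow the gdb macro changes in this patch;
reserved = 0 and committed = 1 also follow the patch, while reusable = 3 is
an assumption here. All sk_/SK_ names are hypothetical stand-ins for the
DESC_SV(), DESC_STATE() and DESC_ID() macros.

```c
#include <assert.h>

/* reserved and committed follow the patch; reusable = 3 is assumed. */
enum sk_desc_state {
    SK_DESC_RESERVED  = 0x0,
    SK_DESC_COMMITTED = 0x1,
    SK_DESC_REUSABLE  = 0x3,
};

#define SK_SV_BITS      (sizeof(unsigned long) * 8)
#define SK_FLAGS_SHIFT  (SK_SV_BITS - 2)          /* state lives in the top 2 bits */
#define SK_FLAGS_MASK   (3UL << SK_FLAGS_SHIFT)
#define SK_ID_MASK      (~SK_FLAGS_MASK)

/* Build a state variable value from an ID and a state. */
unsigned long sk_desc_sv(unsigned long id, enum sk_desc_state state)
{
    return ((unsigned long)state << SK_FLAGS_SHIFT) | (id & SK_ID_MASK);
}

/* Extract the state from a state variable value. */
enum sk_desc_state sk_desc_state_of(unsigned long sv)
{
    return (enum sk_desc_state)(3UL & (sv >> SK_FLAGS_SHIFT));
}

/* Extract the ID from a state variable value. */
unsigned long sk_desc_id(unsigned long sv)
{
    return sv & SK_ID_MASK;
}
```

With explicit values the state is read with one shift and mask, instead of
testing individual flag bits in a fixed order.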

Signed-off-by: John Ogness 
Reviewed-by: Petr Mladek 
---
 Documentation/admin-guide/kdump/gdbmacros.txt | 12 ---
 kernel/printk/printk_ringbuffer.c | 28 +
 kernel/printk/printk_ringbuffer.h | 31 ---
 scripts/gdb/linux/dmesg.py| 11 ---
 4 files changed, 41 insertions(+), 41 deletions(-)

diff --git a/Documentation/admin-guide/kdump/gdbmacros.txt b/Documentation/admin-guide/kdump/gdbmacros.txt
index 7adece30237e..8f533b751c46 100644
--- a/Documentation/admin-guide/kdump/gdbmacros.txt
+++ b/Documentation/admin-guide/kdump/gdbmacros.txt
@@ -295,9 +295,12 @@ document dump_record
 end
 
 define dmesg
-   set var $desc_committed = 1UL << ((sizeof(long) * 8) - 1)
-   set var $flags_mask = 3UL << ((sizeof(long) * 8) - 2)
-   set var $id_mask = ~$flags_mask
+   # definitions from kernel/printk/printk_ringbuffer.h
+   set var $desc_committed = 1
+   set var $desc_sv_bits = sizeof(long) * 8
+   set var $desc_flags_shift = $desc_sv_bits - 2
+   set var $desc_flags_mask = 3 << $desc_flags_shift
+   set var $id_mask = ~$desc_flags_mask
 
set var $desc_count = 1U << prb->desc_ring.count_bits
set var $prev_flags = 0
@@ -309,7 +312,8 @@ define dmesg
set var $desc = &prb->desc_ring.descs[$id % $desc_count]
 
# skip non-committed record
-   if (($desc->state_var.counter & $flags_mask) == $desc_committed)
+   set var $state = 3 & ($desc->state_var.counter >> $desc_flags_shift)
+   if ($state == $desc_committed)
dump_record $desc $prev_flags
set var $prev_flags = $desc->info.flags
end
diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
index 82347abb22a5..911fbe150e9a 100644
--- a/kernel/printk/printk_ringbuffer.c
+++ b/kernel/printk/printk_ringbuffer.c
@@ -348,14 +348,6 @@ static bool data_check_size(struct prb_data_ring *data_ring, unsigned int size)
return true;
 }
 
-/* The possible responses of a descriptor state-query. */
-enum desc_state {
-   desc_miss,  /* ID mismatch */
-   desc_reserved,  /* reserved, in use by writer */
-   desc_committed, /* committed, writer is done */
-   desc_reusable,  /* free, not yet used by any writer */
-};
-
 /* Query the state of a descriptor. */
 static enum desc_state get_desc_state(unsigned long id,
  unsigned long state_val)
@@ -363,13 +355,7 @@ static enum desc_state get_desc_state(unsigned long id,
if (id != DESC_ID(state_val))
return desc_miss;
 
-   if (state_val & DESC_REUSE_MASK)
-   return desc_reusable;
-
-   if (state_val & DESC_COMMITTED_MASK)
-   return desc_committed;
-
-   return desc_reserved;
+   return DESC_STATE(state_val);
 }
 
 /*
@@ -467,8 +453,8 @@ static enum desc_state desc_read(struct prb_desc_ring *desc_ring,
 static void desc_make_reusable(struct prb_desc_ring *desc_ring,
   unsigned long id)
 {
-   unsigned long val_committed = id | DESC_COMMITTED_MASK;
-   unsigned long val_reusable = val_committed | DESC_REUSE_MASK;
+   unsigned long val_committed = DESC_SV(id, desc_committed);
+   unsigned long val_reusable = DESC_SV(id, desc_reusable);
struct prb_desc *desc = to_desc(desc_ring, id);
atomic_long_t *state_var = &desc->state_var;
 
@@ -904,7 +890,7 @@ static bool desc_reserve(struct printk_ringbuffer *rb, unsigned long *id_out)
 */
prev_state_val = atomic_long_read(&desc->state_var); /* LMM(desc_reserve:E) */
if (prev_state_val &&
-   prev_state_val != (id_prev_wrap | DESC_COMMITTED_MASK | DESC_REUSE_MASK)) {
+   get_desc_state(id_prev_wrap, prev_state_val) != desc_reusable) {
WARN_ON_ONCE(1);
return false;
}
@@ -918,7 +904,7 @@ static bool desc_reserve(struct printk_ringbuffer *rb, unsigned long *id_out)
 * This pairs with desc_read:D.
 */
if (!atomic_long_try_cmpxchg(&desc->state_var, &prev_state_val,
-id | 0)) { /* LMM(desc_reserve:F) */
+   DESC_SV(id, desc_reserved))) { /* LMM(desc_reserve:F) */
WARN_ON_ONCE(1);
return false;
}
@@ -1237,7 +1223,7 @@ void prb_commit(struct prb_reserved_entry *e)
 {
struct prb_desc_ring *desc_ring = &e->rb->desc_ring;
struct prb_desc *d = to_desc(desc_ring, e->id);
-   

[PATCH printk v5 2/6] printk: ringbuffer: add BLK_DATALESS() macro

2020-09-14 Thread John Ogness
Rather than continually needing to explicitly check @begin and @next
to identify a dataless block, introduce and use a BLK_DATALESS()
macro.
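
To illustrate, the two macros from the hunk in this patch can be exercised
in userspace. This is a sketch: struct sk_blk_lpos is a hypothetical
stand-in for struct prb_data_blk_lpos, and sk_is_dataless() exists only to
make the macro callable from a test. It relies on valid data blocks being
aligned to the ID size, so a logical position with the low bit set cannot
refer to real data.

```c
#include <assert.h>

/* Hypothetical stand-in for struct prb_data_blk_lpos. */
struct sk_blk_lpos {
    unsigned long begin;
    unsigned long next;
};

/* Macros as introduced or used by this patch. */
#define LPOS_DATALESS(lpos)  ((lpos) & 1UL)
#define BLK_DATALESS(blk)    (LPOS_DATALESS((blk)->begin) && \
                              LPOS_DATALESS((blk)->next))

/* Return 1 if both positions mark the block as data-less, else 0. */
int sk_is_dataless(const struct sk_blk_lpos *blk)
{
    return BLK_DATALESS(blk) ? 1 : 0;
}
```

Note that space_used() previously tested only @begin; with the macro it
requires both @begin and @next to be flagged, matching get_data().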

Signed-off-by: John Ogness 
Reviewed-by: Petr Mladek 
---
 kernel/printk/printk_ringbuffer.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
index aa6e31a27601..6ee5ebce1450 100644
--- a/kernel/printk/printk_ringbuffer.c
+++ b/kernel/printk/printk_ringbuffer.c
@@ -266,6 +266,8 @@
 
 /* Determine if a logical position refers to a data-less block. */
 #define LPOS_DATALESS(lpos)((lpos) & 1UL)
+#define BLK_DATALESS(blk)  (LPOS_DATALESS((blk)->begin) && \
+LPOS_DATALESS((blk)->next))
 
 /* Get the logical position at index 0 of the current wrap. */
 #define DATA_THIS_WRAP_START_LPOS(data_ring, lpos) \
@@ -1021,7 +1023,7 @@ static unsigned int space_used(struct prb_data_ring *data_ring,
   struct prb_data_blk_lpos *blk_lpos)
 {
/* Data-less blocks take no space. */
-   if (LPOS_DATALESS(blk_lpos->begin))
+   if (BLK_DATALESS(blk_lpos))
return 0;
 
if (DATA_WRAPS(data_ring, blk_lpos->begin) == DATA_WRAPS(data_ring, blk_lpos->next)) {
@@ -1054,7 +1056,7 @@ static const char *get_data(struct prb_data_ring *data_ring,
struct prb_data_block *db;
 
/* Data-less data block description. */
-   if (LPOS_DATALESS(blk_lpos->begin) && LPOS_DATALESS(blk_lpos->next)) {
+   if (BLK_DATALESS(blk_lpos)) {
if (blk_lpos->begin == NO_LPOS && blk_lpos->next == NO_LPOS) {
*data_size = 0;
return "";
-- 
2.20.1




[PATCH printk v5 1/6] printk: ringbuffer: relocate get_data()

2020-09-14 Thread John Ogness
Move the internal get_data() function as-is above prb_reserve() so
that a later change can make use of the static function.
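
The function being moved distinguishes regular blocks from blocks that wrap
around the end of the data ring. That classification can be sketched in
userspace as follows. This is an illustration under assumptions: the ring
is a hypothetical 256 bytes, the SKETCH_ macros assume DATA_WRAPS() and
DATA_INDEX() are a shift and a mask on a power-of-two ring size, and
sketch_block_size() is not a kernel function. It returns the block size
including the ID field, before get_data() subtracts sizeof(db->id).

```c
#include <assert.h>

/* Hypothetical ring of 2^8 = 256 bytes. */
#define SKETCH_SIZE_BITS   8UL
#define SKETCH_DATA_SIZE   (1UL << SKETCH_SIZE_BITS)
#define SKETCH_INDEX(lpos) ((lpos) & (SKETCH_DATA_SIZE - 1)) /* offset in ring */
#define SKETCH_WRAPS(lpos) ((lpos) >> SKETCH_SIZE_BITS)      /* wrap count */

/*
 * Return the block size implied by [begin, next), or -1 if the pair is
 * not a legal block description.
 */
long sketch_block_size(unsigned long begin, unsigned long next)
{
    /* Regular block: same wrap and begin before next. */
    if (SKETCH_WRAPS(begin) == SKETCH_WRAPS(next) && begin < next)
        return (long)(next - begin);

    /* Wrapping block: begin is exactly one wrap behind next, and the
     * data is stored from index 0 of the ring. */
    if (SKETCH_WRAPS(begin + SKETCH_DATA_SIZE) == SKETCH_WRAPS(next))
        return (long)SKETCH_INDEX(next);

    return -1;
}
```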

Signed-off-by: John Ogness 
Reviewed-by: Petr Mladek 
---
 kernel/printk/printk_ringbuffer.c | 116 +++---
 1 file changed, 58 insertions(+), 58 deletions(-)

diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
index 0659b50872b5..aa6e31a27601 100644
--- a/kernel/printk/printk_ringbuffer.c
+++ b/kernel/printk/printk_ringbuffer.c
@@ -1038,6 +1038,64 @@ static unsigned int space_used(struct prb_data_ring *data_ring,
DATA_SIZE(data_ring) - DATA_INDEX(data_ring, blk_lpos->begin));
 }
 
+/*
+ * Given @blk_lpos, return a pointer to the writer data from the data block
+ * and calculate the size of the data part. A NULL pointer is returned if
+ * @blk_lpos specifies values that could never be legal.
+ *
+ * This function (used by readers) performs strict validation on the lpos
+ * values to possibly detect bugs in the writer code. A WARN_ON_ONCE() is
+ * triggered if an internal error is detected.
+ */
+static const char *get_data(struct prb_data_ring *data_ring,
+   struct prb_data_blk_lpos *blk_lpos,
+   unsigned int *data_size)
+{
+   struct prb_data_block *db;
+
+   /* Data-less data block description. */
+   if (LPOS_DATALESS(blk_lpos->begin) && LPOS_DATALESS(blk_lpos->next)) {
+   if (blk_lpos->begin == NO_LPOS && blk_lpos->next == NO_LPOS) {
+   *data_size = 0;
+   return "";
+   }
+   return NULL;
+   }
+
+   /* Regular data block: @begin less than @next and in same wrap. */
+   if (DATA_WRAPS(data_ring, blk_lpos->begin) == DATA_WRAPS(data_ring, blk_lpos->next) &&
+   blk_lpos->begin < blk_lpos->next) {
+   db = to_block(data_ring, blk_lpos->begin);
+   *data_size = blk_lpos->next - blk_lpos->begin;
+
+   /* Wrapping data block: @begin is one wrap behind @next. */
+   } else if (DATA_WRAPS(data_ring, blk_lpos->begin + DATA_SIZE(data_ring)) ==
+  DATA_WRAPS(data_ring, blk_lpos->next)) {
+   db = to_block(data_ring, 0);
+   *data_size = DATA_INDEX(data_ring, blk_lpos->next);
+
+   /* Illegal block description. */
+   } else {
+   WARN_ON_ONCE(1);
+   return NULL;
+   }
+
+   /* A valid data block will always be aligned to the ID size. */
+   if (WARN_ON_ONCE(blk_lpos->begin != ALIGN(blk_lpos->begin, sizeof(db->id))) ||
+   WARN_ON_ONCE(blk_lpos->next != ALIGN(blk_lpos->next, sizeof(db->id)))) {
+   return NULL;
+   }
+
+   /* A valid data block will always have at least an ID. */
+   if (WARN_ON_ONCE(*data_size < sizeof(db->id)))
+   return NULL;
+
+   /* Subtract block ID space from size to reflect data size. */
+   *data_size -= sizeof(db->id);
+
+   return &db->data[0];
+}
+
 /**
  * prb_reserve() - Reserve space in the ringbuffer.
  *
@@ -1192,64 +1250,6 @@ void prb_commit(struct prb_reserved_entry *e)
local_irq_restore(e->irqflags);
 }
 
-/*
- * Given @blk_lpos, return a pointer to the writer data from the data block
- * and calculate the size of the data part. A NULL pointer is returned if
- * @blk_lpos specifies values that could never be legal.
- *
- * This function (used by readers) performs strict validation on the lpos
- * values to possibly detect bugs in the writer code. A WARN_ON_ONCE() is
- * triggered if an internal error is detected.
- */
-static const char *get_data(struct prb_data_ring *data_ring,
-   struct prb_data_blk_lpos *blk_lpos,
-   unsigned int *data_size)
-{
-   struct prb_data_block *db;
-
-   /* Data-less data block description. */
-   if (LPOS_DATALESS(blk_lpos->begin) && LPOS_DATALESS(blk_lpos->next)) {
-   if (blk_lpos->begin == NO_LPOS && blk_lpos->next == NO_LPOS) {
-   *data_size = 0;
-   return "";
-   }
-   return NULL;
-   }
-
-   /* Regular data block: @begin less than @next and in same wrap. */
-   if (DATA_WRAPS(data_ring, blk_lpos->begin) == DATA_WRAPS(data_ring, blk_lpos->next) &&
-   blk_lpos->begin < blk_lpos->next) {
-   db = to_block(data_ring, blk_lpos->begin);
-   *data_size = blk_lpos->next - blk_lpos->begin;
-
-   /* Wrapping data block: @begin is one wrap behind @next. */
-   } else if (DATA_WRAPS(data_ring, blk_lpos->begin + DATA_SIZE(data_ring)) ==
-  DATA_WRAPS(data_ring, blk_lpos->next)) {
-   db = to_block(data_ring, 0);
-   *data_size = DATA_INDEX(data_ring, blk_lpos->next);
-
-   /* Illegal block description. */
-   } else {
-   WARN_ON_ONCE(1);
-   return NULL;
- 

[PATCH printk v5 6/6] printk: reimplement log_cont using record extension

2020-09-14 Thread John Ogness
Use the record extending feature of the ringbuffer to implement
continuous messages. This preserves the existing continuous message
behavior.
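
The intended behavior can be modeled in a few lines of userspace C. This is
a sketch under assumptions, not the ringbuffer API: struct sk_record and
sk_extend_last() are hypothetical, the flag values are placeholders, and
the fixed 64-byte buffer only stands in for the space that
prb_reserve_in_last() would reserve.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Placeholder flag values for this sketch. */
enum { SK_LOG_NEWLINE = 2, SK_LOG_CONT = 8 };

/* Hypothetical model of the newest record in the ringbuffer. */
struct sk_record {
    char text[64];
    size_t len;
    bool finalized;   /* once set, the record can no longer be extended */
};

/*
 * Try to append a continuation fragment to the newest record. Returns
 * false if the record is already finalized or the fragment does not fit,
 * in which case the caller would store the fragment as a new record.
 */
bool sk_extend_last(struct sk_record *r, const char *text, size_t len,
                    int flags)
{
    if (r->finalized || r->len + len > sizeof(r->text))
        return false;

    memcpy(&r->text[r->len], text, len);
    r->len += len;

    /* A trailing newline ends the line: commit and finalize. */
    if (flags & SK_LOG_NEWLINE)
        r->finalized = true;
    return true;
}
```

No separate continuation buffer is involved: the fragments accumulate in
the record itself until a newline finalizes it.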

Signed-off-by: John Ogness 
Reviewed-by: Petr Mladek 
---
 kernel/printk/printk.c | 98 +-
 1 file changed, 20 insertions(+), 78 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 964b5701688f..9a2e23191576 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -535,7 +535,10 @@ static int log_store(u32 caller_id, int facility, int level,
r.info->caller_id = caller_id;
 
/* insert message */
-   prb_commit(&e);
+   if ((flags & LOG_CONT) || !(flags & LOG_NEWLINE))
+   prb_commit(&e);
+   else
+   prb_final_commit(&e);
 
return (text_len + trunc_msg_len);
 }
@@ -1084,7 +1087,7 @@ static unsigned int __init add_to_rb(struct printk_ringbuffer *rb,
dest_r.info->ts_nsec = r->info->ts_nsec;
dest_r.info->caller_id = r->info->caller_id;
 
-   prb_commit(&e);
+   prb_final_commit(&e);
 
return prb_record_text_space(&e);
 }
@@ -1884,87 +1887,26 @@ static inline u32 printk_caller_id(void)
0x80000000 + raw_smp_processor_id();
 }
 
-/*
- * Continuation lines are buffered, and not committed to the record buffer
- * until the line is complete, or a race forces it. The line fragments
- * though, are printed immediately to the consoles to ensure everything has
- * reached the console in case of a kernel crash.
- */
-static struct cont {
-   char buf[LOG_LINE_MAX];
-   size_t len; /* length == 0 means unused buffer */
-   u32 caller_id;  /* printk_caller_id() of first print */
-   u64 ts_nsec;/* time of first print */
-   u8 level;   /* log level of first message */
-   u8 facility;/* log facility of first message */
-   enum log_flags flags;   /* prefix, newline flags */
-} cont;
-
-static void cont_flush(void)
-{
-   if (cont.len == 0)
-   return;
-
-   log_store(cont.caller_id, cont.facility, cont.level, cont.flags,
- cont.ts_nsec, NULL, 0, cont.buf, cont.len);
-   cont.len = 0;
-}
-
-static bool cont_add(u32 caller_id, int facility, int level,
-enum log_flags flags, const char *text, size_t len)
-{
-   /* If the line gets too long, split it up in separate records. */
-   if (cont.len + len > sizeof(cont.buf)) {
-   cont_flush();
-   return false;
-   }
-
-   if (!cont.len) {
-   cont.facility = facility;
-   cont.level = level;
-   cont.caller_id = caller_id;
-   cont.ts_nsec = local_clock();
-   cont.flags = flags;
-   }
-
-   memcpy(cont.buf + cont.len, text, len);
-   cont.len += len;
-
-   // The original flags come from the first line,
-   // but later continuations can add a newline.
-   if (flags & LOG_NEWLINE) {
-   cont.flags |= LOG_NEWLINE;
-   cont_flush();
-   }
-
-   return true;
-}
-
 static size_t log_output(int facility, int level, enum log_flags lflags, const char *dict, size_t dictlen, char *text, size_t text_len)
 {
const u32 caller_id = printk_caller_id();
 
-   /*
-* If an earlier line was buffered, and we're a continuation
-* write from the same context, try to add it to the buffer.
-*/
-   if (cont.len) {
-   if (cont.caller_id == caller_id && (lflags & LOG_CONT)) {
-   if (cont_add(caller_id, facility, level, lflags, text, text_len))
-   return text_len;
-   }
-   /* Otherwise, make sure it's flushed */
-   cont_flush();
-   }
-
-   /* Skip empty continuation lines that couldn't be added - they just flush */
-   if (!text_len && (lflags & LOG_CONT))
-   return 0;
-
-   /* If it doesn't end in a newline, try to buffer the current line */
-   if (!(lflags & LOG_NEWLINE)) {
-   if (cont_add(caller_id, facility, level, lflags, text, text_len))
+   if (lflags & LOG_CONT) {
+   struct prb_reserved_entry e;
+   struct printk_record r;
+
+   prb_rec_init_wr(&r, text_len, 0);
+   if (prb_reserve_in_last(&e, prb, &r, caller_id)) {
+   memcpy(&r.text_buf[r.info->text_len], text, text_len);
+   r.info->text_len += text_len;
+   if (lflags & LOG_NEWLINE) {
+   r.info->flags |= LOG_NEWLINE;
+   prb_final_commit(&e);
+   } else {
+   prb_commit(&e);
+   }
return text_len;
+   }
}
 

[PATCH printk v5 5/6] printk: ringbuffer: add finalization/extension support

2020-09-14 Thread John Ogness
Add support for extending the newest data block. For this, introduce
a new finalization state (desc_finalized) denoting a committed
descriptor that cannot be extended.

Until a record is finalized, a writer can reopen that record to
append new data. Reopening a record means transitioning from the
desc_committed state back to the desc_reserved state.

A writer can explicitly finalize a record if there is no intention
of extending it. Also, records are automatically finalized when a
new record is reserved. This relieves writers of needing to
explicitly finalize while also making such records available to
readers sooner. (Readers can only traverse finalized records.)
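
The resulting descriptor life cycle can be summarized as a small transition
check. This is a sketch of the state machine as described in this commit
message, not code from the patch: enum sk_state and sk_transition_ok() are
hypothetical names, and the finalized to reusable and reusable to reserved
steps are assumed from the existing descriptor recycling.

```c
#include <assert.h>
#include <stdbool.h>

enum sk_state { SK_RESERVED, SK_COMMITTED, SK_FINALIZED, SK_REUSABLE };

/* Return true if a descriptor may move directly from @from to @to. */
bool sk_transition_ok(enum sk_state from, enum sk_state to)
{
    switch (from) {
    case SK_RESERVED:
        /* prb_commit() or prb_final_commit() */
        return to == SK_COMMITTED || to == SK_FINALIZED;
    case SK_COMMITTED:
        /* reopened by a writer, or finalized (explicitly or when a
         * newer record is reserved) */
        return to == SK_RESERVED || to == SK_FINALIZED;
    case SK_FINALIZED:
        /* assumed: only finalized records are recycled */
        return to == SK_REUSABLE;
    case SK_REUSABLE:
        /* assumed: a writer reserves the descriptor again */
        return to == SK_RESERVED;
    }
    return false;
}
```

The only backward edge is committed to reserved, which is the reopen
operation; once a record is finalized it can only move forward.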

Four new memory barrier pairs are introduced. Two of them are
insignificant additions (data_realloc:A/desc_read:D and
data_realloc:A/data_push_tail:B) because they are alternate path
memory barriers that exactly match the purpose, pairing, and
context of the two existing memory barrier pairs they provide an
alternate path for. The other two new memory barrier pairs are
significant additions:

desc_reopen_last:A / _prb_commit:B - When reopening a descriptor,
ensure the state transitions back to desc_reserved before
fully trusting the descriptor data.

_prb_commit:B / desc_reserve:D - When committing a descriptor,
ensure the state transitions to desc_committed before checking
the head ID to see if the descriptor needs to be finalized.

Signed-off-by: John Ogness 
---
 Documentation/admin-guide/kdump/gdbmacros.txt |   3 +-
 kernel/printk/printk_ringbuffer.c | 525 --
 kernel/printk/printk_ringbuffer.h |   6 +-
 scripts/gdb/linux/dmesg.py|   3 +-
 4 files changed, 480 insertions(+), 57 deletions(-)

diff --git a/Documentation/admin-guide/kdump/gdbmacros.txt b/Documentation/admin-guide/kdump/gdbmacros.txt
index 8f533b751c46..94fabb165abf 100644
--- a/Documentation/admin-guide/kdump/gdbmacros.txt
+++ b/Documentation/admin-guide/kdump/gdbmacros.txt
@@ -297,6 +297,7 @@ end
 define dmesg
# definitions from kernel/printk/printk_ringbuffer.h
set var $desc_committed = 1
+   set var $desc_finalized = 2
set var $desc_sv_bits = sizeof(long) * 8
set var $desc_flags_shift = $desc_sv_bits - 2
set var $desc_flags_mask = 3 << $desc_flags_shift
@@ -313,7 +314,7 @@ define dmesg
 
# skip non-committed record
set var $state = 3 & ($desc->state_var.counter >> $desc_flags_shift)
-   if ($state == $desc_committed)
+   if ($state == $desc_committed || $state == $desc_finalized)
dump_record $desc $prev_flags
set var $prev_flags = $desc->info.flags
end
diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
index 911fbe150e9a..4e526c79f89c 100644
--- a/kernel/printk/printk_ringbuffer.c
+++ b/kernel/printk/printk_ringbuffer.c
@@ -46,20 +46,26 @@
  * into a single descriptor field named @state_var, allowing ID and state to
  * be synchronously and atomically updated.
  *
- * Descriptors have three states:
+ * Descriptors have four states:
  *
  *   reserved
  * A writer is modifying the record.
  *
  *   committed
- * The record and all its data are complete and available for reading.
+ * The record and all its data are written. A writer can reopen the
+ * descriptor (transitioning it back to reserved), but in the committed
+ * state the data is consistent.
+ *
+ *   finalized
+ * The record and all its data are complete and available for reading. A
+ * writer cannot reopen the descriptor.
  *
  *   reusable
  * The record exists, but its text and/or dictionary data may no longer
  * be available.
  *
  * Querying the @state_var of a record requires providing the ID of the
- * descriptor to query. This can yield a possible fourth (pseudo) state:
+ * descriptor to query. This can yield a possible fifth (pseudo) state:
  *
  *   miss
  * The descriptor being queried has an unexpected ID.
@@ -79,6 +85,28 @@
  * committed or reusable queried state. This makes it possible that a valid
  * sequence number of the tail is always available.
  *
+ * Descriptor Finalization
+ * ~~~~~~~~~~~~~~~~~~~~~~~
+ * When a writer calls the commit function prb_commit(), record data is
+ * fully stored and is consistent within the ringbuffer. However, a writer can
+ * reopen that record, claiming exclusive access (as with prb_reserve()), and
+ * modify that record. When finished, the writer must again commit the record.
+ *
+ * In order for a record to be made available to readers (and also become
+ * recyclable for writers), it must be finalized. A finalized record cannot be
+ * reopened and can never become "unfinalized". Record finalization can occur
+ * in three different scenarios:
+ *
+ *   1) A writer can simultaneously commit and finalize its record by calling
+ *  prb_final_commit() i
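
The state check in the gdb hunk of this patch can be sketched in user-space C. This is only an illustration: the helper names (desc_state, make_state_var, desc_is_dumpable) are hypothetical, and only the two state values shown in the script (committed = 1, finalized = 2) are taken from the patch. The sketch assumes, as the script does, that the state occupies the top two bits of state_var.

```c
#include <limits.h>
#include <stdbool.h>

/* State values as given in the gdb script hunk of the patch. */
#define DESC_COMMITTED 1UL
#define DESC_FINALIZED 2UL

/* The state is kept in the two most significant bits of state_var. */
#define DESC_SV_BITS     (sizeof(unsigned long) * CHAR_BIT)
#define DESC_FLAGS_SHIFT (DESC_SV_BITS - 2)

/* Hypothetical helper: extract the 2-bit state from a raw state_var. */
static unsigned long desc_state(unsigned long state_var)
{
	return 3UL & (state_var >> DESC_FLAGS_SHIFT);
}

/* Hypothetical helper: combine an ID and a state into one state_var. */
static unsigned long make_state_var(unsigned long id, unsigned long state)
{
	unsigned long id_mask = ~(3UL << DESC_FLAGS_SHIFT);

	return (state << DESC_FLAGS_SHIFT) | (id & id_mask);
}

/* Mirrors the patched gdb condition: committed or finalized is dumped. */
static bool desc_is_dumpable(unsigned long state_var)
{
	unsigned long state = desc_state(state_var);

	return state == DESC_COMMITTED || state == DESC_FINALIZED;
}
```

Packing the ID and the state into one word is what lets both be updated atomically, as the comment block in the patch describes.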

[PATCH printk v5 3/6] printk: ringbuffer: clear initial reserved fields

2020-09-14 Thread John Ogness
prb_reserve() will set some meta data values and leave others
uninitialized (or rather, containing the values of the previous
wrap). Simplify the API by always clearing out all the fields.
Only the sequence number is filled in. The caller is now
responsible for filling in the rest of the meta data fields,
in particular the text and dict lengths.

Signed-off-by: John Ogness 
Reviewed-by: Petr Mladek 
---
 kernel/printk/printk.c            | 12 ++++++++----
 kernel/printk/printk_ringbuffer.c | 30 ++++++++++++++++++------------
 2 files changed, 26 insertions(+), 16 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index fec71229169e..964b5701688f 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -520,8 +520,11 @@ static int log_store(u32 caller_id, int facility, int level,
memcpy(&r.text_buf[0], text, text_len);
if (trunc_msg_len)
memcpy(&r.text_buf[text_len], trunc_msg, trunc_msg_len);
-   if (r.dict_buf)
+   r.info->text_len = text_len + trunc_msg_len;
+   if (r.dict_buf) {
memcpy(&r.dict_buf[0], dict, dict_len);
+   r.info->dict_len = dict_len;
+   }
r.info->facility = facility;
r.info->level = level & 7;
r.info->flags = flags & 0x1f;
@@ -1069,10 +1072,11 @@ static unsigned int __init add_to_rb(struct printk_ringbuffer *rb,
if (!prb_reserve(&e, rb, &dest_r))
return 0;
 
-   memcpy(&dest_r.text_buf[0], &r->text_buf[0], dest_r.text_buf_size);
+   memcpy(&dest_r.text_buf[0], &r->text_buf[0], r->info->text_len);
+   dest_r.info->text_len = r->info->text_len;
if (dest_r.dict_buf) {
-   memcpy(&dest_r.dict_buf[0], &r->dict_buf[0],
-  dest_r.dict_buf_size);
+   memcpy(&dest_r.dict_buf[0], &r->dict_buf[0], r->info->dict_len);
+   dest_r.info->dict_len = r->info->dict_len;
}
dest_r.info->facility = r->info->facility;
dest_r.info->level = r->info->level;
diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
index 6ee5ebce1450..82347abb22a5 100644
--- a/kernel/printk/printk_ringbuffer.c
+++ b/kernel/printk/printk_ringbuffer.c
@@ -146,10 +146,13 @@
  *
  * if (prb_reserve(&e, &test_rb, &r)) {
  * snprintf(r.text_buf, r.text_buf_size, "%s", textstr);
+ * r.info->text_len = strlen(textstr);
  *
  * // dictionary allocation may have failed
- * if (r.dict_buf)
+ * if (r.dict_buf) {
  * snprintf(r.dict_buf, r.dict_buf_size, "%s", dictstr);
+ * r.info->dict_len = strlen(dictstr);
+ * }
  *
  * r.info->ts_nsec = local_clock();
  *
@@ -1125,9 +1128,9 @@ static const char *get_data(struct prb_data_ring *data_ring,
  * @dict_buf_size is set to 0. Writers must check this before writing to
  * dictionary space.
  *
- * @info->text_len and @info->dict_len will already be set to @text_buf_size
- * and @dict_buf_size, respectively. If dictionary space reservation fails,
- * @info->dict_len is set to 0.
+ * Important: @info->text_len and @info->dict_len need to be set correctly by
+ *            the writer in order for data to be readable and/or extended.
+ *            Their values are initialized to 0.
  */
 bool prb_reserve(struct prb_reserved_entry *e, struct printk_ringbuffer *rb,
 struct printk_record *r)
@@ -1135,6 +1138,7 @@ bool prb_reserve(struct prb_reserved_entry *e, struct printk_ringbuffer *rb,
struct prb_desc_ring *desc_ring = &rb->desc_ring;
struct prb_desc *d;
unsigned long id;
+   u64 seq;
 
if (!data_check_size(&rb->text_data_ring, r->text_buf_size))
goto fail;
@@ -1159,6 +1163,14 @@ bool prb_reserve(struct prb_reserved_entry *e, struct printk_ringbuffer *rb,
 
d = to_desc(desc_ring, id);
 
+   /*
+* All @info fields (except @seq) are cleared and must be filled in
+* by the writer. Save @seq before clearing because it is used to
+* determine the new sequence number.
+*/
+   seq = d->info.seq;
+   memset(&d->info, 0, sizeof(d->info));
+
/*
 * Set the @e fields here so that prb_commit() can be used if
 * text data allocation fails.
@@ -1177,17 +1189,15 @@ bool prb_reserve(struct prb_reserved_entry *e, struct printk_ringbuffer *rb,
 * See the "Bootstrap" comment block in printk_ringbuffer.h for
 * details about how the initializer bootstraps the descriptors.
 */
-   if (d->info.seq == 0 && DESC_INDEX(desc_ring, id) != 0)
+   if (seq == 0 && DESC_INDEX(desc_ring, id) != 0)
d->info.seq = DESC_INDEX(desc_ring, id);
else
-   d->info.seq += DESCS_COUNT(desc_ring);
+   d->info.seq = seq + DESCS_COUNT(desc_ring);
 
r->text_buf = data_alloc(rb, &rb->tex
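
The save-then-clear step that this patch adds to prb_reserve() can be sketched outside the kernel as follows. This is a minimal sketch, not the kernel code: struct info is a simplified stand-in for the real meta data structure, and the desc_index and descs_count parameters are hypothetical stand-ins for the values that DESC_INDEX() and DESCS_COUNT() return in the patch.

```c
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the record meta data (assumed layout). */
struct info {
	uint64_t seq;
	uint64_t ts_nsec;
	uint16_t text_len;
	uint16_t dict_len;
	uint8_t  facility;
	uint8_t  flags;
	uint8_t  level;
};

/*
 * Clear all fields, then derive the new sequence number from the value
 * that was saved before clearing. The two branches follow the patched
 * condition in prb_reserve().
 */
static void reserve_info(struct info *info, unsigned long desc_index,
			 unsigned long descs_count)
{
	uint64_t seq = info->seq;	/* save before clearing */

	memset(info, 0, sizeof(*info));

	if (seq == 0 && desc_index != 0)
		info->seq = desc_index;
	else
		info->seq = seq + descs_count;
}
```

After the call, text_len and dict_len are 0, so a caller that forgets to set them publishes an empty record rather than stale lengths from the previous wrap.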

RE: makedumpfile: a feature question about filtering

2020-09-14 Thread 萩尾 一仁
-Original Message-
> On 09/14/2020 04:15 PM, HAGIO KAZUHITO(萩尾 一仁) wrote:
> > -Original Message-
> >> On 09/11/2020 04:53 PM, HAGIO KAZUHITO(萩尾 一仁) wrote:
> >>> Hi Pingfan,
> >>>
> >>> -Original Message-
> >>>> Hello,
> >>>>
> >>>> There is an appeal which only wants to save some user page including env
> >>>> and args pages, and discards the other user space pages.
> >>>
> >>> I understand that it's helpful to get them even with -d 31 for crash's
> >>> "ps -a" option..
> >>>
> >>>>
> >>>> To achieve this feature, mm_struct's members "arg_start, arg_end,
> >>>> env_start, env_end;" should be accessed. So we need to export mm_struct
> >>>> and init_mm through vmcore.
> >>>
> >>> How many offsets/sizes will be required to walk all tasks?
> >> At present, I think only the info "arg_start, arg_end, env_start,
> >> env_end" in mm_struct are required.
> >
> > ah what I wanted to ask mainly was the number of the offsets/sizes used to
> > walk through all (user) tasks in a system, because makedumpfile cannot get
> > to a task's arg_start only with OFFSET(mm_struct.arg_start).  Is it easy
> > enough to do it only with several vmcoreinfo entries?
> Yes, it is. Iterating over tasks requires to expose
> OFFSET(mm_struct.mmlist, and &init_mm. Then for each mm_struct, we need
> an access to "arg_start, arg_end, env_start,env_end"

Hmm, but a Fedora 32 machine has an empty init_mm.mmlist
(perhaps because no swap is in use?).

crash> p init_mm.mmlist
$1 = {
  next = 0x826ee200 , 
  prev = 0x826ee200 
}
crash> swap
SWAP_INFO_STRUCTTYPE   SIZE   USED PCT  PRI  FILENAME
8badb5385a00  PARTITION  4153340k  0k   0%   -2  /dev/dm-1

I might still be missing something.

Thanks,
Kazu
___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: makedumpfile: a feature question about filtering

2020-09-14 Thread piliu


On 09/14/2020 04:15 PM, HAGIO KAZUHITO(萩尾 一仁) wrote:
> -Original Message-
>> On 09/11/2020 04:53 PM, HAGIO KAZUHITO(萩尾 一仁) wrote:
>>> Hi Pingfan,
>>>
>>> -Original Message-
>>>> Hello,
>>>>
>>>> There is an appeal which only wants to save some user page including env
>>>> and args pages, and discards the other user space pages.
>>>
>>> I understand that it's helpful to get them even with -d 31 for crash's
>>> "ps -a" option..
>>>
>>>>
>>>> To achieve this feature, mm_struct's members "arg_start, arg_end,
>>>> env_start, env_end;" should be accessed. So we need to export mm_struct
>>>> and init_mm through vmcore.
>>>
>>> How many offsets/sizes will be required to walk all tasks?
>> At present, I think only the info "arg_start, arg_end, env_start,
>> env_end" in mm_struct are required.
> 
> ah what I wanted to ask mainly was the number of the offsets/sizes used to
> walk through all (user) tasks in a system, because makedumpfile cannot get
> to a task's arg_start only with OFFSET(mm_struct.arg_start).  Is it easy
> enough to do it only with several vmcoreinfo entries?
Yes, it is. Iterating over tasks requires exposing
OFFSET(mm_struct.mmlist) and &init_mm. Then, for each mm_struct, we need
access to "arg_start, arg_end, env_start, env_end".

Thanks,
Pingfan




RE: makedumpfile: a feature question about filtering

2020-09-14 Thread 萩尾 一仁
-Original Message-
> On 09/11/2020 04:53 PM, HAGIO KAZUHITO(萩尾 一仁) wrote:
> > Hi Pingfan,
> >
> > -Original Message-
> >> Hello,
> >>
> >> There is an appeal which only wants to save some user page including env
> >> and args pages, and discards the other user space pages.
> >
> > I understand that it's helpful to get them even with -d 31 for crash's
> > "ps -a" option..
> >
> >>
> >> To achieve this feature, mm_struct's members "arg_start, arg_end,
> >> env_start, env_end;" should be accessed. So we need to export mm_struct
> >> and init_mm through vmcore.
> >
> > How many offsets/sizes will be required to walk all tasks?
> At present, I think only the info "arg_start, arg_end, env_start,
> env_end" in mm_struct are required.

Ah, what I mainly wanted to ask was how many offsets/sizes are needed to
walk through all (user) tasks in a system, because makedumpfile cannot get
to a task's arg_start with OFFSET(mm_struct.arg_start) alone.  Is it easy
enough to do with only a few vmcoreinfo entries?
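
To illustrate why a field offset alone is not enough, here is a rough sketch of an offset-driven list walk over a raw memory image. It assumes a circular doubly linked list with an embedded list head, and everything in it is hypothetical: struct offsets plays the role of the vmcoreinfo OFFSET() entries, addresses are plain byte offsets into the image, and read_u64 and walk_list are not makedumpfile functions. Which kernel list would actually be walked is still the open question in this thread.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical offset table, in the role of vmcoreinfo OFFSET() entries. */
struct offsets {
	size_t list_next;	/* offset of .next inside the list head */
	size_t node_list;	/* offset of the embedded list head in a record */
	size_t node_value;	/* offset of the wanted field, e.g. arg_start */
};

/* Read one 64-bit word; addresses are byte offsets into the image. */
static uint64_t read_u64(const uint8_t *image, uint64_t addr)
{
	uint64_t v;

	memcpy(&v, image + addr, sizeof(v));
	return v;
}

/*
 * Walk the circular list anchored at head_addr and copy the wanted field
 * of each record into out[]. Returns the number of records visited.
 */
static size_t walk_list(const uint8_t *image, uint64_t head_addr,
			const struct offsets *off, uint64_t *out, size_t max)
{
	size_t n = 0;
	uint64_t pos = read_u64(image, head_addr + off->list_next);

	while (pos != head_addr && n < max) {
		/* Step back from the list head to the record start. */
		uint64_t rec = pos - off->node_list;

		out[n++] = read_u64(image, rec + off->node_value);
		pos = read_u64(image, pos + off->list_next);
	}
	return n;
}
```

The walk needs three inputs besides the field offset: the address of the list anchor, the offset of the link inside each record, and the offset of next inside the link. A list whose anchor points back to itself yields zero records.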

Thanks,
Kazu

> > If kernel maintainers accept it, I will not oppose that feature..
> OK, I will start with kernel side as the first step.
> >
> > (or it would be simpler to mark the pages something special when allocating,
> > but I don't feel like it's easy to change kernel to do so.)
> Yes, I agree. From kernel side, the page is a normal user space page,
> there should not be exception.
> 
> Thanks,
> Pingfan
