[PATCH][next] kexec_file: Assign array_size() to a variable

2020-07-10 Thread Gustavo A. R. Silva
Assign array_size() to variable _size_ and use it in both vzalloc()
and memcpy(). These sorts of multiplication factors need to be wrapped
in array_size().

This issue was found with the help of Coccinelle and, audited and fixed
manually.

Signed-off-by: Gustavo A. R. Silva 
---
 kernel/kexec_file.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 09cc78df53c6..4479d864aaf2 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -883,16 +883,16 @@ static int kexec_purgatory_setup_sechdrs(struct 
purgatory_info *pi,
unsigned long offset;
Elf_Shdr *sechdrs;
int i;
+   size_t size = array_size(sizeof(Elf_Shdr), pi->ehdr->e_shnum);
 
/*
 * The section headers in kexec_purgatory are read-only. In order to
 * have them modifiable make a temporary copy.
 */
-   sechdrs = vzalloc(array_size(sizeof(Elf_Shdr), pi->ehdr->e_shnum));
+   sechdrs = vzalloc(size);
if (!sechdrs)
return -ENOMEM;
-   memcpy(sechdrs, (void *)pi->ehdr + pi->ehdr->e_shoff,
-  pi->ehdr->e_shnum * sizeof(Elf_Shdr));
+   memcpy(sechdrs, (void *)pi->ehdr + pi->ehdr->e_shoff, size);
pi->sechdrs = sechdrs;
 
offset = 0;
-- 
2.27.0


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v4 0/4] printk: replace ringbuffer

2020-07-10 Thread Petr Mladek
On Fri 2020-07-10 11:58:34, John Ogness wrote:
> On 2020-07-10, Petr Mladek  wrote:
> >> The next series in the printk-rework (move LOG_CONT handling from
> >> writers to readers) makes some further changes that, while not
> >> incompatible, could affect the output of existing tools. It may be a
> >> good idea to let the new ringbuffer sit in linux-next until the next
> >> series has been discussed/reviewed/merged. After the next series,
> >> everything will be in place (with regard to userspace tools) to
> >> finish the rework.
> >
> > I know that it might be premature question. But I wonder what kind
> > of changes are expected because of the continuous lines.
> 
> I will be posting the next series quite soon, so I think it will be
> better to discuss it when we have a working example in front of us.

sure

> > Do you expect some changes in the ring buffer structures so that
> > the debugging tools would need yet another update to actually
> > access the data?
>
> The next series will be modifying the ringbuffer to allow data-less
> records. This is necessary to support the thousands of
> 
> pr_cont("\n");
> 
> calls in the kernel code. Failed dataring allocations will still be
> detected because the message flags for those records will be 0. For the
> above pr_cont() line, they will be LOG_NEWLINE|LOG_CONT.
> 
> Since the dump tools need to make changes for the new ringbuffer anyway,
> I think it would be good to hammer out the accepted LOG_CONT
> implementation first, just in case we do need to make any subtle
> internal changes.

Makes sense.


> > Or do you expect backward compatible changes that would allow
> > to pass related parts of the continuous lines via syslog/dev_kmsg
> > interface and join them later in userspace?
> 
> For users of console, non-extended netconsole, syslog, and kmsg_dump,
> there will be no external changes whatsoever. These interfaces have no
> awareness of sequence numbers, which will allow the kernel to
> re-assemble the LOG_CONT messages for them.

yup

> Users of /dev/kmsg and extended netconsole see sequence numbers. Offlist
> we discussed various hacks how to get around this without causing errors
> for existing software, but it was all ugly.
> 
> IMHO users of these sequence number interfaces need to see all the
> records individually and reassemble the LOG_CONT messages themselves if
> they want to. I believe that is the only sane path forward. To do this,
> the caller id will no longer be optional to the sequence number output
> since that is vital information to re-assemble the LOG_CONT messages.
> 
> Keep in mind that current software already needs to be able to handle
> the caller id being shown. Also, currently in mainline there is no
> guarantee that LOG_CONT messages are contiguous. So current software
> must also be ready to accept broken up LOG_CONT messages. This is why I
> think it would be acceptable to make this change for /dev/kmsg and
> extended netconsole. But I understand it is controversial since tools
> like systemd and dmesg use /dev/kmsg. Until they are modified to
> re-assemble LOG_CONT messages, they will present the user with the
> ugliness of LOG_CONT pieces (always, rather than as is now rarely).

I am not completely sure how often newline is used in common messages.
I hope that it is more frequently used in a debug code that has to
dump many details.

If my assumption is correct then your proposed approach should be well
acceptable. Normal people will not notice. Developers should be able
to deal with it, especially when a reasonable output might be seen
on consoles, via dmesg or crashdumnp.

Anyway, the proposal looks fine to me. I hope that it will work :-)
Let's continue the discussion when you send the patchset.

Best Regards,
Petr

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v4 0/4] printk: replace ringbuffer

2020-07-10 Thread John Ogness
On 2020-07-10, Petr Mladek  wrote:
>> The next series in the printk-rework (move LOG_CONT handling from
>> writers to readers) makes some further changes that, while not
>> incompatible, could affect the output of existing tools. It may be a
>> good idea to let the new ringbuffer sit in linux-next until the next
>> series has been discussed/reviewed/merged. After the next series,
>> everything will be in place (with regard to userspace tools) to
>> finish the rework.
>
> I know that it might be premature question. But I wonder what kind
> of changes are expected because of the continuous lines.

I will be posting the next series quite soon, so I think it will be
better to discuss it when we have a working example in front of us.

> Do you expect some changes in the ring buffer structures so that
> the debugging tools would need yet another update to actually
> access the data?

The next series will be modifying the ringbuffer to allow data-less
records. This is necessary to support the thousands of

pr_cont("\n");

calls in the kernel code. Failed dataring allocations will still be
detected because the message flags for those records will be 0. For the
above pr_cont() line, they will be LOG_NEWLINE|LOG_CONT.

Since the dump tools need to make changes for the new ringbuffer anyway,
I think it would be good to hammer out the accepted LOG_CONT
implementation first, just in case we do need to make any subtle
internal changes.

> Or do you expect backward compatible changes that would allow
> to pass related parts of the continuous lines via syslog/dev_kmsg
> interface and join them later in userspace?

For users of console, non-extended netconsole, syslog, and kmsg_dump,
there will be no external changes whatsoever. These interfaces have no
awareness of sequence numbers, which will allow the kernel to
re-assemble the LOG_CONT messages for them.

Users of /dev/kmsg and extended netconsole see sequence numbers. Offlist
we discussed various hacks how to get around this without causing errors
for existing software, but it was all ugly.

IMHO users of these sequence number interfaces need to see all the
records individually and reassemble the LOG_CONT messages themselves if
they want to. I believe that is the only sane path forward. To do this,
the caller id will no longer be optional to the sequence number output
since that is vital information to re-assemble the LOG_CONT messages.

Keep in mind that current software already needs to be able to handle
the caller id being shown. Also, currently in mainline there is no
guarantee that LOG_CONT messages are contiguous. So current software
must also be ready to accept broken up LOG_CONT messages. This is why I
think it would be acceptable to make this change for /dev/kmsg and
extended netconsole. But I understand it is controversial since tools
like systemd and dmesg use /dev/kmsg. Until they are modified to
re-assemble LOG_CONT messages, they will present the user with the
ugliness of LOG_CONT pieces (always, rather than as is now rarely).

John Ogness

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v4 0/4] printk: replace ringbuffer

2020-07-10 Thread Petr Mladek
On Thu 2020-07-09 09:09:31, John Ogness wrote:
> On 2020-07-08, Petr Mladek  wrote:
> > OK, I think that we are ready to try this in linux-next.
> > I am going to push it there via printk/linux.git.
> >
> > [...]
> > 
> > Of course, there are still many potential problems. The following comes
> > to my mind:
> >
> > [...]
> >
> >+ Debugging tools accessing the buffer directly would need to
> >  understand the new structure. Fortunately John provided
> >  patches for the most prominent ones.
> 
> The next series in the printk-rework (move LOG_CONT handling from
> writers to readers) makes some further changes that, while not
> incompatible, could affect the output of existing tools. It may be a
> good idea to let the new ringbuffer sit in linux-next until the next
> series has been discussed/reviewed/merged. After the next series,
> everything will be in place (with regard to userspace tools) to finish
> the rework.

I know that it might be premature question. But I wonder what kind
of changes are expected because of the continuous lines.

Do you expect some changes in the ring buffer structures so that
the debugging tools would need yet another update to actually
access the data?

Or do you expect backward compatible changes that would allow
to pass related parts of the continuous lines via syslog/dev_kmsg
interface and join them later in userspace?

IMHO, it would make sense to wait only when the structures need
some modification. Concatenating related parts on the userspace
side will need to stay optional anyway.

As I say, this might be premature question. I just do not want
to unnecessary delay mainlining the current state. It would get
much wider testing there. And it is great when the changes might
be done in "small" steps.

Best Regards,
Petr

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v5 2/4] printk: add lockless ringbuffer

2020-07-10 Thread Sergey Senozhatsky
On (20/07/10 17:43), Sergey Senozhatsky wrote:
> [..]
> 
> > +void prb_init(struct printk_ringbuffer *rb,
> > + char *text_buf, unsigned int textbits,
> > + char *dict_buf, unsigned int dictbits,
> > + struct prb_desc *descs, unsigned int descbits)
> > +{
> > +   memset(descs, 0, _DESCS_COUNT(descbits) * sizeof(descs[0]));
> > +
> > +   rb->desc_ring.count_bits = descbits;
> > +   rb->desc_ring.descs = descs;
> > +   atomic_long_set(&rb->desc_ring.head_id, DESC0_ID(descbits));
> > +   atomic_long_set(&rb->desc_ring.tail_id, DESC0_ID(descbits));
> > +
> > +   rb->text_data_ring.size_bits = textbits;
> > +   rb->text_data_ring.data = text_buf;
> > +   atomic_long_set(&rb->text_data_ring.head_lpos, BLK0_LPOS(textbits));
> > +   atomic_long_set(&rb->text_data_ring.tail_lpos, BLK0_LPOS(textbits));
> > +
> > +   rb->dict_data_ring.size_bits = dictbits;
> > +   rb->dict_data_ring.data = dict_buf;
> > +   atomic_long_set(&rb->dict_data_ring.head_lpos, BLK0_LPOS(dictbits));
> > +   atomic_long_set(&rb->dict_data_ring.tail_lpos, BLK0_LPOS(dictbits));
> > +
> 
> Just a side note: some people want !CONFIG_PRINTK builds. I wonder
> how many people will want !CONFIG_PRINTK_DICT. The core logbuf/dict
> logbuf split is really cool.

We also may be can add a dedicated logbuf for the user-space logs,
as opposed to current "user space can write to the kernel logbuf".
Some folks really dislike that systemd, for instance, can trash
the core kernel logbuf doing a lot of /dev/kmsg writes.

-ss

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v5 2/4] printk: add lockless ringbuffer

2020-07-10 Thread Sergey Senozhatsky
On (20/07/09 15:29), John Ogness wrote:
[..]
> +/*
> + * A data block: mapped directly to the beginning of the data block area
> + * specified as a logical position within the data ring.
> + *
> + * @id:   the ID of the associated descriptor
> + * @data: the writer data
> + *
> + * Note that the size of a data block is only known by its associated
> + * descriptor.
> + */
> +struct prb_data_block {
> + unsigned long   id;
> + chardata[0];
> +};

A nit: I think someone will send "Replace zero-length arrays with
flexible array member" soon enough:

-   chardata[0];
+   chardata[];

[..]
> +/*
> + * Sanity checker for reserve size. The ringbuffer code assumes that a data
> + * block does not exceed the maximum possible size that could fit within the
> + * ringbuffer. This function provides that basic size check so that the
> + * assumption is safe.
> + *
> + * Writers are also not allowed to write 0-sized (data-less) records. Such
> + * records are used only internally by the ringbuffer.
> + */
> +static bool data_check_size(struct prb_data_ring *data_ring, unsigned int 
> size)
> +{
> + struct prb_data_block *db = NULL;
> +
> + /*
> +  * Writers are not allowed to write data-less records. Such records
> +  * are used only internally by the ringbuffer to denote records where
> +  * their data failed to allocate or have been lost.
> +  */

A nit: The same data-less records comment is some 5 lines earlier. But OK.

> + if (size == 0)
> + return false;

[..]

> +void prb_init(struct printk_ringbuffer *rb,
> +   char *text_buf, unsigned int textbits,
> +   char *dict_buf, unsigned int dictbits,
> +   struct prb_desc *descs, unsigned int descbits)
> +{
> + memset(descs, 0, _DESCS_COUNT(descbits) * sizeof(descs[0]));
> +
> + rb->desc_ring.count_bits = descbits;
> + rb->desc_ring.descs = descs;
> + atomic_long_set(&rb->desc_ring.head_id, DESC0_ID(descbits));
> + atomic_long_set(&rb->desc_ring.tail_id, DESC0_ID(descbits));
> +
> + rb->text_data_ring.size_bits = textbits;
> + rb->text_data_ring.data = text_buf;
> + atomic_long_set(&rb->text_data_ring.head_lpos, BLK0_LPOS(textbits));
> + atomic_long_set(&rb->text_data_ring.tail_lpos, BLK0_LPOS(textbits));
> +
> + rb->dict_data_ring.size_bits = dictbits;
> + rb->dict_data_ring.data = dict_buf;
> + atomic_long_set(&rb->dict_data_ring.head_lpos, BLK0_LPOS(dictbits));
> + atomic_long_set(&rb->dict_data_ring.tail_lpos, BLK0_LPOS(dictbits));
> +

Just a side note: some people want !CONFIG_PRINTK builds. I wonder
how many people will want !CONFIG_PRINTK_DICT. The core logbuf/dict
logbuf split is really cool.

-ss

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v5 0/4] printk: replace ringbuffer

2020-07-10 Thread Petr Mladek
On Thu 2020-07-09 15:29:40, John Ogness wrote:
> Hello,
> 
> Here is a v5 for the first series to rework the printk
> subsystem. The v4 is here [0]. This first series
> only replaces the existing ringbuffer implementation. No locking
> is removed. The semantics/behavior of printk are kept the same
> except for a minor optimization that is reverted (patch 3).
> 
> John Ogness (4):
>   crash: add VMCOREINFO macro to define offset in a struct declared by
> typedef
>   printk: add lockless ringbuffer
>   Revert "printk: lock/unlock console only for new logbuf entries"
>   printk: use the lockless ringbuffer

The patchset is committed in printk/linux.git, branch printk-rework.

I did not add any target kernel version into the topic branch name.
We could use it for the entire rework. The pieces would go into
mainline when we and linux-next are happy.

Best Regards,
Petr

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] sadump, kaslr: fix failure of calculating kaslr_offset due to an sadump format restriction

2020-07-10 Thread Kazuhito Hagio
On Thu, Jul 9, 2020 at 6:30 PM HATAYAMA Daisuke  wrote:
>
> We faced recently a memory dump collected by sadump where unused part
> of register values are non-zero. For the crash dump, calculating
> kaslr_offset fails because it is based on the assumption that unused
> part of register values in the sadump format are always zero cleared.
>
> The problem is that used and unused part of register values are
> rigorously indistinguishable in the sadump format. Although there is
> kernel data structure that represents a map between logical cpu
> numbers and lapic ids, they cannot be used in order to calculate
> kaslr_offset.
>
> To fix this, we have no choice but use a trial-and-error approach: try
> to use each entry of register values in order until we find a good
> pair of cr3 and idtr by which we can refer to linux_banner symbol as
> expected.
>
> Signed-off-by: HATAYAMA Daisuke 

I've replaced some indent spaces with tabs and merged:
https://github.com/makedumpfile/makedumpfile/commit/3c0cf7a93cff83f1e711e241eb47fcb096a451c5

Thanks,
Kazu

> ---
>  sadump_info.c | 140 
> --
>  1 file changed, 97 insertions(+), 43 deletions(-)
>
> diff --git a/sadump_info.c b/sadump_info.c
> index 46867ce..aa9a048 100644
> --- a/sadump_info.c
> +++ b/sadump_info.c
> @@ -101,6 +101,7 @@ static int lookup_diskset(unsigned long long 
> whole_offset, int *diskid,
>   unsigned long long *disk_offset);
>  static int max_mask_cpu(void);
>  static int cpu_online_mask_init(void);
> +static int linux_banner_sanity_check(ulong cr3);
>  static int per_cpu_init(void);
>  static int get_data_from_elf_note_desc(const char *note_buf, uint32_t 
> n_descsz,
>char *name, uint32_t n_type, char 
> **data);
> @@ -1293,6 +1294,30 @@ finish:
> return ret;
>  }
>
> +static int linux_banner_sanity_check(ulong cr3)
> +{
> +   unsigned long linux_banner_paddr;
> +   char buf[sizeof("Linux version")];
> +
> +   linux_banner_paddr = vtop4_x86_64_pagetable(SYMBOL(linux_banner), 
> cr3);
> +   if (linux_banner_paddr == NOT_PADDR) {
> +   DEBUG_MSG("sadump: linux_banner address translation 
> failed\n");
> +   return FALSE;
> +   }
> +
> +   if (!readmem(PADDR, linux_banner_paddr, &buf, sizeof(buf))) {
> +   DEBUG_MSG("sadump: reading linux_banner failed\n");
> +   return FALSE;
> +   }
> +
> +   if (!STRNEQ(buf, "Linux version")) {
> +   DEBUG_MSG("sadump: linux_banner sanity check failed\n");
> +   return FALSE;
> +   }
> +
> +   return TRUE;
> +}
> +
>  /*
>   * Calculate kaslr_offset and phys_base
>   *
> @@ -1370,59 +1395,86 @@ calc_kaslr_offset(void)
>  {
> struct sadump_header *sh = si->sh_memory;
> uint64_t idtr = 0, cr3 = 0, idtr_paddr;
> -   struct sadump_smram_cpu_state smram, zero;
> +   struct sadump_smram_cpu_state smram;
> int apicid;
> unsigned long divide_error_vmcore, divide_error_vmlinux;
> unsigned long kaslr_offset, phys_base;
> unsigned long kaslr_offset_kdump, phys_base_kdump;
> +   int sanity_check_passed = FALSE;
>
> -   memset(&zero, 0, sizeof(zero));
> for (apicid = 0; apicid < sh->nr_cpus; ++apicid) {
> -   if (!get_smram_cpu_state(apicid, &smram)) {
> -   ERRMSG("get_smram_cpu_state error\n");
> +
> +DEBUG_MSG("sadump: apicid: %d\n", apicid);
> +
> +if (!get_smram_cpu_state(apicid, &smram)) {
> +ERRMSG("get_smram_cpu_state error\n");
> +return FALSE;
> +}
> +
> +idtr = 
> ((uint64_t)smram.IdtUpper)<<32|(uint64_t)smram.IdtLower;
> +
> +if (!smram.Cr3 || !idtr) {
> +DEBUG_MSG("sadump: cr3: %lx idt: %lx, skipped\n",
> +  smram.Cr3,
> +  idtr);
> +continue;
> +}
> +
> +if ((SYMBOL(pti_init) != NOT_FOUND_SYMBOL) ||
> +(SYMBOL(kaiser_init) != NOT_FOUND_SYMBOL))
> +cr3 = smram.Cr3 & 
> ~(CR3_PCID_MASK|PTI_USER_PGTABLE_MASK);
> +else
> +cr3 = smram.Cr3 & ~CR3_PCID_MASK;
> +
> +/* Convert virtual address of IDT table to physical address 
> */
> +   idtr_paddr = vtop4_x86_64_pagetable(idtr, cr3);
> +if (idtr_paddr == NOT_PADDR) {
> +DEBUG_MSG("sadump: converting IDT physical address "
> + "failed.\n");
> +continue;
> +}
> +
> +   /* Now we can calculate kaslr_offset and phys_base */
> +   divide_error_vmlinux = SYMBOL(divide_error);
> +   divide_error_vmcore = get_vec0_addr(idtr