Re: [PATCH] arm64/defconfig: Enable CONFIG_KEXEC_FILE

2020-04-29 Thread Bhupesh Sharma
On Tue, Apr 28, 2020 at 3:37 PM Catalin Marinas  wrote:
>
> On Tue, Apr 28, 2020 at 01:55:58PM +0530, Bhupesh Sharma wrote:
> > On Wed, Apr 8, 2020 at 4:17 PM Mark Rutland  wrote:
> > > On Tue, Apr 07, 2020 at 04:01:40AM +0530, Bhupesh Sharma wrote:
> > > >  arch/arm64/configs/defconfig | 1 +
> > > >  1 file changed, 1 insertion(+)
> > > >
> > > > diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
> > > > index 24e534d85045..fa122f4341a2 100644
> > > > --- a/arch/arm64/configs/defconfig
> > > > +++ b/arch/arm64/configs/defconfig
> > > > @@ -66,6 +66,7 @@ CONFIG_SCHED_SMT=y
> > > >  CONFIG_NUMA=y
> > > >  CONFIG_SECCOMP=y
> > > >  CONFIG_KEXEC=y
> > > > +CONFIG_KEXEC_FILE=y
> > > >  CONFIG_CRASH_DUMP=y
> > > >  CONFIG_XEN=y
> > > >  CONFIG_COMPAT=y
> > > > --
> > > > 2.7.4
> >
> > Thanks a lot, Mark.
> >
> > Hi Catalin, Will,
> >
> > Can you please help pick this patch into the arm64 tree? We have an
> > increasing number of use-cases from distro users who want to use
> > kexec_file_load() as the default interface for kexec/kdump on arm64.
>
> We could pick it up if it doesn't conflict with the arm-soc tree. They
> tend to pick most of the defconfig changes these days (and could as well
> pick this one).

Thanks Catalin.
(+Cc Arnd)

Hi Arnd,

Can you please help pick this change via the arm-soc tree?

Thanks,
Bhupesh


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: Re: [RESEND PATCH v5 2/5] arm64/crash_core: Export TCR_EL1.T1SZ in vmcoreinfo

2020-04-29 Thread Scott Branden

Hi Bhupesh,

On 2020-02-23 10:25 p.m., Bhupesh Sharma wrote:

Hi Amit,

On Fri, Feb 21, 2020 at 2:36 PM Amit Kachhap  wrote:

Hi Bhupesh,

On 1/13/20 5:44 PM, Bhupesh Sharma wrote:

Hi James,

On 01/11/2020 12:30 AM, Dave Anderson wrote:

- Original Message -

Hi Bhupesh,

On 25/12/2019 19:01, Bhupesh Sharma wrote:

On 12/12/2019 04:02 PM, James Morse wrote:

On 29/11/2019 19:59, Bhupesh Sharma wrote:

vabits_actual variable on arm64 indicates the actual VA space size,
and allows a single binary to support both 48-bit and 52-bit VA
spaces.

If the ARMv8.2-LVA optional feature is present, and we are running
with a 64KB page size; then it is possible to use 52-bits of address
space for both userspace and kernel addresses. However, any kernel
binary that supports 52-bit must also be able to fall back to 48-bit
at early boot time if the hardware feature is not present.

Since TCR_EL1.T1SZ indicates the size offset of the memory region
addressed by TTBR1_EL1 (and hence can be used for determining the
vabits_actual value) it makes more sense to export the same in
vmcoreinfo rather than vabits_actual variable, as the name of the
variable can change in future kernel versions, but the architectural
constructs like TCR_EL1.T1SZ can be used better to indicate intended
specific fields to user-space.

User-space utilities like makedumpfile and crash-utility, need to
read/write this value from/to vmcoreinfo

(write?)

Yes, also write, so that the vmcoreinfo from a (crashing) arm64 system can
be used for analysis of the root cause of the panic/crash on, say, an x86_64
host using utilities like crash-utility/gdb.

I read this as "User-space [...] needs to write to vmcoreinfo".

That's correct. But for writing to vmcore dump in the kdump kernel, we
need to read the symbols from the vmcoreinfo in the primary kernel.


for determining if a virtual address lies in the linear map range.

I think this is a fragile example. The debugger shouldn't need to know this.

Well, that's the current user-space utility design, so I am not sure we can
tweak that too much.


The user-space computation for determining whether an address lies in
the linear map range is the same as we have in kernel-space:

 #define __is_lm_address(addr)	(!(((u64)addr) & BIT(vabits_actual - 1)))

This was changed with 14c127c957c1 ("arm64: mm: Flip kernel VA space"). If
user-space tools rely on 'knowing' the kernel memory layout, they constantly
have to be fixed and updated. This is a poor argument for adding this to
something that ends up as ABI.

See above. The user-space has to rely on some ABI/guaranteed hardware
symbols which can be used for 'determining' the kernel memory layout.

I disagree. Everything and anything in the kernel will change. The ABI rules
apply to stuff exposed via syscalls and kernel filesystems. It does not apply
to kernel internals, like the memory layout we used yesterday. 14c127c957c1
is a case in point.

A debugger trying to rely on this sort of thing would have to play catchup
whenever it changes.

Exactly.  That's the whole point.

The crash utility and makedumpfile are not in the same league as other
user-space tools. They have always had to "play catchup" precisely because
they depend upon kernel internals, which constantly change.

I agree with you and DaveA here. Software user-space debuggers are
dependent on kernel internals (which can change from time-to-time) and
will have to play catch-up (which has been the case since the very start).

Unfortunately we don't have any clear ABI for software debugging tools -
maybe something to look at in the future.

A case in point is gdb/kgdb, which still needs to run with KASLR turned off
(nokaslr) for debugging, as KASLR confuses gdb, which resolves kernel symbol
addresses from the symbol table of vmlinux. But we can work around this in
makedumpfile/crash by reading the 'kaslr_offset' value. And I have several
users telling me now that they cannot use gdb to debug panics on a
KASLR-enabled kernel, but can use the makedumpfile + crash combination to
achieve the same.

So, we should be looking to fix these utilities which are broken since
the 52-bit changes for arm64. Accordingly, I will try to send the v6
soon while incorporating the comments posted on the v5.

Any update on the next v6 version? Since this patch series fixes the
currently broken kdump, we need it in order to add some more fields in
vmcoreinfo for the Pointer Authentication work.

Sorry for the delay. I was caught up in some other urgent arm64
user-space issues.
I am preparing the v6 now and hopefully will be able to post it out
for review later today.


Did v6 get sent out?



Thanks,
Bhupesh



Regards,
Scott



[RFC PATCH 1/1] printk: add support for lockless ringbuffer

2020-04-29 Thread John Ogness
Linux is moving to a new lockless ringbuffer. The new ringbuffer is
structured completely differently from the previous iterations. Add support
for retrieving the ringbuffer from debug information and/or using
vmcoreinfo. The new ringbuffer is detected based on the availability of the
"prb" symbol.

Signed-off-by: John Ogness 
---
 Makefile   |   2 +-
 dwarf_info.c   |  36 +-
 makedumpfile.c | 101 ++--
 makedumpfile.h |  25 +++
 printk.c   | 177 +
 5 files changed, 333 insertions(+), 8 deletions(-)
 create mode 100644 printk.c

diff --git a/Makefile b/Makefile
index ef20672..dc4ae3e 100644
--- a/Makefile
+++ b/Makefile
@@ -45,7 +45,7 @@ CFLAGS_ARCH += -m32
 endif
 
 SRC_BASE = makedumpfile.c makedumpfile.h diskdump_mod.h sadump_mod.h sadump_info.h
-SRC_PART = print_info.c dwarf_info.c elf_info.c erase_info.c sadump_info.c cache.c tools.c
+SRC_PART = print_info.c dwarf_info.c elf_info.c erase_info.c sadump_info.c cache.c tools.c printk.c
 OBJ_PART=$(patsubst %.c,%.o,$(SRC_PART))
 SRC_ARCH = arch/arm.c arch/arm64.c arch/x86.c arch/x86_64.c arch/ia64.c arch/ppc64.c arch/s390x.c arch/ppc.c arch/sparc64.c
 OBJ_ARCH=$(patsubst %.c,%.o,$(SRC_ARCH))
diff --git a/dwarf_info.c b/dwarf_info.c
index e42a9f5..543588b 100644
--- a/dwarf_info.c
+++ b/dwarf_info.c
@@ -614,6 +614,7 @@ search_structure(Dwarf_Die *die, int *found)
 {
int tag;
const char *name;
+   Dwarf_Die die_type;
 
/*
 * If we get to here then we don't have any more
@@ -622,9 +623,31 @@ search_structure(Dwarf_Die *die, int *found)
do {
tag  = dwarf_tag(die);
name = dwarf_diename(die);
-   if ((tag != DW_TAG_structure_type) || (!name)
-   || strcmp(name, dwarf_info.struct_name))
+   if ((!name) || strcmp(name, dwarf_info.struct_name))
+   continue;
+
+   if (tag == DW_TAG_typedef) {
+   if (!get_die_type(die, &die_type)) {
+   ERRMSG("Can't get CU die of DW_AT_type.\n");
+   break;
+   }
+
+   /* Resolve typedefs of typedefs. */
+   while ((tag = dwarf_tag(&die_type)) == DW_TAG_typedef) {
+   if (!get_die_type(&die_type, &die_type)) {
+   ERRMSG("Can't get CU die of DW_AT_type.\n");
+   return;
+   }
+   }
+
+   if (tag != DW_TAG_structure_type)
+   continue;
+   die = &die_type;
+
+   } else if (tag != DW_TAG_structure_type) {
continue;
+   }
+
/*
 * Skip if DW_AT_byte_size is not included.
 */
@@ -740,6 +763,15 @@ search_typedef(Dwarf_Die *die, int *found)
ERRMSG("Can't get CU die of DW_AT_type.\n");
break;
}
+
+   /* Resolve typedefs of typedefs. */
+   while ((tag = dwarf_tag(&die_type)) == DW_TAG_typedef) {
+   if (!get_die_type(&die_type, &die_type)) {
+   ERRMSG("Can't get CU die of DW_AT_type.\n");
+   return;
+   }
+   }
+
 dwarf_info.struct_size = dwarf_bytesize(&die_type);
if (dwarf_info.struct_size <= 0)
continue;
diff --git a/makedumpfile.c b/makedumpfile.c
index f5860a1..6192ee7 100644
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -1555,6 +1555,7 @@ get_symbol_info(void)
SYMBOL_INIT(node_data, "node_data");
SYMBOL_INIT(pgdat_list, "pgdat_list");
SYMBOL_INIT(contig_page_data, "contig_page_data");
+   SYMBOL_INIT(prb, "prb");
SYMBOL_INIT(log_buf, "log_buf");
SYMBOL_INIT(log_buf_len, "log_buf_len");
SYMBOL_INIT(log_end, "log_end");
@@ -1971,16 +1972,46 @@ get_structure_info(void)
OFFSET_INIT(elf64_phdr.p_memsz, "elf64_phdr", "p_memsz");
 
SIZE_INIT(printk_log, "printk_log");
-   if (SIZE(printk_log) != NOT_FOUND_STRUCTURE) {
+   SIZE_INIT(printk_ringbuffer, "printk_ringbuffer");
+   if ((SIZE(printk_ringbuffer) != NOT_FOUND_STRUCTURE)) {
+   info->flag_use_printk_ringbuffer = TRUE;
+   info->flag_use_printk_log = FALSE;
+
+   OFFSET_INIT(printk_log.desc_ring, "printk_ringbuffer", "desc_ring");
+   OFFSET_INIT(printk_log.text_data_ring, "printk_ringbuffer", "text_data_ring");
+
+   OFFSET_INIT(printk_log.count_bits, "prb_desc_ring", "count_bits");
+   OFFSET_INIT(printk_log.descs, "prb_desc_ring", 

[RFC PATCH 0/1] support lockless printk ringbuffer

2020-04-29 Thread John Ogness
Hi Kazu,

Here is a patch adding full support for the new lockless printk
ringbuffer as it is currently being proposed. Note that the latest
version has not yet been submitted to LKML. I was waiting until I
finished tests with crash(8) and makedumpfile(8).

The new ringbuffer will export all the necessary symbols, sizes,
and offsets in VMCOREINFO.

Note that I created a separate printk.c for the iteration logic.

Also note that I modified dwarf_info.c to support resolving
typedefs of typedefs. This was necessary in order to support
atomic_long_t and its "counter" member.

I don't expect you to take the patch as-is, but I hope it can
provide some positive ground work for moving forward.

John Ogness (1):
  printk: add support for lockless ringbuffer

 Makefile   |   2 +-
 dwarf_info.c   |  36 +-
 makedumpfile.c | 101 ++--
 makedumpfile.h |  25 +++
 printk.c   | 177 +
 5 files changed, 333 insertions(+), 8 deletions(-)
 create mode 100644 printk.c

-- 
2.20.1




Re: [PATCH v9 15/18] arm64: kexec: kexec EL2 vectors

2020-04-29 Thread Marc Zyngier

On 2020-03-26 03:24, Pavel Tatashin wrote:

If we have an EL2 mode without VHE, the EL2 vectors are needed in order
to switch to EL2 and jump to the new world with hypervisor privileges.

Signed-off-by: Pavel Tatashin 
---
 arch/arm64/include/asm/kexec.h  |  5 +
 arch/arm64/kernel/asm-offsets.c |  1 +
 arch/arm64/kernel/machine_kexec.c   |  5 +
 arch/arm64/kernel/relocate_kernel.S | 35 +
 4 files changed, 46 insertions(+)

diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
index d944c2e289b2..0f758fd51518 100644
--- a/arch/arm64/include/asm/kexec.h
+++ b/arch/arm64/include/asm/kexec.h
@@ -95,6 +95,7 @@ static inline void crash_post_resume(void) {}
 extern const unsigned long kexec_relocate_code_size;
 extern const unsigned char kexec_relocate_code_start[];
 extern const unsigned long kexec_kern_reloc_offset;
+extern const unsigned long kexec_el2_vectors_offset;
 #endif

 /*
@@ -104,6 +105,9 @@ extern const unsigned long kexec_kern_reloc_offset;
  * kernel, or purgatory entry address).
  * kern_arg0   first argument to kernel is its dtb address. The other
  * arguments are currently unused, and must be set to 0
+ * el2_vector  If present means that relocation routine will go to EL1
+ * from EL2 to do the copy, and then back to EL2 to do the jump
+ * to new world.
  */
 struct kern_reloc_arg {
phys_addr_t head;
@@ -112,6 +116,7 @@ struct kern_reloc_arg {
phys_addr_t kern_arg1;
phys_addr_t kern_arg2;
phys_addr_t kern_arg3;
+   phys_addr_t el2_vector;
 };

 #define ARCH_HAS_KIMAGE_ARCH
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index 448230684749..ff974b648347 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -136,6 +136,7 @@ int main(void)
   DEFINE(KEXEC_KRELOC_KERN_ARG1,	offsetof(struct kern_reloc_arg, kern_arg1));
   DEFINE(KEXEC_KRELOC_KERN_ARG2,	offsetof(struct kern_reloc_arg, kern_arg2));
   DEFINE(KEXEC_KRELOC_KERN_ARG3,	offsetof(struct kern_reloc_arg, kern_arg3));
+  DEFINE(KEXEC_KRELOC_EL2_VECTOR,	offsetof(struct kern_reloc_arg, el2_vector));

 #endif
   return 0;
 }
diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index ab571fca9bd1..bd398def7627 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -84,6 +84,11 @@ int machine_kexec_post_load(struct kimage *kimage)
kern_reloc_arg->head = kimage->head;
kern_reloc_arg->entry_addr = kimage->start;
kern_reloc_arg->kern_arg0 = kimage->arch.dtb_mem;
+   /* Setup vector table only when EL2 is available, but no VHE */
+   if (is_hyp_mode_available() && !is_kernel_in_hyp_mode()) {
+   kern_reloc_arg->el2_vector = __pa(reloc_code)
+   + kexec_el2_vectors_offset;
+   }
kexec_image_info(kimage);

return 0;
diff --git a/arch/arm64/kernel/relocate_kernel.S b/arch/arm64/kernel/relocate_kernel.S
index aa9f2b2cd77c..6fd2fc0ef373 100644
--- a/arch/arm64/kernel/relocate_kernel.S
+++ b/arch/arm64/kernel/relocate_kernel.S
@@ -89,6 +89,38 @@ ENTRY(arm64_relocate_new_kernel)
 .ltorg
 END(arm64_relocate_new_kernel)

+.macro el1_sync_64
+   br  x4  /* Jump to new world from el2 */
+   .fill 31, 4, 0  /* Set other 31 instr to zeroes */
+.endm


The common idiom to write this is to align the beginning of the macro, and
not to bother about what follows:

.macro whatever
.align 7
br  x4
.endm

Especially given that 0 is an undefined instruction, and I really hate to
see those in the actual text. On the contrary, .align generates NOPs.


+
+.macro invalid_vector label
+\label:
+   b \label
+   .fill 31, 4, 0  /* Set other 31 instr to zeroes */
+.endm
+
+/* el2 vectors - switch el2 here while we restore the memory image. */
+   .align 11
+ENTRY(kexec_el2_vectors)


Please see commit 617a2f392c92 ("arm64: kvm: Annotate assembly using modern
annoations"), and follow the same pattern.


+   invalid_vector el2_sync_invalid_sp0 /* Synchronous EL2t */
+   invalid_vector el2_irq_invalid_sp0  /* IRQ EL2t */
+   invalid_vector el2_fiq_invalid_sp0  /* FIQ EL2t */
+   invalid_vector el2_error_invalid_sp0/* Error EL2t */
+   invalid_vector el2_sync_invalid_spx /* Synchronous EL2h */
+   invalid_vector el2_irq_invalid_spx  /* IRQ EL2h */
+   invalid_vector el2_fiq_invalid_spx  /* FIQ EL2h */
+   invalid_vector el2_error_invalid_spx/* Error EL2h */
+   el1_sync_64 /* Synchronous 64-bit EL1 */
+   invalid_vector el1_irq_invalid_64   /* IRQ 64-bit EL1 */
+   invalid_vector el1_fiq_invalid_64   /* FIQ 64-bit EL1 */
+   invalid_vector el1_error_invalid_64 /* Error 64-bit EL1 */
+   

Re: [PATCH v9 10/18] arm64: kexec: cpu_soft_restart change argument types

2020-04-29 Thread James Morse
Hi Pavel,

On 26/03/2020 03:24, Pavel Tatashin wrote:
> Change argument types from unsigned long to a more descriptive
> phys_addr_t.

For 'entry', which is a physical addresses, sure...

> diff --git a/arch/arm64/kernel/cpu-reset.h b/arch/arm64/kernel/cpu-reset.h
> index ed50e9587ad8..38cbd4068019 100644
> --- a/arch/arm64/kernel/cpu-reset.h
> +++ b/arch/arm64/kernel/cpu-reset.h
> @@ -10,17 +10,17 @@
>  
>  #include 
>  
> -void __cpu_soft_restart(unsigned long el2_switch, unsigned long entry,
> - unsigned long arg0, unsigned long arg1, unsigned long arg2);

> +void __cpu_soft_restart(phys_addr_t el2_switch, phys_addr_t entry,
> + phys_addr_t arg0, phys_addr_t arg1, phys_addr_t arg2);

This looks weird because it's re-using the hyp-stub API, because it might
call the hyp-stub from the idmap. entry is passed in, so this isn't tied to
kexec. Without tying it to kexec, how do you know arg2 is a physical address?
I think it tried to be re-usable because 32bit has more users for this.

arg0-2 are unsigned long because the hyp-stub is just moving general purpose
registers around.

This is to avoid casting?
Sure, it's only got one caller. This thing evolved because the
platform-has-EL2 and kdump-while-KVM-was-running code was bolted on as they
were discovered.


> -static inline void __noreturn cpu_soft_restart(unsigned long entry,
> -unsigned long arg0,
> -unsigned long arg1,
> -unsigned long arg2)
> +static inline void __noreturn cpu_soft_restart(phys_addr_t entry,
> +phys_addr_t arg0,
> +phys_addr_t arg1,
> +phys_addr_t arg2)
>  {
>   typeof(__cpu_soft_restart) *restart;
>  
> - unsigned long el2_switch = !is_kernel_in_hyp_mode() &&
> + phys_addr_t el2_switch = !is_kernel_in_hyp_mode() &&
>   is_hyp_mode_available();

What on earth happened here!?


>   restart = (void *)__pa_symbol(__cpu_soft_restart);


Thanks,

James



Re: [PATCH v9 09/18] arm64: kexec: call kexec_image_info only once

2020-04-29 Thread James Morse
Hi Pavel,

On 26/03/2020 03:24, Pavel Tatashin wrote:
> Currently, kexec_image_info() is called during load time, and
> right before kernel is being kexec'ed. There is no need to do both.

I think the original logic was: if debugging, you'd see the load-time values
in dmesg, and the kexec-time values when the machine panic()d and you hadn't
made a note of the previous set. But I'm not sure why you'd have these debug
messages enabled unless you were debugging kexec.


> So, call it only once when segments are loaded and the physical
> location of page with copy of arm64_relocate_new_kernel is known.

Sure, keep the earlier version that is more likely to be seen.

Acked-by: James Morse 


Thanks,

James



Re: [PATCH v9 07/18] arm64: trans_pgd: hibernate: idmap the single page that holds the copy page routines

2020-04-29 Thread James Morse
Hi Pavel,

On 26/03/2020 03:24, Pavel Tatashin wrote:
> From: James Morse 
> 
> To resume from hibernate, the contents of memory are restored from
> the swap image. This may overwrite any page, including the running
> kernel and its page tables.
> 
> Hibernate copies the code it uses to do the restore into a single
> page that it knows won't be overwritten, and maps it with page tables
> built from pages that won't be overwritten.
> 
> Today the address it uses for this mapping is arbitrary, but to allow
> kexec to reuse this code, it needs to be idmapped. To idmap the page
> we must avoid the kernel helpers that have VA_BITS baked in.
> 
> Convert create_single_mapping() to take a single PA, and idmap it.
> The page tables are built in the reverse order to normal using
> pfn_pte() to stir in any bits between 52:48. T0SZ is always increased
> to cover 48bits, or 52 if the copy code has bits 52:48 in its PA.
> 
> Pasha: The original patch from James
> inux-arm-kernel/20200115143322.214247-4-james.mo...@arm.com

-EBROKENLINK

The convention is to use a 'Link:' tag in the signed-off area.
e.g. 5a3577039cbe

> Adopted it to trans_pgd, so it can be commonly used by both Kexec
> and Hibernate. Some minor clean-ups.

Please describe your changes just before your SoB. This means each author
signs off on the stuff above their SoB, and it's obvious who made which
changes.

Search for 'Lucky K Maintainer' in process/submitting-patches.rst for an
example.


> diff --git a/arch/arm64/include/asm/trans_pgd.h 
> b/arch/arm64/include/asm/trans_pgd.h
> index 97a7ea73b289..4912d3caf0ca 100644
> --- a/arch/arm64/include/asm/trans_pgd.h
> +++ b/arch/arm64/include/asm/trans_pgd.h
> @@ -32,4 +32,7 @@ int trans_pgd_create_copy(struct trans_pgd_info *info, 
> pgd_t **trans_pgd,
>  int trans_pgd_map_page(struct trans_pgd_info *info, pgd_t *trans_pgd,
>  void *page, unsigned long dst_addr, pgprot_t pgprot);

This trans_pgd_map_page() used to be create_single_mapping(), which is where
the original patch made its changes.

You should only need one of these, not both.


> +int trans_pgd_idmap_page(struct trans_pgd_info *info, phys_addr_t 
> *trans_ttbr0,
> +  unsigned long *t0sz, void *page);
> +
>  #endif /* _ASM_TRANS_TABLE_H */

> diff --git a/arch/arm64/mm/trans_pgd.c b/arch/arm64/mm/trans_pgd.c
> index 37d7d1c60f65..c2517d1af2af 100644
> --- a/arch/arm64/mm/trans_pgd.c
> +++ b/arch/arm64/mm/trans_pgd.c
> @@ -242,3 +242,52 @@ int trans_pgd_map_page(struct trans_pgd_info *info, 
> pgd_t *trans_pgd,
>  
>   return 0;
>  }
> +
> +/*
> + * The page we want to idmap may be outside the range covered by VA_BITS that
> + * can be built using the kernel's p?d_populate() helpers. As a one off, for a
> + * single page, we build these page tables bottom up and just assume that will
> + * need the maximum T0SZ.
> + *
> + * Returns 0 on success, and -ENOMEM on failure.
> + * On success trans_ttbr0 contains page table with idmapped page, t0sz is set to
> + * maxumum T0SZ for this page.

maxumum

> + */


Thanks,

James



Re: [PATCH v9 08/18] arm64: kexec: move relocation function setup

2020-04-29 Thread James Morse
Hi Pavel,

On 26/03/2020 03:24, Pavel Tatashin wrote:
> Currently, kernel relocation function is configured in machine_kexec()
> at the time of kexec reboot by using control_code_page.
> 
> This operation, however, is more logical to be done during kexec_load,
> and thus remove from reboot time. Move, setup of this function to
> newly added machine_kexec_post_load().

This would avoid the need to special-case the cache maintenance, so it's a
good cleanup...


> Because once MMU is enabled, kexec control page will contain more than
> relocation kernel, but also vector table, add pointer to the actual
> function within this page arch.kern_reloc. Currently, it equals to the
> beginning of page, we will add offsets later, when vector table is
> added.

If the vector table always comes second, wouldn't this be extra work to hold
the value 0? You can control the layout of this relocation code, as it has to
be written in assembly. I don't get why this would be necessary.


> diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
> index ae1bad0156cd..ec71a153cc2d 100644
> --- a/arch/arm64/kernel/machine_kexec.c
> +++ b/arch/arm64/kernel/machine_kexec.c
> @@ -58,6 +59,17 @@ void machine_kexec_cleanup(struct kimage *kimage)
>   /* Empty routine needed to avoid build errors. */
>  }
>  
> +int machine_kexec_post_load(struct kimage *kimage)
> +{
> + void *reloc_code = page_to_virt(kimage->control_code_page);
> +
> + memcpy(reloc_code, arm64_relocate_new_kernel,
> +arm64_relocate_new_kernel_size);
> + kimage->arch.kern_reloc = __pa(reloc_code);

Could we move the two cache maintenance calls for this area in here too?
Keeping it next to the modification makes it clearer why it is required.

In this case we can use flush_icache_range() instead of its __variant
because this now happens much earlier.


> + return 0;
> +}

Regardless,
Reviewed-by: James Morse 


Thanks,

James



Re: [PATCH v9 05/18] arm64: trans_pgd: pass NULL instead of init_mm to *_populate functions

2020-04-29 Thread James Morse
Hi Pavel,

On 26/03/2020 03:24, Pavel Tatashin wrote:
> trans_pgd_* should be independent from mm context because the tables that
> are created by this code are used when there are no mm context around, as
> it is between kernels. Simply replace mm_init's with NULL.

arm64's p?d_populate() helpers don't use the mm parameter, so it doesn't
make any difference. This was originally done so that if we ever needed
anything from the mm, we didn't get a NULL dereference or EL0 behaviour due
to a future '!= &init_mm'.

If you think it matters,
Acked-by: James Morse 


Thanks,

James



Re: [PATCH v9 03/18] arm64: trans_pgd: make trans_pgd_map_page generic

2020-04-29 Thread James Morse
Hi Pavel,

On 26/03/2020 03:24, Pavel Tatashin wrote:
> kexec is going to use a different allocator, so make

> trans_pgd_map_page to accept allocator as an argument, and also
> kexec is going to use a different map protection, so also pass
> it via argument.

This trans_pgd_map_page() used to be create_single_mapping(). It creates
page tables that map one page: the relocation code.

Why do you need a different pgprot? Surely PAGE_KERNEL_EXEC is exactly what
you want.


> diff --git a/arch/arm64/include/asm/trans_pgd.h b/arch/arm64/include/asm/trans_pgd.h
> index 23153c13d1ce..ad5194ad178d 100644
> --- a/arch/arm64/include/asm/trans_pgd.h
> +++ b/arch/arm64/include/asm/trans_pgd.h
> @@ -12,10 +12,24 @@
>  #include 
>  #include 
>  
> +/*
> + * trans_alloc_page
> + *   - Allocator that should return exactly one zeroed page, if this
> + *     allocator fails, trans_pgd returns -ENOMEM error.

trans_pgd is what you pass in to trans_pgd_map_page() or
trans_pgd_create_copy(). Do you mean what those functions return?


> + *
> + * trans_alloc_arg
> + *   - Passed to trans_alloc_page as an argument
> + */

> diff --git a/arch/arm64/kernel/hibernate.c b/arch/arm64/kernel/hibernate.c
> --- a/arch/arm64/kernel/hibernate.c
> +++ b/arch/arm64/kernel/hibernate.c
> @@ -195,6 +200,11 @@ static int create_safe_exec_page(void *src_start, size_t length,
>unsigned long dst_addr,
>phys_addr_t *phys_dst_addr)
>  {
> + struct trans_pgd_info trans_info = {
> + .trans_alloc_page   = hibernate_page_alloc,
> + .trans_alloc_arg= (void *)GFP_ATOMIC,
> + };

As you need another copy of this in the next patch, is it worth declaring
this globally and making it const?


> diff --git a/arch/arm64/mm/trans_pgd.c b/arch/arm64/mm/trans_pgd.c
> index d20e48520cef..275a79935d7e 100644
> --- a/arch/arm64/mm/trans_pgd.c
> +++ b/arch/arm64/mm/trans_pgd.c
> @@ -180,8 +185,18 @@ int trans_pgd_create_copy(pgd_t **dst_pgdp, unsigned long start,
>   return rc;
>  }
>  
> -int trans_pgd_map_page(pgd_t *trans_pgd, void *page, unsigned long dst_addr,
> -pgprot_t pgprot)
> +/*
> + * Add map entry to trans_pgd for a base-size page at PTE level.
> + * info: contains allocator and its argument
> + * trans_pgd:page table in which new map is added.
> + * page: page to be mapped.

> + * dst_addr: new VA address for the pages

~s/pages/page/

This thing only maps one page.


> + * pgprot:   protection for the page.
> + *
> + * Returns 0 on success, and -ENOMEM on failure.
> + */
> +int trans_pgd_map_page(struct trans_pgd_info *info, pgd_t *trans_pgd,
> +void *page, unsigned long dst_addr, pgprot_t pgprot)
>  {
>   pgd_t *pgdp;
>   pud_t *pudp;



Thanks,

James



Re: [PATCH v9 02/18] arm64: hibernate: move page handling function to new trans_pgd.c

2020-04-29 Thread James Morse
Hi Pavel,

On 26/03/2020 03:24, Pavel Tatashin wrote:
> Now, that we abstracted the required functions move them to a new home.
> Later, we will generalize these function in order to be useful outside
> of hibernation.

Reviewed-by: James Morse 


Thanks,

James



Re: [PATCH v9 04/18] arm64: trans_pgd: pass allocator trans_pgd_create_copy

2020-04-29 Thread James Morse
Hi Pavel,

On 26/03/2020 03:24, Pavel Tatashin wrote:
> Make trans_pgd_create_copy and its subroutines to use allocator that is
> passed as an argument

Reviewed-by: James Morse 


Thanks,

James



Re: [PATCH v9 01/18] arm64: kexec: make dtb_mem always enabled

2020-04-29 Thread James Morse
Hi Pavel,

On 26/03/2020 03:24, Pavel Tatashin wrote:
> Currently, dtb_mem is enabled only when CONFIG_KEXEC_FILE is
> enabled. This adds ugly ifdefs to c files.

~s/dtb_mem/ARCH_HAS_KIMAGE_ARCH/ ?
dtb_mem is just one member of struct kimage_arch.


> Always enabled dtb_mem, when it is not used, it is NULL.
> Change the dtb_mem to phys_addr_t, as it is a physical address.

Regardless,
Reviewed-by: James Morse 


Thanks,

James



RE: [PATCH] makedumpfile: cope with not-present mem section

2020-04-29 Thread 萩尾 一仁
Hi Pingfan,

> -Original Message-
> Hi Kazu and Cascardo,
> 
> I encounter a weird problem when running makedumpfile on a s390 machine.
> 
> Our production kernel uses extreme sparse memory model, and has the
> following:
> 
> in mm/sparse.c
> 
> #ifdef CONFIG_SPARSEMEM_EXTREME
> struct mem_section **mem_section;
> #else
> struct mem_section mem_section[NR_SECTION_ROOTS][SECTIONS_PER_ROOT]
> cacheline_internodealigned_in_smp;
> #endif
> 
> So in makedumpfile.c, get_mem_section(), it got a failed result at the
> first call site to validate_mem_section(); then it should succeed at the
> second call site to validate_mem_section(), which is inside the
> if (is_sparsemem_extreme()) condition.

I think your production kernel should have kernel commit a0b1280368d1
("kdump: write correct address of mem_section into vmcoreinfo"), so the
first call should return TRUE and the second one should return FALSE.

> 
> But the actual result is not like expected.
> 
> After introducing commit e113f1c974c820f9633dc0073eda525d7575f365
> ("cope with not-present mem section"),
> 
> I got two successful calls to validate_mem_section(), and finally failed
> at the condition
>   ret = symbol_valid ^ pointer_valid;
>   if (!ret) {
>   ERRMSG("Could not validate mem_section.\n");
>   }
> 
> 
> Do you have any idea?

Presumably this is what I expected might be possible.
I can apply the patch below this time; what about this?
https://github.com/k-hagio/makedumpfile-old/commit/ce883df3864a5744ac0f1eff47de06b5074edb5f.patch

Or, we can also investigate why the second call returns TRUE, and
fix the conditions in validate_mem_section().

Thanks,
Kazu


Re: [PATCH] makedumpfile: cope with not-present mem section

2020-04-29 Thread piliu
Hi Kazu and Cascardo,

I encounter a weird problem when running makedumpfile on a s390 machine.

Our production kernel uses extreme sparse memory model, and has the
following:

in mm/sparse.c

#ifdef CONFIG_SPARSEMEM_EXTREME
struct mem_section **mem_section;
#else
struct mem_section mem_section[NR_SECTION_ROOTS][SECTIONS_PER_ROOT]
cacheline_internodealigned_in_smp;
#endif

So in makedumpfile.c, get_mem_section(), it got a failed result at the
first call site to validate_mem_section(); then it should succeed at the
second call site to validate_mem_section(), which is inside the
if (is_sparsemem_extreme()) condition.

But the actual result is not like expected.

After introducing commit e113f1c974c820f9633dc0073eda525d7575f365
("cope with not-present mem section"),

I got two successful calls to validate_mem_section(), and finally failed
at the condition
ret = symbol_valid ^ pointer_valid;
if (!ret) {
ERRMSG("Could not validate mem_section.\n");
}


Do you have any idea?

Thanks,
Pingfan

