[PATCH] arm64: kexec_file: return successfully even if kaslr-seed doesn't exist

2019-01-10 Thread AKASHI Takahiro
In kexec_file_load, the kaslr-seed property of the current dtb is always
deleted before a new value is set, if possible. It doesn't matter whether
the property exists in the current dtb.

So "ret" should be reset to 0 here.

Fixes: commit 884143f60c89 ("arm64: kexec_file: add kaslr support")
Signed-off-by: AKASHI Takahiro 
---
 arch/arm64/kernel/machine_kexec_file.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
index 10e33860e47a..f2c211a6229b 100644
--- a/arch/arm64/kernel/machine_kexec_file.c
+++ b/arch/arm64/kernel/machine_kexec_file.c
@@ -87,7 +87,9 @@ static int setup_dtb(struct kimage *image,
 
 	/* add kaslr-seed */
 	ret = fdt_delprop(dtb, off, FDT_PROP_KASLR_SEED);
-	if (ret && (ret != -FDT_ERR_NOTFOUND))
+	if (ret == -FDT_ERR_NOTFOUND)
+		ret = 0;
+	else if (ret)
 		goto out;
 
 	if (rng_is_initialized()) {
-- 
2.19.1
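
The error handling described in the commit message can be illustrated outside the kernel. The following is a minimal sketch, not the kernel code: `sketch_node`, `sketch_delprop()` and the `SKETCH_ERR_*` values are hypothetical stand-ins for the dtb, `fdt_delprop()` and the libfdt error codes. It shows "delete if present" semantics where a missing property is normalized to success, so that a stale not-found value cannot leak out as the function's return code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical error codes standing in for libfdt's FDT_ERR_* values. */
#define SKETCH_ERR_NOTFOUND	1
#define SKETCH_ERR_BADSTRUCTURE	11

/* Hypothetical stand-in for a device tree node. */
struct sketch_node {
	bool has_seed;	/* property currently present */
	bool corrupt;	/* simulate a real failure */
};

/* Mock of a delete-property call: 0 on success, negative error otherwise. */
static int sketch_delprop(struct sketch_node *node)
{
	if (node->corrupt)
		return -SKETCH_ERR_BADSTRUCTURE;
	if (!node->has_seed)
		return -SKETCH_ERR_NOTFOUND;
	node->has_seed = false;
	return 0;
}

/* Delete the property if it exists; "not found" counts as success. */
static int sketch_remove_seed(struct sketch_node *node)
{
	int ret;

	assert(node != NULL);
	ret = sketch_delprop(node);
	if (ret == -SKETCH_ERR_NOTFOUND)
		ret = 0;	/* nothing to delete is not an error */
	return ret;		/* real errors are still propagated */
}
```

Calling `sketch_remove_seed()` twice on the same node returns 0 both times, while a genuine failure is still reported to the caller.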


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCHv5] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr

2019-01-10 Thread Pingfan Liu
On Wed, Jan 9, 2019 at 10:25 PM Baoquan He  wrote:
>
> On 01/08/19 at 05:48pm, Mike Rapoport wrote:
> > On Tue, Jan 08, 2019 at 05:01:38PM +0800, Baoquan He wrote:
> > > Hi Mike,
> > >
> > > On 01/08/19 at 10:05am, Mike Rapoport wrote:
> > > > I'm not thrilled by duplicating this code (yet again).
> > > > > I liked the v3 of this patch [1] more, assuming we allow bottom-up mode to
> > > > > allocate [0, kernel_start) unconditionally.
> > > > > I'd just replace your first patch in v3 [2] with something like:
> > >
> > > In initmem_init(), we will restore the top-down allocation style anyway.
> > > While reserve_crashkernel() is called after initmem_init(), it's not
> > > appropriate to adjust memblock_find_in_range_node(), and we really want
> > > to find region bottom up for crashkernel reservation, no matter where
> > > kernel is loaded, better call __memblock_find_range_bottom_up().
> > >
> > > Create a wrapper to do the necessary handling, then call
> > > __memblock_find_range_bottom_up() directly, looks better.
> >
> > What bothers me is 'the necessary handling' which is already done in
> > several places in memblock in a similar, but yet slightly different way.
>
> The page aligning for start and the mirror flag setting, I suppose.
> >
> > memblock_find_in_range() and memblock_phys_alloc_nid() retry with different
> > MEMBLOCK_MIRROR, but memblock_phys_alloc_try_nid() does that only when
> > allocating from the specified node and does not retry when it falls back to
> > any node. And memblock_alloc_internal() has yet another set of fallbacks.
>
> I see what you mean. They seem to try allocating within the mirrored
> memory region first and, if that fails, fall back to the non-mirrored
> region. If a kernel data allocation fails, there is no need to care
> whether the memory is movable; the kernel has to survive first. Maybe the
> bottom-up allocation wrapper needs to do the same?
>
> >
> > So what should be the necessary handling in the wrapper for
> > __memblock_find_range_bottom_up() ?
> >
> > BTW, even without any memblock modifications, retrying allocation in
> > reserve_crashkernel() for different ranges, like the proposal at [1] would
> > also work, wouldn't it?
>
> Yes, that also looks good. This patch only makes a single call, so it
> seems the simpler change.
>
> In fact, either the one below or this patch is fine with me, as long as it
> fixes the problem customers are complaining about.
>
It seems that opinions diverge. It may be easier to fix this bug with
dyoung's patch, so I will repost his patch.

Thanks and regards,
Pingfan
> >
> > [1] http://lists.infradead.org/pipermail/kexec/2017-October/019571.html
>
> Thanks
> Baoquan
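
The idea raised above, retrying the reservation over different ranges without touching memblock, can be sketched as follows. This is only an illustration under stated assumptions: `sketch_range`, `sketch_find()` and `sketch_reserve()` are hypothetical names, the single free region is a toy model, and the candidate windows in the usage note are examples rather than the limits used by `reserve_crashkernel()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical half-open address range [start, end). */
struct sketch_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Toy finder over one free region: returns the lowest aligned base inside
 * both the free region and the window that fits size, or 0 on failure.
 */
static uint64_t sketch_find(const struct sketch_range *free_mem,
			    uint64_t start, uint64_t end,
			    uint64_t size, uint64_t align)
{
	uint64_t lo = start > free_mem->start ? start : free_mem->start;
	uint64_t hi = end < free_mem->end ? end : free_mem->end;
	uint64_t base;

	assert(align != 0 && (align & (align - 1)) == 0);
	base = (lo + align - 1) & ~(align - 1);	/* round up to alignment */
	if (base < lo || base >= hi || hi - base < size)
		return 0;
	return base;
}

/* Try each candidate window in order; the first one that fits wins. */
static uint64_t sketch_reserve(const struct sketch_range *free_mem,
			       const struct sketch_range *windows, size_t n,
			       uint64_t size, uint64_t align)
{
	for (size_t i = 0; i < n; i++) {
		uint64_t base = sketch_find(free_mem, windows[i].start,
					    windows[i].end, size, align);
		if (base)
			return base;
	}
	return 0;	/* every window failed */
}
```

With windows such as "below 4G" followed by "anywhere", a machine whose free memory sits above 4G fails the first window and succeeds on the second, which is the retry behaviour being discussed.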



Re: [PATCHv5] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr

2019-01-10 Thread Pingfan Liu
On Thu, Jan 10, 2019 at 3:57 PM Mike Rapoport  wrote:
>
> Hi Pingfan,
>
> On Wed, Jan 09, 2019 at 09:02:41PM +0800, Pingfan Liu wrote:
> > On Tue, Jan 8, 2019 at 11:49 PM Mike Rapoport  wrote:
> > >
> > > On Tue, Jan 08, 2019 at 05:01:38PM +0800, Baoquan He wrote:
> > > > Hi Mike,
> > > >
> > > > On 01/08/19 at 10:05am, Mike Rapoport wrote:
> > > > > I'm not thrilled by duplicating this code (yet again).
> > > > > > I liked the v3 of this patch [1] more, assuming we allow bottom-up mode to
> > > > > > allocate [0, kernel_start) unconditionally.
> > > > > > I'd just replace your first patch in v3 [2] with something like:
> > > >
> > > > In initmem_init(), we will restore the top-down allocation style anyway.
> > > > While reserve_crashkernel() is called after initmem_init(), it's not
> > > > appropriate to adjust memblock_find_in_range_node(), and we really want
> > > > to find region bottom up for crashkernel reservation, no matter where
> > > > kernel is loaded, better call __memblock_find_range_bottom_up().
> > > >
> > > > Create a wrapper to do the necessary handling, then call
> > > > __memblock_find_range_bottom_up() directly, looks better.
> > >
> > > What bothers me is 'the necessary handling' which is already done in
> > > several places in memblock in a similar, but yet slightly different way.
> > >
> > > > memblock_find_in_range() and memblock_phys_alloc_nid() retry with different
> > > > MEMBLOCK_MIRROR, but memblock_phys_alloc_try_nid() does that only when
> > > > allocating from the specified node and does not retry when it falls back to
> > > > any node. And memblock_alloc_internal() has yet another set of fallbacks.
> > >
> > > So what should be the necessary handling in the wrapper for
> > > __memblock_find_range_bottom_up() ?
> > >
> > Well, it is a hard choice.
> > > > BTW, even without any memblock modifications, retrying allocation in
> > > > reserve_crashkernel() for different ranges, like the proposal at [1] would
> > > > also work, wouldn't it?
> > >
> > > Yes, it can work. Then is it worth exposing the bottom-up allocation
> > > style for purposes other than hot-movable memory?
>
> Some architectures use bottom-up as a "compatibility" mode with bootmem.
> And, I believe, powerpc and s390 use bottom-up to make some of the
> allocations close to the kernel.
>
Ok, got it. Thanks.

Best regards,
Pingfan

> > Thanks,
> > Pingfan
> > > [1] http://lists.infradead.org/pipermail/kexec/2017-October/019571.html
> > >
> > > > Thanks
> > > > Baoquan
> > > >
> > > > >
> > > > > diff --git a/mm/memblock.c b/mm/memblock.c
> > > > > index 7df468c..d1b30b9 100644
> > > > > --- a/mm/memblock.c
> > > > > +++ b/mm/memblock.c
> > > > > @@ -274,24 +274,14 @@ phys_addr_t __init_memblock 
> > > > > memblock_find_in_range_node(phys_addr_t size,
> > > > >  * try bottom-up allocation only when bottom-up mode
> > > > >  * is set and @end is above the kernel image.
> > > > >  */
> > > > > -   if (memblock_bottom_up() && end > kernel_end) {
> > > > > -   phys_addr_t bottom_up_start;
> > > > > -
> > > > > -   /* make sure we will allocate above the kernel */
> > > > > -   bottom_up_start = max(start, kernel_end);
> > > > > -
> > > > > +   if (memblock_bottom_up()) {
> > > > > /* ok, try bottom-up allocation first */
> > > > > -   ret = __memblock_find_range_bottom_up(bottom_up_start, 
> > > > > end,
> > > > > +   ret = __memblock_find_range_bottom_up(start, end,
> > > > >   size, align, nid, 
> > > > > flags);
> > > > > if (ret)
> > > > > return ret;
> > > > >
> > > > > /*
> > > > > -* we always limit bottom-up allocation above the kernel,
> > > > > -* but top-down allocation doesn't have the limit, so
> > > > > -* retrying top-down allocation may succeed when bottom-up
> > > > > -* allocation failed.
> > > > > -*
> > > > >  * bottom-up allocation is expected to be fail very 
> > > > > rarely,
> > > > >  * so we use WARN_ONCE() here to see the stack trace if
> > > > >  * fail happens.
> > > > >
> > > > > [1] 
> > > > > https://lore.kernel.org/lkml/1545966002-3075-3-git-send-email-kernelf...@gmail.com/
> > > > > [2] 
> > > > > https://lore.kernel.org/lkml/1545966002-3075-2-git-send-email-kernelf...@gmail.com/
> > > > >
> > > > > > +
> > > > > > + return ret;
> > > > > > +}
> > > > > > +
> > > > > >  /**
> > > > > > >   * __memblock_find_range_top_down - find free area utility, in top-down
> > > > > >   * @start: start of candidate range
> > > > > > --
> > > > > > 2.7.4
> > > > > >
> > > > >
> > > > > --
> > > > > Sincerely yours,
> > > > > Mike.
> > > > >
> > > >
> > >
> > > --
> > > Sincerely yours,
> > > Mike.
> > >
> >
>
> --
> Sincerely yours,
> Mike.
>
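
For readers unfamiliar with the two allocation directions discussed in this thread, here is a minimal sketch of bottom-up versus top-down range finding over a sorted list of free regions. It is a toy model and not memblock itself: `sketch_region`, `sketch_find_bottom_up()` and `sketch_find_top_down()` are hypothetical names, and node IDs and flags such as MEMBLOCK_MIRROR are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical free region [start, end); regions are sorted ascending. */
struct sketch_region {
	uint64_t start;
	uint64_t end;
};

static uint64_t sketch_max(uint64_t a, uint64_t b) { return a > b ? a : b; }
static uint64_t sketch_min(uint64_t a, uint64_t b) { return a < b ? a : b; }

/* Lowest aligned base within [start, end) that fits size, or 0 if none. */
static uint64_t sketch_find_bottom_up(const struct sketch_region *regs,
				      size_t n, uint64_t start, uint64_t end,
				      uint64_t size, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);

	for (size_t i = 0; i < n; i++) {
		uint64_t lo = sketch_max(regs[i].start, start);
		uint64_t hi = sketch_min(regs[i].end, end);
		uint64_t cand;

		if (lo >= hi)
			continue;
		cand = (lo + align - 1) & ~(align - 1);	/* round up */
		if (cand >= lo && cand < hi && hi - cand >= size)
			return cand;
	}
	return 0;
}

/* Highest aligned base within [start, end) that fits size, or 0 if none. */
static uint64_t sketch_find_top_down(const struct sketch_region *regs,
				     size_t n, uint64_t start, uint64_t end,
				     uint64_t size, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);

	for (size_t i = n; i-- > 0; ) {
		uint64_t lo = sketch_max(regs[i].start, start);
		uint64_t hi = sketch_min(regs[i].end, end);
		uint64_t cand;

		if (lo >= hi || hi - lo < size)
			continue;
		cand = (hi - size) & ~(align - 1);	/* round down */
		if (cand >= lo)
			return cand;
	}
	return 0;
}
```

Raising `start` to a value that plays the role of `kernel_end` reproduces the restriction that the quoted diff removes: the bottom-up search then skips any free space below that address.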


Re: [PATCH v2] arm64: invalidate TLB just before turning MMU on

2019-01-10 Thread Bhupesh Sharma
Hi Qian,

On Sat, Dec 15, 2018 at 7:24 AM Qian Cai  wrote:
>
> On 12/14/18 2:23 AM, Ard Biesheuvel wrote:
> > On Fri, 14 Dec 2018 at 05:08, Qian Cai  wrote:
> >> Also tried to move the local TLB flush part around a bit inside
> >> __cpu_setup(), although it did complete kdump some times, it did trigger
> >> "Synchronous Exception" in EFI after a cold-reboot fairly often that
> >> seems no way to recover remotely without reinstalling the OS.
> >
> > This doesn't make any sense to me. If the system gets into a weird
> > state out of cold reboot, how could this code be the culprit? Please
> > check your firmware, and try to reproduce the issue on a system that
> > doesn't have such defects.
> >
>
> I'll continue investigating those "Synchronous Exception" although it is kind of
> hard due to I don't have any source code of the firmware to confirm it is buggy
> or not.
>
> I did manage to reproduce this kdump issue on around 5 of those server running a
> fairly recent version of the firmware (07/01/2018). I don't have access to other
> large CPU machines.

Sorry, I got busy with some other stuff, but as I reported earlier, I
am not able to reproduce this on my HPE Apollo with the latest Linus
tree either.
Here are some details on my setup:

1. # uname -r
5.0.0-rc1+

with the following commit as the HEAD:
commit a88cc8da0279f8e481b0d90e51a0a1cffac55906 (HEAD -> master,
origin/master, origin/HEAD)
Merge: 9cb2feb4d21d 73444bc4d8f9
Author: Linus Torvalds 
Date:   Tue Jan 8 18:58:29 2019 -0800

Merge branch 'akpm' (patches from Andrew)

2. I use the following kdump commandline:
Kernel command line: BOOT_IMAGE=(hd9,gpt2)/vmlinuz-5.0.0-rc1+ ro
irqpoll nr_cpus=1 swiotlb=noforce reset_devices
earlycon=pl011,mmio,0x40202

3. I am able to run kdump successfully on the machine and also collect
the crash core properly:

.. snip..
kdump: saving to /sysroot//var/crash/127.0.0.1-2019-01-10-10:52:25/
kdump: saving vmcore-dmesg.txt
kdump: saving vmcore-dmesg.txt complete
kdump: saving vmcore
Copying data  : [100.0 %] \   eta: 0s
kdump: saving vmcore complete
.. snip ..

4. I use the same firmware version on the board as you shared earlier:
# dmidecode | grep -A 20 -i "BIOS Information"
BIOS Information
Vendor: American Megatrends Inc.
Version: L50_5.13_1.0.6
Release Date: 07/10/2018
Address: 0xF
Runtime Size: 64 kB
ROM Size: 64 MB
Characteristics:
PCI is supported
BIOS is upgradeable
BIOS shadowing is allowed
Boot from CD is supported
Selectable boot is supported
BIOS ROM is socketed
ACPI is supported
BIOS boot specification is supported
Targeted content distribution is supported
UEFI is supported
BIOS Revision: 6.3

So, I am guessing that it might be a kdump command line issue at your end.

Thanks,
Bhupesh



[PATCH 1/2 v6] kdump: add the vmcoreinfo documentation

2019-01-10 Thread Lianbo Jiang
This document lists some variables that are exported to vmcoreinfo and
briefly describes what these variables indicate. It should be instructive
for people who are not familiar with vmcoreinfo.

Suggested-by: Borislav Petkov 
Signed-off-by: Lianbo Jiang 
---
 Documentation/kdump/vmcoreinfo.txt | 500 +
 1 file changed, 500 insertions(+)
 create mode 100644 Documentation/kdump/vmcoreinfo.txt

diff --git a/Documentation/kdump/vmcoreinfo.txt b/Documentation/kdump/vmcoreinfo.txt
new file mode 100644
index ..8e444586b87b
--- /dev/null
+++ b/Documentation/kdump/vmcoreinfo.txt
@@ -0,0 +1,500 @@
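+================================================================
+			VMCOREINFO
+================================================================
+
+=======================
+What is the VMCOREINFO?
+=======================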
+
+VMCOREINFO is a special ELF note section. It contains various
+information from the kernel like structure size, page size, symbol
+values, field offsets, etc. These data are packed into an ELF note
+section and used by user-space tools like crash and makedumpfile to
+analyze a kernel's memory layout.
+
+================
+Common variables
+================
+
+init_uts_ns.name.release
+------------------------
+
+The version of the Linux kernel. Used to find the corresponding source
+code from which the kernel has been built.
+
+PAGE_SIZE
+---------
+
+The size of a page. It is the smallest unit of data for memory
+management in kernel. It is usually 4096 bytes and a page is aligned
+on 4096 bytes. Used for computing page addresses.
+
+init_uts_ns
+-----------
+
+This is the UTS namespace, which is used to isolate two specific
+elements of the system that relate to the uname(2) system call. The UTS
+namespace is named after the data structure used to store information
+returned by the uname(2) system call.
+
+User-space tools can get the kernel name, host name, kernel release
+number, kernel version, architecture name and OS type from it.
+
+node_online_map
+---------------
+
+An array node_states[N_ONLINE] which represents the set of online nodes
+in a system, one bit position per node number. Used to keep track of
+which nodes are in the system and online.
+
+swapper_pg_dir
+--------------
+
+The global page directory pointer of the kernel. Used to translate
+virtual to physical addresses.
+
+_stext
+------
+
+Defines the beginning of the text section. In general, _stext indicates
+the kernel start address. Used to convert a virtual address from the
+direct kernel map to a physical address.
+
+vmap_area_list
+--------------
+
+Stores the virtual area list. makedumpfile can get the vmalloc start
+value from this variable. This value is necessary for vmalloc translation.
+
+mem_map
+-------
+
+Physical addresses are translated to struct pages by treating them as
+an index into the mem_map array. Right-shifting a physical address
+PAGE_SHIFT bits converts it into a page frame number which is an index
+into that mem_map array.
+
+Used to map an address to the corresponding struct page.
+
+contig_page_data
+----------------
+
+Makedumpfile can get the pglist_data structure from this symbol, which
+is used to describe the memory layout.
+
+User-space tools use this to exclude free pages when dumping memory.
+
+mem_section|(mem_section, NR_SECTION_ROOTS)|(mem_section, section_mem_map)
+--------------------------------------------------------------------------
+
+The address of the mem_section array, its length, structure size, and
+the section_mem_map offset.
+
+It exists in the sparse memory mapping model and is somewhat similar to
+the mem_map variable; both of them are used to translate an address.
+
+page
+----
+
+The size of a page structure. struct page is an important data structure
+and it is widely used to compute the contiguous memory.
+
+pglist_data
+-----------
+
+The size of a pglist_data structure. This value will be used to check
+if the pglist_data structure is valid. It is also used for checking the
+memory type.
+
+zone
+----
+
+The size of a zone structure. This value is often used to check if the
+zone structure has been found. It is also used for excluding free pages.
+
+free_area
+---------
+
+The size of a free_area structure. It indicates whether the free_area
+structure is valid or not. Useful for excluding free pages.
+
+list_head
+---------
+
+The size of a list_head structure. Used when iterating lists in a
+post-mortem analysis session.
+
+nodemask_t
+----------
+
+The size of a nodemask_t type. Used to compute the number of online
+nodes.
+
+(page, flags|_refcount|mapping|lru|_mapcount|private|compound_dtor|
+   compound_order|compound_head)
+-------------------------------------------------------------------
+
+User-space tools can compute their values based on the offset of these
+variables. The variables are helpful to exclude unnecessary pages.
+
+(pglist_data, node_zones|nr_zones|node_mem_map|node_start_pfn
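
The mem_map entry in the document above describes a translation that is easy to show in code. The following is a minimal sketch assuming a flat memory model in which mem_map starts at page frame number 0 and pages are 4096 bytes; `sketch_page`, `SKETCH_PAGE_SHIFT` and both helper functions are hypothetical names, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size of 4096 bytes, i.e. a shift of 12. */
#define SKETCH_PAGE_SHIFT	12

/* Hypothetical stand-in for struct page. */
struct sketch_page {
	unsigned long flags;
};

/* Right-shift a physical address by PAGE_SHIFT to get the page frame number. */
static uint64_t sketch_phys_to_pfn(uint64_t phys)
{
	return phys >> SKETCH_PAGE_SHIFT;
}

/* Use the page frame number as an index into a flat mem_map array. */
static struct sketch_page *sketch_phys_to_page(struct sketch_page *mem_map,
					       size_t nr_pages, uint64_t phys)
{
	uint64_t pfn = sketch_phys_to_pfn(phys);

	assert(mem_map != NULL);
	if (pfn >= nr_pages)
		return NULL;	/* address is outside the described memory */
	return &mem_map[pfn];
}
```

In the sparse memory model the lookup goes through mem_section first, so this direct indexing only illustrates the flat case.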

[PATCH 0/2 v6] kdump, vmcoreinfo: Export the value of sme mask to vmcoreinfo

2019-01-10 Thread Lianbo Jiang
This patchset did two things:
a. add a new document for vmcoreinfo

This document lists some variables that are exported to vmcoreinfo and
briefly describes what these variables indicate. It should be instructive
for people who are not familiar with vmcoreinfo.

b. export the value of sme mask to vmcoreinfo

On AMD machines with the SME feature, makedumpfile needs to know whether
the crashed kernel's memory was encrypted. If SME is enabled in the first
kernel, the crashed kernel's page table entries (pgd/pud/pmd/pte) contain
the memory encryption mask, so makedumpfile needs to remove the SME mask to
obtain the true physical address.

Changes since v1:
1. No need to export a kernel-internal mask to userspace, so copy the
value of sme_me_mask to a local variable 'sme_mask' and write the value
of sme_mask to vmcoreinfo.
2. Add comment for the code.
3. Improve the patch log.
4. Add the vmcoreinfo documentation.

Changes since v2:
1. Improve the vmcoreinfo document and add more descriptions for the
exported variables.
2. Fix spelling errors in the document.

Changes since v3:
1. Further improve the vmcoreinfo document and make it clearer and
easier to read.
2. Move sme_mask comments in the code to the vmcoreinfo document.
3. Improve patch log.

Changes since v4:
1. Remove the command that dumps the VMCOREINFO contents from this
   document.
2. Merge the 'PG_buddy' and 'PG_offline' into the PG_* flag in this
   document.
3. Correct some of the mistakes in this document.

Changes since v5:
1. Improve patch log.

Lianbo Jiang (2):
  kdump: add the vmcoreinfo documentation
  kdump,vmcoreinfo: Export the value of sme mask to vmcoreinfo

 Documentation/kdump/vmcoreinfo.txt | 500 +
 arch/x86/kernel/machine_kexec_64.c |   3 +
 2 files changed, 503 insertions(+)
 create mode 100644 Documentation/kdump/vmcoreinfo.txt

-- 
2.17.1
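
To make the purpose of the exported value concrete, here is a minimal sketch of how a user-space tool might strip the encryption mask from a page table entry. It is an assumption-laden illustration, not makedumpfile's code: `sketch_pte_to_phys()` and `SKETCH_PTE_PFN_MASK` are hypothetical names, the entry is assumed to map a 4 KiB page with a 52-bit physical address field, and bit 47 in the usage note is only an example position for the encryption bit.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed physical address field of a 4 KiB page table entry (bits 12..51). */
#define SKETCH_PTE_PFN_MASK	0x000ffffffffff000ULL

/*
 * Clear the encryption mask first, then keep only the address bits.
 * An sme_mask of 0 means SME was not active and the entry is used as-is.
 */
static uint64_t sketch_pte_to_phys(uint64_t pte, uint64_t sme_mask)
{
	return pte & ~sme_mask & SKETCH_PTE_PFN_MASK;
}
```

For example, with a mask of `1ULL << 47`, an entry holding the address 0x12345000 plus the mask bit and some low flag bits translates back to 0x12345000. Without clearing the mask, the result would point far outside real memory.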




[PATCH 2/2 v6] kdump, vmcoreinfo: Export the value of sme mask to vmcoreinfo

2019-01-10 Thread Lianbo Jiang
On AMD machines with the SME feature, makedumpfile needs to know
whether the crashed kernel's memory was encrypted. If SME is enabled
in the first kernel, the crashed kernel's page table entries
(pgd/pud/pmd/pte) contain the memory encryption mask, so makedumpfile
needs to remove the SME mask to obtain the true physical address.

Signed-off-by: Lianbo Jiang 
---
 arch/x86/kernel/machine_kexec_64.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index 4c8acdfdc5a7..bc4108096b18 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -352,10 +352,13 @@ void machine_kexec(struct kimage *image)
 
 void arch_crash_save_vmcoreinfo(void)
 {
+	u64 sme_mask = sme_me_mask;
+
 	VMCOREINFO_NUMBER(phys_base);
 	VMCOREINFO_SYMBOL(init_top_pgt);
 	vmcoreinfo_append_str("NUMBER(pgtable_l5_enabled)=%d\n",
 			pgtable_l5_enabled());
+	VMCOREINFO_NUMBER(sme_mask);
 
 #ifdef CONFIG_NUMA
 	VMCOREINFO_SYMBOL(node_data);
-- 
2.17.1
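
On the consumer side, a tool would look the new value up in the vmcoreinfo text. The sketch below assumes that `sme_mask` is emitted in the same `NUMBER(name)=value` form as the `pgtable_l5_enabled` line in the hunk above, with a decimal value; the function name `sketch_vmcoreinfo_number()` and the sample blob in the usage note are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Find "NUMBER(<name>)=<decimal>" at the start of a line in a
 * vmcoreinfo-style text blob. Returns 0 and stores the value on
 * success, -1 if the entry is absent or malformed.
 */
static int sketch_vmcoreinfo_number(const char *blob, const char *name,
				    long long *out)
{
	char key[128];
	const char *p;
	char *endp;
	int n;

	assert(blob != NULL && name != NULL && out != NULL);

	n = snprintf(key, sizeof(key), "NUMBER(%s)=", name);
	if (n < 0 || (size_t)n >= sizeof(key))
		return -1;	/* name too long for the key buffer */

	for (p = blob; (p = strstr(p, key)) != NULL; p += n) {
		/* Ignore matches that do not begin a line. */
		if (p != blob && p[-1] != '\n')
			continue;
		*out = strtoll(p + n, &endp, 10);
		if (endp == p + n)
			return -1;	/* no digits after '=' */
		return 0;
	}
	return -1;
}
```

Given a blob containing the line `NUMBER(sme_mask)=140737488355328`, the lookup yields 2^47, which a tool could then use as the mask to clear from page table entries.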

