Re: [PATCH v2] x86/kdump: Reserve extra memory when SME or SEV is active

2019-08-30 Thread Borislav Petkov
On Mon, Aug 26, 2019 at 12:45:35PM +0800, Kairui Song wrote:
> Since commit c7753208a94c ("x86, swiotlb: Add memory encryption support"),
> SWIOTLB will be enabled even if there is less than 4G of memory when SME
> is active, to support DMA for devices that do not support addresses with the
> encryption bit.
> 
> And commit aba2d9a6385a ("iommu/amd: Do not disable SWIOTLB if SME is
> active") made the kernel keep SWIOTLB enabled even if there is an IOMMU.
> 
> Then commit d7b417fa08d1 ("x86/mm: Add DMA support for SEV memory
> encryption") will always force SWIOTLB to be enabled when SEV is active
> in all cases.
> 
> Now, when either SME or SEV is active, SWIOTLB will be force enabled,
> and this is also true for kdump kernel. As a result kdump kernel will
> run out of already scarce pre-reserved memory easily.
> 
> So when SME/SEV is active, reserve extra memory for SWIOTLB to ensure the
> kdump kernel has enough memory, except when "crashkernel=size[KMG],high"
> is specified or any offset is used. In the high reservation case, an
> extra low memory region will always be reserved and that is enough for
> SWIOTLB. If the offset format is used, the user should be fully aware
> of any possible kdump kernel memory requirement and has to organize the
> memory usage carefully.
> 
> Signed-off-by: Kairui Song 
> 
> ---
> Update from V1:
> - Use mem_encrypt_active() instead of "sme_active() || sev_active()"
> - Don't reserve extra memory when ",high" or "@offset" is used, and
>   don't print redundant message.
> - Fix coding style problem
> 
>  arch/x86/kernel/setup.c | 31 ---
>  1 file changed, 28 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index bbe35bf879f5..221beb10c55d 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -528,7 +528,7 @@ static int __init reserve_crashkernel_low(void)
>  
>  static void __init reserve_crashkernel(void)
>  {
> - unsigned long long crash_size, crash_base, total_mem;
> + unsigned long long crash_size, crash_base, total_mem, mem_enc_req;
>   bool high = false;
>   int ret;
>  
> @@ -550,6 +550,15 @@ static void __init reserve_crashkernel(void)
>   return;
>   }
>  
> + /*
> +  * When SME/SEV is active, an extra SWIOTLB region is always
> +  * required.
> +  */
> + if (mem_encrypt_active())
> + mem_enc_req = ALIGN(swiotlb_size_or_default(), SZ_1M);
> + else
> + mem_enc_req = 0;

Hmm, ugly.

You set mem_enc_req here ...

> +
>   /* 0 means: find the address automatically */
>   if (!crash_base) {
>   /*
> @@ -563,11 +572,19 @@ static void __init reserve_crashkernel(void)
>   if (!high)
>   crash_base = memblock_find_in_range(CRASH_ALIGN,
>   CRASH_ADDR_LOW_MAX,
> - crash_size, CRASH_ALIGN);
> - if (!crash_base)
> + crash_size + mem_enc_req,
> + CRASH_ALIGN);
> + /*
> +  * For high reservation, an extra low memory for SWIOTLB will
> +  * always be reserved later, so no need to reserve extra
> +  * memory for memory encryption case here.
> +  */
> + if (!crash_base) {
> + mem_enc_req = 0;

... but you clear it here...

>   crash_base = memblock_find_in_range(CRASH_ALIGN,
>   CRASH_ADDR_HIGH_MAX,
>   crash_size, CRASH_ALIGN);
> + }
>   if (!crash_base) {
>   pr_info("crashkernel reservation failed - No suitable area found.\n");
>   return;
> @@ -575,6 +592,7 @@ static void __init reserve_crashkernel(void)
>   } else {
>   unsigned long long start;
>  
> + mem_enc_req = 0;

... and here...

>   start = memblock_find_in_range(crash_base,
>  crash_base + crash_size,
>  crash_size, 1 << 20);
> @@ -583,6 +601,13 @@ static void __init reserve_crashkernel(void)
>   return;
>   }
>   }
> +
> + if (mem_enc_req) {
> + pr_info("Memory encryption is active, crashkernel needs %ldMB extra memory\n",
> + (unsigned long)(mem_enc_req >> 20));
> + crash_size += mem_enc_req;
> + }

... and then you report only when it is still set.

How about you carve out that if (!crash_base) { ... } else { } piece
into a separate function without any further changes - only code
movement? That is your patch 1.
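
To make the control flow under discussion easier to follow, here is a minimal userspace sketch (not kernel code) of the decision the patch encodes: extra SWIOTLB space is added only for an automatically placed low reservation. The helper name crash_extra_for_mem_encrypt() and its boolean parameters are hypothetical; only the 1 MiB alignment and the ",high" / "@offset" exceptions come from the patch description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_1M (1ULL << 20)

/* Round x up to a multiple of a; a must be a non-zero power of two. */
static uint64_t align_up(uint64_t x, uint64_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/*
 * Hypothetical helper: number of extra bytes to add to the crashkernel
 * reservation. "high" means the reservation ends up above 4G (either
 * requested with ",high" or after the low search failed), "has_offset"
 * means the user passed "@offset". In both cases nothing extra is added.
 */
static uint64_t crash_extra_for_mem_encrypt(bool mem_encrypt_active,
					    bool high, bool has_offset,
					    uint64_t swiotlb_size)
{
	if (!mem_encrypt_active || high || has_offset)
		return 0;
	return align_up(swiotlb_size, SZ_1M);
}
```

With a SWIOTLB size of 64 MiB plus one byte the helper returns 65 MiB; with high or has_offset set it returns 0, which corresponds to the places where the patch clears mem_enc_req.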

Your patch 2 is then adding the mem_encrypt_active() check in the if
(!crash_base && !high) case, i.e., only 

Re: [PATCH V40 08/29] kexec_file: split KEXEC_VERIFY_SIG into KEXEC_SIG and KEXEC_SIG_FORCE

2019-08-30 Thread Philipp Rudo
Hi Matthew,

found a typo ...

On Mon, 19 Aug 2019 17:17:44 -0700
Matthew Garrett  wrote:

[...]

> index 6d0635ceddd0..9b4f37a4edf1 100644
> --- a/arch/s390/kernel/kexec_elf.c
> +++ b/arch/s390/kernel/kexec_elf.c
> @@ -130,7 +130,7 @@ static int s390_elf_probe(const char *buf, unsigned long len)
>  const struct kexec_file_ops s390_kexec_elf_ops = {
>   .probe = s390_elf_probe,
>   .load = s390_elf_load,
> -#ifdef CONFIG_KEXEC_VERIFY_SIG
> +#ifdef CONFIG_KEXEC__SIG
  ^^
... here.
>   .verify_sig = s390_verify_sig,
> -#endif /* CONFIG_KEXEC_VERIFY_SIG */
> +#endif /* CONFIG_KEXEC_SIG */
>  };

Thanks
Philipp



Re: [PATCHv2 0/4] x86/mce: protect nr_cpus from rebooting by broadcast mce

2019-08-30 Thread Borislav Petkov
On Tue, Aug 27, 2019 at 11:02:19AM +0800, Pingfan Liu wrote:
> v1 -> v2: fix compile warning and error on x86_32
> 
> 
> This series includes two related groups:
> [1-3/4]: protect nr_cpus from rebooting by broadcast mce
> [4/4]: improve "kexec -l" robustness against broadcast mce
> 
> When I tried to fix [1], Thomas raised concern about the nr_cpus' vulnerability
> to unexpected rebooting by broadcast mce. After analysis, I think only the
> following first case suffers from the rebooting by broadcast mce. [1-3/4] aims
> to fix that issue.
> 
> *** Background ***
> 
> On x86 it's required to have all logical CPUs set CR4.MCE=1. Otherwise, a
> broadcast MCE observing CR4.MCE=0b on any core will shut down the machine.
> 
> The option 'nosmt' has already complied with the above rule by Thomas's patch.
> For details, refer to 506a66f3748 (Revert "x86/apic: Ignore secondary threads
> if nosmt=force")
> 
> But for nr_cpus option, the exposure to broadcast MCE is a little complicated,
> and can be categorized into three cases.

One thing is not clear to me: are you "fixing" a hypothetical case here
or have you *actually* experienced an MCE happening while kdumping with
nr_cpus < num_online_cpus()?

Btw, pls do not use lkml.org to refer to previous mails but

http://lkml.kernel.org/r/

Thx.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: crash: `kmem -s` reported "kmem: dma-kmalloc-512: slab: ffffe192c0001000 invalid freepointer: e5ffef4e9a040b7e" on a dumped vmcore

2019-08-30 Thread lijiang
On 2019-08-17 15:23, lijiang wrote:
> On 2019-08-11 10:29, lijiang wrote:
>> On 2019-08-09 06:37, Lendacky, Thomas wrote:
>>> On 8/1/19 8:05 PM, Dave Young wrote:
>>>> Add kexec cc list.
>>>> On 08/01/19 at 11:02pm, lijiang wrote:
> Hi, Tom
>
> Recently, I ran into a problem with SME and used the crash tool to check the
> vmcore as follows:
>
> crash> kmem -s | grep -i invalid
> kmem: dma-kmalloc-512: slab: ffffe192c0001000 invalid freepointer: e5ffef4e9a040b7e
> kmem: dma-kmalloc-512: slab: ffffe192c0001000 invalid freepointer: e5ffef4e9a040b7e
>
> The crash tool reported the above error. The main reason is probably that the
> kernel does not correctly handle the first 640k region when SME is enabled.
>
> When SME is enabled, the kernel and initramfs images are loaded into
> decrypted memory, and the backup area (first 640k) is also mapped as
> decrypted, but the first 640k of data is copied to the backup area in
> purgatory(). Please refer to this file: arch/x86/purgatory/purgatory.c
> ..
> static int copy_backup_region(void)
> {
>  if (purgatory_backup_dest) {
>  memcpy((void *)purgatory_backup_dest,
> (void *)purgatory_backup_src, purgatory_backup_sz);
>  }
>  return 0;
> }
> ..
>
> arch/x86/kernel/machine_kexec_64.c
> ..
> machine_kexec_prepare()->
> arch_update_purgatory()->
> .
>
> Actually, the first 640k area is encrypted in the first kernel when SME is
> enabled. Here the kernel copies the first 640k of data to the backup area in
> purgatory(). Because the backup area is mapped as decrypted, this copy
> operation causes the first 640k of data to be decrypted (decoded) and saved
> to the backup area, but the kernel is probably not aware of SME in
> purgatory(), which causes it to read the first 640k incorrectly.
>
> In addition, I hacked the kernel code as follows:
>
> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
> index 7bcc92add72c..a51631d36a7a 100644
> --- a/fs/proc/vmcore.c
> +++ b/fs/proc/vmcore.c
> @@ -377,6 +378,16 @@ static ssize_t __read_vmcore(char *buffer, size_t buflen, loff_t *fpos,
>  m->offset + m->size - *fpos,
>  buflen);
>  start = m->paddr + *fpos - m->offset;
> +   if (m->paddr == 0x73f6) {//the backup area's start address:0x73f6
> +   tmp = read_from_oldmem(buffer, tsz, &start,
> +   userbuf, false);
> +   } else
>  tmp = read_from_oldmem(buffer, tsz, &start,
> userbuf, mem_encrypt_active());
>  if (tmp < 0)
>
> Here, I used the crash tool to check the vmcore, and I can see that the
> backup area is decrypted, except for dma-kmalloc-512. So I suspect that the
> kernel did not correctly copy the first 640k of data to the backup area. Do
> you happen to know how to deal with the first 640k area in purgatory() when
> SME is enabled? Any ideas?
>>>
>>> I'm not all that familiar with kexec and purgatory, etc., but I think
>>> that you want to setup the page table that is active when purgatory runs
>>> so that the src and dest both have the SME encryption mask set in their
>>> respective page table entries. This way, when the copy is performed,
>>> everything is copied correctly. 
>>
>> Exactly. That's just what i was thinking.
>>
> 
> I tried to set up the 1:1 mapping in init_pgtable() with the memory
> encryption mask, but that still did not correctly access the encrypted
> memory in purgatory(). I'm not sure whether I missed anything else; I'm
> still digging into it.
> 
> 

As we know, the kdump kernel reuses the first 640k region, so the old content
of the first 640k area is copied to a backup area, which is done in
purgatory(). When dumping the vmcore, the kdump kernel reads the old content
of the first 640k area from the backup area.

According to the above description, when SME is enabled in the first kernel,
the kernel has to set up the identity mapping for the first 640k area with the
encryption mask so that it can correctly access the old memory, and also set
up the identity mapping for the backup region with the encryption mask. But
the kdump kernel won't properly deal with the encrypted memory before SME is
enabled, which causes the kdump kernel to fail to boot.

So I planned to set up a temporary page table mapping with the encryption mask
for the first 640k area and the backup region in purgatory().
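
The idea of mapping both regions with the encryption mask can be illustrated with a small userspace sketch. It builds 4 KiB identity-mapped page table entries and ORs in an encryption mask. The helper names and the use of bit 47 as the encryption bit in the usage note are assumptions for illustration only; the real bit position is reported by the CPU, and the real page-table setup in the kernel differs.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_PRESENT   (1ULL << 0)
#define PTE_RW        (1ULL << 1)
#define PAGE_SIZE_4K  4096ULL

/*
 * Build an identity-map PTE for the page at paddr.
 * enc_mask is the memory-encryption bit (0 when encryption is off).
 */
static uint64_t ident_pte(uint64_t paddr, uint64_t enc_mask)
{
	assert((paddr & (PAGE_SIZE_4K - 1)) == 0);	/* must be page aligned */
	return paddr | PTE_PRESENT | PTE_RW | enc_mask;
}

/* Number of 4 KiB entries needed to cover a range of the given size. */
static unsigned int ptes_for_range(uint64_t bytes)
{
	return (unsigned int)((bytes + PAGE_SIZE_4K - 1) / PAGE_SIZE_4K);
}
```

For example, covering the first 640k takes ptes_for_range(640 * 1024) = 160 entries, and with an assumed mask of 1ULL << 47 the entry for physical page 0x1000 is 0x800000001003. Both the source and the destination of the copy would need entries built with the same mask.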

> I guess that should make the 1:1 mapping 

Re: [PATCH] x86: Fix PAGE_OFFSET for kernels since 4.20

2019-08-30 Thread Donald Buczek

Dear Baoquan,

On 8/30/19 12:05 PM, Bhupesh Sharma wrote:

On Fri, Aug 30, 2019 at 3:04 PM Donald Buczek  wrote:


Dear Baoquan,

On 8/30/19 11:23 AM, Baoquan He wrote:

On 08/30/19 at 11:12am, Donald Buczek wrote:

Linux kernel commit d52888aa2753 ("x86/mm: Move LDT remap out of KASLR
region on 5-level paging") changed the base of the direct mapping
from 0xffff880000000000 to 0xffff888000000000. This was merged
into v4.20-rc2.


A good catch and necessary fix, thanks.

Does it have issue in makedumpfile?


We don't use makedumpfile. We use `cp /proc/vmcore /mnt/crash.vmcore` in the 
panic kernel.


That shouldn't be a problem in makedumpfile as we have a generic way
to calculate the PAGE_OFFSET value there from the PT_LOADs in the
'/proc/kcore' file (which I mentioned in the other email conversation,
see [0]):

static int
get_page_offset_x86_64(void)
{
<..snip..>
 if (get_num_pt_loads()) {
 /*
  * Linux 4.19 (only) adds KCORE_REMAP PT_LOADs, which have
  * virt_start < __START_KERNEL_map, to /proc/kcore. In order
  * not to select them, we select the last valid PT_LOAD.
  */
 for (i = 0;
 get_pt_load(i, &phys_start, NULL, &virt_start, NULL);
 i++) {
 if (virt_start != NOT_KV_ADDR
 && virt_start < __START_KERNEL_map
 && phys_start != NOT_PADDR) {
 page_offset = virt_start - phys_start;
 }
 }
 if (page_offset) {
 info->page_offset = page_offset;
 DEBUG_MSG("page_offset  : %lx (pt_load)\n",
 info->page_offset);
 return TRUE;
 }
 }
<..snip..>

Also as I mentioned in the other thread, I don't think adding
different MACRO value for a kernel version is a long-term maintainable
approach. Instead I am working on adding a similar functionality as
present in makedumpfile to make the PAGE_OFFSET calculation generic.
Only if we fail to calculate PAGE_OFFSET through a generic method
should we fall back on MACRO values for backward compatibility.


Very good.


I will try to post the patch for reviews by tomorrow.


I'll be happy to test that.

Best
  Donald



[0]. https://lkml.org/lkml/2019/8/28/1060

Thanks,
Bhupesh


Without this patch, the file /mnt/crash.vmcore has the wrong vaddr in the ELF
headers and can't be processed by gdb or crash.

With this patch, the file has the correct vaddr and can be processed by gdb and
crash.

Btw:  I've got four strange error replies for my mail.

  Date: 30 Aug 2019 05:13:31 EDT
  From: mailer_dae...@email.uscc.net
  To: buc...@molgen.mpg.de
  Message to 7867650...@email.uscc.net failed.

I hope, my patch reaches the right people anyway.

Best
Donald




#ifdef __x86_64__
#define __PAGE_OFFSET_ORIG  (0xffff810000000000) /* 2.6.26, or former */
#define __PAGE_OFFSET_2_6_27 (0xffff880000000000) /* 2.6.27, or later  */
#define __PAGE_OFFSET_5LEVEL (0xff10000000000000) /* 5-level page table */
...
#endif

Thanks
Baoquan



Update to new address accordingly.
---
   kexec/arch/i386/crashdump-x86.c | 2 ++
   kexec/arch/i386/crashdump-x86.h | 3 ++-
   2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/kexec/arch/i386/crashdump-x86.c b/kexec/arch/i386/crashdump-x86.c
index a2aea31..c79791f 100644
--- a/kexec/arch/i386/crashdump-x86.c
+++ b/kexec/arch/i386/crashdump-x86.c
@@ -61,6 +61,8 @@ static int get_kernel_page_offset(struct kexec_info *UNUSED(info),

  if (kv < KERNEL_VERSION(2, 6, 27))
  elf_info->page_offset = X86_64_PAGE_OFFSET_PRE_2_6_27;
+else if (kv < KERNEL_VERSION(4, 20, 0))
+elf_info->page_offset = X86_64_PAGE_OFFSET_PRE_4_20_0;
  else
  elf_info->page_offset = X86_64_PAGE_OFFSET;
  }
diff --git a/kexec/arch/i386/crashdump-x86.h b/kexec/arch/i386/crashdump-x86.h
index ddee19f..e4fdc82 100644
--- a/kexec/arch/i386/crashdump-x86.h
+++ b/kexec/arch/i386/crashdump-x86.h
@@ -13,7 +13,8 @@ int load_crashdump_segments(struct kexec_info *info, char *mod_cmdline,

   #define X86_64__START_KERNEL_map   0xffffffff80000000ULL
   #define X86_64_PAGE_OFFSET_PRE_2_6_27  0xffff810000000000ULL
-#define X86_64_PAGE_OFFSET  0xffff880000000000ULL
+#define X86_64_PAGE_OFFSET_PRE_4_20_0   0xffff880000000000ULL
+#define X86_64_PAGE_OFFSET  0xffff888000000000ULL

   #define X86_64_MAXMEM  0x3fffffffffffUL

--
2.22.0


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec



--
Donald Buczek
buc...@molgen.mpg.de
Tel: +49 30 8413 1433





Re: [PATCH] x86: Fix PAGE_OFFSET for kernels since 4.20

2019-08-30 Thread Bhupesh Sharma
On Fri, Aug 30, 2019 at 3:04 PM Donald Buczek  wrote:
>
> Dear Baoquan,
>
> On 8/30/19 11:23 AM, Baoquan He wrote:
> > On 08/30/19 at 11:12am, Donald Buczek wrote:
> >> Linux kernel commit d52888aa2753 ("x86/mm: Move LDT remap out of KASLR
> >> region on 5-level paging") changed the base of the direct mapping
> >> from 0xffff880000000000 to 0xffff888000000000. This was merged
> >> into v4.20-rc2.
> >
> > A good catch and necessary fix, thanks.
> >
> > Does it have issue in makedumpfile?
>
> We don't use makedumpfile. We use `cp /proc/vmcore /mnt/crash.vmcore` in the 
> panic kernel.

That shouldn't be a problem in makedumpfile as we have a generic way
to calculate the PAGE_OFFSET value there from the PT_LOADs in the
'/proc/kcore' file (which I mentioned in the other email conversation,
see [0]):

static int
get_page_offset_x86_64(void)
{
<..snip..>
if (get_num_pt_loads()) {
/*
 * Linux 4.19 (only) adds KCORE_REMAP PT_LOADs, which have
 * virt_start < __START_KERNEL_map, to /proc/kcore. In order
 * not to select them, we select the last valid PT_LOAD.
 */
for (i = 0;
get_pt_load(i, &phys_start, NULL, &virt_start, NULL);
i++) {
if (virt_start != NOT_KV_ADDR
&& virt_start < __START_KERNEL_map
&& phys_start != NOT_PADDR) {
page_offset = virt_start - phys_start;
}
}
if (page_offset) {
info->page_offset = page_offset;
DEBUG_MSG("page_offset  : %lx (pt_load)\n",
info->page_offset);
return TRUE;
}
}
<..snip..>

Also as I mentioned in the other thread, I don't think adding
different MACRO value for a kernel version is a long-term maintainable
approach. Instead I am working on adding a similar functionality as
present in makedumpfile to make the PAGE_OFFSET calculation generic.
Only if we fail to calculate PAGE_OFFSET through a generic method
should we fall back on MACRO values for backward compatibility.
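
The "generic method first, macro fallback second" approach described above can be sketched in a few lines of standalone C. This is an illustration only, not the makedumpfile or kexec-tools code: struct pt_load, the sentinel values, and the function name page_offset_from_pt_loads() are hypothetical stand-ins, and the caller supplies the fallback constant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in sentinels; the real tools define their own values. */
#define NOT_KV_ADDR  UINT64_MAX
#define NOT_PADDR    UINT64_MAX

/* Hypothetical, simplified view of one PT_LOAD program header. */
struct pt_load {
	uint64_t phys_start;
	uint64_t virt_start;
};

/*
 * Derive PAGE_OFFSET from the last valid PT_LOAD that lies below the
 * kernel text mapping; return the fallback if none qualifies.
 */
static uint64_t page_offset_from_pt_loads(const struct pt_load *loads,
					  size_t n,
					  uint64_t start_kernel_map,
					  uint64_t fallback)
{
	uint64_t page_offset = 0;

	for (size_t i = 0; i < n; i++) {
		if (loads[i].virt_start != NOT_KV_ADDR &&
		    loads[i].virt_start < start_kernel_map &&
		    loads[i].phys_start != NOT_PADDR)
			page_offset = loads[i].virt_start - loads[i].phys_start;
	}
	return page_offset ? page_offset : fallback;
}
```

The design point is that the version-specific constant is consulted only when no usable PT_LOAD exists, so a future change of the direct-mapping base would not require a new macro.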

I will try to post the patch for reviews by tomorrow.

[0]. https://lkml.org/lkml/2019/8/28/1060

Thanks,
Bhupesh

> Without this patch, the file /mnt/crash.vmcore has the wrong vaddr in the ELF
> headers and can't be processed by gdb or crash.
>
> With this patch, the file has the correct vaddr and can be processed by gdb
> and crash.
>
> Btw:  I've got four strange error replies for my mail.
>
>  Date: 30 Aug 2019 05:13:31 EDT
>  From: mailer_dae...@email.uscc.net
>  To: buc...@molgen.mpg.de
>  Message to 7867650...@email.uscc.net failed.
>
> I hope, my patch reaches the right people anyway.
>
> Best
>Donald
>
>
> >
> > #ifdef __x86_64__
> > #define __PAGE_OFFSET_ORIG  (0xffff810000000000) /* 2.6.26, or former */
> > #define __PAGE_OFFSET_2_6_27 (0xffff880000000000) /* 2.6.27, or later  */
> > #define __PAGE_OFFSET_5LEVEL (0xff10000000000000) /* 5-level page table */
> > ...
> > #endif
> >
> > Thanks
> > Baoquan
> >
> >>
> >> Update to new address accordingly.
> >> ---
> >>   kexec/arch/i386/crashdump-x86.c | 2 ++
> >>   kexec/arch/i386/crashdump-x86.h | 3 ++-
> >>   2 files changed, 4 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/kexec/arch/i386/crashdump-x86.c b/kexec/arch/i386/crashdump-x86.c
> >> index a2aea31..c79791f 100644
> >> --- a/kexec/arch/i386/crashdump-x86.c
> >> +++ b/kexec/arch/i386/crashdump-x86.c
> >> @@ -61,6 +61,8 @@ static int get_kernel_page_offset(struct kexec_info *UNUSED(info),
> >>
> >>  if (kv < KERNEL_VERSION(2, 6, 27))
> >>  elf_info->page_offset = X86_64_PAGE_OFFSET_PRE_2_6_27;
> >> +else if (kv < KERNEL_VERSION(4, 20, 0))
> >> +elf_info->page_offset = X86_64_PAGE_OFFSET_PRE_4_20_0;
> >>  else
> >>  elf_info->page_offset = X86_64_PAGE_OFFSET;
> >>  }
> >> diff --git a/kexec/arch/i386/crashdump-x86.h b/kexec/arch/i386/crashdump-x86.h
> >> index ddee19f..e4fdc82 100644
> >> --- a/kexec/arch/i386/crashdump-x86.h
> >> +++ b/kexec/arch/i386/crashdump-x86.h
> >> @@ -13,7 +13,8 @@ int load_crashdump_segments(struct kexec_info *info, char *mod_cmdline,
> >>
> >>   #define X86_64__START_KERNEL_map   0xffffffff80000000ULL
> >>   #define X86_64_PAGE_OFFSET_PRE_2_6_27  0xffff810000000000ULL
> >> -#define X86_64_PAGE_OFFSET  0xffff880000000000ULL
> >> +#define X86_64_PAGE_OFFSET_PRE_4_20_0   0xffff880000000000ULL
> >> +#define X86_64_PAGE_OFFSET  0xffff888000000000ULL
> >>
> >>   #define X86_64_MAXMEM  0x3fffffffffffUL
> >>
> >> --
> >> 2.22.0
> >>
> >>
>
>
> --
> Donald Buczek
> buc...@molgen.mpg.de
> Tel: +49 30 8413 1433


Re: [PATCH] x86: Fix PAGE_OFFSET for kernels since 4.20

2019-08-30 Thread Donald Buczek

Dear Baoquan,

On 8/30/19 11:23 AM, Baoquan He wrote:

On 08/30/19 at 11:12am, Donald Buczek wrote:

Linux kernel commit d52888aa2753 ("x86/mm: Move LDT remap out of KASLR
region on 5-level paging") changed the base of the direct mapping
from 0xffff880000000000 to 0xffff888000000000. This was merged
into v4.20-rc2.


A good catch and necessary fix, thanks.

Does it have issue in makedumpfile?


We don't use makedumpfile. We use `cp /proc/vmcore /mnt/crash.vmcore` in the 
panic kernel.

Without this patch, the file /mnt/crash.vmcore has the wrong vaddr in the ELF
headers and can't be processed by gdb or crash.

With this patch, the file has the correct vaddr and can be processed by gdb and
crash.

Btw:  I've got four strange error replies for my mail.

Date: 30 Aug 2019 05:13:31 EDT
From: mailer_dae...@email.uscc.net
To: buc...@molgen.mpg.de
Message to 7867650...@email.uscc.net failed.

I hope, my patch reaches the right people anyway.

Best
  Donald




#ifdef __x86_64__
#define __PAGE_OFFSET_ORIG  (0xffff810000000000) /* 2.6.26, or former */
#define __PAGE_OFFSET_2_6_27 (0xffff880000000000) /* 2.6.27, or later  */
#define __PAGE_OFFSET_5LEVEL (0xff10000000000000) /* 5-level page table */
...
#endif

Thanks
Baoquan



Update to new address accordingly.
---
  kexec/arch/i386/crashdump-x86.c | 2 ++
  kexec/arch/i386/crashdump-x86.h | 3 ++-
  2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/kexec/arch/i386/crashdump-x86.c b/kexec/arch/i386/crashdump-x86.c
index a2aea31..c79791f 100644
--- a/kexec/arch/i386/crashdump-x86.c
+++ b/kexec/arch/i386/crashdump-x86.c
@@ -61,6 +61,8 @@ static int get_kernel_page_offset(struct kexec_info *UNUSED(info),
  
  		if (kv < KERNEL_VERSION(2, 6, 27))
elf_info->page_offset = X86_64_PAGE_OFFSET_PRE_2_6_27;
+   else if (kv < KERNEL_VERSION(4, 20, 0))
+   elf_info->page_offset = X86_64_PAGE_OFFSET_PRE_4_20_0;
else
elf_info->page_offset = X86_64_PAGE_OFFSET;
}
diff --git a/kexec/arch/i386/crashdump-x86.h b/kexec/arch/i386/crashdump-x86.h
index ddee19f..e4fdc82 100644
--- a/kexec/arch/i386/crashdump-x86.h
+++ b/kexec/arch/i386/crashdump-x86.h
@@ -13,7 +13,8 @@ int load_crashdump_segments(struct kexec_info *info, char *mod_cmdline,
  
  #define X86_64__START_KERNEL_map	0xffffffff80000000ULL
  #define X86_64_PAGE_OFFSET_PRE_2_6_27 0xffff810000000000ULL
-#define X86_64_PAGE_OFFSET 0xffff880000000000ULL
+#define X86_64_PAGE_OFFSET_PRE_4_20_0  0xffff880000000000ULL
+#define X86_64_PAGE_OFFSET 0xffff888000000000ULL
  
  #define X86_64_MAXMEM		0x3fffffffffffUL
  
--
2.22.0





--
Donald Buczek
buc...@molgen.mpg.de
Tel: +49 30 8413 1433



Re: [PATCH] x86: Fix PAGE_OFFSET for kernels since 4.20

2019-08-30 Thread Baoquan He
On 08/30/19 at 11:12am, Donald Buczek wrote:
> Linux kernel commit d52888aa2753 ("x86/mm: Move LDT remap out of KASLR
> region on 5-level paging") changed the base of the direct mapping
> from 0xffff880000000000 to 0xffff888000000000. This was merged
> into v4.20-rc2.

A good catch and necessary fix, thanks.

Does it have issue in makedumpfile?

#ifdef __x86_64__
#define __PAGE_OFFSET_ORIG  (0xffff810000000000) /* 2.6.26, or former */
#define __PAGE_OFFSET_2_6_27 (0xffff880000000000) /* 2.6.27, or later  */
#define __PAGE_OFFSET_5LEVEL (0xff10000000000000) /* 5-level page table */
...
#endif

Thanks
Baoquan

> 
> Update to new address accordingly.
> ---
>  kexec/arch/i386/crashdump-x86.c | 2 ++
>  kexec/arch/i386/crashdump-x86.h | 3 ++-
>  2 files changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/kexec/arch/i386/crashdump-x86.c b/kexec/arch/i386/crashdump-x86.c
> index a2aea31..c79791f 100644
> --- a/kexec/arch/i386/crashdump-x86.c
> +++ b/kexec/arch/i386/crashdump-x86.c
> @@ -61,6 +61,8 @@ static int get_kernel_page_offset(struct kexec_info *UNUSED(info),
>  
>   if (kv < KERNEL_VERSION(2, 6, 27))
>   elf_info->page_offset = X86_64_PAGE_OFFSET_PRE_2_6_27;
> + else if (kv < KERNEL_VERSION(4, 20, 0))
> + elf_info->page_offset = X86_64_PAGE_OFFSET_PRE_4_20_0;
>   else
>   elf_info->page_offset = X86_64_PAGE_OFFSET;
>   }
> diff --git a/kexec/arch/i386/crashdump-x86.h b/kexec/arch/i386/crashdump-x86.h
> index ddee19f..e4fdc82 100644
> --- a/kexec/arch/i386/crashdump-x86.h
> +++ b/kexec/arch/i386/crashdump-x86.h
> @@ -13,7 +13,8 @@ int load_crashdump_segments(struct kexec_info *info, char *mod_cmdline,
>  
>  #define X86_64__START_KERNEL_map 0xffffffff80000000ULL
>  #define X86_64_PAGE_OFFSET_PRE_2_6_27 0xffff810000000000ULL
> -#define X86_64_PAGE_OFFSET   0xffff880000000000ULL
> +#define X86_64_PAGE_OFFSET_PRE_4_20_0 0xffff880000000000ULL
> +#define X86_64_PAGE_OFFSET   0xffff888000000000ULL
>  
>  #define X86_64_MAXMEM 0x3fffffffffffUL
>  
> -- 
> 2.22.0
> 
> 



[PATCH] x86: Fix PAGE_OFFSET for kernels since 4.20

2019-08-30 Thread Donald Buczek
Linux kernel commit d52888aa2753 ("x86/mm: Move LDT remap out of KASLR
region on 5-level paging") changed the base of the direct mapping
from 0xffff880000000000 to 0xffff888000000000. This was merged
into v4.20-rc2.

Update to new address accordingly.
---
 kexec/arch/i386/crashdump-x86.c | 2 ++
 kexec/arch/i386/crashdump-x86.h | 3 ++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/kexec/arch/i386/crashdump-x86.c b/kexec/arch/i386/crashdump-x86.c
index a2aea31..c79791f 100644
--- a/kexec/arch/i386/crashdump-x86.c
+++ b/kexec/arch/i386/crashdump-x86.c
@@ -61,6 +61,8 @@ static int get_kernel_page_offset(struct kexec_info *UNUSED(info),
 
if (kv < KERNEL_VERSION(2, 6, 27))
elf_info->page_offset = X86_64_PAGE_OFFSET_PRE_2_6_27;
+   else if (kv < KERNEL_VERSION(4, 20, 0))
+   elf_info->page_offset = X86_64_PAGE_OFFSET_PRE_4_20_0;
else
elf_info->page_offset = X86_64_PAGE_OFFSET;
}
diff --git a/kexec/arch/i386/crashdump-x86.h b/kexec/arch/i386/crashdump-x86.h
index ddee19f..e4fdc82 100644
--- a/kexec/arch/i386/crashdump-x86.h
+++ b/kexec/arch/i386/crashdump-x86.h
@@ -13,7 +13,8 @@ int load_crashdump_segments(struct kexec_info *info, char *mod_cmdline,
 
 #define X86_64__START_KERNEL_map   0xffffffff80000000ULL
 #define X86_64_PAGE_OFFSET_PRE_2_6_27  0xffff810000000000ULL
-#define X86_64_PAGE_OFFSET 0xffff880000000000ULL
+#define X86_64_PAGE_OFFSET_PRE_4_20_0  0xffff880000000000ULL
+#define X86_64_PAGE_OFFSET 0xffff888000000000ULL
 
 #define X86_64_MAXMEM  0x3fffffffffffUL
 
-- 
2.22.0
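
For readers who want to try the three-way selection outside of kexec-tools, here is a minimal standalone sketch. The macro names are shortened and the function page_offset_for_version() is hypothetical; the constants are written out as full 64-bit values on the assumption that the direct-mapping bases are 0xffff810000000000, 0xffff880000000000 and 0xffff888000000000.

```c
#include <assert.h>
#include <stdint.h>

/* Same encoding idea as the kernel's version macro: one byte per field. */
#define KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Assumed full-width values of the direct-mapping base per kernel era. */
#define PAGE_OFFSET_PRE_2_6_27  0xffff810000000000ULL
#define PAGE_OFFSET_PRE_4_20_0  0xffff880000000000ULL
#define PAGE_OFFSET_CURRENT     0xffff888000000000ULL

/* Hypothetical helper mirroring the if / else if / else chain in the patch. */
static uint64_t page_offset_for_version(long kv)
{
	assert(kv >= 0);
	if (kv < KERNEL_VERSION(2, 6, 27))
		return PAGE_OFFSET_PRE_2_6_27;
	if (kv < KERNEL_VERSION(4, 20, 0))
		return PAGE_OFFSET_PRE_4_20_0;
	return PAGE_OFFSET_CURRENT;
}
```

Note that a version triple cannot distinguish v4.20-rc1 from later v4.20 builds, so a kernel built from before the change was merged in v4.20-rc2 would get the new value with this kind of check.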




[PATCH v6 4/4] kdump: update Documentation about crashkernel on arm64

2019-08-30 Thread Chen Zhou
Now we support crashkernel=X,[low] on arm64, update the Documentation.

Signed-off-by: Chen Zhou 
---
 Documentation/admin-guide/kdump/kdump.rst   | 13 +++--
 Documentation/admin-guide/kernel-parameters.txt | 12 +++-
 2 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/Documentation/admin-guide/kdump/kdump.rst b/Documentation/admin-guide/kdump/kdump.rst
index ac7e131..e55173e 100644
--- a/Documentation/admin-guide/kdump/kdump.rst
+++ b/Documentation/admin-guide/kdump/kdump.rst
@@ -299,7 +299,13 @@ Boot into System Kernel
"crashkernel=64M@16M" tells the system kernel to reserve 64 MB of memory
starting at physical address 0x01000000 (16MB) for the dump-capture kernel.
 
-   On x86 and x86_64, use "crashkernel=64M@16M".
+   On x86 use "crashkernel=64M@16M".
+
+   On x86_64, use "crashkernel=Y[@X]" to select a region under 4G first, and
+   fall back to reserve region above 4G when '@offset' hasn't been specified.
+   We can also use "crashkernel=X,high" to select a region above 4G, which
+   also tries to allocate at least 256M below 4G automatically and
+   "crashkernel=Y,low" can be used to allocate specified size low memory.
 
On ppc64, use "crashkernel=128M@32M".
 
@@ -316,8 +322,11 @@ Boot into System Kernel
kernel will automatically locate the crash kernel image within the
first 512MB of RAM if X is not given.
 
-   On arm64, use "crashkernel=Y[@X]".  Note that the start address of
+   On arm64, use "crashkernel=Y[@X]". Note that the start address of
the kernel, X if explicitly specified, must be aligned to 2MiB (0x200000).
+   If crashkernel=Z,low is specified simultaneously, reserve specified size
+   low memory for crash kdump kernel devices firstly and then reserve memory
+   above 4G.
 
 Load the Dump-capture Kernel
 
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 4c19719..069a122 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -708,6 +708,9 @@
[KNL, x86_64] select a region under 4G first, and
fall back to reserve region above 4G when '@offset'
hasn't been specified.
+   [KNL, arm64] If crashkernel=X,low is specified, reserve
+   specified size low memory for crash kdump kernel devices
+   firstly, and then reserve memory above 4G.
See Documentation/admin-guide/kdump/kdump.rst for 
further details.
 
crashkernel=range1:size1[,range2:size2,...][@offset]
@@ -732,12 +735,19 @@
requires at least 64M+32K low memory, also enough extra
low memory is needed to make sure DMA buffers for 32-bit
devices won't run out. Kernel would try to allocate at
-   at least 256M below 4G automatically.
+   least 256M below 4G automatically.
This one let user to specify own low range under 4G
for second kernel instead.
0: to disable low allocation.
It will be ignored when crashkernel=X,high is not used
or memory reserved is below 4G.
+   [KNL, arm64] range under 4G.
+   This one let user to specify own low range under 4G
+   for crash dump kernel instead.
+   Different with x86_64, kernel allocates specified size
+   physical memory region only when this parameter is specified
+   instead of trying to allocate at least 256M below 4G
+   automatically.
 
cryptomgr.notests
[KNL] Disable crypto self-tests
-- 
2.7.4
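
As an illustration of the "crashkernel=Y[@X]" form documented above, the following standalone sketch parses a size with an optional base and checks the 2MiB alignment that the text requires for arm64. It is a simplified, hypothetical parser (parse_size(), parse_crashkernel() and arm64_base_ok() are not kernel functions) and deliberately rejects the ",high", ",low" and range forms.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a number with an optional K/M/G suffix; *end points past it. */
static uint64_t parse_size(const char *s, const char **end)
{
	char *e;
	uint64_t v = strtoull(s, &e, 0);

	switch (*e) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		e++;
		break;
	default:
		break;
	}
	*end = e;
	return v;
}

struct crash_req {
	uint64_t size;
	uint64_t base;
	bool has_base;
};

/* Parse "Y[@X]"; returns false on a zero size or trailing characters. */
static bool parse_crashkernel(const char *arg, struct crash_req *out)
{
	const char *p;

	assert(arg && out);
	out->size = parse_size(arg, &p);
	out->base = 0;
	out->has_base = false;
	if (out->size == 0)
		return false;
	if (*p == '@') {
		out->base = parse_size(p + 1, &p);
		out->has_base = true;
	}
	return *p == '\0';
}

/* arm64 rule from the text: an explicit start address must be 2MiB aligned. */
static bool arm64_base_ok(const struct crash_req *r)
{
	return !r->has_base || (r->base & ((2ULL << 20) - 1)) == 0;
}
```

With this sketch, "64M@16M" yields a 64 MiB size at base 0x01000000 and passes the alignment check, while "256M@1M" parses but fails it.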




[PATCH v6 3/4] arm64: kdump: add memory for devices by DT property, low-memory-range

2019-08-30 Thread Chen Zhou
If we want to reserve crashkernel above 4G, we could use parameters
"crashkernel=X crashkernel=Y,low", in this case, specified size low
memory is reserved for crash dump kernel devices and never mapped by
the first kernel. This memory range is advertised to crash dump kernel
via DT property under /chosen,
linux,low-memory-range=

The crash dump kernel reads this property at boot time and calls
memblock_add() after memblock_cap_memory_range() has been called.

Signed-off-by: Chen Zhou 
---
 arch/arm64/mm/init.c | 30 +-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index c99f845..a376b18 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -294,6 +294,26 @@ static int __init early_mem(char *p)
 }
 early_param("mem", early_mem);
 
+static int __init early_init_dt_scan_lowmem(unsigned long node,
+   const char *uname, int depth, void *data)
+{
+   struct memblock_region *lowmem = data;
+   const __be32 *reg;
+   int len;
+
+   if (depth != 1 || strcmp(uname, "chosen") != 0)
+   return 0;
+
+   reg = of_get_flat_dt_prop(node, "linux,low-memory-range", &len);
+   if (!reg || (len < (dt_root_addr_cells + dt_root_size_cells)))
+   return 1;
+
+   lowmem->base = dt_mem_next_cell(dt_root_addr_cells, &reg);
+   lowmem->size = dt_mem_next_cell(dt_root_size_cells, &reg);
+
+   return 1;
+}
+
 static int __init early_init_dt_scan_usablemem(unsigned long node,
const char *uname, int depth, void *data)
 {
@@ -324,13 +344,21 @@ static void __init fdt_enforce_memory_region(void)
 
if (reg.size)
memblock_cap_memory_range(reg.base, reg.size);
+
+   of_scan_flat_dt(early_init_dt_scan_lowmem, &reg);
+
+   if (reg.size)
+   memblock_add(reg.base, reg.size);
 }
 
 void __init arm64_memblock_init(void)
 {
const s64 linear_region_size = -(s64)PAGE_OFFSET;
 
-   /* Handle linux,usable-memory-range property */
+   /*
+* Handle linux,usable-memory-range and linux,low-memory-range
+* properties.
+*/
fdt_enforce_memory_region();
 
/* Remove memory above our supported physical address size */
-- 
2.7.4



[PATCH v6 0/4] support reserving crashkernel above 4G on arm64 kdump

2019-08-30 Thread Chen Zhou
I have been busy with other things, so this version took a long time to
post.

This patch series enables reserving the crashkernel above 4G on arm64.

There are the following issues in arm64 kdump:
1. We use crashkernel=X to reserve the crashkernel below 4G, which fails
when there is not enough low memory.
2. Currently, crashkernel=Y@X can be used to reserve the crashkernel above 4G.
In this case, if swiotlb or DMA buffers are required, the crash dump kernel
fails to boot because there is no low memory available for allocation.

To solve these issues, introduce crashkernel=X,low to reserve low memory
of the specified size.
crashkernel=X tries to reserve memory for the crash dump kernel under
4G. If crashkernel=Y,low is specified simultaneously, first reserve low
memory of the specified size for crash dump kernel devices and then reserve
memory above 4G.

When the crashkernel is reserved above 4G, that is, when crashkernel=X,low
is specified simultaneously, the kernel should reserve low memory of the
specified size for crash dump kernel devices. So there may be two crash
kernel regions, one below 4G and the other above 4G.
To distinguish the low region from the high region without affecting the use
of kexec-tools, rename the low region to "Crash kernel (low)" and add the DT
property "linux,low-memory-range" to the crash dump kernel's dtb to pass the
low region.

In addition, kexec-tools needs to be modified:
arm64: kdump: add another DT property to crash dump kernel's dtb (see [1])

The previous changes and discussions can be retrieved from:

Changes since [v5]
- Move reserve_crashkernel_low() into kernel/crash_core.c.
- Delete crashkernel=X,high.
- Modify crashkernel=X,low.
If crashkernel=X,low is specified simultaneously, first reserve low memory of
the specified size for crash dump kernel devices and then reserve memory above
4G. In addition, rename crashk_low_res to "Crash kernel (low)" for arm64, and
then pass it to the crash dump kernel via the DT property
"linux,low-memory-range".
- Update Documentation/admin-guide/kdump/kdump.rst.

Changes since [v4]
- Reimplement memblock_cap_memory_ranges for multiple ranges by Mike.

Changes since [v3]
- Add memblock_cap_memory_ranges back for multiple ranges.
- Fix some compiling warnings.

Changes since [v2]
- Split patch "arm64: kdump: support reserving crashkernel above 4G" as
two. Put "move reserve_crashkernel_low() into kexec_core.c" in a separate
patch.

Changes since [v1]:
- Move common reserve_crashkernel_low() code into kernel/kexec_core.c.
- Remove memblock_cap_memory_ranges() that I added in v1 and implement that
in fdt_enforce_memory_region().
There are at most two crash kernel regions. In the two-region case, we cap
the memory range [min(regs[*].start), max(regs[*].end)] and then remove the
memory range in the middle.
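
The two-region capping described for v1 can be sketched as below. This is
only an illustration under stated assumptions: struct range and
cap_and_hole() are hypothetical names, ends are treated as exclusive, and
the two regions are assumed not to overlap. It is not the code from any
version of the series.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical half-open range [start, end); not a kernel type. */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Given two non-overlapping regions, compute
 *   cap  = [min(start), max(end))  - the range that memory is capped to
 *   hole = the gap between the two regions, to be removed afterwards
 */
static void cap_and_hole(const struct range *a, const struct range *b,
			 struct range *cap, struct range *hole)
{
	const struct range *lo = a->start <= b->start ? a : b;
	const struct range *hi = lo == a ? b : a;

	assert(lo->end <= hi->start);	/* regions must not overlap */

	cap->start = lo->start;
	cap->end = hi->end;
	hole->start = lo->end;
	hole->end = hi->start;
}
```

For a low region at [1G, 1G+256M) and a high region at [4G, 4G+512M),
the cap is [1G, 4G+512M) and the hole to remove is [1G+256M, 4G).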

[1]: http://lists.infradead.org/pipermail/kexec/2019-August/023569.html
[v1]: https://lkml.org/lkml/2019/4/2/1174
[v2]: https://lkml.org/lkml/2019/4/9/86
[v3]: https://lkml.org/lkml/2019/4/9/306
[v4]: https://lkml.org/lkml/2019/4/15/273
[v5]: https://lkml.org/lkml/2019/5/6/1360

Chen Zhou (4):
  x86: kdump: move reserve_crashkernel_low() into crash_core.c
  arm64: kdump: reserve crashkenel above 4G for crash dump kernel
  arm64: kdump: add memory for devices by DT property, low-memory-range
  kdump: update Documentation about crashkernel on arm64

 Documentation/admin-guide/kdump/kdump.rst   | 13 -
 Documentation/admin-guide/kernel-parameters.txt | 12 -
 arch/arm64/include/asm/kexec.h  |  3 ++
 arch/arm64/kernel/setup.c   |  8 ++-
 arch/arm64/mm/init.c| 61 +--
 arch/x86/include/asm/kexec.h|  3 ++
 arch/x86/kernel/setup.c | 65 +++--
 include/linux/crash_core.h  |  4 ++
 include/linux/kexec.h   |  1 -
 kernel/crash_core.c | 65 +
 10 files changed, 168 insertions(+), 67 deletions(-)

-- 
2.7.4



[PATCH v6 1/4] x86: kdump: move reserve_crashkernel_low() into crash_core.c

2019-08-30 Thread Chen Zhou
In preparation for supporting reserve_crashkernel_low() on arm64 as
x86_64 does, move reserve_crashkernel_low() into kernel/crash_core.c.

Note that on arm64 we reserve low memory if and only if crashkernel=X,low
is specified. Unlike x86_64, low memory is not reserved automatically.
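
The difference in policy can be sketched as follows. This is a simplified
illustration, not the kernel code: x86_low_size() and arm64_low_size() are
hypothetical helpers, and the swiotlb size is passed in as a plain
parameter. The x86_64 default mirrors the expression in the existing
reserve_crashkernel_low(): max(swiotlb size + 8M, 256M).

```c
#include <assert.h>
#include <stdint.h>

#define MiB(x) ((uint64_t)(x) << 20)

static uint64_t max_u64(uint64_t a, uint64_t b)
{
	return a > b ? a : b;
}

/*
 * x86_64-style policy: if crashkernel=Y,low was not given, fall back to
 * a default large enough for swiotlb plus other low buffers. An explicit
 * crashkernel=0,low yields 0, i.e. no low reservation.
 */
static uint64_t x86_low_size(int low_given, uint64_t low_req,
			     uint64_t swiotlb_size)
{
	assert(low_given || low_req == 0);

	if (low_given)
		return low_req;
	return max_u64(swiotlb_size + MiB(8), MiB(256));
}

/* arm64 policy from this series: reserve only what was asked for. */
static uint64_t arm64_low_size(int low_given, uint64_t low_req)
{
	assert(low_given || low_req == 0);

	return low_given ? low_req : 0;
}
```

With a 64M swiotlb and no ",low" option, the x86_64 policy yields 256M
while the arm64 policy yields 0.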

Signed-off-by: Chen Zhou 
---
 arch/x86/include/asm/kexec.h |  3 ++
 arch/x86/kernel/setup.c  | 65 
 include/linux/crash_core.h   |  4 +++
 include/linux/kexec.h|  1 -
 kernel/crash_core.c  | 65 
 5 files changed, 78 insertions(+), 60 deletions(-)

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 5e7d6b4..bf21d6c 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -18,6 +18,9 @@
 
 # define KEXEC_CONTROL_CODE_MAX_SIZE   2048
 
+/* 16M alignment for crash kernel regions */
+#define CRASH_ALIGN SZ_16M
+
 #ifndef __ASSEMBLY__
 
 #include 
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index bbe35bf..2b437c4 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -449,9 +449,6 @@ static void __init memblock_x86_reserve_range_setup_data(void)
 
 #ifdef CONFIG_KEXEC_CORE
 
-/* 16M alignment for crash kernel regions */
-#define CRASH_ALIGN SZ_16M
-
 /*
  * Keep the crash kernel below this limit.
  *
@@ -473,59 +470,6 @@ static void __init memblock_x86_reserve_range_setup_data(void)
 # define CRASH_ADDR_HIGH_MAX   SZ_64T
 #endif
 
-static int __init reserve_crashkernel_low(void)
-{
-#ifdef CONFIG_X86_64
-   unsigned long long base, low_base = 0, low_size = 0;
-   unsigned long total_low_mem;
-   int ret;
-
-   total_low_mem = memblock_mem_size(1UL << (32 - PAGE_SHIFT));
-
-   /* crashkernel=Y,low */
-   ret = parse_crashkernel_low(boot_command_line, total_low_mem, &low_size, &base);
-   if (ret) {
-   /*
-* two parts from lib/swiotlb.c:
-* -swiotlb size: user-specified with swiotlb= or default.
-*
-* -swiotlb overflow buffer: now hardcoded to 32k. We round it
-* to 8M for other buffers that may need to stay low too. Also
-* make sure we allocate enough extra low memory so that we
-* don't run out of DMA buffers for 32-bit devices.
-*/
-   low_size = max(swiotlb_size_or_default() + (8UL << 20), 256UL << 20);
-   } else {
-   /* passed with crashkernel=0,low ? */
-   if (!low_size)
-   return 0;
-   }
-
-   low_base = memblock_find_in_range(0, 1ULL << 32, low_size, CRASH_ALIGN);
-   if (!low_base) {
-   pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
-  (unsigned long)(low_size >> 20));
-   return -ENOMEM;
-   }
-
-   ret = memblock_reserve(low_base, low_size);
-   if (ret) {
-   pr_err("%s: Error reserving crashkernel low memblock.\n", __func__);
-   return ret;
-   }
-
-   pr_info("Reserving %ldMB of low memory at %ldMB for crashkernel (System low RAM: %ldMB)\n",
-   (unsigned long)(low_size >> 20),
-   (unsigned long)(low_base >> 20),
-   (unsigned long)(total_low_mem >> 20));
-
-   crashk_low_res.start = low_base;
-   crashk_low_res.end   = low_base + low_size - 1;
-   insert_resource(&iomem_resource, &crashk_low_res);
-#endif
-   return 0;
-}
-
 static void __init reserve_crashkernel(void)
 {
unsigned long long crash_size, crash_base, total_mem;
@@ -589,9 +533,12 @@ static void __init reserve_crashkernel(void)
return;
}
 
-   if (crash_base >= (1ULL << 32) && reserve_crashkernel_low()) {
-   memblock_free(crash_base, crash_size);
-   return;
+   if (crash_base >= (1ULL << 32)) {
+   if (reserve_crashkernel_low()) {
+   memblock_free(crash_base, crash_size);
+   return;
+   }
+   insert_resource(&iomem_resource, &crashk_low_res);
}
 
pr_info("Reserving %ldMB of memory at %ldMB for crashkernel (System RAM: %ldMB)\n",
diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
index 525510a..9192e43 100644
--- a/include/linux/crash_core.h
+++ b/include/linux/crash_core.h
@@ -6,6 +6,8 @@
 #include 
 #include 
 
+#include 
+
 #define CRASH_CORE_NOTE_NAME  "CORE"
 #define CRASH_CORE_NOTE_HEAD_BYTES ALIGN(sizeof(struct elf_note), 4)
 #define CRASH_CORE_NOTE_NAME_BYTES ALIGN(sizeof(CRASH_CORE_NOTE_NAME), 4)
@@ -63,6 +65,7 @@ phys_addr_t paddr_vmcoreinfo_note(void);
 extern unsigned char *vmcoreinfo_data;
 extern size_t vmcoreinfo_size;
 extern u32 *vmcoreinfo_note;
+extern struct resource crashk_low_res;
 
 Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type,
 

[PATCH v6 2/4] arm64: kdump: reserve crashkenel above 4G for crash dump kernel

2019-08-30 Thread Chen Zhou
crashkernel=X tries to reserve memory for the crash dump kernel under
4G. If crashkernel=X,low is specified simultaneously, first reserve low
memory of the specified size for crash dump kernel devices and then reserve
memory above 4G.
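
As a rough illustration of why the search limit matters, the toy model
below looks for a 2M-aligned base inside a single free range, top-down,
under a given limit. It is not memblock: crash_search_limit() and
find_crash_base() are hypothetical helpers, the 4G limit follows the
"under 4G" wording above (the patch itself uses ARCH_LOW_ADDRESS_LIMIT
and MEMBLOCK_ALLOC_ACCESSIBLE), and 0 is used as the failure value.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_2M    0x200000ULL
#define LIMIT_4G 0x100000000ULL

/* With a low region reserved, the main region may go above 4G. */
static uint64_t crash_search_limit(int low_reserved)
{
	return low_reserved ? UINT64_MAX : LIMIT_4G;
}

/*
 * Find the highest 2M-aligned base such that [base, base + size) fits
 * inside the free range [free_start, free_end) and below `limit`.
 * Returns 0 when nothing fits.
 */
static uint64_t find_crash_base(uint64_t free_start, uint64_t free_end,
				uint64_t limit, uint64_t size)
{
	uint64_t end = free_end < limit ? free_end : limit;
	uint64_t base;

	assert(size > 0);

	if (end < size)
		return 0;

	base = (end - size) & ~(SZ_2M - 1);	/* align down to 2M */
	return base >= free_start ? base : 0;
}
```

If the only free memory lies in [4G, 8G), a 512M request fails with the
4G limit and succeeds once the limit is lifted.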

Signed-off-by: Chen Zhou 
---
 arch/arm64/include/asm/kexec.h |  3 +++
 arch/arm64/kernel/setup.c  |  8 +++-
 arch/arm64/mm/init.c   | 31 +--
 3 files changed, 39 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
index 12a561a..88279a9 100644
--- a/arch/arm64/include/asm/kexec.h
+++ b/arch/arm64/include/asm/kexec.h
@@ -25,6 +25,9 @@
 
 #define KEXEC_ARCH KEXEC_ARCH_AARCH64
 
+/* 2M alignment for crash kernel regions */
+#define CRASH_ALIGN SZ_2M
+
 #ifndef __ASSEMBLY__
 
 /**
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 9c4bad7..2ead608 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -231,7 +231,13 @@ static void __init request_standard_resources(void)
kernel_data.end <= res->end)
request_resource(res, &kernel_data);
 #ifdef CONFIG_KEXEC_CORE
-   /* Userspace will find "Crash kernel" region in /proc/iomem. */
+   /*
+* Userspace will find "Crash kernel" region in /proc/iomem.
+* Note: the low region is renamed as Crash kernel (low).
+*/
+   if (crashk_low_res.end && crashk_low_res.start >= res->start &&
+   crashk_low_res.end <= res->end)
+   request_resource(res, &crashk_low_res);
if (crashk_res.end && crashk_res.start >= res->start &&
crashk_res.end <= res->end)
request_resource(res, &crashk_res);
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index f3c7952..c99f845 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -64,6 +64,7 @@ static void __init reserve_crashkernel(void)
 {
unsigned long long crash_base, crash_size;
int ret;
+   phys_addr_t crash_max = ARCH_LOW_ADDRESS_LIMIT;
 
ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
&crash_size, &crash_base);
@@ -71,12 +72,38 @@ static void __init reserve_crashkernel(void)
if (ret || !crash_size)
return;
 
+   ret = reserve_crashkernel_low();
+   if (!ret && crashk_low_res.end) {
+   /*
+* If crashkernel=X,low is specified, there may be two regions,
+* so we need to make the following changes:
+*
+* 1. rename the low region to "Crash kernel (low)"
+* To distinguish it from the high region without affecting
+* existing kexec-tools, rename the low region to
+* "Crash kernel (low)".
+*
+* 2. change the upper bound for crash memory
+* Set MEMBLOCK_ALLOC_ACCESSIBLE as the upper bound for crash memory.
+*
+* 3. mark the low region as "nomap"
+* The low region is intended for crash dump kernel devices,
+* so simply mark it as "nomap".
+*/
+   const char *rename = "Crash kernel (low)";
+
+   crashk_low_res.name = rename;
+   crash_max = MEMBLOCK_ALLOC_ACCESSIBLE;
+   memblock_mark_nomap(crashk_low_res.start,
+   resource_size(&crashk_low_res));
+   }
+
crash_size = PAGE_ALIGN(crash_size);
 
if (crash_base == 0) {
/* Current arm64 boot protocol requires 2MB alignment */
-   crash_base = memblock_find_in_range(0, ARCH_LOW_ADDRESS_LIMIT,
-   crash_size, SZ_2M);
+   crash_base = memblock_find_in_range(0, crash_max, crash_size,
+   SZ_2M);
if (crash_base == 0) {
pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
crash_size);
-- 
2.7.4



[PATCH v2] arm64: kdump: add another DT property to crash dump kernel's dtb

2019-08-30 Thread Chen Zhou
Currently, there is only one crash kernel region on arm64. Add
another region, "Crash kernel (low)", used for crash dump kernel devices.

To do this, add the DT property "linux,low-memory-range" to the crash
dump kernel's dtb to pass the low region.
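
How the two regions can be told apart when walking /proc/iomem entries
can be sketched as below. The string constants are the ones from
kexec/arch/arm64/iomem.h in this patch; enum iomem_kind and
classify_iomem() are hypothetical names for illustration only, and the
sketch assumes each name string ends with a newline, as those constants do.

```c
#include <assert.h>
#include <string.h>

#define CRASH_KERNEL_LOW "Crash kernel (low)\n"
#define CRASH_KERNEL     "Crash kernel\n"
#define SYSTEM_RAM       "System RAM\n"

/* Hypothetical classification result; not part of kexec-tools. */
enum iomem_kind {
	IOMEM_OTHER,
	IOMEM_CRASH_LOW,
	IOMEM_CRASH,
	IOMEM_SYSTEM_RAM,
};

/* Classify one /proc/iomem name by prefix comparison. */
static enum iomem_kind classify_iomem(const char *str)
{
	assert(str != NULL);

	if (strncmp(str, CRASH_KERNEL_LOW, strlen(CRASH_KERNEL_LOW)) == 0)
		return IOMEM_CRASH_LOW;
	if (strncmp(str, CRASH_KERNEL, strlen(CRASH_KERNEL)) == 0)
		return IOMEM_CRASH;
	if (strncmp(str, SYSTEM_RAM, strlen(SYSTEM_RAM)) == 0)
		return IOMEM_SYSTEM_RAM;
	return IOMEM_OTHER;
}
```

Because the constants include the trailing newline, "Crash kernel\n" does
not match an entry named "Crash kernel (low)\n", so an older kexec-tools
that only knows "Crash kernel" still sees a single usable region.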

Signed-off-by: Chen Zhou 
---
For "support reserving crashkernel above 4G on arm64 kdump", we need to
modify kexec-tools.

I will post the patch series "[PATCH v6 0/4] support reserving crashkernel
above 4G on arm64 kdump". That version differs considerably from the previous
one, so the kexec-tools part needs to be modified.

Changes since [v1]:
- Add another DT property "linux,low-memory-range" to crash dump kernel's
dtb to pass the low region instead of reusing "linux,usable-memory-range".

[1]: http://lists.infradead.org/pipermail/kexec/2019-April/022792.html
---
 kexec/arch/arm64/crashdump-arm64.c | 29 +++--
 kexec/arch/arm64/crashdump-arm64.h |  2 ++
 kexec/arch/arm64/iomem.h   |  1 +
 kexec/arch/arm64/kexec-arm64.c | 27 +++
 4 files changed, 57 insertions(+), 2 deletions(-)

diff --git a/kexec/arch/arm64/crashdump-arm64.c b/kexec/arch/arm64/crashdump-arm64.c
index 4fd7aa8..a8be036 100644
--- a/kexec/arch/arm64/crashdump-arm64.c
+++ b/kexec/arch/arm64/crashdump-arm64.c
@@ -39,6 +39,14 @@ struct memory_ranges usablemem_rgns = {
.ranges = &crash_reserved_mem,
 };
 
+/* memory range reserved for crashkernel low, optional */
+struct memory_range crash_reserved_low_mem;
+struct memory_ranges lowmem_rgns = {
+   .size = 0,
+   .max_size = 1,
+   .ranges = &crash_reserved_low_mem,
+};
+
 struct memory_range elfcorehdr_mem;
 
 static struct crash_elf_info elf_info = {
@@ -89,7 +97,10 @@ static int iomem_range_callback(void *UNUSED(data), int UNUSED(nr),
char *str, unsigned long long base,
unsigned long long length)
 {
-   if (strncmp(str, CRASH_KERNEL, strlen(CRASH_KERNEL)) == 0)
+   if (strncmp(str, CRASH_KERNEL_LOW, strlen(CRASH_KERNEL_LOW)) == 0)
+   return mem_regions_add(&lowmem_rgns,
+   base, length, RANGE_RAM);
+   else if (strncmp(str, CRASH_KERNEL, strlen(CRASH_KERNEL)) == 0)
return mem_regions_add(&usablemem_rgns,
   base, length, RANGE_RAM);
else if (strncmp(str, SYSTEM_RAM, strlen(SYSTEM_RAM)) == 0)
@@ -129,7 +140,7 @@ static int crash_get_memory_ranges(void)
if (!usablemem_rgns.size)
kexec_iomem_for_each_line(NULL, iomem_range_callback, NULL);
 
-   /* allow only a single region for crash dump kernel */
+   /* allow only a single usablemem region for crash dump kernel */
if (usablemem_rgns.size != 1)
return -EINVAL;
 
@@ -141,6 +152,20 @@ static int crash_get_memory_ranges(void)
return -ENOMEM;
}
 
+   /* lowmem region for crash dump kernel is optional, at most one region */
+   if (lowmem_rgns.size > 1)
+   return -EINVAL;
+
+   if (lowmem_rgns.size) {
+   dbgprint_mem_range("Reserved low memory range", &crash_reserved_low_mem,
+   1);
+
+   if (mem_regions_exclude(&crash_memory_rgns, &crash_reserved_low_mem)) {
+   fprintf(stderr,
+   "Error: Number of crash memory ranges excedeed the max limit\n");
+   return -ENOMEM;
+   }
+   }
/*
 * Make sure that the memory regions are sorted.
 */
diff --git a/kexec/arch/arm64/crashdump-arm64.h b/kexec/arch/arm64/crashdump-arm64.h
index 880b83a..f185534 100644
--- a/kexec/arch/arm64/crashdump-arm64.h
+++ b/kexec/arch/arm64/crashdump-arm64.h
@@ -18,6 +18,8 @@
 
 extern struct memory_ranges usablemem_rgns;
 extern struct memory_range crash_reserved_mem;
+extern struct memory_ranges lowmem_rgns;
+extern struct memory_range crash_reserved_low_mem;
 extern struct memory_range elfcorehdr_mem;
 
 extern int load_crashdump_segments(struct kexec_info *info);
diff --git a/kexec/arch/arm64/iomem.h b/kexec/arch/arm64/iomem.h
index d4864bb..45d7953 100644
--- a/kexec/arch/arm64/iomem.h
+++ b/kexec/arch/arm64/iomem.h
@@ -4,6 +4,7 @@
 #define SYSTEM_RAM "System RAM\n"
 #define KERNEL_CODE "Kernel code\n"
 #define KERNEL_DATA "Kernel data\n"
+#define CRASH_KERNEL_LOW   "Crash kernel (low)\n"
 #define CRASH_KERNEL   "Crash kernel\n"
 #define IOMEM_RESERVED "reserved\n"
 
diff --git a/kexec/arch/arm64/kexec-arm64.c b/kexec/arch/arm64/kexec-arm64.c
index eb3a3a3..dddec23 100644
--- a/kexec/arch/arm64/kexec-arm64.c
+++ b/kexec/arch/arm64/kexec-arm64.c
@@ -38,6 +38,7 @@
 #define PROP_SIZE_CELLS "#size-cells"
 #define PROP_ELFCOREHDR "linux,elfcorehdr"
 #define PROP_USABLE_MEM_RANGE "linux,usable-memory-range"
+#define PROP_LOW_MEM_RANGE "linux,low-memory-range"
 
 #define PAGE_OFFSET_36