Re: [PATCH] x86/kexec: do not update E820 kexec table for setup_data

2024-03-18 Thread Dave Young
On Tue, 5 Mar 2024 at 09:32, Dave Young  wrote:
>
> crashkernel reservation failed on a Thinkpad t440s laptop recently,
> Actually the memblock reservation succeeded, but later insert_resource()
> failed.
>
> Test step:
> kexec load ->
> kexec reboot ->
> check the crashkernel memory
> dmesg|grep "crashkernel reserved"; saw reserved suceeeded:
> 0xd000 - 0xda00
> grep Crash /proc/iomem: got nothing
>
> The background story is like below:
> Currently E820 code reserves setup_data regions for both the current kernel
> and the kexec kernel, and it will also insert them into resources list.
> Before the kexec kernel reboot nobody passes the old setup_data, kexec only
> passes SETUP_EFI and SETUP_IMA if needed.  Thus the old setup data memory
> are not used at all. But due to old kernel updated the kexec e820 table
> as well so kexec kernel see them as E820_TYPE_RESERVED_KERN regions, later
> the old setup_data regions will be inserted into resources list in kexec
> kernel by e820__reserve_resources().
>
> Note, due to no setup_data passed in for those old regions they are not
> early reserved (by function early_reserve_memory), crashkernel memblock
> reservation will just regard them as usable memory and it could reserve
> reserve crashkernel region overlaps with the old setup_data regions.
>
> Just like the bug I noticed here, kdump insert_resource failed because
> e820__reserve_resources added the overlapped chunks in /proc/iomem already.
>
> Finally, looking at the code, the old setup_data regions are not used
> at all as no setup_data passed in by the kexec boot loader. Although
> something like SETUP_PCI etc could be needed, kexec should pass
> the info as setup_data so that kexec kernel can take care of them.
> This should be taken care of in other separate patches if needed.
>
> Thus drop the useless buggy code here.
>
> Signed-off-by: Dave Young 
> ---
>  arch/x86/kernel/e820.c |   16 +---
>  1 file changed, 1 insertion(+), 15 deletions(-)
>
> Index: linux/arch/x86/kernel/e820.c
> ===
> --- linux.orig/arch/x86/kernel/e820.c
> +++ linux/arch/x86/kernel/e820.c
> @@ -1015,16 +1015,6 @@ void __init e820__reserve_setup_data(voi
> pa_next = data->next;
>
> e820__range_update(pa_data, sizeof(*data)+data->len, 
> E820_TYPE_RAM, E820_TYPE_RESERVED_KERN);
> -
> -   /*
> -* SETUP_EFI and SETUP_IMA are supplied by kexec and do not 
> need
> -* to be reserved.
> -*/
> -   if (data->type != SETUP_EFI && data->type != SETUP_IMA)
> -   e820__range_update_kexec(pa_data,
> -sizeof(*data) + data->len,
> -E820_TYPE_RAM, 
> E820_TYPE_RESERVED_KERN);
> -
> if (data->type == SETUP_INDIRECT) {
> len += data->len;
> early_memunmap(data, sizeof(*data));
> @@ -1036,12 +1026,9 @@ void __init e820__reserve_setup_data(voi
>
> indirect = (struct setup_indirect *)data->data;
>
> -   if (indirect->type != SETUP_INDIRECT) {
> +   if (indirect->type != SETUP_INDIRECT)
> e820__range_update(indirect->addr, 
> indirect->len,
>E820_TYPE_RAM, 
> E820_TYPE_RESERVED_KERN);
> -   e820__range_update_kexec(indirect->addr, 
> indirect->len,
> -E820_TYPE_RAM, 
> E820_TYPE_RESERVED_KERN);
> -   }
> }
>
> pa_data = pa_next;
> @@ -1049,7 +1036,6 @@ void __init e820__reserve_setup_data(voi
> }
>
> e820__update_table(e820_table);
> -   e820__update_table(e820_table_kexec);
>
> pr_info("extended physical RAM map:\n");
> e820__print_table("reserve setup_data");
>

Kindly ping for review.

Thanks!
Dave


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v2 1/3] efi/x86: skip efi_arch_mem_reserve() in case of kexec.

2024-03-18 Thread Dave Young
Hi,

Added Ard in cc.

On 03/18/24 at 07:02am, Ashish Kalra wrote:
> From: Ashish Kalra 
> 
> For kexec use case, need to use and stick to the EFI memmap passed
> from the first kernel via boot-params/setup data, hence,
> skip efi_arch_mem_reserve() during kexec.
> 
> Additionally during SNP guest kexec testing discovered that EFI memmap
> is corrupted during chained kexec. kexec_enter_virtual_mode() during
> late init will remap the efi_memmap physical pages allocated in
> efi_arch_mem_reserve() via memboot & then subsequently cause random
> EFI memmap corruption once memblock is freed/teared-down.
> 
> Signed-off-by: Ashish Kalra 
> ---
>  arch/x86/platform/efi/quirks.c | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
> index f0cc00032751..d4562d074371 100644
> --- a/arch/x86/platform/efi/quirks.c
> +++ b/arch/x86/platform/efi/quirks.c
> @@ -258,6 +258,16 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 
> size)
>   int num_entries;
>   void *new;
>  
> + /*
> +  * For kexec use case, we need to use the EFI memmap passed from the 
> first
> +  * kernel via setup data, so we need to skip this.
> +  * Additionally kexec_enter_virtual_mode() during late init will remap
> +  * the efi_memmap physical pages allocated here via memboot & then
> +  * subsequently cause random EFI memmap corruption once memblock is 
> freed.

Can you elaborate a bit about the corruption, is it reproducible without
SNP?

> +  */
> + if (efi_setup)
> + return;
> +

How about checking the md attribute instead of checking the efi_setup,
personally I feel it a bit better, something like below:

diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index f0cc00032751..699332b075bb 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -255,15 +255,24 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 
size)
struct efi_memory_map_data data = { 0 };
struct efi_mem_range mr;
efi_memory_desc_t md;
-   int num_entries;
+   int num_entries, ret;
void *new;
 
-   if (efi_mem_desc_lookup(addr, ) ||
-   md.type != EFI_BOOT_SERVICES_DATA) {
+   ret = efi_mem_desc_lookup(addr, );
+   if (ret) {
pr_err("Failed to lookup EFI memory descriptor for %pa\n", 
);
return;
}
 
+   if (md.type != EFI_BOOT_SERVICES_DATA) {
+   pr_err("Skil reserving non EFI Boot Service Data memory for 
%pa\n", );
+   return;
+   }
+
+   /* Kexec copied the efi memmap from the 1st kernel, thus skip the case. 
*/
+   if (md.attribute & EFI_MEMORY_RUNTIME)
+   return;
+
if (addr + size > md.phys_addr + (md.num_pages << EFI_PAGE_SHIFT)) {
pr_err("Region spans EFI memory descriptors, %pa\n", );
return;


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: Question about Address Range Validation in Crash Kernel Allocation

2024-03-18 Thread Baoquan He
Hi,

On 03/18/24 at 12:00pm, chenhaixiang (A) wrote:
> Dear kexec Community Members,
> 
> I encountered an issue while using kexec-tools on my x86_64 machine.
> When there is a segment marked as 'reserved' within the memory range 
> allocated for the crash kernel in /proc/iomem,the output appears as follows:
> 2d4fd058-60efefff : System RAM
>   2d4fd058-58ff : System RAM
> 4900-58ff : Crash kernel
>   53cbd000-53cc : Reserved

What kernel are you using? the version of kernel, and kexec-tools?

If you are testing on the latest mainline kernel, you could meet the
issue Dave have met and fixed in below patch:

[PATCH] x86/kexec: do not update E820 kexec table for setup_data
https://lore.kernel.org/all/zez2kos-oozns...@darkstar.users.ipa.redhat.com/T/#u

Thanks
Baoquan

> 
> The crash_memory_range array will encounter incorrect address ranges:
> CRASH MEMORY RANGES
> 2d4fd058-48ff (0)
> 53cbd000-48ff (1)
> 5900-53cc (0)
> 
> Read the code, I noticed that the get_crash_memory_ranges() function invokes 
> exclude_region() to handle the splitting of memory regions, but it seems 
> unable to properly handle the scenario described above.
> The code logic is as follows:
> ...
>   if (start < mend && end > mstart) {
>   if (start != mstart && end != mend) {
>   /* Split memory region */
>   crash_memory_range[i].end = start - 1;
>   temp_region.start = end + 1;
>   temp_region.end = mend;
>   temp_region.type = RANGE_RAM;
>   tidx = i+1;
>   } else if (start != mstart)
>   crash_memory_range[i].end = start - 1;
>   else
>   crash_memory_range[i].start = end + 1;
>   }
> ...
> If start < mstart < mend < end, resulting in crash_memory_range[i].end 
> becoming less than crash_memory_range[i].start, leading to incorrect address 
> ranges.
> I would like to know if this behavior is reasonable and whether it is 
> necessary to validate the address ranges for compliance at the end.
> 
> Thank you for your time and assistance.
> 
> Chen Haixiang
> 
> ___
> kexec mailing list
> kexec@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec
> 


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH linux-next] bpf: fix warning for crash_kexec

2024-03-18 Thread Yonghong Song


On 3/17/24 11:52 PM, Hari Bathini wrote:

Just checking on whether this will go via bpf or mm tree?


Sending to bpf-next should be okay.
Could you resubmit the patch as CONFIG_CRASH_DUMP probably not
available to bpf-next when you initially submitted the patch.



On 09/02/24 6:05 pm, Hari Bathini wrote:

With [1], CONFIG_KEXEC & !CONFIG_CRASH_DUMP is supported but that led
to the below warning:

   "WARN: resolve_btfids: unresolved symbol crash_kexec"

Fix it by using the appropriate #ifdef.

[1] https://lore.kernel.org/all/20240124051254.67105-1-...@redhat.com/

Signed-off-by: Hari Bathini 
---
  kernel/bpf/helpers.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 4db1c658254c..e408d1115e26 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2545,7 +2545,7 @@ __bpf_kfunc void bpf_throw(u64 cookie)
  __bpf_kfunc_end_defs();
    BTF_KFUNCS_START(generic_btf_ids)
-#ifdef CONFIG_KEXEC_CORE
+#ifdef CONFIG_CRASH_DUMP
  BTF_ID_FLAGS(func, crash_kexec, KF_DESTRUCTIVE)
  #endif
  BTF_ID_FLAGS(func, bpf_obj_new_impl, KF_ACQUIRE | KF_RET_NULL)




___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Question about Address Range Validation in Crash Kernel Allocation

2024-03-18 Thread chenhaixiang (A)
Dear kexec Community Members,

I encountered an issue while using kexec-tools on my x86_64 machine.
When there is a segment marked as 'reserved' within the memory range allocated 
for the crash kernel in /proc/iomem,the output appears as follows:
2d4fd058-60efefff : System RAM
  2d4fd058-58ff : System RAM
4900-58ff : Crash kernel
  53cbd000-53cc : Reserved

The crash_memory_range array will encounter incorrect address ranges:
CRASH MEMORY RANGES
2d4fd058-48ff (0)
53cbd000-48ff (1)
5900-53cc (0)

Read the code, I noticed that the get_crash_memory_ranges() function invokes 
exclude_region() to handle the splitting of memory regions, but it seems unable 
to properly handle the scenario described above.
The code logic is as follows:
...
if (start < mend && end > mstart) {
if (start != mstart && end != mend) {
/* Split memory region */
crash_memory_range[i].end = start - 1;
temp_region.start = end + 1;
temp_region.end = mend;
temp_region.type = RANGE_RAM;
tidx = i+1;
} else if (start != mstart)
crash_memory_range[i].end = start - 1;
else
crash_memory_range[i].start = end + 1;
}
...
If start < mstart < mend < end, resulting in crash_memory_range[i].end becoming 
less than crash_memory_range[i].start, leading to incorrect address ranges.
I would like to know if this behavior is reasonable and whether it is necessary 
to validate the address ranges for compliance at the end.

Thank you for your time and assistance.

Chen Haixiang

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] makedumpfile: ppc64: get vmalloc start address from vmcoreinfo

2024-03-18 Thread Aditya Gupta


On 18/03/24 14:18, HAGIO KAZUHITO(萩尾 一仁) wrote:

On 2024/03/18 17:26, Aditya Gupta wrote:

Hi,
The commit removing 'vmap_area_list' is now merged in Linux mainline tree.
      commit:     55c49fee57af99f3c663e69dedc5b85e691bbe50
      mm/vmalloc: remove vmap_area_list

Applied with this commit id and the fix.
https://github.com/makedumpfile/makedumpfile/commit/94241fd2feed059227a243618f2acc6aabf366e8


Thanks Kazu.


- Aditya Gupta



Thanks,
Kazu


Any comments on this patch ?

Thanks,

Aditya Gupta

On 24/02/24 00:33, Aditya Gupta wrote:

Below error was noticed when running makedumpfile on linux-next kernel
crash (linux-next tag next-20240121):

  ...
  Checking for memory holes : [100.0 %] | readpage_elf: Attempt to
read non-existent page at 0xc.
  [ 17.551718] kdump.sh[404]: readmem: type_addr: 0,
addr:c00c, size:16384
  [ 17.551793] kdump.sh[404]: __exclude_unnecessary_pages: Can't
read the buffer of struct page.
  [ 17.551864] kdump.sh[404]: create_2nd_bitmap: Can't exclude
unnecessary pages.
  [ 17.562632] kdump.sh[404]: The kernel version is not supported.
  [ 17.562708] kdump.sh[404]: The makedumpfile operation may be
incomplete.
  [ 17.562773] kdump.sh[404]: makedumpfile Failed.
  [ 17.564335] kdump[406]: saving vmcore failed, _exitcode:1

Above error was due to 'vmap_area_list' and 'vmlist' symbols missing
from the vmcore.

'vmap_area_list' was removed in the linux kernel with below commit:

  commit 378eb24a0658dd922b29524e0ce35c6c43f56cba
   mm/vmalloc: remove vmap_area_list

Subsequently the commit also introduced 'VMALLOC_START' in vmcoreinfo to
get base address of vmalloc area, instead of depending on
'vmap_area_list'

Hence if 'VMALLOC_START' symbol is there in vmcoreinfo:
    1. Set vmalloc_start based on 'VMALLOC_START'
    2. Don't error if vmap_area_list/vmlist are not defined

Reported-by: Sachin Sant 
Signed-off-by: Aditya Gupta 
---
   arch/ppc64.c   | 19 +--
   makedumpfile.c |  3 ++-
   makedumpfile.h |  6 +++---
   3 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/arch/ppc64.c b/arch/ppc64.c
index 96c357cb0335..bb62e2cd199a 100644
--- a/arch/ppc64.c
+++ b/arch/ppc64.c
@@ -568,7 +568,9 @@ get_machdep_info_ppc64(void)
   /*
    * Get vmalloc_start value from either vmap_area_list or vmlist.
    */
-    if ((SYMBOL(vmap_area_list) != NOT_FOUND_SYMBOL)
+    if (NUMBER(vmalloc_start) != NOT_FOUND_SYMBOL) {
+    vmalloc_start = NUMBER(vmalloc_start);
+    } else if ((SYMBOL(vmap_area_list) != NOT_FOUND_SYMBOL)
   && (OFFSET(vmap_area.va_start) != NOT_FOUND_STRUCTURE)
   && (OFFSET(vmap_area.list) != NOT_FOUND_STRUCTURE)) {
   if (!readmem(VADDR, SYMBOL(vmap_area_list) +
OFFSET(list_head.next),
@@ -684,11 +686,16 @@ vaddr_to_paddr_ppc64(unsigned long vaddr)
   if ((SYMBOL(vmap_area_list) == NOT_FOUND_SYMBOL)
   || (OFFSET(vmap_area.va_start) == NOT_FOUND_STRUCTURE)
   || (OFFSET(vmap_area.list) == NOT_FOUND_STRUCTURE)) {
-    if ((SYMBOL(vmlist) == NOT_FOUND_SYMBOL)
-    || (OFFSET(vm_struct.addr) == NOT_FOUND_STRUCTURE)) {
-    ERRMSG("Can't get info for vmalloc translation.\n");
-    return NOT_PADDR;
-    }
+    /*
+ * Don't depend on vmap_area_list/vmlist if vmalloc_start is
set in
+ * vmcoreinfo, in that case proceed without error
+ */
+    if (NUMBER(vmalloc_start) == NOT_FOUND_NUMBER)
+    if ((SYMBOL(vmlist) == NOT_FOUND_SYMBOL)
+    || (OFFSET(vm_struct.addr) == NOT_FOUND_STRUCTURE)) {
+    ERRMSG("Can't get info for vmalloc translation.\n");
+    return NOT_PADDR;
+    }
   }
   return ppc64_vtop_level4(vaddr);
diff --git a/makedumpfile.c b/makedumpfile.c
index b004b93fecb7..b6c63fad15f3 100644
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -2978,6 +2978,8 @@ read_vmcoreinfo(void)
   READ_NUMBER("PAGE_OFFLINE_MAPCOUNT_VALUE",
PAGE_OFFLINE_MAPCOUNT_VALUE);
   READ_NUMBER("phys_base", phys_base);
   READ_NUMBER("KERNEL_IMAGE_SIZE", KERNEL_IMAGE_SIZE);
+
+    READ_NUMBER_UNSIGNED("VMALLOC_START", vmalloc_start);
   #ifdef __aarch64__
   READ_NUMBER("VA_BITS", VA_BITS);
   READ_NUMBER("TCR_EL1_T1SZ", TCR_EL1_T1SZ);
@@ -2989,7 +2991,6 @@ read_vmcoreinfo(void)
   READ_NUMBER("VA_BITS", va_bits);
   READ_NUMBER_UNSIGNED("phys_ram_base", phys_ram_base);
   READ_NUMBER_UNSIGNED("PAGE_OFFSET", page_offset);
-    READ_NUMBER_UNSIGNED("VMALLOC_START", vmalloc_start);
   READ_NUMBER_UNSIGNED("VMALLOC_END", vmalloc_end);
   READ_NUMBER_UNSIGNED("VMEMMAP_START", vmemmap_start);
   READ_NUMBER_UNSIGNED("VMEMMAP_END", vmemmap_end);
diff --git a/makedumpfile.h b/makedumpfile.h
index 59c83e1d9df3..4021c5af2a34 100644
--- a/makedumpfile.h
+++ b/makedumpfile.h
@@ -541,8 +541,6 @@ do { \
    * The value of dependence on machine
    */
   #define PAGE_OFFSET    

Re: [PATCH] makedumpfile: ppc64: get vmalloc start address from vmcoreinfo

2024-03-18 Thread 萩尾 一仁
On 2024/03/18 17:26, Aditya Gupta wrote:
> Hi,
> The commit removing 'vmap_area_list' is now merged in Linux mainline tree.
>      commit:     55c49fee57af99f3c663e69dedc5b85e691bbe50
>      mm/vmalloc: remove vmap_area_list

Applied with this commit id and the fix.
https://github.com/makedumpfile/makedumpfile/commit/94241fd2feed059227a243618f2acc6aabf366e8

Thanks,
Kazu

> 
> Any comments on this patch ?
> 
> Thanks,
> 
> Aditya Gupta
> 
> On 24/02/24 00:33, Aditya Gupta wrote:
>> Below error was noticed when running makedumpfile on linux-next kernel
>> crash (linux-next tag next-20240121):
>>
>>  ...
>>  Checking for memory holes : [100.0 %] | readpage_elf: Attempt to 
>> read non-existent page at 0xc.
>>  [ 17.551718] kdump.sh[404]: readmem: type_addr: 0, 
>> addr:c00c, size:16384
>>  [ 17.551793] kdump.sh[404]: __exclude_unnecessary_pages: Can't 
>> read the buffer of struct page.
>>  [ 17.551864] kdump.sh[404]: create_2nd_bitmap: Can't exclude 
>> unnecessary pages.
>>  [ 17.562632] kdump.sh[404]: The kernel version is not supported.
>>  [ 17.562708] kdump.sh[404]: The makedumpfile operation may be 
>> incomplete.
>>  [ 17.562773] kdump.sh[404]: makedumpfile Failed.
>>  [ 17.564335] kdump[406]: saving vmcore failed, _exitcode:1
>>
>> Above error was due to 'vmap_area_list' and 'vmlist' symbols missing
>> from the vmcore.
>>
>> 'vmap_area_list' was removed in the linux kernel with below commit:
>>
>>  commit 378eb24a0658dd922b29524e0ce35c6c43f56cba
>>   mm/vmalloc: remove vmap_area_list
>>
>> Subsequently the commit also introduced 'VMALLOC_START' in vmcoreinfo to
>> get base address of vmalloc area, instead of depending on 
>> 'vmap_area_list'
>>
>> Hence if 'VMALLOC_START' symbol is there in vmcoreinfo:
>>    1. Set vmalloc_start based on 'VMALLOC_START'
>>    2. Don't error if vmap_area_list/vmlist are not defined
>>
>> Reported-by: Sachin Sant 
>> Signed-off-by: Aditya Gupta 
>> ---
>>   arch/ppc64.c   | 19 +--
>>   makedumpfile.c |  3 ++-
>>   makedumpfile.h |  6 +++---
>>   3 files changed, 18 insertions(+), 10 deletions(-)
>>
>> diff --git a/arch/ppc64.c b/arch/ppc64.c
>> index 96c357cb0335..bb62e2cd199a 100644
>> --- a/arch/ppc64.c
>> +++ b/arch/ppc64.c
>> @@ -568,7 +568,9 @@ get_machdep_info_ppc64(void)
>>   /*
>>    * Get vmalloc_start value from either vmap_area_list or vmlist.
>>    */
>> -    if ((SYMBOL(vmap_area_list) != NOT_FOUND_SYMBOL)
>> +    if (NUMBER(vmalloc_start) != NOT_FOUND_SYMBOL) {
>> +    vmalloc_start = NUMBER(vmalloc_start);
>> +    } else if ((SYMBOL(vmap_area_list) != NOT_FOUND_SYMBOL)
>>   && (OFFSET(vmap_area.va_start) != NOT_FOUND_STRUCTURE)
>>   && (OFFSET(vmap_area.list) != NOT_FOUND_STRUCTURE)) {
>>   if (!readmem(VADDR, SYMBOL(vmap_area_list) + 
>> OFFSET(list_head.next),
>> @@ -684,11 +686,16 @@ vaddr_to_paddr_ppc64(unsigned long vaddr)
>>   if ((SYMBOL(vmap_area_list) == NOT_FOUND_SYMBOL)
>>   || (OFFSET(vmap_area.va_start) == NOT_FOUND_STRUCTURE)
>>   || (OFFSET(vmap_area.list) == NOT_FOUND_STRUCTURE)) {
>> -    if ((SYMBOL(vmlist) == NOT_FOUND_SYMBOL)
>> -    || (OFFSET(vm_struct.addr) == NOT_FOUND_STRUCTURE)) {
>> -    ERRMSG("Can't get info for vmalloc translation.\n");
>> -    return NOT_PADDR;
>> -    }
>> +    /*
>> + * Don't depend on vmap_area_list/vmlist if vmalloc_start is 
>> set in
>> + * vmcoreinfo, in that case proceed without error
>> + */
>> +    if (NUMBER(vmalloc_start) == NOT_FOUND_NUMBER)
>> +    if ((SYMBOL(vmlist) == NOT_FOUND_SYMBOL)
>> +    || (OFFSET(vm_struct.addr) == NOT_FOUND_STRUCTURE)) {
>> +    ERRMSG("Can't get info for vmalloc translation.\n");
>> +    return NOT_PADDR;
>> +    }
>>   }
>>   return ppc64_vtop_level4(vaddr);
>> diff --git a/makedumpfile.c b/makedumpfile.c
>> index b004b93fecb7..b6c63fad15f3 100644
>> --- a/makedumpfile.c
>> +++ b/makedumpfile.c
>> @@ -2978,6 +2978,8 @@ read_vmcoreinfo(void)
>>   READ_NUMBER("PAGE_OFFLINE_MAPCOUNT_VALUE", 
>> PAGE_OFFLINE_MAPCOUNT_VALUE);
>>   READ_NUMBER("phys_base", phys_base);
>>   READ_NUMBER("KERNEL_IMAGE_SIZE", KERNEL_IMAGE_SIZE);
>> +
>> +    READ_NUMBER_UNSIGNED("VMALLOC_START", vmalloc_start);
>>   #ifdef __aarch64__
>>   READ_NUMBER("VA_BITS", VA_BITS);
>>   READ_NUMBER("TCR_EL1_T1SZ", TCR_EL1_T1SZ);
>> @@ -2989,7 +2991,6 @@ read_vmcoreinfo(void)
>>   READ_NUMBER("VA_BITS", va_bits);
>>   READ_NUMBER_UNSIGNED("phys_ram_base", phys_ram_base);
>>   READ_NUMBER_UNSIGNED("PAGE_OFFSET", page_offset);
>> -    READ_NUMBER_UNSIGNED("VMALLOC_START", vmalloc_start);
>>   READ_NUMBER_UNSIGNED("VMALLOC_END", vmalloc_end);
>>   READ_NUMBER_UNSIGNED("VMEMMAP_START", vmemmap_start);
>>   READ_NUMBER_UNSIGNED("VMEMMAP_END", vmemmap_end);
>> diff --git a/makedumpfile.h 

Re: [PATCH] makedumpfile: ppc64: get vmalloc start address from vmcoreinfo

2024-03-18 Thread Aditya Gupta

Hi,
The commit removing 'vmap_area_list' is now merged in Linux mainline tree.
    commit:     55c49fee57af99f3c663e69dedc5b85e691bbe50
    mm/vmalloc: remove vmap_area_list

Any comments on this patch ?

Thanks,

Aditya Gupta

On 24/02/24 00:33, Aditya Gupta wrote:

Below error was noticed when running makedumpfile on linux-next kernel
crash (linux-next tag next-20240121):

 ...
 Checking for memory holes : [100.0 %] | readpage_elf: Attempt to read 
non-existent page at 0xc.
 [ 17.551718] kdump.sh[404]: readmem: type_addr: 0, addr:c00c, 
size:16384
 [ 17.551793] kdump.sh[404]: __exclude_unnecessary_pages: Can't read the 
buffer of struct page.
 [ 17.551864] kdump.sh[404]: create_2nd_bitmap: Can't exclude unnecessary 
pages.
 [ 17.562632] kdump.sh[404]: The kernel version is not supported.
 [ 17.562708] kdump.sh[404]: The makedumpfile operation may be incomplete.
 [ 17.562773] kdump.sh[404]: makedumpfile Failed.
 [ 17.564335] kdump[406]: saving vmcore failed, _exitcode:1

Above error was due to 'vmap_area_list' and 'vmlist' symbols missing
from the vmcore.

'vmap_area_list' was removed in the linux kernel with below commit:

 commit 378eb24a0658dd922b29524e0ce35c6c43f56cba
  mm/vmalloc: remove vmap_area_list

Subsequently the commit also introduced 'VMALLOC_START' in vmcoreinfo to
get base address of vmalloc area, instead of depending on 'vmap_area_list'

Hence if 'VMALLOC_START' symbol is there in vmcoreinfo:
   1. Set vmalloc_start based on 'VMALLOC_START'
   2. Don't error if vmap_area_list/vmlist are not defined

Reported-by: Sachin Sant 
Signed-off-by: Aditya Gupta 
---
  arch/ppc64.c   | 19 +--
  makedumpfile.c |  3 ++-
  makedumpfile.h |  6 +++---
  3 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/arch/ppc64.c b/arch/ppc64.c
index 96c357cb0335..bb62e2cd199a 100644
--- a/arch/ppc64.c
+++ b/arch/ppc64.c
@@ -568,7 +568,9 @@ get_machdep_info_ppc64(void)
/*
 * Get vmalloc_start value from either vmap_area_list or vmlist.
 */
-   if ((SYMBOL(vmap_area_list) != NOT_FOUND_SYMBOL)
+   if (NUMBER(vmalloc_start) != NOT_FOUND_SYMBOL) {
+   vmalloc_start = NUMBER(vmalloc_start);
+   } else if ((SYMBOL(vmap_area_list) != NOT_FOUND_SYMBOL)
&& (OFFSET(vmap_area.va_start) != NOT_FOUND_STRUCTURE)
&& (OFFSET(vmap_area.list) != NOT_FOUND_STRUCTURE)) {
if (!readmem(VADDR, SYMBOL(vmap_area_list) + 
OFFSET(list_head.next),
@@ -684,11 +686,16 @@ vaddr_to_paddr_ppc64(unsigned long vaddr)
if ((SYMBOL(vmap_area_list) == NOT_FOUND_SYMBOL)
|| (OFFSET(vmap_area.va_start) == NOT_FOUND_STRUCTURE)
|| (OFFSET(vmap_area.list) == NOT_FOUND_STRUCTURE)) {
-   if ((SYMBOL(vmlist) == NOT_FOUND_SYMBOL)
-   || (OFFSET(vm_struct.addr) == NOT_FOUND_STRUCTURE)) {
-   ERRMSG("Can't get info for vmalloc translation.\n");
-   return NOT_PADDR;
-   }
+   /*
+* Don't depend on vmap_area_list/vmlist if vmalloc_start is 
set in
+* vmcoreinfo, in that case proceed without error
+*/
+   if (NUMBER(vmalloc_start) == NOT_FOUND_NUMBER)
+   if ((SYMBOL(vmlist) == NOT_FOUND_SYMBOL)
+   || (OFFSET(vm_struct.addr) == 
NOT_FOUND_STRUCTURE)) {
+   ERRMSG("Can't get info for vmalloc 
translation.\n");
+   return NOT_PADDR;
+   }
}
  
  	return ppc64_vtop_level4(vaddr);

diff --git a/makedumpfile.c b/makedumpfile.c
index b004b93fecb7..b6c63fad15f3 100644
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -2978,6 +2978,8 @@ read_vmcoreinfo(void)
READ_NUMBER("PAGE_OFFLINE_MAPCOUNT_VALUE", PAGE_OFFLINE_MAPCOUNT_VALUE);
READ_NUMBER("phys_base", phys_base);
READ_NUMBER("KERNEL_IMAGE_SIZE", KERNEL_IMAGE_SIZE);
+
+   READ_NUMBER_UNSIGNED("VMALLOC_START", vmalloc_start);
  #ifdef __aarch64__
READ_NUMBER("VA_BITS", VA_BITS);
READ_NUMBER("TCR_EL1_T1SZ", TCR_EL1_T1SZ);
@@ -2989,7 +2991,6 @@ read_vmcoreinfo(void)
READ_NUMBER("VA_BITS", va_bits);
READ_NUMBER_UNSIGNED("phys_ram_base", phys_ram_base);
READ_NUMBER_UNSIGNED("PAGE_OFFSET", page_offset);
-   READ_NUMBER_UNSIGNED("VMALLOC_START", vmalloc_start);
READ_NUMBER_UNSIGNED("VMALLOC_END", vmalloc_end);
READ_NUMBER_UNSIGNED("VMEMMAP_START", vmemmap_start);
READ_NUMBER_UNSIGNED("VMEMMAP_END", vmemmap_end);
diff --git a/makedumpfile.h b/makedumpfile.h
index 59c83e1d9df3..4021c5af2a34 100644
--- a/makedumpfile.h
+++ b/makedumpfile.h
@@ -541,8 +541,6 @@ do { \
   * The value of dependence on machine
   */
  #define PAGE_OFFSET   (info->page_offset)
-#define VMALLOC_START  (info->vmalloc_start)
-#define VMALLOC_END

[PATCH v2 3/3] x86/snp: Convert shared memory back to private on kexec

2024-03-18 Thread Ashish Kalra
From: Ashish Kalra 

SNP guests allocate shared buffers to perform I/O. It is done by
allocating pages normally from the buddy allocator and converting them
to shared with set_memory_decrypted().

The second kernel has no idea what memory is converted this way. It only
sees E820_TYPE_RAM.

Accessing shared memory via private mapping will cause unrecoverable RMP
page-faults.

On kexec walk direct mapping and convert all shared memory back to
private. It makes all RAM private again and second kernel may use it
normally. Additionally for SNP guests convert all bss decrypted section
pages back to private and switch back ROM regions to shared so that
their revalidation does not fail during kexec kernel boot.

The conversion occurs in two steps: stopping new conversions and
unsharing all memory. In the case of normal kexec, the stopping of
conversions takes place while scheduling is still functioning. This
allows for waiting until any ongoing conversions are finished. The
second step is carried out when all CPUs except one are inactive and
interrupts are disabled. This prevents any conflicts with code that may
access shared memory.

Signed-off-by: Ashish Kalra 
---
 arch/x86/include/asm/probe_roms.h |   1 +
 arch/x86/include/asm/sev.h|   4 +
 arch/x86/kernel/probe_roms.c  |  16 +++
 arch/x86/kernel/sev.c | 169 ++
 arch/x86/mm/mem_encrypt_amd.c |   3 +
 5 files changed, 193 insertions(+)

diff --git a/arch/x86/include/asm/probe_roms.h 
b/arch/x86/include/asm/probe_roms.h
index 1c7f3815bbd6..d50b67dbff33 100644
--- a/arch/x86/include/asm/probe_roms.h
+++ b/arch/x86/include/asm/probe_roms.h
@@ -6,4 +6,5 @@ struct pci_dev;
 extern void __iomem *pci_map_biosrom(struct pci_dev *pdev);
 extern void pci_unmap_biosrom(void __iomem *rom);
 extern size_t pci_biosrom_size(struct pci_dev *pdev);
+extern void snp_kexec_unprep_rom_memory(void);
 #endif
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index d7b27cb34c2b..867518b9bcad 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -229,6 +229,8 @@ void snp_accept_memory(phys_addr_t start, phys_addr_t end);
 u64 snp_get_unsupported_features(u64 status);
 u64 sev_get_status(void);
 void kdump_sev_callback(void);
+void snp_kexec_unshare_mem(void);
+void snp_kexec_stop_conversion(bool crash);
 #else
 static inline void sev_es_ist_enter(struct pt_regs *regs) { }
 static inline void sev_es_ist_exit(void) { }
@@ -258,6 +260,8 @@ static inline void snp_accept_memory(phys_addr_t start, 
phys_addr_t end) { }
 static inline u64 snp_get_unsupported_features(u64 status) { return 0; }
 static inline u64 sev_get_status(void) { return 0; }
 static inline void kdump_sev_callback(void) { }
+void snp_kexec_unshare_mem(void) {}
+static void snp_kexec_stop_conversion(bool crash) {}
 #endif
 
 #ifdef CONFIG_KVM_AMD_SEV
diff --git a/arch/x86/kernel/probe_roms.c b/arch/x86/kernel/probe_roms.c
index 319fef37d9dc..457f1e5c8d00 100644
--- a/arch/x86/kernel/probe_roms.c
+++ b/arch/x86/kernel/probe_roms.c
@@ -177,6 +177,22 @@ size_t pci_biosrom_size(struct pci_dev *pdev)
 }
 EXPORT_SYMBOL(pci_biosrom_size);
 
+void snp_kexec_unprep_rom_memory(void)
+{
+   unsigned long vaddr, npages, sz;
+
+   /*
+* Switch back ROM regions to shared so that their validation
+* does not fail during kexec kernel boot.
+*/
+   vaddr = (unsigned long)__va(video_rom_resource.start);
+   sz = (system_rom_resource.end + 1) - video_rom_resource.start;
+   npages = PAGE_ALIGN(sz) >> PAGE_SHIFT;
+
+   snp_set_memory_shared(vaddr, npages);
+}
+EXPORT_SYMBOL(snp_kexec_unprep_rom_memory);
+
 #define ROMSIGNATURE 0xaa55
 
 static int __init romsignature(const unsigned char *rom)
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 1ef7ae806a01..7443a9620a31 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -40,6 +40,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define DR7_RESET_VALUE0x400
 
@@ -71,6 +72,9 @@ static struct ghcb *boot_ghcb __section(".data");
 /* Bitmap of SEV features supported by the hypervisor */
 static u64 sev_hv_features __ro_after_init;
 
+/* Last address to be switched to private during kexec */
+static unsigned long kexec_last_addr_to_make_private;
+
 /* #VC handler runtime per-CPU data */
 struct sev_es_runtime_data {
struct ghcb ghcb_page;
@@ -906,6 +910,171 @@ void snp_accept_memory(phys_addr_t start, phys_addr_t end)
set_pages_state(vaddr, npages, SNP_PAGE_STATE_PRIVATE);
 }
 
+static bool set_pte_enc(pte_t *kpte, int level, void *va)
+{
+   pte_t new_pte;
+
+   if (pte_none(*kpte))
+   return false;
+
+   /*
+* Change the physical page attribute from C=0 to C=1. Flush the
+* caches to ensure that data gets accessed with the correct C-bit.
+*/
+   if (pte_present(*kpte))
+   clflush_cache_range(va, page_level_size(level));
+
+   

[PATCH v2 2/3] x86/mm: Do not zap page table entries mapping unaccepted memory table during kdump.

2024-03-18 Thread Ashish Kalra
From: Ashish Kalra 

During crashkernel boot only pre-allocated crash memory is presented as
E820_TYPE_RAM. This can cause page table entries mapping unaccepted memory
table to be zapped during phys_pte_init(), phys_pmd_init(), phys_pud_init()
and phys_p4d_init() as SNP/TDX guest use E820_TYPE_ACPI to store the
unaccepted memory table and pass it between the kernels on
kexec/kdump.

E820_TYPE_ACPI covers not only ACPI data, but also EFI tables and might
be required by kernel to function properly.

The problem was discovered during debugging kdump for SNP guest. The
unaccepted memory table stored with E820_TYPE_ACPI and passed between
the kernels on kdump was getting zapped as the PMD entry mapping this
is above the E820_TYPE_RAM range for the reserved crashkernel memory.

Signed-off-by: Ashish Kalra 
---
 arch/x86/mm/init_64.c | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index a0dffaca6d2b..cc294a9e9fd7 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -469,7 +469,9 @@ phys_pte_init(pte_t *pte_page, unsigned long paddr, 
unsigned long paddr_end,
!e820__mapped_any(paddr & PAGE_MASK, paddr_next,
 E820_TYPE_RAM) &&
!e820__mapped_any(paddr & PAGE_MASK, paddr_next,
-E820_TYPE_RESERVED_KERN))
+E820_TYPE_RESERVED_KERN) &&
+   !e820__mapped_any(paddr & PAGE_MASK, paddr_next,
+E820_TYPE_ACPI))
set_pte_init(pte, __pte(0), init);
continue;
}
@@ -524,7 +526,9 @@ phys_pmd_init(pmd_t *pmd_page, unsigned long paddr, 
unsigned long paddr_end,
!e820__mapped_any(paddr & PMD_MASK, paddr_next,
 E820_TYPE_RAM) &&
!e820__mapped_any(paddr & PMD_MASK, paddr_next,
-E820_TYPE_RESERVED_KERN))
+E820_TYPE_RESERVED_KERN) &&
+   !e820__mapped_any(paddr & PMD_MASK, paddr_next,
+E820_TYPE_ACPI))
set_pmd_init(pmd, __pmd(0), init);
continue;
}
@@ -611,7 +615,9 @@ phys_pud_init(pud_t *pud_page, unsigned long paddr, 
unsigned long paddr_end,
!e820__mapped_any(paddr & PUD_MASK, paddr_next,
 E820_TYPE_RAM) &&
!e820__mapped_any(paddr & PUD_MASK, paddr_next,
-E820_TYPE_RESERVED_KERN))
+E820_TYPE_RESERVED_KERN) &&
+   !e820__mapped_any(paddr & PUD_MASK, paddr_next,
+E820_TYPE_ACPI))
set_pud_init(pud, __pud(0), init);
continue;
}
@@ -698,7 +704,9 @@ phys_p4d_init(p4d_t *p4d_page, unsigned long paddr, 
unsigned long paddr_end,
!e820__mapped_any(paddr & P4D_MASK, paddr_next,
 E820_TYPE_RAM) &&
!e820__mapped_any(paddr & P4D_MASK, paddr_next,
-E820_TYPE_RESERVED_KERN))
+E820_TYPE_RESERVED_KERN) &&
+   !e820__mapped_any(paddr & P4D_MASK, paddr_next,
+E820_TYPE_ACPI))
set_p4d_init(p4d, __p4d(0), init);
continue;
}
-- 
2.34.1


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH v2 1/3] efi/x86: skip efi_arch_mem_reserve() in case of kexec.

2024-03-18 Thread Ashish Kalra
From: Ashish Kalra 

For kexec use case, need to use and stick to the EFI memmap passed
from the first kernel via boot-params/setup data, hence,
skip efi_arch_mem_reserve() during kexec.

Additionally during SNP guest kexec testing discovered that EFI memmap
is corrupted during chained kexec. kexec_enter_virtual_mode() during
late init will remap the efi_memmap physical pages allocated in
efi_arch_mem_reserve() via memboot & then subsequently cause random
EFI memmap corruption once memblock is freed/teared-down.

Signed-off-by: Ashish Kalra 
---
 arch/x86/platform/efi/quirks.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index f0cc00032751..d4562d074371 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -258,6 +258,16 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 
size)
int num_entries;
void *new;
 
+   /*
+* For kexec use case, we need to use the EFI memmap passed from the 
first
+* kernel via setup data, so we need to skip this.
+* Additionally kexec_enter_virtual_mode() during late init will remap
+* the efi_memmap physical pages allocated here via memboot & then
+* subsequently cause random EFI memmap corruption once memblock is 
freed.
+*/
+   if (efi_setup)
+   return;
+
if (efi_mem_desc_lookup(addr, ) ||
md.type != EFI_BOOT_SERVICES_DATA) {
pr_err("Failed to lookup EFI memory descriptor for %pa\n", 
);
-- 
2.34.1


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH v2 0/3] x86/snp: Add kexec support

2024-03-18 Thread Ashish Kalra
From: Ashish Kalra 

The patchset adds bits and pieces to get kexec (and crashkernel) work on
SNP guest.

v2:
- address zeroing of unaccepted memory table mappings at all page table levels
  adding phys_pte_init(), phys_pud_init() and phys_p4d_init().
- include skip efi_arch_mem_reserve() in case of kexec as part of this 
  patch set.
- rename last_address_shd_kexec to a more appropriate 
  kexec_last_address_to_make_private.
- remove duplicate code shared with TDX and use common interfaces
  defined for SNP and TDX for kexec/kdump.
- remove set_pte_enc() dependency on pg_level_to_pfn() and make the 
  function simpler.
- rename unshare_pte() to make_pte_private().
- clarify and make the comment for using kexec_last_address_to_make_private  
  more understandable.
- general cleanup. 

Ashish Kalra (3):
  efi/x86: skip efi_arch_mem_reserve() in case of kexec.
  x86/mm: Do not zap page table entries mapping unaccepted memory table
during kdump.
  x86/snp: Convert shared memory back to private on kexec

 arch/x86/include/asm/probe_roms.h |   1 +
 arch/x86/include/asm/sev.h|   4 +
 arch/x86/kernel/probe_roms.c  |  16 +++
 arch/x86/kernel/sev.c | 169 ++
 arch/x86/mm/init_64.c |  16 ++-
 arch/x86/mm/mem_encrypt_amd.c |   3 +
 arch/x86/platform/efi/quirks.c|  10 ++
 7 files changed, 215 insertions(+), 4 deletions(-)

-- 
2.34.1


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH linux-next] bpf: fix warning for crash_kexec

2024-03-18 Thread Hari Bathini

Just checking on whether this will go via bpf or mm tree?

On 09/02/24 6:05 pm, Hari Bathini wrote:

With [1], CONFIG_KEXEC & !CONFIG_CRASH_DUMP is supported but that led
to the below warning:

   "WARN: resolve_btfids: unresolved symbol crash_kexec"

Fix it by using the appropriate #ifdef.

[1] https://lore.kernel.org/all/20240124051254.67105-1-...@redhat.com/

Signed-off-by: Hari Bathini 
---
  kernel/bpf/helpers.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 4db1c658254c..e408d1115e26 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2545,7 +2545,7 @@ __bpf_kfunc void bpf_throw(u64 cookie)
  __bpf_kfunc_end_defs();
  
  BTF_KFUNCS_START(generic_btf_ids)

-#ifdef CONFIG_KEXEC_CORE
+#ifdef CONFIG_CRASH_DUMP
  BTF_ID_FLAGS(func, crash_kexec, KF_DESTRUCTIVE)
  #endif
  BTF_ID_FLAGS(func, bpf_obj_new_impl, KF_ACQUIRE | KF_RET_NULL)


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec