[PATCH 2/2 RESEND v10] x86/kexec_file: add reserved e820 ranges to kdump kernel e820 table

2019-03-29 Thread Lianbo Jiang
At present, when using the kexec_file_load() syscall to load the kernel
image and initramfs(for example: kexec -s -p xxx), the kernel does not
pass the e820 reserved ranges to the second kernel, which might cause
two problems:

The first one is the MMCONFIG issue. The basic problem is that this
device is in PCI segment 1 and the kernel PCI probing cannot find it
without all the e820 I/O reservations being present in the e820 table.
And the kdump kernel does not have those reservations because the kexec
command does not pass the I/O reservation via the "memmap=xxx" command
line option. (This problem does not show up for other vendors, as SGI
is apparently the only one using extended PCI. The lookup of devices in
PCI segment 0 actually fails for everyone, but devices in segment 0
are then found by some legacy lookup method.) The workaround for this
is to pass the I/O reserved regions to the kdump kernel.

MMCONFIG (aka ECAM) space is described in the ACPI MCFG table. If you don't
have ECAM: (a) PCI devices won't work at all on non-x86 systems that use
only ECAM for config access, (b) you won't be able to access devices on
non-0 segments, (c) you won't be able to access extended config space
(addresses 0x100-0xfff), which means none of the Extended Capabilities will
be available (AER, ACS, ATS, etc). [Bjorn's comment]

The second issue is that the SME kdump kernel doesn't work without the
e820 reserved ranges. When SME is active in the kdump kernel, those
reserved regions are actually still decrypted, but because they are not
present at all in the kdump kernel's e820 table, they are treated as
encrypted, and things go wrong.

The e820 reserved range is useful in kdump kernel, so it is necessary to
pass the e820 reserved ranges to the kdump kernel.

Suggested-by: Dave Young 
Signed-off-by: Lianbo Jiang 
---
 arch/x86/kernel/crash.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 17ffc869cab8..1db2754df9e9 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -381,6 +381,12 @@ int crash_setup_memmap_entries(struct kimage *image, struct boot_params *params)
 	walk_iomem_res_desc(IORES_DESC_ACPI_NV_STORAGE, flags, 0, -1, &cmd,
 			memmap_entry_callback);
 
+	/* Add e820 reserved ranges */
+	cmd.type = E820_TYPE_RESERVED;
+	flags = IORESOURCE_MEM;
+	walk_iomem_res_desc(IORES_DESC_RESERVED, flags, 0, -1, &cmd,
+			   memmap_entry_callback);
+
 	/* Add crashk_low_res region */
 	if (crashk_low_res.end) {
 		ei.addr = crashk_low_res.start;
-- 
2.17.1


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec
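
As a rough illustration of what PATCH 2/2 above does, here is a small
user-space sketch of the "walk resources by descriptor, append each match
to the second kernel's e820 table" pattern. Every name prefixed with sk_ or
SK_ is hypothetical and only stands in for the kernel's
walk_iomem_res_desc(), memmap_entry_callback() and struct e820_table; this
is not the kernel code itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's descriptor and e820 type enums. */
enum sk_desc { SK_DESC_NONE, SK_DESC_ACPI_NVS, SK_DESC_RESERVED };
enum sk_e820_type { SK_E820_RAM = 1, SK_E820_RESERVED = 2, SK_E820_NVS = 4 };

struct sk_res {
	uint64_t start, end;	/* inclusive range, like struct resource */
	enum sk_desc desc;
};

struct sk_e820_entry {
	uint64_t addr, size;
	enum sk_e820_type type;
};

#define SK_MAX_ENTRIES 8

struct sk_e820_table {
	size_t nr;
	struct sk_e820_entry entries[SK_MAX_ENTRIES];
};

/* Argument handed to the callback: target table plus the type to record. */
struct sk_cmd {
	struct sk_e820_table *table;
	enum sk_e820_type type;
};

/* Callback: append one matching range to the table; -1 if the table is full. */
static int sk_memmap_entry_cb(const struct sk_res *res, void *arg)
{
	struct sk_cmd *cmd = arg;
	struct sk_e820_table *t = cmd->table;

	if (t->nr >= SK_MAX_ENTRIES)
		return -1;
	t->entries[t->nr].addr = res->start;
	t->entries[t->nr].size = res->end - res->start + 1;
	t->entries[t->nr].type = cmd->type;
	t->nr++;
	return 0;
}

/* Call func on every resource whose descriptor equals desc; stop on error. */
static int sk_walk_res_desc(const struct sk_res *res, size_t n,
			    enum sk_desc desc, void *arg,
			    int (*func)(const struct sk_res *, void *))
{
	int ret = 0;

	for (size_t i = 0; i < n && !ret; i++)
		if (res[i].desc == desc)
			ret = func(&res[i], arg);
	return ret;
}

/* Copy all reserved ranges into table; returns 0 on success. */
static int sk_add_reserved(const struct sk_res *res, size_t n,
			   struct sk_e820_table *table)
{
	struct sk_cmd cmd = { .table = table, .type = SK_E820_RESERVED };

	return sk_walk_res_desc(res, n, SK_DESC_RESERVED, &cmd,
				sk_memmap_entry_cb);
}
```

Only ranges tagged with the dedicated reserved descriptor reach the table;
ranges with any other descriptor are skipped, which is the point of having
a descriptor that matches reserved ranges exactly.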


[PATCH 0/2 RESEND v10] add reserved e820 ranges to the kdump kernel e820 table

2019-03-29 Thread Lianbo Jiang
This patchset did two things:

a). add a new I/O resource descriptor 'IORES_DESC_RESERVED'
When doing kexec_file_load(), the first kernel needs to pass the e820
reserved ranges to the second kernel, because some devices may use it
in kdump kernel, such as PCI devices.

But, the kernel can not exactly match the e820 reserved ranges when
walking through the iomem resources via the 'IORES_DESC_NONE', because
there are several types of e820 that are described as the 'IORES_DESC_NONE'
type. Please refer to the e820_type_to_iores_desc().

Therefore, add a new I/O resource descriptor 'IORES_DESC_RESERVED' for
the iomem resources search interfaces. It is helpful to exactly match
the reserved resource ranges when walking through iomem resources.
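
To make the ambiguity concrete, the following is a minimal sketch of the
type-to-descriptor mapping before and after the change. The enum names and
the "patched" switch are hypothetical and exist only for this illustration;
the real mapping lives in e820_type_to_iores_desc().

```c
#include <assert.h>

/* Hypothetical mirror of the e820 types handled by the mapping function. */
enum sk_e820_type {
	SK_TYPE_RAM,
	SK_TYPE_RESERVED,
	SK_TYPE_ACPI,
	SK_TYPE_NVS,
	SK_TYPE_UNUSABLE,
	SK_TYPE_PMEM,
	SK_TYPE_PRAM,
	SK_TYPE_RESERVED_KERN,
	SK_TYPE_COUNT
};

enum sk_desc {
	SK_DESC_NONE,
	SK_DESC_ACPI_TABLES,
	SK_DESC_ACPI_NV_STORAGE,
	SK_DESC_PERSISTENT_MEMORY,
	SK_DESC_PERSISTENT_MEMORY_LEGACY,
	SK_DESC_RESERVED
};

/* patched == 0: old behaviour; patched != 0: reserved gets its own descriptor. */
static enum sk_desc sk_type_to_desc(enum sk_e820_type type, int patched)
{
	switch (type) {
	case SK_TYPE_ACPI:	return SK_DESC_ACPI_TABLES;
	case SK_TYPE_NVS:	return SK_DESC_ACPI_NV_STORAGE;
	case SK_TYPE_PMEM:	return SK_DESC_PERSISTENT_MEMORY;
	case SK_TYPE_PRAM:	return SK_DESC_PERSISTENT_MEMORY_LEGACY;
	case SK_TYPE_RESERVED:	return patched ? SK_DESC_RESERVED : SK_DESC_NONE;
	default:		return SK_DESC_NONE;
	}
}

/* How many e820 types collapse onto the given descriptor? */
static int sk_count_types_with_desc(enum sk_desc desc, int patched)
{
	int n = 0;

	for (int t = 0; t < SK_TYPE_COUNT; t++)
		if (sk_type_to_desc((enum sk_e820_type)t, patched) == desc)
			n++;
	return n;
}
```

In this sketch, a walk keyed on the "none" descriptor cannot tell reserved
ranges apart from RAM or unusable ranges, while a walk keyed on the new
descriptor matches exactly one e820 type.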

In addition, since the new descriptor 'IORES_DESC_RESERVED' has been
created for the reserved areas, the code originally related to the
descriptor 'IORES_DESC_NONE' also needs to be updated.

b). add the e820 reserved ranges to kdump kernel e820 table
At present, when using the kexec_file_load() syscall to load the kernel
image and initramfs(for example: kexec -s -p xxx), the kernel does not
pass the e820 reserved ranges to the second kernel, which might cause
two problems:

The first one is the MMCONFIG issue. The basic problem is that this
device is in PCI segment 1 and the kernel PCI probing cannot find it
without all the e820 I/O reservations being present in the e820 table.
And the kdump kernel does not have those reservations because the kexec
command does not pass the I/O reservation via the "memmap=xxx" command
line option. (This problem does not show up for other vendors, as SGI
is apparently the only one using extended PCI. The lookup of devices in
PCI segment 0 actually fails for everyone, but devices in segment 0
are then found by some legacy lookup method.) The workaround for this
is to pass the I/O reserved regions to the kdump kernel.

MMCONFIG (aka ECAM) space is described in the ACPI MCFG table. If you don't
have ECAM: (a) PCI devices won't work at all on non-x86 systems that use
only ECAM for config access, (b) you won't be able to access devices on
non-0 segments, (c) you won't be able to access extended config space
(addresses 0x100-0xfff), which means none of the Extended Capabilities will
be available (AER, ACS, ATS, etc). [Bjorn's comment]

The second issue is that the SME kdump kernel doesn't work without the
e820 reserved ranges. When SME is active in the kdump kernel, those
reserved regions are actually still decrypted, but because they are not
present at all in the kdump kernel's e820 table, they are treated as
encrypted, and things go wrong.

The e820 reserved range is useful in kdump kernel, so it is necessary to
pass the e820 reserved ranges to the kdump kernel.

Changes since v1:
1. Modified the value of flags to "0", when walking through the whole
tree for e820 reserved ranges.

Changes since v2:
1. Modified the value of flags to "0", when walking through the whole
tree for e820 reserved ranges.
2. Modified the invalid SOB chain issue.

Changes since v3:
1. Dropped [PATCH 1/3 v3] resource: fix an error which walks through iomem
   resources. Please refer to this commit <010a93bf97c7> "resource: Fix
   find_next_iomem_res() iteration issue"

Changes since v4:
1. Improve the patch log, and add kernel log.

Changes since v5:
1. Rewrite these patches log.

Changes since v6:
1. Modify the [PATCH 1/2], and add the new I/O resource descriptor
   'IORES_DESC_RESERVED' for the iomem resources search interfaces,
   and also update the code related to 'IORES_DESC_NONE'.
2. Modify the [PATCH 2/2], and walk through io resource based on the
   new descriptor 'IORES_DESC_RESERVED'.
3. Update patch log.

Changes since v7:
1. Improve patch log.
2. Improve this function __ioremap_check_desc_other().
3. Modify code comment in the __ioremap_check_desc_other()

Changes since v8:
1. Get rid of all changes about ia64.(Borislav's suggestion)
2. Change the examination condition to the 'IORES_DESC_ACPI_*'.
3. Modify the signature. This patch(add the new I/O resource
   descriptor 'IORES_DESC_RESERVED') was suggested by Boris.

Changes since v9:
1. Improve patch log.
2. No need to modify the kernel/resource.c, so correct them.
3. Change the name of the __ioremap_check_desc_other() to
   __ioremap_check_desc_none_and_reserved(), and modify the
   check condition, add comment above it.

Lianbo Jiang (2):
  x86/mm, resource: add a new I/O resource descriptor 'IORES_DESC_RESERVED'
  x86/kexec_file: add reserved e820 ranges to kdump kernel e820 table

 arch/x86/kernel/crash.c |  6 ++++++
 arch/x86/kernel/e820.c  |  2 +-
 arch/x86/mm/ioremap.c   | 18 +++++++++++++++---
 include/linux/ioport.h  |  1 +
 4 files changed, 23 insertions(+), 4 deletions(-)

-- 
2.17.1




Re: [PATCH 1/2 v10] x86/mm, resource: add a new I/O resource descriptor 'IORES_DESC_RESERVED'

2019-03-29 Thread lijiang
On 2019-03-29 18:39, Borislav Petkov wrote:
> On Fri, Mar 29, 2019 at 02:56:48PM +0800, Lianbo Jiang wrote:
>> When doing kexec_file_load(), the first kernel needs to pass the e820
>> reserved ranges to the second kernel, because some devices may use it
>> in kdump kernel, such as PCI devices.
>>
>> But, the kernel can not exactly match the e820 reserved ranges when
>> walking through the iomem resources via the 'IORES_DESC_NONE', because
>> there are several types of e820 that are described as the 'IORES_DESC_NONE'
>> type. Please refer to the e820_type_to_iores_desc().
>>
>> Therefore, add a new I/O resource descriptor 'IORES_DESC_RESERVED' for
>> the iomem resources search interfaces. It is helpful to exactly match
>> the reserved resource ranges when walking through iomem resources.
>>
>> In addition, since the new descriptor 'IORES_DESC_RESERVED' has been
>> created for the reserved areas, the code originally related to the
>> descriptor 'IORES_DESC_NONE' also need to be updated.
>>
>> Suggested-by: Borislav Petkov 
>> Signed-off-by: Lianbo Jiang 
>> ---
>>  arch/x86/kernel/e820.c |  2 +-
>>  arch/x86/mm/ioremap.c  | 16 ++--
>>  include/linux/ioport.h |  1 +
>>  3 files changed, 16 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
>> index 2879e234e193..16fcde196243 100644
>> --- a/arch/x86/kernel/e820.c
>> +++ b/arch/x86/kernel/e820.c
>> @@ -1050,10 +1050,10 @@ static unsigned long __init e820_type_to_iores_desc(struct e820_entry *entry)
>>  	case E820_TYPE_NVS:		return IORES_DESC_ACPI_NV_STORAGE;
>>  	case E820_TYPE_PMEM:		return IORES_DESC_PERSISTENT_MEMORY;
>>  	case E820_TYPE_PRAM:		return IORES_DESC_PERSISTENT_MEMORY_LEGACY;
>> +	case E820_TYPE_RESERVED:	return IORES_DESC_RESERVED;
>>  	case E820_TYPE_RESERVED_KERN:	/* Fall-through: */
>>  	case E820_TYPE_RAM:		/* Fall-through: */
>>  	case E820_TYPE_UNUSABLE:	/* Fall-through: */
>> -	case E820_TYPE_RESERVED:	/* Fall-through: */
>>  	default:			return IORES_DESC_NONE;
>>  	}
>>  }
>> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
>> index 0029604af8a4..5671ec24df49 100644
>> --- a/arch/x86/mm/ioremap.c
>> +++ b/arch/x86/mm/ioremap.c
>> @@ -81,9 +81,21 @@ static bool __ioremap_check_ram(struct resource *res)
>>  return false;
>>  }
>>  
>> -static int __ioremap_check_desc_other(struct resource *res)
> 
> I can see this patch doesn't build even without applying and building
> it.
> 
> How about you build-test your stuff before submitting?
> 
Oh, my God. I made a mistake when I copied the code from another machine.
I will correct this issue and resend patch v10.

Thanks.
Lianbo



Re: [PATCH 1/2 v10] x86/mm, resource: add a new I/O resource descriptor 'IORES_DESC_RESERVED'

2019-03-29 Thread Borislav Petkov
On Fri, Mar 29, 2019 at 02:56:48PM +0800, Lianbo Jiang wrote:
> When doing kexec_file_load(), the first kernel needs to pass the e820
> reserved ranges to the second kernel, because some devices may use it
> in kdump kernel, such as PCI devices.
> 
> But, the kernel can not exactly match the e820 reserved ranges when
> walking through the iomem resources via the 'IORES_DESC_NONE', because
> there are several types of e820 that are described as the 'IORES_DESC_NONE'
> type. Please refer to the e820_type_to_iores_desc().
> 
> Therefore, add a new I/O resource descriptor 'IORES_DESC_RESERVED' for
> the iomem resources search interfaces. It is helpful to exactly match
> the reserved resource ranges when walking through iomem resources.
> 
> In addition, since the new descriptor 'IORES_DESC_RESERVED' has been
> created for the reserved areas, the code originally related to the
> descriptor 'IORES_DESC_NONE' also need to be updated.
> 
> Suggested-by: Borislav Petkov 
> Signed-off-by: Lianbo Jiang 
> ---
>  arch/x86/kernel/e820.c |  2 +-
>  arch/x86/mm/ioremap.c  | 16 ++--
>  include/linux/ioport.h |  1 +
>  3 files changed, 16 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
> index 2879e234e193..16fcde196243 100644
> --- a/arch/x86/kernel/e820.c
> +++ b/arch/x86/kernel/e820.c
> @@ -1050,10 +1050,10 @@ static unsigned long __init e820_type_to_iores_desc(struct e820_entry *entry)
>  	case E820_TYPE_NVS:		return IORES_DESC_ACPI_NV_STORAGE;
>  	case E820_TYPE_PMEM:		return IORES_DESC_PERSISTENT_MEMORY;
>  	case E820_TYPE_PRAM:		return IORES_DESC_PERSISTENT_MEMORY_LEGACY;
> +	case E820_TYPE_RESERVED:	return IORES_DESC_RESERVED;
>  	case E820_TYPE_RESERVED_KERN:	/* Fall-through: */
>  	case E820_TYPE_RAM:		/* Fall-through: */
>  	case E820_TYPE_UNUSABLE:	/* Fall-through: */
> -	case E820_TYPE_RESERVED:	/* Fall-through: */
>  	default:			return IORES_DESC_NONE;
>  	}
>  }
> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
> index 0029604af8a4..5671ec24df49 100644
> --- a/arch/x86/mm/ioremap.c
> +++ b/arch/x86/mm/ioremap.c
> @@ -81,9 +81,21 @@ static bool __ioremap_check_ram(struct resource *res)
>   return false;
>  }
>  
> -static int __ioremap_check_desc_other(struct resource *res)

I can see this patch doesn't build even without applying and building
it.

How about you build-test your stuff before submitting?

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.



Re: [PATCH v2] x86/boot: Use EFI setup data if provided

2019-03-29 Thread Junichi Nomura
On 3/29/19 6:44 PM, Chao Fan wrote:
> But in my qemu guest, they are different and the address of ACPI20 is
> higher than ACPI 1.0:
> [root@localhost ~]# cat /sys/firmware/efi/systab
> ACPI20=0xbfbfa014
> ACPI=0xbfbfa000
> SMBIOS=0xbfbcc000
> 
> In this condition, ACPI 1.0 comes before ACPI 2.0. So I suggested you
> to keep this logical.

Thank you for the information. The proposed patch keeps the logic
and should work fine on that case as well.

-- 
Jun'ichi Nomura, NEC Corporation / NEC Solution Innovators, Ltd.



Re: [PATCH v2] x86/boot: Use EFI setup data if provided

2019-03-29 Thread Chao Fan
On Fri, Mar 29, 2019 at 09:37:00AM +0000, Junichi Nomura wrote:
>On 3/29/19 6:16 PM, Borislav Petkov wrote:
>> On Fri, Mar 29, 2019 at 05:05:50PM +0800, Chao Fan wrote:
>>> But in my code, I am not sure which version will be found firstly, so I
>>> write this logical, if ACPI20 found, return directly, then consider ACPI 1.0.
>> 
>> Thanks.
>> 
>> Junichi, please add a shorter version of that as a comment to the code,
>> above the function name so that it is clear why we're preferring the 2.0
>> version.
>
>Sure, I'll add this above __efi_get_rsdp_addr().
>
>/*
> * Search EFI system tables for RSDP.  If both ACPI_20_TABLE_GUID and
> * ACPI_TABLE_GUID are found, take the former, which has more features.
> */
>

I notice that on my host machine, the two tables are at the same address:
[17:38:11] cfan@localhost /home/cfan (0)
> sudo cat /sys/firmware/efi/systab
[sudo] password for cfan:
MPS=0xfd420
ACPI20=0xdb807000
ACPI=0xdb807000
SMBIOS=0xf04c0

But in my qemu guest, they are different and the address of ACPI20 is
higher than ACPI 1.0:
[root@localhost ~]# cat /sys/firmware/efi/systab
ACPI20=0xbfbfa014
ACPI=0xbfbfa000
SMBIOS=0xbfbcc000

In this case, ACPI 1.0 comes before ACPI 2.0, so I suggest you
keep this logic.

Thanks,
Chao Fan

>-- 
>Jun'ichi Nomura, NEC Corporation / NEC Solution Innovators, Ltd.
>





Re: [PATCH v2] x86/boot: Use EFI setup data if provided

2019-03-29 Thread Junichi Nomura
On 3/29/19 6:16 PM, Borislav Petkov wrote:
> On Fri, Mar 29, 2019 at 05:05:50PM +0800, Chao Fan wrote:
>> But in my code, I am not sure which version will be found firstly, so I
>> write this logical, if ACPI20 found, return directly, then consider ACPI 1.0.
> 
> Thanks.
> 
> Junichi, please add a shorter version of that as a comment to the code,
> above the function name so that it is clear why we're preferring the 2.0
> version.

Sure, I'll add this above __efi_get_rsdp_addr().

/*
 * Search EFI system tables for RSDP.  If both ACPI_20_TABLE_GUID and
 * ACPI_TABLE_GUID are found, take the former, which has more features.
 */
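
For readers following along, the preference described in that comment can
be sketched in plain C as below. This is only an illustration under
assumptions: the sk_ types and the two placeholder GUID values are
hypothetical and replace efi_guid_t, efi_guidcmp() and the real
ACPI_TABLE_GUID / ACPI_20_TABLE_GUID constants.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified GUID and config table entry. */
struct sk_guid { uint8_t b[16]; };

struct sk_config_table {
	struct sk_guid guid;
	uint64_t table;		/* physical address of the table */
};

/* Placeholder values, not the real EFI GUIDs. */
static const struct sk_guid SK_ACPI_10_GUID = { { 0x01 } };
static const struct sk_guid SK_ACPI_20_GUID = { { 0x02 } };

static int sk_guid_equal(const struct sk_guid *a, const struct sk_guid *b)
{
	return memcmp(a, b, sizeof(*a)) == 0;
}

/*
 * Scan all entries. An ACPI 1.0 match is only remembered, an ACPI 2.0
 * match returns immediately, so 2.0 wins regardless of table order.
 */
static uint64_t sk_find_rsdp(const struct sk_config_table *tbl, unsigned int nr)
{
	uint64_t rsdp_addr = 0;

	for (unsigned int i = 0; i < nr; i++) {
		if (sk_guid_equal(&tbl[i].guid, &SK_ACPI_10_GUID))
			rsdp_addr = tbl[i].table;
		else if (sk_guid_equal(&tbl[i].guid, &SK_ACPI_20_GUID))
			return tbl[i].table;
	}
	return rsdp_addr;
}
```

With the 1.0 entry listed first (address 0xbfbfa000) and the 2.0 entry
second (address 0xbfbfa014), the sketch still returns the 2.0 address,
which is the behaviour a plain "return the first match" loop would lose.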

-- 
Jun'ichi Nomura, NEC Corporation / NEC Solution Innovators, Ltd.


Re: [PATCH] x86/boot: Use efi_setup_data for searching RSDP on kexec-ed kernel

2019-03-29 Thread Chao Fan
On Fri, Mar 29, 2019 at 05:16:55PM +0800, b...@redhat.com wrote:
>On 03/29/19 at 04:29pm, Chao Fan wrote:
>> On Fri, Mar 29, 2019 at 07:20:38AM +0000, Junichi Nomura wrote:
>> >Commit 3a63f70bf4c3a ("x86/boot: Early parse RSDP and save it in
>> >boot_params") broke kexec boot on EFI systems.  efi_get_rsdp_addr()
>> >in the early parsing code tries to search RSDP from EFI table but
>> >that will crash because the table address is virtual when the kernel
>> >was booted by kexec.
>> [...]
>> >-   guid  = tbl->guid;
>> >-   table = tbl->table;
>> >-   }
>> >-
>> >-   if (!(efi_guidcmp(guid, ACPI_TABLE_GUID)))
>> >-   rsdp_addr = table;
>> >-   else if (!(efi_guidcmp(guid, ACPI_20_TABLE_GUID)))
>> >-   return table;
>> >-   }
>> >+   return __efi_get_rsdp_addr(config_tables, nr_tables, efi_64);
>> >+#else
>> >+   return 0;
>> > #endif
>> >-   return rsdp_addr;
>> 
>> I remember the rsdp_addr is defined before #ifdef CONFIG_EFI
>> If so, you don't need
>> #else
>>  return 0;
>
>it doesn't change the old logic. Junichi moved rsdp_addr definition inside
>the CONFIG_EFI ifdeffery block.

Yes, got it. Thanks for your explanation.

Thanks,
Chao Fan
>> 
>> BY the way, what's your patch based on? I like add patch on my local
>> branch and then review code, but failed.
>> I try to use 'patch -p1 <' your patch to the latest tip master branch,
>> but failed.
>> 
>> Thanks,
>> Chao Fan
>> 
>> > }
>> > 
>> > static u8 compute_checksum(u8 *buffer, u32 length)
>> >@@ -221,6 +284,9 @@ acpi_physical_address get_rsdp_addr(void)
>> >pa = boot_params->acpi_rsdp_addr;
>> > 
>> >if (!pa)
>> >+   pa = kexec_get_rsdp_addr();
>> >+
>> >+   if (!pa)
>> >pa = efi_get_rsdp_addr();
>> > 
>> >if (!pa)
>> >
>> >
>> 
>> 
>
>





Re: [PATCH] x86/boot: Use efi_setup_data for searching RSDP on kexec-ed kernel

2019-03-29 Thread Chao Fan
On Fri, Mar 29, 2019 at 08:39:45AM +0000, Junichi Nomura wrote:
>On 3/29/19 5:29 PM, Chao Fan wrote:
>> On Fri, Mar 29, 2019 at 07:20:38AM +0000, Junichi Nomura wrote:
>>> +   return __efi_get_rsdp_addr(config_tables, nr_tables, efi_64);
>>> +#else
>>> +   return 0;
>>> #endif
>>> -   return rsdp_addr;
>> 
>> I remember the rsdp_addr is defined before #ifdef CONFIG_EFI
>> If so, you don't need
>> #else
>>  return 0;
>
>I moved the whole __efi_get_rsdp_addr() to the inside of #ifdef CONFIG_EFI
>and both kexec_get_rsdp_addr() and efi_get_rsdp_addr() just return 0 if
>CONFIG_EFI is not defined.

Ah, got it. I will add the patch and do a simple test.
>
>> BY the way, what's your patch based on? I like add patch on my local
>> branch and then review code, but failed.
>> I try to use 'patch -p1 <' your patch to the latest tip master branch,
>> but failed.
>
>The patch is based on Linus's v5.1-rc2.

Thanks,
Chao Fan

>
>-- 
>Jun'ichi Nomura, NEC Corporation / NEC Solution Innovators, Ltd.
>
>





Re: [PATCH] x86/boot: Use efi_setup_data for searching RSDP on kexec-ed kernel

2019-03-29 Thread b...@redhat.com
On 03/29/19 at 04:29pm, Chao Fan wrote:
> On Fri, Mar 29, 2019 at 07:20:38AM +0000, Junichi Nomura wrote:
> >Commit 3a63f70bf4c3a ("x86/boot: Early parse RSDP and save it in
> >boot_params") broke kexec boot on EFI systems.  efi_get_rsdp_addr()
> >in the early parsing code tries to search RSDP from EFI table but
> >that will crash because the table address is virtual when the kernel
> >was booted by kexec.
> [...]
> >-guid  = tbl->guid;
> >-table = tbl->table;
> >-}
> >-
> >-if (!(efi_guidcmp(guid, ACPI_TABLE_GUID)))
> >-rsdp_addr = table;
> >-else if (!(efi_guidcmp(guid, ACPI_20_TABLE_GUID)))
> >-return table;
> >-}
> >+return __efi_get_rsdp_addr(config_tables, nr_tables, efi_64);
> >+#else
> >+return 0;
> > #endif
> >-return rsdp_addr;
> 
> I remember the rsdp_addr is defined before #ifdef CONFIG_EFI
> If so, you don't need
> #else
>   return 0;

It doesn't change the old logic. Junichi moved the rsdp_addr definition inside
the CONFIG_EFI ifdeffery block.
> 
> BY the way, what's your patch based on? I like add patch on my local
> branch and then review code, but failed.
> I try to use 'patch -p1 <' your patch to the latest tip master branch,
> but failed.
> 
> Thanks,
> Chao Fan
> 
> > }
> > 
> > static u8 compute_checksum(u8 *buffer, u32 length)
> >@@ -221,6 +284,9 @@ acpi_physical_address get_rsdp_addr(void)
> > pa = boot_params->acpi_rsdp_addr;
> > 
> > if (!pa)
> >+pa = kexec_get_rsdp_addr();
> >+
> >+if (!pa)
> > pa = efi_get_rsdp_addr();
> > 
> > if (!pa)
> >
> >
> 
> 



Re: [PATCH v2] x86/boot: Use EFI setup data if provided

2019-03-29 Thread Borislav Petkov
On Fri, Mar 29, 2019 at 05:05:50PM +0800, Chao Fan wrote:
> But in my code, I am not sure which version will be found firstly, so I
> write this logical, if ACPI20 found, return directly, then consider ACPI 1.0.

Thanks.

Junichi, please add a shorter version of that as a comment to the code,
above the function name so that it is clear why we're preferring the 2.0
version.

Thanks.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.



Re: [PATCH v2] x86/boot: Use EFI setup data if provided

2019-03-29 Thread Chao Fan
On Fri, Mar 29, 2019 at 09:39:20AM +0100, Borislav Petkov wrote:
>On Fri, Mar 29, 2019 at 03:05:52AM +0000, Junichi Nomura wrote:
>> > You don't need that variable and can return "table" or 0 after the endif
>> > below.
>> 
>> I could do that but it will slightly change the current logic.
>> 
>> Existing code does this:
>> 
>> if (!(efi_guidcmp(guid, ACPI_TABLE_GUID)))
>> rsdp_addr = table;
>> else if (!(efi_guidcmp(guid, ACPI_20_TABLE_GUID)))
>> return table;
>> 
>> I thought it was to return the table for ACPI_20_TABLE_GUID
>> if both tables exist.  If we remove rsdp_addr, the code will be:
>> 
>> if (!(efi_guidcmp(guid, ACPI_TABLE_GUID)))
>> return table;
>> else if (!(efi_guidcmp(guid, ACPI_20_TABLE_GUID)))
>> return table;
>> 
>> So if there are 2 tables, we return the one that comes first.
>> Is it ok?
>
>That's a good question.
>
>Chao, what was the intention there, ACPI_20_TABLE_GUID is the preferred
>table to return? If so, why?

Yes, ACPI_20_TABLE_GUID is preferred.

ACPI_20 means version 2.0 and later versions; ACPI_TABLE_GUID is version 1.0,
which dates from before 2003 and is too old. Version 2.0 has more features
than 1.0, so of course the newer version is preferred.

A lot of code prefers ACPI20, such as drivers/acpi/osl.c, where the kernel
parses the ACPI20 RSDP first. Documentation/ABI/testing/sysfs-firmware-efi
says that in /sys/firmware/efi/systab, ACPI20 comes before ACPI, so that the
kexec-tools code in kexec/arch/i386/crashdump-x86.c can easily get ACPI_20
(if there is one) before ACPI 1.0.
But in my code, I am not sure which version will be found first, so I
wrote this logic: if ACPI20 is found, return it directly; otherwise fall
back to ACPI 1.0.

Thanks,
Chao Fan

>-- 
>Regards/Gruss,
>Boris.
>
>Good mailing practices for 400: avoid top-posting and trim the reply.
>
>





Re: [PATCH] x86/boot: Use efi_setup_data for searching RSDP on kexec-ed kernel

2019-03-29 Thread Junichi Nomura
On 3/29/19 5:29 PM, Chao Fan wrote:
> On Fri, Mar 29, 2019 at 07:20:38AM +0000, Junichi Nomura wrote:
>> +return __efi_get_rsdp_addr(config_tables, nr_tables, efi_64);
>> +#else
>> +return 0;
>> #endif
>> -return rsdp_addr;
> 
> I remember the rsdp_addr is defined before #ifdef CONFIG_EFI
> If so, you don't need
> #else
>   return 0;

I moved the whole __efi_get_rsdp_addr() to the inside of #ifdef CONFIG_EFI
and both kexec_get_rsdp_addr() and efi_get_rsdp_addr() just return 0 if
CONFIG_EFI is not defined.
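
A minimal sketch of that layout, assuming a hypothetical SK_CONFIG_EFI
macro in place of CONFIG_EFI and a placeholder sk_lookup_rsdp() in place of
the real table parsing: the shared helper exists only inside the #ifdef,
both wrappers degrade to "return 0" without it, and the caller simply falls
through to the next source when it gets 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * SK_CONFIG_EFI is a hypothetical stand-in for CONFIG_EFI; build with
 * -DSK_CONFIG_EFI to compile the EFI path in.
 */
#ifdef SK_CONFIG_EFI
/* Placeholder for the real parser: just returns the first table address. */
static uint64_t sk_lookup_rsdp(const uint64_t *tables, unsigned int nr)
{
	return nr ? tables[0] : 0;
}
#endif

static uint64_t sk_kexec_get_rsdp_addr(const uint64_t *tables, unsigned int nr)
{
#ifdef SK_CONFIG_EFI
	return sk_lookup_rsdp(tables, nr);
#else
	(void)tables;
	(void)nr;
	return 0;
#endif
}

static uint64_t sk_efi_get_rsdp_addr(const uint64_t *tables, unsigned int nr)
{
#ifdef SK_CONFIG_EFI
	return sk_lookup_rsdp(tables, nr);
#else
	(void)tables;
	(void)nr;
	return 0;
#endif
}

/* Try each source in order; 0 means "not found, try the next one". */
static uint64_t sk_get_rsdp_addr(uint64_t from_boot_params,
				 const uint64_t *tables, unsigned int nr)
{
	uint64_t pa = from_boot_params;

	if (!pa)
		pa = sk_kexec_get_rsdp_addr(tables, nr);
	if (!pa)
		pa = sk_efi_get_rsdp_addr(tables, nr);
	return pa;
}
```

Because the helper is not compiled without the macro, the explicit
"#else return 0;" branch is what keeps each wrapper well-formed in that
configuration.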

> BY the way, what's your patch based on? I like add patch on my local
> branch and then review code, but failed.
> I try to use 'patch -p1 <' your patch to the latest tip master branch,
> but failed.

The patch is based on Linus's v5.1-rc2.

-- 
Jun'ichi Nomura, NEC Corporation / NEC Solution Innovators, Ltd.



Re: [PATCH v2] x86/boot: Use EFI setup data if provided

2019-03-29 Thread Borislav Petkov
On Fri, Mar 29, 2019 at 03:05:52AM +0000, Junichi Nomura wrote:
> > You don't need that variable and can return "table" or 0 after the endif
> > below.
> 
> I could do that but it will slightly change the current logic.
> 
> Existing code does this:
> 
> if (!(efi_guidcmp(guid, ACPI_TABLE_GUID)))
> rsdp_addr = table;
> else if (!(efi_guidcmp(guid, ACPI_20_TABLE_GUID)))
> return table;
> 
> I thought it was to return the table for ACPI_20_TABLE_GUID
> if both tables exist.  If we remove rsdp_addr, the code will be:
> 
> if (!(efi_guidcmp(guid, ACPI_TABLE_GUID)))
> return table;
> else if (!(efi_guidcmp(guid, ACPI_20_TABLE_GUID)))
> return table;
> 
> So if there are 2 tables, we return the one that comes first.
> Is it ok?

That's a good question.

Chao, what was the intention there, ACPI_20_TABLE_GUID is the preferred
table to return? If so, why?

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.



Re: [PATCH] x86/boot: Use efi_setup_data for searching RSDP on kexec-ed kernel

2019-03-29 Thread Chao Fan
On Fri, Mar 29, 2019 at 07:20:38AM +0000, Junichi Nomura wrote:
>Commit 3a63f70bf4c3a ("x86/boot: Early parse RSDP and save it in
>boot_params") broke kexec boot on EFI systems.  efi_get_rsdp_addr()
>in the early parsing code tries to search RSDP from EFI table but
>that will crash because the table address is virtual when the kernel
>was booted by kexec.
[...]
>-  guid  = tbl->guid;
>-  table = tbl->table;
>-  }
>-
>-  if (!(efi_guidcmp(guid, ACPI_TABLE_GUID)))
>-  rsdp_addr = table;
>-  else if (!(efi_guidcmp(guid, ACPI_20_TABLE_GUID)))
>-  return table;
>-  }
>+  return __efi_get_rsdp_addr(config_tables, nr_tables, efi_64);
>+#else
>+  return 0;
> #endif
>-  return rsdp_addr;

I remember that rsdp_addr is defined before #ifdef CONFIG_EFI.
If so, you don't need
#else
return 0;

By the way, what is your patch based on? I like to apply a patch on my local
branch and then review the code, but that failed.
I tried to apply your patch with 'patch -p1 <' on the latest tip master
branch, but it failed.

Thanks,
Chao Fan

> }
> 
> static u8 compute_checksum(u8 *buffer, u32 length)
>@@ -221,6 +284,9 @@ acpi_physical_address get_rsdp_addr(void)
>   pa = boot_params->acpi_rsdp_addr;
> 
>   if (!pa)
>+  pa = kexec_get_rsdp_addr();
>+
>+  if (!pa)
>   pa = efi_get_rsdp_addr();
> 
>   if (!pa)
>
>





Re: [PATCH] x86/boot: Use efi_setup_data for searching RSDP on kexec-ed kernel

2019-03-29 Thread b...@redhat.com
On 03/29/19 at 07:20am, Junichi Nomura wrote:
> Commit 3a63f70bf4c3a ("x86/boot: Early parse RSDP and save it in
> boot_params") broke kexec boot on EFI systems.  efi_get_rsdp_addr()
> in the early parsing code tries to search RSDP from EFI table but
> that will crash because the table address is virtual when the kernel
> was booted by kexec.
> 
> In the case of kexec, physical address of EFI tables is provided
> via efi_setup_data in boot_params, which is set up by kexec(1).
> 
> Factor out the table parsing code and use different pointers depending
> on whether the kernel is booted by kexec or not.
> 
> Fixes: 3a63f70bf4c3a ("x86/boot: Early parse RSDP and save it in boot_params")
> Signed-off-by: Jun'ichi Nomura 
> Cc: Chao Fan 
> Cc: Borislav Petkov 
> Cc: Dave Young 
> Cc: Baoquan He 
> 
> diff --git a/arch/x86/boot/compressed/acpi.c b/arch/x86/boot/compressed/acpi.c
> --- a/arch/x86/boot/compressed/acpi.c
> +++ b/arch/x86/boot/compressed/acpi.c
> @@ -44,17 +44,109 @@ static acpi_physical_address get_acpi_rsdp(void)
>   return addr;
>  }
>  
> +#ifdef CONFIG_EFI
> +static unsigned long efi_get_kexec_setup_data_addr(void)
> +{
> + struct setup_data *data;
> + u64 pa_data;
> +
> + pa_data = boot_params->hdr.setup_data;
> + while (pa_data) {
> + data = (struct setup_data *) pa_data;
> + if (data->type == SETUP_EFI)
> + return pa_data + sizeof(struct setup_data);
> + pa_data = data->next;
> + }
> + return 0;
> +}
> +
>  /* Search EFI system tables for RSDP. */
> -static acpi_physical_address efi_get_rsdp_addr(void)
> +static acpi_physical_address
> +__efi_get_rsdp_addr(unsigned long config_tables, unsigned int nr_tables,
> + bool efi_64)
>  {
>   acpi_physical_address rsdp_addr = 0;
> + int i;
> +
> + /* Get EFI tables from systab. */
> + for (i = 0; i < nr_tables; i++) {
> + acpi_physical_address table;
> + efi_guid_t guid;
> +
> + if (efi_64) {
> + efi_config_table_64_t *tbl = (efi_config_table_64_t *) config_tables + i;
> +
> + guid  = tbl->guid;
> + table = tbl->table;
> +
> + if (!IS_ENABLED(CONFIG_X86_64) && table >> 32) {
> + debug_putstr("Error getting RSDP address: EFI config table located above 4GB.\n");
> + return 0;
> + }
> + } else {
> + efi_config_table_32_t *tbl = (efi_config_table_32_t *) config_tables + i;
> +
> + guid  = tbl->guid;
> + table = tbl->table;
> + }
> +
> + if (!(efi_guidcmp(guid, ACPI_TABLE_GUID)))
> + rsdp_addr = table;
> + else if (!(efi_guidcmp(guid, ACPI_20_TABLE_GUID)))
> + return table;
> + }
> +
> + return rsdp_addr;
> +}
> +#endif
>  
> +static acpi_physical_address kexec_get_rsdp_addr(void)
> +{
>  #ifdef CONFIG_EFI
> - unsigned long systab, systab_tables, config_tables;
> + efi_system_table_64_t *systab;
> + struct efi_setup_data *esd;
> + struct efi_info *ei;
> + char *sig;
> +
> + esd = (struct efi_setup_data *) efi_get_kexec_setup_data_addr();
> + if (!esd)
> + return 0;
> +
> + if (!esd->tables) {
> + debug_putstr("Wrong kexec SETUP_EFI data.\n");
> + return 0;
> + }
> +
> + ei = &boot_params->efi_info;
> + sig = (char *)&ei->efi_loader_signature;
> + /*
> +  * EFI/kexec support is only added for 64bit. So we don't have to
> +  * care 32bit case.
> +  */
I would put the doc above kexec_get_rsdp_addr(). Other than this, it
looks good to me. Thanks for the effort.

Acked-by: Baoquan He 


> + if (strncmp(sig, EFI64_LOADER_SIGNATURE, 4)) {
> + debug_putstr("Wrong kexec EFI loader signature.\n");
> + return 0;
> + }
> +
> + /* Get systab from boot params. */
> + systab = (efi_system_table_64_t *) (ei->efi_systab | ((__u64)ei->efi_systab_hi << 32));
> + if (!systab)
> + error("EFI system table not found in kexec boot_params.");
> +
> + return __efi_get_rsdp_addr((unsigned long) esd->tables,
> +systab->nr_tables, true);
> +#else
> + return 0;
> +#endif
> +}
> +
> +static acpi_physical_address efi_get_rsdp_addr(void)
> +{
> +#ifdef CONFIG_EFI
> + unsigned long systab, config_tables;
>   unsigned int nr_tables;
>   struct efi_info *ei;
>   bool efi_64;
> - int size, i;
>   char *sig;
>  
>   ei = &boot_params->efi_info;
> @@ -88,49 +180,20 @@ static acpi_physical_address efi_get_rsdp_addr(void)
>  
>   config_tables   = stbl->tables;
>   nr_tables   = stbl->nr_tables;
> - size= sizeof(efi_config_table_64_t);
>   } else {
>   

[PATCH] x86/boot: Use efi_setup_data for searching RSDP on kexec-ed kernel

2019-03-29 Thread Junichi Nomura
Commit 3a63f70bf4c3a ("x86/boot: Early parse RSDP and save it in
boot_params") broke kexec boot on EFI systems.  efi_get_rsdp_addr()
in the early parsing code tries to search RSDP from EFI table but
that will crash because the table address is virtual when the kernel
was booted by kexec.

In the case of kexec, physical address of EFI tables is provided
via efi_setup_data in boot_params, which is set up by kexec(1).

Factor out the table parsing code and use different pointers depending
on whether the kernel is booted by kexec or not.
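
For illustration only (not part of the patch): a minimal userspace sketch of the lookup described above, assuming a simplified setup_data node layout. The demo_* names and the DEMO_SETUP_EFI value are stand-ins chosen for this example, not the kernel's definitions. It walks a singly linked list of nodes until it finds the wanted type, then returns the address of the payload that follows the header. A second helper shows how the two 32-bit halves of the system table address are combined.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's SETUP_EFI type value. */
#define DEMO_SETUP_EFI 4u

/* Simplified header of one setup_data node; the payload follows it. */
struct demo_setup_data {
	uint64_t next;	/* address of the next node, 0 ends the list */
	uint32_t type;
	uint32_t len;	/* payload length in bytes */
};

/* Return the address of the first payload of the wanted type, or 0. */
static uint64_t demo_find_setup_payload(uint64_t head, uint32_t wanted)
{
	uint64_t pa = head;

	while (pa) {
		const struct demo_setup_data *d =
			(const struct demo_setup_data *)(uintptr_t)pa;

		if (d->type == wanted)
			return pa + sizeof(*d);
		pa = d->next;
	}
	return 0;
}

/* Combine the split 32-bit halves of the system table address. */
static uint64_t demo_systab_addr(uint32_t lo, uint32_t hi)
{
	return (uint64_t)lo | ((uint64_t)hi << 32);
}

/* Build a two-node list and check that the second node is found. */
static void demo_setup_self_check(void)
{
	struct { struct demo_setup_data hdr; uint64_t payload; } a, b;

	b.hdr = (struct demo_setup_data){ 0, DEMO_SETUP_EFI, 8 };
	a.hdr = (struct demo_setup_data){ (uint64_t)(uintptr_t)&b, 1, 8 };
	a.payload = b.payload = 0;

	assert(demo_find_setup_payload((uint64_t)(uintptr_t)&a, DEMO_SETUP_EFI)
	       == (uint64_t)(uintptr_t)&b.payload);
	assert(demo_systab_addr(0x1000, 0x2) == 0x200001000ULL);
}
```

Returning 0 for "not found" mirrors how the patch lets the caller fall back to the non-kexec path when no SETUP_EFI node is present.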

Fixes: 3a63f70bf4c3a ("x86/boot: Early parse RSDP and save it in boot_params")
Signed-off-by: Jun'ichi Nomura 
Cc: Chao Fan 
Cc: Borislav Petkov 
Cc: Dave Young 
Cc: Baoquan He 

diff --git a/arch/x86/boot/compressed/acpi.c b/arch/x86/boot/compressed/acpi.c
--- a/arch/x86/boot/compressed/acpi.c
+++ b/arch/x86/boot/compressed/acpi.c
@@ -44,17 +44,109 @@ static acpi_physical_address get_acpi_rsdp(void)
return addr;
 }
 
+#ifdef CONFIG_EFI
+static unsigned long efi_get_kexec_setup_data_addr(void)
+{
+   struct setup_data *data;
+   u64 pa_data;
+
+   pa_data = boot_params->hdr.setup_data;
+   while (pa_data) {
+   data = (struct setup_data *) pa_data;
+   if (data->type == SETUP_EFI)
+   return pa_data + sizeof(struct setup_data);
+   pa_data = data->next;
+   }
+   return 0;
+}
+
 /* Search EFI system tables for RSDP. */
-static acpi_physical_address efi_get_rsdp_addr(void)
+static acpi_physical_address
+__efi_get_rsdp_addr(unsigned long config_tables, unsigned int nr_tables,
+   bool efi_64)
 {
acpi_physical_address rsdp_addr = 0;
+   int i;
+
+   /* Get EFI tables from systab. */
+   for (i = 0; i < nr_tables; i++) {
+   acpi_physical_address table;
+   efi_guid_t guid;
+
+   if (efi_64) {
+   efi_config_table_64_t *tbl = (efi_config_table_64_t *) config_tables + i;
+
+   guid  = tbl->guid;
+   table = tbl->table;
+
+   if (!IS_ENABLED(CONFIG_X86_64) && table >> 32) {
+   debug_putstr("Error getting RSDP address: EFI config table located above 4GB.\n");
+   return 0;
+   }
+   } else {
+   efi_config_table_32_t *tbl = (efi_config_table_32_t *) config_tables + i;
+
+   guid  = tbl->guid;
+   table = tbl->table;
+   }
+
+   if (!(efi_guidcmp(guid, ACPI_TABLE_GUID)))
+   rsdp_addr = table;
+   else if (!(efi_guidcmp(guid, ACPI_20_TABLE_GUID)))
+   return table;
+   }
+
+   return rsdp_addr;
+}
+#endif
 
+static acpi_physical_address kexec_get_rsdp_addr(void)
+{
 #ifdef CONFIG_EFI
-   unsigned long systab, systab_tables, config_tables;
+   efi_system_table_64_t *systab;
+   struct efi_setup_data *esd;
+   struct efi_info *ei;
+   char *sig;
+
+   esd = (struct efi_setup_data *) efi_get_kexec_setup_data_addr();
+   if (!esd)
+   return 0;
+
+   if (!esd->tables) {
+   debug_putstr("Wrong kexec SETUP_EFI data.\n");
+   return 0;
+   }
+
+   ei = &boot_params->efi_info;
+   sig = (char *)&ei->efi_loader_signature;
+   /*
+* EFI/kexec support is only added for 64bit. So we don't have to
+* care 32bit case.
+*/
+   if (strncmp(sig, EFI64_LOADER_SIGNATURE, 4)) {
+   debug_putstr("Wrong kexec EFI loader signature.\n");
+   return 0;
+   }
+
+   /* Get systab from boot params. */
+   systab = (efi_system_table_64_t *) (ei->efi_systab | ((__u64)ei->efi_systab_hi << 32));
+   if (!systab)
+   error("EFI system table not found in kexec boot_params.");
+
+   return __efi_get_rsdp_addr((unsigned long) esd->tables,
+  systab->nr_tables, true);
+#else
+   return 0;
+#endif
+}
+
+static acpi_physical_address efi_get_rsdp_addr(void)
+{
+#ifdef CONFIG_EFI
+   unsigned long systab, config_tables;
unsigned int nr_tables;
struct efi_info *ei;
bool efi_64;
-   int size, i;
char *sig;
 
ei = &boot_params->efi_info;
@@ -88,49 +180,20 @@ static acpi_physical_address efi_get_rsdp_addr(void)
 
config_tables   = stbl->tables;
nr_tables   = stbl->nr_tables;
-   size= sizeof(efi_config_table_64_t);
} else {
efi_system_table_32_t *stbl = (efi_system_table_32_t *)systab;
 
config_tables   = stbl->tables;
nr_tables   = stbl->nr_tables;
-   size= sizeof(efi_config_table_32_t);
}
 
if (!config_tables)
error("EFI config tables not found.");
 

[PATCH 1/2 v10] x86/mm, resource: add a new I/O resource descriptor 'IORES_DESC_RESERVED'

2019-03-29 Thread Lianbo Jiang
When doing kexec_file_load(), the first kernel needs to pass the e820
reserved ranges to the second kernel, because some devices may use it
in kdump kernel, such as PCI devices.

However, the kernel cannot exactly match the e820 reserved ranges when
walking through the iomem resources via 'IORES_DESC_NONE', because
several e820 types are described as 'IORES_DESC_NONE'. Please refer to
e820_type_to_iores_desc().

Therefore, add a new I/O resource descriptor 'IORES_DESC_RESERVED' for
the iomem resources search interfaces. It is helpful to exactly match
the reserved resource ranges when walking through iomem resources.

In addition, since the new descriptor 'IORES_DESC_RESERVED' has been
created for the reserved areas, the code originally related to the
descriptor 'IORES_DESC_NONE' also needs to be updated.
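
For illustration only (not part of the patch): a minimal sketch of why a search by 'IORES_DESC_NONE' cannot single out reserved ranges. The demo_* enums are simplified stand-ins for the kernel's e820 types and resource descriptors, and the two functions model the mapping before and after this change.

```c
#include <assert.h>

/* Simplified stand-ins for a few e820 types. */
enum demo_e820_type {
	DEMO_E820_RAM,
	DEMO_E820_RESERVED,
	DEMO_E820_UNUSABLE,
	DEMO_E820_NVS,
};

/* Simplified stand-ins for the resource descriptors. */
enum demo_desc {
	DEMO_DESC_NONE,
	DEMO_DESC_ACPI_NV_STORAGE,
	DEMO_DESC_RESERVED,
};

/* Mapping before the patch: RESERVED falls through to NONE. */
static enum demo_desc demo_desc_before(enum demo_e820_type t)
{
	switch (t) {
	case DEMO_E820_NVS:
		return DEMO_DESC_ACPI_NV_STORAGE;
	default:
		return DEMO_DESC_NONE;
	}
}

/* Mapping after the patch: RESERVED gets its own descriptor. */
static enum demo_desc demo_desc_after(enum demo_e820_type t)
{
	switch (t) {
	case DEMO_E820_NVS:
		return DEMO_DESC_ACPI_NV_STORAGE;
	case DEMO_E820_RESERVED:
		return DEMO_DESC_RESERVED;
	default:
		return DEMO_DESC_NONE;
	}
}

/* RAM and RESERVED collide before the patch and are distinct after it. */
static void demo_desc_self_check(void)
{
	assert(demo_desc_before(DEMO_E820_RAM) ==
	       demo_desc_before(DEMO_E820_RESERVED));
	assert(demo_desc_after(DEMO_E820_RAM) !=
	       demo_desc_after(DEMO_E820_RESERVED));
}
```

With the "before" mapping, a walk for DEMO_DESC_NONE would also return RAM and unusable ranges, which is the ambiguity the new descriptor removes.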

Suggested-by: Borislav Petkov 
Signed-off-by: Lianbo Jiang 
---
 arch/x86/kernel/e820.c |  2 +-
 arch/x86/mm/ioremap.c  | 16 ++--
 include/linux/ioport.h |  1 +
 3 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 2879e234e193..16fcde196243 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -1050,10 +1050,10 @@ static unsigned long __init e820_type_to_iores_desc(struct e820_entry *entry)
case E820_TYPE_NVS: return IORES_DESC_ACPI_NV_STORAGE;
case E820_TYPE_PMEM:return IORES_DESC_PERSISTENT_MEMORY;
case E820_TYPE_PRAM:return IORES_DESC_PERSISTENT_MEMORY_LEGACY;
+   case E820_TYPE_RESERVED:return IORES_DESC_RESERVED;
case E820_TYPE_RESERVED_KERN:   /* Fall-through: */
case E820_TYPE_RAM: /* Fall-through: */
case E820_TYPE_UNUSABLE:/* Fall-through: */
-   case E820_TYPE_RESERVED:/* Fall-through: */
default:return IORES_DESC_NONE;
}
 }
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 0029604af8a4..5671ec24df49 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -81,9 +81,21 @@ static bool __ioremap_check_ram(struct resource *res)
return false;
 }
 
-static int __ioremap_check_desc_other(struct resource *res)
+/*
+ * Originally, these areas described as IORES_DESC_NONE are not mapped
+ * as encrypted when using ioremap(), for example, E820_TYPE_{RESERVED,
+ * RESERVED_KERN,RAM,UNUSABLE}, etc. It checks for a resource that is
+ * not described as IORES_DESC_NONE, which can make sure the reserved
+ * areas are not mapped as encrypted when using ioremap().
+ *
+ * Now IORES_DESC_RESERVED has been created for the reserved areas so
+ * the check needs to be expanded so that these areas are not mapped
+ * encrypted when using ioremap().
+ */
+static int __ioremap_check_desc_none_and_reserved(struct resource *res)
 {
-   return (res->desc != IORES_DESC_NONE);
+   return ((res->desc != IORES_DESC_NONE) &&
+   (res->desc != IORES_DESC_RESERVED));
 }
 
 static int __ioremap_res_check(struct resource *res, void *arg)
diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index da0ebaec25f0..6ed59de48bd5 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -133,6 +133,7 @@ enum {
IORES_DESC_PERSISTENT_MEMORY_LEGACY = 5,
IORES_DESC_DEVICE_PRIVATE_MEMORY= 6,
IORES_DESC_DEVICE_PUBLIC_MEMORY = 7,
+   IORES_DESC_RESERVED = 8,
 };
 
 /* helpers to define resources */
-- 
2.17.1


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH 0/2 v10] add reserved e820 ranges to the kdump kernel e820 table

2019-03-29 Thread Lianbo Jiang
This patchset does two things:

a). add a new I/O resource descriptor 'IORES_DESC_RESERVED'
When doing kexec_file_load(), the first kernel needs to pass the e820
reserved ranges to the second kernel, because some devices may use it
in kdump kernel, such as PCI devices.

However, the kernel cannot exactly match the e820 reserved ranges when
walking through the iomem resources via 'IORES_DESC_NONE', because
several e820 types are described as 'IORES_DESC_NONE'. Please refer to
e820_type_to_iores_desc().

Therefore, add a new I/O resource descriptor 'IORES_DESC_RESERVED' for
the iomem resources search interfaces. It is helpful to exactly match
the reserved resource ranges when walking through iomem resources.

In addition, since the new descriptor 'IORES_DESC_RESERVED' has been
created for the reserved areas, the code originally related to the
descriptor 'IORES_DESC_NONE' also needs to be updated.

b). add the e820 reserved ranges to kdump kernel e820 table
At present, when using the kexec_file_load() syscall to load the kernel
image and initramfs(for example: kexec -s -p xxx), the kernel does not
pass the e820 reserved ranges to the second kernel, which might cause
two problems:

The first one is the MMCONFIG issue. The basic problem is that this
device is in PCI segment 1 and the kernel PCI probing can not find it
without all the e820 I/O reservations being present in the e820 table.
And the kdump kernel does not have those reservations because the kexec
command does not pass the I/O reservation via the "memmap=xxx" command
line option. (This problem does not show up for other vendors, as SGI
is apparently the actually fails for everyone, but devices in segment 0
are then found by some legacy lookup method.) The workaround for this
is to pass the I/O reserved regions to the kdump kernel.

MMCONFIG (aka ECAM) space is described in the ACPI MCFG table. If you don't
have ECAM: (a) PCI devices won't work at all on non-x86 systems that use
only ECAM for config access, (b) you won't be able to access devices on
non-0 segments, (c) you won't be able to access extended config space (
address 0x100-0x), which means none of the Extended Capabilities will
be available (AER, ACS, ATS, etc). [Bjorn's comment]
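
For illustration only (not part of the patchset): a minimal sketch of how a config register is addressed inside an ECAM window, assuming the standard PCIe layout of 4 KiB of config space per function. The demo_* names are hypothetical. Registers at offset 0x100 and above form the extended config space mentioned in (c).

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset of a config register inside an ECAM window (4 KiB per function). */
static uint64_t demo_ecam_offset(uint8_t bus, uint8_t dev, uint8_t fn,
				 uint16_t reg)
{
	/* A bus has 32 devices, a device has 8 functions. */
	assert(dev < 32 && fn < 8 && reg < 4096);

	return ((uint64_t)bus << 20) | ((uint64_t)dev << 15) |
	       ((uint64_t)fn << 12) | reg;
}

/* Registers at 0x100 and above belong to extended config space. */
static int demo_is_extended(uint16_t reg)
{
	return reg >= 0x100;
}
```

The offset is added to the per-segment base address that the MCFG table reports, which is why a missing reservation for that window affects every device behind it.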

The second issue is that the SME kdump kernel doesn't work without the
e820 reserved ranges. When SME is active in the kdump kernel, those
reserved regions are still decrypted. However, because the ranges are
not present at all in the kdump kernel's e820 table, the regions are
treated as encrypted, and the kdump kernel fails.

The e820 reserved range is useful in kdump kernel, so it is necessary to
pass the e820 reserved ranges to the kdump kernel.

Changes since v1:
1. Modified the value of flags to "0", when walking through the whole
tree for e820 reserved ranges.

Changes since v2:
1. Modified the value of flags to "0", when walking through the whole
tree for e820 reserved ranges.
2. Modified the invalid SOB chain issue.

Changes since v3:
1. Dropped [PATCH 1/3 v3] resource: fix an error which walks through iomem
   resources. Please refer to this commit <010a93bf97c7> "resource: Fix
   find_next_iomem_res() iteration issue"

Changes since v4:
1. Improve the patch log, and add kernel log.

Changes since v5:
1. Rewrite the patch logs.

Changes since v6:
1. Modify the [PATCH 1/2], and add the new I/O resource descriptor
   'IORES_DESC_RESERVED' for the iomem resources search interfaces,
   and also update the code related to 'IORES_DESC_NONE'.
2. Modify the [PATCH 2/2], and walk through io resource based on the
   new descriptor 'IORES_DESC_RESERVED'.
3. Update patch log.

Changes since v7:
1. Improve patch log.
2. Improve this function __ioremap_check_desc_other().
3. Modify code comment in the __ioremap_check_desc_other()

Changes since v8:
1. Get rid of all changes about ia64.(Borislav's suggestion)
2. Change the examination condition to the 'IORES_DESC_ACPI_*'.
3. Modify the signature. This patch(add the new I/O resource
   descriptor 'IORES_DESC_RESERVED') was suggested by Boris.

Changes since v9:
1. Improve patch log.
2. No need to modify the kernel/resource.c, so correct them.
3. Change the name of the __ioremap_check_desc_other() to
   __ioremap_check_desc_none_and_reserved(), and modify the
   check condition, add comment above it.

Lianbo Jiang (2):
  x86/mm, resource: add a new I/O resource descriptor
'IORES_DESC_RESERVED'
  x86/kexec_file: add reserved e820 ranges to kdump kernel e820 table

 arch/x86/kernel/crash.c |  6 ++
 arch/x86/kernel/e820.c  |  2 +-
 arch/x86/mm/ioremap.c   | 16 ++--
 include/linux/ioport.h  |  1 +
 4 files changed, 22 insertions(+), 3 deletions(-)

-- 
2.17.1




[PATCH 2/2 v10] x86/kexec_file: add reserved e820 ranges to kdump kernel e820 table

2019-03-29 Thread Lianbo Jiang
At present, when using the kexec_file_load() syscall to load the kernel
image and initramfs(for example: kexec -s -p xxx), the kernel does not
pass the e820 reserved ranges to the second kernel, which might cause
two problems:

The first one is the MMCONFIG issue. The basic problem is that this
device is in PCI segment 1 and the kernel PCI probing can not find it
without all the e820 I/O reservations being present in the e820 table.
And the kdump kernel does not have those reservations because the kexec
command does not pass the I/O reservation via the "memmap=xxx" command
line option. (This problem does not show up for other vendors, as SGI
is apparently the actually fails for everyone, but devices in segment 0
are then found by some legacy lookup method.) The workaround for this
is to pass the I/O reserved regions to the kdump kernel.

MMCONFIG (aka ECAM) space is described in the ACPI MCFG table. If you don't
have ECAM: (a) PCI devices won't work at all on non-x86 systems that use
only ECAM for config access, (b) you won't be able to access devices on
non-0 segments, (c) you won't be able to access extended config space (
address 0x100-0x), which means none of the Extended Capabilities will
be available (AER, ACS, ATS, etc). [Bjorn's comment]

The second issue is that the SME kdump kernel doesn't work without the
e820 reserved ranges. When SME is active in the kdump kernel, those
reserved regions are still decrypted. However, because the ranges are
not present at all in the kdump kernel's e820 table, the regions are
treated as encrypted, and the kdump kernel fails.

The e820 reserved range is useful in kdump kernel, so it is necessary to
pass the e820 reserved ranges to the kdump kernel.
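
For illustration only (not the kernel code): a minimal userspace sketch of the pattern this patch relies on, namely walking a set of resources by descriptor and letting a callback append each match to a memory map with a chosen e820 type. All demo_* names are hypothetical, and the walker is a plain array scan rather than the kernel's resource tree walk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { DEMO_DESC_NONE = 0, DEMO_DESC_RESERVED = 8 };
#define DEMO_E820_TYPE_RESERVED 2u
#define DEMO_MAX_ENTRIES 8

struct demo_res { uint64_t start, end; int desc; };	/* end is inclusive */
struct demo_e820_entry { uint64_t addr, size; uint32_t type; };

/* Carries the table being built plus the type to stamp on new entries. */
struct demo_cmd {
	struct demo_e820_entry table[DEMO_MAX_ENTRIES];
	unsigned int nr;
	uint32_t type;
};

typedef int (*demo_cb)(const struct demo_res *res, void *arg);

/* Call cb for every resource with the given descriptor; stop on nonzero. */
static int demo_walk_desc(const struct demo_res *res, size_t n, int desc,
			  void *arg, demo_cb cb)
{
	size_t i;
	int ret = 0;

	assert(cb != NULL);
	for (i = 0; i < n; i++) {
		if (res[i].desc != desc)
			continue;
		ret = cb(&res[i], arg);
		if (ret)
			break;
	}
	return ret;
}

/* Append one range to the table, using the type stored in the command. */
static int demo_memmap_entry_cb(const struct demo_res *res, void *arg)
{
	struct demo_cmd *cmd = arg;

	if (cmd->nr >= DEMO_MAX_ENTRIES)
		return 1;	/* table full, stop the walk */

	cmd->table[cmd->nr].addr = res->start;
	cmd->table[cmd->nr].size = res->end - res->start + 1;
	cmd->table[cmd->nr].type = cmd->type;
	cmd->nr++;
	return 0;
}
```

Setting the type on the command structure before each walk is what lets one callback serve several descriptor classes, as the patch does with cmd.type.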

Suggested-by: Dave Young 
Signed-off-by: Lianbo Jiang 
---
 arch/x86/kernel/crash.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 17ffc869cab8..1db2754df9e9 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -381,6 +381,12 @@ int crash_setup_memmap_entries(struct kimage *image, struct boot_params *params)
walk_iomem_res_desc(IORES_DESC_ACPI_NV_STORAGE, flags, 0, -1, &cmd,
memmap_entry_callback);
 
+   /* Add e820 reserved ranges */
+   cmd.type = E820_TYPE_RESERVED;
+   flags = IORESOURCE_MEM;
+   walk_iomem_res_desc(IORES_DESC_RESERVED, flags, 0, -1, &cmd,
+  memmap_entry_callback);
+
/* Add crashk_low_res region */
if (crashk_low_res.end) {
ei.addr = crashk_low_res.start;
-- 
2.17.1

