[PATCH] efi/fdt: More cleanups

2018-11-29 Thread Ingo Molnar


* Ard Biesheuvel  wrote:

> From: Julien Thierry 
> 
> Closing bracket seems to end a for statement when it is actually ending
> the contained if. Add some brackets to have clear delimitation of each
> scope.
> 
> No functional change/fix, just fix the indentation.
> 
> Signed-off-by: Julien Thierry 
> Signed-off-by: Ard Biesheuvel 
> ---
>  drivers/firmware/efi/libstub/fdt.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/firmware/efi/libstub/fdt.c 
> b/drivers/firmware/efi/libstub/fdt.c
> index 0c0d2312f4a8..a3614f9b5f75 100644
> --- a/drivers/firmware/efi/libstub/fdt.c
> +++ b/drivers/firmware/efi/libstub/fdt.c
> @@ -376,7 +376,7 @@ void *get_fdt(efi_system_table_t *sys_table, unsigned 
> long *fdt_size)
>   tables = (efi_config_table_t *) sys_table->tables;
>   fdt = NULL;
>  
> - for (i = 0; i < sys_table->nr_tables; i++)
> + for (i = 0; i < sys_table->nr_tables; i++) {
>   if (efi_guidcmp(tables[i].guid, fdt_guid) == 0) {
>   fdt = (void *) tables[i].table;
>   if (fdt_check_header(fdt) != 0) {
> @@ -385,7 +385,8 @@ void *get_fdt(efi_system_table_t *sys_table, unsigned 
> long *fdt_size)
>   }
>   *fdt_size = fdt_totalsize(fdt);
>   break;
> -  }
> + }
> + }

So if we are doing trivial cleanups, how about the patch below on top? It 
cleans up this file for good. Only minimally build tested.

Thanks,

Ingo

==>
Subject: efi/fdt: More cleanups
From: Ingo Molnar 

Apply a number of cleanups:

 - Introduce fdt_setprop_*var() helper macros to simplify and shorten repetitive
   sequences - this also makes it less likely that the wrong variable size is
   passed in. This change makes a lot of the property-setting calls single-line
   and easier to read.

 - Harmonize comment style: capitalization, punctuation, whitespaces, etc.

 - Fix some whitespace noise in the libstub Makefile which I happened to notice.

 - Use the standard tabular initialization style:

-   map.map =   _map;
-   map.map_size =  _size;
-   map.desc_size = _size;
-   map.desc_ver =  _ver;
-   map.key_ptr =   _key;
-   map.buff_size = _size;

+   map.map = _map;
+   map.map_size= _size;
+   map.desc_size   = _size;
+   map.desc_ver= _ver;
+   map.key_ptr = _key;
+   map.buff_size   = _size;

 - Use tabular structure definition for better readability.

 - Make all pr*() lines single-line, even if they marginally exceed 80 cols - 
this
   makes them visually less intrusive.

 - Unbreak line breaks into single lines when the length exceeds 80 cols only
   marginally, for better readability.

 - Move assignment closer to the actual usage site.

 - Plus some other smaller cleanups, spelling fixes, etc.

No change in functionality intended.

Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Signed-off-by: Ingo Molnar 
---
 drivers/firmware/efi/libstub/Makefile |4 -
 drivers/firmware/efi/libstub/fdt.c|  109 --
 scripts/dtc/libfdt/libfdt.h   |   10 +++
 3 files changed, 64 insertions(+), 59 deletions(-)

Index: tip/drivers/firmware/efi/libstub/Makefile
===
--- tip.orig/drivers/firmware/efi/libstub/Makefile
+++ tip/drivers/firmware/efi/libstub/Makefile
@@ -49,7 +49,7 @@ lib-$(CONFIG_EFI_ARMSTUB) += arm-stub.o
 
 lib-$(CONFIG_ARM)  += arm32-stub.o
 lib-$(CONFIG_ARM64)+= arm64-stub.o
-CFLAGS_arm64-stub.o:= -DTEXT_OFFSET=$(TEXT_OFFSET)
+CFLAGS_arm64-stub.o:= -DTEXT_OFFSET=$(TEXT_OFFSET)
 
 #
 # arm64 puts the stub in the kernel proper, which will unnecessarily retain all
@@ -86,7 +86,7 @@ quiet_cmd_stubcopy = STUBCPY $@
   cmd_stubcopy = if $(STRIP) --strip-debug $(STUBCOPY_RM-y) -o $@ $<; \
 then if $(OBJDUMP) -r $@ | grep $(STUBCOPY_RELOC-y); \
 then (echo >&2 "$@: absolute symbol references not allowed 
in the EFI stub"; \
-  rm -f $@; /bin/false); \
+  rm -f $@; /bin/false); \
 else $(OBJCOPY) $(STUBCOPY_FLAGS-y) $< $@; fi\
 else /bin/false; fi
 
Index: tip/drivers/firmware/efi/libstub/fdt.c
===
--- tip.orig/drivers/firmware/efi/libstub/fdt.c
+++ tip/drivers/firmware/efi/libstub/fdt.c
@@ -26,10 +26,8 @@ static void fdt_update_cell_size(efi_sys
offset = fdt_path_offset(fdt, "/");
/* Set the #address-cells and #size-cells values for an empty tree */
 
-   fdt_setprop_u32(fdt, offset, "#address-cells",
-   EFI_DT_ADDR_CELLS_DEFAULT);
-
-   

Re: [PATCH 01/11] x86/efi: Allocate e820 buffer before calling efi_exit_boot_service

2018-11-29 Thread Ingo Molnar


* Ard Biesheuvel  wrote:

> From: Eric Snowberg 
> 
> Commit d64934019f6c ("x86/efi: Use efi_exit_boot_services()")
> introduced a regression on systems with large memory maps
> causing them to hang on boot. The first "goto get_map" that was removed
> from exit_boot insured there was enough room for the memory map when
> efi_call_early(exit_boot_services) was called. This happens when
> (nr_desc > ARRAY_SIZE(params->e820_table).
> 
> Chain of events:
>   exit_boot()
> efi_exit_boot_services()
>   efi_get_memory_map <- at this point the mm can't grow over 8 desc
>   priv_func()
> exit_boot_func()
>   allocate_e820ext() <- new mm grows over 8 desc from e820 alloc
>   efi_call_early(exit_boot_services) <- mm key doesn't match so retry
>   efi_call_early(get_memory_map) <- not enough room for new mm
>   system hangs
> 
> This patch allocates the e820 buffer before calling efi_exit_boot_services
> and fixes the regression.
> 
> Signed-off-by: Eric Snowberg 
> Signed-off-by: Ard Biesheuvel 
> ---
>  arch/x86/boot/compressed/eboot.c | 65 
>  1 file changed, 41 insertions(+), 24 deletions(-)

Any objections against marking this for -stable and filing it in 
efi/urgent? Boot hangs are show-stopper bugs, so distributions would want 
to backport this fix anyway.

> + boot_map.map =  
> + boot_map.map_size = _size;
> + boot_map.desc_size =_size;
> + boot_map.desc_ver = NULL;
> + boot_map.key_ptr =  NULL;
> + boot_map.buff_size =_size;

I have changed this to the canonical style, which is also used elsewhere 
in the file:

>   map.buff_size   = _size;
>   priv.boot_params= boot_params;
>   priv.efi= _params->efi_info;
> - priv.e820ext= NULL;
> - priv.e820ext_size   = 0;

I.e.:

+   boot_map.map= 
+   boot_map.map_size   = _size;
+   boot_map.desc_size  = _size;
+   boot_map.desc_ver   = NULL;
+   boot_map.key_ptr= NULL;
+   boot_map.buff_size  = _size;
+
+   status = efi_get_memory_map(sys_table, _map);

Thanks,

Ingo


Re: [PATCH 1/2 v8] resource: add the new I/O resource descriptor 'IORES_DESC_RESERVED'

2018-11-29 Thread lijiang
在 2018年11月30日 11:37, Dave Young 写道:
> + more people
> 

Thank you, Dave.

That would be fine to let more people review this patch.

> On 11/29/18 at 04:09pm, Lianbo Jiang wrote:
>> When doing kexec_file_load, the first kernel needs to pass the e820
>> reserved ranges to the second kernel. But kernel can not exactly
>> match the e820 reserved ranges when walking through the iomem resources
>> with the descriptor 'IORES_DESC_NONE', because several e820 types(
>> e.g. E820_TYPE_RESERVED_KERN/E820_TYPE_RAM/E820_TYPE_UNUSABLE/E820
>> _TYPE_RESERVED) are converted to the descriptor 'IORES_DESC_NONE'. It
>> may pass these four types to the kdump kernel, that is not desired result.
>>
>> So, this patch adds a new I/O resource descriptor 'IORES_DESC_RESERVED'
>> for the iomem resources search interfaces. It is helpful to exactly
>> match the reserved resource ranges when walking through iomem resources.
>>
>> In addition, since the new descriptor 'IORES_DESC_RESERVED' is introduced,
>> these code originally related to the descriptor 'IORES_DESC_NONE' need to
>> be updated. Otherwise, it will be easily confused and also cause some
>> errors. Because the 'E820_TYPE_RESERVED' type is converted to the new
>> descriptor 'IORES_DESC_RESERVED' instead of 'IORES_DESC_NONE', it has been
>> changed.
>>
>> Suggested-by: Dave Young 
>> Signed-off-by: Lianbo Jiang 
>> ---
>>  arch/ia64/kernel/efi.c |  4 
>>  arch/x86/kernel/e820.c |  2 +-
>>  arch/x86/mm/ioremap.c  | 13 -
>>  include/linux/ioport.h |  1 +
>>  kernel/resource.c  |  6 +++---
>>  5 files changed, 21 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/ia64/kernel/efi.c b/arch/ia64/kernel/efi.c
>> index 8f106638913c..1841e9b4db30 100644
>> --- a/arch/ia64/kernel/efi.c
>> +++ b/arch/ia64/kernel/efi.c
>> @@ -1231,6 +1231,10 @@ efi_initialize_iomem_resources(struct resource 
>> *code_resource,
>>  break;
>>  
>>  case EFI_RESERVED_TYPE:
>> +name = "reserved";
> 
> Ingo updated X86 code to use "Reserved",  I think it would be good to do
> same for this case as well
> 

I have noticed the changes on x86, but for IA64, i'm not sure whether it should 
do the same
thing, so keep it as before.

If IA64 people would like to give any comment, that will be appreciated.


>> +desc = IORES_DESC_RESERVED;
>> +break;
>> +
>>  case EFI_RUNTIME_SERVICES_CODE:
>>  case EFI_RUNTIME_SERVICES_DATA:
>>  case EFI_ACPI_RECLAIM_MEMORY:
> 
> Originally, above 3 are all "reserved", so probably they all should be
> IORES_DESC_RESERVED.
> 

This is a good question.

If above 3 types are converted to the new descriptor 'IORES_DESC_RESERVED', how 
to handle
the all 'default' case? Because the all 'default' case is also converted to the 
descriptor
'IORES_DESC_NONE' and the name 'reserved'.

> Can any IA64 people to review this?
> 
>> diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
>> index 50895c2f937d..57fafdafb860 100644
>> --- a/arch/x86/kernel/e820.c
>> +++ b/arch/x86/kernel/e820.c
>> @@ -1048,10 +1048,10 @@ static unsigned long __init 
>> e820_type_to_iores_desc(struct e820_entry *entry)
>>  case E820_TYPE_NVS: return IORES_DESC_ACPI_NV_STORAGE;
>>  case E820_TYPE_PMEM:return IORES_DESC_PERSISTENT_MEMORY;
>>  case E820_TYPE_PRAM:return 
>> IORES_DESC_PERSISTENT_MEMORY_LEGACY;
>> +case E820_TYPE_RESERVED:return IORES_DESC_RESERVED;
>>  case E820_TYPE_RESERVED_KERN:   /* Fall-through: */
>>  case E820_TYPE_RAM: /* Fall-through: */
>>  case E820_TYPE_UNUSABLE:/* Fall-through: */
>> -case E820_TYPE_RESERVED:/* Fall-through: */
>>  default:return IORES_DESC_NONE;
>>  }
>>  }
>> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
>> index 5378d10f1d31..fea2ef99415d 100644
>> --- a/arch/x86/mm/ioremap.c
>> +++ b/arch/x86/mm/ioremap.c
>> @@ -83,7 +83,18 @@ static bool __ioremap_check_ram(struct resource *res)
>>  
>>  static int __ioremap_check_desc_other(struct resource *res)
>>  {
>> -return (res->desc != IORES_DESC_NONE);
>> +/*
>> + * But now, the 'E820_TYPE_RESERVED' type is converted to the new
>> + * descriptor 'IORES_DESC_RESERVED' instead of 'IORES_DESC_NONE',
>> + * it has been changed. And the value of 'mem_flags.desc_other'
>> + * is equal to 'true' if we don't strengthen the condition in this
>> + * function, that is wrong. Because originally it is equal to
>> + * 'false' for the same reserved type.
>> + *
>> + * So, that would be nice to keep it the same as before.
>> + */
>> +return ((res->desc != IORES_DESC_NONE) &&
>> +(res->desc != IORES_DESC_RESERVED));
>>  }
> 
> Added Tom since he added the check function.  Is it possible to only
> check explict valid desc 

Re: [PATCH 1/2 v8] resource: add the new I/O resource descriptor 'IORES_DESC_RESERVED'

2018-11-29 Thread lijiang
在 2018年11月30日 11:41, Dave Young 写道:
> On 11/29/18 at 04:09pm, Lianbo Jiang wrote:
>> When doing kexec_file_load, the first kernel needs to pass the e820
>> reserved ranges to the second kernel. But kernel can not exactly
>> match the e820 reserved ranges when walking through the iomem resources
>> with the descriptor 'IORES_DESC_NONE', because several e820 types(
>> e.g. E820_TYPE_RESERVED_KERN/E820_TYPE_RAM/E820_TYPE_UNUSABLE/E820
>> _TYPE_RESERVED) are converted to the descriptor 'IORES_DESC_NONE'. It
>> may pass these four types to the kdump kernel, that is not desired result.
>>
>> So, this patch adds a new I/O resource descriptor 'IORES_DESC_RESERVED'
>> for the iomem resources search interfaces. It is helpful to exactly
>> match the reserved resource ranges when walking through iomem resources.
>>
>> In addition, since the new descriptor 'IORES_DESC_RESERVED' is introduced,
>> these code originally related to the descriptor 'IORES_DESC_NONE' need to
>> be updated. Otherwise, it will be easily confused and also cause some
>> errors. Because the 'E820_TYPE_RESERVED' type is converted to the new
>> descriptor 'IORES_DESC_RESERVED' instead of 'IORES_DESC_NONE', it has been
>> changed.
>>
>> Suggested-by: Dave Young 
> 
> This was suggested by Boris instead :) 
> 

Sorry for this, i forgot to change it.
And i will correct this in next version.

Thanks.

>> Signed-off-by: Lianbo Jiang 
>> ---
>>  arch/ia64/kernel/efi.c |  4 
>>  arch/x86/kernel/e820.c |  2 +-
>>  arch/x86/mm/ioremap.c  | 13 -
>>  include/linux/ioport.h |  1 +
>>  kernel/resource.c  |  6 +++---
>>  5 files changed, 21 insertions(+), 5 deletions(-)
>>
>>
> [snip]
> 
> Thanks
> Dave
> 


Re: [PATCH 1/2 v8] resource: add the new I/O resource descriptor 'IORES_DESC_RESERVED'

2018-11-29 Thread Dave Young
Correct Toshi's email addr
On 11/30/18 at 11:37am, Dave Young wrote:
> + more people
> 
> On 11/29/18 at 04:09pm, Lianbo Jiang wrote:
> > When doing kexec_file_load, the first kernel needs to pass the e820
> > reserved ranges to the second kernel. But kernel can not exactly
> > match the e820 reserved ranges when walking through the iomem resources
> > with the descriptor 'IORES_DESC_NONE', because several e820 types(
> > e.g. E820_TYPE_RESERVED_KERN/E820_TYPE_RAM/E820_TYPE_UNUSABLE/E820
> > _TYPE_RESERVED) are converted to the descriptor 'IORES_DESC_NONE'. It
> > may pass these four types to the kdump kernel, that is not desired result.
> > 
> > So, this patch adds a new I/O resource descriptor 'IORES_DESC_RESERVED'
> > for the iomem resources search interfaces. It is helpful to exactly
> > match the reserved resource ranges when walking through iomem resources.
> > 
> > In addition, since the new descriptor 'IORES_DESC_RESERVED' is introduced,
> > these code originally related to the descriptor 'IORES_DESC_NONE' need to
> > be updated. Otherwise, it will be easily confused and also cause some
> > errors. Because the 'E820_TYPE_RESERVED' type is converted to the new
> > descriptor 'IORES_DESC_RESERVED' instead of 'IORES_DESC_NONE', it has been
> > changed.
> > 
> > Suggested-by: Dave Young 
> > Signed-off-by: Lianbo Jiang 
> > ---
> >  arch/ia64/kernel/efi.c |  4 
> >  arch/x86/kernel/e820.c |  2 +-
> >  arch/x86/mm/ioremap.c  | 13 -
> >  include/linux/ioport.h |  1 +
> >  kernel/resource.c  |  6 +++---
> >  5 files changed, 21 insertions(+), 5 deletions(-)
> > 
> > diff --git a/arch/ia64/kernel/efi.c b/arch/ia64/kernel/efi.c
> > index 8f106638913c..1841e9b4db30 100644
> > --- a/arch/ia64/kernel/efi.c
> > +++ b/arch/ia64/kernel/efi.c
> > @@ -1231,6 +1231,10 @@ efi_initialize_iomem_resources(struct resource 
> > *code_resource,
> > break;
> >  
> > case EFI_RESERVED_TYPE:
> > +   name = "reserved";
> 
> Ingo updated X86 code to use "Reserved",  I think it would be good to do
> same for this case as well
> 
> > +   desc = IORES_DESC_RESERVED;
> > +   break;
> > +
> > case EFI_RUNTIME_SERVICES_CODE:
> > case EFI_RUNTIME_SERVICES_DATA:
> > case EFI_ACPI_RECLAIM_MEMORY:
> 
> Originally, above 3 are all "reserved", so probably they all should be
> IORES_DESC_RESERVED.
> 
> Can any IA64 people to review this?
> 
> > diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
> > index 50895c2f937d..57fafdafb860 100644
> > --- a/arch/x86/kernel/e820.c
> > +++ b/arch/x86/kernel/e820.c
> > @@ -1048,10 +1048,10 @@ static unsigned long __init 
> > e820_type_to_iores_desc(struct e820_entry *entry)
> > case E820_TYPE_NVS: return IORES_DESC_ACPI_NV_STORAGE;
> > case E820_TYPE_PMEM:return IORES_DESC_PERSISTENT_MEMORY;
> > case E820_TYPE_PRAM:return 
> > IORES_DESC_PERSISTENT_MEMORY_LEGACY;
> > +   case E820_TYPE_RESERVED:return IORES_DESC_RESERVED;
> > case E820_TYPE_RESERVED_KERN:   /* Fall-through: */
> > case E820_TYPE_RAM: /* Fall-through: */
> > case E820_TYPE_UNUSABLE:/* Fall-through: */
> > -   case E820_TYPE_RESERVED:/* Fall-through: */
> > default:return IORES_DESC_NONE;
> > }
> >  }
> > diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
> > index 5378d10f1d31..fea2ef99415d 100644
> > --- a/arch/x86/mm/ioremap.c
> > +++ b/arch/x86/mm/ioremap.c
> > @@ -83,7 +83,18 @@ static bool __ioremap_check_ram(struct resource *res)
> >  
> >  static int __ioremap_check_desc_other(struct resource *res)
> >  {
> > -   return (res->desc != IORES_DESC_NONE);
> > +   /*
> > +* But now, the 'E820_TYPE_RESERVED' type is converted to the new
> > +* descriptor 'IORES_DESC_RESERVED' instead of 'IORES_DESC_NONE',
> > +* it has been changed. And the value of 'mem_flags.desc_other'
> > +* is equal to 'true' if we don't strengthen the condition in this
> > +* function, that is wrong. Because originally it is equal to
> > +* 'false' for the same reserved type.
> > +*
> > +* So, that would be nice to keep it the same as before.
> > +*/
> > +   return ((res->desc != IORES_DESC_NONE) &&
> > +   (res->desc != IORES_DESC_RESERVED));
> >  }
> 
> Added Tom since he added the check function.  Is it possible to only
> check explict valid desc types instead of exclude IORES_DESC_NONE?
> 
> >  
> >  static int __ioremap_res_check(struct resource *res, void *arg)
> > diff --git a/include/linux/ioport.h b/include/linux/ioport.h
> > index da0ebaec25f0..6ed59de48bd5 100644
> > --- a/include/linux/ioport.h
> > +++ b/include/linux/ioport.h
> > @@ -133,6 +133,7 @@ enum {
> > IORES_DESC_PERSISTENT_MEMORY_LEGACY = 5,
> > IORES_DESC_DEVICE_PRIVATE_MEMORY= 6,
> > 

Re: [PATCH 1/2 v8] resource: add the new I/O resource descriptor 'IORES_DESC_RESERVED'

2018-11-29 Thread Dave Young
+ more people

On 11/29/18 at 04:09pm, Lianbo Jiang wrote:
> When doing kexec_file_load, the first kernel needs to pass the e820
> reserved ranges to the second kernel. But kernel can not exactly
> match the e820 reserved ranges when walking through the iomem resources
> with the descriptor 'IORES_DESC_NONE', because several e820 types(
> e.g. E820_TYPE_RESERVED_KERN/E820_TYPE_RAM/E820_TYPE_UNUSABLE/E820
> _TYPE_RESERVED) are converted to the descriptor 'IORES_DESC_NONE'. It
> may pass these four types to the kdump kernel, that is not desired result.
> 
> So, this patch adds a new I/O resource descriptor 'IORES_DESC_RESERVED'
> for the iomem resources search interfaces. It is helpful to exactly
> match the reserved resource ranges when walking through iomem resources.
> 
> In addition, since the new descriptor 'IORES_DESC_RESERVED' is introduced,
> these code originally related to the descriptor 'IORES_DESC_NONE' need to
> be updated. Otherwise, it will be easily confused and also cause some
> errors. Because the 'E820_TYPE_RESERVED' type is converted to the new
> descriptor 'IORES_DESC_RESERVED' instead of 'IORES_DESC_NONE', it has been
> changed.
> 
> Suggested-by: Dave Young 
> Signed-off-by: Lianbo Jiang 
> ---
>  arch/ia64/kernel/efi.c |  4 
>  arch/x86/kernel/e820.c |  2 +-
>  arch/x86/mm/ioremap.c  | 13 -
>  include/linux/ioport.h |  1 +
>  kernel/resource.c  |  6 +++---
>  5 files changed, 21 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/ia64/kernel/efi.c b/arch/ia64/kernel/efi.c
> index 8f106638913c..1841e9b4db30 100644
> --- a/arch/ia64/kernel/efi.c
> +++ b/arch/ia64/kernel/efi.c
> @@ -1231,6 +1231,10 @@ efi_initialize_iomem_resources(struct resource 
> *code_resource,
>   break;
>  
>   case EFI_RESERVED_TYPE:
> + name = "reserved";

Ingo updated X86 code to use "Reserved",  I think it would be good to do
same for this case as well

> + desc = IORES_DESC_RESERVED;
> + break;
> +
>   case EFI_RUNTIME_SERVICES_CODE:
>   case EFI_RUNTIME_SERVICES_DATA:
>   case EFI_ACPI_RECLAIM_MEMORY:

Originally, above 3 are all "reserved", so probably they all should be
IORES_DESC_RESERVED.

Can any IA64 people to review this?

> diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
> index 50895c2f937d..57fafdafb860 100644
> --- a/arch/x86/kernel/e820.c
> +++ b/arch/x86/kernel/e820.c
> @@ -1048,10 +1048,10 @@ static unsigned long __init 
> e820_type_to_iores_desc(struct e820_entry *entry)
>   case E820_TYPE_NVS: return IORES_DESC_ACPI_NV_STORAGE;
>   case E820_TYPE_PMEM:return IORES_DESC_PERSISTENT_MEMORY;
>   case E820_TYPE_PRAM:return 
> IORES_DESC_PERSISTENT_MEMORY_LEGACY;
> + case E820_TYPE_RESERVED:return IORES_DESC_RESERVED;
>   case E820_TYPE_RESERVED_KERN:   /* Fall-through: */
>   case E820_TYPE_RAM: /* Fall-through: */
>   case E820_TYPE_UNUSABLE:/* Fall-through: */
> - case E820_TYPE_RESERVED:/* Fall-through: */
>   default:return IORES_DESC_NONE;
>   }
>  }
> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
> index 5378d10f1d31..fea2ef99415d 100644
> --- a/arch/x86/mm/ioremap.c
> +++ b/arch/x86/mm/ioremap.c
> @@ -83,7 +83,18 @@ static bool __ioremap_check_ram(struct resource *res)
>  
>  static int __ioremap_check_desc_other(struct resource *res)
>  {
> - return (res->desc != IORES_DESC_NONE);
> + /*
> +  * But now, the 'E820_TYPE_RESERVED' type is converted to the new
> +  * descriptor 'IORES_DESC_RESERVED' instead of 'IORES_DESC_NONE',
> +  * it has been changed. And the value of 'mem_flags.desc_other'
> +  * is equal to 'true' if we don't strengthen the condition in this
> +  * function, that is wrong. Because originally it is equal to
> +  * 'false' for the same reserved type.
> +  *
> +  * So, that would be nice to keep it the same as before.
> +  */
> + return ((res->desc != IORES_DESC_NONE) &&
> + (res->desc != IORES_DESC_RESERVED));
>  }

Added Tom since he added the check function.  Is it possible to only
check explict valid desc types instead of exclude IORES_DESC_NONE?

>  
>  static int __ioremap_res_check(struct resource *res, void *arg)
> diff --git a/include/linux/ioport.h b/include/linux/ioport.h
> index da0ebaec25f0..6ed59de48bd5 100644
> --- a/include/linux/ioport.h
> +++ b/include/linux/ioport.h
> @@ -133,6 +133,7 @@ enum {
>   IORES_DESC_PERSISTENT_MEMORY_LEGACY = 5,
>   IORES_DESC_DEVICE_PRIVATE_MEMORY= 6,
>   IORES_DESC_DEVICE_PUBLIC_MEMORY = 7,
> + IORES_DESC_RESERVED = 8,
>  };
>  
>  /* helpers to define resources */
> diff --git a/kernel/resource.c b/kernel/resource.c
> index 

Re: [PATCH 1/2 v8] resource: add the new I/O resource descriptor 'IORES_DESC_RESERVED'

2018-11-29 Thread Dave Young
On 11/29/18 at 04:09pm, Lianbo Jiang wrote:
> When doing kexec_file_load, the first kernel needs to pass the e820
> reserved ranges to the second kernel. But kernel can not exactly
> match the e820 reserved ranges when walking through the iomem resources
> with the descriptor 'IORES_DESC_NONE', because several e820 types(
> e.g. E820_TYPE_RESERVED_KERN/E820_TYPE_RAM/E820_TYPE_UNUSABLE/E820
> _TYPE_RESERVED) are converted to the descriptor 'IORES_DESC_NONE'. It
> may pass these four types to the kdump kernel, that is not desired result.
> 
> So, this patch adds a new I/O resource descriptor 'IORES_DESC_RESERVED'
> for the iomem resources search interfaces. It is helpful to exactly
> match the reserved resource ranges when walking through iomem resources.
> 
> In addition, since the new descriptor 'IORES_DESC_RESERVED' is introduced,
> these code originally related to the descriptor 'IORES_DESC_NONE' need to
> be updated. Otherwise, it will be easily confused and also cause some
> errors. Because the 'E820_TYPE_RESERVED' type is converted to the new
> descriptor 'IORES_DESC_RESERVED' instead of 'IORES_DESC_NONE', it has been
> changed.
> 
> Suggested-by: Dave Young 

This was suggested by Boris instead :) 

> Signed-off-by: Lianbo Jiang 
> ---
>  arch/ia64/kernel/efi.c |  4 
>  arch/x86/kernel/e820.c |  2 +-
>  arch/x86/mm/ioremap.c  | 13 -
>  include/linux/ioport.h |  1 +
>  kernel/resource.c  |  6 +++---
>  5 files changed, 21 insertions(+), 5 deletions(-)
> 
> 
[snip]

Thanks
Dave


Re: EFI-pstore, sleeping from back context after fault

2018-11-29 Thread Kees Cook
On Mon, Nov 26, 2018 at 9:04 AM Sebastian Andrzej Siewior
 wrote:
>
> So I triggered a bug in x86 code. First the "okay" backtrace:
> |BUG: GPF in non-whitelisted uaccess (non-canonical address?)
> |general protection fault:  [#1] PREEMPT SMP NOPTI
> |CPU: 26 PID: 2236 Comm: sig-xstate-bum Not tainted 4.20.0-rc3 #45
> |RIP: 0010:__fpu__restore_sig+0x1c1/0x540
> |Call Trace:
> | fpu__restore_sig+0x28/0x40
> | restore_sigcontext+0x13a/0x180
> | __ia32_sys_rt_sigreturn+0xae/0x100
> | do_syscall_64+0x4f/0x100
> | entry_SYSCALL_64_after_hwframe+0x44/0xa9
> |RIP: 0033:0x7f9b06aea227
> |---[ end trace a45ac23b593e9ab0 ]---

This bug got handled by Jann Horn, yes? (I remember seeing a related
thread go by...)

>
> and now the not okay part:
>
> |BUG: sleeping function called from invalid context at 
> kernel/sched/completion.c:99
> |in_atomic(): 1, irqs_disabled(): 1, pid: 2236, name: sig-xstate-bum
> |Preemption disabled at:
> |[] pstore_dump+0x72/0x330
> |CPU: 26 PID: 2236 Comm: sig-xstate-bum Tainted: G  D   
> 4.20.0-rc3 #45
> |Call Trace:
> | dump_stack+0x4f/0x6a
> | ___might_sleep.cold.91+0xd3/0xe4
> | __might_sleep+0x50/0x90
> | wait_for_completion+0x32/0x130
> | virt_efi_query_variable_info+0x14e/0x160
> | efi_query_variable_store+0x51/0x1a0
> | efivar_entry_set_safe+0xa3/0x1b0
> | efi_pstore_write+0x109/0x140

Eww. :)

> | pstore_dump+0x11c/0x330
> | kmsg_dump+0xa4/0xd0
> | oops_exit+0x22/0x30
> | oops_end+0x67/0xd0
> | die+0x41/0x4a
> | do_general_protection+0xc1/0x150
> | general_protection+0x1e/0x30
> |RIP: 0010:__fpu__restore_sig+0x1c1/0x540
> | fpu__restore_sig+0x28/0x40
> | restore_sigcontext+0x13a/0x180
> | __ia32_sys_rt_sigreturn+0xae/0x100
> | do_syscall_64+0x4f/0x100
> | entry_SYSCALL_64_after_hwframe+0x44/0xa9
> |RIP: 0033:0x7f9b06aea227
> … [ moar backtrace ]
>
> This looks like it comes from commit 21b3ddd39fee ("efi: Don't use
> spinlocks for efi vars") because it replaced a spin_lock() with
> down_interruptible() in this case. In this case, since pstore_dump()
> holds psinfo->buf_lock with disabled interrupts we can't block…
>
> I have this as a workaround:

I'm not sure this is correct...

> diff --git a/drivers/firmware/efi/vars.c b/drivers/firmware/efi/vars.c
> index 9336ffdf6e2c..d4dcfe46eb2e 100644
> --- a/drivers/firmware/efi/vars.c
> +++ b/drivers/firmware/efi/vars.c
> @@ -755,7 +755,9 @@ int efivar_entry_set_safe(efi_char16_t *name, efi_guid_t 
> vendor, u32 attributes,
> return -EINTR;
> }
>
> -   status = check_var_size(attributes, size + ucs2_strsize(name, 1024));
> +   status = ops->query_variable_store(attributes,
> +  size + ucs2_strsize(name, 1024),
> +  block);

Why does this change help?

> if (status != EFI_SUCCESS) {
> up(_lock);
> return -ENOSPC;
> diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c
> index b821054ca3ed..9aa27a2c8d36 100644
> --- a/fs/pstore/platform.c
> +++ b/fs/pstore/platform.c
> @@ -127,10 +127,10 @@ static const char *get_reason_str(enum kmsg_dump_reason 
> reason)
>  bool pstore_cannot_block_path(enum kmsg_dump_reason reason)
>  {
> /*
> -* In case of NMI path, pstore shouldn't be blocked
> -* regardless of reason.
> +* In case of disabled preemption / interrupts we can't block on any
> +* lock regardless of reason.
>  */
> -   if (in_nmi())
> +   if (in_atomic() || irqs_disabled())
> return true;
>
> switch (reason) {

I'd like to avoid any case where pstore might "miss" a crash report,
though... this seems to make it easier for it to fail to record a
crash, yes?

I have some pending clean-ups in this area (buf_lock may not be needed
at all, actually), so I'll try to work through those too.

-Kees

-- 
Kees Cook


[PATCH] efi: let kmemleak ignore false positives

2018-11-29 Thread Qian Cai
unreferenced object 0x8096c1acf580 (size 128):
  comm "swapper/63", pid 0, jiffies 4294937418 (age 1201.230s)
  hex dump (first 32 bytes):
80 87 b5 c1 96 00 00 00 00 00 cc c2 16 00 00 00  
00 00 01 00 00 00 00 00 6b 6b 6b 6b 6b 6b 6b 6b  
  backtrace:
[<1d2549ba>] kmem_cache_alloc_trace+0x430/0x500
[<93a6dfab>] efi_mem_reserve_persistent+0x50/0xf8
[<0a730828>] its_cpu_init_lpis+0x394/0x4b8
[] its_cpu_init+0x104/0x150
[<4d0342c5>] gic_starting_cpu+0x34/0x40
[<5d9da772>] cpuhp_invoke_callback+0x228/0x1d68
[<61eace9b>] notify_cpu_starting+0xc0/0x118
[<48bc2dc5>] secondary_start_kernel+0x23c/0x3b0
[<15137d6a>] 0x

efi_mem_reserve_persistent+0x50/0xf8:
kmalloc at include/linux/slab.h:546
(inlined by) efi_mem_reserve_persistent at drivers/firmware/efi/efi.c:979

This line,

rsv = kmalloc(sizeof(*rsv), GFP_ATOMIC);

Kmemleak has a known limitation that can only track pointers in the kernel
virtual space. Hence, it will report false positives due to "rsv" will only
reference to other physical addresses,

rsv->next = efi_memreserve_root->next;
efi_memreserve_root->next = __pa(rsv);

Signed-off-by: Qian Cai 
---
 drivers/firmware/efi/efi.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index fad7c62..0b69bb6 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -980,6 +981,8 @@ int efi_mem_reserve_persistent(phys_addr_t addr, u64 size)
if (!rsv)
return -ENOMEM;
 
+   kmemleak_ignore(rsv);
+
rsv->base = addr;
rsv->size = size;
 
-- 
1.8.3.1



RE: [GIT PULL 00/11] EFI updates

2018-11-29 Thread Prakhya, Sai Praneeth
Hi Ard,

While building from next branch of efi tree, I noticed the below warning. Could 
you please check the same on your side?

  CC  lib/list_debug.o
drivers/firmware/efi/efi.c: In function 'efi_mem_reserve_persistent':
drivers/firmware/efi/efi.c:1000:6: warning: unused variable 'rsvsize' 
[-Wunused-variable]
  int rsvsize = EFI_MEMRESERVE_SIZE(1);
  ^~~
  CC  lib/bitrev.o
  CC  net/core/sock_reuseport.o

> -Original Message-
> From: Ard Biesheuvel [mailto:ard.biesheu...@linaro.org]
> Sent: Thursday, November 29, 2018 9:12 AM
> To: linux-efi@vger.kernel.org; Ingo Molnar ; Thomas
> Gleixner 
> Cc: Ard Biesheuvel ; linux-ker...@vger.kernel.org;
> Andy Lutomirski ; Arend van Spriel
> ; Bhupesh Sharma ;
> Borislav Petkov ; Hansen, Dave ; Eric
> Snowberg ; Hans de Goede
> ; Joe Perches ; Jon Hunter
> ; Julien Thierry ; Marc
> Zyngier ; Nathan Chancellor
> ; Peter Zijlstra ; Prakhya,
> Sai Praneeth ; Sedat Dilek
> ; YiFei Zhu 
> Subject: [GIT PULL 00/11] EFI updates
> 
> The following changes since commit
> 976b489120cdab2b1b3a41ffa14661db43d58190:
> 
>   efi: Prevent GICv3 WARN() by mapping the memreserve table before first use
> (2018-11-27 13:50:20 +0100)
> 
> are available in the Git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi.git tags/efi-next
> 
> for you to fetch changes up to 1d29afdbf7ae878a23627ebee81efcd213f11749:
> 
>   efi/x86: earlyprintk - Fix infinite loop on some screen widths (2018-11-28
> 17:58:42 +0100)
>

Regards,
Sai


[PATCH 10/11] efi: reduce the amount of memblock reservations for persistent allocations

2018-11-29 Thread Ard Biesheuvel
The current implementation of efi_mem_reserve_persistent() is rather
naive, in the sense that for each invocation, it creates a separate
linked list entry to describe the reservation. Since the linked list
entries themselves need to persist across subsequent kexec reboots,
every reservation created this way results in two memblock_reserve()
calls at the next boot.

On arm64 systems with 100s of CPUs, this may result in a excessive
number of memblock reservations, and needless fragmentation.

So instead, make use of the newly updated struct linux_efi_memreserve
layout to put multiple reservations into a single linked list entry.
This should get rid of the numerous tiny memblock reservations, and
effectively cut the total number of reservations in half on arm64
systems with many CPUs.

Tested-by: Marc Zyngier 
Signed-off-by: Ard Biesheuvel 
---
 drivers/firmware/efi/efi.c | 20 +---
 include/linux/efi.h|  3 +++
 2 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index 80b11521627a..e90bc32c2670 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -998,7 +998,8 @@ int __ref efi_mem_reserve_persistent(phys_addr_t addr, u64 
size)
 {
struct linux_efi_memreserve *rsv;
int rsvsize = EFI_MEMRESERVE_SIZE(1);
-   int rc;
+   unsigned long prsv;
+   int rc, index;
 
if (efi_memreserve_root == (void *)ULONG_MAX)
return -ENODEV;
@@ -1009,11 +1010,24 @@ int __ref efi_mem_reserve_persistent(phys_addr_t addr, 
u64 size)
return rc;
}
 
-   rsv = kmalloc(rsvsize, GFP_ATOMIC);
+   /* first try to find a slot in an existing linked list entry */
+   for (prsv = efi_memreserve_root->next; prsv; prsv = rsv->next) {
+   rsv = __va(prsv);
+   index = atomic_fetch_add_unless(>count, 1, rsv->size);
+   if (index < rsv->size) {
+   rsv->entry[index].base = addr;
+   rsv->entry[index].size = size;
+
+   return 0;
+   }
+   }
+
+   /* no slot found - allocate a new linked list entry */
+   rsv = (struct linux_efi_memreserve *)__get_free_page(GFP_ATOMIC);
if (!rsv)
return -ENOMEM;
 
-   rsv->size = 1;
+   rsv->size = EFI_MEMRESERVE_COUNT(PAGE_SIZE);
atomic_set(>count, 1);
rsv->entry[0].base = addr;
rsv->entry[0].size = size;
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 4f27640fdcdc..becd5d76a207 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -1724,4 +1724,7 @@ struct linux_efi_memreserve {
 #define EFI_MEMRESERVE_SIZE(count) (sizeof(struct linux_efi_memreserve) + \
(count) * sizeof(((struct linux_efi_memreserve *)0)->entry[0]))
 
+#define EFI_MEMRESERVE_COUNT(size) (((size) - sizeof(struct 
linux_efi_memreserve)) \
+   / sizeof(((struct linux_efi_memreserve *)0)->entry[0]))
+
 #endif /* _LINUX_EFI_H */
-- 
2.19.1



[PATCH 04/11] x86/mm/pageattr: Introduce helper function to unmap EFI boot services

2018-11-29 Thread Ard Biesheuvel
From: Sai Praneeth Prakhya 

Ideally, after kernel assumes control of the platform, firmware
shouldn't access EFI boot services code/data regions. But, it's noticed
that this is not so true in many x86 platforms. Hence, during boot,
kernel reserves EFI boot services code/data regions [1] and maps [2]
them to efi_pgd so that call to set_virtual_address_map() doesn't fail.
After returning from set_virtual_address_map(), kernel frees the
reserved regions [3] but they still remain mapped. Hence, introduce
kernel_unmap_pages_in_pgd() which will later be used to unmap EFI boot
services code/data regions.

While at it modify kernel_map_pages_in_pgd() by
1. Adding __init modifier because it's always used *only* during boot.
2. Add a warning if it's used after SMP is initialized because it uses
   __flush_tlb_all() which flushes mappings only on current CPU.

Unmapping EFI boot services code/data regions will result in clearing
PAGE_PRESENT bit and it shouldn't bother L1TF cases because it's already
handled by protnone_mask() at arch/x86/include/asm/pgtable-invert.h.

[1] efi_reserve_boot_services()
[2] efi_map_region() -> __map_region() -> kernel_map_pages_in_pgd()
[3] efi_free_boot_services()

Signed-off-by: Sai Praneeth Prakhya 
Cc: Borislav Petkov 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: Dave Hansen 
Cc: Bhupesh Sharma 
Cc: Peter Zijlstra 
Reviewed-by: Thomas Gleixner 
Signed-off-by: Ard Biesheuvel 
---
 arch/x86/include/asm/pgtable_types.h |  8 --
 arch/x86/mm/pageattr.c   | 40 ++--
 2 files changed, 44 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_types.h 
b/arch/x86/include/asm/pgtable_types.h
index 106b7d0e2dae..d6ff0bbdb394 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -564,8 +564,12 @@ extern pte_t *lookup_address_in_pgd(pgd_t *pgd, unsigned 
long address,
unsigned int *level);
 extern pmd_t *lookup_pmd_address(unsigned long address);
 extern phys_addr_t slow_virt_to_phys(void *__address);
-extern int kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
-  unsigned numpages, unsigned long page_flags);
+extern int __init kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn,
+ unsigned long address,
+ unsigned numpages,
+ unsigned long page_flags);
+extern int __init kernel_unmap_pages_in_pgd(pgd_t *pgd, unsigned long address,
+   unsigned long numpages);
 #endif /* !__ASSEMBLY__ */
 
 #endif /* _ASM_X86_PGTABLE_DEFS_H */
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index db7a10082238..bac35001d896 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -2338,8 +2338,8 @@ bool kernel_page_present(struct page *page)
 
 #endif /* CONFIG_DEBUG_PAGEALLOC */
 
-int kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
-   unsigned numpages, unsigned long page_flags)
+int __init kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
+  unsigned numpages, unsigned long page_flags)
 {
int retval = -EINVAL;
 
@@ -2353,6 +2353,8 @@ int kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned 
long address,
.flags = 0,
};
 
+   WARN_ONCE(num_online_cpus() > 1, "Don't call after initializing SMP");
+
if (!(__supported_pte_mask & _PAGE_NX))
goto out;
 
@@ -2374,6 +2376,40 @@ int kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn, 
unsigned long address,
return retval;
 }
 
+/*
+ * __flush_tlb_all() flushes mappings only on current CPU and hence this
+ * function shouldn't be used in an SMP environment. Presently, it's used only
+ * during boot (way before smp_init()) by EFI subsystem and hence is ok.
+ */
+int __init kernel_unmap_pages_in_pgd(pgd_t *pgd, unsigned long address,
+unsigned long numpages)
+{
+   int retval;
+
+   /*
+* The typical sequence for unmapping is to find a pte through
+* lookup_address_in_pgd() (ideally, it should never return NULL because
+* the address is already mapped) and change it's protections. As pfn is
+* the *target* of a mapping, it's not useful while unmapping.
+*/
+   struct cpa_data cpa = {
+   .vaddr  = ,
+   .pfn= 0,
+   .pgd= pgd,
+   .numpages   = numpages,
+   .mask_set   = __pgprot(0),
+   .mask_clr   = __pgprot(_PAGE_PRESENT | _PAGE_RW),
+   .flags  = 0,
+   };
+
+   WARN_ONCE(num_online_cpus() > 1, "Don't call after initializing SMP");
+
+   retval = __change_page_attr_set_clr(, 0);
+   __flush_tlb_all();
+
+   return 

[PATCH 09/11] efi: permit multiple entries in persistent memreserve data structure

2018-11-29 Thread Ard Biesheuvel
In preparation of updating efi_mem_reserve_persistent() to cause less
fragmentation when dealing with many persistent reservations, update
the struct definition and the code that handles it currently so it
can describe an arbitrary number of reservations using a single linked
list entry. The actual optimization will be implemented in a subsequent
patch.

Tested-by: Marc Zyngier 
Signed-off-by: Ard Biesheuvel 
---
 drivers/firmware/efi/efi.c  | 39 +
 drivers/firmware/efi/libstub/arm-stub.c |  2 +-
 include/linux/efi.h | 13 +++--
 3 files changed, 38 insertions(+), 16 deletions(-)

diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index 415849bab233..80b11521627a 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -602,21 +602,33 @@ int __init efi_apply_persistent_mem_reservations(void)
 
while (prsv) {
struct linux_efi_memreserve *rsv;
-
-   /* reserve the entry itself */
-   memblock_reserve(prsv, sizeof(*rsv));
-
-   rsv = early_memremap(prsv, sizeof(*rsv));
-   if (rsv == NULL) {
+   u8 *p;
+   int i;
+
+   /*
+* Just map a full page: that is what we will get
+* anyway, and it permits us to map the entire entry
+* before knowing its size.
+*/
+   p = early_memremap(ALIGN_DOWN(prsv, PAGE_SIZE),
+  PAGE_SIZE);
+   if (p == NULL) {
pr_err("Could not map UEFI memreserve 
entry!\n");
return -ENOMEM;
}
 
-   if (rsv->size)
-   memblock_reserve(rsv->base, rsv->size);
+   rsv = (void *)(p + prsv % PAGE_SIZE);
+
+   /* reserve the entry itself */
+   memblock_reserve(prsv, EFI_MEMRESERVE_SIZE(rsv->size));
+
+   for (i = 0; i < atomic_read(>count); i++) {
+   memblock_reserve(rsv->entry[i].base,
+rsv->entry[i].size);
+   }
 
prsv = rsv->next;
-   early_memunmap(rsv, sizeof(*rsv));
+   early_memunmap(p, PAGE_SIZE);
}
}
 
@@ -985,6 +997,7 @@ static int __init efi_memreserve_map_root(void)
 int __ref efi_mem_reserve_persistent(phys_addr_t addr, u64 size)
 {
struct linux_efi_memreserve *rsv;
+   int rsvsize = EFI_MEMRESERVE_SIZE(1);
int rc;
 
if (efi_memreserve_root == (void *)ULONG_MAX)
@@ -996,12 +1009,14 @@ int __ref efi_mem_reserve_persistent(phys_addr_t addr, 
u64 size)
return rc;
}
 
-   rsv = kmalloc(sizeof(*rsv), GFP_ATOMIC);
+   rsv = kmalloc(rsvsize, GFP_ATOMIC);
if (!rsv)
return -ENOMEM;
 
-   rsv->base = addr;
-   rsv->size = size;
+   rsv->size = 1;
+   atomic_set(>count, 1);
+   rsv->entry[0].base = addr;
+   rsv->entry[0].size = size;
 
spin_lock(_mem_reserve_persistent_lock);
rsv->next = efi_memreserve_root->next;
diff --git a/drivers/firmware/efi/libstub/arm-stub.c 
b/drivers/firmware/efi/libstub/arm-stub.c
index 3d36142cf812..9e20159ea5f5 100644
--- a/drivers/firmware/efi/libstub/arm-stub.c
+++ b/drivers/firmware/efi/libstub/arm-stub.c
@@ -86,8 +86,8 @@ void install_memreserve_table(efi_system_table_t 
*sys_table_arg)
}
 
rsv->next = 0;
-   rsv->base = 0;
rsv->size = 0;
+   atomic_set(>count, 0);
 
status = efi_call_early(install_configuration_table,
_table_guid,
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 2b3b33c83b05..4f27640fdcdc 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -1712,9 +1712,16 @@ extern struct efi_runtime_work efi_rts_work;
 extern struct workqueue_struct *efi_rts_wq;
 
 struct linux_efi_memreserve {
-   phys_addr_t next;
-   phys_addr_t base;
-   phys_addr_t size;
+   int size;   // allocated size of the array
+   atomic_tcount;  // number of entries used
+   phys_addr_t next;   // pa of next struct instance
+   struct {
+   phys_addr_t base;
+   phys_addr_t size;
+   } entry[0];
 };
 
+#define EFI_MEMRESERVE_SIZE(count) (sizeof(struct linux_efi_memreserve) + \
+   (count) * sizeof(((struct linux_efi_memreserve *)0)->entry[0]))
+
 #endif /* _LINUX_EFI_H */
-- 
2.19.1



[PATCH 01/11] x86/efi: Allocate e820 buffer before calling efi_exit_boot_service

2018-11-29 Thread Ard Biesheuvel
From: Eric Snowberg 

Commit d64934019f6c ("x86/efi: Use efi_exit_boot_services()")
introduced a regression on systems with large memory maps
causing them to hang on boot. The first "goto get_map" that was removed
from exit_boot insured there was enough room for the memory map when
efi_call_early(exit_boot_services) was called. This happens when
(nr_desc > ARRAY_SIZE(params->e820_table).

Chain of events:
  exit_boot()
efi_exit_boot_services()
  efi_get_memory_map <- at this point the mm can't grow over 8 desc
  priv_func()
exit_boot_func()
  allocate_e820ext() <- new mm grows over 8 desc from e820 alloc
  efi_call_early(exit_boot_services) <- mm key doesn't match so retry
  efi_call_early(get_memory_map) <- not enough room for new mm
  system hangs

This patch allocates the e820 buffer before calling efi_exit_boot_services
and fixes the regression.

Signed-off-by: Eric Snowberg 
Signed-off-by: Ard Biesheuvel 
---
 arch/x86/boot/compressed/eboot.c | 65 
 1 file changed, 41 insertions(+), 24 deletions(-)

diff --git a/arch/x86/boot/compressed/eboot.c b/arch/x86/boot/compressed/eboot.c
index 8b4c5e001157..f7bad07bb251 100644
--- a/arch/x86/boot/compressed/eboot.c
+++ b/arch/x86/boot/compressed/eboot.c
@@ -1,3 +1,4 @@
+
 /* ---
  *
  *   Copyright 2011 Intel Corporation; author Matt Fleming
@@ -634,37 +635,54 @@ static efi_status_t alloc_e820ext(u32 nr_desc, struct 
setup_data **e820ext,
return status;
 }
 
+static efi_status_t allocate_e820(struct boot_params *params,
+ struct setup_data **e820ext,
+ u32 *e820ext_size)
+{
+   unsigned long map_size, desc_size, buff_size;
+   struct efi_boot_memmap boot_map;
+   efi_memory_desc_t *map;
+   efi_status_t status;
+   __u32 nr_desc;
+
+   boot_map.map =  
+   boot_map.map_size = _size;
+   boot_map.desc_size =_size;
+   boot_map.desc_ver = NULL;
+   boot_map.key_ptr =  NULL;
+   boot_map.buff_size =_size;
+
+   status = efi_get_memory_map(sys_table, _map);
+   if (status != EFI_SUCCESS)
+   return status;
+
+   nr_desc = buff_size / desc_size;
+
+   if (nr_desc > ARRAY_SIZE(params->e820_table)) {
+   u32 nr_e820ext = nr_desc - ARRAY_SIZE(params->e820_table);
+
+   status = alloc_e820ext(nr_e820ext, e820ext, e820ext_size);
+   if (status != EFI_SUCCESS)
+   return status;
+   }
+
+   return EFI_SUCCESS;
+}
+
 struct exit_boot_struct {
struct boot_params  *boot_params;
struct efi_info *efi;
-   struct setup_data   *e820ext;
-   __u32   e820ext_size;
 };
 
 static efi_status_t exit_boot_func(efi_system_table_t *sys_table_arg,
   struct efi_boot_memmap *map,
   void *priv)
 {
-   static bool first = true;
const char *signature;
__u32 nr_desc;
efi_status_t status;
struct exit_boot_struct *p = priv;
 
-   if (first) {
-   nr_desc = *map->buff_size / *map->desc_size;
-   if (nr_desc > ARRAY_SIZE(p->boot_params->e820_table)) {
-   u32 nr_e820ext = nr_desc -
-   ARRAY_SIZE(p->boot_params->e820_table);
-
-   status = alloc_e820ext(nr_e820ext, >e820ext,
-  >e820ext_size);
-   if (status != EFI_SUCCESS)
-   return status;
-   }
-   first = false;
-   }
-
signature = efi_is_64bit() ? EFI64_LOADER_SIGNATURE
   : EFI32_LOADER_SIGNATURE;
memcpy(>efi->efi_loader_signature, signature, sizeof(__u32));
@@ -687,8 +705,8 @@ static efi_status_t exit_boot(struct boot_params 
*boot_params, void *handle)
 {
unsigned long map_sz, key, desc_size, buff_size;
efi_memory_desc_t *mem_map;
-   struct setup_data *e820ext;
-   __u32 e820ext_size;
+   struct setup_data *e820ext = NULL;
+   __u32 e820ext_size = 0;
efi_status_t status;
__u32 desc_version;
struct efi_boot_memmap map;
@@ -702,8 +720,10 @@ static efi_status_t exit_boot(struct boot_params 
*boot_params, void *handle)
map.buff_size   = _size;
priv.boot_params= boot_params;
priv.efi= _params->efi_info;
-   priv.e820ext= NULL;
-   priv.e820ext_size   = 0;
+
+   status = allocate_e820(boot_params, , _size);
+   if (status != EFI_SUCCESS)
+   return status;
 
/* Might as well exit boot services now */
status = efi_exit_boot_services(sys_table, handle, , ,
@@ -711,9 +731,6 @@ static 

[PATCH 02/11] efi/fdt: Indentation fix

2018-11-29 Thread Ard Biesheuvel
From: Julien Thierry 

Closing bracket seems to end a for statement when it is actually ending
the contained if. Add some brackets to have clear delimitation of each
scope.

No functional change/fix, just fix the indentation.

Signed-off-by: Julien Thierry 
Signed-off-by: Ard Biesheuvel 
---
 drivers/firmware/efi/libstub/fdt.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/firmware/efi/libstub/fdt.c 
b/drivers/firmware/efi/libstub/fdt.c
index 0c0d2312f4a8..a3614f9b5f75 100644
--- a/drivers/firmware/efi/libstub/fdt.c
+++ b/drivers/firmware/efi/libstub/fdt.c
@@ -376,7 +376,7 @@ void *get_fdt(efi_system_table_t *sys_table, unsigned long 
*fdt_size)
tables = (efi_config_table_t *) sys_table->tables;
fdt = NULL;
 
-   for (i = 0; i < sys_table->nr_tables; i++)
+   for (i = 0; i < sys_table->nr_tables; i++) {
if (efi_guidcmp(tables[i].guid, fdt_guid) == 0) {
fdt = (void *) tables[i].table;
if (fdt_check_header(fdt) != 0) {
@@ -385,7 +385,8 @@ void *get_fdt(efi_system_table_t *sys_table, unsigned long 
*fdt_size)
}
*fdt_size = fdt_totalsize(fdt);
break;
-}
+   }
+   }
 
return fdt;
 }
-- 
2.19.1



[PATCH 06/11] x86/efi: Move efi__boot_services() to arch/x86

2018-11-29 Thread Ard Biesheuvel
From: Sai Praneeth Prakhya 

efi__boot_services() are x86 specific quirks and as such
should be in asm/efi.h, so move them from linux/efi.h. Also, call
efi_free_boot_services() from __efi_enter_virtual_mode() as it is x86
specific call and ideally shouldn't be part of init/main.c

Signed-off-by: Sai Praneeth Prakhya 
Cc: Borislav Petkov 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: Dave Hansen 
Cc: Bhupesh Sharma 
Cc: Peter Zijlstra 
Acked-by: Thomas Gleixner 
Signed-off-by: Ard Biesheuvel 
---
 arch/x86/include/asm/efi.h  | 2 ++
 arch/x86/platform/efi/efi.c | 2 ++
 include/linux/efi.h | 3 ---
 init/main.c | 4 
 4 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index eea40d52ca78..d1e64ac80b9c 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -141,6 +141,8 @@ extern int __init efi_reuse_config(u64 tables, int 
nr_tables);
 extern void efi_delete_dummy_variable(void);
 extern void efi_switch_mm(struct mm_struct *mm);
 extern void efi_recover_from_page_fault(unsigned long phys_addr);
+extern void efi_free_boot_services(void);
+extern void efi_reserve_boot_services(void);
 
 struct efi_setup_data {
u64 fw_vendor;
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 7ae939e353cd..e1cb01a22fa8 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -993,6 +993,8 @@ static void __init __efi_enter_virtual_mode(void)
panic("EFI call to SetVirtualAddressMap() failed!");
}
 
+   efi_free_boot_services();
+
/*
 * Now that EFI is in virtual mode, update the function
 * pointers in the runtime service table to the new virtual addresses.
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 100ce4a4aff6..2b3b33c83b05 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -1000,13 +1000,11 @@ extern void efi_memmap_walk (efi_freemem_callback_t 
callback, void *arg);
 extern void efi_gettimeofday (struct timespec64 *ts);
 extern void efi_enter_virtual_mode (void); /* switch EFI to virtual mode, 
if possible */
 #ifdef CONFIG_X86
-extern void efi_free_boot_services(void);
 extern efi_status_t efi_query_variable_store(u32 attributes,
 unsigned long size,
 bool nonblocking);
 extern void efi_find_mirror(void);
 #else
-static inline void efi_free_boot_services(void) {}
 
 static inline efi_status_t efi_query_variable_store(u32 attributes,
unsigned long size,
@@ -1046,7 +1044,6 @@ extern void efi_mem_reserve(phys_addr_t addr, u64 size);
 extern int efi_mem_reserve_persistent(phys_addr_t addr, u64 size);
 extern void efi_initialize_iomem_resources(struct resource *code_resource,
struct resource *data_resource, struct resource *bss_resource);
-extern void efi_reserve_boot_services(void);
 extern int efi_get_fdt_params(struct efi_fdt_params *params);
 extern struct kobject *efi_kobj;
 
diff --git a/init/main.c b/init/main.c
index ee147103ba1b..ccefcd8e855f 100644
--- a/init/main.c
+++ b/init/main.c
@@ -737,10 +737,6 @@ asmlinkage __visible void __init start_kernel(void)
arch_post_acpi_subsys_init();
sfi_init_late();
 
-   if (efi_enabled(EFI_RUNTIME_SERVICES)) {
-   efi_free_boot_services();
-   }
-
/* Do the rest non-__init'ed, we're now alive */
arch_call_rest_init();
 }
-- 
2.19.1



[PATCH 08/11] firmware: efi: add NULL pointer checks in efivars api functions

2018-11-29 Thread Ard Biesheuvel
From: Arend van Spriel 

Since commit:

   ce2e6db554fa ("brcmfmac: Add support for getting nvram contents from
 EFI variables")

we have a device driver accessing the efivars API. Several functions in
the efivars API assume __efivars is set, i.e., that they will be accessed
only after efivars_register() has been called. However, the following NULL
pointer access was reported calling efivar_entry_size() from the brcmfmac
device driver.

  Unable to handle kernel NULL pointer dereference at virtual address 0008
  pgd = 60bfa5f1
  [0008] *pgd=
  Internal error: Oops: 5 [#1] SMP ARM
  ...
  Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
  Workqueue: events request_firmware_work_func
  PC is at efivar_entry_size+0x28/0x90
  LR is at brcmf_fw_complete_request+0x3f8/0x8d4 [brcmfmac]
  pc : []lr : []psr: a00d0113
  sp : ede7fe28  ip : ee983410  fp : c1787f30
  r10:   r9 :   r8 : bf2b2258
  r7 : ee983000  r6 : c1604c48  r5 : ede7fe88  r4 : edf337c0
  r3 :   r2 :   r1 : ede7fe88  r0 : c17712c8
  Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
  Control: 10c5387d  Table: ad16804a  DAC: 0051

Disassembly showed that the local static variable __efivars is NULL,
which is not entirely unexpected given that it is a non-EFI platform.
So add a NULL pointer check to efivar_entry_size(), and to related
functions while at it. In efivars_register() a couple of sanity checks
are added as well.

Cc: Hans de Goede 
Reported-by: Jon Hunter 
Signed-off-by: Arend van Spriel 
Signed-off-by: Ard Biesheuvel 
---
 drivers/firmware/efi/vars.c | 99 +
 1 file changed, 78 insertions(+), 21 deletions(-)

diff --git a/drivers/firmware/efi/vars.c b/drivers/firmware/efi/vars.c
index 9336ffdf6e2c..fceaafd67ec6 100644
--- a/drivers/firmware/efi/vars.c
+++ b/drivers/firmware/efi/vars.c
@@ -318,7 +318,12 @@ EXPORT_SYMBOL_GPL(efivar_variable_is_removable);
 static efi_status_t
 check_var_size(u32 attributes, unsigned long size)
 {
-   const struct efivar_operations *fops = __efivars->ops;
+   const struct efivar_operations *fops;
+
+   if (!__efivars)
+   return EFI_UNSUPPORTED;
+
+   fops = __efivars->ops;
 
if (!fops->query_variable_store)
return EFI_UNSUPPORTED;
@@ -329,7 +334,12 @@ check_var_size(u32 attributes, unsigned long size)
 static efi_status_t
 check_var_size_nonblocking(u32 attributes, unsigned long size)
 {
-   const struct efivar_operations *fops = __efivars->ops;
+   const struct efivar_operations *fops;
+
+   if (!__efivars)
+   return EFI_UNSUPPORTED;
+
+   fops = __efivars->ops;
 
if (!fops->query_variable_store)
return EFI_UNSUPPORTED;
@@ -429,13 +439,18 @@ static void dup_variable_bug(efi_char16_t *str16, 
efi_guid_t *vendor_guid,
 int efivar_init(int (*func)(efi_char16_t *, efi_guid_t, unsigned long, void *),
void *data, bool duplicates, struct list_head *head)
 {
-   const struct efivar_operations *ops = __efivars->ops;
+   const struct efivar_operations *ops;
unsigned long variable_name_size = 1024;
efi_char16_t *variable_name;
efi_status_t status;
efi_guid_t vendor_guid;
int err = 0;
 
+   if (!__efivars)
+   return -EFAULT;
+
+   ops = __efivars->ops;
+
variable_name = kzalloc(variable_name_size, GFP_KERNEL);
if (!variable_name) {
printk(KERN_ERR "efivars: Memory allocation failed.\n");
@@ -583,12 +598,14 @@ static void efivar_entry_list_del_unlock(struct 
efivar_entry *entry)
  */
 int __efivar_entry_delete(struct efivar_entry *entry)
 {
-   const struct efivar_operations *ops = __efivars->ops;
efi_status_t status;
 
-   status = ops->set_variable(entry->var.VariableName,
-  >var.VendorGuid,
-  0, 0, NULL);
+   if (!__efivars)
+   return -EINVAL;
+
+   status = __efivars->ops->set_variable(entry->var.VariableName,
+ >var.VendorGuid,
+ 0, 0, NULL);
 
return efi_status_to_err(status);
 }
@@ -607,12 +624,17 @@ EXPORT_SYMBOL_GPL(__efivar_entry_delete);
  */
 int efivar_entry_delete(struct efivar_entry *entry)
 {
-   const struct efivar_operations *ops = __efivars->ops;
+   const struct efivar_operations *ops;
efi_status_t status;
 
if (down_interruptible(_lock))
return -EINTR;
 
+   if (!__efivars) {
+   up(_lock);
+   return -EINVAL;
+   }
+   ops = __efivars->ops;
status = ops->set_variable(entry->var.VariableName,
   >var.VendorGuid,
   0, 0, NULL);
@@ -650,13 +672,19 @@ EXPORT_SYMBOL_GPL(efivar_entry_delete);
 int 

[PATCH 07/11] efi/libstub: Disable some warnings for x86{,_64}

2018-11-29 Thread Ard Biesheuvel
From: Nathan Chancellor 

When building the kernel with Clang, some disabled warnings appear
because this Makefile overrides KBUILD_CFLAGS for x86{,_64}. Add them to
this list so that the build is clean again.

-Wpointer-sign was disabled for the whole kernel before the beginning
of git history.

-Waddress-of-packed-member was disabled for the whole kernel in
commit bfb38988c51e ("kbuild: clang: Disable 'address-of-packed-member'
warning") and for x86/boot/compressed in commit 20c6c1890455 ("x86/boot:
Disable the address-of-packed-member compiler warning").

-Wgnu was disabled for the whole kernel in commit 61163efae020 ("kbuild:
LLVMLinux: Add Kbuild support for building kernel with Clang") and for
x86/boot/compressed in commit 6c3b56b19730 ("x86/boot: Disable Clang
warnings about GNU extensions").

Link: https://github.com/ClangBuiltLinux/linux/issues/112
Signed-off-by: Nathan Chancellor 
Reviewed-by: Sedat Dilek 
Tested-by: Sedat Dilek 
Signed-off-by: Ard Biesheuvel 
---
 drivers/firmware/efi/libstub/Makefile | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/firmware/efi/libstub/Makefile 
b/drivers/firmware/efi/libstub/Makefile
index c51627660dbb..d9845099635e 100644
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -9,7 +9,10 @@ cflags-$(CONFIG_X86_32):= -march=i386
 cflags-$(CONFIG_X86_64):= -mcmodel=small
 cflags-$(CONFIG_X86)   += -m$(BITS) -D__KERNEL__ -O2 \
   -fPIC -fno-strict-aliasing -mno-red-zone \
-  -mno-mmx -mno-sse -fshort-wchar
+  -mno-mmx -mno-sse -fshort-wchar \
+  -Wno-pointer-sign \
+  $(call cc-disable-warning, 
address-of-packed-member) \
+  $(call cc-disable-warning, gnu)
 
 # arm64 uses the full KBUILD_CFLAGS so it's necessary to explicitly
 # disable the stackleak plugin
-- 
2.19.1



[PATCH 05/11] x86/efi: Unmap EFI boot services code/data regions from efi_pgd

2018-11-29 Thread Ard Biesheuvel
From: Sai Praneeth Prakhya 

efi_free_boot_services(), as the name suggests, frees EFI boot services
code/data regions but forgets to unmap these regions from efi_pgd. This
means that any code that's running in efi_pgd address space (e.g:
any EFI runtime service) would still be able to access these regions but
the contents of these regions would have long been over written by
someone else. So, it's important to unmap these regions. Hence,
introduce efi_unmap_pages() to unmap these regions from efi_pgd.

After unmapping EFI boot services code/data regions, any illegal access
by buggy firmware to these regions would result in page fault which will
be handled by EFI specific fault handler.

Signed-off-by: Sai Praneeth Prakhya 
Cc: Borislav Petkov 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: Dave Hansen 
Cc: Bhupesh Sharma 
Cc: Peter Zijlstra 
Acked-by: Thomas Gleixner 
Signed-off-by: Ard Biesheuvel 
---
 arch/x86/platform/efi/quirks.c | 25 +
 1 file changed, 25 insertions(+)

diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 95e77a667ba5..09e811b9da26 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -369,6 +369,24 @@ void __init efi_reserve_boot_services(void)
}
 }
 
+/*
+ * Apart from having VA mappings for EFI boot services code/data regions,
+ * (duplicate) 1:1 mappings were also created as a quirk for buggy firmware. 
So,
+ * unmap both 1:1 and VA mappings.
+ */
+static void __init efi_unmap_pages(efi_memory_desc_t *md)
+{
+   pgd_t *pgd = efi_mm.pgd;
+   u64 pa = md->phys_addr;
+   u64 va = md->virt_addr;
+
+   if (kernel_unmap_pages_in_pgd(pgd, pa, md->num_pages))
+   pr_err("Failed to unmap 1:1 mapping for 0x%llx\n", pa);
+
+   if (kernel_unmap_pages_in_pgd(pgd, va, md->num_pages))
+   pr_err("Failed to unmap VA mapping for 0x%llx\n", va);
+}
+
 void __init efi_free_boot_services(void)
 {
phys_addr_t new_phys, new_size;
@@ -393,6 +411,13 @@ void __init efi_free_boot_services(void)
continue;
}
 
+   /*
+* Before calling set_virtual_address_map(), EFI boot services
+* code/data regions were mapped as a quirk for buggy firmware.
+* Unmap them from efi_pgd before freeing them up.
+*/
+   efi_unmap_pages(md);
+
/*
 * Nasty quirk: if all sub-1MB memory is used for boot
 * services, we can get here without having allocated the
-- 
2.19.1



[PATCH 03/11] efi/fdt: Simplify get_fdt flow

2018-11-29 Thread Ard Biesheuvel
From: Julien Thierry 

Reorganize get_fdt lookup loop, clearly showing that:
- Nothing is done for table entries that do not have fdt_guid
- Once an entry with fdt_guid is found, break out of the loop

No functional changes.

Suggested-by: Joe Perches 
Signed-off-by: Julien Thierry 
Signed-off-by: Ard Biesheuvel 
---
 drivers/firmware/efi/libstub/fdt.c | 25 +
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/drivers/firmware/efi/libstub/fdt.c 
b/drivers/firmware/efi/libstub/fdt.c
index a3614f9b5f75..0dc7b4987cc2 100644
--- a/drivers/firmware/efi/libstub/fdt.c
+++ b/drivers/firmware/efi/libstub/fdt.c
@@ -370,23 +370,24 @@ void *get_fdt(efi_system_table_t *sys_table, unsigned 
long *fdt_size)
 {
efi_guid_t fdt_guid = DEVICE_TREE_GUID;
efi_config_table_t *tables;
-   void *fdt;
int i;
 
-   tables = (efi_config_table_t *) sys_table->tables;
-   fdt = NULL;
+   tables = (efi_config_table_t *)sys_table->tables;
 
for (i = 0; i < sys_table->nr_tables; i++) {
-   if (efi_guidcmp(tables[i].guid, fdt_guid) == 0) {
-   fdt = (void *) tables[i].table;
-   if (fdt_check_header(fdt) != 0) {
-   pr_efi_err(sys_table, "Invalid header detected 
on UEFI supplied FDT, ignoring ...\n");
-   return NULL;
-   }
-   *fdt_size = fdt_totalsize(fdt);
-   break;
+   void *fdt;
+
+   if (efi_guidcmp(tables[i].guid, fdt_guid) != 0)
+   continue;
+
+   fdt = (void *)tables[i].table;
+   if (fdt_check_header(fdt) != 0) {
+   pr_efi_err(sys_table, "Invalid header detected on UEFI 
supplied FDT, ignoring ...\n");
+   return NULL;
}
+   *fdt_size = fdt_totalsize(fdt);
+   return fdt;
}
 
-   return fdt;
+   return NULL;
 }
-- 
2.19.1



[GIT PULL 00/11] EFI updates

2018-11-29 Thread Ard Biesheuvel
The following changes since commit 976b489120cdab2b1b3a41ffa14661db43d58190:

  efi: Prevent GICv3 WARN() by mapping the memreserve table before first use 
(2018-11-27 13:50:20 +0100)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi.git tags/efi-next

for you to fetch changes up to 1d29afdbf7ae878a23627ebee81efcd213f11749:

  efi/x86: earlyprintk - Fix infinite loop on some screen widths (2018-11-28 
17:58:42 +0100)


EFI updates for v4.21:
- Allocate the E820 buffer before doing the GetMemoryMap/ExitBootServices
  dance so we don't run out of space (Eric)
- Clear EFI boot services mappings when freeing the memory (Sai)
- Harden efivars against callers that invoke it on non-EFI boots (Arend)
- Reduce the number of memblock reservations resulting from extensive
  use of the new efi_mem_reserve_persistent() API (Ard)
- Other assorted fixes and cleanups.


Ard Biesheuvel (2):
  efi: permit multiple entries in persistent memreserve data structure
  efi: reduce the amount of memblock reservations for persistent allocations

Arend van Spriel (1):
  firmware: efi: add NULL pointer checks in efivars api functions

Eric Snowberg (1):
  x86/efi: Allocate e820 buffer before calling efi_exit_boot_service

Julien Thierry (2):
  efi/fdt: Indentation fix
  efi/fdt: Simplify get_fdt flow

Nathan Chancellor (1):
  efi/libstub: Disable some warnings for x86{,_64}

Sai Praneeth Prakhya (3):
  x86/mm/pageattr: Introduce helper function to unmap EFI boot services
  x86/efi: Unmap EFI boot services code/data regions from efi_pgd
  x86/efi: Move efi__boot_services() to arch/x86

YiFei Zhu (1):
  efi/x86: earlyprintk - Fix infinite loop on some screen widths

 arch/x86/boot/compressed/eboot.c| 65 ++
 arch/x86/include/asm/efi.h  |  2 +
 arch/x86/include/asm/pgtable_types.h|  8 ++-
 arch/x86/mm/pageattr.c  | 40 -
 arch/x86/platform/efi/early_printk.c|  2 +-
 arch/x86/platform/efi/efi.c |  2 +
 arch/x86/platform/efi/quirks.c  | 25 +
 drivers/firmware/efi/efi.c  | 55 +-
 drivers/firmware/efi/libstub/Makefile   |  5 +-
 drivers/firmware/efi/libstub/arm-stub.c |  2 +-
 drivers/firmware/efi/libstub/fdt.c  | 30 +-
 drivers/firmware/efi/vars.c | 99 ++---
 include/linux/efi.h | 19 +--
 init/main.c |  4 --
 14 files changed, 269 insertions(+), 89 deletions(-)


Re: [PATCH 4.9 39/92] efi/arm: Revert deferred unmap of early memmap mapping

2018-11-29 Thread Ard Biesheuvel
On Thu, 29 Nov 2018 at 16:02, Greg Kroah-Hartman
 wrote:
>
> On Thu, Nov 29, 2018 at 03:28:44PM +0100, Ard Biesheuvel wrote:
> > On Thu, 29 Nov 2018 at 15:21, Greg Kroah-Hartman
> >  wrote:
> > >
> > > 4.9-stable review patch.  If anyone has any objections, please let me 
> > > know.
> > >
> > > --
> > >
> > > [ Upstream commit 33412b8673135b18ea42beb7f5117ed0091798b6 ]
> > >
> > > Commit:
> > >
> > >   3ea86495aef2 ("efi/arm: preserve early mapping of UEFI memory map 
> > > longer for BGRT")
> > >
> > > deferred the unmap of the early mapping of the UEFI memory map to
> > > accommodate the ACPI BGRT code, which looks up the memory type that
> > > backs the BGRT table to validate it against the requirements of the UEFI 
> > > spec.
> > >
> > > Unfortunately, this causes problems on ARM, which does not permit
> > > early mappings to persist after paging_init() is called, resulting
> > > in a WARN() splat. Since we don't support the BGRT table on ARM anway,
> > > let's revert ARM to the old behaviour, which is to take down the
> > > early mapping at the end of efi_init().
> > >
> > > Signed-off-by: Ard Biesheuvel 
> > > Cc: Linus Torvalds 
> > > Cc: Peter Zijlstra 
> > > Cc: Thomas Gleixner 
> > > Cc: linux-efi@vger.kernel.org
> > > Fixes: 3ea86495aef2 ("efi/arm: preserve early mapping of UEFI memory ...")
> >
> > This commit is only in v4.19 as far as I know. Does it even apply?
>
> This commit is in the following releases:
> 4.9.129 4.14.72 4.18.10 4.19
>
> so it should apply :)
>

Ah, of course.

I didn't realize (or failed to remember) that it had been taken into -stable.

Thanks,


Re: [PATCH 4.9 39/92] efi/arm: Revert deferred unmap of early memmap mapping

2018-11-29 Thread Greg Kroah-Hartman
On Thu, Nov 29, 2018 at 03:28:44PM +0100, Ard Biesheuvel wrote:
> On Thu, 29 Nov 2018 at 15:21, Greg Kroah-Hartman
>  wrote:
> >
> > 4.9-stable review patch.  If anyone has any objections, please let me know.
> >
> > --
> >
> > [ Upstream commit 33412b8673135b18ea42beb7f5117ed0091798b6 ]
> >
> > Commit:
> >
> >   3ea86495aef2 ("efi/arm: preserve early mapping of UEFI memory map longer 
> > for BGRT")
> >
> > deferred the unmap of the early mapping of the UEFI memory map to
> > accommodate the ACPI BGRT code, which looks up the memory type that
> > backs the BGRT table to validate it against the requirements of the UEFI 
> > spec.
> >
> > Unfortunately, this causes problems on ARM, which does not permit
> > early mappings to persist after paging_init() is called, resulting
> > in a WARN() splat. Since we don't support the BGRT table on ARM anway,
> > let's revert ARM to the old behaviour, which is to take down the
> > early mapping at the end of efi_init().
> >
> > Signed-off-by: Ard Biesheuvel 
> > Cc: Linus Torvalds 
> > Cc: Peter Zijlstra 
> > Cc: Thomas Gleixner 
> > Cc: linux-efi@vger.kernel.org
> > Fixes: 3ea86495aef2 ("efi/arm: preserve early mapping of UEFI memory ...")
> 
> This commit is only in v4.19 as far as I know. Does it even apply?

This commit is in the following releases:
4.9.129 4.14.72 4.18.10 4.19

so it should apply :)

thanks,

greg k-h


[PATCH 4.9 39/92] efi/arm: Revert deferred unmap of early memmap mapping

2018-11-29 Thread Greg Kroah-Hartman
4.9-stable review patch.  If anyone has any objections, please let me know.

--

[ Upstream commit 33412b8673135b18ea42beb7f5117ed0091798b6 ]

Commit:

  3ea86495aef2 ("efi/arm: preserve early mapping of UEFI memory map longer for 
BGRT")

deferred the unmap of the early mapping of the UEFI memory map to
accommodate the ACPI BGRT code, which looks up the memory type that
backs the BGRT table to validate it against the requirements of the UEFI spec.

Unfortunately, this causes problems on ARM, which does not permit
early mappings to persist after paging_init() is called, resulting
in a WARN() splat. Since we don't support the BGRT table on ARM anway,
let's revert ARM to the old behaviour, which is to take down the
early mapping at the end of efi_init().

Signed-off-by: Ard Biesheuvel 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: linux-efi@vger.kernel.org
Fixes: 3ea86495aef2 ("efi/arm: preserve early mapping of UEFI memory ...")
Link: http://lkml.kernel.org/r/20181114175544.12860-3-ard.biesheu...@linaro.org
Signed-off-by: Ingo Molnar 
Signed-off-by: Sasha Levin 
---
 drivers/firmware/efi/arm-init.c| 4 
 drivers/firmware/efi/arm-runtime.c | 2 +-
 drivers/firmware/efi/memmap.c  | 3 +++
 3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/firmware/efi/arm-init.c b/drivers/firmware/efi/arm-init.c
index 1d1c9693ebfb..8ee91777abce 100644
--- a/drivers/firmware/efi/arm-init.c
+++ b/drivers/firmware/efi/arm-init.c
@@ -256,6 +256,10 @@ void __init efi_init(void)
(params.mmap & ~PAGE_MASK)));
 
init_screen_info();
+
+   /* ARM does not permit early mappings to persist across paging_init() */
+   if (IS_ENABLED(CONFIG_ARM))
+   efi_memmap_unmap();
 }
 
 static int __init register_gop_device(void)
diff --git a/drivers/firmware/efi/arm-runtime.c 
b/drivers/firmware/efi/arm-runtime.c
index 4d788e0debfe..069c5a4479e6 100644
--- a/drivers/firmware/efi/arm-runtime.c
+++ b/drivers/firmware/efi/arm-runtime.c
@@ -118,7 +118,7 @@ static int __init arm_enable_runtime_services(void)
 {
u64 mapsize;
 
-   if (!efi_enabled(EFI_BOOT) || !efi_enabled(EFI_MEMMAP)) {
+   if (!efi_enabled(EFI_BOOT)) {
pr_info("EFI services will not be available.\n");
return 0;
}
diff --git a/drivers/firmware/efi/memmap.c b/drivers/firmware/efi/memmap.c
index 78686443cb37..3fd2b450c649 100644
--- a/drivers/firmware/efi/memmap.c
+++ b/drivers/firmware/efi/memmap.c
@@ -117,6 +117,9 @@ int __init efi_memmap_init_early(struct efi_memory_map_data 
*data)
 
 void __init efi_memmap_unmap(void)
 {
+   if (!efi_enabled(EFI_MEMMAP))
+   return;
+
if (!efi.memmap.late) {
unsigned long size;
 
-- 
2.17.1





[PATCH 4.14 061/100] efi/arm: Revert deferred unmap of early memmap mapping

2018-11-29 Thread Greg Kroah-Hartman
4.14-stable review patch.  If anyone has any objections, please let me know.

--

[ Upstream commit 33412b8673135b18ea42beb7f5117ed0091798b6 ]

Commit:

  3ea86495aef2 ("efi/arm: preserve early mapping of UEFI memory map longer for 
BGRT")

deferred the unmap of the early mapping of the UEFI memory map to
accommodate the ACPI BGRT code, which looks up the memory type that
backs the BGRT table to validate it against the requirements of the UEFI spec.

Unfortunately, this causes problems on ARM, which does not permit
early mappings to persist after paging_init() is called, resulting
in a WARN() splat. Since we don't support the BGRT table on ARM anway,
let's revert ARM to the old behaviour, which is to take down the
early mapping at the end of efi_init().

Signed-off-by: Ard Biesheuvel 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: linux-efi@vger.kernel.org
Fixes: 3ea86495aef2 ("efi/arm: preserve early mapping of UEFI memory ...")
Link: http://lkml.kernel.org/r/20181114175544.12860-3-ard.biesheu...@linaro.org
Signed-off-by: Ingo Molnar 
Signed-off-by: Sasha Levin 
---
 drivers/firmware/efi/arm-init.c| 4 
 drivers/firmware/efi/arm-runtime.c | 2 +-
 drivers/firmware/efi/memmap.c  | 3 +++
 3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/firmware/efi/arm-init.c b/drivers/firmware/efi/arm-init.c
index a7c522eac640..312f9f32e168 100644
--- a/drivers/firmware/efi/arm-init.c
+++ b/drivers/firmware/efi/arm-init.c
@@ -265,6 +265,10 @@ void __init efi_init(void)
(params.mmap & ~PAGE_MASK)));
 
init_screen_info();
+
+   /* ARM does not permit early mappings to persist across paging_init() */
+   if (IS_ENABLED(CONFIG_ARM))
+   efi_memmap_unmap();
 }
 
 static int __init register_gop_device(void)
diff --git a/drivers/firmware/efi/arm-runtime.c 
b/drivers/firmware/efi/arm-runtime.c
index 8995a48bd067..ad1530aff633 100644
--- a/drivers/firmware/efi/arm-runtime.c
+++ b/drivers/firmware/efi/arm-runtime.c
@@ -122,7 +122,7 @@ static int __init arm_enable_runtime_services(void)
 {
u64 mapsize;
 
-   if (!efi_enabled(EFI_BOOT) || !efi_enabled(EFI_MEMMAP)) {
+   if (!efi_enabled(EFI_BOOT)) {
pr_info("EFI services will not be available.\n");
return 0;
}
diff --git a/drivers/firmware/efi/memmap.c b/drivers/firmware/efi/memmap.c
index 5fc70520e04c..1907db2b38d8 100644
--- a/drivers/firmware/efi/memmap.c
+++ b/drivers/firmware/efi/memmap.c
@@ -118,6 +118,9 @@ int __init efi_memmap_init_early(struct efi_memory_map_data 
*data)
 
 void __init efi_memmap_unmap(void)
 {
+   if (!efi_enabled(EFI_MEMMAP))
+   return;
+
if (!efi.memmap.late) {
unsigned long size;
 
-- 
2.17.1





Re: [PATCH 4.9 39/92] efi/arm: Revert deferred unmap of early memmap mapping

2018-11-29 Thread Ard Biesheuvel
On Thu, 29 Nov 2018 at 15:21, Greg Kroah-Hartman
 wrote:
>
> 4.9-stable review patch.  If anyone has any objections, please let me know.
>
> --
>
> [ Upstream commit 33412b8673135b18ea42beb7f5117ed0091798b6 ]
>
> Commit:
>
>   3ea86495aef2 ("efi/arm: preserve early mapping of UEFI memory map longer 
> for BGRT")
>
> deferred the unmap of the early mapping of the UEFI memory map to
> accommodate the ACPI BGRT code, which looks up the memory type that
> backs the BGRT table to validate it against the requirements of the UEFI spec.
>
> Unfortunately, this causes problems on ARM, which does not permit
> early mappings to persist after paging_init() is called, resulting
> in a WARN() splat. Since we don't support the BGRT table on ARM anway,
> let's revert ARM to the old behaviour, which is to take down the
> early mapping at the end of efi_init().
>
> Signed-off-by: Ard Biesheuvel 
> Cc: Linus Torvalds 
> Cc: Peter Zijlstra 
> Cc: Thomas Gleixner 
> Cc: linux-efi@vger.kernel.org
> Fixes: 3ea86495aef2 ("efi/arm: preserve early mapping of UEFI memory ...")

This commit is only in v4.19 as far as I know. Does it even apply?


> Link: 
> http://lkml.kernel.org/r/20181114175544.12860-3-ard.biesheu...@linaro.org
> Signed-off-by: Ingo Molnar 
> Signed-off-by: Sasha Levin 
> ---
>  drivers/firmware/efi/arm-init.c| 4 
>  drivers/firmware/efi/arm-runtime.c | 2 +-
>  drivers/firmware/efi/memmap.c  | 3 +++
>  3 files changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/firmware/efi/arm-init.c b/drivers/firmware/efi/arm-init.c
> index 1d1c9693ebfb..8ee91777abce 100644
> --- a/drivers/firmware/efi/arm-init.c
> +++ b/drivers/firmware/efi/arm-init.c
> @@ -256,6 +256,10 @@ void __init efi_init(void)
> (params.mmap & ~PAGE_MASK)));
>
> init_screen_info();
> +
> +   /* ARM does not permit early mappings to persist across paging_init() 
> */
> +   if (IS_ENABLED(CONFIG_ARM))
> +   efi_memmap_unmap();
>  }
>
>  static int __init register_gop_device(void)
> diff --git a/drivers/firmware/efi/arm-runtime.c 
> b/drivers/firmware/efi/arm-runtime.c
> index 4d788e0debfe..069c5a4479e6 100644
> --- a/drivers/firmware/efi/arm-runtime.c
> +++ b/drivers/firmware/efi/arm-runtime.c
> @@ -118,7 +118,7 @@ static int __init arm_enable_runtime_services(void)
>  {
> u64 mapsize;
>
> -   if (!efi_enabled(EFI_BOOT) || !efi_enabled(EFI_MEMMAP)) {
> +   if (!efi_enabled(EFI_BOOT)) {
> pr_info("EFI services will not be available.\n");
> return 0;
> }
> diff --git a/drivers/firmware/efi/memmap.c b/drivers/firmware/efi/memmap.c
> index 78686443cb37..3fd2b450c649 100644
> --- a/drivers/firmware/efi/memmap.c
> +++ b/drivers/firmware/efi/memmap.c
> @@ -117,6 +117,9 @@ int __init efi_memmap_init_early(struct 
> efi_memory_map_data *data)
>
>  void __init efi_memmap_unmap(void)
>  {
> +   if (!efi_enabled(EFI_MEMMAP))
> +   return;
> +
> if (!efi.memmap.late) {
> unsigned long size;
>
> --
> 2.17.1
>
>
>


[PATCH 4.19 094/110] efi/arm: Revert deferred unmap of early memmap mapping

2018-11-29 Thread Greg Kroah-Hartman
4.19-stable review patch.  If anyone has any objections, please let me know.

--

[ Upstream commit 33412b8673135b18ea42beb7f5117ed0091798b6 ]

Commit:

  3ea86495aef2 ("efi/arm: preserve early mapping of UEFI memory map longer for 
BGRT")

deferred the unmap of the early mapping of the UEFI memory map to
accommodate the ACPI BGRT code, which looks up the memory type that
backs the BGRT table to validate it against the requirements of the UEFI spec.

Unfortunately, this causes problems on ARM, which does not permit
early mappings to persist after paging_init() is called, resulting
in a WARN() splat. Since we don't support the BGRT table on ARM anway,
let's revert ARM to the old behaviour, which is to take down the
early mapping at the end of efi_init().

Signed-off-by: Ard Biesheuvel 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: linux-efi@vger.kernel.org
Fixes: 3ea86495aef2 ("efi/arm: preserve early mapping of UEFI memory ...")
Link: http://lkml.kernel.org/r/20181114175544.12860-3-ard.biesheu...@linaro.org
Signed-off-by: Ingo Molnar 
Signed-off-by: Sasha Levin 
---
 drivers/firmware/efi/arm-init.c| 4 
 drivers/firmware/efi/arm-runtime.c | 2 +-
 drivers/firmware/efi/memmap.c  | 3 +++
 3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/firmware/efi/arm-init.c b/drivers/firmware/efi/arm-init.c
index 388a929baf95..1a6a77df8a5e 100644
--- a/drivers/firmware/efi/arm-init.c
+++ b/drivers/firmware/efi/arm-init.c
@@ -265,6 +265,10 @@ void __init efi_init(void)
(params.mmap & ~PAGE_MASK)));
 
init_screen_info();
+
+   /* ARM does not permit early mappings to persist across paging_init() */
+   if (IS_ENABLED(CONFIG_ARM))
+   efi_memmap_unmap();
 }
 
 static int __init register_gop_device(void)
diff --git a/drivers/firmware/efi/arm-runtime.c 
b/drivers/firmware/efi/arm-runtime.c
index 922cfb813109..a00934d263c5 100644
--- a/drivers/firmware/efi/arm-runtime.c
+++ b/drivers/firmware/efi/arm-runtime.c
@@ -110,7 +110,7 @@ static int __init arm_enable_runtime_services(void)
 {
u64 mapsize;
 
-   if (!efi_enabled(EFI_BOOT) || !efi_enabled(EFI_MEMMAP)) {
+   if (!efi_enabled(EFI_BOOT)) {
pr_info("EFI services will not be available.\n");
return 0;
}
diff --git a/drivers/firmware/efi/memmap.c b/drivers/firmware/efi/memmap.c
index 5fc70520e04c..1907db2b38d8 100644
--- a/drivers/firmware/efi/memmap.c
+++ b/drivers/firmware/efi/memmap.c
@@ -118,6 +118,9 @@ int __init efi_memmap_init_early(struct efi_memory_map_data 
*data)
 
 void __init efi_memmap_unmap(void)
 {
+   if (!efi_enabled(EFI_MEMMAP))
+   return;
+
if (!efi.memmap.late) {
unsigned long size;
 
-- 
2.17.1





[PATCH 0/2 v8] add reserved e820 ranges to the kdump kernel e820 table

2018-11-29 Thread Lianbo Jiang
This patchset did two things:
a). add a new I/O resource descriptor 'IORES_DESC_RESERVED'

When doing kexec_file_load, the first kernel needs to pass the e820
reserved ranges to the second kernel. But kernel can not exactly
match the e820 reserved ranges when walking through the iomem resources
with the descriptor 'IORES_DESC_NONE', because several e820 types(
e.g. E820_TYPE_RESERVED_KERN/E820_TYPE_RAM/E820_TYPE_UNUSABLE/E820
_TYPE_RESERVED) are converted to the descriptor 'IORES_DESC_NONE'. It
may pass these four types to the kdump kernel, that is not desired result.

So, this patch adds a new I/O resource descriptor 'IORES_DESC_RESERVED'
for the iomem resources search interfaces. It is helpful to exactly
match the reserved resource ranges when walking through iomem resources.

In addition, since the new descriptor 'IORES_DESC_RESERVED' is introduced,
these code originally related to the descriptor 'IORES_DESC_NONE' need to
be updated. Otherwise, it will be easily confused and also cause some
errors. Because the 'E820_TYPE_RESERVED' type is converted to the new
descriptor 'IORES_DESC_RESERVED' instead of 'IORES_DESC_NONE', it has been
changed.

b). add the e820 reserved ranges to kdump kernel e820 table

At present, when use the kexec_file_load syscall to load the kernel image
and initramfs(for example: kexec -s -p xxx), kernel does not pass the e820
reserved ranges to the second kernel, which might cause two problems:

The first one is the MMCONFIG issue. The basic problem is that this device
is in PCI segment 1 and the kernel PCI probing can not find it without all
the e820 I/O reservations being present in the e820 table. And the kdump
kernel does not have those reservations because the kexec command does not
pass the I/O reservation via the "memmap=xxx" command line option. (This
problem does not show up for other vendors, as SGI is apparently the
actually fails for everyone, but devices in segment 0 are then found by
some legacy lookup method.) The workaround for this is to pass the I/O
reserved regions to the kdump kernel.

MMCONFIG(aka ECAM) space is described in the ACPI MCFG table. If you don't
have ECAM: (a) PCI devices won't work at all on non-x86 systems that use
only ECAM for config access, (b) you won't be albe to access devices on
non-0 segments, (c) you won't be able to access extended config space(
address 0x100-0x), which means none of the Extended Capabilities will
be available(AER, ACS, ATS, etc). [Bjorn's comment]

The second issue is that the SME kdump kernel doesn't work without the
e820 reserved ranges. When SME is active in kdump kernel, actually, those
reserved regions are still decrypted, but because those reserved ranges are
not present at all in kdump kernel e820 table, those reserved regions are
considered as encrypted, it goes wrong.

The e820 reserved range is useful in kdump kernel, so it is necessary to
pass the e820 reserved ranges to kdump kernel.

Changes since v1:
1. Modified the value of flags to "0", when walking through the whole
tree for e820 reserved ranges.

Changes since v2:
1. Modified the value of flags to "0", when walking through the whole
tree for e820 reserved ranges.
2. Modified the invalid SOB chain issue.

Changes since v3:
1. Dropped [PATCH 1/3 v3] resource: fix an error which walks through iomem
   resources. Please refer to this commit <010a93bf97c7> "resource: Fix
   find_next_iomem_res() iteration issue"

Changes since v4:
1. Improve the patch log, and add kernel log.

Changes since v5:
1. Rewrite these patches log.

Changes since v6:
1. Modify the [PATCH 1/2], and add the new I/O resource descriptor
   'IORES_DESC_RESERVED' for the iomem resources search interfaces,
   and also updates these codes relates to 'IORES_DESC_NONE'.
2. Modify the [PATCH 2/2], and walk through io resource based on the
   new descriptor 'IORES_DESC_RESERVED'.
3. Update patch log.

Changes since v7:
1. Improve patch log.
2. Improve this function __ioremap_check_desc_other().
3. Modify code comment in the __ioremap_check_desc_other()


Lianbo Jiang (2):
  resource: add the new I/O resource descriptor 'IORES_DESC_RESERVED'
  x86/kexec_file: add reserved e820 ranges to kdump kernel e820 table

 arch/ia64/kernel/efi.c  |  4 
 arch/x86/kernel/crash.c |  6 ++
 arch/x86/kernel/e820.c  |  2 +-
 arch/x86/mm/ioremap.c   | 13 -
 include/linux/ioport.h  |  1 +
 kernel/resource.c   |  6 +++---
 6 files changed, 27 insertions(+), 5 deletions(-)

-- 
2.17.1



[PATCH 1/2 v8] resource: add the new I/O resource descriptor 'IORES_DESC_RESERVED'

2018-11-29 Thread Lianbo Jiang
When doing kexec_file_load, the first kernel needs to pass the e820
reserved ranges to the second kernel. But kernel can not exactly
match the e820 reserved ranges when walking through the iomem resources
with the descriptor 'IORES_DESC_NONE', because several e820 types(
e.g. E820_TYPE_RESERVED_KERN/E820_TYPE_RAM/E820_TYPE_UNUSABLE/E820
_TYPE_RESERVED) are converted to the descriptor 'IORES_DESC_NONE'. It
may pass these four types to the kdump kernel, that is not desired result.

So, this patch adds a new I/O resource descriptor 'IORES_DESC_RESERVED'
for the iomem resources search interfaces. It is helpful to exactly
match the reserved resource ranges when walking through iomem resources.

In addition, since the new descriptor 'IORES_DESC_RESERVED' is introduced,
these code originally related to the descriptor 'IORES_DESC_NONE' need to
be updated. Otherwise, it will be easily confused and also cause some
errors. Because the 'E820_TYPE_RESERVED' type is converted to the new
descriptor 'IORES_DESC_RESERVED' instead of 'IORES_DESC_NONE', it has been
changed.

Suggested-by: Dave Young 
Signed-off-by: Lianbo Jiang 
---
 arch/ia64/kernel/efi.c |  4 
 arch/x86/kernel/e820.c |  2 +-
 arch/x86/mm/ioremap.c  | 13 -
 include/linux/ioport.h |  1 +
 kernel/resource.c  |  6 +++---
 5 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/arch/ia64/kernel/efi.c b/arch/ia64/kernel/efi.c
index 8f106638913c..1841e9b4db30 100644
--- a/arch/ia64/kernel/efi.c
+++ b/arch/ia64/kernel/efi.c
@@ -1231,6 +1231,10 @@ efi_initialize_iomem_resources(struct resource 
*code_resource,
break;
 
case EFI_RESERVED_TYPE:
+   name = "reserved";
+   desc = IORES_DESC_RESERVED;
+   break;
+
case EFI_RUNTIME_SERVICES_CODE:
case EFI_RUNTIME_SERVICES_DATA:
case EFI_ACPI_RECLAIM_MEMORY:
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 50895c2f937d..57fafdafb860 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -1048,10 +1048,10 @@ static unsigned long __init 
e820_type_to_iores_desc(struct e820_entry *entry)
case E820_TYPE_NVS: return IORES_DESC_ACPI_NV_STORAGE;
case E820_TYPE_PMEM:return IORES_DESC_PERSISTENT_MEMORY;
case E820_TYPE_PRAM:return 
IORES_DESC_PERSISTENT_MEMORY_LEGACY;
+   case E820_TYPE_RESERVED:return IORES_DESC_RESERVED;
case E820_TYPE_RESERVED_KERN:   /* Fall-through: */
case E820_TYPE_RAM: /* Fall-through: */
case E820_TYPE_UNUSABLE:/* Fall-through: */
-   case E820_TYPE_RESERVED:/* Fall-through: */
default:return IORES_DESC_NONE;
}
 }
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 5378d10f1d31..fea2ef99415d 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -83,7 +83,18 @@ static bool __ioremap_check_ram(struct resource *res)
 
 static int __ioremap_check_desc_other(struct resource *res)
 {
-   return (res->desc != IORES_DESC_NONE);
+   /*
+* But now, the 'E820_TYPE_RESERVED' type is converted to the new
+* descriptor 'IORES_DESC_RESERVED' instead of 'IORES_DESC_NONE',
+* it has been changed. And the value of 'mem_flags.desc_other'
+* is equal to 'true' if we don't strengthen the condition in this
+* function, that is wrong. Because originally it is equal to
+* 'false' for the same reserved type.
+*
+* So, that would be nice to keep it the same as before.
+*/
+   return ((res->desc != IORES_DESC_NONE) &&
+   (res->desc != IORES_DESC_RESERVED));
 }
 
 static int __ioremap_res_check(struct resource *res, void *arg)
diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index da0ebaec25f0..6ed59de48bd5 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -133,6 +133,7 @@ enum {
IORES_DESC_PERSISTENT_MEMORY_LEGACY = 5,
IORES_DESC_DEVICE_PRIVATE_MEMORY= 6,
IORES_DESC_DEVICE_PUBLIC_MEMORY = 7,
+   IORES_DESC_RESERVED = 8,
 };
 
 /* helpers to define resources */
diff --git a/kernel/resource.c b/kernel/resource.c
index b0fbf685c77a..f34a632c4169 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -994,7 +994,7 @@ __reserve_region_with_split(struct resource *root, 
resource_size_t start,
res->start = start;
res->end = end;
res->flags = type | IORESOURCE_BUSY;
-   res->desc = IORES_DESC_NONE;
+   res->desc = IORES_DESC_RESERVED;
 
while (1) {
 
@@ -1029,7 +1029,7 @@ __reserve_region_with_split(struct resource *root, 
resource_size_t start,
next_res->start = conflict->end + 1;

[PATCH 2/2 v8] x86/kexec_file: add reserved e820 ranges to kdump kernel e820 table

2018-11-29 Thread Lianbo Jiang
At present, when use the kexec_file_load syscall to load the kernel image
and initramfs(for example: kexec -s -p xxx), kernel does not pass the e820
reserved ranges to the second kernel, which might cause two problems:

The first one is the MMCONFIG issue. The basic problem is that this device
is in PCI segment 1 and the kernel PCI probing can not find it without all
the e820 I/O reservations being present in the e820 table. And the kdump
kernel does not have those reservations because the kexec command does not
pass the I/O reservation via the "memmap=xxx" command line option. (This
problem does not show up for other vendors, as SGI is apparently the
actually fails for everyone, but devices in segment 0 are then found by
some legacy lookup method.) The workaround for this is to pass the I/O
reserved regions to the kdump kernel.

MMCONFIG(aka ECAM) space is described in the ACPI MCFG table. If you don't
have ECAM: (a) PCI devices won't work at all on non-x86 systems that use
only ECAM for config access, (b) you won't be albe to access devices on
non-0 segments, (c) you won't be able to access extended config space(
address 0x100-0x), which means none of the Extended Capabilities will
be available(AER, ACS, ATS, etc). [Bjorn's comment]

The second issue is that the SME kdump kernel doesn't work without the
e820 reserved ranges. When SME is active in kdump kernel, actually, those
reserved regions are still decrypted, but because those reserved ranges are
not present at all in kdump kernel e820 table, those reserved regions are
considered as encrypted, it goes wrong.

The e820 reserved range is useful in kdump kernel, so it is necessary to
pass the e820 reserved ranges to kdump kernel.

Suggested-by: Dave Young 
Signed-off-by: Lianbo Jiang 
---
 arch/x86/kernel/crash.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index f631a3f15587..5354a84f1684 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -380,6 +380,12 @@ int crash_setup_memmap_entries(struct kimage *image, 
struct boot_params *params)
walk_iomem_res_desc(IORES_DESC_ACPI_NV_STORAGE, flags, 0, -1, ,
memmap_entry_callback);
 
+   /* Add e820 reserved ranges */
+   cmd.type = E820_TYPE_RESERVED;
+   flags = IORESOURCE_MEM;
+   walk_iomem_res_desc(IORES_DESC_RESERVED, flags, 0, -1, ,
+  memmap_entry_callback);
+
/* Add crashk_low_res region */
if (crashk_low_res.end) {
ei.addr = crashk_low_res.start;
-- 
2.17.1