Re: [PATCH v2 4/8] x86, efi: Reserve UEFI 2.8 Specific Purpose Memory for dax

2019-06-21 Thread Dan Williams
On Sat, Jun 8, 2019 at 12:20 AM Ard Biesheuvel
 wrote:
>
> On Fri, 7 Jun 2019 at 19:34, Dan Williams  wrote:
> >
> > On Fri, Jun 7, 2019 at 8:23 AM Dan Williams  
> > wrote:
> > >
> > > On Fri, Jun 7, 2019 at 5:29 AM Ard Biesheuvel  
> > > wrote:
> > [..]
> > > > > #ifdef CONFIG_EFI_APPLICATION_RESERVED
> > > > > static inline bool is_efi_application_reserved(efi_memory_desc_t *md)
> > > > > {
> > > > > return md->type == EFI_CONVENTIONAL_MEMORY
> > > > > && (md->attribute & EFI_MEMORY_SP);
> > > > > }
> > > > > #else
> > > > > static inline bool is_efi_application_reserved(efi_memory_desc_t *md)
> > > > > {
> > > > > return false;
> > > > > }
> > > > > #endif
> > > >
> > > > I think this policy decision should not live inside the EFI subsystem.
> > > > EFI just gives you the memory map, and mangling that information
> > > > depending on whether you think a certain memory attribute should be
> > > > ignored is the job of the MM subsystem.
> > >
> > > The problem is that we don't have an mm subsystem at the time a
> > > decision needs to be made. The reservation policy needs to be deployed
> > > before even memblock has been initialized in order to keep kernel
> > > allocations out of the reservation. I agree with the sentiment I just
> > > don't see how to practically achieve an optional "System RAM" vs
> > > "Application Reserved" routing decision without an early (before
> > > e820__memblock_setup()) conditional branch.
> >
> > I can at least move it out of include/linux/efi.h and move it to
> > arch/x86/include/asm/efi.h since it is an x86 specific policy decision
> > / implementation for now.
>
> No, that doesn't make sense to me. If it must live in the EFI
> subsystem, I'd prefer it to be in the core code, not in x86 specific
> code, since there is nothing x86 specific about it.

The decision on whether / if to take any action on this hint is
implementation specific, so I argue it does not belong in the EFI
core. The spec does not mandate any action as it's just a hint.
Instead x86 is making a policy decision in how it translates it to the
x86-specific E820 representation. So, I as I go to release v3 of this
patch set I do not see an argument to move the
is_efi_application_reserved() definition out of
arch/x86/include/asm/efi.h it's 100% tied to the e820 translation.

Now, if some other EFI supporting architecture wanted to follow the
x86 policy we could move it it to a shared location, but that's
something for a follow-on patch set.


Re: [PATCH v2 4/8] x86, efi: Reserve UEFI 2.8 Specific Purpose Memory for dax

2019-06-08 Thread Dan Williams
On Sat, Jun 8, 2019 at 12:20 AM Ard Biesheuvel
 wrote:
>
> On Fri, 7 Jun 2019 at 19:34, Dan Williams  wrote:
> >
> > On Fri, Jun 7, 2019 at 8:23 AM Dan Williams  
> > wrote:
> > >
> > > On Fri, Jun 7, 2019 at 5:29 AM Ard Biesheuvel  
> > > wrote:
> > [..]
> > > > > #ifdef CONFIG_EFI_APPLICATION_RESERVED
> > > > > static inline bool is_efi_application_reserved(efi_memory_desc_t *md)
> > > > > {
> > > > > return md->type == EFI_CONVENTIONAL_MEMORY
> > > > > && (md->attribute & EFI_MEMORY_SP);
> > > > > }
> > > > > #else
> > > > > static inline bool is_efi_application_reserved(efi_memory_desc_t *md)
> > > > > {
> > > > > return false;
> > > > > }
> > > > > #endif
> > > >
> > > > I think this policy decision should not live inside the EFI subsystem.
> > > > EFI just gives you the memory map, and mangling that information
> > > > depending on whether you think a certain memory attribute should be
> > > > ignored is the job of the MM subsystem.
> > >
> > > The problem is that we don't have an mm subsystem at the time a
> > > decision needs to be made. The reservation policy needs to be deployed
> > > before even memblock has been initialized in order to keep kernel
> > > allocations out of the reservation. I agree with the sentiment I just
> > > don't see how to practically achieve an optional "System RAM" vs
> > > "Application Reserved" routing decision without an early (before
> > > e820__memblock_setup()) conditional branch.
> >
> > I can at least move it out of include/linux/efi.h and move it to
> > arch/x86/include/asm/efi.h since it is an x86 specific policy decision
> > / implementation for now.
>
> No, that doesn't make sense to me. If it must live in the EFI
> subsystem, I'd prefer it to be in the core code, not in x86 specific
> code, since there is nothing x86 specific about it.

Ok, but it's still not clear to me where you would accept an early
detection of EFI_CONVENTIONAL_MEMORY + EFI_MEMORY_SP and route it away
from the "System RAM" default. Please just recommend a place to land a
conditional branch that translates between the base EFI type +
attribute and E820_RAM and E820_APPLICATION_RESERVED.

> Perhaps a efi=xxx command line option would be in order to influence
> the builtin default, but it can be a followup patch independent of
> this series.

Sure, but I expect the default polarity of the branch is a compile
time option with an efi= override.


Re: [PATCH v2 4/8] x86, efi: Reserve UEFI 2.8 Specific Purpose Memory for dax

2019-06-08 Thread Ard Biesheuvel
On Fri, 7 Jun 2019 at 19:34, Dan Williams  wrote:
>
> On Fri, Jun 7, 2019 at 8:23 AM Dan Williams  wrote:
> >
> > On Fri, Jun 7, 2019 at 5:29 AM Ard Biesheuvel  
> > wrote:
> [..]
> > > > #ifdef CONFIG_EFI_APPLICATION_RESERVED
> > > > static inline bool is_efi_application_reserved(efi_memory_desc_t *md)
> > > > {
> > > > return md->type == EFI_CONVENTIONAL_MEMORY
> > > > && (md->attribute & EFI_MEMORY_SP);
> > > > }
> > > > #else
> > > > static inline bool is_efi_application_reserved(efi_memory_desc_t *md)
> > > > {
> > > > return false;
> > > > }
> > > > #endif
> > >
> > > I think this policy decision should not live inside the EFI subsystem.
> > > EFI just gives you the memory map, and mangling that information
> > > depending on whether you think a certain memory attribute should be
> > > ignored is the job of the MM subsystem.
> >
> > The problem is that we don't have an mm subsystem at the time a
> > decision needs to be made. The reservation policy needs to be deployed
> > before even memblock has been initialized in order to keep kernel
> > allocations out of the reservation. I agree with the sentiment I just
> > don't see how to practically achieve an optional "System RAM" vs
> > "Application Reserved" routing decision without an early (before
> > e820__memblock_setup()) conditional branch.
>
> I can at least move it out of include/linux/efi.h and move it to
> arch/x86/include/asm/efi.h since it is an x86 specific policy decision
> / implementation for now.

No, that doesn't make sense to me. If it must live in the EFI
subsystem, I'd prefer it to be in the core code, not in x86 specific
code, since there is nothing x86 specific about it.

Perhaps a efi=xxx command line option would be in order to influence
the builtin default, but it can be a followup patch independent of
this series.


Re: [PATCH v2 4/8] x86, efi: Reserve UEFI 2.8 Specific Purpose Memory for dax

2019-06-07 Thread Dan Williams
On Fri, Jun 7, 2019 at 8:23 AM Dan Williams  wrote:
>
> On Fri, Jun 7, 2019 at 5:29 AM Ard Biesheuvel  
> wrote:
[..]
> > > #ifdef CONFIG_EFI_APPLICATION_RESERVED
> > > static inline bool is_efi_application_reserved(efi_memory_desc_t *md)
> > > {
> > > return md->type == EFI_CONVENTIONAL_MEMORY
> > > && (md->attribute & EFI_MEMORY_SP);
> > > }
> > > #else
> > > static inline bool is_efi_application_reserved(efi_memory_desc_t *md)
> > > {
> > > return false;
> > > }
> > > #endif
> >
> > I think this policy decision should not live inside the EFI subsystem.
> > EFI just gives you the memory map, and mangling that information
> > depending on whether you think a certain memory attribute should be
> > ignored is the job of the MM subsystem.
>
> The problem is that we don't have an mm subsystem at the time a
> decision needs to be made. The reservation policy needs to be deployed
> before even memblock has been initialized in order to keep kernel
> allocations out of the reservation. I agree with the sentiment I just
> don't see how to practically achieve an optional "System RAM" vs
> "Application Reserved" routing decision without an early (before
> e820__memblock_setup()) conditional branch.

I can at least move it out of include/linux/efi.h and move it to
arch/x86/include/asm/efi.h since it is an x86 specific policy decision
/ implementation for now.


Re: [PATCH v2 4/8] x86, efi: Reserve UEFI 2.8 Specific Purpose Memory for dax

2019-06-07 Thread Dan Williams
On Fri, Jun 7, 2019 at 5:29 AM Ard Biesheuvel  wrote:
>
> On Sat, 1 Jun 2019 at 06:26, Dan Williams  wrote:
> >
> > On Fri, May 31, 2019 at 8:30 AM Ard Biesheuvel
> >  wrote:
> > >
> > > On Fri, 31 May 2019 at 17:28, Dan Williams  
> > > wrote:
> > > >
> > > > On Fri, May 31, 2019 at 1:30 AM Ard Biesheuvel
> > > >  wrote:
> > > > >
> > > > > (cc Mike for memblock)
> > > > >
> > > > > On Fri, 31 May 2019 at 01:13, Dan Williams  
> > > > > wrote:
> > > > > >
> > > > > > UEFI 2.8 defines an EFI_MEMORY_SP attribute bit to augment the
> > > > > > interpretation of the EFI Memory Types as "reserved for a special
> > > > > > purpose".
> > > > > >
> > > > > > The proposed Linux behavior for specific purpose memory is that it 
> > > > > > is
> > > > > > reserved for direct-access (device-dax) by default and not 
> > > > > > available for
> > > > > > any kernel usage, not even as an OOM fallback. Later, through udev
> > > > > > scripts or another init mechanism, these device-dax claimed ranges 
> > > > > > can
> > > > > > be reconfigured and hot-added to the available System-RAM with a 
> > > > > > unique
> > > > > > node identifier.
> > > > > >
> > > > > > This patch introduces 3 new concepts at once given the entanglement
> > > > > > between early boot enumeration relative to memory that can 
> > > > > > optionally be
> > > > > > reserved from the kernel page allocator by default. The new concepts
> > > > > > are:
> > > > > >
> > > > > > - E820_TYPE_SPECIFIC: Upon detecting the EFI_MEMORY_SP attribute on
> > > > > >   EFI_CONVENTIONAL memory, update the E820 map with this new type. 
> > > > > > Only
> > > > > >   perform this classification if the CONFIG_EFI_SPECIFIC_DAX=y 
> > > > > > policy is
> > > > > >   enabled, otherwise treat it as typical ram.
> > > > > >
> > > > >
> > > > > OK, so now we have 'special purpose', 'specific' and 'app specific'
> > > > > [below]. Do they all mean the same thing?
> > > >
> > > > I struggled with separating the raw-EFI-type name from the name of the
> > > > Linux specific policy. Since the reservation behavior is optional I
> > > > was thinking there should be a distinct Linux kernel name for that
> > > > policy. I did try to go back and change all occurrences of "special"
> > > > to "specific" from the RFC to this v2, but seems I missed one.
> > > >
> > >
> > > OK
> >
> > I'll go ahead and use "application reserved" terminology consistently
> > throughout the code to distinguish that Linux translation from the raw
> > "EFI specific purpose" attribute.
> >
>
> OK
>
> > >
> > > > >
> > > > > > - IORES_DESC_APPLICATION_RESERVED: Add a new I/O resource 
> > > > > > descriptor for
> > > > > >   a device driver to search iomem resources for application specific
> > > > > >   memory. Teach the iomem code to identify such ranges as 
> > > > > > "Application
> > > > > >   Reserved".
> > > > > >
> > > > > > - MEMBLOCK_APP_SPECIFIC: Given the memory ranges can fallback to the
> > > > > >   traditional System RAM pool the expectation is that they will have
> > > > > >   typical SRAT entries. In order to support a policy of device-dax 
> > > > > > by
> > > > > >   default with the option to hotplug later, the numa initialization 
> > > > > > code
> > > > > >   is taught to avoid marking online MEMBLOCK_APP_SPECIFIC regions.
> > > > > >
> > > > >
> > > > > Can we move the generic memblock changes into a separate patch please?
> > > >
> > > > Yeah, that can move to a lead-in patch.
> > > >
> > > > [..]
> > > > > > diff --git a/include/linux/efi.h b/include/linux/efi.h
> > > > > > index 91368f5ce114..b57b123cbdf9 100644
> > > > > > --- a/include/linux/efi.h
> > > > > > +++ b/include/linux/efi.h
> > > > > > @@ -129,6 +129,19 @@ typedef struct {
> > > > > > u64 attribute;
> > > > > >  } efi_memory_desc_t;
> > > > > >
> > > > > > +#ifdef CONFIG_EFI_SPECIFIC_DAX
> > > > > > +static inline bool is_efi_dax(efi_memory_desc_t *md)
> > > > > > +{
> > > > > > +   return md->type == EFI_CONVENTIONAL_MEMORY
> > > > > > +   && (md->attribute & EFI_MEMORY_SP);
> > > > > > +}
> > > > > > +#else
> > > > > > +static inline bool is_efi_dax(efi_memory_desc_t *md)
> > > > > > +{
> > > > > > +   return false;
> > > > > > +}
> > > > > > +#endif
> > > > > > +
> > > > > >  typedef struct {
> > > > > > efi_guid_t guid;
> > > > > > u32 headersize;
> > > > >
> > > > > I'd prefer it if we could avoid this DAX policy distinction leaking
> > > > > into the EFI layer.
> > > > >
> > > > > IOW, I am fine with having a 'is_efi_sp_memory()' helper here, but
> > > > > whether that is DAX memory or not should be decided in the DAX layer.
> > > >
> > > > Ok, how about is_efi_sp_ram()? Since EFI_MEMORY_SP might be applied to
> > > > things that aren't EFI_CONVENTIONAL_MEMORY.
> > >
> > > Yes, that is fine. As long as the #ifdef lives in the DAX code and not 
> > > here.
> >
> > We still need some ifdef in the efi core because that is the central
> > location to make the policy 

Re: [PATCH v2 4/8] x86, efi: Reserve UEFI 2.8 Specific Purpose Memory for dax

2019-06-07 Thread Ard Biesheuvel
On Sat, 1 Jun 2019 at 06:26, Dan Williams  wrote:
>
> On Fri, May 31, 2019 at 8:30 AM Ard Biesheuvel
>  wrote:
> >
> > On Fri, 31 May 2019 at 17:28, Dan Williams  wrote:
> > >
> > > On Fri, May 31, 2019 at 1:30 AM Ard Biesheuvel
> > >  wrote:
> > > >
> > > > (cc Mike for memblock)
> > > >
> > > > On Fri, 31 May 2019 at 01:13, Dan Williams  
> > > > wrote:
> > > > >
> > > > > UEFI 2.8 defines an EFI_MEMORY_SP attribute bit to augment the
> > > > > interpretation of the EFI Memory Types as "reserved for a special
> > > > > purpose".
> > > > >
> > > > > The proposed Linux behavior for specific purpose memory is that it is
> > > > > reserved for direct-access (device-dax) by default and not available 
> > > > > for
> > > > > any kernel usage, not even as an OOM fallback. Later, through udev
> > > > > scripts or another init mechanism, these device-dax claimed ranges can
> > > > > be reconfigured and hot-added to the available System-RAM with a 
> > > > > unique
> > > > > node identifier.
> > > > >
> > > > > This patch introduces 3 new concepts at once given the entanglement
> > > > > between early boot enumeration relative to memory that can optionally 
> > > > > be
> > > > > reserved from the kernel page allocator by default. The new concepts
> > > > > are:
> > > > >
> > > > > - E820_TYPE_SPECIFIC: Upon detecting the EFI_MEMORY_SP attribute on
> > > > >   EFI_CONVENTIONAL memory, update the E820 map with this new type. 
> > > > > Only
> > > > >   perform this classification if the CONFIG_EFI_SPECIFIC_DAX=y policy 
> > > > > is
> > > > >   enabled, otherwise treat it as typical ram.
> > > > >
> > > >
> > > > OK, so now we have 'special purpose', 'specific' and 'app specific'
> > > > [below]. Do they all mean the same thing?
> > >
> > > I struggled with separating the raw-EFI-type name from the name of the
> > > Linux specific policy. Since the reservation behavior is optional I
> > > was thinking there should be a distinct Linux kernel name for that
> > > policy. I did try to go back and change all occurrences of "special"
> > > to "specific" from the RFC to this v2, but seems I missed one.
> > >
> >
> > OK
>
> I'll go ahead and use "application reserved" terminology consistently
> throughout the code to distinguish that Linux translation from the raw
> "EFI specific purpose" attribute.
>

OK

> >
> > > >
> > > > > - IORES_DESC_APPLICATION_RESERVED: Add a new I/O resource descriptor 
> > > > > for
> > > > >   a device driver to search iomem resources for application specific
> > > > >   memory. Teach the iomem code to identify such ranges as "Application
> > > > >   Reserved".
> > > > >
> > > > > - MEMBLOCK_APP_SPECIFIC: Given the memory ranges can fallback to the
> > > > >   traditional System RAM pool the expectation is that they will have
> > > > >   typical SRAT entries. In order to support a policy of device-dax by
> > > > >   default with the option to hotplug later, the numa initialization 
> > > > > code
> > > > >   is taught to avoid marking online MEMBLOCK_APP_SPECIFIC regions.
> > > > >
> > > >
> > > > Can we move the generic memblock changes into a separate patch please?
> > >
> > > Yeah, that can move to a lead-in patch.
> > >
> > > [..]
> > > > > diff --git a/include/linux/efi.h b/include/linux/efi.h
> > > > > index 91368f5ce114..b57b123cbdf9 100644
> > > > > --- a/include/linux/efi.h
> > > > > +++ b/include/linux/efi.h
> > > > > @@ -129,6 +129,19 @@ typedef struct {
> > > > > u64 attribute;
> > > > >  } efi_memory_desc_t;
> > > > >
> > > > > +#ifdef CONFIG_EFI_SPECIFIC_DAX
> > > > > +static inline bool is_efi_dax(efi_memory_desc_t *md)
> > > > > +{
> > > > > +   return md->type == EFI_CONVENTIONAL_MEMORY
> > > > > +   && (md->attribute & EFI_MEMORY_SP);
> > > > > +}
> > > > > +#else
> > > > > +static inline bool is_efi_dax(efi_memory_desc_t *md)
> > > > > +{
> > > > > +   return false;
> > > > > +}
> > > > > +#endif
> > > > > +
> > > > >  typedef struct {
> > > > > efi_guid_t guid;
> > > > > u32 headersize;
> > > >
> > > > I'd prefer it if we could avoid this DAX policy distinction leaking
> > > > into the EFI layer.
> > > >
> > > > IOW, I am fine with having a 'is_efi_sp_memory()' helper here, but
> > > > whether that is DAX memory or not should be decided in the DAX layer.
> > >
> > > Ok, how about is_efi_sp_ram()? Since EFI_MEMORY_SP might be applied to
> > > things that aren't EFI_CONVENTIONAL_MEMORY.
> >
> > Yes, that is fine. As long as the #ifdef lives in the DAX code and not here.
>
> We still need some ifdef in the efi core because that is the central
> location to make the policy distinction to identify identify
> EFI_CONVENTIONAL_MEMORY differently depending on whether EFI_MEMORY_SP
> is present. I agree with you that "dax" should be dropped from the
> naming. So how about:
>
> #ifdef CONFIG_EFI_APPLICATION_RESERVED
> static inline bool is_efi_application_reserved(efi_memory_desc_t *md)
> {
> return md->type == 

Re: [PATCH v2 4/8] x86, efi: Reserve UEFI 2.8 Specific Purpose Memory for dax

2019-06-05 Thread Dan Williams
On Sun, Jun 2, 2019 at 10:42 PM Mike Rapoport  wrote:
>
> Hi Dan,
>
> On Thu, May 30, 2019 at 03:59:43PM -0700, Dan Williams wrote:
> > UEFI 2.8 defines an EFI_MEMORY_SP attribute bit to augment the
> > interpretation of the EFI Memory Types as "reserved for a special
> > purpose".
> >
> > The proposed Linux behavior for specific purpose memory is that it is
> > reserved for direct-access (device-dax) by default and not available for
> > any kernel usage, not even as an OOM fallback. Later, through udev
> > scripts or another init mechanism, these device-dax claimed ranges can
> > be reconfigured and hot-added to the available System-RAM with a unique
> > node identifier.
> >
> > This patch introduces 3 new concepts at once given the entanglement
> > between early boot enumeration relative to memory that can optionally be
> > reserved from the kernel page allocator by default. The new concepts
> > are:
> >
> > - E820_TYPE_SPECIFIC: Upon detecting the EFI_MEMORY_SP attribute on
> >   EFI_CONVENTIONAL memory, update the E820 map with this new type. Only
> >   perform this classification if the CONFIG_EFI_SPECIFIC_DAX=y policy is
> >   enabled, otherwise treat it as typical ram.
> >
> > - IORES_DESC_APPLICATION_RESERVED: Add a new I/O resource descriptor for
> >   a device driver to search iomem resources for application specific
> >   memory. Teach the iomem code to identify such ranges as "Application
> >   Reserved".
> >
> > - MEMBLOCK_APP_SPECIFIC: Given the memory ranges can fallback to the
> >   traditional System RAM pool the expectation is that they will have
> >   typical SRAT entries. In order to support a policy of device-dax by
> >   default with the option to hotplug later, the numa initialization code
> >   is taught to avoid marking online MEMBLOCK_APP_SPECIFIC regions.
>
> I'd appreciate a more elaborate description how this flag is going to be
> used.

This flag is only there to communicate to the numa code what ranges of
"conventional memory" should be skipped for onlining and reserved for
device-dax to consume. However, now that I say that out loud I realize
I might be able to get away with just using a plain entry
memblock.reserved. I'll take a look.

>
> > A follow-on change integrates parsing of the ACPI HMAT to identify the
> > node and sub-range boundaries of EFI_MEMORY_SP designated memory. For
> > now, just identify and reserve memory of this type.
> >
> > Cc: 
> > Cc: Borislav Petkov 
> > Cc: Ingo Molnar 
> > Cc: "H. Peter Anvin" 
> > Cc: Darren Hart 
> > Cc: Andy Shevchenko 
> > Cc: Thomas Gleixner 
> > Cc: Ard Biesheuvel 
> > Reported-by: kbuild test robot 
> > Signed-off-by: Dan Williams 
> > ---
> >  arch/x86/Kconfig  |   20 
> >  arch/x86/boot/compressed/eboot.c  |5 -
> >  arch/x86/boot/compressed/kaslr.c  |2 +-
> >  arch/x86/include/asm/e820/types.h |9 +
> >  arch/x86/kernel/e820.c|9 +++--
> >  arch/x86/kernel/setup.c   |1 +
> >  arch/x86/platform/efi/efi.c   |   37 
> > +
> >  drivers/acpi/numa.c   |   15 ++-
> >  include/linux/efi.h   |   14 ++
> >  include/linux/ioport.h|1 +
> >  include/linux/memblock.h  |7 +++
> >  mm/memblock.c |4 
> >  12 files changed, 115 insertions(+), 9 deletions(-)
>
> ...
>
> > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> > index 08a5f4a131f5..ddde1c7b1f9a 100644
> > --- a/arch/x86/kernel/setup.c
> > +++ b/arch/x86/kernel/setup.c
> > @@ -1109,6 +1109,7 @@ void __init setup_arch(char **cmdline_p)
> >
> >   if (efi_enabled(EFI_MEMMAP)) {
> >   efi_fake_memmap();
> > + efi_find_app_specific();
> >   efi_find_mirror();
> >   efi_esrt_init();
> >
> > diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
> > index e1cb01a22fa8..899f1305c77a 100644
> > --- a/arch/x86/platform/efi/efi.c
> > +++ b/arch/x86/platform/efi/efi.c
> > @@ -123,10 +123,15 @@ void __init efi_find_mirror(void)
> >   * more than the max 128 entries that can fit in the e820 legacy
> >   * (zeropage) memory map.
> >   */
> > +enum add_efi_mode {
> > + ADD_EFI_ALL,
> > + ADD_EFI_APP_SPECIFIC,
> > +};
> >
> > -static void __init do_add_efi_memmap(void)
> > +static void __init do_add_efi_memmap(enum add_efi_mode mode)
> >  {
> >   efi_memory_desc_t *md;
> > + int add = 0;
> >
> >   for_each_efi_memory_desc(md) {
> >   unsigned long long start = md->phys_addr;
> > @@ -139,7 +144,9 @@ static void __init do_add_efi_memmap(void)
> >   case EFI_BOOT_SERVICES_CODE:
> >   case EFI_BOOT_SERVICES_DATA:
> >   case EFI_CONVENTIONAL_MEMORY:
> > - if (md->attribute & EFI_MEMORY_WB)
> > + if (is_efi_dax(md))
> > + e820_type = E820_TYPE_SPECIFIC;

Re: [PATCH v2 4/8] x86, efi: Reserve UEFI 2.8 Specific Purpose Memory for dax

2019-06-02 Thread Mike Rapoport
Hi Dan,

On Thu, May 30, 2019 at 03:59:43PM -0700, Dan Williams wrote:
> UEFI 2.8 defines an EFI_MEMORY_SP attribute bit to augment the
> interpretation of the EFI Memory Types as "reserved for a special
> purpose".
> 
> The proposed Linux behavior for specific purpose memory is that it is
> reserved for direct-access (device-dax) by default and not available for
> any kernel usage, not even as an OOM fallback. Later, through udev
> scripts or another init mechanism, these device-dax claimed ranges can
> be reconfigured and hot-added to the available System-RAM with a unique
> node identifier.
> 
> This patch introduces 3 new concepts at once given the entanglement
> between early boot enumeration relative to memory that can optionally be
> reserved from the kernel page allocator by default. The new concepts
> are:
> 
> - E820_TYPE_SPECIFIC: Upon detecting the EFI_MEMORY_SP attribute on
>   EFI_CONVENTIONAL memory, update the E820 map with this new type. Only
>   perform this classification if the CONFIG_EFI_SPECIFIC_DAX=y policy is
>   enabled, otherwise treat it as typical ram.
> 
> - IORES_DESC_APPLICATION_RESERVED: Add a new I/O resource descriptor for
>   a device driver to search iomem resources for application specific
>   memory. Teach the iomem code to identify such ranges as "Application
>   Reserved".
> 
> - MEMBLOCK_APP_SPECIFIC: Given the memory ranges can fallback to the
>   traditional System RAM pool the expectation is that they will have
>   typical SRAT entries. In order to support a policy of device-dax by
>   default with the option to hotplug later, the numa initialization code
>   is taught to avoid marking online MEMBLOCK_APP_SPECIFIC regions.

I'd appreciate a more elaborate description how this flag is going to be
used.
 
> A follow-on change integrates parsing of the ACPI HMAT to identify the
> node and sub-range boundaries of EFI_MEMORY_SP designated memory. For
> now, just identify and reserve memory of this type.
> 
> Cc: 
> Cc: Borislav Petkov 
> Cc: Ingo Molnar 
> Cc: "H. Peter Anvin" 
> Cc: Darren Hart 
> Cc: Andy Shevchenko 
> Cc: Thomas Gleixner 
> Cc: Ard Biesheuvel 
> Reported-by: kbuild test robot 
> Signed-off-by: Dan Williams 
> ---
>  arch/x86/Kconfig  |   20 
>  arch/x86/boot/compressed/eboot.c  |5 -
>  arch/x86/boot/compressed/kaslr.c  |2 +-
>  arch/x86/include/asm/e820/types.h |9 +
>  arch/x86/kernel/e820.c|9 +++--
>  arch/x86/kernel/setup.c   |1 +
>  arch/x86/platform/efi/efi.c   |   37 
> +
>  drivers/acpi/numa.c   |   15 ++-
>  include/linux/efi.h   |   14 ++
>  include/linux/ioport.h|1 +
>  include/linux/memblock.h  |7 +++
>  mm/memblock.c |4 
>  12 files changed, 115 insertions(+), 9 deletions(-)
 
...

> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 08a5f4a131f5..ddde1c7b1f9a 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -1109,6 +1109,7 @@ void __init setup_arch(char **cmdline_p)
> 
>   if (efi_enabled(EFI_MEMMAP)) {
>   efi_fake_memmap();
> + efi_find_app_specific();
>   efi_find_mirror();
>   efi_esrt_init();
> 
> diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
> index e1cb01a22fa8..899f1305c77a 100644
> --- a/arch/x86/platform/efi/efi.c
> +++ b/arch/x86/platform/efi/efi.c
> @@ -123,10 +123,15 @@ void __init efi_find_mirror(void)
>   * more than the max 128 entries that can fit in the e820 legacy
>   * (zeropage) memory map.
>   */
> +enum add_efi_mode {
> + ADD_EFI_ALL,
> + ADD_EFI_APP_SPECIFIC,
> +};
> 
> -static void __init do_add_efi_memmap(void)
> +static void __init do_add_efi_memmap(enum add_efi_mode mode)
>  {
>   efi_memory_desc_t *md;
> + int add = 0;
> 
>   for_each_efi_memory_desc(md) {
>   unsigned long long start = md->phys_addr;
> @@ -139,7 +144,9 @@ static void __init do_add_efi_memmap(void)
>   case EFI_BOOT_SERVICES_CODE:
>   case EFI_BOOT_SERVICES_DATA:
>   case EFI_CONVENTIONAL_MEMORY:
> - if (md->attribute & EFI_MEMORY_WB)
> + if (is_efi_dax(md))
> + e820_type = E820_TYPE_SPECIFIC;
> + else if (md->attribute & EFI_MEMORY_WB)
>   e820_type = E820_TYPE_RAM;
>   else
>   e820_type = E820_TYPE_RESERVED;
> @@ -165,9 +172,24 @@ static void __init do_add_efi_memmap(void)
>   e820_type = E820_TYPE_RESERVED;
>   break;
>   }
> +
> + if (e820_type == E820_TYPE_SPECIFIC) {
> + memblock_remove(start, size);
> + memblock_add_range(, start, size,
> +   

Re: [PATCH v2 4/8] x86, efi: Reserve UEFI 2.8 Specific Purpose Memory for dax

2019-05-31 Thread Dan Williams
On Fri, May 31, 2019 at 8:30 AM Ard Biesheuvel
 wrote:
>
> On Fri, 31 May 2019 at 17:28, Dan Williams  wrote:
> >
> > On Fri, May 31, 2019 at 1:30 AM Ard Biesheuvel
> >  wrote:
> > >
> > > (cc Mike for memblock)
> > >
> > > On Fri, 31 May 2019 at 01:13, Dan Williams  
> > > wrote:
> > > >
> > > > UEFI 2.8 defines an EFI_MEMORY_SP attribute bit to augment the
> > > > interpretation of the EFI Memory Types as "reserved for a special
> > > > purpose".
> > > >
> > > > The proposed Linux behavior for specific purpose memory is that it is
> > > > reserved for direct-access (device-dax) by default and not available for
> > > > any kernel usage, not even as an OOM fallback. Later, through udev
> > > > scripts or another init mechanism, these device-dax claimed ranges can
> > > > be reconfigured and hot-added to the available System-RAM with a unique
> > > > node identifier.
> > > >
> > > > This patch introduces 3 new concepts at once given the entanglement
> > > > between early boot enumeration relative to memory that can optionally be
> > > > reserved from the kernel page allocator by default. The new concepts
> > > > are:
> > > >
> > > > - E820_TYPE_SPECIFIC: Upon detecting the EFI_MEMORY_SP attribute on
> > > >   EFI_CONVENTIONAL memory, update the E820 map with this new type. Only
> > > >   perform this classification if the CONFIG_EFI_SPECIFIC_DAX=y policy is
> > > >   enabled, otherwise treat it as typical ram.
> > > >
> > >
> > > OK, so now we have 'special purpose', 'specific' and 'app specific'
> > > [below]. Do they all mean the same thing?
> >
> > I struggled with separating the raw-EFI-type name from the name of the
> > Linux specific policy. Since the reservation behavior is optional I
> > was thinking there should be a distinct Linux kernel name for that
> > policy. I did try to go back and change all occurrences of "special"
> > to "specific" from the RFC to this v2, but seems I missed one.
> >
>
> OK

I'll go ahead and use "application reserved" terminology consistently
throughout the code to distinguish that Linux translation from the raw
"EFI specific purpose" attribute.

>
> > >
> > > > - IORES_DESC_APPLICATION_RESERVED: Add a new I/O resource descriptor for
> > > >   a device driver to search iomem resources for application specific
> > > >   memory. Teach the iomem code to identify such ranges as "Application
> > > >   Reserved".
> > > >
> > > > - MEMBLOCK_APP_SPECIFIC: Given the memory ranges can fallback to the
> > > >   traditional System RAM pool the expectation is that they will have
> > > >   typical SRAT entries. In order to support a policy of device-dax by
> > > >   default with the option to hotplug later, the numa initialization code
> > > >   is taught to avoid marking online MEMBLOCK_APP_SPECIFIC regions.
> > > >
> > >
> > > Can we move the generic memblock changes into a separate patch please?
> >
> > Yeah, that can move to a lead-in patch.
> >
> > [..]
> > > > diff --git a/include/linux/efi.h b/include/linux/efi.h
> > > > index 91368f5ce114..b57b123cbdf9 100644
> > > > --- a/include/linux/efi.h
> > > > +++ b/include/linux/efi.h
> > > > @@ -129,6 +129,19 @@ typedef struct {
> > > > u64 attribute;
> > > >  } efi_memory_desc_t;
> > > >
> > > > +#ifdef CONFIG_EFI_SPECIFIC_DAX
> > > > +static inline bool is_efi_dax(efi_memory_desc_t *md)
> > > > +{
> > > > +   return md->type == EFI_CONVENTIONAL_MEMORY
> > > > +   && (md->attribute & EFI_MEMORY_SP);
> > > > +}
> > > > +#else
> > > > +static inline bool is_efi_dax(efi_memory_desc_t *md)
> > > > +{
> > > > +   return false;
> > > > +}
> > > > +#endif
> > > > +
> > > >  typedef struct {
> > > > efi_guid_t guid;
> > > > u32 headersize;
> > >
> > > I'd prefer it if we could avoid this DAX policy distinction leaking
> > > into the EFI layer.
> > >
> > > IOW, I am fine with having a 'is_efi_sp_memory()' helper here, but
> > > whether that is DAX memory or not should be decided in the DAX layer.
> >
> > Ok, how about is_efi_sp_ram()? Since EFI_MEMORY_SP might be applied to
> > things that aren't EFI_CONVENTIONAL_MEMORY.
>
> Yes, that is fine. As long as the #ifdef lives in the DAX code and not here.

We still need some ifdef in the efi core because that is the central
location to make the policy distinction to identify identify
EFI_CONVENTIONAL_MEMORY differently depending on whether EFI_MEMORY_SP
is present. I agree with you that "dax" should be dropped from the
naming. So how about:

#ifdef CONFIG_EFI_APPLICATION_RESERVED
static inline bool is_efi_application_reserved(efi_memory_desc_t *md)
{
return md->type == EFI_CONVENTIONAL_MEMORY
&& (md->attribute & EFI_MEMORY_SP);
}
#else
static inline bool is_efi_application_reserved(efi_memory_desc_t *md)
{
return false;
}
#endif


Re: [PATCH v2 4/8] x86, efi: Reserve UEFI 2.8 Specific Purpose Memory for dax

2019-05-31 Thread Ard Biesheuvel
On Fri, 31 May 2019 at 17:28, Dan Williams  wrote:
>
> On Fri, May 31, 2019 at 1:30 AM Ard Biesheuvel
>  wrote:
> >
> > (cc Mike for memblock)
> >
> > On Fri, 31 May 2019 at 01:13, Dan Williams  wrote:
> > >
> > > UEFI 2.8 defines an EFI_MEMORY_SP attribute bit to augment the
> > > interpretation of the EFI Memory Types as "reserved for a special
> > > purpose".
> > >
> > > The proposed Linux behavior for specific purpose memory is that it is
> > > reserved for direct-access (device-dax) by default and not available for
> > > any kernel usage, not even as an OOM fallback. Later, through udev
> > > scripts or another init mechanism, these device-dax claimed ranges can
> > > be reconfigured and hot-added to the available System-RAM with a unique
> > > node identifier.
> > >
> > > This patch introduces 3 new concepts at once given the entanglement
> > > between early boot enumeration relative to memory that can optionally be
> > > reserved from the kernel page allocator by default. The new concepts
> > > are:
> > >
> > > - E820_TYPE_SPECIFIC: Upon detecting the EFI_MEMORY_SP attribute on
> > >   EFI_CONVENTIONAL memory, update the E820 map with this new type. Only
> > >   perform this classification if the CONFIG_EFI_SPECIFIC_DAX=y policy is
> > >   enabled, otherwise treat it as typical ram.
> > >
> >
> > OK, so now we have 'special purpose', 'specific' and 'app specific'
> > [below]. Do they all mean the same thing?
>
> I struggled with separating the raw-EFI-type name from the name of the
> Linux specific policy. Since the reservation behavior is optional I
> was thinking there should be a distinct Linux kernel name for that
> policy. I did try to go back and change all occurrences of "special"
> to "specific" from the RFC to this v2, but seems I missed one.
>

OK

> >
> > > - IORES_DESC_APPLICATION_RESERVED: Add a new I/O resource descriptor for
> > >   a device driver to search iomem resources for application specific
> > >   memory. Teach the iomem code to identify such ranges as "Application
> > >   Reserved".
> > >
> > > - MEMBLOCK_APP_SPECIFIC: Given the memory ranges can fallback to the
> > >   traditional System RAM pool the expectation is that they will have
> > >   typical SRAT entries. In order to support a policy of device-dax by
> > >   default with the option to hotplug later, the numa initialization code
> > >   is taught to avoid marking online MEMBLOCK_APP_SPECIFIC regions.
> > >
> >
> > Can we move the generic memblock changes into a separate patch please?
>
> Yeah, that can move to a lead-in patch.
>
> [..]
> > > diff --git a/include/linux/efi.h b/include/linux/efi.h
> > > index 91368f5ce114..b57b123cbdf9 100644
> > > --- a/include/linux/efi.h
> > > +++ b/include/linux/efi.h
> > > @@ -129,6 +129,19 @@ typedef struct {
> > > u64 attribute;
> > >  } efi_memory_desc_t;
> > >
> > > +#ifdef CONFIG_EFI_SPECIFIC_DAX
> > > +static inline bool is_efi_dax(efi_memory_desc_t *md)
> > > +{
> > > +   return md->type == EFI_CONVENTIONAL_MEMORY
> > > +   && (md->attribute & EFI_MEMORY_SP);
> > > +}
> > > +#else
> > > +static inline bool is_efi_dax(efi_memory_desc_t *md)
> > > +{
> > > +   return false;
> > > +}
> > > +#endif
> > > +
> > >  typedef struct {
> > > efi_guid_t guid;
> > > u32 headersize;
> >
> > I'd prefer it if we could avoid this DAX policy distinction leaking
> > into the EFI layer.
> >
> > IOW, I am fine with having a 'is_efi_sp_memory()' helper here, but
> > whether that is DAX memory or not should be decided in the DAX layer.
>
> Ok, how about is_efi_sp_ram()? Since EFI_MEMORY_SP might be applied to
> things that aren't EFI_CONVENTIONAL_MEMORY.

Yes, that is fine. As long as the #ifdef lives in the DAX code and not here.


Re: [PATCH v2 4/8] x86, efi: Reserve UEFI 2.8 Specific Purpose Memory for dax

2019-05-31 Thread Dan Williams
On Fri, May 31, 2019 at 1:30 AM Ard Biesheuvel
 wrote:
>
> (cc Mike for memblock)
>
> On Fri, 31 May 2019 at 01:13, Dan Williams  wrote:
> >
> > UEFI 2.8 defines an EFI_MEMORY_SP attribute bit to augment the
> > interpretation of the EFI Memory Types as "reserved for a special
> > purpose".
> >
> > The proposed Linux behavior for specific purpose memory is that it is
> > reserved for direct-access (device-dax) by default and not available for
> > any kernel usage, not even as an OOM fallback. Later, through udev
> > scripts or another init mechanism, these device-dax claimed ranges can
> > be reconfigured and hot-added to the available System-RAM with a unique
> > node identifier.
> >
> > This patch introduces 3 new concepts at once given the entanglement
> > between early boot enumeration relative to memory that can optionally be
> > reserved from the kernel page allocator by default. The new concepts
> > are:
> >
> > - E820_TYPE_SPECIFIC: Upon detecting the EFI_MEMORY_SP attribute on
> >   EFI_CONVENTIONAL memory, update the E820 map with this new type. Only
> >   perform this classification if the CONFIG_EFI_SPECIFIC_DAX=y policy is
> >   enabled, otherwise treat it as typical ram.
> >
>
> OK, so now we have 'special purpose', 'specific' and 'app specific'
> [below]. Do they all mean the same thing?

I struggled with separating the raw-EFI-type name from the name of the
Linux specific policy. Since the reservation behavior is optional I
was thinking there should be a distinct Linux kernel name for that
policy. I did try to go back and change all occurrences of "special"
to "specific" from the RFC to this v2, but seems I missed one.

>
> > - IORES_DESC_APPLICATION_RESERVED: Add a new I/O resource descriptor for
> >   a device driver to search iomem resources for application specific
> >   memory. Teach the iomem code to identify such ranges as "Application
> >   Reserved".
> >
> > - MEMBLOCK_APP_SPECIFIC: Given the memory ranges can fallback to the
> >   traditional System RAM pool the expectation is that they will have
> >   typical SRAT entries. In order to support a policy of device-dax by
> >   default with the option to hotplug later, the numa initialization code
> >   is taught to avoid marking online MEMBLOCK_APP_SPECIFIC regions.
> >
>
> Can we move the generic memblock changes into a separate patch please?

Yeah, that can move to a lead-in patch.

[..]
> > diff --git a/include/linux/efi.h b/include/linux/efi.h
> > index 91368f5ce114..b57b123cbdf9 100644
> > --- a/include/linux/efi.h
> > +++ b/include/linux/efi.h
> > @@ -129,6 +129,19 @@ typedef struct {
> > u64 attribute;
> >  } efi_memory_desc_t;
> >
> > +#ifdef CONFIG_EFI_SPECIFIC_DAX
> > +static inline bool is_efi_dax(efi_memory_desc_t *md)
> > +{
> > +   return md->type == EFI_CONVENTIONAL_MEMORY
> > +   && (md->attribute & EFI_MEMORY_SP);
> > +}
> > +#else
> > +static inline bool is_efi_dax(efi_memory_desc_t *md)
> > +{
> > +   return false;
> > +}
> > +#endif
> > +
> >  typedef struct {
> > efi_guid_t guid;
> > u32 headersize;
>
> I'd prefer it if we could avoid this DAX policy distinction leaking
> into the EFI layer.
>
> IOW, I am fine with having a 'is_efi_sp_memory()' helper here, but
> whether that is DAX memory or not should be decided in the DAX layer.

Ok, how about is_efi_sp_ram()? Since EFI_MEMORY_SP might be applied to
things that aren't EFI_CONVENTIONAL_MEMORY.


Re: [PATCH v2 4/8] x86, efi: Reserve UEFI 2.8 Specific Purpose Memory for dax

2019-05-31 Thread Ard Biesheuvel
(cc Mike for memblock)

On Fri, 31 May 2019 at 01:13, Dan Williams  wrote:
>
> UEFI 2.8 defines an EFI_MEMORY_SP attribute bit to augment the
> interpretation of the EFI Memory Types as "reserved for a special
> purpose".
>
> The proposed Linux behavior for specific purpose memory is that it is
> reserved for direct-access (device-dax) by default and not available for
> any kernel usage, not even as an OOM fallback. Later, through udev
> scripts or another init mechanism, these device-dax claimed ranges can
> be reconfigured and hot-added to the available System-RAM with a unique
> node identifier.
>
> This patch introduces 3 new concepts at once given the entanglement
> between early boot enumeration relative to memory that can optionally be
> reserved from the kernel page allocator by default. The new concepts
> are:
>
> - E820_TYPE_SPECIFIC: Upon detecting the EFI_MEMORY_SP attribute on
>   EFI_CONVENTIONAL memory, update the E820 map with this new type. Only
>   perform this classification if the CONFIG_EFI_SPECIFIC_DAX=y policy is
>   enabled, otherwise treat it as typical ram.
>

OK, so now we have 'special purpose', 'specific' and 'app specific'
[below]. Do they all mean the same thing?

> - IORES_DESC_APPLICATION_RESERVED: Add a new I/O resource descriptor for
>   a device driver to search iomem resources for application specific
>   memory. Teach the iomem code to identify such ranges as "Application
>   Reserved".
>
> - MEMBLOCK_APP_SPECIFIC: Given the memory ranges can fallback to the
>   traditional System RAM pool the expectation is that they will have
>   typical SRAT entries. In order to support a policy of device-dax by
>   default with the option to hotplug later, the numa initialization code
>   is taught to avoid marking online MEMBLOCK_APP_SPECIFIC regions.
>

Can we move the generic memblock changes into a separate patch please?

> A follow-on change integrates parsing of the ACPI HMAT to identify the
> node and sub-range boundaries of EFI_MEMORY_SP designated memory. For
> now, just identify and reserve memory of this type.
>
> Cc: 
> Cc: Borislav Petkov 
> Cc: Ingo Molnar 
> Cc: "H. Peter Anvin" 
> Cc: Darren Hart 
> Cc: Andy Shevchenko 
> Cc: Thomas Gleixner 
> Cc: Ard Biesheuvel 
> Reported-by: kbuild test robot 
> Signed-off-by: Dan Williams 
> ---
>  arch/x86/Kconfig  |   20 
>  arch/x86/boot/compressed/eboot.c  |5 -
>  arch/x86/boot/compressed/kaslr.c  |2 +-
>  arch/x86/include/asm/e820/types.h |9 +
>  arch/x86/kernel/e820.c|9 +++--
>  arch/x86/kernel/setup.c   |1 +
>  arch/x86/platform/efi/efi.c   |   37 
> +
>  drivers/acpi/numa.c   |   15 ++-
>  include/linux/efi.h   |   14 ++
>  include/linux/ioport.h|1 +
>  include/linux/memblock.h  |7 +++
>  mm/memblock.c |4 
>  12 files changed, 115 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 2bbbd4d1ba31..2d58f32ed6fa 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1955,6 +1955,26 @@ config EFI_MIXED
>
>If unsure, say N.
>
> +config EFI_SPECIFIC_DAX
> +   bool "DAX Support for EFI Specific Purpose Memory"
> +   depends on EFI
> +   ---help---
> + On systems that have mixed performance classes of memory EFI
> + may indicate specific purpose memory with an attribute (See
> + EFI_MEMORY_SP in UEFI 2.8). A memory range tagged with this
> + attribute may have unique performance characteristics compared
> + to the system's general purpose "System RAM" pool. On the
> + expectation that such memory has application specific usage,
> + and its base EFI memory type is "conventional" answer Y to
> + arrange for the kernel to reserve it for direct-access
> + (device-dax) by default. The memory range can later be
> + optionally assigned to the page allocator by system
> + administrator policy via the device-dax kmem facility. Say N
> + to have the kernel treat this memory as general purpose by
> + default.
> +
> + If unsure, say Y.
> +
>  config SECCOMP
> def_bool y
> prompt "Enable seccomp to safely compute untrusted bytecode"
> diff --git a/arch/x86/boot/compressed/eboot.c 
> b/arch/x86/boot/compressed/eboot.c
> index 544ac4fafd11..5afa6de508e4 100644
> --- a/arch/x86/boot/compressed/eboot.c
> +++ b/arch/x86/boot/compressed/eboot.c
> @@ -560,7 +560,10 @@ setup_e820(struct boot_params *params, struct setup_data 
> *e820ext, u32 e820ext_s
> case EFI_BOOT_SERVICES_CODE:
> case EFI_BOOT_SERVICES_DATA:
> case EFI_CONVENTIONAL_MEMORY:
> -   e820_type = E820_TYPE_RAM;
> +   if (is_efi_dax(d))
> +