Re: [edk2-devel] Runtime Page Granularity on ARM64

2023-08-09 Thread Oliver Smith-Denny

Thanks Ard and Andrew, I appreciate the info! That clears things up for
me.

Oliver

-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#107661): https://edk2.groups.io/g/devel/message/107661
Mute This Topic: https://groups.io/mt/100652665/21656
Group Owner: devel+ow...@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-




Re: [edk2-devel] Runtime Page Granularity on ARM64

2023-08-09 Thread Andrew Fish via groups.io
Oliver,

My understanding is that AArch64 (ARMv8) supports 3 translation granules: 4KB,
16KB, and 64KB [1]. The granule is the smallest region a page table entry can
map, and the choice of granule changes which bits of the VA index into which
level of the page tables. On x86 this indexing is fixed, and a 2 MiB page simply
lives higher up in the same hierarchy.

While AArch64 supports all 3 granule sizes, which of them a given CPU
supports is implementation defined. So it is legal for an AArch64 CPU to only
support 64KB pages. You can always use 4K pages on Intel, so that is the
difference.

[1] https://developer.arm.com/documentation/101811/0103/Translation-granule

Thanks,

Andrew Fish




-=-=-=-=-=-=-=-=-=-=-=-
View/Reply Online (#107660): https://edk2.groups.io/g/devel/message/107660
-=-=-=-=-=-=-=-=-=-=-=-




Re: [edk2-devel] Runtime Page Granularity on ARM64

2023-08-09 Thread Ard Biesheuvel
Hi,

It is not about how UEFI maps them at boot time at all, and we happily
use 1GB or 2MB mappings for runtime regions if the permission
attributes allow it. It is simply about the granularity of regions
that the OS needs to care about, i.e., those with the
EFI_MEMORY_RUNTIME attribute set in the EFI memory map, as well as
memory used for ACPI tables or mapped at runtime by the AML
interpreter.

There are two important differences with X64:
- arm64 supports 16k and 64k pages
- arm64 does not tolerate aliases with mismatched attributes

If an EFI_MEMORY_RUNTIME region is not aligned to 64k, the OS is
generally forced to round outwards if it is running with a page size >
4k. This means that some adjacent physical pages will be covered by
[typically] a writeback cacheable mapping even though the memory map
may describe them differently. For instance, an EFI reserved region
used by the ACPI parking protocol requires non-cacheable mappings, and
even if such a mapping is created elsewhere in the OS's virtual
address space, if it overlaps with a cacheable mapping of the same
physical 64k page, the result is unpredictable. (The uncached accesses are
likely to hit in the cache inadvertently if a [speculative] access via
the cached alias pulled in the data.)

Whether or not the architecture or OS supports 2 MB pages is not
really relevant, given that it will never be forced to use those for
regions that are not aligned to 2 MB. A 64k page size OS simply has no
smaller granule available, and so it has to round outwards (*).

The least impactful way to achieve this in EDK2 was to increase the
page allocation granularity for runtime memory types (and rework the
pool allocator to make better use of memory that is allocated in
larger chunks). I imagine there might be other ways to ensure that
EFI_MEMORY_RUNTIME regions are aligned sufficiently, e.g., by
reasoning about whether or not adjacent regions may require different
attributes, and permitting misalignment if they don't. But this will
be a lot more hassle, and a lot more room for error.

-- 
Ard.


(*) Theoretically, we could always map EFI memory with 4k granularity
in Linux, given that we use TTBR0 mappings for it, whose granule size
is independent of the granule size used by the OS for its own VA
space, but this is not straightforward in terms of implementation.


-=-=-=-=-=-=-=-=-=-=-=-
View/Reply Online (#107659): https://edk2.groups.io/g/devel/message/107659
-=-=-=-=-=-=-=-=-=-=-=-




[edk2-devel] Runtime Page Granularity on ARM64

2023-08-09 Thread Oliver Smith-Denny

Hi Ard,

I just sent out a patch (MdeModulePkg: HeapGuard: Don't Assume Pool Head
Allocated In First Page) to fix HeapGuard GuardAlignedToTail behavior on
ARM64. However, this raised a question of why ARM64 sets
RUNTIME_PAGE_ALLOCATION_GRANULARITY to 64k when X64 does not. You added
this in ProcessorBind.h for ARM64, so I am hoping to get some additional
context from you (or anyone on the mailing list who has insight).

I understand that on ARM64 we can have 64k pages in the OS, but what I
do not understand is why we need to map in 64k chunks in UEFI. I see the
UEFI spec says that ARM allows for 64K pages and that if runtime code
or data is within a 64KB page then all 4k pages within that 64K page
need the same memory attributes, which makes sense.

Is this runtime granularity just to satisfy that requirement that the
memory attributes are the same or is there an additional reason that
we need to use the 64k granularity on ARM64?

In any case, I am confused why this is not an issue on X64, or when the
OS uses 2MB pages. I'm not as familiar with the mechanisms an
OS will use to map runtime services within its space, but they will be
virtualized and the OS will have its own page tables, so I don't quite
follow why the OS cares so much about what UEFI has done.

Any light you can shed here would be greatly appreciated.

Thanks,
Oliver



-=-=-=-=-=-=-=-=-=-=-=-
View/Reply Online (#107657): https://edk2.groups.io/g/devel/message/107657
-=-=-=-=-=-=-=-=-=-=-=-