Re: [PATCH] arm64: mm: Fix memmap to be initialized for the entire section

2016-11-25 Thread Ard Biesheuvel
On 25 November 2016 at 12:28, Ard Biesheuvel  wrote:
> On 25 November 2016 at 11:29, Robert Richter  
> wrote:
>> On 24.11.16 19:42:47, Ard Biesheuvel wrote:
>>> On 24 November 2016 at 19:26, Robert Richter  
>>> wrote:
>>
>>> > I revisited the code and it is working well already since:
>>> >
>>> >  e7cd190385d1 arm64: mark reserved memblock regions explicitly in iomem
>>> >
>>> > Now, try_ram_remap() is only called if the region to be mapped is
>>> > entirely in IORESOURCE_SYSTEM_RAM. This is only true for normal mem
>>> > ranges and not NOMAP mem. region_intersects() then returns
>>> > REGION_INTERSECTS and try_ram_remap() is called. For the NOMAP memory
>>> > case REGION_DISJOINT would be returned and arch_memremap_wb() is called
>>> > directly. Before the e7cd190385d1 change try_ram_remap() was
>>> > called also for nomap regions.
>>> >
>>> > So we can leave memremap() as it is and just apply this patch
>>> > unmodified. What do you think?
>>>
>>> I agree. The pfn_valid() check in try_ram_remap() is still appropriate
>>> simply because the PageHighmem check requires a valid struct page. But
>>> if we don't enter that code path anymore for NOMAP regions, I think
>>> we're ok.
>>>
>>> > Please ack.
>>> >
>>>
>>> I still don't fully understand how it is guaranteed that *all* memory
>>> (i.e., all regions for which memblock_is_memory() returns true) is
>>> covered by a struct page, but marked as reserved. Are we relying on
>>> the fact that NOMAP memory is also memblock_reserve()'d?
>>
>> See free_low_memory_core_early():
>>
>> 
>> for_each_free_mem_range(i, NUMA_NO_NODE, MEMBLOCK_NONE, &start, &end,
>> NULL)
>> count += __free_memory_core(start, end);
>> 
>>
>> Only mem with the MEMBLOCK_NONE flag is added. And NOMAP pages are
>> also *not* marked reserved. So nothing at all from NOMAP mem is
>> reported to mm, it is not present (see below for a mem config, note
>> flags: 0x4 mem regions).
>>
>
> OK, thanks for clearing that up. But that still does not explain how
> we can be certain that NOMAP regions are guaranteed to be covered by a
> struct page, does it? Because that is ultimately what pfn_valid()
> means, that it is safe to, e.g., look at the page flags.
>

Answering to self: arm64_memory_present() registers all memory as
present, which means that sparse_init() will allocate struct page
backing for all of memory, including NOMAP regions.

We could ask ourselves whether it makes sense to disregard NOMAP
memory here, but that probably buys us very little (but I do wonder
how it affects the occurrence of the bug).

In any case, it looks to me like your patch is safe, i.e., calling
pfn_valid() on NOMAP pages is safe, although I still find it debatable
whether the kernel should be tracking memory it does not own. However,
for performance reasons, it probably makes sense to apply your patch,
and if anyone ever comes up with a use case where this is problematic
(e.g., gigabytes of NOMAP memory), we can look into it then.

Acked-by: Ard Biesheuvel 


Re: [PATCH] arm64: mm: Fix memmap to be initialized for the entire section

2016-11-25 Thread Ard Biesheuvel
On 25 November 2016 at 11:29, Robert Richter  wrote:
> On 24.11.16 19:42:47, Ard Biesheuvel wrote:
>> On 24 November 2016 at 19:26, Robert Richter  
>> wrote:
>
>> > I revisited the code and it is working well already since:
>> >
>> >  e7cd190385d1 arm64: mark reserved memblock regions explicitly in iomem
>> >
>> > Now, try_ram_remap() is only called if the region to be mapped is
>> > entirely in IORESOURCE_SYSTEM_RAM. This is only true for normal mem
>> > ranges and not NOMAP mem. region_intersects() then returns
>> > REGION_INTERSECTS and try_ram_remap() is called. For the NOMAP memory
>> > case REGION_DISJOINT would be returned and arch_memremap_wb() is called
>> > directly. Before the e7cd190385d1 change try_ram_remap() was
>> > called also for nomap regions.
>> >
>> > So we can leave memremap() as it is and just apply this patch
>> > unmodified. What do you think?
>>
>> I agree. The pfn_valid() check in try_ram_remap() is still appropriate
>> simply because the PageHighmem check requires a valid struct page. But
>> if we don't enter that code path anymore for NOMAP regions, I think
>> we're ok.
>>
>> > Please ack.
>> >
>>
>> I still don't fully understand how it is guaranteed that *all* memory
>> (i.e., all regions for which memblock_is_memory() returns true) is
>> covered by a struct page, but marked as reserved. Are we relying on
>> the fact that NOMAP memory is also memblock_reserve()'d?
>
> See free_low_memory_core_early():
>
> 
> for_each_free_mem_range(i, NUMA_NO_NODE, MEMBLOCK_NONE, &start, &end,
> NULL)
> count += __free_memory_core(start, end);
> 
>
> Only mem with the MEMBLOCK_NONE flag is added. And NOMAP pages are
> also *not* marked reserved. So nothing at all from NOMAP mem is
> reported to mm, it is not present (see below for a mem config, note
> flags: 0x4 mem regions).
>

OK, thanks for clearing that up. But that still does not explain how
we can be certain that NOMAP regions are guaranteed to be covered by a
struct page, does it? Because that is ultimately what pfn_valid()
means, that it is safe to, e.g., look at the page flags.


> [0.00] efi: Processing EFI memory map:
> [0.00] efi:   0x0140-0x0147 [Conventional Memory|   | 
>  |  |  |  |  |  |   |WB|WT|WC|UC]
> [0.00] efi:   0x0148-0x024b [Loader Data|   | 
>  |  |  |  |  |  |   |WB|WT|WC|UC]
> [0.00] efi:   0x024c-0x211f [Conventional Memory|   | 
>  |  |  |  |  |  |   |WB|WT|WC|UC]
> [0.00] efi:   0x2120-0x2121 [Loader Data|   | 
>  |  |  |  |  |  |   |WB|WT|WC|UC]
> [0.00] efi:   0x2122-0xfffebfff [Conventional Memory|   | 
>  |  |  |  |  |  |   |WB|WT|WC|UC]
> [0.00] efi:   0xfffec000-0x5fff [ACPI Reclaim Memory|   | 
>  |  |  |  |  |  |   |WB|WT|WC|UC]
> [0.00] efi:   0x6000-0x6fff [ACPI Memory NVS|   | 
>  |  |  |  |  |  |   |WB|WT|WC|UC]
> [0.00] efi:   0x7000-0x [ACPI Reclaim Memory|   | 
>  |  |  |  |  |  |   |WB|WT|WC|UC]
> [0.00] efi:   0x0001-0x000ff7ff [Conventional Memory|   | 
>  |  |  |  |  |  |   |WB|WT|WC|UC]
> [0.00] efi:   0x000ff800-0x000ff801 [Boot Data  |   | 
>  |  |  |  |  |  |   |WB|WT|WC|UC]
> [0.00] efi:   0x000ff802-0x000fffa9efff [Conventional Memory|   | 
>  |  |  |  |  |  |   |WB|WT|WC|UC]
> [0.00] efi:   0x000fffa9f000-0x000f [Boot Data  |   | 
>  |  |  |  |  |  |   |WB|WT|WC|UC]
> [0.00] efi:   0x0140-0x010f816aefff [Conventional Memory|   | 
>  |  |  |  |  |  |   |WB|WT|WC|UC]
> [0.00] efi:   0x010f816af000-0x010f816b1fff [Loader Data|   | 
>  |  |  |  |  |  |   |WB|WT|WC|UC]
> [0.00] efi:   0x010f816b2000-0x010f826f1fff [Loader Code|   | 
>  |  |  |  |  |  |   |WB|WT|WC|UC]
> [0.00] efi:   0x010f826f2000-0x010f82701fff [Loader Data|   | 
>  |  |  |  |  |  |   |WB|WT|WC|UC]
> [0.00] efi:   0x010f82702000-0x010f82787fff [Boot Data  |   | 
>  |  |  |  |  |  |   |WB|WT|WC|UC]
> [0.00] efi:   0x010f82788000-0x010f9276bfff [Loader Data|   | 
>  |  |  |  |  |  |   |WB|WT|WC|UC]
> [0.00] efi:   0x010f9276c000-0x010f9276cfff [Boot Data  |   | 
>  |  |  |  |  |  |   |WB|WT|WC|UC]
> [0.00] efi:   0x010f9276d000-0x010f935a8fff [Loader Data|   | 
>  |  |  |  |  |  |   |WB|WT|WC|UC]
> [0.00] efi:   0x010f935a9000-0x010f93880fff [Boot Data  |   | 
>  |  |  |  |  |  |   |WB|WT|WC|UC]
> [0.00] efi:   0x010f93881000-0x010ff7880fff [Loader Data|   | 
>  |  |  |  |  |  |   |WB|WT|WC|UC]
> [0.00] efi:   0x010ff7881000-0x010ff7886fff [Boot Data  |   | 
>  |  |  |  |  |  |   |WB|WT|WC|UC]
> [0.00] efi:   0x010ff7887000-0x010ff78a3fff 

Re: [PATCH] arm64: mm: Fix memmap to be initialized for the entire section

2016-11-25 Thread Robert Richter
On 24.11.16 19:42:47, Ard Biesheuvel wrote:
> On 24 November 2016 at 19:26, Robert Richter  
> wrote:

> > I revisited the code and it is working well already since:
> >
> >  e7cd190385d1 arm64: mark reserved memblock regions explicitly in iomem
> >
> > Now, try_ram_remap() is only called if the region to be mapped is
> > entirely in IORESOURCE_SYSTEM_RAM. This is only true for normal mem
> > ranges and not NOMAP mem. region_intersects() then returns
> > REGION_INTERSECTS and try_ram_remap() is called. For the NOMAP memory
> > case REGION_DISJOINT would be returned and arch_memremap_wb() is called
> > directly. Before the e7cd190385d1 change try_ram_remap() was
> > called also for nomap regions.
> >
> > So we can leave memremap() as it is and just apply this patch
> > unmodified. What do you think?
> 
> I agree. The pfn_valid() check in try_ram_remap() is still appropriate
> simply because the PageHighmem check requires a valid struct page. But
> if we don't enter that code path anymore for NOMAP regions, I think
> we're ok.
> 
> > Please ack.
> >
> 
> I still don't fully understand how it is guaranteed that *all* memory
> (i.e., all regions for which memblock_is_memory() returns true) is
> covered by a struct page, but marked as reserved. Are we relying on
> the fact that NOMAP memory is also memblock_reserve()'d?

See free_low_memory_core_early():


for_each_free_mem_range(i, NUMA_NO_NODE, MEMBLOCK_NONE, &start, &end,
NULL)
count += __free_memory_core(start, end);


Only mem with the MEMBLOCK_NONE flag is added. And NOMAP pages are
also *not* marked reserved. So nothing at all from NOMAP mem is
reported to mm, it is not present (see below for a mem config, note
flags: 0x4 mem regions).

-Robert


[0.00] efi: Processing EFI memory map:
[0.00] efi:   0x0140-0x0147 [Conventional Memory|   |  
|  |  |  |  |  |   |WB|WT|WC|UC]
[0.00] efi:   0x0148-0x024b [Loader Data|   |  
|  |  |  |  |  |   |WB|WT|WC|UC]
[0.00] efi:   0x024c-0x211f [Conventional Memory|   |  
|  |  |  |  |  |   |WB|WT|WC|UC]
[0.00] efi:   0x2120-0x2121 [Loader Data|   |  
|  |  |  |  |  |   |WB|WT|WC|UC]
[0.00] efi:   0x2122-0xfffebfff [Conventional Memory|   |  
|  |  |  |  |  |   |WB|WT|WC|UC]
[0.00] efi:   0xfffec000-0x5fff [ACPI Reclaim Memory|   |  
|  |  |  |  |  |   |WB|WT|WC|UC]
[0.00] efi:   0x6000-0x6fff [ACPI Memory NVS|   |  
|  |  |  |  |  |   |WB|WT|WC|UC]
[0.00] efi:   0x7000-0x [ACPI Reclaim Memory|   |  
|  |  |  |  |  |   |WB|WT|WC|UC]
[0.00] efi:   0x0001-0x000ff7ff [Conventional Memory|   |  
|  |  |  |  |  |   |WB|WT|WC|UC]
[0.00] efi:   0x000ff800-0x000ff801 [Boot Data  |   |  
|  |  |  |  |  |   |WB|WT|WC|UC]
[0.00] efi:   0x000ff802-0x000fffa9efff [Conventional Memory|   |  
|  |  |  |  |  |   |WB|WT|WC|UC]
[0.00] efi:   0x000fffa9f000-0x000f [Boot Data  |   |  
|  |  |  |  |  |   |WB|WT|WC|UC]
[0.00] efi:   0x0140-0x010f816aefff [Conventional Memory|   |  
|  |  |  |  |  |   |WB|WT|WC|UC]
[0.00] efi:   0x010f816af000-0x010f816b1fff [Loader Data|   |  
|  |  |  |  |  |   |WB|WT|WC|UC]
[0.00] efi:   0x010f816b2000-0x010f826f1fff [Loader Code|   |  
|  |  |  |  |  |   |WB|WT|WC|UC]
[0.00] efi:   0x010f826f2000-0x010f82701fff [Loader Data|   |  
|  |  |  |  |  |   |WB|WT|WC|UC]
[0.00] efi:   0x010f82702000-0x010f82787fff [Boot Data  |   |  
|  |  |  |  |  |   |WB|WT|WC|UC]
[0.00] efi:   0x010f82788000-0x010f9276bfff [Loader Data|   |  
|  |  |  |  |  |   |WB|WT|WC|UC]
[0.00] efi:   0x010f9276c000-0x010f9276cfff [Boot Data  |   |  
|  |  |  |  |  |   |WB|WT|WC|UC]
[0.00] efi:   0x010f9276d000-0x010f935a8fff [Loader Data|   |  
|  |  |  |  |  |   |WB|WT|WC|UC]
[0.00] efi:   0x010f935a9000-0x010f93880fff [Boot Data  |   |  
|  |  |  |  |  |   |WB|WT|WC|UC]
[0.00] efi:   0x010f93881000-0x010ff7880fff [Loader Data|   |  
|  |  |  |  |  |   |WB|WT|WC|UC]
[0.00] efi:   0x010ff7881000-0x010ff7886fff [Boot Data  |   |  
|  |  |  |  |  |   |WB|WT|WC|UC]
[0.00] efi:   0x010ff7887000-0x010ff78a3fff [Loader Code|   |  
|  |  |  |  |  |   |WB|WT|WC|UC]
[0.00] efi:   0x010ff78a4000-0x010ff9e8dfff [Boot Data  |   |  
|  |  |  |  |  |   |WB|WT|WC|UC]
[0.00] efi:   0x010ff9e8e000-0x010ff9f16fff [Runtime Data   |RUN|  
|  |  |  |  |  |   |WB|WT|WC|UC]
[0.00] efi:   0x010ff9f17000-0x010ffaeb5fff [Boot Data  |   |  
|  |  |  |  |  |   |WB|WT|WC|UC]
[0.00] efi:   0x010ffaeb6000-0x010ffafc8fff [Runtime Data   |RUN|  
|  |  | 

Re: [PATCH] arm64: mm: Fix memmap to be initialized for the entire section

2016-11-24 Thread Robert Richter
Ard,

> > >> On 24 November 2016 at 13:51, Robert Richter  
> > >> wrote:
> > >> > On 24.11.16 13:44:31, Ard Biesheuvel wrote:

> > >> Regions containing firmware tables are owned by the firmware, and it
> > >> is the firmware that tells us which memory attributes we are allowed
> > >> to use. If those attributes include WB, it is perfectly legal to use a
> > >> cacheable mapping. That does *not* mean they should be covered by the
> > >> linear mapping. The linear mapping is read-write-non-exec, for
> > >> instance, and we may prefer to use a read-only mapping and/or
> > >> executable mapping.
> > >
> > > Ok, I am going to fix try_ram_remap().

I revisited the code and it is working well already since:

 e7cd190385d1 arm64: mark reserved memblock regions explicitly in iomem

Now, try_ram_remap() is only called if the region to be mapped is
entirely in IORESOURCE_SYSTEM_RAM. This is only true for normal mem
ranges and not NOMAP mem. region_intersects() then returns
REGION_INTERSECTS and try_ram_remap() is called. For the NOMAP memory
case REGION_DISJOINT would be returned and arch_memremap_wb() is called
directly. Before the e7cd190385d1 change try_ram_remap() was
called also for nomap regions.

So we can leave memremap() as it is and just apply this patch
unmodified. What do you think? Please ack.

I am going to prepare the pfn_is_ram() change in addition to this
patch, but that should not be required for this fix to work correctly.

Thanks,

-Robert


Re: [PATCH] arm64: mm: Fix memmap to be initialized for the entire section

2016-11-24 Thread Ard Biesheuvel
On 24 November 2016 at 19:26, Robert Richter  wrote:
> Ard,
>
>> > >> On 24 November 2016 at 13:51, Robert Richter 
>> > >>  wrote:
>> > >> > On 24.11.16 13:44:31, Ard Biesheuvel wrote:
>
>> > >> Regions containing firmware tables are owned by the firmware, and it
>> > >> is the firmware that tells us which memory attributes we are allowed
>> > >> to use. If those attributes include WB, it is perfectly legal to use a
>> > >> cacheable mapping. That does *not* mean they should be covered by the
>> > >> linear mapping. The linear mapping is read-write-non-exec, for
>> > >> instance, and we may prefer to use a read-only mapping and/or
>> > >> executable mapping.
>> > >
>> > > Ok, I am going to fix try_ram_remap().
>
> I revisited the code and it is working well already since:
>
>  e7cd190385d1 arm64: mark reserved memblock regions explicitly in iomem
>
> Now, try_ram_remap() is only called if the region to be mapped is
> entirely in IORESOURCE_SYSTEM_RAM. This is only true for normal mem
> ranges and not NOMAP mem. region_intersects() then returns
> REGION_INTERSECTS and try_ram_remap() is called. For the NOMAP memory
> case REGION_DISJOINT would be returned and arch_memremap_wb() is called
> directly. Before the e7cd190385d1 change try_ram_remap() was
> called also for nomap regions.
>
> So we can leave memremap() as it is and just apply this patch
> unmodified. What do you think?

I agree. The pfn_valid() check in try_ram_remap() is still appropriate
simply because the PageHighmem check requires a valid struct page. But
if we don't enter that code path anymore for NOMAP regions, I think
we're ok.

> Please ack.
>

I still don't fully understand how it is guaranteed that *all* memory
(i.e., all regions for which memblock_is_memory() returns true) is
covered by a struct page, but marked as reserved. Are we relying on
the fact that NOMAP memory is also memblock_reserve()'d?

> I am going to prepare the pfn_is_ram() change in addition to this
> patch, but that should not be required for this fix to work correctly.
>

I don't think you need to bother with page_is_ram() then. The only
place we use it is in devmem_is_allowed(), and there it makes sense to
allow NOMAP regions to be accessed (provided that you think /dev/mem
is a good idea in the first place).


Re: [PATCH] arm64: mm: Fix memmap to be initialized for the entire section

2016-11-24 Thread Robert Richter
On 24.11.16 14:23:16, Ard Biesheuvel wrote:
> On 24 November 2016 at 14:11, Robert Richter  
> wrote:
> > On 24.11.16 13:58:30, Ard Biesheuvel wrote:
> >> On 24 November 2016 at 13:51, Robert Richter  
> >> wrote:
> >> > On 24.11.16 13:44:31, Ard Biesheuvel wrote:
> >> >> On 24 November 2016 at 13:42, Robert Richter 
> >> >>  wrote:
> >> >> > On 23.11.16 21:25:06, Ard Biesheuvel wrote:
> >> >> >> Why? MEMREMAP_WB is used often, among other things for mapping
> >> >> >> firmware tables, which are marked as NOMAP, so in these cases, the
> >> >> >> linear address is not mapped.
> >> >> >
> >> >> > If fw tables are mapped wb, that is wrong and needs a separate fix.
> >> >> >
> >> >>
> >> >> Why is that wrong?
> >> >
> >> > The whole issue with mapping acpi tables is not marking them cachable,
> >> > what wb does.
> >>
> >> What 'issue'?
> >>
> >> > Otherwise we could just use linear mapping for those mem
> >> > ranges.
> >> >
> >>
> >> Regions containing firmware tables are owned by the firmware, and it
> >> is the firmware that tells us which memory attributes we are allowed
> >> to use. If those attributes include WB, it is perfectly legal to use a
> >> cacheable mapping. That does *not* mean they should be covered by the
> >> linear mapping. The linear mapping is read-write-non-exec, for
> >> instance, and we may prefer to use a read-only mapping and/or
> >> executable mapping.
> >
> > Ok, I am going to fix try_ram_remap().
> >
> 
> Thanks. Could you also add an arm64 version of page_is_ram() that uses
> memblock_is_memory() while you're at it? I think using memblock
> directly in try_ram_remap() may not be the best approach

Sure. I also want to mark the patches as stable.

> > Are there other concerns with this patch?
> >
> 
> I think we all agree that pfn_valid() should return whether a pfn has
> a struct page associated with it, the debate is about whether it makes
> sense to allocate struct pages for memory that the kernel does not
> own. But given that it does not really hurt to do so for small holes,
> I think your suggestion makes sense.

Thanks for your comments and the review.

> Should we be doing anything more to ensure that those pages are not
> dereferenced inadvertently? Is there a page flag we should be setting?

I don't think so. Boot memory is initialized in
free_low_memory_core_early(). The PageReserved flag is set for pages
from reserved memory ranges, and memory ranges not marked NOMAP are
simply freed. Since pages are either reserved or on the free list,
pages from other (NOMAP) memory ranges are not visible to mm.

-Robert


Re: [PATCH] arm64: mm: Fix memmap to be initialized for the entire section

2016-11-24 Thread Ard Biesheuvel
On 24 November 2016 at 14:11, Robert Richter  wrote:
> On 24.11.16 13:58:30, Ard Biesheuvel wrote:
>> On 24 November 2016 at 13:51, Robert Richter  
>> wrote:
>> > On 24.11.16 13:44:31, Ard Biesheuvel wrote:
>> >> On 24 November 2016 at 13:42, Robert Richter  
>> >> wrote:
>> >> > On 23.11.16 21:25:06, Ard Biesheuvel wrote:
>> >> >> Why? MEMREMAP_WB is used often, among other things for mapping
>> >> >> firmware tables, which are marked as NOMAP, so in these cases, the
>> >> >> linear address is not mapped.
>> >> >
>> >> > If fw tables are mapped wb, that is wrong and needs a separate fix.
>> >> >
>> >>
>> >> Why is that wrong?
>> >
>> > The whole issue with mapping acpi tables is not marking them cachable,
>> > what wb does.
>>
>> What 'issue'?
>>
>> > Otherwise we could just use linear mapping for those mem
>> > ranges.
>> >
>>
>> Regions containing firmware tables are owned by the firmware, and it
>> is the firmware that tells us which memory attributes we are allowed
>> to use. If those attributes include WB, it is perfectly legal to use a
>> cacheable mapping. That does *not* mean they should be covered by the
>> linear mapping. The linear mapping is read-write-non-exec, for
>> instance, and we may prefer to use a read-only mapping and/or
>> executable mapping.
>
> Ok, I am going to fix try_ram_remap().
>

Thanks. Could you also add an arm64 version of page_is_ram() that uses
memblock_is_memory() while you're at it? I think using memblock
directly in try_ram_remap() may not be the best approach.

> Are there other concerns with this patch?
>

I think we all agree that pfn_valid() should return whether a pfn has
a struct page associated with it; the debate is about whether it makes
sense to allocate struct pages for memory that the kernel does not
own. But given that it does not really hurt to do so for small holes,
I think your suggestion makes sense.

Should we be doing anything more to ensure that those pages are not
dereferenced inadvertently? Is there a page flag we should be setting?


Re: [PATCH] arm64: mm: Fix memmap to be initialized for the entire section

2016-11-24 Thread Robert Richter
On 24.11.16 13:58:30, Ard Biesheuvel wrote:
> On 24 November 2016 at 13:51, Robert Richter  
> wrote:
> > On 24.11.16 13:44:31, Ard Biesheuvel wrote:
> >> On 24 November 2016 at 13:42, Robert Richter  
> >> wrote:
> >> > On 23.11.16 21:25:06, Ard Biesheuvel wrote:
> >> >> Why? MEMREMAP_WB is used often, among other things for mapping
> >> >> firmware tables, which are marked as NOMAP, so in these cases, the
> >> >> linear address is not mapped.
> >> >
> >> > If fw tables are mapped wb, that is wrong and needs a separate fix.
> >> >
> >>
> >> Why is that wrong?
> >
> > The whole issue with mapping acpi tables is not marking them cachable,
> > what wb does.
> 
> What 'issue'?
> 
> > Otherwise we could just use linear mapping for those mem
> > ranges.
> >
> 
> Regions containing firmware tables are owned by the firmware, and it
> is the firmware that tells us which memory attributes we are allowed
> to use. If those attributes include WB, it is perfectly legal to use a
> cacheable mapping. That does *not* mean they should be covered by the
> linear mapping. The linear mapping is read-write-non-exec, for
> instance, and we may prefer to use a read-only mapping and/or
> executable mapping.

Ok, I am going to fix try_ram_remap().

Are there other concerns with this patch?

Thanks,

-Robert


Re: [PATCH] arm64: mm: Fix memmap to be initialized for the entire section

2016-11-24 Thread Ard Biesheuvel
On 24 November 2016 at 13:51, Robert Richter  wrote:
> On 24.11.16 13:44:31, Ard Biesheuvel wrote:
>> On 24 November 2016 at 13:42, Robert Richter  
>> wrote:
>> > On 23.11.16 21:25:06, Ard Biesheuvel wrote:
>> >> Why? MEMREMAP_WB is used often, among other things for mapping
>> >> firmware tables, which are marked as NOMAP, so in these cases, the
>> >> linear address is not mapped.
>> >
>> > If fw tables are mapped wb, that is wrong and needs a separate fix.
>> >
>>
>> Why is that wrong?
>
> The whole issue with mapping acpi tables is not marking them cachable,
> what wb does.

What 'issue'?

> Otherwise we could just use linear mapping for those mem
> ranges.
>

Regions containing firmware tables are owned by the firmware, and it
is the firmware that tells us which memory attributes we are allowed
to use. If those attributes include WB, it is perfectly legal to use a
cacheable mapping. That does *not* mean they should be covered by the
linear mapping. The linear mapping is read-write-non-exec, for
instance, and we may prefer to use a read-only mapping and/or
executable mapping.

>> >> > If you think pfn_valid() is wrong here, I am happy to send a patch
>> >> > that fixes this by using page_is_ram(). In any case, the worst case
>> >> > that may happen is to behave the same as v4.4, we might fix then the
>> >> > wrong use of pfn_valid() where it is not correctly used to check for
>> >> > ram.
>> >> >
>> >>
>> >> page_is_ram() uses string comparisons to look for regions called
>> >> 'System RAM'. Is that something we can tolerate for each pfn_valid()
>> >> call?
>> >>
>> >> Perhaps the solution is to reimplement page_is_ram() for arm64 using
>> >> memblock_is_memory() instead, but that still means we need to modify
>> >> the generic memremap() code first to switch to it before changing the
>> >> arm64 implementation of pfn_valid().
>> >
>> > No, that's not the solution. pfn_valid() should just check if there is
>> > a valid struct page, as other archs do. And if there is a mis-use of
>> > pfn_valid() to check for ram, only that calls should be fixed to use
>> > page_is_ram(), however this is implemented, or something appropriate.
>> > But I don't see any problematic code, and if so, I will fix that.
>> >
>>
>> memremap() uses pfn_valid() to decide whether some address is covered
>> by the linear mapping. If we correct pfn_valid() to adhere to your
>> definition, we will need to fix memremap() first in any case.
>
> As said, will fix that if needed. But I think the caller is wrong
> then.
>


Re: [PATCH] arm64: mm: Fix memmap to be initialized for the entire section

2016-11-24 Thread Robert Richter
On 23.11.16 21:25:06, Ard Biesheuvel wrote:
> Why? MEMREMAP_WB is used often, among other things for mapping
> firmware tables, which are marked as NOMAP, so in these cases, the
> linear address is not mapped.

If fw tables are mapped wb, that is wrong and needs a separate fix.

> > If you think pfn_valid() is wrong here, I am happy to send a patch
> > that fixes this by using page_is_ram(). In any case, the worst case
> > that may happen is to behave the same as v4.4, we might fix then the
> > wrong use of pfn_valid() where it is not correctly used to check for
> > ram.
> >
> 
> page_is_ram() uses string comparisons to look for regions called
> 'System RAM'. Is that something we can tolerate for each pfn_valid()
> call?
> 
> Perhaps the solution is to reimplement page_is_ram() for arm64 using
> memblock_is_memory() instead, but that still means we need to modify
> the generic memremap() code first to switch to it before changing the
> arm64 implementation of pfn_valid().

No, that's not the solution. pfn_valid() should just check if there is
a valid struct page, as other archs do. And if there is a misuse of
pfn_valid() to check for RAM, only those calls should be fixed to use
page_is_ram(), however that is implemented, or something appropriate.
But I don't see any problematic code, and if there is, I will fix it.

-Robert


Re: [PATCH] arm64: mm: Fix memmap to be initialized for the entire section

2016-11-24 Thread Robert Richter
On 24.11.16 13:44:31, Ard Biesheuvel wrote:
> On 24 November 2016 at 13:42, Robert Richter  
> wrote:
> > On 23.11.16 21:25:06, Ard Biesheuvel wrote:
> >> Why? MEMREMAP_WB is used often, among other things for mapping
> >> firmware tables, which are marked as NOMAP, so in these cases, the
> >> linear address is not mapped.
> >
> > If fw tables are mapped wb, that is wrong and needs a separate fix.
> >
> 
> Why is that wrong?

The whole issue with mapping ACPI tables is that they must not be
marked cacheable, which is what WB does. Otherwise we could just use
the linear mapping for those memory ranges.

> >> > If you think pfn_valid() is wrong here, I am happy to send a patch
> >> > that fixes this by using page_is_ram(). In any case, the worst case
> >> > that may happen is to behave the same as v4.4, we might fix then the
> >> > wrong use of pfn_valid() where it is not correctly used to check for
> >> > ram.
> >> >
> >>
> >> page_is_ram() uses string comparisons to look for regions called
> >> 'System RAM'. Is that something we can tolerate for each pfn_valid()
> >> call?
> >>
> >> Perhaps the solution is to reimplement page_is_ram() for arm64 using
> >> memblock_is_memory() instead, but that still means we need to modify
> >> the generic memremap() code first to switch to it before changing the
> >> arm64 implementation of pfn_valid().
> >
> > No, that's not the solution. pfn_valid() should just check if there is
> > a valid struct page, as other archs do. And if there is a mis-use of
> > pfn_valid() to check for ram, only that calls should be fixed to use
> > page_is_ram(), however this is implemented, or something appropriate.
> > But I don't see any problematic code, and if so, I will fix that.
> >
> 
> memremap() uses pfn_valid() to decide whether some address is covered
> by the linear mapping. If we correct pfn_valid() to adhere to your
> definition, we will need to fix memremap() first in any case.

As said, will fix that if needed. But I think the caller is wrong
then.

-Robert


Re: [PATCH] arm64: mm: Fix memmap to be initialized for the entire section

2016-11-24 Thread Ard Biesheuvel
On 24 November 2016 at 13:42, Robert Richter  wrote:
> On 23.11.16 21:25:06, Ard Biesheuvel wrote:
>> Why? MEMREMAP_WB is used often, among other things for mapping
>> firmware tables, which are marked as NOMAP, so in these cases, the
>> linear address is not mapped.
>
> If fw tables are mapped wb, that is wrong and needs a separate fix.
>

Why is that wrong?

>> > If you think pfn_valid() is wrong here, I am happy to send a patch
>> > that fixes this by using page_is_ram(). In any case, the worst case
>> > that may happen is to behave the same as v4.4, we might fix then the
>> > wrong use of pfn_valid() where it is not correctly used to check for
>> > ram.
>> >
>>
>> page_is_ram() uses string comparisons to look for regions called
>> 'System RAM'. Is that something we can tolerate for each pfn_valid()
>> call?
>>
>> Perhaps the solution is to reimplement page_is_ram() for arm64 using
>> memblock_is_memory() instead, but that still means we need to modify
>> the generic memremap() code first to switch to it before changing the
>> arm64 implementation of pfn_valid().
>
> No, that's not the solution. pfn_valid() should just check if there is
> a valid struct page, as other archs do. And if there is a mis-use of
> pfn_valid() to check for ram, only that calls should be fixed to use
> page_is_ram(), however this is implemented, or something appropriate.
> But I don't see any problematic code, and if so, I will fix that.
>

memremap() uses pfn_valid() to decide whether some address is covered
by the linear mapping. If we correct pfn_valid() to adhere to your
definition, we will need to fix memremap() first in any case.


Re: [PATCH] arm64: mm: Fix memmap to be initialized for the entire section

2016-11-23 Thread Robert Richter
On 20.11.16 17:07:44, Ard Biesheuvel wrote:
> On 17 November 2016 at 15:18, Robert Richter  
> wrote:

> > The risk of breaking something with my patch is small and limited only
> > to the mapping of efi reserved regions (which is the state of 4.4). If
> > something breaks anyway it can easily be fixed by adding more checks
> > to pfn_valid() as suggested above.
> >
> 
> As I noted before, it looks to me like setting CONFIG_HOLES_IN_ZONE is
> the correct way to address this. However, doing that does uncover a
> bug in move_freepages() where the VM_BUG_ON_PAGE() dereferences struct
> page fields before the pfn_valid_within() check, so it seems those
> need to be switched around.
> 
> Robert, you mentioned that CONFIG_HOLES_IN_ZONE seems inappropriate
> for sparsemem. Care to elaborate why?

HOLES_IN_ZONE is rarely used in the kernel. I think it was introduced
to save memmap memory, and only a few individual systems enable it.
No architecture enables it unconditionally, for good reasons...

It introduces additional checks. pfn_valid() is usually checked only
once for the whole memmap, but with HOLES_IN_ZONE a number of extra
checks become active (just grep for pfn_valid_within). This will
increase the number of pfn_valid() calls by a factor of
MAX_ORDER_NR_PAGES, which is 8k in my config. So this is not the
direction to go.

My patch fixes a kernel regression that was introduced by the nomap
implementation. Some systems can no longer boot, and besides that the
BUG_ON() may trigger at any time depending only on physical page
access, so we need to fix 4.9. Here is a reproducer:

 https://patchwork.kernel.org/patch/9407677/

My patch also does not break memremap(). With my patch applied,
try_ram_remap() would return a linear address for nomap regions, but
it is only called if WB is explicitly requested, so this should not
happen. If you think pfn_valid() is wrong here, I am happy to send a
patch that fixes this by using page_is_ram(). In any case, the worst
that may happen is behaving the same as v4.4, and we could then fix
any wrong use of pfn_valid() where it is incorrectly used to check
for RAM.

-Robert


Re: [PATCH] arm64: mm: Fix memmap to be initialized for the entire section

2016-11-23 Thread Ard Biesheuvel
On 23 November 2016 at 21:15, Robert Richter  wrote:
> On 20.11.16 17:07:44, Ard Biesheuvel wrote:
>> On 17 November 2016 at 15:18, Robert Richter  
>> wrote:
>
>> > The risk of breaking something with my patch is small and limited only
>> > to the mapping of efi reserved regions (which is the state of 4.4). If
>> > something breaks anyway it can easily be fixed by adding more checks
>> > to pfn_valid() as suggested above.
>> >
>>
>> As I noted before, it looks to me like setting CONFIG_HOLES_IN_ZONE is
>> the correct way to address this. However, doing that does uncover a
>> bug in move_freepages() where the VM_BUG_ON_PAGE() dereferences struct
>> page fields before the pfn_valid_within() check, so it seems those
>> need to be switched around.
>>
>> Robert, you mentioned that CONFIG_HOLES_IN_ZONE seems inappropriate
>> for sparsemem. Care to elaborate why?
>
> HOLES_IN_ZONE is of rare use in the kernel. I think it was introduced
> to save memory for the memmap and only some single systems enable it.
> There is no architecture that enables it entirely. For good reasons...
>
> It introduces additional checks. pfn_valid() is usually checked only
> once for the whole memmap. There are a number of checks enabled, just
> grep for pfn_valid_within. This will increase the number of
> pfn_valid() calls by a factor of MAX_ORDER_NR_PAGES, in my config this
> is 8k. So, this is not the direction to go.
>

That does sound like a potential issue. But does it cause any slowdown
in practice?

The reality is that CONFIG_HOLES_IN_ZONE perfectly describes the
situation, and so it is still my preferred option if the performance
hit is tolerable.

> My patch fixes a regression in the kernel that was introduced by the
> nomap implementation. Some systems can not boot anymore, beside of
> that the BUG_ON() may occur any time depending only on physical page
> access, we need to fix 4.9. Here is a reproducer:
>
>  https://patchwork.kernel.org/patch/9407677/
>
> My patch also does not break memremap(). With my patch applied
> try_ram_remap() would return a linear addr for nomap regions. But this
> is only called if WB is explicitly requested, so it should not happen.

Why? MEMREMAP_WB is used often, among other things for mapping
firmware tables, which are marked as NOMAP, so in these cases, the
linear address is not mapped.

> If you think pfn_valid() is wrong here, I am happy to send a patch
> that fixes this by using page_is_ram(). In any case, the worst case
> that may happen is to behave the same as v4.4, we might fix then the
> wrong use of pfn_valid() where it is not correctly used to check for
> ram.
>

page_is_ram() uses string comparisons to look for regions called
'System RAM'. Is that something we can tolerate for each pfn_valid()
call?

Perhaps the solution is to reimplement page_is_ram() for arm64 using
memblock_is_memory() instead. But that still means we need to modify
the generic memremap() code first to switch to it before changing the
arm64 implementation of pfn_valid().


Re: [PATCH] arm64: mm: Fix memmap to be initialized for the entire section

2016-11-20 Thread Ard Biesheuvel
On 17 November 2016 at 15:18, Robert Richter  wrote:
> Thanks for your answer.
>
> On 17.11.16 14:25:29, Will Deacon wrote:
>> On Wed, Nov 09, 2016 at 08:51:32PM +0100, Robert Richter wrote:
>> > Thus, I don't see where my patch breaks code. Even acpi_os_ioremap()
>> > keeps the same behaviour as before since it still uses memblock_is_
>> > memory(). Could you more describe your concerns why do you think this
>> > patch breaks the kernel and moves the problem somewhere else? I
>> > believe it fixes the problem at all.
>>
>> acpi_os_ioremap always ends up in __ioremap_caller, regardless of
>> memblock_is_memory(). __ioremap_caller then fails if pfn_valid is true.
>
> But that's the reason my patch changed the code to use memblock_is_
> map_memory() instead. I was looking into the users of pfn_valid() esp.
> in arm64 code and changed it where required.
>
> This week I looked into the kernel again for code that might break by
> a pfn_valid() change. I found try_ram_remap() in memremap.c that has
> changed behaviour now, but this is explicit for MEMREMAP_WB, so it
> should be fine.
>
> Maybe it might be better to use page_is_ram() in addition to
> pfn_valid() where necessary. This should work now after commit:
>
>  e7cd190385d1 arm64: mark reserved memblock regions explicitly in iomem
>
> I still think pfn_valid() is not the correct use to determine the mem
> attributes for mappings, there are further checks required.
>
> The risk of breaking something with my patch is small and limited only
> to the mapping of efi reserved regions (which is the state of 4.4). If
> something breaks anyway it can easily be fixed by adding more checks
> to pfn_valid() as suggested above.
>

As I noted before, it looks to me like setting CONFIG_HOLES_IN_ZONE is
the correct way to address this. However, doing that does uncover a
bug in move_freepages() where the VM_BUG_ON_PAGE() dereferences struct
page fields before the pfn_valid_within() check, so it seems those
need to be switched around.
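The reordering described above — test pfn_valid_within() before touching any
struct page field — can be sketched in userspace. This is a simplified model
(a validity bitmap standing in for pfn_valid_within(), a tiny stand-in for
struct page), not the actual move_freepages() code:

```c
#include <assert.h>
#include <stddef.h>

struct page_sketch {
    int node;   /* stand-ins for the fields VM_BUG_ON_PAGE() reads */
    int zone;
};

/* Fixed ordering for the move_freepages() check: consult the validity
 * test *first*, so pfns in holes (with no initialized struct page
 * behind them) never reach the field dereference. The reported bug was
 * exactly the reverse order. */
static int page_in_zone(const struct page_sketch *map,
                        const unsigned char *valid,
                        unsigned long pfn, int zone)
{
    if (!valid[pfn])               /* pfn_valid_within() must come first */
        return 0;
    return map[pfn].zone == zone;  /* safe: struct page is initialized */
}
```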

Robert, you mentioned that CONFIG_HOLES_IN_ZONE seems inappropriate
for sparsemem. Care to elaborate why?


Re: [PATCH] arm64: mm: Fix memmap to be initialized for the entire section

2016-11-17 Thread Robert Richter
Thanks for your answer.

On 17.11.16 14:25:29, Will Deacon wrote:
> On Wed, Nov 09, 2016 at 08:51:32PM +0100, Robert Richter wrote:
> > Thus, I don't see where my patch breaks code. Even acpi_os_ioremap()
> > keeps the same behaviour as before since it still uses memblock_is_
> > memory(). Could you more describe your concerns why do you think this
> > patch breaks the kernel and moves the problem somewhere else? I
> > believe it fixes the problem at all.
> 
> acpi_os_ioremap always ends up in __ioremap_caller, regardless of
> memblock_is_memory(). __ioremap_caller then fails if pfn_valid is true.

But that's the reason my patch changed the code to use
memblock_is_map_memory() instead. I was looking into the users of
pfn_valid(), especially in arm64 code, and changed it where required.
This week I looked into the kernel again for code that might break
with a pfn_valid() change. I found try_ram_remap() in memremap.c,
which has changed behaviour now, but this path is explicit for
MEMREMAP_WB, so it should be fine.

It might be better to use page_is_ram() in addition to
pfn_valid() where necessary. This should work now after commit:

 e7cd190385d1 arm64: mark reserved memblock regions explicitly in iomem

I still think pfn_valid() is not the correct way to determine the
memory attributes for mappings; further checks are required.

The risk of breaking something with my patch is small and limited only
to the mapping of efi reserved regions (which is the state of 4.4). If
something breaks anyway it can easily be fixed by adding more checks
to pfn_valid() as suggested above.

-Robert


Re: [PATCH] arm64: mm: Fix memmap to be initialized for the entire section

2016-11-17 Thread Will Deacon
On Wed, Nov 09, 2016 at 08:51:32PM +0100, Robert Richter wrote:
> Thus, I don't see where my patch breaks code. Even acpi_os_ioremap()
> keeps the same behaviour as before since it still uses memblock_is_
> memory(). Could you more describe your concerns why do you think this
> patch breaks the kernel and moves the problem somewhere else? I
> believe it fixes the problem at all.

acpi_os_ioremap always ends up in __ioremap_caller, regardless of
memblock_is_memory(). __ioremap_caller then fails if pfn_valid is true.

Will


Re: [PATCH] arm64: mm: Fix memmap to be initialized for the entire section

2016-11-09 Thread Robert Richter
Will,

On 07.11.16 21:05:14, Will Deacon wrote:
> Just to reiterate here, but your patch as it stands will break other parts
> of the kernel. For example, acpi_os_ioremap relies on being able to ioremap
> these regions afaict.
> 
> I think any solution involving pfn_valid is just going to move the crash
> around.

Let me describe the crash in more detail.

Following range is marked nomap (full efi map below):

[0.00] efi:   0x010b9000-0x010ccfff [Runtime Code   |RUN|  
|  |  |  |  |  |   |WB|WT|WC|UC]*
[0.00] efi:   0x010cd000-0x010fefff [Runtime Data   |RUN|  
|  |  |  |  |  |   |WB|WT|WC|UC]*

The memory belongs to these nodes:

[0.00] NUMA: Adding memblock [0x140 - 0xf] on node 0
[0.00] NUMA: Adding memblock [0x140 - 0x10f] on node 1

The following mem ranges are created:

[0.00] Zone ranges:
[0.00]   DMA  [mem 0x0140-0x]
[0.00]   Normal   [mem 0x0001-0x010f]
[0.00] Movable zone start for each node
[0.00] Early memory node ranges
[0.00]   node   0: [mem 0x0140-0xfffd]
[0.00]   node   0: [mem 0xfffe-0x]
[0.00]   node   0: [mem 0x0001-0x000f]
[0.00]   node   1: [mem 0x0140-0x010ff9e7]
[0.00]   node   1: [mem 0x010ff9e8-0x010ff9f1]
[0.00]   node   1: [mem 0x010ff9f2-0x010ffaea]
[0.00]   node   1: [mem 0x010ffaeb-0x010ffaff]
[0.00]   node   1: [mem 0x010ffb00-0x010a]
[0.00]   node   1: [mem 0x010b-0x010f]

The last range 0x010b-0x010f is correctly
marked as a nomap area.

Paging is then initialized in free_area_init_core() (mm/page_alloc.c),
which calls memmap_init_zone(). This initializes all pages (struct
page) of the zones for each node except for pfns from nomap memory. It
uses early_pfn_valid(), which is bound to pfn_valid().

In 4.4 *all* pages of a zone were initialized. Now pages from nomap
ranges are skipped, and e.g. pfn 0x10b has an uninitialized struct
page.  Note that nomap mem ranges are still part of the memblock list
and also part of the zone ranges within start_pfn and end_pfn, but
those don't have a valid struct page now. IMO this is a bug.

Later on all mappable memory is reserved and all pages of it are freed
by adding them to the free-pages list except for the reserved
memblocks.

Now the initrd is loaded. This reserves free memory by calling
move_freepages_block(). A block has the size of pageblock_nr_pages,
which in my configuration is 0x2000. The kernel chooses the block

 0x010fe000-0x010f

that also contains the nomap region around 10ffb00.
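The block selection above is plain mask arithmetic: a pfn is rounded down to
its pageblock boundary, which is how a block containing free pages can drag
in the neighbouring nomap range. A one-function sketch (the macro value
matches the 0x2000 from this configuration; the function name is invented):

```c
#include <assert.h>

#define PAGEBLOCK_NR_PAGES 0x2000UL  /* pageblock_nr_pages in this config */

/* Round a pfn down to the start of its pageblock, as move_freepages_block()
 * effectively does before scanning the whole block. */
static unsigned long pageblock_start_pfn(unsigned long pfn)
{
    return pfn & ~(PAGEBLOCK_NR_PAGES - 1);
}
```

Rounding down the nomap pfn 0x10ffb00 lands on 0x10fe000, the very block the
kernel chose here.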

In move_freepages() the pages of the zone are freed. The code accesses
pfn #10f, but the struct page is uninitialized and thus its node is 0.
This differs from #10fe000 and the zone, which are on node 1. This
causes the BUG_ON() to trigger:

[4.173521] Unpacking initramfs...
[8.420710] [ cut here ]
[8.425344] kernel BUG at mm/page_alloc.c:1844!
[8.429870] Internal error: Oops - BUG: 0 [#1] SMP
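The failure sequence can be simulated in userspace: initialize every page of
a zone except the nomap hole, then run a move_freepages()-style scan over a
block overlapping the hole — the stale node id is exactly what the BUG_ON()
catches. This is a toy model with invented names, not the mm code itself:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct zpage { int node; };

/* Simulate memmap_init_zone(): pages in [start, end) get the zone's
 * node id, except pfns inside the nomap hole, which stay zeroed --
 * the stale state described in the report. */
static void init_zone_pages(struct zpage *map, unsigned long start,
                            unsigned long end, unsigned long hole_start,
                            unsigned long hole_end, int node)
{
    memset(map + start, 0, (end - start) * sizeof(*map));
    for (unsigned long pfn = start; pfn < end; pfn++)
        if (pfn < hole_start || pfn >= hole_end)
            map[pfn].node = node;
}

/* move_freepages()-style scan over one block: return the first pfn
 * whose node id disagrees with the zone's node (the pfn the BUG_ON()
 * would trip on), or 0 if the whole block is consistent. */
static unsigned long find_node_mismatch(const struct zpage *map,
                                        unsigned long block_start,
                                        unsigned long block_end, int node)
{
    for (unsigned long pfn = block_start; pfn < block_end; pfn++)
        if (map[pfn].node != node)
            return pfn;
    return 0;
}
```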

I believe this is not the only case where the mm code relies on a
valid struct page for all pfns of a zone, so modifying mm code to
make this case work is not an option. (E.g. it could be done by
moving the early_pfn_valid() check to the beginning of the loop in
memmap_init_zone().)

So for a fix we need to change early_pfn_valid(), which is mapped to
pfn_valid() for SPARSEMEM. Using a different function than pfn_valid()
for this does not look reasonable. The only option is to revert
pfn_valid() to its original behaviour.

My fix now uses memblock_is_memory() again for pfn_valid() instead of
memblock_is_map_memory(). So I needed to change some users of
pfn_valid() to use memblock_is_map_memory() where necessary. This is
only required for arch-specific code; all other uses of pfn_valid()
are safe to use memblock_is_memory().
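The distinction the fix turns on — memblock_is_memory() matches any memory
region, while memblock_is_map_memory() additionally rejects regions flagged
NOMAP — can be sketched as two interval lookups. The flag value 0x4 is taken
from the memblock dump below; struct and function names are simplified
stand-ins for the memblock API:

```c
#include <assert.h>
#include <stdint.h>

#define MEMBLOCK_NOMAP 0x4  /* the "flags: 0x4" regions in the dump below */

struct mb_region { uint64_t base, size; unsigned flags; };

/* memblock_is_memory() sketch: any memory region matches, NOMAP or not. */
static int mb_is_memory(const struct mb_region *r, int cnt, uint64_t addr)
{
    for (int i = 0; i < cnt; i++)
        if (addr >= r[i].base && addr < r[i].base + r[i].size)
            return 1;
    return 0;
}

/* memblock_is_map_memory() sketch: additionally reject NOMAP regions. */
static int mb_is_map_memory(const struct mb_region *r, int cnt, uint64_t addr)
{
    for (int i = 0; i < cnt; i++)
        if (addr >= r[i].base && addr < r[i].base + r[i].size)
            return !(r[i].flags & MEMBLOCK_NOMAP);
    return 0;
}
```

Under the proposed fix, pfn_valid() would answer like mb_is_memory() (a
struct page exists), while mapping decisions would use mb_is_map_memory().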

Thus, I don't see where my patch breaks code. Even acpi_os_ioremap()
keeps the same behaviour as before since it still uses
memblock_is_memory(). Could you describe in more detail why you think
this patch breaks the kernel and moves the problem somewhere else? I
believe it fixes the problem completely.

Thanks,

-Robert



[0.00] efi: Processing EFI memory map:
[0.00] MEMBLOCK configuration:
[0.00]  memory size = 0x1ffe80 reserved size = 0x10537
[0.00]  memory.cnt  = 0x2
[0.00]  memory[0x0] [0x000140-0x0f], 
0xffec0 bytes flags: 0x0
[0.00]  memory[0x1] [0x000140-0x00010f], 
0xfffc0 bytes flags: 0x0
[0.00]  reserved.cnt  = 0x1
[0.00]  reserved[0x0]   [0x002120-0x0021210536], 0x10537 
bytes 


Re: [PATCH] arm64: mm: Fix memmap to be initialized for the entire section

2016-11-07 Thread Will Deacon
On Fri, Oct 28, 2016 at 11:19:05AM +0200, Robert Richter wrote:
> On 27.10.16 17:01:36, Will Deacon wrote:
> > It feels to me like NOMAP memory is a new type
> > of memory where there *is* a struct page, but it shouldn't be used for
> > anything.
> 
> IMO, a NOMAP page should just be handled like a reserved page except
> that the page is marked reserved. See free_low_memory_core_early().
> Thus, NOMAP pages are not in the free pages list or set to reserved.
> It is simply not available for mapping at all. Isn't that exactly what
> it should be?
> 
> I also did not yet understand the benefit of the differentiation
> between NOMAP and reserved and the original motivation for its
> implementation. I looked through the mail threads but could not find
> any hint. The only difference I see now is that it is not listed as a
> reserved page, but as long as it is not freed it should behave the
> same. I remember the case to handle memory different (coherency,
> etc.), but are not sure here. Ard, could you explain this?
> 
> > I don't think pfn_valid can describe that, given the way it's
> > currently used, and flipping the logic is just likely to move the problem
> > elsewhere.
> > 
> > What options do we have for fixing this in the NUMA code?
> 
> Out of my mind:
> 
> 1) Treat NOMAP pages same as reserved pages (my patch).

Just to reiterate here, but your patch as it stands will break other parts
of the kernel. For example, acpi_os_ioremap relies on being able to ioremap
these regions afaict.

I think any solution involving pfn_valid is just going to move the crash
around.

Will


Re: [PATCH] arm64: mm: Fix memmap to be initialized for the entire section

2016-11-01 Thread Robert Richter
On 06.10.16 11:52:07, Robert Richter wrote:
> There is a memory setup problem on ThunderX systems with certain
> memory configurations. The symptom is
> 
>  kernel BUG at mm/page_alloc.c:1848!
> 
> This happens for some configs with 64k page size enabled. The bug
> triggers for page zones with some pages in the zone not assigned to
> this particular zone. In my case some pages that are marked as nomap
> were not reassigned to the new zone of node 1, so those are still
> assigned to node 0.
> 
> The reason for the mis-configuration is a change in pfn_valid() which
> reports pages marked nomap as invalid:
> 
>  68709f45385a arm64: only consider memblocks with NOMAP cleared for linear 
> mapping
> 
> This causes pages marked as nomap being no long reassigned to the new
> zone in memmap_init_zone() by calling __init_single_pfn().
> 
> Fixing this by restoring the old behavior of pfn_valid() to use
> memblock_is_memory(). Also changing users of pfn_valid() in arm64 code
> to use memblock_is_map_memory() where necessary. This only affects
> code in ioremap.c. The code in mmu.c still can use the new version of
> pfn_valid().

Below a reproducer for non-numa systems. Note that invalidating the
node id just simulates a different node in reality.

The patch injects a (pageblock_order) unaligned NOMAP mem range at the
end of a memory block and then tries to free that area. This causes a
BUG_ON() (log attached).

-Robert



From 20d853e300c99be5420c7ee3f072c318804cac1b Mon Sep 17 00:00:00 2001
From: root 
Date: Tue, 1 Nov 2016 15:04:43 +
Subject: [PATCH] mm-fault-reproducer

Signed-off-by: root 
---
 arch/arm64/mm/init.c | 78 
 mm/page_alloc.c  |  4 ++-
 2 files changed, 81 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 21c489bdeb4e..feaa7ab97551 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -301,6 +302,80 @@ void __init arm64_memblock_init(void)
memblock_allow_resize();
 }
 
+static struct page *inject_pageblock;
+
+static void __init inject_nomap_create(void)
+{
+   phys_addr_t start, end;
+   unsigned long start_pfn, end_pfn;
+   u64 i;
+   int ret = -ENOMEM;
+
+   pr_info("%s: PAGES_PER_SECTION=%08lx pageblock_nr_pages=%08lx 
PAGE_SIZE=%08lx\n",
+   __func__, PAGES_PER_SECTION, pageblock_nr_pages, PAGE_SIZE);
+
+   /*
+* find a mem range with a complete pageblock in it
+*/
+   for_each_free_mem_range(i, NUMA_NO_NODE, MEMBLOCK_NONE, &start, &end,
+   NULL) {
+   start_pfn = PFN_DOWN(start);
+   end_pfn = PFN_UP(end);
+   if  (end_pfn - (start_pfn & ~(pageblock_nr_pages-1)) > 2 * 
pageblock_nr_pages)
+   break;
+   }
+
+   if (i == ULLONG_MAX)
+   goto fail;
+
+   start = PFN_PHYS(start_pfn);
+   end = PFN_PHYS(end_pfn) - 1;
+
+   pr_info("%s: Injecting into range: [%pa-%pa]\n", __func__, &start, &end);
+
+   /* mark the upper 5 pages nomap of a complete pageblock */
+   start_pfn = end_pfn & ~(pageblock_nr_pages-1);
+   start_pfn -= 5; /* unalign by 5 pages */
+
+   start = PFN_PHYS(start_pfn);
+   end = PFN_PHYS(end_pfn) - 1;
+
+   ret = memblock_mark_nomap(start, end - start + 1);
+   if (ret)
+   goto fail;
+
+   inject_pageblock = pfn_to_page(start_pfn & ~(pageblock_nr_pages-1));
+
+   pr_info("%s: Injected nomap range at: [%pa-%pa] zones: %p %p\n", 
__func__,
+   , , page_zone(inject_pageblock),
+   page_zone(inject_pageblock + pageblock_nr_pages - 1));
+
+   return;
+fail:
+   pr_err("%s: Could not inject_unaligned_range: %d\n", __func__, ret);
+}
+
+static void __init inject_nomap_move(void)
+{
+   phys_addr_t start, end;
+   int ret;
+
+   if (!inject_pageblock)
+   return;
+
+   start = PFN_PHYS(page_to_pfn(inject_pageblock));
+   end = PFN_PHYS(page_to_pfn(inject_pageblock) + pageblock_nr_pages) - 1;
+
+   pr_info("%s: Moving [%pa-%pa] zones: %p %p\n", __func__,
+   , , page_zone(inject_pageblock),
+   page_zone(inject_pageblock + pageblock_nr_pages - 1));
+
+   ret = move_freepages_block(page_zone(inject_pageblock),
+   inject_pageblock,
+   gfpflags_to_migratetype(GFP_KERNEL));
+
+   pr_info("%s: Moved %d pages\n", __func__, ret);
+}
+
 void __init bootmem_init(void)
 {
unsigned long min, max;
@@ -320,6 +395,7 @@ void __init bootmem_init(void)
arm64_memory_present();
 
sparse_init();
+   inject_nomap_create();
zone_sizes_init(min, max);
 
high_memory = __va((max << PAGE_SHIFT) - 1) + 1;
@@ -479,6 +555,8 @@ void __init mem_init(void)
 */

Re: [PATCH] arm64: mm: Fix memmap to be initialized for the entire section

2016-11-01 Thread Robert Richter
On 06.10.16 11:52:07, Robert Richter wrote:
> There is a memory setup problem on ThunderX systems with certain
> memory configurations. The symptom is
> 
>  kernel BUG at mm/page_alloc.c:1848!
> 
> This happens for some configs with 64k page size enabled. The bug
> triggers for page zones with some pages in the zone not assigned to
> this particular zone. In my case some pages that are marked as nomap
> were not reassigned to the new zone of node 1, so those are still
> assigned to node 0.
> 
> The reason for the mis-configuration is a change in pfn_valid() which
> reports pages marked nomap as invalid:
> 
>  68709f45385a arm64: only consider memblocks with NOMAP cleared for linear 
> mapping
> 
> This causes pages marked as nomap being no longer reassigned to the new
> zone in memmap_init_zone() by calling __init_single_pfn().
> 
> Fixing this by restoring the old behavior of pfn_valid() to use
> memblock_is_memory(). Also changing users of pfn_valid() in arm64 code
> to use memblock_is_map_memory() where necessary. This only affects
> code in ioremap.c. The code in mmu.c still can use the new version of
> pfn_valid().

Below is a reproducer for non-NUMA systems. Note that invalidating the
node id merely simulates what would be a different node on real hardware.

The patch injects a (pageblock_order) unaligned NOMAP mem range at the
end of a memory block and then tries to free that area. This causes a
BUG_ON() (log attached).

-Robert



From 20d853e300c99be5420c7ee3f072c318804cac1b Mon Sep 17 00:00:00 2001
From: root 
Date: Tue, 1 Nov 2016 15:04:43 +
Subject: [PATCH] mm-fault-reproducer

Signed-off-by: root 
---
 arch/arm64/mm/init.c | 78 
 mm/page_alloc.c  |  4 ++-
 2 files changed, 81 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 21c489bdeb4e..feaa7ab97551 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -301,6 +302,80 @@ void __init arm64_memblock_init(void)
memblock_allow_resize();
 }
 
+static struct page *inject_pageblock;
+
+static void __init inject_nomap_create(void)
+{
+   phys_addr_t start, end;
+   unsigned long start_pfn, end_pfn;
+   u64 i;
+   int ret = -ENOMEM;
+
+	pr_info("%s: PAGES_PER_SECTION=%08lx pageblock_nr_pages=%08lx PAGE_SIZE=%08lx\n",
+		__func__, PAGES_PER_SECTION, pageblock_nr_pages, PAGE_SIZE);
+
+   /*
+	 * find a mem range with a complete pageblock in it
+*/
+	for_each_free_mem_range(i, NUMA_NO_NODE, MEMBLOCK_NONE, &start, &end, NULL) {
+   start_pfn = PFN_DOWN(start);
+   end_pfn = PFN_UP(end);
+		if (end_pfn - (start_pfn & ~(pageblock_nr_pages-1)) > 2 * pageblock_nr_pages)
+			break;
+   }
+
+   if (i == ULLONG_MAX)
+   goto fail;
+
+   start = PFN_PHYS(start_pfn);
+   end = PFN_PHYS(end_pfn) - 1;
+
+	pr_info("%s: Injecting into range: [%pa-%pa]\n", __func__, &start, &end);
+
+   /* mark the upper 5 pages nomap of a complete pageblock */
+   start_pfn = end_pfn & ~(pageblock_nr_pages-1);
+   start_pfn -= 5; /* unalign by 5 pages */
+
+   start = PFN_PHYS(start_pfn);
+   end = PFN_PHYS(end_pfn) - 1;
+
+   ret = memblock_mark_nomap(start, end - start + 1);
+   if (ret)
+   goto fail;
+
+   inject_pageblock = pfn_to_page(start_pfn & ~(pageblock_nr_pages-1));
+
+	pr_info("%s: Injected nomap range at: [%pa-%pa] zones: %p %p\n", __func__,
+		&start, &end, page_zone(inject_pageblock),
+		page_zone(inject_pageblock + pageblock_nr_pages - 1));
+
+   return;
+fail:
+   pr_err("%s: Could not inject_unaligned_range: %d\n", __func__, ret);
+}
+
+static void __init inject_nomap_move(void)
+{
+   phys_addr_t start, end;
+   int ret;
+
+   if (!inject_pageblock)
+   return;
+
+   start = PFN_PHYS(page_to_pfn(inject_pageblock));
+   end = PFN_PHYS(page_to_pfn(inject_pageblock) + pageblock_nr_pages) - 1;
+
+   pr_info("%s: Moving [%pa-%pa] zones: %p %p\n", __func__,
+		&start, &end, page_zone(inject_pageblock),
+   page_zone(inject_pageblock + pageblock_nr_pages - 1));
+
+   ret = move_freepages_block(page_zone(inject_pageblock),
+   inject_pageblock,
+   gfpflags_to_migratetype(GFP_KERNEL));
+
+   pr_info("%s: Moved %d pages\n", __func__, ret);
+}
+
 void __init bootmem_init(void)
 {
unsigned long min, max;
@@ -320,6 +395,7 @@ void __init bootmem_init(void)
arm64_memory_present();
 
sparse_init();
+   inject_nomap_create();
zone_sizes_init(min, max);
 
high_memory = __va((max << PAGE_SHIFT) - 1) + 1;
@@ -479,6 +555,8 @@ void __init mem_init(void)
 */
sysctl_overcommit_memory = 

Re: [PATCH] arm64: mm: Fix memmap to be initialized for the entire section

2016-10-28 Thread Robert Richter
On 27.10.16 17:01:36, Will Deacon wrote:
> Hi Robert,
> 
> On Mon, Oct 17, 2016 at 08:58:01PM +0200, Robert Richter wrote:
> > Mark, Will, any opinion here?
> 
> Having looked at this, I'm inclined to agree with you; pfn_valid() is
> all about whether the underlying mem_map (struct page *) entry exists,
> not about whether the page is mappable or not.
> 
> That said, setting the zone for pages representing NOMAP memory feels
> like a slippery slope to losing information about them being NOMAP in
> the first place and the whole problem getting out-of-hand. Whilst I'm
> happy for pfn_valid() to return true (in the sense that we're within
> bounds of mem_map etc), I'm less happy that we're also saying that the
> struct page contains useful information, such as the zone and the node
> information, which is then subsequently used by the NUMA code.

Let's look at it in a different way: pfns and the struct page assigned
to each of them are about *physical* memory. The system knows about all
the memory; some is free, some reserved, and some marked NOMAP.
Regardless of a page's mapping, the mm code maintains and uses that
information.

There are validity assumptions and checks in the code that now cause
problems due to partial or non-existent data about nomap pages. This
inconsistency is dangerous since a problem may occur whenever the page
area is first accessed, so a system may crash randomly depending on the
memory access. Luckily, in my case it triggered reproducibly while
loading the initrd during boot.

I also think that this is not only NUMA related. E.g. the following
bug report is probably also related:

 https://bugzilla.redhat.com/show_bug.cgi?id=1387793

> On top of that, pfn_valid is used in other places as a coarse "is this
> memory?" check, and will cause things like ioremap to fail whereas it
> wouldn't at the moment.

IMO this is a misuse of pfn_valid() that needs to be fixed with
additional checks, e.g. traversing memblocks.

> It feels to me like NOMAP memory is a new type
> of memory where there *is* a struct page, but it shouldn't be used for
> anything.

IMO, a NOMAP page should just be handled like a reserved page, except
that the page is not marked reserved. See free_low_memory_core_early().
Thus, NOMAP pages are neither in the free pages list nor set to
reserved; the memory is simply not available for mapping at all. Isn't
that exactly what it should be?

I also did not yet understand the benefit of the differentiation
between NOMAP and reserved and the original motivation for its
implementation. I looked through the mail threads but could not find
any hint. The only difference I see now is that it is not listed as a
reserved page, but as long as it is not freed it should behave the
same. I remember the case for handling such memory differently
(coherency, etc.), but am not sure here. Ard, could you explain this?

> I don't think pfn_valid can describe that, given the way it's
> currently used, and flipping the logic is just likely to move the problem
> elsewhere.
> 
> What options do we have for fixing this in the NUMA code?

Out of my mind:

1) Treat NOMAP pages same as reserved pages (my patch).

2) Change mm code to allow arch specific early_pfn_valid().

3) Fix mm code to only access struct page (of a zone) if pfn_valid() is
   true.

There can be more alternatives. IMO:

 * We shouldn't touch generic mm code.

 * We should maintain a valid struct page for all pages in a section.

 * We should only traverse memblock where really necessary (arm64
   only).

 * I don't think this problem is NUMA-specific.

-Robert


Re: [PATCH] arm64: mm: Fix memmap to be initialized for the entire section

2016-10-27 Thread Will Deacon
Hi Robert,

On Mon, Oct 17, 2016 at 08:58:01PM +0200, Robert Richter wrote:
> Mark, Will, any opinion here?

Having looked at this, I'm inclined to agree with you; pfn_valid() is
all about whether the underlying mem_map (struct page *) entry exists,
not about whether the page is mappable or not.

That said, setting the zone for pages representing NOMAP memory feels
like a slippery slope to losing information about them being NOMAP in
the first place and the whole problem getting out-of-hand. Whilst I'm
happy for pfn_valid() to return true (in the sense that we're within
bounds of mem_map etc), I'm less happy that we're also saying that the
struct page contains useful information, such as the zone and the node
information, which is then subsequently used by the NUMA code.

On top of that, pfn_valid is used in other places as a coarse "is this
memory?" check, and will cause things like ioremap to fail whereas it
wouldn't at the moment. It feels to me like NOMAP memory is a new type
of memory where there *is* a struct page, but it shouldn't be used for
anything. I don't think pfn_valid can describe that, given the way it's
currently used, and flipping the logic is just likely to move the problem
elsewhere.

What options do we have for fixing this in the NUMA code?

Will


Re: [PATCH] arm64: mm: Fix memmap to be initialized for the entire section

2016-10-18 Thread Robert Richter
Mark,

thanks for your answer. See below. I also attached the full crash
dump.

On 18.10.16 11:18:36, Mark Rutland wrote:
> Hi Robert, Ard,
> 
> Sorry for the delay in getting to this; I've been travelling a lot
> lately and in the meantime this managed to get buried in my inbox.
> 
> On Thu, Oct 06, 2016 at 06:11:14PM +0200, Robert Richter wrote:
> > On 06.10.16 11:00:33, Ard Biesheuvel wrote:
> > > On 6 October 2016 at 10:52, Robert Richter  wrote:
> > > > There is a memory setup problem on ThunderX systems with certain
> > > > memory configurations. The symptom is
> > > >
> > > >  kernel BUG at mm/page_alloc.c:1848!
> > > >
> > > > This happens for some configs with 64k page size enabled. The bug
> > > > triggers for page zones with some pages in the zone not assigned to
> > > > this particular zone. In my case some pages that are marked as nomap
> > > > were not reassigned to the new zone of node 1, so those are still
> > > > assigned to node 0.
> > > >
> > > > The reason for the mis-configuration is a change in pfn_valid() which
> > > > reports pages marked nomap as invalid:
> > > >
> > > >  68709f45385a arm64: only consider memblocks with NOMAP cleared for 
> > > > linear mapping
> > > 
> > > These pages are owned by the firmware, which may map it with
> > > attributes that conflict with the attributes we use for the linear
> > > mapping. This means they should not be covered by the linear mapping.
> > > 
> > > > This causes pages marked as nomap being no longer reassigned to the new
> > > > zone in memmap_init_zone() by calling __init_single_pfn().
> 
> Why do we have pages for a nomap region? Given the region shouldn't be
> in the linear mapping, and isn't suitable for general allocation, I
> don't believe it makes sense to have a struct page for any part of it.
> 
> Am I missing some reason that we require a struct page?
> 
> e.g. is it just easier to allocate an unused struct page than to carve
> it out?

Pages are handled in blocks of size MAX_ORDER_NR_PAGES. The start
and end pfn of a memory region are then aligned to fit the mem block
size (see memmap_init_zone()). Therefore a mem block may contain pages
without underlying physical memory.

mm code requires the whole memmap to be initialized; this means that
for each page in the whole mem block there is a valid struct page. See
e.g. move_freepages_block() and move_freepages(): struct page is
accessed even before pfn_valid() is used. I assume there are other
occurrences of that too.

My interpretation is that pfn_valid() checks for the existence of a
valid struct page; there need not necessarily be physical memory mapped
to it. This is the reason why I changed pfn_valid() to use
memblock_is_memory(), which is sufficient for generic mm code. Only in
arm64 mm code did I additionally add the memblock_is_map_memory() check
where pfn_valid() was used.

> 
> > > This sounds like the root cause of your issue. Could we not fix that 
> > > instead?
> > 
> > Yes, this is proposal b) from my last mail that would work too: I
> > implemented an arm64 private early_pfn_valid() function that uses
> > memblock_is_memory() to setup all pages of a zone. Though, I think
> > this is the wrong way and thus I prefer this patch instead. I see
> > several reasons for this:
> > 
> > Inconsistent use of struct *page, it is initialized but never used
> > again.
> 
> As above, I don't believe we should have a struct page to initialise in
> the first place.
> 
> > Other archs only do a basic range check in pfn_valid(), the default
> > implementation just returns if the whole section is valid. As I
> > understand the code, if the mem range is not aligned to the section,
> > then there will be pfn's in the section that don't have physical mem
> > attached. The page is then just initialized, it's not marked reserved
> > nor the refcount is non-zero. It is then simply not used. This is how
> > no-map pages should be handled too.
> > 
> > I think pfn_valid() is just a quick check if the pfn's struct *page
> > can be used. There is a good description for this in include/linux/
> > mmzone.h. So there can be memory holes that have a valid pfn.
> 
> I take it you mean the comment in the CONFIG_ARCH_HAS_HOLES_MEMORYMODEL
> ifdef (line 1266 in v4.9-rc1)?

Yes.

> 
> I'm not sufficiently acquainted with the memmap code to follow; I'll
> need to dig into that a bit further.
> 
> > If the no-map memory needs special handling, then additional checks
> > need to be added to the particular code (as in ioremap.c). It's imo
> > wrong to (mis-)use pfn_valid for that.
> > 
> > Variant b) involves generic mm code to fix it for arm64, this patch is
> > an arm64 change only. This makes it harder to get a fix for it.
> > (Though maybe only a problem of patch logistics.)
> > 
> > > > Fixing this by restoring the old behavior of pfn_valid() to use
> > > > memblock_is_memory().
> > > 
> > > This is incorrect imo. In general, pfn_valid() means ordinary memory
> > > covered by the linear 

Re: [PATCH] arm64: mm: Fix memmap to be initialized for the entire section

2016-10-18 Thread Mark Rutland
Hi Robert, Ard,

Sorry for the delay in getting to this; I've been travelling a lot
lately and in the meantime this managed to get buried in my inbox.

On Thu, Oct 06, 2016 at 06:11:14PM +0200, Robert Richter wrote:
> On 06.10.16 11:00:33, Ard Biesheuvel wrote:
> > On 6 October 2016 at 10:52, Robert Richter  wrote:
> > > There is a memory setup problem on ThunderX systems with certain
> > > memory configurations. The symptom is
> > >
> > >  kernel BUG at mm/page_alloc.c:1848!
> > >
> > > This happens for some configs with 64k page size enabled. The bug
> > > triggers for page zones with some pages in the zone not assigned to
> > > this particular zone. In my case some pages that are marked as nomap
> > > were not reassigned to the new zone of node 1, so those are still
> > > assigned to node 0.
> > >
> > > The reason for the mis-configuration is a change in pfn_valid() which
> > > reports pages marked nomap as invalid:
> > >
> > >  68709f45385a arm64: only consider memblocks with NOMAP cleared for 
> > > linear mapping
> > 
> > These pages are owned by the firmware, which may map it with
> > attributes that conflict with the attributes we use for the linear
> > mapping. This means they should not be covered by the linear mapping.
> > 
> > > This causes pages marked as nomap being no longer reassigned to the new
> > > zone in memmap_init_zone() by calling __init_single_pfn().

Why do we have pages for a nomap region? Given the region shouldn't be
in the linear mapping, and isn't suitable for general allocation, I
don't believe it makes sense to have a struct page for any part of it.

Am I missing some reason that we require a struct page?

e.g. is it just easier to allocate an unused struct page than to carve
it out?

> > This sounds like the root cause of your issue. Could we not fix that 
> > instead?
> 
> Yes, this is proposal b) from my last mail that would work too: I
> implemented an arm64 private early_pfn_valid() function that uses
> memblock_is_memory() to setup all pages of a zone. Though, I think
> this is the wrong way and thus I prefer this patch instead. I see
> several reasons for this:
> 
> Inconsistent use of struct *page, it is initialized but never used
> again.

As above, I don't believe we should have a struct page to initialise in
the first place.

> Other archs only do a basic range check in pfn_valid(), the default
> implementation just returns if the whole section is valid. As I
> understand the code, if the mem range is not aligned to the section,
> then there will be pfn's in the section that don't have physical mem
> attached. The page is then just initialized, it's not marked reserved
> nor the refcount is non-zero. It is then simply not used. This is how
> no-map pages should be handled too.
> 
> I think pfn_valid() is just a quick check if the pfn's struct *page
> can be used. There is a good description for this in include/linux/
> mmzone.h. So there can be memory holes that have a valid pfn.

I take it you mean the comment in the CONFIG_ARCH_HAS_HOLES_MEMORYMODEL
ifdef (line 1266 in v4.9-rc1)?

I'm not sufficiently acquainted with the memmap code to follow; I'll
need to dig into that a bit further.

> If the no-map memory needs special handling, then additional checks
> need to be added to the particular code (as in ioremap.c). It's imo
> wrong to (mis-)use pfn_valid for that.
> 
> Variant b) involves generic mm code to fix it for arm64, this patch is
> an arm64 change only. This makes it harder to get a fix for it.
> (Though maybe only a problem of patch logistics.)
> 
> > > Fixing this by restoring the old behavior of pfn_valid() to use
> > > memblock_is_memory().
> > 
> > This is incorrect imo. In general, pfn_valid() means ordinary memory
> > covered by the linear mapping and the struct page array. Returning
> > reserved ranges that the kernel should not even touch only to please
> > the NUMA code seems like an inappropriate way to deal with this issue.
> 
> As said above, it is not marked as reserved, it is treated like
> non-existing memory.

I think Ard was using "reserved" in the more general sense than the
Linux-specific meaning. NOMAP is distinct from the Linux concept of
"reserved" memory, but is "reserved" in some sense.

Memory with NOMAP is meant to be treated as non-existent for the purpose
of the linear mapping (and thus for the purpose of struct page).

> This has been observed for non-numa kernels too and can happen for
> each zone that is only partly initialized.
> 
> I think the patch addresses your concerns. I can't see where the
> kernel uses memory marked as nomap in a wrong way.

I'll have to dig into this locally; I'm still not familiar enough with
this code to know what the right thing to do is.

Thanks,
Mark.


Re: [PATCH] arm64: mm: Fix memmap to be initialized for the entire section

2016-10-17 Thread Robert Richter
Mark, Will, any opinion here?

Thanks,

-Robert

On 06.10.16 18:11:14, Robert Richter wrote:
> Ard,
> 
> thank you for your answer and your explanation.
> 
> On 06.10.16 11:00:33, Ard Biesheuvel wrote:
> > On 6 October 2016 at 10:52, Robert Richter  wrote:
> > > There is a memory setup problem on ThunderX systems with certain
> > > memory configurations. The symptom is
> > >
> > >  kernel BUG at mm/page_alloc.c:1848!
> > >
> > > This happens for some configs with 64k page size enabled. The bug
> > > triggers for page zones with some pages in the zone not assigned to
> > > this particular zone. In my case some pages that are marked as nomap
> > > were not reassigned to the new zone of node 1, so those are still
> > > assigned to node 0.
> > >
> > > The reason for the mis-configuration is a change in pfn_valid() which
> > > reports pages marked nomap as invalid:
> > >
> > >  68709f45385a arm64: only consider memblocks with NOMAP cleared for 
> > > linear mapping
> > >
> > 
> > These pages are owned by the firmware, which may map them with
> > attributes that conflict with the attributes we use for the linear
> > mapping. This means they should not be covered by the linear mapping.
> > 
> > This causes pages marked as nomap to no longer be reassigned to the new
> > zone in memmap_init_zone() by calling __init_single_pfn().
> > >
> > 
> > This sounds like the root cause of your issue. Could we not fix that 
> > instead?
> 
> Yes, this is proposal b) from my last mail that would work too: I
> implemented an arm64 private early_pfn_valid() function that uses
> memblock_is_memory() to setup all pages of a zone. Though, I think
> this is the wrong way and thus I prefer this patch instead. I see
> several reasons for this:
> 
> Inconsistent use of struct page: it is initialized but never used
> again.
> 
> Other archs only do a basic range check in pfn_valid(); the default
> implementation just returns whether the whole section is valid. As I
> understand the code, if the mem range is not aligned to the section,
> then there will be pfn's in the section that don't have physical mem
> attached. The page is then just initialized; it is neither marked
> reserved nor given a non-zero refcount. It is then simply not used.
> This is how no-map pages should be handled too.
> 
> I think pfn_valid() is just a quick check whether the pfn's struct
> page can be used. There is a good description for this in
> include/linux/mmzone.h. So there can be memory holes that have a valid pfn.
> 
> If the no-map memory needs special handling, then additional checks
> need to be added to the particular code (as in ioremap.c). It's imo
> wrong to (mis-)use pfn_valid for that.
> 
> Variant b) involves generic mm code to fix it for arm64, this patch is
> an arm64 change only. This makes it harder to get a fix for it.
> (Though maybe only a problem of patch logistics.)
> 
> > 
> > > Fixing this by restoring the old behavior of pfn_valid() to use
> > > memblock_is_memory().
> > 
> > This is incorrect imo. In general, pfn_valid() means ordinary memory
> > covered by the linear mapping and the struct page array. Returning
> > reserved ranges that the kernel should not even touch only to please
> > the NUMA code seems like an inappropriate way to deal with this issue.
> 
> As said above, it is not marked as reserved, it is treated like
> non-existent memory.
> 
> This has been observed for non-numa kernels too and can happen for
> each zone that is only partly initialized.
> 
> I think the patch addresses your concerns. I can't see where the
> kernel uses memory marked as nomap in a wrong way.
> 
> Thanks,
> 
> -Robert
> 
> > 
> > > Also changing users of pfn_valid() in arm64 code
> > > to use memblock_is_map_memory() where necessary. This only affects
> > > code in ioremap.c. The code in mmu.c still can use the new version of
> > > pfn_valid().
> > >
> > > Should be marked stable v4.5..
> > >
> > > Signed-off-by: Robert Richter 
> > > ---
> > >  arch/arm64/mm/init.c| 2 +-
> > >  arch/arm64/mm/ioremap.c | 5 +++--
> > >  2 files changed, 4 insertions(+), 3 deletions(-)
> --
> To unsubscribe from this list: send the line "unsubscribe linux-efi" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] arm64: mm: Fix memmap to be initialized for the entire section

2016-10-10 Thread David Daney

On 10/06/2016 02:52 AM, Robert Richter wrote:

There is a memory setup problem on ThunderX systems with certain
memory configurations. The symptom is

  kernel BUG at mm/page_alloc.c:1848!

This happens for some configs with 64k page size enabled. The bug
triggers for page zones with some pages in the zone not assigned to
this particular zone. In my case some pages that are marked as nomap
were not reassigned to the new zone of node 1, so those are still
assigned to node 0.

The reason for the mis-configuration is a change in pfn_valid() which
reports pages marked nomap as invalid:

  68709f45385a arm64: only consider memblocks with NOMAP cleared for linear 
mapping

This causes pages marked as nomap to no longer be reassigned to the new
zone in memmap_init_zone() by calling __init_single_pfn().

Fixing this by restoring the old behavior of pfn_valid() to use
memblock_is_memory(). Also changing users of pfn_valid() in arm64 code
to use memblock_is_map_memory() where necessary. This only affects
code in ioremap.c. The code in mmu.c still can use the new version of
pfn_valid().

Should be marked stable v4.5..


In that case you should add:

Cc:  # 4.5.x-

here.




Signed-off-by: Robert Richter 

[...]


Re: [PATCH] arm64: mm: Fix memmap to be initialized for the entire section

2016-10-06 Thread Robert Richter
Ard,

thank you for your answer and your explanation.

On 06.10.16 11:00:33, Ard Biesheuvel wrote:
> On 6 October 2016 at 10:52, Robert Richter  wrote:
> > There is a memory setup problem on ThunderX systems with certain
> > memory configurations. The symptom is
> >
> >  kernel BUG at mm/page_alloc.c:1848!
> >
> > This happens for some configs with 64k page size enabled. The bug
> > triggers for page zones with some pages in the zone not assigned to
> > this particular zone. In my case some pages that are marked as nomap
> > were not reassigned to the new zone of node 1, so those are still
> > assigned to node 0.
> >
> > The reason for the mis-configuration is a change in pfn_valid() which
> > reports pages marked nomap as invalid:
> >
> >  68709f45385a arm64: only consider memblocks with NOMAP cleared for linear 
> > mapping
> >
> 
> These pages are owned by the firmware, which may map them with
> attributes that conflict with the attributes we use for the linear
> mapping. This means they should not be covered by the linear mapping.
> 
> This causes pages marked as nomap to no longer be reassigned to the new
> zone in memmap_init_zone() by calling __init_single_pfn().
> >
> 
> This sounds like the root cause of your issue. Could we not fix that instead?

Yes, this is proposal b) from my last mail that would work too: I
implemented an arm64 private early_pfn_valid() function that uses
memblock_is_memory() to setup all pages of a zone. Though, I think
this is the wrong way and thus I prefer this patch instead. I see
several reasons for this:

Inconsistent use of struct page: it is initialized but never used
again.

Other archs only do a basic range check in pfn_valid(); the default
implementation just returns whether the whole section is valid. As I
understand the code, if the mem range is not aligned to the section,
then there will be pfn's in the section that don't have physical mem
attached. The page is then just initialized; it is neither marked
reserved nor given a non-zero refcount. It is then simply not used.
This is how no-map pages should be handled too.

I think pfn_valid() is just a quick check whether the pfn's struct
page can be used. There is a good description for this in
include/linux/mmzone.h. So there can be memory holes that have a valid pfn.

If the no-map memory needs special handling, then additional checks
need to be added to the particular code (as in ioremap.c). It's imo
wrong to (mis-)use pfn_valid for that.

Variant b) involves generic mm code to fix it for arm64, this patch is
an arm64 change only. This makes it harder to get a fix for it.
(Though maybe only a problem of patch logistics.)

> 
> > Fixing this by restoring the old behavior of pfn_valid() to use
> > memblock_is_memory().
> 
> This is incorrect imo. In general, pfn_valid() means ordinary memory
> covered by the linear mapping and the struct page array. Returning
> reserved ranges that the kernel should not even touch only to please
> the NUMA code seems like an inappropriate way to deal with this issue.

As said above, it is not marked as reserved, it is treated like
non-existent memory.

This has been observed for non-numa kernels too and can happen for
each zone that is only partly initialized.

I think the patch addresses your concerns. I can't see where the
kernel uses memory marked as nomap in a wrong way.

Thanks,

-Robert

> 
> > Also changing users of pfn_valid() in arm64 code
> > to use memblock_is_map_memory() where necessary. This only affects
> > code in ioremap.c. The code in mmu.c still can use the new version of
> > pfn_valid().
> >
> > Should be marked stable v4.5..
> >
> > Signed-off-by: Robert Richter 
> > ---
> >  arch/arm64/mm/init.c| 2 +-
> >  arch/arm64/mm/ioremap.c | 5 +++--
> >  2 files changed, 4 insertions(+), 3 deletions(-)


Re: [PATCH] arm64: mm: Fix memmap to be initialized for the entire section

2016-10-06 Thread Ard Biesheuvel
Hi Robert,

Apologies for only responding now. I did not quite manage to get my
head around your original email yet, but I don't think this patch is
the correct solution.

On 6 October 2016 at 10:52, Robert Richter  wrote:
> There is a memory setup problem on ThunderX systems with certain
> memory configurations. The symptom is
>
>  kernel BUG at mm/page_alloc.c:1848!
>
> This happens for some configs with 64k page size enabled. The bug
> triggers for page zones with some pages in the zone not assigned to
> this particular zone. In my case some pages that are marked as nomap
> were not reassigned to the new zone of node 1, so those are still
> assigned to node 0.
>
> The reason for the mis-configuration is a change in pfn_valid() which
> reports pages marked nomap as invalid:
>
>  68709f45385a arm64: only consider memblocks with NOMAP cleared for linear 
> mapping
>

These pages are owned by the firmware, which may map them with
attributes that conflict with the attributes we use for the linear
mapping. This means they should not be covered by the linear mapping.

> This causes pages marked as nomap to no longer be reassigned to the new
> zone in memmap_init_zone() by calling __init_single_pfn().
>

This sounds like the root cause of your issue. Could we not fix that instead?

> Fixing this by restoring the old behavior of pfn_valid() to use
> memblock_is_memory().

This is incorrect imo. In general, pfn_valid() means ordinary memory
covered by the linear mapping and the struct page array. Returning
reserved ranges that the kernel should not even touch only to please
the NUMA code seems like an inappropriate way to deal with this issue.

> Also changing users of pfn_valid() in arm64 code
> to use memblock_is_map_memory() where necessary. This only affects
> code in ioremap.c. The code in mmu.c still can use the new version of
> pfn_valid().
>
> Should be marked stable v4.5..
>
> Signed-off-by: Robert Richter 
> ---
>  arch/arm64/mm/init.c| 2 +-
>  arch/arm64/mm/ioremap.c | 5 +++--
>  2 files changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index bbb7ee76e319..25b8659c2a9f 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -147,7 +147,7 @@ static void __init zone_sizes_init(unsigned long min, 
> unsigned long max)
>  #ifdef CONFIG_HAVE_ARCH_PFN_VALID
>  int pfn_valid(unsigned long pfn)
>  {
> -   return memblock_is_map_memory(pfn << PAGE_SHIFT);
> +   return memblock_is_memory(pfn << PAGE_SHIFT);
>  }
>  EXPORT_SYMBOL(pfn_valid);
>  #endif
> diff --git a/arch/arm64/mm/ioremap.c b/arch/arm64/mm/ioremap.c
> index 01e88c8bcab0..c17c220b0c48 100644
> --- a/arch/arm64/mm/ioremap.c
> +++ b/arch/arm64/mm/ioremap.c
> @@ -21,6 +21,7 @@
>   */
>
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -55,7 +56,7 @@ static void __iomem *__ioremap_caller(phys_addr_t 
> phys_addr, size_t size,
> /*
>  * Don't allow RAM to be mapped.
>  */
> -   if (WARN_ON(pfn_valid(__phys_to_pfn(phys_addr))))
> +   if (WARN_ON(memblock_is_map_memory(phys_addr)))
> return NULL;
>
> area = get_vm_area_caller(size, VM_IOREMAP, caller);
> @@ -96,7 +97,7 @@ EXPORT_SYMBOL(__iounmap);
>  void __iomem *ioremap_cache(phys_addr_t phys_addr, size_t size)
>  {
> /* For normal memory we already have a cacheable mapping. */
> -   if (pfn_valid(__phys_to_pfn(phys_addr)))
> +   if (memblock_is_map_memory(phys_addr))
> return (void __iomem *)__phys_to_virt(phys_addr);
>
> return __ioremap_caller(phys_addr, size, __pgprot(PROT_NORMAL),
> --
> 2.7.0.rc3
>

