Re: [PATCH] mm/page_alloc: fix boot hang in memmap_init_zone

2018-03-15 Thread Jia He



On 3/15/2018 11:39 PM, Daniel Vacek wrote:

On Thu, Mar 15, 2018 at 3:08 PM, Jia He  wrote:

Hi Daniel



On 3/14/2018 6:42 AM, Daniel Vacek wrote:

On some architectures (reported on arm64) commit 864b75f9d6b01
("mm/page_alloc: fix memmap_init_zone pageblock alignment")
causes a boot hang. This patch fixes the hang by making sure the alignment
never steps back.

Link:
http://lkml.kernel.org/r/0485727b2e82da7efbce5f6ba42524b429d0391a.1520011945.git.ne...@redhat.com
Fixes: 864b75f9d6b01 ("mm/page_alloc: fix memmap_init_zone pageblock alignment")
Signed-off-by: Daniel Vacek 
Tested-by: Sudeep Holla 
Tested-by: Naresh Kamboju 
Cc: Andrew Morton 
Cc: Mel Gorman 
Cc: Michal Hocko 
Cc: Paul Burton 
Cc: Pavel Tatashin 
Cc: Vlastimil Babka 
Cc: 
---
   mm/page_alloc.c | 7 ++++++-
   1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3d974cb2a1a1..e033a6895c6f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5364,9 +5364,14 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 			 * is not. move_freepages_block() can shift ahead of
 			 * the valid region but still depends on correct page
 			 * metadata.
+			 * Also make sure we never step back.
 			 */
-			pfn = (memblock_next_valid_pfn(pfn, end_pfn) &
+			unsigned long next_pfn;
+
+			next_pfn = (memblock_next_valid_pfn(pfn, end_pfn) &
 					~(pageblock_nr_pages-1)) - 1;
+			if (next_pfn > pfn)
+				pfn = next_pfn;

It didn't resolve the boot hang issue on my arm64 server.
What if memblock_next_valid_pfn(pfn, end_pfn) is 32 and pageblock_nr_pages
is 8192? Then next_pfn will be (unsigned long)-1, which is larger than pfn,
so there is still an infinite loop here.
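
To make the arithmetic concrete, here is a minimal userspace sketch of
that wraparound; the pfn values are assumed purely for illustration:

#include <stdio.h>

int main(void)
{
	unsigned long pfn = 24;		/* assumed: current invalid pfn */
	unsigned long next = 32;	/* assumed: memblock_next_valid_pfn() result */
	unsigned long pageblock_nr_pages = 8192;	/* e.g. arm64 with 64K pages */

	/* The patched code aligns down to the pageblock and subtracts 1:
	 * 32 & ~8191 == 0, and 0UL - 1 wraps around to ULONG_MAX. */
	unsigned long next_pfn = (next & ~(pageblock_nr_pages - 1)) - 1;

	if (next_pfn > pfn)	/* ULONG_MAX > 24, so the guard does not catch it */
		pfn = next_pfn;

	printf("pfn = %lu\n", pfn);	/* prints 18446744073709551615 on LP64 */
	return 0;
}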

Hi Jia,

Yeah, looks like another uncovered case. No one reported this so far.
Anyway, upstream reverted all this for now and we're discussing the
right approach here.

In any case thanks for this report. Can you share something like below
from your machine?

sure.
[    0.00] NUMA: Faking a node at [mem 0x-0x0017]
[    0.00] NUMA: NODE_DATA [mem 0x17cb80-0x17]
[    0.00] Zone ranges:
[    0.00]   DMA32    [mem 0x0020-0x]
[    0.00]   Normal   [mem 0x0001-0x0017]
[    0.00] Movable zone start for each node
[    0.00] Early memory node ranges
[    0.00]   node   0: [mem 0x0020-0x0021]
[    0.00]   node   0: [mem 0x0082-0x0307]
[    0.00]   node   0: [mem 0x0308-0x0308]
[    0.00]   node   0: [mem 0x0309-0x031f]
[    0.00]   node   0: [mem 0x0320-0x033f]
[    0.00]   node   0: [mem 0x0341-0x0563]
[    0.00]   node   0: [mem 0x0564-0x0567]
[    0.00]   node   0: [mem 0x0568-0x056d]
[    0.00]   node   0: [mem 0x056e-0x086f]
[    0.00]   node   0: [mem 0x0870-0x0871]
[    0.00]   node   0: [mem 0x0872-0x0894]
[    0.00]   node   0: [mem 0x0895-0x08ba]
[    0.00]   node   0: [mem 0x08bb-0x08bc]
[    0.00]   node   0: [mem 0x08bd-0x08c4]
[    0.00]   node   0: [mem 0x08c5-0x08e2]
[    0.00]   node   0: [mem 0x08e3-0x08e4]
[    0.00]   node   0: [mem 0x08e5-0x08fc]
[    0.00]   node   0: [mem 0x08fd-0x0910]
[    0.00]   node   0: [mem 0x0911-0x092e]
[    0.00]   node   0: [mem 0x092f-0x0930]
[    0.00]   node   0: [mem 0x0931-0x0963]
[    0.00]   node   0: [mem 0x0964-0x0e61]
[    0.00]   node   0: [mem 0x0e62-0x0e64]
[    0.00]   node   0: [mem 0x0e65-0x0fff]
[    0.00]   node   0: [mem 0x1080-0x17fe]
[    0.00]   node   0: [mem 0x1c00-0x1c00]
[    0.00]   node   0: [mem 0x1c01-0x1c7f]
[    0.00]   node   0: [mem 0x1c81-0x7efb]
[    0.00]   node   0: [mem 0x7efc-0x7efd]
[    0.00]   node   0: [mem 0x7efe-0x7efe]
[    0.00]   node   0: [mem 0x7eff-0x7eff]
[    0.00]   node   0: [mem 0x7f00-0x0017]
[    0.00] Initmem setup node 0 [mem 0x0020-0x0017]



Re: [PATCH] mm/page_alloc: fix boot hang in memmap_init_zone

2018-03-15 Thread Daniel Vacek
On Thu, Mar 15, 2018 at 3:08 PM, Jia He  wrote:
> Hi Daniel
>
>
>
> On 3/14/2018 6:42 AM, Daniel Vacek wrote:
>>
>> On some architectures (reported on arm64) commit 864b75f9d6b01
>> ("mm/page_alloc: fix memmap_init_zone pageblock alignment")
>> causes a boot hang. This patch fixes the hang by making sure the alignment
>> never steps back.
>>
>> Link:
>> http://lkml.kernel.org/r/0485727b2e82da7efbce5f6ba42524b429d0391a.1520011945.git.ne...@redhat.com
>> Fixes: 864b75f9d6b01 ("mm/page_alloc: fix memmap_init_zone pageblock alignment")
>> Signed-off-by: Daniel Vacek 
>> Tested-by: Sudeep Holla 
>> Tested-by: Naresh Kamboju 
>> Cc: Andrew Morton 
>> Cc: Mel Gorman 
>> Cc: Michal Hocko 
>> Cc: Paul Burton 
>> Cc: Pavel Tatashin 
>> Cc: Vlastimil Babka 
>> Cc: 
>> ---
>>   mm/page_alloc.c | 7 ++++++-
>>   1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 3d974cb2a1a1..e033a6895c6f 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -5364,9 +5364,14 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>>  			 * is not. move_freepages_block() can shift ahead of
>>  			 * the valid region but still depends on correct page
>>  			 * metadata.
>> +			 * Also make sure we never step back.
>>  			 */
>> -			pfn = (memblock_next_valid_pfn(pfn, end_pfn) &
>> +			unsigned long next_pfn;
>> +
>> +			next_pfn = (memblock_next_valid_pfn(pfn, end_pfn) &
>>  					~(pageblock_nr_pages-1)) - 1;
>> +			if (next_pfn > pfn)
>> +				pfn = next_pfn;
>
> It didn't resolve the boot hang issue on my arm64 server.
> What if memblock_next_valid_pfn(pfn, end_pfn) is 32 and pageblock_nr_pages
> is 8192? Then next_pfn will be (unsigned long)-1, which is larger than pfn,
> so there is still an infinite loop here.

Hi Jia,

Yeah, looks like another uncovered case. No one reported this so far.
Anyway, upstream reverted all this for now and we're discussing the
right approach here.

In any case thanks for this report. Can you share something like below
from your machine?

   Booting Linux on physical CPU 0x00 [0x410fd034]
   Linux version 4.16.0-rc5-4-gfc6eabbbf8ef-dirty (ard@dogfood) ...
   Machine model: Socionext Developer Box
   earlycon: pl011 at MMIO 0x2a40 (options '')
   bootconsole [pl011] enabled
   efi: Getting EFI parameters from FDT:
   efi: EFI v2.70 by Linaro
   efi:  SMBIOS 3.0=0xff58  ESRT=0xf9948198  MEMATTR=0xf83b1a98  RNG=0xff7ac898
   random: fast init done
   efi: seeding entropy pool
   esrt: Reserving ESRT space from 0xf9948198 to 0xf99481d0.
   cma: Reserved 16 MiB at 0xfd80
   NUMA: No NUMA configuration found
   NUMA: Faking a node at [mem 0x-0x000f]
   NUMA: NODE_DATA [mem 0xd8d80-0xda87f]
   Zone ranges:
  DMA32    [mem 0x8000-0x]
 Normal   [mem 0x0001-0x000f]
   Movable zone start for each node
   Early memory node ranges
 node   0: [mem 0x8000-0xfebe]
 node   0: [mem 0xfebf-0xfefc]
 node   0: [mem 0xfefd-0xff43]
 node   0: [mem 0xff44-0xff7a]
 node   0: [mem 0xff7b-0x]
 node   0: [mem 0x00088000-0x000f]
   Initmem setup node 0 [mem 0x8000-0x000f]


Thank you.

--nX

> Cheers,
> Jia He
>>
>>  #endif
>>  			continue;
>>  		}
>
>
> --
> Cheers,
> Jia
>


Re: [PATCH] mm/page_alloc: fix boot hang in memmap_init_zone

2018-03-15 Thread Daniel Vacek
On Thu, Mar 15, 2018 at 12:50 PM, Michal Hocko  wrote:
> On Thu 15-03-18 02:30:41, Daniel Vacek wrote:
>> On Wed, Mar 14, 2018 at 3:17 PM, Michal Hocko  wrote:
>> > On Tue 13-03-18 23:42:40, Daniel Vacek wrote:
>> >> On some architectures (reported on arm64) commit 864b75f9d6b01 
>> >> ("mm/page_alloc: fix memmap_init_zone pageblock alignment")
>> >> causes a boot hang. This patch fixes the hang by making sure the alignment
>> >> never steps back.
>> >
>> > I am sorry to be complaining again, but the code is so obscure that I
>>
>> No worries, I'm glad for any review. Which code exactly do you find
>> obscure? This patch, or my former fix, or the original commit
>> introducing memblock_next_valid_pfn()? Coz I'd agree the original
>> commit looks pretty obscure...
>
> As mentioned in the other email, the whole going back and forth in the
> same loop is just too ugly to live.

It's not really supposed to go back, but I guess you understand.

>> > would _really_ appreciate some more information about what is going
>> > on here. memblock_next_valid_pfn will most likely return a pfn within
>> > the same memblock and the alignment will move it before the old pfn
>> > which is not valid - so the block has some holes. Is that correct?
>>
>> I do not understand what you mean by 'pfn within the same memblock'?
>
> Sorry, I should have said in the same pageblock
>
>> And by 'the block has some holes'?
>
> memblock_next_valid_pfn clearly returns pfn which is within a pageblock
> and that is why we do not initialize pages in the beginning of the block
> while move_freepages_block does really expect the full pageblock to be
> initialized properly. That is the fundamental problem, right?

Yes, that's correct.
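
To illustrate that fundamental problem, a small sketch with sizes assumed
for illustration (4K pages, pageblock_nr_pages = 512, and a hypothetical
pfn returned by memblock_next_valid_pfn()):

#include <stdio.h>

int main(void)
{
	unsigned long pageblock_nr_pages = 512;	/* assumed: 2M pageblocks, 4K pages */
	unsigned long next = 0x2180;	/* assumed memblock_next_valid_pfn() result */
	unsigned long block_start = next & ~(pageblock_nr_pages - 1);

	/* Without the alignment, pfns [block_start, next) get no memmap init,
	 * yet move_freepages_block() later walks the whole pageblock
	 * [block_start, block_start + 512) and touches their struct pages. */
	printf("pageblock starts at pfn %#lx; %lu pfns at its head would be skipped\n",
	       block_start, next - block_start);
	return 0;
}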

>> memblock has types 'memory' (as usable memory) and 'reserved' (for
>> unusable mem), if I understand correctly.
>
> We might not have struct pages for invalid pfns. That really depends on
> the memory model. Sure sparse mem model will usually allocate struct
> pages for whole memory sections but that is not universally true and
> adding such a subtle assumption is simply wrong.

This is a gray area for me. But if I understand correctly, this
assumption comes from the code. It was already there and got broken,
hence I was trying to keep it. If anything needs redesigning I'm all
for it. But I was just calming the fire here. I only didn't test on
arm, which seems to be the only one that's different.

> I suspect you are making strong assumptions based on a very specific
> implementation which might be not true in general. That was the feeling
> I've had since the patch was proposed for the first time. This is such a
> cluttered area that I am not really sure myself, though.

I understand. And again this is likely correct. I'll be glad for any
assistance here. My limited knowledge is the primary cause for the lack of
relevant details, I guess. From what I checked, pfn_valid looks like a
generic function used by all arches but arm, which seems to be the
only one to implement CONFIG_HAVE_ARCH_PFN_VALID, if I didn't miss
anything. So if this config is enabled on arm, it uses its own
version of pfn_valid(). If not, I'd expect all arches to behave the same.
That's where my assumption comes from.

> --
> Michal Hocko
> SUSE Labs


Re: [PATCH] mm/page_alloc: fix boot hang in memmap_init_zone

2018-03-15 Thread Jia He

Hi Daniel


On 3/14/2018 6:42 AM, Daniel Vacek wrote:

On some architectures (reported on arm64) commit 864b75f9d6b01
("mm/page_alloc: fix memmap_init_zone pageblock alignment")
causes a boot hang. This patch fixes the hang by making sure the alignment
never steps back.

Link: 
http://lkml.kernel.org/r/0485727b2e82da7efbce5f6ba42524b429d0391a.1520011945.git.ne...@redhat.com
Fixes: 864b75f9d6b01 ("mm/page_alloc: fix memmap_init_zone pageblock alignment")
Signed-off-by: Daniel Vacek 
Tested-by: Sudeep Holla 
Tested-by: Naresh Kamboju 
Cc: Andrew Morton 
Cc: Mel Gorman 
Cc: Michal Hocko 
Cc: Paul Burton 
Cc: Pavel Tatashin 
Cc: Vlastimil Babka 
Cc: 
---
  mm/page_alloc.c | 7 ++++++-
  1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3d974cb2a1a1..e033a6895c6f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5364,9 +5364,14 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 			 * is not. move_freepages_block() can shift ahead of
 			 * the valid region but still depends on correct page
 			 * metadata.
+			 * Also make sure we never step back.
 			 */
-			pfn = (memblock_next_valid_pfn(pfn, end_pfn) &
+			unsigned long next_pfn;
+
+			next_pfn = (memblock_next_valid_pfn(pfn, end_pfn) &
 					~(pageblock_nr_pages-1)) - 1;
+			if (next_pfn > pfn)
+				pfn = next_pfn;

It didn't resolve the boot hang issue on my arm64 server.
What if memblock_next_valid_pfn(pfn, end_pfn) is 32 and
pageblock_nr_pages is 8192?

Then next_pfn will be (unsigned long)-1, which is larger than pfn,
so there is still an infinite loop here.

Cheers,
Jia He

 #endif
 			continue;
 		}


--
Cheers,
Jia




Re: [PATCH] mm/page_alloc: fix boot hang in memmap_init_zone

2018-03-15 Thread Michal Hocko
On Thu 15-03-18 02:30:41, Daniel Vacek wrote:
> On Wed, Mar 14, 2018 at 3:17 PM, Michal Hocko  wrote:
> > On Tue 13-03-18 23:42:40, Daniel Vacek wrote:
> >> On some architectures (reported on arm64) commit 864b75f9d6b01 
> >> ("mm/page_alloc: fix memmap_init_zone pageblock alignment")
> >> causes a boot hang. This patch fixes the hang by making sure the alignment
> >> never steps back.
> >
> > I am sorry to be complaining again, but the code is so obscure that I
> 
> No worries, I'm glad for any review. Which code exactly do you find
> obscure? This patch, or my former fix, or the original commit
> introducing memblock_next_valid_pfn()? Coz I'd agree the original
> commit looks pretty obscure...

As mentioned in the other email, the whole going back and forth in the
same loop is just too ugly to live.

> > would _really_ appreciate some more information about what is going
> > on here. memblock_next_valid_pfn will most likely return a pfn within
> > the same memblock and the alignment will move it before the old pfn
> > which is not valid - so the block has some holes. Is that correct?
> 
> I do not understand what you mean by 'pfn within the same memblock'?

Sorry, I should have said in the same pageblock

> And by 'the block has some holes'?

memblock_next_valid_pfn clearly returns pfn which is within a pageblock
and that is why we do not initialize pages in the beginning of the block
while move_freepages_block does really expect the full pageblock to be
initialized properly. That is the fundamental problem, right?

> memblock has types 'memory' (as usable memory) and 'reserved' (for
> unusable mem), if I understand correctly.

We might not have struct pages for invalid pfns. That really depends on
the memory model. Sure sparse mem model will usually allocate struct
pages for whole memory sections but that is not universally true and
adding such a subtle assumption is simply wrong.

I suspect you are making strong assumptions based on a very specific
implementation which might be not true in general. That was the feeling
I've had since the patch was proposed for the first time. This is such a
cluttered area that I am not really sure myself, though.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH] mm/page_alloc: fix boot hang in memmap_init_zone

2018-03-14 Thread Daniel Vacek
On Wed, Mar 14, 2018 at 3:17 PM, Michal Hocko  wrote:
> On Tue 13-03-18 23:42:40, Daniel Vacek wrote:
>> On some architectures (reported on arm64) commit 864b75f9d6b01 
>> ("mm/page_alloc: fix memmap_init_zone pageblock alignment")
>> causes a boot hang. This patch fixes the hang by making sure the alignment
>> never steps back.
>
> I am sorry to be complaining again, but the code is so obscure that I

No worries, I'm glad for any review. Which code exactly do you find
obscure? This patch, or my former fix, or the original commit
introducing memblock_next_valid_pfn()? Coz I'd agree the original
commit looks pretty obscure...

> would _really_ appreciate some more information about what is going
> on here. memblock_next_valid_pfn will most likely return a pfn within
> the same memblock and the alignment will move it before the old pfn
> which is not valid - so the block has some holes. Is that correct?

I do not understand what you mean by 'pfn within the same memblock'?
And by 'the block has some holes'?

memblock has types 'memory' (as usable memory) and 'reserved' (for
unusable mem), if I understand correctly. Both of them have an array
of regions (as an equivalent or kind of abstraction of ranges, IIUC).
memblock is a global symbol which contains all of this. So one could
say all pfns are within the same memblock. Did you mean the same
memory region perhaps? A region is solid by definition, I believe. So
there should be no holes within a single region _by_definition_
anyways. Or am I wrong here?

Now, memblock_next_valid_pfn(pfn) is called only conditionally, when
the old pfn is already 'invalid' in the first place. Here the meaning
of this 'invalid' is arch/config dependent. With my former fix, based
on the x86_64 config, I was working with the assumption that the semantics
of 'invalid' mean the memsection does not have a memmap:

early_pfn_valid(pfn)
  pfn_valid(pfn)
    valid_section(__nr_to_section(pfn_to_section_nr(pfn)))
      return (section && (section->section_mem_map & SECTION_HAS_MEM_MAP))

From this I implied that when memblock_next_valid_pfn(pfn) is called,
this pfn is the first in a section (i.e. section aligned) and this
whole section is going to be skipped, so the returned pfn is going
to be at least one full section away from the old pfn. That's also
exactly what I can see in the memory dumps from the bug reports resulting
in my former fix.
Note that with next_pfn being at least a full section away from old
pfn, there is no need to check whether it steps back or not. Even when
pageblock aligned. That's why I did not include the hunk in this patch
in my former fix.
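
A quick arithmetic check of that claim, with x86_64 defaults assumed
(4K pages, 128M sections, i.e. 32768 pfns per section, and
pageblock_nr_pages = 512):

#include <assert.h>

int main(void)
{
	unsigned long pfn = 0x8000;		/* assumed: old pfn, section-aligned */
	unsigned long next = pfn + 32768;	/* at least one full section away */

	/* Rounding down to a 512-pfn pageblock boundary drops at most 511
	 * pfns, so even after the final -1 the result lands far ahead. */
	unsigned long next_pfn = (next & ~(512UL - 1)) - 1;

	assert(next_pfn > pfn);	/* holds, so the step-back guard is redundant here */
	return 0;
}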

Now I have no idea why above said does not hold true for arm64 or what
config was used there. I did not have a chance nor time to get my
hands on any arm hardware where it broke. The same way as I did not
test any architecture but x86_64 where I had original reports of
crashes before applying my former fix.
Also I am not deeply experienced with mm details and internals and how
everything works on every architecture. And even less when we speak
about early boot init. I mostly only considered the algorithm
abstractions. I bisected and reviewed the patch which regressed
x86_64. It clearly skips some pfns based on memblock memory ranges (if
available) and the skipped pages are not initialized, causing the
crash. This is clearly visible in the memory dumps I got from reports
and hopefully I already clearly explained that. I fixed this by
applying the alignment to keep at least the bare minimum of pages
initialized (so skipping a bit less than to the beginning of the next
usable range in case that range is not
section/memblock/pageblock/whatever aligned - with the pageblock
alignment being the significant one here). Since the next pfn will
always be from different section on x86_64 (at least with the
configurations I checked and my limited knowledge of this stuff), this
patch was not originally applied as it seemed redundant to me at that
time. It was only theoretically possible that the algorithm could start
looping if the next pfn fell below the original one, but that seemed
impossible to happen. Since this is generic code and not arch
specific, I expected other arches to behave in a similar manner.
Though it seems arm has its own view of pfn_valid() based on memblock
instead of mem sections:

neelx@metal:~/nX/src/linux$ global -g CONFIG_HAVE_ARCH_PFN_VALID
arch/arm/include/asm/page.h
arch/arm/mm/init.c
arch/arm64/include/asm/page.h
arch/arm64/mm/init.c
include/linux/mmzone.h
neelx@metal:~/nX/src/linux$ sed '287,293!d' arch/arm64/mm/init.c
#ifdef CONFIG_HAVE_ARCH_PFN_VALID
int pfn_valid(unsigned long pfn)
{
return memblock_is_map_memory(pfn << PAGE_SHIFT);
}
EXPORT_SYMBOL(pfn_valid);
#endif
neelx@metal:~/nX/src/linux$


ARM seems to be a bit unique here and that's what I missed and that's
why my former fix broke arm. I was simply not expecting this. And
that's why this patch is needed (exclusively for arm it seems).
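
For completeness, here is one way the guard could also avoid the arm64
wraparound Jia reported. This is a sketch only, not the merged fix
(upstream reverted this approach), with a stub standing in for the
kernel helper:

#include <stdio.h>

/* Stub standing in for the kernel's memblock_next_valid_pfn(). */
static unsigned long memblock_next_valid_pfn(unsigned long pfn,
					     unsigned long end_pfn)
{
	(void)pfn;
	(void)end_pfn;
	return 32;	/* assumed: the problematic return value from Jia's report */
}

int main(void)
{
	unsigned long pageblock_nr_pages = 8192;
	unsigned long pfn = 24, end_pfn = 1UL << 20;

	/* Round down *before* comparing, and only then step back by one,
	 * so a next pfn inside pageblock 0 can never wrap below zero. */
	unsigned long next = memblock_next_valid_pfn(pfn, end_pfn);
	unsigned long aligned = next & ~(pageblock_nr_pages - 1);

	if (aligned > pfn)
		pfn = aligned - 1;	/* the loop's pfn++ lands on the block start */
	/* else: fall through and let the loop advance pfn one step at a time */

	printf("pfn stays at %lu instead of wrapping to ULONG_MAX\n", pfn);
	return 0;
}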

Now we can start 

Re: [PATCH] mm/page_alloc: fix boot hang in memmap_init_zone

2018-03-14 Thread Michal Hocko
On Tue 13-03-18 23:42:40, Daniel Vacek wrote:
> On some architectures (reported on arm64) commit 864b75f9d6b01 
> ("mm/page_alloc: fix memmap_init_zone pageblock alignment")
> causes a boot hang. This patch fixes the hang by making sure the alignment
> never steps back.

I am sorry to be complaining again, but the code is so obscure that I
would _really_ appreciate some more information about what is going
on here. memblock_next_valid_pfn will most likely return a pfn within
the same memblock and the alignment will move it before the old pfn
which is not valid - so the block has some holes. Is that correct?
If yes then please put it into the changelog. Maybe reuse data provided
by Arnd:
http://lkml.kernel.org/r/20180314134431.13241-1-ard.biesheu...@linaro.org
> Link: 
> http://lkml.kernel.org/r/0485727b2e82da7efbce5f6ba42524b429d0391a.1520011945.git.ne...@redhat.com
> Fixes: 864b75f9d6b01 ("mm/page_alloc: fix memmap_init_zone pageblock alignment")
> Signed-off-by: Daniel Vacek 
> Tested-by: Sudeep Holla 
> Tested-by: Naresh Kamboju 
> Cc: Andrew Morton 
> Cc: Mel Gorman 
> Cc: Michal Hocko 
> Cc: Paul Burton 
> Cc: Pavel Tatashin 
> Cc: Vlastimil Babka 
> Cc: 
> ---
>  mm/page_alloc.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3d974cb2a1a1..e033a6895c6f 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5364,9 +5364,14 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>  			 * is not. move_freepages_block() can shift ahead of
>  			 * the valid region but still depends on correct page
>  			 * metadata.
> +			 * Also make sure we never step back.
>  			 */
> -			pfn = (memblock_next_valid_pfn(pfn, end_pfn) &
> +			unsigned long next_pfn;
> +
> +			next_pfn = (memblock_next_valid_pfn(pfn, end_pfn) &
>  					~(pageblock_nr_pages-1)) - 1;
> +			if (next_pfn > pfn)
> +				pfn = next_pfn;
>  #endif
>  			continue;
>  		}
> -- 
> 2.16.2
> 

-- 
Michal Hocko
SUSE Labs

