Re: [PATCH] powerpc/powernv: Increase memory block size to 1GB on radix

2017-09-09 Thread Michael Ellerman
We should do the 1G block size as a fix, and backport it, and then make the hot 
unplug code smarter.

cheers

On 8 September 2017 11:15:47 am AEST, Anton Blanchard  wrote:
>Hi Reza,
>
>> I may be misunderstanding this, but what if we did something like x86
>> does? When trying to unplug a region smaller than the mapping, they
>> fill that part of the pagetable with 0xFD instead of freeing the
>> whole thing. Once the whole thing is 0xFD, free it.
>> 
>> See arch/x86/mm/init_64.c:remove_{pte,pmd,pud}_table()
>> 
>> ---%<---
>>  memset((void *)addr, PAGE_INUSE, next - addr);
>> 
>>  page_addr = page_address(pte_page(*pte));
>>  if (!memchr_inv(page_addr, PAGE_INUSE, PAGE_SIZE)) {
>>  ...
>>  pte_clear(&init_mm, addr, pte);
>>  ...
>>  }
>> ---%<---
>
>But you only have 1GB ptes at this point, you'd need to start
>instantiating a new level in the tree, and populate 2MB ptes.
>
>That is what Ben is suggesting. I'm happy to go any way (fix hotplug
>to handle this, or increase the memblock size on PowerNV to 1GB), I just
>need a solution.
>
>Anton


Re: [PATCH] powerpc/powernv: Increase memory block size to 1GB on radix

2017-09-08 Thread Balbir Singh
On Thu, Sep 7, 2017 at 5:21 PM, Benjamin Herrenschmidt wrote:
> On Thu, 2017-09-07 at 15:17 +1000, Anton Blanchard wrote:
>> Hi,
>>
>> > There is a similar issue being worked on w.r.t pseries.
>> >
>> > https://lkml.kernel.org/r/1502357028-27465-1-git-send-email-bhar...@linux.vnet.ibm.com
>> >
>> > The question is should we map these regions ? ie, we need to tell the
>> > kernel memory region that we would like to hot unplug later so that
>> > we avoid doing kernel allocations from that. If we do that, then we
>> > can possibly map them via 2M size ?
>>
>> But all of memory on PowerNV should be able to be hot unplugged, so

For this, ideally we need movable mappings for the regions we intend
to hot-unplug, no? Otherwise, there is no guarantee that hot-unplug
will work.

>> there are two options as I see it - either increase the memory block
>> size, or map everything with 2MB pages.
>
> Or be smarter and map with 1G when blocks of 1G are available and break
> down to 2M where necessary, it shouldn't be too hard.
>

The strict_rwx patches added helpers to do this.

Balbir Singh.


Re: [PATCH] powerpc/powernv: Increase memory block size to 1GB on radix

2017-09-07 Thread Anton Blanchard
Hi Reza,

> I may be misunderstanding this, but what if we did something like x86 
> does? When trying to unplug a region smaller than the mapping, they
> fill that part of the pagetable with 0xFD instead of freeing the
> whole thing. Once the whole thing is 0xFD, free it.
> 
> See arch/x86/mm/init_64.c:remove_{pte,pmd,pud}_table()
> 
> ---%<---
>   memset((void *)addr, PAGE_INUSE, next - addr);
> 
>   page_addr = page_address(pte_page(*pte));
>   if (!memchr_inv(page_addr, PAGE_INUSE, PAGE_SIZE)) {
>   ...
>   pte_clear(&init_mm, addr, pte);
>   ...
>   }
> ---%<---

But you only have 1GB ptes at this point, you'd need to start
instantiating a new level in the tree, and populate 2MB ptes.

That is what Ben is suggesting. I'm happy to go any way (fix hotplug
to handle this, or increase the memblock size on PowerNV to 1GB), I just
need a solution.

Anton


Re: [PATCH] powerpc/powernv: Increase memory block size to 1GB on radix

2017-09-07 Thread Reza Arbab

On Thu, Sep 07, 2017 at 05:17:41AM +, Anton Blanchard wrote:

> But all of memory on PowerNV should be able to be hot unplugged, so
> there are two options as I see it - either increase the memory block
> size, or map everything with 2MB pages.


I may be misunderstanding this, but what if we did something like x86 
does? When trying to unplug a region smaller than the mapping, they fill 
that part of the pagetable with 0xFD instead of freeing the whole thing.  
Once the whole thing is 0xFD, free it.


See arch/x86/mm/init_64.c:remove_{pte,pmd,pud}_table()

---%<---
	memset((void *)addr, PAGE_INUSE, next - addr);

	page_addr = page_address(pte_page(*pte));
	if (!memchr_inv(page_addr, PAGE_INUSE, PAGE_SIZE)) {
		...
		pte_clear(&init_mm, addr, pte);
		...
	}
---%<---

--
Reza Arbab
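
For readers unfamiliar with the marker approach described above, here is a
minimal userspace sketch of the idea: fill the released part of a page with
0xFD, and free the page only once every byte in it carries the marker. The
names MARKER, all_bytes_are() and release_range() are hypothetical, chosen
for illustration; all_bytes_are() is a stand-in for the kernel's
memchr_inv(), and none of this is the actual x86 code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MARKER 0xFD	/* fill value mentioned in the message above */

/* Userspace stand-in for memchr_inv(): true if all n bytes equal c. */
static bool all_bytes_are(const unsigned char *p, unsigned char c, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (p[i] != c)
			return false;
	return true;
}

/*
 * Mark [off, off + len) of a page as released. Returns true once the
 * whole page consists of marker bytes, i.e. when the caller may free it.
 * Hypothetical helper, not a kernel function.
 */
static bool release_range(unsigned char *page, size_t page_size,
			  size_t off, size_t len)
{
	assert(off <= page_size && len <= page_size - off);
	memset(page + off, MARKER, len);
	return all_bytes_are(page, MARKER, page_size);
}
```

Releasing a 64-byte buffer in four 16-byte pieces would return false for the
first three calls and true for the last, regardless of the order in which
the pieces are released.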



Re: [PATCH] powerpc/powernv: Increase memory block size to 1GB on radix

2017-09-07 Thread Benjamin Herrenschmidt
On Thu, 2017-09-07 at 15:17 +1000, Anton Blanchard wrote:
> Hi,
> 
> > There is a similar issue being worked on w.r.t pseries.
> > 
> > https://lkml.kernel.org/r/1502357028-27465-1-git-send-email-bhar...@linux.vnet.ibm.com
> > 
> > The question is should we map these regions ? ie, we need to tell the 
> > kernel memory region that we would like to hot unplug later so that
> > we avoid doing kernel allocations from that. If we do that, then we
> > can possibly map them via 2M size ?
> 
> But all of memory on PowerNV should be able to be hot unplugged, so
> there are two options as I see it - either increase the memory block
> size, or map everything with 2MB pages. 

Or be smarter and map with 1G when blocks of 1G are available and break
down to 2M where necessary, it shouldn't be too hard.

Cheers,
Ben.
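
To illustrate the mixed-size idea above, here is a rough userspace sketch,
assuming 2MB-aligned ranges: use a 1GB mapping wherever the address is 1GB
aligned and a full 1GB remains, and fall back to 2MB mappings otherwise.
pick_map_size() and count_mappings() are hypothetical names for
illustration and are not the radix mapping code or the strict_rwx helpers.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_2M (2ULL << 20)
#define SZ_1G (1ULL << 30)

/*
 * Pick the mapping size for the next chunk of [addr, end): 1GB when addr
 * is 1GB aligned and a full 1GB remains, otherwise 2MB.
 * Assumes addr and end are 2MB aligned.
 */
static uint64_t pick_map_size(uint64_t addr, uint64_t end)
{
	assert(addr % SZ_2M == 0 && end % SZ_2M == 0 && addr < end);
	if (addr % SZ_1G == 0 && end - addr >= SZ_1G)
		return SZ_1G;
	return SZ_2M;
}

/* Count how many 1GB and 2MB mappings are needed to cover [start, end). */
static void count_mappings(uint64_t start, uint64_t end,
			   unsigned *n_1g, unsigned *n_2m)
{
	*n_1g = *n_2m = 0;
	for (uint64_t addr = start; addr < end; ) {
		uint64_t sz = pick_map_size(addr, end);

		if (sz == SZ_1G)
			(*n_1g)++;
		else
			(*n_2m)++;
		addr += sz;
	}
}
```

For a range from 768MB to 2GB + 256MB, this yields one 1GB mapping for the
aligned middle and 2MB mappings for the unaligned head and tail, so a 256MB
block at either edge could be unmapped without touching a 1GB pte.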



Re: [PATCH] powerpc/powernv: Increase memory block size to 1GB on radix

2017-09-06 Thread Anton Blanchard
Hi,

> There is a similar issue being worked on w.r.t pseries.
> 
> https://lkml.kernel.org/r/1502357028-27465-1-git-send-email-bhar...@linux.vnet.ibm.com
> 
> The question is should we map these regions ? ie, we need to tell the 
> kernel memory region that we would like to hot unplug later so that
> we avoid doing kernel allocations from that. If we do that, then we
> can possibly map them via 2M size ?

But all of memory on PowerNV should be able to be hot unplugged, so
there are two options as I see it - either increase the memory block
size, or map everything with 2MB pages. 

Anton


Re: [PATCH] powerpc/powernv: Increase memory block size to 1GB on radix

2017-09-06 Thread Aneesh Kumar K.V



On 09/07/2017 10:35 AM, Anton Blanchard wrote:

> From: Anton Blanchard
>
> Memory hot unplug on PowerNV radix hosts is broken. Our memory block
> size is 256MB but since we map the linear region with very large pages,
> each pte we tear down maps 1GB.
>
> A hot unplug of one 256MB memory block results in 768MB of memory
> getting unintentionally unmapped. At this point we are likely to oops.
>
> Fix this by increasing our memory block size to 1GB on PowerNV radix
> hosts.
>
> Signed-off-by: Anton Blanchard
> ---
>  arch/powerpc/platforms/powernv/setup.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
> index 897aa1400eb8..bbb73aa0eb8f 100644
> --- a/arch/powerpc/platforms/powernv/setup.c
> +++ b/arch/powerpc/platforms/powernv/setup.c
> @@ -272,7 +272,15 @@ static void pnv_kexec_cpu_down(int crash_shutdown, int secondary)
>  #ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
>  static unsigned long pnv_memory_block_size(void)
>  {
> -	return 256UL * 1024 * 1024;
> +	/*
> +	 * We map the kernel linear region with 1GB large pages on radix. For
> +	 * memory hot unplug to work our memory block size must be at least
> +	 * this size.
> +	 */
> +	if (radix_enabled())
> +		return 1UL * 1024 * 1024 * 1024;
> +	else
> +		return 256UL * 1024 * 1024;
>  }
>  #endif



There is a similar issue being worked on w.r.t pseries.

https://lkml.kernel.org/r/1502357028-27465-1-git-send-email-bhar...@linux.vnet.ibm.com

The question is should we map these regions ? ie, we need to tell the 
kernel memory region that we would like to hot unplug later so that we 
avoid doing kernel allocations from that. If we do that, then we can 
possibly map them via 2M size ?


-aneesh



[PATCH] powerpc/powernv: Increase memory block size to 1GB on radix

2017-09-06 Thread Anton Blanchard
From: Anton Blanchard 

Memory hot unplug on PowerNV radix hosts is broken. Our memory block
size is 256MB but since we map the linear region with very large pages,
each pte we tear down maps 1GB.

A hot unplug of one 256MB memory block results in 768MB of memory
getting unintentionally unmapped. At this point we are likely to oops.

Fix this by increasing our memory block size to 1GB on PowerNV radix
hosts.

Signed-off-by: Anton Blanchard 
---
 arch/powerpc/platforms/powernv/setup.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
index 897aa1400eb8..bbb73aa0eb8f 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -272,7 +272,15 @@ static void pnv_kexec_cpu_down(int crash_shutdown, int secondary)
 #ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
 static unsigned long pnv_memory_block_size(void)
 {
-	return 256UL * 1024 * 1024;
+	/*
+	 * We map the kernel linear region with 1GB large pages on radix. For
+	 * memory hot unplug to work our memory block size must be at least
+	 * this size.
+	 */
+	if (radix_enabled())
+		return 1UL * 1024 * 1024 * 1024;
+	else
+		return 256UL * 1024 * 1024;
 }
 #endif
 
-- 
2.11.0
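
As a worked check of the 768MB figure in the commit message, the sketch
below computes how much memory loses its mapping when a single block is
unplugged and the whole pte covering it is torn down. collateral_unmapped()
is a hypothetical helper written for illustration, not a kernel function,
and it assumes power-of-two sizes as in the patch.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_1M (1024ULL * 1024)

/*
 * Bytes that were not part of the unplugged block but lose their mapping
 * anyway, because the pte being torn down covers more than the block.
 * Assumes both sizes are powers of two. Hypothetical helper.
 */
static uint64_t collateral_unmapped(uint64_t block_size, uint64_t pte_size)
{
	assert(block_size > 0 && pte_size > 0);
	if (block_size >= pte_size)
		return 0;	/* the block spans whole ptes, nothing extra is lost */
	return pte_size - block_size;
}
```

With a 256MB block and a 1GB pte the result is 768MB, which matches the
commit message; with a 1GB block, as the patch sets on radix, it is zero.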