Re: [PATCH] powerpc/powernv: Increase memory block size to 1GB on radix
We should do the 1G block size as a fix, and backport it, and then make
the hot unplug code smarter.

cheers

On 8 September 2017 11:15:47 am AEST, Anton Blanchard wrote:
>Hi Reza,
>
>> I may be misunderstanding this, but what if we did something like x86
>> does? When trying to unplug a region smaller than the mapping, they
>> fill that part of the pagetable with 0xFD instead of freeing the
>> whole thing. Once the whole thing is 0xFD, free it.
>>
>> See arch/x86/mm/init_64.c:remove_{pte,pmd,pud}_table()
>>
>> ---%<---
>> memset((void *)addr, PAGE_INUSE, next - addr);
>>
>> page_addr = page_address(pte_page(*pte));
>> if (!memchr_inv(page_addr, PAGE_INUSE, PAGE_SIZE)) {
>> ...
>> pte_clear(&init_mm, addr, pte);
>> ...
>> }
>> ---%<---
>
>But you only have 1GB ptes at this point, you'd need to start
>instantiating a new level in the tree, and populate 2MB ptes.
>
>That is what Ben is suggesting. I'm happy to go any way (fix hotplug
>to handle this, or increase the memblock size on PowerNV to 1GB), I
>just need a solution.
>
>Anton

-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.
Re: [PATCH] powerpc/powernv: Increase memory block size to 1GB on radix
On Thu, Sep 7, 2017 at 5:21 PM, Benjamin Herrenschmidt wrote:
> On Thu, 2017-09-07 at 15:17 +1000, Anton Blanchard wrote:
>> Hi,
>>
>> > There is a similar issue being worked on w.r.t pseries.
>> >
>> > https://lkml.kernel.org/r/1502357028-27465-1-git-send-email-bhar...@linux.vnet.ibm.com
>> >
>> > The question is should we map these regions ? ie, we need to tell the
>> > kernel memory region that we would like to hot unplug later so that
>> > we avoid doing kernel allocations from that. If we do that, then we
>> > can possibly map them via 2M size ?
>>
>> But all of memory on PowerNV should be able to be hot unplugged, so

For this ideally we need movable mappings for the regions we intend to
hot-unplug - no? Otherwise, there is no guarantee that hot-unplug will work

>> there are two options as I see it - either increase the memory block
>> size, or map everything with 2MB pages.
>
> Or be smarter and map with 1G when blocks of 1G are available and break
> down to 2M where necessary, it shouldn't be too hard.
>

strict_rwx patches added helpers to do this

Balbir Singh.
Re: [PATCH] powerpc/powernv: Increase memory block size to 1GB on radix
Hi Reza,

> I may be misunderstanding this, but what if we did something like x86
> does? When trying to unplug a region smaller than the mapping, they
> fill that part of the pagetable with 0xFD instead of freeing the
> whole thing. Once the whole thing is 0xFD, free it.
>
> See arch/x86/mm/init_64.c:remove_{pte,pmd,pud}_table()
>
> ---%<---
> memset((void *)addr, PAGE_INUSE, next - addr);
>
> page_addr = page_address(pte_page(*pte));
> if (!memchr_inv(page_addr, PAGE_INUSE, PAGE_SIZE)) {
> ...
> pte_clear(&init_mm, addr, pte);
> ...
> }
> ---%<---

But you only have 1GB ptes at this point, you'd need to start
instantiating a new level in the tree, and populate 2MB ptes.

That is what Ben is suggesting. I'm happy to go any way (fix hotplug
to handle this, or increase the memblock size on PowerNV to 1GB), I
just need a solution.

Anton
Re: [PATCH] powerpc/powernv: Increase memory block size to 1GB on radix
On Thu, Sep 07, 2017 at 05:17:41AM +, Anton Blanchard wrote:
> But all of memory on PowerNV should be able to be hot unplugged, so
> there are two options as I see it - either increase the memory block
> size, or map everything with 2MB pages.

I may be misunderstanding this, but what if we did something like x86
does? When trying to unplug a region smaller than the mapping, they
fill that part of the pagetable with 0xFD instead of freeing the
whole thing. Once the whole thing is 0xFD, free it.

See arch/x86/mm/init_64.c:remove_{pte,pmd,pud}_table()

---%<---
memset((void *)addr, PAGE_INUSE, next - addr);

page_addr = page_address(pte_page(*pte));
if (!memchr_inv(page_addr, PAGE_INUSE, PAGE_SIZE)) {
...
pte_clear(&init_mm, addr, pte);
...
}
---%<---

-- 
Reza Arbab
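The marking scheme described in this message can be illustrated in userspace. The following is only a minimal sketch, not the x86 kernel code: `all_bytes_equal()` stands in for the kernel's `memchr_inv()` check, and `mark_removed()` and its parameters are hypothetical names chosen for illustration. It shows the bookkeeping idea only: each partial removal stamps its sub-range with the marker byte, and the backing page is reported as freeable once no other byte value remains.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PAGE_INUSE 0xFD /* marker byte, same value as in the quoted x86 snippet */

/* Userspace stand-in for a memchr_inv()-style check: true if all n bytes equal c. */
static bool all_bytes_equal(const unsigned char *p, unsigned char c, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (p[i] != c)
			return false;
	return true;
}

/*
 * Hypothetical helper: stamp [off, off + len) of a backing page as removed.
 * Returns true once the whole page carries the marker, i.e. the point at
 * which the real code would free the page and clear the entry.
 */
static bool mark_removed(unsigned char *page, size_t page_size,
			 size_t off, size_t len)
{
	assert(off <= page_size && len <= page_size - off);

	memset(page + off, PAGE_INUSE, len);
	return all_bytes_equal(page, PAGE_INUSE, page_size);
}
```

With a 64-byte stand-in page, removing bytes 0-15 and then 16-47 returns false both times; removing the final 16 bytes returns true.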
Re: [PATCH] powerpc/powernv: Increase memory block size to 1GB on radix
On Thu, 2017-09-07 at 15:17 +1000, Anton Blanchard wrote:
> Hi,
>
> > There is a similar issue being worked on w.r.t pseries.
> >
> > https://lkml.kernel.org/r/1502357028-27465-1-git-send-email-bhar...@linux.vnet.ibm.com
> >
> > The question is should we map these regions ? ie, we need to tell the
> > kernel memory region that we would like to hot unplug later so that
> > we avoid doing kernel allocations from that. If we do that, then we
> > can possibly map them via 2M size ?
>
> But all of memory on PowerNV should be able to be hot unplugged, so
> there are two options as I see it - either increase the memory block
> size, or map everything with 2MB pages.

Or be smarter and map with 1G when blocks of 1G are available and break
down to 2M where necessary, it shouldn't be too hard.

Cheers,
Ben.
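A rough userspace sketch of the "1G where possible, 2M otherwise" choice might look like the following. It is not the powerpc radix mapping code: `pick_map_size()` and `count_mappings()` are hypothetical helpers, the range is assumed to be 2M aligned, and only the map-time decision is covered (splitting an already installed 1G entry at unplug time is not shown).

```c
#include <assert.h>
#include <stdint.h>

#define SZ_2M (2ULL << 20)
#define SZ_1G (1ULL << 30)

/*
 * Hypothetical helper: choose the size of the next mapping at addr inside
 * [addr, end). Use 1G only when addr is 1G aligned and a full 1G remains.
 */
static uint64_t pick_map_size(uint64_t addr, uint64_t end)
{
	assert(addr < end);
	assert(addr % SZ_2M == 0 && end % SZ_2M == 0);

	if (addr % SZ_1G == 0 && end - addr >= SZ_1G)
		return SZ_1G;
	return SZ_2M;
}

/* Walk [start, end) and count how many entries of each size would be used. */
static void count_mappings(uint64_t start, uint64_t end,
			   uint64_t *n1g, uint64_t *n2m)
{
	*n1g = 0;
	*n2m = 0;

	for (uint64_t addr = start; addr < end; ) {
		uint64_t sz = pick_map_size(addr, end);

		if (sz == SZ_1G)
			(*n1g)++;
		else
			(*n2m)++;
		addr += sz;
	}
}
```

For a range from 768MB to 2GB + 256MB, this uses 2M entries for the unaligned head and tail (128 each) and a single 1G entry for the aligned middle.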
Re: [PATCH] powerpc/powernv: Increase memory block size to 1GB on radix
Hi,

> There is a similar issue being worked on w.r.t pseries.
>
> https://lkml.kernel.org/r/1502357028-27465-1-git-send-email-bhar...@linux.vnet.ibm.com
>
> The question is should we map these regions ? ie, we need to tell the
> kernel memory region that we would like to hot unplug later so that
> we avoid doing kernel allocations from that. If we do that, then we
> can possibly map them via 2M size ?

But all of memory on PowerNV should be able to be hot unplugged, so
there are two options as I see it - either increase the memory block
size, or map everything with 2MB pages.

Anton
Re: [PATCH] powerpc/powernv: Increase memory block size to 1GB on radix
On 09/07/2017 10:35 AM, Anton Blanchard wrote:
> From: Anton Blanchard
>
> Memory hot unplug on PowerNV radix hosts is broken. Our memory block
> size is 256MB but since we map the linear region with very large
> pages, each pte we tear down maps 1GB.
>
> A hot unplug of one 256MB memory block results in 768MB of memory
> getting unintentionally unmapped. At this point we are likely to oops.
>
> Fix this by increasing our memory block size to 1GB on PowerNV radix
> hosts.
>
> Signed-off-by: Anton Blanchard
> ---
>  arch/powerpc/platforms/powernv/setup.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
> index 897aa1400eb8..bbb73aa0eb8f 100644
> --- a/arch/powerpc/platforms/powernv/setup.c
> +++ b/arch/powerpc/platforms/powernv/setup.c
> @@ -272,7 +272,15 @@ static void pnv_kexec_cpu_down(int crash_shutdown, int secondary)
>  #ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
>  static unsigned long pnv_memory_block_size(void)
>  {
> -	return 256UL * 1024 * 1024;
> +	/*
> +	 * We map the kernel linear region with 1GB large pages on radix. For
> +	 * memory hot unplug to work our memory block size must be at least
> +	 * this size.
> +	 */
> +	if (radix_enabled())
> +		return 1UL * 1024 * 1024 * 1024;
> +	else
> +		return 256UL * 1024 * 1024;
>  }
>  #endif

There is a similar issue being worked on w.r.t pseries.

https://lkml.kernel.org/r/1502357028-27465-1-git-send-email-bhar...@linux.vnet.ibm.com

The question is should we map these regions ? ie, we need to tell the
kernel memory region that we would like to hot unplug later so that
we avoid doing kernel allocations from that. If we do that, then we
can possibly map them via 2M size ?

-aneesh
[PATCH] powerpc/powernv: Increase memory block size to 1GB on radix
From: Anton Blanchard

Memory hot unplug on PowerNV radix hosts is broken. Our memory block
size is 256MB but since we map the linear region with very large
pages, each pte we tear down maps 1GB.

A hot unplug of one 256MB memory block results in 768MB of memory
getting unintentionally unmapped. At this point we are likely to oops.

Fix this by increasing our memory block size to 1GB on PowerNV radix
hosts.

Signed-off-by: Anton Blanchard
---
 arch/powerpc/platforms/powernv/setup.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
index 897aa1400eb8..bbb73aa0eb8f 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -272,7 +272,15 @@ static void pnv_kexec_cpu_down(int crash_shutdown, int secondary)
 #ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
 static unsigned long pnv_memory_block_size(void)
 {
-	return 256UL * 1024 * 1024;
+	/*
+	 * We map the kernel linear region with 1GB large pages on radix. For
+	 * memory hot unplug to work our memory block size must be at least
+	 * this size.
+	 */
+	if (radix_enabled())
+		return 1UL * 1024 * 1024 * 1024;
+	else
+		return 256UL * 1024 * 1024;
 }
 #endif
 
-- 
2.11.0
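The 768MB figure in the changelog follows from simple alignment arithmetic, which can be checked with a small userspace sketch. `collateral_unmapped()` is a hypothetical helper, not a kernel function; it assumes a power-of-two mapping size and that tearing down a block removes every linear-map entry the block touches.

```c
#include <assert.h>
#include <stdint.h>

#define MB (1024ULL * 1024)
#define GB (1024ULL * MB)

/*
 * Hypothetical helper: bytes that lose their mapping although they were not
 * part of the unplugged block. map_size is the span covered by one linear-map
 * entry and must be a power of two.
 */
static uint64_t collateral_unmapped(uint64_t block_start, uint64_t block_size,
				    uint64_t map_size)
{
	assert(block_size != 0 && map_size != 0);
	assert((map_size & (map_size - 1)) == 0);

	/* Round the block outwards to the boundaries of the entries it touches. */
	uint64_t first = block_start & ~(map_size - 1);
	uint64_t last = (block_start + block_size + map_size - 1) & ~(map_size - 1);

	return (last - first) - block_size;
}
```

For a 256MB block under a 1GB entry the result is 768MB, matching the patch description; with a 1GB block size, or with 2MB entries, nothing outside the block is unmapped.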