Re: [PATCH] powerpc/mm: update arch_{add,remove}_memory() for radix

2016-06-29 Thread Reza Arbab

On Tue, Jun 28, 2016 at 09:21:05PM +1000, Michael Ellerman wrote:

No, you need to use mmu_linear_psize for the hotplug case.

But you can probably factor out a common routine that both cases use, and hide
the hash vs radix check in that.


Okay, I'm trying to refactor {create,remove}_section_mapping() into 
hash__ and radix__ variants. This lead to a couple of questions.


Pseudocode for radix__create_section_mapping(start, end):

page_size = 1 << mmu_psize_defs[mmu_linear_psize].shift;
start = _ALIGN_DOWN(start, page_size);

for (; start < end; start += page_size) {
radix__map_kernel_page(start, __pa(start),
   PAGE_KERNEL, page_size);
}

Should the above use PAGE_KERNEL, like the the hash table bolt, or 
(_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_KERNEL_RW), like in the radix 
vmemmap creation?


The other question is what radix__remove_section_mapping() should do.
I don't know offhand what the opposite of map_kernel_page() is. As 
Aneesh mentioned, radix vmemmap removal is currently stubbed as a FIXME 
so I couldn't use that as a reference.


--
Reza Arbab



Re: [PATCH] powerpc/mm: update arch_{add,remove}_memory() for radix

2016-06-28 Thread Aneesh Kumar K.V
Michael Ellerman  writes:

> On Thu, 2016-06-23 at 14:37 -0500, Reza Arbab wrote:
>> On Thu, Jun 23, 2016 at 10:47:20PM +0530, Aneesh Kumar K.V wrote:
>> > Reza Arbab  writes:
>> > > These functions are making direct calls to the hash table APIs,
>> > > leading to a BUG() on systems using radix.
>> > > 
>> > > Switch them to the vmemmap_{create,remove}_mapping() wrappers, and
>> > > move to the __meminit section.
>> > 
>> > They are really not the same. They can possibly end up using different
>> > base page size. Also vmemmap is available only with SPARSEMEM_VMEMMAP
>> > enabled. Does hotplug depend on sparsemem vmemmap ?
>> 
>> I'm not sure. Maybe it's best if I back up a step and explain what lead 
>> me to this patch. During hotplug, you get
>> 
>> ...
>>  arch_add_memory
>>  create_section_mapping
>>  htab_bolt_mapping
>>  BUG_ON(!ppc_md.hpte_insert);
>> 
>> So it seemed to me that I needed a radix equivalent of 
>> create_section_mapping().
>> 
>> After some digging, I found hash__vmemmap_create_mapping() and 
>> radix__vmemmap_create_mapping() did what I needed. I did not notice the 
>> #ifdef SPARSEMEM_VMEMMAP around them.
>
> I think that's more by luck than design. The vmemmap routines use
> mmu_vmemmap_psize which is probably but not definitely the same as
> mmu_linear_psize.
>
>> Could it be that the functions just need to be renamed 
>> hash__create_mapping()/radix__create_mapping() and moved out of #ifdef 
>> SPARSEMEM_VMEMMAP?
>
> No, you need to use mmu_linear_psize for the hotplug case.
>
> But you can probably factor out a common routine that both cases use, and hide
> the hash vs radix check in that.
>
> And probably send me a patch to make MEMORY_HOTPLUG depend on !RADIX for v4.7?

Few other stuff we need to still look from Radix point
1) machine check handling/memory errors
2) kexec

-aneesh



Re: [PATCH] powerpc/mm: update arch_{add,remove}_memory() for radix

2016-06-28 Thread Michael Ellerman
On Thu, 2016-06-23 at 14:37 -0500, Reza Arbab wrote:
> On Thu, Jun 23, 2016 at 10:47:20PM +0530, Aneesh Kumar K.V wrote:
> > Reza Arbab  writes:
> > > These functions are making direct calls to the hash table APIs,
> > > leading to a BUG() on systems using radix.
> > > 
> > > Switch them to the vmemmap_{create,remove}_mapping() wrappers, and
> > > move to the __meminit section.
> > 
> > They are really not the same. They can possibly end up using different
> > base page size. Also vmemmap is available only with SPARSEMEM_VMEMMAP
> > enabled. Does hotplug depend on sparsemem vmemmap ?
> 
> I'm not sure. Maybe it's best if I back up a step and explain what lead 
> me to this patch. During hotplug, you get
> 
> ...
>   arch_add_memory
>   create_section_mapping
>   htab_bolt_mapping
>   BUG_ON(!ppc_md.hpte_insert);
> 
> So it seemed to me that I needed a radix equivalent of 
> create_section_mapping().
> 
> After some digging, I found hash__vmemmap_create_mapping() and 
> radix__vmemmap_create_mapping() did what I needed. I did not notice the 
> #ifdef SPARSEMEM_VMEMMAP around them.

I think that's more by luck than design. The vmemmap routines use
mmu_vmemmap_psize which is probably but not definitely the same as
mmu_linear_psize.

> Could it be that the functions just need to be renamed 
> hash__create_mapping()/radix__create_mapping() and moved out of #ifdef 
> SPARSEMEM_VMEMMAP?

No, you need to use mmu_linear_psize for the hotplug case.

But you can probably factor out a common routine that both cases use, and hide
the hash vs radix check in that.

And probably send me a patch to make MEMORY_HOTPLUG depend on !RADIX for v4.7?

cheers



Re: [PATCH] powerpc/mm: update arch_{add,remove}_memory() for radix

2016-06-23 Thread Balbir Singh


On 24/06/16 03:17, Aneesh Kumar K.V wrote:
> Reza Arbab  writes:
> 
>> These functions are making direct calls to the hash table APIs,
>> leading to a BUG() on systems using radix.
>>
>> Switch them to the vmemmap_{create,remove}_mapping() wrappers, and
>> move to the __meminit section.
> 
> 
> They are really not the same. They can possibly end up using different
> base page size. Also vmemmap is available only with SPARSEMEM_VMEMMAP
> enabled. Does hotplug depend on sparsemem vmemmap ?

# eventually, we can have this option just 'select SPARSEMEM'
config MEMORY_HOTPLUG
bool "Allow for memory hot-add"
depends on SPARSEMEM || X86_64_ACPI_NUMA
depends on ARCH_ENABLE_MEMORY_HOTPLUG

We depend on sparsemem for sure. vmemmap is just a way of getting the memory
virtually mapped. From the patch perspective, I think we need the equivalent of
just mapping the pages in kernel. The address may differ based on whether 
vmemmap
is used or not and of-course page_size, 

Balbir Singh


Re: [PATCH] powerpc/mm: update arch_{add,remove}_memory() for radix

2016-06-23 Thread Reza Arbab

On Thu, Jun 23, 2016 at 02:37:39PM -0500, Reza Arbab wrote:
Could it be that the functions just need to be renamed 
hash__create_mapping()/radix__create_mapping() and moved out of #ifdef 
SPARSEMEM_VMEMMAP?


Or vice-versa, maybe the callers should have been wrapped in the first 
place:


arch_add_memory() {
...
if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP))
vmemmap_create_mapping()
...
}

arch_remove_memory() {
...
if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP))
vmemmap_remove_mapping()
...
}

--
Reza Arbab



Re: [PATCH] powerpc/mm: update arch_{add,remove}_memory() for radix

2016-06-23 Thread Reza Arbab

On Thu, Jun 23, 2016 at 10:47:20PM +0530, Aneesh Kumar K.V wrote:

Reza Arbab  writes:

These functions are making direct calls to the hash table APIs,
leading to a BUG() on systems using radix.

Switch them to the vmemmap_{create,remove}_mapping() wrappers, and
move to the __meminit section.


They are really not the same. They can possibly end up using different
base page size. Also vmemmap is available only with SPARSEMEM_VMEMMAP
enabled. Does hotplug depend on sparsemem vmemmap ?


I'm not sure. Maybe it's best if I back up a step and explain what lead 
me to this patch. During hotplug, you get


...
arch_add_memory
create_section_mapping
htab_bolt_mapping
BUG_ON(!ppc_md.hpte_insert);

So it seemed to me that I needed a radix equivalent of 
create_section_mapping().


After some digging, I found hash__vmemmap_create_mapping() and 
radix__vmemmap_create_mapping() did what I needed. I did not notice the 
#ifdef SPARSEMEM_VMEMMAP around them.


Hotplug/remove are now working for me as far as I can tell, so I think
the functional change in itself was correct. Hotplug may not depend on 
sparsemem vmemmap, but we seem to need a radix create_section_mapping() 
regardless.


Could it be that the functions just need to be renamed 
hash__create_mapping()/radix__create_mapping() and moved out of #ifdef 
SPARSEMEM_VMEMMAP?


--
Reza Arbab



Re: [PATCH] powerpc/mm: update arch_{add,remove}_memory() for radix

2016-06-23 Thread Aneesh Kumar K.V
Reza Arbab  writes:

> These functions are making direct calls to the hash table APIs,
> leading to a BUG() on systems using radix.
>
> Switch them to the vmemmap_{create,remove}_mapping() wrappers, and
> move to the __meminit section.


They are really not the same. They can possibly end up using different
base page size. Also vmemmap is available only with SPARSEMEM_VMEMMAP
enabled. Does hotplug depend on sparsemem vmemmap ?

>
> Signed-off-by: Reza Arbab 
> ---
>  arch/powerpc/mm/mem.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
> index 2fd57fa..80c6ee7 100644
> --- a/arch/powerpc/mm/mem.c
> +++ b/arch/powerpc/mm/mem.c
> @@ -116,7 +116,7 @@ int memory_add_physaddr_to_nid(u64 start)
>  }
>  #endif
>
> -int arch_add_memory(int nid, u64 start, u64 size, bool for_device)
> +int __meminit arch_add_memory(int nid, u64 start, u64 size, bool for_device)
>  {
>   struct pglist_data *pgdata;
>   struct zone *zone;
> @@ -127,7 +127,7 @@ int arch_add_memory(int nid, u64 start, u64 size, bool 
> for_device)
>   pgdata = NODE_DATA(nid);
>
>   start = (unsigned long)__va(start);
> - rc = create_section_mapping(start, start + size);
> + rc = vmemmap_create_mapping(start, size, __pa(start));
>   if (rc) {
>   pr_warning(
>   "Unable to create mapping for hot added memory 
> 0x%llx..0x%llx: %d\n",
> @@ -143,7 +143,7 @@ int arch_add_memory(int nid, u64 start, u64 size, bool 
> for_device)
>  }
>
>  #ifdef CONFIG_MEMORY_HOTREMOVE
> -int arch_remove_memory(u64 start, u64 size)
> +int __meminit arch_remove_memory(u64 start, u64 size)
>  {
>   unsigned long start_pfn = start >> PAGE_SHIFT;
>   unsigned long nr_pages = size >> PAGE_SHIFT;
> @@ -157,7 +157,7 @@ int arch_remove_memory(u64 start, u64 size)
>
>   /* Remove htab bolted mappings for this section of memory */
>   start = (unsigned long)__va(start);
> - ret = remove_section_mapping(start, start + size);
> + vmemmap_remove_mapping(start, size);
>
>   /* Ensure all vmalloc mappings are flushed in case they also
>* hit that section of memory
> -- 
> 1.8.3.1

We really want to look at memory hotplug to fully understand the
radix impact. The unplug operation needs to do some additional freeing.

I did put in comments around that with the idea of coming back to that
later

#ifdef CONFIG_MEMORY_HOTPLUG
void radix__vmemmap_remove_mapping(unsigned long start, unsigned long page_size)
{
/* FIXME!! intel does more. We should free page tables mapping vmemmap 
? */
}
#endif

-aneesh



[PATCH] powerpc/mm: update arch_{add,remove}_memory() for radix

2016-06-23 Thread Reza Arbab
These functions are making direct calls to the hash table APIs,
leading to a BUG() on systems using radix.

Switch them to the vmemmap_{create,remove}_mapping() wrappers, and
move to the __meminit section.

Signed-off-by: Reza Arbab 
---
 arch/powerpc/mm/mem.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 2fd57fa..80c6ee7 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -116,7 +116,7 @@ int memory_add_physaddr_to_nid(u64 start)
 }
 #endif
 
-int arch_add_memory(int nid, u64 start, u64 size, bool for_device)
+int __meminit arch_add_memory(int nid, u64 start, u64 size, bool for_device)
 {
struct pglist_data *pgdata;
struct zone *zone;
@@ -127,7 +127,7 @@ int arch_add_memory(int nid, u64 start, u64 size, bool 
for_device)
pgdata = NODE_DATA(nid);
 
start = (unsigned long)__va(start);
-   rc = create_section_mapping(start, start + size);
+   rc = vmemmap_create_mapping(start, size, __pa(start));
if (rc) {
pr_warning(
"Unable to create mapping for hot added memory 
0x%llx..0x%llx: %d\n",
@@ -143,7 +143,7 @@ int arch_add_memory(int nid, u64 start, u64 size, bool 
for_device)
 }
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
-int arch_remove_memory(u64 start, u64 size)
+int __meminit arch_remove_memory(u64 start, u64 size)
 {
unsigned long start_pfn = start >> PAGE_SHIFT;
unsigned long nr_pages = size >> PAGE_SHIFT;
@@ -157,7 +157,7 @@ int arch_remove_memory(u64 start, u64 size)
 
/* Remove htab bolted mappings for this section of memory */
start = (unsigned long)__va(start);
-   ret = remove_section_mapping(start, start + size);
+   vmemmap_remove_mapping(start, size);
 
/* Ensure all vmalloc mappings are flushed in case they also
 * hit that section of memory
-- 
1.8.3.1