Re: [PATCH v6 00/15] memory-hotplug: hot-remove physical memory
On 2013/1/31 18:38, Simon Jeons wrote: Hi Tang, On Thu, 2013-01-31 at 17:44 +0800, Tang Chen wrote: Hi Simon, On 01/31/2013 04:48 PM, Simon Jeons wrote: Hi Tang, On Thu, 2013-01-31 at 15:10 +0800, Tang Chen wrote:

1. IIUC, there is a button on machines which support memory hot-remove; what's the difference between pressing the button and echoing to /sys? No important difference, I think. Since I don't have the machine you are describing, I cannot answer with certainty. :) AFAIK, pressing the button triggers the hotplug from hardware; sysfs is just another entrance. In the end, they run into the same code.

2. Since kernel memory is linearly mapped (I mean the direct mapping part), why can't we put the kernel's direct-mapped memory into one memory device, and the other memory into the other devices? We cannot do that, because that way we would lose NUMA performance. If you know NUMA, you will understand the following example: node0: cpu0~cpu15, memory0~memory511; node1: cpu16~cpu31, memory512~memory1023. cpu16~cpu31 access memory512~memory1023 much faster than memory0~memory511. If we put the direct mapping area in node0 and the movable area in node1, then kernel code running on cpu16~cpu31 will have to access memory0~memory511. This is a terrible performance drop.

So with NUMA configured, kernel memory will no longer be linearly mapped? For example: Node 0: 0~10G, Node 1: 11G~14G. Is kernel memory only on Node 0? Can part of kernel memory also be on Node 1? How big is the kernel direct mapping on x86_64? Is there a max limit? The max kernel direct mapping on x86_64 is 64TB. It seems to be only around 896MB on x86_32. As you know, x86_64 doesn't need highmem; IIUC, all kernel memory is linearly mapped in that case. Is my idea workable? If it is correct, x86_32 can't be implemented the same way, since highmem (kmap/kmap_atomic/vmalloc) can map any address, so it's hard to confine kernel memory to a single memory device. Sorry, I'm not quite familiar with x86_32 boxes.

3. In the current implementation, does memory hotplug need only memory-subsystem and ACPI code support, or does it also need the firmware to take part? Hope you can explain in detail, thanks in advance. :) We need the firmware to take part, e.g. the SRAT in the ACPI BIOS, or the firmware-based memory migration mentioned by Liu Jiang. Is there any material about firmware-based memory migration? So far, this is all I know. :)

4. What's the status of memory hotplug? Apart from not being able to remove kernel memory, is everything else fully implemented? I think the main job is done for now. There are still bugs to fix, and this functionality is not yet stable. Thanks. :)

-- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majord...@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: em...@kvack.org
___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v6 00/15] memory-hotplug: hot-remove physical memory
On 2013/2/1 9:36, Simon Jeons wrote: On Fri, 2013-02-01 at 09:32 +0800, Jianguo Wu wrote: Max kernel direct mapping memory in x86_64 is 64TB.

For example, I have 8G of memory; will all of it be direct-mapped for the kernel? Then where is userspace memory allocated from?

Direct-mapped memory means you can use __va() and __pa() on it; it does not mean it can only be used by the kernel. It can be used by user space too, as long as it is free.
Re: [PATCH v6 00/15] memory-hotplug: hot-remove physical memory
On 2013/2/1 10:06, Simon Jeons wrote: Hi Jianguo, On Fri, 2013-02-01 at 09:57 +0800, Jianguo Wu wrote: Direct-mapped memory means you can use __va() and __pa() on it; it does not mean it can only be used by the kernel. It can be used by user space too, as long as it is free.

IIUC, the benefit of __va() and __pa() is just getting the virtual/physical address quickly, taking advantage of the linear mapping. But the MMU still has to walk pgd/pud/pmd/pte, correct?

Yes.
Re: [PATCH v6 08/15] memory-hotplug: Common APIs to support page tables hot-remove
On 2013/1/29 21:02, Simon Jeons wrote: Hi Tang, On Wed, 2013-01-09 at 17:32 +0800, Tang Chen wrote: From: Wen Congyang we...@cn.fujitsu.com

When memory is removed, the corresponding page tables should also be removed. This patch introduces some common APIs to support vmemmap pagetable and x86_64 architecture pagetable removing.

When is the page table of hot-added memory created? Hi Simon, For x86_64, the page table of hot-added memory is created by: add_memory -> arch_add_memory -> init_memory_mapping -> kernel_physical_mapping_init

All pages of a virtual mapping in removed memory cannot be freed if some pages used as PGD/PUD include not only removed memory but also other memory. So the patch uses the following way to check whether a page can be freed or not. 1. When removing memory, the page structs of the removed memory are filled with 0xFD. 2. When all page structs on a PT/PMD are filled with 0xFD, the PT/PMD can be cleared. In this case, the page used as PT/PMD can be freed.

Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com Signed-off-by: Jianguo Wu wujian...@huawei.com Signed-off-by: Wen Congyang we...@cn.fujitsu.com Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
---
 arch/x86/include/asm/pgtable_types.h |   1 +
 arch/x86/mm/init_64.c                | 299 ++
 arch/x86/mm/pageattr.c               |  47 +++---
 include/linux/bootmem.h              |   1 +
 4 files changed, 326 insertions(+), 22 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 3c32db8..4b6fd2a 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -352,6 +352,7 @@ static inline void update_page_count(int level, unsigned long pages) { }
  * as a pte too.
  */
 extern pte_t *lookup_address(unsigned long address, unsigned int *level);
+extern int __split_large_page(pte_t *kpte, unsigned long address, pte_t *pbase);
 
 #endif	/* !__ASSEMBLY__ */

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 9ac1723..fe01116 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -682,6 +682,305 @@ int arch_add_memory(int nid, u64 start, u64 size)
 }
 EXPORT_SYMBOL_GPL(arch_add_memory);
 
+#define PAGE_INUSE 0xFD
+
+static void __meminit free_pagetable(struct page *page, int order)
+{
+	struct zone *zone;
+	bool bootmem = false;
+	unsigned long magic;
+	unsigned int nr_pages = 1 << order;
+
+	/* bootmem page has reserved flag */
+	if (PageReserved(page)) {
+		__ClearPageReserved(page);
+		bootmem = true;
+
+		magic = (unsigned long)page->lru.next;
+		if (magic == SECTION_INFO || magic == MIX_SECTION_INFO) {
+			while (nr_pages--)
+				put_page_bootmem(page++);
+		} else
+			__free_pages_bootmem(page, order);
+	} else
+		free_pages((unsigned long)page_address(page), order);
+
+	/*
+	 * SECTION_INFO pages and MIX_SECTION_INFO pages
+	 * are all allocated by bootmem.
+	 */
+	if (bootmem) {
+		zone = page_zone(page);
+		zone_span_writelock(zone);
+		zone->present_pages += nr_pages;
+		zone_span_writeunlock(zone);
+		totalram_pages += nr_pages;
+	}
+}
+
+static void __meminit free_pte_table(pte_t *pte_start, pmd_t *pmd)
+{
+	pte_t *pte;
+	int i;
+
+	for (i = 0; i < PTRS_PER_PTE; i++) {
+		pte = pte_start + i;
+		if (pte_val(*pte))
+			return;
+	}
+
+	/* free a pte table */
+	free_pagetable(pmd_page(*pmd), 0);
+	spin_lock(&init_mm.page_table_lock);
+	pmd_clear(pmd);
+	spin_unlock(&init_mm.page_table_lock);
+}
+
+static void __meminit free_pmd_table(pmd_t *pmd_start, pud_t *pud)
+{
+	pmd_t *pmd;
+	int i;
+
+	for (i = 0; i < PTRS_PER_PMD; i++) {
+		pmd = pmd_start + i;
+		if (pmd_val(*pmd))
+			return;
+	}
+
+	/* free a pmd table */
+	free_pagetable(pud_page(*pud), 0);
+	spin_lock(&init_mm.page_table_lock);
+	pud_clear(pud);
+	spin_unlock(&init_mm.page_table_lock);
+}
+
+/* Return true if pgd is changed, otherwise return false. */
+static bool __meminit free_pud_table(pud_t *pud_start, pgd_t *pgd)
+{
+	pud_t *pud;
+	int i;
+
+	for (i = 0; i < PTRS_PER_PUD; i++) {
+		pud = pud_start + i;
+		if (pud_val(*pud))
+			return false;
+	}
+
+	/* free a pud table */
+	free_pagetable(pgd_page(*pgd), 0);
+	spin_lock(&init_mm.page_table_lock);
+	pgd_clear(pgd);
+	spin_unlock(&init_mm.page_table_lock);
+
+	return true;
+}
+
+static void __meminit
+remove_pte_table(pte_t *pte_start, unsigned long addr, unsigned long end,
+		 bool direct)
+{
+	unsigned long
Re: [PATCH v5 06/14] memory-hotplug: implement register_page_bootmem_info_section of sparse-vmemmap
);
+
+		pmd = pmd_offset(pud, addr);
+		if (pmd_none(*pmd))
+			continue;
+		get_page_bootmem(section_nr, pmd_page(*pmd),
+				 SECTION_INFO);

Hi Tang, In this case, the pmd maps 512 pages, but you only call get_page_bootmem() on the first page. I think get_page_bootmem() should be applied to the whole 512 pages; what do you think? Thanks, Jianguo Wu

+	}
+	}
+}
+
 void __meminit vmemmap_populate_print_last(void)
 {
 	if (p_start) {

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 31a563b..2441f36 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -174,17 +174,10 @@ static inline void arch_refresh_nodedata(int nid, pg_data_t *pgdat)
 #endif /* CONFIG_NUMA */
 #endif /* CONFIG_HAVE_ARCH_NODEDATA_EXTENSION */
 
-#ifdef CONFIG_SPARSEMEM_VMEMMAP
-static inline void register_page_bootmem_info_node(struct pglist_data *pgdat)
-{
-}
-static inline void put_page_bootmem(struct page *page)
-{
-}
-#else
 extern void register_page_bootmem_info_node(struct pglist_data *pgdat);
 extern void put_page_bootmem(struct page *page);
-#endif
+extern void get_page_bootmem(unsigned long info, struct page *page,
+			     unsigned long type);
 
 /*
  * Lock for memory hotplug guarantees 1) all callbacks for memory hotplug

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 6320407..1eca498 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1709,7 +1709,8 @@
 int vmemmap_populate_basepages(struct page *start_page, unsigned long pages, int node);
 int vmemmap_populate(struct page *start_page, unsigned long pages, int node);
 void vmemmap_populate_print_last(void);
-
+void register_page_bootmem_memmap(unsigned long section_nr, struct page *map,
+				  unsigned long size);
 enum mf_flags {
 	MF_COUNT_INCREASED = 1 << 0,

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 2c5d734..34c656b 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -91,9 +91,8 @@ static void release_memory_resource(struct resource *res)
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
-#ifndef CONFIG_SPARSEMEM_VMEMMAP
-static void get_page_bootmem(unsigned long info, struct page *page,
-			     unsigned long type)
+void get_page_bootmem(unsigned long info, struct page *page,
+		      unsigned long type)
 {
 	page->lru.next = (struct list_head *) type;
 	SetPagePrivate(page);
@@ -128,6 +127,7 @@ void __ref put_page_bootmem(struct page *page)
 
 }
 
+#ifndef CONFIG_SPARSEMEM_VMEMMAP
 static void register_page_bootmem_info_section(unsigned long start_pfn)
 {
 	unsigned long *usemap, mapsize, section_nr, i;
@@ -161,6 +161,32 @@ static void register_page_bootmem_info_section(unsigned long start_pfn)
 	get_page_bootmem(section_nr, page, MIX_SECTION_INFO);
 }
+#else
+static void register_page_bootmem_info_section(unsigned long start_pfn)
+{
+	unsigned long *usemap, mapsize, section_nr, i;
+	struct mem_section *ms;
+	struct page *page, *memmap;
+
+	if (!pfn_valid(start_pfn))
+		return;
+
+	section_nr = pfn_to_section_nr(start_pfn);
+	ms = __nr_to_section(section_nr);
+
+	memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
+
+	register_page_bootmem_memmap(section_nr, memmap, PAGES_PER_SECTION);
+
+	usemap = __nr_to_section(section_nr)->pageblock_flags;
+	page = virt_to_page(usemap);
+
+	mapsize = PAGE_ALIGN(usemap_size()) >> PAGE_SHIFT;
+
+	for (i = 0; i < mapsize; i++, page++)
+		get_page_bootmem(section_nr, page, MIX_SECTION_INFO);
+}
+#endif
 
 void register_page_bootmem_info_node(struct pglist_data *pgdat)
 {
@@ -203,7 +229,6 @@ void register_page_bootmem_info_node(struct pglist_data *pgdat)
 			register_page_bootmem_info_section(pfn);
 	}
 }
-#endif /* !CONFIG_SPARSEMEM_VMEMMAP */
 
 static void grow_zone_span(struct zone *zone, unsigned long start_pfn,
 			   unsigned long end_pfn)
Re: [PATCH v5 08/14] memory-hotplug: Common APIs to support page tables hot-remove
On 2012/12/24 20:09, Tang Chen wrote: From: Wen Congyang we...@cn.fujitsu.com

When memory is removed, the corresponding page tables should also be removed. This patch introduces some common APIs to support vmemmap pagetable and x86_64 architecture pagetable removing.

All pages of a virtual mapping in removed memory cannot be freed if some pages used as PGD/PUD include not only removed memory but also other memory. So the patch uses the following way to check whether a page can be freed or not. 1. When removing memory, the page structs of the removed memory are filled with 0xFD. 2. When all page structs on a PT/PMD are filled with 0xFD, the PT/PMD can be cleared. In this case, the page used as PT/PMD can be freed.

Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com Signed-off-by: Jianguo Wu wujian...@huawei.com Signed-off-by: Wen Congyang we...@cn.fujitsu.com Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
---
 arch/x86/include/asm/pgtable_types.h |   1 +
 arch/x86/mm/init_64.c                | 297 ++
 arch/x86/mm/pageattr.c               |  47 +++---
 include/linux/bootmem.h              |   1 +
 4 files changed, 324 insertions(+), 22 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 3c32db8..4b6fd2a 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -352,6 +352,7 @@ static inline void update_page_count(int level, unsigned long pages) { }
  * as a pte too.
  */
 extern pte_t *lookup_address(unsigned long address, unsigned int *level);
+extern int __split_large_page(pte_t *kpte, unsigned long address, pte_t *pbase);
 
 #endif	/* !__ASSEMBLY__ */

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index aeaa27e..b30df3c 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -682,6 +682,303 @@ int arch_add_memory(int nid, u64 start, u64 size)
 }
 EXPORT_SYMBOL_GPL(arch_add_memory);
 
+#define PAGE_INUSE 0xFD
+
+static void __meminit free_pagetable(struct page *page, int order)
+{
+	struct zone *zone;
+	bool bootmem = false;
+	unsigned long magic;
+
+	/* bootmem page has reserved flag */
+	if (PageReserved(page)) {
+		__ClearPageReserved(page);
+		bootmem = true;
+
+		magic = (unsigned long)page->lru.next;
+		if (magic == SECTION_INFO || magic == MIX_SECTION_INFO)
+			put_page_bootmem(page);

Hi Tang, For removing the memmap of sparse-vmemmap, in the cpu_has_pse case, if magic == SECTION_INFO, the order will be get_order(PMD_SIZE), so we need a loop here to put all the 512 pages. Thanks, Jianguo Wu

+		else
+			__free_pages_bootmem(page, order);
+	} else
+		free_pages((unsigned long)page_address(page), order);
+
+	/*
+	 * SECTION_INFO pages and MIX_SECTION_INFO pages
+	 * are all allocated by bootmem.
+	 */
+	if (bootmem) {
+		zone = page_zone(page);
+		zone_span_writelock(zone);
+		zone->present_pages++;
+		zone_span_writeunlock(zone);
+		totalram_pages++;
+	}
+}
+
+static void __meminit free_pte_table(pte_t *pte_start, pmd_t *pmd)
+{
+	pte_t *pte;
+	int i;
+
+	for (i = 0; i < PTRS_PER_PTE; i++) {
+		pte = pte_start + i;
+		if (pte_val(*pte))
+			return;
+	}
+
+	/* free a pte table */
+	free_pagetable(pmd_page(*pmd), 0);
+	spin_lock(&init_mm.page_table_lock);
+	pmd_clear(pmd);
+	spin_unlock(&init_mm.page_table_lock);
+}
+
+static void __meminit free_pmd_table(pmd_t *pmd_start, pud_t *pud)
+{
+	pmd_t *pmd;
+	int i;
+
+	for (i = 0; i < PTRS_PER_PMD; i++) {
+		pmd = pmd_start + i;
+		if (pmd_val(*pmd))
+			return;
+	}
+
+	/* free a pmd table */
+	free_pagetable(pud_page(*pud), 0);
+	spin_lock(&init_mm.page_table_lock);
+	pud_clear(pud);
+	spin_unlock(&init_mm.page_table_lock);
+}
+
+/* Return true if pgd is changed, otherwise return false. */
+static bool __meminit free_pud_table(pud_t *pud_start, pgd_t *pgd)
+{
+	pud_t *pud;
+	int i;
+
+	for (i = 0; i < PTRS_PER_PUD; i++) {
+		pud = pud_start + i;
+		if (pud_val(*pud))
+			return false;
+	}
+
+	/* free a pud table */
+	free_pagetable(pgd_page(*pgd), 0);
+	spin_lock(&init_mm.page_table_lock);
+	pgd_clear(pgd);
+	spin_unlock(&init_mm.page_table_lock);
+
+	return true;
+}
+
+static void __meminit
+remove_pte_table(pte_t *pte_start, unsigned long addr, unsigned long end,
+		 bool direct)
+{
+	unsigned long next, pages = 0;
+	pte_t *pte;
+	void *page_addr;
+	phys_addr_t phys_addr;
+
+	pte
Re: [Patch v4 08/12] memory-hotplug: remove memmap of sparse-vmemmap
Hi Tang, On 2012/12/7 9:42, Tang Chen wrote: Hi Wu, I met some problems when I was digging into the code. It would be very kind of you if you could help me with them. :) If I misunderstood your code, please tell me. Please see below. :) On 12/03/2012 10:23 AM, Jianguo Wu wrote:

Signed-off-by: Jianguo Wu <wujian...@huawei.com> Signed-off-by: Jiang Liu <jiang@huawei.com>
---
 include/linux/mm.h  |   1 +
 mm/sparse-vmemmap.c | 231 +++
 mm/sparse.c         |   3 +-
 3 files changed, 234 insertions(+), 1 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 5657670..1f26af5 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1642,6 +1642,7 @@
 int vmemmap_populate(struct page *start_page, unsigned long pages, int node);
 void vmemmap_populate_print_last(void);
 void register_page_bootmem_memmap(unsigned long section_nr, struct page *map,
 				  unsigned long size);
+void vmemmap_free(struct page *memmap, unsigned long nr_pages);
 
 enum mf_flags {
 	MF_COUNT_INCREASED = 1 << 0,

diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 1b7e22a..748732d 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -29,6 +29,10 @@
 #include <asm/pgalloc.h>
 #include <asm/pgtable.h>
 
+#ifdef CONFIG_MEMORY_HOTREMOVE
+#include <asm/tlbflush.h>
+#endif
+
 /*
  * Allocate a block of memory to be used to back the virtual memory map
  * or to back the page tables that are used to create the mapping.
@@ -224,3 +228,230 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
 		vmemmap_buf_end = NULL;
 	}
 }
+
+#ifdef CONFIG_MEMORY_HOTREMOVE
+
+#define PAGE_INUSE 0xFD
+
+static void vmemmap_free_pages(struct page *page, int order)
+{
+	struct zone *zone;
+	unsigned long magic;
+
+	magic = (unsigned long) page->lru.next;
+	if (magic == SECTION_INFO || magic == MIX_SECTION_INFO) {
+		put_page_bootmem(page);
+
+		zone = page_zone(page);
+		zone_span_writelock(zone);
+		zone->present_pages++;
+		zone_span_writeunlock(zone);
+		totalram_pages++;
+	} else
+		free_pages((unsigned long)page_address(page), order);

Here, I think SECTION_INFO and MIX_SECTION_INFO pages are all allocated by bootmem, so I put this function this way. I'm not sure the parameter order is necessary here; it will always be 0 in your code. Is this OK with you?

The parameter order is necessary in the cpu_has_pse case: vmemmap_pmd_remove calls free_pagetable(pmd_page(*pmd), get_order(PMD_SIZE)).

static void free_pagetable(struct page *page)
{
	struct zone *zone;
	bool bootmem = false;
	unsigned long magic;

	/* bootmem page has reserved flag */
	if (PageReserved(page)) {
		__ClearPageReserved(page);
		bootmem = true;
	}

	magic = (unsigned long) page->lru.next;
	if (magic == SECTION_INFO || magic == MIX_SECTION_INFO)
		put_page_bootmem(page);
	else
		__free_page(page);

	/*
	 * SECTION_INFO pages and MIX_SECTION_INFO pages
	 * are all allocated by bootmem.
	 */
	if (bootmem) {
		zone = page_zone(page);
		zone_span_writelock(zone);
		zone->present_pages++;
		zone_span_writeunlock(zone);
		totalram_pages++;
	}
}

(snip)

+static void vmemmap_pte_remove(pmd_t *pmd, unsigned long addr, unsigned long end)
+{
+	pte_t *pte;
+	unsigned long next;
+	void *page_addr;
+
+	pte = pte_offset_kernel(pmd, addr);
+	for (; addr < end; pte++, addr += PAGE_SIZE) {
+		next = (addr + PAGE_SIZE) & PAGE_MASK;
+		if (next > end)
+			next = end;
+
+		if (pte_none(*pte))

Here, you checked xxx_none() in your vmemmap_xxx_remove(), but you used !xxx_present() in your x86_64 patches. Is it OK if I only check !xxx_present()? It is OK.

+			continue;
+		if (IS_ALIGNED(addr, PAGE_SIZE) &&
+		    IS_ALIGNED(next, PAGE_SIZE)) {
+			vmemmap_free_pages(pte_page(*pte), 0);
+			spin_lock(&init_mm.page_table_lock);
+			pte_clear(&init_mm, addr, pte);
+			spin_unlock(&init_mm.page_table_lock);
+		} else {
+			/*
+			 * Removed page structs are filled with 0xFD.
+			 */
+			memset((void *)addr, PAGE_INUSE, next - addr);
+			page_addr = page_address(pte_page(*pte));
+
+			if (!memchr_inv(page_addr, PAGE_INUSE, PAGE_SIZE)) {
+				spin_lock(&init_mm.page_table_lock);
+				pte_clear(&init_mm, addr, pte);
+				spin_unlock(&init_mm.page_table_lock);

Here, since we clear the pte, we should also free the page, right? Right, I forgot here, sorry
Re: [Patch v4 09/12] memory-hotplug: remove page table of x86_64 architecture
On 2012/12/7 14:43, Tang Chen wrote: On 11/27/2012 06:00 PM, Wen Congyang wrote: For hot-removing memory, we should remove the page tables for that memory. So the patch searches the page tables for the removed memory and clears them. (snip)

+void __meminit
+kernel_physical_mapping_remove(unsigned long start, unsigned long end)
+{
+	unsigned long next;
+	bool pgd_changed = false;
+
+	start = (unsigned long)__va(start);
+	end = (unsigned long)__va(end);

Hi Wu, Here, you expect start and end to be physical addresses. But in the phys_xxx_remove() functions, I think using virtual addresses would work just fine. Functions like pmd_addr_end() and pud_index() only calculate an offset.

Hi Tang, Virtual addresses will work fine; I used physical addresses in order to stay consistent with phys_pud[pmd/pte]_init(), so I think we should keep this. Thanks, Jianguo Wu

So, would you please tell me if we have to use physical addresses here? Thanks. :)

+
+	for (; start < end; start = next) {
+		pgd_t *pgd = pgd_offset_k(start);
+		pud_t *pud;
+
+		next = pgd_addr_end(start, end);
+
+		if (!pgd_present(*pgd))
+			continue;
+
+		pud = map_low_page((pud_t *)pgd_page_vaddr(*pgd));
+		phys_pud_remove(pud, __pa(start), __pa(next));
+		if (free_pud_table(pud, pgd))
+			pgd_changed = true;
+		unmap_low_page(pud);
+	}
+
+	if (pgd_changed)
+		sync_global_pgds(start, end - 1);
+
+	flush_tlb_all();
+}
+
 #ifdef CONFIG_MEMORY_HOTREMOVE
 int __ref arch_remove_memory(u64 start, u64 size)
 {
@@ -692,6 +921,8 @@ int __ref arch_remove_memory(u64 start, u64 size)
 	ret = __remove_pages(zone, start_pfn, nr_pages);
 	WARN_ON_ONCE(ret);
 
+	kernel_physical_mapping_remove(start, start + size);
+
 	return ret;
 }
 #endif
Re: [Patch v4 08/12] memory-hotplug: remove memmap of sparse-vmemmap
Hi Tang, Thanks for your review and comments; please see below for my reply. On 2012/12/4 17:13, Tang Chen wrote: Hi Wu, Sorry to make noise here. Please see below. :) On 12/03/2012 10:23 AM, Jianguo Wu wrote:

Signed-off-by: Jianguo Wu <wujian...@huawei.com> Signed-off-by: Jiang Liu <jiang@huawei.com>
---
 include/linux/mm.h  |   1 +
 mm/sparse-vmemmap.c | 231 +++
 mm/sparse.c         |   3 +-
 3 files changed, 234 insertions(+), 1 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 5657670..1f26af5 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1642,6 +1642,7 @@
 int vmemmap_populate(struct page *start_page, unsigned long pages, int node);
 void vmemmap_populate_print_last(void);
 void register_page_bootmem_memmap(unsigned long section_nr, struct page *map,
 				  unsigned long size);
+void vmemmap_free(struct page *memmap, unsigned long nr_pages);
 
 enum mf_flags {
 	MF_COUNT_INCREASED = 1 << 0,

diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 1b7e22a..748732d 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -29,6 +29,10 @@
 #include <asm/pgalloc.h>
 #include <asm/pgtable.h>
 
+#ifdef CONFIG_MEMORY_HOTREMOVE
+#include <asm/tlbflush.h>
+#endif
+
 /*
  * Allocate a block of memory to be used to back the virtual memory map
  * or to back the page tables that are used to create the mapping.
@@ -224,3 +228,230 @@ void __init sparse_mem_maps_populate_node(struct page **map_map,
 		vmemmap_buf_end = NULL;
 	}
 }
+
+#ifdef CONFIG_MEMORY_HOTREMOVE
+
+#define PAGE_INUSE 0xFD
+
+static void vmemmap_free_pages(struct page *page, int order)
+{
+	struct zone *zone;
+	unsigned long magic;
+
+	magic = (unsigned long) page->lru.next;
+	if (magic == SECTION_INFO || magic == MIX_SECTION_INFO) {
+		put_page_bootmem(page);
+
+		zone = page_zone(page);
+		zone_span_writelock(zone);
+		zone->present_pages++;
+		zone_span_writeunlock(zone);
+		totalram_pages++;

Seems that we have different ways to handle pages allocated by bootmem and by the regular allocator. Is the checking way in [PATCH 09/12] available here?

+	/* bootmem page has reserved flag */
+	if (PageReserved(page)) {
	..
+	}

If so, I think we can just merge these two functions.

Hmm, the direct mapping table isn't allocated by a bootmem allocator such as memblock, so it can't be freed by put_page_bootmem(). But I will try to merge these two functions.

+	} else
+		free_pages((unsigned long)page_address(page), order);
+}
+
+static void free_pte_table(pmd_t *pmd)
+{
+	pte_t *pte, *pte_start;
+	int i;
+
+	pte_start = (pte_t *)pmd_page_vaddr(*pmd);
+	for (i = 0; i < PTRS_PER_PTE; i++) {
+		pte = pte_start + i;
+		if (pte_val(*pte))
+			return;
+	}
+
+	/* free a pte table */
+	vmemmap_free_pages(pmd_page(*pmd), 0);
+	spin_lock(&init_mm.page_table_lock);
+	pmd_clear(pmd);
+	spin_unlock(&init_mm.page_table_lock);
+}
+
+static void free_pmd_table(pud_t *pud)
+{
+	pmd_t *pmd, *pmd_start;
+	int i;
+
+	pmd_start = (pmd_t *)pud_page_vaddr(*pud);
+	for (i = 0; i < PTRS_PER_PMD; i++) {
+		pmd = pmd_start + i;
+		if (pmd_val(*pmd))
+			return;
+	}
+
+	/* free a pmd table */
+	vmemmap_free_pages(pud_page(*pud), 0);
+	spin_lock(&init_mm.page_table_lock);
+	pud_clear(pud);
+	spin_unlock(&init_mm.page_table_lock);
+}
+
+static void free_pud_table(pgd_t *pgd)
+{
+	pud_t *pud, *pud_start;
+	int i;
+
+	pud_start = (pud_t *)pgd_page_vaddr(*pgd);
+	for (i = 0; i < PTRS_PER_PUD; i++) {
+		pud = pud_start + i;
+		if (pud_val(*pud))
+			return;
+	}
+
+	/* free a pud table */
+	vmemmap_free_pages(pgd_page(*pgd), 0);
+	spin_lock(&init_mm.page_table_lock);
+	pgd_clear(pgd);
+	spin_unlock(&init_mm.page_table_lock);
+}

All the free_xxx_table() are very similar to the functions in [PATCH 09/12]. Could we reuse them anyway?

Yes, we can reuse them.

+
+static int split_large_page(pte_t *kpte, unsigned long address, pte_t *pbase)
+{
+	struct page *page = pmd_page(*(pmd_t *)kpte);
+	int i = 0;
+	unsigned long magic;
+	unsigned long section_nr;
+
+	__split_large_page(kpte, address, pbase);

Is this patch going to replace [PATCH 08/12]?

I wish to replace [PATCH 08/12], but I need Congyang and Yasuaki to confirm first :)

If so, __split_large_page() was added and exported in [PATCH 09/12]; then we should move it here, right?

Yes. And what do you think about moving vmemmap_pud[pmd/pte]_remove() to arch/x86/mm/init_64.c, to be consistent with vmemmap_populate()? I will rework [PATCH 08/12] and [PATCH 09/12] soon. Thanks, Jianguo Wu.

If not, free_map_bootmem() and __kfree_section_memmap
Re: [Patch v4 08/12] memory-hotplug: remove memmap of sparse-vmemmap
Hi Tang, On 2012/12/5 10:07, Tang Chen wrote: Hi Wu, On 12/04/2012 08:20 PM, Jianguo Wu wrote: (snip) Seems that we have different ways to handle pages allocated by bootmem or by regular allocator. Is the checking way in [PATCH 09/12] available here ? +/* bootmem page has reserved flag */ +if (PageReserved(page)) { .. +} If so, I think we can just merge these two functions. Hmm, direct mapping table isn't allocated by bootmem allocator such as memblock, can't be free by put_page_bootmem(). But I will try to merge these two functions. Oh, I didn't notice this, thanks. :) (snip) + +__split_large_page(kpte, address, pbase); Is this patch going to replace [PATCH 08/12] ? I wish to replace [PATCH 08/12], but need Congyang and Yasuaki to confirm first:) If so, __split_large_page() was added and exported in [PATCH 09/12], then we should move it here, right ? yes. and what do you think about moving vmemmap_pud[pmd/pte]_remove() to arch/x86/mm/init_64.c, to be consistent with vmemmap_populate() ? It is a good idea since pud/pmd/pte related code could be platform dependent. And I'm also trying to move vmemmap_free() to arch/x86/mm/init_64.c too. I want to have a common interface just like vmemmap_populate(). :) Great. I will rework [PATCH 08/12] and [PATCH 09/12] soon. I am rebasing the whole patch set now. And I think I chould finish part of your work too. A new patch-set is coming soon, and your rework is also welcome. :) Since you are rebasing now, I will wait for your new patche-set :). Thanks. Jianguo Wu Thanks. :) . ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [Patch v4 08/12] memory-hotplug: remove memmap of sparse-vmemmap
Hi Congyang, This is the new version. Thanks, Jianguo Wu. Signed-off-by: Jianguo Wu wujian...@huawei.com Signed-off-by: Jiang Liu jiang@huawei.com --- include/linux/mm.h |1 + mm/sparse-vmemmap.c | 231 +++ mm/sparse.c |3 +- 3 files changed, 234 insertions(+), 1 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 5657670..1f26af5 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1642,6 +1642,7 @@ int vmemmap_populate(struct page *start_page, unsigned long pages, int node); void vmemmap_populate_print_last(void); void register_page_bootmem_memmap(unsigned long section_nr, struct page *map, unsigned long size); +void vmemmap_free(struct page *memmap, unsigned long nr_pages); enum mf_flags { MF_COUNT_INCREASED = 1 0, diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c index 1b7e22a..748732d 100644 --- a/mm/sparse-vmemmap.c +++ b/mm/sparse-vmemmap.c @@ -29,6 +29,10 @@ #include asm/pgalloc.h #include asm/pgtable.h +#ifdef CONFIG_MEMORY_HOTREMOVE +#include asm/tlbflush.h +#endif + /* * Allocate a block of memory to be used to back the virtual memory map * or to back the page tables that are used to create the mapping. 
@@ -224,3 +228,230 @@ void __init sparse_mem_maps_populate_node(struct page **map_map, vmemmap_buf_end = NULL; } } + +#ifdef CONFIG_MEMORY_HOTREMOVE + +#define PAGE_INUSE 0xFD + +static void vmemmap_free_pages(struct page *page, int order) +{ + struct zone *zone; + unsigned long magic; + + magic = (unsigned long) page-lru.next; + if (magic == SECTION_INFO || magic == MIX_SECTION_INFO) { + put_page_bootmem(page); + + zone = page_zone(page); + zone_span_writelock(zone); + zone-present_pages++; + zone_span_writeunlock(zone); + totalram_pages++; + } else + free_pages((unsigned long)page_address(page), order); +} + +static void free_pte_table(pmd_t *pmd) +{ + pte_t *pte, *pte_start; + int i; + + pte_start = (pte_t *)pmd_page_vaddr(*pmd); + for (i = 0; i PTRS_PER_PTE; i++) { + pte = pte_start + i; + if (pte_val(*pte)) + return; + } + + /* free a pte talbe */ + vmemmap_free_pages(pmd_page(*pmd), 0); + spin_lock(init_mm.page_table_lock); + pmd_clear(pmd); + spin_unlock(init_mm.page_table_lock); +} + +static void free_pmd_table(pud_t *pud) +{ + pmd_t *pmd, *pmd_start; + int i; + + pmd_start = (pmd_t *)pud_page_vaddr(*pud); + for (i = 0; i PTRS_PER_PMD; i++) { + pmd = pmd_start + i; + if (pmd_val(*pmd)) + return; + } + + /* free a pmd talbe */ + vmemmap_free_pages(pud_page(*pud), 0); + spin_lock(init_mm.page_table_lock); + pud_clear(pud); + spin_unlock(init_mm.page_table_lock); +} + +static void free_pud_table(pgd_t *pgd) +{ + pud_t *pud, *pud_start; + int i; + + pud_start = (pud_t *)pgd_page_vaddr(*pgd); + for (i = 0; i PTRS_PER_PUD; i++) { + pud = pud_start + i; + if (pud_val(*pud)) + return; + } + + /* free a pud table */ + vmemmap_free_pages(pgd_page(*pgd), 0); + spin_lock(init_mm.page_table_lock); + pgd_clear(pgd); + spin_unlock(init_mm.page_table_lock); +} + +static int split_large_page(pte_t *kpte, unsigned long address, pte_t *pbase) +{ + struct page *page = pmd_page(*(pmd_t *)kpte); + int i = 0; + unsigned long magic; + unsigned long section_nr; + + 
__split_large_page(kpte, address, pbase); + __flush_tlb_all(); + + magic = (unsigned long) page->lru.next; + if (magic == SECTION_INFO) { + section_nr = pfn_to_section_nr(page_to_pfn(page)); + while (i < PTRS_PER_PMD) { + page++; + i++; + get_page_bootmem(section_nr, page, SECTION_INFO); + } + } + + return 0; +} + +static void vmemmap_pte_remove(pmd_t *pmd, unsigned long addr, unsigned long end) +{ + pte_t *pte; + unsigned long next; + void *page_addr; + + pte = pte_offset_kernel(pmd, addr); + for (; addr < end; pte++, addr += PAGE_SIZE) { + next = (addr + PAGE_SIZE) & PAGE_MASK; + if (next > end) + next = end; + + if (pte_none(*pte)) + continue; + if (IS_ALIGNED(addr, PAGE_SIZE) && IS_ALIGNED(next, PAGE_SIZE)) { + vmemmap_free_pages(pte_page(*pte), 0); + spin_lock(&init_mm.page_table_lock); + pte_clear(&init_mm, addr, pte
Re: [Patch v4 08/12] memory-hotplug: remove memmap of sparse-vmemmap
Hi Congyang, Thanks for your review and comments. On 2012/11/30 9:45, Wen Congyang wrote: At 11/28/2012 05:40 PM, Jianguo Wu Wrote: Hi Congyang, I think vmemmap's pgtable pages should be freed after all entries are cleared, I have a patch to do this. The code logic is the same as [Patch v4 09/12] memory-hotplug: remove page table of x86_64 architecture. How do you think about this? Signed-off-by: Jianguo Wu wujian...@huawei.com Signed-off-by: Jiang Liu jiang@huawei.com --- include/linux/mm.h |1 + mm/sparse-vmemmap.c | 214 +++ mm/sparse.c |5 +- 3 files changed, 218 insertions(+), 2 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 5657670..1f26af5 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1642,6 +1642,7 @@ int vmemmap_populate(struct page *start_page, unsigned long pages, int node); void vmemmap_populate_print_last(void); void register_page_bootmem_memmap(unsigned long section_nr, struct page *map, unsigned long size); +void vmemmap_free(struct page *memmap, unsigned long nr_pages); enum mf_flags { MF_COUNT_INCREASED = 1 0, diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c index 1b7e22a..242cb28 100644 --- a/mm/sparse-vmemmap.c +++ b/mm/sparse-vmemmap.c @@ -29,6 +29,10 @@ #include asm/pgalloc.h #include asm/pgtable.h +#ifdef CONFIG_MEMORY_HOTREMOVE +#include asm/tlbflush.h +#endif + /* * Allocate a block of memory to be used to back the virtual memory map * or to back the page tables that are used to create the mapping. 
@@ -224,3 +228,213 @@ void __init sparse_mem_maps_populate_node(struct page **map_map, vmemmap_buf_end = NULL; } } + +#ifdef CONFIG_MEMORY_HOTREMOVE +static void vmemmap_free_pages(struct page *page, int order) +{ +struct zone *zone; +unsigned long magic; + +magic = (unsigned long) page-lru.next; +if (magic == SECTION_INFO || magic == MIX_SECTION_INFO) { +put_page_bootmem(page); + +zone = page_zone(page); +zone_span_writelock(zone); +zone-present_pages++; +zone_span_writeunlock(zone); +totalram_pages++; +} else { +if (is_vmalloc_addr(page_address(page))) +vfree(page_address(page)); Hmm, vmemmap doesn't use vmalloc() to allocate memory. yes, this can be removed. +else +free_pages((unsigned long)page_address(page), order); +} +} + +static void free_pte_table(pmd_t *pmd) +{ +pte_t *pte, *pte_start; +int i; + +pte_start = (pte_t *)pmd_page_vaddr(*pmd); +for (i = 0; i PTRS_PER_PTE; i++) { +pte = pte_start + i; +if (pte_val(*pte)) +return; +} + +/* free a pte talbe */ +vmemmap_free_pages(pmd_page(*pmd), 0); +spin_lock(init_mm.page_table_lock); +pmd_clear(pmd); +spin_unlock(init_mm.page_table_lock); +} + +static void free_pmd_table(pud_t *pud) +{ +pmd_t *pmd, *pmd_start; +int i; + +pmd_start = (pmd_t *)pud_page_vaddr(*pud); +for (i = 0; i PTRS_PER_PMD; i++) { +pmd = pmd_start + i; +if (pmd_val(*pmd)) +return; +} + +/* free a pmd talbe */ +vmemmap_free_pages(pud_page(*pud), 0); +spin_lock(init_mm.page_table_lock); +pud_clear(pud); +spin_unlock(init_mm.page_table_lock); +} + +static void free_pud_table(pgd_t *pgd) +{ +pud_t *pud, *pud_start; +int i; + +pud_start = (pud_t *)pgd_page_vaddr(*pgd); +for (i = 0; i PTRS_PER_PUD; i++) { +pud = pud_start + i; +if (pud_val(*pud)) +return; +} + +/* free a pud table */ +vmemmap_free_pages(pgd_page(*pgd), 0); +spin_lock(init_mm.page_table_lock); +pgd_clear(pgd); +spin_unlock(init_mm.page_table_lock); +} + +static int split_large_page(pte_t *kpte, unsigned long address, pte_t *pbase) +{ +struct page *page = pmd_page(*(pmd_t *)kpte); 
+int i = 0; +unsigned long magic; +unsigned long section_nr; + +__split_large_page(kpte, address, pbase); +__flush_tlb_all(); + +magic = (unsigned long) page->lru.next; +if (magic == SECTION_INFO) { +section_nr = pfn_to_section_nr(page_to_pfn(page)); +while (i < PTRS_PER_PMD) { +page++; +i++; +get_page_bootmem(section_nr, page, SECTION_INFO); +} +} + +return 0; +} + +static void vmemmap_pte_remove(pmd_t *pmd, unsigned long addr, unsigned long end) +{ +pte_t *pte; +unsigned long next; + +pte = pte_offset_kernel(pmd, addr); +for (; addr < end; pte++, addr += PAGE_SIZE) { +next = (addr + PAGE_SIZE
Re: [Patch v4 08/12] memory-hotplug: remove memmap of sparse-vmemmap
Hi Congyang, I think vmemmap's pgtable pages should be freed after all entries are cleared, I have a patch to do this. The code logic is the same as [Patch v4 09/12] memory-hotplug: remove page table of x86_64 architecture. How do you think about this? Signed-off-by: Jianguo Wu wujian...@huawei.com Signed-off-by: Jiang Liu jiang@huawei.com --- include/linux/mm.h |1 + mm/sparse-vmemmap.c | 214 +++ mm/sparse.c |5 +- 3 files changed, 218 insertions(+), 2 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 5657670..1f26af5 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1642,6 +1642,7 @@ int vmemmap_populate(struct page *start_page, unsigned long pages, int node); void vmemmap_populate_print_last(void); void register_page_bootmem_memmap(unsigned long section_nr, struct page *map, unsigned long size); +void vmemmap_free(struct page *memmap, unsigned long nr_pages); enum mf_flags { MF_COUNT_INCREASED = 1 0, diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c index 1b7e22a..242cb28 100644 --- a/mm/sparse-vmemmap.c +++ b/mm/sparse-vmemmap.c @@ -29,6 +29,10 @@ #include asm/pgalloc.h #include asm/pgtable.h +#ifdef CONFIG_MEMORY_HOTREMOVE +#include asm/tlbflush.h +#endif + /* * Allocate a block of memory to be used to back the virtual memory map * or to back the page tables that are used to create the mapping. 
@@ -224,3 +228,213 @@ void __init sparse_mem_maps_populate_node(struct page **map_map, vmemmap_buf_end = NULL; } } + +#ifdef CONFIG_MEMORY_HOTREMOVE +static void vmemmap_free_pages(struct page *page, int order) +{ + struct zone *zone; + unsigned long magic; + + magic = (unsigned long) page-lru.next; + if (magic == SECTION_INFO || magic == MIX_SECTION_INFO) { + put_page_bootmem(page); + + zone = page_zone(page); + zone_span_writelock(zone); + zone-present_pages++; + zone_span_writeunlock(zone); + totalram_pages++; + } else { + if (is_vmalloc_addr(page_address(page))) + vfree(page_address(page)); + else + free_pages((unsigned long)page_address(page), order); + } +} + +static void free_pte_table(pmd_t *pmd) +{ + pte_t *pte, *pte_start; + int i; + + pte_start = (pte_t *)pmd_page_vaddr(*pmd); + for (i = 0; i PTRS_PER_PTE; i++) { + pte = pte_start + i; + if (pte_val(*pte)) + return; + } + + /* free a pte talbe */ + vmemmap_free_pages(pmd_page(*pmd), 0); + spin_lock(init_mm.page_table_lock); + pmd_clear(pmd); + spin_unlock(init_mm.page_table_lock); +} + +static void free_pmd_table(pud_t *pud) +{ + pmd_t *pmd, *pmd_start; + int i; + + pmd_start = (pmd_t *)pud_page_vaddr(*pud); + for (i = 0; i PTRS_PER_PMD; i++) { + pmd = pmd_start + i; + if (pmd_val(*pmd)) + return; + } + + /* free a pmd talbe */ + vmemmap_free_pages(pud_page(*pud), 0); + spin_lock(init_mm.page_table_lock); + pud_clear(pud); + spin_unlock(init_mm.page_table_lock); +} + +static void free_pud_table(pgd_t *pgd) +{ + pud_t *pud, *pud_start; + int i; + + pud_start = (pud_t *)pgd_page_vaddr(*pgd); + for (i = 0; i PTRS_PER_PUD; i++) { + pud = pud_start + i; + if (pud_val(*pud)) + return; + } + + /* free a pud table */ + vmemmap_free_pages(pgd_page(*pgd), 0); + spin_lock(init_mm.page_table_lock); + pgd_clear(pgd); + spin_unlock(init_mm.page_table_lock); +} + +static int split_large_page(pte_t *kpte, unsigned long address, pte_t *pbase) +{ + struct page *page = pmd_page(*(pmd_t *)kpte); + int i = 0; + unsigned long 
magic; + unsigned long section_nr; + + __split_large_page(kpte, address, pbase); + __flush_tlb_all(); + + magic = (unsigned long) page->lru.next; + if (magic == SECTION_INFO) { + section_nr = pfn_to_section_nr(page_to_pfn(page)); + while (i < PTRS_PER_PMD) { + page++; + i++; + get_page_bootmem(section_nr, page, SECTION_INFO); + } + } + + return 0; +} + +static void vmemmap_pte_remove(pmd_t *pmd, unsigned long addr, unsigned long end) +{ + pte_t *pte; + unsigned long next; + + pte = pte_offset_kernel(pmd, addr); + for (; addr < end; pte++, addr += PAGE_SIZE) { + next = (addr + PAGE_SIZE) & PAGE_MASK; + if (next > end) + next = end; + + if (pte_none(*pte)) + continue
Re: [PATCH v3 11/12] memory-hotplug: remove sysfs file of node
On 2012/11/1 17:44, Wen Congyang wrote: This patch introduces a new function try_offline_node() to remove sysfs file of node when all memory sections of this node are removed. If some memory sections of this node are not removed, this function does nothing. CC: David Rientjes rient...@google.com CC: Jiang Liu liu...@gmail.com CC: Len Brown len.br...@intel.com CC: Christoph Lameter c...@linux.com Cc: Minchan Kim minchan@gmail.com CC: Andrew Morton a...@linux-foundation.org CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com CC: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com Signed-off-by: Wen Congyang we...@cn.fujitsu.com --- drivers/acpi/acpi_memhotplug.c | 8 +- include/linux/memory_hotplug.h | 2 +- mm/memory_hotplug.c| 58 -- 3 files changed, 64 insertions(+), 4 deletions(-) diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c index 24c807f..0780f99 100644 --- a/drivers/acpi/acpi_memhotplug.c +++ b/drivers/acpi/acpi_memhotplug.c @@ -310,7 +310,9 @@ static int acpi_memory_disable_device(struct acpi_memory_device *mem_device) { int result; struct acpi_memory_info *info, *n; + int node; + node = acpi_get_node(mem_device-device-handle); /* * Ask the VM to offline this memory range. 
@@ -318,7 +320,11 @@ static int acpi_memory_disable_device(struct acpi_memory_device *mem_device) */ list_for_each_entry_safe(info, n, &mem_device->res_list, list) { if (info->enabled) { - result = remove_memory(info->start_addr, info->length); + if (node < 0) + node = memory_add_physaddr_to_nid( + info->start_addr); + result = remove_memory(node, info->start_addr, + info->length); if (result) return result; } diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h index d4c4402..7b4cfe6 100644 --- a/include/linux/memory_hotplug.h +++ b/include/linux/memory_hotplug.h @@ -231,7 +231,7 @@ extern int arch_add_memory(int nid, u64 start, u64 size); extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages); extern int offline_memory_block(struct memory_block *mem); extern bool is_memblock_offlined(struct memory_block *mem); -extern int remove_memory(u64 start, u64 size); +extern int remove_memory(int node, u64 start, u64 size); extern int sparse_add_one_section(struct zone *zone, unsigned long start_pfn, int nr_pages); extern void sparse_remove_one_section(struct zone *zone, struct mem_section *ms); diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 7bcced0..d965da3 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -29,6 +29,7 @@ #include <linux/suspend.h> #include <linux/mm_inline.h> #include <linux/firmware-map.h> +#include <linux/stop_machine.h> #include <asm/tlbflush.h> @@ -1299,7 +1300,58 @@ static int is_memblock_offlined_cb(struct memory_block *mem, void *arg) return ret; } -int __ref remove_memory(u64 start, u64 size) +static int check_cpu_on_node(void *data) +{ + struct pglist_data *pgdat = data; + int cpu; + + for_each_present_cpu(cpu) { + if (cpu_to_node(cpu) == pgdat->node_id) + /* + * the cpu on this node isn't removed, and we can't + * offline this node. 
+ */ + return -EBUSY; + } + + return 0; +} + +/* offline the node if all memory sections of this node are removed */ +static void try_offline_node(int nid) +{ + unsigned long start_pfn = NODE_DATA(nid)->node_start_pfn; + unsigned long end_pfn = start_pfn + NODE_DATA(nid)->node_spanned_pages; + unsigned long pfn; + + for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) { + unsigned long section_nr = pfn_to_section_nr(pfn); + + if (!present_section_nr(section_nr)) + continue; + + if (pfn_to_nid(pfn) != nid) + continue; + + /* + * some memory sections of this node are not removed, and we + * can't offline node now. + */ + return; + } + + if (stop_machine(check_cpu_on_node, NODE_DATA(nid), NULL)) + return; how about: if (nr_cpus_node(nid)) return; + + /* + * all memory/cpu of this node are removed, we can offline this + * node now. + */ + node_set_offline(nid); + unregister_one_node(nid); +} + +int __ref remove_memory(int nid, u64 start, u64 size) { unsigned long start_pfn,
Re: [PATCH v3 08/12] memory-hotplug: remove memmap of sparse-vmemmap
*/ + vmemmap_kfree(page, nr_pages); } static void free_map_bootmem(struct page *page, unsigned long nr_pages) { + vmemmap_free_bootmem(page, nr_pages); } Hi Congyang, For vmemmap, nr_pages should be PAGES_PER_SECTION for free_map_bootmem(), which is passed by free_section_usemap(), right? But now, nr_pages = PAGE_ALIGN(PAGES_PER_SECTION * sizeof(struct page)) >> PAGE_SHIFT. Signed-off-by: Jianguo Wu wujian...@huawei.com --- mm/sparse.c | 4 1 files changed, 4 insertions(+), 0 deletions(-) diff --git a/mm/sparse.c b/mm/sparse.c index fac95f2..31e5282 100644 --- a/mm/sparse.c +++ b/mm/sparse.c @@ -713,8 +713,12 @@ static void free_section_usemap(struct page *memmap, unsigned long *usemap) struct page *memmap_page; memmap_page = virt_to_page(memmap); +#ifdef CONFIG_SPARSEMEM_VMEMMAP + nr_pages = PAGES_PER_SECTION; +#else nr_pages = PAGE_ALIGN(PAGES_PER_SECTION * sizeof(struct page)) >> PAGE_SHIFT; +#endif free_map_bootmem(memmap_page, nr_pages); } -- 1.7.6.1 #else static struct page *__kmalloc_section_memmap(unsigned long nr_pages)
Re: [PATCH v3 08/12] memory-hotplug: remove memmap of sparse-vmemmap
On 2012/11/27 14:49, Wen Congyang wrote: At 11/27/2012 01:47 PM, Jianguo Wu Wrote: On 2012/11/1 17:44, Wen Congyang wrote: From: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com All pages of virtual mapping in removed memory cannot be freed, since some pages used as PGD/PUD includes not only removed memory but also other memory. So the patch checks whether page can be freed or not. How to check whether page can be freed or not? 1. When removing memory, the page structs of the revmoved memory are filled with 0FD. 2. All page structs are filled with 0xFD on PT/PMD, PT/PMD can be cleared. In this case, the page used as PT/PMD can be freed. Applying patch, __remove_section() of CONFIG_SPARSEMEM_VMEMMAP is integrated into one. So __remove_section() of CONFIG_SPARSEMEM_VMEMMAP is deleted. Note: vmemmap_kfree() and vmemmap_free_bootmem() are not implemented for ia64, ppc, s390, and sparc. CC: David Rientjes rient...@google.com CC: Jiang Liu liu...@gmail.com CC: Len Brown len.br...@intel.com CC: Christoph Lameter c...@linux.com Cc: Minchan Kim minchan@gmail.com CC: Andrew Morton a...@linux-foundation.org CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com CC: Wen Congyang we...@cn.fujitsu.com Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com --- arch/ia64/mm/discontig.c | 8 arch/powerpc/mm/init_64.c | 8 arch/s390/mm/vmem.c | 8 arch/sparc/mm/init_64.c | 8 arch/x86/mm/init_64.c | 119 ++ include/linux/mm.h| 2 + mm/memory_hotplug.c | 17 +-- mm/sparse.c | 5 +- 8 files changed, 158 insertions(+), 17 deletions(-) diff --git a/arch/ia64/mm/discontig.c b/arch/ia64/mm/discontig.c index 33943db..0d23b69 100644 --- a/arch/ia64/mm/discontig.c +++ b/arch/ia64/mm/discontig.c @@ -823,6 +823,14 @@ int __meminit vmemmap_populate(struct page *start_page, return vmemmap_populate_basepages(start_page, size, node); } +void vmemmap_kfree(struct page *memmap, unsigned long nr_pages) +{ +} + +void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages) +{ +} + void 
register_page_bootmem_memmap(unsigned long section_nr, struct page *start_page, unsigned long size) { diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c index 6466440..df7d155 100644 --- a/arch/powerpc/mm/init_64.c +++ b/arch/powerpc/mm/init_64.c @@ -298,6 +298,14 @@ int __meminit vmemmap_populate(struct page *start_page, return 0; } +void vmemmap_kfree(struct page *memmap, unsigned long nr_pages) +{ +} + +void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages) +{ +} + void register_page_bootmem_memmap(unsigned long section_nr, struct page *start_page, unsigned long size) { diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c index 4f4803a..ab69c34 100644 --- a/arch/s390/mm/vmem.c +++ b/arch/s390/mm/vmem.c @@ -236,6 +236,14 @@ out: return ret; } +void vmemmap_kfree(struct page *memmap, unsigned long nr_pages) +{ +} + +void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages) +{ +} + void register_page_bootmem_memmap(unsigned long section_nr, struct page *start_page, unsigned long size) { diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c index 75a984b..546855d 100644 --- a/arch/sparc/mm/init_64.c +++ b/arch/sparc/mm/init_64.c @@ -2232,6 +2232,14 @@ void __meminit vmemmap_populate_print_last(void) } } +void vmemmap_kfree(struct page *memmap, unsigned long nr_pages) +{ +} + +void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages) +{ +} + void register_page_bootmem_memmap(unsigned long section_nr, struct page *start_page, unsigned long size) { diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c index 795dae3..e85626d 100644 --- a/arch/x86/mm/init_64.c +++ b/arch/x86/mm/init_64.c @@ -998,6 +998,125 @@ vmemmap_populate(struct page *start_page, unsigned long size, int node) return 0; } +#define PAGE_INUSE 0xFD + +unsigned long find_and_clear_pte_page(unsigned long addr, unsigned long end, + struct page **pp, int *page_size) +{ + pgd_t *pgd; + pud_t *pud; + pmd_t *pmd; + pte_t *pte = NULL; + 
void *page_addr; + unsigned long next; + + *pp = NULL; + + pgd = pgd_offset_k(addr); + if (pgd_none(*pgd)) + return pgd_addr_end(addr, end); + + pud = pud_offset(pgd, addr); + if (pud_none(*pud)) + return pud_addr_end(addr, end); + + if (!cpu_has_pse) { + next = (addr + PAGE_SIZE) & PAGE_MASK; + pmd = pmd_offset(pud, addr); + if (pmd_none(*pmd)) + return next; + + pte = pte_offset_kernel(pmd, addr
Re: [PATCH v2 10/12] memory-hotplug: memory_hotplug: clear zone when removing the memory
On 2012/10/23 18:30, we...@cn.fujitsu.com wrote: From: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com When a memory is added, we update zone's and pgdat's start_pfn and spanned_pages in the function __add_zone(). So we should revert them when the memory is removed. The patch adds a new function __remove_zone() to do this. CC: David Rientjes rient...@google.com CC: Jiang Liu liu...@gmail.com CC: Len Brown len.br...@intel.com CC: Christoph Lameter c...@linux.com Cc: Minchan Kim minchan@gmail.com CC: Andrew Morton a...@linux-foundation.org CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com Signed-off-by: Wen Congyang we...@cn.fujitsu.com --- mm/memory_hotplug.c | 207 +++ 1 files changed, 207 insertions(+), 0 deletions(-) diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 03153cf..55a228d 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -312,10 +312,213 @@ static int __meminit __add_section(int nid, struct zone *zone, return register_new_memory(nid, __pfn_to_section(phys_start_pfn)); } +/* find the smallest valid pfn in the range [start_pfn, end_pfn) */ +static int find_smallest_section_pfn(int nid, struct zone *zone, + unsigned long start_pfn, + unsigned long end_pfn) +{ + struct mem_section *ms; + + for (; start_pfn end_pfn; start_pfn += PAGES_PER_SECTION) { + ms = __pfn_to_section(start_pfn); + + if (unlikely(!valid_section(ms))) + continue; + + if (unlikely(pfn_to_nid(start_pfn)) != nid) if (unlikely(pfn_to_nid(start_pfn) != nid)) + continue; + + if (zone zone != page_zone(pfn_to_page(start_pfn))) + continue; + + return start_pfn; + } + + return 0; +} + +/* find the biggest valid pfn in the range [start_pfn, end_pfn). */ +static int find_biggest_section_pfn(int nid, struct zone *zone, + unsigned long start_pfn, + unsigned long end_pfn) +{ + struct mem_section *ms; + unsigned long pfn; + + /* pfn is the end pfn of a memory section. 
*/ + pfn = end_pfn - 1; + for (; pfn = start_pfn; pfn -= PAGES_PER_SECTION) { + ms = __pfn_to_section(pfn); + + if (unlikely(!valid_section(ms))) + continue; + + if (unlikely(pfn_to_nid(pfn)) != nid) if (unlikely(pfn_to_nid(pfn) != nid)) + continue; + + if (zone zone != page_zone(pfn_to_page(pfn))) + continue; + + return pfn; + } + + return 0; +} + +static void shrink_zone_span(struct zone *zone, unsigned long start_pfn, + unsigned long end_pfn) +{ + unsigned long zone_start_pfn = zone-zone_start_pfn; + unsigned long zone_end_pfn = zone-zone_start_pfn + zone-spanned_pages; + unsigned long pfn; + struct mem_section *ms; + int nid = zone_to_nid(zone); + + zone_span_writelock(zone); + if (zone_start_pfn == start_pfn) { + /* + * If the section is smallest section in the zone, it need + * shrink zone-zone_start_pfn and zone-zone_spanned_pages. + * In this case, we find second smallest valid mem_section + * for shrinking zone. + */ + pfn = find_smallest_section_pfn(nid, zone, end_pfn, + zone_end_pfn); + if (pfn) { + zone-zone_start_pfn = pfn; + zone-spanned_pages = zone_end_pfn - pfn; + } + } else if (zone_end_pfn == end_pfn) { + /* + * If the section is biggest section in the zone, it need + * shrink zone-spanned_pages. + * In this case, we find second biggest valid mem_section for + * shrinking zone. + */ + pfn = find_biggest_section_pfn(nid, zone, zone_start_pfn, +start_pfn); + if (pfn) + zone-spanned_pages = pfn - zone_start_pfn + 1; + } + + /* + * The section is not biggest or smallest mem_section in the zone, it + * only creates a hole in the zone. So in this case, we need not + * change the zone. But perhaps, the zone has only hole data. Thus + * it check the zone has only hole or not. + */ + pfn = zone_start_pfn; + for (; pfn zone_end_pfn; pfn += PAGES_PER_SECTION) { + ms = __pfn_to_section(pfn); + + if (unlikely(!valid_section(ms))) + continue; + + if
Re: [RFC V7 PATCH 18/19] memory-hotplug: add node_device_release
On 2012/8/20 17:35, we...@cn.fujitsu.com wrote: From: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com When calling unregister_node(), the function shows the following message at device_release(). Device 'node2' does not have a release() function, it is broken and must be fixed. So the patch implements node_device_release() CC: David Rientjes rient...@google.com CC: Jiang Liu liu...@gmail.com CC: Len Brown len.br...@intel.com CC: Benjamin Herrenschmidt b...@kernel.crashing.org CC: Paul Mackerras pau...@samba.org CC: Christoph Lameter c...@linux.com Cc: Minchan Kim minchan@gmail.com CC: Andrew Morton a...@linux-foundation.org CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com Signed-off-by: Wen Congyang we...@cn.fujitsu.com --- drivers/base/node.c | 8 1 files changed, 8 insertions(+), 0 deletions(-) diff --git a/drivers/base/node.c b/drivers/base/node.c index af1a177..9bc2f57 100644 --- a/drivers/base/node.c +++ b/drivers/base/node.c @@ -252,6 +252,13 @@ static inline void hugetlb_register_node(struct node *node) {} static inline void hugetlb_unregister_node(struct node *node) {} #endif +static void node_device_release(struct device *dev) +{ + struct node *node_dev = to_node(dev); + + flush_work(&node_dev->node_work); Hi Congyang, I think this should be: #if defined(CONFIG_MEMORY_HOTPLUG_SPARSE) && defined(CONFIG_HUGETLBFS) flush_work(&node_dev->node_work); #endif As struct node is defined in node.h: struct node { struct sys_device sysdev; #if defined(CONFIG_MEMORY_HOTPLUG_SPARSE) && defined(CONFIG_HUGETLBFS) struct work_struct node_work; #endif }; Thanks Jianguo Wu + memset(node_dev, 0, sizeof(struct node)); +} /* * register_node - Setup a sysfs device for a node. 
@@ -265,6 +272,7 @@ int register_node(struct node *node, int num, struct node *parent) node->dev.id = num; node->dev.bus = &node_subsys; + node->dev.release = node_device_release; error = device_register(&node->dev); if (!error) {