Re: [PATCH v6 00/15] memory-hotplug: hot-remove physical memory

2013-01-31 Thread Jianguo Wu
On 2013/1/31 18:38, Simon Jeons wrote:

 Hi Tang,
 On Thu, 2013-01-31 at 17:44 +0800, Tang Chen wrote:
 Hi Simon,

 On 01/31/2013 04:48 PM, Simon Jeons wrote:
 Hi Tang,
 On Thu, 2013-01-31 at 15:10 +0800, Tang Chen wrote:

 1. IIUC, there is a button on machines which support memory hot-remove;
 then what's the difference between pressing the button and echoing to /sys?

 No important difference, I think. Since I don't have the machine you are
 describing, I cannot answer with certainty. :)
 AFAIK, pressing the button triggers the hotplug from hardware; sysfs
 is just another entrance. In the end, they run into the same code.
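 (For illustration, the sysfs entrance referred to here is the memory block
 state file; offlining a block by hand looks like the following, where XX is
 a hypothetical block number:

	echo offline > /sys/devices/system/memory/memoryXX/state

 Pressing the hardware button instead raises the event from firmware/ACPI,
 but both paths converge on the same offline code in the kernel.)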

 2. Since kernel memory is linearly mapped (I mean the direct-mapping part),
 why can't we put the kernel's direct-mapped memory into one memory device,
 and the other memory into the other devices?

 We cannot do that, because that way we would lose NUMA performance.

 If you know NUMA, you will understand the following example:

 node0:                 node1:
 cpu0~cpu15             cpu16~cpu31
 memory0~memory511      memory512~memory1023

 cpu16~cpu31 access memory512~memory1023 much faster than memory0~memory511.
 If we set the direct-mapping area in node0 and the movable area in node1,
 then kernel code running on cpu16~cpu31 will have to access
 memory0~memory511. That would be a terrible performance hit.
 
 So if NUMA is configured, kernel memory will not be linearly mapped anymore?
 For example,
 
 Node 0       Node 1
 
 0 ~ 10G      11G ~ 14G
 
 Is kernel memory only at Node 0? Can part of kernel memory also be at Node 1?
 
 How big is the kernel direct-mapping area on x86_64? Is there a max limit?


The max kernel direct-mapping size on x86_64 is 64TB.
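For reference, a minimal sketch of where the 64TB figure comes from, assuming
the 4-level-paging layout documented in this era's
Documentation/x86/x86_64/mm.txt (simplified; the real __pa() also handles the
kernel text mapping):

	/*
	 * ffff880000000000 - ffffc7ffffffffff (= 64 TB): direct mapping of
	 * all physical memory. __va()/__pa() are plain offset arithmetic.
	 */
	#define PAGE_OFFSET	0xffff880000000000UL
	#define __va(x)	((void *)((unsigned long)(x) + PAGE_OFFSET))
	#define __pa(x)	((unsigned long)(x) - PAGE_OFFSET)

64TB is simply the size of that virtual window, hence the limit.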

 It seems it is only around 896MB on x86_32.
 

 As you know, x86_64 doesn't need highmem; IIUC, all kernel memory is
 linearly mapped in that case. Is my idea feasible? If it is correct, x86_32
 can't do the same thing, since highmem (kmap/kmap_atomic/vmalloc) can map
 any address, so it's hard to confine kernel memory to a single memory device.

 Sorry, I'm not quite familiar with x86_32 boxes.

 3. In the current implementation, does memory hotplug only need support from
 the memory subsystem and the ACPI code? Or does it also need the firmware to
 take part? Hope you can explain in detail, thanks in advance. :)

 We need the firmware to take part, e.g. the SRAT in the ACPI BIOS, or the
 firmware-based memory migration mentioned by Liu Jiang.
 
 Is there any material about firmware-based memory migration?
 

 So far, I only know this. :)

 4. What's the status of memory hotplug? Apart from not being able to remove
 kernel memory, is everything else fully implemented?

 I think the main job is done for now, but there are still bugs to fix, and
 this functionality is not yet stable.

 Thanks. :)
 
 


Re: [PATCH v6 00/15] memory-hotplug: hot-remove physical memory

2013-01-31 Thread Jianguo Wu
On 2013/2/1 9:36, Simon Jeons wrote:

 On Fri, 2013-02-01 at 09:32 +0800, Jianguo Wu wrote:
 (snip)

 The max kernel direct-mapping size on x86_64 is 64TB.
 
 For example, if I have 8G of memory, will all of it be direct-mapped for
 the kernel? Then where is userspace memory allocated from?

Direct-mapped memory means you can use __va() and __pa() on it, but that
does not mean it can only be used by the kernel; it can be used by
user space too, as long as the pages are free.

 



Re: [PATCH v6 00/15] memory-hotplug: hot-remove physical memory

2013-01-31 Thread Jianguo Wu
On 2013/2/1 10:06, Simon Jeons wrote:

 Hi Jianguo,
 On Fri, 2013-02-01 at 09:57 +0800, Jianguo Wu wrote:
 On 2013/2/1 9:36, Simon Jeons wrote:

 On Fri, 2013-02-01 at 09:32 +0800, Jianguo Wu wrote:
 (snip)

 The max kernel direct-mapping size on x86_64 is 64TB.

 For example, if I have 8G of memory, will all of it be direct-mapped for
 the kernel? Then where is userspace memory allocated from?

 Direct-mapped memory means you can use __va() and __pa() on it, but that
 does not mean it can only be used by the kernel; it can be used by
 user space too, as long as the pages are free.
 
 IIUC, the benefit of __va() and __pa() is just quickly getting the
 virtual/physical address; it takes advantage of the linear mapping. But the
 MMU still needs to go through pgd/pud/pmd/pte for actual accesses, correct?

Yes.
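For illustration, a minimal sketch of the distinction, using a hypothetical
physical address: __va()/__pa() are pure arithmetic on the direct-mapping
offset, while an actual memory access still goes through the hardware
page-table walk (usually satisfied by a large direct-map entry or the TLB):

	void *vaddr = __va(0x100000UL);		/* phys 1MB -> vaddr; no table walk */
	unsigned long paddr = __pa(vaddr);	/* back to 0x100000; no table walk */

	/* Dereferencing vaddr is what makes the MMU walk pgd/pud/pmd/pte: */
	unsigned char first_byte = *(unsigned char *)vaddr;

So the linear mapping saves the software lookup, not the hardware one.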

 






Re: [PATCH v6 08/15] memory-hotplug: Common APIs to support page tables hot-remove

2013-01-29 Thread Jianguo Wu
On 2013/1/29 21:02, Simon Jeons wrote:

 Hi Tang,
 On Wed, 2013-01-09 at 17:32 +0800, Tang Chen wrote:
 From: Wen Congyang we...@cn.fujitsu.com

 When memory is removed, the corresponding page tables should also be removed.
 This patch introduces some common APIs to support removing vmemmap page
 tables and x86_64 architecture page tables.

 
 When is the page table for hot-added memory created?


Hi Simon,

For x86_64, the page table for hot-added memory is created via:

add_memory -> arch_add_memory -> init_memory_mapping -> kernel_physical_mapping_init
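For reference, an abbreviated sketch of that entry point as it looked in this
era's arch/x86/mm/init_64.c (paraphrased, so treat it as illustrative rather
than the exact merged code):

	int arch_add_memory(int nid, u64 start, u64 size)
	{
		struct pglist_data *pgdat = NODE_DATA(nid);
		struct zone *zone = pgdat->node_zones + ZONE_NORMAL;
		unsigned long start_pfn = start >> PAGE_SHIFT;
		unsigned long nr_pages = size >> PAGE_SHIFT;
		int ret;

		/* Build the direct mapping (page tables) for the new range... */
		init_memory_mapping(start, start + size);

		/* ...then create the sections and memmap backing it. */
		ret = __add_pages(nid, zone, start_pfn, nr_pages);
		WARN_ON_ONCE(ret);
		return ret;
	}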

 
 Not all pages of the virtual mapping in removed memory can be freed, since
 pages used as PGD/PUD may cover not only the removed memory but also other
 memory. So the patch uses the following way to check whether a page can be
 freed or not:

  1. When removing memory, the page structs of the removed memory are filled
 with 0xFD.
  2. If all page structs are filled with 0xFD on a PT/PMD, the PT/PMD can be
 cleared. In this case, the page used as a PT/PMD can be freed.
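 (In the patch's own terms, the check boils down to this sketch, where
 pt_page is a hypothetical page currently used as a PT/PMD:

	/* mark the struct pages of the removed range as dead... */
	memset(start, PAGE_INUSE /* 0xFD */, size);

	/* ...a pagetable page is then freeable once nothing live remains: */
	if (!memchr_inv(page_address(pt_page), PAGE_INUSE, PAGE_SIZE))
		free_pagetable(pt_page, 0);

 See the remove_*_table() functions in the diff below for the real flow.)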

 Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
 Signed-off-by: Jianguo Wu wujian...@huawei.com
 Signed-off-by: Wen Congyang we...@cn.fujitsu.com
 Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
 ---
  arch/x86/include/asm/pgtable_types.h |1 +
  arch/x86/mm/init_64.c|  299 
 ++
  arch/x86/mm/pageattr.c   |   47 +++---
  include/linux/bootmem.h  |1 +
  4 files changed, 326 insertions(+), 22 deletions(-)

 diff --git a/arch/x86/include/asm/pgtable_types.h 
 b/arch/x86/include/asm/pgtable_types.h
 index 3c32db8..4b6fd2a 100644
 --- a/arch/x86/include/asm/pgtable_types.h
 +++ b/arch/x86/include/asm/pgtable_types.h
 @@ -352,6 +352,7 @@ static inline void update_page_count(int level, unsigned 
 long pages) { }
   * as a pte too.
   */
  extern pte_t *lookup_address(unsigned long address, unsigned int *level);
 +extern int __split_large_page(pte_t *kpte, unsigned long address, pte_t 
 *pbase);
  
  #endif  /* !__ASSEMBLY__ */
  
 diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
 index 9ac1723..fe01116 100644
 --- a/arch/x86/mm/init_64.c
 +++ b/arch/x86/mm/init_64.c
 @@ -682,6 +682,305 @@ int arch_add_memory(int nid, u64 start, u64 size)
  }
  EXPORT_SYMBOL_GPL(arch_add_memory);
  
 +#define PAGE_INUSE 0xFD
 +
 +static void __meminit free_pagetable(struct page *page, int order)
 +{
 +    struct zone *zone;
 +    bool bootmem = false;
 +    unsigned long magic;
 +    unsigned int nr_pages = 1 << order;
 +
 +    /* bootmem page has reserved flag */
 +    if (PageReserved(page)) {
 +        __ClearPageReserved(page);
 +        bootmem = true;
 +
 +        magic = (unsigned long)page->lru.next;
 +        if (magic == SECTION_INFO || magic == MIX_SECTION_INFO) {
 +            while (nr_pages--)
 +                put_page_bootmem(page++);
 +        } else
 +            __free_pages_bootmem(page, order);
 +    } else
 +        free_pages((unsigned long)page_address(page), order);
 +
 +    /*
 +     * SECTION_INFO pages and MIX_SECTION_INFO pages
 +     * are all allocated by bootmem.
 +     */
 +    if (bootmem) {
 +        zone = page_zone(page);
 +        zone_span_writelock(zone);
 +        zone->present_pages += nr_pages;
 +        zone_span_writeunlock(zone);
 +        totalram_pages += nr_pages;
 +    }
 +}
 +
 +static void __meminit free_pte_table(pte_t *pte_start, pmd_t *pmd)
 +{
 +    pte_t *pte;
 +    int i;
 +
 +    for (i = 0; i < PTRS_PER_PTE; i++) {
 +        pte = pte_start + i;
 +        if (pte_val(*pte))
 +            return;
 +    }
 +
 +    /* free a pte table */
 +    free_pagetable(pmd_page(*pmd), 0);
 +    spin_lock(&init_mm.page_table_lock);
 +    pmd_clear(pmd);
 +    spin_unlock(&init_mm.page_table_lock);
 +}
 +
 +static void __meminit free_pmd_table(pmd_t *pmd_start, pud_t *pud)
 +{
 +    pmd_t *pmd;
 +    int i;
 +
 +    for (i = 0; i < PTRS_PER_PMD; i++) {
 +        pmd = pmd_start + i;
 +        if (pmd_val(*pmd))
 +            return;
 +    }
 +
 +    /* free a pmd table */
 +    free_pagetable(pud_page(*pud), 0);
 +    spin_lock(&init_mm.page_table_lock);
 +    pud_clear(pud);
 +    spin_unlock(&init_mm.page_table_lock);
 +}
 +
 +/* Return true if pgd is changed, otherwise return false. */
 +static bool __meminit free_pud_table(pud_t *pud_start, pgd_t *pgd)
 +{
 +    pud_t *pud;
 +    int i;
 +
 +    for (i = 0; i < PTRS_PER_PUD; i++) {
 +        pud = pud_start + i;
 +        if (pud_val(*pud))
 +            return false;
 +    }
 +
 +    /* free a pud table */
 +    free_pagetable(pgd_page(*pgd), 0);
 +    spin_lock(&init_mm.page_table_lock);
 +    pgd_clear(pgd);
 +    spin_unlock(&init_mm.page_table_lock);
 +
 +    return true;
 +}
 +
 +static void __meminit
 +remove_pte_table(pte_t *pte_start, unsigned long addr, unsigned long end,
 + bool direct)
 +{
 +unsigned long

Re: [PATCH v5 06/14] memory-hotplug: implement register_page_bootmem_info_section of sparse-vmemmap

2012-12-25 Thread Jianguo Wu
);
 +
 +        pmd = pmd_offset(pud, addr);
 +        if (pmd_none(*pmd))
 +            continue;
 +        get_page_bootmem(section_nr, pmd_page(*pmd),
 +                 SECTION_INFO);

Hi Tang,
In this case, the pmd maps 512 pages, but you only call get_page_bootmem()
on the first page.
I think get_page_bootmem() should be called on all 512 pages; what do you think?

Thanks,
Jianguo Wu
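For illustration, a minimal sketch of the loop being suggested, reusing the
names from the quoted hunk (PMD_SIZE/PAGE_SIZE is 512 on x86_64; a sketch,
not the merged code):

	if (!pmd_none(*pmd)) {
		unsigned long nr_pages = PMD_SIZE / PAGE_SIZE;	/* 512 */
		struct page *page = pmd_page(*pmd);

		/* Take the bootmem info reference on every page backing
		 * the PMD-mapped range, not just the first one. */
		while (nr_pages--)
			get_page_bootmem(section_nr, page++, SECTION_INFO);
	}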

 + }
 + }
 +}
 +
  void __meminit vmemmap_populate_print_last(void)
  {
   if (p_start) {
 diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
 index 31a563b..2441f36 100644
 --- a/include/linux/memory_hotplug.h
 +++ b/include/linux/memory_hotplug.h
 @@ -174,17 +174,10 @@ static inline void arch_refresh_nodedata(int nid, 
 pg_data_t *pgdat)
  #endif /* CONFIG_NUMA */
  #endif /* CONFIG_HAVE_ARCH_NODEDATA_EXTENSION */
  
 -#ifdef CONFIG_SPARSEMEM_VMEMMAP
 -static inline void register_page_bootmem_info_node(struct pglist_data *pgdat)
 -{
 -}
 -static inline void put_page_bootmem(struct page *page)
 -{
 -}
 -#else
  extern void register_page_bootmem_info_node(struct pglist_data *pgdat);
  extern void put_page_bootmem(struct page *page);
 -#endif
 +extern void get_page_bootmem(unsigned long info, struct page *page,
 +                 unsigned long type);
  
  /*
   * Lock for memory hotplug guarantees 1) all callbacks for memory hotplug
 diff --git a/include/linux/mm.h b/include/linux/mm.h
 index 6320407..1eca498 100644
 --- a/include/linux/mm.h
 +++ b/include/linux/mm.h
 @@ -1709,7 +1709,8 @@ int vmemmap_populate_basepages(struct page *start_page,
   unsigned long pages, int node);
  int vmemmap_populate(struct page *start_page, unsigned long pages, int node);
  void vmemmap_populate_print_last(void);
 -
 +void register_page_bootmem_memmap(unsigned long section_nr, struct page *map,
 +   unsigned long size);
  
  enum mf_flags {
    MF_COUNT_INCREASED = 1 << 0,
 diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
 index 2c5d734..34c656b 100644
 --- a/mm/memory_hotplug.c
 +++ b/mm/memory_hotplug.c
 @@ -91,9 +91,8 @@ static void release_memory_resource(struct resource *res)
  }
  
  #ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
 -#ifndef CONFIG_SPARSEMEM_VMEMMAP
 -static void get_page_bootmem(unsigned long info,  struct page *page,
 -  unsigned long type)
 +void get_page_bootmem(unsigned long info,  struct page *page,
 +   unsigned long type)
  {
    page->lru.next = (struct list_head *) type;
   SetPagePrivate(page);
 @@ -128,6 +127,7 @@ void __ref put_page_bootmem(struct page *page)
  
  }
  
 +#ifndef CONFIG_SPARSEMEM_VMEMMAP
  static void register_page_bootmem_info_section(unsigned long start_pfn)
  {
   unsigned long *usemap, mapsize, section_nr, i;
 @@ -161,6 +161,32 @@ static void register_page_bootmem_info_section(unsigned 
 long start_pfn)
   get_page_bootmem(section_nr, page, MIX_SECTION_INFO);
  
  }
 +#else
 +static void register_page_bootmem_info_section(unsigned long start_pfn)
 +{
 +    unsigned long *usemap, mapsize, section_nr, i;
 +    struct mem_section *ms;
 +    struct page *page, *memmap;
 +
 +    if (!pfn_valid(start_pfn))
 +        return;
 +
 +    section_nr = pfn_to_section_nr(start_pfn);
 +    ms = __nr_to_section(section_nr);
 +
 +    memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
 +
 +    register_page_bootmem_memmap(section_nr, memmap, PAGES_PER_SECTION);
 +
 +    usemap = __nr_to_section(section_nr)->pageblock_flags;
 +    page = virt_to_page(usemap);
 +
 +    mapsize = PAGE_ALIGN(usemap_size()) >> PAGE_SHIFT;
 +
 +    for (i = 0; i < mapsize; i++, page++)
 +        get_page_bootmem(section_nr, page, MIX_SECTION_INFO);
 +}
 +#endif
  
  void register_page_bootmem_info_node(struct pglist_data *pgdat)
  {
 @@ -203,7 +229,6 @@ void register_page_bootmem_info_node(struct pglist_data 
 *pgdat)
   register_page_bootmem_info_section(pfn);
   }
  }
 -#endif /* !CONFIG_SPARSEMEM_VMEMMAP */
  
  static void grow_zone_span(struct zone *zone, unsigned long start_pfn,
  unsigned long end_pfn)




Re: [PATCH v5 08/14] memory-hotplug: Common APIs to support page tables hot-remove

2012-12-25 Thread Jianguo Wu
On 2012/12/24 20:09, Tang Chen wrote:

 From: Wen Congyang we...@cn.fujitsu.com
 
 When memory is removed, the corresponding page tables should also be removed.
 This patch introduces some common APIs to support removing vmemmap page
 tables and x86_64 architecture page tables.
 
 Not all pages of the virtual mapping in removed memory can be freed, since
 pages used as PGD/PUD may cover not only the removed memory but also other
 memory. So the patch uses the following way to check whether a page can be
 freed or not:
 
  1. When removing memory, the page structs of the removed memory are filled
 with 0xFD.
  2. If all page structs are filled with 0xFD on a PT/PMD, the PT/PMD can be
 cleared. In this case, the page used as a PT/PMD can be freed.
 
 Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
 Signed-off-by: Jianguo Wu wujian...@huawei.com
 Signed-off-by: Wen Congyang we...@cn.fujitsu.com
 Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
 ---
  arch/x86/include/asm/pgtable_types.h |1 +
  arch/x86/mm/init_64.c|  297 
 ++
  arch/x86/mm/pageattr.c   |   47 +++---
  include/linux/bootmem.h  |1 +
  4 files changed, 324 insertions(+), 22 deletions(-)
 
 diff --git a/arch/x86/include/asm/pgtable_types.h 
 b/arch/x86/include/asm/pgtable_types.h
 index 3c32db8..4b6fd2a 100644
 --- a/arch/x86/include/asm/pgtable_types.h
 +++ b/arch/x86/include/asm/pgtable_types.h
 @@ -352,6 +352,7 @@ static inline void update_page_count(int level, unsigned 
 long pages) { }
   * as a pte too.
   */
  extern pte_t *lookup_address(unsigned long address, unsigned int *level);
 +extern int __split_large_page(pte_t *kpte, unsigned long address, pte_t 
 *pbase);
  
  #endif   /* !__ASSEMBLY__ */
  
 diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
 index aeaa27e..b30df3c 100644
 --- a/arch/x86/mm/init_64.c
 +++ b/arch/x86/mm/init_64.c
 @@ -682,6 +682,303 @@ int arch_add_memory(int nid, u64 start, u64 size)
  }
  EXPORT_SYMBOL_GPL(arch_add_memory);
  
 +#define PAGE_INUSE 0xFD
 +
 +static void __meminit free_pagetable(struct page *page, int order)
 +{
 +    struct zone *zone;
 +    bool bootmem = false;
 +    unsigned long magic;
 +
 +    /* bootmem page has reserved flag */
 +    if (PageReserved(page)) {
 +        __ClearPageReserved(page);
 +        bootmem = true;
 +
 +        magic = (unsigned long)page->lru.next;
 +        if (magic == SECTION_INFO || magic == MIX_SECTION_INFO)
 +            put_page_bootmem(page);

Hi Tang,

For removing the memmap of sparse-vmemmap in the cpu_has_pse case, if magic ==
SECTION_INFO, the order will be get_order(PMD_SIZE), so we need a loop here to
put all 512 pages.

Thanks,
Jianguo Wu
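For illustration, a sketch of the loop being suggested; note that the v6
version of free_pagetable() earlier in this digest does exactly this:

	if (magic == SECTION_INFO || magic == MIX_SECTION_INFO) {
		unsigned long nr_pages = 1UL << order;

		/* release every constituent bootmem page of the block */
		while (nr_pages--)
			put_page_bootmem(page++);
	}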

 +        else
 +            __free_pages_bootmem(page, order);
 +    } else
 +        free_pages((unsigned long)page_address(page), order);
 +
 +    /*
 +     * SECTION_INFO pages and MIX_SECTION_INFO pages
 +     * are all allocated by bootmem.
 +     */
 +    if (bootmem) {
 +        zone = page_zone(page);
 +        zone_span_writelock(zone);
 +        zone->present_pages++;
 +        zone_span_writeunlock(zone);
 +        totalram_pages++;
 +    }
 +}
 +
 +static void __meminit free_pte_table(pte_t *pte_start, pmd_t *pmd)
 +{
 +    pte_t *pte;
 +    int i;
 +
 +    for (i = 0; i < PTRS_PER_PTE; i++) {
 +        pte = pte_start + i;
 +        if (pte_val(*pte))
 +            return;
 +    }
 +
 +    /* free a pte table */
 +    free_pagetable(pmd_page(*pmd), 0);
 +    spin_lock(&init_mm.page_table_lock);
 +    pmd_clear(pmd);
 +    spin_unlock(&init_mm.page_table_lock);
 +}
 +
 +static void __meminit free_pmd_table(pmd_t *pmd_start, pud_t *pud)
 +{
 +    pmd_t *pmd;
 +    int i;
 +
 +    for (i = 0; i < PTRS_PER_PMD; i++) {
 +        pmd = pmd_start + i;
 +        if (pmd_val(*pmd))
 +            return;
 +    }
 +
 +    /* free a pmd table */
 +    free_pagetable(pud_page(*pud), 0);
 +    spin_lock(&init_mm.page_table_lock);
 +    pud_clear(pud);
 +    spin_unlock(&init_mm.page_table_lock);
 +}
 +
 +/* Return true if pgd is changed, otherwise return false. */
 +static bool __meminit free_pud_table(pud_t *pud_start, pgd_t *pgd)
 +{
 +    pud_t *pud;
 +    int i;
 +
 +    for (i = 0; i < PTRS_PER_PUD; i++) {
 +        pud = pud_start + i;
 +        if (pud_val(*pud))
 +            return false;
 +    }
 +
 +    /* free a pud table */
 +    free_pagetable(pgd_page(*pgd), 0);
 +    spin_lock(&init_mm.page_table_lock);
 +    pgd_clear(pgd);
 +    spin_unlock(&init_mm.page_table_lock);
 +
 +    return true;
 +}
 +
 +static void __meminit
 +remove_pte_table(pte_t *pte_start, unsigned long addr, unsigned long end,
 +  bool direct)
 +{
 + unsigned long next, pages = 0;
 + pte_t *pte;
 + void *page_addr;
 + phys_addr_t phys_addr;
 +
 + pte

Re: [Patch v4 08/12] memory-hotplug: remove memmap of sparse-vmemmap

2012-12-06 Thread Jianguo Wu
Hi Tang,

On 2012/12/7 9:42, Tang Chen wrote:

 Hi Wu,
 
 I met some problems when I was digging into the code. It's very
 kind of you if you could help me with that. :)
 
 If I misunderstood your code, please tell me.
 Please see below. :)
 
 On 12/03/2012 10:23 AM, Jianguo Wu wrote:
 Signed-off-by: Jianguo Wu wujian...@huawei.com
 Signed-off-by: Jiang Liu jiang@huawei.com
 ---
   include/linux/mm.h  |1 +
   mm/sparse-vmemmap.c |  231 
 +++
   mm/sparse.c |3 +-
   3 files changed, 234 insertions(+), 1 deletions(-)

 diff --git a/include/linux/mm.h b/include/linux/mm.h
 index 5657670..1f26af5 100644
 --- a/include/linux/mm.h
 +++ b/include/linux/mm.h
 @@ -1642,6 +1642,7 @@ int vmemmap_populate(struct page *start_page, unsigned 
 long pages, int node);
   void vmemmap_populate_print_last(void);
   void register_page_bootmem_memmap(unsigned long section_nr, struct page 
 *map,
 unsigned long size);
 +void vmemmap_free(struct page *memmap, unsigned long nr_pages);

   enum mf_flags {
    MF_COUNT_INCREASED = 1 << 0,
 diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
 index 1b7e22a..748732d 100644
 --- a/mm/sparse-vmemmap.c
 +++ b/mm/sparse-vmemmap.c
 @@ -29,6 +29,10 @@
   #include <asm/pgalloc.h>
   #include <asm/pgtable.h>

 +#ifdef CONFIG_MEMORY_HOTREMOVE
 +#include <asm/tlbflush.h>
 +#endif
 +
   /*
* Allocate a block of memory to be used to back the virtual memory map
* or to back the page tables that are used to create the mapping.
 @@ -224,3 +228,230 @@ void __init sparse_mem_maps_populate_node(struct page 
 **map_map,
   vmemmap_buf_end = NULL;
   }
   }
 +
 +#ifdef CONFIG_MEMORY_HOTREMOVE
 +
 +#define PAGE_INUSE 0xFD
 +
 +static void vmemmap_free_pages(struct page *page, int order)
 +{
 +    struct zone *zone;
 +    unsigned long magic;
 +
 +    magic = (unsigned long) page->lru.next;
 +    if (magic == SECTION_INFO || magic == MIX_SECTION_INFO) {
 +        put_page_bootmem(page);
 +
 +        zone = page_zone(page);
 +        zone_span_writelock(zone);
 +        zone->present_pages++;
 +        zone_span_writeunlock(zone);
 +        totalram_pages++;
 +    } else
 +        free_pages((unsigned long)page_address(page), order);
 
 Here, I think SECTION_INFO and MIX_SECTION_INFO pages are all allocated
 by bootmem, so I wrote the function this way.
 
 I'm not sure the order parameter is necessary here. It will always be 0
 in your code. Is this OK with you?
 

The order parameter is necessary in the cpu_has_pse case:

	vmemmap_pmd_remove()
		-> free_pagetable(pmd_page(*pmd), get_order(PMD_SIZE))

 static void free_pagetable(struct page *page)
 {
     struct zone *zone;
     bool bootmem = false;
     unsigned long magic;
 
     /* bootmem page has reserved flag */
     if (PageReserved(page)) {
         __ClearPageReserved(page);
         bootmem = true;
     }
 
     magic = (unsigned long) page->lru.next;
     if (magic == SECTION_INFO || magic == MIX_SECTION_INFO)
         put_page_bootmem(page);
     else
         __free_page(page);
 
     /*
      * SECTION_INFO pages and MIX_SECTION_INFO pages
      * are all allocated by bootmem.
      */
     if (bootmem) {
         zone = page_zone(page);
         zone_span_writelock(zone);
         zone->present_pages++;
         zone_span_writeunlock(zone);
         totalram_pages++;
     }
 }
 
 (snip)
 
 +
 +static void vmemmap_pte_remove(pmd_t *pmd, unsigned long addr, unsigned long end)
 +{
 +    pte_t *pte;
 +    unsigned long next;
 +    void *page_addr;
 +
 +    pte = pte_offset_kernel(pmd, addr);
 +    for (; addr < end; pte++, addr += PAGE_SIZE) {
 +        next = (addr + PAGE_SIZE) & PAGE_MASK;
 +        if (next > end)
 +            next = end;
 +
 +        if (pte_none(*pte))
 
 Here, you checked xxx_none() in your vmemmap_xxx_remove(), but you used
 !xxx_present() in your x86_64 patches. Is it OK if I only check
 !xxx_present()?

It is OK.

 
 +            continue;
 +        if (IS_ALIGNED(addr, PAGE_SIZE) &&
 +            IS_ALIGNED(next, PAGE_SIZE)) {
 +            vmemmap_free_pages(pte_page(*pte), 0);
 +            spin_lock(&init_mm.page_table_lock);
 +            pte_clear(&init_mm, addr, pte);
 +            spin_unlock(&init_mm.page_table_lock);
 +        } else {
 +            /*
 +             * Removed page structs are filled with 0xFD.
 +             */
 +            memset((void *)addr, PAGE_INUSE, next - addr);
 +            page_addr = page_address(pte_page(*pte));
 +
 +            if (!memchr_inv(page_addr, PAGE_INUSE, PAGE_SIZE)) {
 +                spin_lock(&init_mm.page_table_lock);
 +                pte_clear(&init_mm, addr, pte);
 +                spin_unlock(&init_mm.page_table_lock);
 
 Here, since we clear the pte, we should also free the page, right?
 

Right, I forgot that here, sorry.

Re: [Patch v4 09/12] memory-hotplug: remove page table of x86_64 architecture

2012-12-06 Thread Jianguo Wu
On 2012/12/7 14:43, Tang Chen wrote:

 On 11/27/2012 06:00 PM, Wen Congyang wrote:
 For hot-removing memory, we should remove the page tables for that memory.
 So the patch searches for the page tables covering the removed memory and
 clears them.
 
 (snip)
 
 +void __meminit
 +kernel_physical_mapping_remove(unsigned long start, unsigned long end)
 +{
 +unsigned long next;
 +bool pgd_changed = false;
 +
 +start = (unsigned long)__va(start);
 +end = (unsigned long)__va(end);
 
 Hi Wu,
 
 Here, you expect start and end to be physical addresses. But in the
 phys_xxx_remove() functions, I think using virtual addresses is just
 fine. Functions like pmd_addr_end() and pud_index() only calculate
 an offset.


Hi Tang,

 

Virtual addresses would work fine; I used physical addresses in order to
stay consistent with phys_pud[pmd/pte]_init(), so I think we should keep this.

Thanks,
Jianguo Wu

 So, would you please tell me if we have to use physical addresses here?
 
 Thanks. :)
 
 +
 +    for (; start < end; start = next) {
 +        pgd_t *pgd = pgd_offset_k(start);
 +        pud_t *pud;
 +
 +        next = pgd_addr_end(start, end);
 +
 +        if (!pgd_present(*pgd))
 +            continue;
 +
 +        pud = map_low_page((pud_t *)pgd_page_vaddr(*pgd));
 +        phys_pud_remove(pud, __pa(start), __pa(next));
 +        if (free_pud_table(pud, pgd))
 +            pgd_changed = true;
 +        unmap_low_page(pud);
 +    }
 +
 +    if (pgd_changed)
 +        sync_global_pgds(start, end - 1);
 +
 +    flush_tlb_all();
 +}
 +
   #ifdef CONFIG_MEMORY_HOTREMOVE
   int __ref arch_remove_memory(u64 start, u64 size)
   {
 @@ -692,6 +921,8 @@ int __ref arch_remove_memory(u64 start, u64 size)
   ret = __remove_pages(zone, start_pfn, nr_pages);
   WARN_ON_ONCE(ret);

 +kernel_physical_mapping_remove(start, start + size);
 +
   return ret;
   }
   #endif
 
 
 
 





Re: [Patch v4 08/12] memory-hotplug: remove memmap of sparse-vmemmap

2012-12-04 Thread Jianguo Wu
Hi Tang,

Thanks for your review and comments. Please see below for my reply.

On 2012/12/4 17:13, Tang Chen wrote:

 Hi Wu,
 
 Sorry to make noise here. Please see below. :)
 
 On 12/03/2012 10:23 AM, Jianguo Wu wrote:
 Signed-off-by: Jianguo Wu wujian...@huawei.com
 Signed-off-by: Jiang Liu jiang@huawei.com
 ---
   include/linux/mm.h  |1 +
   mm/sparse-vmemmap.c |  231 
 +++
   mm/sparse.c |3 +-
   3 files changed, 234 insertions(+), 1 deletions(-)

 diff --git a/include/linux/mm.h b/include/linux/mm.h
 index 5657670..1f26af5 100644
 --- a/include/linux/mm.h
 +++ b/include/linux/mm.h
 @@ -1642,6 +1642,7 @@ int vmemmap_populate(struct page *start_page, unsigned 
 long pages, int node);
   void vmemmap_populate_print_last(void);
   void register_page_bootmem_memmap(unsigned long section_nr, struct page 
 *map,
 unsigned long size);
 +void vmemmap_free(struct page *memmap, unsigned long nr_pages);

   enum mf_flags {
    MF_COUNT_INCREASED = 1 << 0,
 diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
 index 1b7e22a..748732d 100644
 --- a/mm/sparse-vmemmap.c
 +++ b/mm/sparse-vmemmap.c
 @@ -29,6 +29,10 @@
   #include <asm/pgalloc.h>
   #include <asm/pgtable.h>

 +#ifdef CONFIG_MEMORY_HOTREMOVE
 +#include <asm/tlbflush.h>
 +#endif
 +
   /*
* Allocate a block of memory to be used to back the virtual memory map
* or to back the page tables that are used to create the mapping.
 @@ -224,3 +228,230 @@ void __init sparse_mem_maps_populate_node(struct page 
 **map_map,
   vmemmap_buf_end = NULL;
   }
   }
 +
 +#ifdef CONFIG_MEMORY_HOTREMOVE
 +
 +#define PAGE_INUSE 0xFD
 +
 +static void vmemmap_free_pages(struct page *page, int order)
 +{
 +    struct zone *zone;
 +    unsigned long magic;
 +
 +    magic = (unsigned long) page->lru.next;
 +    if (magic == SECTION_INFO || magic == MIX_SECTION_INFO) {
 +        put_page_bootmem(page);
 +
 +        zone = page_zone(page);
 +        zone_span_writelock(zone);
 +        zone->present_pages++;
 +        zone_span_writeunlock(zone);
 +        totalram_pages++;
 
 Seems that we have different ways to handle pages allocated by bootmem
 and by the regular allocator. Is the checking approach in [PATCH 09/12]
 usable here?
 
 +/* bootmem page has reserved flag */
 +if (PageReserved(page)) {
 ..
 +}
 
 If so, I think we can just merge these two functions.

Hmm, the direct mapping tables aren't allocated by a bootmem allocator such
as memblock, so they can't be freed by put_page_bootmem().
But I will try to merge these two functions.

 
 +    } else
 +        free_pages((unsigned long)page_address(page), order);
 +}
 +
 +static void free_pte_table(pmd_t *pmd)
 +{
 +    pte_t *pte, *pte_start;
 +    int i;
 +
 +    pte_start = (pte_t *)pmd_page_vaddr(*pmd);
 +    for (i = 0; i < PTRS_PER_PTE; i++) {
 +        pte = pte_start + i;
 +        if (pte_val(*pte))
 +            return;
 +    }
 +
 +    /* free a pte table */
 +    vmemmap_free_pages(pmd_page(*pmd), 0);
 +    spin_lock(&init_mm.page_table_lock);
 +    pmd_clear(pmd);
 +    spin_unlock(&init_mm.page_table_lock);
 +}
 +
 +static void free_pmd_table(pud_t *pud)
 +{
 +    pmd_t *pmd, *pmd_start;
 +    int i;
 +
 +    pmd_start = (pmd_t *)pud_page_vaddr(*pud);
 +    for (i = 0; i < PTRS_PER_PMD; i++) {
 +        pmd = pmd_start + i;
 +        if (pmd_val(*pmd))
 +            return;
 +    }
 +
 +    /* free a pmd table */
 +    vmemmap_free_pages(pud_page(*pud), 0);
 +    spin_lock(&init_mm.page_table_lock);
 +    pud_clear(pud);
 +    spin_unlock(&init_mm.page_table_lock);
 +}
 +
 +static void free_pud_table(pgd_t *pgd)
 +{
 +    pud_t *pud, *pud_start;
 +    int i;
 +
 +    pud_start = (pud_t *)pgd_page_vaddr(*pgd);
 +    for (i = 0; i < PTRS_PER_PUD; i++) {
 +        pud = pud_start + i;
 +        if (pud_val(*pud))
 +            return;
 +    }
 +
 +    /* free a pud table */
 +    vmemmap_free_pages(pgd_page(*pgd), 0);
 +    spin_lock(&init_mm.page_table_lock);
 +    pgd_clear(pgd);
 +    spin_unlock(&init_mm.page_table_lock);
 +}
 
 All the free_xxx_table() functions are very similar to the functions in
 [PATCH 09/12]. Could we reuse them anyway?

Yes, we can reuse them.

 
 +
 +static int split_large_page(pte_t *kpte, unsigned long address, pte_t 
 *pbase)
 +{
 +struct page *page = pmd_page(*(pmd_t *)kpte);
 +int i = 0;
 +unsigned long magic;
 +unsigned long section_nr;
 +
 +__split_large_page(kpte, address, pbase);
 
 Is this patch going to replace [PATCH 08/12]?
 

I would like it to replace [PATCH 08/12], but need Congyang and Yasuaki to
confirm first. :)

 If so, __split_large_page() was added and exported in [PATCH 09/12],
 then we should move it here, right ?

Yes.

And what do you think about moving vmemmap_pud[pmd/pte]_remove() to
arch/x86/mm/init_64.c, to be consistent with vmemmap_populate()?

I will rework [PATCH 08/12] and [PATCH 09/12] soon.

Thanks,
Jianguo Wu.

 
 If not, free_map_bootmem() and __kfree_section_memmap

Re: [Patch v4 08/12] memory-hotplug: remove memmap of sparse-vmemmap

2012-12-04 Thread Jianguo Wu
Hi Tang,

On 2012/12/5 10:07, Tang Chen wrote:

 Hi Wu,
 
 On 12/04/2012 08:20 PM, Jianguo Wu wrote:
 (snip)

 Seems that we have different ways to handle pages allocated by bootmem
 and by the regular allocator. Is the checking approach in [PATCH 09/12]
 usable here?

 +/* bootmem page has reserved flag */
 +if (PageReserved(page)) {
 ..
 +}

 If so, I think we can just merge these two functions.

 Hmm, the direct mapping tables aren't allocated by a bootmem allocator such
 as memblock, so they can't be freed by put_page_bootmem().
 But I will try to merge these two functions.

 
 Oh, I didn't notice this, thanks. :)
 
 (snip)
 
 +
 +__split_large_page(kpte, address, pbase);

 Is this patch going to replace [PATCH 08/12]?


 I would like it to replace [PATCH 08/12], but need Congyang and Yasuaki to
 confirm first. :)

 If so, __split_large_page() was added and exported in [PATCH 09/12],
 then we should move it here, right ?

 yes.

 And what do you think about moving vmemmap_pud[pmd/pte]_remove() to
 arch/x86/mm/init_64.c, to be consistent with vmemmap_populate()?
 
 It is a good idea since pud/pmd/pte related code could be platform
 dependent. And I'm also trying to move vmemmap_free() to
 arch/x86/mm/init_64.c too. I want to have a common interface just
 like vmemmap_populate(). :)
 

Great.


 I will rework [PATCH 08/12] and [PATCH 09/12] soon.
 
 I am rebasing the whole patch set now. And I think I could finish part
 of your work too. A new patch-set is coming soon, and your rework is
 also welcome. :)


Since you are rebasing now, I will wait for your new patch-set. :)

Thanks.
Jianguo Wu

 Thanks. :)
 
 
 
 





Re: [Patch v4 08/12] memory-hotplug: remove memmap of sparse-vmemmap

2012-12-02 Thread Jianguo Wu
Hi Congyang,

This is the new version.

Thanks,
Jianguo Wu.


Signed-off-by: Jianguo Wu wujian...@huawei.com
Signed-off-by: Jiang Liu jiang@huawei.com
---
 include/linux/mm.h  |1 +
 mm/sparse-vmemmap.c |  231 +++
 mm/sparse.c |3 +-
 3 files changed, 234 insertions(+), 1 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 5657670..1f26af5 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1642,6 +1642,7 @@ int vmemmap_populate(struct page *start_page, unsigned 
long pages, int node);
 void vmemmap_populate_print_last(void);
 void register_page_bootmem_memmap(unsigned long section_nr, struct page *map,
  unsigned long size);
+void vmemmap_free(struct page *memmap, unsigned long nr_pages);
 
 enum mf_flags {
	MF_COUNT_INCREASED = 1 << 0,
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 1b7e22a..748732d 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -29,6 +29,10 @@
 #include <asm/pgalloc.h>
 #include <asm/pgtable.h>
 
+#ifdef CONFIG_MEMORY_HOTREMOVE
+#include <asm/tlbflush.h>
+#endif
+
 /*
  * Allocate a block of memory to be used to back the virtual memory map
  * or to back the page tables that are used to create the mapping.
@@ -224,3 +228,230 @@ void __init sparse_mem_maps_populate_node(struct page 
**map_map,
vmemmap_buf_end = NULL;
}
 }
+
+#ifdef CONFIG_MEMORY_HOTREMOVE
+
+#define PAGE_INUSE 0xFD
+
+static void vmemmap_free_pages(struct page *page, int order)
+{
+    struct zone *zone;
+    unsigned long magic;
+
+    magic = (unsigned long) page->lru.next;
+    if (magic == SECTION_INFO || magic == MIX_SECTION_INFO) {
+        put_page_bootmem(page);
+
+        zone = page_zone(page);
+        zone_span_writelock(zone);
+        zone->present_pages++;
+        zone_span_writeunlock(zone);
+        totalram_pages++;
+    } else
+        free_pages((unsigned long)page_address(page), order);
+}
+
+static void free_pte_table(pmd_t *pmd)
+{
+    pte_t *pte, *pte_start;
+    int i;
+
+    pte_start = (pte_t *)pmd_page_vaddr(*pmd);
+    for (i = 0; i < PTRS_PER_PTE; i++) {
+        pte = pte_start + i;
+        if (pte_val(*pte))
+            return;
+    }
+
+    /* free a pte table */
+    vmemmap_free_pages(pmd_page(*pmd), 0);
+    spin_lock(&init_mm.page_table_lock);
+    pmd_clear(pmd);
+    spin_unlock(&init_mm.page_table_lock);
+}
+
+static void free_pmd_table(pud_t *pud)
+{
+    pmd_t *pmd, *pmd_start;
+    int i;
+
+    pmd_start = (pmd_t *)pud_page_vaddr(*pud);
+    for (i = 0; i < PTRS_PER_PMD; i++) {
+        pmd = pmd_start + i;
+        if (pmd_val(*pmd))
+            return;
+    }
+
+    /* free a pmd table */
+    vmemmap_free_pages(pud_page(*pud), 0);
+    spin_lock(&init_mm.page_table_lock);
+    pud_clear(pud);
+    spin_unlock(&init_mm.page_table_lock);
+}
+
+static void free_pud_table(pgd_t *pgd)
+{
+    pud_t *pud, *pud_start;
+    int i;
+
+    pud_start = (pud_t *)pgd_page_vaddr(*pgd);
+    for (i = 0; i < PTRS_PER_PUD; i++) {
+        pud = pud_start + i;
+        if (pud_val(*pud))
+            return;
+    }
+
+    /* free a pud table */
+    vmemmap_free_pages(pgd_page(*pgd), 0);
+    spin_lock(&init_mm.page_table_lock);
+    pgd_clear(pgd);
+    spin_unlock(&init_mm.page_table_lock);
+}
+
+static int split_large_page(pte_t *kpte, unsigned long address, pte_t *pbase)
+{
+    struct page *page = pmd_page(*(pmd_t *)kpte);
+    int i = 0;
+    unsigned long magic;
+    unsigned long section_nr;
+
+    __split_large_page(kpte, address, pbase);
+    __flush_tlb_all();
+
+    magic = (unsigned long) page->lru.next;
+    if (magic == SECTION_INFO) {
+        section_nr = pfn_to_section_nr(page_to_pfn(page));
+        while (i < PTRS_PER_PMD) {
+            page++;
+            i++;
+            get_page_bootmem(section_nr, page, SECTION_INFO);
+        }
+    }
+
+    return 0;
+}
+
+static void vmemmap_pte_remove(pmd_t *pmd, unsigned long addr, unsigned long end)
+{
+    pte_t *pte;
+    unsigned long next;
+    void *page_addr;
+
+    pte = pte_offset_kernel(pmd, addr);
+    for (; addr < end; pte++, addr += PAGE_SIZE) {
+        next = (addr + PAGE_SIZE) & PAGE_MASK;
+        if (next > end)
+            next = end;
+
+        if (pte_none(*pte))
+            continue;
+        if (IS_ALIGNED(addr, PAGE_SIZE) &&
+            IS_ALIGNED(next, PAGE_SIZE)) {
+            vmemmap_free_pages(pte_page(*pte), 0);
+            spin_lock(&init_mm.page_table_lock);
+            pte_clear(&init_mm, addr, pte

Re: [Patch v4 08/12] memory-hotplug: remove memmap of sparse-vmemmap

2012-11-29 Thread Jianguo Wu
Hi Congyang,

Thanks for your review and comments.

On 2012/11/30 9:45, Wen Congyang wrote:

 At 11/28/2012 05:40 PM, Jianguo Wu Wrote:
 Hi Congyang,

 I think vmemmap's pgtable pages should be freed after all entries are
 cleared; I have a patch to do this.
 The code logic is the same as [Patch v4 09/12] memory-hotplug: remove page
 table of x86_64 architecture.

 What do you think about this?

 Signed-off-by: Jianguo Wu wujian...@huawei.com
 Signed-off-by: Jiang Liu jiang@huawei.com
 ---
  include/linux/mm.h  |1 +
  mm/sparse-vmemmap.c |  214 
 +++
  mm/sparse.c |5 +-
  3 files changed, 218 insertions(+), 2 deletions(-)

 diff --git a/include/linux/mm.h b/include/linux/mm.h
 index 5657670..1f26af5 100644
 --- a/include/linux/mm.h
 +++ b/include/linux/mm.h
 @@ -1642,6 +1642,7 @@ int vmemmap_populate(struct page *start_page, unsigned 
 long pages, int node);
  void vmemmap_populate_print_last(void);
  void register_page_bootmem_memmap(unsigned long section_nr, struct page 
 *map,
unsigned long size);
 +void vmemmap_free(struct page *memmap, unsigned long nr_pages);
  
  enum mf_flags {
  MF_COUNT_INCREASED = 1 << 0,
 diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
 index 1b7e22a..242cb28 100644
 --- a/mm/sparse-vmemmap.c
 +++ b/mm/sparse-vmemmap.c
 @@ -29,6 +29,10 @@
  #include <asm/pgalloc.h>
  #include <asm/pgtable.h>
  
 +#ifdef CONFIG_MEMORY_HOTREMOVE
 +#include <asm/tlbflush.h>
 +#endif
 +
  /*
   * Allocate a block of memory to be used to back the virtual memory map
   * or to back the page tables that are used to create the mapping.
 @@ -224,3 +228,213 @@ void __init sparse_mem_maps_populate_node(struct page 
 **map_map,
  vmemmap_buf_end = NULL;
  }
  }
 +
 +#ifdef CONFIG_MEMORY_HOTREMOVE
 +static void vmemmap_free_pages(struct page *page, int order)
 +{
 +    struct zone *zone;
 +    unsigned long magic;
 +
 +    magic = (unsigned long) page->lru.next;
 +    if (magic == SECTION_INFO || magic == MIX_SECTION_INFO) {
 +        put_page_bootmem(page);
 +
 +        zone = page_zone(page);
 +        zone_span_writelock(zone);
 +        zone->present_pages++;
 +        zone_span_writeunlock(zone);
 +        totalram_pages++;
 +    } else {
 +        if (is_vmalloc_addr(page_address(page)))
 +            vfree(page_address(page));
 
 Hmm, vmemmap doesn't use vmalloc() to allocate memory.
 

Yes, this can be removed.

 +        else
 +            free_pages((unsigned long)page_address(page), order);
 +    }
 +}
 +
 +static void free_pte_table(pmd_t *pmd)
 +{
 +    pte_t *pte, *pte_start;
 +    int i;
 +
 +    pte_start = (pte_t *)pmd_page_vaddr(*pmd);
 +    for (i = 0; i < PTRS_PER_PTE; i++) {
 +        pte = pte_start + i;
 +        if (pte_val(*pte))
 +            return;
 +    }
 +
 +    /* free a pte table */
 +    vmemmap_free_pages(pmd_page(*pmd), 0);
 +    spin_lock(&init_mm.page_table_lock);
 +    pmd_clear(pmd);
 +    spin_unlock(&init_mm.page_table_lock);
 +}
 +
 +static void free_pmd_table(pud_t *pud)
 +{
 +    pmd_t *pmd, *pmd_start;
 +    int i;
 +
 +    pmd_start = (pmd_t *)pud_page_vaddr(*pud);
 +    for (i = 0; i < PTRS_PER_PMD; i++) {
 +        pmd = pmd_start + i;
 +        if (pmd_val(*pmd))
 +            return;
 +    }
 +
 +    /* free a pmd table */
 +    vmemmap_free_pages(pud_page(*pud), 0);
 +    spin_lock(&init_mm.page_table_lock);
 +    pud_clear(pud);
 +    spin_unlock(&init_mm.page_table_lock);
 +}
 +
 +static void free_pud_table(pgd_t *pgd)
 +{
 +    pud_t *pud, *pud_start;
 +    int i;
 +
 +    pud_start = (pud_t *)pgd_page_vaddr(*pgd);
 +    for (i = 0; i < PTRS_PER_PUD; i++) {
 +        pud = pud_start + i;
 +        if (pud_val(*pud))
 +            return;
 +    }
 +
 +    /* free a pud table */
 +    vmemmap_free_pages(pgd_page(*pgd), 0);
 +    spin_lock(&init_mm.page_table_lock);
 +    pgd_clear(pgd);
 +    spin_unlock(&init_mm.page_table_lock);
 +}
 +
 +static int split_large_page(pte_t *kpte, unsigned long address, pte_t *pbase)
 +{
 +    struct page *page = pmd_page(*(pmd_t *)kpte);
 +    int i = 0;
 +    unsigned long magic;
 +    unsigned long section_nr;
 +
 +    __split_large_page(kpte, address, pbase);
 +    __flush_tlb_all();
 +
 +    magic = (unsigned long) page->lru.next;
 +    if (magic == SECTION_INFO) {
 +        section_nr = pfn_to_section_nr(page_to_pfn(page));
 +        while (i < PTRS_PER_PMD) {
 +            page++;
 +            i++;
 +            get_page_bootmem(section_nr, page, SECTION_INFO);
 +        }
 +    }
 +
 +    return 0;
 +}
 +static void vmemmap_pte_remove(pmd_t *pmd, unsigned long addr, unsigned long end)
 +{
 +    pte_t *pte;
 +    unsigned long next;
 +
 +    pte = pte_offset_kernel(pmd, addr);
 +    for (; addr < end; pte++, addr += PAGE_SIZE) {
 +        next = (addr + PAGE_SIZE

Re: [Patch v4 08/12] memory-hotplug: remove memmap of sparse-vmemmap

2012-11-28 Thread Jianguo Wu
Hi Congyang,

I think vmemmap's pgtable pages should be freed after all entries are cleared;
I have a patch to do this.
The code logic is the same as [Patch v4 09/12] memory-hotplug: remove page
table of x86_64 architecture.

What do you think about this?

Signed-off-by: Jianguo Wu wujian...@huawei.com
Signed-off-by: Jiang Liu jiang@huawei.com
---
 include/linux/mm.h  |1 +
 mm/sparse-vmemmap.c |  214 +++
 mm/sparse.c |5 +-
 3 files changed, 218 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 5657670..1f26af5 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1642,6 +1642,7 @@ int vmemmap_populate(struct page *start_page, unsigned 
long pages, int node);
 void vmemmap_populate_print_last(void);
 void register_page_bootmem_memmap(unsigned long section_nr, struct page *map,
  unsigned long size);
+void vmemmap_free(struct page *memmap, unsigned long nr_pages);
 
 enum mf_flags {
	MF_COUNT_INCREASED = 1 << 0,
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 1b7e22a..242cb28 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -29,6 +29,10 @@
 #include <asm/pgalloc.h>
 #include <asm/pgtable.h>
 
+#ifdef CONFIG_MEMORY_HOTREMOVE
+#include <asm/tlbflush.h>
+#endif
+
 /*
  * Allocate a block of memory to be used to back the virtual memory map
  * or to back the page tables that are used to create the mapping.
@@ -224,3 +228,213 @@ void __init sparse_mem_maps_populate_node(struct page 
**map_map,
vmemmap_buf_end = NULL;
}
 }
+
+#ifdef CONFIG_MEMORY_HOTREMOVE
+static void vmemmap_free_pages(struct page *page, int order)
+{
+    struct zone *zone;
+    unsigned long magic;
+
+    magic = (unsigned long) page->lru.next;
+    if (magic == SECTION_INFO || magic == MIX_SECTION_INFO) {
+        put_page_bootmem(page);
+
+        zone = page_zone(page);
+        zone_span_writelock(zone);
+        zone->present_pages++;
+        zone_span_writeunlock(zone);
+        totalram_pages++;
+    } else {
+        if (is_vmalloc_addr(page_address(page)))
+            vfree(page_address(page));
+        else
+            free_pages((unsigned long)page_address(page), order);
+    }
+}
+
+static void free_pte_table(pmd_t *pmd)
+{
+    pte_t *pte, *pte_start;
+    int i;
+
+    pte_start = (pte_t *)pmd_page_vaddr(*pmd);
+    for (i = 0; i < PTRS_PER_PTE; i++) {
+        pte = pte_start + i;
+        if (pte_val(*pte))
+            return;
+    }
+
+    /* free a pte table */
+    vmemmap_free_pages(pmd_page(*pmd), 0);
+    spin_lock(&init_mm.page_table_lock);
+    pmd_clear(pmd);
+    spin_unlock(&init_mm.page_table_lock);
+}
+
+static void free_pmd_table(pud_t *pud)
+{
+    pmd_t *pmd, *pmd_start;
+    int i;
+
+    pmd_start = (pmd_t *)pud_page_vaddr(*pud);
+    for (i = 0; i < PTRS_PER_PMD; i++) {
+        pmd = pmd_start + i;
+        if (pmd_val(*pmd))
+            return;
+    }
+
+    /* free a pmd table */
+    vmemmap_free_pages(pud_page(*pud), 0);
+    spin_lock(&init_mm.page_table_lock);
+    pud_clear(pud);
+    spin_unlock(&init_mm.page_table_lock);
+}
+
+static void free_pud_table(pgd_t *pgd)
+{
+    pud_t *pud, *pud_start;
+    int i;
+
+    pud_start = (pud_t *)pgd_page_vaddr(*pgd);
+    for (i = 0; i < PTRS_PER_PUD; i++) {
+        pud = pud_start + i;
+        if (pud_val(*pud))
+            return;
+    }
+
+    /* free a pud table */
+    vmemmap_free_pages(pgd_page(*pgd), 0);
+    spin_lock(&init_mm.page_table_lock);
+    pgd_clear(pgd);
+    spin_unlock(&init_mm.page_table_lock);
+}
+
+static int split_large_page(pte_t *kpte, unsigned long address, pte_t *pbase)
+{
+    struct page *page = pmd_page(*(pmd_t *)kpte);
+    int i = 0;
+    unsigned long magic;
+    unsigned long section_nr;
+
+    __split_large_page(kpte, address, pbase);
+    __flush_tlb_all();
+
+    magic = (unsigned long) page->lru.next;
+    if (magic == SECTION_INFO) {
+        section_nr = pfn_to_section_nr(page_to_pfn(page));
+        while (i < PTRS_PER_PMD) {
+            page++;
+            i++;
+            get_page_bootmem(section_nr, page, SECTION_INFO);
+        }
+    }
+
+    return 0;
+}
+
+static void vmemmap_pte_remove(pmd_t *pmd, unsigned long addr, unsigned long end)
+{
+    pte_t *pte;
+    unsigned long next;
+
+    pte = pte_offset_kernel(pmd, addr);
+    for (; addr < end; pte++, addr += PAGE_SIZE) {
+        next = (addr + PAGE_SIZE) & PAGE_MASK;
+        if (next > end)
+            next = end;
+
+        if (pte_none(*pte))
+            continue

Re: [PATCH v3 11/12] memory-hotplug: remove sysfs file of node

2012-11-26 Thread Jianguo Wu
On 2012/11/1 17:44, Wen Congyang wrote:
 This patch introduces a new function try_offline_node() to
 remove sysfs file of node when all memory sections of this
 node are removed. If some memory sections of this node are
 not removed, this function does nothing.
 
 CC: David Rientjes rient...@google.com
 CC: Jiang Liu liu...@gmail.com
 CC: Len Brown len.br...@intel.com
 CC: Christoph Lameter c...@linux.com
 Cc: Minchan Kim minchan@gmail.com
 CC: Andrew Morton a...@linux-foundation.org
 CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
 CC: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
 Signed-off-by: Wen Congyang we...@cn.fujitsu.com
 ---
  drivers/acpi/acpi_memhotplug.c |  8 +-
  include/linux/memory_hotplug.h |  2 +-
  mm/memory_hotplug.c| 58 
 --
  3 files changed, 64 insertions(+), 4 deletions(-)
 
 diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
 index 24c807f..0780f99 100644
 --- a/drivers/acpi/acpi_memhotplug.c
 +++ b/drivers/acpi/acpi_memhotplug.c
 @@ -310,7 +310,9 @@ static int acpi_memory_disable_device(struct 
 acpi_memory_device *mem_device)
  {
   int result;
   struct acpi_memory_info *info, *n;
 + int node;
  
 +    node = acpi_get_node(mem_device->device->handle);
  
   /*
* Ask the VM to offline this memory range.
 @@ -318,7 +320,11 @@ static int acpi_memory_disable_device(struct 
 acpi_memory_device *mem_device)
*/
     list_for_each_entry_safe(info, n, &mem_device->res_list, list) {
         if (info->enabled) {
 -            result = remove_memory(info->start_addr, info->length);
 +            if (node < 0)
 +                node = memory_add_physaddr_to_nid(
 +                    info->start_addr);
 +            result = remove_memory(node, info->start_addr,
 +                           info->length);
   if (result)
   return result;
   }
 diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
 index d4c4402..7b4cfe6 100644
 --- a/include/linux/memory_hotplug.h
 +++ b/include/linux/memory_hotplug.h
 @@ -231,7 +231,7 @@ extern int arch_add_memory(int nid, u64 start, u64 size);
  extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
  extern int offline_memory_block(struct memory_block *mem);
  extern bool is_memblock_offlined(struct memory_block *mem);
 -extern int remove_memory(u64 start, u64 size);
 +extern int remove_memory(int node, u64 start, u64 size);
  extern int sparse_add_one_section(struct zone *zone, unsigned long start_pfn,
   int nr_pages);
  extern void sparse_remove_one_section(struct zone *zone, struct mem_section 
 *ms);
 diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
 index 7bcced0..d965da3 100644
 --- a/mm/memory_hotplug.c
 +++ b/mm/memory_hotplug.c
 @@ -29,6 +29,7 @@
  #include linux/suspend.h
  #include linux/mm_inline.h
  #include linux/firmware-map.h
 +#include linux/stop_machine.h
  
  #include asm/tlbflush.h
  
 @@ -1299,7 +1300,58 @@ static int is_memblock_offlined_cb(struct memory_block 
 *mem, void *arg)
   return ret;
  }
  
 -int __ref remove_memory(u64 start, u64 size)
 +static int check_cpu_on_node(void *data)
 +{
 + struct pglist_data *pgdat = data;
 + int cpu;
 +
 +    for_each_present_cpu(cpu) {
 +        if (cpu_to_node(cpu) == pgdat->node_id)
 + /*
 +  * the cpu on this node isn't removed, and we can't
 +  * offline this node.
 +  */
 + return -EBUSY;
 + }
 +
 + return 0;
 +}
 +
 +/* offline the node if all memory sections of this node are removed */
 +static void try_offline_node(int nid)
 +{
 +    unsigned long start_pfn = NODE_DATA(nid)->node_start_pfn;
 +    unsigned long end_pfn = start_pfn + NODE_DATA(nid)->node_spanned_pages;
 +    unsigned long pfn;
 +
 +    for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
 + unsigned long section_nr = pfn_to_section_nr(pfn);
 +
 + if (!present_section_nr(section_nr))
 + continue;
 +
 + if (pfn_to_nid(pfn) != nid)
 + continue;
 +
 + /*
 +  * some memory sections of this node are not removed, and we
 +  * can't offline node now.
 +  */
 + return;
 + }
 +
 + if (stop_machine(check_cpu_on_node, NODE_DATA(nid), NULL))
 + return;

how about:

	if (nr_cpus_node(nid))
		return;
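For context, nr_cpus_node() is just a cpumask weight check, so the suggestion
open-codes to the sketch below (illustrative; whether dropping stop_machine()
is safe against concurrent CPU hotplug is the open question):

	/* nr_cpus_node(nid) == cpumask_weight(cpumask_of_node(nid)) */
	if (cpumask_weight(cpumask_of_node(nid)))
		return;	/* CPUs still present on this node */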
 +
 + /*
 +  * all memory/cpu of this node are removed, we can offline this
 +  * node now.
 +  */
 + node_set_offline(nid);
 + unregister_one_node(nid);
 +}
 +
 +int __ref remove_memory(int nid, u64 start, u64 size)
  {
   unsigned long start_pfn, 

Re: [PATCH v3 08/12] memory-hotplug: remove memmap of sparse-vmemmap

2012-11-26 Thread Jianguo Wu
 */
 + vmemmap_kfree(page, nr_pages);
  }
  static void free_map_bootmem(struct page *page, unsigned long nr_pages)
  {
 + vmemmap_free_bootmem(page, nr_pages);
  }

Hi Congyang,
	For vmemmap, nr_pages should be PAGES_PER_SECTION for free_map_bootmem(),
which is passed by free_section_usemap(), right?
But now, nr_pages = PAGE_ALIGN(PAGES_PER_SECTION * sizeof(struct page)) >> PAGE_SHIFT.

Signed-off-by: Jianguo Wu wujian...@huawei.com
---
 mm/sparse.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/mm/sparse.c b/mm/sparse.c
index fac95f2..31e5282 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -713,8 +713,12 @@ static void free_section_usemap(struct page *memmap, 
unsigned long *usemap)
struct page *memmap_page;
memmap_page = virt_to_page(memmap);
 
+#ifdef CONFIG_SPARSEMEM_VMEMMAP
+	nr_pages = PAGES_PER_SECTION;
+#else
	nr_pages = PAGE_ALIGN(PAGES_PER_SECTION * sizeof(struct page))
		>> PAGE_SHIFT;
+#endif
 
free_map_bootmem(memmap_page, nr_pages);
}
-- 
1.7.6.1

  #else
  static struct page *__kmalloc_section_memmap(unsigned long nr_pages)





Re: [PATCH v3 08/12] memory-hotplug: remove memmap of sparse-vmemmap

2012-11-26 Thread Jianguo Wu
On 2012/11/27 14:49, Wen Congyang wrote:

 At 11/27/2012 01:47 PM, Jianguo Wu Wrote:
 On 2012/11/1 17:44, Wen Congyang wrote:

 From: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

 Not all pages of the virtual mapping in removed memory can be freed, since
 some pages used as PGD/PUD may cover not only the removed memory but also
 other memory. So the patch checks whether a page can be freed or not.

 How to check whether a page can be freed or not?
  1. When removing memory, the page structs of the removed memory are filled
 with 0xFD.
  2. If all page structs are filled with 0xFD on a PT/PMD, the PT/PMD can be
 cleared. In this case, the page used as a PT/PMD can be freed.

 Applying this patch, the two variants of __remove_section() are integrated
 into one, so the CONFIG_SPARSEMEM_VMEMMAP version of __remove_section() is
 deleted.

 Note:  vmemmap_kfree() and vmemmap_free_bootmem() are not implemented for 
 ia64,
 ppc, s390, and sparc.

 CC: David Rientjes rient...@google.com
 CC: Jiang Liu liu...@gmail.com
 CC: Len Brown len.br...@intel.com
 CC: Christoph Lameter c...@linux.com
 Cc: Minchan Kim minchan@gmail.com
 CC: Andrew Morton a...@linux-foundation.org
 CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
 CC: Wen Congyang we...@cn.fujitsu.com
 Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
 ---
  arch/ia64/mm/discontig.c  |   8 
  arch/powerpc/mm/init_64.c |   8 
  arch/s390/mm/vmem.c   |   8 
  arch/sparc/mm/init_64.c   |   8 
  arch/x86/mm/init_64.c | 119 
 ++
  include/linux/mm.h|   2 +
  mm/memory_hotplug.c   |  17 +--
  mm/sparse.c   |   5 +-
  8 files changed, 158 insertions(+), 17 deletions(-)

 diff --git a/arch/ia64/mm/discontig.c b/arch/ia64/mm/discontig.c
 index 33943db..0d23b69 100644
 --- a/arch/ia64/mm/discontig.c
 +++ b/arch/ia64/mm/discontig.c
 @@ -823,6 +823,14 @@ int __meminit vmemmap_populate(struct page *start_page,
 return vmemmap_populate_basepages(start_page, size, node);
  }
  
 +void vmemmap_kfree(struct page *memmap, unsigned long nr_pages)
 +{
 +}
 +
 +void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages)
 +{
 +}
 +
  void register_page_bootmem_memmap(unsigned long section_nr,
   struct page *start_page, unsigned long size)
  {
 diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
 index 6466440..df7d155 100644
 --- a/arch/powerpc/mm/init_64.c
 +++ b/arch/powerpc/mm/init_64.c
 @@ -298,6 +298,14 @@ int __meminit vmemmap_populate(struct page *start_page,
 return 0;
  }
  
 +void vmemmap_kfree(struct page *memmap, unsigned long nr_pages)
 +{
 +}
 +
 +void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages)
 +{
 +}
 +
  void register_page_bootmem_memmap(unsigned long section_nr,
   struct page *start_page, unsigned long size)
  {
 diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
 index 4f4803a..ab69c34 100644
 --- a/arch/s390/mm/vmem.c
 +++ b/arch/s390/mm/vmem.c
 @@ -236,6 +236,14 @@ out:
 return ret;
  }
  
 +void vmemmap_kfree(struct page *memmap, unsigned long nr_pages)
 +{
 +}
 +
 +void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages)
 +{
 +}
 +
  void register_page_bootmem_memmap(unsigned long section_nr,
   struct page *start_page, unsigned long size)
  {
 diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
 index 75a984b..546855d 100644
 --- a/arch/sparc/mm/init_64.c
 +++ b/arch/sparc/mm/init_64.c
 @@ -2232,6 +2232,14 @@ void __meminit vmemmap_populate_print_last(void)
 }
  }
  
 +void vmemmap_kfree(struct page *memmap, unsigned long nr_pages)
 +{
 +}
 +
 +void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages)
 +{
 +}
 +
  void register_page_bootmem_memmap(unsigned long section_nr,
   struct page *start_page, unsigned long size)
  {
 diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
 index 795dae3..e85626d 100644
 --- a/arch/x86/mm/init_64.c
 +++ b/arch/x86/mm/init_64.c
 @@ -998,6 +998,125 @@ vmemmap_populate(struct page *start_page, unsigned 
 long size, int node)
 return 0;
  }
  
 +#define PAGE_INUSE 0xFD
 +
 +unsigned long find_and_clear_pte_page(unsigned long addr, unsigned long 
 end,
 +   struct page **pp, int *page_size)
 +{
 +   pgd_t *pgd;
 +   pud_t *pud;
 +   pmd_t *pmd;
 +   pte_t *pte = NULL;
 +   void *page_addr;
 +   unsigned long next;
 +
 +   *pp = NULL;
 +
 +   pgd = pgd_offset_k(addr);
 +   if (pgd_none(*pgd))
 +   return pgd_addr_end(addr, end);
 +
 +   pud = pud_offset(pgd, addr);
 +   if (pud_none(*pud))
 +   return pud_addr_end(addr, end);
 +
 +   if (!cpu_has_pse) {
 +   next = (addr + PAGE_SIZE) & PAGE_MASK;
 +   pmd = pmd_offset(pud, addr);
 +   if (pmd_none(*pmd))
 +   return next;
 +
 +   pte = pte_offset_kernel(pmd, addr
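
The message is truncated here in the archive. For flavor only, a generic
sketch of how the 4K-page branch of such a walk usually concludes -- this
is not the patch's actual code:

	pte = pte_offset_kernel(pmd, addr);
	if (pte_none(*pte))
		return next;

	/* Only report the page if it is entirely poisoned. */
	page_addr = page_address(pte_page(*pte));
	if (!memchr_inv(page_addr, PAGE_INUSE, PAGE_SIZE)) {
		*pp = pte_page(*pte);
		*page_size = PAGE_SIZE;
		pte_clear(&init_mm, addr, pte);
	}
	return next;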

Re: [PATCH v2 10/12] memory-hotplug: memory_hotplug: clear zone when removing the memory

2012-10-29 Thread Jianguo Wu
On 2012/10/23 18:30, we...@cn.fujitsu.com wrote:
 From: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
 
 When memory is added, we update the zone's and pgdat's start_pfn and
 spanned_pages in __add_zone(). So we should revert them when the memory
 is removed.
 
 The patch adds a new function __remove_zone() to do this.
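
For orientation (the quoted diff below is truncated before this helper
appears), __remove_zone() plausibly takes the following shape -- a sketch
reconstructed from the description, with shrink_pgdat_span() as the
pgdat-side counterpart the patch also adds:

	static void __remove_zone(struct zone *zone, unsigned long start_pfn)
	{
		struct pglist_data *pgdat = zone->zone_pgdat;
		int nr_pages = PAGES_PER_SECTION;
		unsigned long flags;

		pgdat_resize_lock(pgdat, &flags);
		shrink_zone_span(zone, start_pfn, start_pfn + nr_pages);
		shrink_pgdat_span(pgdat, start_pfn, start_pfn + nr_pages);
		pgdat_resize_unlock(pgdat, &flags);
	}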
 
 CC: David Rientjes rient...@google.com
 CC: Jiang Liu liu...@gmail.com
 CC: Len Brown len.br...@intel.com
 CC: Christoph Lameter c...@linux.com
 Cc: Minchan Kim minchan@gmail.com
 CC: Andrew Morton a...@linux-foundation.org
 CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
 Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
 Signed-off-by: Wen Congyang we...@cn.fujitsu.com
 ---
  mm/memory_hotplug.c |  207 
 +++
  1 files changed, 207 insertions(+), 0 deletions(-)
 
 diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
 index 03153cf..55a228d 100644
 --- a/mm/memory_hotplug.c
 +++ b/mm/memory_hotplug.c
 @@ -312,10 +312,213 @@ static int __meminit __add_section(int nid, struct 
 zone *zone,
   return register_new_memory(nid, __pfn_to_section(phys_start_pfn));
  }
  
 +/* find the smallest valid pfn in the range [start_pfn, end_pfn) */
 +static int find_smallest_section_pfn(int nid, struct zone *zone,
 +  unsigned long start_pfn,
 +  unsigned long end_pfn)
 +{
 + struct mem_section *ms;
 +
 + for (; start_pfn < end_pfn; start_pfn += PAGES_PER_SECTION) {
 + ms = __pfn_to_section(start_pfn);
 +
 + if (unlikely(!valid_section(ms)))
 + continue;
 +
 + if (unlikely(pfn_to_nid(start_pfn)) != nid)

if (unlikely(pfn_to_nid(start_pfn) != nid))

 + continue;
 +
 + if (zone && zone != page_zone(pfn_to_page(start_pfn)))
 + continue;
 +
 + return start_pfn;
 + }
 +
 + return 0;
 +}
 +
 +/* find the biggest valid pfn in the range [start_pfn, end_pfn). */
 +static int find_biggest_section_pfn(int nid, struct zone *zone,
 + unsigned long start_pfn,
 + unsigned long end_pfn)
 +{
 + struct mem_section *ms;
 + unsigned long pfn;
 +
 + /* pfn is the end pfn of a memory section. */
 + pfn = end_pfn - 1;
 + for (; pfn >= start_pfn; pfn -= PAGES_PER_SECTION) {
 + ms = __pfn_to_section(pfn);
 +
 + if (unlikely(!valid_section(ms)))
 + continue;
 +
 + if (unlikely(pfn_to_nid(pfn)) != nid)

if (unlikely(pfn_to_nid(pfn) != nid))

 + continue;
 +
 + if (zone && zone != page_zone(pfn_to_page(pfn)))
 + continue;
 +
 + return pfn;
 + }
 +
 + return 0;
 +}
 +
 +static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
 +  unsigned long end_pfn)
 +{
 + unsigned long zone_start_pfn = zone->zone_start_pfn;
 + unsigned long zone_end_pfn = zone->zone_start_pfn + zone->spanned_pages;
 + unsigned long pfn;
 + struct mem_section *ms;
 + int nid = zone_to_nid(zone);
 +
 + zone_span_writelock(zone);
 + if (zone_start_pfn == start_pfn) {
 + /*
 +  * If the section is the smallest section in the zone, we need to
 +  * shrink zone->zone_start_pfn and zone->spanned_pages.
 +  * In this case, we find the second smallest valid mem_section
 +  * for shrinking the zone.
 +  */
 + pfn = find_smallest_section_pfn(nid, zone, end_pfn,
 + zone_end_pfn);
 + if (pfn) {
 + zone->zone_start_pfn = pfn;
 + zone->spanned_pages = zone_end_pfn - pfn;
 + }
 + } else if (zone_end_pfn == end_pfn) {
 + /*
 +  * If the section is the biggest section in the zone, we need to
 +  * shrink zone->spanned_pages.
 +  * In this case, we find the second biggest valid mem_section for
 +  * shrinking the zone.
 +  */
 + pfn = find_biggest_section_pfn(nid, zone, zone_start_pfn,
 +start_pfn);
 + if (pfn)
 + zone->spanned_pages = pfn - zone_start_pfn + 1;
 + }
 +
 + /*
 +  * The section is neither the biggest nor the smallest mem_section
 +  * in the zone; it only creates a hole in the zone. So in this case,
 +  * we need not change the zone. But the zone may now contain nothing
 +  * but holes, so we check whether any valid section is left.
 +  */
 + pfn = zone_start_pfn;
 + for (; pfn < zone_end_pfn; pfn += PAGES_PER_SECTION) {
 + ms = __pfn_to_section(pfn);
 +
 + if (unlikely(!valid_section(ms)))
 + continue;
 +
 + if 
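
To make the span shrinking concrete, a worked example (pfn values invented
for illustration; 32768-pfn sections as on x86_64 assumed): suppose a zone
spans pfns [0x8000, 0x28000) and the section [0x8000, 0x10000) at the very
start of the zone is removed. shrink_zone_span() takes the
zone_start_pfn == start_pfn branch, find_smallest_section_pfn(nid, zone,
0x10000, 0x28000) returns the first still-valid pfn at or above 0x10000 --
say 0x10000 itself -- and the zone is trimmed to zone_start_pfn = 0x10000,
spanned_pages = 0x18000. Removing a section from the middle of the zone
instead falls through to the final loop, which only verifies that at least
one valid section remains.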

Re: [RFC V7 PATCH 18/19] memory-hotplug: add node_device_release

2012-08-20 Thread Jianguo Wu
On 2012/8/20 17:35, we...@cn.fujitsu.com wrote:
 From: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
 
 When calling unregister_node(), the driver core prints the following
 message from device_release():
 
 Device 'node2' does not have a release() function, it is broken and must be
 fixed.
 
 So the patch implements node_device_release().
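
For context, the warning originates in the driver core's kobject release
path. Paraphrased (a sketch of drivers/base/core.c; details vary by kernel
version):

	static void device_release(struct kobject *kobj)
	{
		struct device *dev = container_of(kobj, struct device, kobj);

		if (dev->release)
			dev->release(dev);
		else if (dev->type && dev->type->release)
			dev->type->release(dev);
		else if (dev->class && dev->class->dev_release)
			dev->class->dev_release(dev);
		else
			WARN(1, KERN_ERR "Device '%s' does not have a release() "
			     "function, it is broken and must be fixed.\n",
			     dev_name(dev));
	}

So any device registered without a release callback on itself, its type,
or its class triggers the complaint when its last reference is dropped.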
 
 CC: David Rientjes rient...@google.com
 CC: Jiang Liu liu...@gmail.com
 CC: Len Brown len.br...@intel.com
 CC: Benjamin Herrenschmidt b...@kernel.crashing.org
 CC: Paul Mackerras pau...@samba.org
 CC: Christoph Lameter c...@linux.com
 Cc: Minchan Kim minchan@gmail.com
 CC: Andrew Morton a...@linux-foundation.org
 CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
 Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
 Signed-off-by: Wen Congyang we...@cn.fujitsu.com
 ---
  drivers/base/node.c |8 
  1 files changed, 8 insertions(+), 0 deletions(-)
 
 diff --git a/drivers/base/node.c b/drivers/base/node.c
 index af1a177..9bc2f57 100644
 --- a/drivers/base/node.c
 +++ b/drivers/base/node.c
 @@ -252,6 +252,13 @@ static inline void hugetlb_register_node(struct node 
 *node) {}
  static inline void hugetlb_unregister_node(struct node *node) {}
  #endif
  
 +static void node_device_release(struct device *dev)
 +{
 + struct node *node_dev = to_node(dev);
 +
 + flush_work(&node_dev->node_work);

Hi Congyang,
I think this should be:
#if defined(CONFIG_MEMORY_HOTPLUG_SPARSE) && defined(CONFIG_HUGETLBFS)
	flush_work(&node_dev->node_work);
#endif

As struct node defined in node.h:
struct node {
struct sys_device   sysdev;

#if defined(CONFIG_MEMORY_HOTPLUG_SPARSE) && defined(CONFIG_HUGETLBFS)
struct work_struct  node_work;
#endif
};

Thanks
Jianguo Wu
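
Putting the suggestion together, the release callback with the proposed
guards would look roughly like this (a sketch under the assumption that
node_work only exists when both config options are set):

	static void node_device_release(struct device *dev)
	{
		struct node *node_dev = to_node(dev);

	#if defined(CONFIG_MEMORY_HOTPLUG_SPARSE) && defined(CONFIG_HUGETLBFS)
		/*
		 * The node's hugepage work item may still be queued;
		 * wait for it before the structure is zeroed.
		 */
		flush_work(&node_dev->node_work);
	#endif
		memset(node_dev, 0, sizeof(struct node));
	}

With node->dev.release pointing at this callback, the final put from
device_unregister() ends up here instead of in the broken-device warning
quoted above.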

 + memset(node_dev, 0, sizeof(struct node));
 +}
  
  /*
   * register_node - Setup a sysfs device for a node.
 @@ -265,6 +272,7 @@ int register_node(struct node *node, int num, struct node 
 *parent)
  
   node->dev.id = num;
   node->dev.bus = &node_subsys;
  + node->dev.release = node_device_release;
   error = device_register(&node->dev);
  
   if (!error){
 
