Re: [PATCH v2] memory-hotplug: Fix kernel warning during memory hotplug on ppc64

2015-11-03 Thread Yasuaki Ishimatsu

On Tue, 3 Nov 2015 11:21:59 -0600
John Allen <jal...@linux.vnet.ibm.com> wrote:

> This patch fixes a bug where a kernel warning is triggered when performing
> a memory hotplug on ppc64. This warning may also occur on any architecture
> that has multiple sections per memory block.
> 
> [   78.300767] ------------[ cut here ]------------
> [   78.300768] WARNING: at ../drivers/base/memory.c:210
> [   78.300769] Modules linked in: rpadlpar_io(X) rpaphp(X) tcp_diag udp_diag 
> inet_diag unix_diag af_packet_diag netlink_diag af_packet xfs libcrc32c 
> ibmveth(X) rtc_generic btrfs xor raid6_pq xts gf128mul dm_crypt sd_mod sr_mod 
> cdrom crc_t10dif ibmvscsi(X) scsi_transport_srp scsi_tgt dm_mod sg scsi_mod 
> autofs4
> [   78.300789] Supported: Yes, External
> [   78.300791] CPU: 1 PID: 3090 Comm: systemd-udevd Tainted: G  X 
> 3.12.45-1-default #1
> [   78.300793] task: c004d7d1d970 ti: c004d7b9 task.ti: 
> c004d7b9
> [   78.300794] NIP: c04fcff8 LR: c04fda84 CTR: 
> 
> [   78.300795] REGS: c004d7b93930 TRAP: 0700   Tainted: G  X  
> (3.12.45-1-default)
> [   78.300796] MSR: 80029033 <SF,EE,ME,IR,DR,RI,LE>  CR: 24088848  
> XER: 
> [   78.300800] CFAR: c04fcf98 SOFTE: 1
> GPR00: 0537 c004d7b93bb0 c0e7f200 00053000
> GPR04: 1000 0001 c0e0f200 
> GPR08:  0001 0537 014dc000
> GPR12: 00054000 ce7f0900 10041040 
> GPR16: 0100206f0010 1003ff78 1006c824 100410b0
> GPR20: 1003ff90 1006c00c 01002073cd20 0100206f0760
> GPR24: 0100206f85a0 c076d950 c004ef7c95e0 c004d7b93e00
> GPR28: c004de601738 0001 c1218f80 003f
> [   78.300818] NIP [c04fcff8] memory_block_action+0x258/0x2e0
> [   78.300820] LR [c04fda84] memory_subsys_online+0x54/0x100
> [   78.300821] Call Trace:
> [   78.300822] [c004d7b93bb0] [c9071ce0] 0xc9071ce0 
> (unreliable)
> [   78.300824] [c004d7b93c40] [c04fda84] 
> memory_subsys_online+0x54/0x100
> [   78.300826] [c004d7b93c70] [c04df784] device_online+0xb4/0x120
> [   78.300828] [c004d7b93cb0] [c04fd738] 
> store_mem_state+0x88/0x220
> [   78.300830] [c004d7b93cf0] [c04db448] dev_attr_store+0x68/0xa0
> [   78.300833] [c004d7b93d30] [c031f938] 
> sysfs_write_file+0xf8/0x1d0
> [   78.300835] [c004d7b93d90] [c027d29c] vfs_write+0xec/0x250
> [   78.300837] [c004d7b93de0] [c027dfdc] SyS_write+0x6c/0xf0
> [   78.300839] [c004d7b93e30] [c000a17c] syscall_exit+0x0/0x7c
> [   78.300840] Instruction dump:
> [   78.300841] 780a0560 79482ea4 7ce94214 2fa7 41de0014 7d09402a 396b4000 
> 7907ffe3
> [   78.300844] 4082ff54 3cc2fff9 8926b83a 69290001 <0b09> 2fa9 
> 40de006c 3860fff0
> [   78.300847] ---[ end trace dfec8da06ebbc762 ]---
> 
> The warning is triggered because there is a udev rule that automatically
> tries to online memory after it has been added. The udev rule varies from
> distro to distro, but will generally look something like:
> 
> SUBSYSTEM=="memory", ACTION=="add", ATTR{state}=="offline", 
> ATTR{state}="online"
> 
> On any architecture that uses memory_probe_store to reserve memory,
> this can interrupt the memory reservation process. This patch modifies
> memory_probe_store to take the hotplug sysfs lock to prevent the online
> of added memory before the completion of the probe.
> 
> Signed-off-by: John Allen <jal...@linux.vnet.ibm.com>
> ---

Looks good to me.

Reviewed-by: Yasuaki Ishimatsu <isimatu.yasu...@jp.fujitsu.com>

Thanks,
Yasuaki Ishimatsu
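
Editorial sketch: the patch reviewed above takes the hotplug lock before the per-section add_memory() loop and, per its v2 note, releases it under a single exit label so that no error path leaves it held. The user-space model below only illustrates that shape. It is not the kernel code: probe_block(), fake_add_section(), fake_lock_hotplug(), fake_unlock_hotplug() and the hotplug_lock_held flag are hypothetical stand-ins.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the device hotplug lock: 1 while held. */
static int hotplug_lock_held;

static int fake_lock_hotplug(void)
{
	if (hotplug_lock_held)
		return -EBUSY;	/* someone else is in the middle of an operation */
	hotplug_lock_held = 1;
	return 0;
}

static void fake_unlock_hotplug(void)
{
	assert(hotplug_lock_held);	/* unlocking an unheld lock is a bug */
	hotplug_lock_held = 0;
}

/* Hypothetical stand-in for add_memory(): fails at section 'fail_at'. */
static int fake_add_section(int i, int fail_at)
{
	return (i == fail_at) ? -ENOMEM : 0;
}

/*
 * Add every section of one block under the lock. Every exit taken after
 * the lock is acquired goes through 'out', so the lock is always released.
 */
static int probe_block(int sections_per_block, int fail_at)
{
	int i, ret;

	ret = fake_lock_hotplug();
	if (ret)
		return ret;	/* lock not taken, nothing to undo */

	for (i = 0; i < sections_per_block; i++) {
		ret = fake_add_section(i, fail_at);
		if (ret)
			goto out;
	}
	ret = 0;
out:
	fake_unlock_hotplug();
	return ret;
}
```

In this model an online attempt that needs the same lock cannot run between two sections of a block, which is the interleaving the patch description blames for the warning.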

> v2: Move call to unlock_device_hotplug under "out" label
> 
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index bece691..7c50415 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -422,6 +422,10 @@ memory_probe_store(struct device *dev, struct 
> device_attribute *attr,
>   if (phys_addr & ((pages_per_block << PAGE_SHIFT) - 1))
>   return -EINVAL;
> 
> + ret = lock_device_hotplug_sysfs();
> + if (ret)
> + return ret;
> +
>   for (i = 0; i < sections_per_block; i++) {
>   nid = memory_add_physaddr_to_nid(phys_addr);
>   ret = add_memory(nid, phys_addr,
> @@ -434,6 +438,7 @@ memory_probe_store(struct device *dev, struct 
> device_attrib

Re: [PATCH] slab: Fix nodeid bounds check for non-contiguous node IDs

2014-11-30 Thread Yasuaki Ishimatsu

(2014/12/01 7:16), Paul Mackerras wrote:

The bounds check for nodeid in cache_alloc_node gives false
positives on machines where the node IDs are not contiguous, leading
to a panic at boot time.  For example, on a POWER8 machine the node
IDs are typically 0, 1, 16 and 17.  This means that num_online_nodes()
returns 4, so when cache_alloc_node is called with nodeid = 16 the
VM_BUG_ON triggers.


Do you have the call trace? If you have it, please add it in the description.


To fix this, we instead compare the nodeid with MAX_NUMNODES, and
additionally make sure it isn't negative (since nodeid is an int).
The check is there mainly to protect the array dereference in the
get_node() call in the next line, and the array being dereferenced is
of size MAX_NUMNODES.  If the nodeid is in range but invalid, the
BUG_ON in the next line will catch that.
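
Editorial sketch: the difference between the old and new checks can be seen with the node IDs from the description (0, 1, 16, 17). The model below is a user-space illustration only. The MAX_NUMNODES value and the helper names ending in _model or _fires are assumptions made for the example, not kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NUMNODES 32		/* assumed value for this sketch */

/* Online node IDs as in the POWER8 example: 0, 1, 16 and 17. */
static const int online_ids[] = { 0, 1, 16, 17 };

static int num_online_nodes_model(void)
{
	return (int)(sizeof(online_ids) / sizeof(online_ids[0]));
}

/* Old predicate: true means the VM_BUG_ON would fire. */
static bool old_check_fires(int nodeid)
{
	return nodeid > num_online_nodes_model();
}

/* New predicate: rejects only IDs that would index outside the array. */
static bool new_check_fires(int nodeid)
{
	return nodeid < 0 || nodeid >= MAX_NUMNODES;
}
```

With these definitions old_check_fires(16) is true although node 16 is online, while new_check_fires(16) is false. The old predicate also lets a negative ID through, which the new one rejects.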

Signed-off-by: Paul Mackerras <pau...@samba.org>


Do you need to backport it into -stable kernels?


---
diff --git a/mm/slab.c b/mm/slab.c
index eb2b2ea..f34e053 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3076,7 +3076,7 @@ static void *cache_alloc_node(struct kmem_cache 
*cachep, gfp_t flags,
void *obj;
int x;




-   VM_BUG_ON(nodeid > num_online_nodes());
+   VM_BUG_ON(nodeid < 0 || nodeid >= MAX_NUMNODES);


How about use:
VM_BUG_ON(!node_online(nodeid));

When allocating the memory, the node of the memory being allocated must be
online. But your code cannot check the condition.

Thanks,
Yasuaki Ishimatsu


n = get_node(cachep, nodeid);
BUG_ON(!n);


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majord...@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: a href=mailto:d...@kvack.org; em...@kvack.org /a




___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] slab: Fix nodeid bounds check for non-contiguous node IDs

2014-11-30 Thread Yasuaki Ishimatsu

(2014/12/01 9:42), Paul Mackerras wrote:

On Mon, Dec 01, 2014 at 09:14:40AM +0900, Yasuaki Ishimatsu wrote:

(2014/12/01 7:16), Paul Mackerras wrote:

The bounds check for nodeid in cache_alloc_node gives false
positives on machines where the node IDs are not contiguous, leading
to a panic at boot time.  For example, on a POWER8 machine the node
IDs are typically 0, 1, 16 and 17.  This means that num_online_nodes()
returns 4, so when cache_alloc_node is called with nodeid = 16 the
VM_BUG_ON triggers.


Do you have the call trace? If you have it, please add it in the description.


I can get it easily enough.


To fix this, we instead compare the nodeid with MAX_NUMNODES, and
additionally make sure it isn't negative (since nodeid is an int).
The check is there mainly to protect the array dereference in the
get_node() call in the next line, and the array being dereferenced is
of size MAX_NUMNODES.  If the nodeid is in range but invalid, the
BUG_ON in the next line will catch that.

Signed-off-by: Paul Mackerras <pau...@samba.org>


Do you need to backport it into -stable kernels?


It does need to go to stable, yes, for 3.10 and later.


---
diff --git a/mm/slab.c b/mm/slab.c
index eb2b2ea..f34e053 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3076,7 +3076,7 @@ static void *cache_alloc_node(struct kmem_cache 
*cachep, gfp_t flags,
void *obj;
int x;






-   VM_BUG_ON(nodeid > num_online_nodes());
+   VM_BUG_ON(nodeid < 0 || nodeid >= MAX_NUMNODES);


How about use:
VM_BUG_ON(!node_online(nodeid));


That would not be better, since node_online() doesn't bounds-check its
argument.
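
Editorial sketch: the point about bounds checking can be modelled with a plain bitmap. A raw bit test indexes the array with whatever value it is given, so a range check has to come first. Everything below is a user-space illustration; online_map, raw_node_online() and checked_node_online() are hypothetical names, and the MAX_NUMNODES value is assumed.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define MAX_NUMNODES 32	/* assumed value for this sketch */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define NODE_WORDS ((MAX_NUMNODES + BITS_PER_WORD - 1) / BITS_PER_WORD)

static unsigned long online_map[NODE_WORDS];

static void set_online(int nid)
{
	assert(nid >= 0 && nid < MAX_NUMNODES);
	online_map[nid / BITS_PER_WORD] |= 1UL << (nid % BITS_PER_WORD);
}

/* Plain bitmap test: no range check, so callers must pass a valid nid. */
static bool raw_node_online(int nid)
{
	return (online_map[nid / BITS_PER_WORD] >> (nid % BITS_PER_WORD)) & 1UL;
}

/* Range check first, so raw_node_online() only ever sees valid indices. */
static bool checked_node_online(int nid)
{
	if (nid < 0 || nid >= MAX_NUMNODES)
		return false;
	return raw_node_online(nid);
}
```

Calling raw_node_online() with an out-of-range value would read outside online_map, which is why the range check has to stay in front of any such lookup.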



Ah. You are right.


When allocating the memory, the node of the memory being allocated must be
online. But your code cannot check the condition.


The following two lines:


n = get_node(cachep, nodeid);
BUG_ON(!n);


effectively check that condition already, as I tried to explain in the
commit message.


O.K. I understood.

Thanks,
Yasuaki Ishimatsu



Regards,
Paul.





Re: [PATCH v2] slab: Fix nodeid bounds check for non-contiguous node IDs

2014-11-30 Thread Yasuaki Ishimatsu

(2014/12/01 13:28), Paul Mackerras wrote:

The bounds check for nodeid in cache_alloc_node gives false
positives on machines where the node IDs are not contiguous, leading
to a panic at boot time.  For example, on a POWER8 machine the node
IDs are typically 0, 1, 16 and 17.  This means that num_online_nodes()
returns 4, so when cache_alloc_node is called with nodeid = 16 the
VM_BUG_ON triggers, like this:

kernel BUG at /home/paulus/kernel/kvm/mm/slab.c:3079!
Oops: Exception in kernel mode, sig: 5 [#1]
SMP NR_CPUS=1024 NUMA PowerNV
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 3.18.0-rc5-kvm+ #17
task: c13ba230 ti: c1494000 task.ti: c1494000
NIP: c0264f6c LR: c0264f5c CTR: 
REGS: c14979a0 TRAP: 0700   Not tainted  (3.18.0-rc5-kvm+)
MSR: 92021032 <SF,HV,VEC,ME,IR,DR,RI>  CR: 28000448  XER: 2000
CFAR: c047e978 SOFTE: 0
GPR00: c0264f5c c1497c20 c1499d48 0004
GPR04: 0100 0010 0068 
GPR08:  0001 082d c0cca5a8
GPR12: 48000448 cfda 01003bd44ff0 10020578
GPR16: 01003bd44ff8 01003bd45000 0001 
GPR20:    0010
GPR24: c00ffe80 c0c824ec 0068 c00ffe80
GPR28: 0010 c00ffe80 0010 
NIP [c0264f6c] .cache_alloc_node+0x6c/0x270
LR [c0264f5c] .cache_alloc_node+0x5c/0x270
Call Trace:
[c1497c20] [c0264f5c] .cache_alloc_node+0x5c/0x270 
(unreliable)
[c1497cf0] [c026552c] .kmem_cache_alloc_node_trace+0xdc/0x360
[c1497dc0] [c0c824ec] .init_list+0x3c/0x128
[c1497e50] [c0c827b4] .kmem_cache_init+0x1dc/0x258
[c1497ef0] [c0c54090] .start_kernel+0x2a0/0x568
[c1497f90] [c0008c6c] start_here_common+0x20/0xa8
Instruction dump:
7c7d1b78 7c962378 4bda4e91 6000 3c620004 38800100 386370d8 48219959
6000 7f83e000 7d301026 5529effe 0b09 393c0010 79291f24 7d3d4a14

To fix this, we instead compare the nodeid with MAX_NUMNODES, and
additionally make sure it isn't negative (since nodeid is an int).
The check is there mainly to protect the array dereference in the
get_node() call in the next line, and the array being dereferenced is
of size MAX_NUMNODES.  If the nodeid is in range but invalid (for
example if the node is off-line), the BUG_ON in the next line will
catch that.

Signed-off-by: Paul Mackerras <pau...@samba.org>
---


Looks good to me.

Reviewed-by: Yasuaki Ishimatsu <isimatu.yasu...@jp.fujitsu.com>

If you need to backport it into -stable kernel, please read
Documentation/stable_kernel_rules.txt.

Thanks,
Yasuaki Ishimatsu


v2: include the oops message in the patch description

  mm/slab.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/slab.c b/mm/slab.c
index eb2b2ea..f34e053 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3076,7 +3076,7 @@ static void *cache_alloc_node(struct kmem_cache 
*cachep, gfp_t flags,
void *obj;
int x;

-   VM_BUG_ON(nodeid > num_online_nodes());
+   VM_BUG_ON(nodeid < 0 || nodeid >= MAX_NUMNODES);
n = get_node(cachep, nodeid);
BUG_ON(!n);






Re: [PATCH v2 0/4] Unify CPU hotplug lock interface

2013-08-29 Thread Yasuaki Ishimatsu
(2013/08/30 9:22), Toshi Kani wrote:
 lock_device_hotplug() was recently introduced to serialize CPU & Memory
 online/offline and hotplug operations, along with sysfs online interface
 restructure (commit 4f3549d7).  With this new locking scheme,
 cpu_hotplug_driver_lock() is redundant and is no longer necessary.
 
 This patchset makes sure that lock_device_hotplug() covers all CPU online/
 offline interfaces, and then removes cpu_hotplug_driver_lock().
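
Editorial sketch: the idea of one lock covering every online/offline entry point can be modelled in user space as below. The try-once-or-retry behaviour is an assumption about how a sysfs-facing lock helper would avoid blocking, not a copy of the kernel helper. All names ending in _model, online_under_lock() and the use of -EAGAIN are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* One lock shared by every hotplug entry point in this model. */
static atomic_flag hotplug_lock = ATOMIC_FLAG_INIT;

/* Try once; 0 means the lock is now held, -EAGAIN means "retry later". */
static int lock_hotplug_sysfs_model(void)
{
	if (!atomic_flag_test_and_set(&hotplug_lock))
		return 0;
	return -EAGAIN;
}

static void unlock_hotplug_model(void)
{
	atomic_flag_clear(&hotplug_lock);
}

/* Shared body for the CPU and memory "online" handlers. */
static int online_under_lock(int *state)
{
	int ret = lock_hotplug_sysfs_model();

	if (ret)
		return ret;
	*state = 1;
	unlock_hotplug_model();
	return 0;
}

static int cpu_online_model(int *cpu_state)
{
	return online_under_lock(cpu_state);
}

static int memory_online_model(int *mem_state)
{
	return online_under_lock(mem_state);
}
```

Because both handlers go through the same lock, a second, subsystem-specific lock adds nothing in this model, which mirrors the argument for removing cpu_hotplug_driver_lock().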
 
 v2:
   - Rebased to the pm tree, bleeding-edge.
   - Changed patch 2/4 to use lock_device_hotplug_sysfs().
 
 ---
 Toshi Kani (4):
hotplug, x86: Fix online state in cpu0 debug interface
hotplug, x86: Add hotplug lock to missing places
hotplug, x86: Disable ARCH_CPU_PROBE_RELEASE on x86
hotplug, powerpc, x86: Remove cpu_hotplug_driver_lock()
 
 ---
The patch-set looks good to me.

Acked-by: Yasuaki Ishimatsu <isimatu.yasu...@jp.fujitsu.com>

Thanks,
Yasuaki Ishimatsu


   arch/powerpc/kernel/smp.c              | 12 --
   arch/powerpc/platforms/pseries/dlpar.c | 40 +-
   arch/x86/Kconfig                       |  4
   arch/x86/kernel/smpboot.c              | 21 --
   arch/x86/kernel/topology.c             | 11 ++
   drivers/base/cpu.c                     | 34 +++--
   include/linux/cpu.h                    | 13 ---
   7 files changed, 45 insertions(+), 90 deletions(-)
 




Re: [PATCH 7/7] drivers: base: refactor add_memory_section() to add_memory_block()

2013-08-22 Thread Yasuaki Ishimatsu
(2013/08/22 17:20), Yasuaki Ishimatsu wrote:
 (2013/08/21 2:13), Seth Jennings wrote:
 Right now memory_dev_init() maintains the memory block pointer
 between iterations of add_memory_section().  This is nasty.

 This patch refactors add_memory_section() to become add_memory_block().
 The refactoring pulls the section scanning out of memory_dev_init()
 and simplifies the signature.

 Signed-off-by: Seth Jennings <sjenn...@linux.vnet.ibm.com>
 ---
drivers/base/memory.c | 48 +---
1 file changed, 21 insertions(+), 27 deletions(-)

 diff --git a/drivers/base/memory.c b/drivers/base/memory.c
 index 7d9d3bc..021283a 100644
 --- a/drivers/base/memory.c
 +++ b/drivers/base/memory.c
 @@ -602,32 +602,31 @@ static int init_memory_block(struct memory_block 
 **memory,
  return ret;
}

 -static int add_memory_section(struct mem_section *section,
 -struct memory_block **mem_p)
 +static int add_memory_block(int base_section_nr)
{
 -struct memory_block *mem = NULL;
 -int scn_nr = __section_nr(section);
 -int ret = 0;
 -
 -if (mem_p && *mem_p) {
 -if (scn_nr >= (*mem_p)->start_section_nr &&
 -scn_nr <= (*mem_p)->end_section_nr) {
 -mem = *mem_p;
 -}
 -}
 +struct memory_block *mem;
 +int i, ret, section_count = 0, section_nr;

 -if (mem)
 -mem->section_count++;
 -else {
 -ret = init_memory_block(&mem, section, MEM_ONLINE);
 -/* store memory_block pointer for next loop */
 -if (!ret && mem_p)
 -*mem_p = mem;
 +for (i = base_section_nr;
 + (i < base_section_nr + sections_per_block) && i < NR_MEM_SECTIONS;
 + i++) {
 +if (!present_section_nr(i))
 +continue;
 +if (section_count == 0)
 +section_nr = i;
 +section_count++;
  }

 -return ret;
 +if (section_count == 0)
 +return 0;
 +ret = init_memory_block(&mem, __nr_to_section(section_nr), MEM_ONLINE);
 +if (ret)
 +return ret;
 +mem->section_count = section_count;
 +return 0;
}

 +
/*
 * need an interface for the VM to add new memory regions,
 * but without onlining it.
 @@ -733,7 +732,6 @@ int __init memory_dev_init(void)
  int ret;
  int err;
  unsigned long block_sz;
 -struct memory_block *mem = NULL;

  ret = subsys_system_register(&memory_subsys, memory_root_attr_groups);
  if (ret)
 @@ -747,12 +745,8 @@ int __init memory_dev_init(void)
   * during boot and have been initialized
   */
  mutex_lock(&mem_sysfs_mutex);
 -for (i = 0; i < NR_MEM_SECTIONS; i++) {
 -if (!present_section_nr(i))
 -continue;
 -/* don't need to reuse memory_block if only one per block */
 -err = add_memory_section(__nr_to_section(i),
 - (sections_per_block == 1) ? NULL : &mem);
 +for (i = 0; i < NR_MEM_SECTIONS; i += sections_per_block) {
 
 Why do you remove present_section_nr() check?

Sorry for the noise. I understood.
The check was moved into add_memory_block(). So it was removed.

Thanks,
Yasuaki Ishimatsu
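
Editorial sketch: the refactoring discussed here moves the per-section presence check into the per-block helper, so the caller can simply step one block at a time. The user-space model below shows that control flow only. scan_block(), count_blocks(), the section_present array and the small constants are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_MEM_SECTIONS 16		/* assumed, kept tiny on purpose */
static int sections_per_block = 4;	/* assumed */
static bool section_present[NR_MEM_SECTIONS];

/*
 * Count the present sections in the block starting at base_section_nr.
 * On return *first_nr holds the first present section, or -1 if none.
 */
static int scan_block(int base_section_nr, int *first_nr)
{
	int i, count = 0;

	*first_nr = -1;
	for (i = base_section_nr;
	     i < base_section_nr + sections_per_block && i < NR_MEM_SECTIONS;
	     i++) {
		if (!section_present[i])
			continue;
		if (count == 0)
			*first_nr = i;
		count++;
	}
	return count;
}

/* Caller side: step one block at a time; no presence check is needed here. */
static int count_blocks(void)
{
	int i, first, blocks = 0;

	for (i = 0; i < NR_MEM_SECTIONS; i += sections_per_block)
		if (scan_block(i, &first) > 0)
			blocks++;
	return blocks;
}
```

A block with no present sections yields a count of zero and is skipped, which corresponds to the early "return 0" in the quoted add_memory_block().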

 
 +err = add_memory_block(i);
  if (!ret)
 
 Thanks,
 Yasuaki Ishimatsu
 
  ret = err;
  }

 
 
 




Re: [PATCH 1/7] drivers: base: move mutex lock out of add_memory_section()

2013-08-22 Thread Yasuaki Ishimatsu

(2013/08/21 2:24), Seth Jennings wrote:

Gah! Forgot the cover letter.

This patchset just seeks to clean up and refactor some things in
memory.c for better understanding and possibly better performance due to
a decrease in mutex acquisitions and refcount churn at boot time.  No
functional change is intended by this set!


All patches were
Reviewed-by: Yasuaki Ishimatsu <isimatu.yasu...@jp.fujitsu.com>
Tested-by: Yasuaki Ishimatsu <isimatu.yasu...@jp.fujitsu.com>

Thanks,
Yasuaki Ishimatsu



Seth







Re: [PATCH 7/7] drivers: base: refactor add_memory_section() to add_memory_block()

2013-08-22 Thread Yasuaki Ishimatsu
(2013/08/21 2:13), Seth Jennings wrote:
 Right now memory_dev_init() maintains the memory block pointer
 between iterations of add_memory_section().  This is nasty.
 
 This patch refactors add_memory_section() to become add_memory_block().
 The refactoring pulls the section scanning out of memory_dev_init()
 and simplifies the signature.
 
 Signed-off-by: Seth Jennings <sjenn...@linux.vnet.ibm.com>
 ---
   drivers/base/memory.c | 48 +---
   1 file changed, 21 insertions(+), 27 deletions(-)
 
 diff --git a/drivers/base/memory.c b/drivers/base/memory.c
 index 7d9d3bc..021283a 100644
 --- a/drivers/base/memory.c
 +++ b/drivers/base/memory.c
 @@ -602,32 +602,31 @@ static int init_memory_block(struct memory_block 
 **memory,
   return ret;
   }
   
 -static int add_memory_section(struct mem_section *section,
 - struct memory_block **mem_p)
 +static int add_memory_block(int base_section_nr)
   {
 - struct memory_block *mem = NULL;
 - int scn_nr = __section_nr(section);
 - int ret = 0;
 -
 - if (mem_p && *mem_p) {
 - if (scn_nr >= (*mem_p)->start_section_nr &&
 - scn_nr <= (*mem_p)->end_section_nr) {
 - mem = *mem_p;
 - }
 - }
 + struct memory_block *mem;
 + int i, ret, section_count = 0, section_nr;
   
 - if (mem)
 - mem->section_count++;
 - else {
 - ret = init_memory_block(&mem, section, MEM_ONLINE);
 - /* store memory_block pointer for next loop */
 - if (!ret && mem_p)
 - *mem_p = mem;
 + for (i = base_section_nr;
 +  (i < base_section_nr + sections_per_block) && i < NR_MEM_SECTIONS;
 +  i++) {
 + if (!present_section_nr(i))
 + continue;
 + if (section_count == 0)
 + section_nr = i;
 + section_count++;
   }
   
 - return ret;
 + if (section_count == 0)
 + return 0;
 + ret = init_memory_block(&mem, __nr_to_section(section_nr), MEM_ONLINE);
 + if (ret)
 + return ret;
 + mem->section_count = section_count;
 + return 0;
   }
   
 +
   /*
* need an interface for the VM to add new memory regions,
* but without onlining it.
 @@ -733,7 +732,6 @@ int __init memory_dev_init(void)
   int ret;
   int err;
   unsigned long block_sz;
 - struct memory_block *mem = NULL;
   
   ret = subsys_system_register(&memory_subsys, memory_root_attr_groups);
   if (ret)
 @@ -747,12 +745,8 @@ int __init memory_dev_init(void)
* during boot and have been initialized
*/
   mutex_lock(&mem_sysfs_mutex);
 - for (i = 0; i < NR_MEM_SECTIONS; i++) {
 - if (!present_section_nr(i))
 - continue;
 - /* don't need to reuse memory_block if only one per block */
 - err = add_memory_section(__nr_to_section(i),
 -  (sections_per_block == 1) ? NULL : &mem);
 + for (i = 0; i < NR_MEM_SECTIONS; i += sections_per_block) {

Why do you remove present_section_nr() check?

 + err = add_memory_block(i);
   if (!ret)

Thanks,
Yasuaki Ishimatsu

   ret = err;
   }
 




Re: [Patch v4 08/12] memory-hotplug: remove memmap of sparse-vmemmap

2012-11-29 Thread Yasuaki Ishimatsu
;
+
+   pte = pte_offset_kernel(pmd, addr);
+   for (; addr < end; pte++, addr += PAGE_SIZE) {
+   next = (addr + PAGE_SIZE) & PAGE_MASK;
+   if (next > end)
+   next = end;
+
+   if (pte_none(*pte))
+   continue;
+   if (IS_ALIGNED(addr, PAGE_SIZE) &&
+   IS_ALIGNED(end, PAGE_SIZE)) {
+   vmemmap_free_pages(pte_page(*pte), 0);
+   spin_lock(&init_mm.page_table_lock);
+   pte_clear(&init_mm, addr, pte);
+   spin_unlock(&init_mm.page_table_lock);


If addr or end is not aligned to PAGE_SIZE, you may leak some
memory.



Yes, I think we can handle this situation with the method you mentioned in the
change log:
1. When removing memory, the page structs of the removed memory are filled
with 0xFD.
2. When all page structs on a PT/PMD are filled with 0xFD, the PT/PMD can be cleared.
In this case, the page used as PT/PMD can be freed.
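
Editorial sketch: the two steps above amount to marking the bytes that described removed memory and freeing the backing page only once every byte carries the marker. The user-space model below illustrates that test. REMOVED_MARK, mark_removed() and page_fully_marked() are hypothetical names, and the buffer stands in for one backing page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define REMOVED_MARK 0xFD	/* the fill value named in the thread */

/* Mark the part of a backing page that described removed memory. */
static void mark_removed(unsigned char *page, size_t off, size_t len)
{
	memset(page + off, REMOVED_MARK, len);
}

/* True when every byte of the page carries the marker. */
static bool page_fully_marked(const unsigned char *page, size_t size)
{
	size_t i;

	for (i = 0; i < size; i++)
		if (page[i] != REMOVED_MARK)
			return false;
	return true;
}
```

A page that is only partly marked stays mapped. Once a later removal marks the rest, page_fully_marked() becomes true and the page can be released, which avoids the leak raised in the review comment.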

By the way, why 0xFD?


There is no reason. I just filled the page with unique number.

Thanks,
Yasuaki Ishimatsu




+   }
+   }
+
+   free_pte_table(pmd);
+   __flush_tlb_all();
+}
+
+static void vmemmap_pmd_remove(pud_t *pud, unsigned long addr, unsigned long 
end)
+{
+   unsigned long next;
+   pmd_t *pmd;
+
+   pmd = pmd_offset(pud, addr);
+   for (; addr < end; addr = next, pmd++) {
+   next = pmd_addr_end(addr, end);
+   if (pmd_none(*pmd))
+   continue;
+
+   if (cpu_has_pse) {
+   unsigned long pte_base;
+
+   if (IS_ALIGNED(addr, PMD_SIZE) &&
+   IS_ALIGNED(next, PMD_SIZE)) {
+   vmemmap_free_pages(pmd_page(*pmd),
+  get_order(PMD_SIZE));
+   spin_lock(&init_mm.page_table_lock);
+   pmd_clear(pmd);
+   spin_unlock(&init_mm.page_table_lock);
+   continue;
+   }
+
+   /*
+* We use 2M page, but we need to remove part of them,
+* so split 2M page to 4K page.
+*/
+   pte_base = get_zeroed_page(GFP_ATOMIC | __GFP_NOTRACK);


get_zeroed_page() may fail. You should handle this error.



That means the system is out of memory; I will trigger a BUG_ON().


+   split_large_page((pte_t *)pmd, addr, (pte_t *)pte_base);
+   __flush_tlb_all();
+
+   spin_lock(&init_mm.page_table_lock);
+   pmd_populate_kernel(&init_mm, pmd, (pte_t *)pte_base);
+   spin_unlock(&init_mm.page_table_lock);
+   }
+
+   vmemmap_pte_remove(pmd, addr, next);
+   }
+
+   free_pmd_table(pud);
+   __flush_tlb_all();
+}
+
+static void vmemmap_pud_remove(pgd_t *pgd, unsigned long addr, unsigned long 
end)
+{
+   unsigned long next;
+   pud_t *pud;
+
+   pud = pud_offset(pgd, addr);
+   for (; addr < end; addr = next, pud++) {
+   next = pud_addr_end(addr, end);
+   if (pud_none(*pud))
+   continue;
+
+   vmemmap_pmd_remove(pud, addr, next);
+   }
+
+   free_pud_table(pgd);
+   __flush_tlb_all();
+}
+
+void vmemmap_free(struct page *memmap, unsigned long nr_pages)
+{
+   unsigned long addr = (unsigned long)memmap;
+   unsigned long end = (unsigned long)(memmap + nr_pages);
+   unsigned long next;
+
+   for (; addr < end; addr = next) {
+   pgd_t *pgd = pgd_offset_k(addr);
+
+   next = pgd_addr_end(addr, end);
+   if (!pgd_present(*pgd))
+   continue;
+
+   vmemmap_pud_remove(pgd, addr, next);
+   sync_global_pgds(addr, next);


The parameter for sync_global_pgds() is [start, end], not
[start, end)
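
Editorial sketch: the difference between an inclusive and a half-open range shows up when the end address falls exactly on a top-level boundary. The user-space model below counts how many top-level entries a range touches. PGDIR_SHIFT_MODEL and both helper names are assumptions made for the example, with a deliberately small shift to keep the numbers readable.

```c
#include <assert.h>

#define PGDIR_SHIFT_MODEL 12	/* assumed, small on purpose */
#define PGDIR_SIZE_MODEL  (1UL << PGDIR_SHIFT_MODEL)

/* Number of top-level entries touched by the inclusive range [start, end]. */
static unsigned long entries_touched_inclusive(unsigned long start,
					       unsigned long end)
{
	return (end >> PGDIR_SHIFT_MODEL) - (start >> PGDIR_SHIFT_MODEL) + 1;
}

/* Convert a half-open [start, end) range into the inclusive form first. */
static unsigned long entries_touched_half_open(unsigned long start,
					       unsigned long end)
{
	assert(end > start);
	return entries_touched_inclusive(start, end - 1);
}
```

Passing an exclusive end straight to the inclusive helper touches one entry too many when the end is aligned, so a caller holding [addr, next) would pass next - 1 as the end.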



yes, thanks.


+   }
+}
+#endif
diff --git a/mm/sparse.c b/mm/sparse.c
index fac95f2..3a16d68 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -613,12 +613,13 @@ static inline struct page 
*kmalloc_section_memmap(unsigned long pnum, int nid,
/* This will make the necessary allocations eventually. */
return sparse_mem_map_populate(pnum, nid);
  }
-static void __kfree_section_memmap(struct page *memmap, unsigned long nr_pages)
+static void __kfree_section_memmap(struct page *page, unsigned long nr_pages)

Why do you change this line?



OK, there is no need to change it.


  {
-   return; /* XXX: Not implemented yet */
+   vmemmap_free(page, nr_pages);
  }
  static void free_map_bootmem(struct page *page, unsigned long nr_pages)
  {
+   vmemmap_free(page, nr_pages);
  }
  #else
  static struct page *__kmalloc_section_memmap(unsigned long nr_pages

Re: [Patch v4 00/12] memory-hotplug: hot-remove physical memory

2012-11-27 Thread Yasuaki Ishimatsu

Hi Andrew,

2012/11/28 4:27, Andrew Morton wrote:

On Tue, 27 Nov 2012 18:00:10 +0800
Wen Congyang <we...@cn.fujitsu.com> wrote:


The patch-set was divided from following thread's patch-set.
 https://lkml.org/lkml/2012/9/5/201

The last version of this patchset:
 https://lkml.org/lkml/2012/11/1/93


As we're now at -rc7 I'd prefer to take a look at all of this after the
3.7 release - please resend everything shortly after 3.8-rc1.


Almost all patches for memory hotplug have been merged into your and Rafael's
trees, and they are waiting for the v3.8 merge window to open.
This patch-set is the only one remaining, so we hope that it
is merged into v3.8.

With this patch-set merged into v3.8, Linux on x86_64 can support memory
hotplug.

Thanks,
Yasuaki Ishimatsu




If you want to know the reason, please read following thread.

https://lkml.org/lkml/2012/10/2/83


Please include the rationale within each version of the patchset rather
than by linking to an old email.  Because

a) this way, more people are likely to read it

b) it permits the text to be maintained as the code evolves

c) it permits the text to be included in the mainline commit, where
people can find it.


The patch-set has only the function of kernel core side for physical
memory hot remove. So if you use the patch, please apply following
patches.

- bug fix for memory hot remove
   https://lkml.org/lkml/2012/10/31/269

- acpi framework
   https://lkml.org/lkml/2012/10/26/175


What's happening with the acpi framework?  has it received any feedback
from the ACPI developers?






Re: [PATCH v3 00/12] memory-hotplug: hot-remove physical memory

2012-11-21 Thread Yasuaki Ishimatsu
Hi Andrew,

The patch-set is aimed at linux-3.8, so we would like you to merge
it into your tree.

The patch-set has incorporated many review comments, and currently
there are no outstanding comments on it.

Additionally, we have spent a lot of time verifying the patch-set,
and we found and fixed many bugs.

So we believe that Linux on x86_64 can support memory hot remove
with this patch-set.

Thanks,
Yasuaki Ishimatsu

2012/11/01 18:44, Wen Congyang wrote:
 The patch-set was divided from following thread's patch-set.
  https://lkml.org/lkml/2012/9/5/201
 
 The last version of this patchset:
  https://lkml.org/lkml/2012/10/23/213
 
 If you want to know the reason, please read following thread.
 
 https://lkml.org/lkml/2012/10/2/83
 
 The patch-set has only the function of kernel core side for physical
 memory hot remove. So if you use the patch, please apply following
 patches.
 
 - bug fix for memory hot remove
https://lkml.org/lkml/2012/10/31/269

 - acpi framework
https://lkml.org/lkml/2012/10/26/175
 
 The patches can free/remove the following things:
 
- /sys/firmware/memmap/X/{end, start, type} : [PATCH 2/10]
- mem_section and related sysfs files       : [PATCH 3-4/10]
- memmap of sparse-vmemmap                  : [PATCH 5-7/10]
- page table of removed memory              : [RFC PATCH 8/10]
- node and related sysfs files              : [RFC PATCH 9-10/10]
 
 * [PATCH 2/10] checks whether the memory can be removed or not.
 
 If you find lack of function for physical memory hot-remove, please let me
 know.
 
 How to test this patchset?
 1. apply this patchset and build the kernel. MEMORY_HOTPLUG, MEMORY_HOTREMOVE,
 ACPI_HOTPLUG_MEMORY must be selected.
 2. load the module acpi_memhotplug
 3. hotplug the memory device(it depends on your hardware)
 You will see the memory device under the directory /sys/bus/acpi/devices/.
 Its name is PNP0C80:XX.
 4. online/offline pages provided by this memory device
 You can write online/offline to /sys/devices/system/memory/memoryX/state
 to online/offline pages provided by this memory device
 5. hotremove the memory device
 You can hotremove the memory device by the hardware, or writing 1 to
 /sys/bus/acpi/devices/PNP0C80:XX/eject.
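
Editorial sketch: step 4 above boils down to writing a short string to a per-block state file. The helpers below show one way to build the path and perform the write from C. memory_state_path() and write_state() are hypothetical helper names, and the block number is whatever X is on the test machine.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "<root>/memory<N>/state"; returns 0 on success, -1 if truncated. */
static int memory_state_path(char *buf, size_t len, const char *root,
			     unsigned int block)
{
	int n = snprintf(buf, len, "%s/memory%u/state", root, block);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write "online" or "offline" to the given state file; 0 on success. */
static int write_state(const char *path, const char *state)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fputs(state, f) >= 0;
	if (fclose(f) != 0)
		ok = 0;	/* the kernel may reject the write at flush time */
	return ok ? 0 : -1;
}
```

A caller would pass "/sys/devices/system/memory" as the root and then write "offline" before attempting the eject in step 5. The write needs root privileges and can fail when the block holds kernel memory, as the note below the steps explains.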
 
 Note: if the memory provided by the memory device is used by the kernel, it
 can't be offlined. It is not a bug.
 
 Known problems:
 1. hot-removing a memory device may cause a kernel panic
 This bug will be fixed by Liu Jiang's patch:
 https://lkml.org/lkml/2012/7/3/1
 
 
 Changelogs from v2 to v3:
   Patch9: call sync_global_pgds() if pgd is changed
   Patch10: fix a problem in the patch
 
 Changelogs from v1 to v2:
   Patch1: new patch, offline memory twice. 1st iteration: offline every
   non-primary memory block. 2nd iteration: offline the primary (i.e. first
   added) memory block.
 
   Patch3: new patch, no logical change, just remove redundant code.
 
   Patch9: merge the patch from wujianguo into this patch. flush tlb on all cpu
   after the pagetable is changed.
 
   Patch12: new patch, free node_data when a node is offlined
 
 Wen Congyang (6):
memory-hotplug: try to offline the memory twice to avoid dependence
memory-hotplug: remove redundant codes
memory-hotplug: introduce new function arch_remove_memory() for
  removing page table depends on architecture
memory-hotplug: remove page table of x86_64 architecture
memory-hotplug: remove sysfs file of node
memory-hotplug: free node_data when a node is offlined
 
 Yasuaki Ishimatsu (6):
memory-hotplug: check whether all memory blocks are offlined or not
  when removing memory
memory-hotplug: remove /sys/firmware/memmap/X sysfs
memory-hotplug: unregister memory section on SPARSEMEM_VMEMMAP
memory-hotplug: implement register_page_bootmem_info_section of
  sparse-vmemmap
memory-hotplug: remove memmap of sparse-vmemmap
memory-hotplug: memory_hotplug: clear zone when removing the memory
 
   arch/ia64/mm/discontig.c |  14 ++
   arch/ia64/mm/init.c  |  18 ++
   arch/powerpc/mm/init_64.c|  14 ++
   arch/powerpc/mm/mem.c|  12 +
   arch/s390/mm/init.c  |  12 +
   arch/s390/mm/vmem.c  |  14 ++
   arch/sh/mm/init.c|  17 ++
   arch/sparc/mm/init_64.c  |  14 ++
   arch/tile/mm/init.c  |   8 +
   arch/x86/include/asm/pgtable_types.h |   1 +
   arch/x86/mm/init_32.c|  12 +
   arch/x86/mm/init_64.c| 417 +++
   arch/x86/mm/pageattr.c   |  47 ++--
   drivers/acpi/acpi_memhotplug.c   |   8 +-
   drivers/base/memory.c|   6 +
   drivers/firmware/memmap.c|  98 +++-
   include/linux/firmware-map.h |   6 +
   include/linux/memory_hotplug.h   |  15 +-
   include/linux/mm.h

Re: [PATCH v3 11/12] memory-hotplug: remove sysfs file of node

2012-11-19 Thread Yasuaki Ishimatsu
Hi Wen,

This patch cannot be applied if I first apply the latest acpi framework patch-set:

https://lkml.org/lkml/2012/11/15/21

This is because acpi_memory_disable_device() is removed by that patch-set.

I updated the patch and attached it to this mail.

2012/11/01 18:44, Wen Congyang wrote:
 This patch introduces a new function try_offline_node() to
 remove sysfs file of node when all memory sections of this
 node are removed. If some memory sections of this node are
 not removed, this function does nothing.
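
Editorial sketch: the behaviour described above, together with the CPU check that appears further down in the quoted diff (check_cpu_on_node), can be modelled as two guards in front of the actual removal. The user-space model below is an illustration only. struct node_model, cpu_node[], cpu_present_on_node() and try_offline_node_model() are hypothetical names, not the kernel's data structures.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_MODEL 4		/* assumed */

struct node_model {
	int id;
	int remaining_sections;	/* memory sections still present */
	bool sysfs_registered;	/* node's sysfs entry still exists */
};

/* cpu_node[c] is the node that CPU c sits on, or -1 if the CPU is absent. */
static int cpu_node[NR_CPUS_MODEL] = { 0, 0, 1, -1 };

static bool cpu_present_on_node(int nid)
{
	int c;

	for (c = 0; c < NR_CPUS_MODEL; c++)
		if (cpu_node[c] == nid)
			return true;
	return false;
}

/* Returns true if the node was taken offline, false if nothing was done. */
static bool try_offline_node_model(struct node_model *n)
{
	if (n->remaining_sections > 0)
		return false;	/* memory still present: do nothing */
	if (cpu_present_on_node(n->id))
		return false;	/* a CPU still lives on this node */
	n->sysfs_registered = false;
	return true;
}
```

Only a node with no remaining sections and no present CPU loses its sysfs entry in this model. Every other call is a no-op, matching the "does nothing" wording of the description.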
 
 CC: David Rientjes <rient...@google.com>
 CC: Jiang Liu <liu...@gmail.com>
 CC: Len Brown <len.br...@intel.com>
 CC: Christoph Lameter <c...@linux.com>
 Cc: Minchan Kim <minchan@gmail.com>
 CC: Andrew Morton <a...@linux-foundation.org>
 CC: KOSAKI Motohiro <kosaki.motoh...@jp.fujitsu.com>
 CC: Yasuaki Ishimatsu <isimatu.yasu...@jp.fujitsu.com>
 Signed-off-by: Wen Congyang <we...@cn.fujitsu.com>
 ---
   drivers/acpi/acpi_memhotplug.c |  8 +-
   include/linux/memory_hotplug.h |  2 +-
   mm/memory_hotplug.c| 58 --
   3 files changed, 64 insertions(+), 4 deletions(-)
 
 diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
 index 24c807f..0780f99 100644
 --- a/drivers/acpi/acpi_memhotplug.c
 +++ b/drivers/acpi/acpi_memhotplug.c
 @@ -310,7 +310,9 @@ static int acpi_memory_disable_device(struct 
 acpi_memory_device *mem_device)
   {
   int result;
   struct acpi_memory_info *info, *n;
 + int node;
   
 + node = acpi_get_node(mem_device->device->handle);
   
   /*
* Ask the VM to offline this memory range.
 @@ -318,7 +320,11 @@ static int acpi_memory_disable_device(struct 
 acpi_memory_device *mem_device)
*/
   list_for_each_entry_safe(info, n, &mem_device->res_list, list) {
   if (info->enabled) {
 - result = remove_memory(info->start_addr, info->length);
 + if (node < 0)
 + node = memory_add_physaddr_to_nid(
 + info->start_addr);
 + result = remove_memory(node, info->start_addr,
 + info->length);
   if (result)
   return result;
   }
 diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
 index d4c4402..7b4cfe6 100644
 --- a/include/linux/memory_hotplug.h
 +++ b/include/linux/memory_hotplug.h
 @@ -231,7 +231,7 @@ extern int arch_add_memory(int nid, u64 start, u64 size);
   extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
   extern int offline_memory_block(struct memory_block *mem);
   extern bool is_memblock_offlined(struct memory_block *mem);
 -extern int remove_memory(u64 start, u64 size);
 +extern int remove_memory(int node, u64 start, u64 size);
   extern int sparse_add_one_section(struct zone *zone, unsigned long 
 start_pfn,
   int nr_pages);
   extern void sparse_remove_one_section(struct zone *zone, struct mem_section 
 *ms);
 diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
 index 7bcced0..d965da3 100644
 --- a/mm/memory_hotplug.c
 +++ b/mm/memory_hotplug.c
 @@ -29,6 +29,7 @@
   #include <linux/suspend.h>
   #include <linux/mm_inline.h>
   #include <linux/firmware-map.h>
 +#include <linux/stop_machine.h>
   
   #include <asm/tlbflush.h>
   
 @@ -1299,7 +1300,58 @@ static int is_memblock_offlined_cb(struct memory_block 
 *mem, void *arg)
   return ret;
   }
   
 -int __ref remove_memory(u64 start, u64 size)
 +static int check_cpu_on_node(void *data)
 +{
 + struct pglist_data *pgdat = data;
 + int cpu;
 +
 + for_each_present_cpu(cpu) {
 + if (cpu_to_node(cpu) == pgdat->node_id)
 + /*
 +  * the cpu on this node isn't removed, and we can't
 +  * offline this node.
 +  */
 + return -EBUSY;
 + }
 +
 + return 0;
 +}
 +
 +/* offline the node if all memory sections of this node are removed */
 +static void try_offline_node(int nid)
 +{
 + unsigned long start_pfn = NODE_DATA(nid)->node_start_pfn;
 + unsigned long end_pfn = start_pfn + NODE_DATA(nid)->node_spanned_pages;
 + unsigned long pfn;
 +
 + for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
 + unsigned long section_nr = pfn_to_section_nr(pfn);
 +
 + if (!present_section_nr(section_nr))
 + continue;
 +
 + if (pfn_to_nid(pfn) != nid)
 + continue;
 +
 + /*
 +  * some memory sections of this node are not removed, and we
 +  * can't offline node now.
 +  */
 + return;
 + }
 +
 + if (stop_machine(check_cpu_on_node, NODE_DATA(nid), NULL))
 + return;
 +
 + /*
 +  * all memory/cpu of this node are removed, we can offline

Re: [PATCH 5/10] memory-hotplug : memory-hotplug: check page type in get_page_bootmem

2012-10-18 Thread Yasuaki Ishimatsu

Hi Kosaki,

Sorry for the late reply.

2012/10/13 4:28, KOSAKI Motohiro wrote:

On Thu, Oct 4, 2012 at 10:32 PM, Yasuaki Ishimatsu
isimatu.yasu...@jp.fujitsu.com wrote:

The function get_page_bootmem() may be called more than once for the same
page. There is no need to set the page's type and private data again if this
is not the first call for that page.

Note: the patch is just an optimization and does not fix any problem.

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
CC: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
---
  mm/memory_hotplug.c |   15 +++
  1 file changed, 11 insertions(+), 4 deletions(-)

Index: linux-3.6/mm/memory_hotplug.c
===
--- linux-3.6.orig/mm/memory_hotplug.c  2012-10-04 18:29:58.284676075 +0900
+++ linux-3.6/mm/memory_hotplug.c   2012-10-04 18:30:03.454680542 +0900
@@ -95,10 +95,17 @@ static void release_memory_resource(stru
  static void get_page_bootmem(unsigned long info,  struct page *page,
  unsigned long type)
  {
-   page->lru.next = (struct list_head *) type;
-   SetPagePrivate(page);
-   set_page_private(page, info);
-   atomic_inc(&page->_count);
+   unsigned long page_type;
+
+   page_type = (unsigned long)page->lru.next;


If I understand correctly, page->lru.next might not be initialized yet.


Ah yes. I was misunderstanding...

Hi Wen,

When you update the physical hot remove patch-set, please drop this patch.

Thanks,
Yasuaki Ishimatsu  
 

Moreover, I have not seen any benefit from this patch. I don't understand
why we need to increase code complexity.




+   if (page_type < MEMORY_HOTPLUG_MIN_BOOTMEM_TYPE ||
+   page_type > MEMORY_HOTPLUG_MAX_BOOTMEM_TYPE){
+   page->lru.next = (struct list_head *)type;
+   SetPagePrivate(page);
+   set_page_private(page, info);
+   atomic_inc(&page->_count);
+   } else
+   atomic_inc(&page->_count);
  }

  /* reference to __meminit __free_pages_bootmem is valid

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majord...@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: em...@kvack.org

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/




___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 2/10] memory-hotplug : remove /sys/firmware/memmap/X sysfs

2012-10-11 Thread Yasuaki Ishimatsu

2012/10/06 4:36, KOSAKI Motohiro wrote:

On Thu, Oct 4, 2012 at 10:26 PM, Yasuaki Ishimatsu
isimatu.yasu...@jp.fujitsu.com wrote:

When (hot)adding memory into system, /sys/firmware/memmap/X/{end, start, type}
sysfs files are created. But there is no code to remove these files. The patch
implements the function to remove them.

Note : The code does not free firmware_map_entry since there is no way to free
memory which is allocated by bootmem.


You have to explain why this is ok. I guess the unfreed firmware_map_entry
is reused when memory is next onlined, so it does not leak memory, right?


Unfortunately, no. It leaks memory the size of one firmware_map_entry.
If we hot add memory again, the slab allocator allocates separate memory
for the new firmware_map_entry.

In my understanding, if the memory is allocated by the bootmem allocator,
the memory is not managed by the slab allocator. So we cannot use kfree()
on the memory.
On the other hand, the page containing the memory may hold various other
data allocated by the bootmem allocator besides the firmware_map_entry.
Thus we cannot free the page.

So the patch leaks memory. But I think the leak is very small, and it does
not affect the system.
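
The trade-off above can be sketched in userspace C. This is only an
illustration under stated assumptions: struct map_entry and
release_map_entry() are hypothetical stand-ins for firmware_map_entry and
its kobject release callback, and free() stands in for kfree().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct firmware_map_entry. */
struct map_entry {
	unsigned long long start;
	unsigned long long end;
	bool bootmem;	/* true: came from a boot-time pool, not the heap */
};

/*
 * Release an entry. Returns true if the memory went back to the
 * allocator, false if it had to be left in place (the small leak).
 */
static bool release_map_entry(struct map_entry *entry)
{
	assert(entry != NULL);

	if (entry->bootmem)
		return false;	/* not heap memory: free() would be invalid */

	free(entry);
	return true;
}
```

The flag only records where the entry came from. It does not make the
bootmem memory reclaimable, which is why the leak remains.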







CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
Signed-off-by: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

---
  drivers/firmware/memmap.c|   98 
++-
  include/linux/firmware-map.h |6 ++
  mm/memory_hotplug.c  |7 ++-
  3 files changed, 108 insertions(+), 3 deletions(-)

Index: linux-3.6/drivers/firmware/memmap.c
===
--- linux-3.6.orig/drivers/firmware/memmap.c2012-10-04 18:27:05.195500420 
+0900
+++ linux-3.6/drivers/firmware/memmap.c 2012-10-04 18:27:18.901514330 +0900
@@ -21,6 +21,7 @@
  #include <linux/types.h>
  #include <linux/bootmem.h>
  #include <linux/slab.h>
+#include <linux/mm.h>

  /*
   * Data types 
--
@@ -41,6 +42,7 @@ struct firmware_map_entry {
 const char  *type;  /* type of the memory range */
 struct list_headlist;   /* entry for the linked list */
 struct kobject  kobj;   /* kobject for each entry */
+   unsigned intbootmem:1; /* allocated from bootmem */


Use bool.


We'll update it.




  };

  /*
@@ -79,7 +81,26 @@ static const struct sysfs_ops memmap_att
 .show = memmap_attr_show,
  };

+
+static inline struct firmware_map_entry *
+to_memmap_entry(struct kobject *kobj)
+{
+   return container_of(kobj, struct firmware_map_entry, kobj);
+}
+
+static void release_firmware_map_entry(struct kobject *kobj)
+{
+   struct firmware_map_entry *entry = to_memmap_entry(kobj);
+
+   if (entry->bootmem)
+   /* There is no way to free memory allocated from bootmem */
+   return;
+
+   kfree(entry);
+}
+
  static struct kobj_type memmap_ktype = {
+   .release= release_firmware_map_entry,
 .sysfs_ops  = &memmap_attr_ops,
 .default_attrs  = def_attrs,
  };
@@ -94,6 +115,7 @@ static struct kobj_type memmap_ktype = {
   * in firmware initialisation code in one single thread of execution.
   */
  static LIST_HEAD(map_entries);
+static DEFINE_SPINLOCK(map_entries_lock);

  /**
   * firmware_map_add_entry() - Does the real work to add a firmware memmap 
entry.
@@ -118,11 +140,25 @@ static int firmware_map_add_entry(u64 st
 INIT_LIST_HEAD(&entry->list);
 kobject_init(&entry->kobj, &memmap_ktype);

+   spin_lock(&map_entries_lock);
 list_add_tail(&entry->list, &map_entries);
+   spin_unlock(&map_entries_lock);

 return 0;
  }

+/**
+ * firmware_map_remove_entry() - Does the real work to remove a firmware
+ * memmap entry.
+ * @entry: removed entry.
+ **/
+static inline void firmware_map_remove_entry(struct firmware_map_entry *entry)


Don't use inline in *.c files. gcc is wiser than you.


We'll update it.


+{
+   spin_lock(&map_entries_lock);
+   list_del(&entry->list);
+   spin_unlock(&map_entries_lock);
+}
+
  /*
   * Add memmap entry on sysfs
   */
@@ -144,6 +180,35 @@ static int add_sysfs_fw_map_entry(struct
 return 0;
  }

+/*
+ * Remove memmap entry on sysfs
+ */
+static inline void remove_sysfs_fw_map_entry(struct firmware_map_entry *entry)
+{
+   kobject_put(&entry->kobj);
+}
+
+/*
+ * Search memmap entry
+ */
+
+static struct firmware_map_entry * __meminit
+firmware_map_find_entry(u64 start, u64 end, const char *type)
+{
+   struct firmware_map_entry *entry;
+
+   spin_lock(&map_entries_lock);
+   list_for_each_entry(entry, &map_entries, list

Re: linux-next: build failure after merge of the origin tree

2012-10-09 Thread Yasuaki Ishimatsu

Hi Stephen,

2012/10/10 8:45, Andrew Morton wrote:

On Wed, 10 Oct 2012 10:21:50 +1100 Stephen Rothwell s...@canb.auug.org.au 
wrote:


Hi Linus,

In Linus' tree, today's linux-next build (powerpc ppc64_defconfig) failed
like this:

arch/powerpc/platforms/pseries/hotplug-memory.c: In function 
'pseries_remove_memblock':
arch/powerpc/platforms/pseries/hotplug-memory.c:103:17: error: unused variable 
'pfn' [-Werror=unused-variable]

Caused by commit d760afd4d257 ("memory-hotplug: suppress "Trying to free
nonexistent resource <XXXXXXXXXXXXXXXX-YYYYYYYYYYYYYYYY>" warning").

I can't see what the point of the pfn variable is


This:

--- a/arch/powerpc/platforms/pseries/hotplug-memory.c~a
+++ a/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -101,7 +101,7 @@ static int pseries_remove_memblock(unsig
sections_to_remove = (memblock_size >> PAGE_SHIFT) / PAGES_PER_SECTION;
for (i = 0; i < sections_to_remove; i++) {
unsigned long pfn = start_pfn + i * PAGES_PER_SECTION;
-   ret = __remove_pages(zone, start_pfn,  PAGES_PER_SECTION);
+   ret = __remove_pages(zone, pfn, PAGES_PER_SECTION);
if (ret)
return ret;
}


I believe this patch fixes the error.
Could you try it?

Thanks,
Yasuaki Ishimatsu




and this patch never
appeared in linux-next before being merged.  :-(


It was first sighted October 3.


I have reverted that commit for today.

If this patch truly was authored yesterday (according to the Author Date in
git), why was it merged yesterday while still under discussion?  And the
latest update to it still has this build problem ... did anyone even try
to build this for powerpc (since that architecture was obviously
affected)?


Apparently not - the ppc bit was a best-effort fixup for a patch which
addresses an x86 problem.






Re: memory-hotplug : suppres Trying to free nonexistent resource XXXXXXXXXXXXXXXX-YYYYYYYYYYYYYYYY warning

2012-10-08 Thread Yasuaki Ishimatsu

Hi Andrew,

2012/10/06 6:09, Andrew Morton wrote:

On Thu, 4 Oct 2012 14:31:09 +0900
Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com wrote:


When our x86 box calls __remove_pages(), release_mem_region() shows
many warnings, and the x86 box cannot unregister iomem_resource.

Trying to free nonexistent resource <XXXXXXXXXXXXXXXX-YYYYYYYYYYYYYYYY>

Since commit de7f0cba96786c, release_mem_region() is called once per
PAGES_PER_SECTION chunk, because powerpc registers iomem_resource in
PAGES_PER_SECTION chunks. But when I hot add memory on an x86 box,
iomem_resource is registered per _CRS, not per PAGES_PER_SECTION chunk.
So the x86 box cannot unregister iomem_resource chunk by chunk.

The patch fixes the problem.

--- linux-3.6.orig/arch/powerpc/platforms/pseries/hotplug-memory.c  
2012-10-04 14:22:59.833520792 +0900
+++ linux-3.6/arch/powerpc/platforms/pseries/hotplug-memory.c   2012-10-04 
14:23:05.150521411 +0900
@@ -77,7 +77,8 @@ static int pseries_remove_memblock(unsig
  {
unsigned long start, start_pfn;
struct zone *zone;
-   int ret;
+   int i, ret;
+   int sections_to_remove;

start_pfn = base >> PAGE_SHIFT;

@@ -97,9 +98,13 @@ static int pseries_remove_memblock(unsig
 * to sysfs state file and we can't remove sysfs entries
 * while writing to it. So we have to defer it to here.
 */
-   ret = __remove_pages(zone, start_pfn, memblock_size >> PAGE_SHIFT);
-   if (ret)
-   return ret;
+   sections_to_remove = (memblock_size >> PAGE_SHIFT) / PAGES_PER_SECTION;
+   for (i = 0; i < sections_to_remove; i++) {
+   unsigned long pfn = start_pfn + i * PAGES_PER_SECTION;
+   ret = __remove_pages(zone, start_pfn,  PAGES_PER_SECTION);
+   if (ret)
+   return ret;
+   }


It is inappropriate that `i' have a signed 32-bit type.  I doubt if
there's any possibility of an overflow bug here, but using a consistent
and well-chosen type would eliminate all doubt.

Note that __remove_pages() does use an unsigned long for this, although
it stupidly calls that variable i, despite the C programmers'
expectation that a variable called i has type int.

The same applies to `sections_to_remove', but __remove_pages() went and
decided to use an `int' for that variable.  Sigh.

Anyway, please have a think, and see if we can come up with the best
and most accurate choice of types and identifiers in this code.


Your concern is right. An overflow bug may occur in the future.
So I changed the type of i and sections_to_remove to unsigned long.
Please merge it into your tree instead of the previous patch.

__remove_pages() has the same problem, so I'll fix it as well.
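
A minimal userspace sketch of the per-section loop with unsigned long
counters may help when reviewing the types. Everything here is an
assumption for illustration: PAGE_SHIFT and PAGES_PER_SECTION use made-up
values, and remove_block_by_section(), remove_fn, struct call_log and
record_call() are hypothetical names standing in for the loop around
__remove_pages().

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative values only; the real ones depend on the architecture. */
#define PAGE_SHIFT		12UL
#define PAGES_PER_SECTION	(1UL << 15)

/* Hypothetical stand-in for __remove_pages(): 0 on success. */
typedef int (*remove_fn)(unsigned long pfn, unsigned long nr_pages, void *ctx);

/* Remove a memory block one section at a time. */
static int remove_block_by_section(unsigned long base, unsigned long block_size,
				   remove_fn fn, void *ctx)
{
	unsigned long start_pfn = base >> PAGE_SHIFT;
	unsigned long sections_to_remove =
		(block_size >> PAGE_SHIFT) / PAGES_PER_SECTION;
	unsigned long i;

	assert(fn != NULL);

	for (i = 0; i < sections_to_remove; i++) {
		/* Each call gets the start of its own section, not start_pfn. */
		unsigned long pfn = start_pfn + i * PAGES_PER_SECTION;
		int ret = fn(pfn, PAGES_PER_SECTION, ctx);

		if (ret)
			return ret;
	}

	return 0;
}

/* Records what the loop asked for, so the behaviour can be inspected. */
struct call_log {
	unsigned long count;
	unsigned long last_pfn;
};

static int record_call(unsigned long pfn, unsigned long nr_pages, void *ctx)
{
	struct call_log *log = ctx;

	(void)nr_pages;
	log->count++;
	log->last_pfn = pfn;
	return 0;
}
```

With the counter, the section count and the pfn all unsigned long, the
multiplication i * PAGES_PER_SECTION is done in the same width as the pfn
itself.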

---
When our x86 box calls __remove_pages(), release_mem_region() shows
many warnings, and the x86 box cannot unregister iomem_resource.

Trying to free nonexistent resource <XXXXXXXXXXXXXXXX-YYYYYYYYYYYYYYYY>

Since commit de7f0cba96786c, release_mem_region() is called once per
PAGES_PER_SECTION chunk, because powerpc registers iomem_resource in
PAGES_PER_SECTION chunks. But when I hot add memory on an x86 box,
iomem_resource is registered per _CRS, not per PAGES_PER_SECTION chunk.
So the x86 box cannot unregister iomem_resource chunk by chunk.

The patch fixes the problem.

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
CC: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
---
 arch/powerpc/platforms/pseries/hotplug-memory.c |   11 ---
 mm/memory_hotplug.c |4 ++--
 2 files changed, 10 insertions(+), 5 deletions(-)

Index: linux-3.6/arch/powerpc/platforms/pseries/hotplug-memory.c
===
--- linux-3.6.orig/arch/powerpc/platforms/pseries/hotplug-memory.c  
2012-10-05 14:33:09.516197839 +0900
+++ linux-3.6/arch/powerpc/platforms/pseries/hotplug-memory.c   2012-10-09 
11:27:50.555709827 +0900
@@ -78,6 +78,7 @@ static int pseries_remove_memblock(unsig
unsigned long start, start_pfn;
struct zone *zone;
int ret;
+   unsigned long i, sections_to_remove;
 
 	start_pfn = base >> PAGE_SHIFT;
 
@@ -97,9 +98,13 @@ static int pseries_remove_memblock(unsig

 * to sysfs state file and we can't remove sysfs entries
 * while writing to it. So we have to defer it to here.
 */
-   ret = __remove_pages(zone, start_pfn, memblock_size >> PAGE_SHIFT);
-   if (ret)
-   return ret;
+   sections_to_remove = (memblock_size >> PAGE_SHIFT) / PAGES_PER_SECTION;
+   for (i = 0; i

Re: [RFC v9 PATCH 16/21] memory-hotplug: free memmap of sparse-vmemmap

2012-10-04 Thread Yasuaki Ishimatsu

Hi Chen,

Sorry for the late reply.

2012/10/02 13:21, Ni zhan Chen wrote:

On 09/05/2012 05:25 PM, we...@cn.fujitsu.com wrote:

From: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

Not all pages of the virtual mapping of removed memory can be freed, since some
pages used as PGD/PUD map not only the removed memory but also other memory. So
the patch checks whether a page can be freed or not.

How to check whether page can be freed or not?
  1. When removing memory, the page structs of the removed memory are filled
 with 0xFD.
  2. If all page structs on a PT/PMD page are filled with 0xFD, the PT/PMD can
 be cleared. In this case, the page used as PT/PMD can be freed.
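
The two steps above can be sketched in userspace C as follows. The marker
value 0xFD and the name PAGE_INUSE come from the patch; mark_unused() and
page_is_free_candidate() are hypothetical helper names, and a plain byte
buffer stands in for the page holding page structs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PAGE_INUSE 0xFD	/* marker byte used by the patch */

/* Step 1: fill the part of a page that held removed page structs. */
static void mark_unused(unsigned char *page, size_t offset, size_t len)
{
	assert(page != NULL);
	memset(page + offset, PAGE_INUSE, len);
}

/* Step 2: the page may be freed only if every byte carries the marker. */
static bool page_is_free_candidate(const unsigned char *page, size_t page_size)
{
	size_t i;

	assert(page != NULL);

	for (i = 0; i < page_size; i++)
		if (page[i] != PAGE_INUSE)
			return false;

	return true;
}
```

A page that is only partly marked still backs page structs of other
memory, so it has to stay mapped.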

With this patch applied, the CONFIG_SPARSEMEM_VMEMMAP variant of
__remove_section() is merged into the common one, so the separate
CONFIG_SPARSEMEM_VMEMMAP version is deleted.

Note:  vmemmap_kfree() and vmemmap_free_bootmem() are not implemented for ia64,
ppc, s390, and sparc.

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
CC: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
---
  arch/ia64/mm/discontig.c  |8 +++
  arch/powerpc/mm/init_64.c |8 +++
  arch/s390/mm/vmem.c   |8 +++
  arch/sparc/mm/init_64.c   |8 +++
  arch/x86/mm/init_64.c |  119 +
  include/linux/mm.h|2 +
  mm/memory_hotplug.c   |   17 +--
  mm/sparse.c   |5 +-
  8 files changed, 158 insertions(+), 17 deletions(-)

diff --git a/arch/ia64/mm/discontig.c b/arch/ia64/mm/discontig.c
index 33943db..0d23b69 100644
--- a/arch/ia64/mm/discontig.c
+++ b/arch/ia64/mm/discontig.c
@@ -823,6 +823,14 @@ int __meminit vmemmap_populate(struct page *start_page,
  return vmemmap_populate_basepages(start_page, size, node);
  }
+void vmemmap_kfree(struct page *memmap, unsigned long nr_pages)
+{
+}
+
+void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages)
+{
+}
+
  void register_page_bootmem_memmap(unsigned long section_nr,
struct page *start_page, unsigned long size)
  {
diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index 3690c44..835a2b3 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -299,6 +299,14 @@ int __meminit vmemmap_populate(struct page *start_page,
  return 0;
  }
+void vmemmap_kfree(struct page *memmap, unsigned long nr_pages)
+{
+}
+
+void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages)
+{
+}
+
  void register_page_bootmem_memmap(unsigned long section_nr,
struct page *start_page, unsigned long size)
  {
diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
index eda55cd..4b42b0b 100644
--- a/arch/s390/mm/vmem.c
+++ b/arch/s390/mm/vmem.c
@@ -227,6 +227,14 @@ out:
  return ret;
  }
+void vmemmap_kfree(struct page *memmap, unsigned long nr_pages)
+{
+}
+
+void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages)
+{
+}
+
  void register_page_bootmem_memmap(unsigned long section_nr,
struct page *start_page, unsigned long size)
  {
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index add1cc7..1384826 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -2078,6 +2078,14 @@ void __meminit vmemmap_populate_print_last(void)
  }
  }
+void vmemmap_kfree(struct page *memmap, unsigned long nr_pages)
+{
+}
+
+void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages)
+{
+}
+
  void register_page_bootmem_memmap(unsigned long section_nr,
struct page *start_page, unsigned long size)
  {
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 0075592..4e8f8a4 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1138,6 +1138,125 @@ vmemmap_populate(struct page *start_page, unsigned long 
size, int node)
  return 0;
  }
+#define PAGE_INUSE 0xFD
+
+unsigned long find_and_clear_pte_page(unsigned long addr, unsigned long end,
+struct page **pp, int *page_size)
+{
+pgd_t *pgd;
+pud_t *pud;
+pmd_t *pmd;
+pte_t *pte;
+void *page_addr;
+unsigned long next;
+
+*pp = NULL;
+
+pgd = pgd_offset_k(addr);
+if (pgd_none(*pgd))
+return pgd_addr_end(addr, end);
+
+pud = pud_offset(pgd, addr);
+if (pud_none(*pud))
+return pud_addr_end(addr, end);
+
+if (!cpu_has_pse) {
+next = (addr + PAGE_SIZE) & PAGE_MASK;
+pmd = pmd_offset(pud, addr);
+if (pmd_none(*pmd))
+return next;
+
+pte = pte_offset_kernel(pmd, addr);
+if (pte_none(*pte))
+return next;
+
+*page_size = PAGE_SIZE;
+*pp = pte_page

[PATCH 0/10] memory-hotplug: hot-remove physical memory

2012-10-04 Thread Yasuaki Ishimatsu
This patch-set was split out from the patch-set in the following thread.

https://lkml.org/lkml/2012/9/5/201

If you want to know the reason, please read the following thread.

https://lkml.org/lkml/2012/10/2/83

This patch-set contains only the kernel core side of physical memory
hot-remove. So if you use it, please also apply the following
patches.

- bug fix for memory hot remove
  https://lkml.org/lkml/2012/9/27/39
  https://lkml.org/lkml/2012/10/2/83
  http://www.spinics.net/lists/linux-mm/msg42982.html
  
- acpi framework
  https://lkml.org/lkml/2012/10/3/126
  https://lkml.org/lkml/2012/10/3/641

The patches can free/remove the following things:

  - /sys/firmware/memmap/X/{end, start, type} : [PATCH 2/10]
  - mem_section and related sysfs files   : [PATCH 3-4/10]
  - memmap of sparse-vmemmap  : [PATCH 5-7/10]
  - page table of removed memory  : [RFC PATCH 8/10]
  - node and related sysfs files  : [RFC PATCH 9-10/10]

* [PATCH 1/10] checks whether the memory can be removed or not.

If you find any missing functionality for physical memory hot-remove, please
let me know.

How to test this patchset?
1. apply this patchset and build the kernel. MEMORY_HOTPLUG, MEMORY_HOTREMOVE,
   ACPI_HOTPLUG_MEMORY must be selected.
2. load the module acpi_memhotplug
3. hotplug the memory device(it depends on your hardware)
   You will see the memory device under the directory /sys/bus/acpi/devices/.
   Its name is PNP0C80:XX.
4. online/offline pages provided by this memory device
   You can write online/offline to /sys/devices/system/memory/memoryX/state to
   online/offline pages provided by this memory device
5. hotremove the memory device
   You can hotremove the memory device by the hardware, or writing 1 to
   /sys/bus/acpi/devices/PNP0C80:XX/eject.
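
Step 4 can also be driven from a small C helper instead of a shell. The
sketch below is only an assumption about how one might script it:
set_memory_state() is a hypothetical name, and the caller supplies the
path of the state file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write "online" or "offline" to a memory block's state file.
 * Returns 0 on success, -1 on a bad argument or an I/O error.
 */
static int set_memory_state(const char *state_path, const char *state)
{
	FILE *f;
	int ok;

	assert(state_path != NULL && state != NULL);

	if (strcmp(state, "online") != 0 && strcmp(state, "offline") != 0)
		return -1;

	f = fopen(state_path, "w");
	if (!f)
		return -1;

	ok = (fputs(state, f) >= 0);
	/* The buffered write reaches the file on close, so check that too. */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

For example, set_memory_state("/sys/devices/system/memory/memory8/state",
"offline") would request offlining of memory8. Root privileges are needed,
and the request can still fail if the kernel is using that memory.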

Note: if the memory provided by the memory device is used by the kernel, it
can't be offlined. It is not a bug.

Known problems:
1. memory can't be offlined when CONFIG_MEMCG is selected.
   For example: there is a memory device on node 1. The address range
   is [1G, 1.5G). You will find 4 new directories memory8, memory9, memory10,
   and memory11 under the directory /sys/devices/system/memory/.
   If CONFIG_MEMCG is selected, we allocate memory to store page cgroups
   when we online pages. When we online memory8, the memory that stores its
   page cgroup is not provided by this memory device. But when we online
   memory9, the memory that stores its page cgroup may be provided by memory8.
   So we can't offline memory8 now. We should offline the memory in the
   reverse order.
   When the memory device is hotremoved, we automatically offline the memory
   provided by this memory device. But we don't know which memory was onlined
   first, so offlining memory may fail. In that case, you should offline the
   memory by hand before hotremoving the memory device.
2. hotremoving a memory device may cause a kernel panic.
   This bug will be fixed by Liu Jiang's patch:
   https://lkml.org/lkml/2012/7/3/1




[PATCH 1/10] memory-hotplug : check whether memory is offline or not when removing memory

2012-10-04 Thread Yasuaki Ishimatsu
When calling remove_memory(), the memory should be offline. If the function
is called on online memory, a kernel panic may occur.

So the patch checks whether the memory is offline or not.
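
As a rough userspace model of that check (not the kernel code itself),
removal is refused while any block in the range is still online. The
names block_state, range_is_offline() and try_remove_range() are
hypothetical, and the locking done by the real patch is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum block_state { BLOCK_OFFLINE, BLOCK_ONLINE };

/* True only if every block in [first, first + count) is offline. */
static bool range_is_offline(const enum block_state *blocks,
			     size_t first, size_t count)
{
	size_t i;

	assert(blocks != NULL);

	for (i = first; i < first + count; i++)
		if (blocks[i] != BLOCK_OFFLINE)
			return false;

	return true;
}

/* Refuse removal with -EAGAIN while any block in the range is online. */
static int try_remove_range(const enum block_state *blocks,
			    size_t first, size_t count)
{
	if (!range_is_offline(blocks, first, count))
		return -EAGAIN;

	return 0;
}
```

Returning -EAGAIN mirrors the patch: the caller can offline the remaining
blocks and try again.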

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com 
Signed-off-by: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

---
 drivers/base/memory.c  |   39 +++
 include/linux/memory.h |5 +
 mm/memory_hotplug.c|   17 +++--
 3 files changed, 59 insertions(+), 2 deletions(-)

Index: linux-3.6/drivers/base/memory.c
===
--- linux-3.6.orig/drivers/base/memory.c2012-10-04 14:22:57.0 
+0900
+++ linux-3.6/drivers/base/memory.c 2012-10-04 14:45:46.653585860 +0900
@@ -70,6 +70,45 @@ void unregister_memory_isolate_notifier(
 }
 EXPORT_SYMBOL(unregister_memory_isolate_notifier);
 
+bool is_memblk_offline(unsigned long start, unsigned long size)
+{
+   struct memory_block *mem = NULL;
+   struct mem_section *section;
+   unsigned long start_pfn, end_pfn;
+   unsigned long pfn, section_nr;
+
+   start_pfn = PFN_DOWN(start);
+   end_pfn = PFN_UP(start + size);
+
+   for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
+   section_nr = pfn_to_section_nr(pfn);
+   if (!present_section_nr(section_nr))
+   continue;
+
+   section = __nr_to_section(section_nr);
+   /* same memblock? */
+   if (mem)
+   if ((section_nr >= mem->start_section_nr) &&
+   (section_nr <= mem->end_section_nr))
+   continue;
+
+   mem = find_memory_block_hinted(section, mem);
+   if (!mem)
+   continue;
+   if (mem->state == MEM_OFFLINE)
+   continue;
+
+   kobject_put(&mem->dev.kobj);
+   return false;
+   }
+
+   if (mem)
+   kobject_put(&mem->dev.kobj);
+
+   return true;
+}
+EXPORT_SYMBOL(is_memblk_offline);
+
 /*
  * register_memory - Setup a sysfs device for a memory block
  */
Index: linux-3.6/include/linux/memory.h
===
--- linux-3.6.orig/include/linux/memory.h   2012-10-02 18:00:22.0 
+0900
+++ linux-3.6/include/linux/memory.h2012-10-04 14:44:40.902581028 +0900
@@ -106,6 +106,10 @@ static inline int memory_isolate_notify(
 {
return 0;
 }
+static inline bool is_memblk_offline(unsigned long start, unsigned long size)
+{
+   return false;
+}
 #else
 extern int register_memory_notifier(struct notifier_block *nb);
 extern void unregister_memory_notifier(struct notifier_block *nb);
@@ -120,6 +124,7 @@ extern int memory_isolate_notify(unsigne
 extern struct memory_block *find_memory_block_hinted(struct mem_section *,
struct memory_block *);
 extern struct memory_block *find_memory_block(struct mem_section *);
+extern bool is_memblk_offline(unsigned long start, unsigned long size);
 #define CONFIG_MEM_BLOCK_SIZE  (PAGES_PER_SECTION<<PAGE_SHIFT)
 enum mem_add_context { BOOT, HOTPLUG };
 #endif /* CONFIG_MEMORY_HOTPLUG_SPARSE */
Index: linux-3.6/mm/memory_hotplug.c
===
--- linux-3.6.orig/mm/memory_hotplug.c  2012-10-04 14:31:08.0 +0900
+++ linux-3.6/mm/memory_hotplug.c   2012-10-04 14:58:22.449687986 +0900
@@ -1045,8 +1045,21 @@ int offline_memory(u64 start, u64 size)
 
 int remove_memory(int nid, u64 start, u64 size)
 {
-   /* It is not implemented yet*/
-   return 0;
+   int ret = 0;
+   lock_memory_hotplug();
+   /*
+* The memory might be onlined by another task, even if you offlined it.
+* So we check whether the memory has been onlined or not.
+*/
+   if (!is_memblk_offline(start, size)) {
+   pr_warn("memory removing [mem %#010llx-%#010llx] failed, "
+   "because the memory range is online\n",
+   start, start + size);
+   ret = -EAGAIN;
+   }
+
+   unlock_memory_hotplug();
+   return ret;
 }
 EXPORT_SYMBOL_GPL(remove_memory);
 #else



[PATCH 2/10] memory-hotplug : remove /sys/firmware/memmap/X sysfs

2012-10-04 Thread Yasuaki Ishimatsu
When (hot)adding memory into system, /sys/firmware/memmap/X/{end, start, type}
sysfs files are created. But there is no code to remove these files. The patch
implements the function to remove them.

Note : The code does not free firmware_map_entry since there is no way to free
   memory which is allocated by bootmem.

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com 
Signed-off-by: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

---
 drivers/firmware/memmap.c|   98 ++-
 include/linux/firmware-map.h |6 ++
 mm/memory_hotplug.c  |7 ++-
 3 files changed, 108 insertions(+), 3 deletions(-)

Index: linux-3.6/drivers/firmware/memmap.c
===
--- linux-3.6.orig/drivers/firmware/memmap.c2012-10-04 18:27:05.195500420 
+0900
+++ linux-3.6/drivers/firmware/memmap.c 2012-10-04 18:27:18.901514330 +0900
@@ -21,6 +21,7 @@
 #include <linux/types.h>
 #include <linux/bootmem.h>
 #include <linux/slab.h>
+#include <linux/mm.h>
 
 /*
  * Data types 
--
@@ -41,6 +42,7 @@ struct firmware_map_entry {
const char  *type;  /* type of the memory range */
struct list_headlist;   /* entry for the linked list */
struct kobject  kobj;   /* kobject for each entry */
+   unsigned intbootmem:1; /* allocated from bootmem */
 };
 
 /*
@@ -79,7 +81,26 @@ static const struct sysfs_ops memmap_att
.show = memmap_attr_show,
 };
 
+
+static inline struct firmware_map_entry *
+to_memmap_entry(struct kobject *kobj)
+{
+   return container_of(kobj, struct firmware_map_entry, kobj);
+}
+
+static void release_firmware_map_entry(struct kobject *kobj)
+{
+   struct firmware_map_entry *entry = to_memmap_entry(kobj);
+
+   if (entry->bootmem)
+   /* There is no way to free memory allocated from bootmem */
+   return;
+
+   kfree(entry);
+}
+
 static struct kobj_type memmap_ktype = {
+   .release= release_firmware_map_entry,
.sysfs_ops  = &memmap_attr_ops,
.default_attrs  = def_attrs,
 };
@@ -94,6 +115,7 @@ static struct kobj_type memmap_ktype = {
  * in firmware initialisation code in one single thread of execution.
  */
 static LIST_HEAD(map_entries);
+static DEFINE_SPINLOCK(map_entries_lock);
 
 /**
  * firmware_map_add_entry() - Does the real work to add a firmware memmap 
entry.
@@ -118,11 +140,25 @@ static int firmware_map_add_entry(u64 st
INIT_LIST_HEAD(&entry->list);
kobject_init(&entry->kobj, &memmap_ktype);
 
+   spin_lock(&map_entries_lock);
list_add_tail(&entry->list, &map_entries);
+   spin_unlock(&map_entries_lock);
 
return 0;
 }
 
+/**
+ * firmware_map_remove_entry() - Does the real work to remove a firmware
+ * memmap entry.
+ * @entry: removed entry.
+ **/
+static inline void firmware_map_remove_entry(struct firmware_map_entry *entry)
+{
+   spin_lock(&map_entries_lock);
+   list_del(&entry->list);
+   spin_unlock(&map_entries_lock);
+}
+
 /*
  * Add memmap entry on sysfs
  */
@@ -144,6 +180,35 @@ static int add_sysfs_fw_map_entry(struct
return 0;
 }
 
+/*
+ * Remove memmap entry on sysfs
+ */
+static inline void remove_sysfs_fw_map_entry(struct firmware_map_entry *entry)
+{
+   kobject_put(&entry->kobj);
+}
+
+/*
+ * Search memmap entry
+ */
+
+static struct firmware_map_entry * __meminit
+firmware_map_find_entry(u64 start, u64 end, const char *type)
+{
+   struct firmware_map_entry *entry;
+
+   spin_lock(&map_entries_lock);
+   list_for_each_entry(entry, &map_entries, list)
+       if ((entry->start == start) && (entry->end == end) &&
+           (!strcmp(entry->type, type))) {
+           spin_unlock(&map_entries_lock);
+           return entry;
+       }
+
+   spin_unlock(&map_entries_lock);
+   return NULL;
+}
+
 /**
  * firmware_map_add_hotplug() - Adds a firmware mapping entry when we do
  * memory hotplug.
@@ -193,9 +258,36 @@ int __init firmware_map_add_early(u64 st
if (WARN_ON(!entry))
return -ENOMEM;
 
+   entry->bootmem = 1;
return firmware_map_add_entry(start, end, type, entry);
 }
 
+/**
+ * firmware_map_remove() - remove a firmware mapping entry
+ * @start: Start of the memory range.
+ * @end:   End of the memory range.
+ * @type:  Type of the memory range.
+ *
+ * removes a firmware mapping entry.
+ *
+ * Returns 0 on success, or -EINVAL if no entry.
+ **/
+int __meminit firmware_map_remove(u64 start, u64 end, const char *type)
+{
+   struct firmware_map_entry *entry;
+
+   entry

[PATCH 3/10] memory-hotplug : introduce new function arch_remove_memory() for removing page table depends on architecture

2012-10-04 Thread Yasuaki Ishimatsu
From: Wen Congyang we...@cn.fujitsu.com

To remove memory, we need to remove its page tables, but how to do that
depends on the architecture. So this patch introduces arch_remove_memory()
for removing page tables. For now it only calls __remove_pages().

Note: __remove_pages() is not implemented for some architectures
  (I don't know how to implement it for s390).
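
As a rough illustration of the common part of these arch_remove_memory()
variants, the sketch below converts a physical byte range into the
(start_pfn, nr_pages) pair that gets handed to __remove_pages(). It is a
user-space model, not kernel code: PAGE_SHIFT is assumed to be 12, and
struct pfn_range and bytes_to_pfn_range() are hypothetical names.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages; the real PAGE_SHIFT is per-architecture. */
#define PAGE_SHIFT 12

struct pfn_range {
	uint64_t start_pfn;	/* first page frame number in the range */
	uint64_t nr_pages;	/* number of page frames in the range */
};

/*
 * Convert a physical byte range into page-frame units by shifting
 * both the start address and the size right by PAGE_SHIFT.
 */
static struct pfn_range bytes_to_pfn_range(uint64_t start, uint64_t size)
{
	struct pfn_range r;

	r.start_pfn = start >> PAGE_SHIFT;
	r.nr_pages = size >> PAGE_SHIFT;
	return r;
}
```

Under these assumptions, a 128 MiB range starting at 4 GiB becomes
start_pfn 0x100000 with 0x8000 pages.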

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
CC: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
Signed-off-by: Wen Congyang we...@cn.fujitsu.com
---
 arch/ia64/mm/init.c|   18 ++
 arch/powerpc/mm/mem.c  |   12 
 arch/s390/mm/init.c|   12 
 arch/sh/mm/init.c  |   17 +
 arch/tile/mm/init.c|8 
 arch/x86/mm/init_32.c  |   12 
 arch/x86/mm/init_64.c  |   15 +++
 include/linux/memory_hotplug.h |1 +
 mm/memory_hotplug.c|1 +
 9 files changed, 96 insertions(+)

Index: linux-3.6/arch/ia64/mm/init.c
===
--- linux-3.6.orig/arch/ia64/mm/init.c  2012-10-04 18:27:03.082498276 +0900
+++ linux-3.6/arch/ia64/mm/init.c   2012-10-04 18:28:50.087606867 +0900
@@ -688,6 +688,24 @@ int arch_add_memory(int nid, u64 start, 
 
return ret;
 }
+
+#ifdef CONFIG_MEMORY_HOTREMOVE
+int arch_remove_memory(u64 start, u64 size)
+{
+   unsigned long start_pfn = start >> PAGE_SHIFT;
+   unsigned long nr_pages = size >> PAGE_SHIFT;
+   struct zone *zone;
+   int ret;
+
+   zone = page_zone(pfn_to_page(start_pfn));
+   ret = __remove_pages(zone, start_pfn, nr_pages);
+   if (ret)
+       pr_warn("%s: Problem encountered in __remove_pages() as"
+           " ret=%d\n", __func__,  ret);
+
+   return ret;
+}
+#endif
 #endif
 
 /*
Index: linux-3.6/arch/powerpc/mm/mem.c
===
--- linux-3.6.orig/arch/powerpc/mm/mem.c    2012-10-04 18:27:03.084498278 +0900
+++ linux-3.6/arch/powerpc/mm/mem.c 2012-10-04 18:28:50.094606874 +0900
@@ -133,6 +133,18 @@ int arch_add_memory(int nid, u64 start, 
 
return __add_pages(nid, zone, start_pfn, nr_pages);
 }
+
+#ifdef CONFIG_MEMORY_HOTREMOVE
+int arch_remove_memory(u64 start, u64 size)
+{
+   unsigned long start_pfn = start >> PAGE_SHIFT;
+   unsigned long nr_pages = size >> PAGE_SHIFT;
+   struct zone *zone;
+
+   zone = page_zone(pfn_to_page(start_pfn));
+   return __remove_pages(zone, start_pfn, nr_pages);
+}
+#endif
 #endif /* CONFIG_MEMORY_HOTPLUG */
 
 /*
Index: linux-3.6/arch/s390/mm/init.c
===
--- linux-3.6.orig/arch/s390/mm/init.c  2012-10-04 18:27:03.080498274 +0900
+++ linux-3.6/arch/s390/mm/init.c   2012-10-04 18:28:50.104606884 +0900
@@ -257,4 +257,16 @@ int arch_add_memory(int nid, u64 start, 
vmem_remove_mapping(start, size);
return rc;
 }
+
+#ifdef CONFIG_MEMORY_HOTREMOVE
+int arch_remove_memory(u64 start, u64 size)
+{
+   /*
+* There is no hardware or firmware interface which could trigger a
+* hot memory remove on s390. So there is nothing that needs to be
+* implemented.
+*/
+   return -EBUSY;
+}
+#endif
 #endif /* CONFIG_MEMORY_HOTPLUG */
Index: linux-3.6/arch/sh/mm/init.c
===
--- linux-3.6.orig/arch/sh/mm/init.c2012-10-04 18:27:03.091498285 +0900
+++ linux-3.6/arch/sh/mm/init.c 2012-10-04 18:28:50.116606897 +0900
@@ -558,4 +558,21 @@ int memory_add_physaddr_to_nid(u64 addr)
 EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
 #endif
 
+#ifdef CONFIG_MEMORY_HOTREMOVE
+int arch_remove_memory(u64 start, u64 size)
+{
+   unsigned long start_pfn = start >> PAGE_SHIFT;
+   unsigned long nr_pages = size >> PAGE_SHIFT;
+   struct zone *zone;
+   int ret;
+
+   zone = page_zone(pfn_to_page(start_pfn));
+   ret = __remove_pages(zone, start_pfn, nr_pages);
+   if (unlikely(ret))
+       pr_warn("%s: Failed, __remove_pages() == %d\n", __func__,
+           ret);
+
+   return ret;
+}
+#endif
 #endif /* CONFIG_MEMORY_HOTPLUG */
Index: linux-3.6/arch/tile/mm/init.c
===
--- linux-3.6.orig/arch/tile/mm/init.c  2012-10-04 18:27:03.078498272 +0900
+++ linux-3.6/arch/tile/mm/init.c   2012-10-04 18:28:50.122606903 +0900
@@ -935,6 +935,14 @@ int remove_memory(u64 start, u64 size)
 {
return -EINVAL;
 }
+
+#ifdef CONFIG_MEMORY_HOTREMOVE
+int arch_remove_memory(u64

[PATCH 4/10] memory-hotplug : unregister memory section on SPARSEMEM_VMEMMAP

2012-10-04 Thread Yasuaki Ishimatsu
Currently __remove_section() for SPARSEMEM_VMEMMAP does nothing. But even
with SPARSEMEM_VMEMMAP, we can unregister the memory section.

So this patch adds a call to unregister_memory_section() to __remove_section().

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com 
CC: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
---
 mm/memory_hotplug.c |   13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

Index: linux-3.6/mm/memory_hotplug.c
===
--- linux-3.6.orig/mm/memory_hotplug.c  2012-10-04 18:29:50.577668254 +0900
+++ linux-3.6/mm/memory_hotplug.c   2012-10-04 18:29:58.284676075 +0900
@@ -279,11 +279,14 @@ static int __meminit __add_section(int n
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
 static int __remove_section(struct zone *zone, struct mem_section *ms)
 {
-   /*
-* XXX: Freeing memmap with vmemmap is not implement yet.
-*  This should be removed later.
-*/
-   return -EBUSY;
+   int ret = -EINVAL;
+
+   if (!valid_section(ms))
+   return ret;
+
+   ret = unregister_memory_section(ms);
+
+   return ret;
 }
 #else
 static int __remove_section(struct zone *zone, struct mem_section *ms)

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[PATCH 5/10] memory-hotplug : memory-hotplug: check page type in get_page_bootmem

2012-10-04 Thread Yasuaki Ishimatsu
get_page_bootmem() may be called more than once for the same page. There is
no need to set the page's type and private data again if this is not the
first call for that page.

Note: this patch is just an optimization and does not fix any problem.
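
The idea can be modelled in user space roughly as below. This is only a
sketch: struct fake_page stands in for struct page, and the type bounds are
hypothetical values, not the kernel's MEMORY_HOTPLUG_MIN_BOOTMEM_TYPE and
MEMORY_HOTPLUG_MAX_BOOTMEM_TYPE.

```c
/* Hypothetical bounds for "this page already has a bootmem type". */
#define MIN_BOOTMEM_TYPE 12UL
#define MAX_BOOTMEM_TYPE 15UL

/* Minimal stand-in for the struct page fields the function touches. */
struct fake_page {
	unsigned long type;	/* models page->lru.next */
	unsigned long info;	/* models page_private(page) */
	int is_private;		/* models the PagePrivate flag */
	int count;		/* models page->_count */
};

/*
 * Type and private data are written only on the first call for a page;
 * every call, first or not, takes one reference.
 */
static void get_page_bootmem_model(unsigned long info, struct fake_page *page,
				   unsigned long type)
{
	if (page->type < MIN_BOOTMEM_TYPE || page->type > MAX_BOOTMEM_TYPE) {
		page->type = type;
		page->is_private = 1;
		page->info = info;
	}
	page->count++;
}
```

Calling the model twice on the same page keeps the type and info from the
first call and leaves the reference count at 2.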

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
CC: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
---
 mm/memory_hotplug.c |   15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

Index: linux-3.6/mm/memory_hotplug.c
===
--- linux-3.6.orig/mm/memory_hotplug.c  2012-10-04 18:29:58.284676075 +0900
+++ linux-3.6/mm/memory_hotplug.c   2012-10-04 18:30:03.454680542 +0900
@@ -95,10 +95,17 @@ static void release_memory_resource(stru
 static void get_page_bootmem(unsigned long info,  struct page *page,
 unsigned long type)
 {
-   page->lru.next = (struct list_head *) type;
-   SetPagePrivate(page);
-   set_page_private(page, info);
-   atomic_inc(&page->_count);
+   unsigned long page_type;
+
+   page_type = (unsigned long)page->lru.next;
+   if (page_type < MEMORY_HOTPLUG_MIN_BOOTMEM_TYPE ||
+       page_type > MEMORY_HOTPLUG_MAX_BOOTMEM_TYPE){
+       page->lru.next = (struct list_head *)type;
+       SetPagePrivate(page);
+       set_page_private(page, info);
+       atomic_inc(&page->_count);
+   } else
+       atomic_inc(&page->_count);
 }
 
 /* reference to __meminit __free_pages_bootmem is valid



[PATCH 6/10] memory-hotplug : implement register_page_bootmem_info_section of sparse-vmemmap

2012-10-04 Thread Yasuaki Ishimatsu
To remove a sparse-vmemmap memmap region that was allocated from bootmem,
the region first needs to be registered with get_page_bootmem(). So this
patch walks the pages of the virtual mapping and registers them with
get_page_bootmem().

Note: register_page_bootmem_memmap() is not implemented for ia64, ppc, s390,
and sparc.
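
For a feel of how many pages are involved per section, the mapsize
computation used in the patch can be reproduced in user space as below. This
is a sketch under assumptions: 4 KiB pages, and the struct page size and
PAGES_PER_SECTION passed in as plain numbers; memmap_pages() is a
hypothetical helper name.

```c
#define PAGE_SHIFT 12			/* assumption: 4 KiB pages */
#define PAGE_SIZE (1UL << PAGE_SHIFT)
/* Round x up to the next multiple of PAGE_SIZE. */
#define PAGE_ALIGN(x) (((x) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1))

/*
 * Number of pages occupied by one section's memmap, i.e.
 * PAGE_ALIGN(sizeof(struct page) * PAGES_PER_SECTION) >> PAGE_SHIFT.
 */
static unsigned long memmap_pages(unsigned long struct_page_size,
				  unsigned long pages_per_section)
{
	return PAGE_ALIGN(struct_page_size * pages_per_section) >> PAGE_SHIFT;
}
```

With a 64-byte struct page and 32768 pages per section (both assumed values),
the memmap of one section takes 512 pages, each of which would be registered.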

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
Signed-off-by: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
---
 arch/ia64/mm/discontig.c   |6 
 arch/powerpc/mm/init_64.c  |6 
 arch/s390/mm/vmem.c|6 
 arch/sparc/mm/init_64.c|6 
 arch/x86/mm/init_64.c  |   52 +
 include/linux/memory_hotplug.h |   11 +---
 include/linux/mm.h |3 +-
 mm/memory_hotplug.c|   37 ++---
 8 files changed, 113 insertions(+), 14 deletions(-)

Index: linux-3.6/include/linux/memory_hotplug.h
===
--- linux-3.6.orig/include/linux/memory_hotplug.h   2012-10-04 17:15:03.029828127 +0900
+++ linux-3.6/include/linux/memory_hotplug.h    2012-10-04 17:15:59.010833688 +0900
@@ -163,17 +163,10 @@ static inline void arch_refresh_nodedata
 #endif /* CONFIG_NUMA */
 #endif /* CONFIG_HAVE_ARCH_NODEDATA_EXTENSION */
 
-#ifdef CONFIG_SPARSEMEM_VMEMMAP
-static inline void register_page_bootmem_info_node(struct pglist_data *pgdat)
-{
-}
-static inline void put_page_bootmem(struct page *page)
-{
-}
-#else
 extern void register_page_bootmem_info_node(struct pglist_data *pgdat);
 extern void put_page_bootmem(struct page *page);
-#endif
+extern void get_page_bootmem(unsigned long info, struct page *page,
+                unsigned long type);
 
 /*
  * Lock for memory hotplug guarantees 1) all callbacks for memory hotplug
Index: linux-3.6/mm/memory_hotplug.c
===
--- linux-3.6.orig/mm/memory_hotplug.c  2012-10-04 17:15:27.213831361 +0900
+++ linux-3.6/mm/memory_hotplug.c   2012-10-04 17:37:00.176401540 +0900
@@ -91,9 +91,8 @@ static void release_memory_resource(stru
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
-#ifndef CONFIG_SPARSEMEM_VMEMMAP
-static void get_page_bootmem(unsigned long info,  struct page *page,
-unsigned long type)
+void get_page_bootmem(unsigned long info,  struct page *page,
+ unsigned long type)
 {
unsigned long page_type;
 
@@ -127,6 +126,7 @@ void __ref put_page_bootmem(struct page 
 
 }
 
+#ifndef CONFIG_SPARSEMEM_VMEMMAP
 static void register_page_bootmem_info_section(unsigned long start_pfn)
 {
unsigned long *usemap, mapsize, section_nr, i;
@@ -160,6 +160,36 @@ static void register_page_bootmem_info_s
get_page_bootmem(section_nr, page, MIX_SECTION_INFO);
 
 }
+#else
+static void register_page_bootmem_info_section(unsigned long start_pfn)
+{
+   unsigned long *usemap, mapsize, section_nr, i;
+   struct mem_section *ms;
+   struct page *page, *memmap;
+
+   if (!pfn_valid(start_pfn))
+   return;
+
+   section_nr = pfn_to_section_nr(start_pfn);
+   ms = __nr_to_section(section_nr);
+
+   memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
+
+   page = virt_to_page(memmap);
+   mapsize = sizeof(struct page) * PAGES_PER_SECTION;
+   mapsize = PAGE_ALIGN(mapsize) >> PAGE_SHIFT;
+
+   register_page_bootmem_memmap(section_nr, memmap, PAGES_PER_SECTION);
+
+   usemap = __nr_to_section(section_nr)->pageblock_flags;
+   page = virt_to_page(usemap);
+
+   mapsize = PAGE_ALIGN(usemap_size()) >> PAGE_SHIFT;
+
+   for (i = 0; i < mapsize; i++, page++)
+       get_page_bootmem(section_nr, page, MIX_SECTION_INFO);
+}
+#endif
 
 void register_page_bootmem_info_node(struct pglist_data *pgdat)
 {
@@ -202,7 +232,6 @@ void register_page_bootmem_info_node(str
register_page_bootmem_info_section(pfn);
}
 }
-#endif /* !CONFIG_SPARSEMEM_VMEMMAP */
 
 static void grow_zone_span(struct zone *zone, unsigned long start_pfn,
   unsigned long end_pfn)
Index: linux-3.6/arch/ia64/mm/discontig.c
===
--- linux-3.6.orig/arch/ia64/mm/discontig.c 2012-10-01 08:47:46.0 +0900
+++ linux-3.6/arch/ia64/mm/discontig.c  2012-10-04 17:15:59.209833459 +0900
@@ -822,4 +822,10 @@ int __meminit vmemmap_populate(struct pa
 {
return vmemmap_populate_basepages(start_page, size, node);
 }
+
+void register_page_bootmem_memmap(unsigned long section_nr

[PATCH 7/10] memory-hotplug : remove memmap of sparse-vmemmap

2012-10-04 Thread Yasuaki Ishimatsu
Not all pages of the virtual mapping for removed memory can be freed, since
some pages used as PGD/PUD map not only the removed memory but also other
memory. So this patch checks whether each page can be freed.

How do we check whether a page can be freed?
 1. When removing memory, the page structs of the removed memory are filled
    with 0xFD.
 2. If all page structs on a PT/PMD page are filled with 0xFD, the PT/PMD
    entry can be cleared. In this case, the page used as PT/PMD can be freed.
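
The two steps above can be sketched in user space as follows. This is a model
under assumptions (4 KiB pages, a plain byte buffer standing in for a vmemmap
page); mark_removed() and page_fully_unused() are hypothetical names. The
0xFD marker is spelled PAGE_INUSE to match the define in the patch.

```c
#include <stddef.h>
#include <string.h>

#define PAGE_INUSE 0xFD		/* fill pattern for removed page structs */
#define PAGE_SIZE 4096		/* assumption: 4 KiB pages */

/* Step 1: fill the part of the page that described removed memory. */
static void mark_removed(unsigned char *page, size_t offset, size_t len)
{
	memset(page + offset, PAGE_INUSE, len);
}

/* Step 2: the page may be freed only if every byte carries the pattern. */
static int page_fully_unused(const unsigned char *page)
{
	size_t i;

	for (i = 0; i < PAGE_SIZE; i++)
		if (page[i] != PAGE_INUSE)
			return 0;
	return 1;
}
```

A page that still holds page structs for memory that stays online never
becomes fully 0xFD, so it is kept.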

With this patch applied, the two variants of __remove_section() are merged
into one, so the CONFIG_SPARSEMEM_VMEMMAP variant of __remove_section() is
deleted.

Note:  vmemmap_kfree() and vmemmap_free_bootmem() are not implemented for ia64,
ppc, s390, and sparc.

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
CC: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
---
 arch/ia64/mm/discontig.c  |8 +++
 arch/powerpc/mm/init_64.c |8 +++
 arch/s390/mm/vmem.c   |8 +++
 arch/sparc/mm/init_64.c   |8 +++
 arch/x86/mm/init_64.c |  119 ++
 include/linux/mm.h|2 
 mm/memory_hotplug.c   |   17 --
 mm/sparse.c   |5 +
 8 files changed, 158 insertions(+), 17 deletions(-)

Index: linux-3.6/arch/ia64/mm/discontig.c
===
--- linux-3.6.orig/arch/ia64/mm/discontig.c 2012-10-04 18:30:15.475692638 +0900
+++ linux-3.6/arch/ia64/mm/discontig.c  2012-10-04 18:30:21.145698389 +0900
@@ -823,6 +823,14 @@ int __meminit vmemmap_populate(struct pa
return vmemmap_populate_basepages(start_page, size, node);
 }
 
+void vmemmap_kfree(struct page *memmap, unsigned long nr_pages)
+{
+}
+
+void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages)
+{
+}
+
 void register_page_bootmem_memmap(unsigned long section_nr,
  struct page *start_page, unsigned long size)
 {
Index: linux-3.6/arch/powerpc/mm/init_64.c
===
--- linux-3.6.orig/arch/powerpc/mm/init_64.c    2012-10-04 18:30:15.494692657 +0900
+++ linux-3.6/arch/powerpc/mm/init_64.c 2012-10-04 18:30:21.150698394 +0900
@@ -299,6 +299,14 @@ int __meminit vmemmap_populate(struct pa
return 0;
 }
 
+void vmemmap_kfree(struct page *memmap, unsigned long nr_pages)
+{
+}
+
+void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages)
+{
+}
+
 void register_page_bootmem_memmap(unsigned long section_nr,
  struct page *start_page, unsigned long size)
 {
Index: linux-3.6/arch/s390/mm/vmem.c
===
--- linux-3.6.orig/arch/s390/mm/vmem.c  2012-10-04 18:30:15.506692670 +0900
+++ linux-3.6/arch/s390/mm/vmem.c   2012-10-04 18:30:21.157698401 +0900
@@ -227,6 +227,14 @@ out:
return ret;
 }
 
+void vmemmap_kfree(struct page *memmap, unsigned long nr_pages)
+{
+}
+
+void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages)
+{
+}
+
 void register_page_bootmem_memmap(unsigned long section_nr,
  struct page *start_page, unsigned long size)
 {
Index: linux-3.6/arch/sparc/mm/init_64.c
===
--- linux-3.6.orig/arch/sparc/mm/init_64.c  2012-10-04 18:30:15.512692676 +0900
+++ linux-3.6/arch/sparc/mm/init_64.c   2012-10-04 18:30:21.163698408 +0900
@@ -2078,6 +2078,14 @@ void __meminit vmemmap_populate_print_la
}
 }
 
+void vmemmap_kfree(struct page *memmap, unsigned long nr_pages)
+{
+}
+
+void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages)
+{
+}
+
 void register_page_bootmem_memmap(unsigned long section_nr,
  struct page *start_page, unsigned long size)
 {
Index: linux-3.6/arch/x86/mm/init_64.c
===
--- linux-3.6.orig/arch/x86/mm/init_64.c    2012-10-04 18:30:15.517692681 +0900
+++ linux-3.6/arch/x86/mm/init_64.c 2012-10-04 18:30:21.171698416 +0900
@@ -993,6 +993,125 @@ vmemmap_populate(struct page *start_page
return 0;
 }
 
+#define PAGE_INUSE 0xFD
+
+unsigned long find_and_clear_pte_page(unsigned long addr, unsigned long end,
+   struct page **pp, int *page_size)
+{
+   pgd_t *pgd;
+   pud_t *pud;
+   pmd_t *pmd;
+   pte_t *pte;
+   void *page_addr;
+   unsigned long next;
+
+   *pp = NULL;
+
+   pgd = pgd_offset_k(addr);
+   if (pgd_none(*pgd))
+   return pgd_addr_end(addr, end);
+
+   pud = pud_offset(pgd, addr);
+   if (pud_none(*pud))
+   return

[PATCH 8/10] memory-hotplug : remove page table of x86_64 architecture

2012-10-04 Thread Yasuaki Ishimatsu
From: Wen Congyang we...@cn.fujitsu.com

To hot-remove memory, we should also remove the page tables that map it.
So this patch walks the page tables covering the removed memory and clears
their entries.
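
One detail of the walk is deciding whether a large mapping can be cleared as
a whole or has to be split first. A user-space sketch of that test at the PMD
level is below, assuming 2 MiB PMD mappings (PMD_SHIFT of 21);
covers_whole_pmd() is a hypothetical helper name.

```c
#define PMD_SHIFT 21			/* assumption: 2 MiB large pages */
#define PMD_SIZE (1UL << PMD_SHIFT)
#define PMD_MASK (~(PMD_SIZE - 1))

/* Start of the next PMD-sized region after the one containing addr. */
static unsigned long pmd_next(unsigned long addr)
{
	return (addr & PMD_MASK) + PMD_SIZE;
}

/*
 * A large mapping can be cleared in one go only if the removal starts on
 * a PMD boundary and extends at least to the end of that PMD region.
 * Otherwise it must be split into smaller pages before partial removal.
 */
static int covers_whole_pmd(unsigned long addr, unsigned long end)
{
	return (addr & ~PMD_MASK) == 0 && pmd_next(addr) <= end;
}
```

For example, removing [0x200000, 0x400000) covers the whole 2 MiB mapping at
0x200000, while [0x201000, 0x400000) or [0x200000, 0x300000) does not.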

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
CC: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
Signed-off-by: Wen Congyang we...@cn.fujitsu.com
---
 arch/x86/include/asm/pgtable_types.h |1 
 arch/x86/mm/init_64.c|  147 +++
 arch/x86/mm/pageattr.c   |   47 +--
 3 files changed, 173 insertions(+), 22 deletions(-)

Index: linux-3.6/arch/x86/mm/init_64.c
===
--- linux-3.6.orig/arch/x86/mm/init_64.c    2012-10-04 18:30:21.171698416 +0900
+++ linux-3.6/arch/x86/mm/init_64.c 2012-10-04 18:30:27.317704652 +0900
@@ -675,6 +675,151 @@ int arch_add_memory(int nid, u64 start, 
 }
 EXPORT_SYMBOL_GPL(arch_add_memory);
 
+static void __meminit
+phys_pte_remove(pte_t *pte_page, unsigned long addr, unsigned long end)
+{
+   unsigned pages = 0;
+   int i = pte_index(addr);
+
+   pte_t *pte = pte_page + pte_index(addr);
+
+   for (; i < PTRS_PER_PTE; i++, addr += PAGE_SIZE, pte++) {
+
+       if (addr >= end)
+           break;
+
+       if (!pte_present(*pte))
+           continue;
+
+       pages++;
+       set_pte(pte, __pte(0));
+   }
+
+   update_page_count(PG_LEVEL_4K, -pages);
+}
+
+static void __meminit
+phys_pmd_remove(pmd_t *pmd_page, unsigned long addr, unsigned long end)
+{
+   unsigned long pages = 0, next;
+   int i = pmd_index(addr);
+
+   for (; i < PTRS_PER_PMD; i++, addr = next) {
+       unsigned long pte_phys;
+       pmd_t *pmd = pmd_page + pmd_index(addr);
+       pte_t *pte;
+
+       if (addr >= end)
+           break;
+
+       next = (addr & PMD_MASK) + PMD_SIZE;
+
+       if (!pmd_present(*pmd))
+           continue;
+
+       if (pmd_large(*pmd)) {
+           if ((addr & ~PMD_MASK) == 0 && next <= end) {
+               set_pmd(pmd, __pmd(0));
+               pages++;
+               continue;
+           }
+
+           /*
+            * We use 2M page, but we need to remove part of them,
+            * so split 2M page to 4K page.
+            */
+           pte = alloc_low_page(&pte_phys);
+           __split_large_page((pte_t *)pmd, addr, pte);
+
+           spin_lock(&init_mm.page_table_lock);
+           pmd_populate_kernel(&init_mm, pmd, __va(pte_phys));
+           spin_unlock(&init_mm.page_table_lock);
+       }
+
+       spin_lock(&init_mm.page_table_lock);
+       pte = map_low_page((pte_t *)pmd_page_vaddr(*pmd));
+       phys_pte_remove(pte, addr, end);
+       unmap_low_page(pte);
+       spin_unlock(&init_mm.page_table_lock);
+   }
+   update_page_count(PG_LEVEL_2M, -pages);
+}
+
+static void __meminit
+phys_pud_remove(pud_t *pud_page, unsigned long addr, unsigned long end)
+{
+   unsigned long pages = 0, next;
+   int i = pud_index(addr);
+
+   for (; i < PTRS_PER_PUD; i++, addr = next) {
+       unsigned long pmd_phys;
+       pud_t *pud = pud_page + pud_index(addr);
+       pmd_t *pmd;
+
+       if (addr >= end)
+           break;
+
+       next = (addr & PUD_MASK) + PUD_SIZE;
+
+       if (!pud_present(*pud))
+           continue;
+
+       if (pud_large(*pud)) {
+           if ((addr & ~PUD_MASK) == 0 && next <= end) {
+               set_pud(pud, __pud(0));
+               pages++;
+               continue;
+           }
+
+           /*
+            * We use 1G page, but we need to remove part of them,
+            * so split 1G page to 2M page.
+            */
+           pmd = alloc_low_page(&pmd_phys);
+           __split_large_page((pte_t *)pud, addr, (pte_t *)pmd);
+
+           spin_lock(&init_mm.page_table_lock);
+           pud_populate(&init_mm, pud, __va(pmd_phys));
+           spin_unlock(&init_mm.page_table_lock);
+       }
+
+       pmd = map_low_page(pmd_offset(pud, 0));
+       phys_pmd_remove(pmd, addr, end);
+       unmap_low_page(pmd);
+       __flush_tlb_all();
+   }
+   __flush_tlb_all

[PATCH 9/10] memory-hotplug : memory_hotplug: clear zone when removing the memory

2012-10-04 Thread Yasuaki Ishimatsu
When memory is added, we update the zone's and pgdat's start_pfn and
spanned_pages in __add_zone(). So we should revert them when the memory
is removed.

This patch adds a new function, __remove_zone(), to do this.
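
The shrinking logic can be modelled in user space roughly as below. This is a
sketch, not the kernel code: sections are plain array indices, validity is a
bool array in which the removed section has already been cleared, the span is
assumed non-empty, and struct span, find_smallest(), find_biggest() and
shrink_span() are hypothetical names. Unlike the patch, which uses pfn 0 as
"not found", the model returns -1 so that index 0 stays usable.

```c
#include <stdbool.h>
#include <stddef.h>

struct span {
	size_t start;	/* first section of the span */
	size_t len;	/* number of sections spanned, holes included */
};

/* Lowest valid section in [from, to), or -1 if there is none. */
static long find_smallest(const bool *valid, size_t from, size_t to)
{
	size_t i;

	for (i = from; i < to; i++)
		if (valid[i])
			return (long)i;
	return -1;
}

/* Highest valid section in [from, to), or -1 if there is none. */
static long find_biggest(const bool *valid, size_t from, size_t to)
{
	size_t i;

	for (i = to; i > from; i--)
		if (valid[i - 1])
			return (long)(i - 1);
	return -1;
}

/* Shrink the span after section 'removed' has been marked invalid. */
static void shrink_span(struct span *z, const bool *valid, size_t removed)
{
	size_t end = z->start + z->len;
	long s;

	if (removed == z->start) {
		/* Lowest section went away: move the start up. */
		s = find_smallest(valid, removed + 1, end);
		if (s >= 0) {
			z->len = end - (size_t)s;
			z->start = (size_t)s;
			return;
		}
	} else if (removed == end - 1) {
		/* Highest section went away: pull the end down. */
		s = find_biggest(valid, z->start, removed);
		if (s >= 0) {
			z->len = (size_t)s - z->start + 1;
			return;
		}
	} else if (find_smallest(valid, z->start, end) >= 0) {
		/* Only a hole was created: the span is unchanged. */
		return;
	}

	/* Nothing valid is left: the span becomes empty. */
	z->start = 0;
	z->len = 0;
}
```

Removing the first section of a span that continues after a hole moves the
start past the hole; removing the last valid section empties the span.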

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
Signed-off-by: Wen Congyang we...@cn.fujitsu.com
---
 mm/memory_hotplug.c |  207 
 1 file changed, 207 insertions(+)

Index: linux-3.6/mm/memory_hotplug.c
===
--- linux-3.6.orig/mm/memory_hotplug.c  2012-10-04 18:30:21.182698427 +0900
+++ linux-3.6/mm/memory_hotplug.c   2012-10-04 18:30:31.767709165 +0900
@@ -312,10 +312,213 @@ static int __meminit __add_section(int n
return register_new_memory(nid, __pfn_to_section(phys_start_pfn));
 }
 
+/* find the smallest valid pfn in the range [start_pfn, end_pfn) */
+static int find_smallest_section_pfn(int nid, struct zone *zone,
+unsigned long start_pfn,
+unsigned long end_pfn)
+{
+   struct mem_section *ms;
+
+   for (; start_pfn < end_pfn; start_pfn += PAGES_PER_SECTION) {
+       ms = __pfn_to_section(start_pfn);
+
+       if (unlikely(!valid_section(ms)))
+           continue;
+
+       if (unlikely(pfn_to_nid(start_pfn) != nid))
+           continue;
+
+       if (zone && zone != page_zone(pfn_to_page(start_pfn)))
+           continue;
+
+       return start_pfn;
+   }
+
+   return 0;
+}
+
+/* find the biggest valid pfn in the range [start_pfn, end_pfn). */
+static int find_biggest_section_pfn(int nid, struct zone *zone,
+   unsigned long start_pfn,
+   unsigned long end_pfn)
+{
+   struct mem_section *ms;
+   unsigned long pfn;
+
+   /* pfn is the end pfn of a memory section. */
+   pfn = end_pfn - 1;
+   for (; pfn >= start_pfn; pfn -= PAGES_PER_SECTION) {
+       ms = __pfn_to_section(pfn);
+
+       if (unlikely(!valid_section(ms)))
+           continue;
+
+       if (unlikely(pfn_to_nid(pfn) != nid))
+           continue;
+
+       if (zone && zone != page_zone(pfn_to_page(pfn)))
+           continue;
+
+       return pfn;
+   }
+
+   return 0;
+}
+
+static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
+unsigned long end_pfn)
+{
+   unsigned long zone_start_pfn =  zone->zone_start_pfn;
+   unsigned long zone_end_pfn = zone->zone_start_pfn + zone->spanned_pages;
+   unsigned long pfn;
+   struct mem_section *ms;
+   int nid = zone_to_nid(zone);
+
+   zone_span_writelock(zone);
+   if (zone_start_pfn == start_pfn) {
+       /*
+        * If the section is the smallest section in the zone, we need
+        * to shrink zone->zone_start_pfn and zone->spanned_pages.
+        * In this case, we find the second smallest valid mem_section
+        * for shrinking the zone.
+        */
+       pfn = find_smallest_section_pfn(nid, zone, end_pfn,
+                       zone_end_pfn);
+       if (pfn) {
+           zone->zone_start_pfn = pfn;
+           zone->spanned_pages = zone_end_pfn - pfn;
+       }
+   } else if (zone_end_pfn == end_pfn) {
+       /*
+        * If the section is the biggest section in the zone, we need
+        * to shrink zone->spanned_pages.
+        * In this case, we find the second biggest valid mem_section
+        * for shrinking the zone.
+        */
+       pfn = find_biggest_section_pfn(nid, zone, zone_start_pfn,
+                          start_pfn);
+       if (pfn)
+           zone->spanned_pages = pfn - zone_start_pfn + 1;
+   }
+
+   /*
+* The section is not biggest or smallest mem_section in the zone, it
+* only creates a hole in the zone. So in this case, we need not
+* change the zone. But perhaps, the zone has only hole data. Thus
+* it check the zone has only hole or not.
+*/
+   pfn = zone_start_pfn;
+   for (; pfn < zone_end_pfn; pfn += PAGES_PER_SECTION) {
+       ms = __pfn_to_section(pfn);
+
+       if (unlikely(!valid_section(ms)))
+           continue;
+
+       if (page_zone(pfn_to_page(pfn)) != zone)
+           continue;
+
+/* If the section

[PATCH 10/10] memory-hotplug : remove sysfs file of node

2012-10-04 Thread Yasuaki Ishimatsu
From: Wen Congyang we...@cn.fujitsu.com

This patch introduces a new function, try_offline_node(), which removes
the sysfs file of a node once all memory sections of that node have been
removed. If some memory sections of the node have not been removed, the
function does nothing.
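
The decision made by try_offline_node() can be sketched in user space as
below. It is a simplified model: struct node_model and can_offline_node() are
hypothetical names, the section flags are assumed to cover only sections that
belong to this node, and the CPU check (which the patch runs under
stop_machine()) is reduced to a plain loop.

```c
#include <stdbool.h>
#include <stddef.h>

struct node_model {
	const bool *section_present;	/* one flag per section of the node */
	size_t nr_sections;
	const int *cpu_node;		/* node id of each online CPU */
	size_t nr_online_cpus;
	int nid;			/* id of the node being checked */
};

/*
 * A node may be offlined only if none of its memory sections is still
 * present and no online CPU belongs to it.
 */
static bool can_offline_node(const struct node_model *n)
{
	size_t i;

	for (i = 0; i < n->nr_sections; i++)
		if (n->section_present[i])
			return false;	/* some memory is not removed yet */

	for (i = 0; i < n->nr_online_cpus; i++)
		if (n->cpu_node[i] == n->nid)
			return false;	/* a CPU on this node is online */

	return true;
}
```

Only when both checks pass would the real function go on to call
node_set_offline() and unregister_one_node().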

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
CC: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
Signed-off-by: Wen Congyang we...@cn.fujitsu.com
---
 mm/memory_hotplug.c |   54 
 1 file changed, 54 insertions(+)

Index: linux-3.6/mm/memory_hotplug.c
===
--- linux-3.6.orig/mm/memory_hotplug.c  2012-10-04 18:30:31.767709165 +0900
+++ linux-3.6/mm/memory_hotplug.c   2012-10-04 18:32:46.907842637 +0900
@@ -29,6 +29,7 @@
 #include <linux/suspend.h>
 #include <linux/mm_inline.h>
 #include <linux/firmware-map.h>
+#include <linux/stop_machine.h>
 
 #include <asm/tlbflush.h>
 
@@ -1276,6 +1277,57 @@ int offline_memory(u64 start, u64 size)
return 0;
 }
 
+static int check_cpu_on_node(void *data)
+{
+   struct pglist_data *pgdat = data;
+   int cpu;
+
+   for_each_online_cpu(cpu) {
+       if (cpu_to_node(cpu) == pgdat->node_id)
+           /*
+            * the cpu on this node is onlined, and we can't
+            * offline this node.
+            */
+           return -EBUSY;
+   }
+
+   return 0;
+}
+
+/* offline the node if all memory sections of this node are removed */
+static void try_offline_node(int nid)
+{
+   unsigned long start_pfn = NODE_DATA(nid)->node_start_pfn;
+   unsigned long end_pfn = start_pfn + NODE_DATA(nid)->node_spanned_pages;
+   unsigned long pfn;
+
+   for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
+       unsigned long section_nr = pfn_to_section_nr(pfn);
+
+       if (!present_section_nr(section_nr))
+           continue;
+
+       if (pfn_to_nid(pfn) != nid)
+           continue;
+
+       /*
+        * some memory sections of this node are not removed, and we
+        * can't offline node now.
+        */
+       return;
+   }
+
+   if (stop_machine(check_cpu_on_node, NODE_DATA(nid), NULL))
+   return;
+
+   /*
+* all memory sections of this node are removed, we can offline this
+* node now.
+*/
+   node_set_offline(nid);
+   unregister_one_node(nid);
+}
+
 int __ref remove_memory(int nid, u64 start, u64 size)
 {
int ret = 0;
@@ -1296,6 +1348,8 @@ int __ref remove_memory(int nid, u64 sta
    firmware_map_remove(start, start + size, "System RAM");
 
arch_remove_memory(start, size);
+
+   try_offline_node(nid);
 out:
unlock_memory_hotplug();
return ret;



memory-hotplug : suppress "Trying to free nonexistent resource XXXXXXXXXXXXXXXX-YYYYYYYYYYYYYYYY" warning

2012-10-03 Thread Yasuaki Ishimatsu
When our x86 box calls __remove_pages(), release_mem_region() prints many
warnings like the one below, and the x86 box cannot unregister its
iomem_resource.

Trying to free nonexistent resource -

Since commit de7f0cba96786c, release_mem_region() has been called once per
PAGES_PER_SECTION chunk, because powerpc registers iomem_resource in
PAGES_PER_SECTION chunks. But when memory is hot-added on an x86 box,
iomem_resource is registered per _CRS entry, not per PAGES_PER_SECTION
chunk. So the x86 box cannot unregister iomem_resource.

This patch fixes the problem: __remove_pages() releases the whole range
with a single release_mem_region() call, and pseries calls __remove_pages()
once per section.
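
The per-section loop on the pseries side can be sketched in user space as
below. This is a model under assumptions: 4 KiB pages, 32768 pages per
section, and record_remove() as a hypothetical stand-in for __remove_pages()
that only records the pfn it was called with. Note that each call is given
the pfn of its own section, not the pfn of the start of the memblock.

```c
#include <stdint.h>

#define PAGE_SHIFT 12			/* assumption: 4 KiB pages */
#define PAGES_PER_SECTION 32768UL	/* assumption: 128 MiB sections */
#define MAX_SEEN 16

/* Hypothetical stand-in for __remove_pages(): records each call. */
static uint64_t seen_pfns[MAX_SEEN];
static int seen_count;

static int record_remove(uint64_t start_pfn, uint64_t nr_pages)
{
	if (seen_count >= MAX_SEEN || nr_pages != PAGES_PER_SECTION)
		return -1;
	seen_pfns[seen_count++] = start_pfn;
	return 0;
}

/* Remove a memblock one section at a time, stopping at the first error. */
static int remove_memblock_by_section(uint64_t base, uint64_t memblock_size,
				      int (*remove_fn)(uint64_t, uint64_t))
{
	uint64_t start_pfn = base >> PAGE_SHIFT;
	uint64_t sections = (memblock_size >> PAGE_SHIFT) / PAGES_PER_SECTION;
	uint64_t i;

	for (i = 0; i < sections; i++) {
		uint64_t pfn = start_pfn + i * PAGES_PER_SECTION;
		int ret = remove_fn(pfn, PAGES_PER_SECTION);

		if (ret)
			return ret;
	}
	return 0;
}
```

Under these assumptions, a 256 MiB memblock at 0x10000000 results in two
calls, for pfn 0x10000 and pfn 0x18000.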

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
CC: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
---
 arch/powerpc/platforms/pseries/hotplug-memory.c |   13 +
 mm/memory_hotplug.c |4 ++--
 2 files changed, 11 insertions(+), 6 deletions(-)

Index: linux-3.6/arch/powerpc/platforms/pseries/hotplug-memory.c
===
--- linux-3.6.orig/arch/powerpc/platforms/pseries/hotplug-memory.c  2012-10-04 14:22:59.833520792 +0900
+++ linux-3.6/arch/powerpc/platforms/pseries/hotplug-memory.c   2012-10-04 14:23:05.150521411 +0900
@@ -77,7 +77,8 @@ static int pseries_remove_memblock(unsig
 {
unsigned long start, start_pfn;
struct zone *zone;
-   int ret;
+   int i, ret;
+   int sections_to_remove;
 
    start_pfn = base >> PAGE_SHIFT;
 
@@ -97,9 +98,13 @@ static int pseries_remove_memblock(unsig
 * to sysfs state file and we can't remove sysfs entries
 * while writing to it. So we have to defer it to here.
 */
-   ret = __remove_pages(zone, start_pfn, memblock_size >> PAGE_SHIFT);
-   if (ret)
-       return ret;
+   sections_to_remove = (memblock_size >> PAGE_SHIFT) / PAGES_PER_SECTION;
+   for (i = 0; i < sections_to_remove; i++) {
+       unsigned long pfn = start_pfn + i * PAGES_PER_SECTION;
+       ret = __remove_pages(zone, pfn, PAGES_PER_SECTION);
+       if (ret)
+           return ret;
+   }
 
/*
 * Update memory regions for memory remove
Index: linux-3.6/mm/memory_hotplug.c
===
--- linux-3.6.orig/mm/memory_hotplug.c  2012-10-04 14:22:59.829520788 +0900
+++ linux-3.6/mm/memory_hotplug.c   2012-10-04 14:23:25.860527278 +0900
@@ -362,11 +362,11 @@ int __remove_pages(struct zone *zone, un
    BUG_ON(phys_start_pfn & ~PAGE_SECTION_MASK);
    BUG_ON(nr_pages % PAGES_PER_SECTION);
 
+   release_mem_region(phys_start_pfn << PAGE_SHIFT, nr_pages * PAGE_SIZE);
+
    sections_to_remove = nr_pages / PAGES_PER_SECTION;
    for (i = 0; i < sections_to_remove; i++) {
        unsigned long pfn = phys_start_pfn + i*PAGES_PER_SECTION;
-       release_mem_region(pfn << PAGE_SHIFT,
-                  PAGES_PER_SECTION << PAGE_SHIFT);
        ret = __remove_section(zone, __pfn_to_section(pfn));
        if (ret)
            break;



Re: [RFC v9 PATCH 03/21] memory-hotplug: store the node id in acpi_memory_device

2012-10-01 Thread Yasuaki Ishimatsu

Hi Chen,

2012/09/28 12:21, Ni zhan Chen wrote:

On 09/05/2012 05:25 PM, we...@cn.fujitsu.com wrote:

From: Wen Congyang we...@cn.fujitsu.com

The memory device has only one node id. Store the node id when
enabling the memory device, so that we can reuse it when removing the
memory device.


One question:
If NUMA emulation is used, will the memory device be associated with one node or ...?


A memory device has only one node, even if you use NUMA emulation.

Thanks,
Yasuaki Ishimatsu





CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
CC: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
Signed-off-by: Wen Congyang we...@cn.fujitsu.com
Reviewed-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
---
  drivers/acpi/acpi_memhotplug.c |4 
  1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 2a7beac..7873832 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -83,6 +83,7 @@ struct acpi_memory_info {
  struct acpi_memory_device {
  struct acpi_device * device;
  unsigned int state;/* State of the memory device */
+int nid;
  struct list_head res_list;
  };
@@ -256,6 +257,9 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
  info->enabled = 1;
  num_enabled++;
  }
+
+mem_device->nid = node;
+
  if (!num_enabled) {
  printk(KERN_ERR PREFIX "add_memory failed\n");
  mem_device->state = MEMORY_INVALID_STATE;







Re: [RFC v9 PATCH 00/21] memory-hotplug: hot-remove physical memory

2012-10-01 Thread Yasuaki Ishimatsu

Hi Chen,

2012/10/02 8:45, Ni zhan Chen wrote:

On 10/01/2012 12:44 PM, Yasuaki Ishimatsu wrote:

Hi Chen,

2012/09/29 17:19, Ni zhan Chen wrote:

On 09/05/2012 05:25 PM, we...@cn.fujitsu.com wrote:

From: Wen Congyang we...@cn.fujitsu.com

This patch series aims to support physical memory hot-remove.

The patches can free/remove the following things:

   - acpi_memory_info  : [RFC PATCH 4/19]
   - /sys/firmware/memmap/X/{end, start, type} : [RFC PATCH 8/19]
   - iomem_resource: [RFC PATCH 9/19]
   - mem_section and related sysfs files   : [RFC PATCH 10-11, 13-16/19]
   - page table of removed memory  : [RFC PATCH 12/19]
   - node and related sysfs files  : [RFC PATCH 18-19/19]

If you find any functionality missing for physical memory hot-remove, please
let me know.

How to test this patchset?
1. apply this patchset and build the kernel. MEMORY_HOTPLUG, MEMORY_HOTREMOVE,
ACPI_HOTPLUG_MEMORY must be selected.
2. load the module acpi_memhotplug


Hi Yasuaki,

where is the acpi_memhotplug module?


If you build acpi_memhotplug as a module, it is created under the
/lib/modules/kernel-version/driver/acpi/ directory. It depends
on the config option ACPI_HOTPLUG_MEMORY. If that option is set to [*], the
driver is built in, so you don't need to care about it.
Thanks,
Yasuaki Ishimatsu


Hi Yasuaki,

I built the kernel with MEMORY_HOTPLUG, MEMORY_HOTREMOVE and ACPI_HOTPLUG_MEMORY
selected as [*], but I can't find PNP0C80:XX under the directory
/sys/bus/acpi/devices/.

[root@localhost ~]# ls /sys/bus/acpi/devices/
device:00  device:07  device:0e  device:15  device:1c  device:23 device:2a   
LNXCPU:00  LNXCPU:07PNP0501:00  PNP0C02:00 PNP0C0F:02 PNP0C14:01
device:01  device:08  device:0f  device:16  device:1d  device:24 device:2b   
LNXCPU:01  LNXPWRBN:00  PNP0800:00  PNP0C02:01 PNP0C0F:03 PNP0C31:00
device:02  device:09  device:10  device:17  device:1e  device:25 device:2c   
LNXCPU:02  LNXSYSTM:00  PNP0A08:00  PNP0C02:02 PNP0C0F:04
device:03  device:0a  device:11  device:18  device:1f  device:26 device:2d   
LNXCPU:03  PNP:00   PNP0B00:00  PNP0C04:00 PNP0C0F:05
device:04  device:0b  device:12  device:19  device:20  device:27 device:2e   
LNXCPU:04  PNP0100:00   PNP0C01:00  PNP0C0C:00 PNP0C0F:06
device:05  device:0c  device:13  device:1a  device:21  device:28 device:2f   
LNXCPU:05  PNP0103:00   PNP0C01:01  PNP0C0F:00 PNP0C0F:07
device:06  device:0d  device:14  device:1b  device:22  device:29 INT3F0D:00  
LNXCPU:06  PNP0200:00   PNP0C01:02  PNP0C0F:01 PNP0C14:00

Then what am I missing? Thanks.


It depends on the hardware. It seems that your system does not support
memory hotplug. If you use KVM, you can try memory hotplug on a KVM
guest by applying Vasilis' patch set.

http://lists.gnu.org/archive/html/qemu-devel/2012-07/msg01389.html

Thanks,
Yasuaki Ishimatsu








3. hotplug the memory device(it depends on your hardware)
You will see the memory device under the directory /sys/bus/acpi/devices/.
Its name is PNP0C80:XX.
4. online/offline pages provided by this memory device
You can write online/offline to /sys/devices/system/memory/memoryX/state to
online/offline pages provided by this memory device
5. hotremove the memory device
You can hotremove the memory device by the hardware, or writing 1 to
/sys/bus/acpi/devices/PNP0C80:XX/eject.

Note: if the memory provided by the memory device is used by the kernel, it
can't be offlined. It is not a bug.

Known problems:
1. memory can't be offlined when CONFIG_MEMCG is selected.
For example: there is a memory device on node 1. The address range
is [1G, 1.5G). You will find 4 new directories memory8, memory9, memory10,
and memory11 under the directory /sys/devices/system/memory/.
If CONFIG_MEMCG is selected, we allocate memory to store the page cgroup
when we online pages. When we online memory8, the memory that stores its
page cgroup is not provided by this memory device. But when we online
memory9, the memory that stores its page cgroup may be provided by memory8,
so memory8 can't be offlined while memory9 is online. We should offline the
memory in the reverse order.
When the memory device is hot-removed, we automatically offline the memory
provided by this memory device. But we don't know which memory was onlined
first, so offlining memory may fail. In that case, you should offline the
memory by hand before hot-removing the memory device.
2. hot-removing a memory device may cause a kernel panic
This bug will be fixed by Liu Jiang's patch:
https://lkml.org/lkml/2012/7/3/1
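
The manual offlining described in known problem 1 can be sketched as a small user-space helper. This is a minimal sketch under stated assumptions, not part of the patch set: memory_state_path(), write_state() and offline_blocks_reverse() are hypothetical names, the sysfs layout is the one named in the cover letter, and walking from the highest index down only matches the "reverse order" advice if the blocks were onlined in ascending index order.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format "<base>/memory<idx>/state"; returns 0, or -1 if buf is too small. */
static int memory_state_path(char *buf, size_t len, const char *base,
			     unsigned int idx)
{
	int n = snprintf(buf, len, "%s/memory%u/state", base, idx);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write "online" or "offline" to one state file; returns 0 on success. */
static int write_state(const char *path, const char *state)
{
	FILE *f;
	int ok;

	if (strcmp(state, "online") != 0 && strcmp(state, "offline") != 0)
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fputs(state, f) >= 0;
	/* stdio buffers the write, so a refused transition shows up on close. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/*
 * Offline blocks last..first, highest index first, and stop at the first
 * failure. Returns the number of blocks that were offlined.
 */
static unsigned int offline_blocks_reverse(const char *base,
					   unsigned int first,
					   unsigned int last)
{
	char path[256];
	unsigned int done = 0;
	unsigned int idx;

	assert(first <= last);
	for (idx = last; ; idx--) {
		if (memory_state_path(path, sizeof(path), base, idx) != 0 ||
		    write_state(path, "offline") != 0)
			break;
		done++;
		if (idx == first)
			break;
	}
	return done;
}
```

For the example in the cover letter, a call such as offline_blocks_reverse("/sys/devices/system/memory", 8, 11) would try memory11 first and memory8 last. It needs root privileges and a system that actually exposes those blocks.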

change log of v9:
  [RFC PATCH v9 8/21]
* add a lock to protect the list map_entries
* add an indicator to firmware_map_entry to remember whether the memory
  is allocated from bootmem
  [RFC PATCH v9 10/21]
* change the macro to inline function
  [RFC PATCH v9 19/21]
* don't offline the node if the cpu on the node is onlined
  [RFC PATCH v9 21/21]
* create new patch: auto offline page_cgroup

Re: [RFC v9 PATCH 01/21] memory-hotplug: rename remove_memory() to offline_memory()/offline_pages()

2012-10-01 Thread Yasuaki Ishimatsu

Hi Kosaki-san,

2012/09/29 7:15, KOSAKI Motohiro wrote:

On Thu, Sep 27, 2012 at 11:50 PM, Yasuaki Ishimatsu
isimatu.yasu...@jp.fujitsu.com wrote:

Hi Chen,


2012/09/28 11:22, Ni zhan Chen wrote:


On 09/05/2012 05:25 PM, we...@cn.fujitsu.com wrote:


From: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

remove_memory() only tries to offline pages. It is called in two cases:
1. hot remove a memory device
2. echo offline /sys/devices/system/memory/memoryXX/state

In the 1st case, we should also change memory block's state, and notify
the userspace that the memory block's state is changed after offlining
pages.

So rename remove_memory() to offline_memory()/offline_pages(). In the
1st case, offline_memory() will be used; this function is not implemented
yet. In the 2nd case, offline_pages() will be used.



But at this point there is no function that pairs with add_memory().



To associate with add_memory() later, we renamed it.


Then, you introduced bisect breakage. It is definitely unacceptable.


What does bisect breakage mean?

Thanks,
Yasuaki Ishimatsu



NAK.






Re: [RFC v9 PATCH 13/21] memory-hotplug: check page type in get_page_bootmem

2012-09-30 Thread Yasuaki Ishimatsu

Hi Chen,

2012/09/29 11:15, Ni zhan Chen wrote:

On 09/05/2012 05:25 PM, we...@cn.fujitsu.com wrote:

From: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

The function get_page_bootmem() may be called more than once on the same
page. There is no need to set the page's type and private fields again if
this is not the first call for that page.

Note: the patch is just an optimization and does not fix any problem.


Hi Yasuaki,

This patch looks reasonable to me. I have another question related to
get_page_bootmem(). It comes from the changelog of another Fujitsu developer's
patch [commit : 04753278769f3], which says:

  1) When the memmap of removing section is allocated on other
  section by bootmem, it should/can be freed.
  2) When the memmap of removing section is allocated on the
  same section, it shouldn't be freed. Because the section has to be
  logical memory offlined already and all pages must be isolated against
  page allocator. If it is freed, page allocator may use it which will
  be removed physically soon.

But I don't see how his patch guarantees 2); that is, his patch doesn't seem to
guarantee that the memmap of a section being removed, which is allocated on
another section by bootmem, is not freed. I hope you can explain this in detail,
thanks in advance. :-)


In my understanding, the patch does not guarantee it.
Please see [commit : 0c0a4a517a31e]. free_map_bootmem() in the commit
guarantees it.

Thanks,
Yasuaki Ishimatsu





CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
CC: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
---
  mm/memory_hotplug.c |   15 +++
  1 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index d736df3..26a5012 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -95,10 +95,17 @@ static void release_memory_resource(struct resource *res)
  static void get_page_bootmem(unsigned long info,  struct page *page,
   unsigned long type)
  {
-page->lru.next = (struct list_head *) type;
-SetPagePrivate(page);
-set_page_private(page, info);
-atomic_inc(&page->_count);
+unsigned long page_type;
+
+page_type = (unsigned long)page->lru.next;
+if (page_type < MEMORY_HOTPLUG_MIN_BOOTMEM_TYPE ||
+page_type > MEMORY_HOTPLUG_MAX_BOOTMEM_TYPE){
+page->lru.next = (struct list_head *)type;
+SetPagePrivate(page);
+set_page_private(page, info);
+atomic_inc(&page->_count);
+} else
+atomic_inc(&page->_count);
  }
  /* reference to __meminit __free_pages_bootmem is valid
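
The behaviour of the patched get_page_bootmem() can be modelled in user space. This is a minimal sketch, not the kernel function: struct sketch_page is a hypothetical stand-in for the few struct page fields involved, and the SKETCH_ type range uses assumed values. It shows that the type and private data are set only by the first call, while every call takes a reference.

```c
#include <assert.h>

/* Assumed stand-ins for the bootmem type range; the values are illustrative. */
#define SKETCH_MIN_BOOTMEM_TYPE 12UL
#define SKETCH_SECTION_INFO     SKETCH_MIN_BOOTMEM_TYPE
#define SKETCH_NODE_INFO        (SKETCH_MIN_BOOTMEM_TYPE + 1UL)
#define SKETCH_MAX_BOOTMEM_TYPE SKETCH_NODE_INFO

/* Simplified model of the struct page fields that the patch touches. */
struct sketch_page {
	unsigned long type;    /* models page->lru.next used as a type tag */
	unsigned long private; /* models page_private(page) */
	int has_private;       /* models the PagePrivate flag */
	int count;             /* models page->_count */
};

static void sketch_get_page_bootmem(unsigned long info,
				    struct sketch_page *page,
				    unsigned long type)
{
	/* Tag the page only if it does not already carry a bootmem type. */
	if (page->type < SKETCH_MIN_BOOTMEM_TYPE ||
	    page->type > SKETCH_MAX_BOOTMEM_TYPE) {
		page->type = type;
		page->has_private = 1;
		page->private = info;
	}
	/* Every call takes one reference, whether or not it tagged the page. */
	page->count++;
}
```

In this model a second call with different arguments leaves the first call's type and private value in place and only raises the count, which is the case the changelog describes.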







Re: [RFC v9 PATCH 00/21] memory-hotplug: hot-remove physical memory

2012-09-30 Thread Yasuaki Ishimatsu

Hi Chen,

2012/09/29 17:19, Ni zhan Chen wrote:

On 09/05/2012 05:25 PM, we...@cn.fujitsu.com wrote:

From: Wen Congyang we...@cn.fujitsu.com

This patch series aims to support physical memory hot-remove.

The patches can free/remove the following things:

   - acpi_memory_info  : [RFC PATCH 4/19]
   - /sys/firmware/memmap/X/{end, start, type} : [RFC PATCH 8/19]
   - iomem_resource: [RFC PATCH 9/19]
   - mem_section and related sysfs files   : [RFC PATCH 10-11, 13-16/19]
   - page table of removed memory  : [RFC PATCH 12/19]
   - node and related sysfs files  : [RFC PATCH 18-19/19]

If you find any functionality missing for physical memory hot-remove, please
let me know.

How to test this patchset?
1. apply this patchset and build the kernel. MEMORY_HOTPLUG, MEMORY_HOTREMOVE,
ACPI_HOTPLUG_MEMORY must be selected.
2. load the module acpi_memhotplug


Hi Yasuaki,

where is the acpi_memhotplug module?


If you build acpi_memhotplug as a module, it is created under the
/lib/modules/kernel-version/driver/acpi/ directory. It depends
on the config option ACPI_HOTPLUG_MEMORY. If that option is set to [*], the
driver is built in, so you don't need to care about it.


Thanks,
Yasuaki Ishimatsu




3. hotplug the memory device(it depends on your hardware)
You will see the memory device under the directory /sys/bus/acpi/devices/.
Its name is PNP0C80:XX.
4. online/offline pages provided by this memory device
You can write online/offline to /sys/devices/system/memory/memoryX/state to
online/offline pages provided by this memory device
5. hotremove the memory device
You can hotremove the memory device by the hardware, or writing 1 to
/sys/bus/acpi/devices/PNP0C80:XX/eject.

Note: if the memory provided by the memory device is used by the kernel, it
can't be offlined. It is not a bug.

Known problems:
1. memory can't be offlined when CONFIG_MEMCG is selected.
For example: there is a memory device on node 1. The address range
is [1G, 1.5G). You will find 4 new directories memory8, memory9, memory10,
and memory11 under the directory /sys/devices/system/memory/.
If CONFIG_MEMCG is selected, we allocate memory to store the page cgroup
when we online pages. When we online memory8, the memory that stores its
page cgroup is not provided by this memory device. But when we online
memory9, the memory that stores its page cgroup may be provided by memory8,
so memory8 can't be offlined while memory9 is online. We should offline the
memory in the reverse order.
When the memory device is hot-removed, we automatically offline the memory
provided by this memory device. But we don't know which memory was onlined
first, so offlining memory may fail. In that case, you should offline the
memory by hand before hot-removing the memory device.
2. hot-removing a memory device may cause a kernel panic
This bug will be fixed by Liu Jiang's patch:
https://lkml.org/lkml/2012/7/3/1

change log of v9:
  [RFC PATCH v9 8/21]
* add a lock to protect the list map_entries
* add an indicator to firmware_map_entry to remember whether the memory
  is allocated from bootmem
  [RFC PATCH v9 10/21]
* change the macro to inline function
  [RFC PATCH v9 19/21]
* don't offline the node if the cpu on the node is onlined
  [RFC PATCH v9 21/21]
* create new patch: auto offline page_cgroup when onlining memory block
  failed

change log of v8:
  [RFC PATCH v8 17/20]
* Fix problems when one node's range includes the other nodes
  [RFC PATCH v8 18/20]
* fix building error when CONFIG_MEMORY_HOTPLUG_SPARSE or CONFIG_HUGETLBFS
  is not defined.
  [RFC PATCH v8 19/20]
* don't offline node when some memory sections are not removed
  [RFC PATCH v8 20/20]
* create new patch: clear hwpoisoned flag when onlining pages

change log of v7:
  [RFC PATCH v7 4/19]
* do not continue if acpi_memory_device_remove_memory() fails.
  [RFC PATCH v7 15/19]
* handle usemap in register_page_bootmem_info_section() too.

change log of v6:
  [RFC PATCH v6 12/19]
* fix building error on other architectures than x86

  [RFC PATCH v6 15-16/19]
* fix building error on other architectures than x86

change log of v5:
  * merge the patchset to clear page table and the patchset to hot remove
memory(from ishimatsu) to one big patchset.

  [RFC PATCH v5 1/19]
* rename remove_memory() to offline_memory()/offline_pages()

  [RFC PATCH v5 2/19]
* new patch: implement offline_memory(). This function offlines pages,
  update memory block's state, and notify the userspace that the memory
  block's state is changed.

  [RFC PATCH v5 4/19]
* offline and remove memory in acpi_memory_disable_device() too.

  [RFC PATCH v5 17/19]
* new patch: add a new function __remove_zone() to revert the things done
  in the function __add_zone().

  [RFC PATCH v5 18/19]
* flush work before resetting node device.

change log of v4:
  * remove

Re: [RFC v9 PATCH 01/21] memory-hotplug: rename remove_memory() to offline_memory()/offline_pages()

2012-09-27 Thread Yasuaki Ishimatsu

Hi Chen,

2012/09/28 11:22, Ni zhan Chen wrote:

On 09/05/2012 05:25 PM, we...@cn.fujitsu.com wrote:

From: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

remove_memory() only tries to offline pages. It is called in two cases:
1. hot remove a memory device
2. echo offline /sys/devices/system/memory/memoryXX/state

In the 1st case, we should also change memory block's state, and notify
the userspace that the memory block's state is changed after offlining
pages.

So rename remove_memory() to offline_memory()/offline_pages(). In the
1st case, offline_memory() will be used; this function is not implemented
yet. In the 2nd case, offline_pages() will be used.


But at this point there is no function that pairs with add_memory().


To associate with add_memory() later, we renamed it.

Thanks,
Yasuaki Ishimatsu





CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
Signed-off-by: Wen Congyang we...@cn.fujitsu.com
---
  drivers/acpi/acpi_memhotplug.c |2 +-
  drivers/base/memory.c  |9 +++--
  include/linux/memory_hotplug.h |3 ++-
  mm/memory_hotplug.c|   22 ++
  4 files changed, 20 insertions(+), 16 deletions(-)

diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 24c807f..2a7beac 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -318,7 +318,7 @@ static int acpi_memory_disable_device(struct acpi_memory_device *mem_device)
   */
  list_for_each_entry_safe(info, n, &mem_device->res_list, list) {
  if (info->enabled) {
-result = remove_memory(info->start_addr, info->length);
+result = offline_memory(info->start_addr, info->length);
  if (result)
  return result;
  }
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 7dda4f7..44e7de6 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -248,26 +248,23 @@ static bool pages_correctly_reserved(unsigned long start_pfn,
  static int
  memory_block_action(unsigned long phys_index, unsigned long action)
  {
-unsigned long start_pfn, start_paddr;
+unsigned long start_pfn;
  unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
  struct page *first_page;
  int ret;
  first_page = pfn_to_page(phys_index << PFN_SECTION_SHIFT);
+start_pfn = page_to_pfn(first_page);
  switch (action) {
  case MEM_ONLINE:
-start_pfn = page_to_pfn(first_page);
-
  if (!pages_correctly_reserved(start_pfn, nr_pages))
  return -EBUSY;
  ret = online_pages(start_pfn, nr_pages);
  break;
  case MEM_OFFLINE:
-start_paddr = page_to_pfn(first_page) << PAGE_SHIFT;
-ret = remove_memory(start_paddr,
-nr_pages << PAGE_SHIFT);
+ret = offline_pages(start_pfn, nr_pages);
  break;
  default:
  WARN(1, KERN_WARNING "%s(%ld, %ld) unknown action: "
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 910550f..c183f39 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -233,7 +233,8 @@ static inline int is_mem_section_removable(unsigned long pfn,
  extern int mem_online_node(int nid);
  extern int add_memory(int nid, u64 start, u64 size);
  extern int arch_add_memory(int nid, u64 start, u64 size);
-extern int remove_memory(u64 start, u64 size);
+extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
+extern int offline_memory(u64 start, u64 size);
  extern int sparse_add_one_section(struct zone *zone, unsigned long start_pfn,
  int nr_pages);
  extern void sparse_remove_one_section(struct zone *zone, struct mem_section *ms);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 3ad25f9..bb42316 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -866,7 +866,7 @@ check_pages_isolated(unsigned long start_pfn, unsigned long end_pfn)
  return offlined;
  }
-static int __ref offline_pages(unsigned long start_pfn,
+static int __ref __offline_pages(unsigned long start_pfn,
unsigned long end_pfn, unsigned long timeout)
  {
  unsigned long pfn, nr_pages, expire;
@@ -994,18 +994,24 @@ out:
  return ret;
  }
-int remove_memory(u64 start, u64 size)
+int offline_pages(unsigned long start_pfn, unsigned long nr_pages)
  {
-unsigned long start_pfn, end_pfn;
+return __offline_pages(start_pfn, start_pfn + nr_pages, 120 * HZ);
+}
-start_pfn = PFN_DOWN(start

Re: [RFC v9 PATCH 05/21] memory-hotplug: check whether memory is present or not

2012-09-10 Thread Yasuaki Ishimatsu

Hi Wen,

2012/09/11 11:15, Wen Congyang wrote:

Hi, ishimatsu

At 09/05/2012 05:25 PM, we...@cn.fujitsu.com Wrote:

From: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

If the system supports memory hot-remove, online_pages() may online pages that
have been removed. So online_pages() needs to check whether the pages being
onlined are present.


Because we use memory_block_change_state() to hot-remove memory, I think
this patch can be removed. What do you think?


Please explain the details a little more. If we use memory_block_change_state(),
does the conflict never occur? Why?

Thanks,
Yasuaki Ishimatsu


Thanks
Wen Congyang



CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
CC: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
---
  include/linux/mmzone.h |   19 +++
  mm/memory_hotplug.c|   13 +
  2 files changed, 32 insertions(+), 0 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 2daa54f..ac3ae30 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1180,6 +1180,25 @@ void sparse_init(void);
  #define sparse_index_init(_sec, _nid)  do {} while (0)
  #endif /* CONFIG_SPARSEMEM */

+#ifdef CONFIG_SPARSEMEM
+static inline int pfns_present(unsigned long pfn, unsigned long nr_pages)
+{
+   int i;
+   for (i = 0; i < nr_pages; i++) {
+   if (pfn_present(pfn + i))
+   continue;
+   else
+   return -EINVAL;
+   }
+   return 0;
+}
+#else
+static inline int pfns_present(unsigned long pfn, unsigned long nr_pages)
+{
+   return 0;
+}
+#endif /* CONFIG_SPARSEMEM*/
+
  #ifdef CONFIG_NODES_SPAN_OTHER_NODES
  bool early_pfn_in_nid(unsigned long pfn, int nid);
  #else
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 49f7747..299747d 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -467,6 +467,19 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
struct memory_notify arg;

lock_memory_hotplug();
+   /*
+* If system supports memory hot-remove, the memory may have been
+* removed. So we check whether the memory has been removed or not.
+*
+* Note: When CONFIG_SPARSEMEM is defined, pfns_present() becomes
+*   effective. If CONFIG_SPARSEMEM is not defined, pfns_present()
+*   always returns 0.
+*/
+   ret = pfns_present(pfn, nr_pages);
+   if (ret) {
+   unlock_memory_hotplug();
+   return ret;
+   }
arg.start_pfn = pfn;
arg.nr_pages = nr_pages;
arg.status_change_nid = -1;
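
The presence check added by this patch can be modelled in user space. This is a minimal sketch under stated assumptions: the per-section flag array is a hypothetical stand-in for the sparsemem state behind pfn_present(), the SKETCH_ constants are assumed values, and the loop counter is an unsigned long here rather than the int used in the posted hunk.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define SKETCH_PAGES_PER_SECTION 32768UL /* assumed: 128 MiB sections, 4 KiB pages */
#define SKETCH_NR_SECTIONS       8UL     /* assumed size of the toy memory map */

/* Toy presence map: one flag per section. */
static bool sketch_section_present[SKETCH_NR_SECTIONS];

/* A pfn counts as present when the section that contains it is present. */
static bool sketch_pfn_present(unsigned long pfn)
{
	unsigned long sec = pfn / SKETCH_PAGES_PER_SECTION;

	return sec < SKETCH_NR_SECTIONS && sketch_section_present[sec];
}

/* Returns 0 if every pfn in [pfn, pfn + nr_pages) is present, else -EINVAL. */
static int sketch_pfns_present(unsigned long pfn, unsigned long nr_pages)
{
	unsigned long i;

	for (i = 0; i < nr_pages; i++) {
		if (!sketch_pfn_present(pfn + i))
			return -EINVAL;
	}
	return 0;
}
```

With sections 0 and 1 marked present, a range covering exactly those two sections passes, while a range that reaches into section 2 is rejected with -EINVAL, mirroring the early return added to online_pages().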







Re: [RFC v8 PATCH 00/20] memory-hotplug: hot-remove physical memory

2012-09-09 Thread Yasuaki Ishimatsu

Hi Wen,

2012/09/01 5:49, Andrew Morton wrote:

On Tue, 28 Aug 2012 18:00:07 +0800
we...@cn.fujitsu.com wrote:


This patch series aims to support physical memory hot-remove.


Have you had much review and testing feedback yet?


The patches can free/remove the following things:

   - acpi_memory_info  : [RFC PATCH 4/19]
   - /sys/firmware/memmap/X/{end, start, type} : [RFC PATCH 8/19]
   - iomem_resource: [RFC PATCH 9/19]
   - mem_section and related sysfs files   : [RFC PATCH 10-11, 13-16/19]
   - page table of removed memory  : [RFC PATCH 12/19]
   - node and related sysfs files  : [RFC PATCH 18-19/19]

If you find any functionality missing for physical memory hot-remove, please
let me know.





I doubt if many people have hardware which permits physical memory
removal?  How would you suggest that people with regular hardware can
test these changes?


How do you test the patch? As Andrew says, hot-removing memory needs
particular hardware, and I think so too. So many people may want
to know how to test the patch.
If we apply the following patch to a KVM guest, can we hot-remove memory on
the KVM guest?

http://lists.gnu.org/archive/html/qemu-devel/2012-07/msg01389.html

Thanks,
Yasuaki Ishimatsu




Known problems:
1. memory can't be offlined when CONFIG_MEMCG is selected.


That's quite a problem!  Do you have a description of why this is the
case, and a plan for fixing it?






Re: [RFC v8 PATCH 13/20] memory-hotplug: check page type in get_page_bootmem

2012-09-04 Thread Yasuaki Ishimatsu

Hi Wen,

2012/09/04 12:46, Wen Congyang wrote:

Hi, isimatu-san

At 09/01/2012 05:30 AM, Andrew Morton Wrote:

On Tue, 28 Aug 2012 18:00:20 +0800
we...@cn.fujitsu.com wrote:


From: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

There is a possibility that get_page_bootmem() is called on the same page many
times. So when get_page_bootmem() is called again on the same page, the function
only increments page->_count.


I really don't understand this explanation, even after having looked at
the code.  Can you please have another attempt at the changelog?


What is the problem that you want to fix? The function get_page_bootmem()
may be called to the same page more than once, but I don't find any problem
about current implementation.


The patch is just an optimization. It does not fix any problem.
As you know, the function may be called many times for the same page.
I think that once a page has had its page_type, PagePrivate flag and
page->private set, they need not be set again. So we check page_type when
get_page_bootmem() is called, and if the page has already been set up, we
only increment page->_count.

Thanks,
Yasuaki Ishimatsu



Thanks
Wen Congyang




--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -95,10 +95,17 @@ static void release_memory_resource(struct resource *res)
  static void get_page_bootmem(unsigned long info,  struct page *page,
 unsigned long type)
  {
-   page->lru.next = (struct list_head *) type;
-   SetPagePrivate(page);
-   set_page_private(page, info);
-   atomic_inc(&page->_count);
+   unsigned long page_type;
+
+   page_type = (unsigned long) page->lru.next;
+   if (page_type < MEMORY_HOTPLUG_MIN_BOOTMEM_TYPE ||
+   page_type > MEMORY_HOTPLUG_MAX_BOOTMEM_TYPE){
+   page->lru.next = (struct list_head *) type;
+   SetPagePrivate(page);
+   set_page_private(page, info);
+   atomic_inc(&page->_count);
+   } else
+   atomic_inc(&page->_count);
  }


And a code comment which explains what is going on would be good.  As
is always the case ;)









Re: [RFC PATCH v5 00/19] memory-hotplug: hot-remove physical memory

2012-07-27 Thread Yasuaki Ishimatsu

Hi Wen,

2012/07/27 19:20, Wen Congyang wrote:

This patch series aims to support physical memory hot-remove.

The patches can free/remove following things:

   - acpi_memory_info  : [RFC PATCH 4/19]
   - /sys/firmware/memmap/X/{end, start, type} : [RFC PATCH 8/19]
   - iomem_resource: [RFC PATCH 9/19]
   - mem_section and related sysfs files   : [RFC PATCH 10-11, 13-16/19]
   - page table of removed memory  : [RFC PATCH 12/19]
   - node and related sysfs files  : [RFC PATCH 18-19/19]

If you find any functionality missing for physical memory hot-remove, please
let me know.

change log of v5:
  * merge the patchset to clear page table and the patchset to hot remove
memory(from ishimatsu) to one big patchset.


Thank you for merging patches. I'll review next Monday.

Thanks,
Yasuaki Ishimatsu


  [RFC PATCH v5 1/19]
* rename remove_memory() to offline_memory()/offline_pages()

  [RFC PATCH v5 2/19]
* new patch: implement offline_memory(). This function offlines pages,
  update memory block's state, and notify the userspace that the memory
  block's state is changed.

  [RFC PATCH v5 4/19]
* offline and remove memory in acpi_memory_disable_device() too.

  [RFC PATCH v5 17/19]
* new patch: add a new function __remove_zone() to revert the things done
  in the function __add_zone().

  [RFC PATCH v5 18/19]
* flush work before resetting node device.

change log of v4:
  * remove memory-hotplug : unify argument of firmware_map_add_early/hotplug
from the patch series, since the patch is a bugfix. It is being discussed
on other thread. But for testing the patch series, the patch is needed.
So I added the patch as [PATCH 0/13].

  [RFC PATCH v4 2/13]
* check memory is online or not at remove_memory()
* add memory_add_physaddr_to_nid() to acpi_memory_device_remove() for
  getting node id

  [RFC PATCH v4 3/13]
* create new patch : check memory is online or not at online_pages()

  [RFC PATCH v4 4/13]
* add __ref section to remove_memory()
* call firmware_map_remove_entry() before remove_sysfs_fw_map_entry()

  [RFC PATCH v4 11/13]
* rewrite register_page_bootmem_memmap() for removing page used as PT/PMD

change log of v3:
  * rebase to 3.5.0-rc6

  [RFC PATCH v2 2/13]
* remove extra kobject_put()

* The patch was commented by Wen. Wen's comment is
  acpi_memory_device_remove() should ignore a return value of
  remove_memory() since caller does not care the return value.
  But I did not change it since I think caller should care the
  return value. And I am trying to fix it as follow:

  https://lkml.org/lkml/2012/7/5/624

  [RFC PATCH v2 4/13]
* remove a firmware_memmap_entry allocated by kzalloc()

change log of v2:
  [RFC PATCH v2 2/13]
* check whether memory block is offline or not before calling 
offline_memory()
* check whether section is valid or not in is_memblk_offline()
* call kobject_put() for each memory_block in is_memblk_offline()

  [RFC PATCH v2 3/13]
* unify the end argument of firmware_map_add_early/hotplug

  [RFC PATCH v2 4/13]
* add release_firmware_map_entry() for freeing firmware_map_entry

  [RFC PATCH v2 6/13]
   * add release_memory_block() for freeing memory_block

  [RFC PATCH v2 11/13]
   * fix wrong arguments of free_pages()


Wen Congyang (5):
   memory-hotplug: implement offline_memory()
   memory-hotplug: store the node id in acpi_memory_device
   memory-hotplug: export the function acpi_bus_remove()
   memory-hotplug: call acpi_bus_remove() to remove memory device
   memory-hotplug: introduce new function arch_remove_memory()

Yasuaki Ishimatsu (14):
   memory-hotplug: rename remove_memory() to
 offline_memory()/offline_pages()
   memory-hotplug: offline and remove memory when removing the memory
 device
   memory-hotplug: check whether memory is present or not
   memory-hotplug: remove /sys/firmware/memmap/X sysfs
   memory-hotplug: does not release memory region in PAGES_PER_SECTION
 chunks
   memory-hotplug: add memory_block_release
   memory-hotplug: remove_memory calls __remove_pages
   memory-hotplug: check page type in get_page_bootmem
   memory-hotplug: move register_page_bootmem_info_node and
 put_page_bootmem for sparse-vmemmap
   memory-hotplug: implement register_page_bootmem_info_section of
 sparse-vmemmap
   memory-hotplug: free memmap of sparse-vmemmap
   memory_hotplug: clear zone when the memory is removed
   memory-hotplug: add node_device_release
   memory-hotplug: remove sysfs file of node

  arch/ia64/mm/init.c |   16 +
  arch/powerpc/mm/mem.c   |   14 +
  arch/powerpc/platforms/pseries/hotplug-memory.c |   16 +-
  arch/s390/mm/init.c |8 +
  arch/sh/mm/init.c   |   15 +
  arch/tile/mm/init.c |8 +
  arch

Re: [RFC PATCH v5 19/19] memory-hotplug: remove sysfs file of node

2012-07-27 Thread Yasuaki Ishimatsu

Hi Wen,

2012/07/27 19:36, Wen Congyang wrote:

From: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

The patch adds node_set_offline() and unregister_one_node() to remove_memory()
for removing sysfs file of node.

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
CC: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
---
  mm/memory_hotplug.c |5 +
  1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 5ac035f..5681968 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1267,6 +1267,11 @@ int __ref remove_memory(int nid, u64 start, u64 size)
/* remove memmap entry */
firmware_map_remove(start, start + size, "System RAM");

+   if (!node_present_pages(nid)) {


With [PATCH v5 17/19] applied, pgdat->node_spanned_pages can become 0 when
all memory of the pgdat is removed. When pgdat->node_spanned_pages is 0,
the pgdat has no memory. So I think node_spanned_pages() is
better here.

Thanks,
Yasuaki Ishimatsu


+   node_set_offline(nid);
+   unregister_one_node(nid);
+   }
+
arch_remove_memory(start, size);
  out:
unlock_memory_hotplug();




___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [RFC PATCH 0/8] memory-hotplug : hot-remove physical memory(clear page table)

2012-07-20 Thread Yasuaki Ishimatsu

Hi Wen,

Good news!! I was waiting for this patch to come.
With these patches applied, can we hot-remove physical memory completely?

Thanks,
Yasuaki Ishimatsu

2012/07/20 16:06, Wen Congyang wrote:

This patch series aims to support physical memory hot-remove (clear page table).

This patch series is based on ishimatsu's patch series. You can get it here:
http://www.spinics.net/lists/linux-acpi/msg36804.html

The patches can remove the following things:
   - page table of removed memory

If you find any functionality missing for physical memory hot-remove, please let me
know.

Note:
* The patch "remove memory info from list before freeing it" is being discussed
   in another thread. But for testing the patch series, the patch is needed.
   So I added the patch as [PATCH 0/8].
* You need to apply ishimatsu's patch series before applying this patch
   series.

Wen Congyang (8):
   memory-hotplug: store the node id in acpi_memory_device
   memory-hotplug: offline memory only when it is onlined
   memory-hotplug: call remove_memory() to cleanup when removing memory
 device
   memory-hotplug: export the function acpi_bus_remove()
   memory-hotplug: call acpi_bus_remove() to remove memory device
   memory-hotplug: introduce new function arch_remove_memory()
   x86: make __split_large_page() generally avialable
   memory-hotplug: implement arch_remove_memory()

  arch/ia64/mm/init.c  |   16 
  arch/powerpc/mm/mem.c|   14 +++
  arch/s390/mm/init.c  |8 ++
  arch/sh/mm/init.c|   15 +++
  arch/tile/mm/init.c  |8 ++
  arch/x86/include/asm/pgtable_types.h |1 +
  arch/x86/mm/init_32.c|   10 ++
  arch/x86/mm/init_64.c|  160 ++
  arch/x86/mm/pageattr.c   |   47 +-
  drivers/acpi/acpi_memhotplug.c   |   24 --
  drivers/acpi/scan.c  |3 +-
  include/acpi/acpi_bus.h  |1 +
  include/linux/memory_hotplug.h   |1 +
  mm/memory_hotplug.c  |2 +-
  14 files changed, 280 insertions(+), 30 deletions(-)






Re: [RFC PATCH 1/8] memory-hotplug: store the node id in acpi_memory_device

2012-07-20 Thread Yasuaki Ishimatsu

Hi Wen,

2012/07/20 16:09, Wen Congyang wrote:

The memory device has only one node id. Store the node id when
enabling the memory device, and we can reuse it when removing the
memory device.

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
CC: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
Signed-off-by: Wen Congyang we...@cn.fujitsu.com
---


It looks good to me.
Reviewed-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

Thanks,
Yasuaki Ishimatsu


  drivers/acpi/acpi_memhotplug.c |8 +---
  1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 5cafd6b..db8de39 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -84,6 +84,7 @@ struct acpi_memory_info {
  struct acpi_memory_device {
struct acpi_device * device;
unsigned int state; /* State of the memory device */
+   int nid;
struct list_head res_list;
  };

@@ -257,6 +258,9 @@ static int acpi_memory_enable_device(struct 
acpi_memory_device *mem_device)
info->enabled = 1;
num_enabled++;
}
+
+   mem_device->nid = node;
+
if (!num_enabled) {
printk(KERN_ERR PREFIX "add_memory failed\n");
mem_device->state = MEMORY_INVALID_STATE;
@@ -463,7 +467,7 @@ static int acpi_memory_device_remove(struct acpi_device 
*device, int type)

mem_device = acpi_driver_data(device);

-   node = acpi_get_node(mem_device->device->handle);
+   node = mem_device->nid;
list_for_each_entry_safe(info, tmp, &mem_device->res_list, list) {
if (!info->enabled)
continue;
@@ -473,8 +477,6 @@ static int acpi_memory_device_remove(struct acpi_device 
*device, int type)
if (result)
return result;
}
-   if (node < 0)
-   node = memory_add_physaddr_to_nid(info->start_addr);

result = remove_memory(node, info->start_addr, info->length);
if (result)






Re: [RFC PATCH 2/8] memory-hotplug: offline memory only when it is onlined

2012-07-20 Thread Yasuaki Ishimatsu

Hi Wen,

2012/07/20 16:10, Wen Congyang wrote:

offline_memory() will fail if the memory is not onlined. So check
whether the memory is onlined before calling offline_memory().
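The guard described above can be sketched in userspace C. This is only an illustration under stated assumptions, not the patch's code: struct sketch_memblk, sketch_offline_memory() and sketch_disable_device() are hypothetical names, and the -EINVAL error code is assumed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of one memory block: 1 = online, 0 = offline. */
struct sketch_memblk {
    int online;
};

/* Models an offline call that fails when the block is not online.
 * The error code is an assumption made for this sketch. */
int sketch_offline_memory(struct sketch_memblk *blk)
{
    assert(blk != NULL);
    if (!blk->online)
        return -EINVAL;
    blk->online = 0;
    return 0;
}

/* Models the guarded caller: skip the offline step when the block is
 * already offline, so that case is no longer reported as an error. */
int sketch_disable_device(struct sketch_memblk *blk)
{
    assert(blk != NULL);
    if (!blk->online)
        return 0;
    return sketch_offline_memory(blk);
}
```

Without the guard, disabling a device whose memory was already offlined by hand would stop at the first failure instead of continuing with the cleanup.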

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
CC: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
Signed-off-by: Wen Congyang we...@cn.fujitsu.com
---


I have no comment.
Reviewed-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

Thanks,
Yasuaki Ishimatsu


  drivers/acpi/acpi_memhotplug.c |   10 +++---
  1 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index db8de39..712e767 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -323,9 +323,13 @@ static int acpi_memory_disable_device(struct 
acpi_memory_device *mem_device)
 */
list_for_each_entry_safe(info, n, &mem_device->res_list, list) {
if (info->enabled) {
-   result = offline_memory(info->start_addr, info->length);
-   if (result)
-   return result;
+   if (!is_memblk_offline(info->start_addr,
+  info->length)) {
+   result = offline_memory(info->start_addr,
+   info->length);
+   if (result)
+   return result;
+   }
}
list_del(&info->list);
kfree(info);






Re: [RFC PATCH 3/8] memory-hotplug: call remove_memory() to cleanup when removing memory device

2012-07-20 Thread Yasuaki Ishimatsu

Hi Wen,

2012/07/20 16:10, Wen Congyang wrote:

We should remove the following things when removing the memory device:
1. memmap and related sysfs files
2. iomem_resource
3. mem_section and related sysfs files
4. node and related sysfs files

The function remove_memory() can do this. So call it after the memory device
is offlined.

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
CC: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
Signed-off-by: Wen Congyang we...@cn.fujitsu.com
---


I have no comment.
Reviewed-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

Thanks,
Yasuaki Ishimatsu



  drivers/acpi/acpi_memhotplug.c |7 ++-
  1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 712e767..58e4e63 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -315,7 +315,7 @@ static int acpi_memory_disable_device(struct 
acpi_memory_device *mem_device)
  {
int result;
struct acpi_memory_info *info, *n;
-
+   int node = mem_device->nid;

/*
 * Ask the VM to offline this memory range.
@@ -330,6 +330,11 @@ static int acpi_memory_disable_device(struct 
acpi_memory_device *mem_device)
if (result)
return result;
}
+
+   result = remove_memory(node, info->start_addr,
+  info->length);
+   if (result)
+   return result;
}
list_del(&info->list);
kfree(info);






Re: [RFC PATCH 6/8] memory-hotplug: introduce new function arch_remove_memory()

2012-07-20 Thread Yasuaki Ishimatsu

2012/07/20 16:12, Wen Congyang wrote:

We don't call __add_pages() directly in the function add_memory()
because some other architecture-related things need to be done
before or after calling __add_pages(). So we should not call
__remove_pages() directly in the function remove_memory().
Introduce a new function arch_remove_memory() to revert the things done
in arch_add_memory().

Note: the function for x86_64 will be implemented later. And I don't
know how to implement it for s390.


I think you need to CC the other arch mailing lists so they can review the patch.



CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
CC: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
Signed-off-by: Wen Congyang we...@cn.fujitsu.com
---
  arch/ia64/mm/init.c|   16 
  arch/powerpc/mm/mem.c  |   14 ++
  arch/s390/mm/init.c|8 
  arch/sh/mm/init.c  |   15 +++
  arch/tile/mm/init.c|8 
  arch/x86/mm/init_32.c  |   10 ++
  arch/x86/mm/init_64.c  |7 +++
  include/linux/memory_hotplug.h |1 +
  mm/memory_hotplug.c|2 +-
  9 files changed, 80 insertions(+), 1 deletions(-)

diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index 0eab454..1e345ed 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -688,6 +688,22 @@ int arch_add_memory(int nid, u64 start, u64 size)

return ret;
  }
+
+#ifdef CONFIG_MEMORY_HOTREMOVE
+int arch_remove_memory(u64 start, u64 size)
+{
+   unsigned long start_pfn = start >> PAGE_SHIFT;
+   unsigned long nr_pages = size >> PAGE_SHIFT;
+   int ret;
+
+   ret = __remove_pages(start_pfn, nr_pages);
+   if (ret)
+   pr_warn("%s: Problem encountered in __remove_pages() as"
+" ret=%d\n", __func__,  ret);
+
+   return ret;
+}
+#endif
  #endif

  /*
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index baaafde..249cef4 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -133,6 +133,20 @@ int arch_add_memory(int nid, u64 start, u64 size)

return __add_pages(nid, zone, start_pfn, nr_pages);
  }
+
+#ifdef CONFIG_MEMORY_HOTREMOVE
+int arch_remove_memory(u64 start, u64 size)
+{
+   unsigned long start_pfn = start >> PAGE_SHIFT;
+   unsigned long nr_pages = size >> PAGE_SHIFT;
+
+   start = (unsigned long)__va(start);
+   if (remove_section_mapping(start, start + size))
+   return -EINVAL;
+
+   return __remove_pages(start_pfn, nr_pages);
+}
+#endif
  #endif /* CONFIG_MEMORY_HOTPLUG */

  /*
diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
index 2bea060..3de0d5b 100644
--- a/arch/s390/mm/init.c
+++ b/arch/s390/mm/init.c
@@ -259,4 +259,12 @@ int arch_add_memory(int nid, u64 start, u64 size)
vmem_remove_mapping(start, size);
return rc;
  }
+
+#ifdef CONFIG_MEMORY_HOTREMOVE
+int arch_remove_memory(u64 start, u64 size)
+{
+   /* TODO */
+   return -EBUSY;
+}
+#endif
  #endif /* CONFIG_MEMORY_HOTPLUG */
diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c
index 82cc576..fc84491 100644
--- a/arch/sh/mm/init.c
+++ b/arch/sh/mm/init.c
@@ -558,4 +558,19 @@ int memory_add_physaddr_to_nid(u64 addr)
  EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
  #endif

+#ifdef CONFIG_MEMORY_HOTREMOVE
+int arch_remove_memory(u64 start, u64 size)
+{
+   unsigned long start_pfn = start >> PAGE_SHIFT;
+   unsigned long nr_pages = size >> PAGE_SHIFT;
+   int ret;
+
+   ret = __remove_pages(start_pfn, nr_pages);
+   if (unlikely(ret))
+   pr_warn("%s: Failed, __remove_pages() == %d\n", __func__,
+   ret);
+
+   return ret;
+}
+#endif
  #endif /* CONFIG_MEMORY_HOTPLUG */
diff --git a/arch/tile/mm/init.c b/arch/tile/mm/init.c
index 630dd2c..bdd8a99 100644
--- a/arch/tile/mm/init.c
+++ b/arch/tile/mm/init.c
@@ -947,6 +947,14 @@ int remove_memory(u64 start, u64 size)
  {
return -EINVAL;
  }
+
+#ifdef CONFIG_MEMORY_HOTREMOVE
+int arch_remove_memory(u64 start, u64 size)
+{
+   /* TODO */
+   return -EBUSY;
+}
+#endif
  #endif

  struct kmem_cache *pgd_cache;
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index 575d86f..a690153 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -842,6 +842,16 @@ int arch_add_memory(int nid, u64 start, u64 size)

return __add_pages(nid, zone, start_pfn, nr_pages);
  }
+
+#ifdef CONFIG_MEMORY_HOTREMOVE
+int arch_remove_memory(unsigned long start, unsigned long size)
+{
+   unsigned long start_pfn = start >> PAGE_SHIFT;
+   unsigned long nr_pages = size >> PAGE_SHIFT;
+
+   return __remove_pages

Re: [RFC PATCH v4 11/13] memory-hotplug : free memmap of sparse-vmemmap

2012-07-19 Thread Yasuaki Ishimatsu
Hi Wen,

2012/07/19 14:58, Wen Congyang wrote:
 At 07/18/2012 06:16 PM, Yasuaki Ishimatsu Wrote:
 Not all pages of the virtual mapping for removed memory can be freed, since
 some pages used as PGD/PUD cover not only removed memory but also other
 memory. So the patch checks whether a page can be freed or not.

 How to check whether a page can be freed or not?
   1. When removing memory, the page structs of the removed memory are filled
      with 0xFD.
   2. If all page structs on a PT/PMD page are filled with 0xFD, the PT/PMD
      can be cleared. In this case, the page used as PT/PMD can be freed.

 With this patch applied, __remove_section() is unified into a single
 implementation, so the CONFIG_SPARSEMEM_VMEMMAP version of __remove_section()
 is deleted.

 CC: David Rientjes rient...@google.com
 CC: Jiang Liu liu...@gmail.com
 CC: Len Brown len.br...@intel.com
 CC: Benjamin Herrenschmidt b...@kernel.crashing.org
 CC: Paul Mackerras pau...@samba.org
 CC: Christoph Lameter c...@linux.com
 Cc: Minchan Kim minchan@gmail.com
 CC: Andrew Morton a...@linux-foundation.org
 CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
 CC: Wen Congyang we...@cn.fujitsu.com
 Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

 ---
   arch/x86/mm/init_64.c |  121 
 ++
   include/linux/mm.h|2
   mm/memory_hotplug.c   |   19 ---
   mm/sparse.c   |5 +-
   4 files changed, 128 insertions(+), 19 deletions(-)

 Index: linux-3.5-rc6/include/linux/mm.h
 ===
 --- linux-3.5-rc6.orig/include/linux/mm.h2012-07-18 18:01:28.0 
 +0900
 +++ linux-3.5-rc6/include/linux/mm.h 2012-07-18 18:03:05.551168773 +0900
 @@ -1588,6 +1588,8 @@ int vmemmap_populate(struct page *start_
   void vmemmap_populate_print_last(void);
   void register_page_bootmem_memmap(unsigned long section_nr, struct page 
 *map,
unsigned long size);
 +void vmemmap_kfree(struct page *memmpa, unsigned long nr_pages);
 +void vmemmap_free_bootmem(struct page *memmpa, unsigned long nr_pages);
   
   enum mf_flags {
  MF_COUNT_INCREASED = 1 << 0,
 Index: linux-3.5-rc6/mm/sparse.c
 ===
 --- linux-3.5-rc6.orig/mm/sparse.c   2012-07-18 17:59:25.0 +0900
 +++ linux-3.5-rc6/mm/sparse.c2012-07-18 18:03:05.553168749 +0900
 @@ -614,12 +614,13 @@ static inline struct page *kmalloc_secti
  /* This will make the necessary allocations eventually. */
  return sparse_mem_map_populate(pnum, nid);
   }
 -static void __kfree_section_memmap(struct page *memmap, unsigned long 
 nr_pages)
 +static void __kfree_section_memmap(struct page *page, unsigned long 
 nr_pages)
   {
 -return; /* XXX: Not implemented yet */
 +vmemmap_kfree(page, nr_pages);
   }
   static void free_map_bootmem(struct page *page, unsigned long nr_pages)
   {
 +vmemmap_free_bootmem(page, nr_pages);
   }
   #else
   static struct page *__kmalloc_section_memmap(unsigned long nr_pages)
 Index: linux-3.5-rc6/arch/x86/mm/init_64.c
 ===
 --- linux-3.5-rc6.orig/arch/x86/mm/init_64.c 2012-07-18 18:01:28.0 
 +0900
 +++ linux-3.5-rc6/arch/x86/mm/init_64.c  2012-07-18 18:03:05.564168611 
 +0900
 @@ -978,6 +978,127 @@ vmemmap_populate(struct page *start_page
  return 0;
   }
   
 +#define PAGE_INUSE 0xFD
 +
 +unsigned long find_and_clear_pte_page(unsigned long addr, unsigned long end,
 +struct page **pp, int *page_size)
 +{
 +pgd_t *pgd;
 +pud_t *pud;
 +pmd_t *pmd;
 +pte_t *pte;
 +void *page_addr;
 +unsigned long next;
 +
 +*pp = NULL;
 +
 +pgd = pgd_offset_k(addr);
 +if (pgd_none(*pgd))
 +return pgd_addr_end(addr, end);
 +
 +pud = pud_offset(pgd, addr);
 +if (pud_none(*pud))
 +return pud_addr_end(addr,end);
 +
 +if (!cpu_has_pse) {
 +next = (addr + PAGE_SIZE) & PAGE_MASK;
 +pmd = pmd_offset(pud, addr);
 +if (pmd_none(*pmd))
 +return next;
 +
 +pte = pte_offset_kernel(pmd, addr);
 +if (pte_none(*pte))
 +return next;
 +
 +*page_size = PAGE_SIZE;
 +*pp = pte_page(*pte);
 +} else {
 +next = pmd_addr_end(addr, end);
 +
 +pmd = pmd_offset(pud, addr);
 +if (pmd_none(*pmd))
 +return next;
 +
 +*page_size = PMD_SIZE;
 +*pp = pmd_page(*pmd);
 +}
 +
 +/*
 + * Removed page structs are filled with 0xFD.
 + */
 +memset((void *)addr, PAGE_INUSE, next - addr);
 +
 +page_addr = page_address(*pp);
 +
 +/*
 + * Check the page is filled with 0xFD or not.
 + * memchr_inv() returns the address. In this case, we cannot
 + * clear PTE/PUD entry, since the page is used by other

[RESEND RFC PATCH v4 11/13] memory-hotplug : free memmap of sparse-vmemmap

2012-07-19 Thread Yasuaki Ishimatsu
Not all pages of the virtual mapping for removed memory can be freed, since
some pages used as PGD/PUD cover not only removed memory but also other
memory. So the patch checks whether a page can be freed or not.

How to check whether a page can be freed or not?
 1. When removing memory, the page structs of the removed memory are filled
    with 0xFD.
 2. If all page structs on a PT/PMD page are filled with 0xFD, the PT/PMD
    can be cleared. In this case, the page used as PT/PMD can be freed.
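The two steps above can be sketched in userspace C. This is a minimal illustration under stated assumptions, not the patch's code: first_not_matching() is a hypothetical stand-in for the kernel's memchr_inv(), and SKETCH_PAGE_SIZE, mark_removed() and page_is_unused() are names invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_INUSE 0xFD          /* marker byte named in the patch */
#define SKETCH_PAGE_SIZE 4096    /* assumed page size for this sketch */

/* Hypothetical stand-in for memchr_inv(): return the first byte that
 * differs from c, or NULL when all n bytes equal c. */
const void *first_not_matching(const void *buf, int c, size_t n)
{
    const unsigned char *p = buf;
    size_t i;

    for (i = 0; i < n; i++)
        if (p[i] != (unsigned char)c)
            return p + i;
    return NULL;
}

/* Step 1: overwrite the part of the page that held removed page structs. */
void mark_removed(unsigned char *page, size_t off, size_t len)
{
    assert(off <= SKETCH_PAGE_SIZE && len <= SKETCH_PAGE_SIZE - off);
    memset(page + off, PAGE_INUSE, len);
}

/* Step 2: the page may be freed only if every byte carries the marker. */
int page_is_unused(const unsigned char *page)
{
    return first_not_matching(page, PAGE_INUSE, SKETCH_PAGE_SIZE) == NULL;
}
```

A page that backs page structs for both removed and still-present memory keeps some bytes that are not 0xFD, so page_is_unused() stays false until the last user of that page is removed.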

With this patch applied, __remove_section() is unified into a single
implementation, so the CONFIG_SPARSEMEM_VMEMMAP version of __remove_section()
is deleted.

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org 
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com 
CC: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

---
 arch/x86/mm/init_64.c |  121 ++
 include/linux/mm.h|2 
 mm/memory_hotplug.c   |   17 ---
 mm/sparse.c   |5 +-
 4 files changed, 128 insertions(+), 17 deletions(-)

Index: linux-3.5-rc6/include/linux/mm.h
===
--- linux-3.5-rc6.orig/include/linux/mm.h   2012-07-19 15:07:48.836986796 
+0900
+++ linux-3.5-rc6/include/linux/mm.h2012-07-19 15:07:59.101858469 +0900
@@ -1588,6 +1588,8 @@ int vmemmap_populate(struct page *start_
 void vmemmap_populate_print_last(void);
 void register_page_bootmem_memmap(unsigned long section_nr, struct page *map,
  unsigned long size);
+void vmemmap_kfree(struct page *memmpa, unsigned long nr_pages);
+void vmemmap_free_bootmem(struct page *memmpa, unsigned long nr_pages);
 
 enum mf_flags {
MF_COUNT_INCREASED = 1 << 0,
Index: linux-3.5-rc6/mm/sparse.c
===
--- linux-3.5-rc6.orig/mm/sparse.c  2012-07-19 11:57:09.065797011 +0900
+++ linux-3.5-rc6/mm/sparse.c   2012-07-19 15:07:59.114858306 +0900
@@ -614,12 +614,13 @@ static inline struct page *kmalloc_secti
/* This will make the necessary allocations eventually. */
return sparse_mem_map_populate(pnum, nid);
 }
-static void __kfree_section_memmap(struct page *memmap, unsigned long nr_pages)
+static void __kfree_section_memmap(struct page *page, unsigned long nr_pages)
 {
-   return; /* XXX: Not implemented yet */
+   vmemmap_kfree(page, nr_pages);
 }
 static void free_map_bootmem(struct page *page, unsigned long nr_pages)
 {
+   vmemmap_free_bootmem(page, nr_pages);
 }
 #else
 static struct page *__kmalloc_section_memmap(unsigned long nr_pages)
Index: linux-3.5-rc6/arch/x86/mm/init_64.c
===
--- linux-3.5-rc6.orig/arch/x86/mm/init_64.c2012-07-19 15:07:48.898986022 
+0900
+++ linux-3.5-rc6/arch/x86/mm/init_64.c 2012-07-19 15:14:05.870273270 +0900
@@ -978,6 +978,127 @@ vmemmap_populate(struct page *start_page
return 0;
 }
 
+#define PAGE_INUSE 0xFD
+
+unsigned long find_and_clear_pte_page(unsigned long addr, unsigned long end,
+   struct page **pp, int *page_size)
+{
+   pgd_t *pgd;
+   pud_t *pud;
+   pmd_t *pmd;
+   pte_t *pte;
+   void *page_addr;
+   unsigned long next;
+
+   *pp = NULL;
+
+   pgd = pgd_offset_k(addr);
+   if (pgd_none(*pgd))
+   return pgd_addr_end(addr, end);
+
+   pud = pud_offset(pgd, addr);
+   if (pud_none(*pud))
+   return pud_addr_end(addr, end);
+
+   if (!cpu_has_pse) {
+   next = (addr + PAGE_SIZE) & PAGE_MASK;
+   pmd = pmd_offset(pud, addr);
+   if (pmd_none(*pmd))
+   return next;
+
+   pte = pte_offset_kernel(pmd, addr);
+   if (pte_none(*pte))
+   return next;
+
+   *page_size = PAGE_SIZE;
+   *pp = pte_page(*pte);
+   } else {
+   next = pmd_addr_end(addr, end);
+
+   pmd = pmd_offset(pud, addr);
+   if (pmd_none(*pmd))
+   return next;
+
+   *page_size = PMD_SIZE;
+   *pp = pmd_page(*pmd);
+   }
+
+   /*
+* Removed page structs are filled with 0xFD.
+*/
+   memset((void *)addr, PAGE_INUSE, next - addr);
+
+   page_addr = page_address(*pp);
+
+   /*
+* Check the page is filled with 0xFD or not.
+* memchr_inv() returns the address. In this case, we cannot
+* clear PTE/PUD entry, since the page is used by other.
+* So we cannot also free the page.
+*
+* memchr_inv() returns NULL. In this case, we

Re: [RFC PATCH v4 1/13] memory-hotplug : rename remove_memory to offline_memory

2012-07-19 Thread Yasuaki Ishimatsu

Hi Bob,

2012/07/19 17:19, Bob Liu wrote:

Hi Yasuaki,

On Wed, Jul 18, 2012 at 6:05 PM, Yasuaki Ishimatsu
isimatu.yasu...@jp.fujitsu.com wrote:

remove_memory() does not remove memory but just offlines memory. The patch
changes name of it to offline_memory().


Since offline_memory() just aligns the start/end pfn and there is no
matching online_memory() function,
I think it's better to remove this function and move the alignment into
offline_pages().


If we change it, the arguments of the two functions become inconsistent, as follows:

  online_pages  : page frame number and number of page frames
  offline_pages : memory address and memory length

I think that is ugly, so I don't want to change it. As you say, there is no
function that matches offline_memory(). If we create an exported
function for onlining pages, it should be named
online_memory().
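The difference between the two argument styles can be sketched in userspace C. This is only an illustration: struct pfn_range and addr_to_pfn_range() are hypothetical names, a 4 KiB page size is assumed, and the round-down policy is an assumption of this sketch rather than a statement about the kernel's conversion.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12   /* assumed 4 KiB pages */

/* Hypothetical page-frame range: [start_pfn, end_pfn). */
struct pfn_range {
    uint64_t start_pfn;
    uint64_t end_pfn;
};

/* Convert a (memory address, memory length) pair into the
 * (page frame number, page frame number) pair that a pfn-based
 * interface expects. Both values are rounded down to whole pages. */
struct pfn_range addr_to_pfn_range(uint64_t start, uint64_t size)
{
    struct pfn_range r;

    assert(start + size >= start);   /* reject wraparound */
    r.start_pfn = start >> SKETCH_PAGE_SHIFT;
    r.end_pfn = r.start_pfn + (size >> SKETCH_PAGE_SHIFT);
    return r;
}
```

Keeping this conversion in a thin address-based wrapper lets the pfn-based function keep the same argument style as its online counterpart.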

Thanks,
Yasuaki Ishimatsu





CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
CC: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

---
  drivers/acpi/acpi_memhotplug.c |2 +-
  drivers/base/memory.c  |4 ++--
  include/linux/memory_hotplug.h |2 +-
  mm/memory_hotplug.c|6 +++---
  4 files changed, 7 insertions(+), 7 deletions(-)

Index: linux-3.5-rc4/drivers/acpi/acpi_memhotplug.c
===
--- linux-3.5-rc4.orig/drivers/acpi/acpi_memhotplug.c   2012-07-03 
14:21:46.102416917 +0900
+++ linux-3.5-rc4/drivers/acpi/acpi_memhotplug.c2012-07-03 
14:21:49.458374960 +0900
@@ -318,7 +318,7 @@ static int acpi_memory_disable_device(st
  */
 list_for_each_entry_safe(info, n, &mem_device->res_list, list) {
 if (info->enabled) {
-   result = remove_memory(info->start_addr, info->length);
+   result = offline_memory(info->start_addr, info->length);
 if (result)
 return result;
 }
Index: linux-3.5-rc4/drivers/base/memory.c
===
--- linux-3.5-rc4.orig/drivers/base/memory.c2012-07-03 14:21:46.095417003 
+0900
+++ linux-3.5-rc4/drivers/base/memory.c 2012-07-03 14:21:49.459374948 +0900
@@ -266,8 +266,8 @@ memory_block_action(unsigned long phys_i
 break;
 case MEM_OFFLINE:
 start_paddr = page_to_pfn(first_page) << PAGE_SHIFT;
-   ret = remove_memory(start_paddr,
-   nr_pages << PAGE_SHIFT);
+   ret = offline_memory(start_paddr,
+nr_pages << PAGE_SHIFT);
 break;
 default:
 WARN(1, KERN_WARNING %s(%ld, %ld) unknown action: 
Index: linux-3.5-rc4/mm/memory_hotplug.c
===
--- linux-3.5-rc4.orig/mm/memory_hotplug.c  2012-07-03 14:21:46.102416917 
+0900
+++ linux-3.5-rc4/mm/memory_hotplug.c   2012-07-03 14:21:49.466374860 +0900
@@ -990,7 +990,7 @@ out:
 return ret;
  }

-int remove_memory(u64 start, u64 size)
+int offline_memory(u64 start, u64 size)
  {
 unsigned long start_pfn, end_pfn;

@@ -999,9 +999,9 @@ int remove_memory(u64 start, u64 size)
 return offline_pages(start_pfn, end_pfn, 120 * HZ);
  }
  #else
-int remove_memory(u64 start, u64 size)
+int offline_memory(u64 start, u64 size)
  {
 return -EINVAL;
  }
  #endif /* CONFIG_MEMORY_HOTREMOVE */
-EXPORT_SYMBOL_GPL(remove_memory);
+EXPORT_SYMBOL_GPL(offline_memory);
Index: linux-3.5-rc4/include/linux/memory_hotplug.h
===
--- linux-3.5-rc4.orig/include/linux/memory_hotplug.h   2012-07-03 
14:21:46.102416917 +0900
+++ linux-3.5-rc4/include/linux/memory_hotplug.h2012-07-03 
14:21:49.471374796 +0900
@@ -233,7 +233,7 @@ static inline int is_mem_section_removab
  extern int mem_online_node(int nid);
  extern int add_memory(int nid, u64 start, u64 size);
  extern int arch_add_memory(int nid, u64 start, u64 size);
-extern int remove_memory(u64 start, u64 size);
+extern int offline_memory(u64 start, u64 size);
  extern int sparse_add_one_section(struct zone *zone, unsigned long start_pfn,
 int nr_pages);
  extern void sparse_remove_one_section(struct zone *zone, struct mem_section 
*ms);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majord...@kvack.org.  For more info

Re: [RFC PATCH v4 7/13] memory-hotplug : remove_memory calls __remove_pages

2012-07-19 Thread Yasuaki Ishimatsu

Hi Bob,

2012/07/19 17:32, Bob Liu wrote:

On Wed, Jul 18, 2012 at 6:12 PM, Yasuaki Ishimatsu
isimatu.yasu...@jp.fujitsu.com wrote:

The patch adds __remove_pages() to remove_memory(). The range given by the
phys_start_pfn and nr_pages arguments of __remove_pages() may then
span different zones. So the zone argument is removed from __remove_pages(),
and __remove_pages() calculates the zone for each section.

When CONFIG_SPARSEMEM_VMEMMAP is defined, there is no way to remove a memmap.
So __remove_section() only calls unregister_memory_section().
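The per-section zone lookup can be sketched in userspace C. This is a toy model under stated assumptions, not the kernel's code: the section size, the zone boundary, sketch_zone_of() and sketch_remove_pages() are all invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGES_PER_SECTION 32768UL    /* assumed section size in pages */
#define SKETCH_ZONE_BOUNDARY_PFN 1048576UL  /* assumed: zone 0 ends here */

/* Hypothetical zone lookup: zone 0 below the boundary, zone 1 above it. */
int sketch_zone_of(unsigned long pfn)
{
    return pfn < SKETCH_ZONE_BOUNDARY_PFN ? 0 : 1;
}

/* Walk the range one section at a time and look the zone up per section,
 * instead of taking a single zone argument for the whole range.
 * Records the zone of each visited section and returns the count. */
int sketch_remove_pages(unsigned long start_pfn, unsigned long nr_pages,
                        int *zones, int max)
{
    unsigned long i, n;

    assert(zones != NULL && max >= 0);
    /* Only entire sections can be handled. */
    assert(start_pfn % SKETCH_PAGES_PER_SECTION == 0);
    assert(nr_pages % SKETCH_PAGES_PER_SECTION == 0);

    n = nr_pages / SKETCH_PAGES_PER_SECTION;
    for (i = 0; i < n && i < (unsigned long)max; i++)
        zones[i] = sketch_zone_of(start_pfn + i * SKETCH_PAGES_PER_SECTION);
    return (int)i;
}
```

With a single zone argument, a range that crosses the boundary would attribute its second section to the wrong zone; the per-section lookup avoids that.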

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
CC: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

---
  arch/powerpc/platforms/pseries/hotplug-memory.c |5 +
  include/linux/memory_hotplug.h  |3 +--
  mm/memory_hotplug.c |   19 ---
  3 files changed, 14 insertions(+), 13 deletions(-)

Index: linux-3.5-rc6/mm/memory_hotplug.c
===
--- linux-3.5-rc6.orig/mm/memory_hotplug.c  2012-07-18 18:00:27.440145432 
+0900
+++ linux-3.5-rc6/mm/memory_hotplug.c   2012-07-18 18:01:02.070712487 +0900
@@ -275,11 +275,14 @@ static int __meminit __add_section(int n
  #ifdef CONFIG_SPARSEMEM_VMEMMAP
  static int __remove_section(struct zone *zone, struct mem_section *ms)
  {
-   /*
-* XXX: Freeing memmap with vmemmap is not implement yet.
-*  This should be removed later.
-*/
-   return -EBUSY;
+   int ret = -EINVAL;
+
+   if (!valid_section(ms))
+   return ret;
+
+   ret = unregister_memory_section(ms);
+


I saw a patch from Jiang Liu, "mm/hotplug: free zone->pageset when a
zone becomes empty", which
frees the zone->pageset, and I think more cleanup may be needed when
a zone becomes empty.

We already have __add_zone() in __add_section(); what about adding a
function like __remove_zone()
to do the cleanup here?


Thank you for your comment. As you say, I think a cleanup function for the zone
is necessary. So I'll update it.

Thanks,
Yasuaki Ishimatsu.




+   return ret;
  }
  #else
  static int __remove_section(struct zone *zone, struct mem_section *ms)
@@ -346,11 +349,11 @@ EXPORT_SYMBOL_GPL(__add_pages);
   * sure that pages are marked reserved and zones are adjust properly by
   * calling offline_pages().
   */
-int __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
-unsigned long nr_pages)
+int __remove_pages(unsigned long phys_start_pfn, unsigned long nr_pages)
  {
 unsigned long i, ret = 0;
 int sections_to_remove;
+   struct zone *zone;

 /*
  * We can only remove entire sections
@@ -363,6 +366,7 @@ int __remove_pages(struct zone *zone, un
 sections_to_remove = nr_pages / PAGES_PER_SECTION;
 for (i = 0; i < sections_to_remove; i++) {
 unsigned long pfn = phys_start_pfn + i*PAGES_PER_SECTION;
+   zone = page_zone(pfn_to_page(pfn));
 ret = __remove_section(zone, __pfn_to_section(pfn));
 if (ret)
 break;
@@ -1031,6 +1035,7 @@ int __ref remove_memory(int nid, u64 sta
 /* remove memmap entry */
 firmware_map_remove(start, start + size, "System RAM");

+   __remove_pages(start >> PAGE_SHIFT, size >> PAGE_SHIFT);
  out:
 unlock_memory_hotplug();
 return ret;
Index: linux-3.5-rc6/include/linux/memory_hotplug.h
===
--- linux-3.5-rc6.orig/include/linux/memory_hotplug.h   2012-07-18 
18:00:27.445145371 +0900
+++ linux-3.5-rc6/include/linux/memory_hotplug.h2012-07-18 
18:00:40.461982690 +0900
@@ -89,8 +89,7 @@ extern bool is_pageblock_removable_noloc
  /* reasonably generic interface to expand the physical pages in a zone  */
  extern int __add_pages(int nid, struct zone *zone, unsigned long start_pfn,
 unsigned long nr_pages);
-extern int __remove_pages(struct zone *zone, unsigned long start_pfn,
-   unsigned long nr_pages);
+extern int __remove_pages(unsigned long start_pfn, unsigned long nr_pages);

  #ifdef CONFIG_NUMA
  extern int memory_add_physaddr_to_nid(u64 start);
Index: linux-3.5-rc6/arch/powerpc/platforms/pseries/hotplug-memory.c
===
--- linux-3.5-rc6.orig/arch/powerpc/platforms/pseries/hotplug-memory.c  
2012-07-18 18:00:27.442145407 +0900
+++ linux-3.5-rc6/arch/powerpc/platforms/pseries/hotplug-memory.c   
2012-07-18 18:00:40.470982578 +0900
@@ -76,7 +76,6 @@ unsigned long memory_block_size_bytes(vo

Re: [RFC PATCH v4 2/13] memory-hotplug : add physical memory hotplug code to acpi_memory_device_remove

2012-07-19 Thread Yasuaki Ishimatsu
Hi Wen,

2012/07/19 16:23, Wen Congyang wrote:
 At 07/18/2012 06:06 PM, Yasuaki Ishimatsu Wrote:
 acpi_memory_device_remove() has been prepared to remove physical memory.
 But the function currently only frees acpi_memory_device.

 The patch adds the following steps to acpi_memory_device_remove():
- offline memory
- remove physical memory. It only checks whether memory is online or not.
- free acpi_memory_device

 CC: David Rientjes rient...@google.com
 CC: Jiang Liu liu...@gmail.com
 CC: Len Brown len.br...@intel.com
 CC: Benjamin Herrenschmidt b...@kernel.crashing.org
 CC: Paul Mackerras pau...@samba.org
 CC: Christoph Lameter c...@linux.com
 Cc: Minchan Kim minchan@gmail.com
 CC: Andrew Morton a...@linux-foundation.org
 CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
 CC: Wen Congyang we...@cn.fujitsu.com
 Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

 ---
   drivers/acpi/acpi_memhotplug.c |   27 ++-
   drivers/base/memory.c  |   39 
 +++
   include/linux/memory.h |5 +
   include/linux/memory_hotplug.h |5 +
   mm/memory_hotplug.c|   22 ++
   5 files changed, 97 insertions(+), 1 deletion(-)

 Index: linux-3.5-rc6/drivers/acpi/acpi_memhotplug.c
 ===
 --- linux-3.5-rc6.orig/drivers/acpi/acpi_memhotplug.c2012-07-17 
 11:20:15.117796971 +0900
 +++ linux-3.5-rc6/drivers/acpi/acpi_memhotplug.c 2012-07-17 
 13:36:30.325594022 +0900
 @@ -29,6 +29,7 @@
   #include <linux/module.h>
   #include <linux/init.h>
   #include <linux/types.h>
 +#include <linux/memory.h>
   #include <linux/memory_hotplug.h>
   #include <linux/slab.h>
   #include <acpi/acpi_drivers.h>
 @@ -452,12 +453,36 @@ static int acpi_memory_device_add(struct
   static int acpi_memory_device_remove(struct acpi_device *device, int type)
   {
  struct acpi_memory_device *mem_device = NULL;
 -
 +struct acpi_memory_info *info, *tmp;
 +int result;
 +int node;
   
  if (!device || !acpi_driver_data(device))
  return -EINVAL;
   
  mem_device = acpi_driver_data(device);
 +
 +node = acpi_get_node(mem_device->device->handle);
 +list_for_each_entry_safe(info, tmp, &mem_device->res_list, list) {
 +if (!info->enabled)
 +continue;
 +
 +if (!is_memblk_offline(info->start_addr, info->length)) {
 +result = offline_memory(info->start_addr, info->length);
 +if (result)
 +return result;
 +}
 +if (node < 0)
 +node = memory_add_physaddr_to_nid(info->start_addr);
 +
 +result = remove_memory(node, info->start_addr, info->length);
 +if (result)
 +return result;
 +
 +list_del(&info->list);
 +kfree(info);
 +}
 +
  kfree(mem_device);
   
  return 0;
 Index: linux-3.5-rc6/include/linux/memory_hotplug.h
 ===
 --- linux-3.5-rc6.orig/include/linux/memory_hotplug.h2012-07-17 
 11:20:15.133796772 +0900
 +++ linux-3.5-rc6/include/linux/memory_hotplug.h 2012-07-17 
 11:29:41.490716352 +0900
 @@ -221,6 +221,7 @@ static inline void unlock_memory_hotplug
   #ifdef CONFIG_MEMORY_HOTREMOVE
   
   extern int is_mem_section_removable(unsigned long pfn, unsigned long 
 nr_pages);
 +extern int remove_memory(int nid, u64 start, u64 size);
   
   #else
   static inline int is_mem_section_removable(unsigned long pfn,
 @@ -228,6 +229,10 @@ static inline int is_mem_section_removab
   {
  return 0;
   }
 +static inline int remove_memory(int nid, u64 start, u64 size)
 +{
 +return -EBUSY;
 +}
   #endif /* CONFIG_MEMORY_HOTREMOVE */
   
   extern int mem_online_node(int nid);
 Index: linux-3.5-rc6/mm/memory_hotplug.c
 ===
 --- linux-3.5-rc6.orig/mm/memory_hotplug.c   2012-07-17 11:20:15.129796821 
 +0900
 +++ linux-3.5-rc6/mm/memory_hotplug.c2012-07-17 13:25:18.952986069 
 +0900
 @@ -998,6 +998,28 @@ int offline_memory(u64 start, u64 size)
  end_pfn = start_pfn + PFN_DOWN(size);
  return offline_pages(start_pfn, end_pfn, 120 * HZ);
   }
 +
 +int remove_memory(int nid, u64 start, u64 size)
 +{
 +int ret = -EBUSY;
 +lock_memory_hotplug();
 +/*
 + * The memory might be onlined by another task, even if you offline it.
 + * So we check whether the memory has been onlined or not.
 + */
 +if (!is_memblk_offline(start, size)) {
 +pr_warn("memory removing [mem %#010llx-%#010llx] failed, "
 +"because the memory range is online\n",
 +start, start + size);
 +ret = -EAGAIN;
 +}
 +
 +unlock_memory_hotplug();
 +return ret;
 +
 +}
 +EXPORT_SYMBOL_GPL(remove_memory);
 +
   #else

Re: [RFC PATCH v4 11/13] memory-hotplug : free memmap of sparse-vmemmap

2012-07-19 Thread Yasuaki Ishimatsu
Hi Wen,

2012/07/19 18:45, Wen Congyang wrote:
 At 07/18/2012 06:16 PM, Yasuaki Ishimatsu Wrote:
 Not all pages of the virtual mapping in removed memory can be freed, since some
 pages used as PGD/PUD cover not only removed memory but also other memory. So
 the patch checks whether a page can be freed or not.

 How to check whether a page can be freed or not?
   1. When removing memory, the page structs of the removed memory are filled
  with 0xFD.
   2. If all page structs on a PT/PMD are filled with 0xFD, the PT/PMD can be
  cleared. In this case, the page used as PT/PMD can be freed.
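
 The two-step check above can be sketched in user space. This is only an
 illustration under stated assumptions: the names model_mark_removed() and
 model_page_fully_removed() are hypothetical, a plain byte buffer stands in
 for the page that holds page structs, and a simple loop stands in for the
 memchr_inv() call that the patch itself uses.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Marker byte for removed page structs, mirroring PAGE_INUSE in the patch. */
#define MODEL_PAGE_INUSE 0xFD

/* Step 1: fill the part of the backing page that held removed page structs. */
void model_mark_removed(unsigned char *page, size_t off, size_t len)
{
	memset(page + off, MODEL_PAGE_INUSE, len);
}

/*
 * Step 2: the backing page may be freed only when every byte carries the
 * marker, i.e. nothing else on the page is still in use.
 */
bool model_page_fully_removed(const unsigned char *page, size_t page_size)
{
	for (size_t i = 0; i < page_size; i++)
		if (page[i] != MODEL_PAGE_INUSE)
			return false;
	return true;
}
```

 A page that is only partly marked stays allocated; it becomes a candidate
 for freeing once the remaining ranges that share it have been removed too.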

 With this patch applied, the two variants of __remove_section() are merged
 into one, so the CONFIG_SPARSEMEM_VMEMMAP variant is deleted.

 CC: David Rientjes rient...@google.com
 CC: Jiang Liu liu...@gmail.com
 CC: Len Brown len.br...@intel.com
 CC: Benjamin Herrenschmidt b...@kernel.crashing.org
 CC: Paul Mackerras pau...@samba.org
 CC: Christoph Lameter c...@linux.com
 Cc: Minchan Kim minchan@gmail.com
 CC: Andrew Morton a...@linux-foundation.org
 CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
 CC: Wen Congyang we...@cn.fujitsu.com
 Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

 ---
   arch/x86/mm/init_64.c |  121 
 ++
   include/linux/mm.h|2
   mm/memory_hotplug.c   |   19 ---
   mm/sparse.c   |5 +-
   4 files changed, 128 insertions(+), 19 deletions(-)

 Index: linux-3.5-rc6/include/linux/mm.h
 ===
 --- linux-3.5-rc6.orig/include/linux/mm.h2012-07-18 18:01:28.0 
 +0900
 +++ linux-3.5-rc6/include/linux/mm.h 2012-07-18 18:03:05.551168773 +0900
 @@ -1588,6 +1588,8 @@ int vmemmap_populate(struct page *start_
   void vmemmap_populate_print_last(void);
   void register_page_bootmem_memmap(unsigned long section_nr, struct page 
 *map,
unsigned long size);
 +void vmemmap_kfree(struct page *memmpa, unsigned long nr_pages);
 +void vmemmap_free_bootmem(struct page *memmpa, unsigned long nr_pages);
   
   enum mf_flags {
  MF_COUNT_INCREASED = 1 << 0,
 Index: linux-3.5-rc6/mm/sparse.c
 ===
 --- linux-3.5-rc6.orig/mm/sparse.c   2012-07-18 17:59:25.0 +0900
 +++ linux-3.5-rc6/mm/sparse.c2012-07-18 18:03:05.553168749 +0900
 @@ -614,12 +614,13 @@ static inline struct page *kmalloc_secti
  /* This will make the necessary allocations eventually. */
  return sparse_mem_map_populate(pnum, nid);
   }
 -static void __kfree_section_memmap(struct page *memmap, unsigned long 
 nr_pages)
 +static void __kfree_section_memmap(struct page *page, unsigned long 
 nr_pages)
   {
 -return; /* XXX: Not implemented yet */
 +vmemmap_kfree(page, nr_pages);
   }
   static void free_map_bootmem(struct page *page, unsigned long nr_pages)
   {
 +vmemmap_free_bootmem(page, nr_pages);
   }
   #else
   static struct page *__kmalloc_section_memmap(unsigned long nr_pages)
 Index: linux-3.5-rc6/arch/x86/mm/init_64.c
 ===
 --- linux-3.5-rc6.orig/arch/x86/mm/init_64.c 2012-07-18 18:01:28.0 
 +0900
 +++ linux-3.5-rc6/arch/x86/mm/init_64.c  2012-07-18 18:03:05.564168611 
 +0900
 @@ -978,6 +978,127 @@ vmemmap_populate(struct page *start_page
  return 0;
   }
   
 +#define PAGE_INUSE 0xFD
 +
 +unsigned long find_and_clear_pte_page(unsigned long addr, unsigned long end,
 +struct page **pp, int *page_size)
 +{
 +pgd_t *pgd;
 +pud_t *pud;
 +pmd_t *pmd;
 +pte_t *pte;
 +void *page_addr;
 +unsigned long next;
 +
 +*pp = NULL;
 +
 +pgd = pgd_offset_k(addr);
 +if (pgd_none(*pgd))
 +return pgd_addr_end(addr, end);
 +
 +pud = pud_offset(pgd, addr);
 +if (pud_none(*pud))
 +return pud_addr_end(addr,end);
 +
 +if (!cpu_has_pse) {
 +next = (addr + PAGE_SIZE) & PAGE_MASK;
 +pmd = pmd_offset(pud, addr);
 +if (pmd_none(*pmd))
 +return next;
 +
 +pte = pte_offset_kernel(pmd, addr);
 +if (pte_none(*pte))
 +return next;
 +
 +*page_size = PAGE_SIZE;
 +*pp = pte_page(*pte);
 +} else {
 +next = pmd_addr_end(addr, end);
 +
 +pmd = pmd_offset(pud, addr);
 +if (pmd_none(*pmd))
 +return next;
 +
 +*page_size = PMD_SIZE;
 +*pp = pmd_page(*pmd);
 +}
 +
 +/*
 + * Removed page structs are filled with 0xFD.
 + */
 +memset((void *)addr, PAGE_INUSE, next - addr);
 +
 +page_addr = page_address(*pp);
 +
 +/*
 + * Check the page is filled with 0xFD or not.
 + * memchr_inv() returns the address. In this case, we cannot
 + * clear PTE/PUD entry, since the page is used by other

[RFC PATCH v4 0/13] memory-hotplug : hot-remove physical memory

2012-07-18 Thread Yasuaki Ishimatsu
This patch series aims to support physical memory hot-remove.

  [RFC PATCH v4 1/13] memory-hotplug : rename remove_memory to offline_memory
  [RFC PATCH v4 2/13] memory-hotplug : add physical memory hotplug code to 
acpi_memory_device_remove
  [RFC PATCH v4 3/13] memory-hotplug : check whether memory is present or not
  [RFC PATCH v4 4/13] memory-hotplug : remove /sys/firmware/memmap/X sysfs
  [RFC PATCH v4 5/13] memory-hotplug : does not release memory region in 
PAGES_PER_SECTION chunks
  [RFC PATCH v4 6/13] memory-hotplug : add memory_block_release
  [RFC PATCH v4 7/13] memory-hotplug : remove_memory calls __remove_pages
  [RFC PATCH v4 8/13] memory-hotplug : check page type in get_page_bootmem
  [RFC PATCH v4 9/13] memory-hotplug : move register_page_bootmem_info_node and 
put_page_bootmem for sparse-vmemmap
  [RFC PATCH v4 10/13] memory-hotplug : implement 
register_page_bootmem_info_section of sparse-vmemmap
  [RFC PATCH v4 11/13] memory-hotplug : free memmap of sparse-vmemmap
  [RFC PATCH v4 12/13] memory-hotplug : add node_device_release
  [RFC PATCH v4 13/13] memory-hotplug : remove sysfs file of node

Even if you apply these patches, you cannot remove the physical memory
completely since these patches are still under development. But other
components can be removed. I want you to cooperate to improve the
physical memory hot-remove. So please review these patches and give
your comment/idea.

The patches can free/remove following things:

  - acpi_memory_info  : [RFC PATCH 2/13]
  - /sys/firmware/memmap/X/{end, start, type} : [RFC PATCH 4/13]
  - iomem_resource: [RFC PATCH 5/13]
  - mem_section and related sysfs files   : [RFC PATCH 6-11/13]
  - node and related sysfs files  : [RFC PATCH 12-13/13]

The patches cannot do following things yet:

  - page table of removed memory

If you find any functionality missing for physical memory hot-remove, please
let me know.

change log of v4:
 * remove memory-hotplug : unify argument of firmware_map_add_early/hotplug
   from the patch series, since the patch is a bugfix. It is being discussed
   on another thread. But for testing the patch series, the patch is needed.
   So I added the patch as [PATCH 0/13].

 [RFC PATCH v4 2/13]
   * check memory is online or not at remove_memory()
   * add memory_add_physaddr_to_nid() to acpi_memory_device_remove() for
 getting node id
 
 [RFC PATCH v4 3/13]
   * create new patch : check memory is online or not at online_pages()

 [RFC PATCH v4 4/13]
   * add __ref section to remove_memory()
   * call firmware_map_remove_entry() before remove_sysfs_fw_map_entry()

 [RFC PATCH v4 11/13]
   * rewrite register_page_bootmem_memmap() for removing page used as PT/PMD

change log of v3:
 * rebase to 3.5.0-rc6

 [RFC PATCH v2 2/13]
   * remove extra kobject_put()

   * The patch was commented by Wen. Wen's comment is
 acpi_memory_device_remove() should ignore a return value of
 remove_memory() since caller does not care the return value.
 But I did not change it since I think caller should care the
 return value. And I am trying to fix it as follow:

 https://lkml.org/lkml/2012/7/5/624

 [RFC PATCH v2 4/13]
   * remove a firmware_map_entry allocated by kzalloc()

change log of v2:
 [RFC PATCH v2 2/13]
   * check whether memory block is offline or not before calling 
offline_memory()
   * check whether section is valid or not in is_memblk_offline()
   * call kobject_put() for each memory_block in is_memblk_offline()

 [RFC PATCH v2 3/13]
   * unify the end argument of firmware_map_add_early/hotplug

 [RFC PATCH v2 4/13]
   * add release_firmware_map_entry() for freeing firmware_map_entry

 [RFC PATCH v2 6/13]
  * add release_memory_block() for freeing memory_block

 [RFC PATCH v2 11/13]
  * fix wrong arguments of free_pages()

---
 arch/powerpc/platforms/pseries/hotplug-memory.c |   16 +-
 arch/x86/mm/init_64.c   |  144 
 drivers/acpi/acpi_memhotplug.c  |   28 
 drivers/base/memory.c   |   54 -
 drivers/base/node.c |7 +
 drivers/firmware/memmap.c   |   78 -
 include/linux/firmware-map.h|6 +
 include/linux/memory.h  |5 
 include/linux/memory_hotplug.h  |   17 --
 include/linux/mm.h  |5 
 mm/memory_hotplug.c |   98 
 mm/sparse.c |5 
 12 files changed, 414 insertions(+), 49 deletions(-)

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[RFC PATCH 0/13] firmware_map : unify argument of firmware_map_add_early/hotplug

2012-07-18 Thread Yasuaki Ishimatsu
There are two ways to create /sys/firmware/memmap/X sysfs:

  - firmware_map_add_early
When the system starts, it is called from e820_reserve_resources()
  - firmware_map_add_hotplug
When the memory is hot plugged, it is called from add_memory()

But these functions are called with inconsistent values of the end argument:

  - end argument of firmware_map_add_early()   : start + size - 1
  - end argument of firmware_map_add_hotplug() : start + size

The patch unifies them to start + size. Applying the patch does not change
the content of the /sys/firmware/memmap/X/end file.
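
The convention can be illustrated with a small user-space sketch. The struct
and function names here are hypothetical stand-ins, not the kernel's own;
the sketch only shows why the sysfs value stays the same: callers pass an
exclusive end (start + size) and the entry stores end - 1.

```c
#include <stdint.h>

/* Hypothetical user-space stand-in for a firmware map entry. */
struct fw_range_model {
	uint64_t start; /* first byte of the range */
	uint64_t end;   /* last byte of the range (inclusive) */
};

/*
 * Build an entry under the unified calling convention: the caller passes
 * an exclusive end (start + size) and the entry stores end - 1.
 */
struct fw_range_model fw_range_from_exclusive_end(uint64_t start, uint64_t end)
{
	struct fw_range_model r = { start, end - 1 };
	return r;
}
```

For a 4 KiB range starting at 0x100000, the caller passes 0x101000 and the
stored (and exported) end is 0x100fff, the same value that the old
start + size - 1 convention produced.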

CC: Thomas Gleixner t...@linutronix.de
CC: Ingo Molnar mi...@kernel.org
CC: H. Peter Anvin h...@zytor.com
CC: Tejun Heo t...@kernel.org
CC: Andrew Morton a...@linux-foundation.org
Reviewed-by: Dave Hansen d...@linux.vnet.ibm.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

---
 arch/x86/kernel/e820.c|2 +-
 drivers/firmware/memmap.c |8 
 2 files changed, 5 insertions(+), 5 deletions(-)

Index: linux-3.5-rc6/arch/x86/kernel/e820.c
===
--- linux-3.5-rc6.orig/arch/x86/kernel/e820.c   2012-07-18 17:19:38.391365260 
+0900
+++ linux-3.5-rc6/arch/x86/kernel/e820.c2012-07-18 17:19:43.616300222 
+0900
@@ -944,7 +944,7 @@ void __init e820_reserve_resources(void)
for (i = 0; i < e820_saved.nr_map; i++) {
struct e820entry *entry = &e820_saved.map[i];
firmware_map_add_early(entry->addr,
-   entry->addr + entry->size - 1,
+   entry->addr + entry->size,
e820_type_to_string(entry->type));
}
 }
Index: linux-3.5-rc6/drivers/firmware/memmap.c
===
--- linux-3.5-rc6.orig/drivers/firmware/memmap.c2012-07-18 
17:19:38.388365299 +0900
+++ linux-3.5-rc6/drivers/firmware/memmap.c 2012-07-18 18:30:47.608390251 
+0900
@@ -98,7 +98,7 @@ static LIST_HEAD(map_entries);
 /**
  * firmware_map_add_entry() - Does the real work to add a firmware memmap 
entry.
  * @start: Start of the memory range.
- * @end:   End of the memory range (inclusive).
+ * @end:   End of the memory range.
  * @type:  Type of the memory range.
  * @entry: Pre-allocated (either kmalloc() or bootmem allocator), uninitialised
  * entry.
@@ -113,7 +113,7 @@ static int firmware_map_add_entry(u64 st
BUG_ON(start > end);
 
entry->start = start;
-   entry->end = end;
+   entry->end = end - 1;
entry->type = type;
INIT_LIST_HEAD(&entry->list);
kobject_init(&entry->kobj, &memmap_ktype);
@@ -148,7 +148,7 @@ static int add_sysfs_fw_map_entry(struct
  * firmware_map_add_hotplug() - Adds a firmware mapping entry when we do
  * memory hotplug.
  * @start: Start of the memory range.
- * @end:   End of the memory range (inclusive).
+ * @end:   End of the memory range.
  * @type:  Type of the memory range.
  *
  * Adds a firmware mapping entry. This function is for memory hotplug, it is
@@ -175,7 +175,7 @@ int __meminit firmware_map_add_hotplug(u
 /**
  * firmware_map_add_early() - Adds a firmware mapping entry.
  * @start: Start of the memory range.
- * @end:   End of the memory range (inclusive).
+ * @end:   End of the memory range.
  * @type:  Type of the memory range.
  *
  * Adds a firmware mapping entry. This function uses the bootmem allocator



[RFC PATCH v4 1/13] memory-hotplug : rename remove_memory to offline_memory

2012-07-18 Thread Yasuaki Ishimatsu
remove_memory() does not remove memory but just offlines memory. The patch
changes name of it to offline_memory().

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org 
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com 
CC: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

---
 drivers/acpi/acpi_memhotplug.c |2 +-
 drivers/base/memory.c  |4 ++--
 include/linux/memory_hotplug.h |2 +-
 mm/memory_hotplug.c|6 +++---
 4 files changed, 7 insertions(+), 7 deletions(-)

Index: linux-3.5-rc4/drivers/acpi/acpi_memhotplug.c
===
--- linux-3.5-rc4.orig/drivers/acpi/acpi_memhotplug.c   2012-07-03 
14:21:46.102416917 +0900
+++ linux-3.5-rc4/drivers/acpi/acpi_memhotplug.c2012-07-03 
14:21:49.458374960 +0900
@@ -318,7 +318,7 @@ static int acpi_memory_disable_device(st
 */
list_for_each_entry_safe(info, n, &mem_device->res_list, list) {
if (info->enabled) {
-   result = remove_memory(info->start_addr, info->length);
+   result = offline_memory(info->start_addr, info->length);
if (result)
return result;
}
Index: linux-3.5-rc4/drivers/base/memory.c
===
--- linux-3.5-rc4.orig/drivers/base/memory.c2012-07-03 14:21:46.095417003 
+0900
+++ linux-3.5-rc4/drivers/base/memory.c 2012-07-03 14:21:49.459374948 +0900
@@ -266,8 +266,8 @@ memory_block_action(unsigned long phys_i
break;
case MEM_OFFLINE:
start_paddr = page_to_pfn(first_page) << PAGE_SHIFT;
-   ret = remove_memory(start_paddr,
-   nr_pages << PAGE_SHIFT);
+   ret = offline_memory(start_paddr,
+nr_pages << PAGE_SHIFT);
break;
default:
WARN(1, KERN_WARNING "%s(%ld, %ld) unknown action: "
Index: linux-3.5-rc4/mm/memory_hotplug.c
===
--- linux-3.5-rc4.orig/mm/memory_hotplug.c  2012-07-03 14:21:46.102416917 
+0900
+++ linux-3.5-rc4/mm/memory_hotplug.c   2012-07-03 14:21:49.466374860 +0900
@@ -990,7 +990,7 @@ out:
return ret;
 }
 
-int remove_memory(u64 start, u64 size)
+int offline_memory(u64 start, u64 size)
 {
unsigned long start_pfn, end_pfn;
 
@@ -999,9 +999,9 @@ int remove_memory(u64 start, u64 size)
return offline_pages(start_pfn, end_pfn, 120 * HZ);
 }
 #else
-int remove_memory(u64 start, u64 size)
+int offline_memory(u64 start, u64 size)
 {
return -EINVAL;
 }
 #endif /* CONFIG_MEMORY_HOTREMOVE */
-EXPORT_SYMBOL_GPL(remove_memory);
+EXPORT_SYMBOL_GPL(offline_memory);
Index: linux-3.5-rc4/include/linux/memory_hotplug.h
===
--- linux-3.5-rc4.orig/include/linux/memory_hotplug.h   2012-07-03 
14:21:46.102416917 +0900
+++ linux-3.5-rc4/include/linux/memory_hotplug.h2012-07-03 
14:21:49.471374796 +0900
@@ -233,7 +233,7 @@ static inline int is_mem_section_removab
 extern int mem_online_node(int nid);
 extern int add_memory(int nid, u64 start, u64 size);
 extern int arch_add_memory(int nid, u64 start, u64 size);
-extern int remove_memory(u64 start, u64 size);
+extern int offline_memory(u64 start, u64 size);
 extern int sparse_add_one_section(struct zone *zone, unsigned long start_pfn,
int nr_pages);
 extern void sparse_remove_one_section(struct zone *zone, struct mem_section 
*ms);



[RFC PATCH v4 2/13] memory-hotplug : add physical memory hotplug code to acpi_memory_device_remove

2012-07-18 Thread Yasuaki Ishimatsu
acpi_memory_device_remove() has been prepared to remove physical memory.
But the function currently only frees acpi_memory_device.

The patch adds the following steps to acpi_memory_device_remove():
  - offline memory
  - remove physical memory. It only checks whether memory is online or not.
  - free acpi_memory_device
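
The order of these steps can be sketched as a user-space model. Everything
below is hypothetical: struct range_model and the model_*() functions only
stand in for acpi_memory_info, offline_memory() and remove_memory(), and the
sketch assumes that offlining always succeeds.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one memory range owned by the device. */
struct range_model {
	bool enabled;
	bool online;
	bool removed;
};

/* Stand-in for offline_memory(): always succeeds in this model. */
int model_offline(struct range_model *r)
{
	r->online = false;
	return 0;
}

/* Stand-in for remove_memory(): refuses ranges that are still online. */
int model_remove(struct range_model *r)
{
	if (r->online)
		return -EAGAIN;
	r->removed = true;
	return 0;
}

/* Walk all ranges: skip disabled ones, offline if needed, then remove. */
int model_device_remove(struct range_model *ranges, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		int ret;

		if (!ranges[i].enabled)
			continue;
		if (ranges[i].online) {
			ret = model_offline(&ranges[i]);
			if (ret)
				return ret;
		}
		ret = model_remove(&ranges[i]);
		if (ret)
			return ret;
	}
	return 0;
}
```

The model stops at the first failure and leaves later ranges untouched,
which matches the early returns in the patch.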

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org 
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com 
CC: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

---
 drivers/acpi/acpi_memhotplug.c |   27 ++-
 drivers/base/memory.c  |   39 +++
 include/linux/memory.h |5 +
 include/linux/memory_hotplug.h |5 +
 mm/memory_hotplug.c|   22 ++
 5 files changed, 97 insertions(+), 1 deletion(-)

Index: linux-3.5-rc6/drivers/acpi/acpi_memhotplug.c
===
--- linux-3.5-rc6.orig/drivers/acpi/acpi_memhotplug.c   2012-07-17 
11:20:15.117796971 +0900
+++ linux-3.5-rc6/drivers/acpi/acpi_memhotplug.c2012-07-17 
13:36:30.325594022 +0900
@@ -29,6 +29,7 @@
 #include <linux/module.h>
 #include <linux/init.h>
 #include <linux/types.h>
+#include <linux/memory.h>
 #include <linux/memory_hotplug.h>
 #include <linux/slab.h>
 #include <acpi/acpi_drivers.h>
@@ -452,12 +453,36 @@ static int acpi_memory_device_add(struct
 static int acpi_memory_device_remove(struct acpi_device *device, int type)
 {
struct acpi_memory_device *mem_device = NULL;
-
+   struct acpi_memory_info *info, *tmp;
+   int result;
+   int node;
 
if (!device || !acpi_driver_data(device))
return -EINVAL;
 
mem_device = acpi_driver_data(device);
+
+   node = acpi_get_node(mem_device->device->handle);
+   list_for_each_entry_safe(info, tmp, &mem_device->res_list, list) {
+   if (!info->enabled)
+   continue;
+
+   if (!is_memblk_offline(info->start_addr, info->length)) {
+   result = offline_memory(info->start_addr, info->length);
+   if (result)
+   return result;
+   }
+   if (node < 0)
+   node = memory_add_physaddr_to_nid(info->start_addr);
+
+   result = remove_memory(node, info->start_addr, info->length);
+   if (result)
+   return result;
+
+   list_del(&info->list);
+   kfree(info);
+   }
+
kfree(mem_device);
 
return 0;
Index: linux-3.5-rc6/include/linux/memory_hotplug.h
===
--- linux-3.5-rc6.orig/include/linux/memory_hotplug.h   2012-07-17 
11:20:15.133796772 +0900
+++ linux-3.5-rc6/include/linux/memory_hotplug.h2012-07-17 
11:29:41.490716352 +0900
@@ -221,6 +221,7 @@ static inline void unlock_memory_hotplug
 #ifdef CONFIG_MEMORY_HOTREMOVE
 
 extern int is_mem_section_removable(unsigned long pfn, unsigned long nr_pages);
+extern int remove_memory(int nid, u64 start, u64 size);
 
 #else
 static inline int is_mem_section_removable(unsigned long pfn,
@@ -228,6 +229,10 @@ static inline int is_mem_section_removab
 {
return 0;
 }
+static inline int remove_memory(int nid, u64 start, u64 size)
+{
+   return -EBUSY;
+}
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 
 extern int mem_online_node(int nid);
Index: linux-3.5-rc6/mm/memory_hotplug.c
===
--- linux-3.5-rc6.orig/mm/memory_hotplug.c  2012-07-17 11:20:15.129796821 
+0900
+++ linux-3.5-rc6/mm/memory_hotplug.c   2012-07-17 13:25:18.952986069 +0900
@@ -998,6 +998,28 @@ int offline_memory(u64 start, u64 size)
end_pfn = start_pfn + PFN_DOWN(size);
return offline_pages(start_pfn, end_pfn, 120 * HZ);
 }
+
+int remove_memory(int nid, u64 start, u64 size)
+{
+   int ret = -EBUSY;
+   lock_memory_hotplug();
+   /*
+* The memory might be onlined by another task, even if you offline it.
+* So we check whether the memory has been onlined or not.
+*/
+   if (!is_memblk_offline(start, size)) {
+   pr_warn("memory removing [mem %#010llx-%#010llx] failed, "
+   "because the memory range is online\n",
+   start, start + size);
+   ret = -EAGAIN;
+   }
+
+   unlock_memory_hotplug();
+   return ret;
+
+}
+EXPORT_SYMBOL_GPL(remove_memory);
+
 #else
 int offline_memory(u64 start, u64 size)
 {
Index: linux-3.5-rc6/drivers/base/memory.c

[PATCH v4 3/13] memory-hotplug : check whether memory is present or not

2012-07-18 Thread Yasuaki Ishimatsu
If the system supports memory hot-remove, online_pages() may online removed pages.
So online_pages() needs to check whether the pages being onlined are present or not.
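
The intended check can be sketched in user space as follows. This is a model
under assumptions, not the kernel helper: model_present[] is a hypothetical
stand-in for the sparse memory section state, and the sketch checks every
pfn in the range (pfn + i) and leaves all locking to the caller.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical presence map standing in for the sparsemem section state. */
#define MODEL_NR_PFNS 16UL
bool model_present[MODEL_NR_PFNS];

/* Stand-in for pfn_present(): out-of-range pfns count as not present. */
bool model_pfn_present(unsigned long pfn)
{
	return pfn < MODEL_NR_PFNS && model_present[pfn];
}

/* Return 0 when every pfn in [pfn, pfn + nr_pages) is present. */
int model_pfns_present(unsigned long pfn, unsigned long nr_pages)
{
	for (unsigned long i = 0; i < nr_pages; i++)
		if (!model_pfn_present(pfn + i))
			return -EINVAL;
	return 0;
}
```

A caller modelled on online_pages() would take its lock, call the check,
and on a non-zero result drop the lock once and return the error.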

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org 
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com 
CC: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

---
 include/linux/mmzone.h |   21 +
 mm/memory_hotplug.c|   13 +
 2 files changed, 34 insertions(+)

Index: linux-3.5-rc6/include/linux/mmzone.h
===
--- linux-3.5-rc6.orig/include/linux/mmzone.h   2012-07-08 09:23:56.0 
+0900
+++ linux-3.5-rc6/include/linux/mmzone.h2012-07-17 16:10:21.588186145 
+0900
@@ -1168,6 +1168,27 @@ void sparse_init(void);
 #define sparse_index_init(_sec, _nid)  do {} while (0)
 #endif /* CONFIG_SPARSEMEM */
 
+#ifdef CONFIG_SPARSEMEM
+static inline int pfns_present(unsigned long pfn, unsigned long nr_pages)
+{
+   int i;
+   for (i = 0; i < nr_pages; i++) {
+   if (pfn_present(pfn + i))
+   continue;
+   else {
+   unlock_memory_hotplug();
+   return -EINVAL;
+   }
+   }
+   return 0;
+}
+#else
+static inline int pfns_present(unsigned long pfn, unsigned long nr_pages)
+{
+   return 0;
+}
+#endif /* CONFIG_SPARSEMEM*/
+
 #ifdef CONFIG_NODES_SPAN_OTHER_NODES
 bool early_pfn_in_nid(unsigned long pfn, int nid);
 #else
Index: linux-3.5-rc6/mm/memory_hotplug.c
===
--- linux-3.5-rc6.orig/mm/memory_hotplug.c  2012-07-17 14:26:40.0 
+0900
+++ linux-3.5-rc6/mm/memory_hotplug.c   2012-07-17 16:09:50.070580170 +0900
@@ -467,6 +467,19 @@ int __ref online_pages(unsigned long pfn
struct memory_notify arg;
 
lock_memory_hotplug();
+   /*
+* If system supports memory hot-remove, the memory may have been
+* removed. So we check whether the memory has been removed or not.
+*
+* Note: When CONFIG_SPARSEMEM is defined, pfns_present() becomes
+*   effective. If CONFIG_SPARSEMEM is not defined, pfns_present()
+*   always returns 0.
+*/
+   ret = pfns_present(pfn, nr_pages);
+   if (ret) {
+   unlock_memory_hotplug();
+   return ret;
+   }
arg.start_pfn = pfn;
arg.nr_pages = nr_pages;
arg.status_change_nid = -1;



[RFC PATCH v4 4/13] memory-hotplug : remove /sys/firmware/memmap/X sysfs

2012-07-18 Thread Yasuaki Ishimatsu
When (hot)adding memory into system, /sys/firmware/memmap/X/{end, start, type}
sysfs files are created. But there is no code to remove these files. The patch
implements the function to remove them.

Note : The code does not free a firmware_map_entry that was allocated by
   bootmem, since there is no way to free such memory.
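
The release rule in the note can be sketched in user space. The names are
hypothetical, and the from_heap flag is only a stand-in for the page-flag
test that the patch uses to tell a kmalloc-style allocation from a boot-time
one.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical entry that records where its memory came from. */
struct entry_model {
	bool from_heap; /* false models a boot-time (bootmem) allocation */
};

/*
 * Release callback sketch: free heap-backed entries and leave boot-time
 * entries alone. Returns true when the entry was freed.
 */
bool model_release_entry(struct entry_model *e)
{
	if (!e->from_heap)
		return false;
	free(e);
	return true;
}
```

Entries created at hot-add time are freed, while entries created during
boot are simply dropped from the list and from sysfs.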

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org 
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com 
CC: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

---
 drivers/firmware/memmap.c|   78 ++-
 include/linux/firmware-map.h |6 +++
 mm/memory_hotplug.c  |9 +++-
 3 files changed, 90 insertions(+), 3 deletions(-)

Index: linux-3.5-rc6/mm/memory_hotplug.c
===
--- linux-3.5-rc6.orig/mm/memory_hotplug.c  2012-07-18 17:20:05.670024283 
+0900
+++ linux-3.5-rc6/mm/memory_hotplug.c   2012-07-18 17:51:03.933189930 +0900
@@ -1012,9 +1012,9 @@ int offline_memory(u64 start, u64 size)
return offline_pages(start_pfn, end_pfn, 120 * HZ);
 }
 
-int remove_memory(int nid, u64 start, u64 size)
+int __ref remove_memory(int nid, u64 start, u64 size)
 {
-   int ret = -EBUSY;
+   int ret = 0;
lock_memory_hotplug();
/*
 * The memory might be onlined by another task, even if you offline it.
@@ -1025,8 +1025,13 @@ int remove_memory(int nid, u64 start, u6
"because the memory range is online\n",
start, start + size);
ret = -EAGAIN;
+   goto out;
}
 
+   /* remove memmap entry */
+   firmware_map_remove(start, start + size, "System RAM");
+
+out:
unlock_memory_hotplug();
return ret;
 
Index: linux-3.5-rc6/include/linux/firmware-map.h
===
--- linux-3.5-rc6.orig/include/linux/firmware-map.h 2012-07-18 
17:19:37.007382563 +0900
+++ linux-3.5-rc6/include/linux/firmware-map.h  2012-07-18 17:42:20.804730245 
+0900
@@ -25,6 +25,7 @@
 
 int firmware_map_add_early(u64 start, u64 end, const char *type);
 int firmware_map_add_hotplug(u64 start, u64 end, const char *type);
+int firmware_map_remove(u64 start, u64 end, const char *type);
 
 #else /* CONFIG_FIRMWARE_MEMMAP */
 
@@ -38,6 +39,11 @@ static inline int firmware_map_add_hotpl
return 0;
 }
 
+static inline int firmware_map_remove(u64 start, u64 end, const char *type)
+{
+   return 0;
+}
+
 #endif /* CONFIG_FIRMWARE_MEMMAP */
 
 #endif /* _LINUX_FIRMWARE_MAP_H */
Index: linux-3.5-rc6/drivers/firmware/memmap.c
===
--- linux-3.5-rc6.orig/drivers/firmware/memmap.c2012-07-18 
17:19:43.618300182 +0900
+++ linux-3.5-rc6/drivers/firmware/memmap.c 2012-07-18 17:42:20.846729721 
+0900
@@ -21,6 +21,7 @@
 #include <linux/types.h>
 #include <linux/bootmem.h>
 #include <linux/slab.h>
+#include <linux/mm.h>
 
 /*
  * Data types 
--
@@ -79,7 +80,22 @@ static const struct sysfs_ops memmap_att
.show = memmap_attr_show,
 };
 
+#define to_memmap_entry(obj) container_of(obj, struct firmware_map_entry, kobj)
+
+static void release_firmware_map_entry(struct kobject *kobj)
+{
+   struct firmware_map_entry *entry = to_memmap_entry(kobj);
+   struct page *page;
+
+   page = virt_to_page(entry);
+   if (PageSlab(page) || PageCompound(page))
+   kfree(entry);
+
+   /* There is no way to free memory allocated from bootmem*/
+}
+
 static struct kobj_type memmap_ktype = {
+   .release= release_firmware_map_entry,
.sysfs_ops  = &memmap_attr_ops,
.default_attrs  = def_attrs,
 };
@@ -123,6 +139,16 @@ static int firmware_map_add_entry(u64 st
return 0;
 }
 
+/**
+ * firmware_map_remove_entry() - Does the real work to remove a firmware
+ * memmap entry.
+ * @entry: removed entry.
+ **/
+static inline void firmware_map_remove_entry(struct firmware_map_entry *entry)
+{
+   list_del(&entry->list);
+}
+
 /*
  * Add memmap entry on sysfs
  */
@@ -144,6 +170,31 @@ static int add_sysfs_fw_map_entry(struct
return 0;
 }
 
+/*
+ * Remove memmap entry on sysfs
+ */
+static inline void remove_sysfs_fw_map_entry(struct firmware_map_entry *entry)
+{
+   kobject_put(&entry->kobj);
+}
+
+/*
+ * Search memmap entry
+ */
+
+struct firmware_map_entry * __meminit
+find_firmware_map_entry(u64 start, u64 end, const char *type)
+{
+   struct firmware_map_entry *entry;
+
+   list_for_each_entry(entry, &map_entries, list)
+   if ((entry

[RFC PATCH v4 5/13] memory-hotplug : does not release memory region in PAGES_PER_SECTION chunks

2012-07-18 Thread Yasuaki Ishimatsu
Since commit de7f0cba96786c was applied, release_mem_region() has been called
in PAGES_PER_SECTION chunks, because add_memory() calls
register_memory_resource() in PAGES_PER_SECTION chunks. But this seems to
depend on the firmware. If CRS are written in PAGES_PER_SECTION chunks in the
ACPI DSDT table, register_memory_resource() is called in PAGES_PER_SECTION
chunks. But if CRS are written per DIMM in the ACPI DSDT table,
register_memory_resource() is called per DIMM. So release_mem_region()
should not be called in PAGES_PER_SECTION chunks. The patch fixes it.
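
To make the mismatch concrete, here is a minimal userspace sketch. It assumes
that a release only succeeds when the range exactly matches a region that was
registered, which is how the resource code appears to behave. The struct
region table and the model_* helpers are hypothetical and are not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model of a resource table (not the kernel's). */
struct region {
	uint64_t start;
	uint64_t size;
	bool busy;
};

/* Release succeeds only when start and size match a registered region. */
static bool model_release_region(struct region *tbl, size_t n,
				 uint64_t start, uint64_t size)
{
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].busy && tbl[i].start == start &&
		    tbl[i].size == size) {
			tbl[i].busy = false;
			return true;
		}
	}
	return false;	/* no exactly matching region */
}

/* Release [start, start + size) in fixed-size chunks; count successes. */
static size_t model_release_in_chunks(struct region *tbl, size_t n,
				      uint64_t start, uint64_t size,
				      uint64_t chunk)
{
	size_t released = 0;

	for (uint64_t off = 0; off < size; off += chunk)
		if (model_release_region(tbl, n, start + off, chunk))
			released++;
	return released;
}
```

With a single DIMM-sized region registered, releasing it in section-sized
chunks finds no matching region, while one call covering the whole range does.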

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org 
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com 
CC: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

---
 arch/powerpc/platforms/pseries/hotplug-memory.c |   13 +
 mm/memory_hotplug.c |4 ++--
 2 files changed, 11 insertions(+), 6 deletions(-)

Index: linux-3.5-rc6/mm/memory_hotplug.c
===
--- linux-3.5-rc6.orig/mm/memory_hotplug.c  2012-07-18 17:51:03.933189930 +0900
+++ linux-3.5-rc6/mm/memory_hotplug.c   2012-07-18 17:51:17.550020005 +0900
@@ -358,11 +358,11 @@ int __remove_pages(struct zone *zone, un
 	BUG_ON(phys_start_pfn & ~PAGE_SECTION_MASK);
 	BUG_ON(nr_pages % PAGES_PER_SECTION);
 
+	release_mem_region(phys_start_pfn << PAGE_SHIFT,  nr_pages * PAGE_SIZE);
+
 	sections_to_remove = nr_pages / PAGES_PER_SECTION;
 	for (i = 0; i < sections_to_remove; i++) {
 		unsigned long pfn = phys_start_pfn + i*PAGES_PER_SECTION;
-		release_mem_region(pfn << PAGE_SHIFT,
-				   PAGES_PER_SECTION << PAGE_SHIFT);
 		ret = __remove_section(zone, __pfn_to_section(pfn));
 		if (ret)
 			break;
Index: linux-3.5-rc6/arch/powerpc/platforms/pseries/hotplug-memory.c
===
--- linux-3.5-rc6.orig/arch/powerpc/platforms/pseries/hotplug-memory.c  2012-07-18 17:50:49.893365814 +0900
+++ linux-3.5-rc6/arch/powerpc/platforms/pseries/hotplug-memory.c   2012-07-18 17:51:17.553019968 +0900
@@ -77,7 +77,8 @@ static int pseries_remove_memblock(unsig
 {
 	unsigned long start, start_pfn;
 	struct zone *zone;
-	int ret;
+	int i, ret;
+	int sections_to_remove;
 
 	start_pfn = base >> PAGE_SHIFT;
 
@@ -97,9 +98,13 @@ static int pseries_remove_memblock(unsig
 	 * to sysfs state file and we can't remove sysfs entries
 	 * while writing to it. So we have to defer it to here.
 	 */
-	ret = __remove_pages(zone, start_pfn, memblock_size >> PAGE_SHIFT);
-	if (ret)
-		return ret;
+	sections_to_remove = (memblock_size >> PAGE_SHIFT) / PAGES_PER_SECTION;
+	for (i = 0; i < sections_to_remove; i++) {
+		unsigned long pfn = start_pfn + i * PAGES_PER_SECTION;
+		ret = __remove_pages(zone, start_pfn,  PAGES_PER_SECTION);
+		if (ret)
+			return ret;
+	}
 
 	/*
 	 * Update memory regions for memory remove

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[RFC PATCH v4 7/13] memory-hotplug : remove_memory calls __remove_pages

2012-07-18 Thread Yasuaki Ishimatsu
The patch adds a call to __remove_pages() to remove_memory(). The range given
by the phys_start_pfn and nr_pages arguments of __remove_pages() may then
span more than one zone. So the zone argument is removed from
__remove_pages(), and __remove_pages() calculates the zone for each section.

When CONFIG_SPARSEMEM_VMEMMAP is defined, there is no way to remove a memmap.
So __remove_section() only calls unregister_memory_section().
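
As a rough userspace illustration of looking up the zone per section instead
of once per call, consider the sketch below. MODEL_PAGES_PER_SECTION,
model_zone_of() and the single zone boundary are assumptions made for the
example; in the kernel the lookup is page_zone(pfn_to_page(pfn)).

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGES_PER_SECTION 4UL	/* hypothetical, tiny for illustration */

/* Hypothetical stand-in for page_zone(pfn_to_page(pfn)). */
static int model_zone_of(unsigned long pfn, unsigned long zone_boundary_pfn)
{
	return pfn < zone_boundary_pfn ? 0 : 1;
}

/*
 * Walk the range section by section and record the zone of each section
 * in zones_out. Returns the number of sections visited, or 0 if the range
 * is not a whole number of sections or does not fit in zones_out.
 */
static size_t model_remove_pages(unsigned long start_pfn,
				 unsigned long nr_pages,
				 unsigned long zone_boundary_pfn,
				 int *zones_out, size_t max)
{
	size_t sections = nr_pages / MODEL_PAGES_PER_SECTION;

	if (start_pfn % MODEL_PAGES_PER_SECTION ||
	    nr_pages % MODEL_PAGES_PER_SECTION || sections > max)
		return 0;

	for (size_t i = 0; i < sections; i++) {
		unsigned long pfn = start_pfn + i * MODEL_PAGES_PER_SECTION;

		/* The zone is computed for every section, not once up front. */
		zones_out[i] = model_zone_of(pfn, zone_boundary_pfn);
	}
	return sections;
}
```

A range of four sections that crosses the boundary yields two sections in each
zone, which a single zone argument could not express.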

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org 
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com 
CC: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

---
 arch/powerpc/platforms/pseries/hotplug-memory.c |5 +
 include/linux/memory_hotplug.h  |3 +--
 mm/memory_hotplug.c |   19 ---
 3 files changed, 14 insertions(+), 13 deletions(-)

Index: linux-3.5-rc6/mm/memory_hotplug.c
===
--- linux-3.5-rc6.orig/mm/memory_hotplug.c  2012-07-18 18:00:27.440145432 +0900
+++ linux-3.5-rc6/mm/memory_hotplug.c   2012-07-18 18:01:02.070712487 +0900
@@ -275,11 +275,14 @@ static int __meminit __add_section(int n
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
 static int __remove_section(struct zone *zone, struct mem_section *ms)
 {
-   /*
-* XXX: Freeing memmap with vmemmap is not implement yet.
-*  This should be removed later.
-*/
-   return -EBUSY;
+   int ret = -EINVAL;
+
+   if (!valid_section(ms))
+   return ret;
+
+   ret = unregister_memory_section(ms);
+
+   return ret;
 }
 #else
 static int __remove_section(struct zone *zone, struct mem_section *ms)
@@ -346,11 +349,11 @@ EXPORT_SYMBOL_GPL(__add_pages);
  * sure that pages are marked reserved and zones are adjust properly by
  * calling offline_pages().
  */
-int __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
-unsigned long nr_pages)
+int __remove_pages(unsigned long phys_start_pfn, unsigned long nr_pages)
 {
unsigned long i, ret = 0;
int sections_to_remove;
+   struct zone *zone;
 
/*
 * We can only remove entire sections
@@ -363,6 +366,7 @@ int __remove_pages(struct zone *zone, un
sections_to_remove = nr_pages / PAGES_PER_SECTION;
for (i = 0; i < sections_to_remove; i++) {
unsigned long pfn = phys_start_pfn + i*PAGES_PER_SECTION;
+   zone = page_zone(pfn_to_page(pfn));
ret = __remove_section(zone, __pfn_to_section(pfn));
if (ret)
break;
@@ -1031,6 +1035,7 @@ int __ref remove_memory(int nid, u64 sta
 	/* remove memmap entry */
 	firmware_map_remove(start, start + size, "System RAM");
 
+	__remove_pages(start >> PAGE_SHIFT, size >> PAGE_SHIFT);
 out:
unlock_memory_hotplug();
return ret;
Index: linux-3.5-rc6/include/linux/memory_hotplug.h
===
--- linux-3.5-rc6.orig/include/linux/memory_hotplug.h   2012-07-18 18:00:27.445145371 +0900
+++ linux-3.5-rc6/include/linux/memory_hotplug.h    2012-07-18 18:00:40.461982690 +0900
@@ -89,8 +89,7 @@ extern bool is_pageblock_removable_noloc
 /* reasonably generic interface to expand the physical pages in a zone  */
 extern int __add_pages(int nid, struct zone *zone, unsigned long start_pfn,
unsigned long nr_pages);
-extern int __remove_pages(struct zone *zone, unsigned long start_pfn,
-   unsigned long nr_pages);
+extern int __remove_pages(unsigned long start_pfn, unsigned long nr_pages);
 
 #ifdef CONFIG_NUMA
 extern int memory_add_physaddr_to_nid(u64 start);
Index: linux-3.5-rc6/arch/powerpc/platforms/pseries/hotplug-memory.c
===
--- linux-3.5-rc6.orig/arch/powerpc/platforms/pseries/hotplug-memory.c  2012-07-18 18:00:27.442145407 +0900
+++ linux-3.5-rc6/arch/powerpc/platforms/pseries/hotplug-memory.c   2012-07-18 18:00:40.470982578 +0900
@@ -76,7 +76,6 @@ unsigned long memory_block_size_bytes(vo
 static int pseries_remove_memblock(unsigned long base, unsigned int memblock_size)
 {
unsigned long start, start_pfn;
-   struct zone *zone;
int i, ret;
int sections_to_remove;
 
@@ -87,8 +86,6 @@ static int pseries_remove_memblock(unsig
return 0;
}
 
-   zone = page_zone(pfn_to_page(start_pfn));
-
/*
 * Remove section mappings and sysfs entries for the
 * section of the memory we are removing.
@@ -101,7 +98,7 @@ static int pseries_remove_memblock(unsig
sections_to_remove = (memblock_size >> PAGE_SHIFT

[RFC PATCH v4 6/13] memory-hotplug : add memory_block_release

2012-07-18 Thread Yasuaki Ishimatsu
When remove_memory_block() is called, the following message is shown at
device_release():

Device 'memory528' does not have a release() function, it is broken and must
be fixed.

remove_memory_block() calls kfree(mem). I think it should be called from
device_release() instead. So the patch implements memory_block_release().
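
The general pattern is that the object is freed by a release callback once the
last reference is dropped, not by whoever unregisters it. The following is a
minimal userspace sketch of that pattern only. struct model_device and every
model_* name are hypothetical stand-ins, not the driver core's types or
functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace model; not the kernel's struct device. */
struct model_device {
	int refcount;
	void (*release)(struct model_device *dev);
};

struct model_memory_block {
	unsigned long start_section_nr;
	struct model_device dev;	/* embedded, like memory_block.dev */
};

/* Same idea as the kernel's container_of(): member pointer -> container. */
#define model_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static int model_blocks_freed;	/* counter used only for the illustration */

static void model_release_memory_block(struct model_device *dev)
{
	struct model_memory_block *mem =
		model_container_of(dev, struct model_memory_block, dev);

	free(mem);
	model_blocks_freed++;
}

static struct model_memory_block *model_block_create(unsigned long nr)
{
	struct model_memory_block *mem = calloc(1, sizeof(*mem));

	if (!mem)
		return NULL;
	mem->start_section_nr = nr;
	mem->dev.refcount = 1;
	mem->dev.release = model_release_memory_block;
	return mem;
}

static void model_get_device(struct model_device *dev)
{
	dev->refcount++;
}

/* Drop one reference; the last put calls release(), which frees the block. */
static void model_put_device(struct model_device *dev)
{
	if (--dev->refcount == 0 && dev->release)
		dev->release(dev);
}
```

Freeing in the callback means a holder of an extra reference never sees the
block disappear underneath it, which a direct kfree() after unregistering
cannot guarantee.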

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org 
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com 
CC: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

---
 drivers/base/memory.c |   11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

Index: linux-3.5-rc6/drivers/base/memory.c
===
--- linux-3.5-rc6.orig/drivers/base/memory.c    2012-07-18 17:50:49.659368740 +0900
+++ linux-3.5-rc6/drivers/base/memory.c 2012-07-18 17:51:28.655881214 +0900
@@ -109,6 +109,15 @@ bool is_memblk_offline(unsigned long sta
 }
 EXPORT_SYMBOL(is_memblk_offline);
 
+#define to_memory_block(device) container_of(device, struct memory_block, dev)
+
+static void release_memory_block(struct device *dev)
+{
+   struct memory_block *mem = to_memory_block(dev);
+
+   kfree(mem);
+}
+
 /*
  * register_memory - Setup a sysfs device for a memory block
  */
@@ -119,6 +128,7 @@ int register_memory(struct memory_block 
 
 	memory->dev.bus = &memory_subsys;
 	memory->dev.id = memory->start_section_nr / sections_per_block;
+	memory->dev.release = release_memory_block;
 
 	error = device_register(&memory->dev);
return error;
@@ -669,7 +679,6 @@ int remove_memory_block(unsigned long no
mem_remove_simple_file(mem, phys_device);
mem_remove_simple_file(mem, removable);
unregister_memory(mem);
-   kfree(mem);
} else
kobject_put(&mem->dev.kobj);
 



[RFC PATCH v4 8/13] memory-hotplug : check page type in get_page_bootmem

2012-07-18 Thread Yasuaki Ishimatsu
get_page_bootmem() may be called on the same page many times. So when
get_page_bootmem() is called on a page that is already registered, the
function only increments page->_count.
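
A minimal userspace sketch of that behaviour follows. It tests the type
already stored on the page, which is what the description above suggests.
struct model_page, the MODEL_* range and model_get_page_bootmem() are
hypothetical stand-ins for struct page, the MEMORY_HOTPLUG_*_BOOTMEM_TYPE
limits and get_page_bootmem().

```c
#include <assert.h>

/* Hypothetical range, standing in for MEMORY_HOTPLUG_{MIN,MAX}_BOOTMEM_TYPE. */
#define MODEL_MIN_BOOTMEM_TYPE 12UL
#define MODEL_MAX_BOOTMEM_TYPE 15UL

struct model_page {
	unsigned long type;	/* 0 means "not registered yet" */
	unsigned long private;	/* info recorded at first registration */
	int count;		/* reference count */
};

static void model_get_page_bootmem(unsigned long info,
				   struct model_page *page,
				   unsigned long type)
{
	unsigned long page_type = page->type;

	/* Record type and info only if the page has no bootmem type yet. */
	if (page_type < MODEL_MIN_BOOTMEM_TYPE ||
	    page_type > MODEL_MAX_BOOTMEM_TYPE) {
		page->type = type;
		page->private = info;
	}
	page->count++;	/* every call takes a reference */
}
```

Calling the helper twice on the same page keeps the first type and info and
leaves the count at two.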

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org 
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com 
CC: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

---
 mm/memory_hotplug.c |   15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

Index: linux-3.5-rc6/mm/memory_hotplug.c
===
--- linux-3.5-rc6.orig/mm/memory_hotplug.c  2012-07-18 18:01:02.070712487 +0900
+++ linux-3.5-rc6/mm/memory_hotplug.c   2012-07-18 18:01:12.586581077 +0900
@@ -95,10 +95,17 @@ static void release_memory_resource(stru
 static void get_page_bootmem(unsigned long info,  struct page *page,
 unsigned long type)
 {
-	page->lru.next = (struct list_head *) type;
-	SetPagePrivate(page);
-	set_page_private(page, info);
-	atomic_inc(&page->_count);
+	unsigned long page_type;
+
+	page_type = (unsigned long) page->lru.next;
+	if (type < MEMORY_HOTPLUG_MIN_BOOTMEM_TYPE ||
+	    type > MEMORY_HOTPLUG_MAX_BOOTMEM_TYPE){
+		page->lru.next = (struct list_head *) type;
+		SetPagePrivate(page);
+		set_page_private(page, info);
+		atomic_inc(&page->_count);
+	} else
+		atomic_inc(&page->_count);
 }
 
 /* reference to __meminit __free_pages_bootmem is valid



[RFC PATCH v4 9/13] memory-hotplug : move register_page_bootmem_info_node and put_page_bootmem for sparse-vmemmap

2012-07-18 Thread Yasuaki Ishimatsu
To implement register_page_bootmem_info_node() for sparse-vmemmap,
register_page_bootmem_info_node() and put_page_bootmem() are moved to
memory_hotplug.c.

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org 
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com 
CC: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

---
 include/linux/memory_hotplug.h |9 -
 mm/memory_hotplug.c|8 ++--
 2 files changed, 6 insertions(+), 11 deletions(-)

Index: linux-3.5-rc6/include/linux/memory_hotplug.h
===
--- linux-3.5-rc6.orig/include/linux/memory_hotplug.h   2012-07-18 18:00:40.461982690 +0900
+++ linux-3.5-rc6/include/linux/memory_hotplug.h    2012-07-18 18:01:24.217435670 +0900
@@ -160,17 +160,8 @@ static inline void arch_refresh_nodedata
 #endif /* CONFIG_NUMA */
 #endif /* CONFIG_HAVE_ARCH_NODEDATA_EXTENSION */
 
-#ifdef CONFIG_SPARSEMEM_VMEMMAP
-static inline void register_page_bootmem_info_node(struct pglist_data *pgdat)
-{
-}
-static inline void put_page_bootmem(struct page *page)
-{
-}
-#else
 extern void register_page_bootmem_info_node(struct pglist_data *pgdat);
 extern void put_page_bootmem(struct page *page);
-#endif
 
 /*
  * Lock for memory hotplug guarantees 1) all callbacks for memory hotplug
Index: linux-3.5-rc6/mm/memory_hotplug.c
===
--- linux-3.5-rc6.orig/mm/memory_hotplug.c  2012-07-18 18:01:12.586581077 +0900
+++ linux-3.5-rc6/mm/memory_hotplug.c   2012-07-18 18:01:24.221435622 +0900
@@ -91,7 +91,6 @@ static void release_memory_resource(stru
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
-#ifndef CONFIG_SPARSEMEM_VMEMMAP
 static void get_page_bootmem(unsigned long info,  struct page *page,
 unsigned long type)
 {
@@ -127,6 +126,7 @@ void __ref put_page_bootmem(struct page 
 
 }
 
+#ifndef CONFIG_SPARSEMEM_VMEMMAP
 static void register_page_bootmem_info_section(unsigned long start_pfn)
 {
unsigned long *usemap, mapsize, section_nr, i;
@@ -163,6 +163,11 @@ static void register_page_bootmem_info_s
get_page_bootmem(section_nr, page, MIX_SECTION_INFO);
 
 }
+#else
+static inline void register_page_bootmem_info_section(unsigned long start_pfn)
+{
+}
+#endif
 
 void register_page_bootmem_info_node(struct pglist_data *pgdat)
 {
@@ -198,7 +203,6 @@ void register_page_bootmem_info_node(str
register_page_bootmem_info_section(pfn);
 
 }
-#endif /* !CONFIG_SPARSEMEM_VMEMMAP */
 
 static void grow_zone_span(struct zone *zone, unsigned long start_pfn,
   unsigned long end_pfn)



[RFC PATCH v4 10/13] memory-hotplug : implement register_page_bootmem_info_section of sparse-vmemmap

2012-07-18 Thread Yasuaki Ishimatsu
To remove a memmap region of sparse-vmemmap that was allocated from bootmem,
the memmap region needs to be registered by get_page_bootmem().
So the patch searches the pages of the virtual mapping and registers them with
get_page_bootmem().
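
The walk over the virtual mapping advances one page at a time with a step of
the form (addr + PAGE_SIZE) & PAGE_MASK. A small userspace sketch of just that
stepping rule, assuming 4 KiB pages, is shown here. MODEL_PAGE_SIZE and the
model_* helpers are hypothetical names, and no page tables are involved.

```c
#include <assert.h>

#define MODEL_PAGE_SIZE 4096UL			/* assumed 4 KiB pages */
#define MODEL_PAGE_MASK (~(MODEL_PAGE_SIZE - 1))

/* Start of the page following the one that contains addr. */
static unsigned long model_next_page(unsigned long addr)
{
	return (addr + MODEL_PAGE_SIZE) & MODEL_PAGE_MASK;
}

/* Count the pages touched by [addr, end), one step per page. */
static unsigned long model_count_pages(unsigned long addr, unsigned long end)
{
	unsigned long n = 0;

	for (; addr < end; addr = model_next_page(addr))
		n++;
	return n;
}
```

Because the step rounds down to a page boundary, an unaligned start address is
visited once and every later iteration begins on a page boundary.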

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org 
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com 
CC: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

---
 arch/x86/mm/init_64.c  |   52 +
 include/linux/memory_hotplug.h |2 +
 include/linux/mm.h |3 +-
 mm/memory_hotplug.c|   23 +++---
 4 files changed, 76 insertions(+), 4 deletions(-)

Index: linux-3.5-rc6/mm/memory_hotplug.c
===
--- linux-3.5-rc6.orig/mm/memory_hotplug.c  2012-07-18 18:01:24.221435622 +0900
+++ linux-3.5-rc6/mm/memory_hotplug.c   2012-07-18 18:01:28.156386427 +0900
@@ -91,8 +91,8 @@ static void release_memory_resource(stru
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
-static void get_page_bootmem(unsigned long info,  struct page *page,
-unsigned long type)
+void get_page_bootmem(unsigned long info,  struct page *page,
+ unsigned long type)
 {
unsigned long page_type;
 
@@ -164,8 +164,25 @@ static void register_page_bootmem_info_s
 
 }
 #else
-static inline void register_page_bootmem_info_section(unsigned long start_pfn)
+static void register_page_bootmem_info_section(unsigned long start_pfn)
 {
+   unsigned long mapsize, section_nr;
+   struct mem_section *ms;
+   struct page *page, *memmap;
+
+   if (!pfn_valid(start_pfn))
+   return;
+
+   section_nr = pfn_to_section_nr(start_pfn);
+   ms = __nr_to_section(section_nr);
+
+   memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
+
+   page = virt_to_page(memmap);
+   mapsize = sizeof(struct page) * PAGES_PER_SECTION;
+   mapsize = PAGE_ALIGN(mapsize) >> PAGE_SHIFT;
+
+   register_page_bootmem_memmap(section_nr, memmap, PAGES_PER_SECTION);
 }
 #endif
 
Index: linux-3.5-rc6/include/linux/mm.h
===
--- linux-3.5-rc6.orig/include/linux/mm.h   2012-07-18 17:59:51.225598230 +0900
+++ linux-3.5-rc6/include/linux/mm.h    2012-07-18 18:01:28.161386365 +0900
@@ -1586,7 +1586,8 @@ int vmemmap_populate_basepages(struct pa
unsigned long pages, int node);
 int vmemmap_populate(struct page *start_page, unsigned long pages, int node);
 void vmemmap_populate_print_last(void);
-
+void register_page_bootmem_memmap(unsigned long section_nr, struct page *map,
+ unsigned long size);
 
 enum mf_flags {
MF_COUNT_INCREASED = 1 << 0,
Index: linux-3.5-rc6/arch/x86/mm/init_64.c
===
--- linux-3.5-rc6.orig/arch/x86/mm/init_64.c    2012-07-18 17:59:51.221598278 +0900
+++ linux-3.5-rc6/arch/x86/mm/init_64.c 2012-07-18 18:01:28.169386264 +0900
@@ -978,6 +978,58 @@ vmemmap_populate(struct page *start_page
return 0;
 }
 
+void register_page_bootmem_memmap(unsigned long section_nr,
+ struct page *start_page, unsigned long size)
+{
+   unsigned long addr = (unsigned long)start_page;
+   unsigned long end = (unsigned long)(start_page + size);
+   unsigned long next;
+   pgd_t *pgd;
+   pud_t *pud;
+   pmd_t *pmd;
+
+	for (; addr < end; addr = next) {
+		pte_t *pte = NULL;
+
+		pgd = pgd_offset_k(addr);
+		if (pgd_none(*pgd)) {
+			next = (addr + PAGE_SIZE) & PAGE_MASK;
+			continue;
+		}
+		get_page_bootmem(section_nr, pgd_page(*pgd), MIX_SECTION_INFO);
+
+		pud = pud_offset(pgd, addr);
+		if (pud_none(*pud)) {
+			next = (addr + PAGE_SIZE) & PAGE_MASK;
+			continue;
+		}
+		get_page_bootmem(section_nr, pud_page(*pud), MIX_SECTION_INFO);
+
+		if (!cpu_has_pse) {
+			next = (addr + PAGE_SIZE) & PAGE_MASK;
+			pmd = pmd_offset(pud, addr);
+			if (pmd_none(*pmd))
+				continue;
+			get_page_bootmem(section_nr, pmd_page(*pmd),
+					 MIX_SECTION_INFO);
+
+			pte = pte_offset_kernel(pmd, addr);
+			if (pte_none(*pte

[RFC PATCH v4 11/13] memory-hotplug : free memmap of sparse-vmemmap

2012-07-18 Thread Yasuaki Ishimatsu
Not all pages of the virtual mapping in removed memory can be freed, since
some pages used as PGD/PUD cover not only the removed memory but also other
memory. So the patch checks whether a page can be freed or not.

How to check whether a page can be freed or not?
 1. When removing memory, the page structs of the removed memory are filled
    with 0xFD.
 2. If all page structs on a PT/PMD are filled with 0xFD, the PT/PMD can be
    cleared. In this case, the page used as PT/PMD can be freed.

With this patch applied, the two versions of __remove_section() are integrated
into one. So the CONFIG_SPARSEMEM_VMEMMAP version of __remove_section() is
deleted.
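
The fill-and-check idea from the two steps above can be sketched in userspace
as follows. The 0xFD pattern comes from the description; the buffer standing
in for a backing page and the model_* helpers are hypothetical, and the byte
loop is a plain substitute for a kernel helper such as memchr_inv().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_INUSE 0xFD	/* fill pattern named in the description */

/* Mark [off, off + len) of a backing page as removed. */
static void model_mark_removed(unsigned char *page, size_t off, size_t len)
{
	memset(page + off, MODEL_PAGE_INUSE, len);
}

/* True when every byte of the page carries the fill pattern. */
static bool model_page_can_be_freed(const unsigned char *page, size_t size)
{
	for (size_t i = 0; i < size; i++)
		if (page[i] != MODEL_PAGE_INUSE)
			return false;	/* still used by other memory */
	return true;
}
```

A backing page shared by two removed ranges only becomes freeable after the
second range has been marked, which is the property the check relies on.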

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org 
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com 
CC: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

---
 arch/x86/mm/init_64.c |  121 ++
 include/linux/mm.h|2 
 mm/memory_hotplug.c   |   19 ---
 mm/sparse.c   |5 +-
 4 files changed, 128 insertions(+), 19 deletions(-)

Index: linux-3.5-rc6/include/linux/mm.h
===
--- linux-3.5-rc6.orig/include/linux/mm.h   2012-07-18 18:01:28.0 +0900
+++ linux-3.5-rc6/include/linux/mm.h    2012-07-18 18:03:05.551168773 +0900
@@ -1588,6 +1588,8 @@ int vmemmap_populate(struct page *start_
 void vmemmap_populate_print_last(void);
 void register_page_bootmem_memmap(unsigned long section_nr, struct page *map,
  unsigned long size);
+void vmemmap_kfree(struct page *memmpa, unsigned long nr_pages);
+void vmemmap_free_bootmem(struct page *memmpa, unsigned long nr_pages);
 
 enum mf_flags {
MF_COUNT_INCREASED = 1 << 0,
Index: linux-3.5-rc6/mm/sparse.c
===
--- linux-3.5-rc6.orig/mm/sparse.c  2012-07-18 17:59:25.0 +0900
+++ linux-3.5-rc6/mm/sparse.c   2012-07-18 18:03:05.553168749 +0900
@@ -614,12 +614,13 @@ static inline struct page *kmalloc_secti
/* This will make the necessary allocations eventually. */
return sparse_mem_map_populate(pnum, nid);
 }
-static void __kfree_section_memmap(struct page *memmap, unsigned long nr_pages)
+static void __kfree_section_memmap(struct page *page, unsigned long nr_pages)
 {
-   return; /* XXX: Not implemented yet */
+   vmemmap_kfree(page, nr_pages);
 }
 static void free_map_bootmem(struct page *page, unsigned long nr_pages)
 {
+   vmemmap_free_bootmem(page, nr_pages);
 }
 #else
 static struct page *__kmalloc_section_memmap(unsigned long nr_pages)
Index: linux-3.5-rc6/arch/x86/mm/init_64.c
===
--- linux-3.5-rc6.orig/arch/x86/mm/init_64.c    2012-07-18 18:01:28.0 +0900
+++ linux-3.5-rc6/arch/x86/mm/init_64.c 2012-07-18 18:03:05.564168611 +0900
@@ -978,6 +978,127 @@ vmemmap_populate(struct page *start_page
return 0;
 }
 
+#define PAGE_INUSE 0xFD
+
+unsigned long find_and_clear_pte_page(unsigned long addr, unsigned long end,
+   struct page **pp, int *page_size)
+{
+   pgd_t *pgd;
+   pud_t *pud;
+   pmd_t *pmd;
+   pte_t *pte;
+   void *page_addr;
+   unsigned long next;
+
+   *pp = NULL;
+
+   pgd = pgd_offset_k(addr);
+   if (pgd_none(*pgd))
+   return pgd_addr_end(addr, end);
+
+   pud = pud_offset(pgd, addr);
+   if (pud_none(*pud))
+   return pud_addr_end(addr,end);
+
+   if (!cpu_has_pse) {
+   next = (addr + PAGE_SIZE) & PAGE_MASK;
+   pmd = pmd_offset(pud, addr);
+   if (pmd_none(*pmd))
+   return next;
+
+   pte = pte_offset_kernel(pmd, addr);
+   if (pte_none(*pte))
+   return next;
+
+   *page_size = PAGE_SIZE;
+   *pp = pte_page(*pte);
+   } else {
+   next = pmd_addr_end(addr, end);
+
+   pmd = pmd_offset(pud, addr);
+   if (pmd_none(*pmd))
+   return next;
+
+   *page_size = PMD_SIZE;
+   *pp = pmd_page(*pmd);
+   }
+
+   /*
+* Removed page structs are filled with 0xFD.
+*/
+   memset((void *)addr, PAGE_INUSE, next - addr);
+
+   page_addr = page_address(*pp);
+
+   /*
+* Check the page is filled with 0xFD or not.
+* memchr_inv() returns the address. In this case, we cannot
+* clear PTE/PUD entry, since the page is used by other.
+* So we cannot also free the page.
+*
+* memchr_inv() returns NULL. In this case, we

[RFC PATCH v4 12/13] memory-hotplug : add node_device_release

2012-07-18 Thread Yasuaki Ishimatsu
When unregister_node() is called, the following message is shown at
device_release():

Device 'node2' does not have a release() function, it is broken and must be
fixed.

So the patch implements node_device_release().

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org 
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com 
CC: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

---
 drivers/base/node.c |7 +++
 1 file changed, 7 insertions(+)

Index: linux-3.5-rc6/drivers/base/node.c
===
--- linux-3.5-rc6.orig/drivers/base/node.c  2012-07-18 18:24:29.191121066 +0900
+++ linux-3.5-rc6/drivers/base/node.c   2012-07-18 18:25:47.46983 +0900
@@ -252,6 +252,12 @@ static inline void hugetlb_register_node
 static inline void hugetlb_unregister_node(struct node *node) {}
 #endif
 
+static void node_device_release(struct device *dev)
+{
+   struct node *node_dev = to_node(dev);
+
+   memset(node_dev, 0, sizeof(struct node));
+}
 
 /*
  * register_node - Setup a sysfs device for a node.
@@ -265,6 +271,7 @@ int register_node(struct node *node, int
 
 	node->dev.id = num;
 	node->dev.bus = &node_subsys;
+	node->dev.release = node_device_release;
 	error = device_register(&node->dev);
 
if (!error){



[RFC PATCH v4 13/13] memory-hotplug : remove sysfs file of node

2012-07-18 Thread Yasuaki Ishimatsu
The patch adds node_set_offline() and unregister_one_node() to remove_memory()
to remove the sysfs files of a node that has no present pages left.
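
The condition can be modelled in userspace as below. This is a simplified
sketch: struct model_node and model_remove_from_node() are hypothetical, and
the page accounting is folded into one helper here, whereas in the kernel the
present-page count is maintained elsewhere.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of per-node state. */
struct model_node {
	unsigned long present_pages;
	bool online;
	bool registered;	/* sysfs entry exists */
};

/*
 * After removing nr_pages from the node, take it offline and unregister
 * it only if no present pages remain. Returns true if that happened.
 */
static bool model_remove_from_node(struct model_node *node,
				   unsigned long nr_pages)
{
	if (nr_pages > node->present_pages)
		nr_pages = node->present_pages;
	node->present_pages -= nr_pages;

	if (node->present_pages)
		return false;	/* node still has memory, keep it */

	node->online = false;
	node->registered = false;
	return true;
}
```

A node with two memory ranges stays online after the first removal and is
unregistered only after the second.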

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org 
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com 
CC: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

---
 mm/memory_hotplug.c |5 +
 1 file changed, 5 insertions(+)

Index: linux-3.5-rc6/mm/memory_hotplug.c
===
--- linux-3.5-rc6.orig/mm/memory_hotplug.c  2012-07-18 18:25:11.036597977 +0900
+++ linux-3.5-rc6/mm/memory_hotplug.c   2012-07-18 18:25:54.860050109 +0900
@@ -1048,6 +1048,11 @@ int __ref remove_memory(int nid, u64 sta
 	/* remove memmap entry */
 	firmware_map_remove(start, start + size, "System RAM");
 
+	if (!node_present_pages(nid)) {
+		node_set_offline(nid);
+		unregister_one_node(nid);
+	}
+
 	__remove_pages(start >> PAGE_SHIFT, size >> PAGE_SHIFT);
 out:
unlock_memory_hotplug();



Re: [PATCH v4 3/13] memory-hotplug : check whether memory is present or not

2012-07-18 Thread Yasuaki Ishimatsu
Hi Wen,

2012/07/18 19:25, Wen Congyang wrote:
> At 07/18/2012 06:07 PM, Yasuaki Ishimatsu Wrote:
>> If system supports memory hot-remove, online_pages() may online removed pages.
>> So online_pages() need to check whether onlining pages are present or not.
>>
>> CC: David Rientjes rient...@google.com
>> CC: Jiang Liu liu...@gmail.com
>> CC: Len Brown len.br...@intel.com
>> CC: Benjamin Herrenschmidt b...@kernel.crashing.org
>> CC: Paul Mackerras pau...@samba.org
>> CC: Christoph Lameter c...@linux.com
>> Cc: Minchan Kim minchan@gmail.com
>> CC: Andrew Morton a...@linux-foundation.org
>> CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
>> CC: Wen Congyang we...@cn.fujitsu.com
>> Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
>>
>> ---
>>   include/linux/mmzone.h |   21 +
>>   mm/memory_hotplug.c    |   13 +
>>   2 files changed, 34 insertions(+)
>>
>> Index: linux-3.5-rc6/include/linux/mmzone.h
>> ===
>> --- linux-3.5-rc6.orig/include/linux/mmzone.h    2012-07-08 09:23:56.0 +0900
>> +++ linux-3.5-rc6/include/linux/mmzone.h 2012-07-17 16:10:21.588186145 +0900
>> @@ -1168,6 +1168,27 @@ void sparse_init(void);
>>   #define sparse_index_init(_sec, _nid)  do {} while (0)
>>   #endif /* CONFIG_SPARSEMEM */
>>   
>> +#ifdef CONFIG_SPARSEMEM
>> +static inline int pfns_present(unsigned long pfn, unsigned long nr_pages)
>> +{
>> +	int i;
>> +	for (i = 0; i < nr_pages; i++) {
>> +		if (pfn_present(pfn + 1))
>> +			continue;
>> +		else {
>> +			unlock_memory_hotplug();
>
> Why do you unlock memory hotplug here? The caller will do it.

Ah, you are right. In this case, the function should only return -EINVAL.

Thanks,
Yasuaki Ishimatsu
 
> Thanks
> Wen Congyang
>
>> +			return -EINVAL;
>> +		}
>> +	}
>> +	return 0;
>> +}
>> +#else
>> +static inline int pfns_present(unsigned long pfn, unsigned long nr_pages)
>> +{
>> +	return 0;
>> +}
>> +#endif /* CONFIG_SPARSEMEM*/
>> +
>>   #ifdef CONFIG_NODES_SPAN_OTHER_NODES
>>   bool early_pfn_in_nid(unsigned long pfn, int nid);
>>   #else
>> Index: linux-3.5-rc6/mm/memory_hotplug.c
>> ===
>> --- linux-3.5-rc6.orig/mm/memory_hotplug.c   2012-07-17 14:26:40.0 +0900
>> +++ linux-3.5-rc6/mm/memory_hotplug.c    2012-07-17 16:09:50.070580170 +0900
>> @@ -467,6 +467,19 @@ int __ref online_pages(unsigned long pfn
>>  	struct memory_notify arg;
>>   
>>  	lock_memory_hotplug();
>> +	/*
>> +	 * If system supports memory hot-remove, the memory may have been
>> +	 * removed. So we check whether the memory has been removed or not.
>> +	 *
>> +	 * Note: When CONFIG_SPARSEMEM is defined, pfns_present() become
>> +	 *       effective. If CONFIG_SPARSEMEM is not defined, pfns_present()
>> +	 *       always returns 0.
>> +	 */
>> +	ret = pfns_present(pfn, nr_pages);
>> +	if (ret) {
>> +		unlock_memory_hotplug();
>> +		return ret;
>> +	}
>>  	arg.start_pfn = pfn;
>>  	arg.nr_pages = nr_pages;
>>  	arg.status_change_nid = -1;


 





Re: [RFC PATCH v4 4/13] memory-hotplug : remove /sys/firmware/memmap/X sysfs

2012-07-18 Thread Yasuaki Ishimatsu
Hi Wen,

2012/07/18 19:33, Wen Congyang wrote:
 At 07/18/2012 06:09 PM, Yasuaki Ishimatsu Wrote:
 When (hot)adding memory into system, /sys/firmware/memmap/X/{end, start, 
 type}
 sysfs files are created. But there is no code to remove these files. The 
 patch
 implements the function to remove them.

 Note : The code does not free firmware_map_entry since there is no way to 
 free
 memory which is allocated by bootmem.

 CC: David Rientjes rient...@google.com
 CC: Jiang Liu liu...@gmail.com
 CC: Len Brown len.br...@intel.com
 CC: Benjamin Herrenschmidt b...@kernel.crashing.org
 CC: Paul Mackerras pau...@samba.org
 CC: Christoph Lameter c...@linux.com
 Cc: Minchan Kim minchan@gmail.com
 CC: Andrew Morton a...@linux-foundation.org
 CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
 CC: Wen Congyang we...@cn.fujitsu.com
 Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

 ---
   drivers/firmware/memmap.c|   78 
 ++-
   include/linux/firmware-map.h |6 +++
   mm/memory_hotplug.c  |9 +++-
   3 files changed, 90 insertions(+), 3 deletions(-)

 Index: linux-3.5-rc6/mm/memory_hotplug.c
 ===
 --- linux-3.5-rc6.orig/mm/memory_hotplug.c   2012-07-18 17:20:05.670024283 
 +0900
 +++ linux-3.5-rc6/mm/memory_hotplug.c2012-07-18 17:51:03.933189930 
 +0900
 @@ -1012,9 +1012,9 @@ int offline_memory(u64 start, u64 size)
  return offline_pages(start_pfn, end_pfn, 120 * HZ);
   }
   
 -int remove_memory(int nid, u64 start, u64 size)
 +int __ref remove_memory(int nid, u64 start, u64 size)
   {
 -int ret = -EBUSY;
 +int ret = 0;
  lock_memory_hotplug();
  /*
   * The memory might become online by other task, even if you offine it.
 @@ -1025,8 +1025,13 @@ int remove_memory(int nid, u64 start, u6
  because the memmory range is online\n,
  start, start + size);
  ret = -EAGAIN;
 +goto out;
  }
   
 +/* remove memmap entry */
 +firmware_map_remove(start, start + size, System RAM);
 +
 +out:
  unlock_memory_hotplug();
  return ret;
   
 Index: linux-3.5-rc6/include/linux/firmware-map.h
 ===
 --- linux-3.5-rc6.orig/include/linux/firmware-map.h  2012-07-18 
 17:19:37.007382563 +0900
 +++ linux-3.5-rc6/include/linux/firmware-map.h   2012-07-18 
 17:42:20.804730245 +0900
 @@ -25,6 +25,7 @@
   
   int firmware_map_add_early(u64 start, u64 end, const char *type);
   int firmware_map_add_hotplug(u64 start, u64 end, const char *type);
 +int firmware_map_remove(u64 start, u64 end, const char *type);
   
   #else /* CONFIG_FIRMWARE_MEMMAP */
   
 @@ -38,6 +39,11 @@ static inline int firmware_map_add_hotpl
  return 0;
   }
   
 +static inline int firmware_map_remove(u64 start, u64 end, const char *type)
 +{
 +return 0;
 +}
 +
   #endif /* CONFIG_FIRMWARE_MEMMAP */
   
   #endif /* _LINUX_FIRMWARE_MAP_H */
 Index: linux-3.5-rc6/drivers/firmware/memmap.c
 ===
 --- linux-3.5-rc6.orig/drivers/firmware/memmap.c 2012-07-18 
 17:19:43.618300182 +0900
 +++ linux-3.5-rc6/drivers/firmware/memmap.c  2012-07-18 17:42:20.846729721 
 +0900
 @@ -21,6 +21,7 @@
   #include linux/types.h
   #include linux/bootmem.h
   #include linux/slab.h
 +#include linux/mm.h
   
   /*
* Data types 
 --
 @@ -79,7 +80,22 @@ static const struct sysfs_ops memmap_att
  .show = memmap_attr_show,
   };
   
 +#define to_memmap_entry(obj) container_of(obj, struct firmware_map_entry, 
 kobj)
 +
 +static void release_firmware_map_entry(struct kobject *kobj)
 +{
 +struct firmware_map_entry *entry = to_memmap_entry(kobj);
 +struct page *page;
 +
 +page = virt_to_page(entry);
 +if (PageSlab(page) || PageCompound(page))
 +kfree(entry);
 
 IIRC, this function's implementation has changed. Why did you change it?
 If PageCompound(page), should we check page->first_page's flags?

I forgot to mention this change in the changelog. Jiang and Christoph discussed
how to find the slab page:

- https://lkml.org/lkml/2012/7/6/333

Then, Christoph proposed this method.  So I changed it.

Thanks,
Yasuaki Ishimatsu

 
 Thanks
 Wen Congyang
 
 +
 +/* There is no way to free memory allocated from bootmem*/
 +}
 +
   static struct kobj_type memmap_ktype = {
 +.release= release_firmware_map_entry,
  .sysfs_ops  = &memmap_attr_ops,
  .default_attrs  = def_attrs,
   };
 @@ -123,6 +139,16 @@ static int firmware_map_add_entry(u64 st
  return 0;
   }
   
 +/**
 + * firmware_map_remove_entry() - Does the real work to remove a firmware
 + * memmap entry.
 + * @entry: removed entry.
 + **/
 +static inline void firmware_map_remove_entry(struct firmware_map_entry 
 *entry)
 +{
 +list_del(entry

Re: [RFC PATCH v3 4/13] memory-hotplug : remove /sys/firmware/memmap/X sysfs

2012-07-16 Thread Yasuaki Ishimatsu
Hi Wen,

2012/07/13 18:10, Wen Congyang wrote:
 At 07/09/2012 06:26 PM, Yasuaki Ishimatsu Wrote:
 When (hot)adding memory into system, /sys/firmware/memmap/X/{end, start, 
 type}
 sysfs files are created. But there is no code to remove these files. The 
 patch
 implements the function to remove them.

 Note : The code does not free firmware_map_entry since there is no way to 
 free
 memory which is allocated by bootmem.

 CC: David Rientjes rient...@google.com
 CC: Jiang Liu liu...@gmail.com
 CC: Len Brown len.br...@intel.com
 CC: Benjamin Herrenschmidt b...@kernel.crashing.org
 CC: Paul Mackerras pau...@samba.org
 CC: Christoph Lameter c...@linux.com
 Cc: Minchan Kim minchan@gmail.com
 CC: Andrew Morton a...@linux-foundation.org
 CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
 CC: Wen Congyang we...@cn.fujitsu.com
 Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

 ---
   drivers/firmware/memmap.c|   78 
 ++-
   include/linux/firmware-map.h |6 +++
   mm/memory_hotplug.c  |6 ++-
   3 files changed, 88 insertions(+), 2 deletions(-)

 Index: linux-3.5-rc6/mm/memory_hotplug.c
 ===
 --- linux-3.5-rc6.orig/mm/memory_hotplug.c   2012-07-09 18:23:13.323844923 
 +0900
 +++ linux-3.5-rc6/mm/memory_hotplug.c2012-07-09 18:23:19.522767424 
 +0900
 @@ -661,7 +661,11 @@ EXPORT_SYMBOL_GPL(add_memory);

   int remove_memory(int nid, u64 start, u64 size)
   {
 -return -EBUSY;
 +lock_memory_hotplug();
 +/* remove memmap entry */
 +firmware_map_remove(start, start + size - 1, "System RAM");
 
 firmware_map_remove() is in meminit section, so remove_memory() should be in
 ref section.

I'll add it.

Thanks,
Yasuaki Ishimatsu

 
 Thanks
 Wen Congyang
 
 +unlock_memory_hotplug();
 +return 0;

   }
   EXPORT_SYMBOL_GPL(remove_memory);
 Index: linux-3.5-rc6/include/linux/firmware-map.h
 ===
 --- linux-3.5-rc6.orig/include/linux/firmware-map.h  2012-07-09 
 18:23:09.532892314 +0900
 +++ linux-3.5-rc6/include/linux/firmware-map.h   2012-07-09 
 18:23:19.523767412 +0900
 @@ -25,6 +25,7 @@

   int firmware_map_add_early(u64 start, u64 end, const char *type);
   int firmware_map_add_hotplug(u64 start, u64 end, const char *type);
 +int firmware_map_remove(u64 start, u64 end, const char *type);

   #else /* CONFIG_FIRMWARE_MEMMAP */

 @@ -38,6 +39,11 @@ static inline int firmware_map_add_hotpl
  return 0;
   }

 +static inline int firmware_map_remove(u64 start, u64 end, const char *type)
 +{
 +return 0;
 +}
 +
   #endif /* CONFIG_FIRMWARE_MEMMAP */

   #endif /* _LINUX_FIRMWARE_MAP_H */
 Index: linux-3.5-rc6/drivers/firmware/memmap.c
 ===
 --- linux-3.5-rc6.orig/drivers/firmware/memmap.c 2012-07-09 
 18:23:09.532892314 +0900
 +++ linux-3.5-rc6/drivers/firmware/memmap.c  2012-07-09 18:25:46.371931554 
 +0900
 @@ -21,6 +21,7 @@
   #include <linux/types.h>
   #include <linux/bootmem.h>
   #include <linux/slab.h>
 +#include <linux/mm.h>

   /*
* Data types 
 --
 @@ -79,7 +80,22 @@ static const struct sysfs_ops memmap_att
  .show = memmap_attr_show,
   };

 +#define to_memmap_entry(obj) container_of(obj, struct firmware_map_entry, 
 kobj)
 +
 +static void release_firmware_map_entry(struct kobject *kobj)
 +{
 +struct firmware_map_entry *entry = to_memmap_entry(kobj);
 +struct page *head_page;
 +
 +head_page = virt_to_head_page(entry);
 +if (PageSlab(head_page))
 +kfree(entry);
 +
 +/* There is no way to free memory allocated from bootmem*/
 +}
 +
   static struct kobj_type memmap_ktype = {
 +.release= release_firmware_map_entry,
  .sysfs_ops  = &memmap_attr_ops,
  .default_attrs  = def_attrs,
   };
 @@ -123,6 +139,16 @@ static int firmware_map_add_entry(u64 st
  return 0;
   }

 +/**
 + * firmware_map_remove_entry() - Does the real work to remove a firmware
 + * memmap entry.
 + * @entry: removed entry.
 + **/
 +static inline void firmware_map_remove_entry(struct firmware_map_entry 
 *entry)
 +{
 +list_del(&entry->list);
 +}
 +
   /*
* Add memmap entry on sysfs
*/
 @@ -144,6 +170,31 @@ static int add_sysfs_fw_map_entry(struct
  return 0;
   }

 +/*
 + * Remove memmap entry on sysfs
 + */
 +static inline void remove_sysfs_fw_map_entry(struct firmware_map_entry 
 *entry)
 +{
 +kobject_put(&entry->kobj);
 +}
 +
 +/*
 + * Search memmap entry
 + */
 +
 +struct firmware_map_entry * __meminit
 +find_firmware_map_entry(u64 start, u64 end, const char *type)
 +{
 +struct firmware_map_entry *entry;
 +
 +list_for_each_entry(entry, &map_entries, list)
 +if ((entry->start == start) && (entry->end == end) &&
 +(!strcmp(entry->type, type)))
 +return

Re: [RFC PATCH v3 4/13] memory-hotplug : remove /sys/firmware/memmap/X sysfs

2012-07-16 Thread Yasuaki Ishimatsu
Hi Wen,

2012/07/16 11:32, Wen Congyang wrote:
 At 07/09/2012 06:26 PM, Yasuaki Ishimatsu Wrote:
 When (hot)adding memory into system, /sys/firmware/memmap/X/{end, start, 
 type}
 sysfs files are created. But there is no code to remove these files. The 
 patch
 implements the function to remove them.

 Note : The code does not free firmware_map_entry since there is no way to 
 free
 memory which is allocated by bootmem.

 CC: David Rientjes rient...@google.com
 CC: Jiang Liu liu...@gmail.com
 CC: Len Brown len.br...@intel.com
 CC: Benjamin Herrenschmidt b...@kernel.crashing.org
 CC: Paul Mackerras pau...@samba.org
 CC: Christoph Lameter c...@linux.com
 Cc: Minchan Kim minchan@gmail.com
 CC: Andrew Morton a...@linux-foundation.org
 CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
 CC: Wen Congyang we...@cn.fujitsu.com
 Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

 ---
   drivers/firmware/memmap.c|   78 
 ++-
   include/linux/firmware-map.h |6 +++
   mm/memory_hotplug.c  |6 ++-
   3 files changed, 88 insertions(+), 2 deletions(-)

 Index: linux-3.5-rc6/mm/memory_hotplug.c
 ===
 --- linux-3.5-rc6.orig/mm/memory_hotplug.c   2012-07-09 18:23:13.323844923 
 +0900
 +++ linux-3.5-rc6/mm/memory_hotplug.c2012-07-09 18:23:19.522767424 
 +0900
 @@ -661,7 +661,11 @@ EXPORT_SYMBOL_GPL(add_memory);

   int remove_memory(int nid, u64 start, u64 size)
   {
 -return -EBUSY;
 +lock_memory_hotplug();
 +/* remove memmap entry */
 +firmware_map_remove(start, start + size - 1, "System RAM");
 +unlock_memory_hotplug();
 +return 0;

   }
   EXPORT_SYMBOL_GPL(remove_memory);
 Index: linux-3.5-rc6/include/linux/firmware-map.h
 ===
 --- linux-3.5-rc6.orig/include/linux/firmware-map.h  2012-07-09 
 18:23:09.532892314 +0900
 +++ linux-3.5-rc6/include/linux/firmware-map.h   2012-07-09 
 18:23:19.523767412 +0900
 @@ -25,6 +25,7 @@

   int firmware_map_add_early(u64 start, u64 end, const char *type);
   int firmware_map_add_hotplug(u64 start, u64 end, const char *type);
 +int firmware_map_remove(u64 start, u64 end, const char *type);

   #else /* CONFIG_FIRMWARE_MEMMAP */

 @@ -38,6 +39,11 @@ static inline int firmware_map_add_hotpl
  return 0;
   }

 +static inline int firmware_map_remove(u64 start, u64 end, const char *type)
 +{
 +return 0;
 +}
 +
   #endif /* CONFIG_FIRMWARE_MEMMAP */

   #endif /* _LINUX_FIRMWARE_MAP_H */
 Index: linux-3.5-rc6/drivers/firmware/memmap.c
 ===
 --- linux-3.5-rc6.orig/drivers/firmware/memmap.c 2012-07-09 
 18:23:09.532892314 +0900
 +++ linux-3.5-rc6/drivers/firmware/memmap.c  2012-07-09 18:25:46.371931554 
 +0900
 @@ -21,6 +21,7 @@
   #include <linux/types.h>
   #include <linux/bootmem.h>
   #include <linux/slab.h>
 +#include <linux/mm.h>

   /*
* Data types 
 --
 @@ -79,7 +80,22 @@ static const struct sysfs_ops memmap_att
  .show = memmap_attr_show,
   };

 +#define to_memmap_entry(obj) container_of(obj, struct firmware_map_entry, 
 kobj)
 +
 +static void release_firmware_map_entry(struct kobject *kobj)
 +{
 +struct firmware_map_entry *entry = to_memmap_entry(kobj);
 +struct page *head_page;
 +
 +head_page = virt_to_head_page(entry);
 +if (PageSlab(head_page))
 +kfree(entry);
 +
 +/* There is no way to free memory allocated from bootmem*/
 +}
 +
   static struct kobj_type memmap_ktype = {
 +.release= release_firmware_map_entry,
  .sysfs_ops  = &memmap_attr_ops,
  .default_attrs  = def_attrs,
   };
 @@ -123,6 +139,16 @@ static int firmware_map_add_entry(u64 st
  return 0;
   }

 +/**
 + * firmware_map_remove_entry() - Does the real work to remove a firmware
 + * memmap entry.
 + * @entry: removed entry.
 + **/
 +static inline void firmware_map_remove_entry(struct firmware_map_entry 
 *entry)
 +{
 +list_del(&entry->list);
 +}
 +
   /*
* Add memmap entry on sysfs
*/
 @@ -144,6 +170,31 @@ static int add_sysfs_fw_map_entry(struct
  return 0;
   }

 +/*
 + * Remove memmap entry on sysfs
 + */
 +static inline void remove_sysfs_fw_map_entry(struct firmware_map_entry 
 *entry)
 +{
 +kobject_put(&entry->kobj);
 +}
 +
 +/*
 + * Search memmap entry
 + */
 +
 +struct firmware_map_entry * __meminit
 +find_firmware_map_entry(u64 start, u64 end, const char *type)
 +{
 +struct firmware_map_entry *entry;
 +
 +list_for_each_entry(entry, &map_entries, list)
 +if ((entry->start == start) && (entry->end == end) &&
 +(!strcmp(entry->type, type)))
 +return entry;
 +
 +return NULL;
 +}
 +
   /**
* firmware_map_add_hotplug() - Adds a firmware mapping entry when we do
* memory hotplug.
 @@ -196,6 +247,32

Re: [RFC PATCH v3 2/13] memory-hotplug : add physical memory hotplug code to acpi_memory_device_remove

2012-07-16 Thread Yasuaki Ishimatsu
Hi Wen,

2012/07/13 12:26, Wen Congyang wrote:
 At 07/09/2012 06:24 PM, Yasuaki Ishimatsu Wrote:
 acpi_memory_device_remove() has been prepared to remove physical memory.
 But the function currently only frees acpi_memory_device.

 The patch adds following functions into acpi_memory_device_remove():
- offline memory
- remove physical memory (only return -EBUSY)
- free acpi_memory_device

 CC: David Rientjes rient...@google.com
 CC: Jiang Liu liu...@gmail.com
 CC: Len Brown len.br...@intel.com
 CC: Benjamin Herrenschmidt b...@kernel.crashing.org
 CC: Paul Mackerras pau...@samba.org
 CC: Christoph Lameter c...@linux.com
 Cc: Minchan Kim minchan@gmail.com
 CC: Andrew Morton a...@linux-foundation.org
 CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
 CC: Wen Congyang we...@cn.fujitsu.com
 Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

 ---
   drivers/acpi/acpi_memhotplug.c |   26 +-
   drivers/base/memory.c  |   39 
 +++
   include/linux/memory.h |5 +
   include/linux/memory_hotplug.h |1 +
   mm/memory_hotplug.c|8 
   5 files changed, 78 insertions(+), 1 deletion(-)

 Index: linux-3.5-rc6/drivers/acpi/acpi_memhotplug.c
 ===
 --- linux-3.5-rc6.orig/drivers/acpi/acpi_memhotplug.c2012-07-09 
 18:08:29.946888653 +0900
 +++ linux-3.5-rc6/drivers/acpi/acpi_memhotplug.c 2012-07-09 
 18:08:43.470719531 +0900
 @@ -29,6 +29,7 @@
   #include <linux/module.h>
   #include <linux/init.h>
   #include <linux/types.h>
 +#include <linux/memory.h>
   #include <linux/memory_hotplug.h>
   #include <linux/slab.h>
   #include <acpi/acpi_drivers.h>
 @@ -452,12 +453,35 @@ static int acpi_memory_device_add(struct
   static int acpi_memory_device_remove(struct acpi_device *device, int type)
   {
  struct acpi_memory_device *mem_device = NULL;
 -
 +struct acpi_memory_info *info, *tmp;
 +int result;
 +int node;

  if (!device || !acpi_driver_data(device))
  return -EINVAL;

  mem_device = acpi_driver_data(device);
 +
 +node = acpi_get_node(mem_device->device->handle);
 +
 +list_for_each_entry_safe(info, tmp, &mem_device->res_list, list) {
 +if (!info->enabled)
 +continue;
 +
 +if (!is_memblk_offline(info->start_addr, info->length)) {
 +result = offline_memory(info->start_addr, info->length);
 +if (result)
 +return result;
 +}
 +
 +result = remove_memory(node, info->start_addr, info->length);
 +if (result)
 +return result;
 +
 +list_del(&info->list);
 +kfree(info);
 +}
 +
  kfree(mem_device);

  return 0;
 Index: linux-3.5-rc6/include/linux/memory_hotplug.h
 ===
 --- linux-3.5-rc6.orig/include/linux/memory_hotplug.h2012-07-09 
 18:08:29.955888542 +0900
 +++ linux-3.5-rc6/include/linux/memory_hotplug.h 2012-07-09 
 18:08:43.471719518 +0900
 @@ -233,6 +233,7 @@ static inline int is_mem_section_removab
   extern int mem_online_node(int nid);
   extern int add_memory(int nid, u64 start, u64 size);
   extern int arch_add_memory(int nid, u64 start, u64 size);
 +extern int remove_memory(int nid, u64 start, u64 size);
 
 
 Here should be:
 #ifdef CONFIG_MEMORY_HOTREMOVE
 extern int remove_memory(int nid, u64 start, u64 size);
 #else
 static int inline remove_memory(int nid, u64 start, u64 size)
 {
   return -EBUSY;
 }
 #endif

O.K. I'll update it.

Thanks,
Yasuaki Ishimatsu
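
The header pattern Wen suggests above can be tried outside the kernel. The following is only a userspace sketch, not the code that was merged: the u64 typedef and the try_remove() caller are assumptions added for illustration, and the stub is written as "static inline int", which is the more common spelling of the same declaration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's u64 type. */
typedef uint64_t u64;

/*
 * With -DCONFIG_MEMORY_HOTREMOVE the real function is declared and must
 * be defined elsewhere. Without it, callers get an inline stub that
 * reports that removal is not possible.
 */
#ifdef CONFIG_MEMORY_HOTREMOVE
extern int remove_memory(int nid, u64 start, u64 size);
#else
static inline int remove_memory(int nid, u64 start, u64 size)
{
	(void)nid;
	(void)start;
	(void)size;
	return -EBUSY;
}
#endif

/* Hypothetical caller: the same source compiles in both configurations. */
static int try_remove(int nid, u64 start, u64 size)
{
	assert(size != 0);	/* an empty range would be a caller bug */
	return remove_memory(nid, start, size);
}
```

Built without the config symbol, try_remove() returns -EBUSY, so callers such as acpi_memory_device_remove() need no #ifdef of their own.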


 
   extern int offline_memory(u64 start, u64 size);
   extern int sparse_add_one_section(struct zone *zone, unsigned long 
 start_pfn,
  int nr_pages);
 Index: linux-3.5-rc6/mm/memory_hotplug.c
 ===
 --- linux-3.5-rc6.orig/mm/memory_hotplug.c   2012-07-09 18:08:29.953888567 
 +0900
 +++ linux-3.5-rc6/mm/memory_hotplug.c2012-07-09 18:08:43.476719455 
 +0900
 @@ -659,6 +659,14 @@ out:
   }
   EXPORT_SYMBOL_GPL(add_memory);

 +int remove_memory(int nid, u64 start, u64 size)
 +{
 +return -EBUSY;
 +
 +}
 +EXPORT_SYMBOL_GPL(remove_memory);
 
 We only need to implement this function when CONFIG_MEMORY_HOTREMOVE
 is defined here.
 
 Thanks
 Wen Congyang
 
 +
 +
   #ifdef CONFIG_MEMORY_HOTREMOVE
   /*
* A free page on the buddy free lists (not the per-cpu lists) has 
 PageBuddy
 Index: linux-3.5-rc6/drivers/base/memory.c
 ===
 --- linux-3.5-rc6.orig/drivers/base/memory.c 2012-07-09 18:08:29.947888640 
 +0900
 +++ linux-3.5-rc6/drivers/base/memory.c  2012-07-09 18:10:54.880076739 
 +0900
 @@ -70,6 +70,45 @@ void unregister_memory_isolate_notifier

Re: [RFC PATCH v3 2/13] memory-hotplug : add physical memory hotplug code to acpi_memory_device_remove

2012-07-16 Thread Yasuaki Ishimatsu
Hi Wen,

2012/07/13 19:40, Wen Congyang wrote:
 At 07/09/2012 06:24 PM, Yasuaki Ishimatsu Wrote:
 acpi_memory_device_remove() has been prepared to remove physical memory.
 But the function currently only frees acpi_memory_device.

 The patch adds following functions into acpi_memory_device_remove():
- offline memory
- remove physical memory (only return -EBUSY)
- free acpi_memory_device

 CC: David Rientjes rient...@google.com
 CC: Jiang Liu liu...@gmail.com
 CC: Len Brown len.br...@intel.com
 CC: Benjamin Herrenschmidt b...@kernel.crashing.org
 CC: Paul Mackerras pau...@samba.org
 CC: Christoph Lameter c...@linux.com
 Cc: Minchan Kim minchan@gmail.com
 CC: Andrew Morton a...@linux-foundation.org
 CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
 CC: Wen Congyang we...@cn.fujitsu.com
 Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

 ---
   drivers/acpi/acpi_memhotplug.c |   26 +-
   drivers/base/memory.c  |   39 
 +++
   include/linux/memory.h |5 +
   include/linux/memory_hotplug.h |1 +
   mm/memory_hotplug.c|8 
   5 files changed, 78 insertions(+), 1 deletion(-)

 Index: linux-3.5-rc6/drivers/acpi/acpi_memhotplug.c
 ===
 --- linux-3.5-rc6.orig/drivers/acpi/acpi_memhotplug.c2012-07-09 
 18:08:29.946888653 +0900
 +++ linux-3.5-rc6/drivers/acpi/acpi_memhotplug.c 2012-07-09 
 18:08:43.470719531 +0900
 @@ -29,6 +29,7 @@
   #include <linux/module.h>
   #include <linux/init.h>
   #include <linux/types.h>
 +#include <linux/memory.h>
   #include <linux/memory_hotplug.h>
   #include <linux/slab.h>
   #include <acpi/acpi_drivers.h>
 @@ -452,12 +453,35 @@ static int acpi_memory_device_add(struct
   static int acpi_memory_device_remove(struct acpi_device *device, int type)
   {
  struct acpi_memory_device *mem_device = NULL;
 -
 +struct acpi_memory_info *info, *tmp;
 +int result;
 +int node;

  if (!device || !acpi_driver_data(device))
  return -EINVAL;

  mem_device = acpi_driver_data(device);
 +
 +node = acpi_get_node(mem_device->device->handle);
 
 acpi_get_node() may return -1, and you should call 
 memory_add_physaddr_to_nid()
 to get the node id.

O.K. I'll update it.

Thanks,
Yasuaki Ishimatsu
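
The fallback Wen asks for above is "use the ACPI node if there is one, otherwise derive it from the physical address". A minimal userspace sketch of that control flow follows. Both fake_* helpers are hypothetical stand-ins, not the kernel's acpi_get_node() or memory_add_physaddr_to_nid(), and the address-to-node mapping is invented for the example.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t u64;	/* hypothetical stand-in for the kernel type */

/* Hypothetical stand-in for acpi_get_node(): -1 means "no node known". */
static int fake_acpi_get_node(int handle)
{
	return handle >= 0 ? handle : -1;
}

/* Hypothetical stand-in for memory_add_physaddr_to_nid(). */
static int fake_physaddr_to_nid(u64 start)
{
	/* Invented mapping: bit 32 of the address selects node 0 or 1. */
	return (int)((start >> 32) & 1);
}

/* Prefer the firmware-provided node, fall back to the address lookup. */
static int resolve_node(int handle, u64 start)
{
	int node = fake_acpi_get_node(handle);

	if (node < 0)
		node = fake_physaddr_to_nid(start);

	assert(node >= 0);	/* the fallback always yields a valid node here */
	return node;
}
```

In this model a valid handle is returned unchanged, while a handle of -1 is resolved from the start address instead of being passed on to remove_memory().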

 
 Thanks
 Wen Congyang
 
 +
 +list_for_each_entry_safe(info, tmp, &mem_device->res_list, list) {
 +if (!info->enabled)
 +continue;
 +
 +if (!is_memblk_offline(info->start_addr, info->length)) {
 +result = offline_memory(info->start_addr, info->length);
 +if (result)
 +return result;
 +}
 +
 +result = remove_memory(node, info->start_addr, info->length);
 +if (result)
 +return result;
 +
 +list_del(&info->list);
 +kfree(info);
 +}
 +
  kfree(mem_device);

  return 0;
 Index: linux-3.5-rc6/include/linux/memory_hotplug.h
 ===
 --- linux-3.5-rc6.orig/include/linux/memory_hotplug.h2012-07-09 
 18:08:29.955888542 +0900
 +++ linux-3.5-rc6/include/linux/memory_hotplug.h 2012-07-09 
 18:08:43.471719518 +0900
 @@ -233,6 +233,7 @@ static inline int is_mem_section_removab
   extern int mem_online_node(int nid);
   extern int add_memory(int nid, u64 start, u64 size);
   extern int arch_add_memory(int nid, u64 start, u64 size);
 +extern int remove_memory(int nid, u64 start, u64 size);
   extern int offline_memory(u64 start, u64 size);
   extern int sparse_add_one_section(struct zone *zone, unsigned long 
 start_pfn,
  int nr_pages);
 Index: linux-3.5-rc6/mm/memory_hotplug.c
 ===
 --- linux-3.5-rc6.orig/mm/memory_hotplug.c   2012-07-09 18:08:29.953888567 
 +0900
 +++ linux-3.5-rc6/mm/memory_hotplug.c2012-07-09 18:08:43.476719455 
 +0900
 @@ -659,6 +659,14 @@ out:
   }
   EXPORT_SYMBOL_GPL(add_memory);

 +int remove_memory(int nid, u64 start, u64 size)
 +{
 +return -EBUSY;
 +
 +}
 +EXPORT_SYMBOL_GPL(remove_memory);
 +
 +
   #ifdef CONFIG_MEMORY_HOTREMOVE
   /*
* A free page on the buddy free lists (not the per-cpu lists) has 
 PageBuddy
 Index: linux-3.5-rc6/drivers/base/memory.c
 ===
 --- linux-3.5-rc6.orig/drivers/base/memory.c 2012-07-09 18:08:29.947888640 
 +0900
 +++ linux-3.5-rc6/drivers/base/memory.c  2012-07-09 18:10:54.880076739 
 +0900
 @@ -70,6 +70,45 @@ void unregister_memory_isolate_notifier(
   }
   EXPORT_SYMBOL(unregister_memory_isolate_notifier);

 +bool is_memblk_offline(unsigned long start, unsigned long size)
 +{
 +struct memory_block *mem = NULL;
 +struct mem_section *section;
 +unsigned

Re: [RFC PATCH v3 2/13] memory-hotplug : add physical memory hotplug code to acpi_memory_device_remove

2012-07-16 Thread Yasuaki Ishimatsu
Hi Wen,

2012/07/13 12:35, Wen Congyang wrote:
 At 07/09/2012 06:24 PM, Yasuaki Ishimatsu Wrote:
 acpi_memory_device_remove() has been prepared to remove physical memory.
 But the function currently only frees acpi_memory_device.

 The patch adds following functions into acpi_memory_device_remove():
- offline memory
- remove physical memory (only return -EBUSY)
- free acpi_memory_device

 CC: David Rientjes rient...@google.com
 CC: Jiang Liu liu...@gmail.com
 CC: Len Brown len.br...@intel.com
 CC: Benjamin Herrenschmidt b...@kernel.crashing.org
 CC: Paul Mackerras pau...@samba.org
 CC: Christoph Lameter c...@linux.com
 Cc: Minchan Kim minchan@gmail.com
 CC: Andrew Morton a...@linux-foundation.org
 CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
 CC: Wen Congyang we...@cn.fujitsu.com
 Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

 ---
   drivers/acpi/acpi_memhotplug.c |   26 +-
   drivers/base/memory.c  |   39 
 +++
   include/linux/memory.h |5 +
   include/linux/memory_hotplug.h |1 +
   mm/memory_hotplug.c|8 
   5 files changed, 78 insertions(+), 1 deletion(-)

 Index: linux-3.5-rc6/drivers/acpi/acpi_memhotplug.c
 ===
 --- linux-3.5-rc6.orig/drivers/acpi/acpi_memhotplug.c2012-07-09 
 18:08:29.946888653 +0900
 +++ linux-3.5-rc6/drivers/acpi/acpi_memhotplug.c 2012-07-09 
 18:08:43.470719531 +0900
 @@ -29,6 +29,7 @@
   #include <linux/module.h>
   #include <linux/init.h>
   #include <linux/types.h>
 +#include <linux/memory.h>
   #include <linux/memory_hotplug.h>
   #include <linux/slab.h>
   #include <acpi/acpi_drivers.h>
 @@ -452,12 +453,35 @@ static int acpi_memory_device_add(struct
   static int acpi_memory_device_remove(struct acpi_device *device, int type)
   {
  struct acpi_memory_device *mem_device = NULL;
 -
 +struct acpi_memory_info *info, *tmp;
 +int result;
 +int node;

  if (!device || !acpi_driver_data(device))
  return -EINVAL;

  mem_device = acpi_driver_data(device);
 +
 +node = acpi_get_node(mem_device->device->handle);
 +
 +list_for_each_entry_safe(info, tmp, &mem_device->res_list, list) {
 +if (!info->enabled)
 +continue;
 +
 +if (!is_memblk_offline(info->start_addr, info->length)) {
 +result = offline_memory(info->start_addr, info->length);
 +if (result)
 +return result;
 +}
 +
 +result = remove_memory(node, info->start_addr, info->length);
 
 The user may online the memory between offline_memory() and remove_memory().
 So I think we should lock memory hotplug before checking the memory's status
 and release it after remove_memory().

How about taking mem_block->state_mutex for the memory being removed? When offlining
memory, we need to change memory_block->state to MEM_OFFLINE, and we take
mem_block->state_mutex to do that. So I think that mutex would be useful here.

Thanks,
Yasuaki Ishimatsu

 
 Thanks
 Wen Congyang
 
 +if (result)
 +return result;
 +
 +list_del(&info->list);
 +kfree(info);
 +}
 +
  kfree(mem_device);

  return 0;
 Index: linux-3.5-rc6/include/linux/memory_hotplug.h
 ===
 --- linux-3.5-rc6.orig/include/linux/memory_hotplug.h2012-07-09 
 18:08:29.955888542 +0900
 +++ linux-3.5-rc6/include/linux/memory_hotplug.h 2012-07-09 
 18:08:43.471719518 +0900
 @@ -233,6 +233,7 @@ static inline int is_mem_section_removab
   extern int mem_online_node(int nid);
   extern int add_memory(int nid, u64 start, u64 size);
   extern int arch_add_memory(int nid, u64 start, u64 size);
 +extern int remove_memory(int nid, u64 start, u64 size);
   extern int offline_memory(u64 start, u64 size);
   extern int sparse_add_one_section(struct zone *zone, unsigned long 
 start_pfn,
  int nr_pages);
 Index: linux-3.5-rc6/mm/memory_hotplug.c
 ===
 --- linux-3.5-rc6.orig/mm/memory_hotplug.c   2012-07-09 18:08:29.953888567 
 +0900
 +++ linux-3.5-rc6/mm/memory_hotplug.c2012-07-09 18:08:43.476719455 
 +0900
 @@ -659,6 +659,14 @@ out:
   }
   EXPORT_SYMBOL_GPL(add_memory);

 +int remove_memory(int nid, u64 start, u64 size)
 +{
 +return -EBUSY;
 +
 +}
 +EXPORT_SYMBOL_GPL(remove_memory);
 +
 +
   #ifdef CONFIG_MEMORY_HOTREMOVE
   /*
* A free page on the buddy free lists (not the per-cpu lists) has 
 PageBuddy
 Index: linux-3.5-rc6/drivers/base/memory.c
 ===
 --- linux-3.5-rc6.orig/drivers/base/memory.c 2012-07-09 18:08:29.947888640 
 +0900
 +++ linux-3.5-rc6/drivers/base/memory.c  2012-07-09 18:10:54.880076739

Re: [RFC PATCH v3 2/13] memory-hotplug : add physical memory hotplug code to acpi_memory_device_remove

2012-07-16 Thread Yasuaki Ishimatsu
Hi Wen,

2012/07/17 10:44, Yasuaki Ishimatsu wrote:
 Hi Wen,
 
 2012/07/13 12:35, Wen Congyang wrote:
 At 07/09/2012 06:24 PM, Yasuaki Ishimatsu Wrote:
 acpi_memory_device_remove() has been prepared to remove physical memory.
 But the function currently only frees acpi_memory_device.

 The patch adds following functions into acpi_memory_device_remove():
 - offline memory
 - remove physical memory (only return -EBUSY)
 - free acpi_memory_device

 CC: David Rientjes rient...@google.com
 CC: Jiang Liu liu...@gmail.com
 CC: Len Brown len.br...@intel.com
 CC: Benjamin Herrenschmidt b...@kernel.crashing.org
 CC: Paul Mackerras pau...@samba.org
 CC: Christoph Lameter c...@linux.com
 Cc: Minchan Kim minchan@gmail.com
 CC: Andrew Morton a...@linux-foundation.org
 CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
 CC: Wen Congyang we...@cn.fujitsu.com
 Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

 ---
drivers/acpi/acpi_memhotplug.c |   26 +-
drivers/base/memory.c  |   39 
 +++
include/linux/memory.h |5 +
include/linux/memory_hotplug.h |1 +
mm/memory_hotplug.c|8 
5 files changed, 78 insertions(+), 1 deletion(-)

 Index: linux-3.5-rc6/drivers/acpi/acpi_memhotplug.c
 ===
 --- linux-3.5-rc6.orig/drivers/acpi/acpi_memhotplug.c   2012-07-09 
 18:08:29.946888653 +0900
 +++ linux-3.5-rc6/drivers/acpi/acpi_memhotplug.c2012-07-09 
 18:08:43.470719531 +0900
 @@ -29,6 +29,7 @@
#include <linux/module.h>
#include <linux/init.h>
#include <linux/types.h>
 +#include <linux/memory.h>
#include <linux/memory_hotplug.h>
#include <linux/slab.h>
#include <acpi/acpi_drivers.h>
 @@ -452,12 +453,35 @@ static int acpi_memory_device_add(struct
static int acpi_memory_device_remove(struct acpi_device *device, int 
 type)
{
 struct acpi_memory_device *mem_device = NULL;
 -
 +   struct acpi_memory_info *info, *tmp;
 +   int result;
 +   int node;

 if (!device || !acpi_driver_data(device))
 return -EINVAL;

 mem_device = acpi_driver_data(device);
 +
 +   node = acpi_get_node(mem_device->device->handle);
 +
 +   list_for_each_entry_safe(info, tmp, &mem_device->res_list, list) {
 +   if (!info->enabled)
 +   continue;
 +
 +   if (!is_memblk_offline(info->start_addr, info->length)) {
 +   result = offline_memory(info->start_addr, info->length);
 +   if (result)
 +   return result;
 +   }
 +
 +   result = remove_memory(node, info->start_addr, info->length);

 The user may online the memory between offline_memory() and remove_memory().
 So I think we should lock memory hotplug before checking the memory's status
 and release it after remove_memory().
 
 How about taking mem_block->state_mutex for the memory being removed? When offlining
 memory, we need to change memory_block->state to MEM_OFFLINE, and we take
 mem_block->state_mutex to do that. So I think that mutex would be useful here.

That is not a good idea, since remove_memory() frees the mem_block structure...
Do you have any ideas?

Thanks,
Yasuaki Ishimatsu

 Thanks,
 Yasuaki Ishimatsu
 

 Thanks
 Wen Congyang

 +   if (result)
 +   return result;
 +
 +   list_del(&info->list);
 +   kfree(info);
 +   }
 +
 kfree(mem_device);

 return 0;
 Index: linux-3.5-rc6/include/linux/memory_hotplug.h
 ===
 --- linux-3.5-rc6.orig/include/linux/memory_hotplug.h   2012-07-09 
 18:08:29.955888542 +0900
 +++ linux-3.5-rc6/include/linux/memory_hotplug.h2012-07-09 
 18:08:43.471719518 +0900
 @@ -233,6 +233,7 @@ static inline int is_mem_section_removab
extern int mem_online_node(int nid);
extern int add_memory(int nid, u64 start, u64 size);
extern int arch_add_memory(int nid, u64 start, u64 size);
 +extern int remove_memory(int nid, u64 start, u64 size);
extern int offline_memory(u64 start, u64 size);
extern int sparse_add_one_section(struct zone *zone, unsigned long 
 start_pfn,
 int nr_pages);
 Index: linux-3.5-rc6/mm/memory_hotplug.c
 ===
 --- linux-3.5-rc6.orig/mm/memory_hotplug.c  2012-07-09 18:08:29.953888567 
 +0900
 +++ linux-3.5-rc6/mm/memory_hotplug.c   2012-07-09 18:08:43.476719455 
 +0900
 @@ -659,6 +659,14 @@ out:
}
EXPORT_SYMBOL_GPL(add_memory);

 +int remove_memory(int nid, u64 start, u64 size)
 +{
 +   return -EBUSY;
 +
 +}
 +EXPORT_SYMBOL_GPL(remove_memory);
 +
 +
#ifdef CONFIG_MEMORY_HOTREMOVE
/*
 * A free page on the buddy free lists (not the per-cpu lists) has 
 PageBuddy
 Index: linux-3.5-rc6/drivers/base/memory.c

Re: [RFC PATCH v3 2/13] memory-hotplug : add physical memory hotplug code to acpi_memory_device_remove

2012-07-16 Thread Yasuaki Ishimatsu
Hi Wen,

2012/07/17 11:32, Wen Congyang wrote:
 At 07/17/2012 09:54 AM, Yasuaki Ishimatsu Wrote:
 Hi Wen,

 2012/07/17 10:44, Yasuaki Ishimatsu wrote:
 Hi Wen,

 2012/07/13 12:35, Wen Congyang wrote:
 At 07/09/2012 06:24 PM, Yasuaki Ishimatsu Wrote:
 acpi_memory_device_remove() has been prepared to remove physical memory.
 But the function currently only frees acpi_memory_device.

 The patch adds following functions into acpi_memory_device_remove():
  - offline memory
  - remove physical memory (only return -EBUSY)
  - free acpi_memory_device

 CC: David Rientjes rient...@google.com
 CC: Jiang Liu liu...@gmail.com
 CC: Len Brown len.br...@intel.com
 CC: Benjamin Herrenschmidt b...@kernel.crashing.org
 CC: Paul Mackerras pau...@samba.org
 CC: Christoph Lameter c...@linux.com
 Cc: Minchan Kim minchan@gmail.com
 CC: Andrew Morton a...@linux-foundation.org
 CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
 CC: Wen Congyang we...@cn.fujitsu.com
 Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

 ---
 drivers/acpi/acpi_memhotplug.c |   26 +-
 drivers/base/memory.c  |   39 
 +++
 include/linux/memory.h |5 +
 include/linux/memory_hotplug.h |1 +
 mm/memory_hotplug.c|8 
 5 files changed, 78 insertions(+), 1 deletion(-)

 Index: linux-3.5-rc6/drivers/acpi/acpi_memhotplug.c
 ===
 --- linux-3.5-rc6.orig/drivers/acpi/acpi_memhotplug.c 2012-07-09 
 18:08:29.946888653 +0900
 +++ linux-3.5-rc6/drivers/acpi/acpi_memhotplug.c  2012-07-09 
 18:08:43.470719531 +0900
 @@ -29,6 +29,7 @@
 #include <linux/module.h>
 #include <linux/init.h>
 #include <linux/types.h>
 +#include <linux/memory.h>
 #include <linux/memory_hotplug.h>
 #include <linux/slab.h>
 #include <acpi/acpi_drivers.h>
 @@ -452,12 +453,35 @@ static int acpi_memory_device_add(struct
 static int acpi_memory_device_remove(struct acpi_device *device, int 
 type)
 {
   struct acpi_memory_device *mem_device = NULL;
 -
 + struct acpi_memory_info *info, *tmp;
 + int result;
 + int node;

   if (!device || !acpi_driver_data(device))
   return -EINVAL;

   mem_device = acpi_driver_data(device);
 +
 + node = acpi_get_node(mem_device->device->handle);
 +
 + list_for_each_entry_safe(info, tmp, &mem_device->res_list, list) {
 + if (!info->enabled)
 + continue;
 +
 + if (!is_memblk_offline(info->start_addr, info->length)) {
 + result = offline_memory(info->start_addr, info->length);
 + if (result)
 + return result;
 + }
 +
 + result = remove_memory(node, info->start_addr, info->length);

 The user may online the memory between offline_memory() and 
 remove_memory().
 So I think we should lock memory hotplug before checking the memory's status
 and release it after remove_memory().

 How about taking mem_block->state_mutex for the memory being removed? When offlining
 memory, we need to change memory_block->state to MEM_OFFLINE, and we take
 mem_block->state_mutex to do that. So I think that mutex would be useful here.

 That is not a good idea, since remove_memory() frees the mem_block structure...
 Do you have any ideas?
 
 Hmm, split offline_memory() to 2 functions: offline_pages() and 
 __offline_pages()
 
 offline_pages()
   lock_memory_hotplug();
   __offline_pages();
   unlock_memory_hotplug();
 
 and implement remove_memory() like this:
 remove_memory()
   lock_memory_hotplug()
   if (!is_memblk_offline()) {
   __offline_pages();
   }
   // cleanup
   unlock_memory_hotplug();
 
 What about this?

I also thought about that once, but a problem remains. The current offline_pages()
cannot tell that the memory has already been removed by remove_memory(). So even if
we close the race with lock_memory_hotplug(), offline_pages() can still try to
offline the removed memory. offline_pages() needs some way to know that the memory
was removed, but I don't have a good idea for that.

Thanks,
Yasuaki Ishimatsu
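
For reference, the split Wen proposes above (a locked wrapper around an unlocked helper, with remove_memory() holding the lock across both the state check and the offline step) can be modelled in userspace. This is only a sketch: the pthread mutex stands in for lock_memory_hotplug(), a single pair of flags stands in for one memory block, and all names ending in _model or _unlocked are hypothetical. It does not address the remaining problem described above, that a later offline request cannot tell the block is already gone.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Stand-in for lock_memory_hotplug()/unlock_memory_hotplug(). */
static pthread_mutex_t hotplug_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical state of a single memory block. */
static bool block_online = true;
static bool block_present = true;

/* Unlocked helper: the caller must already hold hotplug_lock. */
static int offline_pages_unlocked(void)
{
	block_online = false;
	return 0;
}

/* Locked wrapper used by ordinary offline requests. */
int offline_pages_model(void)
{
	int ret;

	pthread_mutex_lock(&hotplug_lock);
	ret = offline_pages_unlocked();
	pthread_mutex_unlock(&hotplug_lock);
	return ret;
}

/* Removal checks the state and offlines under one lock acquisition. */
int remove_memory_model(void)
{
	int ret = 0;

	pthread_mutex_lock(&hotplug_lock);
	if (block_online)
		ret = offline_pages_unlocked();
	if (!ret) {
		/* Nobody can online the block between the check and here. */
		assert(!block_online);
		block_present = false;	/* the "cleanup" step */
	}
	pthread_mutex_unlock(&hotplug_lock);
	return ret;
}
```

Because the check and the offline step share one critical section, no other task can online the block in between. Calling offline_pages_model() afterwards still succeeds in this model, which is exactly the gap described above.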

 
 Thanks
 Wen Congyang

 Thanks,
 Yasuaki Ishimatsu

 Thanks,
 Yasuaki Ishimatsu


 Thanks
 Wen Congyang

 + if (result)
 + return result;
 +
 + list_del(&info->list);
 + kfree(info);
 + }
 +
   kfree(mem_device);

   return 0;
 Index: linux-3.5-rc6/include/linux/memory_hotplug.h
 ===
 --- linux-3.5-rc6.orig/include/linux/memory_hotplug.h 2012-07-09 
 18:08:29.955888542 +0900
 +++ linux-3.5-rc6/include/linux/memory_hotplug.h  2012-07-09 
 18:08:43.471719518 +0900
 @@ -233,6 +233,7 @@ static inline int is_mem_section_removab
 extern int mem_online_node(int nid);
 extern int add_memory(int nid, u64 start, u64 size);
 extern int arch_add_memory(int nid, u64 start, u64 size);
 +extern int

Re: [RFC PATCH v3 2/13] memory-hotplug : add physical memory hotplug code to acpi_memory_device_remove

2012-07-16 Thread Yasuaki Ishimatsu
Hi Wen,

2012/07/17 12:32, Wen Congyang wrote:
 At 07/17/2012 11:08 AM, Yasuaki Ishimatsu Wrote:
 Hi Wen,

 2012/07/17 11:32, Wen Congyang wrote:
 At 07/17/2012 09:54 AM, Yasuaki Ishimatsu Wrote:
 Hi Wen,

 2012/07/17 10:44, Yasuaki Ishimatsu wrote:
 Hi Wen,

 2012/07/13 12:35, Wen Congyang wrote:
 At 07/09/2012 06:24 PM, Yasuaki Ishimatsu Wrote:
 acpi_memory_device_remove() has been prepared to remove physical memory.
 But the function currently only frees acpi_memory_device.

 The patch adds the following functions to acpi_memory_device_remove():
   - offline memory
   - remove physical memory (only return -EBUSY)
   - free acpi_memory_device

 CC: David Rientjes rient...@google.com
 CC: Jiang Liu liu...@gmail.com
 CC: Len Brown len.br...@intel.com
 CC: Benjamin Herrenschmidt b...@kernel.crashing.org
 CC: Paul Mackerras pau...@samba.org
 CC: Christoph Lameter c...@linux.com
 Cc: Minchan Kim minchan@gmail.com
 CC: Andrew Morton a...@linux-foundation.org
 CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
 CC: Wen Congyang we...@cn.fujitsu.com
 Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

 ---
  drivers/acpi/acpi_memhotplug.c |   26 +-
  drivers/base/memory.c  |   39 
 +++
  include/linux/memory.h |5 +
  include/linux/memory_hotplug.h |1 +
  mm/memory_hotplug.c|8 
  5 files changed, 78 insertions(+), 1 deletion(-)

 Index: linux-3.5-rc6/drivers/acpi/acpi_memhotplug.c
 ===
 --- linux-3.5-rc6.orig/drivers/acpi/acpi_memhotplug.c   2012-07-09 
 18:08:29.946888653 +0900
 +++ linux-3.5-rc6/drivers/acpi/acpi_memhotplug.c2012-07-09 
 18:08:43.470719531 +0900
 @@ -29,6 +29,7 @@
  #include linux/module.h
  #include linux/init.h
  #include linux/types.h
 +#include linux/memory.h
  #include linux/memory_hotplug.h
  #include linux/slab.h
  #include acpi/acpi_drivers.h
 @@ -452,12 +453,35 @@ static int acpi_memory_device_add(struct
  static int acpi_memory_device_remove(struct acpi_device *device, 
 int type)
  {
 struct acpi_memory_device *mem_device = NULL;
 -
 +   struct acpi_memory_info *info, *tmp;
 +   int result;
 +   int node;

 if (!device || !acpi_driver_data(device))
 return -EINVAL;

 mem_device = acpi_driver_data(device);
 +
 +	node = acpi_get_node(mem_device->device->handle);
 +
 +	list_for_each_entry_safe(info, tmp, &mem_device->res_list, list) {
 +		if (!info->enabled)
 +			continue;
 +
 +		if (!is_memblk_offline(info->start_addr, info->length)) {
 +			result = offline_memory(info->start_addr, info->length);
 +			if (result)
 +				return result;
 +		}
 +
 +		result = remove_memory(node, info->start_addr, info->length);

 The user may online the memory between offline_memory() and
 remove_memory().
 So I think we should lock memory hotplug before checking the memory's status
 and release it after remove_memory().

 How about taking mem_block->state_mutex of the removed memory? When offlining
 memory, we need to change memory_block->state to MEM_OFFLINE.
 In this case, we take mem_block->state_mutex. So I think the mutex lock
 is beneficial.

 It is not a good idea since remove_memory() frees the mem_block structure...
 Do you have any ideas?

 Hmm, split offline_memory() into two functions: offline_pages() and
 __offline_pages()

 offline_pages()
   lock_memory_hotplug();
   __offline_pages();
   unlock_memory_hotplug();

 and implement remove_memory() like this:
 remove_memory()
   lock_memory_hotplug()
   if (!is_memblk_offline()) {
     __offline_pages();
   }
   // cleanup
   unlock_memory_hotplug();

 What about this?

 I also thought about it once. But a problem remains. The current offline_pages()
 cannot tell that the memory has been removed by remove_memory(). So even if we
 protect against the race with lock_memory_hotplug(), offline_pages() can still
 offline the removed memory. offline_pages() needs some means of knowing that the
 memory was removed, but I don't have a good idea for that.
 
 We can not online/offline part of memory block, so what about this?

It seems my concern was not clear.
When remove_memory() and offline_pages() run on the same memory simultaneously,
offline_pages() ends up operating on memory that has already been removed.

remove_memory()          | offline_pages()
-------------------------+-----------------------------------------------
lock_memory_hotplug()    |
                         | wait at lock_memory_hotplug()
remove memory            |
unlock_memory_hotplug()  |
                         | wake up and start offline_pages()
                         | offline page
                         | => but the memory has already been removed
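
The race in the timeline above can be illustrated with a small userspace
sketch. It is not kernel code: demo_memblk, demo_hotplug_lock and the
demo_* functions are hypothetical names, and a pthread mutex stands in for
lock_memory_hotplug(). It also assumes the block descriptor outlives the
removal, which is exactly what does not hold for mem_block in the thread,
so it only shows why the waiter has to re-check state after it gets the lock.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a memory block; not a kernel structure. */
struct demo_memblk {
	bool removed;	/* set once the block has been hot-removed */
	bool online;	/* true while the pages are in use */
};

/* Stand-in for the lock taken by lock_memory_hotplug(). */
static pthread_mutex_t demo_hotplug_lock = PTHREAD_MUTEX_INITIALIZER;

/* Caller must hold demo_hotplug_lock. */
static int demo_offline_locked(struct demo_memblk *blk)
{
	if (blk->removed)
		return -ENODEV;	/* block went away while we waited */
	blk->online = false;
	return 0;
}

/* Locked wrapper, mirroring the offline_pages()/__offline_pages() split. */
int demo_offline(struct demo_memblk *blk)
{
	int ret;

	pthread_mutex_lock(&demo_hotplug_lock);
	ret = demo_offline_locked(blk);
	pthread_mutex_unlock(&demo_hotplug_lock);
	return ret;
}

/* Offline if needed, then mark removed, all under a single lock hold. */
int demo_remove(struct demo_memblk *blk)
{
	int ret = 0;

	pthread_mutex_lock(&demo_hotplug_lock);
	if (blk->online)
		ret = demo_offline_locked(blk);
	if (!ret)
		blk->removed = true;
	pthread_mutex_unlock(&demo_hotplug_lock);
	return ret;
}
```

With this shape, a demo_offline() call that was blocked behind demo_remove()
wakes up, sees the removed flag and returns -ENODEV instead of touching the
block. Where such a flag could safely live in the real code is the open
question of this thread.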

Re: [RFC PATCH v3 2/13] memory-hotplug : add physical memory hotplug code to acpi_memory_device_remove

2012-07-16 Thread Yasuaki Ishimatsu
Hi Wen,

2012/07/17 14:17, Wen Congyang wrote:
 At 07/17/2012 12:51 PM, Yasuaki Ishimatsu Wrote:
 Hi Wen,

 2012/07/17 12:32, Wen Congyang wrote:
 At 07/17/2012 11:08 AM, Yasuaki Ishimatsu Wrote:
 Hi Wen,

 2012/07/17 11:32, Wen Congyang wrote:
 At 07/17/2012 09:54 AM, Yasuaki Ishimatsu Wrote:
 Hi Wen,

 2012/07/17 10:44, Yasuaki Ishimatsu wrote:
 Hi Wen,

 2012/07/13 12:35, Wen Congyang wrote:
 At 07/09/2012 06:24 PM, Yasuaki Ishimatsu Wrote:
 acpi_memory_device_remove() has been prepared to remove physical memory.
 But the function currently only frees acpi_memory_device.

 The patch adds the following functions to acpi_memory_device_remove():
- offline memory
- remove physical memory (only return -EBUSY)
- free acpi_memory_device

 CC: David Rientjes rient...@google.com
 CC: Jiang Liu liu...@gmail.com
 CC: Len Brown len.br...@intel.com
 CC: Benjamin Herrenschmidt b...@kernel.crashing.org
 CC: Paul Mackerras pau...@samba.org
 CC: Christoph Lameter c...@linux.com
 Cc: Minchan Kim minchan@gmail.com
 CC: Andrew Morton a...@linux-foundation.org
 CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
 CC: Wen Congyang we...@cn.fujitsu.com
 Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

 ---
   drivers/acpi/acpi_memhotplug.c |   26 +-
   drivers/base/memory.c  |   39 
 +++
   include/linux/memory.h |5 +
   include/linux/memory_hotplug.h |1 +
   mm/memory_hotplug.c|8 
   5 files changed, 78 insertions(+), 1 deletion(-)

 Index: linux-3.5-rc6/drivers/acpi/acpi_memhotplug.c
 ===
 --- linux-3.5-rc6.orig/drivers/acpi/acpi_memhotplug.c 2012-07-09 
 18:08:29.946888653 +0900
 +++ linux-3.5-rc6/drivers/acpi/acpi_memhotplug.c  2012-07-09 
 18:08:43.470719531 +0900
 @@ -29,6 +29,7 @@
   #include linux/module.h
   #include linux/init.h
   #include linux/types.h
 +#include linux/memory.h
   #include linux/memory_hotplug.h
   #include linux/slab.h
   #include acpi/acpi_drivers.h
 @@ -452,12 +453,35 @@ static int acpi_memory_device_add(struct
   static int acpi_memory_device_remove(struct acpi_device 
 *device, int type)
   {
   struct acpi_memory_device *mem_device = NULL;
 -
 + struct acpi_memory_info *info, *tmp;
 + int result;
 + int node;

   if (!device || !acpi_driver_data(device))
   return -EINVAL;

   mem_device = acpi_driver_data(device);
 +
 +	node = acpi_get_node(mem_device->device->handle);
 +
 +	list_for_each_entry_safe(info, tmp, &mem_device->res_list, list) {
 +		if (!info->enabled)
 +			continue;
 +
 +		if (!is_memblk_offline(info->start_addr, info->length)) {
 +			result = offline_memory(info->start_addr, info->length);
 +			if (result)
 +				return result;
 +		}
 +
 +		result = remove_memory(node, info->start_addr, info->length);

 The user may online the memory between offline_memory() and
 remove_memory().
 So I think we should lock memory hotplug before checking the memory's status
 and release it after remove_memory().

 How about taking mem_block->state_mutex of the removed memory? When offlining
 memory, we need to change memory_block->state to MEM_OFFLINE.
 In this case, we take mem_block->state_mutex. So I think the mutex lock
 is beneficial.

 It is not a good idea since remove_memory() frees the mem_block structure...
 Do you have any ideas?

 Hmm, split offline_memory() into two functions: offline_pages() and
 __offline_pages()

 offline_pages()
   lock_memory_hotplug();
   __offline_pages();
   unlock_memory_hotplug();

 and implement remove_memory() like this:
 remove_memory()
   lock_memory_hotplug()
   if (!is_memblk_offline()) {
     __offline_pages();
   }
   // cleanup
   unlock_memory_hotplug();

 What about this?

 I also thought about it once. But a problem remains. The current offline_pages()
 cannot tell that the memory has been removed by remove_memory(). So even if we
 protect against the race with lock_memory_hotplug(), offline_pages() can still
 offline the removed memory. offline_pages() needs some means of knowing that the
 memory was removed, but I don't have a good idea for that.

 We can not online/offline part of memory block, so what about this?

 It seems my concern was not clear.
 When remove_memory() and offline_pages() run on the same memory simultaneously,
 offline_pages() ends up operating on memory that has already been removed.

 remove_memory()          | offline_pages()
 -------------------------+-----------------------------------------------
 lock_memory_hotplug()    |
                          | wait at lock_memory_hotplug()
 remove memory            |
 unlock_memory_hotplug()  |
                          | wake up and start offline_pages

Re: [RFC PATCH v3 3/13] memory-hotplug : unify argument of firmware_map_add_early/hotplug

2012-07-12 Thread Yasuaki Ishimatsu
Hi Dave,

2012/07/12 22:40, Dave Hansen wrote:
 On 07/11/2012 09:52 PM, Yasuaki Ishimatsu wrote:
 Does the following patch include your comment? If O.K., I will separate
 the patch from the series and send it for bug fix.
 
 Looks sane to me.  It does now mean that the calling conventions for
 some of the other firmware_map*() functions are different, but I think
 that's OK since they're only used internally to memmap.c.

Thank you for reviewing my patch.
I'll send the patch.

Thanks,
Yasuaki Ishimatsu

 --
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/
 



___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [RFC PATCH v3 3/13] memory-hotplug : unify argument of firmware_map_add_early/hotplug

2012-07-12 Thread Yasuaki Ishimatsu
Hi Dave,

2012/07/12 22:40, Dave Hansen wrote:
 On 07/11/2012 09:52 PM, Yasuaki Ishimatsu wrote:
 Does the following patch include your comment? If O.K., I will separate
 the patch from the series and send it for bug fix.
 
 Looks sane to me.  It does now mean that the calling conventions for
 some of the other firmware_map*() functions are different, but I think
 that's OK since they're only used internally to memmap.c.

Can I add Reviewed-by: Dave Hansen to the patch?

Thanks,
Yasuaki Ishimatsu



Re: [RFC PATCH v3 11/13] memory-hotplug : free memmap of sparse-vmemmap

2012-07-11 Thread Yasuaki Ishimatsu
Hi Wen,

2012/07/11 15:25, Wen Congyang wrote:
 At 07/11/2012 01:52 PM, Yasuaki Ishimatsu Wrote:
 2012/07/11 14:06, Wen Congyang wrote:
 Hi Wen,

 At 07/09/2012 06:33 PM, Yasuaki Ishimatsu Wrote:
 I don't think that all pages of the virtual mapping in removed memory can be
 freed, since pages whose type is MIX_SECTION_INFO are difficult to free.
 So, as a first step, the patch only frees pages whose type is SECTION_INFO.

 CC: David Rientjes rient...@google.com
 CC: Jiang Liu liu...@gmail.com
 CC: Len Brown len.br...@intel.com
 CC: Benjamin Herrenschmidt b...@kernel.crashing.org
 CC: Paul Mackerras pau...@samba.org
 CC: Christoph Lameter c...@linux.com
 Cc: Minchan Kim minchan@gmail.com
 CC: Andrew Morton a...@linux-foundation.org
 CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
 CC: Wen Congyang we...@cn.fujitsu.com
 Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

 ---
arch/x86/mm/init_64.c |   91 
 ++
include/linux/mm.h|2 +
mm/memory_hotplug.c   |5 ++
mm/sparse.c   |5 +-
4 files changed, 101 insertions(+), 2 deletions(-)

 Index: linux-3.5-rc4/include/linux/mm.h
 ===
 --- linux-3.5-rc4.orig/include/linux/mm.h  2012-07-03 14:22:18.530011567 
 +0900
 +++ linux-3.5-rc4/include/linux/mm.h   2012-07-03 14:22:20.83872 
 +0900
 @@ -1588,6 +1588,8 @@ int vmemmap_populate(struct page *start_
void vmemmap_populate_print_last(void);
void register_page_bootmem_memmap(unsigned long section_nr, struct page 
 *map,
  unsigned long size);
 +void vmemmap_kfree(struct page *memmpa, unsigned long nr_pages);
 +void vmemmap_free_bootmem(struct page *memmpa, unsigned long nr_pages);

enum mf_flags {
MF_COUNT_INCREASED = 1  0,
 Index: linux-3.5-rc4/mm/sparse.c
 ===
 --- linux-3.5-rc4.orig/mm/sparse.c 2012-07-03 14:21:45.071429805 +0900
 +++ linux-3.5-rc4/mm/sparse.c  2012-07-03 14:22:21.000983767 +0900
 @@ -614,12 +614,13 @@ static inline struct page *kmalloc_secti
/* This will make the necessary allocations eventually. */
return sparse_mem_map_populate(pnum, nid);
}
 -static void __kfree_section_memmap(struct page *memmap, unsigned long 
 nr_pages)
 +static void __kfree_section_memmap(struct page *page, unsigned long 
 nr_pages)
{
 -  return; /* XXX: Not implemented yet */
 +  vmemmap_kfree(page, nr_pages);

 Hmm, I think you try to free the memory allocated in 
 kmalloc_section_memmap().

 Yes.


}
static void free_map_bootmem(struct page *page, unsigned long nr_pages)
{
 +  vmemmap_free_bootmem(page, nr_pages);
}

 Hmm, which function is the memory you try to free allocated in?

 The function tries to free memory allocated from bootmem. The memory has
 been registered by get_page_bootmem(), so we can free it with
 put_page_bootmem().
 
 OK, I will read these codes, and check it.
 


#else
static struct page *__kmalloc_section_memmap(unsigned long nr_pages)
 Index: linux-3.5-rc4/arch/x86/mm/init_64.c
 ===
 --- linux-3.5-rc4.orig/arch/x86/mm/init_64.c   2012-07-03 
 14:22:18.538011465 +0900
 +++ linux-3.5-rc4/arch/x86/mm/init_64.c2012-07-03 14:22:21.007983103 
 +0900
 @@ -978,6 +978,97 @@ vmemmap_populate(struct page *start_page
return 0;
}

 +unsigned long find_and_clear_pte_page(unsigned long addr, unsigned long 
 end,
 +struct page **pp)
 +{
 +  pgd_t *pgd;
 +  pud_t *pud;
 +  pmd_t *pmd;
 +  pte_t *pte;
 +  unsigned long next;
 +
 +  *pp = NULL;
 +
 +  pgd = pgd_offset_k(addr);
 +  if (pgd_none(*pgd))
 +  return (addr + PAGE_SIZE) & PAGE_MASK;

 Hmm, why not goto next pgd?

 Does it mean return (addr + PGDIR_SIZE) & PGDIR_MASK?


 +
 +  pud = pud_offset(pgd, addr);
 +  if (pud_none(*pud))
 +  return (addr + PAGE_SIZE) & PAGE_MASK;
 +
 +  if (!cpu_has_pse) {
 +  next = (addr + PAGE_SIZE) & PAGE_MASK;
 +  pmd = pmd_offset(pud, addr);
 +  if (pmd_none(*pmd))
 +  return next;
 +
 +  pte = pte_offset_kernel(pmd, addr);
 +  if (pte_none(*pte))
 +  return next;
 +
 +  *pp = pte_page(*pte);
 +  pte_clear(&init_mm, addr, pte);

 I think you should flush tlb here.

 Thanks, I'll update it.


 +  } else {
 +  next = pmd_addr_end(addr, end);
 +
 +  pmd = pmd_offset(pud, addr);
 +  if (pmd_none(*pmd))
 +  return next;
 +
 +  *pp = pmd_page(*pmd);
 +  pmd_clear(pmd);
 +  }
 +
 +  return next;
 +}
 +
 +void __meminit
 +vmemmap_kfree(struct page *memmap, unsigned long nr_pages)
 +{
 +  unsigned long addr = (unsigned long)memmap;
 +  unsigned long end = (unsigned long)(memmap + nr_pages);
 +  unsigned long next

Re: [RFC PATCH v3 3/13] memory-hotplug : unify argument of firmware_map_add_early/hotplug

2012-07-11 Thread Yasuaki Ishimatsu
Hi Dave,

2012/07/12 0:30, Dave Hansen wrote:
 On 07/09/2012 03:25 AM, Yasuaki Ishimatsu wrote:
 @@ -642,7 +642,7 @@ int __ref add_memory(int nid, u64 start,
  }

  /* create new memmap entry */
 -firmware_map_add_hotplug(start, start + size, "System RAM");
 +firmware_map_add_hotplug(start, start + size - 1, "System RAM");
 
 I know the firmware_map_*() calls use inclusive end addresses
 internally, but do we really need to expose them?  Both of the callers
 you mentioned do:
 
   firmware_map_add_hotplug(start, start + size - 1, "System RAM");
 
 or
 
   firmware_map_add_early(entry->addr,
                          entry->addr + entry->size - 1,
                          e820_type_to_string(entry->type));
 
 So it seems a _bit_ silly to keep all of the callers doing this size-1
 thing.  I also noted that the new caller that you added does the same
 thing.  Could we just change the external calling convention to be
 exclusive?

Thank you for your comment.

Does the following patch address your comment? If it is O.K., I will separate
the patch from the series and send it as a bug fix.

---
 arch/x86/kernel/e820.c|2 +-
 drivers/firmware/memmap.c |8 
 2 files changed, 5 insertions(+), 5 deletions(-)

Index: linux-next/arch/x86/kernel/e820.c
===
--- linux-next.orig/arch/x86/kernel/e820.c  2012-07-02 09:50:23.0 
+0900
+++ linux-next/arch/x86/kernel/e820.c   2012-07-12 13:30:45.942318179 +0900
@@ -944,7 +944,7 @@
 	for (i = 0; i < e820_saved.nr_map; i++) {
 		struct e820entry *entry = &e820_saved.map[i];
 		firmware_map_add_early(entry->addr,
-			entry->addr + entry->size - 1,
+			entry->addr + entry->size,
 			e820_type_to_string(entry->type));
 	}
 }
Index: linux-next/drivers/firmware/memmap.c
===
--- linux-next.orig/drivers/firmware/memmap.c   2012-07-02 09:50:26.0 
+0900
+++ linux-next/drivers/firmware/memmap.c2012-07-12 13:40:53.823318481 
+0900
@@ -98,7 +98,7 @@
 /**
  * firmware_map_add_entry() - Does the real work to add a firmware memmap entry.
  * @start: Start of the memory range.
- * @end:   End of the memory range (inclusive).
+ * @end:   End of the memory range.
  * @type:  Type of the memory range.
  * @entry: Pre-allocated (either kmalloc() or bootmem allocator), uninitialised
  * entry.
@@ -113,7 +113,7 @@
 	BUG_ON(start > end);
 
 	entry->start = start;
-	entry->end = end;
+	entry->end = end - 1;
 	entry->type = type;
 	INIT_LIST_HEAD(&entry->list);
 	kobject_init(&entry->kobj, &memmap_ktype);
@@ -148,7 +148,7 @@
  * firmware_map_add_hotplug() - Adds a firmware mapping entry when we do
  * memory hotplug.
  * @start: Start of the memory range.
- * @end:   End of the memory range (inclusive).
+ * @end:   End of the memory range.
  * @type:  Type of the memory range.
  *
  * Adds a firmware mapping entry. This function is for memory hotplug, it is
@@ -175,7 +175,7 @@
 /**
  * firmware_map_add_early() - Adds a firmware mapping entry.
  * @start: Start of the memory range.
- * @end:   End of the memory range (inclusive).
+ * @end:   End of the memory range.
  * @type:  Type of the memory range.
  *
  * Adds a firmware mapping entry. This function uses the bootmem allocator
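
As a rough illustration of the convention change discussed here, the sketch
below takes an exclusive end from the caller and derives the inclusive end in
a single place. demo_map_entry and the demo_* helpers are hypothetical names
for illustration only; they are not the structures or functions in
drivers/firmware/memmap.c.

```c
#include <stdint.h>

/* Hypothetical record that stores an inclusive end address. */
struct demo_map_entry {
	uint64_t start;	/* first byte of the range */
	uint64_t end;	/* last byte of the range (inclusive) */
};

/*
 * Callers pass an exclusive end (start + size); the inclusive value is
 * derived here only. Assumes end > start, i.e. a non-empty range.
 */
static void demo_map_entry_init(struct demo_map_entry *e,
				uint64_t start, uint64_t end)
{
	e->start = start;
	e->end = end - 1;
}

/* Size recovered from the stored inclusive bounds. */
static uint64_t demo_map_entry_size(const struct demo_map_entry *e)
{
	return e->end - e->start + 1;
}
```

Callers then write start + size without the "- 1", and only the code that
stores the entry needs to know that the stored end is inclusive.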



Re: [RFC PATCH v3 0/13] memory-hotplug : hot-remove physical memory

2012-07-10 Thread Yasuaki Ishimatsu

Hi Christoph,

2012/07/10 0:18, Christoph Lameter wrote:


On Mon, 9 Jul 2012, Yasuaki Ishimatsu wrote:


Even if you apply these patches, you cannot remove the physical memory
completely since these patches are still under development. I want you to
cooperate to improve the physical memory hot-remove. So please review these
patches and give your comment/idea.


Could you at least give a method on how you want to do physical memory
removal?


We plan to release a system that supports dynamic hardware partitioning. It
will be able to hot-remove/add a system board which includes memory and CPUs.
But as you know, Linux does not support memory hot-remove on x86.
So I am trying to develop it.

The current plan for hot-removing a system board is to use the container driver.
Thus I define the system board in the ACPI DSDT as a container device.
Hot-adding a container device is already supported. And if the container device
has an _EJ0 ACPI method, an eject file for removing the container device is
provided as follows:

# ls -l /sys/bus/acpi/devices/ACPI0004\:01/eject
--w---. 1 root root 4096 Jul 10 18:19 /sys/bus/acpi/devices/ACPI0004:01/eject

To hot-remove the container device, I echo 1 to the file as follows:

# echo 1 > /sys/bus/acpi/devices/ACPI0004\:02/eject

Then acpi_bus_trim() is called, and it calls acpi_memory_device_remove()
to remove the memory device. But the code does not do anything.
So I developed the rest of the function.


You would have to remove all objects from the range you want to
physically remove. That is only possible under special circumstances and
with a limited set of objects. Even if you exclusively use ZONE_MOVEABLE
you still may get cases where pages are pinned for a long time.


I know it. So my memory hot-remove plan is as follows:

1. hot-add a system board
   All memory on the system board is offline.

2. online the memory as removable pages
   This function is not supported yet. It is being developed by Lai as follows:
   http://lkml.indiana.edu/hypermail/linux/kernel/1207.0/01478.html
   Once it is supported, I will be able to create movable memory.

3. hot-remove the memory via the container device's eject file

Thanks,
Yasuaki Ishimatsu



I am not sure that these patches are useful unless we know where you are
going with this. If we end up with a situation where we still cannot
remove physical memory then this patchset is not helpful.






Re: [RFC PATCH v3 0/13] memory-hotplug : hot-remove physical memory

2012-07-10 Thread Yasuaki Ishimatsu

Hi Jiang,

2012/07/11 1:50, Jiang Liu wrote:

On 07/10/2012 05:58 PM, Yasuaki Ishimatsu wrote:

Hi Christoph,

2012/07/10 0:18, Christoph Lameter wrote:


On Mon, 9 Jul 2012, Yasuaki Ishimatsu wrote:


Even if you apply these patches, you cannot remove the physical memory
completely since these patches are still under development. I want you to
cooperate to improve the physical memory hot-remove. So please review these
patches and give your comment/idea.


Could you at least give a method on how you want to do physical memory
removal?


We plan to release a dynamic hardware partitionable system. It will be
able to hot remove/add a system board which included memory and cpu.
But as you know, Linux does not support memory hot-remove on x86 box.
So I try to develop it.

Current plan to hot remove system board is to use container driver.
Thus I define the system board in ACPI DSDT table as a container device.
It have supported hot-add a container device. And if container device
has _EJ0 ACPI method, eject file to remove the container device is
prepared as follow:

# ls -l /sys/bus/acpi/devices/ACPI0004\:01/eject
--w---. 1 root root 4096 Jul 10 18:19 
/sys/bus/acpi/devices/ACPI0004:01/eject

When I hot-remove the container device, I echo 1 to the file as follow:

# echo 1 > /sys/bus/acpi/devices/ACPI0004\:02/eject

Then acpi_bus_trim() is called. And it calls acpi_memory_device_remove()
for removing memory device. But the code does not do nothing.
So I developed the continuation of the function.


You would have to remove all objects from the range you want to
physically remove. That is only possible under special circumstances and
with a limited set of objects. Even if you exclusively use ZONE_MOVEABLE
you still may get cases where pages are pinned for a long time.


I know it. So my memory hot-remove plan is as follows:

1. hot-added a system board
All memory which included the system board is offline.

2. online the memory as removable page
The function has not supported yet. It is being developed by Lai as follow:
http://lkml.indiana.edu/hypermail/linux/kernel/1207.0/01478.html
If it is supported, I will be able to create movable memory.

3. hot-remove the memory by container device's eject file

We have implemented a prototype of physical node (mem + CPU + IOH) hotplug
for Itanium and are now porting it to x86. But with the current solution, the
memory hotplug functionality may cause a 10-20% performance decrease because we
concentrate all DMA/Normal memory on the first NUMA node, and all other NUMA
nodes host only ZONE_MOVABLE. We are working on a solution to minimize the
performance drop.


Thank you for your interesting response.

I have a question. How do you make all the other NUMA nodes ZONE_MOVABLE?
To use ZONE_MOVABLE, we need to use boot options like kernelcore or movablecore.
But that is not enough, since the requested amount is spread evenly across
all nodes in the system. So I think we have no way to make all the other NUMA
nodes ZONE_MOVABLE.

Thanks,
Yasuaki Ishimatsu





Thanks,
Yasuaki Ishimatsu



I am not sure that these patches are useful unless we know where you are
going with this. If we end up with a situation where we still cannot
remove physical memory then this patchset is not helpful.













Re: [RFC PATCH v3 0/13] memory-hotplug : hot-remove physical memory

2012-07-10 Thread Yasuaki Ishimatsu

Hi Jiang,

2012/07/11 9:21, Jiang Liu wrote:

On 07/11/2012 08:09 AM, Yasuaki Ishimatsu wrote:

Hi Jiang,

2012/07/11 1:50, Jiang Liu wrote:

On 07/10/2012 05:58 PM, Yasuaki Ishimatsu wrote:

Hi Christoph,

2012/07/10 0:18, Christoph Lameter wrote:


On Mon, 9 Jul 2012, Yasuaki Ishimatsu wrote:


Even if you apply these patches, you cannot remove the physical memory
completely since these patches are still under development. I want you to
cooperate to improve the physical memory hot-remove. So please review these
patches and give your comment/idea.


Could you at least give a method on how you want to do physical memory
removal?


We plan to release a dynamic hardware partitionable system. It will be
able to hot remove/add a system board which included memory and cpu.
But as you know, Linux does not support memory hot-remove on x86 box.
So I try to develop it.

Current plan to hot remove system board is to use container driver.
Thus I define the system board in ACPI DSDT table as a container device.
It have supported hot-add a container device. And if container device
has _EJ0 ACPI method, eject file to remove the container device is
prepared as follow:

# ls -l /sys/bus/acpi/devices/ACPI0004\:01/eject
--w---. 1 root root 4096 Jul 10 18:19 
/sys/bus/acpi/devices/ACPI0004:01/eject

When I hot-remove the container device, I echo 1 to the file as follow:

# echo 1 > /sys/bus/acpi/devices/ACPI0004\:02/eject

Then acpi_bus_trim() is called. And it calls acpi_memory_device_remove()
for removing memory device. But the code does not do nothing.
So I developed the continuation of the function.


You would have to remove all objects from the range you want to
physically remove. That is only possible under special circumstances and
with a limited set of objects. Even if you exclusively use ZONE_MOVEABLE
you still may get cases where pages are pinned for a long time.


I know it. So my memory hot-remove plan is as follows:

1. hot-added a system board
 All memory which included the system board is offline.

2. online the memory as removable page
 The function has not supported yet. It is being developed by Lai as follow:
 http://lkml.indiana.edu/hypermail/linux/kernel/1207.0/01478.html
 If it is supported, I will be able to create movable memory.

3. hot-remove the memory by container device's eject file

We have implemented a prototype of physical node (mem + CPU + IOH) hotplug
for Itanium and are now porting it to x86. But with the current solution, the
memory hotplug functionality may cause a 10-20% performance decrease because we
concentrate all DMA/Normal memory on the first NUMA node, and all other NUMA
nodes host only ZONE_MOVABLE. We are working on a solution to minimize the
performance drop.


Thank you for your interesting response.

I have a question. How do you make all the other NUMA nodes ZONE_MOVABLE?
To use ZONE_MOVABLE, we need to use boot options like kernelcore or movablecore.
But that is not enough, since the requested amount is spread evenly across
all nodes in the system. So I think we have no way to make all the other NUMA
nodes ZONE_MOVABLE.

We have modified the ZONE_MOVABLE spreading and bootmem allocation. If the
kernelcore or movablecore kernel parameters are present, we follow the current
behavior. If those parameters are absent and the platform supports physical
hotplug, we will concentrate DMA/NORMAL memory on specific nodes.


That's interesting. I would like to know more details, if you do not mind.
The current kernel doesn't behave that way, does it? So I think you have some
patches that change the behavior. Will you merge these patches into the
community kernel?

Thanks,
Yasuaki Ishimatsu





Thanks,
Yasuaki Ishimatsu





Thanks,
Yasuaki Ishimatsu



I am not sure that these patches are useful unless we know where you are
going with this. If we end up with a situation where we still cannot
remove physical memory then this patchset is not helpful.




















Re: [RFC PATCH v3 11/13] memory-hotplug : free memmap of sparse-vmemmap

2012-07-10 Thread Yasuaki Ishimatsu
2012/07/11 14:06, Wen Congyang wrote:
Hi Wen,

 At 07/09/2012 06:33 PM, Yasuaki Ishimatsu Wrote:
 I don't think that all pages of the virtual mapping in removed memory can be
 freed, since pages whose type is MIX_SECTION_INFO are difficult to free.
 So, as a first step, the patch only frees pages whose type is SECTION_INFO.

 CC: David Rientjes rient...@google.com
 CC: Jiang Liu liu...@gmail.com
 CC: Len Brown len.br...@intel.com
 CC: Benjamin Herrenschmidt b...@kernel.crashing.org
 CC: Paul Mackerras pau...@samba.org
 CC: Christoph Lameter c...@linux.com
 Cc: Minchan Kim minchan@gmail.com
 CC: Andrew Morton a...@linux-foundation.org
 CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
 CC: Wen Congyang we...@cn.fujitsu.com
 Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

 ---
   arch/x86/mm/init_64.c |   91 
 ++
   include/linux/mm.h|2 +
   mm/memory_hotplug.c   |5 ++
   mm/sparse.c   |5 +-
   4 files changed, 101 insertions(+), 2 deletions(-)

 Index: linux-3.5-rc4/include/linux/mm.h
 ===
 --- linux-3.5-rc4.orig/include/linux/mm.h2012-07-03 14:22:18.530011567 
 +0900
 +++ linux-3.5-rc4/include/linux/mm.h 2012-07-03 14:22:20.83872 +0900
 @@ -1588,6 +1588,8 @@ int vmemmap_populate(struct page *start_
   void vmemmap_populate_print_last(void);
   void register_page_bootmem_memmap(unsigned long section_nr, struct page 
 *map,
unsigned long size);
 +void vmemmap_kfree(struct page *memmpa, unsigned long nr_pages);
 +void vmemmap_free_bootmem(struct page *memmpa, unsigned long nr_pages);

   enum mf_flags {
  MF_COUNT_INCREASED = 1  0,
 Index: linux-3.5-rc4/mm/sparse.c
 ===
 --- linux-3.5-rc4.orig/mm/sparse.c   2012-07-03 14:21:45.071429805 +0900
 +++ linux-3.5-rc4/mm/sparse.c2012-07-03 14:22:21.000983767 +0900
 @@ -614,12 +614,13 @@ static inline struct page *kmalloc_secti
  /* This will make the necessary allocations eventually. */
  return sparse_mem_map_populate(pnum, nid);
   }
 -static void __kfree_section_memmap(struct page *memmap, unsigned long 
 nr_pages)
 +static void __kfree_section_memmap(struct page *page, unsigned long 
 nr_pages)
   {
 -return; /* XXX: Not implemented yet */
 +vmemmap_kfree(page, nr_pages);
 
 Hmm, I think you try to free the memory allocated in kmalloc_section_memmap().

Yes.

 
   }
   static void free_map_bootmem(struct page *page, unsigned long nr_pages)
   {
 +vmemmap_free_bootmem(page, nr_pages);
   }
 
 Hmm, which function is the memory you try to free allocated in?

The function tries to free memory allocated from bootmem. The memory has
been registered by get_page_bootmem(), so we can free it with
put_page_bootmem().

 
   #else
   static struct page *__kmalloc_section_memmap(unsigned long nr_pages)
 Index: linux-3.5-rc4/arch/x86/mm/init_64.c
 ===
 --- linux-3.5-rc4.orig/arch/x86/mm/init_64.c 2012-07-03 14:22:18.538011465 
 +0900
 +++ linux-3.5-rc4/arch/x86/mm/init_64.c  2012-07-03 14:22:21.007983103 
 +0900
 @@ -978,6 +978,97 @@ vmemmap_populate(struct page *start_page
  return 0;
   }

 +unsigned long find_and_clear_pte_page(unsigned long addr, unsigned long end,
 +  struct page **pp)
 +{
 +pgd_t *pgd;
 +pud_t *pud;
 +pmd_t *pmd;
 +pte_t *pte;
 +unsigned long next;
 +
 +*pp = NULL;
 +
 +pgd = pgd_offset_k(addr);
 +if (pgd_none(*pgd))
 +return (addr + PAGE_SIZE) & PAGE_MASK;
 
 Hmm, why not goto next pgd?

Do you mean return (addr + PGDIR_SIZE) & PGDIR_MASK?

 
 +
 +pud = pud_offset(pgd, addr);
 +if (pud_none(*pud))
 +return (addr + PAGE_SIZE) & PAGE_MASK;
 +
 +if (!cpu_has_pse) {
 +next = (addr + PAGE_SIZE) & PAGE_MASK;
 +pmd = pmd_offset(pud, addr);
 +if (pmd_none(*pmd))
 +return next;
 +
 +pte = pte_offset_kernel(pmd, addr);
 +if (pte_none(*pte))
 +return next;
 +
 +*pp = pte_page(*pte);
 +pte_clear(&init_mm, addr, pte);
 
 I think you should flush tlb here.

Thanks, I'll update it.

 
 +} else {
 +next = pmd_addr_end(addr, end);
 +
 +pmd = pmd_offset(pud, addr);
 +if (pmd_none(*pmd))
 +return next;
 +
 +*pp = pmd_page(*pmd);
 +pmd_clear(pmd);
 +}
 +
 +return next;
 +}
 +
 +void __meminit
 +vmemmap_kfree(struct page *memmap, unsigned long nr_pages)
 +{
 +unsigned long addr = (unsigned long)memmap;
 +unsigned long end = (unsigned long)(memmap + nr_pages);
 +unsigned long next;
 +unsigned int order;
 +struct page *page;
 +
 +for (; addr < end; addr = next) {
 +page = NULL

Re: [RFC PATCH v2 4/13] memory-hotplug : remove /sys/firmware/memmap/X sysfs

2012-07-09 Thread Yasuaki Ishimatsu
Hi Wen,

2012/07/06 18:20, Wen Congyang wrote:
 At 07/06/2012 04:27 PM, Yasuaki Ishimatsu Wrote:
 Hi Wen,

 2012/07/04 19:01, Wen Congyang wrote:
 At 07/04/2012 01:52 PM, Yasuaki Ishimatsu Wrote:
 Hi Wen,

 2012/07/04 14:08, Wen Congyang wrote:
 At 07/04/2012 12:45 PM, Yasuaki Ishimatsu Wrote:
 Hi Wen,

 2012/07/03 15:35, Wen Congyang wrote:
 At 07/03/2012 01:56 PM, Yasuaki Ishimatsu Wrote:
 When (hot)adding memory into system, /sys/firmware/memmap/X/{end, 
 start, type}
 sysfs files are created. But there is no code to remove these files. 
 The patch
 implements the function to remove them.

 Note : The code does not free firmware_map_entry since there is no way 
 to free
memory which is allocated by bootmem.

 CC: David Rientjes rient...@google.com
 CC: Jiang Liu liu...@gmail.com
 CC: Len Brown len.br...@intel.com
 CC: Benjamin Herrenschmidt b...@kernel.crashing.org
 CC: Paul Mackerras pau...@samba.org
 CC: Christoph Lameter c...@linux.com
 Cc: Minchan Kim minchan@gmail.com
 CC: Andrew Morton a...@linux-foundation.org
 CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
 Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

 ---
  drivers/firmware/memmap.c|   70 
 +++
  include/linux/firmware-map.h |6 +++
  mm/memory_hotplug.c  |6 +++
  3 files changed, 81 insertions(+), 1 deletion(-)

 Index: linux-3.5-rc4/mm/memory_hotplug.c
 ===
 --- linux-3.5-rc4.orig/mm/memory_hotplug.c 2012-07-03 
 14:22:00.190240794 +0900
 +++ linux-3.5-rc4/mm/memory_hotplug.c  2012-07-03 14:22:03.549198802 
 +0900
 @@ -661,7 +661,11 @@ EXPORT_SYMBOL_GPL(add_memory);

  int remove_memory(int nid, u64 start, u64 size)
  {
 -  return -EBUSY;
 +  lock_memory_hotplug();
 +  /* remove memmap entry */
 +  firmware_map_remove(start, start + size - 1, "System RAM");
 +  unlock_memory_hotplug();
 +  return 0;

  }
  EXPORT_SYMBOL_GPL(remove_memory);
 Index: linux-3.5-rc4/include/linux/firmware-map.h
 ===
 --- linux-3.5-rc4.orig/include/linux/firmware-map.h2012-07-03 
 14:21:45.766421116 +0900
 +++ linux-3.5-rc4/include/linux/firmware-map.h 2012-07-03 
 14:22:03.550198789 +0900
 @@ -25,6 +25,7 @@

  int firmware_map_add_early(u64 start, u64 end, const char *type);
  int firmware_map_add_hotplug(u64 start, u64 end, const char 
 *type);
 +int firmware_map_remove(u64 start, u64 end, const char *type);

  #else /* CONFIG_FIRMWARE_MEMMAP */

 @@ -38,6 +39,11 @@ static inline int firmware_map_add_hotpl
return 0;
  }

 +static inline int firmware_map_remove(u64 start, u64 end, const char 
 *type)
 +{
 +  return 0;
 +}
 +
  #endif /* CONFIG_FIRMWARE_MEMMAP */

  #endif /* _LINUX_FIRMWARE_MAP_H */
 Index: linux-3.5-rc4/drivers/firmware/memmap.c
 ===
 --- linux-3.5-rc4.orig/drivers/firmware/memmap.c   2012-07-03 
 14:21:45.761421180 +0900
 +++ linux-3.5-rc4/drivers/firmware/memmap.c2012-07-03 
 14:22:03.569198549 +0900
 @@ -79,7 +79,16 @@ static const struct sysfs_ops memmap_att
.show = memmap_attr_show,
  };

 +static void release_firmware_map_entry(struct kobject *kobj)
 +{
 +  /*
 +   * FIXME : There is no idea.
 +   * How to free the entry which allocated bootmem?
 +   */

 I find a function free_bootmem(), but I am not sure whether it can work 
 here.

 It cannot work here.

 Another problem: how to check whether the entry uses bootmem?

 When firmware_map_entry is allocated by kzalloc(), the page has PG_slab.

 This is not true. In my test, I find the page does not have PG_slab 
 sometimes.

 I think that it depends on the allocated size. firmware_map_entry size is
 smaller than PAGE_SIZE. So the page has PG_Slab.

 In my test, I add printk in the function firmware_map_add_hotplug() to 
 display
 page's flags. And sometimes the page is not allocated by slab(I use 
 PageSlab()
 to verify it).

 How did you check it? Could you send your debug patch?
 
 When the memory is not allocated from slab, the flags is 0x108000.

Thank you for sending the patch.
I think a page that does not have PageSlab set is a compound page. So we can
check whether the entry was allocated from bootmem as follows:

static void release_firmware_map_entry(struct kobject *kobj)
{
	struct firmware_map_entry *entry = to_memmap_entry(kobj);
	struct page *head_page;

	head_page = virt_to_head_page(entry);
	if (PageSlab(head_page))
		kfree(entry);
	/* otherwise the entry was allocated from bootmem and cannot be freed */
}

Thanks,
Yasuaki Ishimatsu

 
  From 8dd51368d6c03edf7edc89cab17441e3741c39c7 Mon Sep 17 00:00:00 2001
 From: Wen Congyang we...@cn.fujitsu.com
 Date: Wed, 4 Jul 2012 16:05:26 +0800
 Subject: [PATCH] debug

[RFC PATCH v3 0/13] memory-hotplug : hot-remove physical memory

2012-07-09 Thread Yasuaki Ishimatsu
This patch series aims to support physical memory hot-remove.

  [RFC PATCH v3 1/13] memory-hotplug : rename remove_memory to offline_memory
  [RFC PATCH v3 2/13] memory-hotplug : add physical memory hotplug code to acpi_memory_device_remove
  [RFC PATCH v3 3/13] memory-hotplug : unify argument of firmware_map_add_early/hotplug
  [RFC PATCH v3 4/13] memory-hotplug : remove /sys/firmware/memmap/X sysfs
  [RFC PATCH v3 5/13] memory-hotplug : does not release memory region in PAGES_PER_SECTION chunks
  [RFC PATCH v3 6/13] memory-hotplug : add memory_block_release
  [RFC PATCH v3 7/13] memory-hotplug : remove_memory calls __remove_pages
  [RFC PATCH v3 8/13] memory-hotplug : check page type in get_page_bootmem
  [RFC PATCH v3 9/13] memory-hotplug : move register_page_bootmem_info_node and put_page_bootmem for sparse-vmemmap
  [RFC PATCH v3 10/13] memory-hotplug : implement register_page_bootmem_info_section of sparse-vmemmap
  [RFC PATCH v3 11/13] memory-hotplug : free memmap of sparse-vmemmap
  [RFC PATCH v3 12/13] memory-hotplug : add node_device_release
  [RFC PATCH v3 13/13] memory-hotplug : remove sysfs file of node

Even with these patches applied, you cannot remove physical memory
completely, since the patches are still under development. I would like your
help in improving physical memory hot-remove, so please review these patches
and send me your comments and ideas.

The patches can free/remove the following things:

  - acpi_memory_info                          : [RFC PATCH 2/13]
  - /sys/firmware/memmap/X/{end, start, type} : [RFC PATCH 4/13]
  - iomem_resource                            : [RFC PATCH 5/13]
  - mem_section and related sysfs files       : [RFC PATCH 6-11/13]
  - node and related sysfs files              : [RFC PATCH 12-13/13]

The patches cannot handle the following yet:

  - page table of removed memory

If you find any functionality missing for physical memory hot-remove, please
let me know.

change log of v3:
 * rebase to 3.5.0-rc6

 [RFC PATCH v2 2/13]
   * remove extra kobject_put()

   * Wen commented on this patch that acpi_memory_device_remove() should
     ignore the return value of remove_memory(), since the caller does not
     care about it. I did not change this, because I think the caller should
     care about the return value. I am trying to fix it as follows:

 https://lkml.org/lkml/2012/7/5/624

 [RFC PATCH v2 4/13]
   * remove a firmware_map_entry allocated by kzalloc()

change log of v2:
 [RFC PATCH v2 2/13]
   * check whether memory block is offline or not before calling 
offline_memory()
   * check whether section is valid or not in is_memblk_offline()
   * call kobject_put() for each memory_block in is_memblk_offline()

 [RFC PATCH v2 3/13]
   * unify the end argument of firmware_map_add_early/hotplug

 [RFC PATCH v2 4/13]
   * add release_firmware_map_entry() for freeing firmware_map_entry

 [RFC PATCH v2 6/13]
  * add release_memory_block() for freeing memory_block

 [RFC PATCH v2 11/13]
  * fix wrong arguments of free_pages()

---
 arch/powerpc/platforms/pseries/hotplug-memory.c |   16 +-
 arch/x86/mm/init_64.c                           |  144
 drivers/acpi/acpi_memhotplug.c                  |   28
 drivers/base/memory.c                           |   54 -
 drivers/base/node.c                             |    7 +
 drivers/firmware/memmap.c                       |   78 -
 include/linux/firmware-map.h                    |    6 +
 include/linux/memory.h                          |    5
 include/linux/memory_hotplug.h                  |   17 --
 include/linux/mm.h                              |    5
 mm/memory_hotplug.c                             |   98
 mm/sparse.c                                     |    5
 12 files changed, 414 insertions(+), 49 deletions(-)

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[RFC PATCH v3 1/13] memory-hotplug : rename remove_memory to offline_memory

2012-07-09 Thread Yasuaki Ishimatsu
remove_memory() does not remove memory; it only offlines it. This patch
renames it to offline_memory().

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
CC: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

---
 drivers/acpi/acpi_memhotplug.c |2 +-
 drivers/base/memory.c  |4 ++--
 include/linux/memory_hotplug.h |2 +-
 mm/memory_hotplug.c|6 +++---
 4 files changed, 7 insertions(+), 7 deletions(-)

Index: linux-3.5-rc4/drivers/acpi/acpi_memhotplug.c
===
--- linux-3.5-rc4.orig/drivers/acpi/acpi_memhotplug.c   2012-07-03 
14:21:46.102416917 +0900
+++ linux-3.5-rc4/drivers/acpi/acpi_memhotplug.c2012-07-03 
14:21:49.458374960 +0900
@@ -318,7 +318,7 @@ static int acpi_memory_disable_device(st
 */
list_for_each_entry_safe(info, n, &mem_device->res_list, list) {
if (info->enabled) {
-   result = remove_memory(info->start_addr, info->length);
+   result = offline_memory(info->start_addr, info->length);
if (result)
return result;
}
Index: linux-3.5-rc4/drivers/base/memory.c
===
--- linux-3.5-rc4.orig/drivers/base/memory.c2012-07-03 14:21:46.095417003 
+0900
+++ linux-3.5-rc4/drivers/base/memory.c 2012-07-03 14:21:49.459374948 +0900
@@ -266,8 +266,8 @@ memory_block_action(unsigned long phys_i
break;
case MEM_OFFLINE:
start_paddr = page_to_pfn(first_page) << PAGE_SHIFT;
-   ret = remove_memory(start_paddr,
-   nr_pages << PAGE_SHIFT);
+   ret = offline_memory(start_paddr,
+nr_pages << PAGE_SHIFT);
break;
default:
WARN(1, KERN_WARNING "%s(%ld, %ld) unknown action: "
Index: linux-3.5-rc4/mm/memory_hotplug.c
===
--- linux-3.5-rc4.orig/mm/memory_hotplug.c  2012-07-03 14:21:46.102416917 
+0900
+++ linux-3.5-rc4/mm/memory_hotplug.c   2012-07-03 14:21:49.466374860 +0900
@@ -990,7 +990,7 @@ out:
return ret;
 }

-int remove_memory(u64 start, u64 size)
+int offline_memory(u64 start, u64 size)
 {
unsigned long start_pfn, end_pfn;

@@ -999,9 +999,9 @@ int remove_memory(u64 start, u64 size)
return offline_pages(start_pfn, end_pfn, 120 * HZ);
 }
 #else
-int remove_memory(u64 start, u64 size)
+int offline_memory(u64 start, u64 size)
 {
return -EINVAL;
 }
 #endif /* CONFIG_MEMORY_HOTREMOVE */
-EXPORT_SYMBOL_GPL(remove_memory);
+EXPORT_SYMBOL_GPL(offline_memory);
Index: linux-3.5-rc4/include/linux/memory_hotplug.h
===
--- linux-3.5-rc4.orig/include/linux/memory_hotplug.h   2012-07-03 
14:21:46.102416917 +0900
+++ linux-3.5-rc4/include/linux/memory_hotplug.h2012-07-03 
14:21:49.471374796 +0900
@@ -233,7 +233,7 @@ static inline int is_mem_section_removab
 extern int mem_online_node(int nid);
 extern int add_memory(int nid, u64 start, u64 size);
 extern int arch_add_memory(int nid, u64 start, u64 size);
-extern int remove_memory(u64 start, u64 size);
+extern int offline_memory(u64 start, u64 size);
 extern int sparse_add_one_section(struct zone *zone, unsigned long start_pfn,
int nr_pages);
 extern void sparse_remove_one_section(struct zone *zone, struct mem_section 
*ms);



[RFC PATCH v3 2/13] memory-hotplug : add physical memory hotplug code to acpi_memory_device_remove

2012-07-09 Thread Yasuaki Ishimatsu
acpi_memory_device_remove() is meant to remove physical memory, but
currently it only frees acpi_memory_device.

This patch adds the following steps to acpi_memory_device_remove():
  - offline memory
  - remove physical memory (for now this only returns -EBUSY)
  - free acpi_memory_device

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
CC: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

---
 drivers/acpi/acpi_memhotplug.c |   26 +-
 drivers/base/memory.c  |   39 +++
 include/linux/memory.h |5 +
 include/linux/memory_hotplug.h |1 +
 mm/memory_hotplug.c|8 
 5 files changed, 78 insertions(+), 1 deletion(-)

Index: linux-3.5-rc6/drivers/acpi/acpi_memhotplug.c
===
--- linux-3.5-rc6.orig/drivers/acpi/acpi_memhotplug.c   2012-07-09 
18:08:29.946888653 +0900
+++ linux-3.5-rc6/drivers/acpi/acpi_memhotplug.c2012-07-09 
18:08:43.470719531 +0900
@@ -29,6 +29,7 @@
 #include <linux/module.h>
 #include <linux/init.h>
 #include <linux/types.h>
+#include <linux/memory.h>
 #include <linux/memory_hotplug.h>
 #include <linux/slab.h>
 #include <acpi/acpi_drivers.h>
@@ -452,12 +453,35 @@ static int acpi_memory_device_add(struct
 static int acpi_memory_device_remove(struct acpi_device *device, int type)
 {
struct acpi_memory_device *mem_device = NULL;
-
+   struct acpi_memory_info *info, *tmp;
+   int result;
+   int node;

if (!device || !acpi_driver_data(device))
return -EINVAL;

mem_device = acpi_driver_data(device);
+
+   node = acpi_get_node(mem_device->device->handle);
+
+   list_for_each_entry_safe(info, tmp, &mem_device->res_list, list) {
+   if (!info->enabled)
+   continue;
+
+   if (!is_memblk_offline(info->start_addr, info->length)) {
+   result = offline_memory(info->start_addr, info->length);
+   if (result)
+   return result;
+   }
+
+   result = remove_memory(node, info->start_addr, info->length);
+   if (result)
+   return result;
+
+   list_del(&info->list);
+   kfree(info);
+   }
+
kfree(mem_device);

return 0;
Index: linux-3.5-rc6/include/linux/memory_hotplug.h
===
--- linux-3.5-rc6.orig/include/linux/memory_hotplug.h   2012-07-09 
18:08:29.955888542 +0900
+++ linux-3.5-rc6/include/linux/memory_hotplug.h2012-07-09 
18:08:43.471719518 +0900
@@ -233,6 +233,7 @@ static inline int is_mem_section_removab
 extern int mem_online_node(int nid);
 extern int add_memory(int nid, u64 start, u64 size);
 extern int arch_add_memory(int nid, u64 start, u64 size);
+extern int remove_memory(int nid, u64 start, u64 size);
 extern int offline_memory(u64 start, u64 size);
 extern int sparse_add_one_section(struct zone *zone, unsigned long start_pfn,
int nr_pages);
Index: linux-3.5-rc6/mm/memory_hotplug.c
===
--- linux-3.5-rc6.orig/mm/memory_hotplug.c  2012-07-09 18:08:29.953888567 
+0900
+++ linux-3.5-rc6/mm/memory_hotplug.c   2012-07-09 18:08:43.476719455 +0900
@@ -659,6 +659,14 @@ out:
 }
 EXPORT_SYMBOL_GPL(add_memory);

+int remove_memory(int nid, u64 start, u64 size)
+{
+   return -EBUSY;
+
+}
+EXPORT_SYMBOL_GPL(remove_memory);
+
+
 #ifdef CONFIG_MEMORY_HOTREMOVE
 /*
  * A free page on the buddy free lists (not the per-cpu lists) has PageBuddy
Index: linux-3.5-rc6/drivers/base/memory.c
===
--- linux-3.5-rc6.orig/drivers/base/memory.c2012-07-09 18:08:29.947888640 
+0900
+++ linux-3.5-rc6/drivers/base/memory.c 2012-07-09 18:10:54.880076739 +0900
@@ -70,6 +70,45 @@ void unregister_memory_isolate_notifier(
 }
 EXPORT_SYMBOL(unregister_memory_isolate_notifier);

+bool is_memblk_offline(unsigned long start, unsigned long size)
+{
+   struct memory_block *mem = NULL;
+   struct mem_section *section;
+   unsigned long start_pfn, end_pfn;
+   unsigned long pfn, section_nr;
+
+   start_pfn = PFN_DOWN(start);
+   end_pfn = start_pfn + PFN_DOWN(size);
+
+   for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
+   section_nr = pfn_to_section_nr(pfn);
+   if (!present_section_nr(section_nr

[RFC PATCH v3 3/13] memory-hotplug : unify argument of firmware_map_add_early/hotplug

2012-07-09 Thread Yasuaki Ishimatsu
There are two ways to create /sys/firmware/memmap/X sysfs:

  - firmware_map_add_early
    When the system starts, it is called from e820_reserve_resources()
  - firmware_map_add_hotplug
    When the memory is hot plugged, it is called from add_memory()

But the two callers pass different values for the end argument:

  - end argument of firmware_map_add_early()   : start + size - 1
  - end argument of firmware_map_add_hotplug() : start + size

The patch unifies them to start + size - 1.
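
The effect of the mismatch can be sketched in userspace C. This is only an
illustration under stated assumptions: struct map_entry, inclusive_end() and
find_entry() are hypothetical stand-ins, not the kernel's firmware_map code.
An entry recorded with an exclusive end (start + size) is not found by an
exact-match lookup that uses the inclusive end (start + size - 1):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a firmware memmap entry. */
struct map_entry {
	uint64_t start;
	uint64_t end;		/* inclusive: address of the last byte */
	const char *type;
};

/* Inclusive end of a range of `size` bytes beginning at `start`. */
static uint64_t inclusive_end(uint64_t start, uint64_t size)
{
	assert(size > 0);	/* size 0 would wrap below start */
	return start + size - 1;
}

/* Exact-match lookup on (start, end, type); returns NULL if absent. */
static const struct map_entry *
find_entry(const struct map_entry *tbl, size_t n,
	   uint64_t start, uint64_t end, const char *type)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (tbl[i].start == start && tbl[i].end == end &&
		    !strcmp(tbl[i].type, type))
			return &tbl[i];
	return NULL;
}
```

With a single convention for the end value, a later removal can locate the
entry by the same (start, end, type) triple that was used when it was added.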

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
CC: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

---
 mm/memory_hotplug.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-3.5-rc6/mm/memory_hotplug.c
===
--- linux-3.5-rc6.orig/mm/memory_hotplug.c  2012-07-09 18:08:43.476719455 
+0900
+++ linux-3.5-rc6/mm/memory_hotplug.c   2012-07-09 18:13:57.664791810 +0900
@@ -642,7 +642,7 @@ int __ref add_memory(int nid, u64 start,
}

/* create new memmap entry */
-   firmware_map_add_hotplug(start, start + size, "System RAM");
+   firmware_map_add_hotplug(start, start + size - 1, "System RAM");

goto out;




[RFC PATCH v3 4/13] memory-hotplug : remove /sys/firmware/memmap/X sysfs

2012-07-09 Thread Yasuaki Ishimatsu
When memory is (hot)added to the system, the
/sys/firmware/memmap/X/{end, start, type} sysfs files are created, but there
is no code to remove them. This patch implements a function to remove them.

Note : The code does not free a firmware_map_entry that was allocated from
   bootmem, since there is no way to free memory allocated by bootmem.

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
CC: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

---
 drivers/firmware/memmap.c|   78 ++-
 include/linux/firmware-map.h |6 +++
 mm/memory_hotplug.c  |6 ++-
 3 files changed, 88 insertions(+), 2 deletions(-)

Index: linux-3.5-rc6/mm/memory_hotplug.c
===
--- linux-3.5-rc6.orig/mm/memory_hotplug.c  2012-07-09 18:23:13.323844923 
+0900
+++ linux-3.5-rc6/mm/memory_hotplug.c   2012-07-09 18:23:19.522767424 +0900
@@ -661,7 +661,11 @@ EXPORT_SYMBOL_GPL(add_memory);

 int remove_memory(int nid, u64 start, u64 size)
 {
-   return -EBUSY;
+   lock_memory_hotplug();
+   /* remove memmap entry */
+   firmware_map_remove(start, start + size - 1, "System RAM");
+   unlock_memory_hotplug();
+   return 0;

 }
 EXPORT_SYMBOL_GPL(remove_memory);
Index: linux-3.5-rc6/include/linux/firmware-map.h
===
--- linux-3.5-rc6.orig/include/linux/firmware-map.h 2012-07-09 
18:23:09.532892314 +0900
+++ linux-3.5-rc6/include/linux/firmware-map.h  2012-07-09 18:23:19.523767412 
+0900
@@ -25,6 +25,7 @@

 int firmware_map_add_early(u64 start, u64 end, const char *type);
 int firmware_map_add_hotplug(u64 start, u64 end, const char *type);
+int firmware_map_remove(u64 start, u64 end, const char *type);

 #else /* CONFIG_FIRMWARE_MEMMAP */

@@ -38,6 +39,11 @@ static inline int firmware_map_add_hotpl
return 0;
 }

+static inline int firmware_map_remove(u64 start, u64 end, const char *type)
+{
+   return 0;
+}
+
 #endif /* CONFIG_FIRMWARE_MEMMAP */

 #endif /* _LINUX_FIRMWARE_MAP_H */
Index: linux-3.5-rc6/drivers/firmware/memmap.c
===
--- linux-3.5-rc6.orig/drivers/firmware/memmap.c2012-07-09 
18:23:09.532892314 +0900
+++ linux-3.5-rc6/drivers/firmware/memmap.c 2012-07-09 18:25:46.371931554 
+0900
@@ -21,6 +21,7 @@
 #include <linux/types.h>
 #include <linux/bootmem.h>
 #include <linux/slab.h>
+#include <linux/mm.h>

 /*
  * Data types ------------------------------------------------------------------
@@ -79,7 +80,22 @@ static const struct sysfs_ops memmap_att
.show = memmap_attr_show,
 };

+#define to_memmap_entry(obj) container_of(obj, struct firmware_map_entry, kobj)
+
+static void release_firmware_map_entry(struct kobject *kobj)
+{
+   struct firmware_map_entry *entry = to_memmap_entry(kobj);
+   struct page *head_page;
+
+   head_page = virt_to_head_page(entry);
+   if (PageSlab(head_page))
+   kfree(entry);
+
+   /* There is no way to free memory allocated from bootmem */
+}
+
 static struct kobj_type memmap_ktype = {
+   .release= release_firmware_map_entry,
.sysfs_ops  = &memmap_attr_ops,
.default_attrs  = def_attrs,
 };
@@ -123,6 +139,16 @@ static int firmware_map_add_entry(u64 st
return 0;
 }

+/**
+ * firmware_map_remove_entry() - Does the real work to remove a firmware
+ * memmap entry.
+ * @entry: removed entry.
+ **/
+static inline void firmware_map_remove_entry(struct firmware_map_entry *entry)
+{
+   list_del(&entry->list);
+}
+
 /*
  * Add memmap entry on sysfs
  */
@@ -144,6 +170,31 @@ static int add_sysfs_fw_map_entry(struct
return 0;
 }

+/*
+ * Remove memmap entry on sysfs
+ */
+static inline void remove_sysfs_fw_map_entry(struct firmware_map_entry *entry)
+{
+   kobject_put(&entry->kobj);
+}
+
+/*
+ * Search memmap entry
+ */
+
+struct firmware_map_entry * __meminit
+find_firmware_map_entry(u64 start, u64 end, const char *type)
+{
+   struct firmware_map_entry *entry;
+
+   list_for_each_entry(entry, &map_entries, list)
+   if ((entry->start == start) && (entry->end == end) &&
+   (!strcmp(entry->type, type)))
+   return entry;
+
+   return NULL;
+}
+
 /**
  * firmware_map_add_hotplug() - Adds a firmware mapping entry when we do
  * memory hotplug.
@@ -196,6 +247,32 @@ int __init firmware_map_add_early(u64 st
return firmware_map_add_entry(start, end, type, entry);
 }

+/**
+ * firmware_map_remove() - remove a firmware mapping entry
+ * @start: Start

[RFC PATCH v3 5/13] memory-hotplug : does not release memory region in PAGES_PER_SECTION chunks

2012-07-09 Thread Yasuaki Ishimatsu
Since commit de7f0cba96786c, release_mem_region() has been called in
PAGES_PER_SECTION chunks, because add_memory() calls
register_memory_resource() in PAGES_PER_SECTION chunks. But that behavior
depends on the firmware. If the CRS entries in the ACPI DSDT table are
written in PAGES_PER_SECTION chunks, register_memory_resource() is called in
PAGES_PER_SECTION chunks. But if the CRS entries are written per DIMM,
register_memory_resource() is called per DIMM. So release_mem_region()
should not be called in PAGES_PER_SECTION chunks. This patch fixes that.
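
The granularity problem can be sketched in userspace C. This is a simplified
model, not kernel code: struct region, region_register() and region_release()
are hypothetical names, and the sketch assumes that a release only succeeds
when it matches a registered region exactly. Under that assumption, a region
registered once per DIMM cannot be released section by section:

```c
#include <assert.h>
#include <stdint.h>

#define MAX_REGIONS 8

/* Hypothetical stand-in for a registered memory resource. */
struct region {
	uint64_t start;
	uint64_t size;
	int used;
};

static struct region regions[MAX_REGIONS];

/* Register [start, start + size) as one region; 0 on success, -1 if full. */
static int region_register(uint64_t start, uint64_t size)
{
	int i;

	assert(size > 0);
	for (i = 0; i < MAX_REGIONS; i++) {
		if (!regions[i].used) {
			regions[i] = (struct region){ start, size, 1 };
			return 0;
		}
	}
	return -1;
}

/* Release succeeds only for an exact (start, size) match; -1 otherwise. */
static int region_release(uint64_t start, uint64_t size)
{
	int i;

	for (i = 0; i < MAX_REGIONS; i++) {
		if (regions[i].used && regions[i].start == start &&
		    regions[i].size == size) {
			regions[i].used = 0;
			return 0;
		}
	}
	return -1;
}
```

Releasing the whole range in one call, as the patch does in __remove_pages(),
works when the caller passes the same range that was registered.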

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
CC: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

---
 arch/powerpc/platforms/pseries/hotplug-memory.c |   13 +
 mm/memory_hotplug.c |4 ++--
 2 files changed, 11 insertions(+), 6 deletions(-)

Index: linux-3.5-rc4/mm/memory_hotplug.c
===
--- linux-3.5-rc4.orig/mm/memory_hotplug.c  2012-07-03 14:22:03.549198802 
+0900
+++ linux-3.5-rc4/mm/memory_hotplug.c   2012-07-03 14:22:05.919169458 +0900
@@ -358,11 +358,11 @@ int __remove_pages(struct zone *zone, un
BUG_ON(phys_start_pfn & ~PAGE_SECTION_MASK);
BUG_ON(nr_pages % PAGES_PER_SECTION);

+   release_mem_region(phys_start_pfn << PAGE_SHIFT, nr_pages * PAGE_SIZE);
+
sections_to_remove = nr_pages / PAGES_PER_SECTION;
for (i = 0; i < sections_to_remove; i++) {
unsigned long pfn = phys_start_pfn + i*PAGES_PER_SECTION;
-   release_mem_region(pfn << PAGE_SHIFT,
-  PAGES_PER_SECTION << PAGE_SHIFT);
ret = __remove_section(zone, __pfn_to_section(pfn));
if (ret)
break;
Index: linux-3.5-rc4/arch/powerpc/platforms/pseries/hotplug-memory.c
===
--- linux-3.5-rc4.orig/arch/powerpc/platforms/pseries/hotplug-memory.c  
2012-07-03 14:21:45.641422678
+0900
+++ linux-3.5-rc4/arch/powerpc/platforms/pseries/hotplug-memory.c   
2012-07-03 14:22:05.920169437 +0900
@@ -77,7 +77,8 @@ static int pseries_remove_memblock(unsig
 {
unsigned long start, start_pfn;
struct zone *zone;
-   int ret;
+   int i, ret;
+   int sections_to_remove;

start_pfn = base >> PAGE_SHIFT;

@@ -97,9 +98,13 @@ static int pseries_remove_memblock(unsig
 * to sysfs state file and we can't remove sysfs entries
 * while writing to it. So we have to defer it to here.
 */
-   ret = __remove_pages(zone, start_pfn, memblock_size >> PAGE_SHIFT);
-   if (ret)
-   return ret;
+   sections_to_remove = (memblock_size >> PAGE_SHIFT) / PAGES_PER_SECTION;
+   for (i = 0; i < sections_to_remove; i++) {
+   unsigned long pfn = start_pfn + i * PAGES_PER_SECTION;
+   ret = __remove_pages(zone, pfn, PAGES_PER_SECTION);
+   if (ret)
+   return ret;
+   }

/*
 * Update memory regions for memory remove



[RFC PATCH v3 6/13] memory-hotplug : add memory_block_release

2012-07-09 Thread Yasuaki Ishimatsu
When remove_memory_block() is called, device_release() prints the following
message:

Device 'memory528' does not have a release() function, it is broken and must
be fixed.

remove_memory_block() calls kfree(mem) directly. I think the kfree() should
be done from device_release() instead, so this patch implements
release_memory_block() as the device's release() callback.
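
The release-callback pattern can be sketched in userspace C. This is a minimal
model under stated assumptions: the reduced struct device and struct
memory_block, and the helpers memory_block_create(), device_get() and
device_put(), are hypothetical stand-ins rather than the driver core API. The
containing object is freed only from release(), which runs when the last
reference is dropped:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace equivalent of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, minimal stand-in for struct device. */
struct device {
	int refcount;
	void (*release)(struct device *dev);
};

/* Hypothetical stand-in for struct memory_block. */
struct memory_block {
	unsigned long start_section_nr;
	struct device dev;
};

static int released_blocks;	/* counts release() calls, for illustration */

/* Free the memory_block that embeds `dev`. */
static void release_memory_block(struct device *dev)
{
	struct memory_block *mem = container_of(dev, struct memory_block, dev);

	free(mem);
	released_blocks++;
}

/* Allocate a block holding one reference, with release() wired up. */
static struct memory_block *memory_block_create(unsigned long section_nr)
{
	struct memory_block *mem = calloc(1, sizeof(*mem));

	if (!mem)
		return NULL;
	mem->start_section_nr = section_nr;
	mem->dev.refcount = 1;
	mem->dev.release = release_memory_block;
	return mem;
}

/* Take an extra reference. */
static void device_get(struct device *dev)
{
	dev->refcount++;
}

/* Drop a reference; the last put invokes release(). */
static void device_put(struct device *dev)
{
	assert(dev->refcount > 0);
	if (--dev->refcount == 0)
		dev->release(dev);
}
```

Freeing from release() rather than directly after unregistering means another
holder of a reference never sees the object freed underneath it.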

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
CC: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

---
 drivers/base/memory.c |   11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

Index: linux-3.5-rc6/drivers/base/memory.c
===
--- linux-3.5-rc6.orig/drivers/base/memory.c2012-07-09 18:10:54.880076739 
+0900
+++ linux-3.5-rc6/drivers/base/memory.c 2012-07-09 18:19:20.471755922 +0900
@@ -109,6 +109,15 @@ bool is_memblk_offline(unsigned long sta
 }
 EXPORT_SYMBOL(is_memblk_offline);

+#define to_memory_block(device) container_of(device, struct memory_block, dev)
+
+static void release_memory_block(struct device *dev)
+{
+   struct memory_block *mem = to_memory_block(dev);
+
+   kfree(mem);
+}
+
 /*
  * register_memory - Setup a sysfs device for a memory block
  */
@@ -119,6 +128,7 @@ int register_memory(struct memory_block

memory->dev.bus = &memory_subsys;
memory->dev.id = memory->start_section_nr / sections_per_block;
+   memory->dev.release = release_memory_block;

error = device_register(&memory->dev);
return error;
@@ -669,7 +679,6 @@ int remove_memory_block(unsigned long no
mem_remove_simple_file(mem, phys_device);
mem_remove_simple_file(mem, removable);
unregister_memory(mem);
-   kfree(mem);
} else
kobject_put(&mem->dev.kobj);




[RFC PATCH v3 7/13] memory-hotplug : remove_memory calls __remove_pages

2012-07-09 Thread Yasuaki Ishimatsu
This patch makes remove_memory() call __remove_pages(). The range given by
the phys_start_pfn and nr_pages arguments of __remove_pages() may then span
more than one zone. So the zone argument is removed from __remove_pages(),
and __remove_pages() calculates the zone for each section.

When CONFIG_SPARSEMEM_VMEMMAP is defined, there is no way to remove a memmap
yet, so __remove_section() only calls unregister_memory_section().
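
The per-section zone lookup can be sketched in userspace C. This is an
illustration only: the PAGES_PER_SECTION value is an assumption, and
zone_of_pfn() and walk_sections() are hypothetical helpers that model a range
crossing a single zone boundary, not the kernel's page_zone() logic:

```c
#include <assert.h>

#define PAGES_PER_SECTION 32768UL	/* assumption: 128 MiB sections, 4 KiB pages */

/* Hypothetical zone lookup: pfns below `boundary` are zone 0, the rest zone 1. */
static int zone_of_pfn(unsigned long pfn, unsigned long boundary)
{
	return pfn < boundary ? 0 : 1;
}

/*
 * Walk [start_pfn, start_pfn + nr_pages) one section at a time and record
 * the zone of each section's first pfn in zones[]. Returns the number of
 * sections visited (at most `max`).
 */
static unsigned long
walk_sections(unsigned long start_pfn, unsigned long nr_pages,
	      unsigned long boundary, int *zones, unsigned long max)
{
	unsigned long i, nr_sections;

	/* As in __remove_pages(), only whole sections are accepted. */
	assert(start_pfn % PAGES_PER_SECTION == 0);
	assert(nr_pages % PAGES_PER_SECTION == 0);

	nr_sections = nr_pages / PAGES_PER_SECTION;
	for (i = 0; i < nr_sections && i < max; i++) {
		unsigned long pfn = start_pfn + i * PAGES_PER_SECTION;

		zones[i] = zone_of_pfn(pfn, boundary);
	}
	return i;
}
```

Looking the zone up inside the loop is what lets a single call cover a range
whose sections belong to different zones.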

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
CC: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

---
 arch/powerpc/platforms/pseries/hotplug-memory.c |5 +
 include/linux/memory_hotplug.h  |3 +--
 mm/memory_hotplug.c |   20 +---
 3 files changed, 15 insertions(+), 13 deletions(-)

Index: linux-3.5-rc4/mm/memory_hotplug.c
===
--- linux-3.5-rc4.orig/mm/memory_hotplug.c  2012-07-03 14:22:05.919169458 
+0900
+++ linux-3.5-rc4/mm/memory_hotplug.c   2012-07-03 14:22:10.170116406 +0900
@@ -275,11 +275,14 @@ static int __meminit __add_section(int n
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
 static int __remove_section(struct zone *zone, struct mem_section *ms)
 {
-   /*
-* XXX: Freeing memmap with vmemmap is not implement yet.
-*  This should be removed later.
-*/
-   return -EBUSY;
+   int ret = -EINVAL;
+
+   if (!valid_section(ms))
+   return ret;
+
+   ret = unregister_memory_section(ms);
+
+   return ret;
 }
 #else
 static int __remove_section(struct zone *zone, struct mem_section *ms)
@@ -346,11 +349,11 @@ EXPORT_SYMBOL_GPL(__add_pages);
  * sure that pages are marked reserved and zones are adjust properly by
  * calling offline_pages().
  */
-int __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
-unsigned long nr_pages)
+int __remove_pages(unsigned long phys_start_pfn, unsigned long nr_pages)
 {
unsigned long i, ret = 0;
int sections_to_remove;
+   struct zone *zone;

/*
 * We can only remove entire sections
@@ -363,6 +366,7 @@ int __remove_pages(struct zone *zone, un
sections_to_remove = nr_pages / PAGES_PER_SECTION;
for (i = 0; i < sections_to_remove; i++) {
unsigned long pfn = phys_start_pfn + i*PAGES_PER_SECTION;
+   zone = page_zone(pfn_to_page(pfn));
ret = __remove_section(zone, __pfn_to_section(pfn));
if (ret)
break;
@@ -664,6 +668,8 @@ int remove_memory(int nid, u64 start, u6
lock_memory_hotplug();
/* remove memmap entry */
firmware_map_remove(start, start + size - 1, "System RAM");
+
+   __remove_pages(start >> PAGE_SHIFT, size >> PAGE_SHIFT);
unlock_memory_hotplug();
return 0;

Index: linux-3.5-rc4/include/linux/memory_hotplug.h
===
--- linux-3.5-rc4.orig/include/linux/memory_hotplug.h   2012-07-03 14:21:58.330264047 +0900
+++ linux-3.5-rc4/include/linux/memory_hotplug.h   2012-07-03 14:22:10.170116406 +0900
@@ -89,8 +89,7 @@ extern bool is_pageblock_removable_noloc
 /* reasonably generic interface to expand the physical pages in a zone  */
 extern int __add_pages(int nid, struct zone *zone, unsigned long start_pfn,
unsigned long nr_pages);
-extern int __remove_pages(struct zone *zone, unsigned long start_pfn,
-   unsigned long nr_pages);
+extern int __remove_pages(unsigned long start_pfn, unsigned long nr_pages);

 #ifdef CONFIG_NUMA
 extern int memory_add_physaddr_to_nid(u64 start);
Index: linux-3.5-rc4/arch/powerpc/platforms/pseries/hotplug-memory.c
===
--- linux-3.5-rc4.orig/arch/powerpc/platforms/pseries/hotplug-memory.c  2012-07-03 14:22:05.920169437 +0900
+++ linux-3.5-rc4/arch/powerpc/platforms/pseries/hotplug-memory.c   2012-07-03 14:22:10.172116353 +0900
@@ -76,7 +76,6 @@ unsigned long memory_block_size_bytes(vo
 static int pseries_remove_memblock(unsigned long base, unsigned int memblock_size)
 {
unsigned long start, start_pfn;
-   struct zone *zone;
int i, ret;
int sections_to_remove;

@@ -87,8 +86,6 @@ static int pseries_remove_memblock(unsig
return 0;
}

-   zone = page_zone(pfn_to_page(start_pfn));
-
/*
 * Remove section mappings and sysfs entries for the
 * section of the memory we are removing.
@@ -101,7 +98,7 @@ static int pseries_remove_memblock(unsig
sections_to_remove = (memblock_size >> PAGE_SHIFT

[RFC PATCH v3 8/13] memory-hotplug : check page type in get_page_bootmem

2012-07-09 Thread Yasuaki Ishimatsu
get_page_bootmem() may be called on the same page many times. When it is
called again on a page that is already registered, the function only
increments page->_count.
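
As a rough illustration of that behaviour, the following user-space C sketch
models the fields involved. The model_* names and the simplified struct are
hypothetical stand-ins, not kernel definitions, and the sketch assumes the
intent is to test the type already stored on the page (the hunk as posted
compares the type argument instead).

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's bootmem type range. */
enum {
	MODEL_MIN_BOOTMEM_TYPE = 12,
	MODEL_SECTION_INFO = MODEL_MIN_BOOTMEM_TYPE,
	MODEL_MIX_SECTION_INFO,
	MODEL_NODE_INFO,
	MODEL_MAX_BOOTMEM_TYPE = MODEL_NODE_INFO,
};

/* Simplified model of the struct page fields the patch touches. */
struct model_page {
	unsigned long type;	/* stands in for page->lru.next */
	unsigned long private;	/* stands in for page_private(page) */
	int count;		/* stands in for page->_count */
};

static void model_get_page_bootmem(unsigned long info,
				   struct model_page *page,
				   unsigned long type)
{
	unsigned long page_type = page->type;

	/* First registration: the page carries no valid bootmem type yet. */
	if (page_type < MODEL_MIN_BOOTMEM_TYPE ||
	    page_type > MODEL_MAX_BOOTMEM_TYPE) {
		page->type = type;
		page->private = info;
	}

	/* Every call, first or repeated, takes one reference. */
	page->count++;
}
```

Calling model_get_page_bootmem() twice on the same zero-initialised
model_page leaves the type and info from the first call in place and raises
count to 2, so a later put has one reference to drop per registration.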

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
CC: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

---
 mm/memory_hotplug.c |   15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

Index: linux-3.5-rc4/mm/memory_hotplug.c
===
--- linux-3.5-rc4.orig/mm/memory_hotplug.c  2012-07-03 14:22:10.170116406 +0900
+++ linux-3.5-rc4/mm/memory_hotplug.c   2012-07-03 14:22:12.299089413 +0900
@@ -95,10 +95,17 @@ static void release_memory_resource(stru
 static void get_page_bootmem(unsigned long info,  struct page *page,
 unsigned long type)
 {
-   page->lru.next = (struct list_head *) type;
-   SetPagePrivate(page);
-   set_page_private(page, info);
-   atomic_inc(&page->_count);
+   unsigned long page_type;
+
+   page_type = (unsigned long) page->lru.next;
+   if (type < MEMORY_HOTPLUG_MIN_BOOTMEM_TYPE ||
+   type > MEMORY_HOTPLUG_MAX_BOOTMEM_TYPE){
+   page->lru.next = (struct list_head *) type;
+   SetPagePrivate(page);
+   set_page_private(page, info);
+   atomic_inc(&page->_count);
+   } else
+   atomic_inc(&page->_count);
 }

 /* reference to __meminit __free_pages_bootmem is valid

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[RFC PATCH v3 9/13] memory-hotplug : move register_page_bootmem_info_node and put_page_bootmem for sparse-vmemmap

2012-07-09 Thread Yasuaki Ishimatsu
To implement register_page_bootmem_info_node() for sparse-vmemmap,
register_page_bootmem_info_node() and put_page_bootmem() are moved to
memory_hotplug.c.

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
CC: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

---
 include/linux/memory_hotplug.h |9 -
 mm/memory_hotplug.c|8 ++--
 2 files changed, 6 insertions(+), 11 deletions(-)

Index: linux-3.5-rc4/include/linux/memory_hotplug.h
===
--- linux-3.5-rc4.orig/include/linux/memory_hotplug.h   2012-07-03 14:22:10.170116406 +0900
+++ linux-3.5-rc4/include/linux/memory_hotplug.h   2012-07-03 14:22:14.409063086 +0900
@@ -160,17 +160,8 @@ static inline void arch_refresh_nodedata
 #endif /* CONFIG_NUMA */
 #endif /* CONFIG_HAVE_ARCH_NODEDATA_EXTENSION */

-#ifdef CONFIG_SPARSEMEM_VMEMMAP
-static inline void register_page_bootmem_info_node(struct pglist_data *pgdat)
-{
-}
-static inline void put_page_bootmem(struct page *page)
-{
-}
-#else
 extern void register_page_bootmem_info_node(struct pglist_data *pgdat);
 extern void put_page_bootmem(struct page *page);
-#endif

 /*
  * Lock for memory hotplug guarantees 1) all callbacks for memory hotplug
Index: linux-3.5-rc4/mm/memory_hotplug.c
===
--- linux-3.5-rc4.orig/mm/memory_hotplug.c  2012-07-03 14:22:12.299089413 +0900
+++ linux-3.5-rc4/mm/memory_hotplug.c   2012-07-03 14:22:14.419062959 +0900
@@ -91,7 +91,6 @@ static void release_memory_resource(stru
 }

 #ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
-#ifndef CONFIG_SPARSEMEM_VMEMMAP
 static void get_page_bootmem(unsigned long info,  struct page *page,
 unsigned long type)
 {
@@ -127,6 +126,7 @@ void __ref put_page_bootmem(struct page

 }

+#ifndef CONFIG_SPARSEMEM_VMEMMAP
 static void register_page_bootmem_info_section(unsigned long start_pfn)
 {
unsigned long *usemap, mapsize, section_nr, i;
@@ -163,6 +163,11 @@ static void register_page_bootmem_info_s
get_page_bootmem(section_nr, page, MIX_SECTION_INFO);

 }
+#else
+static inline void register_page_bootmem_info_section(unsigned long start_pfn)
+{
+}
+#endif

 void register_page_bootmem_info_node(struct pglist_data *pgdat)
 {
@@ -198,7 +203,6 @@ void register_page_bootmem_info_node(str
register_page_bootmem_info_section(pfn);

 }
-#endif /* !CONFIG_SPARSEMEM_VMEMMAP */

 static void grow_zone_span(struct zone *zone, unsigned long start_pfn,
   unsigned long end_pfn)



[RFC PATCH v3 10/13] memory-hotplug : implement register_page_bootmem_info_section of sparse-vmemmap

2012-07-09 Thread Yasuaki Ishimatsu
To remove a sparse-vmemmap memmap region that was allocated from bootmem,
the region must first be registered with get_page_bootmem(). This patch
therefore walks the pages backing the virtual mapping and registers each of
them with get_page_bootmem().
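
To give a sense of how much memory that walk covers per section, here is a
small C sketch of the size computation used in the mm/memory_hotplug.c hunk
of this patch. The constants are assumptions chosen to resemble a typical
x86_64 configuration (4 KiB pages, 128 MiB sections, a 64-byte struct page);
the model_* names are hypothetical and real values depend on the kernel
configuration.

```c
#include <stddef.h>

/* Assumed x86_64-like parameters; not taken from the patch. */
#define MODEL_PAGE_SHIFT	12UL
#define MODEL_PAGE_SIZE		(1UL << MODEL_PAGE_SHIFT)
#define MODEL_SECTION_SIZE_BITS	27UL	/* 128 MiB sections */
#define MODEL_PAGES_PER_SECTION	\
	(1UL << (MODEL_SECTION_SIZE_BITS - MODEL_PAGE_SHIFT))
#define MODEL_STRUCT_PAGE_SIZE	64UL	/* assumed sizeof(struct page) */

/* Round size up to a whole number of pages, in the manner of PAGE_ALIGN(). */
static unsigned long model_page_align(unsigned long size)
{
	return (size + MODEL_PAGE_SIZE - 1) & ~(MODEL_PAGE_SIZE - 1);
}

/* Number of pages that back the memmap of one section. */
static unsigned long model_memmap_pages(void)
{
	unsigned long mapsize;

	mapsize = MODEL_STRUCT_PAGE_SIZE * MODEL_PAGES_PER_SECTION;
	return model_page_align(mapsize) >> MODEL_PAGE_SHIFT;
}
```

Under these assumptions one section has 32768 pages, its memmap occupies
2 MiB, and model_memmap_pages() returns 512. That is the number of base
pages the walk would register for one section when the memmap is mapped with
4 KiB pages.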

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
CC: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

---
 arch/x86/mm/init_64.c  |   53 +
 include/linux/memory_hotplug.h |2 +
 include/linux/mm.h |3 +-
 mm/memory_hotplug.c|   23 +++--
 4 files changed, 77 insertions(+), 4 deletions(-)

Index: linux-3.5-rc4/mm/memory_hotplug.c
===
--- linux-3.5-rc4.orig/mm/memory_hotplug.c  2012-07-03 14:22:14.419062959 +0900
+++ linux-3.5-rc4/mm/memory_hotplug.c   2012-07-03 14:22:18.522011667 +0900
@@ -91,8 +91,8 @@ static void release_memory_resource(stru
 }

 #ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
-static void get_page_bootmem(unsigned long info,  struct page *page,
-unsigned long type)
+void get_page_bootmem(unsigned long info,  struct page *page,
+ unsigned long type)
 {
unsigned long page_type;

@@ -164,8 +164,25 @@ static void register_page_bootmem_info_s

 }
 #else
-static inline void register_page_bootmem_info_section(unsigned long start_pfn)
+static void register_page_bootmem_info_section(unsigned long start_pfn)
 {
+   unsigned long mapsize, section_nr;
+   struct mem_section *ms;
+   struct page *page, *memmap;
+
+   if (!pfn_valid(start_pfn))
+   return;
+
+   section_nr = pfn_to_section_nr(start_pfn);
+   ms = __nr_to_section(section_nr);
+
+   memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
+
+   page = virt_to_page(memmap);
+   mapsize = sizeof(struct page) * PAGES_PER_SECTION;
+   mapsize = PAGE_ALIGN(mapsize) >> PAGE_SHIFT;
+
+   register_page_bootmem_memmap(section_nr, memmap, PAGES_PER_SECTION);
 }
 #endif

Index: linux-3.5-rc4/include/linux/mm.h
===
--- linux-3.5-rc4.orig/include/linux/mm.h   2012-07-03 14:21:45.223427904 +0900
+++ linux-3.5-rc4/include/linux/mm.h   2012-07-03 14:22:18.530011567 +0900
@@ -1586,7 +1586,8 @@ int vmemmap_populate_basepages(struct pa
unsigned long pages, int node);
 int vmemmap_populate(struct page *start_page, unsigned long pages, int node);
 void vmemmap_populate_print_last(void);
-
+void register_page_bootmem_memmap(unsigned long section_nr, struct page *map,
+ unsigned long size);

 enum mf_flags {
MF_COUNT_INCREASED = 1 << 0,
Index: linux-3.5-rc4/arch/x86/mm/init_64.c
===
--- linux-3.5-rc4.orig/arch/x86/mm/init_64.c   2012-07-03 14:21:45.228427843 +0900
+++ linux-3.5-rc4/arch/x86/mm/init_64.c 2012-07-03 14:22:18.538011465 +0900
@@ -978,6 +978,59 @@ vmemmap_populate(struct page *start_page
return 0;
 }

+void __meminit
+register_page_bootmem_memmap(unsigned long section_nr, struct page *start_page,
+unsigned long size)
+{
+   unsigned long addr = (unsigned long)start_page;
+   unsigned long end = (unsigned long)(start_page + size);
+   unsigned long next;
+   pgd_t *pgd;
+   pud_t *pud;
+   pmd_t *pmd;
+
+   for (; addr < end; addr = next) {
+   pte_t *pte = NULL;
+
+   pgd = pgd_offset_k(addr);
+   if (pgd_none(*pgd)) {
+   next = (addr + PAGE_SIZE) & PAGE_MASK;
+   continue;
+   }
+   get_page_bootmem(section_nr, pgd_page(*pgd), MIX_SECTION_INFO);
+
+   pud = pud_offset(pgd, addr);
+   if (pud_none(*pud)) {
+   next = (addr + PAGE_SIZE) & PAGE_MASK;
+   continue;
+   }
+   get_page_bootmem(section_nr, pud_page(*pud), MIX_SECTION_INFO);
+
+   if (!cpu_has_pse) {
+   next = (addr + PAGE_SIZE) & PAGE_MASK;
+   pmd = pmd_offset(pud, addr);
+   if (pmd_none(*pmd))
+   continue;
+   get_page_bootmem(section_nr, pmd_page(*pmd),
+MIX_SECTION_INFO);
+
+   pte = pte_offset_kernel(pmd, addr);
+   if (pte_none(*pte

[RFC PATCH v3 11/13] memory-hotplug : free memmap of sparse-vmemmap

2012-07-09 Thread Yasuaki Ishimatsu
I don't think every page backing the virtual mapping of removed memory can
be freed, since pages of type MIX_SECTION_INFO are difficult to free. So, as
a first step, this patch frees only pages of type SECTION_INFO.
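
The helpers in this patch step through the memmap's virtual address range
one mapping at a time, using a base-page stride when large pages are not
available and a PMD-sized stride otherwise. The C sketch below models only
that stepping, under the assumption of 4 KiB base pages and 2 MiB large
pages; the model_* names are hypothetical, and no page tables are touched.

```c
#define MODEL_PAGE_SIZE	(1UL << 12)
#define MODEL_PAGE_MASK	(~(MODEL_PAGE_SIZE - 1))
#define MODEL_PMD_SIZE	(1UL << 21)
#define MODEL_PMD_MASK	(~(MODEL_PMD_SIZE - 1))

/* Next address when the range is mapped with base pages. */
static unsigned long model_next_pte(unsigned long addr)
{
	return (addr + MODEL_PAGE_SIZE) & MODEL_PAGE_MASK;
}

/* Next large-page boundary, clamped to end (modelled on pmd_addr_end()). */
static unsigned long model_next_pmd(unsigned long addr, unsigned long end)
{
	unsigned long boundary = (addr + MODEL_PMD_SIZE) & MODEL_PMD_MASK;

	return (boundary - 1 < end - 1) ? boundary : end;
}

/* Count how many mappings a walk over [addr, end) visits. */
static unsigned long model_count_steps(unsigned long addr, unsigned long end,
				       int large_pages)
{
	unsigned long steps = 0;

	while (addr < end) {
		addr = large_pages ? model_next_pmd(addr, end)
				   : model_next_pte(addr);
		steps++;
	}
	return steps;
}
```

For a 2 MiB range that starts on a 2 MiB boundary, the base-page walk takes
512 steps while the large-page walk takes one. If the range is shifted by
one base page, the large-page walk takes two steps, because the range then
straddles a large-page boundary.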

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
CC: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

---
 arch/x86/mm/init_64.c |   91 ++
 include/linux/mm.h|2 +
 mm/memory_hotplug.c   |5 ++
 mm/sparse.c   |5 +-
 4 files changed, 101 insertions(+), 2 deletions(-)

Index: linux-3.5-rc4/include/linux/mm.h
===
--- linux-3.5-rc4.orig/include/linux/mm.h   2012-07-03 14:22:18.530011567 +0900
+++ linux-3.5-rc4/include/linux/mm.h   2012-07-03 14:22:20.83872 +0900
@@ -1588,6 +1588,8 @@ int vmemmap_populate(struct page *start_
 void vmemmap_populate_print_last(void);
 void register_page_bootmem_memmap(unsigned long section_nr, struct page *map,
  unsigned long size);
+void vmemmap_kfree(struct page *memmap, unsigned long nr_pages);
+void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages);

 enum mf_flags {
MF_COUNT_INCREASED = 1 << 0,
Index: linux-3.5-rc4/mm/sparse.c
===
--- linux-3.5-rc4.orig/mm/sparse.c  2012-07-03 14:21:45.071429805 +0900
+++ linux-3.5-rc4/mm/sparse.c   2012-07-03 14:22:21.000983767 +0900
@@ -614,12 +614,13 @@ static inline struct page *kmalloc_secti
/* This will make the necessary allocations eventually. */
return sparse_mem_map_populate(pnum, nid);
 }
-static void __kfree_section_memmap(struct page *memmap, unsigned long nr_pages)
+static void __kfree_section_memmap(struct page *page, unsigned long nr_pages)
 {
-   return; /* XXX: Not implemented yet */
+   vmemmap_kfree(page, nr_pages);
 }
 static void free_map_bootmem(struct page *page, unsigned long nr_pages)
 {
+   vmemmap_free_bootmem(page, nr_pages);
 }
 #else
 static struct page *__kmalloc_section_memmap(unsigned long nr_pages)
Index: linux-3.5-rc4/arch/x86/mm/init_64.c
===
--- linux-3.5-rc4.orig/arch/x86/mm/init_64.c   2012-07-03 14:22:18.538011465 +0900
+++ linux-3.5-rc4/arch/x86/mm/init_64.c 2012-07-03 14:22:21.007983103 +0900
@@ -978,6 +978,97 @@ vmemmap_populate(struct page *start_page
return 0;
 }

+unsigned long find_and_clear_pte_page(unsigned long addr, unsigned long end,
+ struct page **pp)
+{
+   pgd_t *pgd;
+   pud_t *pud;
+   pmd_t *pmd;
+   pte_t *pte;
+   unsigned long next;
+
+   *pp = NULL;
+
+   pgd = pgd_offset_k(addr);
+   if (pgd_none(*pgd))
+   return (addr + PAGE_SIZE) & PAGE_MASK;
+
+   pud = pud_offset(pgd, addr);
+   if (pud_none(*pud))
+   return (addr + PAGE_SIZE) & PAGE_MASK;
+
+   if (!cpu_has_pse) {
+   next = (addr + PAGE_SIZE) & PAGE_MASK;
+   pmd = pmd_offset(pud, addr);
+   if (pmd_none(*pmd))
+   return next;
+
+   pte = pte_offset_kernel(pmd, addr);
+   if (pte_none(*pte))
+   return next;
+
+   *pp = pte_page(*pte);
+   pte_clear(&init_mm, addr, pte);
+   } else {
+   next = pmd_addr_end(addr, end);
+
+   pmd = pmd_offset(pud, addr);
+   if (pmd_none(*pmd))
+   return next;
+
+   *pp = pmd_page(*pmd);
+   pmd_clear(pmd);
+   }
+
+   return next;
+}
+
+void __meminit
+vmemmap_kfree(struct page *memmap, unsigned long nr_pages)
+{
+   unsigned long addr = (unsigned long)memmap;
+   unsigned long end = (unsigned long)(memmap + nr_pages);
+   unsigned long next;
+   unsigned int order;
+   struct page *page;
+
+   for (; addr < end; addr = next) {
+   page = NULL;
+   next = find_and_clear_pte_page(addr, end, &page);
+   if (!page)
+   continue;
+
+   if (is_vmalloc_addr(page_address(page)))
+   vfree(page_address(page));
+   else {
+   order = next - addr;
+   free_pages((unsigned long)page_address(page),
+  get_order(order));
+   }
+   }
+}
+
+void __meminit
+vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages)
+{
+   unsigned long addr

[RFC PATCH v3 12/13] memory-hotplug : add node_device_release

2012-07-09 Thread Yasuaki Ishimatsu
When unregister_node() is called, device_release() prints the following
message:

Device 'node2' does not have a release() function, it is broken and must be
fixed.

So this patch implements node_device_release().
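
The pattern behind this fix can be sketched in user-space C: a device
embedded in a larger object carries a release callback, and the callback
recovers the containing object from the embedded member. The model_* types
and functions below are hypothetical simplifications, not the driver-core
API, and the -1 return merely stands in for the warning path.

```c
#include <stddef.h>
#include <string.h>

/* Minimal stand-in for struct device. */
struct model_device {
	int id;
	void (*release)(struct model_device *dev);
};

/* Minimal stand-in for struct node, which embeds its device. */
struct model_node {
	int scratch;			/* placeholder for per-node state */
	struct model_device dev;
};

/* Recover the containing node from its embedded device (cf. to_node()). */
static struct model_node *model_to_node(struct model_device *dev)
{
	return (struct model_node *)((char *)dev -
				     offsetof(struct model_node, dev));
}

/* Release callback: clear the node so the slot can be registered again. */
static void model_node_device_release(struct model_device *dev)
{
	struct model_node *node = model_to_node(dev);

	memset(node, 0, sizeof(*node));
}

/* Models the last-reference drop; returns -1 where a warning would print. */
static int model_device_release(struct model_device *dev)
{
	if (!dev->release)
		return -1;
	dev->release(dev);
	return 0;
}
```

In this model a node whose dev.release is unset takes the warning path,
while a node with model_node_device_release installed is zeroed in full,
including the embedded device.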

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
CC: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

---
 drivers/base/node.c |7 +++
 1 file changed, 7 insertions(+)

Index: linux-3.5-rc4/drivers/base/node.c
===
--- linux-3.5-rc4.orig/drivers/base/node.c  2012-07-03 14:21:44.882432167 +0900
+++ linux-3.5-rc4/drivers/base/node.c   2012-07-03 14:22:23.296951921 +0900
@@ -252,6 +252,12 @@ static inline void hugetlb_register_node
 static inline void hugetlb_unregister_node(struct node *node) {}
 #endif

+static void node_device_release(struct device *dev)
+{
+   struct node *node_dev = to_node(dev);
+
+   memset(node_dev, 0, sizeof(struct node));
+}

 /*
  * register_node - Setup a sysfs device for a node.
@@ -265,6 +271,7 @@ int register_node(struct node *node, int

node->dev.id = num;
node->dev.bus = &node_subsys;
+   node->dev.release = node_device_release;
error = device_register(&node->dev);

if (!error){



[RFC PATCH v3 13/13] memory-hotplug : remove sysfs file of node

2012-07-09 Thread Yasuaki Ishimatsu
This patch adds node_set_offline() and unregister_one_node() to
remove_memory() to remove the node's sysfs files.

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
CC: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

---
 mm/memory_hotplug.c |5 +
 1 file changed, 5 insertions(+)

Index: linux-3.5-rc4/mm/memory_hotplug.c
===
--- linux-3.5-rc4.orig/mm/memory_hotplug.c  2012-07-03 14:22:21.012982694 +0900
+++ linux-3.5-rc4/mm/memory_hotplug.c   2012-07-03 14:22:25.405925554 +0900
@@ -702,6 +702,11 @@ int remove_memory(int nid, u64 start, u6
/* remove memmap entry */
firmware_map_remove(start, start + size - 1, "System RAM");

+   if (!node_present_pages(nid)) {
+   node_set_offline(nid);
+   unregister_one_node(nid);
+   }
+
__remove_pages(start >> PAGE_SHIFT, size >> PAGE_SHIFT);
unlock_memory_hotplug();
return 0;


