Re: [RFC v9 PATCH 00/21] memory-hotplug: hot-remove physical memory

2012-10-09 Thread Wen Congyang
At 09/27/2012 12:46 AM, Vasilis Liaskovitis Wrote:
> Hi,
> 
> I am testing 3.6.0-rc7 with this v9 patchset plus more recent fixes
> [1],[2],[3], running in a guest (qemu+seabios from [4]).
> CONFIG_SLAB=y
> CONFIG_DEBUG_SLAB=y
> 
> After a successful hot-add and online, I do a hot-remove with
> "echo 1 > /sys/bus/acpi/devices/PNP/eject".
> When I do the OSPM-eject, I often get slab corruption in the "acpi-state"
> cache, or in other caches:

The following patch can fix this problem:
https://lkml.org/lkml/2012/7/12/186

Thanks
Wen Congyang

> 
> [  170.566995] Slab corruption (Not tainted): Acpi-State start=88009fc1e548, len=80
> [  170.567265] Redzone: 0x0/0x0.
> [  170.567399] Last user: [<  (null)>](0x0)
> [  170.567667] 000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [  170.568078] 010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [  170.568487] 020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [  170.568894] 030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [  170.569302] 040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [  170.569712] Prev obj: start=9fc1e4d0, len=80
> [  170.569869] BUG: unable to handle kernel paging request at 9fc1e520
> [  170.570171] IP: [] print_objinfo+0x9c/0x110
> [  170.570397] PGD 7cf37067 PUD 0
> [  170.570619] Oops:  [#1] SMP
> [  170.570843] Modules linked in: netconsole acpiphp pci_hotplug acpi_memhotplug loop kvm_amd kvm tpm_tis microcode tpm tpm_bios psmouse parport_pc serio_raw evdev parport i2c_piix4 processor thermal_sys i2c_core button ext3 jbd mbcache sg sr_mod cdrom ata_generic virtio_net virtio_blk ata_piix libata scsi_mod virtio_pci virtio_ring virtio
> [  170.573474] CPU 0
> [  170.573568] Pid: 29, comm: kworker/0:1 Not tainted 3.6.0-rc7-guest #12 Bochs Bochs
> [  170.573830] RIP: 0010:[]  [] print_objinfo+0x9c/0x110
> [  170.574106] RSP: 0018:88003eaf3a70  EFLAGS: 00010202
> [  170.574268] RAX: 9fc1e4c8 RBX: 0002 RCX: 24b8
> [  170.574468] RDX: 9fc1e4c8 RSI: 9fc1e4c8 RDI: 88003e9bb980
> [  170.574668] RBP: 88003e9bb980 R08: 880037964078 R09:
> [  170.574870] R10: 021e R11: 0002 R12: 9fc1e4c8
> [  170.575070] R13: 9fc1e520 R14: 004f R15: ffa5
> [  170.575274] FS:  7fc6b7530700() GS:88003fc0() knlGS:
> [  170.575494] CS:  0010 DS:  ES:  CR0: 8005003b
> [  170.575665] CR2: 9fc1e520 CR3: 7c9c1000 CR4: 06f0
> [  170.575870] DR0:  DR1:  DR2:
> [  170.576075] DR3:  DR6: 0ff0 DR7: 0400
> [  170.576276] Process kworker/0:1 (pid: 29, threadinfo 88003eaf2000, task 88003ea941c0)
> [  170.576507] Stack:
> [  170.576599]  0010 01893fbe 88009fc1e000 0050
> [  170.576938]  9fc1e4c8 004f ffa5 8112899f
> [  170.576938]  88003eb309d8 81712d6d 88003e9bb980 88009fc1e540
> [  170.576938] Call Trace:
> [  170.576938]  [] ? check_poison_obj+0x1df/0x1f0
> [  170.576938]  [] ? acpi_ut_create_generic_state+0x2f/0x4c
> [  170.576938]  [] ? acpi_ut_create_generic_state+0x2f/0x4c
> [  170.576938]  [] ? cache_alloc_debugcheck_after.isra.52+0xed/0x220
> [  170.576938]  [] ? acpi_ut_create_generic_state+0x2f/0x4c
> [  170.576938]  [] ? kmem_cache_alloc+0xb5/0x1e0
> [  170.576938]  [] ? acpi_ut_create_generic_state+0x2f/0x4c
> [  170.576938]  [] ? acpi_ds_result_push+0x5d/0x12e
> [  170.576938]  [] ? acpi_ds_exec_end_op+0x28e/0x3d3
> [  170.576938]  [] ? acpi_ps_parse_loop+0x79f/0x931
> [  170.576938]  [] ? acpi_ps_parse_aml+0x89/0x261
> [  170.576938]  [] ? acpi_ps_execute_method+0x1be/0x266
> [  170.576938]  [] ? acpi_ns_evaluate+0xd3/0x19a
> [  170.576938]  [] ? acpi_evaluate_object+0xf3/0x1f4
> [  170.576938]  [] ? acpi_os_wait_events_complete+0x1b/0x1b
> [  170.576938]  [] ? acpi_bus_hot_remove_device+0xeb/0x123
> [  170.576938]  [] ? acpi_os_execute_deferred+0x1d/0x29
> [  170.576938]  [] ? process_one_work+0x125/0x560
> [  170.576938]  [] ? worker_thread+0x16a/0x4e0
> [  170.576938]  [] ? manage_workers+0x310/0x310
> [  170.576938]  [] ? kthread+0x85/0x90
> [  170.576938]  [] ? kernel_thread_helper+0x4/0x10
> [  170.576938]  [] ? flush_kthread_worker+0xa0/0xa0
> [  170.576938]  [] ? gs_change+0x13/0x13
> [  170.576938] Code: cb 75 dc 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e 41 5f c3 8b 7f 0c 4c 89 e2 e8 02 fd ff ff 4c 89 e6 49 89 c5 48 89 ef e8 d4 fc ff ff <49> 8b 55 00 48 8b 30 48 c7 c7 8c 39 6f 81 31 c0 e8 3e 34 3b 00
> 
> Other times, the problem happens on a slab object free:
> 
> [   52.313366] Offlined Pages 32768
> [   52.800232] slab error in verify_redzone_free():

Re: [RFC v9 PATCH 00/21] memory-hotplug: hot-remove physical memory

2012-10-08 Thread Wen Congyang
At 09/27/2012 12:46 AM, Vasilis Liaskovitis Wrote:
> Hi,
> 
> I am testing 3.6.0-rc7 with this v9 patchset plus more recent fixes
> [1],[2],[3], running in a guest (qemu+seabios from [4]).
> CONFIG_SLAB=y
> CONFIG_DEBUG_SLAB=y
> 
> After a successful hot-add and online, I do a hot-remove with
> "echo 1 > /sys/bus/acpi/devices/PNP/eject".
> When I do the OSPM-eject, I often get slab corruption in the "acpi-state"
> cache, or in other caches:

I found the reason: when you do an OSPM-eject, the kernel automatically
offlines and removes the memory. But offlining the memory fails, so the
memory is still in use by the kernel. device_release_driver() does not
report this error to its caller, acpi_bus_remove(), so the kernel goes on
to power off and eject the device by evaluating _PS3 and _EJ0. The kernel
then keeps using memory that no longer exists, which is very dangerous.
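
To make the failure mode concrete, here is a toy userspace model in Python. It is only a sketch, not kernel code: MemoryDevice, offline_memory(), release_driver_void(), eject_unchecked() and eject_checked() are hypothetical stand-ins, and -16 merely mirrors the value of -EBUSY. It shows how an offline failure gets lost when the release step cannot return an error, and what checking the result before the eject might look like.

```python
# Toy userspace model of the eject path; none of this is kernel code.
# All names below are hypothetical stand-ins for illustration.

class MemoryDevice:
    def __init__(self, busy):
        self.busy = busy          # True if the kernel still uses this memory
        self.ejected = False


def offline_memory(dev):
    """Return 0 on success, a negative value if the memory is still in use."""
    return -16 if dev.busy else 0  # -16 mirrors -EBUSY


def release_driver_void(dev):
    """Like a void function: the offline result is computed, then dropped."""
    offline_memory(dev)


def eject_unchecked(dev):
    """Behaviour as described above: eject regardless of the offline result."""
    release_driver_void(dev)
    dev.ejected = True            # stands in for evaluating _PS3 and _EJ0
    return 0


def eject_checked(dev):
    """Alternative: stop before ejecting when offlining fails."""
    rc = offline_memory(dev)
    if rc:
        return rc
    dev.ejected = True
    return 0
```

With a busy device, eject_unchecked() still marks the device as ejected, while eject_checked() returns the error and leaves the device in place.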

Thanks
Wen Congyang


Re: [RFC v9 PATCH 00/21] memory-hotplug: hot-remove physical memory

2012-10-08 Thread Wen Congyang
At 09/27/2012 12:46 AM, Vasilis Liaskovitis Wrote:
> Hi,
> 
> I am testing 3.6.0-rc7 with this v9 patchset plus more recent fixes
> [1],[2],[3], running in a guest (qemu+seabios from [4]).
> CONFIG_SLAB=y
> CONFIG_DEBUG_SLAB=y
> 
> After a successful hot-add and online, I do a hot-remove with
> "echo 1 > /sys/bus/acpi/devices/PNP/eject".
> When I do the OSPM-eject, I often get slab corruption in the "acpi-state"
> cache, or in other caches:

I can reproduce this problem without my patchset.

Thanks
Wen Congyang


Re: [RFC v9 PATCH 00/21] memory-hotplug: hot-remove physical memory

2012-10-01 Thread Yasuaki Ishimatsu

Hi Chen,

2012/10/02 8:45, Ni zhan Chen wrote:

On 10/01/2012 12:44 PM, Yasuaki Ishimatsu wrote:

Hi Chen,

2012/09/29 17:19, Ni zhan Chen wrote:

On 09/05/2012 05:25 PM, we...@cn.fujitsu.com wrote:

From: Wen Congyang 

This patch series aims to support physical memory hot-remove.

The patches can free/remove the following things:

   - acpi_memory_info  : [RFC PATCH 4/19]
   - /sys/firmware/memmap/X/{end, start, type} : [RFC PATCH 8/19]
   - iomem_resource: [RFC PATCH 9/19]
   - mem_section and related sysfs files   : [RFC PATCH 10-11, 13-16/19]
   - page table of removed memory  : [RFC PATCH 12/19]
   - node and related sysfs files  : [RFC PATCH 18-19/19]

If you find any functionality missing for physical memory hot-remove, please
let me know.

How to test this patchset?
1. apply this patchset and build the kernel. MEMORY_HOTPLUG, MEMORY_HOTREMOVE,
ACPI_HOTPLUG_MEMORY must be selected.
2. load the module acpi_memhotplug


Hi Yasuaki,

where is the acpi_memhotplug module?


If you build acpi_memhotplug as a module, it is created under the
/lib/modules//driver/acpi/ directory. Whether it is a module depends on the
config option ACPI_HOTPLUG_MEMORY. If the option is set to [*], the driver is
built in, so you don't need to worry about loading it.
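
A small Python sketch of that check, assuming you have the text of the kernel's .config available (for example from a config file shipped by your distribution, if there is one). The helper names hotplug_memory_setting() and needs_modprobe() are made up for illustration.

```python
import re


def hotplug_memory_setting(config_text):
    """Return 'y', 'm', or None for CONFIG_ACPI_HOTPLUG_MEMORY in a .config dump."""
    match = re.search(
        r"^CONFIG_ACPI_HOTPLUG_MEMORY=([ym])$", config_text, re.MULTILINE
    )
    return match.group(1) if match else None


def needs_modprobe(config_text):
    """True only when the driver was built as a loadable module (=m)."""
    return hotplug_memory_setting(config_text) == "m"
```

If needs_modprobe() returns False because the setting is 'y', step 2 can be skipped; if the setting is None, the option is not enabled at all.
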
Thanks,
Yasuaki Ishimatsu


Hi Yasuaki,

I built the kernel with MEMORY_HOTPLUG, MEMORY_HOTREMOVE and
ACPI_HOTPLUG_MEMORY selected as [*], but I can't find PNP0C80:XX under the
directory /sys/bus/acpi/devices/.

[root@localhost ~]# ls /sys/bus/acpi/devices/
device:00  device:07  device:0e  device:15  device:1c  device:23  device:2a   LNXCPU:00  LNXCPU:07    PNP0501:00  PNP0C02:00  PNP0C0F:02  PNP0C14:01
device:01  device:08  device:0f  device:16  device:1d  device:24  device:2b   LNXCPU:01  LNXPWRBN:00  PNP0800:00  PNP0C02:01  PNP0C0F:03  PNP0C31:00
device:02  device:09  device:10  device:17  device:1e  device:25  device:2c   LNXCPU:02  LNXSYSTM:00  PNP0A08:00  PNP0C02:02  PNP0C0F:04
device:03  device:0a  device:11  device:18  device:1f  device:26  device:2d   LNXCPU:03  PNP:00       PNP0B00:00  PNP0C04:00  PNP0C0F:05
device:04  device:0b  device:12  device:19  device:20  device:27  device:2e   LNXCPU:04  PNP0100:00   PNP0C01:00  PNP0C0C:00  PNP0C0F:06
device:05  device:0c  device:13  device:1a  device:21  device:28  device:2f   LNXCPU:05  PNP0103:00   PNP0C01:01  PNP0C0F:00  PNP0C0F:07
device:06  device:0d  device:14  device:1b  device:22  device:29  INT3F0D:00  LNXCPU:06  PNP0200:00   PNP0C01:02  PNP0C0F:01  PNP0C14:00

So what am I missing? Thanks.


It depends on the hardware. It seems that your system does not support
memory hotplug. If you use KVM, you can try memory hotplug on a KVM
guest by applying Vasilis' patch set.

http://lists.gnu.org/archive/html/qemu-devel/2012-07/msg01389.html
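
A quick way to check for memory devices is to filter that directory listing for the PNP0C80 prefix. The Python sketch below does only that; find_memory_devices() is a hypothetical helper name, and the directory is a parameter so the function can be tried against a test directory.

```python
import os


def find_memory_devices(acpi_devices_dir="/sys/bus/acpi/devices"):
    """Return sorted names of ACPI memory devices (PNP0C80:XX) in the directory."""
    try:
        names = os.listdir(acpi_devices_dir)
    except FileNotFoundError:
        # No such directory: report no memory devices rather than failing.
        return []
    return sorted(name for name in names if name.startswith("PNP0C80:"))
```

An empty result on a running system matches the listing above: no memory device is exposed, so there is nothing to hot-remove.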

Thanks,
Yasuaki Ishimatsu

3. hotplug the memory device (how to do this depends on your hardware)
You will see the memory device under the directory /sys/bus/acpi/devices/.
Its name is PNP0C80:XX.
4. online/offline pages provided by this memory device
You can write online/offline to /sys/devices/system/memory/memoryX/state to
online/offline pages provided by this memory device.
5. hotremove the memory device
You can hotremove the memory device through the hardware, or by writing 1 to
/sys/bus/acpi/devices/PNP0C80:XX/eject.

Note: if the memory provided by the memory device is used by the kernel, it
can't be offlined. This is not a bug.
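
Steps 4 and 5 could be scripted roughly as follows. This is a sketch under assumptions: the helper names are made up, the caller must already know which memoryX blocks belong to the device, and a failed write to the state file is taken to mean that the block could not be offlined.

```python
from pathlib import Path


def set_block_state(block, state, memory_dir="/sys/devices/system/memory"):
    """Write 'online' or 'offline' to memoryX/state; return True on success."""
    if state not in ("online", "offline"):
        raise ValueError("state must be 'online' or 'offline'")
    try:
        (Path(memory_dir) / block / "state").write_text(state)
    except OSError:
        # Assumed meaning: the request was rejected, e.g. the memory is in use.
        return False
    return True


def eject(device, acpi_devices_dir="/sys/bus/acpi/devices"):
    """Write 1 to the device's eject file (step 5)."""
    (Path(acpi_devices_dir) / device / "eject").write_text("1")


def offline_then_eject(device, blocks, memory_dir, acpi_devices_dir):
    """Eject only if every listed block could be offlined first."""
    for block in blocks:
        if not set_block_state(block, "offline", memory_dir):
            return False
    eject(device, acpi_devices_dir)
    return True
```

Offlining by hand first, and ejecting only when every block went offline, avoids relying on the automatic offline during hot-remove.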

Known problems:
1. memory can't be offlined when CONFIG_MEMCG is selected.
For example: there is a memory device on node 1. The address range
is [1G, 1.5G). You will find 4 new directories memory8, memory9, memory10,
and memory11 under the directory /sys/devices/system/memory/.
If CONFIG_MEMCG is selected, we allocate memory to store the page cgroup
when we online pages. When we online memory8, the memory that stores its
page cgroup is not provided by this memory device. But when we online
memory9, the memory that stores its page cgroup may be provided by memory8,
so we can no longer offline memory8. We should offline the memory in the
reverse order.
When the memory device is hotremoved, we automatically offline the memory
provided by this memory device. But we don't know which memory was onlined
first, so offlining memory may fail. In that case, you should offline the
memory by hand before hotremoving the memory device.
2. hotremoving a memory device may cause a kernel panic
This bug will be fixed by Liu Jiang's patch:
https://lkml.org/lkml/2012/7/3/1
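
For known problem 1, the manual workaround amounts to offlining the blocks in the reverse of the order in which they were onlined. The Python sketch below assumes the blocks were onlined in ascending index order (memory8 first), which may not hold on a given system; block_index() and offline_order() are hypothetical names.

```python
import re


def block_index(name):
    """Extract the numeric suffix from a name such as 'memory10'."""
    match = re.fullmatch(r"memory(\d+)", name)
    if match is None:
        raise ValueError(f"not a memory block name: {name!r}")
    return int(match.group(1))


def offline_order(blocks):
    """Return blocks highest index first, the reverse of ascending onlining."""
    return sorted(blocks, key=block_index, reverse=True)
```

Sorting on the numeric suffix matters here: a plain string sort would place memory10 before memory8.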

change log of v9:
  [RFC PATCH v9 8/21]
* add a lock to protect the list map_entries
* add an indicator to firmware_map_entry to remember whether the memory
  is allocated from bootmem
  [RFC PATCH v9 10/21]
* change the macro to inline function
  [RFC PATCH v9 19/21]
* don't offline the node if the cpu on the node is onlined
  [RFC PATCH v9 21/21]
* create new patch: auto offline page_cgroup when onlining memory block
  failed

Re: [RFC v9 PATCH 00/21] memory-hotplug: hot-remove physical memory

2012-10-01 Thread Ni zhan Chen

On 10/01/2012 12:44 PM, Yasuaki Ishimatsu wrote:

Hi Chen,

2012/09/29 17:19, Ni zhan Chen wrote:

On 09/05/2012 05:25 PM, we...@cn.fujitsu.com wrote:

From: Wen Congyang 

This patch series aims to support physical memory hot-remove.

The patches can free/remove the following things:

   - acpi_memory_info  : [RFC PATCH 4/19]
   - /sys/firmware/memmap/X/{end, start, type} : [RFC PATCH 8/19]
   - iomem_resource: [RFC PATCH 9/19]
   - mem_section and related sysfs files   : [RFC PATCH 10-11, 13-16/19]
   - page table of removed memory  : [RFC PATCH 12/19]
   - node and related sysfs files  : [RFC PATCH 18-19/19]

If you find any functionality missing for physical memory hot-remove, please
let me know.

How to test this patchset?
1. apply this patchset and build the kernel. MEMORY_HOTPLUG, MEMORY_HOTREMOVE,
ACPI_HOTPLUG_MEMORY must be selected.
2. load the module acpi_memhotplug


Hi Yasuaki,

where is the acpi_memhotplug module?


If you build acpi_memhotplug as a module, it is created under the
/lib/modules//driver/acpi/ directory. Whether it is a module depends on the
config option ACPI_HOTPLUG_MEMORY. If the option is set to [*], the driver is
built in, so you don't need to worry about loading it.
Thanks,
Yasuaki Ishimatsu


Hi Yasuaki,

I built the kernel with MEMORY_HOTPLUG, MEMORY_HOTREMOVE and
ACPI_HOTPLUG_MEMORY selected as [*], but I can't find PNP0C80:XX under the
directory /sys/bus/acpi/devices/.


[root@localhost ~]# ls /sys/bus/acpi/devices/
device:00  device:07  device:0e  device:15  device:1c  device:23  device:2a   LNXCPU:00  LNXCPU:07    PNP0501:00  PNP0C02:00  PNP0C0F:02  PNP0C14:01
device:01  device:08  device:0f  device:16  device:1d  device:24  device:2b   LNXCPU:01  LNXPWRBN:00  PNP0800:00  PNP0C02:01  PNP0C0F:03  PNP0C31:00
device:02  device:09  device:10  device:17  device:1e  device:25  device:2c   LNXCPU:02  LNXSYSTM:00  PNP0A08:00  PNP0C02:02  PNP0C0F:04
device:03  device:0a  device:11  device:18  device:1f  device:26  device:2d   LNXCPU:03  PNP:00       PNP0B00:00  PNP0C04:00  PNP0C0F:05
device:04  device:0b  device:12  device:19  device:20  device:27  device:2e   LNXCPU:04  PNP0100:00   PNP0C01:00  PNP0C0C:00  PNP0C0F:06
device:05  device:0c  device:13  device:1a  device:21  device:28  device:2f   LNXCPU:05  PNP0103:00   PNP0C01:01  PNP0C0F:00  PNP0C0F:07
device:06  device:0d  device:14  device:1b  device:22  device:29  INT3F0D:00  LNXCPU:06  PNP0200:00   PNP0C01:02  PNP0C0F:01  PNP0C14:00


So what am I missing? Thanks.

3. hotplug the memory device (how to do this depends on your hardware)
You will see the memory device under the directory /sys/bus/acpi/devices/.
Its name is PNP0C80:XX.
4. online/offline pages provided by this memory device
You can write online/offline to /sys/devices/system/memory/memoryX/state to
online/offline pages provided by this memory device.
5. hotremove the memory device
You can hotremove the memory device through the hardware, or by writing 1 to
/sys/bus/acpi/devices/PNP0C80:XX/eject.

Note: if the memory provided by the memory device is used by the kernel, it
can't be offlined. This is not a bug.

Known problems:
1. memory can't be offlined when CONFIG_MEMCG is selected.
For example: there is a memory device on node 1. The address range
is [1G, 1.5G). You will find 4 new directories memory8, memory9, memory10,
and memory11 under the directory /sys/devices/system/memory/.
If CONFIG_MEMCG is selected, we allocate memory to store the page cgroup
when we online pages. When we online memory8, the memory that stores its
page cgroup is not provided by this memory device. But when we online
memory9, the memory that stores its page cgroup may be provided by memory8,
so we can no longer offline memory8. We should offline the memory in the
reverse order.
When the memory device is hotremoved, we automatically offline the memory
provided by this memory device. But we don't know which memory was onlined
first, so offlining memory may fail. In that case, you should offline the
memory by hand before hotremoving the memory device.
2. hotremoving a memory device may cause a kernel panic
This bug will be fixed by Liu Jiang's patch:
https://lkml.org/lkml/2012/7/3/1

change log of v9:
  [RFC PATCH v9 8/21]
* add a lock to protect the list map_entries
* add an indicator to firmware_map_entry to remember whether the memory
  is allocated from bootmem
  [RFC PATCH v9 10/21]
* change the macro to inline function
  [RFC PATCH v9 19/21]
* don't offline the node if the cpu on the node is onlined
  [RFC PATCH v9 21/21]
* create new patch: auto offline page_cgroup when onlining memory block
  failed

change log of v8:
  [RFC PATCH v8 17/20]
* Fix problems when one node's range includes the other nodes
  [RFC PATCH v8 18/20]
* fix building error when CONFIG_MEMORY_HOTPLUG_SPARSE or CONFIG_HUGETLBFS
  is not defined.
  [RFC PATCH v8 19/20]
* don't offline node

Re: [RFC v9 PATCH 00/21] memory-hotplug: hot-remove physical memory

2012-09-30 Thread Yasuaki Ishimatsu

Hi Chen,

2012/09/29 17:19, Ni zhan Chen wrote:

On 09/05/2012 05:25 PM, we...@cn.fujitsu.com wrote:

From: Wen Congyang 

This patch series aims to support physical memory hot-remove.

The patches can free/remove the following things:

   - acpi_memory_info  : [RFC PATCH 4/19]
   - /sys/firmware/memmap/X/{end, start, type} : [RFC PATCH 8/19]
   - iomem_resource: [RFC PATCH 9/19]
   - mem_section and related sysfs files   : [RFC PATCH 10-11, 13-16/19]
   - page table of removed memory  : [RFC PATCH 12/19]
   - node and related sysfs files  : [RFC PATCH 18-19/19]

If you find any functionality missing for physical memory hot-remove, please
let me know.

How to test this patchset?
1. apply this patchset and build the kernel. MEMORY_HOTPLUG, MEMORY_HOTREMOVE,
ACPI_HOTPLUG_MEMORY must be selected.
2. load the module acpi_memhotplug


Hi Yasuaki,

where is the acpi_memhotplug module?


If you build acpi_memhotplug as a module, it is created under the
/lib/modules//driver/acpi/ directory. Whether it is a module depends on the
config option ACPI_HOTPLUG_MEMORY. If the option is set to [*], the driver is
built in, so you don't need to worry about loading it.


Thanks,
Yasuaki Ishimatsu
3. hotplug the memory device (how to do this depends on your hardware)
You will see the memory device under the directory /sys/bus/acpi/devices/.
Its name is PNP0C80:XX.
4. online/offline pages provided by this memory device
You can write online/offline to /sys/devices/system/memory/memoryX/state to
online/offline pages provided by this memory device.
5. hotremove the memory device
You can hotremove the memory device through the hardware, or by writing 1 to
/sys/bus/acpi/devices/PNP0C80:XX/eject.

Note: if the memory provided by the memory device is used by the kernel, it
can't be offlined. This is not a bug.

Known problems:
1. memory can't be offlined when CONFIG_MEMCG is selected.
For example: there is a memory device on node 1. The address range
is [1G, 1.5G). You will find 4 new directories memory8, memory9, memory10,
and memory11 under the directory /sys/devices/system/memory/.
If CONFIG_MEMCG is selected, we allocate memory to store the page cgroup
when we online pages. When we online memory8, the memory that stores its
page cgroup is not provided by this memory device. But when we online
memory9, the memory that stores its page cgroup may be provided by memory8,
so we can no longer offline memory8. We should offline the memory in the
reverse order.
When the memory device is hotremoved, we automatically offline the memory
provided by this memory device. But we don't know which memory was onlined
first, so offlining memory may fail. In that case, you should offline the
memory by hand before hotremoving the memory device.
2. hotremoving a memory device may cause a kernel panic
This bug will be fixed by Liu Jiang's patch:
https://lkml.org/lkml/2012/7/3/1

change log of v9:
  [RFC PATCH v9 8/21]
* add a lock to protect the list map_entries
* add an indicator to firmware_map_entry to remember whether the memory
  is allocated from bootmem
  [RFC PATCH v9 10/21]
* change the macro to inline function
  [RFC PATCH v9 19/21]
* don't offline the node if the cpu on the node is onlined
  [RFC PATCH v9 21/21]
* create new patch: auto offline page_cgroup when onlining memory block
  failed

change log of v8:
  [RFC PATCH v8 17/20]
* Fix problems when one node's range includes the other nodes
  [RFC PATCH v8 18/20]
* fix building error when CONFIG_MEMORY_HOTPLUG_SPARSE or CONFIG_HUGETLBFS
  is not defined.
  [RFC PATCH v8 19/20]
* don't offline node when some memory sections are not removed
  [RFC PATCH v8 20/20]
* create new patch: clear hwpoisoned flag when onlining pages

change log of v7:
  [RFC PATCH v7 4/19]
* do not continue if acpi_memory_device_remove_memory() fails.
  [RFC PATCH v7 15/19]
* handle usemap in register_page_bootmem_info_section() too.

change log of v6:
  [RFC PATCH v6 12/19]
* fix building error on architectures other than x86

  [RFC PATCH v6 15-16/19]
* fix building error on architectures other than x86

change log of v5:
  * merge the patchset to clear page table and the patchset to hot remove
memory (from ishimatsu) into one big patchset.

  [RFC PATCH v5 1/19]
* rename remove_memory() to offline_memory()/offline_pages()

  [RFC PATCH v5 2/19]
* new patch: implement offline_memory(). This function offlines pages,
  updates the memory block's state, and notifies userspace that the memory
  block's state has changed.

  [RFC PATCH v5 4/19]
* offline and remove memory in acpi_memory_disable_device() too.

  [RFC PATCH v5 17/19]
* new patch: add a new function __remove_zone() to revert the things done
  in the function __add_zone().

  [RFC PATCH v5 18/19]
* flush work before resetting node device.

change log of v4:
  * remove "memory-hotplug : unify argument of fir

Re: [RFC v9 PATCH 00/21] memory-hotplug: hot-remove physical memory

2012-09-29 Thread Ni zhan Chen

On 09/05/2012 05:25 PM, we...@cn.fujitsu.com wrote:

From: Wen Congyang 

This patch series aims to support physical memory hot-remove.

The patches can free/remove the following things:

   - acpi_memory_info  : [RFC PATCH 4/19]
   - /sys/firmware/memmap/X/{end, start, type} : [RFC PATCH 8/19]
   - iomem_resource: [RFC PATCH 9/19]
   - mem_section and related sysfs files   : [RFC PATCH 10-11, 13-16/19]
   - page table of removed memory  : [RFC PATCH 12/19]
   - node and related sysfs files  : [RFC PATCH 18-19/19]

If you find any functionality missing for physical memory hot-remove, please
let me know.

How to test this patchset?
1. apply this patchset and build the kernel. MEMORY_HOTPLUG, MEMORY_HOTREMOVE,
ACPI_HOTPLUG_MEMORY must be selected.
2. load the module acpi_memhotplug


Hi Yasuaki,

where is the acpi_memhotplug module?


3. hotplug the memory device (how to do this depends on your hardware)
You will see the memory device under the directory /sys/bus/acpi/devices/.
Its name is PNP0C80:XX.
4. online/offline pages provided by this memory device
You can write online/offline to /sys/devices/system/memory/memoryX/state to
online/offline pages provided by this memory device.
5. hotremove the memory device
You can hotremove the memory device through the hardware, or by writing 1 to
/sys/bus/acpi/devices/PNP0C80:XX/eject.

Note: if the memory provided by the memory device is used by the kernel, it
can't be offlined. This is not a bug.

Known problems:
1. memory can't be offlined when CONFIG_MEMCG is selected.
For example: there is a memory device on node 1. The address range
is [1G, 1.5G). You will find 4 new directories memory8, memory9, memory10,
and memory11 under the directory /sys/devices/system/memory/.
If CONFIG_MEMCG is selected, we allocate memory to store the page cgroup
when we online pages. When we online memory8, the memory that stores its
page cgroup is not provided by this memory device. But when we online
memory9, the memory that stores its page cgroup may be provided by memory8,
so we can no longer offline memory8. We should offline the memory in the
reverse order.
When the memory device is hotremoved, we automatically offline the memory
provided by this memory device. But we don't know which memory was onlined
first, so offlining memory may fail. In that case, you should offline the
memory by hand before hotremoving the memory device.
2. hotremoving a memory device may cause a kernel panic
This bug will be fixed by Liu Jiang's patch:
https://lkml.org/lkml/2012/7/3/1

change log of v9:
  [RFC PATCH v9 8/21]
* add a lock to protect the list map_entries
* add an indicator to firmware_map_entry to remember whether the memory
  is allocated from bootmem
  [RFC PATCH v9 10/21]
* change the macro to inline function
  [RFC PATCH v9 19/21]
* don't offline the node if the cpu on the node is onlined
  [RFC PATCH v9 21/21]
* create new patch: auto offline page_cgroup when onlining memory block
  failed

change log of v8:
  [RFC PATCH v8 17/20]
* Fix problems when one node's range includes the other nodes
  [RFC PATCH v8 18/20]
* fix building error when CONFIG_MEMORY_HOTPLUG_SPARSE or CONFIG_HUGETLBFS
  is not defined.
  [RFC PATCH v8 19/20]
* don't offline node when some memory sections are not removed
  [RFC PATCH v8 20/20]
* create new patch: clear hwpoisoned flag when onlining pages

change log of v7:
  [RFC PATCH v7 4/19]
* do not continue if acpi_memory_device_remove_memory() fails.
  [RFC PATCH v7 15/19]
* handle usemap in register_page_bootmem_info_section() too.

change log of v6:
  [RFC PATCH v6 12/19]
* fix building error on architectures other than x86

  [RFC PATCH v6 15-16/19]
* fix building error on architectures other than x86

change log of v5:
  * merge the patchset to clear page table and the patchset to hot remove
memory(from ishimatsu) to one big patchset.

  [RFC PATCH v5 1/19]
* rename remove_memory() to offline_memory()/offline_pages()

  [RFC PATCH v5 2/19]
* new patch: implement offline_memory(). This function offlines pages,
  updates the memory block's state, and notifies userspace that the memory
  block's state has changed.

  [RFC PATCH v5 4/19]
* offline and remove memory in acpi_memory_disable_device() too.

  [RFC PATCH v5 17/19]
* new patch: add a new function __remove_zone() to revert the things done
  in the function __add_zone().

  [RFC PATCH v5 18/19]
* flush work before resetting node device.

change log of v4:
  * remove "memory-hotplug : unify argument of firmware_map_add_early/hotplug"
from the patch series, since the patch is a bugfix. It is being discussed
on another thread. But for testing the patch series, the patch is needed.
So I added the patch as [PATCH 0/13].

  [RFC PATCH v4 2/13]
* check memory is online or not at remove_memory()
*

Re: [RFC v9 PATCH 00/21] memory-hotplug: hot-remove physical memory

2012-09-28 Thread Ni zhan Chen

On 09/05/2012 05:25 PM, we...@cn.fujitsu.com wrote:

From: Wen Congyang 

This patch series aims to support physical memory hot-remove.

The patches can free/remove the following things:

   - acpi_memory_info  : [RFC PATCH 4/19]
   - /sys/firmware/memmap/X/{end, start, type} : [RFC PATCH 8/19]
   - iomem_resource: [RFC PATCH 9/19]
   - mem_section and related sysfs files   : [RFC PATCH 10-11, 13-16/19]
   - page table of removed memory  : [RFC PATCH 12/19]
   - node and related sysfs files  : [RFC PATCH 18-19/19]

If you find any functionality missing for physical memory hot-remove, please
let me know.


Since the patchset is so big, could you add more to the patchset changelog
describing how this patchset works, so that it is easier to review?

How to test this patchset?
1. apply this patchset and build the kernel. MEMORY_HOTPLUG, MEMORY_HOTREMOVE,
and ACPI_HOTPLUG_MEMORY must be selected.
2. load the module acpi_memhotplug
3. hotplug the memory device (how to do this depends on your hardware)
You will see the memory device under the directory /sys/bus/acpi/devices/.
Its name is PNP0C80:XX.
4. online/offline pages provided by this memory device
You can write online/offline to /sys/devices/system/memory/memoryX/state to
online/offline the pages provided by this memory device.
5. hotremove the memory device
You can hotremove the memory device through the hardware, or by writing 1 to
/sys/bus/acpi/devices/PNP0C80:XX/eject.

Note: if the memory provided by the memory device is used by the kernel, it
can't be offlined. It is not a bug.
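
Steps 4 and 5 above can be sketched as two small shell helpers. This is only
an illustration, not part of the patchset: the function names and the
SYSFS_ROOT variable are hypothetical (SYSFS_ROOT exists so the helpers can be
pointed at a scratch directory for a dry run; on a real system it is /sys and
root is required). The block number X and the device name PNP0C80:XX must be
supplied by the caller.

```shell
#!/bin/sh
# Hypothetical override; defaults to the real sysfs mount point.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Step 4: write "online" or "offline" to memoryX/state.
# $1 = block number X, $2 = online|offline
set_block_state() {
    case "$2" in
        online|offline) ;;
        *) echo "bad state: $2" >&2; return 2 ;;
    esac
    state_file="$SYSFS_ROOT/devices/system/memory/memory$1/state"
    if [ ! -w "$state_file" ]; then
        echo "no writable state file for memory$1" >&2
        return 1
    fi
    echo "$2" > "$state_file"
}

# Step 5: write 1 to the eject file of the ACPI memory device.
# $1 = device name as listed under /sys/bus/acpi/devices/
eject_memory_device() {
    eject_file="$SYSFS_ROOT/bus/acpi/devices/$1/eject"
    if [ ! -e "$eject_file" ]; then
        echo "no eject file for device $1" >&2
        return 1
    fi
    echo 1 > "$eject_file"
}
```

With these defined, "set_block_state 8 offline" followed by
"eject_memory_device <name>" mirrors steps 4 and 5. The write to the state
file can still fail if the kernel is using the memory, as the note above says.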

Known problems:
1. memory can't be offlined when CONFIG_MEMCG is selected.
For example: there is a memory device on node 1. The address range
is [1G, 1.5G). You will find 4 new directories memory8, memory9, memory10,
and memory11 under the directory /sys/devices/system/memory/.
If CONFIG_MEMCG is selected, we allocate memory to store page cgroup
when we online pages. When we online memory8, the memory that stores its
page cgroup is not provided by this memory device. But when we online
memory9, the memory that stores its page cgroup may be provided by memory8.
So we can't offline memory8 at that point. We should offline the memory in
the reverse order of onlining.
When the memory device is hotremoved, we automatically offline the memory
provided by this memory device. But we don't know which memory was onlined
first, so offlining memory may fail. In that case, you should offline the
memory by hand before hotremoving the memory device.
2. hotremoving a memory device may cause a kernel panic
This bug will be fixed by Liu Jiang's patch:
https://lkml.org/lkml/2012/7/3/1
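
For known problem 1, offlining by hand in reverse order could look like the
sketch below. It assumes the blocks were onlined in ascending numeric order
(memory8 first, memory11 last), so descending order is the reverse; if they
were onlined in some other order, this does not apply. The function names are
hypothetical, and the directory is passed as an argument so the loop can be
tried against a scratch directory before touching /sys/devices/system/memory.

```shell
#!/bin/sh
# Print the numbers of all memoryN directories under $1, highest first.
list_blocks_desc() {
    for d in "$1"/memory[0-9]*; do
        [ -d "$d" ] || continue
        echo "${d##*/memory}"
    done | sort -rn
}

# Offline blocks $2..$3 (inclusive) under directory $1, highest number first.
# Stops at the first block that cannot be offlined.
offline_range_reversed() {
    for n in $(list_blocks_desc "$1"); do
        [ "$n" -ge "$2" ] && [ "$n" -le "$3" ] || continue
        echo "offlining memory$n"
        echo offline > "$1/memory$n/state" || return 1
    done
}
```

For the example device above, this would be called as
"offline_range_reversed /sys/devices/system/memory 8 11" before writing to the
eject file.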

change log of v9:
  [RFC PATCH v9 8/21]
* add a lock to protect the list map_entries
* add an indicator to firmware_map_entry to remember whether the memory
  is allocated from bootmem
  [RFC PATCH v9 10/21]
* change the macro to inline function
  [RFC PATCH v9 19/21]
* don't offline the node if the cpu on the node is onlined
  [RFC PATCH v9 21/21]
* create new patch: auto offline page_cgroup when onlining memory block
  failed

change log of v8:
  [RFC PATCH v8 17/20]
* Fix problems when one node's range includes the other nodes
  [RFC PATCH v8 18/20]
* fix building error when CONFIG_MEMORY_HOTPLUG_SPARSE or CONFIG_HUGETLBFS
  is not defined.
  [RFC PATCH v8 19/20]
* don't offline node when some memory sections are not removed
  [RFC PATCH v8 20/20]
* create new patch: clear hwpoisoned flag when onlining pages

change log of v7:
  [RFC PATCH v7 4/19]
* do not continue if acpi_memory_device_remove_memory() fails.
  [RFC PATCH v7 15/19]
* handle usemap in register_page_bootmem_info_section() too.

change log of v6:
  [RFC PATCH v6 12/19]
* fix building error on architectures other than x86

  [RFC PATCH v6 15-16/19]
* fix building error on architectures other than x86

change log of v5:
  * merge the patchset to clear page table and the patchset to hot remove
memory(from ishimatsu) to one big patchset.

  [RFC PATCH v5 1/19]
* rename remove_memory() to offline_memory()/offline_pages()

  [RFC PATCH v5 2/19]
* new patch: implement offline_memory(). This function offlines pages,
  updates the memory block's state, and notifies userspace that the memory
  block's state has changed.

  [RFC PATCH v5 4/19]
* offline and remove memory in acpi_memory_disable_device() too.

  [RFC PATCH v5 17/19]
* new patch: add a new function __remove_zone() to revert the things done
  in the function __add_zone().

  [RFC PATCH v5 18/19]
* flush work before resetting node device.

change log of v4:
  * remove "memory-hotplug : unify argument of firmware_map_add_early/hotplug"
from the patch series, since the patch is a bugfix. It is being discussed
on another thread. But for testing the patch series, the patch is needed.
So I added the patch as [PATC

Re: [RFC v9 PATCH 00/21] memory-hotplug: hot-remove physical memory

2012-09-27 Thread Wen Congyang
At 09/27/2012 06:35 PM, Vasilis Liaskovitis Wrote:
> On Thu, Sep 27, 2012 at 02:37:14PM +0800, Wen Congyang wrote:
>> Hi Vasilis Liaskovitis
>>
>> At 09/27/2012 12:46 AM, Vasilis Liaskovitis Wrote:
>>> Hi,
>>>
>>> I am testing 3.6.0-rc7 with this v9 patchset plus more recent fixes 
>>> [1],[2],[3]
>>> Running in a guest (qemu+seabios from [4]). 
>>> CONFIG_SLAB=y
>>> CONFIG_DEBUG_SLAB=y
>>>
>>> After successful hot-add and online, I am doing a hot-remove with "echo 1 > 
>>> /sys/bus/acpi/devices/PNP/eject"
>>> When I do the OSPM-eject, I often get slab corruption in "acpi-state" 
>>> cache, or in other caches
>>
>> I can't reproduce this problem. Can you provide the following information:
>> 1. config file
>> 2. qemu's command line
>>
>> You said you did OSPM-eject. Do you mean write 1 to 
>> /sys/bus/acpi/devices/PNP0C80:XX/eject?
> yes.
> 
> example qemu command line with one dimm:
> 
> "/opt/qemu-kvm-memhp/bin/qemu-system-x86_64 -bios
> /opt/extra/vliaskov/devel/seabios-upstream/out/bios.bin -enable-kvm -M pc -smp
> 4,maxcpus=8 -cpu host -m 2048 -drive 
> file=/opt/extra/debian-template.raw,if=none,id=drive-virtio-disk0,format=raw
> -device 
> virtio-blk-pci,bus=pci.0,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
> -vga cirrus -netdev type=tap,id=guest0,vhost=on -device 
> virtio-net-pci,netdev=guest0
> -monitor unix:/tmp/qemu.monitor11,server,nowait -chardev stdio,id=seabios  
> -device
> isa-debugcon,iobase=0x402,chardev=seabios
> -dimm id=n0,size=512M,node=0"
> 
> or last line with 2 numa nodes:
> "-dimm id=n0,size=512M,node=0 -dimm id=n1,size=512M,node=1 -numa 
> node,nodeid=0 -numa node,nodeid=1"

I have reproduced this problem. It can only be reproduced when the dimm's
memory is on node 0.
I am investigating it now.

Thanks
Wen Congyang

> 
> attached config. Tree is at:
> https://github.com/vliaskov/linux/commits/memhp-fujitsu
> 
> thanks,
> - Vasilis

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v9 PATCH 00/21] memory-hotplug: hot-remove physical memory

2012-09-27 Thread Vasilis Liaskovitis
On Thu, Sep 27, 2012 at 06:06:30PM +0800, Wen Congyang wrote:
> Please try the following patch:
> From a38ec678e0a9b48b252f457d7910b7527049dc43 Mon Sep 17 00:00:00 2001
> From: Wen Congyang 
> Date: Thu, 27 Sep 2012 17:27:57 +0800
> Subject: [PATCH] clear the memory to store page information

this solves the hot re-add problem for me.
thanks for the quick solution.

- Vasilis

> 
> ---
>  mm/sparse.c |3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
> 
> diff --git a/mm/sparse.c b/mm/sparse.c
> index ab9d755..36dda08 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -639,7 +639,6 @@ static struct page *__kmalloc_section_memmap(unsigned 
> long nr_pages)
>  got_map_page:
>   ret = (struct page *)pfn_to_kaddr(page_to_pfn(page));
>  got_map_ptr:
> - memset(ret, 0, memmap_size);
>  
>   return ret;
>  }
> @@ -761,6 +760,8 @@ int __meminit sparse_add_one_section(struct zone *zone, 
> unsigned long start_pfn,
>   goto out;
>   }
>  
> + memset(memmap, 0, sizeof(struct page) * nr_pages);
> +
>   ms->section_mem_map |= SECTION_MARKED_PRESENT;
>  
>   ret = sparse_init_one_section(ms, section_nr, memmap, usemap);
> -- 
> 1.7.1
> 
> Thanks
> Wen Congyang
> 
> > 
> > thanks,
> > 
> > - Vasilis
> > 
> > [1] https://lkml.org/lkml/2012/9/6/635
> > [2] https://lkml.org/lkml/2012/9/11/542
> > [3] https://lkml.org/lkml/2012/9/20/37
> > [4] http://permalink.gmane.org/gmane.comp.emulators.kvm.devel/98691
> > 
> > 
> > 
> 


Re: [RFC v9 PATCH 00/21] memory-hotplug: hot-remove physical memory

2012-09-27 Thread Wen Congyang
At 09/27/2012 12:58 AM, Vasilis Liaskovitis Wrote:
> Testing 3.6.0-rc7 with this v9 patchset plus more recent fixes [1],[2],[3]
> Running in a guest (qemu+seabios from [4]). 
> CONFIG_SLAB=y
> CONFIG_DEBUG_SLAB=y
> 
> - successful hot-add and online
> - successful hot-remove with SCI (qemu) eject
> - attempt to hot-readd same memory
> 
> When the pages are re-onlined on hot-readd, I get a bad_page state for many
> pages e.g.
> 
> [   59.611278] init_memory_mapping: [mem 0x8000-0x9fff]
> [   59.637836] Built 2 zonelists in Node order, mobility grouping on.  Total 
> pages: 547617
> [   59.638739] Policy zone: Normal
> [   59.650840] BUG: Bad page state in process bash  pfn:9b6dc
> [   59.651124] page:ea0002200020 count:0 mapcount:0 mapping:  
> (null) index:0xfdfdfdfdfdfdfdfd
> [   59.651494] page flags: 
> 0x2fdfdfdfd5df9fd(locked|referenced|uptodate|dirty|lru|active|slab|owner_priv_1|private|private_2|writeback|head|tail|swapcache|reclaim|swapbacked|unevictable|uncached|compound_lock)
> [   59.653604] Modules linked in: netconsole acpiphp pci_hotplug 
> acpi_memhotplug loop kvm_amd kvm microcode tpm_tis tpm tpm_bios evdev psmouse 
> serio_raw i2c_piix4 i2c_core parport_pc parport processor button thermal_sys 
> ext3 jbd mbcache sg sr_mod cdrom ata_generic virtio_net ata_piix virtio_blk 
> libata virtio_pci virtio_ring virtio scsi_mod
> [   59.656998] Pid: 988, comm: bash Not tainted 3.6.0-rc7-guest #12
> [   59.657172] Call Trace:
> [   59.657275]  [] ? bad_page+0xb0/0x100
> [   59.657434]  [] ? free_pages_prepare+0xb3/0x100
> [   59.657610]  [] ? free_hot_cold_page+0x48/0x1a0
> [   59.657787]  [] ? online_pages_range+0x68/0xa0
> [   59.657961]  [] ? 
> __online_page_increment_counters+0x10/0x10
> [   59.658162]  [] ? walk_system_ram_range+0x101/0x110
> [   59.658346]  [] ? online_pages+0x1a5/0x2b0
> [   59.658515]  [] ? __memory_block_change_state+0x20d/0x270
> [   59.658710]  [] ? store_mem_state+0xb6/0xf0
> [   59.658878]  [] ? sysfs_write_file+0xd2/0x160
> [   59.659052]  [] ? vfs_write+0xaa/0x160
> [   59.659212]  [] ? sys_write+0x47/0x90
> [   59.659371]  [] ? async_page_fault+0x25/0x30
> [   59.659543]  [] ? system_call_fastpath+0x16/0x1b
> [   59.659720] Disabling lock debugging due to kernel taint
> 
> Patch 20/21 deals with a similar scenario, but only for __PG_HWPOISON flag.
> Did I miss any other patch for this?

Please try the following patch:
From a38ec678e0a9b48b252f457d7910b7527049dc43 Mon Sep 17 00:00:00 2001
From: Wen Congyang 
Date: Thu, 27 Sep 2012 17:27:57 +0800
Subject: [PATCH] clear the memory to store page information

---
 mm/sparse.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/mm/sparse.c b/mm/sparse.c
index ab9d755..36dda08 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -639,7 +639,6 @@ static struct page *__kmalloc_section_memmap(unsigned long nr_pages)
 got_map_page:
 	ret = (struct page *)pfn_to_kaddr(page_to_pfn(page));
 got_map_ptr:
-	memset(ret, 0, memmap_size);
 
 	return ret;
 }
@@ -761,6 +760,8 @@ int __meminit sparse_add_one_section(struct zone *zone, unsigned long start_pfn,
 		goto out;
 	}
 
+	memset(memmap, 0, sizeof(struct page) * nr_pages);
+
 	ms->section_mem_map |= SECTION_MARKED_PRESENT;
 
 	ret = sparse_init_one_section(ms, section_nr, memmap, usemap);
-- 
1.7.1

Thanks
Wen Congyang

> 
> thanks,
> 
> - Vasilis
> 
> [1] https://lkml.org/lkml/2012/9/6/635
> [2] https://lkml.org/lkml/2012/9/11/542
> [3] https://lkml.org/lkml/2012/9/20/37
> [4] http://permalink.gmane.org/gmane.comp.emulators.kvm.devel/98691
> 
> 
> 



Re: [RFC v9 PATCH 00/21] memory-hotplug: hot-remove physical memory

2012-09-27 Thread Wen Congyang
At 09/27/2012 12:58 AM, Vasilis Liaskovitis Wrote:
> Testing 3.6.0-rc7 with this v9 patchset plus more recent fixes [1],[2],[3]
> Running in a guest (qemu+seabios from [4]). 
> CONFIG_SLAB=y
> CONFIG_DEBUG_SLAB=y
> 
> - successful hot-add and online
> - successful hot-remove with SCI (qemu) eject
> - attempt to hot-readd same memory
> 
> When the pages are re-onlined on hot-readd, I get a bad_page state for many
> pages e.g.

I have reproduced this problem, and I am investigating it now.

Thanks
Wen Congyang

> 
> [   59.611278] init_memory_mapping: [mem 0x8000-0x9fff]
> [   59.637836] Built 2 zonelists in Node order, mobility grouping on.  Total 
> pages: 547617
> [   59.638739] Policy zone: Normal
> [   59.650840] BUG: Bad page state in process bash  pfn:9b6dc
> [   59.651124] page:ea0002200020 count:0 mapcount:0 mapping:  
> (null) index:0xfdfdfdfdfdfdfdfd
> [   59.651494] page flags: 
> 0x2fdfdfdfd5df9fd(locked|referenced|uptodate|dirty|lru|active|slab|owner_priv_1|private|private_2|writeback|head|tail|swapcache|reclaim|swapbacked|unevictable|uncached|compound_lock)
> [   59.653604] Modules linked in: netconsole acpiphp pci_hotplug 
> acpi_memhotplug loop kvm_amd kvm microcode tpm_tis tpm tpm_bios evdev psmouse 
> serio_raw i2c_piix4 i2c_core parport_pc parport processor button thermal_sys 
> ext3 jbd mbcache sg sr_mod cdrom ata_generic virtio_net ata_piix virtio_blk 
> libata virtio_pci virtio_ring virtio scsi_mod
> [   59.656998] Pid: 988, comm: bash Not tainted 3.6.0-rc7-guest #12
> [   59.657172] Call Trace:
> [   59.657275]  [] ? bad_page+0xb0/0x100
> [   59.657434]  [] ? free_pages_prepare+0xb3/0x100
> [   59.657610]  [] ? free_hot_cold_page+0x48/0x1a0
> [   59.657787]  [] ? online_pages_range+0x68/0xa0
> [   59.657961]  [] ? 
> __online_page_increment_counters+0x10/0x10
> [   59.658162]  [] ? walk_system_ram_range+0x101/0x110
> [   59.658346]  [] ? online_pages+0x1a5/0x2b0
> [   59.658515]  [] ? __memory_block_change_state+0x20d/0x270
> [   59.658710]  [] ? store_mem_state+0xb6/0xf0
> [   59.658878]  [] ? sysfs_write_file+0xd2/0x160
> [   59.659052]  [] ? vfs_write+0xaa/0x160
> [   59.659212]  [] ? sys_write+0x47/0x90
> [   59.659371]  [] ? async_page_fault+0x25/0x30
> [   59.659543]  [] ? system_call_fastpath+0x16/0x1b
> [   59.659720] Disabling lock debugging due to kernel taint
> 
> Patch 20/21 deals with a similar scenario, but only for __PG_HWPOISON flag.
> Did I miss any other patch for this?
> 
> thanks,
> 
> - Vasilis
> 
> [1] https://lkml.org/lkml/2012/9/6/635
> [2] https://lkml.org/lkml/2012/9/11/542
> [3] https://lkml.org/lkml/2012/9/20/37
> [4] http://permalink.gmane.org/gmane.comp.emulators.kvm.devel/98691
> 
> 
> 



Re: [RFC v9 PATCH 00/21] memory-hotplug: hot-remove physical memory

2012-09-26 Thread Wen Congyang
Hi Vasilis Liaskovitis

At 09/27/2012 12:46 AM, Vasilis Liaskovitis Wrote:
> Hi,
> 
> I am testing 3.6.0-rc7 with this v9 patchset plus more recent fixes 
> [1],[2],[3]
> Running in a guest (qemu+seabios from [4]). 
> CONFIG_SLAB=y
> CONFIG_DEBUG_SLAB=y
> 
> After successful hot-add and online, I am doing a hot-remove with "echo 1 > 
> /sys/bus/acpi/devices/PNP/eject"
> When I do the OSPM-eject, I often get slab corruption in "acpi-state" cache, 
> or in other caches

I can't reproduce this problem. Can you provide the following information:
1. config file
2. qemu's command line

You said you did an OSPM-eject. Do you mean you wrote 1 to
/sys/bus/acpi/devices/PNP0C80:XX/eject?

Thanks
Wen Congyang

> 
> [  170.566995] Slab corruption (Not tainted): Acpi-State 
> start=88009fc1e548, len=80
> [  170.567265] Redzone: 0x0/0x0.
> [  170.567399] Last user: [<  (null)>](0x0)
> [  170.567667] 000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
> 
> [  170.568078] 010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
> 
> [  170.568487] 020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
> 
> [  170.568894] 030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
> 
> [  170.569302] 040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
> 
> [  170.569712] Prev obj: start=9fc1e4d0, len=80
> [  170.569869] BUG: unable to handle kernel paging request at 9fc1e520
> [  170.570171] IP: [] print_objinfo+0x9c/0x110
> [  170.570397] PGD 7cf37067 PUD 0 
> [  170.570619] Oops:  [#1] SMP 
> [  170.570843] Modules linked in: netconsole acpiphp pci_hotplug 
> acpi_memhotplug loop kvm_amd kvm tpm_tis microcode tpm tpm_bios psmouse 
> parport_pc serio_raw evdev parport i2c_piix4 processor thermal_sys i2c_core 
> button ext3 jbd mbcache sg sr_mod cdrom ata_generic virtio_net virtio_blk 
> ata_piix libata scsi_mod virtio_pci virtio_ring virtio
> [  170.573474] CPU 0 
> [  170.573568] Pid: 29, comm: kworker/0:1 Not tainted 3.6.0-rc7-guest #12 
> Bochs Bochs
> [  170.573830] RIP: 0010:[]  [] 
> print_objinfo+0x9c/0x110
> [  170.574106] RSP: 0018:88003eaf3a70  EFLAGS: 00010202
> [  170.574268] RAX: 9fc1e4c8 RBX: 0002 RCX: 
> 24b8
> [  170.574468] RDX: 9fc1e4c8 RSI: 9fc1e4c8 RDI: 
> 88003e9bb980
> [  170.574668] RBP: 88003e9bb980 R08: 880037964078 R09: 
> 
> [  170.574870] R10: 021e R11: 0002 R12: 
> 9fc1e4c8
> [  170.575070] R13: 9fc1e520 R14: 004f R15: 
> ffa5
> [  170.575274] FS:  7fc6b7530700() GS:88003fc0() 
> knlGS:
> [  170.575494] CS:  0010 DS:  ES:  CR0: 8005003b
> [  170.575665] CR2: 9fc1e520 CR3: 7c9c1000 CR4: 
> 06f0
> [  170.575870] DR0:  DR1:  DR2: 
> 
> [  170.576075] DR3:  DR6: 0ff0 DR7: 
> 0400
> [  170.576276] Process kworker/0:1 (pid: 29, threadinfo 88003eaf2000, 
> task 88003ea941c0)
> [  170.576507] Stack:
> [  170.576599]  0010 01893fbe 88009fc1e000 
> 0050
> [  170.576938]  9fc1e4c8 004f ffa5 
> 8112899f
> [  170.576938]  88003eb309d8 81712d6d 88003e9bb980 
> 88009fc1e540
> [  170.576938] Call Trace:
> [  170.576938]  [] ? check_poison_obj+0x1df/0x1f0
> [  170.576938]  [] ? acpi_ut_create_generic_state+0x2f/0x4c
> [  170.576938]  [] ? acpi_ut_create_generic_state+0x2f/0x4c
> [  170.576938]  [] ? 
> cache_alloc_debugcheck_after.isra.52+0xed/0x220
> [  170.576938]  [] ? acpi_ut_create_generic_state+0x2f/0x4c
> [  170.576938]  [] ? kmem_cache_alloc+0xb5/0x1e0
> [  170.576938]  [] ? acpi_ut_create_generic_state+0x2f/0x4c
> [  170.576938]  [] ? acpi_ds_result_push+0x5d/0x12e
> [  170.576938]  [] ? acpi_ds_exec_end_op+0x28e/0x3d3
> [  170.576938]  [] ? acpi_ps_parse_loop+0x79f/0x931
> [  170.576938]  [] ? acpi_ps_parse_aml+0x89/0x261
> [  170.576938]  [] ? acpi_ps_execute_method+0x1be/0x266
> [  170.576938]  [] ? acpi_ns_evaluate+0xd3/0x19a
> [  170.576938]  [] ? acpi_evaluate_object+0xf3/0x1f4
> [  170.576938]  [] ? acpi_os_wait_events_complete+0x1b/0x1b
> [  170.576938]  [] ? acpi_bus_hot_remove_device+0xeb/0x123
> [  170.576938]  [] ? acpi_os_execute_deferred+0x1d/0x29
> [  170.576938]  [] ? process_one_work+0x125/0x560
> [  170.576938]  [] ? worker_thread+0x16a/0x4e0
> [  170.576938]  [] ? manage_workers+0x310/0x310
> [  170.576938]  [] ? kthread+0x85/0x90
> [  170.576938]  [] ? kernel_thread_helper+0x4/0x10
> [  170.576938]  [] ? flush_kthread_worker+0xa0/0xa0
> [  170.576938]  [] ? gs_change+0x13/0x13
> [  170.576938] Code: cb 75 dc 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e 41 5f c3 8b 
> 7f 0c 4c 89 e2 e8 02 fd ff ff 4c 89 e6 49 89 c5 48 89 ef e8 d4 fc ff ff <49> 
> 8b 55 00 48 8b 30 48 c7 c7 8c 39 6f

Re: [RFC v9 PATCH 00/21] memory-hotplug: hot-remove physical memory

2012-09-26 Thread Wen Congyang
At 09/27/2012 12:58 AM, Vasilis Liaskovitis Wrote:
> Testing 3.6.0-rc7 with this v9 patchset plus more recent fixes [1],[2],[3]
> Running in a guest (qemu+seabios from [4]). 
> CONFIG_SLAB=y
> CONFIG_DEBUG_SLAB=y
> 
> - successful hot-add and online
> - successful hot-remove with SCI (qemu) eject
> - attempt to hot-readd same memory
> 
> When the pages are re-onlined on hot-readd, I get a bad_page state for many
> pages e.g.

Can you provide your config file?

Thanks
Wen Congyang

> 
> [   59.611278] init_memory_mapping: [mem 0x8000-0x9fff]
> [   59.637836] Built 2 zonelists in Node order, mobility grouping on.  Total 
> pages: 547617
> [   59.638739] Policy zone: Normal
> [   59.650840] BUG: Bad page state in process bash  pfn:9b6dc
> [   59.651124] page:ea0002200020 count:0 mapcount:0 mapping:  
> (null) index:0xfdfdfdfdfdfdfdfd
> [   59.651494] page flags: 
> 0x2fdfdfdfd5df9fd(locked|referenced|uptodate|dirty|lru|active|slab|owner_priv_1|private|private_2|writeback|head|tail|swapcache|reclaim|swapbacked|unevictable|uncached|compound_lock)
> [   59.653604] Modules linked in: netconsole acpiphp pci_hotplug 
> acpi_memhotplug loop kvm_amd kvm microcode tpm_tis tpm tpm_bios evdev psmouse 
> serio_raw i2c_piix4 i2c_core parport_pc parport processor button thermal_sys 
> ext3 jbd mbcache sg sr_mod cdrom ata_generic virtio_net ata_piix virtio_blk 
> libata virtio_pci virtio_ring virtio scsi_mod
> [   59.656998] Pid: 988, comm: bash Not tainted 3.6.0-rc7-guest #12
> [   59.657172] Call Trace:
> [   59.657275]  [] ? bad_page+0xb0/0x100
> [   59.657434]  [] ? free_pages_prepare+0xb3/0x100
> [   59.657610]  [] ? free_hot_cold_page+0x48/0x1a0
> [   59.657787]  [] ? online_pages_range+0x68/0xa0
> [   59.657961]  [] ? 
> __online_page_increment_counters+0x10/0x10
> [   59.658162]  [] ? walk_system_ram_range+0x101/0x110
> [   59.658346]  [] ? online_pages+0x1a5/0x2b0
> [   59.658515]  [] ? __memory_block_change_state+0x20d/0x270
> [   59.658710]  [] ? store_mem_state+0xb6/0xf0
> [   59.658878]  [] ? sysfs_write_file+0xd2/0x160
> [   59.659052]  [] ? vfs_write+0xaa/0x160
> [   59.659212]  [] ? sys_write+0x47/0x90
> [   59.659371]  [] ? async_page_fault+0x25/0x30
> [   59.659543]  [] ? system_call_fastpath+0x16/0x1b
> [   59.659720] Disabling lock debugging due to kernel taint
> 
> Patch 20/21 deals with a similar scenario, but only for __PG_HWPOISON flag.
> Did I miss any other patch for this?
> 
> thanks,
> 
> - Vasilis
> 
> [1] https://lkml.org/lkml/2012/9/6/635
> [2] https://lkml.org/lkml/2012/9/11/542
> [3] https://lkml.org/lkml/2012/9/20/37
> [4] http://permalink.gmane.org/gmane.comp.emulators.kvm.devel/98691
> 
> 
> 



Re: [RFC v9 PATCH 00/21] memory-hotplug: hot-remove physical memory

2012-09-26 Thread Wen Congyang
At 09/27/2012 12:46 AM, Vasilis Liaskovitis Wrote:
> Hi,
> 
> I am testing 3.6.0-rc7 with this v9 patchset plus more recent fixes 
> [1],[2],[3]
> Running in a guest (qemu+seabios from [4]). 
> CONFIG_SLAB=y
> CONFIG_DEBUG_SLAB=y
> 
> After successful hot-add and online, I am doing a hot-remove with "echo 1 > 
> /sys/bus/acpi/devices/PNP/eject"
> When I do the OSPM-eject, I often get slab corruption in "acpi-state" cache, 
> or in other caches
> 
> [  170.566995] Slab corruption (Not tainted): Acpi-State 
> start=88009fc1e548, len=80
> [  170.567265] Redzone: 0x0/0x0.
> [  170.567399] Last user: [<  (null)>](0x0)
> [  170.567667] 000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
> 
> [  170.568078] 010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
> 
> [  170.568487] 020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
> 
> [  170.568894] 030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
> 
> [  170.569302] 040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
> 
> [  170.569712] Prev obj: start=9fc1e4d0, len=80
> [  170.569869] BUG: unable to handle kernel paging request at 9fc1e520
> [  170.570171] IP: [] print_objinfo+0x9c/0x110
> [  170.570397] PGD 7cf37067 PUD 0 
> [  170.570619] Oops:  [#1] SMP 
> [  170.570843] Modules linked in: netconsole acpiphp pci_hotplug 
> acpi_memhotplug loop kvm_amd kvm tpm_tis microcode tpm tpm_bios psmouse 
> parport_pc serio_raw evdev parport i2c_piix4 processor thermal_sys i2c_core 
> button ext3 jbd mbcache sg sr_mod cdrom ata_generic virtio_net virtio_blk 
> ata_piix libata scsi_mod virtio_pci virtio_ring virtio
> [  170.573474] CPU 0 
> [  170.573568] Pid: 29, comm: kworker/0:1 Not tainted 3.6.0-rc7-guest #12 
> Bochs Bochs
> [  170.573830] RIP: 0010:[]  [] 
> print_objinfo+0x9c/0x110
> [  170.574106] RSP: 0018:88003eaf3a70  EFLAGS: 00010202
> [  170.574268] RAX: 9fc1e4c8 RBX: 0002 RCX: 
> 24b8
> [  170.574468] RDX: 9fc1e4c8 RSI: 9fc1e4c8 RDI: 
> 88003e9bb980
> [  170.574668] RBP: 88003e9bb980 R08: 880037964078 R09: 
> 
> [  170.574870] R10: 021e R11: 0002 R12: 
> 9fc1e4c8
> [  170.575070] R13: 9fc1e520 R14: 004f R15: 
> ffa5
> [  170.575274] FS:  7fc6b7530700() GS:88003fc0() 
> knlGS:
> [  170.575494] CS:  0010 DS:  ES:  CR0: 8005003b
> [  170.575665] CR2: 9fc1e520 CR3: 7c9c1000 CR4: 
> 06f0
> [  170.575870] DR0:  DR1:  DR2: 
> 
> [  170.576075] DR3:  DR6: 0ff0 DR7: 
> 0400
> [  170.576276] Process kworker/0:1 (pid: 29, threadinfo 88003eaf2000, 
> task 88003ea941c0)
> [  170.576507] Stack:
> [  170.576599]  0010 01893fbe 88009fc1e000 
> 0050
> [  170.576938]  9fc1e4c8 004f ffa5 
> 8112899f
> [  170.576938]  88003eb309d8 81712d6d 88003e9bb980 
> 88009fc1e540
> [  170.576938] Call Trace:
> [  170.576938]  [] ? check_poison_obj+0x1df/0x1f0
> [  170.576938]  [] ? acpi_ut_create_generic_state+0x2f/0x4c
> [  170.576938]  [] ? acpi_ut_create_generic_state+0x2f/0x4c
> [  170.576938]  [] ? 
> cache_alloc_debugcheck_after.isra.52+0xed/0x220
> [  170.576938]  [] ? acpi_ut_create_generic_state+0x2f/0x4c
> [  170.576938]  [] ? kmem_cache_alloc+0xb5/0x1e0
> [  170.576938]  [] ? acpi_ut_create_generic_state+0x2f/0x4c
> [  170.576938]  [] ? acpi_ds_result_push+0x5d/0x12e
> [  170.576938]  [] ? acpi_ds_exec_end_op+0x28e/0x3d3
> [  170.576938]  [] ? acpi_ps_parse_loop+0x79f/0x931
> [  170.576938]  [] ? acpi_ps_parse_aml+0x89/0x261
> [  170.576938]  [] ? acpi_ps_execute_method+0x1be/0x266
> [  170.576938]  [] ? acpi_ns_evaluate+0xd3/0x19a
> [  170.576938]  [] ? acpi_evaluate_object+0xf3/0x1f4
> [  170.576938]  [] ? acpi_os_wait_events_complete+0x1b/0x1b
> [  170.576938]  [] ? acpi_bus_hot_remove_device+0xeb/0x123
> [  170.576938]  [] ? acpi_os_execute_deferred+0x1d/0x29
> [  170.576938]  [] ? process_one_work+0x125/0x560
> [  170.576938]  [] ? worker_thread+0x16a/0x4e0
> [  170.576938]  [] ? manage_workers+0x310/0x310
> [  170.576938]  [] ? kthread+0x85/0x90
> [  170.576938]  [] ? kernel_thread_helper+0x4/0x10
> [  170.576938]  [] ? flush_kthread_worker+0xa0/0xa0
> [  170.576938]  [] ? gs_change+0x13/0x13
> [  170.576938] Code: cb 75 dc 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e 41 5f c3 8b 
> 7f 0c 4c 89 e2 e8 02 fd ff ff 4c 89 e6 49 89 c5 48 89 ef e8 d4 fc ff ff <49> 
> 8b 55 00 48 8b 30 48 c7 c7 8c 39 6f 81 31 c0 e8 3e 34 3b 00 
> 
> Other times, the problem happens on a slab object free:
> 
> [   52.313366] Offlined Pages 32768
> [   52.800232] slab error in verify_redzone_free(): cache `Acpi-ParseExt': 
> memory outside object was overwritten
> [   52

Re: [RFC v9 PATCH 00/21] memory-hotplug: hot-remove physical memory

2012-09-26 Thread Vasilis Liaskovitis
Testing 3.6.0-rc7 with this v9 patchset plus more recent fixes [1],[2],[3]
Running in a guest (qemu+seabios from [4]). 
CONFIG_SLAB=y
CONFIG_DEBUG_SLAB=y

- successful hot-add and online
- successful hot-remove with SCI (qemu) eject
- attempt to hot-readd same memory

When the pages are re-onlined on hot-readd, I get a bad_page state for many
pages e.g.

[   59.611278] init_memory_mapping: [mem 0x8000-0x9fff]
[   59.637836] Built 2 zonelists in Node order, mobility grouping on.  Total 
pages: 547617
[   59.638739] Policy zone: Normal
[   59.650840] BUG: Bad page state in process bash  pfn:9b6dc
[   59.651124] page:ea0002200020 count:0 mapcount:0 mapping:  
(null) index:0xfdfdfdfdfdfdfdfd
[   59.651494] page flags: 
0x2fdfdfdfd5df9fd(locked|referenced|uptodate|dirty|lru|active|slab|owner_priv_1|private|private_2|writeback|head|tail|swapcache|reclaim|swapbacked|unevictable|uncached|compound_lock)
[   59.653604] Modules linked in: netconsole acpiphp pci_hotplug 
acpi_memhotplug loop kvm_amd kvm microcode tpm_tis tpm tpm_bios evdev psmouse 
serio_raw i2c_piix4 i2c_core parport_pc parport processor button thermal_sys 
ext3 jbd mbcache sg sr_mod cdrom ata_generic virtio_net ata_piix virtio_blk 
libata virtio_pci virtio_ring virtio scsi_mod
[   59.656998] Pid: 988, comm: bash Not tainted 3.6.0-rc7-guest #12
[   59.657172] Call Trace:
[   59.657275]  [] ? bad_page+0xb0/0x100
[   59.657434]  [] ? free_pages_prepare+0xb3/0x100
[   59.657610]  [] ? free_hot_cold_page+0x48/0x1a0
[   59.657787]  [] ? online_pages_range+0x68/0xa0
[   59.657961]  [] ? 
__online_page_increment_counters+0x10/0x10
[   59.658162]  [] ? walk_system_ram_range+0x101/0x110
[   59.658346]  [] ? online_pages+0x1a5/0x2b0
[   59.658515]  [] ? __memory_block_change_state+0x20d/0x270
[   59.658710]  [] ? store_mem_state+0xb6/0xf0
[   59.658878]  [] ? sysfs_write_file+0xd2/0x160
[   59.659052]  [] ? vfs_write+0xaa/0x160
[   59.659212]  [] ? sys_write+0x47/0x90
[   59.659371]  [] ? async_page_fault+0x25/0x30
[   59.659543]  [] ? system_call_fastpath+0x16/0x1b
[   59.659720] Disabling lock debugging due to kernel taint

Patch 20/21 deals with a similar scenario, but only for __PG_HWPOISON flag.
Did I miss any other patch for this?

thanks,

- Vasilis

[1] https://lkml.org/lkml/2012/9/6/635
[2] https://lkml.org/lkml/2012/9/11/542
[3] https://lkml.org/lkml/2012/9/20/37
[4] http://permalink.gmane.org/gmane.comp.emulators.kvm.devel/98691




Re: [RFC v9 PATCH 00/21] memory-hotplug: hot-remove physical memory

2012-09-26 Thread Vasilis Liaskovitis
Hi,

I am testing 3.6.0-rc7 with this v9 patchset plus more recent fixes [1],[2],[3]
Running in a guest (qemu+seabios from [4]). 
CONFIG_SLAB=y
CONFIG_DEBUG_SLAB=y

After a successful hot-add and online, I do a hot-remove with "echo 1 > 
/sys/bus/acpi/devices/PNP/eject".
When I do the OSPM-eject, I often get slab corruption in the "acpi-state" cache,
or in other caches:

[  170.566995] Slab corruption (Not tainted): Acpi-State start=88009fc1e548, len=80
[  170.567265] Redzone: 0x0/0x0.
[  170.567399] Last user: [<  (null)>](0x0)
[  170.567667] 000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  170.568078] 010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  170.568487] 020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  170.568894] 030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  170.569302] 040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

[  170.569712] Prev obj: start=9fc1e4d0, len=80
[  170.569869] BUG: unable to handle kernel paging request at 9fc1e520
[  170.570171] IP: [] print_objinfo+0x9c/0x110
[  170.570397] PGD 7cf37067 PUD 0 
[  170.570619] Oops:  [#1] SMP 
[  170.570843] Modules linked in: netconsole acpiphp pci_hotplug 
acpi_memhotplug loop kvm_amd kvm tpm_tis microcode tpm tpm_bios psmouse 
parport_pc serio_raw evdev parport i2c_piix4 processor thermal_sys i2c_core 
button ext3 jbd mbcache sg sr_mod cdrom ata_generic virtio_net virtio_blk 
ata_piix libata scsi_mod virtio_pci virtio_ring virtio
[  170.573474] CPU 0 
[  170.573568] Pid: 29, comm: kworker/0:1 Not tainted 3.6.0-rc7-guest #12 Bochs Bochs
[  170.573830] RIP: 0010:[]  [] print_objinfo+0x9c/0x110
[  170.574106] RSP: 0018:88003eaf3a70  EFLAGS: 00010202
[  170.574268] RAX: 9fc1e4c8 RBX: 0002 RCX: 24b8
[  170.574468] RDX: 9fc1e4c8 RSI: 9fc1e4c8 RDI: 88003e9bb980
[  170.574668] RBP: 88003e9bb980 R08: 880037964078 R09: 
[  170.574870] R10: 021e R11: 0002 R12: 9fc1e4c8
[  170.575070] R13: 9fc1e520 R14: 004f R15: ffa5
[  170.575274] FS:  7fc6b7530700() GS:88003fc0() 
knlGS:
[  170.575494] CS:  0010 DS:  ES:  CR0: 8005003b
[  170.575665] CR2: 9fc1e520 CR3: 7c9c1000 CR4: 06f0
[  170.575870] DR0:  DR1:  DR2: 
[  170.576075] DR3:  DR6: 0ff0 DR7: 0400
[  170.576276] Process kworker/0:1 (pid: 29, threadinfo 88003eaf2000, task 
88003ea941c0)
[  170.576507] Stack:
[  170.576599]  0010 01893fbe 88009fc1e000 
0050
[  170.576938]  9fc1e4c8 004f ffa5 
8112899f
[  170.576938]  88003eb309d8 81712d6d 88003e9bb980 
88009fc1e540
[  170.576938] Call Trace:
[  170.576938]  [] ? check_poison_obj+0x1df/0x1f0
[  170.576938]  [] ? acpi_ut_create_generic_state+0x2f/0x4c
[  170.576938]  [] ? acpi_ut_create_generic_state+0x2f/0x4c
[  170.576938]  [] ? cache_alloc_debugcheck_after.isra.52+0xed/0x220
[  170.576938]  [] ? acpi_ut_create_generic_state+0x2f/0x4c
[  170.576938]  [] ? kmem_cache_alloc+0xb5/0x1e0
[  170.576938]  [] ? acpi_ut_create_generic_state+0x2f/0x4c
[  170.576938]  [] ? acpi_ds_result_push+0x5d/0x12e
[  170.576938]  [] ? acpi_ds_exec_end_op+0x28e/0x3d3
[  170.576938]  [] ? acpi_ps_parse_loop+0x79f/0x931
[  170.576938]  [] ? acpi_ps_parse_aml+0x89/0x261
[  170.576938]  [] ? acpi_ps_execute_method+0x1be/0x266
[  170.576938]  [] ? acpi_ns_evaluate+0xd3/0x19a
[  170.576938]  [] ? acpi_evaluate_object+0xf3/0x1f4
[  170.576938]  [] ? acpi_os_wait_events_complete+0x1b/0x1b
[  170.576938]  [] ? acpi_bus_hot_remove_device+0xeb/0x123
[  170.576938]  [] ? acpi_os_execute_deferred+0x1d/0x29
[  170.576938]  [] ? process_one_work+0x125/0x560
[  170.576938]  [] ? worker_thread+0x16a/0x4e0
[  170.576938]  [] ? manage_workers+0x310/0x310
[  170.576938]  [] ? kthread+0x85/0x90
[  170.576938]  [] ? kernel_thread_helper+0x4/0x10
[  170.576938]  [] ? flush_kthread_worker+0xa0/0xa0
[  170.576938]  [] ? gs_change+0x13/0x13
[  170.576938] Code: cb 75 dc 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e 41 5f c3 8b 
7f 0c 4c 89 e2 e8 02 fd ff ff 4c 89 e6 49 89 c5 48 89 ef e8 d4 fc ff ff <49> 8b 
55 00 48 8b 30 48 c7 c7 8c 39 6f 81 31 c0 e8 3e 34 3b 00 

Other times, the problem happens on a slab object free:

[   52.313366] Offlined Pages 32768
[   52.800232] slab error in verify_redzone_free(): cache `Acpi-ParseExt': memory outside object was overwritten
[   52.801298] Pid: 29, comm: kworker/0:1 Not tainted 3.6.0-rc7-guest #12
[   52.802039] Call Trace:
[   52.802443]  [] ? __slab_error.isra.46+0x1b/0x30
[   52.803199]  [] ? cache_free_debugcheck+0x256/0x260
[   52.803940]  [] ? acpi_os_release_object+0x7/0xc
[   52.804645]  [] ?

[RFC v9 PATCH 00/21] memory-hotplug: hot-remove physical memory

2012-09-05 Thread wency
From: Wen Congyang 

This patch series aims to support physical memory hot-remove.

The patches can free/remove the following things:

  - acpi_memory_info                          : [RFC PATCH 4/19]
  - /sys/firmware/memmap/X/{end, start, type} : [RFC PATCH 8/19]
  - iomem_resource                            : [RFC PATCH 9/19]
  - mem_section and related sysfs files       : [RFC PATCH 10-11, 13-16/19]
  - page table of removed memory              : [RFC PATCH 12/19]
  - node and related sysfs files              : [RFC PATCH 18-19/19]

If you find any functionality missing for physical memory hot-remove, please
let me know.

How to test this patchset?
1. Apply this patchset and build the kernel. MEMORY_HOTPLUG, MEMORY_HOTREMOVE,
   and ACPI_HOTPLUG_MEMORY must be selected.
2. Load the module acpi_memhotplug.
3. Hotplug the memory device (how depends on your hardware).
   You will see the memory device under the directory /sys/bus/acpi/devices/.
   Its name is PNP0C80:XX.
4. Online/offline the pages provided by this memory device.
   You can write online/offline to /sys/devices/system/memory/memoryX/state to
   online/offline the pages provided by this memory device.
5. Hotremove the memory device.
   You can hotremove the memory device through the hardware, or by writing 1 to
   /sys/bus/acpi/devices/PNP0C80:XX/eject.

Note: if the memory provided by the memory device is used by the kernel, it
can't be offlined. It is not a bug.
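
Steps 4 and 5 above can be scripted. The following is only a minimal sketch,
not part of the patchset: the helper names, the root parameters and the
device instance "PNP0C80:00" are hypothetical and chosen for illustration.
The root parameters exist so the helpers can be tried against a scratch
directory before pointing them at the real sysfs tree.

```python
import os
import tempfile

MEMORY_ROOT = "/sys/devices/system/memory"
ACPI_ROOT = "/sys/bus/acpi/devices"


def set_block_state(block, state, root=MEMORY_ROOT):
    """Step 4: write 'online' or 'offline' to memory<block>/state."""
    if state not in ("online", "offline"):
        raise ValueError("state must be 'online' or 'offline'")
    path = os.path.join(root, "memory%d" % block, "state")
    with open(path, "w") as f:
        f.write(state)
    return path


def eject_device(name, root=ACPI_ROOT):
    """Step 5: write 1 to <name>/eject (name is e.g. a PNP0C80:XX entry)."""
    path = os.path.join(root, name, "eject")
    with open(path, "w") as f:
        f.write("1")
    return path


def dry_run():
    """Exercise both helpers against a throwaway directory, not sysfs."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical block number and device instance, for illustration.
        os.makedirs(os.path.join(tmp, "memory8"))
        os.makedirs(os.path.join(tmp, "PNP0C80:00"))
        state_path = set_block_state(8, "offline", root=tmp)
        eject_path = eject_device("PNP0C80:00", root=tmp)
        with open(state_path) as f:
            state = f.read()
        with open(eject_path) as f:
            eject = f.read()
    return state, eject
```

On a real system the helpers need root, and the write to state fails if the
kernel cannot offline the block, as described in the note above.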

Known problems:
1. memory can't be offlined when CONFIG_MEMCG is selected.
   For example: there is a memory device on node 1. The address range
   is [1G, 1.5G). You will find 4 new directories memory8, memory9, memory10,
   and memory11 under the directory /sys/devices/system/memory/.
   If CONFIG_MEMCG is selected, we allocate memory to store the page cgroup
   when we online pages. When we online memory8, the memory that stores its
   page cgroup is not provided by this memory device. But when we online
   memory9, the memory that stores its page cgroup may be provided by memory8,
   so memory8 can no longer be offlined. We should offline the memory in the
   reverse order of onlining.
   When the memory device is hotremoved, we automatically offline the memory
   provided by this memory device. But we don't know which memory was onlined
   first, so offlining may fail. In that case, you should offline the memory
   by hand before hotremoving the memory device.
2. hotremoving a memory device may cause a kernel panic.
   This bug will be fixed by Liu Jiang's patch:
   https://lkml.org/lkml/2012/7/3/1
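
The example in known problem 1 can be worked through as follows. This is
only a sketch under two assumptions that are not stated above: a memory
block size of 128 MiB (the real value can be read from
/sys/devices/system/memory/block_size_bytes), and blocks that were onlined in
ascending order, so that descending order is the reverse of the online order.
The function names are hypothetical.

```python
GIB = 1 << 30
MIB = 1 << 20


def memory_blocks(start, end, block_size=128 * MIB):
    """Return the memoryX indices covering the address range [start, end)."""
    first = start // block_size
    last = (end - 1) // block_size
    return list(range(first, last + 1))


def offline_order(blocks):
    """Highest block first: reverse of an ascending online order (assumed)."""
    return sorted(blocks, reverse=True)


# The device from the example: address range [1G, 1.5G).
blocks = memory_blocks(1 * GIB, 3 * GIB // 2)
print(blocks)                 # memory8 .. memory11
print(offline_order(blocks))  # memory11 first, memory8 last
```

If the blocks were onlined in some other order, the list has to be built from
that order instead; the kernel does not record it, which is why the automatic
offline on hot-remove may fail.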

change log of v9:
 [RFC PATCH v9 8/21]
   * add a lock to protect the list map_entries
   * add an indicator to firmware_map_entry to remember whether the memory
     is allocated from bootmem
 [RFC PATCH v9 10/21]
   * change the macro to inline function
 [RFC PATCH v9 19/21]
   * don't offline the node if a cpu on the node is online
 [RFC PATCH v9 21/21]
   * new patch: automatically offline page_cgroup when onlining a memory
     block fails

change log of v8:
 [RFC PATCH v8 17/20]
   * fix problems when one node's range includes other nodes
 [RFC PATCH v8 18/20]
   * fix build error when CONFIG_MEMORY_HOTPLUG_SPARSE or CONFIG_HUGETLBFS
     is not defined.
 [RFC PATCH v8 19/20]
   * don't offline node when some memory sections are not removed
 [RFC PATCH v8 20/20]
   * create new patch: clear hwpoisoned flag when onlining pages

change log of v7:
 [RFC PATCH v7 4/19]
   * do not continue if acpi_memory_device_remove_memory() fails.
 [RFC PATCH v7 15/19]
   * handle usemap in register_page_bootmem_info_section() too.

change log of v6:
 [RFC PATCH v6 12/19]
   * fix build error on architectures other than x86

 [RFC PATCH v6 15-16/19]
   * fix build error on architectures other than x86

change log of v5:
 * merge the patchset to clear page table and the patchset to hot remove
   memory(from ishimatsu) to one big patchset.

 [RFC PATCH v5 1/19]
   * rename remove_memory() to offline_memory()/offline_pages()

 [RFC PATCH v5 2/19]
   * new patch: implement offline_memory(). This function offlines pages,
     updates the memory block's state, and notifies userspace that the memory
     block's state has changed.

 [RFC PATCH v5 4/19]
   * offline and remove memory in acpi_memory_disable_device() too.

 [RFC PATCH v5 17/19]
   * new patch: add a new function __remove_zone() to revert the things done
     in the function __add_zone().

 [RFC PATCH v5 18/19]
   * flush work before resetting node device.

change log of v4:
 * remove "memory-hotplug : unify argument of firmware_map_add_early/hotplug"
   from the patch series, since the patch is a bugfix. It is being discussed
   in another thread. But the patch is still needed for testing the series,
   so I added it as [PATCH 0/13].

 [RFC PATCH v4 2/13]
   * check whether memory is online at remove_memory()
   * add memory_add_physaddr_to_nid() to acpi_memory_device_remove() for
     getting node id
 
 [RFC PATCH v4 3/13]
   * new patch: check whether memory is online at online_pages()