Re: [RFC v9 PATCH 00/21] memory-hotplug: hot-remove physical memory
At 09/27/2012 12:46 AM, Vasilis Liaskovitis Wrote:
> Hi,
>
> I am testing 3.6.0-rc7 with this v9 patchset plus more recent fixes [1],[2],[3]
> Running in a guest (qemu+seabios from [4]).
> CONFIG_SLAB=y
> CONFIG_DEBUG_SLAB=y
>
> After successful hot-add and online, I am doing a hot-remove with
> "echo 1 > /sys/bus/acpi/devices/PNP/eject"
> When I do the OSPM-eject, I often get slab corruption in "acpi-state" cache,
> or in other caches

The following patch can fix this problem:
https://lkml.org/lkml/2012/7/12/186

Thanks
Wen Congyang

>
> [ 170.566995] Slab corruption (Not tainted): Acpi-State start=88009fc1e548, len=80
> [ 170.567265] Redzone: 0x0/0x0.
> [ 170.567399] Last user: [< (null)>](0x0)
> [ 170.567667] 000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 170.568078] 010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 170.568487] 020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 170.568894] 030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 170.569302] 040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 170.569712] Prev obj: start=9fc1e4d0, len=80
> [ 170.569869] BUG: unable to handle kernel paging request at 9fc1e520
> [ 170.570171] IP: [] print_objinfo+0x9c/0x110
> [ 170.570397] PGD 7cf37067 PUD 0
> [ 170.570619] Oops: [#1] SMP
> [ 170.570843] Modules linked in: netconsole acpiphp pci_hotplug acpi_memhotplug loop kvm_amd kvm tpm_tis microcode tpm tpm_bios psmouse parport_pc serio_raw evdev parport i2c_piix4 processor thermal_sys i2c_core button ext3 jbd mbcache sg sr_mod cdrom ata_generic virtio_net virtio_blk ata_piix libata scsi_mod virtio_pci virtio_ring virtio
> [ 170.573474] CPU 0
> [ 170.573568] Pid: 29, comm: kworker/0:1 Not tainted 3.6.0-rc7-guest #12 Bochs Bochs
> [ 170.573830] RIP: 0010:[] [] print_objinfo+0x9c/0x110
> [ 170.574106] RSP: 0018:88003eaf3a70 EFLAGS: 00010202
> [ 170.574268] RAX: 9fc1e4c8 RBX: 0002 RCX: 24b8
> [ 170.574468] RDX: 9fc1e4c8 RSI: 9fc1e4c8 RDI: 88003e9bb980
> [ 170.574668] RBP: 88003e9bb980 R08: 880037964078 R09:
> [ 170.574870] R10: 021e R11: 0002 R12: 9fc1e4c8
> [ 170.575070] R13: 9fc1e520 R14: 004f R15: ffa5
> [ 170.575274] FS: 7fc6b7530700() GS:88003fc0() knlGS:
> [ 170.575494] CS: 0010 DS: ES: CR0: 8005003b
> [ 170.575665] CR2: 9fc1e520 CR3: 7c9c1000 CR4: 06f0
> [ 170.575870] DR0: DR1: DR2:
> [ 170.576075] DR3: DR6: 0ff0 DR7: 0400
> [ 170.576276] Process kworker/0:1 (pid: 29, threadinfo 88003eaf2000, task 88003ea941c0)
> [ 170.576507] Stack:
> [ 170.576599] 0010 01893fbe 88009fc1e000 0050
> [ 170.576938] 9fc1e4c8 004f ffa5 8112899f
> [ 170.576938] 88003eb309d8 81712d6d 88003e9bb980 88009fc1e540
> [ 170.576938] Call Trace:
> [ 170.576938] [] ? check_poison_obj+0x1df/0x1f0
> [ 170.576938] [] ? acpi_ut_create_generic_state+0x2f/0x4c
> [ 170.576938] [] ? acpi_ut_create_generic_state+0x2f/0x4c
> [ 170.576938] [] ? cache_alloc_debugcheck_after.isra.52+0xed/0x220
> [ 170.576938] [] ? acpi_ut_create_generic_state+0x2f/0x4c
> [ 170.576938] [] ? kmem_cache_alloc+0xb5/0x1e0
> [ 170.576938] [] ? acpi_ut_create_generic_state+0x2f/0x4c
> [ 170.576938] [] ? acpi_ds_result_push+0x5d/0x12e
> [ 170.576938] [] ? acpi_ds_exec_end_op+0x28e/0x3d3
> [ 170.576938] [] ? acpi_ps_parse_loop+0x79f/0x931
> [ 170.576938] [] ? acpi_ps_parse_aml+0x89/0x261
> [ 170.576938] [] ? acpi_ps_execute_method+0x1be/0x266
> [ 170.576938] [] ? acpi_ns_evaluate+0xd3/0x19a
> [ 170.576938] [] ? acpi_evaluate_object+0xf3/0x1f4
> [ 170.576938] [] ? acpi_os_wait_events_complete+0x1b/0x1b
> [ 170.576938] [] ? acpi_bus_hot_remove_device+0xeb/0x123
> [ 170.576938] [] ? acpi_os_execute_deferred+0x1d/0x29
> [ 170.576938] [] ? process_one_work+0x125/0x560
> [ 170.576938] [] ? worker_thread+0x16a/0x4e0
> [ 170.576938] [] ? manage_workers+0x310/0x310
> [ 170.576938] [] ? kthread+0x85/0x90
> [ 170.576938] [] ? kernel_thread_helper+0x4/0x10
> [ 170.576938] [] ? flush_kthread_worker+0xa0/0xa0
> [ 170.576938] [] ? gs_change+0x13/0x13
> [ 170.576938] Code: cb 75 dc 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e 41 5f c3 8b 7f 0c 4c 89 e2 e8 02 fd ff ff 4c 89 e6 49 89 c5 48 89 ef e8 d4 fc ff ff <49> 8b 55 00 48 8b 30 48 c7 c7 8c 39 6f 81 31 c0 e8 3e 34 3b 00
>
> Other times, the problem happens on a slab object free:
>
> [ 52.313366] Offlined Pages 32768
> [ 52.800232] slab err
Re: [RFC v9 PATCH 00/21] memory-hotplug: hot-remove physical memory
At 09/27/2012 12:46 AM, Vasilis Liaskovitis Wrote:
> Hi,
>
> I am testing 3.6.0-rc7 with this v9 patchset plus more recent fixes [1],[2],[3]
> Running in a guest (qemu+seabios from [4]).
> CONFIG_SLAB=y
> CONFIG_DEBUG_SLAB=y
>
> After successful hot-add and online, I am doing a hot-remove with
> "echo 1 > /sys/bus/acpi/devices/PNP/eject"
> When I do the OSPM-eject, I often get slab corruption in "acpi-state" cache,
> or in other caches

I found the reason: when you do an OSPM-eject, the kernel automatically
offlines and removes the memory. But offlining the memory fails, so the
memory is still in use by the kernel. device_release_driver() does not
report this error to its caller, acpi_bus_remove(), so the kernel goes on
to power off and eject the device by evaluating _PS3 and _EJ0. After that,
the kernel is using memory that no longer exists. It's very dangerous.

Thanks
Wen Congyang

>
> [ 170.566995] Slab corruption (Not tainted): Acpi-State start=88009fc1e548, len=80
> [ 170.567265] Redzone: 0x0/0x0.
> [ 170.567399] Last user: [< (null)>](0x0)
> [ 170.567667] 000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 170.568078] 010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 170.568487] 020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 170.568894] 030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 170.569302] 040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 170.569712] Prev obj: start=9fc1e4d0, len=80
> [ 170.569869] BUG: unable to handle kernel paging request at 9fc1e520
> [ 170.570171] IP: [] print_objinfo+0x9c/0x110
> [ 170.570397] PGD 7cf37067 PUD 0
> [ 170.570619] Oops: [#1] SMP
> [ 170.570843] Modules linked in: netconsole acpiphp pci_hotplug acpi_memhotplug loop kvm_amd kvm tpm_tis microcode tpm tpm_bios psmouse parport_pc serio_raw evdev parport i2c_piix4 processor thermal_sys i2c_core button ext3 jbd mbcache sg sr_mod cdrom ata_generic virtio_net virtio_blk ata_piix libata scsi_mod virtio_pci virtio_ring virtio
> [ 170.573474] CPU 0
> [ 170.573568] Pid: 29, comm: kworker/0:1 Not tainted 3.6.0-rc7-guest #12 Bochs Bochs
> [ 170.573830] RIP: 0010:[] [] print_objinfo+0x9c/0x110
> [ 170.574106] RSP: 0018:88003eaf3a70 EFLAGS: 00010202
> [ 170.574268] RAX: 9fc1e4c8 RBX: 0002 RCX: 24b8
> [ 170.574468] RDX: 9fc1e4c8 RSI: 9fc1e4c8 RDI: 88003e9bb980
> [ 170.574668] RBP: 88003e9bb980 R08: 880037964078 R09:
> [ 170.574870] R10: 021e R11: 0002 R12: 9fc1e4c8
> [ 170.575070] R13: 9fc1e520 R14: 004f R15: ffa5
> [ 170.575274] FS: 7fc6b7530700() GS:88003fc0() knlGS:
> [ 170.575494] CS: 0010 DS: ES: CR0: 8005003b
> [ 170.575665] CR2: 9fc1e520 CR3: 7c9c1000 CR4: 06f0
> [ 170.575870] DR0: DR1: DR2:
> [ 170.576075] DR3: DR6: 0ff0 DR7: 0400
> [ 170.576276] Process kworker/0:1 (pid: 29, threadinfo 88003eaf2000, task 88003ea941c0)
> [ 170.576507] Stack:
> [ 170.576599] 0010 01893fbe 88009fc1e000 0050
> [ 170.576938] 9fc1e4c8 004f ffa5 8112899f
> [ 170.576938] 88003eb309d8 81712d6d 88003e9bb980 88009fc1e540
> [ 170.576938] Call Trace:
> [ 170.576938] [] ? check_poison_obj+0x1df/0x1f0
> [ 170.576938] [] ? acpi_ut_create_generic_state+0x2f/0x4c
> [ 170.576938] [] ? acpi_ut_create_generic_state+0x2f/0x4c
> [ 170.576938] [] ? cache_alloc_debugcheck_after.isra.52+0xed/0x220
> [ 170.576938] [] ? acpi_ut_create_generic_state+0x2f/0x4c
> [ 170.576938] [] ? kmem_cache_alloc+0xb5/0x1e0
> [ 170.576938] [] ? acpi_ut_create_generic_state+0x2f/0x4c
> [ 170.576938] [] ? acpi_ds_result_push+0x5d/0x12e
> [ 170.576938] [] ? acpi_ds_exec_end_op+0x28e/0x3d3
> [ 170.576938] [] ? acpi_ps_parse_loop+0x79f/0x931
> [ 170.576938] [] ? acpi_ps_parse_aml+0x89/0x261
> [ 170.576938] [] ? acpi_ps_execute_method+0x1be/0x266
> [ 170.576938] [] ? acpi_ns_evaluate+0xd3/0x19a
> [ 170.576938] [] ? acpi_evaluate_object+0xf3/0x1f4
> [ 170.576938] [] ? acpi_os_wait_events_complete+0x1b/0x1b
> [ 170.576938] [] ? acpi_bus_hot_remove_device+0xeb/0x123
> [ 170.576938] [] ? acpi_os_execute_deferred+0x1d/0x29
> [ 170.576938] [] ? process_one_work+0x125/0x560
> [ 170.576938] [] ? worker_thread+0x16a/0x4e0
> [ 170.576938] [] ? manage_workers+0x310/0x310
> [ 170.576938] [] ? kthread+0x85/0x90
> [ 170.576938] [] ? kernel_thread_helper+0x4/0x10
> [ 170.576938] [] ? flush_kthread_worker+0xa0/0xa0
> [ 170.576938] [] ? gs_change+0x13/0x13
> [ 170.576938] Code: cb 75 d
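The failure mode Wen describes in this message (an offline error that never reaches the code deciding whether to eject) can be illustrated with a small user-space model. This is only a sketch and not kernel code: the Python names `offline_memory`, `release_driver_silent`, `hot_remove_unchecked` and `hot_remove_checked` are hypothetical stand-ins for the steps named in the message.

```python
class OfflineError(Exception):
    """Raised by the model when memory cannot be offlined."""


def offline_memory(in_use_by_kernel):
    # Model of the offline step: it fails while the kernel still uses the memory.
    if in_use_by_kernel:
        raise OfflineError("memory is still in use by the kernel")


def release_driver_silent(in_use_by_kernel):
    # Models a release step that returns nothing, so the failure is lost here.
    try:
        offline_memory(in_use_by_kernel)
    except OfflineError:
        pass  # the caller never learns that offlining failed


def hot_remove_unchecked(in_use_by_kernel):
    # The problematic flow: eject regardless of whether offlining worked.
    release_driver_silent(in_use_by_kernel)
    return "ejected"


def hot_remove_checked(in_use_by_kernel):
    # A flow that propagates the error and refuses to eject on failure.
    try:
        offline_memory(in_use_by_kernel)
    except OfflineError:
        return "aborted"
    return "ejected"
```

In the unchecked flow the device is reported as ejected even though the model's memory is still in use, which corresponds to the dangerous state described above; the checked flow stops before the eject.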
Re: [RFC v9 PATCH 00/21] memory-hotplug: hot-remove physical memory
At 09/27/2012 12:46 AM, Vasilis Liaskovitis Wrote:
> Hi,
>
> I am testing 3.6.0-rc7 with this v9 patchset plus more recent fixes [1],[2],[3]
> Running in a guest (qemu+seabios from [4]).
> CONFIG_SLAB=y
> CONFIG_DEBUG_SLAB=y
>
> After successful hot-add and online, I am doing a hot-remove with
> "echo 1 > /sys/bus/acpi/devices/PNP/eject"
> When I do the OSPM-eject, I often get slab corruption in "acpi-state" cache,
> or in other caches

I can reproduce this problem without my patchset.

Thanks
Wen Congyang

>
> [ 170.566995] Slab corruption (Not tainted): Acpi-State start=88009fc1e548, len=80
> [ 170.567265] Redzone: 0x0/0x0.
> [ 170.567399] Last user: [< (null)>](0x0)
> [ 170.567667] 000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 170.568078] 010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 170.568487] 020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 170.568894] 030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 170.569302] 040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 170.569712] Prev obj: start=9fc1e4d0, len=80
> [ 170.569869] BUG: unable to handle kernel paging request at 9fc1e520
> [ 170.570171] IP: [] print_objinfo+0x9c/0x110
> [ 170.570397] PGD 7cf37067 PUD 0
> [ 170.570619] Oops: [#1] SMP
> [ 170.570843] Modules linked in: netconsole acpiphp pci_hotplug acpi_memhotplug loop kvm_amd kvm tpm_tis microcode tpm tpm_bios psmouse parport_pc serio_raw evdev parport i2c_piix4 processor thermal_sys i2c_core button ext3 jbd mbcache sg sr_mod cdrom ata_generic virtio_net virtio_blk ata_piix libata scsi_mod virtio_pci virtio_ring virtio
> [ 170.573474] CPU 0
> [ 170.573568] Pid: 29, comm: kworker/0:1 Not tainted 3.6.0-rc7-guest #12 Bochs Bochs
> [ 170.573830] RIP: 0010:[] [] print_objinfo+0x9c/0x110
> [ 170.574106] RSP: 0018:88003eaf3a70 EFLAGS: 00010202
> [ 170.574268] RAX: 9fc1e4c8 RBX: 0002 RCX: 24b8
> [ 170.574468] RDX: 9fc1e4c8 RSI: 9fc1e4c8 RDI: 88003e9bb980
> [ 170.574668] RBP: 88003e9bb980 R08: 880037964078 R09:
> [ 170.574870] R10: 021e R11: 0002 R12: 9fc1e4c8
> [ 170.575070] R13: 9fc1e520 R14: 004f R15: ffa5
> [ 170.575274] FS: 7fc6b7530700() GS:88003fc0() knlGS:
> [ 170.575494] CS: 0010 DS: ES: CR0: 8005003b
> [ 170.575665] CR2: 9fc1e520 CR3: 7c9c1000 CR4: 06f0
> [ 170.575870] DR0: DR1: DR2:
> [ 170.576075] DR3: DR6: 0ff0 DR7: 0400
> [ 170.576276] Process kworker/0:1 (pid: 29, threadinfo 88003eaf2000, task 88003ea941c0)
> [ 170.576507] Stack:
> [ 170.576599] 0010 01893fbe 88009fc1e000 0050
> [ 170.576938] 9fc1e4c8 004f ffa5 8112899f
> [ 170.576938] 88003eb309d8 81712d6d 88003e9bb980 88009fc1e540
> [ 170.576938] Call Trace:
> [ 170.576938] [] ? check_poison_obj+0x1df/0x1f0
> [ 170.576938] [] ? acpi_ut_create_generic_state+0x2f/0x4c
> [ 170.576938] [] ? acpi_ut_create_generic_state+0x2f/0x4c
> [ 170.576938] [] ? cache_alloc_debugcheck_after.isra.52+0xed/0x220
> [ 170.576938] [] ? acpi_ut_create_generic_state+0x2f/0x4c
> [ 170.576938] [] ? kmem_cache_alloc+0xb5/0x1e0
> [ 170.576938] [] ? acpi_ut_create_generic_state+0x2f/0x4c
> [ 170.576938] [] ? acpi_ds_result_push+0x5d/0x12e
> [ 170.576938] [] ? acpi_ds_exec_end_op+0x28e/0x3d3
> [ 170.576938] [] ? acpi_ps_parse_loop+0x79f/0x931
> [ 170.576938] [] ? acpi_ps_parse_aml+0x89/0x261
> [ 170.576938] [] ? acpi_ps_execute_method+0x1be/0x266
> [ 170.576938] [] ? acpi_ns_evaluate+0xd3/0x19a
> [ 170.576938] [] ? acpi_evaluate_object+0xf3/0x1f4
> [ 170.576938] [] ? acpi_os_wait_events_complete+0x1b/0x1b
> [ 170.576938] [] ? acpi_bus_hot_remove_device+0xeb/0x123
> [ 170.576938] [] ? acpi_os_execute_deferred+0x1d/0x29
> [ 170.576938] [] ? process_one_work+0x125/0x560
> [ 170.576938] [] ? worker_thread+0x16a/0x4e0
> [ 170.576938] [] ? manage_workers+0x310/0x310
> [ 170.576938] [] ? kthread+0x85/0x90
> [ 170.576938] [] ? kernel_thread_helper+0x4/0x10
> [ 170.576938] [] ? flush_kthread_worker+0xa0/0xa0
> [ 170.576938] [] ? gs_change+0x13/0x13
> [ 170.576938] Code: cb 75 dc 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e 41 5f c3 8b 7f 0c 4c 89 e2 e8 02 fd ff ff 4c 89 e6 49 89 c5 48 89 ef e8 d4 fc ff ff <49> 8b 55 00 48 8b 30 48 c7 c7 8c 39 6f 81 31 c0 e8 3e 34 3b 00
>
> Other times, the problem happens on a slab object free:
>
> [ 52.313366] Offlined Pages 32768
> [ 52.800232] slab error in verify_redzone_free():
Re: [RFC v9 PATCH 00/21] memory-hotplug: hot-remove physical memory
Hi Chen,

2012/10/02 8:45, Ni zhan Chen wrote:
> On 10/01/2012 12:44 PM, Yasuaki Ishimatsu wrote:
>> Hi Chen,
>>
>> 2012/09/29 17:19, Ni zhan Chen wrote:
>>> On 09/05/2012 05:25 PM, we...@cn.fujitsu.com wrote:
>>>> From: Wen Congyang
>>>>
>>>> This patch series aims to support physical memory hot-remove.
>>>>
>>>> The patches can free/remove the following things:
>>>>
>>>>   - acpi_memory_info                          : [RFC PATCH 4/19]
>>>>   - /sys/firmware/memmap/X/{end, start, type} : [RFC PATCH 8/19]
>>>>   - iomem_resource                            : [RFC PATCH 9/19]
>>>>   - mem_section and related sysfs files       : [RFC PATCH 10-11, 13-16/19]
>>>>   - page table of removed memory              : [RFC PATCH 12/19]
>>>>   - node and related sysfs files              : [RFC PATCH 18-19/19]
>>>>
>>>> If you find any missing functionality for physical memory hot-remove,
>>>> please let me know.
>>>>
>>>> How to test this patchset?
>>>> 1. apply this patchset and build the kernel. MEMORY_HOTPLUG,
>>>>    MEMORY_HOTREMOVE, ACPI_HOTPLUG_MEMORY must be selected.
>>>> 2. load the module acpi_memhotplug
>>>
>>> Hi Yasuaki, where is the acpi_memhotplug module?
>>
>> If you build acpi_memhotplug as a module, it is created under the
>> /lib/modules//driver/acpi/ directory. It depends on config
>> ACPI_HOTPLUG_MEMORY. If the config is [*], it becomes a built-in
>> function, so you don't need to care about it.
>>
>> Thanks,
>> Yasuaki Ishimatsu
>
> Hi Yasuaki,
>
> I built the kernel with MEMORY_HOTPLUG, MEMORY_HOTREMOVE and
> ACPI_HOTPLUG_MEMORY selected as [*], but I can't find PNP0C80:XX under
> the directory /sys/bus/acpi/devices/.
>
> [root@localhost ~]# ls /sys/bus/acpi/devices/
> device:00 device:07 device:0e device:15 device:1c device:23 device:2a LNXCPU:00 LNXCPU:07 PNP0501:00 PNP0C02:00 PNP0C0F:02 PNP0C14:01
> device:01 device:08 device:0f device:16 device:1d device:24 device:2b LNXCPU:01 LNXPWRBN:00 PNP0800:00 PNP0C02:01 PNP0C0F:03 PNP0C31:00
> device:02 device:09 device:10 device:17 device:1e device:25 device:2c LNXCPU:02 LNXSYSTM:00 PNP0A08:00 PNP0C02:02 PNP0C0F:04
> device:03 device:0a device:11 device:18 device:1f device:26 device:2d LNXCPU:03 PNP:00 PNP0B00:00 PNP0C04:00 PNP0C0F:05
> device:04 device:0b device:12 device:19 device:20 device:27 device:2e LNXCPU:04 PNP0100:00 PNP0C01:00 PNP0C0C:00 PNP0C0F:06
> device:05 device:0c device:13 device:1a device:21 device:28 device:2f LNXCPU:05 PNP0103:00 PNP0C01:01 PNP0C0F:00 PNP0C0F:07
> device:06 device:0d device:14 device:1b device:22 device:29 INT3F0D:00 LNXCPU:06 PNP0200:00 PNP0C01:02 PNP0C0F:01 PNP0C14:00
>
> So what am I missing? Thanks.

It depends on the hardware. It seems that your system does not support
memory hotplug. If you use KVM, you can try memory hotplug in a KVM guest
by applying Vasilis' patch-set.

http://lists.gnu.org/archive/html/qemu-devel/2012-07/msg01389.html

Thanks,
Yasuaki Ishimatsu

>>>> 3. hotplug the memory device (it depends on your hardware)
>>>>    You will see the memory device under the directory
>>>>    /sys/bus/acpi/devices/. Its name is PNP0C80:XX.
>>>> 4. online/offline pages provided by this memory device
>>>>    You can write online/offline to
>>>>    /sys/devices/system/memory/memoryX/state to online/offline pages
>>>>    provided by this memory device
>>>> 5. hotremove the memory device
>>>>    You can hotremove the memory device by the hardware, or by writing 1
>>>>    to /sys/bus/acpi/devices/PNP0C80:XX/eject.
>>>>
>>>> Note: if the memory provided by the memory device is used by the
>>>> kernel, it can't be offlined. It is not a bug.
>>>>
>>>> Known problems:
>>>> 1. memory can't be offlined when CONFIG_MEMCG is selected.
>>>>    For example: there is a memory device on node 1. The address range
>>>>    is [1G, 1.5G). You will find 4 new directories memory8, memory9,
>>>>    memory10, and memory11 under the directory
>>>>    /sys/devices/system/memory/.
>>>>    If CONFIG_MEMCG is selected, we will allocate memory to store page
>>>>    cgroup when we online pages. When we online memory8, the memory that
>>>>    stores the page cgroup is not provided by this memory device. But
>>>>    when we online memory9, the memory that stores the page cgroup may be
>>>>    provided by memory8. So we can't offline memory8 now. We should
>>>>    offline the memory in the reverse order.
>>>>    When the memory device is hotremoved, we will auto offline memory
>>>>    provided by this memory device. But we don't know which memory is
>>>>    onlined first, so offlining memory may fail. In such a case, you
>>>>    should offline the memory by hand before hotremoving the memory
>>>>    device.
>>>> 2. hotremoving a memory device may cause a kernel panic
>>>>    This bug will be fixed by Liu Jiang's patch:
>>>>    https://lkml.org/lkml/2012/7/3/1
>>>>
>>>> change log of v9:
>>>>  [RFC PATCH v9 8/21]
>>>>    * add a lock to protect the list map_entries
>>>>    * add an indicator to firmware_map_entry to remember whether the
>>>>      memory is allocated from bootmem
>>>>  [RFC PATCH v9 10/21]
>>>>    * change the macro to inline function
>>>>  [RFC PATCH v9 19/21]
>>>>    * don't offline the node if the cpu on the node is onlined
>>>>  [RFC PATCH v9 21/21]
>>>>    * create new patch: auto offline page_cgroup when onlining memory
>>>>      block fail
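Whether a system exposes a hot-pluggable memory device, as discussed in this exchange, can be checked by looking for PNP0C80:XX entries under /sys/bus/acpi/devices/. The following is a minimal sketch of that check; the function name `find_memory_devices` is a hypothetical helper, and the directory is a parameter so that the function can be tried against a test directory.

```python
import os


def find_memory_devices(acpi_devices_dir="/sys/bus/acpi/devices"):
    """Return the sorted PNP0C80:XX entries found in the given directory.

    An empty list means no ACPI memory device is visible, which is the
    situation reported in the listing above.
    """
    try:
        names = os.listdir(acpi_devices_dir)
    except FileNotFoundError:
        # No ACPI device directory at all (for example, on a non-ACPI system).
        return []
    return sorted(name for name in names if name.startswith("PNP0C80:"))
```

An empty result is consistent with Yasuaki's answer that the hardware (or the guest configuration) does not provide a memory device to hot-remove.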
Re: [RFC v9 PATCH 00/21] memory-hotplug: hot-remove physical memory
On 10/01/2012 12:44 PM, Yasuaki Ishimatsu wrote:
> Hi Chen,
>
> 2012/09/29 17:19, Ni zhan Chen wrote:
>> On 09/05/2012 05:25 PM, we...@cn.fujitsu.com wrote:
>>> From: Wen Congyang
>>>
>>> This patch series aims to support physical memory hot-remove.
>>>
>>> The patches can free/remove the following things:
>>>
>>>   - acpi_memory_info                          : [RFC PATCH 4/19]
>>>   - /sys/firmware/memmap/X/{end, start, type} : [RFC PATCH 8/19]
>>>   - iomem_resource                            : [RFC PATCH 9/19]
>>>   - mem_section and related sysfs files       : [RFC PATCH 10-11, 13-16/19]
>>>   - page table of removed memory              : [RFC PATCH 12/19]
>>>   - node and related sysfs files              : [RFC PATCH 18-19/19]
>>>
>>> If you find any missing functionality for physical memory hot-remove,
>>> please let me know.
>>>
>>> How to test this patchset?
>>> 1. apply this patchset and build the kernel. MEMORY_HOTPLUG,
>>>    MEMORY_HOTREMOVE, ACPI_HOTPLUG_MEMORY must be selected.
>>> 2. load the module acpi_memhotplug
>>
>> Hi Yasuaki, where is the acpi_memhotplug module?
>
> If you build acpi_memhotplug as a module, it is created under the
> /lib/modules//driver/acpi/ directory. It depends on config
> ACPI_HOTPLUG_MEMORY. If the config is [*], it becomes a built-in
> function, so you don't need to care about it.
>
> Thanks,
> Yasuaki Ishimatsu

Hi Yasuaki,

I built the kernel with MEMORY_HOTPLUG, MEMORY_HOTREMOVE and
ACPI_HOTPLUG_MEMORY selected as [*], but I can't find PNP0C80:XX under
the directory /sys/bus/acpi/devices/.

[root@localhost ~]# ls /sys/bus/acpi/devices/
device:00 device:07 device:0e device:15 device:1c device:23 device:2a LNXCPU:00 LNXCPU:07 PNP0501:00 PNP0C02:00 PNP0C0F:02 PNP0C14:01
device:01 device:08 device:0f device:16 device:1d device:24 device:2b LNXCPU:01 LNXPWRBN:00 PNP0800:00 PNP0C02:01 PNP0C0F:03 PNP0C31:00
device:02 device:09 device:10 device:17 device:1e device:25 device:2c LNXCPU:02 LNXSYSTM:00 PNP0A08:00 PNP0C02:02 PNP0C0F:04
device:03 device:0a device:11 device:18 device:1f device:26 device:2d LNXCPU:03 PNP:00 PNP0B00:00 PNP0C04:00 PNP0C0F:05
device:04 device:0b device:12 device:19 device:20 device:27 device:2e LNXCPU:04 PNP0100:00 PNP0C01:00 PNP0C0C:00 PNP0C0F:06
device:05 device:0c device:13 device:1a device:21 device:28 device:2f LNXCPU:05 PNP0103:00 PNP0C01:01 PNP0C0F:00 PNP0C0F:07
device:06 device:0d device:14 device:1b device:22 device:29 INT3F0D:00 LNXCPU:06 PNP0200:00 PNP0C01:02 PNP0C0F:01 PNP0C14:00

So what am I missing? Thanks.

>>> 3. hotplug the memory device (it depends on your hardware)
>>>    You will see the memory device under the directory
>>>    /sys/bus/acpi/devices/. Its name is PNP0C80:XX.
>>> 4. online/offline pages provided by this memory device
>>>    You can write online/offline to
>>>    /sys/devices/system/memory/memoryX/state to online/offline pages
>>>    provided by this memory device
>>> 5. hotremove the memory device
>>>    You can hotremove the memory device by the hardware, or by writing 1
>>>    to /sys/bus/acpi/devices/PNP0C80:XX/eject.
>>>
>>> Note: if the memory provided by the memory device is used by the
>>> kernel, it can't be offlined. It is not a bug.
>>>
>>> Known problems:
>>> 1. memory can't be offlined when CONFIG_MEMCG is selected.
>>>    For example: there is a memory device on node 1. The address range
>>>    is [1G, 1.5G). You will find 4 new directories memory8, memory9,
>>>    memory10, and memory11 under the directory
>>>    /sys/devices/system/memory/.
>>>    If CONFIG_MEMCG is selected, we will allocate memory to store page
>>>    cgroup when we online pages. When we online memory8, the memory that
>>>    stores the page cgroup is not provided by this memory device. But
>>>    when we online memory9, the memory that stores the page cgroup may be
>>>    provided by memory8. So we can't offline memory8 now. We should
>>>    offline the memory in the reverse order.
>>>    When the memory device is hotremoved, we will auto offline memory
>>>    provided by this memory device. But we don't know which memory is
>>>    onlined first, so offlining memory may fail. In such a case, you
>>>    should offline the memory by hand before hotremoving the memory
>>>    device.
>>> 2. hotremoving a memory device may cause a kernel panic
>>>    This bug will be fixed by Liu Jiang's patch:
>>>    https://lkml.org/lkml/2012/7/3/1
>>>
>>> change log of v9:
>>>  [RFC PATCH v9 8/21]
>>>    * add a lock to protect the list map_entries
>>>    * add an indicator to firmware_map_entry to remember whether the
>>>      memory is allocated from bootmem
>>>  [RFC PATCH v9 10/21]
>>>    * change the macro to inline function
>>>  [RFC PATCH v9 19/21]
>>>    * don't offline the node if the cpu on the node is onlined
>>>  [RFC PATCH v9 21/21]
>>>    * create new patch: auto offline page_cgroup when onlining memory
>>>      block failed
>>>
>>> change log of v8:
>>>  [RFC PATCH v8 17/20]
>>>    * Fix problems when one node's range include the other nodes
>>>  [RFC PATCH v8 18/20]
>>>    * fix building error when CONFIG_MEMORY_HOTPLUG_SPARSE or
>>>      CONFIG_HUGETLBFS is not defined.
>>>  [RFC PATCH v8 19/20]
>>>    * don't offline node
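The manual workaround for known problem 1 in the quoted cover letter (offline the memory blocks in the reverse of the order in which they were onlined, before hot-removing the device) could be scripted roughly as below. This is a sketch under two assumptions that are not stated in the thread: the blocks were onlined in ascending index order (memory8 first), and the helper names `block_index` and `offline_in_reverse` are hypothetical. The sysfs directory is a parameter so the logic can be tried on a scratch directory first.

```python
import os
import re


def block_index(name):
    """Return the numeric index of a 'memoryN' directory name, or None."""
    match = re.fullmatch(r"memory(\d+)", name)
    return int(match.group(1)) if match else None


def offline_in_reverse(block_names, memory_dir="/sys/devices/system/memory"):
    """Write 'offline' to each block's state file, highest index first.

    Every entry of block_names must look like 'memoryN'. Returns the names
    in the order they were processed. On real sysfs a failed offline
    surfaces as OSError from the write, which is deliberately not caught
    here so that the caller stops before attempting an eject.
    """
    # Sort numerically so that memory10 and memory11 come before memory9.
    ordered = sorted(block_names, key=block_index, reverse=True)
    for name in ordered:
        with open(os.path.join(memory_dir, name, "state"), "w") as state:
            state.write("offline")
    return ordered
```

For the example in the cover letter, the call would process memory11, memory10, memory9 and then memory8; only if all four writes succeed would one go on to write 1 to the device's eject file.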
Re: [RFC v9 PATCH 00/21] memory-hotplug: hot-remove physical memory
Hi Chen,

2012/09/29 17:19, Ni zhan Chen wrote:
> On 09/05/2012 05:25 PM, we...@cn.fujitsu.com wrote:
>> From: Wen Congyang
>>
>> This patch series aims to support physical memory hot-remove.
>>
>> The patches can free/remove the following things:
>>
>>   - acpi_memory_info                          : [RFC PATCH 4/19]
>>   - /sys/firmware/memmap/X/{end, start, type} : [RFC PATCH 8/19]
>>   - iomem_resource                            : [RFC PATCH 9/19]
>>   - mem_section and related sysfs files       : [RFC PATCH 10-11, 13-16/19]
>>   - page table of removed memory              : [RFC PATCH 12/19]
>>   - node and related sysfs files              : [RFC PATCH 18-19/19]
>>
>> If you find any missing functionality for physical memory hot-remove,
>> please let me know.
>>
>> How to test this patchset?
>> 1. apply this patchset and build the kernel. MEMORY_HOTPLUG,
>>    MEMORY_HOTREMOVE, ACPI_HOTPLUG_MEMORY must be selected.
>> 2. load the module acpi_memhotplug
>
> Hi Yasuaki, where is the acpi_memhotplug module?

If you build acpi_memhotplug as a module, it is created under the
/lib/modules//driver/acpi/ directory. It depends on config
ACPI_HOTPLUG_MEMORY. If the config is [*], it becomes a built-in
function, so you don't need to care about it.

Thanks,
Yasuaki Ishimatsu

>> 3. hotplug the memory device (it depends on your hardware)
>>    You will see the memory device under the directory
>>    /sys/bus/acpi/devices/. Its name is PNP0C80:XX.
>> 4. online/offline pages provided by this memory device
>>    You can write online/offline to
>>    /sys/devices/system/memory/memoryX/state to online/offline pages
>>    provided by this memory device
>> 5. hotremove the memory device
>>    You can hotremove the memory device by the hardware, or by writing 1
>>    to /sys/bus/acpi/devices/PNP0C80:XX/eject.
>>
>> Note: if the memory provided by the memory device is used by the
>> kernel, it can't be offlined. It is not a bug.
>>
>> Known problems:
>> 1. memory can't be offlined when CONFIG_MEMCG is selected.
>>    For example: there is a memory device on node 1. The address range
>>    is [1G, 1.5G). You will find 4 new directories memory8, memory9,
>>    memory10, and memory11 under the directory
>>    /sys/devices/system/memory/.
>>    If CONFIG_MEMCG is selected, we will allocate memory to store page
>>    cgroup when we online pages. When we online memory8, the memory that
>>    stores the page cgroup is not provided by this memory device. But
>>    when we online memory9, the memory that stores the page cgroup may be
>>    provided by memory8. So we can't offline memory8 now. We should
>>    offline the memory in the reverse order.
>>    When the memory device is hotremoved, we will auto offline memory
>>    provided by this memory device. But we don't know which memory is
>>    onlined first, so offlining memory may fail. In such a case, you
>>    should offline the memory by hand before hotremoving the memory
>>    device.
>> 2. hotremoving a memory device may cause a kernel panic
>>    This bug will be fixed by Liu Jiang's patch:
>>    https://lkml.org/lkml/2012/7/3/1
>>
>> change log of v9:
>>  [RFC PATCH v9 8/21]
>>    * add a lock to protect the list map_entries
>>    * add an indicator to firmware_map_entry to remember whether the
>>      memory is allocated from bootmem
>>  [RFC PATCH v9 10/21]
>>    * change the macro to inline function
>>  [RFC PATCH v9 19/21]
>>    * don't offline the node if the cpu on the node is onlined
>>  [RFC PATCH v9 21/21]
>>    * create new patch: auto offline page_cgroup when onlining memory
>>      block failed
>>
>> change log of v8:
>>  [RFC PATCH v8 17/20]
>>    * Fix problems when one node's range include the other nodes
>>  [RFC PATCH v8 18/20]
>>    * fix building error when CONFIG_MEMORY_HOTPLUG_SPARSE or
>>      CONFIG_HUGETLBFS is not defined.
>>  [RFC PATCH v8 19/20]
>>    * don't offline node when some memory sections are not removed
>>  [RFC PATCH v8 20/20]
>>    * create new patch: clear hwpoisoned flag when onlining pages
>>
>> change log of v7:
>>  [RFC PATCH v7 4/19]
>>    * do not continue if acpi_memory_device_remove_memory() fails.
>>  [RFC PATCH v7 15/19]
>>    * handle usemap in register_page_bootmem_info_section() too.
>>
>> change log of v6:
>>  [RFC PATCH v6 12/19]
>>    * fix building error on other architectures than x86
>>  [RFC PATCH v6 15-16/19]
>>    * fix building error on other architectures than x86
>>
>> change log of v5:
>>  * merge the patchset to clear page table and the patchset to hot
>>    remove memory (from ishimatsu) to one big patchset.
>>  [RFC PATCH v5 1/19]
>>    * rename remove_memory() to offline_memory()/offline_pages()
>>  [RFC PATCH v5 2/19]
>>    * new patch: implement offline_memory(). This function offlines
>>      pages, update memory block's state, and notify the userspace that
>>      the memory block's state is changed.
>>  [RFC PATCH v5 4/19]
>>    * offline and remove memory in acpi_memory_disable_device() too.
>>  [RFC PATCH v5 17/19]
>>    * new patch: add a new function __remove_zone() to revert the
>>      things done in the function __add_zone().
>>  [RFC PATCH v5 18/19]
>>    * flush work before resetting node device.
>>
>> change log of v4:
>>  * remove "memory-hotplug : unify argument of fir
Re: [RFC v9 PATCH 00/21] memory-hotplug: hot-remove physical memory
On 09/05/2012 05:25 PM, we...@cn.fujitsu.com wrote:
> From: Wen Congyang
>
> This patch series aims to support physical memory hot-remove.
>
> The patches can free/remove the following things:
>
>   - acpi_memory_info                          : [RFC PATCH 4/19]
>   - /sys/firmware/memmap/X/{end, start, type} : [RFC PATCH 8/19]
>   - iomem_resource                            : [RFC PATCH 9/19]
>   - mem_section and related sysfs files       : [RFC PATCH 10-11, 13-16/19]
>   - page table of removed memory              : [RFC PATCH 12/19]
>   - node and related sysfs files              : [RFC PATCH 18-19/19]
>
> If you find any missing functionality for physical memory hot-remove,
> please let me know.
>
> How to test this patchset?
> 1. apply this patchset and build the kernel. MEMORY_HOTPLUG,
>    MEMORY_HOTREMOVE, ACPI_HOTPLUG_MEMORY must be selected.
> 2. load the module acpi_memhotplug

Hi Yasuaki, where is the acpi_memhotplug module?

> 3. hotplug the memory device (it depends on your hardware)
>    You will see the memory device under the directory
>    /sys/bus/acpi/devices/. Its name is PNP0C80:XX.
> 4. online/offline pages provided by this memory device
>    You can write online/offline to
>    /sys/devices/system/memory/memoryX/state to online/offline pages
>    provided by this memory device
> 5. hotremove the memory device
>    You can hotremove the memory device by the hardware, or by writing 1
>    to /sys/bus/acpi/devices/PNP0C80:XX/eject.
>
> Note: if the memory provided by the memory device is used by the
> kernel, it can't be offlined. It is not a bug.
>
> Known problems:
> 1. memory can't be offlined when CONFIG_MEMCG is selected.
>    For example: there is a memory device on node 1. The address range
>    is [1G, 1.5G). You will find 4 new directories memory8, memory9,
>    memory10, and memory11 under the directory
>    /sys/devices/system/memory/.
>    If CONFIG_MEMCG is selected, we will allocate memory to store page
>    cgroup when we online pages. When we online memory8, the memory that
>    stores the page cgroup is not provided by this memory device. But
>    when we online memory9, the memory that stores the page cgroup may be
>    provided by memory8. So we can't offline memory8 now. We should
>    offline the memory in the reverse order.
>    When the memory device is hotremoved, we will auto offline memory
>    provided by this memory device. But we don't know which memory is
>    onlined first, so offlining memory may fail. In such a case, you
>    should offline the memory by hand before hotremoving the memory
>    device.
> 2. hotremoving a memory device may cause a kernel panic
>    This bug will be fixed by Liu Jiang's patch:
>    https://lkml.org/lkml/2012/7/3/1
>
> change log of v9:
>  [RFC PATCH v9 8/21]
>    * add a lock to protect the list map_entries
>    * add an indicator to firmware_map_entry to remember whether the
>      memory is allocated from bootmem
>  [RFC PATCH v9 10/21]
>    * change the macro to inline function
>  [RFC PATCH v9 19/21]
>    * don't offline the node if the cpu on the node is onlined
>  [RFC PATCH v9 21/21]
>    * create new patch: auto offline page_cgroup when onlining memory
>      block failed
>
> change log of v8:
>  [RFC PATCH v8 17/20]
>    * Fix problems when one node's range include the other nodes
>  [RFC PATCH v8 18/20]
>    * fix building error when CONFIG_MEMORY_HOTPLUG_SPARSE or
>      CONFIG_HUGETLBFS is not defined.
>  [RFC PATCH v8 19/20]
>    * don't offline node when some memory sections are not removed
>  [RFC PATCH v8 20/20]
>    * create new patch: clear hwpoisoned flag when onlining pages
>
> change log of v7:
>  [RFC PATCH v7 4/19]
>    * do not continue if acpi_memory_device_remove_memory() fails.
>  [RFC PATCH v7 15/19]
>    * handle usemap in register_page_bootmem_info_section() too.
>
> change log of v6:
>  [RFC PATCH v6 12/19]
>    * fix building error on other architectures than x86
>  [RFC PATCH v6 15-16/19]
>    * fix building error on other architectures than x86
>
> change log of v5:
>  * merge the patchset to clear page table and the patchset to hot
>    remove memory (from ishimatsu) to one big patchset.
>  [RFC PATCH v5 1/19]
>    * rename remove_memory() to offline_memory()/offline_pages()
>  [RFC PATCH v5 2/19]
>    * new patch: implement offline_memory(). This function offlines
>      pages, update memory block's state, and notify the userspace that
>      the memory block's state is changed.
>  [RFC PATCH v5 4/19]
>    * offline and remove memory in acpi_memory_disable_device() too.
>  [RFC PATCH v5 17/19]
>    * new patch: add a new function __remove_zone() to revert the
>      things done in the function __add_zone().
>  [RFC PATCH v5 18/19]
>    * flush work before resetting node device.
>
> change log of v4:
>  * remove "memory-hotplug : unify argument of firmware_map_add_early/hotplug"
>    from the patch series, since the patch is a bugfix. It is being
>    discussed on other thread. But for testing the patch series, the
>    patch is needed. So I added the patch as [PATCH 0/13].
>  [RFC PATCH v4 2/13]
>    * check memory is online or not at remove_memory()
>    *
Re: [RFC v9 PATCH 00/21] memory-hotplug: hot-remove physical memory
On 09/05/2012 05:25 PM, we...@cn.fujitsu.com wrote:
> From: Wen Congyang
>
> This patch series aims to support physical memory hot-remove.
>
> The patches can free/remove the following things:
>
>   - acpi_memory_info                          : [RFC PATCH 4/19]
>   - /sys/firmware/memmap/X/{end, start, type} : [RFC PATCH 8/19]
>   - iomem_resource                            : [RFC PATCH 9/19]
>   - mem_section and related sysfs files       : [RFC PATCH 10-11, 13-16/19]
>   - page table of removed memory              : [RFC PATCH 12/19]
>   - node and related sysfs files              : [RFC PATCH 18-19/19]
>
> If you find any function missing for physical memory hot-remove, please
> let me know.

Since the patchset is so big, could you add more to the patchset changelog
to describe how this patchset works, so that it is easier to review?

> How to test this patchset?
> 1. apply this patchset and build the kernel. MEMORY_HOTPLUG,
>    MEMORY_HOTREMOVE, ACPI_HOTPLUG_MEMORY must be selected.
> 2. load the module acpi_memhotplug
> 3. hotplug the memory device (it depends on your hardware)
>    You will see the memory device under the directory
>    /sys/bus/acpi/devices/. Its name is PNP0C80:XX.
> 4. online/offline pages provided by this memory device
>    You can write online/offline to /sys/devices/system/memory/memoryX/state
>    to online/offline pages provided by this memory device
> 5. hotremove the memory device
>    You can hotremove the memory device by the hardware, or by writing 1 to
>    /sys/bus/acpi/devices/PNP0C80:XX/eject.
>
> Note: if the memory provided by the memory device is used by the kernel, it
> can't be offlined. It is not a bug.
>
> Known problems:
> 1. memory can't be offlined when CONFIG_MEMCG is selected.
>    For example: there is a memory device on node 1. The address range
>    is [1G, 1.5G). You will find 4 new directories memory8, memory9,
>    memory10, and memory11 under the directory /sys/devices/system/memory/.
>    If CONFIG_MEMCG is selected, we will allocate memory to store page
>    cgroup when we online pages. When we online memory8, the memory that
>    stores the page cgroup is not provided by this memory device. But when
>    we online memory9, the memory that stores the page cgroup may be
>    provided by memory8. So we can't offline memory8 now. We should offline
>    the memory in the reversed order.
>    When the memory device is hotremoved, we will auto offline memory
>    provided by this memory device. But we don't know which memory is
>    onlined first, so offlining memory may fail. In such a case, you should
>    offline the memory by hand before hotremoving the memory device.
> 2. hotremoving a memory device may cause a kernel panic
>    This bug will be fixed by Liu Jiang's patch:
>    https://lkml.org/lkml/2012/7/3/1
>
> change log of v9:
>  [RFC PATCH v9 8/21]
>    * add a lock to protect the list map_entries
>    * add an indicator to firmware_map_entry to remember whether the memory
>      is allocated from bootmem
>  [RFC PATCH v9 10/21]
>    * change the macro to inline function
>  [RFC PATCH v9 19/21]
>    * don't offline the node if the cpu on the node is onlined
>  [RFC PATCH v9 21/21]
>    * create new patch: auto offline page_cgroup when onlining memory block
>      failed
>
> change log of v8:
>  [RFC PATCH v8 17/20]
>    * Fix problems when one node's range includes the other nodes
>  [RFC PATCH v8 18/20]
>    * fix building error when CONFIG_MEMORY_HOTPLUG_SPARSE or
>      CONFIG_HUGETLBFS is not defined.
>  [RFC PATCH v8 19/20]
>    * don't offline node when some memory sections are not removed
>  [RFC PATCH v8 20/20]
>    * create new patch: clear hwpoisoned flag when onlining pages
>
> change log of v7:
>  [RFC PATCH v7 4/19]
>    * do not continue if acpi_memory_device_remove_memory() fails.
>  [RFC PATCH v7 15/19]
>    * handle usemap in register_page_bootmem_info_section() too.
>
> change log of v6:
>  [RFC PATCH v6 12/19]
>    * fix building error on architectures other than x86
>  [RFC PATCH v6 15-16/19]
>    * fix building error on architectures other than x86
>
> change log of v5:
>  * merge the patchset to clear page table and the patchset to hot remove
>    memory (from ishimatsu) into one big patchset.
>  [RFC PATCH v5 1/19]
>    * rename remove_memory() to offline_memory()/offline_pages()
>  [RFC PATCH v5 2/19]
>    * new patch: implement offline_memory(). This function offlines pages,
>      updates the memory block's state, and notifies userspace that the
>      memory block's state has changed.
>  [RFC PATCH v5 4/19]
>    * offline and remove memory in acpi_memory_disable_device() too.
>  [RFC PATCH v5 17/19]
>    * new patch: add a new function __remove_zone() to revert the things
>      done in the function __add_zone().
>  [RFC PATCH v5 18/19]
>    * flush work before resetting node device.
>
> change log of v4:
>  * remove "memory-hotplug : unify argument of
>    firmware_map_add_early/hotplug" from the patch series, since the patch
>    is a bugfix. It is being discussed on another thread. But for testing
>    the patch series, the patch is needed. So I added the patch as [PATC
Re: [RFC v9 PATCH 00/21] memory-hotplug: hot-remove physical memory
At 09/27/2012 06:35 PM, Vasilis Liaskovitis Wrote:
> On Thu, Sep 27, 2012 at 02:37:14PM +0800, Wen Congyang wrote:
>> Hi Vasilis Liaskovitis
>>
>> At 09/27/2012 12:46 AM, Vasilis Liaskovitis Wrote:
>>> Hi,
>>>
>>> I am testing 3.6.0-rc7 with this v9 patchset plus more recent fixes
>>> [1],[2],[3]
>>> Running in a guest (qemu+seabios from [4]).
>>> CONFIG_SLAB=y
>>> CONFIG_DEBUG_SLAB=y
>>>
>>> After successful hot-add and online, I am doing a hot-remove with "echo 1 >
>>> /sys/bus/acpi/devices/PNP/eject"
>>> When I do the OSPM-eject, I often get slab corruption in "acpi-state"
>>> cache, or in other caches
>>
>> I can't reproduce this problem. Can you provide the following information:
>> 1. config file
>> 2. qemu's command line
>>
>> You said you did OSPM-eject. Do you mean write 1 to
>> /sys/bus/acpi/devices/PNP0C80:XX/eject?
> yes.
>
> example qemu command line with one dimm:
>
> "/opt/qemu-kvm-memhp/bin/qemu-system-x86_64 -bios
> /opt/extra/vliaskov/devel/seabios-upstream/out/bios.bin -enable-kvm -M pc -smp
> 4,maxcpus=8 -cpu host -m 2048 -drive
> file=/opt/extra/debian-template.raw,if=none,id=drive-virtio-disk0,format=raw
> -device
> virtio-blk-pci,bus=pci.0,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
> -vga cirrus -netdev type=tap,id=guest0,vhost=on -device
> virtio-net-pci,netdev=guest0
> -monitor unix:/tmp/qemu.monitor11,server,nowait -chardev stdio,id=seabios
> -device
> isa-debugcon,iobase=0x402,chardev=seabios
> -dimm id=n0,size=512M,node=0"
>
> or last line with 2 numa nodes:
> "-dimm id=n0,size=512M,node=0 -dimm id=n1,size=512M,node=1 -numa
> node,nodeid=0 -numa node,nodeid=1"

I have reproduced this problem. It can only be reproduced when the dimm's
memory is on node 0. I am investigating it now.

Thanks
Wen Congyang

>
> attached config. Tree is at:
> https://github.com/vliaskov/linux/commits/memhp-fujitsu
>
> thanks,
>
> - Vasilis
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC v9 PATCH 00/21] memory-hotplug: hot-remove physical memory
On Thu, Sep 27, 2012 at 06:06:30PM +0800, Wen Congyang wrote:
> Please try the following patch:
> From a38ec678e0a9b48b252f457d7910b7527049dc43 Mon Sep 17 00:00:00 2001
> From: Wen Congyang
> Date: Thu, 27 Sep 2012 17:27:57 +0800
> Subject: [PATCH] clear the memory to store page information

this solves the hot re-add problem for me.

thanks for the quick solution.

- Vasilis

>
> ---
>  mm/sparse.c |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
>
> diff --git a/mm/sparse.c b/mm/sparse.c
> index ab9d755..36dda08 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -639,7 +639,6 @@ static struct page *__kmalloc_section_memmap(unsigned long nr_pages)
>  got_map_page:
>  	ret = (struct page *)pfn_to_kaddr(page_to_pfn(page));
>  got_map_ptr:
> -	memset(ret, 0, memmap_size);
>
>  	return ret;
>  }
> @@ -761,6 +760,8 @@ int __meminit sparse_add_one_section(struct zone *zone, unsigned long start_pfn,
>  		goto out;
>  	}
>
> +	memset(memmap, 0, sizeof(struct page) * nr_pages);
> +
>  	ms->section_mem_map |= SECTION_MARKED_PRESENT;
>
>  	ret = sparse_init_one_section(ms, section_nr, memmap, usemap);
> --
> 1.7.1
>
> Thanks
> Wen Congyang
>
> >
> > thanks,
> >
> > - Vasilis
> >
> > [1] https://lkml.org/lkml/2012/9/6/635
> > [2] https://lkml.org/lkml/2012/9/11/542
> > [3] https://lkml.org/lkml/2012/9/20/37
> > [4] http://permalink.gmane.org/gmane.comp.emulators.kvm.devel/98691
Re: [RFC v9 PATCH 00/21] memory-hotplug: hot-remove physical memory
At 09/27/2012 12:58 AM, Vasilis Liaskovitis Wrote:
> Testing 3.6.0-rc7 with this v9 patchset plus more recent fixes [1],[2],[3]
> Running in a guest (qemu+seabios from [4]).
> CONFIG_SLAB=y
> CONFIG_DEBUG_SLAB=y
>
> - successful hot-add and online
> - successful hot-remove with SCI (qemu) eject
> - attempt to hot-readd same memory
>
> When the pages are re-onlined on hot-readd, I get a bad_page state for many
> pages e.g.
>
> [ 59.611278] init_memory_mapping: [mem 0x8000-0x9fff]
> [ 59.637836] Built 2 zonelists in Node order, mobility grouping on. Total
> pages: 547617
> [ 59.638739] Policy zone: Normal
> [ 59.650840] BUG: Bad page state in process bash pfn:9b6dc
> [ 59.651124] page:ea0002200020 count:0 mapcount:0 mapping:
> (null) index:0xfdfdfdfdfdfdfdfd
> [ 59.651494] page flags:
> 0x2fdfdfdfd5df9fd(locked|referenced|uptodate|dirty|lru|active|slab|owner_priv_1|private|private_2|writeback|head|tail|swapcache|reclaim|swapbacked|unevictable|uncached|compound_lock)
> [ 59.653604] Modules linked in: netconsole acpiphp pci_hotplug
> acpi_memhotplug loop kvm_amd kvm microcode tpm_tis tpm tpm_bios evdev psmouse
> serio_raw i2c_piix4 i2c_core parport_pc parport processor button thermal_sys
> ext3 jbd mbcache sg sr_mod cdrom ata_generic virtio_net ata_piix virtio_blk
> libata virtio_pci virtio_ring virtio scsi_mod
> [ 59.656998] Pid: 988, comm: bash Not tainted 3.6.0-rc7-guest #12
> [ 59.657172] Call Trace:
> [ 59.657275] [] ? bad_page+0xb0/0x100
> [ 59.657434] [] ? free_pages_prepare+0xb3/0x100
> [ 59.657610] [] ? free_hot_cold_page+0x48/0x1a0
> [ 59.657787] [] ? online_pages_range+0x68/0xa0
> [ 59.657961] [] ?
> __online_page_increment_counters+0x10/0x10
> [ 59.658162] [] ? walk_system_ram_range+0x101/0x110
> [ 59.658346] [] ? online_pages+0x1a5/0x2b0
> [ 59.658515] [] ? __memory_block_change_state+0x20d/0x270
> [ 59.658710] [] ? store_mem_state+0xb6/0xf0
> [ 59.658878] [] ? sysfs_write_file+0xd2/0x160
> [ 59.659052] [] ? vfs_write+0xaa/0x160
> [ 59.659212] [] ? sys_write+0x47/0x90
> [ 59.659371] [] ? async_page_fault+0x25/0x30
> [ 59.659543] [] ? system_call_fastpath+0x16/0x1b
> [ 59.659720] Disabling lock debugging due to kernel taint
>
> Patch 20/21 deals with a similar scenario, but only for __PG_HWPOISON flag.
> Did I miss any other patch for this?

Please try the following patch:

From a38ec678e0a9b48b252f457d7910b7527049dc43 Mon Sep 17 00:00:00 2001
From: Wen Congyang
Date: Thu, 27 Sep 2012 17:27:57 +0800
Subject: [PATCH] clear the memory to store page information

---
 mm/sparse.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/mm/sparse.c b/mm/sparse.c
index ab9d755..36dda08 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -639,7 +639,6 @@ static struct page *__kmalloc_section_memmap(unsigned long nr_pages)
 got_map_page:
 	ret = (struct page *)pfn_to_kaddr(page_to_pfn(page));
 got_map_ptr:
-	memset(ret, 0, memmap_size);

 	return ret;
 }
@@ -761,6 +760,8 @@ int __meminit sparse_add_one_section(struct zone *zone, unsigned long start_pfn,
 		goto out;
 	}

+	memset(memmap, 0, sizeof(struct page) * nr_pages);
+
 	ms->section_mem_map |= SECTION_MARKED_PRESENT;

 	ret = sparse_init_one_section(ms, section_nr, memmap, usemap);
--
1.7.1

Thanks
Wen Congyang

>
> thanks,
>
> - Vasilis
>
> [1] https://lkml.org/lkml/2012/9/6/635
> [2] https://lkml.org/lkml/2012/9/11/542
> [3] https://lkml.org/lkml/2012/9/20/37
> [4] http://permalink.gmane.org/gmane.comp.emulators.kvm.devel/98691
Re: [RFC v9 PATCH 00/21] memory-hotplug: hot-remove physical memory
At 09/27/2012 12:58 AM, Vasilis Liaskovitis Wrote: > Testing 3.6.0-rc7 with this v9 patchset plus more recent fixes [1],[2],[3] > Running in a guest (qemu+seabios from [4]). > CONFIG_SLAB=y > CONFIG_DEBUG_SLAB=y > > - succesfull hot-add and online > - succesfull hot-remove with SCI (qemu) eject > - attempt to hot-readd same memory > > When the pages are re-onlined on hot-readd, I get a bad_page state for many > pages e.g. I have reproduced this problem, and I investigate it now. Thanks Wen Congyang > > [ 59.611278] init_memory_mapping: [mem 0x8000-0x9fff] > [ 59.637836] Built 2 zonelists in Node order, mobility grouping on. Total > pages: 547617 > [ 59.638739] Policy zone: Normal > [ 59.650840] BUG: Bad page state in process bash pfn:9b6dc > [ 59.651124] page:ea0002200020 count:0 mapcount:0 mapping: > (null) index:0xfdfdfdfdfdfdfdfd > [ 59.651494] page flags: > 0x2fdfdfdfd5df9fd(locked|referenced|uptodate|dirty|lru|active|slab|owner_priv_1|private|private_2|writeback|head|tail|swapcache|reclaim|swapbacked|unevictable|uncached|compound_lock) > [ 59.653604] Modules linked in: netconsole acpiphp pci_hotplug > acpi_memhotplug loop kvm_amd kvm microcode tpm_tis tpm tpm_bios evdev psmouse > serio_raw i2c_piix4 i2c_core parport_pc parport processor button thermal_sys > ext3 jbd mbcache sg sr_mod cdrom ata_generic virtio_net ata_piix virtio_blk > libata virtio_pci virtio_ring virtio scsi_mod > [ 59.656998] Pid: 988, comm: bash Not tainted 3.6.0-rc7-guest #12 > [ 59.657172] Call Trace: > [ 59.657275] [] ? bad_page+0xb0/0x100 > [ 59.657434] [] ? free_pages_prepare+0xb3/0x100 > [ 59.657610] [] ? free_hot_cold_page+0x48/0x1a0 > [ 59.657787] [] ? online_pages_range+0x68/0xa0 > [ 59.657961] [] ? > __online_page_increment_counters+0x10/0x10 > [ 59.658162] [] ? walk_system_ram_range+0x101/0x110 > [ 59.658346] [] ? online_pages+0x1a5/0x2b0 > [ 59.658515] [] ? __memory_block_change_state+0x20d/0x270 > [ 59.658710] [] ? store_mem_state+0xb6/0xf0 > [ 59.658878] [] ? 
sysfs_write_file+0xd2/0x160 > [ 59.659052] [] ? vfs_write+0xaa/0x160 > [ 59.659212] [] ? sys_write+0x47/0x90 > [ 59.659371] [] ? async_page_fault+0x25/0x30 > [ 59.659543] [] ? system_call_fastpath+0x16/0x1b > [ 59.659720] Disabling lock debugging due to kernel taint > > Patch 20/21 deals with a similar scenario, but only for __PG_HWPOISON flag. > Did i miss any other patch for this? > > thanks, > > - Vasilis > > [1] https://lkml.org/lkml/2012/9/6/635 > [2] https://lkml.org/lkml/2012/9/11/542 > [3] https://lkml.org/lkml/2012/9/20/37 > [4] http://permalink.gmane.org/gmane.comp.emulators.kvm.devel/98691 > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC v9 PATCH 00/21] memory-hotplug: hot-remove physical memory
Hi Vasilis Liaskovitis At 09/27/2012 12:46 AM, Vasilis Liaskovitis Wrote: > Hi, > > I am testing 3.6.0-rc7 with this v9 patchset plus more recent fixes > [1],[2],[3] > Running in a guest (qemu+seabios from [4]). > CONFIG_SLAB=y > CONFIG_DEBUG_SLAB=y > > After succesfull hot-add and online, I am doing a hot-remove with "echo 1 > > /sys/bus/acpi/devices/PNP/eject" > When I do the OSPM-eject, I often get slab corruption in "acpi-state" cache, > or in other caches I can't reproduce this problem. Can you provide the following information: 1. config file 2. qemu's command line You said you did OSPM-eject. Do you mean write 1 to /sys/bus/acpi/devices/PNP0C80:XX/eject? Thanks Wen Congyang > > [ 170.566995] Slab corruption (Not tainted): Acpi-State > start=88009fc1e548, len=80 > [ 170.567265] Redzone: 0x0/0x0. > [ 170.567399] Last user: [< (null)>](0x0) > [ 170.567667] 000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > [ 170.568078] 010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > [ 170.568487] 020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > [ 170.568894] 030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > [ 170.569302] 040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > [ 170.569712] Prev obj: start=9fc1e4d0, len=80 > [ 170.569869] BUG: unable to handle kernel paging request at 9fc1e520 > [ 170.570171] IP: [] print_objinfo+0x9c/0x110 > [ 170.570397] PGD 7cf37067 PUD 0 > [ 170.570619] Oops: [#1] SMP > [ 170.570843] Modules linked in: netconsole acpiphp pci_hotplug > acpi_memhotplug loop kvm_amd kvm tpm_tis microcode tpm tpm_bios psmouse > parport_pc serio_raw evdev parport i2c_piix4 processor thermal_sys i2c_core > button ext3 jbd mbcache sg sr_mod cdrom ata_generic virtio_net virtio_blk > ata_piix libata scsi_mod virtio_pci virtio_ring virtio > [ 170.573474] CPU 0 > [ 170.573568] Pid: 29, comm: kworker/0:1 Not tainted 3.6.0-rc7-guest #12 > Bochs Bochs > [ 170.573830] RIP: 0010:[] [] > print_objinfo+0x9c/0x110 > [ 170.574106] RSP: 
0018:88003eaf3a70 EFLAGS: 00010202 > [ 170.574268] RAX: 9fc1e4c8 RBX: 0002 RCX: > 24b8 > [ 170.574468] RDX: 9fc1e4c8 RSI: 9fc1e4c8 RDI: > 88003e9bb980 > [ 170.574668] RBP: 88003e9bb980 R08: 880037964078 R09: > > [ 170.574870] R10: 021e R11: 0002 R12: > 9fc1e4c8 > [ 170.575070] R13: 9fc1e520 R14: 004f R15: > ffa5 > [ 170.575274] FS: 7fc6b7530700() GS:88003fc0() > knlGS: > [ 170.575494] CS: 0010 DS: ES: CR0: 8005003b > [ 170.575665] CR2: 9fc1e520 CR3: 7c9c1000 CR4: > 06f0 > [ 170.575870] DR0: DR1: DR2: > > [ 170.576075] DR3: DR6: 0ff0 DR7: > 0400 > [ 170.576276] Process kworker/0:1 (pid: 29, threadinfo 88003eaf2000, > task 88003ea941c0) > [ 170.576507] Stack: > [ 170.576599] 0010 01893fbe 88009fc1e000 > 0050 > [ 170.576938] 9fc1e4c8 004f ffa5 > 8112899f > [ 170.576938] 88003eb309d8 81712d6d 88003e9bb980 > 88009fc1e540 > [ 170.576938] Call Trace: > [ 170.576938] [] ? check_poison_obj+0x1df/0x1f0 > [ 170.576938] [] ? acpi_ut_create_generic_state+0x2f/0x4c > [ 170.576938] [] ? acpi_ut_create_generic_state+0x2f/0x4c > [ 170.576938] [] ? > cache_alloc_debugcheck_after.isra.52+0xed/0x220 > [ 170.576938] [] ? acpi_ut_create_generic_state+0x2f/0x4c > [ 170.576938] [] ? kmem_cache_alloc+0xb5/0x1e0 > [ 170.576938] [] ? acpi_ut_create_generic_state+0x2f/0x4c > [ 170.576938] [] ? acpi_ds_result_push+0x5d/0x12e > [ 170.576938] [] ? acpi_ds_exec_end_op+0x28e/0x3d3 > [ 170.576938] [] ? acpi_ps_parse_loop+0x79f/0x931 > [ 170.576938] [] ? acpi_ps_parse_aml+0x89/0x261 > [ 170.576938] [] ? acpi_ps_execute_method+0x1be/0x266 > [ 170.576938] [] ? acpi_ns_evaluate+0xd3/0x19a > [ 170.576938] [] ? acpi_evaluate_object+0xf3/0x1f4 > [ 170.576938] [] ? acpi_os_wait_events_complete+0x1b/0x1b > [ 170.576938] [] ? acpi_bus_hot_remove_device+0xeb/0x123 > [ 170.576938] [] ? acpi_os_execute_deferred+0x1d/0x29 > [ 170.576938] [] ? process_one_work+0x125/0x560 > [ 170.576938] [] ? worker_thread+0x16a/0x4e0 > [ 170.576938] [] ? manage_workers+0x310/0x310 > [ 170.576938] [] ? 
kthread+0x85/0x90 > [ 170.576938] [] ? kernel_thread_helper+0x4/0x10 > [ 170.576938] [] ? flush_kthread_worker+0xa0/0xa0 > [ 170.576938] [] ? gs_change+0x13/0x13 > [ 170.576938] Code: cb 75 dc 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e 41 5f c3 8b > 7f 0c 4c 89 e2 e8 02 fd ff ff 4c 89 e6 49 89 c5 48 89 ef e8 d4 fc ff ff <49> > 8b 55 00 48 8b 30 48 c7 c7 8c 39 6f
Re: [RFC v9 PATCH 00/21] memory-hotplug: hot-remove physical memory
At 09/27/2012 12:58 AM, Vasilis Liaskovitis Wrote: > Testing 3.6.0-rc7 with this v9 patchset plus more recent fixes [1],[2],[3] > Running in a guest (qemu+seabios from [4]). > CONFIG_SLAB=y > CONFIG_DEBUG_SLAB=y > > - succesfull hot-add and online > - succesfull hot-remove with SCI (qemu) eject > - attempt to hot-readd same memory > > When the pages are re-onlined on hot-readd, I get a bad_page state for many > pages e.g. Can you provide your config file? Thanks Wen Congyang > > [ 59.611278] init_memory_mapping: [mem 0x8000-0x9fff] > [ 59.637836] Built 2 zonelists in Node order, mobility grouping on. Total > pages: 547617 > [ 59.638739] Policy zone: Normal > [ 59.650840] BUG: Bad page state in process bash pfn:9b6dc > [ 59.651124] page:ea0002200020 count:0 mapcount:0 mapping: > (null) index:0xfdfdfdfdfdfdfdfd > [ 59.651494] page flags: > 0x2fdfdfdfd5df9fd(locked|referenced|uptodate|dirty|lru|active|slab|owner_priv_1|private|private_2|writeback|head|tail|swapcache|reclaim|swapbacked|unevictable|uncached|compound_lock) > [ 59.653604] Modules linked in: netconsole acpiphp pci_hotplug > acpi_memhotplug loop kvm_amd kvm microcode tpm_tis tpm tpm_bios evdev psmouse > serio_raw i2c_piix4 i2c_core parport_pc parport processor button thermal_sys > ext3 jbd mbcache sg sr_mod cdrom ata_generic virtio_net ata_piix virtio_blk > libata virtio_pci virtio_ring virtio scsi_mod > [ 59.656998] Pid: 988, comm: bash Not tainted 3.6.0-rc7-guest #12 > [ 59.657172] Call Trace: > [ 59.657275] [] ? bad_page+0xb0/0x100 > [ 59.657434] [] ? free_pages_prepare+0xb3/0x100 > [ 59.657610] [] ? free_hot_cold_page+0x48/0x1a0 > [ 59.657787] [] ? online_pages_range+0x68/0xa0 > [ 59.657961] [] ? > __online_page_increment_counters+0x10/0x10 > [ 59.658162] [] ? walk_system_ram_range+0x101/0x110 > [ 59.658346] [] ? online_pages+0x1a5/0x2b0 > [ 59.658515] [] ? __memory_block_change_state+0x20d/0x270 > [ 59.658710] [] ? store_mem_state+0xb6/0xf0 > [ 59.658878] [] ? 
sysfs_write_file+0xd2/0x160 > [ 59.659052] [] ? vfs_write+0xaa/0x160 > [ 59.659212] [] ? sys_write+0x47/0x90 > [ 59.659371] [] ? async_page_fault+0x25/0x30 > [ 59.659543] [] ? system_call_fastpath+0x16/0x1b > [ 59.659720] Disabling lock debugging due to kernel taint > > Patch 20/21 deals with a similar scenario, but only for __PG_HWPOISON flag. > Did i miss any other patch for this? > > thanks, > > - Vasilis > > [1] https://lkml.org/lkml/2012/9/6/635 > [2] https://lkml.org/lkml/2012/9/11/542 > [3] https://lkml.org/lkml/2012/9/20/37 > [4] http://permalink.gmane.org/gmane.comp.emulators.kvm.devel/98691 > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC v9 PATCH 00/21] memory-hotplug: hot-remove physical memory
At 09/27/2012 12:46 AM, Vasilis Liaskovitis Wrote: > Hi, > > I am testing 3.6.0-rc7 with this v9 patchset plus more recent fixes > [1],[2],[3] > Running in a guest (qemu+seabios from [4]). > CONFIG_SLAB=y > CONFIG_DEBUG_SLAB=y > > After succesfull hot-add and online, I am doing a hot-remove with "echo 1 > > /sys/bus/acpi/devices/PNP/eject" > When I do the OSPM-eject, I often get slab corruption in "acpi-state" cache, > or in other caches > > [ 170.566995] Slab corruption (Not tainted): Acpi-State > start=88009fc1e548, len=80 > [ 170.567265] Redzone: 0x0/0x0. > [ 170.567399] Last user: [< (null)>](0x0) > [ 170.567667] 000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > [ 170.568078] 010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > [ 170.568487] 020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > [ 170.568894] 030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > [ 170.569302] 040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > [ 170.569712] Prev obj: start=9fc1e4d0, len=80 > [ 170.569869] BUG: unable to handle kernel paging request at 9fc1e520 > [ 170.570171] IP: [] print_objinfo+0x9c/0x110 > [ 170.570397] PGD 7cf37067 PUD 0 > [ 170.570619] Oops: [#1] SMP > [ 170.570843] Modules linked in: netconsole acpiphp pci_hotplug > acpi_memhotplug loop kvm_amd kvm tpm_tis microcode tpm tpm_bios psmouse > parport_pc serio_raw evdev parport i2c_piix4 processor thermal_sys i2c_core > button ext3 jbd mbcache sg sr_mod cdrom ata_generic virtio_net virtio_blk > ata_piix libata scsi_mod virtio_pci virtio_ring virtio > [ 170.573474] CPU 0 > [ 170.573568] Pid: 29, comm: kworker/0:1 Not tainted 3.6.0-rc7-guest #12 > Bochs Bochs > [ 170.573830] RIP: 0010:[] [] > print_objinfo+0x9c/0x110 > [ 170.574106] RSP: 0018:88003eaf3a70 EFLAGS: 00010202 > [ 170.574268] RAX: 9fc1e4c8 RBX: 0002 RCX: > 24b8 > [ 170.574468] RDX: 9fc1e4c8 RSI: 9fc1e4c8 RDI: > 88003e9bb980 > [ 170.574668] RBP: 88003e9bb980 R08: 880037964078 R09: > > [ 170.574870] R10: 021e R11: 0002 R12: > 
9fc1e4c8 > [ 170.575070] R13: 9fc1e520 R14: 004f R15: > ffa5 > [ 170.575274] FS: 7fc6b7530700() GS:88003fc0() > knlGS: > [ 170.575494] CS: 0010 DS: ES: CR0: 8005003b > [ 170.575665] CR2: 9fc1e520 CR3: 7c9c1000 CR4: > 06f0 > [ 170.575870] DR0: DR1: DR2: > > [ 170.576075] DR3: DR6: 0ff0 DR7: > 0400 > [ 170.576276] Process kworker/0:1 (pid: 29, threadinfo 88003eaf2000, > task 88003ea941c0) > [ 170.576507] Stack: > [ 170.576599] 0010 01893fbe 88009fc1e000 > 0050 > [ 170.576938] 9fc1e4c8 004f ffa5 > 8112899f > [ 170.576938] 88003eb309d8 81712d6d 88003e9bb980 > 88009fc1e540 > [ 170.576938] Call Trace: > [ 170.576938] [] ? check_poison_obj+0x1df/0x1f0 > [ 170.576938] [] ? acpi_ut_create_generic_state+0x2f/0x4c > [ 170.576938] [] ? acpi_ut_create_generic_state+0x2f/0x4c > [ 170.576938] [] ? > cache_alloc_debugcheck_after.isra.52+0xed/0x220 > [ 170.576938] [] ? acpi_ut_create_generic_state+0x2f/0x4c > [ 170.576938] [] ? kmem_cache_alloc+0xb5/0x1e0 > [ 170.576938] [] ? acpi_ut_create_generic_state+0x2f/0x4c > [ 170.576938] [] ? acpi_ds_result_push+0x5d/0x12e > [ 170.576938] [] ? acpi_ds_exec_end_op+0x28e/0x3d3 > [ 170.576938] [] ? acpi_ps_parse_loop+0x79f/0x931 > [ 170.576938] [] ? acpi_ps_parse_aml+0x89/0x261 > [ 170.576938] [] ? acpi_ps_execute_method+0x1be/0x266 > [ 170.576938] [] ? acpi_ns_evaluate+0xd3/0x19a > [ 170.576938] [] ? acpi_evaluate_object+0xf3/0x1f4 > [ 170.576938] [] ? acpi_os_wait_events_complete+0x1b/0x1b > [ 170.576938] [] ? acpi_bus_hot_remove_device+0xeb/0x123 > [ 170.576938] [] ? acpi_os_execute_deferred+0x1d/0x29 > [ 170.576938] [] ? process_one_work+0x125/0x560 > [ 170.576938] [] ? worker_thread+0x16a/0x4e0 > [ 170.576938] [] ? manage_workers+0x310/0x310 > [ 170.576938] [] ? kthread+0x85/0x90 > [ 170.576938] [] ? kernel_thread_helper+0x4/0x10 > [ 170.576938] [] ? flush_kthread_worker+0xa0/0xa0 > [ 170.576938] [] ? 
gs_change+0x13/0x13 > [ 170.576938] Code: cb 75 dc 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e 41 5f c3 8b > 7f 0c 4c 89 e2 e8 02 fd ff ff 4c 89 e6 49 89 c5 48 89 ef e8 d4 fc ff ff <49> > 8b 55 00 48 8b 30 48 c7 c7 8c 39 6f 81 31 c0 e8 3e 34 3b 00 > > Other times, the problem happens on a slab object free: > > [ 52.313366] Offlined Pages 32768 > [ 52.800232] slab error in verify_redzone_free(): cache `Acpi-ParseExt': > memory outside object was overwritten > [ 52
Re: [RFC v9 PATCH 00/21] memory-hotplug: hot-remove physical memory
Testing 3.6.0-rc7 with this v9 patchset plus more recent fixes [1],[2],[3]
Running in a guest (qemu+seabios from [4]).
CONFIG_SLAB=y
CONFIG_DEBUG_SLAB=y

- successful hot-add and online
- successful hot-remove with SCI (qemu) eject
- attempt to hot-readd same memory

When the pages are re-onlined on hot-readd, I get a bad_page state for many
pages e.g.

[ 59.611278] init_memory_mapping: [mem 0x8000-0x9fff]
[ 59.637836] Built 2 zonelists in Node order, mobility grouping on. Total pages: 547617
[ 59.638739] Policy zone: Normal
[ 59.650840] BUG: Bad page state in process bash pfn:9b6dc
[ 59.651124] page:ea0002200020 count:0 mapcount:0 mapping: (null) index:0xfdfdfdfdfdfdfdfd
[ 59.651494] page flags: 0x2fdfdfdfd5df9fd(locked|referenced|uptodate|dirty|lru|active|slab|owner_priv_1|private|private_2|writeback|head|tail|swapcache|reclaim|swapbacked|unevictable|uncached|compound_lock)
[ 59.653604] Modules linked in: netconsole acpiphp pci_hotplug acpi_memhotplug loop kvm_amd kvm microcode tpm_tis tpm tpm_bios evdev psmouse serio_raw i2c_piix4 i2c_core parport_pc parport processor button thermal_sys ext3 jbd mbcache sg sr_mod cdrom ata_generic virtio_net ata_piix virtio_blk libata virtio_pci virtio_ring virtio scsi_mod
[ 59.656998] Pid: 988, comm: bash Not tainted 3.6.0-rc7-guest #12
[ 59.657172] Call Trace:
[ 59.657275] [] ? bad_page+0xb0/0x100
[ 59.657434] [] ? free_pages_prepare+0xb3/0x100
[ 59.657610] [] ? free_hot_cold_page+0x48/0x1a0
[ 59.657787] [] ? online_pages_range+0x68/0xa0
[ 59.657961] [] ? __online_page_increment_counters+0x10/0x10
[ 59.658162] [] ? walk_system_ram_range+0x101/0x110
[ 59.658346] [] ? online_pages+0x1a5/0x2b0
[ 59.658515] [] ? __memory_block_change_state+0x20d/0x270
[ 59.658710] [] ? store_mem_state+0xb6/0xf0
[ 59.658878] [] ? sysfs_write_file+0xd2/0x160
[ 59.659052] [] ? vfs_write+0xaa/0x160
[ 59.659212] [] ? sys_write+0x47/0x90
[ 59.659371] [] ? async_page_fault+0x25/0x30
[ 59.659543] [] ? system_call_fastpath+0x16/0x1b
[ 59.659720] Disabling lock debugging due to kernel taint

Patch 20/21 deals with a similar scenario, but only for __PG_HWPOISON flag.
Did I miss any other patch for this?

thanks,

- Vasilis

[1] https://lkml.org/lkml/2012/9/6/635
[2] https://lkml.org/lkml/2012/9/11/542
[3] https://lkml.org/lkml/2012/9/20/37
[4] http://permalink.gmane.org/gmane.comp.emulators.kvm.devel/98691
Re: [RFC v9 PATCH 00/21] memory-hotplug: hot-remove physical memory
Hi, I am testing 3.6.0-rc7 with this v9 patchset plus more recent fixes [1],[2],[3] Running in a guest (qemu+seabios from [4]). CONFIG_SLAB=y CONFIG_DEBUG_SLAB=y After succesfull hot-add and online, I am doing a hot-remove with "echo 1 > /sys/bus/acpi/devices/PNP/eject" When I do the OSPM-eject, I often get slab corruption in "acpi-state" cache, or in other caches [ 170.566995] Slab corruption (Not tainted): Acpi-State start=88009fc1e548, len=80 [ 170.567265] Redzone: 0x0/0x0. [ 170.567399] Last user: [< (null)>](0x0) [ 170.567667] 000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 170.568078] 010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 170.568487] 020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 170.568894] 030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 170.569302] 040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 170.569712] Prev obj: start=9fc1e4d0, len=80 [ 170.569869] BUG: unable to handle kernel paging request at 9fc1e520 [ 170.570171] IP: [] print_objinfo+0x9c/0x110 [ 170.570397] PGD 7cf37067 PUD 0 [ 170.570619] Oops: [#1] SMP [ 170.570843] Modules linked in: netconsole acpiphp pci_hotplug acpi_memhotplug loop kvm_amd kvm tpm_tis microcode tpm tpm_bios psmouse parport_pc serio_raw evdev parport i2c_piix4 processor thermal_sys i2c_core button ext3 jbd mbcache sg sr_mod cdrom ata_generic virtio_net virtio_blk ata_piix libata scsi_mod virtio_pci virtio_ring virtio [ 170.573474] CPU 0 [ 170.573568] Pid: 29, comm: kworker/0:1 Not tainted 3.6.0-rc7-guest #12 Bochs Bochs [ 170.573830] RIP: 0010:[] [] print_objinfo+0x9c/0x110 [ 170.574106] RSP: 0018:88003eaf3a70 EFLAGS: 00010202 [ 170.574268] RAX: 9fc1e4c8 RBX: 0002 RCX: 24b8 [ 170.574468] RDX: 9fc1e4c8 RSI: 9fc1e4c8 RDI: 88003e9bb980 [ 170.574668] RBP: 88003e9bb980 R08: 880037964078 R09: [ 170.574870] R10: 021e R11: 0002 R12: 9fc1e4c8 [ 170.575070] R13: 9fc1e520 R14: 004f R15: ffa5 [ 170.575274] FS: 7fc6b7530700() GS:88003fc0() knlGS: [ 170.575494] CS: 0010 DS: ES: CR0: 8005003b [ 
170.575665] CR2: 9fc1e520 CR3: 7c9c1000 CR4: 06f0 [ 170.575870] DR0: DR1: DR2: [ 170.576075] DR3: DR6: 0ff0 DR7: 0400 [ 170.576276] Process kworker/0:1 (pid: 29, threadinfo 88003eaf2000, task 88003ea941c0) [ 170.576507] Stack: [ 170.576599] 0010 01893fbe 88009fc1e000 0050 [ 170.576938] 9fc1e4c8 004f ffa5 8112899f [ 170.576938] 88003eb309d8 81712d6d 88003e9bb980 88009fc1e540 [ 170.576938] Call Trace: [ 170.576938] [] ? check_poison_obj+0x1df/0x1f0 [ 170.576938] [] ? acpi_ut_create_generic_state+0x2f/0x4c [ 170.576938] [] ? acpi_ut_create_generic_state+0x2f/0x4c [ 170.576938] [] ? cache_alloc_debugcheck_after.isra.52+0xed/0x220 [ 170.576938] [] ? acpi_ut_create_generic_state+0x2f/0x4c [ 170.576938] [] ? kmem_cache_alloc+0xb5/0x1e0 [ 170.576938] [] ? acpi_ut_create_generic_state+0x2f/0x4c [ 170.576938] [] ? acpi_ds_result_push+0x5d/0x12e [ 170.576938] [] ? acpi_ds_exec_end_op+0x28e/0x3d3 [ 170.576938] [] ? acpi_ps_parse_loop+0x79f/0x931 [ 170.576938] [] ? acpi_ps_parse_aml+0x89/0x261 [ 170.576938] [] ? acpi_ps_execute_method+0x1be/0x266 [ 170.576938] [] ? acpi_ns_evaluate+0xd3/0x19a [ 170.576938] [] ? acpi_evaluate_object+0xf3/0x1f4 [ 170.576938] [] ? acpi_os_wait_events_complete+0x1b/0x1b [ 170.576938] [] ? acpi_bus_hot_remove_device+0xeb/0x123 [ 170.576938] [] ? acpi_os_execute_deferred+0x1d/0x29 [ 170.576938] [] ? process_one_work+0x125/0x560 [ 170.576938] [] ? worker_thread+0x16a/0x4e0 [ 170.576938] [] ? manage_workers+0x310/0x310 [ 170.576938] [] ? kthread+0x85/0x90 [ 170.576938] [] ? kernel_thread_helper+0x4/0x10 [ 170.576938] [] ? flush_kthread_worker+0xa0/0xa0 [ 170.576938] [] ? 
gs_change+0x13/0x13 [ 170.576938] Code: cb 75 dc 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e 41 5f c3 8b 7f 0c 4c 89 e2 e8 02 fd ff ff 4c 89 e6 49 89 c5 48 89 ef e8 d4 fc ff ff <49> 8b 55 00 48 8b 30 48 c7 c7 8c 39 6f 81 31 c0 e8 3e 34 3b 00 Other times, the problem happens on a slab object free: [ 52.313366] Offlined Pages 32768 [ 52.800232] slab error in verify_redzone_free(): cache `Acpi-ParseExt': memory outside object was overwritten [ 52.801298] Pid: 29, comm: kworker/0:1 Not tainted 3.6.0-rc7-guest #12 [ 52.802039] Call Trace: [ 52.802443] [] ? __slab_error.isra.46+0x1b/0x30 [ 52.803199] [] ? cache_free_debugcheck+0x256/0x260 [ 52.803940] [] ? acpi_os_release_object+0x7/0xc [ 52.804645] [] ?
[RFC v9 PATCH 00/21] memory-hotplug: hot-remove physical memory
From: Wen Congyang

This patch series aims to support physical memory hot-remove.

The patches can free/remove the following things:

  - acpi_memory_info                          : [RFC PATCH 4/19]
  - /sys/firmware/memmap/X/{end, start, type} : [RFC PATCH 8/19]
  - iomem_resource                            : [RFC PATCH 9/19]
  - mem_section and related sysfs files       : [RFC PATCH 10-11, 13-16/19]
  - page table of removed memory              : [RFC PATCH 12/19]
  - node and related sysfs files              : [RFC PATCH 18-19/19]

If you find that some function is missing for physical memory hot-remove,
please let me know.

How to test this patchset?
1. Apply this patchset and build the kernel. MEMORY_HOTPLUG,
   MEMORY_HOTREMOVE and ACPI_HOTPLUG_MEMORY must be selected.
2. Load the module acpi_memhotplug.
3. Hotplug the memory device (how to do this depends on your hardware).
   You will see the memory device under the directory
   /sys/bus/acpi/devices/. Its name is PNP0C80:XX.
4. Online/offline the pages provided by this memory device.
   You can write online/offline to
   /sys/devices/system/memory/memoryX/state to online/offline the pages
   provided by this memory device.
5. Hot-remove the memory device.
   You can hot-remove the memory device through the hardware, or by
   writing 1 to /sys/bus/acpi/devices/PNP0C80:XX/eject.

Note: if the memory provided by the memory device is used by the kernel,
it can't be offlined. This is not a bug.

Known problems:
1. Memory can't be offlined when CONFIG_MEMCG is selected.
   For example: there is a memory device on node 1. The address range
   is [1G, 1.5G). You will find 4 new directories memory8, memory9,
   memory10, and memory11 under the directory
   /sys/devices/system/memory/.
   If CONFIG_MEMCG is selected, we allocate memory to store the page
   cgroup when we online pages. When we online memory8, the memory that
   stores its page cgroup is not provided by this memory device. But
   when we online memory9, the memory that stores its page cgroup may be
   provided by memory8. So we can't offline memory8 now. We should
   offline the memory in the reversed order.
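   The example above can be illustrated with a small userspace sketch
   (not part of the patchset). It assumes 128 MiB memory blocks, which is
   the size that makes [1G, 1.5G) map to memory8..memory11, and it assumes
   the blocks were onlined in ascending order, so "reversed order" means
   highest block first. The helper names memory_blocks() and
   offline_in_reverse() are made up for illustration.

```python
import os

# Assumption: 128 MiB memory blocks. On a real system the block size can be
# read from /sys/devices/system/memory/block_size_bytes.
BLOCK_SIZE = 128 << 20


def memory_blocks(start, end, block_size=BLOCK_SIZE):
    """Return the memoryX directory names covering the range [start, end)."""
    first = start // block_size
    last = (end - 1) // block_size
    return ["memory%d" % i for i in range(first, last + 1)]


def offline_in_reverse(blocks, sysfs_root="/sys/devices/system/memory"):
    """Write "offline" to each block's state file, last block first.

    Assumes the blocks were onlined in list order. Returns the names of
    the blocks whose state file could not be written.
    """
    failed = []
    for name in reversed(blocks):
        path = os.path.join(sysfs_root, name, "state")
        try:
            with open(path, "w") as f:
                f.write("offline")
        except OSError:
            # The write was rejected (or the block does not exist).
            failed.append(name)
    return failed
```

   With these assumptions, memory_blocks(1 << 30, 3 << 29) yields memory8,
   memory9, memory10 and memory11, and offline_in_reverse() tries memory11
   first and memory8 last. Writing to the real sysfs files needs root.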
   When the memory device is hot-removed, we automatically offline the
   memory provided by this memory device. But we don't know which memory
   was onlined first, so offlining memory may fail. In that case, you
   should offline the memory by hand before hot-removing the memory
   device.
2. Hot-removing a memory device may cause a kernel panic.
   This bug will be fixed by Liu Jiang's patch:
   https://lkml.org/lkml/2012/7/3/1

change log of v9:
 [RFC PATCH v9 8/21]
   * add a lock to protect the list map_entries
   * add an indicator to firmware_map_entry to remember whether the
     memory is allocated from bootmem
 [RFC PATCH v9 10/21]
   * change the macro to an inline function
 [RFC PATCH v9 19/21]
   * don't offline the node if a cpu on the node is online
 [RFC PATCH v9 21/21]
   * create new patch: auto offline page_cgroup when onlining a memory
     block fails

change log of v8:
 [RFC PATCH v8 17/20]
   * fix problems when one node's range includes other nodes
 [RFC PATCH v8 18/20]
   * fix build error when CONFIG_MEMORY_HOTPLUG_SPARSE or
     CONFIG_HUGETLBFS is not defined
 [RFC PATCH v8 19/20]
   * don't offline the node when some memory sections are not removed
 [RFC PATCH v8 20/20]
   * create new patch: clear hwpoisoned flag when onlining pages

change log of v7:
 [RFC PATCH v7 4/19]
   * do not continue if acpi_memory_device_remove_memory() fails
 [RFC PATCH v7 15/19]
   * handle usemap in register_page_bootmem_info_section() too

change log of v6:
 [RFC PATCH v6 12/19]
   * fix build error on architectures other than x86
 [RFC PATCH v6 15-16/19]
   * fix build error on architectures other than x86

change log of v5:
 * merge the patchset to clear the page table and the patchset to hot
   remove memory (from ishimatsu) into one big patchset
 [RFC PATCH v5 1/19]
   * rename remove_memory() to offline_memory()/offline_pages()
 [RFC PATCH v5 2/19]
   * new patch: implement offline_memory(). This function offlines
     pages, updates the memory block's state, and notifies userspace
     that the memory block's state has changed.
 [RFC PATCH v5 4/19]
   * offline and remove memory in acpi_memory_disable_device() too
 [RFC PATCH v5 17/19]
   * new patch: add a new function __remove_zone() to revert the things
     done in the function __add_zone()
 [RFC PATCH v5 18/19]
   * flush work before resetting the node device

change log of v4:
 * remove "memory-hotplug : unify argument of
   firmware_map_add_early/hotplug" from the patch series, since the
   patch is a bugfix. It is being discussed in another thread. But the
   patch is needed for testing the patch series, so I added it as
   [PATCH 0/13].
 [RFC PATCH v4 2/13]
   * check whether memory is online or not in remove_memory()
   * add memory_add_physaddr_to_nid() to acpi_memory_device_remove() for
     getting the node id
 [RFC PATCH v4 3/13]
   * create new patch: check whether memory is online or not in
     online_pages()