[Kernel-packages] [Bug 2015414] Re: 5.15.0-69 ice driver deadlocks with bonded e810 NICs
Happy to provide either hardware or help testing a solution if needed! -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/2015414 Title: 5.15.0-69 ice driver deadlocks with bonded e810 NICs Status in linux package in Ubuntu: Confirmed Bug description: The ice driver in the 5.15.0-69 kernel deadlocks on rtnl_lock() when adding e810 NICs to a bond interface. Booting with `sysctl.hung_task_panic=1` and `sysctl.hung_task_all_cpu_backtrace=1` added to the kernel command-line shows (among lots of other output): ``` [ 244.980100] INFO: task kworker/6:1:182 blocked for more than 120 seconds. [ 244.988431] Not tainted 5.15.0-69-generic #76-Ubuntu [ 244.995279] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 245.004826] task:kworker/6:1 state:D stack:0 pid: 182 ppid: 2 flags:0x4000 [ 245.015017] Workqueue: events linkwatch_event [ 245.020734] Call Trace: [ 245.024144] [ 245.027137] __schedule+0x24e/0x590 [ 245.031848] schedule+0x69/0x110 [ 245.036228] schedule_preempt_disabled+0xe/0x20 [ 245.042066] __mutex_lock.constprop.0+0x267/0x490 [ 245.047993] __mutex_lock_slowpath+0x13/0x20 [ 245.053432] mutex_lock+0x38/0x50 [ 245.057714] rtnl_lock+0x15/0x20 [ 245.061901] linkwatch_event+0xe/0x30 [ 245.066571] process_one_work+0x228/0x3d0 [ 245.071607] worker_thread+0x53/0x420 [ 245.076260] ? process_one_work+0x3d0/0x3d0 [ 245.081493] kthread+0x127/0x150 [ 245.085592] ? set_kthread_struct+0x50/0x50 [ 245.090769] ret_from_fork+0x1f/0x30 [ 245.095266] ``` and ``` [ 245.530629] INFO: task ifenslave:849 blocked for more than 121 seconds. [ 245.540433] Not tainted 5.15.0-69-generic #76-Ubuntu [ 245.549050] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
[ 245.558960] task:ifenslave state:D stack:0 pid: 849 ppid: 847 flags:0x4002 [ 245.570930] Call Trace: [ 245.576175] <TASK> [ 245.581018] __schedule+0x24e/0x590 [ 245.587445] schedule+0x69/0x110 [ 245.593631] schedule_timeout+0x103/0x140 [ 245.600573] __wait_for_common+0xab/0x150 [ 245.607526] ? usleep_range_state+0x90/0x90 [ 245.614743] wait_for_completion+0x24/0x30 [ 245.621903] flush_workqueue+0x133/0x3e0 [ 245.628887] ib_cache_cleanup_one+0x21/0xf0 [ib_core] [ 245.637083] __ib_unregister_device+0x79/0xc0 [ib_core] [ 245.645398] ib_unregister_device+0x27/0x40 [ib_core] [ 245.653541] irdma_ib_unregister_device+0x4b/0x70 [irdma] [ 245.662105] irdma_remove+0x1f/0x70 [irdma] [ 245.669446] auxiliary_bus_remove+0x1d/0x40 [ 245.676688] __device_release_driver+0x1a8/0x2a0 [ 245.684241] device_release_driver+0x29/0x40 [ 245.691416] bus_remove_device+0xde/0x150 [ 245.698396] device_del+0x19c/0x400 [ 245.712178] ice_lag_link.isra.0+0xdd/0xf0 [ice] [ 245.720683] ice_lag_changeupper_event+0xe1/0x130 [ice] [ 245.729739] ice_lag_event_handler+0x5b/0x150 [ice] [ 245.738525] raw_notifier_call_chain+0x46/0x60 [ 245.746006] call_netdevice_notifiers_info+0x52/0xa0 [ 245.754123] __netdev_upper_dev_link+0x1b7/0x310 [ 245.761658] netdev_master_upper_dev_link+0x3e/0x60 [ 245.769627] bond_enslave+0xc3a/0x1720 [bonding] [ 245.777398] ? 
sscanf+0x4e/0x70 [ 245.783375] bond_option_slaves_set+0xca/0x170 [bonding] [ 245.791738] __bond_opt_set+0xbd/0x1a0 [bonding] [ 245.799505] __bond_opt_set_notify+0x30/0xb0 [bonding] [ 245.807860] bond_opt_tryset_rtnl+0x56/0xa0 [bonding] [ 245.816062] bonding_sysfs_store_option+0x52/0xa0 [bonding] [ 245.824750] dev_attr_store+0x14/0x30 [ 245.831443] sysfs_kf_write+0x3b/0x50 [ 245.837979] kernfs_fop_write_iter+0x138/0x1c0 [ 245.845469] new_sync_write+0x111/0x1a0 [ 245.852210] vfs_write+0x1d5/0x270 [ 245.858429] ksys_write+0x67/0xf0 [ 245.864624] __x64_sys_write+0x19/0x20 [ 245.871288] do_syscall_64+0x59/0xc0 [ 245.877715] ? handle_mm_fault+0xd8/0x2c0 [ 245.884566] ? do_user_addr_fault+0x1e7/0x670 [ 245.891990] ? filp_close+0x60/0x70 [ 245.898452] ? exit_to_user_mode_prepare+0x37/0xb0 [ 245.906272] ? irqentry_exit_to_user_mode+0x9/0x20 [ 245.914042] ? irqentry_exit+0x1d/0x30 [ 245.920703] ? exc_page_fault+0x89/0x170 [ 245.927555] entry_SYSCALL_64_after_hwframe+0x61/0xcb [ 245.935763] RIP: 0033:0x7f1e86855a37 [ 245.942153] RSP: 002b:7fff8da477a8 EFLAGS: 0246 ORIG_RAX: 0001 [ 245.953034] RAX: ffda RBX: 000a RCX: 7f1e86855a37 [ 245.963554] RDX: 000a RSI: 556eff580510 RDI: 0001 [ 245.972468] RBP: 556eff580510 R08: 556eff582c5a R09: [ 245.983048] R10:
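The hung-task behaviour the reporter enabled on the kernel command line can also be set at runtime through sysctl. A minimal sketch, assuming the standard kernel.hung_task_* knobs (applying them needs root):

```shell
# Print the sysctl settings equivalent to the boot-time hung-task flags
# used above. Apply with e.g.: hung_task_cfg | xargs -n1 sudo sysctl -w
hung_task_cfg() {
  printf 'kernel.hung_task_panic=1\n'
  printf 'kernel.hung_task_all_cpu_backtrace=1\n'
  printf 'kernel.hung_task_timeout_secs=120\n'
}
hung_task_cfg
```

This only prints the settings; piping them through `sysctl -w` as root makes the next rtnl_lock() hang panic the box with per-CPU backtraces instead of merely logging.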
[Kernel-packages] [Bug 1534054] Re: use-after-free found by KASAN in blk_mq_register_disk
** Changed in: linux (Ubuntu) Status: Incomplete => Confirmed -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1534054 Title: use-after-free found by KASAN in blk_mq_register_disk Status in linux package in Ubuntu: Confirmed Bug description: We are trying to debug the kernel using KASAN and we found that when a VM is booting in our cloud, on the virtualised kernel, there is a use- after-free access that should not be there. The failing VM was running on a host with kernel 3.13.0-66-generic (trusty). Hosts' qemu version: 1:2.2+dfsg-5expubuntu9.3~cloud0. Hosts' seabios: 1.7.5-1ubuntu1~cloud0 The flavour of this VM is 4 CPUs, 8G RAM, 80G of root disk, 0 G swap and 0 G ephemeral disk. Here is the trace from KASAN (from the VM): The error message can be observed in the dmesg when the guest VM booted with v3.13.0-65 with KASAN enabled and "slub_debug=PU,kmalloc-32" in kernel command line. 
== BUG: KASan: out of bounds access in blk_mq_register_disk+0x193/0x260 at addr 8801f43f4d90 Read of size 8 by task swapper/0/1 = BUG kmalloc-32 (Not tainted): kasan: bad access detected - Disabling lock debugging due to kernel taint INFO: Allocated in blk_mq_init_hw_queues+0x778/0x920 age=5 cpu=1 pid=1 __slab_alloc+0x4f8/0x560 __kmalloc_node+0xad/0x310 blk_mq_init_hw_queues+0x778/0x920 blk_mq_init_queue+0x5f7/0x6c0 virtblk_probe+0x207/0x980 virtio_dev_probe+0x1be/0x280 driver_probe_device+0xe2/0x5c0 __driver_attach+0xc3/0xd0 bus_for_each_dev+0x95/0xe0 driver_attach+0x2b/0x30 bus_add_driver+0x268/0x360 driver_register+0xd3/0x1a0 register_virtio_driver+0x3c/0x60 init+0x53/0x80 do_one_initcall+0xda/0x1a0 kernel_init_freeable+0x1eb/0x27e INFO: Freed in kzfree+0x2d/0x40 age=13 cpu=0 pid=8 __slab_free+0x2ab/0x3f0 kfree+0x161/0x170 kzfree+0x2d/0x40 aa_free_task_context+0x5d/0xa0 apparmor_cred_free+0x24/0x40 security_cred_free+0x2b/0x30 put_cred_rcu+0x38/0x140 rcu_nocb_kthread+0x25a/0x410 kthread+0x101/0x120 ret_from_fork+0x58/0x90 INFO: Slab 0xea0007d0fd00 objects=23 used=21 fp=0x8801f43f52d0 flags=0x2004080 INFO: Object 0x8801f43f4d70 @offset=3440 fp=0x8801f43f5830 Bytes b4 8801f43f4d60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Object 8801f43f4d70: 00 ac 61 f7 01 88 ff ff 00 ac 69 f7 01 88 ff ff ..a...i. Object 8801f43f4d80: 00 ac 71 f7 01 88 ff ff 00 ac 79 f7 01 88 ff ff ..q...y. CPU: 0 PID: 1 Comm: swapper/0 Tainted: GB 3.13.0-65-generic #105 Hardware name: OpenStack Foundation OpenStack Nova, BIOS 1.7.5-20150310_111955-batsu 04/01/2014 ea0007d0fd00 8801f40cf9a8 81a6ce35 8801f7001c00 8801f40cf9d8 81244aed 8801f7001c00 ea0007d0fd00 8801f43f4d70 8801f779ac98 8801f40cfa00 8124ac36 Call Trace: [] dump_stack+0x45/0x56 [] print_trailer+0xfd/0x170 [] object_err+0x36/0x40 [] kasan_report_error+0x1e9/0x3a0 [] ? sysfs_get+0x17/0x50 [] ? kobject_add_internal+0x29b/0x4a0 [] kasan_report+0x40/0x50 [] ? dev_printk_emit+0x20/0x40 [] ? 
blk_mq_register_disk+0x193/0x260 [] __asan_load8+0x69/0xa0 [] blk_mq_register_disk+0x193/0x260 [] blk_register_queue+0xd2/0x170 [] add_disk+0x31f/0x720 [] virtblk_probe+0x58a/0x980 [] ? virtblk_restore+0x100/0x100 [] virtio_dev_probe+0x1be/0x280 [] ? __device_attach+0x70/0x70 [] driver_probe_device+0xe2/0x5c0 [] ? __device_attach+0x70/0x70 [] __driver_attach+0xc3/0xd0 [] bus_for_each_dev+0x95/0xe0 [] driver_attach+0x2b/0x30 [] bus_add_driver+0x268/0x360 [] driver_register+0xd3/0x1a0 [] ? loop_init+0x14b/0x14b [] register_virtio_driver+0x3c/0x60 [] init+0x53/0x80 [] do_one_initcall+0xda/0x1a0 [] kernel_init_freeable+0x1eb/0x27e [] ? rest_init+0x80/0x80 [] kernel_init+0xe/0x130 [] ret_from_fork+0x58/0x90 [] ? rest_init+0x80/0x80 Memory state around the buggy address: 8801f43f4c80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc 8801f43f4d00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc 00 00 >8801f43f4d80: 00 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc ^ 8801f43f4e00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc 8801f43f4e80: fc fc fc fc fc fc fc fc fc 00 00 00 00 fc fc fc ==
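As a reading aid for the "Memory state around the buggy address" dump above, the common KASan shadow-byte values can be decoded with a small helper. This is a simplified sketch based on the kernel's KASAN shadow encoding, not an official tool:

```shell
# Map a KASan shadow byte (as printed in the memory-state dump) to a
# human-readable meaning. Simplified: 00 = fully addressable,
# fb = freed object, fc = kmalloc redzone.
shadow_meaning() {
  case "$1" in
    00) echo "all 8 bytes addressable" ;;
    fb) echo "freed heap object (use-after-free if accessed)" ;;
    fc) echo "heap redzone (out-of-bounds if accessed)" ;;
    0[1-7]) echo "only the first bytes addressable, rest redzone" ;;
    *)  echo "special/other shadow state" ;;
  esac
}
shadow_meaning fc
```

In the dump above the flagged row sits in `fc` redzone bytes, which is consistent with the "out of bounds access" wording of this report, while the earlier reports showed `fb` (freed) bytes at the access address.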
[Kernel-packages] [Bug 1534054] Re: use-after-free found by KASAN in blk_mq_register_disk
** Description changed: We are trying to debug the kernel using KASAN and we found that when a VM is booting in our cloud, on the virtualised kernel, there is a use- after-free access that should not be there. - Here is the trace from KASAN: + The failing VM was running on a host with kernel 3.13.0-66-generic + (trusty). Hosts' qemu version: 1:2.2+dfsg-5expubuntu9.3~cloud0 + + Here is the trace from KASAN (from the VM): The error message can be observed in the dmesg when the guest VM booted with v3.13.0-65 with KASAN enabled. == BUG: KASan: use after free in blk_mq_register_disk+0x193/0x260 at addr 8801ec247400 Read of size 8 by task swapper/0/1 = BUG kmalloc-32 (Not tainted): kasan: bad access detected - Disabling lock debugging due to kernel taint INFO: Slab 0xea0007b091c0 objects=128 used=128 fp=0x (null) flags=0x280 INFO: Object 0x8801ec247400 @offset=1024 fp=0x8801ec247420 Bytes b4 8801ec2473f0: 00 ac 71 ef 01 88 ff ff 00 ac 79 ef 01 88 ff ff ..q...y. Object 8801ec247400: 20 74 24 ec 01 88 ff ff 2f 76 69 72 74 75 61 6c t$./virtual Object 8801ec247410: 2f 62 64 69 2f 32 35 33 3a 30 00 00 00 00 00 00 /bdi/253:0.. CPU: 0 PID: 1 Comm: swapper/0 Tainted: GB 3.13.0-65-generic #105 Hardware name: OpenStack Foundation OpenStack Nova, BIOS 1.7.5-20150310_111955-batsu 04/01/2014 ea0007b091c0 8801ec0cb9a8 81a6ce35 8801ef001c00 8801ec0cb9d8 81244aed 8801ef001c00 ea0007b091c0 8801ec247400 8801ef79ac98 8801ec0cba00 8124ac36 Call Trace: [] dump_stack+0x45/0x56 [] print_trailer+0xfd/0x170 [] object_err+0x36/0x40 [] kasan_report_error+0x1e9/0x3a0 [] ? sysfs_get+0x17/0x50 [] ? kobject_add_internal+0x29b/0x4a0 [] kasan_report+0x40/0x50 [] ? dev_printk_emit+0x20/0x40 [] ? blk_mq_register_disk+0x193/0x260 [] __asan_load8+0x69/0xa0 [] blk_mq_register_disk+0x193/0x260 [] blk_register_queue+0xd2/0x170 [] add_disk+0x31f/0x720 [] virtblk_probe+0x58a/0x980 [] ? virtblk_restore+0x100/0x100 [] virtio_dev_probe+0x1be/0x280 [] ? 
__device_attach+0x70/0x70 [] driver_probe_device+0xe2/0x5c0 [] ? __device_attach+0x70/0x70 [] __driver_attach+0xc3/0xd0 [] bus_for_each_dev+0x95/0xe0 [] driver_attach+0x2b/0x30 [] bus_add_driver+0x268/0x360 [] driver_register+0xd3/0x1a0 [] ? loop_init+0x14b/0x14b [] register_virtio_driver+0x3c/0x60 [] init+0x53/0x80 [] do_one_initcall+0xda/0x1a0 [] kernel_init_freeable+0x1eb/0x27e [] ? rest_init+0x80/0x80 [] kernel_init+0xe/0x130 [] ret_from_fork+0x58/0x90 [] ? rest_init+0x80/0x80 Memory state around the buggy address: 8801ec247300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 8801ec247380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >8801ec247400: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc ^ 8801ec247480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc 8801ec247500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc == ** Description changed: We are trying to debug the kernel using KASAN and we found that when a VM is booting in our cloud, on the virtualised kernel, there is a use- after-free access that should not be there. The failing VM was running on a host with kernel 3.13.0-66-generic (trusty). Hosts' qemu version: 1:2.2+dfsg-5expubuntu9.3~cloud0 + + The flavour of this VM is 4 CPUs, 8G RAM, 80G of root disk, 0 G swap and + 0 G ephemeral disk. Here is the trace from KASAN (from the VM): The error message can be observed in the dmesg when the guest VM booted with v3.13.0-65 with KASAN enabled. == BUG: KASan: use after free in blk_mq_register_disk+0x193/0x260 at addr 8801ec247400 Read of size 8 by task swapper/0/1 = BUG kmalloc-32 (Not tainted): kasan: bad access detected - Disabling lock debugging due to kernel taint INFO: Slab 0xea0007b091c0 objects=128 used=128 fp=0x (null) flags=0x280 INFO: Object 0x8801ec247400 @offset=1024 fp=0x8801ec247420 Bytes b4 8801ec2473f0: 00 ac 71 ef 01 88 ff ff 00 ac 79 ef 01 88 ff ff ..q...y. 
Object 8801ec247400: 20 74 24 ec 01 88 ff ff 2f 76 69 72 74 75 61 6c t$./virtual Object 8801ec247410: 2f 62 64 69 2f 32 35 33 3a 30 00 00 00 00 00 00 /bdi/253:0.. CPU: 0 PID: 1 Comm:
[Kernel-packages] [Bug 1534054] Re: use-after-free found by KASAN in blk_mq_register_disk
** Description changed: + We are trying to debug the kernel using KASAN and we found that when a + VM is booting in our cloud, on the virtualised kernel, there is a use- + after-free access that should not be there. + + Here is the trace from KASAN: + The error message can be observed in the dmesg when the guest VM booted with v3.13.0-65 with KASAN enabled. == BUG: KASan: use after free in blk_mq_register_disk+0x193/0x260 at addr 8801ec247400 Read of size 8 by task swapper/0/1 = BUG kmalloc-32 (Not tainted): kasan: bad access detected - Disabling lock debugging due to kernel taint INFO: Slab 0xea0007b091c0 objects=128 used=128 fp=0x (null) flags=0x280 INFO: Object 0x8801ec247400 @offset=1024 fp=0x8801ec247420 Bytes b4 8801ec2473f0: 00 ac 71 ef 01 88 ff ff 00 ac 79 ef 01 88 ff ff ..q...y. Object 8801ec247400: 20 74 24 ec 01 88 ff ff 2f 76 69 72 74 75 61 6c t$./virtual Object 8801ec247410: 2f 62 64 69 2f 32 35 33 3a 30 00 00 00 00 00 00 /bdi/253:0.. CPU: 0 PID: 1 Comm: swapper/0 Tainted: GB 3.13.0-65-generic #105 Hardware name: OpenStack Foundation OpenStack Nova, BIOS 1.7.5-20150310_111955-batsu 04/01/2014 ea0007b091c0 8801ec0cb9a8 81a6ce35 8801ef001c00 8801ec0cb9d8 81244aed 8801ef001c00 ea0007b091c0 8801ec247400 8801ef79ac98 8801ec0cba00 8124ac36 Call Trace: [] dump_stack+0x45/0x56 [] print_trailer+0xfd/0x170 [] object_err+0x36/0x40 [] kasan_report_error+0x1e9/0x3a0 [] ? sysfs_get+0x17/0x50 [] ? kobject_add_internal+0x29b/0x4a0 [] kasan_report+0x40/0x50 [] ? dev_printk_emit+0x20/0x40 [] ? blk_mq_register_disk+0x193/0x260 [] __asan_load8+0x69/0xa0 [] blk_mq_register_disk+0x193/0x260 [] blk_register_queue+0xd2/0x170 [] add_disk+0x31f/0x720 [] virtblk_probe+0x58a/0x980 [] ? virtblk_restore+0x100/0x100 [] virtio_dev_probe+0x1be/0x280 [] ? __device_attach+0x70/0x70 [] driver_probe_device+0xe2/0x5c0 [] ? 
__device_attach+0x70/0x70 [] __driver_attach+0xc3/0xd0 [] bus_for_each_dev+0x95/0xe0 [] driver_attach+0x2b/0x30 [] bus_add_driver+0x268/0x360 [] driver_register+0xd3/0x1a0 [] ? loop_init+0x14b/0x14b [] register_virtio_driver+0x3c/0x60 [] init+0x53/0x80 [] do_one_initcall+0xda/0x1a0 [] kernel_init_freeable+0x1eb/0x27e [] ? rest_init+0x80/0x80 [] kernel_init+0xe/0x130 [] ret_from_fork+0x58/0x90 [] ? rest_init+0x80/0x80 Memory state around the buggy address: 8801ec247300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 8801ec247380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >8801ec247400: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc ^ 8801ec247480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc 8801ec247500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc == ** Tags added: sts -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1534054 Title: use-after-free found by KASAN in blk_mq_register_disk Status in linux package in Ubuntu: Incomplete
[Kernel-packages] [Bug 1534054] Re: use-after-free found by KASAN in blk_mq_register_disk
** Description changed: We are trying to debug the kernel using KASAN and we found that when a VM is booting in our cloud, on the virtualised kernel, there is a use- after-free access that should not be there. The failing VM was running on a host with kernel 3.13.0-66-generic - (trusty). Hosts' qemu version: 1:2.2+dfsg-5expubuntu9.3~cloud0 + (trusty). Hosts' qemu version: 1:2.2+dfsg-5expubuntu9.3~cloud0. Hosts' + seabios: 1.7.5-1ubuntu1~cloud0 The flavour of this VM is 4 CPUs, 8G RAM, 80G of root disk, 0 G swap and 0 G ephemeral disk. Here is the trace from KASAN (from the VM): The error message can be observed in the dmesg when the guest VM booted with v3.13.0-65 with KASAN enabled. == BUG: KASan: use after free in blk_mq_register_disk+0x193/0x260 at addr 8801ec247400 Read of size 8 by task swapper/0/1 = BUG kmalloc-32 (Not tainted): kasan: bad access detected - Disabling lock debugging due to kernel taint INFO: Slab 0xea0007b091c0 objects=128 used=128 fp=0x (null) flags=0x280 INFO: Object 0x8801ec247400 @offset=1024 fp=0x8801ec247420 Bytes b4 8801ec2473f0: 00 ac 71 ef 01 88 ff ff 00 ac 79 ef 01 88 ff ff ..q...y. Object 8801ec247400: 20 74 24 ec 01 88 ff ff 2f 76 69 72 74 75 61 6c t$./virtual Object 8801ec247410: 2f 62 64 69 2f 32 35 33 3a 30 00 00 00 00 00 00 /bdi/253:0.. CPU: 0 PID: 1 Comm: swapper/0 Tainted: GB 3.13.0-65-generic #105 Hardware name: OpenStack Foundation OpenStack Nova, BIOS 1.7.5-20150310_111955-batsu 04/01/2014 ea0007b091c0 8801ec0cb9a8 81a6ce35 8801ef001c00 8801ec0cb9d8 81244aed 8801ef001c00 ea0007b091c0 8801ec247400 8801ef79ac98 8801ec0cba00 8124ac36 Call Trace: [] dump_stack+0x45/0x56 [] print_trailer+0xfd/0x170 [] object_err+0x36/0x40 [] kasan_report_error+0x1e9/0x3a0 [] ? sysfs_get+0x17/0x50 [] ? kobject_add_internal+0x29b/0x4a0 [] kasan_report+0x40/0x50 [] ? dev_printk_emit+0x20/0x40 [] ? 
blk_mq_register_disk+0x193/0x260 [] __asan_load8+0x69/0xa0 [] blk_mq_register_disk+0x193/0x260 [] blk_register_queue+0xd2/0x170 [] add_disk+0x31f/0x720 [] virtblk_probe+0x58a/0x980 [] ? virtblk_restore+0x100/0x100 [] virtio_dev_probe+0x1be/0x280 [] ? __device_attach+0x70/0x70 [] driver_probe_device+0xe2/0x5c0 [] ? __device_attach+0x70/0x70 [] __driver_attach+0xc3/0xd0 [] bus_for_each_dev+0x95/0xe0 [] driver_attach+0x2b/0x30 [] bus_add_driver+0x268/0x360 [] driver_register+0xd3/0x1a0 [] ? loop_init+0x14b/0x14b [] register_virtio_driver+0x3c/0x60 [] init+0x53/0x80 [] do_one_initcall+0xda/0x1a0 [] kernel_init_freeable+0x1eb/0x27e [] ? rest_init+0x80/0x80 [] kernel_init+0xe/0x130 [] ret_from_fork+0x58/0x90 [] ? rest_init+0x80/0x80 Memory state around the buggy address: 8801ec247300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 8801ec247380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >8801ec247400: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc ^ 8801ec247480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc 8801ec247500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc == -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1534054 Title: use-after-free found by KASAN in blk_mq_register_disk Status in linux package in Ubuntu: Incomplete
[Kernel-packages] [Bug 1324125] Re: Unable to trigger a kernel crash dump on laptop
This bug is probably not relevant anymore, I just tried on my laptop and it crashed nicely: ii linux-crashdump 3.13.0.49.56 amd64 Linux kernel crashdump setup for the latest generic kernel Kernel: 3.16.0-28-generic On trusty. ** Tags removed: cts ** Changed in: kexec-tools (Ubuntu) Status: Confirmed => Incomplete -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to kexec-tools in Ubuntu. https://bugs.launchpad.net/bugs/1324125 Title: Unable to trigger a kernel crash dump on laptop Status in kexec-tools package in Ubuntu: Incomplete Bug description: kernel crash dump doesn't work on laptop Ubuntu release: Trusty Steps to Reproduce: echo c > /proc/sysrq-trigger with linux-crashdump installed To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/kexec-tools/+bug/1324125/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
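Before forcing a crash with the sysrq trigger from the reproduction step above, it is worth checking that a capture kernel is actually loaded. A minimal sketch, assuming the standard kexec/sysrq sysfs interfaces (triggering needs root):

```shell
# Check whether a crash (capture) kernel is loaded before triggering a
# test panic; without one, `echo c > /proc/sysrq-trigger` just panics
# the machine and no dump is saved.
check_crash_ready() {
  if [ "$(cat /sys/kernel/kexec_crash_loaded 2>/dev/null)" = "1" ]; then
    echo "crash kernel loaded; trigger with: echo c > /proc/sysrq-trigger"
  else
    echo "no crash kernel loaded; check the crashkernel= reservation"
  fi
}
check_crash_ready
```

A missing crashkernel= reservation (or one too small for the machine) is a common reason the dump silently fails on laptops.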
[Kernel-packages] [Bug 1413540] Re: Trusty soft lockup issues with nested KVM
My deployment is still running strong after over 36 hours. No crashes. I will leave it running for a few more days to see if it happens after a few days... and will report back. @arges, thanks for this fix! -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1413540 Title: Trusty soft lockup issues with nested KVM Status in linux package in Ubuntu: Fix Released Status in linux source package in Trusty: Fix Committed Bug description: [Impact] Upstream discussion: https://lkml.org/lkml/2015/2/11/247 Certain workloads that need to execute functions on a non-local CPU using smp_call_function_* can result in soft lockups with the following backtrace: PID: 22262 TASK: 8804274bb000 CPU: 1 COMMAND: qemu-system-x86 #0 [88043fd03d18] machine_kexec at 8104ac02 #1 [88043fd03d68] crash_kexec at 810e7203 #2 [88043fd03e30] panic at 81719ff4 #3 [88043fd03ea8] watchdog_timer_fn at 8110d7c5 #4 [88043fd03ed8] __run_hrtimer at 8108e787 #5 [88043fd03f18] hrtimer_interrupt at 8108ef4f #6 [88043fd03f80] local_apic_timer_interrupt at 81043537 #7 [88043fd03f98] smp_apic_timer_interrupt at 81733d4f #8 [88043fd03fb0] apic_timer_interrupt at 817326dd --- IRQ stack --- #9 [880426f0d958] apic_timer_interrupt at 817326dd [exception RIP: generic_exec_single+130] RIP: 810dbe62 RSP: 880426f0da00 RFLAGS: 0202 RAX: 0002 RBX: 880426f0d9d0 RCX: 0001 RDX: 8180ad60 RSI: RDI: 0286 RBP: 880426f0da30 R8: 8180ad48 R9: 88042713bc68 R10: 7fe7d1f2dbd0 R11: 0206 R12: 8804274bb000 R13: R14: 880407670280 R15: ORIG_RAX: ff10 CS: 0010 SS: 0018 #10 [880426f0da38] smp_call_function_single at 810dbf75 #11 [880426f0dab0] smp_call_function_many at 810dc3a6 #12 [880426f0db10] native_flush_tlb_others at 8105c8f7 #13 [880426f0db38] flush_tlb_mm_range at 8105c9cb #14 [880426f0db68] pmdp_splitting_flush at 8105b80d #15 [880426f0db88] __split_huge_page at 811ac90b #16 [880426f0dc20] split_huge_page_to_list at 811acfb8 #17 
[880426f0dc48] __split_huge_page_pmd at 811ad956 #18 [880426f0dcc8] unmap_page_range at 8117728d #19 [880426f0dda0] unmap_single_vma at 81177341 #20 [880426f0ddd8] zap_page_range at 811784cd #21 [880426f0de90] sys_madvise at 81174fbf #22 [880426f0df80] system_call_fastpath at 8173196d RIP: 7fe7ca2cc647 RSP: 7fe7be9febf0 RFLAGS: 0293 RAX: 001c RBX: 8173196d RCX: RDX: 0004 RSI: 007fb000 RDI: 7fe7be1ff000 RBP: R8: R9: 7fe7d1cd2738 R10: 7fe7d1f2dbd0 R11: 0206 R12: 7fe7be9ff700 R13: 7fe7be9ff9c0 R14: R15: ORIG_RAX: 001c CS: 0033 SS: 002b [Fix] Commit 9242b5b60df8b13b469bc6b7be08ff6ebb551ad3 mitigates this issue if b6b8a1451fc40412c57d1 is applied (as in the case of the affected 3.13 distro kernel). However the issue can still occur in some cases. [Workaround] In order to avoid this issue, the workload needs to be pinned to CPUs such that the function always executes locally. For the nested VM case, this means the L1 VM needs to have all vCPUs pinned to a unique CPU. This can be accomplished with the following (for 2 vCPUs): virsh vcpupin domain 0 0 virsh vcpupin domain 1 1 [Test Case] - Deploy openstack on openstack - Run tempest on L1 cloud - Check kernel log of L1 nova-compute nodes (Although this may not necessarily be related to nested KVM) Potentially related: https://lkml.org/lkml/2014/11/14/656 Another test case is to do the following (on affected hardware): 1) Create an L1 KVM VM with 2 vCPUs (single vCPU case doesn't reproduce) 2) Create an L2 KVM VM inside the L1 VM with 1 vCPU 3) Run something like 'stress -c 1 -m 1 -d 1 -t 1200' inside the L2 VM Sometimes this is sufficient to reproduce the issue; I've observed that running KSM in the L1 VM can agitate this issue (it calls native_flush_tlb_others). If this doesn't reproduce then you can do the following: 4) Migrate the L2 vCPU randomly (via virsh vcpupin --live OR taskset) between L1 vCPUs until the hang occurs. -- Original Description: When installing qemu-kvm on a VM, KSM is enabled. 
I have encountered this problem in trusty:$ lsb_release -a Distributor ID: Ubuntu Description:
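The pinning commands in the workaround above generalize to any vCPU count. A sketch; the domain name is a placeholder and `echo` keeps it a dry run (drop the `echo` to apply, with libvirt installed):

```shell
# Pin vCPU N of a libvirt domain to host CPU N, as in the workaround
# above. Prints the commands (dry run); remove `echo` to execute them.
pin_all_vcpus() {
  dom="$1"; nvcpus="$2"
  i=0
  while [ "$i" -lt "$nvcpus" ]; do
    echo "virsh vcpupin $dom $i $i"
    i=$((i + 1))
  done
}
pin_all_vcpus l1-guest 2
```

One-to-one pinning is what makes smp_call_function_* effectively local in the L1 guest, which is why the lockup stops reproducing.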
[Kernel-packages] [Bug 1413540] Re: Trusty soft lockup issues with nested KVM
I have been trying to verify this kernel and I haven't seen exactly the soft lockup crash, but this other one, which may or may not be related but wanted to make a note of it: [ 2406.041444] Kernel panic - not syncing: hung_task: blocked tasks [ 2406.043163] CPU: 1 PID: 35 Comm: khungtaskd Not tainted 3.13.0-51-generic #84-Ubuntu [ 2406.044223] Hardware name: OpenStack Foundation OpenStack Nova, BIOS Bochs 01/01/2011 [ 2406.044223] 003fffd1 88080ec7fdf0 817225ce 81a62a65 [ 2406.044223] 88080ec7fe68 8171b46d 0008 88080ec7fe78 [ 2406.044223] 88080ec7fe18 88080ec7fe40 0100 0004 [ 2406.044223] Call Trace: [ 2406.044223] [817225ce] dump_stack+0x45/0x56 [ 2406.044223] [8171b46d] panic+0xc8/0x1d7 [ 2406.044223] [8110d7b6] watchdog+0x296/0x2e0 [ 2406.044223] [8110d520] ? reset_hung_task_detector+0x20/0x20 [ 2406.044223] [8108b5d2] kthread+0xd2/0xf0 [ 2406.044223] [8108b500] ? kthread_create_on_node+0x1c0/0x1c0 [ 2406.044223] [8173300c] ret_from_fork+0x7c/0xb0 [ 2406.044223] [8108b500] ? kthread_create_on_node+0x1c0/0x1c0 I have the crashdump for it, let me know how you want to proceed. -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. 
https://bugs.launchpad.net/bugs/1413540 Title: Trusty soft lockup issues with nested KVM Status in linux package in Ubuntu: Fix Released Status in linux source package in Trusty: Fix Committed
[Kernel-packages] [Bug 1413540] Re: qemu-kvm package enables KSM on VMs
apport file for linux. ** Attachment added: apport.linux-image-3.13.0-44-generic.61de1tqv.apport https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1413540/+attachment/4303621/+files/apport.linux-image-3.13.0-44-generic.61de1tqv.apport -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1413540 Title: qemu-kvm package enables KSM on VMs Status in linux package in Ubuntu: Incomplete Status in qemu package in Ubuntu: Confirmed Bug description: When installing qemu-kvm on a VM, KSM is enabled. I have encountered this problem in trusty:$ lsb_release -a Distributor ID: Ubuntu Description:Ubuntu 14.04.1 LTS Release:14.04 Codename: trusty $ uname -a Linux juju-gema-machine-2 3.13.0-40-generic #69-Ubuntu SMP Thu Nov 13 17:53:56 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux The way to see the behaviour: 1) $ more /sys/kernel/mm/ksm/run 0 2) $ sudo apt-get install qemu-kvm 3) $ more /sys/kernel/mm/ksm/run 1 To see the soft lockups, deploy a cloud on a virtualised env like ctsstack, run tempest on it, the compute nodes of the virtualised deployment will eventually stop responding with (run tempest 2 times at least): 24096.072003] BUG: soft lockup - CPU#0 stuck for 23s! [qemu-system-x86:24791] [24124.072003] BUG: soft lockup - CPU#0 stuck for 23s! [qemu-system-x86:24791] [24152.072002] BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-x86:24791] [24180.072003] BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-x86:24791] [24208.072004] BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-x86:24791] [24236.072004] BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-x86:24791] [24264.072003] BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-x86:24791] I am not sure whether the problem is that we are enabling KSM on a VM or the problem is that nested KSM is not behaving properly. Either way I can easily reproduce, please contact me if you need further details. 
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1413540/+subscriptions
--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
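The reproduction check above boils down to reading one sysfs knob before and after installing the package. A minimal sketch of that check (the `KSM_ENABLED` toggle in `/etc/default/qemu-kvm` mentioned in the comments is an assumption about the Ubuntu packaging, not something stated in this report):

```shell
#!/bin/sh
# Read the KSM run state the bug report inspects
# (0 = off, 1 = merging, 2 = unmerge and stop).
ksm_run=/sys/kernel/mm/ksm/run

if [ -r "$ksm_run" ]; then
    state=$(cat "$ksm_run")
else
    # Kernel built without CONFIG_KSM, or not running on Linux.
    state="unavailable"
fi
echo "KSM run state: $state"

# Switching KSM off by hand (as root) would be:
#   echo 0 > /sys/kernel/mm/ksm/run
# but the qemu-kvm init script may re-enable it on the next service start;
# the persistent switch is presumably KSM_ENABLED=0 in /etc/default/qemu-kvm.
```

Running this before step 2 of the reproduction should print `0`, and `1` afterwards if the package flipped the knob as described.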
[Kernel-packages] [Bug 1413540] Re: qemu-kvm package enables KSM on VMs
apport for qemu

** Attachment added: apport.qemu.pnfp6lff.apport
   https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1413540/+attachment/4303623/+files/apport.qemu.pnfp6lff.apport
[Kernel-packages] [Bug 1413540] Re: qemu-kvm package enables KSM on VMs
I can reproduce this issue and hand a VM over, in a hung state, to whoever is going to triage.
[Kernel-packages] [Bug 1413540] Re: qemu-kvm package enables KSM on VMs
apport file for linux.

** Attachment added: apport.linux-image-3.13.0-44-generic.61de1tqv.apport
   https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1413540/+attachment/4303622/+files/apport.linux-image-3.13.0-44-generic.61de1tqv.apport
[Kernel-packages] [Bug 1413540] Re: issues with KSM enabled for nested KVM VMs
I have a different VM that has crashed (also a nested nova compute node); this one had KSM disabled. See the log attached.

** Attachment added: soft-lockup-different-node.log
   https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1413540/+attachment/4303814/+files/soft-lockup-different-node.log

** Changed in: linux (Ubuntu)
   Status: Incomplete => Confirmed

Title: issues with KSM enabled for nested KVM VMs
Status in linux package in Ubuntu: Confirmed
Status in qemu package in Ubuntu: Confirmed