[Kernel-packages] [Bug 2015414] Re: 5.15.0-69 ice driver deadlocks with bonded e810 NICs

2023-04-06 Thread Gema Gomez
Happy to provide either hardware or help testing a solution if needed!

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2015414

Title:
  5.15.0-69 ice driver deadlocks with bonded e810 NICs

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  The ice driver in the 5.15.0-69 kernel deadlocks on rtnl_lock() when
  adding e810 NICs to a bond interface.  Booting with
  `sysctl.hung_task_panic=1` and `sysctl.hung_task_all_cpu_backtrace=1`
  added to the kernel command-line shows (among lots of other output):

  ```
  [  244.980100] INFO: task kworker/6:1:182 blocked for more than 120 seconds.
  [  244.988431]   Not tainted 5.15.0-69-generic #76-Ubuntu
  [  244.995279] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [  245.004826] task:kworker/6:1 state:D stack:0 pid:  182 ppid: 2 flags:0x4000
  [  245.015017] Workqueue: events linkwatch_event
  [  245.020734] Call Trace:
  [  245.024144]  
  [  245.027137]  __schedule+0x24e/0x590
  [  245.031848]  schedule+0x69/0x110
  [  245.036228]  schedule_preempt_disabled+0xe/0x20
  [  245.042066]  __mutex_lock.constprop.0+0x267/0x490
  [  245.047993]  __mutex_lock_slowpath+0x13/0x20
  [  245.053432]  mutex_lock+0x38/0x50
  [  245.057714]  rtnl_lock+0x15/0x20
  [  245.061901]  linkwatch_event+0xe/0x30
  [  245.066571]  process_one_work+0x228/0x3d0
  [  245.071607]  worker_thread+0x53/0x420
  [  245.076260]  ? process_one_work+0x3d0/0x3d0
  [  245.081493]  kthread+0x127/0x150
  [  245.085592]  ? set_kthread_struct+0x50/0x50
  [  245.090769]  ret_from_fork+0x1f/0x30
  [  245.095266]  
  ```

  and

  ```
  [  245.530629] INFO: task ifenslave:849 blocked for more than 121 seconds.
  [  245.540433]   Not tainted 5.15.0-69-generic #76-Ubuntu
  [  245.549050] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [  245.558960] task:ifenslave   state:D stack:0 pid:  849 ppid:   847 
flags:0x4002
  [  245.570930] Call Trace:
  [  245.576175]  
  [  245.581018]  __schedule+0x24e/0x590
  [  245.587445]  schedule+0x69/0x110
  [  245.593631]  schedule_timeout+0x103/0x140
  [  245.600573]  __wait_for_common+0xab/0x150
  [  245.607526]  ? usleep_range_state+0x90/0x90
  [  245.614743]  wait_for_completion+0x24/0x30
  [  245.621903]  flush_workqueue+0x133/0x3e0
  [  245.628887]  ib_cache_cleanup_one+0x21/0xf0 [ib_core]
  [  245.637083]  __ib_unregister_device+0x79/0xc0 [ib_core]
  [  245.645398]  ib_unregister_device+0x27/0x40 [ib_core]
  [  245.653541]  irdma_ib_unregister_device+0x4b/0x70 [irdma]
  [  245.662105]  irdma_remove+0x1f/0x70 [irdma]
  [  245.669446]  auxiliary_bus_remove+0x1d/0x40
  [  245.676688]  __device_release_driver+0x1a8/0x2a0
  [  245.684241]  device_release_driver+0x29/0x40
  [  245.691416]  bus_remove_device+0xde/0x150
  [  245.698396]  device_del+0x19c/0x400
  [  245.712178]  ice_lag_link.isra.0+0xdd/0xf0 [ice]
  [  245.720683]  ice_lag_changeupper_event+0xe1/0x130 [ice]
  [  245.729739]  ice_lag_event_handler+0x5b/0x150 [ice]
  [  245.738525]  raw_notifier_call_chain+0x46/0x60
  [  245.746006]  call_netdevice_notifiers_info+0x52/0xa0
  [  245.754123]  __netdev_upper_dev_link+0x1b7/0x310
  [  245.761658]  netdev_master_upper_dev_link+0x3e/0x60
  [  245.769627]  bond_enslave+0xc3a/0x1720 [bonding]
  [  245.777398]  ? sscanf+0x4e/0x70
  [  245.783375]  bond_option_slaves_set+0xca/0x170 [bonding]
  [  245.791738]  __bond_opt_set+0xbd/0x1a0 [bonding]
  [  245.799505]  __bond_opt_set_notify+0x30/0xb0 [bonding]
  [  245.807860]  bond_opt_tryset_rtnl+0x56/0xa0 [bonding]
  [  245.816062]  bonding_sysfs_store_option+0x52/0xa0 [bonding]
  [  245.824750]  dev_attr_store+0x14/0x30
  [  245.831443]  sysfs_kf_write+0x3b/0x50
  [  245.837979]  kernfs_fop_write_iter+0x138/0x1c0
  [  245.845469]  new_sync_write+0x111/0x1a0
  [  245.852210]  vfs_write+0x1d5/0x270
  [  245.858429]  ksys_write+0x67/0xf0
  [  245.864624]  __x64_sys_write+0x19/0x20
  [  245.871288]  do_syscall_64+0x59/0xc0
  [  245.877715]  ? handle_mm_fault+0xd8/0x2c0
  [  245.884566]  ? do_user_addr_fault+0x1e7/0x670
  [  245.891990]  ? filp_close+0x60/0x70
  [  245.898452]  ? exit_to_user_mode_prepare+0x37/0xb0
  [  245.906272]  ? irqentry_exit_to_user_mode+0x9/0x20
  [  245.914042]  ? irqentry_exit+0x1d/0x30
  [  245.920703]  ? exc_page_fault+0x89/0x170
  [  245.927555]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
  [  245.935763] RIP: 0033:0x7f1e86855a37
  [  245.942153] RSP: 002b:7fff8da477a8 EFLAGS: 0246 ORIG_RAX: 0001
  [  245.953034] RAX: ffda RBX: 000a RCX: 7f1e86855a37
  [  245.963554] RDX: 000a RSI: 556eff580510 RDI: 0001
  [  245.972468] RBP: 556eff580510 R08: 556eff582c5a R09: 
  [  245.983048] R10: 
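
If it helps anyone trying to reproduce this: the enslave path can be exercised
with a short script. This is only a sketch; the NIC names (ens1f0/ens1f1) and
bond options are assumptions, and the commands are printed rather than executed
unless APPLY=1 is set, since running them on affected hardware deadlocks the
machine:

```shell
#!/bin/sh
# Sketch of the bond-enslave sequence that takes the deadlocking path.
# NIC names and bond options are hypothetical; dry-run unless APPLY=1.
APPLY=${APPLY:-0}
run() { if [ "$APPLY" = 1 ]; then "$@"; else echo "+ $*"; fi; }

run modprobe bonding mode=active-backup miimon=100
run ip link add bond0 type bond
# Writing the bonding "slaves" sysfs attribute (what ifenslave does) holds
# rtnl_lock while the ice LAG notifier tears down the irdma auxiliary device.
run sh -c 'echo +ens1f0 > /sys/class/net/bond0/bonding/slaves'
run sh -c 'echo +ens1f1 > /sys/class/net/bond0/bonding/slaves'
run ip link set bond0 up
```

With APPLY unset the script only prints the command sequence, which is useful
for reviewing it before running on a machine you are willing to hang.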
[Kernel-packages] [Bug 1534054] Re: use-after-free found by KASAN in blk_mq_register_disk

2016-01-14 Thread Gema Gomez
** Changed in: linux (Ubuntu)
   Status: Incomplete => Confirmed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1534054

Title:
  use-after-free found by KASAN in blk_mq_register_disk

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  We are trying to debug the kernel using KASAN and we found that when a
  VM is booting in our cloud, on the virtualised kernel, there is a use-
  after-free access that should not be there.

  The failing VM was running on a host with kernel 3.13.0-66-generic
  (trusty). Hosts' qemu version: 1:2.2+dfsg-5expubuntu9.3~cloud0. Hosts'
  seabios: 1.7.5-1ubuntu1~cloud0

  The flavour of this VM is 4 CPUs, 8G RAM, 80G of root disk, 0 G swap
  and 0 G ephemeral disk.

  Here is the trace from KASAN (from the VM):

  The error message can be observed in the dmesg when the guest VM
  booted with v3.13.0-65 with KASAN enabled and
  "slub_debug=PU,kmalloc-32" in kernel command line.

  ==
  BUG: KASan: out of bounds access in blk_mq_register_disk+0x193/0x260 at addr 8801f43f4d90
  Read of size 8 by task swapper/0/1
  =
  BUG kmalloc-32 (Not tainted): kasan: bad access detected
  -

  Disabling lock debugging due to kernel taint
  INFO: Allocated in blk_mq_init_hw_queues+0x778/0x920 age=5 cpu=1 pid=1
  __slab_alloc+0x4f8/0x560
  __kmalloc_node+0xad/0x310
  blk_mq_init_hw_queues+0x778/0x920
  blk_mq_init_queue+0x5f7/0x6c0
  virtblk_probe+0x207/0x980
  virtio_dev_probe+0x1be/0x280
  driver_probe_device+0xe2/0x5c0
  __driver_attach+0xc3/0xd0
  bus_for_each_dev+0x95/0xe0
  driver_attach+0x2b/0x30
  bus_add_driver+0x268/0x360
  driver_register+0xd3/0x1a0
  register_virtio_driver+0x3c/0x60
  init+0x53/0x80
  do_one_initcall+0xda/0x1a0
  kernel_init_freeable+0x1eb/0x27e
  INFO: Freed in kzfree+0x2d/0x40 age=13 cpu=0 pid=8
  __slab_free+0x2ab/0x3f0
  kfree+0x161/0x170
  kzfree+0x2d/0x40
  aa_free_task_context+0x5d/0xa0
  apparmor_cred_free+0x24/0x40
  security_cred_free+0x2b/0x30
  put_cred_rcu+0x38/0x140
  rcu_nocb_kthread+0x25a/0x410
  kthread+0x101/0x120
  ret_from_fork+0x58/0x90
  INFO: Slab 0xea0007d0fd00 objects=23 used=21 fp=0x8801f43f52d0 flags=0x2004080
  INFO: Object 0x8801f43f4d70 @offset=3440 fp=0x8801f43f5830
  Bytes b4 8801f43f4d60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  Object 8801f43f4d70: 00 ac 61 f7 01 88 ff ff 00 ac 69 f7 01 88 ff ff  ..a...i.
  Object 8801f43f4d80: 00 ac 71 f7 01 88 ff ff 00 ac 79 f7 01 88 ff ff  ..q...y.
  CPU: 0 PID: 1 Comm: swapper/0 Tainted: GB 3.13.0-65-generic #105
  Hardware name: OpenStack Foundation OpenStack Nova, BIOS 1.7.5-20150310_111955-batsu 04/01/2014
   ea0007d0fd00 8801f40cf9a8 81a6ce35 8801f7001c00
   8801f40cf9d8 81244aed 8801f7001c00 ea0007d0fd00
   8801f43f4d70 8801f779ac98 8801f40cfa00 8124ac36
  Call Trace:
   [] dump_stack+0x45/0x56
   [] print_trailer+0xfd/0x170
   [] object_err+0x36/0x40
   [] kasan_report_error+0x1e9/0x3a0
   [] ? sysfs_get+0x17/0x50
   [] ? kobject_add_internal+0x29b/0x4a0
   [] kasan_report+0x40/0x50
   [] ? dev_printk_emit+0x20/0x40
   [] ? blk_mq_register_disk+0x193/0x260
   [] __asan_load8+0x69/0xa0
   [] blk_mq_register_disk+0x193/0x260
   [] blk_register_queue+0xd2/0x170
   [] add_disk+0x31f/0x720
   [] virtblk_probe+0x58a/0x980
   [] ? virtblk_restore+0x100/0x100
   [] virtio_dev_probe+0x1be/0x280
   [] ? __device_attach+0x70/0x70
   [] driver_probe_device+0xe2/0x5c0
   [] ? __device_attach+0x70/0x70
   [] __driver_attach+0xc3/0xd0
   [] bus_for_each_dev+0x95/0xe0
   [] driver_attach+0x2b/0x30
   [] bus_add_driver+0x268/0x360
   [] driver_register+0xd3/0x1a0
   [] ? loop_init+0x14b/0x14b
   [] register_virtio_driver+0x3c/0x60
   [] init+0x53/0x80
   [] do_one_initcall+0xda/0x1a0
   [] kernel_init_freeable+0x1eb/0x27e
   [] ? rest_init+0x80/0x80
   [] kernel_init+0xe/0x130
   [] ret_from_fork+0x58/0x90
   [] ? rest_init+0x80/0x80
  Memory state around the buggy address:
   8801f43f4c80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
   8801f43f4d00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc 00 00
  >8801f43f4d80: 00 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc
   ^
   8801f43f4e00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
   8801f43f4e80: fc fc fc fc fc fc fc fc fc 00 00 00 00 fc fc fc
  ==

[Kernel-packages] [Bug 1534054] Re: use-after-free found by KASAN in blk_mq_register_disk

2016-01-14 Thread Gema Gomez
** Description changed:

  We are trying to debug the kernel using KASAN and we found that when a
  VM is booting in our cloud, on the virtualised kernel, there is a use-
  after-free access that should not be there.
  
- Here is the trace from KASAN:
+ The failing VM was running on a host with kernel 3.13.0-66-generic
+ (trusty). Hosts' qemu version: 1:2.2+dfsg-5expubuntu9.3~cloud0
+ 
+ Here is the trace from KASAN (from the VM):
  
  The error message can be observed in the dmesg when the guest VM booted
  with v3.13.0-65 with KASAN enabled.
  
  ==
  BUG: KASan: use after free in blk_mq_register_disk+0x193/0x260 at addr 8801ec247400
  Read of size 8 by task swapper/0/1
  =
  BUG kmalloc-32 (Not tainted): kasan: bad access detected
  -
  
  Disabling lock debugging due to kernel taint
  INFO: Slab 0xea0007b091c0 objects=128 used=128 fp=0x  (null) flags=0x280
  INFO: Object 0x8801ec247400 @offset=1024 fp=0x8801ec247420
  
  Bytes b4 8801ec2473f0: 00 ac 71 ef 01 88 ff ff 00 ac 79 ef 01 88 ff ff  ..q...y.
  Object 8801ec247400: 20 74 24 ec 01 88 ff ff 2f 76 69 72 74 75 61 6c   t$./virtual
  Object 8801ec247410: 2f 62 64 69 2f 32 35 33 3a 30 00 00 00 00 00 00  /bdi/253:0..
  CPU: 0 PID: 1 Comm: swapper/0 Tainted: GB 3.13.0-65-generic #105
  Hardware name: OpenStack Foundation OpenStack Nova, BIOS 1.7.5-20150310_111955-batsu 04/01/2014
   ea0007b091c0 8801ec0cb9a8 81a6ce35 8801ef001c00
   8801ec0cb9d8 81244aed 8801ef001c00 ea0007b091c0
   8801ec247400 8801ef79ac98 8801ec0cba00 8124ac36
  Call Trace:
   [] dump_stack+0x45/0x56
   [] print_trailer+0xfd/0x170
   [] object_err+0x36/0x40
   [] kasan_report_error+0x1e9/0x3a0
   [] ? sysfs_get+0x17/0x50
   [] ? kobject_add_internal+0x29b/0x4a0
   [] kasan_report+0x40/0x50
   [] ? dev_printk_emit+0x20/0x40
   [] ? blk_mq_register_disk+0x193/0x260
   [] __asan_load8+0x69/0xa0
   [] blk_mq_register_disk+0x193/0x260
   [] blk_register_queue+0xd2/0x170
   [] add_disk+0x31f/0x720
   [] virtblk_probe+0x58a/0x980
   [] ? virtblk_restore+0x100/0x100
   [] virtio_dev_probe+0x1be/0x280
   [] ? __device_attach+0x70/0x70
   [] driver_probe_device+0xe2/0x5c0
   [] ? __device_attach+0x70/0x70
   [] __driver_attach+0xc3/0xd0
   [] bus_for_each_dev+0x95/0xe0
   [] driver_attach+0x2b/0x30
   [] bus_add_driver+0x268/0x360
   [] driver_register+0xd3/0x1a0
   [] ? loop_init+0x14b/0x14b
   [] register_virtio_driver+0x3c/0x60
   [] init+0x53/0x80
   [] do_one_initcall+0xda/0x1a0
   [] kernel_init_freeable+0x1eb/0x27e
   [] ? rest_init+0x80/0x80
   [] kernel_init+0xe/0x130
   [] ret_from_fork+0x58/0x90
   [] ? rest_init+0x80/0x80
  Memory state around the buggy address:
   8801ec247300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   8801ec247380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  >8801ec247400: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc
     ^
   8801ec247480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
   8801ec247500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
  ==

** Description changed:

  We are trying to debug the kernel using KASAN and we found that when a
  VM is booting in our cloud, on the virtualised kernel, there is a use-
  after-free access that should not be there.
  
  The failing VM was running on a host with kernel 3.13.0-66-generic
  (trusty). Hosts' qemu version: 1:2.2+dfsg-5expubuntu9.3~cloud0
+ 
+ The flavour of this VM is 4 CPUs, 8G RAM, 80G of root disk, 0 G swap and
+ 0 G ephemeral disk.
  
  Here is the trace from KASAN (from the VM):
  
  The error message can be observed in the dmesg when the guest VM booted
  with v3.13.0-65 with KASAN enabled.
  
  ==
  BUG: KASan: use after free in blk_mq_register_disk+0x193/0x260 at addr 8801ec247400
  Read of size 8 by task swapper/0/1
  =
  BUG kmalloc-32 (Not tainted): kasan: bad access detected
  -
  
  Disabling lock debugging due to kernel taint
  INFO: Slab 0xea0007b091c0 objects=128 used=128 fp=0x  (null) flags=0x280
  INFO: Object 0x8801ec247400 @offset=1024 fp=0x8801ec247420
  
  Bytes b4 8801ec2473f0: 00 ac 71 ef 01 88 ff ff 00 ac 79 ef 01 88 ff ff  ..q...y.
  Object 8801ec247400: 20 74 24 ec 01 88 ff ff 2f 76 69 72 74 75 61 6c   t$./virtual
  Object 8801ec247410: 2f 62 64 69 2f 32 35 33 3a 30 00 00 00 00 00 00  /bdi/253:0..
  CPU: 0 PID: 1 Comm: 

[Kernel-packages] [Bug 1534054] Re: use-after-free found by KASAN in blk_mq_register_disk

2016-01-14 Thread Gema Gomez
** Description changed:

+ We are trying to debug the kernel using KASAN and we found that when a
+ VM is booting in our cloud, on the virtualised kernel, there is a use-
+ after-free access that should not be there.
+ 
+ Here is the trace from KASAN:
+ 
  The error message can be observed in the dmesg when the guest VM booted
  with v3.13.0-65 with KASAN enabled.
  
  ==
  BUG: KASan: use after free in blk_mq_register_disk+0x193/0x260 at addr 8801ec247400
  Read of size 8 by task swapper/0/1
  =
  BUG kmalloc-32 (Not tainted): kasan: bad access detected
  -
  
  Disabling lock debugging due to kernel taint
  INFO: Slab 0xea0007b091c0 objects=128 used=128 fp=0x  (null) flags=0x280
  INFO: Object 0x8801ec247400 @offset=1024 fp=0x8801ec247420
  
  Bytes b4 8801ec2473f0: 00 ac 71 ef 01 88 ff ff 00 ac 79 ef 01 88 ff ff  ..q...y.
  Object 8801ec247400: 20 74 24 ec 01 88 ff ff 2f 76 69 72 74 75 61 6c   t$./virtual
  Object 8801ec247410: 2f 62 64 69 2f 32 35 33 3a 30 00 00 00 00 00 00  /bdi/253:0..
  CPU: 0 PID: 1 Comm: swapper/0 Tainted: GB 3.13.0-65-generic #105
  Hardware name: OpenStack Foundation OpenStack Nova, BIOS 1.7.5-20150310_111955-batsu 04/01/2014
   ea0007b091c0 8801ec0cb9a8 81a6ce35 8801ef001c00
   8801ec0cb9d8 81244aed 8801ef001c00 ea0007b091c0
   8801ec247400 8801ef79ac98 8801ec0cba00 8124ac36
  Call Trace:
   [] dump_stack+0x45/0x56
   [] print_trailer+0xfd/0x170
   [] object_err+0x36/0x40
   [] kasan_report_error+0x1e9/0x3a0
   [] ? sysfs_get+0x17/0x50
   [] ? kobject_add_internal+0x29b/0x4a0
   [] kasan_report+0x40/0x50
   [] ? dev_printk_emit+0x20/0x40
   [] ? blk_mq_register_disk+0x193/0x260
   [] __asan_load8+0x69/0xa0
   [] blk_mq_register_disk+0x193/0x260
   [] blk_register_queue+0xd2/0x170
   [] add_disk+0x31f/0x720
   [] virtblk_probe+0x58a/0x980
   [] ? virtblk_restore+0x100/0x100
   [] virtio_dev_probe+0x1be/0x280
   [] ? __device_attach+0x70/0x70
   [] driver_probe_device+0xe2/0x5c0
   [] ? __device_attach+0x70/0x70
   [] __driver_attach+0xc3/0xd0
   [] bus_for_each_dev+0x95/0xe0
   [] driver_attach+0x2b/0x30
   [] bus_add_driver+0x268/0x360
   [] driver_register+0xd3/0x1a0
   [] ? loop_init+0x14b/0x14b
   [] register_virtio_driver+0x3c/0x60
   [] init+0x53/0x80
   [] do_one_initcall+0xda/0x1a0
   [] kernel_init_freeable+0x1eb/0x27e
   [] ? rest_init+0x80/0x80
   [] kernel_init+0xe/0x130
   [] ret_from_fork+0x58/0x90
   [] ? rest_init+0x80/0x80
  Memory state around the buggy address:
   8801ec247300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   8801ec247380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  >8801ec247400: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc
     ^
   8801ec247480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
   8801ec247500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
  ==

** Tags added: sts


Title:
  use-after-free found by KASAN in blk_mq_register_disk

Status in linux package in Ubuntu:
  Incomplete

Bug description:
  We are trying to debug the kernel using KASAN and we found that when a
  VM is booting in our cloud, on the virtualised kernel, there is a use-
  after-free access that should not be there.

  Here is the trace from KASAN:

  The error message can be observed in the dmesg when the guest VM
  booted with v3.13.0-65 with KASAN enabled.

  ==
  BUG: KASan: use after free in blk_mq_register_disk+0x193/0x260 at addr 8801ec247400
  Read of size 8 by task swapper/0/1
  =
  BUG kmalloc-32 (Not tainted): kasan: bad access detected
  -

  Disabling lock debugging due to kernel taint
  INFO: Slab 0xea0007b091c0 objects=128 used=128 fp=0x  (null) flags=0x280
  INFO: Object 0x8801ec247400 @offset=1024 fp=0x8801ec247420

  Bytes b4 8801ec2473f0: 00 ac 71 ef 01 88 ff ff 00 ac 79 ef 01 88 ff ff  ..q...y.
  Object 8801ec247400: 20 74 24 ec 01 88 ff ff 2f 76 69 72 74 75 61 6c   t$./virtual
  Object 8801ec247410: 2f 62 64 69 2f 32 35 33 3a 30 00 00 00 00 00 00  /bdi/253:0..
  CPU: 0 PID: 1 Comm: swapper/0 Tainted: GB 3.13.0-65-generic #105
  Hardware name: OpenStack Foundation OpenStack Nova, BIOS 1.7.5-20150310_111955-batsu 04/01/2014
  

[Kernel-packages] [Bug 1534054] Re: use-after-free found by KASAN in blk_mq_register_disk

2016-01-14 Thread Gema Gomez
** Description changed:

  We are trying to debug the kernel using KASAN and we found that when a
  VM is booting in our cloud, on the virtualised kernel, there is a use-
  after-free access that should not be there.
  
  The failing VM was running on a host with kernel 3.13.0-66-generic
- (trusty). Hosts' qemu version: 1:2.2+dfsg-5expubuntu9.3~cloud0
+ (trusty). Hosts' qemu version: 1:2.2+dfsg-5expubuntu9.3~cloud0. Hosts'
+ seabios: 1.7.5-1ubuntu1~cloud0
  
  The flavour of this VM is 4 CPUs, 8G RAM, 80G of root disk, 0 G swap and
  0 G ephemeral disk.
  
  Here is the trace from KASAN (from the VM):
  
  The error message can be observed in the dmesg when the guest VM booted
  with v3.13.0-65 with KASAN enabled.
  
  ==
  BUG: KASan: use after free in blk_mq_register_disk+0x193/0x260 at addr 8801ec247400
  Read of size 8 by task swapper/0/1
  =
  BUG kmalloc-32 (Not tainted): kasan: bad access detected
  -
  
  Disabling lock debugging due to kernel taint
  INFO: Slab 0xea0007b091c0 objects=128 used=128 fp=0x  (null) flags=0x280
  INFO: Object 0x8801ec247400 @offset=1024 fp=0x8801ec247420
  
  Bytes b4 8801ec2473f0: 00 ac 71 ef 01 88 ff ff 00 ac 79 ef 01 88 ff ff  ..q...y.
  Object 8801ec247400: 20 74 24 ec 01 88 ff ff 2f 76 69 72 74 75 61 6c   t$./virtual
  Object 8801ec247410: 2f 62 64 69 2f 32 35 33 3a 30 00 00 00 00 00 00  /bdi/253:0..
  CPU: 0 PID: 1 Comm: swapper/0 Tainted: GB 3.13.0-65-generic #105
  Hardware name: OpenStack Foundation OpenStack Nova, BIOS 1.7.5-20150310_111955-batsu 04/01/2014
   ea0007b091c0 8801ec0cb9a8 81a6ce35 8801ef001c00
   8801ec0cb9d8 81244aed 8801ef001c00 ea0007b091c0
   8801ec247400 8801ef79ac98 8801ec0cba00 8124ac36
  Call Trace:
   [] dump_stack+0x45/0x56
   [] print_trailer+0xfd/0x170
   [] object_err+0x36/0x40
   [] kasan_report_error+0x1e9/0x3a0
   [] ? sysfs_get+0x17/0x50
   [] ? kobject_add_internal+0x29b/0x4a0
   [] kasan_report+0x40/0x50
   [] ? dev_printk_emit+0x20/0x40
   [] ? blk_mq_register_disk+0x193/0x260
   [] __asan_load8+0x69/0xa0
   [] blk_mq_register_disk+0x193/0x260
   [] blk_register_queue+0xd2/0x170
   [] add_disk+0x31f/0x720
   [] virtblk_probe+0x58a/0x980
   [] ? virtblk_restore+0x100/0x100
   [] virtio_dev_probe+0x1be/0x280
   [] ? __device_attach+0x70/0x70
   [] driver_probe_device+0xe2/0x5c0
   [] ? __device_attach+0x70/0x70
   [] __driver_attach+0xc3/0xd0
   [] bus_for_each_dev+0x95/0xe0
   [] driver_attach+0x2b/0x30
   [] bus_add_driver+0x268/0x360
   [] driver_register+0xd3/0x1a0
   [] ? loop_init+0x14b/0x14b
   [] register_virtio_driver+0x3c/0x60
   [] init+0x53/0x80
   [] do_one_initcall+0xda/0x1a0
   [] kernel_init_freeable+0x1eb/0x27e
   [] ? rest_init+0x80/0x80
   [] kernel_init+0xe/0x130
   [] ret_from_fork+0x58/0x90
   [] ? rest_init+0x80/0x80
  Memory state around the buggy address:
   8801ec247300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   8801ec247380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  >8801ec247400: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc
     ^
   8801ec247480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
   8801ec247500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
  ==


Title:
  use-after-free found by KASAN in blk_mq_register_disk

Status in linux package in Ubuntu:
  Incomplete

Bug description:
  We are trying to debug the kernel using KASAN and we found that when a
  VM is booting in our cloud, on the virtualised kernel, there is a use-
  after-free access that should not be there.

  The failing VM was running on a host with kernel 3.13.0-66-generic
  (trusty). Hosts' qemu version: 1:2.2+dfsg-5expubuntu9.3~cloud0. Hosts'
  seabios: 1.7.5-1ubuntu1~cloud0

  The flavour of this VM is 4 CPUs, 8G RAM, 80G of root disk, 0 G swap
  and 0 G ephemeral disk.

  Here is the trace from KASAN (from the VM):

  The error message can be observed in the dmesg when the guest VM
  booted with v3.13.0-65 with KASAN enabled.

  ==
  BUG: KASan: use after free in blk_mq_register_disk+0x193/0x260 at addr 8801ec247400
  Read of size 8 by task swapper/0/1
  =
  BUG kmalloc-32 (Not tainted): kasan: bad access detected
  -

  Disabling lock debugging due to kernel taint
  INFO: 

[Kernel-packages] [Bug 1324125] Re: Unable to trigger a kernel crash dump on laptop

2015-04-27 Thread Gema Gomez
This bug is probably not relevant anymore, I just tried on my laptop and
it crashed nicely:

ii  linux-crashdump   3.13.0.49.56
amd64Linux kernel crashdump setup for the latest generic kernel

Kernel: 3.16.0-28-generic

On trusty.

** Tags removed: cts

** Changed in: kexec-tools (Ubuntu)
   Status: Confirmed => Incomplete

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to kexec-tools in Ubuntu.
https://bugs.launchpad.net/bugs/1324125

Title:
  Unable to trigger a kernel crash dump on laptop

Status in kexec-tools package in Ubuntu:
  Incomplete

Bug description:
  kernel crash dump doesn't work on laptop

  Ubuntu release:

  Trusty

  Steps to Reproduce:
     echo c > /proc/sysrq-trigger with linux-crashdump installed
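
For reference, here is the trigger sequence spelled out as a script. It is
dry-run by default (nothing executes unless APPLY=1), since the final write
intentionally panics the kernel; the paths are the standard procfs/sysfs knobs:

```shell
#!/bin/sh
# Dry-run sketch of triggering a crash dump with linux-crashdump installed.
# Nothing executes unless APPLY=1, because the final write panics the kernel.
APPLY=${APPLY:-0}
run() { if [ "$APPLY" = 1 ]; then "$@"; else echo "+ $*"; fi; }

run cat /sys/kernel/kexec_crash_loaded        # 1 means the crash kernel is loaded
run sh -c 'echo 1 > /proc/sys/kernel/sysrq'   # enable magic SysRq
run sh -c 'echo c > /proc/sysrq-trigger'      # panic; dump lands under /var/crash
```

Checking kexec_crash_loaded first is worthwhile: if it reads 0 the crash kernel
never loaded and triggering the panic will reboot without capturing a dump.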

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/kexec-tools/+bug/1324125/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 1413540] Re: Trusty soft lockup issues with nested KVM

2015-04-24 Thread Gema Gomez
My deployment is still running strong after over 36 hours. No crashes. I
will leave it running for a few more days to see if it happens after a
few days... and will report back.

@arges, thanks for this fix!

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1413540

Title:
  Trusty soft lockup issues with nested KVM

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Trusty:
  Fix Committed

Bug description:
  [Impact]
  Upstream discussion: https://lkml.org/lkml/2015/2/11/247

  Certain workloads that need to execute functions on a non-local CPU
  using smp_call_function_* can result in soft lockups with the
  following backtrace:

  PID: 22262  TASK: 8804274bb000  CPU: 1   COMMAND: qemu-system-x86
   #0 [88043fd03d18] machine_kexec at 8104ac02
   #1 [88043fd03d68] crash_kexec at 810e7203
   #2 [88043fd03e30] panic at 81719ff4
   #3 [88043fd03ea8] watchdog_timer_fn at 8110d7c5
   #4 [88043fd03ed8] __run_hrtimer at 8108e787
   #5 [88043fd03f18] hrtimer_interrupt at 8108ef4f
   #6 [88043fd03f80] local_apic_timer_interrupt at 81043537
   #7 [88043fd03f98] smp_apic_timer_interrupt at 81733d4f
   #8 [88043fd03fb0] apic_timer_interrupt at 817326dd
  --- IRQ stack ---
   #9 [880426f0d958] apic_timer_interrupt at 817326dd
  [exception RIP: generic_exec_single+130]
  RIP: 810dbe62  RSP: 880426f0da00  RFLAGS: 0202
  RAX: 0002  RBX: 880426f0d9d0  RCX: 0001
  RDX: 8180ad60  RSI:   RDI: 0286
  RBP: 880426f0da30   R8: 8180ad48   R9: 88042713bc68
  R10: 7fe7d1f2dbd0  R11: 0206  R12: 8804274bb000
  R13:   R14: 880407670280  R15: 
  ORIG_RAX: ff10  CS: 0010  SS: 0018
  #10 [880426f0da38] smp_call_function_single at 810dbf75
  #11 [880426f0dab0] smp_call_function_many at 810dc3a6
  #12 [880426f0db10] native_flush_tlb_others at 8105c8f7
  #13 [880426f0db38] flush_tlb_mm_range at 8105c9cb
  #14 [880426f0db68] pmdp_splitting_flush at 8105b80d
  #15 [880426f0db88] __split_huge_page at 811ac90b
  #16 [880426f0dc20] split_huge_page_to_list at 811acfb8
  #17 [880426f0dc48] __split_huge_page_pmd at 811ad956
  #18 [880426f0dcc8] unmap_page_range at 8117728d
  #19 [880426f0dda0] unmap_single_vma at 81177341
  #20 [880426f0ddd8] zap_page_range at 811784cd
  #21 [880426f0de90] sys_madvise at 81174fbf
  #22 [880426f0df80] system_call_fastpath at 8173196d
  RIP: 7fe7ca2cc647  RSP: 7fe7be9febf0  RFLAGS: 0293
  RAX: 001c  RBX: 8173196d  RCX: 
  RDX: 0004  RSI: 007fb000  RDI: 7fe7be1ff000
  RBP:    R8:    R9: 7fe7d1cd2738
  R10: 7fe7d1f2dbd0  R11: 0206  R12: 7fe7be9ff700
  R13: 7fe7be9ff9c0  R14:   R15: 
  ORIG_RAX: 001c  CS: 0033  SS: 002b

  [Fix]

  commit 9242b5b60df8b13b469bc6b7be08ff6ebb551ad3 mitigates this issue
  if b6b8a1451fc40412c57d1 is applied (as in the case of the affected
  3.13 distro kernel). However, the issue can still occur in some cases.

  
  [Workaround]

  In order to avoid this issue, the workload needs to be pinned to CPUs
  such that the function always executes locally. For the nested VM
  case, this means that the L1 VM needs to have all vCPUs pinned to a
  unique CPU. This can be accomplished with the following (for 2 vCPUs):

  virsh vcpupin domain 0 0
  virsh vcpupin domain 1 1
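
For guests with more vCPUs, the same 1:1 pinning can be generated in a loop.
A sketch (the domain name is a placeholder, and the commands are echoed rather
than executed unless APPLY=1):

```shell
#!/bin/sh
# Pin every vCPU of an L1 guest to a distinct host CPU so that
# smp_call_function_* targets always execute locally.
pin_all() {
    domain=$1; ncpus=$2; i=0
    while [ "$i" -lt "$ncpus" ]; do
        if [ "${APPLY:-0}" = 1 ]; then
            virsh vcpupin "$domain" "$i" "$i"
        else
            echo "virsh vcpupin $domain $i $i"   # dry run
        fi
        i=$((i + 1))
    done
}

pin_all mydomain 2
```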

  [Test Case]
  - Deploy openstack on openstack
  - Run tempest on L1 cloud
  - Check kernel log of L1 nova-compute nodes

  (Although this may not necessarily be related to nested KVM)
  Potentially related: https://lkml.org/lkml/2014/11/14/656

  Another test case is to do the following (on affected hardware):

  1) Create an L1 KVM VM with 2 vCPUs (single vCPU case doesn't reproduce)
  2) Create an L2 KVM VM inside the L1 VM with 1 vCPU
  3) Run something like 'stress -c 1 -m 1 -d 1 -t 1200' inside the L2 VM

  Sometimes this is sufficient to reproduce the issue, I've observed that 
running
  KSM in the L1 VM can agitate this issue (it calls native_flush_tlb_others).
  If this doesn't reproduce then you can do the following:
  4) Migrate the L2 vCPU randomly (via virsh vcpupin --live or taskset) between
  L1 vCPUs until the hang occurs.
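
Step 4 can be scripted; the sketch below bounces a task between the two L1
CPUs with taskset. The PID and iteration count are placeholders, and commands
are echoed, not executed, unless APPLY=1:

```shell
#!/bin/sh
# Bounce a task (e.g. the L2 vCPU thread) between L1 CPUs 0 and 1 to
# provoke the cross-CPU smp_call_function_* path.
migrate_loop() {
    pid=$1; iterations=$2; i=0
    while [ "$i" -lt "$iterations" ]; do
        cpu=$((i % 2))
        if [ "${APPLY:-0}" = 1 ]; then
            taskset -pc "$cpu" "$pid" && sleep 5
        else
            echo "taskset -pc $cpu $pid"   # dry run
        fi
        i=$((i + 1))
    done
}

migrate_loop 1234 4   # hypothetical qemu-system-x86 PID
```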

  --

  Original Description:

  When installing qemu-kvm on a VM, KSM is enabled.

  I have encountered this problem in trusty:$ lsb_release -a
  Distributor ID: Ubuntu
  Description:

[Kernel-packages] [Bug 1413540] Re: Trusty soft lockup issues with nested KVM

2015-04-22 Thread Gema Gomez
I have been trying to verify this kernel and I haven't seen exactly the
soft lockup crash, but this other one, which may or may not be related
but wanted to make a note of it:

[ 2406.041444] Kernel panic - not syncing: hung_task: blocked tasks
[ 2406.043163] CPU: 1 PID: 35 Comm: khungtaskd Not tainted 3.13.0-51-generic #84-Ubuntu
[ 2406.044223] Hardware name: OpenStack Foundation OpenStack Nova, BIOS Bochs 01/01/2011
[ 2406.044223]  003fffd1 88080ec7fdf0 817225ce 81a62a65
[ 2406.044223]  88080ec7fe68 8171b46d 0008 88080ec7fe78
[ 2406.044223]  88080ec7fe18 88080ec7fe40 0100 0004
[ 2406.044223] Call Trace:
[ 2406.044223]  [817225ce] dump_stack+0x45/0x56
[ 2406.044223]  [8171b46d] panic+0xc8/0x1d7
[ 2406.044223]  [8110d7b6] watchdog+0x296/0x2e0
[ 2406.044223]  [8110d520] ? reset_hung_task_detector+0x20/0x20
[ 2406.044223]  [8108b5d2] kthread+0xd2/0xf0
[ 2406.044223]  [8108b500] ? kthread_create_on_node+0x1c0/0x1c0
[ 2406.044223]  [8173300c] ret_from_fork+0x7c/0xb0
[ 2406.044223]  [8108b500] ? kthread_create_on_node+0x1c0/0x1c0

I have the crashdump for it, let me know how you want to proceed.


Title:
  Trusty soft lockup issues with nested KVM

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Trusty:
  Fix Committed

Bug description:
  [Impact]
  Upstream discussion: https://lkml.org/lkml/2015/2/11/247

  Certain workloads that need to execute functions on a non-local CPU
  using smp_call_function_* can result in soft lockups with the
  following backtrace:

  PID: 22262  TASK: 8804274bb000  CPU: 1   COMMAND: qemu-system-x86
   #0 [88043fd03d18] machine_kexec at 8104ac02
   #1 [88043fd03d68] crash_kexec at 810e7203
   #2 [88043fd03e30] panic at 81719ff4
   #3 [88043fd03ea8] watchdog_timer_fn at 8110d7c5
   #4 [88043fd03ed8] __run_hrtimer at 8108e787
   #5 [88043fd03f18] hrtimer_interrupt at 8108ef4f
   #6 [88043fd03f80] local_apic_timer_interrupt at 81043537
   #7 [88043fd03f98] smp_apic_timer_interrupt at 81733d4f
   #8 [88043fd03fb0] apic_timer_interrupt at 817326dd
  --- IRQ stack ---
   #9 [880426f0d958] apic_timer_interrupt at 817326dd
  [exception RIP: generic_exec_single+130]
  RIP: 810dbe62  RSP: 880426f0da00  RFLAGS: 0202
  RAX: 0002  RBX: 880426f0d9d0  RCX: 0001
  RDX: 8180ad60  RSI:   RDI: 0286
  RBP: 880426f0da30   R8: 8180ad48   R9: 88042713bc68
  R10: 7fe7d1f2dbd0  R11: 0206  R12: 8804274bb000
  R13:   R14: 880407670280  R15: 
  ORIG_RAX: ff10  CS: 0010  SS: 0018
  #10 [880426f0da38] smp_call_function_single at 810dbf75
  #11 [880426f0dab0] smp_call_function_many at 810dc3a6
  #12 [880426f0db10] native_flush_tlb_others at 8105c8f7
  #13 [880426f0db38] flush_tlb_mm_range at 8105c9cb
  #14 [880426f0db68] pmdp_splitting_flush at 8105b80d
  #15 [880426f0db88] __split_huge_page at 811ac90b
  #16 [880426f0dc20] split_huge_page_to_list at 811acfb8
  #17 [880426f0dc48] __split_huge_page_pmd at 811ad956
  #18 [880426f0dcc8] unmap_page_range at 8117728d
  #19 [880426f0dda0] unmap_single_vma at 81177341
  #20 [880426f0ddd8] zap_page_range at 811784cd
  #21 [880426f0de90] sys_madvise at 81174fbf
  #22 [880426f0df80] system_call_fastpath at 8173196d
  RIP: 7fe7ca2cc647  RSP: 7fe7be9febf0  RFLAGS: 0293
  RAX: 001c  RBX: 8173196d  RCX: 
  RDX: 0004  RSI: 007fb000  RDI: 7fe7be1ff000
  RBP:    R8:    R9: 7fe7d1cd2738
  R10: 7fe7d1f2dbd0  R11: 0206  R12: 7fe7be9ff700
  R13: 7fe7be9ff9c0  R14:   R15: 
  ORIG_RAX: 001c  CS: 0033  SS: 002b

  [Fix]

  commit 9242b5b60df8b13b469bc6b7be08ff6ebb551ad3 mitigates this issue
  if b6b8a1451fc40412c57d1 is applied (as in the case of the affected
  3.13 distro kernel). However, the issue can still occur in some cases.

  
  [Workaround]

  In order to avoid this issue, the workload needs to be pinned to CPUs
  such that the function always executes locally. For the nested VM
  case, this means the L1 VM needs to have all vCPUs pinned to a
  unique CPU. This can be accomplished with the following (for 2 vCPUs):

  virsh vcpupin domain 0 0
  virsh vcpupin domain 1 1
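The pinning above generalises to any vCPU count. A minimal POSIX sh sketch that prints the required 1:1 `virsh vcpupin` commands (the domain name and vCPU count are placeholders, not values from this bug report):

```shell
# Sketch: emit the virsh vcpupin commands that pin each vCPU i of a
# guest to host CPU i, the 1:1 mapping the workaround requires.
gen_vcpu_pins() {
    domain=$1
    nvcpus=$2
    i=0
    while [ "$i" -lt "$nvcpus" ]; do
        echo "virsh vcpupin $domain $i $i"
        i=$((i + 1))
    done
}

gen_vcpu_pins domain 2
# prints:
#   virsh vcpupin domain 0 0
#   virsh vcpupin domain 1 1
```

To apply rather than just print the commands, the output could be piped to `sh` on a host where the domain exists.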

[Kernel-packages] [Bug 1413540] Re: qemu-kvm package enables KSM on VMs

2015-01-22 Thread Gema Gomez
apport file for linux.

** Attachment added: apport.linux-image-3.13.0-44-generic.61de1tqv.apport
   
https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1413540/+attachment/4303621/+files/apport.linux-image-3.13.0-44-generic.61de1tqv.apport

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1413540

Title:
  qemu-kvm package enables KSM on VMs

Status in linux package in Ubuntu:
  Incomplete
Status in qemu package in Ubuntu:
  Confirmed

Bug description:
  When installing qemu-kvm on a VM, KSM is enabled.

  I have encountered this problem in trusty:
  $ lsb_release -a
  Distributor ID: Ubuntu
  Description:    Ubuntu 14.04.1 LTS
  Release:        14.04
  Codename:       trusty
  $ uname -a
  Linux juju-gema-machine-2 3.13.0-40-generic #69-Ubuntu SMP Thu Nov 13 
17:53:56 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

  The way to see the behaviour:
  1) $ more /sys/kernel/mm/ksm/run
  0
  2) $ sudo apt-get install qemu-kvm
  3) $ more /sys/kernel/mm/ksm/run
  1
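The steps above show the qemu-kvm install flipping the KSM knob on. As a minimal POSIX sh sketch, switching it back off via the same sysfs entry (the path argument exists only so the helper can be exercised against a scratch file; on a real system you would pass /sys/kernel/mm/ksm/run and run as root):

```shell
# Sketch: disable KSM through the standard sysfs knob if it is enabled.
ksm_off() {
    knob=$1
    if [ -r "$knob" ] && [ "$(cat "$knob")" = "1" ]; then
        echo 0 > "$knob"
        echo "KSM disabled via $knob"
    else
        echo "KSM already off (or $knob absent)"
    fi
}

# Demonstration against a scratch file standing in for the sysfs entry:
scratch=$(mktemp)
echo 1 > "$scratch"
ksm_off "$scratch"
cat "$scratch"    # now contains 0
rm -f "$scratch"
```

Note that the package's KSM default lives in /etc/default/qemu-kvm, so a manual echo is only a point-in-time override.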

  To see the soft lockups, deploy a cloud on a virtualised env like ctsstack 
and run tempest on it (at least twice); the compute nodes of the virtualised 
deployment will eventually stop responding with:
  [24096.072003] BUG: soft lockup - CPU#0 stuck for 23s! [qemu-system-x86:24791]
  [24124.072003] BUG: soft lockup - CPU#0 stuck for 23s! [qemu-system-x86:24791]
  [24152.072002] BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-x86:24791]
  [24180.072003] BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-x86:24791]
  [24208.072004] BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-x86:24791]
  [24236.072004] BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-x86:24791]
  [24264.072003] BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-x86:24791]

  I am not sure whether the problem is that we are enabling KSM on a VM
  or the problem is that nested KSM is not behaving properly. Either way
  I can easily reproduce, please contact me if you need further details.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1413540/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 1413540] Re: qemu-kvm package enables KSM on VMs

2015-01-22 Thread Gema Gomez
apport for qemu

** Attachment added: apport.qemu.pnfp6lff.apport
   
https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1413540/+attachment/4303623/+files/apport.qemu.pnfp6lff.apport



[Kernel-packages] [Bug 1413540] Re: qemu-kvm package enables KSM on VMs

2015-01-22 Thread Gema Gomez
I can reproduce this issue and hand a VM over to whoever is going to
triage in a hung state.



[Kernel-packages] [Bug 1413540] Re: qemu-kvm package enables KSM on VMs

2015-01-22 Thread Gema Gomez
apport file for linux.

** Attachment added: apport.linux-image-3.13.0-44-generic.61de1tqv.apport
   
https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1413540/+attachment/4303622/+files/apport.linux-image-3.13.0-44-generic.61de1tqv.apport



[Kernel-packages] [Bug 1413540] Re: issues with KSM enabled for nested KVM VMs

2015-01-22 Thread Gema Gomez
I have a different VM that has crashed (also nested nova compute), this
one had ksm disabled. See log attached.

** Attachment added: soft-lockup-different-node.log
   
https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1413540/+attachment/4303814/+files/soft-lockup-different-node.log

** Changed in: linux (Ubuntu)
   Status: Incomplete = Confirmed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1413540

Title:
  issues with KSM enabled for nested KVM VMs

Status in linux package in Ubuntu:
  Confirmed
Status in qemu package in Ubuntu:
  Confirmed
