[Kernel-packages] [Bug 1413540] Re: Trusty soft lockup issues with nested KVM

2015-04-29 Thread Launchpad Bug Tracker
This bug was fixed in the package linux - 3.13.0-51.84

---
linux (3.13.0-51.84) trusty; urgency=low

  [ Luis Henriques ]

  * Release Tracking Bug
- LP: #1444141
  * Merged back Ubuntu-3.13.0-49.83 security release

linux (3.13.0-50.82) trusty; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
- LP: #1442285

  [ Andy Whitcroft ]

  * [Config] CONFIG_DEFAULT_MMAP_MIN_ADDR needs to match on armhf and arm64
- LP: #1418140

  [ Chris J Arges ]

  * [Config] CONFIG_PCIEASPM_DEBUG=y
- LP: #1398544

  [ Upstream Kernel Changes ]

  * KEYS: request_key() should reget expired keys rather than give
EKEYEXPIRED
- LP: #1124250
  * audit: correctly record file names with different path name types
- LP: #1439441
  * KVM: x86: Check for nested events if there is an injectable interrupt
- LP: #1413540
  * be2iscsi: fix memory leak in error path
- LP: #1440156
  * block: remove old blk_iopoll_enabled variable
- LP: #1440156
  * be2iscsi: Fix handling timed out MBX completion from FW
- LP: #1440156
  * be2iscsi: Fix doorbell format for EQ/CQ/RQ s per SLI spec.
- LP: #1440156
  * be2iscsi: Fix the session cleanup when reboot/shutdown happens
- LP: #1440156
  * be2iscsi: Fix scsi_cmnd leakage in driver.
- LP: #1440156
  * be2iscsi : Fix DMA Out of SW-IOMMU space error
- LP: #1440156
  * be2iscsi: Fix retrieving MCCQ_WRB in non-embedded Mbox path
- LP: #1440156
  * be2iscsi: Fix exposing Host in sysfs after adapter initialization is
complete
- LP: #1440156
  * be2iscsi: Fix interrupt Coalescing mechanism.
- LP: #1440156
  * be2iscsi: Fix TCP parameters while connection offloading.
- LP: #1440156
  * be2iscsi: Fix memory corruption in MBX path
- LP: #1440156
  * be2iscsi: Fix destroy MCC-CQ before MCC-EQ is destroyed
- LP: #1440156
  * be2iscsi: add an missing goto in error path
- LP: #1440156
  * be2iscsi: remove potential junk pointer free
- LP: #1440156
  * be2iscsi: Fix memory leak in mgmt_set_ip()
- LP: #1440156
  * be2iscsi: Fix the sparse warning introduced in previous submission
- LP: #1440156
  * be2iscsi: Fix updating the boot enteries in sysfs
- LP: #1440156
  * be2iscsi: Fix processing CQE before connection resources are freed
- LP: #1440156
  * be2iscsi : Fix kernel panic during reboot/shutdown
- LP: #1440156
  * fixed invalid assignment of 64bit mask to host dma_boundary for scatter
gather segment boundary limit.
- LP: #1440156
  * quota: Store maximum space limit in bytes
- LP: #1441284
  * ip: zero sockaddr returned on error queue
- LP: #1441284
  * net: rps: fix cpu unplug
- LP: #1441284
  * ipv6: stop sending PTB packets for MTU < 1280
- LP: #1441284
  * netxen: fix netxen_nic_poll() logic
- LP: #1441284
  * udp_diag: Fix socket skipping within chain
- LP: #1441284
  * ping: Fix race in free in receive path
- LP: #1441284
  * bnx2x: fix napi poll return value for repoll
- LP: #1441284
  * net: don't OOPS on socket aio
- LP: #1441284
  * bridge: dont send notification when skb->len == 0 in rtnl_bridge_notify
- LP: #1441284
  * ipv4: tcp: get rid of ugly unicast_sock
- LP: #1441284
  * ppp: deflate: never return len larger than output buffer
- LP: #1441284
  * net: sctp: fix passing wrong parameter header to param_type2af in
sctp_process_param
- LP: #1441284
  * ARM: pxa: add regulator_has_full_constraints to corgi board file
- LP: #1441284
  * ARM: pxa: add regulator_has_full_constraints to poodle board file
- LP: #1441284
  * ARM: pxa: add regulator_has_full_constraints to spitz board file
- LP: #1441284
  * hx4700: regulator: declare full constraints
- LP: #1441284
  * HID: input: fix confusion on conflicting mappings
- LP: #1441284
  * HID: fixup the conflicting keyboard mappings quirk
- LP: #1441284
  * megaraid_sas: disable interrupt_mask before enabling hardware
interrupts
- LP: #1441284
  * PCI: Generate uppercase hex for modalias var in uevent
- LP: #1441284
  * usb: core: buffer: smallest buffer should start at ARCH_DMA_MINALIGN
- LP: #1441284
  * tty/serial: at91: enable peripheral clock before accessing I/O
registers
- LP: #1441284
  * tty/serial: at91: fix error handling in atmel_serial_probe()
- LP: #1441284
  * axonram: Fix bug in direct_access
- LP: #1441284
  * ksoftirqd: Enable IRQs and call cond_resched() before poking RCU
- LP: #1441284
  * TPM: Add new TPMs to the tail of the list to prevent inadvertent change
of dev
- LP: #1441284
  * char: tpm: Add missing error check for devm_kzalloc
- LP: #1441284
  * tpm_tis: verify interrupt during init
- LP: #1441284
  * tpm: Fix NULL return in tpm_ibmvtpm_get_desired_dma
- LP: #1441284
  * tpm/tpm_i2c_stm_st33: Fix potential bug in tpm_stm_i2c_send
- LP: #1441284
  * tpm/tpm_i2c_stm_st33: Add status check when reading data on the FIFO
- LP: #1441284
  * mmc: sdhci-pxa

[Kernel-packages] [Bug 1413540] Re: Trusty soft lockup issues with nested KVM

2015-04-24 Thread Gema Gomez
My deployment is still running strong after over 36 hours. No crashes. I
will leave it running for a few more days to see whether the issue shows up
later, and will report back.

@arges, thanks for this fix!

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1413540

Title:
  Trusty soft lockup issues with nested KVM

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Trusty:
  Fix Committed

Bug description:
  [Impact]
  Upstream discussion: https://lkml.org/lkml/2015/2/11/247

  Certain workloads that need to execute functions on a non-local CPU
  using smp_call_function_* can result in soft lockups with the
  following backtrace:

  PID: 22262  TASK: 8804274bb000  CPU: 1   COMMAND: "qemu-system-x86"
   #0 [88043fd03d18] machine_kexec at 8104ac02
   #1 [88043fd03d68] crash_kexec at 810e7203
   #2 [88043fd03e30] panic at 81719ff4
   #3 [88043fd03ea8] watchdog_timer_fn at 8110d7c5
   #4 [88043fd03ed8] __run_hrtimer at 8108e787
   #5 [88043fd03f18] hrtimer_interrupt at 8108ef4f
   #6 [88043fd03f80] local_apic_timer_interrupt at 81043537
   #7 [88043fd03f98] smp_apic_timer_interrupt at 81733d4f
   #8 [88043fd03fb0] apic_timer_interrupt at 817326dd
  ---  ---
   #9 [880426f0d958] apic_timer_interrupt at 817326dd
  [exception RIP: generic_exec_single+130]
  RIP: 810dbe62  RSP: 880426f0da00  RFLAGS: 0202
  RAX: 0002  RBX: 880426f0d9d0  RCX: 0001
  RDX: 8180ad60  RSI:   RDI: 0286
  RBP: 880426f0da30   R8: 8180ad48   R9: 88042713bc68
  R10: 7fe7d1f2dbd0  R11: 0206  R12: 8804274bb000
  R13:   R14: 880407670280  R15: 
  ORIG_RAX: ff10  CS: 0010  SS: 0018
  #10 [880426f0da38] smp_call_function_single at 810dbf75
  #11 [880426f0dab0] smp_call_function_many at 810dc3a6
  #12 [880426f0db10] native_flush_tlb_others at 8105c8f7
  #13 [880426f0db38] flush_tlb_mm_range at 8105c9cb
  #14 [880426f0db68] pmdp_splitting_flush at 8105b80d
  #15 [880426f0db88] __split_huge_page at 811ac90b
  #16 [880426f0dc20] split_huge_page_to_list at 811acfb8
  #17 [880426f0dc48] __split_huge_page_pmd at 811ad956
  #18 [880426f0dcc8] unmap_page_range at 8117728d
  #19 [880426f0dda0] unmap_single_vma at 81177341
  #20 [880426f0ddd8] zap_page_range at 811784cd
  #21 [880426f0de90] sys_madvise at 81174fbf
  #22 [880426f0df80] system_call_fastpath at 8173196d
  RIP: 7fe7ca2cc647  RSP: 7fe7be9febf0  RFLAGS: 0293
  RAX: 001c  RBX: 8173196d  RCX: 
  RDX: 0004  RSI: 007fb000  RDI: 7fe7be1ff000
  RBP:    R8:    R9: 7fe7d1cd2738
  R10: 7fe7d1f2dbd0  R11: 0206  R12: 7fe7be9ff700
  R13: 7fe7be9ff9c0  R14:   R15: 
  ORIG_RAX: 001c  CS: 0033  SS: 002b

  [Fix]

  Commit 9242b5b60df8b13b469bc6b7be08ff6ebb551ad3 mitigates this issue when
  b6b8a1451fc40412c57d1 is applied (as is the case for the affected 3.13 distro
  kernel); however, the issue can still occur in some cases.

  
  [Workaround]

  In order to avoid this issue, the workload needs to be pinned to CPUs
  such that the function always executes locally. For the nested VM
  case, this means the L1 VM needs to have each vCPU pinned to a unique
  host CPU. This can be accomplished with the following (for 2 vCPUs):

  virsh vcpupin <domain> 0 0
  virsh vcpupin <domain> 1 1
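
  The same pinning can be scripted for any vCPU count. The loop below is only an
  illustrative sketch (the domain name and vCPU count are placeholders); it uses
  the standard virsh --live/--config flags to apply the pinning immediately and
  persist it across guest restarts:

  DOMAIN=l1-guest   # placeholder libvirt domain name
  NVCPUS=2          # number of vCPUs in the L1 guest
  for i in $(seq 0 $((NVCPUS - 1))); do
      # pin vCPU $i of the L1 guest to host CPU $i
      virsh vcpupin "$DOMAIN" "$i" "$i" --live --config
  done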

  [Test Case]
  - Deploy openstack on openstack
  - Run tempest on L1 cloud
  - Check kernel log of L1 nova-compute nodes

  (Although this may not necessarily be related to nested KVM)
  Potentially related: https://lkml.org/lkml/2014/11/14/656

  Another test case is to do the following (on affected hardware):

  1) Create an L1 KVM VM with 2 vCPUs (single vCPU case doesn't reproduce)
  2) Create an L2 KVM VM inside the L1 VM with 1 vCPU
  3) Run something like 'stress -c 1 -m 1 -d 1 -t 1200' inside the L2 VM

  Sometimes this is sufficient to reproduce the issue; I've observed that running
  KSM in the L1 VM can aggravate it (KSM calls native_flush_tlb_others).
  If this doesn't reproduce, then you can do the following:
  4) Migrate the L2 vCPU randomly (via 'virsh vcpupin --live' or taskset) between
  L1 vCPUs until the hang occurs, as sketched below.
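
  A rough sketch of step 4, run inside the L1 guest (the process name, CPU count,
  and timings are assumptions for illustration): it bounces the qemu process that
  backs the L2 VM between the two L1 CPUs with taskset until the lockup triggers.

  L2_PID=$(pgrep -f qemu-system-x86 | head -n1)   # qemu process backing the L2 VM
  while true; do
      taskset -pc $((RANDOM % 2)) "$L2_PID"       # move affinity to L1 CPU 0 or 1
      sleep $((RANDOM % 5 + 1))                   # wait 1-5 seconds between moves
  done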

  --

  Original Description:

  When installing qemu-kvm on a VM, KSM is enabled.

  I have encountered this problem in trusty:
  $ lsb_release -a
  Distributor ID: Ubuntu
  Description:    Ubuntu 14.04.1 LTS
  Release:        14.04
  Codename:       trusty
  $ uname -a
  Linux juju-gema-machine-2 3.13.0-40-generic #69-Ubuntu SMP Thu Nov 13

[Kernel-packages] [Bug 1413540] Re: Trusty soft lockup issues with nested KVM

2015-04-23 Thread Chris J Arges
@baco-1

1) What kind of hardware are you running on L0? ('ubuntu-bug linux' and filing 
a bug would collect the necessary info)
2) What kind of load are you seeing in L0, L1?
3) Can you give me the output of 'tail /sys/module/kvm_intel/parameters/*' ?
4) You could set up crashdump to capture a dump on the hang (if we think it's the
right one), or just get a full backtrace on a soft lockup by adding the following
to the kernel cmdline:
softlockup_all_cpu_backtrace=1
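
A minimal sketch of making that option persistent on an Ubuntu guest (assuming
the stock /etc/default/grub layout):

sudo sed -i 's/GRUB_CMDLINE_LINUX_DEFAULT="/&softlockup_all_cpu_backtrace=1 /' /etc/default/grub
sudo update-grub     # regenerate grub.cfg; the option takes effect after a reboot
cat /proc/cmdline    # verify once the guest has rebooted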

Having a single vCPU could either be reducing the load or avoiding a race;
it would be hard to tell without a proper backtrace of the hang itself.

This seems like a pretty simple test case; I will put it on my list of
things to try to reproduce.

[Kernel-packages] [Bug 1413540] Re: Trusty soft lockup issues with nested KVM

2015-04-23 Thread Guy Baconniere
@arges

For me it's related, at least in part...

If I don't update the kernel to proposed-updates, I get the following messages
(if I use one CPU instead of two, I don't see them):

BUG: soft lockup - CPU#1 stuck for 22s! [qemu-system-x86:6889]
INFO: rcu_sched detected stalls on CPUs/tasks: { 0} (detected by 1, t=15002 jiffies, g=5324, c=5323, q=0)
BUG: soft lockup - CPU#1 stuck for 22s! [qemu-system-x86:6889]
...

My stress test is installing OpenStack with cloud-install on a single VM
(nested KVM) generated with uvt-kvm and two vCPUs.

The backtrace I posted is the result of a manual "force off" (using
virt-manager) of the test OpenStack VM. Shortly after cloud-install tries to
launch a VM inside the VM, the CPU reaches 100% and all the shells (including
the console) get stuck.

The last kernel message of the VM before I lose control is
"[  942.295014] IPv6: ADDRCONF(NETDEV_UP): virbr0: link is not ready"

[Kernel-packages] [Bug 1413540] Re: Trusty soft lockup issues with nested KVM

2015-04-22 Thread Chris J Arges
@baco-1

These backtraces look a bit different from the ones in the original bug. Can
you file a new bug describing how you are reproducing this, and gather
complete logs?

--chris

[Kernel-packages] [Bug 1413540] Re: Trusty soft lockup issues with nested KVM

2015-04-22 Thread Guy Baconniere
I still have the same issue with kernel 3.16.0-36-generic or 3.13.0-51-generic 
(proposed-updates)

# KVM HOST (3.16.0-36-generic)
sudo apt-get install linux-signed-generic-lts-utopic/trusty-proposed

# KVM GUEST (3.16.0-36-generic)
sudo apt-get install linux-virtual-lts-utopic/trusty-proposed
apt-get install cloud-installer
cloud-install

[ 1196.920613] kvm: vmptrld   (null)/7800 failed
[ 1196.920953] vmwrite error: reg 401e value 31 (err 1)
[ 1196.921243] CPU: 23 PID: 5240 Comm: qemu-system-x86 Not tainted 
3.16.0-36-generic #48~14.04.1-Ubuntu
[ 1196.921244] Hardware name: HP ProLiant DL380 Gen9, BIOS P89 11/03/2014
[ 1196.921245]   88202018fb58 81764a5f 
880fe496
[ 1196.921248]  88202018fb68 c0a9320d 88202018fb78 
c0a878bf
[ 1196.921250]  88202018fba8 c0a8e1cf 880fe496 

[ 1196.921252] Call Trace:
[ 1196.921262]  [] dump_stack+0x45/0x56
[ 1196.921277]  [] vmwrite_error+0x2c/0x2e [kvm_intel]
[ 1196.921280]  [] vmcs_writel+0x1f/0x30 [kvm_intel]
[ 1196.921283]  [] free_nested.part.73+0x5f/0x170 [kvm_intel]
[ 1196.921286]  [] vmx_free_vcpu+0x33/0x70 [kvm_intel]
[ 1196.921305]  [] kvm_arch_vcpu_free+0x44/0x50 [kvm]
[ 1196.921312]  [] kvm_arch_destroy_vm+0xf2/0x1f0 [kvm]
[ 1196.921318]  [] ? synchronize_srcu+0x1d/0x20
[ 1196.921323]  [] kvm_put_kvm+0x10e/0x220 [kvm]
[ 1196.921328]  [] kvm_vcpu_release+0x18/0x20 [kvm]
[ 1196.921331]  [] __fput+0xe4/0x220
[ 1196.921333]  [] fput+0xe/0x10
[ 1196.921337]  [] task_work_run+0xc4/0xe0
[ 1196.921342]  [] do_exit+0x2b8/0xa60
[ 1196.921345]  [] ? __unqueue_futex+0x32/0x70
[ 1196.921347]  [] ? futex_wait+0x126/0x290
[ 1196.921349]  [] ? check_preempt_curr+0x85/0xa0
[ 1196.921351]  [] do_group_exit+0x3f/0xa0
[ 1196.921353]  [] get_signal_to_deliver+0x1d0/0x6f0
[ 1196.921357]  [] do_signal+0x48/0xad0
[ 1196.921359]  [] ? __switch_to+0x167/0x590
[ 1196.921361]  [] do_notify_resume+0x69/0xb0
[ 1196.921364]  [] int_signal+0x12/0x17
[ 1196.921365] vmwrite error: reg 2800 value  (err -255)
[ 1196.921733] CPU: 23 PID: 5240 Comm: qemu-system-x86 Not tainted 
3.16.0-36-generic #48~14.04.1-Ubuntu
[ 1196.921734] Hardware name: HP ProLiant DL380 Gen9, BIOS P89 11/03/2014
[ 1196.921735]   88202018fb58 81764a5f 
880fe496
[ 1196.921736]  88202018fb68 c0a9320d 88202018fb78 
c0a878bf
[ 1196.921737]  88202018fba8 c0a8e1e0 880fe496 

[ 1196.921739] Call Trace:
[ 1196.921741]  [] dump_stack+0x45/0x56
[ 1196.921744]  [] vmwrite_error+0x2c/0x2e [kvm_intel]
[ 1196.921746]  [] vmcs_writel+0x1f/0x30 [kvm_intel]
[ 1196.921748]  [] free_nested.part.73+0x70/0x170 [kvm_intel]
[ 1196.921751]  [] vmx_free_vcpu+0x33/0x70 [kvm_intel]
[ 1196.921757]  [] kvm_arch_vcpu_free+0x44/0x50 [kvm]
[ 1196.921763]  [] kvm_arch_destroy_vm+0xf2/0x1f0 [kvm]
[ 1196.921765]  [] ? synchronize_srcu+0x1d/0x20
[ 1196.921770]  [] kvm_put_kvm+0x10e/0x220 [kvm]
[ 1196.921774]  [] kvm_vcpu_release+0x18/0x20 [kvm]
[ 1196.921775]  [] __fput+0xe4/0x220
[ 1196.921777]  [] fput+0xe/0x10
[ 1196.921778]  [] task_work_run+0xc4/0xe0
[ 1196.921780]  [] do_exit+0x2b8/0xa60
[ 1196.921782]  [] ? __unqueue_futex+0x32/0x70
[ 1196.921783]  [] ? futex_wait+0x126/0x290
[ 1196.921784]  [] ? check_preempt_curr+0x85/0xa0
[ 1196.921786]  [] do_group_exit+0x3f/0xa0
[ 1196.921788]  [] get_signal_to_deliver+0x1d0/0x6f0
[ 1196.921790]  [] do_signal+0x48/0xad0
[ 1196.921791]  [] ? __switch_to+0x167/0x590
[ 1196.921793]  [] do_notify_resume+0x69/0xb0
[ 1196.921795]  [] int_signal+0x12/0x17
[ 1270.766540] device vnet3 entered promiscuous mode
[ 1270.865885] device vnet4 entered promiscuous mode
[ 1273.824576] kvm: zapping shadow pages for mmio generation wraparound
[ 1447.725335] kvm [6152]: vcpu0 unhandled rdmsr: 0x606

uvt-kvm create \
--memory 16384 \
--disk 100 \
--cpu 2 \
--ssh-public-key-file uvt-authorized_keys \
--template uvt-template.xml \
test release=trusty arch=amd64


(uvt-template.xml excerpt; the XML tags were stripped by the list archive. It
specified 2 vCPUs and a custom CPU model of SandyBridge with vendor Intel.)

[Kernel-packages] [Bug 1413540] Re: Trusty soft lockup issues with nested KVM

2015-04-22 Thread Chris J Arges
After speaking to Gema, she will re-test with this kernel installed in L0 in 
addition to L1.
NOTE: This fix needs to be present for L0/L1 kernels.
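
A quick way to confirm both levels are on a kernel that carries the fix
(version numbers below assume the trusty 3.13 series; run on both L0 and L1):

uname -r                                    # expect 3.13.0-51 or later, per the announcement at the top of this thread
dpkg -l 'linux-image-3.13.0-*' | grep ^ii   # list the installed kernel packages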

[Kernel-packages] [Bug 1413540] Re: Trusty soft lockup issues with nested KVM

2015-04-22 Thread Gema Gomez
I have been trying to verify this kernel. I haven't seen exactly the soft
lockup crash, but I did hit this other one, which may or may not be related;
I wanted to make a note of it:

[ 2406.041444] Kernel panic - not syncing: hung_task: blocked tasks
[ 2406.043163] CPU: 1 PID: 35 Comm: khungtaskd Not tainted 3.13.0-51-generic 
#84-Ubuntu
[ 2406.044223] Hardware name: OpenStack Foundation OpenStack Nova, BIOS Bochs 
01/01/2011
[ 2406.044223]  003fffd1 88080ec7fdf0 817225ce 
81a62a65
[ 2406.044223]  88080ec7fe68 8171b46d 0008 
88080ec7fe78
[ 2406.044223]  88080ec7fe18 88080ec7fe40 0100 
0004
[ 2406.044223] Call Trace:
[ 2406.044223]  [] dump_stack+0x45/0x56
[ 2406.044223]  [] panic+0xc8/0x1d7
[ 2406.044223]  [] watchdog+0x296/0x2e0
[ 2406.044223]  [] ? reset_hung_task_detector+0x20/0x20
[ 2406.044223]  [] kthread+0xd2/0xf0
[ 2406.044223]  [] ? kthread_create_on_node+0x1c0/0x1c0
[ 2406.044223]  [] ret_from_fork+0x7c/0xb0
[ 2406.044223]  [] ? kthread_create_on_node+0x1c0/0x1c0

I have the crashdump for it; let me know how you want to proceed.
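
A sketch of how such a dump can be inspected with the crash utility (package
names and paths are illustrative; the matching vmlinux with debug symbols comes
from the Ubuntu ddebs archive):

sudo apt-get install crash linux-image-3.13.0-51-generic-dbgsym
sudo crash /usr/lib/debug/boot/vmlinux-3.13.0-51-generic /var/crash/*/dump.*
# inside crash: 'bt -a' prints backtraces for all CPUs; 'ps' lists tasks (UN = uninterruptible)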

[Kernel-packages] [Bug 1413540] Re: Trusty soft lockup issues with nested KVM

2015-04-21 Thread Chris J Arges
Verified on my reproducers. I'm marking the development task as fixed
for this bug. I'll move the upstream investigation to another bug.

** Changed in: linux (Ubuntu)
 Assignee: Chris J Arges (arges) => (unassigned)

** Changed in: linux (Ubuntu)
   Status: Confirmed => Fix Released

** Changed in: linux (Ubuntu)
   Importance: High => Undecided

** Tags removed: verification-needed-trusty
** Tags added: verification-done-trusty

[Kernel-packages] [Bug 1413540] Re: Trusty soft lockup issues with nested KVM

2015-04-17 Thread Brad Figg
This bug is awaiting verification that the kernel in -proposed solves
the problem. Please test the kernel and update this bug with the
results. If the problem is solved, change the tag 'verification-needed-
trusty' to 'verification-done-trusty'.

If verification is not done within 5 working days from today, this fix will
be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how
to enable and use -proposed. Thank you!


** Tags added: verification-needed-trusty
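
A minimal sketch of enabling trusty-proposed just long enough to install the
kernel (the wiki page above describes the full, pinned procedure):

echo 'deb http://archive.ubuntu.com/ubuntu/ trusty-proposed restricted main multiverse universe' | \
    sudo tee /etc/apt/sources.list.d/proposed.list
sudo apt-get update
sudo apt-get install linux-generic/trusty-proposed   # pull only the kernel from -proposed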

[Kernel-packages] [Bug 1413540] Re: Trusty soft lockup issues with nested KVM

2015-04-09 Thread Dr. Jens Rosenboom
@Andy: So 3.16.0-34 is the kernel with the fix? Any chance that it will
also be backported to the 3.13 series?
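
One way to check whether a given Ubuntu kernel already carries the fix is to
search its changelog for this bug number (a sketch; the package name is only an
example):

apt-get changelog linux-image-3.16.0-34-generic | grep -n 1413540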

[Kernel-packages] [Bug 1413540] Re: Trusty soft lockup issues with nested KVM

2015-04-08 Thread Andy Whitcroft
** Changed in: linux (Ubuntu Trusty)
   Status: In Progress => Fix Committed

[Kernel-packages] [Bug 1413540] Re: Trusty soft lockup issues with nested KVM

2015-04-06 Thread Ramy Asselin
Ran into this bug too on 3.13.0-48. My workaround is to run plain QEMU
(software emulation) on top of KVM, instead of KVM on top of KVM.

devstack local.conf:
[[post-config|$NOVA_CONF]]
[libvirt]
virt_type = qemu

nova.conf
[libvirt]
virt_type = qemu
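
After switching virt_type, nova-compute needs a restart; a quick way to confirm
that newly booted instances use plain QEMU rather than KVM (the service and
instance names are illustrative):

sudo service nova-compute restart
virsh dumpxml instance-00000001 | grep '<domain type'   # expect type='qemu', not type='kvm'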

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1413540

Title:
  Trusty soft lockup issues with nested KVM

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Trusty:
  In Progress

Bug description:
  [Impact]
  Upstream discussion: https://lkml.org/lkml/2015/2/11/247

  Certain workloads that need to execute functions on a non-local CPU
  using smp_call_function_* can result in soft lockups with the
  following backtrace:

  PID: 22262  TASK: 8804274bb000  CPU: 1   COMMAND: "qemu-system-x86"
   #0 [88043fd03d18] machine_kexec at 8104ac02
   #1 [88043fd03d68] crash_kexec at 810e7203
   #2 [88043fd03e30] panic at 81719ff4
   #3 [88043fd03ea8] watchdog_timer_fn at 8110d7c5
   #4 [88043fd03ed8] __run_hrtimer at 8108e787
   #5 [88043fd03f18] hrtimer_interrupt at 8108ef4f
   #6 [88043fd03f80] local_apic_timer_interrupt at 81043537
   #7 [88043fd03f98] smp_apic_timer_interrupt at 81733d4f
   #8 [88043fd03fb0] apic_timer_interrupt at 817326dd
  ---  ---
   #9 [880426f0d958] apic_timer_interrupt at 817326dd
  [exception RIP: generic_exec_single+130]
  RIP: 810dbe62  RSP: 880426f0da00  RFLAGS: 0202
  RAX: 0002  RBX: 880426f0d9d0  RCX: 0001
  RDX: 8180ad60  RSI:   RDI: 0286
  RBP: 880426f0da30   R8: 8180ad48   R9: 88042713bc68
  R10: 7fe7d1f2dbd0  R11: 0206  R12: 8804274bb000
  R13:   R14: 880407670280  R15: 
  ORIG_RAX: ff10  CS: 0010  SS: 0018
  #10 [880426f0da38] smp_call_function_single at 810dbf75
  #11 [880426f0dab0] smp_call_function_many at 810dc3a6
  #12 [880426f0db10] native_flush_tlb_others at 8105c8f7
  #13 [880426f0db38] flush_tlb_mm_range at 8105c9cb
  #14 [880426f0db68] pmdp_splitting_flush at 8105b80d
  #15 [880426f0db88] __split_huge_page at 811ac90b
  #16 [880426f0dc20] split_huge_page_to_list at 811acfb8
  #17 [880426f0dc48] __split_huge_page_pmd at 811ad956
  #18 [880426f0dcc8] unmap_page_range at 8117728d
  #19 [880426f0dda0] unmap_single_vma at 81177341
  #20 [880426f0ddd8] zap_page_range at 811784cd
  #21 [880426f0de90] sys_madvise at 81174fbf
  #22 [880426f0df80] system_call_fastpath at 8173196d
  RIP: 7fe7ca2cc647  RSP: 7fe7be9febf0  RFLAGS: 0293
  RAX: 001c  RBX: 8173196d  RCX: 
  RDX: 0004  RSI: 007fb000  RDI: 7fe7be1ff000
  RBP:    R8:    R9: 7fe7d1cd2738
  R10: 7fe7d1f2dbd0  R11: 0206  R12: 7fe7be9ff700
  R13: 7fe7be9ff9c0  R14:   R15: 
  ORIG_RAX: 001c  CS: 0033  SS: 002b

  [Fix]

  commit 9242b5b60df8b13b469bc6b7be08ff6ebb551ad3,
  Mitigates this issue if b6b8a1451fc40412c57d1 is applied (as in the case of 
the affected 3.13 distro kernel. However the issue can still occur in some 
cases.

  
  [Workaround]

  In order to avoid this issue, the workload needs to be pinned to CPUs
  such that the function always executes locally. For the nested VM
  case, this means that the L1 VM needs to have all vCPUs pinned to a
  unique CPU. This can be accomplished with the following (for 2 vCPUs):

  virsh vcpupin  0 0
  virsh vcpupin  1 1
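
  For example, a minimal sketch with a hypothetical L1 domain name (the lines
  above omit the domain argument) and host CPUs 0 and 1:

  virsh vcpupin l1-guest 0 0    # pin L1 vCPU 0 to host pCPU 0
  virsh vcpupin l1-guest 1 1    # pin L1 vCPU 1 to host pCPU 1
  virsh vcpuinfo l1-guest       # verify the CPU affinity of each vCPU

  Adding --config as well makes the pinning persist across domain restarts.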

  [Test Case]
  - Deploy openstack on openstack
  - Run tempest on L1 cloud
  - Check kernel log of L1 nova-compute nodes

  (Although this may not necessarily be related to nested KVM)
  Potentially related: https://lkml.org/lkml/2014/11/14/656

  Another test case is to do the following (on affected hardware):

  1) Create an L1 KVM VM with 2 vCPUs (single vCPU case doesn't reproduce)
  2) Create an L2 KVM VM inside the L1 VM with 1 vCPU
  3) Run something like 'stress -c 1 -m 1 -d 1 -t 1200' inside the L2 VM

  Sometimes this is sufficient to reproduce the issue; I've observed that running
  KSM in the L1 VM can aggravate it (KSM calls native_flush_tlb_others).
  If this doesn't reproduce the issue, you can additionally do the following (see
  the sketch after step 4):
  4) Migrate the L2 vCPU randomly (via virsh vcpupin --live or taskset) between
  L1 vCPUs until the hang occurs.
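
  A rough sketch of steps 3 and 4, assuming hypothetical domain names l1-guest
  (defined on the L0 host) and l2-guest (defined inside the L1 VM):

  # inside the L2 VM: generate CPU, memory and disk load (stress package)
  stress -c 1 -m 1 -d 1 -t 1200

  # inside the L1 VM: bounce the single L2 vCPU between the two L1 vCPUs
  while true; do
      virsh vcpupin l2-guest --live 0 $((RANDOM % 2))
      sleep 5
  done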

  --

  Original Description:

  When installing qemu-kvm on a VM, KSM is enabled.

  I have encountered this problem in trusty:$ lsb_release -a
  Distributor ID: Ubuntu
  Description:Ubuntu 14.0

[Kernel-packages] [Bug 1413540] Re: Trusty soft lockup issues with nested KVM

2015-04-06 Thread Chris J Arges
With a revert of b6b8a145 ('Rework interception of IRQs and NMIs'), the
issue does not occur readily with the test case. I was able to run for
1+ hour. Generally I can reproduce within 15m.

With 9242b5b6 ('KVM: x86: Check for nested events if there is an
injectable interrupt') applied, I can run for 1+ hour without issue.

The current 3.13.0 patch level is in between those two commits, which allows
this bug to reproduce easily.
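
For reference, a hedged sketch of how one could check whether a given Ubuntu
3.13 kernel tag already carries the mitigation, run in a checkout of the Ubuntu
Trusty kernel git tree (the tag name is a placeholder):

git log --oneline <Ubuntu-3.13.0-XX.YY> -- arch/x86/kvm | \
    grep -i 'check for nested events'

Cherry-picked commits get new SHAs, so matching on the subject of 9242b5b6 is
more reliable than searching for the upstream commit id.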

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1413540

Title:
  Trusty soft lockup issues with nested KVM

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Trusty:
  In Progress

Bug description:
  [Impact]
  Upstream discussion: https://lkml.org/lkml/2015/2/11/247

  Certain workloads that need to execute functions on a non-local CPU
  using smp_call_function_* can result in soft lockups with the
  following backtrace:

  PID: 22262  TASK: 8804274bb000  CPU: 1   COMMAND: "qemu-system-x86"
   #0 [88043fd03d18] machine_kexec at 8104ac02
   #1 [88043fd03d68] crash_kexec at 810e7203
   #2 [88043fd03e30] panic at 81719ff4
   #3 [88043fd03ea8] watchdog_timer_fn at 8110d7c5
   #4 [88043fd03ed8] __run_hrtimer at 8108e787
   #5 [88043fd03f18] hrtimer_interrupt at 8108ef4f
   #6 [88043fd03f80] local_apic_timer_interrupt at 81043537
   #7 [88043fd03f98] smp_apic_timer_interrupt at 81733d4f
   #8 [88043fd03fb0] apic_timer_interrupt at 817326dd
  ---  ---
   #9 [880426f0d958] apic_timer_interrupt at 817326dd
  [exception RIP: generic_exec_single+130]
  RIP: 810dbe62  RSP: 880426f0da00  RFLAGS: 0202
  RAX: 0002  RBX: 880426f0d9d0  RCX: 0001
  RDX: 8180ad60  RSI:   RDI: 0286
  RBP: 880426f0da30   R8: 8180ad48   R9: 88042713bc68
  R10: 7fe7d1f2dbd0  R11: 0206  R12: 8804274bb000
  R13:   R14: 880407670280  R15: 
  ORIG_RAX: ff10  CS: 0010  SS: 0018
  #10 [880426f0da38] smp_call_function_single at 810dbf75
  #11 [880426f0dab0] smp_call_function_many at 810dc3a6
  #12 [880426f0db10] native_flush_tlb_others at 8105c8f7
  #13 [880426f0db38] flush_tlb_mm_range at 8105c9cb
  #14 [880426f0db68] pmdp_splitting_flush at 8105b80d
  #15 [880426f0db88] __split_huge_page at 811ac90b
  #16 [880426f0dc20] split_huge_page_to_list at 811acfb8
  #17 [880426f0dc48] __split_huge_page_pmd at 811ad956
  #18 [880426f0dcc8] unmap_page_range at 8117728d
  #19 [880426f0dda0] unmap_single_vma at 81177341
  #20 [880426f0ddd8] zap_page_range at 811784cd
  #21 [880426f0de90] sys_madvise at 81174fbf
  #22 [880426f0df80] system_call_fastpath at 8173196d
  RIP: 7fe7ca2cc647  RSP: 7fe7be9febf0  RFLAGS: 0293
  RAX: 001c  RBX: 8173196d  RCX: 
  RDX: 0004  RSI: 007fb000  RDI: 7fe7be1ff000
  RBP:    R8:    R9: 7fe7d1cd2738
  R10: 7fe7d1f2dbd0  R11: 0206  R12: 7fe7be9ff700
  R13: 7fe7be9ff9c0  R14:   R15: 
  ORIG_RAX: 001c  CS: 0033  SS: 002b

  [Fix]

  Commit 9242b5b60df8b13b469bc6b7be08ff6ebb551ad3 mitigates this issue when
  b6b8a1451fc40412c57d1 is applied (as is the case for the affected 3.13 distro
  kernel). However, the issue can still occur in some cases.

  
  [Workaround]

  In order to avoid this issue, the workload needs to be pinned to CPUs
  such that the function always executes locally. For the nested VM
  case, this means that the L1 VM needs to have all vCPUs pinned to a
  unique CPU. This can be accomplished with the following (for 2 vCPUs):

  virsh vcpupin  0 0
  virsh vcpupin  1 1

  [Test Case]
  - Deploy openstack on openstack
  - Run tempest on L1 cloud
  - Check kernel log of L1 nova-compute nodes

  (Although this may not necessarily be related to nested KVM)
  Potentially related: https://lkml.org/lkml/2014/11/14/656

  Another test case is to do the following (on affected hardware):

  1) Create an L1 KVM VM with 2 vCPUs (single vCPU case doesn't reproduce)
  2) Create an L2 KVM VM inside the L1 VM with 1 vCPU
  3) Run something like 'stress -c 1 -m 1 -d 1 -t 1200' inside the L2 VM

  Sometimes this is sufficient to reproduce the issue; I've observed that running
  KSM in the L1 VM can aggravate it (KSM calls native_flush_tlb_others).
  If this doesn't reproduce the issue, you can additionally do the following:
  4) Migrate the L2 vCPU randomly (via virsh vcpupin --live or taskset) between
  L1 vCPUs until the hang occurs.

[Kernel-packages] [Bug 1413540] Re: Trusty soft lockup issues with nested KVM

2015-04-06 Thread Chris J Arges
** Description changed:

  [Impact]
  Upstream discussion: https://lkml.org/lkml/2015/2/11/247
  
  Certain workloads that need to execute functions on a non-local CPU
  using smp_call_function_* can result in soft lockups with the following
  backtrace:
  
  PID: 22262  TASK: 8804274bb000  CPU: 1   COMMAND: "qemu-system-x86"
   #0 [88043fd03d18] machine_kexec at 8104ac02
   #1 [88043fd03d68] crash_kexec at 810e7203
   #2 [88043fd03e30] panic at 81719ff4
   #3 [88043fd03ea8] watchdog_timer_fn at 8110d7c5
   #4 [88043fd03ed8] __run_hrtimer at 8108e787
   #5 [88043fd03f18] hrtimer_interrupt at 8108ef4f
   #6 [88043fd03f80] local_apic_timer_interrupt at 81043537
   #7 [88043fd03f98] smp_apic_timer_interrupt at 81733d4f
   #8 [88043fd03fb0] apic_timer_interrupt at 817326dd
  ---  ---
   #9 [880426f0d958] apic_timer_interrupt at 817326dd
  [exception RIP: generic_exec_single+130]
  RIP: 810dbe62  RSP: 880426f0da00  RFLAGS: 0202
  RAX: 0002  RBX: 880426f0d9d0  RCX: 0001
  RDX: 8180ad60  RSI:   RDI: 0286
  RBP: 880426f0da30   R8: 8180ad48   R9: 88042713bc68
  R10: 7fe7d1f2dbd0  R11: 0206  R12: 8804274bb000
  R13:   R14: 880407670280  R15: 
  ORIG_RAX: ff10  CS: 0010  SS: 0018
  #10 [880426f0da38] smp_call_function_single at 810dbf75
  #11 [880426f0dab0] smp_call_function_many at 810dc3a6
  #12 [880426f0db10] native_flush_tlb_others at 8105c8f7
  #13 [880426f0db38] flush_tlb_mm_range at 8105c9cb
  #14 [880426f0db68] pmdp_splitting_flush at 8105b80d
  #15 [880426f0db88] __split_huge_page at 811ac90b
  #16 [880426f0dc20] split_huge_page_to_list at 811acfb8
  #17 [880426f0dc48] __split_huge_page_pmd at 811ad956
  #18 [880426f0dcc8] unmap_page_range at 8117728d
  #19 [880426f0dda0] unmap_single_vma at 81177341
  #20 [880426f0ddd8] zap_page_range at 811784cd
  #21 [880426f0de90] sys_madvise at 81174fbf
  #22 [880426f0df80] system_call_fastpath at 8173196d
  RIP: 7fe7ca2cc647  RSP: 7fe7be9febf0  RFLAGS: 0293
  RAX: 001c  RBX: 8173196d  RCX: 
  RDX: 0004  RSI: 007fb000  RDI: 7fe7be1ff000
  RBP:    R8:    R9: 7fe7d1cd2738
  R10: 7fe7d1f2dbd0  R11: 0206  R12: 7fe7be9ff700
  R13: 7fe7be9ff9c0  R14:   R15: 
  ORIG_RAX: 001c  CS: 0033  SS: 002b
  
+ [Fix]
+ 
+ Commit 9242b5b60df8b13b469bc6b7be08ff6ebb551ad3 mitigates this issue when
+ b6b8a1451fc40412c57d1 is applied (as is the case for the affected 3.13 distro
+ kernel). However, the issue can still occur in some cases.
+ 
+ 
  [Workaround]
  
  In order to avoid this issue, the workload needs to be pinned to CPUs
  such that the function always executes locally. For the nested VM case,
  this means that the L1 VM needs to have all vCPUs pinned to a unique CPU.
  This can be accomplished with the following (for 2 vCPUs):
  
  virsh vcpupin  0 0
  virsh vcpupin  1 1
  
  [Test Case]
  - Deploy openstack on openstack
  - Run tempest on L1 cloud
  - Check kernel log of L1 nova-compute nodes
  
  (Although this may not necessarily be related to nested KVM)
  Potentially related: https://lkml.org/lkml/2014/11/14/656
  
- 
  Another test case is to do the following (on affected hardware):
  
  1) Create an L1 KVM VM with 2 vCPUs (single vCPU case doesn't reproduce)
  2) Create an L2 KVM VM inside the L1 VM with 1 vCPU
  3) Run something like 'stress -c 1 -m 1 -d 1 -t 1200' inside the L2 VM
  
  Sometimes this is sufficient to reproduce the issue; I've observed that running
  KSM in the L1 VM can aggravate it (KSM calls native_flush_tlb_others).
  If this doesn't reproduce the issue, you can additionally do the following:
  4) Migrate the L2 vCPU randomly (via virsh vcpupin --live or taskset) between
  L1 vCPUs until the hang occurs.
- 
  
  --
  
  Original Description:
  
  When installing qemu-kvm on a VM, KSM is enabled.
  
  I have encountered this problem in trusty:$ lsb_release -a
  Distributor ID: Ubuntu
  Description:Ubuntu 14.04.1 LTS
  Release:14.04
  Codename:   trusty
  $ uname -a
  Linux juju-gema-machine-2 3.13.0-40-generic #69-Ubuntu SMP Thu Nov 13 
17:53:56 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
  
  The way to see the behaviour:
  1) $ more /sys/kernel/mm/ksm/run
  0
  2) $ sudo apt-get install qemu-kvm
  3) $ more /sys/kernel/mm/ksm/run
  1
  
  To see the soft lockups, deploy a cloud on a virtualised env like ctsstack, 
run tempest on it, the compute nodes of the virtualised deployment will 
eve

[Kernel-packages] [Bug 1413540] Re: Trusty soft lockup issues with nested KVM

2015-04-06 Thread Chris J Arges
** Also affects: linux (Ubuntu Trusty)
   Importance: Undecided
   Status: New

** Changed in: linux (Ubuntu Trusty)
 Assignee: (unassigned) => Chris J Arges (arges)

** Changed in: linux (Ubuntu Trusty)
   Importance: Undecided => High

** Changed in: linux (Ubuntu Trusty)
   Status: New => In Progress

** Description changed:

  [Impact]
  Upstream discussion: https://lkml.org/lkml/2015/2/11/247
  
  Certain workloads that need to execute functions on a non-local CPU
  using smp_call_function_* can result in soft lockups with the following
  backtrace:
  
  PID: 22262  TASK: 8804274bb000  CPU: 1   COMMAND: "qemu-system-x86"
   #0 [88043fd03d18] machine_kexec at 8104ac02
   #1 [88043fd03d68] crash_kexec at 810e7203
   #2 [88043fd03e30] panic at 81719ff4
   #3 [88043fd03ea8] watchdog_timer_fn at 8110d7c5
   #4 [88043fd03ed8] __run_hrtimer at 8108e787
   #5 [88043fd03f18] hrtimer_interrupt at 8108ef4f
   #6 [88043fd03f80] local_apic_timer_interrupt at 81043537
   #7 [88043fd03f98] smp_apic_timer_interrupt at 81733d4f
   #8 [88043fd03fb0] apic_timer_interrupt at 817326dd
  ---  ---
   #9 [880426f0d958] apic_timer_interrupt at 817326dd
  [exception RIP: generic_exec_single+130]
  RIP: 810dbe62  RSP: 880426f0da00  RFLAGS: 0202
  RAX: 0002  RBX: 880426f0d9d0  RCX: 0001
  RDX: 8180ad60  RSI:   RDI: 0286
  RBP: 880426f0da30   R8: 8180ad48   R9: 88042713bc68
  R10: 7fe7d1f2dbd0  R11: 0206  R12: 8804274bb000
  R13:   R14: 880407670280  R15: 
  ORIG_RAX: ff10  CS: 0010  SS: 0018
  #10 [880426f0da38] smp_call_function_single at 810dbf75
  #11 [880426f0dab0] smp_call_function_many at 810dc3a6
  #12 [880426f0db10] native_flush_tlb_others at 8105c8f7
  #13 [880426f0db38] flush_tlb_mm_range at 8105c9cb
  #14 [880426f0db68] pmdp_splitting_flush at 8105b80d
  #15 [880426f0db88] __split_huge_page at 811ac90b
  #16 [880426f0dc20] split_huge_page_to_list at 811acfb8
  #17 [880426f0dc48] __split_huge_page_pmd at 811ad956
  #18 [880426f0dcc8] unmap_page_range at 8117728d
  #19 [880426f0dda0] unmap_single_vma at 81177341
  #20 [880426f0ddd8] zap_page_range at 811784cd
  #21 [880426f0de90] sys_madvise at 81174fbf
  #22 [880426f0df80] system_call_fastpath at 8173196d
  RIP: 7fe7ca2cc647  RSP: 7fe7be9febf0  RFLAGS: 0293
  RAX: 001c  RBX: 8173196d  RCX: 
  RDX: 0004  RSI: 007fb000  RDI: 7fe7be1ff000
  RBP:    R8:    R9: 7fe7d1cd2738
  R10: 7fe7d1f2dbd0  R11: 0206  R12: 7fe7be9ff700
  R13: 7fe7be9ff9c0  R14:   R15: 
  ORIG_RAX: 001c  CS: 0033  SS: 002b
  
  [Workaround]
  
  In order to avoid this issue, the workload needs to be pinned to CPUs
  such that the function always executes locally. For the nested VM case,
  this means that the L1 VM needs to have all vCPUs pinned to a unique CPU.
  This can be accomplished with the following (for 2 vCPUs):
  
  virsh vcpupin  0 0
  virsh vcpupin  1 1
  
  [Test Case]
  - Deploy openstack on openstack
  - Run tempest on L1 cloud
  - Check kernel log of L1 nova-compute nodes
  
  (Although this may not necessarily be related to nested KVM)
  Potentially related: https://lkml.org/lkml/2014/11/14/656
  
+ 
+ Another test case is to do the following (on affected hardware):
+ 
+ 1) Create an L1 KVM VM with 2 vCPUs (single vCPU case doesn't reproduce)
+ 2) Create an L2 KVM VM inside the L1 VM with 1 vCPU
+ 3) Run something like 'stress -c 1 -m 1 -d 1 -t 1200' inside the L2 VM
+ 
+ Sometimes this is sufficient to reproduce the issue; I've observed that running
+ KSM in the L1 VM can aggravate it (KSM calls native_flush_tlb_others).
+ If this doesn't reproduce the issue, you can additionally do the following:
+ 4) Migrate the L2 vCPU randomly (via virsh vcpupin --live or taskset) between
+ L1 vCPUs until the hang occurs.
+ 
+ 
  --
  
  Original Description:
  
  When installing qemu-kvm on a VM, KSM is enabled.
  
  I have encountered this problem in trusty:$ lsb_release -a
  Distributor ID: Ubuntu
  Description:Ubuntu 14.04.1 LTS
  Release:14.04
  Codename:   trusty
  $ uname -a
  Linux juju-gema-machine-2 3.13.0-40-generic #69-Ubuntu SMP Thu Nov 13 
17:53:56 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
  
  The way to see the behaviour:
  1) $ more /sys/kernel/mm/ksm/run
  0
  2) $ sudo apt-get install qemu-kvm
  3) $ more /sys/kernel/mm/ksm/run
  1
  
  To see the soft lockups, deploy a cloud on a virtualised env like

[Kernel-packages] [Bug 1413540] Re: Trusty soft lockup issues with nested KVM

2015-04-01 Thread Aaron Rosen
@chris: done
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1439394

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1413540

Title:
  Trusty soft lockup issues with nested KVM

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  [Impact]
  Upstream discussion: https://lkml.org/lkml/2015/2/11/247

  Certain workloads that need to execute functions on a non-local CPU
  using smp_call_function_* can result in soft lockups with the
  following backtrace:

  PID: 22262  TASK: 8804274bb000  CPU: 1   COMMAND: "qemu-system-x86"
   #0 [88043fd03d18] machine_kexec at 8104ac02
   #1 [88043fd03d68] crash_kexec at 810e7203
   #2 [88043fd03e30] panic at 81719ff4
   #3 [88043fd03ea8] watchdog_timer_fn at 8110d7c5
   #4 [88043fd03ed8] __run_hrtimer at 8108e787
   #5 [88043fd03f18] hrtimer_interrupt at 8108ef4f
   #6 [88043fd03f80] local_apic_timer_interrupt at 81043537
   #7 [88043fd03f98] smp_apic_timer_interrupt at 81733d4f
   #8 [88043fd03fb0] apic_timer_interrupt at 817326dd
  ---  ---
   #9 [880426f0d958] apic_timer_interrupt at 817326dd
  [exception RIP: generic_exec_single+130]
  RIP: 810dbe62  RSP: 880426f0da00  RFLAGS: 0202
  RAX: 0002  RBX: 880426f0d9d0  RCX: 0001
  RDX: 8180ad60  RSI:   RDI: 0286
  RBP: 880426f0da30   R8: 8180ad48   R9: 88042713bc68
  R10: 7fe7d1f2dbd0  R11: 0206  R12: 8804274bb000
  R13:   R14: 880407670280  R15: 
  ORIG_RAX: ff10  CS: 0010  SS: 0018
  #10 [880426f0da38] smp_call_function_single at 810dbf75
  #11 [880426f0dab0] smp_call_function_many at 810dc3a6
  #12 [880426f0db10] native_flush_tlb_others at 8105c8f7
  #13 [880426f0db38] flush_tlb_mm_range at 8105c9cb
  #14 [880426f0db68] pmdp_splitting_flush at 8105b80d
  #15 [880426f0db88] __split_huge_page at 811ac90b
  #16 [880426f0dc20] split_huge_page_to_list at 811acfb8
  #17 [880426f0dc48] __split_huge_page_pmd at 811ad956
  #18 [880426f0dcc8] unmap_page_range at 8117728d
  #19 [880426f0dda0] unmap_single_vma at 81177341
  #20 [880426f0ddd8] zap_page_range at 811784cd
  #21 [880426f0de90] sys_madvise at 81174fbf
  #22 [880426f0df80] system_call_fastpath at 8173196d
  RIP: 7fe7ca2cc647  RSP: 7fe7be9febf0  RFLAGS: 0293
  RAX: 001c  RBX: 8173196d  RCX: 
  RDX: 0004  RSI: 007fb000  RDI: 7fe7be1ff000
  RBP:    R8:    R9: 7fe7d1cd2738
  R10: 7fe7d1f2dbd0  R11: 0206  R12: 7fe7be9ff700
  R13: 7fe7be9ff9c0  R14:   R15: 
  ORIG_RAX: 001c  CS: 0033  SS: 002b

  [Workaround]

  In order to avoid this issue, the workload needs to be pinned to CPUs
  such that the function always executes locally. For the nested VM
  case, this means that the L1 VM needs to have all vCPUs pinned to a
  unique CPU. This can be accomplished with the following (for 2 vCPUs):

  virsh vcpupin  0 0
  virsh vcpupin  1 1

  [Test Case]
  - Deploy openstack on openstack
  - Run tempest on L1 cloud
  - Check kernel log of L1 nova-compute nodes

  (Although this may not necessarily be related to nested KVM)
  Potentially related: https://lkml.org/lkml/2014/11/14/656

  --

  Original Description:

  When installing qemu-kvm on a VM, KSM is enabled.

  I have encountered this problem in trusty:$ lsb_release -a
  Distributor ID: Ubuntu
  Description:Ubuntu 14.04.1 LTS
  Release:14.04
  Codename:   trusty
  $ uname -a
  Linux juju-gema-machine-2 3.13.0-40-generic #69-Ubuntu SMP Thu Nov 13 
17:53:56 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

  The way to see the behaviour:
  1) $ more /sys/kernel/mm/ksm/run
  0
  2) $ sudo apt-get install qemu-kvm
  3) $ more /sys/kernel/mm/ksm/run
  1

  To see the soft lockups, deploy a cloud on a virtualised env like ctsstack and
  run tempest on it (at least twice); the compute nodes of the virtualised
  deployment will eventually stop responding with:
  [24096.072003] BUG: soft lockup - CPU#0 stuck for 23s! [qemu-system-x86:24791]
  [24124.072003] BUG: soft lockup - CPU#0 stuck for 23s! [qemu-system-x86:24791]
  [24152.072002] BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-x86:24791]
  [24180.072003] BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-x86:24791]
  [24208.072004] BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-x86:24791]
  [24236.072004] BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-x86:247

[Kernel-packages] [Bug 1413540] Re: Trusty soft lockup issues with nested KVM

2015-04-01 Thread Chris J Arges
@arosen,
This looks like a different soft lockup, and the machine also seems to recover
from it. Please file a new bug and be sure to attach logs. Describe in detail
how to reproduce it as well: what kind of host machine do you have? What VM
definition are you using? Etc.

** Description changed:

  [Impact]
- Certain workloads that need to execute functions on a non-local CPU using 
smp_call_function_* can result in soft lockups with the following backtrace:
+ Upstream discussion: https://lkml.org/lkml/2015/2/11/247
+ 
+ Certain workloads that need to execute functions on a non-local CPU
+ using smp_call_function_* can result in soft lockups with the following
+ backtrace:
  
  PID: 22262  TASK: 8804274bb000  CPU: 1   COMMAND: "qemu-system-x86"
   #0 [88043fd03d18] machine_kexec at 8104ac02
   #1 [88043fd03d68] crash_kexec at 810e7203
   #2 [88043fd03e30] panic at 81719ff4
   #3 [88043fd03ea8] watchdog_timer_fn at 8110d7c5
   #4 [88043fd03ed8] __run_hrtimer at 8108e787
   #5 [88043fd03f18] hrtimer_interrupt at 8108ef4f
   #6 [88043fd03f80] local_apic_timer_interrupt at 81043537
   #7 [88043fd03f98] smp_apic_timer_interrupt at 81733d4f
   #8 [88043fd03fb0] apic_timer_interrupt at 817326dd
  ---  ---
   #9 [880426f0d958] apic_timer_interrupt at 817326dd
  [exception RIP: generic_exec_single+130]
  RIP: 810dbe62  RSP: 880426f0da00  RFLAGS: 0202
  RAX: 0002  RBX: 880426f0d9d0  RCX: 0001
  RDX: 8180ad60  RSI:   RDI: 0286
  RBP: 880426f0da30   R8: 8180ad48   R9: 88042713bc68
  R10: 7fe7d1f2dbd0  R11: 0206  R12: 8804274bb000
  R13:   R14: 880407670280  R15: 
  ORIG_RAX: ff10  CS: 0010  SS: 0018
  #10 [880426f0da38] smp_call_function_single at 810dbf75
  #11 [880426f0dab0] smp_call_function_many at 810dc3a6
  #12 [880426f0db10] native_flush_tlb_others at 8105c8f7
  #13 [880426f0db38] flush_tlb_mm_range at 8105c9cb
  #14 [880426f0db68] pmdp_splitting_flush at 8105b80d
  #15 [880426f0db88] __split_huge_page at 811ac90b
  #16 [880426f0dc20] split_huge_page_to_list at 811acfb8
  #17 [880426f0dc48] __split_huge_page_pmd at 811ad956
  #18 [880426f0dcc8] unmap_page_range at 8117728d
  #19 [880426f0dda0] unmap_single_vma at 81177341
  #20 [880426f0ddd8] zap_page_range at 811784cd
  #21 [880426f0de90] sys_madvise at 81174fbf
  #22 [880426f0df80] system_call_fastpath at 8173196d
  RIP: 7fe7ca2cc647  RSP: 7fe7be9febf0  RFLAGS: 0293
  RAX: 001c  RBX: 8173196d  RCX: 
  RDX: 0004  RSI: 007fb000  RDI: 7fe7be1ff000
  RBP:    R8:    R9: 7fe7d1cd2738
  R10: 7fe7d1f2dbd0  R11: 0206  R12: 7fe7be9ff700
  R13: 7fe7be9ff9c0  R14:   R15: 
  ORIG_RAX: 001c  CS: 0033  SS: 002b
  
  [Workaround]
  
  In order to avoid this issue, the workload needs to be pinned to CPUs
  such that the function always executes locally. For the nested VM case,
  this means that the L1 VM needs to have all vCPUs pinned to a unique CPU.
  This can be accomplished with the following (for 2 vCPUs):
  
  virsh vcpupin  0 0
  virsh vcpupin  1 1
  
- 
  [Test Case]
  - Deploy openstack on openstack
  - Run tempest on L1 cloud
  - Check kernel log of L1 nova-compute nodes
  
  (Although this may not necessarily be related to nested KVM)
  Potentially related: https://lkml.org/lkml/2014/11/14/656
  
  --
  
  Original Description:
  
  When installing qemu-kvm on a VM, KSM is enabled.
  
  I have encountered this problem in trusty:$ lsb_release -a
  Distributor ID: Ubuntu
  Description:Ubuntu 14.04.1 LTS
  Release:14.04
  Codename:   trusty
  $ uname -a
  Linux juju-gema-machine-2 3.13.0-40-generic #69-Ubuntu SMP Thu Nov 13 
17:53:56 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
  
  The way to see the behaviour:
  1) $ more /sys/kernel/mm/ksm/run
  0
  2) $ sudo apt-get install qemu-kvm
  3) $ more /sys/kernel/mm/ksm/run
  1
  
  To see the soft lockups, deploy a cloud on a virtualised env like ctsstack and
  run tempest on it (at least twice); the compute nodes of the virtualised
  deployment will eventually stop responding with:
  [24096.072003] BUG: soft lockup - CPU#0 stuck for 23s! [qemu-system-x86:24791]
  [24124.072003] BUG: soft lockup - CPU#0 stuck for 23s! [qemu-system-x86:24791]
  [24152.072002] BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-x86:24791]
  [24180.072003] BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-x86:24791]
  [24208.07200

[Kernel-packages] [Bug 1413540] Re: Trusty soft lockup issues with nested KVM

2015-03-31 Thread Aaron Rosen
I am also hitting this issue in my CI a lot. Here is the trace I'm
getting in syslog: http://logs2.aaronorosen.com/85/169585/1/check/dsvm-
tempest-full-congress-
nodepool/94f8441/logs/syslog.txt.gz#_Apr__1_02_43_44

Is there a workaround for this?

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1413540

Title:
  Trusty soft lockup issues with nested KVM

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  [Impact]
  Certain workloads that need to execute functions on a non-local CPU using 
smp_call_function_* can result in soft lockups with the following backtrace:

  PID: 22262  TASK: 8804274bb000  CPU: 1   COMMAND: "qemu-system-x86"
   #0 [88043fd03d18] machine_kexec at 8104ac02
   #1 [88043fd03d68] crash_kexec at 810e7203
   #2 [88043fd03e30] panic at 81719ff4
   #3 [88043fd03ea8] watchdog_timer_fn at 8110d7c5
   #4 [88043fd03ed8] __run_hrtimer at 8108e787
   #5 [88043fd03f18] hrtimer_interrupt at 8108ef4f
   #6 [88043fd03f80] local_apic_timer_interrupt at 81043537
   #7 [88043fd03f98] smp_apic_timer_interrupt at 81733d4f
   #8 [88043fd03fb0] apic_timer_interrupt at 817326dd
  ---  ---
   #9 [880426f0d958] apic_timer_interrupt at 817326dd
  [exception RIP: generic_exec_single+130]
  RIP: 810dbe62  RSP: 880426f0da00  RFLAGS: 0202
  RAX: 0002  RBX: 880426f0d9d0  RCX: 0001
  RDX: 8180ad60  RSI:   RDI: 0286
  RBP: 880426f0da30   R8: 8180ad48   R9: 88042713bc68
  R10: 7fe7d1f2dbd0  R11: 0206  R12: 8804274bb000
  R13:   R14: 880407670280  R15: 
  ORIG_RAX: ff10  CS: 0010  SS: 0018
  #10 [880426f0da38] smp_call_function_single at 810dbf75
  #11 [880426f0dab0] smp_call_function_many at 810dc3a6
  #12 [880426f0db10] native_flush_tlb_others at 8105c8f7
  #13 [880426f0db38] flush_tlb_mm_range at 8105c9cb
  #14 [880426f0db68] pmdp_splitting_flush at 8105b80d
  #15 [880426f0db88] __split_huge_page at 811ac90b
  #16 [880426f0dc20] split_huge_page_to_list at 811acfb8
  #17 [880426f0dc48] __split_huge_page_pmd at 811ad956
  #18 [880426f0dcc8] unmap_page_range at 8117728d
  #19 [880426f0dda0] unmap_single_vma at 81177341
  #20 [880426f0ddd8] zap_page_range at 811784cd
  #21 [880426f0de90] sys_madvise at 81174fbf
  #22 [880426f0df80] system_call_fastpath at 8173196d
  RIP: 7fe7ca2cc647  RSP: 7fe7be9febf0  RFLAGS: 0293
  RAX: 001c  RBX: 8173196d  RCX: 
  RDX: 0004  RSI: 007fb000  RDI: 7fe7be1ff000
  RBP:    R8:    R9: 7fe7d1cd2738
  R10: 7fe7d1f2dbd0  R11: 0206  R12: 7fe7be9ff700
  R13: 7fe7be9ff9c0  R14:   R15: 
  ORIG_RAX: 001c  CS: 0033  SS: 002b

  [Workaround]

  In order to avoid this issue, the workload needs to be pinned to CPUs
  such that the function always executes locally. For the nested VM
  case, this means that the L1 VM needs to have all vCPUs pinned to a
  unique CPU. This can be accomplished with the following (for 2 vCPUs):

  virsh vcpupin  0 0
  virsh vcpupin  1 1

  
  [Test Case]
  - Deploy openstack on openstack
  - Run tempest on L1 cloud
  - Check kernel log of L1 nova-compute nodes

  (Although this may not necessarily be related to nested KVM)
  Potentially related: https://lkml.org/lkml/2014/11/14/656

  --

  Original Description:

  When installing qemu-kvm on a VM, KSM is enabled.

  I have encountered this problem in trusty:$ lsb_release -a
  Distributor ID: Ubuntu
  Description:Ubuntu 14.04.1 LTS
  Release:14.04
  Codename:   trusty
  $ uname -a
  Linux juju-gema-machine-2 3.13.0-40-generic #69-Ubuntu SMP Thu Nov 13 
17:53:56 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

  The way to see the behaviour:
  1) $ more /sys/kernel/mm/ksm/run
  0
  2) $ sudo apt-get install qemu-kvm
  3) $ more /sys/kernel/mm/ksm/run
  1

  To see the soft lockups, deploy a cloud on a virtualised env like ctsstack and
  run tempest on it (at least twice); the compute nodes of the virtualised
  deployment will eventually stop responding with:
  [24096.072003] BUG: soft lockup - CPU#0 stuck for 23s! [qemu-system-x86:24791]
  [24124.072003] BUG: soft lockup - CPU#0 stuck for 23s! [qemu-system-x86:24791]
  [24152.072002] BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-x86:24791]
  [24180.072003] BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-x86:24791]
  [24208.072004] BUG: soft lockup - CPU#0 s

[Kernel-packages] [Bug 1413540] Re: Trusty soft lockup issues with nested KVM

2015-03-27 Thread Chris J Arges
@fifieldt

Hi, that is the same bug. Things that reduce the hangs right now are (see the
sketch below for the first item):
- Disabling KSM in the L1 guest
- Using a 3.16 kernel on the L0 host
- Pinning L1 vCPUs to L0 host CPUs

Note this doesn't fix the issue; it only (potentially) decreases the frequency
of these lockups.
--chris
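
A minimal sketch of the first item, run inside the L1 guest (this does not
persist across reboots):

cat /sys/kernel/mm/ksm/run                 # 1 means KSM is currently active
echo 0 | sudo tee /sys/kernel/mm/ksm/run   # stop KSM

As noted in the original description, installing qemu-kvm is what flips this
value to 1, so re-check it after package changes or reboots.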

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1413540

Title:
  Trusty soft lockup issues with nested KVM

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  [Impact]
  Certain workloads that need to execute functions on a non-local CPU using 
smp_call_function_* can result in soft lockups with the following backtrace:

  PID: 22262  TASK: 8804274bb000  CPU: 1   COMMAND: "qemu-system-x86"
   #0 [88043fd03d18] machine_kexec at 8104ac02
   #1 [88043fd03d68] crash_kexec at 810e7203
   #2 [88043fd03e30] panic at 81719ff4
   #3 [88043fd03ea8] watchdog_timer_fn at 8110d7c5
   #4 [88043fd03ed8] __run_hrtimer at 8108e787
   #5 [88043fd03f18] hrtimer_interrupt at 8108ef4f
   #6 [88043fd03f80] local_apic_timer_interrupt at 81043537
   #7 [88043fd03f98] smp_apic_timer_interrupt at 81733d4f
   #8 [88043fd03fb0] apic_timer_interrupt at 817326dd
  ---  ---
   #9 [880426f0d958] apic_timer_interrupt at 817326dd
  [exception RIP: generic_exec_single+130]
  RIP: 810dbe62  RSP: 880426f0da00  RFLAGS: 0202
  RAX: 0002  RBX: 880426f0d9d0  RCX: 0001
  RDX: 8180ad60  RSI:   RDI: 0286
  RBP: 880426f0da30   R8: 8180ad48   R9: 88042713bc68
  R10: 7fe7d1f2dbd0  R11: 0206  R12: 8804274bb000
  R13:   R14: 880407670280  R15: 
  ORIG_RAX: ff10  CS: 0010  SS: 0018
  #10 [880426f0da38] smp_call_function_single at 810dbf75
  #11 [880426f0dab0] smp_call_function_many at 810dc3a6
  #12 [880426f0db10] native_flush_tlb_others at 8105c8f7
  #13 [880426f0db38] flush_tlb_mm_range at 8105c9cb
  #14 [880426f0db68] pmdp_splitting_flush at 8105b80d
  #15 [880426f0db88] __split_huge_page at 811ac90b
  #16 [880426f0dc20] split_huge_page_to_list at 811acfb8
  #17 [880426f0dc48] __split_huge_page_pmd at 811ad956
  #18 [880426f0dcc8] unmap_page_range at 8117728d
  #19 [880426f0dda0] unmap_single_vma at 81177341
  #20 [880426f0ddd8] zap_page_range at 811784cd
  #21 [880426f0de90] sys_madvise at 81174fbf
  #22 [880426f0df80] system_call_fastpath at 8173196d
  RIP: 7fe7ca2cc647  RSP: 7fe7be9febf0  RFLAGS: 0293
  RAX: 001c  RBX: 8173196d  RCX: 
  RDX: 0004  RSI: 007fb000  RDI: 7fe7be1ff000
  RBP:    R8:    R9: 7fe7d1cd2738
  R10: 7fe7d1f2dbd0  R11: 0206  R12: 7fe7be9ff700
  R13: 7fe7be9ff9c0  R14:   R15: 
  ORIG_RAX: 001c  CS: 0033  SS: 002b

  [Workaround]

  In order to avoid this issue, the workload needs to be pinned to CPUs
  such that the function always executes locally. For the nested VM
  case, this means that the L1 VM needs to have all vCPUs pinned to a
  unique CPU. This can be accomplished with the following (for 2 vCPUs):

  virsh vcpupin  0 0
  virsh vcpupin  1 1

  
  [Test Case]
  - Deploy openstack on openstack
  - Run tempest on L1 cloud
  - Check kernel log of L1 nova-compute nodes

  (Although this may not necessarily be related to nested KVM)
  Potentially related: https://lkml.org/lkml/2014/11/14/656

  --

  Original Description:

  When installing qemu-kvm on a VM, KSM is enabled.

  I have encountered this problem in trusty:$ lsb_release -a
  Distributor ID: Ubuntu
  Description:Ubuntu 14.04.1 LTS
  Release:14.04
  Codename:   trusty
  $ uname -a
  Linux juju-gema-machine-2 3.13.0-40-generic #69-Ubuntu SMP Thu Nov 13 
17:53:56 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

  The way to see the behaviour:
  1) $ more /sys/kernel/mm/ksm/run
  0
  2) $ sudo apt-get install qemu-kvm
  3) $ more /sys/kernel/mm/ksm/run
  1

  To see the soft lockups, deploy a cloud on a virtualised env like ctsstack and
  run tempest on it (at least twice); the compute nodes of the virtualised
  deployment will eventually stop responding with:
  [24096.072003] BUG: soft lockup - CPU#0 stuck for 23s! [qemu-system-x86:24791]
  [24124.072003] BUG: soft lockup - CPU#0 stuck for 23s! [qemu-system-x86:24791]
  [24152.072002] BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-x86:24791]
  [24180.072003] BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-x86:24791]
  [24208.

[Kernel-packages] [Bug 1413540] Re: Trusty soft lockup issues with nested KVM

2015-03-27 Thread Tom Fifield
Hi,

Just wanted to chime in that this bug also affected me - running
OpenStack Juno w/KVM inside a KVM hypervisor.

CPU on the host machine is:
vendor_id   : GenuineIntel
cpu family  : 6
model   : 58
model name  : Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz

running 14.04 with the latest packages applied as of today (2015-03-27)
for both the host and the guest.

The lockup appeared to happen with one host-guest VM after I altered the
number of CPUs allocated to another VM (that VM had yet to be rebooted for the
changes to take effect), though I had also recently booted a new
host-guest-guest VM.


Mar 27 15:12:43 compute ntpd[1775]: peers refreshed
Mar 27 15:12:43 compute ntpd[1775]: new interface(s) found: waking up resolver
Mar 27 15:12:48 compute dnsmasq-dhcp[2044]: DHCPDISCOVER(br100) 
fa:16:3e:c3:81:22 
Mar 27 15:12:48 compute dnsmasq-dhcp[2044]: DHCPOFFER(br100) 203.0.113.27 
fa:16:3e:c3:81:22 
Mar 27 15:12:48 compute dnsmasq-dhcp[2044]: DHCPREQUEST(br100) 203.0.113.27 
fa:16:3e:c3:81:22 
Mar 27 15:12:48 compute dnsmasq-dhcp[2044]: DHCPACK(br100) 203.0.113.27 
fa:16:3e:c3:81:22 test03
Mar 27 15:15:40 compute kernel: [  436.12] BUG: soft lockup - CPU#5 stuck 
for 23s! [ksmd:68]
Mar 27 15:15:40 compute kernel: [  436.12] Modules linked in: vhost_net 
vhost macvtap macvlan xt_CHECKSUM ebt_ip ebt_arp ebtable_filter br
idge stp llc xt_conntrack xt_nat xt_tcpudp iptable_mangle iptable_nat 
nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip6tabl
e_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables nbd 
ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_
tcp libiscsi_tcp libiscsi scsi_transport_iscsi snd_hda_intel cirrus 
snd_hda_codec ttm snd_hwdep drm_kms_helper snd_pcm drm snd_page_alloc snd_
timer syscopyarea snd sysfillrect soundcore sysimgblt dm_multipath i2c_piix4 
kvm_intel scsi_dh serio_raw kvm mac_hid lp parport 8139too psmous
e 8139cp mii floppy pata_acpi
Mar 27 15:15:40 compute kernel: [  436.12] CPU: 5 PID: 68 Comm: ksmd Not 
tainted 3.13.0-46-generic #79-Ubuntu
Mar 27 15:15:40 compute kernel: [  436.12] Hardware name: QEMU Standard PC 
(i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Mar 27 15:15:40 compute kernel: [  436.12] task: 8802306db000 ti: 
8802306e4000 task.ti: 8802306e4000
Mar 27 15:15:40 compute kernel: [  436.12] RIP: 0010:[]  
[] generic_exec_single+0x86/0xb0
Mar 27 15:15:40 compute kernel: [  436.12] RSP: 0018:8802306e5c00  
EFLAGS: 0202
Mar 27 15:15:40 compute kernel: [  436.12] RAX: 0006 RBX: 
8802306e5bd0 RCX: 0005
Mar 27 15:15:40 compute kernel: [  436.12] RDX: 8180ade0 RSI: 
 RDI: 0286
Mar 27 15:15:40 compute kernel: [  436.12] RBP: 8802306e5c30 R08: 
8180adc8 R09: 880232989b48
Mar 27 15:15:40 compute kernel: [  436.12] R10: 0867 R11: 
 R12: 
Mar 27 15:15:40 compute kernel: [  436.12] R13:  R14: 
 R15: 
Mar 27 15:15:40 compute kernel: [  436.12] FS:  () 
GS:88023fd4() knlGS:
Mar 27 15:15:40 compute kernel: [  436.12] CS:  0010 DS:  ES:  CR0: 
8005003b
Mar 27 15:15:40 compute kernel: [  436.12] CR2: 7fb0557bf000 CR3: 
36b7d000 CR4: 26e0
Mar 27 15:15:40 compute kernel: [  436.12] Stack:
Mar 27 15:15:40 compute kernel: [  436.12]  88023fd13f80 
0004 0005 81d14300
Mar 27 15:15:40 compute kernel: [  436.12]  8105c7a0 
88023212c380 8802306e5ca8 810dc065
Mar 27 15:15:40 compute kernel: [  436.12]  000134c0 
000134c0 88023fd13f80 88023fd13f80
Mar 27 15:15:40 compute kernel: [  436.12] Call Trace:
Mar 27 15:15:40 compute kernel: [  436.12]  [] ? 
leave_mm+0x80/0x80
Mar 27 15:15:40 compute kernel: [  436.12]  [] 
smp_call_function_single+0xe5/0x190
Mar 27 15:15:40 compute kernel: [  436.12]  [] ? 
leave_mm+0x80/0x80
Mar 27 15:15:40 compute kernel: [  436.12]  [] ? 
kvm_handle_hva_range+0x11a/0x180 [kvm]
Mar 27 15:15:40 compute kernel: [  436.12]  [] ? 
rmap_write_protect+0x80/0x80 [kvm]
Mar 27 15:15:40 compute kernel: [  436.12]  [] 
smp_call_function_many+0x286/0x2d0
Mar 27 15:15:40 compute kernel: [  436.12]  [] ? 
leave_mm+0x80/0x80
Mar 27 15:15:40 compute kernel: [  436.12]  [] 
native_flush_tlb_others+0x37/0x40
Mar 27 15:15:40 compute kernel: [  436.12]  [] 
flush_tlb_page+0x56/0xa0
Mar 27 15:15:40 compute kernel: [  436.12]  [] 
ptep_clear_flush+0x48/0x60
Mar 27 15:15:40 compute kernel: [  436.12]  [] 
try_to_merge_with_ksm_page+0x14f/0x650
Mar 27 15:15:40 compute kernel: [  436.12]  [] 
ksm_do_scan+0xb96/0xdb0
Mar 27 15:15:40 compute kernel: [  436.12]  [] 
ksm_scan_thread+0x7f/0x200
Mar 27 15:15:40 compute kernel: [  436.12]  [] ? 
prepare_

[Kernel-packages] [Bug 1413540] Re: Trusty soft lockup issues with nested KVM

2015-03-25 Thread Chris J Arges
Ideas going forward:

1) Instrument the kernel for debugging csd_lock
2) Determine which CPUs exhibit this issue
3) Examine pinning in more depth (pin 0-0, 1-2, for example)
4) Test older and newer kernels to verify the issue

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1413540

Title:
  Trusty soft lockup issues with nested KVM

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  [Impact]
  Certain workloads that need to execute functions on a non-local CPU using 
smp_call_function_* can result in soft lockups with the following backtrace:

  PID: 22262  TASK: 8804274bb000  CPU: 1   COMMAND: "qemu-system-x86"
   #0 [88043fd03d18] machine_kexec at 8104ac02
   #1 [88043fd03d68] crash_kexec at 810e7203
   #2 [88043fd03e30] panic at 81719ff4
   #3 [88043fd03ea8] watchdog_timer_fn at 8110d7c5
   #4 [88043fd03ed8] __run_hrtimer at 8108e787
   #5 [88043fd03f18] hrtimer_interrupt at 8108ef4f
   #6 [88043fd03f80] local_apic_timer_interrupt at 81043537
   #7 [88043fd03f98] smp_apic_timer_interrupt at 81733d4f
   #8 [88043fd03fb0] apic_timer_interrupt at 817326dd
  ---  ---
   #9 [880426f0d958] apic_timer_interrupt at 817326dd
  [exception RIP: generic_exec_single+130]
  RIP: 810dbe62  RSP: 880426f0da00  RFLAGS: 0202
  RAX: 0002  RBX: 880426f0d9d0  RCX: 0001
  RDX: 8180ad60  RSI:   RDI: 0286
  RBP: 880426f0da30   R8: 8180ad48   R9: 88042713bc68
  R10: 7fe7d1f2dbd0  R11: 0206  R12: 8804274bb000
  R13:   R14: 880407670280  R15: 
  ORIG_RAX: ff10  CS: 0010  SS: 0018
  #10 [880426f0da38] smp_call_function_single at 810dbf75
  #11 [880426f0dab0] smp_call_function_many at 810dc3a6
  #12 [880426f0db10] native_flush_tlb_others at 8105c8f7
  #13 [880426f0db38] flush_tlb_mm_range at 8105c9cb
  #14 [880426f0db68] pmdp_splitting_flush at 8105b80d
  #15 [880426f0db88] __split_huge_page at 811ac90b
  #16 [880426f0dc20] split_huge_page_to_list at 811acfb8
  #17 [880426f0dc48] __split_huge_page_pmd at 811ad956
  #18 [880426f0dcc8] unmap_page_range at 8117728d
  #19 [880426f0dda0] unmap_single_vma at 81177341
  #20 [880426f0ddd8] zap_page_range at 811784cd
  #21 [880426f0de90] sys_madvise at 81174fbf
  #22 [880426f0df80] system_call_fastpath at 8173196d
  RIP: 7fe7ca2cc647  RSP: 7fe7be9febf0  RFLAGS: 0293
  RAX: 001c  RBX: 8173196d  RCX: 
  RDX: 0004  RSI: 007fb000  RDI: 7fe7be1ff000
  RBP:    R8:    R9: 7fe7d1cd2738
  R10: 7fe7d1f2dbd0  R11: 0206  R12: 7fe7be9ff700
  R13: 7fe7be9ff9c0  R14:   R15: 
  ORIG_RAX: 001c  CS: 0033  SS: 002b

  [Workaround]

  In order to avoid this issue, the workload needs to be pinned to CPUs
  such that the function always executes locally. For the nested VM
  case, this means that the L1 VM needs to have all vCPUs pinned to a
  unique CPU. This can be accomplished with the following (for 2 vCPUs):

  virsh vcpupin  0 0
  virsh vcpupin  1 1

  
  [Test Case]
  - Deploy openstack on openstack
  - Run tempest on L1 cloud
  - Check kernel log of L1 nova-compute nodes

  (Although this may not necessarily be related to nested KVM)
  Potentially related: https://lkml.org/lkml/2014/11/14/656

  --

  Original Description:

  When installing qemu-kvm on a VM, KSM is enabled.

  I have encountered this problem in trusty:$ lsb_release -a
  Distributor ID: Ubuntu
  Description:Ubuntu 14.04.1 LTS
  Release:14.04
  Codename:   trusty
  $ uname -a
  Linux juju-gema-machine-2 3.13.0-40-generic #69-Ubuntu SMP Thu Nov 13 
17:53:56 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

  The way to see the behaviour:
  1) $ more /sys/kernel/mm/ksm/run
  0
  2) $ sudo apt-get install qemu-kvm
  3) $ more /sys/kernel/mm/ksm/run
  1

  To see the soft lockups, deploy a cloud on a virtualised env like ctsstack and
  run tempest on it (at least twice); the compute nodes of the virtualised
  deployment will eventually stop responding with:
  [24096.072003] BUG: soft lockup - CPU#0 stuck for 23s! [qemu-system-x86:24791]
  [24124.072003] BUG: soft lockup - CPU#0 stuck for 23s! [qemu-system-x86:24791]
  [24152.072002] BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-x86:24791]
  [24180.072003] BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-x86:24791]
  [24208.072004] BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-x8

[Kernel-packages] [Bug 1413540] Re: Trusty soft lockup issues with nested KVM

2015-03-25 Thread Chris J Arges
Stefan,
This looks like a separate bug (as we discussed). Please file another bug for 
this when you have time.

** Description changed:

  [Impact]
- Users of nested KVM for testing openstack have soft lockups as follows:
+ Certain workloads that need to execute functions on a non-local CPU using 
smp_call_function_* can result in soft lockups with the following backtrace:
  
  PID: 22262  TASK: 8804274bb000  CPU: 1   COMMAND: "qemu-system-x86"
-  #0 [88043fd03d18] machine_kexec at 8104ac02
-  #1 [88043fd03d68] crash_kexec at 810e7203
-  #2 [88043fd03e30] panic at 81719ff4
-  #3 [88043fd03ea8] watchdog_timer_fn at 8110d7c5
-  #4 [88043fd03ed8] __run_hrtimer at 8108e787
-  #5 [88043fd03f18] hrtimer_interrupt at 8108ef4f
-  #6 [88043fd03f80] local_apic_timer_interrupt at 81043537
-  #7 [88043fd03f98] smp_apic_timer_interrupt at 81733d4f
-  #8 [88043fd03fb0] apic_timer_interrupt at 817326dd
+  #0 [88043fd03d18] machine_kexec at 8104ac02
+  #1 [88043fd03d68] crash_kexec at 810e7203
+  #2 [88043fd03e30] panic at 81719ff4
+  #3 [88043fd03ea8] watchdog_timer_fn at 8110d7c5
+  #4 [88043fd03ed8] __run_hrtimer at 8108e787
+  #5 [88043fd03f18] hrtimer_interrupt at 8108ef4f
+  #6 [88043fd03f80] local_apic_timer_interrupt at 81043537
+  #7 [88043fd03f98] smp_apic_timer_interrupt at 81733d4f
+  #8 [88043fd03fb0] apic_timer_interrupt at 817326dd
  ---  ---
-  #9 [880426f0d958] apic_timer_interrupt at 817326dd
- [exception RIP: generic_exec_single+130]
- RIP: 810dbe62  RSP: 880426f0da00  RFLAGS: 0202
- RAX: 0002  RBX: 880426f0d9d0  RCX: 0001
- RDX: 8180ad60  RSI:   RDI: 0286
- RBP: 880426f0da30   R8: 8180ad48   R9: 88042713bc68
- R10: 7fe7d1f2dbd0  R11: 0206  R12: 8804274bb000
- R13:   R14: 880407670280  R15: 
- ORIG_RAX: ff10  CS: 0010  SS: 0018
+  #9 [880426f0d958] apic_timer_interrupt at 817326dd
+ [exception RIP: generic_exec_single+130]
+ RIP: 810dbe62  RSP: 880426f0da00  RFLAGS: 0202
+ RAX: 0002  RBX: 880426f0d9d0  RCX: 0001
+ RDX: 8180ad60  RSI:   RDI: 0286
+ RBP: 880426f0da30   R8: 8180ad48   R9: 88042713bc68
+ R10: 7fe7d1f2dbd0  R11: 0206  R12: 8804274bb000
+ R13:   R14: 880407670280  R15: 
+ ORIG_RAX: ff10  CS: 0010  SS: 0018
  #10 [880426f0da38] smp_call_function_single at 810dbf75
  #11 [880426f0dab0] smp_call_function_many at 810dc3a6
  #12 [880426f0db10] native_flush_tlb_others at 8105c8f7
  #13 [880426f0db38] flush_tlb_mm_range at 8105c9cb
  #14 [880426f0db68] pmdp_splitting_flush at 8105b80d
  #15 [880426f0db88] __split_huge_page at 811ac90b
  #16 [880426f0dc20] split_huge_page_to_list at 811acfb8
  #17 [880426f0dc48] __split_huge_page_pmd at 811ad956
  #18 [880426f0dcc8] unmap_page_range at 8117728d
  #19 [880426f0dda0] unmap_single_vma at 81177341
  #20 [880426f0ddd8] zap_page_range at 811784cd
  #21 [880426f0de90] sys_madvise at 81174fbf
  #22 [880426f0df80] system_call_fastpath at 8173196d
- RIP: 7fe7ca2cc647  RSP: 7fe7be9febf0  RFLAGS: 0293
- RAX: 001c  RBX: 8173196d  RCX: 
- RDX: 0004  RSI: 007fb000  RDI: 7fe7be1ff000
- RBP:    R8:    R9: 7fe7d1cd2738
- R10: 7fe7d1f2dbd0  R11: 0206  R12: 7fe7be9ff700
- R13: 7fe7be9ff9c0  R14:   R15: 
- ORIG_RAX: 001c  CS: 0033  SS: 002b
+ RIP: 7fe7ca2cc647  RSP: 7fe7be9febf0  RFLAGS: 0293
+ RAX: 001c  RBX: 8173196d  RCX: 
+ RDX: 0004  RSI: 007fb000  RDI: 7fe7be1ff000
+ RBP:    R8:    R9: 7fe7d1cd2738
+ R10: 7fe7d1f2dbd0  R11: 0206  R12: 7fe7be9ff700
+ R13: 7fe7be9ff9c0  R14:   R15: 
+ ORIG_RAX: 001c  CS: 0033  SS: 002b
+ 
+ [Workaround]
+ 
+ In order to avoid this issue, the workload needs to be pinned to CPUs
+ such that the function always executes locally. For the nested VM case,
+ this means that the L1 VM needs to have all vCPUs pinned to a unique CPU.
+ This can be accomplished with the following (for 2 vCPUs):
+ 
+ virsh vcpupin  0 0
+ virsh vcpupin  1 1
  
  
  [Test Case]
  - Deplo

[Kernel-packages] [Bug 1413540] Re: Trusty soft lockup issues with nested KVM

2015-03-25 Thread Chris J Arges
I've added instructions for a workaround. The code paths I've seen in
crashes have been the following:

kvm_sched_in
 -> kvm_arch_vcpu_load
  -> vmx_vcpu_load
   -> loaded_vmcs_clear
-> smp_call_function_single

pmdp_clear_flush
 -> flush_tlb_mm_range
  -> native_flush_tlb_others
-> smp_call_function_many

Generally this has been caused by workloads that use nested VMs and
stress the L2/L1 VMs (causing non-local CPU TLB flushing or VMCS clearing).

The hang is in csd_lock_wait, waiting for the CSD_FLAG_LOCK bit to be
cleared, which can only be triggered by non-local smp_call_function_*
calls.

Another data point is that this can happen with x2apic as well as flat
apic (as tested with nox2apic).

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1413540

Title:
  Trusty soft lockup issues with nested KVM

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  [Impact]
  Certain workloads that need to execute functions on a non-local CPU using 
smp_call_function_* can result in soft lockups with the following backtrace:

  PID: 22262  TASK: 8804274bb000  CPU: 1   COMMAND: "qemu-system-x86"
   #0 [88043fd03d18] machine_kexec at 8104ac02
   #1 [88043fd03d68] crash_kexec at 810e7203
   #2 [88043fd03e30] panic at 81719ff4
   #3 [88043fd03ea8] watchdog_timer_fn at 8110d7c5
   #4 [88043fd03ed8] __run_hrtimer at 8108e787
   #5 [88043fd03f18] hrtimer_interrupt at 8108ef4f
   #6 [88043fd03f80] local_apic_timer_interrupt at 81043537
   #7 [88043fd03f98] smp_apic_timer_interrupt at 81733d4f
   #8 [88043fd03fb0] apic_timer_interrupt at 817326dd
  ---  ---
   #9 [880426f0d958] apic_timer_interrupt at 817326dd
  [exception RIP: generic_exec_single+130]
  RIP: 810dbe62  RSP: 880426f0da00  RFLAGS: 0202
  RAX: 0002  RBX: 880426f0d9d0  RCX: 0001
  RDX: 8180ad60  RSI:   RDI: 0286
  RBP: 880426f0da30   R8: 8180ad48   R9: 88042713bc68
  R10: 7fe7d1f2dbd0  R11: 0206  R12: 8804274bb000
  R13:   R14: 880407670280  R15: 
  ORIG_RAX: ff10  CS: 0010  SS: 0018
  #10 [880426f0da38] smp_call_function_single at 810dbf75
  #11 [880426f0dab0] smp_call_function_many at 810dc3a6
  #12 [880426f0db10] native_flush_tlb_others at 8105c8f7
  #13 [880426f0db38] flush_tlb_mm_range at 8105c9cb
  #14 [880426f0db68] pmdp_splitting_flush at 8105b80d
  #15 [880426f0db88] __split_huge_page at 811ac90b
  #16 [880426f0dc20] split_huge_page_to_list at 811acfb8
  #17 [880426f0dc48] __split_huge_page_pmd at 811ad956
  #18 [880426f0dcc8] unmap_page_range at 8117728d
  #19 [880426f0dda0] unmap_single_vma at 81177341
  #20 [880426f0ddd8] zap_page_range at 811784cd
  #21 [880426f0de90] sys_madvise at 81174fbf
  #22 [880426f0df80] system_call_fastpath at 8173196d
  RIP: 7fe7ca2cc647  RSP: 7fe7be9febf0  RFLAGS: 0293
  RAX: 001c  RBX: 8173196d  RCX: 
  RDX: 0004  RSI: 007fb000  RDI: 7fe7be1ff000
  RBP:    R8:    R9: 7fe7d1cd2738
  R10: 7fe7d1f2dbd0  R11: 0206  R12: 7fe7be9ff700
  R13: 7fe7be9ff9c0  R14:   R15: 
  ORIG_RAX: 001c  CS: 0033  SS: 002b

  [Workaround]

  In order to avoid this issue, the workload needs to be pinned to CPUs
  such that the function always executes locally. For the nested VM
  case, this means the the L1 VM needs to have all vCPUs pinned to a
  unique CPU. This can be accomplished with the following (for 2 vCPUs):

  virsh vcpupin  0 0
  virsh vcpupin  1 1

  
  [Test Case]
  - Deploy openstack on openstack
  - Run tempest on L1 cloud
  - Check kernel log of L1 nova-compute nodes

  (Although this may not necessarily be related to nested KVM)
  Potentially related: https://lkml.org/lkml/2014/11/14/656

  --

  Original Description:

  When installing qemu-kvm on a VM, KSM is enabled.

  I have encountered this problem in trusty:$ lsb_release -a
  Distributor ID: Ubuntu
  Description:Ubuntu 14.04.1 LTS
  Release:14.04
  Codename:   trusty
  $ uname -a
  Linux juju-gema-machine-2 3.13.0-40-generic #69-Ubuntu SMP Thu Nov 13 
17:53:56 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

  The way to see the behaviour:
  1) $ more /sys/kernel/mm/ksm/run
  0
  2) $ sudo apt-get install qemu-kvm
  3) $ more /sys/kernel/mm/ksm/run
  1

  To see the soft lockups, deploy a cloud on a virtualised env like ctsstack, 
run tempest on it, the compute nodes of the virt

[Kernel-packages] [Bug 1413540] Re: Trusty soft lockup issues with nested KVM

2015-03-25 Thread Stefan Bader
Hrmn... When I repeated the setup I seem to have triggered some kind of
lockup even while bringing up L2. Of course it is hard to say without details
of Ryan's dump. However, mine seems to have backtraces in the log which
remind me an awful lot of an issue related to punching holes into ext4-based
qcow images. Chris had been working on something like this before... He is
on a sprint this week. Anyway, the trace from my log:

[ 1200.288031] INFO: task qemu-system-x86:4545 blocked for more than 120 
seconds.
[ 1200.288712]   Not tainted 3.13.0-46-generic #77-Ubuntu
[ 1200.289204] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[ 1200.289892] qemu-system-x86 D 88007fc134c0 0  4545  1 0x
[ 1200.289895]  88007a9c5d28 0082 88007bbd3000 
88007a9c5fd8
[ 1200.289897]  000134c0 000134c0 88007bbd3000 
88007fc13d58
[ 1200.289898]  88007ffcdee8 0002 8114eef0 
88007a9c5da0
[ 1200.289900] Call Trace:
[ 1200.289906]  [] ? wait_on_page_read+0x60/0x60
[ 1200.289909]  [] io_schedule+0x9d/0x140
[ 1200.289910]  [] sleep_on_page+0xe/0x20
[ 1200.289912]  [] __wait_on_bit+0x62/0x90
[ 1200.289914]  [] wait_on_page_bit+0x7f/0x90
[ 1200.289917]  [] ? autoremove_wake_function+0x40/0x40
[ 1200.289919]  [] ? pagevec_lookup_tag+0x21/0x30
[ 1200.289921]  [] filemap_fdatawait_range+0xf9/0x190
[ 1200.289923]  [] filemap_write_and_wait_range+0x3f/0x70
[ 1200.289927]  [] ext4_sync_file+0xba/0x320
[ 1200.289930]  [] do_fsync+0x51/0x80
[ 1200.289931]  [] SyS_fdatasync+0x13/0x20
[ 1200.289933]  [] system_call_fastpath+0x1a/0x1f

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1413540

Title:
  Trusty soft lockup issues with nested KVM

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  [Impact]
  Users of nested KVM for testing openstack have soft lockups as follows:

  PID: 22262  TASK: 8804274bb000  CPU: 1   COMMAND: "qemu-system-x86"
   #0 [88043fd03d18] machine_kexec at 8104ac02
   #1 [88043fd03d68] crash_kexec at 810e7203
   #2 [88043fd03e30] panic at 81719ff4
   #3 [88043fd03ea8] watchdog_timer_fn at 8110d7c5
   #4 [88043fd03ed8] __run_hrtimer at 8108e787
   #5 [88043fd03f18] hrtimer_interrupt at 8108ef4f
   #6 [88043fd03f80] local_apic_timer_interrupt at 81043537
   #7 [88043fd03f98] smp_apic_timer_interrupt at 81733d4f
   #8 [88043fd03fb0] apic_timer_interrupt at 817326dd
  ---  ---
   #9 [880426f0d958] apic_timer_interrupt at 817326dd
  [exception RIP: generic_exec_single+130]
  RIP: 810dbe62  RSP: 880426f0da00  RFLAGS: 0202
  RAX: 0002  RBX: 880426f0d9d0  RCX: 0001
  RDX: 8180ad60  RSI:   RDI: 0286
  RBP: 880426f0da30   R8: 8180ad48   R9: 88042713bc68
  R10: 7fe7d1f2dbd0  R11: 0206  R12: 8804274bb000
  R13:   R14: 880407670280  R15: 
  ORIG_RAX: ff10  CS: 0010  SS: 0018
  #10 [880426f0da38] smp_call_function_single at 810dbf75
  #11 [880426f0dab0] smp_call_function_many at 810dc3a6
  #12 [880426f0db10] native_flush_tlb_others at 8105c8f7
  #13 [880426f0db38] flush_tlb_mm_range at 8105c9cb
  #14 [880426f0db68] pmdp_splitting_flush at 8105b80d
  #15 [880426f0db88] __split_huge_page at 811ac90b
  #16 [880426f0dc20] split_huge_page_to_list at 811acfb8
  #17 [880426f0dc48] __split_huge_page_pmd at 811ad956
  #18 [880426f0dcc8] unmap_page_range at 8117728d
  #19 [880426f0dda0] unmap_single_vma at 81177341
  #20 [880426f0ddd8] zap_page_range at 811784cd
  #21 [880426f0de90] sys_madvise at 81174fbf
  #22 [880426f0df80] system_call_fastpath at 8173196d
  RIP: 7fe7ca2cc647  RSP: 7fe7be9febf0  RFLAGS: 0293
  RAX: 001c  RBX: 8173196d  RCX: 
  RDX: 0004  RSI: 007fb000  RDI: 7fe7be1ff000
  RBP:    R8:    R9: 7fe7d1cd2738
  R10: 7fe7d1f2dbd0  R11: 0206  R12: 7fe7be9ff700
  R13: 7fe7be9ff9c0  R14:   R15: 
  ORIG_RAX: 001c  CS: 0033  SS: 002b

  
  [Test Case]
  - Deploy openstack on openstack
  - Run tempest on L1 cloud
  - Check kernel log of L1 nova-compute nodes

  (Although this may not necessarily be related to nested KVM)
  Potentially related: https://lkml.org/lkml/2014/11/14/656

  --

  Original Description:

  When installing qemu-kvm on a VM, KSM is enabled.

  I have encountered this problem in trusty:$ lsb_release -a
  Distributor I

[Kernel-packages] [Bug 1413540] Re: Trusty soft lockup issues with nested KVM

2015-03-25 Thread Ryan Beisner
@smb - after repeating the test a few times, I too ran out of space with
the default 8GB VM disk size, resulting in a paused VM.  You'll have to
re-create the VMs a little bit differently (--disk ).

ex:
@L0:
sudo uvt-kvm destroy trusty-vm
sudo uvt-kvm create --memory 2048 --disk 40 trusty-vm release=trusty

@L1:
#repeat original repro

ref:
http://manpages.ubuntu.com/manpages/trusty/man1/uvt-kvm.1.html

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1413540

Title:
  Trusty soft lockup issues with nested KVM

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  [Impact]
  Users of nested KVM for testing openstack have soft lockups as follows:

  PID: 22262  TASK: 8804274bb000  CPU: 1   COMMAND: "qemu-system-x86"
   #0 [88043fd03d18] machine_kexec at 8104ac02
   #1 [88043fd03d68] crash_kexec at 810e7203
   #2 [88043fd03e30] panic at 81719ff4
   #3 [88043fd03ea8] watchdog_timer_fn at 8110d7c5
   #4 [88043fd03ed8] __run_hrtimer at 8108e787
   #5 [88043fd03f18] hrtimer_interrupt at 8108ef4f
   #6 [88043fd03f80] local_apic_timer_interrupt at 81043537
   #7 [88043fd03f98] smp_apic_timer_interrupt at 81733d4f
   #8 [88043fd03fb0] apic_timer_interrupt at 817326dd
  ---  ---
   #9 [880426f0d958] apic_timer_interrupt at 817326dd
  [exception RIP: generic_exec_single+130]
  RIP: 810dbe62  RSP: 880426f0da00  RFLAGS: 0202
  RAX: 0002  RBX: 880426f0d9d0  RCX: 0001
  RDX: 8180ad60  RSI:   RDI: 0286
  RBP: 880426f0da30   R8: 8180ad48   R9: 88042713bc68
  R10: 7fe7d1f2dbd0  R11: 0206  R12: 8804274bb000
  R13:   R14: 880407670280  R15: 
  ORIG_RAX: ff10  CS: 0010  SS: 0018
  #10 [880426f0da38] smp_call_function_single at 810dbf75
  #11 [880426f0dab0] smp_call_function_many at 810dc3a6
  #12 [880426f0db10] native_flush_tlb_others at 8105c8f7
  #13 [880426f0db38] flush_tlb_mm_range at 8105c9cb
  #14 [880426f0db68] pmdp_splitting_flush at 8105b80d
  #15 [880426f0db88] __split_huge_page at 811ac90b
  #16 [880426f0dc20] split_huge_page_to_list at 811acfb8
  #17 [880426f0dc48] __split_huge_page_pmd at 811ad956
  #18 [880426f0dcc8] unmap_page_range at 8117728d
  #19 [880426f0dda0] unmap_single_vma at 81177341
  #20 [880426f0ddd8] zap_page_range at 811784cd
  #21 [880426f0de90] sys_madvise at 81174fbf
  #22 [880426f0df80] system_call_fastpath at 8173196d
  RIP: 7fe7ca2cc647  RSP: 7fe7be9febf0  RFLAGS: 0293
  RAX: 001c  RBX: 8173196d  RCX: 
  RDX: 0004  RSI: 007fb000  RDI: 7fe7be1ff000
  RBP:    R8:    R9: 7fe7d1cd2738
  R10: 7fe7d1f2dbd0  R11: 0206  R12: 7fe7be9ff700
  R13: 7fe7be9ff9c0  R14:   R15: 
  ORIG_RAX: 001c  CS: 0033  SS: 002b

  
  [Test Case]
  - Deploy openstack on openstack
  - Run tempest on L1 cloud
  - Check kernel log of L1 nova-compute nodes

  (Although this may not necessarily be related to nested KVM)
  Potentially related: https://lkml.org/lkml/2014/11/14/656

  --

  Original Description:

  When installing qemu-kvm on a VM, KSM is enabled.

  I have encountered this problem in trusty:
  $ lsb_release -a
  Distributor ID: Ubuntu
  Description:    Ubuntu 14.04.1 LTS
  Release:        14.04
  Codename:       trusty
  $ uname -a
  Linux juju-gema-machine-2 3.13.0-40-generic #69-Ubuntu SMP Thu Nov 13 17:53:56 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

  The way to see the behaviour:
  1) $ more /sys/kernel/mm/ksm/run
  0
  2) $ sudo apt-get install qemu-kvm
  3) $ more /sys/kernel/mm/ksm/run
  1
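
  If KSM is suspected, it can be switched back off at runtime (an
  illustrative check, not part of the original report):
  $ echo 0 | sudo tee /sys/kernel/mm/ksm/run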

  To see the soft lockups, deploy a cloud on a virtualised environment like
  ctsstack and run tempest on it at least twice; the compute nodes of the
  virtualised deployment will eventually stop responding with:
  [24096.072003] BUG: soft lockup - CPU#0 stuck for 23s! [qemu-system-x86:24791]
  [24124.072003] BUG: soft lockup - CPU#0 stuck for 23s! [qemu-system-x86:24791]
  [24152.072002] BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-x86:24791]
  [24180.072003] BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-x86:24791]
  [24208.072004] BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-x86:24791]
  [24236.072004] BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-x86:24791]
  [24264.072003] BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-x86:24791]

  I am not sure whether the problem is that we are enabling KSM on a VM
  or the problem is that nested KSM is not behaving properly. Either way
  I can easily reproduce, please contact me if you need further details.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1413540/+subscriptions

[Kernel-packages] [Bug 1413540] Re: Trusty soft lockup issues with nested KVM

2015-03-25 Thread Stefan Bader
Yeah, will do. Just got distracted and wanted to ensure that the repro
was not accidentally another form of failure path to the out of space
issue.


[Kernel-packages] [Bug 1413540] Re: Trusty soft lockup issues with nested KVM

2015-03-25 Thread Stefan Bader
Hm, following your instructions I rather ran into a situation where the
L2 guest gets paused, likely because L1 runs out of disk space. The
uvtool default is 7G, which I would say the L2 stress run fills as it
grows the L2 qcow image on L1: L1 has to fit the initial cloud image,
plus the snapshot of it for L2, into that same 7G.
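
A quick way to check that on L1 (illustrative commands; the path below is
uvtool's usual image pool location and may differ):

df -h /
sudo du -sh /var/lib/uvtool/libvirt/images/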


[Kernel-packages] [Bug 1413540] Re: Trusty soft lockup issues with nested KVM

2015-03-23 Thread Ryan Beisner
I've collected crash dumps and stored them on an internal Canonical
server, as they are 2 GB+. Feel free to ping me for access.


[Kernel-packages] [Bug 1413540] Re: Trusty soft lockup issues with nested KVM

2015-03-23 Thread Ryan Beisner
A few hours later, those two L0 bare metal host CPUs are still maxed. In
scenarios where L0 is hosting many VMs, such as in a cloud, this bug can
be expected to cause significant performance, consistency and capacity
issues on the host and in the cloud as a whole.


[Kernel-packages] [Bug 1413540] Re: Trusty soft lockup issues with nested KVM

2015-03-23 Thread Ryan Beisner
** Attachment added: "L1-console-log-soft-lockup.png"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1413540/+attachment/4353984/+files/L1-console-log-soft-lockup.png


[Kernel-packages] [Bug 1413540] Re: Trusty soft lockup issues with nested KVM

2015-03-23 Thread Ryan Beisner
** Attachment added: "L0-baremetal-cpu-pegged.png"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1413540/+attachment/4353983/+files/L0-baremetal-cpu-pegged.png


[Kernel-packages] [Bug 1413540] Re: Trusty soft lockup issues with nested KVM

2015-03-23 Thread Ryan Beisner
s/static/sym/  ;-)


[Kernel-packages] [Bug 1413540] Re: Trusty soft lockup issues with nested KVM

2015-03-23 Thread Ryan Beisner
This does not appear to be specific to OpenStack, nor tempest.  I've
reproduced with Trusty on Trusty on Trusty, vanilla qemu/kvm.

Simplified reproducer, with an existing MAAS cluster:

@L0 baremetal:
 - Create a Trusty bare metal host from daily images.
 - sudo apt-get update -y && sudo apt-get -y install uvtool
 - sudo uvt-simplestreams-libvirt sync release=trusty arch=amd64
 - sudo uvt-simplestreams-libvirt query
 - ssh-keygen
 - sudo uvt-kvm create --memory 2048 trusty-vm release=trusty
 - sudo virsh shutdown trusty-vm
 - # edit /etc/libvirt/qemu/trusty-vm.xml to enable serial console dump to a
   file (see the example stanza after this @L0 list)
 - sudo virsh define /etc/libvirt/qemu/trusty-vm.xml
 - sudo virsh start trusty-vm
 - # confirm console output:
 - sudo tailf /tmp/trusty-vm-console.log
 - # take note of the VM's IP:
 - sudo uvt-kvm ip trusty-vm
 - # ssh into the new vm.
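
The XML for the serial-console step above is the usual libvirt file-backed
serial/console stanza; roughly the following goes inside the <devices>
section (illustrative, exact attributes may differ):

  <serial type='file'>
    <source path='/tmp/trusty-vm-console.log'/>
    <target port='0'/>
  </serial>
  <console type='file'>
    <source path='/tmp/trusty-vm-console.log'/>
    <target type='serial' port='0'/>
  </console>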

@L1 "trusty-vm":
 - sudo apt-get update -y && sudo apt-get -y install uvtool
 - sudo uvt-simplestreams-libvirt sync release=trusty arch=amd64
 - sudo uvt-simplestreams-libvirt query
 - ssh-keygen
 - # change .122. to .123. in /etc/libvirt/qemu/networks/default.xml (see the sketch after this @L1 list)
 - # make sure default.xml is static linked inside /etc/libvirt/qemu/networks
 - sudo reboot  # for good measure
 - sudo uvt-kvm create --memory 768 trusty-nest release=trusty
 - # take note of the nested VM's IP
 - sudo uvt-kvm ip trusty-nest
 - # ssh into the new vm.
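
For the two network steps above, something like the following should do
(illustrative commands; libvirt's default network normally uses
192.168.122.0/24 and autostart is handled via a symlink in the autostart
subdirectory):

sudo sed -i 's/192\.168\.122\./192.168.123./g' /etc/libvirt/qemu/networks/default.xml
ls -l /etc/libvirt/qemu/networks/autostart/default.xml  # should point at ../default.xml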

@L2 "trusty-nest":
 - sudo apt-get update && sudo apt-get install stress
 - stress -c 1 -i 1 -m 1 -d 1 -t 600

Now watch the "trusty-vm" console for:  [  496.076004] BUG: soft lockup
- CPU#0 stuck for 23s! [ksmd:36].  It happens to me within a couple of
minutes.  Then, both L1 and L2 become unreachable indefinitely, with two
cores on L0 stuck at 100%.


[Kernel-packages] [Bug 1413540] Re: Trusty soft lockup issues with nested KVM

2015-03-23 Thread Ryan Beisner
Also FYI:  I was not able to reproduce this issue when using Vivid as
the bare metal L0.


[Kernel-packages] [Bug 1413540] Re: Trusty soft lockup issues with nested KVM

2015-03-23 Thread Ryan Beisner
** Summary changed:

- soft lockup issues with nested KVM VMs running tempest
+ Trusty soft lockup issues with nested KVM
