Public bug reported:

[Impact]
It has been brought to my attention that VMware guests[1] randomly crash after 
the VMs were moved from a VMware 5.5 environment to a VMware 6.5 environment.

The crashes started after the above move (5.5->6.5).

Notes:
* The crashes were not present in VMware 5.5 (with the same VMs); they only 
started happening with VMware 6.5.
* The Trusty HWE kernel (Ubuntu-4.4.0-X) does not exhibit the problem on 
VMware 6.5.

Here's the stack trace, taken from the .vmss file after converting it into a
format readable by a Linux debugger:

[17007961.187411] BUG: scheduling while atomic: swapper/3/0/0x00000100
[17007961.189794] Modules linked in: arc4 md4 nls_utf8 cifs nfsv3 nfs_acl nfsv4 
nfs lockd sunrpc fscache veth ipt_MASQUERADE nf_conntrack_netlink nfnetlink 
xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 
xt_addrtype iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack 
bridge stp llc aufs vmw_vsock_vmci_transport vsock ppdev vmwgfx serio_raw 
coretemp ttm drm vmw_balloon vmw_vmci shpchp i2c_piix4 parport_pc mac_hid xfs 
lp libcrc32c parport psmouse floppy vmw_pvscsi vmxnet3 pata_acpi
[17007961.189856] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 3.13.0-135-generic 
#184-Ubuntu
[17007961.189862] Hardware name: VMware, Inc. VMware Virtual Platform/440BX 
Desktop Reference Platform, BIOS 6.00 10/22/2013
[17007961.189867] 0000000000000000 ffff88042f263b90 ffffffff8172d959 
ffff88042f263d30
[17007961.189874] ffff88042f273180 ffff88042f263ba0 ffffffff81726d8c 
ffff88042f263c00
[17007961.189879] ffffffff81731c8f ffff880428c29800 0000000000013180 
ffff880428c25fd8
[17007961.189885] Call Trace:
[17007961.189889] <IRQ> [<ffffffff8172d959>] dump_stack+0x64/0x82
[17007961.189913] [<ffffffff81726d8c>] __schedule_bug+0x4c/0x5a
[17007961.189922] [<ffffffff81731c8f>] __schedule+0x6af/0x7f0
[17007961.189929] [<ffffffff81731df9>] schedule+0x29/0x70
[17007961.189935] [<ffffffff81731049>] schedule_timeout+0x279/0x310
[17007961.189947] [<ffffffff810a357b>] ? select_task_rq_fair+0x56b/0x6f0
[17007961.189955] [<ffffffff810a9852>] ? enqueue_task_fair+0x422/0x6d0
[17007961.189962] [<ffffffff810a0de5>] ? sched_clock_cpu+0xb5/0x100
[17007961.189971] [<ffffffff81732906>] wait_for_completion+0xa6/0x150
[17007961.189977] [<ffffffff8109e2a0>] ? wake_up_state+0x20/0x20
[17007961.189987] [<ffffffff810ccce0>] ? __call_rcu+0x2d0/0x2d0
[17007961.189993] [<ffffffff810ca2eb>] wait_rcu_gp+0x4b/0x60
[17007961.189999] [<ffffffff810ca280>] ? 
ftrace_raw_output_rcu_utilization+0x50/0x50
[17007961.190006] [<ffffffff810cc45a>] synchronize_sched+0x3a/0x50
[17007961.190047] [<ffffffffa01a8936>] vmci_event_unsubscribe+0x76/0xb0 
[vmw_vmci]
[17007961.190063] [<ffffffffa01895f1>] vmci_transport_destruct+0x21/0xe0 
[vmw_vsock_vmci_transport]
[17007961.190078] [<ffffffffa017f837>] vsock_sk_destruct+0x17/0x60 [vsock]
[17007961.190087] [<ffffffff8161a9df>] __sk_free+0x1f/0x180
[17007961.190092] [<ffffffff8161ab59>] sk_free+0x19/0x20
[17007961.190102] [<ffffffffa018a2c0>] 
vmci_transport_recv_stream_cb+0x200/0x2f0 [vmw_vsock_vmci_transport]
[17007961.190114] [<ffffffffa01a7efc>] 
vmci_datagram_invoke_guest_handler+0xbc/0xf0 [vmw_vmci]
[17007961.190126] [<ffffffffa01a8dbf>] vmci_dispatch_dgs+0xcf/0x230 [vmw_vmci]
[17007961.190138] [<ffffffff8106f8ee>] tasklet_action+0x11e/0x130
[17007961.190145] [<ffffffff8106fd8c>] __do_softirq+0xfc/0x310
[17007961.190153] [<ffffffff81070315>] irq_exit+0x105/0x110
[17007961.190161] [<ffffffff817407e6>] do_IRQ+0x56/0xc0
[17007961.190170] [<ffffffff81735e6d>] common_interrupt+0x6d/0x6d
[17007961.190173] <EOI> [<ffffffff81051586>] ? native_safe_halt+0x6/0x10
[17007961.190190] [<ffffffff8101db7f>] default_idle+0x1f/0x100
[17007961.190197] [<ffffffff8101e496>] arch_cpu_idle+0x26/0x30
[17007961.190205] [<ffffffff810c2b91>] cpu_startup_entry+0xc1/0x2b0
[17007961.190214] [<ffffffff810427fd>] start_secondary+0x21d/0x2d0
[17007961.190221] bad: scheduling from the idle thread!
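
For context on the failure mode: the trace shows the vsock socket being freed
from a tasklet (vmci_dispatch_dgs -> ... -> sk_free), and the destructor ends
up in synchronize_sched(), which blocks until an RCU grace period completes.
Tasklets run in softirq (atomic) context, where sleeping is forbidden, hence
the "BUG: scheduling while atomic". A minimal, hypothetical module sketch of
that pattern (illustration only, not code from the affected driver) would be:

/*
 * Illustration only: a tasklet handler that calls a sleeping RCU
 * primitive, reproducing the "scheduling while atomic" pattern from
 * the trace above (3.13-era tasklet API). Never do this in real code.
 */
#include <linux/module.h>
#include <linux/interrupt.h>
#include <linux/rcupdate.h>

static void bad_tasklet_fn(unsigned long data)
{
	/*
	 * Tasklets execute in softirq context with preemption disabled.
	 * synchronize_sched() (the call seen in the trace) sleeps until
	 * a grace period elapses, so the scheduler is invoked from
	 * atomic context and the kernel prints the BUG above.
	 */
	synchronize_sched();
}

static DECLARE_TASKLET(bad_tasklet, bad_tasklet_fn, 0);

static int __init bad_init(void)
{
	tasklet_schedule(&bad_tasklet);
	return 0;
}

static void __exit bad_exit(void)
{
	tasklet_kill(&bad_tasklet);
}

module_init(bad_init);
module_exit(bad_exit);
MODULE_LICENSE("GPL");

The upstream series in [2] removes exactly this kind of sleeping call from the
interrupt path.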

[Other info]

I have identified a patch series[2] that appears to fix this exact situation.
The full discussion can be found on Patchwork[3], where the series[2],
authored by VMware, was submitted. A simplified sketch of the approach
appears after the references below.

[1] - VM details:
Release: Trusty
Kernel: Ubuntu-3.13.0-135

[2] - Upstream patch series:
8ab18d7 VSOCK: Detach QP check should filter out non matching QPs.
8566b86 VSOCK: Fix lockdep issue.
4ef7ea9 VSOCK: sock_put wasn't safe to call in interrupt context

[3] - https://patchwork.kernel.org/patch/9948741/
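
Per its title, commit 4ef7ea9 addresses sock_put() being unsafe to call in
interrupt context. The usual way to fix such an issue (and, judging from the
Patchwork discussion, the approach taken here) is to defer the sleeping part
of the destruction to process context via a work item. Below is a simplified,
hypothetical sketch of that deferred-cleanup pattern; all names and structures
are invented for illustration, see the actual series for the real diff:

/*
 * Hypothetical sketch: instead of releasing the transport (which may
 * sleep, e.g. in vmci_event_unsubscribe()) directly from the
 * interrupt-context datagram callback, queue it onto a list and let a
 * work item, running in process context, perform the sleeping part.
 */
#include <linux/workqueue.h>
#include <linux/list.h>
#include <linux/spinlock.h>

static LIST_HEAD(cleanup_list);
static DEFINE_SPINLOCK(cleanup_lock);

struct transport {			/* stand-in for the real transport state */
	struct list_head cleanup_entry;
};

static void transport_cleanup_work(struct work_struct *work)
{
	LIST_HEAD(todo);
	struct transport *t, *tmp;

	/* Detach all pending entries under the lock... */
	spin_lock_bh(&cleanup_lock);
	list_splice_init(&cleanup_list, &todo);
	spin_unlock_bh(&cleanup_lock);

	/* ...then do the sleeping cleanup in process context. */
	list_for_each_entry_safe(t, tmp, &todo, cleanup_entry) {
		list_del(&t->cleanup_entry);
		/* e.g. unsubscribe VMCI events, free resources, ... */
	}
}

static DECLARE_WORK(cleanup_work, transport_cleanup_work);

/* Called from the datagram callback (softirq context). */
static void transport_destruct_deferred(struct transport *t)
{
	spin_lock_bh(&cleanup_lock);
	list_add(&t->cleanup_entry, &cleanup_list);
	spin_unlock_bh(&cleanup_lock);
	schedule_work(&cleanup_work);
}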

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: Fix Released

** Affects: linux (Ubuntu Trusty)
     Importance: Medium
     Assignee: Eric Desrochers (slashd)
         Status: In Progress


** Tags: trusty

https://bugs.launchpad.net/bugs/1780470

Title:
  BUG: scheduling while atomic (kernel: Ubuntu-3.13 + VMware 6.0 and
  later)

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Trusty:
  In Progress
