Starting to see a few errors listed as:

Nov 24 23:10:09 proliant02 kernel: hrtimer: interrupt took 9285 ns
Nov 20 19:10:43 proliant01 kernel: hrtimer: interrupt took 115866 ns

on 2 servers in my 4-server cluster. Prior to this, late on the evening of the 23rd, it must have been bad enough to generate several errors, all containing:

Nov 23 19:20:39 proliant02 kernel: [<ffffffff810a44d4>] ? hrtimer_start_range_ns+0x14/0x20

and it ended up with all guests offline, although the server itself was still running. The messages surrounding these are listed below. I'm asking for a nudge in the right direction: a bad guest? a bug in KVM? I've even read something about a possible CPU voltage issue. Thanks for your input/feedback.

Nov 23 19:20:15 proliant02 kernel: BUG: soft lockup - CPU#15 stuck for 67s! [kvm:679003]
Nov 23 19:20:15 proliant02 kernel: Modules linked in: ip_set vzethdev vznetdev pio_nfs pio_direct pfmt_raw pfmt_ploop1 ploop simfs vzrst nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 vzcpt nf_conntrack vzdquota vzmon vzdev ip6t_REJECT ip6table_mangle ip6table_filter ip6_tables xt_length xt_hl xt_tcpmss xt_TCPMSS iptable_mangle iptable_filter xt_multiport xt_limit xt_dscp ipt_REJECT ip_tables vhost_net tun macvtap macvlan nfnetlink_log kvm_amd nfnetlink kvm dlm configfs vzevent ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nfsd nfs nfs_acl auth_rpcgss fscache lockd sunrpc bonding 8021q garp ipv6 fuse snd_pcsp snd_pcm snd_page_alloc snd_timer snd soundcore i2c_piix4 amd64_edac_mod k10temp fam15h_power edac_mce_amd serio_raw edac_core i2c_core hpilo hpwdt shpchp power_meter ext3 mbcache jbd sg ata_generic pata_acpi tg3 ahci ptp pata_atiixp pps_core hpsa [last unloaded: scsi_wait_scan]
Nov 23 19:20:15 proliant02 kernel: CPU 15
Nov 23 19:20:15 proliant02 kernel: Modules linked in: ip_set vzethdev vznetdev pio_nfs pio_direct pfmt_raw pfmt_ploop1 ploop simfs vzrst nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 vzcpt nf_conntrack vzdquota vzmon vzdev ip6t_REJECT ip6table_mangle ip6table_filter ip6_tables xt_length xt_hl xt_tcpmss xt_TCPMSS iptable_mangle iptable_filter xt_multiport xt_limit xt_dscp ipt_REJECT ip_tables vhost_net tun macvtap macvlan nfnetlink_log kvm_amd nfnetlink kvm dlm configfs vzevent ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nfsd nfs nfs_acl auth_rpcgss fscache lockd sunrpc bonding 8021q garp ipv6 fuse snd_pcsp snd_pcm snd_page_alloc snd_timer snd soundcore i2c_piix4 amd64_edac_mod k10temp fam15h_power edac_mce_amd serio_raw edac_core i2c_core hpilo hpwdt shpchp power_meter ext3 mbcache jbd sg ata_generic pata_acpi tg3 ahci ptp pata_atiixp pps_core hpsa [last unloaded: scsi_wait_scan]
Nov 23 19:20:15 proliant02 kernel:
Nov 23 19:20:15 proliant02 kernel: Pid: 679003, comm: kvm veid: 0 Not tainted 2.6.32-34-pve #1 042stab094_7 HP ProLiant DL385p Gen8
Nov 23 19:20:15 proliant02 kernel: RIP: 0010:[<ffffffff8155d21e>] [<ffffffff8155d21e>] _spin_lock+0x1e/0x30
Nov 23 19:20:15 proliant02 kernel: RSP: 0018:ffff880d4009da08 EFLAGS: 00000297
Nov 23 19:20:15 proliant02 kernel: RAX: 0000000000006351 RBX: ffff880d4009da08 RCX: ffff880cf125ec30
Nov 23 19:20:15 proliant02 kernel: RDX: 0000000000006350 RSI: ffff8800000005d0 RDI: ffffea0031fdb650
Nov 23 19:20:15 proliant02 kernel: RBP: ffffffff8100bcce R08: ffff880ba7b1f6b8 R09: ffff880d4009dbc0
Nov 23 19:20:15 proliant02 kernel: R10: ffff880d4009dbc0 R11: 0000000000000005 R12: ffff88074c054aa8
Nov 23 19:20:15 proliant02 kernel: R13: ffff88074c054ac0 R14: ffff880cd100bce0 R15: ffff88103cfcfc00
Nov 23 19:20:15 proliant02 kernel: FS: 00007f7452d5e780(0000) GS:ffff880c50640000(0000) knlGS:0000000000000000
Nov 23 19:20:15 proliant02 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 23 19:20:15 proliant02 kernel: CR2: 00007f7452d77000 CR3: 000000090ced3000 CR4: 00000000000407e0
Nov 23 19:20:15 proliant02 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Nov 23 19:20:15 proliant02 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Nov 23 19:20:15 proliant02 kernel: Process kvm (pid: 679003, veid: 0, threadinfo ffff880d4009c000, task ffff880cf125ec30)
Nov 23 19:20:15 proliant02 kernel: Stack:
Nov 23 19:20:15 proliant02 kernel: ffff880d4009da68 ffffffff8116480b ffff880c7f6d95d0 ffff880ba7b1f6b8
Nov 23 19:20:15 proliant02 kernel: <d> 00000000506599c0 ffffea0031fdb650 ffff880d4009da68 0000000000000007
Nov 23 19:20:15 proliant02 kernel: <d> ffff880e448d8080 ffff880cf125ec30 0000000000000000 ffff880a35cb0200
Nov 23 19:20:15 proliant02 kernel: Call Trace:
Nov 23 19:20:15 proliant02 kernel: [<ffffffff8116480b>] ? follow_page+0x24b/0x4b0
Nov 23 19:20:15 proliant02 kernel: [<ffffffff81166d0d>] ? __get_user_pages+0x10d/0x3d0
Nov 23 19:20:15 proliant02 kernel: [<ffffffff8116708a>] ? get_user_pages+0x5a/0x60
Nov 23 19:20:15 proliant02 kernel: [<ffffffff8104c9d7>] ? get_user_pages_fast+0xa7/0x190
Nov 23 19:20:15 proliant02 kernel: [<ffffffffa0454f93>] ? hva_to_pfn+0x33/0x170 [kvm]
Nov 23 19:20:15 proliant02 kernel: [<ffffffff8155c766>] ? down_read+0x16/0x2b
Nov 23 19:20:15 proliant02 kernel: [<ffffffffa046db5b>] ? mapping_level+0x14b/0x1b0 [kvm]
Nov 23 19:20:15 proliant02 kernel: [<ffffffffa0474994>] ? tdp_page_fault+0x74/0x150 [kvm]
Nov 23 19:20:15 proliant02 kernel: [<ffffffffa046f688>] ? kvm_mmu_page_fault+0x28/0xd0 [kvm]
Nov 23 19:20:15 proliant02 kernel: [<ffffffffa04bd0af>] ? pf_interception+0x7f/0xe0 [kvm_amd]
Nov 23 19:20:15 proliant02 kernel: [<ffffffffa04c0f8e>] ? handle_exit+0x1be/0x3c0 [kvm_amd]
Nov 23 19:20:15 proliant02 kernel: [<ffffffffa0469ee0>] ? kvm_arch_vcpu_ioctl_run+0x3c0/0xf60 [kvm]
Nov 23 19:20:15 proliant02 kernel: [<ffffffffa04508e3>] ? kvm_vcpu_ioctl+0x2e3/0x580 [kvm]
Nov 23 19:20:15 proliant02 kernel: [<ffffffff810c18b9>] ? do_futex+0x159/0xb00
Nov 23 19:20:15 proliant02 kernel: [<ffffffff81050ee5>] ? __wake_up_common+0x55/0x90
Nov 23 19:20:15 proliant02 kernel: [<ffffffff811bcb9a>] ? vfs_ioctl+0x2a/0xa0
Nov 23 19:20:15 proliant02 kernel: [<ffffffff811f6af6>] ? eventfd_write+0xc6/0x1d0
Nov 23 19:20:15 proliant02 kernel: [<ffffffff811bd1ce>] ? do_vfs_ioctl+0x7e/0x5a0
Nov 23 19:20:15 proliant02 kernel: [<ffffffff8155a49c>] ? thread_return+0xbc/0x880
Nov 23 19:20:15 proliant02 kernel: [<ffffffff810c22ed>] ? sys_futex+0x8d/0x190
Nov 23 19:20:15 proliant02 kernel: [<ffffffff811bd73f>] ? sys_ioctl+0x4f/0x80
Nov 23 19:20:15 proliant02 kernel: [<ffffffff8100b182>] ? system_call_fastpath+0x16/0x1b
Nov 23 19:20:15 proliant02 kernel: Code: 00 00 00 01 74 05 e8 02 5c d3 ff 5d c3 55 48 89 e5 0f 1f 44 00 00 b8 00 00 01 00 f0 0f c1 07 0f b7 d0 c1 e8 10 39 c2 74 0e f3 90 <0f> b7 17 eb f5 83 3f 00 75 f4 eb df 5d c3 0f 1f 40 00 55 48 89
Nov 23 19:20:15 proliant02 kernel: Call Trace:
Nov 23 19:20:15 proliant02 kernel: [<ffffffff8116480b>] ? follow_page+0x24b/0x4b0
Nov 23 19:20:15 proliant02 kernel: [<ffffffff81166d0d>] ? __get_user_pages+0x10d/0x3d0
Nov 23 19:20:15 proliant02 kernel: [<ffffffff8116708a>] ? get_user_pages+0x5a/0x60
Nov 23 19:20:15 proliant02 kernel: [<ffffffff8104c9d7>] ? get_user_pages_fast+0xa7/0x190
Nov 23 19:20:15 proliant02 kernel: [<ffffffffa0454f93>] ? hva_to_pfn+0x33/0x170 [kvm]
Nov 23 19:20:15 proliant02 kernel: [<ffffffff8155c766>] ? down_read+0x16/0x2b
Nov 23 19:20:15 proliant02 kernel: [<ffffffffa046db5b>] ? mapping_level+0x14b/0x1b0 [kvm]
Nov 23 19:20:15 proliant02 kernel: [<ffffffffa0474994>] ? tdp_page_fault+0x74/0x150 [kvm]
Nov 23 19:20:15 proliant02 kernel: [<ffffffffa046f688>] ? kvm_mmu_page_fault+0x28/0xd0 [kvm]
Nov 23 19:20:15 proliant02 kernel: [<ffffffffa04bd0af>] ? pf_interception+0x7f/0xe0 [kvm_amd]
Nov 23 19:20:15 proliant02 kernel: [<ffffffffa04c0f8e>] ? handle_exit+0x1be/0x3c0 [kvm_amd]
Nov 23 19:20:15 proliant02 kernel: [<ffffffffa0469ee0>] ? kvm_arch_vcpu_ioctl_run+0x3c0/0xf60 [kvm]
Nov 23 19:20:15 proliant02 kernel: [<ffffffffa04508e3>] ? kvm_vcpu_ioctl+0x2e3/0x580 [kvm]
Nov 23 19:20:15 proliant02 kernel: [<ffffffff810c18b9>] ? do_futex+0x159/0xb00
Nov 23 19:20:15 proliant02 kernel: [<ffffffff81050ee5>] ? __wake_up_common+0x55/0x90
Nov 23 19:20:15 proliant02 kernel: [<ffffffff811bcb9a>] ? vfs_ioctl+0x2a/0xa0
Nov 23 19:20:15 proliant02 kernel: [<ffffffff811f6af6>] ? eventfd_write+0xc6/0x1d0
Nov 23 19:20:15 proliant02 kernel: [<ffffffff811bd1ce>] ? do_vfs_ioctl+0x7e/0x5a0
Nov 23 19:20:15 proliant02 kernel: [<ffffffff8155a49c>] ? thread_return+0xbc/0x880
Nov 23 19:20:15 proliant02 kernel: [<ffffffff810c22ed>] ? sys_futex+0x8d/0x190
Nov 23 19:20:15 proliant02 kernel: [<ffffffff811bd73f>] ? sys_ioctl+0x4f/0x80
Nov 23 19:20:15 proliant02 kernel: [<ffffffff8100b182>] ? system_call_fastpath+0x16/0x1b

The server shows the following package versions:

proxmox-ve-2.6.32: 3.3-139 (running kernel: 2.6.32-34-pve)
pve-manager: 3.3-5 (running version: 3.3-5/bfebec03)
pve-kernel-2.6.32-32-pve: 2.6.32-136
pve-kernel-2.6.32-27-pve: 2.6.32-121
pve-kernel-2.6.32-33-pve: 2.6.32-138
pve-kernel-2.6.32-34-pve: 2.6.32-139
pve-kernel-2.6.32-31-pve: 2.6.32-132
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-1
pve-cluster: 3.0-15
qemu-server: 3.3-3
pve-firmware: 1.1-3
libpve-common-perl: 3.0-19
libpve-access-control: 3.0-15
libpve-storage-perl: 3.0-25
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.1-10
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1

_______________________________________________
pve-user mailing list
[email protected]
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
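P.S. In case it helps anyone reproduce what I'm seeing: this is roughly how I'm tallying these events per node. A minimal sketch only — the sample lines are pasted from the excerpts above, and /var/log/messages is an assumed path (adjust for your syslog setup, or feed it "journalctl -k" output on systemd hosts).

```shell
#!/bin/sh
# Tally hrtimer / soft-lockup events per hostname from syslog-style lines.
# The sample file below holds lines pasted from my logs; on a real node,
# grep /var/log/messages* instead (assumed path - adjust to your setup).
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
Nov 24 23:10:09 proliant02 kernel: hrtimer: interrupt took 9285 ns
Nov 20 19:10:43 proliant01 kernel: hrtimer: interrupt took 115866 ns
Nov 23 19:20:15 proliant02 kernel: BUG: soft lockup - CPU#15 stuck for 67s! [kvm:679003]
EOF
# Field 4 of the "MMM DD HH:MM:SS host ..." syslog format is the hostname.
grep -hE 'hrtimer: interrupt took|soft lockup' "$LOG" \
  | awk '{print $4}' | sort | uniq -c | sort -rn
rm -f "$LOG"
```

That gives a per-host count, which is how I know only 2 of the 4 nodes are affected.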
