Hi Valerio,

On 7 August 2012 11:17, Valerio Schiavoni <[email protected]> wrote:
> Dear all,
> one of the hosts in my cluster is reported to be in state ERROR.
> I've looked through its sys log messages, and I found this:
>
> [ 1.847465] WARNING: at /build/buildd/linux-3.2.0/kernel/watchdog.c:241 watchdog_overflow_callback+0x9a/0xc0()
> [ 1.847465] Hardware name: SE1102
> [ 1.847465] Watchdog detected hard LOCKUP on cpu 7
> [ 1.847465] Modules linked in:
> [ 1.847465] Pid: 1, comm: swapper/0 Not tainted 3.2.0-27-generic #43-Ubuntu
> [ 1.847465] Call Trace:
> [ 1.847465] <NMI> [<ffffffff8106729f>] warn_slowpath_common+0x7f/0xc0
> [ 1.847465] [<ffffffff81067396>] warn_slowpath_fmt+0x46/0x50
> [ 1.847465] [<ffffffff810d837a>] watchdog_overflow_callback+0x9a/0xc0
> [ 1.847465] [<ffffffff81111bb6>] __perf_event_overflow+0x96/0x1e0
> [ 1.847465] [<ffffffff8110f141>] ? perf_event_update_userpage+0x11/0xc0
> [ 1.847465] [<ffffffff81023aaa>] ? x86_perf_event_set_period+0xda/0x150
> [ 1.847465] [<ffffffff81112104>] perf_event_overflow+0x14/0x20
> [ 1.847465] [<ffffffff810281c3>] intel_pmu_handle_irq+0x163/0x210
> [ 1.847465] [<ffffffff8139c350>] ? ghes_read_estatus+0x90/0x180
> [ 1.847465] [<ffffffff8165b7c1>] perf_event_nmi_handler+0x21/0x30
> [ 1.847465] [<ffffffff8165b089>] default_do_nmi+0x69/0x220
> [ 1.847465] [<ffffffff8165b2c0>] do_nmi+0x80/0x90
> [ 1.847465] [<ffffffff8165a6b0>] nmi+0x20/0x30
> [ 1.847465] [<ffffffff814c84c2>] ? __i8042_command.part.1+0xd2/0x200
> [ 1.847465] <<EOE>> [<ffffffff814c8630>] __i8042_command+0x40/0x60
> [ 1.847465] [<ffffffff814c8689>] i8042_command+0x39/0x60
> [ 1.847465] [<ffffffff81d3447e>] i8042_check_aux+0x24/0x201
> [ 1.847465] [<ffffffff81d34cb1>] i8042_setup_aux+0x16/0x12e
> [ 1.847465] [<ffffffff81d34dee>] i8042_probe.part.10+0x25/0xbd
> [ 1.847465] [<ffffffff81d34eb1>] i8042_probe+0x2b/0x2d
> [ 1.847465] [<ffffffff813f6e57>] platform_drv_probe+0x17/0x20
> [ 1.847465] [<ffffffff813f56b8>] really_probe+0x68/0x190
> [ 1.847465] [<ffffffff813f5945>] driver_probe_device+0x45/0x70
> [ 1.847465] [<ffffffff813f5a1b>] __driver_attach+0xab/0xb0
> [ 1.847465] [<ffffffff813f5970>] ? driver_probe_device+0x70/0x70
> [ 1.847465] [<ffffffff813f5970>] ? driver_probe_device+0x70/0x70
> [ 1.847465] [<ffffffff813f47ac>] bus_for_each_dev+0x5c/0x90
> [ 1.847465] [<ffffffff813f547e>] driver_attach+0x1e/0x20
> [ 1.847465] [<ffffffff813f50d0>] bus_add_driver+0x1a0/0x270
> [ 1.847465] [<ffffffff813f5f86>] driver_register+0x76/0x140
> [ 1.847465] [<ffffffff813f7416>] platform_driver_register+0x46/0x50
> [ 1.847465] [<ffffffff813f7448>] platform_driver_probe+0x28/0xb0
> [ 1.847465] [<ffffffff813f7be1>] platform_create_bundle+0xc1/0xf0
> [ 1.847465] [<ffffffff81d34e86>] ? i8042_probe.part.10+0xbd/0xbd
> [ 1.847465] [<ffffffff81d349eb>] ? i8042_platform_init+0xb1/0xb1
> [ 1.847465] [<ffffffff81d34a47>] i8042_init+0x5c/0x80
> [ 1.847465] [<ffffffff81002040>] do_one_initcall+0x40/0x180
> [ 1.847465] [<ffffffff81cfbce9>] kernel_init+0xd9/0x158
> [ 1.847465] [<ffffffff816643f4>] kernel_thread_helper+0x4/0x10
> [ 1.847465] [<ffffffff81cfbc10>] ? start_kernel+0x3bd/0x3bd
> [ 1.847465] [<ffffffff816643f0>] ? gs_change+0x13/0x13
> [ 1.847465] ---[ end trace ffc3668eca9f076b ]---
> ...
> [ 60.721539] init: failsafe main process (734) killed by TERM signal
> [ 60.756219] type=1400 audit(1343661831.840:8): apparmor="STATUS" operation="profile_replace" name="/sbin/dhclient" pid=877 comm="apparmor_parser"
> [ 60.756574] type=1400 audit(1343661831.840:9): apparmor="STATUS" operation="profile_replace" name="/usr/lib/NetworkManager/nm-dhcp-client.action" pid=877 comm="apparmor_parser"
> [ 60.756777] type=1400 audit(1343661831.840:10): apparmor="STATUS" operation="profile_replace" name="/usr/lib/connman/scripts/dhclient-script" pid=877 comm="apparmor_parser"
> [ 60.762495] type=1400 audit(1343661831.844:11): apparmor="STATUS" operation="profile_load" name="/usr/sbin/libvirtd" pid=879 comm="apparmor_parser"
> [ 60.772471] type=1400 audit(1343661831.856:12): apparmor="STATUS" operation="profile_load" name="/usr/sbin/tcpdump" pid=881 comm="apparmor_parser"
> [ 60.772800] type=1400 audit(1343661831.856:13): apparmor="STATUS" operation="profile_load" name="/usr/lib/libvirt/virt-aa-helper" pid=878 comm="apparmor_parser"
> [ 60.837322] init: libvirt-bin main process (957) terminated with status 6
> [ 60.837345] init: libvirt-bin main process ended, respawning
> [ 60.850389] init: libvirt-bin main process (973) terminated with status 6
> [ 60.850411] init: libvirt-bin main process ended, respawning
> [ 60.863761] init: libvirt-bin main process (989) terminated with status 6
> [ 60.863783] init: libvirt-bin main process ended, respawning
> [ 60.890572] init: libvirt-bin main process (1022) terminated with status 6
> [ 60.890597] init: libvirt-bin main process ended, respawning
> [ 60.904526] init: libvirt-bin main process (1042) terminated with status 6
> [ 60.904550] init: libvirt-bin main process ended, respawning
> [ 60.917624] init: libvirt-bin main process (1058) terminated with status 6
> [ 60.917647] init: libvirt-bin main process ended, respawning
> [ 60.931100] init: libvirt-bin main process (1074) terminated with status 6
> [ 60.931123] init: libvirt-bin main process ended, respawning
> [ 60.944178] init: libvirt-bin main process (1090) terminated with status 6
> [ 60.944200] init: libvirt-bin main process ended, respawning
> [ 60.963068] init: libvirt-bin main process (1106) terminated with status 6
> [ 60.963101] init: libvirt-bin main process ended, respawning
> [ 60.975641] init: libvirt-bin main process (1122) terminated with status 6
> [ 60.975674] init: libvirt-bin main process ended, respawning
> [ 60.988176] init: libvirt-bin main process (1138) terminated with status 6
> [ 60.988203] init: libvirt-bin respawning too fast, stopped
>
>
> Do you believe the first WARNING message is responsible for the libvirt
> error somehow?
> The machine runs Ubuntu 12.04 LTS, 3.2.0-27-generic #43-Ubuntu SMP Fri Jul 6 14:25:57 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
>
> Any suggestion is highly appreciated.

Could you check whether there are any error messages in oned.log or in the
output of "onehost show ID -x"?

Cheers

> Best regards,
> Valerio
>
> _______________________________________________
> Users mailing list
> [email protected]
> http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
>

--
Daniel Molina
Project Engineer
OpenNebula - The Open Source Solution for Data Center Virtualization
www.OpenNebula.org | [email protected] | @OpenNebula
