Hello, I lost contact with my Scientific Linux 6.1 KVM host earlier today.
The machine is headless and has no IPMI, so I had to plug a monitor into it. However, there was no life from the monitor, so I pressed the reset button. It seems to me that the networking died. The machine is booted first thing every morning (so the 9:00am start was missed by two minutes!) and the networking error seems to have occurred about 27 minutes after the initial boot. Only two guests are on the machine and they are set to autostart. The messages I found in /var/log/messages are as follows:

Oct 5 09:02:42 kvm-sl6x kernel: ------------[ cut here ]------------
Oct 5 09:02:42 kvm-sl6x kernel: WARNING: at arch/x86/kernel/cpu/mtrr/generic.c:467 generic_get_mtrr+0x11e/0x140() (Not tainted)
Oct 5 09:02:42 kvm-sl6x kernel: Hardware name: ProLiant ML115 G5
Oct 5 09:02:42 kvm-sl6x kernel: mtrr: your BIOS has set up an incorrect mask, fixing it up.
Oct 5 09:02:42 kvm-sl6x kernel: Modules linked in:
Oct 5 09:02:42 kvm-sl6x kernel: Pid: 0, comm: swapper Not tainted 2.6.32-131.12.1.el6.x86_64 #1
Oct 5 09:02:42 kvm-sl6x kernel: Call Trace:
Oct 5 09:02:42 kvm-sl6x kernel: [<ffffffff810670f7>] ? warn_slowpath_common+0x87/0xc0
Oct 5 09:02:42 kvm-sl6x kernel: [<ffffffff810671e6>] ? warn_slowpath_fmt+0x46/0x50
Oct 5 09:02:42 kvm-sl6x kernel: [<ffffffff8102648e>] ? generic_get_mtrr+0x11e/0x140
Oct 5 09:02:42 kvm-sl6x kernel: [<ffffffff81c298dc>] ? mtrr_cleanup+0x8c/0x3fd
Oct 5 09:02:42 kvm-sl6x kernel: [<ffffffff81c28752>] ? get_mtrr_state+0x2ec/0x2fb
Oct 5 09:02:42 kvm-sl6x kernel: [<ffffffff81c28293>] ? mtrr_bp_init+0x1ab/0x1d2
Oct 5 09:02:42 kvm-sl6x kernel: [<ffffffff81c2307f>] ? setup_arch+0x4b8/0xad0
Oct 5 09:02:42 kvm-sl6x kernel: [<ffffffff814da754>] ? printk+0x41/0x45
Oct 5 09:02:42 kvm-sl6x kernel: [<ffffffff81c1dbe7>] ? start_kernel+0xdc/0x429
Oct 5 09:02:42 kvm-sl6x kernel: [<ffffffff81c1d33a>] ? x86_64_start_reservations+0x125/0x129
Oct 5 09:02:42 kvm-sl6x kernel: [<ffffffff81c1d438>] ? x86_64_start_kernel+0xfa/0x109
Oct 5 09:02:42 kvm-sl6x kernel: ---[ end trace a7919e7f17c0a725 ]---
Oct 5 09:29:10 kvm-sl6x kernel: ------------[ cut here ]------------
Oct 5 09:29:10 kvm-sl6x kernel: WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0x26d/0x280() (Tainted: G W ---------------- )
Oct 5 09:29:10 kvm-sl6x kernel: Hardware name: ProLiant ML115 G5
Oct 5 09:29:10 kvm-sl6x kernel: NETDEV WATCHDOG: eth0 (tg3): transmit queue 0 timed out
Oct 5 09:29:10 kvm-sl6x kernel: Modules linked in: ebtable_nat ebtables ipt_REJECT sunrpc bridge stp llc ipt_MASQUERADE iptable_nat nf_nat xt_CHECKSUM iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 vhost_net macvtap macvlan tun kvm_amd kvm tg3 ghes hed amd64_edac_mod edac_core edac_mce_amd k8temp hwmon shpchp sg i2c_nforce2 i2c_core ext4 mbcache jbd2 raid1 sr_mod cdrom sd_mod crc_t10dif ata_generic pata_acpi sata_nv dm_mod [last unloaded: scsi_wait_scan]
Oct 5 09:29:10 kvm-sl6x kernel: Pid: 0, comm: swapper Tainted: G W ---------------- 2.6.32-131.12.1.el6.x86_64 #1
Oct 5 09:29:10 kvm-sl6x kernel: Call Trace:
Oct 5 09:29:10 kvm-sl6x kernel: <IRQ> [<ffffffff810670f7>] ? warn_slowpath_common+0x87/0xc0
Oct 5 09:29:10 kvm-sl6x kernel: [<ffffffff810671e6>] ? warn_slowpath_fmt+0x46/0x50
Oct 5 09:29:10 kvm-sl6x kernel: [<ffffffff8143a39d>] ? dev_watchdog+0x26d/0x280
Oct 5 09:29:10 kvm-sl6x kernel: [<ffffffff810792c5>] ? internal_add_timer+0xb5/0x110
Oct 5 09:29:10 kvm-sl6x kernel: [<ffffffff8143a130>] ? dev_watchdog+0x0/0x280
Oct 5 09:29:10 kvm-sl6x kernel: [<ffffffff81079ef7>] ? run_timer_softirq+0x197/0x340
Oct 5 09:29:10 kvm-sl6x kernel: [<ffffffff8109e010>] ? tick_sched_timer+0x0/0xc0
Oct 5 09:29:10 kvm-sl6x kernel: [<ffffffff8102a00d>] ? lapic_next_event+0x1d/0x30
Oct 5 09:29:10 kvm-sl6x kernel: [<ffffffff8106f6e1>] ? __do_softirq+0xc1/0x1d0
Oct 5 09:29:10 kvm-sl6x kernel: [<ffffffff81092cc0>] ? hrtimer_interrupt+0x140/0x250
Oct 5 09:29:10 kvm-sl6x kernel: [<ffffffff8100c2cc>] ? call_softirq+0x1c/0x30
Oct 5 09:29:10 kvm-sl6x kernel: [<ffffffff8100df05>] ? do_softirq+0x65/0xa0
Oct 5 09:29:10 kvm-sl6x kernel: [<ffffffff8106f4c5>] ? irq_exit+0x85/0x90
Oct 5 09:29:10 kvm-sl6x kernel: [<ffffffff814e3030>] ? smp_apic_timer_interrupt+0x70/0x9b
Oct 5 09:29:10 kvm-sl6x kernel: [<ffffffff8100bc93>] ? apic_timer_interrupt+0x13/0x20
Oct 5 09:29:10 kvm-sl6x kernel: <EOI> [<ffffffff8103628b>] ? native_safe_halt+0xb/0x10
Oct 5 09:29:10 kvm-sl6x kernel: [<ffffffff814e06f6>] ? notifier_call_chain+0x16/0x80
Oct 5 09:29:10 kvm-sl6x kernel: [<ffffffff810142ed>] ? default_idle+0x4d/0xb0
Oct 5 09:29:10 kvm-sl6x kernel: [<ffffffff810143b3>] ? c1e_idle+0x63/0x120
Oct 5 09:29:10 kvm-sl6x kernel: [<ffffffff81009e86>] ? cpu_idle+0xb6/0x110
Oct 5 09:29:10 kvm-sl6x kernel: [<ffffffff814d438a>] ? start_secondary+0x202/0x245
Oct 5 09:29:10 kvm-sl6x kernel: ---[ end trace a7919e7f17c0a727 ]---
Oct 5 09:29:10 kvm-sl6x kernel: tg3 0000:11:00.0: eth0: transmit timed out, resetting
Oct 5 09:29:10 kvm-sl6x kernel: tg3 0000:11:00.0: eth0: DEBUG: MAC_TX_STATUS[ffffffff] MAC_RX_STATUS[ffffffff]
Oct 5 09:29:10 kvm-sl6x kernel: tg3 0000:11:00.0: eth0: DEBUG: RDMAC_STATUS[ffffffff] WDMAC_STATUS[ffffffff]
Oct 5 09:29:10 kvm-sl6x kernel: tg3 0000:11:00.0: tg3_stop_block timed out, ofs=2c00 enable_bit=2
Oct 5 09:29:10 kvm-sl6x kernel: tg3 0000:11:00.0: tg3_stop_block timed out, ofs=2000 enable_bit=2
Oct 5 09:29:10 kvm-sl6x kernel: tg3 0000:11:00.0: tg3_stop_block timed out, ofs=2400 enable_bit=2
Oct 5 09:29:10 kvm-sl6x kernel: tg3 0000:11:00.0: tg3_stop_block timed out, ofs=2800 enable_bit=2
Oct 5 09:29:10 kvm-sl6x kernel: tg3 0000:11:00.0: tg3_stop_block timed out, ofs=3000 enable_bit=2
Oct 5 09:29:10 kvm-sl6x kernel: tg3 0000:11:00.0: tg3_stop_block timed out, ofs=1400 enable_bit=2
Oct 5 09:29:11 kvm-sl6x kernel: tg3 0000:11:00.0: tg3_stop_block timed out, ofs=1800 enable_bit=2
Oct 5 09:29:11 kvm-sl6x kernel: tg3 0000:11:00.0: tg3_stop_block timed out, ofs=c00 enable_bit=2
Oct 5 09:29:11 kvm-sl6x kernel: tg3 0000:11:00.0: tg3_stop_block timed out, ofs=4800 enable_bit=2
Oct 5 09:29:11 kvm-sl6x kernel: tg3 0000:11:00.0: tg3_stop_block timed out, ofs=1000 enable_bit=2
Oct 5 09:29:11 kvm-sl6x kernel: tg3 0000:11:00.0: tg3_stop_block timed out, ofs=1c00 enable_bit=2
Oct 5 09:29:11 kvm-sl6x kernel: tg3 0000:11:00.0: tg3_abort_hw timed out, TX_MODE_ENABLE will not clear MAC_TX_MODE=ffffffff
Oct 5 09:29:11 kvm-sl6x kernel: tg3 0000:11:00.0: tg3_stop_block timed out, ofs=3c00 enable_bit=2
Oct 5 09:29:11 kvm-sl6x kernel: tg3 0000:11:00.0: tg3_stop_block timed out, ofs=4c00 enable_bit=2
Oct 5 09:29:12 kvm-sl6x kernel: tg3 0000:11:00.0: eth0: No firmware running
Oct 5 09:29:14 kvm-sl6x kernel: tg3 0000:11:00.0: tg3_abort_hw timed out, TX_MODE_ENABLE will not clear MAC_TX_MODE=ffffffff
Oct 5 09:29:26 kvm-sl6x kernel: tg3 0000:11:00.0: eth0: Link is down
Oct 5 09:29:26 kvm-sl6x kernel: Clocksource tsc unstable (delta = 4398046088828 ns)
Oct 5 09:29:26 kvm-sl6x kernel: br0: port 1(eth0) entering disabled state

Does anyone have any ideas about these errors?

James
