Hi Jurrien,

I don't see anything in the logs on the nodes themselves. The only thing we see is in the engine log: it loses connectivity to the host. This is definitely CentOS 7.1/7.2 related. We downgraded the hosts to ovirt-iso 3.5, which resolves the issue.
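For anyone who wants to repeat the log check on their own nodes, here is a rough Python sketch that looks for the warning header from the quoted trace. The function name `find_bad_offload_warnings` is made up for illustration, and the match assumes the kernel prints `WARNING` and `skb_warn_bad_offload` on the same line, as in the trace quoted in this thread.

```python
def find_bad_offload_warnings(log_lines):
    """Return the log lines that are skb_warn_bad_offload warning headers.

    Assumption: the header line contains both "WARNING" and
    "skb_warn_bad_offload". The matching call-trace frame lacks
    "WARNING", so it is not counted a second time.
    """
    return [
        line.rstrip("\n")
        for line in log_lines
        if "WARNING" in line and "skb_warn_bad_offload" in line
    ]


if __name__ == "__main__":
    # Two lines taken from the trace quoted in this thread.
    sample = [
        "[1123306.014302] WARNING: at net/core/dev.c:2189 skb_warn_bad_offload+0xcd/0xda()\n",
        "[1123306.014409] [<ffffffff816076c3>] skb_warn_bad_offload+0xcd/0xda\n",
    ]
    for hit in find_bad_offload_warnings(sample):
        print(hit)
```

The function takes any iterable of lines, so it can be fed the saved output of `dmesg` or an open messages log file.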
On Fri, Mar 18, 2016 at 9:01 AM, Bloemen, Jurriën <jurrien.bloe...@dmc.amcnetworks.com> wrote:

> Hi Johan,
>
> Could you check if you see the following in your dmesg or message log file?
>
> [1123306.014288] ------------[ cut here ]------------
> [1123306.014302] WARNING: at net/core/dev.c:2189 skb_warn_bad_offload+0xcd/0xda()
> [1123306.014306] : caps=(0x0000000200004849, 0x0000000000000000) len=330 data_len=276 gso_size=276 gso_type=1 ip_summed=1
> [1123306.014308] Modules linked in: vhost_net macvtap macvlan ip6table_filter ip6_tables iptable_filter ip_tables ebt_arp ebtable_nat ebtables tun scsi_transport_iscsi iTCO_wdt iTCO_vendor_support dm_service_time intel_powerclamp coretemp intel_rapl kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel cryptd pcspkr sb_edac edac_core i2c_i801 lpc_ich mfd_core mei_me mei wmi ioatdma shpchp ipmi_devintf ipmi_si ipmi_msghandler acpi_power_meter acpi_pad 8021q garp mrp bridge stp llc bonding dm_multipath xfs libcrc32c sd_mod crc_t10dif crct10dif_common ast syscopyarea sysfillrect sysimgblt drm_kms_helper ttm crc32c_intel igb drm ahci ixgbe i2c_algo_bit libahci libata mdio i2c_core ptp megaraid_sas pps_core dca dm_mirror dm_region_hash dm_log dm_mod
> [1123306.014360] CPU: 30 PID: 0 Comm: swapper/30 Tainted: G W -------------- 3.10.0-229.1.2.el7.x86_64 #1
> [1123306.014362] Hardware name: Supermicro SYS-2028TP-HC1TR/X10DRT-PT, BIOS 1.1 08/03/2015
> [1123306.014364] ffff881fffc439a8 5326fb90ad1041ea ffff881fffc43960 ffffffff81604afa
> [1123306.014371] ffff881fffc43998 ffffffff8106e34b ffff881fcebb0500 ffff881fce88c000
> [1123306.014376] 0000000000000001 0000000000000001 ffff881fcebb0500 ffff881fffc43a00
> [1123306.014381] Call Trace:
> [1123306.014383] <IRQ> [<ffffffff81604afa>] dump_stack+0x19/0x1b
> [1123306.014396] [<ffffffff8106e34b>] warn_slowpath_common+0x6b/0xb0
> [1123306.014399] [<ffffffff8106e3ec>] warn_slowpath_fmt+0x5c/0x80
> [1123306.014405] [<ffffffff812db093>] ? ___ratelimit+0x93/0x100
> [1123306.014409] [<ffffffff816076c3>] skb_warn_bad_offload+0xcd/0xda
> [1123306.014425] [<ffffffff814fdeb9>] __skb_gso_segment+0x79/0xb0
> [1123306.014429] [<ffffffff814fe1c2>] dev_hard_start_xmit+0x1a2/0x580
> [1123306.014438] [<ffffffffa0168790>] ? deliver_clone+0x50/0x50 [bridge]
> [1123306.014443] [<ffffffff8151df1e>] sch_direct_xmit+0xee/0x1c0
> [1123306.014447] [<ffffffff814fe798>] dev_queue_xmit+0x1f8/0x4a0
> [1123306.014453] [<ffffffffa016880b>] br_dev_queue_push_xmit+0x7b/0xc0 [bridge]
> [1123306.014458] [<ffffffffa0168a22>] br_forward_finish+0x22/0x60 [bridge]
> [1123306.014464] [<ffffffffa0168ae0>] __br_forward+0x80/0xf0 [bridge]
> [1123306.014469] [<ffffffffa0168ebb>] br_forward+0x8b/0xa0 [bridge]
> [1123306.014476] [<ffffffffa0169e65>] br_handle_frame_finish+0x175/0x410 [bridge]
> [1123306.014481] [<ffffffffa016a275>] br_handle_frame+0x175/0x260 [bridge]
> [1123306.014485] [<ffffffff814fc112>] __netif_receive_skb_core+0x282/0x870
> [1123306.014490] [<ffffffff8101b589>] ? read_tsc+0x9/0x10
> [1123306.014493] [<ffffffff814fc718>] __netif_receive_skb+0x18/0x60
> [1123306.014497] [<ffffffff814fc7a0>] netif_receive_skb+0x40/0xd0
> [1123306.014500] [<ffffffff814fd2b0>] napi_gro_receive+0x80/0xb0
> [1123306.014512] [<ffffffffa00cde2c>] ixgbe_clean_rx_irq+0x7ac/0xb30 [ixgbe]
> [1123306.014519] [<ffffffffa00cf07b>] ixgbe_poll+0x4bb/0x930 [ixgbe]
> [1123306.014524] [<ffffffff814fcb62>] net_rx_action+0x152/0x240
> [1123306.014528] [<ffffffff81077bf7>] __do_softirq+0xf7/0x290
> [1123306.014533] [<ffffffff8161635c>] call_softirq+0x1c/0x30
> [1123306.014539] [<ffffffff81015de5>] do_softirq+0x55/0x90
> [1123306.014543] [<ffffffff81077f95>] irq_exit+0x115/0x120
> [1123306.014546] [<ffffffff81616ef8>] do_IRQ+0x58/0xf0
> [1123306.014551] [<ffffffff8160c0ed>] common_interrupt+0x6d/0x6d
> [1123306.014553] <EOI> [<ffffffff814aa6d2>] ? cpuidle_enter_state+0x52/0xc0
> [1123306.014561] [<ffffffff814aa6c8>] ? cpuidle_enter_state+0x48/0xc0
> [1123306.014565] [<ffffffff814aa805>] cpuidle_idle_call+0xc5/0x200
> [1123306.014569] [<ffffffff8101d21e>] arch_cpu_idle+0xe/0x30
> [1123306.014574] [<ffffffff810c6945>] cpu_startup_entry+0xf5/0x290
> [1123306.014580] [<ffffffff810423ca>] start_secondary+0x1ba/0x230
> [1123306.014582] ---[ end trace 4d5a1bc838e1fcc0 ]---
>
> If so, then could you try the following:
>
> ethtool -K <nic name> lro off
>
> Do this for all the 10G Intel NICs and check if the problem still exists.
>
> *Kind regards,*
>
> *Jurriën Bloemen*
>
> On 17-03-16 09:49, Johan Kooijman wrote:
>
> Hi all,
>
> Since we upgraded to the latest oVirt node running 7.2, we're seeing that nodes become unavailable after a while. A node runs fine, with a couple of VMs on it, until it becomes unresponsive. At that moment it doesn't even respond to ICMP. It'll come back by itself after a while, but oVirt fences the machine before that time and restarts the VMs elsewhere.
>
> Engine tells me this message:
>
> VDSM host09 command failed: Message timeout which can be caused by communication issues
>
> Is anyone else experiencing these issues with ixgbe drivers? I'm running on Intel X540-AT2 cards.
>
> --
> Met vriendelijke groeten / With kind regards,
> Johan Kooijman
>
> _______________________________________________
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users

--
Met vriendelijke groeten / With kind regards,
Johan Kooijman
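Jurriën's suggestion to run `ethtool -K <nic name> lro off` on every 10G Intel NIC could be scripted roughly as below. This is only a sketch: the helper names `nics_using_driver` and `lro_off_command` are made up for illustration, and it assumes each NIC exposes its bound driver through the `device/driver` symlink under `/sys/class/net/<nic>`. It only prints the commands (a dry run) so they can be reviewed before being run as root.

```python
import os


def nics_using_driver(driver, sysfs_net="/sys/class/net"):
    """Return the sorted interface names whose bound kernel driver is `driver`."""
    if not os.path.isdir(sysfs_net):
        return []
    nics = []
    for name in sorted(os.listdir(sysfs_net)):
        link = os.path.join(sysfs_net, name, "device", "driver")
        # Virtual interfaces (bridges, bonds, VLANs, lo) have no device/driver link.
        if os.path.islink(link) and os.path.basename(os.path.realpath(link)) == driver:
            nics.append(name)
    return nics


def lro_off_command(nic):
    """Build the argument list for the command suggested in the thread."""
    return ["ethtool", "-K", nic, "lro", "off"]


if __name__ == "__main__":
    # Dry run: print one command per ixgbe-driven interface, change nothing.
    for nic in nics_using_driver("ixgbe"):
        print(" ".join(lro_off_command(nic)))
```

As far as I know, a setting made with `ethtool -K` does not survive a reboot, so it would have to be reapplied or made persistent in the network configuration if it turns out to help.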