Hi Tivon,

I personally think it's worth reproducing the issue and collecting the logs, even though it really does sound like a driver/kernel issue. That may help us understand why it happens, and maybe even lead to a driver/kernel fix.
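If it helps with collecting the logs, here is a minimal sketch (not something taken from the thread) of how a saved kernel log could be narrowed down to NIC-related events. It assumes the kernel log was first written to a file, for example with `journalctl -k --no-pager > kernel.log`. The function name `nic_events` and the keyword list are hypothetical; only the driver names (ixgbe, igb) come from the thread, so adjust the keywords to whatever the kernel actually prints on the affected host.

```python
import re
from typing import Iterable, List, Sequence

# Driver names mentioned in the thread.
DRIVERS: Sequence[str] = ("ixgbe", "igb")

# Assumption: substrings that might indicate a reset or link flap.
# These are guesses for illustration, not a list taken from the drivers.
KEYWORDS: Sequence[str] = ("reset", "link is down", "link is up", "hang")


def nic_events(
    lines: Iterable[str],
    drivers: Sequence[str] = DRIVERS,
    keywords: Sequence[str] = KEYWORDS,
) -> List[str]:
    """Return log lines that name one of the drivers and contain a keyword."""
    # Match driver names as whole words so unrelated tokens are not picked up.
    driver_re = re.compile(r"\b(" + "|".join(re.escape(d) for d in drivers) + r")\b")
    hits = []
    for line in lines:
        lowered = line.lower()
        if driver_re.search(lowered) and any(k in lowered for k in keywords):
            hits.append(line.rstrip("\n"))
    return hits
```

With the log saved as above, `nic_events(open("kernel.log", errors="replace"))` would return the matching lines in order, which should make it easier to line the events up against the time the engine marks the host "Unresponsive".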
Thanks in advance,

On Thu, Jul 15, 2021 at 12:38 AM Tivon Häberlein <tivon.haeberl...@secges.de> wrote:

> Hi Nathaniel,
>
> thanks for your time here and sorry for my late reply.
>
> Even though my NICs didn't use the E1000E driver, I got a Broadcom NIC
> from the stash and gave it a try.
> I'm happy to announce that the resets don't seem to occur with the
> Broadcom NIC.
> This obviously means that there's some driver issue with the Intel NICs I
> have been trying.
>
> I still can't get the host into an operational state because of "Failed to
> connect Host n3 to Storage Pool cl1", even though NFS is mounted properly,
> but I think this is a different issue.
>
> If you want, I can reproduce this issue and grab all the logs to maybe find
> a fix other than "get a Broadcom NIC" for the community.
> To be honest though, I think this can just be added to the "weird driver
> fuckups in centos" list if we start digging.
>
> --
> Best regards
> Tivon Häberlein
>
>
>
> On 13.07.2021 01:07, Nathaniel Roach via Users wrote:
>
>
> On 12/7/21 11:44 pm, Nathaniel Roach via Users wrote:
>
> Do you get anything in the logs at all? For something like this I would
> expect it to show in syslog from the kernel.
>
> It really does sound like the E1000E issue, but will probably have a
> different fix - I first encountered it on a router when I was pushing
> more than 100 Mbps in *and then back out* the same NIC. Otherwise it
> wouldn't happen at all. That would explain why it's not an issue in
> maintenance mode and downloading an image works fine.
>
> On 12/7/21 7:57 am, Tivon Häberlein wrote:
>
> Hi Strahil,
>
> the server uses Intel NICs with the ixgbe and igb kernel drivers.
> I did upgrade the firmware to the latest available one (through the Dell
> Lifecycle Controller).
> I also tried replacing the network card itself, but without success.
>
> As this issue did not arise when running Debian 10 or even oVirt Node
> before adding it to the cluster, I don't think it's hardware related.
> For my testing I mounted my oVirt Datastore manually on the fresh install
> of oVirt Node (using the ISO) and then copied a large ISO file to the
> local disk.
> This fills the NIC up to the full 1 Gbit/s I have available there for a
> good 5 to 10 minutes.
> Also the administration through cockpit works perfectly before adding it
> to the cluster.
>
> As soon as I add the node to the cluster the trouble starts.
> 1. oVirt reports that the install has failed on this host
> 2. the node logs (kernel log) adapter resets on some interfaces (even ones
> that aren't UP)
>
> Having read your message again, are you able to capture these log events
> before the node gets fenced (or just disable fencing for the time being)?
>
> 3. the engine loses connection to the host and declares it "Unresponsive"
> 4. the node becomes unmanageable through cockpit or ssh because the
> connection is lost repeatedly.
> 5. the fencing agent reboots the node (if fencing is enabled)
> 6. node comes up and gets added to the cluster (oVirt says the node is in
> state UP)
> 7. repeat from step 2
>
> It seems that this behavior stops when I put the node into maintenance.
> Then I can even mount the Datastore manually and transfer large ISOs
> without it dropping the connection.
>
> This is all very strange and I don't understand what causes this.
>
> Thank you.
>
> --
> Best regards
> Tivon Häberlein
>
> On 11.07.2021 13:51, Strahil Nikolov wrote:
>
> Are you sure it's not a HW issue?
> Try to update the server to the latest firmware and test again. At least
> it won't hurt.
>
> Best Regards,
> Strahil Nikolov
>
> On Sat, Jul 10, 2021 at 14:45, Tivon Häberlein
> <tivon.haeberl...@secges.de> wrote:
>
> Hi,
>
> I've been trying to get oVirt Node 4.4.6 up and running on my Dell r620
> hosts but am facing a strange issue where seemingly all network adapters
> get reset at random times after install.
> The interfaces reset as soon as a bit of traffic is flowing through them.
> Also the logs show NFS timeouts.
>
> This only happens after I have installed the host using the oVirt engine,
> and it also only happens when the host is connected to the engine. When
> the host is in maintenance mode it also seems to work fine.
>
> The host and networks work fine when it's by itself (I tested right after
> install using the ISO and also after I had removed the host from the
> cluster).
>
> I can't figure out why this is happening. Am I missing something?
> I've been stuck on this for the last couple of weeks, a bit of help would
> be much appreciated.
>
> Thank you!
>
> My cluster looks like this:
> Engine: oVirt 4.4.6 - CentOS Linux release 8.3.2011
> host1: oVirt 4.4 repository on CentOS Linux release 8.4.2105
> host2: oVirt 4.4 repository on CentOS Linux release 8.4.2105
> host3 (this is the one I'm trying to install): oVirt node 4.4.6
>
> --
> Best regards
> Tivon Häberlein
>
> _______________________________________________
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/UQP3S4LFWGEP4KL4EUFDZ47WPKT4M6QN/
>
> --
>
> *Nathaniel Roach*
>

--
Lev Veyde
Senior Software Engineer, RHCE | RHCVA | MCITP
Red Hat Israel <https://www.redhat.com>
l...@redhat.com | lve...@redhat.com
<https://red.ht/sig>
TRIED. TESTED. TRUSTED. <https://redhat.com/trusted>