Hi Tivon,

I think the most interesting one to see is /var/log/messages; however, I think it's best to simply archive the whole of /var/log.
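In case it helps, collecting the directory could look roughly like the sketch below. It assumes Python 3 is available on the host and that it is run as root so every log file is readable; the function name `archive_logs` and the `/tmp` output location are made up for illustration and are not part of any oVirt tooling.

```python
import tarfile
import time
from pathlib import Path


def archive_logs(log_dir="/var/log", out_dir="/tmp"):
    """Pack log_dir into a timestamped .tar.gz under out_dir and return its path."""
    src = Path(log_dir)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    # Hypothetical naming scheme: <directory name>-<timestamp>.tar.gz
    target = Path(out_dir) / f"{src.name}-{stamp}.tar.gz"
    with tarfile.open(target, "w:gz") as tar:
        # arcname keeps paths inside the archive relative, e.g. "log/messages".
        tar.add(src, arcname=src.name)
    return target
```

Calling `archive_logs()` with the defaults would write something like `/tmp/log-<timestamp>.tar.gz`, which can then be copied off the host before the cards are swapped back.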
Thanks in advance,

On Thu, Jul 15, 2021 at 1:36 PM Tivon Häberlein <tivon.haeberl...@secges.de> wrote:

> Hi Lev,
>
> thanks for your reply.
> I'll gladly grab the logs in the next couple of days (got to go back to
> the DC to swap the cards back).
>
> Can you give me a list of logs I should grab so I don't miss any?
>
> --
> Best regards
> Tivon Häberlein
>
> On 15.07.2021 01:25, Lev Veyde wrote:
>
> Hi Tivon,
>
> I personally think that it's worth it to reproduce the issue and get the
> logs, even though it does really sound like a driver/kernel issue.
> That may help get more understanding as to why it happens, and maybe even
> get the driver/kernel fix.
>
> Thanks in advance,
>
> On Thu, Jul 15, 2021 at 12:38 AM Tivon Häberlein <tivon.haeberl...@secges.de> wrote:
>
>> Hi Nathaniel,
>>
>> thanks for your time here and sorry for my late reply now.
>>
>> Even though my NICs didn't use the E1000E driver, I got a Broadcom NIC
>> from the stash and gave it a try.
>> I'm happy to announce that the interfaces don't seem to be resetting with
>> the Broadcom NIC.
>> This obviously means that there's some driver issue with the Intel NICs I
>> have been trying.
>>
>> I still can't get the host into an operational state because of "Failed to
>> connect Host n3 to Storage Pool cl1", even though NFS is mounted properly,
>> but I think this is a different issue.
>>
>> If you want I can reproduce this issue and grab all logs to maybe find a
>> fix other than "get a Broadcom NIC" for the community.
>> To be honest though, I think this just can be added to the "weird driver
>> fuckups in CentOS" list if we start digging.
>>
>> --
>> Best regards
>> Tivon Häberlein
>>
>> On 13.07.2021 01:07, Nathaniel Roach via Users wrote:
>>
>> On 12/7/21 11:44 pm, Nathaniel Roach via Users wrote:
>>
>> Do you get anything in the logs at all? For something like this I would
>> expect it to show in syslog from the kernel.
>>
>> It really does sound like the E1000E issue, but will probably have a
>> different fix - I first encountered it on a router when I was pushing over
>> 100 Mbps in *and then back out* the same NIC. Otherwise it wouldn't
>> happen at all. That would explain why it's not an issue in maintenance mode
>> and downloading an image works fine.
>>
>> On 12/7/21 7:57 am, Tivon Häberlein wrote:
>>
>> Hi Strahil,
>>
>> the server uses Intel NICs with ixgbe and igb kernel drivers.
>> I did upgrade the firmware to the latest available one (through Dell
>> Lifecycle Controller).
>> I also tried replacing the network card itself but without success.
>>
>> As this issue did not arise when running Debian 10 or even oVirt Node
>> before adding it to the cluster, I don't think it's hardware related. For my
>> testing I mounted my oVirt Datastore manually on the fresh install of oVirt
>> Node (using the ISO) and then copied a large ISO file to the local disk.
>> This fills the NIC up to the full 1 Gbit/s I have available there for a
>> good 5 to 10 minutes.
>> Also the administration through cockpit works perfectly before adding it
>> to the cluster.
>>
>> As soon as I add the node to the cluster the trouble starts.
>> 1. oVirt reports that the install has failed on this host
>> 2. the node logs (kernel log) adapter resets on some interfaces (even
>> ones that aren't UP)
>>
>> Having read your message again, are you able to capture these log events
>> before the node gets fenced (or just disable fencing for the time)?
>>
>> 3. the engine loses connection to the host and declares it "Unresponsive"
>> 4. the node becomes unmanageable through cockpit or ssh because the
>> connection is lost repeatedly.
>> 5. the fencing agent reboots the node (if fencing is enabled)
>> 6. node comes up and gets added to the cluster (oVirt says the node is in
>> state UP)
>> 7. repeat from step 2
>>
>> It seems that this behavior stops when I put the node into maintenance.
>> Then I can even mount the Datastore manually and transfer large ISOs
>> without it dropping the connection.
>>
>> This is all very strange and I don't understand what causes this.
>>
>> Thank you.
>>
>> --
>> Best regards
>> Tivon Häberlein
>>
>> On 11.07.2021 13:51, Strahil Nikolov wrote:
>>
>> Are you sure it's not a HW issue?
>> Try to update the server to the latest firmware and test again. At least it
>> won't hurt.
>>
>> Best Regards,
>> Strahil Nikolov
>>
>> On Sat, Jul 10, 2021 at 14:45, Tivon Häberlein
>> <tivon.haeberl...@secges.de> wrote:
>>
>> Hi,
>>
>> I've been trying to get oVirt Node 4.4.6 up and running on my Dell R620
>> hosts but am facing a strange issue where seemingly all network adapters
>> get reset at random times after install.
>> The interfaces reset as soon as a bit of traffic is flowing through them.
>> Also the logs show NFS timeouts.
>>
>> This only happens after I have installed the host using the oVirt engine
>> and it also only happens when the host is connected to the engine. When the
>> host is in maintenance mode it also seems to work fine.
>>
>> The host and networks work fine when it's by itself (I tested right after
>> install using the ISO and also after I have removed the host from the
>> cluster).
>>
>> I can't figure out why this is happening. Am I missing something?
>> I've been stuck on this for the last couple of weeks, a bit of help would
>> be much appreciated.
>>
>> Thank you!
>>
>> My cluster is looking like this:
>> Engine: oVirt 4.4.6 - CentOS Linux release 8.3.2011
>> host1: oVirt 4.4 repository on CentOS Linux release 8.4.2105
>> host2: oVirt 4.4 repository on CentOS Linux release 8.4.2105
>> host3 (this is the one I'm trying to install): oVirt Node 4.4.6
>>
>> --
>> Best regards
>> Tivon Häberlein
>>
>> _______________________________________________
>> Users mailing list -- users@ovirt.org
>> To unsubscribe send an email to users-le...@ovirt.org
>> Privacy Statement: https://www.ovirt.org/privacy-policy.html
>> oVirt Code of Conduct:
>> https://www.ovirt.org/community/about/community-guidelines/
>> List Archives:
>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/UQP3S4LFWGEP4KL4EUFDZ47WPKT4M6QN/
>>
>> --
>>
>> *Nathaniel Roach*
>>
>
> --
>
> Lev Veyde
>
> Senior Software Engineer, RHCE | RHCVA | MCITP
>
> Red Hat Israel
>
> <https://www.redhat.com>
>
> l...@redhat.com | lve...@redhat.com
> <https://red.ht/sig>
> TRIED. TESTED. TRUSTED. <https://redhat.com/trusted>
>

--
Lev Veyde
Senior Software Engineer, RHCE | RHCVA | MCITP
Red Hat Israel
<https://www.redhat.com>
l...@redhat.com | lve...@redhat.com
<https://red.ht/sig>
TRIED. TESTED. TRUSTED. <https://redhat.com/trusted>
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/BHXE23SOETZTE3H22MQ77244UUPCWNBV/