Hi Tivon,

I think the most interesting log to look at is /var/log/messages;
however, I think it's best to simply archive the whole /var/log directory.
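
One possible way to grab everything in one go is sketched below. It is only an illustration: the function name archive_logs and the output path in the usage note are made up for this example, and it assumes Python 3 is available on the node and that the script runs as root so that every file under /var/log is readable.

```python
import tarfile
from pathlib import Path


def archive_logs(log_dir: str, dest: str) -> list:
    """Pack log_dir into a gzip-compressed tarball at dest.

    Returns the member names stored in the archive so the result can be
    checked before the file is copied off the host.
    """
    src = Path(log_dir)
    with tarfile.open(dest, "w:gz") as tar:
        # arcname keeps archive paths relative, e.g. "log/messages"
        tar.add(src, arcname=src.name)
    # Re-open the finished archive and list what was actually stored
    with tarfile.open(dest, "r:gz") as tar:
        return tar.getnames()
```

It could be called as archive_logs("/var/log", "/tmp/n3-var-log.tar.gz"), ideally right after the resets have been reproduced. The destination should live outside /var/log so the archive does not try to include itself.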

Thanks in advance,

On Thu, Jul 15, 2021 at 1:36 PM Tivon Häberlein <tivon.haeberl...@secges.de>
wrote:

> Hi Lev,
>
> thanks for your reply.
> I'll gladly grab the logs in the next couple of days (got to go back to
> the DC to swap the cards back).
>
> Can you give me a list of logs I should grab so I don't miss any?
>
> --
> Best regards
> Tivon Häberlein
>
> On 15.07.2021 01:25, Lev Veyde wrote:
>
> Hi Tivon,
>
> I personally think that it's worth it to reproduce the issue and get the
> logs, even though it does really sound like a driver/kernel issue.
> That may help get more understanding as to why it happens, and maybe even
> get the driver/kernel fix.
>
> Thanks in advance,
>
> On Thu, Jul 15, 2021 at 12:38 AM Tivon Häberlein <
> tivon.haeberl...@secges.de> wrote:
>
>> Hi Nathaniel,
>>
>> thanks for your time here and sorry for my late reply now.
>>
>> Even though my NICs didn't use the E1000E driver, I got a Broadcom NIC
>> from the stash and gave it a try.
>> I'm happy to announce that the interfaces don't seem to reset on the
>> Broadcom NIC.
>> This obviously means that there's some driver issue with the Intel NICs I
>> have been trying.
>>
>> I still can't get the host into an operational state because of "Failed to
>> connect Host n3 to Storage Pool cl1", even though NFS is mounted properly,
>> but I think this is a different issue.
>>
>> If you want, I can reproduce this issue and grab all logs to maybe find a
>> fix other than "get a Broadcom NIC" for the community.
>> To be honest though, I think this can just be added to the "weird driver
>> fuckups in CentOS" list if we start digging.
>>
>> --
>> Best regards
>> Tivon Häberlein
>>
>>
>>
>> On 13.07.2021 01:07, Nathaniel Roach via Users wrote:
>>
>>
>> On 12/7/21 11:44 pm, Nathaniel Roach via Users wrote:
>>
>> Do you get anything in the logs at all? For something like this I would
>> expect it to show in syslog from the kernel.
>>
>> It really does sound like the E1000E issue, but will probably have a
>> different fix - I first encountered it on a router when I was pushing
>> >100Mbps in *and then back out* the same NIC. Otherwise it wouldn't
>> happen at all. That would explain why it's not an issue in maintenance mode
>> and downloading an image works fine.
>>
>> On 12/7/21 7:57 am, Tivon Häberlein wrote:
>>
>> Hi Strahil,
>>
>> the server uses Intel NICs with ixgbe and igb kernel drivers.
>> I did upgrade the firmware to the latest available one (through the Dell
>> Lifecycle Controller).
>> I also tried replacing the network card itself but without success.
>>
>> As this issue did not arise when running Debian 10 or even oVirt Node
>> before adding it to the cluster, I don't think it's hardware related. For my
>> testing I mounted my oVirt datastore manually on the fresh install of oVirt
>> Node (using the ISO) and then copied a large ISO file to the local disk.
>> This fills the NIC up to the full 1 Gbit/s I have available there for a
>> good 5 to 10 minutes.
>> Also the administration through cockpit works perfectly before adding it
>> to the cluster.
>>
>> As soon as I add the node to the cluster the trouble starts.
>> 1. oVirt reports that the install has failed on this host
>> 2. the node logs (kernel log) adapter resets on some interfaces (even
>> ones that aren't UP)
>>
>> Having read your message again, are you able to capture these log events
>> before the node gets fenced (or just disable fencing for the time)?
>>
>> 3. the engine loses connection to the host and declares it "Unresponsive"
>> 4. the node becomes unmanageable through cockpit or ssh because the
>> connection is lost repeatedly.
>> 5. the fencing agent reboots the node (if fencing is enabled)
>> 6. node comes up and gets added to the cluster (oVirt says the node is in
>> state UP)
>> 7. repeat from step 2
>>
>> It seems that this behavior stops when I put the node into maintenance.
>> Then I can even mount the Datastore manually and transfer large ISOs
>> without it dropping the connection.
>>
>> This is all very strange and I don't understand what causes this.
>>
>> Thank you.
>>
>> --
>> Best regards
>> Tivon Häberlein
>>
>> On 11.07.2021 13:51, Strahil Nikolov wrote:
>>
>> Are you sure it's not a HW issue?
>> Try to update the server to the latest firmware and test again. At least it
>> won't hurt.
>>
>> Best Regards,
>> Strahil Nikolov
>>
>> On Sat, Jul 10, 2021 at 14:45, Tivon Häberlein
>> <tivon.haeberl...@secges.de> wrote:
>>
>> Hi,
>>
>> I've been trying to get oVirt Node 4.4.6 up and running on my Dell R620
>> hosts but am facing a strange issue where seemingly all network adapters
>> get reset at random times after install.
>> The interfaces reset as soon as a bit of traffic is flowing through them.
>> Also the logs show NFS timeouts.
>>
>> This only happens after I have installed the host using the oVirt engine
>> and it also only happens when the host is connected to the engine. When the
>> host is in maintenance mode it also seems to work fine.
>>
>> The host and networks work fine when it's by itself (I tested right after
>> install using the ISO and also after I removed the host from the
>> cluster).
>>
>> I can't figure out why this is happening. Am I missing something?
>> I've been stuck on this for the last couple of weeks, a bit of help would
>> be much appreciated.
>>
>> Thank you!
>>
>> My cluster is looking like this:
>> Engine: oVirt 4.4.6 - CentOS Linux release 8.3.2011
>> host1: oVirt 4.4 repository on CentOS Linux release 8.4.2105
>> host2: oVirt 4.4 repository on CentOS Linux release 8.4.2105
>> host3 (this is the one I'm trying to install): oVirt node 4.4.6
>>
>> --
>> Best regards
>> Tivon Häberlein
>>
>> _______________________________________________
>> Users mailing list -- users@ovirt.org
>> To unsubscribe send an email to users-le...@ovirt.org
>> Privacy Statement: https://www.ovirt.org/privacy-policy.html
>> oVirt Code of Conduct:
>> https://www.ovirt.org/community/about/community-guidelines/
>> List Archives:
>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/UQP3S4LFWGEP4KL4EUFDZ47WPKT4M6QN/
>>
>>
>> --
>>
>> *Nathaniel Roach*
>>
>>
>
>
> --
>
> Lev Veyde
>
> Senior Software Engineer, RHCE | RHCVA | MCITP
>
> Red Hat Israel
>
> <https://www.redhat.com>
>
> l...@redhat.com | lve...@redhat.com
> <https://red.ht/sig>
> TRIED. TESTED. TRUSTED. <https://redhat.com/trusted>
>
>

-- 

Lev Veyde

Senior Software Engineer, RHCE | RHCVA | MCITP

Red Hat Israel

<https://www.redhat.com>

l...@redhat.com | lve...@redhat.com
<https://red.ht/sig>
TRIED. TESTED. TRUSTED. <https://redhat.com/trusted>
