Hi Tivon,

I think it's worth reproducing the issue and collecting the logs, even
though it really does sound like a driver/kernel issue.
That may help us understand why it happens, and maybe even lead to a
driver/kernel fix.
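
If it helps with collecting the logs, here is a minimal Python sketch for
pulling the NIC-related lines out of a saved kernel log. The driver names
(ixgbe, igb) are the ones mentioned in this thread; the keyword list and the
helper names are assumptions for illustration, not actual driver message
formats, so adjust them to whatever the kernel log really shows.

```python
import re

# Driver names as mentioned in this thread.
DRIVERS = ("ixgbe", "igb")
# Assumed keywords that might indicate an adapter problem; adjust as needed.
KEYWORDS = ("reset", "hang", "timed out", "link is down")


def is_nic_event(line: str) -> bool:
    """Return True if the line names one of the drivers and contains a keyword."""
    lower = line.lower()
    has_driver = any(re.search(rf"\b{name}\b", lower) for name in DRIVERS)
    return has_driver and any(keyword in lower for keyword in KEYWORDS)


def filter_events(lines):
    """Return the matching lines, in order, without trailing newlines."""
    return [line.rstrip("\n") for line in lines if is_nic_event(line)]
```

A kernel log saved with something like `journalctl -k --no-pager > kernel.log`
could then be passed in as `filter_events(open("kernel.log"))`.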

Thanks in advance,

On Thu, Jul 15, 2021 at 12:38 AM Tivon Häberlein <tivon.haeberl...@secges.de>
wrote:

> Hi Nathaniel,
>
> thanks for your time here and sorry for my late reply.
>
> Even though my NICs don't use the E1000E driver, I got a Broadcom NIC
> from the stash and gave it a try.
> I'm happy to announce that the resets don't seem to occur with the
> Broadcom NIC.
> This obviously means that there's some driver issue with the Intel NICs I
> have been trying.
>
> I still can't get the host into an operational state because of "Failed to
> connect Host n3 to Storage Pool cl1", even though NFS is mounted properly,
> but I think this is a different issue.
>
> If you want, I can reproduce this issue and grab all the logs to maybe find
> a fix other than "get a Broadcom NIC" for the community.
> To be honest though, I think this can just be added to the "weird driver
> fuckups in CentOS" list if we start digging.
>
> --
> Best regards
> Tivon Häberlein
>
>
>
> On 13.07.2021 01:07, Nathaniel Roach via Users wrote:
>
>
> On 12/7/21 11:44 pm, Nathaniel Roach via Users wrote:
>
> Do you get anything in the logs at all? For something like this I would
> expect it to show in syslog from the kernel.
>
> It really does sound like the E1000E issue, but will probably have a
> different fix - I first encountered it on a router when I was pushing
> more than 100 Mbps in *and then back out* the same NIC. Otherwise it
> wouldn't happen at all. That would explain why it's not an issue in
> maintenance mode and downloading an image works fine.
>
> On 12/7/21 7:57 am, Tivon Häberlein wrote:
>
> Hi Strahil,
>
> the server uses Intel NICs with ixgbe and igb kernel drivers.
> I did upgrade the firmware to the latest available one (through the Dell
> Lifecycle Controller).
> I also tried replacing the network card itself but without success.
>
> As this issue did not arise when running Debian 10 or even oVirt Node
> before adding it to the cluster, I don't think it's hardware related. For my
> testing I mounted my oVirt datastore manually on the fresh install of oVirt
> Node (using the ISO) and then copied a large ISO file to the local disk.
> This fills the NIC up to the full 1 Gbit/s I have available there for a
> good 5 to 10 minutes.
> Also, administration through Cockpit works perfectly before adding it
> to the cluster.
>
> As soon as I add the node to the cluster the trouble starts.
> 1. oVirt reports that the install has failed on this host
> 2. the node logs (kernel log) adapter resets on some interfaces (even ones
> that aren't UP)
>
> Having read your message again, are you able to capture these log events
> before the node gets fenced (or just disable fencing for the time)?
>
> 3. the engine loses connection to the host and declares it "Unresponsive"
> 4. the node becomes unmanageable through cockpit or ssh because the
> connection is lost repeatedly.
> 5. the fencing agent reboots the node (if fencing is enabled)
> 6. node comes up and gets added to the cluster (oVirt says the node is in
> state UP)
> 7. repeat from step 2
>
> It seems that this behavior stops when I put the node into maintenance.
> Then I can even mount the Datastore manually and transfer large ISOs
> without it dropping the connection.
>
> This is all very strange and I don't understand what causes this.
>
> Thank you.
>
> --
> Best regards
> Tivon Häberlein
>
> On 11.07.2021 13:51, Strahil Nikolov wrote:
>
> Are you sure it's not a HW issue?
> Try to update the server to the latest firmware and test again. At least it
> won't hurt.
>
> Best Regards,
> Strahil Nikolov
>
> On Sat, Jul 10, 2021 at 14:45, Tivon Häberlein
> <tivon.haeberl...@secges.de> wrote:
>
> Hi,
>
> I've been trying to get oVirt Node 4.4.6 up and running on my Dell R620
> hosts but am facing a strange issue where seemingly all network adapters
> get reset at random times after install.
> The interfaces reset as soon as a bit of traffic is flowing through them.
> The logs also show NFS timeouts.
>
> This only happens after I have installed the host using the oVirt engine,
> and it also only happens when the host is connected to the engine. When the
> host is in maintenance mode it also seems to work fine.
>
> The host and networks work fine when it's by itself (I tested right after
> install using the ISO and also after I had removed the host from the
> cluster).
>
> I can't figure out why this is happening. Am I missing something?
> I've been stuck on this for the last couple of weeks, so a bit of help would
> be much appreciated.
>
> Thank you!
>
> My cluster looks like this:
> Engine: oVirt 4.4.6 - CentOS Linux release 8.3.2011
> host1: oVirt 4.4 repository on CentOS Linux release 8.4.2105
> host2: oVirt 4.4 repository on CentOS Linux release 8.4.2105
> host3 (this is the one I'm trying to install): oVirt Node 4.4.6
>
> --
> Best regards
> Tivon Häberlein
>
> _______________________________________________
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/UQP3S4LFWGEP4KL4EUFDZ47WPKT4M6QN/
>
>
> --
>
> *Nathaniel Roach*
>
>


-- 

Lev Veyde

Senior Software Engineer, RHCE | RHCVA | MCITP

Red Hat Israel

<https://www.redhat.com>

l...@redhat.com | lve...@redhat.com
<https://red.ht/sig>
TRIED. TESTED. TRUSTED. <https://redhat.com/trusted>
