[ovirt-users] Re: Unrecoverable NMI error on HP Gen8 hosts.

2022-01-08 Thread Diggy Mc
These Gen8 servers have iLO-3 on them.  They are at the latest version of iLO.  
I'm not familiar with 'fencing'.  Any guidance you can offer on the subject 
would be greatly appreciated.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/QVZXPYGPRESD752IVJW4XAHBFOERLZCG/


[ovirt-users] Re: Unrecoverable NMI error on HP Gen8 hosts.

2022-01-08 Thread Diggy Mc
> On Thu, Dec 30, 2021 at 8:02 PM Diggy Mc  
> Are you sure it's related to oVirt at all? To Linux?
> Did you check the hardware? Contact your hardware support? Perhaps check
> some on-board diagnostics/logs/whatever?
> 
> Good luck and best regards,

Before putting the Gen8 servers into production (as with all my servers), I ran 
the comprehensive HP diagnostic tests on them for 24 hours.  I also ran the 
intensive MemTest86 tests on them for 24 hours.  All tests passed, so I feel 
safe assuming the hardware is okay.

The little information I have found on the matter suggests it is a kernel 
watchdog issue, but those articles offered no help in resolving the problem.
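
For anyone who wants to check the watchdog side on a host, here is a minimal 
Python sketch. It assumes the HP iLO watchdog driver is the `hpwdt` kernel 
module (an assumption; nothing in the iLO log names it) and that the usual 
Linux files /proc/modules and /proc/sys/kernel/nmi_watchdog are readable. The 
function names are made up for illustration. It only reports state and changes 
nothing.

```python
from pathlib import Path


def loaded_modules(proc_modules_text):
    """Return the set of module names listed in /proc/modules-style text."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])  # the first column is the module name
    return names


def read_text_or_none(path):
    """Read a small status file; return None if it is missing or unreadable."""
    try:
        return Path(path).read_text()
    except OSError:
        return None


def watchdog_report(modules_path="/proc/modules",
                    nmi_path="/proc/sys/kernel/nmi_watchdog",
                    module="hpwdt"):  # assumed driver name, see note above
    """Summarize whether `module` is loaded and the kernel NMI watchdog setting."""
    modules_text = read_text_or_none(modules_path)
    nmi_text = read_text_or_none(nmi_path)
    if modules_text is None:
        module_loaded = None  # unknown: could not read the module list
    else:
        module_loaded = module in loaded_modules(modules_text)
    return {
        "module": module,
        "module_loaded": module_loaded,
        "nmi_watchdog": None if nmi_text is None else nmi_text.strip(),
    }


if __name__ == "__main__":
    print(watchdog_report())
```

If the report shows the module loaded, that would at least be consistent with 
the "application watchdog timeout" wording in the iLO log, though it does not 
prove the driver is the cause.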

I am not suggesting it is an oVirt issue.  I am simply hoping that someone in 
the oVirt community has encountered the same problem and can offer a solution 
or work-around.
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/36PJKK26RWFUYI2QA272ZTE6QXYDXZRX/


[ovirt-users] Re: Unrecoverable NMI error on HP Gen8 hosts.

2022-01-03 Thread Strahil Nikolov via Users
I'm not sure whether fencing is what is generating those NMIs.
Have you tested fencing? If not, follow the documentation to test fencing 
and see whether that's the reason for the NMI. Also check for any pending 
firmware updates, like the newest iLO4.
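
One way to test fencing by hand is to call the fence agent directly with the 
read-only `status` action before trying anything that powers a host off. Below 
is a minimal Python sketch, assuming the fence-agents package on the host 
provides `fence_ilo4` (a `fence_ilo3` agent also exists) and using a made-up 
address and credentials. The helper names are hypothetical.

```python
import subprocess


def fence_status_command(agent, ip, username, password, lanplus=True):
    """Build the argument list for a read-only power-status query."""
    cmd = [agent, "--ip", ip, "--username", username,
           "--password", password, "--action", "status"]
    if lanplus:
        cmd.append("--lanplus")  # use the IPMI lanplus interface
    return cmd


def run_fence_status(cmd, timeout=30):
    """Run the agent and return (exit_code, combined stdout and stderr)."""
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    return result.returncode, (result.stdout + result.stderr).strip()


if __name__ == "__main__":
    # Hypothetical iLO address and account; replace with your own.
    cmd = fence_status_command("fence_ilo4", "192.0.2.10", "fenceuser", "secret")
    try:
        code, output = run_fence_status(cmd)
        print(code, output)
    except FileNotFoundError:
        print("fence agent not installed on this machine")
```

A password passed on the command line is visible in the process list, so this 
is only suitable for a quick manual check. If `status` works, the same agent 
settings are what the engine would use for the host's power management.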
Best Regards,
Strahil Nikolov
 
 
  On Sun, Jan 2, 2022 at 17:44, Gilboa Davara wrote:   
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/OTIXFYGQOMM2BWOQTGZ4XRP3JP5IARY6/


[ovirt-users] Re: Unrecoverable NMI error on HP Gen8 hosts.

2022-01-02 Thread Gilboa Davara
On Thu, Dec 30, 2021 at 8:02 PM Diggy Mc  wrote:

>
> I have oVirt Node v4.4.8.3 running on several HP ProLiant Gen8 servers.  I
> receive the following error under certain circumstances:
> "An Unrecoverable System Error (NMI) has occurred (iLO application
> watchdog timeout NMI, Service Information: 0x002B, 0x)"
>
> When a host starts taking a load (but nowhere near a threshold), I
> encounter the above iLO-logged error and the host locks-up.  I have had to
> grossly under-utilize my hosts to avoid this problem.  I'm hoping for a
> better fix or work-around.
>
> I've had the same problem beginning with my oVirt 4.3.x hosts, so it isn't
> oVirt version specific.
>
> The little information I could find on the error wasn't helpful.  Red Hat
> acknowledges the issue, but limited to shutdown/reboot operations; not
> during "normal" operations.
>
> Anyone else experienced this problem?  How did you fix it or work around
> it?  I'd like to better utilize my servers if possible.
>
> In advance, thank you to anyone and everyone who offers help.
>
NMI errors are usually hardware related or kernel / system related (e.g.
memory failure, hardware health check watchdog, etc.).
They are not oVirt related per se.

That said, I'm seeing an HPE report with the same NMI service code.
https://community.hpe.com/t5/ProLiant-Servers-ML-DL-SL/Proliant-dl360p-gen8An-Unrecoverable-SystemError-NMI-has/td-p/7043891#.YdHHOduxUik

- Gilboa
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/MXADE3ZVXA3VNQISODECP5XQEBEUYA4Y/


[ovirt-users] Re: Unrecoverable NMI error on HP Gen8 hosts.

2022-01-01 Thread Yedidyah Bar David
On Thu, Dec 30, 2021 at 8:02 PM Diggy Mc  wrote:
>
>
> I have oVirt Node v4.4.8.3 running on several HP ProLiant Gen8 servers.  I 
> receive the following error under certain circumstances:
> "An Unrecoverable System Error (NMI) has occurred (iLO application watchdog 
> timeout NMI, Service Information: 0x002B, 0x)"
>
> When a host starts taking a load (but nowhere near a threshold), I encounter 
> the above iLO-logged error and the host locks-up.  I have had to grossly 
> under-utilize my hosts to avoid this problem.  I'm hoping for a better fix or 
> work-around.
>
> I've had the same problem beginning with my oVirt 4.3.x hosts, so it isn't 
> oVirt version specific.
>
> The little information I could find on the error wasn't helpful.  Red Hat 
> acknowledges the issue, but limited to shutdown/reboot operations; not during 
> "normal" operations.
>
> Anyone else experienced this problem?  How did you fix it or work around it?  
> I'd like to better utilize my servers if possible.
>
> In advance, thank you to anyone and everyone who offers help.

Are you sure it's related to oVirt at all? To Linux?
Did you check the hardware? Contact your hardware support? Perhaps check
some on-board diagnostics/logs/whatever?

Good luck and best regards,
-- 
Didi
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/W3M33TB5D5VEXFANU6R7IBNFM3MOW3EV/