Interestingly, I can't find anything recent except this one:
https://devblogs.microsoft.com/oldnewthing/20120511-00/?p=7653

Can you check if anything in the OS was updated or changed recently?

Also check whether the VM has nested virtualization enabled.
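
For the host side of that check, here is a minimal Python sketch. It assumes an Intel host where the kvm_intel module exposes its "nested" parameter under /sys/module (AMD hosts use kvm_amd instead). The function names are only illustrative, and this reads the host-wide switch, not the per-VM setting in oVirt.

```python
from pathlib import Path

# Assumed location of the nested-virtualization switch on an Intel KVM host.
NESTED_PARAM = Path("/sys/module/kvm_intel/parameters/nested")


def nested_enabled(raw: str) -> bool:
    """Interpret the parameter value; kernels report either Y/N or 1/0."""
    return raw.strip().upper() in {"Y", "1"}


def host_nested_enabled(param: Path = NESTED_PARAM):
    """Return True/False, or None if the parameter file cannot be read
    (for example, when the kvm_intel module is not loaded)."""
    try:
        return nested_enabled(param.read_text())
    except OSError:
        return None


if __name__ == "__main__":
    print("nested virtualization enabled on host:", host_nested_enabled())
```

A result of None most likely means the module is not loaded or the host is not Intel, so the path would need adjusting.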

Best Regards,
Strahil Nikolov






On Monday, 21 September 2020, 23:56:26 GMT+3, Vinícius Ferrão 
<fer...@versatushpc.com.br> wrote: 





Strahil, thank you man. We finally got some output:

2020-09-15T12:34:49.362238Z qemu-kvm: warning: CPU(s) not present in any NUMA 
nodes: CPU 10 [socket-id: 10, core-id: 0, thread-id: 0], CPU 11 [socket-id: 11, 
core-id: 0, thread-id: 0], CPU 12 [socket-id: 12, core-id: 0, thread-id: 0], 
CPU 13 [socket-id: 13, core-id: 0, thread-id: 0], CPU 14 [socket-id: 14, 
core-id: 0, thread-id: 0], CPU 15 [socket-id: 15, core-id: 0, thread-id: 0]
2020-09-15T12:34:49.362265Z qemu-kvm: warning: All CPU(s) up to maxcpus should 
be described in NUMA config, ability to start up with partial NUMA mappings is 
obsoleted and will be removed in future
KVM: entry failed, hardware error 0x80000021

If you're running a guest on an Intel machine without unrestricted mode
support, the failure can be most likely due to the guest entering an invalid
state for Intel VT. For example, the guest maybe running in big real mode
which is not supported on less recent Intel processors.

EAX=00000000 EBX=01746180 ECX=4be7c002 EDX=000400b6
ESI=8b3d6080 EDI=02d70400 EBP=a19bbdfe ESP=82883770
EIP=00008000 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=1 HLT=0
ES =0000 00000000 ffffffff 00809300
CS =8d00 7ff8d000 ffffffff 00809300
SS =0000 00000000 ffffffff 00809300
DS =0000 00000000 ffffffff 00809300
FS =0000 00000000 ffffffff 00809300
GS =0000 00000000 ffffffff 00809300
LDT=0000 00000000 000fffff 00000000
TR =0040 04c59000 00000067 00008b00
GDT=    04c5afb0 00000057
IDT=    00000000 00000000
CR0=00050032 CR2=c1b7ec48 CR3=001ad002 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 
DR3=0000000000000000 
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff <ff> ff ff ff 
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
2020-09-16T04:11:55.344128Z qemu-kvm: terminating on signal 15 from pid 1 
(<unknown process>)
2020-09-16 04:12:02.212+0000: shutting down, reason=shutdown






That’s the issue: I got this in the logs of both physical machines. It’s 
unlikely that both machines are damaged, right? So even with the log saying 
it’s a hardware error, it may be software related? And again, this only 
happens with this VM.
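
To illustrate why "hardware error" here does not have to mean broken hardware, below is a minimal Python sketch that decodes the value QEMU printed. It assumes the number is the VMX exit reason as laid out in the Intel SDM (bit 31 flags a failed VM entry, the low 16 bits hold the basic exit reason). The function names and the small lookup table are my own, not part of any QEMU or libvirt API.

```python
import re

# Basic exit reasons that indicate a failed VM entry (Intel SDM, Appendix C).
ENTRY_FAILURE_REASONS = {
    33: "VM-entry failure due to invalid guest state",
    34: "VM-entry failure due to MSR loading",
    41: "VM-entry failure due to machine-check event",
}

# Matches the line QEMU wrote to the libvirt VM log in the excerpt above.
ENTRY_FAILED_RE = re.compile(
    r"KVM: entry failed, hardware error (0x[0-9a-fA-F]+)"
)


def decode_exit_reason(value: int) -> dict:
    """Split a VMX exit reason into the entry-failure flag and basic reason."""
    basic = value & 0xFFFF
    return {
        "entry_failure": bool(value & (1 << 31)),
        "basic_reason": basic,
        "meaning": ENTRY_FAILURE_REASONS.get(basic, "not in this small table"),
    }


def scan_log(text: str) -> list:
    """Decode every 'entry failed' line found in a chunk of log text."""
    return [
        decode_exit_reason(int(match.group(1), 16))
        for match in ENTRY_FAILED_RE.finditer(text)
    ]


if __name__ == "__main__":
    sample = "KVM: entry failed, hardware error 0x80000021\n"
    for decoded in scan_log(sample):
        print(decoded)
```

Under that reading, 0x80000021 is basic reason 33 with the entry-failure bit set: the CPU refused to enter the guest because it considered the guest state invalid. That matches the hint QEMU prints right after the error and is consistent with the problem following the VM from host to host.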

> On 21 Sep 2020, at 17:36, Strahil Nikolov <hunter86...@yahoo.com> wrote:
> 
> Usually libvirt's log provides hints (though not necessarily clear clues) about any issues.
> 
> For example: 
> /var/log/libvirt/qemu/<VM_NAME>.log
> 
> Has anything changed recently (maybe the oVirt version was upgraded)?
> 
> Best Regards,
> Strahil Nikolov
> 
> 
> 
> 
> 
> 
> On Monday, 21 September 2020, 23:28:13 GMT+3, Vinícius Ferrão 
> <fer...@versatushpc.com.br> wrote: 
> 
> 
> 
> 
> 
> Hi Strahil, 
> 
> 
> 
> Both disks are VirtIO-SCSI and are Preallocated:
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> Thanks,
> 
> 
> 
> 
> 
> 
> 
> 
>>  
>> On 21 Sep 2020, at 17:09, Strahil Nikolov <hunter86...@yahoo.com> wrote:
>> 
>> 
>>  
>> What type of disks are you using? Any chance you use thin disks?
>> 
>> Best Regards,
>> Strahil Nikolov
>> 
>> 
>> 
>> 
>> 
>> 
>> On Monday, 21 September 2020, 07:20:23 GMT+3, Vinícius Ferrão via 
>> Users <users@ovirt.org> wrote: 
>> 
>> 
>> 
>> 
>> 
>> Hi, sorry to bump the thread.
>> 
>> But I still have this issue on the VM. The crashes are still happening, and 
>> I really don’t know what to do. Since there’s nothing in the logs, except for 
>> that message in `dmesg` on the host machine, I started changing settings to 
>> see if anything changes or if I can at least find a pattern.
>> 
>> What I’ve tried:
>> 1. Disabled I/O Threading on the VM.
>> 2. Increased I/O Threading from 1 to 2.
>> 3. Disabled Memory Ballooning.
>> 4. Reduced VM resources from 10 CPUs and 48GB of RAM to 6 CPUs and 24GB of 
>> RAM.
>> 5. Moved the VM to another host.
>> 6. Dedicated a host specifically to this VM.
>> 7. Checked the storage system to see if there’s any resource starvation, 
>> but everything seems to be fine.
>> 8. Checked both iSCSI switches to see if there’s something wrong with the 
>> fabrics: 0 errors.
>> 
>> I’m really running out of ideas. The VM was working normally and suddenly 
>> this started.
>> 
>> Thanks,
>> 
>> PS: When I was typing this message it crashed again:
>> 
>> [427483.126725] *** Guest State ***
>> [427483.127661] CR0: actual=0x0000000000050032, shadow=0x0000000000050032, 
>> gh_mask=fffffffffffffff7
>> [427483.128505] CR4: actual=0x0000000000002050, shadow=0x0000000000000000, 
>> gh_mask=fffffffffffff871
>> [427483.129342] CR3 = 0x00000001849ff002
>> [427483.130177] RSP = 0xffffb10186ffffb0  RIP = 0x0000000000008000
>> [427483.131014] RFLAGS=0x00000002        DR7 = 0x0000000000000400
>> [427483.131859] Sysenter RSP=0000000000000000 CS:RIP=0000:0000000000000000
>> [427483.132708] CS:  sel=0x9b00, attr=0x08093, limit=0xffffffff, 
>> base=0x000000007ff9b000
>> [427483.133559] DS:  sel=0x0000, attr=0x08093, limit=0xffffffff, 
>> base=0x0000000000000000
>> [427483.134413] SS:  sel=0x0000, attr=0x08093, limit=0xffffffff, 
>> base=0x0000000000000000
>> [427483.135237] ES:  sel=0x0000, attr=0x08093, limit=0xffffffff, 
>> base=0x0000000000000000
>> [427483.136040] FS:  sel=0x0000, attr=0x08093, limit=0xffffffff, 
>> base=0x0000000000000000
>> [427483.136842] GS:  sel=0x0000, attr=0x08093, limit=0xffffffff, 
>> base=0x0000000000000000
>> [427483.137629] GDTR:                          limit=0x00000057, 
>> base=0xffffb10186eb4fb0
>> [427483.138409] LDTR: sel=0x0000, attr=0x10000, limit=0x000fffff, 
>> base=0x0000000000000000
>> [427483.139202] IDTR:                          limit=0x00000000, 
>> base=0x0000000000000000
>> [427483.139998] TR:  sel=0x0040, attr=0x0008b, limit=0x00000067, 
>> base=0xffffb10186eb3000
>> [427483.140816] EFER =    0x0000000000000000  PAT = 0x0007010600070106
>> [427483.141650] DebugCtl = 0x0000000000000000  DebugExceptions = 
>> 0x0000000000000000
>> [427483.142503] Interruptibility = 00000009  ActivityState = 00000000
>> [427483.143353] *** Host State ***
>> [427483.144194] RIP = 0xffffffffc0c65024  RSP = 0xffff9253c0b9bc90
>> [427483.145043] CS=0010 SS=0018 DS=0000 ES=0000 FS=0000 GS=0000 TR=0040
>> [427483.145903] FSBase=00007fcc13816700 GSBase=ffff925adf240000 
>> TRBase=ffff925adf244000
>> [427483.146766] GDTBase=ffff925adf24c000 IDTBase=ffffffffff528000
>> [427483.147630] CR0=0000000080050033 CR3=00000010597b6000 
>> CR4=00000000001627e0
>> [427483.148498] Sysenter RSP=0000000000000000 CS:RIP=0010:ffffffff8f196cc0
>> [427483.149365] EFER = 0x0000000000000d01  PAT = 0x0007050600070106
>> [427483.150231] *** Control State ***
>> [427483.151077] PinBased=0000003f CPUBased=b6a1edfa SecondaryExec=00000ceb
>> [427483.151942] EntryControls=0000d1ff ExitControls=002fefff
>> [427483.152800] ExceptionBitmap=00060042 PFECmask=00000000 PFECmatch=00000000
>> [427483.153661] VMEntry: intr_info=00000000 errcode=00000006 ilen=00000000
>> [427483.154521] VMExit: intr_info=00000000 errcode=00000000 ilen=00000004
>> [427483.155376]        reason=80000021 qualification=0000000000000000
>> [427483.156230] IDTVectoring: info=00000000 errcode=00000000
>> [427483.157068] TSC Offset = 0xfffccfc261506dd9
>> [427483.157905] TPR Threshold = 0x0d
>> [427483.158728] EPT pointer = 0x00000009b437701e
>> [427483.159550] PLE Gap=00000080 Window=00080000
>> [427483.160370] Virtual processor ID = 0x0004
>> 
>> 
>> 
>>> On 16 Sep 2020, at 17:11, Vinícius Ferrão <fer...@versatushpc.com.br> wrote:
>>> 
>>> Hello,
>>> 
>>> I have an Exchange Server VM that keeps going into a suspended state with 
>>> no possibility of recovery. I need to click on shutdown and then power on. 
>>> I can’t find anything useful in the logs, except in “dmesg” on the host:
>>> 
>>> [47807.747606] *** Guest State ***
>>> [47807.747633] CR0: actual=0x0000000000050032, shadow=0x0000000000050032, 
>>> gh_mask=fffffffffffffff7
>>> [47807.747671] CR4: actual=0x0000000000002050, shadow=0x0000000000000000, 
>>> gh_mask=fffffffffffff871
>>> [47807.747721] CR3 = 0x00000000001ad002
>>> [47807.747739] RSP = 0xffffc20904fa3770  RIP = 0x0000000000008000
>>> [47807.747766] RFLAGS=0x00000002        DR7 = 0x0000000000000400
>>> [47807.747792] Sysenter RSP=0000000000000000 CS:RIP=0000:0000000000000000
>>> [47807.747821] CS:  sel=0x9100, attr=0x08093, limit=0xffffffff, 
>>> base=0x000000007ff91000
>>> [47807.747855] DS:  sel=0x0000, attr=0x08093, limit=0xffffffff, 
>>> base=0x0000000000000000
>>> [47807.747889] SS:  sel=0x0000, attr=0x08093, limit=0xffffffff, 
>>> base=0x0000000000000000
>>> [47807.747923] ES:  sel=0x0000, attr=0x08093, limit=0xffffffff, 
>>> base=0x0000000000000000
>>> [47807.747957] FS:  sel=0x0000, attr=0x08093, limit=0xffffffff, 
>>> base=0x0000000000000000
>>> [47807.747991] GS:  sel=0x0000, attr=0x08093, limit=0xffffffff, 
>>> base=0x0000000000000000
>>> [47807.748025] GDTR:                          limit=0x00000057, 
>>> base=0xffff80817e7d5fb0
>>> [47807.748059] LDTR: sel=0x0000, attr=0x10000, limit=0x000fffff, 
>>> base=0x0000000000000000
>>> [47807.748093] IDTR:                          limit=0x00000000, 
>>> base=0x0000000000000000
>>> [47807.748128] TR:  sel=0x0040, attr=0x0008b, limit=0x00000067, 
>>> base=0xffff80817e7d4000
>>> [47807.748162] EFER =    0x0000000000000000  PAT = 0x0007010600070106
>>> [47807.748189] DebugCtl = 0x0000000000000000  DebugExceptions = 
>>> 0x0000000000000000
>>> [47807.748221] Interruptibility = 00000009  ActivityState = 00000000
>>> [47807.748248] *** Host State ***
>>> [47807.748263] RIP = 0xffffffffc0c65024  RSP = 0xffff9252bda5fc90
>>> [47807.748290] CS=0010 SS=0018 DS=0000 ES=0000 FS=0000 GS=0000 TR=0040
>>> [47807.748318] FSBase=00007f46d462a700 GSBase=ffff9252ffac0000 
>>> TRBase=ffff9252ffac4000
>>> [47807.748351] GDTBase=ffff9252ffacc000 IDTBase=ffffffffff528000
>>> [47807.748377] CR0=0000000080050033 CR3=000000105ac8c000 
>>> CR4=00000000001627e0
>>> [47807.748407] Sysenter RSP=0000000000000000 CS:RIP=0010:ffffffff8f196cc0
>>> [47807.748435] EFER = 0x0000000000000d01  PAT = 0x0007050600070106
>>> [47807.748461] *** Control State ***
>>> [47807.748478] PinBased=0000003f CPUBased=b6a1edfa SecondaryExec=00000ceb
>>> [47807.748507] EntryControls=0000d1ff ExitControls=002fefff
>>> [47807.748531] ExceptionBitmap=00060042 PFECmask=00000000 PFECmatch=00000000
>>> [47807.748561] VMEntry: intr_info=00000000 errcode=00000006 ilen=00000000
>>> [47807.748589] VMExit: intr_info=00000000 errcode=00000000 ilen=00000001
>>> [47807.748618]        reason=80000021 qualification=0000000000000000
>>> [47807.748645] IDTVectoring: info=00000000 errcode=00000000
>>> [47807.748669] TSC Offset = 0xfffff9b8c8d943b6
>>> [47807.748699] TPR Threshold = 0x00
>>> [47807.748715] EPT pointer = 0x000000105cd5601e
>>> [47807.748735] PLE Gap=00000080 Window=00001000
>>> [47807.748755] Virtual processor ID = 0x0003
>>> 
>>> So something really went crazy. The VM has been going down at least twice 
>>> a day for the last 5 days.
>>> 
>>> At first I thought it was a hardware issue, so I restarted the VM on 
>>> another host, and the same thing happened.
>>> 
>>> About the VM: it’s configured with 10 CPUs and 48GB of RAM, running on 
>>> oVirt 4.3.10 with iSCSI storage on a FreeNAS box, where the VM disks live; 
>>> there is a 300GB disk for C:\ and a 2TB disk for D:\.
>>> 
>>> Any idea on how to start troubleshooting it?
>>> 
>>> Thanks,
>>> 
>>> 
>> 
>> _______________________________________________
>> Users mailing list -- users@ovirt.org
>> To unsubscribe send an email to users-le...@ovirt.org
>> Privacy Statement: https://www.ovirt.org/privacy-policy.html
>> oVirt Code of Conduct:  
>> https://www.ovirt.org/community/about/community-guidelines/
>> List Archives: 
>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/OIAIVV6I2MUVPVR4FBJQTZW5OL4MNS5Q/
>> 
>> 
>> 
>> 
> 
> 
> 
> 
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/U3OLTH27Q2Y5RE4OLLIUZMJ2YDC7HNIT/