[ovirt-users] Re: How to discover why a VM is getting suspended without recovery possibility?

2020-09-22 Thread Vinícius Ferrão via Users
Hi again Strahil,

It’s oVirt 4.3.10. The entire cluster uses the same CPU: three machines with 
Xeon E5-2620v2 (Ivy Bridge), all identical in model and specs.

I’ve changed the VM CPU Model to:
Nehalem,+spec-ctrl,+ssbd

Let’s see how it behaves. If it crashes again I’ll definitely look at rolling 
back the OS updates.
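
Before powering the VM back on, the custom CPU string can be sanity-checked. This is only a minimal sketch: it assumes the syntax is a base model followed by comma-separated `+flag` or `-flag` items, as in the value above. `parse_cpu_model` is a hypothetical helper, not part of oVirt.

```python
def parse_cpu_model(spec):
    """Split 'Model,+flag,-flag' into the base model and a flag map.

    Assumed syntax (not taken from oVirt documentation): first item is the
    model name, every later item is '+name' (require) or '-name' (disable).
    """
    parts = [p.strip() for p in spec.split(",") if p.strip()]
    if not parts:
        raise ValueError("empty CPU model string")
    model, flags = parts[0], {}
    for item in parts[1:]:
        if len(item) < 2 or item[0] not in "+-":
            raise ValueError(f"unexpected flag syntax: {item!r}")
        flags[item[1:]] = item[0] == "+"
    return model, flags


model, flags = parse_cpu_model("Nehalem,+spec-ctrl,+ssbd")
print(model, flags)
```

A typo such as a missing `+` raises an error here instead of surfacing later as a VM that refuses to start.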

Thank you all.

PS: I can try upgrading to 4.4.

> On 22 Sep 2020, at 04:28, Strahil Nikolov  wrote:
> 
> This looks a lot like my OpenBSD 6.6 issue on the latest AMD CPUs. KVM did not 
> accept a perfectly valid instruction, and it was a bug in KVM.
> 
> Maybe you can try to:
> - power off the VM
> - pick an older CPU type for that VM only
> - power on and monitor over the next few days
> 
> Do you have a cluster with a different CPU vendor (if currently on AMD -> Intel, 
> and if currently on Intel -> AMD)? Maybe you can move the VM to another cluster 
> and see whether the issue happens there too.
> 
> Another option is to roll back the Windows updates to identify whether any of 
> them caused the problem. Still, that's a workaround and not a fix.
> 
> 
> Are you using oVirt 4.3 or 4.4?
> 
> Best Regards,
> Strahil Nikolov

[ovirt-users] Re: How to discover why a VM is getting suspended without recovery possibility?

2020-09-22 Thread Vinícius Ferrão via Users
Hi Gianluca.

On 22 Sep 2020, at 04:24, Gianluca Cecchi <gianluca.cec...@gmail.com> wrote:



On Tue, Sep 22, 2020 at 9:12 AM Vinícius Ferrão via Users 
<users@ovirt.org> wrote:
Hi Strahil, yes, I can’t find anything recent either. You dug way further 
than I did. I found some kernel regressions, but I don’t know whether they’re 
related:

https://patchwork.kernel.org/patch/5526561/
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1045027

Regarding the OS, nothing new was installed, just regular Windows Updates.
And finally, about nested virtualisation: it’s disabled on the hypervisor.



In your original post you wrote about the VM going suspended.
So I think there could be something useful in engine.log on the engine and/or 
vdsm.log on the hypervisor.
Could you check those?

Yes, it goes to suspend. I think the engine simply doesn’t know what really 
happened and guesses that the VM was suspended. In engine.log I only have these 
two lines:

# grep "2020-09-22 01:51" /var/log/ovirt-engine/engine.log
2020-09-22 01:51:52,604-03 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-57) [] VM '351db98a-5f74-439f-99a4-31f611b2d250'(cerulean) moved from 'Up' --> 'Paused'
2020-09-22 01:51:52,699-03 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-57) [] EVENT_ID: VM_PAUSED(1,025), VM cerulean has been paused.

Note that I grepped by timestamp. There are only these two lines from when it 
crashed, about 2h30m ago.
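
Instead of grepping one minute at a time, the state changes of a single VM can be listed across the whole log. The following is a rough sketch: the regular expression is modelled only on the VmAnalyzer line quoted in this message, so the exact line layout is an assumption, and `vm_transitions` is a hypothetical helper name.

```python
import re

# Pattern modelled on the single VmAnalyzer line quoted in this thread;
# other engine versions may format the line differently (assumption).
TRANSITION = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+\S*\s+\w+\s+.*"
    r"VM '(?P<id>[0-9a-f-]+)'\((?P<name>[^)]+)\) moved from "
    r"'(?P<old>\w+)' --> '(?P<new>\w+)'"
)


def vm_transitions(lines, vm_name):
    """Yield (timestamp, old_state, new_state) for one VM name."""
    for line in lines:
        m = TRANSITION.search(line)
        if m and m.group("name") == vm_name:
            yield m.group("ts"), m.group("old"), m.group("new")


sample = [
    "2020-09-22 01:51:52,604-03 INFO  "
    "[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] "
    "(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-57) [] "
    "VM '351db98a-5f74-439f-99a4-31f611b2d250'(cerulean) moved from 'Up' --> 'Paused'",
]
print(list(vm_transitions(sample, "cerulean")))
```

To run it against the real file, pass an open file object for /var/log/ovirt-engine/engine.log instead of `sample`. Every 'Up' --> 'Paused' pair then gives one crash time to compare with the host logs.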

In vdsm.log, around the same time and with the name of the VM, I only found a 
huge JSON with the characteristics of the VM. Is there something I should check 
specifically? I tried some combinations of “grep” but found nothing really useful.

Also, do you see anything in the Event Viewer of the Windows VM and/or in the 
FreeNAS logs?

FreeNAS is just cool, nothing wrong there. No errors on dmesg, nor resource 
starvation on ZFS. No overload on the disks, nothing… the storage is running 
easy.

As for the Windows Event Viewer, it’s my Achilles’ heel; nothing relevant there 
either, as far as I can tell. There are of course some mentions of an improper 
shutdown due to the crash, but nothing else. I’m looking further and will report 
back if I find something useful.

Thanks,


Gianluca

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/XTTUYAGYB6EE5I3XNNLBZEBWY363XTIQ/


[ovirt-users] Re: How to discover why a VM is getting suspended without recovery possibility?

2020-09-22 Thread Strahil Nikolov via Users
This looks a lot like my OpenBSD 6.6 issue on the latest AMD CPUs. KVM did not 
accept a perfectly valid instruction, and it was a bug in KVM.

Maybe you can try to:
- power off the VM
- pick an older CPU type for that VM only
- power on and monitor over the next few days

Do you have a cluster with a different CPU vendor (if currently on AMD -> Intel, 
and if currently on Intel -> AMD)? Maybe you can move the VM to another cluster 
and see whether the issue happens there too.

Another option is to roll back the Windows updates to identify whether any of 
them caused the problem. Still, that's a workaround and not a fix.


Are you using oVirt 4.3 or 4.4?

Best Regards,
Strahil Nikolov

[ovirt-users] Re: How to discover why a VM is getting suspended without recovery possibility?

2020-09-22 Thread Gianluca Cecchi
On Tue, Sep 22, 2020 at 9:12 AM Vinícius Ferrão via Users 
wrote:

> Hi Strahil, yes, I can’t find anything recent either. You dug way further
> than I did. I found some kernel regressions, but I don’t know whether
> they’re related:
>
> https://patchwork.kernel.org/patch/5526561/
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1045027
>
> Regarding the OS, nothing new was installed, just regular Windows Updates.
> And finally, about nested virtualisation: it’s disabled on the hypervisor.
>
>
>
In your original post you wrote about the VM going suspended.
So I think there could be something useful in engine.log on the engine
and/or vdsm.log on the hypervisor.
Could you check those?
Also, do you see anything in the Event Viewer of the Windows VM and/or in the
FreeNAS logs?

Gianluca
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/X52ZUYHMIVBVFYWOSQDTTV75YYCHDC5L/


[ovirt-users] Re: How to discover why a VM is getting suspended without recovery possibility?

2020-09-22 Thread Vinícius Ferrão via Users
Hi Strahil, yes, I can’t find anything recent either. You dug way further 
than I did. I found some kernel regressions, but I don’t know whether they’re 
related:

https://patchwork.kernel.org/patch/5526561/
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1045027

Regarding the OS, nothing new was installed, just regular Windows Updates.
And finally, about nested virtualisation: it’s disabled on the hypervisor.

One thing that caught my attention in the link you sent is the mention of a 
rootkit: https://devblogs.microsoft.com/oldnewthing/20060421-12/?p=31443

But come on, it’s from 2006…

Well, I’m open to other ideas. The VM just crashed once again:

EAX= EBX=075c5180 ECX=75432002 EDX=000400b6
ESI=c8ddc080 EDI=075d6800 EBP=a19bbdfe ESP=7db5d770
EIP=8000 EFL=0002 [---] CPL=0 II=0 A20=1 SMM=1 HLT=0
ES =   00809300
CS =9900 7ff99000  00809300
SS =   00809300
DS =   00809300
FS =   00809300
GS =   00809300
LDT=  000f 
TR =0040 075da000 0067 8b00
GDT= 075dbfb0 0057
IDT=  
CR0=00050032 CR2=242cb25a CR3=001ad002 CR4=
DR0= DR1= DR2= 
DR3=
DR6=4ff0 DR7=0400
EFER=
Code=ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ff ff ff 
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

[519192.536247] *** Guest State ***
[519192.536275] CR0: actual=0x00050032, shadow=0x00050032, 
gh_mask=fff7
[519192.536324] CR4: actual=0x2050, shadow=0x, 
gh_mask=f871
[519192.537322] CR3 = 0x001ad002
[519192.538166] RSP = 0xfb047db5d770  RIP = 0x8000
[519192.539017] RFLAGS=0x0002 DR7 = 0x0400
[519192.539861] Sysenter RSP= CS:RIP=:
[519192.540690] CS:   sel=0x9900, attr=0x08093, limit=0x, 
base=0x7ff99000
[519192.541523] DS:   sel=0x, attr=0x08093, limit=0x, 
base=0x
[519192.542356] SS:   sel=0x, attr=0x08093, limit=0x, 
base=0x
[519192.543167] ES:   sel=0x, attr=0x08093, limit=0x, 
base=0x
[519192.543961] FS:   sel=0x, attr=0x08093, limit=0x, 
base=0x
[519192.544747] GS:   sel=0x, attr=0x08093, limit=0x, 
base=0x
[519192.545511] GDTR:   limit=0x0057, 
base=0xad01075dbfb0
[519192.546275] LDTR: sel=0x, attr=0x1, limit=0x000f, 
base=0x
[519192.547052] IDTR:   limit=0x, 
base=0x
[519192.547841] TR:   sel=0x0040, attr=0x0008b, limit=0x0067, 
base=0xad01075da000
[519192.548639] EFER = 0x  PAT = 0x0007010600070106
[519192.549460] DebugCtl = 0x  DebugExceptions = 
0x
[519192.550302] Interruptibility = 0009  ActivityState = 
[519192.551137] *** Host State ***
[519192.551963] RIP = 0xc150a034  RSP = 0x88cd9cafbc90
[519192.552805] CS=0010 SS=0018 DS= ES= FS= GS= TR=0040
[519192.553646] FSBase=7f7da762a700 GSBase=88d45f2c 
TRBase=88d45f2c4000
[519192.554496] GDTBase=88d45f2cc000 IDTBase=ff528000
[519192.555347] CR0=80050033 CR3=00033dc82000 CR4=001627e0
[519192.556202] Sysenter RSP= CS:RIP=0010:91596cc0
[519192.557058] EFER = 0x0d01  PAT = 0x0007050600070106
[519192.557913] *** Control State ***
[519192.558757] PinBased=003f CPUBased=b6a1edfa SecondaryExec=0ceb
[519192.559605] EntryControls=d1ff ExitControls=002fefff
[519192.560453] ExceptionBitmap=00060042 PFECmask= PFECmatch=
[519192.561306] VMEntry: intr_info= errcode=0006 ilen=
[519192.562158] VMExit: intr_info= errcode= ilen=0001
[519192.563006] reason=8021 qualification=
[519192.563860] IDTVectoring: info= errcode=
[519192.564695] TSC Offset = 0xfffcc6c7d53f16d7
[519192.565526] TPR Threshold = 0x00
[519192.566345] EPT pointer = 0x000b9397901e
[519192.567162] PLE Gap=0080 Window=1000
[519192.567984] Virtual processor ID = 0x0005
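
The flags line of the QEMU register dump is easy to read mechanically. This sketch only assumes the `KEY=value` layout seen in the dump above; `parse_flags` is a hypothetical helper. In that dump SMM=1, meaning QEMU reports the vCPU in System Management Mode when the entry failed.

```python
import re


def parse_flags(line):
    """Extract the small integer flags from a QEMU register-dump line."""
    pairs = re.findall(r"\b(CPL|II|A20|SMM|HLT)=(\d+)", line)
    return {key: int(value) for key, value in pairs}


# Line copied from the dump in this message.
flags = parse_flags("EIP=8000 EFL=0002 [---] CPL=0 II=0 A20=1 SMM=1 HLT=0")
print(flags)
print("vCPU was in SMM" if flags.get("SMM") == 1 else "vCPU was not in SMM")
```

Comparing this flag across crashes shows whether the failure always hits in the same CPU mode.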


Thank you!


On 22 Sep 2020, at 02:30, Strahil Nikolov <hunter86...@yahoo.com> wrote:

Interestingly, I can't find anything recent except this one:
https://devblogs.microsoft.com/oldnewthing/20120511-00/?p=7653

Can you check whether anything in the OS was updated or changed recently?

Also check whether the VM has nested virtualization enabled.

Best Regards,
Strahil Nikolov

On Monday, 21 September 2020, 23:56:26 GMT+3, Vinícius Ferrão wrote:

Strahil, thank you man. We finally got some output:


[ovirt-users] Re: How to discover why a VM is getting suspended without recovery possibility?

2020-09-21 Thread Strahil Nikolov via Users
Interestingly, I can't find anything recent except this one:
https://devblogs.microsoft.com/oldnewthing/20120511-00/?p=7653

Can you check whether anything in the OS was updated or changed recently?

Also check whether the VM has nested virtualization enabled.

Best Regards,
Strahil Nikolov

[ovirt-users] Re: How to discover why a VM is getting suspended without recovery possibility?

2020-09-21 Thread Vinícius Ferrão via Users
Strahil, thank you man. We finally got some output:

2020-09-15T12:34:49.362238Z qemu-kvm: warning: CPU(s) not present in any NUMA 
nodes: CPU 10 [socket-id: 10, core-id: 0, thread-id: 0], CPU 11 [socket-id: 11, 
core-id: 0, thread-id: 0], CPU 12 [socket-id: 12, core-id: 0, thread-id: 0], 
CPU 13 [socket-id: 13, core-id: 0, thread-id: 0], CPU 14 [socket-id: 14, 
core-id: 0, thread-id: 0], CPU 15 [socket-id: 15, core-id: 0, thread-id: 0]
2020-09-15T12:34:49.362265Z qemu-kvm: warning: All CPU(s) up to maxcpus should 
be described in NUMA config, ability to start up with partial NUMA mappings is 
obsoleted and will be removed in future
KVM: entry failed, hardware error 0x8021

If you're running a guest on an Intel machine without unrestricted mode
support, the failure can be most likely due to the guest entering an invalid
state for Intel VT. For example, the guest maybe running in big real mode
which is not supported on less recent Intel processors.

EAX= EBX=01746180 ECX=4be7c002 EDX=000400b6
ESI=8b3d6080 EDI=02d70400 EBP=a19bbdfe ESP=82883770
EIP=8000 EFL=0002 [---] CPL=0 II=0 A20=1 SMM=1 HLT=0
ES =   00809300
CS =8d00 7ff8d000  00809300
SS =   00809300
DS =   00809300
FS =   00809300
GS =   00809300
LDT=  000f 
TR =0040 04c59000 0067 8b00
GDT= 04c5afb0 0057
IDT=  
CR0=00050032 CR2=c1b7ec48 CR3=001ad002 CR4=
DR0= DR1= DR2= 
DR3= 
DR6=0ff0 DR7=0400
EFER=
Code=ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ff ff ff 
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
2020-09-16T04:11:55.344128Z qemu-kvm: terminating on signal 15 from pid 1 
()
2020-09-16 04:12:02.212+: shutting down, reason=shutdown






That’s the issue: I get this in the logs of both physical machines. It’s 
unlikely that both machines are damaged, right? So even though the log says 
it’s a hardware error, could it be software related? And again, this only 
happens with this VM.
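
One way to approach that question is to decode the error value itself. The sketch below splits a VMX exit reason as the Intel SDM describes it: bit 31 flags a VM-entry failure and bits 15:0 hold the basic reason. It rests on a loud assumption: the value shows as "0x8021" in this archive, and I am assuming it is 0x80000021 with a run of zeros lost, just as the register dump shows empty fields such as "EAX=". `decode_exit_reason` is a hypothetical helper.

```python
# Basic exit reasons for failed VM entries, per the Intel SDM list of
# VMX basic exit reasons. Only the entry-failure codes are included here.
ENTRY_FAILURE_REASONS = {
    33: "VM-entry failure due to invalid guest state",
    34: "VM-entry failure due to MSR loading",
    41: "VM-entry failure due to machine-check event",
}


def decode_exit_reason(value):
    """Split a 32-bit VMX exit reason into its failure flag and basic reason."""
    basic = value & 0xFFFF                    # bits 15:0: basic exit reason
    entry_failure = bool((value >> 31) & 1)   # bit 31: VM-entry failure
    meaning = ENTRY_FAILURE_REASONS.get(basic, "not in this small table")
    return {"entry_failure": entry_failure, "basic_reason": basic, "meaning": meaning}


# ASSUMPTION: the archived "0x8021" stands for 0x80000021 (zeros stripped).
print(decode_exit_reason(0x80000021))
```

If that assumption holds, the value reads as an invalid guest state at VM entry. That matches the hint QEMU prints right after the error and does not by itself point to a physical fault on either host.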

> On 21 Sep 2020, at 17:36, Strahil Nikolov  wrote:
> 
> Usually libvirt's log provides hints (though not necessarily answers) about any issues.
> 
> For example: 
> /var/log/libvirt/qemu/.log
> 
> Has anything changed recently (maybe the oVirt version was upgraded)?
> 
> Best Regards,
> Strahil Nikolov
[ovirt-users] Re: How to discover why a VM is getting suspended without recovery possibility?

2020-09-21 Thread Strahil Nikolov via Users
Usually libvirt's log provides hints (though not necessarily answers) about any issues.

For example: 
/var/log/libvirt/qemu/.log
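
A small scan of that per-VM log could look like the sketch below. The "KVM: entry failed" string is taken from the output posted elsewhere in this thread. Because that line carries no timestamp of its own, the sketch reports the last timestamp seen before it. `find_entry_failures` is a hypothetical helper, and the log path is left to the caller.

```python
import re

ENTRY_FAILED = re.compile(r"KVM: entry failed, hardware error (0x[0-9a-fA-F]+)")
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2})")


def find_entry_failures(lines):
    """Return (last_timestamp_seen, error_code) for each entry-failure line."""
    last_ts, hits = None, []
    for line in lines:
        ts = TIMESTAMP.match(line)
        if ts:
            last_ts = ts.group(1)  # remember the most recent timestamped line
        failed = ENTRY_FAILED.search(line)
        if failed:
            hits.append((last_ts, failed.group(1)))
    return hits


# Sample lines shortened from the log output posted in this thread.
sample = [
    "2020-09-15T12:34:49.362265Z qemu-kvm: warning: All CPU(s) up to maxcpus "
    "should be described in NUMA config",
    "KVM: entry failed, hardware error 0x8021",
    "2020-09-16T04:11:55.344128Z qemu-kvm: terminating on signal 15 from pid 1",
]
print(find_entry_failures(sample))
```

The reported timestamp is only a lower bound, since the failure can happen long after the last timestamped line.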

Has anything changed recently (maybe the oVirt version was upgraded)?

Best Regards,
Strahil Nikolov

On Monday, 21 September 2020, 23:28:13 GMT+3, Vinícius Ferrão wrote:

Hi Strahil,

Both disks are VirtIO-SCSI and are Preallocated.

Thanks,

>  
> On 21 Sep 2020, at 17:09, Strahil Nikolov  wrote:
> 
> 
>  
> What type of disks are you using? Any chance you use thin disks?
> 
> Best Regards,
> Strahil Nikolov
> 
> On Monday, 21 September 2020, 07:20:23 GMT+3, Vinícius Ferrão via Users wrote:
> 
> Hi, sorry to bump the thread.
> 
> But I still have this issue with the VM. These crashes are still happening, and 
> I really don’t know what to do. Since there’s nothing in the logs except for 
> that message in `dmesg` on the host machine, I started changing settings to see 
> whether anything changes or whether I at least get a pattern.
> 
> What I’ve tried:
> 1. Disabled I/O Threading on the VM.
> 2. Increased I/O Threading from 1 to 2.
> 3. Disabled Memory Ballooning.
> 4. Reduced VM resources from 10 CPUs and 48 GB of RAM to 6 CPUs and 24 GB of 
> RAM.
> 5. Moved the VM to another host.
> 6. Dedicated a host specifically to this VM.
> 7. Checked the storage system for resource starvation, but everything seems 
> to be fine.
> 8. Checked both iSCSI switches for problems with the fabrics: 0 errors.
> 
> I’m really running out of ideas. The VM was working normally and suddenly 
> this started.
> 
> Thanks,
> 
> PS: When I was typing this message it crashed again:
> 
> [427483.126725] *** Guest State ***
> [427483.127661] CR0: actual=0x00050032, shadow=0x00050032, 
> gh_mask=fff7
> [427483.128505] CR4: actual=0x2050, shadow=0x, 
> gh_mask=f871
> [427483.129342] CR3 = 0x0001849ff002
> [427483.130177] RSP = 0xb10186b0  RIP = 0x8000
> [427483.131014] RFLAGS=0x0002        DR7 = 0x0400
> [427483.131859] Sysenter RSP= CS:RIP=:
> [427483.132708] CS:  sel=0x9b00, attr=0x08093, limit=0x, 
> base=0x7ff9b000
> [427483.133559] DS:  sel=0x, attr=0x08093, limit=0x, 
> base=0x
> [427483.134413] SS:  sel=0x, attr=0x08093, limit=0x, 
> base=0x
> [427483.135237] ES:  sel=0x, attr=0x08093, limit=0x, 
> base=0x
> [427483.136040] FS:  sel=0x, attr=0x08093, limit=0x, 
> base=0x
> [427483.136842] GS:  sel=0x, attr=0x08093, limit=0x, 
> base=0x
> [427483.137629] GDTR:                          limit=0x0057, 
> base=0xb10186eb4fb0
> [427483.138409] LDTR: sel=0x, attr=0x1, limit=0x000f, 
> base=0x
> [427483.139202] IDTR:                          limit=0x, 
> base=0x
> [427483.139998] TR:  sel=0x0040, attr=0x0008b, limit=0x0067, 
> base=0xb10186eb3000
> [427483.140816] EFER =    0x  PAT = 0x0007010600070106
> [427483.141650] DebugCtl = 0x  DebugExceptions = 
> 0x
> [427483.142503] Interruptibility = 0009  ActivityState = 
> [427483.143353] *** Host State ***
> [427483.144194] RIP = 0xc0c65024  RSP = 0x9253c0b9bc90
> [427483.145043] CS=0010 SS=0018 DS= ES= FS= GS= TR=0040
> [427483.145903] FSBase=7fcc13816700 GSBase=925adf24 
> TRBase=925adf244000
> [427483.146766] GDTBase=925adf24c000 IDTBase=ff528000
> [427483.147630] CR0=80050033 CR3=0010597b6000 CR4=001627e0
> [427483.148498] Sysenter RSP= CS:RIP=0010:8f196cc0
> [427483.149365] EFER = 0x0d01  PAT = 0x0007050600070106
> [427483.150231] *** Control State ***
> [427483.151077] PinBased=003f CPUBased=b6a1edfa SecondaryExec=0ceb
> [427483.151942] EntryControls=d1ff ExitControls=002fefff
> [427483.152800] ExceptionBitmap=00060042 PFECmask= PFECmatch=
> [427483.153661] VMEntry: intr_info= errcode=0006 ilen=
> [427483.154521] VMExit: intr_info= errcode= ilen=0004
> [427483.155376]        reason=8021 qualification=
> [427483.156230] IDTVectoring: info= errcode=
> [427483.157068] TSC Offset = 0xfffccfc261506dd9
> [427483.157905] TPR Threshold = 0x0d
> [427483.158728] EPT pointer = 0x0009b437701e
> [427483.159550] PLE Gap=0080 Window=0008
> [427483.160370] Virtual processor ID = 0x0004
> 
> 
> 
>> On 16 Sep 2020, at 17:11, Vinícius Ferrão  wrote:
>> 
>> Hello,
>> 
>> I have an Exchange Server VM that keeps going into suspend without possibility 
>> of recovery. I need to click on shutdown and then power on. I can’t find 
>> anything 

[ovirt-users] Re: How to discover why a VM is getting suspended without recovery possibility?

2020-09-21 Thread Strahil Nikolov via Users
What type of disks are you using? Any chance you use thin disks?

Best Regards,
Strahil Nikolov

On Monday, 21 September 2020, 07:20:23 GMT+3, Vinícius Ferrão via Users wrote:

Hi, sorry to bump the thread.

But I still have this issue with the VM. These crashes are still happening, and I 
really don’t know what to do. Since there’s nothing in the logs except for that 
message in `dmesg` on the host machine, I started changing settings to see 
whether anything changes or whether I at least get a pattern.
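
To look for a pattern, the uptime stamps of the "*** Guest State ***" lines in `dmesg` can be turned into intervals between crashes. This is only a sketch, and it assumes all lines come from the same host and the same boot, since `dmesg` stamps are seconds since boot. The thread does not confirm that for the two stamps used in the sample, because the VM was moved between hosts. `crash_uptimes` and `intervals_hours` are hypothetical helpers.

```python
import re

GUEST_STATE = re.compile(r"\[\s*(\d+\.\d+)\] \*\*\* Guest State \*\*\*")


def crash_uptimes(dmesg_text):
    """Return the uptime (seconds since boot) of every VMCS dump header."""
    return [float(m.group(1)) for m in GUEST_STATE.finditer(dmesg_text)]


def intervals_hours(uptimes):
    """Return the gaps between consecutive crashes, in hours."""
    return [round((b - a) / 3600, 2) for a, b in zip(uptimes, uptimes[1:])]


# Two header lines quoted in this thread; same-host origin is an assumption.
sample = (
    "[427483.126725] *** Guest State ***\n"
    "[519192.536247] *** Guest State ***\n"
)
uptimes = crash_uptimes(sample)
print(uptimes, intervals_hours(uptimes))
```

With more crashes collected per host, regular gaps would suggest a scheduled trigger inside the guest, and irregular ones would point elsewhere.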

What I’ve tried:
1. Disabled I/O Threading on the VM.
2. Increased I/O Threading from 1 to 2.
3. Disabled Memory Ballooning.
4. Reduced VM resources from 10 CPUs and 48 GB of RAM to 6 CPUs and 24 GB of 
RAM.
5. Moved the VM to another host.
6. Dedicated a host specifically to this VM.
7. Checked the storage system for resource starvation, but everything seems 
to be fine.
8. Checked both iSCSI switches for problems with the fabrics: 0 errors.

I’m really running out of ideas. The VM was working normally and suddenly this 
started.

Thanks,

PS: When I was typing this message it crashed again:

[427483.126725] *** Guest State ***
[427483.127661] CR0: actual=0x00050032, shadow=0x00050032, gh_mask=fff7
[427483.128505] CR4: actual=0x2050, shadow=0x, gh_mask=f871
[427483.129342] CR3 = 0x0001849ff002
[427483.130177] RSP = 0xb10186b0  RIP = 0x8000
[427483.131014] RFLAGS=0x0002        DR7 = 0x0400
[427483.131859] Sysenter RSP= CS:RIP=:
[427483.132708] CS:  sel=0x9b00, attr=0x08093, limit=0x, base=0x7ff9b000
[427483.133559] DS:  sel=0x, attr=0x08093, limit=0x, base=0x
[427483.134413] SS:  sel=0x, attr=0x08093, limit=0x, base=0x
[427483.135237] ES:  sel=0x, attr=0x08093, limit=0x, base=0x
[427483.136040] FS:  sel=0x, attr=0x08093, limit=0x, base=0x
[427483.136842] GS:  sel=0x, attr=0x08093, limit=0x, base=0x
[427483.137629] GDTR:                          limit=0x0057, base=0xb10186eb4fb0
[427483.138409] LDTR: sel=0x, attr=0x1, limit=0x000f, base=0x
[427483.139202] IDTR:                          limit=0x, base=0x
[427483.139998] TR:  sel=0x0040, attr=0x0008b, limit=0x0067, base=0xb10186eb3000
[427483.140816] EFER =    0x  PAT = 0x0007010600070106
[427483.141650] DebugCtl = 0x  DebugExceptions = 0x
[427483.142503] Interruptibility = 0009  ActivityState = 
[427483.143353] *** Host State ***
[427483.144194] RIP = 0xc0c65024  RSP = 0x9253c0b9bc90
[427483.145043] CS=0010 SS=0018 DS= ES= FS= GS= TR=0040
[427483.145903] FSBase=7fcc13816700 GSBase=925adf24 TRBase=925adf244000
[427483.146766] GDTBase=925adf24c000 IDTBase=ff528000
[427483.147630] CR0=80050033 CR3=0010597b6000 CR4=001627e0
[427483.148498] Sysenter RSP= CS:RIP=0010:8f196cc0
[427483.149365] EFER = 0x0d01  PAT = 0x0007050600070106
[427483.150231] *** Control State ***
[427483.151077] PinBased=003f CPUBased=b6a1edfa SecondaryExec=0ceb
[427483.151942] EntryControls=d1ff ExitControls=002fefff
[427483.152800] ExceptionBitmap=00060042 PFECmask= PFECmatch=
[427483.153661] VMEntry: intr_info= errcode=0006 ilen=
[427483.154521] VMExit: intr_info= errcode= ilen=0004
[427483.155376]        reason=8021 qualification=
[427483.156230] IDTVectoring: info= errcode=
[427483.157068] TSC Offset = 0xfffccfc261506dd9
[427483.157905] TPR Threshold = 0x0d
[427483.158728] EPT pointer = 0x0009b437701e
[427483.159550] PLE Gap=0080 Window=0008
[427483.160370] Virtual processor ID = 0x0004


> On 16 Sep 2020, at 17:11, Vinícius Ferrão  wrote:
> 
> Hello,
> 
> I have an Exchange Server VM that’s going into suspend without possibility of 
> recovery. I need to click on shutdown and then power on. I can’t find 
> anything useful in the logs, except in “dmesg” on the host:
> 
> [47807.747606] *** Guest State ***
> [47807.747633] CR0: actual=0x00050032, shadow=0x00050032, gh_mask=fff7
> [47807.747671] CR4: actual=0x2050, shadow=0x, gh_mask=f871
> [47807.747721] CR3 = 0x001ad002
> [47807.747739] RSP = 0xc20904fa3770  RIP = 0x8000
> [47807.747766] RFLAGS=0x0002        DR7 = 0x0400
> [47807.747792] Sysenter RSP= CS:RIP=:
> [47807.747821] CS:  sel=0x9100, attr=0x08093, limit=0x, base=0x7ff91000
> [47807.747855] DS:  sel=0x, 

[ovirt-users] Re: How to discover why a VM is getting suspended without recovery possibility?

2020-09-20 Thread Vinícius Ferrão via Users
Hi, sorry to bump the thread.

But I still have this issue on the VM. The crashes are still happening, and I 
really don’t know what to do. Since there’s nothing in the logs, except for that 
message in `dmesg` on the host machine, I started changing settings to see if 
anything changes or if I can at least find a pattern.
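
One way to look for a pattern is to pull the timestamp of every VMCS dump out of the host's `dmesg` output and compare the gaps between them. Below is a minimal Python sketch of that idea. The function names `guest_state_dump_times` and `intervals` are made up for illustration, and the sample text only reuses marker lines quoted in this thread. Note that `dmesg` timestamps are seconds since host boot, so the gaps only mean something within a single boot of the same host.

```python
import re
from datetime import timedelta

# Marker line the kernel prints at the start of each dump quoted in this thread.
DUMP_MARKER = "*** Guest State ***"

# Matches a dmesg line of the form "[ 12345.678901] message".
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")


def guest_state_dump_times(dmesg_text):
    """Return the timestamps (seconds since boot) of each dump marker line."""
    times = []
    for line in dmesg_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and match.group(2).strip() == DUMP_MARKER:
            times.append(float(match.group(1)))
    return times


def intervals(times):
    """Return the gaps between consecutive dumps as timedelta objects."""
    return [timedelta(seconds=later - earlier)
            for earlier, later in zip(times, times[1:])]


# Sample input: lines copied from the two dumps in this thread. They may come
# from different hosts or boots, so this only demonstrates the parsing.
sample = """\
[47807.747606] *** Guest State ***
[47807.747633] CR0: actual=0x00050032, shadow=0x00050032,
[427483.126725] *** Guest State ***
[427483.143353] *** Host State ***
"""

times = guest_state_dump_times(sample)
gaps = intervals(times)
for gap in gaps:
    print("time between dumps:", gap)
```

Feeding it the full output of `dmesg` from one host (for example saved to a file and read with `open(...).read()`) would show whether the crashes are evenly spaced, which might hint at a scheduled task inside the guest.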

What I’ve tried:
1. Disabled I/O Threading on VM.
2. Increased I/O Threading to 2 from 1.
3. Disabled Memory Ballooning.
4. Reduced VM resources from 10 CPUs and 48GB of RAM to 6 CPUs and 24GB of 
RAM.
5. Moved the VM to another host.
6. Dedicated a host specifically to this VM.
7. Checked the storage system to see if there’s any resource starvation, but 
everything seems to be fine.
8. Checked both iSCSI switches to see if there’s something wrong with the 
fabrics: 0 errors.

I’m really running out of ideas. The VM was working normally and suddenly this 
started.

Thanks,

PS: When I was typing this message it crashed again:

[427483.126725] *** Guest State ***
[427483.127661] CR0: actual=0x00050032, shadow=0x00050032, gh_mask=fff7
[427483.128505] CR4: actual=0x2050, shadow=0x, gh_mask=f871
[427483.129342] CR3 = 0x0001849ff002
[427483.130177] RSP = 0xb10186b0  RIP = 0x8000
[427483.131014] RFLAGS=0x0002 DR7 = 0x0400
[427483.131859] Sysenter RSP= CS:RIP=:
[427483.132708] CS:   sel=0x9b00, attr=0x08093, limit=0x, base=0x7ff9b000
[427483.133559] DS:   sel=0x, attr=0x08093, limit=0x, base=0x
[427483.134413] SS:   sel=0x, attr=0x08093, limit=0x, base=0x
[427483.135237] ES:   sel=0x, attr=0x08093, limit=0x, base=0x
[427483.136040] FS:   sel=0x, attr=0x08093, limit=0x, base=0x
[427483.136842] GS:   sel=0x, attr=0x08093, limit=0x, base=0x
[427483.137629] GDTR:   limit=0x0057, base=0xb10186eb4fb0
[427483.138409] LDTR: sel=0x, attr=0x1, limit=0x000f, base=0x
[427483.139202] IDTR:   limit=0x, base=0x
[427483.139998] TR:   sel=0x0040, attr=0x0008b, limit=0x0067, base=0xb10186eb3000
[427483.140816] EFER = 0x  PAT = 0x0007010600070106
[427483.141650] DebugCtl = 0x  DebugExceptions = 0x
[427483.142503] Interruptibility = 0009  ActivityState = 
[427483.143353] *** Host State ***
[427483.144194] RIP = 0xc0c65024  RSP = 0x9253c0b9bc90
[427483.145043] CS=0010 SS=0018 DS= ES= FS= GS= TR=0040
[427483.145903] FSBase=7fcc13816700 GSBase=925adf24 TRBase=925adf244000
[427483.146766] GDTBase=925adf24c000 IDTBase=ff528000
[427483.147630] CR0=80050033 CR3=0010597b6000 CR4=001627e0
[427483.148498] Sysenter RSP= CS:RIP=0010:8f196cc0
[427483.149365] EFER = 0x0d01  PAT = 0x0007050600070106
[427483.150231] *** Control State ***
[427483.151077] PinBased=003f CPUBased=b6a1edfa SecondaryExec=0ceb
[427483.151942] EntryControls=d1ff ExitControls=002fefff
[427483.152800] ExceptionBitmap=00060042 PFECmask= PFECmatch=
[427483.153661] VMEntry: intr_info= errcode=0006 ilen=
[427483.154521] VMExit: intr_info= errcode= ilen=0004
[427483.155376] reason=8021 qualification=
[427483.156230] IDTVectoring: info= errcode=
[427483.157068] TSC Offset = 0xfffccfc261506dd9
[427483.157905] TPR Threshold = 0x0d
[427483.158728] EPT pointer = 0x0009b437701e
[427483.159550] PLE Gap=0080 Window=0008
[427483.160370] Virtual processor ID = 0x0004
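
For anyone reading the `VMExit: ... reason=` line in a dump like this: per the Intel SDM, the low 16 bits of the VMX exit reason field hold the basic exit reason and bit 31 flags a VM-entry failure. The short Python sketch below decodes such a value. The `reason=8021` shown above looks like it lost digits somewhere along the way, so the example value `0x80000021` is only an assumption used for illustration and is not confirmed by the log. The function name `decode_exit_reason` and the three-entry table are likewise just for this sketch.

```python
# A few basic exit reasons from the Intel SDM (VMX basic exit reasons appendix).
# This table is deliberately incomplete; it only covers VM-entry failures.
BASIC_EXIT_REASONS = {
    33: "VM-entry failure due to invalid guest state",
    34: "VM-entry failure due to MSR loading",
    41: "VM-entry failure due to machine-check event",
}


def decode_exit_reason(reason):
    """Split a 32-bit VMX exit reason into its basic reason and failure flag."""
    basic = reason & 0xFFFF                    # bits 15:0: basic exit reason
    entry_failure = bool(reason & (1 << 31))   # bit 31: VM-entry failure
    return {
        "basic": basic,
        "entry_failure": entry_failure,
        "description": BASIC_EXIT_REASONS.get(basic, "not in this short table"),
    }


# Assumed example value, NOT taken verbatim from the dump above.
decoded = decode_exit_reason(0x80000021)
print(decoded)
```

If the real value on the host does decode this way, it would point at the guest register state KVM tried to load rather than at storage or networking, which fits with nothing showing up on the iSCSI side.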


> On 16 Sep 2020, at 17:11, Vinícius Ferrão  wrote:
> 
> Hello,
> 
> I have an Exchange Server VM that’s going into suspend without possibility of 
> recovery. I need to click on shutdown and then power on. I can’t find 
> anything useful in the logs, except in “dmesg” on the host:
> 
> [47807.747606] *** Guest State ***
> [47807.747633] CR0: actual=0x00050032, shadow=0x00050032, gh_mask=fff7
> [47807.747671] CR4: actual=0x2050, shadow=0x, gh_mask=f871
> [47807.747721] CR3 = 0x001ad002
> [47807.747739] RSP = 0xc20904fa3770  RIP = 0x8000
> [47807.747766] RFLAGS=0x0002 DR7 = 0x0400
> [47807.747792] Sysenter RSP= CS:RIP=:
> [47807.747821] CS:   sel=0x9100, attr=0x08093, limit=0x, base=0x7ff91000
> [47807.747855] DS:   sel=0x, attr=0x08093, limit=0x, base=0x
> [47807.747889] SS:   sel=0x, attr=0x08093, limit=0x, base=0x
> [47807.747923] ES:   sel=0x,