[ovirt-users] VM has been paused due to unknown storage error. - Only on NFS / EMC

2018-08-22 Thread jeanbaptiste
Hello,

I'm facing a strange issue on my OVirt Dev pool.

When I generate a high disk load inside a VM (a kickstart installation or an 
iozone test, for example), the VM is paused due to a storage I/O error.
The problem is 100% reproducible, and it occurs only over NFS (v3 and v4) on my 
EMC VNXe3200 NAS units (I have a 10TB and a 20TB NAS).
I ran a test (a simple iozone -a) with a VM with 1 vCPU / 2GB RAM and 2 disks 
(1x20GB + 1x10GB). Both of the VM's disks are placed on the same SAN / NAS for 
each test. Results:
- EMC VNXe3200 (10TB) NFSv3 => VM paused 10-30s after the iozone launch
- EMC VNXe3200 (20TB) NFSv3 => VM paused 10-30s after the iozone launch
- EMC VNXe3200 (10TB) iSCSI => no problem; the iozone test finishes, and 
performance is "standard" given the load on the VNXe (60MB/s sequential write, 
for reference)
- EMC VNXe3200 (20TB) iSCSI => no problem; the iozone test finishes, and 
performance is "standard" given the load on the VNXe (40-60MB/s sequential 
write, for reference)
- NetApp FAS2240 NFSv3 => no problem; the iozone test finishes, and performance 
is good (100MB/s sequential write, for reference)
- FreeBSD 10 NAS NFSv3 => no problem; the iozone test finishes, and performance 
is good for the NAS configuration (80MB/s sequential write, for reference)
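For what it's worth, the high-load reproducer does not strictly need iozone; a plain dd with direct I/O inside the guest generates a comparable sequential-write load. A minimal sketch (file name and size are arbitrary; increase count= for a sustained load):

```shell
# Sequential-write probe, roughly equivalent to iozone's write phase.
# oflag=direct bypasses the guest page cache so the writes hit storage
# immediately; fall back to buffered I/O if the filesystem rejects O_DIRECT.
dd if=/dev/zero of=/tmp/io_probe bs=1M count=64 oflag=direct 2>/dev/null \
  || dd if=/dev/zero of=/tmp/io_probe bs=1M count=64 2>/dev/null
size=$(stat -c %s /tmp/io_probe)
echo "$size"
rm -f /tmp/io_probe
```

With bs=1M count=64 the probe writes 64MB; on the affected NFS domains a larger count (several GB) should trigger the same pause within seconds.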

I can't explain why I have an issue over NFS but no issue over iSCSI (on the 
same EMC VNXe3200...).
The default NFS parameters were kept when the storage was added to the 
datacenter:
datacenter:(rw,relatime,vers=4.0,rsize=131072,wsize=131072,namlen=255,soft,nosharecache,proto=tcp,port=0,timeo=600,retrans=6,sec=sys,clientaddr=XXX,local_lock=none,addr=XX)
(rw,relatime,vers=3,rsize=131072,wsize=131072,namlen=255,soft,nolock,nosharecache,proto=tcp,timeo=600,retrans=6,sec=sys,mountaddr=XX,mountvers=3,mountport=1234,mountproto=udp,local_lock=all,addr=XX)
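Note that both option strings include "soft". On a soft NFS mount, an RPC that still fails once the retransmission budget (retrans=6) is exhausted returns EIO to the caller, here qemu, which is exactly the condition that makes oVirt pause the VM; a hard mount would retry indefinitely instead. A small self-contained sketch of that check (the option string below is abbreviated from the v3 line above):

```shell
# Classify an NFS mount by its soft/hard failure behaviour. timeo is in
# tenths of a second; with soft, exhausting the retrans retries surfaces
# EIO to the application instead of blocking forever.
opts='rw,relatime,vers=3,rsize=131072,wsize=131072,soft,nolock,proto=tcp,timeo=600,retrans=6'
case ",$opts," in
  *,soft,*) verdict='soft mount: RPC timeouts surface as EIO to qemu' ;;
  *)        verdict='hard mount: RPC timeouts retry indefinitely' ;;
esac
echo "$verdict"
```

This would fit the symptom: the VNXe briefly stops answering NFS RPCs under load, the soft mount gives up, and qemu sees an I/O error, while the iSCSI path has different timeout handling and rides it out.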

The debug logs on the host don't help me much:
2018-08-22 15:36:13,883+0200 INFO  (periodic/22) [vdsm.api] START 
multipath_health() from=internal, task_id=53da2eca-eb66-400c-8367-ab62cedc5dc1 
(api:46)
2018-08-22 15:36:13,883+0200 INFO  (periodic/22) [vdsm.api] FINISH 
multipath_health return={} from=internal, 
task_id=53da2eca-eb66-400c-8367-ab62cedc5dc1 (api:52)
2018-08-22 15:36:15,161+0200 INFO  (libvirt/events) [virt.vm] 
(vmId='b139a9b9-16bc-40ee-ba84-d1d59e5ce17a') abnormal vm stop device 
ua-179375b0-0a18-4fcb-a884-4aeb1c8fed97 error eother (vm:5116)
2018-08-22 15:36:15,161+0200 INFO  (libvirt/events) [virt.vm] 
(vmId='b139a9b9-16bc-40ee-ba84-d1d59e5ce17a') CPU stopped: onIOError (vm:6157)
2018-08-22 15:36:15,162+0200 DEBUG (libvirt/events) [virt.metadata.Descriptor] 
values: {'minGuaranteedMemoryMb': 1024, 'clusterVersion': '4.2', 
'resumeBehavior': 'auto_resume', 'memGuaranteedSize': 1024, 
'launchPaused': 'false', 'startTime': 1534944832.058459, 
'destroy_on_reboot': False, 'pauseTime': 4999289.49} (metadata:596)
2018-08-22 15:36:15,162+0200 DEBUG (libvirt/events) [virt.metadata.Descriptor] 
values updated: {'minGuaranteedMemoryMb': 1024, 'clusterVersion': '4.2', 
'resumeBehavior': 'auto_resume', 'memGuaranteedSize': 1024, 
'launchPaused': 'false', 'startTime': 1534944832.058459, 
'destroy_on_reboot': False, 'pauseTime': 4999289.49} (metadata:601)

2018-08-22 15:36:15,168+0200 DEBUG (libvirt/events) [virt.metadata.Descriptor] 
dumped metadata for b139a9b9-16bc-40ee-ba84-d1d59e5ce17a: 



2018-08-22 15:36:15,169+0200 DEBUG (libvirt/events) [virt.vm] 
(vmId='b139a9b9-16bc-40ee-ba84-d1d59e5ce17a') event Suspended detail 2 opaque 
None (vm:5520)
2018-08-22 15:36:15,169+0200 INFO  (libvirt/events) [virt.vm] 
(vmId='b139a9b9-16bc-40ee-ba84-d1d59e5ce17a') CPU stopped: onSuspend (vm:6157)
2018-08-22 15:36:15,174+0200 WARN  (libvirt/events) [virt.vm] 
(vmId='b139a9b9-16bc-40ee-ba84-d1d59e5ce17a') device sda reported I/O error 
(vm:4065)
2018-08-22 15:36:15,340+0200 DEBUG (vmchannels) [virt.vm] 
(vmId='46d496af-e2d0-4caa-9a13-10c624f265d8') Guest's message heartbeat: 
{u'memory-stat': {u'swap_out': 0, u'majflt': 0, u'swap_usage': 0, 
u'mem_cached': 119020, u'mem_free': 3693900, u'mem_buffers': 2108, 
u'swap_in': 0, u'swap_total': 8257532, u'pageflt': 141, 
u'mem_total': 3878980, u'mem_unused': 3572772}, u'free-ram': u'3607', 
u'apiVersion': 3} (guestagent:337)
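The relevant events can be pulled out of the host log with a simple filter. A sketch (a shortened sample line is inlined so the pipeline is self-contained; on a live host, point grep at /var/log/vdsm/vdsm.log instead):

```shell
# Filter for the three events vdsm logs when qemu reports an I/O error:
# the abnormal stop, the CPU pause, and the per-device error line.
sample='2018-08-22 15:36:15,161+0200 INFO (libvirt/events) [virt.vm] abnormal vm stop device ua-179375b0 error eother (vm:5116)'
printf '%s\n' "$sample" | grep -E 'abnormal vm stop|onIOError|reported I/O error'
# On a live host:
#   grep -E 'abnormal vm stop|onIOError|reported I/O error' /var/log/vdsm/vdsm.log
```

The "error eother" reason means libvirt could not classify the underlying I/O error as something more specific (such as ENOSPC), so the root cause has to come from the storage side.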


Do you have any ideas?
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/74HTXVBQEYTF4C5ZTH4FR3UQ3SM7LBQU/


Re: [ovirt-users] VM has been paused due to unknown storage error

2015-07-16 Thread Konstantinos Christidis

Hello Mario,

On 16/07/2015 4:12 PM, m...@ohnewald.net wrote:
Check your vdsm logs on your nodes. I bet you'll find something about I/O 
errors...

Yes, there are many I/O errors:
libvirtEventLoop::INFO::2015-07-16 
22:30:02,237::vm::3609::virt.vm::(onIOError) 
vmId=`bb46929c-0b4e-4f01-868a-7e7638fa943b`::abnormal vm stop device 
virtio-disk0 error eother
libvirtEventLoop::INFO::2015-07-16 
22:30:02,237::vm::4889::virt.vm::(_logGuestCpuStatus) 
vmId=`bb46929c-0b4e-4f01-868a-7e7638fa943b`::CPU stopped: onIOError

Full vdsm log - https://paste.fedoraproject.org/245148/43707759/


and GlusterFS errors:
W [MSGID: 114031] [client-rpc-fops.c:2973:client3_3_lookup_cbk] 
0-distributed_vol-client-0: remote operation failed: Transport endpoint 
is not connected. Path: / (----0001) 
[Transport endpoint is not connected]
W [fuse-bridge.c:2273:fuse_writev_cbk] 0-glusterfs-fuse: 362694: WRITE 
=> -1 (Transport endpoint is not connected)
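"Transport endpoint is not connected" means the FUSE client lost its connection to a brick, and the "client-N" index in the log names which one: the client translators are numbered in the same order as the bricks in the volume definition. A sketch of the first check (sample log line inlined; the live-cluster commands are shown as comments since they need a running gluster host):

```shell
# Extract which brick connection dropped from a client-side log line.
# "client-0" corresponds to the first brick listed in `gluster volume info`.
sample='W [MSGID: 114031] 0-distributed_vol-client-0: remote operation failed: Transport endpoint is not connected.'
printf '%s\n' "$sample" | grep -o 'client-[0-9]*' | sort -u
# On a gluster host, follow up with:
#   gluster volume status distributed_vol
#   gluster peer status
```

If the same client index shows up repeatedly, the brick process or the network path to that one host is the place to look.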



K.



Also check your glusterfs logs. Maybe you can find some problems, too.

Mario



On 16.07.15 at 10:29, Konstantinos Christidis wrote:

Hello oVirt users,

I am facing a serious problem regarding my GlusterFS storage and virtual
machines that have *bootable* disks on this storage.

All my VMs that have GlusterFS disks are occasionally (1-2 times/hour)
becoming paused with the following Error: VM vm02.mytld has been paused
due to unknown storage error.

Engine Log
INFO  [org.ovirt.engine.core.vdsbroker.VmAnalyzer]
(DefaultQuartzScheduler_Worker-69) [] VM
'247bb0f3-1a77-44e4-a404-3271eaee94be'(vm02.mytld) moved from 'Up' -->
'Paused'
INFO
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(DefaultQuartzScheduler_Worker-69) [] Correlation ID: null, Call Stack:
null, Custom Event ID: -1, Message: VM vm02.mytld has been paused.
ERROR
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(DefaultQuartzScheduler_Worker-69) [] Correlation ID: null, Call Stack:
null, Custom Event ID: -1, Message: VM vm02.mytld has been paused due to
unknown storage error

My iSCSI VMs, some of which may have mounted (non-bootable) disks from
the same GlusterFS storage, do NOT suffer from this issue, AFAIK.

My installation (oVirt 3.6/CentOS 7) is pretty much a typical one:
a GlusterFS-enabled cluster with 4 hosts, 2-3 networks, and 6-7 VMs.

Thanks,

K.
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users




Re: [ovirt-users] VM has been paused due to unknown storage error

2015-07-16 Thread m...@ohnewald.net
Check your vdsm logs on your nodes. I bet you'll find something about I/O 
errors...


Also check your glusterfs logs. Maybe you can find some problems, too.

Mario





Re: [ovirt-users] VM has been paused due to unknown storage error

2015-07-16 Thread Amit Aviram
Hi again Konstantinos.

Can you please attach the full VDSM & Engine logs so we can understand the 
reason for your problem?

Thanks.



[ovirt-users] VM has been paused due to unknown storage error

2015-07-16 Thread Konstantinos Christidis

Hello oVirt users,

I am facing a serious problem regarding my GlusterFS storage and virtual 
machines that have *bootable* disks on this storage.


All my VMs that have GlusterFS disks are occasionally (1-2 times/hour) 
becoming paused with the following Error: VM vm02.mytld has been paused 
due to unknown storage error.


Engine Log
INFO  [org.ovirt.engine.core.vdsbroker.VmAnalyzer] 
(DefaultQuartzScheduler_Worker-69) [] VM 
'247bb0f3-1a77-44e4-a404-3271eaee94be'(vm02.mytld) moved from 'Up' --> 
'Paused'
INFO 
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(DefaultQuartzScheduler_Worker-69) [] Correlation ID: null, Call Stack: 
null, Custom Event ID: -1, Message: VM vm02.mytld has been paused.
ERROR 
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(DefaultQuartzScheduler_Worker-69) [] Correlation ID: null, Call Stack: 
null, Custom Event ID: -1, Message: VM vm02.mytld has been paused due to 
unknown storage error


My iSCSI VMs, some of which may have mounted (non-bootable) disks from 
the same GlusterFS storage, do NOT suffer from this issue, AFAIK.

My installation (oVirt 3.6/CentOS 7) is pretty much a typical one: 
a GlusterFS-enabled cluster with 4 hosts, 2-3 networks, and 6-7 VMs.


Thanks,

K.