[ovirt-users] VM has been paused due to unknown storage error. - Only on NFS / EMC
Hello,

I'm facing a strange issue on my oVirt dev pool.

When I put a VM under high disk load (a kickstart installation or an iozone test, for example), the VM is paused due to a storage I/O error. The problem is 100% reproducible and occurs only on NFS (v3 and v4) on my EMC VNXe3200 NAS units (I have a 10TB and a 20TB NAS).

I ran a test (a simple iozone -a) with a VM with 1 vCPU, 2GB RAM and 2 disks (1*20GB + 1*10GB). Both of the VM's disks are placed on the same SAN / NAS for each test. The results are:

- EMC VNXe3200 (10TB) NFSv3 => VM stopped 10-30s after iozone launch
- EMC VNXe3200 (20TB) NFSv3 => VM stopped 10-30s after iozone launch
- EMC VNXe3200 (10TB) iSCSI => No problem, the iozone test finishes, and performance is "standard" given the load on the VNXe (60MB/s sequential write for info)
- EMC VNXe3200 (20TB) iSCSI => No problem, the iozone test finishes, and performance is "standard" given the load on the VNXe (40-60MB/s sequential write for info)
- NETAPP FAS2240 NFSv3 => No problem, the iozone test finishes, and performance is good (100MB/s sequential write for info)
- FreeBSD 10 NAS NFSv3 => No problem, the iozone test finishes, and performance is good given the NAS configuration (80MB/s sequential write for info)

I can't explain why I have an issue on NFS but no issue on iSCSI (on the same EMC VNXe3200).
The default NFS parameters are kept when the storage is added to the datacenter:

(rw,relatime,vers=4.0,rsize=131072,wsize=131072,namlen=255,soft,nosharecache,proto=tcp,port=0,timeo=600,retrans=6,sec=sys,clientaddr=XXX,local_lock=none,addr=XX)
(rw,relatime,vers=3,rsize=131072,wsize=131072,namlen=255,soft,nolock,nosharecache,proto=tcp,timeo=600,retrans=6,sec=sys,mountaddr=XX,mountvers=3,mountport=1234,mountproto=udp,local_lock=all,addr=XX)

The debug logs on the host do not help me much:

2018-08-22 15:36:13,883+0200 INFO (periodic/22) [vdsm.api] START multipath_health() from=internal, task_id=53da2eca-eb66-400c-8367-ab62cedc5dc1 (api:46)
2018-08-22 15:36:13,883+0200 INFO (periodic/22) [vdsm.api] FINISH multipath_health return={} from=internal, task_id=53da2eca-eb66-400c-8367-ab62cedc5dc1 (api:52)
2018-08-22 15:36:15,161+0200 INFO (libvirt/events) [virt.vm] (vmId='b139a9b9-16bc-40ee-ba84-d1d59e5ce17a') abnormal vm stop device ua-179375b0-0a18-4fcb-a884-4aeb1c8fed97 error eother (vm:5116)
2018-08-22 15:36:15,161+0200 INFO (libvirt/events) [virt.vm] (vmId='b139a9b9-16bc-40ee-ba84-d1d59e5ce17a') CPU stopped: onIOError (vm:6157)
2018-08-22 15:36:15,162+0200 DEBUG (libvirt/events) [virt.metadata.Descriptor] values: {'minGuaranteedMemoryMb': 1024, 'clusterVersion': '4.2', 'resumeBehavior': 'auto_resume', 'memGuaranteedSize': 1024, 'launchPaused': 'false', 'startTime': 1534944832.058459, 'destroy_on_reboot': False, 'pauseTime': 4999289.49} (metadata:596)
2018-08-22 15:36:15,162+0200 DEBUG (libvirt/events) [virt.metadata.Descriptor] values updated: {'minGuaranteedMemoryMb': 1024, 'clusterVersion': '4.2', 'resumeBehavior': 'auto_resume', 'memGuaranteedSize': 1024, 'launchPaused': 'false', 'startTime': 1534944832.058459, 'destroy_on_reboot': False, 'pauseTime': 4999289.49} (metadata:601)
2018-08-22 15:36:15,168+0200 DEBUG (libvirt/events) [virt.metadata.Descriptor] dumped metadata for b139a9b9-16bc-40ee-ba84-d1d59e5ce17a:
2018-08-22 15:36:15,169+0200 DEBUG (libvirt/events) [virt.vm] (vmId='b139a9b9-16bc-40ee-ba84-d1d59e5ce17a') event Suspended detail 2 opaque None (vm:5520)
2018-08-22 15:36:15,169+0200 INFO (libvirt/events) [virt.vm] (vmId='b139a9b9-16bc-40ee-ba84-d1d59e5ce17a') CPU stopped: onSuspend (vm:6157)
2018-08-22 15:36:15,174+0200 WARN (libvirt/events) [virt.vm] (vmId='b139a9b9-16bc-40ee-ba84-d1d59e5ce17a') device sda reported I/O error (vm:4065)
2018-08-22 15:36:15,340+0200 DEBUG (vmchannels) [virt.vm] (vmId='46d496af-e2d0-4caa-9a13-10c624f265d8') Guest's message heartbeat: {u'memory-stat': {u'swap_out': 0, u'majflt': 0, u'swap_usage': 0, u'mem_cached': 119020, u'mem_free': 3693900, u'mem_buffers': 2108, u'swap_in': 0, u'swap_total': 8257532, u'pageflt': 141, u'mem_total': 3878980, u'mem_unused': 3572772}, u'free-ram': u'3607', u'apiVersion': 3} (guestagent:337)

Do you have any ideas?

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/74HTXVBQEYTF4C5ZTH4FR3UQ3SM7LBQU/
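A note on the mount strings in the message above: both use soft together with timeo=600 and retrans=6. Per nfs(5), timeo is given in tenths of a second, and on a soft mount the client fails the request and returns an error to the calling application once retrans retransmissions have been sent. The logs shown do not confirm that this is what pauses the VM, so treat it only as something worth checking. The Python sketch below simply decodes such an option string to make the timeout settings easier to read; the function names are hypothetical helpers, not part of oVirt or vdsm.

```python
def parse_mount_options(option_string):
    """Split a comma-separated mount option string into a dict.

    Flags without a value (for example "soft") map to True.
    """
    options = {}
    for item in option_string.strip("()").split(","):
        key, sep, value = item.partition("=")
        options[key] = value if sep else True
    return options


def summarize_nfs_timeouts(options):
    """Return the timeout-related settings in friendlier units."""
    return {
        "soft": bool(options.get("soft", False)),
        # nfs(5): timeo is expressed in tenths of a second.
        "timeo_seconds": int(options["timeo"]) / 10,
        "retrans": int(options["retrans"]),
    }


# NFSv3 option string copied from the post (addresses were already masked as XX).
NFS3_OPTIONS = (
    "(rw,relatime,vers=3,rsize=131072,wsize=131072,namlen=255,soft,nolock,"
    "nosharecache,proto=tcp,timeo=600,retrans=6,sec=sys,mountaddr=XX,"
    "mountvers=3,mountport=1234,mountproto=udp,local_lock=all,addr=XX)"
)

summary = summarize_nfs_timeouts(parse_mount_options(NFS3_OPTIONS))
print(summary)  # {'soft': True, 'timeo_seconds': 60.0, 'retrans': 6}
```

The same helper can be pointed at the options column of /proc/mounts on a host to compare the EMC mounts with the NetApp and FreeBSD ones.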
Re: [ovirt-users] VM has been paused due to unknown storage error
Hello Mario,

On 16/07/2015 04:12 PM, m...@ohnewald.net wrote:
> Check your vdsm Logs on your nodes. I bet you find something about I/O errors i guess...

Yes, there are many I/O errors:

libvirtEventLoop::INFO::2015-07-16 22:30:02,237::vm::3609::virt.vm::(onIOError) vmId=`bb46929c-0b4e-4f01-868a-7e7638fa943b`::abnormal vm stop device virtio-disk0 error eother
libvirtEventLoop::INFO::2015-07-16 22:30:02,237::vm::4889::virt.vm::(_logGuestCpuStatus) vmId=`bb46929c-0b4e-4f01-868a-7e7638fa943b`::CPU stopped: onIOError

Full vdsm log - https://paste.fedoraproject.org/245148/43707759/

and glusterfs errors:

W [MSGID: 114031] [client-rpc-fops.c:2973:client3_3_lookup_cbk] 0-distributed_vol-client-0: remote operation failed: Transport endpoint is not connected. Path: / (----0001) [Transport endpoint is not connected]
W [fuse-bridge.c:2273:fuse_writev_cbk] 0-glusterfs-fuse: 362694: WRITE => -1 (Transport endpoint is not connected)

K.

> Also check your glusterfs logs. Maybe you can find some problems, too.
>
> Mario
>
> On 16.07.15 at 10:29, Konstantinos Christidis wrote:
>> Hello oVirt users,
>>
>> I am facing a serious problem regarding my GlusterFS storage and virtual machines that have *bootable* disks on this storage.
>>
>> All my VMs that have GlusterFS disks are occasionally (1-2 times/hour) becoming paused with the following Error:
>>
>> VM vm02.mytld has been paused due to unknown storage error.
>>
>> Engine Log
>>
>> INFO [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-69) [] VM '247bb0f3-1a77-44e4-a404-3271eaee94be'(vm02.mytld) moved from 'Up' --> 'Paused'
>> INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-69) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM vm02.mytld has been paused.
>> ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-69) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM vm02.mytld has been paused due to unknown storage error
>>
>> My iSCSI VM's, some of which may have mounted (not bootable) disks from the same GlusterFS storage, do NOT suffer from this issue AFAIK.
>>
>> My installation (oVirt 3.6/CentOS 7) is pretty much a typical one, with a GlusterFS enabled cluster with 4 hosts, 2-3 networks, and 6-7 VMs..
>>
>> Thanks,
>>
>> K.

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
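When the vdsm log is long, it can help to pull out just the onIOError events and see which VM and which disk device they hit. The Python sketch below does that for the older "::"-separated vdsm log format quoted in the reply above. The regular expression is written against those two sample lines only, so it is an assumption that it fits other vdsm versions; find_io_errors is a hypothetical helper name.

```python
import re

# Pattern written against the 2015-era vdsm log lines quoted in this thread.
IO_ERROR_RE = re.compile(
    r"::(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})::"
    r".*?\(onIOError\) vmId=`(?P<vm_id>[0-9a-f-]+)`::"
    r"abnormal vm stop device (?P<device>\S+) error (?P<error>\S+)"
)


def find_io_errors(lines):
    """Return one dict per 'abnormal vm stop' line found in the input."""
    events = []
    for line in lines:
        match = IO_ERROR_RE.search(line)
        if match:
            events.append(match.groupdict())
    return events


# The two log lines from the reply; only the first is an onIOError event line.
SAMPLE = [
    "libvirtEventLoop::INFO::2015-07-16 22:30:02,237::vm::3609::virt.vm::"
    "(onIOError) vmId=`bb46929c-0b4e-4f01-868a-7e7638fa943b`::"
    "abnormal vm stop device virtio-disk0 error eother",
    "libvirtEventLoop::INFO::2015-07-16 22:30:02,237::vm::4889::virt.vm::"
    "(_logGuestCpuStatus) vmId=`bb46929c-0b4e-4f01-868a-7e7638fa943b`::"
    "CPU stopped: onIOError",
]

for event in find_io_errors(SAMPLE):
    print(event["timestamp"], event["vm_id"], event["device"], event["error"])
```

To run it against a real log, pass an open file object instead of SAMPLE; the timestamps can then be lined up with the "Transport endpoint is not connected" warnings in the glusterfs log.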
Re: [ovirt-users] VM has been paused due to unknown storage error
Check your vdsm logs on your nodes. I bet you will find something about I/O errors, I guess...

Also check your glusterfs logs. Maybe you can find some problems there, too.

Mario
Re: [ovirt-users] VM has been paused due to unknown storage error
Hi again Konstantinos.

Can you please attach the full VDSM & Engine logs so we can understand the reason for your problem?

Thanks.
[ovirt-users] VM has been paused due to unknown storage error
Hello oVirt users,

I am facing a serious problem regarding my GlusterFS storage and virtual machines that have *bootable* disks on this storage.

All my VMs that have GlusterFS disks are occasionally (1-2 times/hour) becoming paused with the following error:

VM vm02.mytld has been paused due to unknown storage error.

Engine log:

INFO [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-69) [] VM '247bb0f3-1a77-44e4-a404-3271eaee94be'(vm02.mytld) moved from 'Up' --> 'Paused'
INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-69) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM vm02.mytld has been paused.
ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-69) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM vm02.mytld has been paused due to unknown storage error

My iSCSI VMs, some of which may have mounted (not bootable) disks from the same GlusterFS storage, do NOT suffer from this issue AFAIK.

My installation (oVirt 3.6/CentOS 7) is pretty much a typical one, with a GlusterFS-enabled cluster with 4 hosts, 2-3 networks, and 6-7 VMs.

Thanks,

K.
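To put a number on "1-2 times/hour", the engine log can be scanned for the VmAnalyzer state transitions shown in the message above. The Python sketch below counts transitions into 'Paused' per VM name. The pattern is derived only from the single sample line in the post, and count_pauses is a hypothetical helper, so adjust it if the real engine.log lines differ.

```python
import re
from collections import Counter

# Pattern derived from the VmAnalyzer line quoted in the post.
TRANSITION_RE = re.compile(
    r"VM '(?P<vm_id>[0-9a-f-]+)'\((?P<name>[^)]+)\) "
    r"moved from '(?P<old>\w+)' --> '(?P<new>\w+)'"
)


def count_pauses(lines):
    """Count how many times each VM moved into the 'Paused' state."""
    pauses = Counter()
    for line in lines:
        match = TRANSITION_RE.search(line)
        if match and match.group("new") == "Paused":
            pauses[match.group("name")] += 1
    return pauses


# Two of the engine log lines from the post; only the first is a transition.
SAMPLE = [
    "INFO [org.ovirt.engine.core.vdsbroker.VmAnalyzer] "
    "(DefaultQuartzScheduler_Worker-69) [] VM "
    "'247bb0f3-1a77-44e4-a404-3271eaee94be'(vm02.mytld) "
    "moved from 'Up' --> 'Paused'",
    "INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] "
    "(DefaultQuartzScheduler_Worker-69) [] Correlation ID: null, Call Stack: null, "
    "Custom Event ID: -1, Message: VM vm02.mytld has been paused.",
]

print(count_pauses(SAMPLE))  # Counter({'vm02.mytld': 1})
```

Comparing the per-VM counts for GlusterFS-backed and iSCSI-backed VMs would show whether the pauses really are limited to the VMs that boot from GlusterFS.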