On Sat, Mar 28, 2020 at 5:00 AM Gianluca Cecchi <gianluca.cec...@gmail.com> wrote: ... > Further information. > What I see around time frame in gluster brick log file > gluster_bricks-vmstore-vmstore.log (timestamp is behind 1 hour in log file) > > [2020-03-27 23:30:38.575808] I [MSGID: 101055] > [client_t.c:436:gf_client_unref] 0-vmstore-server: Shutting down connection > CTX_ID:6e8f70b8-1946-4505-860f-be90e5807cb3-GRAPH_ID:0-PID:223418-HOST:ovirt.mydomain.local-PC_NAME:vmstore-client-0-RECON_NO:-0 > [2020-03-27 23:35:15.281449] E [MSGID: 113072] > [posix-inode-fd-ops.c:1886:posix_writev] 0-vmstore-posix: write failed: > offset 0, [Invalid argument] > [2020-03-27 23:35:15.281545] E [MSGID: 115067] > [server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-vmstore-server: 34139378: > WRITEV 10 (00d9fe81-8a31-498e-8401-7b9d1477378e), client: > CTX_ID:d04437ba-ef98-43df-864f-5e9d3738620a-GRAPH_ID:0-PID:27687-HOST:ovirt.mydomain.local-PC_NAME:vmstore-client-0-RECON_NO:-0, > error-xlator: vmstore-posix [Invalid argument] > [2020-03-27 23:40:15.415794] E [MSGID: 113072] > [posix-inode-fd-ops.c:1886:posix_writev] 0-vmstore-posix: write failed: > offset 0, [Invalid argument]
Invalid arguments are expected when activating a storage domain, and every 5 minutes when storage domain are refreshed. The writes are performed to to a temporary file at /rhev/data-center/mnt/server:_path/.prob-random-uuid These logs do not show the path, so we don't know if the writes are related to block size probing. But in vdsm log we see: 2020-03-27 00:40:08,979+0100 INFO (monitor/665ff83) [storage.StorageDomain] Removing remnants of deleted images [] (fileSD:726) This call happens when vdsm is refreshing storage domain. Right before this log, vdsm try to detect the underlying storage block size. So it looks like the gluster logs are related to block size probing and are not related to the I/O error that caused the VM to pause. Looking at both "abnormal vm stop" and storage refresh events: $ egrep 'Removing remnants of deleted images|abnormal vm stop' vdsm.log.18 2020-03-27 00:20:08,555+0100 INFO (monitor/665ff83) [storage.StorageDomain] Removing remnants of deleted images [] (fileSD:726) 2020-03-27 00:24:38,254+0100 INFO (monitor/0cb6ade) [storage.StorageDomain] Removing remnants of deleted images [] (fileSD:726) 2020-03-27 00:24:47,799+0100 INFO (monitor/81b9724) [storage.StorageDomain] Removing remnants of deleted images [] (fileSD:726) 2020-03-27 00:25:08,660+0100 INFO (monitor/665ff83) [storage.StorageDomain] Removing remnants of deleted images [] (fileSD:726) 2020-03-27 00:29:38,344+0100 INFO (monitor/0cb6ade) [storage.StorageDomain] Removing remnants of deleted images [] (fileSD:726) 2020-03-27 00:29:47,901+0100 INFO (monitor/81b9724) [storage.StorageDomain] Removing remnants of deleted images [] (fileSD:726) 2020-03-27 00:30:08,761+0100 INFO (monitor/665ff83) [storage.StorageDomain] Removing remnants of deleted images [] (fileSD:726) 2020-03-27 00:34:38,436+0100 INFO (monitor/0cb6ade) [storage.StorageDomain] Removing remnants of deleted images [] (fileSD:726) 2020-03-27 00:34:48,004+0100 INFO (monitor/81b9724) [storage.StorageDomain] Removing remnants of deleted images [] (fileSD:726) 2020-03-27 00:35:08,877+0100 INFO (monitor/665ff83) [storage.StorageDomain] Removing remnants of deleted images [] (fileSD:726) 2020-03-27 00:35:23,817+0100 INFO (libvirt/events) [virt.vm] (vmId='1abeafb6-72b2-4893-9cc5-41846b737670') abnormal vm stop device ua-3b753210-09f5-4c40-90fb-ded93b00d19f error eother (vm:5079) VM disk is on: /rhev/data-center/mnt/glusterSD/ovirtst.mydomain.storage:_vmstore/81b97244-4b69-4d49-84c4-c822387adc6a/images/3b753210-09f5-4c40-90fb-ded93b00d19f/5a4aac90-a455-41b4-80dd-cf8c1ed81893 2020-03-27 00:39:38,536+0100 INFO (monitor/0cb6ade) [storage.StorageDomain] Removing remnants of deleted images [] (fileSD:726) 2020-03-27 00:39:48,089+0100 INFO (monitor/81b9724) [storage.StorageDomain] Removing remnants of deleted images [] (fileSD:726) 2020-03-27 00:40:08,979+0100 INFO (monitor/665ff83) [storage.StorageDomain] Removing remnants of deleted images [] (fileSD:726) 2020-03-27 00:44:38,623+0100 INFO (monitor/0cb6ade) [storage.StorageDomain] Removing remnants of deleted images [] (fileSD:726) 2020-03-27 00:44:48,166+0100 INFO (monitor/81b9724) [storage.StorageDomain] Removing remnants of deleted images [] (fileSD:726) 2020-03-27 00:45:09,107+0100 INFO (monitor/665ff83) [storage.StorageDomain] Removing remnants of deleted images [] (fileSD:726) 2020-03-27 00:49:38,719+0100 INFO (monitor/0cb6ade) [storage.StorageDomain] Removing remnants of deleted images [] (fileSD:726) 2020-03-27 00:49:48,246+0100 INFO (monitor/81b9724) [storage.StorageDomain] Removing remnants of deleted images [] (fileSD:726) 2020-03-27 00:50:09,234+0100 INFO (monitor/665ff83) [storage.StorageDomain] Removing remnants of deleted images [] (fileSD:726) 2020-03-27 00:53:07,405+0100 INFO (libvirt/events) [virt.vm] (vmId='6a779081-abd1-4bb1-9e9c-97f334656131') abnormal vm stop device ua-0a91c346-23a5-4432-8af7-ae0a28f9c208 error eother (vm:5079) VM disk is on: /rhev/data-center/mnt/glusterSD/ovirtst.mydomain.storage:_vmstore/81b97244-4b69-4d49-84c4-c822387adc6a/images/0a91c346-23a5-4432-8af7-ae0a28f9c208/2741af0b-27fe-4f7b-a8bc-8b34b9e31cb6 2020-03-27 00:54:15,546+0100 INFO (libvirt/events) [virt.vm] (vmId='0df701f4-2ae2-4b39-9445-d011673ddf28') abnormal vm stop device ua-73776048-27ba-412d-a01c-9a46d267401c error eother (vm:5079) VM disk is on: /rhev/data-center/mnt/glusterSD/ovirtst.mydomain.storage:_vmstore/81b97244-4b69-4d49-84c4-c822387adc6a/images/73776048-27ba-412d-a01c-9a46d267401c/b3bc914f-8882-43e1-9dd0-dd7e43d60084 2020-03-27 00:54:38,830+0100 INFO (monitor/0cb6ade) [storage.StorageDomain] Removing remnants of deleted images [] (fileSD:726) 2020-03-27 00:54:48,344+0100 INFO (monitor/81b9724) [storage.StorageDomain] Removing remnants of deleted images [] (fileSD:726) 2020-03-27 00:55:09,361+0100 INFO (monitor/665ff83) [storage.StorageDomain] Removing remnants of deleted images [] (fileSD:726) 2020-03-27 00:59:38,933+0100 INFO (monitor/0cb6ade) [storage.StorageDomain] Removing remnants of deleted images [] (fileSD:726) 2020-03-27 00:59:48,431+0100 INFO (monitor/81b9724) [storage.StorageDomain] Removing remnants of deleted images [] (fileSD:726) 2020-03-27 01:00:09,474+0100 INFO (monitor/665ff83) [storage.StorageDomain] Removing remnants of deleted images [] (fileSD:726) I don't see any relation between refreshes and the abnormal vm stop events. I think the key to understanding this is to enable more verbose logs in gluster understand what was the failure that caused the vm to stop. It would also help if we had detailed error logs in qemu log in /var/log/libvirt/qemu/vm-name.log Did you enable libvirt logs? We may have more information about the error there. You can enable logs by modifying these lines in /etc/libvirt/libvirtd.conf: log_filters="1:qemu 1:libvirt 4:object 4:json 4:event 1:util" log_outputs="1:file:/var/log/libvirt/libvirtd.log" And restart libvirt. Note that libvirt log may be huge, so you need to watch it and change the log level or filter after you collect what you need. To log only warning and errors use: log_outputs="3:file:/var/log/libvirt/libvirtd.log" Someone from gluster should help debugging this. Nir _______________________________________________ Users mailing list -- users@ovirt.org To unsubscribe send an email to users-le...@ovirt.org Privacy Statement: https://www.ovirt.org/privacy-policy.html oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/VKFEGZORC47ZIER2HL6MMDON35HWK2MR/