On Sat, Mar 28, 2020 at 7:34 PM Nir Soffer <nsof...@redhat.com> wrote:

> On Sat, Mar 28, 2020 at 5:00 AM Gianluca Cecchi
> <gianluca.cec...@gmail.com> wrote:
> ...
> > Further information.
> > What I see around that time frame in the gluster brick log file
> gluster_bricks-vmstore-vmstore.log (the timestamp in the log file is
> 1 hour behind):
> >
> > [2020-03-27 23:30:38.575808] I [MSGID: 101055]
> [client_t.c:436:gf_client_unref] 0-vmstore-server: Shutting down connection
> CTX_ID:6e8f70b8-1946-4505-860f-be90e5807cb3-GRAPH_ID:0-PID:223418-HOST:ovirt.mydomain.local-PC_NAME:vmstore-client-0-RECON_NO:-0
> > [2020-03-27 23:35:15.281449] E [MSGID: 113072]
> [posix-inode-fd-ops.c:1886:posix_writev] 0-vmstore-posix: write failed:
> offset 0, [Invalid argument]
> > [2020-03-27 23:35:15.281545] E [MSGID: 115067]
> [server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-vmstore-server: 34139378:
> WRITEV 10 (00d9fe81-8a31-498e-8401-7b9d1477378e), client:
> CTX_ID:d04437ba-ef98-43df-864f-5e9d3738620a-GRAPH_ID:0-PID:27687-HOST:ovirt.mydomain.local-PC_NAME:vmstore-client-0-RECON_NO:-0,
> error-xlator: vmstore-posix [Invalid argument]
> > [2020-03-27 23:40:15.415794] E [MSGID: 113072]
> [posix-inode-fd-ops.c:1886:posix_writev] 0-vmstore-posix: write failed:
> offset 0, [Invalid argument]
>
> "Invalid argument" errors are expected when activating a storage
> domain, and every 5 minutes when storage domains are refreshed. The
> writes are performed to a temporary file at
> /rhev/data-center/mnt/server:_path/.prob-random-uuid
>
> These logs do not show the path, so we don't know if the writes are
> related to block size probing.
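>
> (For illustration only, this is not what vdsm actually runs: you can
> trigger the same EINVAL from the shell by forcing a direct write with
> a block size smaller than the one the storage supports, e.g.
>
>     dd if=/dev/zero of=/rhev/data-center/mnt/server:_path/probe \
>         bs=512 count=1 oflag=direct
>
> On storage that requires 4096-byte aligned I/O, the 512-byte direct
> write fails with "Invalid argument", and the probe relies on exactly
> this behavior to detect the block size. The "probe" file name above
> is made up; remove the file afterwards.)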
>
> But in vdsm log we see:
>
> 2020-03-27 00:40:08,979+0100 INFO  (monitor/665ff83)
> [storage.StorageDomain] Removing remnants of deleted images []
> (fileSD:726)
>
> This call happens when vdsm is refreshing the storage domain. Right
> before this log, vdsm tries to detect the underlying storage block
> size.
>
> So it looks like the gluster logs are related to block size probing
> and are not related to the
> I/O error that caused the VM to pause.
>

[snip]


> Looking at both "abnormal vm stop" and storage refresh events:
>
> [snip]


> I don't see any relation between refreshes and the abnormal vm stop events.
>
> I think the key to understanding this is to enable more verbose logs
> in gluster to understand what failure caused the VM to stop.
>
>

Ah, ok. Thanks
It seems the default gluster log level is INFO, and I can make the logs
more verbose for a limited amount of time to see if more information is
provided. Can I do that with the VMs running, just by doing some sort of
reload of the service, or do I have to stop everything first?
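
(From a quick look at the gluster docs, it seems the log level can be
changed at runtime with something like:

    gluster volume set vmstore diagnostics.brick-log-level DEBUG
    gluster volume set vmstore diagnostics.client-log-level DEBUG

and set back to INFO when done, but I'd like to confirm this is safe
with running VMs.)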



> It would also help if we had detailed error logs in the qemu log at
> /var/log/libvirt/qemu/vm-name.log
>

I will look for them. The system is not available to check right now.


> Did you enable libvirt logs? We may have more information about the error
> there.
>
> You can enable logs by modifying these lines in /etc/libvirt/libvirtd.conf:
>
> log_filters="1:qemu 1:libvirt 4:object 4:json 4:event 1:util"
> log_outputs="1:file:/var/log/libvirt/libvirtd.log"
>
> And restart libvirt.
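>
> For example, on a systemd-based host:
>
>     systemctl restart libvirtd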
>
> Note that the libvirt log may be huge, so you need to watch it and
> change the log level or filters after you collect what you need.
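>
> For example, to keep an eye on its size while waiting for the error
> to reproduce:
>
>     watch -n 60 'ls -lh /var/log/libvirt/libvirtd.log'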
>
> To log only warnings and errors, use:
>
> log_outputs="3:file:/var/log/libvirt/libvirtd.log"
>
> Someone from gluster should help debug this.
>
> Nir
>
>
Ok, I could also try this approach if enabling more verbose gluster
logs is not sufficient.

In the meantime, thanks for all the detailed explanations.
Gianluca
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/7MOOUIVWUABS2IMM4BVP4BR7JZL2DTWW/
