Re: [ovirt-users] Ovirt 3.5 host gluster storage connection failure
On 12/23/2015 11:00 PM, Steve Dainard wrote:
> I've attached the client gluster log starting at the first log entry of the
> same day as the failure. Nothing significant in the client log after the
> crash and subsequent remount.

The ENODATA warnings can be ignored. There was a patch
(http://review.gluster.org/#/c/12015/) to change the log level; let me check
whether it has made it into a recent release version.

> The brick logs are all 0 file size on all of the replica 3 nodes... I just
> set 'gluster volume set vm-storage diagnostics.brick-log-level WARNING' but
> I'm not immediately seeing any logging to disk.
>
> I've also attached the compute1 vdsm.log file, which over the same time
> period is able to dd successfully, so perhaps this discounts a storage-side
> issue? I've also attached compute2 (the failed node) for comparison.
>
> Ravi - I'm not familiar with core files; would this be in a non-devel
> version of gluster? Or is this something I can enable? I don't mind
> enabling it now if it could help diagnose a future issue.

You should get the core files even on the non-devel versions. I think the
method to enable core files and their location is distribution-specific
(depending on whether it uses abrt, systemd, etc.), but in general you can
check the ulimit and /proc/sys/kernel/core_pattern settings. On my Fedora 23
machine, I get them at /core.<pid_of_the_process>. The timestamp from the
logs should also help in identifying the core file:

frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 6
time of crash: 2015-12-22 23:04:00

Regards,
Ravi

_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
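[The settings Ravi mentions can be inspected from a shell; a minimal sketch,
keeping in mind that whether abrt or systemd-coredump intercepts cores is
distribution-specific:]

```shell
# Check whether core dumps are enabled in this shell and where the kernel
# writes them (the pattern may hand cores off to abrt or systemd-coredump).
ulimit -c                           # "0" means core files are disabled
cat /proc/sys/kernel/core_pattern   # core destination or handler pipe

# Allow unlimited-size core files for processes started from this shell:
ulimit -c unlimited
```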
Re: [ovirt-users] Ovirt 3.5 host gluster storage connection failure
On 12/23/2015 11:44 AM, Sahina Bose wrote:
> signal received: 6
> time of crash: 2015-12-22 23:04:00
> configuration details:
> argp 1
> backtrace 1
> dlfcn 1
> libpthread 1
> llistxattr 1
> setfsid 1
> spinlock 1
> epoll.h 1
> xattr.h 1
> st_atim.tv_nsec 1
> package-string: glusterfs 3.6.7
> /lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb2)[0x7f0d091f6392]
> /lib64/libglusterfs.so.0(gf_print_trace+0x32d)[0x7f0d0920d88d]
> /lib64/libc.so.6(+0x35650)[0x7f0d0820f650]
> /lib64/libc.so.6(gsignal+0x37)[0x7f0d0820f5d7]
> /lib64/libc.so.6(abort+0x148)[0x7f0d08210cc8]
> /lib64/libc.so.6(+0x75e07)[0x7f0d0824fe07]
> /lib64/libc.so.6(+0x7d1fd)[0x7f0d082571fd]
> /usr/lib64/glusterfs/3.6.7/xlator/protocol/client.so(client_local_wipe+0x39)[0x7f0cfe8acdf9]
> /usr/lib64/glusterfs/3.6.7/xlator/protocol/client.so(client3_3_readv_cbk+0x487)[0x7f0cfe8c1197]
> /lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0x90)[0x7f0d08fca100]
> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x174)[0x7f0d08fca374]
> /lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f0d08fc62c3]
> /usr/lib64/glusterfs/3.6.7/rpc-transport/socket.so(+0x8790)[0x7f0d047f3790]
> /usr/lib64/glusterfs/3.6.7/rpc-transport/socket.so(+0xaf84)[0x7f0d047f5f84]
> /lib64/libglusterfs.so.0(+0x767c2)[0x7f0d0924b7c2]
> /usr/sbin/glusterfs(main+0x502)[0x7f0d096a0fe2]
> /lib64/libc.so.6(__libc_start_main+0xf5)[0x7f0d081fbaf5]
> /usr/sbin/glusterfs(+0x6381)[0x7f0d096a1381]

Could you provide the gluster mount logs from the client, and the gluster
brick logs from the gluster servers? Also, do you have a core file of the
crash?

-Ravi
Re: [ovirt-users] Ovirt 3.5 host gluster storage connection failure
[+ Ravi, Pranith]

On 12/23/2015 06:00 AM, Steve Dainard wrote:
> I have two hosts; only one of them was running VMs at the time of this
> crash, so I can't tell if this is a node-specific problem.
>
> rpm -qa | egrep -i 'gluster|vdsm|libvirt' | sort
> glusterfs-3.6.7-1.el7.x86_64
> glusterfs-api-3.6.7-1.el7.x86_64
> glusterfs-cli-3.6.7-1.el7.x86_64
> glusterfs-fuse-3.6.7-1.el7.x86_64
> glusterfs-libs-3.6.7-1.el7.x86_64
> glusterfs-rdma-3.6.7-1.el7.x86_64
> libvirt-client-1.2.8-16.el7_1.5.x86_64
> libvirt-daemon-1.2.8-16.el7_1.5.x86_64
> libvirt-daemon-config-nwfilter-1.2.8-16.el7_1.5.x86_64
> libvirt-daemon-driver-interface-1.2.8-16.el7_1.5.x86_64
> libvirt-daemon-driver-network-1.2.8-16.el7_1.5.x86_64
> libvirt-daemon-driver-nodedev-1.2.8-16.el7_1.5.x86_64
> libvirt-daemon-driver-nwfilter-1.2.8-16.el7_1.5.x86_64
> libvirt-daemon-driver-qemu-1.2.8-16.el7_1.5.x86_64
> libvirt-daemon-driver-secret-1.2.8-16.el7_1.5.x86_64
> libvirt-daemon-driver-storage-1.2.8-16.el7_1.5.x86_64
> libvirt-daemon-kvm-1.2.8-16.el7_1.5.x86_64
> libvirt-lock-sanlock-1.2.8-16.el7_1.5.x86_64
> libvirt-python-1.2.8-7.el7_1.1.x86_64
> vdsm-4.16.30-0.el7.centos.x86_64
> vdsm-cli-4.16.30-0.el7.centos.noarch
> vdsm-jsonrpc-4.16.30-0.el7.centos.noarch
> vdsm-python-4.16.30-0.el7.centos.noarch
> vdsm-python-zombiereaper-4.16.30-0.el7.centos.noarch
> vdsm-xmlrpc-4.16.30-0.el7.centos.noarch
> vdsm-yajsonrpc-4.16.30-0.el7.centos.noarch
>
> VMs were in a paused state, with errors in the UI:
>
> 2015-Dec-22, 15:06 VM pcic-apps has paused due to unknown storage error.
> 2015-Dec-22, 15:06 Host compute2 is not responding. It will stay in
> Connecting state for a grace period of 82 seconds and after that an attempt
> to fence the host will be issued.
> 2015-Dec-22, 15:03 Invalid status on Data Center EDC2. Setting Data Center
> status to Non Responsive (On host compute2, Error: General Exception).
> 2015-Dec-22, 15:03 VM pcic-storage has paused due to unknown storage error.
> 2015-Dec-22, 15:03 VM docker1 has paused due to unknown storage error.
> VDSM logs look normal until:
>
> Dummy-99::DEBUG::2015-12-22 23:03:58,949::storage_mailbox::731::Storage.Misc.excCmd::(_checkForMail) dd if=/rhev/data-center/f72ec125-69a1-4c1b-a5e1-313fcb70b6ff/mastersd/dom_md/inbox iflag=direct,fullblock count=1 bs=1024000 (cwd None)
> Dummy-99::DEBUG::2015-12-22 23:03:58,963::storage_mailbox::731::Storage.Misc.excCmd::(_checkForMail) SUCCESS: <err> = '1+0 records in\n1+0 records out\n1024000 bytes (1.0 MB) copied, 0.00350501 s, 292 MB/s\n'; <rc> = 0
> VM Channels Listener::INFO::2015-12-22 23:03:59,527::guestagent::180::vm.Vm::(_handleAPIVersion) vmId=`7067679e-43aa-43c0-b263-b0a711ade2e2`::Guest API version changed from 2 to 1
> Thread-245428::DEBUG::2015-12-22 23:03:59,718::libvirtconnection::151::root::(wrapper) Unknown libvirterror: ecode: 80 edom: 20 level: 2 message: metadata not found: Requested metadata element is not present
> libvirtEventLoop::INFO::2015-12-22 23:04:00,447::vm::4982::vm.Vm::(_onIOError) vmId=`376e98b7-7798-46e8-be03-5dddf6cfb54f`::abnormal vm stop device virtio-disk0 error eother
> libvirtEventLoop::DEBUG::2015-12-22 23:04:00,447::vm::5666::vm.Vm::(_onLibvirtLifecycleEvent) vmId=`376e98b7-7798-46e8-be03-5dddf6cfb54f`::event Suspended detail 2 opaque None
> libvirtEventLoop::INFO::2015-12-22 23:04:00,447::vm::4982::vm.Vm::(_onIOError) vmId=`376e98b7-7798-46e8-be03-5dddf6cfb54f`::abnormal vm stop device virtio-disk0 error eother
> ...
> libvirtEventLoop::INFO::2015-12-22 23:04:00,843::vm::4982::vm.Vm::(_onIOError) vmId=`97fbbf97-944b-4b77-b0bf-6a831f9090d8`::abnormal vm stop device virtio-disk1 error eother
> libvirtEventLoop::DEBUG::2015-12-22 23:04:00,844::vm::5666::vm.Vm::(_onLibvirtLifecycleEvent) vmId=`97fbbf97-944b-4b77-b0bf-6a831f9090d8`::event Suspended detail 2 opaque None
> Dummy-99::DEBUG::2015-12-22 23:04:00,973::storage_mailbox::731::Storage.Misc.excCmd::(_checkForMail) dd if=/rhev/data-center/f72ec125-69a1-4c1b-a5e1-313fcb70b6ff/mastersd/dom_md/inbox iflag=direct,fullblock count=1 bs=1024000 (cwd None)
> Dummy-99::DEBUG::2015-12-22 23:04:00,983::storage_mailbox::731::Storage.Misc.excCmd::(_checkForMail) FAILED: <err> = "dd: failed to open '/rhev/data-center/f72ec125-69a1-4c1b-a5e1-313fcb70b6ff/mastersd/dom_md/inbox': Transport endpoint is not connected\n"; <rc> = 1
> Dummy-99::ERROR::2015-12-22 23:04:00,983::storage_mailbox::787::Storage.MailBox.SpmMailMonitor::(run) Error checking for mail
> Traceback (most recent call last):
>   File "/usr/share/vdsm/storage/storage_mailbox.py", line 785, in run
>     self._checkForMail()
>   File "/usr/share/vdsm/storage/storage_mailbox.py", line 734, in _checkForMail
>     "Could not read mailbox: %s" % self._inbox)
> IOError: [Errno 5] _handleRequests._checkForMail - Could not read mailbox: /rhev/data-center/f72ec125-69a1-4c1b-a5e1-313fcb70b6ff/mastersd/dom_md/inbox
> Dummy-99::DEBUG::2015-12-22 23:04:02,987::storage_mailbox::731::Storage.Misc.excCmd::(_checkForMail) dd if=/rhev/data-center/f72ec125-69a1-4c1b-a5e1-313fcb70b6ff/mastersd/dom_md/inbox iflag=direct,fullblock count=1 bs=1024000 (cwd None)
> Dummy-99::DEBUG::2015-
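[The failing mailbox health check in the logs above can be reproduced by
hand; a minimal sketch, with an illustrative path default so it can run
anywhere. The real path embeds the data-center UUID shown in the logs, and
vdsm additionally passes iflag=direct,fullblock, omitted here so the sketch
works against an ordinary file:]

```shell
#!/bin/sh
# Sketch of the mailbox read vdsm performs. Substitute the real
# /rhev/data-center/<uuid>/mastersd/dom_md/inbox path for INBOX.
INBOX=${INBOX:-/dev/zero}
if dd if="$INBOX" of=/dev/null bs=1024000 count=1 2>/dev/null; then
    echo "mailbox readable"
else
    # When the glusterfs client process backing the FUSE mount has died,
    # the open() fails with "Transport endpoint is not connected".
    echo "mailbox unreadable" >&2
    exit 1
fi
```

On a healthy mount dd exits 0; a non-zero exit here matches the FAILED line
vdsm logged at 23:04:00,983.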