Re: [ovirt-users] Ovirt 3.5 host gluster storage connection failure

2015-12-23 Thread Ravishankar N

On 12/23/2015 11:00 PM, Steve Dainard wrote:

I've attached the client gluster log, starting at the first entry from
the day of the failure.

Nothing significant in the client log after the crash and subsequent
remount. The ENODATA warnings can be ignored. There was a patch
(http://review.gluster.org/#/c/12015/) to change the log level; let me
check whether it has made it into a recent release.


The brick logs are all 0 bytes on all of the replica 3 nodes... I
just ran 'gluster volume set vm-storage diagnostics.brick-log-level
WARNING' but I'm not immediately seeing any logging to disk.

I've also attached the compute1 vdsm.log file, which over the same
time period is able to dd successfully so perhaps this discounts a
storage side issue? I've also attached compute2 (failed node) for
comparison.


Ravi - I'm not familiar with core files, would this be in a non-devel
version of gluster? Or is this something I can enable? I don't mind
enabling it now if it could help diagnose a future issue.


You should get core files even on the non-devel versions. I think
the method for enabling core files, and their location, is
distribution-specific (depending on whether it uses abrt, systemd, etc.),
but in general you can check the ulimit and
/proc/sys/kernel/core_pattern settings.
On my Fedora 23 machine, I get them at /core.<pid_of_the_process>.
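
The two settings mentioned above can be inspected with a short script. This is only a sketch: it reports the core size limit of the Python process that runs it, which may differ from the limit of the glusterfs client process (for a running process, /proc/<pid>/limits shows the actual value). The function names are made up for illustration.

```python
import resource
from pathlib import Path


def format_limit(value):
    """Render an rlimit value (in bytes), or 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def core_dump_settings(pattern_path="/proc/sys/kernel/core_pattern"):
    """Collect this process's core size limit and the kernel core pattern."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    try:
        pattern = Path(pattern_path).read_text().strip()
    except OSError:
        pattern = None  # not Linux, or /proc is not readable
    return {
        "soft_limit": format_limit(soft),
        "hard_limit": format_limit(hard),
        "core_pattern": pattern,
        # A leading "|" means the kernel pipes the core to a helper program
        # (for example abrt or systemd-coredump) instead of writing a file.
        "piped_to_helper": bool(pattern) and pattern.startswith("|"),
    }


if __name__ == "__main__":
    for key, value in core_dump_settings().items():
        print(f"{key}: {value}")
```

A soft limit of 0 means no core file is written for processes that inherit that limit; a pattern that starts with "|" means the core goes to a helper, so it will not appear in the crashing process's working directory.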


The timestamp from the logs should also help in identifying the core file:

frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 6
time of crash:
2015-12-22 23:04:00
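
As a rough sketch of that matching step, the snippet below pulls the "time of crash" stamps out of a log and lists files in a directory whose modification time falls close to one of them. It assumes the log timestamps are in UTC and that core files match the glob "core*"; both are assumptions to adjust for the actual setup, and the function names are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone
from pathlib import Path

# The timestamp is printed on the line after "time of crash:".
CRASH_RE = re.compile(r"time of crash:\s*(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")


def crash_times(log_text):
    """Return every crash timestamp found in the log text (assumed UTC)."""
    return [
        datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
        for stamp in CRASH_RE.findall(log_text)
    ]


def cores_near(crash_time, directory, window=timedelta(minutes=5), pattern="core*"):
    """List files matching `pattern` modified within `window` of the crash."""
    matches = []
    for path in Path(directory).glob(pattern):
        if not path.is_file():
            continue
        mtime = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
        if abs(mtime - crash_time) <= window:
            matches.append(path)
    return sorted(matches)
```

For example, `cores_near(crash_times(log_text)[0], "/")` would look for candidates in the root directory, which is where the Fedora example above places them.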



Regards,
Ravi


___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Ovirt 3.5 host gluster storage connection failure

2015-12-23 Thread Ravishankar N

On 12/23/2015 11:44 AM, Sahina Bose wrote:

signal received: 6
time of crash:
2015-12-22 23:04:00
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.6.7
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb2)[0x7f0d091f6392]
/lib64/libglusterfs.so.0(gf_print_trace+0x32d)[0x7f0d0920d88d]
/lib64/libc.so.6(+0x35650)[0x7f0d0820f650]
/lib64/libc.so.6(gsignal+0x37)[0x7f0d0820f5d7]
/lib64/libc.so.6(abort+0x148)[0x7f0d08210cc8]
/lib64/libc.so.6(+0x75e07)[0x7f0d0824fe07]
/lib64/libc.so.6(+0x7d1fd)[0x7f0d082571fd]
/usr/lib64/glusterfs/3.6.7/xlator/protocol/client.so(client_local_wipe+0x39)[0x7f0cfe8acdf9]
/usr/lib64/glusterfs/3.6.7/xlator/protocol/client.so(client3_3_readv_cbk+0x487)[0x7f0cfe8c1197]
/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0x90)[0x7f0d08fca100]
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x174)[0x7f0d08fca374]
/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f0d08fc62c3]
/usr/lib64/glusterfs/3.6.7/rpc-transport/socket.so(+0x8790)[0x7f0d047f3790]
/usr/lib64/glusterfs/3.6.7/rpc-transport/socket.so(+0xaf84)[0x7f0d047f5f84]
/lib64/libglusterfs.so.0(+0x767c2)[0x7f0d0924b7c2]
/usr/sbin/glusterfs(main+0x502)[0x7f0d096a0fe2]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f0d081fbaf5]
/usr/sbin/glusterfs(+0x6381)[0x7f0d096a1381]
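
The backtrace above is easier to scan once each frame is split into module, symbol, offset and address. The following is a minimal sketch of such a parser; the regular expression is written against the frame format visible in this log only, and the function names are illustrative.

```python
import re

# One frame looks like: /path/to/module(symbol+0xOFFSET)[0xADDRESS]
# The symbol part is empty when no symbol name was available for the frame.
FRAME_RE = re.compile(
    r"^(?P<module>/\S+?)"
    r"\((?P<symbol>[^+()]*)\+(?P<offset>0x[0-9a-fA-F]+)\)"
    r"\[(?P<address>0x[0-9a-fA-F]+)\]\s*$"
)


def parse_frames(text):
    """Return one dict per backtrace frame; non-frame lines are skipped."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frame = match.groupdict()
            frame["symbol"] = frame["symbol"] or None
            frames.append(frame)
    return frames


def named_symbols(frames):
    """Return the symbol names, top of stack first, skipping unnamed frames."""
    return [frame["symbol"] for frame in frames if frame["symbol"]]
```

Run over the log above, `named_symbols` would list gsignal and abort ahead of client_local_wipe and client3_3_readv_cbk, which matches signal 6 (SIGABRT) being raised while the client translator was handling a read reply.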



Could you provide the gluster mount logs from the client, and the 
gluster brick logs from the gluster servers? 

Also, do you have a core file of the crash?

-Ravi



[ovirt-users] Ovirt 3.5 host gluster storage connection failure

2015-12-22 Thread Steve Dainard
I have two hosts; only one of them was running VMs at the time of
this crash, so I can't tell whether this is a node-specific problem.

rpm -qa | egrep -i 'gluster|vdsm|libvirt' |sort
glusterfs-3.6.7-1.el7.x86_64
glusterfs-api-3.6.7-1.el7.x86_64
glusterfs-cli-3.6.7-1.el7.x86_64
glusterfs-fuse-3.6.7-1.el7.x86_64
glusterfs-libs-3.6.7-1.el7.x86_64
glusterfs-rdma-3.6.7-1.el7.x86_64
libvirt-client-1.2.8-16.el7_1.5.x86_64
libvirt-daemon-1.2.8-16.el7_1.5.x86_64
libvirt-daemon-config-nwfilter-1.2.8-16.el7_1.5.x86_64
libvirt-daemon-driver-interface-1.2.8-16.el7_1.5.x86_64
libvirt-daemon-driver-network-1.2.8-16.el7_1.5.x86_64
libvirt-daemon-driver-nodedev-1.2.8-16.el7_1.5.x86_64
libvirt-daemon-driver-nwfilter-1.2.8-16.el7_1.5.x86_64
libvirt-daemon-driver-qemu-1.2.8-16.el7_1.5.x86_64
libvirt-daemon-driver-secret-1.2.8-16.el7_1.5.x86_64
libvirt-daemon-driver-storage-1.2.8-16.el7_1.5.x86_64
libvirt-daemon-kvm-1.2.8-16.el7_1.5.x86_64
libvirt-lock-sanlock-1.2.8-16.el7_1.5.x86_64
libvirt-python-1.2.8-7.el7_1.1.x86_64
vdsm-4.16.30-0.el7.centos.x86_64
vdsm-cli-4.16.30-0.el7.centos.noarch
vdsm-jsonrpc-4.16.30-0.el7.centos.noarch
vdsm-python-4.16.30-0.el7.centos.noarch
vdsm-python-zombiereaper-4.16.30-0.el7.centos.noarch
vdsm-xmlrpc-4.16.30-0.el7.centos.noarch
vdsm-yajsonrpc-4.16.30-0.el7.centos.noarch

VMs were in a paused state, with these errors in the UI:

2015-Dec-22, 15:06
VM pcic-apps has paused due to unknown storage error.
2015-Dec-22, 15:06
Host compute2 is not responding. It will stay in Connecting state for
a grace period of 82 seconds and after that an attempt to fence the
host will be issued.
2015-Dec-22, 15:03
Invalid status on Data Center EDC2. Setting Data Center status to Non
Responsive (On host compute2, Error: General Exception).
2015-Dec-22, 15:03
VM pcic-storage has paused due to unknown storage error.
2015-Dec-22, 15:03
VM docker1 has paused due to unknown storage error.

VDSM logs look normal until:
Dummy-99::DEBUG::2015-12-22
23:03:58,949::storage_mailbox::731::Storage.Misc.excCmd::(_checkForMail)
dd 
if=/rhev/data-center/f72ec125-69a1-4c1b-a5e1-313fcb70b6ff/mastersd/dom_md/inbox
iflag=direct,fullblock count=1 bs=1024000 (cwd None)
Dummy-99::DEBUG::2015-12-22
23:03:58,963::storage_mailbox::731::Storage.Misc.excCmd::(_checkForMail)
SUCCESS: <err> = '1+0 records in\n1+0 records out\n1024000 bytes (1.0
MB) copied, 0.00350501 s, 292 MB/s\n'; <rc> = 0
VM Channels Listener::INFO::2015-12-22
23:03:59,527::guestagent::180::vm.Vm::(_handleAPIVersion)
vmId=`7067679e-43aa-43c0-b263-b0a711ade2e2`::Guest API version changed
from 2 to 1
Thread-245428::DEBUG::2015-12-22
23:03:59,718::libvirtconnection::151::root::(wrapper) Unknown
libvirterror: ecode: 80 edom: 20 level: 2 message: metadata not found:
Requested metadata element is not present
libvirtEventLoop::INFO::2015-12-22
23:04:00,447::vm::4982::vm.Vm::(_onIOError)
vmId=`376e98b7-7798-46e8-be03-5dddf6cfb54f`::abnormal vm stop device
virtio-disk0 error eother
libvirtEventLoop::DEBUG::2015-12-22
23:04:00,447::vm::5666::vm.Vm::(_onLibvirtLifecycleEvent)
vmId=`376e98b7-7798-46e8-be03-5dddf6cfb54f`::event Suspended detail 2
opaque None
libvirtEventLoop::INFO::2015-12-22
23:04:00,447::vm::4982::vm.Vm::(_onIOError)
vmId=`376e98b7-7798-46e8-be03-5dddf6cfb54f`::abnormal vm stop device
virtio-disk0 error eother
...
libvirtEventLoop::INFO::2015-12-22
23:04:00,843::vm::4982::vm.Vm::(_onIOError)
vmId=`97fbbf97-944b-4b77-b0bf-6a831f9090d8`::abnormal vm stop device
virtio-disk1 error eother
libvirtEventLoop::DEBUG::2015-12-22
23:04:00,844::vm::5666::vm.Vm::(_onLibvirtLifecycleEvent)
vmId=`97fbbf97-944b-4b77-b0bf-6a831f9090d8`::event Suspended detail 2
opaque None
Dummy-99::DEBUG::2015-12-22
23:04:00,973::storage_mailbox::731::Storage.Misc.excCmd::(_checkForMail)
dd 
if=/rhev/data-center/f72ec125-69a1-4c1b-a5e1-313fcb70b6ff/mastersd/dom_md/inbox
iflag=direct,fullblock count=1 bs=1024000 (cwd None)
Dummy-99::DEBUG::2015-12-22
23:04:00,983::storage_mailbox::731::Storage.Misc.excCmd::(_checkForMail)
FAILED: <err> = "dd: failed to open
'/rhev/data-center/f72ec125-69a1-4c1b-a5e1-313fcb70b6ff/mastersd/dom_md/inbox':
Transport endpoint is not connected\n"; <rc> = 1
Dummy-99::ERROR::2015-12-22
23:04:00,983::storage_mailbox::787::Storage.MailBox.SpmMailMonitor::(run)
Error checking for mail
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/storage_mailbox.py", line 785, in run
    self._checkForMail()
  File "/usr/share/vdsm/storage/storage_mailbox.py", line 734, in _checkForMail
    "Could not read mailbox: %s" % self._inbox)
IOError: [Errno 5] _handleRequests._checkForMail - Could not read
mailbox:
/rhev/data-center/f72ec125-69a1-4c1b-a5e1-313fcb70b6ff/mastersd/dom_md/inbox
Dummy-99::DEBUG::2015-12-22
23:04:02,987::storage_mailbox::731::Storage.Misc.excCmd::(_checkForMail)
dd 
if=/rhev/data-center/f72ec125-69a1-4c1b-a5e1-313fcb70b6ff/mastersd/dom_md/inbox
iflag=direct,fullblock count=1 bs=1024000 (cwd None)
Dummy-99::DEBUG::2015-12-22
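
The failed dd above reports "Transport endpoint is not connected", which is the message for errno ENOTCONN; on a FUSE mount this usually means the client process behind the mount point has gone away. A small sketch for probing a path for that condition follows. The function name and return values are hypothetical, and this is not how VDSM itself performs the check.

```python
import errno
import os


def mount_state(path):
    """Classify a path as 'ok', 'disconnected', 'missing' or another error."""
    try:
        os.stat(path)
    except OSError as exc:
        if exc.errno == errno.ENOTCONN:
            # The mount point exists but nothing is serving it any more.
            return "disconnected"
        if exc.errno == errno.ENOENT:
            return "missing"
        return "error: " + os.strerror(exc.errno)
    return "ok"
```

Calling it on a directory under the storage domain mount would be expected to return "disconnected" for as long as the mount is in the state shown in the log, and "ok" again after a remount.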

Re: [ovirt-users] Ovirt 3.5 host gluster storage connection failure

2015-12-22 Thread Sahina Bose

[+ Ravi, Pranith]
