Re: [ovirt-users] vdsm losing connection to libvirt

2014-12-17 Thread Francesco Romani
- Original Message -
 From: Dan Kenigsberg <dan...@redhat.com>
 To: users@ovirt.org, ybron...@redhat.com
 Sent: Tuesday, December 16, 2014 11:29:36 PM
 Subject: Re: [ovirt-users] vdsm losing connection to libvirt

  20:56:38,754::libvirtconnection::135::root::(wrapper) connection to
  libvirt broken. ecode: 1 edom: 7
[...]
  20:56:38,755::BindingXMLRPC::1142::vds::(wrapper) libvirt error
  Traceback (most recent call last):
    File "/usr/share/vdsm/rpc/BindingXMLRPC.py", line 1135, in wrapper
      res = f(*args, **kwargs)
    File "/usr/share/vdsm/rpc/BindingXMLRPC.py", line 463, in getCapabilities
      ret = api.getCapabilities()
    File "/usr/share/vdsm/API.py", line 1245, in getCapabilities
      c = caps.get()
    File "/usr/share/vdsm/caps.py", line 615, in get
      caps.update(netinfo.get())
    File "/usr/lib/python2.7/site-packages/vdsm/netinfo.py", line 812, in get
      nets = networks()
    File "/usr/lib/python2.7/site-packages/vdsm/netinfo.py", line 119, in networks
      allNets = ((net, net.name()) for net in conn.listAllNetworks(0))
    File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 129, in wrapper
      __connections.get(id(target)).pingLibvirt()
    File "/usr/lib64/python2.7/site-packages/libvirt.py", line 3642, in getLibVersion
      if ret == -1: raise libvirtError ('virConnectGetLibVersion() failed', conn=self)
  libvirtError: internal error: client socket is closed
 
 Does anything suspicious show up in libvirtd.log?
 
 I heard a report about a libvirt segmentation fault on CentOS 7 - could
 this be it, too? Does `ps -o stime $(pgrep libvirtd)` show a start time
 that matches the reported libvirt internal error?
 
 Would you provide the precise version of libvirt and vdsm?
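
Dan's check amounts to asking whether libvirtd restarted around the time of
the error. A minimal sketch of it, assuming a single libvirtd process (the
snippet is illustrative, not part of vdsm; it uses lstart, which gives the
full start timestamp rather than stime's abbreviated one):

    import subprocess

    # Grab libvirtd's PID (first match if several) and its full start time.
    pid = subprocess.check_output(['pgrep', 'libvirtd']).split()[0]
    start = subprocess.check_output(['ps', '-o', 'lstart=', '-p', pid])
    # If this is later than the "internal error" timestamp in vdsm.log,
    # libvirtd crashed and was restarted.
    print('libvirtd running since: %s' % start.strip())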

Confirmed. I'm aware of these bugs in particular, which could be related
to a fair number of mysterious libvirtd crashes:

https://bugzilla.redhat.com/show_bug.cgi?id=1162208
https://bugzilla.redhat.com/show_bug.cgi?id=1171124
https://bugzilla.redhat.com/show_bug.cgi?id=1141159

It seems *more frequent* when migrations are involved but, as the last bug
tells us, it may happen in other cases as well.
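
Until those are fixed, a cheap way to catch the moment libvirtd dies is to
ping it the same way vdsm does. A minimal watchdog sketch - assuming
libvirt-python is installed and qemu:///system is the right URI:

    import time
    import libvirt

    conn = libvirt.open('qemu:///system')
    while True:
        try:
            # Same "ping" vdsm's pingLibvirt() uses: a cheap round-trip.
            conn.getLibVersion()
        except libvirt.libvirtError as e:
            # The timestamp here can be matched against the BZs above.
            print('%s connection to libvirt broken: %s'
                  % (time.strftime('%F %T'), e))
            break
        time.sleep(10)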

-- 
Francesco Romani
Red Hat Engineering Virtualization R & D
Phone: 8261328
IRC: fromani


[ovirt-users] vdsm losing connection to libvirt

2014-12-16 Thread Chris Adams
I have an oVirt setup that has three nodes, all running CentOS 7, with a
hosted engine running CentOS 6.  Two of the nodes (node8 and node9) are
configured for hosted engine, and the third (node2) is just a regular
node (as you might guess from the names, more nodes are coming as I
migrate VMs to oVirt).

On one node, node8, vdsm periodically loses its connection to libvirt,
which causes vdsm to restart.  There doesn't appear to be any trigger
that I can see (not time of day, load, etc. related).  The engine VM is
up and running on node8 (don't know if that has anything to do with it).

I get some entries in /var/log/messages repeated continuously; the
"ovirt-ha-broker: sending ioctl 5401 to a partition" messages I mentioned before,
and the following:

Dec 15 20:56:23 node8 journal: User record for user '107' was not found: No 
such file or directory
Dec 15 20:56:23 node8 journal: Group record for user '107' was not found: No 
such file or directory

I don't think those have any relevance (don't know where they come
from); filtering those out, I see:

Dec 15 20:56:33 node8 journal: End of file while reading data: Input/output 
error
Dec 15 20:56:33 node8 journal: Tried to close invalid fd 0
Dec 15 20:56:38 node8 journal: vdsm root WARNING connection to libvirt broken. 
ecode: 1 edom: 7
Dec 15 20:56:38 node8 journal: vdsm root CRITICAL taking calling process down.
Dec 15 20:56:38 node8 journal: vdsm vds ERROR libvirt error
Dec 15 20:56:38 node8 journal: ovirt-ha-broker mgmt_bridge.MgmtBridge ERROR 
Failed to getVdsCapabilities: Error 16 from getVdsCapabilities: Unexpected 
exception
Dec 15 20:56:45 node8 journal: End of file while reading data: Input/output 
error
Dec 15 20:56:45 node8 vdsmd_init_common.sh: vdsm: Running run_final_hooks
Dec 15 20:56:45 node8 systemd: Starting Virtual Desktop Server Manager...
and then all the normal-looking vdsm startup messages.
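
For reference, the "ecode: 1 edom: 7" pair in that warning decodes, via the
constants in the libvirt bindings (assuming libvirt-python is available), to
VIR_ERR_INTERNAL_ERROR raised from the RPC layer - i.e. the client socket to
libvirtd - which fits libvirtd itself going away:

    import libvirt

    print(libvirt.VIR_ERR_INTERNAL_ERROR)  # 1 -> "internal error"
    print(libvirt.VIR_FROM_RPC)            # 7 -> RPC (client socket) layer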

It is happening about once a day, but not at any regular interval or
time (was 02:23 Sunday, then 20:56 Monday).
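
One way to look for a crash signature (segfault, abort) around those times is
to pull the journal for libvirtd in a window around each restart - a hedged
sketch, with the window below purely illustrative:

    import subprocess

    # journalctl ships with systemd on CentOS 7; adjust the window to
    # bracket the restart seen in vdsm.log.
    print(subprocess.check_output(
        ['journalctl', '-u', 'libvirtd',
         '--since', '2014-12-15 20:50', '--until', '2014-12-15 21:00']))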

vdsm.log has this at that time:

Thread-601576::DEBUG::2014-12-15 
20:56:38,715::BindingXMLRPC::1132::vds::(wrapper) client [127.0.0.1]::call 
getCapabilities with () {}
Thread-601576::DEBUG::2014-12-15 20:56:38,718::utils::738::root::(execCmd) 
/sbin/ip route show to 0.0.0.0/0 table all (cwd None)
Thread-601576::DEBUG::2014-12-15 20:56:38,746::utils::758::root::(execCmd) 
SUCCESS: err = ''; rc = 0
Thread-601576::WARNING::2014-12-15 
20:56:38,754::libvirtconnection::135::root::(wrapper) connection to libvirt 
broken. ecode: 1 edom: 7
Thread-601576::CRITICAL::2014-12-15 
20:56:38,754::libvirtconnection::137::root::(wrapper) taking calling process 
down.
MainThread::DEBUG::2014-12-15 20:56:38,754::vdsm::58::vds::(sigtermHandler) 
Received signal 15
Thread-601576::DEBUG::2014-12-15 
20:56:38,755::libvirtconnection::143::root::(wrapper) Unknown libvirterror: 
ecode: 1 edom: 7 level: 2 message: internal error: client socket is closed
MainThread::DEBUG::2014-12-15 
20:56:38,755::protocoldetector::135::vds.MultiProtocolAcceptor::(stop) Stopping 
Acceptor
MainThread::INFO::2014-12-15 
20:56:38,755::__init__::563::jsonrpc.JsonRpcServer::(stop) Stopping JsonRPC 
Server
Detector thread::DEBUG::2014-12-15 
20:56:38,756::protocoldetector::106::vds.MultiProtocolAcceptor::(_cleanup) 
Cleaning Acceptor
MainThread::INFO::2014-12-15 20:56:38,757::vmchannels::188::vds::(stop) VM 
channels listener was stopped.
MainThread::INFO::2014-12-15 20:56:38,758::momIF::91::MOM::(stop) Shutting down 
MOM
MainThread::DEBUG::2014-12-15 
20:56:38,759::task::595::Storage.TaskManager.Task::(_updateState) 
Task=`26c7680c-23e2-42bb-964c-272e778a168a`::moving from state init -> state
preparing
MainThread::INFO::2014-12-15 20:56:38,759::logUtils::44::dispatcher::(wrapper) 
Run and protect: prepareForShutdown(options=None)
Thread-601576::ERROR::2014-12-15 
20:56:38,755::BindingXMLRPC::1142::vds::(wrapper) libvirt error
Traceback (most recent call last):
  File "/usr/share/vdsm/rpc/BindingXMLRPC.py", line 1135, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/rpc/BindingXMLRPC.py", line 463, in getCapabilities
    ret = api.getCapabilities()
  File "/usr/share/vdsm/API.py", line 1245, in getCapabilities
    c = caps.get()
  File "/usr/share/vdsm/caps.py", line 615, in get
    caps.update(netinfo.get())
  File "/usr/lib/python2.7/site-packages/vdsm/netinfo.py", line 812, in get
    nets = networks()
  File "/usr/lib/python2.7/site-packages/vdsm/netinfo.py", line 119, in networks
    allNets = ((net, net.name()) for net in conn.listAllNetworks(0))
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 129, in wrapper
    __connections.get(id(target)).pingLibvirt()
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 3642, in getLibVersion
    if ret == -1: raise libvirtError ('virConnectGetLibVersion() failed', conn=self)
libvirtError: internal error: client socket is closed
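
The traceback shows the mechanism at work: vdsm routes libvirt calls through a
wrapper that, on error, pings libvirtd via getLibVersion(), and if the ping
fails too it deliberately takes the whole process down (the "Received signal
15" above) so the service manager restarts it with a fresh connection. A
hypothetical, much-simplified sketch of that wrapper:

    import os
    import signal
    import libvirt

    def wrap(conn, func):
        # Illustrative stand-in for vdsm's libvirtconnection.wrapper();
        # the real code also filters which errors count as fatal.
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except libvirt.libvirtError:
                try:
                    conn.getLibVersion()  # pingLibvirt(): cheap round-trip
                except libvirt.libvirtError:
                    # libvirtd is really gone: take the calling process
                    # down so it can be restarted cleanly.
                    os.kill(os.getpid(), signal.SIGTERM)
                raise
        return wrapper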


-- 
Chris Adams c...@cmadams.net

Re: [ovirt-users] vdsm losing connection to libvirt

2014-12-16 Thread Dan Kenigsberg
On Tue, Dec 16, 2014 at 08:48:48AM -0600, Chris Adams wrote:
 I have an oVirt setup that has three nodes, all running CentOS 7, with a
 hosted engine running CentOS 6.  Two of the nodes (node8 and node9) are
 configured for hosted engine, and the third (node2) is just a regular
 node (as you might guess from the names, more nodes are coming as I
 migrate VMs to oVirt).
 
 On one node, node8, vdsm periodically loses its connection to libvirt,
 which causes vdsm to restart.  There doesn't appear to be any trigger
 that I can see (not time of day, load, etc. related).  The engine VM is
 up and running on node8 (don't know if that has anything to do with it).
 
 I get some entries in /var/log/messages repeated continuously; the
 "ovirt-ha-broker: sending ioctl 5401 to a partition" messages I mentioned before,
 and the following:
 
 Dec 15 20:56:23 node8 journal: User record for user '107' was not found: No 
 such file or directory
 Dec 15 20:56:23 node8 journal: Group record for user '107' was not found: No 
 such file or directory
 
 I don't think those have any relevance (don't know where they come
 from); filtering those out, I see:
 
 Dec 15 20:56:33 node8 journal: End of file while reading data: Input/output 
 error
 Dec 15 20:56:33 node8 journal: Tried to close invalid fd 0
 Dec 15 20:56:38 node8 journal: vdsm root WARNING connection to libvirt 
 broken. ecode: 1 edom: 7
 Dec 15 20:56:38 node8 journal: vdsm root CRITICAL taking calling process down.
 Dec 15 20:56:38 node8 journal: vdsm vds ERROR libvirt error
 Dec 15 20:56:38 node8 journal: ovirt-ha-broker mgmt_bridge.MgmtBridge ERROR 
 Failed to getVdsCapabilities: Error 16 from getVdsCapabilities: Unexpected 
 exception
 Dec 15 20:56:45 node8 journal: End of file while reading data: Input/output 
 error
 Dec 15 20:56:45 node8 vdsmd_init_common.sh: vdsm: Running run_final_hooks
 Dec 15 20:56:45 node8 systemd: Starting Virtual Desktop Server Manager...
 and then all the normal-looking vdsm startup messages.
 
 It is happening about once a day, but not at any regular interval or
 time (was 02:23 Sunday, then 20:56 Monday).
 
 vdsm.log has this at that time:
 
 Thread-601576::DEBUG::2014-12-15 
 20:56:38,715::BindingXMLRPC::1132::vds::(wrapper) client [127.0.0.1]::call 
 getCapabilities with () {}
 Thread-601576::DEBUG::2014-12-15 20:56:38,718::utils::738::root::(execCmd) 
 /sbin/ip route show to 0.0.0.0/0 table all (cwd None)
 Thread-601576::DEBUG::2014-12-15 20:56:38,746::utils::758::root::(execCmd) 
 SUCCESS: err = ''; rc = 0
 Thread-601576::WARNING::2014-12-15 
 20:56:38,754::libvirtconnection::135::root::(wrapper) connection to libvirt 
 broken. ecode: 1 edom: 7
 Thread-601576::CRITICAL::2014-12-15 
 20:56:38,754::libvirtconnection::137::root::(wrapper) taking calling process 
 down.
 MainThread::DEBUG::2014-12-15 20:56:38,754::vdsm::58::vds::(sigtermHandler) 
 Received signal 15
 Thread-601576::DEBUG::2014-12-15 
 20:56:38,755::libvirtconnection::143::root::(wrapper) Unknown libvirterror: 
 ecode: 1 edom: 7 level: 2 message: internal error: client socket is closed
 MainThread::DEBUG::2014-12-15 
 20:56:38,755::protocoldetector::135::vds.MultiProtocolAcceptor::(stop) 
 Stopping Acceptor
 MainThread::INFO::2014-12-15 
 20:56:38,755::__init__::563::jsonrpc.JsonRpcServer::(stop) Stopping JsonRPC 
 Server
 Detector thread::DEBUG::2014-12-15 
 20:56:38,756::protocoldetector::106::vds.MultiProtocolAcceptor::(_cleanup) 
 Cleaning Acceptor
 MainThread::INFO::2014-12-15 20:56:38,757::vmchannels::188::vds::(stop) VM 
 channels listener was stopped.
 MainThread::INFO::2014-12-15 20:56:38,758::momIF::91::MOM::(stop) Shutting 
 down MOM
 MainThread::DEBUG::2014-12-15 
 20:56:38,759::task::595::Storage.TaskManager.Task::(_updateState) 
 Task=`26c7680c-23e2-42bb-964c-272e778a168a`::moving from state init -> state
 preparing
 MainThread::INFO::2014-12-15 
 20:56:38,759::logUtils::44::dispatcher::(wrapper) Run and protect: 
 prepareForShutdown(options=None)
 Thread-601576::ERROR::2014-12-15 
 20:56:38,755::BindingXMLRPC::1142::vds::(wrapper) libvirt error
 Traceback (most recent call last):
   File "/usr/share/vdsm/rpc/BindingXMLRPC.py", line 1135, in wrapper
     res = f(*args, **kwargs)
   File "/usr/share/vdsm/rpc/BindingXMLRPC.py", line 463, in getCapabilities
     ret = api.getCapabilities()
   File "/usr/share/vdsm/API.py", line 1245, in getCapabilities
     c = caps.get()
   File "/usr/share/vdsm/caps.py", line 615, in get
     caps.update(netinfo.get())
   File "/usr/lib/python2.7/site-packages/vdsm/netinfo.py", line 812, in get
     nets = networks()
   File "/usr/lib/python2.7/site-packages/vdsm/netinfo.py", line 119, in networks
     allNets = ((net, net.name()) for net in conn.listAllNetworks(0))
   File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 129, in wrapper
     __connections.get(id(target)).pingLibvirt()
   File "/usr/lib64/python2.7/site-packages/libvirt.py", line 3642, in getLibVersion
     if ret == -1: raise libvirtError ('virConnectGetLibVersion() failed', conn=self)
 libvirtError: internal error: client socket is closed