Re: [ovirt-users] hosted engine health check issues

2014-05-06 Thread Jiri Moskovcak
 and
 create a new vm with virt-manager which
loads ovirtmgmt
   disk. I
 could
 reach my engine over the ovirtmgmt bridge
(so bridge must be
 working).

 I also started libvirtd with Option -v and I
saw the
   following
 in
 libvirtd.log when trying to start ovirt engine:
 2014-04-22 14:18:25.432+: 8901: debug :
 virCommandRunAsync:2250 :
 Command result 0, with PID 11491
 2014-04-22 14:18:25.478+: 8901: debug :
   virCommandRun:2045 :
   Result
 exit status 255, stdout: '' stderr:
'iptables v1.4.7: goto
 'FO-vnet0'
   is
 not a chain

 So it could be that something is broken in
my hosted-engine
 network.
   Do
 you have any clue how I can troubleshoot this?


 Thanks,
 René



 - Original Message -
 From: René Koch rk...@linuxland.at
mailto:rk...@linuxland.at
 To: Martin Sivak msi...@redhat.com
mailto:msi...@redhat.com
 Cc: users@ovirt.org mailto:users@ovirt.org
 Sent: Tuesday, April 22, 2014 1:46:38 PM
 Subject: Re: [ovirt-users] hosted engine
health check
   issues

 Hi,

 I rebooted one of my ovirt hosts today and
the result is
   now
 that I
 can't start hosted-engine anymore.

 ovirt-ha-agent isn't running because the
lockspace file is
 missing
 (sanlock complains about it).
 So I tried to start hosted-engine with
--vm-start and I
   get
 the
 following errors:

 == /var/log/sanlock.log ==
 2014-04-22 12:38:17+0200 654 [3093]: r2
cmd_acquire
   2,9,5733
 invalid
 lockspace found -1 failed 0 name
   2851af27-8744-445d-9fb1-a0d083c8dc82

 == /var/log/messages ==
 Apr 22 12:38:17 ovirt-host02
sanlock[3079]: 2014-04-22
   12:38:17+0200 654
 [3093]: r2 cmd_acquire 2,9,5733 invalid
lockspace found -1
 failed 0
   name
 2851af27-8744-445d-9fb1-a0d083c8dc82
 Apr 22 12:38:17 ovirt-host02 kernel:
ovirtmgmt: port
   2(vnet0)
   entering
 disabled state
 Apr 22 12:38:17 ovirt-host02 kernel:
device vnet0 left
 promiscuous
   mode
 Apr 22 12:38:17 ovirt-host02 kernel:
ovirtmgmt: port
   2(vnet0)
   entering
 disabled state

 == /var/log/vdsm/vdsm.log ==
 Thread-21::DEBUG::2014-04-22

12:38:17,563::libvirtconnection::124::root::(wrapper)
   Unknown
 libvirterror: ecode: 38 edom: 42 level: 2
message: Failed
   to
 acquire
 lock: No space left on device
 Thread-21::DEBUG::2014-04-22

12:38:17,563::vm::2263::vm.Vm::(_startUnderlyingVm)

   vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::_ongoingCreations
   released
 Thread-21::ERROR::2014-04-22

12:38:17,564::vm::2289::vm.Vm::(_startUnderlyingVm)

vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::The vm start
 process
   failed
 Traceback (most recent call last):
  File /usr/share/vdsm/vm.py, line
2249, in
 _startUnderlyingVm
self._run()
  File /usr/share/vdsm/vm.py, line
3170, in _run
self._connection.createXML(domxml,
flags),
  File


  /usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py,
 line 92, in wrapper
ret = f(*args, **kwargs)
  File
/usr/lib64/python2.6/site-packages/libvirt.py,
 line
   2665, in
 createXML
if ret is None:raise
   libvirtError('virDomainCreateXML()
   failed',
 conn=self)
 libvirtError: Failed to acquire lock: No
space left

Re: [ovirt-users] hosted engine health check issues

2014-04-28 Thread Martin Sivak
network.
  Do
you have any clue how I can troubleshoot this?
   
   
Thanks,
René
   
   
   
- Original Message -
From: René Koch rk...@linuxland.at
To: Martin Sivak msi...@redhat.com
Cc: users@ovirt.org
Sent: Tuesday, April 22, 2014 1:46:38 PM
Subject: Re: [ovirt-users] hosted engine health check
  issues
   
Hi,
   
I rebooted one of my ovirt hosts today and the result is
  now
that I
can't start hosted-engine anymore.
   
ovirt-ha-agent isn't running because the lockspace file is
missing
(sanlock complains about it).
So I tried to start hosted-engine with --vm-start and I
  get
the
following errors:
   
== /var/log/sanlock.log ==
2014-04-22 12:38:17+0200 654 [3093]: r2 cmd_acquire
  2,9,5733
invalid
lockspace found -1 failed 0 name
  2851af27-8744-445d-9fb1-a0d083c8dc82
   
== /var/log/messages ==
Apr 22 12:38:17 ovirt-host02 sanlock[3079]: 2014-04-22
  12:38:17+0200 654
[3093]: r2 cmd_acquire 2,9,5733 invalid lockspace found -1
failed 0
  name
2851af27-8744-445d-9fb1-a0d083c8dc82
Apr 22 12:38:17 ovirt-host02 kernel: ovirtmgmt: port
  2(vnet0)
  entering
disabled state
Apr 22 12:38:17 ovirt-host02 kernel: device vnet0 left
promiscuous
  mode
Apr 22 12:38:17 ovirt-host02 kernel: ovirtmgmt: port
  2(vnet0)
  entering
disabled state
   
== /var/log/vdsm/vdsm.log ==
Thread-21::DEBUG::2014-04-22
12:38:17,563::libvirtconnection::124::root::(wrapper)
  Unknown
libvirterror: ecode: 38 edom: 42 level: 2 message: Failed
  to
acquire
lock: No space left on device
Thread-21::DEBUG::2014-04-22
12:38:17,563::vm::2263::vm.Vm::(_startUnderlyingVm)
   
  vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::_ongoingCreations
  released
Thread-21::ERROR::2014-04-22
12:38:17,564::vm::2289::vm.Vm::(_startUnderlyingVm)
vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::The vm start
process
  failed
Traceback (most recent call last):
 File /usr/share/vdsm/vm.py, line 2249, in
_startUnderlyingVm
   self._run()
 File /usr/share/vdsm/vm.py, line 3170, in _run
   self._connection.createXML(domxml, flags),
 File
   
 /usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py,
line 92, in wrapper
   ret = f(*args, **kwargs)
 File /usr/lib64/python2.6/site-packages/libvirt.py,
line
  2665, in
createXML
   if ret is None:raise
  libvirtError('virDomainCreateXML()
  failed',
conn=self)
libvirtError: Failed to acquire lock: No space left on
  device
   
== /var/log/messages ==
Apr 22 12:38:17 ovirt-host02 vdsm vm.Vm ERROR
vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::The vm start
process
failed#012Traceback (most recent call last):#012  File
/usr/share/vdsm/vm.py, line 2249, in
  _startUnderlyingVm#012
self._run()#012  File /usr/share/vdsm/vm.py, line 3170,
  in
  _run#012
self._connection.createXML(domxml, flags),#012  File
   
/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py,
  line 92,
in wrapper#012ret = f(*args, **kwargs)#012  File
/usr/lib64/python2.6/site-packages/libvirt.py, line
  2665, in
createXML#012if ret is None:raise
  libvirtError('virDomainCreateXML()
failed', conn=self)#012libvirtError: Failed to acquire
  lock:
No
  space
left on device
   
== /var/log/vdsm/vdsm.log ==
Thread-21::DEBUG::2014-04-22
12:38:17,569::vm::2731::vm.Vm::(setDownStatus)
vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::Changed
  state to
Down:
Failed to acquire lock: No space left on device
   
   
No space left on device is nonsense as there is enough
  space
(I had
  this
issue last time as well where I had to patch machine.py,
  but
this
  file
is now Python 2.6.6 compatible.
   
Any idea what prevents hosted-engine from starting?
ovirt-ha-broker, vdsmd and sanlock are running btw.
   
Btw, I can see in log that json rpc server module is
  missing
- which
package is required for CentOS 6.5?
Apr 22 12:37:14 ovirt-host02 vdsm vds WARNING Unable to
  load
the
  json
rpc server module. Please make sure it is installed.
   
   
Thanks,
René
   
   
   
On 04/17/2014 10:02 AM, Martin Sivak wrote:
Hi,
   
How can I disable notifications?
   
The notification is configured in
/etc/ovirt-hosted

Re: [ovirt-users] hosted engine health check issues

2014-04-25 Thread Martin Sivak
 bridge?
  as long as it is down vdsm may have issues starting up properly
  and this is why you see the complaints on the rpc server.
 
  Can you try manually fixing the network part first and then
  restart vdsm?
  Once vdsm is happy hosted engine VM will start.
 
  Thanks for your feedback, Doron.
 
  My ovirtmgmt bridge seems to be on or isn't it:
  # brctl show ovirtmgmt
  bridge namebridge id   STP enabled
  interfaces
  ovirtmgmt  8000.0025907587c2   no
   eth0.200
 
  # ip a s ovirtmgmt
  7: ovirtmgmt: BROADCAST,MULTICAST,UP,LOWER_UP mtu 1500 qdisc
  noqueue
  state UNKNOWN
link/ether 00:25:90:75:87:c2 brd ff:ff:ff:ff:ff:ff
inet 10.0.200.102/24 brd 10.0.200.255 scope global
  ovirtmgmt
inet6 fe80::225:90ff:fe75:87c2/64 scope link
   valid_lft forever preferred_lft forever
 
  # ip a s eth0.200
  6: eth0.200@eth0: BROADCAST,MULTICAST,UP,LOWER_UP mtu 1500
  qdisc
  noqueue state UP
link/ether 00:25:90:75:87:c2 brd ff:ff:ff:ff:ff:ff
inet6 fe80::225:90ff:fe75:87c2/64 scope link
   valid_lft forever preferred_lft forever
 
  I tried the following yesterday:
  Copy virtual disk from GlusterFS storage to local disk of host
  and
  create a new vm with virt-manager which loads ovirtmgmt disk. I
  could
  reach my engine over the ovirtmgmt bridge (so bridge must be
  working).
 
  I also started libvirtd with Option -v and I saw the following
  in
  libvirtd.log when trying to start ovirt engine:
  2014-04-22 14:18:25.432+: 8901: debug :
  virCommandRunAsync:2250 :
  Command result 0, with PID 11491
  2014-04-22 14:18:25.478+: 8901: debug : virCommandRun:2045 :
Result
  exit status 255, stdout: '' stderr: 'iptables v1.4.7: goto
  'FO-vnet0'
is
  not a chain
 
  So it could be that something is broken in my hosted-engine
  network.
Do
  you have any clue how I can troubleshoot this?
 
 
  Thanks,
  René
 
 
 
  - Original Message -
  From: René Koch rk...@linuxland.at
  To: Martin Sivak msi...@redhat.com
  Cc: users@ovirt.org
  Sent: Tuesday, April 22, 2014 1:46:38 PM
  Subject: Re: [ovirt-users] hosted engine health check issues
 
  Hi,
 
  I rebooted one of my ovirt hosts today and the result is now
  that I
  can't start hosted-engine anymore.
 
  ovirt-ha-agent isn't running because the lockspace file is
  missing
  (sanlock complains about it).
  So I tried to start hosted-engine with --vm-start and I get
  the
  following errors:
 
  == /var/log/sanlock.log ==
  2014-04-22 12:38:17+0200 654 [3093]: r2 cmd_acquire 2,9,5733
  invalid
  lockspace found -1 failed 0 name
2851af27-8744-445d-9fb1-a0d083c8dc82
 
  == /var/log/messages ==
  Apr 22 12:38:17 ovirt-host02 sanlock[3079]: 2014-04-22
12:38:17+0200 654
  [3093]: r2 cmd_acquire 2,9,5733 invalid lockspace found -1
  failed 0
name
  2851af27-8744-445d-9fb1-a0d083c8dc82
  Apr 22 12:38:17 ovirt-host02 kernel: ovirtmgmt: port 2(vnet0)
entering
  disabled state
  Apr 22 12:38:17 ovirt-host02 kernel: device vnet0 left
  promiscuous
mode
  Apr 22 12:38:17 ovirt-host02 kernel: ovirtmgmt: port 2(vnet0)
entering
  disabled state
 
  == /var/log/vdsm/vdsm.log ==
  Thread-21::DEBUG::2014-04-22
  12:38:17,563::libvirtconnection::124::root::(wrapper) Unknown
  libvirterror: ecode: 38 edom: 42 level: 2 message: Failed to
  acquire
  lock: No space left on device
  Thread-21::DEBUG::2014-04-22
  12:38:17,563::vm::2263::vm.Vm::(_startUnderlyingVm)
  vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::_ongoingCreations
released
  Thread-21::ERROR::2014-04-22
  12:38:17,564::vm::2289::vm.Vm::(_startUnderlyingVm)
  vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::The vm start
  process
failed
  Traceback (most recent call last):
   File /usr/share/vdsm/vm.py, line 2249, in
  _startUnderlyingVm
 self._run()
   File /usr/share/vdsm/vm.py, line 3170, in _run
 self._connection.createXML(domxml, flags),
   File
 
   /usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py,
  line 92, in wrapper
 ret = f(*args, **kwargs)
   File /usr/lib64/python2.6/site-packages/libvirt.py,
  line
2665, in
  createXML
 if ret is None:raise libvirtError('virDomainCreateXML()
failed',
  conn=self)
  libvirtError: Failed to acquire lock: No space left on device
 
  == /var/log/messages ==
  Apr 22 12:38:17 ovirt-host02 vdsm vm.Vm ERROR
  vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::The vm start
  process
  failed#012Traceback (most recent call last):#012  File

Re: [ovirt-users] hosted engine health check issues

2014-04-23 Thread René Koch

On 04/23/2014 12:28 AM, Doron Fediuck wrote:

Hi Rene,
any idea what closed your ovirtmgmt bridge?
as long as it is down vdsm may have issues starting up properly
and this is why you see the complaints on the rpc server.

Can you try manually fixing the network part first and then
restart vdsm?
Once vdsm is happy hosted engine VM will start.


Thanks for your feedback, Doron.

My ovirtmgmt bridge seems to be on or isn't it:
# brctl show ovirtmgmt
bridge name bridge id   STP enabled interfaces
ovirtmgmt   8000.0025907587c2   no  eth0.200

# ip a s ovirtmgmt
7: ovirtmgmt: BROADCAST,MULTICAST,UP,LOWER_UP mtu 1500 qdisc noqueue 
state UNKNOWN

link/ether 00:25:90:75:87:c2 brd ff:ff:ff:ff:ff:ff
inet 10.0.200.102/24 brd 10.0.200.255 scope global ovirtmgmt
inet6 fe80::225:90ff:fe75:87c2/64 scope link
   valid_lft forever preferred_lft forever

# ip a s eth0.200
6: eth0.200@eth0: BROADCAST,MULTICAST,UP,LOWER_UP mtu 1500 qdisc 
noqueue state UP

link/ether 00:25:90:75:87:c2 brd ff:ff:ff:ff:ff:ff
inet6 fe80::225:90ff:fe75:87c2/64 scope link
   valid_lft forever preferred_lft forever

I tried the following yesterday:
Copy virtual disk from GlusterFS storage to local disk of host and 
create a new vm with virt-manager which loads ovirtmgmt disk. I could 
reach my engine over the ovirtmgmt bridge (so bridge must be working).


I also started libvirtd with Option -v and I saw the following in 
libvirtd.log when trying to start ovirt engine:
2014-04-22 14:18:25.432+: 8901: debug : virCommandRunAsync:2250 : 
Command result 0, with PID 11491
2014-04-22 14:18:25.478+: 8901: debug : virCommandRun:2045 : Result 
exit status 255, stdout: '' stderr: 'iptables v1.4.7: goto 'FO-vnet0' is 
not a chain


So it could be that something is broken in my hosted-engine network. Do 
you have any clue how I can troubleshoot this?



Thanks,
René




- Original Message -

From: René Koch rk...@linuxland.at
To: Martin Sivak msi...@redhat.com
Cc: users@ovirt.org
Sent: Tuesday, April 22, 2014 1:46:38 PM
Subject: Re: [ovirt-users] hosted engine health check issues

Hi,

I rebooted one of my ovirt hosts today and the result is now that I
can't start hosted-engine anymore.

ovirt-ha-agent isn't running because the lockspace file is missing
(sanlock complains about it).
So I tried to start hosted-engine with --vm-start and I get the
following errors:

== /var/log/sanlock.log ==
2014-04-22 12:38:17+0200 654 [3093]: r2 cmd_acquire 2,9,5733 invalid
lockspace found -1 failed 0 name 2851af27-8744-445d-9fb1-a0d083c8dc82

== /var/log/messages ==
Apr 22 12:38:17 ovirt-host02 sanlock[3079]: 2014-04-22 12:38:17+0200 654
[3093]: r2 cmd_acquire 2,9,5733 invalid lockspace found -1 failed 0 name
2851af27-8744-445d-9fb1-a0d083c8dc82
Apr 22 12:38:17 ovirt-host02 kernel: ovirtmgmt: port 2(vnet0) entering
disabled state
Apr 22 12:38:17 ovirt-host02 kernel: device vnet0 left promiscuous mode
Apr 22 12:38:17 ovirt-host02 kernel: ovirtmgmt: port 2(vnet0) entering
disabled state

== /var/log/vdsm/vdsm.log ==
Thread-21::DEBUG::2014-04-22
12:38:17,563::libvirtconnection::124::root::(wrapper) Unknown
libvirterror: ecode: 38 edom: 42 level: 2 message: Failed to acquire
lock: No space left on device
Thread-21::DEBUG::2014-04-22
12:38:17,563::vm::2263::vm.Vm::(_startUnderlyingVm)
vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::_ongoingCreations released
Thread-21::ERROR::2014-04-22
12:38:17,564::vm::2289::vm.Vm::(_startUnderlyingVm)
vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::The vm start process failed
Traceback (most recent call last):
File /usr/share/vdsm/vm.py, line 2249, in _startUnderlyingVm
  self._run()
File /usr/share/vdsm/vm.py, line 3170, in _run
  self._connection.createXML(domxml, flags),
File /usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py,
line 92, in wrapper
  ret = f(*args, **kwargs)
File /usr/lib64/python2.6/site-packages/libvirt.py, line 2665, in
createXML
  if ret is None:raise libvirtError('virDomainCreateXML() failed',
conn=self)
libvirtError: Failed to acquire lock: No space left on device

== /var/log/messages ==
Apr 22 12:38:17 ovirt-host02 vdsm vm.Vm ERROR
vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::The vm start process
failed#012Traceback (most recent call last):#012  File
/usr/share/vdsm/vm.py, line 2249, in _startUnderlyingVm#012
self._run()#012  File /usr/share/vdsm/vm.py, line 3170, in _run#012
   self._connection.createXML(domxml, flags),#012  File
/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py, line 92,
in wrapper#012ret = f(*args, **kwargs)#012  File
/usr/lib64/python2.6/site-packages/libvirt.py, line 2665, in
createXML#012if ret is None:raise libvirtError('virDomainCreateXML()
failed', conn=self)#012libvirtError: Failed to acquire lock: No space
left on device

== /var/log/vdsm/vdsm.log ==
Thread-21::DEBUG::2014-04-22
12:38:17,569::vm::2731::vm.Vm::(setDownStatus)
vmId=`f26dd37e-13b5

Re: [ovirt-users] hosted engine health check issues

2014-04-23 Thread Martin Sivak
Hi René,

  libvirtError: Failed to acquire lock: No space left on device

  2014-04-22 12:38:17+0200 654 [3093]: r2 cmd_acquire 2,9,5733 invalid
  lockspace found -1 failed 0 name 2851af27-8744-445d-9fb1-a0d083c8dc82

Can you please check the contents of /rhev/data-center/your nfs mount/nfs 
domain uuid/ha_agent/?

This is how it should look like:

[root@dev-03 ~]# ls -al 
/rhev/data-center/mnt/euryale\:_home_ovirt_he/e16de6a2-53f5-4ab3-95a3-255d08398824/ha_agent/
total 2036
drwxr-x---. 2 vdsm kvm4096 Mar 19 18:46 .
drwxr-xr-x. 6 vdsm kvm4096 Mar 19 18:46 ..
-rw-rw. 1 vdsm kvm 1048576 Apr 23 11:05 hosted-engine.lockspace
-rw-rw. 1 vdsm kvm 1028096 Mar 19 18:46 hosted-engine.metadata

The errors seem to indicate that you somehow lost the lockspace file.

--
Martin Sivák
msi...@redhat.com
Red Hat Czech
RHEV-M SLA / Brno, CZ

- Original Message -
 On 04/23/2014 12:28 AM, Doron Fediuck wrote:
  Hi Rene,
  any idea what closed your ovirtmgmt bridge?
  as long as it is down vdsm may have issues starting up properly
  and this is why you see the complaints on the rpc server.
 
  Can you try manually fixing the network part first and then
  restart vdsm?
  Once vdsm is happy hosted engine VM will start.
 
 Thanks for your feedback, Doron.
 
 My ovirtmgmt bridge seems to be on or isn't it:
 # brctl show ovirtmgmt
 bridge name   bridge id   STP enabled interfaces
 ovirtmgmt 8000.0025907587c2   no  eth0.200
 
 # ip a s ovirtmgmt
 7: ovirtmgmt: BROADCAST,MULTICAST,UP,LOWER_UP mtu 1500 qdisc noqueue
 state UNKNOWN
  link/ether 00:25:90:75:87:c2 brd ff:ff:ff:ff:ff:ff
  inet 10.0.200.102/24 brd 10.0.200.255 scope global ovirtmgmt
  inet6 fe80::225:90ff:fe75:87c2/64 scope link
 valid_lft forever preferred_lft forever
 
 # ip a s eth0.200
 6: eth0.200@eth0: BROADCAST,MULTICAST,UP,LOWER_UP mtu 1500 qdisc
 noqueue state UP
  link/ether 00:25:90:75:87:c2 brd ff:ff:ff:ff:ff:ff
  inet6 fe80::225:90ff:fe75:87c2/64 scope link
 valid_lft forever preferred_lft forever
 
 I tried the following yesterday:
 Copy virtual disk from GlusterFS storage to local disk of host and
 create a new vm with virt-manager which loads ovirtmgmt disk. I could
 reach my engine over the ovirtmgmt bridge (so bridge must be working).
 
 I also started libvirtd with Option -v and I saw the following in
 libvirtd.log when trying to start ovirt engine:
 2014-04-22 14:18:25.432+: 8901: debug : virCommandRunAsync:2250 :
 Command result 0, with PID 11491
 2014-04-22 14:18:25.478+: 8901: debug : virCommandRun:2045 : Result
 exit status 255, stdout: '' stderr: 'iptables v1.4.7: goto 'FO-vnet0' is
 not a chain
 
 So it could be that something is broken in my hosted-engine network. Do
 you have any clue how I can troubleshoot this?
 
 
 Thanks,
 René
 
 
 
  - Original Message -
  From: René Koch rk...@linuxland.at
  To: Martin Sivak msi...@redhat.com
  Cc: users@ovirt.org
  Sent: Tuesday, April 22, 2014 1:46:38 PM
  Subject: Re: [ovirt-users] hosted engine health check issues
 
  Hi,
 
  I rebooted one of my ovirt hosts today and the result is now that I
  can't start hosted-engine anymore.
 
  ovirt-ha-agent isn't running because the lockspace file is missing
  (sanlock complains about it).
  So I tried to start hosted-engine with --vm-start and I get the
  following errors:
 
  == /var/log/sanlock.log ==
  2014-04-22 12:38:17+0200 654 [3093]: r2 cmd_acquire 2,9,5733 invalid
  lockspace found -1 failed 0 name 2851af27-8744-445d-9fb1-a0d083c8dc82
 
  == /var/log/messages ==
  Apr 22 12:38:17 ovirt-host02 sanlock[3079]: 2014-04-22 12:38:17+0200 654
  [3093]: r2 cmd_acquire 2,9,5733 invalid lockspace found -1 failed 0 name
  2851af27-8744-445d-9fb1-a0d083c8dc82
  Apr 22 12:38:17 ovirt-host02 kernel: ovirtmgmt: port 2(vnet0) entering
  disabled state
  Apr 22 12:38:17 ovirt-host02 kernel: device vnet0 left promiscuous mode
  Apr 22 12:38:17 ovirt-host02 kernel: ovirtmgmt: port 2(vnet0) entering
  disabled state
 
  == /var/log/vdsm/vdsm.log ==
  Thread-21::DEBUG::2014-04-22
  12:38:17,563::libvirtconnection::124::root::(wrapper) Unknown
  libvirterror: ecode: 38 edom: 42 level: 2 message: Failed to acquire
  lock: No space left on device
  Thread-21::DEBUG::2014-04-22
  12:38:17,563::vm::2263::vm.Vm::(_startUnderlyingVm)
  vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::_ongoingCreations released
  Thread-21::ERROR::2014-04-22
  12:38:17,564::vm::2289::vm.Vm::(_startUnderlyingVm)
  vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::The vm start process failed
  Traceback (most recent call last):
  File /usr/share/vdsm/vm.py, line 2249, in _startUnderlyingVm
self._run()
  File /usr/share/vdsm/vm.py, line 3170, in _run
self._connection.createXML(domxml, flags),
  File /usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py,
  line 92, in wrapper
ret = f(*args, **kwargs)
  File /usr/lib64/python2.6/site-packages

Re: [ovirt-users] hosted engine health check issues

2014-04-23 Thread René Koch

On 04/23/2014 11:08 AM, Martin Sivak wrote:

Hi René,


libvirtError: Failed to acquire lock: No space left on device



2014-04-22 12:38:17+0200 654 [3093]: r2 cmd_acquire 2,9,5733 invalid
lockspace found -1 failed 0 name 2851af27-8744-445d-9fb1-a0d083c8dc82


Can you please check the contents of /rhev/data-center/your nfs mount/nfs 
domain uuid/ha_agent/?

This is how it should look like:

[root@dev-03 ~]# ls -al 
/rhev/data-center/mnt/euryale\:_home_ovirt_he/e16de6a2-53f5-4ab3-95a3-255d08398824/ha_agent/
total 2036
drwxr-x---. 2 vdsm kvm4096 Mar 19 18:46 .
drwxr-xr-x. 6 vdsm kvm4096 Mar 19 18:46 ..
-rw-rw. 1 vdsm kvm 1048576 Apr 23 11:05 hosted-engine.lockspace
-rw-rw. 1 vdsm kvm 1028096 Mar 19 18:46 hosted-engine.metadata

The errors seem to indicate that you somehow lost the lockspace file.


True :)
Isn't this file created when hosted engine is started? Or how can I 
create this file manually?




--
Martin Sivák
msi...@redhat.com
Red Hat Czech
RHEV-M SLA / Brno, CZ

- Original Message -

On 04/23/2014 12:28 AM, Doron Fediuck wrote:

Hi Rene,
any idea what closed your ovirtmgmt bridge?
as long as it is down vdsm may have issues starting up properly
and this is why you see the complaints on the rpc server.

Can you try manually fixing the network part first and then
restart vdsm?
Once vdsm is happy hosted engine VM will start.


Thanks for your feedback, Doron.

My ovirtmgmt bridge seems to be on or isn't it:
# brctl show ovirtmgmt
bridge name bridge id   STP enabled interfaces
ovirtmgmt   8000.0025907587c2   no  eth0.200

# ip a s ovirtmgmt
7: ovirtmgmt: BROADCAST,MULTICAST,UP,LOWER_UP mtu 1500 qdisc noqueue
state UNKNOWN
  link/ether 00:25:90:75:87:c2 brd ff:ff:ff:ff:ff:ff
  inet 10.0.200.102/24 brd 10.0.200.255 scope global ovirtmgmt
  inet6 fe80::225:90ff:fe75:87c2/64 scope link
 valid_lft forever preferred_lft forever

# ip a s eth0.200
6: eth0.200@eth0: BROADCAST,MULTICAST,UP,LOWER_UP mtu 1500 qdisc
noqueue state UP
  link/ether 00:25:90:75:87:c2 brd ff:ff:ff:ff:ff:ff
  inet6 fe80::225:90ff:fe75:87c2/64 scope link
 valid_lft forever preferred_lft forever

I tried the following yesterday:
Copy virtual disk from GlusterFS storage to local disk of host and
create a new vm with virt-manager which loads ovirtmgmt disk. I could
reach my engine over the ovirtmgmt bridge (so bridge must be working).

I also started libvirtd with Option -v and I saw the following in
libvirtd.log when trying to start ovirt engine:
2014-04-22 14:18:25.432+: 8901: debug : virCommandRunAsync:2250 :
Command result 0, with PID 11491
2014-04-22 14:18:25.478+: 8901: debug : virCommandRun:2045 : Result
exit status 255, stdout: '' stderr: 'iptables v1.4.7: goto 'FO-vnet0' is
not a chain

So it could be that something is broken in my hosted-engine network. Do
you have any clue how I can troubleshoot this?


Thanks,
René




- Original Message -

From: René Koch rk...@linuxland.at
To: Martin Sivak msi...@redhat.com
Cc: users@ovirt.org
Sent: Tuesday, April 22, 2014 1:46:38 PM
Subject: Re: [ovirt-users] hosted engine health check issues

Hi,

I rebooted one of my ovirt hosts today and the result is now that I
can't start hosted-engine anymore.

ovirt-ha-agent isn't running because the lockspace file is missing
(sanlock complains about it).
So I tried to start hosted-engine with --vm-start and I get the
following errors:

== /var/log/sanlock.log ==
2014-04-22 12:38:17+0200 654 [3093]: r2 cmd_acquire 2,9,5733 invalid
lockspace found -1 failed 0 name 2851af27-8744-445d-9fb1-a0d083c8dc82

== /var/log/messages ==
Apr 22 12:38:17 ovirt-host02 sanlock[3079]: 2014-04-22 12:38:17+0200 654
[3093]: r2 cmd_acquire 2,9,5733 invalid lockspace found -1 failed 0 name
2851af27-8744-445d-9fb1-a0d083c8dc82
Apr 22 12:38:17 ovirt-host02 kernel: ovirtmgmt: port 2(vnet0) entering
disabled state
Apr 22 12:38:17 ovirt-host02 kernel: device vnet0 left promiscuous mode
Apr 22 12:38:17 ovirt-host02 kernel: ovirtmgmt: port 2(vnet0) entering
disabled state

== /var/log/vdsm/vdsm.log ==
Thread-21::DEBUG::2014-04-22
12:38:17,563::libvirtconnection::124::root::(wrapper) Unknown
libvirterror: ecode: 38 edom: 42 level: 2 message: Failed to acquire
lock: No space left on device
Thread-21::DEBUG::2014-04-22
12:38:17,563::vm::2263::vm.Vm::(_startUnderlyingVm)
vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::_ongoingCreations released
Thread-21::ERROR::2014-04-22
12:38:17,564::vm::2289::vm.Vm::(_startUnderlyingVm)
vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::The vm start process failed
Traceback (most recent call last):
 File /usr/share/vdsm/vm.py, line 2249, in _startUnderlyingVm
   self._run()
 File /usr/share/vdsm/vm.py, line 3170, in _run
   self._connection.createXML(domxml, flags),
 File /usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py,
line 92, in wrapper
   ret = f(*args, **kwargs)
 File /usr/lib64

Re: [ovirt-users] hosted engine health check issues

2014-04-23 Thread Martin Sivak
Hi,

 Isn't this file created when hosted engine is started?

The file is created by the setup script. If it got lost then there was probably 
something bad happening in your NFS or Gluster storage.

 Or how can I create this file manually?

I can give you experimental treatment for this. We do not have any official way 
as this is something that should not ever happen :)

!! But before you do that make sure you do not have any nodes running properly. 
This will destroy and reinitialize the lockspace database for the whole 
hosted-engine environment (which you apparently lack, but..). !!

You have to create the ha_agent/hosted-engine.lockspace file with the expected 
size (1MB) and then tell sanlock to initialize it as a lockspace using:

# python
 import sanlock
 sanlock.write_lockspace(lockspace=hosted-engine,
... path=/rhev/data-center/mnt/nfs/hosted engine storage 
domain/ha_agent/hosted-engine.lockspace,
... offset=0)


Then try starting the services (both broker and agent) again.

--
Martin Sivák
msi...@redhat.com
Red Hat Czech
RHEV-M SLA / Brno, CZ


- Original Message -
 On 04/23/2014 11:08 AM, Martin Sivak wrote:
  Hi René,
 
  libvirtError: Failed to acquire lock: No space left on device
 
  2014-04-22 12:38:17+0200 654 [3093]: r2 cmd_acquire 2,9,5733 invalid
  lockspace found -1 failed 0 name 2851af27-8744-445d-9fb1-a0d083c8dc82
 
  Can you please check the contents of /rhev/data-center/your nfs
  mount/nfs domain uuid/ha_agent/?
 
  This is how it should look like:
 
  [root@dev-03 ~]# ls -al
  /rhev/data-center/mnt/euryale\:_home_ovirt_he/e16de6a2-53f5-4ab3-95a3-255d08398824/ha_agent/
  total 2036
  drwxr-x---. 2 vdsm kvm4096 Mar 19 18:46 .
  drwxr-xr-x. 6 vdsm kvm4096 Mar 19 18:46 ..
  -rw-rw. 1 vdsm kvm 1048576 Apr 23 11:05 hosted-engine.lockspace
  -rw-rw. 1 vdsm kvm 1028096 Mar 19 18:46 hosted-engine.metadata
 
  The errors seem to indicate that you somehow lost the lockspace file.
 
 True :)
 Isn't this file created when hosted engine is started? Or how can I
 create this file manually?
 
 
  --
  Martin Sivák
  msi...@redhat.com
  Red Hat Czech
  RHEV-M SLA / Brno, CZ
 
  - Original Message -
  On 04/23/2014 12:28 AM, Doron Fediuck wrote:
  Hi Rene,
  any idea what closed your ovirtmgmt bridge?
  as long as it is down vdsm may have issues starting up properly
  and this is why you see the complaints on the rpc server.
 
  Can you try manually fixing the network part first and then
  restart vdsm?
  Once vdsm is happy hosted engine VM will start.
 
  Thanks for your feedback, Doron.
 
  My ovirtmgmt bridge seems to be on or isn't it:
  # brctl show ovirtmgmt
  bridge namebridge id   STP enabled interfaces
  ovirtmgmt  8000.0025907587c2   no  eth0.200
 
  # ip a s ovirtmgmt
  7: ovirtmgmt: BROADCAST,MULTICAST,UP,LOWER_UP mtu 1500 qdisc noqueue
  state UNKNOWN
link/ether 00:25:90:75:87:c2 brd ff:ff:ff:ff:ff:ff
inet 10.0.200.102/24 brd 10.0.200.255 scope global ovirtmgmt
inet6 fe80::225:90ff:fe75:87c2/64 scope link
   valid_lft forever preferred_lft forever
 
  # ip a s eth0.200
  6: eth0.200@eth0: BROADCAST,MULTICAST,UP,LOWER_UP mtu 1500 qdisc
  noqueue state UP
link/ether 00:25:90:75:87:c2 brd ff:ff:ff:ff:ff:ff
inet6 fe80::225:90ff:fe75:87c2/64 scope link
   valid_lft forever preferred_lft forever
 
  I tried the following yesterday:
  Copy virtual disk from GlusterFS storage to local disk of host and
  create a new vm with virt-manager which loads ovirtmgmt disk. I could
  reach my engine over the ovirtmgmt bridge (so bridge must be working).
 
  I also started libvirtd with Option -v and I saw the following in
  libvirtd.log when trying to start ovirt engine:
  2014-04-22 14:18:25.432+: 8901: debug : virCommandRunAsync:2250 :
  Command result 0, with PID 11491
  2014-04-22 14:18:25.478+: 8901: debug : virCommandRun:2045 : Result
  exit status 255, stdout: '' stderr: 'iptables v1.4.7: goto 'FO-vnet0' is
  not a chain
 
  So it could be that something is broken in my hosted-engine network. Do
  you have any clue how I can troubleshoot this?
 
 
  Thanks,
  René
 
 
 
  - Original Message -
  From: René Koch rk...@linuxland.at
  To: Martin Sivak msi...@redhat.com
  Cc: users@ovirt.org
  Sent: Tuesday, April 22, 2014 1:46:38 PM
  Subject: Re: [ovirt-users] hosted engine health check issues
 
  Hi,
 
  I rebooted one of my ovirt hosts today and the result is now that I
  can't start hosted-engine anymore.
 
  ovirt-ha-agent isn't running because the lockspace file is missing
  (sanlock complains about it).
  So I tried to start hosted-engine with --vm-start and I get the
  following errors:
 
  == /var/log/sanlock.log ==
  2014-04-22 12:38:17+0200 654 [3093]: r2 cmd_acquire 2,9,5733 invalid
  lockspace found -1 failed 0 name 2851af27-8744-445d-9fb1-a0d083c8dc82
 
  == /var/log/messages ==
  Apr 22 12:38:17 ovirt-host02 sanlock[3079]: 2014-04

Re: [ovirt-users] hosted engine health check issues

2014-04-23 Thread Kevin Tibi
same pb. ovirt-ha-broker have 400% cpu and is defunct. I can't kill with -9.


2014-04-23 13:55 GMT+02:00 Martin Sivak msi...@redhat.com:

 Hi,

  Isn't this file created when hosted engine is started?

 The file is created by the setup script. If it got lost then there was
 probably something bad happening in your NFS or Gluster storage.

  Or how can I create this file manually?

 I can give you experimental treatment for this. We do not have any
 official way as this is something that should not ever happen :)

 !! But before you do that make sure you do not have any nodes running
 properly. This will destroy and reinitialize the lockspace database for the
 whole hosted-engine environment (which you apparently lack, but..). !!

 You have to create the ha_agent/hosted-engine.lockspace file with the
 expected size (1MB) and then tell sanlock to initialize it as a lockspace
 using:

 # python
  import sanlock
  sanlock.write_lockspace(lockspace=hosted-engine,
 ... path=/rhev/data-center/mnt/nfs/hosted engine storage
 domain/ha_agent/hosted-engine.lockspace,
 ... offset=0)
 

 Then try starting the services (both broker and agent) again.

 --
 Martin Sivák
 msi...@redhat.com
 Red Hat Czech
 RHEV-M SLA / Brno, CZ


 - Original Message -
  On 04/23/2014 11:08 AM, Martin Sivak wrote:
   Hi René,
  
   libvirtError: Failed to acquire lock: No space left on device
  
   2014-04-22 12:38:17+0200 654 [3093]: r2 cmd_acquire 2,9,5733 invalid
   lockspace found -1 failed 0 name
 2851af27-8744-445d-9fb1-a0d083c8dc82
  
   Can you please check the contents of /rhev/data-center/your nfs
   mount/nfs domain uuid/ha_agent/?
  
   This is how it should look like:
  
   [root@dev-03 ~]# ls -al
  
 /rhev/data-center/mnt/euryale\:_home_ovirt_he/e16de6a2-53f5-4ab3-95a3-255d08398824/ha_agent/
   total 2036
   drwxr-x---. 2 vdsm kvm4096 Mar 19 18:46 .
   drwxr-xr-x. 6 vdsm kvm4096 Mar 19 18:46 ..
   -rw-rw. 1 vdsm kvm 1048576 Apr 23 11:05 hosted-engine.lockspace
   -rw-rw. 1 vdsm kvm 1028096 Mar 19 18:46 hosted-engine.metadata
  
   The errors seem to indicate that you somehow lost the lockspace file.
 
  True :)
  Isn't this file created when hosted engine is started? Or how can I
  create this file manually?
 
  
   --
   Martin Sivák
   msi...@redhat.com
   Red Hat Czech
   RHEV-M SLA / Brno, CZ
  
   - Original Message -
   On 04/23/2014 12:28 AM, Doron Fediuck wrote:
   Hi Rene,
   any idea what closed your ovirtmgmt bridge?
   as long as it is down vdsm may have issues starting up properly
   and this is why you see the complaints on the rpc server.
  
   Can you try manually fixing the network part first and then
   restart vdsm?
   Once vdsm is happy hosted engine VM will start.
  
   Thanks for your feedback, Doron.
  
   My ovirtmgmt bridge seems to be on or isn't it:
   # brctl show ovirtmgmt
   bridge namebridge id   STP enabled interfaces
   ovirtmgmt  8000.0025907587c2   no  eth0.200
  
   # ip a s ovirtmgmt
   7: ovirtmgmt: BROADCAST,MULTICAST,UP,LOWER_UP mtu 1500 qdisc noqueue
   state UNKNOWN
 link/ether 00:25:90:75:87:c2 brd ff:ff:ff:ff:ff:ff
 inet 10.0.200.102/24 brd 10.0.200.255 scope global ovirtmgmt
 inet6 fe80::225:90ff:fe75:87c2/64 scope link
valid_lft forever preferred_lft forever
  
   # ip a s eth0.200
   6: eth0.200@eth0: BROADCAST,MULTICAST,UP,LOWER_UP mtu 1500 qdisc
   noqueue state UP
 link/ether 00:25:90:75:87:c2 brd ff:ff:ff:ff:ff:ff
 inet6 fe80::225:90ff:fe75:87c2/64 scope link
valid_lft forever preferred_lft forever
  
   I tried the following yesterday:
   Copy virtual disk from GlusterFS storage to local disk of host and
   create a new vm with virt-manager which loads ovirtmgmt disk. I could
   reach my engine over the ovirtmgmt bridge (so bridge must be working).
  
   I also started libvirtd with Option -v and I saw the following in
   libvirtd.log when trying to start ovirt engine:
   2014-04-22 14:18:25.432+: 8901: debug : virCommandRunAsync:2250 :
   Command result 0, with PID 11491
   2014-04-22 14:18:25.478+: 8901: debug : virCommandRun:2045 :
 Result
   exit status 255, stdout: '' stderr: 'iptables v1.4.7: goto 'FO-vnet0'
 is
   not a chain
  
   So it could be that something is broken in my hosted-engine network.
 Do
   you have any clue how I can troubleshoot this?
  
  
   Thanks,
   René
  
  
  
   - Original Message -
   From: René Koch rk...@linuxland.at
   To: Martin Sivak msi...@redhat.com
   Cc: users@ovirt.org
   Sent: Tuesday, April 22, 2014 1:46:38 PM
   Subject: Re: [ovirt-users] hosted engine health check issues
  
   Hi,
  
   I rebooted one of my ovirt hosts today and the result is now that I
   can't start hosted-engine anymore.
  
   ovirt-ha-agent isn't running because the lockspace file is missing
   (sanlock complains about it).
   So I tried to start hosted-engine with --vm-start and I get

Re: [ovirt-users] hosted engine health check issues

2014-04-23 Thread Martin Sivak
 this?
   
   
Thanks,
René
   
   
   
- Original Message -
From: René Koch rk...@linuxland.at
To: Martin Sivak msi...@redhat.com
Cc: users@ovirt.org
Sent: Tuesday, April 22, 2014 1:46:38 PM
Subject: Re: [ovirt-users] hosted engine health check issues
   
Hi,
   
I rebooted one of my ovirt hosts today and the result is now that I
can't start hosted-engine anymore.
   
ovirt-ha-agent isn't running because the lockspace file is missing
(sanlock complains about it).
So I tried to start hosted-engine with --vm-start and I get the
following errors:
   
== /var/log/sanlock.log ==
2014-04-22 12:38:17+0200 654 [3093]: r2 cmd_acquire 2,9,5733 invalid
lockspace found -1 failed 0 name
  2851af27-8744-445d-9fb1-a0d083c8dc82
   
== /var/log/messages ==
Apr 22 12:38:17 ovirt-host02 sanlock[3079]: 2014-04-22
  12:38:17+0200 654
[3093]: r2 cmd_acquire 2,9,5733 invalid lockspace found -1 failed 0
  name
2851af27-8744-445d-9fb1-a0d083c8dc82
Apr 22 12:38:17 ovirt-host02 kernel: ovirtmgmt: port 2(vnet0)
  entering
disabled state
Apr 22 12:38:17 ovirt-host02 kernel: device vnet0 left promiscuous
  mode
Apr 22 12:38:17 ovirt-host02 kernel: ovirtmgmt: port 2(vnet0)
  entering
disabled state
   
== /var/log/vdsm/vdsm.log ==
Thread-21::DEBUG::2014-04-22
12:38:17,563::libvirtconnection::124::root::(wrapper) Unknown
libvirterror: ecode: 38 edom: 42 level: 2 message: Failed to acquire
lock: No space left on device
Thread-21::DEBUG::2014-04-22
12:38:17,563::vm::2263::vm.Vm::(_startUnderlyingVm)
vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::_ongoingCreations
  released
Thread-21::ERROR::2014-04-22
12:38:17,564::vm::2289::vm.Vm::(_startUnderlyingVm)
vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::The vm start process
  failed
Traceback (most recent call last):
 File /usr/share/vdsm/vm.py, line 2249, in _startUnderlyingVm
   self._run()
 File /usr/share/vdsm/vm.py, line 3170, in _run
   self._connection.createXML(domxml, flags),
 File
 /usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py,
line 92, in wrapper
   ret = f(*args, **kwargs)
 File /usr/lib64/python2.6/site-packages/libvirt.py, line
  2665, in
createXML
   if ret is None:raise libvirtError('virDomainCreateXML()
  failed',
conn=self)
libvirtError: Failed to acquire lock: No space left on device
   
== /var/log/messages ==
Apr 22 12:38:17 ovirt-host02 vdsm vm.Vm ERROR
vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::The vm start process
failed#012Traceback (most recent call last):#012  File
/usr/share/vdsm/vm.py, line 2249, in _startUnderlyingVm#012
self._run()#012  File /usr/share/vdsm/vm.py, line 3170, in
  _run#012
self._connection.createXML(domxml, flags),#012  File
/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py,
  line 92,
in wrapper#012ret = f(*args, **kwargs)#012  File
/usr/lib64/python2.6/site-packages/libvirt.py, line 2665, in
createXML#012if ret is None:raise
  libvirtError('virDomainCreateXML()
failed', conn=self)#012libvirtError: Failed to acquire lock: No
  space
left on device
   
== /var/log/vdsm/vdsm.log ==
Thread-21::DEBUG::2014-04-22
12:38:17,569::vm::2731::vm.Vm::(setDownStatus)
vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::Changed state to Down:
Failed to acquire lock: No space left on device
   
   
No space left on device is nonsense as there is enough space (I had
  this
issue last time as well where I had to patch machine.py, but this
  file
is now Python 2.6.6 compatible.
   
Any idea what prevents hosted-engine from starting?
ovirt-ha-broker, vdsmd and sanlock are running btw.
   
Btw, I can see in log that json rpc server module is missing - which
package is required for CentOS 6.5?
Apr 22 12:37:14 ovirt-host02 vdsm vds WARNING Unable to load the
  json
rpc server module. Please make sure it is installed.
   
   
Thanks,
René
   
   
   
On 04/17/2014 10:02 AM, Martin Sivak wrote:
Hi,
   
How can I disable notifications?
   
The notification is configured in
/etc/ovirt-hosted-engine-ha/broker.conf
section notification.
The email is sent when the key state_transition exists and the
  string
OldState-NewState contains the (case insensitive) regexp from the
value.
   
Is it intended to send out these messages and detect that ovirt
engine
is down (which is false anyway), but not to restart the vm?
   
Forget about emails for now and check the
/var/log/ovirt-hosted-engine-ha/agent.log and broker.log (and
  attach
them
as well btw).
   
oVirt hosts think that hosted engine is down because it seems
  that
hosts
can't write to hosted-engine.lockspace due to glusterfs issues
  (or
at
least I think so

Re: [ovirt-users] hosted engine health check issues

2014-04-22 Thread René Koch

Hi,

I rebooted one of my ovirt hosts today and the result is now that I 
can't start hosted-engine anymore.


ovirt-ha-agent isn't running because the lockspace file is missing 
(sanlock complains about it).
So I tried to start hosted-engine with --vm-start and I get the 
following errors:


== /var/log/sanlock.log ==
2014-04-22 12:38:17+0200 654 [3093]: r2 cmd_acquire 2,9,5733 invalid 
lockspace found -1 failed 0 name 2851af27-8744-445d-9fb1-a0d083c8dc82


== /var/log/messages ==
Apr 22 12:38:17 ovirt-host02 sanlock[3079]: 2014-04-22 12:38:17+0200 654 
[3093]: r2 cmd_acquire 2,9,5733 invalid lockspace found -1 failed 0 name 
2851af27-8744-445d-9fb1-a0d083c8dc82
Apr 22 12:38:17 ovirt-host02 kernel: ovirtmgmt: port 2(vnet0) entering 
disabled state

Apr 22 12:38:17 ovirt-host02 kernel: device vnet0 left promiscuous mode
Apr 22 12:38:17 ovirt-host02 kernel: ovirtmgmt: port 2(vnet0) entering 
disabled state


== /var/log/vdsm/vdsm.log ==
Thread-21::DEBUG::2014-04-22 
12:38:17,563::libvirtconnection::124::root::(wrapper) Unknown 
libvirterror: ecode: 38 edom: 42 level: 2 message: Failed to acquire 
lock: No space left on device
Thread-21::DEBUG::2014-04-22 
12:38:17,563::vm::2263::vm.Vm::(_startUnderlyingVm) 
vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::_ongoingCreations released
Thread-21::ERROR::2014-04-22 
12:38:17,564::vm::2289::vm.Vm::(_startUnderlyingVm) 
vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::The vm start process failed

Traceback (most recent call last):
  File /usr/share/vdsm/vm.py, line 2249, in _startUnderlyingVm
self._run()
  File /usr/share/vdsm/vm.py, line 3170, in _run
self._connection.createXML(domxml, flags),
  File /usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py, 
line 92, in wrapper

ret = f(*args, **kwargs)
  File /usr/lib64/python2.6/site-packages/libvirt.py, line 2665, in 
createXML
if ret is None:raise libvirtError('virDomainCreateXML() failed', 
conn=self)

libvirtError: Failed to acquire lock: No space left on device

== /var/log/messages ==
Apr 22 12:38:17 ovirt-host02 vdsm vm.Vm ERROR 
vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::The vm start process 
failed#012Traceback (most recent call last):#012  File 
/usr/share/vdsm/vm.py, line 2249, in _startUnderlyingVm#012 
self._run()#012  File /usr/share/vdsm/vm.py, line 3170, in _run#012 
 self._connection.createXML(domxml, flags),#012  File 
/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py, line 92, 
in wrapper#012ret = f(*args, **kwargs)#012  File 
/usr/lib64/python2.6/site-packages/libvirt.py, line 2665, in 
createXML#012if ret is None:raise libvirtError('virDomainCreateXML() 
failed', conn=self)#012libvirtError: Failed to acquire lock: No space 
left on device


== /var/log/vdsm/vdsm.log ==
Thread-21::DEBUG::2014-04-22 
12:38:17,569::vm::2731::vm.Vm::(setDownStatus) 
vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::Changed state to Down: 
Failed to acquire lock: No space left on device



No space left on device is nonsense as there is enough space (I had this 
issue last time as well where I had to patch machine.py, but this file 
is now Python 2.6.6 compatible.


Any idea what prevents hosted-engine from starting?
ovirt-ha-broker, vdsmd and sanlock are running btw.

Btw, I can see in log that json rpc server module is missing - which 
package is required for CentOS 6.5?
Apr 22 12:37:14 ovirt-host02 vdsm vds WARNING Unable to load the json 
rpc server module. Please make sure it is installed.



Thanks,
René



On 04/17/2014 10:02 AM, Martin Sivak wrote:

Hi,


How can I disable notifications?


The notification is configured in /etc/ovirt-hosted-engine-ha/broker.conf 
section notification.
The email is sent when the key state_transition exists and the string 
OldState-NewState contains the (case insensitive) regexp from the value.


Is it intended to send out these messages and detect that ovirt engine
is down (which is false anyway), but not to restart the vm?


Forget about emails for now and check the 
/var/log/ovirt-hosted-engine-ha/agent.log and broker.log (and attach them as 
well btw).


oVirt hosts think that hosted engine is down because it seems that hosts
can't write to hosted-engine.lockspace due to glusterfs issues (or at
least I think so).


The hosts think so or can't really write there? The lockspace is managed by 
sanlock and our HA daemons do not touch it at all. We only ask sanlock to get 
make sure we have unique server id.


Is is possible or planned to make the whole ha feature optional?


Well the system won't perform any automatic actions if you put the hosted 
engine to global maintenance and only start/stop/migrate the VM manually.
I would discourage you from stopping agent/broker, because the engine itself 
has some logic based on the reporting.

Regards

--
Martin Sivák
msi...@redhat.com
Red Hat Czech
RHEV-M SLA / Brno, CZ

- Original Message -

On 04/15/2014 04:53 PM, Jiri Moskovcak wrote:

On 04/14/2014 10:50 AM, René Koch wrote:

Hi,

I have 

Re: [ovirt-users] hosted engine health check issues

2014-04-22 Thread Itamar Heim

On 04/14/2014 11:50 AM, René Koch wrote:

Hi,

I have some issues with hosted engine status.

oVirt hosts think that hosted engine is down because it seems that hosts
can't write to hosted-engine.lockspace due to glusterfs issues (or at
least I think so).

Here's the output of vm-status:

# hosted-engine --vm-status


--== Host 1 status ==--

Status up-to-date  : False
Hostname   : 10.0.200.102
Host ID: 1
Engine status  : unknown stale-data
Score  : 2400
Local maintenance  : False
Host timestamp : 1397035677
Extra metadata (valid at timestamp):
 metadata_parse_version=1
 metadata_feature_version=1
 timestamp=1397035677 (Wed Apr  9 11:27:57 2014)
 host-id=1
 score=2400
 maintenance=False
 state=EngineUp


--== Host 2 status ==--

Status up-to-date  : True
Hostname   : 10.0.200.101
Host ID: 2
Engine status  : {'reason': 'vm not running on this
host', 'health': 'bad', 'vm': 'down', 'detail': 'unknown'}
Score  : 0
Local maintenance  : False
Host timestamp : 1397464031
Extra metadata (valid at timestamp):
 metadata_parse_version=1
 metadata_feature_version=1
 timestamp=1397464031 (Mon Apr 14 10:27:11 2014)
 host-id=2
 score=0
 maintenance=False
 state=EngineUnexpectedlyDown
 timeout=Mon Apr 14 10:35:05 2014

oVirt engine is sending me 2 emails every 10 minutes with the following
subjects:
- ovirt-hosted-engine state transition EngineDown-EngineStart
- ovirt-hosted-engine state transition EngineStart-EngineUp

In oVirt webadmin I can see the following message:
VM HostedEngine is down. Exit message: internal error Failed to acquire
lock: error -243.

These messages are really annoying as oVirt isn't doing anything with
hosted engine - I have an uptime of 9 days in my engine vm.

So my questions are now:
Is it intended to send out these messages and detect that ovirt engine
is down (which is false anyway), but not to restart the vm?

How can I disable notifications? I'm planning to write a Nagios plugin
which parses the output of hosted-engine --vm-status and only Nagios
should notify me, not hosted-engine script.

Is is possible or planned to make the whole ha feature optional? I
really really really hate cluster software as it causes more troubles
then standalone machines and in my case the hosted-engine ha feature
really causes troubles (and I didn't had a hardware or network outage
yet only issues with hosted-engine ha agent). I don't need any ha
feature for hosted engine. I just want to run engine virtualized on
oVirt and if engine vm fails (e.g. because of issues with a host) I'll
restart it on another node.

Thanks,
René




I'm pretty sure we removed hosted-engine on gluster due to concerns 
around the locking issues.

is the gluster configured with quorum to avoid split brains?

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] hosted engine health check issues

2014-04-22 Thread René Koch

On 04/22/2014 04:04 PM, Itamar Heim wrote:

On 04/14/2014 11:50 AM, René Koch wrote:

Hi,

I have some issues with hosted engine status.

oVirt hosts think that hosted engine is down because it seems that hosts
can't write to hosted-engine.lockspace due to glusterfs issues (or at
least I think so).

Here's the output of vm-status:

# hosted-engine --vm-status


--== Host 1 status ==--

Status up-to-date  : False
Hostname   : 10.0.200.102
Host ID: 1
Engine status  : unknown stale-data
Score  : 2400
Local maintenance  : False
Host timestamp : 1397035677
Extra metadata (valid at timestamp):
 metadata_parse_version=1
 metadata_feature_version=1
 timestamp=1397035677 (Wed Apr  9 11:27:57 2014)
 host-id=1
 score=2400
 maintenance=False
 state=EngineUp


--== Host 2 status ==--

Status up-to-date  : True
Hostname   : 10.0.200.101
Host ID: 2
Engine status  : {'reason': 'vm not running on this
host', 'health': 'bad', 'vm': 'down', 'detail': 'unknown'}
Score  : 0
Local maintenance  : False
Host timestamp : 1397464031
Extra metadata (valid at timestamp):
 metadata_parse_version=1
 metadata_feature_version=1
 timestamp=1397464031 (Mon Apr 14 10:27:11 2014)
 host-id=2
 score=0
 maintenance=False
 state=EngineUnexpectedlyDown
 timeout=Mon Apr 14 10:35:05 2014

oVirt engine is sending me 2 emails every 10 minutes with the following
subjects:
- ovirt-hosted-engine state transition EngineDown-EngineStart
- ovirt-hosted-engine state transition EngineStart-EngineUp

In oVirt webadmin I can see the following message:
VM HostedEngine is down. Exit message: internal error Failed to acquire
lock: error -243.

These messages are really annoying as oVirt isn't doing anything with
hosted engine - I have an uptime of 9 days in my engine vm.

So my questions are now:
Is it intended to send out these messages and detect that ovirt engine
is down (which is false anyway), but not to restart the vm?

How can I disable notifications? I'm planning to write a Nagios plugin
which parses the output of hosted-engine --vm-status and only Nagios
should notify me, not hosted-engine script.

Is is possible or planned to make the whole ha feature optional? I
really really really hate cluster software as it causes more troubles
then standalone machines and in my case the hosted-engine ha feature
really causes troubles (and I didn't had a hardware or network outage
yet only issues with hosted-engine ha agent). I don't need any ha
feature for hosted engine. I just want to run engine virtualized on
oVirt and if engine vm fails (e.g. because of issues with a host) I'll
restart it on another node.

Thanks,
René




I'm pretty sure we removed hosted-engine on gluster due to concerns
around the locking issues.
is the gluster configured with quorum to avoid split brains?



At the moment there's no quorum (1 host online is enough - but GlusterFS 
network is on dedicated nics which are directly connected between two 
hosts), as I'm waiting for additional memory and disks for the other 2 
nodes (so I have only 2 nodes atm).


But GlusterFS looks fine (now) - same for info heal-failed and info 
split-brain:


# gluster volume heal engine info
Gathering Heal info on volume engine has been successful

Brick ovirt-host01-gluster:/data/engine
Number of entries: 0

Brick ovirt-host02-gluster:/data/engine
Number of entries: 0


I can also create (touch) the lockspace file on the mounted GlusterFS 
volume - so imho GlusterFS isn't blocking libvirt.



Regards,
René
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] hosted engine health check issues

2014-04-22 Thread Doron Fediuck
Hi Rene,
any idea what closed your ovirtmgmt bridge?
as long as it is down vdsm may have issues starting up properly
and this is why you see the complaints on the rpc server.

Can you try manually fixing the network part first and then
restart vdsm?
Once vdsm is happy hosted engine VM will start.

- Original Message -
 From: René Koch rk...@linuxland.at
 To: Martin Sivak msi...@redhat.com
 Cc: users@ovirt.org
 Sent: Tuesday, April 22, 2014 1:46:38 PM
 Subject: Re: [ovirt-users] hosted engine health check issues
 
 Hi,
 
 I rebooted one of my ovirt hosts today and the result is now that I
 can't start hosted-engine anymore.
 
 ovirt-ha-agent isn't running because the lockspace file is missing
 (sanlock complains about it).
 So I tried to start hosted-engine with --vm-start and I get the
 following errors:
 
 == /var/log/sanlock.log ==
 2014-04-22 12:38:17+0200 654 [3093]: r2 cmd_acquire 2,9,5733 invalid
 lockspace found -1 failed 0 name 2851af27-8744-445d-9fb1-a0d083c8dc82
 
 == /var/log/messages ==
 Apr 22 12:38:17 ovirt-host02 sanlock[3079]: 2014-04-22 12:38:17+0200 654
 [3093]: r2 cmd_acquire 2,9,5733 invalid lockspace found -1 failed 0 name
 2851af27-8744-445d-9fb1-a0d083c8dc82
 Apr 22 12:38:17 ovirt-host02 kernel: ovirtmgmt: port 2(vnet0) entering
 disabled state
 Apr 22 12:38:17 ovirt-host02 kernel: device vnet0 left promiscuous mode
 Apr 22 12:38:17 ovirt-host02 kernel: ovirtmgmt: port 2(vnet0) entering
 disabled state
 
 == /var/log/vdsm/vdsm.log ==
 Thread-21::DEBUG::2014-04-22
 12:38:17,563::libvirtconnection::124::root::(wrapper) Unknown
 libvirterror: ecode: 38 edom: 42 level: 2 message: Failed to acquire
 lock: No space left on device
 Thread-21::DEBUG::2014-04-22
 12:38:17,563::vm::2263::vm.Vm::(_startUnderlyingVm)
 vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::_ongoingCreations released
 Thread-21::ERROR::2014-04-22
 12:38:17,564::vm::2289::vm.Vm::(_startUnderlyingVm)
 vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::The vm start process failed
 Traceback (most recent call last):
File /usr/share/vdsm/vm.py, line 2249, in _startUnderlyingVm
  self._run()
File /usr/share/vdsm/vm.py, line 3170, in _run
  self._connection.createXML(domxml, flags),
File /usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py,
 line 92, in wrapper
  ret = f(*args, **kwargs)
File /usr/lib64/python2.6/site-packages/libvirt.py, line 2665, in
 createXML
  if ret is None:raise libvirtError('virDomainCreateXML() failed',
 conn=self)
 libvirtError: Failed to acquire lock: No space left on device
 
 == /var/log/messages ==
 Apr 22 12:38:17 ovirt-host02 vdsm vm.Vm ERROR
 vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::The vm start process
 failed#012Traceback (most recent call last):#012  File
 /usr/share/vdsm/vm.py, line 2249, in _startUnderlyingVm#012
 self._run()#012  File /usr/share/vdsm/vm.py, line 3170, in _run#012
   self._connection.createXML(domxml, flags),#012  File
 /usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py, line 92,
 in wrapper#012ret = f(*args, **kwargs)#012  File
 /usr/lib64/python2.6/site-packages/libvirt.py, line 2665, in
 createXML#012if ret is None:raise libvirtError('virDomainCreateXML()
 failed', conn=self)#012libvirtError: Failed to acquire lock: No space
 left on device
 
 == /var/log/vdsm/vdsm.log ==
 Thread-21::DEBUG::2014-04-22
 12:38:17,569::vm::2731::vm.Vm::(setDownStatus)
 vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::Changed state to Down:
 Failed to acquire lock: No space left on device
 
 
 No space left on device is nonsense as there is enough space (I had this
 issue last time as well where I had to patch machine.py, but this file
 is now Python 2.6.6 compatible.
 
 Any idea what prevents hosted-engine from starting?
 ovirt-ha-broker, vdsmd and sanlock are running btw.
 
 Btw, I can see in log that json rpc server module is missing - which
 package is required for CentOS 6.5?
 Apr 22 12:37:14 ovirt-host02 vdsm vds WARNING Unable to load the json
 rpc server module. Please make sure it is installed.
 
 
 Thanks,
 René
 
 
 
 On 04/17/2014 10:02 AM, Martin Sivak wrote:
  Hi,
 
  How can I disable notifications?
 
  The notification is configured in /etc/ovirt-hosted-engine-ha/broker.conf
  section notification.
  The email is sent when the key state_transition exists and the string
  OldState-NewState contains the (case insensitive) regexp from the value.
 
  Is it intended to send out these messages and detect that ovirt engine
  is down (which is false anyway), but not to restart the vm?
 
  Forget about emails for now and check the
  /var/log/ovirt-hosted-engine-ha/agent.log and broker.log (and attach them
  as well btw).
 
  oVirt hosts think that hosted engine is down because it seems that hosts
  can't write to hosted-engine.lockspace due to glusterfs issues (or at
  least I think so).
 
  The hosts think so or can't really write there? The lockspace is managed by
  sanlock and our HA daemons do not touch it at all. We only

Re: [ovirt-users] hosted engine health check issues

2014-04-17 Thread René Koch

On 04/15/2014 04:53 PM, Jiri Moskovcak wrote:

On 04/14/2014 10:50 AM, René Koch wrote:

Hi,

I have some issues with hosted engine status.

oVirt hosts think that hosted engine is down because it seems that hosts
can't write to hosted-engine.lockspace due to glusterfs issues (or at
least I think so).

Here's the output of vm-status:

# hosted-engine --vm-status


--== Host 1 status ==--

Status up-to-date  : False
Hostname   : 10.0.200.102
Host ID: 1
Engine status  : unknown stale-data
Score  : 2400
Local maintenance  : False
Host timestamp : 1397035677
Extra metadata (valid at timestamp):
 metadata_parse_version=1
 metadata_feature_version=1
 timestamp=1397035677 (Wed Apr  9 11:27:57 2014)
 host-id=1
 score=2400
 maintenance=False
 state=EngineUp


--== Host 2 status ==--

Status up-to-date  : True
Hostname   : 10.0.200.101
Host ID: 2
Engine status  : {'reason': 'vm not running on this
host', 'health': 'bad', 'vm': 'down', 'detail': 'unknown'}
Score  : 0
Local maintenance  : False
Host timestamp : 1397464031
Extra metadata (valid at timestamp):
 metadata_parse_version=1
 metadata_feature_version=1
 timestamp=1397464031 (Mon Apr 14 10:27:11 2014)
 host-id=2
 score=0
 maintenance=False
 state=EngineUnexpectedlyDown
 timeout=Mon Apr 14 10:35:05 2014

oVirt engine is sending me 2 emails every 10 minutes with the following
subjects:
- ovirt-hosted-engine state transition EngineDown-EngineStart
- ovirt-hosted-engine state transition EngineStart-EngineUp

In oVirt webadmin I can see the following message:
VM HostedEngine is down. Exit message: internal error Failed to acquire
lock: error -243.

These messages are really annoying as oVirt isn't doing anything with
hosted engine - I have an uptime of 9 days in my engine vm.

So my questions are now:
Is it intended to send out these messages and detect that ovirt engine
is down (which is false anyway), but not to restart the vm?

How can I disable notifications? I'm planning to write a Nagios plugin
which parses the output of hosted-engine --vm-status and only Nagios
should notify me, not hosted-engine script.

Is is possible or planned to make the whole ha feature optional? I
really really really hate cluster software as it causes more troubles
then standalone machines and in my case the hosted-engine ha feature
really causes troubles (and I didn't had a hardware or network outage
yet only issues with hosted-engine ha agent). I don't need any ha
feature for hosted engine. I just want to run engine virtualized on
oVirt and if engine vm fails (e.g. because of issues with a host) I'll
restart it on another node.


Hi, you can:
1. edit /etc/ovirt-hosted-engine-ha/{agent,broker}-log.conf and tweak
the logger as you like
2. or kill ovirt-ha-broker  ovirt-ha-agent services


Thanks for the information.
So engine is able to run when ovirt-ha-broker and ovirt-ha-agent isn't 
running?



Regards,
René



--Jirka


Thanks,
René





___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] hosted engine health check issues

2014-04-17 Thread Jiri Moskovcak

On 04/17/2014 09:34 AM, René Koch wrote:

On 04/15/2014 04:53 PM, Jiri Moskovcak wrote:

On 04/14/2014 10:50 AM, René Koch wrote:

Hi,

I have some issues with hosted engine status.

oVirt hosts think that hosted engine is down because it seems that hosts
can't write to hosted-engine.lockspace due to glusterfs issues (or at
least I think so).

Here's the output of vm-status:

# hosted-engine --vm-status


--== Host 1 status ==--

Status up-to-date  : False
Hostname   : 10.0.200.102
Host ID: 1
Engine status  : unknown stale-data
Score  : 2400
Local maintenance  : False
Host timestamp : 1397035677
Extra metadata (valid at timestamp):
 metadata_parse_version=1
 metadata_feature_version=1
 timestamp=1397035677 (Wed Apr  9 11:27:57 2014)
 host-id=1
 score=2400
 maintenance=False
 state=EngineUp


--== Host 2 status ==--

Status up-to-date  : True
Hostname   : 10.0.200.101
Host ID: 2
Engine status  : {'reason': 'vm not running on this
host', 'health': 'bad', 'vm': 'down', 'detail': 'unknown'}
Score  : 0
Local maintenance  : False
Host timestamp : 1397464031
Extra metadata (valid at timestamp):
 metadata_parse_version=1
 metadata_feature_version=1
 timestamp=1397464031 (Mon Apr 14 10:27:11 2014)
 host-id=2
 score=0
 maintenance=False
 state=EngineUnexpectedlyDown
 timeout=Mon Apr 14 10:35:05 2014

oVirt engine is sending me 2 emails every 10 minutes with the following
subjects:
- ovirt-hosted-engine state transition EngineDown-EngineStart
- ovirt-hosted-engine state transition EngineStart-EngineUp

In oVirt webadmin I can see the following message:
VM HostedEngine is down. Exit message: internal error Failed to acquire
lock: error -243.

These messages are really annoying as oVirt isn't doing anything with
hosted engine - I have an uptime of 9 days in my engine vm.

So my questions are now:
Is it intended to send out these messages and detect that ovirt engine
is down (which is false anyway), but not to restart the vm?

How can I disable notifications? I'm planning to write a Nagios plugin
which parses the output of hosted-engine --vm-status and only Nagios
should notify me, not hosted-engine script.

Is is possible or planned to make the whole ha feature optional? I
really really really hate cluster software as it causes more troubles
then standalone machines and in my case the hosted-engine ha feature
really causes troubles (and I didn't had a hardware or network outage
yet only issues with hosted-engine ha agent). I don't need any ha
feature for hosted engine. I just want to run engine virtualized on
oVirt and if engine vm fails (e.g. because of issues with a host) I'll
restart it on another node.


Hi, you can:
1. edit /etc/ovirt-hosted-engine-ha/{agent,broker}-log.conf and tweak
the logger as you like
2. or kill ovirt-ha-broker  ovirt-ha-agent services


Thanks for the information.
So engine is able to run when ovirt-ha-broker and ovirt-ha-agent isn't
running?



- yes, it might cause some problems if you set up another host for 
hosted engine and run the agent on the other host, but as long as you 
don't have the agent running anywhere or you don't need to migrate the 
engine vm, you should be fine.


--Jirka



Regards,
René



--Jirka


Thanks,
René






___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] hosted engine health check issues

2014-04-17 Thread René Koch

On 04/17/2014 09:40 AM, Jiri Moskovcak wrote:

On 04/17/2014 09:34 AM, René Koch wrote:

On 04/15/2014 04:53 PM, Jiri Moskovcak wrote:

On 04/14/2014 10:50 AM, René Koch wrote:

Hi,

I have some issues with hosted engine status.

oVirt hosts think that hosted engine is down because it seems that
hosts
can't write to hosted-engine.lockspace due to glusterfs issues (or at
least I think so).

Here's the output of vm-status:

# hosted-engine --vm-status


--== Host 1 status ==--

Status up-to-date  : False
Hostname   : 10.0.200.102
Host ID: 1
Engine status  : unknown stale-data
Score  : 2400
Local maintenance  : False
Host timestamp : 1397035677
Extra metadata (valid at timestamp):
 metadata_parse_version=1
 metadata_feature_version=1
 timestamp=1397035677 (Wed Apr  9 11:27:57 2014)
 host-id=1
 score=2400
 maintenance=False
 state=EngineUp


--== Host 2 status ==--

Status up-to-date  : True
Hostname   : 10.0.200.101
Host ID: 2
Engine status  : {'reason': 'vm not running on this
host', 'health': 'bad', 'vm': 'down', 'detail': 'unknown'}
Score  : 0
Local maintenance  : False
Host timestamp : 1397464031
Extra metadata (valid at timestamp):
 metadata_parse_version=1
 metadata_feature_version=1
 timestamp=1397464031 (Mon Apr 14 10:27:11 2014)
 host-id=2
 score=0
 maintenance=False
 state=EngineUnexpectedlyDown
 timeout=Mon Apr 14 10:35:05 2014

oVirt engine is sending me 2 emails every 10 minutes with the following
subjects:
- ovirt-hosted-engine state transition EngineDown-EngineStart
- ovirt-hosted-engine state transition EngineStart-EngineUp

In oVirt webadmin I can see the following message:
VM HostedEngine is down. Exit message: internal error Failed to acquire
lock: error -243.

These messages are really annoying as oVirt isn't doing anything with
hosted engine - I have an uptime of 9 days in my engine vm.

So my questions are now:
Is it intended to send out these messages and detect that ovirt engine
is down (which is false anyway), but not to restart the vm?

How can I disable notifications? I'm planning to write a Nagios plugin
which parses the output of hosted-engine --vm-status and only Nagios
should notify me, not hosted-engine script.

Is is possible or planned to make the whole ha feature optional? I
really really really hate cluster software as it causes more troubles
then standalone machines and in my case the hosted-engine ha feature
really causes troubles (and I didn't had a hardware or network outage
yet only issues with hosted-engine ha agent). I don't need any ha
feature for hosted engine. I just want to run engine virtualized on
oVirt and if engine vm fails (e.g. because of issues with a host) I'll
restart it on another node.


Hi, you can:
1. edit /etc/ovirt-hosted-engine-ha/{agent,broker}-log.conf and tweak
the logger as you like
2. or kill ovirt-ha-broker  ovirt-ha-agent services


Thanks for the information.
So engine is able to run when ovirt-ha-broker and ovirt-ha-agent isn't
running?



- yes, it might cause some problems if you set up another host for
hosted engine and run the agent on the other host, but as long as you
don't have the agent running anywhere or you don't need to migrate the
engine vm, you should be fine.


Thanks!

At the moment I have an issue with ovirt-ha-broker running crazy and 
don't react on kill -9:


# ps aux | egrep -e '%CPU|\[ovirt-ha-broker\]' | grep -v grep
USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
vdsm  3059  224  0.0  0 0 ?Zl   Mar03 145536:45 
[ovirt-ha-broker] defunct

# kill -9 3059
# ps aux | egrep -e '%CPU|\[ovirt-ha-broker\]' | grep -v grep
USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
vdsm  3059  224  0.0  0 0 ?Zl   Mar03 145545:17 
[ovirt-ha-broker] defunct





--Jirka



Regards,
René



--Jirka


Thanks,
René







___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] hosted engine health check issues

2014-04-17 Thread Martin Sivak
Hi,

  How can I disable notifications?

The notification is configured in /etc/ovirt-hosted-engine-ha/broker.conf 
section notification.
The email is sent when the key state_transition exists and the string 
OldState-NewState contains the (case insensitive) regexp from the value.

  Is it intended to send out these messages and detect that ovirt engine
  is down (which is false anyway), but not to restart the vm?

Forget about emails for now and check the 
/var/log/ovirt-hosted-engine-ha/agent.log and broker.log (and attach them as 
well btw).

  oVirt hosts think that hosted engine is down because it seems that hosts
  can't write to hosted-engine.lockspace due to glusterfs issues (or at
  least I think so).

The hosts think so or can't really write there? The lockspace is managed by 
sanlock and our HA daemons do not touch it at all. We only ask sanlock to get 
make sure we have unique server id.

  Is is possible or planned to make the whole ha feature optional?

Well the system won't perform any automatic actions if you put the hosted 
engine to global maintenance and only start/stop/migrate the VM manually.
I would discourage you from stopping agent/broker, because the engine itself 
has some logic based on the reporting.

Regards

--
Martin Sivák
msi...@redhat.com
Red Hat Czech
RHEV-M SLA / Brno, CZ

- Original Message -
 On 04/15/2014 04:53 PM, Jiri Moskovcak wrote:
  On 04/14/2014 10:50 AM, René Koch wrote:
  Hi,
 
  I have some issues with hosted engine status.
 
  oVirt hosts think that hosted engine is down because it seems that hosts
  can't write to hosted-engine.lockspace due to glusterfs issues (or at
  least I think so).
 
  Here's the output of vm-status:
 
  # hosted-engine --vm-status
 
 
  --== Host 1 status ==--
 
  Status up-to-date  : False
  Hostname   : 10.0.200.102
  Host ID: 1
  Engine status  : unknown stale-data
  Score  : 2400
  Local maintenance  : False
  Host timestamp : 1397035677
  Extra metadata (valid at timestamp):
   metadata_parse_version=1
   metadata_feature_version=1
   timestamp=1397035677 (Wed Apr  9 11:27:57 2014)
   host-id=1
   score=2400
   maintenance=False
   state=EngineUp
 
 
  --== Host 2 status ==--
 
  Status up-to-date  : True
  Hostname   : 10.0.200.101
  Host ID: 2
  Engine status  : {'reason': 'vm not running on this
  host', 'health': 'bad', 'vm': 'down', 'detail': 'unknown'}
  Score  : 0
  Local maintenance  : False
  Host timestamp : 1397464031
  Extra metadata (valid at timestamp):
   metadata_parse_version=1
   metadata_feature_version=1
   timestamp=1397464031 (Mon Apr 14 10:27:11 2014)
   host-id=2
   score=0
   maintenance=False
   state=EngineUnexpectedlyDown
   timeout=Mon Apr 14 10:35:05 2014
 
  oVirt engine is sending me 2 emails every 10 minutes with the following
  subjects:
  - ovirt-hosted-engine state transition EngineDown-EngineStart
  - ovirt-hosted-engine state transition EngineStart-EngineUp
 
  In oVirt webadmin I can see the following message:
  VM HostedEngine is down. Exit message: internal error Failed to acquire
  lock: error -243.
 
  These messages are really annoying as oVirt isn't doing anything with
  hosted engine - I have an uptime of 9 days in my engine vm.
 
  So my questions are now:
  Is it intended to send out these messages and detect that ovirt engine
  is down (which is false anyway), but not to restart the vm?
 
  How can I disable notifications? I'm planning to write a Nagios plugin
  which parses the output of hosted-engine --vm-status and only Nagios
  should notify me, not hosted-engine script.
 
  Is is possible or planned to make the whole ha feature optional? I
  really really really hate cluster software as it causes more troubles
  then standalone machines and in my case the hosted-engine ha feature
  really causes troubles (and I didn't had a hardware or network outage
  yet only issues with hosted-engine ha agent). I don't need any ha
  feature for hosted engine. I just want to run engine virtualized on
  oVirt and if engine vm fails (e.g. because of issues with a host) I'll
  restart it on another node.
 
  Hi, you can:
  1. edit /etc/ovirt-hosted-engine-ha/{agent,broker}-log.conf and tweak
  the logger as you like
  2. or kill ovirt-ha-broker  ovirt-ha-agent services
 
 Thanks for the information.
 So engine is able to run when ovirt-ha-broker and ovirt-ha-agent isn't
 running?
 
 
 Regards,
 René
 
 
  --Jirka
 
  Thanks,
  René
 
 
 
 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users
 
___
Users 

Re: [ovirt-users] hosted engine health check issues

2014-04-15 Thread Jiri Moskovcak

On 04/14/2014 10:50 AM, René Koch wrote:

Hi,

I have some issues with hosted engine status.

oVirt hosts think that hosted engine is down because it seems that hosts
can't write to hosted-engine.lockspace due to glusterfs issues (or at
least I think so).

Here's the output of vm-status:

# hosted-engine --vm-status


--== Host 1 status ==--

Status up-to-date  : False
Hostname   : 10.0.200.102
Host ID: 1
Engine status  : unknown stale-data
Score  : 2400
Local maintenance  : False
Host timestamp : 1397035677
Extra metadata (valid at timestamp):
 metadata_parse_version=1
 metadata_feature_version=1
 timestamp=1397035677 (Wed Apr  9 11:27:57 2014)
 host-id=1
 score=2400
 maintenance=False
 state=EngineUp


--== Host 2 status ==--

Status up-to-date  : True
Hostname   : 10.0.200.101
Host ID: 2
Engine status  : {'reason': 'vm not running on this
host', 'health': 'bad', 'vm': 'down', 'detail': 'unknown'}
Score  : 0
Local maintenance  : False
Host timestamp : 1397464031
Extra metadata (valid at timestamp):
 metadata_parse_version=1
 metadata_feature_version=1
 timestamp=1397464031 (Mon Apr 14 10:27:11 2014)
 host-id=2
 score=0
 maintenance=False
 state=EngineUnexpectedlyDown
 timeout=Mon Apr 14 10:35:05 2014

oVirt engine is sending me 2 emails every 10 minutes with the following
subjects:
- ovirt-hosted-engine state transition EngineDown-EngineStart
- ovirt-hosted-engine state transition EngineStart-EngineUp

In oVirt webadmin I can see the following message:
VM HostedEngine is down. Exit message: internal error Failed to acquire
lock: error -243.

These messages are really annoying as oVirt isn't doing anything with
hosted engine - I have an uptime of 9 days in my engine vm.

So my questions are now:
Is it intended to send out these messages and detect that ovirt engine
is down (which is false anyway), but not to restart the vm?

How can I disable notifications? I'm planning to write a Nagios plugin
which parses the output of hosted-engine --vm-status and only Nagios
should notify me, not hosted-engine script.

Is is possible or planned to make the whole ha feature optional? I
really really really hate cluster software as it causes more troubles
then standalone machines and in my case the hosted-engine ha feature
really causes troubles (and I didn't had a hardware or network outage
yet only issues with hosted-engine ha agent). I don't need any ha
feature for hosted engine. I just want to run engine virtualized on
oVirt and if engine vm fails (e.g. because of issues with a host) I'll
restart it on another node.


Hi, you can:
1. edit /etc/ovirt-hosted-engine-ha/{agent,broker}-log.conf and tweak 
the logger as you like

2. or kill ovirt-ha-broker  ovirt-ha-agent services

--Jirka


Thanks,
René




___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] hosted engine health check issues

2014-04-14 Thread René Koch

Hi,

I have some issues with hosted engine status.

oVirt hosts think that hosted engine is down because it seems that hosts 
can't write to hosted-engine.lockspace due to glusterfs issues (or at 
least I think so).


Here's the output of vm-status:

# hosted-engine --vm-status


--== Host 1 status ==--

Status up-to-date  : False
Hostname   : 10.0.200.102
Host ID: 1
Engine status  : unknown stale-data
Score  : 2400
Local maintenance  : False
Host timestamp : 1397035677
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=1397035677 (Wed Apr  9 11:27:57 2014)
host-id=1
score=2400
maintenance=False
state=EngineUp


--== Host 2 status ==--

Status up-to-date  : True
Hostname   : 10.0.200.101
Host ID: 2
Engine status  : {'reason': 'vm not running on this 
host', 'health': 'bad', 'vm': 'down', 'detail': 'unknown'}

Score  : 0
Local maintenance  : False
Host timestamp : 1397464031
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=1397464031 (Mon Apr 14 10:27:11 2014)
host-id=2
score=0
maintenance=False
state=EngineUnexpectedlyDown
timeout=Mon Apr 14 10:35:05 2014

oVirt engine is sending me 2 emails every 10 minutes with the following 
subjects:

- ovirt-hosted-engine state transition EngineDown-EngineStart
- ovirt-hosted-engine state transition EngineStart-EngineUp

In oVirt webadmin I can see the following message:
VM HostedEngine is down. Exit message: internal error Failed to acquire 
lock: error -243.


These messages are really annoying as oVirt isn't doing anything with 
hosted engine - I have an uptime of 9 days in my engine vm.


So my questions are now:
Is it intended to send out these messages and detect that ovirt engine 
is down (which is false anyway), but not to restart the vm?


How can I disable notifications? I'm planning to write a Nagios plugin 
which parses the output of hosted-engine --vm-status and only Nagios 
should notify me, not hosted-engine script.


Is is possible or planned to make the whole ha feature optional? I 
really really really hate cluster software as it causes more troubles 
then standalone machines and in my case the hosted-engine ha feature 
really causes troubles (and I didn't had a hardware or network outage 
yet only issues with hosted-engine ha agent). I don't need any ha 
feature for hosted engine. I just want to run engine virtualized on 
oVirt and if engine vm fails (e.g. because of issues with a host) I'll 
restart it on another node.


Thanks,
René


--
Best Regards

René Koch
Senior Solution Architect


LIS-Linuxland GmbH
Brünner Straße 163, A-1210 Vienna

Phone:   +43 1 236 91 60
Mobile:  +43 660 / 512 21 31
E-Mail:  rk...@linuxland.at


___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users