Re: [Users] The SPM host node is in unresponsive mode

Itamar Heim Tue, 15 May 2012 01:36:39 -0700

On 05/15/2012 09:14 AM, Shu Ming wrote:

Some errors in the service status. Is engine-notifierd critical to VDSM? Why
did it say "pgrep: invalid user name: engine"?


No. engine-notifierd just sends emails to users.


[root@ovirt-node1 ~]# service --status-all
/etc/init.d/ceph: ceph conf /etc/ceph/ceph.conf not found; system is not
configured.
# Generated by ebtables-save v1.0 on Tue May 15 14:08:06 CST 2012
*nat
:PREROUTING ACCEPT
:OUTPUT ACCEPT
:POSTROUTING ACCEPT

pgrep: invalid user name: engine
/etc/init.d/engine-notifierd is stopped
JAVA_EXECUTABLE or HSQLDB_JAR_PATH in '/etc/sysconfig/hsqldb' is set to
a non-file.
No active sessions
On 2012-5-15 12:19, Haim Ateya wrote:


----- Original Message -----

From: "Shu Ming"<[email protected]>
To: "[email protected]"<[email protected]>
Sent: Tuesday, May 15, 2012 4:56:36 AM
Subject: [Users] The SPM host node is in unresponsive mode

Hi,
I attached one host node to my engine. Because it is the only node, it
is automatically the SPM node, and it used to run well in my engine.
Yesterday, some network errors occurred on the host node, which made
the node become "unresponsive" in the engine. I am sure the network
errors are fixed and want to bring the node back to life now. However,
I found that this single node could not be confirmed as "host has been
rebooted" and could not be put into maintenance mode. The reason given
is that there is no active host in the datacenter and the SPM cannot
enter maintenance mode. It seems to have fallen into a logic loop here.
Losing the network can be quite common in a development environment, and
even in a production environment, so I think we should have a way to
repair a host node whose network has been down for a while.

Hi Shu,

First, for the manual fence ("confirm host has been rebooted") to work,
you will need another host in the cluster to act as a proxy and send the
actual manual fence command.
Second, you are absolutely right: loss of network is a common scenario,
and we should be able to recover. But let's try to understand why your
host remains unresponsive after the network returned.
Please ssh to the host and try the following:

- run vdsClient -s 0 getVdsCaps (a sanity check that the vdsm service
is up and running and answering on its network socket from localhost)
- ping between the host and the engine
- make sure no firewall is blocking tcp 54321 (on both host and engine)
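
The firewall check above can be approximated from either machine with a short script. This is only a sketch, assuming Python 3 is available on the machine; the function name tcp_port_open is a placeholder and not part of vdsm or the engine. It only tests whether a TCP connection to port 54321 can be opened and does not speak the VDSM protocol, so it complements the vdsClient check rather than replacing it.

```python
import socket
import sys


def tcp_port_open(host, port=54321, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution errors.
        return False


if __name__ == "__main__":
    # Pass the host or engine address as the first argument;
    # defaults to localhost, matching the local vdsm check.
    target = sys.argv[1] if len(sys.argv) > 1 else "127.0.0.1"
    state = "reachable" if tcp_port_open(target) else "not reachable"
    print(f"{target}:54321 is {state}")
```

If the port is not reachable even from the host itself, vdsm is probably not listening; if it is reachable locally but not from the engine, the network path or a firewall in between is the more likely cause.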

Also, please provide vdsm.log (from the time the network issues began) and
spm-lock.log (both located in /var/log/vdsm/).

As for mitigation, we can always manipulate the db and set it correctly,
but first, let's try the above.

--
Shu Ming<[email protected]>
IBM China Systems and Technology Laboratory


_______________________________________________
Users mailing list
[email protected]
http://lists.ovirt.org/mailman/listinfo/users

