----- Original Message -----
> From: "Shu Ming" <[email protected]>
> To: "[email protected]" <[email protected]>
> Sent: Tuesday, May 15, 2012 4:56:36 AM
> Subject: [Users] The SPM host  node is in unresponsive mode
> 
> Hi,
>    I attached one host node to my engine. Because it is the only
> node, it is automatically the SPM node, and it used to run well in my
> engine. Yesterday, some errors happened in the network of the host
> node, which made the node become "unresponsive" in the engine. I am
> sure the network errors are fixed and want to bring the node back to
> life now. However, I found that the only node could not be marked
> with "confirm host has been rebooted" and could not be set into
> maintenance mode. The reason given is that there is no active host
> in the datacenter and the SPM cannot enter maintenance mode. It seems
> that it has fallen into a logic loop here. Losing the network can be
> quite common in development environments and even in production
> environments, so I think we should have a way to repair a host node
> that has had its network down for a while.

Hi Shu,

First, for the manual fence ("confirm host has been rebooted") to work, you
will need another host in the cluster, which is used as a proxy to send the
actual manual fence command.
Second, you are absolutely right: loss of network is a common scenario and
we should be able to recover from it, but let's try to understand why your
host remained unresponsive after the network returned.
Please ssh to the host and try the following:

- vdsClient -s 0 getVdsCaps (a sanity check that the vdsm service is up and
  running and responds on its network socket from localhost)
- ping between the host and the engine
- make sure no firewall is blocking TCP port 54321 (on both the host and the
  engine)
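The ping and port checks above can be scripted so they are quick to repeat from either machine. This is a minimal Python sketch, not part of vdsm or oVirt: the function names `port_open` and `host_pingable` are hypothetical, and the ping flags assume the Linux iputils syntax.

```python
import socket
import subprocess

VDSM_PORT = 54321  # the vdsm port named in the checklist above


def port_open(host: str, port: int = VDSM_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port completes within `timeout`.

    A refused connection, a timeout (e.g. a firewall silently dropping
    packets) or a name-resolution failure all return False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def host_pingable(host: str) -> bool:
    """Send a single ICMP echo request using the system `ping`.

    Assumes Linux iputils flags: -c 1 sends one packet, -W 2 waits up to
    2 seconds for the reply.
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "2", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```

For example, on the engine machine, `port_open("host.example.com")` checks whether anything accepts connections on the host's TCP 54321 (`host.example.com` is a placeholder for the host's address). If `host_pingable` succeeds but `port_open` fails, a firewall rule or a vdsm service that is not listening is the likely suspect.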

Also, please provide vdsm.log (from the time the network issues began) and
spm-lock.log (both located in /var/log/vdsm/).
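Packing the two requested logs into one archive makes them easier to attach to a reply. A minimal Python sketch, assuming the paths given above; `bundle_logs` and the output file name are hypothetical, missing files are skipped rather than treated as an error, and reading /var/log/vdsm may require root privileges.

```python
import tarfile
from pathlib import Path

LOG_DIR = Path("/var/log/vdsm")              # location given in the request above
LOG_NAMES = ("vdsm.log", "spm-lock.log")     # the two logs that were asked for


def bundle_logs(out_path, log_dir=LOG_DIR, names=LOG_NAMES):
    """Pack the named logs into a gzip-compressed tarball.

    Returns the list of file names that were actually found and added.
    """
    included = []
    with tarfile.open(out_path, "w:gz") as tar:
        for name in names:
            path = Path(log_dir) / name
            if path.is_file():
                # Store under the bare file name, without the directory prefix.
                tar.add(path, arcname=name)
                included.append(name)
    return included
```

For example, `bundle_logs("vdsm-logs.tar.gz")` writes the archive in the current directory. If the period when the network problems began has already been rotated out of the current vdsm.log, the older rotated copies would need to be added as well.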

As a mitigation, we can always manipulate the database and set the host
status correctly, but first, let's try the above.

> 
> --
> Shu Ming<[email protected]>
> IBM China Systems and Technology Laboratory
> 
> 
> _______________________________________________
> Users mailing list
> [email protected]
> http://lists.ovirt.org/mailman/listinfo/users
> 