----- Original Message -----
> From: "Shu Ming" <[email protected]>
> To: "Haim Ateya" <[email protected]>
> Cc: "[email protected]" <[email protected]>
> Sent: Tuesday, May 15, 2012 9:03:42 AM
> Subject: Re: [Users] The SPM host node is in unresponsive mode
>
> On 2012-5-15 12:19, Haim Ateya wrote:
> >
> > ----- Original Message -----
> >> From: "Shu Ming"<[email protected]>
> >> To: "[email protected]"<[email protected]>
> >> Sent: Tuesday, May 15, 2012 4:56:36 AM
> >> Subject: [Users] The SPM host node is in unresponsive mode
> >>
> >> Hi,
> >> I attached one host node in my engine. Because it is the only
> >> node, it is automatically the SPM node. And it used to run well in
> >> my engine. Yesterday, some errors happened in the network of the
> >> host node. That made the node become "unresponsive" in the engine.
> >> I am sure the network errors are fixed and want to bring the node
> >> back to life now. However, I found that the only node could not be
> >> "confirm as host been rebooted" and could not be set into
> >> maintenance mode. The reason given is that there is no active host
> >> in the datacenter and the SPM cannot enter maintenance mode. It
> >> seems that it fell into a logic loop here. Losing network can be
> >> quite common in a development environment, and even in a production
> >> environment; I think we should have a way to address this problem
> >> of how to repair a host node whose network was down for a while.
> > Hi Shu,
> >
> > first, for the manual fence to work ("confirm host have been
> > rebooted") you will need another host in the cluster which will be
> > used as a proxy and send the actual manual fence command.
> > second, you are absolutely right, loss of network is a common
> > scenario, and we should be able to recover, but let's try to
> > understand why your host remains unresponsive after the network
> > returned.
> > please ssh to the host and try the following:
> >
> > - vdsClient -s 0 getVdsCaps (validity check making sure vdsm
> > service is up and running and communicate with its network socket
> > from localhost)
> [root@ovirt-node1 ~]# vdsClient -s 0 getVdsCaps
> Connection to 9.181.129.110:54321 refused
> [root@ovirt-node1 ~]#
>
> [root@ovirt-node1 ~]# ps -ef |grep vdsm
> root 1365 1 0 09:37 ? 00:00:00 /usr/sbin/libvirtd --listen # by vdsm
> root 5534 4652 0 13:53 pts/0 00:00:00 grep --color=auto vdsm
> [root@ovirt-node1 ~]# service vdsmd start
> Redirecting to /bin/systemctl start vdsmd.service
>
> [root@ovirt-node1 ~]# ps -ef |grep vdsm
> root 1365 1 0 09:37 ? 00:00:00 /usr/sbin/libvirtd --listen # by vdsm
> root 5534 4652 0 13:53 pts/0 00:00:00 grep --color=auto vdsm
>
> It seems that VDSM process was gone while libvirtd spawned by VDSM was
> there. Then I tried to start the VDSM daemon, however it did nothing.
> After checking the vdsm.log file, the latest message was five hours ago
> and useless. Also, there was no useful message in libvirtd.log.
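
A note on reading the ps output above: both lines matched by `grep vdsm` are false positives. One is libvirtd, whose command line carries a "# by vdsm" comment, and the other is the grep command itself. A minimal sketch of a stricter check follows. It assumes a real vdsm process has "vdsm" somewhere in its command line; the helper names filter_vdsm_procs and vdsm_status are hypothetical and not part of vdsm.

```shell
#!/bin/sh
# Read `ps -ef` output on stdin and keep only lines that mention vdsm
# but are neither the grep command itself nor libvirtd (whose command
# line ends with a "# by vdsm" comment in the output quoted above).
filter_vdsm_procs() {
    grep 'vdsm' | grep -v -e 'grep' -e 'libvirtd'
}

# Hypothetical helper: print a one-line verdict for the ps output on stdin.
vdsm_status() {
    if filter_vdsm_procs | grep -q .; then
        echo "vdsm process found"
    else
        echo "no vdsm process found"
    fi
}
```

On a live host this could be used as `ps -ef | vdsm_status`. With the two ps lines quoted above as input, nothing survives the filter, which matches the conclusion that the VDSM process was gone.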
[HA] the problem is that systemctl doesn't show the real reason why the service didn't start, let's try the following:

- # cd /lib/systemd/
- # ./systemd-vdsmd restart

> >
> > - please ping between host and engine
> It works in both ways.
> >
> > - please make sure there is no firewall blocking tcp 54321 (on
> > both host and engine)
>
> No firewall.
>
> >
> > also, please provide vdsm.log (from the time network issues began)
> > and spm-lock.log (both located on /var/log/vdsm/).
> >
> > as for a mitigation, we can always manipulate db and set it
> > correctly, but first, let's try the above.
> Also, there is no useful message in spm-lock.log. The latest message
> was 24 hours ago.
>
> >> --
> >> Shu Ming<[email protected]>
> >> IBM China Systems and Technology Laboratory
> >>
> >>
> >> _______________________________________________
> >> Users mailing list
> >> [email protected]
> >> http://lists.ovirt.org/mailman/listinfo/users
> >>
> >
>
> --
> Shu Ming<[email protected]>
> IBM China Systems and Technology Laboratory
>
>
_______________________________________________
Users mailing list
[email protected]
http://lists.ovirt.org/mailman/listinfo/users

