I have cleaned up the HA resources with:

echo "" > /etc/pve/ha/resources.cfg

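For anyone repeating this cleanup, it may be worth keeping a copy of the file first, since emptying it also drops comments and group settings. A minimal sketch, assuming a POSIX shell; the function name backup_and_empty and the choice of backup directory are illustrative and not part of Proxmox:

```shell
#!/bin/sh
# Sketch only: backup_and_empty is a hypothetical helper, not a Proxmox tool.
# $1 = HA resources file, $2 = directory that receives the backup copy.
backup_and_empty() {
    cfg=$1
    dest="$2/$(basename "$cfg").$(date +%Y%m%d-%H%M%S).bak"
    # Keep a copy first: emptying the file also drops comments and group settings.
    cp "$cfg" "$dest" || return 1
    # Same command as in this thread: leaves a file holding a single newline.
    echo "" > "$cfg" || return 1
    # Print where the backup went so the caller can find it again.
    printf '%s\n' "$dest"
}
```

Called as "backup_and_empty /etc/pve/ha/resources.cfg /root", it prints the path of the backup; the saved copy still lists the services that can later be re-added with ha-manager add.
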
It seems to have resolved all the problems with the inconsistent LRM/CRM
status in the GUI.
A new master has been elected. The manager_status file has been cleaned up.
All nodes are idle or active.

I am restarting all VMs in HA with "ha-manager add".
Seems to work now... :-/

On 09/11/2016 at 17:40, Dhaussy Alexandre wrote:
> Sorry, my old message was too big...
>
> Thanks for the input!
>
> I have attached the manager_status files.
> .old is the original file, and .new is the file I modified and put
> in /etc/pve/ha.
>
> I know this is bad, but here's what I've done:
>
> - delnode on known NON-working nodes.
> - rm -Rf /etc/pve/nodes/x for all NON-working nodes.
> - replace all NON-working nodes with working nodes in
>   /etc/pve/ha/manager_status
> - mv VM.conf files into the proper node directory
>   (/etc/pve/nodes/x/qemu-server/) according to /etc/pve/ha/manager_status
> - restart pve-ha-crm and pve-ha-lrm on all nodes
>
> Now on several nodes I have these messages:
>
> Nov 09 17:08:19 proxmoxt34 pve-ha-crm[26200]: status change startup =>
> wait_for_quorum
> Nov 09 17:12:04 proxmoxt34 pve-ha-crm[26200]: ipcc_send_rec failed:
> Transport endpoint is not connected
> Nov 09 17:12:04 proxmoxt34 pve-ha-crm[26200]: ipcc_send_rec failed:
> Connection refused
> Nov 09 17:12:04 proxmoxt34 pve-ha-crm[26200]: ipcc_send_rec failed:
> Connection refused
>
> Nov 09 17:08:22 proxmoxt34 pve-ha-lrm[26282]: status change startup =>
> wait_for_agent_lock
> Nov 09 17:12:07 proxmoxt34 pve-ha-lrm[26282]: ipcc_send_rec failed:
> Transport endpoint is not connected
>
> We are also investigating a possible network problem.
>
> On 09/11/2016 at 17:00, Thomas Lamprecht wrote:
>> Hi,
>>
>> On 09.11.2016 16:29, Dhaussy Alexandre wrote:
>>> I tried to remove them from HA in the GUI, but nothing happens.
>>> There are some services in "error" or "fence" state.
>>>
>>> Now I tried to remove the non-working nodes from the cluster... but I
>>> still see those nodes in /etc/pve/ha/manager_status.
>> Can you post the manager status please?
>>
>> Also, are pve-ha-lrm and pve-ha-crm up and running without any error
>> on all nodes, at least on those in the quorate partition?
>>
>> Check with:
>> systemctl status pve-ha-lrm
>> systemctl status pve-ha-crm
>>
>> If not, restart them, and if it is still problematic after that, please
>> post the output of the systemctl status call (if it's the same on all
>> nodes, one output should be enough).
>>
>>
>>> On 09/11/2016 at 16:13, Dietmar Maurer wrote:
>>>>> I wanted to remove vms from HA and start the vms locally, but I
>>>>> can't even do that (nothing happens.)
>> You can remove them from HA by emptying the HA resource file (this
>> also deletes comments and group settings, but if you need to start them
>> _now_ that shouldn't be a problem)
>>
>> echo "" > /etc/pve/ha/resources.cfg
>>
>> Afterwards you should be able to start them manually.
>>
>>
>>>> How do you do that exactly (on the GUI)? You should be able to start
>>>> them manually afterwards.
>>>>

_______________________________________________
pve-user mailing list
[email protected]
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
