On 09.11.2016 18:05, Dhaussy Alexandre wrote:
I have done a cleanup of resources with echo "" > /etc/pve/ha/resources.cfg

It seems to have resolved all problems with inconsistent status of
lrm/crm in the GUI.


Good. The logs would be interesting to see what went wrong, but I am not
sure I can go through all of them: your setup is not small, and there
may be a lot of noise from the outage in there.

If you have time, you may send me the log file(s) generated by:

journalctl --since "-2 days" -u corosync -u pve-ha-lrm -u pve-ha-crm -u pve-cluster > pve-log-$(hostname).log

(adapt the "-2 days" accordingly; it also understands something like "-1 day 3 hours")

Send them directly to my address (the list does not accept larger attachments;
the limit is something like 20-20 kb AFAIK).
I cannot promise any deep examination, but I can skim through them and
see what happened in the HA stack; maybe I will spot something obvious.
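
For a first pass over such a log before sending it, one could filter for HA state changes and cluster-filesystem IPC errors. This is only a sketch: the function name `ha_log_summary` is made up for illustration, and the two patterns are simply taken from the messages quoted elsewhere in this thread, not from any official list of relevant messages.

```shell
#!/bin/sh
# Hypothetical helper (not part of Proxmox): print only the lines of a
# collected journal dump that mention HA state changes or IPC failures.
# The patterns come from the log excerpts quoted in this thread.
ha_log_summary() {
    # $1: path to a log file written by the journalctl command above
    grep -E 'status change|ipcc_send_rec failed' "$1"
}

# Usage (file name follows the pve-log-$(hostname).log convention above):
#   ha_log_summary "pve-log-$(hostname).log"
```

The output is far smaller than the full dump, which may help to decide which time window is worth looking at in detail.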

A new master has been elected. The manager_status file has been
cleaned up.
All nodes are idle or active.

I am re-starting all VMs in HA with "ha-manager add".
Seems to work now... :-/

On 09/11/2016 at 17:40, Dhaussy Alexandre wrote:
Sorry my old message was too big...

Thanks for the input !...

I have attached manager_status files.
.old is the original file, and .new is the file I have modified and put
in /etc/pve/ha.

I know this is bad, but here's what I've done:

- delnode on known NON-working nodes.
- rm -Rf /etc/pve/nodes/x for all NON-working nodes.
- replace all NON-working nodes with working nodes in
/etc/pve/ha/manager_status
- mv VM.conf files into the proper node directory
(/etc/pve/nodes/x/qemu-server/) according to /etc/pve/ha/manager_status
- restart pve-ha-crm and pve-ha-lrm on all nodes

Now on several nodes I have these messages:

nov. 09 17:08:19 proxmoxt34 pve-ha-crm[26200]: status change startup => wait_for_quorum
nov. 09 17:12:04 proxmoxt34 pve-ha-crm[26200]: ipcc_send_rec failed: Transport endpoint is not connected
nov. 09 17:12:04 proxmoxt34 pve-ha-crm[26200]: ipcc_send_rec failed: Connection refused
nov. 09 17:12:04 proxmoxt34 pve-ha-crm[26200]: ipcc_send_rec failed: Connection refused



This means that something with the cluster filesystem (pve-cluster) was not OK.
Those messages weren't there previously?


nov. 09 17:08:22 proxmoxt34 pve-ha-lrm[26282]: status change startup => wait_for_agent_lock
nov. 09 17:12:07 proxmoxt34 pve-ha-lrm[26282]: ipcc_send_rec failed: Transport endpoint is not connected

We are also investigating a possible network problem.


Multicast properly working?


On 09/11/2016 at 17:00, Thomas Lamprecht wrote:
Hi,

On 09.11.2016 16:29, Dhaussy Alexandre wrote:
I tried to remove them from HA in the GUI, but nothing happens.
There are some services in "error" or "fence" state.

Now I tried to remove the non-working nodes from the cluster... but I
still see those nodes in /etc/pve/ha/manager_status.

Can you post the manager status please?

Also, are pve-ha-lrm and pve-ha-crm up and running without any error
on all nodes, at least on those in the quorate partition?

check with:
systemctl status pve-ha-lrm
systemctl status pve-ha-crm

If not, restart them, and if it is still problematic after that, please post
the output of the systemctl status call (if it is the same on all nodes, one
output should be enough).
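
To get a quick yes/no answer per node before reading the full status output, a small wrapper around `systemctl is-active` could be used. This is a minimal sketch: the function name `check_ha_services` is hypothetical, and it only looks at the local node, so it would have to be run on each node in turn.

```shell
#!/bin/sh
# Hypothetical helper: report whether both HA services are active on the
# local node. It relies only on `systemctl is-active`, which prints the
# unit state and exits non-zero when the unit is not active.
check_ha_services() {
    rc=0
    for unit in pve-ha-lrm pve-ha-crm; do
        # "|| true" keeps the loop going when a unit is not active
        state=$(systemctl is-active "$unit") || true
        echo "$unit: $state"
        [ "$state" = "active" ] || rc=1
    done
    return "$rc"
}

# Usage:
#   check_ha_services || echo "at least one HA service is not active"
```

If the function reports a problem, the full `systemctl status` output for the affected unit is still the thing to post.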


On 09/11/2016 at 16:13, Dietmar Maurer wrote:
I wanted to remove vms from HA and start the vms locally, but I can't even do
that (nothing happens).

You can remove them from HA by emptying the HA resource file (this also
deletes comments and group settings, but if you need to start them _now_ that
shouldn't be a problem)

echo "" > /etc/pve/ha/resources.cfg

Afterwards you should be able to start them manually.
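
Since emptying the file also discards comments and group settings, it may be worth keeping a copy first. The following is a sketch under assumptions, not an official procedure: the function name `empty_ha_resources`, the backup directory `/root` and the backup file naming are all chosen for illustration; only the default path `/etc/pve/ha/resources.cfg` comes from this thread.

```shell
#!/bin/sh
# Hypothetical helper: keep a copy of the HA resource file, then empty it.
# $1: resource file (default: the path used in this thread)
# $2: backup directory (assumption: /root, i.e. somewhere outside /etc/pve)
empty_ha_resources() {
    cfg=${1:-/etc/pve/ha/resources.cfg}
    backup_dir=${2:-/root}
    backup="$backup_dir/$(basename "$cfg").$(date +%Y%m%d-%H%M%S).bak"
    # Stop if the copy fails, so the file is never emptied without a backup.
    cp -- "$cfg" "$backup" || return 1
    # Truncate to zero bytes (echo "" would leave a single newline instead).
    : > "$cfg" || return 1
    # Print the backup path so the caller knows where the old content went.
    echo "$backup"
}

# Usage:
#   empty_ha_resources
```

The old group settings and comments can then be copied back by hand from the backup once the VMs are running again.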


How do you do that exactly (on the GUI)? You should be able to start them
manually afterwards.

_______________________________________________
pve-user mailing list
[email protected]
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
