On 03/08/2017 10:40 AM, Daniel wrote:
Hi there,

a colleague removed one server from the datacenter, and since then the whole 
cluster is broken:

Did this server act as a multicast querier? Could explain the behavior.

Check whether your switch has IGMP snooping enabled; if so, you could disable it temporarily to see if that fixes the problem (this may have a performance impact on the whole network, as multicast messages then get delivered to all network members).

You may also try to enable a querier on one node:

# echo 1 > /sys/devices/virtual/net/vmbr0/bridge/multicast_querier
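
To confirm the write took effect, the bridge's multicast flags can be read back from the same sysfs directory. A minimal sketch, assuming the bridge is named vmbr0 as above; the function name bridge_mcast_flags and its optional second argument (an alternate sysfs root, handy for dry runs) are made up for illustration. Note that a value written to sysfs this way does not survive a reboot.

```shell
# Print the multicast snooping and querier flags of a Linux bridge.
# Usage: bridge_mcast_flags [bridge] [sysfs_root]
# Both arguments are optional; the sysfs_root override is a hypothetical
# convenience for trying the function against a copy of the directory tree.
bridge_mcast_flags() {
    bridge="${1:-vmbr0}"
    root="${2:-/sys/devices/virtual/net}"
    dir="$root/$bridge/bridge"

    if [ ! -d "$dir" ]; then
        echo "no bridge directory at $dir" >&2
        return 1
    fi

    for flag in multicast_snooping multicast_querier; do
        # Each file holds a single 0 or 1.
        printf '%s=%s\n' "$flag" "$(cat "$dir/$flag")"
    done
}
```

Running "bridge_mcast_flags vmbr0" on the node after the echo above should report multicast_querier=1.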


Mar  8 10:35:00 host01 pvestatd[2090]: ipcc_send_rec failed: Connection refused
Mar  8 10:35:00 host01 pvestatd[2090]: ipcc_send_rec failed: Connection refused
Mar  8 10:35:00 host01 pvestatd[2090]: ipcc_send_rec failed: Connection refused
Mar  8 10:35:00 host01 pvestatd[2090]: ipcc_send_rec failed: Connection refused
Mar  8 10:35:01 host01 pmxcfs[7399]: [dcdb] notice: cpg_join retry 230
Mar  8 10:35:01 host01 snmpd[1441]: Connection from UDP: [10.0.2.50]:40800->[10.0.2.110]:161
Mar  8 10:35:01 host01 snmpd[1441]: Connection from UDP: [10.0.2.50]:55768->[10.0.2.110]:161
Mar  8 10:35:02 host01 pmxcfs[7399]: [dcdb] notice: cpg_join retry 240
Mar  8 10:35:03 host01 pmxcfs[7399]: [dcdb] notice: cpg_join retry 250
Mar  8 10:35:04 host01 pmxcfs[7399]: [dcdb] notice: cpg_join retry 260
Mar  8 10:35:05 host01 pmxcfs[7399]: [dcdb] notice: cpg_join retry 270
Mar  8 10:35:06 host01 pmxcfs[7399]: [dcdb] notice: cpg_join retry 280
Mar  8 10:35:07 host01 pmxcfs[7399]: [dcdb] notice: cpg_join retry 290
Mar  8 10:35:08 host01 /usr/share/filebeat/bin/filebeat[20736]: logp.go:230: Non-zero metrics in the last 30s: libbeat.logstash.call_count.PublishEvents=6 libbeat.logstash.publish.write_bytes=4907 libbeat.publisher.published_events=76 libbeat.logstash.published_and_acked_events=76 publish.events=76 libbeat.logstash.publish.read_bytes=222 registrar.states.update=76 registrar.writes=6
Mar  8 10:35:08 host01 pmxcfs[7399]: [dcdb] notice: cpg_join retry 300
Mar  8 10:35:09 host01 pmxcfs[7399]: [dcdb] notice: cpg_join retry 310
Mar  8 10:35:10 host01 pmxcfs[7399]: [dcdb] notice: cpg_join retry 320
Mar  8 10:35:10 host01 pvestatd[2090]: ipcc_send_rec failed: Connection refused
Mar  8 10:35:10 host01 pvestatd[2090]: ipcc_send_rec failed: Connection refused
Mar  8 10:35:10 host01 pvestatd[2090]: ipcc_send_rec failed: Connection refused
Mar  8 10:35:10 host01 pvestatd[2090]: ipcc_send_rec failed: Connection refused
Mar  8 10:35:10 host01 pvestatd[2090]: ipcc_send_rec failed: Connection refused
Mar  8 10:35:10 host01 pvestatd[2090]: ipcc_send_rec failed: Connection refused

So /etc/pve/ is not mounted anymore and I can't restart anything.
Does anyone have an idea what could have happened?

What's your corosync and pve-cluster status?
# systemctl status corosync pve-cluster

Looks like corosync is dead/broken and does not let our cluster filesystem join.
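
A quick way to tell whether pmxcfs is still stuck is to look at the cpg_join retry counter in the syslog: if it keeps growing, the cluster filesystem has not joined yet. A rough sketch, assuming log lines in the format shown above; the function name last_cpg_retry is made up for illustration, and where the syslog lives is an assumption that differs between setups.

```shell
# Read syslog lines on stdin and print the most recent
# "cpg_join retry N" counter, or nothing if no such line is found.
last_cpg_retry() {
    awk 'match($0, /cpg_join retry [0-9]+/) {
             # "cpg_join retry " is 15 characters; the number follows it.
             n = substr($0, RSTART + 15, RLENGTH - 15)
             seen = 1
         }
         END { if (seen) print n }'
}
```

Fed the excerpt above on stdin, it prints 320. Calling it twice a few seconds apart on the live log shows whether the counter is still climbing.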

cheers and good luck,
Thomas

_______________________________________________
pve-user mailing list
pve-user@pve.proxmox.com
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
