On 03/08/2017 10:40 AM, Daniel wrote:
Hi there,
a colleague removed one server from the datacenter, and since then the whole
cluster is broken:
Did this server act as a multicast querier? That could explain the behavior.
Check whether IGMP snooping is enabled on your switch; if it is, you could
disable it temporarily to see if that fixes the problem (this may have a
performance impact on the whole network, as multicast messages then get
delivered to all network members).
You may also try to enable a querier on one node:
# echo 1 > /sys/devices/virtual/net/vmbr0/bridge/multicast_querier
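Before changing anything, it may help to look at the current settings of the bridge. The following is only a sketch: it assumes the bridge is called vmbr0 as in the command above, and the function name show_bridge_mcast is made up for this example. The optional second argument only exists so the function can be pointed at a different directory than /sys/devices/virtual/net.

```shell
#!/bin/sh
# Print the multicast snooping and querier settings of a Linux bridge.
# Usage: show_bridge_mcast [bridge] [sysfs_net_dir]
# (hypothetical helper, the bridge name vmbr0 is an assumption)
show_bridge_mcast() {
    bridge="${1:-vmbr0}"
    base="${2:-/sys/devices/virtual/net}/$bridge/bridge"
    for knob in multicast_snooping multicast_querier; do
        if [ -r "$base/$knob" ]; then
            # 1 means enabled, 0 means disabled
            printf '%s %s=%s\n' "$bridge" "$knob" "$(cat "$base/$knob")"
        else
            printf '%s %s=unavailable\n' "$bridge" "$knob"
        fi
    done
}

show_bridge_mcast vmbr0
```

If multicast_snooping is 1 and multicast_querier is 0 on every node, and no other device on the network sends IGMP queries, that would fit the querier theory above.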
Mar 8 10:35:00 host01 pvestatd[2090]: ipcc_send_rec failed: Connection refused
Mar 8 10:35:00 host01 pvestatd[2090]: ipcc_send_rec failed: Connection refused
Mar 8 10:35:00 host01 pvestatd[2090]: ipcc_send_rec failed: Connection refused
Mar 8 10:35:00 host01 pvestatd[2090]: ipcc_send_rec failed: Connection refused
Mar 8 10:35:01 host01 pmxcfs[7399]: [dcdb] notice: cpg_join retry 230
Mar 8 10:35:01 host01 snmpd[1441]: Connection from UDP: [10.0.2.50]:40800->[10.0.2.110]:161
Mar 8 10:35:01 host01 snmpd[1441]: Connection from UDP: [10.0.2.50]:55768->[10.0.2.110]:161
Mar 8 10:35:02 host01 pmxcfs[7399]: [dcdb] notice: cpg_join retry 240
Mar 8 10:35:03 host01 pmxcfs[7399]: [dcdb] notice: cpg_join retry 250
Mar 8 10:35:04 host01 pmxcfs[7399]: [dcdb] notice: cpg_join retry 260
Mar 8 10:35:05 host01 pmxcfs[7399]: [dcdb] notice: cpg_join retry 270
Mar 8 10:35:06 host01 pmxcfs[7399]: [dcdb] notice: cpg_join retry 280
Mar 8 10:35:07 host01 pmxcfs[7399]: [dcdb] notice: cpg_join retry 290
Mar 8 10:35:08 host01 /usr/share/filebeat/bin/filebeat[20736]: logp.go:230: Non-zero metrics in the last 30s: libbeat.logstash.call_count.PublishEvents=6 libbeat.logstash.publish.write_bytes=4907 libbeat.publisher.published_events=76 libbeat.logstash.published_and_acked_events=76 publish.events=76 libbeat.logstash.publish.read_bytes=222 registrar.states.update=76 registrar.writes=6
Mar 8 10:35:08 host01 pmxcfs[7399]: [dcdb] notice: cpg_join retry 300
Mar 8 10:35:09 host01 pmxcfs[7399]: [dcdb] notice: cpg_join retry 310
Mar 8 10:35:10 host01 pmxcfs[7399]: [dcdb] notice: cpg_join retry 320
Mar 8 10:35:10 host01 pvestatd[2090]: ipcc_send_rec failed: Connection refused
Mar 8 10:35:10 host01 pvestatd[2090]: ipcc_send_rec failed: Connection refused
Mar 8 10:35:10 host01 pvestatd[2090]: ipcc_send_rec failed: Connection refused
Mar 8 10:35:10 host01 pvestatd[2090]: ipcc_send_rec failed: Connection refused
Mar 8 10:35:10 host01 pvestatd[2090]: ipcc_send_rec failed: Connection refused
Mar 8 10:35:10 host01 pvestatd[2090]: ipcc_send_rec failed: Connection refused
So /etc/pve/ is not mounted anymore and I can't restart anything.
Does anyone have an idea what could have happened?
What is the status of corosync and pve-cluster?
# systemctl status corosync pve-cluster
It looks like corosync is dead or broken, so our cluster filesystem
(pmxcfs) cannot join.
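For a quick yes/no answer per service, something like the following could be used. It is a minimal sketch: check_units is a name made up for this example, and it only relies on "systemctl is-active --quiet", which returns a non-zero exit code when a unit is not active.

```shell
#!/bin/sh
# Report whether each given systemd unit is currently active.
# (check_units is a hypothetical helper name)
check_units() {
    for unit in "$@"; do
        if systemctl is-active --quiet "$unit" 2>/dev/null; then
            echo "$unit: active"
        else
            echo "$unit: NOT active"
        fi
    done
}

check_units corosync pve-cluster
```

If corosync shows up as not active, "journalctl -u corosync -b" should show why it stopped, and "pvecm status" shows whether the node still has quorum.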
cheers and good luck,
Thomas
_______________________________________________
pve-user mailing list
pve-user@pve.proxmox.com
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user