Hi all,

In a 3-node cluster, we're experiencing a strange clustering problem.

Sometimes the first node drops out of quorum, usually for a few hours, and later returns to quorum.

During the last 2 weeks, this has happened 7 times.
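
To tally these episodes from the logs instead of by hand, a minimal sketch follows. It assumes pmxcfs logs the notices "node lost quorum" and "node has quorum" with a classic syslog timestamp (check your own /var/log/syslog or `journalctl -u pve-cluster` output for the exact wording); `quorum_outages` is a hypothetical helper, not a Proxmox tool.

```python
import re
from datetime import datetime

# ASSUMPTION: pmxcfs message text; adjust the patterns if your syslog differs.
LOST = re.compile(r"node lost quorum")
REGAINED = re.compile(r"node has quorum")
# Classic syslog prefix such as "Jan 14 03:12:45" (it carries no year).
STAMP = re.compile(r"^([A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2})")


def quorum_outages(lines, year):
    """Pair each 'lost quorum' notice with the next 'has quorum' notice.

    Returns a list of (lost_at, regained_at) datetimes; regained_at is
    None when the log ends while the node is still out of quorum.
    Year rollover inside the log is not handled.
    """
    outages = []
    lost_at = None
    for line in lines:
        m = STAMP.match(line)
        if not m:
            continue  # not a syslog-style line
        ts = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
        if LOST.search(line) and lost_at is None:
            lost_at = ts
        elif REGAINED.search(line) and lost_at is not None:
            outages.append((lost_at, ts))
            lost_at = None
    if lost_at is not None:
        outages.append((lost_at, None))
    return outages
```

For example, `quorum_outages(open("/var/log/syslog"), 2019)` on node1 would list each out-of-quorum window, which can then be lined up with the RRD gaps on the other nodes.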

Additionally, on one occasion the second and third nodes dropped out of quorum; soon after, the first and third nodes regained quorum together. The second node rejoined only after a manual restart of pve-cluster.
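
For reference, the quorum arithmetic behind both incidents, as a minimal sketch assuming one vote per node and corosync votequorum's default majority rule (floor(expected_votes / 2) + 1). The function names are hypothetical.

```python
def quorum_threshold(expected_votes):
    """Votes needed for quorum: a strict majority of the expected votes."""
    return expected_votes // 2 + 1


def is_quorate(partition_votes, expected_votes):
    """True if a partition holding partition_votes may provide service."""
    return partition_votes >= quorum_threshold(expected_votes)


# ASSUMPTION: three nodes with one vote each (expected_votes = 3).
EXPECTED_VOTES = 3
for partition in (["node1"], ["node2", "node3"], ["node1", "node3"], ["node2"]):
    state = "quorate" if is_quorate(len(partition), EXPECTED_VOTES) else "not quorate"
    print("+".join(partition), state)
```

With 3 expected votes the threshold is 2, so node1 on its own can never be quorate while node2 + node3 can; in the one-off incident, node1 + node3 formed a quorate pair and an isolated node2 could not.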

The strange thing (at least to me) is that the 2nd and 3rd nodes have lost RRD data around the times the 1st node was out (no graphs in the GUI for those hours). The 1st node has all its RRD data and its graphs are complete.
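
To pin down exactly which intervals are missing on node2 and node3 (and compare with node1), one could dump the same RRD on each node with `rrdtool fetch <file> AVERAGE --start <time> --end <time>` and list the all-unknown stretches. A minimal sketch; we assume the per-node RRD files live under /var/lib/rrdcached/db/pve2-node/ on PVE 5 (please verify on your installation), and `unknown_windows` is a hypothetical helper.

```python
import math


def unknown_windows(fetch_output):
    """Return (first_ts, last_ts) runs of rows in which every value is NaN.

    Expects the plain-text layout of `rrdtool fetch FILE AVERAGE ...`:
    a header line with data-source names, a blank line, then rows of
    "EPOCH: value value ...", where unknown values print as nan or -nan.
    """
    windows = []
    start = last = None
    for line in fetch_output.splitlines():
        ts_text, sep, rest = line.partition(":")
        if not sep:
            continue  # header or blank line
        try:
            ts = int(ts_text)
            values = [float(v) for v in rest.split()]
        except ValueError:
            continue  # not a data row
        if values and all(math.isnan(v) for v in values):
            if start is None:
                start = ts  # a new unknown stretch begins
            last = ts
        elif start is not None:
            windows.append((start, last))  # stretch ended on a known row
            start = None
    if start is not None:
        windows.append((start, last))
    return windows
```

Running it on the same file and time range on all three nodes would show whether node2 and node3 are missing exactly node1's out-of-quorum windows.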

I understand that we could have a network problem (we're trying to catch the problem live again for additional tests), but why is RRD data missing on the nodes that remained in the cluster? Any ideas?


Servers:
node1 - 1x E3-1240 v6 (4 cores / 8 threads) - 64 GB RAM - 1x 10G for VM + cluster, 2x 1G for storage
node2 - 2x E5507 (4 cores) - 96 GB RAM - 2x 1G for VM + cluster, 2x 1G for storage
node3 - 2x E5507 (4 cores) - 96 GB RAM - 2x 1G for VM + cluster, 2x 1G for storage

VM storage is an EMC VNXe3200.
The switch is an HP 5406zl with 5 switch modules:
- node1 is connected to module E (8x 10G),
- node2 and node3 are connected to module A (24x 1G).
The two storage switches are Cisco Catalyst 2960G.

All nodes have plenty of free RAM (usage below 50%), use at most 10-20% of network capacity, and have a mean CPU usage below 20%.

Package versions (identical on all three nodes):
# pveversion -v
proxmox-ve: 5.3-1 (running kernel: 4.15.18-9-pve)
pve-manager: 5.3-5 (running version: 5.3-5/97ae681d)
pve-kernel-4.15: 5.2-12
pve-kernel-4.15.18-9-pve: 4.15.18-30
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-3
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-43
libpve-guest-common-perl: 2.0-18
libpve-http-server-perl: 2.0-11
libpve-storage-perl: 5.0-33
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.0.2+pve1-5
lxcfs: 3.0.2-2
novnc-pve: 1.0.0-2
proxmox-widget-toolkit: 1.0-22
pve-cluster: 5.0-31
pve-container: 2.0-31
pve-docs: 5.3-1
pve-edk2-firmware: 1.20181023-1
pve-firewall: 3.0-16
pve-firmware: 2.0-6
pve-ha-manager: 2.0-5
pve-i18n: 1.0-9
pve-libspice-server1: 0.14.1-1
pve-qemu-kvm: 2.12.1-1
pve-xtermjs: 1.0-5
qemu-server: 5.0-43
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.12-pve1~bpo1


Thanks a lot
Eneko

--
Technical Director
Binovo IT Human Project, S.L.
Telf. 943569206
Astigarraga bidea 2, 2º izq. oficina 11; 20180 Oiartzun (Gipuzkoa)
www.binovo.es

_______________________________________________
pve-user mailing list
pve-user@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
