>> It appeared after the network team changed the active network equipment in
>> the building (but this might not be the origin of the problem).

Hi, what are the previous and the new equipment? (I have had some Cisco problems in the past.)

----- Original message -----
From: "Jonathan Schaeffer" <[email protected]>
To: [email protected]
Sent: Monday 8 July 2013 12:56:48
Subject: [PVE-User] Cman crash problem

Hi all,

I'm experiencing a serious problem on our 4-node cluster (PVE 3.0). It appeared after the network team changed the active network equipment in the building (but this might not be the origin of the problem).

The symptoms are:
- The nodes appear in red in the web GUI, except the one hosting the web service IP
- The VMs, while still running correctly, do not show any information (running status, RRD graphs, etc.)
- clustat shows the nodes as "online"
- some nodes seem to have been fenced (although they were not restarted) (see log extracts: barbossa_fenced.log and jim_fenced.log)
- /var/log/cluster/corosync.log shows a LOT of messages like this one:

Jul 08 07:06:49 corosync [TOTEM ] Retransmit List: 13f54a 13f54b 13f54c 13f54d 13f54e 13f54f 13f550 13f551 13f552 13f553 13f554 13f555 13f556 13f557 13f558 13f559 13f55a 13f55b 13f55c 13f55d 13f55e

If I restart one node, fencing will happen: the other nodes will reboot, along with all the VMs they host. I don't want this to happen.

I can provide more logs if necessary. Do you have any idea that could help me understand what is going on here?

Thanks,
Jonathan

barbossa_fenced.log :
Jul 03 12:07:21 fenced fencing deferred to jim
Jul 03 13:45:40 fenced receive_start 1:15 add node with started_count 8
Jul 03 13:45:40 fenced receive_start 2:11 add node with started_count 4
Jul 03 13:45:40 fenced receive_start 3:7 add node with started_count 1
Jul 04 00:29:35 fenced receive_start 1:16 add node with started_count 8
Jul 04 00:29:35 fenced receive_start 3:8 add node with started_count 1
Jul 04 00:38:31 fenced receive_start 2:17 add node with started_count 4
Jul 04 00:38:31 fenced receive_start 3:13 add node with started_count 1
Jul 04 00:38:31 fenced receive_start 1:21 add node with started_count 8
Jul 04 10:44:12 fenced receive_start 1:22 add node with started_count 8
Jul 04 10:44:12 fenced receive_start 3:14 add node with started_count 1
Jul 04 10:44:24 fenced receive_start 1:23 add node with started_count 8
Jul 04 10:44:24 fenced telling cman to remove nodeid 2 from cluster

jim_fenced.log :
Jul 03 12:07:21 fenced fencing node longjohn
Jul 03 12:07:32 fenced fence longjohn success
Jul 03 13:45:40 fenced receive_start 5:13 add node with started_count 6
Jul 03 13:45:40 fenced receive_start 2:11 add node with started_count 4
Jul 03 13:45:40 fenced receive_start 3:7 add node with started_count 1
Jul 04 00:29:35 fenced receive_start 3:8 add node with started_count 1
Jul 04 00:29:35 fenced receive_start 5:14 add node with started_count 6
Jul 04 00:38:31 fenced receive_start 2:17 add node with started_count 4
Jul 04 00:38:31 fenced receive_start 3:13 add node with started_count 1
Jul 04 00:38:31 fenced receive_start 5:19 add node with started_count 6
Jul 04 10:44:12 fenced receive_start 5:20 add node with started_count 6
Jul 04 10:44:12 fenced receive_start 3:14 add node with started_count 1
Jul 04 10:44:24 fenced telling cman to remove nodeid 2 from cluster
Jul 04 10:44:24 fenced receive_start 2:23 add node with started_count 4
Jul 04 10:44:24 fenced receive_start 3:15 add node with started_count 1
Jul 04 10:44:24 fenced receive_start 5:21 add node with started_count 6
Jul 04 10:44:46 fenced receive_start 5:22 add node with started_count 6
Jul 04 10:44:46 fenced receive_start 3:16 add node with started_count 1

longjohn_fenced.log :
Jul 03 09:47:12 fenced fenced 1352871249 started
Jul 03 11:28:46 fenced cluster is down, exiting
Jul 03 11:28:46 fenced daemon cpg_dispatch error 2
Jul 03 12:11:43 fenced fenced 1364188437 started
Jul 03 13:45:40 fenced receive_start 5:13 add node with started_count 6
Jul 03 13:45:40 fenced receive_start 1:15 add node with started_count 8
Jul 03 13:45:40 fenced receive_start 2:11 add node with started_count 4
Jul 04 00:29:35 fenced receive_start 1:16 add node with started_count 8
Jul 04 00:29:35 fenced receive_start 5:14 add node with started_count 6
Jul 04 00:38:31 fenced receive_start 2:17 add node with started_count 4
Jul 04 00:38:31 fenced receive_start 1:21 add node with started_count 8
Jul 04 00:38:31 fenced receive_start 5:19 add node with started_count 6
Jul 04 10:44:12 fenced receive_start 1:22 add node with started_count 8
Jul 04 10:44:12 fenced receive_start 5:20 add node with started_count 6
Jul 04 10:44:24 fenced receive_start 1:23 add node with started_count 8
Jul 04 10:44:24 fenced telling cman to remove nodeid 2 from cluster
Jul 04 10:44:24 fenced receive_start 2:23 add node with started_count 4
Jul 04 10:44:24 fenced receive_start 5:21 add node with started_count 6
Jul 04 10:44:46 fenced receive_start 5:22 add node with started_count 6
Jul 04 10:44:46 fenced receive_start 1:24 add node with started_count 8

flint_fenced.log :
Jul 03 11:18:30 fenced fenced 1364188437 started
Jul 03 12:07:21 fenced fencing deferred to jim
Jul 03 13:45:40 fenced receive_start 5:13 add node with started_count 6
Jul 03 13:45:40 fenced receive_start 1:15 add node with started_count 8
Jul 03 13:45:40 fenced receive_start 3:7 add node with started_count 1
Jul 04 00:38:31 fenced receive_start 3:13 add node with started_count 1
Jul 04 00:38:31 fenced receive_start 1:21 add node with started_count 8
Jul 04 00:38:31 fenced receive_start 5:19 add node with started_count 6
Jul 04 10:44:24 fenced receive_start 1:23 add node with started_count 8
Jul 04 10:44:24 fenced receive_start 3:15 add node with started_count 1
Jul 04 10:44:24 fenced receive_start 5:21 add node with started_count 6
Jul 04 10:44:24 fenced cluster is down, exiting

--
IUEM - Service Informatique
rue Dumont D'Urville
Technopôle Brest-Iroise
29280 Plouzané
France
http://www-iuem.univ-brest.fr/feiri
tel: +33 2 98 49 87 94

_______________________________________________
pve-user mailing list
[email protected]
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
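
For anyone who wants to quantify the "Retransmit List" flood described in the original message, here is a minimal Python sketch. It assumes the corosync.log lines have the same shape as the single line quoted above. The function name `parse_retransmit` and the regular expression are illustrative only and are not part of corosync or PVE. The sketch extracts the hexadecimal sequence numbers from one line so they can be counted or compared over time.

```python
import re

# Assumed line shape, taken from the one corosync.log line quoted in the message:
#   Jul 08 07:06:49 corosync [TOTEM ] Retransmit List: 13f54a 13f54b ...
RETRANSMIT_RE = re.compile(
    r"corosync \[TOTEM\s*\] Retransmit List: (?P<seqs>[0-9a-f ]+)"
)


def parse_retransmit(line):
    """Return the retransmitted sequence numbers of a log line as ints.

    Returns an empty list when the line is not a Retransmit List message.
    """
    match = RETRANSMIT_RE.search(line)
    if not match:
        return []
    # Sequence numbers are printed in hexadecimal, separated by spaces.
    return [int(token, 16) for token in match.group("seqs").split()]


# Shortened sample built from the line quoted in the message.
sample = (
    "Jul 08 07:06:49 corosync [TOTEM ] Retransmit List: "
    "13f54a 13f54b 13f54c 13f54d"
)
seqs = parse_retransmit(sample)
print(len(seqs), hex(seqs[0]), hex(seqs[-1]))
```

Applying the function to every line of /var/log/cluster/corosync.log and summing the list lengths per minute would give a rough measure of how much traffic is being retransmitted. That number can then be compared before and after any change on the network side.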
