The ring id is neither saved nor restored when recovery fails.  As a
result, messages are delivered incorrectly if recovery fails after
messages have been ordered during recovery.

Most of the time this doesn't matter because no messages need to be
recovered.  On high-loss networks, however, recovery happens often.  In
one such scenario, a gather occurs (triggering a recovery failure),
which causes the APIs to block completely or segfault.
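
To make the intent of the fix easier to follow, here is a minimal
standalone sketch of the save/restore pattern.  The struct layouts and
field types are simplified, hypothetical stand-ins, not the real
definitions in exec/totemsrp.c; only the memcpy-based save and restore
of the ring id mirrors the patch.

```c
#include <string.h>

/* Hypothetical stand-ins for the real totemsrp types; the fields are
 * simplified and do not match the definitions in exec/totemsrp.c. */
struct ring_id {
	unsigned int rep;        /* stand-in for the representative address */
	unsigned long long seq;  /* ring sequence number */
};

struct instance {
	struct ring_id my_ring_id;
	struct ring_id my_old_ring_id;
	unsigned int my_aru;
	unsigned int old_ring_state_aru;
	int old_ring_state_saved;
};

/* Save the ring id together with the rest of the ring state.  Only the
 * first call after a reset takes a snapshot; later calls are no-ops. */
static void ring_state_save (struct instance *inst)
{
	if (inst->old_ring_state_saved == 0) {
		inst->old_ring_state_saved = 1;
		memcpy (&inst->my_old_ring_id, &inst->my_ring_id,
			sizeof (struct ring_id));
		inst->old_ring_state_aru = inst->my_aru;
	}
}

/* Restore the saved ring id instead of zeroing the representative. */
static void ring_state_restore (struct instance *inst)
{
	if (inst->old_ring_state_saved) {
		memcpy (&inst->my_ring_id, &inst->my_old_ring_id,
			sizeof (struct ring_id));
		inst->my_aru = inst->old_ring_state_aru;
	}
}
```

In this sketch, a failed recovery that calls ring_state_restore() puts
the instance back on the ring id it had when the state was first saved,
so the restored aru is interpreted against the matching ring.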

Regards
-steve
Index: exec/totemsrp.c
===================================================================
--- exec/totemsrp.c	(revision 2791)
+++ exec/totemsrp.c	(working copy)
@@ -1390,6 +1390,8 @@
 {
 	if (instance->old_ring_state_saved == 0) {
 		instance->old_ring_state_saved = 1;
+		memcpy (&instance->my_old_ring_id, &instance->my_ring_id,
+			sizeof (struct memb_ring_id));
 		instance->old_ring_state_aru = instance->my_aru;
 		instance->old_ring_state_high_seq_received = instance->my_high_seq_received;
 		log_printf (instance->totemsrp_log_level_debug,
@@ -1401,7 +1403,9 @@
 static void ring_state_restore (struct totemsrp_instance *instance)
 {
 	if (instance->old_ring_state_saved) {
-		totemip_zero_set(&instance->my_ring_id.rep);
+		memcpy (&instance->my_ring_id, &instance->my_old_ring_id,
+			sizeof (struct memb_ring_id));
+
 		instance->my_aru = instance->old_ring_state_aru;
 		instance->my_high_seq_received = instance->old_ring_state_high_seq_received;
 		log_printf (instance->totemsrp_log_level_debug,
@@ -1412,6 +1416,8 @@
 
 static void old_ring_state_reset (struct totemsrp_instance *instance)
 {
+	log_printf (instance->totemsrp_log_level_debug,
+		"Resetting old ring state\n");
 	instance->old_ring_state_saved = 0;
 }
 