The ring id is neither saved nor restored when recovery fails. As a
result, messages are delivered incorrectly whenever recovery fails after
messages have already been ordered during that recovery.
Most of the time this doesn't matter because no messages need to be
recovered. On high-loss networks, however, recovery happens often. One
scenario is that a gather occurs (triggering a recovery failure), which
causes the APIs to block completely or segfault.
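To illustrate the idea outside of totemsrp, here is a minimal sketch of
the save/restore pattern. The struct layouts, the function
failed_recovery_keeps_ring_id and the numeric values are hypothetical
simplifications, not the real totemsrp definitions; only the shape of
the save/restore logic mirrors the patch.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the real totemsrp types. */
struct ring_id {
	unsigned int rep;		/* representative node (simplified to an int) */
	unsigned long long seq;		/* ring sequence number */
};

struct instance {
	struct ring_id my_ring_id;
	struct ring_id my_old_ring_id;
	unsigned int my_aru;
	unsigned int old_ring_state_aru;
	int old_ring_state_saved;
};

/* Snapshot the ring state once; later calls keep the first snapshot. */
static void ring_state_save (struct instance *inst)
{
	if (inst->old_ring_state_saved == 0) {
		inst->old_ring_state_saved = 1;
		memcpy (&inst->my_old_ring_id, &inst->my_ring_id,
			sizeof (struct ring_id));
		inst->old_ring_state_aru = inst->my_aru;
	}
}

/* Put back the snapshot, including the ring id, if one was taken. */
static void ring_state_restore (struct instance *inst)
{
	if (inst->old_ring_state_saved) {
		memcpy (&inst->my_ring_id, &inst->my_old_ring_id,
			sizeof (struct ring_id));
		inst->my_aru = inst->old_ring_state_aru;
	}
}

/*
 * Simulate a recovery attempt that is abandoned part way through.
 * Returns 1 if the instance is back on its original ring afterwards.
 */
int failed_recovery_keeps_ring_id (void)
{
	struct instance inst;

	memset (&inst, 0, sizeof (inst));
	inst.my_ring_id.rep = 1;
	inst.my_ring_id.seq = 4;
	inst.my_aru = 10;

	ring_state_save (&inst);
	assert (inst.old_ring_state_saved == 1);

	/* Recovery moves to a tentative new ring (made-up values). */
	inst.my_ring_id.rep = 2;
	inst.my_ring_id.seq = 8;
	inst.my_aru = 3;

	/* Recovery fails, so the old ring state is restored. */
	ring_state_restore (&inst);

	return (inst.my_ring_id.rep == 1 &&
		inst.my_ring_id.seq == 4 &&
		inst.my_aru == 10);
}
```

In this sketch, dropping the memcpy from ring_state_restore would leave
the instance on the tentative ring id after the failed recovery.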
Regards
-steve
Index: exec/totemsrp.c
===================================================================
--- exec/totemsrp.c (revision 2791)
+++ exec/totemsrp.c (working copy)
@@ -1390,6 +1390,8 @@
 {
 	if (instance->old_ring_state_saved == 0) {
 		instance->old_ring_state_saved = 1;
+		memcpy (&instance->my_old_ring_id, &instance->my_ring_id,
+			sizeof (struct memb_ring_id));
 		instance->old_ring_state_aru = instance->my_aru;
 		instance->old_ring_state_high_seq_received = instance->my_high_seq_received;
 		log_printf (instance->totemsrp_log_level_debug,
@@ -1401,7 +1403,9 @@
 static void ring_state_restore (struct totemsrp_instance *instance)
 {
 	if (instance->old_ring_state_saved) {
-		totemip_zero_set(&instance->my_ring_id.rep);
+		memcpy (&instance->my_ring_id, &instance->my_old_ring_id,
+			sizeof (struct memb_ring_id));
+
 		instance->my_aru = instance->old_ring_state_aru;
 		instance->my_high_seq_received = instance->old_ring_state_high_seq_received;
 		log_printf (instance->totemsrp_log_level_debug,
@@ -1412,6 +1416,8 @@
 static void old_ring_state_reset (struct totemsrp_instance *instance)
 {
+	log_printf (instance->totemsrp_log_level_debug,
+		"Resetting old ring state\n");
 	instance->old_ring_state_saved = 0;
 }