On Sun, 2010-02-21 at 21:59 -0700, hj lee wrote:
> Hi,
>
> I am seeing this message from time to time in the log. Does this measure
> the pause time of corosync correctly? When corosync is scheduled
> back in, how is the memb_join message processed before the pause_timer
> expires? The pause_timer can expire before the memb_join message is
> processed, in which case it cannot measure how long corosync was
> descheduled.
>
HJ,
I have not seen any "process pause detected" messages with token=1000 at
a 32 node count. The pause_timer should expire every token/5, which
resets the pause_timestamp indicating when corosync was last scheduled.
The way coropoll works, though, is to schedule timers after executing
delivery of all the UDP messages. If it takes token/2 time to process
all those UDP messages, it is possible that the timer that resets the
pause_timestamp is being caught behind a bunch of messages processed by
the poll loop.
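
To make that ordering concrete, here is a minimal sketch in C. It is a
hypothetical model using simulated milliseconds, not corosync code:
struct poll_sim, poll_sim_init and poll_sim_iterate are names invented
for illustration, and the only behaviour taken from the description
above is that timers run after all queued messages are delivered and
that the pause timer is due every token/5.

```c
#include <stdint.h>

/* Hypothetical model of one poll loop; all times are simulated ms. */
struct poll_sim {
    uint64_t now_ms;          /* simulated clock */
    uint64_t timer_due_ms;    /* when the pause timer should next fire */
    uint64_t pause_timestamp; /* last time the pause timer actually ran */
    uint64_t interval_ms;     /* pause timer period, token / 5 */
};

static void poll_sim_init(struct poll_sim *s, uint64_t token_ms)
{
    s->now_ms = 0;
    s->pause_timestamp = 0;
    s->interval_ms = token_ms / 5;
    s->timer_due_ms = s->interval_ms;
}

/*
 * One poll iteration: deliver every queued message first, then run the
 * pause timer if it is due. Returns the age of pause_timestamp as seen
 * just before timers get a chance to run.
 */
static uint64_t poll_sim_iterate(struct poll_sim *s, unsigned msgs,
                                 uint64_t per_msg_ms)
{
    uint64_t age;

    s->now_ms += (uint64_t)msgs * per_msg_ms; /* message delivery */
    age = s->now_ms - s->pause_timestamp;

    if (s->now_ms >= s->timer_due_ms) {       /* timers run last */
        s->pause_timestamp = s->now_ms;
        s->timer_due_ms = s->now_ms + s->interval_ms;
    }
    return age;
}
```

In this model, with token=1000 a burst of 50 messages costing 10 ms
each leaves the timestamp 500 ms old (token/2) by the time timers run,
even though the process was never descheduled.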
Could you try the attached patch? It resets the pause timestamp on
receipt of the various message events, to prevent this theoretical
condition.
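
The effect of refreshing the timestamp from the message handlers can be
sketched as follows. This is again a hypothetical model in simulated
milliseconds, not the patch itself: struct rx_sim, rx_sim_touch and
rx_sim_deliver are invented names, and rx_sim_touch stands in for
timer_function_pause_timeout() under the assumption that the relevant
effect of that call is to refresh pause_timestamp.

```c
#include <stdint.h>

/* Hypothetical model of message delivery; all times are simulated ms. */
struct rx_sim {
    uint64_t now_ms;          /* simulated clock */
    uint64_t pause_timestamp; /* last time the timestamp was refreshed */
};

/* Stand-in for timer_function_pause_timeout(): assumed to refresh only. */
static void rx_sim_touch(struct rx_sim *s)
{
    s->pause_timestamp = s->now_ms;
}

/*
 * Deliver msgs messages back to back with no timers in between.
 * Returns the largest timestamp age observed at the end of any handler.
 * When touch_on_receipt is nonzero, each handler refreshes the
 * timestamp before returning, as the patch does.
 */
static uint64_t rx_sim_deliver(struct rx_sim *s, unsigned msgs,
                               uint64_t per_msg_ms, int touch_on_receipt)
{
    uint64_t worst = 0;
    unsigned i;

    for (i = 0; i < msgs; i++) {
        uint64_t age;

        s->now_ms += per_msg_ms;              /* handler does its work */
        age = s->now_ms - s->pause_timestamp;
        if (age > worst) {
            worst = age;
        }
        if (touch_on_receipt) {
            rx_sim_touch(s);                  /* refresh before return */
        }
    }
    return worst;
}
```

For 50 messages at 10 ms each, the worst age in this model is 500 ms
without the refresh and 10 ms with it, so a long burst of messages no
longer looks like a pause.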
Regards
-steve
> Thanks
> hj
> _______________________________________________
> Openais mailing list
> [email protected]
> https://lists.linux-foundation.org/mailman/listinfo/openais
Index: exec/totemsrp.c
===================================================================
--- exec/totemsrp.c (revision 2662)
+++ exec/totemsrp.c (working copy)
@@ -3427,6 +3427,7 @@
cancel_heartbeat_timeout(instance);
}
+ timer_function_pause_timeout (instance);
return (0); /* discard token */
}
@@ -3451,6 +3452,7 @@
cancel_heartbeat_timeout(instance);
}
+ timer_function_pause_timeout (instance);
return (0); /* discard token */
}
@@ -3605,6 +3607,7 @@
cancel_heartbeat_timeout(instance);
}
+ timer_function_pause_timeout (instance);
return (0);
}
@@ -3775,6 +3778,7 @@
memb_set_merge (&mcast_header.system_from, 1,
instance->my_proc_list, &instance->my_proc_list_entries);
memb_state_gather_enter (instance, 8);
+ timer_function_pause_timeout (instance);
return (0);
}
break;
@@ -3789,6 +3793,7 @@
instance->stats.rx_msg_dropped++;
break;
}
+ timer_function_pause_timeout (instance);
return (0);
}
@@ -3831,6 +3836,7 @@
}
/* TODO remove from retrans message queue for old ring in recovery state */
+ timer_function_pause_timeout (instance);
return (0);
}
@@ -3856,6 +3862,7 @@
if (memcmp (&instance->my_ring_id, &memb_merge_detect.ring_id,
sizeof (struct memb_ring_id)) == 0) {
+ timer_function_pause_timeout (instance);
return (0);
}
@@ -3891,6 +3898,8 @@
/* do nothing in recovery */
break;
}
+
+ timer_function_pause_timeout (instance);
return (0);
}
@@ -4161,6 +4170,7 @@
}
break;
}
+ timer_function_pause_timeout (instance);
return (0);
}
@@ -4242,6 +4252,8 @@
}
break;
}
+
+ timer_function_pause_timeout (instance);
return (0);
}
@@ -4261,6 +4273,8 @@
timer_function_token_retransmit_timeout (instance);
}
}
+
+ timer_function_pause_timeout (instance);
return (0);
}