Angus Salkeld wrote:
> On Wed, Aug 17, 2011 at 01:19:53PM +1200, Tim Beale wrote:
>> Hi,
>>
>> I'm resending this patch in a separate thread because I think this part of the
>> cluster formation problems I'm seeing has been overlooked. The patch attached
>> is one way of addressing the problem, but I'm open to alternatives.
>>
>> Basically the problem is that if the cluster experiences formation problems,
>> then CPG can sometimes choose a downlist that includes the local node. When
>> it processes the node leave event for itself it sets its cpd state to
>> CPD_STATE_UNJOINED and clears the cpd->group_name. This means CPG events are no
>> longer sent to the CPG client, because the cpd->group_name no longer matches.
>>
>> This patch avoids the problem by only clearing the group_name if cpg_leave() is
>> called and not when processing a downlist leave event. I'm not 100% sure about
>> the case where the CPG client exits unexpectedly (in which case the reason is
>> also CONFCHG_CPG_REASON_PROCDOWN), but I figure the cpd info gets cleaned up
>> immediately on the local node if this happens.
>>
> 
> Tim, this seems reasonable to me. But it would be good to get Honza to
> review this as he wrote it.

Actually, git blame says that Steve wrote it, but the change seems
reasonable to me too.
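
For anyone reading this in the archive, here is a minimal standalone sketch of
the condition the patch tightens. It is not the corosync code: the struct
names, the enum and handle_left_entry() are hypothetical stand-ins for the cpd
handling in notify_lib_joinlist(), and only the three-part test mirrors the
diff below.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the CONFCHG_CPG_REASON_* values. */
enum leave_reason { REASON_LEAVE, REASON_NODEDOWN, REASON_PROCDOWN };

/* Hypothetical, simplified view of one entry in the left list. */
struct member {
	uint32_t nodeid;
	uint32_t pid;
	enum leave_reason reason;
};

/* Hypothetical, simplified view of the per-client cpd state. */
struct client {
	uint32_t pid;
	char group_name[32];
	bool joined;
};

/*
 * Reset the client state only when the left entry is this client on the
 * local node AND the reason is an explicit leave. A downlist entry
 * (NODEDOWN or PROCDOWN) for the local node leaves the state alone, so
 * later events still match the client's group name.
 * Returns true if the state was cleared.
 */
bool handle_left_entry(struct client *c, const struct member *left,
		       uint32_t local_nodeid)
{
	if (left->pid == c->pid &&
	    left->nodeid == local_nodeid &&
	    left->reason == REASON_LEAVE) {
		c->pid = 0;
		memset(c->group_name, 0, sizeof(c->group_name));
		c->joined = false;
		return true;
	}
	return false;
}
```

Without the third test, a PROCDOWN or NODEDOWN entry for the local node would
take the same branch as an explicit leave, which is the lockup Tim describes.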

> 
> -Angus
> 
>> Regards,
>> Tim
>>
>> ---
>>
>>  services/cpg.c |    3 ++-
>>  1 files changed, 2 insertions(+), 1 deletions(-)
>>
>> diff --git a/services/cpg.c b/services/cpg.c
>> index 8e71dcf..c66037b 100644
>> --- a/services/cpg.c
>> +++ b/services/cpg.c
>> @@ -683,7 +683,8 @@ static int notify_lib_joinlist(
>>                              }
>>                              if (left_list_entries) {
>>                                      if (left_list[0].pid == cpd->pid &&
>> -                                            left_list[0].nodeid == api->totem_nodeid_get()) {
>> +                                            left_list[0].nodeid == api->totem_nodeid_get() &&
>> +                                            left_list[0].reason == CONFCHG_CPG_REASON_LEAVE) {
>>
>>                                              cpd->pid = 0;
>>                                              memset (&cpd->group_name, 0, sizeof(cpd->group_name));
> 
>> From: Tim Beale <[email protected]>
>>
>> A CPG client can sometimes lock up if the local node is in the downlist
>>
>> In a 10-node cluster where all nodes boot and start corosync at the same
>> time, corosync sometimes detects a node as leaving and rejoining the
>> cluster during this process.
>>
>> Occasionally the downlist that gets picked contains the local node. When the
>> local node sends leave events for the downlist (including itself), it sets
>> its cpd state to CPD_STATE_UNJOINED and clears the cpd->group_name. This
>> means it no longer sends CPG events to the CPG client.
>>
> 
>> _______________________________________________
>> Openais mailing list
>> [email protected]
>> https://lists.linux-foundation.org/mailman/listinfo/openais