Re: [devel] [PATCH 1 of 1] cpd: to correct failover behavior of cpsv [#1765] V5

A V Mahesh Tue, 11 Apr 2017 22:40:03 -0700

Hi Hoang,

On 2/10/2017 3:09 PM, Vo Minh Hoang wrote:
> Dear Mahesh,
>
> Based on what I saw, in this case the retention timer cannot detect that CPND
> is only temporarily down, because its pid changed.
I will check that. I have some test cases based on this retention time, and I
am not sure how they were working.


Can you please provide reproducible steps? I did look at the ticket, but it
looks complex. If you have any application that reproduces the case, please
share it.

-AVM
>
> If cpnd is only temporarily down, we don't need to clean up anything.
> If cpnd is permanently down, the bad effect of this proposal is that the
> replica is not cleaned up. But if cpnd is permanently down, we have to reboot
> the node to recover, so I think this cleanup is not really necessary.
>
> I also checked this implementation with possible test cases and have not
> seen any side effect.
> Please consider it.
>
> Thank you and best regards,
> Hoang
>
> -----Original Message-----
> From: A V Mahesh [mailto:mahesh.va...@oracle.com]
> Sent: Friday, February 10, 2017 10:40 AM
> To: Hoang Vo <hoang.m...@dektech.com.au>; zoran.milinko...@ericsson.com
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: Re: [PATCH 1 of 1] cpd: to correct failover behavior of cpsv
> [#1765] V5
>
> Hi Hoang,
>
> The CPD_CPND_DOWN_RETENTION timer is there to recognize whether a CPND is
> temporarily or permanently down. It is started when a CPND goes down. If it
> expires, then based on cpd_evt_proc_timer_expiry() cpd recognizes that the
> CPND is completely down and does the cleanup; otherwise, if the cpnd rejoins
> within CPD_CPND_DOWN_RETENTION_TIME, the CPD_CPND_DOWN_RETENTION timer is
> stopped.
>
> If we stop the CPD_CPND_DOWN_RETENTION timer in cpd_process_cpnd_down(), how
> does cpd recognize that the CPND is permanently down? cpd_process_cpnd_down()
> is called in multiple flows. Can you please check all the flows to see
> whether stopping the CPD_CPND_DOWN_RETENTION timer has any impact?
>
> -AVM
>
> On 2/9/2017 1:35 PM, Hoang Vo wrote:
>>    src/ckpt/ckptd/cpd_proc.c |  11 ++++++++++-
>>    1 files changed, 10 insertions(+), 1 deletions(-)
>>
>>
>> problem:
>> When failover happens multiple times, the cpnd is down for a moment, so no
>> cpnd has the specific checkpoint open. This causes the retention timer to
>> be triggered. When the cpnd comes up again it has a different pid, so the
>> retention timer is not stopped. The replica is deleted at retention expiry
>> while its information is still in the ckpt database. That causes the
>> problem.
>>
>> Fix:
>> - Stop the timer of the removed node.
>> - Update the data in the patricia trees (for retention value consistency).
>>
>> diff --git a/src/ckpt/ckptd/cpd_proc.c b/src/ckpt/ckptd/cpd_proc.c
>> --- a/src/ckpt/ckptd/cpd_proc.c
>> +++ b/src/ckpt/ckptd/cpd_proc.c
>> @@ -679,7 +679,8 @@ uint32_t cpd_process_cpnd_down(CPD_CB *c
>>      cpd_cpnd_info_node_find_add(&cb->cpnd_tree, cpnd_dest, &cpnd_info, &add_flag);
>>      if (!cpnd_info)
>>              return NCSCC_RC_SUCCESS;
>> -
>> +    /* Stop timer before processing down */
>> +    cpd_tmr_stop(&cpnd_info->cpnd_ret_timer);
>>      cref_info = cpnd_info->ckpt_ref_list;
>>    
>>      while (cref_info) {
>> @@ -989,6 +990,14 @@ uint32_t cpd_proc_retention_set(CPD_CB *
>>    
>>      /* Update the retention Time */
>>      (*ckpt_node)->ret_time = reten_time;
>> +    (*ckpt_node)->attributes.retentionDuration = reten_time;
>> +
>> +    /* Update the related patricia tree */
>> +    CPD_CKPT_MAP_INFO *map_info = NULL;
>> +    cpd_ckpt_map_node_get(&cb->ckpt_map_tree, (*ckpt_node)->ckpt_name, &map_info);
>> +    if (map_info) {
>> +            map_info->attributes.retentionDuration = reten_time;
>> +    }
>>      return rc;
>>    }
>>    
>
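
The failure sequence described in the problem statement above can be sketched
as a toy model. This is only an illustration under stated assumptions: every
name here (toy_cpnd, toy_timer_start, toy_rejoin, and so on) is hypothetical
and is not an OpenSAF API, and the real flows in cpd_proc.c are more involved.
It assumes the rejoin path stops the timer only when the returning cpnd has
the same address (pid) as the one that went down.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model; none of these names exist in OpenSAF. */
typedef struct {
    uint64_t dest;        /* stand-in for the cpnd address, which embeds the pid */
    bool timer_running;   /* stand-in for the per-cpnd retention timer */
    bool replica_present; /* replica data still exists */
} toy_cpnd;

static void toy_timer_start(toy_cpnd *n) { n->timer_running = true; }
static void toy_timer_stop(toy_cpnd *n) { n->timer_running = false; }

/* Down processing: with the fix, the pending timer is stopped first. */
static void toy_process_down(toy_cpnd *n, bool apply_fix)
{
    assert(n != NULL);
    if (apply_fix)
        toy_timer_stop(n);
    /* reference cleanup for the node would follow here */
}

/* Rejoin: the timer is only stopped when the address matches. */
static void toy_rejoin(toy_cpnd *n, uint64_t new_dest)
{
    if (n->dest == new_dest)
        toy_timer_stop(n);
}

/* Expiry: a timer that is still running deletes the replica. */
static void toy_timer_expire(toy_cpnd *n)
{
    if (!n->timer_running)
        return;
    n->timer_running = false;
    n->replica_present = false;
}

/* Runs the whole sequence and reports whether the replica survived. */
static bool toy_replica_survives(bool apply_fix)
{
    toy_cpnd n = { .dest = 100, .timer_running = false, .replica_present = true };

    toy_timer_start(&n);             /* cpnd goes down briefly */
    toy_process_down(&n, apply_fix); /* down processing during failover */
    toy_rejoin(&n, 200);             /* cpnd returns with a different pid */
    toy_timer_expire(&n);            /* retention time elapses */
    return n.replica_present;
}
```

In this model, toy_replica_survives(false) loses the replica because nothing
stops the timer once the address has changed, while toy_replica_survives(true)
keeps it. The model does not answer the open question in this thread, namely
how a permanently down cpnd is still detected once the timer is stopped.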

