Hi Hoang,

On 2/10/2017 3:09 PM, Vo Minh Hoang wrote:
> Dear Mahesh,
>
> Based on what I saw, in this case, retention time cannot detect CPND
> temporarily down because its pid changed.

I will check that. I have some test cases based on this retention time,
and I am not sure how they were working.
Can you please provide reproducible steps? I did look at the ticket, but it
looks complex. If you have any application that reproduces the case, please
share it.

-AVM

> If cpnd is only temporarily down, we don't need to clean up anything.
> If cpnd is permanently down, the bad effect of this proposal is that the
> replica is not cleaned up. But if cpnd is permanently down, we have to
> reboot the node to recover, so I think this cleanup is not really necessary.
>
> I also checked this implementation with possible test cases and have not
> seen any side effects.
> Please consider it.
>
> Thank you and best regards,
> Hoang
>
> -----Original Message-----
> From: A V Mahesh [mailto:mahesh.va...@oracle.com]
> Sent: Friday, February 10, 2017 10:40 AM
> To: Hoang Vo <hoang.m...@dektech.com.au>; zoran.milinko...@ericsson.com
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: Re: [PATCH 1 of 1] cpd: to correct failover behavior of cpsv
> [#1765] V5
>
> Hi Hoang,
>
> The CPD_CPND_DOWN_RETENTION timer is there to recognize whether a CPND is
> temporarily down or permanently down. It is started when a CPND goes down.
> Based on cpd_evt_proc_timer_expiry(), cpd recognizes that the CPND is
> completely down and does the cleanup. Otherwise, if the cpnd rejoins within
> CPD_CPND_DOWN_RETENTION_TIME, the CPD_CPND_DOWN_RETENTION timer is stopped.
>
> If we stop the CPD_CPND_DOWN_RETENTION timer in cpd_process_cpnd_down(), how
> does cpd recognize that the CPND is permanently down? cpd_process_cpnd_down()
> is called in multiple flows. Can you please check all the flows and confirm
> whether stopping the CPD_CPND_DOWN_RETENTION timer has any impact?
>
> -AVM
>
> On 2/9/2017 1:35 PM, Hoang Vo wrote:
>> src/ckpt/ckptd/cpd_proc.c | 11 ++++++++++-
>> 1 files changed, 10 insertions(+), 1 deletions(-)
>>
>>
>> problem:
>> In case of multiple failovers, the cpnd is down for a moment, so
>> there is no cpnd that has the specific checkpoint open. This leads to the
>> retention timer being triggered.
>> When the cpnd is up again, it has a different pid, so the retention timer
>> is not stopped.
>> The replica is deleted at retention expiry while its information is still
>> in the ckpt database.
>> That causes the problem.
>>
>> Fix:
>> - Stop timer of removed node.
>> - Update data in patricia trees (for retention value consistency).
>>
>> diff --git a/src/ckpt/ckptd/cpd_proc.c b/src/ckpt/ckptd/cpd_proc.c
>> --- a/src/ckpt/ckptd/cpd_proc.c
>> +++ b/src/ckpt/ckptd/cpd_proc.c
>> @@ -679,7 +679,8 @@ uint32_t cpd_process_cpnd_down(CPD_CB *c
>>  	cpd_cpnd_info_node_find_add(&cb->cpnd_tree, cpnd_dest, &cpnd_info, &add_flag);
>>  	if (!cpnd_info)
>>  		return NCSCC_RC_SUCCESS;
>> -
>> +	/* Stop timer before processing down */
>> +	cpd_tmr_stop(&cpnd_info->cpnd_ret_timer);
>>  	cref_info = cpnd_info->ckpt_ref_list;
>>
>>  	while (cref_info) {
>> @@ -989,6 +990,14 @@ uint32_t cpd_proc_retention_set(CPD_CB *
>>
>>  	/* Update the retention Time */
>>  	(*ckpt_node)->ret_time = reten_time;
>> +	(*ckpt_node)->attributes.retentionDuration = reten_time;
>> +
>> +	/* Update the related patricia tree */
>> +	CPD_CKPT_MAP_INFO *map_info = NULL;
>> +	cpd_ckpt_map_node_get(&cb->ckpt_map_tree, (*ckpt_node)->ckpt_name, &map_info);
>> +	if (map_info) {
>> +		map_info->attributes.retentionDuration = reten_time;
>> +	}
>>  	return rc;
>>  }
>>
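
For reference while reviewing the flows, here is a rough sketch of the
down-retention behaviour described in this thread: the timer starts when a
node goes down, a rejoin inside the window stops it, and expiry is treated as
permanently down and triggers cleanup. This is only a toy model under my own
assumptions. All names in it (node_info_t, node_down, node_rejoin, node_tick)
are hypothetical and are not OpenSAF types or functions, and time is a
simulated tick counter rather than a real timer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration only; not OpenSAF code. */
typedef enum { NODE_UP, NODE_DOWN_PENDING, NODE_CLEANED_UP } node_state_t;

typedef struct {
    bool     active;    /* is the retention timer running? */
    uint64_t deadline;  /* simulated tick at which the timer expires */
} ret_timer_t;

typedef struct {
    node_state_t state;
    ret_timer_t  timer;
} node_info_t;

static void node_init(node_info_t *n)
{
    n->state = NODE_UP;
    n->timer.active = false;
    n->timer.deadline = 0;
}

/* Node went down: open the retention window instead of cleaning up at once. */
static void node_down(node_info_t *n, uint64_t now, uint64_t retention)
{
    if (n->state == NODE_CLEANED_UP)
        return;
    n->state = NODE_DOWN_PENDING;
    n->timer.active = true;
    n->timer.deadline = now + retention;
}

/* Node rejoined: stop the timer if the window is still open.
 * Returns true when the rejoin happened in time. */
static bool node_rejoin(node_info_t *n, uint64_t now)
{
    if (n->state != NODE_DOWN_PENDING || now >= n->timer.deadline)
        return false;
    n->timer.active = false;
    n->state = NODE_UP;
    return true;
}

/* Timer expiry check: an expired window means "permanently down".
 * Returns true when cleanup was performed. */
static bool node_tick(node_info_t *n, uint64_t now)
{
    if (!n->timer.active || now < n->timer.deadline)
        return false;
    n->timer.active = false;
    n->state = NODE_CLEANED_UP;
    return true;
}
```

In this model, a node that goes down at tick 100 with a retention of 50 and
rejoins at tick 130 is never cleaned up, while one that stays down is cleaned
up at tick 150. The question raised above is what happens when the "down"
path itself stops the timer, because then the expiry branch can no longer run
for that node.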