Hi Hoang,

ACK with the following comments (tested basic ND restarts):

- The errors below are not related to this patch; they are test-case
  related.

- There appears to be an existing issue (not related to this patch):
  on CPND down, the STANDBY CPD is also starting
  `cpd_tmr_start(&node_info->cpnd_ret_timer,..);`
  Please check that flow once (after a CPND restart, add some sleep in
  the Active CPD and then do a switchover).

- You introduced `cpd_tmr_stop(&cpnd_info->cpnd_ret_timer);` in
  cpnd_down_process(), but cpnd_up_process() already calls
  `cpd_tmr_stop(&cpnd_info->cpnd_ret_timer);`
  Please check whether the new call is redundant.

-AVM

On 4/12/2017 2:19 PM, A V Mahesh wrote:
> Hi Hoang,
>
> On 2/10/2017 3:09 PM, Vo Minh Hoang wrote:
>> If cpnd is only temporarily down, we don't need to clean up anything.
>> If cpnd is permanently down, the bad effect of this proposal is that
>> the replica is not cleaned up. But if cpnd is permanently down, we have
>> to reboot the node to recover, so I think this cleanup is not really
>> necessary.
>>
>> I also checked this implementation with possible test cases and have not
>> seen any side effects.
>> Please consider it
> We are observing new node_user_info database mismatch errors while
> testing multiple CPND restarts with this patch. I will do more debugging
> and update with the root cause.
>
> ========================================================================
>
> Apr 12 14:06:57 SC-1 osafckptd[27594]: NO cpnd_down_process:: Start CPND_RETENTION timer id = 0x7f86f0500cf0, arg=0x7f86f0501ef0
> *Apr 12 14:06:58 SC-1 osafckptd[27594]: ER cpd_proc_decrease_node_user_info failed - no user on node id 0x2020F*
> Apr 12 14:06:58 SC-1 osafckptd[27594]: NO cpnd_down_process:: Start CPND_RETENTION timer id = 0x7f86f0501750, arg=0x7f86f0501ef0
> *Apr 12 14:06:59 SC-1 osafckptd[27594]: ER cpd_proc_decrease_node_user_info failed - no user on node id 0x2020F*
> Apr 12 14:06:59 SC-1 osafckptd[27594]: NO cpnd_down_process:: Start CPND_RETENTION timer id = 0x7f86f0503ab0, arg=0x7f86f0501ef0
> Apr 12 14:07:00 SC-1 osafckptd[27594]: NO cpnd_down_process:: Start CPND_RETENTION timer id = 0x7f86f0500c70, arg=0x7f86f0501ef0
> Apr 12 14:07:01 SC-1 osafckptd[27594]: NO cpnd_down_process:: Start CPND_RETENTION timer id = 0x7f86f0500930, arg=0x7f86f0501ef0
> *Apr 12 14:07:03 SC-1 osafckptd[27594]: ER cpd_proc_decrease_node_user_info failed - no user on node id 0x2020F*
> Apr 12 14:07:03 SC-1 osafckptd[27594]: NO cpnd_down_process:: Start CPND_RETENTION timer id = 0x7f86f04fe3a0, arg=0x7f86f0501ef0
> Apr 12 14:07:04 SC-1 osafckptd[27594]: NO cpnd_down_process:: Start CPND_RETENTION timer id = 0x7f86f0500cf0, arg=0x7f86f0501ef0
>
> ========================================================================
>
> -AVM
>
> On 4/12/2017 11:08 AM, A V Mahesh wrote:
>> Hi Hoang,
>>
>> On 2/10/2017 3:09 PM, Vo Minh Hoang wrote:
>>> Dear Mahesh,
>>>
>>> Based on what I saw, in this case, retention time cannot detect CPND
>>> temporarily down because its pid changed.
>> I will check that. I have some test cases based on this retention time;
>> not sure how they were working.
>>
>> Can you please provide reproducible steps? I did look at the ticket, but
>> it looks complex. If you have any application that reproduces the case,
>> please share.
>>
>> -AVM
>>>
>>> If cpnd is only temporarily down, we don't need to clean up anything.
>>> If cpnd is permanently down, the bad effect of this proposal is that
>>> the replica is not cleaned up. But if cpnd is permanently down, we have
>>> to reboot the node to recover, so I think this cleanup is not really
>>> necessary.
>>>
>>> I also checked this implementation with possible test cases and have
>>> not seen any side effects.
>>> Please consider it.
>>>
>>> Thank you and best regards,
>>> Hoang
>>>
>>> -----Original Message-----
>>> From: A V Mahesh [mailto:mahesh.va...@oracle.com]
>>> Sent: Friday, February 10, 2017 10:40 AM
>>> To: Hoang Vo <hoang.m...@dektech.com.au>; zoran.milinko...@ericsson.com
>>> Cc: opensaf-devel@lists.sourceforge.net
>>> Subject: Re: [PATCH 1 of 1] cpd: to correct failover behavior of cpsv
>>> [#1765] V5
>>>
>>> Hi Hoang,
>>>
>>> CPD_CPND_DOWN_RETENTION is used to recognize whether a CPND is
>>> temporarily down or permanently down. It is started when a CPND goes
>>> down; based on cpd_evt_proc_timer_expiry(), cpd recognizes that the
>>> CPND is completely down and does the cleanup. Otherwise, if the cpnd
>>> rejoins within CPD_CPND_DOWN_RETENTION_TIME, CPD_CPND_DOWN_RETENTION
>>> is stopped.
>>>
>>> If we stop the CPD_CPND_DOWN_RETENTION timer in
>>> cpd_process_cpnd_down(), does cpd still recognize the CPND as
>>> permanently down? cpd_process_cpnd_down() is called in multiple flows;
>>> can you please check all the flows to see whether stopping the
>>> CPD_CPND_DOWN_RETENTION timer has any impact?
>>>
>>> -AVM
>>>
>>> On 2/9/2017 1:35 PM, Hoang Vo wrote:
>>>>  src/ckpt/ckptd/cpd_proc.c |  11 ++++++++++-
>>>>  1 files changed, 10 insertions(+), 1 deletions(-)
>>>>
>>>>
>>>> problem:
>>>> In case of multiple failovers, the cpnd is down for a moment, so
>>>> no cpnd has the specific checkpoint open.
>>>> This leads to the retention timer being triggered.
>>>> When cpnd is up again it has a different pid, so the retention timer
>>>> is not stopped.
>>>> The replica is deleted when the retention timer expires while its
>>>> information is still in the ckpt database.
>>>> That causes problems.
>>>>
>>>> Fix:
>>>> - Stop timer of removed node.
>>>> - Update data in patricia trees (for retention value consistency).
>>>>
>>>> diff --git a/src/ckpt/ckptd/cpd_proc.c b/src/ckpt/ckptd/cpd_proc.c
>>>> --- a/src/ckpt/ckptd/cpd_proc.c
>>>> +++ b/src/ckpt/ckptd/cpd_proc.c
>>>> @@ -679,7 +679,8 @@ uint32_t cpd_process_cpnd_down(CPD_CB *c
>>>>       cpd_cpnd_info_node_find_add(&cb->cpnd_tree, cpnd_dest, &cpnd_info, &add_flag);
>>>>       if (!cpnd_info)
>>>>           return NCSCC_RC_SUCCESS;
>>>> -
>>>> +    /* Stop timer before processing down */
>>>> +    cpd_tmr_stop(&cpnd_info->cpnd_ret_timer);
>>>>       cref_info = cpnd_info->ckpt_ref_list;
>>>>       while (cref_info) {
>>>> @@ -989,6 +990,14 @@ uint32_t cpd_proc_retention_set(CPD_CB *
>>>>       /* Update the retention Time */
>>>>       (*ckpt_node)->ret_time = reten_time;
>>>> +    (*ckpt_node)->attributes.retentionDuration = reten_time;
>>>> +
>>>> +    /* Update the related patricia tree */
>>>> +    CPD_CKPT_MAP_INFO *map_info = NULL;
>>>> +    cpd_ckpt_map_node_get(&cb->ckpt_map_tree, (*ckpt_node)->ckpt_name, &map_info);
>>>> +    if (map_info) {
>>>> +        map_info->attributes.retentionDuration = reten_time;
>>>> +    }
>>>>       return rc;
>>>>   }
>>>
>>
>
_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel