Hi Hoang,
ACK, with the following comments (tested basic ND restarts):
- The errors below are not related to this patch; they are test-case related.
- It looks like there is an existing issue (not related to this patch): on
CPND down, the STANDBY CPD is
also starting `cpd_tmr_start(&node_info->cpnd_ret_timer,..);`. Please
check that flow once
(after a CPND restart, wait for some time, then do a switchover of the Active CPD).
- You introduced `cpd_tmr_stop(&cpnd_info->cpnd_ret_timer);` in
cpnd_down_process(),
but cpnd_up_process() also calls
`cpd_tmr_stop(&cpnd_info->cpnd_ret_timer);`,
so please check whether this makes it a redundant call.
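A minimal sketch of why a second stop is normally only redundant rather than harmful, assuming cpd_tmr_stop() does nothing when the timer is not running. The `demo_tmr` type and `demo_tmr_*` functions are hypothetical stand-ins, not the cpd timer API; whether the real cpd_tmr_stop() guards this way is exactly what should be checked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the CPD retention timer; the real timer
 * structure and cpd_tmr_stop() implementation may differ. */
typedef struct {
    bool is_active;           /* true while the timer is running */
    unsigned effective_stops; /* stops that actually cancelled a running timer */
} demo_tmr;

static void demo_tmr_start(demo_tmr *t) {
    assert(t != NULL);
    t->is_active = true;
}

/* Stopping an idle timer is a no-op, so calling this from both the
 * down path and the up path costs nothing beyond the extra call. */
static void demo_tmr_stop(demo_tmr *t) {
    assert(t != NULL);
    if (!t->is_active)
        return;
    t->is_active = false;
    t->effective_stops++;
}
```

If the real stop routine lacks such a guard (for example, if it frees or re-arms timer state unconditionally), the duplicate call in cpnd_up_process() would need a closer look.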
-AVM
On 4/12/2017 2:19 PM, A V Mahesh wrote:
> Hi Hoang,
>
> On 2/10/2017 3:09 PM, Vo Minh Hoang wrote:
>> If CPND is only temporarily down, we don't need to clean up anything.
>> If CPND is permanently down, the downside of this proposal is that the
>> replica
>> is not cleaned up. But if CPND is permanently down, we have to reboot the node
>> to
>> recover, so I think this cleanup is not really necessary.
>>
>> I also checked this implementation with the possible test cases and have not
>> seen any side effects.
>> Please consider it.
> We are observing new node_user_info database mismatch errors while
> testing multiple CPND restarts
> with this patch. I will do more debugging and update with the root cause.
>
> ===========================================================================================================================
>
>
>
> Apr 12 14:06:57 SC-1 osafckptd[27594]: NO cpnd_down_process:: Start CPND_RETENTION timer id = 0x7f86f0500cf0, arg=0x7f86f0501ef0
> *Apr 12 14:06:58 SC-1 osafckptd[27594]: ER cpd_proc_decrease_node_user_info failed - no user on node id 0x2020F*
> Apr 12 14:06:58 SC-1 osafckptd[27594]: NO cpnd_down_process:: Start CPND_RETENTION timer id = 0x7f86f0501750, arg=0x7f86f0501ef0
> *Apr 12 14:06:59 SC-1 osafckptd[27594]: ER cpd_proc_decrease_node_user_info failed - no user on node id 0x2020F*
> Apr 12 14:06:59 SC-1 osafckptd[27594]: NO cpnd_down_process:: Start CPND_RETENTION timer id = 0x7f86f0503ab0, arg=0x7f86f0501ef0
> Apr 12 14:07:00 SC-1 osafckptd[27594]: NO cpnd_down_process:: Start CPND_RETENTION timer id = 0x7f86f0500c70, arg=0x7f86f0501ef0
> Apr 12 14:07:01 SC-1 osafckptd[27594]: NO cpnd_down_process:: Start CPND_RETENTION timer id = 0x7f86f0500930, arg=0x7f86f0501ef0
> *Apr 12 14:07:03 SC-1 osafckptd[27594]: ER cpd_proc_decrease_node_user_info failed - no user on node id 0x2020F*
> Apr 12 14:07:03 SC-1 osafckptd[27594]: NO cpnd_down_process:: Start CPND_RETENTION timer id = 0x7f86f04fe3a0, arg=0x7f86f0501ef0
> Apr 12 14:07:04 SC-1 osafckptd[27594]: NO cpnd_down_process:: Start CPND_RETENTION timer id = 0x7f86f0500cf0, arg=0x7f86f0501ef0
>
> ===========================================================================================================================
>
>
>
> -AVM
>
>
> On 4/12/2017 11:08 AM, A V Mahesh wrote:
>> Hi Hoang,
>>
>> On 2/10/2017 3:09 PM, Vo Minh Hoang wrote:
>>> Dear Mahesh,
>>>
>>> Based on what I saw, in this case the retention timer cannot detect that
>>> CPND was only temporarily down, because its pid changed.
>> I will check that. I have some test cases based on this retention time;
>> I am not sure how they were working.
>>
>> Can you please provide steps to reproduce? I did look at the ticket, but
>> it looks complex.
>> If you have any application that reproduces the case, please share it.
>>
>> -AVM
>>>
>>> If CPND is only temporarily down, we don't need to clean up anything.
>>> If CPND is permanently down, the downside of this proposal is that the
>>> replica
>>> is not cleaned up. But if CPND is permanently down, we have to reboot the
>>> node to
>>> recover, so I think this cleanup is not really necessary.
>>>
>>> I also checked this implementation with the possible test cases and have
>>> not
>>> seen any side effects.
>>> Please consider it.
>>>
>>> Thank you and best regards,
>>> Hoang
>>>
>>> -----Original Message-----
>>> From: A V Mahesh [mailto:[email protected]]
>>> Sent: Friday, February 10, 2017 10:40 AM
>>> To: Hoang Vo <[email protected]>; [email protected]
>>> Cc: [email protected]
>>> Subject: Re: [PATCH 1 of 1] cpd: to correct failover behavior of cpsv
>>> [#1765] V5
>>>
>>> Hi Hoang,
>>>
>>> The CPD_CPND_DOWN_RETENTION timer is used to recognize whether a CPND is
>>> temporarily or permanently down. It is started when a CPND goes down. If
>>> it expires, cpd_evt_proc_timer_expiry() lets CPD recognize that the CPND
>>> is completely down and do the cleanup; otherwise, if the CPND rejoins
>>> within CPD_CPND_DOWN_RETENTION_TIME, the CPD_CPND_DOWN_RETENTION timer is
>>> stopped.
>>>
>>> If we stop the CPD_CPND_DOWN_RETENTION timer in cpd_process_cpnd_down(),
>>> will CPD still recognize that the CPND is permanently down?
>>> cpd_process_cpnd_down() is called in multiple flows, so can you please
>>> check all of them to see whether stopping the CPD_CPND_DOWN_RETENTION
>>> timer has any impact?
>>>
>>> -AVM
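The retention behaviour described above can be modelled as a small state machine. This is a minimal sketch under stated assumptions: `cpnd_model`, the `model_*` functions, and the tick-based clock are hypothetical illustrations, not cpd data structures; the expiry branch only stands in for the cleanup done via cpd_evt_proc_timer_expiry().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the CPD_CPND_DOWN_RETENTION behaviour;
 * names and fields are illustrative only. */
typedef enum { CPND_UP, CPND_DOWN_RETAINED, CPND_CLEANED_UP } cpnd_state;

typedef struct {
    cpnd_state state;
    bool ret_timer_active;
    unsigned ret_time_left; /* ticks until the retention timer expires */
} cpnd_model;

/* CPND went down: start the retention timer instead of cleaning up now. */
static void model_cpnd_down(cpnd_model *n, unsigned retention_ticks) {
    assert(n != NULL);
    n->state = CPND_DOWN_RETAINED;
    n->ret_timer_active = true;
    n->ret_time_left = retention_ticks;
}

/* CPND rejoined within the retention time: stop the timer, keep replicas. */
static void model_cpnd_up(cpnd_model *n) {
    assert(n != NULL);
    n->ret_timer_active = false;
    n->state = CPND_UP;
}

/* Advance time by one tick; on expiry, treat CPND as permanently down. */
static void model_tick(cpnd_model *n) {
    assert(n != NULL);
    if (!n->ret_timer_active)
        return;
    if (n->ret_time_left > 0)
        n->ret_time_left--;
    if (n->ret_time_left == 0) {
        n->ret_timer_active = false;
        n->state = CPND_CLEANED_UP; /* stands in for the expiry cleanup */
    }
}
```

The question raised above maps onto this model: if the down path stops the timer unconditionally, the expiry branch can no longer be reached in that flow, so permanent-down cleanup would have to happen some other way.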
>>>
>>> On 2/9/2017 1:35 PM, Hoang Vo wrote:
>>>> src/ckpt/ckptd/cpd_proc.c | 11 ++++++++++-
>>>> 1 files changed, 10 insertions(+), 1 deletions(-)
>>>>
>>>>
>>>> problem:
>>>> When failover happens multiple times, the cpnd is down for a moment, so
>>>> no cpnd has the specific checkpoint open. This triggers the retention
>>>> timer.
>>>> When cpnd comes up again, it has a different pid, so the retention timer
>>>> is not stopped.
>>>> The replica is deleted on retention expiry while its information is still
>>>> in the ckpt database.
>>>> That causes the problem.
>>>>
>>>> Fix:
>>>> - Stop the timer of the removed node.
>>>> - Update data in the patricia trees (for retention value consistency).
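The problem statement says a CPND that comes back with a different pid does not stop the running retention timer. A minimal sketch of that failure mode, assuming the rejoin lookup effectively depends on process identity (the real cpnd_tree is keyed by MDS destination, which is not modelled here): `down_cpnd` and both lookup helpers are hypothetical. The patch itself takes a different route and stops the timer in cpd_process_cpnd_down().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of a CPND that went down. */
typedef struct {
    uint32_t node_id;
    uint32_t pid;
    bool ret_timer_active;
} down_cpnd;

/* Lookup that requires the same process: misses a restarted CPND. */
static down_cpnd *find_by_node_and_pid(down_cpnd *tbl, size_t n,
                                       uint32_t node_id, uint32_t pid) {
    assert(tbl != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (tbl[i].node_id == node_id && tbl[i].pid == pid)
            return &tbl[i];
    return NULL;
}

/* Lookup by node only: still finds the entry after a pid change. */
static down_cpnd *find_by_node(down_cpnd *tbl, size_t n, uint32_t node_id) {
    assert(tbl != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (tbl[i].node_id == node_id)
            return &tbl[i];
    return NULL;
}

/* On rejoin, the timer can only be stopped if the old entry is found. */
static bool stop_retention_on_rejoin(down_cpnd *entry) {
    if (entry == NULL)
        return false; /* timer keeps running and the replica is later deleted */
    entry->ret_timer_active = false;
    return true;
}
```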
>>>>
>>>> diff --git a/src/ckpt/ckptd/cpd_proc.c b/src/ckpt/ckptd/cpd_proc.c
>>>> --- a/src/ckpt/ckptd/cpd_proc.c
>>>> +++ b/src/ckpt/ckptd/cpd_proc.c
>>>> @@ -679,7 +679,8 @@ uint32_t cpd_process_cpnd_down(CPD_CB *c
>>>> cpd_cpnd_info_node_find_add(&cb->cpnd_tree, cpnd_dest, &cpnd_info, &add_flag);
>>>> if (!cpnd_info)
>>>> return NCSCC_RC_SUCCESS;
>>>> -
>>>> + /* Stop timer before processing down */
>>>> + cpd_tmr_stop(&cpnd_info->cpnd_ret_timer);
>>>> cref_info = cpnd_info->ckpt_ref_list;
>>>> while (cref_info) {
>>>> @@ -989,6 +990,14 @@ uint32_t cpd_proc_retention_set(CPD_CB *
>>>> /* Update the retention Time */
>>>> (*ckpt_node)->ret_time = reten_time;
>>>> + (*ckpt_node)->attributes.retentionDuration = reten_time;
>>>> +
>>>> + /* Update the related patricia tree */
>>>> + CPD_CKPT_MAP_INFO *map_info = NULL;
>>>> + cpd_ckpt_map_node_get(&cb->ckpt_map_tree, (*ckpt_node)->ckpt_name, &map_info);
>>>> + if (map_info) {
>>>> + map_info->attributes.retentionDuration = reten_time;
>>>> + }
>>>> return rc;
>>>> }
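The second hunk keeps three copies of the retention time in step: `ret_time`, the checkpoint node's `attributes.retentionDuration`, and the map entry's `attributes.retentionDuration`. A minimal sketch of that pattern, assuming trimmed-down record layouts; the `demo_*` types and `demo_retention_set()` are hypothetical, and `demo_time_t` only stands in for the real time type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t demo_time_t; /* hypothetical stand-in for the real time type */

/* Hypothetical, trimmed-down views of the two records the patch updates. */
typedef struct { demo_time_t retentionDuration; } demo_attrs;
typedef struct { demo_time_t ret_time; demo_attrs attributes; } demo_ckpt_node;
typedef struct { demo_attrs attributes; } demo_map_info;

/* Write the new retention time to every copy so later readers agree.
 * The map entry may be absent, mirroring the NULL check in the patch. */
static void demo_retention_set(demo_ckpt_node *ckpt, demo_map_info *map,
                               demo_time_t reten_time) {
    assert(ckpt != NULL);
    ckpt->ret_time = reten_time;
    ckpt->attributes.retentionDuration = reten_time;
    if (map != NULL)
        map->attributes.retentionDuration = reten_time;
}
```

Funnelling every update through one helper like this makes it harder for a later change to refresh one copy and forget the others.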
>>>
>>
>
_______________________________________________
Opensaf-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-devel