Re: [devel] [PATCH 1 of 1] cpd: to correct failover behavior of cpsv [#1765] V5

2017-04-18 Thread Zoran Milinkovic
Hi Hoang, Ack from me. Thanks, Zoran -----Original Message----- From: Vo Minh Hoang [mailto:hoang.m...@dektech.com.au] Sent: 14 April 2017 10:44 To: 'A V Mahesh'; Zoran Milinkovic Cc: opensaf-devel@lists.sourceforge.net; 'Ramesh

Re: [devel] [PATCH 1 of 1] cpd: to correct failover behavior of cpsv [#1765] V5

2017-04-14 Thread A V Mahesh
Hi Hoang, ACK, you can push. >>So I will continue checking it in a separate ticket. Please create a ticket for tracking. -AVM On 4/14/2017 2:14 PM, Vo Minh Hoang wrote: > Dear Mahesh, > > Thank you for your comments. > I have added two of my ideas inline; please look for the [Hoang] tags. > > Dear Zoran, >

Re: [devel] [PATCH 1 of 1] cpd: to correct failover behavior of cpsv [#1765] V5

2017-04-14 Thread Vo Minh Hoang
Dear Mahesh, Thank you for your comments. I have added two of my ideas inline; please look for the [Hoang] tags. Dear Zoran, Do you have any further comments on this patch? If not, I will request pushing it at the start of next week. Sincerely, Hoang -----Original Message----- From: A V Mahesh

Re: [devel] [PATCH 1 of 1] cpd: to correct failover behavior of cpsv [#1765] V5

2017-04-13 Thread A V Mahesh
Hi Hoang, ACK with the following (tested basic ND restarts): - The errors below are not related to this patch; they are test-case related. - There appears to be an existing issue (not related to this patch): on Cpnd down, the STANDBY Cpd is also starting `cpd_tmr_start(_info->cpnd_ret_timer,..);`

Re: [devel] [PATCH 1 of 1] cpd: to correct failover behavior of cpsv [#1765] V5

2017-04-12 Thread Vo Minh Hoang
Dear Mahesh, Sorry that it took time to recall some long-lost information. Below are the steps to reproduce with the newest source code: - create some non-collocated checkpoints on SC-1 - trigger a failover with pkill -9 amfd - repeat that 4 times on the active SC - check /run/shm found that all

Re: [devel] [PATCH 1 of 1] cpd: to correct failover behavior of cpsv [#1765] V5

2017-04-12 Thread A V Mahesh
Hi Hoang, On 2/10/2017 3:09 PM, Vo Minh Hoang wrote: > If cpnd is only temporarily down, we don't need to clean up anything. > If cpnd is permanently down, the bad effect of this proposal is that the replica is not cleaned up. But if cpnd is permanently down, we have to reboot the node to recover, so I

Re: [devel] [PATCH 1 of 1] cpd: to correct failover behavior of cpsv [#1765] V5

2017-04-11 Thread A V Mahesh
Hi Hoang, On 2/10/2017 3:09 PM, Vo Minh Hoang wrote: > Dear Mahesh, > > Based on what I saw, in this case, the retention time cannot detect that CPND is temporarily down because its pid changed. I will check that. I have some test cases based on this retention time; not sure how they were working. Can you

Re: [devel] [PATCH 1 of 1] cpd: to correct failover behavior of cpsv [#1765] V5

2017-02-10 Thread Vo Minh Hoang
Dear Mahesh, Based on what I saw, in this case, the retention time cannot detect that CPND is temporarily down because its pid changed. If cpnd is only temporarily down, we don't need to clean up anything. If cpnd is permanently down, the bad effect of this proposal is that the replica is not cleaned up. But if cpnd

Re: [devel] [PATCH 1 of 1] cpd: to correct failover behavior of cpsv [#1765] V5

2017-02-09 Thread A V Mahesh
Hi Hoang, The CPD_CPND_DOWN_RETENTION timer is used to recognize whether CPND is temporarily or permanently down. It is started when a CPND goes down; based on cpd_evt_proc_timer_expiry(), cpd recognizes that the CPND is completely down and does cleanup, else cpnd rejoined within

[devel] [PATCH 1 of 1] cpd: to correct failover behavior of cpsv [#1765] V5

2017-02-09 Thread Hoang Vo
src/ckpt/ckptd/cpd_proc.c | 11 ++- 1 files changed, 10 insertions(+), 1 deletions(-) problem: When failover occurs multiple times, cpnd is down for a moment, so no cpnd has the specific checkpoint open. This leads to the retention timer being triggered. When cpnd is up again but has