Hi Nhat,

Please see my comments below, tagged with [AVM].

-AVM

On 12/2/2015 12:33 PM, Nhat Pham wrote:
>
> Hi Mahesh,
>
> The ticket #1615 and 1616 report 2 different problems although the 
> steps to reproduce the problem are quite similar.
>
> The problem scenarios are quite tricky, I think. :)
>
> Please see below for my comments.
>
> Best regards,
>
> Nhat Pham
>
> *From:*A V Mahesh [mailto:[email protected]]
> *Sent:* Wednesday, December 2, 2015 12:04 PM
> *To:* Nhat Pham <[email protected]>; [email protected]
> *Cc:* [email protected]
> *Subject:* Re: [PATCH 1 of 1] cpsv: cpd broadcasts 
> CPND_EVT_D2ND_CKPT_RDSET with STOP [#1615]
>
> Hi Nhat,
>
> On 12/2/2015 7:42 AM, Nhat Pham wrote:
>
>     Problem:
>
>     >>>--------
>
>     >>>A non-collocated checkpoint is firstly created on SC-2. Then the
>
>     >>>checkpoint is closed on SC-2.
>
>     >>>The CPD broadcasts CPND_EVT_D2ND_CKPT_RDSET with START to start
>
>     >>>retention duration timer on CPND because there is no user. During
>
>     >>>that time the checkpoint is opened again and using on PL-3.
>
>     >>>After retention duration, the checkpoint is destroyed on both SC-1
>
>     >>>and SC-2.
>
>
> I interpreted `again using on PL-3` to mean that PL-3 had also opened 
> the non-collocated checkpoint the first time, i.e. both SC-2 and PL-3 
> opened and closed it in sequence. So I thought the PL-3 CPND already 
> had the checkpoint in its database, and that we were then trying to 
> re-open the ckpt on PL-3.
>
> The reproduction steps in ticket #1616 mention that PL-3 also opened 
> the ckpt, so I carried out the same test and concluded that the #1615 
> and #1616 test cases were the same except that UNLINK is called. That 
> is why I suggested addressing both together in one go, with and 
> without Unlink called.
>
> Now I understand that they are NOT related test cases. My current 
> understanding of the test cases for Ticket #1616 and Ticket #1615 is 
> as follows, and you addressed #1615 in this patch. Please confirm:
>
> ==========================================
> Ticket #1616  - Non-collocated ckpt
>
> S1. Create a checkpoint on SC-2     ( OpenFlags with 
> SA_CKPT_CHECKPOINT_CREATE ) /success/
> S2. Close the checkpoint on SC-2   ( retention timer starts because 
> of  simple Close ) /success/
> S3. Open  checkpoint on PL-3 /success/
> S4. Unlink the checkpoint on PL-3 /success/
> S5. Close the checkpoint on PL-3 /success/
>
> After this, the replicas will be deleted immediately on SC-1 & SC-2 ( no 
> retention timer starts because Unlink was called ), 
> the subsequent checkpoint Create with same name on any node 
> (PL-3/SC-2/SC-1)
>
> [Nhat Pham] Actually, the replicas on SC-1 and SC-2 are not deleted 
> immediately after this step. This leads to the problem after S6.
>
[AVM] If no process has the checkpoint open (S2 & S5 closed the 
checkpoint) when saCkptCheckpointUnlink() is invoked, the checkpoint is 
immediately deleted.

I hope that in your test case, after S5 and before S6, no process has 
the checkpoint open, so we need to fix this issue first. That fix will 
resolve the issue of the IMM objects not being created.
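
The unlink rule I am referring to can be sketched roughly as below. This 
is only a toy model in C, not cpsv code: toy_ckpt_t and the toy_ckpt_*() 
functions are hypothetical names made up for illustration. It also 
assumes that an unlinked checkpoint which is still open gets deleted at 
the last close, and that it can no longer be opened by name once unlinked.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: these names are hypothetical, not cpsv structures. */
typedef struct {
    unsigned num_users; /* processes that currently have the checkpoint open */
    bool unlinked;      /* saCkptCheckpointUnlink() has been invoked */
    bool exists;        /* false once the checkpoint has been deleted */
} toy_ckpt_t;

/* Open by name: fails once the checkpoint is unlinked or deleted. */
static bool toy_ckpt_open(toy_ckpt_t *c)
{
    if (!c->exists || c->unlinked)
        return false;
    c->num_users++;
    return true;
}

/* Unlink: delete at once if nobody has the checkpoint open. */
static void toy_ckpt_unlink(toy_ckpt_t *c)
{
    c->unlinked = true;
    if (c->num_users == 0)
        c->exists = false;
}

/* Close: the last close of an unlinked checkpoint deletes it. */
static void toy_ckpt_close(toy_ckpt_t *c)
{
    assert(c->num_users > 0);
    c->num_users--;
    if (c->num_users == 0 && c->unlinked)
        c->exists = false;
    /* Not unlinked: the retention timer would start here (not modelled). */
}
```

Walking S1..S5 of #1616 through this model (open, close, open, unlink, 
close) leaves no checkpoint behind, so a Create in S6 would start from 
scratch.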

>
> S6. Create the checkpoint on PL-3  ( OpenFlags with  
> SA_CKPT_CHECKPOINT_CREATE )
>
> /checkpoint is created successfully with replicas BUT the IMM objects 
> are not created in the IMM database/
>
> [Nhat Pham] It should be “Create the checkpoint on SC-2” (not PL-3).
>
> Actually, the checkpoint is not created in this case because the CPND 
> on SC-2 finds that checkpoint exists.
>
> It just informs the CPD with CPD_EVT_ND2D_CKPT_USR_INFO. Thus the IMM 
> objects are not created.
>
>
[AVM] This shouldn't happen if no process has the checkpoint open when 
saCkptCheckpointUnlink() is invoked.
>
> ==========================================
>
>
> ==========================================
> Ticket #1615  - Non-collocated ckpt
>
> S1. Create a checkpoint on SC-1     ( OpenFlags with 
> SA_CKPT_CHECKPOINT_CREATE ) /success/
> S2. Open a checkpoint on SC-2 /success/
> S3. Close the checkpoint on SC-1 /success/
> S4. Close the checkpoint on SC-2 /success/ ( after this the replicas 
> still exist on SC-1 & SC-2 and the retention timer is started )
>
> [Nhat Pham] S2 and S4 might not be necessary to trigger the start of 
> the retention timer.
>
> S5. Open checkpoint on PL-3 /success/ ( the retention timer is still 
> running on SC-1 & SC-2; it was not stopped )
> S6. Create Section failed with SA_AIS_ERR_NOT_EXIST
>
> The reason for the Section Create SA_AIS_ERR_NOT_EXIST is that the 
> retention timer on SC-1 & SC-2 was not stopped even after step 5 (S5), 
> the subsequent checkpoint Create with same name on any node 
> (PL-3/SC-2/SC-1)
>
> [Nhat Pham] regarding “the subsequent  checkpoint Create with same 
> name on  any node (PL-3/SC-2/SC-1) ”, do you mean that:
>
> The checkpoint with same name can be re-created on any node after this 
> step.
>
[AVM] I mean opening the same checkpoint ( S5. Open checkpoint on PL-3 ). 
While the retention timer is running we can reopen the checkpoint, and 
that will stop the retention timer.
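
As a rough illustration of that expectation, here is a minimal sketch in 
C. It is a toy model under my own assumptions, not the CPD/CPND 
implementation: toy_rd_ckpt_t, toy_rd_open() and toy_rd_close() are 
hypothetical names, and the single flag stands in for the retention 
timers on all replicas.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical names, not the CPD/CPND implementation. */
typedef struct {
    unsigned num_users;    /* users of the checkpoint across all nodes */
    bool rd_timer_running; /* retention duration timer state */
} toy_rd_ckpt_t;

/* Open: the first user after an idle period stops the retention timer
 * (what an RDSET with STOP is meant to achieve in this thread). */
static void toy_rd_open(toy_rd_ckpt_t *c)
{
    c->num_users++;
    if (c->num_users == 1)
        c->rd_timer_running = false;
}

/* Close: when the last user leaves, the retention timer starts
 * (what an RDSET with START is meant to achieve in this thread). */
static void toy_rd_close(toy_rd_ckpt_t *c)
{
    assert(c->num_users > 0);
    c->num_users--;
    if (c->num_users == 0)
        c->rd_timer_running = true;
}
```

With the #1615 steps (open on SC-1, open on SC-2, close, close, then open 
on PL-3) the timer runs after S4 and is stopped again by S5, which is the 
behaviour the failing Section Create in S6 depends on.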
>
>
>
> ==========================================
>
> -AVM
>
> On 12/2/2015 7:42 AM, Nhat Pham wrote:
>
>     Hi Mahesh,
>
>       
>
>     I'm not clear about your proposal below. Could you please help to make
>
>     it clearer? Thanks.
>
>       
>
>     My understanding about the existing implementation:
>
>     In case the non-collocated checkpoint exists on the controllers, when the
>
>     checkpoint is opened on a PL for the first time
>
>     (i.e. the PL doesn't know whether the checkpoint exists and the cp_node
>
>     doesn't exist in CPND database)
>
>     the cpnd on PL sends CPD_EVT_ND2D_CKPT_CREATE to CPD to create the
>
>     checkpoint.
>
>     The CPD finds the checkpoint existing so it returns the message
>
>     CPND_EVT_D2ND_CKPT_INFO with create_replica == false.
>
>     The CPND updates its database with new checkpoint node without creating a
>
>     replica on PL.
>
>       
>
>     Dec  1  9:03:35.780616 osafckptnd [468:cpsv_evt.c:2199] TR cpnd <<== CPND_EVT_A2ND_CKPT_OPEN(hdl=1, safCkpt=test3) from node 0x2030F
>
>     Dec  1  9:03:35.780874 osafckptnd [468:cpsv_evt.c:2195] TR cpnd ==>> CPD_EVT_ND2D_CKPT_CREATE(safCkpt=test3, creationFlags=0x2) to CPD
>
>     Dec  1  9:03:35.782806 osafckptnd [468:cpsv_evt.c:2201] TR cpnd <<== [3] CPND_EVT_D2ND_CKPT_INFO(err=1, active=0x2020F, create_rep=false) from CPD
>
>       
>
>     So, the flow in this case is:
>
>     cpnd_evt_proc_ckpt_open() --> CPD_EVT_ND2D_CKPT_CREATE --> cpd_evt_proc_ckpt_create()
>
>       
>
>     Best regards,
>
>     Nhat Pham
>
>       
>
>     -----Original Message-----
>
>     From: A V Mahesh [mailto:[email protected]]
>
>     Sent: Tuesday, December 1, 2015 5:51 PM
>
>     To: Nhat Pham <[email protected]>; [email protected]
>
>     Cc: [email protected]
>
>     Subject: Re: [PATCH 1 of 1] cpsv: cpd broadcasts CPND_EVT_D2ND_CKPT_RDSET
>
>     with STOP [#1615]
>
>       
>
>     Hi,
>
>     We need to check what cpnd_ckpt_node_find_by_name() returns on PL-3 if
>
>     non-collocated ckpt replicas exist on the controllers and have been unlinked.
>
>       
>
>     If it returns NULL, we also need to check whether any non-collocated replica
>
>     exists on the controller nodes while opening a checkpoint from PL-3. We are
>
>     not supposed to create a new replica on PL-3 if a replica exists on the
>
>     controllers ( SC-1 & SC-2 ).
>
>       
>
>     -AVM
>
>       
>
>     On 12/1/2015 3:47 PM, A V Mahesh wrote:
>
>         Hi ,
>
>           
>
>         We may need to handle the else branch of the condition below, i.e. the
>
>         `cp_node->is_unlink == true` case, in function
>
>         cpnd_evt_proc_ckpt_open():
>
>           
>
>         `if(((cp_node = cpnd_ckpt_node_find_by_name(cb, ckpt_name)) != NULL)
>
>         && cp_node->is_unlink == false) {`
>
>           
>
>         -AVM
>
>           
>
>         On 12/1/2015 3:25 PM, A V Mahesh wrote:
>
>             Hi ,
>
>               
>
>             The approach of stopping existing ckpt is different; it should be
>
>             through
>
>               
>
>             cpnd_evt_proc_ckpt_open() --> cpnd_send_ckpt_usr_info_to_cpd -->
>
>             CPD_EVT_ND2D_CKPT_USR_INFO --> cpd_evt_proc_ckpt_usr_info().
>
>             So please make the change based on this flow in
>
>             cpd_evt_proc_ckpt_usr_info() and republish the patch.
>
>               
>
>               
>
>             -AVM
>
>               
>
>               
>
>             On 12/1/2015 12:25 PM, Nhat Pham wrote:
>
>                 osaf/libs/common/cpsv/include/cpd_proc.h |   2 ++
>                 osaf/services/saf/cpsv/cpd/cpd_evt.c     |   8 +++++++-
>                 osaf/services/saf/cpsv/cpd/cpd_proc.c    |  22 ++++++++++++++++++++++
>                 3 files changed, 31 insertions(+), 1 deletions(-)
>
>                   
>
>                   
>
>                 Problem:
>                 --------
>                 A non-collocated checkpoint is first created on SC-2. Then the
>                 checkpoint is closed on SC-2. The CPD broadcasts
>                 CPND_EVT_D2ND_CKPT_RDSET with START to start the retention
>                 duration timer on CPND because there is no user. During that
>                 time the checkpoint is opened again and using on PL-3. After
>                 the retention duration, the checkpoint is destroyed on both
>                 SC-1 and SC-2.
>
>                 Solution:
>                 ---------
>                 The problem happens because the CPD doesn't broadcast
>                 CPND_EVT_D2ND_CKPT_RDSET with STOP when the checkpoint is
>                 opened again on PL-3. The CPD is updated to broadcast
>                 CPND_EVT_D2ND_CKPT_RDSET with STOP when the checkpoint is
>                 opened again.
>
>                   
>
>                 diff --git a/osaf/libs/common/cpsv/include/cpd_proc.h b/osaf/libs/common/cpsv/include/cpd_proc.h
>                 --- a/osaf/libs/common/cpsv/include/cpd_proc.h
>                 +++ b/osaf/libs/common/cpsv/include/cpd_proc.h
>                 @@ -71,6 +71,8 @@ uint32_t cpd_proc_retention_set(CPD_CB *
>                  uint32_t cpd_proc_unlink_set(CPD_CB *cb, CPD_CKPT_INFO_NODE **ckpt_node,
>                                      CPD_CKPT_MAP_INFO *map_info, SaNameT *ckpt_name);
>                  
>                 +void cpd_proc_broadcast_RDSET_STOP(SaCkptCheckpointHandleT ckpt_id, CPD_CB *cb);
>                 +
>                  void cpd_cb_dump(void);
>                  
>                  uint32_t cpd_mbcsv_chgrole(CPD_CB *cb);
>                 diff --git a/osaf/services/saf/cpsv/cpd/cpd_evt.c b/osaf/services/saf/cpsv/cpd/cpd_evt.c
>                 --- a/osaf/services/saf/cpsv/cpd/cpd_evt.c
>                 +++ b/osaf/services/saf/cpsv/cpd/cpd_evt.c
>                 @@ -355,8 +355,14 @@ static uint32_t cpd_evt_proc_ckpt_create
>                      }
>                      if (is_first_rep)
>                          TRACE_2("cpd ckpt create success for first replica ckpt_id:%llx,dest :%"PRIu64,map_info->ckpt_id,sinfo->dest);
>                 -    else
>                 +    else
>                          TRACE_2("cpd ckpt create success ckpt_id:%llx,dest :%"PRIu64,map_info->ckpt_id,sinfo->dest);
>                 +
>                 +
>                 +    /* In case the first user re-creates the existing non-collocated checkpoint. All CPND should stop RD timer */
>                 +    if ((is_first_rep == false) && (!(map_info->attributes.creationFlags & SA_CKPT_CHECKPOINT_COLLOCATED)))
>                 +        if (ckpt_node->num_users == 1)
>                 +            cpd_proc_broadcast_RDSET_STOP(ckpt_node->ckpt_id, cb);
>                  
>                      TRACE_LEAVE();
>                      return proc_rc;
>                 diff --git a/osaf/services/saf/cpsv/cpd/cpd_proc.c b/osaf/services/saf/cpsv/cpd/cpd_proc.c
>                 --- a/osaf/services/saf/cpsv/cpd/cpd_proc.c
>                 +++ b/osaf/services/saf/cpsv/cpd/cpd_proc.c
>                 @@ -1251,3 +1251,25 @@ uint32_t cpd_ckpt_reploc_imm_object_dele
>                      }
>                      return NCSCC_RC_SUCCESS;
>                  }
>                 +
>                 +/******************************************************************************************
>                 + * Name          : cpd_proc_broadcast_RDSET_STOP
>                 + *
>                 + * Description   : This routine broadcast message CPND_EVT_D2ND_CKPT_RDSET with STOP
>                 + *
>                 + * Return Values : None
>                 + *
>                 + * Notes         : None
>                 +******************************************************************************************/
>                 +
>                 +void cpd_proc_broadcast_RDSET_STOP(SaCkptCheckpointHandleT ckpt_id, CPD_CB *cb)
>                 +{
>                 +    CPSV_EVT send_evt;
>                 +
>                 +    memset(&send_evt, 0, sizeof(CPSV_EVT));
>                 +    send_evt.type = CPSV_EVT_TYPE_CPND;
>                 +    send_evt.info.cpnd.type = CPND_EVT_D2ND_CKPT_RDSET;
>                 +    send_evt.info.cpnd.info.rdset.ckpt_id = ckpt_id;
>                 +    send_evt.info.cpnd.info.rdset.type = CPSV_CKPT_RDSET_STOP;
>                 +    cpd_mds_bcast_send(cb, &send_evt, NCSMDS_SVC_ID_CPND);
>                 +}
>
>               
>
>           
>
>       
>
>       
>

_______________________________________________
Opensaf-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
