Hi Mahesh,

I'll include the fix in the new patch for this ticket and send it to you for review.

Best regards,

Nhat Pham

From: A V Mahesh [mailto:[email protected]]
Sent: Tuesday, January 5, 2016 10:46 AM
To: Nhat Pham <[email protected]>; [email protected]
Subject: Re: [devel] [PATCH 1 of 1] cpsv: improve handling unlink and close non-collocated checkpoint [#1616]

Hi Nhat Pham,

On 1/5/2016 8:19 AM, Nhat Pham wrote:

> However, the replica IMM objects are not created. This is because, when the checkpoint is unlinked, the CPD does not delete the related nodes from ckpt_reploc_tree.
>
> This should be fixed too.

Should we handle this in a new ticket, or would you like to address it in this ticket itself?

-AVM

On 1/5/2016 8:19 AM, Nhat Pham wrote:

Hi Mahesh,

For the case you described below, a new replica is created with the same name.

root@PL-3:~# ll /run/shm
total 696
drwxrwxrwt  2 root root    100 Jan  5 09:02 ./
drwxr-xr-x 11 root root    440 Jan  5 09:01 ../
-rw-r--r--  1 root root 704008 Jan  5 09:00 opensaf_CPND_CHECKPOINT_INFO_131855
-rw-r--r--  1 root root   8872 Jan  5 09:00 opensaf_safCkpt=tes_131855_1
-rw-r--r--  1 root root   8872 Jan  5 09:02 opensaf_safCkpt=tes_131855_2

However, the replica IMM objects are not created. This is because, when the checkpoint is unlinked, the CPD does not delete the related nodes from ckpt_reploc_tree.

This should be fixed too.

root@PL-3:~# immfind | grep safCkpt
safApp=safCkptService
safCkpt=test   <=============== No Replica IMM object is created

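A minimal C sketch of how a stale location entry could suppress the replica IMM objects, under loudly stated assumptions: it assumes, hypothetically, that CPD creates a safReplica object only when it adds a new (checkpoint name, node) entry to ckpt_reploc_tree, and that the fix is to purge every entry for the checkpoint name when the checkpoint is unlinked. The flat array, reploc_add() and reploc_remove_ckpt() are illustrative stand-ins, not the real CPD tree or functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_REPLOC 16

/* Hypothetical stand-in for CPD's ckpt_reploc_tree: one entry per
 * (checkpoint name, node) pair that has a replica location. */
typedef struct {
	char ckpt[64];
	char node[32];
	bool used;
} reploc_entry;

static reploc_entry reploc[MAX_REPLOC];

/* Assumption: a safReplica IMM object is created only when this returns
 * true, i.e. when no entry for (ckpt, node) exists yet. */
static bool reploc_add(const char *ckpt, const char *node)
{
	int free_slot = -1;

	for (int i = 0; i < MAX_REPLOC; i++) {
		if (!reploc[i].used) {
			if (free_slot < 0)
				free_slot = i;
			continue;
		}
		if (strcmp(reploc[i].ckpt, ckpt) == 0 &&
		    strcmp(reploc[i].node, node) == 0)
			return false; /* stale entry blocks IMM object creation */
	}
	assert(free_slot >= 0);
	strncpy(reploc[free_slot].ckpt, ckpt, sizeof reploc[free_slot].ckpt - 1);
	strncpy(reploc[free_slot].node, node, sizeof reploc[free_slot].node - 1);
	reploc[free_slot].used = true;
	return true;
}

/* Hypothetical fix: on unlink, drop every location entry of the checkpoint.
 * Returns the number of entries removed. */
static int reploc_remove_ckpt(const char *ckpt)
{
	int removed = 0;

	for (int i = 0; i < MAX_REPLOC; i++) {
		if (reploc[i].used && strcmp(reploc[i].ckpt, ckpt) == 0) {
			memset(&reploc[i], 0, sizeof reploc[i]);
			removed++;
		}
	}
	return removed;
}
```

Without the purge step, re-creating safCkpt=test on PL-3 hits the leftover entry and, under this model, no new replica object is created.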
Best regards,

Nhat Pham

==== Create checkpoint safCkpt=test

root@PL-3:~# immfind | grep safCkpt
safApp=safCkptService
safCkpt=test
safReplica=safNode=PL-3\,safCluster=myClmCluster,safCkpt=test
safReplica=safNode=SC-1\,safCluster=myClmCluster,safCkpt=test
safReplica=safNode=SC-2\,safCluster=myClmCluster,safCkpt=test
root@PL-3:~# ll /run/shm/
total 692
drwxrwxrwt  2 root root     80 Jan  5 09:00 ./
drwxr-xr-x 11 root root    440 Jan  5 09:01 ../
-rw-r--r--  1 root root 704008 Jan  5 09:00 opensaf_CPND_CHECKPOINT_INFO_131855
-rw-r--r--  1 root root   8872 Jan  5 09:00 opensaf_safCkpt=tes_131855_1

==== Unlink checkpoint safCkpt=test – checkpoint is still being used.

root@PL-3:~# ll /run/shm/
total 692
drwxrwxrwt  2 root root     80 Jan  5 09:00 ./
drwxr-xr-x 11 root root    440 Jan  5 09:01 ../
-rw-r--r--  1 root root 704008 Jan  5 09:00 opensaf_CPND_CHECKPOINT_INFO_131855
-rw-r--r--  1 root root   8872 Jan  5 09:00 opensaf_safCkpt=tes_131855_1
root@PL-3:~# immfind | grep safCkpt
safApp=safCkptService

==== Create a new checkpoint with same name safCkpt=test

root@PL-3:~# immfind | grep safCkpt
safApp=safCkptService
safCkpt=test   <=============== No Replica IMM object is created
root@PL-3:~# ll /run/shm
total 696
drwxrwxrwt  2 root root    100 Jan  5 09:02 ./
drwxr-xr-x 11 root root    440 Jan  5 09:01 ../
-rw-r--r--  1 root root 704008 Jan  5 09:00 opensaf_CPND_CHECKPOINT_INFO_131855
-rw-r--r--  1 root root   8872 Jan  5 09:00 opensaf_safCkpt=tes_131855_1
-rw-r--r--  1 root root   8872 Jan  5 09:02 opensaf_safCkpt=tes_131855_2

-----Original Message-----
From: A V Mahesh [mailto:[email protected]]
Sent: Monday, January 4, 2016 11:29 AM
To: [email protected] <mailto:[email protected]>
Subject: Re: [devel] [PATCH 1 of 1] cpsv: improve handling unlink and close non-collocated checkpoint [#1616]

Hi Nhat Pham,

As this patch fixes the issue where an unlinked checkpoint is deleted although a client is still using it, can you please also verify the following subsequent case, which also needs to be addressed:

Say multiple applications have opened a checkpoint and one application closes it with unlink while other applications are still using the checkpoint. At that moment, the checkpoint is re-created by an open call that specifies the SA_CKPT_CHECKPOINT_CREATE flag and the same name as the checkpoint that was unlinked but is possibly not yet finally deleted.

According to the CPSV specification, a new instance of the checkpoint is created while the old instance of the checkpoint is possibly not yet finally deleted.

This means that when open is called at the CPND with the SA_CKPT_CHECKPOINT_CREATE flag and the name of an existing checkpoint whose old instance is already marked as unlinked, the CPND should create a new checkpoint/replica (that is, at a given point in time, two instances of the checkpoint/replicas exist in the cluster).

-AVM
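The CPND behaviour requested above can be sketched as follows, assuming a hypothetical in-memory table of checkpoint instances (ckpt_instance, open_create(), unlink_name() and count_instances() are illustrative names, not CPND code). An open with create reuses a live instance of the same name, but an instance already marked as unlinked is never reused, so a new instance with a fresh identifier is created beside it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_CKPT 8

/* Hypothetical checkpoint instance record. */
typedef struct {
	char name[64];
	uint64_t ckpt_id; /* unique per created instance */
	bool unlinked;
	bool used;
} ckpt_instance;

static ckpt_instance table[MAX_CKPT];
static uint64_t next_id = 1;

/* Open with SA_CKPT_CHECKPOINT_CREATE semantics (sketch): reuse a live
 * instance with this name, otherwise create a new one. Unlinked instances
 * are skipped, so both may coexist until the old one is finally deleted. */
static uint64_t open_create(const char *name)
{
	int free_slot = -1;

	for (int i = 0; i < MAX_CKPT; i++) {
		if (!table[i].used) {
			if (free_slot < 0)
				free_slot = i;
			continue;
		}
		if (!table[i].unlinked && strcmp(table[i].name, name) == 0)
			return table[i].ckpt_id;
	}
	assert(free_slot >= 0);
	ckpt_instance *ck = &table[free_slot];
	memset(ck, 0, sizeof *ck);
	strncpy(ck->name, name, sizeof ck->name - 1);
	ck->ckpt_id = next_id++;
	ck->used = true;
	return ck->ckpt_id;
}

/* Mark the live instance of this name as unlinked (still in use). */
static void unlink_name(const char *name)
{
	for (int i = 0; i < MAX_CKPT; i++)
		if (table[i].used && !table[i].unlinked &&
		    strcmp(table[i].name, name) == 0)
			table[i].unlinked = true;
}

/* Count all instances (live or unlinked) carrying this name. */
static int count_instances(const char *name)
{
	int n = 0;

	for (int i = 0; i < MAX_CKPT; i++)
		if (table[i].used && strcmp(table[i].name, name) == 0)
			n++;
	return n;
}
```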

On 12/9/2015 11:48 AM, A V Mahesh wrote:
> Hi,
>
> OK, got it. I will review both cases.
>
> -AVM
>
> On 12/9/2015 11:34 AM, Nhat Pham wrote:
>> Hi Mahesh,
>>
>> By 'several problems', I mean the 3 use cases where:
>> 1, 2: the checkpoint replicas are not deleted immediately even though no client exists;
>> 3: the checkpoint is deleted although a client is still using the checkpoint.
>>
>> The patch only addresses the problems in these 3 use cases.
>>
>> Best regards,
>> Nhat Pham
>>
>> -----Original Message-----
>> From: A V Mahesh [mailto:[email protected]]
>> Sent: Wednesday, December 9, 2015 12:21 PM
>> To: Nhat Pham <[email protected] <mailto:[email protected]>>; [email protected] <mailto:[email protected]>
>> Cc: [email protected] <mailto:[email protected]>
>> Subject: Re: [PATCH 1 of 1] cpsv: improve handling unlink and close non-collocated checkpoint [#1616]
>>
>> Hi Nhat,
>>
>> > There are several problems relating to closing and unlinking non-collocated checkpoints.
>>
>> I can see only one problem: an unlinked non-collocated checkpoint is not deleted immediately even though no client exists for that non-collocated checkpoint.
>>
>> I see 1, 2 and 3 as use cases of a non-collocated checkpoint; in all of them the non-collocated checkpoint is not deleted immediately. Is that what you mean by `several problems`?
>>
>> Please let me know if any other problem exists and is being addressed in this patch, so that I can review the patch from that point of view as well.
>>
>> -AVM
>>
>>
>> On 12/9/2015 8:06 AM, Nhat Pham wrote:
>>>     osaf/services/saf/cpsv/cpnd/cpnd_evt.c  |  51 +++++++++++++------------
>>>     osaf/services/saf/cpsv/cpnd/cpnd_proc.c |  66 ++++++++++++++++++--------------
>>>     2 files changed, 64 insertions(+), 53 deletions(-)
>>>
>>>
>>> Problem:
>>> --------
>>> There are several problems relating to closing and unlinking non-collocated checkpoints.
>>> 1. A non-collocated checkpoint is first created on SC-2. It is closed on SC-2. It is opened on PL-3.
>>> It is unlinked. It is closed on PL-3. The replicas on the SCs are not destroyed although the checkpoint is unlinked and no client is using it.
>>>
>>> 2. A non-collocated checkpoint is first created on PL-3. It is closed on PL-3. It is opened on SC-2.
>>> It is unlinked. It is closed on SC-2. The replicas on the SCs and PL-3 are not destroyed although the checkpoint is unlinked and no client is using it.
>>>
>>> 3. A non-collocated checkpoint is first created on PL-3. It is closed on PL-3. It is opened on PL-4.
>>> It is unlinked. The replicas on the SCs and PL-3 are destroyed although the checkpoint is still being used on PL-4.
>>>
>>> Solution:
>>> ---------
>>> The main cause of the above problems is that the decision to destroy the replicas is based on checking whether a non-collocated replica is present on a PL. This mechanism is not correct in some cases. The solution is to use another mechanism, which checks whether any client in the cluster is using the checkpoint by verifying whether the retention duration timer is active.

>>> Test:
>>> -----
>>> The following test cases were executed for both non-collocated and collocated checkpoints to verify the solution:
>>> 1. verify_unlink_ckpt_created_on_sc_before_close_it_from_sc
>>> 2. verify_unlink_ckpt_created_on_sc_before_close_it_from_pl
>>> 3. verify_unlink_ckpt_created_on_sc_after_close_it
>>> 4. verify_unlink_ckpt_created_on_pl_before_close_it_from_pl
>>> 5. verify_unlink_ckpt_created_on_pl_before_close_it_from_sc
>>> 6. verify_unlink_ckpt_created_on_pl_before_close_it_from_other_pl
>>> 7. verify_unlink_ckpt_created_on_pl_after_close_it
>>>
>>> diff --git a/osaf/services/saf/cpsv/cpnd/cpnd_evt.c b/osaf/services/saf/cpsv/cpnd/cpnd_evt.c
>>> --- a/osaf/services/saf/cpsv/cpnd/cpnd_evt.c
>>> +++ b/osaf/services/saf/cpsv/cpnd/cpnd_evt.c
>>> @@ -26,7 +26,7 @@
>>>
>>>     #include "cpnd.h"
>>>
>>> -extern uint32_t cpnd_ckpt_non_collocated_rplica_close(CPND_CB *cb, CPND_CKPT_NODE *cp_node, SaAisErrorT *error);
>>> +extern uint32_t cpnd_proc_rdset_start(CPND_CB *cb, CPND_CKPT_NODE *cp_node);
>>>     extern uint32_t cpnd_proc_non_colloc_rt_expiry(CPND_CB *cb, SaCkptCheckpointHandleT ckpt_id);
>>>
>>>     static uint32_t cpnd_evt_proc_cb_dump(CPND_CB *cb);
>>> @@ -1194,8 +1194,7 @@ static uint32_t cpnd_evt_proc_ckpt_unlin
>>>
>>>     /****************************************************************************************
>>>      * Name          : cpnd_evt_proc_ckpt_unlink_info
>>>      *
>>> - * Description   : Function to process check point unlink
>>> - *                 from Applications.
>>> + * Description   : Function to process checkpoint unlink event from CPD
>>>      *
>>>      * Arguments     : CPND_CB *cb - CPND CB pointer
>>>      *                 CPSV_EVT *evt - Received Event structure
>>> @@ -1209,10 +1208,11 @@ static uint32_t cpnd_evt_proc_ckpt_unlin
>>>     {
>>>             uint32_t rc = NCSCC_RC_SUCCESS;
>>>             CPND_CKPT_NODE *cp_node = NULL;
>>> -   SaAisErrorT error;
>>> +   SaAisErrorT error = SA_AIS_OK;
>>>             CPSV_SEND_INFO sinfo_cpa;
>>>             CPSV_EVT send_evt;
>>>             bool sinfo_cpa_flag = false;
>>> +   bool destroy_replica = false;
>>>
>>>             TRACE_ENTER();
>>>             memset(&send_evt, '\0', sizeof(CPSV_EVT));
>>> @@ -1220,25 +1220,35 @@ static uint32_t cpnd_evt_proc_ckpt_unlin
>>>             if (cp_node == NULL) {
>>>                     TRACE_4("cpnd ckpt node get failed for ckpt_id:%llx",evt->info.ckpt_ulink.ckpt_id);
>>>                     rc = NCSCC_RC_FAILURE;
>>> -           send_evt.info.cpa.info.ulinkRsp.error = SA_AIS_ERR_NOT_EXIST;
>>> +           error = SA_AIS_ERR_NOT_EXIST;
>>>                     goto agent_rsp;
>>>             }
>>>
>>>             sinfo_cpa = cp_node->cpa_sinfo;
>>>             sinfo_cpa_flag = cp_node->cpa_sinfo_flag;
>>> +
>>>             if (cp_node->is_close == true) {
>>> -           send_evt.info.cpa.info.ulinkRsp.error = SA_AIS_OK;
>>> +           /* For non-collocated checkpoint if retention duration timer is active
>>> +            * (i.e the checkpoint is not opened by any client in cluster) the replica
>>> +            * should be destroyed in this case */
>>> +           if (!m_CPND_IS_COLLOCATED_ATTR_SET(cp_node->create_attrib.creationFlags)) {
>>> +                   if (cp_node->ret_tmr.is_active) {
>>> +                           TRACE_1("cpnd destroy replica ckpt_id:%llx - No client opens the non-collocated checkpoint ",
>>> +                                   cp_node->ckpt_id);
>>> +                           destroy_replica = true;
>>> +                   }
>>> +           }
>>> +           /* For collocated checkpoint, there is no client opening the checkpoint on this
>>> +            * node. The replica should be destroyed. */
>>> +           else
>>> +                   destroy_replica = true;
>>> +   }
>>> +
>>> +   if (destroy_replica == true) {
>>>                     /* check timer is present,if yes...stop the timer and destroy shm_info and the node */
>>>                     if (cp_node->ret_tmr.is_active)
>>>                             cpnd_tmr_stop(&cp_node->ret_tmr);
>>>
>>> -           if (!m_CPND_IS_COLLOCATED_ATTR_SET(cp_node->create_attrib.creationFlags)) {
>>> -                   if (cpnd_is_noncollocated_replica_present_on_payload(cb, cp_node)) {
>>> -                           rc = NCSCC_RC_SUCCESS;
>>> -                           goto agent_rsp;
>>> -                   }
>>> -           }
>>> -
>>>                     rc = cpnd_ckpt_replica_destroy(cb, cp_node, &error);
>>>                     if (rc == NCSCC_RC_FAILURE) {
>>>                             TRACE_4("cpnd ckpt replica destroy failed for ckpt_id:%llx,error %u",cp_node->ckpt_id, error);
>>> @@ -1260,8 +1270,6 @@ static uint32_t cpnd_evt_proc_ckpt_unlin
>>>
>>>                     }
>>>                     TRACE_4("cpnd proc ckpt unlink set for ckpt_id:%llx",cp_node->ckpt_id);
>>> -
>>> -           send_evt.info.cpa.info.ulinkRsp.error = SA_AIS_OK;
>>>             }
>>>
>>>      agent_rsp:
>>> @@ -1269,6 +1277,7 @@ static uint32_t cpnd_evt_proc_ckpt_unlin
>>>             if (sinfo_cpa_flag == 1) {
>>>                     send_evt.type = CPSV_EVT_TYPE_CPA;
>>>                     send_evt.info.cpa.type = CPA_EVT_ND2A_CKPT_UNLINK_RSP;
>>> +           send_evt.info.cpa.info.ulinkRsp.error = error;
>>>                     rc = cpnd_mds_send_rsp(cb, &sinfo_cpa, &send_evt);
>>>
>>>             }
>>> @@ -1767,7 +1776,6 @@ static uint32_t cpnd_evt_proc_ckpt_activ
>>>     static uint32_t cpnd_evt_proc_ckpt_rdset_info(CPND_CB *cb, CPND_EVT *evt, CPSV_SEND_INFO *sinfo)
>>>     {
>>>             CPND_CKPT_NODE *cp_node = NULL;
>>> -   SaAisErrorT error = SA_AIS_OK;
>>>
>>>             TRACE_ENTER();
>>>             /* get cp_node from ckpt_info_db */
>>> @@ -1791,14 +1799,9 @@ static uint32_t cpnd_evt_proc_ckpt_rdset
>>>             }
>>>
>>>             if (evt->info.rdset.type == CPSV_CKPT_RDSET_START) {
>>> -           if (!m_CPND_IS_COLLOCATED_ATTR_SET(cp_node->create_attrib.creationFlags)) {
>>> -                   if (cpnd_ckpt_non_collocated_rplica_close(cb, cp_node, &error) == NCSCC_RC_FAILURE) {
>>> -                           TRACE_4("cpnd ckpt relica close failed for client_hdl:%llx,ckpt_id:%llx",evt->info.closeReq.client_hdl, cp_node->ckpt_id);
>>> -
>>> -                   }
>>> -                   TRACE_LEAVE();
>>> -                   return NCSCC_RC_SUCCESS;
>>> -           }
>>> +           cpnd_proc_rdset_start(cb, cp_node);
>>> +           TRACE_LEAVE();
>>> +           return NCSCC_RC_SUCCESS;
>>>             }
>>>
>>>             /* if timer already started on one of the node then what to do!!!
>>> diff --git a/osaf/services/saf/cpsv/cpnd/cpnd_proc.c b/osaf/services/saf/cpsv/cpnd/cpnd_proc.c
>>> --- a/osaf/services/saf/cpsv/cpnd/cpnd_proc.c
>>> +++ b/osaf/services/saf/cpsv/cpnd/cpnd_proc.c
>>> @@ -2297,53 +2297,61 @@ uint32_t cpnd_ckpt_replica_close(CPND_CB
>>>     }
>>>
>>>
>>>     /*****************************************************************************************************
>>> - * Name          : cpnd_ckpt_non_collocated_rplica_close
>>> + * Name          : cpnd_proc_rdset_start
>>>      *
>>> - * Description   : This is the function close the non_collocated Ckpt Replica
>>> + * Description   : This is the function process the event CPSV_CKPT_RDSET_START
>>> + *                 This event is only applicable for non-collocated checkpoint
>>>      * Arguments     : cb       - CPND Control Block pointer
>>>      *                 cp_node  - pointer to checkpoint node
>>>      *
>>>      * Return Values : NCSCC_RC_SUCCESS/NCSCC_RC_FAILURE
>>>
>>> *****************************************************************************************/
>>>
>>> -uint32_t cpnd_ckpt_non_collocated_rplica_close(CPND_CB *cb, CPND_CKPT_NODE *cp_node, SaAisErrorT *error)
>>> +uint32_t cpnd_proc_rdset_start(CPND_CB *cb, CPND_CKPT_NODE *cp_node)
>>>     {
>>>             SaTimeT presentTime;
>>> +   SaAisErrorT error = SA_AIS_OK;
>>>             uint32_t rc = NCSCC_RC_SUCCESS;
>>>
>>>             TRACE_ENTER();
>>> -   if (cp_node->ckpt_lcl_ref_cnt == 0) {
>>>
>>> -           cp_node->is_close = true;
>>> -           cpnd_restart_set_close_flag(cb, cp_node);
>>> +   if (m_CPND_IS_COLLOCATED_ATTR_SET(cp_node->create_attrib.creationFlags)) {
>>> +           TRACE_LEAVE();
>>> +           return NCSCC_RC_SUCCESS;
>>> +   }
>>>
>>> -           if (cp_node->is_unlink != true &&
>>> -                           (m_CPSV_CONVERT_SATIME_TEN_MILLI_SEC(cp_node->create_attrib.retentionDuration) != 0)) {
>>> -                   m_GET_TIME_STAMP(presentTime);
>>> -                   cpnd_restart_update_timer(cb, cp_node, presentTime);
>>> +   if (cp_node->ckpt_lcl_ref_cnt != 0) {
>>> +           LOG_ER("cpnd receives CPND_EVT_D2ND_RDSET_INFO with START while ckpt_lcl_ref_cnt = %d", cp_node->ckpt_lcl_ref_cnt);
>>> +           TRACE_LEAVE();
>>> +           return NCSCC_RC_FAILURE;
>>> +   }
>>>
>>> -                   cp_node->ret_tmr.type = CPND_TMR_TYPE_NON_COLLOC_RETENTION;
>>> -                   cp_node->ret_tmr.uarg = cb->cpnd_cb_hdl_id;
>>> -                   cp_node->ret_tmr.ckpt_id = cp_node->ckpt_id;
>>> -                   cpnd_tmr_start(&cp_node->ret_tmr,
>>> -                                   m_CPSV_CONVERT_SATIME_TEN_MILLI_SEC(cp_node->create_attrib.retentionDuration));
>>> -                   TRACE_1("cpnd ckpt ret tmr success ckpt_id:%llx",cp_node->ckpt_id);
>>> -           } else {
>>> -                   /* Check for Non-Collocated Replica */
>>> -                   if (cpnd_is_noncollocated_replica_present_on_payload(cb, cp_node)) {
>>> -                           return NCSCC_RC_SUCCESS;
>>> -                   }
>>> -                   rc = cpnd_ckpt_replica_destroy(cb, cp_node, error);
>>> -                   if (rc == NCSCC_RC_FAILURE) {
>>> -                           TRACE_4("cpnd ckpt replica destroy failed ckpt_id:%llx",cp_node->ckpt_id);
>>> -                           return NCSCC_RC_FAILURE;
>>> -                   }
>>> -                   TRACE_1("cpnd ckpt replica destroy failed ckpt_id:%llx",cp_node->ckpt_id);
>>> +   cp_node->is_close = true;
>>> +   cpnd_restart_set_close_flag(cb, cp_node);
>>>
>>> -                   cpnd_restart_shm_ckpt_free(cb, cp_node);
>>> -                   cpnd_ckpt_node_destroy(cb, cp_node);
>>> +   if (cp_node->is_unlink != true &&
>>> +                   (m_CPSV_CONVERT_SATIME_TEN_MILLI_SEC(cp_node->create_attrib.retentionDuration) != 0)) {
>>> +           m_GET_TIME_STAMP(presentTime);
>>> +           cpnd_restart_update_timer(cb, cp_node, presentTime);
>>> +
>>> +           cp_node->ret_tmr.type = CPND_TMR_TYPE_NON_COLLOC_RETENTION;
>>> +           cp_node->ret_tmr.uarg = cb->cpnd_cb_hdl_id;
>>> +           cp_node->ret_tmr.ckpt_id = cp_node->ckpt_id;
>>> +           cpnd_tmr_start(&cp_node->ret_tmr,
>>> +                           m_CPSV_CONVERT_SATIME_TEN_MILLI_SEC(cp_node->create_attrib.retentionDuration));
>>> +           TRACE_1("cpnd ckpt ret tmr success ckpt_id:%llx",cp_node->ckpt_id);
>>> +   } else {
>>> +           rc = cpnd_ckpt_replica_destroy(cb, cp_node, &error);
>>> +           if (rc == NCSCC_RC_FAILURE) {
>>> +                   LOG_ER("cpnd ckpt replica destroy failed ckpt_id:%llx, error:%d",cp_node->ckpt_id, error);
>>> +                   return NCSCC_RC_FAILURE;
>>>                     }
>>> +           TRACE_1("cpnd ckpt replica destroy success ckpt_id:%llx",cp_node->ckpt_id);
>>> +
>>> +           cpnd_restart_shm_ckpt_free(cb, cp_node);
>>> +           cpnd_ckpt_node_destroy(cb, cp_node);
>>>             }
>>> +
>>>             TRACE_LEAVE();
>>>             return NCSCC_RC_SUCCESS;
>>>     }

> 

> ----------------------------------------------------------------------

> -------- _______________________________________________

> Opensaf-devel mailing list

> [email protected]
<mailto:[email protected]> 

> https://lists.sourceforge.net/lists/listinfo/opensaf-devel

 
